Info Node: (wget.info)Spanning Hosts

Spanning Hosts
==============

   Wget's recursive retrieval normally refuses to visit hosts different
from the one you specified on the command line.  This is a reasonable
default; without it, every retrieval would have the potential to turn
your Wget into a small version of Google.

   However, visiting different hosts, or "host spanning," is sometimes
a useful option.  Maybe the images are served from a different server.
Maybe you're mirroring a site that consists of pages interlinked between
three servers.  Maybe the server has two equivalent names, and the HTML
pages refer to both interchangeably.
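
   As a rough illustration of why that default matters, the following
Python sketch crawls a small, made-up link graph with and without host
spanning.  It is a toy model, not Wget's implementation: the host
names, the `LINKS' table and the `crawl' function are hypothetical, and
`max_depth' only loosely corresponds to Wget's `-l' option.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph: page URL -> URLs that page links to.
LINKS = {
    "http://www.example.com/": ["http://www.example.com/a.html",
                                "http://img.example.net/logo.png",
                                "http://other.example.org/"],
    "http://www.example.com/a.html": ["http://www.example.com/"],
    "http://other.example.org/": ["http://third.example.info/"],
    "http://third.example.info/": ["http://fourth.example.biz/"],
}


def crawl(start, span_hosts=False, max_depth=5):
    """Breadth-first toy crawl of LINKS; returns the set of hosts fetched."""
    start_host = urlsplit(start).hostname
    seen = {start}
    queue = deque([(start, 0)])
    hosts = set()
    while queue:
        url, depth = queue.popleft()
        hosts.add(urlsplit(url).hostname)
        if depth >= max_depth:
            continue  # do not expand links past the depth limit
        for link in LINKS.get(url, []):
            host = urlsplit(link).hostname
            if not span_hosts and host != start_host:
                continue  # default behaviour: stay on the starting host
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return hosts
```

With the default settings this crawl never leaves `www.example.com'.
With `span_hosts=True' it reaches five hosts, two of which are only
reachable through other foreign hosts; adding `max_depth=1' cuts that
back to three.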

Span to any host--`-H'
     The `-H' option turns on host spanning, thus allowing Wget's
     recursive run to visit any host referenced by a link.  Unless
     sufficient recursion-limiting criteria, such as a maximum
     recursion depth (`-l'), are applied, these foreign hosts will
     typically link to yet more hosts, and so on until Wget ends up
     sucking up much more data than you intended.

Limit spanning to certain domains--`-D'
     The `-D' option allows you to specify the domains that will be
     followed, thus limiting the recursion only to the hosts that
     belong to these domains.  Obviously, this makes sense only in
     conjunction with `-H'.  A typical example would be downloading the
     contents of `www.server.com', but allowing downloads from
     `images.server.com', etc.:

          wget -rH -Dserver.com http://www.server.com/

     You can specify more than one domain by separating them with
     commas, e.g. `-Ddomain1.com,domain2.com'.

Keep downloads off certain domains--`--exclude-domains'
     If there are domains you want to exclude specifically, you can do
     it with `--exclude-domains', which accepts the same type of
     arguments as `-D', but will _exclude_ all the listed domains.  For
     example, if you want to download all the hosts from the `foo.edu'
     domain, with the exception of `sunsite.foo.edu', you can do it like
     this:

          wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
              http://www.foo.edu/
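
   The way `-D' and `--exclude-domains' combine can be modelled with a
short Python sketch.  It is an approximation, not Wget's source:
`domain_matches' and `may_span_to' are hypothetical names, the sketch
assumes `-H' is already in effect, and it assumes that a listed domain
matches a host equal to it or ending in a dot followed by it.  The
exact matching rule may differ between Wget releases.

```python
def domain_matches(host, domains):
    """Return True if host equals, or is a subdomain of, a listed domain.

    domains is a comma-separated string in the style of the -D and
    --exclude-domains arguments, e.g. "domain1.com,domain2.com".
    Matching is case-insensitive and, by assumption, dot-bounded.
    """
    host = host.lower().rstrip(".")  # ignore a trailing root dot
    for entry in domains.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue  # skip empty list items such as "a.com,,b.com"
        if host == entry or host.endswith("." + entry):
            return True
    return False


def may_span_to(host, domains=None, exclude_domains=None):
    """Hypothetical model: may a -H crawl visit host under -D/--exclude-domains?"""
    if domains and not domain_matches(host, domains):
        return False  # -D was given and host lies outside every listed domain
    if exclude_domains and domain_matches(host, exclude_domains):
        return False  # an excluded domain wins even inside an accepted one
    return True
```

Under these assumptions the two commands above behave as described:
`images.server.com' is accepted by `-Dserver.com', while
`sunsite.foo.edu' (and any host below it) is rejected even though it
also lies within `foo.edu'.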

