GNU Info

Info Node: (wget.info)HTTP Options

HTTP Options
============

`-E'
`--html-extension'
     If a file of type `text/html' is downloaded and the URL does not
     end with the regexp `\.[Hh][Tt][Mm][Ll]?', this option will cause
     the suffix `.html' to be appended to the local filename.  This is
     useful, for instance, when you're mirroring a remote site that uses
     `.asp' pages, but you want the mirrored pages to be viewable on
     your stock Apache server.  Another good use for this is when you're
     downloading the output of CGIs.  A URL like
     `http://site.com/article.cgi?25' will be saved as
     `article.cgi?25.html'.
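
     The suffix rule can be sketched in a few lines of shell.  This is
     only an illustration, assuming the document was already known to
     be of type `text/html'; `html_name' is a hypothetical helper, not
     part of Wget.

```shell
# Sketch of the suffix rule, assuming the document was served as text/html.
# html_name is a hypothetical helper, not part of Wget.
html_name() {
    case $1 in
        *.[Hh][Tt][Mm] | *.[Hh][Tt][Mm][Ll])
            # Already ends in .htm or .html (any letter case): keep as is.
            printf '%s\n' "$1" ;;
        *)
            # Anything else gets the .html suffix appended.
            printf '%s\n' "$1.html" ;;
    esac
}

html_name 'article.cgi?25'
html_name 'index.HTM'
html_name 'page.asp'
```

     The first and third names gain the `.html' suffix; `index.HTM'
     already matches the pattern and is left unchanged.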

     Note that filenames changed in this way will be re-downloaded
     every time you re-mirror a site, because Wget can't tell that the
     local `X.html' file corresponds to remote URL `X' (since it
     doesn't yet know that the URL produces output of type
     `text/html').  To prevent this re-downloading, you must use `-k'
     and `-K' so that the original version of the file will be saved
     as `X.orig' (see Recursive Retrieval Options).

`--http-user=USER'
`--http-passwd=PASSWORD'
     Specify the username USER and password PASSWORD on an HTTP server.
     According to the type of the challenge, Wget will encode them
     using either the `basic' (insecure) or the `digest' authentication
     scheme.
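
     To see why `basic' is labelled insecure, the following sketch
     builds the credential string that scheme sends.  It assumes a
     `base64' utility is installed (`-d' is the GNU coreutils spelling
     of the decode flag); `basic_credentials' is a hypothetical helper,
     and the user name and password are the sample pair from the HTTP
     authentication specification, not real credentials.

```shell
# Sketch only: how the `basic' scheme encodes credentials.
# basic_credentials is a hypothetical helper, not part of Wget.
basic_credentials() {
    # Join as USER:PASSWORD and base64-encode; tr removes any line breaks.
    printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

token=$(basic_credentials 'Aladdin' 'open sesame')
printf 'Authorization: Basic %s\n' "$token"

# Decoding needs no key at all, so anyone who sees the header
# can recover the password.
printf '%s\n' "$token" | base64 -d
```

     The encoding is reversible by anyone who can observe the request,
     which is why `digest' is preferable when the server offers it.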

     Another way to specify username and password is in the URL itself
     (see URL Format).  Either method reveals your password to
     anyone who bothers to run `ps'.  To prevent the passwords from
     being seen, store them in `.wgetrc' or `.netrc', and make sure to
     protect those files from other users with `chmod'.  If the
     passwords are really important, do not leave them lying in those
     files either--edit the files and delete them after Wget has
     started the download.
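
     A minimal sketch of the startup-file approach follows.  It assumes
     the startup-file commands are named `http_user' and `http_passwd',
     mirroring the command-line options; check the list of Wgetrc
     commands for your version.  A scratch directory stands in for your
     home directory, and USER and PASSWORD are placeholders.

```shell
# Sketch only: keep credentials off the command line by storing them in
# a startup file that only its owner can read.
# Assumption: the wgetrc commands are named http_user and http_passwd.
rcdir=$(mktemp -d)
rcfile="$rcdir/.wgetrc"

# Restrict permissions before any secret is written to the file.
umask 077
cat > "$rcfile" <<'EOF'
http_user = USER
http_passwd = PASSWORD
EOF
chmod 600 "$rcfile"

# The mode column should read -rw-------.
ls -l "$rcfile"
```

     Setting the umask first means the file is never readable by other
     users, even for the moment between its creation and the `chmod'.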

     For more information about security issues with Wget, see
     Security Considerations.

`-C on/off'
`--cache=on/off'
     When set to off, disable server-side caching.  In this case, Wget
     will send the remote server an appropriate directive (`Pragma:
     no-cache') so that the file is fetched from the remote service
     rather than served from a cache.  This is especially useful for
     retrieving and flushing out-of-date documents on proxy servers.

     Caching is allowed by default.

`--cookies=on/off'
     When set to off, disable the use of cookies.  Cookies are a
     mechanism for maintaining server-side state.  The server sends the
     client a cookie using the `Set-Cookie' header, and the client
     responds with the same cookie upon further requests.  Since
     cookies allow the server owners to keep track of visitors and for
     sites to exchange this information, some consider them a breach of
     privacy.  The default is to use cookies; however, _storing_
     cookies is not on by default.

`--load-cookies FILE'
     Load cookies from FILE before the first HTTP retrieval.  FILE is a
     textual file in the format originally used by Netscape's
     `cookies.txt' file.
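
     For orientation, the sketch below writes one entry in the Netscape
     `cookies.txt' layout, which uses seven tab-separated fields per
     line, and then lists the cookies it contains.  The domain, cookie
     name, and value are made up for illustration.

```shell
# Sketch only: one entry in the Netscape cookies.txt layout.
# The domain, cookie name, and value are invented for this example.
cookiefile=$(mktemp)
{
    printf '# Netscape HTTP Cookie File\n'
    # Fields: domain, subdomain flag, path, secure flag,
    #         expiry (Unix time), name, value
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        '.example.com' 'TRUE' '/' 'FALSE' '2145916800' 'session' 'abc123'
} > "$cookiefile"

# List the name and value of every cookie, skipping comment lines.
awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$cookiefile"
```

     A file in this layout is what `--load-cookies' expects to be
     pointed at.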

     You will typically use this option when mirroring sites that
     require that you be logged in to access some or all of their
     content.  The login process typically works by the web server
     issuing an HTTP cookie upon receiving and verifying your
     credentials.  The cookie is then resent by the browser when
     accessing that part of the site, and so proves your identity.

     Mirroring such a site requires Wget to send the same cookies your
     browser sends when communicating with the site.  This is achieved
     by `--load-cookies'--simply point Wget to the location of the
     `cookies.txt' file, and it will send the same cookies your browser
     would send in the same situation.  Different browsers keep textual
     cookie files in different locations:

    Netscape 4.x.
          The cookies are in `~/.netscape/cookies.txt'.

    Mozilla and Netscape 6.x.
          Mozilla's cookie file is also named `cookies.txt', located
          somewhere under `~/.mozilla', in the directory of your
          profile.  The full path usually ends up looking somewhat like
          `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

    Internet Explorer.
          You can produce a cookie file Wget can use by using the File
          menu, Import and Export, Export Cookies.  This has been
          tested with Internet Explorer 5; it is not guaranteed to work
          with earlier versions.

    Other browsers.
          If you are using a different browser to create your cookies,
          `--load-cookies' will only work if you can locate or produce a
          cookie file in the Netscape format that Wget expects.

     If you cannot use `--load-cookies', there might still be an
     alternative.  If your browser supports a "cookie manager", you can
     use it to view the cookies used when accessing the site you're
     mirroring.  Write down the name and value of the cookie, and
     manually instruct Wget to send those cookies, bypassing the
     "official" cookie support:

          wget --cookies=off --header "Cookie: NAME=VALUE"

`--save-cookies FILE'
     Save cookies to FILE at the end of the session.  Cookies whose
     expiry time is not specified, or those that have already expired,
     are not saved.

`--ignore-length'
     Unfortunately, some HTTP servers (CGI programs, to be more
     precise) send out bogus `Content-Length' headers, which makes Wget
     go wild, as it thinks not all the document was retrieved.  You can
     spot this syndrome if Wget retries getting the same document again
     and again, each time claiming that the (otherwise normal)
     connection has closed on the very same byte.

     With this option, Wget will ignore the `Content-Length' header--as
     if it never existed.

`--header=ADDITIONAL-HEADER'
     Define an ADDITIONAL-HEADER to be passed to the HTTP servers.
     Headers must contain a `:' preceded by one or more non-blank
     characters, and must not contain newlines.

     You may define more than one additional header by specifying
     `--header' more than once.

          wget --header='Accept-Charset: iso-8859-2' \
               --header='Accept-Language: hr'        \
                 http://fly.srk.fer.hr/

     Specification of an empty string as the header value will clear all
     previous user-defined headers.
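
     The format rule for headers (a `:' preceded by one or more
     non-blank characters, and no newlines) can be sketched as a shell
     check.  `valid_header' is a hypothetical helper written from the
     wording above, not Wget's own implementation, and it does not
     cover the empty-string case that clears the header list.

```shell
# Sketch of the stated rule; valid_header is a hypothetical helper
# and is not how Wget itself is implemented.
valid_header() {
    nl='
'
    case $1 in
        *"$nl"*) return 1 ;;         # contains a newline
    esac
    name=${1%%:*}                    # text before the first `:'
    [ "$name" != "$1" ] || return 1  # no `:' at all
    [ -n "$name" ] || return 1       # nothing before the `:'
    case $name in
        *[[:space:]]*) return 1 ;;   # blank in the part before the `:'
    esac
    return 0
}

for h in 'Accept-Language: hr' 'no colon here' ': empty name' 'Bad Name: x'; do
    if valid_header "$h"; then
        echo "ok:       $h"
    else
        echo "rejected: $h"
    fi
done
```

     Only the first of the four sample strings passes the check.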

`--proxy-user=USER'
`--proxy-passwd=PASSWORD'
     Specify the username USER and password PASSWORD for authentication
     on a proxy server.  Wget will encode them using the `basic'
     authentication scheme.

     Security considerations similar to those with `--http-passwd'
     pertain here as well.

`--referer=URL'
     Include `Referer: URL' header in HTTP request.  Useful for
     retrieving documents with server-side processing that assume they
     are always being retrieved by interactive web browsers and only
     come out properly when Referer is set to one of the pages that
     point to them.

`-s'
`--save-headers'
     Save the headers sent by the HTTP server to the file, preceding the
     actual contents, with an empty line as the separator.
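
     Because the separator is an empty line, a saved file can be split
     back into headers and contents with standard tools.  The sketch
     below uses a made-up sample response, and assumes the header
     lines may end in a carriage return.

```shell
# Sketch only: split a file saved with --save-headers into its header
# block and its contents. The sample response below is made up.
saved=$(mktemp)
printf 'HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>\n' > "$saved"

# Headers: every line before the first empty line, with any trailing
# carriage return removed.
awk '/^\r?$/ { exit } { sub(/\r$/, ""); print }' "$saved"

# Contents: every line after that first empty line.
awk 'body { print; next } /^\r?$/ { body = 1 }' "$saved"
```

     Only the first empty line is treated as the separator, so blank
     lines inside the document itself are preserved.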

`-U AGENT-STRING'
`--user-agent=AGENT-STRING'
     Identify as AGENT-STRING to the HTTP server.

     The HTTP protocol allows the clients to identify themselves using a
     `User-Agent' header field.  This enables distinguishing the WWW
     software, usually for statistical purposes or for tracing of
     protocol violations.  Wget normally identifies as `Wget/VERSION',
     VERSION being the current version number of Wget.

     However, some sites have been known to impose the policy of
     tailoring the output according to the `User-Agent'-supplied
     information.  While conceptually this is not such a bad idea, it
     has been abused by servers denying information to clients other
     than `Mozilla' or Microsoft `Internet Explorer'.  This option
     allows you to change the `User-Agent' line issued by Wget.  Use of
     this option is discouraged, unless you really know what you are
     doing.

