html2text
Section: User Commands (1)
Updated: 2001-10-05

NAME
html2text - an advanced HTML-to-text converter

SYNOPSIS
html2text -help
html2text -version
html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] [ -rcfile path ] [ -style ( compact | pretty ) ] [ -width width ] [ -o output-file ] [ -nobs ] [ input-uri ... ]

DESCRIPTION
html2text reads HTML 3.2 documents from the input-uris, formats each into a stream of plain text characters (ISO 8859-1) and writes the result to standard output (or into output-file, if the -o command line option is used).

Documents that are specified by a URI that begins with "http:" (RFC 1738) are retrieved with the Hypertext Transfer Protocol (RFC 1945). URIs that begin with "file:" and URIs that do not contain a colon specify local files. All other URIs are invalid.

If no input-uris are specified on the command line, html2text reads from standard input. A dash as the input-uri is an alternate way to specify standard input.

html2text understands all HTML 3.2 constructs, but can render only some of them due to the limitations of the text output format. However, the program attempts to provide good substitutes for the elements it cannot render. It also accepts syntactically incorrect input and attempts to interpret it "reasonably".

The way html2text formats the HTML documents is controlled by formatting properties read from an RC file. html2text attempts to read $HOME/.html2textrc (or the file specified by the -rcfile command line option); if that file cannot be read, html2text attempts to read /etc/html2textrc. If no RC file can be read (or if the RC file does not override all formatting properties), then "reasonable" defaults are assumed. The RC file format is described in the html2textrc(5) manual page.

OPTIONS
FILES
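DESCRIPTION names two RC file locations, $HOME/.html2textrc (or the path given with -rcfile) and /etc/html2textrc, tried in that order. The following Python sketch illustrates one reading of that lookup order. It is not html2text's own code; the function names rc_candidates and find_rc_file are hypothetical, and the sketch assumes that "cannot be read" means the path is missing or not readable by the current user.

```python
import os

def rc_candidates(rcfile=None, home=None):
    """Return RC file paths in the order DESCRIPTION says they are tried.

    rcfile: value of the -rcfile option, or None if the option was not given.
    home:   value of $HOME (read from the environment when omitted).
    """
    if home is None:
        home = os.environ.get("HOME", "")
    # -rcfile takes the place of $HOME/.html2textrc as the first candidate.
    first = rcfile if rcfile is not None else home + "/.html2textrc"
    return [first, "/etc/html2textrc"]

def find_rc_file(rcfile=None, home=None):
    """Return the first readable candidate, or None if neither can be read."""
    for path in rc_candidates(rcfile, home):
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    # Assumption: None stands for "use the built-in defaults".
    return None
```

A return value of None corresponds to the case in which no RC file can be read and the "reasonable" defaults apply.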
CONFORMING TO
HTML 3.2 (HTML 3.2 Reference Specification - http://www.w3.org/TR/REC-html32), RFC 1945 (Hypertext Transfer Protocol - HTTP).

NOTES
html2text makes considerable effort to parse syntactically incorrect input, but is not always as successful as other HTML processors. If you are able to correct the HTML source code, you may want to use the -unparse or -check options to find out exactly what html2text's problem is.

RESTRICTIONS
html2text provides only a basic implementation of the Hypertext Transfer Protocol (HTTP). It requires the complete and exactly matching URI to be given as an argument and does not follow redirections (HTTP 301/307).

AUTHOR
html2text was written up to version 1.2.2 by Arno Unkrig <arno@unkrig.de> for GMRS Software GmbH, Unterschleißheim.

Current maintainer and primary download location:
Martin Bayer <mbayer@zedat.fu-berlin.de>
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html

SEE ALSO
html2textrc(5), less(1), more(1)
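
EXAMPLES
The input-uri rules given in DESCRIPTION can be summarised as a small decision procedure. The Python sketch below restates those rules for illustration only. It is not taken from the html2text sources; the names classify_input_uri and classify_inputs are hypothetical, and the prefix comparison is assumed to be case-sensitive, which the manual page does not state.

```python
def classify_input_uri(uri):
    """Classify one input-uri according to the rules in DESCRIPTION."""
    if uri == "-":
        return "stdin"    # a dash is an alternate way to name standard input
    if uri.startswith("http:"):
        return "http"     # retrieved with HTTP (RFC 1945)
    if uri.startswith("file:") or ":" not in uri:
        return "file"     # "file:" URIs and colon-free URIs are local files
    return "invalid"      # all other URIs are invalid

def classify_inputs(args):
    """With no input-uris on the command line, standard input is read."""
    if not args:
        return ["stdin"]
    return [classify_input_uri(a) for a in args]
```

Under the rules as stated, a URI such as "ftp://host/page.html" is classified as invalid, because it contains a colon but begins with neither "http:" nor "file:".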