Info Node: (python2.1-lib.info)xmllib

A parser for XML documents
==========================

This module and this manual section were written by Sjoerd Mullender
<Sjoerd.Mullender@cwi.nl>.

_Deprecated since Python 2.0.  Use `xml.sax' instead.  The newer XML
package includes full support for XML 1.0._

_Changed in Python version 1.5.2_
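
Since the deprecation note points to `xml.sax' as the replacement, the
following is a minimal sketch of event-driven parsing with that
package.  The handler class `LinkCollector' and the sample document are
made up for illustration; only `xml.sax.parseString()' and
`xml.sax.ContentHandler' come from the standard library.

```python
import xml.sax


class LinkCollector(xml.sax.ContentHandler):
    """Illustrative handler: records HREF values and character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.links = []
        self.text = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        if name == 'a' and 'href' in attrs:
            self.links.append(attrs['href'])

    def characters(self, content):
        # Character data may arrive in several chunks, so collect the pieces.
        self.text.append(content)


collector = LinkCollector()
xml.sax.parseString(
    b'<doc><a href="http://www.cwi.nl/">CWI</a></doc>', collector)
```

After the call, `collector.links' holds the one HREF value and the
joined `collector.text' holds the character data of the document.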

This module defines a class `XMLParser' which serves as the basis for
parsing text files formatted in XML (Extensible Markup Language).

`XMLParser()'
     The `XMLParser' class must be instantiated without arguments.(1)

This class provides the following interface methods and instance
variables:

`attributes'
     A mapping of element names to mappings.  Each inner mapping maps
     the attribute names that are valid for the element to the
     attribute's default value, or to `None' if there is no default.
     The default value is the empty dictionary.  This variable is meant
     to be overridden, not extended, since the default is shared by all
     instances of `XMLParser'.

`elements'
     A mapping of element names to tuples.  Each tuple holds two
     functions, one handling the start tag and one handling the end tag
     of the element.  Either entry may be `None', in which case
     `unknown_starttag()' or `unknown_endtag()' is called instead.  The
     default value is the empty dictionary.  This variable is meant to
     be overridden, not extended, since the default is shared by all
     instances of `XMLParser'.
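
The warning about overriding rather than extending `attributes' and
`elements' follows from how class-level dictionaries behave in Python.
The sketch below uses a stand-in class `ParserBase' instead of the real
`XMLParser', and the element and attribute names are invented; it only
shows why mutating the shared default leaks into every other parser.

```python
class ParserBase:
    # Stand-in for XMLParser: class-level defaults shared by all instances.
    attributes = {}
    elements = {}


class LinkParser(ParserBase):
    # Overriding: the subclass gets its own mapping (names are illustrative).
    attributes = {'a': {'href': None, 'target': '_self'}}


class CarelessParser(ParserBase):
    def __init__(self):
        # Extending in place: this mutates the dictionary owned by ParserBase.
        self.attributes['img'] = {'src': None}


before = dict(ParserBase.attributes)

LinkParser()
overridden_leaks = ParserBase.attributes != before

CarelessParser()
extended_leaks = ParserBase.attributes != before
```

Here `overridden_leaks' stays false while `extended_leaks' becomes
true, because `CarelessParser' wrote into the dictionary that every
other subclass still inherits.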

`entitydefs'
     A mapping of entity names to their values.  The default value
     contains definitions for `'lt'', `'gt'', `'amp'', `'quot'', and
     `'apos''.

`reset()'
     Reset the instance.  All unprocessed data is lost.  This method is
     called implicitly at instantiation time.

`setnomoretags()'
     Stop processing tags.  Treat all following input as literal input
     (CDATA).

`setliteral()'
     Enter literal mode (CDATA mode).  This mode is automatically exited
     when the close tag matching the last unclosed open tag is
     encountered.

`feed(data)'
     Feed some text to the parser.  It is processed insofar as it
     consists of complete tags; incomplete data is buffered until more
     data is fed or `close()' is called.

`close()'
     Force processing of all buffered data as if it were followed by an
     end-of-file mark.  This method may be redefined by a derived class
     to define additional processing at the end of the input, but the
     redefined version should always call the base class `close()'.

`translate_references(data)'
     Translate all entity and character references in DATA and return
     the translated string.
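
To make the idea of reference translation concrete, here is a small
standalone sketch, not the module's own code.  The table `ENTITYDEFS'
is an assumption: it maps the five predefined names directly to their
characters, which may differ from the values the module stores.  The
decision to leave unknown entity names untouched is also a choice of
this sketch.

```python
import re

# Assumed entity table for the five predefined names.
ENTITYDEFS = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

# Matches &#123; or &#x7B; (group 1) and &name; (group 2).
_REF = re.compile(r'&(?:#(x[0-9a-fA-F]+|[0-9]+)|([A-Za-z_:][-.\w:]*));')


def translate_references(data, entitydefs=ENTITYDEFS):
    """Return data with character and known entity references replaced."""
    def replace(match):
        charref, name = match.group(1), match.group(2)
        if charref is not None:
            # A leading 'x' marks a hexadecimal character reference.
            # Numbers beyond the Unicode range would make chr() raise;
            # that case is not handled in this sketch.
            if charref[0] == 'x':
                return chr(int(charref[1:], 16))
            return chr(int(charref))
        # Unknown entity names are left as they are.
        return entitydefs.get(name, match.group(0))

    # A single pass, so replacement text is never translated a second time.
    return _REF.sub(replace, data)


result = translate_references('a &lt; b &amp;&amp; &#65;&#x42; &unknown;')
```

Because the substitution runs in one pass, an input such as `&amp;lt;'
becomes the literal text `&lt;' rather than `<'.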

`getnamespace()'
     Return a mapping of namespace abbreviations to namespace URIs that
     are currently in effect.

`handle_xml(encoding, standalone)'
     This method is called when the `<?xml ...?>' declaration is
     processed.  The arguments are the values of the encoding and
     standalone attributes in the declaration.  Both attributes are
     optional; when absent, `handle_xml()' receives `None' for ENCODING
     and the string `'no'' for STANDALONE.

`handle_doctype(tag, pubid, syslit, data)'
     This method is called when the `<!DOCTYPE...>' declaration is
     processed.  The arguments are the tag name of the root element,
     the Formal Public Identifier (or `None' if not specified), the
     system identifier, and the uninterpreted contents of the internal
     DTD subset as a string (or `None' if not present).

`handle_starttag(tag, method, attributes)'
     This method is called to handle start tags for which a start tag
     handler is defined in the instance variable `elements'.  The TAG
     argument is the name of the tag, and the METHOD argument is the
     function (method) which should be used to support semantic
     interpretation of the start tag.  The ATTRIBUTES argument is a
     dictionary of attributes, the key being the NAME and the value
     being the VALUE of the attribute found inside the tag's `<>'
     brackets.  Character and entity references in the VALUE have been
     interpreted.  For instance, for the start tag `<A
     HREF="http://www.cwi.nl/">', this method would be called as
     `handle_starttag('A', self.elements['A'][0], {'HREF':
     'http://www.cwi.nl/'})'.  The base implementation simply calls
     METHOD with ATTRIBUTES as the only argument.

`handle_endtag(tag, method)'
     This method is called to handle end tags for which an end tag
     handler is defined in the instance variable `elements'.  The TAG
     argument is the name of the tag, and the METHOD argument is the
     function (method) which should be used to support semantic
     interpretation of the end tag.  For instance, for the end tag
     `</A>', this method would be called as `handle_endtag('A',
     self.elements['A'][1])'.  The base implementation simply calls
     METHOD.
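
The interplay between `elements', `handle_starttag()',
`handle_endtag()' and the `unknown_*' methods can be modelled with a
small stand-in class.  `MiniDispatcher', `dispatch_start()' and
`dispatch_end()' are hypothetical names invented for this sketch; they
only imitate the dispatch rules described above and are not part of the
module.  Unlike the documented base class, the `unknown_*' methods here
record the tag so the effect is visible.

```python
class MiniDispatcher:
    """Stand-in that models only the tag dispatch described in the text."""

    def __init__(self):
        self.elements = {}   # tag name -> (start handler, end handler)
        self.log = []

    def handle_starttag(self, tag, method, attributes):
        # As documented: call METHOD with ATTRIBUTES as the only argument.
        method(attributes)

    def handle_endtag(self, tag, method):
        # As documented: simply call METHOD.
        method()

    def unknown_starttag(self, tag, attributes):
        self.log.append(('unknown start', tag))

    def unknown_endtag(self, tag):
        self.log.append(('unknown end', tag))

    # Hypothetical entry points that a tokenizer would call for each tag.
    def dispatch_start(self, tag, attributes):
        start = self.elements.get(tag, (None, None))[0]
        if start is not None:
            self.handle_starttag(tag, start, attributes)
        else:
            self.unknown_starttag(tag, attributes)

    def dispatch_end(self, tag):
        end = self.elements.get(tag, (None, None))[1]
        if end is not None:
            self.handle_endtag(tag, end)
        else:
            self.unknown_endtag(tag)


class AnchorParser(MiniDispatcher):
    def __init__(self):
        MiniDispatcher.__init__(self)
        # Override the mapping with bound methods for the 'A' element.
        self.elements = {'A': (self.start_a, self.end_a)}

    def start_a(self, attributes):
        self.log.append(('start A', attributes['HREF']))

    def end_a(self):
        self.log.append(('end A',))


p = AnchorParser()
p.dispatch_start('A', {'HREF': 'http://www.cwi.nl/'})
p.dispatch_end('A')
p.dispatch_start('B', {})
```

The `A' tags reach `start_a()' and `end_a()' through the `elements'
tuple, while the unregistered `B' tag falls through to
`unknown_starttag()'.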

`handle_data(data)'
     This method is called to process arbitrary data.  It is intended
     to be overridden by a derived class; the base class implementation
     does nothing.

`handle_charref(ref)'
     This method is called to process a character reference of the form
     `&#REF;'.  REF can either be a decimal number, or a hexadecimal
     number when preceded by an `x'.  In the base implementation, REF
     must be a number in the range 0-255.  The method converts the
     number to the corresponding character and calls `handle_data()'
     with that character as argument.  If REF is invalid or out of
     range, the method `unknown_charref(REF)' is called to handle the
     error.  A subclass must override this method to provide support
     for character references outside of this range.
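
The rules for REF can be sketched as a standalone function.  This is an
illustration of the description above, not the module's source: the
name `charref_to_char' is invented, and returning `None' stands in for
the call to `unknown_charref()'.

```python
def charref_to_char(ref):
    """Return the character for REF, or None if REF is invalid."""
    try:
        if ref[:1] == 'x':
            number = int(ref[1:], 16)   # hexadecimal form, e.g. 'x41'
        else:
            number = int(ref)           # decimal form, e.g. '65'
    except ValueError:
        return None                     # not a number at all
    if not 0 <= number <= 255:
        return None                     # outside the documented range
    return chr(number)


decimal = charref_to_char('65')
hexadecimal = charref_to_char('x41')
too_large = charref_to_char('256')
malformed = charref_to_char('xZZ')
```

Both `'65'' and `'x41'' yield `'A'', while `'256'' and `'xZZ'' are
rejected.  A subclass that wants wider coverage would relax the range
check in its own `handle_charref()'.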

`handle_comment(comment)'
     This method is called when a comment is encountered.  The COMMENT
     argument is a string containing the text between the `<!--' and
     `-->' delimiters, but not the delimiters themselves.  For example,
     the comment `<!--text-->' will cause this method to be called with
     the argument `'text''.  The default method does nothing.

`handle_cdata(data)'
     This method is called when a CDATA section is encountered.  The
     DATA argument is a string containing the text between the
     `<![CDATA[' and `]]>' delimiters, but not the delimiters
     themselves.  For example, the section `<![CDATA[text]]>' will cause
     this method to be called with the argument `'text''.  The default
     method does nothing, and is intended to be overridden.

`handle_proc(name, data)'
     This method is called when a processing instruction (PI) is
     encountered.  The NAME is the PI target, and the DATA argument is
     a string containing the text between the PI target and the closing
     delimiter, but not the delimiter itself.  For example, the
     instruction `<?XML text?>' will cause this method to be called
     with the arguments `'XML'' and `'text''.  The default method does
     nothing.  Note that if a document starts with `<?xml ...?>',
     `handle_xml()' is called to handle it.

`handle_special(data)'
     This method is called when a declaration is encountered.  The DATA
     argument is a string containing the text between the `<!' and `>'
     delimiters, but not the delimiters themselves.  For example, the
     entity declaration `<!ENTITY text>' will cause this method to be
     called with the argument `'ENTITY text''.  The default method does
     nothing.  Note that `<!DOCTYPE ...>' is handled separately if it
     is located at the start of the document.

`syntax_error(message)'
     This method is called when a syntax error is encountered.  The
     MESSAGE is a description of what was wrong.  The default method
     raises a `RuntimeError' exception.  If this method is overridden,
     it is permissible for it to return.  This method is only called
     when the error can be recovered from.  Unrecoverable errors raise
     a `RuntimeError' without first calling `syntax_error()'.

`unknown_starttag(tag, attributes)'
     This method is called to process an unknown start tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_endtag(tag)'
     This method is called to process an unknown end tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_charref(ref)'
     This method is called to process unresolvable numeric character
     references.  It is intended to be overridden by a derived class;
     the base class implementation does nothing.

`unknown_entityref(ref)'
     This method is called to process an unknown entity reference.  It
     is intended to be overridden by a derived class; the base class
     implementation calls `syntax_error()' to signal an error.

See also:
     `Extensible Markup Language (XML) 1.0'
          The XML specification, published by the World Wide Web
          Consortium (W3C), defines the syntax and processor
          requirements for XML.  References to additional material on
          XML, including translations of the specification, are
          available at <http://www.w3.org/XML/>.

     `Python and XML Processing'
          The Python XML Topic Guide provides a great deal of
          information on using XML from Python and links to other
          sources of information on XML.

     `SIG for XML Processing in Python'
          The Python XML Special Interest Group is developing
          substantial support for processing XML from Python.

---------- Footnotes ----------

(1) Actually, a number of keyword arguments are recognized which
influence the parser to accept certain non-standard constructs.  All
of them default to `0' (false) except the last one, which defaults to
`1' (true).

ACCEPT_UNQUOTED_ATTRIBUTES
     Accept certain attribute values without requiring quotes.

ACCEPT_MISSING_ENDTAG_NAME
     Accept end tags that look like `</>'.

MAP_CASE
     Map upper case to lower case in tags and attributes.

ACCEPT_UTF8
     Allow UTF-8 characters in input.  This is required according to
     the XML standard, but Python does not as yet deal properly with
     these characters, so this is not the default.

TRANSLATE_ATTRIBUTE_REFERENCES
     Translate character and entity references in attribute values.
     When false, no translation is attempted.

