Info Node: (python2.1-lib.info)XMLParser Objects

XMLParser Objects
-----------------

`xmlparser' objects have the following methods:

`Parse(data[, isfinal])'
     Parses the contents of the string DATA, calling the appropriate
     handler functions to process the parsed data.  ISFINAL must be
     true on the final call to this method.  DATA can be the empty
     string at any time.
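
     A minimal sketch of incremental parsing, assuming a current Python 3
     interpreter rather than 2.1; `parse_in_chunks' is a hypothetical
     helper.  The chunks deliberately split tags, because Expat buffers
     an incomplete token until a later call supplies the rest.

```python
import xml.parsers.expat


def parse_in_chunks(chunks):
    """Feed XML to one parser piece by piece; return start-tag names."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    for chunk in chunks:
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse("", True)  # an empty final call ends the document
    return names


print(parse_in_chunks(["<root><it", "em/><item/></ro", "ot>"]))
# ['root', 'item', 'item']
```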

`ParseFile(file)'
     Parses XML data read from the object FILE.  FILE only needs to
     provide the `read(NBYTES)' method, returning the empty string when
     there's no more data.
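
     `ParseFile()' only needs an object with a `read()' method, so a
     hand-written reader works.  This sketch assumes current Python 3,
     where `read()' must return bytes (`b""' at the end); `ChunkReader'
     and `count_elements' are hypothetical names, and the 4-byte limit
     only serves to force several reads.

```python
import xml.parsers.expat


class ChunkReader:
    """Minimal file-like object: ParseFile() only calls read(nbytes)."""

    def __init__(self, data, size):
        self._data = data
        self._size = size  # hypothetical cap on bytes returned per call

    def read(self, nbytes):
        n = min(nbytes, self._size)
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk  # b"" once exhausted, which ends parsing


def count_elements(reader):
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.ParseFile(reader)
    return count


print(count_elements(ChunkReader(b"<doc><p/><p/><p/></doc>", 4)))  # 4
```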

`SetBase(base)'
     Sets the base to be used for resolving relative URIs in system
     identifiers in declarations.  Resolving relative identifiers is
     left to the application: this value will be passed through as the
     BASE argument to the `ExternalEntityRefHandler',
     `NotationDeclHandler', and `UnparsedEntityDeclHandler' functions.

`GetBase()'
     Returns a string containing the base set by a previous call to
     `SetBase()', or `None' if `SetBase()' hasn't been called.
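
     A sketch of `SetBase()' and `GetBase()', assuming current Python 3.
     The application does the actual resolution; here `urljoin()' from
     `urllib.parse' combines the BASE argument with a relative system
     identifier.  The document, the `http://example.com/dtds/' base, and
     `resolve_notations' are hypothetical.

```python
import xml.parsers.expat
from urllib.parse import urljoin

# Hypothetical document; the notation's system identifier is relative.
DOC = """<!DOCTYPE doc [
<!NOTATION gif SYSTEM "viewers/gif-viewer">
]>
<doc/>"""


def resolve_notations(document, base):
    """Return (base before SetBase, base after, resolved notation URIs)."""
    resolved = {}
    parser = xml.parsers.expat.ParserCreate()
    before = parser.GetBase()  # None: SetBase() has not been called yet
    parser.SetBase(base)

    def notation_decl(name, decl_base, system_id, public_id):
        # Expat only passes the base along; the application resolves it.
        resolved[name] = urljoin(decl_base, system_id)

    parser.NotationDeclHandler = notation_decl
    parser.Parse(document, True)
    return before, parser.GetBase(), resolved


print(resolve_notations(DOC, "http://example.com/dtds/"))
```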

`GetInputContext()'
     Returns the input data that generated the current event as a
     string.  The data is in the encoding of the entity which contains
     the text.  When called while an event handler is not active, the
     return value is `None'.  _Added in Python version 2.1_
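
     A sketch assuming current Python 3, where `GetInputContext()'
     returns bytes.  In current CPython the returned data starts at the
     current event but may extend past it to the end of Expat's internal
     buffer, so a caller should rely only on the prefix.
     `start_tag_contexts' is a hypothetical helper.

```python
import xml.parsers.expat


def start_tag_contexts(document):
    """Collect GetInputContext() for each start tag, plus one call made
    after parsing, when no handler is active."""
    contexts = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # Raw input beginning at the start tag being reported.
        contexts.append(parser.GetInputContext())

    parser.StartElementHandler = start
    parser.Parse(document, True)
    return contexts, parser.GetInputContext()


contexts, outside = start_tag_contexts('<a><b x="1"/></a>')
print(outside)  # None
```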

`ExternalEntityParserCreate(context[, encoding])'
     Create a "child" parser which can be used to parse an external
     parsed entity referred to by content parsed by the parent parser.
     The CONTEXT parameter should be the string passed to the
     `ExternalEntityRefHandler()' handler function, described below.
     The child parser is created with the `ordered_attributes',
     `returns_unicode' and `specified_attributes' set to the values of
     this parser.
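
     A sketch, assuming current Python 3, that creates a child parser
     inside `ExternalEntityRefHandler' and inspects the settings it
     inherits.  The entity `part.xml', its stand-in text `<inner/>', and
     `child_settings' are hypothetical; a real handler would read the
     entity named by SYSTEMID.

```python
import xml.parsers.expat

# Hypothetical document referencing one external parsed entity.
DOC = """<!DOCTYPE doc [<!ENTITY part SYSTEM "part.xml">]>
<doc>&part;</doc>"""


def child_settings(document):
    """Report the attribute settings a child parser starts with."""
    seen = {}

    def external_ref(context, base, system_id, public_id):
        child = parser.ExternalEntityParserCreate(context)
        seen["ordered_attributes"] = child.ordered_attributes
        seen["specified_attributes"] = child.specified_attributes
        child.Parse("<inner/>", True)  # stand-in entity text (assumption)
        return 1  # non-zero: continue parsing the parent document

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.specified_attributes = True
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(document, True)
    return seen


print(child_settings(DOC))
```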

`xmlparser' objects have the following attributes:

`ordered_attributes'
     Setting this attribute to a non-zero integer causes the attributes
     to be reported as a list rather than a dictionary.  The attributes
     are presented in the order found in the document text.  For each
     attribute, two list entries are presented: the attribute name and
     the attribute value.  (Older versions of this module also used this
     format.)  By default, this attribute is false; it may be changed at
     any time.  _Added in Python version 2.1_
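
     Comparing both modes on the same start tag, in a sketch assuming
     Python 3.7 or later (where a plain dictionary also keeps insertion
     order); `first_start_attributes' is a hypothetical helper.

```python
import xml.parsers.expat


def first_start_attributes(document, ordered):
    """Return the ATTRIBUTES argument of the first start tag."""
    captured = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = ordered
    parser.StartElementHandler = lambda name, attrs: captured.append(attrs)
    parser.Parse(document, True)
    return captured[0]


doc = '<img src="a.png" alt="A" width="10"/>'
print(first_start_attributes(doc, False))
# {'src': 'a.png', 'alt': 'A', 'width': '10'}
print(first_start_attributes(doc, True))
# ['src', 'a.png', 'alt', 'A', 'width', '10']
```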

`returns_unicode'
     If this attribute is set to a non-zero integer, the handler
     functions will be passed Unicode strings.  If `returns_unicode' is
     0, 8-bit strings containing UTF-8 encoded data will be passed to
     the handlers.  _Changed in Python version 1.6_

`specified_attributes'
     If set to a non-zero integer, the parser will report only those
     attributes which were specified in the document instance and not
     those which were derived from attribute declarations.
     Applications which set this need to be especially careful to use
     what additional information is available from the declarations as
     needed to comply with the standards for the behavior of XML
     processors.  By default, this attribute is false; it may be
     changed at any time.  _Added in Python version 2.1_
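
     A sketch, assuming current Python 3, in which the internal subset
     declares a default for `align'.  With `specified_attributes' false
     the defaulted attribute is reported alongside `id'; with it true
     only `id' is.  `attributes_of_first' and the document are
     hypothetical.

```python
import xml.parsers.expat

# Hypothetical document: "align" is never written, only defaulted.
DOC = """<!DOCTYPE p [
<!ATTLIST p align CDATA "left">
]>
<p id="x"/>"""


def attributes_of_first(document, specified_only):
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.specified_attributes = specified_only
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(document, True)
    return seen[0]


print(attributes_of_first(DOC, False))  # {'id': 'x', 'align': 'left'}
print(attributes_of_first(DOC, True))   # {'id': 'x'}
```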

The following attributes contain values relating to the most recent
error encountered by an `xmlparser' object, and will only have correct
values once a call to `Parse()' or `ParseFile()' has raised a
`xml.parsers.expat.ExpatError' exception.

`ErrorByteIndex'
     Byte index at which an error occurred.

`ErrorCode'
     Numeric code specifying the problem.  This value can be passed to
     the `ErrorString()' function, or compared to one of the constants
     defined in the `errors' object.

`ErrorColumnNumber'
     Column number at which an error occurred.

`ErrorLineNumber'
     Line number at which an error occurred.
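
     A sketch of reading these attributes after catching the exception,
     assuming current Python 3, where `ErrorString()' returns the same
     message strings that the `errors' constants hold.  `first_error' is
     a hypothetical helper.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString, errors


def first_error(document):
    """Return (line, column, message) for the first error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except ExpatError:
        return (parser.ErrorLineNumber, parser.ErrorColumnNumber,
                ErrorString(parser.ErrorCode))
    return None


line, column, message = first_error("<a>\n  <b></a>")
print(line, message)  # 2 mismatched tag
print(message == errors.XML_ERROR_TAG_MISMATCH)  # True
```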

Here is the list of handlers that can be set.  To set a handler on an
`xmlparser' object O, use `O.HANDLERNAME = FUNC'.  HANDLERNAME must be
taken from the following list, and FUNC must be a callable object
accepting the correct number of arguments.  The arguments are all
strings, unless otherwise stated.

`XmlDeclHandler(version, encoding, standalone)'
     Called when the XML declaration is parsed.  The XML declaration is
     the (optional) declaration of the applicable version of the XML
     recommendation, the encoding of the document text, and an optional
     "standalone" declaration.  VERSION and ENCODING will be strings of
     the type dictated by the `returns_unicode' attribute, and
     STANDALONE will be `1' if the document is declared standalone, `0'
     if it is declared not to be standalone, or `-1' if the standalone
     clause was omitted.  This is only available with Expat version
     1.95.0 or newer.  _Added in Python version 2.1_
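
     A sketch assuming current Python 3.  The input is passed as bytes
     because current Python treats `str' input as UTF-8 regardless of
     the declaration; `xml_decl' is a hypothetical helper that returns
     `None' when the document has no XML declaration.

```python
import xml.parsers.expat


def xml_decl(document):
    """Return (version, encoding, standalone) or None."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_decl(version, encoding, standalone):
        found.append((version, encoding, standalone))

    parser.XmlDeclHandler = on_decl
    parser.Parse(document, True)
    return found[0] if found else None


print(xml_decl(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><r/>'))
# ('1.0', 'UTF-8', 1)
print(xml_decl(b'<?xml version="1.0"?><r/>'))  # ('1.0', None, -1)
```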

`StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset)'
     Called when Expat begins parsing the document type declaration
     (`<!DOCTYPE ...').  The DOCTYPENAME is provided exactly as
     presented.  The SYSTEMID and PUBLICID parameters give the system
     and public identifiers if specified, or `None' if omitted.
     HAS_INTERNAL_SUBSET will be true if the document contains an
     internal document declaration subset.  This requires Expat version
     1.2 or newer.

`EndDoctypeDeclHandler()'
     Called when Expat is done parsing the document type declaration.
     This requires Expat version 1.2 or newer.
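
     A sketch assuming current Python 3 that records both DOCTYPE
     events.  The document and its `note.dtd' system identifier are
     hypothetical; Expat does not fetch the external subset here, it
     only reports the identifier.

```python
import xml.parsers.expat

# Hypothetical document with both an external and an internal subset.
DOC = """<!DOCTYPE note SYSTEM "note.dtd" [
<!ELEMENT note (#PCDATA)>
]>
<note>hi</note>"""


def doctype_events(document):
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        events.append(("start", name, system_id, public_id,
                       bool(has_internal_subset)))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = lambda: events.append(("end",))
    parser.Parse(document, True)
    return events


print(doctype_events(DOC))
# [('start', 'note', 'note.dtd', None, True), ('end',)]
```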

`ElementDeclHandler(name, model)'
     Called once for each element type declaration.  NAME is the name
     of the element type, and MODEL is a representation of the content
     model.

`AttlistDeclHandler(elname, attname, type, default, required)'
     Called for each declared attribute for an element type.  If an
     attribute list declaration declares three attributes, this handler
     is called three times, once for each attribute.  ELNAME is the name
     of the element to which the declaration applies and ATTNAME is the
     name of the attribute declared.  The attribute type is a string
     passed as TYPE; the possible values are `'CDATA'', `'ID'',
     `'IDREF'', ...  DEFAULT gives the default value for the attribute
     used when the attribute is not specified by the document instance,
     or `None' if there is no default value (`#IMPLIED' values).  If
     the attribute is required to be given in the document instance,
     REQUIRED will be true.  This requires Expat version 1.95.0 or
     newer.
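
     A sketch assuming current Python 3 that collects declarations from
     an internal subset.  Only element names are kept from
     `ElementDeclHandler', because current Python passes MODEL as a
     nested tuple whose layout is not described here; `declarations' and
     the document are hypothetical.

```python
import xml.parsers.expat

# Hypothetical DTD fragment in the internal subset.
DOC = """<!DOCTYPE a [
<!ELEMENT a (b*)>
<!ELEMENT b (#PCDATA)>
<!ATTLIST a id ID #REQUIRED kind CDATA "plain">
]>
<a id="a1"/>"""


def declarations(document):
    elements, attributes = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, model: elements.append(name)

    def attlist(elname, attname, attr_type, default, required):
        # One call per declared attribute, even within a single ATTLIST.
        attributes.append((elname, attname, attr_type, default,
                           bool(required)))

    parser.AttlistDeclHandler = attlist
    parser.Parse(document, True)
    return elements, attributes


elements, attributes = declarations(DOC)
print(elements)    # ['a', 'b']
print(attributes)
# [('a', 'id', 'ID', None, True), ('a', 'kind', 'CDATA', 'plain', False)]
```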

`StartElementHandler(name, attributes)'
     Called for the start of every element.  NAME is a string
     containing the element name, and ATTRIBUTES is a dictionary
     mapping attribute names to their values.

`EndElementHandler(name)'
     Called for the end of every element.
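
     A sketch assuming current Python 3: the start and end handlers
     maintain a stack of open elements to record each element's path.
     `element_paths' is a hypothetical helper.

```python
import xml.parsers.expat


def element_paths(document):
    """List the slash-separated path of every element, in document order."""
    stack, paths = [], []

    def start(name, attrs):
        stack.append(name)
        paths.append("/".join(stack))

    def end(name):
        stack.pop()  # NAME is the element being closed

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(document, True)
    return paths


print(element_paths("<a><b><c/></b><d/></a>"))
# ['a', 'a/b', 'a/b/c', 'a/d']
```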

`ProcessingInstructionHandler(target, data)'
     Called for every processing instruction.
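
     A sketch assuming current Python 3; `processing_instructions' is a
     hypothetical helper.  DATA is the text after the target, without
     the separating whitespace or the closing `?>'.

```python
import xml.parsers.expat


def processing_instructions(document):
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.append((target, data)))
    parser.Parse(document, True)
    return found


doc = '<?xml-stylesheet href="style.css" type="text/css"?><r><?audit id=7?></r>'
print(processing_instructions(doc))
# [('xml-stylesheet', 'href="style.css" type="text/css"'), ('audit', 'id=7')]
```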

`CharacterDataHandler(data)'
     Called for character data.  This will be called for normal
     character data, CDATA marked content, and ignorable whitespace.
     Applications which must distinguish these cases can use the
     `StartCdataSectionHandler', `EndCdataSectionHandler', and
     `ElementDeclHandler' callbacks to collect the required information.
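
     Because one run of text can arrive in several calls (an entity
     reference or a CDATA section typically starts a new piece),
     handlers usually accumulate the data and join it afterwards.  A
     sketch assuming current Python 3; `text_content' is a hypothetical
     helper.

```python
import xml.parsers.expat


def text_content(document):
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = pieces.append  # may be called many times
    parser.Parse(document, True)
    return "".join(pieces)


print(text_content("<p>a &amp; b<![CDATA[<c>]]></p>"))  # a & b<c>
```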

`UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName)'
     Called for unparsed (NDATA) entity declarations.  This is only
     present for version 1.2 of the Expat library; for more recent
     versions, use `EntityDeclHandler' instead.  (The underlying
     function in the Expat library has been declared obsolete.)

`EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName)'
     Called for all entity declarations.  For parameter and internal
     entities, VALUE will be a string giving the declared contents of
     the entity; this will be `None' for external entities.  The
     NOTATIONNAME parameter will be `None' for parsed entities, and the
     name of the notation for unparsed entities.  IS_PARAMETER_ENTITY
     will be true if the entity is a parameter entity or false for
     general entities (most applications only need to be concerned with
     general entities).  This is only available starting with version
     1.95.0 of the Expat library.  _Added in Python version 2.1_
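
     A sketch assuming current Python 3 that lists an internal, an
     unparsed, and a parameter entity declared in the internal subset.
     The declarations and `entity_decls' are hypothetical; only four of
     the seven arguments are kept.

```python
import xml.parsers.expat

# Hypothetical DTD with internal, unparsed, and parameter entities.
DOC = """<!DOCTYPE d [
<!NOTATION gif SYSTEM "gif-viewer">
<!ENTITY greeting "hello">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
<!ENTITY % common "shared">
]>
<d/>"""


def entity_decls(document):
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def entity_decl(name, is_parameter_entity, value, base, system_id,
                    public_id, notation_name):
        found.append((name, bool(is_parameter_entity), value, notation_name))

    parser.EntityDeclHandler = entity_decl
    parser.Parse(document, True)
    return found


for decl in entity_decls(DOC):
    print(decl)
```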

`NotationDeclHandler(notationName, base, systemId, publicId)'
     Called for notation declarations.  NOTATIONNAME, BASE,
     SYSTEMID, and PUBLICID are strings if given.  If the public
     identifier is omitted, PUBLICID will be `None'.

`StartNamespaceDeclHandler(prefix, uri)'
     Called when an element contains a namespace declaration.  Namespace
     declarations are processed before the `StartElementHandler' is
     called for the element on which declarations are placed.

`EndNamespaceDeclHandler(prefix)'
     Called when the closing tag is reached for an element that
     contained a namespace declaration.  This is called once for each
     namespace declaration on the element in the reverse of the order
     in which the `StartNamespaceDeclHandler' was called to indicate
     the start of each namespace declaration's scope.  Calls to this
     handler are made after the corresponding `EndElementHandler' for
     the end of the element.
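
     In current Python 3 these handlers fire only when the parser is
     created with a namespace separator, as in this sketch (a single
     space is an arbitrary choice).  Element names then arrive as
     "URI localname"; `namespace_events' is a hypothetical helper.

```python
import xml.parsers.expat


def namespace_events(document):
    events = []
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartNamespaceDeclHandler = (
        lambda prefix, uri: events.append(("start-ns", prefix, uri)))
    parser.EndNamespaceDeclHandler = (
        lambda prefix: events.append(("end-ns", prefix)))
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name)))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(document, True)
    return events


# The default namespace is reported with prefix None.
for event in namespace_events('<r xmlns="urn:a" xmlns:x="urn:x"><x:c/></r>'):
    print(event)
```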

`CommentHandler(data)'
     Called for comments.  DATA is the text of the comment, excluding
     the leading `<!--' and trailing `-->'.
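
     A sketch assuming current Python 3; `comments' is a hypothetical
     helper.  Whitespace inside the delimiters is kept.

```python
import xml.parsers.expat


def comments(document):
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = found.append
    parser.Parse(document, True)
    return found


print(comments("<r><!-- first --><x/><!--second--></r>"))
# [' first ', 'second']
```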

`StartCdataSectionHandler()'
     Called at the start of a CDATA section.  This and
     `EndCdataSectionHandler' are needed to be able to identify the
     syntactical start and end for CDATA sections.

`EndCdataSectionHandler()'
     Called at the end of a CDATA section.
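
     A sketch assuming current Python 3 that uses the two CDATA
     handlers to tag character data as plain text or CDATA content.
     `classify_text' is a hypothetical helper; adjacent pieces of the
     same kind are merged because Expat may split a run of text.

```python
import xml.parsers.expat


def classify_text(document):
    """Split character data into ('text', ...) and ('cdata', ...) runs."""
    segments = []
    in_cdata = False

    def start_cdata():
        nonlocal in_cdata
        in_cdata = True

    def end_cdata():
        nonlocal in_cdata
        in_cdata = False

    def characters(data):
        kind = "cdata" if in_cdata else "text"
        if segments and segments[-1][0] == kind:
            segments[-1] = (kind, segments[-1][1] + data)  # merge pieces
        else:
            segments.append((kind, data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = characters
    parser.Parse(document, True)
    return segments


print(classify_text("<p>plain<![CDATA[<raw>]]>more</p>"))
# [('text', 'plain'), ('cdata', '<raw>'), ('text', 'more')]
```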

`DefaultHandler(data)'
     Called for any characters in the XML document for which no
     applicable handler has been specified.  This means characters that
     are part of a construct which could be reported, but for which no
     handler has been supplied.

`DefaultHandlerExpand(data)'
     This is the same as the `DefaultHandler', but doesn't inhibit
     expansion of internal entities.  The entity reference will not be
     passed to the default handler.
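
     A sketch assuming current Python 3 that contrasts the two handlers
     on one internal entity reference.  With `DefaultHandler' the
     reference `&who;' is passed through unexpanded; with
     `DefaultHandlerExpand' it is expanded and reaches
     `CharacterDataHandler'.  The document and `collect' are
     hypothetical.

```python
import xml.parsers.expat

# Hypothetical document with one internal general entity.
DOC = '<!DOCTYPE d [<!ENTITY who "world">]><d>hello &who;</d>'


def collect(document, expand):
    """Return (character data, pieces sent to the default handler)."""
    text, defaults = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append
    if expand:
        parser.DefaultHandlerExpand = defaults.append
    else:
        parser.DefaultHandler = defaults.append
    parser.Parse(document, True)
    return "".join(text), defaults


text, defaults = collect(DOC, expand=False)
print(repr(text), "&who;" in defaults)  # 'hello ' True
text, defaults = collect(DOC, expand=True)
print(repr(text), "&who;" in defaults)  # 'hello world' False
```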

`NotStandaloneHandler()'
     Called if the XML document hasn't been declared as being a
     standalone document.  This happens when there is an external
     subset or a reference to a parameter entity, but the XML
     declaration does not set standalone to `yes'.  If this handler
     returns `0', then the parser will
     throw an `XML_ERROR_NOT_STANDALONE' error.  If this handler is not
     set, no exception is raised by the parser for this condition.
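
     A sketch assuming current Python 3, where an external subset in
     the DOCTYPE is enough to trigger this handler (parameter entity
     parsing is off by default).  `parse_with_policy' and `d.dtd' are
     hypothetical.  Returning `0' turns the condition into an
     `ExpatError'.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString, errors

# Hypothetical document: external subset, no standalone="yes".
DOC = '<?xml version="1.0"?><!DOCTYPE d SYSTEM "d.dtd"><d/>'


def parse_with_policy(document, allow):
    """Return (handler was called, error message or None)."""
    called = False
    parser = xml.parsers.expat.ParserCreate()

    def not_standalone():
        nonlocal called
        called = True
        return 1 if allow else 0  # 0 makes Expat report an error

    parser.NotStandaloneHandler = not_standalone
    try:
        parser.Parse(document, True)
    except ExpatError:
        return called, ErrorString(parser.ErrorCode)
    return called, None


print(parse_with_policy(DOC, allow=True))   # (True, None)
print(parse_with_policy(DOC, allow=False))
```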

`ExternalEntityRefHandler(context, base, systemId, publicId)'
     Called for references to external entities.  BASE is the current
     base, as set by a previous call to `SetBase()'.  The system and
     public identifiers, SYSTEMID and PUBLICID, are strings if given;
     if the public identifier is not given, PUBLICID will be `None'.
     The CONTEXT value is opaque and should only be used as described
     below.

     For external entities to be parsed, this handler must be
     implemented.  It is responsible for creating the sub-parser using
     `ExternalEntityParserCreate(CONTEXT)', initializing it with the
     appropriate callbacks, and parsing the entity.  This handler
     should return an integer; if it returns `0', the parser will throw
     an `XML_ERROR_EXTERNAL_ENTITY_HANDLING' error, otherwise parsing
     will continue.

     If this handler is not provided, external entities are reported by
     the `DefaultHandler' callback, if provided.
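
     A sketch assuming current Python 3 in which the handler resolves
     system identifiers from an in-memory dictionary instead of files or
     URLs.  `ENTITIES', `collect_elements', and `chapter1.xml' are
     hypothetical.  Returning `1' lets the parent parser continue.

```python
import xml.parsers.expat

# Hypothetical in-memory store standing in for file or URL lookup.
ENTITIES = {"chapter1.xml": "<chapter>One</chapter>"}

DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY ch1 SYSTEM "chapter1.xml">
]>
<book>&ch1;</book>"""


def collect_elements(document):
    names = []

    def start(name, attrs):
        names.append(name)

    def external_ref(context, base, system_id, public_id):
        # The child parser must be created from this call's CONTEXT.
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = start
        child.Parse(ENTITIES[system_id], True)
        return 1  # non-zero: keep parsing the parent document

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(document, True)
    return names


print(collect_elements(DOC))  # ['book', 'chapter']
```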

