Copyright (C) 2000-2012 |
Whole document tree
parserInternalsSynopsisDetailsxmlCreateDocParserCtxt ()
Create a parser context for an XML in-memory document.
xmlCreateFileParserCtxt ()
Create a parser context for a file content. Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time.
xmlCreateMemoryParserCtxt ()
Create a parser context for an XML in-memory document.
xmlFreeParserCtxt ()
Free all the memory used by a parser context. However the parsed document in ctxt->myDoc is not freed.
xmlNewParserCtxt ()
Allocate and initialize a new parser context.
xmlSwitchEncoding ()
change the input functions when discovering the character encoding of a given entity.
xmlHandleEntity ()
Default handling of defined entities, when should we define a new input stream ? When do we just handle that as a set of chars ? OBSOLETE: to be removed at some point.
xmlNewEntityInputStream ()
Create a new input stream based on an xmlEntityPtr
xmlPushInput ()
xmlPushInput: switch to a new input stream which is stacked on top of the previous one(s).
xmlPopInput ()
xmlPopInput: the current input pointed by ctxt->input came to an end pop it and return the next char.
xmlFreeInputStream ()
Free up an input stream.
xmlNewInputFromFile ()
Create a new input stream based on a file.
xmlSplitQName ()
parse an XML qualified name string [NS 5] QName ::= (Prefix ':')? LocalPart [NS 6] Prefix ::= NCName [NS 7] LocalPart ::= NCName
xmlNamespaceParseNCName ()
parse an XML namespace name. [NS 3] NCName ::= (Letter | '_') (NCNameChar)* [NS 4] NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender
xmlNamespaceParseQName ()
parse an XML qualified name [NS 5] QName ::= (Prefix ':')? LocalPart [NS 6] Prefix ::= NCName [NS 7] LocalPart ::= NCName
xmlNamespaceParseNSDef ()
parse a namespace prefix declaration [NS 1] NSDef ::= PrefixDef Eq SystemLiteral [NS 2] PrefixDef ::= 'xmlns' (':' NCName)?
xmlParseQuotedString ()
[OLD] Parse and return a string between quotes or doublequotes To be removed at next drop of binary compatibility
xmlParseNamespace ()
[OLD] xmlParseNamespace: parse specific PI '<?namespace ...' constructs. This is what the older xml-name Working Draft specified, a bunch of other stuff may still rely on it, so support is still here as if it was declared on the root of the Tree:-( To be removed at next drop of binary compatibility
xmlScanName ()
Trickery: parse an XML name but without consuming the input flow Needed for rollback cases. [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender [5] Name ::= (Letter | '_' | ':') (NameChar)* [6] Names ::= Name (S Name)*
xmlParseName ()
parse an XML name. [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender [5] Name ::= (Letter | '_' | ':') (NameChar)* [6] Names ::= Name (S Name)*
xmlParseNmtoken ()
parse an XML Nmtoken. [7] Nmtoken ::= (NameChar)+ [8] Nmtokens ::= Nmtoken (S Nmtoken)*
xmlParseEntityValue ()
parse a value for ENTITY declarations [9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
xmlParseAttValue ()
parse a value for an attribute Note: the parser won't do substitution of entities here, this will be handled later in xmlStringGetNodeList [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" 3.3.3 Attribute-Value Normalization:
Before the value of an attribute is passed to the application or
checked for validity, the XML processor must normalize it as follows:
- a character reference is processed by appending the referenced
character to the attribute value
- an entity reference is processed by recursively processing the
replacement text of the entity
- a whitespace character (
xmlParseSystemLiteral ()
parse an XML Literal [11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
xmlParsePubidLiteral ()
parse an XML public literal [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
xmlParseCharData ()
parse a CharData section. if we are within a CDATA section ']]>' marks an end of section. The right angle bracket (>) may be represented using the string ">", and must, for compatibility, be escaped using ">" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section. [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
xmlParseExternalID ()
Parse an External ID or a Public ID NOTE: Productions [75] and [83] interract badly since [75] can generate 'PUBLIC' S PubidLiteral S SystemLiteral [75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral [83] PublicID ::= 'PUBLIC' S PubidLiteral
xmlParseComment ()
Skip an XML (SGML) comment <!-- .... --> The spec says that "For compatibility, the string "--" (double-hyphen) must not occur within comments. " [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
xmlParsePITarget ()
parse the name of a PI [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
xmlParsePI ()
parse an XML Processing Instruction. [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' The processing is transfered to SAX once parsed.
xmlParseNotationDecl ()
parse a notation declaration [82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>' Hence there is actually 3 choices: 'PUBLIC' S PubidLiteral 'PUBLIC' S PubidLiteral S SystemLiteral and 'SYSTEM' S SystemLiteral See the NOTE on xmlParseExternalID().
xmlParseEntityDecl ()
parse <!ENTITY declarations [70] EntityDecl ::= GEDecl | PEDecl [71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>' [72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>' [73] EntityDef ::= EntityValue | (ExternalID NDataDecl?) [74] PEDef ::= EntityValue | ExternalID [76] NDataDecl ::= S 'NDATA' S Name [ VC: Notation Declared ] The Name must match the declared name of a notation.
xmlParseDefaultDecl ()
Parse an attribute default declaration [60] DefaultDecl ::= ' [ VC: Required Attribute ]
if the default declaration is the keyword [ VC: Attribute Default Legal ] The declared default value must meet the lexical constraints of the declared attribute type c.f. xmlValidateAttributeDecl() [ VC: Fixed Attribute Default ]
if an attribute has a default value declared with the [ WFC: No < in Attribute Values ] handled in xmlParseAttValue()
xmlParseNotationType ()
parse an Notation attribute type. Note: the leading 'NOTATION' S part has already being parsed... [58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' [ VC: Notation Attributes ] Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.
xmlParseEnumerationType ()
parse an Enumeration attribute type. [59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [ VC: Enumeration ] Values of this type must match one of the Nmtoken tokens in the declaration
xmlParseEnumeratedType ()
parse an Enumerated attribute type. [57] EnumeratedType ::= NotationType | Enumeration [58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
xmlParseAttributeType ()
parse the Attribute list def for an element [54] AttType ::= StringType | TokenizedType | EnumeratedType [55] StringType ::= 'CDATA' [56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS' Validity constraints for attribute values syntax are checked in xmlValidateAttributeValue() [ VC: ID ] Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them. [ VC: One ID per Element Type ] No element type may have more than one ID attribute specified. [ VC: ID Attribute Default ]
An ID attribute must have a declared default of [ VC: IDREF ] Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each IDREF Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute. [ VC: Entity Name ] Values of type ENTITY must match the Name production, values of type ENTITIES must match Names; each Entity Name must match the name of an unparsed entity declared in the DTD. [ VC: Name Token ] Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.
xmlParseAttributeListDecl ()
: parse the Attribute list def for an element [52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>' [53] AttDef ::= S Name S AttType S DefaultDecl
xmlParseElementMixedContentDecl ()
parse the declaration for a Mixed Element content The leading '(' and spaces have been skipped in xmlParseElementContentDecl [51] Mixed ::= '(' S? ' [ VC: Proper Group/PE Nesting ] applies to [51] too (see [49]) [ VC: No Duplicate Types ] The same name must not appear more than once in a single mixed-content declaration.
xmlParseElementChildrenContentDecl ()
parse the declaration for a Mixed Element content The leading '(' and spaces have been skipped in xmlParseElementContentDecl [47] children ::= (choice | seq) ('?' | '*' | '+')? [48] cp ::= (Name | choice | seq) ('?' | '*' | '+')? [49] choice ::= '(' S? cp ( S? '|' S? cp )* S? ')' [50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')' [ VC: Proper Group/PE Nesting ] applies to [49] and [50] TODO Parameter-entity replacement text must be properly nested with parenthetized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text. For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).
xmlParseElementContentDecl ()
parse the declaration for an Element content either Mixed or Children, the cases EMPTY and ANY are handled directly in xmlParseElementDecl [46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
xmlParseElementDecl ()
parse an Element declaration. [45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [ VC: Unique Element Type Declaration ] No element type may be declared more than once
xmlParseMarkupDecl ()
parse Markup declarations [29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment [ VC: Proper Declaration/PE Nesting ] Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text. [ WFC: PEs in Internal Subset ] In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)
xmlParseCharRef ()
parse Reference declarations [66] CharRef ::= '&#' [0-9]+ ';' |
'& [ WFC: Legal Character ] Characters referred to using character references must match the production for Char.
xmlParseEntityRef ()
parse ENTITY references declarations [68] EntityRef ::= '&' Name ';' [ WFC: Entity Declared ] In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration. Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'. [ WFC: Parsed Entity ] An entity reference must not contain the name of an unparsed entity
xmlParseReference ()
parse and handle entity references in content, depending on the SAX
interface, this may end-up in a call to [67] Reference ::= EntityRef | CharRef
xmlParsePEReference ()
parse PEReference declarations The entity content is handled directly by pushing it's content as a new input stream. [69] PEReference ::= '%' Name ';' [ WFC: No Recursion ] A parsed entity must not contain a recursive reference to itself, either directly or indirectly. [ WFC: Entity Declared ] In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", ... ... The declaration of a parameter entity must precede any reference to it... [ VC: Entity Declared ] In a document with an external subset or external parameter entities with "standalone='no'", ... ... The declaration of a parameter entity must precede any reference to it... [ WFC: In DTD ] Parameter-entity references may only appear in the DTD. NOTE: misleading but this is handled.
xmlParseDocTypeDecl ()
parse a DOCTYPE declaration [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>' [ VC: Root Element Type ] The Name in the document type declaration must match the element type of the root element.
xmlParseAttribute ()
parse an attribute [41] Attribute ::= Name Eq AttValue [ WFC: No External Entity References ] Attribute values cannot contain direct or indirect entity references to external entities. [ WFC: No < in Attribute Values ] The replacement text of any entity referred to directly or indirectly in an attribute value (other than "<") must not contain a <. [ VC: Attribute Value Type ] The attribute must have been declared; the value must be of the type declared for it. [25] Eq ::= S? '=' S? With namespace: [NS 11] Attribute ::= QName Eq AttValue Also the case QName == xmlns:??? is handled independently as a namespace definition.
xmlParseStartTag ()
parse a start of tag either for rule element or EmptyElement. In both case we don't parse the tag closing chars. [40] STag ::= '<' Name (S Attribute)* S? '>' [ WFC: Unique Att Spec ] No attribute name may appear more than once in the same start-tag or empty-element tag. [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [ WFC: Unique Att Spec ] No attribute name may appear more than once in the same start-tag or empty-element tag. With namespace: [NS 8] STag ::= '<' QName (S Attribute)* S? '>' [NS 10] EmptyElement ::= '<' QName (S Attribute)* S? '/>'
xmlParseEndTag ()
parse an end of tag [42] ETag ::= '</' Name S? '>' With namespace [NS 9] ETag ::= '</' QName S? '>'
xmlParseCDSect ()
Parse escaped pure raw content. [18] CDSect ::= CDStart CData CDEnd [19] CDStart ::= '<![CDATA[' [20] Data ::= (Char* - (Char* ']]>' Char*)) [21] CDEnd ::= ']]>'
xmlParseContent ()
Parse a content: [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
xmlParseElement ()
parse an XML element, this is highly recursive [39] element ::= EmptyElemTag | STag content ETag [ WFC: Element Type Match ] The Name in an element's end-tag must match the element type in the start-tag. [ VC: Element Valid ] An element is valid if there is a declaration matching elementdecl where the Name matches the element type and one of the following holds: - The declaration matches EMPTY and the element has no content. - The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between each pair of child elements. - The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model. - The declaration matches ANY, and the types of any child elements have been declared.
xmlParseVersionNum ()
parse the XML version value. [26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
xmlParseVersionInfo ()
parse the XML version. [24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ") [25] Eq ::= S? '=' S?
xmlParseEncName ()
parse the XML encoding name [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
xmlParseEncodingDecl ()
parse the XML encoding declaration [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'") this setups the conversion filters.
xmlParseSDDecl ()
parse the XML standalone declaration [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no')'"')) [ VC: Standalone Document Declaration ] TODO The standalone document declaration must have the value "no" if any external markup declarations contain declarations of: - attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or - entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or - attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or - element types with element content, if white space occurs directly within any instance of those types.
xmlParseXMLDecl ()
parse an XML declaration header [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
xmlParseMisc ()
parse an XML Misc* optionnal field. [27] Misc ::= Comment | PI | S
xmlParseExternalSubset ()
parse Markup declarations from an external subset [30] extSubset ::= textDecl? extSubsetDecl [31] extSubsetDecl ::= (markupdecl | conditionalSect | PEReference | S) *
xmlDecodeEntities ()
This function is deprecated, we now always process entities content through xmlStringDecodeEntities TODO: remove it in next major release. [67] Reference ::= EntityRef | CharRef [69] PEReference ::= '%' Name ';'
|