SP - XML support

XML support

Using SP to parse XML

To enable SP's support for XML 1.0:

  • Set the SP_CHARSET_FIXED environment variable to YES.
  • Set the SP_ENCODING environment variable to XML.
  • Set the SGML_CATALOG_FILES environment variable to point to the file pubtext/xml.soc.
  • Use the -wxml option.
  • If the document is not supposed to be valid, use the -wno-valid option.
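
The settings listed above could be assembled programmatically as in the following minimal sketch. The helper name sp_xml_invocation and the file name doc.xml are hypothetical, the program name nsgmls assumes SP's command-line parser is on the search path, and the catalog path must be adjusted to wherever pubtext/xml.soc lives in your SP installation. The helper only builds the environment and argument list; it does not run anything itself.

```python
import os


def sp_xml_invocation(document, catalog="pubtext/xml.soc", valid=True,
                      program="nsgmls"):
    """Build the environment and argument list for parsing XML with SP.

    Hypothetical helper: `program` and `catalog` are assumptions about
    the local installation and should be adjusted as needed.
    """
    env = dict(os.environ)                 # start from the current environment
    env["SP_CHARSET_FIXED"] = "YES"
    env["SP_ENCODING"] = "XML"
    env["SGML_CATALOG_FILES"] = catalog    # must point at pubtext/xml.soc

    argv = [program, "-wxml"]
    if not valid:
        # The document is not supposed to be valid.
        argv.append("-wno-valid")
    argv.append(document)
    return env, argv


env, argv = sp_xml_invocation("doc.xml", valid=False)
print(" ".join(argv))  # prints: nsgmls -wxml -wno-valid doc.xml
```

To actually run the parser, the result can be handed to subprocess.run(argv, env=env) from the Python standard library.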

Limitations

SP does not enforce the following XML constraints:

  • XML constrains processing instructions with a target matching [Xx][Mm][Ll], both in terms of where they can occur and their content.
  • XML does not allow a parameter separator that is adjacent to a delimiter to be omitted.
  • XML has constraints on the use of & in parameter literals. In SGML terms, XML says that the ero delimiter is recognized in a parameter literal, and that it must be followed by an entity reference, but the entity reference is not expanded.

Line ends are normalized using SGML conventions to a CR/LF character pair rather than using the XML convention of a single LF character.
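
An application that needs the XML convention could convert the line ends itself. The following is a minimal sketch, assuming the application receives the parsed character data as a Python string containing CR/LF pairs; the function name to_xml_line_ends is hypothetical and not part of SP.

```python
def to_xml_line_ends(text):
    """Replace each CR/LF pair with the single LF that XML prescribes."""
    return text.replace("\r\n", "\n")


sample = "first line\r\nsecond line\r\n"
print(repr(to_xml_line_ends(sample)))  # prints: 'first line\nsecond line\n'
```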

There is no support for characters outside the Basic Multilingual Plane (i.e., those with scalar values greater than U+FFFF).
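
Because of this limitation, it may be useful to check a document for such characters before handing it to SP. The following is a minimal sketch, assuming the document text has already been decoded into a Python string; the function name non_bmp_characters is hypothetical.

```python
def non_bmp_characters(text):
    """Return (index, character) pairs for characters above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane.
found = non_bmp_characters("clef: \U0001D11E")
print([(i, hex(ord(ch))) for i, ch in found])  # prints: [(6, '0x1d11e')]
```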

SP does not enforce XML's rules against continuing normal processing after an error. Applications can enforce these rules themselves if they choose.
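
One way an application might enforce this is to stop passing on document content once the first error has been seen, while still passing on later error reports. The following is a minimal sketch over a hypothetical stream of (kind, payload) pairs; the event representation and the function name stop_after_first_error are assumptions made for illustration and do not correspond to any SP interface.

```python
def stop_after_first_error(events):
    """Yield events normally until an error occurs, then only errors.

    `events` is a hypothetical iterable of (kind, payload) pairs in which
    kind == "error" marks an error reported by the parser.
    """
    failed = False
    for kind, payload in events:
        if kind == "error":
            failed = True
            yield kind, payload      # errors are always reported
        elif not failed:
            yield kind, payload      # normal processing before any error
        # After an error, non-error events are silently dropped.


stream = [
    ("start", "doc"),
    ("data", "hello"),
    ("error", "element not closed"),
    ("data", "ignored"),
    ("error", "second problem"),
]
for event in stop_after_first_error(stream):
    print(event)
```

In this sketch the ("data", "ignored") event is dropped, while both errors are still reported.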

Web SGML Adaptations Annex

SP's support for SGML is based on Annex K of ISO 8879 (the Web SGML Adaptations Annex). The following features of Annex K are not yet implemented:

  • Checking of ENTITIES REF assertions
  • #IMPLIED document type name
  • Implying definitions of notations and entities (IMPLYDEF ENTITY YES and NOTATION YES)
  • SGML declarations on subdocuments
  • DATA declared value
  • URN feature

James Clark
jjc@jclark.com