Explanations of the recommendations to LSB
------------------------------------------

Introduction
------------

This document explains the choices that have been made for the
recommendations to LSB about DocBook. It is presented as a separate
document so that the draft can concentrate only on the "what?" and not
on the "why?"

There are hundreds of man-hours behind those recommendations. They
really cost blood, sweat and tears. Each line was discussed many times
and the global architecture changed quite often. We really tried to
hear what everyone had to say. So we would like to encourage LSB to be
very careful if they want to modify them.

The general philosophy was to keep the "historical" choices wherever
doing so had no consequences, and to make the "best" technical choice
wherever it mattered. We have attempted to design a very simple but
also powerful architecture, in full respect of the FHS (Filesystem
Hierarchy Standard).

Another general design principle was to think of the user, not of the
theory. There were many models that would have been much more
intellectually satisfying - but they were all too complex for everyday
use.

Definitions
-----------

Why those definitions? Because we realized we were speaking about
different things with the same words. An example is "SGML application":
it can refer either to a specific DTD or to a computer program meant to
process some SGML or XML document, both definitions being perfectly
correct. To avoid any potential confusion, we just chose the one we
needed.

Some definitions like "helper", "backend" and "frontend" are not even
necessary to read the rest of the document. We left them in because we
needed them to provide a reference implementation.

R001 - SGML Directory layout
----------------------------

Some existing projects put files in /usr/lib/sgml, others in
/usr/local/share/sgml. These files are neither libraries nor local to
one system, so we chose /usr/share/sgml.

Some projects used to put centralized catalogs in the same place as the
other catalogs. Since they can be seen as system configuration files,
it was logical to centralize them in /etc.

One very hard question was: should we separate SGML from XML? The
relationship between the two is very strong, so we chose to keep them
in the same place in the directory tree. This allows, for example, all
DocBook material, both SGML and XML, to live in the very same place,
which is obviously practical. While /usr/share/sgml does not explicitly
reflect this, we found it still better than /usr/share/markup (what
about TeX then?), /usr/share/ml or the other proposals.

Why have fixed file paths when they could be obtained from
configuration variables, autoconf mechanisms, etc.? First because it is
simpler: we wanted a very strong standard, given that the tools may
still use such configuration variables or autoconf mechanisms to adapt
to non-LSB platforms. We considered that a standard that does not
specify enough somehow encourages the most bizarre variations.

We chose a DTD-and-package-oriented architecture instead of a
file-type-oriented structure. This was probably the most controversial
issue. The "natural" proposal for SGML and XML specialists is to have
the FPIs (Formal Public Identifiers) map almost letter for letter onto
the directory names. However, this approach does not take advantage of
the catalog mechanism, which maps FPIs to file paths.

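To make the role of catalogs more concrete, here is a minimal sketch of
how PUBLIC entries in a catalog map FPIs to file paths. It is an
illustration only, not a conforming catalog parser: it ignores comments
and every keyword other than PUBLIC, and it keeps the first entry for a
given FPI as a simplifying assumption. The function name, the catalog
location and the file names are hypothetical; they are not taken from
the recommendations.

```python
import posixpath
import re

# One PUBLIC entry: the keyword, a quoted FPI, then a quoted or bare
# system identifier.
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]+)"\s+(?:"([^"]+)"|(\S+))')


def parse_public_entries(catalog_text, catalog_path):
    """Return a dict mapping each FPI to a file path.

    Relative system identifiers are resolved against the directory that
    holds the catalog file (an assumption of this sketch).
    """
    base_dir = posixpath.dirname(catalog_path)
    mapping = {}
    for match in PUBLIC_ENTRY.finditer(catalog_text):
        fpi = match.group(1)
        system_id = match.group(2) or match.group(3)
        path = posixpath.normpath(posixpath.join(base_dir, system_id))
        # Keep the first entry seen for a given FPI.
        mapping.setdefault(fpi, path)
    return mapping


# Hypothetical catalog location and contents, for illustration only.
EXAMPLE_CATALOG_PATH = "/usr/share/sgml/docbook/sgml-dtd-3.1/catalog"
EXAMPLE_CATALOG_TEXT = '''
PUBLIC "-//OASIS//DTD DocBook V3.1//EN" "docbook.dtd"
PUBLIC "-//OASIS//ENTITIES DocBook Character Entities V3.1//EN" "dbcent.mod"
'''

if __name__ == "__main__":
    entries = parse_public_entries(EXAMPLE_CATALOG_TEXT, EXAMPLE_CATALOG_PATH)
    for fpi, path in entries.items():
        print(fpi, "->", path)
```

Because the catalog performs this lookup, the directory names do not
need to mirror the FPIs: files can sit wherever packaging is simplest.
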
A file-type-oriented architecture would have led to things like:

    /usr/share/sgml/USA-DOD/DTD_Table_Model_951010/EN/
    /usr/share/sgml/OASIS/DTD_DocBook_V3.1/EN/
    /usr/share/sgml/OASIS/ELEMENTS_DocBook_Information_Pool_V3.1/EN/
    /usr/share/sgml/OASIS/ELEMENTS_DocBook_Document_Hierarchy_V3.1/EN/
    /usr/share/sgml/OASIS/ENTITIES_DocBook_Additional_General_Entities_V3.1/EN/
    /usr/share/sgml/OASIS/ENTITIES_DocBook_Notations_V3.1/EN/
    /usr/share/sgml/OASIS/ENTITIES_DocBook_Character_Entities_V3.1/EN/

or to something further from the FPIs like:

    /usr/share/sgml/sgml-dtd/hal/docbook/2.4/
    /usr/share/sgml/sgml-dtd/davenport/docbook/3.0/
    /usr/share/sgml/sgml-dtd/davenport/docbook/3.0/
    /usr/share/sgml/sgml-dtd/oasis/docbook/3.1/
    /usr/share/sgml/xml-dtd/oasis/docbook/4.0/
    /usr/share/sgml/dssl-stylesheets/nwalsh/docbook/3.1/
    /usr/share/sgml/xsl-stylesheets/nwalsh/docbook/4.0/
    /usr/share/sgml/sgml-dtd/ietf/html/2.0/
    /usr/share/sgml/sgml-dtd/w3c/html/3.2/

but in all cases, the files would have been spread according to their
file types across distant directories. We would probably have had
entities somewhere, stylesheets somewhere else, DTDs in a third place,
and SGML declarations in a fourth place. This would certainly have
broken some relative paths, and required more packaging work.

The user does not think in terms of file types, whereas SGML
specialists do. The user only thinks "I want to do some MathML" or "I
want to do some XHTML" or "I want to do some TEI". This is why the
basic unit is the DTD.

This DTD-centered approach does not mean that first-level directories
are for DTDs only. It just means that they hold everything related to a
given DTD: stylesheets, enterprise-wide customizations, etc.

R002 - DocBook Directory layout
-------------------------------

The document may seem confusing because it mixes recommendations for
SGML and XML with recommendations for DocBook. It would in some ways
have been good to separate it into two documents.

On the other hand, this allowed us to think in very practical terms.

There is only one lower level of directories. The directory names are
loosely defined as holding one "package". One advantage is that the
relation to any RPM or DEB package is very close. The other advantage
is that we have a very flat tree, which eases hacking, packaging and
maintenance by system administrators.

The lower-level directories are version-numbered. This unusual naming
scheme is intended to permit documents that are written using several
versions of the same DTD to coexist on the same system.

R003 - Open Catalog usage for SGML
----------------------------------

Why focus so much on catalogs in these recommendations? Because they
are the key to your directory structure and give a strong working
infrastructure that every SGML or XML tool can count on.

Open catalogs have very often been resented because they lead to
problems like conflicting SGMLDECLs. However, those problems do not
appear if you use them carefully. One of the keys is to avoid putting
everything in the same bag, and to have centralized catalogs that are
specific to a given DTD.

The fact that they are DTD-specific has a number of advantages:

- avoid SGMLDECL conflicts without assuming DTDDECL or DELEGATE
  support, which many tools do not support yet
- avoid duplicate FPI declarations
- allow pointing to the right version of a given DTD and to the
  corresponding entities and style sheets from only one place

When splitting your CATALOG pointers into one file per DTD, you also
lose, to some extent, a global view of all the catalogs that are
installed on your system. This is why we have introduced the
super-catalog, which points to all the centralized catalogs on your
system. It greatly eases scripting.

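As an illustration of how a super-catalog can ease scripting, the
following minimal sketch lists the centralized catalogs referenced by
CATALOG entries in a super-catalog. It is not a full catalog parser
(comments and other keywords are ignored), and the function name and
the file names under /etc/sgml are hypothetical examples rather than
names fixed by the recommendations.

```python
import re

# One CATALOG entry per line: the keyword followed by a quoted or bare
# path to a centralized catalog.
CATALOG_ENTRY = re.compile(r'^\s*CATALOG\s+(?:"([^"]+)"|(\S+))', re.MULTILINE)


def list_centralized_catalogs(super_catalog_text):
    """Return the catalog paths referenced by a super-catalog, in order."""
    return [quoted or bare
            for quoted, bare in CATALOG_ENTRY.findall(super_catalog_text)]


# Hypothetical super-catalog contents, for illustration only.
EXAMPLE_SUPER_CATALOG = '''
CATALOG "/etc/sgml/sgml-docbook-3.1.cat"
CATALOG "/etc/sgml/xml-docbook-4.0.cat"
'''

if __name__ == "__main__":
    for path in list_centralized_catalogs(EXAMPLE_SUPER_CATALOG):
        print(path)
```

A script can walk this list to find out which DTDs are installed,
without having to scan the whole directory tree.
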
The super-catalog may be used as a default centralized catalog, for
example when the DTD is not known. However, it cannot be guaranteed
that there will be no declaration conflicts if an application chooses
to use this functionality.

OASIS says that all catalogs should be named "CATALOG" or "catalog".
This was impossible to respect in /etc/sgml, where the centralized
catalogs live, because several files in the same directory cannot share
the same name. This does not break those directives that much, because
all the ordinary catalogs on your system would still be named
"catalog".

We also chose to specify "catalog" rather than "CATALOG", while OASIS
leaves the choice open. We considered that we should encourage one of
the two versions, whichever it might be, because it makes life simpler
for everyone (scripts, maintainers, packagers, tool authors, ...). In
this respect, LSB implementations can be considered conformant to
OASIS, while the converse is not true.

R004 - Open Catalog usage for DocBook
-------------------------------------

Directories like the ones holding Jade's or OpenJade's declarations and
the ISO entities are at the top level because they are not specific to
any given DTD and can be used by two or more of them.

Of course one may argue that Jade's or OpenJade's declarations contain
the document type definition of what DSSSL is. But again, what is
important is the usage, not the formal definition, so they have no
reason to go into a dsssl/ directory (which would also encourage
packagers to put the stylesheets there, away from their DTD, which is
not what we want).

R005 - Configuration files
--------------------------

This recommendation is deliberately vague, so as to place as few
restrictions as possible on the creativity of SGML application authors
with respect to configuration files - the catalog layout in any case
solves one of their major problems: finding the files.

R006 - Iso-entities
-------------------

So far, the most confusion has been over the names of the files holding
these very basic character entities. We have seen the following naming
schemes:

    ISOamsa ISOamsb ...
    ISOamsa.ent ISOamsb.ent ...
    iso-amsa iso-amsb ...
    iso-amsa.gml iso-amsb.gml ...
    etc.

There was a similar confusion over the Formal Public Identifiers
describing these files. We have seen the following naming schemes:

    "ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
    "ISO 8879-1986//ENTITIES Added Math Symbols: Arrow Relations//EN"

Again, we preferred making a choice to leaving the question open. We
had a lot of feedback from users suffering from this indecision. Even
if technical workarounds exist, we would like to encourage one of these
forms to emerge.

R007 - Packages
---------------

We are very far from providing inter-distribution compatibility at the
package level, and it is likely that users will get broken dependencies
if they mix packages coming from different distributions. This document
does not try to fix package names or to propose dependency declarations
for DocBook distributions.

We did, however, want to point out a problem that may be encountered
when packaging SGML or XML: the package numbering scheme.