C. M. Sperberg-McQueen and Lou Burnard, One Document Does It All: Documentation for an ODD system for tag set construction. TEI ED W29. Unpublished; for testing and development. November-December 1991. No source; created in m-r form.

26 Mar 92 : CMSMcQ : minor reworking, case fixes
25 Mar 92 : CMSMcQ : fixes after successful printing
18 Mar 92 : CMSMcQ : continued quick revs. Skipped past detailed discussion of ref crystals without completing them, reached ca. line 840 (detailed account of Odd.dtd).
17 Mar 92 : CMSMcQ : continued quick revs, add tag lists in appx
8 Mar 92 : CMSMcQ : Quick revs, hoping to make useful for others. Got to about line 300.
8 Jan 92 : CMSMcQ : Upgrade to new Odd.dtd.
29 Dec 91 : CMSMcQ : Further expansion.
14 Dec 91 : CMSMcQ : Expand to broader overview document to help TEI participants who must read ODD files.
25 Nov 91 : CMSMcQ : Made file (as ODDDOC.P2X)
One Document Does It All: Documentation for an ODD System of Tag Set Construction
C. M. Sperberg-McQueen
Lou Burnard
Text Encoding Initiative: Document TEI ED W29
14 November 1991

Introduction

The Odd (One Document Does It All) system is a prototype SGML DTD-generator developed to aid in the production of version 2 of the Text Encoding Initiative's Guidelines for Text Encoding and Interchange (TEI P2). It is modeled very directly on Donald Knuth's WEB system, but substitutes TEI SGML tags for TeX as the document formatting language and SGML DTDs for Pascal as the programming language.

Note: WEB is distributed with the public-domain code of TeX and MetaFont, which are written in WEB. See Donald E. Knuth, Literate Programming (submitted to The Computer Journal [n.d.]) and [Donald Knuth], The WEB System of Structured Documentation [WEB User Manual], Stanford Computer Science Report CS980 (September 1983). Both are distributed with (at least some versions of) TeX.

In general, Odd works like this: in a single document (the Odd document) the user describes an SGML-based markup language or tag set, using the Odd tag set. The Odd document is then used by three distinct processors to produce three very different kinds of output. The first processor (OddP2X) produces SGML-tagged prose documentation for the tag set, embedding fragments of the DTD within it (as Pascal code is intermingled with prose in the WEB system); it corresponds to Knuth's Weave processor. The second processor (OddRef) produces SGML fragments to be included in an alphabetic reference manual of the tags, attributes, and other items in the markup language. OddRef has no direct correspondence in WEB, though it is closer to Weave than to Tangle. The third processor (OddDtd) produces an SGML DTD for the markup language; it corresponds to Knuth's Tangle processor.

A fourth processor (DtdOdd) has been developed which makes the production of Odd documents easier: it takes a conventional DTD as input and produces a set of partially completed tag documentation crystals; it can be used as the first step in preparing Odd documentation for existing tag sets.

In this document, we describe the types of files used by the Odd system for input and output, the specialized tags provided by the Odd DTDs, the DTDs and their structure, and the processors which work with them. An appendix includes a brief description of the tags in Tiny.dtd, the base tag set. The document in its current form is intended for internal use within the TEI and assumes either a profound familiarity with SGML and the TEI DTDs or a great tolerance for unfamiliar technical material.

Types of Files

The Odd system uses and produces several types of files, with the following file extensions (common across all systems): These are described in more detail below.

Odd Files

Odd is the main document type: it serves as input for all processors and contains all the information used in defining the tag set.

An Odd document uses, for the most part, conventional SGML tags for prose documents; because the Odd tags themselves are defined as an additional tag set, they are in principle usable with multiple bases, and any standard tag set for prose could be taken as the base tag set for Odd documents. At present (March 1992) the Odd system uses a prose base called Tiny.dtd, based on TEI P1 by way of the toy DTDs prepared for TEI workshops during 1991. The tags present in Tiny.dtd are described briefly in an appendix to this document. When TEI P2 is completed we expect to modify the Odd system to use TEI P2's tag set as its base.

The Odd DTD comprises additional tags to be used in what is otherwise a normal document using the Tiny tag set. Odd files should use the Odd DTD, which is invoked thus:

%dtdmods; ]>

The Odd DTD keeps its extensions to the base prose tag set rigorously separate and visible, and thus serves as an example of the method of DTD extension envisaged by TEI ML W43 and TEI P1 chapter 8.

This document in its current form focuses largely on the tags of the Odd DTD. Those of the other DTDs (P2X and Ref) are very much the same, differing mostly in whether certain elements are optional or required.

P2X Files

P2X files contain prose documentation of the tag set; they will ultimately conform to TEI P2 but use extensions --- hence the file type of P2X (P2 extended), which keeps these files distinct from files vanilla-conformant with TEI P2, which we expect to use the filetype TEI. Since P2 is not ready yet, P2X files, like Odd files, currently use not P2 but the general-purpose prose tag set defined in Tiny.dtd. P2X files should use P2x.dtd, which is invoked thus:

%dtdmods; ]>

P2X, like Odd, keeps its extensions separate from the base (here Tiny.dtd) and serves as an example of tag set extension as envisaged in TEI P1 and ML W43. The P2X DTD is so constructed that Ref files may be included within P2X documents.

Ref Files

Ref files contain one or more tagDoc, attDoc, classDoc, and entDoc elements. Ref files are the output of the OddRef processor and the filetype should not be used for other files. If Ref files are to be processed as independent documents, they should use Ref.dtd, but normally Ref files are included within P2X files for processing.

DTD Files

DTD files contain DTD fragments; when the tag set is prepared using Odd, the DTD files are created by running the OddDtd processor on an Odd file. DTD files will contain element, attList, and comment declarations in the usual syntax. The element and attList declarations are enclosed in marked sections so that individual tags in the set can be conveniently turned off or redefined by users of the markup language; in the future names may also be replaced by entity references as described in TEI P1 chapter 8. At the moment, we expect to perform the name-indirection by post-processing the DTDs, not by modifying OddDtd.

Special Tags in Odd and P2X Documents

To the base tag set used, the Odd system adds (1) specialized tags to appear in running prose, which mark tag names, attribute names, sample tags and lists of tag descriptions which should echo the descriptions in the reference manual, and (2) structures (reference crystals) to provide the specialized information needed to print an alphabetical reference list of tags and attributes. The following two sections describe these two classes of tag. Note that some tags are specific to Odd documents or P2X documents, while others are common to both types.

Tags for Prose Documentation

Phrase-level Tags for SGML Names

SGML tag set documentation requires frequent mention of generic identifiers, attribute names, and special attribute values. These are all technical terms and could be tagged with term, but we distinguish them so they can be typeset differently and generate distinct types of index entries.

Odd and P2X documents both use the following specialized tags:

    tag: marks SGML start-tags and end-tags appearing in prose.
    gi: marks a generic identifier for an SGML element type.
    a third tag marks attribute names appearing in prose.

All of these elements can carry an attribute which indicates whether the name marked is a TEI name or not. Its two values mean, respectively:

    the item is a TEI name and should be indexed as such
    the item is not a TEI name and should not be indexed

No specialized tag is provided for defined attribute values; they should be tagged as technical terms. Extended examples of SGML tag usage should be enclosed in eg tags; generally the contents of the eg element should be a CDATA marked section. (eg is not defined as a CDATA element, however, because end-tags for open elements are recognized within CDATA elements; if a CDATA element within a paragraph contains a /p tag, it will be recognized as the end-tag for the enclosing paragraph. This interferes too often with examples to make CDATA elements useful for examples of SGML tagging.)

As an example of these tags' use, consider the parenthetical remark in the previous paragraph, which is tagged thus: eg is not defined as CDATA, however, because end-tags for open elements are recognized within CDATA elements; if a CDATA element within a paragraph contains a /p tag, it will be recognized as the end-tag for the enclosing paragraph. This interferes too often with examples to make CDATA elements useful for examples of SGML tagging.) ]]>

These tags are defined formally as follows: [Declarations not preserved.]

Tag Lists with Descriptions

The normal method of introducing tags in TEI documentation is to provide a single paragraph introducing a set of related tags, list the tags and attributes involved, provide examples, and end with a DTD fragment showing how the tags are defined. The following tags are provided for the lists of tags and attributes:

    tagList: marks a list of tags to be inserted into the prose documentation, formatted as a glossary list. Attributes include:
        an attribute indicating the layout style to use for the tag description list. Values:
            (first value) each tagDesc generates a list item with the gi as the term and the one-line desc from the tagDoc element as the description.
            (second value) undefined; present only as a stub and reminder that other styles may be desired later.
            (third value) undefined; a reminder that other styles may be desired later.
    tagDesc: indicates that a given gi should be included at this point in a tagList. Attributes include:
        an attribute indicating which gi should be included in a tag list.
        atts: indicates which attributes of the element, if any, should be mentioned in the description of the tag.
    adList: marks a list of attributes and their descriptions. Attributes include:
        an attribute specifying the element class for which the attributes in the list are defined.
        atts: indicates which attributes of the class, if any, should be included in the list.

In print format, both tagList and adList should be formatted like a definition list. In a tagList, each tagDesc generates the tag's GI (in angle brackets, bolded) as the term, and the element's one-line description (from the desc element in the tagDoc element) as the definition. Embedded lists of attributes may also occur within any entry for a GI; the atts attribute specifies which attributes defined for an element should be included in the embedded list. (The default is none.) In P2X files, tagList and tagDesc do not occur: the OddP2X processor translates them into glossary lists, as shown below.
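The formatting rule for tag lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OddP2X implementation: the function name `render_tag_list`, the dictionary layout, the plain-text output, and keying the tagDocs by generic identifier are all hypothetical. The sample descriptions are taken from this document, with the dtdFrag description abbreviated.

```python
def render_tag_list(tag_descs, tag_docs):
    """Render tagDesc entries as plain-text definition-list lines.

    tag_descs: list of (gi, atts) pairs, one per tagDesc; atts lists the
               attribute names to mention (empty list = none, the default).
    tag_docs:  mapping gi -> {"desc": str, "atts": {name: description}}
               (a hypothetical stand-in for the tagDoc crystals).
    """
    lines = []
    for gi, atts in tag_descs:
        doc = tag_docs[gi]
        # Term: the GI in angle brackets; definition: the one-line desc.
        lines.append(f"<{gi}>  {doc['desc']}")
        # Embedded attribute list, only for attributes named in atts.
        for att in atts:
            lines.append(f"    {att}  {doc['atts'][att]}")
    return lines


# Hypothetical sample data; descriptions follow this document's wording.
SAMPLE_TAG_DOCS = {
    "gi": {
        "desc": "marks a generic identifier for an SGML element type.",
        "atts": {},
    },
    "dtdFrag": {
        "desc": "encloses a schematic representation of a portion of a DTD.",
        "atts": {"file": "specifies which DTD file the DTD fragment belongs in."},
    },
}

if __name__ == "__main__":
    for line in render_tag_list([("gi", []), ("dtdFrag", ["file"])], SAMPLE_TAG_DOCS):
        print(line)
```

Because the description text is looked up rather than retyped, a tag list and the reference manual cannot drift apart, which is the point of having tagDesc point at a tagDoc.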

As examples of the use of these tags, consider the introduction of the phrase-level tags for SGML names in the preceding section, which was tagged thus: Odd and P2X documents both use the following specialized tags: All of these elements can carry the following attribute: ]]>

The P2X equivalent of this is as follows:

... ... ...

The file P2xAdd.dtd does actually define a tagList element for P2X documents, but it is not used by OddP2X at this writing (18 March 1992).

These elements are defined thus: [Declarations not preserved.]

Tags for Embedded DTD Fragments

For reference manuals, at least, it is important not merely to describe the tags of a markup language in prose: the reader must be able to consult the formal definitions of the tags as well. In order to exhibit clearly the logical relationships among related tags, it is useful to be able to embed DTD fragments in a running prose commentary. (Otherwise, the user would have to consult the element and attribute-list declarations in the alphabetical reference list, which is not a convenient way to understand a group of tags which conceptually go together.) Indeed, the Odd system assumes that the entire DTD defining the tag set will be reproduced, not necessarily in order, in the DTD fragments included in the documentation. (In this, the influence of WEB is particularly clear.)

The tags provided for this purpose are these:

    dtdFrag: encloses a schematic representation of a portion of a DTD within which special elements represent different types of declaration. Attributes include:
        file: specifies which DTD file the DTD fragment belongs in.
        contin: identifies the previous DTD fragment continued by the current one, if any.
    tagDecl: indicates that the element and attribute list declarations for a given element type logically belong here. Attributes include:
        tagDoc: indicates which element's declarations appear at this location by pointing at the tagDoc for that element.
    entDecl: indicates that an entity declaration belongs at this point in a DTD fragment. Attributes include:
        entDoc: points at the reference crystal for this entity declaration.
    claDecl: indicates that a parameter entity declaration defining an element class logically belongs at this point in a DTD fragment. Attributes include:
        classDoc: points at the classDoc crystal for the class in question.
        type: indicates whether the entity is for model groups or attribute lists. Values:
            model: entity is for model groups.
            atts: entity is for attribute list declarations.
    peRef: indicates that a parameter entity reference logically belongs at this point in the DTD.
    dtdRef: indicates that the contents of a distinct dtdFrag element logically belong here. Attributes include:
        dtdFrag: points at the dtdFrag which logically belongs here.
    commDecl: indicates that a comment declaration belongs at this point in a DTD fragment, with the same contents as the element.

Their meanings and usage are discussed below.

In the simplest and most common case, the dtdFrag element contains a set of tagDecl elements which are replaced in the output with the element declaration and attribute list declaration for the element. The tag's generic identifier and its element declaration are copied directly from the tagDoc whose id is given as the value of the tagDecl tagDoc attribute. The id should usually be the same as the generic identifier, or the first eight characters of it, but may not be, if more than one tagDoc has the same generic identifier (as for div1 in prose texts, verse texts, drama texts, and dictionaries).

In more complicated cases, the dtdFrag may also contain entity declarations (entDecls), parameter entity declarations for element classes (claDecls), parameter entity references (peRefs), references to other DTD fragments (dtdRefs), and SGML comments (commDecls). Each of these except commDecl is empty and must point at an associated reference document crystal. N.B. the content model for dtdFrag thus tightly constrains the form taken by Odd-generated DTDs. It is not possible in Odd as currently implemented, for example, to include notation declarations in a DTD.

entDecl indicates that a general or parameter entity declaration goes in the DTD at this point. The name and entity text of the entity declaration are copied from the appropriate entDoc, indicated by the entDoc attribute on the entDecl tag.

For example, to declare an external file containing entity declarations, which will be referred to somewhere in the DTD, and to declare the entity TEI with the expansion Text Encoding Initiative, a DTD fragment should read, in part: [Example not preserved.]

The document should also contain entity document crystals which define these entities (see below).

A claDecl indicates that a parameter entity declaration defining an element class belongs in the DTD at this point. (For discussion of element classes, see the examples and the discussion of classDoc below.) The classDoc attribute points at the relevant classDoc crystal; the type attribute indicates whether the parameter entity is to contain a list of all members of the class (for use in content models) or a set of attribute declarations common to all elements in the class (for use in attribute list declarations). If the type attributes on the claDecl and classDoc elements disagree, the behavior of the processors is undefined. (They should issue a warning, but this has not been implemented yet.)

The entity declarations generated by a claDecl element use entity names constructed from the name of the element class: the name of the class, prefixed by m. or x. for model-group entities and by a. for attribute-list entities. The entity text used in the declarations is generated from the reference crystals as well. For model-group entities, two entity declarations are generated: the entity text for the first (with the x. prefix), is the empty string; for the second (with the m. prefix) the entity text is a list of members of the class, separated by vertical bars and preceded by a reference to the first entity. For attribute-list entities, the entity text is a series of attribute definitions generated from the attList structure within the classDoc crystal.

For a class called crystal, for example, a claDecl classDoc=crystal type=model will cause OddDtd and OddP2X to generate two entity declarations of the form [declarations not preserved] where the ellipsis is filled with the names of members of the class. The x. entity is provided to allow users to add new elements to the class simply by defining a new meaning for that entity within their DTD subset. To add the new user-defined elements axiom and theorem to a tag set, and allow them to appear anywhere that crystals can appear, the user's document would include the following lines within the DTD subset: [Example not preserved.]

For attributes shared among members of a class, a claDecl classDoc=crystal type=atts will cause OddDtd and OddP2X to generate an entity declaration of the form [declaration not preserved].
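The naming scheme for class entities can be made concrete with a short Python sketch. It follows the rules stated above (x. and m. prefixes for model-group entities, a. for attribute-list entities, members separated by vertical bars and preceded by a reference to the x. entity), but the function name `class_entity_decls`, the exact spacing and quoting of the output, and the sample member names are assumptions rather than a record of what OddDtd actually writes.

```python
def class_entity_decls(class_name, items, kind="model"):
    """Return the entity declarations a claDecl might generate.

    kind="model": items are the GIs of the class members.
    kind="atts":  items are ready-made attribute definition strings
                  (assumed here to come from the classDoc's attList).
    Output spacing and quoting are illustrative assumptions.
    """
    if kind == "model":
        x_name = f"x.{class_name}"
        m_name = f"m.{class_name}"
        members = " | ".join(items)
        return [
            # First entity: empty by default, reserved for user extensions.
            f'<!ENTITY % {x_name} "">',
            # Second entity: reference to the first, then the member list.
            f'<!ENTITY % {m_name} "%{x_name}; {members}">',
        ]
    if kind == "atts":
        definitions = " ".join(items)
        return [f'<!ENTITY % a.{class_name} "{definitions}">']
    raise ValueError(f"unknown claDecl type: {kind!r}")


if __name__ == "__main__":
    # Hypothetical membership for the class called crystal.
    for decl in class_entity_decls("crystal", ["tagDoc", "classDoc", "entDoc"]):
        print(decl)
```

Under this layout a user's replacement text for the x. entity would have to end with a vertical bar (for instance `axiom | theorem |`) so that the concatenated model group stays well formed; that detail is an assumption of the sketch, not something this document states.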

Elements are assigned to classes by the values given to the tagDoc crystal's classes element, as described below. If, for example, the tag documentation for emph and highlighted elements assigned classes as follows:

... ... ... ... soup ... phrases ... rhetorical ... typographic ...

then something like the following claDecls should appear in some dtdFrag element: [example not preserved] which would in turn generate (in the OddDtd processor) something like: [example not preserved].

A peRef indicates that a parameter entity reference should occur at this point in the DTD. The peRef is empty; its n attribute gives the name of the entity. For now, no check is made that the parameter entity is declared anywhere, so that references to external files may be included in DTD fragments without entDoc crystals having to be built for them. [The wisdom of this decision is open for discussion.]

Need example.

A dtdRef element indicates that the content of some other dtdFrag element should be inserted here; the dtdFrag attribute gives its ID. OddP2X fetches the section number and name of the other dtdFrag (the latter given as the value of its N attribute) and prints it as a comment, in the form: [Example not preserved.] OddDtd bodily inserts a similar comment, but then follows it with the contents of the specified DTD fragment, and marks its end with a comment saying that the embedded DTD fragment is now ended. OddRef ignores dtdFrags entirely.

Other comments may be specified using the commDecl element. These should all be block prose comments --- no indented lists or other fancy formatting ---, as they may be reformatted as paragraphs by OddP2X and OddDtd.

Need example.

Processing of DTD Fragments

OddP2X processes a dtdFrag by printing a nice header for the fragment and then nicely printing its contents: the comment, element and attribute list declarations are printed with appropriate delimiters, and each dtdRef element present generates a comment with a cross-reference to the appropriate DTD fragment.

OddDtd processes a dtdFrag by emitting a comment identifying the fragment, then writing out the contents of the fragment. Comment, element, and attList declarations are written out with appropriate delimiters; dtdRef cross-references are handled by recursively processing the appropriate dtdFrag. Finally, an ending comment is emitted. The file into which all this is written is given by the dtdFrag file attribute.
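The recursive treatment of dtdRef can be sketched as follows. This is a simplified Python model, not the OddDtd code: the function name `expand_fragment`, the dictionary representation of a fragment, and the wording of the begin and end comments are all hypothetical, and the check for circular references is an addition of the sketch. The sample element declaration for gi uses the content model quoted in this document.

```python
def expand_fragment(frag_id, fragments, _active=()):
    """Return output lines for one dtdFrag, expanding dtdRef recursively.

    fragments: mapping id -> {"n": name, "items": [(kind, value), ...]}
               where kind is "dtdRef", "commDecl", or "decl" (a declaration
               already rendered as text). Hypothetical data model.
    """
    if frag_id in _active:
        # Guard added by this sketch: a fragment may not include itself.
        raise ValueError(f"circular dtdRef involving {frag_id!r}")
    frag = fragments[frag_id]
    lines = [f"<!-- begin fragment {frag_id}: {frag['n']} -->"]
    for kind, value in frag["items"]:
        if kind == "dtdRef":
            # value is the ID of another dtdFrag; process it in place.
            lines.extend(expand_fragment(value, fragments, _active + (frag_id,)))
        elif kind == "commDecl":
            lines.append(f"<!-- {value} -->")
        else:
            lines.append(value)
    lines.append(f"<!-- end fragment {frag_id} -->")
    return lines


SAMPLE_FRAGMENTS = {
    "main": {
        "n": "Top level",
        "items": [("commDecl", "Phrase-level tags"), ("dtdRef", "names")],
    },
    "names": {
        "n": "SGML names",
        "items": [("decl", "<!ELEMENT gi - O (#PCDATA)>")],
    },
}

if __name__ == "__main__":
    print("\n".join(expand_fragment("main", SAMPLE_FRAGMENTS)))
```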

If the dtdFrag element bears a contin attribute, it should be viewed as a continuation of the dtdFrag whose ID is given as the contin value, and printed / copied immediately after it. A series of such continuations will be printed / copied in order. It is a semantic error for the ID in question to belong to anything other than a dtdFrag.
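One way to read the continuation rule is as a reordering of fragments, sketched here in Python. The function name `order_fragments` and the input format are hypothetical, and the treatment of several fragments continuing the same parent (kept in document order) is an assumption of the sketch.

```python
def order_fragments(fragments):
    """Order dtdFrags so each continuation directly follows what it continues.

    fragments: list of (frag_id, contin) pairs in document order, where
               contin is the ID of the continued dtdFrag or None.
    """
    ids = {frag_id for frag_id, _ in fragments}
    roots, continuations = [], {}
    for frag_id, contin in fragments:
        if contin is None:
            roots.append(frag_id)
        elif contin not in ids:
            # The contin value must be the ID of a dtdFrag.
            raise ValueError(f"{frag_id!r} continues unknown fragment {contin!r}")
        else:
            continuations.setdefault(contin, []).append(frag_id)

    ordered = []

    def visit(frag_id):
        ordered.append(frag_id)
        for child in continuations.get(frag_id, []):
            visit(child)

    for root in roots:
        visit(root)
    if len(ordered) != len(fragments):
        # Fragments that only continue each other never reach a root.
        raise ValueError("circular contin chain")
    return ordered


if __name__ == "__main__":
    print(order_fragments([("a", None), ("b", None), ("a2", "a"), ("a3", "a2")]))
```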

Examples needed here.

The dtdFrag element and its constituent parts are defined by the following declarations: [Declarations not preserved.]

Tags for Reference Material

In addition to the running prose documentation of the tag set, the Odd system produces reference material describing each tag and attribute. Some of the information needed for the reference section is provided in the Odd input files within tagDoc, classDoc, and entDoc elements (called here reference crystals); other pieces of information used in the reference section are generated from dtdFrags and the like. In addition to providing much of the reference matter, the reference crystals are also used to generate the prose descriptions of tags in tag lists and the SGML declarations in DTD fragments and DTD files.

tagDoc: Tag Documentation Crystal

The tagDoc crystal documents one SGML element, providing its generic identifier, full name, definition, examples, and the like. The attributes of the element are documented with an embedded attList crystal (documented below).

The tags used in the Odd tagDoc crystal are these:

    tagDoc: contains reference information concerning one SGML element type.

In addition to those used in Odd tagDocs, P2X tagDocs use the following tags:

As an example of a complete tag documentation crystal, consider the tagDoc for the gi element described above: gi generic identifier marks a generic identifier for an SGML element type. entry element contains an entire dictionary entry. ]]>

Although it may be printed in angle brackets, distinguish gi from tag: the latter is for complete start- or end-tags, the former for isolated generic identifiers. Should be a valid SGML name. - O (#PCDATA) ]]>

The tags for tag documentation and other crystals are defined in three distinct files: OddRef.dtd for those used only in Odd documents, P2xRef.dtd for those unique to P2X documents, and ComRef.dtd for those with the same definition in both document types. The relevant portion of OddRef.dtd is this: [Declarations not preserved.]

The corresponding definitions in P2xRef.dtd are these: [Declarations not preserved.]

The definitions for tagDoc components in ComRef.dtd are these: [Declarations not preserved.]

classDoc: Element Class Documentation Crystal

The classDoc crystal documents a class of elements. The membership of the class is not given within the crystal; it is governed instead by the classes element in the tagDoc and classDoc crystals describing the members and subclasses of the class. This means that when new tags are added to a tag set, they can be assigned to classes without having to change the class documentation.
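Deriving class membership from the crystals amounts to inverting a mapping, as in the Python sketch below. The function name `class_members` and the input layout are hypothetical; the class names soup-style assignments shown for emph and highlighted are invented for illustration, and sorting the members alphabetically is a choice of the sketch.

```python
def class_members(assignments):
    """Invert 'element -> classes' into 'class -> members'.

    assignments: mapping from a GI (or a subclass name) to the list of
                 classes named in its classes element.
    """
    members = {}
    for name, classes in assignments.items():
        for cls in classes:
            members.setdefault(cls, []).append(name)
    # Sorted for stable output; the real ordering is not specified here.
    return {cls: sorted(names) for cls, names in members.items()}


# Hypothetical assignments, for illustration only.
SAMPLE_ASSIGNMENTS = {
    "emph": ["phrases", "rhetorical"],
    "highlighted": ["phrases", "typographic"],
}

if __name__ == "__main__":
    for cls, names in class_members(SAMPLE_ASSIGNMENTS).items():
        print(cls, "=", " | ".join(names))
```

Adding a new element then means adding one entry to the assignments; no class definition has to be edited, which is the property the paragraph above describes.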

The tags used in class documentation crystals include:

    classDoc: contains reference information for one element class: either elements which appear together in SGML content models, or elements which share some common attributes.

[Examples here.]

Like those of the preceding section, the SGML declarations for these elements are found in three distinct files: apart from those already defined, the file OddRef.dtd contains: [Declarations not preserved.]

The file P2xRef.dtd contains: [Declarations not preserved.]

The file ComRef.dtd contains: [Declarations not preserved.]

entDoc: Entity Documentation Crystal

The entDoc crystal documents a general entity or parameter entity.

The tags used in entity documentation crystals include:

    entDoc: contains reference information about one entity.

[Examples here.]

Like those of the preceding sections, the SGML declarations for these elements are found in three distinct files: apart from those already defined, the file OddRef.dtd contains: [Declarations not preserved.]

The file P2xRef.dtd contains: [Declarations not preserved.]

The file ComRef.dtd contains: [Declarations not preserved.]

attList: Attribute Documentation

Attributes may be defined either for individual elements, in which case they are documented in the relevant tagDoc crystal, or for classes of elements, in which case they are documented in the relevant classDoc. In either case, an attList crystal is used, containing one attDef crystal for each attribute.

Information about the attributes is printed as part of tag lists, when the attribute names are included in the atts attribute of a tagDesc element, or in separate lists for attributes assigned to whole classes of elements, when their names are included in the atts value of an adList element. The documentation crystals are also used to generate attribute list declarations for elements.

The tags used to document attributes are:

    attList: groups attribute definitions in a tag or class documentation crystal.
    attDef: contains documentation for one attribute for one element or element class.

[Examples here.]

All tags used for attribute lists are common to both Odd and P2X documents; their definitions occur in file ComRef.dtd. [Declarations not preserved.]

DTDs and DTD Fragments

The DTDs for the Odd system are broken up into several files, which are described in this section.

Overview of DTDs and DTD Fragments

Each type of file has its own DTD; some are constructed from several parts. The various DTD types and the fragments from which they are made are:

The Odd Tags and DTD

As noted above, the Odd DTD is split among several files for clarity of construction and ease of maintenance. The specifics of the DTD are described in this section.

Odd.dtd: Redefinitions of Element Classes

The file Odd.dtd itself serves mostly as a driver file to include the component parts of the DTD. Because Odd.dtd is invoked in the DTD subset, it need not itself invoke the prose base tag set (Tiny.dtd). Instead, it first defines some parameter entities which will be used in the prose base, allowing the Odd tags to appear within the prose base: [Declarations not preserved.] Then it embeds the prose base's own parameter entity declarations (tinyents.dtd), so they can be used in the element declarations for Odd tags, and overrides the default declarations of two elements (list and item).

%xents;

Finally it embeds references to the two files OddRef.dtd and oddadd.dtd, which actually define the elements and their attributes.

%oddref; %oddadd;

OddAdd.dtd: Additional Tags for Prose

OddRef.dtd: Tag Documentation Crystals

The P2X DTD

Processors

Four processors are envisaged as part of this system. Detailed specifications of their behavior are given in the appropriate LTD (link-type definition) files. (The LTD files are now out of date; they assume an earlier version of the Odd tag set and have not been replaced.)

Summary of Tiny.dtd Tags

The following tags of Tiny.dtd are used in Odd and P2X files. For details, see the corresponding tags in TEI P1 and consult files Tiny.dtd and TinySoup.dtd (or check to see whether an Odd file for these DTDs has been prepared).

The following tags in Tiny.dtd are not available in Odd or P2X documents:

Summary of Tags Common to Odd and P2X DTDs

Summary of Tags Unique to Odds

Summary of Tags Unique to P2X