The tags that have been described in earlier Chapters of this Report are only part of what makes up an SGML description of a document encoding scheme. The manner in which document components delimited by these tags are to fit together, along with other features of the encoding scheme, is represented in the document type definitions (DTDs). The draft versions of the TEI DTDs are contained in Appendix C. Readers are invited to comment specifically on the design principles used in constructing these DTDs, as described in this Chapter of the Report.
This Chapter does not attempt to be a complete description of how to write and modify DTDs. Readers desiring such a description should consult the SGML standard document and other tutorial materials.
The use of SGML allows the specification of rigid structures for texts. However, we anticipate that texts encoded using the scheme described in this report will be diverse in structure and content. Some of the aspects of these texts will not have been anticipated; it must be possible to extend the scheme presented here to incorporate new information. Some texts will have components and features that are close to those in this Report, but differ from what is described here either in their content or in their relationship to other components of the text. Accordingly, the DTDs have been written to allow a great deal of flexibility both in using the tags and attributes defined by the project, and in modifying or extending the encoding scheme. This Section describes the approach to designing the DTDs; the remaining Sections in this Chapter describe how to modify the scheme.
The motivating principle for the design of the DTDs has been to allow but not require structural constraints on documents. An encoded document is seen as comprising a header and a body. The header can contain SGML declarations and additional declarations required to conform to the TEI, as described in previous Chapters. The body contains the encoded text itself.
The body is expected to comprise some parts that can be well described using tags and structural constraints, and some parts that cannot be so described. The body can be viewed as a text with some well-structured components. These structured components correspond to aggregates of information marked with some tags from the DTDs; an SGML parser would be able to represent them as well-structured subtrees. The remainder of the text consists of those parts that are difficult to encode using the tags, or that do not fit together in some nicely predetermined fashion. Some documents can be entirely coded as structured components that fit together predictably; other documents may require much less structured encodings.
The DTDs in the Appendix attempt to accommodate this diversity by viewing the body of a document as a mixture of text and aggregates. At a high level in the description, the body is simply data with embedded structures. This is represented in SGML by an element declaration for the body whose content model is parsable character data and whose inclusion list is supplied by a parameter entity. Such a declaration states that the body element comprises parsable character data, together with the possible inclusion of structures as defined by that parameter entity. It is reasonable to expect that each aggregate will itself have some defined structure represented by an SGML model. The inclusion mechanism, as used in such an element declaration, allows any of the aggregates referenced in the inclusion list to appear anywhere within the body.
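A minimal sketch of a declaration of this kind may help to fix the idea. The entity name aggs and the aggregate names chapter, list and quote are assumptions made for illustration only; they are not taken from the TEI DTDs.

```sgml
<!-- Hypothetical names: "aggs", "chapter", "list" and "quote" are
     assumptions for illustration, not names from the TEI DTDs. -->
<!ENTITY % aggs "chapter | list | quote">

<!-- The body is character data; the inclusion list +(...) lets any
     of the listed aggregates occur anywhere inside it. -->
<!ELEMENT body - - (#PCDATA) +(%aggs;)>
```

Because the list of aggregates is held in an entity, changing which structures may appear in the body only requires a different value for that entity.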
The essence of this approach is to treat the document body as a stream of text which contains more or less structured portions. It will be necessary for users of the scheme to change the amount and kind of structuring. The next sections indicate how to do so.
A document to be presented to an SGML parser, and hence a document encoded according to one of the TEI project DTDs, must include the DTD to be used (perhaps implicitly) and the document instance. The DTD might be given explicitly in the file or, more commonly, it might be in another file and only be referenced.
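If the document type declaration is given separately, the document file can contain a declaration that references it, together with some declarations that change it. These changes are in the declaration subset, the bracketed part of the document type declaration. The order in which these parts of the declaration are presented is important. An SGML parser will interpret the declarations from the external entity after those in the declaration subset, so that where an entity is declared in both places the local declaration is the one that takes effect.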
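The shape of such a declaration can be sketched as follows. The document type name doc, the file name doc.dtd and the entity name example are hypothetical.

```sgml
<!-- Hypothetical names: "doc", "doc.dtd" and "example". -->
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!-- Declaration subset: read before the declarations in doc.dtd.
       If doc.dtd also declares "example", this declaration is the
       one that takes effect. -->
  <!ENTITY % example "local value">
]>
```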
It can be useful to have some global declarations interpreted before some of the local declarations. In this case, the global declarations are split into separate entities (files); those that should be interpreted among the local declarations can be explicitly included where necessary. To do so, the document type declaration declares each such file as an external parameter entity in its declaration subset and references it (for example, as %globals) at the point where its contents should be read, before the closing ]> of the subset. In the extreme, to gain full control over the order in which declarations are elaborated, the reference to an external declaration can be omitted, and explicit references to external entities included in the local declarations. This technique is used to modularize the definitions in the TEI DTDs; the details are explained in what follows.
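As a rough sketch of this arrangement, assuming a document type named doc and files named doc.dtd and globals.dtd (all hypothetical; only the entity name globals is suggested by the text):

```sgml
<!-- Hypothetical names: "doc", "doc.dtd" and "globals.dtd". -->
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!-- Local declarations that must be read before the globals. -->

  <!-- Declare one group of global declarations as an external
       parameter entity and pull it in at exactly this point. -->
  <!ENTITY % globals SYSTEM "globals.dtd">
  %globals;

  <!-- Local declarations that rely on what globals.dtd declares. -->
]>
```

Dropping the external identifier (SYSTEM "doc.dtd") and referencing every group of declarations explicitly in the subset gives the extreme form, in which the order of interpretation is entirely under local control.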
The remainder of this Section shows how to make specific changes to DTDs. All of these changes are assumed to take place in the context of a document type declaration such as the one described above.
Extensive modification can essentially result in a complete redefinition of the DTD or at least of its structural aspects. This might be viewed as a bad thing, in that the standardization achieved by the TEI has been done away with. However, if the modifications are accomplished using SGML mechanisms, as is the case with all of the changes described here, there remains a well-defined object with a clearly specified structure and a clear relationship to the TEI DTDs and tags. These techniques should be used with caution, and with an awareness of the complexities that can be introduced.
The tag names to be used in an SGML document instance are derived from the specification of the structure of the document in the DTD. Users of the TEI encoding scheme will sometimes want to specify their own tag or attribute names, perhaps using names already in use in a particular organization or project, or perhaps using names in some other language. To facilitate this renaming, parameter entities can be used to assign new string values to be used in the document type declaration.
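The renaming mechanism can be sketched as follows. All of the names used here (p, para, type, kind, doc, doc.dtd) are assumptions chosen for illustration, and the two parts would live in two different files.

```sgml
<!-- In the global declarations (hypothetical file doc.dtd): every tag
     and attribute name is held in a parameter entity and is used
     only through that entity. -->
<!ENTITY % p    "p">
<!ENTITY % type "type">
<!ELEMENT %p; - O (#PCDATA)>
<!ATTLIST %p; %type; CDATA #IMPLIED>

<!-- In the document: the local declarations are read first, so these
     values replace the global ones. The paragraph tag becomes "para"
     and its attribute becomes "kind". -->
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!ENTITY % p    "para">
  <!ENTITY % type "kind">
]>
```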
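For example, the paragraph tag can be renamed by redeclaring, in the local declarations, the parameter entity that holds its name. A similar technique can be used to rename attributes: the parameter entity that holds the attribute name is redeclared locally.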
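The DTDs must include declarations for the allowed values for the attributes that are used with tags. These are defined by making reference to the entity containing the name of the attribute, and then specifying the allowed values and the default to be assumed if the attribute is missing (this was discussed in Chapter 3). The allowed values or the default of an attribute can be redefined in the local declarations, provided that the global declarations hold the definition of the attribute in a parameter entity. Any names referenced in such a local declaration must already be declared, so the actual attribute name is used rather than a parameter entity reference.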
The structural aspects of a document are reflected in the SGML models that specify the content of elements. It is this part of the DTDs that corresponds to a grammar for the class of documents. By redefining the model for a tag, it is possible to restrict where tags can occur, to allow tags to occur in new places, or even, in the extreme case, to redefine the structure of the entire body of the document and thus do away with syntactic restrictions altogether.
The external declarations of models refer to elements indirectly using the parameter entities that contain the names of the elements. For example, the model for a simplified chapter could be defined by an element declaration in which both the name of the element and the names of its components are given as parameter entity references.
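A sketch of such a declaration, under the assumption that a simplified chapter is an optional title followed by one or more paragraphs (the element names and the content model are hypothetical):

```sgml
<!-- Hypothetical names held in parameter entities. -->
<!ENTITY % chapter "chapter">
<!ENTITY % title   "title">
<!ENTITY % p       "p">

<!-- A chapter is an optional title followed by one or more
     paragraphs. Each entity reference is parenthesized so that the
     occurrence indicator follows a closing parenthesis. -->
<!ELEMENT %chapter; - - ((%title;)?, (%p;)+)>
<!ELEMENT %title;   - O (#PCDATA)>
<!ELEMENT %p;       - O (#PCDATA)>
```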
As was true for references to attribute names via parameter entities in local attribute declarations, so for element names in local model declarations: any names referenced in the local declarations must be declared before they are used. This means that the actual names of the elements must be used in local model declarations, rather than the symbolic names (i.e., parameter entity references).
A declaration written with the actual element names in this way could be used locally to redefine the structure that is defined in the global declarations.
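If a model is defined using the SGML keyword ANY, the element may contain character data and any of the elements defined in the DTD, in any order. Specifically, it is possible to redefine the entire body of the document to have any arbitrary content by giving it this model.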
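One way this could look is sketched below. It assumes that the global declarations take the model of the body element from a parameter entity, here called m.body; that entity name, like doc and doc.dtd, is hypothetical.

```sgml
<!-- Assumption: doc.dtd takes the content model of the body element
     from the parameter entity "m.body" (a hypothetical name). -->
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!-- Read before doc.dtd, so the body accepts character data and
       any declared element, in any order. -->
  <!ENTITY % m.body "ANY">
]>
```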
Adding a tag to the encoding scheme requires two things. First, there must be a declaration of the element (name, together with model); earlier examples show some ways of doing this. The model may reference other elements - either contained in the existing DTD or supplied in the local declarations - and thus be the definition of a new aggregate of arbitrary complexity.
Second, this new aggregate must be tied in to the existing grammatical structure. This is done by modifying one or more of the existing models to reference the newly declared element.
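The two steps can be sketched as follows, assuming that the global declarations list the aggregates allowed in the body in a parameter entity named aggs and that the new element is called note (both names, and doc.dtd, are hypothetical):

```sgml
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!-- Step 1: declare the new element, name together with model. -->
  <!ELEMENT note - - (#PCDATA)>

  <!-- Step 2: tie it in by redeclaring the entity that lists the
       aggregates allowed in the body. Actual element names are used
       because the name entities of doc.dtd are not yet declared. -->
  <!ENTITY % aggs "chapter | note">
]>
```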
Attributes are associated with tags (elements) through the attribute definition list (ATTLIST) declaration.
The first case is the addition of an attribute to a tag that has no attributes in the DTD. The attribute list declaration for the tag must be given in a local declaration. The next Section will show where the declaration should be placed. The declaration names the tag and gives, for each new attribute, its name, its allowed values, and its default.
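The second case is the addition of an attribute to a tag that already has attributes. In this case the global declarations contain an entity reference within the attribute list declaration for the tag; redeclaring that entity in the local declarations adds the new attribute definitions.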
Adding an attribute to a new tag that is defined in the local declarations is done by giving an attribute list declaration with the declaration of the element.
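The first two cases might be sketched as follows. The element names, the attribute name lang and the entity name x.p are all assumptions; in particular, the second case only works if the global attribute list declaration for p ends with a reference to an entity such as x.p whose global value is empty.

```sgml
<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!-- Case 1: "title" has no attributes in doc.dtd, so a complete
       attribute list declaration is given locally. -->
  <!ATTLIST title lang CDATA #IMPLIED>

  <!-- Case 2: "p" already has attributes in doc.dtd. The local value
       of "x.p" is read first and is spliced into the global
       attribute list declaration for "p". -->
  <!ENTITY % x.p "lang CDATA #IMPLIED">
]>
```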
This Section shows a simple DTD and a sequence of transformations required to support the changes described earlier in the Chapter. The DTD shows only some simple structural tagging, using a small subset of the AAP tags and a simplified grammar. The first version is straightforward; the later versions are obtained by simple transformations that could be automated. The final version is more complex. The presentation of the sequence of transformations is intended to make the final version more accessible. The example document will not be modified in intermediate steps, save as is required to access the changing global declarations. The final version will demonstrate how to use these declarations to support changes to the document.
The starting point is a DTD for a simple class of documents. There are only a few structural elements defined in it. The overall structure of the document is given as was described earlier in the Chapter: there is front matter (simply character data here) and a body; the body is character data with included aggregates (only one in this simple example).
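A DTD matching this description might look like the sketch below. The file name, the element names, the chapter model and the single attribute are assumptions, not the declarations of the original example.

```sgml
<!-- simple.dtd (hypothetical file name and element names) -->
<!ELEMENT doc     - - (front, body)>
<!ELEMENT front   - - (#PCDATA)>

<!-- The body is character data with one included aggregate. -->
<!ELEMENT body    - - (#PCDATA) +(chapter)>

<!ELEMENT chapter - - (title?, p+)>
<!ELEMENT title   - O (#PCDATA)>
<!ELEMENT p       - O (#PCDATA)>
<!ATTLIST p type CDATA #IMPLIED>
```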
Here is a simple document that conforms to this DTD, and that makes
reference to an external entity to find it.
First paragraph. Second paragraph. Third paragraph. Fourth paragraph.
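One plausible marked-up form of such a document is sketched here. It assumes a DTD in a hypothetical file simple.dtd that declares doc, front, body, chapter, title and p, with omissible end-tags for title and p; the front matter text, the chapter title and the division into two chapters are invented for illustration.

```sgml
<!DOCTYPE doc SYSTEM "simple.dtd">
<doc>
<front>Front matter.</front>
<body>
<chapter><title>One</title>
<p>First paragraph.
<p>Second paragraph.
</chapter>
<chapter>
<p>Third paragraph.
<p>Fourth paragraph.
</chapter>
</body>
</doc>
```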
This DTD will now be modified in several stages. The first transformation is required to allow renaming of tags. All of the element names are defined in parameter entities.
The entities for the tag names are gathered into a separate file, which contains one parameter entity declaration for each tag name.
The remaining part of the DTD now requires some changes. First, the models that refer to these names are changed to refer to the entities. Second, models may require some minor syntactic modifications. The only modification required is exemplified by the chapter declaration, where brackets are inserted in the model so that the occurrence indicator (the question mark) comes after a bracket and not immediately after the name of the entity.
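The document is changed by making reference to these two external files: the file of names is declared as a parameter entity in the declaration subset and referenced there (as %gNames, before the closing ]>), while the body of the document is unchanged.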
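The three pieces involved in this step can be sketched together. Only the entity name gNames comes from the text; the file names, element names and models are assumptions.

```sgml
<!-- names.ent (hypothetical file): one entity per tag name -->
<!ENTITY % doc     "doc">
<!ENTITY % front   "front">
<!ENTITY % body    "body">
<!ENTITY % chapter "chapter">
<!ENTITY % title   "title">
<!ENTITY % p       "p">

<!-- simple.dtd (hypothetical file): names are used via entities -->
<!ELEMENT %doc;     - - ((%front;), (%body;))>
<!ELEMENT %front;   - - (#PCDATA)>
<!ELEMENT %body;    - - (#PCDATA) +(%chapter;)>
<!ELEMENT %chapter; - - ((%title;)?, (%p;)+)>
<!ELEMENT %title;   - O (#PCDATA)>
<!ELEMENT %p;       - O (#PCDATA)>

<!-- The document: the names are read in the subset, then the DTD. -->
<!DOCTYPE doc SYSTEM "simple.dtd" [
  <!ENTITY % gNames SYSTEM "names.ent">
  %gNames;
]>
```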
The global declaration of names is interpreted first, because it is explicitly called for in the local declarations. Tags could be renamed by parameter entity declarations placed before the explicit reference to the global name declarations, since the first declaration of an entity is the one that takes effect. For example, a document component can be renamed by declaring the entity that holds its name before the reference to %gNames.
The next transformation allows models to be redefined. All that is required is the provision of a parameter entity for the model for each element, and the use of the entity name in the element declarations. If no models are redefined, there are no required changes to the document itself.
It is important to note that there is an entity for each element rather than for each model. If there were an entity for each model, all the elements defined in terms of that model would have to be changed in the same manner if the parameter entity were redefined.
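A sketch of this transformation for three elements follows. The m. prefix for model entities and all element names are assumptions. Note that title and p have the same model but still get separate entities, so that either can be redefined without affecting the other.

```sgml
<!-- Name entities (hypothetical). -->
<!ENTITY % chapter "chapter">
<!ENTITY % title   "title">
<!ENTITY % p       "p">

<!-- One model entity per element, not one per distinct model. -->
<!ENTITY % m.chap  "((%title;)?, (%p;)+)">
<!ENTITY % m.title "(#PCDATA)">
<!ENTITY % m.p     "(#PCDATA)">

<!-- Element declarations take both name and model from entities. -->
<!ELEMENT %chapter; - - %m.chap;>
<!ELEMENT %title;   - O %m.title;>
<!ELEMENT %p;       - O %m.p;>
```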
Attribute names must be declared in parameter entities, and these entities referenced in the attribute definition lists. The attribute name declarations can be placed in the same file as the tag name declarations.
The DTD is changed to reference the entities.
There are two changes that are required to support redefinition of attributes. The first is the provision of entity names for the definition of each attribute, so that the attribute description can be redeclared if necessary. The second is the addition of an entity to each attribute definition declaration to allow new attributes to be added to tags that have attributes.
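The DTD is changed accordingly: each attribute definition is held in a parameter entity, and each attribute definition list ends with a reference to an entity, empty by default, through which further attributes can be added.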
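For a single element the result might look like this sketch. The entity names a.type and x.p, the element name p and the attribute name type are all hypothetical.

```sgml
<!ENTITY % p      "p">
<!ENTITY % type   "type">            <!-- attribute name -->
<!ENTITY % a.type "CDATA #IMPLIED">  <!-- attribute definition -->
<!ENTITY % x.p    "">                <!-- extra attributes: none -->

<!ELEMENT %p; - O (#PCDATA)>

<!-- Name and definition come from entities; the final reference
     expands to nothing unless "x.p" is redeclared locally. -->
<!ATTLIST %p;
          %type; %a.type;
          %x.p; >
```

A local declaration of a.type read before this one changes the allowed values or default of the attribute, and a local declaration of x.p adds further attributes to the tag.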
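The document itself can now use all of the capabilities that have been provided by these parameterizations of the DTD. Its declaration subset holds the local redefinitions together with the reference to %gNames, before the closing ]>. The text of the document is as before: First paragraph. Second paragraph. Third paragraph. Fourth paragraph.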
As a final example, we now restructure this to use external entities to group together the sets of definitions being applied by the document writer. One entity holds the name redefinitions and another holds the other redefinitions. The revised declaration subset then references %myNames, %gNames and %myDefs, in that order, before the closing ]>; the document itself is as in the last instance.
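The resulting document type declaration can be sketched as follows. The entity names myNames, gNames and myDefs and their order are taken from the text; the document type name and all file names are assumptions.

```sgml
<!-- Hypothetical document type name and file names. -->
<!DOCTYPE doc SYSTEM "simple.dtd" [
  <!ENTITY % myNames SYSTEM "mynames.ent">
  <!ENTITY % gNames  SYSTEM "names.ent">
  <!ENTITY % myDefs  SYSTEM "mydefs.ent">

  <!-- Local name redefinitions are read first and so take effect. -->
  %myNames;
  <!-- Global names fill in whatever was not redefined. -->
  %gNames;
  <!-- Other local redefinitions are read last in the subset, but
       still before the declarations in simple.dtd. -->
  %myDefs;
]>
```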
All of the versions of this example have been parsed by an SGML parser.