Constructing an XML Version of the TEI DTD

C. M. Sperberg-McQueen

An unpublished document.

No source: this electronic form is the original form.

1999-07-07 : CMSMcQ : stop suppressing att, gi, tag, val. They aren't in the main GI and they aren't given replacement declarations.
1999-07-07 : CMSMcQ : wrap declaration of set element in a marked section for TEI.drama
1999-07-06 : CMSMcQ : fix problem with PCDATA elements: their enclosure in marked sections was incomplete, and they lacked attlist declarations.
1999-06-17 : CMSMcQ : test other tag sets, break up long scraps
1999-06-03 : CMSMcQ : finish testing TEI.prose
1999-05-31 : CMSMcQ : finish alignment with Lou's list of changes to TEI P3, and now try to get all the scraps into the right driver scraps, and actually generate the extension files.
1999-05-14 : CMSMcQ : try to get all the scraps into the right driver scraps, and actually generate the extension files. (This didn't happen, because I got caught aligning with the revised TEI P3 DTD.)
1999-05-07 : CMSMcQ : finish manual-changes section. Now ready to generate the actual extension files. Discussion with Lou about sequencing the work and how to finish.
1999-05-04 : CMSMcQ : what is this again? huh? I'm supposed to finish this, but that requires understanding it again. Rats.
1999-02-23 : CMSMcQ : driving toward home. One by one, the tag sets in the manual-changes section fall to the advancing cursor. Terminology is going to have to be rethought and rewritten, almost as fully as the dictionary chapter.
1999-02-23 : CMSMcQ : working through the manual-changes section.
1999-02-22 : CMSMcQ : further work, in Oxford while waiting for LB to deal with other business. Mostly blocking out the manual-changes section
1999-01-12 : CMSMcQ : broaden scope to include general discussion of TEI DTD and XML (London, after EC meeting)
1998-01-10 : CMSMcQ : clean up some more (Chicago)
1998-01-08 : CMSMcQ : tag, clean up (still in Cambridge)
1998-01-08 : CMSMcQ : sketch imf, mf, m functions, put some notes into electronic form
1998-01-04 : CMSMcQ : began work (on paper) in Cambridge, Wisconsin

This unpublished document is distributed privately for comment by friends and colleagues; it is not now a formal publication and should not be quoted in published material. This document has not yet been reviewed by both editors of the TEI; what it says about the beliefs of the editors should be taken as a proposal by the author for the approval of his co-editor.

Abstract

This document describes issues involved in creating an XML version of the SGML document type definition (DTD) created by the Text Encoding Initiative, and proposes solutions. It defines a TEI extensions file which incorporates those solutions, in order to allow experimentation.

The discussion of inclusion exceptions defines a method of rewriting SGML content models so as to achieve effects similar to those provided by inclusion exceptions. To make an SGML document type definition compatible with XML, inclusion exceptions must be eliminated. The simplest method of ensuring that this change does not invalidate existing documents is to modify the content model of every element which can occur as a descendant of any element with inclusion exceptions in its content model, in the manner described here. That will ensure that elements named in inclusion exceptions remain legal in all the locations where they are currently legal.

The methods of changing content models described in this paper are believed to preserve determinism (what ISO 8879 calls lack of ambiguity) and to simulate the effects of inclusion exceptions properly. At this point, however, no proof of either conjecture is offered.

Introduction

XML and DTDs

The Extensible Markup Language (XML) defines a syntax for document type definitions similar to that provided by the Standard Generalized Markup Language (SGML), but more restrictive. In particular, XML allows neither inclusion nor exclusion exceptions, and prohibits the ampersand connector.

Modifying an existing SGML document type definition (DTD), such as the TEI DTD, to conform to XML thus involves: removing tag omissibility information; normalizing references to parameter entities by ensuring that they always end with a semicolon; removing & connectors; normalizing mixed-content models to the canonical form prescribed by XML (#PCDATA must come first, the list of sub-elements must be flat, and the occurrence indicator must be a star); removing exclusion exceptions; and removing inclusion exceptions.

Modifying the TEI DTD for XML

This document describes in detail the changes necessary to perform these modifications on the TEI DTD. The changes take the form of TEI modifications files suitable for use as the entities TEI.extensions.ent and TEI.extensions.dtd.

The modifications have different degrees of difficulty. Some affect the technical content of the TEI DTD in serious ways, and therefore require review by the TEI's Technical Review Committee before being formally integrated into TEI P3, while others do not affect the technical content of the TEI at all, or affect it only in minor ways. Changes of this latter type may be regarded as corrections of obvious simple errors, and may be performed by the editors under their authority to correct corrigible errors in the text of the Guidelines. (The concept of corrigible error is defined in document TEI ED W46 (?); in brief, a corrigible error is one which both editors agree is an error, which has an obvious fix, and the fix for which will not affect any existing data.) Each change proposed in this paper is identified as either a correction to a corrigible error, which the editors expect to fix in the course of preparing a revised and corrected reprint of TEI P3, or else a substantive change requiring review by the Technical Review Committee.

Overview of changes to the TEI DTD

Not all of the changes to the DTD are handled by this document. In particular, this document does not suppress the tag-omissibility indicators in the TEI DTD; that job is left to special-purpose software. In its current form, this document also does not completely normalize all mixed content models to the form required by XML. I started to make it do so, and have just realized that carthage may already do what is necessary. I need to find out for sure whether carthage does the job, and either complete or remove the partial sets of changes described for the mass redeclaration of all phrase.seq and paraContent elements. The changes that are handled here are summarized in the following overviews of the extensions files.

Provide default tagset declarations; Define TEI keywords; Fix placePart class; Reproduce class declarations for phrases; Reproduce inclusion classes; Reproduce classes used by specPara; Embed ent files for tag sets; Element class m.Incl; New specialPara; New declaration for phrase and phrase.seq; New declaration for paraContent; New declaration for component and component.seq; Suppress definitions of elements with ampersand; Suppress element declarations with exclusions; Suppress some mixed content elements; Suppress users of phrase.seq; Suppress standard definitions of PCDATA elements; Suppress definitions in core tag set; Suppress definitions in text-structure tag set; Suppress definitions in front-matter tag set; Suppress definitions in header tag set; Suppress definitions in verse tag set; Suppress definitions in drama tag set; Suppress definitions in spoken-text tag set; Suppress definitions in terminology tag set; Suppress definitions in segmentation and alignment tag set; Suppress definitions in analysis tag set; Suppress definitions in feature-structures tag set; Suppress definitions in text-criticism tag set; Suppress definitions in graphs tag set; Suppress definitions in tables tag set.

New definitions of elements with ampersand; Redeclare elements with mixed content models; New declarations for users of phrase.seq; New declarations for exclusion exceptions; New declarations for PCDATA elements; New declaration of set element; New definitions for core tag set; New definitions for text-structure tag set; New definitions for front-matter tag set; New definitions for header tag set; New definitions for verse tag set; New definitions for drama tag set; New definitions for spoken-text tag set; New definitions for terminology tag set; New definitions for flat terminology tag set; New definitions for segmentation and alignment tag set; New definitions for analysis tag set; New definitions for feature-structures tag set; New definitions for text-criticism tag set; New definitions for graphs tag set; New definitions for tables tag set.

Intended use of this document

The immediate goal of this document is to allow experimentation with the TEI DTD and XML processors, by providing the extensions files needed to make the full TEI P3 DTD work with XML processors. To use the extensions files created by this document with other extensions files (e.g. those of TEI Lite), manual merger of the extensions files is required. The editors plan to automate this merger as soon as possible; the following stages of development are anticipated: (1) produce extensions files from this document; (2) modify these extensions files to allow suppression or modification of individual elements, using the naming convention xml. + GI (e.g. xml.num, xml.recordingStmt, etc.); (3) modify carthage and the Pizza Chef web site to automate the merger of the extensions files. The following calculations will be needed: if the user's TEI.extensions.ent file suppresses an element type e, generate an entity declaration of the form <!ENTITY % xml.e 'IGNORE'> so as to suppress the XML version of that element. (Strictly speaking, this is unnecessary for elements not declared here, but working out whether such a declaration is needed looks like more work than we want to put into a short-term system.)
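The calculation just described might be sketched in Python roughly as follows. This is an illustration only, not part of carthage or the Pizza Chef: the function name xml_suppressions, the prefix parameter, and the regular expression are assumptions made for the example. The sketch treats every parameter entity declared as 'IGNORE' in the user's file as a suppressed element type, without working out whether an XML version of that element is actually declared here, which is the shortcut the text itself accepts.

```python
import re

# Matches declarations such as <!ENTITY % cit 'IGNORE' > in a user's
# TEI.extensions.ent file (single or double quotes are accepted).
IGNORE_DECL = re.compile(
    r"""<!ENTITY\s+%\s+([A-Za-z][A-Za-z0-9.\-]*)\s+(['"])IGNORE\2\s*>"""
)

def xml_suppressions(user_ent_text, prefix="xml."):
    """Hypothetical helper: for each suppressed name e, emit <prefix>e 'IGNORE'.

    The default prefix follows the naming convention given in the text;
    the declarations elsewhere in this document use the spelling XML.
    """
    names = [name for name, _quote in IGNORE_DECL.findall(user_ent_text)]
    return [f"<!ENTITY % {prefix}{name} 'IGNORE'>" for name in names]
```

Because the sketch cannot tell element names from tag-set keywords, a user file which sets, say, a tag-set entity to 'IGNORE' would also produce a (harmless but useless) declaration.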

A list of open questions is included at the end of the document.

Tag omissibility information

Removing tag omissibility information is a trivial task which can be accomplished by a DTD pretty printer, or even a simple editor script. The strings - -, - O, O -, and O O are legal in a DTD only as tag omissibility information, within comments, or within literals. In the TEI DTDs, they do not occur within literals or comments, so a global change in an editor would handle the problem.

To enable the necessary changes to be made with a minimum of manual intervention, however, it is probably better to add a run-time option to a DTD pretty printer, to make it suppress this information, or replace it with a reference to one of the parameter entities om.RR, om.RO, om.OR, or om.OO. If the run-time flag is set, the following entities will be added to the beginning of the DTD: ]]> The program carthago has accordingly been outfitted with two run-time options to suppress the omissibility markers, or to replace them with entity references.
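The two behaviours described here (suppressing the markers, or replacing them with entity references) can be illustrated with a small Python sketch. It is not the implementation used by the pretty printer; the function name strip_omissibility and the regular expression are assumptions made for the example, and the sketch assumes that each element declaration names a single element (or a single parameter-entity reference such as %n.cit;) rather than a name group.

```python
import re

# Map each pair of omissibility flags to the parameter entity named in the text.
OM_ENTITY = {
    ("-", "-"): "om.RR",
    ("-", "O"): "om.RO",
    ("O", "-"): "om.OR",
    ("O", "O"): "om.OO",
}

# <!ELEMENT, one name token, then the two flags, each followed by white space.
ELEMENT_FLAGS = re.compile(
    r"(<!ELEMENT\s+\S+\s+)([-O])\s+([-O])(\s+)", re.IGNORECASE
)

def strip_omissibility(dtd_text, use_entities=False):
    """Drop the flags, or replace them with a reference to om.RR etc."""
    def replace(match):
        head, start_flag, end_flag, space = match.groups()
        if not use_entities:
            return head
        entity = OM_ENTITY[(start_flag.upper(), end_flag.upper())]
        return f"{head}%{entity};{space}"
    return ELEMENT_FLAGS.sub(replace, dtd_text)
```

Restricting the match to the position immediately after the element name avoids relying on the observation that the flag strings never occur in comments or literals.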

Normalizing parameter-entity references

In the short term, we will normalize parameter-entity references using the pretty printer mentioned above (or else eliminate them entirely, by running the test DTD through a pre-processor like Carthage, which expands all parameter-entity references).

In the long run, we will systematically normalize all content models in the tagdocs of TEI P3 by adding semicolons to parameter-entity references which currently do not have them. N.B. the editors regard this as a correction of a corrigible error, and this normalization will be performed in the text of TEI P3 as soon as possible.
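As a rough illustration of the normalization, the following Python sketch adds the missing semicolon to parameter-entity references. The function name add_semicolons and the pattern are assumptions made for this example; the sketch also assumes that a percent sign immediately followed by a name is always a parameter-entity reference, which would not hold for percent signs inside comments or literals.

```python
import re

# A parameter-entity reference is '%' immediately followed by a name.
# The lookahead rejects references that already end in ';' and stops the
# match from ending in the middle of a name.
UNTERMINATED_PEREF = re.compile(
    r"%([A-Za-z][A-Za-z0-9.\-]*)(?![A-Za-z0-9.\-;])"
)

def add_semicolons(text):
    """Terminate every unterminated parameter-entity reference with ';'."""
    return UNTERMINATED_PEREF.sub(r"%\1;", text)
```

The percent sign in an entity declaration (<!ENTITY % name ...>) is followed by white space, not by a name, so declarations are left alone.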

Ampersand connectors

Removing ampersand connectors involves either rewriting the content model as a set of alternative sequence groups (thus retaining strict equivalence with the existing model) or revising the content model entirely. In the case of the TEI, the editors both agree that most uses of & have proven to be design errors, so we propose simply to revise the content models.
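For comparison, the first option (rewriting as alternative sequence groups) can be sketched in Python as below. This is not the approach taken in this document, which revises the content models instead. The function name expand_and is an assumption, and the sketch assumes that the operands of the & connector are distinct, required tokens; under that assumption the factored form it produces (each alternative begins with a different token) stays deterministic, whereas simply listing all permutations side by side would not for three or more operands.

```python
def expand_and(tokens):
    """Rewrite (t1 & t2 & ...) as nested alternatives of sequences.

    Each token is chosen in turn as the first item, followed by the
    expansion of the remaining tokens.
    """
    if not tokens:
        raise ValueError("an & group needs at least one operand")
    if len(tokens) == 1:
        return tokens[0]
    alternatives = []
    for index, token in enumerate(tokens):
        rest = tokens[:index] + tokens[index + 1:]
        alternatives.append(f"({token}, {expand_and(rest)})")
    return "(" + " | ".join(alternatives) + ")"
```

The size of the result grows factorially with the number of operands, which is one practical reason to prefer revising the model.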

The following content models use ampersand connectors in TEI P3: cit (part of the core), respStmt (part of the core), publicationStmt (part of the header), and graph (part of the additional tag set for networks and graphs).

In this section, we provide alternate declarations for each of them. In the entity extensions file we must first suppress all of them: <!ENTITY % cit 'IGNORE' > <!ENTITY % respStmt 'IGNORE' > <!ENTITY % publicationStmt 'IGNORE' > <!ENTITY % graph 'IGNORE' > And in the DTD extensions file we must redefine them all: New cit declaration, New respStmt declaration, New publicationStmt declaration, New graph declaration.

N.B. All the ampersand-eliminating content-model changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.

The cit element

The standard declaration for cit is as follows: ]]> We will redefine it with a slightly more general content model (well, almost -- see below): <!ENTITY % XML.cit "INCLUDE" > <![%XML.cit;[ <!ELEMENT %n.cit; - - ((%n.q; | %n.quote; | %m.bibl; | %m.loc; | %m.Incl;)+) > <!ATTLIST %n.cit; %a.global; TEIform CDATA 'cit' > ]]> (The Incl class included here has to do with inclusion exceptions; see below.) If we wished to replicate precisely the original content model, without the ampersand, we could define cit thus: ]]>

As it turns out, however, the declaration proposed above is ambiguous, since link is a member of both the loc and Incl classes. We'll have to unroll one or the other of these two classes; a coin toss decides that we should unroll loc. <!ENTITY % XML.cit "INCLUDE" > <![%XML.cit;[ <!ELEMENT %n.cit; - - ((%n.q; | %n.quote; | %m.bibl; | %n.ptr; | %n.ref; | %n.xptr; | %n.xref; | %m.Incl;)+) > <!ATTLIST %n.cit; %a.global; TEIform CDATA 'cit' > ]]>

After further investigation (i.e. further attempts to use the DTD produced by a draft of this paper), however, it becomes clear that loc is a subclass of phrase, so that every content model which uses both the phrase class and the Incl class is going to have troubles. So instead of unrolling each case individually, we take a harsher approach, and remove link from the loc class.  <!ENTITY % x.loc '' > <!ENTITY % m.loc '%x.loc; %n.ptr; | %n.ref; | %n.xptr; | %n.xref;' > This should not cause problems for any existing data, since link is still a member of the class Incl, which is (after all) allowed virtually everywhere.

The respStmt element

Similarly, we could replicate the original definition of respStmt if we wished, but it's probably better regarded as a design error to be fixed: ]]> We give it a simpler and looser declaration instead: <!ENTITY % XML.respStmt "INCLUDE" > <![%XML.respStmt;[ <!ELEMENT %n.respStmt; - O (%n.resp; | %n.name; | %m.Incl;)+ > <!ATTLIST %n.respStmt; %a.global; TEIform CDATA 'respStmt' > ]]> The prose should make clear that in principle, a respStmt should have at least one resp and at least one name. Enforcing that with the content model may be more pedantic than we want to be, though.

The publicationStmt element

The content model for publicationStmt includes an editorial error I am glad to have the occasion to fix. (In normal bibliographic practice, when place and publisher are both given, the place is given first. I don't know what got into me that morning.) ]]> Rather than simply replace the current content model with an equivalent ampersand-less expression, we'll change it. For compatibility with existing data, we'll make the new expression loose rather than tight. <!ENTITY % XML.publicationStmt "INCLUDE" > <![%XML.publicationStmt;[ <!ELEMENT %n.publicationStmt; - O ( (%n.p;, (%m.Incl;)*)+ | ((%n.publisher; | %n.distributor; | %n.authority; | %n.pubPlace; | %n.address; | %n.idno; | %n.availability; | %n.date;), (%m.Incl;)*)+ ) > <!ATTLIST %n.publicationStmt; %a.global; TEIform CDATA 'publicationStmt' > ]]>

The graph element

The graph element uses the content model to require that graphs be encoded nodes-first or arcs-first, but not mixed hugger-mugger. We'll retain that characteristic. The old declaration is this: ]]> We could require arbitrarily that all nodes come first; it's not clear whether any legacy data using graph actually exists. But in the interests of backward compatibility, the new content model might as well allow precisely what the old one did, even if that now seems like a design error: <![%TEI.nets;[ <!ENTITY % XML.graph "INCLUDE" > <![%XML.graph;[ <!ELEMENT %n.graph; - - (((%n.node;, (%m.Incl;)*)+, (%n.arc;, (%m.Incl;)*)*) | ((%n.arc;, (%m.Incl;)*)+, (%n.node;, (%m.Incl;)*)+)) > <!ATTLIST %n.graph; %a.global; type CDATA #IMPLIED label CDATA #IMPLIED order NUMBER #IMPLIED size NUMBER #IMPLIED TEIform CDATA 'graph' > ]]> ]]>

Normalizing mixed-content models

Individual elements

The following elements use the keyword #PCDATA in ways that must be changed to be legal in XML: sense (dictionaries), re (dictionaries), persName (names and dates), placeName (names and dates), geogName (names and dates), dateStruct (names and dates), timeStruct (names and dates), and dateline (default text structure). In most of these cases, the #PCDATA keyword is given last, not first, in the content model; in one or two, it's neither first nor last. For example: ]]> In one or two cases, the group also has a plus operator instead of a star operator. ]]>

We must redeclare each of them, which means first of all that we must suppress their standard declarations: <!ENTITY % sense 'IGNORE' > <!ENTITY % re 'IGNORE' > <!ENTITY % persName 'IGNORE' > <!ENTITY % placeName 'IGNORE' > <!ENTITY % geogName 'IGNORE' > <!ENTITY % dateStruct 'IGNORE' > <!ENTITY % timeStruct 'IGNORE' > <!ENTITY % dateline 'IGNORE' > and separately we must redefine them: <![%TEI.dictionaries;[ New mixed-content declarations for dictionaries ]]> <![%TEI.names.dates;[ New mixed-content declarations for names and dates ]]> New mixed-content declarations for structure

Since the normalization is purely mechanical, there seems to be no need to reproduce the original declarations here. The new declarations are given below.

N.B. All the mixed-content normalization changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.

Two elements in this group are from the dictionary tag set: <!ENTITY % XML.sense "INCLUDE" > <![%XML.sense;[ <!ELEMENT %n.sense; - - (#PCDATA | %n.sense; | %m.dictionaryTopLevel; | %m.phrase; | %m.Incl;)* > <!ATTLIST %n.sense; %a.global; %a.dictionaries; level NUMBER #IMPLIED TEIform CDATA 'sense' > ]]> <!ENTITY % XML.re "INCLUDE" > <![%XML.re;[ <!ELEMENT %n.re; - O (#PCDATA | %n.sense; | %m.dictionaryTopLevel; | %m.phrase; | %m.Incl;)* > <!ATTLIST %n.re; %a.global; %a.dictionaries; type CDATA #IMPLIED TEIform CDATA 're' > ]]> Note that the standard declaration for re also has an exclusion exception which has been dropped silently here. N.B. Elimination of exclusion exceptions is not a corrigible error; the version of this declaration which will go into TEI P3 without review is this: ]]>

The other elements in this group are from the tag set for names and dates. <!ENTITY % XML.persName "INCLUDE" > <![%XML.persName;[ <!ELEMENT %n.persName; - - (#PCDATA | %m.personPart; | %m.phrase; | %m.Incl;)* > <!ATTLIST %n.persName; %a.global; %a.names; type CDATA #IMPLIED TEIform CDATA 'persName' > ]]> <!ENTITY % XML.placeName "INCLUDE" > <![%XML.placeName;[ <!ELEMENT %n.placeName; - - (#PCDATA | %m.placePart; | %m.phrase; | %m.Incl;)* > <!ATTLIST %n.placeName; %a.global; type CDATA #IMPLIED full (yes | abb | init) yes %a.names; TEIform CDATA 'placeName' > ]]> <!ENTITY % XML.geogName "INCLUDE" > <![%XML.geogName;[ <!ELEMENT %n.geogName; - - (#PCDATA | %n.geog; | %n.name; | %m.Incl;)* > <!ATTLIST %n.geogName; %a.global; %a.placePart; TEIform CDATA 'geogName' > ]]> <!ENTITY % XML.dateStruct "INCLUDE" > <![%XML.dateStruct;[ <!ELEMENT %n.dateStruct; - - (#PCDATA | %m.temporalExpr; | %m.Incl;)* > <!ATTLIST %n.dateStruct; %a.global; %a.temporalExpr; calendar CDATA #IMPLIED exact CDATA #IMPLIED TEIform CDATA 'dateStruct' > ]]> <!ENTITY % XML.timeStruct "INCLUDE" > <![%XML.timeStruct;[ <!ELEMENT %n.timeStruct; - - (#PCDATA | %m.temporalExpr; | %m.Incl;)* > <!ATTLIST %n.timeStruct; %a.global; %a.temporalExpr; zone CDATA #IMPLIED TEIform CDATA 'timeStruct' > ]]>

The dateline element (from the default text-structure tag set) is the last one needing a mixed-content fix: <!ENTITY % XML.dateline "INCLUDE" > <![%XML.dateline;[ <!ELEMENT %n.dateline; - O (#PCDATA | %n.date; | %n.time; | %n.name; | %n.address; | %m.Incl;)* > <!ATTLIST %n.dateline; %a.global; TEIform CDATA 'dateline' > ]]>

The entities phrase and phrase.seq

The XML rules for mixed-content models also require that the declarations for phrase and phrase.seq be changed slightly. The current definitions are: ]]> These give us one level too many of parentheses; we need to remove the parentheses from the entity phrase: <!ENTITY % phrase '#PCDATA | %m.phrase;' > <!ENTITY % phrase.seq '(%phrase;)*' >

N.B. This change to the declaration of phrase is regarded by the editors as the correction of a corrigible error, and will be integrated into the text of TEI P3 as soon as possible.

The element dictAnomaly is new; for a description, see below, section The problem of the dictionary chapter.

We need to declare the name of dictAnomaly. <!ENTITY % n.dictAnomaly 'dictAnomaly' >

Elements using phrase.seq and paraContent

Note that neither phrase.seq nor paraContent may be combined with other elements in a content model, in XML, because of the XML requirement that mixed content models not have nested groups. This affects the declarations for castItem (in drama), docImprint (in front matter), catDesc (in the header), byline (in default text structure), opener (in default text structure), closer (in default text structure), form (in dictionaries), gramGrp (in dictionaries), trans (in dictionaries), etym (in dictionaries), and xr (in dictionaries).

These must be suppressed, in order to be redeclared: <!ENTITY % castItem 'IGNORE' > <!ENTITY % docImprint 'IGNORE' > <!ENTITY % catDesc 'IGNORE' > <!ENTITY % byline 'IGNORE' > <!ENTITY % opener 'IGNORE' > <!ENTITY % closer 'IGNORE' > <!ENTITY % form 'IGNORE' > <!ENTITY % gramGrp 'IGNORE' > <!ENTITY % trans 'IGNORE' > <!ENTITY % etym 'IGNORE' > <!ENTITY % xr 'IGNORE' >

And they need to be redefined, tag set by tag set. (We put elements from each tag set into separate scraps to simplify production of specialized modification files.) New castItem New docImprint New catDesc New opener and closer New phrase.seq elements for dictionaries

Next the tag set for front matter: <!ENTITY % XML.docImprint "INCLUDE" > <![%XML.docImprint;[ <!ELEMENT %n.docImprint; - O (#PCDATA | %m.phrase; | %n.pubPlace; | %n.docDate; | %n.publisher; | %m.Incl;)* > <!ATTLIST %n.docImprint; %a.global; TEIform CDATA 'docImprint' > ]]> Then, the header: <!ENTITY % XML.catDesc "INCLUDE" > <![%XML.catDesc;[ <!ELEMENT %n.catDesc; - O (#PCDATA | %m.phrase; | %n.textDesc;)* > <!ATTLIST %n.catDesc; %a.global; TEIform CDATA 'catDesc' > ]]> And the default text-structure tag set: <!ENTITY % XML.byline "INCLUDE" > <![%XML.byline;[ <!ELEMENT %n.byline; - O (#PCDATA | %m.phrase; | %n.docAuthor; | %m.Incl;)* > <!ATTLIST %n.byline; %a.global; TEIform CDATA 'byline' > ]]> <!ENTITY % XML.opener "INCLUDE" > <![%XML.opener;[ <!ELEMENT %n.opener; - O (#PCDATA | %m.phrase; | %n.argument; | %n.byline; | %n.epigraph; | %n.signed; | %n.dateline; | %n.salute; | %m.Incl;)* > <!ATTLIST %n.opener; %a.global; TEIform CDATA 'opener' > ]]> <!ENTITY % XML.closer "INCLUDE" > <![%XML.closer;[ <!ELEMENT %n.closer; - O (#PCDATA | %m.phrase; | %n.signed; | %n.dateline; | %n.salute; | %m.Incl;)* > <!ATTLIST %n.closer; %a.global; TEIform CDATA 'closer' > ]]>

And finally the base tag set for dictionaries; unlike the preceding elements, these all use paraContent, not phrase.seq. N.B. these content models will require further changes before publication. See below, The problem of the dictionary chapter. <![%TEI.dictionaries;[ <!ENTITY % XML.form "INCLUDE" > <![%XML.form;[ <!ELEMENT %n.form; - - (#PCDATA | %m.phrase; | %m.inter; | %m.formInfo; | %m.Incl;)* > <!ATTLIST %n.form; %a.global; %a.dictionaries; type CDATA #IMPLIED TEIform CDATA 'form' > ]]> <!ENTITY % XML.gramGrp "INCLUDE" > <![%XML.gramGrp;[ <!ELEMENT %n.gramGrp; - - (#PCDATA | %m.phrase; | %m.inter; | %m.gramInfo; | %m.Incl;)* > <!ATTLIST %n.gramGrp; %a.global; %a.dictionaries; TEIform CDATA 'gramGrp' > ]]> <!ENTITY % XML.trans "INCLUDE" > <![%XML.trans;[ <!ELEMENT %n.trans; - O (#PCDATA | %m.phrase; | %m.inter; | %m.dictionaryParts; | %m.Incl;)* > <!ATTLIST %n.trans; %a.global; %a.dictionaries; TEIform CDATA 'trans' > ]]> <!ENTITY % XML.etym "INCLUDE" > <![%XML.etym;[ <!ELEMENT %n.etym; - O (#PCDATA | %m.phrase; | %m.inter; | %n.usg; | %n.lbl; | %n.def; | %n.trans; | %n.tr; | %m.morphInfo; | %n.eg; | %n.xr; | %m.Incl;)* > <!ATTLIST %n.etym; %a.global; %a.dictionaries; TEIform CDATA 'etym' > ]]> <!ENTITY % XML.xr "INCLUDE" > <![%XML.xr;[ <!ELEMENT %n.xr; - O (#PCDATA | %m.phrase; | %m.inter; | %n.usg; | %n.lbl; | %m.Incl;)* > <!ATTLIST %n.xr; %a.global; %a.dictionaries; type CDATA #IMPLIED TEIform CDATA 'xr' > ]]> ]]>

Since paraContent also occurs in the definition of specialPara, in a form not legal in XML, the specialPara entity must also be redefined; see below, The problem of specialPara elements.

Exceptions

Removing inclusion and exclusion exceptions typically involves changing the set of documents accepted by the DTD. (If the set of inclusions and the set of exclusions on the exception stack are always the same for every possible occurrence of every element type in the DTD, then an exception-free DTD can be created which accepts exactly the same set of documents as the original DTD. A DTD which had exceptions only on the root element type, for example, could be replicated without changing the language it accepts. I am not aware of any production DTDs which fall into this class.) In the discussion which follows, I assume that our goal is to ensure that every document legal in the original DTD remains legal in the modified DTD. The changes will cause the modified DTD to accept some other documents which are not valid instances of the original DTD. That is, if the original DTD is taken as an absolutely correct definition of a language, the revised DTD will overgenerate. (One could take the converse goal of ensuring that the revised DTD be at least as selective as the original DTD, i.e. that it undergenerate with respect to the original language. This would be interesting as an exercise, but if applied to the TEI DTD it would invalidate existing TEI data, which makes it unacceptable as an approach to creating an XML-conformant version of the TEI DTD.) We will wish to keep the overgeneration to a minimum, but in general we cannot eliminate it entirely, since inclusion and exclusion exceptions do extend the expressive power of the DTD notation. (This is clearly established by Wood and Kilpeläinen, though they inexplicably claim to have proven the opposite.)

Exclusions

Rewriting declarations without exclusion exceptions involves simply removing the exception, and adding an application-specific constraint to be checked outside the SGML parser, that says the excluded element types must not occur within the element type which excluded them. Thus, for example, the TEI s element (for end-to-end segmentation on the level of the orthographic sentence) is currently declared thus: ]]> An XML-compatible TEI DTD would replace this with: ]]> The important change here, for present purposes, is the removal of the exclusion exception. In addition, we have removed the tag omissibility indicators and the parentheses around phrase.seq, for reasons that should be clear from other portions of this document.

It would be possible to simulate the effect of exclusion exceptions by modifying the content models of possible descendants of s, so as to remove s from their content model; for elements which can occur both as parents and as descendants of s, however, this change would render some existing documents illegal; it is thus not pursued further here.

The following elements have exclusion exceptions in TEI P3: s (excludes s), speaker (excludes speaker), stage (excludes stage), hom (excludes entry), and re (excludes re).
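The application-specific constraint mentioned above, checked outside the parser, might look something like the following Python sketch. It is an illustration under stated assumptions, not a tool described in this document: the function name exclusion_violations is hypothetical, the table simply transcribes the five exclusions listed above, and the sketch assumes that element names in the document instance are the plain TEI generic identifiers (no renaming and no namespaces).

```python
import xml.etree.ElementTree as ET

# Exclusion exceptions listed above for TEI P3:
# element type -> element types it excludes from all of its descendants.
EXCLUSIONS = {
    "s": {"s"},
    "speaker": {"speaker"},
    "stage": {"stage"},
    "hom": {"entry"},
    "re": {"re"},
}

def exclusion_violations(root):
    """Return (excluded_tag, excluding_ancestor_tag) pairs found under root."""
    violations = []

    def walk(elem, banned):
        # banned maps each currently excluded tag to the ancestor excluding it.
        if elem.tag in banned:
            violations.append((elem.tag, banned[elem.tag]))
        inner = dict(banned)
        for excluded in EXCLUSIONS.get(elem.tag, ()):
            inner[excluded] = elem.tag
        for child in elem:
            walk(child, inner)

    walk(root, {})
    return violations
```

For example, an s nested at any depth inside another s is reported, while two sibling s elements are not.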

The new declarations are precisely the same as the old declarations, only without the exclusions: <![ %TEI.analysis; [ <!ENTITY % XML.s "INCLUDE" > <![%XML.s;[ <!ELEMENT %n.s; - - %phrase.seq; > <!ATTLIST %n.s; %a.global; %a.seg; TEIform CDATA 's' > ]]> ]]> <!ENTITY % XML.speaker "INCLUDE" > <![%XML.speaker;[ <!ELEMENT %n.speaker; - O %phrase.seq; > <!ATTLIST %n.speaker; %a.global; TEIform CDATA 'speaker' > ]]> <!ENTITY % XML.stage "INCLUDE" > <![%XML.stage;[ <!ELEMENT %n.stage; - - %specialPara; > <!ATTLIST %n.stage; %a.global; type CDATA mix TEIform CDATA 'stage' > ]]>

And they have to be excluded from the base DTD: <!ENTITY % s 'IGNORE' > <!ENTITY % speaker 'IGNORE' > <!ENTITY % stage 'IGNORE' >

A new definition of re has already been given above, in the context of normalizing mixed-content models. The new definition of hom would be as follows: ]]> The actual form to be used for hom in an XML DTD, however, varies from this, as described below in The problem of the dictionary chapter.

Inclusions

Removing inclusion exceptions requires simulating their effect in the content model of each element type which can occur as a descendant of the element type bearing the inclusions. This section discusses: the effect of inclusions on the language accepted by a content model; gaining that effect by modifying a finite-state automaton; gaining that effect by modifying a content-model group; and examples. A brief note on the notation used is given in an appendix.

The Effect of Inclusions

Inclusions make included elements legal at any location in a content model, without however changing the requirements of the basic content model, which must still be fulfilled. (For now, I make the simplifying assumption that the set of included elements and the set of elements named in the content model are disjoint. When they are not, special considerations will apply, because of SGML's requirement that content models be deterministic.)

We can summarize the effect of inclusions very simply if we think of an FSA recognizing a content model: included elements do not change the state of the FSA. So to change an FSA without inclusions to an FSA that accepts the same language, except that it also allows the inclusion of any element i in the set of inclusions I:

    for each state s in the FSA {
        for each element i in I {
            add a transition from s to s, on i
        }
    }
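The procedure can be made concrete with a short Python sketch. The representation of the FSA (a dictionary from state to a dictionary of outgoing transitions) and the names add_inclusions and accepts are assumptions made for the example. In keeping with the simplifying assumption stated earlier, the sketch refuses to proceed if an included element already labels a transition.

```python
def add_inclusions(transitions, inclusions):
    """Return a copy of the FSA with a self-loop on every state for each inclusion.

    transitions: {state: {symbol: next_state}}; every state must be a key.
    """
    result = {state: dict(arcs) for state, arcs in transitions.items()}
    for state, arcs in result.items():
        for i in inclusions:
            if i in arcs:
                # The included set is assumed disjoint from the model's symbols.
                raise ValueError(f"{i!r} already labels a transition from {state!r}")
            arcs[i] = state  # included elements do not change the state
    return result

def accepts(transitions, start, finals, symbols):
    """Run the deterministic FSA over a sequence of element names."""
    state = start
    for symbol in symbols:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in finals
```

With the automaton for the content model (a, b), adding the inclusion x allows x before, between, and after a and b, while a and b remain required.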

The Function imf()

We can characterize the language recognized using inclusion exceptions this way. Let us construct a function imf(E,I) which maps from a regular expression E and a set of inclusions I to a new regular expression E'. Ideally we want the following to be true: E' is deterministic if E is deterministic; and L(E) ⊆ L(E').

In general, for sequences of terminals x, y in Σ*: If x is in L(E) then x is in L(E'). If xy is in L(E) and i is in I then xiy is in L(E').

My best cut so far at defining such a function relies in some places on a couple of auxiliary functions. So let us define functions imf(E), mf(E), and m(E) (where i is for initial, m for medial, f for final). Strictly speaking, these ought perhaps to be imf(E,I), mf(E,I), and m(E,I), but for purposes of this paper we will never need different sets of inclusions I. So if it matters, we can define imf(E) formally as imf(E,I), etc. The three functions differ as follows: imf(E) makes the claim about xiy true for all x, y in Σ*; mf(E) makes it true for x in Σ+ and y in Σ*; m(E) makes it true for x, y in Σ+. Equivalently, we can say that any element i in I can appear initially, medially, or finally in imf(E), medially or finally (but not initially) in mf(E), and medially (but not initially or finally) in m(E).

The care we have to take with initial and final positions results from the SGML rules about determinism, but also helps keep the resulting expressions simpler than they'd be if we just slapped (I*) in everywhere in the content model.

Here is a first cut at defining the functions. In a number of circumstances, they are undefined; it might perhaps be useful, therefore, to define a simple normalization on (ampersand-free) content models, which would ensure that the functions are always defined.

If E is the empty set, then the content model in question cannot be satisfied; this would be the case if a DTD which lacked any element called nonesuch nevertheless included an element which required it as a subelement. Given that we want L(E) ⊆ L(E'), we must define imf etc. thus for this case:

imf(E) = the empty set
mf(E) = the empty set
m(E) = the empty set

An element may accept the empty string as its content in either of two ways. First, the element may be declared EMPTY: in this case, inclusions are not legal inside the element.

imf(E) = the empty string
mf(E) = the empty string
m(E) = the empty string

Second, the element's content model may accept the empty string, either because all subelements are optional or because the content model may be satisfied by #PCDATA: in this case, inclusions are legal within the element.

imf(E) = I*
mf(E) is undefined
m(E) is undefined

If E is an atomic symbol, e.g. a, then

m(E) = E = a
mf(E) = (m(E), I*) = (a, I*)
imf(E) = (I*, mf(E)) = (I*, a, I*)

If E has the form F?, and F is not nullable (does not accept the empty string), then

m(E) = m(F)?
mf(E) = (m(F), I*)?
imf(E) = I*, mf(E) = I*, (m(F), I*)?

Note that we require F to be non-nullable in order to preserve determinism.

If E has the form F?, and F is nullable, then

m(E) = m(F)
mf(E) = mf(F)
imf(E) = imf(F)

In other words, if F is nullable, the ? is redundant and may be stripped without loss of information.

If E has the form F+, and F is not nullable, then

m(E) = (m(F), (I*, m(F))*)
mf(E) = (m(F), I*)+
imf(E) = I*, mf(E) = I*, (m(F), I*)+

If E has the form F+, and F is nullable, then

m(E) = m(F*)
mf(E) = mf(F*)
imf(E) = imf(F*)

If E has the form F*, and F is not nullable, then

m(E) = (m(F), (I*, m(F))*)?
mf(E) = (m(F), I*)*
mf(E) = (mf(F))*
imf(E) = (m(F) | I)*

If E has the form F*, and F is nullable, then

m(E) is undefined
mf(E) is undefined
imf(E) = (m(F) | I)*

If E has the form (F,G), then

m(E) = mf(F), m(G), if and only if G is not nullable, else undefined
mf(E) = mf(F), mf(G)
imf(E) = imf(F), mf(G)
imf(E) = I*, mf(E) = I*, mf(F), mf(G)

If E has the form (F&G), then

m(E) = m(F,G)|m(G,F)
mf(E) = m(F&G), I*
imf(E) = I*, m(F&G), I*
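As a rough check on these definitions, here is a minimal Python sketch of m, mf, and imf for atomic symbols, ?, +, *, and binary sequences. It is an illustration, not a finished tool: the tuple representation of expressions, the helper names, and the Undefined exception are all assumptions of the sketch; alternation and & groups are not handled (an alternation of atoms can be passed in as a single pre-parenthesized symbol); and where the text gives two definitions, the sketch uses the first one.

```python
class Undefined(Exception):
    """Raised for the cases the text marks as undefined."""


INCL = "I*"  # zero or more included elements


def nullable(e):
    kind = e[0]
    if kind == "sym":
        return False
    if kind in ("opt", "star"):
        return True
    if kind == "plus":
        return nullable(e[1])
    return nullable(e[1]) and nullable(e[2])  # seq


def wrap(terms):
    return "(" + ", ".join(terms) + ")"


def group(terms):
    return terms[0] if len(terms) == 1 else wrap(terms)


def m(e):
    kind, f = e[0], e[1]
    if kind == "sym":
        return [f]
    if kind == "seq":
        if nullable(e[2]):
            raise Undefined("m(F,G) needs a non-nullable G")
        return mf(f) + m(e[2])
    if kind == "opt":
        return m(f) if nullable(f) else [group(m(f)) + "?"]
    if nullable(f):
        raise Undefined("m(F+) and m(F*) need a non-nullable F")
    tail = wrap([INCL] + m(f)) + "*"
    if kind == "plus":
        return m(f) + [tail]
    return [wrap(m(f) + [tail]) + "?"]


def mf(e):
    kind, f = e[0], e[1]
    if kind == "sym":
        return [f, INCL]
    if kind == "seq":
        return mf(f) + mf(e[2])
    if kind == "opt":
        return mf(f) if nullable(f) else [wrap(m(f) + [INCL]) + "?"]
    if nullable(f):
        raise Undefined("mf(F+) and mf(F*) need a non-nullable F")
    return [wrap(m(f) + [INCL]) + ("+" if kind == "plus" else "*")]


def imf(e):
    kind, f = e[0], e[1]
    if kind == "seq":
        return imf(f) + mf(e[2])
    if kind == "opt" and nullable(f):
        return imf(f)
    if kind == "star" or (kind == "plus" and nullable(f)):
        return ["(" + group(m(f)) + " | I)*"]
    return [INCL] + mf(e)  # atomic symbols, F? and F+ with non-nullable F


def render(terms):
    """Join the terms of a sequence into one content-model string."""
    return group(terms)


a, b = ("sym", "a"), ("sym", "b")
```

For instance, render(imf(("seq", a, b))) yields (I*, a, I*, b, I*), and m(("star", ("opt", a))) raises Undefined, as the rules require. Because the sketch raises rather than guessing, it makes visible exactly where a normalization of the input content model would be needed.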

Examples

Let's do some simple examples, abstracted from the TEI.

Simple Examples

(a,b) ==> (I*, a, I*, b, I*) (TEI.2 has this structure.)

(a,b+) ==> (I*, a, I*, (b, I*)+) (teiCorpus.2 has this structure.)

(a*) ==> (a | I)* (spanGrp and many other elements have this structure.)

(#PCDATA | a | b | c | d)* (%paraContent et al.)
==> (m(#PCDATA | a | b | c | d) | I)*
==> ((m(#PCDATA) | m(a) | m(b) | m(c) | m(d)) | I)*
==> ((#PCDATA | a | b | c | d) | I)*
==> (#PCDATA | a | b | c | d | I)*

a+ ==> (I*, (a, I*)+)

(a|b)+ ==> (I*, ((a|b), I*)+)

A Complex Example: back

The element back is defined thus:

Removing the parameter entities and using single-letter identifiers, we can rewrite the content model this way to show its structure a little more clearly:

( (a | b | c)*,
  ( ( (d | e | f), (d | e | f | g)* )
  | ( (h), (h | (a | b | c))* )
  | ( (i), (i | (a | b | c))* ) )? )

Or more compactly:

( (a | b | c)*,
  ( ( (d | e | f), (d | e | f | g)* )
  | ( h, (h | a | b | c)* )
  | ( i, (i | a | b | c)* ) )? )

i.e. E has the form F,G where F=(a|b|c)* and G=(((d|e|f) ... (i|a|b|c)*))?. So imf(E) = imf(F), mf(G).

Now, F is simple: imf((a|b|c)*) = (a | b | c | I)*

But mf(G) requires more work.

G = H? where

H = ( ( (d | e | f), (d | e | f | g)* )
    | ( h, (h | a | b | c)* )
    | ( i, (i | a | b | c)* ) )

So mf(G) = (m(H), I*)?

H in turn is an alternation of three sequences, each of the form (x, (y|z)*). This leads to a problem, because the final term in each sequence is nullable; we will have a determinism conflict with the trailing I*.

So we add a new definition of mf(E) where E = F?. mf(F?) = mf(F)?

Applied to G, we have: mf(G) = (mf(H))?, with H = (J | K | L).

So mf(H) = ((m(J) | m(K) | m(L)), I*)

But J, K, and L don't have m() forms, since their final term is nullable. So we use the alternate definition:

mf(H) = (mf(J) | mf(K) | mf(L))

We have the following:

J = ( (d | e | f), (d | e | f | g)* )
mf(J) = ( (d | e | f), I*, (d | e | f | g | I)*)

K = ( h, (h | a | b | c)* )
mf(K) = ( h, I*, (h | a | b | c | I)* )

L = ( i, (i | a | b | c)* )
mf(L) = ( i, I*, (i | a | b | c | I)* )

So

mf(H) = ( ( (d | e | f), I*, (d | e | f | g | I)*)
        | ( h, I*, (h | a | b | c | I)* )
        | ( i, I*, (i | a | b | c | I)* ) )

Recall that mf(G) = (mf(H))?.

So

mf(G) = ( ( (d | e | f), I*, (d | e | f | g | I)*)
        | ( h, I*, (h | a | b | c | I)* )
        | ( i, I*, (i | a | b | c | I)* ) )?

and

imf(E) = imf(F), mf(G)
       = ( (a | b | c | I)*,
           ( ( (d | e | f), I*, (d | e | f | g | I)*)
           | ( h, I*, (h | a | b | c | I)* )
           | ( i, I*, (i | a | b | c | I)* ) )? )

Or, in content model terms (using the usual TEI conventions for names of element classes):

I think we've got a system we can use manually, though I don't know for sure how to make it a program, given the problems we have defining some of the functions.

Removing inclusions in TEI P3

The following elements have inclusion exceptions in TEI P3 (as of September 1994):

entry (includes anchor)

entryFree (includes %m.dictionaryParts; | %m.phrase; | %m.inter;)

eg (includes %m.dictionaryParts; | %m.formPointers;)

orgName (includes orgtitle, orgtype, and orgdivn)

text (includes %m.globincl;, i.e. alt, altGrp, cb, certainty, fLib, fs, fsLib, fvLib, index, interp, interpGrp, join, joinGrp, lb, link, linkGrp, milestone, pb, respons, span, spanGrp, and timeline)

lem (includes %m.fragmentary;, i.e. lacunaEnd, lacunaStart, witEnd, and witStart)

rdg (includes %m.fragmentary;)

termEntry (the version in the nested DTD includes %m.terminologyInclusions;, i.e. date, dateStruct, note, ptr, ref, xptr, and xref)

The inclusions on entry, entryFree, and eg will be taken care of separately, in the section on the dictionary chapter.

The inclusions on orgName were dropped in October 1994 (though this change has not been propagated to any public version of the DTD), and so we will ignore them.

The inclusions on text must be propagated to all potential descendants of text.

The inclusions on lem and rdg must be propagated to all potential descendants; it might be possible to do without these, but it's probably not worth the effort.

Note that in the case of terminologyInclusions, the set of inclusions is not disjoint from the set of children named directly in content models.

Study of the full TEI DTD shows that the sets of possible descendants of text, lem, rdg, and termEntry are all identical. This is not surprising given that text is recursive.
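The observation about recursion can be illustrated with a small reachability computation. The child map below is a hypothetical toy fragment invented for the sketch, not the real TEI content models; the point is only that once text is reachable from one of its own descendants, every element that can reach text shares the same descendant set.

```python
def descendants(children, root):
    """Return every element type reachable from root through the child map."""
    seen = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical toy fragment: q may contain text, which makes text recursive.
children = {
    "text": ["body"],
    "body": ["p"],
    "p": ["q", "app"],
    "q": ["text"],
    "app": ["lem", "rdg"],
    "lem": ["p"],
    "rdg": ["p"],
}
```

In this toy map, descendants(children, "text") and descendants(children, "lem") are the same set, and each contains text itself.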

The 263 elements in this set fall into the following groups:

52 elements declared EMPTY: addSpan, alt, anchor, any, arc, caesura, cb, certainty, delSpan, dft, divGen, eLeaf, event, gap, handShift, index, iNode, interp, join, kinesic, lacunaEnd, lacunaStart, lb, leaf, link, milestone, minus, move, msr, nbr, node, none, null, oRef, pause, pb, plus, pRef, ptr, rate, respons, root, shift, space, span, sym, uncertain, vocal, when, witEnd, witStart, and xptr

16 elements declared with (#PCDATA): att, day, gi, hour, idno, minute, month, offset, postBox, postCode, second, str, tag, val, week, and year. (Of these, note that att, gi, tag, and val aren't actually in the main DTD, so they won't be handled here. Perhaps all these lists need to be checked once more in a calm moment.)

57 elements declared with (%phrase.seq;): abbr, actor, addrLine, author, authority, biblScope, cl, date, dateRange, del, distance, distinct, distributor, docAuthor, docDate, edition, editor, expan, extent, funder, fw, gloss, headItem, headLabel, label, measure, mentioned, name, num, occasion, orgDivn, orgName, orgTitle, orgType, orig, phr, principal, publisher, pubplace, reg, resp, restore, role, roleDesc, rs, s, salute, signed, soCalled, speaker, sponsor, street, term, time, timeRange, trailer, and wit

[Supernumerary in ND: surname, forename, genName, nameLink, addName, roleName, settlement, bloc.]

[Supernumerary in Corpus: channel, constitution, derivation, domain, factuality, interaction, preparedness, purpose, birth, firstLang, langKnown, residence, education, affiliation, occupation, socecstatus, locale, activity.]

[Supernumerary in Header: symbol, creation, language, classCode]

What is wrong with these lists, and why are they not complete? The Names and Dates tag set may not have been selected, or the DTD I used may -- almost surely did -- have the bug that makes much of that tag set unreachable. The Corpus tags are for the header, and may in fact not be descendants of text.

one element declared with (%component.seq;): epigraph

35 elements declared with (%paraContent;): admin, camera, caption, cell, country, damage, descrip, docEdition, emph, figDesc, foreign, gram, head, hi, imprimatur, l, lang, x lem, meeting, otherForm, p, x rdg, ref, region, seg, sound, supplied, tech, title, titlePart, unclear, witDetail, witness, writing, and xref. N.B. this list does not include elements from the dictionary tag set, the feature system declaration, or the tag set declaration. The dictionary tag set includes orth, pron, hyph, syll, stress, gram, gen, number, case, per, tns, mood, itype, pos, subc, colloc, def, tr, lang, usg, lbl. The dictionary tag set presents problems of its own, and the others are not part of the main TEI DTD.

nine elements declared with (%specialPara;): add, corr, item, note, q, quote, sic, stage, and view

95 elements with non-standard content models requiring manual changes: address, altgrp, analytic, app, argument, availability, back, bibl, biblfull, biblstruct, body, broadcast, byline, c, castgroup, castitem, castlist, cit, closer, dateline, datestruct, div, div0, div1, div2, div3, div4, div5, div6, div7, docimprint, doctitle, editionStmt, epilogue, equipment, etree, f, falt, figure, flib, formula, front, fs, fslib, fvlib, graph, group, imprint, interpgrp, joingrp, lg, lg1, lg2, lg3, lg4, lg5, linkgrp, list, listbibl, m, monogr, notesStmt, ofig, opener, ovar, performance, prologue, publicationStmt, pvar, rdggrp, recording, recordingStmt, respStmt, row, scriptStmt, series, seriesStmt, set, sourcedesc, sp, spangrp, table, termentry, text, tig, timeline, timestruct, titlepage, titleStmt, tree, triangle, u, valt, w, and witlist

Note that this list excludes most element types from the dictionary tag set, since they need special treatment anyway. (It does not exclude all of them, though, which puzzles me.)

Empty elements need no changes.

The other groups of elements do require changes to the DTD, which are described in the following sections.

The m.Incl element class

In order to simplify the process of adding inclusions to the content models of the DTD, we define a new class for use in content models, namely m.Incl. This consists of:

globincl (included by text)

fragmentary, if the additional tag set for text-critical apparatus is selected (included by lem and rdg)

For now, we ignore the problems posed by the termEntry element. In the long run, they mean the terminology tag set is going to need to be rewritten. (Of course, it needs rewriting anyway, to align it with more recent ISO work.)

<!ENTITY % x.Incl ''>
<![%TEI.textcrit;[
<!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl; | %m.fragmentary; | %n.anchor;' >
]]>
<!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl; | %n.anchor;' >

Changing #PCDATA elements

Each element which now has a content model of #PCDATA should, for compatibility, be revised to have a content model of (#PCDATA | %m.Incl;)*.

In some cases, it might be preferable to leave the content model alone: it's not clear that it's really useful to allow index entries, feature structure libraries, and joins to occur within attribute names, generic identifiers, and the components of structured times and dates. Even within generic identifiers and so on, there might be line breaks, page breaks, or other milestones, but perhaps we should define at least some of these elements as (#PCDATA | %m.refsys;)*.

For now, for purposes of the experimental XML DTD, I propose to use the first form given.
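The suppress-and-redeclare pattern that this choice implies for each #PCDATA element is mechanical, so it can be sketched as a small generator. This is only an illustration: the function name pcdata_scraps is hypothetical, and the attribute-list text has to be supplied by hand for each element, since it differs from one element to the next.

```python
def pcdata_scraps(name, attrs):
    """Build the two DTD fragments for one #PCDATA element.

    Returns (suppress, redeclare): the entity that switches off the existing
    declaration, and a marked section with the new mixed-content declaration.
    attrs is the element's attribute-list text, e.g. '%a.global;'.
    """
    suppress = f"<!ENTITY % {name} 'IGNORE' >"
    redeclare = "\n".join([
        f'<!ENTITY % XML.{name} "INCLUDE" >',
        f"<![%XML.{name};[",
        f"<!ELEMENT %n.{name}; - - (#PCDATA | %m.Incl;)* >",
        f"<!ATTLIST %n.{name}; {attrs} TEIform CDATA '{name}' >",
        "]]>",
    ])
    return suppress, redeclare


suppress, redeclare = pcdata_scraps("day", "%a.global; %a.temporalExpr;")
```

Elements that belong to an optional tag set would still need an enclosing marked section for that tag set, which the sketch leaves to the caller.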

(Scraps with suppression and redefinition of att, day, gi, hour, idno, minute, month, offset, postBox, postCode, second, str, tag, val, week, and year to be supplied here.)


First, we suppress all of these elements:

<!ENTITY % day 'IGNORE' >
<!ENTITY % hour 'IGNORE' >
<!ENTITY % minute 'IGNORE' >
<!ENTITY % month 'IGNORE' >
<!ENTITY % offset 'IGNORE' >
<!ENTITY % second 'IGNORE' >
<!ENTITY % week 'IGNORE' >
<!ENTITY % year 'IGNORE' >
<!ENTITY % idno 'IGNORE' >
<!ENTITY % postBox 'IGNORE' >
<!ENTITY % postCode 'IGNORE' >
<!ENTITY % str 'IGNORE' >

Then we supply the new declarations: <![%TEI.names.dates;[ <!ENTITY % XML.day "INCLUDE" > <![%XML.day;[ <!ELEMENT %n.day; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.day; %a.global; %a.temporalExpr; TEIform CDATA 'day' > ]]> <!ENTITY % XML.hour "INCLUDE" > <![%XML.hour;[ <!ELEMENT %n.hour; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.hour; %a.global; %a.temporalExpr; TEIform CDATA 'hour' > ]]> <!ENTITY % XML.minute "INCLUDE" > <![%XML.minute;[ <!ELEMENT %n.minute; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.minute; %a.global; %a.temporalExpr; TEIform CDATA 'minute' > ]]> <!ENTITY % XML.month "INCLUDE" > <![%XML.month;[ <!ELEMENT %n.month; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.month; %a.global; %a.temporalExpr; TEIform CDATA 'month' > ]]> <!ENTITY % XML.offset "INCLUDE" > <![%XML.offset;[ <!ELEMENT %n.offset; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.offset; %a.global; value CDATA #IMPLIED %a.placePart; TEIform CDATA 'offset' > ]]> <!ENTITY % XML.second "INCLUDE" > <![%XML.second;[ <!ELEMENT %n.second; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.second; %a.global; %a.temporalExpr; TEIform CDATA 'second' > ]]> <!ENTITY % XML.week "INCLUDE" > <![%XML.week;[ <!ELEMENT %n.week; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.week; %a.global; %a.temporalExpr; TEIform CDATA 'week' > ]]> <!ENTITY % XML.year "INCLUDE" > <![%XML.year;[ <!ELEMENT %n.year; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.year; %a.global; %a.temporalExpr; TEIform CDATA 'year' > ]]> ]]> <!ENTITY % XML.idno "INCLUDE" > <![%XML.idno;[ <!ELEMENT %n.idno; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.idno; %a.global; type CDATA #IMPLIED TEIform CDATA 'idno' > ]]> <!ENTITY % XML.postBox "INCLUDE" > <![%XML.postBox;[ <!ELEMENT %n.postBox; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.postBox; %a.global; TEIform CDATA 'postBox' > ]]> <!ENTITY % XML.postCode "INCLUDE" > <![%XML.postCode;[ <!ELEMENT %n.postCode; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.postCode; %a.global; TEIform CDATA 'postCode' > ]]> <![%TEI.fs;[ <!ENTITY % XML.str 
"INCLUDE" > <![%XML.str;[ <!ELEMENT %n.str; - - (#PCDATA | %m.Incl;)* > <!ATTLIST %n.str; %a.global; rel (eq | ne | sb | ns | lt | le | gt | ge) eq TEIform CDATA 'str' > ]]> ]]>

Changing phrase.seq

The parameter entity phrase.seq should be redefined as follows: <!ENTITY % phrase '#PCDATA | %m.phrase; | %m.Incl;' > <!ENTITY % phrase.seq '(%phrase;)*' > (This supersedes the redefinition given earlier. Adding the inclusions to the class phrase (i.e. to the entity m.phrase) might enable some of the redefinitions already given above to stand unchanged, but for now, at least, I propose to keep the inclusions logically separate from the original element classes.) Note that the entity phrase is used only once, in the definition of u.

No changes to the actual content models are needed. (Ah, the joys of indirection.)

(Note, 14 May 1999.) No, wait, actually, that's not true. Many of these declarations give the content model as (%phrase.seq;), which, expanded, would be ((#PCDATA | %m.phrase; | %m.Incl;)*), which is illegal in XML because of the extra parentheses around the mixed-content group. The content models do need to be changed, to the bare reference %phrase.seq;. This is only required if we wish to allow the extensions file to work with the current (1994-09) production DTDs. Since those are what I currently have on this laptop, I do wish. But since we will shortly be releasing corrected versions, we want to make this part of the extensions file optional. We'll do so using a conditional inclusion on the parameter entity base9409, which by default will be defined IGNORE.

The same logic applies to paraContent and (for now) specialPara.

(Note, 30 May 1999.) No, no, wait. Doesn't carthage already normalize these correctly by omitting extra parentheses? I've already spent several hours making the scraps below, and now realize we may not need them after all. (17 June 1999.) I've removed them, since carthage actually does produce legal XML.
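The kind of normalization at issue can be sketched as follows. This is not carthage's actual algorithm, which is not described here; it is a hypothetical illustration of stripping parentheses that do nothing but wrap a single group, and the function names are assumptions of the sketch.

```python
def matching_close(s, start):
    """Return the index of the parenthesis closing the one at s[start]."""
    depth = 0
    for pos in range(start, len(s)):
        if s[pos] == "(":
            depth += 1
        elif s[pos] == ")":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced parentheses")


def strip_redundant_parens(model):
    """Remove outer parentheses that merely wrap one inner group.

    ((x)*) becomes (x)*, but (a, b) and ((a | b), c) are left alone,
    because there the outer parentheses carry structure.
    """
    model = model.strip()
    while model.startswith("(") and matching_close(model, 0) == len(model) - 1:
        inner = model[1:-1].strip()
        if not inner.startswith("("):
            break
        end = matching_close(inner, 0)
        if inner[end + 1:] not in ("", "?", "*", "+"):
            break
        model = inner
    return model
```

Applied to ((#PCDATA | a | b)*), the sketch returns (#PCDATA | a | b)*, which is the form XML requires for mixed content.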

Changing component.seq

The entity component.seq must be redefined to allow inclusions between any two components. In the long run, the changes should be made directly within the various declarations which go into component.seq, but those declarations are among the most complicated of the entire TEI DTD, since there are variant versions for each of the two hundred or so possible combinations of base tag sets.

The quick and dirty approach most suitable for use in the experimental XML DTD is to include the Incl class as a subclass of common, thus: <!ENTITY % x.common '%m.Incl; |'> If this proves to introduce ambiguity in the content model, we'll have to find a slower, cleaner way to do it.

Experiment shows that it does indeed introduce ambiguity in content models, notably those for body and text divisions. Rather than hack at those content models, I am going to take the longer and slower approach. <!ENTITY % x.common '' > <!ENTITY % m.common '%x.common %m.bibl; | %m.chunk; | %m.hqinter; | %m.lists; | %m.notes; | %n.stage;' > Reproduce standard component declarations      <!ENTITY % component.seq '((%component;), (%m.Incl;)*)*' >

<!ENTITY % mix.verse '' > <!ENTITY % mix.drama '' > <!ENTITY % mix.spoken '' > <!ENTITY % mix.dictionaries '' > <!ENTITY % mix.terminology '' > <![ %TEI.mixed; [ <!ENTITY % TEI.singleBase 'IGNORE' > <!ENTITY % component '(%m.common; %mix.verse; %mix.drama; %mix.spoken; %mix.dictionaries; %mix.terminology;)' > ]]> <![ %TEI.general; [ <!ENTITY % TEI.singleBase 'IGNORE' > <!ENTITY % component '(%m.common; %mix.verse; %mix.drama; %mix.spoken; %mix.dictionaries; %mix.terminology;)' > <![ %TEI.verse; [ <!ENTITY % gen.verse '((%m.comp.verse;), (%m.common; | %m.comp.verse; | %m.Incl;)*) |' > ]]> <![ %TEI.drama; [ <!ENTITY % gen.drama '((%m.comp.drama;), (%m.common; | %m.comp.drama; | %m.Incl;)*) |' > ]]> <![ %TEI.spoken; [ <!ENTITY % gen.spoken '((%m.comp.spoken;), (%m.common; | %m.comp.spoken; | %m.Incl;)*) |' > ]]> <![ %TEI.dictionaries; [ <!ENTITY % gen.dictionaries '((%m.comp.dictionaries;), (%m.common; | %m.comp.dictionaries; | %m.Incl;)*) |' > ]]> <![ %TEI.terminology; [ <!ENTITY % gen.terminology '((%m.comp.terminology;), (%m.common; | %m.comp.terminology; | %m.Incl;)*) |' > ]]>   <!ENTITY % gen.verse '' > <!ENTITY % gen.drama '' > <!ENTITY % gen.spoken '' > <!ENTITY % gen.dictionaries '' > <!ENTITY % gen.terminology '' > <!ENTITY % component.seq '((%m.common;), (%m.Incl;)*)*, (%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end)?' > <!ENTITY % component.plus '(%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end) | ( ((%m.common;), (%m.Incl;)*)+, (%gen.verse; %gen.drama; %gen.spoken; %gen.dictionaries; %gen.terminology; TEI...end)?' 
>  ]]> <![ %TEI.prose; [ <!ENTITY % component '(%m.common;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]> <![ %TEI.verse; [ <!ENTITY % component '(%m.common; | %m.comp.verse;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]> <![ %TEI.drama; [ <!ENTITY % component '(%m.common; | %m.comp.drama;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]> <![ %TEI.spoken; [ <!ENTITY % component '(%m.common; | %m.comp.spoken;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]> <![ %TEI.dictionaries; [ <!ENTITY % component '(%m.common; | %m.comp.dictionaries;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]> <![ %TEI.terminology; [ <!ENTITY % component '(%m.common; | %m.comp.terminology;)' > <!ENTITY % TEI.singleBase 'INCLUDE' > ]]>  <!ENTITY % component '(%m.common;)' > <!ENTITY % TEI.singleBase 'INCLUDE' >

Changing paraContent

The parameter entity paraContent must be changed as follows: <!ENTITY % paraContent '(#PCDATA | %m.phrase; | %m.inter; | %m.Incl;)*' >

No change to actual content models is needed.

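(Note, 14 May 1999.) No, wait, actually, that's not true. Many of these declarations give the content model as (%paraContent;), which, expanded, would be ((#PCDATA | %m.phrase; | %m.inter; | %m.Incl;)*), which is illegal in XML because of the extra parentheses around the mixed-content group. The content models do need to be changed, to the bare reference %paraContent;.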

For now, though, we can rely on carthage to do the job, so I've deleted the long boring scraps that used to be here.

The problem of specialPara elements

In TEI P3, the entity specialPara is defined thus:

It allows an element to contain either a series of chunks or the same content as a paragraph. It is intended for elements like notes and list items: the normal case, in which the item consists of a single paragraph, can be tagged simply (<item> ... </item>) and the multi-paragraph case can be accommodated using nested paragraphs or other chunk-level elements (<item><p> ... </p><p> ... </p></item>). In practice, the multi-paragraph form has proven very disconcerting to users, since it is not intuitively obvious that no white space may appear between the paragraphs. This is a classic example of what is known in DTD design circles as the Mixed-Content Gotcha; the problems associated with it led the XML design group to restrict the form of mixed-content models in order to forbid content models which are subject to the problem. This restriction, in turn, makes it essential to revise specialPara in an XML version of the TEI DTD. The current definition and use of specialPara are thus acknowledged by the editors to be an error. Since there is no obvious solution, however, it is not a corrigible error.

In changing specialPara to meet the requirements of XML, there are three obvious possible solutions. We can overgenerate, so as to allow all existing data to remain valid. This has the drawback of allowing paragraphs and other chunk-level elements to float within character data, thus violating one of the few consistently followed rules of the TEI DTD.

Alternatively, we can bite the bullet and require that list items and notes which consist of a single paragraph be marked as such. This has the advantage of being relatively clean, but it has the major disadvantage of requiring retagging for almost all current list items and notes. What is now tagged <item> ... </item> would have to be retagged <item><p> ... </p></item>. The best that can be said is that such retagging could in principle be automated.

A third approach would be to have distinct element types for simple list items and notes, and compound ones. The simple form could be defined as containing paraContent, and the compound ones as containing component.seq. This would also require retagging (of all compound list items and notes), but not as much as the previous approach.

For purposes of the experimental XML DTD, we take the first approach.

The following element types are defined as containing specialPara:

q (in the core)
quote (in the core)
sic (in the core)
corr (in the core)
add (in the core) -- but not del!
item (in the core)
note (in the core)
stage (in the core)
set (in drama -- needs manual fix)
view (in drama)
equiv (in tag set documentation)

The ab element is new and we need to declare its content model: <!ENTITY % n.ab 'ab' >

Only one content model must be redefined by hand, to flatten the group: that of set in the drama tag set. The current definition is this:

If we flatten this in the expected way, we get this:

This has the unfortunate result of allowing head elements at random locations; it might be better, in this case, to tighten the content model instead. An inquiry on TEI-L might usefully reveal whether anyone is actually using set and whether they would be inconvenienced by this tighter model. Version 2 of the new model is this:

<![%TEI.drama;[
<!ENTITY % XML.set "INCLUDE" >
<![%XML.set;[
<!ELEMENT %n.set; - - ((%n.head;)?, %component.seq;) >
<!ATTLIST %n.set; %a.global; TEIform CDATA 'set' >
]]>
]]>

Version 2 is not strictly compatible with the old version: to be fully compatible we have to allow inclusions up front (Version 3):

For now, the experimental XML version of the DTD will use Version 2 of this declaration.

Elements requiring manual intervention

(Scraps suppressing and redeclaring the remaining elements to be supplied here.)

The elements to be treated here are: address, altgrp, analytic, app, argument, availability, back, bibl, biblfull, biblstruct, body, broadcast, byline, c, castgroup, castitem, castlist, cit, closer, dateline, datestruct, div, div0, div1, div2, div3, div4, div5, div6, div7, docimprint, doctitle, editionStmt, epilogue, equipment, etree, f, falt, figure, flib, formula, front, fs, fslib, fvlib, graph, group, imprint, interpgrp, joingrp, lg, lg1, lg2, lg3, lg4, lg5, linkgrp, list, listbibl, m, monogr, notesStmt, ofig, opener, ovar, performance, prologue, publicationStmt, pvar, rdggrp, recording, recordingStmt, respStmt, row, scriptStmt, series, seriesStmt, set, sourcedesc, sp, spangrp, table, termentry, text, tig, timeline, timestruct, titlepage, titleStmt, tree, triangle, u, valt, w, and witlist.

The following sections provide the DTD fragments necessary for suppressing the existing declarations for these elements and declaring them with new content models.

Core tag set

<!ENTITY % address 'IGNORE' > <!ENTITY % analytic 'IGNORE' > <!ENTITY % bibl 'IGNORE' > <!ENTITY % biblFull 'IGNORE' > <!ENTITY % biblStruct 'IGNORE' > <!ENTITY % cit 'IGNORE' > <!ENTITY % imprint 'IGNORE' > <!ENTITY % lg 'IGNORE' > <!ENTITY % list 'IGNORE' > <!ENTITY % listBibl 'IGNORE' > <!ENTITY % monogr 'IGNORE' > <!ENTITY % respStmt 'IGNORE' > <!ENTITY % series 'IGNORE' > <!ENTITY % sp 'IGNORE' >

The existing declarations are these:

The new definitions are these; note that cit and respStmt have already been declared above. <!ENTITY % XML.address "INCLUDE" > <![%XML.address;[ <!ELEMENT %n.address; - O ((%m.Incl;)*, ( (%n.addrLine;, (%m.Incl;)*)+ | ((%m.addrPart;), (%m.Incl;)*)*)) > <!ATTLIST %n.address; %a.global; TEIform CDATA 'address' > ]]> <!ENTITY % XML.analytic "INCLUDE" > <![%XML.analytic;[ <!ELEMENT %n.analytic; - O (%n.author; | %n.editor; | %n.respStmt; | %n.title; | %m.Incl;)* > <!ATTLIST %n.analytic; %a.global; TEIform CDATA 'analytic' > ]]> <!ENTITY % XML.bibl "INCLUDE" > <![%XML.bibl;[ <!ELEMENT %n.bibl; - O (#PCDATA | %m.phrase; | %m.biblPart; | %m.Incl;)* > <!ATTLIST %n.bibl; %a.global; %a.declarable; TEIform CDATA 'bibl' > ]]> <!ENTITY % XML.biblFull "INCLUDE" > <![%XML.biblFull;[ <!ELEMENT %n.biblFull; - O ((%m.Incl;)*, (%n.titleStmt;, (%m.Incl;)*), (%n.editionStmt;, (%m.Incl;)*)?, (%n.extent;, (%m.Incl;)*)?, (%n.publicationStmt;, (%m.Incl;)*), (%n.seriesStmt;, (%m.Incl;)*)?, (%n.notesStmt;, (%m.Incl;)*)?, (%n.sourceDesc;, (%m.Incl;)*)* ) > <!ATTLIST %n.biblFull; %a.global; %a.declarable; TEIform CDATA 'biblFull' > ]]> <!ENTITY % XML.biblStruct "INCLUDE" > <![%XML.biblStruct;[ <!ELEMENT %n.biblStruct; - O ((%m.Incl;)*, (%n.analytic;, (%m.Incl;)*)?, ( (%n.monogr;, (%m.Incl;)*), (%n.series;, (%m.Incl;)*)* )+, ( (%n.note; | %n.idno;), (%m.Incl;)*)*) > <!ATTLIST %n.biblStruct; %a.global; %a.declarable; TEIform CDATA 'biblStruct' > ]]>  <!ENTITY % XML.imprint "INCLUDE" > <![%XML.imprint;[ <!ELEMENT %n.imprint; - O (%n.pubPlace; | %n.publisher; | %n.date; | %n.biblScope; | %m.Incl;)* > <!ATTLIST %n.imprint; %a.global; TEIform CDATA 'imprint' > ]]> <!ENTITY % XML.lg "INCLUDE" > <![%XML.lg;[ <!ELEMENT %n.lg; - O ((%m.divtop; | %m.Incl;)*, (%n.l; | %n.lg;), (%n.l; | %n.lg; | %m.Incl;)*, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.lg; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg' > ]]> <!ENTITY % XML.list "INCLUDE" > <![%XML.list;[ <!ELEMENT %n.list; - - ((%m.Incl;)*, (%n.head;, 
(%m.Incl;)*)?, ( ((%n.item;, (%m.Incl;)*)*) | ( (%n.headLabel;, (%m.Incl;)*)?, (%n.headItem;, (%m.Incl;)*)?, (%n.label;, (%m.Incl;)*, %n.item;, (%m.Incl;)*)+))) > <!ATTLIST %n.list; %a.global; type CDATA simple TEIform CDATA 'list' > ]]> <!ENTITY % XML.listBibl "INCLUDE" > <![%XML.listBibl;[ <!ELEMENT %n.listBibl; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, (%n.bibl; | %n.biblStruct; | %n.biblFull;), (%n.bibl; | %n.biblStruct; | %n.biblFull; | %m.Incl;)*, (%n.trailer;, (%m.Incl;)*)?) > <!ATTLIST %n.listBibl; %a.global; %a.declarable; TEIform CDATA 'listBibl' > ]]> <!ENTITY % XML.monogr "INCLUDE" > <![%XML.monogr;[ <!ELEMENT %n.monogr; - O ( ((%m.Incl;)*, (( (%n.author; | %n.editor; | %n.respStmt;), (%n.author; | %n.editor; | %n.respStmt; | %m.Incl;)*, (%n.title;, (%m.Incl;)*)+, ((%n.editor; | %n.respStmt;), (%m.Incl;)*)* ) | ( (%n.title;, (%m.Incl;)*)+, ( (%n.author; | %n.editor; | %n.respStmt;), (%m.Incl;)* )* )) )?, ((%n.note; | %n.meeting;), (%m.Incl;)*)*, (%n.edition;, (%n.editor; | %n.respStmt; | %m.Incl;)*)*, %n.imprint;, (%n.imprint; | %n.extent; | %n.biblScope; | %m.Incl;)* ) > <!ATTLIST %n.monogr; %a.global; TEIform CDATA 'monogr' > ]]>  <!ENTITY % XML.series "INCLUDE" > <![%XML.series;[ <!ELEMENT %n.series; - O (%n.title; | %n.editor; | %n.respStmt; | %n.biblScope; | %m.Incl;)* > <!ATTLIST %n.series; %a.global; TEIform CDATA 'series' > ]]> <!ENTITY % XML.sp "INCLUDE" > <![%XML.sp;[ <!ELEMENT %n.sp; - O ((%m.Incl;)*, (%n.speaker;, (%m.Incl;)*)?, ((%n.p; | %n.l; | %n.lg; | %n.seg; | %n.ab; | %n.stage;), (%m.Incl;)*)+) > <!ATTLIST %n.sp; %a.global; who IDREFS #IMPLIED TEIform CDATA 'sp' > ]]>

Basic text-structure tag set

<!ENTITY % argument 'IGNORE' > <!ENTITY % back 'IGNORE' > <!ENTITY % body 'IGNORE' > <!ENTITY % byline 'IGNORE' > <!ENTITY % closer 'IGNORE' > <!ENTITY % dateline 'IGNORE' > <!ENTITY % div 'IGNORE' > <!ENTITY % div0 'IGNORE' > <!ENTITY % div1 'IGNORE' > <!ENTITY % div2 'IGNORE' > <!ENTITY % div3 'IGNORE' > <!ENTITY % div4 'IGNORE' > <!ENTITY % div5 'IGNORE' > <!ENTITY % div6 'IGNORE' > <!ENTITY % div7 'IGNORE' > <!ENTITY % group 'IGNORE' > <!ENTITY % opener 'IGNORE' > <!ENTITY % text 'IGNORE' >

The current definitions are these:

The new definitions are as follows: <!ENTITY % XML.argument "INCLUDE" > <![%XML.argument;[ <!ELEMENT %n.argument; - - ((%m.Incl;)*, (%n.head;, %component.seq;)?) > <!ATTLIST %n.argument; %a.global; TEIform CDATA 'argument' > ]]> <!ENTITY % XML.back "INCLUDE" > <![%XML.back;[ <!ELEMENT %n.back; - O ( (%m.front; | %m.Incl;)*, ( ( (%m.divtop;), (%m.divtop; | %n.titlePage; | %m.Incl;)*) | ( (%n.div;), (%n.div; | %m.front; | %m.Incl;)*) | ( (%n.div1;), (%n.div1; | %m.front; | %m.Incl;)*) )? ) > <!ATTLIST %n.back; %a.global; %a.declaring; TEIform CDATA 'back' > ]]> <!ENTITY % XML.body "INCLUDE" > <![%XML.body;[ <!ELEMENT %n.body; - O ( (%m.divtop; | %m.Incl;)*, ( ( ((%component;), (%m.Incl;)*)+, ((%n.divGen;, (%m.Incl;)*)*, ( (%n.div;, (%n.div; | %n.divGen; | %m.Incl;)*) | (%n.div0;, (%n.div0; | %n.divGen; | %m.Incl;)*) | (%n.div1;, (%n.div1; | %n.divGen; | %m.Incl;)*) )? ) ) | ( (%n.divGen;, (%m.Incl;)*)*, ( (%n.div;, (%n.div; | %n.divGen; | %m.Incl;)*) | (%n.div0;, (%n.div0; | %n.divGen; | %m.Incl;)*) | (%n.div1;, (%n.div1; | %n.divGen; | %m.Incl;)*) ) ) ), ((%m.divbot;), (%m.Incl;)*)* ) > <!ATTLIST %n.body; %a.global; %a.declaring; TEIform CDATA 'body' > ]]>  <!ENTITY % XML.div "INCLUDE" > <![%XML.div;[ <!ELEMENT %n.div; - O ( (%m.divtop; | %m.Incl;)*, ( ((%n.div; | %n.divGen;), (%m.Incl;)*)+ | ( (%component;, (%m.Incl;)*)+, ((%n.div; | %n.divGen;), (%m.Incl;)*)*) ), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div' > ]]> <!ENTITY % XML.div0 "INCLUDE" > <![%XML.div0;[ <!ELEMENT %n.div0; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div1; | %n.divGen;), (%m.Incl;)*)+ | ( (%component;, (%m.Incl;)*)+, ((%n.div1; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div0; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div0' > ]]> <!ENTITY % XML.div1 "INCLUDE" > <![%XML.div1;[ <!ELEMENT %n.div1; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div2; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, 
((%n.div2; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div1; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div1' > ]]> <!ENTITY % XML.div2 "INCLUDE" > <![%XML.div2;[ <!ELEMENT %n.div2; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div3; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div3; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div2; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div2' > ]]> <!ENTITY % XML.div3 "INCLUDE" > <![%XML.div3;[ <!ELEMENT %n.div3; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div4; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div4; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div3; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div3' > ]]> <!ENTITY % XML.div4 "INCLUDE" > <![%XML.div4;[ <!ELEMENT %n.div4; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div5; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div5; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div4; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div4' > ]]> <!ENTITY % XML.div5 "INCLUDE" > <![%XML.div5;[ <!ELEMENT %n.div5; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div6; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div6; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div5; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div5' > ]]> <!ENTITY % XML.div6 "INCLUDE" > <![%XML.div6;[ <!ELEMENT %n.div6; - O ((%m.divtop; | %m.Incl;)*, ( ((%n.div7; | %n.divGen;), (%m.Incl;)*)+ | ((%component;, (%m.Incl;)*)+, ((%n.div7; | %n.divGen;), (%m.Incl;)*)*)), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div6; %a.global; %a.declaring; %a.divn; TEIform CDATA 'div6' > ]]> <!ENTITY % XML.div7 "INCLUDE" > <![%XML.div7;[ <!ELEMENT %n.div7; - O ((%m.divtop; | %m.Incl;)*, (%component;, (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.div7; %a.global; %a.declaring; %a.divn; 
TEIform CDATA 'div7' > ]]> <!ENTITY % XML.group "INCLUDE" > <![%XML.group;[ <!ELEMENT %n.group; - O ((%m.divtop; | %m.Incl;)*, ((%n.text; | %n.group;), (%n.text; | %n.group; | %m.Incl;)*), ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.group; %a.global; %a.declaring; TEIform CDATA 'group' > ]]>  <!ENTITY % XML.text "INCLUDE" > <![%XML.text;[ <!ELEMENT %n.text; - - ((%m.Incl;)*, (%n.front;, (%m.Incl;)*)?, (%n.body; | %n.group;), (%m.Incl;)*, (%n.back;, (%m.Incl;)*)?) > <!ATTLIST %n.text; %a.global; %a.declaring; TEIform CDATA 'text' > ]]>

Front-matter tag set

<!ENTITY % docTitle 'IGNORE' > <!ENTITY % front 'IGNORE' > <!ENTITY % titlePage 'IGNORE' >

The existing declarations are these: ]]>

The new definitions are these. The definition for front has been changed to use fmchunk instead of divtop.
<!ENTITY % XML.front "INCLUDE" >
<![%XML.front;[
<!ELEMENT %n.front; - O ( (%m.front; | %m.Incl;)*, ( ( (%m.fmchunk;), (%m.fmchunk; | %n.titlePage; | %m.Incl;)*) | ( (%n.div;), (%n.div; | %m.front; | %m.Incl;)*) | ( (%n.div1;), (%n.div1; | %m.front; | %m.Incl;)*) )? ) >
<!ATTLIST %n.front; %a.global; %a.declaring; TEIform CDATA 'front' >
]]>
<!ENTITY % XML.titlePage "INCLUDE" >
<![%XML.titlePage;[
<!ELEMENT %n.titlePage; - O ((%m.Incl;)*, (%m.tpParts;), (%m.tpParts; | %m.Incl;)*) >
<!ATTLIST %n.titlePage; %a.global; type CDATA #IMPLIED TEIform CDATA 'titlePage' >
]]>
<!ENTITY % XML.docTitle "INCLUDE" >
<![%XML.docTitle;[
<!ELEMENT %n.docTitle; - O ((%m.Incl;)*, (%n.titlePart;, (%m.Incl;)*)+) >
<!ATTLIST %n.docTitle; %a.global; TEIform CDATA 'docTitle' >
]]>

Header tag set

<!ENTITY % availability 'IGNORE' > <!ENTITY % broadcast 'IGNORE' > <!ENTITY % editionStmt 'IGNORE' > <!ENTITY % equipment 'IGNORE' > <!ENTITY % notesStmt 'IGNORE' >  <!ENTITY % recording 'IGNORE' > <!ENTITY % recordingStmt 'IGNORE' > <!ENTITY % scriptStmt 'IGNORE' > <!ENTITY % seriesStmt 'IGNORE' > <!ENTITY % sourceDesc 'IGNORE' > <!ENTITY % titleStmt 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows. We've changed the language for some element types, in parallel with changes to TEI P3: availability can be empty <!ENTITY % XML.availability "INCLUDE" > <![%XML.availability;[ <!ELEMENT %n.availability; - O (%n.p; | %m.Incl;)* > <!ATTLIST %n.availability; %a.global; status (free | unknown | restricted) #IMPLIED TEIform CDATA 'availability' > ]]> <!ENTITY % XML.broadcast "INCLUDE" > <![%XML.broadcast;[ <!ELEMENT %n.broadcast; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | ((%n.bibl; | %n.biblStruct; | %n.biblFull; | %n.recording;), (%m.Incl;)*))) > <!ATTLIST %n.broadcast; %a.global; %a.declarable; TEIform CDATA 'broadcast' > ]]> <!ENTITY % XML.editionStmt "INCLUDE" > <![%XML.editionStmt;[ <!ELEMENT %n.editionStmt; - O ((%m.Incl;)*, ((%n.edition;, (%n.respStmt; | %m.Incl;)*) | (%n.p;, (%m.Incl;)*)+) ) > <!ATTLIST %n.editionStmt; %a.global; TEIform CDATA 'editionStmt' > ]]> <!ENTITY % XML.equipment "INCLUDE" > <![%XML.equipment;[ <!ELEMENT %n.equipment; - O ((%m.Incl;)*, (%n.p;, (%m.Incl;)*)+) > <!ATTLIST %n.equipment; %a.global; %a.declarable; TEIform CDATA 'equipment' > ]]> <!ENTITY % XML.notesStmt "INCLUDE" > <![%XML.notesStmt;[ <!ELEMENT %n.notesStmt; - O ((%m.Incl;)*, (%n.note;, (%m.Incl;)*)+) > <!ATTLIST %n.notesStmt; %a.global; TEIform CDATA 'notesStmt' > ]]> <!ENTITY % XML.recording "INCLUDE" > <![%XML.recording;[ <!ELEMENT %n.recording; - - (((%m.Incl;)*, (%n.p;, (%m.Incl;)*)+) | ((%n.respStmt; | %n.equipment; | %n.broadcast; | %n.date;), (%m.Incl;)*)*) > <!ATTLIST %n.recording; %a.global; %a.declarable; type (audio | video) audio dur CDATA #IMPLIED TEIform CDATA 'recording' > ]]> <!ENTITY % XML.recordingStmt "INCLUDE" > <![%XML.recordingStmt;[ <!ELEMENT %n.recordingStmt; - - ((%m.Incl;)*, ((%n.p;, (%m.Incl;)*)+ | (%n.recording;, (%m.Incl;)*)+ ))> <!ATTLIST %n.recordingStmt; %a.global; TEIform CDATA 'recordingStmt'> ]]> <!ENTITY % XML.scriptStmt "INCLUDE" > <![%XML.scriptStmt;[ <!ELEMENT %n.scriptStmt; - - ((%m.Incl;)*, 
((%n.p;, (%m.Incl;)*)+ | ((%n.bibl; | %n.biblStruct; | %n.biblFull;), (%m.Incl;)*))) > <!ATTLIST %n.scriptStmt; %a.global; %a.declarable; TEIform CDATA 'scriptStmt' > ]]> <!ENTITY % XML.seriesStmt "INCLUDE" > <![%XML.seriesStmt;[ <!ELEMENT %n.seriesStmt; - O ((%m.Incl;)*, ((%n.title;, (%n.idno; | %n.respStmt; | %m.Incl;)* ) | (%n.p;, (%m.Incl;)*)+) ) > <!ATTLIST %n.seriesStmt; %a.global; TEIform CDATA 'seriesStmt' > ]]> <!ENTITY % XML.sourceDesc "INCLUDE" > <![%XML.sourceDesc;[ <!ELEMENT %n.sourceDesc; - - ((%m.Incl;)*, ((%n.p; | %n.bibl; | %n.biblFull; | %n.biblStruct; | %n.listBibl; | %n.scriptStmt; | %n.recordingStmt;), (%m.Incl;)*)+) > <!ATTLIST %n.sourceDesc; %a.global; %a.declarable; TEIform CDATA 'sourceDesc' > ]]> <!ENTITY % XML.titleStmt "INCLUDE" > <![%XML.titleStmt;[ <!ELEMENT %n.titleStmt; - O ( (%m.Incl;)*, (%n.title;, (%m.Incl;)*)+, ( (%n.author; | %n.editor; | %n.sponsor; | %n.funder; | %n.principal; | %n.respStmt;), (%m.Incl;)*)* ) > <!ATTLIST %n.titleStmt; %a.global; TEIform CDATA 'titleStmt' > ]]>

Verse tag set

<!ENTITY % lg1 'IGNORE' > <!ENTITY % lg2 'IGNORE' > <!ENTITY % lg3 'IGNORE' > <!ENTITY % lg4 'IGNORE' > <!ENTITY % lg5 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows: <![%TEI.verse;[ <!ENTITY % XML.lg1 "INCLUDE" > <![%XML.lg1;[ <!ELEMENT %n.lg1; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg2;), (%m.Incl;)*)+) > <!ATTLIST %n.lg1; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg1' > ]]> <!ENTITY % XML.lg2 "INCLUDE" > <![%XML.lg2;[ <!ELEMENT %n.lg2; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg3;), (%m.Incl;)*)+) > <!ATTLIST %n.lg2; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg2' > ]]> <!ENTITY % XML.lg3 "INCLUDE" > <![%XML.lg3;[ <!ELEMENT %n.lg3; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg4;), (%m.Incl;)*)+) > <!ATTLIST %n.lg3; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg3' > ]]> <!ENTITY % XML.lg4 "INCLUDE" > <![%XML.lg4;[ <!ELEMENT %n.lg4; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.l; | %n.lg5;), (%m.Incl;)*)+) > <!ATTLIST %n.lg4; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg4' > ]]> <!ENTITY % XML.lg5 "INCLUDE" > <![%XML.lg5;[ <!ELEMENT %n.lg5; - O ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, (%n.l;, (%m.Incl;)*)+) > <!ATTLIST %n.lg5; %a.global; %a.divn; %a.metrical; TEIform CDATA 'lg5' > ]]> ]]>

Drama tag set

<!ENTITY % castGroup 'IGNORE' >  <!ENTITY % castList 'IGNORE' > <!ENTITY % epilogue 'IGNORE' > <!ENTITY % performance 'IGNORE' > <!ENTITY % prologue 'IGNORE' > <!ENTITY % set 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows: <![%TEI.drama;[ <!ENTITY % XML.castGroup "INCLUDE" > <![%XML.castGroup;[ <!ELEMENT %n.castGroup; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, ((%n.castItem; | %n.castGroup;), (%m.Incl;)*)+, (%n.trailer;, (%m.Incl;)*)?) > <!ATTLIST %n.castGroup; %a.global; TEIform CDATA 'castGroup' > ]]>  <!ENTITY % XML.castList "INCLUDE" > <![%XML.castList;[ <!ELEMENT %n.castList; - - ( (%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)*, ((%n.castItem; | %n.castGroup;), (%m.Incl;)*)+, ((%component;), (%m.Incl;)*)*) > <!ATTLIST %n.castList; %a.global; TEIform CDATA 'castList' > ]]> <!ENTITY % XML.epilogue "INCLUDE" > <![%XML.epilogue;[ <!ELEMENT %n.epilogue; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.epilogue; %a.global; TEIform CDATA 'epilogue' > ]]> <!ENTITY % XML.performance "INCLUDE" > <![%XML.performance;[ <!ELEMENT %n.performance; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.performance; %a.global; TEIform CDATA 'performance' > ]]> <!ENTITY % XML.prologue "INCLUDE" > <![%XML.prologue;[ <!ELEMENT %n.prologue; - - ((%m.divtop; | %m.Incl;)*, ((%component;), (%m.Incl;)*)+, ((%m.divbot;), (%m.Incl;)*)*) > <!ATTLIST %n.prologue; %a.global; TEIform CDATA 'prologue' > ]]>  ]]>

Spoken-text tag set

<!ENTITY % u 'IGNORE' >

The current definition is this: ]]>

The new definitions are as follows: <![%TEI.spoken;[ <!ENTITY % XML.u "INCLUDE" > <![%XML.u;[ <!ELEMENT %n.u; - - (#PCDATA | %m.phrase; | %m.comp.spoken; | %m.Incl;)* > <!ATTLIST %n.u; %a.global; %a.timed; %a.declaring; trans (smooth | latching | overlap | pause) smooth who IDREF %INHERITED; TEIform CDATA 'u' > ]]> ]]>

Dictionary tag set

We handle the dictionary tag set below, not here. (The list above does contain oVar and pVar, but that must be a mistake.)

Terminology tag set

<!ENTITY % ofig 'IGNORE' > <!ENTITY % termEntry 'IGNORE' > <!ENTITY % tig 'IGNORE' >

The current definitions in the nested tag set are these: ]]>

Note that termEntry has inclusions of its own. These do not require special treatment in our propagation of inclusions, since the set of legal descendants of termEntry is the same as the set of legal descendants of text. The set of terminology inclusions, however, does need to be revised for future versions of the DTD, since it's not disjoint from elements named in content models. It includes elements normally included in any phrase-level content model; we don't want to include them in m.Incl, since that would cause ambiguity. So all terminological content models should be rewritten for TEI P4, or even P3.5.

The new definitions are as follows: <![%TEI.terminology;[ <!ENTITY % XML.ofig "INCLUDE" > <![%XML.ofig;[ <!ELEMENT %n.ofig; - O ((%m.terminologyMisc; | %m.Incl;)*, (%n.otherForm;, (%n.gram; | %m.Incl;)*), ((%m.terminologyMisc;), (%m.Incl;)*)*) > <!ATTLIST %n.ofig; %a.global; type CDATA #IMPLIED TEIform CDATA 'ofig' > ]]> <!ENTITY % XML.termEntry "INCLUDE" > <![%XML.termEntry;[ <!ELEMENT %n.termEntry; - O ((%m.terminologyMisc; | %m.terminologyInclusions; | %m.Incl;)*, (%n.tig;, (%m.Incl; | %m.terminologyInclusions;)*)+) > <!ATTLIST %n.termEntry; %a.global; type CDATA #IMPLIED TEIform CDATA 'termEntry' > ]]> <!ENTITY % XML.tig "INCLUDE" > <![%XML.tig;[ <!ELEMENT %n.tig; - O ((%m.terminologyMisc; | %m.terminologyInclusions; | %m.Incl;)*, (%n.term;, (%n.gram; | %m.terminologyInclusions; | %m.Incl;)*), ((%m.terminologyMisc;), (%m.terminologyInclusions; | %m.Incl;)*)*, (%n.ofig;, (%m.terminologyInclusions; | %m.Incl;)*)*) > <!ATTLIST %n.tig; %a.global; type CDATA #IMPLIED TEIform CDATA 'tig' > ]]> ]]>

In the flat version of the terminology tag set, there is no ofig and no tig element. The current definition of termEntry is this one: ]]>

The new definition is as follows. Since we need both versions in the extensions file, we invent a new parameter entity (TEI.terminology.flat) to signal the difference between the nested and flat terminology element sets. <![%TEI.terminology;[ <!ENTITY % TEI.terminology.flat 'IGNORE'> <![%TEI.terminology.flat;[ <!ENTITY % XML.termEntry "INCLUDE" > <![%XML.termEntry;[ <!ELEMENT %n.termEntry; - O ( (%m.terminologyMisc; | %n.otherForm; | %n.gram; | %m.terminologyInclusions; | %m.Incl;)*, (%n.term;, (%m.terminologyMisc; | %n.otherForm; | %n.gram; | %m.terminologyInclusions; | %m.Incl;)* )+ ) > <!ATTLIST %n.termEntry; %a.global; type CDATA #IMPLIED TEIform CDATA 'termEntry' > ]]> ]]> ]]>

Segmentation and alignment tag set

<!ENTITY % altGrp 'IGNORE' > <!ENTITY % joinGrp 'IGNORE' > <!ENTITY % linkGrp 'IGNORE' > <!ENTITY % timeline 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows. We take the opportunity to level the declarations by using stars, instead of plus signs, on all of them. This has the drawback of allowing a link group to contain no links (only members of m.Incl), but the advantage of dramatically simplifying the content model. <![%TEI.linking;[ <!ENTITY % XML.altGrp "INCLUDE" > <![%XML.altGrp;[ <!ELEMENT %n.altGrp; - - ((%n.ptr; | %n.xptr; | %m.Incl;)*) > <!ATTLIST %n.altGrp; %a.global; %a.pointerGroup; mode (excl | incl) excl wScale (perc | real) perc TEIform CDATA 'altGrp' > ]]> <!ENTITY % XML.joinGrp "INCLUDE" > <![%XML.joinGrp;[ <!ELEMENT %n.joinGrp; - - ((%n.ptr; | %n.xptr; | %m.Incl;)*) > <!ATTLIST %n.joinGrp; %a.global; %a.pointerGroup; result CDATA #IMPLIED desc CDATA #IMPLIED TEIform CDATA 'joinGrp' > ]]> <!ENTITY % XML.linkGrp "INCLUDE" > <![%XML.linkGrp;[ <!ELEMENT %n.linkGrp; - - (%n.ptr; | %n.xptr; | %m.Incl;)* > <!ATTLIST %n.linkGrp; %a.global; %a.pointerGroup; TEIform CDATA 'linkGrp' > ]]> <!ENTITY % XML.timeline "INCLUDE" > <![%XML.timeline;[ <!ELEMENT %n.timeline; - - ((%n.when;), (%m.Incl;)*)+ > <!ATTLIST %n.timeline; %a.global; origin IDREF #REQUIRED unit NMTOKEN #IMPLIED interval NUTOKEN #IMPLIED TEIform CDATA 'timeline' > ]]> ]]>
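The trade-off described above (stars instead of plus signs) can be illustrated with a small, hedged Python sketch. It models the children of a link group as a string in which P stands for any pointer element and i for any member of m.Incl. The pattern called STRICT is only an assumption about what a plus-sign version with propagated inclusions might look like; it is not taken from any TEI DTD.

```python
import re
from itertools import product

# P = a pointer element (ptr or xptr), i = any member of m.Incl.
LEVELLED = re.compile(r"(P|i)*")    # the starred form used in the declarations above
STRICT = re.compile(r"i*(Pi*)+")    # assumed stricter form: at least one pointer required

def differences(max_len):
    """Return every child sequence up to max_len on which the two models disagree."""
    found = []
    for n in range(max_len + 1):
        for symbols in product("Pi", repeat=n):
            w = "".join(symbols)
            if (LEVELLED.fullmatch(w) is None) != (STRICT.fullmatch(w) is None):
                found.append(w)
    return found

# The only disagreements are sequences with no pointer at all.
print(differences(3))
```

Under these assumptions the two models differ only on the empty sequence and on sequences made up entirely of m.Incl members, which is the drawback the text accepts in exchange for the simpler model.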

We have included m.Incl within these content models in the interests of consistency: this document is intended to provide an XML-compatible DTD which accepts all valid TEI P3 documents, and does not change the language unnecessarily. In the long run, however, it seems unlikely that we need to allow any m.Incl elements within any of these content models. Page breaks really and truly do not occur within link groups. Allowing timelines to nest within timelines is daft. And as we have seen, adding m.Incl to the original content models introduces ambiguity, since some members of that class were already named in the models. Removing the explicit mention avoids the ambiguity, but renders the content model misleading.

It is the editors' view that in P4, the m.Incl class should not appear in these models; they should revert to the form given in P3.

Analysis and interpretation tag set

<!ENTITY % c 'IGNORE' > <!ENTITY % interpGrp 'IGNORE' > <!ENTITY % m 'IGNORE' > <!ENTITY % spanGrp 'IGNORE' > <!ENTITY % w 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows: <![%TEI.analysis;[ <!ENTITY % XML.c "INCLUDE" > <![%XML.c;[ <!ELEMENT %n.c; - - (#PCDATA) > <!ATTLIST %n.c; %a.global; %a.seg; TEIform CDATA 'c' > ]]> Since interp is a member of class Incl, we cannot name it directly in the content model, on pain of ambiguity. (Sigh.) <!ENTITY % XML.interpGrp "INCLUDE" > <![%XML.interpGrp;[  <!ELEMENT %n.interpGrp; - - (%m.Incl;)* > <!ATTLIST %n.interpGrp; %a.global; %a.interpret; TEIform CDATA 'interpGrp' > ]]> <!ENTITY % XML.m "INCLUDE" > <![%XML.m;[ <!ELEMENT %n.m; - - (#PCDATA | %n.seg; | %n.c; | %m.Incl;)* > <!ATTLIST %n.m; %a.global; %a.seg; baseform CDATA #IMPLIED TEIform CDATA 'm' > ]]> The spanGrp element, like interpGrp, becomes close to meaningless now, if one doesn't understand that it is supposed to contain spans, which are included in m.Incl. <!ENTITY % XML.spanGrp "INCLUDE" > <![%XML.spanGrp;[  <!ELEMENT %n.spanGrp; - - (%m.Incl;)* > <!ATTLIST %n.spanGrp; %a.global; %a.interpret; TEIform CDATA 'spanGrp' > ]]> <!ENTITY % XML.w "INCLUDE" > <![%XML.w;[ <!ELEMENT %n.w; - - (#PCDATA | %n.seg; | %n.w; | %n.m; | %n.c; | %m.Incl;)* > <!ATTLIST %n.w; %a.global; %a.seg; lemma CDATA #IMPLIED TEIform CDATA 'w' > ]]> ]]>
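The ambiguity mentioned above for interpGrp and spanGrp can be sketched in Python. The check below expands class references in a single flat alternation and reports any element name that would then appear more than once. The class membership shown is partial and purely illustrative: only interp and span are stated in this document to be members of m.Incl, and pb is included as an assumption. The function names are hypothetical, and the check covers one flat alternation group only, not full content-model determinism.

```python
from collections import Counter

def expand(names, classes):
    """Replace class names by their members, recursively; other names pass through."""
    out = []
    for name in names:
        if name in classes:
            out.extend(expand(classes[name], classes))
        else:
            out.append(name)
    return out

def repeated_names(alternatives, classes):
    """Names occurring more than once in the expanded alternation (a source of ambiguity)."""
    counts = Counter(expand(alternatives, classes))
    return sorted(name for name, count in counts.items() if count > 1)

# Illustrative, partial membership of the class (pb is an assumption).
classes = {"m.Incl": ["interp", "span", "pb"]}

print(repeated_names(["interp", "m.Incl"], classes))  # interp is named twice
print(repeated_names(["m.Incl"], classes))            # no repetition
```

This is why the declarations above name only the class: mentioning interp or span explicitly alongside m.Incl would put the same element type into one alternation twice.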

Feature structures tag set

The arguments given above against propagating global inclusions to the segmentation and alignment element types apply with equal or greater force to the feature-structures element types. But we resist the siren song of common sense and press on doggedly toward our goal of an upward-compatible experimental XML DTD. <!ENTITY % f 'IGNORE' > <!ENTITY % falt 'IGNORE' > <!ENTITY % flib 'IGNORE' > <!ENTITY % fs 'IGNORE' > <!ENTITY % fslib 'IGNORE' > <!ENTITY % fvlib 'IGNORE' > <!ENTITY % valt 'IGNORE' >

The current definitions are these: ]]>

It will be noted that the new versions are identical to the old versions. Common sense has won out, and in this experimental XML version of the TEI DTD, global inclusions are not propagated into these feature-structure element types.

Names and dates tag set

The dateStruct and timeStruct element types have already been rewritten above.

Text-criticism tag set

<!ENTITY % app 'IGNORE' > <!ENTITY % rdgGrp 'IGNORE' > <!ENTITY % witList 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows. We take the opportunity to address one of Peter Robinson's long-standing concerns, and allow witnesses to the lemma to be listed. Note that the model for rdgGrp seems bizarre. Why are readings and reading groups treated similarly in app entries and not in rdgGrp elements? <![%TEI.textcrit;[ <!ENTITY % XML.app "INCLUDE" > <![%XML.app;[ <!ELEMENT %n.app; - O ( (%m.Incl;)*, (%n.lem;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?)?, ( (%n.rdg;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?) | (%n.rdgGrp;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?) )+ ) > <!ATTLIST %n.app; %a.global; type CDATA #IMPLIED from IDREF #IMPLIED to IDREF #IMPLIED loc CDATA #IMPLIED TEIform CDATA 'app' > ]]> <!ENTITY % XML.rdgGrp "INCLUDE" > <![%XML.rdgGrp;[ <!ELEMENT %n.rdgGrp; - O ((%m.Incl;)*, (((%n.rdgGrp;, (%m.Incl;)*) | (%n.rdg;, (%m.Incl;)*, (%n.wit;, (%m.Incl;)*)?)))+) > <!ATTLIST %n.rdgGrp; %a.global; %a.readings; TEIform CDATA 'rdgGrp' > ]]> <!ENTITY % XML.witList "INCLUDE" > <![%XML.witList;[ <!ELEMENT %n.witList; - O ((%m.Incl;)*, (%n.witness;, (%m.Incl;)*)+) > <!ATTLIST %n.witList; %a.global; TEIform CDATA 'witList' > ]]> ]]>

Graphs and digraphs tag set

<!ENTITY % eTree 'IGNORE' > <!ENTITY % forest 'IGNORE' > <!ENTITY % forestGrp 'IGNORE' > <!ENTITY % graph 'IGNORE' > <!ENTITY % tree 'IGNORE' > <!ENTITY % triangle 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows: <![%TEI.nets;[ <!ENTITY % XML.tree "INCLUDE" > <![%XML.tree;[ <!ELEMENT %n.tree; - - ((%n.leaf; | %n.iNode; | %m.Incl;)*, %n.root;, (%n.leaf; | %n.iNode; | %m.Incl;)*) > <!ATTLIST %n.tree; %a.global; label CDATA #IMPLIED arity NUMBER #IMPLIED ord (Y | N | partial) Y order NUMBER #IMPLIED TEIform CDATA 'tree' > ]]> <!ENTITY % XML.eTree "INCLUDE" > <![%XML.eTree;[ <!ELEMENT %n.eTree; - - ((%n.eTree; | %n.triangle; | %n.eLeaf; | %m.Incl;)*) > <!ATTLIST %n.eTree; %a.global; label CDATA #IMPLIED value IDREF #IMPLIED TEIform CDATA 'eTree' > ]]> <!ENTITY % XML.triangle "INCLUDE" > <![%XML.triangle;[ <!ELEMENT %n.triangle; - - ((%n.eTree; | %n.triangle; | %n.eLeaf; | %m.Incl;)*) > <!ATTLIST %n.triangle; %a.global; label CDATA #IMPLIED value IDREF #IMPLIED TEIform CDATA 'triangle' > ]]> <!ENTITY % XML.forest "INCLUDE" > <![%XML.forest;[ <!ELEMENT %n.forest; - - ((%n.tree; | %n.eTree; | %n.triangle; | %m.Incl;)*) > <!ATTLIST %n.forest; %a.global; type CDATA #IMPLIED TEIform CDATA 'forest' > ]]> <!ENTITY % XML.forestGrp "INCLUDE" > <![%XML.forestGrp;[ <!ELEMENT %n.forestGrp; - - ((%n.forest;, (%m.Incl;)*)+) > <!ATTLIST %n.forestGrp; %a.global; type CDATA #IMPLIED TEIform CDATA 'forestGrp' > ]]> ]]>

Tables tag set

<!ENTITY % figure 'IGNORE' > <!ENTITY % formula 'IGNORE' > <!ENTITY % row 'IGNORE' > <!ENTITY % table 'IGNORE' >

The current definitions are these: ]]>

The new definitions are as follows: <![%TEI.figures;[ <!ENTITY % XML.table "INCLUDE" > <![%XML.table;[ <!ELEMENT %n.table; - - ((%n.head; | %m.Incl;)*, (%n.row;, (%m.Incl;)*)+) > <!ATTLIST %n.table; %a.global; rows NUMBER #IMPLIED cols NUMBER #IMPLIED TEIform CDATA 'table' > ]]> <!ENTITY % XML.row "INCLUDE" > <![%XML.row;[ <!ELEMENT %n.row; - O ((%n.cell; | %n.table;), (%m.Incl;)*)+ > <!ATTLIST %n.row; %a.global; role CDATA data TEIform CDATA 'row' > ]]> <!ENTITY % XML.figure "INCLUDE" > <![%XML.figure;[ <!ELEMENT %n.figure; - - ((%m.Incl;)*, (%n.head;, (%m.Incl;)*)?, (%n.p;, (%m.Incl;)*)*, (%n.figDesc;, (%m.Incl;)*)?, (%n.text;, (%m.Incl;)*)?) > <!ATTLIST %n.figure; %a.global; entity ENTITY #IMPLIED TEIform CDATA 'figure' > ]]> <!ENTITY % XML.formula "INCLUDE" > <![%XML.formula;[ <!ELEMENT %n.formula; - - %formulaContent; > <!ATTLIST %n.formula; %a.global; notation %formulaNotations; #REQUIRED TEIform CDATA 'formula' > ]]> ]]>

The problem of the dictionary chapter

The TEI base tag set for dictionaries cannot be made XML conformant using the methods described here. That tag set distinguishes two top-level elements for dictionary entries: entry, which has a relatively well-defined structure, and entryFree, which has no prescribed structure at all: any element used in tagging dictionary entries may appear, within any other element, at any level of nesting. The desired freedom for entryFree entries is guaranteed by the inclusion exception on entryFree. The standard declaration for the element is this: ]]>

If we use the techniques described above, all of the members of the classes dictionaryParts, phrase, and inter will be made legal at every point within any members of any of those classes. Apart from the havoc that would wreak on the core tag set, it would wholly erase the distinction between entry and entryFree elements.

So some other method of handling anomalous dictionary entries is needed in an XML version of the TEI DTD. Borrowing ideas from B. Tommie Usdin and Deborah A. Lapeyre, and with thanks also to David J. Birnbaum, I propose a new approach to the problem.

The basic idea is to define an element for anomalous structures in dictionary entries. In this discussion, I'll assume this element is called dictAnomaly (for dictionary anomaly). For every element in the normal structure of a dictionary, the content model is changed by adding dictAnomaly as an alternative to the existing model. Thus the element superentry currently has the following declaration: ]]> After the change, it will have the declaration: ]]> That is, a superentry is either normal (an optional form element followed by one or more entry elements), or else it is anomalous. The dictAnomaly element itself is defined as allowing any sequence of character data, dictionary elements, inter-level elements, or phrase-level elements: ]]> An anomalous superentry contains a single dictAnomaly element, and nothing else.
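The effect of adding dictAnomaly as an alternative can be modelled with a short, hedged Python sketch. It does not reproduce the actual declarations; it encodes the prose description of superentry given above as a regular expression over one-letter stand-ins (f for form, e for entry, A for dictAnomaly). The helper name with_anomaly is hypothetical.

```python
import re

def with_anomaly(model):
    """Extend a content model with a single dictAnomaly (written A) as an alternative."""
    return f"(?:{model})|A"

# Normal superentry per the description above: an optional form, then one or more entries.
NORMAL_SUPERENTRY = r"f?e+"
SUPERENTRY = re.compile(with_anomaly(NORMAL_SUPERENTRY))

def is_valid_superentry(children):
    return SUPERENTRY.fullmatch(children) is not None

print(is_valid_superentry("fee"))  # normal structure
print(is_valid_superentry("A"))    # anomalous: a single dictAnomaly and nothing else
print(is_valid_superentry("fA"))   # mixing normal and anomalous content is rejected
```

The point of the design is visible in the last case: the anomaly is an alternative to the whole normal model, so the normal structure stays as strict as before.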

For elements which are currently defined with mixed content, dictAnomaly is simply added to the list of elements which can occur within them. This allows us to evade the mixed-content problem. The simplest way to do this is to define dictAnomaly as a phrase-level element in the dictionary tag set. It also allows anomalies to occur within generic phrase-level and inter-level elements which are used in dictionary entries.

In principle, the extensions file should handle this thus: ]]> But since we have to include new declarations for the entire phrase-level class system in the extensions file anyway (to fix the problems with phrase.seq), we can simply add dictAnomaly to phrase, as was done above.

Open questions and checklists

This list brings together in one place a number of open questions mentioned above.
Should entities for omissibility indications be introduced into the TEI Odd files? Or should they be introduced only in the DTD output from odddtd? (Current leaning: only in the odddtd output: entification would make the DTD fragments in the Guidelines too hard to read.)
Should graph be defined as proposed here, or more loosely?
Should the failure to parameterize exclusion exceptions be regarded as a corrigible error? (N.B. parameterizing them will require the creation of new entDoc elements for each of them.)
How many of the current class of global inclusions should actually be globally legal? Particularly to be considered here are the elements now defined as taking only #PCDATA.
What should we do in the short term (experimental XML version of the DTD) about specialPara?
What should we do in the long term (TEI P3.5 and P4) about specialPara?

Corrigible errors identified in this document are: absence of semicolons in parameter-entity references; use of ampersand connectors in four content models; use of #PCDATA not as prescribed in XML 1.0; excess parentheses in the definition of phrase.

Miscellaneous Housekeeping

A few scraps necessary for housekeeping have no obvious home in this document; I'll put them here.

Before we define component, we need to embed all the entity files for the selected tag sets:
<![ %TEI.verse; [
<!ENTITY % TEI.verse.ent SYSTEM 'teivers2.ent' >
%TEI.verse.ent;
]]>
<![ %TEI.drama; [
<!ENTITY % TEI.drama.ent SYSTEM 'teidram2.ent' >
%TEI.drama.ent;
]]>
<![ %TEI.spoken; [
<!ENTITY % TEI.spoken.ent SYSTEM 'teispok2.ent' >
%TEI.spoken.ent;
]]>
<![ %TEI.dictionaries; [
<!ENTITY % TEI.dictionaries.ent SYSTEM 'teidict2.ent' >
%TEI.dictionaries.ent;
]]>
<![ %TEI.terminology; [
<!ENTITY % x.common '' >
<!ENTITY % m.common '%x.common; %m.bibl; | %m.chunk; | %m.hqinter; | %m.lists; | %m.notes; | %n.stage;' >
<!ENTITY % TEI.terminology.ent SYSTEM 'teiterm2.ent' >
%TEI.terminology.ent;
]]>
<![ %TEI.linking; [
<!ENTITY % TEI.linking.ent SYSTEM 'teilink2.ent' >
%TEI.linking.ent;
]]>
<![ %TEI.analysis; [
<!ENTITY % TEI.analysis.ent SYSTEM 'teiana2.ent' >
%TEI.analysis.ent;
]]>
<![ %TEI.transcr; [
<!ENTITY % TEI.transcr.ent SYSTEM 'teitran2.ent' >
%TEI.transcr.ent;
]]>
<![ %TEI.textcrit; [
<!ENTITY % TEI.textcrit.ent SYSTEM 'teitc2.ent' >
%TEI.textcrit.ent;
]]>
<![ %TEI.names.dates; [
<!ENTITY % TEI.names.dates.ent SYSTEM 'teind2.ent' >
%TEI.names.dates.ent;
]]>
<![ %TEI.figures; [
<!ENTITY % TEI.figures.ent SYSTEM 'teifig2.ent' >
%TEI.figures.ent;
]]>
Note that the terminology entity file unwisely refers to common, which we thus must define in an ad hoc way.

Before we do that, we have to provide default values for all the tagset entities:
<!ENTITY % TEI.prose 'IGNORE' >
<!ENTITY % TEI.verse 'IGNORE' >
<!ENTITY % TEI.drama 'IGNORE' >
<!ENTITY % TEI.spoken 'IGNORE' >
<!ENTITY % TEI.dictionaries 'IGNORE' >
<!ENTITY % TEI.terminology 'IGNORE' >
<!ENTITY % TEI.general 'IGNORE' >
<!ENTITY % TEI.mixed 'IGNORE' >
<!ENTITY % TEI.linking 'IGNORE' >
<!ENTITY % TEI.analysis 'IGNORE' >
<!ENTITY % TEI.fs 'IGNORE' >
<!ENTITY % TEI.certainty 'IGNORE' >
<!ENTITY % TEI.transcr 'IGNORE' >
<!ENTITY % TEI.textcrit 'IGNORE' >
<!ENTITY % TEI.names.dates 'IGNORE' >
<!ENTITY % TEI.nets 'IGNORE' >
<!ENTITY % TEI.figures 'IGNORE' >
<!ENTITY % TEI.corpus 'IGNORE' >
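These defaults work because, in both SGML and XML, the first declaration of an entity is the binding one: a document that has already declared a tag-set entity as INCLUDE is unaffected by the later IGNORE default. The following Python sketch imitates that rule in a simplified way. The regular expression and the function name bind_parameter_entities are illustrative assumptions; the sketch handles only literal parameter-entity declarations and is not a DTD parser.

```python
import re

# Matches declarations of the form <!ENTITY % name 'value' > (simplified).
ENTITY_DECL = re.compile(r"<!ENTITY\s+%\s+(\S+)\s+(['\"])(.*?)\2\s*>")

def bind_parameter_entities(*sources):
    """Scan DTD text in reading order; the first declaration of each name wins."""
    bound = {}
    for text in sources:
        for name, _quote, value in ENTITY_DECL.findall(text):
            bound.setdefault(name, value)  # later redeclarations are ignored
    return bound

# A document's DTD subset is read before the defaults in the DTD file.
subset = "<!ENTITY % TEI.verse 'INCLUDE' >"
defaults = "<!ENTITY % TEI.prose 'IGNORE' > <!ENTITY % TEI.verse 'IGNORE' >"

bound = bind_parameter_entities(subset, defaults)
print(bound["TEI.verse"])  # the document's own choice survives
print(bound["TEI.prose"])  # the default applies where nothing was declared
```

The same mechanism governs the XML.* entities used throughout the extension declarations: each is declared INCLUDE just before its marked section, and an earlier declaration as IGNORE would suppress that section.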

And we need to define the TEI keywords and default generic identifiers:
<!ENTITY % INHERITED '#IMPLIED' >
<!ENTITY % ISO-date 'CDATA' >
<!ENTITY % extPtr 'CDATA' >
<!ENTITY % TEI.elementNames SYSTEM 'teigis2.ent' >
%TEI.elementNames;

Notation

The notation in this paper is fairly simple:
E, E' (E-prime), F, G are regular expressions. For purposes of this discussion, they are also content-model groups.
L(E) is the language accepted by E.
Σ is the alphabet (set) of atomic symbols used in the expressions E, etc.
Σ* is the set of all strings of symbols in Σ, including the empty string.
I is the set of symbols named in the relevant (active) inclusion exceptions; in the context of a regular expression E', I should be taken to stand for an alternation of all the symbols i in the set I. In an actual content model, the expression written here as I* will normally be written %Istar; or (%m.I;)*, where the parameter entities are declared along these lines: ]]>
i is an arbitrary symbol in the set I.
x, y are strings of atomic symbols (members of Σ*).
xy is the concatenation of x and y.
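As a hedged illustration of this notation, the Python sketch below takes a small expression E, a rewritten expression E' in which I* has been written in explicitly, and checks by brute force that E' accepts exactly those strings that are in L(E) once all symbols of I are deleted. The shape of E' follows the content model for text given earlier in this document; the one-letter symbols, the two-member set I, and the function names are assumptions made for brevity.

```python
import re
from itertools import product

SIGMA = "fbgk"  # stand-ins for front, body, group, back
I = "np"        # two stand-ins for members of the inclusion set I

E = re.compile(r"f?(b|g)k?")                                  # E: inclusions not written out
E_PRIME = re.compile(r"[np]*(f[np]*)?(b|g)[np]*(k[np]*)?")    # E': I* written in explicitly

def strip_inclusions(w):
    """Delete every symbol of I from the string w."""
    return "".join(ch for ch in w if ch not in I)

def agree_up_to(max_len):
    """True if, for every string up to max_len, w is in L(E') exactly when
    w with its inclusions removed is in L(E)."""
    for n in range(max_len + 1):
        for symbols in product(SIGMA + I, repeat=n):
            w = "".join(symbols)
            in_e_prime = E_PRIME.fullmatch(w) is not None
            in_e_after_stripping = E.fullmatch(strip_inclusions(w)) is not None
            if in_e_prime != in_e_after_stripping:
                return False
    return True

print(agree_up_to(5))
```

The check is exhaustive only up to the chosen length, so it is evidence rather than proof, and it relies on Σ and I being disjoint.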

Simple content-model normalizations
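As a hedged illustration of the reduction rules listed in this section, the following Python sketch applies them bottom-up to content models represented as nested tuples. The tuple representation and the helper names nullable and star are assumptions made for this sketch; red is the name used in the rules themselves.

```python
# Content models as nested tuples (an assumed, illustrative representation):
#   ("sym", name), ("seq", [e, ...]), ("alt", [e, ...]),
#   ("opt", e) for E?, ("star", e) for E*, ("plus", e) for E+

def nullable(e):
    """True if the expression accepts the empty string."""
    kind = e[0]
    if kind == "sym":
        return False
    if kind == "seq":
        return all(nullable(c) for c in e[1])
    if kind == "alt":
        return any(nullable(c) for c in e[1])
    if kind == "plus":
        return nullable(e[1])
    return True  # "opt" and "star" always accept the empty string

def star(e):
    """Build E*, stripping ?, + and * from the operand: (E?)* = (E+)* = (E*)* = E*."""
    while e[0] in ("opt", "plus", "star"):
        e = e[1]
    return ("star", e)

def red(e):
    """Reduce an expression by applying the rules to subexpressions first."""
    kind = e[0]
    if kind == "sym":
        return e
    if kind in ("seq", "alt"):
        return (kind, [red(c) for c in e[1]])
    inner = red(e[1])
    if kind == "star":
        return star(inner)
    if kind == "plus":
        # if E is nullable, red(E+) = red(E)*
        return star(inner) if nullable(inner) else ("plus", inner)
    # kind == "opt"
    if inner[0] == "plus":
        return star(inner)   # red((E+)?) = red(E)*
    if nullable(inner):
        return inner         # if E is nullable, red(E?) = red(E)
    return ("opt", inner)

a = ("sym", "a")
print(red(("opt", ("plus", a))))   # ('star', ('sym', 'a'))
```

Because subexpressions are reduced before their parents, a single pass leaves no rule applicable, which matches the instruction to apply the rules for as long as any of them applies.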

In order to ensure that the methods of handling inclusions are always applicable, it may be necessary to normalize content models. The following reduction rules should be applied repeatedly, as long as any reduction rules apply.
if E is nullable, then red(E?) = red(E)
red((E+)?) = red(E)*
if E is nullable, then red(E+) = red(E)*
red((E?)*) = red(E)*
red((E+)*) = red(E)*
red((E*)*) = red(E)*
]]>

Reachability summary

TeiCorpus2 > teiHeader
TeiCorpus2 > tei.2
tei.2 > teiHeader

]]>