%isogdoc %ISONums %ISOLat1 %ISOPubl ]> Tags for TEI Documents <tline>Supplement to ISO 8879 Annexe E <author>C. M. Sperberg-McQueen <docnum>TEI &docfile <date>&docdate </titlep> <abstract> This document lists and describes the tags used in Text Encoding Initiative documents to date (through March 1990) with reference to the tag set of ISO 8879 Annexe E (<q>ISO starter set</q>). Semantic and structural analogues with the ISO starter set are pointed out, as are divergences in the usage of tags which are present in the starter set. <p> This document is intended to provide the basis needed for extending the ISO starter set to handle the needs of documents internal to the TEI. The DTD to be so constructed is called here the TEI DTD for internal documents. This document does not now contain the DTD. </abstract> <toc> </frontm> <body> <h1>Introduction <p> This document lists and describes the usage of all the tags (generic identifiers) and entities found in a scan of all TEI documents coded in (S)GML, with the exception of some tags used as examples or used in error, which were omitted where detected. All tag and entity names found by a primitive SGML scanner are included, though some tags have not been used for some time, others appear to reflect some attempts at descriptive markup which were never implemented in Waterloo GML, and some of the entities are used solely to solve some special layout problems in a couple of documents. All tag names defined in the TEI extensions to Waterloo Script are included as well, whether they appear in the scanner's output or not. The unimplemented or experimental tags are included less for their historical interest, such as it might be, than because (a) this list should be useful to anyone working with TEI documents, whether old or new, and (b) it is simpler to include them than to decide correctly for each tag whether it is <q>real</q> or not. 
<p> Some peculiarities of existing TEI internal documents are also explained briefly for the benefit of those implementing software to handle these documents outside their native environment. <note> The section numbers are to be interpreted as though they had a leading decimal point. <q>11</q> and <q>12</q> are subordinate to <q>1</q>. </note> <!> <h1>1 Elements <p> This section lists elements used in TEI documents or defined in the ISO starter set, in several groups. First, tags present in both sets are listed (section 11), then tags defined but not used (section 12) or used by the TEI but not defined in the starter set (set subtraction). The last set (tags used by the TEI but not in the starter set) is divided into (a) tags which should be added to the DTD (section 13), (b) tags of uncertain status, which ought possibly to be added to the DTD (section 14), (c) tags to be renamed in accordance with the starter set usage (section 15), and (d) tags to be eliminated from documents as and when they are made conformant to the TEI DTD for internal documents (section 16). <!> <h2>11 Elements in ISO Starter Set Used by TEI Documents <p> These elements used in TEI documents already occur within the ISO starter set. <ul compact> <li>abstract <li>address <li>aline <li>appendix <li>author <li>backm <li>body <li>c <li>cit <li>date <li>dd <li>ddhd <li>dl <li>docnum <li>dt <li>dthd <li>fig <li>figcap <li>figref <li>fn <li>frontm <li>gd <li>gl <li>gt <li>hdref <li>hp1 <li>hp2 <li>hp3 <li>h1 <li>h2 <li>h3 <li>h4 <li>li <li>liref <li>lq <li>note <li>ol <li>p <li>preface <li>q <li>sl <li>title <li>titlep <li>toc <li>ul <li>xmp </ul> <!> <h2>12 Elements in ISO Starter Set Not Used by TEI Documents <!> These elements in the ISO starter set have not thus far been used in any TEI documents. They should be supported on principle, but partial implementations can leave these out with lower cost in utility than other tags. 
Tags which have not been used but which provide standard equivalents for TEI tags (e.g. <tag>bibliog</tag>) are not listed here. <ul compact> <li>artwork <li>f <li>figbody <li>figdesc <li>figlist <li>fnref <li>fr <li>gdg <li>glossary <li>hp0 <li>h0t <li>h1t <li>h2t <li>h3t <li>h4t <li>index <li>ix <li>lines <li>nl <li>th <li>tline <li>top1 <li>top2 <li>top3 <li>top4 </ul> <!> <h2>13 Elements to Be Added to the DTD <!> These elements are used in TEI documents and should be handled by any software for handling TEI documents in general. <gl> <gt>act <gd>within an <tag>action</tag>, says what is to be done <gt>action <gd>action item assigned to an individual during a meeting. Contains subtags <tag>who</tag>, <tag>act</tag>, and <tag>duedate</tag> (all required), and optionally <tag>docref</tag> tags. <gt>attend <gd>for meetings, contains lists of committee members present and absent. Contents are prose, use same content model as paragraph. <gt>bib <gd>within a <tag>bl</tag> (bibliographic list) marks the individual bibliographic items. End tag is omissible. <gt>bl <gd>bibliographic list (occurs e.g. within a bibliography). Takes <term>compact</term> attribute like other lists. <gt>cit <gd>name of cited (monographic) work (for formatting, typically italicizes) <gt>docref <gd>within an <tag>action</tag>, indicates what document number has been assigned for the work to be done. End tag is omissible. <gt>duedate <gd>within an <tag>action</tag>, indicates deadline. End tag is omissible. <gt>emph <gd>emphasized phrase <gt>event <gd>within an hourplan, the description of the event which is to occur at a given time (given by <tag>time</tag>). End tag is omissible. <gt>eventh <gd>header for event column within hour plan. <gt>hp <gd>hourplan (list composed of <tag>time</tag> and <tag>event</tag> pairs). Takes <term>compact</term>, <term>headhi</term>, <term>termhi</term>, and <term>tsize</term> attributes like definition lists. 
<gt>ital <gd>italicized phrase (when it is unclear why italics are used) <gt>lihd <gd>heading for a list item (<q>title</q> of the item, for structured list items in ordered, unordered, or simple lists). End tag is omissible. <gt>lp <gd>paragraph within a list but not within any list item. End tag is omissible. <gt>mdecl <gd>(sample) markup declaration (typically italicizes, bolds, or delimits with markup declaration open and close delimiters) <gt>org <gd>organization name (used in title page to label author information; see also <tag>rep</tag>). End tag is omissible. <gt>presentee <gd>within title page, gives name of organization to whom the document is presented (e.g. for a grant request, the name of the funding agency). End tag is omissible. <gt>rep <gd>name of an individual representing an organization (used in titlepage information and in lists of organizations and their representatives). End tag is omissible. <gt>sc <gd>schedule (list composed of <tag>scheddate</tag> and <tag>schedtodo</tag> pairs). Takes <term>compact</term> and <term>termhi</term> attributes like glossary lists. <gt>scheddate <gd>within a schedule, the date by which something is expected. End tag is omissible. <gt>schedtodo <gd>within a schedule, the description of what is to be accomplished by a given date. End tag is omissible. <gt>sgmlxmp <gd>empty tag to mark location of embedded external CDATA entity (for SGML examples). Takes <term>file</term> attribute to indicate location of the data. <gt>tag <gd>tag (marks SGML tags used within prose text -- may bold, italicize, or delimit) <gt>term <gd>technical term (typically italicized) <gt>time <gd>within an hour plan, the time at which something (given by <tag>event</tag>) is scheduled to occur. End tag is omissible. <gt>timeh <gd>header for time column within hour plan. <gt>who <gd>within an <tag>action</tag>, says who is responsible. End tag is omissible. 
</gl> <!> <h2>14 Elements to Be Added, Renamed or Changed <!> These elements are used in TEI documents but not defined in the ISO starter set. It is not clear to me whether these should be added to a TEI tag set or eliminated from any TEI documents made to conform to the TEI internal tag set. <gl> <gt>acronym <gd>Acronym of a participating organization <gt>box <gd>Segment of text printed within a box (e.g. to make it stand out) <gt>budgetsummary <gd>specialized table for budget summary information. If retained, this should acquire a <term>cols</term> attribute to match <tag>tbl</tag>. <gt>include <gd>empty tag to call attention to file inclusions. Takes <term>file</term>, <term>from</term>, and <term>to</term> attributes to specify location of data and allow inclusion of a subset of the file.<fn>It may be noted that the file <term>teiblank</term> regularly included in TEI documents does nothing but reset some internal Waterloo Script variables to allow leading blanks in the input stream to be ignored. Where such processing instructions are not needed, this entity can be mapped to an empty string.</fn> <gt>net <gd>network address <gt>set <gd>specify processing parameters for application. Takes attributes for name and value of parameter (<term>item</term> and <term>value</term>) and optionally for name of <term>tag</term> concerned. <gt>sub <gd>subscript string <gt>tel <gd>telephone number (in address list) </gl> <!> <h2>15 Elements to Be Renamed or Changed <!> These elements are used in TEI documents but have close analogues in the ISO starter set or in the set of tags already listed as being needed and should be used with the ISO names or those given in 13. Also listed here are tags used in TEI documents with nonstandard attributes or attribute values. <gl> <gt>assoc <gd>Name of an organization (in list of organizations). Use <tag>org</tag> instead. <gt>association<gd>Name of an organization (in list of organizations). Use <tag>org</tag> instead. 
<gt>back <gd>Back matter. Use <tag>backm</tag>. <gt>by <gd>In an action, name of responsible member. Use <tag>who</tag>. <gt>cite <gd>Cited title. Use <tag>cit</tag>. <gt>doc <gd>Document. Use <tag>general</tag> or whatever name is used for TEI DTD. For <term>id</term> attribute use <tag>docnum</tag>, for <term>date</term> attribute use <tag>date</tag>, both in the <tag>titlep</tag> area.<fn>In general, boilerplate files can give the <term>docdate</term> entity as the content of the latter, and possibly the <term>docfile</term> entity as the content of the former, if these entities can be conveniently initialized.</fn> <gt>docid <gd>Document id. Use <tag>docnum</tag>. <gt>eg <gd>Example. Use <tag>xmp</tag>. Where the example contains SGML markup a marked section must also be used. <gt>fig <gd><term>Font</term> attribute is not supported by the ISO starter set. <gt>front <gd>Front matter. Use <tag>frontm</tag>. <gt>gdoc <gd>Document. Use <tag>general</tag> or whatever name is used for TEI DTD. The <term>ju</term> attribute is not supported by the ISO starter set. Distribute the information now given in the <term>sec</term> attribute among the <term>security</term>, <term>status</term>, and <term>version</term> attributes. <gt>GS <gd>Gary Simons? This appears to be a typo: content rather than markup in brackets. <gt>head <gd>Heading. Use <tag>h1</tag>. <gt>h1sub <gd>Second line of an H1 title. Use <tag>h1t</tag> or just use multiple lines without delimiting. <gt>l <gd>line of a budget summary or table. Use the <tag>r</tag> tag for table rows. <gt>label <gd>header section of budget summary or table. Use the <tag>hr</tag> tag for header rows and place the <term>cols</term> value on the <tag>tbl</tag> tag. <gt>lit <gd><q>literal</q> tag for printing SGML without interpretation. Use <tag>tag</tag> tag instead. <gt>ol <gd><term>Compact</term> attribute does not take numeric values in the ISO starter set. <gt>row <gd>line of a budget summary or table. 
Use the <tag>r</tag> tag for table rows. <gt>sc <gd><term>Compact</term> attribute does not take numeric values in the ISO starter set. <gt>section <gd>Heading. Use <tag>h1</tag>. <gt>sl <gd><term>Compact</term> attribute does not take numeric values in the ISO starter set. <gt>subsection<gd>Heading. Use <tag>h2</tag>. <gt>t <gd>Title of section or list item. Use <tag>h1t</tag> or <tag>h2t</tag> for the former, <tag>lihd</tag> for the latter. <gt>table <gd>table. Use <tag>tbl</tag>. <gt>tblhdr <gd>header section of budget summary or table. Use the <tag>hr</tag> tag for header rows and place the <term>cols</term> value on the <tag>tbl</tag> tag. <gt>tp <gd>title page. Use <tag>titlep</tag>. <gt>ul <gd><term>Compact</term> attribute does not take numeric values in the ISO starter set. <gt>xmp <gd><term>Font</term> attribute is not supported by the ISO starter set. </gl> <p> The use of the <term>font</term> attribute on various tags should perhaps be allowed rather than forbidden, in which case the corresponding entries in the list above should be moved to section 13 or 14. <!> <h2>16 Elements to Be Eliminated <!> These elements are used in TEI documents and have no direct analogue in the ISO starter set, but I believe they need not be added to the TEI internal tag set; instead they should be avoided in TEI documents. <gl> <gt>budget <gd>body of budget summary. Starter set tables do not mark the table body as a unit. <gt>tblbody <gd>table body. Use no tag to group the rows of the body. Place <term>cols</term> information on the <tag>tbl</tag> tag, not here. </gl> <p> Also to be eliminated are the end tags which use <q>< + e</q> as the end-tag open delimiter rather than &etago;. 
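<p> As an illustration of the renamings in section 15 and of the end-tag change just described, the following sketch shows a short fragment before and after conversion. The text content is invented for the example, and the sketch assumes that an old-style end tag for an unordered list has the form &stago;eul&tagc;; actual documents may differ.
<xmp><![ CDATA [
Before (hypothetical fragment in the older TEI usage):

<section>Work plan
<subsection>First year
<ul>
<li>draft the guidelines
<li>circulate them for comment
<eul>

After (starter-set names, standard end-tag open delimiter):

<h1>Work plan
<h2>First year
<ul>
<li>draft the guidelines
<li>circulate them for comment
</ul>
]]></xmp>
<p> Only the generic identifiers and the end-tag delimiter change; the content and the order of the elements are untouched, so a conversion of this kind can probably be done by simple string substitution.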
<!> <h1>2 Entity References <!> This section lists entity names (or <q>symbols</q> in Waterloo Script terminology) used within TEI documents, grouped with respect to their appearance or non-appearance in the public entity sets of ISO 8879 and with respect to their utility for a TEI entity set for internal documents. <!> <h2>21 Standard Entities Used in TEI Documents <!> <p> The TEI documents scanned explicitly use only the following entities from ISO 8879: <ul compact> <li>amp (ampersand) <li>gt (greater than or right angle bracket) <li>hellip (horizontal ellipsis: three periods) <li>lt (less than or left angle bracket) <li>oslash (Scandinavian o with slash) <li>period </ul> <term>Hellip</term> is in the Publishing entity set (ISO 8879-1986(E) Clause D.4.3.3); <term>oslash</term> is in entity set Added Latin 1 (Clause D.4.2.1). The others are all in the Numeric and Special Graphic entity set (Clause D.4.3.1). <p> The following entities are used implicitly (they are used in the definition of some undelimited string-substitution rules which might be implemented on a conforming parser with shortrefs). <ul compact> <li>a e i o u A E I O U umlaut <li>a e i o u A E I O U circumflex <li>a e i o u A E I O U acute <li>a e i o u A E I O U grave <li>aring, szlig, ntilde, Ntilde, ccedil, Ccedil <li>c C acute (Added Latin 2) <li>ldquo, lsquo (from Numeric and Special Graphic) <li>mdash (from Publishing) <li>uml, acute, circ, grave (from Diacritical Marks) </ul> Except as noted, all of these are from the Added Latin 1 entity set. <p> From this I infer that if not all public entity sets of ISO 8879 can be adopted, the sets Added Latin 1, Diacritical Marks, Numeric and Special Graphic, and Publishing are the most crucial, with Added Latin 2 next in importance. <!> <h2>22 Entities to Be Added to the Entity Sets <!> These entities appear useful enough that they should be added to a TEI entity set. 
Any software for handling TEI internal documents should be able to recognize and process them. <gl> <gt>AI <gd>Committee for Text Analysis and Interpretation <gt>docdate <gd>Date document (or section) last revised. The TEI Waterloo Script extensions get this date from the system for each included file; since SGML does not allow dynamic redefinition of entities, a conformant document will have to assign a single document date for an entire document. <gt>docfile <gd>File in which document (or driver file) resides. The TEI Waterloo Script extensions use this entity as part of the document number; this allows a single canned header to be used for all documents. If it is onerous to implement this in conformant SGML I am willing to omit it. <gt>docstatus <gd>Version number or descriptive term like <q>Draft</q> or <q>Final</q>. TEI extensions to Waterloo GML use this value in generating page footers and other values (if status is draft, then docdate includes a time of day stamp). Could perhaps be hidden in Waterloo implementation if hard to handle elsewhere. <gt>etago <gd>String version of end-tag-open delimiter (</) <gt>ML <gd>Committee for Metalanguage and Syntax Issues <gt>stago <gd>String version of start-tag-open delimiter (<) <gt>sysfnam <gd>System file name. TEI extensions to Waterloo GML use this from time to time, but it should be hidden as far as possible within the Waterloo extensions. If hiding is not successful, it may pop up. Software would do well to be prepared for it. <gt>tagc <gd>String version of tag-close delimiter (>) <gt>TD <gd>Committee for Text Documentation <gt>TEI <gd>Text Encoding Initiative <gt>TR <gd>Committee for Text Representation </gl> <p> As a precautionary measure, the <term>stago</term>, <term>etago</term>, and <term>tagc</term> entities should probably be joined by similar entities for each delimiter named in the standard: i.e. AND, COM, CRO, etc. (as in Figure 3 of ISO 8879). 
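<p> A minimal sketch of how the three delimiter entities might be declared, assuming the reference concrete syntax. They are written here as internal CDATA entities so that the replacement text is treated as data and not recognized as markup. The exact form to be used in a TEI entity set is not settled by this document.
<xmp><![ CDATA [
<!ENTITY stago CDATA "<"  -- start-tag open delimiter -->
<!ENTITY etago CDATA "</" -- end-tag open delimiter -->
<!ENTITY tagc  CDATA ">"  -- tag close delimiter -->
]]></xmp>
<p> Entities for the remaining delimiters could follow the same pattern, with one declaration for each delimiter role.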
<!> <h2>23 Entities to Be Renamed or Changed <!> These entities have ISO public analogues which should be used in preference to the forms used thus far. The ISO names are often opaque and might usefully be replaced with TEI names when those are ready. <gl> <gt>aa <gd>Scandinavian a. Use <term>aring</term>. <gt>ad <gd>a umlaut. Use <term>auml</term>. <gt>amper <gd>ampersand. Use <term>amp</term>. <gt>blank <gd>Blank (space). Use <term>emsp</term>, <term>ensp</term>, or <term>numsp</term>. <gt>clsquote <gd>opening single quote. Use <term>lsquo</term>. <gt>cq <gd>closing single quote. Use <term>rsquo</term>. <gt>cquote(1) <gd>closing single quote. Use <term>rsquo</term>. <gt>cquote(2) <gd>closing double quote. Use <term>rdquo</term>. <gt>dash <gd>em-dash. Use <term>mdash</term>. ISO 8879 defines <term>dash</term> as a <term>hyphen</term>. That is bad semantics but not our responsibility. <gt>dwn <gd>down arrow. Use <term>darr</term>. <gt>emdash <gd>em-dash. Use <term>mdash</term>. <gt>lft <gd>left arrow. Use <term>larr</term>. <gt>opsquote <gd>opening single quote. Use <term>lsquo</term>. <gt>oq <gd>opening single quote. Use <term>lsquo</term>. <gt>oquote(1) <gd>opening single quote. Use <term>lsquo</term>. <gt>oquote(2) <gd>opening double quote. Use <term>ldquo</term>. <gt>rgt <gd>right arrow. Use <term>rarr</term>. <gt>sysrb <gd>Script system variable for hard space (required blank). Use <term>emsp</term>, <term>ensp</term>, or <term>numsp</term>. <gt>ud <gd>u umlaut. Use <term>uuml</term>. <gt>umlaut <gd>Umlaut or diaeresis. Use <term>uml</term> or <term>die</term>. <gt>up <gd>up arrow. Use <term>uarr</term>. <gt>uumlaut <gd>Lower-case u umlaut. Use <term>uuml</term>. </gl> <!> <h2>24 Entities Not to Be Added <!> These entities have been used in TEI documents to handle specific problems but do not have wide enough utility to merit being included in a TEI entity set. They and similar entities may be defined as needed within specific documents. 
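<p> For instance, a document needing a few of the text-replacement entities listed below might declare them in its own document type declaration subset. The sketch assumes a document type named <term>teidoc00</term> and a hypothetical system identifier for the DTD; neither is prescribed by this document.
<xmp><![ CDATA [
<!DOCTYPE teidoc00 SYSTEM "teidoc00.dtd" [
<!ENTITY ab   "Advisory Board">
<!ENTITY arcb "Archives Board">
<!ENTITY wcom "Working Committees">
]>
]]></xmp>
<p> Declaring such entities locally keeps them out of the shared TEI entity set while leaving existing references to them in the text unchanged.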
<p> The first set are normal text-replacement entities. <gl> <gt>ab <gd>Advisory Board <gt>ai <gd>Committee for Text Analysis and Interpretation (Use uppercase form for committee names) <gt>arcb <gd>Archives Board <gt>assnlogo <gd>the acronym for a given participating association <gt>assocname <gd>the full name of a given participating association <gt>comp <gd>Compensible Costs <gt>dirc <gd>Direct Compensible Costs <gt>dirn <gd>Direct Noncomp. Costs <gt>dirt <gd>Total Direct Costs <gt>element <gd>? (possibly '<!ELEMENT') <gt>fb <gd>Fringe Benefits <gt>ncomp <gd>Noncompensible Costs <gt>repaddr <gd>street address of a representative of a participating organization <gt>repcity <gd>city name of a representative of a participating organization <gt>repinst <gd>institutional affiliation of a representative of a participating organization <gt>repname <gd>name of a representative of a participating organization <gt>repnet <gd>network address of a representative of a participating organization <gt>sal <gd>Salary and Wages <gt>sc <gd>Steering Committee <gt>sysfnam <gd>System file name. TEI extensions to Waterloo GML use this from time to time, but it should be hidden as far as possible within the Waterloo extensions. <gt>t <gd>Total <gt>tc <gd>Total Compensible <gt>tei <gd>Text Encoding Initiative <gt>teiana <gd>Text Analysis and Interp. <gt>ti <gd>Total Indirect Costs <gt>tn <gd>Total Noncompensible <gt>wcom <gd>Working Committees </gl> <p> The next set are Boolean values used in a mass mailing to control inclusion and exclusion of sections (like parameter entities on marked sections) and integer values used to control formatting in sections coded in native Waterloo Script (rare, but occurs in the memoranda of understanding) or (rarely) Script functions which return an integer value. 
<gl> <gt>accepted <gd>Boolean value for controlling mail-merge <gt>achcol <gd>Integer value for controlling layout <gt>achend <gd>Integer value for controlling layout <gt>appointed <gd>Boolean value for controlling mail-merge <gt>colwid <gd>Integer value for controlling layout <gt>invited <gd>Boolean value for controlling mail-merge <gt>midcol <gd>Integer value for controlling layout <gt>nwidth <gd>Integer value for controlling layout <gt>sysll <gd>Integer value for layout control (system line length) <gt>syspdev <gd>Script system variable for type of print device <gt>uiccol <gd>Integer value for controlling layout <gt>uicend <gd>Integer value for controlling layout <gt>xwidth <gd>Integer value for controlling layout </gl> <p> The final set are not really entity names at all, but pseudo-entities which will be rejected by a conforming SGML processor. <gl> <gt>D <gd>British Library R&D Report <gt>I <gd>TEI A&I Committee <gt>LC <gd>L&LC, Literary & Linguistic Computing <gt>T <gd>AT&T Bell Laboratories <gt>x <gd>Script hex function (e.g. & + x + ' + 16 + . = hex 16) </gl> <!> </body> </teidoc00>