About These Guidelines

Intended Applications

Interchange, Local Use, and Data Creation

These guidelines are intended to be useful in the interchange of text from one scholar to another, from one research group or center to another, or from one computing system to another. Analogously, they should serve in moving text from one application program to another, or in maintaining text in a format common to several applications rather than maintaining several copies of a text, one in each format required. Finally, they are also intended to guide the scholar embarking on the creation of an electronic text, both in what textual features should be captured and in how they should be represented. These guidelines thus serve three primary functions:
  1. as a format for the interchange of texts
  2. as a common format for local processing
  3. as guidance in the creation of new electronic texts
These three functions are not identical, but in practice they are so thoroughly interrelated that it is hardly possible to achieve any one without achieving the others. The line between data capture and local processing, especially, disappears entirely when one considers that the aim of local processing might be precisely the capture of new information about the text, which is to be represented by the encoding.

Use of the Guidelines for Interchange

When these guidelines are used for interchange, it is expected that researchers or centers which use other encoding schemes internally will translate outgoing data from their internal encoding scheme into the scheme described by these guidelines, and similarly translate incoming data from the scheme described here into the one they use internally. The scheme described here is designed to enable such translation to occur without information loss; that is, it has been designed to be at least as expressive (in a formal sense) as any encoding scheme now known to be in wide use for textual research. The extension techniques described in chapter 9 may be used to give the TEI scheme whatever tags are necessary to capture the information in a non-TEI encoding, although the intention has been to minimize the need for recourse to such extensions.

In the simple case, the two sites or individuals exchanging texts know each other and know, or can inquire, what equipment the other is using. In the general case, however, a text may be made publicly available through an archive, a bulletin board, an anonymous file transfer server, or some other mechanism, without either the originator or the final recipient of the text knowing who the other is. In the simple case, these guidelines serve primarily as convenient pre-existing documentation of a file format, which can be referred to without being transmitted; existing software may also make transfer through this format simpler, and special variations in format to suit particular requirements of the partners are possible by private arrangement. In the general case, of course, such special arrangements are impossible, and both originator and recipient should be prepared to follow the guidelines strictly.
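The requirement that translation into and out of the interchange format lose no information can be made concrete with a round-trip check. The sketch below is a minimal illustration, assuming a project whose internal scheme maps one-to-one onto tags of the interchange scheme; the internal codes (`T`, `P`, `Q`), the tag names, and the functions `is_injective`, `export_record`, and `import_record` are all hypothetical, and real translations between encoding schemes are rarely this simple. It shows the essential condition: if two internal codes were translated to the same tag, the incoming translation could not tell them apart, and information would be lost.

```python
# Hypothetical internal codes and interchange tag names, for illustration only.
INTERNAL_TO_TEI = {"T": "title", "P": "p", "Q": "q"}


def is_injective(mapping):
    """True if no two internal codes translate to the same target tag."""
    return len(set(mapping.values())) == len(mapping)


def export_record(record, mapping):
    """Translate outgoing (internal_code, text) pairs into (tag, text) pairs."""
    return [(mapping[code], text) for code, text in record]


def import_record(record, mapping):
    """Translate incoming (tag, text) pairs back into internal codes."""
    inverse = {tag: code for code, tag in mapping.items()}
    return [(inverse[tag], text) for tag, text in record]


sample = [("T", "Hamlet"), ("P", "Enter Hamlet."), ("Q", "To be, or not to be")]

# A lossless translation gives back exactly what went out.
assert is_injective(INTERNAL_TO_TEI)
assert import_record(export_record(sample, INTERNAL_TO_TEI), INTERNAL_TO_TEI) == sample
```

A mapping such as `{"A": "hi", "B": "hi"}` fails `is_injective`, which is the situation the extension mechanism of chapter 9 is meant to avoid: the remedy is an additional tag, not a shared one.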
There is not, in this draft of the guidelines, a separate formal definition of the interchange format as opposed to the general recommendations for local processing; this reflects the vagueness of the distinction. The interchange format is to be understood as requiring:
  1. strict adherence to the DTDs and the SGML declaration reproduced in the appendix, unless modified or extended as described in chapter 9
  2. provision of tag documentation as described in part II for all tags not defined in these guidelines
  3. strict adherence to the requirements of the text documentation area in providing bibliographic identification of the text and description of the encoding practice
     Note: Strictly speaking, what is required (as described in chapter 5) is that the required information either be provided or be marked as unavailable. The option of marking information as unavailable is intended to enable sites with large collections of existing texts to export conforming texts without having to enrich their existing databases. It is emphatically not recommended as a method of evading the need to provide sound documentation of a text.
  4. rendition of the text in the characters of the Minimal Character Repertoire described in chapter 4
     Note: This is SDR's character conformance level 1.
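Requirement 4 lends itself to a mechanical check before a text is sent. The sketch below assumes, purely for illustration, that the repertoire consists of the ISO 646 invariant characters (unaccented Latin letters, digits, space, and a small set of punctuation); the actual Minimal Character Repertoire is the one defined in chapter 4, and the names `ASSUMED_REPERTOIRE` and `characters_outside` are hypothetical. It reports each character that would have to be represented some other way, for instance by an SGML entity reference.

```python
import string

# Assumption: the ISO 646 invariant set stands in for the repertoire of chapter 4.
ASSUMED_REPERTOIRE = frozenset(
    string.ascii_uppercase + string.ascii_lowercase + string.digits
    + " !\"%&'()*+,-./:;<=>?_"
)


def characters_outside(text, repertoire=ASSUMED_REPERTOIRE):
    """Return (line, column, character) for each character not in the repertoire.

    Line breaks themselves are not checked: str.splitlines() removes them
    before each line is scanned.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in repertoire:
                problems.append((line_no, col, ch))
    return problems
```

For example, `characters_outside("<p>café</p>")` reports `[(1, 7, 'é')]`, while the same word written with an entity reference, `"<p>caf&eacute;</p>"`, passes with an empty list.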
If a more formal definition of the interchange format is required for any reason, those interested should contact the Text Encoding Initiative to describe the requirements which need to be met.

Use of the Guidelines for Local Processing

The need to create a language rich enough for information-preserving interchange entails ensuring that the language can express the information recorded in any scheme intended for a specific application of computers to texts. Any single language adequate for many applications will necessarily be of interest to anyone using more than one kind of application software on their texts, or even to those developing new software for just one application. Machine-readable text can be manipulated in many ways; our aim has been to avoid assuming too much about what the reader of these guidelines will do with texts marked up according to these rules. It is assumed that this markup must be usable by programs which:

The aim has been to make these guidelines useful for marking up texts used in any of these applications; this has meant trying to avoid anything which would restrict their use in texts intended for any other application. It is safe to assume that printing and editing, being the most universally familiar operations upon machine-readable text, received at least as much attention as others, but the aim has never been to create yet another language for controlling text formatters or editors.

Use of the Guidelines in Text Creation

The description of textual features found in the chapters which follow should provide a useful checklist for scholars planning the creation of machine-readable versions of any text. Where there appears to be consensus in the text-computing community on what constitutes good or bad practice in some particular area, specific comments to that effect are provided. Where a given feature is generally found useful, the tag for that feature is recommended for general use; where it is found not worth tagging, tagging it is disparaged. Where the feature is neither generally useful nor generally pointless, its tagging or omission is left to the discretion of the individual working with the text. At the least, therefore, these guidelines should be useful in deciding what to capture and what to lose when representing a text in machine-readable form. Responsibility for the adequacy of the encoded text remains, of course, with the individual scholar.

Problems specific to data capture have not been considered explicitly in the pages which follow. The document type declarations in the appendix do specify which tags may be omitted when the text is processed using the SGML OMITTAG feature, because this is a very simple and general method of easing data capture. In general, though, methods for minimizing keystrokes, correcting the results of optical scanning, or augmenting existing texts with useful information depend too completely on the details of the individual situation to be susceptible to useful treatment here: the text being captured, the type of research foreseen, and the computer system at hand all affect the methods of data capture.
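To illustrate what the OMITTAG feature saves during data capture: when a document type declaration marks an element's end tag as omissible, an SGML parser supplies the missing end tag from context. The function below is a deliberately tiny stand-in for that inference, not an SGML parser. It handles a single omissible element type (assumed here to be `p`), assumes that element never contains itself, and closes it when another start tag of the same type or the end tag of an assumed container (`div` or `body`) appears. A real parser derives these rules from the content models in the DTD; the function name and its parameters are hypothetical.

```python
import re

# Start or end tag, optionally with attributes: <p>, </p>, <p rend="x">.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def infer_end_tags(text, omissible="p", closers=("div", "body")):
    """Insert the end tags of `omissible` that minimized markup leaves implicit.

    Simplified rules (assumptions, not derived from any DTD): the element
    cannot contain itself, and an open one is closed by the next start tag
    of the same type, by an end tag of a container in `closers`, or by the
    end of the input.
    """
    out = []
    is_open = False
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])          # character data before the tag
        is_end = m.group(1) == "/"
        name = m.group(2).lower()                # compare names case-insensitively
        implied_close = (not is_end and name == omissible) or (is_end and name in closers)
        if is_open and implied_close:
            out.append(f"</{omissible}>")        # the end tag the keyboarder left out
            is_open = False
        if name == omissible:
            is_open = not is_end                 # an explicit end tag also closes it
        out.append(m.group(0))
        pos = m.end()
    out.append(text[pos:])
    if is_open:
        out.append(f"</{omissible}>")
    return "".join(out)
```

Under these assumptions, `infer_end_tags("<div><p>One<p>Two</div>")` yields `<div><p>One</p><p>Two</p></div>`. Because markup minimization is excluded from the TEI exchange format, a step of this kind (or a conforming SGML parser) would be needed to expand minimized markup before a text is exchanged.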
Possible techniques for simplifying, speeding, or reducing the cost of data capture include editor macros and keyboard shorthands; simple parsers to recognize structural features in scanner output; special-purpose software to put word-processor or scanner data into SGML; the exploitation of SGML's rich set of mechanisms for minimizing the amount of markup which need be explicitly provided in a text; and even the development of special SGML document type declarations specifically for data capture, together with programs to read data in those forms and produce the desired form.
     Note: A wide variety of special-purpose software for putting word-processor or scanner data into SGML has become available as a result of the CALS Initiative.
     Note: Because they substantially complicate processing for those who have no conforming SGML processor handy, SGML's optional markup minimization features are all forbidden in the TEI exchange format; their use for local processing is, of course, a local decision.
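As one example of the "simple parsers" mentioned above, the sketch below turns plain scanner output into fully tagged SGML, writing every start and end tag explicitly rather than relying on minimization. The rules it applies are assumptions chosen for illustration, not recommendations of these guidelines: blocks are separated by blank lines, a single short line with letters but no lower-case letters is taken to be a heading that opens a new division, and everything else is a paragraph. The element names `div`, `head`, and `p` and the function names are likewise hypothetical; the tags actually recommended are described in later chapters. The characters `&` and `<` are replaced by the entity references `&amp;` and `&lt;`, assuming the document declares entities with those names.

```python
import re


def escape(text):
    """Replace characters that could be read as markup.

    '&' is replaced first so the '&' introduced for '<' is not escaped again.
    Assumes the DTD declares entities named amp and lt.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


def looks_like_heading(block):
    """Hypothetical heuristic: one short line with letters but no lower case."""
    return (
        "\n" not in block
        and len(block) <= 60
        and any(c.isalpha() for c in block)
        and block == block.upper()
    )


def tag_scanner_output(raw):
    """Convert blank-line-separated scanner text into fully tagged SGML."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]
    out = []
    in_div = False
    for block in blocks:
        if looks_like_heading(block):
            if in_div:
                out.append("</div>")            # close the previous division
            out.append("<div>")
            out.append(f"<head>{escape(block)}</head>")
            in_div = True
        else:
            # Rejoin the scanner's line breaks within a paragraph.
            text = " ".join(line.strip() for line in block.splitlines())
            out.append(f"<p>{escape(text)}</p>")
    if in_div:
        out.append("</div>")
    return "\n".join(out)
```

Paragraphs that precede the first heading are emitted outside any division; a project using a sketch like this would adjust the heuristics to its own scanner output and document type declaration.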