Guidelines for Text Encoding for Interchange

The Wider Relevance of the Text Encoding Initiative

Lou Burnard (Oxford University Computing Services)

12 Nov 1993

Introduction

Standards come into being in many different ways. They may come about as ad hoc consequences of market forces; an obvious example is the IBM PC. They may result from pressure applied by well-intentioned groups of experts; much ISO standardization is of this type. Or they may come about as a result of the gradual recognition by all members of a large community that convergence on a common set of principles and practices is in their own best interests. This last kind of standard is the most likely to last, but the most difficult to achieve. This paper, derived from one presented at an OII Workshop in Luxembourg in December 1993, gives a brief introduction to one such standard: the Recommendations for Text Encoding for Interchange published by the ACH-ALLC-ACL Text Encoding Initiative (TEI) in May 1994.

The TEI addresses three problems common to all users of modern information technology: the difficulty of ensuring that information is reusable; the difficulty of ensuring that information represented in different ways can be seamlessly integrated; and the difficulty of facilitating loss-free information interchange between the widest choice of different platforms, different application systems and different languages. Its approach to these problems is characterized by innovative uses of SGML and an ambitious modular approach to DTD development, which we believe provides a model for many other applications.

These needs are common to many communities: in particular, they are likely to be as important in the commercial sector as in the research community. In a world where technology continues to mutate at an alarming rate, the need for a scheme which is designed to be future-proof is surely as great within an enterprise attempting to maximize the return on an expensive investment as it is to an academic attempting to plan for a research project which may last for decades.

What is the TEI?

The Text Encoding Initiative (TEI) is a research project sponsored and organized by three leading professional associations in the field of computer-assisted literary and linguistic research: the Association for Computational Linguistics (ACL), the Association for Literary and Linguistic Computing (ALLC) and the Association for Computing and the Humanities (ACH). These societies have a combined membership of several thousand leading scholars, researchers and teachers worldwide.

The TEI has been funded throughout its five years of activities on both sides of the Atlantic: primarily by the US National Endowment for the Humanities and by the EC within its framework Programme for Linguistic Research and Engineering, but also with grants from the Mellon Foundation and from the Canadian Social Sciences and Humanities Research Council. Of equal significance has been the donation of time and expertise by the many members of the wider research community who have served on the TEI's Working Committees and Working Groups.

The TEI community is a particularly demanding one: the purpose of research is to discover solutions to problems that have not yet been posed, and any scheme designed to support research must therefore place more emphasis on flexibility and extensibility to cope with the unforeseen than on highly optimized solutions to well understood problems.

At the same time, academic researchers are likely to be as impatient as anyone else with solutions that require extensive specialist knowledge of little relevance to the problem in hand. The TEI scheme must therefore be simple to grasp in its essentials while retaining an ability to scale up gracefully.

As its title suggests, the TEI is strongly interested in text. But this interest is by no means confined to the use of electronic text as a stage in the production of paper documents. It is equally concerned with the usage of electronic text as an end in itself, whether as a research database or a component in non-paper publications. Like the publishing industry, the research community has long realised that its stock in hand is not words on the page, but information, independent of any particular physical realization. As technology begins to emerge which is genuinely adequate to the task of integrating text, graphics and audio into a seamless information-bearing vehicle, so the importance of that integrated vision becomes more apparent. By providing a description of information which is independent of realization or media, the TEI scheme, like other SGML-based approaches, enormously facilitates the construction and exploitation of multimedia technology.

The TEI Guidelines have a dual focus: being concerned with both what should be encoded (i.e. made explicit) in an electronic text, and how that encoding should be represented for interchange. The approach taken is a two stage one: firstly, the identification of those distinctions concerning which there is common agreement; secondly the creation of a uniform encoding system within which those distinctions can be expressed for interchange. Early on in the project, the Standard Generalized Markup Language (SGML; ISO 8879) was chosen as the most appropriate vehicle to represent the textual features identified by the scheme, on the purely pragmatic grounds that no other candidate seemed to meet the project's initial design goals. If SGML had proved inadequate to the needs of researchers, we would have abandoned it without a qualm; perhaps fortunately, it did not. On the contrary, it has proved remarkably difficult to find problems for which a solution could not be expressed in SGML.

The prime deliverable of the TEI scheme is a large and integrated collection of SGML tag sets, providing hardware-, software-, and application-independent support for the encoding of all kinds of text in all languages and of all times. These tag sets are necessarily based on, but not limited by, existing encoding practices; they are designed to be both comprehensive and extensible. They are collectively documented in a substantial reference manual, the Guidelines for Text Encoding for Interchange. A first draft of this publication appeared in November 1990. Between 1992 and 1994, chapters of a revised draft were circulated electronically. The fully revised and completed document was finally published in May 1994 (Guidelines for the encoding and interchange of machine-readable texts, edited by C. M. Sperberg-McQueen and Lou Burnard; Chicago and Oxford, ALLC-ACH-ACL Text Encoding Initiative, 1994). It is published both in paper form and as a set of SGML files freely available over the internet.

Organization of the TEI scheme

As an SGML application, the TEI scheme necessarily requires the existence of some kind of document type definition (DTD). Current approaches to DTD design may be caricatured as falling into one of three camps, depending on their answer to the question "How many DTDs does the world need?".

For many of the first users of SGML, the appropriate answer was "one": the whole purpose of the exercise being to define a template against which all texts could be checked rigorously and consistently. This approach, which might be characterized by the phrase "we know what's best for you", has an obvious place in applications such as technical documentation, but is equally obviously inappropriate where the object of the exercise is to describe texts produced before the blessings of structured document design were revealed to the world.

At the opposite extreme are those whose answer would be "none", for whom no DTD can ever be adequate to the full complexity of the texts to be described: this attitude might be caricatured as "No-one will ever understand my problem". Again, it is not impossible to imagine applications for which a DTD consisting only of elements with the content model ANY would be entirely appropriate (the first electronic edition of the Oxford English Dictionary provides one obvious example), although its usefulness in the general case is less clear.

Perhaps most numerous are those who shrug their shoulders and say "as many as it takes": the world will always need new DTDs, in the boundary case one per document. In the name of pragmatism, this attitude risks crowding the fledgeling possibility of information interchange out of the nest entirely; nevertheless, its popularity reminds us of the importance of ensuring that the document drives the DTD, rather than the reverse.

The approach taken by the TEI attempts to combine virtues of all three of these approaches. It defines not one, but many possible DTDs, which may be tailored to the needs of a particular application in a way difficult or impossible with most other general purpose DTDs so far developed. The user of the TEI scheme is offered the opportunity of building a DTD which matches his or her requirements, but constrained to do so in a way that facilitates interchange.

We refer to this somewhat jocularly as the Chicago Pizza model (see figure ). All pizzas have some ingredients in common (cheese and tomato sauce); in Chicago, at least, they may have entirely different forms of pastry crust, with which (universally) the consumer is expected to make his or her own selection of toppings. In the same way, the user of the TEI scheme constructs a view of the TEI DTD by combining the core tag sets (which are always present), exactly one base tag set and his or her own selection of additional tag sets or toppings.
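In SGML terms, choosing a crust and toppings might look something like the sketch below, in which a document's DTD subset switches on one base tag set and one additional tag set by declaring parameter entities. The entity names TEI.prose and TEI.linking, the public identifier and the file name tei2.dtd follow the conventions of the published Guidelines as we recall them; they are not quoted from this paper and should be checked against the Guidelines before use.

```sgml
<!-- Sketch only: the public identifier, file name and entity names
     are assumptions, not taken from this paper -->
<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN" "tei2.dtd" [
  <!-- the crust: exactly one base tag set, here prose -->
  <!ENTITY % TEI.prose   'INCLUDE'>
  <!-- the toppings: any additional tag sets, here linking -->
  <!ENTITY % TEI.linking 'INCLUDE'>
]>
<TEI.2>
  <!-- the header and the text itself follow here; the core tag sets
       need no declaration because they are always present -->
</TEI.2>
```

Nothing else in the document needs to change if a different topping is wanted: one more entity declaration in the subset is enough.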

We use the term tag set to denote simply a collection of definitions for SGML elements and their attributes. In general, elements appear in only one tag set, though the current model allows for the redefinition of elements within different base tag sets. Elements may not be defined in more than one additional tag set.

This modularization is achieved by the use of parameter entities in the TEI DTD, which is further discussed below. To illustrate the basic mechanism: a minimal TEI-conformant document in which the base tag set for prose has been selected together with the additional tag set for linking begins with a document type declaration whose DTD subset declares the parameter entities for those two tag sets. Because this selection of tag sets is effected explicitly by declarations within the DTD subset, any recipient of the document can tell which TEI tag sets are required to process it. Any deviations from or modifications of the TEI definitions (for example, the renaming of elements, or the addition of new ones) may be made in a similar declarative manner.

The TEI core

Two core tag sets are available to all TEI documents without formality. The first defines a large number of elements which may appear in almost any kind of document, whatever kind of base tag set is in use. The second defines the header, providing something analogous to an electronic title page for the electronic text, as further discussed below.

Elements available to all bases

The core tag set common to all TEI documents provides means of encoding, with a reasonable degree of sophistication, the following textual features:

- typographically highlighted phrases (optionally distinguishing amongst highlighting for emphasis, technical terms, foreign words, titles etc.)
- quoted phrases, optionally distinguishing amongst direct speech, quotation, glosses, cited phrases etc.
- names, numbers and measures, dates and times, and similar data-like phrases
- lists of all kinds
- basic editorial changes (e.g. correction of apparent errors; regularization and normalization; additions, deletions and omissions)
- simple links and cross references, providing basic hypertextual features
- pre-existing or generated annotation and indexing
- bibliographic citations, adequate for most commonly used bibliographic packages, in either a free or a tightly structured format
- simple or complex referencing systems, not necessarily dependent on the existing SGML structure

There are few documents which do not exhibit some of these features; and none of these features is particularly restricted to any one kind of document. In most cases, additional more specialized tag sets are provided for those wishing to encode aspects of these features in more detail (see further below), but the elements defined in this core are believed to be adequate for most applications most of the time.

The header

The TEI scheme attaches particular importance to the provision of documentary or bibliographic information about electronic texts. Such information is essential for any satisfactory interchange of texts coming from multiple sources, or for which long term uses are envisaged. As with software, leaving the documentation of an electronic text to the last moment is a recipe for disaster all too commonly followed.

The TEI header is one of the few mandatory elements in a TEI document. It has four major divisions which together provide a detailed syntax for the documentation of:

- the electronic document itself and the sources from which it was derived
- the encoding system which has been applied
- descriptive information categorizing the document and its subject matter
- its revision history

The first of these, the file description, contains traditional bibliographic material, detailing title, intellectual responsibility and publication or distribution information relating to an electronic text, which can readily be translated into a conventional catalogue record for use by the growing number of forward-thinking academic and public libraries now coming to terms with their new role as curators of non-print electronic materials.

Several commentators, noticing how the day to day information processing of all sectors of the economy now takes place in electronic form only, have expressed concern at the difficulties faced by librarians and archivists in handling these new forms of historical records. Others, trying to come to terms with the wealth of information in cyberspace, have lamented the absence of any effective cataloguing standards for networked resources and other forms of electronic publication. The TEI Header represents a major contribution to overcoming both these problems.

Many electronic texts are essentially derivative works, created either by keying or scanning previously existing print materials, combining or modifying previously existing electronic materials, or both. The source description part of the TEI header allows an encoder to specify the source or sources from which a text has been derived, using traditional bibliographic concepts. The pedigree of a TEI-conformant text can thus be specified, in the same way as a conventional book will generally document its publishing history. A detailed formal description of changes made in producing a text can be recorded as a distinct revision history; this is particularly useful for highly dynamic texts.

As noted above, the TEI is not a fixed encoding scheme, but offers a variety of options appropriate to different situations. Consequently, the encoding description within a TEI Header is of particular importance to users of an electronic document. It provides, in structured or unstructured form, vital information about editorial conventions or policies, design decisions and even the selection of tags actually used within the document.

The profile description is used to group together a wide range of additional descriptive information ranging from specifications of the languages used within it, the situation or social context in which it was produced, its topics or classification, to demographic or social characteristics of its authors or participants. No-one is likely to need all of these categories of information, but the working groups involved in defining the header agreed that all of them are likely to be essential to some users.

At one extreme, an encoder may provide only a bibliographic identification of the text. At the other, encoders wishing to ensure that their texts can be used for the widest range of applications, will want to provide a level of detailed documentation approximating to the kind most often supplied in the form of a manual. Most texts will lie somewhere between these extremes; textual corpora in particular will tend more to the latter extreme.
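A header lying towards the minimal end of this range might be sketched as follows, with one element for each of the four divisions described above. The element names (fileDesc, encodingDesc, profileDesc, revisionDesc and their children) are given as we recall them from the published Guidelines and are not quoted from this paper; the title, names and dates are invented for illustration.

```sgml
<!-- Sketch only: element names are assumptions to be checked against
     the Guidelines; all content is hypothetical -->
<teiHeader>
  <!-- 1. file description: the electronic text and its source -->
  <fileDesc>
    <titleStmt>
      <title>An Example Text: an electronic transcription</title>
    </titleStmt>
    <publicationStmt>
      <p>Distributed for illustration only.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a hypothetical printed edition.</p>
    </sourceDesc>
  </fileDesc>
  <!-- 2. encoding description: editorial conventions applied -->
  <encodingDesc>
    <p>Apparent errors corrected; end-of-line hyphenation removed.</p>
  </encodingDesc>
  <!-- 3. profile description: languages, context, classification -->
  <profileDesc>
    <langUsage>
      <language id="en">English</language>
    </langUsage>
  </profileDesc>
  <!-- 4. revision history: one entry per change -->
  <revisionDesc>
    <change>
      <date>1 Jan 1994</date>
      <respStmt><name>A. N. Encoder</name><resp>transcription</resp></respStmt>
      <item>First complete draft.</item>
    </change>
  </revisionDesc>
</teiHeader>
```

Only the file description carries the bibliographic identification mentioned above; the other three divisions are where a corpus builder would add the fuller, manual-like documentation.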

A collection of TEI headers can also be regarded as a distinct document, and an auxiliary DTD is provided to support interchange of headers alone, for example between libraries or archives.

The TEI base tag sets

To construct a view of the TEI DTD, the user must always choose one of eight base tag sets. Six of these are intended for documents which are predominantly composed of one type of text; the other two are provided for use with texts which combine these basic tag sets.

- prose
- verse
- drama
- transcribed speech
- letters and memoranda
- dictionary entries
- terminological entries

Each TEI base tag set determines the basic structure of all the documents with which it is to be used. More exactly, it defines the constituents of text elements, combined as described above. In practice, so far, almost all the TEI bases defined are very similar in their basic structure, though means exist for them to vary. They do however differ greatly in their components: the subelements likely to appear within the divisions of a dictionary (for example) will be entirely different in kind from those likely to appear within the divisions of a letter or a novel. To cater for this variety, the constituents of all divisions of a TEI text element are not defined explicitly, but in terms of parameter entities. Figure 2 gives a simplified description of the mechanism used. The TEI DTD contains alternative declarations for these parameter entities, one for each base, each enclosed in a marked section which is ignored by default. Within the body of the DTD, elements are defined using these parameter entities only. When a base tag set is selected, one or other of the optional entity declarations will be "activated" by a declaration within the DTD subset. This will override the corresponding declaration within the TEI DTD itself, because it is given first. If no base is declared, the DTD will not compile.

Figure 2: Use of Parameter Entities in TEI DTD
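The mechanism can be sketched roughly as follows, reduced to two bases. This is a reconstruction under stated assumptions rather than a copy of the figure: the entity names TEI.prose and TEI.verse follow the Guidelines' naming as we recall it, and the content models given for component.seq are deliberately simplified and purely illustrative.

```sgml
<!-- Sketch only: entity names and content models are assumptions -->

<!-- (1) In the document's DTD subset, which is read first:
         select the prose base -->
<!ENTITY % TEI.prose 'INCLUDE'>

<!-- (2) In the TEI DTD: every base is switched off by default.
         The declaration in (1) wins because it was read first. -->
<!ENTITY % TEI.prose 'IGNORE'>
<!ENTITY % TEI.verse 'IGNORE'>

<!-- (3) Each base supplies its own value for component.seq inside a
         marked section; only the section whose keyword resolves to
         INCLUDE is processed -->
<![ %TEI.prose; [
  <!ENTITY % component.seq '(p | list | note)*'>
]]>
<![ %TEI.verse; [
  <!ENTITY % component.seq '(lg | l)*'>
]]>

<!-- (4) Divisions are declared once, in terms of the parameter
         entity only -->
<!ELEMENT div - - (head?, %component.seq;)>
```

If neither marked section is included, the reference to component.seq in step (4) has nothing to resolve to, which is why a DTD with no base declared fails to compile.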

The components of a text are parameterized by the use of entities whose values are specific to the particular base in use. All textual divisions are defined with the same content model, which includes a reference to the parameter entity %component.seq; the value of this parameter entity will however be different in different bases. In this way it is possible for the divisions of a text using the drama base (for example) to consist of speeches and stage directions, while those of a text using the dictionary base will consist of lexical entries.

Textual Divisions

Some forms of text (notably transcriptions of spoken language) are only notionally divisible into multiple levels of structure. For the majority however, there is a bewildering and highly application or culture-specific variety of high level units into which they may be divided. Fundamentally however, all objects such as chapters, sections, entries, acts and scenes, cantos etc. seem to behave in the same way: they are incomplete in themselves, and often nest hierarchically. In the TEI scheme all such objects are therefore regarded as the same kind of element, called here a division; though a distinction is made between divisions whose hierarchic position is regarded as inseparable from their semantics (these are encoded as div1, div2 etc. down to div7 elements) and those for which their position in the document tree is regarded as of lesser importance (these are known as vanilla divs). Numbered and un-numbered division elements may not be mixed in the same front, body, or back element.

A type attribute may be used to distinguish amongst divisions in some respect other than their hierarchic position: the values for this attribute (as for several others in the TEI scheme) are not standardized, precisely because no consensus exists, or is likely to exist, as to a generic typology. A set of legal values should however be defined for a given application, either in the TEI Header or by a user-defined modification.
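The two styles of division might be sketched as follows, using the prose base. The element names div1, div2 and div and the type attribute are those named in the text; the head and p elements, the n values and the type values ("book", "chapter", "part", "section") are hypothetical choices made for illustration, since the scheme does not standardize the values of type.

```sgml
<!-- Sketch only: type and n values are hypothetical -->

<!-- Numbered divisions: hierarchic position is part of the meaning -->
<body>
  <div1 type="book" n="1">
    <head>Book the First</head>
    <div2 type="chapter" n="1.1">
      <head>Chapter 1</head>
      <p>First paragraph of the chapter.</p>
    </div2>
  </div1>
</body>

<!-- Un-numbered ("vanilla") divisions: depth is given by nesting
     alone. The two styles would not be mixed within one body. -->
<body>
  <div type="part" n="1">
    <div type="section" n="1.1">
      <p>First paragraph of the section.</p>
    </div>
  </div>
</body>
```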

In the normal case, the components of all divisions in a particular base are homogeneous: they all use the same value for %component.seq;. However, the scheme also allows for two kinds of heterogeneity. If the general base is selected, together with two or more other bases, then different divisions of a text may have different constituents, though each division must itself be homogeneous. A mixed base is also defined, in which components from any selection of bases may be combined promiscuously across division boundaries. The way in which this is done is beyond the scope of this article; full details are however given in the appropriate chapter of the TEI Guidelines.

The TEI Class System

Textual features, and hence the elements which encode them, may be categorized or classified in a number of ways. The TEI scheme identifies two kinds of classification scheme: attribute classes and model classes. The distinction is however more formal than semantic; both are used for broadly similar purposes.

Members of an attribute class share the same set of attributes. For example, all elements which represent links or associations between one element and another do so using a common set of attributes, and are thus regarded as forming the attribute class pointer. All elements are members of at least one attribute class, the class global, which is further discussed below.

Members of a model class share the same structural properties: that is, they may appear at the same position within the SGML document structure. For example, the class phrase includes all elements which can appear within paragraphs but not spanning them, while the class chunk includes all elements which cannot appear within paragraphs (such as paragraphs, for example). A class inter is also defined, for elements such as lists which can appear either within or between chunk elements. Similarly, the class divtop contains all elements (headings, epigraphs etc.) which can appear at the start of a textual division.

As well as these general purpose classes, some functional or semantic classes are defined: for example, all elements used to mark editorial corrections or omissions are all members of the class edit; elements marking bibliographic citations etc. are all members of the class bibl and so on.

Elements may of course be members of more than one class. Classes may have super- and sub-classes, and properties (notably associated attributes) may be inherited. For example, reflecting the needs of many TEI users to treat texts both as documents and as input to databases, a sub-class of phrase called data is defined to include data-like features such as names of persons, places or organizations, numbers and dates, abbreviations and measures. These behave in exactly the same way as phrase elements, and so the data class is a sub-class of the phrase class.
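In DTD terms, a model class and its sub-class can be expressed as parameter entities, one listing the members of the other. The sketch below also includes the kind of extension hook (an "x-entity") on which the next paragraph relies. The names m.phrase, m.data and x.phrase imitate the Guidelines' naming conventions as we recall them; they and the short membership lists are assumptions made for illustration, not the actual TEI declarations.

```sgml
<!-- Sketch only: entity names and class membership are assumptions -->

<!-- In the document's DTD subset (read first): add a new element
     "keywords" to the phrase class -->
<!ENTITY % x.phrase 'keywords |'>
<!ELEMENT keywords - - (#PCDATA)>

<!-- In the DTD itself: the hook is empty by default; this
     declaration is ignored if the subset has already declared it -->
<!ENTITY % x.phrase ''>

<!-- The sub-class "data": names, numbers, dates and so on -->
<!ENTITY % m.data 'abbr | date | measure | name | num'>

<!-- The super-class "phrase" contains the hook, the whole of its
     sub-class, and its other members -->
<!ENTITY % m.phrase '%x.phrase; %m.data; | emph | foreign | term | title'>

<!-- Any content model that refers to the class picks up the new
     element without being edited -->
<!ELEMENT p - O (#PCDATA | %m.phrase;)*>
```

With the subset declaration present, the content model of p expands to include keywords alongside the existing members; without it, the hook contributes nothing.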

The formal definition of these classes in the SGML syntax used to express the TEI scheme makes it possible for users of the scheme to extend it in a simple and controlled way: new elements may be added into existing classes, and existing elements renamed or undefined, without any need for extensive revision of the TEI document type definitions. The process is demonstrated in . To add a new element (say, "keywords") to a class, enabling it to appear anywhere in the content model that other members of the class do, all that is needed is to re-define the corresponding "x-entity" within the document type subset.

The global attributes

One particularly important class is the global attribute class: all elements in the TEI scheme belong to this class and may therefore bear the following attributes:

    id      provides an SGML identifier for an element
    n       provides a possibly non-unique name or number for an element
    lang    specifies the language and hence the writing system used for an element
    rend    provides information about the rendering of an element where this is not otherwise specified

The id and n attributes allow for the identification of any element occurrence within a TEI-conformant text. Elements carrying an id attribute value may be the object of a link or cross-reference, or any of the other re-structuring mechanisms proposed by the TEI for circumventing the rigidly hierarchic structure of a simple SGML DTD. The fact that the requirement for such links is usually unpredictable is one reason for making this attribute global.

Values on id attributes must be unique (their declared value is ID). Values on the n attribute however need not be; they may be used to carry a TEI canonical reference. A method for defining the structure of such canonical reference schemes is also provided, so that documents using it can be processed automatically.

The lang attribute indicates both the language and hence the writing system applicable to the element's content, thus providing explicit support for polyglot or multiscript texts. If no value is given, that of the element's direct parent is assumed. (A number of TEI attributes have this characteristic, which is catered for by a TEI-defined keyword.) The value of this attribute identifies a special purpose language element which documents the language in use, optionally associating it with an external entity in which a formal writing system declaration may be given.
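The four global attributes might be declared and used along the following lines. The declared value ID for id is stated in the text; the remaining declared values, the entity names a.global and INHERITED, and the sample element content are assumptions modelled on the Guidelines as we recall them.

```sgml
<!-- Sketch only: entity names and declared values other than ID
     are assumptions -->

<!-- The TEI-defined keyword: to the SGML parser it is simply
     #IMPLIED; by TEI convention an unspecified value is taken from
     the parent element -->
<!ENTITY % INHERITED '#IMPLIED'>

<!ENTITY % a.global '
    id    ID     #IMPLIED
    n     CDATA  #IMPLIED
    lang  IDREF  %INHERITED;
    rend  CDATA  #IMPLIED'>

<!-- Every element's attribute list includes the global attributes -->
<!ATTLIST p %a.global;>

<!-- In the header: the language element that lang points to -->
<language id="fr">French</language>

<!-- In the text: id is unique, n need not be, lang refers to the
     language element above, rend records the source's appearance -->
<p id="p12" n="2.3" lang="fr" rend="italic">Je pense, donc je suis.</p>
```

Declaring lang as a reference rather than as free text is what allows a processor to follow it to the documented language and, from there, to any writing system declaration.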

The TEI writing system declaration (WSD) attempts to help encoders come to terms with a world in which, for one reason or another, documents may not always use the same universal character set, whether from ignorance, perversity, or the sheer impossibility of finding one large enough to represent all the glyphs they contain. It provides for the systematic documentation of a writing system, in terms of existing international or other standards, public or private entity sets, ad hoc transliteration schemes or explicit definitions, as well as combinations of all four.

Finally, the global rend attribute may be used to give information about the physical presentation of the text in the source, where this is not otherwise given. A default rendition may be specified for all elements of a given type. No specific set of values is defined for this attribute in the current draft, though it is probable that some suitable set of DSSSL primitives will be proposed in a later version.

It should be stressed that the rend attribute is not intended for use as a means of specifying the desired formatting of an element, except in so far as this may be determined by a desire to mimic the approximate appearance of the original text. Like other SGML applications, the TEI scheme attempts to provide elements for the encoding of those textual features deemed essential to a productive use of the encoded text; however, unlike most other SGML applications, the TEI scheme recognizes that for some, it is precisely the appearance of a text which is the object of research.

The TEI additional tag sets

A number of optional additional tag sets are defined by the current proposals. These include tag sets for special application areas such as alignment and linkage of text segments to form hypertexts; a wide range of other analytic elements and attributes; a tag set for detailed physical description of manuscript material and another for the recording of an electronic variorum modelled on the traditional critical apparatus; tag sets for the detailed encoding of names and dates; abstractions such as networks, graphs or trees; mathematical formulae and tables etc.

In addition to these application-specific specialized tag sets, a very general purpose tag set is also proposed for the encoding of entirely abstract interpretations of a text, either in parallel with it or embedded within it. This is based on the feature structure notation employed in theoretical linguistics, but has applications far beyond linguistic theory. A good introduction to this tag set is provided by D. T. Langendoen and G. F. Simons, ``A rationale for the TEI recommendations for feature-structure markup'', in Computers and the Humanities (forthcoming, 1994); for an extended discussion of an application of the feature structure scheme to the problems of encoding historical source materials, see D. I. Greenstein and L. Burnard, ``Speaking with one voice'' (ibid.). Using this mechanism, encoders are at liberty to define arbitrarily complex bundles or sets of features identified in a text, according to their own methodological bias. They may thus embed a whole range of interpretations of a text, linguistic, literary, or thematic, within a text in a controlled manner. The syntax defined by the Guidelines not only formalizes the way in which such features are encoded, but also provides for a detailed specification of legal feature/value pair combinations and rules determining, for example, the implication of under-specified or defaulted features. This is known as a feature system declaration.
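A very small feature structure, recording one analyst's reading of a single word, might be sketched as follows. The element names fs, f and sym are given as we recall them from the Guidelines and are not named in this paper; the type, feature names and values are hypothetical.

```sgml
<!-- Sketch only: element names are assumptions; the analysis itself
     is invented for illustration -->
<fs type="word analysis">
  <!-- each feature pairs a name with a value; sym is an empty
       element, so it takes no end-tag -->
  <f name="category"><sym value="noun"></f>
  <f name="number"><sym value="singular"></f>
</fs>
```

The same notation serves equally for non-linguistic interpretations: a feature named, say, "occupation" in a historical source would be encoded in exactly the same way.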

A set of additional elements is also provided for the encoding of degrees of uncertainty or ambiguity in the encoding of a text. These two tag sets exhibit in a particularly noticeable form one of the chief strengths of the TEI approach to encoding: it provides the encoder with a well-defined set of tools which can be used to make explicit his or her reading of a text. No claim to absolute authority is made by any encoder, nor ever should be; the TEI scheme merely allows encoders to come clean about what they have perceived in a text, to whatever degree of detail seems appropriate.

A user of the TEI scheme may combine as many or as few additional tag sets as suit his or her needs. The existence of tag sets for particular application areas in the current draft reflects, to some extent, accidents of history: no claim to systematic or encyclopaedic coverage is implied. Indeed, it is confidently expected that new tag sets will be added, and their definition will form an important part of the continued work of this and successor projects.

From General to Specific

The TEI Guidelines have taken more than five years to reach their present state, the first at which they can be said to be reasonably complete. In retrospect, it is doubtless true that they could have been created much more quickly with less involvement from the research community, or a clearer statement from it of a set of particular goals. But that statement would have inevitably limited the scope of the resulting scheme, providing exactly the kind of strait-jacket which we wished to avoid. Moreover, by prioritizing any one research agenda however well-articulated, we would have effectively disenfranchized and alienated all others. A little like the early Church fathers then, the TEI chose to provide as broad and as catholic a means of salvation as possible.

At the same time, the TEI scheme applies rigorously the principle essentia non sunt multiplicanda praeter necessitatem. (Generally attributed to William of Occam (1300-1349), this recommendation is known as Occam's Razor; it may be translated as "Essences should not be unnecessarily multiplied" and refers properly to the distinction made by the Scholastics between essence, those properties of an entity which define its type, and accidents, those properties specific only to one instance of an entity.) Thus all kinds of links between document elements, whatever their semantics, are encoded using the same tag set, in just the same way as all kinds of analytic segmentation of elements may be performed using the same tag set. The use of feature structure analysis is one instance of this principle; another is the way in which a

At the same time, there are many situations in which the TEI's desire to exclude no-one has led to a multiplication of distinctions at first sight rather bewildering. It seems, to say the least, unlikely that anyone will ever encode a document using every possible element defined by the union of every TEI tag set, though such a monster DTD is indeed possible. Even in a relatively small area such as the definition of text classification schemes, the TEI proposes three parallel (and mutually incompatible) methods. In the matter of hypertextual addressing, the TEI syntax permits 14 different location methods. Names of persons, places and organizations may be left unmarked, tagged simply as referring strings, or analysed into subcomponents specific to them. Bibliographic citations may be presented as simple prose, or as assemblages of specific elements, either highly structured or loosely assembled. The Guidelines are even, seemingly, unable to make up their mind whether to organize text into numbered or un-numbered divisions!

It is probable that many people confronted by the 1400 pages of the current printed version are likely to derive less comfort from knowing that somewhere in it exists precisely the general-purpose solution they need than they would from a demonstration of the application of that general mechanism to the specific problem currently facing them. As published, the Guidelines constitute a substantial document not intended for casual browsing. The TEI therefore plans to make available a number of smaller introductory tutorials focused on particular application areas. Two such have already appeared: one dealing with terminological systems (Melby, Alan et al., Terminology Interchange Format (TIF): a tutorial, Vienna, Infoterm, 1993), and another on the encoding of manuscript transcriptions (Robinson, Peter, Encoding of Primary Sources Using SGML, Oxford, Office for Humanities Communication, 1994); others are in the planning stage.

Conclusions

There are many reasons why the TEI Recommendations deserve consideration outside the academic world. This article has focussed chiefly on the complexity and generality of the scheme, with a view to demonstrating the intellectual adequacy of the TEI scheme as a good model for many SGML applications. It has also attempted to demonstrate how a simple modular scheme can be implemented in such a way as to maximize the interchange space within which information interchange takes place. The origins of the TEI scheme in the academic world mean that it has been designed with the widest possible set of applications in mind. Optimizing it for particular sets of users will be a new challenge.

For more information...

A TEI electronic bulletin board is maintained at the University of Illinois at Chicago: this is used to announce availability of TEI drafts and other publications, and to distribute them in electronic form, as well as providing an open forum for comment and discussion of the TEI recommendations. To subscribe to this service, send an electronic mail message in the form SUBSCRIBE TEI-L Your Name to the address Listserv@uicvm.uic.edu.

TEI publications are to be found at a number of anonymous FTP archives, notably that maintained by the University of Exeter at sgml.ex.ac.uk in the directory tei.