Design Principles

The basic goals of these Guidelines are determined by the results of a planning conference held at Vassar College in Poughkeepsie, New York, at the outset of the project. That planning conference agreed on the following statement of principles:

  1. The guidelines are intended to provide a standard format for data interchange in humanities research.
  2. The guidelines are also intended to suggest principles for the encoding of texts in the same format.
  3. The guidelines should
    1. define a recommended syntax for the format,
    2. define a metalanguage for the description of text-encoding schemes,
    3. describe the new format and representative existing schemes both in that metalanguage and in prose.
  4. The guidelines should propose sets of coding conventions suited for various applications.
  5. The guidelines should include a minimal set of conventions for encoding new texts in the format.
  6. The guidelines are to be drafted by committees on
    1. text documentation,
    2. text representation,
    3. text interpretation and analysis,
    4. metalanguage definition and description of existing and proposed schemes,
    coordinated by a steering committee of representatives of the principal sponsoring organizations.
  7. Compatibility with existing standards will be maintained as far as possible.
  8. A number of large text archives have agreed in principle to support the guidelines in their function as an interchange format. We encourage funding agencies to support development of tools to facilitate this interchange.
  9. Conversion of existing machine-readable texts to the new format involves the translation of their conventions into the syntax of the new format. No requirements will be made for the addition of information not already coded in the texts.

These basic principles are expounded in various documents of the Text Encoding Initiative (notably TEI EDP1 and TEI EDP2); the interested reader is directed to those documents for further discussion.

The mandate to create a common interchange format requires the specification of a markup syntax, the definition of a large predefined tag set, and the provision of mechanisms for extending the markup scheme. The mandate to provide guidance for new text encodings (that is, to suggest principles for text encoding) requires recommendations as to what textual features should be recorded in various situations.

In designing the tag set and formulating the recommendations, the following design goals have been paramount. These Guidelines are intended to:

  1. suffice to represent the textual features needed for research
  2. be simple, clear, and concrete
  3. be easy for researchers to use without special-purpose software
  4. allow the rigorous definition and efficient processing of texts
  5. provide for user-defined extensions
  6. conform to existing and emergent standards

This draft of these Guidelines does not completely fulfill the first design goal: there are many areas of scholarly endeavor not yet addressed, and even those discussed here are not treated completely. Moreover, the recommendations made here need testing and examination by practicing researchers, and revision in the light of their experience. Such revision and extension of these Guidelines are the goal of the next cycle of development in the TEI project.

The simplicity and ease of use of the Guidelines are best left to the reader to judge. Examples throughout the text and appendices should, however, make clear that the markup described here does allow the rigorous definition and processing of textual objects and can be used without special software, although the experience of design has suggested more strongly than before how useful it is to have software capable of exploiting SGML markup.

The rules and recommendations made in this document do conform to the salient international standards (notably ISO 8879, which defines the Standard Generalized Markup Language, and ISO 646, which defines a standard seven-bit character set in terms of which the recommendations on character-level interchange are formulated).