Design Principles
The basic goals of these Guidelines were determined by the results of
a planning conference held at Vassar College in Poughkeepsie, New
York, at the outset of the project. That planning conference agreed
on the following statement of principles:
- The guidelines are intended to provide a standard format for
data interchange in humanities research.
- The guidelines are also intended to suggest principles for
the encoding of texts in the same format.
- The guidelines should
  - define a recommended syntax for the format,
  - define a metalanguage for the description of text-encoding
    schemes,
  - describe the new format and representative existing schemes
    both in that metalanguage and in prose.
- The guidelines should propose sets of coding conventions
suited for various applications.
- The guidelines should include a minimal set of conventions
for encoding new texts in the format.
- The guidelines are to be drafted by committees on
  - text documentation
  - text representation
  - text interpretation and analysis
  - metalanguage definition and description of existing and
    proposed schemes,
coordinated by a steering committee of representatives of the
principal sponsoring organizations.
- Compatibility with existing standards will be maintained as
far as possible.
- A number of large text archives have agreed in principle to
support the guidelines in their function as an interchange
format. We encourage funding agencies to support development of
tools to facilitate this interchange.
- Conversion of existing machine-readable texts to the new
format involves the translation of their conventions into the
syntax of the new format. No requirements will be made for the
addition of information not already coded in the texts.
These basic principles are expounded in various documents of the
Text Encoding Initiative (notably TEI EDP1 and TEI EDP2) and the
interested reader is directed to those documents for further discussion.
The mandate of creating a common interchange format requires the
specification of a particular markup syntax as well as the definition
of a large predefined tag set and the provision of mechanisms for
extending the markup scheme. The mandate to provide guidance
for new text encodings ("suggest principles for text encoding")
requires that recommendations be made as to what textual features
should be recorded in various situations.
In designing the tag set and formulating the recommendations, the
following design goals have been paramount. These Guidelines
are intended to:
- suffice to represent the textual features needed for research
- be simple, clear, and concrete
- be easy for researchers to use without special-purpose software
- allow the rigorous definition and efficient processing
of texts
- provide for user-defined extensions
- conform to existing and emergent standards
This draft of these Guidelines does not completely fulfill the
first design goal: there are many areas of scholarly endeavor not
yet addressed, and even those here discussed are not treated
completely. The recommendations which are made here, moreover,
need testing and examination by practicing researchers and
revision in the light of their experience. Such revision and
extension of these Guidelines are the goal of the next cycle of
development in the TEI project.
The simplicity and ease of use of the Guidelines are best left to the
reader to judge; examples throughout the text and appendices should,
however, make clear that the markup here described does allow the
rigorous definition and processing of textual objects and can be used
without special software (although the experience of design has
suggested more strongly than before how useful it is to have software
capable of exploiting SGML markup).
The rules and recommendations made in this document do conform to
the salient international standards (notably ISO 8879, which defines
the Standard Generalized Markup Language, and ISO 646, which
defines a standard seven-bit character set in terms of which the
recommendations on character-level interchange are formulated).
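As a rough illustration of what a seven-bit character constraint means for a text prepared for interchange, the following minimal sketch reports any characters that fall outside the seven-bit range. It assumes, for simplicity, that the seven-bit repertoire can be treated as code points 0 through 127; the function name is hypothetical and is not part of these Guidelines, and national variants of ISO 646 are not modeled.

```python
def non_seven_bit_characters(text):
    """Return (position, character) pairs for characters above code point 127.

    Assumption: the seven-bit repertoire is treated as code points 0..127.
    National variants of ISO 646 are deliberately ignored in this sketch.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0x7F]


if __name__ == "__main__":
    # A sample containing one character outside the seven-bit range.
    sample = "na\u00efve text"
    for position, character in non_seven_bit_characters(sample):
        print(f"position {position}: {character!r} is outside the seven-bit range")
```

A text for which the function returns an empty list can be carried in a seven-bit encoding as it stands; any characters it reports would need some other representation before interchange.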