
<!-- TEI P1, sec. 2.1                                           -->
<!-- Title:   About these Guidelines:  Intended Applications    -->
<!-- Drafted:  CMSMcQ, April 1990                               -->
<!-- ********************************************************** -->
<!-- Revision History (add lines at top)                        -->
<!-- Date      Who    What                                      -->
<!--  1 May 90 CMSMcQ into draft 0                              -->
<!-- ********************************************************** -->
<h1 id=z2>About These Guidelines
 
<h2 id=z21>Intended Applications
 
<h3 id=z212>Interchange, Local Use, and Data Creation
 
These guidelines are intended to be useful in the interchange of text
from one scholar to another, from one research group or center to
another, or from one computing system to another.  Analogously, they
should serve in moving text from one application program to another, or
maintaining text in a format common to several applications, rather than
maintaining several copies of a text, one in each format required.
Finally, they are intended to provide guidance to the scholar
embarking on the creation of an electronic text, both as to what textual
features should be captured and as to how they should be represented.
These guidelines thus serve three primary functions:
 
<ul><li>support of data interchange
    <li>support of application-independent local processing
    <li>guidance for individual or local practice in text creation
or capture.</ul>
 
These three functions are not identical, but they so thoroughly
interrelate in practice that it is hardly possible to achieve any one
without achieving the others.  The line between data capture and local
processing, especially, disappears entirely when one considers that the
aim of local processing might be precisely the capture of new
information about the text, which is to be represented by the encoding.
 
<h3 id=z213>Use of the Guidelines for Interchange
 
When these guidelines are used for interchange, it is expected that
researchers or centers which use other encoding schemes internally will
translate outgoing data from their internal encoding scheme into the
scheme described by these guidelines, and similarly translate incoming
data from the scheme described here into their internal one.  The
scheme described here is designed to
enable such translation to occur without information loss.  That is, the
scheme described here has been designed to be at least as expressive (in
a formal sense) as any encoding scheme now known to be in wide use for
textual research.  The extension techniques described in chapter 9 may
be used to give the TEI scheme whatever tags are necessary to capture
the information in a non-TEI encoding; the intention has been to
minimize the need for recourse to such extensions.
 
In the simple case, the two sites or individuals exchanging texts know
each other and know or can inquire what equipment the other is using.
In the general case, however, a text may be made publicly available
through an archive, a bulletin board, an anonymous file transfer server,
or some other mechanism, without either the originator or the final
recipient of the text knowing who the other is.  In the simple case, these
guidelines serve primarily as convenient, pre-existing documentation of
a file format, which can be referred to without being transmitted.
Existing software may also make the transfer through this format
simpler.  Special variations in format to suit special requirements of
the partners are possible by private arrangement.  In the general case,
of course, such special arrangements are impossible; both originator and
recipient should be prepared to follow the guidelines strictly.
 
There is not, in this draft of the guidelines, a separate formal
definition of the <q>interchange format</q> as opposed to the general
recommendations for local processing; this reflects the vagueness of the
distinction.  The <q>interchange format</q> is to be understood as
requiring:
 
<ol>
<li>strict adherence to the DTDs and the SGML declaration reproduced in
    the appendix, unless modified or extended as described in chapter 9
<li>provision of tag documentation as described in part II for all
    tags not defined in these guidelines
<li>strict adherence to the requirements of the text documentation area
    in providing bibliographic identification of the text and
    description of the encoding practice<fn>Strictly speaking, what is
        required (as described in Chapter 5) is that the required
        information either be provided or be marked as unavailable.  The
        option of marking information as unavailable is intended to
        enable sites with large collections of existing texts to
        export conforming texts without having to enrich their
        existing databases.  It is emphatically not recommended as a
        method of evading the need to provide sound documentation of a
        text.</fn>
<li>rendition of the text in the characters of the Minimal Character
    Repertoire described in chapter 4<fn>This is SDR's <q>character
        conformance level 1</q>.</fn>
</ol>
 
If a more formal definition of the interchange format is required for
any reason, those interested should contact the Text Encoding Initiative
to describe the requirements which need to be met.
 
<h3 id=z214>Use of the Guidelines for Local Processing
 
The need to create a language rich enough for information-preserving
interchange entails ensuring that the language can express any
information recorded in a scheme designed for a specific application of
computers to texts.  Any single language adequate for many applications
will necessarily be of interest to anyone using more than one kind of
application software on their texts, and even to those developing new
software for just one application.
 
Machine-readable text can be manipulated in many ways; our aim has been
to avoid assuming too much about what the reader of these guidelines
will do with texts marked up according to these rules.  We assume
that this markup must be usable by programs which:
 
<ul>
        <li>edit texts (e.g. word processors, syntax-directed
            editors, hypertext systems)
        <li>format and print texts (word processors
            again, and batch-oriented formatting programs like Scribe,
            Script, Runoff, roff, or TeX)
        <li>load texts into free-text retrieval databases or
            conventional databases
        <li>unload texts from databases as search results or for export
            to other software
        <li>search texts for words or phrases
        <li>perform content analysis on texts
        <li>collate texts for critical editions
        <li>scan texts for automatic indexing or similar purposes
        <li>parse texts linguistically
        <li>analyze texts stylistically
        <li>scan verse texts metrically
        <li>link words of a text to images of the objects named by the
            words (as in a hypertext language-teaching system)
</ul>
 
The aim has been to make these guidelines useful for marking up texts
used in any of these applications; this has meant trying to avoid
anything which would restrict their use in texts intended for any other
application.  It is safe to assume that printing and editing, being the
most universally familiar operations upon machine-readable text,
received at least as much attention as others, but the aim has never
been to create yet another language for controlling text formatters or
editors.
 
<h3 id=z215>Use of the Guidelines in Text Creation
 
The description of textual features found in the chapters which follow
should provide a useful checklist for scholars planning the creation of
machine-readable versions of any text.  Where there appears to be
consensus in the text-computing community on what constitutes good or
bad practice in some particular area, specific comments to that effect
are provided in the chapters which follow.  Where a given feature is
generally found useful, the tag for that feature is recommended for
general use; where it is found not worth tagging, it is disparaged.
Where the feature is neither generally useful nor generally pointless,
its tagging or omission is left to the discretion of the individual
working with the text.  At the least, therefore, these guidelines should
be useful in deciding what to capture and what to lose when representing
a text in machine-readable form.  Responsibility for the adequacy of the
encoded text remains, of course, with the individual scholar.
 
Problems specific to data capture have <emph>not</emph> been considered
explicitly in the pages which follow.  The document type declarations in
the appendix do specify which tags may be omitted when the text is
processed using the SGML OMITTAG feature, because this is a very simple
and general method of easing data capture.  In general, though, methods
for minimizing keystrokes, correcting the results of optical scanning,
or augmenting existing texts with useful information are too completely
dependent upon the details of the individual situation to be susceptible
to useful treatment here.  The text being captured, the type of research
foreseen, and the computer system at hand all affect the methods of data
capture.  Possible techniques for simplifying, speeding, or reducing the
cost of data capture include editor macros and keyboard shorthands,
simple parsers to recognize structural features in scanner output,
special-purpose software to put word-processor or scanner data into
SGML,<fn>A
    wide variety of this software has become available as a
    result of the CALS Initiative.</fn>
the exploitation of SGML's rich set of mechanisms for minimizing the
amount of markup which need be explicitly provided in a text,<fn>Because
    they substantially complicate processing for those who have no
    conforming SGML processor handy, these optional markup minimization
    features are all forbidden in the TEI interchange format; their use for
    local processing is, of course, a local decision.</fn>
and even the development of special SGML document type declarations
specifically for data capture, together with programs that read data in
those forms and convert it into the desired form.
 
