Reference Systems
Reference systems are needed to mark a place within a text and to enable
other readers to find it again.
Traditional referencing systems may use structural units (chapters,
paragraphs, sentences; stanza and verse), typographic units (page and
line numbers), or divisions created specifically for reference purposes
(chapter and verse in Biblical texts). The ID and IDREF attribute types
of SGML (discussed above in section ) can provide new
methods of reference, or can be used to implement traditional reference
systems. Traditional reference schemes and schemes using SGML ID
attributes may be more durable than schemes that derive references from
the SGML tagging alone, without ID attributes, since references of the
latter kind may be disrupted if the tagging is revised. (See section for a
detailed discussion of the problems of various methods of identifying
text segments without using SGML IDs.)
When traditional reference schemes represent a hierarchical structuring
of the text, it is recommended that they be marked with hierarchically
defined tags. When the hierarchy of the reference scheme mirrors that
of the SGML document, the N attribute defined for all tags
may be used to indicate the traditional identifier (name, number, or
combination) of the relevant structural units. N may also
be used to record the numbering of sections or list items in the copy
text if the copy-text numbering is important for some reason (e.g. if the
numbers are out of sequence). When the hierarchy of the SGML-encoded
document and that of the traditional scheme diverge (e.g. for reference
schemes based on page and line numbers) or when there are several
conflicting traditional reference schemes, the reference scheme should
be tagged using a concurrent document hierarchy. (See the discussion of
the CONCUR feature in section for an introduction to
the concept of concurrent hierarchies.)
If concurrent markup is not desired (e.g. because the available SGML
parser does not support the CONCUR feature), boundaries between segments
in a traditional reference scheme may be specified using the
milestone tag described below in section .
No SGML validation of the reference scheme is possible using the
milestone tag, so it will be the responsibility of the
encoder or the application software to ensure that milestone tags occur
in a sensible order (e.g. with a page reference before the first line
reference).
When creating SGML versions of any text, it is recommended that the page
boundaries of the source text be marked, either by using a concurrent
hierarchy for its pages and lines or by using the page.break or
milestone tags. (If a page boundary falls in mid-word, the tag marking it
may be moved to the end of the word if desired.) Such marking is strongly
recommended when the text has no traditional referencing scheme or
acknowledged reference edition. Line breaks in prose texts may be, but
need not be, tagged.
Concurrent Markup for Pages and Lines
Perhaps the most common form of traditional reference system specifies
the page and line, or page, column, and line of a passage as it appears
in some standard edition. Such references may be specified using a
concurrent markup hierarchy which divides the body of a text into pages
and lines or into pages, columns, and lines. Volumes may also need to
be identified. The document type name should be a shorthand identifier
for the edition cited.
Page and line numbers for an edition by Lachmann, for example, might be
specified thus:
<(La)page n=223>
<(La)line n=1> [Text from Lachmann, p. 223, line 1]
<(La)line n=2> [Text from Lachmann, p. 223, line 2]
<(La)line n=3> [Text from Lachmann, p. 223, line 3]
<(La)line n=4> [Text from Lachmann, p. 223, line 4]
(etc.)
<(La)page n=224>
<(La)line n=1> [Text from Lachmann, p. 224, line 1]
<(La)line n=2> [Text from Lachmann, p. 224, line 2]
<(La)line n=3> [Text from Lachmann, p. 224, line 3]
<(La)line n=4> [Text from Lachmann, p. 224, line 4]
(etc.)
The following SGML declarations define such a concurrent markup
hierarchy:
This concurrent hierarchy is enabled as shown in the comments; the
sequence of lines shown (from "DOCTYPE ..." to "]>") should be embedded
in the document file after the normal document type specification. (See
examples in the appendix.) If page and line numbers from more than one
standard edition are to be marked, then the relevant lines may be
repeated, each time using a different value for the document type and
entity definition (where the example has "La").
Concurrent Markup for Other Hierarchies
Hierarchies similar to that defined above can be provided for a variety
of common hierarchical reference schemes. The document type
declarations in the appendix include definitions for three such
hierarchies:
- act, scene, line (for conventional dramatic structures)
- book, canto, stanza, line (for longer narrative verse whether
stichic or stanzaic)
- book, poem, stanza, line (for collections of verse grouped into
books, and referred to by book.poem-number.(stanza).line references)
Any text with idiosyncratic canonical referencing will require its own
DTD, so that appropriately named tags can be created for the reference
units. Such DTDs may be modeled on those in the appendix.
Using the ID and N Attributes
In some cases, the canonical reference units and the content units marked
by the SGML tagging may coincide. For example, a reference to Ovid's
Amores might be "Amores 2.10.7" (book 2, poem 10, line 7). Book, poem,
and line are structural units of the work and will be
tagged in any case. (See section for a discussion of
structural units in verse collections.) In such cases, it is convenient
to record traditional reference numbers of the structural units using
the N attribute. The relevant tags for our example would
be:
This method is not without problems, since some editions may define
structural units differently. For example, another edition of the
Amores considers poem 10 a continuation of poem 9, and therefore would
specify the same line as 2.9.31. In such cases, one must specify the
competing schemes in concurrent markup hierarchies, or else use the
milestone tags described in section .
If a text has no canonical reference scheme of its own, and was entered
without preserving the pagination of its source edition, a reference
scheme, if needed, may be derived from the structure of the electronic
text, specifically from the SGML markup of the text. As with any
reference scheme intended for long-term use, each reference should be
treated as a fixed, unchanging point in the text. Should the text be
revised or rearranged, the reference-scheme identifiers associated with
any stretch of text must stay with that stretch of text, even if this
means the reference numbers fall out of sequence. (A new reference
scheme may always be created beside the old one if out-of-sequence
numbers must be avoided.)
The global attributes N and ID may be used to
assign reference identifiers to segments of the text.
Identifiers specified by either attribute apply to the entire element
for which they are given. SGML enforces uniqueness on ID attributes
within a single document, and ID values must begin with a letter. No
such restrictions are made on the values of N attributes.
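As an illustration of the two constraints just stated, the following
Python sketch checks a list of ID values for uniqueness and for an
initial letter. It assumes the values have already been extracted from
the parsed document as (element, ID) pairs. The function name check_ids
is hypothetical, and the sketch tests only these two rules, not the full
SGML name syntax. N values are not checked, since no such restrictions
apply to them.

```python
import string
from collections import Counter

def check_ids(pairs):
    """Return a list of problems found in (element, id_value) pairs.

    Only the two rules stated in the text are checked: an ID value must
    begin with a letter, and it must be unique within the document.
    """
    problems = []
    for element, value in pairs:
        # "Letter" is taken here to mean an ASCII letter, as in the SGML
        # reference concrete syntax (an assumption of this sketch).
        if not value or value[0] not in string.ascii_letters:
            problems.append(f"{element}: ID {value!r} does not begin with a letter")
    counts = Counter(value for _, value in pairs)
    for value, count in counts.items():
        if count > 1:
            problems.append(f"ID {value!r} used {count} times")
    return problems

ids = [("div1", "am2"), ("div2", "am2.10"), ("l", "7"), ("l", "am2.10")]
for problem in check_ids(ids):
    print(problem)
```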
A convenient method of mechanically generating unique values for
ID or N attributes, based on the SGML structure of the document, is to
use the "type path" or "untyped path" method to identify elements within
the text segment of a TEI document.
The text segment is recommended, rather than the TEI.doc as a whole or
the body of the text only. Values usually need not be generated for the
TEI.header section of the document if the reference scheme is intended
primarily for the text. On the other hand, values should not usually be
restricted to the text body, because front and back matter must also be
referred to.
This is a convenient method, but is in no way required for anyone
creating a reference scheme.
If the ID attribute is used to record the reference
identifiers generated, each value should record the entire path. If the
N attribute is used, each value may record either the
entire path or only the subpath from the SGML parent element.
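The following Python sketch shows one way path-based identifiers might
be generated. It assumes the text segment has been converted to
well-formed XML and parsed with the standard xml.etree.ElementTree
module. It also adopts hypothetical conventions for the two path styles:
a typed step is the element name plus its position among siblings of
the same type (e.g. "p-3"), an untyped step is the position among all
sibling elements, and steps are joined with periods. The full typed path
is stored as the ID. Only the last untyped step, the subpath from the
parent, is stored as N. Untyped paths begin with a digit, so they could
serve only as N values, not as IDs.

```python
import xml.etree.ElementTree as ET

def path_identifiers(root):
    """Yield (element, typed_path, untyped_path) for each descendant of root.

    Hypothetical conventions: a typed step is "<tag>-<k>", where k counts
    siblings of the same tag; an untyped step is "<k>", counting all
    sibling elements. Steps are joined with ".".
    """
    def walk(parent, typed_prefix, untyped_prefix):
        same_tag_counts = {}
        for position, child in enumerate(parent, start=1):
            same_tag_counts[child.tag] = same_tag_counts.get(child.tag, 0) + 1
            typed = typed_prefix + [f"{child.tag}-{same_tag_counts[child.tag]}"]
            untyped = untyped_prefix + [str(position)]
            yield child, ".".join(typed), ".".join(untyped)
            yield from walk(child, typed, untyped)
    yield from walk(root, [], [])

# The text segment only; the header is deliberately left unnumbered.
text = ET.fromstring(
    "<text><front><head/></front>"
    "<body><div1><p/><p/></div1><div1><head/><p/></div1></body></text>"
)
for element, typed, untyped in path_identifiers(text):
    element.set("id", typed)                       # ID: the entire typed path
    element.set("n", untyped.rsplit(".", 1)[-1])   # N: subpath from the parent
    print(element.tag, typed, untyped)
```

Because each step is computed only from an element's position under its
parent, inserting or deleting an element changes the identifiers of its
later siblings and their descendants. This is why the values, once
assigned, should be kept with the text rather than regenerated.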
Milestone Tags
When concurrent markup is not used, checkpoints for any traditional
reference scheme may be incorporated into a document using empty tags
which can appear at any point and which mark the boundaries between
sections in the traditional reference scheme. Page and line boundaries,
for example, can be marked using the page.break and
line.break tags described in section . For
other reference schemes, a single tag is here defined, called
milestone. Using these tags, the reference scheme of any one
edition can be recreated from a text in which the schemes of several
editions are marked, simply by ignoring all tags that do not describe
that edition. The
milestone elements have no content, and subdivide the text
into regions just as milestones divide a road into segments.
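The filtering just described can be sketched in Python as follows. The
sketch assumes the document has already been reduced to a flat sequence
of milestone records and text chunks in document order. The Milestone
class, the locate function, and the edition sigla "A" and "B" are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Milestone:
    ed: str                  # edition siglum (hypothetical values below)
    unit: str                # "page", "line", "act", ...
    n: Optional[str] = None

def locate(stream, ed):
    """Pair each text chunk with its current reference in edition `ed`.

    Milestones for any other edition are simply skipped, so one edition's
    reference scheme is recovered from a text that marks several.
    """
    current = {}   # unit -> most recent n seen for this edition
    located = []
    for item in stream:
        if isinstance(item, Milestone):
            if item.ed == ed:
                current[item.unit] = item.n
        else:
            located.append((dict(current), item))
    return located

stream = [
    Milestone("A", "page", "223"), Milestone("B", "page", "17"),
    Milestone("A", "line", "1"), Milestone("B", "line", "5"), "first words",
    Milestone("A", "line", "2"), "more words",
    Milestone("B", "page", "18"), Milestone("B", "line", "1"), "still more",
]
for reference, chunk in locate(stream, "B"):
    print(reference, chunk)
```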
A milestone tag indicates the beginning of some segment
marked in a traditional reference system. The specific system, the type
of segment marked, and the identifier of the segment are specified using
the attributes ed (for "edition"), unit, and N (for "name" or
"number"). Each of these attributes can take any character string as its
value. N is optional, since an application can keep a count from the
start of the document if desired; the others are required.
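An application keeping such a count might proceed as in the Python
sketch below. It supplies any missing N value by counting milestones of
the same edition and unit from the start of the document. Resynchronising
the count whenever an explicit integer value is met is an assumed design
choice, not a requirement of these Guidelines. It also lets line numbers
recorded only every fifth or tenth line be extended to the lines between.
The Milestone class and fill_missing_n are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Milestone:
    ed: str                  # edition siglum (hypothetical values below)
    unit: str                # "page", "line", "scene", ...
    n: Optional[str] = None  # None when the encoder omitted N

def fill_missing_n(milestones):
    """Return copies of the milestones with every absent n supplied."""
    counters = {}  # (ed, unit) -> last number used
    filled = []
    for m in milestones:
        key = (m.ed, m.unit)
        if m.n is not None and m.n.isdecimal():
            # Resynchronise on an explicit integer value.
            counters[key] = int(m.n)
        else:
            # Count from the document start; non-numeric explicit values
            # such as "I.ii" are kept unchanged but still counted.
            counters[key] = counters.get(key, 0) + 1
        n = m.n if m.n is not None else str(counters[key])
        filled.append(Milestone(m.ed, m.unit, n))
    return filled

# Line numbers recorded only every fifth line:
lines = (
    [Milestone("A", "line", "5")]
    + [Milestone("A", "line") for _ in range(4)]
    + [Milestone("A", "line", "10"), Milestone("A", "line")]
)
print([m.n for m in fill_missing_n(lines)])
# ['5', '6', '7', '8', '9', '10', '11']
```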
For unit the following values are suggested as appropriate:
- page: for page breaks
- column: for column breaks
- line: for a physical line of the page (in page / column / line
reference systems) or for a verse line (in reference systems for verse)
- book: for any unit termed "book", "liber", etc.
- poem: for an individual poem
- canto: for a canto or major section of a poem
- stanza: for a stanza within a poem, book, or canto
- act: for an act within a play
- scene: for a scene within a play or act
- section: for a section of any kind
- absent: if it is desired to specify that a given piece of text is not
present in the edition in question (such specification is wholly
optional)
Other terms may of course be used as desired (e.g. "Stephanus" to
indicate Stephanus numbers in Plato). The
encoding.declarations section of the TEI file header should
contain an explanation of the reference system(s) used and bibliographic
references to their sources, if appropriate, under the rubric
reference.system. (See section for a full
discussion of the encoding declarations area.)
The value of the N attribute may but need not include the
identifiers used for any larger sections. That is, either of the
following styles is legitimate:
[text of Act 1, Scene 1 ... Traditional reference is "1.1"]
[text of Act 1, Scene 2 ... Traditional reference is "1.2"]
(etc. ...)
or
[text of Act 1, Scene 1 ... Traditional reference is "1.1"]
[text of Act 1, Scene 2 ... Traditional reference is "1.2"]
(etc. ...)
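Since both styles are legitimate, software that needs full references
must be prepared to expand the shorter one. The Python sketch below does
this for milestones that have already been filtered to a single edition
and are given as (unit, N) pairs. The dot separator, the act/scene
hierarchy, and the rule that a value is already complete when it has one
dot-separated part per level are all assumptions. The sketch also assumes
that every scene is preceded by an act milestone.

```python
def full_references(milestones, hierarchy=("act", "scene")):
    """Expand each (unit, n) pair so that n carries the full reference.

    An n with one dot-separated part per level, down to its own unit, is
    taken as already complete ("1.2"); otherwise the values of the
    enclosing units are prefixed to it ("2" becomes "1.2").
    """
    current = {}  # most recent local value for each unit
    result = []
    for unit, n in milestones:
        depth = hierarchy.index(unit)
        if len(n.split(".")) == depth + 1:
            full = n
        else:
            full = ".".join([current[u] for u in hierarchy[:depth]] + [n])
        current[unit] = full.split(".")[depth]
        # A new act invalidates the remembered scene, and so on downward.
        for lower in hierarchy[depth + 1:]:
            current.pop(lower, None)
        result.append((unit, full))
    return result

short_style = [("act", "1"), ("scene", "1"), ("scene", "2"), ("act", "2"), ("scene", "1")]
long_style = [("act", "1"), ("scene", "1.1"), ("scene", "1.2"), ("act", "2"), ("scene", "2.1")]
print(full_references(short_style) == full_references(long_style))  # True
```

The expansion does not depend on the numbering style. Roman values such
as "I" and "i" are joined in the same way, giving "I.i".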
When counting lines on a page for reference purposes, headers, footers,
and headings are usually ignored. When using milestone tags, line
numbers may be supplied for every line or only periodically (e.g. every
fifth or every tenth line). The latter may be simpler; the former is
more reliable. Note that SGML short references may be used to simplify
the marking of page and line breaks during data capture. Such short
references must be resolved to the fully specified SGML form before
TEI-conformant interchange.
The style of numbering used in the values of N is unrestricted: for the
example above, "I.i" and "I.ii" could have been used equally well if
preferred. The special value
unnumbered should be reserved for marking sections of text
which fall outside the normal numbering scheme (e.g. chapter heads,
poem numbers, titles, or speaker attributions in a verse drama).
No hierarchical ordering is or can be defined for the various types of
milestone tag; it is the encoder's responsibility to ensure
that milestone values are valid.
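Because no ordering is built into milestone, an encoder may wish to
check it separately. The Python sketch below reports any milestone whose
unit appears before the unit it depends on within the same edition, for
instance a line reference before the first page reference. The
dependency rules are supplied by the user. The check_order function and
the (ed, unit, N) triple representation are hypothetical.

```python
def check_order(milestones, requires):
    """Report milestones that appear before the unit they depend on.

    milestones: (ed, unit, n) triples in document order.
    requires: maps a unit to the unit that must already have occurred in
    the same edition, e.g. {"line": "page"}.
    """
    seen = set()   # (ed, unit) pairs encountered so far
    problems = []
    for ed, unit, n in milestones:
        needed = requires.get(unit)
        if needed is not None and (ed, needed) not in seen:
            problems.append(f"{ed} {unit} {n}: no preceding {needed}")
        seen.add((ed, unit))
    return problems

stream = [("A", "line", "1"), ("A", "page", "1"), ("A", "line", "2"), ("B", "line", "1")]
for problem in check_order(stream, {"line": "page"}):
    print(problem)
```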
Because the ed attribute is unrestricted, no change need be
made to the document type declaration of a file before adding tags to
describe a new reference scheme. (The value of ed may be
restricted to a defined set of edition symbols by using the techniques
described in chapter .)
The SGML declarations for the milestone tag and its
attributes are as follows: