Texts which have been hand-copied or printed in several editions contain variations of many kinds. Scholars will usually wish to record or refer to these variants in encoding such texts. Methods for encoding textual variation may be simple or highly complex: the method of choice should be dictated by the interests and goals of textual scholarship and by the nature of the textual materials. In one case, the goal may be to encode the critical apparatus of a standard edition as a means of producing a slightly improved printed critical edition from the electronic format; in another case, the goal may be to publish the text of a new fragment of a well-known literary text, showing only the most important variants; in another case, the goal may be to create an electronic text-critical database from fresh collation of hundreds of manuscripts solely for the purpose of database queries. Because the goals of textual inquiry change over time, it is desirable that general encoding solutions be used to represent the underlying textual variation, whatever the immediate goal of the investigator.
This section presents several methods for encoding information about textual variation in a text. Much work remains to be done in this area: the relative strengths and weaknesses of these methods must be established in practice with various applications, and other approaches remain to be explored. The specific tags and encoding methods described here should be understood as work in progress, intended to provide a basis for public discussion of these problems, and not in any sense as exhaustive or optimal solutions. No recommendations are made at this time as to the particular method to be preferred and none should be inferred from the order of presentation. Researchers with an interest in this area are encouraged to contact the Text Encoding Initiative with comments and with information on systems now in use for encoding variants, their particular strengths and shortcomings, and the requirements of the researcher.
This section touches exclusively on the problem of recording textual variations. Other problems of critical editions, for example the marking of cruxes and editorial interventions of various kinds, are not touched upon; they remain to be addressed in the further development of these Guidelines. It is hoped that a common notation can be found which will be adequate to the varying needs of textual scholars working on widely varying periods, languages, and cultures.
Two general approaches to encoding textual variation may be distinguished: (a) in-line encoding and (b) external representation. In the in-line approach, information about textual variation is encoded within the running text; in the external approach, SGML cross-referencing is used to link the running text with text-critical information held outside of the running text. The choice between these two methods is a matter of individual preference and convenience. The in-line encoding method offers the advantage of allowing the reader to see all the related information at a single locus without special software, at the risk of obscuring the base text with the apparatus. The external encoding method allows convenient separation of text and apparatus, at the cost of slightly more elaborate requirements for linking the two. With proper windowing and/or hypertext software, the differences between the two methods become less significant.
The representation of individual variants need not differ radically between the in-line and external approaches; what does vary is the method used to align individual variant readings with the text used as a base text, and with each other. In the sections which follow, three methods of encoding textual variants are described which differ primarily in the way they align the variants with each other: parallel segmentation, single end-point attachment, and double end-point attachment.
If textual variants are encoded using one of these methods,
the encoding declarations section of the file header should contain
a beginning
or end.
The latter is the default.
Sample variant-encoding style declarations are thus:
This method segments all versions of the text in parallel: all versions contain the same number of segments in the same sequence. At the boundary of any segment, all versions of the text are synchronized with each other. Because textual variants branch off from each other only at explicitly marked segment boundaries, an application can extract any single version of the text in a single pass over the text simply by scanning the text in sequence, selecting the correct version within each segment. An apparatus can also be generated simply, because all variants for any given segment of the text are stored together. For simple texts, it also permits a human reader to view the textual evidence within the encoded document amidst minimum markup clutter. It may be an optimal method for texts which contain a small number of witnesses and which involve minimal text-critical complexity (witnesses written in the same language; minimal codependency between variant readings).
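The single-pass extraction and apparatus generation described above can be sketched in a few lines of Python. This is only an illustrative model, not an encoding prescribed by these Guidelines: the sigla A, B and C, the readings, and the function names `extract_version` and `apparatus` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One segment maps each witness siglum to that witness's reading.
# Every witness appears in every segment: that is the "parallel" property.
Segment = Dict[str, str]

# Hypothetical witnesses and readings, invented for illustration.
segments: List[Segment] = [
    {"A": "The quick brown fox", "B": "The sleek brown fox", "C": "A silver wolf"},
    {"A": "jumps over", "B": "jumps over", "C": "leaps over"},
    {"A": "the lazy dog.", "B": "the lazy dog.", "C": "the lazy dog."},
]


def extract_version(segs: List[Segment], witness: str) -> str:
    """Recover one version in a single pass by picking its reading in each segment."""
    return " ".join(seg[witness] for seg in segs if seg[witness])


def apparatus(segs: List[Segment]) -> List[Tuple[int, Dict[str, List[str]]]]:
    """List the segments where witnesses disagree, grouping sigla by reading."""
    entries = []
    for index, seg in enumerate(segs):
        if len(set(seg.values())) > 1:  # at least two distinct readings
            groups: Dict[str, List[str]] = {}
            for siglum, reading in seg.items():
                groups.setdefault(reading, []).append(siglum)
            entries.append((index, groups))
    return entries


if __name__ == "__main__":
    print(extract_version(segments, "C"))
    for index, groups in apparatus(segments):
        print(index, groups)
```

Because all readings for a segment are stored together, both functions need only one sequential scan and no comparison between versions.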
Consider the following hypothetical set of variant texts, presented here in a Partitur Umschrift format:
zero.var when a reading in one version corresponds to nothing at all (an empty or
"zero" variant) in another. (The zero-variant entity allows us to distinguish cases where a version is complete and lacks the reading in question from cases where the version is incomplete or imperfect and we know nothing about its reading.)
The simple example given here would be represented thus using
the
A more compact display can be used if
lemmaor
preferred:
Because the parallel-segmentation method makes no structural
distinction among witnesses and has no notion of a base text
to which a separate apparatus could be keyed, it requires the
in-line encoding of variants. There is no external-representation
method corresponding to in-line encoding with parallel segmentation.
As the number of witnesses and the complexity of the textual variations
increase, the parallel-segmentation method places greater demands upon the
encoder. Since all versions must have the same segmentation, the
addition of new witnesses may require the re-segmentation of all the old
witnesses. When many witnesses are involved, perhaps in several
languages, and in several genetic strata, it is difficult to
segment the text properly or optimally in advance, and equally difficult
to change the segmentation with the addition of each new witness.
In some cases, the segmentation required to exhibit properly the
relation of one pair of variants to each other conflicts with that
required for some other pair. Parallel segmentation proves particularly
difficult and particularly prone to obscure the relationships among
readings when texts contain substantive conflations, long-range
transpositions, large quantities of data, or widely different
recensions. In these cases, decisions about text segmentation and
multiple overlapping variations can be deferred by using an
"incompletely segmented" or "divergent-segmentation" method.
In these methods, readings of the witnesses are registered as variants
of some "base text" and recorded in an apparatus. The base text is
marked with the location at which each variant group is attached, and
the other versions are each segmented notionally into portions which
agree with the base text and portions which disagree, but each version
is segmented separately, not in parallel with all other versions.
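As a rough illustration of segmenting each version separately against the base text, the sketch below aligns one witness at a time with Python's standard `difflib.SequenceMatcher`. The use of `difflib` as the collation step, the word-level tokenization, the witness texts, and the name `segment_against_base` are assumptions made for this example; these Guidelines do not prescribe any particular alignment procedure.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Union

Portion = Dict[str, Union[bool, int, str]]


def segment_against_base(base: str, witness: str) -> List[Portion]:
    """Split one witness into portions that agree or disagree with the base text.

    Each portion records where it ends in the base text (a word index),
    which is the point at which a variant could be attached.
    """
    base_words = base.split()
    wit_words = witness.split()
    matcher = SequenceMatcher(None, base_words, wit_words, autojunk=False)
    portions: List[Portion] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        portions.append({
            "agrees": tag == "equal",
            "base": " ".join(base_words[i1:i2]),
            "witness": " ".join(wit_words[j1:j2]),
            "base_end": i2,
        })
    return portions


if __name__ == "__main__":
    # Hypothetical base text and witnesses, invented for illustration.
    base_text = "The quick brown fox jumps over the lazy dog"
    for siglum, text in {
        "B": "The sleek brown fox jumps over the lazy dog",
        "C": "A silver wolf jumps over the lazy dog",
    }.items():
        print(siglum, segment_against_base(base_text, text))
```

In this example, witness B falls into three portions and witness C into two, so the two witnesses are segmented differently even though both are described in terms of the same base text.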
Using the incomplete segmentation methods, one can encode
textual variation using the following tags:
The term "apparatus entry" reflects
both a convenience and a reminiscence of the historic convention for the
layout of critical texts on the printed page, but the textual object may
be conceived in a neutral manner simply as an area of text-critical
interest.
Similarly, the term "base text" denotes (as used here) only the text
in terms of which the apparatus is formulated. The
text chosen as base text may indeed represent the editor's or encoder's
preferred text, or a historically revered "standard" text, or it
may be a particularly full text chosen merely as a convenience. There
is no requirement that the base text correspond throughout to any single
witness or edition, though that appears to be the most convenient
approach.
The variants are connected to the base text either by being inserted
into it at the end of the corresponding reading in the base text (using
the in-line approach), or by a pointer from a separate location (using
the external-representation approach). In the latter case, the pointer
is implemented either with an SGML ID reference or with a pointer of the
type described in section
In the
The following example illustrates the single end-point attachment method
using almost the same example as in the preceding section. The A text
is chosen arbitrarily as the base text, and the other readings are
recorded with single end-point attachment method
the variants are
attached to the end of the corresponding reading in the base text; the
beginning of the reading must be found by applications software by
comparing the base text with the content of a beginning.
The default value is
Note that the overall opposition of "The quick/sleek brown fox"
and "A silver wolf" is recorded in one apparatus entry, with the
opposition of "quick" and "sleek" nested within it. Such
nested
The complete base text occurs outside the lemma
) is repeated in the
j.o.t.l.d.
as an abbreviation for the lemma given
above would have to pass qualifying tests such as: (a) the abbreviation
sequence is unique in the region of text to be scanned for the lemma;
(b) use of period
does not collide with literal period, which
would have to be quoted; (c) the abbreviation based upon discrete words
(fixed word boundaries) implies that word boundaries are not part of the
text-critical issue, obscured by the notation. No shorthand conventions
should be used unless their resolution by a machine is completely
predictable. No specific conventions are recommended at this time; the
definition of a reliable and usable set of such conventions is a topic
for further work.
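The requirement of machine predictability can be made concrete with a small Python sketch. It assumes a purely hypothetical shorthand (one initial letter per word, each followed by a period) and a base text already split into words; no such convention is recommended here, and the names `find_lemma_start` and `abbreviation_matches` are invented for the example. The first function locates the start of a reading from its end point and a fully spelled-out lemma; the second applies qualifying test (a) by listing every place the abbreviation could match.

```python
from typing import List, Optional


def find_lemma_start(base_words: List[str], end: int, lemma: str) -> Optional[int]:
    """Return the start index of the lemma that ends at word index `end`.

    Returns None when the words before the end point do not spell the lemma.
    """
    lemma_words = lemma.split()
    start = end - len(lemma_words)
    if start >= 0 and base_words[start:end] == lemma_words:
        return start
    return None


def abbreviation_matches(base_words: List[str], abbreviation: str) -> List[int]:
    """Return every start index where a run of words has the given initials.

    Hypothetical convention: "j.o.t.l.d." stands for five consecutive words
    beginning with j, o, t, l and d. Matching is case-insensitive.
    """
    initials = [part.lower() for part in abbreviation.split(".") if part]
    count = len(initials)
    hits = []
    for start in range(len(base_words) - count + 1):
        window = base_words[start:start + count]
        if all(word[:1].lower() == letter for word, letter in zip(window, initials)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical base text, invented for illustration.
    words = "The quick brown fox jumps over the lazy dog".split()
    print(find_lemma_start(words, 9, "jumps over the lazy dog"))
    print(abbreviation_matches(words, "j.o.t.l.d."))
    print(abbreviation_matches(words, "t."))
```

An abbreviation is safe to resolve only when `abbreviation_matches` returns exactly one position within the region to be scanned. The sketch does not address tests (b) and (c): a literal period in the text, or a variant that turns on word division, would need handling that this simple word-based model cannot provide.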
If variants are to be recorded externally to the base text, the
points at which the variants are to be attached must be specified
by
The quick
Because the single end-point attachment method explicitly marks only the
end (or beginning) of each segment in the base text to which a set of
variants is opposed, an application cannot easily tell, for any given
word or passage in the base text, whether any variation is open at that
point. To find out, the application must scan and analyze each
It is easier to find the full range of textual variation on a given
portion of the base text if the beginning and ending of each variation
are explicitly marked in the base text (as they are in the parallel
segmentation method). A processor can then, with a single pass over the
text, mark all the portions of the base text which have opposed variants,
without scanning each
The
Encoded using the double end-point attachment method and in-line
variants, the sample text looks like this:
Double End-Point Attachment Method
Since the lemma can now be read in the base text, the repetition of
the lemma in the