8.2.4 Implicit Alignment for Multiple Analyses
Analyses may be aligned implicitly by treating the running text as a
simple series of units, each unit containing one or more
levels of content. Typically the levels of content are a
base form and any number of annotations of that base form; the contents
of the unit at a given level will typically be either a simple text
string or a series of (nested) units at the next lower level of
analysis. Annotation levels can attach either to the base or to another
annotation level.
Each level of content may optionally carry a
type attribute indicating what kind of analysis it
contains. This attribute can be any text string; typical values would
include "original transcription", "retranscription",
"word-by-word gloss", "allomorphic transcription" (i.e. a
transcription which indicates morpheme boundaries as cuts within the
surface form of a word), "morphemic representation", and so
on. (Compare the discussion of parallel texts in chapter 6: the unit
structure could also be used for parallel texts, in which case the
values of type might be sigla for the various versions of the text.)
Like all other elements, both unit and level
may have an ID attribute which assigns a unique identifier
to the element. Optionally, any level of annotation may point with a
base attribute to the level of content on which it is
based. (For example, a morphemic-representation level might point to
the allomorphic-representation level, which in turn points to the
orthographic level.)
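The unit/level structure can be pictured as a small in-memory data model. The Python below is a minimal sketch only: the class names Unit and Level, the helper base_chain, the identifiers, and the sample forms ("dogs", "dog-s", "dog-PL") are hypothetical illustrations, not part of these guidelines. It shows nested units, free-text type values, and how application software might follow base pointers from an annotation back to the level it ultimately rests on.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical model of the unit/level markup; all names are illustrative.
@dataclass
class Level:
    content: Union[str, list[Unit]]  # a text string, or nested lower-level units
    type: Optional[str] = None       # free-text label for the kind of analysis
    id: Optional[str] = None         # unique identifier (the ID attribute)
    base: Optional[str] = None       # id of the level this one is based on


@dataclass
class Unit:
    levels: list[Level]              # one or more levels of content
    type: Optional[str] = None
    id: Optional[str] = None


def base_chain(unit: Unit, level_id: str) -> list[str]:
    """Follow base pointers among a unit's levels, starting at level_id.

    Returns the ids visited; stops at a level with no base, at an unknown
    id (so an unknown starting id yields []), or if a cycle is detected.
    """
    by_id = {lv.id: lv for lv in unit.levels if lv.id is not None}
    chain: list[str] = []
    current: Optional[str] = level_id
    while current is not None and current in by_id and current not in chain:
        chain.append(current)            # record this level, then step to its base
        current = by_id[current].base
    return chain


# An invented word unit: orthographic base plus two annotation levels.
word = Unit(type="word", id="w1", levels=[
    Level("dogs", type="original transcription", id="w1.orth"),
    Level("dog-s", type="allomorphic transcription", id="w1.allo", base="w1.orth"),
    Level("dog-PL", type="morphemic representation", id="w1.morph", base="w1.allo"),
])

# A higher-level unit whose base level contains nested word units.
sentence = Unit(type="sentence", id="s1", levels=[
    Level([word], type="original transcription", id="s1.orth"),
    Level("dɔgz", type="retranscription", id="s1.retr", base="s1.orth"),
])

print(base_chain(word, "w1.morph"))  # ['w1.morph', 'w1.allo', 'w1.orth']
```

Because type is an unconstrained string here, the model itself makes no commitment to any particular scheme of analysis; any interpretation of the labels is left to the software that consumes them.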
This definition of annotated unit is kept intentionally general. Every
individual analyst is likely to want to use a different scheme of
analysis, involving different kinds of units and involving different
sets of annotations (likely to include completely novel annotations)
even when the same kinds of units are used. In view of this, the
proposed markup scheme makes no commitment to any of the content of the
analysis. The type attribute is provided to allow the user to encode
information about the semantic structure of the analysis. Application
software could use the type values to process the analyzed data in
accordance with that semantic structure. For instance, an editor might
use the type of a unit to constrain the types and relative order of its
annotations. A formatter could use the annotation types to select font
parameters; it would use unit types to select interlinear alignment (for
the annotations of low-level units) versus synchronization in parallel
columns (for the annotations of high-level units).
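A formatter of the kind just described might dispatch on unit types roughly as in the following minimal Python sketch. The set LOW_LEVEL_TYPES, the functions layout_mode and render_interlinear, and the sample glosses are hypothetical assumptions; the guidelines prescribe no particular type values or layout algorithm. Units of a low-level type are aligned interlinearly, with each unit's levels stacked in one padded column; any other unit is merely classified for parallel-column synchronization, which the sketch does not render.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    text: str
    type: Optional[str] = None


@dataclass
class Unit:
    levels: list[Level]
    type: Optional[str] = None


# Hypothetical policy: which unit types count as "low-level" for layout.
LOW_LEVEL_TYPES = {"word", "morpheme"}


def layout_mode(unit: Unit) -> str:
    """Choose interlinear alignment for low-level units, parallel columns otherwise."""
    return "interlinear" if unit.type in LOW_LEVEL_TYPES else "parallel-columns"


def render_interlinear(units: list[Unit]) -> list[str]:
    """Stack each unit's levels vertically, padding so units line up column by column."""
    if not units:
        return []
    depth = max(len(u.levels) for u in units)
    widths = [max((len(lv.text) for lv in u.levels), default=0) for u in units]
    rows = []
    for i in range(depth):
        cells = []
        for u, w in zip(units, widths):
            text = u.levels[i].text if i < len(u.levels) else ""
            cells.append(text.ljust(w))  # pad to the widest level of this unit
        rows.append("  ".join(cells).rstrip())
    return rows


words = [
    Unit(type="word", levels=[Level("dogs"), Level("dog-PL", type="word-by-word gloss")]),
    Unit(type="word", levels=[Level("bark"), Level("bark", type="word-by-word gloss")]),
]

for row in render_interlinear(words):
    print(row)
# dogs    bark
# dog-PL  bark
```

Level types are carried along but unused here; a fuller formatter could map them to font parameters in the same table-driven way that LOW_LEVEL_TYPES maps unit types to a layout mode.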
The SGML declarations required for the elements described here are:
Strictly speaking, of course, it is desirable to allow all the
phrase-level tags described elsewhere in these guidelines to
appear within units and levels; the formal document type
declarations in the appendix allow anything that can occur
within a paragraph to occur within a level element as well.
Work needs to be done to ensure that the LEVEL and UNIT elements
and the TYPE attribute are readily translatable into feature structures
and alignment maps. -Ed.