8.2.5 Explicit Alignment of Multiple Analyses
Where analyses differ in their decomposition or ordering of the base
text, implicit alignment of the analyses is not possible. In such
cases, the alignment must be made explicit. The data structures
described in this section allow specification of alignment between
arbitrary sets of linguistic analyses with a very simple mechanism. An
alignment consists of at least two alignment maps, which in turn consist
of pointers to elements from the analyses being aligned. Normally,
two different analyses will be involved, but alignments which
link different parts of a single analysis to itself are legal.
It is the responsibility of the encoder to specify what such an
alignment might mean.
Each pointer can be:
- a simple reference index,
- a list of such indices, or
- a pair of indices which define a range of elements.
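The three pointer forms can be modelled in a few lines of Python. This is a minimal sketch only: the function name resolve, the use of a string, a list and a tuple for the three forms, and the assumption that the identifiers of an analysis are available in document order are all illustrative choices and are not part of the SGML declarations.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical pointer model (not taken from the SGML declarations):
#   "A2"            a simple reference index
#   ["A3", "A5"]    a list of such indices
#   ("B2", "B4")    a pair of indices delimiting an inclusive range
Pointer = Union[str, List[str], Tuple[str, str]]


def resolve(pointer: Pointer, ids: Sequence[str]) -> List[str]:
    """Expand a pointer into the element identifiers it designates.

    `ids` holds the identifiers of one analysis in document order; that
    order is what gives a range its meaning.
    """
    if isinstance(pointer, str):
        if pointer not in ids:
            raise KeyError(pointer)
        return [pointer]
    if isinstance(pointer, tuple):
        first, last = pointer
        start, end = ids.index(first), ids.index(last)
        if start > end:
            raise ValueError(f"range {first}..{last} runs backwards")
        return list(ids[start:end + 1])
    # Otherwise a list of indices: check each one and keep the given order.
    for index in pointer:
        if index not in ids:
            raise KeyError(index)
    return list(pointer)
```

Under these assumptions, a range is simply shorthand for the list of every element between its two end points, so a pair such as ("B2", "B4") designates the same elements as the list B2, B3, B4.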
The necessary SGML declarations are as follows:
The alignment of three distinct syntactic analyses of the sentence
"He won't hang it up." is illustrated below. (The analyses are intended
to serve as illustrations only and have no theoretical status.) The
first representation is simply the input text, including punctuation.
The indices under the segments in the representation will be referred to
in the alignment map.
A:  He  won't  hang  it  up  .
    --  ----   ----  --  --  --
    A1  A2     A3    A4  A5  A6
The second representation differs from the first in that the contraction
"won't" is split into a sequence of "wo" and the negative morpheme
"n't".
B:  He  wo  n't  hang  it  up  .
    --  --  ---  ----  --  --  --
    B1  B2  B3   B4    B5  B6  B7
Finally, in the third representation, the contraction "won't" is
represented as a sequence of the two words "will" and "not", and the
particle verb "hang up" is represented both as a two-word sequence and
as a single lexeme.
C:  He  will  not  hang up  it
    --  ----  ---  -------  --
    C1  C2    C3   C4       C5
                   ---- --
                   C6   C7
The SGML encoding of these analyses (without yet taking into account
the alignments among them) is shown below. The tags sent, w, seg and
lex are illustrative labels only and have no standing within the
standard.
The segmentation of the orthographic form here should be
redone using existing tags from chapter 6.11, if possible. If that is
not possible, then we need some extensions. -Ed.
He won't hang it up .
He wo n't hang it up .
He will not hang up it
The alignment among the different levels of analysis is given below.
Note that an alignment need not be fully specified, and that it can be
declared separately from the encoding of the analyses to which it
refers.
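To make the idea of a partial, separately declared alignment concrete, the following Python sketch records a few correspondences among the three analyses. The identifiers follow the diagrams above. The particular links, the names ANALYSES, ALIGNMENTS, check, render and unaligned, and the representation of an alignment map as an (analysis, identifiers) pair are assumptions made for illustration and should not be read as the encoding prescribed by the standard.

```python
# Element identifiers and forms, copied from the three diagrams above.
ANALYSES = {
    "A": {"A1": "He", "A2": "won't", "A3": "hang", "A4": "it",
          "A5": "up", "A6": "."},
    "B": {"B1": "He", "B2": "wo", "B3": "n't", "B4": "hang",
          "B5": "it", "B6": "up", "B7": "."},
    "C": {"C1": "He", "C2": "will", "C3": "not", "C4": "hang up",
          "C5": "it", "C6": "hang", "C7": "up"},
}

# Assumed, deliberately partial alignment. Each alignment is a list of
# alignment maps; each map names an analysis and the elements it points to.
ALIGNMENTS = [
    [("A", ["A2"]), ("B", ["B2", "B3"]), ("C", ["C2", "C3"])],
    [("A", ["A3", "A5"]), ("B", ["B4", "B6"]), ("C", ["C4"])],
    [("A", ["A4"]), ("C", ["C5"])],  # analysis B left unaligned here
]


def check(alignment, analyses):
    """Reject alignments with fewer than two maps or unknown identifiers."""
    if len(alignment) < 2:
        raise ValueError("an alignment needs at least two alignment maps")
    for name, ids in alignment:
        for element_id in ids:
            if element_id not in analyses[name]:
                raise KeyError(element_id)


def render(alignment, analyses):
    """Show the forms linked by one alignment, one group per map."""
    return " = ".join(
        " ".join(analyses[name][i] for i in ids) for name, ids in alignment
    )


def unaligned(alignments, analyses):
    """List the identifiers that no alignment map points to."""
    used = {i for alignment in alignments for _, ids in alignment for i in ids}
    return sorted(i for table in analyses.values() for i in table if i not in used)


for alignment in ALIGNMENTS:
    check(alignment, ANALYSES)
    print(render(alignment, ANALYSES))
```

Because the alignments are held apart from the analyses, the three token tables are left untouched. With the links assumed here, elements such as A1, B5 and C6 simply take part in no alignment, which is what a partial specification amounts to.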