8.2.5 Explicit Alignment of Multiple Analyses

Where analyses differ in their decomposition or ordering of the base text, implicit alignment of the analyses is not possible. In such cases, the alignment must be made explicit. The data structures described in this section allow alignment to be specified between arbitrary sets of linguistic analyses by means of a very simple mechanism. An alignment consists of one or more alignment maps, each of which consists of at least two pointers to elements from the analyses being aligned. Normally, two different analyses will be involved, but alignments which link different parts of a single analysis to one another are legal; it is the responsibility of the encoder to specify what such an alignment might mean. Each pointer can be a reference to a single element (al.ptr), a list of such references (al.list), or a range of elements identified by its first and last members (al.range).

The necessary SGML declarations are as follows:

<![ CDATA [
<!ELEMENT alignment - - (al.map)+ >
<!ELEMENT al.map    - - ((al.ptr | al.list | al.range),
                         (al.ptr | al.list | al.range)+) >
<!ELEMENT al.ptr    - O EMPTY >
<!ATTLIST al.ptr    id   IDREF #REQUIRED >
<!ELEMENT al.list   - - (al.ptr+) >
<!ELEMENT al.range  - O EMPTY >
<!ATTLIST al.range  from IDREF #REQUIRED
                    to   IDREF #REQUIRED >
]]>
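
For readers who want to process such alignments in software, the declarations can be mirrored by a small in-memory model. The following Python sketch is illustrative only and has no standing within the standard: the class names (Ptr, PtrList, Range, AlMap, Alignment) and their field names are hypothetical, chosen here for readability. It assumes only the cardinality constraints stated in the declarations, namely that an alignment holds one or more maps, a map holds two or more pointers, and a list holds one or more single pointers.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Ptr:
    """Counterpart of al.ptr: a reference to one element by its ID."""
    id: str


@dataclass(frozen=True)
class PtrList:
    """Counterpart of al.list: one or more single pointers taken together."""
    ptrs: Tuple[Ptr, ...]

    def __post_init__(self):
        if len(self.ptrs) < 1:
            raise ValueError("al.list needs at least one al.ptr")


@dataclass(frozen=True)
class Range:
    """Counterpart of al.range: the IDs of the first and last element."""
    start: str
    end: str


# Any of the three pointer kinds may appear inside a map.
Pointer = Union[Ptr, PtrList, Range]


@dataclass(frozen=True)
class AlMap:
    """Counterpart of al.map: two or more pointers that correspond."""
    pointers: Tuple[Pointer, ...]

    def __post_init__(self):
        if len(self.pointers) < 2:
            raise ValueError("al.map needs at least two pointers")


@dataclass(frozen=True)
class Alignment:
    """Counterpart of alignment: one or more maps."""
    maps: Tuple[AlMap, ...]

    def __post_init__(self):
        if len(self.maps) < 1:
            raise ValueError("alignment needs at least one al.map")


# Usage: one element aligned with a list of two elements from another analysis.
example = Alignment((
    AlMap((Ptr("A2"), PtrList((Ptr("B2"), Ptr("B3"))))),
))
```

Checking the cardinalities at construction time plays roughly the role that validation against the declarations plays for an SGML document; whether the referenced IDs actually exist is a separate check that this sketch does not attempt.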

The alignment of three distinct syntactic analyses of the sentence He won't hang it up. is illustrated below. (The analyses are intended to serve as illustrations only and have no theoretical status.) The first representation is simply the input text, including punctuation. The indices under the segments in the representation will be referred to in the alignment map.

    A:  He  won't  hang  it  up  .
        --  ----   ----  --  --  --
        A1  A2     A3    A4  A5  A6

The second representation differs from the first in that the contraction won't is split into a sequence of wo and the negative morpheme n't.

    B:  He  wo  n't  hang  it  up  .
        --  --  ---  ----  --  --  --
        B1  B2  B3   B4    B5  B6  B7

Finally, in the third representation, the contraction won't is represented as a sequence of the two words will and not, and the particle verb hang up is represented both as a two-word sequence and as a single lexeme.

    C:  He  will  not  hang up  it
        --  ----  ---  -------  --
        C1  C2    C3   C4       C5
                       ---- --
                       C6   C7

The SGML encoding of these analyses, without yet taking into account the alignments among them, is shown below. The tags sent, w, seg and lex are illustrative labels only and have no standing within the standard.

The segmentation of the orthographic form here should be redone using existing tags from chapter 6.11, if possible. If that is not possible, then we need some extensions. -Ed.

<![ CDATA [
<sent>
  <A>
    <w id = A1> He </w>
    <w id = A2> won't </w>
    <w id = A3> hang </w>
    <w id = A4> it </w>
    <w id = A5> up </w>
    <w id = A6> . </w>
  </A>
  <B>
    <seg id = B1> He </seg>
    <seg id = B2> wo </seg>
    <seg id = B3> n't </seg>
    <seg id = B4> hang </seg>
    <seg id = B5> it </seg>
    <seg id = B6> up </seg>
    <seg id = B7> . </seg>
  </B>
  <C>
    <lex id = C1> He </lex>
    <lex id = C2> will </lex>
    <lex id = C3> not </lex>
    <lex id = C4>
      <lex id = C6> hang </lex>
      <lex id = C7> up </lex>
    </lex>
    <lex id = C5> it </lex>
  </C>
</sent>
]]>

The alignment among the different levels of analysis is given below. As the example shows, alignment need not be fully specified, and it can be declared separately from the encoding of the analyses to which it refers.

<![ CDATA [
<alignment>
  <al.map>
    <al.ptr id = A1>
    <al.ptr id = B1>
  </al.map>
  <al.map>
    <al.ptr id = A2>
    <al.list> <al.ptr id = B2> <al.ptr id = B3> </al.list>
    <al.list> <al.ptr id = C2> <al.ptr id = C3> </al.list>
  </al.map>
  <al.map>
    <al.list> <al.ptr id = A3> <al.ptr id = A5> </al.list>
    <al.list> <al.ptr id = B4> <al.ptr id = B6> </al.list>
    <al.ptr id = C4>
  </al.map>
  <al.map>
    <al.range from = A1 to = A3>
    <al.range from = B1 to = B4>
    <al.range from = C1 to = C6>
  </al.map>
</alignment>
]]>
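
To make the intended reading of this alignment concrete, the following Python sketch resolves each pointer in the four maps to the text it designates. It is a minimal illustration under stated assumptions, not part of the standard: the tuple notation, the names TEXT, LEAVES, ALIGNMENT, resolve and render, and the convention that the first character of an ID names its analysis are all hypothetical. In particular, the text above does not say how a range behaves when elements are nested, so the sketch assumes that a range runs over the leaf elements of one analysis in document order (which leaves C4 out of the range from C1 to C6).

```python
# Text content of every identified element in the three analyses.
TEXT = {
    "A1": "He", "A2": "won't", "A3": "hang", "A4": "it", "A5": "up", "A6": ".",
    "B1": "He", "B2": "wo", "B3": "n't", "B4": "hang", "B5": "it", "B6": "up",
    "B7": ".",
    "C1": "He", "C2": "will", "C3": "not", "C4": "hang up", "C5": "it",
    "C6": "hang", "C7": "up",
}

# Leaf elements of each analysis in document order.
# Assumption: ranges are interpreted over leaves only, so nested C4 is omitted.
LEAVES = {
    "A": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "B": ["B1", "B2", "B3", "B4", "B5", "B6", "B7"],
    "C": ["C1", "C2", "C3", "C6", "C7", "C5"],
}

# The four alignment maps, written as (kind, id, ...) tuples.
ALIGNMENT = [
    [("ptr", "A1"), ("ptr", "B1")],
    [("ptr", "A2"), ("list", "B2", "B3"), ("list", "C2", "C3")],
    [("list", "A3", "A5"), ("list", "B4", "B6"), ("ptr", "C4")],
    [("range", "A1", "A3"), ("range", "B1", "B4"), ("range", "C1", "C6")],
]


def resolve(pointer):
    """Return the list of element IDs that one pointer designates."""
    kind = pointer[0]
    if kind == "ptr":
        return [pointer[1]]
    if kind == "list":
        return list(pointer[1:])
    if kind == "range":
        first, last = pointer[1], pointer[2]
        # Assumption: the first character of an ID names its analysis.
        if first[0] != last[0]:
            raise ValueError("range endpoints lie in different analyses")
        leaves = LEAVES[first[0]]
        i, j = leaves.index(first), leaves.index(last)
        if i > j:
            raise ValueError("range end precedes range start")
        return leaves[i:j + 1]
    raise ValueError("unknown pointer kind: " + repr(kind))


def render(al_map):
    """Return the text designated by each pointer of one alignment map."""
    return [" ".join(TEXT[i] for i in resolve(p)) for p in al_map]


if __name__ == "__main__":
    for al_map in ALIGNMENT:
        print(" | ".join(render(al_map)))
```

Under these assumptions the second map pairs won't with wo n't and will not, the third pairs the discontinuous hang ... up of the first two analyses with the single lexeme C4, and the fourth pairs the opening span of each analysis up to and including hang.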