TEI TC Core W04: Proposal for Anonymous Block Element

C. M. Sperberg-McQueen

An unpublished document.

No source: this electronic form is the original form.

Proposal for Anonymous Block Element
C. M. Sperberg-McQueen
3 February 1997
TEI TC Core W04

Background

The DTD of TEI P3 defines a large number of element types, with a wide variety of meanings. In addition, it defines one element (seg) which has no specified meaning. The seg element may be used:

- for the delineation of arbitrary segments, without requiring that their type or meaning be specified (e.g. to allow them to be pointed at in a hypertext system), or
- to identify phrase-level objects of types not predefined in P3, as a simple form of semantic extension.

Because seg has no defined meaning of its own beyond that inherent in the concept of an SGML element type, it may be regarded as a sort of anonymous element type (by analogy with the anonymous functions provided by some programming languages).

The seg element can be used only for phrase-level elements, because seg is a member of class phrase. It thus can appear within paragraphs, etc. (strictly: within any element with a content model of paraContent, specialPara, or phrase.seq), but not between paragraphs, that is, directly within text divisions.

It would be convenient to have an anonymous element type usable at the component level of documents; this would allow a cleaner markup of:

- Biblical verses (which are not really paragraphs even in prose sections, and not always coextensive with lines of verse in the verse sections)
- portions of speeches in plays, when the encoder does not wish to make any claim about whether the material is prose or verse
- blocks of material appearing at component level within the text, for which the TEI provides no element type

Possible Solutions

Two possible solutions seem obvious:

- shift seg from phrase-level to inter-level, thus allowing it to occur not only within, but also between, paragraphs
- add a second anonymous element type, tentatively called block, at the chunk or inter level

The Core subcommittee leans toward the second solution, since the structural distinction between anonymous phrase-level elements and anonymous chunk-level elements seems worth reflecting in the element type. The elements are semantically anonymous, but their structural position is usually clear; there seems no reason not to make it manifest.

Proposal

An element block should be added to the additional tag set for linking and alignment, in section 14.3, which is where seg is defined.

It should have the following description:

block: contains any arbitrary component-level unit of text. Attributes include:

- type: characterizes the type of the text block
- ident: characterizes the function of the text block
- subtype: provides a subcategorization of the text block, if needed

The tag list at the beginning of the section should list the tags in the order anchor, seg, and block, and the discussion of the anchor element should be moved from the end of the discussion section, where it is currently lost, to the beginning.

The discussion of seg and block should read:

The seg and block elements can be used at the encoder's discretion to mark almost any segment of the text which is of interest for processing. One use of these elements is to mark text features for which these Guidelines otherwise provide no appropriate markup, i.e. as a simple extension mechanism. Another use is to provide an identifier for some segment which is to be pointed at by some other element, i.e. to provide a target, or a part of a target, for a ptr or other similar element.

Several examples of uses for the seg element are provided elsewhere ...

(Continue with current discussion of seg element.)

The remainder of this chapter contains a number of examples of the use of the seg element simply to provide an element to which an identifier may be attached, for example so that another segment may be linked or related to it in some way.

The block element performs a similar function for portions of the text which occur not within paragraphs or other component-level elements, but at the component level themselves. It may be used, for example, to tag the canonical verse divisions of Biblical texts:

    The First Book of Moses, Called Genesis

    In the beginning God created the heaven and the earth.

    And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

    And God said, Let there be light: and there was light.
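A minimal sketch of how the Genesis passage might be tagged with the proposed block element follows. The div1/div2 nesting, the type values, and the use of the global n attribute for verse numbers are assumptions made for illustration; the syntax is written XML-style (quoted attributes, explicit end-tags) for readability and is not taken from the proposal itself.

```xml
<div1 type="book">
  <head>The First Book of Moses, Called Genesis</head>
  <div2 type="chapter" n="1">
    <!-- each canonical verse is a component-level unit, sibling to the others -->
    <block type="verse" n="1">In the beginning God created the heaven and the earth.</block>
    <block type="verse" n="2">And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.</block>
    <block type="verse" n="3">And God said, Let there be light: and there was light.</block>
  </div2>
</div1>
```

Here no p element is needed: the verses stand directly within the text division, which is exactly the position seg cannot occupy.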

In other cases, where the text clearly indicates paragraph divisions containing one or more verses, the p element may be used to tag the paragraphs, and the seg element used to subdivide them. The block element may not be used here, as it may occur only between, not within, paragraphs:

    Das Erste Buch Mose.

    Am Anfang schuff Gott Himel vnd Erden. Vnd die Erde war wüst vnd leer / vnd es war finster auff der Tieffe / Vnd der Geist Gottes schwebet auff dem Wasser.

    Vnd Gott sprach / Es werde Liecht / Vnd es ward Liecht.
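As a hedged sketch, the Luther passage might be tagged with p for the printed paragraphs and seg for the verses inside them. The division structure, the type values, and the n attributes are illustrative assumptions, and the syntax is again XML-style rather than a quotation from the proposal.

```xml
<div1 type="book">
  <head>Das Erste Buch Mose.</head>
  <div2 type="chapter" n="1">
    <!-- the first printed paragraph holds two verses -->
    <p>
      <seg type="verse" n="1">Am Anfang schuff Gott Himel vnd Erden.</seg>
      <seg type="verse" n="2">Vnd die Erde war wüst vnd leer / vnd es war finster auff der Tieffe / Vnd der Geist Gottes schwebet auff dem Wasser.</seg>
    </p>
    <!-- the second printed paragraph holds one verse -->
    <p>
      <seg type="verse" n="3">Vnd Gott sprach / Es werde Liecht / Vnd es ward Liecht.</seg>
    </p>
  </div2>
</div1>
```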

Additional examples of the use of the block element are given elsewhere in these Guidelines: as a means of marking dramatic speeches when it is not clear whether the speech is to be regarded as prose or verse (see section 6.11.2, "Core Tags for Drama," on p. 212, and section 10.2.4, "Speech Contents," on p. 285). (to be specified)

The discussions of dramatic speeches in the sections indicated should use block, not seg.

Section 14.3 should be renamed Segments, Blocks, and Anchors.

The declaration for block should be:

Open Questions

A number of questions are answered implicitly in the proposal just given; they may need explicit discussion.

- Should block go into TEI.analysis with seg? Unlike seg, which is needed to allow idref links to arbitrary phrase-level spans within a paragraph, block seems unlikely to be seriously useful for hypertext linking. Its main function will be as an anonymous element. Perhaps both block and seg belong in the core?

- Should block be a member of class chunk or class inter?

  If the former, is it a serious drawback that two structures thought by the encoder to be of the same type may need to be tagged in two different ways? For example:

      Further Discussion

      blort blort blort

      And furthermore, blort blort blort which really puts it beyond all doubt, I think.

  If the latter, should the encoder prefer seg or block when the item in question occurs within a paragraph? For example:

      Our chief weapon is fear. Fear and surprise. Our two chief weapons are Fear and surprise.

  (Or see the example of Bible verses, above.)

- The name block is five characters long, but in some cases (Biblical texts, plays where the encoder doesn't want to make a claim about prose or verse, other texts where the units are neither p nor l, prose segments of dictionary entries) the element will occur very frequently. The normal TEI rule has been that very frequent elements should probably have one- or two-character names. Use bl?

- Should block be dropped and seg merely shifted to inter? If so, special steps would need to be taken to ensure that seg may still be used to subdivide elements at all levels, since many elements, including byline, fw, gloss, measure, name, phr, rs, s, and sense, may not contain inter-level elements.
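For the chunk-versus-inter question above, the following sketch illustrates the drawback of placing block in class chunk: the same kind of material must be tagged block at component level but seg inside a paragraph. The type value "aside" is a hypothetical label, and the placement of the tags is an assumption about what the "Further Discussion" example was meant to show.

```xml
<div>
  <head>Further Discussion</head>
  <!-- at component level, the chunk-class block is available -->
  <block type="aside">blort blort blort</block>
  <!-- inside a paragraph, only the phrase-level seg is allowed -->
  <p>And furthermore, <seg type="aside">blort blort blort</seg>
     which really puts it beyond all doubt, I think.</p>
</div>
```

If block were instead a member of class inter, both occurrences could in principle carry the same tag, at the cost of leaving the encoder to choose between seg and block within paragraphs.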