Core Structural Features

This section concerns basic structural features which are common to a large number of texts and which may be said to establish their principal gross structure or shape. Typical examples, in the case of the modern book, are the chapters into which the body of a text is divided; the sections and subsections into which chapters are in turn divided; the index, glossary, bibliography, and notes, which are parts of the back matter; and the title page, foreword, preface, abstract, tables of contents, etc., which together make up the front matter.

Quite what it is that these core structural features have in common is hard to define. Nevertheless, their recognition as parts of a book is well established. They are discussed prominently in style manuals for authors and editors, manuals of book design for compositors and designers, and scholarly discussions of bibliography. Examples are Esdaile's Manual of Bibliography, revised edition by Roy Stokes (London: George Allen and Unwin Ltd, 1931, 1967), and The Chicago Manual of Style, 13th edition (Chicago and London: University of Chicago Press, 1969, 1982), both of which discuss core structural features in chapters with the same title: The Parts of a Book. Although in what follows we discuss core structural features largely in terms of modern book-length monographs, it should be easy to generalise our discussion to analogous features in other kinds of text. Section discusses in more detail methods of extending the set of tags proposed here. The TEI core structural elements tag set was developed with special attention to BK-1, the SGML document type definition for books and dissertations included in the AAP tagset; Annex E of the SGML standard; and Chapter 1, The Parts of a Book, of The Chicago Manual of Style.

Some Formal Issues

When elements are organised hierarchically, there are two important sorts of qualifications that can be made of any element instance: (i) its depth of nesting within the hierarchy, that is, its level, and (ii) its ordinal position in a sequence of elements all at the same notional level.

For instance, to say that a section is a sub-section, or a second level section, of a chapter is to characterize its level within the hierarchy of chapters and sections. To say that a section is the third section of a chapter is to characterize its position within the sequence of sections in that chapter.

These two dimensions give us four different ways of characterising core feature tags:
  1. Unqualified: The tag, for instance s, carries no indication of its depth within the hierarchy or its order in the sequence of siblings. These characteristics are determined by context only.
  2. Depth Specific: Different tag names are used for different nesting levels (for instance, part, chap, sect, subsect, subsubsect), but the order of an element in its sequence of siblings is determined by context.
  3. Order Specific: The nesting depth of a tag is determined by context, but the order of an element in its sequence of siblings is determined by the tag name: s2 would be the second element in its sequence of siblings.
  4. Depth and Order Specific: Both nesting depth and order in sibling sequence are indicated by the tag itself: Part1, Part2, Chap1....
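The four approaches can be compared with a short Python sketch. It is illustrative only: the function name tag_name, the scheme labels, and the use of 0-based depth and 1-based order are assumptions made for this example and are not part of the proposed tag set.

```python
# Hypothetical helper: shows how a tag name would be formed under each
# of the four approaches. Level names are the examples used in the text.
LEVEL_NAMES = ["part", "chap", "sect", "subsect", "subsubsect"]

def tag_name(scheme, depth, order):
    """Return a tag name for an element at nesting `depth` (0-based)
    and position `order` (1-based) among its siblings."""
    if scheme == "unqualified":
        return "s"                        # depth and order come from context
    if scheme == "depth":
        return LEVEL_NAMES[depth]         # order comes from context
    if scheme == "order":
        return f"s{order}"                # depth comes from context
    if scheme == "depth+order":
        return f"{LEVEL_NAMES[depth].capitalize()}{order}"
    raise ValueError(f"unknown scheme: {scheme}")
```

Under these assumptions, the second chapter of a text would be tagged s, chap, s2 or Chap2 respectively.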

Each of these approaches has distinctly different advantages and disadvantages and, except for the third, each has been used extensively by preparers of machine-readable texts. These Guidelines for the most part prefer depth specific tagging. In this respect, we follow both the AAP standard and Annex E of ISO 8879.

Overall structure

As always, the best tagging strategy for a text must be determined by an analysis of the nature of the text and the likely purposes to which the machine-readable copy will be put. At some level of description all texts have a very similar gross structure: they have
  front matter: comprising such things as title, author, imprint, prefaces etc.
  body matter: typically comprising parts, chapters, paragraphs arranged in a single hierarchy
  back matter: containing appendixes, indexes etc.

Many other elements may be nested within these major structural elements. This chapter discusses the immediate sub-divisions of the front, body and back matter down to the level of individual paragraphs and similar elements. These elements, called here paragraph level units, include such elements as paragraphs, lists, block quotations, notes, figures and tables. Their contents are further discussed in chapter . The rest of the present chapter discusses immediate sub-elements of the front, body and back elements.

Front Matter

The front matter, also known as the preliminaries or prelims, is traditionally important for descriptive bibliography and cataloguing (see the index entry under Preliminaries in AACR2). It is, however, also of importance in stylistic or interpretative analysis of a text, as front matter elements are typically characterised by different registers or usage patterns.

The following high level structural elements, it is proposed, should always be distinguished when front matter is tagged:
  title.page: A grouping tag, within which one or more of the following elements may be embedded. No rules are specified for their order, but all text within the title.page tag must be tagged as one of the following:
    title.main: Titular information referring to the whole work.
    title.part: Any additional text found on a title page.
    doc.author: Names etc. of those responsible for the intellectual content of the work, as given on the title page.
    doc.imprint: Names etc. of the publisher, printer, distributor etc. of a work, as given on the title page.
    doc.date: Date of the text, as given on a title page.
  foreword: A foreword or preface addressed to the reader in which the author or publisher explains the content, purpose or origin of the text.
  acknowledgements: A formal declaration of acknowledgment by the author in which one or more persons or institutions is thanked for their part in the creation of a text.
  dedication: A formal offering or dedication of a text to one or more persons or institutions by the author.
  abstract: A summary of the content of a text as continuous prose.
  contents: A table of contents, specifying the structure of a work and listing its constituents. More than one contents may appear, listing for example chapters, illustrations, tables, abbreviations etc.
  frontispiece: A pictorial frontispiece, possibly including some text.
  front.part: Any other separate part of the front matter.

Most of the elements described above will generally contain additional sub-elements at the paragraph level or below, as mentioned previously. Some are however so frequently encountered in front-matter elements (and so rarely elsewhere) that they are listed here:
  epigraph: A quotation or other phrase from another work, anonymous or attributed, sometimes used at the start of a section or chapter, or on a title page. It may itself contain subelements to identify the reference, which should use the tags proposed in section .
  heading: A special form of title used as the heading for anything less than the whole of a work, e.g. for the foreword, abstract, etc.
  salute: Any salutation or greeting at the start of a foreword, dedicatory epistle etc.
  signature: The signature or other formula used to sign a foreword, acknowledgment etc. It may itself contain subelements name (see section ) and date (see section ).

Body Matter

The names used for the major structural subdivisions of texts vary with the genre and period of the text, or even with the whim of the author, editor or publisher. For example, the major subdivisions of an epic or of the Bible are called books, those of a report are usually called parts, those of a novel chapters (unless it is an epistolary novel, in which case they may be called letters), those of an anthology may be parts or poems, and so forth. Where genre-specific tags are either not available or not required, a neutral set of tag names to identify the main structural grouping of body matter may be useful; these are discussed below. Relationships among different structural groupings of the body are discussed in section while the components of the body are of course the subject matter of this whole chapter.

The subdivisions proposed here are all larger than the paragraph-level units described in section , and are usually associated with some referencing scheme or title information. Each subdivision groups elements at the next level down, and, except for the top level, each proposed tag is tied to a particular hierarchic level. The tag names proposed are deliberately neutral: the largest possible such subdivision of the body text should be tagged div0, and the smallest possible div5. The digit following the `div' (for division) identifies the level of the element it tags within the overall hierarchy. It may be convenient to think of `div0' as corresponding with `part', and `div1' as corresponding with `chapter', but this need not necessarily be the case.

Each div0, div1, div2 etc. tag may take the following optional attributes:
  name: Specifies the name used for this structural subdivision in the original text. Examples include `part', `canto' etc.
  ref: Specifies whether this structural subdivision is used in a canonical reference scheme (see further section ); takes the values `Y' or `N'.
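As a minimal sketch of how the level digit and the two optional attributes combine, the following Python function builds a start-tag for a division. The function name div_start_tag and the quoted attribute syntax are assumptions made for illustration; only the tag names div0 to div5, the attributes name and ref, and the values `Y' and `N' come from the text above.

```python
def div_start_tag(level, name=None, ref=None):
    """Build a start-tag such as <div1 name="canto" ref="Y">.

    Hypothetical helper: the attribute quoting shown here is one
    plausible SGML rendering, not a prescribed form.
    """
    if not 0 <= level <= 5:
        raise ValueError("level must be between 0 and 5")
    if ref is not None and ref not in ("Y", "N"):
        raise ValueError("ref must be 'Y' or 'N'")
    attrs = ""
    if name is not None:
        attrs += f' name="{name}"'
    if ref is not None:
        attrs += f' ref="{ref}"'
    return f"<div{level}{attrs}>"
```

For instance, div_start_tag(1, name="canto", ref="Y") would describe a first-level division which the original text calls a canto and which takes part in a canonical reference scheme.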

In general, the contents of any division element, say div3, will be composed of the following, in the sequence given:

Note that once a division contains only `paragraph level units', no further subdivisions can be introduced; in the same way, once a subdivision is encountered in a division, no further paragraph level units can be introduced. In other words, within a div3 element, (and disregarding heading and trailer elements), only the following sequences are legal:
  1. paragraphs ... div4 ... div4 ... div4
  2. paragraphs
  3. div4 ... div4 ... div4
  4. nothing at all
but the following sequences are illegal:
  1. div4 ... div4 ... paragraphs ... div4
  2. paragraphs ... div4 ... paragraphs
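The rule illustrated by these sequences can be expressed as a small check, sketched here in Python. The function name is_legal_content and the use of the string "p" to stand for any paragraph level unit are assumptions for this example; heading and trailer elements are taken to have been removed beforehand, and subdivisions are assumed to sit exactly one level down.

```python
def is_legal_content(children, level):
    """Check the children of a div<level> element.

    `children` lists element names in document order, with heading and
    trailer elements already removed. "p" is a stand-in for any
    paragraph level unit (an assumption of this sketch).
    """
    subdivision = f"div{level + 1}"
    seen_subdivision = False
    for child in children:
        if child == subdivision:
            seen_subdivision = True
        elif child == "p":
            if seen_subdivision:
                # A paragraph level unit after a subdivision is illegal.
                return False
        else:
            # Anything else is outside the scope of this simple rule.
            return False
    return True
```

An empty list is accepted, matching the fourth legal case, and any paragraph level unit following a div4 within a div3 is rejected, matching both illegal cases.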

Note also that the top level of the hierarchy may begin with either a div0 element or a div1 element. This convention (corresponding with the idea that a type-set document may begin either with a `level 0' or a `level 1' heading) is provided for convenience and compatibility with widely used formatting systems.

An alternative method would be to use a single level tag, called body.part with attributes `level' and `type'. This is not however proposed, since whatever it would gain in generality of application would be lost in difficulty of processing.

Back Matter

Conventions as to which elements are grouped as back matter and which as front matter vary. For example, some books place information typically found in a table of contents at the front, and others at the back. Title pages may appear at the back of a book as well as at the front. The same sub-elements as those mentioned in the discussion of front matter (see section ) may therefore be included in back matter. The following lists the major subdivisions or groupings of such sub-elements which are regarded as typical of back matter:
  appendix: An ancillary self-contained section of a work, often providing additional but in some sense extra-canonical text.
  glossary: A special form of index in which terms are associated with definition texts.
  notes: A section in which textual or other kinds of notes are gathered together.
  bibliography: A section in which full bibliographic references are given for all the bibliographic items cited in a text. (See further section )
  index: Any form of index to the work.
  colophon: A printer's device or other statement performing a function analogous to that of the imprint on a title page.
  back.part: Any other clearly identified section of the back matter.