Core Structural Features
This section concerns basic structural features which are common to a
large number of texts and which may be said to establish their principal
gross structure or shape. Typical examples, in the case of the modern
book, are the chapters into which the body of a text is divided; the
sections and subsections into which chapters are in turn divided; the
index, glossary, bibliography, and notes, which are parts of the
back matter; and the title page, foreword, preface, abstract, tables of
contents, etc., which together make up the front matter.
Quite what these core structural features have in common is hard to
define. Nevertheless, their recognition as parts of a book is well
established. They are discussed prominently in style manuals for authors
and editors, in manuals of book design for compositors and designers, and
in scholarly discussions of bibliography.
See, for example, Esdaile's Manual of Bibliography, revised edition by Roy
Stokes (London: George Allen and Unwin Ltd, 1931, 1967), and The Chicago
Manual of Style (Chicago and London: University of Chicago Press, 1969,
1982; 13th edition), both of which discuss core structural features in
chapters with the same title: `The Parts of a Book'.
Although in what follows we discuss core structural features largely
in terms of modern book-length monographs, it should be easy to
generalise the discussion to analogous features in other kinds
of text. Section discusses in more detail methods
of extending the set of tags proposed here.
The TEI core structural elements tag set was developed with special
attention to BK-1, the SGML document type definition for books and
dissertations included in the AAP tag set; Annex E of the SGML standard;
and Chapter 1, `Parts of a Book', of The Chicago Manual of Style.
Some Formal Issues
When elements are organised hierarchically, there are two important sorts of
qualifications that can be made of any element instance: (i) its depth of
nesting within the hierarchy, that is, its level, and (ii)
its ordinal position in a sequence of elements all at the
same notional level.
For instance, to say that a section is a subsection, or second-level
section, of a chapter is to characterise its level within the
hierarchy of chapters and sections. To say that a section is the third
section of a chapter is to characterise its position within the sequence
of sections in that chapter.
These two dimensions give us four different ways of characterising
core feature tags.
Unqualified
The tag (for instance, s) carries no indication of its depth within the
hierarchy or of its order in the sequence of siblings. These
characteristics are determined by context only.
Depth Specific
Different tag names are used for different nesting levels (for
instance, part, chap, sect, subsect, subsubsect), but the order of an
element in its sequence of siblings is determined by context.
Order Specific
The nesting depth of a tag is determined by context, but the order of an
element in its sequence of siblings is determined by the tag name: s2
would be the second element in its sequence of siblings.
Depth and Order Specific
Both nesting depth and order in the sibling sequence are indicated by
the tag itself: Part1, Part2, Chap1, and so on.
Each of these approaches has distinctly different advantages and
disadvantages and, except for the third, each has been used
extensively by preparers of machine-readable texts. For the most part,
these Guidelines prefer depth-specific tagging. In this respect, we
follow both the AAP standard and Annex E of ISO 8879.
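To make the four schemes concrete, here is a small Python sketch that assigns a tag to each section of a nested outline under each scheme. The outline representation (a list whose items are lists of subsections) and the functions tag_for and walk are hypothetical, invented for illustration; the depth-specific names are the examples used above.

```python
from typing import Iterator, List, Tuple

# Example depth-specific names from the text; depth 0 is the outermost level.
# Outlines deeper than five levels would raise IndexError in this sketch.
DEPTH_NAMES = ["part", "chap", "sect", "subsect", "subsubsect"]

def tag_for(depth: int, position: int, scheme: str) -> str:
    """Tag for a section at `depth` that is the `position`-th (1-based) sibling."""
    if scheme == "unqualified":
        return "s"                                   # depth and order from context
    if scheme == "depth":
        return DEPTH_NAMES[depth]                    # order from context
    if scheme == "order":
        return f"s{position}"                        # depth from context
    if scheme == "depth+order":
        return f"{DEPTH_NAMES[depth].capitalize()}{position}"
    raise ValueError(f"unknown scheme: {scheme}")

def walk(outline: List, scheme: str, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, tag) in document order; each item is a list of subsections."""
    for position, subsections in enumerate(outline, start=1):
        yield depth, tag_for(depth, position, scheme)
        yield from walk(subsections, scheme, depth + 1)

# Two parts: the first holds two chapters, the second holds one.
outline = [[[], []], [[]]]
for scheme in ("unqualified", "depth", "order", "depth+order"):
    print(scheme, [tag for _, tag in walk(outline, scheme)])
```

Running the loop shows the trade-off directly: under the order-specific scheme, s1 is used for both a part and a chapter, so the tag alone does not reveal the level, whereas the depth-specific scheme keeps the level in the tag name and leaves order to context.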
Overall Structure
As always, the best tagging strategy for a text must be determined
by an analysis of the nature of the text and the likely purposes
to which the machine-readable copy will be put. At some level of
description, all texts have a very similar gross structure. They have:
- front matter, comprising such things as title, author, imprint,
prefaces, etc.;
- body matter, typically comprising parts, chapters and paragraphs
arranged in a single hierarchy;
- back matter, containing appendixes, indexes, etc.
Many other elements may be nested within these major structural elements.
This chapter discusses the immediate subdivisions of the front, body and
back matter down to the level of individual paragraphs and similar elements.
These elements, called here paragraph-level units, include such
elements as paragraphs, lists, block quotations, notes, figures and tables.
Their contents are discussed further in chapter . The
rest of the present chapter discusses the immediate sub-elements of the
front, body and back elements.
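The three-way division into front, body and back can be checked mechanically once a text is tagged. The following Python sketch assumes an XML rendering of the tags (the text element that wraps them is a hypothetical name) and, as a simplifying assumption not stated above, treats front and back matter as optional and body matter as required.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model for the (hypothetical) wrapping <text> element:
# optional front matter, required body matter, optional back matter, in order.
TOP_LEVEL_MODEL = re.compile(r"(front )?body (back )?")

def has_valid_top_level(text_elem: ET.Element) -> bool:
    """Return True if the children of text_elem follow front? body back?."""
    sequence = "".join(child.tag + " " for child in text_elem)
    return TOP_LEVEL_MODEL.fullmatch(sequence) is not None

print(has_valid_top_level(ET.fromstring("<text><front/><body/><back/></text>")))  # True
print(has_valid_top_level(ET.fromstring("<text><body/><front/></text>")))         # False
```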
Front Matter
The front matter, also known as the preliminaries or prelims, is
traditionally important for descriptive bibliography and cataloguing (see
the index entry under `Preliminaries' in AACR2). It is, however, also
important for stylistic or interpretative analysis of a text, since front
matter elements are typically characterised by registers or usage patterns
that differ from those of the body of the text.
The following high-level structural elements, it is proposed, should always be
distinguished when front matter is tagged:
title.page
A grouping tag, within which one or more of the following elements may be
embedded. No rules are specified for their order, but all text within the
title.page tag must be tagged as one of the following:
    title.main
    Titular information referring to the whole work.
    title.part
    Any additional text found on a title page.
    doc.author
    Names etc. of those responsible for the intellectual content of the
    work, as given on the title page.
    doc.imprint
    Names etc. of the publisher, printer, distributor, etc. of a work, as
    given on the title page.
    doc.date
    Date of the text, as given on a title page.
foreword
A foreword or preface addressed to the reader, in which the author or
publisher explains the content, purpose or origin of the text.
acknowledgements
A formal declaration of acknowledgement by the author, in which one or more
persons or institutions are thanked for their part in the creation of a text.
dedication
A formal offering or dedication of a text to one or more persons or
institutions by the author.
abstract
A summary of the content of a text as continuous prose.
contents
A table of contents, specifying the structure of a work and listing its
constituents. More than one contents element may appear, listing for
example chapters, illustrations, tables, abbreviations, etc.
frontispiece
A pictorial frontispiece, possibly including some text.
front.part
Any other separate part of the front matter.
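The rule that all text within title.page must be tagged as one of the listed elements lends itself to a simple check. This Python sketch assumes an XML rendering of the tags (names containing a dot, such as title.page, are legal XML element names) and reports any untagged character data or unlisted element found directly inside a title.page element; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

TITLE_PAGE_PARTS = {"title.main", "title.part", "doc.author",
                    "doc.imprint", "doc.date"}

def title_page_problems(title_page: ET.Element) -> List[str]:
    """Report untagged text and unlisted child elements inside title.page."""
    problems = []
    # Text before the first child element.
    if title_page.text and title_page.text.strip():
        problems.append(f"untagged text: {title_page.text.strip()!r}")
    for child in title_page:
        if child.tag not in TITLE_PAGE_PARTS:
            problems.append(f"unexpected element: {child.tag}")
        # Text between this child and the next one (or the end tag).
        if child.tail and child.tail.strip():
            problems.append(f"untagged text: {child.tail.strip()!r}")
    return problems
```

Whitespace between the tagged parts is ignored, so ordinary line breaks in a tagged title page do not count as untagged text.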
Most of the elements described above will generally contain additional
sub-elements at the paragraph level or below, as mentioned previously.
Some of these sub-elements, however, are so frequently encountered in
front matter elements (and so rarely elsewhere) that they are listed here.
epigraph
A quotation or other phrase from another work, anonymous or attributed,
sometimes used at the start of a section or chapter, or on a title page.
It may itself contain sub-elements to identify the reference, which
should use the tags proposed in section .
heading
A special form of title used as the heading for anything less than the
whole of a work, e.g. for the foreword, abstract, etc.
salute
Any salutation or greeting at the start of a foreword, dedicatory
epistle, etc.
signature
The signature or other formula used to sign a foreword, acknowledgement,
etc. It may itself contain the sub-elements name (see section ) and date
(see section ).
Body Matter
The names used for the major structural subdivisions of texts vary with
the genre and period of the text, or even with the whim of the author, editor
or publisher. For example, the major subdivisions of an epic or of the
Bible are called `books', those of a report are usually called `parts',
those of a novel `chapters' (unless it is an epistolary novel, in which
case they may be called `letters'), those of an anthology may be `parts'
or `poems', and so forth.
Where genre-specific tags are either not available or not required, a
neutral set of tag names to identify the main structural groupings of body
matter may be useful; these are discussed below. Relationships among
different structural groupings of the body are discussed in section ,
while the components of the body are, of course, the subject matter of
this whole chapter.
The subdivisions proposed here are all larger than the paragraph-level
units described in section , and are usually associated with some
referencing scheme or title information. Each subdivision groups
elements at the next level down, and, except for the top level, each
proposed tag is tied to a particular hierarchic level. The tag names
proposed are deliberately neutral: the largest possible such subdivision of
the body text should be tagged div0, and the smallest possible
div5. The digit following `div' (for division) identifies
the level of the element it tags within the overall hierarchy. It may be
convenient to think of `div0' as corresponding to `part', and `div1'
as corresponding to `chapter', but this need not be the case.
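Because the level is carried by the digit in the tag name, a processor can recover it directly and check that each division sits exactly one level below the division that contains it. The Python sketch below assumes an XML rendering of the tags; div_level and nesting_errors are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

DIV_TAG = re.compile(r"div([0-5])")   # div0 (largest) .. div5 (smallest)

def div_level(tag: str) -> Optional[int]:
    """Return the level encoded in a divN tag name, or None for other tags."""
    match = DIV_TAG.fullmatch(tag)
    return int(match.group(1)) if match else None

def nesting_errors(elem: ET.Element) -> List[str]:
    """Report divisions that are not exactly one level below their parent division."""
    errors = []
    parent_level = div_level(elem.tag)
    for child in elem:
        child_level = div_level(child.tag)
        if (parent_level is not None and child_level is not None
                and child_level != parent_level + 1):
            errors.append(f"{child.tag} inside {elem.tag}")
        errors.extend(nesting_errors(child))   # recurse into the subtree
    return errors
```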
Each div0, div1, div2, etc. tag may take
the following optional attributes:
name
Specifies the name used for this structural subdivision in
the original text. Examples include `part', `canto', etc.
ref
Specifies whether this structural subdivision is used in a
canonical reference scheme (see further section ). It
takes the values `Y' or `N'.
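The two optional attributes can be validated in the same spirit. In this Python sketch, which assumes an XML rendering, the restriction to exactly these two attributes is an assumption (the text lists them but does not say that no others are allowed), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

KNOWN_DIV_ATTRIBUTES = {"name", "ref"}   # assumption: no other attributes allowed

def div_attribute_errors(div: ET.Element) -> List[str]:
    """Check the optional name and ref attributes on a divN element."""
    errors = [f"unknown attribute: {attr}"
              for attr in div.attrib if attr not in KNOWN_DIV_ATTRIBUTES]
    ref = div.get("ref")
    if ref is not None and ref not in ("Y", "N"):
        errors.append(f"ref must be 'Y' or 'N', not {ref!r}")
    return errors
```

The name attribute is free text (`part', `canto', and so on), so only its presence is checked against the known set; ref is the only attribute with a closed list of values.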
In general, the contents of any division element, say div3, will
consist of the following, in the sequence given:
- An optional title or heading line, e.g. `Section 4: The Larch'. This
should be tagged using the heading tag.
- Possibly some introductory text, tagged using the `paragraph-level units'
referred to in section above.
- Optionally, a sequence of one or more tagged divisions at the next level
down, i.e. in this case, a sequence of div4 tags.
- An optional closing title or heading line, e.g. `End of the fourth
section'. This should be tagged using the trailer tag.
Note that paragraph-level units may appear in a division only before its
first subdivision: once a subdivision is encountered in a division, no
further paragraph-level units can be introduced. In other words, within
a div3 element (disregarding heading and trailer elements), only the
following sequences are legal:
- paragraphs ... div4 ... div4 ... div4
- paragraphs
- div4 ... div4 ... div4
- nothing at all
but the following sequences are illegal:
- div4 ... div4 ... paragraphs ... div4
- paragraphs ... div4 ... paragraphs
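The legal and illegal sequences listed above amount to a regular content model: an optional heading, then any number of paragraph-level units, then any number of next-level divisions, then an optional trailer. The Python sketch below encodes each child as a letter and matches the result against that pattern. It assumes an XML rendering, and the tag names used for paragraph-level units (p, list, quote, note, figure, table) are hypothetical, since this section names those units but not their tags.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tags for paragraph-level units (not specified in the text).
PARA_LEVEL = {"p", "list", "quote", "note", "figure", "table"}
# H = heading, P = paragraph-level unit, D = next-level division, T = trailer.
CONTENT_MODEL = re.compile(r"H?P*D*T?")

def div_content_ok(div: ET.Element) -> bool:
    """Check a divN element's children against heading? para* div(N+1)* trailer?."""
    level = int(div.tag[3:])              # assumes div.tag is one of div0..div5
    codes = []
    for child in div:
        if child.tag == "heading":
            codes.append("H")
        elif child.tag == "trailer":
            codes.append("T")
        elif child.tag == f"div{level + 1}":
            codes.append("D")
        elif child.tag in PARA_LEVEL:
            codes.append("P")
        else:
            return False                  # any other child is not allowed here
    return CONTENT_MODEL.fullmatch("".join(codes)) is not None
```

Encoding children as letters keeps the sketch short; in an SGML DTD the same constraint would naturally be written as the content model of each division element.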
Note also that the top level of the hierarchy may begin with either a
div0 element or a div1 element. This convention (corresponding
to the idea that a typeset document may begin with either a `level 0' or
a `level 1' heading) is provided for convenience and for compatibility
with widely used formatting systems.
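A processor therefore has to establish which of the two levels a given text starts from. This Python sketch reports 0 or 1 for a body element, on the added assumption (not stated above) that all top-level divisions of a body begin at the same level. It assumes an XML rendering, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def starting_level(body: ET.Element) -> Optional[int]:
    """Return 0 or 1 for the level at which body's division hierarchy starts,
    or None if its top-level divisions are not all div0 or all div1."""
    top_tags = {child.tag for child in body if child.tag.startswith("div")}
    if top_tags == {"div0"}:
        return 0
    if top_tags == {"div1"}:
        return 1
    return None
```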
An alternative method would be to use a single tag for all levels, called
body.part, with attributes `level' and `type'. This is not, however,
proposed, since whatever it would gain in generality of application
would be lost in difficulty of processing.
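To see what the rejected alternative would involve, the following Python sketch rewrites depth-specific div0 to div5 elements as body.part elements carrying a level attribute. It assumes an XML rendering, leaves the type attribute unset because the text does not say how it would be filled, and the function name is hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

DIV_TAG = re.compile(r"div([0-5])")

def to_body_part(root: ET.Element) -> ET.Element:
    """Return a copy of root with every divN renamed body.part, with level="N"."""
    result = copy.deepcopy(root)            # leave the input tree untouched
    for elem in result.iter():
        match = DIV_TAG.fullmatch(elem.tag)
        if match:
            elem.tag = "body.part"
            elem.set("level", match.group(1))
    return result
```

After the conversion every division has the same tag name, so a processor that needs a division's level must read the level attribute rather than the tag, which is one concrete form of the processing cost mentioned above.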
Back Matter
Conventions as to which elements are grouped as back matter and which as
front matter vary. For example, some books place information typically found
in a table of contents at the front, and others at the back. Title pages may
appear at the back of a book as well as at the front. The same sub-elements
as those mentioned in the discussion of front matter (see section )
may therefore be included in back matter. The following are the major
subdivisions or groupings of such sub-elements which are regarded as typical
of back matter:
appendix
An ancillary self-contained section of a work, often providing additional
but in some sense extra-canonical text.
glossary
A special form of index in which terms are associated with definition
texts.
notes
A section in which textual or other kinds of notes are gathered together.
bibliography
A section in which full bibliographic references are given for all the
bibliographic items cited in a text (see further section ).
index
Any form of index to the work.
colophon
A printer's device or other statement performing a function analogous to
that of the imprint on a title page.
back.part
Any other clearly identified section of the back matter.
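Because the list of back-matter groupings ends with a catch-all, back.part, a tagger can flag any child of the back matter that is neither one of the listed groupings nor one of the front-matter elements that may also appear there. The Python sketch below does this, assuming an XML rendering; the function name and the example tag errata are hypothetical, and excluding front.part from back matter is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

BACK_GROUPINGS = {"appendix", "glossary", "notes", "bibliography",
                  "index", "colophon", "back.part"}
# Front-matter elements that may also appear at the back; front.part is
# deliberately excluded (assumption), since back.part plays that role here.
FRONT_ELEMENTS = {"title.page", "foreword", "acknowledgements", "dedication",
                  "abstract", "contents", "frontispiece"}

def unclassified_back_matter(back: ET.Element) -> List[str]:
    """List child tags of back that are candidates for re-tagging as back.part."""
    allowed = BACK_GROUPINGS | FRONT_ELEMENTS
    return [child.tag for child in back if child.tag not in allowed]
```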