Literary texts
A satisfactory definition of `literature' is not easy to
provide, and we do not attempt one here. The texts discussed
here are distinguished from others by their
use of recognisable structural forms, their use of stylistic and
formal conventions for their own sakes, and by the consequent
richness and complexity of their internal structure. Some
specific formal conventions, such as those of verse and drama, are
sufficiently well established and understood that it is possible
to propose tagsets for them: this we do in the sections on verse
and drama below. Others, such as those
employed in the structures of formal narratives (novels, sermons,
guidebooks, recipe books, etc.), are less clearly defined and
we can only suggest some examples of ways in which they might be
tagged (see the section on narrative structures below). Because the form of a
literary work is, in an important sense, a part of its content,
and because literary works are generally regarded as objects in
their own right, the descriptive tagging of literary works poses
in acute form a variety of problems touched on elsewhere in these
guidelines, which for convenience we summarise here.
- context: a literary work is rarely considered in
isolation. It must be interpreted (and is generally encountered)
in the context of a mass of accreted interpretation and
annotation. Multiple versions of the same textual object must be
encoded and the whole embedded in a critical interpretive
framework. (See the relevant sections elsewhere in these guidelines.)
- presentation: the presentation or rendering of a literary
work (particularly, but not exclusively, an ancient one) cannot
be lightly disregarded. It may be impossible or critically
irresponsible to assume that a particular rendering feature has a
unique functional role. See the discussion of rendering features elsewhere in these guidelines.
- complexity: literary works are typically mixed in form, using
elements drawn from many heterogeneous structures simultaneously.
An encoding scheme in which (for example) verse lines can only
appear in collections of poetry will be unable to cope with
dramatic works which mix verse and prose even within a single
speech. Any literary analysis may wish to pay attention not just
to these multiple structures, but also to their interaction with
rendering features such as the layout of text on the page and
with elements of linguistic analysis. These topics are discussed in
more detail elsewhere in these guidelines.
Verse
The fundamental unit of verse considered here is the
line. This should normally be tagged independently of the
typographic line, as a metrical unit, wherever the line breaks
may come in the printed text, or however formatted. In many
poetic forms, however, the layout of text on the page and the
indentation of lines with respect to one another are important
parts of the sense of a poem and should therefore be tagged. An
attribute `indent' with numeric values could be used for this
purpose. The units are unimportant in this case, indicating simply
the relative size of indents used in the text.
This line is indented to the right
This line is against the left margin
This line is indented to the left
This line is further indented than the first one
This line is probably disappearing into the binding
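As a minimal sketch, the five sample lines above might be tagged as follows. The numeric values are assumptions made for illustration, as is the convention that a negative value indicates a line set to the left of the margin; only the relative sizes of the indents are meant to be significant.

```sgml
<!-- Hypothetical indent values: only their relative sizes matter -->
<line indent="2">This line is indented to the right</line>
<line indent="0">This line is against the left margin</line>
<line indent="-2">This line is indented to the left</line>
<line indent="4">This line is further indented than the first one</line>
<line indent="12">This line is probably disappearing into the binding</line>
```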
Note that indentation for purely typographic purposes need not be preserved.
For example, where a verse line has been broken across two typographic lines
as a result of the exigencies of available space, it is customary to indent
the second part of the line to indicate as much. We do not advocate the
encoding of such rendering-specific information, unless of course it is the
relationship between metrics and typography which forms the subject of study.
In such a case, the tag line.break, as further discussed elsewhere in
these guidelines, could be used to specify break points within
metrical lines and their appearance.
Metre
With a few notable exceptions (Gerard Manley Hopkins being a
particularly interesting one in English literature), poets do not
mark the metre of their verse explicitly nor do agreed
typographic conventions exist for the purpose; the tagging of
metre is thus an exercise in analysis and interpretation, for
which only very general principles can be adumbrated here.
For most types of syllabic verse, the metrical status of a line
is adequately described by an attribute `metre', with values
taken from a list such as `hexameter', `iambic pentameter',
`trimeter' to summarise both the number of metrical feet in the
line and their pattern. (Values such as `heroic couplet' or
`alexandrine' which indicate both rhyme and metre are confusing
and should be avoided.) For more complex or less well-defined
metrical patterns, a metre sub-element of the
line element would be appropriate, the content of
which would encode the metrical pattern in some application-
specific protocol. Note that in
both cases, the information encoded represents the metrical
pattern independent of its realisation in the particular line
(and could therefore be replaced by a simple entity reference to
minimise data entry chores). To compare the metrical pattern with
the actual content of the line, it is necessary to introduce
sub-elements of the basic line tag, such as foot, with
an attribute `type' (`iamb', `anapaest', etc.). An empty
caesura tag may be used to mark the caesura or other
content-free elements of metrical analysis. The mapping between
these sub-elements and the actual content of the line would then
be represented by use of the alignment mechanism discussed
elsewhere in these guidelines.
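The two levels of metrical tagging described above might look something like the following sketch. It is a simplification: each foot here simply encloses its syllables, rather than being linked to the text through the alignment mechanism, and the position of the caesura is an interpretive choice made only for the sake of the example.

```sgml
<!-- Level 1: the metre of the line summarised by an attribute value -->
<line metre="iambic pentameter">The curfew tolls the knell of parting day</line>

<!-- Level 2: the same line analysed into feet (simplified; caesura placement is an assumption) -->
<line metre="iambic pentameter">
<foot type="iamb">The cur</foot>
<foot type="iamb">few tolls</foot><caesura>
<foot type="iamb">the knell</foot>
<foot type="iamb">of par</foot>
<foot type="iamb">ting day</foot>
</line>
```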
Rhyme
Similar considerations apply to rhyme scheme. In regular cases,
this should be encoded as an attribute `rhyme' of the appropriate
higher level unit (stanza for stanzaic verse, poem
for others). Its values can be taken from a list of agreed names,
where this exists, or more usually given as a string such as
`ABABCDCDEE'. The advantage of using an attribute for this
purpose rather than a sub-element is that a default value may be
specified for the whole work. Where the rhyme scheme defines more
than one unit (for example in heroic verse containing a mixture of
couplets and triplets), there will of course be more than one such
default.
First line of a couplet
Second line of a couplet
First line, second couplet
...
First line of a triplet
Second line of a triplet
Third line of a triplet
First line of yet another couplet
...
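One way in which the sample lines above might be tagged is sketched below. The element names `couplet' and `triplet' are hypothetical, introduced only to show how each kind of unit can carry its own default value for the `rhyme' attribute, so that the attribute need not be repeated in the text itself.

```sgml
<!-- Hypothetical declarations: one default rhyme scheme per kind of unit -->
<!ATTLIST couplet rhyme CDATA "AA">
<!ATTLIST triplet rhyme CDATA "AAA">

<!-- In the text, the defaults apply without being restated -->
<couplet>
<line>First line of a couplet</line>
<line>Second line of a couplet</line>
</couplet>
<triplet>
<line>First line of a triplet</line>
<line>Second line of a triplet</line>
<line>Third line of a triplet</line>
</triplet>
```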
For very detailed studies of rhyming it will be necessary to
specify a sub-element rhyme in
order to bound the words or morphemes which constitute the rhyme.
In this case also, an ID may be supplied to identify the rhyme
and thus link rhyming words or phrases.
As with alliterative verse, where a very detailed
analysis is needed, perhaps down to the phoneme level, a parallel
phonetic transcription linked by the kind of mapping scheme
described elsewhere in these guidelines
may be the most effective solution. It should also be noted that
rhyme and alliteration can be marked either as chains (lists in
which each rhyming word points to the next) or as trees
(structured lists in which the first occurrence of a rhyme or
alliterating syllable in some sense dominates all the subsequent
ones). Further work is needed in this area.
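A chain of rhymes might be sketched as follows. The rhyme sub-element and its ID are as described above; the `next' attribute is purely hypothetical, invented here to show one rhyming word pointing to the next, and is not a proposal for how such links should actually be made.

```sgml
<!-- Hypothetical: `next' is an invented attribute illustrating a chain -->
<line>Had we but world enough, and <rhyme id="r1" next="r2">time</rhyme>,</line>
<line>This coyness, lady, were no <rhyme id="r2">crime</rhyme>.</line>
```

In a tree, by contrast, the second and any later rhyme words would each point back to the first occurrence rather than on to the next.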
Higher level structures
Verse lines are typically grouped into higher level structural
units, which are often labelled in particular, genre-specific,
ways, and generally coincide with metrical boundaries.
So many different literary forms exist that an exhaustive
list of them is barely possible. In most, however, we may discern
a simple hierarchic grouping (cantos composed of stanzas; books
composed of verse paragraphs; sonnet sequences etc.). As with
prose texts, each such hierarchic grouping may have its own title
or number, thus permitting the creation of canonical reference
schemes as described elsewhere in these guidelines. For most
cases, the hierarchy proposed here should be adequate: for
particular verse forms, it may be redefined using the techniques
outlined elsewhere in these guidelines.
poem An individual titled or numbered work
book Major subdivision of a long poem such as an epic;
synonymous with `canto'
stanza A group of lines which functions as a
metrical unit
para A group of lines, typically in blank
verse, usually marked by indentation or spacing
part Any other identified grouping of lines or stanzas
The following additional elements may appear at almost any point
in the structure:
refrain A chorus, refrain or burden. Typically given in
full on its first appearance and abbreviated on
second and subsequent ones.
epigraph A quotation or other comment typically
found at the start of a major subdivision of a
literary work. May contain as a subelement a
cit tag to identify the work and author
cited
title A title or number, attached to a poem, book,
stanza, paragraph, or other structural unit.
note An authorial (i.e. not introduced by a modern
editor or transcriber of the text) note, gloss or
summary. Takes an attribute `place' with values
`foot', `left', `right'.
signature A special note at the foot of a poem,
typically giving the date and place of its
composition
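A short poem using several of these elements might be tagged along the following lines. The textual content is invented placeholder material, and the particular nesting shown (for instance, the refrain placed inside each stanza) is an assumption rather than a requirement.

```sgml
<poem>
<title>An Example Poem</title>
<epigraph>A quotation placed at the head of the poem</epigraph>
<stanza>
<line>First line of the first stanza</line>
<line>Second line of the first stanza</line>
<refrain>The refrain, given in full on its first appearance</refrain>
</stanza>
<stanza>
<line>First line of the second stanza</line>
<line>Second line of the second stanza</line>
<note place="foot">An authorial gloss on this stanza</note>
<refrain>The refrain, etc.</refrain>
</stanza>
<signature>Place and date of composition</signature>
</poem>
```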
Drama
The main body of a dramatic work is composed of speeches and
stage directions, which may be embedded within or between
speeches. Speeches may be in prose or verse or a mixture; each
speech is usually assigned to one speaker, but can sometimes be
shared by several. Speeches are normally consecutive but may
overlap. The low-level elements of s-unit (for prose) and line
(for verse) are frequently broken across speeches. In older or
more formal drama, individual parts of the drama are given
specific names (for example `strophe' and `antistrophe' in
classical Greek drama). Additional elements such as prologues,
epilogues and embedded songs or masques are commonplace.
Sequences of speeches, stage directions, etc. are usually (but
not always) grouped into higher level structures such as acts and
scenes. And of course plays are often found embedded within plays
(Hamlet, A Midsummer Night's Dream...). Drama is thus one of the
most complex of literary forms, and it is probably impossible to
define a single hierarchic structure for a drama of anything but
banal simplicity.
At least speeches, speaker prefixes, stage directions and interludes should be
distinguished, as further discussed below.
Speeches
Each speech, as delimited in the text being encoded, should be
tagged with a Speech tag. An optional attribute `spkr' may
be useful to supply a normalised or coded form of the speaker for
analytic purposes. Where part or all of a
speech is a soliloquy or an aside, the sub-elements Soliloquy and
aside may be appropriate in some kinds of analysis.
Where part or all of a speech is in verse, each line should be
tagged with the line tag, as described above for poetry
in general. Where a verse line has been broken between two
speeches, an empty lj (linejoin) element may be used
to reconstitute the verse structure.
A speaker prefix is a specialised form of stage direction used in
printed versions of dramatic texts to indicate the speaker or speakers
of a particular speech. Rather than regard this as a subelement of the
speech element, we propose to treat the speaker prefix as an element of
the text in its own right, using the tag Speaker to identify
it. Note the distinction between this tag and the `spkr' attribute of
the speech tag.
Cor.
Nothing, my lord.
Lear
How now, Cordelia...
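The sample exchange above might be tagged as in the following sketch, in which the speaker prefix stands beside the speech rather than inside it. The normalised values given for the `spkr' attribute are assumptions chosen for illustration.

```sgml
<!-- Speaker holds the prefix as printed; spkr holds a normalised form -->
<Speaker>Cor.</Speaker>
<Speech spkr="Cordelia">Nothing, my lord.</Speech>
<Speaker>Lear</Speaker>
<Speech spkr="Lear">How now, Cordelia...</Speech>
```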
Stage directions
Each stage direction in the source should be tagged with the
tag STAGE. An attribute `type' may be used to
categorise the stage direction in a variety of ways, for example,
as one of `descriptive' (for initial scene setting etc.), `movement'
(for entrances and exits) or `business' (for actions embedded
in or between speeches). Directions which combine even these three
simple aspects are not uncommon, however, and they are proposed
here only as examples. While stage directions of type `movement'
(for example) are of great
importance in determining which characters are on stage at any
time, an accurate dramatic analysis of stage business will
usually require additional interpretative tagging, not described
here.
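The three example values of the `type' attribute might be used as in the following sketch. The dramatic text is invented for the purpose, and the placing of a direction inside a speech is shown only as one possibility.

```sgml
<STAGE type="descriptive">A room in the castle.</STAGE>
<STAGE type="movement">Enter two servants.</STAGE>
<Speaker>First Servant</Speaker>
<Speech>Who goes there? <STAGE type="business">Draws his sword</STAGE> Speak!</Speech>
<STAGE type="movement">Exeunt.</STAGE>
```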
High level elements
Many plays are traditionally divided into acts, composed of
scenes, which also provide canonical reference units, as
described elsewhere in these guidelines. The overall structure of such
plays may thus be described fully by a simple DTD like the
following
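A DTD of this kind might be sketched as follows. This is a hedged reconstruction, not the declaration originally given at this point: the element name `play' and all of the content models are assumptions, built only from the elements and attributes named in this section.

```sgml
<!-- Hypothetical sketch: `play' and the content models are assumptions -->
<!ELEMENT play    - - (act+)>
<!ELEMENT act     - - (title?, scene+)>
<!ELEMENT scene   - - (title?, (Speaker | Speech | STAGE)+)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT Speaker - - (#PCDATA)>
<!ELEMENT Speech  - - (#PCDATA | line | STAGE)*>
<!ELEMENT line    - - (#PCDATA)>
<!ELEMENT STAGE   - - (#PCDATA)>
<!ATTLIST Speech  spkr CDATA #IMPLIED>
<!ATTLIST STAGE   type (descriptive | movement | business) #IMPLIED>
```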
In other kinds of drama, however, formal division into act and
scene has little structural importance. Scenes may be defined, as
in classical French drama, simply by the entrance or exit of one
or more characters, or the play structured simply as a more or
less arbitrary sequence of collections of speeches and actions. Scenes
may also be characterised by their narrative level (see the discussion of narrative levels below), as in
an induction or a play within a play.
Other elements
Except where indicated, the following higher level structural
elements may appear at any point within a dramatic piece.
interlude An interlude is any embedded element, typically
a song or masque, which does not form part of the main dramatic
structure. It may contain speeches and stage directions of its
own.
prologue A prologue is a single speech at the start
of a play which does not form part of the main action. It may
have a title and a speaker.
epilogue An epilogue is a single speech at the end of a
play which does not form part of the main action. It may have a
title and a speaker.
cast.list A cast list or dramatis personae may appear
either at the start or the end of the play. It may have a title
and a stage direction indicating the place of action, and is
composed of a sequence of cast elements, each
containing one or more of a character's name (role),
a description of a character, e.g. `A gentleman of means'
(role.desc), and an actor's name (actor).
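A cast list might be tagged along the following lines. The names of the characters and the actor are invented, and the sketch assumes that a cast element need not contain all three of its possible parts.

```sgml
<cast.list>
<title>Dramatis Personae</title>
<STAGE type="descriptive">The scene: a country house</STAGE>
<cast>
<role>Sir John</role>
<role.desc>A gentleman of means</role.desc>
<actor>Mr. Smith</actor>
</cast>
<cast>
<role>Lucy</role>
<role.desc>His daughter</role.desc>
</cast>
</cast.list>
```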
Narrative structures
Most of the base elements of many literary prose texts differ
little, if at all, from those described in the section on core
structural tags. Novels and other prose texts
are composed of paragraphs which may be split into s-units or
grouped into chapters, sections etc. in the same way as other
types of text.
One particular feature of narrative texts which may need care is
the identification of different levels of narration.
The simple case of direct speech has already been addressed
elsewhere in these guidelines. The dspeech tag proposed
there should be used in simple tagging of simple narrative.
Authorial interruption of the type `said she' should be
marked off by the inspeech tag. More
discussion is needed here.
This simple approach to the tagging of dialogue needs extension
in the following cases:
- where different categories or registers of dialogue are
to be marked (for example, in Burrows's study of Jane Austen).
In this case, an attribute `label' with values such as
`authorial', `implied', `internal-monologue' etc. may be used.
- where dialogue is deeply nested, as in the `tale within
the tale' convention of many literary works (see below).
- where speakers of dialogue are indicated, as in a play, by
formal prefixes; here it is better to treat the passage as drama (see
above) using Speech and Speaker tags.
- other changes in narrative level might be handled as
attributes (where their extent coincides with some other
narrative structure, e.g. a chapter, s-unit or speech) or as
empty elements (`milestones') (where this is not the case).
Examples to be supplied
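Pending fuller examples, the simple case and one of its extensions might be sketched as follows. The sentences are invented, and attaching the `label' attribute to the dspeech tag is an assumption, since the element which should carry it has not been settled here.

```sgml
<!-- Simple direct speech with an authorial interruption -->
<dspeech>It is too late, <inspeech>said she,</inspeech> to turn back now.</dspeech>

<!-- Hypothetical use of `label' to mark a different register of dialogue -->
<dspeech label="internal-monologue">And yet I might have stayed.</dspeech>
```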
Similar considerations apply to content or thematic analyses,
which are not considered in any detail at present, though it is
clear that existing mechanisms for identifying loosely-defined
discontinuous or overlapping segments will be needed, as well
as hypertextual links and concurrent structures.
Nested dialogues and nested tales
A common convention in 18th- and 19th-century novels is for one speaker
to launch into a new narrative, as for example in Charles Maturin's
notorious Melmoth the Wanderer (1820), in which a
succession of mysterious strangers launch into narratives in the
midst of which further mysterious strangers launch into further
narratives ... Aside from the nesting, and the consequent possibility
of interruptions, such Chinese box narrative
structures are no different from the sequentially organised
multi-narrative work, such as collections of tales told within a
common framework like the Decameron or more complex framed
tales such as Brontë's Wuthering Heights. If
it is required to handle all of these in a uniform way, we
suggest that a narrative tag with a `level' attribute is the easiest
way to do it.
At any one point in a text, only a single
narrative is active. Narrative boundaries often, but not invariably,
coincide with boundaries of other formal elements such as
chapters; for this reason they are best tagged using milestone (empty)
tags. Note that this simple approach may not be appropriate for very
highly fragmented or experimental works, in which placing the milestones
may be problematic. It is intended for use with narrative switches
which are clearly and unambiguously marked as part of the authorial intent.
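The milestone approach might be sketched as follows. Each empty narrative tag marks the point at which a new level becomes active, so that a nested tale can run across a chapter boundary without disturbing the chapter structure. The tag names `chapter' and `p', the numeric values of `level', and the text itself are all assumptions made for the example.

```sgml
<chapter>
<narrative level="1">
<p>The traveller settled by the fire and began his story.</p>
<narrative level="2">
<p>Many years ago, he said, I met a stranger on the road ...</p>
</chapter>
<chapter>
<p>... and that stranger told me a tale of his own.</p>
<narrative level="3">
<p>I was born, the stranger said, in a distant country.</p>
<narrative level="1">
<p>Here the traveller paused to mend the fire.</p>
</chapter>
```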
The following supposedly literary items should really be included in
section 6.2 (general structural elements), I believe.
epigraph A quotation or other phrase, anonymous or
attributed, sometimes used as part of a section or chapter
heading or on a title page in a literary work. Subelements:
author, work, reference, language.
imprint Imprint as given on a title page. Includes
subelements Name, date.
epistle The epistle is a formal device very frequently
used in early printed books which performs a function roughly
analogous to the modern blurb, foreword and preface or
dedication. Epistles are often signed, by the author, publisher
or printer or by some well-wisher of same. The signature should
be tagged separately. Epistles may be
addressed to the reader directly, to the author (in praise of
the work) or to one or more patrons, usually employing a formula
or descriptive phrase of some kind. As well as tagging this, it
may be useful to include an attribute in the EPISTLE tag
indicating which kind of addressee the epistle has.
frontispiece A pictorial frontispiece, possibly including
some text.