Literary texts

A satisfactory definition of `literature' is not easy to provide, and we do not attempt one here. The texts discussed here are distinguished from others by their use of recognisable structural forms, their use of stylistic and formal conventions for their own sakes, and by the consequent richness and complexity of their internal structure. Some specific formal conventions, such as those of verse and drama, are sufficiently well established and understood that it is possible to propose tagsets for them: this we do in sections and below. Others, such as those employed in the structures of formal narratives (novels, sermons, guidebooks, recipe books, etc.), are less clearly defined, and we can only suggest some examples of ways in which they might be tagged (see ). Because the form of a literary work is, in an important sense, a part of its content, and because literary works are generally regarded as objects in their own right, the descriptive tagging of literary works poses in acute form a variety of problems touched on elsewhere in these guidelines, which for convenience we summarise here.

Verse

The fundamental unit of verse considered here is the line. This should normally be tagged as a metrical unit, independently of the typographic line, wherever the line breaks may come in the printed text and however it is formatted. In many poetic forms, however, the layout of text on the page and the indentation of lines with respect to one another are important parts of the sense of a poem and should therefore be tagged. An attribute `indent' with numeric values could be used for this purpose. The units are unimportant in this case: the values indicate simply the relative size of the indents used in the text.

<![ CDATA[
<poem>
<line indent=1>This line is indented to the right
<line>This line is against the left margin
<line indent=-1>This line is indented to the left
<line indent=2>This line is further indented than the first one
<line indent=-2>This line is probably disappearing into the binding
]]>

Note that indentation for purely typographic purposes need not be preserved. For example, where a verse line has been broken across two typographic lines as a result of the exigencies of available space, it is customary to indent the second part of the line to indicate as much. We do not advocate the encoding of such rendering-specific information, unless of course it is the relationship between metrics and typography which forms the subject of study. In such a case, the tag line.break, as further discussed in section , could be used to specify break points within metrical lines and their appearance.

Metre

With a few notable exceptions (Gerard Manley Hopkins being a particularly interesting one in English literature), poets do not mark the metre of their verse explicitly nor do agreed typographic conventions exist for the purpose; the tagging of metre is thus an exercise in analysis and interpretation, for which only very general principles can be adumbrated here.

For most types of syllabic verse, the metrical status of a line is adequately described by an attribute `metre', with values taken from a list such as `hexameter', `iambic pentameter', `trimeter' to summarise both the number of metrical feet in the line and their pattern. (Values such as `heroic couplet' or `alexandrine' which indicate both rhyme and metre are confusing and should be avoided.) For more complex or less well-defined metrical patterns, a metre sub-element of the line element would be appropriate, the content of which would encode the metrical pattern in some application-specific protocol. Note that in both cases, the information encoded represents the metrical pattern independent of its realisation in the particular line (and could therefore be replaced by a simple entity reference to minimise data entry chores). To compare the metrical pattern with the actual content of the line, it is necessary to introduce sub-elements of the basic line tag, such as foot, with attributes `type' (`iamb', `anapaest' etc.). An empty caesura tag may be used to mark the caesura or other content-free elements of metrical analysis. The mapping between these sub-elements and the actual content of the line would then be represented by use of the alignment mechanism discussed in section .
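As a minimal sketch of the two approaches just described, the following fragment first records the metre as an attribute of the line, and then repeats the line with foot sub-elements and an empty caesura tag. The scansion shown is only one possible reading, and the direct enclosure of text within foot elements is an assumption made for brevity: a fuller encoding would keep the analysis separate and link it to the text by the alignment mechanism.

<![ CDATA[
<!-- metre summarised as an attribute of the line -->
<line metre="iambic pentameter">To err is human, to forgive divine

<!-- the same line analysed into feet; enclosing text in foot is an assumption -->
<line metre="iambic pentameter">
<foot type=iamb>To err </foot>
<foot type=iamb>is hu</foot>
<foot type=iamb>man, <caesura>to </foot>
<foot type=iamb>forgive </foot>
<foot type=iamb>divine</foot>
</line>
]]>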

Rhyme

Similar considerations apply to rhyme scheme. In regular cases, this should be encoded as an attribute `rhyme' of the appropriate higher level unit (stanza for stanzaic verse, poem for others). Its values can be taken from a list of agreed names, where this exists, or more usually given as a string such as ABABCDCDEE. The advantage of using an attribute for this purpose rather than a sub-element is that a default value may be specified for the whole work. Where the rhyme scheme defines more than one unit (for example in heroic verse containing a mixture of couplets and triplets), there will of course be more than one such default.

<![ CDATA [
<poem rhyme=AA>
<line>First line of a couplet
<line>Second line of a couplet
<line>First line, second couplet
...
<triplet rhyme=AAA>
<line>First line of a triplet
<line>Second line of a triplet
<line>Third line of a triplet
</triplet>
<line>First line of yet another couplet
...
]]>

For very detailed studies of rhyming it will be necessary to specify a sub-element rhyme in order to bound the words or morphemes which constitute the rhyme. In this case also, an ID may be supplied to identify the rhyme and thus link rhyming words or phrases. See further . As with alliterative verse, where a very detailed analysis is needed, perhaps down to the phoneme level, a parallel phonetic transcription linked by the kind of mapping scheme described in sections and may be the most effective solution. It should also be noted that rhyme and alliteration can be marked either as chains (lists in which each rhyming word points to the next) or as trees (structured lists in which the first occurrence of a rhyme or alliterating syllable in some sense dominates all the subsequent ones). Further work is needed in this area.
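By way of illustration only, a couplet whose rhyme words are bounded by rhyme sub-elements and linked as a chain might be tagged as follows. The attribute names `id' and `next' are assumptions made for this sketch, not part of any tagset proposed here.

<![ CDATA[
<poem rhyme=AA>
<!-- id and next are hypothetical attribute names; next points to the following rhyme word -->
<line>True wit is nature to advantage <rhyme id=r1 next=r2>dressed</rhyme>,
<line>What oft was thought, but ne'er so well <rhyme id=r2>expressed</rhyme>
</poem>
]]>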

Higher level structures

Verse lines are typically grouped into higher level structural units, which are often labelled in particular, genre-specific, ways, and generally coincide with metrical boundaries. So many different literary forms exist that an exhaustive list of them is barely possible. In most, however, we may discern a simple hierarchic grouping (cantos composed of stanzas; books composed of verse paragraphs; sonnet sequences etc.). As with prose texts, each such hierarchic grouping may have its own title or number, thus permitting the creation of canonical reference schemes as described in section . For most cases, the hierarchy proposed here should be adequate: for particular verse forms, it may be redefined using the techniques outlined in section .

poem: An individual titled or numbered work
book: Major subdivision of a long poem such as an epic; synonymous with `canto'
stanza: A group of lines which functions as a metrical unit
para: A group of lines, typically in blank verse, usually marked by indentation or spacing
part: Any other identified grouping of lines or stanzas

The following additional elements may appear at almost any point in the structure:

refrain: A chorus, refrain or burden. Typically given in full on its first appearance and abbreviated on second and subsequent ones.
epigraph: A quotation or other comment typically found at the start of a major subdivision of a literary work. May contain as a subelement a cit tag to identify the work and author cited.
title: A title or number, attached to a poem, book, stanza, paragraph, or other structural unit.
note: An authorial (i.e. not introduced by a modern editor or transcriber of the text) note, gloss or summary. Takes an attribute `place' with values `foot', `left', `right'.
signature: A special note at the foot of a poem, typically giving the date and place of its composition.
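A short stanzaic poem using the hierarchy and the additional elements listed above might, as a sketch, be tagged like this. The text is placeholder content, and the placing of title, refrain and signature directly within poem is an assumption.

<![ CDATA[
<poem>
<title>A placeholder title</title>
<stanza>
<line>First line of the first stanza
<line>Second line of the first stanza
<line>Third line of the first stanza
<line>Fourth line of the first stanza
</stanza>
<refrain>
<line>The refrain, given in full on its first appearance
</refrain>
<stanza>
...
</stanza>
<signature>Place and date of composition</signature>
</poem>
]]>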

Drama

The main body of a dramatic work is composed of speeches and stage directions, which may be embedded within or between speeches. Speeches may be in prose or verse or a mixture; each speech is usually assigned to one speaker, but can sometimes be shared by several. Speeches are normally consecutive but may overlap. The low-level elements of s-unit (for prose) and line (for verse) are frequently broken across speeches. In older or more formal drama, individual parts of the drama are given specific names (for example `strophe' and `antistrophe' in classical Greek drama). Additional elements such as prologues, epilogues and embedded songs or masques are commonplace. Sequences of speeches, stage directions etc. are usually (but not always) grouped into higher level structures such as acts and scenes. And of course plays are often found embedded within plays (Hamlet, A Midsummer Night's Dream...). Drama is thus one of the most complex of literary forms, and it is probably impossible to define a single hierarchic structure for a drama of anything but banal simplicity.

At least speeches, speaker prefixes, stage directions and interludes should be distinguished, as further discussed below.

Speeches

Each speech, as delimited in the text being encoded, should be tagged with a speech tag. An optional attribute `spkr' may be used to supply a normalised or coded form of the speaker's name for analytic purposes. Where part or all of a speech is a soliloquy or an aside, the sub-elements soliloquy and aside may be appropriate in some kinds of analysis. Where part or all of a speech is in verse, each line should be tagged with the line tag, as described above for poetry in general. Where a verse line has been broken between two speeches, an empty lj (linejoin) element may be used to reconstitute the verse structure.
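As a sketch of the last point, a verse line shared between two speakers might be encoded as follows. The text is placeholder content, and placing the lj element at the start of the continuing half-line is our assumption; it could equally be placed at the end of the first half.

<![ CDATA[
<speech spkr=FIRST>
<line>A complete verse line spoken by the first speaker
<line>The opening half of a shared line
</speech>
<speech spkr=SECOND>
<!-- lj marks this part-line as continuing the previous speaker's line (assumed placement) -->
<line><lj>and the half which completes it
<line>A further complete line from the second speaker
</speech>
]]>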

A speaker prefix is a specialised form of stage direction used in printed versions of dramatic texts to indicate the speaker or speakers of a particular speech. Rather than regard this as a subelement of the speech element, we propose to treat the speaker prefix as an element of the text in its own right, using the tag speaker to identify it. Note the distinction between this tag and the `spkr' attribute of the speech tag.

<![ CDATA [
<speaker>Cor.</speaker>
<speech spkr=CORDELIA>Nothing, my lord.</speech>
<speaker>Lear</speaker>
<speech spkr=LEAR>How now, Cordelia...
]]>

Stage directions

Each stage direction in the source should be tagged with the tag stage. An attribute `type' may be used to categorise the stage direction in a variety of ways, for example, as one of `descriptive' (for initial scene setting etc.), `movement' (for entrances and exits) or `business' (for actions embedded in or between speeches). Directions which combine even these three simple aspects are not uncommon, however, and the categories are proposed here only as examples. While stage directions of type `movement' (for example) are of great importance in determining which characters are on stage at any time, an accurate dramatic analysis of stage business will usually require additional interpretative tagging, not described here.
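For illustration, the three example values of `type' might be applied as follows. The directions and the speech are placeholder content, and the speaker code is invented.

<![ CDATA[
<stage type=descriptive>A room in the palace.</stage>
<stage type=movement>Enter a Messenger.</stage>
<!-- a direction of type business embedded within a speech -->
<speech spkr=MESSENGER>My lord, the letters are delivered.
<stage type=business>Kneels.</stage></speech>
<stage type=movement>Exit Messenger.</stage>
]]>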

High level elements

Many plays are traditionally divided into acts, composed of scenes, which also provide canonical reference units, as described in . The overall structure of such plays may thus be described fully by a simple DTD like the following:

<![ CDATA[
<!element play - - (front, body, back) >
<!element body - - (act+) +(interlude)>
<!element act - - (number, scene+)>
<!element scene - - (number, (stage+ & speech+))>
]]>

In other kinds of drama, however, formal division into act and scene has little structural importance. Scenes may be defined, as in classical French drama, simply by the entrance or exit of one or more characters, or the play may be structured simply as a more or less arbitrary sequence of collections of speeches and actions. Scenes may also be characterised by their narrative level (see section z724), as in an induction or a play within a play.

Other elements

Except where indicated, the following higher level structural elements may appear at any point within a dramatic piece.

interlude: An interlude is any embedded element, typically a song or masque, which does not form part of the main dramatic structure. It may contain speeches and stage directions of its own.
prologue: A prologue is a single speech at the start of a play which does not form part of the main action. It may have a title and a speaker.
epilogue: An epilogue is a single speech at the end of a play which does not form part of the main action. It may have a title and a speaker.
cast.list: A cast list or dramatis personae may appear either at the start or the end of the play. It may have a title and a stage direction indicating the place of action, and is composed of a sequence of cast elements, each containing one or more of a character's name (role), a description of the character, e.g. `A gentleman of means' (role.desc), and an actor's name (actor).
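A cast list built from these elements might look like the following sketch. The names are invented, and the order of the sub-elements within each cast element is an assumption.

<![ CDATA[
<cast.list>
<title>Dramatis Personae</title>
<cast><role>Sir John</role> <role.desc>A gentleman of means</role.desc> <actor>Mr. Smith</actor></cast>
<cast><role>Lucy</role> <role.desc>His daughter</role.desc> <actor>Mrs. Jones</actor></cast>
<!-- stage direction giving the place of action -->
<stage type=descriptive>The scene: London</stage>
</cast.list>
]]>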

Narrative structures

Most of the base elements of many literary prose texts differ little, if at all, from those described in the section on core structural tags (<hdref refid=z63>). Novels and other prose texts are composed of paragraphs which may be split into s-units or grouped into chapters, sections etc. in the same way as other types of text.

One particular feature of narrative texts which may need care is the identification of different levels of narration. The simple case of direct speech has already been addressed in section . The dspeech tag proposed there should be used in simple tagging of simple narrative. Authorial interruptions of the type `said she' should be marked off by the inspeech tag. More discussion needed here.
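In the simple case, a sentence of dialogue with an authorial interruption might be tagged as in the sketch below. The sentence is invented, and the nesting of inspeech within dspeech is our assumption.

<![ CDATA[
<!-- inspeech nested inside dspeech: an assumed arrangement -->
<dspeech>I cannot stay, <inspeech>said she,</inspeech> the hour is already late.</dspeech>
]]>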

This simple approach to the tagging of dialogue needs extension in the following cases:

Examples to be supplied

Similar considerations apply to content or thematic analyses, which are not considered in any detail at present, though it is clear that existing mechanisms for identifying loosely-defined discontinuous or overlapping segments will be needed, as well as hypertextual links and concurrent structures.

Nested dialogues and nested tales

A common convention in 18th and 19th century novels is for one speaker to launch into a new narrative, as for example in Charles Maturin's notorious Melmoth the Wanderer (1820), in which a succession of mysterious strangers launch into narratives in the midst of which further mysterious strangers launch into further narratives ... Aside from the nesting, and the consequent possibility of interruptions, such Chinese box narrative structures are no different from the sequentially organised multi-narrative work, such as collections of tales told within a common framework like the Decameron or more complex framed tales such as Bronte's Wuthering Heights. If it is required to handle all of these in a uniform way, we suggest that a narrative tag with a `level' attribute is the easiest way to do it.
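As a sketch of this suggestion, a framed tale might be marked as follows. Here narrative is assumed to be an empty tag placed at each point where the narration changes, with higher values of `level' indicating more deeply nested tales; both the numbering and the placeholder text are our assumptions.

<![ CDATA[
<!-- each empty narrative tag marks a switch of narrative level -->
<narrative level=1>The traveller reached the inn at dusk and asked for the tale ...
<narrative level=2>Many years ago, the stranger began, I met a monk who told me ...
<narrative level=3>In my youth, said the monk, ...
<narrative level=2>The stranger paused when the monk's tale was done ...
<narrative level=1>By the time he had finished, the fire had burned low.
]]>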

At any one point in a text, only a single narrative is active. Narrative boundaries often, but not invariably, coincide with boundaries of other formal elements such as chapters; for this reason they are best tagged using milestone (empty) tags. Note that this simple approach may not be appropriate for very highly fragmented or experimental works, in which placing the milestones may be problematic. It is intended for use with narrative switches which are clearly and unambiguously marked as part of the authorial intent.

The following supposedly literary items should really be included in section 6.2 (general structural elements), I believe:

epigraph: A quotation or other phrase, anonymous or attributed, sometimes used as part of a section or chapter heading or on a title page in a literary work. Subelements: author, work, reference, language.
imprint: Imprint as given on a title page. Includes subelements name, date.
epistle: The epistle is a formal device very frequently used in early printed books which performs a function roughly analogous to the modern blurb, foreword and preface or dedication. Epistles are often signed, by the author, publisher or printer or by some well-wisher of same. The signature should be tagged separately using the tag. They may be addressed to the reader directly, to the author (in praise of the work) or to one or more patrons, usually employing a formula or descriptive phrase of some kind. As well as tagging this, it may be useful to include an attribute in the epistle tag indicating which kind of addressee the epistle has.
frontispiece: A pictorial frontispiece, possibly including some text.