Typographic Rendition and other Appearance Features

In general, descriptive markup is used to record the structural and other features signalled by the physical rendition (or presentation) of a text, not the presentation itself. However, in special circumstances it may be necessary to tag the presentation of a text. Historical documents often contain formatting whose purpose is not well understood, and it may therefore not be possible to determine what structural features are represented by particular presentational features. In such cases, formatting information must be preserved so that it may be interpreted later. Early printed books and manuscripts also contain formatting whose significance is not entirely understood, and which must be preserved for that reason.

The other case in which formatting information must be tagged is that of texts in which the formatting details form a significant part of the content, such as poetry, on which see further section . In general, formatting information can be categorised as follows:

The rest of this section addresses each of the above general topics in turn. It should be emphasized here, as elsewhere, that much more work is needed in this area and that the recommendations provided are intended to deal only with simple cases, and to be generally indicative of possibilities.

Rendition associated with specific features

Most changes of rendition signal some underlying textual feature of the type described elsewhere in the present chapter. If details of presentation are to be recorded for such features, the recommended method is to use the `rendition' attribute on the tag which marks the underlying feature. If the underlying feature is unidentified or uncertain, then the general-purpose highlight tag should be used instead (see section ). Some specific types of rendition are closely associated with particular types of element, and are therefore described together with that element, notably quotations (see section ).

Suitable values for the `rendition' attribute will depend on the purpose for which the text is being tagged. The attribute will not normally be used to provide much more than a descriptive name for the typographic style or family used: suitable values might be chosen from roman, italic, bold, smallcaps, underscored, smallertype, largertype, swash italic, black letter, fraktur, ragged right, ragged left, centred, Bodoni 10 on 12, etc. Much more work is needed in this area before any convincing typology of typographic rendition can be proposed, and one is not attempted here. At this stage only simple distinctions can be recommended.

As an example, consider the use of italic font in the following passage from Samuel Richardson's Clarissa (1747):

A pretty common case, I believe; in all vehement debatings. She says I am too witty; Anglicè, too pert; I, that she is too wise; that is to say, being likewise put into English, not so young as she has been: in short, she is grown so much into a mother, that she had forgotten she ever was a daughter. ...

Clearly, the word `vehement' is not italicised for the same reason as the phrase `not so young as she has been': the former is emphasized, while the latter is proverbial and also provides an ironic gloss for the words `too wise', in the same way as `too pert' glosses `too witty'. The glossed phrases are not, however, technical terms or cited words, but quoted phrases, as if Clarissa were putting words into her own and her mother's mouths. Finally, the words `mother' and `daughter' are apparently italicised simply to oppose them in the sentence; certainly they do not fit into any of the categories so far proposed as reasons for italicising. They are thus best tagged using the highlight tag with a `rendition' attribute, as follows:

<![ CDATA [ A pretty common case, I believe; in all <emph rendition=italic>vehement</emph> debatings. She says I am <q rendition=italic>too witty;</q> <foreign lang=LA rendition=roman>Anglic&egrave;, </foreign> <gloss rendition=italic>too pert</gloss>; I, that she is <q rendition=italic>too wise</q>; that is to say, being likewise put into English, <gloss rendition=italic>not so young as she has been</gloss>: in short, she is grown so much into a <highlight rendition=italic>mother,</highlight> that she had forgotten she ever was a <highlight rendition=italic>daughter</highlight>. ]]>

Special Layout Tags

A second major group of presentational features consists of those concerned with page layout. Page, column and line breaks, headings etc. may carry information intrinsically, in the way that they are rendered, in addition to the information they convey about the structure of the text. If page breaks (etc.) are of importance only as a means of subdividing the text for reference purposes, then the proposals of section should be sufficient. If textual features such as running titles, column headings, page numbers etc. are to be treated as a part of the content of a text, it will almost certainly be necessary to define a separate concurrent hierarchy for the elements concerned, since page divisions (etc.) rarely fit into the same structural hierarchy as that of the text they contain. Even then, a full description of a particularly complex layout structure may be simply impossible. It is clear that considerable work is needed in this area. All that is proposed at present is a set of simple building bricks which may be used to record the rendition of particularly significant elements which occur in most texts. The method adopted is similar to the milestone approach proposed in section .

Three empty elements are proposed: page.break to mark the start of a new page, col.break to mark the start of a new column and line.break to mark the start of a new typographic line. In addition, a fourth empty element vertical.space is provided to mark areas of white space within the page. All four are described below.
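As a minimal sketch of how these empty elements might appear in running text, the fragment below marks the start of a new page and of two typographic lines. No attributes are shown, because none are specified in this section, and the sample words are placeholders invented purely for illustration.

<![ CDATA [ <p>... last words of the preceding page
<page.break>
<line.break>first line of the new page
<line.break>second line of the new page ... ]]>

Because the tags are empty, each simply marks a point in the text, and the paragraph containing them remains a single structural unit.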

Printers' Ornaments and other devices

Graphical devices that are incorporated into a text at the character level, such as bullets or leafstops, are most easily represented as entity references. More complex graphical elements such as rules or printers' ornaments should, however, be represented as empty elements; these two are described further below. Completely graphical elements such as figures, illustrations or frontispieces are represented by particular elements described elsewhere in this chapter (see sections and ).
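By way of illustration only, character-level devices might be encoded as in the sketch below. The name bull is taken from the ISO 8879 publishing entity set; the name leafstop and its declaration are hypothetical, introduced here as an assumption, since no standard entity name for that device is given in this section. The sample words are likewise invented.

<![ CDATA [ <!ENTITY leafstop SDATA "[leafstop]" -- hypothetical name, for illustration -->
...
<p>&bull; first point &leafstop; second point ]]>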

The rule tag should be used to mark the presence of a printer's rule or similar horizontal ornament drawn across the width of the page. It has no content and takes the following attributes:

rendition
    Text describing the type of rule. Suitable keywords might be `single', `double', `ornamental'...

rule.size
    Specifies the height of the rule, expressed in one of the allowable units.

rule.units
    Units in which the height of the rule is expressed. May be one of `inches', `mm', `points'.
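A double rule two points high might, for instance, be tagged as in the following sketch, which uses only the attributes of the rule tag described here; the particular values are chosen for illustration, and the sample words are placeholders.

<![ CDATA [ <p>... end of the preceding section.
<rule rendition=double rule.size=2 rule.units=points>
<p>The following section begins ... ]]>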

The ornament tag should be used to represent any printer's ornament or decorative feature other than a single character (for which an entity reference should be used) or a full illustration or figure. Ornamental initial capitals may also be rendered using this tag. It has no content and takes the following attributes:

type
    Text describing the type of ornament. Suitable keywords might be `initial cap', `emblem', `cartouche' etc.

text
    Any text embedded within the ornament: for example, in the case of an initial capital, the letter capitalised, or the text enclosed by a cartouche. This may duplicate material already encoded as part of some other element; the redundancy is provided to simplify processing.

std.num
    A standard number used to identify this ornament in a standard published catalogue of printers' devices.

image
    An external entity reference identifying a file containing a graphic image of the ornament itself.

Examples:

<![ CDATA [ <p><ornament type=dic text=T>This ... ]]>

Here the first word of a paragraph (`This') has been printed with a dropped initial capital (`dic').
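A further sketch, under the same conventions, shows a cartouche enclosing a word. The enclosed word is invented for illustration; the std.num and image attributes are omitted because suitable values would depend on a particular catalogue and on entity declarations that are not given here.

<![ CDATA [ <ornament type=cartouche text="FINIS"> ]]>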