Typographic Rendition and other Appearance Features
In general, descriptive markup is used to record the structural and
other features signalled by the physical rendition (or the
presentation) of a text, not the presentation itself. However,
in special circumstances it may be necessary to tag the presentation
of a text. Historical documents often contain formatting whose
purpose is not well understood, and it may therefore not be possible
to determine what structural features are represented by particular
presentational features. In such cases, formatting information
must be preserved so that it may be interpreted later. Early
printed books and manuscripts also contain formatting whose
significance is not entirely understood, and which must be preserved
for that reason.
The other case where formatting information must be tagged is that of
texts where the formatting details form a significant part of
the content of the text, such as poetry, on which see further section
. In general, formatting information can be categorised
as follows:
- associated with a specific textual element
- concerned exclusively with the layout of features on a page, thus
forming a separate concurrent hierarchy from that of the text
- concerned with specific quasi-textual or non-textual objects such
as graphics
The rest of this section addresses each of the above general topics in
turn. It should be emphasized here, as elsewhere, that much more work
is needed in this area and that the recommendations provided are intended
to deal only with simple cases, and to be generally indicative of
possibilities.
Rendition associated with specific features
Most changes of rendition signal some underlying textual feature of the
type described elsewhere in the present chapter. If details of
presentation are to be recorded for such features, the recommended
method is to use the `rendition' attribute on the tag which marks
the underlying feature. If the underlying feature is unidentified
or uncertain, then the general purpose highlight should be
used instead (see section ). Some specific types
of rendition features are closely associated with particular types
of element, and are therefore described together with that element,
notably quotations (see section ).
Suitable values for the `rendition' attribute will depend on the purpose
for which it is being tagged. It will not normally be used to provide
much more than a descriptive name for the typographic style or family
used: suitable values might be chosen from: roman, italic, bold, smallcaps,
underscored, smallertype, largertype, swash italic, black letter, fraktur,
ragged right, ragged left, centred, Bodoni 10 on 12, etc. Much more
work is needed in this area before any convincing typology of typographic
rendition can be proposed, and one is not attempted here. At this stage
only simple distinctions can be recommended.
As an example, consider the use of italic font in the following passage
from Samuel Richardson's Clarissa (1747).
A pretty common case, I believe; in all
vehement debatings. She says I am
too witty; Anglicè,
too pert; I, that she is
too wise; that is to say, being likewise
put into English, not so young as she has
been: in short, she is grown so much into a
mother, that she had forgotten
she ever was a daughter. ...
Clearly, the word `vehement' is not italicised for the same reason as
the phrase `not so young as she has been'; the former is emphasized,
while the latter is proverbial, but it also provides an ironic gloss
for the words `too wise', in the same way as `too pert' glosses `too
witty'. The glossed phrases are not however technical terms or
cited words, but quoted phrases, as if Clarissa were putting words
into her own and her mother's mouths. Finally the words `mother'
and `daughter' are apparently italicised simply to oppose them in
the sentence; certainly they do not fit into any of the categories
so far proposed as reasons for italicising. They are thus best
tagged using the highlight with a `rendition' attribute,
as follows:
A pretty common case, I believe; in all
vehement debatings. She says I am
too witty;
Anglicè,
too pert; I, that she is
too wise
; that is to say, being likewise
put into English, not so young as she has
been: in short, she is grown so much into a mother, that she had forgotten
she ever was a daughter
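As a minimal sketch of this recommendation, the closing clause of the passage might be encoded as follows. Only the highlight tag and its `rendition' attribute are taken from the discussion above; `italic' is one of the values suggested earlier, and the other italicised phrases are left untagged in this fragment.

```sgml
been: in short, she is grown so much into a
<highlight rendition="italic">mother</highlight>, that she had forgotten
she ever was a <highlight rendition="italic">daughter</highlight>. ...
```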
Special Layout Tags
A second major group of presentational features consists of those concerned
with page layout. Page, column and line breaks, headings etc. may carry
information intrinsically, in the way that they are rendered, in addition
to the information they convey about the structuring of the text. If page
breaks (etc.) are of importance only as a means of subdividing the text for
reference purposes, then the proposals of section should
be sufficient. If textual features, such as running titles, column headings,
page numbers etc., are to be treated as a part of the content of a text,
it will almost certainly be necessary to define a separate concurrent
hierarchy for the elements concerned, since page divisions (etc.) rarely
fit into the same structural hierarchy as that of the text they contain.
Even then, a full description of a particularly complex layout structure
may be simply impossible. It is clear that considerable work is needed
in this area. All that is proposed at present is a set of simple building
bricks which may be used to record the rendition of particularly significant
elements which occur in most texts. The method adopted is similar to the
milestone approach proposed in section .
Three empty elements are proposed: page.break to mark the start of
a new page, col.break to mark the start of a new column and
line.break to mark the start of a new typographic line. In
addition, a fourth empty element vertical.space is provided
to mark areas of white space within the page. All four are described below.
-
The page.break tag appears at the start of every new page. In
addition to the usual `id' and `n' attributes (of which the latter should
be used to supply the actual number of the page), it takes the following
special purpose optional attributes:
ed A tag for the edition in which the page break appears at this
point. Details of the edition are given in the source.description
element in the file header.
sig The signature of the page, both as given on the page and as
calculated by inspection of the work's foliation. The latter should be
given in square brackets, and preceded by an equals sign if it differs from
that printed on the page.
catchword The catchword printed at the foot of the page
Example:
This tag would appear at the head of the page numbered 43 in the edition
of 1678, which has the signature "D4" though it is in fact the 6th leaf of
the third forme, and which has the catchword "Thus" printed on it.
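A tag matching this description might be sketched as follows. The attribute names are those listed above. The edition tag `1678' and the form `D4 [=C6]' for the signature are illustrative assumptions rather than prescribed values: the printed signature is given first, then the calculated one in square brackets with an equals sign, on the assumption that the third gathering is signed C.

```sgml
<page.break n="43" ed="1678" sig="D4 [=C6]" catchword="Thus">
```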
-
The col.break tag appears at the start of every new column on
a page. It may be omitted if the page has no columns. In
addition to the usual `id' and `n' attributes (of which the latter should
be used to supply the actual number of the column), it takes the following
special purpose optional attributes:
ed A tag for the edition in which the column break appears at this
point. Details of the edition are given in the source.description
element in the file header.
catchword Any catchword printed at the foot of the column.
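As a hedged illustration, the start of the second column on a page might be marked as below. The values given for `n' and `ed' are hypothetical.

```sgml
<col.break n="2" ed="1678">
```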
-
The line.break tag appears at the start of every new line. Note
that this tag should not be used for lines of verse, which are structural
units. (See further section ).
It is intended only for cases where the lineation of a prose
text is considered of importance in its own right. In addition to the usual
`id' and `n' attributes (of which the latter should be used to supply the
line number), it takes the following special purpose optional attributes:
ed A tag for the edition in which the line break appears at this
point. Details of the edition are given in the source.description
element in the file header.
rendition Defines how the line is aligned with respect to the
rest of the text, and takes values `ljust' (left justified), `rjust'
(right justified), `centred' etc.
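A minimal sketch of a prose passage whose lineation is being recorded, assuming hypothetical line numbers and placeholder text. Only the element name, the attribute names and the values `centred' and `ljust' come from the description above.

```sgml
<line.break n="1" rendition="centred">first typographic line of the passage
<line.break n="2" rendition="ljust">second typographic line of the passage
```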
-
The vertical.space tag may be used to signal vertical white
space within a page or column which is regarded as significant for some
reason. It takes the following attributes:
ed A tag for the edition in which the vertical space appears at this
point. Details of the edition are given in the source.description
element in the file header.
size Size of the vertical space expressed in one of the allowable
units
units Units in which the vertical space is measured. May be one of
`inches', `mm', `points'.
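For instance, a gap of twelve points between two paragraphs might be recorded as follows. The measurement is a hypothetical value, and the `ed' attribute is omitted.

```sgml
<vertical.space size="12" units="points">
```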
Printers' Ornaments and other devices
Graphical devices that are incorporated into a text at the character
level, such as bullets or leafstops, are most easily represented as
entity references. More complex graphical elements such as rules or
printers' ornaments should however be represented as empty elements;
these two are described further below. Completely graphical elements
such as figures, illustrations or frontispieces are represented by
particular elements described elsewhere in this chapter (see sections
and ).
The rule tag should be used to mark the presence of a printer's
rule or similar horizontal ornament drawn across the width of the page. It
has no content and takes the following attributes:
rendition Text describing the type of rule. Suitable keywords
might be `single', `double', `ornamental'...
rule.size Specifies the height of the rule expressed in one of the
allowable units
rule.units Units in which the height of the rule is measured. May
be one of `inches', `mm', `points'.
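By way of illustration, a double rule two points high might be tagged as below. The attribute names and the keyword `double' are those given above; the size is a hypothetical value.

```sgml
<rule rendition="double" rule.size="2" rule.units="points">
```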
The ornament tag should be used to represent any printer's
ornament or decorative feature other than a single character (for
which an entity reference should be used) or a full illustration
or figure. Ornamental initial capitals may also be rendered using
this tag. It has no content and takes the following attributes:
type Text describing the type of ornament. Suitable keywords
might be `initial cap', `emblem', `cartouche' etc.
text Any text embedded within the ornament: for example, in the
case of an initial capital, the letter capitalised, or the text enclosed
by a cartouche. This may duplicate material already encoded as part of
some other element: the redundancy is provided to simplify processing.
std.num A standard number used to identify this ornament in
a standard published catalogue of printers' devices.
image An external entity reference identifying a file containing
a graphic image of the ornament itself.
Examples:
This ...
Here the first word of a paragraph (`This') has been printed with a
dropped initial capital (`dic').
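An encoding consistent with this description might be sketched as follows. It assumes that `dic' is supplied as the value of the `type' attribute and that the capitalised letter is repeated in the `text' attribute, so that the word itself remains whole in the running text.

```sgml
<ornament type="dic" text="T">This ...
```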