By a text we understand an extended stretch of natural discourse,
whether written or spoken. The structure and features of the web of
discourse vary with the type of text and with the point of view of the
analyst. Some features, however, are covered by generally accepted
typographical or manuscript conventions. We
distinguish structural and non-structural features present in a range of
written texts as follows.
These features are marked by spacing, punctuation marks or typographical
shifts, but there is no one-to-one correspondence between feature and
realization. Reactions to this fact differ. One may prefer to mark not
the typographic characteristics of a text's presentation (e.g. italics,
boldface) but the underlying textual feature signaled by the typography
(e.g. emphatic stress, book title, technical term). Or one may prefer
simply to mark the typographic features. The former approach, that of
The sections below deal with features common to many text types, primarily those for which there are established typographical or manuscript conventions. Apart from the types of features mentioned above, we also take into account the representation of figures and tables, critical apparatus, and parallel texts.
The tags given allow for descriptive markup; because it is not always possible, or even desirable, to mark the underlying features, simple presentational tags are also given. There are also some general suggestions as to the representation of typographical features and layout; a fuller treatment of physical description remains a topic for further investigation during the next development cycle of the project.
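The difference between the two kinds of markup can be illustrated with a minimal sketch. The element names used here (`title`, `emph`, `hi` with a `rend` attribute) and the rendering table are assumptions made for illustration, not tags defined in this section. The sketch shows that descriptive markup can be reduced mechanically to presentational markup, while the reverse step would require an interpretation of the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen for illustration only.
DESCRIPTIVE = (
    "<p>She read <title>Emma</title> and was <emph>not</emph> amused.</p>"
)
PRESENTATIONAL = (
    '<p>She read <hi rend="italic">Emma</hi> and was '
    '<hi rend="italic">not</hi> amused.</p>'
)

# One possible rendering convention (an assumption): both textual
# features are realized typographically as italics.
RENDITION = {"title": "italic", "emph": "italic"}


def to_presentational(xml_text):
    """Replace each descriptive element by a presentational <hi rend="...">."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag in RENDITION:
            el.set("rend", RENDITION[el.tag])
            el.tag = "hi"
    return ET.tostring(root, encoding="unicode")


converted = to_presentational(DESCRIPTIVE)
```

After conversion the two italic spans are indistinguishable, so the information that one was a book title and the other emphatic stress cannot be recovered from the presentational form alone.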
Any text to be used for scholarly purposes must include or make provisions for a reference system, which makes it possible to refer uniquely to specific points in the text. Suitable reference points are provided by the tags for structural features like those mentioned above, or they may be taken over from the printed original (in the case of conversion of printed texts to machine-readable form). We discuss different solutions and also suggest methods for establishing cross-references within texts and inter-textual links.
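As a rough sketch of how structural tags can supply reference points, the following derives references of the form "division.paragraph" from the nesting of a small sample. The element names `text`, `div` and `p`, the sample content, and the numbering scheme are all assumptions made for illustration; a real reference system might instead reproduce the page and line numbers of a printed original.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are illustrative assumptions.
SAMPLE = """<text>
  <div><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div><p>Third paragraph.</p></div>
</text>"""


def build_references(xml_text):
    """Map references such as '1.2' (division 1, paragraph 2) to paragraph text."""
    root = ET.fromstring(xml_text)
    refs = {}
    for d, div in enumerate(root.findall("div"), start=1):
        for p, para in enumerate(div.findall("p"), start=1):
            refs[f"{d}.{p}"] = para.text
    return refs


refs = build_references(SAMPLE)
```

Each key identifies exactly one paragraph, so a citation such as "2.1" resolves uniquely, which is the property a reference system has to guarantee.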
We take written conventions as a starting-point in much the same way as they have been used for orthographic transcriptions of speech. But distinctions made in writing should not be carried over slavishly. Just as writing has made possible new types of text not found in speech, so will the electronic media, and indeed they already have. Our aim is to provide mechanisms for text representation appropriate to the electronic age.