Features Common to Many Text Types

Principles and Definitions
By a text we understand an extended stretch of natural discourse, whether written or spoken. The structure and features of the web of discourse vary with the type of text and with the point of view of the analyst. There are, however, some features for which there are generally accepted typographical or manuscript conventions. We distinguish structural and non-structural features present in a range of written texts as follows. Structural features include front matter (title page, preface, table of contents, etc.), body (chapter, section, sub-section, etc.), and back matter (appendix, bibliography, index, etc.). Non-structural features apply to individual words or sequences of words in running text and include such things as emphasis, quotation, and foreign words.
These features are marked by spacing, punctuation marks or typographical shifts, but there is no one-to-one correspondence between feature and realization. Reactions to this fact differ. One may prefer to mark not the typographic characteristics of a text's presentation (e.g. italics, boldface) but the underlying textual feature signaled by the typography (e.g. emphatic stress, book title, technical term). Or one may prefer simply to mark the typographic features. The former approach, that of descriptive markup, allows for more sophisticated analysis and processing of the text, at the cost of requiring more time and effort and at the risk of introducing subjective or erroneous decisions. The latter approach, that of presentational markup, has the advantage of making it simpler to tag texts acquired from typesetting tapes or optical scanners. Either approach may be used in tagging texts for TEI-conformant interchange.
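The contrast between the two approaches can be made concrete with a small sketch. It is an illustration only: the tag names (hi, emph, title, term) and the rend attribute are assumptions chosen for the example, not tags defined in this section. The sketch shows that one typographic realization (italics) may stand for several underlying features, so that moving from descriptive to presentational markup is mechanical, while the reverse requires a decision by the encoder.

```python
# Hypothetical tag names, used only for illustration; the source text
# does not define them at this point.
ITALIC_FEATURES = {
    "emph": "emphatic stress",
    "title": "book title",
    "term": "technical term",
}


def presentational(text):
    """Record only the typography: the passage was printed in italics."""
    return f'<hi rend="italic">{text}</hi>'


def descriptive(text, feature):
    """Record the underlying feature that the italics signal."""
    if feature not in ITALIC_FEATURES:
        raise ValueError(f"unknown feature tag: {feature}")
    return f"<{feature}>{text}</{feature}>"


def candidate_features():
    """Features an encoder must choose among when faced with italics."""
    return sorted(ITALIC_FEATURES)


if __name__ == "__main__":
    print(presentational("Moby Dick"))
    print(descriptive("Moby Dick", "title"))
    print(candidate_features())
```

The presentational form can be produced without interpretation, which is why it suits texts acquired from typesetting tapes or scanners. The descriptive form requires the encoder to pick one of the candidate features, and that choice is where subjective or erroneous decisions may enter.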
The sections below deal with features common to many text types, primarily those for which there are established typographical or manuscript conventions. Apart from the types of features mentioned above, we also take into account the representation of figures and tables, critical apparatus, and parallel texts.
The tags given allow for descriptive markup; for cases where it is not possible, or even desirable, always to mark the underlying features, simple presentational tags are also given. There are also some general suggestions as to the representation of typographical features and layout; a fuller treatment of physical description remains a topic for further investigation during the next development cycle of the project.
Any text to be used for scholarly purposes must include or make provisions for a reference system, which makes it possible to refer uniquely to specific points in the text. Suitable reference points are provided by the tags for structural features like those mentioned above, or they may be taken over from the printed original (in the case of conversion of printed texts to machine-readable form). We discuss different solutions and also suggest methods for establishing cross-references within texts and inter-textual links.
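As a minimal sketch of how structural tags can supply reference points, the following example derives references of the form chapter.section from a small sample. The element names (text, body, chapter, section, p), the n attribute, and the dotted reference format are all assumptions made for this illustration; the sample is written as XML only so that it can be read with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are
# assumptions for this sketch, not tags defined in this section.
SAMPLE = """<text>
  <body>
    <chapter n="1">
      <section n="1"><p>First.</p></section>
      <section n="2"><p>Second.</p></section>
    </chapter>
    <chapter n="2">
      <section n="1"><p>Third.</p></section>
    </chapter>
  </body>
</text>"""


def build_references(xml_source):
    """Map each 'chapter.section' reference to the text of that section."""
    root = ET.fromstring(xml_source)
    refs = {}
    for chapter in root.iter("chapter"):
        for section in chapter.findall("section"):
            ref = f'{chapter.get("n")}.{section.get("n")}'
            # A reference system is only useful if every reference
            # identifies exactly one point in the text.
            if ref in refs:
                raise ValueError(f"duplicate reference: {ref}")
            refs[ref] = "".join(section.itertext()).strip()
    return refs


if __name__ == "__main__":
    for ref, content in build_references(SAMPLE).items():
        print(ref, content)
```

A reference system taken over from a printed original (page and line numbers, for example) would need to be recorded explicitly in the text rather than derived from its structure in this way.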
We take written conventions as a starting-point, in much the same way as they have been used for orthographic transcriptions of speech. But distinctions made in writing should not be carried over slavishly. Just as writing made possible types of text not found in speech, so the electronic media will make possible, and have already made possible, types of text not found in writing. Our aim is to provide mechanisms for text representation appropriate to the electronic age.