Received: by UICVM (Mailer R2.03B) id 9751; Thu, 19 Apr 90 18:43:42 CDT
Date: Tue, 17 Apr 90 15:47:00 GMT
Reply-To: Text Encoding Initiative - Text Representation Committee list, Lou Burnard
Sender: Text Encoding Initiative - Text Representation Committee list
From: Lou Burnard
Subject: Delayed contribution from Cencioni
To: Michael Sperberg-McQueen

From: R. Cencioni
To: L. Burnard
Date: 30/3/90
Subj: Contributions to draft TEI guidelines

Please find enclosed our contribution.

Draft Contribution from CEC (R. Cencioni, C. Devillers and M. Casarella)

The following sections of the draft Guidelines are covered by this document:

   6.7.10 Numbers and Dates
   6.5 Formulas, Tables and Figures
   7.3 Office documents

Note for the editor:

1) A general question arises when defining tags: should tags be designed so that the text content is not affected (no text is embedded in the tag), or not? For example, to encode the originator of an office document, is ' Cencioni' sufficient or do we want to write 'from: Cencioni'? The latter seems more appropriate for the TEI environment, since it does not alter the text content.

2) For the sake of clarity, the names of the tags/features proposed in this paper, as well as the corresponding attributes, are rather long. It is good practice in SGML to shorten them to, say, a maximum of 6-8 characters.

! 6.7.10 Numbers and dates [recte 6.3.10 -MSM]

Just like names or abbreviations, numbers and dates can occur virtually anywhere in a text. They fall into two classes:

class 1: numbers
   - cardinals
   - ordinals
   - fractions
   - percentages

class 2: dates
   - fully specified dates
   - partial dates
   - range(s) of dates

These 'objects' are particular in the sense that i) they can be written either with letters (e.g. twenty-one) or with digits (e.g. 21), and ii) their presentation (appearance) is language dependent (e.g. 5th in English becomes 5. in Greek; 111,745.15 in English equals 111.745,15 in French).
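
The language dependence in point (ii) can be illustrated with a minimal sketch. The `SEPARATORS` table and the `standard_value` function are hypothetical names introduced only for this illustration; the two conventions simply follow the English and French examples given above and are not part of the proposal.

```python
from decimal import Decimal

# Hypothetical table of separator conventions, taken from the two
# examples in the text (111,745.15 in English, 111.745,15 in French).
SEPARATORS = {
    "en": {"group": ",", "decimal": "."},
    "fr": {"group": ".", "decimal": ","},
}


def standard_value(text: str, lang: str) -> Decimal:
    """Reduce a digit string written in a given convention to one standard form."""
    conv = SEPARATORS[lang]
    # Drop the grouping separator first, then map the decimal mark to ".".
    plain = text.replace(conv["group"], "").replace(conv["decimal"], ".")
    return Decimal(plain)


if __name__ == "__main__":
    print(standard_value("111,745.15", "en"))
    print(standard_value("111.745,15", "fr"))
```

Both calls yield the same standard value, which is the kind of language-independent form an application would want to find alongside the original string.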
The handling of numbers and dates can be rather problematic in NLP/MT applications, where fully automatic recognition is normally required. For these applications, some sort of standardization is extremely helpful, since it allows the feature to be delimited in the text while providing an appropriate encoding of its value.

A clean, general solution to the problem is to mark up these types of numeric material with one tag and to use the 'type' and 'value' attributes to store i) the type, and ii) the value of the feature in a standard form. We propose to adopt one feature for numbers and one feature for dates (see the TAGS entries below).

! 6.7.10.1 Numbers

The value of the 'type' attribute can be cardinal, ordinal, fraction or percentage. If other types are felt to be necessary, they can be tagged in the same way. The 'value' attribute acts as a placeholder for the storage of the actual value of the numeric string.

TAGS:

NO: Number. The optional attribute TYPE identifies the type of numeric string according to the aforementioned classification; the optional attribute VALUE allows the value of the string to be recorded in a standard format.

Examples:

   twenty-one
   1.5
   1,5
   ten percent
   10%
   5th
   one half
   1/2

! 6.7.10.2 Dates

There is only one (optional) attribute ('value'), which follows the ISO/R 2014 standard format to encode the value of the date:

   - the year is specified first, with four digits,
   - the month is specified next, with two digits,
   - the day follows with two digits,
   - every field is delimited by a hyphen.

If necessary, the content portion of a date can be marked up with three separate features: year, month and day.

TAGS:

DATE: the date feature. The optional attribute VALUE holds the ISO standard coding for the date.
YR: the year feature,
MO: the month feature,
DAY: the day feature.

Example:

   .. on February 21 1980 ..

or alternatively:

   .. on February 21 1980 ..

Another optional refinement would be to fully mark up the constituent elements of the date as numeric strings:

   .. on the 21st of February 1980 ..

Partial dates (e.g.
.. in September 1990 ..) can be catered for by setting the corresponding field of the VALUE attribute to zero.

Some applications may well require an explicit reference to the calendar system in use (e.g. b.C, a.C, Chinese calendar, etc.). For these applications, the ISO standard format could be extended to encompass an explicit reference to the system in use (NB: we should refer here to the latest version of SdR's crystals).

!
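
The VALUE convention for dates described in 6.7.10.2 (four-digit year, two-digit month, two-digit day, hyphen-delimited, with an unknown field set to zero) can be sketched as follows. The function names `encode_date_value` and `decode_date_value` are hypothetical, and the range checks are an assumption of this sketch rather than anything stated in the proposal; calendar systems other than the default are not handled.

```python
from typing import Dict, Optional


def encode_date_value(year: int = 0, month: int = 0, day: int = 0) -> str:
    """Build a VALUE string; a field left at zero marks a partial date."""
    # Range checks are an assumption of this sketch, not part of the proposal.
    if not 0 <= year <= 9999:
        raise ValueError("year must fit in four digits")
    if not 0 <= month <= 12:
        raise ValueError("month must be 0 (unknown) or 1-12")
    if not 0 <= day <= 31:
        raise ValueError("day must be 0 (unknown) or 1-31")
    return f"{year:04d}-{month:02d}-{day:02d}"


def decode_date_value(value: str) -> Dict[str, Optional[int]]:
    """Split a VALUE string into fields; zero fields are reported as None."""
    year, month, day = (int(field) for field in value.split("-"))
    return {"year": year or None, "month": month or None, "day": day or None}


if __name__ == "__main__":
    print(encode_date_value(1980, 2, 21))   # full date: February 21 1980
    print(encode_date_value(1990, 9))       # partial date: September 1990
    print(decode_date_value("1990-09-00"))
```

Under these assumptions, the full date from the example encodes as 1980-02-21 and the partial date "September 1990" as 1990-09-00, with the zeroed day field read back as unknown.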