CNRS

MULTEXT/EAGLES - Corpus Encoding Standard
Document MUL/EAG-CES 1. Annex 1. Version 0.1. Last Modified 14 December 1995





Annex 1 - Relevant standards






Copyright (c) Centre National de la Recherche Scientifique, 1995.

This document is only a draft and should be cited as such. Creators of WWW documents pointing to it are warned that its content and location may change without notice. This document is provided as is without any express or implied warranties. While every effort has been taken to ensure the accuracy of the information contained, the authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Permission is granted to make and distribute verbatim copies of this document for non-commercial purposes provided the copyright notice and this permission notice are preserved on all copies.


This is a list of standards referred to in the CES or relevant to text encoding generally.


Document encoding

ISO 8879:1986

Information Processing--Text and Office Systems--Standard Generalized Markup Language (SGML)

ISO/IEC DIS 13673:1993

Information Technology -- Text and Office Systems -- Conformance Testing for Standard Generalized Markup Language (SGML) Systems

TEI P3:1994

Sperberg-McQueen, C.M., Burnard, L. (Eds.) (1994) Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative, Chicago and Oxford. Available online at

<URL:http://etext.virginia.edu/TEI.html>

ISO/IEC DIS 10744:1992

Hypermedia/Time-based Document Structuring Language (HyTime)

ISO 12083

Standardized SGML document type definitions for books and articles with tables, formulae, etc.

ISO 8601:1988

Representation of dates and times.

"This standard defines a lot of details of the calendar. E.g. the ISO definition of the week numbers is that the first day (day number 1) of a week is Monday and that the first week in a year (week number 1) is the week that includes the first Thursday in January, i.e. the first week that has at least four days in January. Other definitions are, e.g., that hours of a day are counted from 0 to 24 and that the international notation of dates is the Bigendian format year-month-day, e.g. 1993-04-17 and that for time is e.g. 20:36:04 (hh:mm:ss). There are also string formats for computer applications specified that have to represent date and time in files and protocol packets. (See

<URL:ftp://ftp.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z>
for a very detailed summary.)"
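The rules quoted above (weeks starting on Monday, week 1 containing the first Thursday of January, and the year-month-day and hh:mm:ss notations) can be checked with a short sketch. It assumes Python's standard datetime module, whose isocalendar() and isoformat() methods follow these ISO 8601 conventions; the variable names are illustrative only.

```python
from datetime import date, datetime

# The example date from the text: 17 April 1993.
d = date(1993, 4, 17)

# Year-month-day notation.
iso_date = d.isoformat()

# ISO week date: (ISO year, week number, weekday), with Monday = 1.
iso_year, iso_week, iso_weekday = d.isocalendar()

# 1 January 1993 was a Friday, so it falls before the week containing
# the first Thursday of January and belongs to the last week of 1992.
new_year = tuple(date(1993, 1, 1).isocalendar())

# hh:mm:ss notation for the example time in the text.
iso_time = datetime(1993, 4, 17, 20, 36, 4).time().isoformat()

print(iso_date, (iso_year, iso_week, iso_weekday), new_year, iso_time)
```

The second result shows why an ISO week-numbering year can differ from the calendar year for dates close to 1 January.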

ISO 4217

Codes for the representation of currencies and funds

ITU-T/CCITT Recommendation E.123

Notation for international telephone numbers (a '+' followed by the country code, followed by a space, ...).


Language and country codes

ISO 639:1988

Code for the representation of names of languages

Provides two-letter codes for about 140 languages and is intended primarily for use in terminology, lexicography and linguistics.

The list is available online at
<URL:http://www.stonehand.com/unicode/standard/iso639.html>

ISO 639-2:1995

Code for the representation of names of languages--Alpha-3 code

"Three-letter codes for the representation of names of languages for information interchange", developed by a Joint Working Group of ISO TC37/SC2 and TC46/SC2. Covers a wider range of the world's languages than ISO 639.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/cd639-2.html>

ISO 3166:1993

Codes for the representation of names of countries

This standard defines a 2-letter, a 3-letter and a numeric code for each country on this planet (e.g. US/USA/840=United States, DE/DEU/276=Germany, GB/GBR/826=United Kingdom, FR/FRA/250=France, ...). The 2-letter codes are well known on the Internet as top-level domain names. The 3-letter versions are often used at international sports events.
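As an illustration of the three parallel code forms, the following minimal sketch looks up a country by any of its codes. The table holds only the four entries cited above, not the full standard, and the names CountryCode, SAMPLE_CODES and lookup_country are hypothetical.

```python
from typing import NamedTuple, Optional, Union


class CountryCode(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: int
    name: str


# Only the four examples given in the text; the real standard is far larger.
SAMPLE_CODES = [
    CountryCode("US", "USA", 840, "United States"),
    CountryCode("DE", "DEU", 276, "Germany"),
    CountryCode("GB", "GBR", 826, "United Kingdom"),
    CountryCode("FR", "FRA", 250, "France"),
]


def lookup_country(code: Union[str, int]) -> Optional[CountryCode]:
    """Return the sample entry matching a 2-letter, 3-letter or numeric code."""
    key = str(code).strip().upper()
    for entry in SAMPLE_CODES:
        if key in (entry.alpha2, entry.alpha3):
            return entry
        if key.isdigit() and int(key) == entry.numeric:
            return entry
    return None


print(lookup_country("de"))
print(lookup_country(826))
```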

Internet-Draft of HTML 3.0 [LANG attribute]

The current Internet-Draft of HTML 3.0 (29-Mar-95) provides a LANG attribute whose value is composed of the two-letter language code from ISO 639, optionally followed by a period and a two-letter country code from ISO 3166, e.g. "en.uk" for the variety of English spoken in the United Kingdom.

<URL:http://www.hpl.hp.co.uk/people/dsr/html/CoverPage.html>
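A LANG value of the form described for the HTML 3.0 draft could be split into its parts along the following lines. This is a sketch based only on the description given here (two letters, optionally a period and two more letters); the function name parse_lang is hypothetical, and the codes are not checked against the ISO 639 or ISO 3166 lists.

```python
import re
from typing import Optional, Tuple

# Two-letter language code, optionally followed by "." and a two-letter
# country code, as described for the HTML 3.0 draft LANG attribute.
_LANG_RE = re.compile(r"^([A-Za-z]{2})(?:\.([A-Za-z]{2}))?$")


def parse_lang(value: str) -> Tuple[str, Optional[str]]:
    """Split a LANG value into (language, country); country may be None."""
    match = _LANG_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid LANG value: {value!r}")
    language = match.group(1).lower()
    country = match.group(2)
    return language, (country.lower() if country else None)


print(parse_lang("en.uk"))
print(parse_lang("fr"))
```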


Character sets

ISO 646.IRV:1991

Information Processing -- ISO 7-bit coded character set for information interchange [=ANSI X3.4-1986]

ISO-8859

Information Processing -- 8-bit Single-Byte Coded Graphic Character Sets

Part 1: Latin alphabet No. 1, ISO 8859-1:1987
Part 2: Latin alphabet No. 2, ISO 8859-2:1987
Part 3: Latin alphabet No. 3, ISO 8859-3:1988
Part 4: Latin alphabet No. 4, ISO 8859-4:1988
Part 5: Latin/Cyrillic alphabet, ISO 8859-5:1988
Part 6: Latin/Arabic alphabet, ISO 8859-6:1987
Part 7: Latin/Greek alphabet, ISO 8859-7:1987
Part 8: Latin/Hebrew alphabet, ISO 8859-8:1988
Part 9: Latin alphabet No. 5, ISO 8859-9:1990
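Because every part of ISO 8859 is a single-byte set, the same byte value stands for different characters depending on which part is assumed. The sketch below decodes one byte under three parts, using the codec names of Python's standard library; the variable names are illustrative.

```python
# One byte, interpreted under three different parts of ISO 8859.
raw = b"\xe1"

as_latin1 = raw.decode("iso8859-1")    # Part 1: Latin alphabet No. 1
as_cyrillic = raw.decode("iso8859-5")  # Part 5: Latin/Cyrillic
as_greek = raw.decode("iso8859-7")     # Part 7: Latin/Greek

for part, char in (("8859-1", as_latin1),
                   ("8859-5", as_cyrillic),
                   ("8859-7", as_greek)):
    # Show the character together with its Unicode code point.
    print(f"ISO {part}: {char} (U+{ord(char):04X})")
```

An encoded text therefore needs to state which part it uses, since the bytes alone do not identify it.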

ISO/IEC 10646-1

Information technology - Character sets and information coding - Universal multiple-octet coded character set - Part 1 - Architecture and basic multilingual plane

GLOSIX 0.1

EAGLES Tools subgroup. DOCUMENT MUL/EAG--LSD1 Version of December 1995.
Guidelines for Linguistic Software Development - Draft proposal

<URL:http://www.lpl.univ-aix.fr/projects/multext/LSD/LSD.html>

UNICODE 1.1

"The Unicode Standard, Version 1.1": Version 1.0, Volume 1 (ISBN 0-201-56788-1), Version 1.0, Volume 2 (ISBN 0-201-60845-6), and "Unicode Technical Report #4, The Unicode Standard, Version 1.1" (available from The Unicode Consortium, and soon to be published by Addison-Wesley).

[This character set is identical with the character repertoire and coding of the international standard ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2; Subset=300; Implementation Level=3.]
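The UCS-2 form mentioned above represents each character as two octets. The following sketch illustrates this in Python, which has no codec named UCS-2; it assumes that big-endian UTF-16 may stand in for it, which holds only for code points up to U+FFFF. The function name encode_ucs2 is hypothetical.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as two octets per character (big-endian).

    Assumption: UTF-16-BE is used as a stand-in for UCS-2, so any
    character above U+FFFF is rejected rather than encoded.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} cannot be represented in two octets")
    return text.encode("utf-16-be")


# Latin capital A, Latin small e with acute, Greek small alpha.
sample = "A\u00e9\u03b1"
encoded = encode_ucs2(sample)
print(encoded.hex(" "))
```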
