MULTEXT/EAGLES - Corpus Encoding Standard
Document MUL/EAG-CES 1. Title page. Version 0.1. Last modified 11 December 1995.
Copyright (c) Centre National de la Recherche Scientifique, 1995.
This document is only a draft and should be cited as such. Creators of
WWW documents pointing to it are warned that its content and location may change
without notice. This document is provided as is without any express or implied
warranties. While every effort has been made to ensure the accuracy of the
information contained herein, the authors assume no responsibility for errors or omissions,
or for damages resulting from the use of this information.
Permission is granted to make and distribute verbatim copies of this document for
non-commercial purposes provided the copyright notice and this permission notice are
preserved on all copies.
This document is the first version of the MULTEXT/EAGLES Corpus Encoding Standard (CES). The CES is designed for use in language engineering research and applications and is intended to serve as a widely accepted set of encoding standards for European corpus work. The CES is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language), strongly influenced by and in broad agreement with the TEI Guidelines for Electronic Text Encoding and Interchange of the Text Encoding Initiative. The CES specifies a minimal encoding level that corpora must achieve to be considered standardized, both in terms of descriptive representation (the marking of structural and typographic information) and in terms of general architecture (so that corpora are maximally suited for use in a text database). It also provides encoding specifications for linguistic annotation, together with a data architecture for linguistic corpora.
The CES is being developed bottom-up: it starts with minimal specifications and expands them on the basis of feedback from its use and input from the research community at large. We welcome comments and discussion on any aspect of the CES.
This document is the result of a joint effort by the European projects MULTEXT (LRE), MULTEXT-EAST (Copernicus) and EAGLES. CNRS supported the integration effort.