TEI P5 Progress Report, 2005
This report covers P5-related activities at Oxford between November 2004 and January 2005. We have found it hard or impossible to distinguish between work specifically charged to the Meta workgroup and work of an editorial nature in describing general progress towards P5. It seems more sensible simply to report to the TEI Council what has happened.
Following the Members’ Meeting in November, at which the suggestion
of a
Council members will recall the decision made in Ghent to implement
a complete change in the way cross-referencing and linking are
done at P5. The Guidelines themselves contain several hundred
cross-references and thus provide an excellent test for the practical
implications of that decision. Using a combination of XSLT transforms,
emacs macros, and hand editing, we converted all
- the ODD documents defining schema content models for affected elements and classes
- the text of the Guidelines themselves (much of it manual)
- the embedded examples in the P5 Guidelines
- the test files
- the XSLT stylesheets which process the ODDs to generate TEI P5, and which also form part of the general TEI processing library
- the Roma application which creates schemas
All these changes have been completed, but not all of their consequences have yet been thoroughly tested. In particular, we suspect that further consequences remain to be discovered, arising from the now-pervasive use of XPointer references throughout the XSLT stylesheets.
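As a rough illustration of the kind of mechanical conversion involved (not the XSLT transforms and emacs macros actually used), the following Python sketch renames a plain `id` attribute to `xml:id` and rewrites bare ID references as `#`-prefixed pointers. The attribute names `id` and `target`, and the function name `convert_ids`, are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML Namespaces spec.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def convert_ids(root, pointer_attrs=("target",)):
    """Rename `id` to `xml:id` and turn bare ID references into `#id` pointers.

    `pointer_attrs` is a hypothetical list of attributes assumed to hold
    only ID references (one or more, separated by whitespace).
    """
    for elem in root.iter():
        if "id" in elem.attrib:
            elem.set(XML_ID, elem.attrib.pop("id"))
        for name in pointer_attrs:
            value = elem.get(name)
            if value:
                # Prefix each reference with '#', unless it already has one.
                elem.set(name, " ".join(
                    ref if ref.startswith("#") else "#" + ref
                    for ref in value.split()))
    return root
```

A sketch like this covers only the simplest cases; references embedded in prose, examples, or stylesheet logic would still need the hand editing described above.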
Converting the Guidelines themselves, after the (relatively simple) change had been made to content models, was largely automated. Conversion of embedded examples was more complex, as these were present in the text in two forms: as validatable embedded XML source, and as non-validatable CDATA marked sections. After some argument, we resolved to convert as many as possible of the latter to the former. Of over 1800 examples, only 68 had to remain as non-validatable CDATA (for example, because they were intended to demonstrate invalid XML). Several thousand genuine markup errors were found as a result of this process. Though laborious, the process of hand-correcting them all means that the entire TEI Guidelines text is now valid against a three-pass validation process:
- a check against a Relax NG schema (using three separate parser implementations: jing, rnv and xmllint), each of which validates the main text and checks that all the examples are well-formed
- a separate validation check with a full Relax NG schema for each of the examples individually (this is managed using the Namespace Routing Language implemented in jing)
- an XSLT script which checks class membership of elements; this is necessary because class membership is not implemented using xml:id, and consequently errors here would not otherwise be detected.
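The first hurdle for any embedded example is well-formedness. A minimal sketch of that triage, assuming the examples have already been extracted as strings, might look as follows. The function name `triage_examples` is hypothetical, and the Python standard library does not perform Relax NG validation, so this illustrates only the well-formedness part of the first pass.

```python
import xml.etree.ElementTree as ET


def triage_examples(examples):
    """Split example strings into well-formed XML and everything else."""
    well_formed, not_well_formed = [], []
    for text in examples:
        try:
            ET.fromstring(text)
        except ET.ParseError:
            # Candidates for hand correction, or for staying as CDATA.
            not_well_formed.append(text)
        else:
            well_formed.append(text)
    return well_formed, not_well_formed
```

Examples in the second list are either genuine markup errors to be corrected or deliberate demonstrations of invalid XML; telling the two apart still requires a human reader.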
Stage 2 caught many instances of duplicate IDs. It is a feature of xml:id that all values must be unique across the whole document, whatever namespace is used. This means that 3 examples in a row which use
to make some point cause a validation error. Whether this is desirable, acceptable, or plain wrong caused a heated debate between LB, SB, and SR; but in the end all IDs were made unique by means of tedious hand-editing.
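The uniqueness constraint is easy to check mechanically. The sketch below, which assumes the document has been parsed with Python's standard `xml.etree.ElementTree`, lists every xml:id value used more than once, regardless of the namespace of the element carrying it. The function name `duplicate_ids` is hypothetical, and this is not the jing-based check described above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(root):
    """Return the xml:id values that occur more than once, sorted."""
    counts = Counter(
        elem.get(XML_ID)
        for elem in root.iter()
        if elem.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

A report of this kind shows where hand editing is needed; it does not decide what the replacement values should be.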
Converting the XSLT stylesheets and Roma was mostly
straightforward, with one notable exception. The problem is that W3C
Schemas cannot declare elements or attributes in more than one
namespace at a time. One schema has to import another if
validation of a multi-namespace schema is required. Unfortunately, the switch to using
As noted above, the first Open Source release of TEI P5 took place in mid-January 2005. The CVS repository at tei.sf.net now holds the master version of:
- The ODD sources of P5, the necessary tools for processing them, and test files
- The XSLT stylesheets which process P5 and other TEI P4 and P5 documents
- The P5 internationalization data
- Roma
- TEI Emacs customizations
- Example TEI extensions
Derived from these, we also provide
Roma has been checked again and updated to work with xml:id and xml:lang as needed, and a number of bugs have been fixed in the underlying XSLT stylesheets.
Some of the superseded chapters in TEI P5 have now either been removed
from the source, or been marked with a strong
A small number of minor changes, some of them originating as SF feature requests, were made in the text of P5 itself. Notable examples include:
- implementation of a graphic element, separating reference to physical images from the concept of figure, and generalization of the latter to include e.g. nested diagrams or figures
- introduction of the choice element as a replacement for janus-style tagging
During this period, we received and integrated a first revision of the SH (independent header) chapter; further work on extending this is anticipated. This was the only substantive contribution to the new draft received from any TEI workgroup or affiliated body in the last 5 months.
Although we now have a stable and self-consistent release of TEI P5, work is very far from complete. We identify at least the following tasks remaining:
- Review class membership, and element content models, to rationalize and simplify
- Review and re-implement where needed characterful attributes
- Review and dispose of outstanding feature requests from SF list
- Replace all remaining references to DTD-based modular system
- Completely rewrite ST to reflect new modular system
- Revise other chapters marked as needing revision in edw81
- Reconsider gross organization of P5 (e.g. are CO and HD too long?)
We are concerned that this process is in danger of losing direction and motivation, as well as being unduly protracted. We need to find better ways of generating input for the revision from interested parties and more efficient ways of acting upon such input when it is received.
We suggest that Council may wish to consider replacing the current Meta
workgroup with a new