TEI meets Unicode

WSD and related questions

  • The current WSD (Writing System Declaration) is broken:
    • It requires subdoc, which is not available in XML
    • It assumes a glyph registry
    • It duplicates documentation (e.g. entity tables need to be maintained separately)
    • It cannot be processed in a viable way by existing processors
    • It bundles language, orthography (script) and character identification
    • It duplicates character properties that Unicode already provides, so the WSD does not need to supply them
    • It is too inflexible in the character properties it allows one to define
    • It is required, but this requirement is not enforced by the DTD.
  • Requirements for a new Charset Extension Mechanism to replace the WSD:
    • Limited scope: identify only differences/additions/extensions to the Unicode Database of properties.
    • Make it optional
    • Place it in the header, not in a subdoc or an auxiliary document.
    • Ensure that the constructs are actually useful and usable in the processing of the document.
    • Do not assume a straightforward relationship between characters and glyphs, as in Latin and East Asian scripts; allow for n-to-n mappings of characters/strings to glyphs/glyph sequences.
    • Do not attempt to produce an imaging model, a glyph description language or anything similar. Stay within the scope of TEI.
    • The unbundling of language and writing system might require a mechanism to identify the writing system in TEI documents, for example a new global attribute for this purpose.
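
Two of the requirements above (recording only differences from the Unicode property database, and allowing n-to-n character/glyph mappings) can be illustrated with a minimal Python sketch. This is not a proposed TEI syntax or processing model: the table names `EXTENSIONS` and `GLYPH_MAP`, the glyph identifiers, and the use of the Private Use code point U+E000 are all hypothetical and chosen only for illustration. The only real API used is Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical extension table: it lists ONLY entries that differ from,
# or add to, the Unicode Character Database. U+E000 (Private Use Area)
# is an arbitrary example, not a real project assignment.
EXTENSIONS = {
    "\uE000": {"name": "EXAMPLE PROJECT-SPECIFIC LIGATURE", "category": "Ll"},
}


def char_properties(ch):
    """Return name and general category, preferring local extensions."""
    if ch in EXTENSIONS:
        return dict(EXTENSIONS[ch])
    # Everything else falls through to the standard Unicode properties.
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
    }


# Hypothetical n-to-n mapping from character strings to glyph sequences.
GLYPH_MAP = {
    "ffi": ["glyph.ffi"],                     # three characters, one glyph
    "\u00E4": ["glyph.a", "glyph.dieresis"],  # one character, two glyphs
}


def to_glyphs(text):
    """Segment text into glyph identifiers by greedy longest match."""
    longest = max(len(key) for key in GLYPH_MAP)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in GLYPH_MAP:
                out.extend(GLYPH_MAP[chunk])
                i += size
                break
        else:
            # No mapping found: assume one glyph per character.
            out.append("glyph." + text[i])
            i += 1
    return out
```

Under these assumptions, `char_properties("A")` is answered entirely by Unicode, while `char_properties("\uE000")` is answered by the local table, so the extension table stays small and nothing Unicode already documents is duplicated.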