ODD Development Working Group DH2012 Meeting Notes


James Cummings, 2013-01-23T12:04:05

freely available

Originally a Google Doc.

James Cummings, 2013-01-23T14:36:54

DH 2012 ODD Development Meeting Executive Summary

2012-07-21 09:31 to 17:33 and 2012-07-22 09:15 to 16:00

Present: Brett Barney, Syd Bauman, Piotr Banski, Lou Burnard, James Cummings, Bertrand Gaiffe, Stefan Majewski, Trevor Muñoz, Brian Pytlik Zillig, Doug Reside, Raffaele Viglianti, Ron Van den Branden, Peter Stadler, Sebastian Rahtz, Laurent Romary (via Skype)

Durand Conundrum: cf. http://foxglove.hypotheses.org/368

  • TEI ODD uses a subset of RelaxNG, but we don’t document what that subset contains
  • Some things are hard to express, e.g. non-deterministic content models
  • “If you are going to use RelaxNG to express content models, why not use RelaxNG for everything?” (Or W3C Schema, etc.)
  • One way to resolve: Yup, do everything in RelaxNG.
  • Another way: reinvent RelaxNG subset in TEI ODD
  • RelaxNG also used for datatypes, not just content models
  • Implement as TEI: possibly use existing elementRef/classRef/moduleRef in more general way (and with min/max/etc.)
  • classRef with new attribute to say whether it is repeatable, etc.
  • new elements for sequence optional, alternation, etc.
  • And need a way to say ‘there is a text node here’. What is the difference between this and any other datatype?
  • Can anyone explain what ODD can do that RelaxNG can’t do? The only real argument is if ODD is more expressive or seriously easier to use. Points raised:
    • Using islands of RelaxNG within ODD means not full-blown RelaxNG, and so it protects you. It buys you clarity and the possibility of more user-friendly approaches.
    • Novice TEI users need to learn _another_ language. (But that could be good for them.) The same concepts have to be learned.
    • Tool support is better for RelaxNG; we shouldn’t isolate TEI.
    • Would need very precise guidelines on TEI’s use of RelaxNG. Would still need the extra step of assembling documentation from the RelaxNG.
    • RelaxNG was designed as a language for schemas and customisation, not so much as a documentation language.
    • “The problem with schema languages is there are more than one.”
    • With ODD you aren’t just documenting the structure (etc.) but the underlying ontology of the TEI abstract model (and could generate a very different schema technology).
  • Schematron is another example of escaping into islands of another language. Some parts of the Guidelines are just recommendations in prose, not expressible in RelaxNG, so even when generating output we will still need it.
  • Using ODD gives a much more user-focussed, implementation-independent view of the conceptual model: an abstraction.
  • Co-occurrence constraints like in Syd’s presentation are good argument for having it in one language.
  • We won’t reimplement all sorts of things which are minor standard aspects (xpath, regexes, etc.) as there are different layers for different purposes.
    • But (as Stefan argues) there are multiple ways to do the same thing (ODD/RelaxNG/Schematron)
  • MEI uses some conditional statements, and so sees no problem using Schematron.
  • Can express co-occurrence constraints in current ODD by doing what is currently tag abuse. Let’s formalise that abuse to make it accepted.
  • If we decided to completely reverse the TEI and implement in RelaxNG, it would take lots and lots of time. (Should that be factored in?) All schema languages fail to provide good methods for consistent regular documentation. If we believe in literate programming then turning it on its head is probably a bad idea.
  • There are proponents of different schema languages, so ODD would be a meta-schema from which you can generate your preferred one. It is not just a language of its own for no reason; it is an intentional abstraction. W3C Schema and RelaxNG overlap, and ODD expresses that overlap: it produces and manages the intersection between these schema languages. But you can do wonderful things in ODD that you can’t in these schema languages, mostly in documentation.
  • RelaxNG: Bit of documentation, bit of schema, bit of documentation, bit of schema? Documentation is only linked to the element specification by ‘neighbourliness’.
    • cf. DocBook, which doesn’t have much of a notion of customisation other than at the module level
  • If we constrain the language we can help with ODD customisation to warn about conformance at level of content models
  • Although TEI is mostly aimed at DH audience, ODD is used by others who want to create markup languages. If we had more rigorous specification/test suite/etc. that would help those.
  • Re-expressing everything in ODD might fill a lacuna of a documentation/schema language, in the same way xptr/XPointer did.
  • ODD as a high-level language that we’re compiling into a slightly lower level language. This compiling is always the case as the tools that use RelaxNG are doing something similar.
  • Would DocBook use ODD if we updated it?
  • Conceptually your customisations of the TEI we can analyse as an object in itself.
  • We might want to express things in ODD which can’t be done in any individual schema language, so is tying ourselves to any individual one a bad idea? E.g. stand-off annotation: constraining annotation segments to particular places in the document.
  • Would be good to get a better sense of where the payoffs are down the road. Closing the loophole is not that much of a payoff; the payoff is in user focus, education, and learning of TEI. The cost is more that of moving the TEI into a world that relies on our own community’s tools/implementations.
  • BB: From a user perspective, Schematron is what I’d like to get rid of.
  • <content> wouldn’t contain this new markup? Currently it contains macro.schemaPattern, i.e. RelaxNG. Proposing to invent a new element which is a new alternative.
  • A better test suite and a better, clearer processing model would mean not just SPQR as implementor.
  • Recommendation: Needs more testing and discussion, but generally in favour of moving to a wholly TEI-based language. Does not necessitate a move to P6 (but possibly a major P5 version). (Solving it the other way, with RelaxNG, would necessitate a move to P6.)
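To make the ‘implement as TEI’ option above concrete, here is a minimal sketch of one content model (an optional heading followed by one or more paragraph-like elements) written both ways. The element name `sequence` and the `minOccurs`/`maxOccurs` attributes on `elementRef` and `classRef` are assumptions extrapolated from the bullets above, not an agreed syntax. The element idents `exampleA` and `exampleB` are hypothetical, and whether the new markup would sit inside `<content>` was itself an open question at the meeting.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0">

  <!-- Current practice: an island of RelaxNG inside <content> -->
  <elementSpec ident="exampleA" mode="add">
    <content>
      <rng:group>
        <rng:optional>
          <rng:ref name="head"/>
        </rng:optional>
        <rng:oneOrMore>
          <rng:ref name="model.pLike"/>
        </rng:oneOrMore>
      </rng:group>
    </content>
  </elementSpec>

  <!-- Hypothetical native-TEI equivalent: same model, no RelaxNG.
       <sequence>, @minOccurs and @maxOccurs are assumed names. -->
  <elementSpec ident="exampleB" mode="add">
    <content>
      <sequence>
        <elementRef key="head" minOccurs="0" maxOccurs="1"/>
        <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/>
      </sequence>
    </content>
  </elementSpec>

</specGrp>
```

In the second form the references to `head` and `model.pLike` are ordinary TEI pointers, so an ODD processor could check them against the customisation without parsing RelaxNG.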

The Crystal Maze

  • LR presented crystals
  • LB, SR, & SB think we should investigate whether <specGrp> and <specGrpRef> are sufficient (LB & SR think they will be sufficient, SB is suspicious …). They probably work for the simple case, but will they do for more complex use-cases?
  • Need for a “crystal library”: how to address and chain together predefined chunks of declaration
  • Need for subclassing elements (and bringing in their documentation); bringing in <desc> and <gloss> etc.
  • In general in TEI we defer evaluation of pointers until as late as possible, so what order of precedence applies in evaluating pointers, since one may affect the others? (Need better specification of ODD.) (XInclude being another option since that is how specGrpRef… still needs customisation.)
  • When using a crystal you are doing so within its own context/scope… but this is mostly an implementation/specification aspect. How do you decide what to do with conflicting crystal customisations / priority?
  • Problem if you take two contradictory crystals. Currently these are all pointers, rather than grabbing a copy. (If modified, would need a different way to refer to that customised version.)
  • Suggestion of @context on specGrpRef to take multiple XPaths?
    • or perhaps only take a @type value
  • Recommendation: Good idea, but may already be doable, given a library structure. Need more use cases and attempts at formalising. LR to try existing mechanisms, and see if that solves the crystal problem. Then look at whether the @context addition will be useful, if even possible in RelaxNG.
  • SPQR attempted a notation for this:

<schemaSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" source="file:///usr/share/xml/tei/odd/p5subset.xml" ident="SPQR-crystals">

  <!-- first set up a group/crystal which references <monogr>, and
       changes <title> -->
  <specGrp id="JCmonogr">
    <elementRef key="monogr"/>
    <elementRef key="title"/>
    <elementSpec ident="title" mode="change">
      <model>
        <pcText/>
      </model>
      <attList>
        <attDef ident="special" ns="http://www.example.com/foo" mode="add">
          <datatype>
            <rng:text/>
          </datatype>
        </attDef>
      </attList>
    </elementSpec>
  </specGrp>

  <!-- the next group references <analytic> -->
  <specGrp id="SBanalytic">
    <elementRef key="analytic"/>
  </specGrp>

  <!-- the main biblStruct group references the first two groups,
       some more elements, and changes att.global -->
  <specGrp id="SPQRbibl" xmlns="http://www.tei-c.org/ns/1.0">
    <p>My view of TEI’s biblStruct</p>
    <specGrpRef target="#JCmonogr"/>
    <specGrpRef target="#SBanalytic"/>
    <elementRef key="monogr"/> <!-- not an error even though I get it from JCmonogr -->
    <elementRef key="biblStruct"/>
    <elementRef key="publisher"/>
    <elementRef key="author"/>
    <classRef key="att.global"/>
    <classSpec ident="att.global" type="atts">
      <classes mode="replace"/>
      <attList>
        <attDef ident="n" mode="delete"/>
        <attDef ident="rend" mode="delete"/>
        <attDef ident="rendition" mode="delete"/>
      </attList>
    </classSpec>
  </specGrp>

  <!-- Now we can reference the biblStruct group in two ways.
       Firstly, same as current specGrpRef macro inclusion,
       where the redefinition of <title> and att.global are global -->
  <specGrpRef target="#SPQRbibl"/>

  <!-- Secondly, we pull in the same group,
       but this time as a local copy, with a scope of the
       element defined by @context -->
  <!-- change needed to TEI: add @context to specGrpRef
       (defaults to 'global') -->
  <specGrpRef target="#SPQRbibl" context="listBibl bibl"/>

  <!-- Thirdly, a new group references the biblStruct
       one, but overrides one of the changes and adds a new
       element -->
  <specGrp id="dougsbibl">
    <specGrpRef target="#SPQRbibl"/>
    <elementRef key="pubPlace"/>
    <elementSpec ident="title" mode="change">
      <desc>new description</desc>
      <attList>
        <attDef ident="special" mode="delete"/>
      </attList>
    </elementSpec>
  </specGrp>

  <specGrp id="rp">
    <elementSpec ident="p" mode="change">
      <content>
        <rng:text/>
      </content>
    </elementSpec>
    <elementSpec ident="note" mode="delete"/>
  </specGrp>

  <!-- It is possible that we might want to
       be more precise about the context, using
       XPath syntax -->
  <specGrpRef target="#SPQRbibl" context="back/listBibl"/>
  <specGrpRef target="#SPQRbibl" context="text[@type='special']"/>

  <specGrp id="rab">
    <elementSpec ident="ab" mode="change">
      <content>
        <rng:text/>
      </content>
    </elementSpec>
  </specGrp>

  <specGrpRef target="#rp" context="teiHeader link"/>

  <specGrp id="bg">
    <specGrpRef target="#rp" context="p"/>
    <specGrpRef target="#rab"/>
  </specGrp>

  <specGrpRef target="#bg" context="body"/>
  <specGrpRef target="#rp" context="div[@type='foo']"/>
  <specGrpRef target="#rp" context="model.pLike"/>
  <specGrpRef target="#rp" context="div.foo fileDesc"/>

</schemaSpec>

Areas of ODD2 which are not expressive enough but which can be fixed without a new paradigm

  • Module inter-dependency (if you ask for module X, also bring in module Y). In some cases an element in one TEI module (A) requires that another TEI module (B) is loaded, because its content model directly requires something defined in B (e.g. a class or element), but no method exists to indicate this module inter-dependency. This sort of dependency can, of course, also arise inside a single module, when a content model explicitly refers to a particular element which is then removed. Some concept of dependency of references needs to be implemented. (The difference between moduleSpec and specGrp was discussed: elements claim membership in modules, whereas specGrps point to elements.) Module dependency on nested module subset?
    • modules are just groups of elements: and an element cannot appear in more than one
    • the desire to have a group of usefully coexisting elements is better satisfied by having predefined schema specs (ODDs); an element can appear in more than one such group
    • a specGrp can also be referred to from more than one such group
    • There are ~482 direct references to elements in content models
    • Recommendation: Multiple options: either allow elements to be in multiple modules, or the content model has to somehow mandate that this element is really really required. (But what about a choice between two elements? You would have to get both of them.) Here be dragons. (Leave well alone pending Crystal Store of Magical Woo Woo specGrps.)
  • Bertrand’s model classes questions: at class member declaration time, also declare cardinality:
    • <memberOf key="model.duckLike" minOccurs="1" maxOccurs="3"/>
    • <classRef key="model.duckLike" minOccurs="1" maxOccurs="3"/>
    • Recommendation: Would be solved if LB’s Durand Conundrum solution is adopted. But adding cardinality is desirable.

Per-element attribute-based customisation

  • Desc: The ability to customise the desc, valList, and other aspects of an attribute inherited from a class on a per-element basis. For example giving different suggested values for @type on an element, or a more specific description of an attribute when used on a certain element.
  • Suggested solutions: This has been agreed by Council already as http://purl.org/tei/fr/3415801 but is not yet implemented.
  • Changing <valList> seems unproblematic, changing <desc> might cause more problems because likely to be an ontological change.
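As an illustration of the kind of customisation described above, the following sketch gives <div> its own closed list of suggested @type values, while leaving the class-level definition of the attribute untouched for every other element. It assumes an ODD processor that honours an <attDef> change made inside an <elementSpec> for an attribute inherited from a class, which is the behaviour requested in the ticket above. The value names `chapter` and `appendix` are invented for the example. The attribute's <desc> is deliberately left alone, since changing it was flagged as potentially an ontological change.

```xml
<!-- Sketch only: per-element change to a class-inherited attribute.
     Assumes processor support for the feature request cited above. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="div" mode="change">
  <attList>
    <!-- @type is inherited from an attribute class; change it for <div> only -->
    <attDef ident="type" mode="change">
      <!-- hypothetical project-specific values -->
      <valList type="closed" mode="replace">
        <valItem ident="chapter">
          <desc>a chapter of the work</desc>
        </valItem>
        <valItem ident="appendix">
          <desc>an appendix to the work</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```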

Areas of ODD2 which are not well-enough defined

  • The processing model
    • Needs to be much better defined.
    • What order are things worked on? (Are deletions processed first?)
  • There is no defined test suite for an ODD processor
    • How do we validate a schema?
  • Is the DTD/RelaxNG/XSD output defined, or can a processor choose different methods?
    • No. Is intersection between RNG/XSD a requirement? (How do you prove you have catered for this in whatever schema language you’ve chosen?) Can you implement only part of it? (A limited subset for DTDs, for example.)
  • What documentation generation options are supported? How to document customizations? (Examples?)
    • When you’ve modified element content models, do not output original examples? Or exclude invalid examples? Or offer the user a switch or warning?
    • What should a processor do with <specGrp rend="blah"/>?
  • ODD processing settings (example: where to put <sch:ns/> in non-TEI ODDs?)

ODD-processing which is unimplemented at present

  • Chaining of ODDs (a customization of a customization)
    • Conformance still ok, but have to track back to original TEI.
    • Agnostic about processor method, but ‘patches’ in order
    • Implementation will have to keep track of where changes came from
    • (Should attDef record which customisation it came from?)
    • Is @source pointing to the output of the previous customisation or to its ODD?
    • @source must provide a set of TEI declarations usable by an ODD processor
  • Per-element attribute-based customisation at source (specialized @type on div)
  • Supporting multiple customizations of the same element (in different areas)
  • Declaring an attribute class and then customizing it straightaway
    • Can you have both the definition of an object and a change of that object immediately?
  • Generating XSD natively
    • poor cousins that we don’t support directly
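The chaining case listed above can be pictured with two customisation files, shown together here for brevity. This is a sketch under stated assumptions, not a description of any existing processor: the file name `first.compiled.xml` is hypothetical, and the sketch takes one side of the open question above by pointing @source at the compiled output of the first customisation rather than at its ODD.

```xml
<!-- first.odd: a customisation made directly against the TEI source -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="first" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- first patch: remove <note> -->
  <elementSpec ident="note" mode="delete"/>
</schemaSpec>

<!-- second.odd: a customisation of the customisation.
     @source points at a (hypothetical) compiled form of first.odd,
     so <note> is already gone before the second patch is applied. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="second" start="TEI"
            source="first.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- second patch, applied after the first: remove <hi> -->
  <elementSpec ident="hi" mode="delete"/>
</schemaSpec>
```

Applying the patches in order is what lets conformance be traced back to the original TEI, but it also means an implementation has to remember which file each change came from.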

Problems which do not seem amenable to ODD as it stands and need a new version

  • Durand Conundrum: Express content models as native TEI?
  • Subclasses (element <foo> has a simpler content model if it’s in the header)
    • Should support in ODD, with crystals?
  • Attribute/content interdependency (if child::foo, then attribute self::*/@y must be used)
    • also if @y then child::foo
  • Alternation of attribute/element (either child <y> or attribute @y)
    • Is there enough demand? Implement in Schematron
    • mei:measure containing child::note/@slur or child::slur
  • Expressing XPath-based constraints (Schematron) in a TEI language with embedded documentation
    • Modify constraintSpec to allow TEI documentation? Would have to preprocess to remove tei:* from the Schematron.
    • Child of Durand: SB: bad move, let’s not reinvent Schematron. (But then why Durand?)
    • Are there simple rules that could be implemented as simple attribute values?
    • One advantage of a native TEI version of Schematron would be that the names of elements are abstracted by the customisation. (E.g. change the name of ‘p’ to ‘para’ and the Schematron will still work.)
    • Middle ground between Durand and Schematron. Use the same attribute-value-template {$foo} that XSLT uses? Plausible suggestion, but low priority.
  • Short-cut content models based entirely on model classes (<classes><containerOf key="model.divLike" mixed="true"/></classes>)
    • SB: Really bad idea, insanely bad.
    • Optional alternative way to do it, removes ordering
    • classes container expresses relationship to everything above/below
    • SB: Order, cardinality, alternation all expressive and important
    • LB: case for this where there is mixed content, simplifies content models
    • Probably better done in the Burnard Resolution
  • Subclasses:
    • Make a subclass of div, necessarily containing 2 heads, …; it must exist within a particular context, otherwise it would fight with the other div.
    • Forking <div> to <chapter> but remembering its inheritance, so new attributes are inherited.
  • ODD as is: How can we improve support/training/community/ODD collections?
    • More flagging of the power of Schematron, and pointers to tutorials on how to use it
    • More ODD tutorials, step through website
    • Roma doesn’t do Schematron (and lots of other things): should we be explaining these? Or getting a better version of Roma?
    • On the TEI website, you get the RelaxNG content model but no link to the elementSpec as a whole.
    • Should ODD have ‘I want a new element that is a fork of tei:foo or a clone of tei:foo’ (fork inherits, clone static copy)
    • Corpus of sample ODDs more accessible? Can get examples from Roma, if you select it then save it, but this is not obvious.
    • Various people are teaching: link to their online materials more clearly?
    • ‘ODD’ as a term is used in different ways; maybe be more consistent
    • Most useful effort seems to be on front-end Roma development, perhaps the most useful thing that Council could do with its resources
    • But good documentation editing means you need a full TEI?
    • The sample templates of ODDs need to be better designed. By Council? By community?
    • SM: Abolish tei_all, as it is poisonous in not encouraging people to customise/document their encoding? The important part is document modelling and documentation of encoding practices.
    • Get more users doing customisations and using ODD to do so.
    • Roma as bottleneck if we encourage users to start with tei_bare
    • TEI Conformance is hidden in the Guidelines and could be foregrounded… part of Roma…
    • More circular documentation of the TEI: document modelling, customisation, and encoding as a circular process
    • Same problem with the set of customizations available in oXygen as from the front page of Roma… should we be tucking those from the wiki into these lists also?
    • index of collection of ODDs by what elements they include
  • Exploration of crystals
    • We went through the above Crystal Maze section and looked at possible implementation (difficult) and what changes mean and what constitutes an error.
  • paper/whiteboard-based exploration of Lou’s solution to Durand Conundrum
    • Probably don’t need <pcData>, but rather @allowText? But you might want element, element, pcData, element. Consensus: better to have <pcData>, but under a different name.
    • Issues to discuss:
      • key= to other namespaces
      • why not include <attRef>s?
      • more thinking on <macroRef> (? datatype)
      • <pcData> should be called <textNode>
    • LB agrees to go away and implement an ODD for this, then look at
  • Group-writing grant application
    • Agreed it is a good idea for TEI community members to put in a bid for rewriting Roma, but with a backdoor workpackage on modifying the underlying language
  • If we had the money, what is the best thing to spend it on:
    • ODD Processor? (And what sort of processor?)
    • New ODD Editor? (funding more likely?)
      • Consensus was that a new ODD editor was probably more fundable and more likely to provide benefit to the overall community.
      • But it was thought that any new editor should be flexible enough to cope with the planned additions to the language.
      • It was suggested that it would be beneficial if this was not created/maintained by Oxford or the TEI-C itself but as a community effort… but all agreed that this was hard to generate.
  • Other thoughts
    • Better ways of visualising the information that is in an ODD
      • Visual modelling methods; ODD as intermediate layer between UML and schemas
      • And to see how they have customised the TEI / see differences
      • ODD/Roma for educational purposes
    • Ability to combine ODDs together
      • Grabbing ‘crystals’ from any ODD/customisation (not so flat customisation)
    • Implementation
      • ODD needs to be better specified / more robust as a language
      • Need for re-implementation of separate ODD processor
      • What tools do we need?
    • Note: Target audience not in this room: make easy things easy
    • Documentation side of ODD
    • Mappings to semantic web / LOD
    • MEI: Module inter-dependency
    • <constraintSpec> mechanism for inserting Schematron a bit cumbersome
    • Easier ways to reduce a content model but maintain conformance: currently means learning RelaxNG as well, and maybe Schematron, etc.
    • ontology reference of TEI
    • The relation of ODD to Standoff annotation
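
For reference, the <constraintSpec> mechanism described above as cumbersome can already express the attribute/content interdependency case from these notes (if child::foo is present, then @y must be used). The sketch below is illustrative only: `bar`, `foo`, and `y` are placeholder names, the `scheme` value shown is the one in use for ISO Schematron at the time of the meeting, and the `tei:` prefix is assumed to be bound by the ODD processor (or by an <sch:ns/> declaration elsewhere in the ODD).

```xml
<!-- Sketch: co-occurrence constraint embedded in an ODD via Schematron.
     Element and attribute names (bar, foo, y) are placeholders. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="bar" mode="change">
  <constraintSpec ident="foo-requires-y" scheme="isoschematron">
    <desc>If a foo child is present, the y attribute must be supplied.</desc>
    <constraint>
      <!-- fires only for bar elements that actually contain a foo child -->
      <sch:rule context="tei:bar[tei:foo]">
        <sch:assert test="@y">A bar element containing foo must carry the y attribute.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

The rule context hard-codes the element names, which is the weakness noted above: renaming `bar` in a customisation would silently break the rule, whereas a native TEI constraint language could follow the rename.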