Where do I put edw77 then?
1082805310 | 95949 You should create a new "editors work" module in CVS if you want to manage edw77 there [rahtz] |
The Guidelines should explicitly address a suggested method for local documentation of the project's encoding "handbook". See http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0110&L=tei-l&D=0&P=3788
1095681563 | 95949 I agree that any project should have a handbook, but I don't see that the TEI should mandate how to do it. If you use Roma to make the extension, you can use the ODD document it creates as a useful container; it can be processed to make a subset of the Guidelines. I don't see this as a feature request [rahtz] |
1094253640 | 1021146 What do you have in mind here? It's definitely a good idea, and roma makes some significant steps towards making it a lot easier for people to generate project specific documentation of their schemas, but a manual of project usage is not something I'd have thought we can legislate for. Other than to agree emphatically that it's a good idea! [louburnard] |
Hello. If I do something like: java com.thaiopensource.relaxng.util.Driver hamlet.rng hamlet.xml on test files included in P5 distribution, everything seems to work fine. But I cannot figure how to execute the Makefile, since my system does not know of jing and friends. A few words of explanation would help. Kind regards, Yves
1095529070 | 95949 I am sorry, this should have been answered a long, long time ago. That set of test material does assume a certain setup; the Makefile was really there for my internal purposes. We are about to replace the test suite with something more rational, and with a demonstration setup. You can get jing from www.thaiopensource.com if you need to in the short term. Again, apologies. Sebastian Rahtz [rahtz] |
The <respStmt> element in P4 is far too permissive. At a minimum it should be element respStmt { attlist.respStmt , tei.Incl*, ( name, tei.Incl* ), ( ( resp, tei.Incl* )+ | ( ( name, tei.Incl* )+, ( resp, tei.Incl* ) ) ) } although perhaps even the tei.Incl class is too permissive here. See attached paper for detailed discussion.
1095441047 | 1021146 What are these tei.Incl elements doing in here anyway? If <respStmt> is in the header, they shouldn't be allowed! I suggest a content model of ((name, resp?) | (resp, name?))+ [louburnard] |
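As an illustration only, an instance that the content model suggested above, ((name, resp?) | (resp, name?))+, would accept might look like the following. The names and responsibilities are invented for the example.

```xml
<respStmt>
  <!-- first group: (name, resp?) -->
  <name>Jane Example</name>
  <resp>transcribed by</resp>
  <!-- second group: (resp, name?) -->
  <resp>proofread by</resp>
  <name>John Example</name>
</respStmt>
```

Note that no tei.Incl elements appear between the children, in line with the comment that they should not be allowed in the header.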
The current <figure> element has a content model which permits all sorts of textual things within it. P4 makes clear that the intention is that these should be transcribed from the image whose presence the <figure> denotes. The image itself is indicated (in pure P4) by an entity attribute pointing to an external graphic entity, or (in practice) by a URL attribute. There is no wrapper element combining a graphic with its heading etc. There is no way of embedding graphic content expressed using SVG or even the TEI tags for trees within a <figure>, though there is a <figDesc> "surrogate" for the graphic information. I propose to address these concerns as follows: 1. Introduce a new core element called <graphic> with a URL attribute (or whatever the TEI Workgroup on Standoff finally decides it should be called). 2. Introduce a new wrapper element to hold text content of a graphic, called <figText>, with a content model like the current <figure> content model. 3. Introduce a new class tei.figContent, with members <graphic>, <eTree>, <eg> (or whatever we decide to call that thing), and <figText>. 4. Redefine the content model of <figure> to be (%(tei.figContent), figDesc?, head?). What I don't know is how to add an SVG element into the mix. Ideally, I would like to add <svg:something> to the tei.figContent class. Suggestions?
1095683097 | 95949 I suggest that <figure> be recursively allowed within <figure>, which makes a lot of things easier. Then the model would be more like (tei.figContent | figure)+, head? [rahtz] |
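A rough sketch of what the recursive model (tei.figContent | figure)+, head? might allow in an instance. The attribute name url on <graphic> is an assumption (the proposal leaves the name open), and the file names and headings are invented.

```xml
<figure>
  <graphic url="plates/overview.png"/>
  <!-- a nested figure, as the recursive model would permit -->
  <figure>
    <graphic url="plates/detail.png"/>
    <head>Detail of the upper margin</head>
  </figure>
  <head>Overview, with one detail</head>
</figure>
```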
A new <email> element should be permitted as a child of <address>.
1096038809 | 612078 Why not include a single generic URL/ICQ/IM/Other element with a descriptive attribute? <address> [normal address stuff] <resource type="url">http://purl.org/cummings/family/</resource> <resource type="email">james-spam@cummingsfamily.org.uk</resource> <resource type="jabberID">JamesC@jabber.org</resource> [etc.] </address> Not wonderful I agree, but this at least allows for the wedge that you mention to expand infinitely. -Jamesc [jcummings] |
1095785134 | 1021146 OK. This might also help deal with the FAQ "where do I put the PURL for this resource when I am making its header". See my reply to Stuart Yeates' query on TEI-L [louburnard] |
1095681092 | 95949 I agree, the content model of <address> should be a class, of which addrLine and email are members [rahtz] |
1095440497 | 1021146 Thin end of wedge imho. What about phone number, fax number, ITIN, ICQ id... At present <address> has specific semantics of snailmail physical address, which is why its content model is addrLine+. Better to define a tei.address (or something) class, I think, with members address and email and whatever else comes along. [louburnard] |
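For comparison with the <resource> example above, here is a sketch of the variant in which <email> sits inside <address> next to <addrLine>, as the original request and one of the comments suggest. The <email> element is only proposed in this thread, and the address details are invented.

```xml
<address>
  <addrLine>1 Example Street</addrLine>
  <addrLine>Exampletown</addrLine>
  <!-- proposed new child, not part of P4 -->
  <email>user@example.org</email>
</address>
```

Under the alternative tei.address class, <email> would instead appear as a sibling of <address> wherever the class is allowed.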
Please add a <figBody> child to <figure>, which in turn can have <tree>, <eTree> or similar elements as children.
1094056965 | 1021146 Thank you for the example. I am about to submit a proposal for an enhanced <figure> element which I think will address this and other cases. [louburnard] |
1092643795 | 950793 Here is an example: <figure> <head>An example</head> <figBody> <eTree> <eLabel>S</eLabel> <eTree> <eLabel>Int</eLabel> <eLeaf> <eLabel>hello</eLabel> </eLeaf> </eTree> <eTree> <eLabel>N</eLabel> <eLeaf> <eLabel>world</eLabel> </eLeaf> </eTree> </eTree> </figBody> </figure> A typical formatting of this <figure> element would be: S /\ Int N | | hello world Figure 1: An example In P4, a <figure> can have the following children (apart from globally included elements): <head>, <figDesc>, <p>, and <text>. According to chap. 22, none of those is appropriate as a container for the *graphics* (e.g., <eTree> or SVG content) itself. [nolda] |
1092606853 | 1021146 Please could you expand this suggestion, preferably giving a usage example? The <figure> element does of course have content. As I asked in my earlier note, would it not be easier to represent a <tree> using the SVG vocabulary? [louburnard] |
In linguistics, examples, glosses, or parts thereof are often prefixed with symbols like "*", "?", "?*", "(?)", "#" etc., specifying the type or degree of (un)acceptability. (Unfortunately, there is no consensus as to the inventory and exact interpretation of these symbols.) These specifications could be easily represented in TEI by an optional "accept" attribute on <mentioned>, <gloss>, and <seg>. The value of the attribute should directly give the symbol instead of some fixed meta-value like "yes" or "no".
1095873016 | 1021146 We agree that this should be investigated further, and probably involving definition of a new attribute class in the certainty module. [louburnard] |
1095640947 | 686243 While the whole idea is new to me, my first reaction is that this is a fine idea, and Lou's right, should probably include quite a few elements in the class of elements that gain this attribute. On the other hand, it is worth spending at least a little time and effort considering other mechanisms for achieving this goal before settling on one method. Possible examples follow, not at all thoroughly thought through, rather just thrown out to demonstrate that there are a lot of possibilities. * A new <acceptability> empty element which would operate much like the current <certainty> (and which might fit well into the certainty module) which would bear target=, given=, degree=, and desc=. (And rend= for the actual symbol; or perhaps the symbol should be the value of degree= if a more precise degree is not known.) * A new <accept> element which operates somewhat like <sic>, but has degree= to indicate the level of (un)acceptability. * Using ana= to point to an element that describes the element's (un)acceptability. It also occurs to me that this problem isn't so dissimilar to the problem of wanting to demonstrate how *not* to do something by putting incorrect grammar, erroneous computer code, or (perhaps the worst for us) invalid XML into an <eg>. [sbauman] |
1093211184 | 950793 The purpose of the proposed "accept" attribute is to encode a meta-linguistic acceptability statement on some linguistic entity. Of course, an acceptability statement is more than a rendition specification of the entity it applies to (but see below). Unfortunately, linguists often record acceptability statements by vague and partly undefined marks, though. While "*" is usually interpreted as 'unacceptable', the interpretation of other notations like "?", "?*", "(?)", "#", "%", "**" etc. is rarely specified explicitly. Thus, someone encoding or quoting a pre-existing linguistic text with undefined acceptability marks has to encode their original form--i.e. their original rendition. The proposed "accept" attribute could of course be applied to additional elements besides <mentioned>, <gloss>, and <seg>--namely to all elements representing some linguistic entity, including rule-like entities (recall the Generative practice of attaching acceptability marks--or, rather, grammaticality marks--to rules or parts thereof). By the way, the <seg> element may be useful for marking up acceptability statements with a restricted scope, for instance: <mentioned>Bill loves/<seg accept="*">love</seg> Mary.</mentioned> [nolda] |
1093198681 | 1021146 Intriguing suggestion! Would you not also want it for <eg>, <q>, <cit> etc? though, e.g. to mark spurious sentences and quotations? The certainty module provides attributes which might be of relevance to this application. Again, if the only purpose of the proposal is to affect the rendition of the output, I think the REND attribute is a better example. [louburnard] |
At present, the elements you need to record information about real people in a P5 document are scattered across several modules. The Corpora module defines <partic> and <particDesc> for participants in a transcribed text. These allow for any number of so-called "demographic" subelements, such as <birth>, <occupation>, <residence> etc. The Names and Dates module defines <persName> and various subcomponents for names of people, but resolutely eschews any attempt to describe reality: it's onomastic, rather than prosopographic, by design. The new Manuscript Description module defines a <listPerson> element and a <person> element to fill it. This contains some of the same elements as <partic> but also adds some. For example, <birth> is in the corpora module, but <death> is in the MS one. (Not really surprising, since the people manuscript describers are interested in are usually dead, whereas the people corpus encoders are concerned with usually aren't.) We could resolve this confusion by creating a new standalone module, or by grouping all related elements in one place. More precisely: (a) we could define an entirely new module concerned with prosopographic information, defining <listPerson> and <person> and their children, and remove <partic> and <particDesc> from the corpora module; (b) we could do essentially the same but add the new elements to the core module; (c) we could do essentially the same, but add the new elements to the names and dates module. The reactions to this proposal on TEI-L (9-10 May 04) so far suggest that option (c) is preferred.
1095530909 | 686243 I like the idea, and like others on TEI-L don't like (b). I lean slightly towards (a) a new separate module, over (c) adding to Names & Dates. However, the division of various detail elements (those for names and "corporate bodies, dates, events, places, objects, roles, techniques" etc.) into modules needs to be attacked as a project of its own. It may be a good idea, e.g., to separate names and dates into different modules. My first instinct is that Bruce Robertson's HEML is a good place for the TEI to look at as a resource when building this new prosopography module, but that the Guidelines should not just recommend "use heml:Person" and go on. [sbauman] |
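To make the proposal concrete, a hypothetical record in the unified scheme might look like the following. Which children <person> would finally allow is exactly what the proposal leaves open, so the choice of <persName>, <birth>, <death> and <occupation> here is an assumption, and the person is invented.

```xml
<listPerson>
  <person>
    <persName>Jane Example</persName>
    <!-- today <birth> lives in the corpora module, <death> in the MS module -->
    <birth>1820, Exampletown</birth>
    <death>1891</death>
    <occupation>scribe</occupation>
  </person>
</listPerson>
```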
The rend= attribute exists to record the rendition of the source. Since the elements in <teiHeader> are metadata elements created by the encoder, rend= makes no sense (except for on <rendition>, see 1022072). Thus rend= should not be permitted on elements that do not occur as a descendant of <text>, but rather only as a descendant of <teiHeader>, e.g. <encodingDesc>.
1096826154 | 1021146 The issue is not whether or not restriction as such is a good thing. The issue is whether or not it makes sense to remove the ability for people to specify how header elements should be rendered. Both Sebastian and I are of the opinion that it does not. Your assertion that rend "exists to record the rendition of the source" tallies with the strict letter of its definition, but I think there is a fair bit of case law suggesting that people also use it to suggest how they'd like something to be rendered, e.g. when writing new TEI documents. You also haven't addressed my point about other metadata like elements which still carry the rend attribute. [louburnard] |
1096672679 | 686243 Sorry I've been so delayed in responding, Natasha. I'm afraid I don't understand your question, though. Do you mean "why not permit rend= on (e.g.) <encodingDesc>", or "why not restrict people so that they can't put rend= on (e.g.) <encodingDesc>"? I see no reason not to make this restriction (except that it's a lot easier for the editors to manage a single truly "global" pot of attributes than one set that is "global" and one that is "global except for header-specific elements" :-), so I'll briefly answer the former. One of the reasons to have a shared vocabulary for XML documents is so that we can have a shared understanding of what the markup means. But if there is markup that (sort of definitionally) doesn't make any sense, it's hard to have a shared understanding. There exist vocabularies with well defined semantics for altering the presentation of XML documents: they're called stylesheets. For structured data that we create (e.g., the <teiHeader>), stylesheets are more than adequate. (The same is not entirely true of data that we capture, of course.) So the answer to "why restrict people?" is a) to help them not shoot themselves in the foot, and b) because the restriction helps create a more coherent encoding scheme, with a clearer distinction between data and metadata. [sbauman] |
1096667582 | 1124399 Honestly, I do not understand why not? sorry if i miss the logic. [natashasmith] |
1095681300 | 95949 I agree with Lou. Why restrict people? [rahtz] |
Wouldn't it be consistent with the naming scheme of the rest of the document to have layoutNote as a child of layoutDesc?
In 1.2 it says that <origDate> has no attributes other than those globally available. In the formal description, however, tei.datable is listed. Besides this: when msHeading was removed, were there any suggestions where to put the information of origDate instead? As a phrase-level element it can be used everywhere, but where do I find (and put) the information about "the whole manuscript", the way many catalogues provide the information? (And German catalogues HAVE TO!)
1096540347 | 1125795 PS: The same question as for origDate arises for origPlace. origDate and origPlace might be put in <origin>, but how to deal with multiple origDates in a document, e.g. in history/origin and in bindingDesc and in msPart? And where goes <textLang>? [schassan] |
Hi, concerning the documentation of the manuscript description element: the element watermarks is missing both in 1.6.1 Support, list of subcategories (although mentioned in example 1.65), and in the formal definition of support. Greetings, Torsten Schassan
1096546542 | 1125795 I generally agree with your idea of NOT having an element of its own for watermarks anymore, because from cataloguing practice we can see that this information is often combined with the material in <support>. Anyway: How do you see the problem that the watermark does not only consist of the former <term>, now <watermark>, but also of the <ref> which enables the exact identification? Isn't the connection somewhat like in <cit><note><xptr/></cit>? How can we represent this? And: How do we deal with lists of references mixed with dates like in the following example? Wasserzeichen: Dreiberg PICCARD VII 2317 (1477, 1478), 2319 (1474-1478), Ochsenkopf PICCARD XI 259 (1476), 342 (1470-1475), XIII 188 (1473-1475), ähnlich XIII 677 (1476), XV 360 (1474-1475), 515 (1467-1471), XII 851 (1470-1472). Shouldn't watermark be of the class "datable"? Watermarks are important tools for dating the manuscript. [schassan] |
1096322691 | 1021146 This will be corrected in the next draft. While you're there, what do you think the content for <watermarks> should be? Can you supply us with some examples? [louburnard] |
Please add a general purpose element for theorems, definitions, and similar displayed text blocks. This element, say <theorem>, should have the following content model: (head?, p+) Its attributes include besides global ones: * "type" (e.g., "theorem" or "definition") * "typeN" (the theorem number, as in "Definition 3") The "n" attribute of <theorem>, however, should be reserved for specifying running numbers of displayed blocks in general. Consider the following example: <theorem n="12" type="definition" typeN="3"> <head>Multiplication</head> <p>...</p> </theorem> which could be rendered as: (12) Definition 3 (Multiplication) ...
1096444643 | 950793 The problem is that, according to my view, <head> should be optional: <theorem n="12" type="definition" typeN="3"> <p>...</p> </theorem> corresponding to: (12) Definition 3 ... The 'bare' label "Definition 3" can be generated from the "type" and "typeN" values. [nolda] |
1096322306 | 1021146 But surely this kind of numbering scheme (where individual items get a special number, as well as the general sequence numbering) is not specific to theorems? A picture might, for example, be Figure 123 with the title "(3) A moonlit scene" Would a <label> within the <head> not be a more generic approach to solving this kind of problem? [louburnard] |
1096295005 | 950793 Regarding Syd Bauman's comment on the proposed "typeN" attribute: As my original example shows, theorems can have up to two numbers: a theorem-type related number ("3" in my example above) and a common running number for displayed elements like formulae, examples, and theorems ("(12)"). That may be bad style, but can nevertheless be found in real-world papers. So, if the encoder (or the author, for that matter) wishes to specify both numbers of the theorem explicitly, he needs two attributes for that purpose. [nolda] |
1095873158 | 1021146 I wonder if this <theorem> thing is a possible child of the redefined <figure>, along with <eg>? [louburnard] |
1095654684 | 686243 I agree that a theorem needs someplace to go in the TEI scheme. For the time being (i.e., for encoders using P4), probably an <ab type="theorem"> is the way to go. I think it's worth trying to collect the set of things that tend to be treated in the same way (not part of the running prose, but part of the main text unlike a <note>; generally occur at a specific spot in the text, although may float and be referred to from the main text; may have a heading, and may be referred to by a number or other label; often rendered by being indented on both sides) and seeing if we can extract any commonalities. (Andreas has already done this for us, to some extent, with "theorem" and "definition". :-) This might permit a single element to handle them all (perhaps with a type= attribute). I don't see the need for the typeN= attribute nor restricting the semantics of the global n= attribute for <theorem>. [sbauman] |
1093199298 | 1021146 Thank you for the clarification. There are analogous structures in e.g. legal and linguistic texts, so I think I agree that it might be helpful to add this concept to the TEI's world view. I'm not sure where to put it though. [louburnard] |
1093019315 | 950793 Theorem-like entities are well-established structural units in scientific (in particular, logical and mathematical) texts. (There are several LaTeX implementations providing environments for them.) Semantically, theorem-like entities are theoretical sentences (definitions, axioms, theorems, proofs, etc.). They are neither quotations (so <q> is not appropriate) nor examples (which could be tagged as <eg>). Unlike <formula>e, 'theorems' are often formulated in natural language. In general, 'theorems' are formatted as a block. Normally, a 'theorem' has a label such as "Definition" or "D" specifying its type. In many cases, canonical references are constructed from the label and a proper counter ("Definition 3" or "D3"). Sometimes, 'theorems' are numbered instead by the general counter used for all numbered displayed block elements--including <formula>e--(e.g. "(12) Definition"). There are also cases, where both numbers are given ("(12) Definition 3"). Finally, 'theorems' can have an optional header, supplying some characterization or 'nick-name' for it, for example: "Definition 3 (Multiplication)" or "Theorem 5 (Gödel's Incompleteness Theorem)". [nolda] |
1093016704 | 1021146 What is the defining characteristic of a "theorem"? If it's simply that it has to be formatted in a particular way, maybe it would be better to specify "theorem" as a value for the rend attribute on some more generic element such as <q> or <eg> or <formula> ? (OK, I know, <eg> isn't in P4 yet...) Alternatively, if <theorem> is actually intended to have some semantics, could you spell them out a bit for us? Not the same as the existing <formula>, I assume. [louburnard] |
Hi again, it would be useful to allow the use of the attributes of layout (columns, ruledLines and writtenLines) in subdivisions of layout (p?, span?). Thus it would be possible to specify in more detail where in the codex a certain layout aspect applies. Right now I have to encode: <layout columns="2-3"> <p>1r-v zweispaltig, 2r-10v dreispaltig.</p> </layout> It would be possible to write this: <layout columns="2-3"> <span columns="2">1r-v zweispaltig</span> <span columns="3">2r-10v dreispaltig</span> </layout> An even more complicated example: (the short form without locus and dimensions) <layout writtenLines="35-43">Schriftraum: 1r-202v: 19-19,5 x 10 cm, 35-37 Zeilen; 205r-398v: 20,5-21,5 x 10-12,5cm, 38-43 Zeilen. </layout> Here, it would be useful to be able to encode: <layout>Schriftraum: <p writtenLines="35-37">1r-202v: 19-19,5 x 10 cm, 35-37 Zeilen;</p> <p writtenLines="38-43">205r-398v: 20,5-21,5 x 10-12,5cm, 38-43 Zeilen.</p> </layout> Additionally it would be helpful to have separate elements for the description of the written space as well as for the way of ruling. Greetings, Torsten Schassan
1096331070 | 1125795 But the DTD does not allow the use of multiple layout elements! I suggested elements instead of attributes because I see various "functions" within layout: as we want to describe the number of lines, the columns, the written space, the pricking etc., in my opinion layout calls out for subdivisions as any other element with more complexity does. A description of the pricking might be wanted as well. Within CEEC we used an element called preparationOfThePage. Do you think this would fit? [schassan] |
1096322950 | 1021146 It seems to me that the way to do this properly might be to define multiple layout elements, associating each one with the appropriate locuses to which it applies. This is not too difficult a modification of the current model. Your suggestion that we should include description of written space is noted. How about prickings? [louburnard] |
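The suggestion of multiple <layout> elements, each tied to the leaves it applies to, could be sketched roughly as follows for the two-column/three-column case from the original request. This is an assumption about how the modified model might look: the use of <locus> with from and to attributes to carry the range is hypothetical here, and as the reply above notes, the draft DTD does not yet permit repeated <layout> elements.

```xml
<layoutDesc>
  <!-- one layout per stretch of the codex -->
  <layout columns="2"><locus from="1r" to="1v"/> zweispaltig</layout>
  <layout columns="3"><locus from="2r" to="10v"/> dreispaltig</layout>
</layoutDesc>
```

Compared with the <span columns="..."> variant, this keeps the attributes on <layout> itself and needs no new subdivisions.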
In the current draft chapter on the manuscript description there might be an inconsistency, and although it has been changed, I still have a need concerning the former msHeading: In 1.1 Overview it is said that the msDesc may contain msIdentifier, head, msContents and so forth. Example and formal description of msDesc state: msDescription.content = ( msIdentifier, p | msContents | physDesc | history | additional | msPart* ) There head seems to be missing, but maybe it only needs clarification. Anyway: In Germany, according to the cataloguing rules, we must supply main author/s (and sometimes title/s of the major work/s) as the first information in a catalogue entry. In the form proposed, the element head shall not contain author or origDate or origPlace, like it was used in MASTER. The alternative p's don't make clear their function for the catalogue entry in general. Therefore, either the p's should have the type attribute required to provide something with the function of the former msHeading in P5, or something like msHeading has to be kept. On the other hand: If we keep something like msHeading, I see the need for a group mechanism of author and title within it. The problem: especially in catalogue entries for manuscripts consisting of msParts you may face the situation that you have more than one author/title combination in msHeading. How do we make clear which author belongs to which title? Example: taken from Sankt Gallen, Stiftsbibliothek, Codex 658 (as can be seen at www.unifr.ch/cesg) Titel: (1) Robertus Monachus: Geschichte des 1. Kreuzzugs (bebildert); (2) Ottokar von Steiermark: Österreichische Reimchronik: Fall Akkons. Suggestion: Not a real one, but as far as I can see the use of the attribute 'n' doesn't seem to be very satisfying. Nor do I want to rely on the order of elements in the file. Open to discussion,
1095784942 | 1021146 The Taskforce felt quite strongly that <msHeading> should be removed in favour of a generic <head> element. However, I think you have correctly identified a discrepancy between what the text says and the schema permits, which I will fix as soon as possible. I think the intention of the workgroup was to allow msDescription.content = ( msIdentifier, head*, (p | msContents | physDesc | history | additional | msPart*) ) and I will amend the draft accordingly. This would allow you (at a pinch) to mark up your example as <msIdentifier>Sankt Gallen, Stiftsbibliothek, Codex 658 </msIdentifier> <head n="1">Robertus Monachus: Geschichte des 1. Kreuzzugs (bebildert)</head><head n="2">Ottokar von Steiermark: Österreichische Reimchronik: Fall Akkons. </head> However, the Taskforce felt that the <head> element should not contain any more structure within it: a more structured view should be provided by the <msContents>, for example. Your comment about the requirement to satisfy German cataloguing rules is a good one, but we have to recognize that other cataloguing practices exist and may be just as mandatory. It seems better therefore to define a DTD for local practice which enforces them, and which can be automatically mapped to the proposed TEI structure. The fewer options that structure offers for different ways of doing the same thing, the easier that mapping will be. [louburnard] |
Please add an optional "type" attribute to <mentioned>, specifying the ontological or linguistic type of the mentioned entity. Sample values could include: * term (typically rendered in quotes) * symbol (typically rendered as is or in quotes, too) * graphs (typically rendered in italics) * graphemes (typically rendered as "<...>") * phones (typically rendered as "[...]") * phonemes (typically rendered as "/.../")
1096294026 | 950793 So far three different approaches have been suggested on this page: 1. <mentioned type="graphemes">Karl</mentioned> 2. <mentioned><seg type="graphemes">Karl</seg></mentioned> 3. <seg type="graphemes">Karl</seg> (or, "for local usage" <grapheme>Karl</grapheme>) In my view, (3) does not express that "Karl" is mentioned as opposed to being used. <seg> just attaches some type to its content and is neutral with respect to the mention/use distinction. (Cf. a corpus encoder tagging some lemma by <seg> with part of speech-information: he does not imply that the lemma is mentioned, instead of being used, in its context.) As to (2), that method is more verbose than (1). What is more, (2) is also less restrictive than (1). As far as I can see, a string is always mentioned *either* as a sequence of graphemes *or* as a sequence of phonemes *or* as whatever; there are no 'mixed' real-world examples like: <mentioned> <seg type="phonemes">Karl</seg> <seg type="graphemes">kommt</seg> </mentioned> In sum, I'd opt for method (1). [nolda] |
1095637334 | 686243 Maybe I'm being dense, but when is a grapheme or a phoneme or a phone ever actually used as opposed to mentioned? That is, isn't some element (<seg type="phoneme"> jumps to mind for interchange, although one might prefer to have a <phoneme> element for local usage) that indicates its content is a phoneme sufficient without being inside a <mentioned>? Perhaps some examples will help straighten me out. [sbauman] |
1093211290 | 950793 The value of the proposed "type" attribute should *not* specify the rendition. (The examples in my proposal above only gave *typical* renditions.) In fact, the rendition of, say, a <mentioned type="graphs"> element can vary according to style or context (e.g. block vs. inline). Of course, you can always specify the ontological type of the mentioned entity by some subelement. Personally, I'd prefer <seg> to <ident> because <seg> also allows for non-PCDATA content. However, I'd like to argue that an (optional) type distinction on <mentioned> makes sense. A mention always involves mentioning something; and mentioning a word, say, qua graphic entity or qua graphematic entity results in different mentions. Distinct rendition conventions only reflect these type distinctions. [nolda] |
1093197675 | 1021146 If you simply want this facility in order to provide different rendering styles, then the rend attribute should be used. The purpose of <mentioned> is to distinguish cases of "mention" from use. If you additionally want to say something about the ontological status of the mentioned entity, the way to do it is by embedding a more ontologically-specific tag (e.g. <term>, <ident>) within the <mentioned> tag, I think. (And <ident> *does* have a type attribute) [louburnard] |
Please make <interp> non-empty, thereby allowing for specifying 'values' containing additional markup.
1096291436 | 950793 Personally, I am in favour of Syd Bauman's first, direct way of providing a value. The second, indirect approach would create another indirection: say, from an "ana" reference to an <interp> element and from the latter's "value" to a look-up table. [nolda] |
1095872725 | 1021146 I suggest that the content of <interp> should be the macro.glossSeq, or some subset thereof. Also that <interp> probably wants to join <note> in my proposed tei.pervasive class [louburnard] |
1095591624 | 686243 The value= attribute of <interp> and <span> is one of the thornier ones to deal with in our quest to rid TEI of "content textual attributes". I see two possible ways of thinking about the value of value=. First, we might think of it as being used to provide, directly, the interpretation of the indicated element(s). Second, it might be thought of as a key which is used to look up the actual interpretation, possibly in the human readers' mind. If the first is the case, it obviously makes sense to make the value= attribute a child of <interp> or <span> instead. Possibilities for the child include: * Zero or more characters, without markup. Does not satisfy Andreas's request for allowing markup inside the value. * Paragraph-type content (characters plus phrase level elements, i.e. the P5 equivalent of %paraContent;). * A new element, e.g. <value>, created for this purpose. Besides very clear semantics, this has the advantage that if <value> is allowed to repeat, the elements <interpGrp> and <spanGrp> could be dropped, in favor of permitting <interp> and <span> to contain themselves. The content of <value> would be permitted to contain phrase-level markup, giving it more expressive power than an attribute, and satisfying Andreas's request. * A <p> (or perhaps <seg> or <ab>), or perhaps one or more of them. This has the advantage of using an already existing element. In all of these cases the intent is that the interpretation is written out long hand, as it were. So one can imagine rather than just "foreshadowing", an interpretation with more detail: "foreshadowing <name key="LS">Luke</name>'s return to <place key="YP">Dagobah</place> in <rs key="SWE6RotJ">episode 6</rs>." One can easily imagine, of course, that folks would use this for other things, e.g. for commentary on the interpretation itself, or for explanations of why other interpretations were dismissed, etc. 
It is not at all clear to me whether such use would be a good thing or a bad thing, an argument in favor or against this permissiveness. If the second possibility (that the value of value= should be thought of as a key used to look up the interpretation) is the case, then the value of value= (e.g., "aftermath") should not be thought of as a complete interpretation itself, but rather as a key with which the user can determine the interpretation, e.g. via a computer table look-up or by knowing what the word means in a natural language (sort of a table look-up in the brain, as it were). The actual interpretation could be spelled out somewhere ("the section of the narrative following the climax which describes the negative results of the protagonists' actions" -- OK, I'm not a literature scholar, you get the idea) or left open to interpretation. In the former case, the value itself is just a string; it may be purposefully designed to resemble a word in some natural language, but as far as computer systems are concerned could just as well be a random sequence. In the latter case some mnemonic string is used to jog a reader's memory, as it were. In neither of these cases does using internal markup seem to be necessary or even make sense. On the other hand, if value= is just a key into a (computer or mental) table lookup, why not call it key=? For that matter, why not permit it to actually point directly to the full interpretation (by being a URI)? In which case the full interpretation could contain any kind of markup you like, i.e. might even be in a markup language other than TEI. [sbauman] |
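To make the two readings of value= concrete, here is a minimal sketch. It is an illustration only: the <value> child element and the key= attribute on <interp> are hypothetical (they are options floated in the comment above, not part of P4 or of any P5 draft), and the id values, the type value, and the URI are invented.

```xml
<!-- Reading 1: the interpretation is written out directly, with markup.
     <value> is a hypothetical child element, not an existing TEI element. -->
<interp id="int1" type="structure">
  <value>foreshadowing <name key="LS">Luke</name>'s return to
    <place key="YP">Dagobah</place> in <rs key="SWE6RotJ">episode 6</rs>.</value>
</interp>

<!-- Reading 2: the attribute is only a look-up key. key= on <interp> is
     hypothetical here, and the URI is a made-up example. -->
<interp id="int2" type="structure" key="http://example.org/interps#aftermath"/>
```

Under the first reading the markup inside the interpretation travels with the document; under the second, whatever sits at the other end of the key can use any vocabulary at all.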
The group that has been working on bibliographic entries for TEI makes the following proposal. Please see the attached text file.
1095872245 | 1021146 This is a proposal to revise the content model for biblStruct to facilitate better automatic processing and management of citations. It needs closer attention by the Council, but I think it is basically a good idea. [louburnard] |
I would like to suggest adding a metadata element to the header (maybe under profileDesc?) This element would have a series of keys and values; e.g., <profileDesc> <metadata> <metadataItem> <key>Provenance</key> <value>Oxyrhynchus</value> </metadataItem> <metadataItem> <key>Location</key> <value>Sackler Library</value> </metadataItem> </metadata> </profileDesc> This would allow people to use their in-house scheme, which may not always be compatible with what TEI already has, or extensions like MASTER. I attach a file (a work in progress) which is an attempt to encode a Graeco-Roman papyrus manuscript. You will see at the bottom of the file my <div type="metadata"> hack to include metadata that I need for the digital library system I am using (Greenstone). It seems to me that this data should be in the header. I did try using the existing conventions, but gave up. Tim Finney
1095912018 | 1047169 Who am I to keep on with this after SB politely says it is a bad idea and SR and LB both say to use another namespace? Nevertheless, I lunge once more. Allowing viral elements might not be a bad idea. It all depends on whether you believe in creation or evolution. With creation, God designs the thing and it works. God's creating agents (LB, SB, SR and, to a lesser extent, the SIG servants) understand the whole design and can see what is good and what is evil. Nevertheless, being somewhat less than God, they are not omniscient: there might be things that are good that they haven't thought of yet. When mere mortals try to use the design, they don't have the knowledge of good and evil. They abuse tags, call for things that are already there, etc. Perhaps the best thing is to encourage them to use the thing, and thereby learn it better. But they are discouraged when something that they want now does not appear to be there. They might try to use an inferior design or even come up with their own. (God forbid!) With evolution, viral elements are what make the whole thing go. The process takes far longer and there are a lot of sorry mistakes along the way. The end result might have useless appendixes. Nevertheless, advantageous elements get in and natural selection weeds out the rest. Oh well. Back to the point. I bow to your collective wisdom. If I may beg your indulgence, some of us novices get frightened when we hear words like "namespace" (my ignorance is showing). Maybe P5 could have a primer with a working example on how to do this? [tfinney] |
1095872039 | 1021146 The general question of how to integrate the TEI header with other metadata frameworks needs to be discussed and has been raised with the Libraries SIG. It will also come up at the Members Meeting. The specific suggestion here of allowing for "probationary" viral elements within the TEI header is an interesting one. I agree with Sebastian, however, that such material probably belongs in a different namespace. [louburnard] |
1095684595 | 95949 I'd recommend using a different namespace for this rogue/probationary metadata [rahtz] |
1095665394 | 1047169 Thank you for what you have written, Syd. The arguments you put forward are no doubt incontrovertible. Even so, I remain a heretic and still want the open-ended metadata hack. Here is why (using "I" vicariously for other TEI novices who know only in part but not yet in full): (1) I don't know what metadata my employer eventually may want -- the requirements change from time to time; (2) I want to be able to do quick and nasty hacks then do it the right way later if worth it; (3) Sometimes a little bit of tag heresy is a good thing. Who knows, you might find this hack will encourage more people to use TEI? (Sly enticement.) Afterwards, keepers of the true faith can lean on the reprobates to purify their TEI (i.e. suggest new tags, have them ratified, then recast their hacky docs to reach the ideal of full interchangeability.) (4) The metadata hack is typically for use with in-house systems, which tend to be short-lived. Anything in the hacked metadata that proves itself worthy of being kept through the ages can be given a canonical TEI home. Maybe think of the metadata hack as a probationary area where potential header-type tags can be tried out? By the way, I set out to use Dublin Core, but gave up trying to bend the metadata requirements of the present papyrology project into the DC shape. Papyrologists want to be able to use a lot of metadata fields (see APIS, for example) -- provenance is merely an example of the desiderata. Things are still in a state of flux. It's the old 80/20 rule -- 20% of the features will be used 80% of the time, etc., etc. [tfinney] |
1095541736 | 686243 While there is a part of me that likes the idea of permitting open-ended metadata, another voice tells me it's a bad idea. The logic, which I admit is not well thought out, let alone perfect, is that if you have a need for a Provenance field in your metadata, either a) no one else knows about it, so if it's in general-purpose metadata elements it won't get used nearly as much as a defined bit of metadata (although, admittedly more than none, which is the flaw in my logic), or b) others know about it and agree it is useful metadata, in which case I'd prefer that we come up with a particular place in <teiHeader> for it. (I.e., you should be making an argument for a <provenance> element somewhere in <teiHeader>.) Note that the suggested construct <metadataItem> <key>Provenance</key> <value>Oxyrhynchus</value> </metadataItem> contains the same information as <provenance>Oxyrhynchus</provenance> and clearly one could be transformed to the other quite easily. The advantages, IMHO, of using the latter rather than the former (even if it requires a project to customize the TEI schema) include * that it can easily be validated that the key is "provenance" not "provennance" (yes, there are methods of validating the generic, but not using schemas generated by the TEI system); * that the new metadata item can be put somewhere in the header that makes sense, e.g. be a child of <publicationStmt>; * that there is a predefined mechanism (the TEI ODD customization system) for documenting its semantics. Especially if P5 codifies methods for using other metadata standards (e.g. Dublin Core, METS, EAD, whatever) as I believe the Library SIG would like to, I don't think generic metadata is a great idea. [sbauman] |
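The remark that one form "could be transformed to the other quite easily" can be sketched in XSLT 1.0. This is an illustration under stated assumptions, not a recommended stylesheet: <metadataItem>, <key> and <value> are the elements proposed in the request (not existing TEI elements), and the sketch assumes every <key>, once lower-cased, is a legal XML element name.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each generic key/value pair with an element named after
       the lower-cased key, keeping the content of <value>. -->
  <xsl:template match="metadataItem">
    <xsl:element name="{translate(normalize-space(key),
                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                        'abcdefghijklmnopqrstuvwxyz')}">
      <xsl:apply-templates select="value/node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

A key containing a space or other character that is not allowed in a name would make xsl:element fail, which is itself a small demonstration of the validation point made above: the named-element form forces the key to be checked.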
I am familiar with TEI lite. When the standard is converted to a Schema, I strongly recommend that you provide a very detailed tag library. Schemas are very difficult to read and interpret. A good model for documentation would be the EAD tag library at http://www.loc.gov/ead/tglib/index.html. Each tag is defined, and possible parent and child tags for the given tag are listed. Examples are also provided. The EAD Tag Library also has a good explanation of linking tags and attributes. The DTD for P4 was relatively easy to search and interpret. The Schema and DTD for P5 are much more difficult. I also would suggest that you provide detailed information about what schema validators will work, provide detailed information on how to install and configure them and provide information on where they can be obtained. Thanks
1085063540 | 1021146 This comment appears to come from someone who hasn't actually looked at the documentation for P5, since it does provide exactly the kind of "tag library" feature being requested, as indeed did P4. The suggestion to provide more information on schema validators is an interesting one. Such info in the Guidelines would be out of place, because rapidly outdated; it would however be very useful on the TEI Software page, and we should not lose sight of this suggestion for that purpose. This being an anonymous comment, I'm not sure where it goes from here, so I am going to close it off. Lou [louburnard] |
The status= attribute of the <teiHeader> element currently defaults to "new". It should not have any default value. Most encoders do not explicitly specify a status= of <teiHeader>, as it is not required, and for many people isn't useful. Then sometime later the encoder comes along and updates the header, without even thinking of status=. Now, because the default for status= is "new", an XML parser will report that the header has not been modified, when in fact it has. This could easily be avoided by not having a default value for status=. (Note that status=, about which the Guidelines say very little, would appear not to be needed when date.created= and date.updated= are used.)
1095871373 | 1021146 I have removed the default value for this attribute [louburnard] |
1095708072 | 1124399 I completely agree with this proposal, and could certainly give some real-life examples where status="new" was never changed when it should have been. [natashasmith] |
1094253709 | 1021146 I agree. In fact I think we should get rid of as many default values as we possibly can, not just this one. I wonder if anyone will notice. [louburnard] |
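The problem described in this request can be reproduced with a small document. The internal subset below is a simplified stand-in for the P4 declaration, written for illustration only (the real DTD declares more attributes and a proper content model), and the dates are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE teiHeader [
  <!-- Simplified stand-in for the P4 declaration, default included. -->
  <!ELEMENT teiHeader ANY>
  <!ATTLIST teiHeader
      status       (new | update) "new"
      date.created CDATA          #IMPLIED
      date.updated CDATA          #IMPLIED>
]>
<!-- The encoder never wrote status=, yet a parser that reads the
     declaration reports status="new" for this updated header. -->
<teiHeader date.created="2001-10-05" date.updated="2004-09-20"/>
```

Declaring status as #IMPLIED instead of giving it the default "new" means a parser reports no status at all unless the encoder supplies one.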
There exist content objects that I would like to have in my TEI file that are neither part of the original work being encoded per se, nor are really part of the "metadata" that is the <teiHeader>, either. Examples include: - the <castList> of a drama if no cast list appeared in the source; - the keyed list of names that occur in the document[1] - <timeline> elements - <linkGrp> or <joinGrp> elements - <note> elements (other than those in the <notesStmt>) that one chooses not to encode in-line or in-place.[2] These elements belong inside the main <TEI.2> element, but do not belong inside the <teiHeader> or the <front>, <body>, or <back> elements. It is arguable whether they belong inside the <text> element or outside. This suggestion is for an element (called <hyperDiv>; the name <ldb> has also been suggested, for "link data block"; alternatively, it might just be an <ab>, although that would make appropriate constraints difficult or impossible) which would occur as an optional single child of <text> before <front>, whose express purpose would be to hold this sort of thing. Notes ----- [1] Were a project to decide to make key= of <name> an IDREF or XPointer, so that it could point to a database stored in the same instance. [2] By "in-line" I mean the standard OHCO method of encoding <note>s at their anchor point in the text. By "in-place" I mean encoding a note where it appeared on the source page.
1095440606 | 1021146 List of names should probably go in whatever replaces <particDesc> in the header. But I agree something like this would be handy. But why not put it in the header? [louburnard] |
The distributor element is only allowed within the publicationStmt element. However, the distributor is often needed when creating a bibliographic entry with a biblStruct element. The MLA and Chicago styles (and probably others) require the distributor of a book. I therefore request that the distributor element be allowed in the imprint element.
1095871153 | 1021146 I have added distributor to the biblPart class [louburnard] |
1093200126 | 1021146 <distributor> was provided as an alternative to <publisher> within <publicationStmt>, for cases where an item such as a digital text was distributed but not "properly" published. For that reason, it was never added to <bibl> or <biblStruct>, though it is of course available in <biblFull>. It wouldn't be unreasonable to add it to the biblPart class though, which would make it available within <bibl>. I don't think it belongs inside <imprint>, since an imprint simply records what a title page *says* about the publication of a work, and has nothing to do with the distribution, or necessarily even its actual publication! [louburnard] |
1092880952 | 663081 I would like to add that it would be helpful to allow <distributor> anywhere in <bibl>. [paultremblay] |
Section 6.3.4 of TEI P4 introduces three elements which don't seem to have much in common, except that they are often typographically distinguished in running text. They are <term>, <gloss>, and <mentioned>. Leaving aside the last of these, which I think really ought to be discussed along with its oft-confused friend <soCalled>, I would like to propose (for P5) a more rational way of grouping together <term> and <gloss> and a small number of other similar phrase level elements. My proposal is to establish a new club for them and a few select others, tentatively called the tei.glossy class. The core module would populate this class with <term>, <gloss> and the following other elements: <desc> -- currently defined in both the new "gaiji" module which replaces the old WSD, and the new tagdoc module which replaces the old TSD <equiv> and <altIdent> -- also defined in the new tagdoc module <trans> -- defined in the dictionary module Making these elements all available in the core would (a) make life a lot easier when trying to build schemas -- you wouldn't have to load the dictionary module just to record that a phrase is a translation rather than original (b) reduce the clutter of near synonyms in the Guidelines -- you wouldn't be tempted to make up your own "translated" element Putting them into a class would (a) make clearer the conceptual structure of the Guidelines (b) enable you to add your own near synonym if you really want to! As this is a proposal for P5, I'm posting this as a SourceForge feature request as well as to TEI-L. Feel free to send your comments to whichever forum you feel more comfortable with... Lou
1095683347 | 95949 It does not really work as a class, because there is effectively nowhere to use such a class. Instead, P5 now implements something similar as a pattern called macro.glossSeq which expands to <rng:optional><rng:ref name="altIdent"/></rng:optional> <rng:zeroOrMore><rng:ref name="equiv"/></rng:zeroOrMore> <rng:optional><rng:ref name="gloss"/></rng:optional> <rng:optional><rng:ref name="desc"/></rng:optional> [rahtz] |
In the <tagsDecl>, the default rendition should be formally specified on the rend= attribute of <rendition>; the content of <rendition> should be a prose description for humans. (This allows systems where there is a rend= value that says "take that element's value and use it" to work nicely; besides, it makes more sense.)
1094253534 | 1021146 If you use the rend attribute of <rendition> for anything it should be to say how the <rendition> element is to be rendered, not how the element annotated is to be rendered. So I'm afraid I disagree with this one too! [louburnard] |
Currently P4 says "A <tagsDecl> … must … contain exactly one occurrence of a <tagUsage> element for each distinct element marked within the outermost <text> element". This restriction (that there be exactly n <tagUsage> elements) is unhelpful. There would be nothing wrong with specifying <tagUsage gi="castList" occurs="0"/> in a drama that did not have a cast list, or <tagUsage gi="emph" occurs="0" render="rend.italic"/> in a document in a project with many files that do have italicized words encoded as <emph>, where there happen to be none in *this* file. Furthermore, there is no reason to insist that a project list all the element types used in <text> just to use the <tagsDecl> in order to specify the default rendition of a single element. Thus this restriction should be softened to "contain at most one occurrence of a <tagUsage> element for each distinct …"
1094253368 | 1021146 Why is this restriction "unhelpful"? It's called tag *usage* so it shouldn't record information about tags that are not being used. If you want to know what tags *might* be used, look in the schema! Your comment about rendition suggests that the underlying problem here is that you want a facility for attaching default rendition information to some (but not all) elements. If there is a change to be made to tagUsage I would argue that the change should be to uncouple the rendition information from the usage information completely, since the two don't fit comfortably. Why not make a proposal for a new tagRendition element? [louburnard] |
Tone Merete Bruvik points out that <addSpan> has a type=, but <add> does not. So I poked around a bit and found the following in P4:2004. * <add> does not have type=. * <addSpan> does, but it is only listed in the reference section, not in chapter 18.[1] * <del> does; a note says that type= "should not be used to record the manner in which the deletion is signalled in the source. This should be recorded using the global rend= attribute, with values such as ‘subpunction’, ‘overstrike’, ‘erasure’, ‘bracketed’." * <delSpan> does, but the 'sample values include' has entries about the manner in which the deletion is signalled in the source. This needs to be straightened out. I think that all 4 should bear a type= attribute (listed both in the prose and the reference section) and the <note> that is included with <del> should replace the "sample values include" list of <delSpan>, although I am certainly open to other possibilities. (In particular this would perhaps be considered by some to be an odd semantic for rend=, as it is being specified on an empty element, but is supposed to apply to all the text between this empty element and the one pointed to by its to= attribute.) Notes ----- [1] I.e., the <tagDesc tagDoc="ADDSPAN"> in file //TEI/P4/Odds/p2ph.odd does not list "del" in the value of its atts= attribute.
The placement of <note> elements seems to be too restrictive. E.g., you can place a <note> as a child of <head>, but not as a child of <opener>, <closer>, or <dateline>; you can place a <note> as a child of <body>, but not of <front> or <back>. Since a <note> may well be used to comment on the encoding of a document, rather than the textual features of the document being encoded, <note> should be permitted just about anywhere. The same is true for <anchor>, since one may want to put only the <anchor> at the spot of interest, and the <note> elsewhere.
1097616111 | 686243 <anchor> is indeed a member of the include model-class ("m.Incl" in P4, "tei.Incl" in P5). It's hard to see why a <note> should not be permitted where an <anchor> is. Why is the point of attachment of a <note> in //TEI.2/text/back/div[@type="notes"] that points to an <anchor> that appears between "</div2>" and "<div2>" more dubious than one that occurs between those tags directly? I'm also not entirely sure why a <note> between "</div2>" and "<div2>" is dubious in the first place. It's a child of <div1> with a <div2> as its left sibling and a <div2> as its right sibling. If it has no target= attribute indicating a point of attachment, then there is some ambiguity as to whether or not it is "attached" to the preceding <div2> or not, but the same is true when a <note> is encoded inside a <p> (especially if surrounded by whitespace :-). While there is a popular convention that permits thinking of the <note> as being "attached" to the preceding textual unit (and whether that unit is a word, phrase, clause, or sentence can often be ambiguous), it is only a convention. And Lou — if you feel disquiet at the idea of a <note> being used to comment on the encoding of a document, you must feel an awful lot of conflict, too. Besides the fact that the Guidelines recommend it (17.1.1), it was you who taught me (circa early 2002) that one can't use an XML comment for this purpose as it won't survive many kinds of XML processing. [sbauman] |
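The arrangement discussed in this comment, an <anchor> between two <div2> elements with the <note> itself held in a notes division in <back>, might look like the fragment below. It is a sketch only: the id value and all text content are invented, and the fragment has not been validated against the P4 DTD.

```xml
<text>
  <body>
    <div1>
      <div2><p>First subsection.</p></div2>
      <!-- Only the anchor sits at the spot of interest. -->
      <anchor id="a1"/>
      <div2><p>Second subsection.</p></div2>
    </div1>
  </body>
  <back>
    <div type="notes">
      <!-- The note lives elsewhere and points back at the anchor. -->
      <note target="a1">Encoder's comment on the gap between the two
        subsections.</note>
    </div>
  </back>
</text>
```

The open question in the thread is whether a <note> placed directly where the <anchor> is would be any more ambiguous about its point of attachment than this indirect form.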
1096040060 | 612078 I think I'd like to be able to comment on XML encoding not in an xml comment. Mostly because some users' systems do not retain the comments when processed (as the spec. allows). Now if the processing was always in my control, that would be a different matter. -James [jcummings] |
1095870846 | 1021146 <anchor> is already, I think, permitted anywhere, or at least wherever tei.incl is permitted. I am less convinced of the wisdom of allowing <note> everywhere, because the "point of attachment" for a note is part of its semantics. If you allow it anywhere, then its point of attachment becomes dubious. For example, if a note occurred between say </div2> and <div2> then its point of attachment would presumably be the parent <div1>. I bet several people would find that counter-intuitive. However, I think there is a good case to be made for defining a new class (shall we call it tei.pervasive?) for content-bearing elements that we want to allow almost anywhere. Then we just have to decide what "almost" means... And finally, may I record my disquiet with the suggestion that a <note> may be used to comment on the encoding of a document. If this means what I think it means, why is it not done with an XML comment? [louburnard] |
1095707876 | 1124399 I fully support this proposal and would like to move it to the highest priority group of issues. [natashasmith] |
With P5, the TEI is committed to a completely new release of the Guidelines, including Schemas, Documentation, Stylesheets and other files, while at the same time also continuing to maintain the current release tree of P4. To facilitate the task of managing, releasing, explaining and using this complex system, I would like to see a well thought out and documented layout of the TEI subtree on the file system. As a model for this, I would like to point to a document from the Debian GNU/Linux distribution, which details how XML/SGML applications are to be handled in the distribution, which is available at http://debian-xml-sgml.alioth.debian.org/xml-policy/xml-dir-layout-file-placement.html The gist of this would be something like the outline below, although the details will of course have to be thought out more clearly.
(.../xml/)/tei/
    custom/
        myTEI/
            schema (customized versions from Roma etc. go here)
            stylesheet
    doc/ (guidelines go here, maybe also versioned)
    misc/
    schema/
        dtd/
            P4
            P5
        rnc/
            P4
            P5
        (and other schemas)
    stylesheet/
        xsl/
            P4
            P5
1095680937 | 95949 I will release a proposal for this at the members meeting in October, instantiated as a series of Debian packages. I hope this will provide a testing ground to clear up the area. Sebastian [rahtz] |
The rend= attribute exists to record the rendition of the source. Since the elements in <teiHeader> are metadata elements created by the encoder, rend= makes no sense on them (except on <rendition>, see 1022072). Thus rend= should not be permitted on elements that occur only as descendants of <teiHeader> and never as descendants of <text>, e.g. <encodingDesc>.
The <castGroup> content model needs to allow <roleDesc>. The attached PNG is an extract from the cast list of Margaret Cavendish’s _A_Piece_of_a_Play_. In it, two <castGroup>s each contain two <castItem>s. However, there is only one role description for each <castGroup>. Ideally, only one <roleDesc> should be permitted in a <castGroup>, but it should be permitted either before or after the series of <castItem>s or <castGroup>s (which may or may not all have <label>s depending on whether feature request 1022100 is enacted). In order to make that clearer, here I have expressed that idea using straight RelaxNG compact syntax (i.e., not a TEI syntax pattern, as there are no references to the globally included elements nor the TEI class and pattern indirection system). This presumes both a desire for <roleDesc> as described here, and for <label> as requested in 1022100.
    # maybe label caststuff pairs
    mlcp = ( ( label, ( castItem | castGroup ) )+ | ( castItem | castGroup )+ )
    castGroup = element castGroup { head?, ( ( roleDesc, mlcp ) | ( mlcp, roleDesc? ) ), trailer? }
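An instance that the proposed content model would accept might look like the following. This is a sketch only: the role names and the description are placeholders and are not taken from the Cavendish cast list in the attachment.

```xml
<castGroup>
  <castItem><role>First Gentleman</role></castItem>
  <castItem><role>Second Gentleman</role></castItem>
  <!-- A single description shared by both items, placed after them. -->
  <roleDesc>two suitors</roleDesc>
</castGroup>
```

Moving the <roleDesc> in front of the two <castItem>s would also match the pattern; a second <roleDesc> in the same group would not.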
The elements currently available to encode correspondence are insufficient and require workarounds that make the encoding of a simple postscript a major chore. The ways in which letters are written vary fairly significantly over time and across cultures. Modern business letters require the ability to encode letterhead, addresses, attention lines, subjects, reference lines, persons copied, and enclosures (this list comes off the top of my head; if it were developed systematically it might be quite a bit longer). I am currently working on a 19th century book of anecdotes that incorporates letters quite frequently as part of an ongoing narrative. I would appreciate elements that would isolate the addressee in an opening <salute> and that would indicate the sender in the closing one. I would like a way to indicate the position held by either of these parties, without getting involved in the depths of names&dates. Memoranda are also rather difficult to encode. I used to work with United Nations human rights documents that included large amounts of memoranda in the immense bureaucratic morasses they called reports.
1093536436 | 258273 I was vaguely aware of the DALF project, but I am starting to study it more closely. It seems to add a lot of good stuff, but doesn't do quite all that I need. I'll be back soon with specific comments/questions. Thanks, Nick [finkend] |
1093473678 | 1110665 Sorry, URI for the Dalf Project is: http://www.kantl.be/ctb/project/dalf/ Edward [edwardvanhoutte] |
1093473501 | 1110665 Please have a look at the DALF project which Lou mentioned and which is Ron Van den Branden's as much as it is 'my' project, and which is entirely financed by the Royal Academy of Dutch Language and Literature's Centre for Scholarly Editing and Document Studies: <http://www.kantl.be/project/dalf/>. The guidelines we wrote for this extension and modification of the TEI are quite extensive and illustrative. We're open for suggestions on how to develop this into a possible new module for TEI P5/P6. At the moment we're preparing our first DALF edition which is a corpus of 1800+ letters between a Flemish author and his publishers from end 19th-beginning 20th century. A small corpus of 80+ letters between the South African authors Lynne Bryer and Daphne Rooke has been prepared by my students and will be published shortly. I hope you'll find something useful in our suggestions and I'm looking forward to working together to 'flesh this proposal out a bit' as Lou put it. Best, Edward ================ Edward Vanhoutte Coordinator Centrum voor Teksteditie en Bronnenstudie - CTB (KANTL) Centre for Scholarly Editing and Document Studies Reviews Editor, Literary and Linguistic Computing Koninklijke Academie voor Nederlandse Taal- en Letterkunde Royal Academy of Dutch Language and Literature Koningstraat 18 / b-9000 Gent / Belgium tel: +32 9 265 93 51 / fax: +32 9 265 93 49 edward.vanhoutte@kantl.be http://www.kantl.be/ctb/ http://www.kantl.be/ctb/vanhoutte/ http://www.kantl.be/ctb/staff/edward.htm [edwardvanhoutte] |
1093368706 | 1021146 Edward Vanhoutte's DALF project at Ghent has developed a set of TEI extensions which includes some (but not all) of these features. It might be worth getting together with them to flesh this proposal out a bit. I will suggest it to them at any rate! Lou [louburnard] |
According to the examples in P4, § 6.10, <extent> can be used for specifying the number of pages of a bibliographic item like a book. Another reasonable usage of <extent> would be, for instance, the specification of the total number of volumes of a multi-volume book. As <extent> does not have a "type" attribute, measure strings like "pp." or "vols" have to be included in its content. This is unfortunate in cases where bibliographic data is to be stored in a language- and style-neutral way. In addition, <biblScope type="pages"> cannot simply be substituted for <extent type="pages"> because of their distinct semantics. <biblScope> defines the 'scope'--some part--of a bibliographic item (say, a collection) with respect to some subitem (e.g., an article). <extent>, on the other hand, measures the whole (e.g., the collection). So the xml-biblio group (cf. the archives of the xml-biblio-discuss@lists.sourceforge.net mailing list) proposes to add a "type" attribute to <extent> with suggested values similar to those of <biblScope>'s "type" attribute.
Here is an example, using P4's <biblStruct> model: <biblStruct> <analytic> <author> <persName> <forename>Edward</forename> <forename>L.</forename> <surname>Keenan</surname> </persName> </author> <author> <persName> <forename>Dag</forename> <surname>Westerståhl</surname> </persName> </author> <title lang="eng" level="a">Generalized Quantifiers in Linguistics and Logic</title> </analytic> <monogr> <title lang="eng" level="m">Handbook of Logic and Language</title> <editor> <persName> <forename>Johan</forename> <forename>F.</forename> <forename>A.</forename> <forename>K.</forename> <nameLink>van</nameLink> <surname>Benthem</surname> </persName> </editor> <editor> <persName> <forename>Alice</forename> <nameLink>ter</nameLink> <surname>Meulen</surname> </persName> </editor> <imprint> <pubPlace>Amsterdam</pubPlace> <publisher>Elsevier</publisher> <pubPlace>Cambridge, Mass.</pubPlace> <publisher>MIT Press</publisher> <date>1997</date> </imprint> <extent type="pages">1247</extent> <biblScope type="pages">837–893</biblScope> </monogr> </biblStruct>
1097000135 | 686243 Currently (i.e., in P4) <extent> may occur within <bibl>, <biblFull>, <fileDesc>, or <monogr>, but not <biblStruct>. It can be repeated as a child of <bibl> or <monogr>, but not when it is a child of <fileDesc> or <biblFull>. But <measure> is a valid child of <extent>, and perfectly reasonable for this use, I should think. <extent> <seg type="designation">Text data</seg> <!-- what is this? --> <measure type="wordsize">60,000 words</measure> <measure type="filenumber">1 TEI XML File</measure> <measure type="filesize">123 KiB</measure> </extent> And, it seems to me, we could do a lot better in P5 by putting at least a unit= on <measure>. (See discussion of feature request #980854.) <extent> <label>Text data</label> <!-- ?? --> <measure num="60000" unit="words" stuff="textsize">60,000 words</measure> <measure num="1" unit="count" stuff="files">1 TEI XML File</measure> <measure num="123" unit="KiB" stuff="diskspace">123 KiB</measure> </extent> [sbauman] |
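To illustrate why separate attributes would help with language- and style-neutral processing, here is a minimal Python sketch that reads the second <extent> example above. The num=, unit=, and stuff= attributes are only the names floated in this comment, not part of P4 or any agreed P5 schema, and the function name measures is hypothetical.

```python
import xml.etree.ElementTree as ET

# The proposed (not yet standard) markup from the comment above.
EXTENT = """<extent>
  <label>Text data</label>
  <measure num="60000" unit="words" stuff="textsize">60,000 words</measure>
  <measure num="1" unit="count" stuff="files">1 TEI XML File</measure>
  <measure num="123" unit="KiB" stuff="diskspace">123 KiB</measure>
</extent>"""


def measures(extent):
    """Return (number, unit, commodity) for every <measure> in an <extent>.

    Only the attributes are read, so the display text ("60,000 words")
    never has to be parsed and can be in any language or style.
    """
    return [
        (int(m.get("num")), m.get("unit"), m.get("stuff"))
        for m in extent.iter("measure")
    ]


for number, unit, stuff in measures(ET.fromstring(EXTENT)):
    print(stuff, number, unit)
```

A stylesheet could make the same selection with attribute tests; the point is only that nothing has to be recovered from the element content.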
1096299520 | 612078 I wasn't looking at using <extent> inside a <biblStruct>, but inside the teiHeader's <fileDesc>, where only one seems to be (currently) allowed. Since these are all types of extent, though, I'd be more comfortable with them being nested. <extent> <foo type="words">60000</foo> ... </extent> But I am certainly in favour of more type attributes, even if it makes them prone to abuse. -James [jcummings] |
1096292059 | 950793 With a "type" attribute on <extent>, you could say instead: <extent type="words">60000</extent> <extent type="files">1</extent> <extent type="KB">123</extent> ... [nolda] |
1096038025 | 612078 I have experienced people using <extent> for multiple extent-like references, and think the further (optional) structuring of <extent> would be very useful. Currently people do things like: <extent> <seg type="designation">Text data</seg> <seg type="wordsize">60,000 words</seg> <seg type="filenumber">1 TEI XML File</seg> <seg type="filesize">123 KiB</seg> </extent> I'm not saying that is the *right* way of doing things, but obviously there is a demand for recording this kind of information that never seems to fit all together in one place easily. -James [jcummings] |
1093010770 | 950793 "Unit" would indeed be more to the point than "type". "Type", however, is more in line with <biblScope>'s "type" attribute, which serves the same purpose. But one could change that attribute name, too, of course ... That new attribute on <extent> should be an optional one (as is <biblScope>'s "type" attribute). As a consequence, it would still be perfectly legal to use <extent> without any attribute for complex contents like "12 A4 pages bound with 16 assorted sizes photographs". [nolda] |
1092999912 | 1021146 I would have expected the TYPE attribute for <extent> to take values such as "exact" or "approx". If there is a need for an attribute with the meanings suggested here, then I think "UNITS" might be a better name for it. However, as currently defined, <extent> is not really a structured field as it has potential for much wider application than this. For example, it might contain more than one kind of unit (e.g. "2000 files of average size 4000 mega octets", "12 A4 pages bound with 16 assorted sizes photographs" ) You're right to say it's not the same as biblScope tho. [louburnard] |
I propose adding a "class" attribute to elements within the <biblStruct> element as well as to the <biblStruct> element itself. This attribute would be of type IDREFS, and would point to an ID in a <taxonomy> element. It is often necessary to classify works when creating a bibliographic entry. For example, it is necessary to know that an article appears in a magazine: <biblStruct> <analytic> <title>article title</title> </analytic> <monogr> <!--need to indicate a magazine rather than a journal--> <title level="j">Title of magazine</title> ... </monogr> </biblStruct> TEI does not have elements necessary to classify works, such as <genre> or <container-type>. A way to classify works is by using the <taxonomy> in the header. The <taxonomy> is meant to "classify texts," so it seems proper to use it to classify bibliographic entries. In order to be able to point to the actual <taxonomy> element, one would need an attribute. I propose that "class" be this attribute. The above entry would then look like this: <!--in header--> <taxonomy> <category id="magazine"> <catDesc>magazine</catDesc> </category> </taxonomy> ... <biblStruct> <analytic> <title>article title</title> </analytic> <monogr class="magazine"> <title level="j">Title of magazine</title> ... </monogr> </biblStruct>
1092880709 | 663081 As the person who posted this proposal, I would like to withdraw it. The group working on bibliographic entries for TEI has put in a proposal that would make this markup unnecessary. [paultremblay] |
I propose that <ptr> be allowed in the <biblStruct>, <analytic>, and <monogr> elements. It is often necessary to point to other elements when constructing bibliographic entries within a <biblStruct> element. For example, if you are trying to describe a review, you need to point to the work being reviewed: <!--the article of the review--> <biblStruct> <analytic> <author> ... </author> <title>This book is worth reading</title> <!--point to book being reviewed with its authors--> <!--not valid TEI!--> <ptr target="book-reviewed"/> </analytic> </biblStruct> <!--the book being reviewed--> <biblStruct id="book-reviewed"> <monogr> ... </monogr> </biblStruct> Right now, it is impossible to use either <ptr> or <ref> inside the <analytic>, <monogr>, or <biblStruct> elements. This makes pointing impossible and unnecessarily restricts the scope of <biblStruct>.
1092880851 | 663081 As the person who submitted this proposal, I would like to withdraw it. The group working on bibliographic data in TEI has submitted proposals that would make this proposal on <ptr> unnecessary. [paultremblay] |
Henrik Ibsen's Writings has made several changes and additions to the TEI DTDs regarding the encoding of manuscript changes and manuscript phenomena. In the following we wish to present some of the modifications and encourage the inclusion of these themes in the P5 revision discussion. - <clarification> We've created a new element to record the clarification phenomenon in manuscripts. Ibsen and his copyists sometimes clarify words or letters either by writing over the already written word/letters or by repeating the word/letters offline. We encode these instances of repeating the same text for the purpose of clarification like this: <clarification hand="HI" place="inline">Henrik</clarification> <clarification hand="HI" place="offline">Henrik</clarification> We believe this is a well-known phenomenon for manuscript transcribers, and we think it would be a useful addition in a revised chapter on "Transcription of Primary Sources". - TEI elements for manuscript changes We've made it possible to include almost all kinds of elements in <app> (i.e. in <lem>/<rdg> of course), <add>, <del> etc. and to use <app>, <add>, <del>, <gap/> etc. almost globally in our manuscript transcriptions in order to record manuscript changes in a way that reflects our view of the changes. E.g. we allow <div> inside <add> and <del> to make possible the inclusion or deletion of a complete scene, thus reflecting the change in the document structure more clearly. - The hand attribute We have allowed the hand attribute in <app>, <hi> and <emph>. Including hand in <app> makes it unnecessary to use hand both in <add> and <del> when another hand has revised the text of the manuscript. Using the hand attribute in <hi> (and similarly in <emph>) allows us to record e.g. the red pencil underlining throughout a manuscript otherwise written in black ink.
- A new element for substitutions/revision of the <app> element In recent years there have been discussions on TEI-L and elsewhere of the need to modify and revise the <app> element to make it more useful for manuscript changes. Several alternatives have been discussed. At Henrik Ibsen's Writings we have also discussed this, and although we have managed to use the existing <app> element for the manuscript changes in our material (though with some modifications of what <lem> and <rdg> can contain and the inclusion of some attributes), we would like to encourage the expansion of this part of the chapter on "Transcription of Primary Sources"; at least an element for encoding substitutions should be included. We dislike the double role of the <app> structure, and would feel better having one structure for manuscript changes and one for the critical apparatus. Our use of the <app> element for manuscript changes has resulted in us constructing a new element, <tcApp>, for our critical apparatus and text critical notes. Please contact Hilde Bøe (hilde.boe@ibsen.uio.no) at Henrik Ibsen's Writings, if you have any questions or remarks.
- <lg> At Henrik Ibsen's Writings we perform detailed metrical encoding in all verse texts. Regarding the many verse dramas, one of the main goals is to mark up the main verse structures clearly, i.e. the starting and ending points of the different meters occurring in the text. As <lg> is defined in the TEI DTD it seems to be related to poems only, not to verse dramas. The element may contain verse lines, headings, closers and so on, but not dramatic elements like speeches and stage directions. To avoid heavy fragmentation and linking, or milestones, we have decided to modify our DTD to allow <sp>, <stage> and <div> inside <lg>. This makes <lg> more parallel to the <div> element, and we use the <lg> element to mark up verse structures and the <div> element to mark up the drama structures (acts and scenes). We would suggest a similar change to the TEI DTD. - additional attributes for metrical analysis The attributes for metrical analysis in TEI P4 are the met, real and rhyme attributes. These are intended for metrical structure, deviation from the metrical structure and rhyme scheme respectively. An attribute for deviation from the rhyme scheme is not included in the TEI DTD. We have therefore split the real attribute into two: realMet (for deviation from metrical structure) and realRhyme (for deviation from rhyme scheme). In addition we have attributes for notation of anacrusis and deviations attached to these, respectively the an and the realAn attribute. These attributes may have the values "single", "double" and "no". We would suggest that these attributes be included in TEI P5. Please contact Stine Brenna Taugbøl (s.b.taugbol@ibsen.uio.no) at Henrik Ibsen's Writings, if you have any questions or remarks.
1094663163 | 686243 P4 vanilla does not permit <stage> as a child of <lg> (which it obviously should -- I pointed this out on TEI-L in April 2001). It does allow <stage> as a child of <l>, which I think it probably should, but is not nearly so obvious. As for <speaker> as a child of <lg>, the argument is a bit more tenuous, but still holds water. If one thinks of <speaker> as either a special case of <stage> or a special case of <label>, rather than as "first child of <sp>", it does make sense. The WWP permitted this (<speaker> as child of <lg>) in mid-1999. As for <div> as a child of <lg>, I don't think this makes a lot of sense. I've read the arguments here, and find (what I understand of them) unconvincing. Just because two sets of metrical lines are in the same meter does not mean they need to be (or even should be) in the same <lg> element. The <div> element exists for dividing a text into logical units, like acts and scenes. <lg> exists for holding a set of metrical lines in a convenient way, not for wrapping a set of divisions which happen to include metrical lines, I don't think. I don't think this particular suggestion was part of the set of suggestions the Henrik Ibsen's group posted (to TEI-L) in December 2001, so if there are further arguments there I have not reread them. [sbauman] |
1094196473 | [nobody] |
1094128180 | [nobody] |
1094123114 | 1021146 Thank you for the examples of metrical analysis. A proposal for adding these extra attributes for metrical analysis seems fair enough to me, though it is rather specialized. But I am afraid I am still not convinced that there is need for the changes you propose making to <lg>. You say you want to allow <div>, <sp> and <stage> within <lg>. Taking these in reverse order: * <stage> is already permitted, both within and between <l>s (and if it is not, then it should be!) * If you allow <sp> within <lg>, you will allow a nonsensical structure like the following <lg> <sp> <p>....</p> </sp> </lg> What this tells us is that whatever the thing is you want to include within a <lg> it's not a <sp> as currently defined, since that can contain either prose or verse, and your thing obviously can only contain verse. * Why do you want <div> within <lg> ? What's wrong with a nested <lg> ? Is the motivation for this change the familiar cross-hierarchy problem (speech structure doesn't respect the verse structure boundaries, and vice versa)? There are a number of ways proposed already for dealing with that, of varying satisfactoriness. But I don't think this is one of them. [louburnard] |
1094116191 | [nobody] |
1094055429 | 1021146 I think the proposal to turn <lg> into a special kind of <div> is inappropriate. If you want a special kind of <div>, use <div type="verseGroup"> or something. Repurposing <lg>, which is defined as a chunk level element, in this way is tag abuse. The proposals for extending the metrical analysis attributes are interesting. Would you be able to provide some more detail about those, perhaps in the form of a worked example? [louburnard] |
According to P4, <publisher> "provides the name of the organization responsible for the publication or distribution of a bibliographic item" and <pubPlace> "contains the name of the place where a bibliographic item was published". Consider the following <biblStruct> example, which includes two pairs of publishers and publication places: <biblStruct> <monogr> <editor> <persName> <forename>Johan</forename> <forename>F.</forename> <forename>A.</forename> <forename>K.</forename> <nameLink>van</nameLink> <surname>Benthem</surname> </persName> </editor> <editor> <persName> <forename>Alice</forename> <nameLink>ter</nameLink> <surname>Meulen</surname> </persName> </editor> <title lang="eng" level="m">Handbook of Logic and Language</title> <imprint> <pubPlace>Amsterdam</pubPlace> <publisher>Elsevier</publisher> <pubPlace>Cambridge, Mass.</pubPlace> <publisher>MIT Press</publisher> <date>1997</date> </imprint> </monogr> </biblStruct> As the example shows, the <pubPlace>-<publisher> pairs cannot be explicitly stated. As a consequence, bibliographic stylesheets have to rely on some document order convention in order to determine which <pubPlace> applies to which <publisher>: Benthem, Johan F. A. K. van and Alice ter Meulen (eds.) (1997). _Handbook of Logic and Language_. Amsterdam: Elsevier and Cambridge, Mass.: MIT Press. The xml-biblio group (cf. 
the archives of the xml-biblio-discuss@lists.sourceforge.net mailing list) proposes to reformulate the above-mentioned meta-language definition of <publisher> in a way that covers the following alternative markup of the example: <biblStruct> <monogr> <editor> <persName> <forename>Johan</forename> <forename>F.</forename> <forename>A.</forename> <forename>K.</forename> <nameLink>van</nameLink> <surname>Benthem</surname> </persName> </editor> <editor> <persName> <forename>Alice</forename> <nameLink>ter</nameLink> <surname>Meulen</surname> </persName> </editor> <title lang="eng" level="m">Handbook of Logic and Language</title> <imprint> <publisher> <placeName>Amsterdam</placeName> <orgName>Elsevier</orgName> </publisher> <publisher> <placeName>Cambridge, Mass.</placeName> <orgName>MIT Press</orgName> </publisher> <date>1997</date> </imprint> </monogr> </biblStruct> (P4's content model for <publisher> already allows for <placeName> and <orgName> children.) As an alternative, P5 could introduce a wrapper element for <pubPlace> and <publisher>, e.g. "<publication>".
1094056450 | 1021146 <imprint> can be used in two different scenarios. 1. When it is used within e.g. <titlePage>, i.e. when encoding an existing published resource. It records what the imprint of that resource says. If it is ambiguous in the source, it should be ambiguous in the encoding. 2. When (as in your first example above) you are creating a new bibliographic description. Here you have the opportunity to organize the information a bit better. I would suggest, for this example, that you should supply two imprints, one for each publisher. However, as we appear to be going round in circles on this one, would you mind moving the discussion to TEI-L? [louburnard] |
1093017513 | 950793 Without either a wrapper element or an order convention for the publisher's organization name and place name, ambiguous cases can arise. Consider the following <imprint>, where only one of the <publisher>s has a corresponding <pubPlace>: <imprint> <publisher>Universität Stuttgart</publisher> <publisher>Universität Tübingen</publisher> <pubPlace>Heidelberg</pubPlace> <publisher>IBM</publisher> <date>2003</date> </imprint> Without a document order convention, a formatting stylesheet could not know to which <publisher> occurrence <pubPlace>Heidelberg</pubPlace> applies. Our proposal (either defining a wrapper element for <publisher> and <pubPlace> or including both pieces of information in <publisher> as an <orgName> and, if any, a <placeName>) would resolve the ambiguity without falling back on an order convention. [nolda] |
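The ambiguity can be made concrete with a small Python sketch. It pairs each <publisher> in the <imprint> above with an adjacent <pubPlace> under two candidate order conventions: place before publisher (as in the "Amsterdam: Elsevier" example) and place after publisher (as P4 requires for <publicationStmt>). The function name pair and the place_first flag are hypothetical, and "adjacent sibling" is just one assumed reading of what an order convention would mean.

```python
import xml.etree.ElementTree as ET

IMPRINT = """<imprint>
  <publisher>Universität Stuttgart</publisher>
  <publisher>Universität Tübingen</publisher>
  <pubPlace>Heidelberg</pubPlace>
  <publisher>IBM</publisher>
  <date>2003</date>
</imprint>"""


def pair(imprint, place_first):
    """Pair each <publisher> with a neighbouring <pubPlace>, if there is one.

    place_first=True  : the place is expected immediately before the publisher.
    place_first=False : the place is expected immediately after the publisher.
    """
    children = [c for c in imprint if c.tag in ("publisher", "pubPlace")]
    pairs = []
    for i, child in enumerate(children):
        if child.tag != "publisher":
            continue
        j = i - 1 if place_first else i + 1
        neighbour = children[j] if 0 <= j < len(children) else None
        is_place = neighbour is not None and neighbour.tag == "pubPlace"
        pairs.append((child.text, neighbour.text if is_place else None))
    return pairs


imprint = ET.fromstring(IMPRINT)
print(pair(imprint, place_first=True))   # Heidelberg is attached to IBM
print(pair(imprint, place_first=False))  # Heidelberg is attached to Universität Tübingen
```

The same markup yields two different readings, which is the case a wrapper element or a nested <placeName> would rule out.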
1093012867 | [nobody] |
1093009375 | 950793 The P4 order convention for <publicationStmt> (in particular, <publisher> preceding <pubPlace>) appears not to apply to <imprint> (cf. the reference for <imprint> and the examples in § 6.10.2.3). [nolda] |
1093000575 | 1021146 It is explicitly stated in P4 that "although not enforced by the DTD, it is a requirement for TEI conformance that information about publication place, address...date be given in that order, following the name of the publisher, distributor, or authority concerned" (see reference for <publicationStmt>). Introducing an additional wrapper just to group these two therefore seems unnecessary to me. The proposal to include <placeName> *within* <publisher> would be a radical re-interpretation of the semantics of the element. It is meant to include only the NAME of the publishing organization. There is quite a difference between that and the place of publication. The only circumstance in which (with current definitions) it would make sense to include a placeName would be if the name was something like "Oxford University Press", and then you might choose to tag "Oxford" as a placeName -- irrespective of the actual place of publication. [louburnard] |
Please adopt a policy for specifying whether some element is to be rendered * inline * as a displayed block without a running number * as a displayed block with a running number like "(12)" * or as a floating block These specifications are crucial for authors and cannot be left to stylesheets. I would like to propose a policy along the following lines. For each phrase-level or block element a default rendition is defined: * inline (e.g., for <formula> or <mentioned>) * displayed without running number (e.g. for a putative <listDisplayed> element; see below) * floating (e.g., for <figure> or <table>) Authors can switch to a different rendition by specifying one of the following "rend" values: * "inline" * "displayed" * "floating" Running numbers for displayed elements are given as an "n" value, with the special value "generated" for numbers which are to be automatically determined by the processing stylesheet. So, for example: * <formula> is rendered inline * <formula rend="displayed"> is rendered as an unnumbered displayed block * <formula rend="displayed" n="12"> and <formula rend="displayed" n="generated"> are rendered as numbered displayed blocks * <formula rend="floating"> is rendered as a floating block At least in linguistic documents, there can be numbered displayed blocks on different 'display levels', e.g.: (12) a. ... b. i. ... ii. ... For configurations of this type, a <listDisplayed> element could be defined, whose children are automatically rendered as numbered displayed blocks on a subordinate level: <listDisplayed n="12"> <formula>...</formula> <listDisplayed> <formula>...</formula> <formula>...</formula> </listDisplayed> </listDisplayed>
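The policy proposed above can be sketched as a small lookup in Python. The defaults table lists only the elements named in the proposal, the fallback to "inline" for any other element is an assumption added here, and the function name rendition is hypothetical.

```python
# Default renditions named in the proposal; anything else is an assumption.
DEFAULT_REND = {
    "formula": "inline",
    "mentioned": "inline",
    "figure": "floating",
    "table": "floating",
}


def rendition(gi, rend=None, n=None):
    """Return (mode, number) for an element with the given rend= and n= values.

    mode   : "inline", "displayed" or "floating"
    number : the n= value ("generated" means the stylesheet assigns it),
             or None when the block is not a numbered displayed block.
    """
    mode = rend if rend is not None else DEFAULT_REND.get(gi, "inline")
    numbered = mode == "displayed" and n is not None
    return mode, (n if numbered else None)


print(rendition("formula"))                               # inline, unnumbered
print(rendition("formula", rend="displayed"))             # displayed, unnumbered
print(rendition("formula", rend="displayed", n="12"))     # displayed, numbered (12)
print(rendition("formula", rend="displayed", n="generated"))
print(rendition("formula", rend="floating"))              # floating
```

Nested <listDisplayed> numbering is left out of this sketch, since the proposal does not say how subordinate levels would be labelled.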
1095653844 | 686243 Although I think the Guidelines should remain agnostic as to what system to use in rend=, we should consider providing a mechanism for declaring a notation, and list a few sample notations. E.g., a notation= attribute on <rendition> that points to CSS, XSLFO, WWP rendition ladders, or some such. [sbauman] |
1093211063 | 950793 P5 would facilitate the re-use of stylesheets if it would propose some convention for basic rendition options like those under discussion here. [nolda] |
1093198915 | 1021146 While I agree with you that these matters are important for authors, and that authors need a way of specifying their intentions, I am not convinced that this cannot be done in a stylesheet. It is not impossible to think up values for the REND attribute which will say whether or not something should be rendered inline, and whether or not its N attribute value should be used to number. I agree that some code of practice, or set of conventions, needs to be drawn up. It occurs to me that the SIG on presentational issues and authoring might have some views on this, so I will draw their attention to these proposals. [louburnard] |
The current <index> model of TEI suffers from several limitations, including: 1. index entries are given as attribute values, so they cannot contain additional markup 2. there is no support for ranges 3. there is no support for cross-references like "see" or "see also" The first limitation is easily fixed by substituting <label> elements for the "level<n>" attributes. As to the second limitation, <index> could be changed from a 'milestone-like' element to an element containing all of the material to be indexed in a subelement, e.g.: <index id="index.lemmatization.arabic"> <indexLabel level="1">lemmatization</indexLabel> <indexLabel level="2">arabic</indexLabel> <indexContent>The students understand procedures for Arabic lemmatisation and are beginning to build parsers.</indexContent> </index> (For the "id" attribute on <index>, see below.) The third limitation could be removed by adding a pointer child to <index>: <index> <indexLabel level="1">arabic lemmatization</indexLabel> <indexRef>see <ptr target="index.lemmatization.arabic" type="index"/></indexRef> </index> Alternatively, TEI could simply adopt DocBook's index model.
1096291785 | 950793 Right, <indexContent> should hold the source content instead of replicating it. [nolda] |
1095623170 | 686243 As Lou has previously alluded to elsewhere, in the Big Picture it would probably be the right thing to ditch the entire TEI index model and replace it with hooks designed to make using an XML Topic Map easier. (XTM was designed to generate indexes, after all.) In the interim, there are 4 specific suggestions here: 1. Change levelN= attributes to child elements. This is a good idea; such a good idea, it is already on the list of things to do for P5. We have not, however, come up with a good name for this child element. Suggestions welcome. (I, for one, am not fond of Andreas's <indexLabel>.) 2. Use <indexContent> (I think Andreas intends this to hold, rather than replicate, source content, but I'm not sure.) I'm iffy on this one; overall I don't think I like it, but can't elucidate why very well. I'd prefer the DocBook zone= method (if I understand it correctly), I think.[1] 3. Permit a pointer child to <index> for redirecting the reader to a different index entry. While it's probably a good idea to provide this kind of functionality, I have a feeling that by the time one is getting this complicated XTM is really the better way to go. And, as an alternate to the above 4. Adopt the DocBook index model. I am not thoroughly knowledgeable about the DocBook model, but I'm worried that it tries to do too much (or at least, more than we need) at once and therefore makes usage & processing more difficult than need be. Note ---- [1] I think the DocBook zone= method works such that rather than look at the spot where <index> element appears in the text, the index generation software takes as that which is to be indexed the target of the zone= attribute. In the TEI world we'd probably use a target= attribute, and the semantics would be that the default value of target= ("default" here meaning what software should do if it finds no value, not what the schema should say is provided as a default value) is the <index> element itself. [sbauman] |
Please replace the "label" attribute of <eTree>, <eLeaf>, and <triangle> by <label> (or <eLabel>) elements, thereby allowing for specifying labels containing markup.
1095616524 | 686243 In principle, it seems almost obvious to me that label= of <arc>, <eLeaf>, <eTree>, <graph>, <iNode>, <leaf>, <node>, <root>, <tree>, and <triangle> (as well as label2= of <arc> and <node>) should become child elements instead of attributes. However, if <node> is to have child elements, it is probably worth considering having <arc> as a child of <node>, and doing away with adj=, adjFrom=, and adjTo=, no? (So, e.g., rather than either one <arc> element with from= & to= or using adj=, adjFrom=, or adjTo=, each <arc> would be represented by 2 <arcEnd> elements, at least one of which would have an otherEnd= attribute pointing to the other.) There may be some similar improvement to be made in <eTree>s. My gut instinct is that unless someone who knows a lot more about directed and undirected graphs, trees, and embedded trees than I do steps up and is willing to work on this issue, we should just move the label= and label2= attributes to <label> children. Alternatively, of course, we could change the name of the attributes (say, to lblPtr=) and just point to a <label> element elsewhere. [sbauman] |
1092643769 | 950793 I use <eTree>s for representing linguistic structures and wrote a stylesheet translating them to PSTricks' \pstree commands. (The stylesheet is included in the tei2tex package, which is available from my homepage.) Compared to SVG, the <eTree> model provides a more generic markup for tree graphs. For instance, using <eTree>s, the exact positioning of the nodes can be left up to the processor. Thus you can make trees grow or shrink according to the width of the nodes' contents without hardcoding their positions. As far as I can see, in SVG you would have to fall back to scripts in order to achieve the same effect. But I may be wrong here ... [nolda] |
1092606711 | 1021146 It would probably be a good idea to change this label attribute into a child element as you suggest. However, we are not yet sure whether the trees and graphs module in which the <eTree> element appears will be carried forward to P5. Do you know of any TEI applications which actually use this module? Would it not be better to represent such structures using SVG? [louburnard] |
I would like to suggest that the measure element be added to the typed class so that it has both type and subtype attributes. The additional subtype attribute will be useful in dealing with currency and probably other measurements as well. For example: <p>This book costs <measure type="currency" subtype="USD" reg="$8.00">eight dollars</measure>.</p> In the above example, the currency symbol "$" in the reg attribute is not a precise indicator of the type of currency, since it is used for both Canadian and US dollars. I think this is a general enough case that it might merit a change in P5. John
1095559142 | 686243 The following is more a thought experiment out loud than anything else. Take it with a grain of salt ... or two. Since "in its fullest form, a measure consists of a number, a phrase expressing units of measure, and a phrase expressing the commodity being measured", I'm inclined to say that type= of <measure> should be dropped in favor of three attributes, let's call them num=, unit=, and stuff=. This permits regularization of each component of the measure separately, and thus also would permit us a more precise semantics of reg= (either the normalization of num= or the normalization of both num= and unit= together; might even want to call it norm=, removing <measure> from the name class, thus removing the key= attribute, which I daresay seems a bit silly, although perhaps I'm overlooking something here). However, this doesn't actually address the problem John was pointing out, which is that sometimes it's useful to indicate what dimension (wrong word, but I can't think of a better one right now) a particular unit is measuring. So just as "CAD", "SEK", and "EUR" are all units of "currency", it is also the case that "metre", "mile", and "light-year" are all units of "distance". It is reasonable to expect that at times people may wish to indicate this larger category instead of or in addition to the actual unit. But before we go attribute-crazy, we need to also think about which of these bits of information need to be elements rather than attributes (such that information about them, e.g. their language, can be supplied). [sbauman] |
1088514677 | [nobody] |
1088493880 | 1021146 I dont have strong feelings about adding this element to the typed class, but I do think that the example quoted is misleading. The subtype should relate to the element, not one of its attributes. USD is not a subtype of measure, it's (possibly) a subtype of currency. Also, the reg attribute in the above example is being expected to do two different things: regularize the currency name and regularize the measurement itself. So (unless we want to use different attributes for this purpose which seems like overkill) it ought to be 'reg="USD 8.00"' (or CAD or HKD or...) Syd comments that the Guidelines should suggest exactly how currencies be encoded without relying on subtype, which seems entirely right. He also suggests something like <measure type="currency" reg="37.54 EUR">$45.70</measure> which I think is also a misuse of "reg" -- it doesn't mean "equivalent at today's trading rate". [louburnard] |
The <tagsDecl> element in the header is used to record the usage of XML elements present in a document. With the advent of multi-namespaced documents in TEI P5, it will be necessary to distinguish element names by namespace. Proposal: either (a) add an ns attribute to <tagsDecl>, the value of which is a full namespace name (not a prefix), defaulting to http://www.tei-c.org/ns/1.0 (this requires that <tagsDecl> be made repeatable, which makes it possible to get things wrong); or (b) add a new <nameSpace> element with attribute NAME, as child of <tagsDecl> and parent of <tagUsage>. On balance, (b) seems preferable. Existing documents could be accommodated unchanged if we added a rule that says any <tagUsage> not wrapped in a <nameSpace> is assumed ipso facto to be in the TEI namespace.
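Option (b) can be illustrated with a small Python sketch that counts elements per namespace and prints a <tagsDecl> grouped by <nameSpace>. This is only an illustration of the proposal, not an agreed format: the lowercase name attribute on <nameSpace>, the helper names, and the sample document are assumptions; gi and occurs are the existing <tagUsage> attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def tag_usage_by_namespace(xml_text):
    """Count element occurrences, grouped by namespace URI."""
    usage = defaultdict(Counter)
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}local".
        if elem.tag.startswith("{"):
            ns, local = elem.tag[1:].split("}", 1)
        else:
            ns, local = "", elem.tag
        usage[ns][local] += 1
    return usage


def render_tags_decl(usage):
    """Serialize the counts as the <nameSpace>/<tagUsage> nesting of option (b)."""
    lines = ["<tagsDecl>"]
    for ns in sorted(usage):
        # Assumption: the namespace URI is carried in a 'name' attribute.
        lines.append(f'  <nameSpace name="{ns}">')
        for gi in sorted(usage[ns]):
            lines.append(f'    <tagUsage gi="{gi}" occurs="{usage[ns][gi]}"/>')
        lines.append("  </nameSpace>")
    lines.append("</tagsDecl>")
    return "\n".join(lines)


# Hypothetical two-namespace document: TEI with an embedded MathML fragment.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:m="http://www.w3.org/1998/Math/MathML">'
    "<text><p>One</p><p>Two <m:math><m:mi>x</m:mi></m:math></p></text></TEI>"
)

if __name__ == "__main__":
    print(render_tags_decl(tag_usage_by_namespace(SAMPLE)))
```

Keying on the full namespace URI rather than the prefix follows the proposal's requirement that the value be "a full namespace (not a prefix)", since prefixes can differ between documents.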
TEI P4 has four elements, <date>, <dateRange>, <time>, and <timeRange> for encoding and normalizing text that describes a point or period in time. (Not counting <timeline> and <when> which are special purpose elements for establishing synchronous points.) The only difference between a date and a time is the level of precision. The quick description of the difference between a <date> or <time> and a <dateRange> or <timeRange> is that the *Range describes a period greater than the level of precision used. Furthermore, in its discussion of attributes for these elements, P4 conflates accuracy and precision (and also, IIRC, confidence in the accuracy :-), and does not address whether ranges are inclusive or exclusive. Thus I am suggesting that this mix of elements and attributes needs some attention for P5. Some first suggestions follow. Since it is easy to indicate a range with the international standard representation of dates and times (ISO 8601:2000), the *Range elements are unnecessary, and should be dropped from P5. The following example (from P4 6.4.4, source is Virginia Woolf's _Mrs._Dalloway_) demonstrates an encoding of a range without <dateRange>. | Those five years — | <date value="1918/1923">1918 to 1923</date> | — had been, he suspected, | somehow very important. The Guidelines should simply state that the range specified on value= is inclusive. E.g. | <date value="1067/1776-07-03">After 1066 but before | American independence</date> (Which, of course, could also be encoded | After <date value="1066">1066</date> but before | <date value="1776-07-04">American independence</date> with the same accuracy and precision) | <date value="1869-10-02/1948-01-30">during the life | of the Mahatma</date> The exact= attribute of <*Range> should become the accuracy= attribute of <date> and <time>. The precision is indicated by the precision of value=. 
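To make the "inclusive range, precision taken from value=" reading concrete, here is a minimal Python sketch that expands a value= such as "1918/1923" into its first and last covered day. The function names are hypothetical, only the YYYY, YYYY-MM and YYYY-MM-DD forms are handled (no times, no negative years), and Python's date type is proleptic Gregorian, which is itself an assumption about the timeline.

```python
import calendar
from datetime import date


def _bounds(part):
    """Return (first_day, last_day) covered by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    fields = [int(f) for f in part.split("-")]
    if len(fields) == 1:  # year precision: the whole year
        (y,) = fields
        return date(y, 1, 1), date(y, 12, 31)
    if len(fields) == 2:  # month precision: the whole month
        y, m = fields
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])
    y, m, d = fields  # day precision: a single day
    return date(y, m, d), date(y, m, d)


def inclusive_range(value):
    """Expand a value= of the form 'start/end' (or a single date) to inclusive bounds."""
    start, _, end = value.partition("/")
    first = _bounds(start)[0]
    last = _bounds(end or start)[1]  # no '/' means a one-item range
    if last < first:
        raise ValueError(f"range ends before it starts: {value!r}")
    return first, last


if __name__ == "__main__":
    for v in ("1918/1923", "1067/1776-07-03", "1869-10-02/1948-01-30"):
        print(v, "->", inclusive_range(v))
```

Under this reading "1918/1923" runs from 1 January 1918 through 31 December 1923, which is what "inclusive" has to mean once the endpoints are given only to year precision.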
Since a date and time indicate the same thing (albeit with varying precision) and the normalized representation (ISO 8601) can include both, the Guidelines should explicitly state that <time> and <date> are technically interchangeable. The Guidelines should be explicit about whether a "T" is to be specified between the date and time fields of an ISO 8601 value=. (I.e., whether the contents of value= is an ISO 8601 format date followed by whitespace followed by an ISO 8601 time (e.g. "2004-09-03 15:24Z") or a combined ISO 8601 date and time (e.g. "2004-09-03T15:24Z"). I prefer the latter myself.) The Guidelines should explicitly prohibit the notation "24:00" to represent midnight in the value of value=. (This notation *is* permitted by ISO 8601, one of the few indications that it was written by committee :-) We can imagine two different uses of the value= attribute of <date> (or <time>, I suppose): 1. regularize the content of <date> into a format which can easily be parsed and searched 2. normalize the content of <date> to a date along an agreed upon timeline (aka calendar system) It might make sense, then, to separate these into two separate attributes, as one may reasonably want different values for these purposes. For example, one might like to regularize the *format* of the Julian dates in early modern printing, but may well rather not be bothered trying to figure out what the Gregorian or proleptic Gregorian (i.e., normalized) value would be. | <docDate norm="1548-04-07" reg="1548-03-28">The.xxviii.day | of <name>Marche</name> | <lb/>the yere of our lorde. | <lb/>M.D.XLVIII.</docDate>
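The relationship between the two attributes in the <docDate> example can be sketched in Python: given the regularized Julian date (reg=), compute the proleptic Gregorian date (norm=) by way of the Julian Day Number. The function names are hypothetical, and the sketch assumes the Julian year is counted from 1 January; it does not handle years reckoned from Lady Day (25 March) or other local conventions.

```python
from datetime import date

# Julian Day Number minus Python's proleptic Gregorian ordinal
# (date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426).
_JDN_OFFSET = 1721425


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12  # 1 for January/February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3  # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(iso_julian):
    """Map 'YYYY-MM-DD' read as Julian to 'YYYY-MM-DD' proleptic Gregorian."""
    year, month, day = (int(f) for f in iso_julian.split("-"))
    ordinal = julian_to_jdn(year, month, day) - _JDN_OFFSET
    return date.fromordinal(ordinal).isoformat()


if __name__ == "__main__":
    reg = "1548-03-28"  # regularized Julian date from the example
    print(f'reg="{reg}" norm="{julian_to_gregorian(reg)}"')
```

For the example above this reproduces the ten-day offset between reg="1548-03-28" and norm="1548-04-07". An encoder who does not want to commit to the conversion could still supply reg= alone, which is the point of keeping the attributes separate.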
If the "label-item pair" model of a list is retained, then <castList>s must be allowed to have <label>s, too. For that matter, <castItem>s and <castGroup>s should be allowed to have <label>s, as well.
The <castGroup> content model needs to allow <roleDesc>. The attached PNG is an extract from the cast list of Margaret Cavendish’s _A_Piece_of_a_Play_. In it, two <castGroup>s each contain two <castItem>s. However, there is only one role description for each <castGroup>. Ideally, only one <roleDesc> should be permitted in a <castGroup>, but it should be permitted either before or after the series of <castItem>s or <castGroup>s (which may or may not all have <label>s depending on whether feature request 1022100 is enacted). In order to make that clearer, here I have expressed that idea using straight RelaxNG compact syntax (i.e., not a TEI syntax pattern, as there are no references to the globally included elements nor the TEI class and pattern indirection system). This presumes both a desire for <roleDesc> as described here, and for <label> as requested in 1022100. # maybe label caststuff pairs mlcp = ( ( label, ( castItem | castGroup ) )+ | ( castItem | castGroup )+ ) castGroup = element castGroup { head?, ( ( roleDesc, mlcp ) | ( mlcp, roleDesc? ) ), trailer? }
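As a quick way to try the proposed content model against candidate child sequences, the same pattern can be transcribed into a regular expression over element names. This Python sketch is not a RelaxNG validator: it ignores attributes, text, and the globally included elements, and the function name is hypothetical.

```python
import re

# Each child element name is written followed by one space, so the
# content model becomes an ordinary regular expression over tokens.
ITEM = r"(?:castItem|castGroup) "
# mlcp: either every item is preceded by a label, or none is.
MLCP = rf"(?:(?:label {ITEM})+|(?:{ITEM})+)"
CAST_GROUP = re.compile(
    rf"(?:head )?(?:roleDesc {MLCP}|{MLCP}(?:roleDesc )?)(?:trailer )?"
)


def valid_cast_group(children):
    """True if the sequence of child element names fits the proposed model."""
    tokens = "".join(name + " " for name in children)
    return CAST_GROUP.fullmatch(tokens) is not None


if __name__ == "__main__":
    print(valid_cast_group(["castItem", "castItem", "roleDesc"]))
    print(valid_cast_group(["roleDesc", "castItem", "roleDesc"]))
```

The second call shows the "only one <roleDesc>" constraint at work: a role description may come before or after the items, but not both.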
In order to create a collated edition, the pagebreaks in various manuscripts need to be noted. For this reason, the <pb/> tag (which has no span) should be allowed to carry the wit (witness) attribute when the transcription module is included, as follows. <witList> <witness name="C1">ms C1</witness> <witness name="C2">ms C2</witness> <witness name="W1">ms W1</witness> <witness name="Sam">1867 printed edition</witness> </witList> <pb wit="C1 C2 W1" n="1v"/> <pb wit="Sam" n="1"/> <!-- some text --> <pb wit="C1" n="2r"/> <!-- more text --> <pb wit="Sam" n="2"/> <!-- more text --> <pb wit="W1" n="2r"/> Thanks! Will Tuladhar-Douglas will@nairatmya.org
1093001027 | 1021146 The existing ED attribute on <pb> provides more or less this functionality, though it does not allow for multiple values as in your first example. It might be worth rethinking its name, though, I agree. [louburnard] |