MASTER

Manuscript Access Through Standards for Electronic Records

Key: DMU-CTA De Montfort; IRHT Paris; NLP Prague; OU Oxford; KB Royal Library, The Hague; EAMSS US partners; VL Vatican Library; AP other associated partners; EG Expert group; BFM Marburg

 

Fourth Project Meeting, March 17-19, 2000. Arnamagnaean Institute, Copenhagen

Minutes

 

Attending:

Anne Korteweg (KB), Klaas van der Hoek (KB), Lou Burnard (OU), Richard Gartner (OU), Zdenek Uhlir (NLP), Elisabeth Lalou (IRHT), Muriel Gougerot (IRHT), Malcolm Bothwell (IRHT), Consuelo Dutschke (EAMSS), Terry Dinovan (EAMSS), Merrilee Profitt (EAMSS, TEI), Peter Robinson (DMU-CTA), Adrian Welsh (DMU-CTA), Andrea Jones (DMU-CTA), Matthew Driscoll (AMI), Anne Mette Hansen (AMI), Ragnheidr Mosesdottir (AMI), Peter Springborg (AMI), Eva Nylundr (Lund/TEI), Már Jónsson (Reykjavik), Bjarni Thordarson (Reykjavik), Milena Dobreva (Bulgaria), Ljuba Varbanova (Bulgaria)

 

Meeting subject: agreement of modifications to the DTD, following the experience of the first round of implementation. These minutes also note subsequent changes following the meeting itself, and comments on the actual practice of the MASTER partners as shown in the 800+ descriptions created in the implementation phases (the 'test files'), in [ …]

After official greetings from Peter Springborg, the discussion began.

 

Review of changes since September meeting

LB explained changes made in the DTD following the Prague meeting. These were as agreed at Prague, with modifications and additions made in the October TEI meeting.

 

Discussion of the DTD

The following draws together discussions, which in some cases were spread over three days, under distinct headings. In many cases, the discussion carried on after the meeting, and the results are reported here in [ ].

<msHeading> and <msSummary>.

The element <msSummary> has been scrapped, and in its place we have agreed a looser model for <msHeading>. That is: this may now contain the elements <author> <title> <origdate> etc IN ANY ORDER and with PCDATA. [Implemented in May revision of DTD. In fact: it appears that the MASTER partners are all using the tighter model of this element]

<langUsage>

There are problems using this to denote the language of the manuscript, rather than the language of the description. Therefore, a new element <textLang> has been proposed:

<textLang langkey=MHG>#PCDATA</textLang> with attribute.

This should appear ANYWHERE a phrase-level element can appear. The langkey attribute should be an IDREF so that we can control this, by referring it to an authority list of language identifiers (in fact, the ISO two- and three-letter language codes).

[Implemented in May revision. This element is now widely used: 2382 instances in the test files. But only 541 of those, around 23%, use the langKey attribute]
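As a rough illustration of how langKey values might be checked against an authority list, the following Python sketch counts <textLang> elements that lack a langKey and lists any values not found in the list. It assumes the records are available as well-formed XML rather than SGML; the AUTHORITY set, the function name and the sample record are invented for illustration and are not the agreed list of ISO codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical authority list; a real one would hold the agreed language codes.
AUTHORITY = {"MHG", "LAT", "CHU"}

def check_text_langs(xml_text, authority=AUTHORITY):
    """Return (missing, unknown): the number of <textLang> elements with no
    langKey, and the sorted langKey values absent from the authority list."""
    root = ET.fromstring(xml_text)
    missing = 0
    unknown = set()
    for el in root.iter("textLang"):
        key = el.get("langKey")
        if key is None:
            missing += 1
        elif key not in authority:
            unknown.add(key)
    return missing, sorted(unknown)

# Invented sample record, not taken from the test files.
SAMPLE = """<msContents>
<p><textLang langKey="MHG">Middle High German</textLang></p>
<p><textLang>Latin</textLang></p>
<p><textLang langKey="XXX">unidentified</textLang></p>
</msContents>"""

print(check_text_langs(SAMPLE))
```

A check of this kind would only approximate what an SGML parser does with a true IDREF, but it could be run over the test files without changing the DTD.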

<signatures> and <catchwords>

These are to become phrase-level elements available anywhere within a <p> and analogous elements

[<signatures>: not used even once in the test files!!! <catchwords>: used just twice, by AMI:

<collation><p>The manuscript consists of 7 gatherings. The first four gatherings are complete in 8 folios and consecutive. <catchwords>There are catchwords at the end of each gathering except on f. 38.</catchwords>]

attributes attested and accepted

It seems we need a mechanism to say: is this information derived from the manuscript (attested)? Do we accept this attestation (accepted)? Attribute values: This would be most useful on the author, title and respStmt elements. Agreed that this would indeed be useful.

[not implemented in May revisions, though described in the reference documentation. In fact, used four times only by AMI:

<author attested="unk" accepted="no">Bonaventura (Giovanni di Fidanza)</author>

<author attested="no" accepted="yes">Cistercian monk of the Abbey of Pontigny</author>

as a result, these records do not parse!]

names

The rigorous and exact naming of persons or other phenomena associated with any aspect of manuscript creation is crucial. If anything, we have TOO MANY options, some of them documented in notes by PR following the Berkeley meeting. Among the options:

[<name> is used widely in the test files: 1553 instances. In 1467 of these it is used with the type attribute, thus:

<origin><p>The manuscript was written in <name type="place">Villingaholt</name>, in southern Iceland, in the <date>mid-17th century</date> by <name type="person">J&oacute;n Erlendsson</name>.</p></origin>

<provenance><p>The manuscript was owned by <name type="person">J&oacute;n &THORN;orl&aacute;ksson</name> (c. 1643-1712).</p></provenance>

attribute values for "type" used include: person 16; place 219; owner 780; scribe 54; binder 33; scholar 325; institution 6; author 13 -- these account for all but 21 of the name type= uses.

<persname> <term> and <orgname> are NOT used AT ALL!]

secundo folio

Agreed as a phrase-level element, <secFol>, within notes and paragraphs.

[implemented in May revisions. Not used even ONCE in the test files!!!!]

<foliation> and <paratext>

accepted as children of physDesc, both with p+ content

[foliation used 36 times in test files, paratext 135 times, for example:

<foliation><p>The manuscript is foliated 1-194 (incl. 1bis). The manuscript is paginated in the upper right-hand corner on some of the rectos and in the upper left-hand corner on some of the versos: 5, 7, 9, 12-18, 21-26, 28, 30, 32, 36-40, 42, 44, 50, 58. From 60 only each 10th page is paginated: 60, 70, 80 etc. to 200.</p></foliation>

<paratext><p>Until folio 60 the content of the chapter is given in the margin of the codex as a guidance to the insertion of the chapter-headings. Fol. 2r-11v (- 10v)is provided with linenumbers.</p></paratext>

]

<deconote>

to be made available within binding, bindingDesc and msitem elements

[implemented in May revisions. 212 uses of decoNote in the test files. I cannot tell whether any of these are within binding etc: all I have seen are within <decoration>]

rename <marginalia> as <additions>

Necessary because many such annotations etc are NOT written in the margin!

[Implemented in May revisions. 90 uses in the test files, for example:

<additions>

<p>

Indications &agrave; l'encre rouge pour les lectures : "in refectorio" (ff.5,26,42v)

</p>

</additions>

]

<abstract> renamed as <summary>

Designed to hold a discursive account of manuscript matter, contained within <msItem>. (It may be used in accounts of legal documents, corresponding to the use of 'docket' by some cataloguers, eg Consuelo Dutschke.)

[implemented in May revisions. Used 19 times in test files:

<summary>

<locus>f.2</locus>

Sunt hec collecta libro vulgalia multa, ex alphabeto distincte scripta teneto, et positum titulo quodlibet est proprio.

</summary>

There is a question of whether this element should be used to hold a summary statement TRANSCRIBED from the manuscript (as here); or should it be a summary supplied by the cataloguer? I think all the uses so far are the first kind.

]

<incipit> <explicit> <rubric> <msitem> elements

The old <finalRubric> element is removed, in favour of <rubric type=final>.

[268 uses of <rubric>; 21 of these are type=final]

A type attribute is needed on incipit and explicit to indicate whether this is defective, etc. We need also to indicate whether the incipit is a supplied title (or whether we want to use incipits as supplied titles at all)

[implemented in May revisions. Very widely used by AMI: 303 instances of incipit type=def, 316 of explicit type=def]

Some means of classification of <msItem> is desired, perhaps through a type attribute. Lou suggests that if we want to apply a classification to the items, we should use the class attribute, which comes with a TEI mechanism for declaring classifications.

[implemented in May revisions. Not used even once!!!!]

<dimensions> leaf options

Suggested that we have a simple type attribute with an open list of attribute values.

[implemented in May revisions. Of 945 uses of <dimensions> in the test files, 294 use the type attribute. Values include: leaf 278; binding 6; written 4.]

<physdesc> content model

The MASTER group has always worked to a strict content model: the elements within physDesc may not repeat and must occur in a fixed order. The TEI workgroup would like to see the elements repeatable and occurring in any order. MASTER determined to retain the stricter model. The Bulgarians pointed out that if we allowed elements to repeat, it would be easy to implement bi-lingual descriptions: one would just duplicate the element structure, changing the language of the content.

[PR LB CWD have since agreed that the difference between the two groups should be reflected in the documentation and DTDs: that is, we will state the two different models and produce different DTDs for the two models. The effect will be that all MASTER records will pass against the TEI DTD, but not the reverse.]
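The subset relation between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ORDER list below is a hypothetical ordering of a few physDesc children named in these minutes, not the order fixed in the DTD, and the two function names are invented.

```python
# Hypothetical fixed order of some <physDesc> children, for illustration only.
ORDER = ["support", "foliation", "collation", "dimensions",
         "msWriting", "decoration", "paratext", "additions"]

def valid_loose(children):
    """Looser (TEI) model: known elements, any order, repeats allowed."""
    return all(c in ORDER for c in children)

def valid_strict(children):
    """Stricter (MASTER) model: known elements, fixed order, no repeats."""
    if not valid_loose(children):
        return False
    positions = [ORDER.index(c) for c in children]
    # Strictly increasing positions rule out both reordering and repetition.
    return all(a < b for a, b in zip(positions, positions[1:]))

master_style = ["support", "foliation", "dimensions"]
bilingual = ["foliation", "foliation", "support"]  # repeated, out of order

print(valid_strict(master_style), valid_loose(master_style))
print(valid_strict(bilingual), valid_loose(bilingual))
```

Every sequence accepted by valid_strict is also accepted by valid_loose, which mirrors the statement that MASTER records will pass against the TEI DTD but not the reverse. The repeated-element case corresponds to the bi-lingual descriptions the Bulgarian partners had in mind.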

<heraldry>

We know we need this, and it should be a phrase level element available almost anywhere, and have phrase content.

[Implemented in May revisions. Not documented, and no one has used it in the test files]

declaring type/class of manuscript

There appears to be a need to say the kind of manuscript we are dealing with at the level of <msDescription> and <msPart>. We agreed to add a type attribute to these elements to say whether this is a diploma, or a codex, etc.

[Implemented. However, there is not one use of this in the test files -- compared to 500+ uses of the status attribute (with values uni compo frag def)]

refining treatment of language and alphabets

This was the focus of a presentation from the Bulgarian group, which made clear that there needs to be a more robust mechanism for the following:

Dealing with each of these in turn:

the languages in the manuscript

The <textLang> element [now widely used] copes very well with this

the languages of transcribed text in <q> elements etc

Use of the universal lang attribute on text transcription elements (<q>, <incipit>, <explicit>) should cope with this. LB also proposed the addition of a hand attribute to these, where we want to indicate a scribal hand [not implemented]

[Are we sure the lang attribute used this way is correct?]

the palaeographic and orthographic aspects of the manuscript

We need both better means of describing these formally, and then of characterizing individual parts of manuscripts in terms of these formal descriptions. The <msWriting> element is the right place to hold the formal description. This can hold one <handdesc> element for each hand, with scribe script medium and scope attributes on this enabling formal identification of these aspects. Effectively, one would give each <handdesc> an ID and then refer to this ID in the hand attribute on <q> <incipit> and other transcription elements.

In addition, MJD proposed a set of further elements: palaeography orthography morphology which should be used to describe these aspects of the manuscript writing more fully.

[The additional attributes on handdesc have been implemented. However, the hand attribute (as IDREF) has not been added to the transcription elements. The palaeography orthography morphology elements are described in the reference documentation but have NOT been implemented in the DTD.

There are some uncertainties here. There will be problems with scope: if we give a <handdesc> an ID, under SGML rules this ID must uniquely identify this hand for the WHOLE SGML document, which might contain many thousand <handdesc>s: that is, we cannot restrict this hand identification to this particular <msDescription>. Thus: 'hand b' in one manuscript might clash with 'hand b' in another manuscript.

There are also problems with the overlap between this msWriting/handDesc mechanism (available within msDescriptions) and the handlist/hand mechanism (available only in the TEI header): again, there could be collisions between declarations of 'hand b' in the two places.
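The scope problem can be made concrete with a small Python sketch that looks for ID values used more than once anywhere in a document. It assumes the records are well-formed XML with an id attribute on the relevant elements; the function name and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text):
    """Return the sorted id values that occur more than once in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(i for i, n in counts.items() if n > 1)

# Invented sample: two descriptions in one document, each with a 'hand b'.
SAMPLE = """<text>
<msDescription id="ms1"><msWriting>
<handDesc id="hand-b"><p>Hand b of the first manuscript.</p></handDesc>
</msWriting></msDescription>
<msDescription id="ms2"><msWriting>
<handDesc id="hand-b"><p>Hand b of the second manuscript.</p></handDesc>
</msWriting></msDescription>
</text>"""

print(duplicate_ids(SAMPLE))
```

One conceivable way around such clashes, offered here as an assumption rather than anything agreed, would be to prefix each hand ID with an identifier for its manuscript.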

In the test files, there are 216 uses of <msWriting>, by all partners, and 213 uses of <handDesc>, again by all partners, with considerable use of the attributes. These mechanisms seem to have made possible some very rich encoding, as in this example (abbreviated):

<msWriting hands="13">

<handDesc scribe="AM 544 4to 1" script="carolingian-insular minuscule" medium="ink" scope="sole"><p>Hand 1 occurs on 1r-14v and has been called <q>the first Norwegian hand</q> as the writer is obviously not Icelander. The script is carolingian-insular minuscule</p></handDesc>

<handDesc scribe="AM 544 4to 2" script="old gothic bookhand" medium="ink" scope="sole"><p>Hand 2 is found on 15r-18v,l.31. The script of this hand is old gothic bookhand</p></handDesc>

…. Various hand descriptions omitted …

<handDesc scribe="AM 544 4to 14" script="gothic bookhand" medium="ink" scope="sole"><p>Hand 14 occurs on 107v:l-5. The script is gothic bookhand.</p></handDesc>

</msWriting>

]

the combination of language and writing system used in the manuscript

Bulgarian is characterised by a division between language and writing system. For example: one could write an incipit in Old Church Slavonic language using a Latin alphabet; or in Old Church Slavonic using a Church Slavonic alphabet. This contradicts directly a fundamental tenet of TEI, that a language is the COMBINATION of language and writing system. LB pointed out that the Bulgarian situation could be dealt with by defining additional 'two part' language codes. Thus: CHU-CHU would mean Church Slavonic written in the Church Slavonic alphabet; CHU-LAT would mean Church Slavonic written in the Latin alphabet.
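A minimal Python sketch of how such 'two part' codes might be read by software follows. The function name is invented, and the treatment of a one-part code (writing system left unspecified) is an assumption made for illustration, not something decided at the meeting.

```python
def split_lang_code(code):
    """Split a two-part code such as 'CHU-LAT' into (language, writing system).

    A one-part code is assumed to leave the writing system unspecified (None).
    """
    language, sep, script = code.partition("-")
    return language, (script if sep else None)

for code in ("CHU-CHU", "CHU-LAT", "MHG"):
    print(code, split_lang_code(code))
```

Keeping the two parts in a single code value means the combination remains one 'language' in the TEI sense, while still letting a search separate language from alphabet.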


Dated/dateable elements

The dateable attribute class, devised by Lou, seems to cope very well with the problems of indicating different kinds of dates.

[This is proving very popular with the partners: 786 uses of the notBefore etc attributes in the test files: 115 with <origdate>, 174 with <binding>, 320 with <origin>, 20 with <custEvent>, 89 with <acquisition>]
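As an illustration of the kind of consistency check these attributes make possible, the Python sketch below lists elements whose notBefore value is later than their notAfter value. It rests on several assumptions: that the records are well-formed XML, that a notAfter attribute accompanies notBefore, and that both hold plain year numbers. The function name and sample are invented.

```python
import xml.etree.ElementTree as ET

def inconsistent_ranges(xml_text):
    """Return the tags of elements whose notBefore is later than notAfter.

    Assumes both attributes, where present, hold plain year numbers.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for el in root.iter():
        nb, na = el.get("notBefore"), el.get("notAfter")
        if nb is not None and na is not None and int(nb) > int(na):
            bad.append(el.tag)
    return bad

# Invented sample; the second range is deliberately reversed.
SAMPLE = """<history>
<origin notBefore="1300" notAfter="1325"><p>Written in Iceland.</p></origin>
<acquisition notBefore="1720" notAfter="1700"><p>Acquired later.</p></acquisition>
</history>"""

print(inconsistent_ranges(SAMPLE))
```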

Using <msIdentifier> to refer to manuscripts cited (not described)

The very robust method devised by MASTER to identify the manuscript being described would appear to be suitable to identify manuscripts referred to, rather than described. This was discussed at some length. Some thought that this might weaken the reliability of the mechanism, and proposed that a distinct element (<msBibl>?) be created for this. No decision was made at this meeting (though some thought otherwise!).

[Later discussion, also involving the TEI group, agreed that the <msIdentifier> mechanism should indeed be used to refer to cited manuscripts, and there was no need for a new element for this. However, the TEI group believe that it cannot be used for cited manuscripts without a relaxation of the content model, enabling the different elements to repeat, to occur in multiple orders, and with pcdata between elements. Thus, a similar agreement was made as for <physDesc>: this looser model will be part of the TEI DTD, which will so form a superset of the MASTER DTD.]

watermarks

The lack of any facilities to record information about watermarks is a clear deficiency. The group was not certain whether it should be a 'paragraph' level element, whose position within the document hierarchy would then have to be fixed (probably, within <support>) or a 'phrase level' element, which might occur within any <p> element and so might occur effectively anywhere.

[The reference documentation actually suggests BOTH, which is impossible -- or rather, the description suggests 'paragraph level' while one example is 'phrase level'. This needs resolution]

<overview>

Towards the end of the meeting, we realized we had created a large number of elements with a specialized substructure. For example: <msWriting> can contain a series of <handDesc> elements. It was felt that people would need the facility to make more general statements about the element, in addition to those specific statements made in the <handDesc> elements. Thus, we should provide an <overview> element to carry such more general descriptions.

[The reference documentation describes this <overview> element but it is NOT implemented in the DTD. In fact, one could use <p> elements to contain any such overview, if it were felt desirable. Indeed, some encoders are doing just this:

<msWriting hands="3"><p>Written in three or possibly four different hands, but there are no orthographic discrepancies supporting the theory of the four hands, therefore this account will count on three different hands.</p>

<handDesc scribe="AM02-061-H1" script="younger gothic" medium="ink" scope="major"><p>ff. 1v-109v, which contains the older part of the manuscript, is written in a handsome, clear and very practised icelandic gothic bookhand.</p></handDesc>

..text omitted…

</msWriting>

There seems little point in having this element.]

<foliation>

At present, this has only the global attributes ID lang N etc. This is clearly inadequate. There needs to be more definition here -- but several efforts to define this have run into the same sort of problems which have plagued the attempt to map precise attribute sets/values to incipit and other elements.