Making and Parsing Master Records using NoteTab within Windows

Downloading and Installing NoteTab
Using NoteTab
Notes on using NoteTab Pro
Creating and parsing MASTER records
Using a parser on your own computer
Downloading and Installing NoteTab

The first step in making Master records with NoteTab is to retrieve the software. A good site to download the software from is www.tucows.com:

  • Once at this site you should select your operating platform (i.e. Windows 95/98, or NT)
  • The next screen will ask you which region of the world you are in.
  • This will be followed by a list of mirrors in your region for tucows. Select your nearest server.
  • In the next screen, click on the link to 'Download software'.
  • The next screen will show a choice of types of software to download. From the section HTML Tools choose Editors Text.
  • This will take you to a list of HTML editors. Scroll down the list until you reach NoteTab Light.

(A note on NoteTab Light and NoteTab Pro. NoteTab Light is freeware. However, it is a cut-down version of NoteTab Pro. It does not allow tag highlighting and several other features. If you'd like to download NoteTab Pro you should go to HTML Tools: HTML Editors Advanced. This will give you a copy of NoteTab Pro that will last for 30 days. After the 30 days the program is disabled and a copy may be bought at a cost of $19.95)

Once you have located NoteTab Pro or Light hit the 'Download now' button on the page and download the program to an appropriate location on your hard drive. This will give you a compressed zip file. (If you do not have a decompression program, Winzip is the most popular and this can also be downloaded from the tucows site under General Tools: Compression Utilities.)

The next stage is to unzip and install the program to your hard drive. The easiest way to do this is to click on the install icon in Winzip. Follow the installation instructions and close Winzip after the program has been installed successfully.

You should have now installed NoteTab on your computer, and an icon for NoteTab will have been added to your Start: programs menu.


Using NoteTab

Once you have started NoteTab you will see a group of tabs running across the bottom of the screen (see fig 1 below).
Fig 1
If you click on one of the tabs (e.g. HTML) another window appears on the left hand side of the screen (see fig 2 below).
Fig 2
Within this window there is a list of mark-up tags that can be inserted into your open document in the right hand window. To insert one of these tags simply left double-click on it (e.g. Paragraph). You will see the tags <P></P> appear, ready for you to insert the text between them.

The next stage is to add the Master specific tab (as developed by Matthew Driscoll, refined by Adrian Welsh and Peter Robinson) to your copy of NoteTab. This will let you access the Master SGML/XML tags in the same way we have just done for the HTML tags on the HTML tab.

The MASTER project has developed two tab files: one for preparing SGML encoded descriptions, one for XML encoded descriptions. We recommend that you download the XML tab file, as the MASTER project (from 1 January 2001) now gives a priority to support for XML.

You can download the XML tab file for MASTER from the Master site, masterx.clb. (If for any reason you would prefer to make SGML descriptions, you can obtain the SGML tab file from the same place, msdescription.clb.) You should place this file in the libraries directory, located within the NoteTab directory. If you have done a standard installation of NoteTab you will find the NoteTab directory in the program files directory on your C: drive (see fig 3 below).
Fig 3
(If you installed NoteTab to a different directory, look there instead.)

Once the masterx.clb file has been placed in the libraries directory, a tab called masterx will appear along the bottom of NoteTab. Click on this, and the list of tags on the left of the screen will now show the tags to be used for creating your Master records according to the current DTD.


Notes on using NoteTab Pro

Highlighting Tags

Tag highlighting is available only in NoteTab Pro. To enable it, go to the View: Options menu item and click on the HTML files tab. In the empty box under HTML file extensions, add the suffix for your Master files (i.e. sgm or xml, or both) and ensure that the highlight HTML tags checkbox is ticked (see fig 4 below).

Fig 4

Creating and parsing MASTER records

Creating and parsing a simple Master record.

Below is an example of a very simple manuscript record before it has been marked up in XML:

       

Oxford, Corpus Christi College, MS 198
Geoffrey Chaucer The Canterbury Tales. c. 1400

This manuscript record consists of just two parts:

  • The manuscript identifier (Oxford, Corpus Christi College, MS 198)
  • A statement of what the manuscript contains and its date (Geoffrey Chaucer The Canterbury Tales. c. 1400)

This can be encoded as follows using the MASTER system. First, we must start a new <msDescription> element to contain the whole description. In NoteTab, double click the msDescription start tag in the left hand window. A dialogue box will appear asking you to enter your status type. Choose one of the following:

  • uni: unitary: the manuscript is a complete entity which exists as a single fragment
  • compo: composite: the manuscript is a complete entity comprising multiple fragments
  • frag: fragmentary: the manuscript is an incomplete assemblage of one or more fragments
  • unknown: unknown or unstated.

In this case, choose uni and click OK. You will see the start of the encoded manuscript description appear in the right hand window: <msDescription status='uni'>

Now, we are ready to state the manuscript identifier. This is contained in a <msIdentifier> element. Double click on msIdentifier in the left hand window. A dialogue box will ask for details of the number, country code, country, city, repository, collection and shelfmark: all of these except shelfmark are optional. Fill in the number ('1'), country ('Great Britain'), country code ('GB'), city ('Oxford'), repository ('Corpus Christi College') and shelfmark ('MS 198') boxes, then click OK. The following will appear in the right hand window:

       <msIdentifier n="1">
<country reg="GB">Great Britain</country>
<settlement>Oxford</settlement>
<repository>Corpus Christi College</repository>
<idNo>MS 198</idNo>
</msIdentifier>

Next, we wish to encode the statement of what the manuscript contains and its date. We use the msHeading element to do this. Select 'Summary description (msHeading)' in the left hand window and fill in the appropriate boxes, so that you see in the right hand window:

       <msHeading>
<title>The Canterbury Tales</title>
<author>Geoffrey Chaucer</author>
<origPlace>?</origPlace>
<origDate notBefore="1395" notAfter="1420">c. 1400</origDate>
<textLang langKey="ENM">Middle English</textLang>
</msHeading>

Observe the possibilities for precise searches which this encoding allows. We could find this manuscript as dated between 1395 and 1420; as being written in Middle English (with language key "ENM"), and more.
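To illustrate the kind of search this encoding makes possible, here is a minimal sketch in Python using only the standard library. It works on a well-formed copy of the msHeading element for this manuscript. The helper name `matches` is hypothetical and is not part of the MASTER tools; a real search system would work across many records.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the msHeading element for MS 198.
RECORD = """<msHeading>
<title>The Canterbury Tales</title>
<author>Geoffrey Chaucer</author>
<origPlace>?</origPlace>
<origDate notBefore="1395" notAfter="1420">c. 1400</origDate>
<textLang langKey="ENM">Middle English</textLang>
</msHeading>"""


def matches(heading, year, lang_key):
    """Return True if `year` lies within the origDate range and the
    language key matches. (Hypothetical helper, for illustration only.)"""
    date = heading.find("origDate")
    lang = heading.find("textLang")
    if date is None or lang is None:
        return False
    in_range = int(date.get("notBefore")) <= year <= int(date.get("notAfter"))
    return in_range and lang.get("langKey") == lang_key


heading = ET.fromstring(RECORD)
found = matches(heading, 1400, "ENM")   # year inside 1395-1420
missed = matches(heading, 1450, "ENM")  # year outside the range
```

The attributes notBefore, notAfter and langKey carry the searchable values, while the element content ('c. 1400', 'Middle English') stays free for human readers.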

We have now got a complete, though rather short, manuscript description. This should appear as follows:

       <msDescription status='uni'>
       <msIdentifier n="1">
<country reg="GB">Great Britain</country>
<settlement>Oxford</settlement>
<repository>Corpus Christi College</repository>
<idNo>MS 198</idNo>
</msIdentifier>
       <msHeading>
<title>The Canterbury Tales</title>
<author>Geoffrey Chaucer</author>
<origPlace>?</origPlace>
<origDate notBefore="1395" notAfter="1420">c. 1400</origDate>
<textLang langKey="ENM">Middle English</textLang>
</msHeading>
       </msDescription>

You can now parse this manuscript description, to make sure that everything is in order:

  • Save the file (for example, with the name 'mytest1.xml')
  • Double click on 'validate document using external validator' in the left hand window (in 'Validation')
  • This should open up your internet browser, and take you to the MASTER online parser, at http://www.cta.dmu.ac.uk/projects/master/parser
  • Press the Browse button, and select the file you have just made
  • Then press the Send File button

The parser will then parse the file. If there are errors, it will tell you where the errors are. If there are no errors, you will receive an encouraging message.
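Before sending a file to the online parser, it can save time to check locally that every start tag has a matching end tag. The following is a rough sketch in Python, assuming only the standard library; the function name `check_well_formed` is made up for this example. It tests well-formedness only and does not check the record against the MASTER DTD, so it is no substitute for the parser.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text is well-formed XML, otherwise the
    parser's error message. This does NOT validate against a DTD."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# A correctly closed <country> element, and one closed with the wrong tag.
good = "<msIdentifier><country reg='GB'>Great Britain</country></msIdentifier>"
bad = "<msIdentifier><country reg='GB'>Great Britain</settlement></msIdentifier>"

good_result = check_well_formed(good)
bad_result = check_well_formed(bad)
```

Here good_result is None, while bad_result holds a message giving the line and column of the mismatched tag.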

Creating and parsing a more complex Master record.

The above example is rather simplistic. The real power of the MASTER encoding is its ability to deal with very complex manuscript records. Here is a slightly fuller description of the same manuscript:

       

Oxford, Corpus Christi College, MS 198
Geoffrey Chaucer The Canterbury Tales. c. 1400
Folios 1r-266v. The Canterbury Tales. A274-I290. Defective at beginning and end.
Parchment, trimmed. 33.5 x 22.5 cm. Quires [14, 15, and 28] were disordered in the previous binding. They have been reordered and refoliated, with the old foliation being the uppermost. Two consecutive folios are numbered '64a' and '64'
Written by the scribe identified by Doyle and Parkes as 'Hand d'
Dated c. 1400 (personal communication, Malcolm Parkes). On fol. 146r is the name 'Burle' in drypoint, in the margin next to E1396. Cp came to the College as a bequest of William Fulman, according to a note on fol. 1r : 'Liber C.C.C.Oxon Ex dono Gulielmi Fulman A.M. hujus Collegii quondam socius.'

We can now give more detail about the manuscript. The <msContents> element allows us to give details of the texts contained by the manuscript, in a series of <msItem> elements (one for each text). This manuscript has only one text (the Canterbury Tales), and so for this description we have only one <msItem>. Note that this msItem is defective, and we can use the 'defective' attribute to indicate this. Within <msItem> we have a <locus> element, to state the folios or pages on which this text is found, and a <title> element to state its title. (You could also use <author> to state the author again). You state the exact text contained in the manuscript (A274-I290) by using a <biblScope> element within a <bibl>, and use a <note> element to state that the text is defective:

       <msContents>
<msItem n="1" defective="yes">
<locus from="1" to="266">Folios 1r-266v</locus>
<title type="uniform">The Canterbury Tales</title>
<bibl><biblScope>A274-I290</biblScope></bibl>
<note>Defective at beginning and end</note>
</msItem>
</msContents>

We now state the form of the manuscript (<form>), its material (<support>), its dimensions (<dimensions>), its collation (<collation>) and its writing (<msWriting>), all within a <physDesc> element:

       <physDesc>
<form><p>Codex.</p></form>
<support><p>Parchment, trimmed.</p></support>
<extent>266.<dimensions type="leaf" scope="all"><height>33.5</height><width>22.5</width></dimensions></extent>
<collation><p>Quires [14, 15, and 28] were disordered in the previous binding. They have been reordered and refoliated with the old foliation being the uppermost. Two consecutive folios are numbered '64a' and '64'</p></collation>
<msWriting hands="1">
<handDesc scribe="Hand D (Doyle/Parkes)" script="Anglicana" medium="ink" scope="sole"><p>Written by the scribe identified by Doyle and Parkes as '<name type="person" role="scribe" key="DPhandD">Hand d</name>'</p></handDesc>
</msWriting>
</physDesc>

Notice the use of the <name> element here to create a formal and indexable statement of the name of the scribe. With this encoding, we could include the scribe in indices of all persons mentioned in the descriptions or of all scribes, and find all instances of the scribe labelled 'DPhandD'.
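As a sketch of how such an index might be built, the Python fragment below collects <name> elements by their key attribute, optionally filtered by role. It uses only the standard library; the function `index_names` is a hypothetical helper for illustration, not something supplied by the MASTER project.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The handDesc element from the description above.
HAND_DESC = (
    '<handDesc scribe="Hand D (Doyle/Parkes)" script="Anglicana" '
    'medium="ink" scope="sole"><p>Written by the scribe identified by '
    "Doyle and Parkes as '"
    '<name type="person" role="scribe" key="DPhandD">Hand d</name>'
    "'</p></handDesc>"
)


def index_names(root, role=None):
    """Map each key attribute to the list of name texts found under root.
    If role is given, only names with that role are included."""
    index = defaultdict(list)
    for name in root.iter("name"):
        if role is not None and name.get("role") != role:
            continue
        index[name.get("key")].append(name.text)
    return dict(index)


scribes = index_names(ET.fromstring(HAND_DESC), role="scribe")
```

Run over a whole collection of descriptions, the same loop would gather every mention of 'DPhandD' under a single key.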

We use the <history> element to give the history of the manuscript, within <origin>, <provenance> and <acquisition> elements:

       <history>
<origin notBefore="1395" notAfter="1420" certainty="high" evidence="conjecture"><p>Dated c. <origDate>1400</origDate> (personal communication, Malcolm Parkes).</p></origin>
<provenance><p>On fol. 146r is the name 'Burle' in drypoint, in the margin next to E1396. </p></provenance>
<acquisition><p>Cp came to the College as a bequest of William Fulman, according to a note on fol. 1r : <q>'Liber C.C.C. Oxon Ex dono Gulielmi Fulman A.M. hujus Collegii quondam socius.'</q></p></acquisition>
</history>

This example record in its entirety would look thus:

       <msDescription status='uni'>
       <msIdentifier n="1">
<country reg="GB">Great Britain</country>
<settlement>Oxford</settlement>
<repository>Corpus Christi College</repository>
<idNo>MS 198</idNo>
</msIdentifier>
       <msHeading>
<title>The Canterbury Tales</title>
<author>Geoffrey Chaucer</author>
<origPlace>?</origPlace>
<origDate notBefore="1395" notAfter="1420">c. 1400</origDate>
<textLang langKey="ENM">Middle English</textLang>
</msHeading>
       <msContents>
<msItem n="1" defective="yes">
<locus from="1" to="266">Folios 1r-266v</locus>
<title type="uniform">The Canterbury Tales</title>
<bibl><biblScope>A274-I290</biblScope></bibl>
<note>Defective at beginning and end</note>
</msItem>
</msContents>
       <physDesc>
<form><p>Codex.</p></form>
<support><p>Parchment, trimmed.</p></support>
<extent>266.<dimensions type="leaf" scope="all"><height>33.5</height><width>22.5</width></dimensions></extent>
<collation><p>Quires [14, 15, and 28] were disordered in the previous binding. They have been reordered and refoliated with the old foliation being the uppermost. Two consecutive folios are numbered '64a' and '64'</p></collation>
<msWriting hands="1">
<handDesc scribe="Hand D (Doyle/Parkes)" script="Anglicana" medium="ink" scope="sole"><p>Written by the scribe identified by Doyle and Parkes as '<name type="person" role="scribe" key="DPhandD">Hand d</name>'</p></handDesc>
</msWriting>
</physDesc>
       <history>
<origin notBefore="1395" notAfter="1420" certainty="high" evidence="conjecture"><p>Dated c. <origDate>1400</origDate> (personal communication, Malcolm Parkes).</p></origin>
<provenance><p>On fol. 146r is the name 'Burle' in drypoint, in the margin next to E1396. </p></provenance>
<acquisition><p>Cp came to the College as a bequest of William Fulman, according to a note on fol. 1r : <q>'Liber C.C.C. Oxon Ex dono Gulielmi Fulman A.M. hujus Collegii quondam socius.'</q></p></acquisition>
</history>
       </msDescription>

You can now parse this fuller record online, in the same way as before, by double clicking on 'validate document using external validator' in the left hand window (in 'Validation').

This example gives only a brief account of the MASTER encoding. A more detailed description of the tags can be found in the formal reference documentation for Master. Clearly, too, there are many different ways of applying the MASTER encoding. The draft set of cataloguing rules agreed by the MASTER project participants, prepared by Richard Gartner, attempts to define how the tags should be used: see http://www.hcu.ox.ac.uk/TEI/Master/Cataloguing/catrules.html.


Using a parser on your own computer

Setting up the parser
Adding power to the internal parser: referring to entities
Configuring the parser to use standard dtd and entity files

Setting up the parser

The system outlined above depends on you using the on-line parser maintained by the MASTER partners at De Montfort University. However, under some circumstances this will not be satisfactory. For example: the online parser is designed to parse a single record at a time. When you have hundreds of manuscript records, this will not be convenient. Also, you may want to include additional materials (bibliographic records, information about particular scribes and other people) alongside the manuscript descriptions, and in this case the 'single record' parser will not be satisfactory. You will then need to set up a parser on your own machine, and to parse the records with this.

In order to parse the manuscript records on your own machine, you will need:

  • A parser. MASTER has found James Clark's NSGMLS excellent. You can download this from James Clark's website, at www.jclark.com. (Click on the SP link, and then 'How to get SP', and then 'binaries for Windows')
  • To install the parser where NoteTab can find it. Install this in a directory called 'SP' on your 'C' drive. The 'SP' directory should contain three directories, 'bin', 'pubtext' and 'doc', with 'nsgmls.exe' in the 'bin' directory.
  • To set the parser up for XML. You will find a file 'xml.dcl' in the 'pubtext' directory. Copy this into the 'bin' directory.
  • A document type definition. Download the MASTER xml dtd file www.hcu.ox.ac.uk/TEI/Master/Reference/DTD/masterx.dtd and place it in the same folder as the XML file you wish to parse.

You are now ready to parse files on your own machine. You do this as follows:

  • Insert a 'doctype' declaration at the beginning of the file you wish to parse. Do this by moving the cursor to the beginning of the document, then clicking on 'doctype declaration' at the top of the lefthand window. The following should appear in your document on the right:
    <?xml version="1.0"?>
    <!DOCTYPE msDescription SYSTEM "masterx.dtd" [
    ]>
  • Now, choose 'validate XML document using internal parser' from 'Validation' in the left hand screen. NoteTab should now call the parser, and open a new window to show the results of the parse.

You may notice some inconsistencies between the results of the on-line parser and the internal parser. This is because the on-line parser makes certain assumptions about your files. For example: it assumes you may want to refer to language and country identifiers, and so includes all ISO language and country identifiers as it parses. Thus, when you refer to <textLang langKey="ENM"> it knows about the langKey ENM and does not indicate any error. The internal parser makes no such assumptions, and so does not know about language and country identifiers, and so may indicate an error here.
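If you have hundreds of records, you may prefer to call the parser on every file in a folder rather than one at a time from NoteTab. The Python sketch below shows one possible way to do this. It assumes the installation paths described above (C:\SP\bin for both nsgmls.exe and xml.dcl), and it assumes the usual nsgmls options: -s to suppress normal output so that only error messages are printed, and -wxml to warn about constructs not allowed in XML. Check these against the SP documentation for your version before relying on them.

```python
import subprocess
from pathlib import Path

# Assumed locations, following the installation steps described above.
NSGMLS = r"C:\SP\bin\nsgmls.exe"
XML_DCL = r"C:\SP\bin\xml.dcl"


def build_command(xml_file):
    """Build the nsgmls command line for one file.
    The XML declaration file (xml.dcl) is passed before the document."""
    return [NSGMLS, "-s", "-wxml", XML_DCL, str(xml_file)]


def parse_folder(folder):
    """Run the parser over every .xml file in folder and return a
    dictionary mapping each file name to the parser's error output."""
    results = {}
    for xml_file in sorted(Path(folder).glob("*.xml")):
        done = subprocess.run(
            build_command(xml_file), capture_output=True, text=True
        )
        results[xml_file.name] = done.stderr
    return results


# Building the command does not require the parser to be installed.
command = build_command("mytest1.xml")
```

Calling parse_folder on a directory of records returns an empty string for each file that parsed without errors.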

Adding power to the internal parser: referring to entities

The above installation of the parser will give you only basic facilities. For example: if you refer to special characters such as é using the standard SGML/XML system of encoding them as entities (here, &eacute;), the parser will report an error. In order to have the parser process these properly, you will need to configure it further.

Information concerning special characters is contained in documents known as 'entity sets'. For the parser to understand these characters, you need to direct the parser to load the relevant entity set documents. For the character é, SGML/XML entity &eacute;, the relevant entity set is that known as 'ISO Latin 1'. This is a public entity set, and is available in many places. You can download an ISO Latin 1 entity set file from www.cta.dmu.ac.uk/projects/master/ISOlat1.ent. To configure your parser to use this entity set:

  • Copy the ISOlat1.ent file into the same directory as the file you want to parse
  • Edit the DOCTYPE statement at the beginning of the file you wish to parse so that it reads as follows:
    <?xml version="1.0"?>
    <!DOCTYPE msDescription SYSTEM "masterx.dtd" [
      <!ENTITY % ISOlat1 SYSTEM "ISOlat1.ent">
      %ISOlat1;
    ] >

The first of these added lines, '<!ENTITY % ISOlat1 SYSTEM "ISOlat1.ent">', gives the name of the entity file and associates it with an entity name, 'ISOlat1'. The second added line, '%ISOlat1;', calls this entity: the effect of this is to tell the parser to load the file 'ISOlat1.ent'.

Notice that these lines are contained in square brackets, before the '>' which closes the DOCTYPE statement. In SGML/XML language, this is known as the 'document type subset' (more formally, the internal subset of the document type declaration). Statements contained within this subset are processed before the external document type definition is read. Thus, you can use it to extend or refine the declarations given to the parser before it parses the document itself.
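The effect of a declaration in the square brackets can be seen in a small experiment. The Python sketch below uses the standard library parser rather than NSGMLS, and instead of loading the whole ISOlat1.ent file it declares the single entity eacute directly in the subset, which is the kind of declaration that file is made up of. The sample element <p> and its text are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the square brackets, so the parser knows it
# before it reads the document content.
WITH_SUBSET = """<!DOCTYPE p [
  <!ENTITY eacute "&#233;">
]>
<p>Andr&eacute;</p>"""

# The same content with no declaration for the entity.
WITHOUT_SUBSET = "<p>Andr&eacute;</p>"

resolved = ET.fromstring(WITH_SUBSET).text

try:
    ET.fromstring(WITHOUT_SUBSET)
    undeclared_error = None
except ET.ParseError as err:
    undeclared_error = str(err)
```

With the declaration in place the text comes back with the accented character; without it the parser stops with an error about the undefined entity.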

Configuring the parser to use standard dtd and entity files

When you have many different manuscript descriptions, you may need to keep these in many different directories. In the simple system here outlined, you would need to include separate dtd and entity files in each directory. This is inconvenient and might lead to serious problems with different versions of the DTD and entity files in different directories.

You can avoid this problem by configuring the parser so that it always points at one and only one set of dtd and entity files. You do this by altering the statements in the document type subset so that they point at the same files, wherever the file you are parsing happens to be. For example:

<?xml version="1.0"?>
<!DOCTYPE msDescription SYSTEM "C:\Sp\pubtext\masterx.dtd" [
  <!ENTITY % ISOlat1 SYSTEM "C:\Sp\pubtext\ISOlat1.ent">
  %ISOlat1;
] >

Here, you are pointing the parser at the masterx.dtd and ISOlat1.ent files contained in the pubtext directory in the Sp directory on the C drive. (You will have to put these files there first!).
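If you keep the dtd and entity files in one central place, the opening lines can be generated rather than typed into each record. The following Python sketch builds them from two paths. The function name `build_prolog` is hypothetical, and the paths are the ones used in the example above.

```python
def build_prolog(dtd_path, entity_path):
    """Return the XML declaration and DOCTYPE statement for a MASTER
    record, pointing at the given dtd and ISO Latin 1 entity files."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE msDescription SYSTEM "{dtd_path}" [\n'
        f'  <!ENTITY % ISOlat1 SYSTEM "{entity_path}">\n'
        "  %ISOlat1;\n"
        "]>\n"
    )


prolog = build_prolog(
    r"C:\Sp\pubtext\masterx.dtd",
    r"C:\Sp\pubtext\ISOlat1.ent",
)
```

Changing the two arguments in one place then updates the header used for every record you generate.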

If your computer is online, you can use files on the internet. This may be useful if you want to be sure (for example) that you are always using the very latest version of the dtd. This form of the doctype statement will include the XML form of the DTD held on the Oxford MASTER partner's website:

        <!DOCTYPE msDescription SYSTEM "http://www.hcu.ox.ac.uk/TEI/Master/Reference/DTD/masterx.dtd">

Similarly, the public entity file for the ISO Latin 1 entities can also be referenced externally. As this is a 'public' entity set (that is, a set which has undergone a formal registration and acceptance process) this should be referenced slightly differently:

        <!ENTITY % ISOlat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "http://www.cta.dmu.ac.uk/projects/master/ISOlat1.ent">

 

This page was last updated 14 January 2001