XSLT script for cleaning up ECCO texts during TEI P5 conversion
License
This software is dual-licensed:
1. Distributed under a Creative Commons Attribution-ShareAlike 3.0
Unported License http://creativecommons.org/licenses/by-sa/3.0/
2. http://www.opensource.org/licenses/BSD-2-Clause
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
This software is provided by the copyright holders and contributors
"as is" and any express or implied warranties, including, but not
limited to, the implied warranties of merchantability and fitness for
a particular purpose are disclaimed. In no event shall the copyright
holder or contributors be liable for any direct, indirect, incidental,
special, exemplary, or consequential damages (including, but not
limited to, procurement of substitute goods or services; loss of use,
data, or profits; or business interruption) however caused and on any
theory of liability, whether in contract, strict liability, or tort
(including negligence or otherwise) arising in any way out of the use
of this software, even if advised of the possibility of such damage.
Text nodes are examined to find soft-hyphen characters,
which are replaced by an empty <g>. To be on the safe side,
Unicode NFC normalization is applied to the text (some decomposed
characters have been seen in headers).
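A minimal sketch of the text-node handling described above, not the stylesheet's actual template. It assumes that the soft hyphen is U+00AD, that output elements are in the TEI namespace, and that the empty g carries a ref attribute; the value char:EOLhyphen is a placeholder chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0">

  <!-- Sketch only: normalize each text node to NFC, then split it on
       soft hyphens (assumed here to be U+00AD). -->
  <xsl:template match="text()">
    <xsl:analyze-string select="normalize-unicode(., 'NFC')" regex="&#xAD;">
      <xsl:matching-substring>
        <!-- Each soft hyphen becomes an empty g element; the ref value
             is a placeholder, not taken from the source. -->
        <g ref="char:EOLhyphen"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- All other text is copied through in its normalized form. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

NFC normalization composes decomposed characters but leaves the soft hyphen itself untouched, so the order of the two steps does not affect which characters are replaced.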
<xsl:template match="MILESTONE"><xsl:choose><xsl:when test="not(@UNIT) or ( parent::LABEL or (parent::LIST and preceding-sibling::ITEM))"><xsl:call-template name="makenewmilestone"/></xsl:when><xsl:when test="parent::LIST and not(preceding-sibling::ITEM)"><head type="tcpmilestone"><seg type="milestoneunit"><xsl:value-of select="@UNIT"/><xsl:text></xsl:text></seg><xsl:value-of select="@N"/></head></xsl:when><xsl:when test="parent::BIBL and @UNIT"><note type="tcpmilestone"><seg type="milestoneunit"><xsl:value-of select="@UNIT"/><xsl:text></xsl:text></seg><xsl:value-of select="@N"/></note></xsl:when><xsl:when test="@UNIT='Ans;w.' and not(@N)"><milestone type="tcpmilestone" n="Ans;w." unit="unspecified"/></xsl:when><xsl:when test="@UNIT and (parent::SP or parent::SPEAKER or parent::DIV1 or parent::DIV2 or parent::DIV3 or parent::DIV4 or parent::DIV5 or parent::BODY)"><xsl:call-template name="makenewmilestone"/></xsl:when><xsl:otherwise><label type="milestone"><xsl:apply-templates select="@ID"/><xsl:apply-templates select="@REND"/><seg type="milestoneunit"><xsl:value-of select="@UNIT"/><xsl:text></xsl:text></seg><xsl:value-of select="@N"/></label></xsl:otherwise></xsl:choose></xsl:template>
Previous way of doing milestones:
a) if there is only @unit and no @n, make a marginal note
b) if there is only @n and no @unit, make a marginal note, @type='milestone'
c) if @unit is from a closed list of words (page, line, folio), it
seems editorial; add it as @subtype on the note
d) otherwise, make a label from @unit + @n, and put it in a
marginal note, @type='milestone'
Namespace
No namespace
Match
OLDMILESTONE
Mode
#default
Import precedence
0
Source
<xsl:template match="OLDMILESTONE"><xsl:choose><xsl:when test="parent::NOTE and not(@N)"/><xsl:when test="@UNIT and (not(@N) or @N='')"><note place="margin" type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@UNIT"/></note></xsl:when><xsl:when test="parent::L and @ID"><label type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@N"/></label></xsl:when><xsl:when test="not(@UNIT) and @N"><note place="margin" type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@N"/></note></xsl:when><xsl:when test="@UNIT='unspec' and @N"><note place="margin" type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@N"/></note></xsl:when><!-- this short list seem like editorial words. are there more? --><xsl:when test=" @UNIT='article' or @UNIT='canon' or @UNIT='chapter' or @UNIT='commandment' or @UNIT='date' or @UNIT='day' or @UNIT='folio' or @UNIT='ground of' or @UNIT='indulgence' or @UNIT='leaf' or @UNIT='line' or @UNIT='monarch' or @UNIT='motive' or @UNIT='month' or @UNIT='reason' or @UNIT='verse' or @UNIT='year' "><note place="margin" type="milestone" subtype="{@UNIT}"><xsl:apply-templates select="@ID"/><!--
<xsl:if test="$debug='true'">
<xsl:message>Milestone 1: <xsl:value-of
select="@UNIT"/>/<xsl:value-of select="@N"/></xsl:message>
</xsl:if>
--><xsl:value-of select="@N"/></note></xsl:when><xsl:when test="parent::SP or parent::LIST or parent::SPEAKER or parent::LABEL or parent::BIBL"><note place="margin" type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@UNIT"/><xsl:text></xsl:text><xsl:value-of select="@N"/></note></xsl:when><xsl:otherwise><!--
<xsl:if test="$debug='true'">
<xsl:message>Milestone 2: <xsl:value-of
select="@UNIT"/><xsl:text> </xsl:text><xsl:value-of
select="@N"/></xsl:message>
</xsl:if>
--><label place="margin" type="milestone"><xsl:apply-templates select="@ID"/><xsl:value-of select="@UNIT"/><xsl:text></xsl:text><xsl:value-of select="@N"/></label></xsl:otherwise></xsl:choose></xsl:template>
Template
HEADNOTE[P/FIGURE and not(following-sibling::HEAD or following-sibling::OPENER)]
Documentation
Description
The HEADNOTE element can be bypassed if it contains just a figure
and has no following head or opener.
Namespace
No namespace
Match
HEADNOTE[P/FIGURE and not(following-sibling::HEAD or following-sibling::OPENER)]
Mode
#default
Import precedence
0
Source
<xsl:template match="HEADNOTE[P/FIGURE and not(following-sibling::HEAD or following-sibling::OPENER)]"><xsl:apply-templates select="@*|*|processing-instruction()|comment()|text()"/></xsl:template>
<xsl:template match="PUBLICATIONSTMT"><publicationStmt><xsl:choose><xsl:when test="PUBLISHER or AUTHORITY or DISTRIBUTOR"><xsl:apply-templates select="PUBLISHER|AUTHORITY|DISTRIBUTOR"/><xsl:apply-templates select="*[not(self::PUBLISHER or self::DISTRIBUTOR or self::AUTHORITY)]"/><xsl:if test="parent::FILEDESC"><xsl:call-template name="makeID"/></xsl:if><xsl:call-template name="idnoHook"/></xsl:when><xsl:otherwise><p><xsl:apply-templates/><xsl:if test="parent::FILEDESC"><xsl:call-template name="makeID"/></xsl:if><xsl:call-template name="idnoHook"/></p></xsl:otherwise></xsl:choose></publicationStmt></xsl:template>
<xsl:template match="TEXT"><xsl:choose><xsl:when test="parent::ETS or parent::EEBO or parent::GROUP"><text><xsl:apply-templates select="@*|*|processing-instruction()|comment()|text()"/></text></xsl:when><xsl:otherwise><floatingText><xsl:apply-templates select="@*"/><xsl:apply-templates select="*|processing-instruction()|comment()|text()"/></floatingText></xsl:otherwise></xsl:choose></xsl:template>
<xsl:template match="EDITORIALDECL"><editorialDecl><p>EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). The general aim of EEBO-TCP is to encode one copy (usually the first edition) of every monographic English-language title published between 1473 and 1700 available in EEBO.</p><p>EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).</p><p>The EEBO-TCP project was divided into two phases. The 25,363 texts created during Phase 1 of the project have been released into the public domain as of 1 January 2015. Anyone can now take and use these texts for their own purposes, but we respectfully request that due credit and attribution is given to their original source.</p><p>Users should be aware of the process of creating the TCP texts, and therefore of any assumptions that can be made about the data.</p><p>Text selection was based on the New Cambridge Bibliography of English Literature (NCBEL). If an author (or for an anonymous work, the title) appears in NCBEL, then their works are eligible for inclusion. Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period. In general, first editions of works in English were prioritized, although there are a number of works in other languages, notably Latin and Welsh, included and sometimes a second or later edition of a work was chosen if there was a compelling reason to do so.</p><p>Image sets were sent to external keying companies for transcription and basic encoding. 
Quality assurance was then carried out by editorial teams in Oxford and Michigan. 5% (or 5 pages, whichever is the greater) of each text was proofread for accuracy and those which did not meet QA standards were returned to the keyers to be redone. After proofreading, the encoding was enhanced and/or corrected and characters marked as illegible were corrected where possible up to a limit of 100 instances per text. Any remaining illegibles were encoded as <gap>s. Understanding these processes should make clear that, while the overall quality of TCP data is very good, some errors will remain and some readable characters will be marked as illegible. Users should bear in mind that in all likelihood such instances will never have been looked at by a TCP editor.</p><p>The texts were encoded and linked to page images in accordance with level 4 of the TEI in Libraries guidelines.</p><p>Copies of the texts have been issued variously as SGML (TCP schema; ASCII text with mnemonic sdata character entities); displayable XML (TCP schema; characters represented either as UTF-8 Unicode or text strings within braces); or lossless XML (TEI P5, characters represented either as UTF-8 Unicode or TEI g elements).</p><p>Keying and markup guidelines are available at the <ref target="http://www.textcreationpartnership.org/docs/.">Text Creation Partnership web site</ref>.</p></editorialDecl></xsl:template>
<xsl:template match="PUBPLACE"><xsl:choose><xsl:when test="parent::PUBLICATIONSTMT/PUBLISHER or parent::PUBLICATIONSTMT/AUTHORITY or parent::PUBLICATIONSTMT/DISTRIBUTOR"><pubPlace><xsl:apply-templates select="@*"/><xsl:apply-templates/></pubPlace></xsl:when><xsl:otherwise><name type="place"><xsl:apply-templates select="@*"/><xsl:apply-templates/></name></xsl:otherwise></xsl:choose></xsl:template>
@PLACE has both inconsistencies and mistakes; some values
should obviously be @n
Namespace
No namespace
Match
@PLACE
Mode
#default
Import precedence
0
Source
<xsl:template match="@PLACE"><xsl:variable name="p" select="lower-case(.)"/><xsl:choose><xsl:when test="$p='marg' or $p='marg;' or $p='marg)' or $p='marg=' or $p='ma / rg' or $p='6marg'"><xsl:attribute name="place">margin</xsl:attribute></xsl:when><xsl:when test="$p = 'unspecified'"/><xsl:when test="$p='foot;' or $p='foor;' or $p='foot'"><xsl:attribute name="place">bottom</xsl:attribute></xsl:when><xsl:when test="$p='foot1' or $p='foot2'"><xsl:attribute name="place">bottom</xsl:attribute><xsl:attribute name="type" select="$p"/></xsl:when><xsl:when test="$p='inter'"><xsl:attribute name="rend" select="$p"/></xsl:when><xsl:when test="$p='‡' or $p='†' or $p='‖' or $p='6' or $p='“' or $p='1' or $p='*'"><xsl:attribute name="n"><xsl:value-of select="$p"/></xsl:attribute></xsl:when><xsl:otherwise><xsl:attribute name="place"><xsl:value-of select="$p"/></xsl:attribute></xsl:otherwise></xsl:choose></xsl:template>
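The mapping can be tried in isolation with a cut-down stylesheet. The sketch below keeps four of the branches from the template above, writes the value lists as XPath 2.0 sequence comparisons instead of or-chains, and adds a hypothetical NOTE template (not from the source) so that the attribute template has an element to attach to.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical wrapper, for illustration only -->
  <xsl:template match="NOTE">
    <note>
      <xsl:apply-templates select="@PLACE"/>
      <xsl:apply-templates/>
    </note>
  </xsl:template>

  <!-- Reduced version of the @PLACE template: the 'unspecified',
       'foot' and 'inter' branches are left out of this sketch. -->
  <xsl:template match="@PLACE">
    <xsl:variable name="p" select="lower-case(.)"/>
    <xsl:choose>
      <!-- Known misspellings of 'marg' all map to place="margin" -->
      <xsl:when test="$p = ('marg', 'marg;', 'marg)', 'marg=', 'ma / rg', '6marg')">
        <xsl:attribute name="place">margin</xsl:attribute>
      </xsl:when>
      <!-- Numbered footnote positions keep their number in @type -->
      <xsl:when test="$p = ('foot1', 'foot2')">
        <xsl:attribute name="place">bottom</xsl:attribute>
        <xsl:attribute name="type" select="$p"/>
      </xsl:when>
      <!-- Note markers entered as @PLACE by mistake are moved to @n -->
      <xsl:when test="$p = ('‡', '†', '‖', '6', '“', '1', '*')">
        <xsl:attribute name="n" select="$p"/>
      </xsl:when>
      <!-- Anything else is passed through in lower case -->
      <xsl:otherwise>
        <xsl:attribute name="place" select="$p"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch, a NOTE with PLACE="Marg;" comes out as a note with place="margin", and a NOTE with PLACE="foot1" as a note with place="bottom" and type="foot1". A comparison such as `$p = ('foot1', 'foot2')` is true when `$p` equals any item of the sequence, so it behaves like the corresponding or-chain.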
<xsl:template match="@TYPE"><xsl:choose><xsl:when test=".='poem (rebus)'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">rebus</xsl:attribute></xsl:when><xsl:when test=".='poem(s)'"><xsl:attribute name="type">poems</xsl:attribute></xsl:when><xsl:when test=".='poem and response'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">response</xsl:attribute></xsl:when><xsl:when test=".='poem collection'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">collection</xsl:attribute></xsl:when><xsl:when test=".='poem fragment'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">fragment</xsl:attribute></xsl:when><xsl:when test=".='poem fragments'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">fragments</xsl:attribute></xsl:when><xsl:when test=".='poem from author to the reader'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">from_author_to_the_reader</xsl:attribute></xsl:when><xsl:when test=".='poem in honor of Gustavus'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">in_honor_of_Gustavus</xsl:attribute></xsl:when><xsl:when test=".='poem incorporating anagrams'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">incorporating_anagrams</xsl:attribute></xsl:when><xsl:when test=".='poem incorporating the Creed'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">incorporating_the_Creed</xsl:attribute></xsl:when><xsl:when test=".='poem on frontispiece'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">on_frontispiece</xsl:attribute></xsl:when><xsl:when test=".='poem on the seven virtues'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">on_the_seven_virtues</xsl:attribute></xsl:when><xsl:when test=".='poem to Archpapist'"><xsl:attribute 
name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_Archpapist</xsl:attribute></xsl:when><xsl:when test=".='poem to God from second edition'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_God_from_second_edition</xsl:attribute></xsl:when><xsl:when test=".='poem to author'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_author</xsl:attribute></xsl:when><xsl:when test=".='poem to book'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_book</xsl:attribute></xsl:when><xsl:when test=".='poem to king'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_king</xsl:attribute></xsl:when><xsl:when test=".='poem to pupils'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_pupils</xsl:attribute></xsl:when><xsl:when test=".='poem to readers'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_readers</xsl:attribute></xsl:when><xsl:when test=".='poem to subjects'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_subjects</xsl:attribute></xsl:when><xsl:when test=".='poem to the author'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_the_author</xsl:attribute></xsl:when><xsl:when test=".='poem to the censorious reader'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_the_censorious_reader</xsl:attribute></xsl:when><xsl:when test=".='poem to the censors'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_the_censors</xsl:attribute></xsl:when><xsl:when test=".='poem to the pious reader'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">to_the_pious_reader</xsl:attribute></xsl:when><xsl:when test=".='poem to the reader'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute 
name="subtype">to_the_reader</xsl:attribute></xsl:when><xsl:when test=".='poem with commentary'"><xsl:attribute name="type">poem</xsl:attribute><xsl:attribute name="subtype">commentary</xsl:attribute></xsl:when><xsl:when test=".='poem(s) by one author'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">by_one_author</xsl:attribute></xsl:when><xsl:when test=".='poems and commentary'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">commentary</xsl:attribute></xsl:when><xsl:when test=".='poems gratulatory'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">gratulatory</xsl:attribute></xsl:when><xsl:when test=".='poems of acknowledgment'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">acknowledgment</xsl:attribute></xsl:when><xsl:when test=".='poems on the Symbols'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">on_the_Symbols</xsl:attribute></xsl:when><xsl:when test=".='poems to the reader'"><xsl:attribute name="type">poems</xsl:attribute><xsl:attribute name="subtype">to_the_reader</xsl:attribute></xsl:when><xsl:when test="not(normalize-space(.)='')"><xsl:attribute name="type"><xsl:analyze-string regex="([0-9]+)(.*)" select="translate(translate(.,'( &/', '____'),$intype,'')"><xsl:matching-substring><xsl:text>n</xsl:text><xsl:value-of select="regex-group(1)"/><xsl:value-of select="regex-group(2)"/></xsl:matching-substring><xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring></xsl:analyze-string></xsl:attribute></xsl:when></xsl:choose></xsl:template>
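Many of the enumerated values above follow one pattern: the subtype is whatever follows "poem ", with spaces replaced by underscores. As a hedged alternative (not how the stylesheet does it), such values could be derived rather than listed. The DIV1 wrapper and the fallback template are hypothetical, and the prefix list in the regular expression covers only forms that appear in the source.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical wrapper so the attribute templates have a parent element -->
  <xsl:template match="DIV1">
    <div>
      <xsl:apply-templates select="@TYPE"/>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Values such as 'poem to the author' or 'poem on frontispiece':
       type is 'poem', subtype is the remainder with spaces turned
       into underscores (to_the_author, on_frontispiece). -->
  <xsl:template match="@TYPE[matches(., '^poem (to|on|from|in|incorporating) ')]">
    <xsl:attribute name="type">poem</xsl:attribute>
    <xsl:attribute name="subtype"
        select="translate(substring-after(., 'poem '), ' ', '_')"/>
  </xsl:template>

  <!-- Fallback for this sketch only: copy any other value unchanged.
       The real template cleans such values up further. -->
  <xsl:template match="@TYPE">
    <xsl:attribute name="type" select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The template with the predicate has a higher default priority than the plain `@TYPE` template, so it wins whenever both match. Values that do not fit the pattern, such as 'poem and response', would still need explicit branches.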
You sometimes find a milestone inside a marginal note, where
the note has the same value for @type as the milestone has for @unit.
Drop the @type on the note in this situation.
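The source of this rule is not reproduced here, so the following is only a guess at how it could be written. It assumes TEI-namespaced input from an earlier pass and an identity template for everything else; the mode used by the real stylesheet is left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity template: copy everything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed form of the rule: an empty template suppresses @type on a
       note when a child milestone carries the same value in @unit. -->
  <xsl:template match="tei:note/@type[. = ../tei:milestone/@unit]"/>

</xsl:stylesheet>
```

Under these assumptions, a note with type="chapter" that contains a milestone with unit="chapter" is copied without its type attribute, while notes whose type differs from the milestone's unit are left alone.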
Template
P[count(FIGURE)=count(*) and not (text()) and parent::*/count(P[not(FIGURE)])>1]
Documentation
Description
Figures inside paragraphs can generally be free-standing,
unless they are the only paragraph of this type (i.e. inside a
div consisting only of pictures).
Namespace
No namespace
Match
P[count(FIGURE)=count(*) and not (text()) and parent::*/count(P[not(FIGURE)])>1]
Mode
#default
Import precedence
0
Source
<xsl:template match="P[count(FIGURE)=count(*) and not (text()) and parent::*/count(P[not(FIGURE)])>1]"><xsl:apply-templates/></xsl:template>
<xsl:template match="ENCODINGDESC/PROJECTDESC"><projectDesc><p>Created by converting TCP files to TEI P5 using tcp2tei.xsl,
TEI @ Oxford.
</p></projectDesc></xsl:template>
Template
tei:p[not(parent::tei:sp or parent::tei:headnote or parent::tei:postscript
or parent::tei:argument) and count(*)=1 and not(text()) and (tei:list or
tei:table)]
Documentation
Description
A p containing a list, floatingText or table as its only content can be unwrapped; note that the match pattern as written tests only for tei:list and tei:table.
Namespace
No namespace
Match
tei:p[not(parent::tei:sp or parent::tei:headnote or parent::tei:postscript
or parent::tei:argument) and count(*)=1 and not(text()) and (tei:list or
tei:table)]
Mode
#default
Import precedence
0
Source
<xsl:template match="tei:p[not(parent::tei:sp or parent::tei:headnote or parent::tei:postscript or parent::tei:argument) and count(*)=1 and not(text()) and (tei:list or tei:table)]"><xsl:apply-templates select="*|text()|processing-instruction()|comment()" mode="pass2"/></xsl:template>
Template
tei:q[count(*)=1 and not(text()) and tei:floatingText]
Documentation
Description
A singleton floatingText inside a q can skip the q
Namespace
No namespace
Match
tei:q[count(*)=1 and not(text()) and tei:floatingText]
Mode
#default
Import precedence
0
Source
<xsl:template match="tei:q[count(*)=1 and not(text()) and tei:floatingText]"><xsl:apply-templates select="*|processing-instruction()|comment()|text()" mode="pass2"/></xsl:template>
<xsl:template match="tei:availability" mode="pass3"><xsl:variable name="d" select="/*/tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:date"/><availability><xsl:choose><xsl:when test="contains($d,'Phase 2')"><p>This keyboarded and encoded edition of the work
described above is co-owned by the institutions
providing financial support to the Early English Books
Online Text Creation Partnership. Searching, reading,
printing, or downloading EEBO-TCP texts is reserved for
the authorized users of these project partner
institutions. Permission must be granted for subsequent
distribution, in print or electronically, of this
EEBO-TCP Phase II text, in whole or in part.</p></xsl:when><xsl:otherwise><p>This keyboarded and encoded edition of the
work described above is co-owned by the institutions
providing financial support to the Early English Books
Online Text Creation Partnership. This Phase I text is
available for reuse, according to the terms of <ref target="https://creativecommons.org/publicdomain/zero/1.0/">Creative
Commons 0 1.0 Universal</ref>. The text can be copied,
modified, distributed and performed, even for
commercial purposes, all without asking permission.</p></xsl:otherwise></xsl:choose></availability></xsl:template>