TEI Conference and Members' Meeting 2022

September 12 - 16, 2022 | Newcastle, UK

Conference Agenda

Session 2B: Long Papers

Time: Wednesday, 14/Sept/2022, 11:30am - 1:00pm

Session Chair: Hugh Cayless, Duke University

Location: ARMB: 2.16

Armstrong Building: Lecture Room 2.16. Capacity: 100

Presentations

ID: 145 / Session 2B: 1
Long Paper
Keywords: collation, information transfer, ecdotics, materiality

TEICollator: a semi-automatic TEI to TEI workflow

M. Gille Levenson

ENS Lyon, France

Automated text comparison has been an area of interest for many years [Nury 2019]: tools such as CollateX allow automated text comparison, and even export to TEI. However, no tool today can take transcriptions encoded and structured in XML-TEI, automate their collation, and inject the resulting apparatuses back into the original files. Working in this way ensures that the contextual and structural information specific to each witness (structure, additions, deletions, line changes, etc.) encoded in XML-TEI is not lost. In other words, there is a need to work on textual differences without ignoring the individual, structural and material reality of each text or witness.

Furthermore, the increasing use of Optical Character Recognition (OCR) or Handwritten Text Recognition (HTR) tools [Kiessling 2019], which is attractive both for the speed of acquisition and for the quality of the preserved information [Camps 2016], has consequences for ecdotic methods: should we keep collating texts manually when their acquisition has been done by computer?

My work focuses on a semi-automatic collation workflow. I will present a complete TEI to TEI processing chain, from single TEI-encoded transcriptions to meaningful collated ones (through the production of typed apparatuses, for instance: see [Camps 2018]), that preserves the original structural information. The process also identifies omissions and transpositions, and ends with the transformation of the data into documents that present the textual information as clearly as possible. I will present my work from the perspective of information transfer, pointing out the dialectic between material and textual collation (as carried out by Blekker et al 2018, but using other methods), material collation being the alignment of material features encoded in TEI. Finally, I will outline the limitations and difficulties I face along the processing chain: can the tokenisation of TEI-encoded text be fully automated? What level of textual heterogeneity can the workflow manage? What quality of lemmatisation is required? Which encoding method should be preferred to get the best possible result?
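A minimal sketch of the core collation step, for illustration only. It is not the TEICollator implementation: Python's standard `difflib` stands in for a real collation engine such as CollateX, the function name `collate` and the sigla `A` and `B` are hypothetical, and tokenisation is reduced to whitespace splitting. The sketch aligns two witnesses and wraps each variation site in a TEI `<app>` with one `<rdg>` per witness.

```python
import difflib
from xml.sax.saxutils import escape


def collate(witness_a, witness_b, sigla=("A", "B")):
    """Align two token lists and return text with TEI <app> entries.

    Assumption: difflib.SequenceMatcher is used as a stand-in aligner;
    a real workflow would rely on a dedicated collation tool.
    """
    matcher = difflib.SequenceMatcher(a=witness_a, b=witness_b, autojunk=False)
    parts = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            # Shared text is copied once, outside any apparatus entry.
            parts.append(escape(" ".join(witness_a[a0:a1])))
        else:
            # Substitution, omission or addition: one <rdg> per witness.
            # An empty <rdg> marks an omission in that witness.
            rdg_a = escape(" ".join(witness_a[a0:a1]))
            rdg_b = escape(" ".join(witness_b[b0:b1]))
            parts.append(
                f'<app><rdg wit="#{sigla[0]}">{rdg_a}</rdg>'
                f'<rdg wit="#{sigla[1]}">{rdg_b}</rdg></app>'
            )
    return " ".join(parts)


if __name__ == "__main__":
    # Invented example witnesses.
    a = "the king rode to the city".split()
    b = "the kyng rode into the city".split()
    print(collate(a, b))
```

A full TEI to TEI chain would additionally have to tokenise inside existing markup and write the apparatus back into each witness file, which this sketch does not attempt.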

I want to show how the TEI standard, the pivot format of this computational method, can be used to describe text as well as to process it. Finally, I will show how the last operation, the transformation from TEI to LaTeX, perhaps the most complex task, is fully part of the ecdotic chain and contributes to producing meaning from the data. In this sense, my work is part of the reflection carried out for several years on Digital Scholarly Editions [Pierazzo 2015; Pierazzo and Driscoll 2016]; I chose the print/PDF format over a web interface, thanks to the LaTeX Reledmac package developed and maintained by Maïeul Rouquette [Rouquette 2022].

This paper will be the technical counterpart of a paper presented in La Laguna in July, which will focus on the philological side of the processing chain.

Gille Levenson-TEICollator-145.odt

ID: 148 / Session 2B: 2
Long Paper
Keywords: digital edition, data quality assurance, XSL-FO, software test, PDF

Back to analog: the added value of printing TEI editions

M. Kupreyev

Goethe Universität Frankfurt am Main, Germany

Sahle (2017) [1] provides the operational definition of a scholarly digital edition by contrasting its paradigm to that of a print edition. His bottom line is that any “digital edition cannot be given in print without significant loss of content and functionality”. In my talk I will touch upon the challenges of printing TEI XML datasets but also substantiate its positive effects: PDF export does present only a part of the encoded information, but it can play an essential role in data quality assurance. Creating a printed version of a digital edition can enhance the consistency of encoding and affect the overall production pipeline of the TEI XML data.

In the “School of Salamanca” project [2], the TEI XML of early modern print editions goes through strict schema and Schematron checks, after which it is exported to HTML and JSON IIIF for web display [3]. Recently, a PDF export option was added. Given the complexity and depth of the annotation, a solution integrated into Salamanca’s Oxygen workflow was chosen, namely the free Apache FOP processor. Similar results might have been achieved with TEI Publisher or the Oxygen PDF Chemistry processor. The PDF export highlighted issues that pertain to two ontologically different areas:

• Rendering XML elements in a constrained two-dimensional PDF layout.

• Varying XML encoding of semantically identical chunks of information.

Issues of the first type concern, for example, the representation of marginal notes and their anchors, and the correlation of pagination between XML and IIIF (representing the original) and PDF (the print output). The second type covers divergent renderings of semantically identical text parts, introduced either by errors in the original or by the text editors.

PDF generation was initially intended to be one of the export methods of TEI data. It is now implemented early in the TEI production workflow, as it pinpoints the semantic and structural inconsistencies in the data and allows them to be corrected before the final XML release. PDF production thus adheres to one of the principles of agile software testing, which states that capturing and eliminating defects in the early stages of the research data life cycle (RDLC) is less time-consuming, less resource-intensive and less prone to collateral bugs (Crispin 2008) [4].
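The second type of issue, semantically identical chunks encoded in different ways, can also be approximated by a simple script. The sketch below is an illustrative assumption and not the project's method, which relies on the PDF rendering itself to expose such inconsistencies. The function name `encoding_variants` and the sample markup are hypothetical. The script groups leaf elements by their normalised text and reports every text that appears under more than one element/attribute combination.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: the same phrase is tagged in two different ways.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p>See <term type="legal">ius gentium</term> and, later,
     <hi rend="italic">ius gentium</hi>.</p>
</body>"""


def encoding_variants(tei_xml):
    """Map each text to its encodings when it is encoded in several ways."""
    root = ET.fromstring(tei_xml)
    seen = defaultdict(set)
    for el in root.iter():
        if len(el) > 0:
            continue  # only compare leaf elements
        # Normalise whitespace and case before comparing texts.
        text = " ".join("".join(el.itertext()).split()).lower()
        if not text:
            continue
        tag = el.tag.replace("{%s}" % TEI_NS, "")
        signature = (tag, tuple(sorted(el.attrib.items())))
        seen[text].add(signature)
    return {text: sigs for text, sigs in seen.items() if len(sigs) > 1}


if __name__ == "__main__":
    for text, signatures in encoding_variants(SAMPLE).items():
        print(text, sorted(signatures))
```

A report of this kind only flags candidates; whether two encodings of the same string are really inconsistent remains an editorial decision.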

[1] Sahle, Patrick. 2017. "What is a Scholarly Digital Edition?" in Digital Scholarly Editing, edited by Matthew James Driscoll and Elena Pierazzo, 19-39. Cambridge: Open Book Publishers.

[2] https://www.salamanca.school/en/index.html , accessed on 20.06.2022.

[3] https://blog.salamanca.school/de/2022/04/27/the-school-of-salamanca-text-workflow-from-the-early-modern-print-to-tei-all/ , https://blog.salamanca.school/de/2020/03/17/deutsch-entwicklung-der-webanwendung-v2-0/ , accessed on 20.06.2022.

[4] Crispin, Lisa. 2008. Agile Testing: A Practical Guide for Testers and Agile Teams. Addison-Wesley.

Kupreyev-Back to analog-148.docx

ID: 106 / Session 2B: 3
Long Paper
Keywords: poetry, rhyme, sound

Encoding sonic devices: what is it good for?

M. Holmes

University of Victoria, Canada

The Digital Victorian Periodical Poetry project[1] has captured metadata and page-images for 15,548 poems from Victorian periodicals, and transcribed and encoded a representative sample of 2,150 poems. Our encoding captures rhyme and other sonic devices such as anaphora, epistrophe, and refrains. This presentation will describe our encoding practices and then discuss what useful outcomes can be gained from this undertaking. Although even TEI P1 specified both a rhyme attribute to capture rhyme-scheme and a rhyme element for "very detailed studies of rhyming" (TEI P1 P172)[2], and all significant TEI tutorials teach the encoding of rhyme (e.g. TEI by Example Module 4), it is difficult to find work which makes explicit use of TEI encoding of rhyme (let alone other sonic devices) in the analysis of English poetry.
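To make the two TEI mechanisms mentioned above concrete, here is a minimal sketch, not taken from the DVPP codebase. It assumes a stanza encoded with the `rhyme` attribute on `<lg>` and a `<rhyme>` element with a `label` attribute in each line, and checks that the declared scheme agrees with the labels. The verse is invented for the example, the function names are hypothetical, and marking an unrhymed line as `x` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented stanza, used only to illustrate the markup.
STANZA = """<lg xmlns="http://www.tei-c.org/ns/1.0" type="stanza" rhyme="abab">
  <l>The lamps are lit along the <rhyme label="a">street</rhyme>,</l>
  <l>And fog comes rolling from the <rhyme label="b">bay</rhyme>;</l>
  <l>I hear the tread of passing <rhyme label="a">feet</rhyme>,</l>
  <l>That fade, like footsteps, far <rhyme label="b">away</rhyme>.</l>
</lg>"""


def encoded_scheme(lg):
    """Build a rhyme scheme string from the <rhyme label="..."> elements."""
    labels = []
    for line in lg.findall("tei:l", NS):
        rhyme = line.find(".//tei:rhyme", NS)
        # Assumption: a line without a <rhyme> element counts as unrhymed ("x").
        labels.append(rhyme.get("label", "?") if rhyme is not None else "x")
    return "".join(labels)


def check_stanza(xml_text):
    """Return (declared scheme, derived scheme, whether they agree)."""
    lg = ET.fromstring(xml_text)
    declared = lg.get("rhyme")
    derived = encoded_scheme(lg)
    return declared, derived, declared == derived


if __name__ == "__main__":
    print(check_stanza(STANZA))
```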

Is manual encoding of rhyme still necessary? Chisholm & Robey noted back in 1995 that "much of the analysis which currently requires extensive manual markup will in due course be carried out by electronic means" (100), and much work has been devoted to the automated detection of rhyme (Kavanagh 2008; Kilner & Fitch 2017). However, these tools are not completely successful, and in our own work, there is a consistent subset of cases which generate disagreement and discussion regarding type of rhyme, or even whether a rhyme is intended. We do make use of automated detection of anaphora and epistrophe, but only to generate suggestions for cases that might have been missed after the initial encoding has been done. We therefore believe that manually-curated encoding of sonic devices is a prerequisite for serious literary analysis which depends on that encoding.
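The suggestion-generating role of automated detection described above can be sketched as follows. This is an illustrative assumption rather than the project's actual tooling: the function names are hypothetical, the verse is invented, and anaphora is reduced to runs of consecutive lines that open with the same word. The output is a list of candidates for a human encoder to review, not a final encoding.

```python
import re


def first_word(line):
    """Return the lower-cased first word of a line, or None if there is none."""
    match = re.search(r"[^\W\d_]+(?:'[^\W\d_]+)?", line)
    return match.group(0).lower() if match else None


def suggest_anaphora(lines, min_run=2):
    """Suggest (word, first_index, last_index) for runs of lines sharing an opening word."""
    suggestions = []
    start = 0
    for i in range(1, len(lines) + 1):
        same = (
            i < len(lines)
            and first_word(lines[i]) is not None
            and first_word(lines[i]) == first_word(lines[start])
        )
        if not same:
            # The run ending at line i - 1 is over; keep it if long enough.
            if i - start >= min_run and first_word(lines[start]) is not None:
                suggestions.append((first_word(lines[start]), start, i - 1))
            start = i
    return suggestions


if __name__ == "__main__":
    # Invented lines, used only to exercise the function.
    poem = [
        "And all the bells were ringing,",
        "And all the birds did sing,",
        "And all the world was glad,",
        "When home came the king.",
    ]
    print(suggest_anaphora(poem))
```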

[1] DVPP, https://dvpp.uvic.ca/.

[2] See also Chisholm & Robey 1995.

Having invested in careful encoding of sonic devices, what are the potential uses for research? DVPP has begun by making rhyme-scheme discoverable and searchable in our search interface, and this is beginning to generate research questions. We can already test notions such as the claim that irregular rhyme-schemes were more frequently used as the century progressed; a table of the percentage of irregularly-rhymed poems in each decade in our collection (Appendix) shows only the weakest support for this claim.

In addition to tracing trends in poetic practice, and the construction of historical rhyme dictionaries, sonic device encoding might also be used for:

- Dialect detection. For example, our dataset includes a significant subset of poems written in Scots dialect, and others which may or may not be; for problem cases, where other factors such as poet and host publication suggest a dialect poem, but surface features are not persuasive, rhyme patterns may provide more evidence.

- Genre detection. Particular poetic genres, such as sonnets or ballads, are characterized by formal structures which include rhyme-scheme.

- Bad poetry. We are particularly interested in the notion of what constitutes bad poetry, and our early work suggests that poetry which subjectively seems to be of poor quality also exhibits features such as monotonous rhyme-schemes and intrusive echoic devices.

- Authorship attribution.

- Diachronic sound-change.

- Historical rhyming dictionaries.

Holmes-Encoding sonic devices-106.odt
