Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that venue, or select a single session for a detailed view (with abstracts and downloads, if available).

 
 
 
Session Overview
Date: Friday, 16/Sept/2022
9:00am - 9:30am: Registration - Friday
9:30am - 11:00am: Session 7A: Short Papers
Location: ARMB: 2.98
Session Chair: Patricia O'Connor, University of Oxford
 
ID: 118 / Session 7A: 1
Short Paper
Keywords: Spanish literature, Digital library, TEI-Publisher, facsimile, sourceDoc

Encoding Complex Structures: The Case of a Gospel Spanish Chapbook

E. Leblanc, P. Jacsont

University of Geneva, France

The project Untangling the cordel seeks to study and reappraise a corpus of 19th-century Spanish chapbooks by creating a digital library (Leblanc and Carta 2021). This corpus of chapbooks, also called pliegos de cordel, is highly heterogeneous in its content and editorial formats, giving rise to multiple reflections on its encoding.

In this short paper, we would like to share our feedback and thoughts on the XML-TEI encoding of a Gospel pliego for its integration into TEI-Publisher.

This pliego is an in-4° containing 16 small columns with extracts from the Four Gospels (John's prologue, the Annunciation, the Nativity, Mark's finale, and the Passion according to John; i.e. the same extracts as those in books of hours (Join-Lambert 2016)), duplicated on both sides. The printed sheet had to be cut in half and then folded to obtain two identical sets of excerpts from the Four Gospels. Whoever acquired it appropriated the object for private devotion or protection: it was therefore not an object kept for reading (the text is written in Latin in small letters) but for apotropaic or curative use (Botrel 2021).

Foregrounding the interest of this pliego as a devotional object, rather than strictly as a textual object, required much reflection about its encoding and its publication in our digital library. Indeed, depending on our choice of encoding, the information conveyed differs: should we favour a diplomatic, formal edition or an encoding that follows the reading?

To determine which encoding would be the most suitable, we decided to test two solutions, one with <facsimile> and another with <sourceDoc>. Visualising the two encoding possibilities in TEI-Publisher will allow us to set out the advantages and disadvantages of each method.
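
To illustrate the trade-off, here is a minimal sketch of the two options (file names and coordinates are invented for the example):

    <!-- Option 1: <facsimile>, favouring a reading-order edition linked to images -->
    <facsimile>
      <surface xml:id="f1">
        <graphic url="pliego-recto.jpg"/>
        <zone xml:id="f1-col1" ulx="0" uly="0" lrx="450" lry="1800"/>
      </surface>
    </facsimile>
    <text>
      <body>
        <div type="column" facs="#f1-col1">
          <!-- transcription following the reading -->
        </div>
      </body>
    </text>

    <!-- Option 2: <sourceDoc>, favouring a diplomatic, document-centric record -->
    <sourceDoc>
      <surface xml:id="s1">
        <graphic url="pliego-recto.jpg"/>
        <zone ulx="0" uly="0" lrx="450" lry="1800">
          <line>In principio erat verbum</line>
        </zone>
      </surface>
    </sourceDoc>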



ID: 124 / Session 7A: 2
Short Paper
Keywords: Digital Scholarly Edition, Dictionary, Linguistics, Manuscript

Annotating a historical manuscript as a linguistic resource

H.-J. Döhla3, H. Klöter2, M. Scholger1, E. Steiner1

1University of Graz; 2Humboldt-Universität zu Berlin; 3Universität Tübingen

The Bocabulario de lengua sangleya por las letraz de el A.B.C. is a historical Chinese-Spanish dictionary, probably written in 1617 and held by the British Library (Add ms. 25.317). It consists of 223 double-sided folios with about 1400 alphabetically arranged Hokkien Chinese lemmas in the Roman alphabet.

This contribution will introduce our considerations on how to extract and annotate linguistic data from the historical manuscript, and on the design of a digital scholarly edition (DSE), in order to answer research questions in the fields of linguistics, missionary linguistics, and migration (Klöter/Döhla 2022).
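
As an indication of what such an annotation layer could look like, here is a hedged sketch of a single lemma using the TEI dictionaries module (the lemma, gloss, and language codes are invented; the project's actual model is not described in this abstract):

    <entry xml:id="entry-001">
      <form type="lemma">
        <orth xml:lang="nan-Latn">sim</orth><!-- hypothetical romanised Hokkien lemma -->
      </form>
      <sense>
        <cit type="translation" xml:lang="es">
          <quote>corazón</quote><!-- hypothetical Spanish gloss -->
        </cit>
      </sense>
    </entry>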



ID: 163 / Session 7A: 3
Short Paper
Keywords: text mining, topic modeling, digital scholarly editions, data modeling, data integration

How to Represent Topic Models in Digital Scholarly Editions

U. Henny-Krahmer1, F. Neuber2

1University of Rostock, Germany; 2Berlin-Brandenburgische Akademie der Wissenschaften, Germany

Topic modeling (Blei et al. 2003, Blei 2012) as a quantitative text analysis method is not part of the classic editing workflow, as it represents a way of working with text that contrasts in many respects with critical editing. However, for the purpose of thematic classification of documents, topic modeling can be a useful enhancement to an editorial project. It has the potential to replace the cumbersome manual work needed to represent and structure large edition corpora thematically, as has been done, for instance, in the projects Alfred Escher Briefedition (Jung 2022), Jean Paul – Sämtliche Briefe digital (Miller et al. 2018), and the edition humboldt digital (Ette 2016).

We apply topic modeling to two edition corpora of correspondence of the German-language authors Jean Paul (1763-1825) and Uwe Johnson (1934-1984), compiled at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) and the University of Rostock (Miller et al. 2018, Helbig et al. 2017). In our contribution, we discuss how the results of the topic modeling can be usefully integrated into digital editions. We propose to integrate them into the TEI corpora on three levels:

(1) the topic model of a corpus, including the topic words and the parameters of its creation, is modeled as a taxonomy in a separate TEI file;

(2) the relevance of the topics for individual documents is expressed in the text classification section of the TEI header of each document in the corpus;

(3) the assignment of individual words in a document to topics is expressed by links from word tokens to the corresponding topic in the taxonomy.

Following a TEI encoding workflow as outlined above allows for developing digital editions that include topic modeling as an integral part of their user interface.
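
A compressed sketch of what these three levels might look like in TEI (element choices and values are ours, for illustration only):

    <!-- (1) topics as a taxonomy in a separate TEI file -->
    <taxonomy xml:id="topics">
      <category xml:id="topic01">
        <catDesc>travel, journey, carriage, road</catDesc>
      </category>
    </taxonomy>

    <!-- (2) topic relevance in the text classification section of each header -->
    <textClass>
      <keywords scheme="#topics">
        <term ref="#topic01" cert="0.42">travel</term>
      </keywords>
    </textClass>

    <!-- (3) word tokens linked to the corresponding topic -->
    <w ana="#topic01">Reise</w>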



ID: 119 / Session 7A: 4
Short Paper
Keywords: Odyssey, heroines, prosopography, women

Analyzing the Catalogue of Heroines through Text Encoding

R. Milio

Bucknell University, United States of America

The Catalogue of Heroines (Odyssey 11.225-330) presents a corpus of prominent mythological women as Odysseus recounts the stories of each woman he encounters in the Underworld. I undertook a TEI close reading of the Catalogue in order to center ancient women in a discussion of the Odyssey and to determine how the relationships between the heroines contribute to the Catalogue’s overall purpose. In this short paper I first demonstrate my process: developing my own detailed feminist translation of the Catalogue, applying a TEI close reading to both my translation and the original ancient Greek, and creating a customized schema to best suit my purposes. I then detail my analysis of this close reading, using cross-language encoding and a prosopography I developed through that reading, which reveals complex connections, both explicit and implied, among characters of the Catalogue. Third, I present the result of this analysis: through this act of close reading I identified a heretofore unconsidered list of objects within the Catalogue and demonstrated how four of them, ζώνη (girdle), βρόχος (noose), ἕδνα (bride-price), and χρυσός (gold), reveal the ancient Greek stigma surrounding women, sexuality, and fidelity. These objects clearly allude to negative perceptions of women in ancient Greek society, and through them the Catalogue of Heroines reminds its audience of Odysseus’ concerns regarding the faithfulness of his wife Penelope. Ultimately, by applying and adapting a TEI close reading, I identified patterns within the text that speak to a greater purpose for the Catalogue and the Odyssey overall, and produced prosopographical data that I was able to export for further analysis. By the time of the conference, I will be able to present data visualizations that can assist other classicists in centering women in ancient texts.
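
A hedged sketch of the kind of cross-language, prosopography-oriented encoding involved (IDs and element choices are invented for illustration):

    <listPerson>
      <person xml:id="tyro">
        <persName xml:lang="grc">Τυρώ</persName>
        <persName xml:lang="en">Tyro</persName>
      </person>
    </listPerson>
    <!-- the same @ref used in both the Greek text and the translation -->
    <persName ref="#tyro">Τυρώ</persName>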

 
9:30am - 11:00am: Session 7B: Long Papers
Location: ARMB: 2.16
Session Chair: Gimena del Rio Riande, CONICET
 
ID: 132 / Session 7B: 1
Long Paper
Keywords: TEI, born-digital heritage, retrocomputing, digitality, materiality

Is it still data? Scholarly Editing of Text from Early Born-Digital Heritage

T. Roeder

Universität Würzburg, Germany

Digital heritage is strongly bound to original devices and displays. Even in today’s standardized environments, text can change its appearance depending on the monitor technology, the processing software, and the fonts available on the system: text as data depends greatly on technical interpretation.

Creating a scholarly digital edition from born-digital heritage, especially text, needs to consider the original conditions, like encoding and hardware standards. My questions are: Are the encoding guidelines of the TEI suitable for representing born-digital text? How much information is required about the original environment? Can a screenshot serve as a facsimile, or is it necessary to link to emulated states of the display software?

To give an example, I will present a preliminary TEI-based scholarly digital edition of “disk magazines”. These magazines were a special type of periodical, published mostly on floppy disk, mainly in the 1980s and 1990s. Created by home-computer enthusiasts for the community, disk magazines are potentially valuable as a historical resource for studying the experiences of programmers, users, and gamers in the early stage of microcomputing.

In the examples (one of them is available at https://diskmags.github.io/md_87-11.html), the digital texts are decompressed byte sequences of PETSCII code, which is only partially compatible with ASCII. The appearance of the characters could be changed completely by the programmer to display foreign characters or alphabets. Further, the text depended on a 40x25-character layout, in which it had to be aligned manually by inserting whitespace. The once born-digital text – as data – is transformed into readable text – as image – on a screen. The example demonstrates that the connection between textual data and textual display can be very fragile.

For TEI encoding, this has some consequences. On the one hand, there is a requirement to preserve as much of the original data as possible. On the other hand, a scholarly edition needs to represent the semantics of the visible document. An interpretative layer is required to mediate between these two levels, which could be implemented by different markup strategies; however, it needs to be discussed whether classes like “att.global.rendition” are actually suited for this. It also needs to be discussed in which way a digital document (or which instance of it: as stored data, as memory state, as display?) can be interpreted in the same way as a material document – and which implications this would have for TEI encoding of born-digital heritage.
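
One conceivable strategy (a sketch, not the edition's actual practice) is to keep the stored bytes and the interpreted characters on separate levels via a character declaration:

    <charDecl>
      <char xml:id="petscii-c1">
        <mapping type="PETSCII">0xC1</mapping><!-- byte as stored on disk -->
        <mapping type="Unicode">U+0041</mapping><!-- default interpretation: A -->
      </char>
    </charDecl>
    <sourceDoc>
      <surface>
        <!-- one of 25 lines in the fixed 40x25 layout -->
        <line><g ref="#petscii-c1"/>BC ...</line>
      </surface>
    </sourceDoc>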

Roeder-Is it still data Scholarly Editing of Text from Early Born-Digital Heritage-132.odt


ID: 152 / Session 7B: 2
Long Paper
Keywords: publishing, LOD, TEI infrastructure

Using Citation Structures

H. Cayless

Duke University, United States of America

This paper is a follow-up to one I gave at Balisage in 2021.[1] Citation structures are a TEI feature introduced in version 4.2.0 of the Guidelines, which provide an alternative (and more easily machine-processable) method for documents to declare their internal structures.[2] This mechanism is important because of the heterogeneity of texts, and consequently of the TEI structures used to model them. This heterogeneity means it is difficult for any system publishing collections of TEI editions to treat their presentation consistently. For example, a citation like “1.2” might mean “poem 1, line 2” in one edition, and “book 1, chapter 2” in another. It might be perfectly sensible to split an edition into chapters, or even small sections, for presentation online, but not at all sensible to split a poem into lines (though groups of lines might be desirable). A publication system otherwise has to rely on assumptions and guesswork about the items in its purview, and may fail to cope with new material that does not behave as it expects. Worse, there is no guarantee that the internal structures of editions are consistent within themselves. Consider, for example, Ovid’s ‘Tristia’, in which the primary organizational structure is book, poem, line, but book two is a single long poem.

Citation structures permit a level of flexibility hard to manage otherwise, by allowing both nested structures and alternative structures at every level. In addition, a key new feature of citation structures over the older reference declaration methods is the ability to attach information that may be used by a processing system to each structural level. The <citeData> element which makes this possible will allow, for example, a structural level to indicate what name it should be given in a table of contents, or even whether or not it should appear in such a feature.
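
For the ‘Tristia’ example above, a declaration might look like this (a sketch following the syntax in the Guidelines):

    <refsDecl>
      <citeStructure unit="book" match="//div[@type='book']" use="@n">
        <citeData property="http://purl.org/dc/terms/title" use="head"/>
        <citeStructure unit="poem" match="div[@type='poem']" use="@n" delim=".">
          <citeStructure unit="line" match="l" use="@n" delim="."/>
        </citeStructure>
      </citeStructure>
    </refsDecl>

Here a citation like “1.2” resolves to poem 2 of book 1, and the <citeData> declaration tells a processor where to find a title for each book, for instance for a table of contents.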

I will discuss the mechanics of creating and using citation structures, and will conclude by presenting a working system in XSLT that can exploit <citeStructure> declarations to produce tables of contents, split large documents into substructures for presentation on the web, and resolve canonical citations to parts of an edition.

1. https://www.balisage.net/Proceedings/vol26/html/Cayless01/BalisageVol26-Cayless01.html

2. See https://tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CORS6 and https://tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SACRCS.



ID: 160 / Session 7B: 3
Long Paper
Keywords: Manuscript cataloguing, semantic markup, retro-conversion vs. born-digital

Text between data and metadata: An examination of input types and usage of TEI encoded texts

T. Schaßan

Herzog August Bibliothek Wolfenbüttel, Germany

Many texts that have been encoded with the TEI in the past were retro-converted from printed sources: manuscript catalogues and dictionaries are examples of highly structured texts; drama, verse, and performance texts are usually less structured; editions fall somewhere in between.

Many of the text types for which the TEI offers specialised elements represent both metadata and data, depending on the scenarios in which these texts are used.

In the field of manuscript cataloguing, it has long been a question whether the msdescription module is sufficient for representing a retro-converted text of a formerly printed catalogue. One may argue that a catalogue is first of all a visually structured text, a succession of paragraphs whose semantics are only loosely connected to the main elements the TEI defines, such as <msContents>, <physDesc>, or <msPart>. On the other hand, at the sub-paragraph level, the TEI offers structures which may not align with the actual text of the catalogue, so that the person carrying out the retro-conversion has to decide whether to change the text to fit the TEI schema rules, to encode the text in semantically incorrect ways, or to structure the text with much less semantic information than would be possible.
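
Even a skeletal retro-converted entry shows this tension between the catalogue's paragraphs and the TEI's semantic containers (shelfmark and values invented for the example):

    <msDesc>
      <msIdentifier>
        <settlement>Wolfenbüttel</settlement>
        <repository>Herzog August Bibliothek</repository>
        <idno>Cod. Guelf. 000 Helmst.</idno><!-- hypothetical shelfmark -->
      </msIdentifier>
      <!-- one printed paragraph may have to be split across these containers: -->
      <msContents>
        <msItem><title>Psalterium</title></msItem>
      </msContents>
      <physDesc>
        <p>Parchment, 120 leaves, 21 x 15 cm.</p>
      </physDesc>
    </msDesc>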

Now that the TEI is increasingly used to create these kinds of texts as born-digital documents, the question is whether the structures offered by the TEI meet all the needs the texts and their authors might have in different scenarios: Is a TEI-encoded text of a given kind equally useful for all search and computational uses, as well as publishing needs? Are the TEI structures flexible enough, or do they privilege some uses over others? How much of the semantic information is encoded in the text, and how much of it might be realised only in the processing of the sources?

In this paper, manuscript catalogues serve as an example for the more general question of what structures, how much markup, and what kind of markup are needed in the age of powerful search engines, artificial intelligence, authority files, and Linked Open Data.

Schaßan-Text between data and metadata-160.odt
 
11:00am - 11:30am: Friday Morning Refreshment Break
Location: ARMB: King's Hall
11:30am - 1:00pm: Session 8A: Long Papers
Location: ARMB: 2.98
Session Chair: Meaghan Brown, Independent Scholar
 
ID: 102 / Session 8A: 1
Long Paper
Keywords: medieval studies; medieval literature; xforms; manuscript; codicology

Codex as Corpus: Using TEI to unlock a 14th-century collection of Old French short texts

S. Dows-Miller

University of Oxford, United Kingdom

Medieval manuscript collections of short texts are, in a sense, materially discrete corpora, offering data that can help scholarship understand the circumstances of their composition and early readership.

This paper will discuss the role played by TEI in an ongoing mixed-method study of a fourteenth-century manuscript written in Old French: Bibliothèque nationale de France, fonds français, 24432. The aim of the project has been to show how fruitful the combination of traditional and more data-driven approaches can be in the holistic study of individual manuscripts.

TEI has been critical to the project so far, and has enabled discoveries about the manuscript which have eluded less technologically enabled generations of scholarship. For example, quantitative analysis of scribal abbreviation, made possible through the manuscript’s encoding, has illuminated the contributions of a number of individuals in the production of the codex. Similarly, analysis of the people and places mentioned in the texts allows for greater localisation of the manuscript than was previously considered possible.
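
Such quantitative analysis presupposes markup along these general lines (a generic sketch of TEI abbreviation encoding, not necessarily this project's exact scheme):

    <choice>
      <abbr>q<am>&#x0305;</am></abbr><!-- scribal abbreviation mark transcribed as such -->
      <expan>q<ex>ue</ex></expan><!-- editorial expansion -->
    </choice>

Counting <abbr> or <am> elements per scribal hand then yields the kind of abbreviation statistics mentioned above.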

As with any project of this nature, the process of encoding BnF fr. 24432 in TEI has not been without difficulty, and so this paper will also discuss attempts to streamline the process through automation and UI tools, most notably, in the case of this project, XForms.

Dows-Miller-Codex as Corpus-102.docx


ID: 149 / Session 8A: 2
Long Paper
Keywords: ODD, ODD chaining, RELAX NG, schema, XSLT Stylesheets

atop: another TEI ODD processor

S. Bauman1, H. Bermúdez Sabel2, M. Holmes3, D. Maus4

1Northeastern University, United States of America; 2University of Neuchâtel, Switzerland; 3University of Victoria, Canada; 4State and University Library Hamburg, Germany

TEI is, among other things, a schema. That schema is written in and customized with the TEI schema language system, ODD. ODD is defined by Chapter 22 of the _Guidelines_, and is also used to _define_ TEI P5. It can also be used to define non-TEI markup languages. The TEI supports a set of stylesheets (called, somewhat unimaginatively, “the Stylesheets”) that, among other things, convert ODD definitions of markup languages (including TEI P5) and customizations thereof into schema languages like RELAX NG and XSD that one can use to validate XML documents.

Holmes and Bauman have been fantasizing for years about re-writing those Stylesheets from scratch. Spurred by Maus’ comment of 2021-03-23,[1] Holmes presented a paper last year describing the problems with the current Stylesheets and, in essence, arguing that they should be re-written.[2] Within a few months the TEI Technical Council had charged Bauman with creating a Task Force for the purpose of creating, from scratch, an ODD processor that reads in one or more TEI ODD customization files, merges them with a TEI language (likely, but not necessarily, TEI P5 itself), and generates RELAX NG and Schematron schemas. It is worth noting that this is a distinctly narrower scope than that of the current Stylesheets,[3] which, in theory, convert almost any TEI into a variety of formats including DocBook, MS Word, OpenOffice Writer, Markdown, ePub, LaTeX, PDF, and XSL-FO (and half of those formats into TEI), and convert a TEI ODD customization file into RELAX NG, DTD, XML Schema, ISO Schematron, and HTML documentation. A different group is working on the conversion of a customization ODD into customized documentation using TEI Publisher.[4]
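
For orientation, the kind of input the new processor consumes is an ODD customization such as this minimal sketch (standard ODD syntax, not atop-specific):

    <schemaSpec ident="tei_minimal" start="TEI">
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core" include="p hi note"/>
      <moduleRef key="textstructure"/>
      <elementSpec ident="hi" mode="change">
        <!-- e.g. constrain @rend to a closed value list -->
      </elementSpec>
    </schemaSpec>

The processor's job is to merge such a customization with its source language (e.g. TEI P5) into a derived ODD and then emit RELAX NG and Schematron from it.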

The Task Force, which began meeting in April, comprises the authors. We meet weekly, with the intent of making slow, steady progress. Our main goals are that the deliverables be a utility that can be easily run on GNU/Linux, MacOS, or within oXygen, and that they be programs that can be easily maintained by any programmer knowledgeable about TEI ODD, XSLT, and ant. Of course we also want the program to work properly. Thus we are generating test suites and performing unit testing (with XSpec[5]) as we go, rather than creating tests as an afterthought. We have also developed naming and other coding conventions for ourselves and written constraints (mostly in Schematron) to help enforce them. So, e.g., all XSLT variables must start with the letter ‘v’, and all internal parameters must start with the letter ‘p’ or letters “tp” for tunnel parameters.

We are trying to tackle this enormous project in a sensible, piecemeal fashion. We have (conceptually) completely separated the task of assembling one or more customization ODDs and a source ODD into a derived ODD from the tasks of converting the derived ODD into RELAX NG and into Schematron. In order to make testing-as-we-go easier, we are starting with the derived ODD→RELAX NG process, and we expect to demonstrate some working code at the presentation.

Bauman-atop another TEI ODD processor-149.odt
 
11:30am - 1:00pm: Session 8B: Demonstrations
Location: ARMB: 2.16
Session Chair: Tiago Sousa Garcia, Newcastle University
 
ID: 114 / Session 8B: 1
Demonstration
Keywords: Digital Humanities, Critical Editions, Tools, IIIF

Transcribing Primary Sources using FairCopy and IIIF

N. Laiacona

Performant Software Solutions LLC, United States of America

FairCopy is a simple and powerful tool for reading, transcribing, and encoding primary sources using the TEI Guidelines. FairCopy can import IIIF manifests as a starting point for transcription. Users can then highlight zones on each surface and link them to the transcription. FairCopy exports valid TEI-XML which is linked back to the original IIIF endpoints. In this demonstration, we will show the IIIF functionality in FairCopy and then take a look at the exported TEI-XML and how it provides a consistent interface to the images as well as to the original IIIF manifest.
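
In outline, such an export pairs standard TEI facsimile markup with the IIIF image URLs (a hedged sketch; FairCopy's actual output may differ in detail, and the URL is hypothetical):

    <facsimile>
      <surface xml:id="surf1">
        <graphic url="https://example.org/iiif/img1/full/full/0/default.jpg"/><!-- hypothetical IIIF Image API URL -->
        <zone xml:id="zone1" ulx="10" uly="20" lrx="300" lry="80"/>
      </surface>
    </facsimile>
    <text>
      <body>
        <p facs="#zone1">Transcribed text of the highlighted zone ...</p>
      </body>
    </text>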

Laiacona-Transcribing Primary Sources using FairCopy and IIIF-114.docx


ID: 133 / Session 8B: 2
Demonstration
Keywords: Digital publishing, TEI processing, static sites, programming

Adapting CETEIcean for static site building with React and Gatsby

R. Viglianti

University of Maryland, United States of America

The JavaScript library CETEIcean, written by Hugh Cayless and Raff Viglianti, relies on the DOM processing of web browsers and on HTML5 Custom Elements to publish TEI documents as a component pluggable into any HTML structure. This makes it possible to publish and lightly transform TEI documents directly in the user’s browser, doing away with complex server-side infrastructure for TEI publishing. However, CETEIcean provides a fairly bare-bones API for a fully-fledged TEI publishing solution, and, without some additional considerations, TEI documents rendered with CETEIcean can be invisible to search engines.

This demonstration will showcase an adaptation of the CETEIcean algorithm as a plugin for the static site generator Gatsby, which relies on the popular framework React for building user interfaces. Two plugins will be shown:

gatsby-transformer-ceteicean (https://www.gatsbyjs.com/plugins/gatsby-transformer-ceteicean/) prepares XML to be registered as HTML5 Custom Elements. It also allows users to apply custom NodeJS transformations before and after processing.

gatsby-theme-ceteicean (https://www.npmjs.com/package/gatsby-theme-ceteicean) implements HTML5 Custom Elements for XML publishing, particularly with TEI. It re-implements parts of CETEIcean excluding behaviors; instead, users can define React components to customize the behavior of specific TEI elements.
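
Under both plugins, the underlying transformation is the one CETEIcean defines: each TEI element becomes a prefixed HTML5 Custom Element, roughly as in this illustrative input/output pair:

    <!-- TEI input -->
    <text><body><p rend="italic">Hello</p></body></text>

    <!-- resulting HTML5 Custom Elements -->
    <tei-text><tei-body><tei-p rend="italic">Hello</tei-p></tei-body></tei-text>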

The demonstration will show examples from the Scholarly Editing journal (https://scholarlyediting.org), which publishes small-scale TEI-based editions built with these tools alongside other essay-like content.

Viglianti-Adapting CETEIcean for static site building with React and Gatsby-133.docx


ID: 167 / Session 8B: 3
Demonstration
Keywords: TEI, Translation, crowdsourcing

Spec Translator: Enabling translation of TEI Specifications

H. Cayless

Duke University, United States of America

This demonstration will introduce Spec Translator, available from https://translate.tei-c.org/, which enables users to submit pull requests for translations of specification pages from the TEI Guidelines.



ID: 168 / Session 8B: 4
Demonstration
Keywords: TEI, RDF, Online Editors

LEAF-Writer: a TEI + RDF online XML editor

D. Jakacki1, S. Brown2, J. Cummings3

1Bucknell University, United States of America; 2University of Guelph, Canada; 3Newcastle University, UK

LEAF-Writer is an open-source, open-access Extensible Markup Language (XML) editor that runs in a web browser and offers scholars and students a rich textual editing experience without the need to download, install, and configure proprietary software, pay ongoing subscription fees, or learn complex coding languages. This user-friendly editing environment incorporates Text Encoding Initiative (TEI) and Resource Description Framework (RDF) standards, meaning that texts edited in LEAF-Writer are interoperable with other texts produced by the scholarly editing community and with other materials produced for the Semantic Web. LEAF-Writer is particularly valuable for pedagogical purposes, allowing instructors to teach students best practices for encoding texts without also having to teach students how to code in XML directly. LEAF-Writer is designed to help bridge that gap by providing access to all who want to engage in new and important forms of textual production, analysis, and discovery. LEAF-Writer draws on TEI All as well as other TEI-C-supplied schemas, can use project-specific customized schemas, and offers continuous validation against supported and declared schemas. LEAF-Writer allows users to access and synchronize their documents in GitHub and GitLab, as well as to upload and save documents from their desktop. This presentation will demonstrate the variety of functionality and affordances of LEAF-Writer.

 
1:00pm - 2:30pm: Friday Lunch Break
Location: ARMB: King's Hall
2:30pm - 4:00pm: Closing Keynote: Emmanuel Ngue Um, 'Tone as “Noiseless Data”: Insight from Niger-Congo Tone Languages'
Location: ARMB: 2.98
Session Chair: Martina Scholger, University of Graz

With Closing Remarks, Dr James Cummings, Local TEI2022 Conference Organiser
 
ID: 166 / Closing Keynote: 1
Invited Keynote

Tone as “Noiseless Data”: Insight from Niger-Congo Tone Languages

E. Ngue Um

University of Yaoundé 1 & University of Bertoua (Cameroon), Cameroon

Text processing assumes two layers of textual data: a “noisy” layer and a “noiseless” layer. The “noisy” layer is generally considered unsuitable for analysis and is eliminated at the pre-processing stage. In current Natural Language Processing (NLP) technologies, such as text generation in machine translation, the representation of tones as diacritical symbols in the orthography of Niger-Congo languages leads to these symbols being pre-processed as “noisy” data. As an illustration, none of the 15 Niger-Congo tone-language modules available on Google Translate delivers, in a systematic and consistent manner, text data that contains the linguistic information encoded through tone melody.

The Text Encoding Initiative (TEI) is a framework which can be used to circumvent the “noisiness” brought about by diacritical tone symbols in the processing of text data of Niger-Congo languages.

In novel work, I propose a markup scheme for tone that encompasses:

a) The markup of tone units within an <m> (morpheme) element; this aims to capture the functional properties of tone units, just like segmental morphemes.

b) The markup of tonal characters (diacritical symbols) within a <g> (glyph) element and the representation of the pitch by hexadecimal data representing the Unicode character code for that pitch; this aims to capture tone marks as autonomous symbols, in contrast with their combining layout when represented as diacritics.

c) The markup of downstep and upstep within an <accid> (accidental) element mirroring musical accidentals such as “sharp” and “flat”; this aims to capture strictly melodic properties of tone on a separate annotation tier.
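
Taken together, the scheme might be realised along these lines (a sketch of the proposal; the syllable is invented, and <accid> is the element proposed in this paper, not part of current TEI):

    <charDecl>
      <glyph xml:id="high-tone">
        <mapping type="Unicode">U+0301</mapping><!-- combining acute accent = high tone -->
      </glyph>
    </charDecl>
    <!-- tone unit as a functional morpheme; the tonal character as an autonomous symbol -->
    <m type="tone">ba<g ref="#high-tone"/></m>
    <!-- proposed, non-standard: downstep marked like a musical accidental -->
    <accid type="downstep"/>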

The objectives of tone encoding within the TEI framework are threefold:

a) To harness quantitative research on tone in Niger-Congo languages.

b) To leverage “clean” language data of Niger-Congo languages that can be used more efficiently in machine learning tasks for tone generation in textual data.

c) To gain better insights into the orthography of tone in Niger-Congo languages.

In this paper, I will show how this novel perspective on the annotation of tone can be applied productively, using a corpus of language data stemming from 120 Niger-Congo languages.

Ngue Um-Tone as “Noiseless Data”-166.pdf
 
4:00pm - 5:30pm: Closing Keynote Reception
Location: ARMB: King's Hall