Session Overview
Location: ARMB 2.16 (Armstrong Building, Lecture Room 2.16). Capacity: 100
Date: Wednesday, 14/Sept/2022
9:30am - 11:00am | Session 1B: Long Papers
Location: ARMB 2.16 | Session Chair: Syd Bauman, Northeastern University
ID: 139 / Session 1B: 1
Long Paper
Keywords: intertextuality, bibliography, interface development, customization

Texts All the Way Down: The Intertextual Networks Project
Northeastern University, United States of America

In 2016, the Women Writers Project (WWP) began a new research project on the multivalent ways that early women writers engaged with literate culture, at the center of which were systemic enhancements to a longstanding TEI corpus. The WWP’s flagship publication, Women Writers Online (WWO), collects approximately 450 works from the sixteenth to the nineteenth centuries, a watershed period in which women’s participation in the authorship and consumption of texts expanded dramatically. With generous funding from the National Endowment for the Humanities, we used WWO’s TEI encoding to jumpstart the creation of a standalone bibliography containing and linking to all the works referenced in WWO. This bibliography currently includes 3,431 book-level entries; 942 entries that are parts of larger works, such as individual essays or poems; and 126 simple bibliographic entries (e.g. books of the Bible). The bibliography identifies the genre of each work and the gender of the author, where known.

We also expanded WWO’s custom TEI markup in order to say more about “intertextual gestures”—or WWO authors’ engagement with other works—which include not only named titles and quotations but also textual remix, adaptation, and parody. By the end of the grant period, we had identified 11,787 quotations, 5,692 titles, 4,825 biblical references, and 1,968 other bibliographic references, linking the individual instances within the WWO texts to the relevant bibliography entries. Now, the WWP has published “Women Writers: Intertextual Networks” (https://wwp.northeastern.edu/intertextual-networks), a web interface built on these two sources of rich TEI data: the bibliography and WWO’s newly refined intertextual gestures.

In this paper we will discuss the challenge of turning dense, textually-embedded data into an interface. Though the encoded texts themselves can stand alone as complete documents, we built Intertextual Networks with a focus on connective tissue, using faceting and linkages to invite curiosity about how authors and works are in conversation with each other. As the numbers above suggest, this project attempts to enable investigations at scale, but we have also sought to draw out the local, even individual, ways that our writers engaged with other texts and authors. Thus, the interface includes visualizations that show overall patterns of usage (for example, the kinds of intertextual gestures employed by each author), but it also allows the reader to view the complete text of each gesture, reading through quotations, named titles, citations, and so on in full, with filtering and faceting to support exploration of this language.

An important challenge for this project has been to build an interface that can address the multidirectional levels of textual imbrication at stake, allowing researchers to examine patterns among both referenced and referencing texts. This paper will share some key insights for TEI projects seeking to undertake similar markup expansion and interface development initiatives. We will discuss strategies for modeling, enabling discovery, and revealing complex layers of textual data and textuality among not only a primary corpus but also a related collection of texts.
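For orientation, a minimal sketch of the kind of linking such encoding involves (element and attribute choices here are illustrative assumptions, not the WWP's actual custom schema; identifiers are hypothetical):

    <!-- a named title and a quotation in a WWO text, each pointing
         to an entry in the standalone bibliography -->
    <p>She had read <title ref="bibliography.xml#bibl.0421">Paradise Lost</title>
      and echoes the Preacher: <quote source="bibliography.xml#bibl.0007">Vanity
      of vanities; all is vanity.</quote></p>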
ID: 116 / Session 1B: 2
Long Paper
Keywords: sex, gender, TEI Guidelines, document data, theory

Revising Sex and Gender in the TEI Guidelines
1: Penn State Behrend, United States of America; 2: University of Maryland, United States of America; 3: University of Neuchâtel, Switzerland; 4: University of Victoria, Canada

In Spring 2022, the co-authors collaborated in a TEI Technical Council subgroup to introduce a long-awaited <gender> element and attribute. In the process, we wrote new language for the TEI Guidelines on how to approach these concepts. As we submit this abstract, our proposed changes are under review by the Council for introduction in the next release of the TEI Guidelines, slated for October 2022. We wish to discuss this work with the TEI community to validate and address:
* the history of the Guidelines' representation of these concepts,
* applications of the new encoding, and
* the extent to which the new specifications preserve backwards compatibility.

We must recognize as digital humanists and textual scholars that coding sex and gender as true "data" from texts significantly risks categorical determinism and normative cultural bias (Sedgwick 1990, 27+). Nevertheless, we believe that the TEI community is well prepared to encounter these risks with diligent study and expertise on the cultures that produce the textual objects being encoded, in that TEI projects are theoretical in their deliberate efforts to model document data (Ramsay and Rockwell 2012). We seek to encourage TEI-driven research on sex and gender by enhancing the Guidelines' expressiveness in these areas. Our revision of the Guidelines therefore provides examples but resists endorsing any single standard for specifying values for sex or gender. We recommend that projects encoding sex and/or gender explicitly state the theoretical groundwork for their ontological modeling, such that the encoding articulates a context-appropriate, informed, and thoughtful epistemology.

Gayle Rubin's influential theory of "sex/gender systems" informs some of our new language in the Guidelines' "Names and Dates" chapter (Rubin 1975). While updating existing examples for encoding sex and introducing related examples for encoding gender, we mention the "sex/gender systems" concept to suggest that sex and gender may be related, such that a culture's perspective on biological sex gives rise to its notions of gender identity. Unexpectedly, we found ourselves confronting the Guidelines' prioritization of personhood in discussion of sex, likely stemming from the conflation of sex and gender in the current version of the Guidelines. In revising the technical specifications describing sex, we introduced the term "organism" to broaden the application of sex encoding. We leave it to our community to investigate the fluid concepts of gender and sex in their textual manifestations of personhood and biological life.

Encoding of cultural categories, when unquestioned, can entrench biases and do harm, a risk we must face in digital humanities generally. Yet we seek to make the TEI more expressive and adaptable for projects that complicate, question, and theorize sex and gender constructions. We look forward to working with the TEI community, in hopes of continued revisions, examples, and theoretical document data modeling of sex and gender for future projects.
In particular, we are eager to learn more from project customizations that "queer" the TEI and theorize about sexed and gendered cultural constructions, and we hope for a lively discussion at the TEI conference and beyond.
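A minimal sketch consistent with the proposal described here (the person and all values are hypothetical, and the element and attribute are shown as proposed at the time of writing; the Guidelines deliberately do not endorse a single value standard):

    <person xml:id="p01" sex="F" gender="W">
      <persName>Jane Doe</persName>
      <!-- the proposed element admits prose or project-defined codes -->
      <gender>woman</gender>
    </person>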
ID: 104 / Session 1B: 3
Long Paper
Keywords: TEI, Spanish, Survey, Community, Geopolitics of Knowledge

Where is the Spanish in the TEI?: Insights on a Bilingual Community Survey
1: CONICET, Argentine Republic; 2: University of Miami, USA

Who can best define the interests and needs of a community? The members of the community itself. "Communicating the Text Encoding Initiative to a Multilingual User Community" is a research project financed by the Andrew W. Mellon Foundation in which scholars from North and South America are generating linguistic, cultural, didactic and situated educational materials to improve the XML-TEI encoding, editing and publication of Spanish texts. As part of the project activities, we prepared a bilingual survey (Spanish-English) aimed at finding out who uses or has used XML-TEI, and where and how it has been applied to Spanish humanistic texts. Bearing in mind that many digital scholarly edition projects of Spanish texts are carried out in both Spanish-speaking and Anglophone institutions, we did not focus on a geographical survey, but on the use of XML-TEI at a global level.

The survey ran between February and April 2022. It is anonymous, consists of 22 questions, and received 104 responses, 77 in Spanish and 28 in English. Some of the data that we will discuss in this presentation illustrate the significant differences regarding the organization of projects, collaboration, financing and use of TEI in master's and doctoral research.

In broad terms, the survey allowed us not only to better understand the Spanish-speaking community that uses XML-TEI, but also to think of strategies that can contribute to more inclusive practices for scholars from less represented countries and in less favorable contexts inside the global TEI community. Last but not least, we believe the survey will be useful for designing actions that can support a wider range of modes of interaction and collaboration inside the global TEI community.
11:30am - 1:00pm | Session 2B: Long Papers
Location: ARMB 2.16 | Session Chair: Hugh Cayless, Duke University
ID: 145 / Session 2B: 1
Long Paper
Keywords: collation, information transfer, ecdotics, materiality

TEICollator: a semi-automatic TEI to TEI workflow
ENS Lyon, France

Automated text comparison has been an area of interest for many years [Nury 2019]: tools such as CollateX can compare texts automatically and even export the result to TEI. However, no tool today makes it possible to start from transcriptions encoded and structured in XML-TEI, automate the collation of the texts, and inject the resulting apparatus back into the original files. Working in this way ensures that the contextual and structural information specific to each witness (structure, additions, deletions, line changes, etc.) encoded in XML-TEI is not lost. In other words, there is a need to be able to work on textual differences without ignoring the individual, structural and material reality of each text or witness. Furthermore, the increasing use of Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) tools [Kiessling 2019], which are attractive both for speed of acquisition and for the quality of the preserved information [Camps 2016], has consequences for ecdotic methods: should we keep collating the text manually when its acquisition has been done by the computer?

My work focuses on a semi-automatic collation workflow. I will present a complete TEI to TEI processing chain, from individual TEI-encoded transcriptions to meaningfully collated ones (through the production of typed apparatus, for instance: see [Camps 2018]), which keeps the original structural information. The process also identifies omissions and transpositions, and finally transforms the data into documents that present the textual information as clearly as possible. I will present my work from the perspective of information transfer, pointing out the dialectic between material and textual collation (as carried out by Bleeker et al. 2018, but using other methods), the former being the alignment of material features encoded in TEI.

I will also outline the limitations and difficulties I face along the processing chain (Can the tokenisation of TEI-encoded text be fully automated? What level of textual heterogeneity can the workflow manage? What quality of lemmatisation is required? What encoding method should be preferred to get the best possible results?). I want to show how the TEI standard, the pivot format of this computational method, can be used to describe text as well as to process it. Finally, I will show how the last operation, the transformation from TEI to LaTeX, perhaps the most complex task, is fully part of the ecdotic chain and contributes to producing meaning from the data: in this sense, my work is part of the reflection carried out for several years on Digital Scholarly Editions [Pierazzo 2015; Pierazzo and Driscoll 2016] -- I made a choice to prefer the print/PDF format over a web interface -- thanks to the LaTeX Reledmac package developed and maintained by Maïeul Rouquette [Rouquette 2022]. This paper will be the technical counterpart of a paper presented in La Laguna in July, which focused on the philological side of the processing chain.
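A minimal sketch of the kind of typed apparatus such a chain might produce (parallel-segmentation encoding; the @type values and witness sigla are illustrative assumptions):

    <app type="omission">
      <lem wit="#W1">sicut erat in principio</lem>
      <rdg wit="#W2"/>
    </app>
    <app type="transposition">
      <lem wit="#W1">bonum et malum</lem>
      <rdg wit="#W2">malum et bonum</rdg>
    </app>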
ID: 148 / Session 2B: 2
Long Paper
Keywords: digital edition, data quality assurance, XSL-FO, software test, PDF

Back to analog: the added value of printing TEI editions
Goethe Universität Frankfurt am Main, Germany

Sahle (2017) [1] provides an operational definition of the scholarly digital edition by contrasting its paradigm to that of a print edition. His bottom line is that any "digital edition cannot be given in print without significant loss of content and functionality". In my talk I will touch upon the challenges of printing TEI XML datasets but also substantiate the positive effects: a PDF export, indeed, presents only a part of the encoded information, but it can play an essential role in data quality assurance. Creating a printed version of a digital edition can enhance the consistency of encoding and affect the overall production pipeline of the TEI XML data.

At the "School of Salamanca" [2] project, the TEI XML of the early modern print editions goes through restrictive schema and Schematron checks, after which it is exported to HTML and JSON IIIF for web display [3]. Recently, an option for PDF export was added. Considering the complexity and depth of the annotation, a solution that integrates into Salamanca's Oxygen workflow was chosen, namely the free Apache FOP processor. Similar results might have been achieved with TEI Publisher or the Oxygen PDF Chemistry processor. The PDF export highlighted issues pertaining to two ontologically different areas:
• rendering XML elements in a constrained two-dimensional PDF layout;
• varying XML encoding of semantically identical chunks of information.

Issues of the first type concern, for example, the representation of marginal notes and their anchors, and the correlation of pagination between XML and IIIF (representing the original) and PDF (as print output). The second type embraces different renderings of semantically identical text parts, induced either by errors in the original or by the text editors.

PDF generation was initially intended to be just one of the export methods for the TEI data. It is now implemented early in the TEI production workflow, as it pinpoints semantic and structural inconsistencies in the data and makes it possible to correct them before the final XML release. PDF production thus adheres to one of the principles of agile software testing, which states that capturing and eliminating defects in the early stages of the RDLC (research data life cycle) is less time-consuming, less resource-intensive and less prone to collateral bugs (Crispin 2008) [4].

[1] Sahle, Patrick. 2017. "What is a Scholarly Digital Edition?" In Digital Scholarly Editing, edited by Matthew James Driscoll and Elena Pierazzo, 19-39. Cambridge: Open Book Publishers.
[2] https://www.salamanca.school/en/index.html, accessed on 20.06.2022.
[3] https://blog.salamanca.school/de/2022/04/27/the-school-of-salamanca-text-workflow-from-the-early-modern-print-to-tei-all/, https://blog.salamanca.school/de/2020/03/17/deutsch-entwicklung-der-webanwendung-v2-0/, accessed on 20.06.2022.
[4] Crispin, Lisa. 2008. Agile Testing: A Practical Guide for Testers and Agile Teams. Addison-Wesley.
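For orientation, a minimal sketch of the XSL-FO that such a pipeline hands to Apache FOP (heavily simplified; the project's actual stylesheets are of course far more involved):

    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="edition-page"
            page-height="29.7cm" page-width="21cm">
          <fo:region-body margin="2.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="edition-page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-family="serif">Text of the edition…</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>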
ID: 106 / Session 2B: 3
Long Paper
Keywords: poetry, rhyme, sound

Encoding sonic devices: what is it good for?
University of Victoria, Canada

The Digital Victorian Periodical Poetry project [1] has captured metadata and page-images for 15,548 poems from Victorian periodicals, and transcribed and encoded a representative sample of 2,150 poems. Our encoding captures rhyme and other sonic devices such as anaphora, epistrophe, and refrains. This presentation will describe our encoding practices and then discuss what useful outcomes can be gained from this undertaking.

Although even TEI P1 specified both a rhyme attribute to capture rhyme-scheme and a rhyme element for "very detailed studies of rhyming" (TEI P1 P172) [2], and all significant TEI tutorials teach the encoding of rhyme (e.g. TEI by Example Module 4), it is difficult to find work which makes explicit use of TEI encoding of rhyme (let alone other sonic devices) in the analysis of English poetry.

Is manual encoding of rhyme still necessary? Chisholm & Robey noted back in 1995 that "much of the analysis which currently requires extensive manual markup will in due course be carried out by electronic means" (100), and much work has been devoted to the automated detection of rhyme (Kavanagh 2008; Kilner & Fitch 2017). However, these tools are not completely successful, and in our own work there is a consistent subset of cases which generate disagreement and discussion regarding the type of rhyme, or even whether a rhyme is intended. We do make use of automated detection of anaphora and epistrophe, but only to generate suggestions for cases that might have been missed after the initial encoding has been done. We therefore believe that manually-curated encoding of sonic devices is a prerequisite for serious literary analysis which depends on that encoding.

Having invested in careful encoding of sonic devices, what are the potential uses for research? DVPP has begun by making rhyme-scheme discoverable and searchable in our search interface, and this is beginning to generate research questions. We can already test notions such as the claim that irregular rhyme-schemes were more frequently used as the century progressed; a table of the percentage of irregularly-rhymed poems in each decade in our collection (Appendix) shows only the weakest support for this claim. In addition to tracing trends in poetic practice, and the construction of historical rhyme dictionaries, sonic device encoding might also be used for:
- Dialect detection. For example, our dataset includes a significant subset of poems written in Scots dialect, and others which may or may not be; for problem cases, where other factors such as poet and host publication suggest a dialect poem, but surface features are not persuasive, rhyme patterns may provide more evidence.
- Genre detection. Particular poetic genres, such as sonnets or ballads, are characterized by formal structures which include rhyme-scheme.
- Bad poetry. We are particularly interested in the notion of what constitutes bad poetry, and our early work suggests that poetry which subjectively seems to be of poor quality also exhibits features such as monotonous rhyme-schemes and intrusive echoic devices.
- Authorship attribution.
- Diachronic sound-change.
- Historical rhyming dictionaries.

[1] DVPP, https://dvpp.uvic.ca/.
[2] See also Chisholm & Robey 1995.
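A minimal sketch of rhyme encoding along the lines the Guidelines describe (the verse fragment is invented; DVPP's project-specific practice may differ in detail):

    <lg type="quatrain" rhyme="abab">
      <l>… the ending of the <rhyme label="a">day</rhyme></l>
      <l>… a slowly fading <rhyme label="b">light</rhyme></l>
      <l>… so very far <rhyme label="a">away</rhyme></l>
      <l>… the coming of the <rhyme label="b">night</rhyme></l>
    </lg>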
2:30pm - 4:00pm | Session 3B: Notes from the DEPCHA Field and Beyond: TEI/XML/RDF for Accounting Records
Location: ARMB 2.16 | Session Chair: Syd Bauman, Northeastern University
ID: 154 / Session 3B: 1
Panel
Keywords: accounts, accounting, DEPCHA, bookkeeping ontology

Notes from the DEPCHA Field and Beyond: TEI/XML/RDF for Accounting Records
1: Wheaton College Massachusetts, United States of America; 2: Rochester Institute of Technology, United States of America; 3: Chiba University, Japan

Session Proposal (Short Papers), TEI 2022, Newcastle

Abstract: The short papers in this session focus on questions that arise in the process of editing manuscript account books. Some of these questions result from the "messiness" of accounting practices in contrast to the "rationality" of accounting principles; others arise from efforts to reflect in the markup social and economic relationships beyond those imagined in Chapter 14 of the P5 TEI Guidelines, "Tables, Formulae, Graphics, and Notated Music." The Bookkeeping Ontology developed by Christopher Pollin for the Digital Edition Publishing Cooperative for Historical Accounts (DEPCHA) in the Graz Asset Management System (GAMS) extends the potential of TEI/XML using RDF.

In "Operating Centre Mills," Tomasek and Bullock focus on markup for information about the people, materials, and machines used to produce cotton batting at Centre Mills, a textile mill in Norton, Massachusetts, in 1847-48. The general ledger for this enterprise includes store accounts, production records, and tracking of materials used to run the mill. Entries that reflect the costs of mill operation show sources of raw cotton, daily use of materials, and payments for wages and board for a small labor force. Examples in the paper demonstrate flexible use of the <measure> element combined with a draft taxonomy based on Historical Statistics of the United States, a resource for economic history originally published by the U.S. Bureau of the Census (a sketch of this kind of markup follows the panel description below). The goal of the edition is to develop additional semantic markup to supplement Pollin's Bookkeeping Ontology.

"Wages and Hours," Hermsen and Walker's paper, emerges from their work on a digital scholarly edition of the account books of William Townsend & Sons, Printers, Stationers, and Account Book Manufacturers, Sheffield UK (1830-1910). Volume 3, "Business Guide and Works Manual," speaks both to book history and to cultural observations about unionization, gender roles, and credit/debit accounting. Parts of this complex manuscript might be considered a nineteenth-century commonplace book; it also contains specific instructions for book binding, including lists of required materials and a recipe for glue. The financial accounts in this collection are recorded in ambiguous tabular form with in-text page references to nearly indecipherable price keys. For example, Townsend provides a "key" to determine the size of an account book. The formula is figured using imperial standards for the size of a sheet of paper (i.e. Foolscap) and quarto or octavo folds of the sheet and the number of sheets. This formula, along with the type of ruling and binding, provides the necessary numbers for the arithmetic that will determine the price of an account book.

Kokaze's paper, "Stakeholders in the British Ship-Breaking Industry," develops a set of methods to analyse structured data from historical financial records, taking as an example a disbursement ledger of Thomas W. Ward, the largest British shipbreaker in the twentieth century. That ledger is held by the Marine Technology Special Collection at Newcastle University, UK.
The academic contribution of this research is to critically examine the possibilities and limitations of DEPCHA, the ongoing digital humanities approach to the semantic datafication of historical financial records with the TEI and RDF, mainly developed by scholars in the United States and Austria, and to present an original argument in British maritime history: visualising a part of the overall structure of the British shipbreaking industry. Development of DEPCHA was supported by a joint initiative of the National Historical Publications and Records Commission at the National Archives and Records Administration in the United States and the Andrew W. Mellon Foundation.

Bios:
Kathryn Tomasek is Professor of History at Wheaton College. She has been working on TEI for account books since 2009, and she was PI for the DEPCHA planning award in 2018. She chaired the TEI Board between 2018 and 2021.
Olivia Bullock is a senior Creative Writing major at Wheaton College who studies intersectional identities in literature and history.
Lisa Hermsen is Professor and Caroline Werner Gannett Endowed Chair in the College of Liberal Arts at Rochester Institute of Technology.
Rebecca Walker, Digital Humanities Librarian, coordinates large-scale DH projects and supports classroom digital initiatives in the College of Liberal Arts at Rochester Institute of Technology.
Naoki Kokaze is an Assistant Professor at Chiba University, where he leads the design and implementation of DH-related lectures in the government-funded humanities graduate education program conducted in collaboration with several Japanese universities. He is a PhD candidate in History at the University of Tokyo, writing his doctoral dissertation on the social, economic, and diplomatic aspects of the disposal of obsolete British Royal Navy warships from the mid-nineteenth century through the 1920s.
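As promised above, a minimal sketch of the flexible <measure> markup discussed in "Operating Centre Mills" (all values, and the @ana taxonomy pointer, are illustrative assumptions, not the project's actual encoding):

    <!-- a ledger entry recording materials, classified against a draft
         taxonomy derived from Historical Statistics of the United States -->
    <p>To <measure commodity="cotton" unit="lb" quantity="150"
         ana="#hsus-raw-cotton">150 lbs raw cotton</measure>,
       <measure type="currency" unit="USD" quantity="12.50">$12.50</measure></p>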
Date: Thursday, 15/Sept/2022
9:30am - 11:00am | Session 4B: Long Papers
Location: ARMB 2.16 | Session Chair: Elisa Beshero-Bondar, Penn State Behrend
ID: 138 / Session 4B: 1
Long Paper
Keywords: IPIF, Prosopography, Personography, Linked Open Data

From TEI Personography to IPIF data
1: Austrian Academy of Sciences, Austria; 2: University of Graz, Austria

The International Prosopography Interchange Format (IPIF) is an open API and data model for prosopographical data interchange, access, querying and merging, using a regularised format. This paper discusses the challenges of converting TEI personographies into the IPIF format, and more general questions about using the TEI for so-called 'factoid' prosopographies.
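For orientation, a minimal sketch of the TEI side of such a conversion (a hypothetical personography entry; in a factoid-style mapping, each child of <person> would roughly become a separate IPIF statement, which presumably raises the question of where per-statement sources come from):

    <listPerson>
      <person xml:id="pers1">
        <persName>Ada Lovelace</persName>
        <birth when="1815-12-10"/>
        <occupation>mathematician</occupation>
      </person>
    </listPerson>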
ID: 147 / Session 4B: 2
Long Paper
Keywords: data modeling, information retrieval, data processing, digital philology, digital editions

TEI as Data: Escaping the Visualization Trap
1: Università di Torino, Italy; 2: University of Vienna, Austria; 3: Università di Pisa, Italy

During the last few years, the TEI Guidelines and schemas have continued growing in capability and expressive power. A well-encoded TEI document constitutes a small treasure trove of textual data that can be queried to quickly derive information of different types. However, in many edition browsing tools, e.g. EVT (http://evt.labcd.unipi.it/), access to such data is mainly intended for visualization purposes. Such an approach is hardly compatible with the strategy of setting up databases to query this data, thus leading to a split of environments: DSEs to browse edition texts versus databases to perform powerful and sophisticated queries. It would be interesting to expand the capabilities of EVT, and possibly other tools, adding functionality which would allow them to process TEI documents to answer complex user queries. This requires both an investigation to define the text model in terms of TEI elements and a subsequent implementation of the desired functionality, to be tested on a suitable TEI project that can adequately represent the text model.

The Anglo-Saxon Chronicle stands out as an ideal environment in which to test such a method. The wealth of information that it records about early medieval England makes it the optimal footing upon which to enhance computational methods for textual criticism, knowledge extraction and data modeling for primary sources. The application of such a method could here prove essential to assist the retrieval of knowledge otherwise difficult to extract from a text that survives in multiple versions. Bridging together, cross-searching and querying information dispersed across all the witnesses of the tradition would allow us to broaden our understanding of the Chronicle in unprecedented ways. Interconnecting the management of a wide spectrum of named entities and realia—which is one of the greatest assets of TEI—with the representation of historical events would make it possible to gain new knowledge about the past. Most importantly, it would lay the groundwork for a Digital Scholarly Edition of the Anglo-Saxon Chronicle, a project never undertaken so far.

Therefore, we decided to implement a new functionality capable of extracting and processing a greater amount of information by cross-referencing various types of TEI/XML-encoded data. We developed a TypeScript library to outline and expose a series of APIs allowing the user to perform complex queries on the TEI document. Besides the cross-referencing of people, places and events hinted at above—on the basis of standard TEI elements such as <listPerson>/<person>, <listPlace>/<place>, <listEvent>/<event> etc.—we plan to support ontology-based queries, defining the relationships between different entities by means of RDF-like triples. In a similar way, it will be possible to query textual variants recorded in the critical apparatus by typology and witness distribution. This library will be integrated into EVT to interface directly with its existing data structures, but it is not limited to EVT. We are currently working on designing a dedicated GUI within EVT to make the query system intuitive and user-friendly.
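A minimal sketch of the kind of interlinked TEI data such queries would traverse (identifiers and wording are illustrative):

    <listPerson>
      <person xml:id="alfred"><persName>Alfred the Great</persName></person>
    </listPerson>
    <listEvent>
      <event xml:id="edington878" when="0878">
        <label>Battle of Edington</label>
        <desc>Victory of <persName ref="#alfred">King Alfred</persName>
          over the Danish army.</desc>
      </event>
    </listEvent>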
ID: 120 / Session 4B: 3
Long Paper
Keywords: linked data, conversion, reconciliation, software development

LINCS' Linked Workflow: Creating CIDOC-CRM from TEI
University of Ottawa, Canada

TEI data is so often carefully curated, without the noise and error common to algorithmically created data, that it is a perfect candidate for linked data creation; however, while most small TEI projects boast clean, beautifully crafted data, linked data creation is often out of reach both technically and financially for these project teams. This paper reports (following where others have trod [1]) on the Linked Infrastructure for Networked Cultural Scholarship (LINCS) project's workflow, mappings, and tools for creating linked data from TEI resources.

The process of creating linked data is far from straightforward, since TEI is by nature hierarchical, taking its meaning from the deep nesting of elements. Any one element in TEI may draw its meaning from its relationship to a grandparent well up the tree (for example, a persName appearing inside a listPerson inside the teiHeader is more likely to be a canonical reference to a person than a persName whose parent is a paragraph). Furthermore, the meanings of TEI elements are not always well represented in existing ontologies, and the time and money required to represent TEI-based information about people, places, time, and cultural production as linked data is out of reach for many small projects.

This paper introduces the LINCS workflow for creating linked data from TEI. We will introduce the named entity recognition and reconciliation service, NSSI (pronounced nessy), and its integration into a TEI-friendly vetting interface, Leaf Writer. Following NSSI reconciliation, Leaf Writer users can download their TEI with the entity URIs in idno elements for their own use. If they wish to contribute to LINCS, they may proceed to enter the TEI document they have exported from Leaf Writer into XTriples, a customized version of the Mainz Digitale Akademie's XTriples tool of the same name, which converts TEI to CIDOC-CRM for either private use or integration into the LINCS repository.

We have adopted the XTriples tool because it meets the needs of a very common type of TEI user: the director or team member of a project who is not going to be able to learn the intricacies of CIDOC-CRM, or indeed perhaps not even of linked data principles, but would still like to contribute their data to LINCS. That said, we are keen to get the feedback of the expert users of the TEI community on our workflow, CIDOC-CRM mapping, and tools.

1. Bodard, Gabriel, Hugh Cayless, Pietro Liuzzo, Chiara Cenati, Alison Cooley, Tom Elliott, Silvia Evangelisti, Achille Felicetti, et al. "Modeling Epigraphy with an Ontology." Zenodo, March 26, 2021.
Ciotti, Fabio. "A Formal Ontology for the Text Encoding Initiative." Umanistica Digitale, vol. 2, no. 3, 2018.
Eide, Ø., and C. Ore. "From TEI to a CIDOC-CRM Conforming Model: Towards a Better Integration Between Text Collections and Other Sources of Cultural Historical Documentation." Digital Humanities, 2007.
Ore, Christian-Emil, and Øyvind Eide. "TEI and Cultural Heritage Ontologies: Exchange of Information?" Literary and Linguistic Computing, vol. 24, no. 2, 2009, pp. 161-72. https://doi.org/10.1093/llc/fqp010.
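A minimal sketch of the context problem described above (hypothetical encoding; the same string would map to different CIDOC-CRM statements depending on its ancestors):

    <!-- canonical: part of a personography in the header -->
    <particDesc>
      <listPerson>
        <person xml:id="msmith"><persName>Mary Smith</persName></person>
      </listPerson>
    </particDesc>

    <!-- incidental: a mention in running prose -->
    <p>…a letter from <persName ref="#msmith">Mary Smith</persName>…</p>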
11:30am - 1:00pm | Session 5B: Panel - Manuscript catalogues as data for research
Location: ARMB 2.16 | Session Chair: Katarzyna Anna Kapitan, University of Oxford
ID: 144 / Session 5B: 1
Panel
Keywords: Manuscripts, Provenance, Research, Clustering, Linked Data

Manuscript catalogues as data for research
1: Cambridge University, United Kingdom; 2: University of Oxford; 3: Herzog August Bibliothek; 4: University of Leeds

Manuscript catalogues present problems and opportunities for researchers, not least the status of manuscript descriptions as both information about texts and texts in themselves. In this panel, we will present three recent projects which have used manuscript catalogues as data for research, and which raise general questions in text encoding, in manuscript studies and in data-driven digital humanities. This will be followed by a panel discussion to further investigate issues and questions raised by the papers.

1. Investigating the Origins of Islamicate Manuscripts Using Computational Methods (Yasmin Faghihi and Huw Jones): This project evaluated computational methods for the generation of new information about the origins of manuscripts from existing catalogue data. The dataset was the Fihrist Union Catalogue of Manuscripts from the Islamicate World. We derived a set of codicological features from the TEI data, clustered together manuscripts sharing features, and used dated/placed manuscripts to generate hypotheses about the provenance of other manuscripts in the clusters. We aimed to establish a set of base criteria for the dating/placing of manuscripts, to investigate methods of enriching existing datasets with inferred data to form the basis of further research, and to engage critically with the research cycle in relation to computational methods in the humanities.

2. Re-thinking the <provenance> element in TEI Manuscript Description to support graph database transformations (Toby Burrows and Matthew Holford): This paper reports on the transformation of the Bodleian Library's online medieval manuscripts catalogue, based on the "Manuscript Description" section of the TEI Guidelines, into RDF graphs using the CIDOC-CRM and FRBRoo ontologies. This work was carried out in the context of two Linked Open Data projects: Oxford Linked Open Data and Mapping Manuscript Migrations. One area of particular focus was the provenance data relating to these manuscripts, which proved challenging to transform effectively from TEI to RDF (a sketch of current <provenance> encoding follows the panel description below). An important output from the MMM project was a set of recommendations for re-thinking the structure and encoding of the TEI <provenance> element to enable more effective reuse of the data in graph database environments. These recommendations draw on concepts previously outlined by Ore and Eide (2009), but also take into account the parallel work being done in the art museum and gallery community.

3. The use of TEI in the Handschriftenportal (Torsten Schaßan): The Handschriftenportal, the national manuscript portal for Germany currently in the making, is built on TEI-encoded data. This includes representations of manuscripts, imported descriptions, authority data, and OCR-generated catalogues. In the future, it will be possible to enter descriptions directly into the backend database. The structure of the descriptive data shall be adapted to the latest developments in manuscript studies, e.g. the increased importance of material aspects, or the alignment of the description of texts and illuminations. Especially the latter, the data to be entered in the future, poses several issues for the TEI encoding as currently defined in the Guidelines.
These issues concern the overall structure of the main components of a description, as well as needs at a more detailed level.

Bios:
Dr Toby Burrows is a Digital Humanities researcher at the University of Oxford and the University of Western Australia. His research focuses on the history of cultural heritage collections, and especially medieval and Renaissance manuscripts.
Yasmin Faghihi is Head of the Near and Middle Eastern Department at Cambridge University Library. She is the editor of FIHRIST, the online union catalogue for manuscripts from the Islamicate world.
Matthew Holford is Tolkien Curator of Medieval Manuscripts at the Bodleian Library, Oxford. He has a long-standing research interest in the use of TEI for the description and cataloguing of Western medieval manuscripts.
Huw Jones is Head of the Digital Library at Cambridge University Library, and Director of CDH Labs at Cambridge Digital Humanities. His work spans many aspects of collections-driven digital humanities, from creating and making collections available to their use in a research and teaching context.
Torsten Schaßan is a member of the Manuscripts and Special Collections department of the Herzog August Bibliothek Wolfenbüttel. He was involved in many manuscript digitisation and cataloguing projects. In the Handschriftenportal project he is responsible for the definition of schemata and all transformations of data for import into the portal.
Chair: Dr Katarzyna Anna Kapitan is a manuscript scholar and digital humanist specialising in Old Norse literature and culture. Currently she is Junior Research Fellow at Linacre College, University of Oxford, where she works on a digital book-historical project, "Virtual Library of Torfæus", funded by the Carlsberg Foundation.
Respondent: Dr N. Kıvılcım Yavuz works at the intersection of medieval studies and digital humanities, with an expertise in medieval historiography and European manuscript culture. She is especially interested in the digitisation of manuscripts as cultural heritage items and the creation, collection and interpretation of data and metadata in the context of digital repositories.
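As referenced in paper 2 above, a minimal sketch of <provenance> encoding as the msdescription module currently allows (hypothetical content; the prose-paragraph style shown is precisely what makes event-based RDF extraction difficult):

    <history>
      <provenance notBefore="1500" notAfter="1550">Ownership inscription
        of an unidentified Benedictine house, f. 1r.</provenance>
      <acquisition when="1753">Acquired by the library in 1753.</acquisition>
    </history>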
Date: Friday, 16/Sept/2022
9:30am - 11:00am | Session 7B: Long Papers
Location: ARMB 2.16 | Session Chair: Gimena del Rio Riande, CONICET
ID: 132 / Session 7B: 1
Long Paper
Keywords: TEI, born-digital heritage, retrocomputing, digitality, materiality

Is it still data? Scholarly Editing of Text from Early Born-Digital Heritage
Universität Würzburg, Germany

Digital heritage is strongly bound to original devices and displays. Even in today's standardized environments, text can change its appearance depending on the monitor technology, the processing software, and the fonts available on the system: text as data depends greatly on technical interpretation. Creating a scholarly digital edition from born-digital heritage, especially text, needs to consider the original conditions, like encoding and hardware standards. My questions are: Are the encoding guidelines of the TEI suitable for representing born-digital text? How much information is required about the original environment? Can a screenshot serve as a facsimile, or is it necessary to link to emulated states of the display software?

To give an example, I will present a preliminary scholarly TEI-based digital edition of "disk magazines". These magazines were a special type of periodical, published mostly on floppy disk, mainly in the 1980s and 1990s. Created by home computer enthusiasts for the community, disk magazines are potentially valuable as a historical resource to study the experiences of programmers, users and gamers in the early stage of microcomputing. In the examples (one of them is available at https://diskmags.github.io/md_87-11.html), the digital texts are decompressed byte sequences of PETSCII code, which is only partially compatible with ASCII. The appearance of the characters could be changed completely by the programmer to display foreign characters or alphabets. Further, the text depended on a 40x25 character layout, where it had to be aligned manually by inserting whitespace. The once born-digital text – as data – is transformed into readable text – as image – on a screen.

The example demonstrates that the connection between textual data and textual display can be very fragile. For TEI encoding, this has some consequences. On the one hand, there is a requirement to preserve as much of the original data as possible. On the other hand, a scholarly edition needs to represent the semantics of the visible document. An interpretative layer is required to communicate between these two levels, which could be implemented by different markup strategies; however, it needs to be discussed whether classes like "att.global.rendition" are actually suited for this. It also needs to be discussed in which way a digital document (or which instance of it: as stored data, as memory state, as display?) can be interpreted in the same way as a material document – and which implications this would have for TEI encoding of born-digital heritage.
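One conceivable markup strategy, sketched with the TEI gaiji module (purely illustrative; the code-point value is a placeholder, and whether this is adequate is exactly the kind of open question the paper raises):

    <charDecl>
      <char xml:id="petscii-heart">
        <!-- hypothetical mapping of a PETSCII graphic character -->
        <mapping type="PETSCII">0x53</mapping>
        <mapping type="Unicode">♥</mapping>
      </char>
    </charDecl>
    ...
    <line>PRESS <g ref="#petscii-heart"/> TO CONTINUE</line>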
ID: 152 / Session 7B: 2
Long Paper
Keywords: publishing, LOD, TEI infrastructure

Using Citation Structures
Duke University, United States of America

This paper is a follow-up to one I gave at Balisage in 2021 [1]. Citation structures are a TEI feature introduced in version 4.2.0 of the Guidelines, which provide an alternative (and more easily machine-processable) method for documents to declare their internal structures [2]. This mechanism is important because of the heterogeneity of texts and consequently of the TEI structures used to model them. This heterogeneity necessarily means it is difficult for any system publishing collections of TEI editions to treat their presentation consistently. For example, a citation like "1.2" might mean "poem 1, line 2" in one edition, and "book 1, chapter 2" in another. It might be perfectly sensible to split an edition into chapters, or even small sections, for presentation online, but not at all to split a poem into lines (though groups of lines might be desirable). Otherwise, a publication system will have to rely on assumptions and guesswork about the items in its purview, and may fail to cope with new material that does not behave as it expects. Worse, there is no guarantee that the internal structures of editions are consistent within themselves. Consider, for example, Ovid's 'Tristia', in which the primary organizational structure is book, poem, line, but book two is a single long poem.

Citation structures permit a level of flexibility that is hard to manage otherwise, by allowing both nested structures and alternative structures at every level. In addition, a key new feature of citation structures over the older reference declaration methods is the ability to attach information to each structural level that may be used by a processing system. The <citeData> element which makes this possible will allow, for example, a structural level to indicate what name it should be given in a table of contents, or even whether or not it should appear in such a feature.

I will discuss the mechanics of creating and using citation structures. Finally, I will present a working system in XSLT that can exploit <citeStructure> declarations to produce tables of contents, split large documents into substructures for presentation on the web, and resolve canonical citations to parts of an edition.

1. https://www.balisage.net/Proceedings/vol26/html/Cayless01/BalisageVol26-Cayless01.html
2. See https://tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CORS6 and https://tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SACRCS.
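A minimal sketch of a citation structure for the 'Tristia' case described above (the XPaths are illustrative, assuming books and poems encoded as nested <div>s carrying @n):

    <refsDecl>
      <citeStructure unit="book" match="//body/div" use="@n">
        <citeData property="dc:title" use="head"/>
        <citeStructure unit="poem" match="div" use="@n" delim=".">
          <citeStructure unit="line" match="l" use="@n" delim="."/>
        </citeStructure>
      </citeStructure>
    </refsDecl>

Under this declaration, a citation like "1.2.14" would resolve to book 1, poem 2, line 14.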
ID: 160 / Session 7B: 3
Long Paper
Keywords: Manuscript cataloguing, semantic markup, retro-conversion vs. born-digital

Text between data and metadata: An examination of input types and usage of TEI encoded texts
Herzog August Bibliothek Wolfenbüttel, Germany

Many texts that have been encoded using the TEI in the past are retro-converted from printed sources: manuscript catalogues and dictionaries are examples of highly structured texts; drama, verse, and performance texts are usually less structured; editions fall somewhere in between. Many of the text types for which the TEI offers specialised elements represent both metadata and data, depending on the scenarios in which these texts are used.

In the field of manuscript cataloguing, it has long been a question whether the msdescription module is sufficient for the representation of a retro-converted text of a formerly printed catalogue. One may argue that a catalogue is first of all a visually structured text, a succession of paragraphs whose semantics are only loosely connected to the main elements the TEI defines, such as <msContents>, <physDesc>, or <msPart>. On the other hand, at the sub-paragraph level, the TEI offers structures which may not be alignable with the actual text of the catalogue, so that the person who carries out the retro-conversion has to decide whether to change the text according to the TEI schema rules, encode the text in semantically incorrect ways, or structure the text with much less semantic information than would be possible.

Now that the TEI is more and more used to store these kinds of texts as born-digital documents, the question is whether the structures offered by the TEI meet all the needs the texts and their authors might have in different scenarios: Is a TEI-encoded text of a given kind equally useful for all search and computational uses, as well as publishing needs? Are the TEI structures flexible enough, or do they privilege some uses over others? How much of the semantic information is encoded in the text, and how much of it might be realised only in the processing of the sources?

In this paper, manuscript catalogues serve as an example for the more general question of what structures, how much markup, and what kind of markup are needed in the age of powerful search engines and artificial intelligence, authority files and Linked Open Data.
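A minimal sketch of the structures at issue (a hypothetical entry with an invented shelfmark): msdescription expects the catalogue text to be distributed over dedicated containers rather than kept as a run of paragraphs:

    <msDesc>
      <msIdentifier>
        <settlement>Wolfenbüttel</settlement>
        <repository>Herzog August Bibliothek</repository>
        <idno>Cod. Guelf. 0.0</idno>
      </msIdentifier>
      <msContents>
        <msItem><title>…</title></msItem>
      </msContents>
      <physDesc>
        <objectDesc form="codex">
          <supportDesc material="perg"/>
        </objectDesc>
      </physDesc>
    </msDesc>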
11:30am - 1:00pm | Session 8B: Demonstrations
Location: ARMB 2.16 | Session Chair: Tiago Sousa Garcia, Newcastle University
ID: 114 / Session 8B: 1
Demonstration
Keywords: Digital Humanities, Critical Editions, Tools, IIIF

Transcribing Primary Sources using FairCopy and IIIF
Performant Software Solutions LLC, United States of America

FairCopy is a simple and powerful tool for reading, transcribing, and encoding primary sources using the TEI Guidelines. FairCopy can import IIIF manifests as a starting point for transcription. Users can then highlight zones on each surface and link them to the transcription. FairCopy exports valid TEI-XML which is linked back to the original IIIF endpoints. In this demonstration, we will show the IIIF functionality in FairCopy and then take a look at the exported TEI-XML and how it provides a consistent interface to the images as well as the original IIIF manifest.
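A minimal sketch of the kind of TEI-XML such an export might contain (URLs, identifiers, and coordinates are placeholders, not FairCopy's actual output):

    <facsimile>
      <surface xml:id="s1">
        <graphic url="https://example.org/iiif/2/page1/full/full/0/default.jpg"/>
        <zone xml:id="z1" ulx="120" uly="80" lrx="940" lry="150"/>
      </surface>
    </facsimile>
    <text>
      <body>
        <p facs="#z1">First transcribed line …</p>
      </body>
    </text>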
ID: 133 / Session 8B: 2
Demonstration
Keywords: Digital publishing, TEI processing, static sites, programming

Adapting CETEIcean for static site building with React and Gatsby
University of Maryland, United States of America

The JavaScript library CETEIcean, written by Hugh Cayless and Raff Viglianti, relies on the DOM processing of web browsers and HTML5 Custom Elements to publish TEI documents as a component pluggable into any HTML structure. This makes it possible to publish and lightly transform TEI documents directly in the user's browser, doing away with complex server-side infrastructure for TEI publishing. However, CETEIcean provides a fairly bare-bones API for a fully-fledged TEI publishing solution and, without some additional considerations, TEI documents rendered with CETEIcean can be invisible to search engines.

This demonstration will showcase an adaptation of the CETEIcean algorithm as a plugin for the static site generator Gatsby, which relies on the popular framework React for building user interfaces. Two plugins will be shown:
- gatsby-transformer-ceteicean (https://www.gatsbyjs.com/plugins/gatsby-transformer-ceteicean/) prepares XML to be registered as HTML5 Custom Elements. It also allows users to apply custom NodeJS transformations before and after processing.
- gatsby-theme-ceteicean (https://www.npmjs.com/package/gatsby-theme-ceteicean) implements HTML5 Custom Elements for XML publishing, particularly with TEI. It re-implements parts of CETEIcean, excluding behaviors; instead, users can define React components to customize the behavior of specific TEI elements.

The demonstration will show examples from the Scholarly Editing journal (https://scholarlyediting.org), which published TEI-based small-scale editions with these tools alongside other essay-like content.
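For orientation, a sketch of the Custom Elements idea CETEIcean is built on: TEI element names are prefixed and lowercased so that browsers can register and style them (the rendering shown is illustrative):

    <!-- TEI source -->
    <p>A paragraph naming <persName>Someone</persName>.</p>

    <!-- as rendered in the browser's DOM -->
    <tei-p>A paragraph naming <tei-persname>Someone</tei-persname>.</tei-p>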
ID: 167 / Session 8B: 3
Demonstration
Keywords: TEI, Translation, crowdsourcing

Spec Translator: Enabling translation of TEI Specifications
Duke University, United States of America

This demonstration will introduce Spec Translator, available from https://translate.tei-c.org/, which enables users to submit pull requests for translations of specification pages from the TEI Guidelines.
ID: 168 / Session 8B: 4
Demonstration
Keywords: TEI, RDF, Online Editors

LEAF-Writer: a TEI + RDF online XML editor
1: Bucknell University, United States of America; 2: University of Guelph, Canada; 3: Newcastle University, UK

LEAF-Writer is an open-source, open-access Extensible Markup Language (XML) editor that runs in a web browser and offers scholars and students a rich textual editing experience without the need to download, install, and configure proprietary software, pay ongoing subscription fees, or learn complex coding languages. This user-friendly editing environment incorporates Text Encoding Initiative (TEI) and Resource Description Framework (RDF) standards, meaning that texts edited in LEAF-Writer are interoperable with other texts produced by the scholarly editing community and with other materials produced for the Semantic Web. LEAF-Writer is particularly valuable for pedagogical purposes, allowing instructors to teach students best practices for encoding texts without also having to teach students how to code in XML directly. LEAF-Writer is designed to help bridge this gap by providing access to all who want to engage in new and important forms of textual production, analysis, and discovery.

LEAF-Writer draws on TEI All as well as other TEI-C-supplied schemas, can use project-specific customized schemas, and offers continuous validation against supported and declared schemas. LEAF-Writer allows users to access and synchronize their documents in GitHub and GitLab, as well as to upload and save documents from their desktop. This presentation will demonstrate the variety of functionality and affordances of LEAF-Writer.