Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session
Session 4A: Short Papers
Time:
Thursday, 15/Sept/2022:
9:30am - 11:00am

Session Chair: Peter Stadler, Paderborn University
Location: ARMB: 2.98

Armstrong Building: Lecture Room 2.98. Capacity: 168

Presentations
ID: 126 / Session 4A: 1
Short Paper
Keywords: digital texts, textual studies, born-digital, electronic literature

TEI and the Re-Encoding of Born-Digital and Multi-Format Texts

E. Forget, A. Galey

University of Toronto, Canada

What affordances can TEI encoding offer scholars who work with born-digital, multi-format, and other kinds of texts produced in today’s publishing environments, where the term “digitization” is almost redundant? How can we use TEI and other digitization tools to analyze materials that are already digital? How do we distinguish between a digital text’s multiple editions or formats and its paratexts, and what differences do born-digital texts make to our understanding of markup? Can TEI help with a situation such as the demise of Flash, where the deprecation of a format has left many works of electronic literature newly vulnerable — and, consequently, newly visible as historical artifacts?

These questions take us beyond descriptive metadata and back to digital markup’s origins in electronic typesetting, but also point us toward recent work on electronic literature, digital ephemera, and the textual artifacts of the very recent past (e.g., as described by Matthew Kirschenbaum, Dennis Tenen, and Richard Hughes Gibson). Drawing from textual studies, publishing studies, book history, disability studies, and game studies, we are experimenting with the re-encoding of born-digital materials, using TEI to encode details of the texts’ form and function as digital media objects. In some cases, we are working from a single digital source, and in others we are working with digital editions of materials that are available in multiple analogue and digital formats. Building on our initial encoding and modelling experiments, this paper explores the affordances of using TEI and modelling for born-digital and multi-format textual objects, particularly emerging digital book formats. We reconsider what the term “data” entails when one’s materials are born-digital, and the implications for digital preservation practice and the emerging field of format theory.
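To make this concrete, here is a purely illustrative sketch (our example, not the authors’ encoding; all element contents are invented) of how a born-digital work’s form and function might be recorded in a TEI source description:

    <sourceDesc>
      <bibl type="born-digital">
        <title>Hypothetical interactive work</title>
        <date when="2004"/>
        <!-- form: the original runtime format, now deprecated -->
        <note type="format">Adobe Flash (SWF); the format's deprecation in
          2020 leaves the work unplayable in current browsers</note>
        <!-- function: how the work behaved as a digital media object -->
        <note type="function">Navigation via clickable hotspots; no fixed
          linear reading order</note>
      </bibl>
    </sourceDesc>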

Forget-TEI and the Re-Encoding of Born-Digital and Multi-Format Texts-126.docx


ID: 107 / Session 4A: 2
Short Paper
Keywords: online forum, thread structure, social media, computer-mediated communication

Capturing the Thread Structure: A Modification of CMC-Core to Account for Characteristics of Online Forums

S. Reimann, L. Rodenhausen, F. Elwert, T. Scheffler

Ruhr-University Bochum, Germany

The representation of computer-mediated communication (CMC), such as discussions in online forums, according to the guidelines of the Text Encoding Initiative has been addressed by the CMC Special Interest Group (SIG). Its latest schema, CMC-core, offers a basic way of representing a wide range of CMC types in TEI P5. However, the schema aims for generality and is not specifically tailored to capturing the thread structure of online forums.

In particular, CMC-core is organized centrally by the timestamp of posts (a timeline structure), whereas online forums often split into threads and subthreads, giving less importance to the time of posting. In addition, forums may contain quotes both from external sources and from other forum posts, which need to be differentiated in an adapted <quote> element. Not only do online forums as a whole differ from other forms of CMC; there are often also considerable differences between individual forums. We created a corpus of posts from various religious online forums, including different communities on Reddit as well as two German forums that focus specifically on the topic of religion, in order to analyze their structure and textual content. These forums differ in how threads are structured, how emoticons and emojis are used, and how users can react to other posts (for example, by voting).

This raises the need for a schema that, on the one hand, takes the features of online forums as a genre into account and, on the other hand, is flexible enough to represent a wide range of different forums. We present modifications of the elements in CMC-core that guarantee a standardized representation of three substantially different online forums while retaining all of their potentially interesting microstructural characteristics.
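As a minimal sketch of what such a thread-aware adaptation might look like (hypothetical: the nesting, attributes, and values below are illustrative, not the SIG’s schema or the authors’ actual modification):

    <div type="thread">
      <post xml:id="p1" who="#userA" when="2022-03-01T09:15:00">
        <p>Opening post of the thread.</p>
      </post>
      <div type="subthread">
        <post xml:id="p2" who="#userB" when="2022-03-01T10:02:00">
          <!-- a quote of another forum post, distinguished from a quote of
               an external source by @type and linked to its origin via
               @corresp -->
          <quote type="post" corresp="#p1">Opening post of the thread.</quote>
          <p>Reply quoting the parent post.</p>
          <!-- a reaction such as a vote, kept as microstructural detail -->
          <note type="reaction" subtype="upvote">12</note>
        </post>
      </div>
    </div>

Nesting subthread divisions preserves the reply hierarchy that a purely timeline-based ordering would flatten.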

Reimann-Capturing the Thread Structure-107.docx


ID: 111 / Session 4A: 3
Short Paper
Keywords: digital publications, VRE, open access, scholarly communication, web publication

Publishing the grammateus research output with the TEI: how our scholarly texts become data

E. Nury

University of Geneva, Switzerland

The TEI is not used exclusively to encode primary sources: TEI-based scholarly publishing represents a non-negligible portion of TEI-encoded texts (Baillot and Giovacchini 2019). I present here how the encoding of secondary sources, such as scholarly texts, can benefit researchers, using the example of the grammateus project.

In the grammateus project, we are creating a Virtual Research Environment (VRE) to present a new way of classifying Greek documentary papyri. This environment comprises a database of papyri marked up with the standard EpiDoc subset of the TEI. It also includes the project’s textual research output, such as introductory materials, detailed descriptions of papyri by type, and an explanation of the classification methodology. The textual research output was deliberately prepared as an online publication so as to take full advantage of the interactivity with data that a web application offers, in contrast to a printed book. We are thus experimenting with a new model of scholarly writing and publishing.

In this short paper I will describe how we have used the TEI not only to model papyrological data but also to encode the scholarly texts produced in the context of the project, which would traditionally have become material for a monograph or academic articles. I will also demonstrate how this has later enabled us to enrich our texts with markup for features that emerged as relevant. We implemented a spiraling encoding process in which methodological documentation and analytical descriptions keep feeding back into the editorial encoding of the scholarly texts. Documentation and analytical text therefore become data, within a research process based on a feedback method.
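For illustration only (our sketch, not the project’s actual markup; the file name and identifiers are invented), a passage of the methodological text might be enriched so that it points into the papyri database and carries the analytic features that emerged during encoding:

    <div type="methodology">
      <p>Documents of this type typically open with the name of the writer,
         a pattern we tag as an <term ref="#opening-formula">opening
         formula</term> and link to
         <ref target="papyri/example-papyrus.xml">a record in the papyri
         database</ref>, so that scholarly text and data remain mutually
         navigable.</p>
    </div>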

Nury-Publishing the grammateus research output with the TEI-111.docx


ID: 153 / Session 4A: 4
Short Paper
Keywords: HTR, Transkribus, Citizen Science

Handwritten Text Recognition for heterogeneous collections? The Use Case Gruß & Kuss

S. Büdenbender (1), M. Seltmann (2), J. Baum (1)

(1) University of Applied Sciences Darmstadt (h_da), Germany; (2) University and State Library Darmstadt, Germany

Gruß & Kuss – Briefe digital. Bürger*innen erhalten Liebesbriefe – a research project funded by BMBF for 36 months – aims to digitize and explore love letters from ordinary persons with the help of dedicated volunteers, also raising the question of how citizens can actively participate in the indexing and encoding of textual sources.

At present, transcriptions are made manually in Transkribus (lite), tackling a corpus of more than 22,000 letters from 52 countries and 345 donors, divided into approximately 750 bundles (i.e., correspondences, usually between two writers). The oldest letter dates from 1715, the most recent from 2021; the project uses a very broad concept of the letter, including, for instance, notes left on pillows or WhatsApp messages.

The paper investigates the applicability of Handwritten Text Recognition (HTR) to this highly heterogeneous stock in a citizen-science context. In an explorative approach, we investigate at which bundle size, that is, at which number of pages in the same handwriting, HTR becomes worthwhile.

For this purpose, the effort of manual transcription is first compared to the effort of creating a model in Transkribus (in particular, building a training and validation set by double keying), including final corrections. In a second step, we explore whether a modified procedure can be used to process even smaller bundles. Based on the given metadata (time of origin, gender, script ...), a first clustering can be created, and existing models can serve as a basis for graphemically similar hands, allowing training sets to be kept much smaller while maintaining acceptable error rates. Another possibility is to start with mixed training sets covering a class of related scripts.
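One way to make “worthwhile” precise (our formulation, not the authors’): let C_train be the fixed effort of model creation (including the double-keyed training and validation sets), c_man the per-page effort of manual transcription, and c_corr the per-page effort of correcting HTR output. Training a model pays off for a bundle of n pages roughly when

    n \cdot c_{\text{man}} > C_{\text{train}} + n \cdot c_{\text{corr}}
    \quad\Longleftrightarrow\quad
    n > \frac{C_{\text{train}}}{c_{\text{man}} - c_{\text{corr}}}

assuming c_man > c_corr. Reusing existing models for graphemically similar hands lowers C_train and thus the break-even bundle size.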

Furthermore, we discuss how manual transcription by citizen scientists can be quantified in relation to the project’s overall resources.

Büdenbender-Handwritten Text Recognition for heterogeneous collections The Use Case Gruß & Kuss-153.docx