Conference Agenda

Session Overview
Location: ARMB: 3.38
Armstrong Building: Teaching Room 3.38. Capacity: 59
Date: Monday, 12/Sept/2022
9:30am - 6:00pm Workshop 1: From a collection of documents to a published edition: how to use an end-to-end publication pipeline [Full Day]
Location: ARMB: 3.38
 
ID: 125 / WS 1: 1
Workshop
Keywords: digital edition, historical manuscripts, encoding pipeline, publication workflow

From a collection of documents to a published edition: how to use an end-to-end publication pipeline

F. Chiffoleau (1, 2), H. Scheithauer (1)

(1) Inria, France; (2) Le Mans Université, France

In 2021, at the previous edition of the TEI Conference, “Next Gen TEI”, I took part in a session where I presented a project I had been working on for a year and a half. This project, which relies heavily on the Text Encoding Initiative and is meant to benefit its community, focuses on building a pipeline for the publication of digital scholarly editions. The pipeline, still a work in progress at the time of the 2021 conference but now complete, aims to provide free, open-source, easy-to-use and interoperable tools; its goal is to support the editorial process from the digitization of a collection of documents to its publication in a machine-readable standard.

In what follows, I briefly describe the six steps that make up this pipeline, and then explain how I intend to run the workshop around them.

First, the collection of images that makes up the corpus has to be stored and curated online so that it remains available to researchers. For this task, we rely on IIIF (International Image Interoperability Framework) to ensure sustainability and interoperability.
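To illustrate what relying on IIIF means in practice: each page image is addressed through a URL following the IIIF Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, so any IIIF-aware viewer or tool can request the full page, a crop or a thumbnail. The Python sketch below builds such URLs following version 3.0 of the Image API (where the full-size value is "max"); the base URL and identifiers are hypothetical, and this is not one of the pipeline's own scripts.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build a IIIF Image API 3.0 request URL for one image."""
    # The identifier must be percent-encoded, including any "/" it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image's info.json, which describes its sizes and tiles."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://example.org/iiif"  # hypothetical image server
    print(iiif_image_url(base, "ms1/f001.tif"))                     # whole page
    print(iiif_image_url(base, "ms1/f001.tif", size="!300,300"))    # thumbnail
    print(iiif_info_url(base, "ms1/f001.tif"))
```

Because the URL pattern is standardized, the same image can be reused by an OCR platform, a viewer and a publication application without copying files between them.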

The next three steps (segmentation, transcription and post-OCR correction) are carried out in eScriptorium, an open-source automatic transcription application. It offers image upload, manual and automatic segmentation and transcription, model import, ground truth production and model training. Finally, if errors remain in the transcription (in the case of automatic transcription), they can either be corrected manually in eScriptorium, or the files can be exported and corrected with the help of purpose-built scripts.
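The correction scripts mentioned above are not reproduced here. As a rough illustration of the rule-based approach such scripts can take, here is a minimal Python sketch that applies a list of regular-expression substitutions to each line of an exported plain-text transcription. The rules are hypothetical examples (collapsing repeated spaces, and reading a digit 0 or 1 between two letters as "o" or "l"); a real project would derive its rules from the errors it actually observes.

```python
import re

# Hypothetical correction rules as (pattern, replacement) pairs; a real project
# would derive its own rules from the errors it observes in its transcriptions.
# [^\W\d_] matches any Unicode letter, so accented letters count as letters too.
RULES = [
    (re.compile(r"[ \t]{2,}"), " "),                     # collapse runs of spaces/tabs
    (re.compile(r"(?<=[^\W\d_])0(?=[^\W\d_])"), "o"),    # "w0rd" -> "word"
    (re.compile(r"(?<=[^\W\d_])1(?=[^\W\d_])"), "l"),    # "c1ear" -> "clear"
]


def correct_line(line: str, rules=RULES) -> str:
    """Apply every rule in order to one transcribed line, then trim its ends."""
    for pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line.strip()


def correct_text(text: str, rules=RULES) -> str:
    """Correct a plain-text transcription line by line, keeping its line breaks."""
    return "\n".join(correct_line(line, rules) for line in text.splitlines())


if __name__ == "__main__":
    print(correct_text("the  w0rd  is c1ear\nsec0nd line "))
```

Working line by line keeps the line structure produced by segmentation intact, which matters for the later TEI encoding step; numbers such as "2022" are left alone because the rules only fire between two letters.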

Once the transcription is final, we encode it in TEI XML. For this step, we provide several solutions depending on the format of the transcription file (PAGE XML, ALTO XML or plain text). We also provide a series of scripts and documentation that help automate and speed up this process.
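As an illustration of the ALTO case, the following minimal Python sketch (standard library only) turns each ALTO TextBlock into a TEI <p> and each TextLine into a line of text introduced by <lb/>. It assumes the ALTO v4 namespace and ignores coordinates, facsimile links and the teiHeader, which a fuller conversion would need to handle; the function name is hypothetical and this is not the pipeline's own conversion script.

```python
import xml.etree.ElementTree as ET

# Assumes ALTO version 4; change the URI for files using another ALTO version.
ALTO = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}
TEI = "http://www.tei-c.org/ns/1.0"


def alto_to_tei_body(alto_xml: str) -> ET.Element:
    """Map ALTO TextBlocks to TEI <p> and TextLines to text lines opened by <lb/>."""
    root = ET.fromstring(alto_xml)
    body = ET.Element(f"{{{TEI}}}body")
    for block in root.iterfind(".//a:TextBlock", ALTO):
        p = ET.SubElement(body, f"{{{TEI}}}p")
        for line in block.iterfind("a:TextLine", ALTO):
            # Each ALTO <String> carries one word in its CONTENT attribute.
            words = [s.get("CONTENT", "") for s in line.iterfind("a:String", ALTO)]
            lb = ET.SubElement(p, f"{{{TEI}}}lb")
            lb.tail = " ".join(words)
    return body


if __name__ == "__main__":
    sample = (
        '<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
        '<PrintSpace><TextBlock><TextLine><String CONTENT="Hello"/><SP/>'
        '<String CONTENT="world"/></TextLine><TextLine><String CONTENT="again"/>'
        '</TextLine></TextBlock></PrintSpace></Page></Layout></alto>'
    )
    ET.register_namespace("", TEI)  # serialize TEI as the default namespace
    print(ET.tostring(alto_to_tei_body(sample), encoding="unicode"))
```

Using <lb/> rather than one element per line keeps the physical line breaks of the manuscript without forcing the text into a line-based hierarchy.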

The edition itself is published for online consultation with TEI Publisher, an application for generating custom publications of corpora encoded in TEI XML. On this basis, we have developed and launched a dedicated application for digital scholarly editions, DiScholEd. It is available online together with thorough documentation, and is designed as an open application: new corpora can always be added to it, and we welcome new collaborations.

The goal of our workshop is to demonstrate how an existing corpus can be processed for publication in the DiScholEd application. Participants will experiment with a ready-to-use solution that allows quick and easy online publication of a corpus, and will pick up tips and shortcuts for speeding up the creation of a digital edition. By the end of the session, each participant will have a visualization of their own corpus, with the transformed text displayed side by side with the original image, showing what can be achieved with TEI in the context of an end-to-end publication pipeline.

The program for this workshop is as follows. It will open with a presentation of the development of the pipeline, its objectives and how it works. The remaining time will then be divided into slots corresponding to the steps of the pipeline. Each slot will begin with a short presentation of what is expected of participants and which tools they will need. Participants will then have time to process their data according to the requirements of that step, since each step takes a certain amount of time. At the end of the day, a 30-minute feedback session will allow each participant, as well as the workshop organizers, to assess the benefits of the session and consider further collaborations.

Given the number of steps in this pipeline and the time each of them requires, the workshop needs a full day. Participation should be limited to 10-15 people so that the two workshop conveners can provide the necessary technical support during the hands-on parts of the workshop.

To work through the pipeline, participants will need a laptop with a command-line interface for running the scripts and an XML editor (Oxygen is the best choice). Ideally, they should also create accounts on Huma-Num and eScriptorium beforehand.

GitHub repository of the pipeline:

https://github.com/DiScholEd/pipeline-digital-scholarly-editions

 

Date: Tuesday, 13/Sept/2022
9:30am - 1:00pm Workshop 4: Building TEI-powered websites with static site technology. A hands-on exploration of the publishing toolkit of the Scholarly Editing Journal [Half Day, Morning]
Location: ARMB: 3.38
 
ID: 134 / WS 4: 1
Workshop
Keywords: Digital publishing, TEI processing, static sites, programming

Building TEI-powered websites with static site technology. A hands-on exploration of the publishing toolkit of the Scholarly Editing Journal

R. Viglianti

University of Maryland, United States of America

This half-day workshop (approximately 3 hours) will introduce TEI publishing with static site generators and front-end technologies, namely React JS and the static site generator Gatsby. It will present the publishing strategies and toolsets developed for the reboot of the online Scholarly Editing journal (https://scholarlyediting.org/), which publishes small-scale TEI-based editions alongside essay-like content. The workshop is aimed at attendees who already have some experience with programming (including XSLT) and the command line; however, all are welcome and will be supported as much as possible throughout.

The publishing tools presented in this workshop were developed for the reboot of the Scholarly Editing journal, which published its newest issue, volume 39, in April 2022. The previous site, built with Apache Cocoon, was converted into a static site and made available as an archive (https://scholarlyediting.org/se.index.issues.html). The new website and journal issues are built with Gatsby, a static site generator that relies on React JS for building user interfaces. The journal’s editors chose a static site generator because, once built, static sites do not need maintenance and can easily be moved and archived. Less infrastructure is needed to publish the site and keep it online, which both keeps the journal’s operating costs low and helps ensure its longevity.

XML technologies can also be used to generate static sites, and are used this way: the TEI Guidelines are a notable example. However the static site is built, the result has minimal infrastructure requirements. A server is always needed to publish something on the web, but its role is limited to sending files to the client, essentially supporting only HTTP GET requests. This is cheap, and it makes it possible to rely on affordable web hosting, free services, or even a home server.
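To make the point about minimal infrastructure concrete: serving a built static site needs nothing more than a handler that answers GET (and HEAD) requests for files. The Python sketch below uses only the standard library's http.server to serve a build directory, assumed here to be Gatsby's default public/ output folder. It is meant for local preview only, since the Python documentation warns that http.server is not recommended for production; the function name is hypothetical.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(site_dir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve the files of a built static site from site_dir on localhost.

    SimpleHTTPRequestHandler only implements GET and HEAD; other methods get 501.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=site_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Assumes Gatsby's default build output folder, "public".
    server = make_static_server("public")
    print("Serving http://127.0.0.1:8000 (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Running it from the project root after `gatsby build` and opening http://127.0.0.1:8000 shows the site; a request with any other method, such as POST, receives a 501 response, because nothing beyond file delivery is needed.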

During the workshop, participants will create a Gatsby website starting from a provided template that includes the TEI rendering tools gatsby-transformer-ceteicean and gatsby-theme-ceteicean. These tools re-implement principles pioneered by CETEIcean, which relies on the browser’s DOM processing and HTML5 Custom Elements to publish TEI documents as a component pluggable into any HTML structure (Cayless and Viglianti 2018). Example TEI documents to integrate into the website will be provided, but attendees are encouraged to bring their own.
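CETEIcean itself does this work in JavaScript, in the browser. Purely to illustrate the principle, the following Python sketch renames each TEI element to a lower-case custom element with a tei- prefix (so <persName> becomes <tei-persname>; the hyphen is what makes it a valid custom element name in HTML), keeps the original name in a data-origname attribute, turns xml:id into an HTML id, and copies other attributes without their namespace. The attribute naming is an assumption modeled on CETEIcean's approach, the function names are hypothetical, and the sketch leaves out the behaviors and element registration that CETEIcean also provides.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(name: str) -> str:
    """Drop a "{namespace}" prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def tei_to_custom_elements(tei: ET.Element) -> ET.Element:
    """Recursively copy a TEI tree as tei-* custom elements (CETEIcean-style sketch)."""
    name = local_name(tei.tag)
    # Custom element names must be lower-case and contain a hyphen.
    out = ET.Element(f"tei-{name.lower()}")
    out.set("data-origname", name)  # keep the original TEI element name
    for attr, value in tei.attrib.items():
        # xml:id becomes an HTML id; other attributes lose their namespace.
        out.set("id" if attr == XML_ID else local_name(attr), value)
    out.text, out.tail = tei.text, tei.tail
    for child in tei:
        out.append(tei_to_custom_elements(child))
    return out


if __name__ == "__main__":
    src = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
           '<p xml:id="p1">Letter to <persName ref="#m">Mary</persName>.</p>'
           '</body></text></TEI>')
    html = tei_to_custom_elements(ET.fromstring(src))
    print(ET.tostring(html.find(".//tei-p"), encoding="unicode"))
```

Because the TEI structure survives intact as custom elements, styling and behaviors can target TEI names directly (for example a tei-persname selector in CSS) instead of a lossy HTML mapping.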

After an introduction to static sites, the motivations for using them, and an open discussion, the workshop will introduce:

  • How to set up Gatsby and the CETEIcean plugins
  • How to use built-in behaviors
  • Customization via CSS (and CSS-in-JS)
  • Defining custom behaviors as React components
  • Applying transformations to TEI documents before and after ingestion into Gatsby
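As one example of a transformation applied before ingestion, a project might want every paragraph to carry a stable identifier that React components or links can target. The following Python sketch, with a hypothetical function name, adds sequential xml:id values to TEI elements that lack one, skipping any identifier already in use; in practice the same step could equally be written in XSLT.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def add_missing_ids(tei_xml: str, tag: str = "p", prefix: str = "p") -> str:
    """Give each TEI <tag> lacking an xml:id the next free id: p1, p2, ..."""
    ET.register_namespace("", TEI)  # write TEI back out as the default namespace
    root = ET.fromstring(tei_xml)
    taken = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    n = 0
    for el in root.iter(f"{{{TEI}}}{tag}"):
        if el.get(XML_ID) is None:
            n += 1
            while f"{prefix}{n}" in taken:  # never reuse an existing id
                n += 1
            el.set(XML_ID, f"{prefix}{n}")
            taken.add(f"{prefix}{n}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
           '<p>a</p><p xml:id="p2">b</p><p>c</p></body></text></TEI>')
    print(add_missing_ids(doc))
```

In this example the first and third paragraphs receive p1 and p3, since p2 is already taken. Note that ElementTree drops XML comments by default, so a production step might prefer XSLT or a parser configured to keep them.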

If time allows, we will conclude with open discussion and collaborative experimentation.

Participants must bring their own laptop and be able to install (free) software on it. Internet access will be required. The tutor will require a projector.

References:

Cayless, Hugh, and Raffaele Viglianti. “CETEIcean: TEI in the Browser.” Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Cayless01.

Biography:

Dr. Raffaele (Raff) Viglianti is a Senior Research Software Developer at the Maryland Institute for Technology in the Humanities, University of Maryland. His research is grounded in digital humanities and textual scholarship, where “text” includes musical notation. He researches new and efficient practices to model and publish textual sources as innovative and sustainable digital scholarly resources. Dr. Viglianti is currently an elected member of the Text Encoding Initiative technical council and the Technical Editor of the Scholarly Editing journal.

 
2:30pm - 4:00pm SIG 1: Manuscripts
Location: ARMB: 3.38
4:30pm - 6:00pm SIG 4: Correspondence
Location: ARMB: 3.38