TEI Conference and Members' Meeting 2022
September 12 - 16, 2022 | Newcastle, UK
Conference Agenda
Session Overview |
Date: Monday, 12/Sept/2022 | |
9:00am - 9:30am | Registration - Monday |
9:30am - 6:00pm | Workshop 1: From a collection of documents to a published edition : how to use an end-to-end publication pipeline [Full Day] Location: ARMB: 3.38 |
ID: 125
/ WS 1: 1
Workshop Keywords: digital edition, historical manuscripts, encoding pipeline, publication workflow
From a collection of documents to a published edition : how to use an end-to-end publication pipeline
1Inria, France; 2Le Mans Université, France
In 2021, during the last edition of the TEI Conference, “Next Gen TEI”, I took part in a session where I presented a project I had been working on for a year and a half. This project, which both relies heavily on the Text Encoding Initiative and benefits its community, focuses on the creation of a pipeline for the publication of digital scholarly editions. Our pipeline, which was still a work in progress at the time of the 2021 Conference but is now complete, aims at providing open-source, free, easy-to-use and interoperable tools; its goal is to support the editorial process from the digitization of a collection of documents to its publication in a machine-readable standard. In the following, I will succinctly describe the six steps that compose this pipeline, and then move to the way I intend to conduct the workshop based on them.
Firstly, the collection of images that composes the corpus has to be preserved and curated somewhere online to keep it available for the researcher. For this task, we rely on IIIF, to ensure sustainability and interoperability. The three following steps (segmentation, transcription, post-OCR correction) are conducted with eScriptorium, an open-source automatic transcription application. It offers various options: image upload, manual and automatic segmentation and transcription, import of models, production of ground truth, model training. Finally, if errors remain in the transcription (in the case of automatic transcription), it is possible either to correct them manually in eScriptorium or to export the files and correct them with the help of specifically designed scripts. Once the transcription is fully ready, we encode it in TEI XML. For this step, we provide various solutions, depending on the transcription file format (PAGE XML, ALTO XML, plain text). We also propose a series of scripts and documentation that help automate and speed up this process. The publication itself is made available for online consultation with the help of TEI Publisher, an application created to generate custom publications for corpora encoded in TEI XML. We have developed and launched a dedicated application for digital scholarly editions (DiScholEd) on this basis. It is available online together with thorough documentation, and is conceived as an open application: new corpora can always be added to it, and we welcome new collaborations.
The goal of our workshop is to demonstrate how an available corpus could be processed for publication on the DiScholEd application. The workshop participants will learn to experiment with a ready-to-use solution that provides easy and quick online publication of a corpus. They will also get tips and shortcuts to help speed up the creation of a digital edition. Moreover, by the end of the session, participants will have a visualization of their respective corpus, with the transformed text and the original image side by side, showing what can be achieved when working with TEI in the context of an end-to-end publication pipeline.
The program for this workshop is as follows. It will start with a presentation of the development of the pipeline, its objectives and how it works. Then, the remaining time will be divided into several slots corresponding to the work steps of the pipeline. Each slot will start with a quick presentation of what is expected of the participants and what tools they will need to use. Next, participants will be allotted some time to process their data according to the requirements of that work step, as each step requires a certain amount of time. At the end of the day, a 30-minute feedback session will make it possible for each participant as well as for the workshop organizers to assess the benefits of the session and envision further possible collaborations.
Considering the number of steps in this pipeline and the time required for each of these steps, a full day is necessary for this workshop. The number of participants should be 10-15 maximum, so that the two workshop conveners can provide the necessary technical support in the hands-on parts of the workshop. To work on the pipeline, participants will need a laptop as well as the following tools: a command line interface for the execution of the scripts and an XML editor (Oxygen is the best choice). It is also preferable that they obtain an account at Huma-Num and eScriptorium beforehand.
GitHub repository of the pipeline: https://github.com/DiScholEd/pipeline-digital-scholarly-editions |
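The encoding step described in the abstract (turning a corrected plain-text transcription into TEI XML) can be illustrated with a minimal sketch. This is not one of the pipeline's own scripts: the function name `text_to_tei`, the placeholder header statements and the sample letter are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def text_to_tei(title, transcription):
    """Wrap a plain-text transcription in a minimal TEI skeleton (hypothetical helper).

    Blank lines separate paragraphs; each line inside a paragraph is
    preceded by an <lb/> so that the original lineation is preserved.
    """
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    # Placeholder statements: a real edition would describe publisher and source.
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished working draft."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Transcribed from page images."

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for block in re.split(r"\n\s*\n", transcription.strip()):
        if not block.strip():
            continue  # nothing to encode (e.g. an empty transcription)
        paragraph = ET.SubElement(body, tei("p"))
        for line in block.splitlines():
            line_break = ET.SubElement(paragraph, tei("lb"))
            line_break.tail = line.strip()
    return root


# Invented sample transcription, two paragraphs.
sample = "Dear Sir,\nI received your letter\n\nYours faithfully"
doc = text_to_tei("Sample letter", sample)
print(ET.tostring(doc, encoding="unicode"))
```

Transcriptions exported as PAGE XML or ALTO XML would need a parsing step in front of this, since line text and coordinates are stored in format-specific elements there.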
11:00am - 11:30am | Monday Morning Refreshment Break Location: ARMB: King's Hall |
1:00pm - 2:30pm | Monday Lunch Break Location: ARMB: King's Hall |
2:30pm - 6:00pm | Workshop 2: Creating Digital Editions with FairCopy [Half Day, Afternoon] Location: ARMB: 1.04 |
ID: 112
/ WS 2: 1
Workshop Keywords: Digital Humanities, Critical Editions, Tools, IIIF
Creating Digital Editions with FairCopy
Performant Software Solutions LLC, United States of America
In this half-day workshop, participants will learn how to use FairCopy to transform historical texts into online digital editions. Using crowdsourced transcriptions as a starting point, we will add semantic structure and mark names of people, places, and events. We will then publish our digital editions using Hugo.
The TEI Guidelines have been used by hundreds of scholarly projects and are an essential tool for researching, preserving, and disseminating cultural heritage world-wide. And yet, despite its mission to provide a common vocabulary for describing texts, TEI faces problems of adoption and use in the wider scholarly community. While the basics of TEI XML encoding are simple enough, true fluency in TEI requires institutional support and commitment in the form of training, technical staff, IT infrastructure, and the time and commitment of the individual scholar. Even within institutions that have these resources, projects often adopt a simpler interface for domain experts to interact with. This interface then translates the scholar’s work into TEI behind the scenes. This is sometimes accomplished technologically, sometimes through a tiered system of labor, and sometimes both. These interfaces are more often than not specialized to the needs of the projects which develop them. This state of affairs leads to a structural problem of access, which further limits whose texts can be digitized and preserved.
FairCopy addresses this problem of access by providing a simple editing environment in which anyone can produce valid TEI documents. FairCopy doesn’t hide the complexity of TEI, but rather makes it available for users to explore at their own pace. Users quickly become comfortable with its interface and are able to focus on the text, not XML syntax. FairCopy has support for most of the 500+ elements in TEI and allows users to customize a schema for their particular project. Scholars can seamlessly import and export TEI XML documents. Additionally, scholars can bring in IIIF images of primary resources and link them to their transcriptions.
In this half-day workshop, participants will learn how to use FairCopy to transform historical texts into online digital editions encoded using TEI. Using crowdsourced transcriptions as a starting point, we will add semantic structure and mark names of people, places, and events. We will then publish our digital editions using Hugo.
In the first part of the workshop, we will begin with a demonstration of FairCopy. We will then select texts to work on based on participants' interests. Participants are encouraged to bring their own texts. Finally, we will break into small groups. In the second part, each group will work on encoding a text using FairCopy. Participants will work collaboratively to choose elements and attributes that best suit their selected texts. The presenter will float between groups answering questions. In the third part, we will export our texts into a pre-made Hugo template that can display both the original IIIF page images and the TEI-encoded texts.
Participants in this workshop will need to bring a Mac, Windows, or Linux laptop on which they can install FairCopy for free. No web design or XML skills are required.
Participants in this workshop will learn how to use FairCopy to create a digital edition. They will also learn about using TEI semantics to structure and mark texts. They will also gain familiarity with using IIIF Manifests to interoperate between library collections and digital editions.
Presenter Bio: Nick Laiacona is a partner at Performant Software Solutions LLC. Performant serves clients in the Digital Humanities throughout North America and Europe. Laiacona has developed tools for critical digital editions including Juxta, Digital Mappa, TextLab, and now FairCopy. Laiacona has helped produce a number of critical editions, including “Secrets of Craft and Nature in Renaissance France” and the “Melville Electronic Library.” |
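To make the role of IIIF Manifests in linking page images to transcriptions more concrete, here is a small sketch that lists the page images in a manifest. It does not reflect how FairCopy works internally. It assumes a manifest that follows the IIIF Presentation API 2.x layout (sequences, canvases, images); the function name `canvas_images` and the `example.org` URLs are invented for illustration.

```python
def canvas_images(manifest):
    """Return (canvas label, image URL) pairs from a IIIF Presentation 2.x manifest.

    Hypothetical helper: it assumes the 2.x structure, in which each canvas
    holds "images" annotations whose "resource" carries the image "@id".
    """
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pairs.append((canvas.get("label"), image["resource"]["@id"]))
    return pairs


# Invented two-page manifest, trimmed to the fields used above.
MANIFEST = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/letter/manifest",
    "@type": "sc:Manifest",
    "label": "Sample letter",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [
            {
                "@id": "https://example.org/iiif/letter/canvas/p1",
                "@type": "sc:Canvas",
                "label": "f. 1r",
                "images": [{
                    "@type": "oa:Annotation",
                    "motivation": "sc:painting",
                    "resource": {
                        "@id": "https://example.org/iiif/letter/p1/full/full/0/default.jpg",
                        "@type": "dctypes:Image",
                    },
                }],
            },
            {
                "@id": "https://example.org/iiif/letter/canvas/p2",
                "@type": "sc:Canvas",
                "label": "f. 1v",
                "images": [{
                    "@type": "oa:Annotation",
                    "motivation": "sc:painting",
                    "resource": {
                        "@id": "https://example.org/iiif/letter/p2/full/full/0/default.jpg",
                        "@type": "dctypes:Image",
                    },
                }],
            },
        ],
    }],
}

pages = canvas_images(MANIFEST)
for label, url in pages:
    print(label, url)
```

Each pair could then be recorded alongside the transcription of the corresponding page, which is the kind of image-to-text link the workshop's Hugo template is described as displaying.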
2:30pm - 6:00pm | Workshop 3: A short introduction to Schematron [Half Day, Afternoon] Location: ARMB: 1.06 |
ID: 130
/ WS 3: 1
Workshop Keywords: Schematron, Validation, Quality Assurance
A short introduction to Schematron
State and University Library Hamburg, Germany
Schematron is a rule-based validation language for structured documents. It was designed by Rick Jelliffe in 1999 (Jelliffe 1999) and standardized as ISO/IEC 19757-3 in 2006 (ISO 2006). The key concepts of Schematron validation are patterns that are the focus of a validation, rules that select the portions of a document contributing to the pattern, and assertion tests that are run in the context of a rule. Schematron uses XPath both as the language to select the portions of a document and as the language of the assertion tests. This use of XPath gives Schematron the flexibility to validate arbitrary relationships and dependencies of information items in a document. What also sets Schematron apart from other languages is that it encourages the use of natural language descriptions targeted at human readers. This way, validation can be more than just a binary distinction (document valid/invalid): it can also support authors of in-progress documents with quick feedback on erroneous or unwanted document structure and content.
The flexibility and (relative) simplicity of Schematron make it an invaluable tool for XML-based text encoding projects. The range of supported tasks reaches from "hard" validation that enforces constraints on documents, through "soft" validation that reports potential problems such as characters from Unicode Private Use Areas, to interactive error correction with Schematron extensions like Schematron QuickFix (Kutscherauer and Nadolu 2018).
This half-day workshop will introduce the participants to the principal idea of Schematron and practice its application to XML-based text encoding projects. Together we will explore patterns, rules, and assertions as the basic Schematron concepts and touch on phases, variables, and abstract patterns as more advanced features of Schematron validation.
The workshop requires participants to have a general understanding of XML document editing and basic knowledge of XPath. The material requirements are a projector and laptops to follow through with the examples given in the workshop. Any operating system with a recent Java Runtime is sufficient. Participants are encouraged to bring their own device. |
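The rule and assertion model described in the abstract can be imitated in a few lines of Python. This is only an illustration of the idea, not Schematron itself: it relies on the limited XPath subset of the standard library's `xml.etree.ElementTree`, and the two project rules, the severity labels and the sample document are assumptions invented for this sketch. Each rule pairs a context path with a test that must hold for every selected element, plus a natural language message for the author.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def has_private_use(text):
    """True if the text contains a character from a Unicode Private Use Area."""
    return any(
        0xE000 <= ord(ch) <= 0xF8FF          # BMP Private Use Area
        or 0xF0000 <= ord(ch) <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= ord(ch) <= 0x10FFFD   # Supplementary Private Use Area-B
        for ch in text
    )


# Hypothetical project rules: (context path, assertion test, severity, message).
RULES = [
    (".//tei:persName",
     lambda el: el.get("ref") is not None,
     "error",
     "persName has no @ref pointing to an authority record."),
    (".//tei:p",
     lambda el: not has_private_use("".join(el.itertext())),
     "warning",
     "Paragraph contains Private Use Area characters."),
]


def validate(root):
    """Run every assertion in the context of its rule and collect failures."""
    messages = []
    for context, test, severity, message in RULES:
        for element in root.findall(context, TEI):
            if not test(element):
                messages.append((severity, message))
    return messages


# Invented sample: one persName lacks @ref, one paragraph holds U+E000.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>Letter from <persName ref="#jd">Jane Doe</persName> to <persName>John</persName>.</p>
<p>Unmapped glyph: \ue000</p>
</body></text></TEI>"""

results = validate(ET.fromstring(SAMPLE))
for severity, message in results:
    print(f"{severity}: {message}")
```

The "error" rule plays the part of hard validation and the "warning" rule that of soft validation. In actual Schematron the contexts and tests would be written as XPath expressions inside `rule` and `assert` (or `report`) elements rather than as Python callables.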
4:00pm - 4:30pm | Monday Afternoon Refreshment Break Location: ARMB: King's Hall |