Conference Agenda

Session 1A: Short-Papers
Time: Wednesday, 14/Sept/2022, 9:30am - 11:00am

Session Chair: Martin Holmes, University of Victoria
Location: ARMB: 2.98

Armstrong Building: Lecture Room 2.98. Capacity: 168

Presentations
ID: 140 / Session 1A: 1
Short Paper
Keywords: text mining, stand-off annotations, models of text, generic services

Standoff-Tools. Generic services for building automatic annotation pipelines around existing tools for plain text analysis

C. Lück

Universität Münster, Germany

TEI XML excels at encoding text. But when it comes to machine-based analysis of a corpus as data, XML is not a good platform. NLP, NER, topic modelling, text-reuse detection etc. work on plain text; they become very complicated and slow if they have to traverse a tree structure. While extracting plain text from XML is simple, feeding the result back into XML is tricky. However, having the analysis in XML is desirable: its result can be related to the internal markup, e.g. for overviews of names per chapter, ellipses per verse etc. In my short paper I will introduce standoff-tools, a suite of generic tools for building (automatic) annotation pipelines around plain text tools.

standoff-tools implements the extractor *E* and the internalizer *I*. *E* produces a special flavour of plain text, which I term *equidistant plain text*: the XML tags are replaced by special characters, e.g. zero-width non-joiner U+200C, so that all non-special characters have the same character offset as in the XML source. This equidistant plain text can then be fed to an arbitrary tagger *T* designed for plain text. The only requirement on *T* is that it produces positioning information. *I* inserts tags based on positioning information into XML. For this purpose, it splits the annotated spans of text, so that the result is syntactically valid XML without overlapping edges. It aggregates the splits back together with `@next` and `@from`.
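The idea of equidistant plain text can be illustrated with a minimal Python sketch. It is not the standoff-tools implementation: the function name `extract_equidistant` and the simplified regular expression are assumptions made for illustration, and comments, CDATA sections, processing instructions and character entities are ignored.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, the filler character named in the abstract

# Simplified tag pattern (assumption): no '>' inside attribute values,
# no comments, CDATA sections or processing instructions.
TAG = re.compile(r"<[^>]*>")


def extract_equidistant(xml: str) -> str:
    """Replace every character of every tag with ZWNJ so that all
    remaining characters keep their offset from the XML source."""
    return TAG.sub(lambda m: ZWNJ * len(m.group(0)), xml)


xml = "<l>Hello <name>Ada</name>!</l>"
plain = extract_equidistant(xml)

# Same length, and "Ada" sits at the same offset in both strings.
print(len(plain) == len(xml), plain.index("Ada") == xml.index("Ada"))
```

Because the offsets are unchanged, any position reported by a plain text tagger on `plain` can be used directly as a position in `xml`.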

Optionally, a shrinker *S* removes the special characters from the output of *E* and also produces a map of character positions. A corrector *C* then applies this map to the positioning information produced by the tagger *T*.
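The interplay of shrinker and corrector might look roughly as follows. This is a sketch under assumptions, not the actual tool: the names `shrink` and `correct`, the list-based position map and the half-open `(start, end)` span convention are all chosen here for illustration.

```python
ZWNJ = "\u200c"  # filler character used in the equidistant plain text


def shrink(equidistant: str) -> tuple[str, list[int]]:
    """Drop the filler characters. Also return a map that gives, for each
    position in the shrunk text, its offset in the equidistant text."""
    chars, offsets = [], []
    for i, ch in enumerate(equidistant):
        if ch != ZWNJ:
            chars.append(ch)
            offsets.append(i)
    return "".join(chars), offsets


def correct(span: tuple[int, int], offsets: list[int]) -> tuple[int, int]:
    """Map a non-empty half-open span in the shrunk text back to
    offsets in the equidistant text (and thus in the XML source)."""
    start, end = span
    return offsets[start], offsets[end - 1] + 1


# Equidistant text for the hypothetical source <l>Hello <name>Ada</name>!</l>
equidistant = ZWNJ * 3 + "Hello " + ZWNJ * 6 + "Ada" + ZWNJ * 7 + "!" + ZWNJ * 4
shrunk, offsets = shrink(equidistant)

# A tagger working on the shrunk text reports "Ada" at (6, 9).
source_span = correct((6, 9), offsets)
print(shrunk, source_span)
```

The corrected span points at the same characters in the equidistant text, so the internalizer can work with it as if the tagger had run on the unshrunk output of *E*.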

The internalizer can also be used to internalize stand-off markup produced manually with CATMA, GNU Emacs standoff-mode, etc. into syntactically correct XML.
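The splitting step of the internalizer can be sketched in Python as follows. This is an illustration only: the wrapper element `seg`, the function names and the simplified tag pattern are assumptions, the span is assumed to start and end in text rather than inside a tag, and the linking of the pieces via attributes is left out.

```python
import re
import xml.etree.ElementTree as ET

# Simplified tag pattern (assumption): no '>' inside attribute values.
TAG = re.compile(r"<[^>]*>")


def split_span(xml: str, start: int, end: int) -> list[tuple[int, int]]:
    """Split the source span [start, end) at every tag it contains and
    return the sub-spans that cover text only."""
    pieces = []
    pos = start
    for m in TAG.finditer(xml, start, end):
        if m.start() > pos:
            pieces.append((pos, m.start()))
        pos = m.end()
    if pos < end:
        pieces.append((pos, end))
    return pieces


def internalize(xml: str, start: int, end: int, tag: str = "seg") -> str:
    """Wrap each text-only piece of the span in its own element, so that
    the inserted markup never crosses an existing tag."""
    out = []
    pos = 0
    for s, e in split_span(xml, start, end):
        out.append(xml[pos:s])
        out.append(f"<{tag}>{xml[s:e]}</{tag}>")
        pos = e
    out.append(xml[pos:])
    return "".join(out)


xml = "<p>one <hi>two</hi> three</p>"
# The annotated span "e <hi>tw" overlaps the start of <hi>.
result = internalize(xml, 5, 13)
ET.fromstring(result)  # raises ParseError if the result is not well-formed
print(result)
```

A single annotation that crosses an element boundary thus becomes two sibling-free fragments, one outside and one inside `<hi>`, which a real internalizer would then have to link back together.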



ID: 103 / Session 1A: 2
Short Paper
Keywords: TEI, indexes, XQuery

TEI Automatic Enriched List of Names (TAELN): An XQuery-based Open Source Solution for the Automatic Creation of Indexes from TEI and RDF Data

G. Fernandez Riva

Universität Heidelberg, Germany

The annotation of names of persons, places or organizations is a common feature of TEI editions. One way of identifying the annotated individuals is through the use of IDs from authority records like Geonames, Wikidata or the GND.

In this paper I will introduce an open source tool written in XQuery that enables the creation of TEI indexes using a very flexible custom templating language. The TEI Automatic Enriched List of Names (TAELN) uses the IDs from one authority file to create a custom index (model.listLike) with information from one or more RDF endpoints.

TAELN has been developed for the edition of the diaries and travel journals of Albrecht Dürer and his family. People, places and artworks are identified with GND numbers in the TEI edition. The indexes generated with TAELN include some information from GND records, but draw mostly on duerer.online, a virtual research portal created with WissKI (https://wiss-ki.eu/), which offers an RDF endpoint.

TAELN relies on an XML template to indicate how to retrieve information from the different endpoints and how to structure the desired TEI output. The templates use a straightforward but flexible syntax. A simple use case is shown in the following example, which retrieves the person name from the GND and the occupation from WissKI (which relies on the so-called »Pathbuilder syntax«).

<person>
  <persName origin="gnd">preferredNameForThePerson</persName>
  <occupation origin="wisski">ecrm:E21_Person -> ecrm:P11i_participated_in -> wvz:WV7_Occupation -> ecrm:P3_has_note</occupation>
</person>
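To make the structure of such a template concrete, the following Python sketch reads the example above and lists, for each output element, the endpoint named in `@origin` and the steps of the lookup path. TAELN itself is written in XQuery; the function `read_template` and the dictionary layout are hypothetical, and the sketch only parses the template without querying any endpoint.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """
<person>
  <persName origin="gnd">preferredNameForThePerson</persName>
  <occupation origin="wisski">ecrm:E21_Person -> ecrm:P11i_participated_in -> wvz:WV7_Occupation -> ecrm:P3_has_note</occupation>
</person>
"""


def read_template(template: str) -> list[dict]:
    """Return one entry per child element of the template: the name of the
    output element, the endpoint in @origin, and the path steps."""
    root = ET.fromstring(template.strip())
    fields = []
    for child in root:
        # Assumption: steps of a path are separated by "->".
        steps = [step.strip() for step in (child.text or "").split("->")]
        fields.append(
            {"element": child.tag, "origin": child.get("origin"), "path": steps}
        )
    return fields


fields = read_template(TEMPLATE)
for field in fields:
    print(field["element"], field["origin"], len(field["path"]))
```

Keeping the endpoint and the path as plain data in the template is what lets one template mix sources: each field can be resolved against a different endpoint without changing the code that builds the index.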

Much more complex outputs can be achieved. TAELN offers editions an out-of-the-box solution for generating TEI indexes by gathering information from different endpoints; it only requires creating the corresponding template and knowing how to apply an XQuery transformation. The tool will be published shortly before the date of the TEI conference.



ID: 151 / Session 1A: 3
Short Paper
Keywords: manuscripts, codicology, paleography, XForms

manuForma – A Web Tool for Cataloging Manuscript Data

M. de Molière

University of Munich, Germany

The team of the ERC-funded project "MAJLIS – The Transformation of Jewish Literature in Arabic in the Islamicate World" at the University of Munich needed a software solution for describing manuscripts in TEI that would be easy to learn for non-specialists. After about one year of development, manuForma provides our manuscript catalogers with an accessible platform for entering their data. Users can choose elements and attributes from a list, add them to their catalog file and rearrange them with a mouse click. While manuForma does not spare our catalogers the need to learn the fundamentals of TEI, the restrictions imposed by the forms-based approach enhance both TEI conformance and the uniformity of our catalog records. Moreover, our tool eliminates the need to install commercial XML editors on the machine of each and every project member tasked with describing manuscripts. Instead, our tool offers a web interface for the entire editorial process.

At its heart, manuForma uses XForms, which has been modified to allow adding, moving and deleting elements and attributes. A tightly knit schema file controls which elements and attributes can be added, and in which situations, to ensure conformance to the project's scholarly objectives. As an eXist-db application, manuForma integrates well with other apps that provide the front end to the manuscript catalog. TEI records can be stored on and retrieved from GitHub, tying the efforts of the entire team together. The web solution can be adapted to other entities by writing a dedicated schema and template file. Moreover, manuForma will be available under an open source licence.



 
Conference: TEI 2022