TEI Conference and Members' Meeting 2022
September 12 - 16, 2022 | Newcastle, UK
Conference Agenda
Session 1A: Short-Papers
Presentations
ID: 140 / Session 1A: 1
Short Paper
Keywords: text mining, stand-off annotations, models of text, generic services
Standoff-Tools. Generic services for building automatic annotation pipelines around existing tools for plain text analysis
Universität Münster, Germany
TEI XML excels at encoding text. But for machine-based analysis of a corpus as data, XML is not a good platform. NLP, NER, topic modelling, text-reuse detection, etc. work on plain text; they become very complicated and slow if they have to traverse a tree structure. While extracting plain text from XML is simple, feeding the results back into XML is tricky. Yet having the analysis in XML is desirable: its results can then be related to the internal markup, e.g. for overviews of names per chapter, ellipses per verse, etc.
In my short paper I will introduce standoff-tools, a suite of generic tools for building (automatic) annotation pipelines around plain text tools. standoff-tools implements the extractor *E* and the internalizer *I*. *E* produces a special flavour of plain text, which I term *equidistant plain text*: the XML tags are replaced by special characters, e.g. the zero-width non-joiner U+200C, so that all non-special characters have the same character offset as in the XML source. This equidistant plain text can then be fed to an arbitrary tagger *T* designed for plain text; the only requirement is that *T* produces positioning information. *I* inserts tags into the XML based on this positioning information. To do so, it splits the annotated spans of text so that the result is syntactically valid XML without overlapping elements, and it links the fragments back together with `@next` and `@prev`.
Optionally, a shrinker *S* removes the special characters from the output of *E* and produces a map of character positions. A corrector *C* applies this map to the positioning information produced by the tagger *T*.
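The roles of the extractor, shrinker and corrector can be illustrated with a minimal Python sketch. This is not the standoff-tools implementation: the function names `extract`, `shrink` and `correct` are hypothetical, tags are found with a simple regular expression (no comments, CDATA sections or entity references), and the source text is assumed not to contain U+200C itself.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, the filler character named in the abstract

# Simplification: a tag is anything between '<' and the next '>'.
TAG = re.compile(r"<[^>]*>")

def extract(xml: str) -> str:
    """E (sketch): replace every character of every tag with ZWNJ,
    so that all remaining characters keep their source offsets."""
    return TAG.sub(lambda m: ZWNJ * len(m.group()), xml)

def shrink(equidistant: str):
    """S (sketch): drop the filler characters and return the plain text
    together with a list mapping plain-text offsets to source offsets."""
    chars, offsets = [], []
    for i, ch in enumerate(equidistant):
        if ch != ZWNJ:
            chars.append(ch)
            offsets.append(i)
    return "".join(chars), offsets

def correct(span, offsets):
    """C (sketch): map a non-empty (start, end) span, end exclusive,
    from the shrunken text back to offsets in the XML source."""
    start, end = span
    return offsets[start], offsets[end - 1] + 1

xml = "<l>Tell <hi>me</hi>, Muse</l>"
plain, offsets = shrink(extract(xml))
# Stand-in for the tagger T: it reports where "Muse" occurs in the plain text.
span = (plain.index("Muse"), plain.index("Muse") + len("Muse"))
start, end = correct(span, offsets)
print(plain)           # Tell me, Muse
print(xml[start:end])  # Muse
```

Mapping the end of a span via its last character (rather than the character after it) keeps the corrected span from swallowing a tag that directly follows the annotated text.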
The internalizer can also be used to internalize stand-off markup produced manually with CATMA, GNU Emacs standoff-mode, etc. into syntactically correct XML.
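How an internalizer can turn an overlapping span into well-formed XML is sketched below in Python, under strong simplifying assumptions. The function `internalize` and its parameters are hypothetical and not the tool's API. The sketch wraps every text run inside the span separately, which splits more often than strictly necessary, and it links the fragments with the TEI attributes `next` and `prev`. Span offsets refer to the XML source and are assumed not to fall inside a tag.

```python
import re
import xml.etree.ElementTree as ET

TAG = re.compile(r"<[^>]*>")  # simplification: no comments, CDATA, or '>' in attributes

def internalize(xml, start, end, name, ident):
    """Wrap source offsets [start, end) in <name> elements, one per text run,
    and chain the fragments with xml:id, next and prev."""
    # 1. Collect the text runs of the span, i.e. the parts not covered by a tag.
    runs, pos = [], start
    for m in TAG.finditer(xml):
        if m.end() <= start or m.start() >= end:
            continue  # tag lies outside the span
        if m.start() > pos:
            runs.append((pos, m.start()))
        pos = max(pos, m.end())
    if pos < end:
        runs.append((pos, end))

    # 2. Rebuild the document, wrapping each run in its own element.
    out, last = [], 0
    for i, (s, e) in enumerate(runs):
        attrs = f' xml:id="{ident}.{i}"'
        if i > 0:
            attrs += f' prev="#{ident}.{i - 1}"'
        if i < len(runs) - 1:
            attrs += f' next="#{ident}.{i + 1}"'
        out.append(xml[last:s])
        out.append(f"<{name}{attrs}>{xml[s:e]}</{name}>")
        last = e
    out.append(xml[last:])
    return "".join(out)

src = "<l>Tell <hi>me</hi>, Muse</l>"
# The span covers "ll " and "m", so it crosses the start tag of <hi>.
result = internalize(src, 5, 13, "seg", "a1")
ET.fromstring(result)  # raises ParseError if the result were not well-formed
print(result)
```

Because each wrapper encloses only character data, it can never overlap with existing markup; a more economical internalizer would wrap whole elements where the span contains them completely.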
ID: 103 / Session 1A: 2
Short Paper
Keywords: TEI, indexes, XQuery
TEI Automatic Enriched List of Names (TAELN): An XQuery-based Open Source Solution for the Automatic Creation of Indexes from TEI and RDF Data
Universität Heidelberg, Germany
The annotation of names of persons, places or organizations is a common feature of TEI editions. One way of identifying the annotated individuals is through the use of IDs from authority records like Geonames, Wikidata or the GND. In this paper I will introduce an open source tool written in XQuery that enables the creation of TEI indexes using a very flexible custom templating language.
The TEI Automatic Enriched List of Names (TAELN) uses the IDs of a single authority file to create a custom index (model.listLike) with information from one or more RDF endpoints. TAELN has been developed for the edition of the diaries and travel journals of Albrecht Dürer and his family. People, places and art works are identified with GND numbers in the TEI edition. The indexes generated with TAELN include some information from GND records, but most of it comes from duerer.online, a virtual research portal created with WissKI (https://wiss-ki.eu/), which offers an RDF endpoint.
TAELN relies on an XML template to indicate how to retrieve information from the different endpoints and how to structure the desired TEI output. The templates use a straightforward but flexible syntax. A simple use case is shown in the following example, which retrieves the person name from the GND and the occupation from WissKI (which relies on the so-called »Pathbuilder syntax«).
<person>
  <persName origin="gnd">preferredNameForThePerson</persName>
  <occupation origin="wisski">ecrm:E21_Person -> ecrm:P11i_participated_in -> wvz:WV7_Occupation -> ecrm:P3_has_note</occupation>
</person>
Much more complex outputs can be achieved.
TAELN offers editions an out-of-the-box solution for generating TEI indexes by gathering information from different endpoints. It only requires creating the corresponding template and knowing how to apply an XQuery transformation. The tool will be published shortly before the date of the TEI conference.
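The template idea can be illustrated with a short Python sketch. TAELN itself is written in XQuery and had not been published when the abstract was written, so nothing here reflects its actual code. The function `fill`, the resolver callables, the identifier `gnd:EXAMPLE` and all returned values are hypothetical placeholders. The sketch only shows the principle that each template element names an origin and a query, and the output keeps the template's structure.

```python
import xml.etree.ElementTree as ET

# The template from the abstract: @origin names the endpoint,
# the element content is the property or path to retrieve.
TEMPLATE = """<person>
  <persName origin="gnd">preferredNameForThePerson</persName>
  <occupation origin="wisski">ecrm:E21_Person -> ecrm:P11i_participated_in -> wvz:WV7_Occupation -> ecrm:P3_has_note</occupation>
</person>"""

def fill(template, entity_id, resolvers):
    """Return a copy of the template in which every element carrying @origin
    has its query replaced by the value the matching resolver returns."""
    root = ET.fromstring(template)
    for el in root.iter():
        origin = el.attrib.pop("origin", None)
        if origin is not None:
            query = (el.text or "").strip()
            el.text = resolvers[origin](entity_id, query)
    return root

def table_resolver(table):
    """Build a stand-in for an RDF endpoint from a lookup table (placeholder data)."""
    return lambda entity_id, query: table.get((entity_id, query), "")

WISSKI_PATH = ("ecrm:E21_Person -> ecrm:P11i_participated_in -> "
               "wvz:WV7_Occupation -> ecrm:P3_has_note")
resolvers = {
    "gnd": table_resolver({("gnd:EXAMPLE", "preferredNameForThePerson"): "Example, Person"}),
    "wisski": table_resolver({("gnd:EXAMPLE", WISSKI_PATH): "example occupation"}),
}

person = fill(TEMPLATE, "gnd:EXAMPLE", resolvers)
print(ET.tostring(person, encoding="unicode"))
```

In a real setting each resolver would translate the query into a request to its endpoint; the lookup tables above only mark where those requests would go.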
ID: 151 / Session 1A: 3
Short Paper
Keywords: manuscripts, codicology, paleography, XForms
manuForma – A Web Tool for Cataloging Manuscript Data
University of Munich, Germany
The team of the ERC-funded project "MAJLIS – The Transformation of Jewish Literature in Arabic in the Islamicate World" at the University of Munich needed a software solution for describing manuscripts in TEI that would be easy to learn for non-specialists. After about one year of development, manuForma provides our manuscript catalogers with an accessible platform for entering their data. Users can choose elements and attributes from a list, add them to their catalog file and rearrange them with a mouse click. While manuForma does not spare our catalogers the need to learn the fundamentals of TEI, the restrictions that the forms-based approach imposes enhance both TEI conformance and the uniformity of our catalog records. Moreover, our tool eliminates the need to install commercial XML editors on the machine of every project member tasked with describing manuscripts. Instead, it offers a web interface for the entire editorial process.
At its heart, manuForma uses XForms, which has been modified to allow adding, moving and deleting elements and attributes. A tightly knit schema file controls which elements and attributes can be added, and in which situations, to ensure conformance to the project's scholarly objectives. As an eXist-db application, manuForma integrates well with other apps that provide the front end to the manuscript catalog. TEI records can be stored on and retrieved from GitHub, tying the efforts of the entire team together. The web solution can be adapted to other entities by writing a dedicated schema and template file. Moreover, manuForma will be available under an open source licence.