
Conference Agenda


Closing Keynote: Emmanuel Ngue Um, 'Tone as “Noiseless Data”: Insight from Niger-Congo Tone Languages'

Time: Friday, 16/Sept/2022, 2:30pm - 4:00pm

Session Chair: Martina Scholger, University of Graz

Location: ARMB: 2.98

Armstrong Building: Lecture Room 2.98. Capacity: 168

With closing remarks by Dr James Cummings, Local TEI2022 Conference Organiser

Presentations

ID: 166 / Closing Keynote: 1
Invited Keynote

Tone as “Noiseless Data”: Insight from Niger-Congo Tone Languages

E. Ngue Um

University of Yaoundé 1 & University of Bertoua, Cameroon

Text processing assumes two layers of textual data: a "noisy" layer and a "noiseless" layer. The “noisy” layer is generally considered unsuitable for analysis and is eliminated at the pre-processing stage. In current Natural Language Processing (NLP) technologies, such as text generation in machine translation, tones are represented as diacritical symbols in the orthography of Niger-Congo languages, and these symbols are therefore pre-processed as “noisy” data. As an illustration, none of the 15 Niger-Congo tone language modules available on Google Translate delivers, in a systematic and consistent manner, text data that contains the linguistic information encoded through tone melody.
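The loss described above can be illustrated with a minimal Python sketch of a common pre-processing step, diacritic stripping. The function name `strip_diacritics` and the three input forms are hypothetical placeholders chosen for illustration; they are not data from the abstract or from any particular language, and the sketch does not claim to reproduce what any specific NLP pipeline does.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove all combining marks, as many text-cleaning steps do."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical forms: the same syllable with acute, grave, and no tone mark.
forms = ["b\u00e1", "b\u00e0", "ba"]
stripped = [strip_diacritics(form) for form in forms]

# Count how many distinct forms survive the cleaning step.
distinct_before = len(set(forms))
distinct_after = len(set(stripped))
```

In this sketch three forms that differ only in their tone mark become indistinguishable after cleaning, which is the sense in which tone is treated as "noise".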

The Text Encoding Initiative (TEI) is a framework which can be used to circumvent the “noisiness” brought about by diacritical tone symbols in the processing of text data of Niger-Congo languages.

In novel work, I propose a markup scheme for tone that encompasses:

a) The markup of tone units within an <m> (morpheme) element; this aims to capture the functional properties of tone units, just like segmental morphemes.

b) The markup of tonal characters (diacritical symbols) within a <g> (glyph) element and the representation of the pitch by hexadecimal data representing the Unicode character code for that pitch; this aims to capture tone marks as autonomous symbols, in contrast with their combining layout when represented as diacritics.

c) The markup of downstep and upstep within an <accid> (accidental) element mirroring musical accidentals such as “sharp” and “flat”; this aims to capture strictly melodic properties of tone on a separate annotation tier.
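Points (a) and (b) of the scheme above might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual attribute names or element nesting, so the `type` and `n` attributes, the wrapping `<w>` element, the `TONE_MARKS` set, and the function name `encode_tone` are all hypothetical. Downstep and upstep, point (c), are not covered.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumed set of combining tone marks: grave, acute, circumflex, macron, caron.
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u0304", "\u030c"}


def encode_tone(word: str) -> ET.Element:
    """Wrap segmental runs and tone marks in separate <m> elements."""
    w = ET.Element("w")
    segment = ""
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            # Close the current segmental run before emitting the tone unit.
            if segment:
                ET.SubElement(w, "m", {"type": "segmental"}).text = segment
                segment = ""
            tone = ET.SubElement(w, "m", {"type": "tone"})
            # Record the tone mark by its hexadecimal Unicode code point.
            ET.SubElement(tone, "g", {"n": f"{ord(ch):04X}"})
        else:
            # Letters and non-tonal combining marks stay in the segment.
            segment += ch
    if segment:
        ET.SubElement(w, "m", {"type": "segmental"}).text = segment
    return w


# Hypothetical input: "ba" with an acute (high) tone mark on the vowel.
element = encode_tone("b\u00e1")
xml_string = ET.tostring(element, encoding="unicode")
```

Here the tone mark is no longer a combining diacritic on the vowel but an autonomous unit with its own element, so a downstream tool can count, filter, or align tone units without touching the segmental text.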

The objectives of tone encoding within the TEI framework are threefold:

a) To harness quantitative research on tone in Niger-Congo languages.

b) To leverage “clean” language data of Niger-Congo languages that can be used more efficiently in machine learning tasks for tone generation in textual data.

c) To gain better insights into the orthography of tone in Niger-Congo languages.

In this paper, I will show how this novel perspective on the annotation of tone can be applied productively, using a corpus of language data from 120 Niger-Congo languages.
