Named Entity Annotation Projection Applied to Classical Languages

Tariq Yousef*, Chiara Palladino, Gerhard Heyer, Stefan Jänicke

*Corresponding author

Publication: Chapter in book/report/conference proceeding; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

In this study, we demonstrate how to apply cross-lingual annotation projection to transfer named-entity annotations to classical languages for which limited or no resources and annotated texts are available, aiming to enrich their NER training datasets and train a model to perform NER tagging. Our method uses sentence-level aligned parallel corpora of ancient texts and their translations in a modern language for which high-quality off-the-shelf NER systems are available. We automatically annotate the modern-language text and employ a state-of-the-art neural word alignment system to find translation equivalents. Finally, we transfer the annotations to the corresponding tokens in the ancient texts using a direct projection heuristic. We applied our method to ancient Greek, Latin, and Arabic using the Bible with the English translation as a parallel corpus. We used the resulting annotations to enhance the performance of an existing NER model for ancient Greek.
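The final projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact heuristic, so the following is an assumption about what a direct projection might look like: entity types are copied from tagged modern-language tokens to the ancient-language tokens they are aligned with, and BIO tags are then rebuilt on the target side. The function name `project_tags`, the alignment format (pairs of source and target token indices), and the example sentence are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Copy entity labels from source tokens to aligned target tokens.

    source_tags: BIO tags for the modern-language sentence (e.g. "B-PER").
    alignment:   (source_index, target_index) pairs from a word aligner.
    target_len:  number of tokens in the ancient-language sentence.
    """
    # Strip BIO prefixes so that only the entity type is projected.
    types = [t.split("-", 1)[1] if "-" in t else "O" for t in source_tags]

    # Assumed tie-break: the first aligned entity type wins for a target token.
    projected = ["O"] * target_len
    for src_idx, tgt_idx in alignment:
        if types[src_idx] != "O" and projected[tgt_idx] == "O":
            projected[tgt_idx] = types[src_idx]

    # Rebuild BIO tags from contiguous runs of the same type on the target side.
    bio = []
    prev = "O"
    for ent in projected:
        if ent == "O":
            bio.append("O")
        elif ent == prev:
            bio.append("I-" + ent)
        else:
            bio.append("B-" + ent)
        prev = ent
    return bio


# Hypothetical example: English "Jesus went to Jerusalem"
# aligned with Latin "Iesus venit Hierosolymam".
english_tags = ["B-PER", "O", "O", "B-LOC"]
links = [(0, 0), (1, 1), (2, 2), (3, 2)]
print(project_tags(english_tags, links, target_len=3))
# ['B-PER', 'O', 'B-LOC']
```

Unaligned target tokens stay tagged `O`. One limitation of this simplified version is that two adjacent target entities of the same type are merged into a single span, which a real pipeline would likely need to handle.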
Original language: English
Title: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Editors: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Number of pages: 8
Place of publication: Dubrovnik, Croatia
Publisher: Association for Computational Linguistics
Publication date: May 2023
Pages: 175-182
ISBN (Electronic): 9781959429548
DOI
Status: Published - May 2023
Event: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature - Dubrovnik, Croatia
Duration: 5 May 2023 - 5 May 2023

Workshop

Workshop: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Country/Territory: Croatia
City: Dubrovnik
Period: 05/05/2023 - 05/05/2023
