Named Entity Annotation Projection Applied to Classical Languages

Tariq Yousef*, Chiara Palladino, Gerhard Heyer, Stefan Jänicke

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

In this study, we demonstrate how to apply cross-lingual annotation projection to transfer named-entity annotations to classical languages for which limited or no resources and annotated texts are available, aiming to enrich their NER training datasets and train a model to perform NER tagging. Our method uses sentence-level aligned parallel corpora of ancient texts and their translations in a modern language for which high-quality off-the-shelf NER systems are available. We automatically annotate the text in the modern language and employ a state-of-the-art neural word alignment system to find translation equivalents. Finally, we transfer the annotations to the corresponding tokens in the ancient texts using a direct projection heuristic. We applied our method to ancient Greek, Latin, and Arabic, using the Bible with its English translation as a parallel corpus. We used the resulting annotations to enhance the performance of an existing NER model for ancient Greek.
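
For illustration only, the final projection step could look roughly like the sketch below. It is not the authors' implementation: the abstract does not specify the heuristic's details, so the function name `project_entities`, the BIO tag format, the representation of alignments as `(source_index, target_index)` pairs, and the Latin example tokens are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def project_entities(
    src_tags: List[str],
    alignments: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Copy entity types from tagged source tokens to aligned target tokens.

    src_tags   -- BIO tags of the modern-language sentence (assumed format)
    alignments -- (source_index, target_index) word-alignment pairs
    tgt_len    -- number of tokens in the ancient-language sentence
    """
    # Step 1: give each target token the entity type of a source token
    # aligned to it; the first alignment seen wins on conflicts.
    tgt_types: List[Optional[str]] = [None] * tgt_len
    for src_idx, tgt_idx in alignments:
        tag = src_tags[src_idx]
        if tag == "O":
            continue
        ent_type = tag.split("-", 1)[1]
        if tgt_types[tgt_idx] is None:
            tgt_types[tgt_idx] = ent_type

    # Step 2: rebuild BIO tags in target word order. Adjacent target tokens
    # of the same type are merged into one span (a simplification).
    tgt_tags: List[str] = []
    prev: Optional[str] = None
    for ent_type in tgt_types:
        if ent_type is None:
            tgt_tags.append("O")
        elif ent_type == prev:
            tgt_tags.append("I-" + ent_type)
        else:
            tgt_tags.append("B-" + ent_type)
        prev = ent_type
    return tgt_tags


# Hypothetical example: English "Jesus went to Jerusalem" aligned to the
# Latin tokens ["Iesus", "ivit", "Hierosolymam"]; "to" has no alignment.
english_tags = ["B-PER", "O", "O", "B-LOC"]
word_alignments = [(0, 0), (1, 1), (3, 2)]
latin_tags = project_entities(english_tags, word_alignments, tgt_len=3)
```

Target tokens with no aligned source token stay untagged (`O`) in this sketch, so alignment errors translate directly into missed or misplaced entities.
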
Original language: English
Title of host publication: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Editors: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Number of pages: 8
Place of publication: Dubrovnik, Croatia
Publisher: Association for Computational Linguistics
Publication date: May 2023
Pages: 175-182
ISBN (Electronic): 9781959429548
Publication status: Published - May 2023
Event: 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature - Dubrovnik, Croatia
Duration: 5 May 2023

