Abstract
In this study, we demonstrate how to apply cross-lingual annotation projection to transfer named-entity annotations to classical languages for which few or no resources and annotated texts are available, with the aim of enriching their NER training datasets and training a model to perform NER tagging. Our method uses sentence-level aligned parallel corpora of ancient texts and their translations in a modern language for which high-quality off-the-shelf NER systems are available. We automatically annotate the modern-language text and employ a state-of-the-art neural word alignment system to find translation equivalents. Finally, we transfer the annotations to the corresponding tokens in the ancient texts using a direct projection heuristic. We applied our method to ancient Greek, Latin, and Arabic, using the Bible with its English translation as a parallel corpus. We used the resulting annotations to enhance the performance of an existing NER model for ancient Greek.
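The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the heuristic, so everything below is an assumption for illustration only: the function name `project_tags`, the BIO tag scheme, the representation of word alignments as `(source_index, target_index)` pairs, and the toy English/Latin sentence pair. In a real pipeline the source tags would come from an off-the-shelf NER system and the alignment from a neural word aligner.

```python
from typing import Dict, List, Tuple


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Copy entity types from source tokens to their aligned target tokens.

    Hypothetical helper, not the paper's implementation.
    src_tags:  BIO tags for the modern-language (source) sentence.
    alignment: (source_index, target_index) word-alignment pairs.
    tgt_len:   number of tokens in the ancient-language (target) sentence.
    """
    # Entity type for each target index; the first aligned entity token wins.
    tgt_types: Dict[int, str] = {}
    for src_idx, tgt_idx in sorted(alignment):
        tag = src_tags[src_idx]
        if tag != "O" and tgt_idx not in tgt_types:
            tgt_types[tgt_idx] = tag.split("-", 1)[1]

    # Rebuild BIO prefixes in target word order, since the translation
    # may reorder the tokens of a multi-word entity.
    tgt_tags: List[str] = []
    prev_type = None
    for i in range(tgt_len):
        ent = tgt_types.get(i)
        if ent is None:
            tgt_tags.append("O")
        elif ent == prev_type:
            tgt_tags.append("I-" + ent)
        else:
            tgt_tags.append("B-" + ent)
        prev_type = ent
    return tgt_tags


# Toy example (invented for illustration):
# English: "Jesus went to Jerusalem"  ->  Latin: "Iesus ivit Hierosolymam"
english_tags = ["B-PER", "O", "O", "B-LOC"]
word_alignment = [(0, 0), (1, 1), (3, 2)]
print(project_tags(english_tags, word_alignment, tgt_len=3))
# ['B-PER', 'O', 'B-LOC']
```

Unaligned target tokens default to `O`. One limitation of this simplified version is that two adjacent target entities of the same type are merged into a single span; a more careful implementation would track entity identity on the source side.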
Original language | English |
---|---|
Title of host publication | Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature |
Editors | Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz |
Number of pages | 8 |
Place of Publication | Dubrovnik, Croatia |
Publisher | Association for Computational Linguistics |
Publication date | May 2023 |
Pages | 175-182 |
ISBN (Electronic) | 9781959429548 |
Publication status | Published - May 2023 |
Event | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature - Dubrovnik, Croatia. Duration: 5 May 2023 → 5 May 2023 |
Workshop
Workshop | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature |
---|---|
Country/Territory | Croatia |
City | Dubrovnik |
Period | 05/05/2023 → 05/05/2023 |