Abstract
In this study, we demonstrate how to apply cross-lingual annotation projection to transfer named-entity annotations to classical languages for which few or no resources and annotated texts are available, with the aim of enriching their NER training datasets and training a model to perform NER tagging. Our method uses sentence-aligned parallel corpora of ancient texts and their translations in a modern language for which high-quality off-the-shelf NER systems are available. We automatically annotate the modern-language text and employ a state-of-the-art neural word alignment system to find translation equivalents. Finally, we transfer the annotations to the corresponding tokens in the ancient texts using a direct projection heuristic. We applied our method to Ancient Greek, Latin, and Arabic, using the Bible with its English translation as a parallel corpus. We used the resulting annotations to enhance the performance of an existing NER model for Ancient Greek.
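The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact heuristic, so the version below is an assumption: it copies the entity type of each tagged source token to every target token aligned with it, then rebuilds the BIO prefixes on the target side. The function name `project_annotations`, the BIO tag format, and the toy English and Latin tokens are hypothetical and chosen only for illustration. The word alignments are assumed to come from an external aligner as `(source_index, target_index)` pairs.

```python
from typing import List, Optional, Tuple


def project_annotations(
    src_tags: List[str],
    alignments: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO tags from source tokens onto aligned target tokens.

    Illustrative direct-projection variant (an assumption, not the
    paper's confirmed procedure).
    """
    # Entity type assigned to each target token (None = outside any entity).
    tgt_types: List[Optional[str]] = [None] * tgt_len

    for src_idx, tgt_idx in sorted(alignments):
        tag = src_tags[src_idx]
        if tag == "O":
            continue
        entity_type = tag.split("-", 1)[1]  # "B-PER" -> "PER"
        # Keep the first type seen if several source tokens align here.
        if tgt_types[tgt_idx] is None:
            tgt_types[tgt_idx] = entity_type

    # Rebuild BIO prefixes in target word order: a contiguous run of the
    # same type starts with B- and continues with I-.
    tgt_tags: List[str] = []
    prev: Optional[str] = None
    for entity_type in tgt_types:
        if entity_type is None:
            tgt_tags.append("O")
        elif entity_type == prev:
            tgt_tags.append("I-" + entity_type)
        else:
            tgt_tags.append("B-" + entity_type)
        prev = entity_type
    return tgt_tags


# Toy data (hypothetical, not taken from the paper's corpus).
english = ["John", "the", "Baptist", "came"]
english_tags = ["B-PER", "I-PER", "I-PER", "O"]
latin = ["venit", "Iohannes", "Baptista"]
# (source_index, target_index) pairs; "the" has no Latin counterpart.
word_alignments = [(0, 1), (2, 2), (3, 0)]

latin_tags = project_annotations(english_tags, word_alignments, len(latin))
print(list(zip(latin, latin_tags)))
```

One limitation of this sketch is that two adjacent target entities of the same type are merged into a single span, because the BIO prefixes are rebuilt from entity types alone. Unaligned target tokens are always left as `O`.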
Original language | English |
---|---|
Title | Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature |
Editors | Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz |
Number of pages | 8 |
Place of publication | Dubrovnik, Croatia |
Publisher | Association for Computational Linguistics |
Publication date | May 2023 |
Pages | 175-182 |
ISBN (Electronic) | 9781959429548 |
DOI | |
Status | Published - May 2023 |
Event | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature - Dubrovnik, Croatia. Duration: 5 May 2023 → 5 May 2023 |
Workshop
Workshop | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature |
---|---|
Country/Territory | Croatia |
City | Dubrovnik |
Period | 05/05/2023 → 05/05/2023 |