Abstract
This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models with annotated datasets by consolidating potentially ambiguous entity types under a harmonized tagset. Then, we tested their performance with out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model Ancient Greek Alignment being slightly superior. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.
Original language | English |
---|---|
Title | 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024 - Workshop Proceedings |
Editors | Rachele Sprugnoli, Marco Passarotti |
Publisher | European Language Resources Association (ELRA) |
Publication date | 2024 |
Pages | 89–97 |
ISBN (Electronic) | 9782493814463 |
Status | Published - 2024 |
Event | 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 - Torino, Italy. Duration: 25 May 2024 → … |
Conference
Conference | 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 |
---|---|
Country/Territory | Italy |
City | Torino |
Period | 25/05/2024 → … |