An Annotated Error Corpus for Esperanto

Eckhard Bick*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

Abstract

This paper presents and evaluates EspEraro, a new multi-genre error corpus for (written) Esperanto. The corpus builds on learner, news and internet data and covers both ordinary spelling errors and real-word errors such as grammatical and word choice errors. Because the corpus has been annotated not only for errors, error types and corrections, but also with Constraint Grammar (CG) tags for part-of-speech, inflection, affixation, syntactic function, dependency and semantic class, it allows users to linguistically contextualize errors and to craft and test CG rules aimed at recognizing and/or correcting the various error types covered in the corpus. The resource was originally created for regression-testing a newly developed spell- and grammar checker, and contains about 75,000 tokens (~4,000 sentences), with 3,330 tokens annotated for one or more errors and a combined correction suggestion. We discuss the different error types and evaluate their weight in the corpus. Where relevant, we explain the role of CG in the identification and correction of the individual error types.
Original language: English
Title of host publication: Proceedings of the 9th Workshop on Constraint Grammar and Finite State NLP: Rule-based and hybrid methods and tools for user communities
Editors: Trond Trosterud, Linda Wiechetek, Flammie Pirinen
Place of Publication: Tartu
Publisher: University of Tartu Library
Publication date: Mar 2025
Pages: 1-8
ISBN (Electronic): 978-9908-53-113-7
Publication status: Published - Mar 2025
Event: The 9th Workshop on Constraint Grammar and Finite State NLP: Rule-based and hybrid methods and tools for user communities - Hestia Hotel Europa, Tallinn, Estonia
Duration: 5 Mar 2025
