Validation of the Danish PAROLE lexicon (unpublished): Final Report, Under contract of October 18, 1999, with ELRA/ELDA

Publication: Other › Other contribution › Advisory

Abstract

This validation is based on the Danish PAROLE lexicon dated June 20, 1998, downloaded on March 16, 1999.

Subsequently, the developers of the lexicon informed us that they have been revising it, in particular at the morphological level. The morphological entries were originally generated automatically from a machine-readable version of the Official Danish Spelling Dictionary (Retskrivningsordbogen 1986, hereafter RO86). This resulted in some overgeneration, which the developers began eliminating after submitting the Danish PAROLE lexicon for validation. The present validation is, however, based on the January 1997 version of the lexicon.

The validation complies with the specifications described in the ELRA validation manuals for lexical data, namely Underwood and Navaretta: "A Draft Manual for the Validation of Lexica, Final Report" [Underwood & Navaretta 1997] and Braasch: "A Draft Protocol for the Validation of PAROLE Lexica" [Braasch 1998].

The investigation was carried out manually, using the GNU Emacs search facility, and automatically, using an sgrep tool that allowed automatic extraction of entries with specified attributes and/or values. However, the tool did not make it possible to extract entries that did NOT contain certain features, e.g. a grammatical category, which would have been relevant. Nor did it allow us to match, e.g., Description IDs against the lexicon entries to see whether they were actually used[1]. Since such "blind" Descriptions, if any, would have no consequences for the quality of the coded lexicon entries, we did not attempt to check for them manually.
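
The two checks that the sgrep tool could not perform, a negative query for entries lacking a feature and a test for declared-but-unused Descriptions, are straightforward in a general-purpose language. The following Python sketch illustrates both on a toy input. The element names (`Entry`, `Description`) and attributes (`id`, `cat`, `desc`) are hypothetical simplifications, not the actual PAROLE SGML markup, and a regular-expression scan stands in for a proper SGML parser.

```python
import re

# Hypothetical, simplified markup for illustration only; the real PAROLE
# SGML element and attribute names are NOT reproduced here.
SAMPLE = """
<Entry id="e1" cat="noun" desc="D1">hus</Entry>
<Entry id="e2" desc="D2">løbe</Entry>
<Entry id="e3" cat="verb" desc="D1">gå</Entry>
<Description id="D1"></Description>
<Description id="D2"></Description>
<Description id="D3"></Description>
"""

ENTRY_RE = re.compile(r"<Entry\b([^>]*)>")                  # opening Entry tags
DESC_RE = re.compile(r'<Description\b[^>]*\bid="([^"]+)"')  # declared Description ids
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')                    # name="value" pairs


def entries(text):
    """Yield the attributes of each entry as a dict."""
    for m in ENTRY_RE.finditer(text):
        yield dict(ATTR_RE.findall(m.group(1)))


def entries_missing(text, attr):
    """Negative query: ids of entries that do NOT carry the attribute `attr`."""
    return [e["id"] for e in entries(text) if attr not in e]


def unused_descriptions(text):
    """Description ids that no entry refers to ("blind" declarations)."""
    declared = DESC_RE.findall(text)
    used = {e.get("desc") for e in entries(text)}
    return [d for d in declared if d not in used]


print(entries_missing(SAMPLE, "cat"))  # ['e2']
print(unused_descriptions(SAMPLE))     # ['D3']
```

Applied to the real lexicon files, the hypothetical names would have to be replaced by the actual element and attribute names, and a proper SGML parser would be more robust than regular expressions.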

As specified in the validation manuals, we have calculated error rates. Where the entire lexicon was tested for a given error, the error rate is calculated as a percentage of the entire lexicon; where only a sample was tested, it is calculated as a percentage of that sample. The method used is specified in each case.
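
Expressed as a formula, the error rate is 100 × (number of erroneous entries) / (number of entries tested), where the denominator is the size of the whole lexicon or of the sample, depending on what was tested. A minimal Python sketch follows; the function name and the figures in the example are hypothetical and are not results from this validation.

```python
def error_rate(errors: int, population: int) -> float:
    """Error rate as a percentage of the tested population.

    `population` is the size of the entire lexicon when the whole lexicon
    was tested, or the size of the sample when only a sample was tested.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= errors <= population:
        raise ValueError("errors must lie between 0 and population")
    return 100.0 * errors / population


# Hypothetical figures, for illustration only.
print(error_rate(12, 400))     # 3.0  (sample-based)
print(error_rate(35, 50_000))  # 0.07 (whole-lexicon)
```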

[1] Actually, there may be such "blind" declarations: the noun file substantiv.sgml contains 21 PositionCs, i.e. functional descriptions of arguments, which are never used.

Original language: English
Publication date: 2000
Place of publication: Kolding
Status: Published - 2000

Bibliographic note

Publisher: Syddansk Universitet

Keywords

  • language technology, knowledge sharing, information management
  • lexical resources
