TY - JOUR
T1 - Evaluating the Linguistic Coverage of OpenAlex
T2 - An Assessment of Metadata Accuracy and Completeness
AU - Céspedes, Lucía
AU - Kozlowski, Diego
AU - Pradier, Carolina
AU - Sainte-Marie, Maxime Holmberg
AU - Shokida, Natsumi Solange
AU - Benz, Pierre
AU - Poitras, Constance
AU - Ninkov, Anton Boudreau
AU - Ebrahimy, Saeideh
AU - Ayeni, Philips
AU - Filali, Sarra
AU - Li, Bing
AU - Larivière, Vincent
N1 - Reference list updated and corrected, corresponding author's email contact added, minor typographical errors corrected
PY - 2025/01/14/
Y1 - 2025/01/14/
N2 - Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased toward English‐language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open‐source research information. While already in use by scholars and research institutions, the quality of its metadata is currently still being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in‐depth manual validation of a sample of 6836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata are not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing, but more work is needed at infrastructural level to ensure the quality of metadata on language.
KW - cs.DL
KW - cs.DB
U2 - 10.1002/asi.24979
DO - 10.1002/asi.24979
M3 - Journal article
SN - 2330-1635
VL - 76
SP - 884
EP - 895
JO - Journal of the Association for Information Science and Technology
JF - Journal of the Association for Information Science and Technology
IS - 6
ER -