Improving Historical Census Transcriptions: A Machine Learning Approach

Torben Johansen, Christian Møller Dahl, Sam Il Myoung Hwang, Munir Squires

Research output: Working paper › Research

Abstract

Historical U.S. censuses have been an important data source for economics, particularly because they allow researchers to track individuals’ life outcomes over long periods of time. However, linking individuals across multiple census rounds is challenging, often because of errors in name transcription. In this paper, we improve the name transcription in historical U.S. censuses using a machine-learning model. Our approach results in a significant increase in the likelihood of linking individuals across censuses. We also find that our model performs especially well when human transcribers struggle, i.e., when the legibility of names on the original census form is low. The increased linkage rate is observed across nearly all socio-demographic subgroups, including those that are typically difficult to link.
Original language: English
Publication status: Published - 24 Aug 2024
