Abstract
Tables are central to quantitative social science, economics, and demography, as they provide structured information that can easily be operationalised for statistical analysis. We provide an overview of the challenges of transcribing such tables and suggest novel applications of coherent point drift, auto-encoders and geometric map learning for this purpose. We show that these methods can be applied effectively to the automated segmentation of tables from historical documents and used as a preprocessing step before conventional OCR and transcription systems (e.g. Tesseract or micro-task platforms).
Original language | English |
---|---|
Publication status | Published - 2019 |