Abstract

Tables are paramount in quantitative social science, economics, and demography as they provide structured information that can easily be operationalised for statistical analysis. We provide an overview of the challenges of transcribing such tables and suggest novel applications of coherent point drift, auto-encoders and geometric map learning for this purpose. We show that these methods can effectively be applied for automated segmentation of tables from historic documents and be used as a pre-step before feeding into conventional OCR and transcription systems (e.g. Tesseract/micro-task platforms).
Original language: English
Status: Published - 2019

Fingerprint

Optical character recognition
Social sciences
Transcription
Statistical methods
Economics

Cite this

@techreport{8b389cc37af14ec2b3c2bc2a768ed8c8,
title = "Table detection and Segmentation",
abstract = "Tables are paramount in quantitative social science, economics, and demography as they provide structured information that can easily be operationalised for statistical analysis. We provide an overview of the challenges of transcribing such tables and suggest novel applications of coherent point drift, auto-encoders and geometric map learning for this purpose. We show that these methods can effectively be applied for automated segmentation of tables from historic documents and be used as a pre-step before feeding into conventional OCR and transcription systems (e.g. Tesseract/micro-task platforms).",
author = "Dahl, {Christian M.} and S{\o}rensen, {Emil N{\o}rmark} and Westermann, {Christian Emil}",
year = "2019",
language = "English",
type = "WorkingPaper",

}

TY - UNPB

T1 - Table detection and Segmentation

AU - Dahl, Christian M.

AU - Sørensen, Emil Nørmark

AU - Westermann, Christian Emil

PY - 2019

Y1 - 2019

N2 - Tables are paramount in quantitative social science, economics, and demography as they provide structured information that can easily be operationalised for statistical analysis. We provide an overview of the challenges of transcribing such tables and suggest novel applications of coherent point drift, auto-encoders and geometric map learning for this purpose. We show that these methods can effectively be applied for automated segmentation of tables from historic documents and be used as a pre-step before feeding into conventional OCR and transcription systems (e.g. Tesseract/micro-task platforms).

AB - Tables are paramount in quantitative social science, economics, and demography as they provide structured information that can easily be operationalised for statistical analysis. We provide an overview of the challenges of transcribing such tables and suggest novel applications of coherent point drift, auto-encoders and geometric map learning for this purpose. We show that these methods can effectively be applied for automated segmentation of tables from historic documents and be used as a pre-step before feeding into conventional OCR and transcription systems (e.g. Tesseract/micro-task platforms).

M3 - Working paper

BT - Table detection and Segmentation

ER -