Essays in Economics and Data Science

Simon Wittrock

Research output: Thesis, Ph.D. thesis

Abstract

This thesis consists of six self-contained papers within the fields of machine learning and economics. The first and second papers discuss machine learning within the legal system and investigate to what extent court decisions can be estimated based on offender characteristics. The remaining chapters describe methods for automatic handwritten text recognition and present different implementation possibilities.

The first chapter analyses the predictive power of decision trees for court decisions. Using machine learning models, we find that sentencing outcomes are to a large extent predictable: with only a small number of variables, we obtain a 20 percentage point improvement over always predicting the most common outcome. The second chapter discusses the effects of socioeconomic status on the risk of incarceration when sentenced. The results indicate that unemployed offenders are more likely to be incarcerated and that this seems to be related to being unemployed and reoffending. In addition, this chapter predicts offenders' risk of recidivism and finds a significant difference in the estimated risk between offenders who at some point reoffend and individuals without any registered reoffence.

The third chapter presents a pipeline for automatically transcribing historical archives. First, this paper introduces an unsupervised document classification method, which is tested on nurse journals, where classifying the nurse documents enables us to correctly identify all treated and non-treated cases in the sample. The second part of this chapter is an automatic handwritten text recognition procedure based on an attention-based network. The fourth chapter presents the largest digital handwritten name database, which proves useful for transfer learning to other handwritten datasets. To this end, we present transfer learning results on the Danish and US censuses, improving on the accuracy obtained without transfer learning. The database is constructed using police records from Denmark in the period from 1890 to 1923, which cover all adults above the age of 10 residing in Copenhagen at the time and amount to 1,106,020 segmented handwritten names.

In the fifth chapter we implement a handwritten date recognition system that automatically recognises and transcribes handwritten dates with accuracies between 92% and 100%. The system is built on a large handwritten date database spanning a collection of different types of historical documents and totalling more than 3.1 million images. In addition, we show that the date recognition system transfers well to other applications and significantly reduces the error rate on other datasets. Finally, this chapter uses the constructed network to provide automatic transcriptions of the entire 1916 Danish Census (excluding Copenhagen).

The sixth chapter uses the automatic transcriptions of the entire 1916 census obtained in chapter five to link psychiatric patients to income data. We find that schizophrenia patients were clearly more often selected for prefrontal lobotomy surgery. There are, however, no indications that lobotomy patients were selected based on belonging to a lower socioeconomic class or that schizophrenia patients generally came from worse-off families, although schizophrenia patients were more frequently unable to work at a young age due to mental illness and hospitalization.
Original language: English
Awarding Institution
  • University of Southern Denmark
Supervisors/Advisors
  • Dahl, Christian Møller, Principal supervisor
  • Mellace, Giovanni, Co-supervisor
  • Wray, Anthony, Co-supervisor
Date of defence: 7 Dec 2022
Publication status: Published, 20 Jan 2023

Note re. dissertation

Print copy of the thesis is restricted to reference use in the Library. 
