TY - GEN
T1 - Essays in Economics and Data Science
AU - Wittrock, Simon
PY - 2023/1/20
Y1 - 2023/1/20
N2 - This thesis consists of six self-contained papers within the fields of machine learning
and economics. The first and second papers discuss machine learning within the legal
system and investigate to what extent court decisions can be estimated based on offender
characteristics. The remaining chapters describe methods for automatic handwritten
text recognition and present different implementation possibilities. The first chapter analyses the predictive power of decision trees for court decisions.
Using machine learning models we find that sentencing outcomes are to a large extent
predictable and with only a small number of variables we obtain a 20 percentage point
increase compared to expecting the most common outcome. The second chapter discusses
the effects of socioeconomic status on the risk of incarceration when sentenced. The results
indicate that unemployed offenders are more likely to be incarcerated and that this seems
to be related to being unemployed and reoffending. In addition, this chapter predicts
the risk of recidivism of offenders and finds a significant difference in the estimated risk
of recidivism for offenders that at some point in time reoffend compared to individuals
without any registered reoffence. The third chapter presents a pipeline for automatically transcribing historical archives.
First, this paper introduces an unsupervised document classification method, which is tested on nurse journals, where the classification of the nurse documents enables us to correctly
identify all treated and non-treated cases in the sample. The second part of this chapter
is an automatic handwritten text recognition procedure based on an attention-based network. The fourth chapter presents the largest digital handwritten name database, which
proves useful for transfer learning to other handwritten datasets. To this end, we present
transfer learning results on the Danish and US censuses, improving the accuracy obtained
without transfer learning. The database is constructed using police records from Denmark
in the period from 1890 to 1923, which cover all adults above the age of 10 residing in
Copenhagen at the time and amount to 1,106,020 segmented handwritten names. In the
fifth chapter we implement a handwritten date recognition system for the automatic recognition and transcription of handwritten dates with accuracies between 92%
and 100%. The system is built on a large handwritten date database across a collection
of different types of historical documents totalling more than 3.1 million images. In addition, we show that the date recognition system transfer learns well to other applications
and significantly reduces the error rate on other datasets. Finally, this chapter utilizes the
constructed network to provide automatic transcriptions of the entire 1916 Danish Census (excluding Copenhagen). The sixth chapter uses the automatic transcriptions of the
entire 1916 census obtained in chapter five for linking psychiatric patients to income data.
We find a clear correlation: schizophrenia patients were more often selected for
prefrontal lobotomy surgery. There are however no indications that lobotomy patients
were selected based on being part of a lower socioeconomic class or that schizophrenia
patients were from generally worse-off families, although schizophrenia patients were more
frequently impaired from work at a young age, due to mental illnesses and hospitalization.
AB - This thesis consists of six self-contained papers within the fields of machine learning
and economics. The first and second papers discuss machine learning within the legal
system and investigate to what extent court decisions can be estimated based on offender
characteristics. The remaining chapters describe methods for automatic handwritten
text recognition and present different implementation possibilities. The first chapter analyses the predictive power of decision trees for court decisions.
Using machine learning models we find that sentencing outcomes are to a large extent
predictable and with only a small number of variables we obtain a 20 percentage point
increase compared to expecting the most common outcome. The second chapter discusses
the effects of socioeconomic status on the risk of incarceration when sentenced. The results
indicate that unemployed offenders are more likely to be incarcerated and that this seems
to be related to being unemployed and reoffending. In addition, this chapter predicts
the risk of recidivism of offenders and finds a significant difference in the estimated risk
of recidivism for offenders that at some point in time reoffend compared to individuals
without any registered reoffence. The third chapter presents a pipeline for automatically transcribing historical archives.
First, this paper introduces an unsupervised document classification method, which is tested on nurse journals, where the classification of the nurse documents enables us to correctly
identify all treated and non-treated cases in the sample. The second part of this chapter
is an automatic handwritten text recognition procedure based on an attention-based network. The fourth chapter presents the largest digital handwritten name database, which
proves useful for transfer learning to other handwritten datasets. To this end, we present
transfer learning results on the Danish and US censuses, improving the accuracy obtained
without transfer learning. The database is constructed using police records from Denmark
in the period from 1890 to 1923, which cover all adults above the age of 10 residing in
Copenhagen at the time and amount to 1,106,020 segmented handwritten names. In the
fifth chapter we implement a handwritten date recognition system for the automatic recognition and transcription of handwritten dates with accuracies between 92%
and 100%. The system is built on a large handwritten date database across a collection
of different types of historical documents totalling more than 3.1 million images. In addition, we show that the date recognition system transfer learns well to other applications
and significantly reduces the error rate on other datasets. Finally, this chapter utilizes the
constructed network to provide automatic transcriptions of the entire 1916 Danish Census (excluding Copenhagen). The sixth chapter uses the automatic transcriptions of the
entire 1916 census obtained in chapter five for linking psychiatric patients to income data.
We find a clear correlation: schizophrenia patients were more often selected for
prefrontal lobotomy surgery. There are however no indications that lobotomy patients
were selected based on being part of a lower socioeconomic class or that schizophrenia
patients were from generally worse-off families, although schizophrenia patients were more
frequently impaired from work at a young age, due to mental illnesses and hospitalization.
U2 - 10.21996/jrzs-xp77
DO - 10.21996/jrzs-xp77
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Samfundsvidenskabelige Fakultet
ER -