TY - JOUR
T1 - Is medieval distant viewing possible? : Extending and enriching annotation of legacy image collections using visual analytics
AU - Meinecke, Christofer
AU - Guéville, Estelle
AU - Wrisley, David Joseph
AU - Jänicke, Stefan
PY - 2024/6
Y1 - 2024/6
N2 - Abstract Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. To work with images from other historical periods requires expert annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating, legacy data is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a “bridge” in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces provide experts an overview of relationships in the data going beyond the sum total of the metadata. Word and image embeddings as well as co-occurrences of labels across the datasets enable batch re-annotation of images, recommendation of label candidates, and support composing a hierarchical classification of labels.
AB - Abstract Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. To work with images from other historical periods requires expert annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating, legacy data is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a “bridge” in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces provide experts an overview of relationships in the data going beyond the sum total of the metadata. Word and image embeddings as well as co-occurrences of labels across the datasets enable batch re-annotation of images, recommendation of label candidates, and support composing a hierarchical classification of labels.
KW - Collections as Data
KW - Distant Viewing
KW - Latin Bibles
KW - Legacy Data
KW - Medieval Manuscripts
KW - Visual Analytics
KW - Visual Thinking
KW - Visualization in the Humanities
KW - Vocabulary Interoperability
U2 - 10.1093/llc/fqae020
DO - 10.1093/llc/fqae020
M3 - Journal article
SN - 2055-7671
VL - 39
SP - 638
EP - 656
JO - Digital Scholarship in the Humanities
JF - Digital Scholarship in the Humanities
IS - 2
ER -