The VADA architecture for cost-effective data wrangling

N. Konstantinou, M. Koehler, E. Abel, C. Civili, B. Neumayr, E. Sallinger, A.A.A. Fernandes, G. Gottlob, J.A. Keane, L. Libkin, N.W. Paton

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

Abstract

Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Publication date: 2017
Pages: 1599–1602
ISBN (Electronic): 9781450341974
Publication status: Published - 2017
Externally published: Yes

Keywords

  • Data wrangling
