Toward an infrastructure for data-driven multimodal communication research

Francis F. Steen, Anders Hougaard, Jungseock Joo, Inés Olza, Cristóbal Pagán Cánovas, Anna Pleshakova, Soumya Ray, Peter Uhrig, Javier Valenzuela, Jacek Woźny, Mark Turner*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review



Research into the multimodal dimensions of human communication faces a set of distinctive methodological challenges. Collecting the datasets is resource-intensive, analysis often lacks peer validation, and the absence of shared datasets makes it difficult to develop standards. External validity is hampered by small datasets, yet large datasets are intractable. Red Hen Lab spearheads an international infrastructure for data-driven multimodal communication research, facilitating an integrated cross-disciplinary workflow. Linguists, communication scholars, statisticians, and computer scientists work together to develop research questions, annotate training sets, and develop pattern discovery and machine learning tools that handle vast collections of multimodal data, beyond the dreams of previous researchers. This infrastructure makes it possible for researchers at multiple sites to work in real time in transdisciplinary teams. We review the vision, progress, and prospects of this research consortium.

Original language: English
Article number: 20170041
Journal: Linguistics Vanguard
Issue number: 1
Number of pages: 9
Publication status: Published - 2018


  • Automated parsing
  • Corpora
  • Machine learning
  • Multimodality
  • Research consortia

