Annotating Emoticons and Emojis in a German-Danish Social Media Corpus for Hate Speech Research

Eckhard Bick

    Research output: Contribution to journal, Journal article, Research, peer-review

    Abstract

    This paper presents and evaluates the emoticon/emoji annotation of a large German-Danish Social Media corpus for hate speech research. Overall tagging was more accurate for emojis (98-99%) than for text emoticons (91-92%), and slightly better for German than for Danish. We discuss problems and strategies involved in the recognition and linguistic annotation of emoticons and emojis, and show how an emoticon classification system can be used to highlight interesting differences between German and Danish Twitter, as well as between the background corpus on the one hand and tweets targeting the immigrant/refugee minorities on the other. Using concrete examples, we illustrate how the annotation facilitates corpus inspection and how certain emoticon types (e.g. ‘wink’ and ‘skeptical’) can help to identify otherwise inaccessible examples of non-direct hate speech. Finally, we use emoji-informed word embedding to investigate the emotional content of equivalent immigration key words in German and Danish.
    Original language: English
    Journal: RASK – International journal of language and communication
    Volume: 52
    Pages (from-to): 1-20
    ISSN: 0909-8976
    Publication status: Published - 2020
