Abstract
This paper presents and evaluates the emoticon/emoji annotation of a large German-Danish social media corpus for hate speech research. Overall, tagging was more accurate for emojis (98-99%) than for text emoticons (91-92%), and slightly better for German than for Danish. We discuss problems and strategies involved in the recognition and linguistic annotation of emoticons and emojis, and show how an emoticon classification system can be used to highlight interesting differences between German and Danish Twitter, as well as between the background corpus on the one hand and tweets targeting immigrant/refugee minorities on the other. Using concrete examples, we illustrate how the annotation facilitates corpus inspection and how certain emoticon types (e.g. ‘wink’ and ‘skeptical’) can help to identify otherwise inaccessible examples of indirect hate speech. Finally, we use emoji-informed word embeddings to investigate the emotional content of equivalent immigration key words in German and Danish.
Original language | English |
---|---|
Journal | RASK – International journal of language and communication |
Volume | 52 |
Pages (from-to) | 1-20 |
ISSN | 0909-8976 |
Publication status | Published - 2020 |