Keyword extraction in German: Information-theory vs. Deep learning

Max Kölbl, Yuki Kyogoku, J. Nathanael Philipp, Michael Richter, Clemens Rietdorf, Tariq Yousef

Publication: Chapter in book/report/conference proceeding; Conference contribution in proceedings; Research; peer-reviewed

Abstract

This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) an unsupervised method based on information theory (Shannon, 1948), for which we employed (i) a bigram model, (ii) a probabilistic parser model (Hale, 2001) and (iii) an innovative model which utilises topics as extra-sentential contexts for calculating the information content of words; and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we employed TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all other models, including the TextRank and TF-IDF baselines. In contrast, the RNN performed poorly. We take the results as first evidence that (i) information content can be employed for keyword extraction tasks and thus has a clear correspondence to the semantics of natural languages, and (ii) that, as a cognitive principle, the information content of words is determined from extra-sentential contexts, that is to say, from the discourse of words.
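To make the notion of information content in (A)(i) concrete, the following is a minimal sketch, not the authors' implementation. It assumes add-one smoothing, a toy pre-tokenised corpus, and ranking by mean surprisal per word type; the abstract specifies none of these, and the function names `bigram_information` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def bigram_information(sentences):
    """Mean Shannon information (bits) per word type under a bigram model.

    Assumption: add-one (Laplace) smoothing; "<s>" marks the sentence start.
    """
    vocab = {tok for sent in sentences for tok in sent}
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    totals = Counter()
    occurrences = Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent)
        for prev, word in zip(padded, padded[1:]):
            # P(word | prev) with add-one smoothing over the vocabulary
            p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
            totals[word] += -math.log2(p)  # information content I(w) = -log2 P(w | context)
            occurrences[word] += 1
    return {w: totals[w] / occurrences[w] for w in totals}


def top_keywords(sentences, k):
    """Return the k word types with the highest mean information content."""
    info = bigram_information(sentences)
    return sorted(info, key=info.get, reverse=True)[:k]


# Hypothetical toy corpus, already tokenised and lower-cased.
corpus = [
    ["der", "hund", "bellt"],
    ["der", "hund", "rennt"],
    ["der", "kater", "rennt"],
]
print(top_keywords(corpus, 2))
```

In this sketch a word that is hard to predict from its left neighbour receives a high score, while a highly predictable function word such as "der" receives a low one. The topic model (A)(iii) described in the abstract instead conditions on extra-sentential context, which this bigram sketch does not attempt to reproduce.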

Original language: English
Title: ICAART 2020 - Proceedings of the 12th International Conference on Agents and Artificial Intelligence
Editors: Ana Rocha, Luc Steels, Jaap van den Herik
Number of pages: 6
Publisher: SCITEPRESS Digital Library
Publication date: 2020
Pages: 459-464
ISBN (Electronic): 9789897583957
Status: Published - 2020
Published externally: Yes
Event: 12th International Conference on Agents and Artificial Intelligence, ICAART 2020 - Valletta, Malta
Duration: 22 Feb 2020 to 24 Feb 2020

Conference

Conference: 12th International Conference on Agents and Artificial Intelligence, ICAART 2020
Country/Territory: Malta
City: Valletta
Period: 22/02/2020 to 24/02/2020
Sponsor: Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
Name: International Conference on Agents and Artificial Intelligence
ISSN: 2184-433X

Bibliographical note

Publisher Copyright:
Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
