TY - GEN
T1 - Uniform Density in Linguistic Information Derived from Dependency Structures
AU - Richter, Michael
AU - Farrell, Mariah
AU - Kölbl, Max
AU - Kyogoku, Yuki
AU - Philipp, J. Nathanael
AU - Yousef, Tariq
AU - Heyer, Gerhard
AU  - Himmelmann, Nikolaus P.
PY - 2022
Y1 - 2022
N2  - This pilot study addresses the question of whether the Uniform Information Density (UID) principle can be confirmed for eight typologically diverse languages. The lexical information of words is derived from dependency structures, both in the sentences preceding the sentence in which the target word occurs and within that sentence itself. Dependency structures thus serve as a realisation of the extra-sentential contexts for deriving information formulated in the surprisal model. Only subject, object and oblique, i.e., the level directly below the verbal root node, were considered. UID states that in natural language the variance of information, and the information jumps from word to word, should be small so as not to make the processing of a linguistic message an insurmountable hurdle. We observed cross-linguistically different information distributions but an almost identical UID, which provides evidence for the UID hypothesis and supports the assumption that dependency structures can function as proxies for extra-sentential contexts. However, for the dependency structures chosen as contexts, the information distributions in some languages did not differ statistically significantly from the distributions of a random corpus. This might be an effect of the low complexity of the dependency structures in our model, so lower hierarchical levels (e.g., phrases) should also be considered.
AB  - This pilot study addresses the question of whether the Uniform Information Density (UID) principle can be confirmed for eight typologically diverse languages. The lexical information of words is derived from dependency structures, both in the sentences preceding the sentence in which the target word occurs and within that sentence itself. Dependency structures thus serve as a realisation of the extra-sentential contexts for deriving information formulated in the surprisal model. Only subject, object and oblique, i.e., the level directly below the verbal root node, were considered. UID states that in natural language the variance of information, and the information jumps from word to word, should be small so as not to make the processing of a linguistic message an insurmountable hurdle. We observed cross-linguistically different information distributions but an almost identical UID, which provides evidence for the UID hypothesis and supports the assumption that dependency structures can function as proxies for extra-sentential contexts. However, for the dependency structures chosen as contexts, the information distributions in some languages did not differ statistically significantly from the distributions of a random corpus. This might be an effect of the low complexity of the dependency structures in our model, so lower hierarchical levels (e.g., phrases) should also be considered.
KW - Dependency Structures
KW - Uniform Information Density
KW - Universal Dependencies
U2 - 10.5220/0010969600003116
DO - 10.5220/0010969600003116
M3 - Article in proceedings
SN - 9789897585470
SP - 496
EP - 503
BT - Proceedings of the 14th International Conference on Agents and Artificial Intelligence
A2 - Rocha, Ana Paula
A2 - Steels, Luc
A2 - van den Herik, Jaap
ER -