Representation Learning from Text and Structured Data

Rui Zhang

Research output: Thesis, Ph.D. thesis

Abstract

Text and networks are two common forms of data that often appear together in real-world applications such as review systems, social networks, and citation networks. As the demand for data analytics continues to grow, how to represent text and network data effectively and efficiently has become a critical research issue. Various machine learning models have been proposed for text and network representation learning, but most of them rely on large numbers of manually labeled training samples, complex systems, and/or high-dimensional vectors to improve the accuracy and precision of the representations, which often raises computation and storage costs in both upstream and downstream applications. This thesis addresses these challenges and makes contributions to representation learning on text and text-attributed network data.


For text representation learning, a label-semantic augmented multi-label learning model is proposed to categorize text-based publications under a hierarchical category structure. The model learns representations of publications and categories, recognizes their matching information and passes it hierarchically, and as a result achieves better hierarchical-category predictions.


For text-attributed network representation learning, a meta-path-based embedding method is first developed, which learns low-dimensional representations for target-typed nodes from their text attributes and topological structures through a cascaded self-supervised mechanism. Moreover, to overcome the limitation of preset meta-paths and reduce the extra learning cost, we also propose a self-supervised meta-path-free algorithm with relation-based neighbor-graph contrastive learning, which produces global node representations by encoding nodes and relations of all types. These representations can be used for a variety of downstream tasks and outperform state-of-the-art baselines.


Overall, this thesis provides a comprehensive review of existing representation learning methods and proposes several novel approaches based on deep learning to produce much more effective and efficient representations for text and networks.
The contributions are empirically validated on several real-world datasets and tasks.
Original language: English
Awarding Institution
  • University of Southern Denmark
Supervisors/Advisors
  • Zimek, Arthur, Principal supervisor
  • Schneider-Kamp, Peter, Co-supervisor
Date of defence: 20. May 2022
Place of Publication: Odense
Publication status: Published - 25. May 2022

Note re. dissertation

Print copy of the thesis is restricted to reference use in the Library.
