TY - GEN
T1 - Representation Learning from Text and Structured Data
AU - Zhang, Rui
PY - 2022/5/25
Y1 - 2022/5/25
N2 - Text and networks, as two common forms of data, often appear together in descriptions of diverse real-world applications such as review systems, social networks, and citation networks. As the demand for data analytics continues to grow, how to represent text and network data effectively and efficiently has become a critical research issue. To address this problem, various machine learning models have been proposed for text and network representation learning, but most of them rely on large amounts of manually labeled training samples, complex systems, and/or high-dimensional vectors to improve the accuracy and precision of representations, which often brings new challenges in computation and storage costs for both upstream and downstream applications. This thesis addresses the above challenges and makes contributions to representation learning on text and text-attributed network data. For text representation learning, a label-semantic augmented multi-label learning model is proposed to categorize text-based publications with a hierarchical category structure; it learns representations of publications and categories, recognizes and passes their matching information hierarchically, and thereby achieves better hierarchical-category predictions. For text-attributed network representation learning, a meta-path-based embedding method is first developed that learns low-dimensional representations for target-typed nodes from their text attributes and topological structures through a cascaded self-supervised mechanism. Moreover, to overcome the limitation of preset meta-paths and reduce the extra learning cost, we also propose a self-supervised meta-path-free algorithm with relation-based neighbor-graph contrastive learning, which produces global node representations by encoding nodes and relations of all types. These representations can be used for a variety of downstream tasks and outperform state-of-the-art baselines. Overall, this thesis provides a comprehensive review of existing representation learning methods and proposes several novel deep-learning-based approaches that produce more effective and efficient representations for text and networks. The contributions are empirically validated on several real-world datasets and tasks.
AB - Text and networks, as two common forms of data, often appear together in descriptions of diverse real-world applications such as review systems, social networks, and citation networks. As the demand for data analytics continues to grow, how to represent text and network data effectively and efficiently has become a critical research issue. To address this problem, various machine learning models have been proposed for text and network representation learning, but most of them rely on large amounts of manually labeled training samples, complex systems, and/or high-dimensional vectors to improve the accuracy and precision of representations, which often brings new challenges in computation and storage costs for both upstream and downstream applications. This thesis addresses the above challenges and makes contributions to representation learning on text and text-attributed network data. For text representation learning, a label-semantic augmented multi-label learning model is proposed to categorize text-based publications with a hierarchical category structure; it learns representations of publications and categories, recognizes and passes their matching information hierarchically, and thereby achieves better hierarchical-category predictions. For text-attributed network representation learning, a meta-path-based embedding method is first developed that learns low-dimensional representations for target-typed nodes from their text attributes and topological structures through a cascaded self-supervised mechanism. Moreover, to overcome the limitation of preset meta-paths and reduce the extra learning cost, we also propose a self-supervised meta-path-free algorithm with relation-based neighbor-graph contrastive learning, which produces global node representations by encoding nodes and relations of all types. These representations can be used for a variety of downstream tasks and outperform state-of-the-art baselines. Overall, this thesis provides a comprehensive review of existing representation learning methods and proposes several novel deep-learning-based approaches that produce more effective and efficient representations for text and networks. The contributions are empirically validated on several real-world datasets and tasks.
U2 - 10.21996/39jm-9m60
DO - 10.21996/39jm-9m60
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Naturvidenskabelige Fakultet
CY - Odense
ER -