Data and code for: Topic-aware Heterogeneous Graph Neural Network for Link Prediction
This record contains the data and code for the CIKM 2021 paper “Topic-aware Heterogeneous Graph Neural Network for Link Prediction”.
Heterogeneous graphs (HGs), consisting of multiple types of nodes and links, can characterize a variety of real-world complex systems. Recently, heterogeneous graph neural networks (HGNNs), as a powerful graph embedding method that aggregates heterogeneous structure and attribute information, have attracted considerable attention. Despite their ability to capture rich semantics that reveal different aspects of nodes, HGNNs remain at a coarse-grained level that merely exploits structural characteristics. In fact, the rich unstructured text content of nodes also carries latent, finer-grained semantics arising from multi-facet topic-aware factors, which fundamentally explain why nodes of different types connect and form a specific heterogeneous structure. However, little effort has been devoted to factorizing these semantics. In this paper, we propose a Topic-aware Heterogeneous Graph Neural Network, named THGNN, to hierarchically mine topic-aware semantics for learning multi-facet node representations for link prediction in HGs. Specifically, our model applies an alternating two-step aggregation mechanism, consisting of intra-metapath decomposition and inter-metapath mergence, which distinctively aggregates rich heterogeneous information according to the inferred topic-aware factors and preserves hierarchical semantics. Furthermore, a topic prior guidance module is designed to maintain the quality of the multi-facet topic-aware embeddings by relying on global knowledge from the unstructured text content in HGs, which simultaneously improves both performance and interpretability. Experimental results on three real-world HGs demonstrate that the proposed model effectively outperforms state-of-the-art methods on the link prediction task, and show the potential interpretability of the learned multi-facet topic-aware representations.
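The abstract describes the alternating two-step aggregation only at a high level. As a rough illustration, here is a minimal pure-Python sketch built on our own assumptions; it is not the THGNN implementation contained in this record. Each embedding is split into K equal chunks that stand in for topic-aware factors. Intra-metapath decomposition is approximated by softly routing each metapath neighbor to factors with a softmax over factor-wise cosine similarity. Inter-metapath mergence is approximated by per-factor softmax attention against query vectors, and a candidate link is scored by a sigmoid of the summed factor-wise inner products. The helper names (`intra_metapath`, `inter_metapath`, `link_score`), the example metapaths, the toy vectors, and the query vectors are all hypothetical, and the topic prior guidance module is omitted.

```python
# Hypothetical sketch of factor-wise, two-step metapath aggregation.
# NOT the THGNN code from this record; all names and data are illustrative.
import math
from typing import List

Vector = List[float]


def normalize(x: Vector) -> Vector:
    """L2-normalize a vector (returned unchanged if its norm is zero)."""
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x] if n > 0 else list(x)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_factors(x: Vector, k: int) -> List[Vector]:
    """Split a vector into k equal, normalized chunks (one per assumed factor)."""
    size = len(x) // k
    return [normalize(x[i * size:(i + 1) * size]) for i in range(k)]


def intra_metapath(target: Vector, neighbors: List[Vector], k: int) -> List[Vector]:
    """Step 1 (assumed): softly route each neighbor to factors, aggregate per factor."""
    z_v = split_factors(target, k)
    agg = [list(c) for c in z_v]  # start from the target's own factor chunks
    for u in neighbors:
        z_u = split_factors(u, k)
        # Soft assignment of neighbor u to each factor by cosine similarity.
        p = softmax([dot(z_u[j], z_v[j]) for j in range(k)])
        for j in range(k):
            agg[j] = [a + p[j] * b for a, b in zip(agg[j], z_u[j])]
    return [normalize(c) for c in agg]


def inter_metapath(per_path: List[List[Vector]], queries: List[Vector]) -> List[Vector]:
    """Step 2 (assumed): per factor, attention-weighted merge across metapaths."""
    merged = []
    for j, q in enumerate(queries):
        w = softmax([dot(path[j], q) for path in per_path])
        combined = [
            sum(w[p] * per_path[p][j][i] for p in range(len(per_path)))
            for i in range(len(q))
        ]
        merged.append(normalize(combined))
    return merged


def link_score(h_u: List[Vector], h_v: List[Vector]) -> float:
    """Sigmoid of the summed factor-wise inner products."""
    s = sum(dot(a, b) for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-s))


# Toy usage: K = 2 factors, 4-dim inputs, two hypothetical metapaths.
K = 2
author = [1.0, 0.0, 0.0, 1.0]
apa_neighbors = [[0.9, 0.1, 0.2, 0.8], [0.0, 1.0, 1.0, 0.0]]  # e.g. author-paper-author
apvpa_neighbors = [[1.0, 0.2, 0.1, 0.9]]  # e.g. author-paper-venue-paper-author
queries = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical attention queries, one per factor

h_a = inter_metapath(
    [intra_metapath(author, apa_neighbors, K), intra_metapath(author, apvpa_neighbors, K)],
    queries,
)
h_b = inter_metapath([intra_metapath(apa_neighbors[0], [author], K)], queries)
print(round(link_score(h_a, h_b), 3))
```

Keeping the factor chunks separate through both steps is what lets each factor's contribution to the final score be inspected on its own, which loosely mirrors the interpretability motivation in the abstract; a real model would learn the projections and queries rather than fix them by hand.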
Confidential or personally identifiable information
- I confirm that the uploaded data has no confidential or personally identifiable information.