Hypernetwork Link Prediction Method Based on Fusion of Topology and Attribute Features

Link prediction aims at predicting missing or potential links based on the known information of complex networks. Most existing methods focus on pairwise low-order relationships while ignoring the high-order interaction and the rich attribute information of entities in the actual network, leading to the low performance of the model in link prediction. To mine the cross-modality interactions between the high-order structure and attributes of the network, this paper proposes a hypernetwork link prediction method for fusion topology and attributes (TA-HLP). Firstly, a dual channel coder is employed for jointly learning the structural features and attribute features of nodes. In structural encoding, a node-level attention mechanism is designed to aggregate neighbor information to learn structural patterns effectively. In attribute encoding, the hypergraph is used to refine the attribute features. The high-order relationship between nodes and attributes is modeled based on the node-attribute-node feature update, which preserves the semantic information jointly reflected by nodes and attributes. Moreover, in the joint embedding, a hyperedge-level attention mechanism is introduced to capture nodes with different importance in the hyperedge. Extensive experiments on six data sets demonstrate that this method has achieved a more significant link prediction effect than the existing methods.


Introduction
Various networks can describe a large number of complex systems in the real world. Scholars generally use complex networks to describe real systems. A complex network consists of nodes and connected edges, in which nodes can be used to represent different individuals, and links between two nodes can represent their interactions. However, in many practical problems, these relationships may exist among multiple actors, not only including low-order pairwise relationships. For example, in the coauthorship network [1], multiple authors jointly publish a paper; in the metabolic network [2], multiple chemicals participate in a reaction. Obviously, in these cases, when modeling the properties of these systems with a single network consisting of paired nodes, there will be limitations to varying degrees, and much valuable information will be lost. Hypernetworks provide an effective way to model such high-order topological features. Hypernetworks are the generalization of networks. Different from the connection of pairs of nodes, hyperedges can represent the relationship between multiple nodes, so a hypernetwork can fully represent the higher-order correlation between entities. Scholars have paid extensive attention to hypernetworks in recent years owing to the efficiency and sufficiency of their modeling. For example, in social networks, hypernetworks can intuitively represent the interaction, and message transmission between friend groups, such as personalized recommendation [3]; in the knowledge graph, the hypernetwork can maintain multiple relationships between facts, such as knowledge base completion [4], intelligent question and answer [5], and other tasks. In addition to containing relational network topology information, attribute hypernetworks can also explicitly capture the attribute information of entities. For example, papers in citation networks also have attributes, such as abstract and title. The reasonable use of this cross-modal semantic information can enhance the analytical expression ability of the network.
As an important task in network science, link prediction aims to explore the formation mechanism and evolution law of the network to predict the connection edges that actually exist in the network but cannot be observed or not yet observed. It mines hidden pattern information in data through modeling association between objects, which is widely used in various fields, such as information retrieval of knowledge maps [6], interactive prediction of proteins [7], and accident assessment of traffic networks [8]. Some of the current methods focus on homogeneous information networks [9,10], learning the characteristics of the network from the two dimensions of topology and node attribute information for link prediction, while others focus on heterogeneous information networks [11,12], which takes into account the semantic relevance of different node types and connection relationships in the network compared with the former, and has received more extensive research. However, these methods all follow with interesting low-level semantic information about entities, ignoring the high-order complexity commonly existing in real networks, which has become the limitation of using traditional network modeling methods to solve link prediction tasks. For example, in the paper coauthorship network [1], a valuable scientific research achievement usually involves multiple authors. The traditional research method can only establish a link relationship based on whether the authors have cooperated with each other due to the constraint of the binary relationships. The specific and diversified cooperation information is ignored, including the participation relationship of multiple authors, the detailed attributes of all authors, and even other cooperative associations within the cooperative group. Similarly, in the metabolic network [2], a complete reaction must contain at least three chemical substances. Traditional methods can only establish a relationship based on whether two substances can react. As for the specific information of the reaction, including the attribute information of other participants, their mutual influence, and the inclusion relationship of different reaction groups are only lost by this single flattening mapping. It can be seen that these networks driven by higher-order relationships often contain semantic information from multiple perspectives. How to accurately capture potential semantic information to ensure the integrity of modeling has become the key to link prediction. The hyperlink prediction method can more accurately depict the interaction between entities and solves the problem of information loss in mapping multivariate relations to binary relations. Although many scholars have conducted extensive research on the variable cardinality problem [2,13], the tuple level measurement problem of hyperedges [14,15], and the capture problem of local and global structures [16,17] in the prediction of hypernetwork links, all of which measure the possibility of the existence of hyperedges in terms of the high order of topology, there is less research on how to use the complex interaction between entities and attributes to build deep network models. There is also a higher-order mapping relationship between entities and attributes. Multiple entities may have the same attributes. This potential association often contains the inherent relevance of the network structure. Using the information interaction between entities and attributes can help the model to carry out more expressive embedded learning. Although attribute hypernetwork can capture more abundant semantic information, it is a challenging task. How to embed the content attributes of entities in high-order and complex interactions and then mine the cross-modal associations between data is very essential for the comprehensive and accurate modeling of network features.
In response to these challenges, in this paper, we design a link prediction method oriented to attribute hypernetworks. This method learns both structural and attribute modal information and combines the attention mechanism for more targeted training, which effectively enhances the performance of link prediction. The contributions of this paper are as follows: • We propose a general link prediction framework for the attribute hypernetwork, which integrates the structure and attribute characteristics of entities. Through a dual channel encoder, the interaction between network structure and node attributes is captured, and adopting hypergraph neural networks learns high-order interactions of nodes and attributes so that node information can be effectively transmitted through shared attributes. • We introduce the node level-attention mechanism in structure encoding to model the importance of node neighbors to capture the neighborhood information of nodes; We introduce the hyperedge-level attention mechanism in the joint embedding module to consider the influence of different nodes in each hyperedge to screen the key nodes from a large amount of information. In this way, high-quality node embedding and hyperedge embedding can be obtained. • Numerous experiments on six datasets have verified the efficiency of the TA-HLP model in link prediction. Compared with the latest hyperlink prediction model [2,10,14,[18][19][20][21][22][23][24][25], the TA-HLP model has significantly improved under the AUC and recall indicators. At the same time, ablation experiments have also proved the effectiveness of the joint encode and the attention mechanisms.

Related Work
With the continuous breakthrough of artificial intelligence and network science, link prediction has been extensively studied in machine learning. Traditional link prediction is mainly based on similarity index or network embedding method to judge the probability of links between pairs of nodes. The former primarily uses the scores of static indicators to calculate the probability of link existence, such as Common Neighbor [26], Local Path [27], Katz [28], etc. The network embedding method learns the vector representation of nodes for link prediction tasks. DeepWalk [18], LINE [19], and Node Node2vec [10] define different sampling strategies to embed the obtained node sequence. Some emerging graph neural networks are also used in graph representation learning. They adopt different aggregation methods to map the original information of the network, such as graph convolution networks [29], graph self-encoders GAE [30], VGAE [31], and GraphSAGE [32]. However, these methods are all aimed at the network constructed by the common graph, and the reality often requires modeling between nodes that go beyond pairwise association. Therefore, our approach uses a hypergraph to show and represent the high-level interaction between multiple nodes to comprehensively and effectively capture the rich and diverse topology of the network.
Hypernetwork is an important extension and promotion of ordinary networks, which can more intuitively model the more complex high-order structure inside the system. At present, most research on hypernetworks focuses on embedded learning. The purpose of hypernetwork embedding is to learn a low dimensional embedding space, retain the information of the original network in the embedding space as much as possible, and screen out redundant information and noise information in the data so that the similarity of edge nodes embedded should be higher than that of irrelevant nodes. It is used to perform subsequent tasks such as classification, clustering, link prediction, subgraph segmentation. Recently, hypergraph neural networks [22] and hypergraph convolution networks(HyperGCN) [23] have been proposed to encode higher-order data correlations in hypernetwork structures. In addition, a large amount of research is used to explore the indecomposability of heterogeneous hypernetworks. Tu et al. [14] designed a deep embedding model DHNE to achieve the nondecomposable cluster similarity function. Huang et al. [33] learned a hyper-path-based random walk model HPHG to retain the structural information of the hypernetwork. Yu et al. [15] proposed a goal that is applicable to both uniform and non-uniform hypergraphs with nonnegative embedding vector constraints. In addition, the newly proposed embedding schemes [24,25,34] also focus on the self-learning problem of the hypernetwork. Compared with the fixed learning mechanism of the former, they define different attention modules to achieve dynamic information aggregatiotab, further improving the flexibility and effectiveness of the hypernetwork representation. However, most of these methods only measure the embedding of hyperedges from the topology of the network. Although some neural network methods [22][23][24][25] for hypergraphs consider the attributes of nodes, they only use attributes to initiate the node characteristics. They ultimately learn the vector representation of nodes based on topology, ignoring the deep interaction between entities and attributes, and being unable to fully integrate multivariate attribute information into the measurement of entity features. Therefore, our method captures the content attributes of entities while maintaining the high-order topology between entities and separately encodes the embedded representation of entities and attributes, thus, retaining the semantics jointly reflected by both to the greatest extent. Table 1 summarizes the commonly used symbols and definitions.

Notations and Problem Statement
The number of nodes. n The number of hyperedges. d The dimension of node attributes.
Structural embedding of nodes.
Attribute embedding of nodes.
Embedding of nodes after fusion. Z e ∈ R n×d e Embedding of hyperedges.

Definition 1 (Attribute Hypernetwork
). An attribute hypernetwork can be defined as G = {V, E, X}, including a node set V = {v 1 , · · · , v m }, a hyperedge set E = {e 1 , · · · , e n } and an attribute characteristic matrix of nodes X = [x 1 , . . . , x m ] T . Each e i is a set of partial nodes in V, and each node v i has a d-dimensional attribute vector x i . m = |V| represents the number of nodes, n = |E| represents the number of hyperedges, and d = |x i | represents the dimension of node attributes.
Definition 2 (Structure Incidence Matrix). The topology of the hypernetwork G can be described by a structure incidence matrix S ∈ R m×n , and each element S ij in the matrix is defined as: Definition 3 (Attribute Incidence Matrix). Corresponding to the given node attribute characteristic matrix X, the relationship between nodes and attributes in the hypergraph can be expressed by an attribute incidence matrix H ∈ R m×d , and each element of the matrix H ij is defined as: Definition 4 (Degree Matrix of Nodes and Attributes). Given a node v i ∈ V, its degree is defined as d(v i ) = ∑ a j ∈A H v i , a j . For an attribute a i ∈ A, its degree is defined as d(a i ) = ∑ v j ∈V H v j , a i . According to this, we can get the degree matrix of nodes D v and the degree matrix of attributes D a . Obviously, they are diagonal matrices. The diagonal elements are the degrees of nodes and attributes, respectively, and the rest are all 0. The definitions are as follows: Problem 1 (Attribute Hypernetwork Representation Learning). The purpose of attribute hypernetwork representation learning is to map the nodes of the original network into a deep embedding in a low-dimensional space where the structural and attribute characteristics of the node are retained. Formally, given an attribute hypernetwork G = {V, E, X}, it aims to learn a function f : G → Z SE , Z AE to map the nodes of the original network into potential vectors in low dimensional space. Z SE ∈ R m×d 1 and Z AE ∈ R m×d 2 are, respectively, the learned structure embedding and attribute embedding of nodes.
Problem 2 (Hypernetwork Link Prediction). The problem of hypernetwork link prediction is a tuple (E,Ē). E represents the hyperlink set of a known incomplete hypernetwork, andĒ = 2 V − E represents a candidate set composed of unknown hyperlinks. The task of hypernetwork link prediction is to learn a hyperlink scoring function s : e i → y i ∈ R so that the hyperlink allocation score in the known link set E is as high as possible than that in the unknown link setĒ.

Proposed Method
The proposed TA-HLP model is a deep link prediction model for the attribute hypernetwork. The framework is shown in Figure 1, which is composed of an encoder that jointly learns the structure and attributes of the hypernetwork and a decoder that performs hyperlink prediction. The central processing procedures are as follows: • Based on the topological relationship, the structure encoder adopts a node-level attention mechanism to learn neighbors with different weights, so as to obtain the structural feature embedding of nodes. • Based on attribute information, the attribute encoder uses hypergraph neural network to capture the potential association between entities and attributes to obtain the attribute feature embedding of nodes. • The decoder combines the structure and attribute features in two independent semantic spaces, and then adopts the hyperedge-level attention mechanism to learn the importance of different nodes in each hyperedge. After the weight information is aggregated, the hyperedge is embedded. • Input hyperedge embedding and learn a hyperlink scoring function to judge the hyperedges that have more remarkable similarity with the original network's mode and behavior at the two levels of network structure and content attributes, i.e., the possible missing hyperedges of the original network.

Structure Encoder
Structure encoder embeds entity structure information through feature mapping and aggregates node structure features based on weighted neighbors, effectively capturing the impact of different levels of neighborhood information on node embedding to improve the quality of node representation. The specific process is as follows: First, we utilize a nonlinear transformation to embed different types of nodes from a heterogeneous space into a common potential space, input node feature X, and obtain the feature representation of nodes in the embedded spaceZ SE : σ(•) represents the nonlinear activation function. W SE ∈ R d×d SE and b SE ∈ R d SE ×1 represent the weight parameters and bias parameters to be learned for the nonlinear transformation layer respectively. Then, the node-level attention mechanism is employed to characterize the relative importance between nodes and their neighbors. The importance weight ω ij of node v j to node v i is expressed as 2 ×d SE and a ∈ R d 1 ×1 are the parameters to be trained for the attention layer. indicates the concatenate operation. After softmax normalization: Based on this importance weight, a weighted message transmission mechanism is adopted; that is, a new embedded representation of the node Z SE v i is obtained through the weighted sum of the feature vectors learned previously:

Attribute Encoder
In this paper, a hypergraph is employed to model the higher-order relationship between nodes and attributes. Entities with the same attributes are mapped to a hyperedge. The incidence matrix H ∈ R m×d can describe the many-to-many relationship between nodes and attributes. For any two entities, the more hyperedges associated with them, the more similarities will appear at the attribute level, and the closer they should be in the embedded feature space, which is consistent with the core idea of hypergraph embedding. Therefore, in the attribute encoding module, we use the transformation and aggregation mechanism of hypergraph neural network to re-encapsulate and represent entities and attributes, so that node information can be more spread through shared attributes. Compared with ordinary deep learning methods, we capture attribute features through hypernetworks, and entities and attributes can be updated bi-directionally using hyperedges, which is no longer just simple aggregation. Each layer of the hypergraph neural network can extract the potential associations of entities in attribute dimensions through convolution operation. Figure 2 shows the overall structure of the attribute encoder. The specific process is as follows: The hypergraph neural network learns attribute embedding through aggregation node embedding. Input the node attribute matrix X, and obtain the attribute embedded Y AE1 after linear transformation of a hyperedge convolution layer: Similarly, hypergraph neural network updates node embedding through aggregation attribute embedding. According to the obtained attribute embedding Y AE1 ∈ R d×d 2 , update the node embedding as X AE2 ∈ R m×d 2 after each hyperedge convolution. W AE2 ∈ R d 2 ×d 2 is the filter matrix to be learned by the second layer hypergraph convolution network. The updated formula is as follows: The attribute encoder based on a hypergraph neural network superimposes a twolayer hypergraph convolution network. It updates the nodes and attributes by collecting node features and attribute features, finally obtaining the embedded representation Z AE of nodes on attribute features.

Feature Fusion
Since the structure encoding module and attribute encoding module get the representation of nodes in two semantic spaces, before decoding, entities from different encoding modules are mapped into the same vector space through feature fusion. Z SE represents the embedded representation of the node on the structural characteristics, Z AE represents the embedded representation of the node on the attribute characteristics, and represents the concatenate operation. The final embedded representation of the node Z V is obtained through a linear transformation: is the parameter that the feature fusion module needs to learn.

Hyperedge Embedding
By fusing the outputs of the two encoders, the embedded representation of each node contains semantic information of different dimensions, which makes the nodes that make up the hyperedge have different structures and attributes, resulting in a large difference in their impact on the hyperedge. Therefore, it is necessary to design a hyperedge-level attention module to mine the importance of different nodes, so as to aggregate multisource information selectively and adaptively determine the most valuable node in the hypergraph propagation process. Specifically, first, we obtained the activation vector of node v i through a nonlinear function, then calculated its product with the learning parameter w, and finally standardized the product through a softmax layer. The formula for calculating the importance weight of node v i is as follows: σ 1 (•) and σ 2 (•) represent different nonlinear activation functions, and W EA and w are learning parameters with the size of d EA × d v and d EA × 1, respectively. Finally, the embedded representation of the hyperedge is obtained by the weighted sum of the embedded vector of the node and its weight coefficient:

Hyperlink Prediction
The purpose of the decoder is to learn a hyperlink scoring function to calculate the similarity of each hyperlink and screen the hyperlinks that best match the original network. The definition is as follows: W SC ∈ R d e ×d v and b SC ∈ R d e ×1 are the training parameters of the hyperlink prediction layer.

Loss Function
E 1 represents observed hyperlinks, that is, known relationships between vertices, 2 V − E 1 represents a group of unknown relationships, including some actual but unobserved hyperlinks, and E 2 represents a group of sampling sets from unknown relationships. The purpose of optimization is to maximize the number of hyperlinks scoring higher than the average score of E 2 in E 1 , that is, to make the observed hyperlinks scoring higher than the unobserved hyperlinks as much as possible. The definition is as follows: (•)represents a non-decreasing function (x) = log(1 + exp(x)), and functionis used here. For each known hyperedge e i ∈ E 1 , we generate a corresponding hyperedge e j ∈ E 2 . One-half of the nodes in e j come from the random sampling of e i , and the other half is from the random sampling of the remaining node set V − e i . To avoid experimental deviation caused by hyperlink size, we ensure that the number of nodes of each known hyperedge e i and the corresponding unknown hyperedge is the same e j . Algorithm 1 summarizes the optimization learning process of the TA-HLP model.

Algorithm 1 TA-HLP.
Input: S ∈ R m×n : Structure incidence matrix, H ∈ R m×d : Attribute incidence matrix, X: Attribute characteristic matrix, T: Max epochs Output: Z v ∈ R m×d v : Embedded representation of nodes, Z e ∈ R n×d e : Embedded representation of hyperedges 1: for epoch=1 to T do 2: Implement semantic feature mapping of nodes via Equation (5) 3: Calculate the importance weight of the neighborhood of the nodes via Equation (7) 4: Adopt weighted message passing mechanism to update the structure embedding of nodes via Equation (8) 5: Use hypergraph neural network to update the attribute embedding of nodes via Equation(10) 6: Combine structure and attribute features to obtain the final embedding of nodes via Equation (11)   7: Learn the importance weight of different nodes in the same hyperedge via Equation (12) 8: Aggregating node information with different weights to obtain embedding of hyperedge via Equation (13) 9: Construct a scoring function to predict the missing hyperlink via Equation (14) 10: Minimize loss function (15) to update model parameters 11: end for

Datasets
This paper uses real hypernetwork datasets to test the prediction ability of the designed model, including two coauthored network datasets: CoauthorshipCora (represented by CC) and CoauthorshipDBLP (represented by CD), two citation network datasets: Societa-tionCiteseer (represented by CCI) and SocietationPubmed (represented by CP), one organic response network dataset: OrganicReaction (represented by OR), and one restaurant access dataset: YelpRestaurant (represented as YR). The statistical scale of the data set is shown in Table 2, which is described in detail as follows: Coauthorship network [1,35]: Coauthorship datasets provide a way to analyze research collaboration. The author is represented by nodes, and the cooperation relationship participating in the same paper is characterized by hyperedges. The purpose is to anticipate other potential collaborations.
Citation network [35]: the paper is represented by nodes, and the features of each paper are the word bag representation of the document. In this paper, we generate hyperedges based on the hyperedge extension strategy proposed in the paper [22]. The purpose is to speculate on unknown reference relationships.
Organic reaction network [2]: The reconstruction of a chemical reaction network is a fundamental problem in computational chemistry. Chemical substances are represented by nodes, and the reaction relationship between organic substances is represented by hyperedges. The basic properties of substances constitute the characteristics of nodes. The purpose is to reconstruct a complete reaction network.
Restaurant access network [36]: The restaurant is represented by a node, and the hyperedge is formed by selecting the restaurant visited by the same user. Node features are composed of a hotspot code of latitude, longitude, city, and state, and the word codes of the first 1000 words in the corresponding restaurant name. The purpose is to recommend the favorite restaurant groups of users.

Baseline Methods
This paper selects the mainstream network embedding method and hyperlink prediction method as the baseline method.
DeepWalk [18]: This method obtains the node sequence by truncated random walk and then uses word2vec to transform the nodes into low-dimensional vectors.
LINE [19]: This method defines the first-order and second-order similarity of nodes and then minimizes the KL divergence representing the similarity and the actual similarity to obtain the vector representation of nodes.
node2vec [10]: In this method, the random walk strategy is weighed by parameters during the sampling process to generate high-quality node representation.
The above methods [10,18,19] does not provide a computer system for hyperlink similarity. This paper uses the following scoring function to measure the difference between hyperlinks: |e| is the number of nodes in the hyperedge, and x i is the learned node representation. CMM [2]: This method alternates the nonnegative matrix decomposition and least square matching in the adjacent space of vertices to find the most suitable hyperlink for incomplete networks.
C3MM [20]: This method introduces a "clique closure" hypothesis based on the CMM method to improve the performance of the algorithm.
DHNE [14]: The method defines a nonlinear bundle similarity function to keep the nondecomposability of the hypernetwork and designs a deep automatic encoder to capture the structure of the hyper network.
NHP [21]: NHP-U is a neural hyperlink predictor that uses GCN to refine the embedding of each vertex.
HGNN [22]: This method uses the hypergraph convolution operation to learn the hidden layer representation of higher-order data structures to code the hypergraph data correlation.
HyperGCN [23]: The nonlinear Laplacian operator is introduced to define the GCN on the hypergraph.
Hyper-SAGNN [24]: This method proposes a new self-attention mechanism to solve the representation learning problem of heterogeneous hypergraphs.
Hyper-Atten [25]: This method adds an attention module to the incidence matrix of the hypergraph to dynamically update the internal relationship between vertices.
Since the above methods [22,23,[23][24][25] do not perform the hyperlink prediction task, the hyperlink scoring function proposed in this paper is used for hyperlink prediction.

Evaluation Indicators
To comprehensively and accurately evaluate the prediction ability of the algorithm, this paper selects the common link prediction indicators AUC score (the Area Under the receiver operating characteristic Curve) and recall. AUC is a commonly used index to measure link prediction accuracy as a whole. It is defined as follows: pred(•)is the prediction score of the model for the genuinely missing hyperlink and the nonexistent hyperlink. N and M represent the numbers of two types of samples, respectively. The recall is an indicator to measure whether the algorithm prediction is comprehensive, that is, how many missing hyperlinks in the network are predicted. Here, we rank the prediction scores from high to low and select the prediction results of the first k (k is half of the actual number of missing) to calculate the Recall@k (represented by R@k), which is defined as follows: n is the correct predicted result in the top k rankings, and L is the actual number of missing hyperlinks in the network.

Experimental Design and Analysis
In the experiment, we randomly hide 20% of the hyperedges are randomly hidden as the test set, 70% of the hyperedges and the randomly generated unknown hyperedges are employed for model training, and 10% of the hyperedges are used as the verification set to optimize the hyper-parameters. In the training process, we use the Adam algorithm [37] to optimize our loss function and set the learning rate to 0.001. In the hyperedge-level attention module, σ 1 (•) selects the softmax function, σ 2 (•) selects the tanh function, and the rest of the nonlinear activation functions σ(•) select ReLU. Set the same embedding dimension d 1 = d 2 = 128 in both the structure and attribute encoder and use dropout to avoid overfitting, with a value of 0.5. All parameters in the algorithm are optimized through a grid search strategy. To eliminate the influence of random errors and evaluate the experimental results as accurately as possible, 100 independent experiments were carried out on each network, and the average results were obtained, as shown in Table 3. The AUC score and R@k of the TA-HLP model and eight benchmark models are analyzed below.

1.
From the overall observation, compared with all baseline methods, TA-HLP has achieved the most significant effect on all indicators under different datasets. The AUC score increased by 5-10 percentage points, and the recall increased by 5-7 percentage points. The superiority of the model in link prediction performance is verified. TA-HLP can enhance the expression learning ability of the original network and achieve more comprehensive and accurate prediction.

2.
DeepWalk, LINE, and node2vec are all traditional complex network embedding methods. To predict hyperlinks, we calculate the average value of pairwise similarity to represent the similarity of the entire hyperedge, neither maintaining the irresolvability of the hyperedge nor ignoring the content attributes of the node. Although CMM and DHNE use hypergraphs to model the relationship between nodes that transcend pairwise connections, they ignore the attribute information of nodes. Therefore, the performance of these two methods on data sets is poor, which proves that both higher-order structure and attribute modalities are necessary for link prediction.
Their complementary relationship can enhance the depth of network embedding and better characterize network information. C3MM believes that the newly generated hyperlinks in the network are more likely to have evolved from the recently closed cliques, mainly from the perspective of temporal evolution. Similarly, there is no systematic learning for the attribute information of data, so our data is only slightly improved compared with the CMM method.
3. NHP, HGNN, HyperGCN, Hyper-SAGNN, and Hyper-Atten all combine the deep neural network, and also consider the network structure and attribute characteristics. The experimental results also prove their effectiveness. However, in the modeling process, they simply use the attribute characteristics of the convolutional layer aggregation node. They do not fully use the potential higher-order interaction between entities and attributes, resulting in poor prediction performance. To fully mine the association between nodes and attributes, this model adopts a hypergraph structure to model the correlation between nodes and attributes. Each attribute is defined as a hyperedge and nodes with the same attribute form the adjacency relationship. At the same time, the deep association between nodes and attributes is captured based on the "node attribute node" transformation. We also introduce a two-layer attention mechanism for reinforcement learning of important nodes, so the experimental results of TA-HLP are obviously better than other algorithms. 4.
In addition, the Hyper-Atten model has the best performance among all baseline methods. This may be because the Hyper-Atten model additionally applies an attention module to the incidence matrix to update the degree of association between nodes in real-time. It can be seen that more attention to important content is effective. However, compared with the Hyper-Atten model, our model can distinguish the importance of different levels in a more fine-grained way. It measures the impact of different components on network embedding from the perspectives of nodes and hyperedges, respectively, so that the embedded representation of the network can more truly restore the connotation of structure and semantics, which is also an important reason for the significant improvement of our model performance.

Ablation Experiment
To further illustrate the effectiveness of the joint encoding module and the two attention mechanisms in the TA-HLP model, we conducted the following ablation experiments. This paper designs four variants of the algorithm. TA-HLP-1 removes the structure coding module from the original model, and TA-HLP-2 removes the attribute coding module from the original model. Both variants only learn the structure or attribute characteristics of nodes. Compared with TA-HLP itself, these two methods show poor performance. This shows that compared with a single learning module, the dual channel encoder proposed in this paper can more effectively learn the vector representation of entities. They capture the similarity of entities from different dimensions, retain more diversified semantic information in the original network, and have obvious advantages in improving link prediction performance. TA-HLP-3 removes the node-level attention mechanism and only studies the importance of each node for hyperedge embedding. TA-HLP-4 removes the hyperedge level attention mechanism and considers that the nodes in the same hyperedge have the same degree of influence and only consider the contribution of node neighbors to the structure pattern. Figure 3 shows the experimental results proving the effectiveness of the two attention mechanisms. The prediction performance of the two variants on each data is lower than that of the proposed algorithm, which proves the advantages of the two attention strategies in the encoding and decoding modules. They can help the model extract more critical information, carry out targeted learning of node embedding and hyperedge embedding, and, thus, improve the expression ability of the algorithm.

Attention Mechanism Analysis
When learning embedding vectors of hyperedges, we consider the importance of different nodes and assign them different weights. To better understand the meaning of weight, we conduct an in-depth analysis of the hyperedge-level attention mechanism through a specific case. CoauthorshipDBLP is a network dataset of co-authored papers, which records the publication of papers from four different research fields. For details, refer to Section 5.1 of the article. Taking hyperedge as an example, Figure 4 shows the topological relationship of e 543 , and Figure 5 shows the attention weights of different nodes in the embedded representation of e 543 . e 543 is a paper published by a machine in the field of learning. It includes four authors v 1013 , v 2152 , v 3203 , and v 29019 , respectively. The weight of the first three nodes is larger, and the weight of the last node is smaller. According to the bag feature vector of the paper, the authors represented by v 1013 , v 2152 , and v 3203 have published a large number of papers in the field of machine learning, and they also have cooperative relations with other papers in the field, so they have more similarities with the category and affiliation, and make greater contributions to the publication of the paper e 543 . However, most of the papers published by v 29019 are in the field of information retrieval, which is just an extension of his research field. Therefore, the features of v 29019 cannot well express the information of hyperedge e 543 . v 29019 is not a key node to form hyperedge embedding and should be given a lower weight. According to the above analysis, the hyperedge-level attention weight can evaluate the contribution of nodes to the hyperedge, and assign higher weights to some valuable nodes.

Parameter Sensitivity Analysis
To analyze the impact of different parameters on the TA-HLP model, we conducted experiments on node embedding vectors of different dimensions on six datasets. The results are shown in Figure 6. In general, before the TA-HLP model achieves the best effect, with the increase in embedding dimension, more and more node features are captured by the encoder, and the prediction results of the model are steadily promoted; however, when the optimal performance is reached, too many embedded dimensions may lead to information redundancy, which will not only increase the computational burden of the model but also interfere with the fitting ability of the model, resulting in a continuous decline in prediction performance. When the embedding dimension is 64 or 128, the TA-HLP model achieves the best effect. When the dimensions are 8 and 1024, the impact of the model is poor. It can be seen that too few or too many embedding dimensions may result in insufficient or invalid information used by the model, which may lead to overfitting and poor effect.

Conclusions and Future Work
In this paper, we study the problem of link prediction in attribute hypernetworks. Aiming at how to collect more semantic information from the original network for comprehensive and multivariate reconstruction, a general framework is proposed to learn the network structure and attribute information jointly. The encoder consists of a structure encoder based on node level attention mechanism and an attribute encoder based on the hypergraph neural network. It extracts entity features from the topological structure and attributes content, respectively, fully uses the semantic association between nodes and attributes, and captures the cross-modal interaction of higher-order structure and node content. In addition, the hyperedge-level attention mechanism is introduced in the decoder to aggregate nodes with different contributions in the hyperedge. The experimental results also demonstrate that the proposed method can deeply mine the composite interaction characteristics of the network and effectively improve the link prediction performance of the model. This shows that hypernetworks can not only effectively model high-order mappings between entities and relationships, but also accurately depict interactions between entities and attributes, which further expands the research direction of hypernetworks and is a very valuable work for in-depth exploration of the issue of "why the complex system has complexity".
In the future, it is worth further exploring how to design an embedding method that integrates attribute-level attention mechanisms. At the same time, extending the proposed method to temporal networks to perceive the evolution pattern of structure and attribute characteristics is also an important research topic of attribute hypernetwork prediction.