MEAHNE: miRNA–Disease Association Prediction Based on Semantic Information in a Heterogeneous Network

Correct prediction of potential miRNA–disease pairs can considerably accelerate the experimental process in biomedical research. However, many methods cannot effectively learn the complex information contained in multisource data, limiting the performance of the prediction model. A heterogeneous network prediction model (MEAHNE) is proposed to make full use of the complex information contained in multisource data. To fully mine the potential relationship between miRNA and disease, we collected multisource data and constructed a heterogeneous network. After constructing the network, we mined potential associations in the network through a designed heterogeneous network framework (MEAHNE). MEAHNE first learned the semantic information of the metapath instances, then used the attention mechanism to encode the semantic information as attention weights and aggregated nodes of the same type using the attention weights. The semantic information was also integrated into the node. MEAHNE optimized parameters through end-to-end training. MEAHNE was compared with other state-of-the-art heterogeneous graph neural network methods. The values of the area under the precision–recall curve and the receiver operating characteristic curve demonstrated the superiority of MEAHNE. In addition, MEAHNE predicted 20 miRNAs each for breast cancer and nasopharyngeal cancer and verified 18 miRNAs related to breast cancer and 14 miRNAs related to nasopharyngeal cancer by consulting related databases.


Introduction
miRNA is a type of noncoding RNA that plays an important role in the regulation of gene expression in eukaryotes [1][2][3]. The important roles of miRNAs in the occurrence and development of diseases have been revealed through the continuous improvement of biological technology [4][5][6]. During the development of diseases, miRNA can inhibit or promote disease by interacting with miRNA targets [7,8]. Identifying the miRNAs related to a disease is of great help for prevention and diagnosis. However, the number of elements in the existing miRNA set is much larger than the number of miRNAs associated with diseases, representing a considerable challenge in biomedical research. Therefore, computational methods are used to predict the links between miRNAs and diseases. The computational research methods used to predict the association between miRNA and disease can be divided into three categories: prediction based on similarity measures, machine-learning-based methods, and graph-neural-network-based methods.
The central idea of the method based on similarity measures is that miRNAs with similar functions may be associated with similar diseases. Jiang et al. [9] established an miRNA [...] can learn rich semantic information. However, indiscriminately aggregating different types of nodes can make node embeddings too similar.
Here, we propose a new semantic-based attention mechanism for use on heterogeneous graphs; we applied the proposed mechanism to predict potential miRNA-disease connections in heterogeneous networks. We first collected multisource data to form a heterogeneous network. We used metapaths to split the original graph into multiple subgraphs. Then, a nonlinear neural network was used to mine the semantic information contained in the metapath instances in the subgraphs, which learned the diverse semantic information from different metapath modes. The obtained semantic information was encoded into association weights through the attention mechanism. The target node aggregates the information of its metapath neighbors through association weights. Finally, the representations of target nodes under multiple metapaths are fused through a nonlinear neural network. This model can make good use of metapaths to learn the complex association information of multisource biological networks.

Data Collection and Construction of Heterogeneous Networks
In this section, we introduce the data we used, which consist of three types of nodes, namely miRNA, disease, and gene nodes, and four types of relationships between them: miRNA-disease relationships, miRNA-gene relationships, disease-gene relationships, and protein-protein interaction (PPI) relationships (Tables 1 and 2). We collected related links between miRNAs and diseases from the HMDD3.2 [33] database. HMDD is a reliable database that specifically collects miRNA-disease associations. We collected 17,972 links between 1206 miRNAs and 893 diseases and integrated miRNAs and diseases as nodes and miRNA-disease associations as edges into the heterogeneous network. We collected related links between miRNAs and target genes from the Circ2disease [34] database. We selected 4676 links between 202 miRNAs and 1713 genes and integrated miRNAs and target genes as nodes and the associations between them as edges into the heterogeneous network. We collected the related links between diseases and genes from DisGeNET [35]. We selected 84,038 links between 11,181 diseases and 9703 genes and integrated diseases and genes as nodes and the associations between them as edges into the heterogeneous network.
When constructing the PPI network, we used the PPI network data retrieved directly from HerGePred [36]. We selected the genes that are related to miRNAs and diseases. The 105,171 associations between these genes were integrated into the heterogeneous network as edges. Finally, we established a heterogeneous network with 1296 miRNAs, 11,783 diseases, 10,116 genes, and 211,857 edges.
Heterogeneous networks have many types of nodes and many types of relationships. The paths composed of different types of nodes and different types of relationships contain rich semantic information, which is not available in homogeneous graphs. To learn the semantic information in heterogeneous graphs, the concept of a metapath is proposed. For example, $\rho_1 = a_1 \xrightarrow{r_1} a_2 \xrightarrow{r_3} a_3 \xrightarrow{r_5} a_1$ is a kind of metapath, and $\rho_2 = a_2 \xrightarrow{r_3} a_3 \xrightarrow{r_4} a_2$ is another kind of metapath. In $\rho_i$ ($\rho_i \in P$), $\rho_i$ represents a specific metapath, and $P$ represents all types of metapaths in the heterogeneous graph. For $a_i \in A$ and $r_i \in R$, $A$ represents the collection of all node types in the heterogeneous graph, and $R$ represents the collection of all relationship types in the heterogeneous graph.
In this experiment, we used multiple metapaths to mine heterogeneous networks. The original network was sampled under each metapath to obtain subgraphs. We called all the node sequences on the subgraph that conformed to the metapath mode metapath instances. For example, $v_1^{a_1} \rightarrow v_5^{a_2} \rightarrow v_3^{a_3} \rightarrow v_2^{a_1}$ is a metapath instance under $\rho_1$, in which $v_i^{a_j}$ represents the $i$th node of type $a_j$. The sampling subgraph under each metapath contained the target node and the metapath instances connected to the target node. We called the nodes on the subgraph that are of the same type as the target node metapath neighbors.
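The notions of metapath instance and metapath neighbor can be illustrated on a toy graph. The following is a minimal sketch, not the authors' sampling code: the node names, the edge list, and the choice to forbid revisiting a node within one instance are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy heterogeneous graph: each node is a (type, id) pair; edges are undirected.
# All nodes and edges here are hypothetical.
EDGES = [
    (("miRNA", "m1"), ("disease", "d1")),
    (("miRNA", "m2"), ("disease", "d1")),
    (("miRNA", "m1"), ("gene", "g1")),
    (("disease", "d1"), ("gene", "g1")),
    (("miRNA", "m3"), ("gene", "g1")),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def metapath_instances(adj, target, metapath):
    """Enumerate node sequences that start at `target` and follow `metapath`.

    `metapath` is a list of node types. Revisiting a node inside one
    instance is disallowed here, which is an assumption of this sketch.
    """
    if target[0] != metapath[0]:
        return []
    paths = [[target]]
    for node_type in metapath[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(adj[path[-1]])
            if nxt[0] == node_type and nxt not in path
        ]
    return paths


adj = build_adjacency(EDGES)
target = ("miRNA", "m1")
mdgm = ["miRNA", "disease", "gene", "miRNA"]
instances = metapath_instances(adj, target, mdgm)
# Metapath neighbors are the end nodes of the instances (same type as target).
neighbors = {path[-1] for path in instances}
print(instances)
print(neighbors)
```

Here the metapath miRNA-disease-gene-miRNA is the one reported later in the comparison experiments; any other type sequence can be passed in the same way.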

Specific Steps
In this section, we introduce the main methods, ideas, and specific implementation details of the MEAHNE model. The MEAHNE model is divided into six parts: A. node conversion, B. subgraph extraction, C. metapath instances semantic extraction, D. node aggregation method based on semantic attention, E. multisemantic information fusion, and F. link prediction. Figure 1 shows the overall framework of MEAHNE.

A. Node conversion
If we want to learn representations of heterogeneous networks, we need to perform interactive calculations on the nodes of the graph. However, heterogeneous graphs have multiple types of nodes, and different types of nodes are located in different spaces. If the nodes are not processed, interactive calculation between nodes becomes too difficult, so we first converted all types of nodes into the same space to facilitate calculations between nodes as follows.
A trainable linear transformation matrix was set for each type of node, and original nodes of different types were projected into the same space, as shown in Formula (1):

$$h^{a_i} = M_{a_i} \cdot x^{a_i} \quad (1)$$

where $x^{a_i}$ represents the original feature vector of a node of type $a_i$; $h^{a_i}$ is its projected vector; and $M_{a_i} \in \mathbb{R}^{d \times d_{a_i}}$, in which $d$ represents the feature space dimension after space conversion, and $d_{a_i}$ represents the original feature dimension of node type $a_i$.
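The type-specific projection can be sketched in a few lines. This is an illustrative example only: the dimensions, the random initialization, and the feature values are hypothetical, and in the model the matrices would be trained rather than fixed.

```python
import random


def make_matrix(rows, cols, rng):
    """Random (rows x cols) matrix standing in for a trainable parameter."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Matrix-vector product: maps a d_a-dimensional feature to d dimensions."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


rng = random.Random(0)
d = 4  # shared embedding dimension (hypothetical value)
raw_dims = {"miRNA": 3, "disease": 5, "gene": 2}  # hypothetical raw sizes

# One projection matrix per node type, each of shape d x d_a.
M = {node_type: make_matrix(d, k, rng) for node_type, k in raw_dims.items()}

features = {
    ("miRNA", "m1"): [1.0, 0.0, 0.0],
    ("disease", "d1"): [0.0, 1.0, 0.0, 0.0, 0.0],
    ("gene", "g1"): [0.5, 0.5],
}

# After projection every node lives in the same d-dimensional space.
projected = {node: project(M[node[0]], x) for node, x in features.items()}
print({node: len(vec) for node, vec in projected.items()})
```

Because every projected vector has the same length, vectors of different node types can afterwards be concatenated or compared directly.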

B. Subgraph extraction
To mine heterogeneous graphs in multiple metapaths, the first step is to separate the corresponding subgraphs based on specific metapaths.
We separated the subgraph ($G_i$) according to the metapath ($\rho_i$); $G_i$ represents the subgraph mined under metapath $\rho_i$. The node sequences in $G_i$ that conform to $\rho_i$ were sampled and denoted as $P(v, u)$, each of which connects the target node ($v$) and its metapath neighbor ($u$).

[Figure 1. Overall framework of MEAHNE. B. The subgraph under each metapath and the metapath edges on subgraphs are extracted. C. The semantic information is encoded into values as semantic weights to aggregate nodes of a single type. D. The semantic information on metapath edges is aggregated to obtain a more powerful ...]

C. Metapath instances semantic extraction
When mining the information from the corresponding subgraph ($G_i$) under a single metapath ($\rho_i$), different types of nodes are transformed into the same space through space conversion, which allows different types of nodes to represent each other. The metapath instance is composed of different types of nodes connected to each other and contains rich semantic information. Therefore, to learn the semantic information on the metapath instance in the subgraph, we first integrated the information on the metapath instance. Each metapath instance was represented as a vector that represents the semantic information on the instance. All the nodes on the metapath instance were concatenated according to the order of the metapath, as shown in Formula (2):

$$h_{P(v,u)} = \Vert\big(m_{P(v,u)}\big) = \Vert_{\forall t \in \{m_{P(v,u)}\}} (h_t) \quad (2)$$

where $P(v, u)$ represents the metapath instance from $v$ to $u$, $m_{P(v,u)}$ represents the set of nodes on the metapath instance, and $h_{P(v,u)}$ represents the vector obtained by concatenating the vectors of the nodes on the metapath instance $P(v, u)$.
A nonlinear neural network was used to learn vector $h_{P(v,u)}$, resulting in the semantic information of the metapath instance. A nonlinear neural network, which has strong information extraction capabilities, is a network composed of multiple fully connected layers and nonlinear activation functions, as shown in Formula (3), where $W^{j}_{\rho_i}$ represents the weight matrix of the $j$th fully connected layer of the neural network under metapath $\rho_i$, $b^{k}_{\rho_i}$ is the bias of the $k$th layer of the neural network under metapath $\rho_i$, $X$ represents the input feature, and $\varphi^{l}_{\rho_i}$ represents the vector representation of input vector $X$ learned through $l$ connection layers in the neural network under metapath $\rho_i$. We used vector $h_{P(v,u)}$ as the input of the nonlinear neural network to obtain the semantic information of the metapath instance (Formula (4)).
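The two steps of this subsection, concatenating the node vectors of an instance and passing the result through stacked fully connected layers, can be sketched as follows. This is a hedged illustration: the `tanh` activation, the layer sizes, and the random weights are assumptions, since the text only states that the network consists of fully connected layers with nonlinear activations.

```python
import math
import random


def concat(vectors):
    """Concatenate node vectors in metapath order into one instance vector."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def dense(weights, bias, x):
    """One fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp(layers, x, activation=math.tanh):
    """Apply each (W, b) layer followed by a nonlinear activation."""
    for weights, bias in layers:
        x = [activation(z) for z in dense(weights, bias, x)]
    return x


def init_layer(n_in, n_out, rng):
    """Random weights and zero bias standing in for trainable parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
d = 4  # shared node dimension (hypothetical)
# Three projected node vectors forming one metapath instance.
instance_nodes = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]

h_instance = concat(instance_nodes)               # length 3 * d
layers = [init_layer(len(h_instance), d, rng)]    # one extraction layer
semantic = mlp(layers, h_instance)                # semantic vector, length d
print(len(h_instance), len(semantic))
```

A single extraction layer is used here because that is the setting the experiments later report as best; adding entries to `layers` gives a deeper extractor.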

D. Node aggregation method based on semantic attention
After obtaining the semantic information from the metapath instances, we can aggregate the semantic information into the target nodes connected to these metapath instances; the semantic information is obtained by the fusion of different types of nodes. If the target node only aggregates semantic information, each type of node contains information about other types of nodes, causing different types of nodes to lose their distinction. To make the node representation more complete based on the aggregation of semantic information, we aggregated the same types of nodes, and the embeddings obtained for different types of nodes were strongly distinguishable. For aggregating nodes of the same type, we designed a method to encode semantic information into attention weights and used the obtained attention coefficient to aggregate metapath neighbors. Finally, we fused the information obtained by the aggregation of nodes of the same type and semantic information from metapath instances as the final node representation.
The metapath subgraph retains only the nodes of the same type as the target node to form a homogeneous graph ($G$). Therefore, graph $G$ only contains the target node and the metapath neighbors of the target node. We encoded the semantic information on the instance using the attention mechanism as a weight value, namely the correlation strength coefficient between the target node and the metapath neighbor, as shown in Figure 2 and Equations (5) and (6), where $e_{vu}$ represents the value encoded by the attention mechanism; Leaky_relu() is a nonlinear activation function; $a_{\rho_i}$ represents the attention weight matrix under metapath $\rho_i$; $N_v$ represents the set of metapath neighbors connected to the target node ($v$) on the subgraph under metapath $\rho_i$; and $w_{vu}$ represents the semantic weight between node $v$ and node $u$, where node $u$ is a metapath neighbor of node $v$.
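One common way to turn per-instance semantic vectors into normalized weights is to score each with a learned vector, apply LeakyReLU, and normalize with a softmax over the neighbors in $N_v$. The sketch below follows that pattern under stated assumptions: the attention parameter is treated as a vector `a`, the softmax normalization is assumed, and all numbers are hypothetical.

```python
import math


def leaky_relu(x, slope=0.01):
    """LeakyReLU with a small negative slope (the slope value is assumed)."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(a, semantic_vectors):
    """Map {neighbor u: semantic vector of P(v, u)} to {u: weight w_vu}."""
    neighbors = list(semantic_vectors)
    scores = [
        leaky_relu(sum(ai * si for ai, si in zip(a, semantic_vectors[u])))
        for u in neighbors
    ]
    return dict(zip(neighbors, softmax(scores)))


a = [1.0, -1.0]  # hypothetical attention parameters
semantics = {
    "u1": [2.0, 0.0],  # raw score 2.0
    "u2": [0.0, 2.0],  # raw score -2.0, damped by LeakyReLU
    "u3": [1.0, 1.0],  # raw score 0.0
}
weights = semantic_attention(a, semantics)
print(weights)
```

The weights sum to one over the metapath neighbors of the target node, so they can be used directly as coefficients when aggregating neighbor embeddings.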
Next, the metapath neighbors of the same type were aggregated according to the weight (w vu ). The semantic information was also integrated to ensure the integrity of the node embedding.
To reasonably integrate semantic information during the node aggregation stage, we performed secondary learning on the semantic information: its proportion was continuously adjusted through end-to-end optimization so that the optimal semantic information was learned adaptively. We designed a trainable matrix to optimize the weights of semantic information and applied a nonlinear activation to the result, as shown in Formula (7), where $b_{\rho_i}$ represents a learnable weight matrix under metapath $\rho_i$, and the content of semantic information is continuously adjusted through end-to-end learning.
We used the learned metapath semantic weight to aggregate the metapath neighbors and added the semantic information obtained from this secondary learning. Therefore, the target node could be more comprehensively expressed, as shown in Formula (8). In this way, the target node not only learned the semantic information on the metapath instance but also learned the information obtained by the aggregation of nodes of the same type. The nodes of different types remained distinct, making the representation of the nodes more complete.
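A rough sketch of this aggregation step is given below. It is one plausible reading of the description, not the published Formulas (7) and (8): the `tanh` activation, the way the re-weighted semantic vector is added to each neighbor embedding before the weighted sum, and all numeric values are assumptions.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def aggregate(weights, neighbor_embeddings, semantic_vectors, b):
    """Weighted sum over metapath neighbors plus re-learned semantics.

    weights:             {u: w_vu}, attention weights summing to 1
    neighbor_embeddings: {u: h_u}, same-type neighbor vectors
    semantic_vectors:    {u: semantic vector of instance P(v, u)}
    b:                   square matrix that rescales the semantic vector
                         (stands in for the trainable matrix in the text)
    """
    dim = len(b)
    out = [0.0] * dim
    for u, w in weights.items():
        # Secondary learning on the semantics: linear map, then activation.
        refined = [math.tanh(z) for z in matvec(b, semantic_vectors[u])]
        for k in range(dim):
            out[k] += w * (neighbor_embeddings[u][k] + refined[k])
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {"u1": 0.75, "u2": 0.25}
neighbor_embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
semantic_vectors = {"u1": [0.0, 0.0], "u2": [0.0, 0.0]}

h_v = aggregate(weights, neighbor_embeddings, semantic_vectors, identity)
print(h_v)
```

With all-zero semantic vectors the result reduces to the plain attention-weighted sum of same-type neighbors, which makes the contribution of the semantic term easy to isolate when experimenting.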

E. Multisemantic information fusion
In the above steps, we only learned the graph under a single metapath. Our model learned the graph under multiple metapaths and generated a representation of the target node under each of them. We used a neural network to integrate the node representations under multiple metapaths, as shown by Formula (9), where $h_v^{\rho_i}$ represents the embedding obtained by aggregating the target node ($v$) under metapath $\rho_i$, and $h_v$ represents the result of concatenating the representations of the target node ($v$) under all metapaths. Then, the embedding ($h_v$) was input into the nonlinear neural network to learn a low-dimensional embedding that fuses the target node representations under multiple metapaths, as shown in Formula (10). After learning through the nonlinear neural network, $H_v$ represents a low-dimensional embedding that fuses the representations under multiple metapaths and serves as the final representation of the target node.
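Written out from the symbol definitions in this subsection, the two fusion steps plausibly take the following form. This is a sketch rather than the published equations: the concatenation operator $\Vert$ and the metapath set $P$ follow the earlier definitions, while the name $\varphi_{\mathrm{fuse}}$ for the fusion network is an assumption introduced here.

```latex
% Concatenate the per-metapath embeddings of node v, then fuse them
% with a nonlinear network (the symbol \varphi_{fuse} is assumed).
h_v = \big\Vert_{\rho_i \in P} \, h_v^{\rho_i},
\qquad
H_v = \varphi_{\mathrm{fuse}}\left(h_v\right)
```

If each per-metapath embedding has dimension $d$ and there are $|P|$ metapaths, $h_v$ has dimension $|P| \cdot d$, and the fusion network maps it back to a lower dimension.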

F. Link prediction
The vector inner product was used as the score of the link strength between two nodes: if the two vectors are highly correlated, the inner product score is higher. We used this as the basis for link prediction, as shown in Formula (11). Our link prediction was between miRNA and disease. The higher the prediction score, the stronger the correlation, and the lower the prediction score, the weaker the correlation. We used binary (two-class) cross-entropy as the optimization target, as shown in Formula (12), where $\Phi$ represents the set of miRNA-disease pairs that have been verified to be associated, and $\Phi^{-}$ represents the set of all miRNA-disease pairs that have not been experimentally verified. The goal of optimization is to increase the score between verified node pairs and decrease that between unverified node pairs. Because our model is trained end to end, its parameters are continuously optimized during the training process, which enables us to achieve the optimization goal.
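Inner-product scoring with a binary cross-entropy objective can be sketched as follows. The sigmoid squashing of the inner product and the averaging over pairs are assumptions of this sketch (the text only names the inner product and two-class cross-entropy), and the embeddings are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)), computed as -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def link_probability(h_mirna, h_disease):
    """Sigmoid of the inner product, read as an association probability."""
    return math.exp(log_sigmoid(dot(h_mirna, h_disease)))


def bce_loss(embeddings, positive_pairs, negative_pairs):
    """Binary cross-entropy over verified and unverified pairs."""
    loss = 0.0
    for m, d in positive_pairs:
        loss -= log_sigmoid(dot(embeddings[m], embeddings[d]))
    for m, d in negative_pairs:
        # log(1 - sigmoid(x)) equals log(sigmoid(-x)).
        loss -= log_sigmoid(-dot(embeddings[m], embeddings[d]))
    return loss / (len(positive_pairs) + len(negative_pairs))


embeddings = {"m1": [1.0, 0.0], "m2": [-1.0, 0.0], "d1": [1.0, 0.0]}

# m1 is aligned with d1 and m2 points the opposite way.
good = bce_loss(embeddings, [("m1", "d1")], [("m2", "d1")])
bad = bce_loss(embeddings, [("m2", "d1")], [("m1", "d1")])
print(good, bad)
```

Minimizing this loss pushes embeddings of verified pairs toward a large positive inner product and those of unverified pairs toward a negative one, which matches the stated optimization goal.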

Experimental Data and Performance Evaluation
We built miRNAs, diseases, and genes into a network and conducted experiments to compare our model with other comparative models on the network. The links that are verified from the databases in our dataset are positive samples, and the others are negative samples. We split the dataset into training (70%), validation (10%), and test (20%) sets using a random sampling method without repetition. The ratio of positive-to-negative samples in all sets is 1:1. Parameters of our model were set as follows: learning rate, 0.005; dropout rate, 0.5; network node dimension, 90; number of layers for semantic extraction, 1; number of neighbor samples, 60. To prevent the model from overfitting, we used an early stopping mechanism and set the patience of the mechanism to 3. We compared our method with other heterogeneous network embedding methods under three metrics: area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AP), and the prediction accuracy of the highest K in the prediction results (Precision@K).
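The 70/10/20 split and two of the reported metrics can be illustrated with a short, dependency-free sketch. The function names, the seed, and the toy scores are hypothetical; AUC is computed here by the pairwise-ranking definition rather than with any particular library.

```python
import random


def split_pairs(pairs, seed=0, percents=(70, 10, 20)):
    """Shuffle once and cut into train/validation/test without overlap."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_val = len(shuffled) * percents[1] // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def precision_at_k(scores, labels, k):
    """Fraction of true associations among the k highest-scored pairs."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, val, test = split_pairs(range(100))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(len(train), len(val), len(test))
print(precision_at_k(scores, labels, 3), roc_auc(scores, labels))
```

Integer percentages are used for the cut points so that the split sizes do not depend on floating-point rounding.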

Factors Influencing Model Performance
Two factors significantly affect model performance: the number of sampled neighbors and the number of semantic extraction layers. The experimental results show that the model achieved the best performance with 40 sampled neighbors (Figure 3) and one semantic extraction layer (Figure 4). In the experiments, we used the control variable method: we changed one parameter at a time and kept the other parameters fixed to evaluate its effect on the model.

Effect of the Number of Sampled Neighbors
Some nodes in the network have many neighbors, whereas others have few. If every node aggregates all its neighbors, some nodes receive too much information and others too little, which can considerably affect the predictive performance of the model. To solve this problem, our model adopts random sampling: each node samples a fixed number of neighbors. In this way, the information received by all nodes is relatively balanced, which can considerably improve the performance of the model. We analyzed the effect of the sampling number by varying the number of sampled neighbors per node (Figure 3). The experimental results show that our model performed best when the number of sampled neighbors was 40, because sampling 40 neighbors ensures that each node has enough neighbors to be sampled. If too few neighbors are sampled, the performance of the model suffers from a lack of information.
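Fixed-size neighbor sampling can be sketched as below. How nodes with fewer neighbors than the sampling number are handled is not stated in the text; keeping all of their neighbors, as done here, is an assumption, as are the node names and the seed.

```python
import random


def sample_neighbors(neighbors, k, rng):
    """Return at most k neighbors, sampled uniformly without replacement.

    Nodes with k or fewer neighbors keep all of them (an assumption of
    this sketch).
    """
    neighbors = list(neighbors)
    if len(neighbors) <= k:
        return neighbors
    return rng.sample(neighbors, k)


rng = random.Random(42)
neighbor_lists = {
    "v1": list(range(100)),  # a hub node with many metapath neighbors
    "v2": [7, 8, 9],         # a sparsely connected node
}
sampled = {v: sample_neighbors(n, 40, rng) for v, n in neighbor_lists.items()}
print({v: len(n) for v, n in sampled.items()})
```

Capping the hub node at 40 neighbors while leaving the sparse node untouched keeps the amount of aggregated information per node within a bounded range.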

Effect of Number of Semantic Extract Layers
Assigning semantic attention weights to nodes is a key feature of the model. Semantic information directly affects the size of semantic attention weights. The number of layers of semantic information extraction affects the performance of the model. If the number of extraction layers is large, an overfitting effect is easily produced, resulting in partial loss of semantic information. The experimental results confirmed this (Figure 4). The model performs best with one extraction layer.

Comparison with Other Models
Comparison experiments were conducted against representative graph representation methods from recent years. For the heterogeneous graph representation method metapath2vec, the best-performing metapath (miRNA-disease-gene-miRNA) was selected after multiple experiments. Because GAT is a homogeneous network method, we used metapaths to split the original network into homogeneous networks, used the GAT method to extract the information of the homogeneous networks, and selected the best result as the performance of the GAT model. HAN, MAGNN, HECO, and GAEMDA are all well-performing heterogeneous graph neural network methods. For the sake of fairness, we tuned these models and report their best results. Our model achieved the best performance under both the AUC and AP metrics (Table 3). The receiver operating characteristic (ROC) and precision-recall (P-R) curves are shown in Figure 5. The confusion matrix is shown in Figure 6. The codes of Metapath2vec, GAT, and HAN were derived from the open-source graph representation learning framework OpenHINE. The rest of the comparative test codes were retrieved from their official GitHub repositories (Supplementary Materials).

Case Study
In order to verify the effectiveness of our model, we selected two cancers in the dataset to predict potential cancer-associated miRNAs. The model predicted 18 validated breast-cancer-related miRNAs that were included in our dataset. The model predicted 14 validated nasopharyngeal-carcinoma-related miRNAs that were not included in our dataset, as shown in Tables 4 and 5. "*" indicates that the miRNA predicted by the model has been verified in the dbDEMC database [38].

Ablation Experiment
In order to demonstrate the effectiveness of the semantic attention mechanism of our model, we removed the semantic attention module and replaced it with summation. Accordingly, we designed a comparative experiment, changing the hidden layer dimensions of the models and observing how the models performed. The experimental results are shown in Figure 7. The performance of our model diminished significantly without the use of a semantic attention module. The experimental results illustrate the effectiveness of the semantic attention module. NS_MEAHNE means MEAHNE without semantic attention module.

Conclusions
In this paper, we propose a heterogeneous graph neural network model that can fully learn a variety of information in a heterogeneous network. This model integrates the semantic information and node type information into the node representation, which not only avoids the early-summarization [25] problem but also avoids the problem of homogenization of different types of nodes due to a large amount of aggregated semantic information and maintains the distinction of nodes. We propose an attention mechanism based on the semantics of the metapath instance. Under each metapath, the semantic information of the learned metapath instance is encoded into attention weights to perform node aggregation, and the semantic information is also integrated into the node representation so that nodes retain comprehensive information. Finally, a multilayer neural network is used to fuse the representation of multiple metapaths as the final node representation. Experimental results show that our model performs better than other models.
However, there is still room for improvement with respect to our model. The semantic information obtained through the semantic information extraction layer considerably affects the allocation of attention weights, and we used a nonlinear neural network as the extraction tool. Whether other graph neural network methods can be used for semantic information extraction deserves further investigation.
Author Contributions: C.H. and J.L. designed the study, performed bioinformatics analysis, and drafted the manuscript. All authors performed the analysis. J.L. conceived of the study, participated in its design and coordination, and drafted the manuscript. All authors have read and agreed to the published version of the manuscript.
