HLGNN-MDA: Heuristic Learning Based on Graph Neural Networks for miRNA–Disease Association Prediction

Identifying disease-related miRNAs can improve the understanding of complex diseases. However, experimentally finding the associations between miRNAs and diseases is expensive in terms of time and resources. The computational screening of reliable miRNA–disease associations has thus become a necessary tool to guide biological experiments. Most current miRNA–disease association prediction methods rely on the assumption that "similar miRNAs will be associated with the same disease"; however, biased prior knowledge and incomplete, inaccurate miRNA and disease similarity data limit the performance of such models. Here, we propose heuristic learning based on graph neural networks to predict microRNA–disease associations (HLGNN-MDA). We learn the local graph topology features of the predicted miRNA–disease node pairs using graph neural networks. In particular, our improvements to the graph convolution layer of the graph neural network enable it to learn information both among homogeneous nodes and among heterogeneous nodes. We illustrate the performance of HLGNN-MDA by performing tenfold cross-validation against strong baseline models. The results show that HLGNN-MDA achieves promising performance on multiple metrics. We also examine the role of the improvements to the graph convolution layer in the model. The case studies on breast cancer, hepatocellular carcinoma and renal cell carcinoma are supported by database and literature evidence. Taken together, the experiments demonstrate that HLGNN-MDA can serve as a reliable method to identify novel miRNA–disease associations.


Introduction
Since the discovery of microRNAs, an increasing number of researchers have investigated these molecules [1][2][3][4][5][6][7][8]. In particular, the discovery of a regulatory role for microRNAs in cellular activity suggests that these molecules are inextricably linked to many diseases [9][10][11][12][13]. Uncovering microRNA-disease associations has important implications for understanding disease mechanisms and assisting disease treatment [14][15][16][17][18][19][20]. However, due to the long time period required for biological experiments and the high resource costs, the use of computational methods to predict miRNA-disease associations has now become an important means for guiding traditional biological experiments, and it has greatly improved the efficiency of discovering disease-related miRNAs.
Some miRNA-disease association prediction models derive from combinatorial optimization theory and metric-learning ideas, such as matrix-related operations and score estimation. MCMDA (matrix completion for miRNA-disease association) [21] performs matrix completion by applying a singular value thresholding algorithm on known miRNA-disease associations, and ILRMR (improved low-rank matrix recovery) [22] improves low-rank matrix recovery by referencing a weight matrix to enhance the prediction accuracy. MDMF (miRNA-disease prediction based on matrix factorization) [23] uses matrix factorization with disease similarity constraints to identify potential miRNA-disease associations. MDHGI (matrix decomposition and heterogeneous graph inference) [24] discovers new miRNA-disease associations by integrating the predicted association probabilities obtained from matrix decomposition with heterogeneous graph inference. MDA-GCNFTG trains graph convolutional networks via graph sampling through the feature and topology graph to improve the training efficiency and accuracy. Instead of using heterogeneous graphs, MDA-GCNFTG constructs a homogeneous graph with MDPs (miRNA-disease pairs) as the nodes, which is the biggest difference with respect to our method. Although both use the GCN algorithm, the background graphs and the models are completely different, and for this task the two models focus on solving different problems.
However, there are still shortcomings in these recently proposed computational models. Most of the current methods for predicting miRNA-disease associations are based on a strong assumption about similarity data. However, different models have different definitions of similarity, which makes the prediction results inaccurate. In addition, the miRNA functional similarity is incomplete and derived from known associations, and these inconsistencies and gaps further lead to inaccurate prediction results. In this article, a heuristic learning method based on graph neural networks for miRNA-disease association prediction (HLGNN-MDA) is proposed. Inputting the whole miRNA-disease association network into the graph neural network for model training causes high computational costs. To overcome this, we choose to train the graph neural network on enclosing subgraphs. Figure 1 shows the overall framework of HLGNN-MDA. Our HLGNN-MDA model improves the graph neural network so that it can simultaneously learn the information between miRNA and disease nodes and the topological relationships among homogeneous nodes. Compared with previous computational models, our proposed method does not require a large amount of similarity data and improves the applicability of graph neural networks. The enclosing subgraphs are input into the graph neural network. The graph convolution layers adopt the three-layer structure shown in Figure 2, where each of the three sub-figures in the second part corresponds to the processing results of the three convolution modules.
As shown in Figure 3, each convolution module has the same structure. The prediction results are obtained through the graph convolution layer, the graph pooling layer, 1D convolution and fully connected layers. Finally, the predicted results are verified against databases.

Performance Analysis of the HLGNN-MDA Model
In this section, we compare HLGNN-MDA with other related methods using tenfold cross-validation [20]. The algorithms selected for comparison were BLHARMDA (bipartite local models and hubness-aware regression for miRNA-disease association prediction) [51], BNPMDA (bipartite network projection for miRNA-disease association prediction) [52], IMCMDA (inductive matrix completion for miRNA-disease association prediction) [52], LFEMDA (predict miRNA-disease associations by latent features extraction) [53] and MKRMDA (multiple kernel learning-based Kronecker regularized least squares for miRNA-disease association prediction) [54]. All models for the performance analysis experiments adopted the association data from HMDD v2.0. We also performed a performance comparison with the DGCNN model in the subsequent experiments of the graph convolutional layer analysis, which is the baseline of our HLGNN-MDA convolutional module.
The miRNA functional similarity and disease semantic similarity were downloaded directly from IMCMDA. HLGNN-MDA used 5430 known miRNA-disease associations as positive samples and a random sample of the same number of candidate associations as negative samples. The dataset with the positive and negative samples together was then randomly divided into ten parts. In each round, one part was selected as the test set, and the remaining nine parts were used as the training set. The tenfold cross-validation was completed in turn. In each round, HLGNN-MDA removed the positive samples in the test set from the adjacency matrix. All five comparison algorithms completed tenfold cross-validation on the miRNA-disease association matrix. There were six evaluation metrics to analyze the model: ACC, precision, recall, AUROC, AUPR and MCC. The results are shown in Table 1, and their corresponding ROC curves are shown in Figure 4. The corresponding precision-recall (PR) curves are shown in Figure 5.
In Table 1, HLGNN-MDA-hopx represents the HLGNN-MDA model in which the hop of the enclosing subgraph is x (x = 1, 2, 3 or 4). Hops were used to extract subgraphs; the extraction process is described in Section 3.2.1. Compared with the other methods, HLGNN-MDA-hopx had high performance in five of the six indicators. The minimum AUROC of HLGNN-MDA-hopx was greater than those of BNPMDA, IMCMDA, LFEMDA and MKRMDA. The minimum AUPR of HLGNN-MDA-hopx was greater than the results of all the other algorithms.
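The balanced sampling and tenfold splitting procedure can be sketched as follows (a minimal illustration with hypothetical toy miRNA-disease pairs; the shuffling, seeding and fold-construction details are assumptions of this sketch, not the authors' exact implementation):

```python
import random

def tenfold_splits(positives, negatives, seed=0):
    """Shuffle balanced samples and yield ten (train, test) splits,
    one test fold per round, as described for the cross-validation."""
    samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    rng = random.Random(seed)
    rng.shuffle(samples)
    folds = [samples[i::10] for i in range(10)]  # ten roughly equal parts
    for k in range(10):
        test = folds[k]
        train = [s for j, f in enumerate(folds) if j != k for s in f]
        yield train, test

# toy example: 20 positive and 20 negative miRNA-disease pairs
pos = [("m%d" % i, "d%d" % i) for i in range(20)]
neg = [("m%d" % i, "d%d" % (i + 20)) for i in range(20)]
splits = list(tenfold_splits(pos, neg))
train, test = splits[0]
assert len(splits) == 10
assert len(train) + len(test) == 40
```

In the real experiment, the 5430 known associations from HMDD v2.0 would play the role of `pos`, and the positive test-fold edges would additionally be removed from the adjacency matrix before subgraph extraction.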
From the ROC curve in Figure 4, HLGNN-MDA-hop4 could cover the curves of BNPMDA, IMCMDA and MKRMDA. The maximum AUROC was 0.25% larger than the AUROC of BLHARMDA. Similarly, in the PR curve of Figure 5, HLGNN-MDA-hop4 could cover the PR curves of BNPMDA, IMCMDA and MKRMDA, which were roughly the same as that of BLHARMDA. The maximum AUPR value of HLGNN-MDA-hop4 was 0.55% larger than the maximum value of BLHARMDA.
Compared with the other algorithms, HLGNN-MDA had a relatively large advantage. Moreover, HLGNN-MDA used less similarity information to obtain better prediction results.

Influence of Different Hops in the Enclosing Subgraph
In this section, we discuss how HLGNN-MDA was affected by different enclosing subgraph hops. The required experimental data were miRNA-disease associations. The positive samples represented the known associations, and a random sample of the same number of candidates represented the negative samples. The test set was one-tenth the size of the entire dataset.
First, we evaluated the results of the HLGNN-MDA model using different hops in the enclosing subgraph. Each model was trained with the same training set and evaluated on the same test set. The range of the number of hops in the enclosing subgraph was 1 to 4. The corresponding results are shown in Table 2. The ROC, PR and accuracy curves under different thresholds are shown in Figures 6-8.

As shown in Table 2, HLGNN-MDA obtained the best values for all six metrics when the number of hops was 4. In Figures 6-8, it can be seen that the curves with fewer hops were always covered by the curves with larger hops. In particular, it should be noted that the results of HLGNN-MDA-hop2 showed a larger increase over those of HLGNN-MDA-hop1, and the results of HLGNN-MDA-hop4 showed a larger increase over those of HLGNN-MDA-hop3. There was only a slight increase between HLGNN-MDA-hop2 and HLGNN-MDA-hop3, as shown in the three figures. These findings show that using an even larger hop in the enclosing subgraphs for miRNA-disease association prediction could obtain better results.

Analysis of the Improved Graph Convolutional Layer
To show that our improvement in the graph neural network was effective, in this section, we present the analysis of the improved graph convolutional layer. After considering the information transfer between homogeneous nodes and heterogeneous nodes, HLGNN-MDA aggregated four propagation functions in the graph convolutional layer: A, A^2, D^{-1}A and D^{-1/2}AD^{-1/2}. Next, we explored the role of each propagation function in the graph neural network. If deleting a certain propagation function from the current HLGNN-MDA yielded a better result, that propagation function had a negative effect on the graph neural network; if the result of deleting it was worse, it had a positive effect. Therefore, we discussed the role of the four propagation functions in turn in this way. The HLGNN-MDA model that removed propagation function A was marked as HLGNN-MDA-a. In this order, the model deleting A^2 was marked as HLGNN-MDA-b, the model without D^{-1}A as HLGNN-MDA-c, and the model without D^{-1/2}AD^{-1/2} as HLGNN-MDA-d. If the AUROC in Figure 9 were replaced with the AUPR or the ACC, a similar trend could be obtained.
Furthermore, since the graph neural network [55,56] of HLGNN-MDA is improved with respect to DGCNN, in order to show the effectiveness of the HLGNN-MDA model, we also compared the performance of the four variants of HLGNN-MDA with DGCNN under different hops. From Table 3, it could be concluded that DGCNN and HLGNN-MDA-b performed similarly. HLGNN-MDA-b was slightly higher than DGCNN when the enclosing subgraph hops were 1, 2 and 3. Compared with the other HLGNN-MDA variant models, HLGNN-MDA-b, i.e., the A^2-deleted model, had the worst performance. As the value of hops increased, the six measures of HLGNN-MDA-b also increased slowly, and when hop = 3, HLGNN-MDA-b reached a maximum. This finding was consistent with the results presented in the above figures; that is, A^2 played a large role in the HLGNN-MDA model. Overall, HLGNN-MDA performed better than DGCNN in predicting miRNA-disease associations. In summary, the four propagation functions in HLGNN-MDA all played a positive role, thus leading HLGNN-MDA to achieve good predictive performance.
Of these, A and A^2 had a stronger role in improving the predictive performance of the model, while D^{-1}A and D^{-1/2}AD^{-1/2} enabled the model to remain stable in its results under different hops of the enclosing subgraph. The combined use of these four propagation functions allowed HLGNN-MDA to perform well in the task of predicting miRNA-disease associations.

Validation of Prediction Results
In this section, all the known associations in HMDD v2.0 were used as the training set for the model, and as many potential associations as possible were sampled as negative samples. According to the above analysis, HLGNN-MDA-hop4 gave the best predictions, so training was performed on it. All potential association relationships in HMDD v2.0 were then extracted and predicted with the trained HLGNN-MDA-hop4.
The prediction results were validated using the following databases: HMDD v3.0 [57], dbDEMC [58] and miR2Disease [59]. One association was considered to be validated if it was found in at least one database.
The final validation results were as follows: 10 out of the top 10 predictions were validated; a total of 49 out of the top 50 predictions were verified; a total of 97 out of the top 100 predictions were verified; and 169 out of the top 180 predictions were verified. The results demonstrated the effectiveness of HLGNN-MDA and its ability to predict potential novel associations.

Breast Cancer
Breast cancer is one of the most dangerous malignancies to human health, especially for women. Globally, breast cancer accounts for 2.088 million new cases and 627,000 deaths per year, making it the number one malignancy in women [60]. The top 50 miRNAs predicted to be associated with breast cancer are listed in Table 4. In total, 49 of them were found in the relevant validation databases. Only hsa-mir-362 (validation = no) currently has no record related to breast cancer in the three databases. According to the literature [61], the hERG potassium channel, which enhances tumor aggressiveness and breast cancer proliferation, is transcriptionally regulated by hsa-miR-362-3p and is thus associated with breast cancer growth. Another study [62] compared the MDA-MB-231 and MCF7 breast cancer cell lines to the control CCD-1095Sk cell line, where hsa-miR-362-5p showed significant upregulation. The inhibition of hsa-miR-362-5p was found to significantly inhibit the diffusion, migration and invasion of MCF7 human breast cancer cells.

Hepatocellular Carcinoma
Primary liver cancer is the fifth most common cancer worldwide, mainly including hepatocellular carcinoma (HCC) [63,64]. A total of 49 of the top 50 miRNAs predicted to be related to hepatocellular carcinoma could be validated in three validation databases. The results are shown in Table 5. At present, no clear association between hsa-mir-495 and hepatocellular carcinoma could be found in these databases. However, a previous study [65] reported that hsa-mir-495 expression was frequently downregulated in hepatocellular carcinoma tissues and cell lines. Its expression levels were significantly correlated with tumor size, tumor lymph node metastasis (TNM) stage and lymph node metastasis in patients with hepatocellular carcinoma [65].

Renal Cell Carcinoma
Approximately 270,000 new kidney cancer cases are diagnosed and 116,000 deaths occur annually worldwide [66]. Ninety percent of kidney cancers are tumors originating from the renal epithelium, i.e., renal cell carcinoma [67]. The top 50 miRNAs predicted to be associated with renal cell carcinoma are listed in Table 6. A total of 47 of the top 50 miRNAs could be clearly validated.

Data Resources
We collected human miRNA-disease associations from the HMDD v2.0 database [68], obtaining 5430 miRNA-disease associations between 495 miRNAs and 383 diseases. These associations were organized into an adjacency matrix Y ∈ N^{nm×nd}, where nm and nd represent the number of miRNAs and the number of diseases, respectively. If an association between miRNA m_i and disease d_j was recorded in HMDD v2.0, then Y(i, j) equaled 1; otherwise, it equaled 0.
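A minimal sketch of building the adjacency matrix Y from an association list (the indices and counts here are illustrative toys, not the real 495 × 383 HMDD v2.0 matrix):

```python
import numpy as np

# Hypothetical miniature association list in the style of HMDD v2.0
# (indices are illustrative, not real database identifiers).
associations = [(0, 1), (0, 2), (2, 0)]  # (miRNA index, disease index)
nm, nd = 3, 3                            # number of miRNAs and diseases

Y = np.zeros((nm, nd), dtype=int)
for i, j in associations:
    Y[i, j] = 1  # Y(i, j) = 1 iff the association is recorded

assert Y[0, 1] == 1 and Y[1, 0] == 0
assert Y.sum() == len(associations)
```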

Extraction of the Enclosing Subgraph of a Node Pair
The design of our HLGNN-MDA model is inspired by the SEAL (learning from subgraphs, embedding and attributes for link prediction) [69] framework, which uses graph neural networks for link prediction. SEAL proves that most high-order heuristics can be approximated by learning from local enclosing subgraphs. Therefore, the first step of HLGNN-MDA is to extract the enclosing subgraph of all pairs of nodes. The enclosing subgraph is composed of two nodes, which are marked as central nodes, and their surrounding nodes. The h-hop enclosing subgraph of central nodes m and d then consists of all nodes whose distance from node m or node d is no more than h steps.
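The h-hop extraction described above can be sketched with a breadth-first search (an illustrative implementation; the dictionary-based graph representation is an assumption made for clarity):

```python
from collections import deque

def enclosing_subgraph(adj, m, d, h):
    """Nodes within h hops of either central node m or d.

    `adj` maps each node to its neighbours in the bipartite
    miRNA-disease graph (an illustrative representation)."""
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            if dist[u] == h:
                continue  # do not expand beyond h hops
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist
    return set(bfs(m)) | set(bfs(d))

# toy bipartite graph: miRNAs m0, m1 and diseases d0, d1, d2
adj = {"m0": ["d0", "d1"], "m1": ["d1", "d2"],
       "d0": ["m0"], "d1": ["m0", "m1"], "d2": ["m1"]}
sub = enclosing_subgraph(adj, "m0", "d2", 1)
assert sub == {"m0", "d0", "d1", "d2", "m1"}
```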
HLGNN-MDA inputs the enclosing subgraphs into an "end-to-end" graph neural network for training. The association information of the central node pair of the enclosing subgraph is used as the supervisory label of the graph neural network. In association matrix Y, miRNA-disease associations equal to 1 are considered known, while those equal to 0 are treated as potential. The training set takes known associations as positive samples and randomly samples an equal number of potential associations as negative samples. Subsequently, the enclosing subgraphs of all samples in the training set are extracted. To prevent the leakage of supervised labels, the edges between the central node pairs of the positive samples are removed during the extraction of the enclosing subgraph. The enclosing subgraph is denoted with A.

Label Nodes
In this section, we label all the nodes in each enclosing subgraph. Node labeling is the process of assigning an integer to a node in an enclosing subgraph and can be defined as f_l : V → N. Node labeling uses the DRNL (double-radius node labeling) method proposed in SEAL. First, we label the two central nodes with label 1. Other nodes with the same distances from the two central nodes are labeled with the same value; the farther the distance, the greater the value. The label of a node i can be derived from the following hash function:

f_l(i) = 1 + min(d_x, d_y) + (d/2)[(d/2) + (d%2) − 1],

where d_x and d_y represent the distances from node i to central nodes x and y, respectively, with d = d_x + d_y; and d/2 and d%2 indicate integer division and remainder, respectively. When a node is disconnected from either central node, it is labeled with 0.
In this way, we can label all the nodes in the enclosing subgraphs. Then, before being input into the graph neural network, the label of each node is expanded into one-hot encoding as its feature. This one-hot feature represents the position information of the node in the enclosing subgraph.
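The DRNL hash described above can be written compactly (a sketch based on the labeling rules stated in the text; using `None` to stand for an infinite distance is an assumption of this illustration):

```python
def drnl_label(dx, dy):
    """Double-radius node label from the distances to the two central
    nodes. Disconnected nodes (infinite distance) get label 0."""
    if dx is None or dy is None:
        return 0
    d = dx + dy
    # 1 + min(dx, dy) + (d/2) * [(d/2) + (d%2) - 1]
    return 1 + min(dx, dy) + (d // 2) * (d // 2 + d % 2 - 1)

assert drnl_label(0, 1) == 1       # the two central nodes themselves
assert drnl_label(1, 0) == 1
assert drnl_label(None, 3) == 0    # disconnected from a central node
assert drnl_label(1, 1) == 2
assert drnl_label(1, 2) == 3
```

Each integer label is then expanded into a one-hot vector before being fed to the graph neural network, as described above.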

Construct Graph Neural Network
After extracting the enclosing subgraphs and labeling nodes in each enclosing subgraph, we can input the labeled enclosing subgraphs into the graph neural network for predicting miRNA-disease associations. At present, most of the proposed graph neural networks are applicable to homogeneous networks, whereas the miRNA-disease associations we use are in a bipartite graph network. Therefore, we improve the graph neural network DGCNN (deep graph convolutional neural network) to obtain better performance in heterogeneous networks.
Graph convolution layers. The role of the graph convolution layer is to learn the node representations [70,71]. A graph is input into the graph convolution layer through multilayer convolution, and the vector representation of each node can be extracted. The vector contains local substructural features of the graph. The process of graph convolution is the aggregation of feature information from the neighbors around each node.
Given the adjacency matrix of a graph A ∈ R^{n×n}, its information matrix is X ∈ R^{n×d}, where n indicates the number of nodes and d represents the dimension of the features. Graph convolution can be represented as follows:

Z = σ(f(A) · X · W),

where W ∈ R^{d×c} indicates the parameters to be trained, which convert the d-dimensional signal into a c-dimensional signal; and f(A) represents the propagation function of adjacency matrix A (e.g., D̃^{-1}Ã, where Ã is A with added self-loops and D̃ is the degree matrix of Ã). Moreover, f(A) · X · W indicates that the feature vector of each node is aggregated in the manner of the propagation function and then undergoes information conversion. In addition, σ(·) is the activation function.
In the bipartite undirected network, miRNAs are only connected with diseases. That is, after a two-step jump, one disease node can only reach another disease, which is also true for miRNA nodes. Therefore, we use a propagation function for second-order topological information that allows information between homogeneous nodes to be aggregated together directly, i.e., defining a propagation function f (A) = A 2 .
With different propagation functions selected, the topological characteristics of the graph learned by the graph convolutional network are slightly different [72]. By splicing the neighbor information of nodes aggregated by different propagation functions, the graph convolutional layer can learn better graph topological features. The graph convolutional layer of our HLGNN-MDA model is shown in Figure 3 and can be represented by Formula (3):

Z_{t+1} = σ([A·Z_t, A^2·Z_t, D^{-1}A·Z_t, D^{-1/2}AD^{-1/2}·Z_t] · W_t),   (3)

where Z_0 = X and Z_t ∈ R^{n×d_t} represents the output of the t-th graph convolutional layer; d_t is the output dimension of the t-th graph convolutional layer; and [·] represents the splicing of row vectors, which splices the node vectors obtained through the different propagation functions. The propagation functions used in our model are A, A^2, D^{-1}A and D^{-1/2}AD^{-1/2}. After splicing, the topological features captured by these propagation functions can be considered at the same time. W_t ∈ R^{4d_t×d_{t+1}} is the parameter to be trained, which maps the spliced node features from dimension 4d_t to dimension d_{t+1}. Dimension 4d_t is used here because four propagation functions are involved in the graph convolution process.
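One convolution step of this kind can be sketched numerically as follows (a minimal NumPy illustration of aggregating with the four propagation functions and splicing the results; the tanh activation and the zero-degree guard are assumptions of this sketch, not necessarily the authors' exact choices):

```python
import numpy as np

def conv_layer(A, Z, W):
    """One HLGNN-MDA-style graph convolution: aggregate with the four
    propagation functions A, A^2, D^-1 A and D^-1/2 A D^-1/2, splice
    the results row-wise, then project with W (shape 4*d_t x d_{t+1})."""
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                       # guard isolated nodes
    Dinv = np.diag(1.0 / deg)
    Dhalf = np.diag(1.0 / np.sqrt(deg))
    props = [A, A @ A, Dinv @ A, Dhalf @ A @ Dhalf]
    spliced = np.concatenate([P @ Z for P in props], axis=1)
    return np.tanh(spliced @ W)               # sigma: tanh activation

# toy enclosing subgraph with 3 nodes and 2-dimensional features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
Z0 = np.eye(3, 2)                             # one-hot style features
W = np.full((4 * 2, 5), 0.1)                  # maps 4*d_t = 8 -> d_{t+1} = 5
Z1 = conv_layer(A, Z0, W)
assert Z1.shape == (3, 5)
```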
Enclosing subgraph A and its node features X go through T graph convolution layers to produce outputs Z_t, t = 1, ..., T. The overall graph convolution layer uses a global structure in which the results of each layer are spliced together at the end, yielding Z = [Z_1, ..., Z_T]. Each row of Z ∈ R^{n×Σ_{t=1}^{T} d_t} is the Σ_{t=1}^{T} d_t-dimensional embedding vector of a node, containing rich topological information about that node in the graph. Figure 2 shows the overall architecture of the graph convolutional layer.
Graph pooling layers. A graph convolutional layer is used to learn a latent vector representation for each node. Here, the graph pooling layer of HLGNN-MDA selects the k most important nodes of the graph to represent the graph. The importance of a node is evaluated using the result of the graph convolutional layer: the last layer of graph convolution maps the result of the previous layer to 1 dimension. That is, the parameter W_T of the last graph convolution layer maps dimension 4d_{T−1} to dimension 1. In this way, a value is obtained for each node, and this value indicates how important the node is in the graph.
We sort the rows of Z in descending order according to the last dimension. If two nodes are equal in the last dimension of Z, we compare their penultimate dimension, and so on, until the two nodes can be distinguished. The graph pooling layer takes the top k nodes in this ranking as its output, which provides the subsequent conventional neural network layers with a tensor of fixed size. When the number of nodes, n, is less than k, (k − n) zero vectors are appended after the sorted nodes.
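This ranking-and-truncation step can be sketched as follows; a minimal NumPy version, where `sort_pool` is a hypothetical helper name and ties in the last dimension are broken by earlier dimensions via lexicographic sorting:

```python
import numpy as np

def sort_pool(Z, k):
    """Keep the k 'most important' nodes of Z (sketch).

    Nodes are ranked in descending order by the last embedding
    dimension, with ties broken by the penultimate dimension and
    so on; graphs with fewer than k nodes are zero-padded.
    """
    # np.lexsort uses its LAST key as the primary sort key, so
    # passing the columns of Z sorts by the last dimension first.
    order = np.lexsort(Z.T)[::-1]        # descending order
    Z_sorted = Z[order][:k]
    n = Z_sorted.shape[0]
    if n < k:                            # pad with (k - n) zero vectors
        pad = np.zeros((k - n, Z.shape[1]))
        Z_sorted = np.vstack([Z_sorted, pad])
    return Z_sorted
```

For instance, with rows [1, 5], [2, 5] and [0, 9], the node with last-dimension value 9 ranks first, and the tie between the two rows ending in 5 is broken by their first dimension.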
Convolution layers and fully connected layer. After the graph pooling layer, a tensor Z ∈ R^{k×∑_{t=1}^T d_t} is obtained. In the remaining layers, we first use a traditional one-dimensional convolutional neural network combined with a max pooling layer to further refine the graph representation features and then make the final prediction using a fully connected layer.
To train a one-dimensional convolutional neural network on the tensor Z, it first needs to be reshaped into a one-dimensional vector. To apply the filter to each node feature, the filter size and stride of the first one-dimensional convolutional layer are both set to ∑_{t=1}^T d_t; that is, each node's feature vector is convolved as a whole. Then, after a max pooling layer, another one-dimensional convolutional layer is used to learn local patterns in the resulting sequence of node features. Finally, the output is connected to the fully connected layer. We use NLLLoss as the loss function, which sums, over all samples, the negative log-softmax score of the true label. The log-softmax maps predicted probabilities in (0, 1) to values in (−∞, 0), so the negated value lies in (0, +∞); the more confidently and correctly the samples are predicted, the closer NLLLoss is to 0. Finally, the fully connected layer outputs the probability that the miRNA and disease node pair is connected.
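A minimal sketch of the loss computation described above, assuming raw scores are first passed through a numerically stable log-softmax, as in standard NLLLoss implementations (the function name and the mean reduction are our illustrative choices):

```python
import numpy as np

def nll_loss(logits, labels):
    """Negative log-likelihood over log-softmax scores (sketch).

    log-softmax maps probabilities in (0, 1) to values in (-inf, 0);
    negating the true-class entry yields a loss in (0, +inf) that
    approaches 0 as predictions become confident and correct.
    """
    # Subtract the row-wise max before exponentiating for stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of each sample's true label and negate.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

For two classes with uniform scores, the loss equals log 2 ≈ 0.693; confident correct predictions drive it toward 0.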

Evaluation Metrics
In order to verify the model performance, we choose the following metrics: AUROC (area under the receiver operating characteristic curve), ACC (accuracy), precision, recall, AUPR (area under the precision–recall curve) and MCC (Matthews correlation coefficient). The relevant definitions are as follows: TP (true positives), FP (false positives), TN (true negatives) and FN (false negatives) are all derived from the confusion matrix. The ROC curve plots the FPR (false positive rate) on the horizontal axis against the TPR (true positive rate, or recall) on the vertical axis, and the area under the ROC curve is the AUROC value. Similarly, the area under the curve with recall on the horizontal axis and precision on the vertical axis is the AUPR value.
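Under these definitions, the confusion-matrix-based metrics can be computed as in the following minimal NumPy sketch (the function name is ours; AUROC and AUPR additionally require ranking the continuous prediction scores and are omitted here):

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """ACC, precision, recall and MCC from binary labels (sketch)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC balances all four confusion-matrix cells in one score.
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, precision, recall, mcc
```

For example, with true labels [1, 1, 0, 0] and predictions [1, 0, 0, 0], we get TP = 1, TN = 2, FP = 0 and FN = 1, giving ACC = 0.75, precision = 1.0 and recall = 0.5.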

Conclusions
Understanding the relationship between miRNAs and diseases has important implications for disease prevention, detection and treatment. This paper proposes HLGNN-MDA, a heuristic learning method based on graph neural networks that predicts miRNA-disease associations from known miRNA-disease relationships. HLGNN-MDA first extracts the enclosing subgraph around each miRNA-disease pair to be predicted to obtain the local network structure. Each node in the enclosing subgraph is then labeled, and the labeled subgraphs are input into the graph neural network for classification. In particular, second-order topological information is added to the convolutional layer of the graph neural network to enable it to learn information between similar nodes. Second, different combinations of propagation functions are designed to improve the accuracy and stability of the graph neural network. We compared the model with miRNA-disease association prediction models of the same type using tenfold cross-validation. The results showed that HLGNN-MDA obtained better performance than most miRNA-disease association prediction models. After discussing the effect of hop count on extracting enclosing subgraphs, we evaluated in turn each propagation function combined in the model. Finally, we used the trained HLGNN-MDA model to make predictions and performed case studies on breast cancer, hepatocellular carcinoma and renal cell carcinoma. A total of 49 of the top 50 miRNAs predicted for breast cancer could be found in the validation database; the remaining one, hsa-mir-362, was also found to be associated with breast cancer, as supported by the literature. Similarly, 49 of the top 50 miRNAs predicted for hepatocellular carcinoma could be validated against the database; the remaining one, hsa-mir-495, was also found to be related to hepatocellular carcinoma, as supported by the literature. Finally, 47 of the top 50 miRNAs associated with renal cell carcinoma could be validated.
Generally, HLGNN-MDA has the following advantages: First, HLGNN-MDA can select arbitrary links for prediction without having to predict all potential miRNA-disease associations in the adjacency matrix. In particular, when predicting individual diseases or miRNAs, HLGNN-MDA can directly obtain the corresponding results. Second, HLGNN-MDA does not strictly require the corresponding similarity data because it can learn information through the topology of the network.
However, there are still some aspects to improve. Effective miRNA and disease features are also important for prediction; therefore, incorporating such features into HLGNN-MDA should be further investigated. Second, HLGNN-MDA is an end-to-end supervised learning framework, and one of the problems with miRNA-disease association prediction is that there are no definite negative samples: among the potential associations, only a small proportion are truly associated, and most are unassociated. In this paper, the prediction of miRNA-disease associations is approximated as a supervised learning problem [73] with insufficient samples. Therefore, a new link prediction heuristic represents a promising direction for future research.