A Semi-Supervised Learning Algorithm for Predicting Four Types MiRNA-Disease Associations by Mutual Information in a Heterogeneous Network

Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, to predict the underlying types of miRNA-disease associations, we propose a semi-supervised model, the network-based label propagation algorithm for inferring multiple types of miRNA-disease associations (NLPMMDA), which exploits mutual information derived from a heterogeneous network. NLPMMDA integrates disease semantic similarity, miRNA functional similarity, and the Gaussian interaction profile kernel similarity of miRNAs and diseases to construct the heterogeneous network. As a semi-supervised model, NLPMMDA does not require verified negative samples. Leave-one-out cross validation (LOOCV) on four known types of miRNA-disease associations demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed the effectiveness of NLPMMDA in predicting novel miRNA-disease associations and their association types.


Introduction
MicroRNAs (miRNAs) are small endogenous non-coding RNAs that mainly regulate gene expression at the post-transcriptional level [1][2][3]. They are evolutionarily conserved and act by base pairing with messenger RNAs (mRNAs), resulting in mRNA degradation or translational inhibition [2,4,5]. Increasing evidence suggests that miRNAs are involved in a variety of critical biological processes, such as development, differentiation, apoptosis and metabolism [2]. Since the discovery of lin-4 and let-7 [6,7], miRNAs have attracted extensive research attention, and numerous miRNAs have been identified. Furthermore, many databases have been established to provide information on miRNAs, such as the Human microRNA Disease Database (HMDD) [8], miR2Disease [9] and the database of Differentially Expressed miRNAs in human Cancers (dbDEMC) [10]. It has been demonstrated that dysregulation of miRNAs may lead to a variety of diseases [11][12][13]. For example, miR-21 can directly target the MAP2K3 gene during hepatocellular carcinogenesis, inhibiting MAP2K3 expression [14]. Such findings also indicate that miRNAs can serve as efficient biomarkers for disease detection, diagnosis and prognosis [15]. Therefore, identifying disease-related miRNAs is a crucial problem.
During the past few decades, various disease-related miRNAs have been identified by experimental methods. However, with the increasing number of newly discovered miRNAs and the growth of other biological information, many computational approaches have been proposed to predict miRNA-disease associations. Based on the hypothesis that distributional semantics can reveal relationships between miRNAs and diseases, Pasquier and Gardès [26] proposed a vector space model to discover new disease-miRNA associations. In this method, the distributional information of miRNAs and diseases is represented in a high-dimensional vector space that includes miRNA-disease associations, miRNA target mRNAs, miRNA family information, genomic locations of miRNAs and abstracts of associated studies. After reducing this space to fewer dimensions, they calculated the cosine distance between two vectors to measure their correlation. This method makes full use of miRNA-related information and achieves satisfactory performance.
All of the above computational methods have identified various novel miRNA-disease associations, but they do not predict the specific type of each association, so the mechanisms underlying miRNA-disease associations still cannot be fully understood. In recent years, investigating the role of miRNAs in the pathogenesis of human diseases has become one of the most active research topics [8]; in particular, the numbers of HMDD entries related to circulating miRNAs, epigenetics, miRNA-target interactions and genetics have increased remarkably. Regarding interactions between miRNAs and their target genes, miRTar [27], an integrated system, identifies miRNA-target interactions in various scenarios and analyzes miRNA-targeted genes in pathways. To improve the accuracy of miRNA-target identification, Pio et al. [28] presented a semi-supervised ensemble-based classifier that combines the prediction scores of several base algorithms to infer miRNA-targeted genes. They also predicted miRNA regulatory networks with a bi-clustering algorithm that analyzes miRNA-target interactions. The predicted miRNA-target interactions and miRNA regulatory networks are stored in the Co-clustered miRNA Regulatory Networks (ComiRNet) database. These studies have shown that miRNA regulatory networks are complicated. Chen et al. [29] then developed a method to predict multiple types of miRNA-disease associations with a restricted Boltzmann machine (RBM) model. They constructed RBMs for miRNAs based on data from HMDD v2.0, which include four types of miRNA-disease associations, and, after initializing the visible and hidden layers, trained the RBMs with the contrastive divergence (CD) algorithm to obtain the model parameters. Finally, novel disease-related miRNAs and their association types can be predicted by the trained RBM model.
Although this was the first model to predict multiple types of miRNA-disease associations, it only uses the known associations of the four types and ignores disease-disease and miRNA-miRNA relationships. Besides, RBM is a deep learning model whose training is time-consuming. In addition, interaction prediction methods for other types of biological entities can offer useful guidance for miRNA-disease association inference. By comparing over thirty network inference methods, Marbach et al. [30] observed that community-based methods achieve powerful and robust performance in gene regulatory network reconstruction across different gold standard datasets. Ceci et al. [31] proposed a semi-supervised method for gene network reconstruction based on a multi-view learning framework. After assigning labels and identifying multiple views, the method builds a classifier for each view and then combines the outputs of the views to obtain the final results. By applying techniques such as principal component analysis (PCA) or k-means clustering, the system can identify the views automatically. This algorithm addresses the low quality and small quantity of known gene-gene interaction data and combines the advantages of existing methods to achieve good performance.
In this paper, we propose a semi-supervised model, the network-based label propagation method for inferring multiple types of miRNA-disease associations (NLPMMDA), which exploits mutual information derived from a heterogeneous network. Label propagation is an efficient algorithm that makes full use of both labeled and unlabeled data and has been used in many studies [32][33][34][35]. The key step of NLPMMDA is the construction of the heterogeneous network. First, a disease similarity homo-network is built from disease semantic similarity and Gaussian interaction profile kernel similarity. Second, a miRNA similarity homo-network is constructed in a similar way by combining miRNA functional similarity and Gaussian interaction profile kernel similarity. Third, a multi-type miRNA-disease association hetero-network is built from four types of validated miRNA-disease associations. NLPMMDA then performs label propagation in each homo-network: the homo-networks capture the cluster structure among diseases and among miRNAs, and the hetero-network captures the mutual information of miRNA-disease pairs. Finally, the label scores of miRNA-disease pairs under the four types are calculated by propagating information over the heterogeneous network. The results of LOOCV and case studies demonstrated the reliable performance of NLPMMDA.

Genes 2018, 9, 139

Data Preparation
In this paper, four types of human miRNA-disease association data were retrieved from HMDD [36]. In the updated database, human miRNA-disease associations are annotated with four types: entries from miRNA-target interactions, circulation samples, epigenetics and genetics [8]. After mapping miRNA precursors to mature miRNAs, duplicate miRNA-disease associations were removed. Finally, 682 associations were obtained from miRNA-target interactions, 443 from circulation samples, 199 from epigenetics and 356 from genetics. These 1680 miRNA-disease associations involve 324 miRNAs and 171 diseases. The four types of associations were used to construct a multi-type miRNA-disease association hetero-network that provides mutual interaction information. They also serve as the gold standard dataset for evaluating our algorithm.
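The cleaning step (precursor-to-mature mapping followed by deduplication) can be sketched in Python as below. This is a minimal illustration under stated assumptions: records are taken to be `(miRNA, disease, type)` tuples, the precursor-to-mature lookup table is a hypothetical one-to-one dictionary (real precursors can yield two mature miRNAs), and the function and type names are invented for the example rather than taken from HMDD.

```python
from collections import Counter

# Hypothetical type labels, in the order type 1..4 used in the text.
TYPES = ("target", "circulation", "epigenetics", "genetics")


def clean_associations(records, precursor_to_mature):
    """Map precursor names to mature miRNAs and drop duplicate triples.

    records: iterable of (mirna_name, disease, assoc_type) tuples.
    precursor_to_mature: hypothetical one-to-one lookup; names missing from
    it are assumed to be mature miRNA names already.
    """
    seen = set()
    cleaned = []
    for mirna, disease, assoc_type in records:
        mature = precursor_to_mature.get(mirna, mirna)
        key = (mature, disease, assoc_type)
        if key not in seen:  # keep only the first occurrence of each triple
            seen.add(key)
            cleaned.append(key)
    return cleaned


def summarize(cleaned):
    """Return per-type counts (ordered as TYPES), #miRNAs and #diseases."""
    per_type = Counter(t for _, _, t in cleaned)
    mirnas = {m for m, _, _ in cleaned}
    diseases = {d for _, d, _ in cleaned}
    return [per_type[t] for t in TYPES], len(mirnas), len(diseases)
```

Note that a miRNA-disease pair recorded under two different types is kept twice, once per type, which matches the multi-type view of the data.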

Construct Disease Similarity Homo-Network
The relationships among diseases can be represented by a directed acyclic graph (DAG) according to the disease classification system of the Medical Subject Headings (MeSH) database, in which nodes represent diseases and links represent relationships between two diseases. For instance, the DAG of a disease $d_i$ can be represented as $DAG(d_i) = (d_i, V(d_i), E(d_i))$, where $V(d_i)$ is the set of vertices containing $d_i$ and all of its ancestor diseases, and $E(d_i)$ is the set of corresponding links. According to the algorithm proposed in [18], the semantic similarity $SS$ of $d_i$ and $d_j$ is calculated by

$$SS(d_i, d_j) = \frac{\sum_{t \in V(d_i) \cap V(d_j)} \left( D_{d_i}(t) + D_{d_j}(t) \right)}{\sum_{t \in V(d_i)} D_{d_i}(t) + \sum_{t \in V(d_j)} D_{d_j}(t)} \quad (1)$$

where $D_{d_i}(d)$ is the contribution of disease $d$ to the semantic value of disease $d_i$. The contribution of $d_i$ to its own semantic value is defined as 1, and the contribution of any other disease $d$ in $DAG(d_i)$ is

$$D_{d_i}(d) = \max \left\{ \Delta \cdot D_{d_i}(d') \mid d' \in \text{children of } d \right\}, \quad d \neq d_i$$

Here, $\Delta$ is the semantic contribution factor, which distinguishes the contributions of diseases in different layers of $DAG(d_i)$.

The Gaussian interaction profile kernel similarity for diseases is calculated with a Gaussian kernel [37]. The miRNA interaction profile of disease $d_i$, $DIP(d_i)$, is a binary vector indicating whether $d_i$ is associated with each miRNA in the multi-type miRNA-disease association hetero-network. The Gaussian interaction profile kernel similarity $GS_d$ of diseases $d_i$ and $d_j$ is defined as

$$GS_d(d_i, d_j) = \exp\left( -\gamma_d \left\| DIP(d_i) - DIP(d_j) \right\|^2 \right) \quad (2)$$

where $\gamma_d$ controls the kernel bandwidth and is set as

$$\gamma_d = 1 \Big/ \left( \frac{1}{n_d} \sum_{i=1}^{n_d} \left\| DIP(d_i) \right\|^2 \right)$$

By integrating the disease semantic similarity matrix and the Gaussian interaction profile kernel similarity matrix for diseases, the disease similarity matrix $S_d$ of the disease similarity homo-network is obtained as Equation (3).
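The semantic similarity of Equation (1) can be sketched in a few lines of Python. The sketch assumes the MeSH DAG is given as a dictionary mapping each term to its parent terms, and it uses $\Delta = 0.5$ as an assumed default, since this section does not state the value used. Pushing contributions upward from $d_i$ and keeping the maximum whenever a node is reached by several paths reproduces the max-over-children rule. The function names and the toy DAG are hypothetical.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Return {node: D_disease(node)} for every node in DAG(disease).

    parents maps each term to its parent terms (hypothetical toy input).
    A node's contribution is the maximum of delta * D(child) over its
    children in DAG(disease); the disease itself contributes 1.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, ()):
                value = delta * contrib[node]
                if value > contrib.get(parent, 0.0):  # keep the maximum
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib


def semantic_similarity(di, dj, parents, delta=0.5):
    """SS(di, dj): shared contributions over the sum of both semantic values."""
    ci = semantic_contributions(di, parents, delta)
    cj = semantic_contributions(dj, parents, delta)
    shared = ci.keys() & cj.keys()
    numerator = sum(ci[t] + cj[t] for t in shared)
    return numerator / (sum(ci.values()) + sum(cj.values()))


# Toy DAG (simplified, hypothetical): both cancers share two ancestors.
toy = {
    "Lung Neoplasms": ["Neoplasms by Site"],
    "Breast Neoplasms": ["Neoplasms by Site"],
    "Neoplasms by Site": ["Neoplasms"],
}
```

On this toy DAG each cancer has semantic value 1 + 0.5 + 0.25 = 1.75, the shared ancestors contribute 1.5, and the similarity is 1.5 / 3.5 = 3/7.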
In the disease similarity homo-network, the transition probability matrix is defined as

$$P_d = D_d^{-1} S_d \quad (4)$$

where $D_d$ is a diagonal matrix with $D_d(i, i) = \sum_{j \in N_d} S_d(i, j)$, and $N_d$ is the set of neighboring nodes of disease $d_i$.
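A NumPy sketch of Equation (2) and the row normalization in Equation (4) follows. It assumes the disease profiles $DIP(d_i)$ are the rows of a binary matrix and that the diagonal (self-similarity) is kept when summing the degree. Because the integration rule of Equation (3) is not reproduced here, the example applies the normalization to the Gaussian kernel alone. The function names are hypothetical.

```python
import numpy as np


def gip_kernel(profiles):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    Row i is DIP(d_i); gamma is the reciprocal of the mean squared row norm.
    """
    X = np.asarray(profiles, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 0.0
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def transition_matrix(S):
    """Row-normalize: P = D^{-1} S with D(i, i) = sum_j S(i, j).

    Rows whose degree is zero are left as zeros instead of dividing by zero.
    """
    S = np.asarray(S, dtype=float)
    degree = S.sum(axis=1, keepdims=True)
    return np.divide(S, degree, out=np.zeros_like(S), where=degree > 0)
```

Each row of the resulting $P_d$ sums to 1, so $P_d(i, j)$ can be read as the probability of stepping from disease $d_i$ to disease $d_j$.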

Construction of the miRNA Similarity Homo-Network
Similar to the disease similarity homo-network, the miRNA similarity homo-network is constructed from miRNA functional similarity and Gaussian interaction profile kernel similarity. MiRNA functional similarity was calculated in a previous study [18]; the functional similarity of miRNAs $m_i$ and $m_j$ is denoted $MFS(m_i, m_j)$. To reveal miRNA-disease associations under different types, $MFS(m_i, m_j)$ is simply extended to a multi-type miRNA functional similarity $MMFS(m_i, m_j, k)$ (Equation (5)), where $k$ is the specific type and $n_k$ is the total number of types. The Gaussian interaction profile kernel similarity for miRNAs under type $k$ is calculated by

$$GS_{m,k}(m_i, m_j) = \exp\left( -\gamma_{m,k} \left\| MIP_k(m_i) - MIP_k(m_j) \right\|^2 \right) \quad (6)$$

where $MIP_k(m_i)$ is a binary vector representing the associations between miRNA $m_i$ and all diseases under type $k$, and $\gamma_{m,k}$ controls the kernel bandwidth and is set as

$$\gamma_{m,k} = 1 \Big/ \left( \frac{1}{n_m} \sum_{i=1}^{n_m} \left\| MIP_k(m_i) \right\|^2 \right)$$

Here, $n_m$ is the number of miRNAs.
The integrated miRNA similarity homo-network $S_{m,k}$ is constructed as Equation (7). In the miRNA similarity homo-network, the transition probability matrix for type $k$ is defined as

$$P_{m,k} = D_{m,k}^{-1} S_{m,k} \quad (8)$$

where $D_{m,k}$ is a diagonal matrix with $D_{m,k}(i, i) = \sum_{j \in N_m} S_{m,k}(i, j)$, and $N_m$ is the set of neighboring nodes of miRNA $m_i$ in the miRNA homo-network.
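The per-type miRNA kernels of Equation (6) and the per-type transition matrices of Equation (8) can be sketched with NumPy, assuming the associations are stored as a binary tensor of shape $(n_d, n_m, n_k)$ so that $MIP_k(m_i)$ is column $i$ of slice $k$. As with the disease sketch, the functional-similarity integration of Equations (5) and (7) is not reproduced. The normalization is therefore shown on the kernels alone, and the function names are hypothetical.

```python
import numpy as np


def per_type_mirna_gip(assoc):
    """assoc: (n_d, n_m, n_k) binary tensor; returns (n_k, n_m, n_m) kernels."""
    assoc = np.asarray(assoc, dtype=float)
    kernels = []
    for k in range(assoc.shape[2]):
        profiles = assoc[:, :, k].T  # row i is MIP_k(m_i) over all diseases
        sq_norms = (profiles ** 2).sum(axis=1)
        mean_sq = sq_norms.mean()
        # A type with no associations gets gamma = 0, i.e. an all-ones kernel.
        gamma = 1.0 / mean_sq if mean_sq > 0 else 0.0
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
        kernels.append(np.exp(-gamma * np.maximum(sq_dists, 0.0)))
    return np.stack(kernels)


def per_type_transition(similarities):
    """Row-normalize each type-specific similarity matrix S_{m,k}."""
    S = np.asarray(similarities, dtype=float)
    degree = S.sum(axis=2, keepdims=True)
    return np.divide(S, degree, out=np.zeros_like(S), where=degree > 0)
```

Keeping one kernel per type lets miRNAs that are similar under, say, the circulation type spread labels to each other without being influenced by their genetics-type profiles.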

Construction of the Multi-Type miRNA-Disease Association Hetero-Network
The multi-type miRNA-disease association hetero-network represents the relationships between miRNAs and diseases extracted from HMDD, covering four types of human miRNA-disease associations. Figure 1 shows an example of the heterogeneous network containing four diseases and five miRNAs. The edges of the hetero-network are created from the four known types of miRNA-disease associations, so there are at most four edges between a disease and a miRNA. The edge vector $E_{ij} = \{e_k\}$ represents the edges between disease $d_i$ and miRNA $m_j$, where $e_k = 1$ if $d_i$ and $m_j$ have an association of type $k$, and $e_k = 0$ otherwise. For example, if $d_3$ and $m_2$ are associated under types 1, 2 and 3, the edge vector is $E_{32} = [1, 1, 1, 0]$. Based on the edge vectors, the adjacency matrix of the hetero-network is created: if disease $d_i$ and miRNA $m_j$ have confirmed associations, then $A(d_i, m_j) = E_{ij}$, where $i = 1, \ldots, n_d$, $j = 1, \ldots, n_m$, and $n_d$ and $n_m$ are the numbers of diseases and miRNAs, respectively.
Then, the transition probability matrix $P_{d,m,k}$ between miRNAs and diseases in the hetero-network is calculated by Equation (9), where $D_{d,m,k}$ is a diagonal matrix.
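The adjacency structure described above can be held in a three-dimensional array whose entry `A[i, j]` is the edge vector $E_{ij}$. The NumPy sketch below assumes the cleaned data are `(miRNA, disease, type)` triples and uses hypothetical type labels and toy node names that mirror the $E_{32}$ example.

```python
import numpy as np

# Hypothetical type labels mapped to positions 0..3 (types 1..4 in the text).
TYPE_INDEX = {"target": 0, "circulation": 1, "epigenetics": 2, "genetics": 3}


def build_adjacency(triples, diseases, mirnas):
    """Return A of shape (n_d, n_m, 4) with A[i, j] equal to the edge vector E_ij.

    triples: iterable of (mirna, disease, type_name); pairs without any
    confirmed association keep the all-zero edge vector.
    """
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(mirnas)}
    A = np.zeros((len(diseases), len(mirnas), len(TYPE_INDEX)), dtype=int)
    for mirna, disease, type_name in triples:
        A[d_index[disease], m_index[mirna], TYPE_INDEX[type_name]] = 1
    return A


diseases = ["d1", "d2", "d3", "d4"]
mirnas = ["m1", "m2", "m3", "m4", "m5"]
triples = [("m2", "d3", "target"), ("m2", "d3", "circulation"),
           ("m2", "d3", "epigenetics"), ("m1", "d1", "genetics")]
A = build_adjacency(triples, diseases, mirnas)
```

Here `A[2, 1]` (zero-based indices for $d_3$ and $m_2$) holds `[1, 1, 1, 0]`, matching the edge vector in the example.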

Network-Based Label Propagation Algorithm for Predicting Multiple miRNA-Disease Associations
Label propagation is a semi-supervised method whose main purpose is to predict the labels of unlabeled data using both labeled and unlabeled data. A regularization framework for label propagation on a single network has been introduced, and its convergence has been proved [35]. Motivated by [38], we extend label propagation from a single network to our heterogeneous network and present NLPMMDA. Figure 2 shows the procedure of the NLPMMDA algorithm. NLPMMDA takes full advantage of the mutual information in the heterogeneous network, which allows novel disease-related miRNAs and their specific association types to be predicted.

The NLPMMDA algorithm is described in detail as follows:
Step 1. Obtain four types of miRNA-disease association data from HMDD and clean the data.
Step 2. Construct the heterogeneous network according to Sections 2.2, 2.3 and 2.4. In this study, the heterogeneous network G = (V, E) is composed of the disease similarity homo-network, the miRNA similarity homo-network and the multi-type miRNA-disease association hetero-network.
Step 3. Perform network-based label propagation on the disease similarity homo-network. For a given query disease, the final label vector is obtained by iterating Equation (10), where $P_d$ is the transition probability matrix calculated by Equation (4); $f_d^{t-1}$ is the label vector of diseases at time $t-1$, whose $i$th element is the current label score of disease $d_i$; $f_d^{t}$ is the label vector of diseases at time $t$; and $f_d^{0}$ is the initial label vector of disease nodes, obtained by Equation (11).
In Equation (11), $l_d^0$ is the current label vector of diseases derived from the miRNA-disease association hetero-network; $f_m$ is the current label vector of miRNA nodes; $\lambda_d$ is the diffusion parameter of the disease similarity homo-network, which specifies the relative amount of information a node receives from its neighbors versus its initial label; and $P_{d,m,k}$ is the transition probability matrix calculated by Equation (9). The iteration stops, and $f_d^t$ is taken as its limit $f_d$, when $\| f_d^t - f_d^{t-1} \| < \sigma$, where $\sigma$ is a threshold that controls termination.
Step 4. Perform network-based label propagation on the miRNA similarity homo-network to obtain the final label vector according to Equation (12), where $P_{m,k}$ is the transition probability matrix calculated by Equation (8); $f_{m,k}^{t-1}$ is the label vector of miRNAs at time $t-1$; $f_{m,k}^{t}$ is the label vector of miRNAs at time $t$; and $f_{m,k}^{0}$ is the initial label vector of miRNAs under the four types, calculated by Equation (13).
In Equation (13), $l_m^0$ is the current label vector of miRNAs, whose $j$th element is the current label score of miRNA $m_j$ under type $k$; $f_d$ is the current label vector of diseases; and $\lambda_m$ is the diffusion parameter of the miRNA similarity homo-network. Similarly, the iteration converges when $\| f_{m,k}^t - f_{m,k}^{t-1} \| < \sigma$, where $\sigma$ is a threshold that controls termination.
Step 5. Alternately perform network-based label propagation in the disease similarity homo-network and the miRNA similarity homo-network, updating the final label vectors $f_m$ and $f_d$ until both homo-networks converge under the same criterion as above. Finally, the final label score of each miRNA-disease pair under each of the four types is obtained. By ranking the label scores in the final label vector, the top-ranked miRNAs are considered the most probable disease-related miRNAs, and their highest-scoring type is considered the most probable association type.
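Steps 3 to 5 can be sketched as follows. Because the formulas of Equations (10) to (13) are not reproduced in this text, the sketch is built on assumptions. Inside each homo-network it uses the common iterative label propagation form f ← λPf + (1 - λ)f⁰. Each homo-network is re-initialized by mixing a node's own label with labels pulled across the type-k hetero-network edges, weighted by a made-up parameter `cross_weight`. The result illustrates the alternating scheme, not the authors' exact update rules. All function names are hypothetical; λ_d = λ_m = 0.2 follows the values selected in the parameter study.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum; all-zero rows stay zero."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=1, keepdims=True)
    return np.divide(M, deg, out=np.zeros_like(M), where=deg > 0)


def propagate(P, f0, lam, sigma=1e-8, max_iter=1000):
    """Iterate f <- lam * P f + (1 - lam) * f0 until the L1 change < sigma."""
    f0 = np.asarray(f0, dtype=float)
    f = f0.copy()
    for _ in range(max_iter):
        f_next = lam * (P @ f) + (1.0 - lam) * f0
        converged = np.abs(f_next - f).sum() < sigma
        f = f_next
        if converged:
            break
    return f


def score_query_disease(q, A, P_d, P_m, lam_d=0.2, lam_m=0.2,
                        cross_weight=0.5, sigma=1e-8, max_rounds=100):
    """Return an (n_m, n_k) score matrix for the query disease index q.

    A: (n_d, n_m, n_k) binary association tensor; P_d: (n_d, n_d) disease
    transition matrix; P_m: (n_k, n_m, n_m) per-type miRNA transition matrices.
    cross_weight is a hypothetical mixing weight, not a parameter of the paper.
    """
    A = np.asarray(A, dtype=float)
    n_d, n_m, n_k = A.shape
    scores = np.zeros((n_m, n_k))
    for k in range(n_k):
        P_dm = row_normalize(A[:, :, k])    # disease -> its type-k miRNAs
        P_md = row_normalize(A[:, :, k].T)  # miRNA -> its type-k diseases
        l_d = np.zeros(n_d)
        l_d[q] = 1.0                        # the query disease is labelled
        l_m = A[q, :, k].copy()             # its known type-k miRNAs are labelled
        f_d, f_m = l_d.copy(), l_m.copy()
        for _ in range(max_rounds):         # alternate between homo-networks
            init_d = (1 - cross_weight) * l_d + cross_weight * (P_dm @ f_m)
            f_d_new = propagate(P_d, init_d, lam_d, sigma)
            init_m = (1 - cross_weight) * l_m + cross_weight * (P_md @ f_d_new)
            f_m_new = propagate(P_m[k], init_m, lam_m, sigma)
            change = np.abs(f_d_new - f_d).sum() + np.abs(f_m_new - f_m).sum()
            f_d, f_m = f_d_new, f_m_new
            if change < sigma:
                break
        scores[:, k] = f_m
    return scores


def rank_candidates(scores, known, top=10):
    """Rank miRNAs by their best score over types, skipping known pairs."""
    masked = np.where(np.asarray(known, dtype=bool), -np.inf, scores)
    best_type = masked.argmax(axis=1)
    best_score = masked.max(axis=1)
    order = [j for j in np.argsort(-best_score) if np.isfinite(best_score[j])]
    return [(int(j), int(best_type[j]), float(best_score[j])) for j in order[:top]]
```

`rank_candidates` masks the (miRNA, type) pairs already known for the query disease and reports, for each remaining miRNA, its highest-scoring type, mirroring the ranking described in Step 5. With λ < 1 and row-stochastic transition matrices, each inner propagation converges to (1 - λ)(I - λP)⁻¹f⁰.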

Performance Evaluation
In this study, to evaluate the performance of NLPMMDA, LOOCV was implemented on four known, experimentally verified types of human miRNA-disease associations. Each known miRNA-disease association was left out in turn, and the remaining associations were used as the labeled set. NLPMMDA was then run to obtain the predicted scores of the four types for each known association. A receiver operating characteristic (ROC) curve was drawn by plotting the true positive rate (TPR) against the false positive rate (FPR) at different thresholds, and the area under the ROC curve (AUC) was calculated to evaluate predictive performance, where AUC = 1 indicates perfect performance and AUC = 0.5 indicates random performance. ROC curves are typically used in binary classification to assess a classifier. If a dataset contains only positive and unlabeled samples, the ROC curve and AUC can be obtained from the ranking of the test samples; for example, in LOOCV, the left-out test sample is ranked against the prediction scores of candidate miRNAs that have no confirmed association with the disease under investigation. Because the dataset is divided into four classes, the output is binarized and an ROC curve is drawn for each type. Finally, by treating each element of the predicted score matrix as a binary prediction, the micro-average ROC curve was obtained. As shown in Figure 3, NLPMMDA obtained a reliable micro-average AUC of 0.9739. The AUC values for the four types of miRNA-disease associations are 0.9396, 0.9822, 0.9957 and 0.9813, respectively, where type 1 represents entries from miRNA-target interactions, type 2 entries from circulation samples, type 3 entries from epigenetics and type 4 entries from genetics.
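The per-type and micro-average AUC computation can be sketched in pure Python. The sketch assumes the LOOCV results have been collected into a binary label matrix and a score matrix, with one row per evaluated miRNA-disease pair and one column per association type. It relies on the fact that the AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counting one half), so no curve has to be drawn. The function names are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: P(positive outranks negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def per_type_auc(label_matrix, score_matrix):
    """One binarized (one-vs-rest) AUC per association type (column)."""
    n_types = len(label_matrix[0])
    return [auc([row[k] for row in label_matrix], [row[k] for row in score_matrix])
            for k in range(n_types)]


def micro_average_auc(label_matrix, score_matrix):
    """Treat every (pair, type) cell as one binary prediction."""
    flat_labels = [y for row in label_matrix for y in row]
    flat_scores = [s for row in score_matrix for s in row]
    return auc(flat_labels, flat_scores)
```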
Besides, considering the limited number of known miRNA-disease associations, the area under the precision-recall curve (AUPR) is used to further evaluate NLPMMDA. The precision-recall (PR) curve plots precision against recall at different thresholds; high precision corresponds to few false positives among the predicted associations, and high recall corresponds to few false negatives. An AUPR value closer to 1 indicates better performance. As shown in Figure 4, the micro-average AUPR of NLPMMDA is 0.9323, and the AUPR values for the four types are 0.9441, 0.9371, 0.9625 and 0.9225, respectively.
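AUPR can be estimated in several ways. The sketch below uses average precision, the mean of the precision values at the rank of each positive, which is one common step-wise estimator; trapezoidal interpolation is another. Which estimator produced the reported values is not stated, so this choice is an assumption. The label and score layout is the same hypothetical matrix layout as in the AUC sketch, and the function names are illustrative.

```python
def average_precision(labels, scores):
    """Step-wise AUPR estimate: mean precision at the rank of each positive.

    Tied scores keep their input order, since Python's sort is stable.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


def micro_average_precision(label_matrix, score_matrix):
    """Micro-average: pool every (pair, type) cell into one ranking."""
    flat_labels = [y for row in label_matrix for y in row]
    flat_scores = [s for row in score_matrix for s in row]
    return average_precision(flat_labels, flat_scores)
```

For labels `[1, 0, 1, 0]` scored `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3, giving (1/1 + 2/3) / 2 = 5/6.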

Comparison with the Restricted Boltzmann Machine Model for Predicting Multiple Types of miRNA-Disease Associations Method
To the best of our knowledge, the restricted Boltzmann machine model for predicting multiple types of miRNA-disease associations (RBMMMDA) [29] is the first method to predict multiple types of miRNA-disease associations. It only uses known multi-type miRNA-disease association data, and its LOOCV AUC is 0.8606. In contrast, NLPMMDA integrates disease semantic similarity, Gaussian interaction profile kernel similarity for diseases, miRNA functional similarity, Gaussian interaction profile kernel similarity for miRNAs and the four known types of miRNA-disease associations, and obtains better performance, with a micro-average AUC of 0.9739. Given the complex structure of the RBM model, it is difficult to incorporate disease similarity and miRNA similarity information into it. The performances of RBMMMDA and NLPMMDA are compared in Table 1. In addition, the RBM model has many parameters whose selection is not well resolved, so they are simply set to empirical values, whereas the parameters of NLPMMDA are selected according to experimental performance. Moreover, training the RBM model takes a long time, whereas NLPMMDA, a semi-supervised label propagation method, has a short execution time.

Effect of the Parameters
There are two parameters in the NLPMMDA algorithm, $\lambda_d$ and $\lambda_m$. $\lambda_d$ is the diffusion parameter of the disease similarity homo-network, which balances the information a node receives from its initial label against that from its neighbors; $\lambda_m$ is the corresponding diffusion parameter of the miRNA similarity homo-network. In this paper, $\lambda_d$ and $\lambda_m$ are set to the same value. Varying $\lambda_d$ and $\lambda_m$ from 0.1 to 0.9 in steps of 0.1, LOOCV is implemented to obtain the AUC of NLPMMDA; the results are shown in Table 2. The AUC is almost unchanged for $0.1 \le \lambda_d, \lambda_m \le 0.4$ and decreases for $0.6 \le \lambda_d, \lambda_m \le 0.9$. However, our method has no predictive ability when $\lambda_d$ and $\lambda_m$ equal 0.5, which results from the way labels are initialized in the homo-networks. Therefore, $\lambda_d = 0.2$ and $\lambda_m = 0.2$ are selected in this study to predict novel miRNA-disease association types with NLPMMDA. The optimal parameter values depend on the known miRNA-disease association dataset.

Case Studies of Lung Cancer and Breast Cancer
To further confirm the robustness of NLPMMDA, case studies of lung cancer and breast cancer were conducted to evaluate its ability to predict multiple types of miRNA-disease associations. All known miRNA-disease associations under the four types were assigned as labeled data, and unknown miRNA-disease pairs were used as unlabeled data. Based on these labeled and unlabeled data, NLPMMDA predicts miRNA-disease associations and their specific types. The prediction results were manually verified against online databases and recent literature. The top 50 potential miRNA-disease associations and their types for lung cancer and breast cancer are listed in Tables 3 and 4, respectively, including the disease-related miRNAs, the association types and the supporting evidence for each miRNA-disease pair, given as PubMed Unique Identifiers (PMIDs) of the related literature. Because of the complexity of diseases and of the roles of the associated miRNAs, a predicted association type supported by at least three PubMed articles is considered reliable.
Lung cancer has high morbidity and mortality in both men and women and is the most common cause of cancer death worldwide [39]. Although new therapeutics and strategies for the detection and early diagnosis of lung cancer have progressed, its prognosis remains poor [40]. Recent studies have demonstrated the important role of miRNAs in the development and therapy response of lung cancer. In the labeled data, there are 52 known lung cancer-related miRNA-disease associations, which are classified as the miRNA-target type [41,42], circulating miRNA type [43,44], epigenetics type [45] and genetics type [46,47]. After NLPMMDA was applied to the labeled and unlabeled data, the scores of miRNA-disease pairs were predicted. Among the top 20 and top 50 candidates (excluding known associations and types), 17 and 44 lung cancer-related miRNAs and their association types, respectively, are supported by evidence, and 25 predicted results are considered reliable association types. As shown in Table 3, among the top 50 potential lung cancer-related miRNAs, miR-133a acts as a tumor suppressor in non-small cell lung cancer (NSCLC) by targeting IGF-1R, TGFBR1 and EGFR [48]. Also, in NSCLC, miR-143 targets ATG2B and miR-34a targets TGFβR2 to inhibit cell proliferation [49,50]. Besides, serum miR-126 and miR-21 levels can serve as novel biomarkers for NSCLC development, metastasis and screening [51,52], and circulating miR-29a is a highly prognostic signature in patients with non-squamous NSCLC [53]. The single nucleotide polymorphism rs2910164 of miR-146a is associated with the risk of NSCLC in the Chinese population, which can be regarded as the genetics type [54].

Based on annual statistical data, breast cancer is one of the most common types of cancer and mainly occurs in women [55], and recent studies show that breast cancer death rates are still rising [56]. Accumulating evidence shows that miRNAs play a vital role in breast cancer and can be used as diagnostic and therapeutic biomarkers for breast cancer patients. In our labeled data, there are 176 known breast cancer-related miRNA-disease associations, which can be divided into the four types according to evidence from the literature. For example, serum miR-155 is up-regulated in breast cancer patients, making it a potential biomarker for tracking breast cancer [57,58]; according to HMDD, the association between miR-155 and breast cancer is labeled as the circulation type [8]. NLPMMDA was used to predict candidate miRNAs without known associations with breast cancer, together with their association types. Among the top 20 and top 50 potential miRNAs, 17 and 37 miRNA-disease association types, respectively, are confirmed by biological evidence, and 16 predicted results are considered reliable association types; Table 4 shows the details. Hsa-miR-1 is a breast cancer-related miRNA in the HMDD database, but its underlying association type is unclear. In our prediction, the association between hsa-miR-1 and breast cancer is of the target type, which is supported by several lines of evidence. For example, Liu et al. [59] reported that hsa-miR-1 can function as a tumor suppressor in breast cancer by targeting K-RAS and MALAT1. Also, IMPDH1 and NPEPL1 were identified as direct targets of miR-19a in breast cancer using a quantitative proteomic strategy [60]; miR-19b can promote breast cancer metastasis by targeting MYLIP and its related cell adhesion molecules [61]; and miR-133a acts as a tumor suppressor in breast cancer by targeting EGFR [62]. Moreover, the plasma level of circulating miR-146a is involved in breast cancer biology and tumor progression [63]. In primary human breast cancer, hsa-miR-9 is inactivated epigenetically through aberrant hypermethylation [64].
In conclusion, 44 and 37 of the top 50 predicted lung cancer-related and breast cancer-related miRNAs, respectively, together with their specific association types, are confirmed by experimental evidence.
The results of the case studies demonstrate the robustness of NLPMMDA.

Web Server for the Network-Based Label Propagation Algorithm for Predicting Multiple miRNA-Disease Associations
In this study, a web server was built to present the prediction results of NLPMMDA; it is freely available at http://39.107.230.144/NLPMMDA. The web server provides predictions of four types of miRNA-disease associations based on the NLPMMDA algorithm. The prediction result for a specific disease is shown in a table listing the rank, miRNA name, association type and potential association probability. The table also includes the experimentally verified miRNAs and association types of the disease, whose potential association probability is 1.0.

Discussion
Increasing evidence indicates that miRNAs play a prominent role in the development of various diseases, and understanding the underlying mechanisms of miRNAs in diseases has become an urgent problem. In this study, a network-based label propagation algorithm is proposed to infer specific types of miRNA-disease associations by integrating four types of known human miRNA-disease associations derived from HMDD. NLPMMDA constructs a heterogeneous network in which the disease similarity homo-network integrates disease semantic similarity with Gaussian interaction profile kernel similarity, and the miRNA similarity homo-network integrates miRNA functional similarity with Gaussian interaction profile kernel similarity. In addition, a multi-type miRNA-disease association hetero-network is constructed from the four types of known association data. The traditional label propagation algorithm is extended to the heterogeneous network, and the label initialization strategy is modified. The LOOCV results and the case studies of lung cancer and breast cancer demonstrate the reliable performance of NLPMMDA.
Compared with current computational methods for predicting multiple types of miRNA-disease associations, NLPMMDA achieves better performance for several reasons. First, the network-based label propagation algorithm is a semi-supervised machine learning model. Selecting negative samples is one of the current difficulties for predictive models, and NLPMMDA does not require verified negative miRNA-disease associations. Second, NLPMMDA calculates transition probabilities among diseases and among miRNAs under the four types, which captures similarity information from neighboring nodes in the homo-networks and improves predictive performance. Third, the heterogeneous network provides mutual information between the miRNA similarity homo-network and the disease similarity homo-network. The labels of nodes in each homo-network are initialized from their own initial labels and from their neighbors in the other homo-network, which makes the label confidence scores more reliable. Although NLPMMDA produces highly reliable results, it still has some limitations. The transition probabilities for the four types are simply calculated from miRNA functional similarity and Gaussian interaction profile kernel similarity, which may introduce offset errors. In addition, NLPMMDA is not applicable to diseases without any known miRNA associations. Different combinations of diffusion parameters in the homo-networks may improve the performance of NLPMMDA, which can be studied further in the future.