A Novel Approach Based on a Weighted Interactive Network to Predict Associations of MiRNAs and Diseases

Accumulating experimental evidence indicates that microRNAs (miRNAs) play a significant role in the pathogenesis of diseases; therefore, developing powerful computational models to identify potential human miRNA–disease associations is vital for understanding disease etiology and pathogenesis. In this paper, a weighted interactive network was first constructed by combining known miRNA–disease associations with the integrated similarity between diseases and the integrated similarity between miRNAs. Then, a new computational method based on this weighted interactive network, WINMDA, was developed for discovering potential miRNA–disease associations by integrating the T most similar neighbors and the shortest path algorithm. Simulation results show that WINMDA achieves area under the receiver operating characteristic (ROC) curve (AUC) values of 0.9183 ± 0.0007 in 5-fold cross-validation, 0.9200 ± 0.0004 in 10-fold cross-validation, 0.9243 in global leave-one-out cross-validation (LOOCV), and 0.8856 in local LOOCV. Furthermore, case studies of colon neoplasms, gastric neoplasms, and prostate neoplasms based on the Human microRNA Disease Database (HMDD) were implemented, in which 94% (colon neoplasms), 96% (gastric neoplasms), and 96% (prostate neoplasms) of the top 50 predicted miRNAs were confirmed by recent experimental reports, which further demonstrates that WINMDA can effectively uncover potential miRNA–disease associations.


Introduction
Recently, an increasing number of studies indicated that non-coding RNAs (ncRNAs) play an extensive and important role in many biological processes such as cell differentiation, ontogenesis, and disease development [1][2][3]. In particular, microRNAs (miRNAs), a class of small ncRNAs with a length of 20-25 nucleotides, were proven to be closely related to the occurrence of many diseases that are seriously harmful to human health [4,5]; they are able to regulate many functions of eukaryotic cells and affect various processes such as gene expression, cell-cycle regulation, and individual development [6]. For example, miR-126 was demonstrated to be associated with clear cell human renal cell carcinoma [7], while miR-34a-5p was proven to have a critical impact on ovarian cancer (OC) cell [...] was adopted to construct a weighted interactive network in this paper. Moreover, on the basis of the premises that functionally similar miRNAs may regulate similar diseases and that similar diseases tend to associate with functionally similar miRNAs, a novel prediction model based on the newly constructed weighted interactive network was developed for miRNA-disease association inference (WINMDA). Compared with several state-of-the-art computational models, the strong point of WINMDA lies in the construction of a weighted interactive network and the introduction of the shortest path between nodes in that network. This also means that WINMDA offers a way to mitigate the sparseness of the adjacency matrix A and, at the same time, does not need negative samples to predict potential miRNA-disease associations. All prediction results of potential miRNA-disease associations are shown in Supplementary files 1-3; researchers could use these data to guide biological experiments in the future. In addition, the performance of WINMDA was evaluated by cross-validation and case studies of colon neoplasms, gastric neoplasms, and prostate neoplasms.
Our simulation results show that WINMDA achieves area under the receiver operating characteristic (ROC) curve (AUC) values of 0.9183 ± 0.0007, 0.9200 ± 0.0004, 0.9243, and 0.8856 in 5-fold cross-validation, 10-fold cross-validation, global leave-one-out cross-validation (LOOCV), and local LOOCV, respectively. Additionally, 94% (colon neoplasms), 96% (gastric neoplasms), and 96% (prostate neoplasms) of the top 50 predicted miRNAs were confirmed by dbDEMC [16], miR2Disease [18], and recently published experimental studies. These results also demonstrate that WINMDA can effectively predict potential miRNA-disease associations.

Results and Case Studies
In this section, we evaluated the predictive performance of WINMDA through the following experiments: we firstly compared WINMDA with four state-of-the-art methods, namely BNPMDA [34], PBMDA [29], WBSMDA [27], and RLSMDA [26] in the framework of LOOCV. Then, the process of five-fold cross-validation was repeated 50 times for our method to evaluate the prediction performance of WINMDA. Thirdly, the influence of given parameters T and w on the prediction model was analyzed. Moreover, several case studies were performed to validate the feasibility of our method. Finally, experimental results regarding the top 50 predicted associations between miRNAs and four important neoplasms were listed, and we implemented the performance comparisons between WINMDA and four state-of-the-art methods through observing the rankings of six important disease-related miRNAs in the case studies.

Comparison with Existing State-of-the-Art Methods
We evaluated WINMDA alongside four state-of-the-art methods by measuring how accurately each predicts potential miRNA-disease associations under global and local LOOCV. In global LOOCV, each known disease-miRNA association was used in turn as the test sample, the other known miRNA-disease associations were used as the training set, and all unknown disease-miRNA associations in HMDD were used as the candidate set. In local LOOCV, for each given disease d, each known miRNA related to d was used in turn as the test sample, all other known miRNAs related to d were used as training samples, and all miRNAs without a known association with d were used as candidates. By comparing the score of the test sample with the scores of the candidate associations, we could evaluate how well the test association was ranked within the candidate set; if the test sample ranked above the preset threshold, it was considered successfully predicted by the model. Let TP, TN, FN, and FP denote the numbers of true positives, true negatives, false negatives, and false positives, respectively; then, under each threshold setting, the true positive rate (TPR; sensitivity) and the false positive rate (FPR; 1 − specificity) can be obtained as follows:

TPR = TP / (TP + FN), FPR = FP / (FP + TN)

Here, sensitivity is the percentage of test samples ranked above the given threshold, whereas specificity is the percentage of negative samples ranked below the given threshold. After obtaining the TPR and FPR pairs under different thresholds, the ROC curve (with FPR on the horizontal axis and TPR on the vertical axis) could be plotted by connecting these pairs. Finally, the AUC could be calculated to summarize the prediction performance of each computational model.
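The threshold sweep described above can be sketched in a few lines. This is a minimal illustration rather than the evaluation code used for WINMDA: the function names `roc_points` and `auc` are hypothetical, the scores are assumed to be distinct (ties are not handled), and a higher score is assumed to mean a more likely association.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a rank threshold.

    scores: predicted association scores (assumed: higher = more likely).
    labels: 1 for held-out known associations, 0 for candidate associations.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # best-ranked sample first
    labels = labels[order]
    tp = np.cumsum(labels)                       # positives above each cut-off
    fp = np.cumsum(1 - labels)                   # negatives above each cut-off
    tpr = tp / labels.sum()                      # TP / (TP + FN)
    fpr = fp / (len(labels) - labels.sum())      # FP / (FP + TN)
    # Prepend the origin so that the curve starts at (0, 0).
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: two test samples (label 1) ranked among two candidates (label 0).
fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(fpr, tpr))
```

In the toy example, three of the four (positive, negative) pairs are ordered correctly, which is what the AUC measures.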
Furthermore, the larger the AUC value is, the more likely the model is to rank a positive sample ahead of a negative sample, and hence the better it separates the two classes. Next, for global LOOCV and local LOOCV, we compared WINMDA with four state-of-the-art computational methods, namely BNPMDA, PBMDA, WBSMDA, and RLSMDA; the simulation results are shown in Figures 1 and 2, respectively.
From the two figures, WINMDA achieves AUCs of 0.9243 and 0.8856 for global LOOCV and local LOOCV, respectively, when T = 16 and w = 0.6, which are higher than the AUCs of 0.9169 and 0.8523 achieved by PBMDA, 0.9082 and 0.8571 achieved by BNPMDA, 0.8030 and 0.8390 achieved by WBSMDA, and 0.8426 and 0.7169 achieved by RLSMDA. WINMDA thus outperforms these four computational models in both global and local LOOCV and can serve as a useful tool for discovering potential miRNA-disease associations.
In addition, to reduce the potential bias of random sample partitioning in the performance assessment, we repeated the random division of the known miRNA-disease associations 50 times, and the AUCs were obtained in a similar way to global LOOCV. As a result, WINMDA achieved average AUCs of 0.9183 and 0.9200, with standard deviations of 0.0007 and 0.0004, under 5-fold and 10-fold cross-validation, respectively (Table 1).
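One way to realize such a random partition is sketched below. The helper `kfold_masks` is hypothetical; it assumes that the known associations are stored as the ones of a binary disease-by-miRNA matrix and that a held-out association is simply reset to zero in the training matrix.

```python
import numpy as np

def kfold_masks(assoc, k, rng):
    """Yield (train_matrix, test_pairs) for one random k-fold split.

    assoc: binary matrix whose ones are the known associations.
    rng:   a numpy random Generator, so that repeated runs differ.
    """
    pairs = np.argwhere(assoc == 1)              # (row, column) of each known pair
    perm = rng.permutation(len(pairs))           # shuffle before splitting
    for fold in np.array_split(perm, k):
        test_pairs = pairs[fold]
        train = assoc.copy()
        # Hide the held-out associations from the training matrix.
        train[test_pairs[:, 0], test_pairs[:, 1]] = 0
        yield train, test_pairs

# Toy usage: 6 known associations split into 3 folds of 2.
rng = np.random.default_rng(0)
toy = np.array([[1, 0, 1, 0],
                [0, 1, 1, 0],
                [1, 1, 0, 0]])
for train, test_pairs in kfold_masks(toy, 3, rng):
    print(train.sum(), len(test_pairs))
```

Repeating the loop with fresh random permutations and recording the AUC of each run would give the mean and standard deviation reported for a repeated k-fold experiment.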

Evaluation of the Effects of Parameters
WINMDA has two important parameters, w and T, as illustrated in Equation (22).

Effects of Parameter T
From Equations (20), (21), and (22), parameter T has an important effect on the accuracy of WINMDA. If the value of T is too large, a lot of noisy data will be included, which reduces the predictive performance of WINMDA. Conversely, if the value of T is too small, the useful associations may not be sufficient for accurately estimating potential associations between some specific diseases and miRNAs. Hence, to evaluate the effects of parameter T, we varied T from 1 to 20 when implementing WINMDA; the simulation results are shown in Table 2. The AUCs achieved by WINMDA varied with the value of T. Specifically, the prediction performance of WINMDA increased as T increased from 1 to 16, which indicates that the number of useful neighbors is positively related to the prediction performance of our model. The AUC declined when T > 16, which indicates that an excess of noisy data interferes with WINMDA. Therefore, T was set to 16 for WINMDA in this paper.

Effects of Parameter w
In this section, to investigate the effects of parameter w on the prediction performance of WINMDA, we set w to different values ranging from 0 to 1 while implementing WINMDA under LOOCV; the results are shown in Table 3. The value of w has an important influence on the performance of WINMDA. Specifically, WINMDA achieves the maximum AUC value when w is set to 0.6. Hence, we set w to 0.6 in this paper.

Table 3. Effects of w on the prediction performance of WINMDA when T = 16.

Case Studies
Recently, increasing evidence demonstrated that miRNAs play an extensive and important role in the physiological processes of the body [35]. In addition, in developed countries such as the United States and throughout Europe, cancer is the second leading cause of human death, while it ranks second or third in developing countries [36]. Therefore, in order to further evaluate the accuracy of WINMDA in predicting potential disease-miRNA associations, we chose three kinds of cancers, i.e., colon neoplasms, gastric neoplasms, and prostate neoplasms, as case studies for WINMDA, and the prediction results were verified by recently published experimental studies and two databases, namely miR2Disease and dbDEMC. During the simulation, for each kind of cancer, all known related miRNAs were considered as seed miRNAs, and the other miRNAs were considered as candidate miRNAs. The candidate miRNAs for colon neoplasms, gastric neoplasms, and prostate neoplasms were then ranked in descending order of their predicted scores, as illustrated in Tables 4-6, respectively.
Colon cancer (colon neoplasms) ranks third among the most common female cancers and second among the most common male cancers in the world [37]. Each year, more than one million people die from colon cancer [38]. The incidence rates vary widely around the world, depending on lifestyle, environment, and heredity [39]. Recent studies reported that miRNAs are closely related to the diagnosis, prognosis, and chemo-sensitivity of colon cancer, which indicates that miRNAs can be used as markers for the early diagnosis of colon cancer and as a guideline for its various stages [40].
Hence, case studies on colon cancer-related miRNAs were implemented to further verify the predictive ability of WINMDA and, as a result, 10 of the top 10 and 47 of the top 50 candidate miRNAs were shown to be associated with colon neoplasms by miR2Disease, dbDEMC, and other known experimental studies (Table 4). For example, some researchers confirmed that the overexpression of miR-143 (ranked first in the WINMDA forecast list) reduces cell proliferation and migration of mutant KRAS HCT116 colon cancer cells [41]. Additionally, experimental studies also found that miR-20a is a member of the miR-17 miRNA family, which is part of the regulatory machinery that defines the pro-tumorigenic differentiation of stromal fibroblasts. In stromal fibroblasts, miR-20a (ranked second in the WINMDA forecast list) can modulate chemokine C-X-C ligand 8 (CXCL8) function, thereby influencing tumor latency [42]. Moreover, some researchers found that the rs35301225 polymorphism in miR-34a (ranked third in the WINMDA forecast list) is involved in the development of human colon cancer via downregulation of tumor-promoting gene E2F1 as a tumor suppressor, and the C/A single-nucleotide polymorphism of miR-34a promotes colon cancer cell proliferation via upregulating E2F1 [43].
In recent years, it was reported that gastric cancer (gastric neoplasms) is one of the most common malignant tumors of the digestive tract in the world, and Japan, South Korea, and China are high-risk areas for gastric cancer [44]. Therefore, it is necessary to explore the mechanism of miRNAs in the development of gastric cancer and to provide a basis for its early diagnosis. We used potential gastric cancer-associated miRNAs as a case study to further illustrate the predictive power of WINMDA in this section. As a result, 10 of the top 10 and 48 of the top 50 potential gastric cancer-related miRNAs were validated by miR2Disease, dbDEMC, and other known experimental studies (Table 5). For example, the gene UHRF1 plays a significant role in the development of gastric cancer, and Zhou et al. identified and verified miR-146b (ranked first in our prediction list) and miR-146a as direct upstream regulators of UHRF1 in gastric cancer metastasis [45]. In addition, miR-143 (ranked seventh in our prediction list) targets IGF1R and BCL2, which are related to cisplatin resistance, so that the resistance of human gastric cancer cells to cisplatin can be regulated via the differential expression of IGF1R and BCL2 in gastric cancer tissues and cell lines [46].
Prostate cancer (prostate neoplasms) is the third most common type of cancer. In 2012, the incidence rate of prostate cancer in the neoplasm registration area in China was 99.2%, which ranked sixth in the incidence of male malignant tumors [47]. However, early patients with prostate tumors have only subtle symptoms that make it difficult to detect cancer at an early stage [48]. Increasing studies confirmed that some miRNAs are related to prostate neoplasms. Therefore, case studies about prostate cancer-related miRNAs were implemented to further verify the predictive ability of WINMDA in this section. As a result, nine of the top 10 and 48 of the top 50 predicted prostate cancer-related miRNAs were validated by miR2Disease, dbDEMC, and other known experimental studies (Table 6). For example, Chu et al. selected single-nucleotide polymorphisms (SNPs) in the 1000 bp upstream from the transcription start site of hsa-miR-143 (ranked first in our prediction list) precursor in the dbSNP database with the condition that MAF > 0.05 in the Chinese population and finally identified that rs4705342 T > C was associated with the risk of prostate cancer, and that the C allele had a protective effect [49]. Wang et al. explored the effects of miR-182 (ranked second in our prediction list) on the growth, migration, and apoptosis of prostate cancer cells using qRT-PCR analysis. Moreover, they found that miR-182 plays an important role in prostate cancer, which enhances HIF1α signaling by targeting PHD2 and FIH1 in prostate cancer [50]. Furthermore, Taddei et al. confirmed that hsa-miR-210 (ranked fourth in our prediction list) overexpression increased senescence-associated features in young fibroblasts and converted them into cancer-associated fibroblast-like cells. These senescent fibroblasts can induce epithelial-mesenchymal transition in prostate cancer cells, support tumor angiogenesis, and recruit endothelial precursor cells, thus contributing to cancer progression [51].
In order to further illustrate the high efficiency of our method, we compared the performances of PBMDA, WBSMDA, LRLSMDA, and our model WINMDA by counting the top 50 disease-associated miRNAs in the predicted results and observing how many disease-related miRNAs were identified by miR2Disease, dbDEMC, and recent biological experimental studies (Table 7) in the case studies of six important diseases. As a result, from Table 7, it is easy to see that WINMDA is more effective than other methods in general. In addition, as a global computational model, WINMDA can not only achieve reliable prediction performances, but also simultaneously predict all potential associations between the diseases and miRNAs in HMDD, which means that potential associations with high predicted values obtained by WINMDA can be used preferentially for biological experiment verification and public release. Hence, we may easily reach a conclusion that our newly proposed model WINMDA is of great value in predicting potential miRNA-disease associations.

Discussion
Increasing studies based on biological experiments indicated that miRNAs are closely related to the occurrence of many diseases that are seriously harmful to human health, and the identification of potential miRNA-disease associations can not only play an important role in the diagnosis, treatment, and prevention of disease, but also effectively addresses the high cost and long-term shortcomings of traditional biological experiments. In this article, we developed a novel prediction model called WINMDA to predict potential relationships between miRNAs and diseases based on premises that functionally similar miRNAs may regulate similar diseases and similar diseases tend to associate with functionally similar miRNAs. In WINMDA, we firstly integrated disease semantic similarity, Gaussian interaction profile kernel similarity, and miRNA function similarity, and then constructed a weighted interactive network for potential miRNA-disease prediction. The important difference between WINMDA and previous state-of-the-art prediction models is that the problem of limited known miRNA-disease associations was considered in WINMDA and the shortest paths in the weighted interactive network were adopted to solve the problem. Moreover, we evaluated the predictive performance of WINMDA through LOOCV (including global LOOCV and local LOOCV), k-fold cross-validation, and several case studies. Experimental results show that WINMDA can effectively uncover potential disease-miRNA candidates, which means that it can be used as a reliable and accurate calculation model for finding potential miRNA-disease associations.
Although WINMDA achieved effective performance in predicting candidate relationships between miRNAs and diseases, some limitations remain that can be addressed in the future. Firstly, the parameters T and w play important roles in WINMDA, and the selection of suitable values for T and w is a critical problem that shall be addressed in future studies. Secondly, the assigned weights may not be accurate enough, as they rest on the premises that functionally similar miRNAs may regulate similar diseases and that similar diseases tend to associate with functionally similar miRNAs. Finally, the weighted interactive network in WINMDA was constructed from the disease similarity, the miRNA similarity, and known miRNA-disease associations. The performance of WINMDA could be further improved by considering more databases that store other information about diseases, miRNAs, and miRNA-disease associations.

Construction of the miRNA-Disease Interactive Network
In order to construct the miRNA-disease interactive network, we first downloaded the known human miRNA-disease associations from the HMDD database on 14 July 2018. After eliminating duplicate values, erroneous data, and disorganized data, we obtained 5430 experimentally validated human miRNA-disease associations involving 495 miRNAs and 383 diseases.
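A small sketch of how such a curated list of associations can be turned into a binary association matrix is given below. The orientation (rows for diseases, columns for miRNAs) follows the later use of rows of DMM as disease profiles; the function name `build_association_matrix` and the placeholder identifiers in the example are assumptions for illustration and are not taken from HMDD.

```python
import numpy as np

def build_association_matrix(pairs):
    """Build a binary disease-by-miRNA matrix from (disease, miRNA) pairs.

    Duplicate pairs collapse naturally, because an entry is only ever set to 1.
    Returns the matrix together with the row and column labels.
    """
    diseases = sorted({d for d, _ in pairs})
    mirnas = sorted({m for _, m in pairs})
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(mirnas)}
    dmm = np.zeros((len(diseases), len(mirnas)), dtype=int)
    for d, m in pairs:
        dmm[d_index[d], m_index[m]] = 1          # 1 = known association
    return dmm, diseases, mirnas

# Placeholder identifiers, including one duplicate record.
records = [("disease_a", "mir_x"), ("disease_a", "mir_y"),
           ("disease_b", "mir_y"), ("disease_a", "mir_x")]
dmm, diseases, mirnas = build_association_matrix(records)
print(dmm)
```

With the full data set described above, the same construction would yield a 383 × 495 matrix containing 5430 ones.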

Calculation of the Disease Semantic Similarity
For any two diseases d_i and d_j that belong to S_D, the semantic similarity between d_i and d_j was calculated according to the following steps:
Step 1: Firstly, we collected the Medical Subject Headings (MeSH) descriptors of d_i and d_j from the National Library of Medicine (http://www.nlm.nih.gov/).
Step 2: Secondly, we constructed the directed acyclic graphs (DAGs) corresponding to d_i and d_j separately. As illustrated in Figure 3, for any given disease H, its DAG can be represented as DAG(H) = (N(H), E(H)), where N(H) is the node set and E(H) is the edge set. In DAG(H), each node corresponds to a different disease MeSH descriptor, and the MeSH descriptors are connected by directed edges from a more general term (called a parent node) to a more specific term (called a child node). N(H) consists of the node H itself and its ancestor nodes; E(H) consists of the corresponding directed edges from a parent node to a child node, and each edge in E(H) represents the relationship between the two nodes it connects.
Step 3: Thirdly, based on the newly constructed DAG(H), let d be an ancestor node of H in DAG(H); then, we defined the contribution of an ancestor node d to the semantic value of the disease H and the contribution of the semantic value of disease H itself as follows: Here, the parameter α is a semantic contribution attenuation factor with a value between zero and one, and its value was set to 0.5 in this paper according to previous state-of-the-art methods [52,53]. The parameter β is the number of addresses or codes included in the node d*, which indicates the weight of the contribution of disease d* for H in DAG(H).
According to Equation (2), an ancestor node d with a larger number of addresses or codes in DAG(H) will make a more significant contribution to the semantic value of H. For instance, in DAG(BN) of Figure 3, the entry on "central nervous system neoplasms" includes two addresses or codes, C04.588.614.250 and C10.551.240, whereas the entry on "brain diseases" includes only one code, C10.228.140. Thus, the contribution of "central nervous system neoplasms" to the semantic value of "brain neoplasms" is 2 × α × 1, while the contribution of "brain diseases" to the semantic value of "brain neoplasms" is only 1 × α × 1.
Step 4: Next, based on Equation (2), we calculated the semantic value of disease H by accumulating the contributions of all disease terms to H in DAG(H) as follows: For example, according to Equation (3), in DAG(BN) of Figure 3, the semantic value of the disease "brain neoplasms" can be obtained as "the contribution of 'brain neoplasms' to itself" (= 1 × 1) + "the contribution of 'central nervous system neoplasms' to it" (= 2 × 0.5) + the contributions of the remaining ancestor nodes in DAG(BN).
Step 5: Finally, we defined the semantic similarity between d_i and d_j as follows: Additionally, for any two diseases d_a and d_b that belong to S_D, if d_a and d_b do not have semantic similarity, we define SDD(d_a, d_b) = −1; then, based on Equation (4), we can obtain a D × D dimensional disease semantic similarity matrix SDD(i, j).
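Steps 3 to 5 can be illustrated with a short sketch. Several points are assumptions rather than statements of the exact equations of this paper: the contribution of an ancestor d is taken as α × β(d) × (the largest contribution among the children of d in DAG(H)), which reproduces the worked "brain neoplasms" example; the similarity is taken in the widely used form "summed contributions of shared nodes divided by the sum of both semantic values"; the −1 convention for disease pairs without semantic similarity is not modeled; and the toy DAG contains only the three nodes named in the text, not the full MeSH hierarchy. The names `semantic_contributions`, `semantic_similarity`, `parents`, and `n_codes` are hypothetical.

```python
def semantic_contributions(h, parents, n_codes, alpha=0.5):
    """Contribution of every node in DAG(h) to the semantic value of h.

    parents: dict mapping a term to the list of its parent terms.
    n_codes: dict mapping a term to its number of MeSH addresses (beta).
    """
    children = {}                      # child links restricted to DAG(h)
    nodes, stack = {h}, [h]
    while stack:                       # walk upward from h to all ancestors
        node = stack.pop()
        for parent in parents.get(node, ()):
            children.setdefault(parent, set()).add(node)
            if parent not in nodes:
                nodes.add(parent)
                stack.append(parent)

    contrib = {h: 1.0}                 # the disease contributes 1 to itself

    def visit(d):
        if d not in contrib:
            best_child = max(visit(c) for c in children[d])
            contrib[d] = alpha * n_codes.get(d, 1) * best_child
        return contrib[d]

    for d in nodes:
        visit(d)
    return contrib


def semantic_similarity(h1, h2, parents, n_codes, alpha=0.5):
    """Assumed form: shared contributions over the sum of both semantic values."""
    c1 = semantic_contributions(h1, parents, n_codes, alpha)
    c2 = semantic_contributions(h2, parents, n_codes, alpha)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Toy DAG restricted to the three terms discussed in the text.
parents = {"brain neoplasms": ["central nervous system neoplasms", "brain diseases"]}
n_codes = {"central nervous system neoplasms": 2, "brain diseases": 1}
print(semantic_contributions("brain neoplasms", parents, n_codes))
```

In this toy DAG, "central nervous system neoplasms" contributes 2 × 0.5 × 1 and "brain diseases" contributes 1 × 0.5 × 1, matching the example in Step 3.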

Calculation of the miRNA Functional Similarity
Considering that, in the HMDD database, one miRNA may be associated with multiple diseases and vice versa, and that, according to the literature [39], the functional similarity of two miRNAs can be obtained by integrating the semantic similarity of the two groups of diseases associated with them, the functional similarity between any two miRNAs m_i and m_j that belong to S_M can be calculated according to the following steps:
Step 1: Firstly, let dx be any given disease and Dgroup = {dy_1, dy_2, dy_3, ..., dy_r} be a set consisting of r different diseases; then, the semantic similarity between dx and Dgroup can be calculated as follows:
Step 2: Secondly, let Dgroup_i and Dgroup_j be the sets of diseases associated with m_i and m_j, respectively; supposing that there are N and M different diseases in Dgroup_i and Dgroup_j, respectively, we calculated the functional similarity between m_i and m_j as follows:
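The two steps can be sketched as follows, assuming the widely used formulation in which the similarity between a disease and a disease group is the largest semantic similarity to any member of the group, and the functional similarity of two miRNAs is the sum of these best-match values in both directions divided by N + M. The function names are hypothetical, and `sdd` stands for a precomputed disease semantic similarity matrix indexed by disease position.

```python
import numpy as np

def disease_to_group(sdd, dx, group):
    """Best semantic match between disease dx and any disease in the group."""
    return max(sdd[dx, dy] for dy in group)

def mirna_functional_similarity(sdd, group_i, group_j):
    """Assumed best-match average over the two associated disease groups."""
    s_i = sum(disease_to_group(sdd, d, group_j) for d in group_i)
    s_j = sum(disease_to_group(sdd, d, group_i) for d in group_j)
    return (s_i + s_j) / (len(group_i) + len(group_j))

# Toy semantic similarity matrix for three diseases (indices 0, 1, 2).
sdd = np.array([[1.0, 0.2, 0.4],
                [0.2, 1.0, 0.6],
                [0.4, 0.6, 1.0]])
# One miRNA is associated with diseases {0, 1}, the other with {1, 2}.
print(mirna_functional_similarity(sdd, [0, 1], [1, 2]))
```

Two miRNAs with identical disease groups obtain a similarity of 1 under this formulation, because every disease finds itself as its best match.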

Disease Gaussian Interaction Profile Kernel Similarity Measurement
On the basis of the premises that functionally similar miRNAs may regulate similar diseases and that similar diseases tend to associate with functionally similar miRNAs, let DLP(d_i) represent the i-th row of the matrix DMM; then, for any two diseases d_i and d_j that belong to S_D, we can calculate the Gaussian interaction profile kernel similarity between them as follows: Additionally, based on previous work [54], we can further improve the disease Gaussian interaction profile kernel similarity using a logistic function as follows:

MicroRNA Gaussian Interaction Profile Kernel Similarity Measurement
On the basis of the premises that functionally similar miRNAs may regulate similar diseases and that similar diseases tend to associate with functionally similar miRNAs, let MLP(m_i) represent the i-th column of the matrix DMM; then, for any two miRNAs m_i and m_j that belong to S_M, we can calculate the Gaussian interaction profile kernel similarity between them as follows:
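A Gaussian interaction profile kernel of this kind can be sketched as below, assuming the standard form exp(−γ‖p_i − p_j‖²) with the bandwidth γ normalized by the mean squared norm of the profiles. The function name `gip_kernel` and the default `gamma_prime=1.0` are assumptions, and the logistic adjustment mentioned for the disease kernel is not included because its parameters are not given here.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Assumes that at least one profile is non-zero, so the bandwidth is finite.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    gamma = gamma_prime / sq_norms.mean()        # normalized bandwidth
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Toy association matrix: rows are diseases, columns are miRNAs.
dmm = np.array([[1, 0, 1],
                [1, 0, 1],
                [0, 1, 0]])
disease_kernel = gip_kernel(dmm)      # rows of DMM as disease profiles
mirna_kernel = gip_kernel(dmm.T)      # columns of DMM as miRNA profiles
print(disease_kernel.round(3))
```

Diseases with identical interaction profiles, such as the first two rows of the toy matrix, obtain a kernel value of 1.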

Calculation of the Integrated Similarity
Based on Equations (6) and (10), the disease integrated similarity matrix FDD can be calculated based on the disease semantic similarity matrix (SDD) and the disease Gaussian interaction profile kernel similarity matrix (FDGS) as follows: Similarly, based on Equations (8) and (11), the miRNA integrated similarity matrix FMM can be calculated based on the miRNA functional similarity matrix (SMM) and the miRNA Gaussian interaction profile kernel similarity matrix (FMGS) as follows:
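The sketch below shows one common integration rule, stated here as an assumption and not as the exact definition of FDD and FMM: where a primary similarity (semantic or functional) is available, it is averaged with the Gaussian kernel similarity; where it is flagged as missing, the kernel similarity is used alone. The flag value −1 mirrors the convention SDD(d_a, d_b) = −1 for disease pairs without semantic similarity; the function name `integrate_similarity` is hypothetical.

```python
import numpy as np

def integrate_similarity(primary, kernel, missing=-1.0):
    """Combine a primary similarity matrix with a kernel similarity matrix.

    Assumed rule: average the two where the primary similarity exists,
    otherwise fall back to the kernel similarity.
    """
    primary = np.asarray(primary, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    has_primary = primary != missing
    return np.where(has_primary, (primary + kernel) / 2.0, kernel)

# Toy example: the off-diagonal pair of the second matrix has no semantic similarity.
sdd = np.array([[1.0, -1.0],
                [-1.0, 1.0]])
fdgs = np.array([[1.0, 0.4],
                 [0.4, 1.0]])
print(integrate_similarity(sdd, fdgs))
```

Other rules, such as a weighted combination of the two matrices, fit the same interface by changing only the expression inside `np.where`.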

Construction of the Weighted Interactive Network
For any given miRNA m i ∈ S M , we define the miRNA m x ∈ S M as the most related miRNA to m i , if m x satisfies the following: Thereafter, as illustrated in Figure 4, we can construct the weighted interactive network according to the following four steps:

Step 1: Firstly, for any given disease $d_i \in S_D$, we define the miRNA $m_j$ as a potential miRNA of $d_i$ if and only if $DMM(i,j) = 0$; otherwise, we define $m_j$ as a known miRNA of $d_i$. Hence, according to the premise that functionally similar miRNAs may regulate similar diseases and similar diseases tend to associate with functionally similar miRNAs, it is reasonable to assume that the miRNA $m_j$ is related to $d_i$ if $m_j$ is a potential miRNA of $d_i$, $m_x$ is a most related miRNA to $m_j$, and $m_x$ is also a known miRNA of $d_i$. Based on this assumption, for any given disease $d_i \in S_D$ and any given miRNA $m_j \in S_M$, we can define the weight between $d_i$ and $m_j$ as follows:

Step 2: Secondly, according to Equation (10), for any two given diseases $d_i$ and $d_j$ that belong to $S_D$, we define the weight between $d_i$ and $d_j$ as follows:

From Equation (14), it follows that the higher the semantic similarity between $d_i$ and $d_j$ is, the smaller the weight between $d_i$ and $d_j$ will be.

Step 3: Similarly, according to Equation (11), for any two given miRNAs $m_i$ and $m_j$ that belong to $S_M$, we define the weight between $m_i$ and $m_j$ as follows:

From Equation (15), it also follows that the higher the functional similarity between $m_i$ and $m_j$ is, the smaller the weight between $m_i$ and $m_j$ will be.
Thereafter, based on the above three steps, for $i \in [1, D + M]$ and $j \in [1, D + M]$, a weighted miRNA-disease interactive network can finally be constructed as follows:
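One way to picture the resulting $(D + M) \times (D + M)$ network is as a block matrix assembled from the three weight matrices of Steps 1 to 3. The sketch below assumes that those matrices are already computed, that disease nodes are indexed before miRNA nodes, and that the network is undirected, so the disease-miRNA block is mirrored. The function name `build_network` and the toy weights are hypothetical, and the name `gfw` is borrowed from the path-weight notation.

```python
import numpy as np

def build_network(w_dd, w_dm, w_mm):
    """Assemble a (D + M) x (D + M) weight matrix from three blocks.

    Assumptions: diseases occupy indices 0..D-1, miRNAs occupy D..D+M-1,
    and the disease-miRNA block is mirrored (undirected network).
    """
    w_dd, w_dm, w_mm = (np.asarray(a, dtype=float) for a in (w_dd, w_dm, w_mm))
    n_d, n_m = w_dm.shape
    if w_dd.shape != (n_d, n_d) or w_mm.shape != (n_m, n_m):
        raise ValueError("block shapes are inconsistent")
    return np.block([[w_dd, w_dm],
                     [w_dm.T, w_mm]])

# Toy weights for D = 2 diseases and M = 3 miRNAs (illustrative values only).
w_dd = np.array([[0.0, 0.3],
                 [0.3, 0.0]])
w_dm = np.array([[0.1, 0.9, 0.5],
                 [0.7, 0.2, 0.4]])
w_mm = np.array([[0.0, 0.6, 0.8],
                 [0.6, 0.0, 0.1],
                 [0.8, 0.1, 0.0]])

gfw = build_network(w_dd, w_dm, w_mm)
```

With this layout, the weight between disease $d_i$ and miRNA $m_j$ sits at row $i$ and column $D + j$ (zero-based) of the assembled matrix.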

Calculation of the Shortest Path Based on the Weighted Interactive Network
For any two given nodes $A$ and $B$ in the weighted interactive network $G$, supposing that there is a path $P$ from $A$ to $B$ in $G$ consisting of $n$ hops, $P_0 (= A), P_1, P_2, \ldots, P_n (= B)$, we define the weight of path $P$ as $\sum_{i=0}^{n-1} GFW(P_i, P_{i+1})$. Among all the paths from $A$ to $B$ in $G$, the path with the smallest weight is called the shortest path from $A$ to $B$. It is reasonable to assume that the smaller the weight of the shortest path from $A$ to $B$ is, the more closely related the nodes $A$ and $B$ are. Based on this assumption, we can design an algorithm for searching the shortest path from $A$ to $B$ in $G$ as follows:

Step 1: Initially, we define $S = \{V_0\}$ as a set consisting of an arbitrary node $V_0$ in $G$, $T$ as a set consisting of all nodes in $G$ other than $V_0$, and DD as a matrix defined as follows:

where $i \in [1, D + M]$ and $j \in [1, D + M]$.

Step 2: Next, we select from $T$ a node $V_k$ that satisfies $V_k \notin S$ and whose distance to $S$ is smaller than the distance to $S$ from any other node in $T$. Here, we define the distance from a node $x$ in $G$ to a node set $V$ in $G$ as the smallest of the distances between $x$ and the nodes in $V$.

Step 4: After repeating Step 2 and Step 3 until all nodes in $G$ are included in $S$, the matrix DD is transformed into a $(D + M) \times (D + M)$ dimensional shortest path matrix (SPM).
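The construction of the shortest path matrix can be illustrated with a standard Dijkstra search run from every node. This is a swapped-in textbook routine rather than a transcription of the procedure above, and it assumes non-negative weights, with `math.inf` marking a missing edge. The function name `shortest_path_matrix` and the toy matrix are hypothetical.

```python
import heapq
import math

def shortest_path_matrix(gfw):
    """All-pairs shortest path weights via Dijkstra from each source node.

    Assumptions: gfw is a square matrix of non-negative weights and
    math.inf marks a missing edge.
    """
    n = len(gfw)
    spm = [[math.inf] * n for _ in range(n)]
    for source in range(n):
        dist = spm[source]
        dist[source] = 0.0
        heap = [(0.0, source)]
        done = [False] * n  # nodes whose shortest distance is final
        while heap:
            d, u = heapq.heappop(heap)
            if done[u]:
                continue
            done[u] = True
            for v in range(n):
                w = gfw[u][v]
                if v != u and math.isfinite(w) and d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
    return spm

# Toy network with three nodes: the direct edge 0-2 costs 4,
# while the detour through node 1 costs 1 + 2 = 3.
toy_gfw = [[0, 1, 4],
           [1, 0, 2],
           [4, 2, 0]]
spm = shortest_path_matrix(toy_gfw)
```

In the toy network, the entry for nodes 0 and 2 becomes 3 rather than 4, which shows how an indirect route through a similar node can yield a smaller path weight than the direct edge.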

Prediction of miRNA-Disease Associations Based on the T Most Similar Neighbors
Considering the fact that known miRNA-disease associations are very sparse, in this section we adopt the concept of $T$ most similar neighbors to estimate the association between a specific disease $d_i$ and a specific miRNA $m_j$, as illustrated in Figure 5, according to the following steps:

Step 1: Firstly, for the disease $d_i$, let $DK_i = \{d_{i1}, d_{i2}, d_{i3}, \ldots, d_{iT}\}$ be a set consisting of the first $T$ nodes in $S_D$ after sorting the nodes in $S_D$ by their disease integrated similarity with $d_i$ in descending order; for the miRNA $m_j$, let $MK_j = \{m_{j1}, m_{j2}, m_{j3}, \ldots, m_{jT}\}$ be a set consisting of the first $T$ nodes in $S_M$ after sorting the nodes in $S_M$ by their miRNA integrated similarity with $m_j$ in descending order.

Step 2: Secondly, according to the premise that functionally similar miRNAs may regulate similar diseases and similar diseases tend to associate with functionally similar miRNAs, we calculate the association between $d_i$ and $MK_j$ and the association between $m_j$ and $DK_i$ as follows:

Step 3: In order to optimize the prediction results, by integrating the above two associations and the matrix SPM, we can obtain our final prediction results as follows:

where $w$ is a weight coefficient with a value from zero to one.
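The neighbor selection in Step 1 can be sketched as a sort over one row of an integrated similarity matrix. The text does not say whether a node counts as its own neighbor, so the sketch exposes this as a flag and excludes the node by default, which is an assumption. The function name `top_t_neighbors` and the toy matrix `fdd` are hypothetical.

```python
import numpy as np

def top_t_neighbors(similarity, index, t, include_self=False):
    """Indices of the t nodes most similar to `index`, in descending order.

    Assumption: the node itself is excluded unless include_self is True.
    """
    row = np.asarray(similarity, dtype=float)[index]
    order = np.argsort(-row, kind="stable")  # descending similarity
    if not include_self:
        order = order[order != index]
    return order[:t].tolist()

# Toy integrated disease similarity matrix (illustrative values only).
fdd = np.array([[1.0, 0.8, 0.3, 0.6],
                [0.8, 1.0, 0.5, 0.2],
                [0.3, 0.5, 1.0, 0.7],
                [0.6, 0.2, 0.7, 1.0]])

dk_0 = top_t_neighbors(fdd, 0, t=2)  # neighbors of disease 0
```

The same routine applies to the miRNA integrated similarity matrix to obtain the neighbor set of a miRNA.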

Conclusions
The effective predictive performance of WINMDA can be attributed to several factors. Firstly, the semantic disease similarity, functional miRNA similarity, and Gaussian interaction profile kernel similarity were integrated. Secondly, we proposed a new method for calculating the semantic similarity of diseases. Thirdly, we constructed a weighted interactive network based on disease similarity, miRNA similarity, and known miRNA-disease associations. Fourthly, the concept of T most similar neighbors was introduced. Finally, an algorithm for searching the shortest path in the weighted interactive network was introduced. Furthermore, in future work, multiple heterogeneous biological data can be collected and pre-processed for use in the weighted interactive network, thus improving the performance of prediction algorithms.