LE-MDCAP: A Computational Model to Prioritize Causal miRNA-Disease Associations

MicroRNAs (miRNAs) are associated with various complex human diseases, and some miRNAs can be directly involved in the mechanisms of disease. Identifying disease-causative miRNAs can provide novel insight into disease pathogenesis from a miRNA perspective and facilitate disease treatment. To date, various computational models have been developed to predict general miRNA-disease associations, but few models are available to further prioritize causal miRNA-disease associations over non-causal associations. Therefore, in this study, we constructed a Levenshtein-Distance-Enhanced miRNA-disease Causal Association Predictor (LE-MDCAP) to predict potential causal miRNA-disease associations. Specifically, Levenshtein distance matrices covering the sequence, expression and functional miRNA similarities were introduced to enhance the previous Gaussian interaction profile kernel-based similarity matrix. LE-MDCAP integrated miRNA similarity matrices, the disease semantic similarity matrix and known causal miRNA-disease associations to make predictions. For the regular causal vs. non-disease association discrimination task, LE-MDCAP achieved an area under the receiver operating characteristic curve (AUROC) of 0.911 and 0.906 in 10-fold cross-validation and independent test, respectively. More importantly, LE-MDCAP prominently outperformed the previous MDCAP model in distinguishing causal versus non-causal miRNA-disease associations (AUROC 0.820 vs. 0.695). Case studies performed on diabetic retinopathy and hsa-mir-361 also validated the accuracy of our model. In summary, LE-MDCAP could be useful for screening causal miRNA-disease associations from general miRNA-disease associations.


Introduction
MicroRNAs (miRNAs) are a class of endogenous small RNAs of ~20 nucleotides in length that have various regulatory roles within cells. MiRNAs suppress target mRNA expression at the post-transcriptional level by binding to the 3′ untranslated regions (3′-UTRs) [1,2]. Accumulating evidence has demonstrated that miRNAs are involved in diverse biological processes, such as cell proliferation, differentiation, death and signal transduction [2-4]. Accordingly, more and more miRNAs have been confirmed to be associated with the onset and development of complex diseases [5]. For instance, miR-1 is dysregulated in multiple common heart diseases [6,7], miR-355 and miR-31 are connected with the inhibition of breast cancer [8,9] and the loss of miR-206 accelerates amyotrophic lateral sclerosis (ALS) progression [10]. Therefore, the effective identification of miRNA-disease associations, especially miRNAs directly involved in disease mechanisms, is critical for promoting the treatment of complex human diseases.
With the growing body of research on the associations between miRNAs and diseases, 35,547 miRNA-disease association entries from a wide range of experimental evidence were gathered in the latest version of HMDD (v3.2, released in March 2019) [11]. Based on the type of experimental evidence, miRNA-disease associations can be sorted into causal associations (i.e., miRNAs that can be directly involved in disease mechanisms) and non-causal associations (i.e., miRNAs that exhibit differential expression but no known evidence of direct involvement in disease mechanisms) [12,13]. The causal miRNA-disease associations play a pivotal role in gaining insight into the molecular and cellular mechanisms of a disease and in identifying target miRNAs for further intervention. In the latest HMDD v3.2 database, Gao et al. [12] annotated causal associations by conducting a manual review of the literature. Specifically, in the "target" category of miRNA-disease associations, we selected the associations in which miRNAs target disease causal genes; meanwhile, in the "genetics" category of miRNA-disease associations, we selected the associations in which the genetic perturbation (knockdown/overexpression) of miRNAs could lead to altered disease phenotypes. Moreover, the associations in which miRNAs could enhance drug effects but make no contribution to diseases were excluded. Further manual confirmation by at least two investigators was performed, and, finally, 4294 miRNA-disease associations were labeled as causal associations. This sizable and biologically validated dataset enabled better investigations of general or even causal miRNA-disease associations by computational methods.
Given the costly and time-consuming nature of traditional experimental methods, more and more researchers are using computational prediction models to effectively explore the relationship between miRNA and disease [14,15], such as label propagation algorithms used by MCLPMDA [16], LPNLS [17] and SNMDA [18]; machine-learning classification algorithms adopted by EGBMMDA [19]; and latent feature extraction with positive samples taken by LFEMDA [20], among others.
Although numerous models have been designed to predict general disease-related miRNAs, our previous benchmark study has shown that most of these models could not distinguish causal miRNA-disease associations from non-causal associations (with AUROC < 0.55 in the task for causal/non-causal association discrimination) [12]. In other words, there is still an urgent demand for a new model specifically designed for prioritizing causal miRNA-disease associations. To this end, Gao et al. [13] first proposed the MiRNA-disease Causal Association Predictor (MDCAP) based on the label propagation algorithm for predicting potential causal miRNA-disease associations. MDCAP showed reliable performance (AUROC > 0.9) in distinguishing between causal miRNA-disease associations and unrelated miRNA-disease pairs. However, in discriminating between causal and non-causal miRNA-disease associations, the performance of MDCAP is greatly reduced.
In this study, we developed a novel prediction model named LE-MDCAP (Levenshtein-Distance-Enhanced MiRNA-disease Causal Association Predictor), based on Levenshtein distance and a matrix decomposition algorithm, to prioritize potential causal miRNA-disease associations. The key improvement of LE-MDCAP is that it can specifically discriminate between causal and non-causal miRNA-disease associations, facilitating more precise identification of potential disease-causative miRNAs. To demonstrate the effectiveness of our proposed approach, we performed 10-fold cross-validation and independent tests to comprehensively measure our model performance. Moreover, the case studies further validated the model reliability by comparing the prediction results with the latest experimental evidence that has not been considered in the HMDD v3.2 datasets.

Overview of LE-MDCAP and Overall Performance Evaluation
In this work, we developed LE-MDCAP to predict potential causal miRNA-disease associations. Figure 1 shows the workflow of LE-MDCAP. First, we obtained a causal miRNA-disease association matrix with causal association data from HMDD v3.2. Second, we integrated multiple sources of information to represent miRNA similarity, including sequence similarity, expression level similarity and target pathway similarity, all calculated in the form of Levenshtein distances, in addition to Gaussian similarity. For diseases, the semantic similarity matrix was constructed based on the known disease relationships in MeSH. Finally, the matrix decomposition algorithm was introduced to establish a model based on each source of the miRNA similarity matrix, and the prediction scores from each model were integrated by a weighted sum to give the final prediction results of LE-MDCAP.
To evaluate the overall performance of LE-MDCAP, we performed 10-fold cross-validation and independent testing based on known causal miRNA-disease associations in HMDD v3.2. One-fifth of the known causal associations were randomly selected as the independent testing set, and the remaining four-fifths were used as the training set. Similarly, in each round of 10-fold cross-validation, the causal associations were divided into a training set and a test set in proportion. To avoid leakage of test data, the prediction results of the model were calculated solely from the training set. Then, we plotted receiver operating characteristic (ROC) curves by calculating the true positive rate (TPR) and false positive rate (FPR) at different thresholds and calculated the area under the ROC curve (AUROC). The closer the AUROC value is to 1, the better the predictive performance of the model. As shown in Figure 2, LE-MDCAP obtained an AUROC of 0.911 in the 10-fold cross-validation on the training dataset, and a comparable AUROC of 0.906 was achieved in the independent testing. These results demonstrated the strong performance of our method in predicting potentially causal miRNA-disease associations.
Because such causal associations versus non-associations can also be well distinguished by the previous MDCAP model, we here focused on the reliability of the model in distinguishing causal and non-causal miRNA-disease associations, where the performance of the previous MDCAP model was largely compromised. To this end, we divided all miRNA-disease pairs in the dataset into three groups: causal miRNA-disease associations (causal), non-causal miRNA-disease associations (non-causal) and unassociated miRNA-disease pairs (non-disease). We used the causal miRNA-disease associations as the positive samples and the non-causal miRNA-disease associations as the negative samples for method evaluation. It is noteworthy that, to keep the method comparison fair, the prediction results from the above-described model were directly used on this dataset, and no model retraining was conducted during this evaluation. Indeed, the previous MDCAP model did not distinguish the causal versus non-causal group with a satisfactory accuracy (AUROC = 0.695). By contrast, as shown in Figure 3a, LE-MDCAP shows a clear advantage in discriminating causal and non-causal miRNA-disease associations (AUROC = 0.820). We also assessed the statistical significance of the difference in the prediction scores between the three miRNA-disease groups (i.e., causal, non-causal and non-disease) by the Wilcoxon rank sum test. Figure 3b,c shows an increasing tendency from the prediction scores in the non-disease group to those in the causal group. Moreover, there is a significant difference between the causal and non-causal groups in the LE-MDCAP prediction scores ($p = 1.00 \times 10^{-100}$), and this difference is more pronounced than in MDCAP ($p = 2.22 \times 10^{-72}$). Together, the stepped distribution of prediction scores between the three groups suggests that LE-MDCAP can identify causal associations not only from a large number of unassociated miRNA-disease pairs, but also from non-causal associations.
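The causal versus non-causal evaluation described above can be illustrated with a small rank-based AUROC computation. This is a minimal sketch, not the authors' evaluation code: it uses the standard identity that AUROC equals the probability that a randomly chosen positive (causal) pair scores higher than a randomly chosen negative (non-causal) pair, with ties counted as one half. The function name `auroc` and the toy scores are illustrative assumptions.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as P(score of a positive > score of a negative), ties count 1/2.

    This pairwise form is equivalent to the area under the ROC curve and is
    the normalized Mann-Whitney U statistic that underlies the Wilcoxon
    rank sum test.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy prediction scores (made up for illustration only).
causal = [0.9, 0.8, 0.4]       # positive samples
non_causal = [0.7, 0.3, 0.2]   # negative samples
print(auroc(causal, non_causal))
```

The pairwise loop is quadratic in the number of samples; for datasets of the size used here a sort-based implementation would be preferable, but the quadratic form makes the definition explicit.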
We also tried to include a new feature in the model, based on the target gene relationship of miRNAs with disease genes, in order to further improve the predictive performance of LE-MDCAP. While its performance in discriminating causal from non-causal miRNA-disease associations did not improve, the addition of the new target gene features improved the model's AUROC from 0.901 to 0.907 in distinguishing causal miRNA-disease associations from unrelated miRNA-disease pairs (Supplementary Materials Figure S1). Since the improvement is marginal, we did not include target gene features in our final model. However, this result suggests that miRNA-gene-targeting features would be a possible direction for future improvements of disease-causative miRNA predictions.

Case Study
To further verify the effectiveness of LE-MDCAP, we implemented case studies on causal miRNA-disease associations with high prediction scores by querying the latest literature records that have not been included in the HMDD v3.2 dataset. Because these articles were not included in either the training or the testing dataset, they could serve as a supplementary evaluation of LE-MDCAP's performance in addition to the regular ROC assessments. We first checked whether the LE-MDCAP prediction would facilitate finding the potential causal miRNAs of the investigated diseases. Diabetic retinopathy is a common and specific microvascular complication of diabetes that can lead to blindness in severe cases [21]. Understanding the molecular mechanisms of diabetic retinopathy can help develop therapeutic agents to alleviate the symptoms. Here, we looked for miRNAs with causal potential for diabetic retinopathy based on the predictive score of LE-MDCAP. As shown in Table 1, four of the top five and eight of the top 15 causal miRNA-disease associations by prediction score have been validated by the literature in the last two years. We found that hsa-mir-21 had a score of 0.104, ranking first among all potential miRNAs. Moreover, the upregulation of hsa-miR-21-5p has been reported to damage human retinal pigment epithelial cells, thereby inducing the development of diabetic retinopathy [22]. Another study found that hsa-miR-34a promotes vascular endothelial cell apoptosis in diabetic retinopathy by targeting SIRT1 [23]. Similarly, a study showed that hsa-miR-221-3p regulates microvascular dysfunction in diabetic retinopathy by targeting TIMP3 [24]. Moreover, a study published in the last year reported that hsa-miR-126 enhances proliferation and inhibits apoptosis in high-glucose-induced human retinal endothelial cells by targeting IL-17A, which in turn accelerates the disease process [25].
These articles were all published recently and have not yet been included in the HMDD v3.2 database, so they will not affect the prediction results of the LE-MDCAP algorithm. In general, it can be confirmed that the associations between potentially causal miRNAs predicted by LE-MDCAP and diabetic retinopathy are indeed causal.
In addition, we also checked whether LE-MDCAP can facilitate identifying potential causal-associated diseases of a given miRNA. Here we selected hsa-mir-361 as a case study to verify the performance of our model. Previous studies have shown that hsa-mir-361 plays a crucial role in the development of several cancer types and cardiovascular diseases [26,27]. Excluding diseases in the training set for which causality has been identified, we used LE-MDCAP to predict other causal disease associations for hsa-mir-361. The prediction results are shown in Table 2; five of the top five and ten of the top 15 diseases on the list were verified based on recent experimental reports. We also found a score of 0.157 for breast neoplasms, which ranked first among all potential diseases. A study on breast neoplasms and hsa-mir-361 indicated that hsa-miR-361-3p promotes human breast cancer cell viability by inhibiting the E2F1/P73 signaling pathway [28]. Hsa-miR-361-5p was reported to exert tumor-suppressing functions in gastric carcinoma by targeting syndecan-binding protein [29]. Furthermore, long non-coding RNA BLACAT1 inhibits cell proliferation in prostate cancer by acting on hsa-miR-361 [30]; long non-coding RNA PVT1 contributes to cell growth and metastasis in non-small-cell lung cancer by regulating miR-361-3p [31]. Glioblastoma-related studies confirm that COX10-AS1 competitively binds hsa-mir-361-5p to promote glioma development [32]. All of the above experimental investigations have suggested that hsa-mir-361 is involved in the progression of the diseases predicted by LE-MDCAP.
Taken together, the results of the analysis further confirmed the capability of LE-MDCAP to predict causal miRNA-disease associations.

LE-MDCAP Server
To facilitate use by the community, we established an easy-to-query webserver interface for LE-MDCAP (http://www.rnanut.net/LEMDCAP/). The query interface of the LE-MDCAP server is shown in Supplementary Materials Figure S2. Users can retrieve prediction results based on either a miRNA name keyword or a disease-term keyword. Both exact and fuzzy search modes are supported. Users can also customize how the prediction scores are sorted, according to the per-miRNA ranking (miRNA), the per-disease ranking (disease, default) or the overall ranking (any), so that the most likely causative miRNAs in specific diseases can be easily prioritized. We also provide the dataset and all prediction results at the more stable GitHub repository (https://github.com/bioinfohy/LE-MDCAP/) as an alternative data-access approach.

Discussion
As miRNA research has expanded into a large number of disease areas, it has become clear that the expression levels of certain miRNAs are altered in many diseases. Most of these miRNAs are only passively altered during disease progression, and we refer to these miRNA-disease associations as non-causal associations. Although non-causal miRNAs are not directly involved in disease mechanisms, they are widely employed in clinical diagnosis, treatment response assessment and prognosis, due to their sensitivity. Evidence suggests that they can play an important role as biomarkers in cancer through exosome-mediated intercellular communication [33,34] and in neurology for the diagnosis and prognosis of Alzheimer's disease [35], among others. However, for the purpose of accurate dissection of disease mechanisms or effective identification of therapeutic targets of miRNA interventions, causal miRNA-disease associations are more important.
Many algorithms have been proposed to screen miRNA-disease associations, but few of them have considered the more critical causal information during disease progression. Gao et al. proposed the MDCAP model for predicting causal miRNA-disease associations based on the latest disease causality annotation from HMDD v3.2 [13]. Nevertheless, as shown above, MDCAP cannot effectively discern between causal and non-causal miRNA-disease associations. Therefore, we constructed LE-MDCAP, a model for predicting causal miRNA-disease associations that uses Levenshtein distance and matrix decomposition algorithms as a framework. LE-MDCAP exhibits competitive performance in both 10-fold cross-validation and the independent test. Notably, LE-MDCAP showed considerable advantages over MDCAP in prioritizing disease-causal miRNAs over non-causal ones, highlighting its strength in distinguishing between causal and non-causal miRNA-disease associations. The contribution of Levenshtein-distance-based similarity is intuitively expressed by the weights in the optimized prediction score integration formula: $MD' = 0.35\,MD'_S + 0.4\,MD'_E + 0.15\,MD'_P + 0.1\,MD'_G$. The Gaussian interaction profile kernel similarity matrix, which is the core similarity matrix of the previous MDCAP model, contributes only a minor fraction to the final prediction result (the weight of $MD'_G$ is only 0.1). By integrating similarity in the miRNA seed sequences, mature miRNA sequences and pre-miRNA sequences, the sequence-based Levenshtein-distance similarity matrix becomes a core component of the model (with the weight of $MD'_S$ being 0.35). Moreover, the expression- and pathway-based Levenshtein-distance similarities also contribute substantially to the final model (their weights are 0.4 and 0.15, respectively).
In all, the enriched Levenshtein-distance similarity matrices covering the sequence, expression and functional miRNA similarities have effectively enhanced the performance for causal miRNA-disease association prediction.
Although LE-MDCAP has an acceptable performance in prioritizing causal miRNA-disease associations from a large number of general miRNA-disease associations, it still has clear limitations. First, a realistic limitation is that the disease prediction space is limited to diseases included in the causal miRNA-disease association dataset, so the prediction model does not apply to new diseases without any known causal associations with miRNAs. The prediction performance of LE-MDCAP should improve as the amount of disease-causal miRNA annotation data increases in future work. Second, the AUROC of MDCAP for causal versus non-disease prediction is 0.928 and 0.925 in 10-fold cross-validation and independent testing, respectively, outperforming LE-MDCAP. To elevate the prediction accuracy of our model, we tried to incorporate miRNA target information, but this resulted in only a marginal performance improvement (Supplementary Materials Figure S1). In the future, a better construction of the disease semantic similarity matrix may further improve the performance. Third, designing score functions for causal miRNA-disease associations by accumulating works from the literature may also help to extract additional features for causal miRNA-disease association prediction models in the future.

Human Causal miRNA-Disease Associations
The human causal miRNA-disease associations dataset was downloaded directly from HMDD v3.2 (http://www.cuilab.cn/hmdd/, accessed on 18 May 2021) [11]. To compare our method with Gao's method [13], we used the same dataset as they did, which contains 4228 experimentally verified causal associations between 535 miRNAs and 302 diseases. We constructed an $n_m \times n_d$ adjacency matrix, MD, to represent the causal miRNA-disease associations, where $n_m$ and $n_d$ denote the number of miRNAs and diseases, respectively. Specifically, the element MD(i, j) is 1 if miRNA m(i) is confirmed to be causally associated with disease d(j); otherwise, it is 0.
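The construction of the adjacency matrix MD can be sketched as follows. This is a minimal illustration that assumes the causal associations are available as (miRNA, disease) name pairs; the function name `build_association_matrix` and the placeholder names in the example are hypothetical and do not come from HMDD.

```python
def build_association_matrix(pairs, mirnas, diseases):
    """Return MD as a list of rows: MD[i][j] = 1 if (mirnas[i], diseases[j]) is causal."""
    row = {name: i for i, name in enumerate(mirnas)}
    col = {name: j for j, name in enumerate(diseases)}
    md = [[0] * len(diseases) for _ in mirnas]
    for mirna, disease in pairs:
        md[row[mirna]][col[disease]] = 1
    return md


# Placeholder names, not real associations.
mirnas = ["miR-A", "miR-B", "miR-C"]
diseases = ["disease-X", "disease-Y"]
pairs = [("miR-A", "disease-X"), ("miR-C", "disease-Y")]
md = build_association_matrix(pairs, mirnas, diseases)
print(md)
```

For the dataset described above, the same construction would give a 535 × 302 matrix with 4228 nonzero entries.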

MiRNAs Similarities
To more fully characterize the similarity of miRNAs, we introduced the Levenshtein-distance algorithm to measure the feature similarity between any two miRNAs. The Levenshtein distance, also known as the edit distance between strings, is defined as the minimum number of single-character edit operations (insertions, deletions or substitutions) required to make two input strings equal. In Equation (1), $LD(m_1, m_2)$ represents the minimum editing cost of converting the feature string of miRNA $m_1$ into the feature string of another miRNA $m_2$, and len represents the length of a miRNA feature string. The functional similarity of two miRNAs, $MS(m_1, m_2)$, is then calculated by Equation (2). Because only unilateral editing distance was considered here, the calculated $MS(m_1, m_2)$ scores should range from 0.5 to 1. A larger score indicates that the two miRNA feature strings are more similar and therefore more likely to perform similar functions.
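The Levenshtein distance itself can be computed with the standard dynamic-programming recurrence, sketched below. The normalization in `similarity` is an assumption: the exact form of Equations (1) and (2) is not reproduced in this text, and dividing the distance by the sum of the two string lengths is simply one choice that yields the stated 0.5 to 1 range for equal-length feature strings (such as the fixed-length expression and pathway strings). Both function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - LD / (len(a) + len(b)); not taken from the source."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / total


print(levenshtein("kitten", "sitting"))
print(similarity("AAAA", "CCCC"))
```

With this normalization, two equal-length strings that differ at every position score 0.5 and identical strings score 1; strings of very different lengths can score below 0.5.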
Instead of being simply an approach for measuring sequence similarity between miRNAs, Levenshtein distance was employed to establish an enriched set of miRNA similarity matrices covering the similarity in seed sequences, mature miRNA sequences, hairpin precursor sequences, expression levels and target pathways between miRNAs. First, miRNAs follow the base-pairing principle when binding to their target genes, and, more importantly, sequence features can be applied to all miRNAs without the bias reported in the literature. Therefore, we here used the sequence information of miRNAs as the primary proxy to probe their functions. The sequence data of miRNAs were collected from miRBase (http://www.mirbase.org/, version 22, accessed on 30 September 2021) [36], and the Levenshtein distance was utilized to measure the similarity of pre-miRNA sequences, mature miRNA sequences and seed sequences between any two miRNAs. Accordingly, three functional similarity matrices, namely $MS_{SP}$, $MS_{SM}$ and $MS_{SS}$, were obtained. The sequence-information-based miRNA similarity matrix, $MS_S$, was obtained as the weighted sum of the above three matrices, where the weights sum to 1 and were optimized in steps of 0.05. We introduced the $MS_S$ obtained with each weight combination into the algorithm separately and finally selected the combination $MS_S = 0.05\,MS_{SP} + 0.05\,MS_{SM} + 0.9\,MS_{SS}$ due to its better AUROC value when the algorithm distinguished causal and non-causal miRNA-disease associations (Supplementary Materials Table S1). Second, as a typical class of non-coding RNAs, the expression profiles of miRNAs are often cell-type-specific, and the function of a miRNA is heavily dependent on the cell type in which it is expressed.
For this reason, we obtained miRNA expression data for 137 cell types from Lorenzi's study [37] and calculated miRNA expression similarity by Levenshtein distance to determine the functional similarity matrix, $MS_E$. More specifically, the expression level of a miRNA in each cell type was classified as A, B, C or D, according to the quantile allocation of its expression level across all miRNAs. We also tried other configurations but found that this quantile allocation performed slightly better than the others (Supplementary Materials Figure S3). The expression data for each miRNA can thus be depicted as an expression string of length 137, which was further used for the Levenshtein distance algorithm. Third, a straightforward description of functional similarities between miRNAs is to measure how their targeted biological processes and signaling pathways overlap. Accordingly, we downloaded the p-value data for miRNA pathway enrichment analysis results from the miRPathDB database (https://mpd.bioinf.uni-sb.de/, version 2.0, accessed on 17 October 2021) [38] and screened for pathways with at least three miRNAs with p-value < 0.05. We graded the p-values of the 1409 retained pathways in four levels: A represented p-values less than 0.05 and greater than 0.01, B represented p-values less than 0.01 and greater than 0.0001, C represented p-values less than 0.0001 and N indicated non-significant p-values greater than 0.05. Therefore, each miRNA is assigned a 1409-dimensional pathway string vector for subsequent Levenshtein-distance calculations, resulting in the functional similarity matrix, $MS_P$, based on the miRNA target pathways.
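The two string encodings described above can be sketched as follows. This is an illustrative reading, not the authors' code: the text does not say which grade a p-value exactly equal to a threshold receives, nor which letter denotes the highest expression quartile, so the boundary handling in `grade_p_value` and the A = lowest quartile convention in `quartile_letters` are assumptions, as are all function names.

```python
def grade_p_value(p):
    """Map an enrichment p-value to a grade letter (boundary handling is assumed)."""
    if p < 0.0001:
        return "C"
    if p < 0.01:
        return "B"
    if p < 0.05:
        return "A"
    return "N"   # not significant


def pathway_string(p_values):
    """One letter per retained pathway, in a fixed pathway order."""
    return "".join(grade_p_value(p) for p in p_values)


def quartile_letters(values):
    """Label each value A-D by its quartile among `values` (A = lowest; assumed).

    `values` holds the expression of every miRNA in one cell type; ties are
    broken by input order because the sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    letters = [""] * len(values)
    for rank, idx in enumerate(order):
        letters[idx] = "ABCD"[rank * 4 // len(values)]
    return letters


print(pathway_string([0.2, 0.03, 0.005, 0.00001]))
print(quartile_letters([5.0, 1.0, 9.0, 3.0]))
```

Applying `quartile_letters` once per cell type and concatenating the letters obtained for one miRNA across the 137 cell types would give that miRNA's expression string.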
In addition, according to previous studies [39,40], we also constructed the Gaussian interaction profile kernel similarity matrix, GM, for miRNAs as the baseline method for miRNA similarity. Together, we finally calculated four miRNA similarity matrices, i.e., the sequence-based $MS_S$; the expression-based $MS_E$; the pathway-based $MS_P$; and the previous Gaussian interaction profile kernel similarity matrix, GM.
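For reference, the Gaussian interaction profile kernel is commonly defined as $GM(m_i, m_j) = \exp(-\gamma \lVert MD(i,\cdot) - MD(j,\cdot) \rVert^2)$, where the bandwidth $\gamma$ is a constant divided by the mean squared norm of the miRNA interaction profiles. The sketch below follows that common formulation; the text does not state the bandwidth setting used here, so `gamma_prime=1.0` is an assumption, and the function name is illustrative.

```python
import math


def gaussian_kernel_similarity(md, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows (miRNAs) of MD."""
    n = len(md)
    mean_sq_norm = sum(sum(x * x for x in row) for row in md) / n
    if mean_sq_norm == 0:
        raise ValueError("MD contains no known associations")
    gamma = gamma_prime / mean_sq_norm      # normalized bandwidth
    gm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(md[i], md[j]))
            gm[i][j] = math.exp(-gamma * dist_sq)
    return gm


# Toy association matrix: 3 miRNAs x 2 diseases.
gm = gaussian_kernel_similarity([[1, 0], [1, 0], [0, 1]])
print(gm[0][1], gm[0][2])
```

Because this kernel is derived from MD, it should be recomputed from the training associations only whenever test associations are held out.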

Disease Semantic Similarity
The widely applied Wang's disease semantic similarity [41] was introduced, which is based on semantic topology relations between diseases as recorded in the Medical Subject Headings (MeSH) database (https://www.nlm.nih.gov/, accessed on 19 July 2020). In the MeSH system, the topology of a disease can be described as a directed acyclic graph (DAG), i.e., $DAG_D = (D, T_D, E_D)$, where $T_D$ denotes the node set that includes the disease, D, and its ancestor diseases, and $E_D$ denotes the edge set of all relationships in $DAG_D$. The contribution of disease d to the semantic value of disease D can be defined by Equation (3):
$$D_D(d) = \begin{cases} 1, & d = D \\ \max\{\Delta \cdot D_D(d') \mid d' \in \text{children of } d\}, & d \neq D \end{cases} \qquad (3)$$
In the above equation, $\Delta$ is the semantic contribution factor, which is usually set to 0.5 [42]. The semantic value DC(D) is given by summing the contributions of the ancestral diseases and of the disease D itself, Equation (4):
$$DC(D) = \sum_{d \in T_D} D_D(d) \qquad (4)$$
Therefore, the semantic similarity of diseases $D_i$ and $D_j$ is calculated by Equation (5):
$$DS(D_i, D_j) = \frac{\sum_{t \in T_{D_i} \cap T_{D_j}} \left( D_{D_i}(t) + D_{D_j}(t) \right)}{DC(D_i) + DC(D_j)} \qquad (5)$$
Diseases whose DAGs share a larger part are therefore more likely to have higher semantic similarity.
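Wang's semantic similarity, as it is commonly defined, can be sketched on a toy DAG as follows: each disease contributes 1 to itself, each ancestor receives the largest contribution among its children multiplied by the factor $\Delta$, and the similarity is the summed contribution of shared ancestors divided by the total semantic value of both diseases. The dictionary `parents` (child to list of parents) and both function names are illustrative assumptions, and the toy graph is not taken from MeSH.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors to its semantic value."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the maximum over all paths from the disease to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared-ancestor contributions divided by the sum of both semantic values."""
    c1 = semantic_contributions(d1, parents, delta)
    c2 = semantic_contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Toy DAG: B and C are both children of A.
parents = {"A": [], "B": ["A"], "C": ["A"]}
print(semantic_similarity("B", "C", parents))
print(semantic_similarity("B", "B", parents))
```

In the toy graph, B and C share only the ancestor A, which contributes 0.5 to each, so their similarity is (0.5 + 0.5) / (1.5 + 1.5) = 1/3, while any disease has similarity 1 with itself.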

Matrix Decomposition
From the above, we obtained the causal miRNA-disease association matrix, MD; the disease semantic similarity matrix, DS; and four miRNA similarity matrices, namely $MS_S$, $MS_E$, $MS_P$ and GM. Next, we utilized the matrix decomposition algorithm proposed by Che et al. [20] to predict the causal association scores of miRNAs with diseases for each miRNA similarity matrix in turn.
First, each miRNA and each disease is given an initial projection vector in a fixed k-dimensional space, and their inner product is used to represent the causal association between them, as in Equation (6):
$$MD' = M^{T} D \qquad (6)$$
where M and D are k × m and k × d matrices, respectively; m is the number of miRNAs; and d is the number of diseases. Thus, the causal miRNA-disease association problem can be thought of as minimizing the distance between the MD' matrix and the MD matrix of known causality by solving for the appropriate M and D; this first part of the objective function is given as Equation (7). On the other hand, M and D should also fit the known miRNA similarity matrices and the disease semantic similarity matrix in the model; this second part of the objective function is given as Equation (8). These two parts of the objective function can be optimized together by using the iterative least squares approach, as specified in Che et al.'s original article [20].
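To make the two parts of the objective concrete, the sketch below evaluates one plausible form of the combined loss for given factor matrices. This is an assumption rather than the published objective: it uses an unweighted sum of squared Frobenius distances between MD and $M^T D$, between a miRNA similarity matrix and $M^T M$, and between DS and $D^T D$, and it omits any weighting or regularization terms that the algorithm of Che et al. [20] may include. The function names are illustrative, and the optimization itself (iterative least squares) is not shown.

```python
def transpose_product(x, y):
    """Return X^T Y for X of shape k x n and Y of shape k x p."""
    k, n, p = len(x), len(x[0]), len(y[0])
    return [[sum(x[t][i] * y[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]


def squared_distance(a, b):
    """Squared Frobenius distance between two equally shaped matrices."""
    return sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def objective(md, ms, ds, m, d):
    """Assumed loss: association fit plus miRNA and disease similarity fit."""
    association_term = squared_distance(md, transpose_product(m, d))
    mirna_term = squared_distance(ms, transpose_product(m, m))
    disease_term = squared_distance(ds, transpose_product(d, d))
    return association_term + mirna_term + disease_term


# Toy example: k = 1, two miRNAs, one disease.
m = [[1.0, 0.0]]
d = [[1.0]]
md = [[1], [0]]
ms = [[1.0, 0.0], [0.0, 1.0]]
ds = [[1.0]]
print(objective(md, ms, ds, m, d))
```

In the toy example the association and disease terms are fitted exactly, and the remaining loss of 1 comes from the second miRNA, whose zero projection cannot reproduce its self-similarity.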

Integrated Prediction Score of LE-MDCAP
The inner product of the calculated M and D yields a prediction association score matrix, $MD' = M^{T} D$. The four miRNA similarity matrices, namely $MS_S$, $MS_E$, $MS_P$ and GM, correspond to the predicted score matrices $MD'_S$, $MD'_E$, $MD'_P$ and $MD'_G$, respectively. The composite prediction score matrix, $MD'$, is obtained as the weighted sum of the above four prediction scores, where the weights sum to 1 and have been optimized in steps of 0.05. Finally, we determined the integrated prediction score matrix as $MD' = 0.35\,MD'_S + 0.4\,MD'_E + 0.15\,MD'_P + 0.1\,MD'_G$ (Supplementary Materials Table S2).
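The weighted integration and the search over weights in steps of 0.05 can be sketched as follows. The enumeration below assumes that each weight may take any multiple of 0.05 from 0 to 1, which the text does not state explicitly; the selection criterion (AUROC for causal versus non-causal discrimination) would be applied to each candidate and is not shown. Function names are illustrative.

```python
def weighted_sum(matrices, weights):
    """Element-wise weighted sum of equally shaped score matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * mat[i][j] for w, mat in zip(weights, matrices))
             for j in range(cols)] for i in range(rows)]


def weight_grid(n_parts, step_count=20):
    """Yield every tuple of n_parts multiples of 1/step_count that sums to 1."""
    def compositions(remaining, parts):
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest

    for combo in compositions(step_count, n_parts):
        yield tuple(c / step_count for c in combo)


candidates = list(weight_grid(4))
print(len(candidates))                        # number of weight combinations
print((0.35, 0.4, 0.15, 0.1) in candidates)   # the reported weights are on the grid
```

Working with integer step counts and dividing only at the end avoids the floating-point drift that accumulating 0.05 repeatedly would introduce.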

Model Evaluation and Server Construction
To evaluate the prediction accuracy of LE-MDCAP, we performed an independent test and 10-fold cross-validation. In terms of distinguishing causal from non-causal miRNA-disease associations, our model was compared with the MDCAP predictor. The predictions of LE-MDCAP are available through an online web server that was constructed with the HTML + PHP + Apache framework.
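The data split used for evaluation (one-fifth of the known causal associations held out as the independent test set, with 10-fold cross-validation on the remaining four-fifths) can be sketched as follows. The function name, the fixed seed and the round-robin fold assignment are illustrative assumptions; the original random split is not specified beyond the proportions.

```python
import random


def split_associations(pairs, test_fraction=0.2, n_folds=10, seed=0):
    """Shuffle known associations, hold out a test set and fold the rest."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    independent_test = shuffled[:n_test]
    training = shuffled[n_test:]
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return independent_test, training, folds


# Toy (miRNA index, disease index) pairs.
pairs = [(i, i % 7) for i in range(100)]
independent_test, training, folds = split_associations(pairs)
print(len(independent_test), len(training), [len(f) for f in folds])
```

In each cross-validation round, the entries of MD belonging to the held-out fold would be set to 0 before any similarity or factorization step, so that predictions depend on the training associations only.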