Improved Prediction of miRNA-Disease Associations Based on Matrix Completion with Network Regularization.

The identification of potential microRNA (miRNA)-disease associations helps elucidate the pathogenesis of complex human diseases, owing to the crucial role of miRNAs in various biologic processes, and yields insights into novel prognostic markers. Given the time and cost of wet-lab experiments, computational models for finding novel miRNA-disease associations are an attractive alternative. However, computational models to date are biased towards known miRNA-disease associations; they are poorly suited to rare miRNAs (i.e., miRNAs with few known disease associations) and uncommon diseases (i.e., diseases with few known miRNA associations), which leads to poor prediction accuracy. The most straightforward way to improve performance is to increase the number of known miRNA-disease associations. Because such information is lacking, however, increasing attention has been paid to computational models that can handle insufficient data. In this paper, we present a general framework, improved prediction of miRNA-disease associations (IMDN), based on matrix completion with network regularization to discover potential disease-related miRNAs. The adoption of matrix factorization is motivated by its excellent performance in recommender systems. The approach treats a miRNA network as additional implicit feedback and predicts disease associations for a given miRNA based on its direct neighbors. Our experimental results demonstrate that IMDN achieved excellent performance, with area under the receiver operating characteristic (ROC) curve (AUC) values of 0.9162 and 0.8965 under global and local leave-one-out cross-validation (LOOCV), respectively. Further, case studies demonstrated that our method can not only validate true miRNA-disease associations but also suggest novel disease-related miRNA candidates.


Introduction
MicroRNAs (miRNAs) are small single-stranded non-coding RNAs that bind to the 3′ untranslated regions (UTRs) of target messenger RNAs (mRNAs) [1,2]. miRNAs tend to restrain gene expression by control of their own regulatory sequences and promoters; they bind to specific target mRNAs through base-pairing, which inhibits translation and reduces mRNA stability. Since the first discovery of two miRNAs (Caenorhabditis elegans lin-4 and let-7) in 1993 and 2000, increasing attention has been paid to this research field. Numerous studies continue to demonstrate the crucial roles of miRNAs in diverse biologic processes such as apoptosis [3], cell development [4], proliferation [5], viral infection [6] and metabolism [7]. As indicated by previous studies, miRNAs are becoming diagnostic/therapeutic tools [...] among miRNAs based on the assumption that two miRNAs are functionally related if the number of shared environmental factors (EFs) is statistically significant. Environmental factors include drugs, alcohol, stress and diet. However, this model does not consider the chemical structure of the EFs, which leaves room for improving prediction accuracy by measuring the similarity among the miRNAs more precisely. Shi et al. made use of the protein-protein interaction (PPI) network by implementing the random walk algorithm to exploit miRNA-disease associations [20]. In summary, the performance of most similarity-based models is limited by the lack of sufficient validated interactions. Hence, these approaches are highly biased towards known miRNA-disease associations and are not applicable to miRNAs with no known disease associations.
Machine learning-based approaches have delivered superior performance in various scientific research areas, including bioinformatics and computational biology. For example, Chen et al. developed a computational framework named regularized least squares for miRNA-disease association (RLSMDA) [21]. This model is based on semi-supervised learning and can predict miRNA-disease associations without using negative samples. However, its main drawbacks are the difficulty of finding optimal parameters and of combining the classifiers from two different spaces. Chen et al. also presented the restricted Boltzmann machine for multiple types of miRNA-disease prediction (RBMMMDA), which utilizes the restricted Boltzmann machine (RBM) [22]. The main advantage of this model is not only the resulting improvement in prediction accuracy but, above all, the ability to estimate the corresponding types of miRNA-disease associations. Ha et al. proposed a matrix factorization-based model called PMAMCA to identify potential miRNA-disease associations [23]. This model, which utilizes miRNA expression data and known miRNA-disease associations, outperformed previous models in terms of area under the receiver operating characteristic (ROC) curve (AUC) scores. However, it leaves room for further improvement by using diverse biologic information as implicit data. Li et al. proposed a matrix completion algorithm-based model called MCMDA [24]. In this study, they constructed a binary adjacency matrix from known miRNA-disease associations and applied a singular value thresholding (SVT) algorithm to find novel disease-related miRNAs. However, finding optimal parameters for this model remains a critical issue. Xio et al. proposed the framework graph regularized non-negative matrix factorization (GRNMF), which exploits the weighted gene network to calculate the interaction profiles of new miRNAs and diseases [25]. Chen et al. developed a model of ranking-based k-nearest neighbors for miRNA-disease association prediction (RKNNMDA) by exploring the k-nearest neighbors of miRNAs and diseases. An SVM was adopted for calculating the k-nearest neighbors, and miRNA-disease associations were prioritized by weighted voting [26]. Chen et al. proposed an approach for inferring miRNA-disease associations that makes use of inductive matrix completion with matrix decomposition and heterogeneous graphs (IMCMDA) [27]. This model not only explores disease-related miRNAs but also measures the comprehensive similarities of miRNAs and diseases. Chen et al. presented matrix decomposition and heterogeneous graph inference (MDHGI). This model prioritizes disease-related miRNAs by combining the matrix decomposition algorithm with miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity [28]. In summary, most machine learning-based approaches have difficulties in tuning the optimal parameters and in using negative samples. Furthermore, different optimal parameter combinations may exist in different scenarios, which complicates sensitivity analysis.
Identifying novel miRNA-disease associations is beneficial for understanding disease pathogenesis at the molecular level and for developing diagnostic biomarkers. However, most previous miRNA-disease prediction algorithms are still impeded by the data sparsity problem; hence, it is challenging to make predictions for miRNAs with few known disease associations, which are called rare miRNAs. In recommender systems, similar problems have been addressed efficiently by adopting matrix factorization to predict the most plausible rating scores of each user. Inspired by recent advances in recommender systems, this study formalizes a matrix factorization-based model for collaborative filtering; that is, we transform the task of predicting miRNA-disease associations into a recommender task.
In this study, we present a computational miRNA-disease association prediction model for improved prediction based on matrix completion with network regularization (IMDN). We consider the miRNA network in order to handle rare miRNAs efficiently. The core idea of IMDN is to consider the relationships among the miRNAs within the network so that the embedding of each miRNA is better captured through its direct neighbors. To inject the influence of the miRNA network, we introduce a network regularization term that imposes network constraints on the prediction model. Because the miRNA similarity network provides only a limited number of predetermined weight values, we extended the model to calculate miRNA similarity through the Gaussian interaction profile kernel as well. Our primary contribution lies in the expandability of the matrix factorization-based model, which applies the miRNA similarity network as a regularization term and the miRNA expression value as a weight in the objective function. By using the miRNA expression value as a weight in the objective function, we can train the model even for pairs whose miRNA-disease association is unknown. Further, the calculation of new similarities among the miRNAs could be one of the main contributors to the strong performance. We expect that IMDN can serve as an effective tool for discovering potential miRNA-disease associations by considering the miRNA network. Various experimental results demonstrated that IMDN outperforms state-of-the-art miRNA-disease association prediction models in terms of AUC scores and survival analysis.

Methods Overview
We present the novel computational framework IMDN to predict miRNA-disease associations. IMDN comprises three main steps. First, to construct the miRNA functional similarity network, we use the pre-calculated weights from misim and calculate a new miRNA similarity through the Gaussian interaction profile kernel. Second, given the miRNA similarity network and miRNA expression data, we apply a matrix factorization-based model to train the miRNA latent feature vectors and disease latent feature vectors on the known miRNA-disease associations. Lastly, we prioritize miRNA candidates based on the scores assigned by IMDN. The workflow of IMDN is illustrated in detail in Figure 1.

Human miRNA-disease Associations
We collected human miRNA-disease association data from HMDD v2 [29], dbDEMC [30] and miR2Disease [31]. Despite the effectiveness of matrix factorization (MF) in a wide variety of domains, prediction performance remains limited by the insufficient number of experimentally validated interactions in the binary adjacency matrix R. Therefore, we combined the miRNA-disease associations from the three public databases to produce richer input data. As duplicate entries exist across the three databases, we preprocessed the data to eliminate the duplicates. HMDD is an online public database that provides 10,368 experimentally confirmed human miRNA-disease associations regarding 572 miRNAs and 378 diseases. dbDEMC is an integrated database of differentially expressed miRNAs in human cancers that contains information on 2224 miRNAs and 36 diseases. miR2Disease is a manually curated database that contains 1939 entries on 299 miRNAs and 94 diseases. After merging, we unified the names of different miRNAs under one miRNA gene and used standard MeSH disease terms. The variables $N_m$ and $N_d$ denote the numbers of miRNAs and diseases, respectively, and the $N_m \times N_d$ binary adjacency matrix R was constructed from the integrated human miRNA-disease associations. The binary adjacency matrix R is expressed as follows:

$$
R(m(u), d(i)) =
\begin{cases}
1, & \text{if miRNA } m(u) \text{ and disease } d(i) \text{ have a verified association} \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

Cells 2020, 9, 881

Figure 1. Workflow of IMDN. First, the functional similarity network, in which each node is a miRNA, was constructed from the existing database misim and the proposed inference approach. Second, matrix factorization was performed with both the inferred miRNA network and miRNA expression data. Finally, prioritization was implemented based on the highly scored miRNAs.
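The construction of R in Equation (1) can be illustrated with a small sketch. The records below are made-up toy pairs rather than real database entries, `build_association_matrix` is a hypothetical helper name, and lower-casing stands in for the name unification step, which in practice would rely on miRNA gene names and MeSH terms.

```python
def build_association_matrix(records):
    """Merge (miRNA, disease) pairs from several sources into a binary matrix R."""
    # A set drops pairs that are reported by more than one database.
    # Lower-casing is only a stand-in for proper name unification (assumption).
    pairs = {(m.strip().lower(), d.strip().lower()) for m, d in records}
    mirnas = sorted({m for m, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    m_index = {m: u for u, m in enumerate(mirnas)}
    d_index = {d: i for i, d in enumerate(diseases)}
    # R[u][i] = 1 for a verified association and 0 otherwise, as in Equation (1).
    R = [[0] * len(diseases) for _ in mirnas]
    for m, d in pairs:
        R[m_index[m]][d_index[d]] = 1
    return R, mirnas, diseases


# Toy records standing in for entries from the three databases (made up).
records = [
    ("hsa-mir-21", "Breast Neoplasms"),
    ("hsa-mir-21", "breast neoplasms"),  # duplicate reported by a second source
    ("hsa-mir-155", "Colon Neoplasms"),
    ("hsa-mir-21", "Colon Neoplasms"),
]
R, mirnas, diseases = build_association_matrix(records)
```

Rows follow the sorted miRNA names and columns the sorted disease names, so the index maps can be reused for the expression and similarity matrices.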

miRNA Expression Data
To build the prediction model more precisely and effectively, we utilized a miRNA expression dataset to compensate for the insufficient miRNA-disease associations. As a large number of biologic datasets are being generated with the help of high-throughput techniques, these datasets create opportunities to understand diverse biologic functions such as disease pathogenesis and disease etiology, as well as to discover novel disease biomarkers. Therefore, miRNA expression data were obtained from The Cancer Genome Atlas (TCGA), which provides multimodal genomics and proteomics data for thousands of tumor samples for more than 20 types of cancer [32]. To construct the $N_m \times N_d$ miRNA expression weight matrix W, min-max normalization was conducted first. We take the weight value W(u,i) into account only when there is no association between miRNA m(u) and disease d(i) in the original matrix R; otherwise, we set it to one.

miRNA functional similarity scores were calculated based on the hypothesis that functionally similar miRNAs are more inclined to associate with phenotypically similar diseases. The miRNA functional similarity data misim 2.0 was downloaded from http://www.lirmed.com/misim/ to construct the $N_m \times N_m$ miRNA functional similarity matrix FS [33]. The similarity score between miRNAs m(u) and m(i) is expressed as FS(u,i).
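The weight matrix W described in this section can be sketched as follows. The text does not say whether min-max normalization is applied over the whole matrix or per disease; this sketch assumes the whole matrix, and the expression values and function names are illustrative only.

```python
def min_max_normalize(matrix):
    """Scale all entries to [0, 1] using the global minimum and maximum."""
    values = [x for row in matrix for x in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant matrix carries no ranking information; map it to zeros.
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]


def build_weight_matrix(expression, R):
    """W[u][i] = 1 for known associations, else the normalized expression."""
    norm = min_max_normalize(expression)
    return [
        [1.0 if R[u][i] == 1 else norm[u][i] for i in range(len(R[u]))]
        for u in range(len(R))
    ]


# Toy expression values for 2 miRNAs x 2 diseases (made up).
expression = [[2.0, 10.0], [6.0, 4.0]]
R = [[1, 0], [0, 0]]
W = build_weight_matrix(expression, R)
```

Known associations keep full weight, while unknown pairs contribute to training in proportion to their normalized expression.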

Gaussian Interaction Profile Kernel miRNA Similarity
Multiple studies continue to prove the effectiveness of the Gaussian interaction profile kernel for calculating similarities among both diseases and miRNAs [34,35]. To calculate a comprehensive and precise similarity score among the miRNAs, we adopted the Gaussian kernel function, which is also called the radial basis function (RBF). We regarded two miRNAs as functionally related if they have similar patterns of interactions with the diseases on the basis of the known human miRNA-disease associations. For a given miRNA u, the feature vector IP(m(u)) was extracted from the u-th row of the miRNA latent feature vector to express the interaction profile of m(u). The Gaussian kernel similarity between miRNAs m(i) and m(j), denoted GS(m(i), m(j)), is computed from the distance between their interaction profiles IP(m(i)) and IP(m(j)); the bandwidth of the kernel is controlled by the hyperparameter $r_m$, which is calculated from the interaction profiles.
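As a hedged illustration, the sketch below uses the form of the Gaussian interaction profile kernel that is common in the literature: GS(i, j) = exp(-r_m * ||IP(i) - IP(j)||^2), with r_m = r'_m divided by the mean squared norm of the profiles. This form, the default r'_m = 1 and the use of rows of R as interaction profiles are assumptions of the sketch, not formulas recovered from this paper.

```python
import math


def gaussian_profile_similarity(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel between all pairs of profiles."""
    n = len(profiles)
    # Bandwidth: r_prime normalized by the average squared profile norm.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    r_m = r_prime / mean_sq_norm
    GS = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            GS[i][j] = math.exp(-r_m * sq_dist)
    return GS


# Toy association matrix: rows are miRNAs, columns are diseases (made up).
R = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
GS = gaussian_profile_similarity(R)
```

Two miRNAs with identical profiles get similarity 1, and the similarity decays towards 0 as their profiles diverge.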

Integrated miRNA Similarity
We obtained the integrated miRNA similarity score that was used for constructing the miRNA similarity network based on the miRNA functional similarity FS and the miRNA Gaussian interaction profile kernel similarity GS. The integrated weight value that was used for the edges of the miRNA similarity network S can be expressed as follows:

$$
S(m(u), m(i)) =
\begin{cases}
FS(m(u), m(i)), & \text{if } m(u) \text{ and } m(i) \text{ have functional similarity} \\
GS(m(u), m(i)), & \text{otherwise}
\end{cases}
\tag{5}
$$
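Equation (5) amounts to an element-wise fallback from FS to GS. In the sketch below, a missing functional similarity is encoded as `None` or `0`; that encoding is an assumption, since the text does not state how missing entries are represented.

```python
def integrate_similarity(FS, GS):
    """Use FS where a functional similarity exists, otherwise fall back to GS."""
    n = len(FS)
    # A falsy FS entry (None or 0) is treated as "no functional similarity".
    return [[FS[u][v] if FS[u][v] else GS[u][v] for v in range(n)] for u in range(n)]


# Toy matrices for three miRNAs (made up).
FS = [[1.0, 0.8, None], [0.8, 1.0, 0.0], [None, 0.0, 1.0]]
GS = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.2], [0.3, 0.2, 1.0]]
S = integrate_similarity(FS, GS)
```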

IMDN
Among various collaborative filtering methods, matrix factorization has been highly successful in recommendation systems [36]. However, large-scale, sparse input matrices usually degrade the performance of matrix factorization models. Hence, most matrix factorization-based models suffer from a cold-start problem when the binary adjacency matrix contains miRNAs with few disease associations. To handle this issue, various advanced matrix factorization methods that utilize additional biologic datasets have been proposed. In this work, we used the miRNA network as auxiliary information to enhance the prediction accuracy.
The miRNA network can be defined as a graph with a node for each miRNA and an edge for each similarity weight. The edge weight $S_{u,i}$ can be interpreted as how similar miRNA $M_u$ is to miRNA $M_i$.
Under the network influence, the trait of each miRNA is affected by its direct neighbors $E_u$. Based on the intuition that nodes with similar structural roles in a network should be located close together, the miRNA latent feature vector $M_u$ is highly affected by the latent feature vectors of its direct neighbors $v \in E_u$. $\hat{M}_u$ is an estimated latent feature vector calculated from the feature vectors of its direct neighbors. All the notations used in the following equations are described in Table 1. The formulation is described as follows:

Table 1. Notation used in the equations (columns: Symbol, Description).

By taking full advantage of the characteristics of miRNAs in the miRNA similarity network, the new estimated latent feature vector of a miRNA can be calculated as the weighted average of the latent feature vectors of its direct neighbors, as follows: Treating the miRNA similarity network as implicit feedback does not change the conditional distribution of the known miRNA-disease associations; it affects only the miRNA latent vectors. Therefore, the conditional probability can be expressed as follows.
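The weighted average over direct neighbors can be sketched as below. Normalizing by the sum of the neighbor weights and falling back to the miRNA's own vector when it has no neighbors are assumptions of this sketch; the names `neighbor_estimate` and `neighbors` are illustrative.

```python
def neighbor_estimate(u, M, S, neighbors):
    """Estimate the latent vector of miRNA u from its direct neighbors."""
    total = sum(S[u][v] for v in neighbors[u])
    if total == 0:
        # Isolated miRNA: keep its own vector (assumption of this sketch).
        return list(M[u])
    k = len(M[u])
    # Similarity-weighted average of the neighbors' latent vectors.
    return [sum(S[u][v] * M[v][f] for v in neighbors[u]) / total for f in range(k)]


# Toy latent vectors (k = 2) and similarity weights for three miRNAs (made up).
M = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
S = [[1.0, 0.75, 0.25], [0.75, 1.0, 0.0], [0.25, 0.0, 1.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
estimate = neighbor_estimate(0, M, S, neighbors)
```

For miRNA 0, the more similar neighbor 1 dominates the estimate, which is the intended effect for rare miRNAs with few associations of their own.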
A zero-mean Gaussian prior is assigned to the miRNA latent vectors to avoid overfitting. Motivated by the fact that the characteristics of a miRNA are highly affected by its direct neighbors, the conditional distribution of a miRNA latent vector given the latent vectors of its direct neighbors is as follows:
Our goal is to capture the most plausible latent vectors of miRNAs $M_u$ and diseases $D_i$, so that the inner product of the two latent vectors is close to the corresponding entry $R_{u,i}$ of the binary association matrix. To model the cost function more accurately, we added a miRNA term that better captures the characteristics of the miRNA latent vector $M_u$, which naturally reflects the characteristics of its neighbors $M_v$ in the miRNA similarity network S. We also introduce the miRNA expression weight matrix W to train the latent vectors of miRNAs and diseases efficiently.
Maximizing the log-posterior over the latent vectors of miRNAs and diseases is equivalent to minimizing the cost function below. The goal is to minimize the loss between each entry $R_{u,i}$ and the dot product of the corresponding miRNA latent vector $M_u$ and disease latent vector $D_i$.
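For orientation, a cost function of the kind described in this section is commonly written as shown below. This is a plausible form assembled from the terms named in the text (expression weights W, regularizers $\lambda_M$, $\lambda_D$, $\lambda_S$, and the neighbor estimate $\hat{M}_u$ taken as the similarity-weighted average over $E_u$); it should be read as an assumption rather than as the authors' exact equation.

```latex
\mathcal{L}(M, D) =
  \frac{1}{2} \sum_{u=1}^{N_m} \sum_{i=1}^{N_d}
      W_{u,i} \left( R_{u,i} - M_u^{\top} D_i \right)^2
  + \frac{\lambda_M}{2} \sum_{u=1}^{N_m} \lVert M_u \rVert^2
  + \frac{\lambda_D}{2} \sum_{i=1}^{N_d} \lVert D_i \rVert^2
  + \frac{\lambda_S}{2} \sum_{u=1}^{N_m} \lVert M_u - \hat{M}_u \rVert^2,
\qquad
\hat{M}_u = \frac{\sum_{v \in E_u} S_{u,v} M_v}{\sum_{v \in E_u} S_{u,v}}
```

The first term fits the known associations, the next two correspond to zero-mean Gaussian priors on the latent vectors, and the last term pulls each miRNA towards its neighbors in the similarity network.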
The derivatives of the cost function with respect to $M_u$ and $D_i$, for all miRNAs u and all diseases i, are used to perform gradient descent and can be expressed as follows. Our approach is efficient even with a simple gradient descent method. $\lambda_M$, $\lambda_D$ and $\lambda_S$ are the hyperparameters that control the strength of the regularization terms to avoid overfitting. A graphical model of IMDN is illustrated in Figure 2.
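A minimal full-batch gradient descent sketch is given below. It assumes a weighted squared-error cost with L2 penalties and a network term that penalizes the distance between each $M_u$ and the weighted average of its neighbors (weights taken from row-normalized S); the exact cost and gradients of IMDN are not reproduced here, and all data and hyperparameter values are made up.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def row_normalize(S):
    """Turn similarities into neighbor weights that sum to 1 (no self-loops)."""
    T = []
    for u, row in enumerate(S):
        weights = [0.0 if v == u else w for v, w in enumerate(row)]
        total = sum(weights)
        T.append([w / total if total else 0.0 for w in weights])
    return T


def network_residual(M, T):
    """M_u minus the weighted average of its neighbors, for every miRNA u."""
    n, k = len(M), len(M[0])
    return [[M[u][f] - sum(T[u][v] * M[v][f] for v in range(n)) for f in range(k)]
            for u in range(n)]


def cost(R, W, T, M, D, lam_m, lam_d, lam_s):
    fit = sum(W[u][i] * (R[u][i] - dot(M[u], D[i])) ** 2
              for u in range(len(R)) for i in range(len(R[0])))
    reg_m = sum(dot(m, m) for m in M)
    reg_d = sum(dot(d, d) for d in D)
    reg_s = sum(dot(r, r) for r in network_residual(M, T))
    return 0.5 * (fit + lam_m * reg_m + lam_d * reg_d + lam_s * reg_s)


def gradient_step(R, W, T, M, D, lr, lam_m, lam_d, lam_s):
    n_m, n_d, k = len(R), len(R[0]), len(M[0])
    # Weighted prediction errors W_ui * (R_ui - M_u . D_i).
    err = [[W[u][i] * (R[u][i] - dot(M[u], D[i])) for i in range(n_d)]
           for u in range(n_m)]
    res = network_residual(M, T)
    # The network term contributes through u itself and through every v
    # that counts u among its neighbors.
    grad_M = [[-sum(err[u][i] * D[i][f] for i in range(n_d))
               + lam_m * M[u][f]
               + lam_s * res[u][f]
               - lam_s * sum(T[v][u] * res[v][f] for v in range(n_m))
               for f in range(k)] for u in range(n_m)]
    grad_D = [[-sum(err[u][i] * M[u][f] for u in range(n_m)) + lam_d * D[i][f]
               for f in range(k)] for i in range(n_d)]
    new_M = [[M[u][f] - lr * grad_M[u][f] for f in range(k)] for u in range(n_m)]
    new_D = [[D[i][f] - lr * grad_D[i][f] for f in range(k)] for i in range(n_d)]
    return new_M, new_D


rng = random.Random(0)
R = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
W = [[1.0, 0.2, 1.0], [1.0, 0.1, 0.3], [0.4, 1.0, 0.2]]
S = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
T = row_normalize(S)
k = 2  # number of latent features
M = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in R]
D = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in R[0]]
params = dict(lam_m=0.01, lam_d=0.01, lam_s=0.1)
initial_cost = cost(R, W, T, M, D, **params)
for _ in range(500):
    M, D = gradient_step(R, W, T, M, D, lr=0.05, **params)
final_cost = cost(R, W, T, M, D, **params)
scores = [[dot(m, d) for d in D] for m in M]  # predicted association scores
```

Setting `lam_s` to zero recovers plain weighted matrix factorization, which is the natural baseline for judging the effect of the network term.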

Performance Evaluation
To demonstrate the superiority of IMDN, we compared our method with other state-of-the-art methods: PMAMCA [23], MDHGI [28], RKNNMDA [26], RWRMDA [15], MCMDA [24] and RLSMDA [21]. All models were assessed by leave-one-out cross-validation (LOOCV) on the integrated miRNA-disease associations (dbDEMC, miR2Disease and HMDD v2). LOOCV can be divided into global and local LOOCV. In both, each known miRNA-disease association is left out in turn as a test sample, whereas all the remaining miRNA-disease pairs are used as training samples. Global LOOCV evaluates the performance of the model by considering all diseases simultaneously, whereas local LOOCV considers only the miRNAs for a specific disease, that is, it assesses the ability to recover the miRNA-disease associations of that disease.
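The two LOOCV variants differ only in the set of candidates against which the held-out association is ranked, as the sketch below shows. Ranking against pairs without a known association is an assumption based on common practice, and `degree_scores` is a deliberately trivial stand-in scorer, not IMDN.

```python
def loocv_ranks(R, score_fn, local=False):
    """Rank each held-out known association among the candidate pairs."""
    n_m, n_d = len(R), len(R[0])
    ranks = []
    for u in range(n_m):
        for i in range(n_d):
            if R[u][i] != 1:
                continue
            # Hide the association and rescore everything from the rest.
            train = [row[:] for row in R]
            train[u][i] = 0
            scores = score_fn(train)
            if local:
                # Local: only unknown miRNAs of the same disease compete.
                candidates = [(v, i) for v in range(n_m) if R[v][i] == 0]
            else:
                # Global: unknown pairs of all diseases compete.
                candidates = [(v, j) for v in range(n_m)
                              for j in range(n_d) if R[v][j] == 0]
            better = sum(1 for v, j in candidates if scores[v][j] > scores[u][i])
            ranks.append(1 + better)
    return ranks


def degree_scores(train):
    """Toy scorer: product of miRNA degree and disease degree (illustrative)."""
    row_deg = [sum(row) for row in train]
    col_deg = [sum(col) for col in zip(*train)]
    return [[r * c for c in col_deg] for r in row_deg]


R = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
global_ranks = loocv_ranks(R, degree_scores)
local_ranks = loocv_ranks(R, degree_scores, local=True)
```

Because the local candidate set is a subset of the global one, a held-out association can never rank worse locally than globally under the same scores.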
For both global and local LOOCV, all test samples are prioritized based on the prediction scores assigned by IMDN. This partition-prediction-ranking step was conducted 100 times to derive the mean AUC score of IMDN as a reasonable estimate of the prediction accuracy. The AUC scores were calculated to demonstrate the performance of each method. We drew the ROC curve in terms of the true positive rate (TPR, sensitivity) and the false positive rate (FPR, 1 - specificity), where sensitivity and specificity are defined as follows:

$$
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}
$$

Sensitivity refers to the proportion of positive samples ranked above the threshold, and specificity refers to the proportion of negative samples ranked below the threshold. TP and TN denote the numbers of correctly identified positive and negative samples, whereas FP and FN denote the numbers of samples incorrectly identified as positive and negative, respectively. An AUC value of 1 represents perfect prediction, whereas an AUC value of 0.5 represents random selection. Therefore, models with AUC scores close to 1 are considered competitive prediction models. We demonstrate the efficacy of IMDN over state-of-the-art methods by comparing the AUC scores. The performance comparison in terms of the ROC curve is illustrated in Figure 3. As shown in Figure [...] evaluation based on global LOOCV because it considers diseases one at a time.
To demonstrate the performance of IMDN more precisely, we additionally drew the precision-recall curve and calculated the area under the precision-recall curve (AUPRC). As illustrated in Figure 4, IMDN achieved the best performance compared to previous prediction models. The comparison shows that IMDN achieves competitive performance under this reliable evaluation metric, which supports the claim that our approach is capable of predicting a large number of disease-related miRNAs.
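The relation between the ranking, the ROC points and the AUC can be made concrete with a small sketch. It sweeps the threshold over the sorted scores, records (FPR, TPR) points and integrates them with the trapezoidal rule; the scores and labels are toy values, and `roc_auc` is an illustrative helper rather than the evaluation code used in this study.

```python
def roc_auc(scores, labels):
    """Return (AUC, ROC points) for binary labels, with 1 marking positives."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    idx = 0
    while idx < len(pairs):
        threshold = pairs[idx][0]
        # Samples with tied scores cross the threshold together.
        while idx < len(pairs) and pairs[idx][0] == threshold:
            if pairs[idx][1] == 1:
                tp += 1
            else:
                fp += 1
            idx += 1
        # FPR = FP / (FP + TN) = 1 - specificity; TPR = TP / (TP + FN).
        points.append((fp / negatives, tp / positives))
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return auc, points


auc, points = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
perfect_auc, _ = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

In the first toy example, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75; the second ranks every positive above every negative and reaches 1.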

Effect of miRNA Functional Similarity Network
With the vast sizes of biologic datasets generated nowadays, an important issue in evaluating IMDN is whether the model efficiently reflects additional biologic data. We validated the expandability of IMDN to new input data (i.e., implicit feedback) such as miRNA functional similarity network data. In this study, we used the network regularization term to inject the information of the miRNA functional similarity data into the matrix factorization-based model. To demonstrate the efficacy of the miRNA functional similarity information, we checked the prediction accuracy in two cases: (1) without the network regularization term, where we only mine the miRNA-disease association binary matrix and employ known miRNA-disease associations for making predictions; and (2) with the network regularization term, where we fuse the information from the miRNA similarity graph to capture the trait of each miRNA through its direct neighbors. We confirmed a significant increase in the performance of IMDN with the miRNA functional similarity network, as illustrated in Figure 5. The motivation behind applying the miRNA similarity network was to reflect the hidden characteristics of each miRNA through its direct neighbors. We conclude that IMDN supports the well-known biologic assumption that functionally similar miRNAs are inclined to associate with phenotypically similar diseases.

Case Studies
We also studied three common human diseases to qualitatively ascertain the performance of IMDN for novel disease-related miRNA prediction. For each of the three diseases, we counted the number of correctly identified disease-related miRNAs within the top 50 candidates.
Colon neoplasm (CN) is the most common malignant cancer that typically arises from lesions in the human colon or rectum, and it poses a major threat to human life. According to the latest statistics in 2019 [37], 145,600 newly diagnosed CN cases and 51,020 deaths from CN were reported in the United States. To date, many researchers have proposed that the use of miRNAs as new biomarkers can be a good alternative for detecting CN. Therefore, IMDN was applied to predict potential CN-related miRNAs by prioritizing the candidates according to the scores assigned by IMDN. As shown in Table 2, 46 of the top 50 CN-related miRNAs were confirmed. Among the four remaining candidates, three were validated by experimental studies. For example, miR-150 was found to function as a tumor suppressor in CN by targeting c-Myb [38]; overexpression of miR-122 could lead to the development of CN liver metastasis [39]; expression of miR-199a-3p (pre-miRNA of miR-199a) could be involved in the development, tumorigenesis and progression of CN [40]. Consequently, 49 of the top 50 potential CN-related miRNAs were validated by experimental results.
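The bookkeeping behind such a case study, ranking the candidate miRNAs of one disease and counting how many of the top k are already confirmed, can be sketched as follows. The scores, names and confirmed set are placeholders, not values from Table 2.

```python
def top_k_candidates(scores, names, k=50):
    """Return the k candidate names with the highest prediction scores."""
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def count_confirmed(candidates, confirmed):
    """Count candidates that appear in a set of confirmed associations."""
    return sum(1 for name in candidates if name in confirmed)


# Placeholder scores for one disease (made up).
names = ["mirna-a", "mirna-b", "mirna-c", "mirna-d"]
scores = [0.20, 0.90, 0.50, 0.70]
top = top_k_candidates(scores, names, k=3)
confirmed = {"mirna-b", "mirna-c"}
n_confirmed = count_confirmed(top, confirmed)
```

Candidates in the top k that are not in the confirmed set are the ones that would then be checked against the literature.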

Kidney neoplasm (KN) is a heterogeneous cancer that accounts for 5% of new male cancer cases. Approximately 73,820 new KN cases were reported in the United States in 2019 [37]. Recent studies showed that miRNAs can play a role in uncovering the hidden mechanisms of KN. Therefore, we applied IMDN to extract potential miRNAs that are relevant to KN. As shown in Table 3, 46 of the top 50 candidates were confirmed to be KN-related miRNAs, and two of the remaining four candidates were validated by recent studies. Overexpression of miR-142-3p could induce apoptosis in RCC 786-O and ACHN cells; RCC is the most common type of adult kidney cancer [41]. Expression of miR-30a-5p was found to be substantially downregulated in RCC tissues compared to normal tissues [42]. To conclude, 48 of the top 50 candidates proved to be KN-related miRNAs according to public databases and other publications.
[...] hsa-miR-101, hsa-let-7c and hsa-miR-139), we could show that these miRNAs are strongly associated with the survival rates of breast cancer patients. The overall analysis is illustrated in Figure 6.

Figure 6. Kaplan-Meier plots of hsa-miR-148a, hsa-miR-133b, hsa-miR-130a and hsa-let-7e for survival of breast cancer patients.

Discussion
Identification of potential miRNA-disease associations could expand the understanding of disease etiology and pathogenesis. To this end, this study presents a novel framework, improved prediction of miRNA-disease associations based on matrix completion with network regularization (IMDN), for the prioritization of disease-related miRNAs. The goal of IMDN is to learn miRNA and disease latent vectors through matrix factorization while preserving the properties of miRNA-disease associations. Given the vast amount of publicly available omics data, an important criterion for evaluating IMDN is whether the model effectively incorporates additional biologic data while enhancing the prediction accuracy. Traditional MF-based prediction models depend heavily on the known miRNA-disease associations and ignore the relationships among the miRNAs in the network. To address this issue, we modified the cost function so that it adaptively learns miRNA and disease latent vectors, given the miRNA similarity network constructed using MISIM and Gaussian interaction profile kernels. Our prediction model fully explores the constructed miRNA similarity network to inject the correlations among the miRNAs. Once the matrix factorization model is trained with the various biologic data, miRNAs with a high chance of involvement in disease incidence naturally receive high scores and are prioritized. The AUC value was adopted to measure the prediction accuracy. IMDN delivered superior performance, with AUC values of 0.9162 and 0.8965 under global and local LOOCV, respectively. Furthermore, case studies were conducted on three significant human diseases to verify the stable and reliable performance of IMDN. In summary, experiments under various evaluation metrics validated the excellent performance of IMDN compared to previous methods.
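To make the idea of a network-regularized cost function concrete, the following is a minimal sketch and not the implementation used in this study. It assumes a graph-Laplacian penalty on the miRNA latent vectors, a squared-error loss over the full association matrix (unknown entries are treated as zeros), and plain gradient descent. The similarity matrix is built from the Gaussian interaction profile kernel alone, without MISIM. All names (`gip_kernel`, `factorize_with_network`, `lam`, `beta`, `lr`) and all hyperparameter values are illustrative assumptions.

```python
import numpy as np


def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of A.

    Each row of A is the binary association profile of one miRNA.
    """
    A = np.asarray(A, dtype=float)
    sq_norms = np.sum(A ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("A contains no known associations")
    # Bandwidth normalized by the mean squared norm of the profiles.
    gamma = gamma_prime / mean_sq
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def factorize_with_network(A, S, rank=2, lam=0.1, beta=0.1,
                           lr=0.01, n_iter=500, seed=0):
    """Factorize A ~ U V^T while pulling similar miRNAs together.

    Assumed loss (a sketch, not the paper's exact cost function):
      0.5 * ||A - U V^T||_F^2
      + 0.5 * lam * (||U||_F^2 + ||V||_F^2)
      + 0.5 * beta * trace(U^T L U),   L = D - S (graph Laplacian of S)
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    n_mirna, n_disease = A.shape
    U = 0.1 * rng.standard_normal((n_mirna, rank))    # miRNA latent vectors
    V = 0.1 * rng.standard_normal((n_disease, rank))  # disease latent vectors
    L = np.diag(S.sum(axis=1)) - S
    losses = []
    for _ in range(n_iter):
        E = A - U @ V.T  # reconstruction error
        losses.append(0.5 * np.sum(E ** 2)
                      + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
                      + 0.5 * beta * np.trace(U.T @ L @ U))
        grad_U = -E @ V + lam * U + beta * (L @ U)
        grad_V = -E.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses


if __name__ == "__main__":
    # Toy association matrix: 4 miRNAs (rows) x 3 diseases (columns).
    A = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 1]], dtype=float)
    S = gip_kernel(A)
    U, V, losses = factorize_with_network(A, S)
    scores = U @ V.T  # higher score = stronger predicted association
    print(np.round(scores, 2))
```

In this sketch, the `beta` term penalizes differences between the latent vectors of similar miRNAs, so a miRNA with few known associations can borrow information from its neighbors in the similarity network. Candidates for a given disease would then be ranked by the corresponding column of `scores`.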

Conclusions
The excellent prediction performance of IMDN may be attributed to several important factors. First, we applied a matrix factorization model, a technique that has achieved immense success in recommender systems. Among the various collaborative filtering techniques, matrix factorization has proven promising in a wide variety of domains. In bioinformatics, matrix factorization helps identify hidden links among genes, and in recommender systems, it infers the most plausible rating scores that users may give to certain items. Thus, we transformed the prediction of miRNA-disease associations into a recommender task. Second, IMDN can be extended with additional biologic data, such as miRNA expression data, which improves the prediction accuracy. Lastly, our model exploited not only the known miRNA-disease associations but also the miRNA similarity, to better capture the characteristics of each miRNA through its direct neighbors in the miRNA similarity network. It is noteworthy that considering the miRNA similarity network led to well-trained miRNA latent vectors. Most importantly, we anticipate that IMDN can serve as an effective tool for discovering potential links between miRNAs and diseases.
For future work, larger biologic datasets can be used to better capture the latent vectors of miRNAs and diseases and to infer potential disease-related miRNAs. Furthermore, evaluating miRNA candidates not only in silico but also in vivo would more clearly demonstrate the performance of the model and improve the credibility of the study. We also expect more comprehensive public databases to become available in the future, so that the inference of novel miRNA-disease associations can achieve more accurate and stable performance.