Article

Inferring the Disease-Associated miRNAs Based on Network Representation Learning and Convolutional Neural Networks

1 School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
2 School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
3 School of Mathematical Science, Heilongjiang University, Harbin 150080, China
* Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2019, 20(15), 3648; https://doi.org/10.3390/ijms20153648
Submission received: 11 June 2019 / Revised: 17 July 2019 / Accepted: 18 July 2019 / Published: 25 July 2019
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2019)

Abstract

Identifying disease-associated miRNAs (disease miRNAs) is critical for understanding disease etiology and pathogenesis. Most previous methods focus on integrating the similarity and association information contained in heterogeneous miRNA-disease networks. However, these methods build only shallow prediction models that fail to capture the complex relationships among miRNA similarities, disease similarities, and miRNA-disease associations. We propose CNNMDA, a method based on network representation learning and convolutional neural networks for predicting disease miRNAs. CNNMDA deeply integrates the similarity information of miRNAs and diseases, miRNA-disease associations, and representations of miRNAs and diseases in a low-dimensional feature space. A new deep learning framework was built to learn the original and global representations of a miRNA-disease pair. First, diverse biological premises about miRNAs and diseases were combined to construct the embedding layer in the left part of the framework. Second, because the various connection edges in the miRNA-disease network, such as similarity and association edges, depend on each other, the low-dimensional representations of the miRNA and disease nodes must be learned from the entire network. The right part of the framework therefore learns the low-dimensional representation of each miRNA and disease node using non-negative matrix factorization, and these representations are used to build the corresponding embedding layer. Finally, the left and right embedding layers pass through convolutional modules that learn the complex, non-linear relationships among the similarities and associations between miRNAs and diseases. Cross-validation results indicate that CNNMDA outperforms several state-of-the-art methods. Furthermore, case studies on lung, breast, and pancreatic neoplasms demonstrate the ability of CNNMDA to discover potential disease miRNAs.

1. Introduction

MicroRNAs (miRNAs) are a class of endogenous small RNAs approximately 20–24 nucleotides in length that regulate gene expression post-transcriptionally in plants and animals [1,2,3]. Accumulating studies indicate that miRNAs are closely involved in the development of human diseases [4,5,6,7]. It is therefore important to identify potential disease-associated miRNAs (disease miRNAs) in order to understand disease etiology and pathogenesis.
Predicting disease miRNAs can provide reliable candidates for experimental research, and several prediction methods have been proposed. Mainstream methods fall roughly into two categories. The first category uses the regulatory relationships between miRNAs and their target mRNAs to predict potential miRNA-disease associations [8]. These methods first obtain miRNA target genes by analyzing base complementarity between the miRNA sequence and putative target gene sequences, and then predict disease miRNAs from the interactions between these target genes and known disease-related genes [9,10,11,12]. However, such methods are limited because few experimentally validated targets have been described to date. Although additional target genes have been obtained experimentally [13,14], the predictions of these methods still have a high false positive rate.
Methods in the second category rely on the prior biological knowledge that miRNAs with similar functions are usually associated with similar diseases [15]. Network medicine has become the mainstream way of defining related diseases [16,17,18], and some methods make full use of network topology to identify disease miRNAs [19,20]. Other methods identify disease miRNAs by random walks on a single miRNA similarity network [21,22]. However, these methods rely heavily on known disease-associated miRNAs and are ineffective for new diseases that lack associated miRNAs. To address this drawback, disease similarity information and miRNA-disease associations were introduced to form miRNA-disease heterogeneous networks, and random walks on the resulting two-layer network were used to predict candidate miRNA-disease associations [23,24]. Still other methods calculate miRNA-disease association scores using non-negative matrix factorization [25,26,27,28,29], structural perturbation [30], transductive learning [31], inductive matrix completion [32], bipartite network projection [33], or latent features extracted from positive-sample information [34]. However, the relationships among miRNA-miRNA, disease-disease, and miRNA-disease connections are complex and non-linear, and all previous methods struggle to capture them.
In this study, we present CNNMDA, a new approach based on convolutional neural networks for predicting miRNA-disease associations. CNNMDA consists of a left part and a right part. The left part deeply integrates miRNA similarities, disease similarities, and miRNA-disease associations, and uses this prior biological knowledge to construct the left embedding layer of a miRNA-disease node pair. The right part uses network representation learning to obtain low-dimensional representations of the network nodes while preserving the network topology. Integrating the low-dimensional features of miRNAs and diseases helps estimate the likelihood of association between them at the global network level. We construct a deep learning framework based on convolutional neural networks (CNN) for both parts, which learns the original and global representations of miRNA-disease node pairs. For several high-frequency diseases, CNNMDA predicts the associated miRNAs with high accuracy. Moreover, case studies on three diseases indicate that CNNMDA is able to discover potential disease-associated miRNAs.

2. Results and Discussion

2.1. Evaluation Metrics

To evaluate the performance of our prediction model, we performed 5-fold cross-validation on CNNMDA. In the miRNA-disease association data set, known miRNA-disease associations are treated as positive samples and unknown associations as negative samples. First, all positive samples were extracted and randomly divided into five subsets. Next, the same number of negative samples as positive samples was extracted, and these negative samples were also randomly divided into five subsets. In each round of cross-validation, four of the five positive subsets and four of the five negative subsets were used to train the prediction model, and the remaining positive subset and negative subset were used as test data to evaluate the prediction performance.
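The fold construction described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: the function name `balanced_five_fold` is hypothetical, negatives are drawn uniformly at random from the unknown pairs, and the sketch only produces index pairs (model training is left out).

```python
import numpy as np

def balanced_five_fold(A, n_folds=5, seed=0):
    """Split known associations (A == 1) and an equal number of randomly drawn
    unknown pairs (A == 0) into n_folds folds.

    Yields (train_pos, train_neg, test_pos, test_neg); each is an array whose
    rows are (miRNA index, disease index) pairs.
    """
    rng = np.random.default_rng(seed)
    pos = np.argwhere(A == 1)
    neg = np.argwhere(A == 0)
    if len(neg) < len(pos):
        raise ValueError("not enough unknown pairs to balance the positives")
    pos = pos[rng.permutation(len(pos))]                           # shuffle positives
    neg = neg[rng.choice(len(neg), size=len(pos), replace=False)]  # balanced random negatives
    pos_folds = np.array_split(pos, n_folds)
    neg_folds = np.array_split(neg, n_folds)
    for f in range(n_folds):
        # Four subsets of each kind for training, the remaining one for testing.
        train_pos = np.concatenate([pos_folds[g] for g in range(n_folds) if g != f])
        train_neg = np.concatenate([neg_folds[g] for g in range(n_folds) if g != f])
        yield train_pos, train_neg, pos_folds[f], neg_folds[f]
```

A common precaution, not described above, is to set the held-out positive pairs to 0 in the association matrix before building the features of each fold, so that test labels do not leak into training.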
Given a threshold τ, a sample is predicted to be positive when its prediction score is higher than τ and negative otherwise. The true positive rate (TPR) and false positive rate (FPR) are then calculated as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{TN + FP}, \tag{1}$$

where TP and TN are the numbers of positive and negative samples that are classified correctly, respectively; FN is the number of positive samples misclassified as negative, and FP is the number of negative samples misclassified as positive. Different thresholds yield different TPRs and FPRs, which can be plotted as a receiver operating characteristic (ROC) curve; the area under the ROC curve (AUC) is then used as a criterion for evaluating prediction performance.
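The threshold sweep described above can be illustrated with a small NumPy sketch. It assumes one score and one binary label per test pair, uses every distinct score as a threshold τ (a pair is called positive when its score is higher than τ), and integrates the ROC curve with the trapezoidal rule; the function names are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """FPR and TPR at every threshold tau taken from the distinct scores.

    A pair is predicted positive when its score is higher than tau; the last
    threshold (-inf) predicts every pair positive.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    taus = np.concatenate((np.unique(scores)[::-1], [-np.inf]))  # highest threshold first
    fpr, tpr = [], []
    for tau in taus:
        pred = scores > tau
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tpr.append(tp / n_pos)   # TP / (TP + FN)
        fpr.append(fp / n_neg)   # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

def auc(x, y):
    """Trapezoidal area under the curve through points (x, y), x non-decreasing."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))
```

For example, `auc(*roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))` is 1.0, since every positive outranks every negative.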
Only a small fraction of the data consists of known miRNA-disease associations (positive samples): they account for about 1/31 of all miRNA-disease pairs, so positive and negative samples are severely imbalanced. In this case, the precision-recall (PR) curve usually reflects more information than the ROC curve [35,36]. Precision is the proportion of samples predicted as positive that are truly positive, and recall is the proportion of all positive samples that are correctly predicted as positive:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{2}$$

Similarly, precision and recall are calculated at different thresholds, from which the PR curve is plotted and the area under the precision-recall curve (AUPR) is calculated to evaluate the prediction performance of the model. In addition, because biologists usually select the top-ranked predictions for experimental validation, we also calculated the average recall over 15 diseases in the top $k$ candidates, $k \in \{30, 60, 90, \ldots, 240\}$, as another evaluation measure.
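Precision, recall, AUPR, and recall in the top k can be computed in the same style. The sketch below rests on several assumptions of ours: thresholds are placed at each distinct score (predicting positive when the score is at least τ, which gives the same cut points as "higher than" a slightly smaller τ), the PR curve is anchored at recall 0 with the first observed precision, AUPR is computed with the trapezoidal rule (other conventions, such as average precision, exist), and recall in the top k is computed per disease over that disease's ranked test candidates. All function names are hypothetical.

```python
import numpy as np

def pr_points(scores, labels):
    """Recall and precision at each distinct score used as threshold (score >= tau)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    recall, precision = [], []
    for tau in np.unique(scores)[::-1]:        # highest threshold first
        pred = scores >= tau                   # at least one sample predicted positive
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        precision.append(tp / (tp + fp))       # TP / (TP + FP)
        recall.append(tp / n_pos)              # TP / (TP + FN)
    # Anchor the curve at recall 0 using the first precision value.
    return np.array([0.0] + recall), np.array([precision[0]] + precision)

def aupr(recall, precision):
    """Trapezoidal area under the PR curve (recall is non-decreasing)."""
    return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2))

def recall_at_k(scores, labels, k):
    """Fraction of one disease's positive test pairs ranked within the top k."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    top = np.argsort(-scores, kind="stable")[:k]
    return labels[top].sum() / labels.sum()
```

The cut-offs {30, 60, ..., 240} correspond to `range(30, 241, 30)`; averaging `recall_at_k` over the diseases at each cut-off gives an average recall of the kind reported in this section.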

2.2. Comparison with Other Methods

To evaluate the prediction performance of CNNMDA, we compared it with several state-of-the-art methods: DMPred [29], GSTRW [37], BNPMDA [33], and Liu’s method [23]. For each compared method, the parameters were set to the values given in its original article so as to achieve its best performance. In CNNMDA, the parameters $w_l$, $w_f$, and $w_p$ of the convolution and pooling operations were set to 3, 5, and 2, respectively; thus, the convolution sliding window is $J \in \mathbb{R}^{3 \times 5}$ and the pooling sliding window is $F \in \mathbb{R}^{1 \times 2}$. The number of filters was set to 30 ($n_{conv} = 30$). The parameters $\alpha$, $\beta$, $\lambda_m$, and $\lambda_d$ used in the matrix factorization were each selected from the set {0.2, 0.5, 0.8, 1, 2, 5, 8} by cross-validation, and CNNMDA achieved the best performance with $\alpha = \beta = \lambda_m = \lambda_d = 0.2$. In addition, the parameter $\lambda$ in the formula combining the left and right parts was set to 0.4.
As shown in Figure 1A and Table 1, CNNMDA achieved the best average performance over the 15 diseases (AUC = 0.968). DMPred performed second best, with an AUC of 0.918, 5.0 percentage points lower than CNNMDA. The AUCs of BNPMDA and Liu’s method were 0.838 and 0.870, 13.0 and 9.8 percentage points lower than CNNMDA, respectively. GSTRW performed worst, with an AUC of only 0.816, 15.2 percentage points lower than CNNMDA; its poor performance arises because it uses only miRNA and disease similarity information. Liu’s method and BNPMDA fully capture the information in the network topology, and DMPred improves performance by integrating multiple sources of information. CNNMDA, which deeply learns the original and global representations of miRNA-disease node pairs, achieved the best prediction performance overall and also obtained the best AUC for every individual disease.
As shown in Figure 1B and Table 2, we also computed the average AUPR of all methods over the 15 diseases and plotted the corresponding PR curves. The average AUPR of CNNMDA was markedly higher than those of the other methods: compared with GSTRW, BNPMDA, Liu’s method, and DMPred, CNNMDA improved the AUPR by 43.9%, 28.9%, 27.7%, and 24%, respectively. Moreover, CNNMDA achieved the best AUPR for 13 of the 15 diseases.
To further verify that CNNMDA outperforms the other methods, we applied paired t-tests. The p-values of all paired t-tests were less than 0.05 (Table 3), indicating that CNNMDA performs significantly better than the other methods.
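The significance test above can be illustrated with a paired t statistic on per-disease scores. The sketch below is an assumption on our part: it pairs the two methods' scores disease by disease, computes the statistic with the Python standard library only, and compares it with the two-sided 5% critical value of Student's t for 14 degrees of freedom (15 diseases), about 2.145, taken from standard t tables. If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the statistic together with an exact p-value. The function names are hypothetical.

```python
import math
import statistics

# Two-sided critical value of Student's t at alpha = 0.05 with 14 degrees of
# freedom (15 paired diseases), from standard t tables.
T_CRIT_DF14 = 2.145

def paired_t(a, b):
    """Paired t statistic for matched per-disease scores a (e.g. CNNMDA) and b (a baseline)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(d)            # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return statistics.mean(d) / (sd / math.sqrt(len(d)))

def significantly_better_15(a, b):
    """True if a exceeds b at the two-sided 5% level; assumes exactly 15 pairs."""
    if len(a) != 15:
        raise ValueError("the hard-coded critical value assumes 15 pairs")
    return paired_t(a, b) > T_CRIT_DF14
```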
A higher recall means that more positive samples are identified in the top k candidate list, which is further evidence of a model’s prediction performance. We therefore calculated the average recall of all methods over the 15 diseases (Figure 2). CNNMDA achieved the highest average recall at every threshold, reaching 0.712 in the top 30, 0.921 in the top 60, and 0.980 in the top 90. DMPred had the second-best recall at all thresholds, reaching 0.512 in the top 30, 0.726 in the top 60, and 0.860 in the top 90. The recalls of BNPMDA and Liu’s method were very close: the average recalls in the top 30, top 60, and top 90 were 0.459, 0.645, and 0.753 for BNPMDA and 0.411, 0.641, and 0.763 for Liu’s method, respectively. In contrast, GSTRW performed poorly, with recalls of 0.191, 0.469, and 0.661 in the top 30, top 60, and top 90, respectively.

2.3. Case Studies of Lung Neoplasms, Breast Neoplasms, and Pancreatic Neoplasms

To demonstrate CNNMDA’s ability to discover potential disease miRNAs, we applied it in case studies of lung, breast, and pancreatic neoplasms. Because of space limitations, and because lung cancer is currently one of the most frequent cancers, we analyze the candidates for lung neoplasms in detail and list the top 50 candidate miRNAs (Table 4). For the other two diseases, we briefly analyze the top 50 candidates, which are listed in Supplementary Tables S1 and S2, respectively. To ensure the reliability of the prediction results, we first verified the predictions against four public databases: dbDEMC [38], PhenomiR [39], miRCancer [40], and TCGA [41]. dbDEMC collects miRNAs with abnormal expression in different cancers; miRNAs whose expression levels differ significantly between cancer and normal tissues were retrieved and statistically analyzed with the “Significance Analysis of Microarrays” method. Similarly, PhenomiR contains dysregulated miRNAs associated with diseases. miRCancer provides a comprehensive collection of miRNA expression profiles in a variety of human cancers, automatically extracted from the published literature. TCGA has sequenced the genomes of a range of tumor types, covering at least 6000 candidate genes and microRNA sequences, and stores their genomic characterization and sequence analysis. Of the top 50 lung neoplasm candidates, 43 are included in dbDEMC and 32 in PhenomiR, indicating that they have been confirmed to be upregulated or downregulated in lung neoplasms. In addition, 10 candidates are included in miRCancer, further confirming their associations with the disease, and 7 are contained in TCGA, indicating that their expression levels differ between cancer and normal tissues.
The remaining 7 candidates were verified by the literature: 5 of these miRNAs have been confirmed to be dysregulated in lung cancer tissue compared with normal tissue [42,43,44,45,46]. miR-15a is involved in the regulation of non-small cell lung cancer and controls cell cycle progression in a synergistic, Rb-dependent manner [47], and miR-374a has been shown to have different effects at different stages of lung cancer [48].
Among the top 50 candidates for breast neoplasms (Supplementary Table S1), 46 and 33 are included in dbDEMC and PhenomiR, respectively, indicating that their expression levels differ significantly between breast tumors and normal tissues. miRCancer contains 22 of the candidates, indicating their associations with breast neoplasms, and 3 candidates are confirmed by TCGA, demonstrating their different expression levels in different biological states. The remaining 3 candidates were verified by the literature. miR-142 is upregulated in human breast cancer stem cells (BCSCs) compared with non-tumorigenic breast cancer cells [49]. miR-542 can be used to predict the prognosis of breast cancer patients based on the mRNA expression of its target genes lymphocyte antigen 9 (LY9) and secreted frizzled-related protein 1 (SFRP1) [50]. miR-30e has been identified as an independent subtype-specific prognostic marker in breast cancer [51].
The top 50 pancreatic neoplasm candidates are listed in Supplementary Table S2; 45 and 34 of them are contained in dbDEMC and PhenomiR, respectively. miRCancer contains 19 candidates that are known to be associated with the disease, and TCGA contains 3. The remaining five candidates were confirmed by the literature, which reports their regulatory effects on pancreatic tumors [52,53]. Moreover, the downregulation of UNC-51-like kinase 1 (ULK1) by miR-372 inhibits the survival of human pancreatic cancer cells [54], and miR-483 promotes cell proliferation by downregulating its target gene Smad4 in pancreatic ductal adenocarcinoma (PDAC) cells [55]. These three case studies demonstrate the strong performance of CNNMDA in discovering potential disease-associated miRNAs.
Functional enrichment analysis helps in understanding the functions of disease-related miRNAs. Several tools [56,57,58] can be used to analyze the association between the functions of potential disease-associated miRNAs and disease progression. Among them, TAM [57] is a convenient online tool (http://cmbi.bjmu.edu.cn/tam) that groups miRNAs into sets according to various rules and provides investigators with the potential biological functions of a list of miRNAs. We performed functional enrichment analysis on the top 50 predicted disease-related miRNAs using TAM. Here, we focus on the candidate miRNAs related to lung neoplasms (Figure 3); the enrichment results for breast and pancreatic neoplasms are given in Supplementary Figures S1 and S2, respectively. Among the top 50 candidate miRNAs for lung neoplasms, 12 are involved in cell cycle-related functions, 13 in the regulation of human embryonic stem cells, and 9 in apoptosis. In addition, 7, 7, and 6 miRNAs are related to cell proliferation, hormone regulation, and immune response, respectively. All of these functions have been confirmed to be closely related to the development of diseases. For instance, numerous studies have confirmed that cell cycle changes are closely related to cancer: disruption of the normal cell cycle can cause some cells to divide abnormally and eventually lead to cancer [59,60]. Specifically, cell cycle regulators have been confirmed to play an important role in lung neoplasms [61]. As for human embryonic stem cell regulation, some studies indicate that it may be involved in the origin of some solid tumors, including lung, stomach, and breast neoplasms [62,63].
Moreover, lung cancer metastasis may result from the dysregulation of certain hormones in the human body [64], and senescence of the immune system is a possible cause of lung cancer [65]. The other enriched functions associated with many miRNAs, such as apoptosis and cell proliferation, are also related to the occurrence and development of diseases [66]. This analysis provides insight into the putative roles of these candidates in lung neoplasms.
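Enrichment tools of this kind typically ask whether a function is over-represented among the candidate miRNAs relative to a background set. The following sketch shows one standard way to do this, a one-sided hypergeometric test written with the Python standard library. We assume, without confirming it for TAM, that a test of this kind underlies such analyses; the function name and the background size and annotation counts in the example are hypothetical.

```python
from math import comb

def hypergeom_sf(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background miRNAs, K: background miRNAs annotated with the function,
    n: candidate miRNAs, k: candidates annotated with the function.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid population sizes")
    k = max(k, 0)
    # math.comb returns 0 when the lower index exceeds the upper one.
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)

# Hypothetical example: 500 background miRNAs, 60 annotated with "cell cycle",
# 50 candidates of which 12 carry that annotation.
p_value = hypergeom_sf(500, 60, 50, 12)
```

A small `p_value` would suggest that the function is enriched among the candidates beyond what random selection from the background would produce.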

3. Materials and Methods

3.1. Dataset

We obtained miRNA-disease association data from the Human microRNA Disease Database (HMDD v2.0) [67], which has collected thousands of experimentally verified miRNA-disease associations. The dataset used in this study contains 492 miRNAs, 329 diseases, and 5218 known associations between them. The disease terms we used were derived from the U.S. National Library of Medicine. The phenotypic and semantic similarities between diseases were extracted from related literature [68].

3.2. Representation of miRNA and Disease Heterogeneous Data

3.2.1. MiRNA Similarity Measure

miRNAs with similar functions are likely to be associated with similar diseases. Most existing miRNA similarity data are obtained by calculating the similarity between the sets of diseases with which the miRNAs are associated. For example, suppose miRNA $m_1$ is associated with diseases $d_2$, $d_3$, and $d_4$, and miRNA $m_2$ is associated with diseases $d_1$, $d_3$, and $d_4$. The similarity between the disease sets $\{d_2, d_3, d_4\}$ and $\{d_1, d_3, d_4\}$ is then taken as the similarity between $m_1$ and $m_2$ [69], denoted $M_{12}$. The miRNA similarities used in this study were calculated in this way. The similarities among $N_m$ miRNAs are represented by the matrix $M = [M_{ij}] \in \mathbb{R}^{N_m \times N_m}$, whose entries lie between 0 and 1.
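The calculation of such a similarity from two disease sets can be illustrated with the best-match-average formulation commonly used for miRNA functional similarity. We assume this form for illustration only; the description above does not give the formula of [69]. The sketch takes a disease similarity matrix and, for each miRNA, the indices of its associated diseases; the function name is hypothetical.

```python
import numpy as np

def mirna_functional_similarity(disease_sets, DS):
    """Best-match-average similarity between the disease sets of every miRNA pair.

    disease_sets[i]: list of (0-based) disease indices associated with miRNA i.
    DS: (Nd, Nd) disease similarity matrix with entries in [0, 1].
    Returns an (Nm, Nm) matrix with entries in [0, 1].
    """
    n = len(disease_sets)
    M = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            Da, Db = disease_sets[a], disease_sets[b]
            if len(Da) == 0 or len(Db) == 0:
                continue                          # no associated diseases: similarity 0
            sub = DS[np.ix_(Da, Db)]              # similarities between the two sets
            # Best match of each disease in Da within Db, and vice versa.
            best = sub.max(axis=1).sum() + sub.max(axis=0).sum()
            M[a, b] = best / (len(Da) + len(Db))
    return M
```

With the example above mapped to 0-based indices, `m_1` has disease set `[1, 2, 3]` and `m_2` has `[0, 2, 3]`; if `d_1` and `d_2` have similarity 0.5 and all other distinct diseases have similarity 0, the sketch gives $M_{12} = 5/6$.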

3.2.2. Disease Similarity Measure

The similarity between two diseases can be judged from their semantics and phenotypes: in general, the more semantic terms and phenotypes two diseases share, the more likely they are to be similar. Accordingly, previous work calculated disease similarity from the phenotypic and semantic information of diseases [29], and the disease similarities used in this study were obtained using Xuan’s method. The similarities among $N_d$ diseases are represented by the matrix $D = [D_{ij}] \in \mathbb{R}^{N_d \times N_d}$, whose entries also lie between 0 and 1.

3.2.3. miRNA-Disease Associations

We use the matrix $A \in \mathbb{R}^{N_m \times N_d}$ to represent the associations between $N_m$ miRNAs and $N_d$ diseases. If miRNA $m_i$ is known to be associated with disease $d_j$, then $A_{ij} = 1$; otherwise, $A_{ij} = 0$, indicating that their association has not yet been observed.

3.3. Prediction Model Based on Network Representation Learning and Dual CNN

Here, we develop a novel prediction method based on network representation learning and dual CNNs to infer potential miRNA-disease associations. The prediction model is divided into a left part and a right part (Figure 4). The left part learns the association representation between a miRNA $m_i$ and a disease $d_j$ from their original feature information. The right part projects all miRNA and disease nodes into a low-dimensional space, integrating their global information to obtain representative low-dimensional features of $m_i$ and $d_j$. Each part uses CNN layers to deeply learn a representation: the node-level representation on the left and the global-level representation on the right. Each side then produces a prediction score for $m_i$ and $d_j$ through a fully connected layer. Finally, the two scores are integrated into the final prediction score for $m_i$ and $d_j$.

3.3.1. Embedding Layer on the Left

The left part integrates the original feature information of a miRNA-disease pair, based on the premise that miRNAs with similar functions may be associated with similar diseases, and vice versa. We therefore combine miRNA similarities, disease similarities, and the associations between them to form the feature representation of the left part. As an example, Figure 5 illustrates the integration process for miRNA $m_1$ and disease $d_5$. The first row of $M$, denoted $M_1$, contains the similarities between miRNA $m_1$ and all miRNAs. The fifth row of $A^T$, denoted $A_5^T$, contains the associations of disease $d_5$ with all miRNAs. In this example, $m_1$ is similar to $m_3$, $m_5$, and $m_6$, and $d_5$ has known associations with $m_3$ and $m_5$; thus $m_1$ and $d_5$ are likely to be associated, because both are related to $m_3$ and $m_5$. Similarly, we integrate the first row of $A$ ($A_1$) with the fifth row of $D$ ($D_5$). $m_1$ is known to be associated with $d_1$, $d_3$, and $d_4$, and $d_5$ is similar to $d_1$ and $d_3$; since both $m_1$ and $d_5$ are related to $d_1$ and $d_3$, they may be associated with each other. Finally, $M_1$, $A_1$, $D_5$, and $A_5^T$ are integrated to form the feature matrix $B \in \mathbb{R}^{2 \times (N_m + N_d)}$.
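Assembling $B$ for a pair can be sketched with NumPy as follows. The row layout is our assumption, chosen so that vectors indexed by the same entities line up: the first row concatenates $M_i$ and $A_i$ (miRNA $m_i$'s similarities to all miRNAs, then its associations with all diseases), and the second row concatenates $A_j^T$ and $D_j$ (disease $d_j$'s associations with all miRNAs, then its similarities to all diseases). The exact order used in Figure 5 may differ, and `left_embedding` is a hypothetical name.

```python
import numpy as np

def left_embedding(M, D, A, i, j):
    """Return B in R^{2 x (Nm + Nd)} for miRNA index i and disease index j (0-based).

    M: (Nm, Nm) miRNA similarities, D: (Nd, Nd) disease similarities,
    A: (Nm, Nd) binary miRNA-disease associations.
    """
    Nm, Nd = A.shape
    if M.shape != (Nm, Nm) or D.shape != (Nd, Nd):
        raise ValueError("shape mismatch between M, D and A")
    row_mirna = np.concatenate([M[i, :], A[i, :]])     # M_i | A_i
    row_disease = np.concatenate([A[:, j], D[j, :]])   # (A^T)_j | D_j
    return np.stack([row_mirna, row_disease])
```

For the pair in the example, `left_embedding(M, D, A, 0, 4)` builds the matrix for $m_1$ and $d_5$, since Python indices start at 0.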

3.3.2. Embedding Layer on the Right

In the right part, miRNAs and diseases are projected into a k-dimensional space to obtain representative low-dimensional features of miRNA-disease pairs that integrate their global information. Non-negative matrix factorization (NMF) is an effective way to obtain low-dimensional representations and is widely used in data representation [70,71]. It seeks two non-negative matrices whose product approximates the original matrix. Specifically, each row of the miRNA similarity matrix $M \in \mathbb{R}^{N_m \times N_m}$ can be regarded as the feature vector of a single miRNA, and we seek non-negative matrices $W \in \mathbb{R}^{N_m \times k}$ and $X \in \mathbb{R}^{N_m \times k}$ such that $M \approx W X^T$. This gives the following objective:

$$\min_{W \ge 0,\, X \ge 0} \; \lVert M - W X^T \rVert_F^2, \tag{3}$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm of a matrix, $X$ is the low-dimensional feature matrix of miRNAs, $W$ is the basis matrix, which plays the role of a parameter matrix, and $k$ is the target dimension.
Similarly, we project disease information into the k-dimensional space: for the disease similarity matrix $D \in \mathbb{R}^{N_d \times N_d}$, we compute matrices $V \in \mathbb{R}^{N_d \times k}$ and $Y \in \mathbb{R}^{N_d \times k}$ such that $D \approx V Y^T$. Combining this with Equation (3) gives the following objective function:

$$\min_{W, X, V, Y \ge 0} \; \lVert M - W X^T \rVert_F^2 + \alpha \lVert D - V Y^T \rVert_F^2, \tag{4}$$

where $\alpha$ is a parameter that controls the contribution of the second term, $Y$ is the low-dimensional feature matrix of diseases, and $V$ is a basis matrix.
The $i$-th row of the feature matrix $X$, the row vector $x_i$, represents the k-dimensional features of miRNA $m_i$. Similarly, the $j$-th row of the feature matrix $Y$, the row vector $y_j$, represents the k-dimensional features of disease $d_j$. If the k-dimensional features of $m_i$ and $d_j$ are largely consistent, there may be a potential link between them. Their association score is estimated as $x_i y_j^T = (X Y^T)_{ij}$, which should be close to $A_{ij}$, the known association between $m_i$ and $d_j$. We therefore extend the objective function to:

$$\min_{W, X, V, Y \ge 0} \; \lVert M - W X^T \rVert_F^2 + \alpha \lVert D - V Y^T \rVert_F^2 + \beta \lVert A - X Y^T \rVert_F^2, \tag{5}$$

where $\beta$ is a parameter that adjusts the contribution of the third term.
In addition, if miRNA $m_i$ is similar to miRNA $m_j$, $m_i$ is likely to be related to other miRNAs that have relatively high similarity scores with $m_j$. To preserve this network topology, we introduce a graph regularization term: if two miRNAs (or diseases) are close in the original feature space, they should remain close after dimensionality reduction. Before doing so, we need to construct graph models for the miRNA and disease feature matrices.
For the miRNA feature matrix, a graph model $S^m$ is constructed with elements

$$S^m_{ij} = \begin{cases} 1, & \text{if } m_i \text{ is among the } k \text{ nearest neighbors of } m_j, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $m_i$ and $m_j$ denote the $i$-th and $j$-th miRNAs, respectively. The similarity scores are taken from matrix $M$: the similarity scores between $m_i$ and the other miRNAs are sorted to determine whether $m_j$ is among the k nearest neighbors of $m_i$.
For the disease feature matrix, a corresponding graph model $S^d$ is constructed:

$$S^d_{pq} = \begin{cases} 1, & \text{if } d_p \text{ is among the } k \text{ nearest neighbors of } d_q, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$

where $d_p$ and $d_q$ denote diseases $p$ and $q$, and the similarity between $d_p$ and $d_q$ is obtained from matrix $D$.
The graph regularization terms for miRNAs and diseases are defined as

$$\frac{1}{2} \sum_{i,j=1}^{N_m} \lVert x_i - x_j \rVert^2 S^m_{ij} = \mathrm{tr}(X^T L_m X), \tag{8}$$

$$\frac{1}{2} \sum_{p,q=1}^{N_d} \lVert y_p - y_q \rVert^2 S^d_{pq} = \mathrm{tr}(Y^T L_d Y), \tag{9}$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $x_i$ is the $i$-th row of $X$, and $y_p$ is the $p$-th row of $Y$. $L_m = D_m - S^m$ and $L_d = D_d - S^d$ are the graph Laplacian matrices of $S^m$ and $S^d$, respectively, where $D_m$ and $D_d$ are diagonal matrices with $D_m(i,i) = \sum_{j=1}^{N_m} S^m(i,j)$ and $D_d(p,p) = \sum_{q=1}^{N_d} S^d(p,q)$ (the identities hold when $S^m$ and $S^d$ are symmetric). Adding the graph regularization terms to the objective function gives:
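$$\min_{W, X, V, Y \ge 0} \; \lVert M - W X^T \rVert_F^2 + \alpha \lVert D - V Y^T \rVert_F^2 + \beta \lVert A - X Y^T \rVert_F^2 + \lambda_m \mathrm{tr}(X^T L_m X) + \lambda_d \mathrm{tr}(Y^T L_d Y), \tag{10}$$

where $\lambda_m$ and $\lambda_d$ are parameters that adjust the regularization terms.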
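The neighbor graphs and graph Laplacians defined above can be sketched with NumPy. Several details are our assumptions: a node is never its own neighbor, the resulting 0/1 graph is symmetrized with an element-wise maximum (the trace identities for the regularization terms require a symmetric graph), and the neighbor count is called `n_neighbors` to keep it apart from the embedding dimension k. Both function names are hypothetical.

```python
import numpy as np

def knn_graph(S, n_neighbors):
    """0/1 neighbour graph from a similarity matrix S.

    G[i, j] = 1 if j is among the n_neighbors most similar nodes to i (self excluded);
    the result is symmetrised with an element-wise maximum.
    """
    n = S.shape[0]
    if not 0 < n_neighbors < n:
        raise ValueError("n_neighbors must be between 1 and n - 1")
    sim = np.array(S, dtype=float)
    np.fill_diagonal(sim, -np.inf)                         # a node is not its own neighbour
    nearest = np.argsort(-sim, axis=1)[:, :n_neighbors]    # most similar first
    G = np.zeros((n, n))
    G[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    return np.maximum(G, G.T)

def graph_laplacian(G):
    """Degree matrix Dg (row sums on the diagonal) and Laplacian L = Dg - G."""
    Dg = np.diag(G.sum(axis=1))
    return Dg, Dg - G
```

For a feature matrix `X`, `np.trace(X.T @ L @ X)` then equals half the neighbor-weighted sum of squared distances between rows of `X`, which is the quantity the regularization terms penalize.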
Because the objective function in Equation (10) is not jointly convex in $W$, $X$, $V$, and $Y$, finding a global optimum is impractical. Instead, we find a local minimum by iteratively updating one matrix while keeping the others fixed, for example, updating $X$ with $W$, $Y$, and $V$ fixed. To enforce the non-negativity constraints ($w_{ij} \ge 0$, $x_{ij} \ge 0$, $v_{pq} \ge 0$, $y_{pq} \ge 0$), we add the corresponding Lagrangian terms. Finally, using $\lVert Z \rVert_F^2 = \mathrm{tr}(Z Z^T)$, the objective function $\mathcal{L}$ can be expressed as:
$$\mathcal{L} = \mathrm{tr}(M M^T - W X^T M^T - M X W^T + W X^T X W^T) + \alpha\, \mathrm{tr}(D D^T - V Y^T D^T - D Y V^T + V Y^T Y V^T) + \beta\, \mathrm{tr}(A A^T - X Y^T A^T - A Y X^T + X Y^T Y X^T) + \lambda_m \mathrm{tr}(X^T L_m X) + \lambda_d \mathrm{tr}(Y^T L_d Y) + \mathrm{tr}(\delta W^T) + \mathrm{tr}(\mu X^T) + \mathrm{tr}(\phi V^T) + \mathrm{tr}(\theta Y^T), \tag{11}$$

where $\delta$, $\mu$, $\phi$, and $\theta$ are the Lagrange multipliers. The partial derivatives of $\mathcal{L}$ with respect to $X$, $W$, $V$, and $Y$ are:
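$$\frac{\partial \mathcal{L}}{\partial X} = -2 M^T W + 2 X W^T W - 2\beta A Y + 2\beta X Y^T Y + 2\lambda_m L_m X + \mu, \tag{12}$$

$$\frac{\partial \mathcal{L}}{\partial W} = -2 M X + 2 W X^T X + \delta, \tag{13}$$

$$\frac{\partial \mathcal{L}}{\partial V} = -2\alpha D Y + 2\alpha V Y^T Y + \phi, \tag{14}$$

$$\frac{\partial \mathcal{L}}{\partial Y} = -2\alpha D^T V + 2\alpha Y V^T V - 2\beta A^T X + 2\beta Y X^T X + 2\lambda_d L_d Y + \theta. \tag{15}$$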
According to the Karush–Kuhn–Tucker (KKT) conditions [72], $\delta_{ij} w_{ij} = 0$, $\mu_{ij} x_{ij} = 0$, $\phi_{ij} v_{ij} = 0$, and $\theta_{ij} y_{ij} = 0$, which yield the following equations:

$$(-M^T W + X W^T W - \beta A Y + \beta X Y^T Y + \lambda_m L_m X)_{ij}\, x_{ij} = 0, \tag{16}$$

$$(-M X + W X^T X)_{ij}\, w_{ij} = 0, \tag{17}$$

$$(-\alpha D Y + \alpha V Y^T Y)_{ij}\, v_{ij} = 0, \tag{18}$$

$$(-\alpha D^T V + \alpha Y V^T V - \beta A^T X + \beta Y X^T X + \lambda_d L_d Y)_{ij}\, y_{ij} = 0. \tag{19}$$
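Finally, splitting $L_m = D_m - S^m$ and $L_d = D_d - S^d$ into their positive and negative parts, we obtain the following update rules (the factor $\alpha$ cancels in the update of $V$):

$$x_{ij} \leftarrow x_{ij} \frac{(M^T W + \beta A Y + \lambda_m S^m X)_{ij}}{(X W^T W + \beta X Y^T Y + \lambda_m D_m X)_{ij}}, \tag{20}$$

$$w_{ij} \leftarrow w_{ij} \frac{(M X)_{ij}}{(W X^T X)_{ij}}, \tag{21}$$

$$v_{ij} \leftarrow v_{ij} \frac{(D Y)_{ij}}{(V Y^T Y)_{ij}}, \tag{22}$$

$$y_{ij} \leftarrow y_{ij} \frac{(\alpha D^T V + \beta A^T X + \lambda_d S^d Y)_{ij}}{(\alpha Y V^T V + \beta Y X^T X + \lambda_d D_d Y)_{ij}}. \tag{23}$$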
We iteratively update $W$, $X$, $V$, and $Y$ with these rules until convergence. The first row of $X$, $x_1$, is the feature vector of miRNA $m_1$, and the fifth row of $Y$, $y_5$, is the feature vector of disease $d_5$. If the k-dimensional features of $m_1$ and $d_5$ are largely consistent, there may be a potential link between them. The vectors $x_1$ and $y_5$ are combined to form the global feature representation matrix $P \in \mathbb{R}^{2 \times k}$ (Figure 6).
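The multiplicative update rules above can be assembled into a small NumPy solver. This is a sketch under stated assumptions rather than the authors' implementation: the uniform random initialization, the update order, the fixed iteration count used in place of a convergence test, and the small constant `eps` that guards against division by zero are our choices, and all function names are hypothetical. $S^m$ and $S^d$ are expected to be symmetric 0/1 neighbor graphs, and the default weights follow the values reported in Section 2.2 ($\alpha = \beta = \lambda_m = \lambda_d = 0.2$).

```python
import numpy as np

def gnmf_objective(M, D, A, W, X, V, Y, Lm, Ld, alpha, beta, lam_m, lam_d):
    """Value of the graph-regularized objective being minimized."""
    def sq(Z):
        return float(np.sum(Z ** 2))                 # squared Frobenius norm
    return (sq(M - W @ X.T) + alpha * sq(D - V @ Y.T) + beta * sq(A - X @ Y.T)
            + lam_m * np.trace(X.T @ Lm @ X) + lam_d * np.trace(Y.T @ Ld @ Y))

def gnmf_factorize(M, D, A, Sm, Sd, k, alpha=0.2, beta=0.2, lam_m=0.2, lam_d=0.2,
                   n_iter=300, eps=1e-10, seed=0):
    """Return non-negative W, X (Nm x k) and V, Y (Nd x k) via multiplicative updates."""
    rng = np.random.default_rng(seed)
    Nm, Nd = A.shape
    W, X = rng.random((Nm, k)), rng.random((Nm, k))
    V, Y = rng.random((Nd, k)), rng.random((Nd, k))
    Dm, Dd = np.diag(Sm.sum(axis=1)), np.diag(Sd.sum(axis=1))   # degree matrices
    for _ in range(n_iter):
        W *= (M @ X) / (W @ X.T @ X + eps)
        V *= (D @ Y) / (V @ Y.T @ Y + eps)
        X *= ((M.T @ W + beta * A @ Y + lam_m * Sm @ X)
              / (X @ W.T @ W + beta * X @ Y.T @ Y + lam_m * Dm @ X + eps))
        Y *= ((alpha * D.T @ V + beta * A.T @ X + lam_d * Sd @ Y)
              / (alpha * Y @ V.T @ V + beta * Y @ X.T @ X + lam_d * Dd @ Y + eps))
    return W, X, V, Y

def global_pair_features(X, Y, i, j):
    """P in R^{2 x k}: stack the k-dimensional features of miRNA i and disease j."""
    return np.stack([X[i], Y[j]])
```

After factorization, `X @ Y.T` gives the matrix of association estimates $(X Y^T)_{ij}$, and `global_pair_features(X, Y, 0, 4)` forms $P$ for $m_1$ and $d_5$.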

3.3.3. Convolutional Module on the Left

The feature matrix B, which combines the similarity and association information of $m_1$ and $d_5$ (Figure 5), is fed into the CNN module to learn the original representation of the node pair $(m_1, d_5)$. In the convolutional layer, the filter size is set to $w_l \times w_f$ and the number of filters to $n_{conv}$, so the filters can be represented as $W_{conv} \in \mathbb{R}^{w_l \times w_f \times n_{conv}}$. The output of the convolution is $C_1 \in \mathbb{R}^{2 \times (N_m + N_d - w_f + 1) \times n_{conv}}$. The convolution process is given by the following formulas, in which the input is denoted X:
$$X_{conv,i,j} = \left(X(i,j,1), X(i,j,2), \ldots, X(i,j,w_f)\right), \quad X_{conv,i,j} \in \mathbb{R}^{w_l \times w_f},$$
$$C_1(i,j,t) = g\left(\sum_{r=1}^{w_l}\sum_{s=1}^{w_f} X_{conv,i,j}(r,s)\,W_{conv}(r,s,t) + b_{conv}(t)\right), \quad i \in [1,2],\ j \in [1, N_m + N_d - w_f + 1],\ t \in [1, n_{conv}],$$
where $X(i,j,1)$ is the first column vector of the sliding window when the filter is at the $j$-th position of the $i$-th row, and $C_1(i,j,t)$ is the convolution result when the $t$-th filter slides to the $j$-th position of the $i$-th row. $g$ is a nonlinear activation function and $b_{conv}$ is a bias vector. The stride is set to 1. In the pooling layer, max-pooling compresses the convolution result $C_1$ into the output $P_1 \in \mathbb{R}^{(N_m + N_d) \times n_{conv}}$:
$$P_1(i,p,t) = \max\left(C_1(i, w_p(p-1)+1, t), \ldots, C_1(i, w_p p, t)\right),$$
where $P_1(i,p,t)$ is the pooling result for the $p$-th position in the $i$-th row, and $w_p$ is the width of the pooling window. Next, $P_1$ is fed into the second convolutional layer, and the same convolution and pooling operations yield $H_1 \in \mathbb{R}^{\frac{1}{2} \times (N_m + N_d) \times 2n_{conv}}$. We then flatten $H_1$ into a column vector $c \in \mathbb{R}^{v \times 1}$, where $v = \frac{1}{2} \times (N_m + N_d) \times 2n_{conv}$. Finally, the fully connected layer $W_L$ and the softmax layer give the association prediction score between $m_1$ and $d_5$, $score_1 \in \mathbb{R}^{2 \times 1}$:
$$score_1 = W_L \times c.$$
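The single convolution and max-pooling step of the left module can be sketched in NumPy as below. This is an illustrative sketch, not the authors' code: ReLU for g, the toy sizes, random filters, and the choice $w_l = 1$ are assumptions. With $w_l = 1$, a valid stride-1 convolution keeps the two rows of B, which matches the stated shape of $C_1$; trailing columns that do not fill a complete pooling window are dropped.

```python
import numpy as np

def conv_layer(X, W_conv, b_conv):
    """Valid, stride-1 convolution (cross-correlation) of a 2-D input X with
    n_conv filters W_conv of shape (w_l, w_f, n_conv), followed by g = ReLU (assumed)."""
    w_l, w_f, n_conv = W_conv.shape
    C = np.empty((X.shape[0] - w_l + 1, X.shape[1] - w_f + 1, n_conv))
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            window = X[i:i + w_l, j:j + w_f]                 # X_conv,i,j
            for t in range(n_conv):
                C[i, j, t] = np.sum(window * W_conv[:, :, t]) + b_conv[t]
    return np.maximum(C, 0.0)

def max_pool(C, w_p):
    """Non-overlapping max pooling of width w_p along the columns of each row and channel."""
    rows, cols, n = C.shape
    n_pool = cols // w_p                                     # incomplete last window is dropped
    return C[:, :n_pool * w_p, :].reshape(rows, n_pool, w_p, n).max(axis=2)

rng = np.random.default_rng(0)
N_m, N_d = 10, 6                            # toy sizes
B = rng.random((2, N_m + N_d))              # toy left embedding: one row each for m_1 and d_5
w_l, w_f, n_conv, w_p = 1, 3, 4, 2          # hypothetical hyperparameters
W_conv = rng.standard_normal((w_l, w_f, n_conv))
b_conv = np.zeros(n_conv)
C1 = conv_layer(B, W_conv, b_conv)          # shape (2, N_m + N_d - w_f + 1, n_conv)
P1 = max_pool(C1, w_p)                      # shape (2, (N_m + N_d - w_f + 1) // w_p, n_conv)
```

The explicit loops follow the formulas index by index; a production implementation would use a deep learning framework's convolution and pooling layers instead.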

3.3.4. Convolutional Module on the Right

The right embedding layer, $P \in \mathbb{R}^{2 \times k}$, is used as the input for learning global information about miRNA $m_1$ and disease $d_5$ from their k-dimensional representations. Convolution and pooling on the right proceed in the same way as on the left and are defined as follows:
$$Y_{conv,i,j} = \left(Y(i,j,1), Y(i,j,2), \ldots, Y(i,j,w_f)\right), \quad Y_{conv,i,j} \in \mathbb{R}^{w_l \times w_f},$$
$$C_2(i,j,t) = g\left(\sum_{r=1}^{w_l}\sum_{s=1}^{w_f} Y_{conv,i,j}(r,s)\,W_{conv}(r,s,t) + b_{conv}(t)\right),$$
$$P_2(i,p,t) = \max\left(C_2(i, w_p(p-1)+1, t), \ldots, C_2(i, w_p p, t)\right),$$
where $Y$ denotes the values in the sliding window at each position, and $C_2$ is the output of the convolutional layer, which passes through the pooling layer to give $P_2$. $P_2$ is then fed into the next convolutional layer, and convolution and pooling yield $H_2 \in \mathbb{R}^{\frac{1}{2} \times k \times 2n_{conv}}$. $H_2$ is flattened into a column vector $o \in \mathbb{R}^{v \times 1}$, where $v = \frac{1}{2} \times k \times 2n_{conv}$. Finally, the fully connected layer $W_R$ and the softmax layer give the association prediction score between $m_1$ and $d_5$, $score_2 \in \mathbb{R}^{2 \times 1}$:
$$score_2 = W_R \times o.$$
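A compact sketch of the complete right-hand path (two convolution and pooling stages, flattening to o, and the fully connected layer $W_R$) is shown below. It is a hypothetical NumPy illustration: ReLU for g, valid (unpadded) convolutions, $w_l = 1$, $w_f = 3$, $w_p = 2$, the parameter dictionary layout, and the random weights are all assumptions. Because the flattened length depends on padding and pooling details that the text does not fully specify, the length of o produced here differs from the expression for v given above. The softmax is left to the loss computation.

```python
import numpy as np

def conv(X, W, b):
    """Valid, stride-1 multi-channel convolution followed by ReLU (assumed g).
    X: (rows, cols, c_in); W: (w_l, w_f, c_in, c_out); b: (c_out,)."""
    w_l, w_f, _, c_out = W.shape
    out = np.empty((X.shape[0] - w_l + 1, X.shape[1] - w_f + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = X[i:i + w_l, j:j + w_f, :]
            out[i, j] = np.tensordot(window, W, axes=3) + b   # sum over w_l, w_f, c_in
    return np.maximum(out, 0.0)

def max_pool(C, w_p):
    """Non-overlapping max pooling of width w_p along the column axis."""
    rows, cols, n = C.shape
    n_pool = cols // w_p
    return C[:, :n_pool * w_p, :].reshape(rows, n_pool, w_p, n).max(axis=2)

def right_module(P, params, w_p=2):
    """Conv + pool twice, flatten to the column vector o, then score_2 = W_R o."""
    P2 = max_pool(conv(P[:, :, None], params["W1"], params["b1"]), w_p)
    H2 = max_pool(conv(P2, params["W2"], params["b2"]), w_p)
    o = H2.reshape(-1, 1)
    return params["W_R"] @ o

rng = np.random.default_rng(0)
k, n_conv = 16, 4
P = rng.random((2, k))          # toy right embedding: rows x_1 and y_5
v = 2 * 2 * (2 * n_conv)        # flattened size of H2 for these toy sizes: rows x pooled cols x channels
params = {
    "W1": 0.1 * rng.standard_normal((1, 3, 1, n_conv)), "b1": np.zeros(n_conv),
    "W2": 0.1 * rng.standard_normal((1, 3, n_conv, 2 * n_conv)), "b2": np.zeros(2 * n_conv),
    "W_R": 0.1 * rng.standard_normal((2, v)),
}
score2 = right_module(P, params)   # shape (2, 1): (not associated, associated)
```

The second layer doubles the number of channels to $2n_{conv}$, matching the channel count stated for $H_2$.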

3.3.5. Combined Strategy

Because the two prediction scores for $m_1$ and $d_5$ are obtained from different perspectives, the two modules may perform differently. We therefore combine $score_1$ and $score_2$ into the final association score:
$$score = \lambda \times score_1 + (1 - \lambda) \times score_2,$$
where $\lambda \in (0, 1)$ is a parameter that weighs the contributions of $score_1$ and $score_2$. The left and right CNN modules each use a cross-entropy loss function, denoted $loss_1$ and $loss_2$, respectively:
$$loss_1 = -\sum_{i=1}^{T}\left[y_{label} \times \log a + (1 - y_{label}) \times \log(1 - a)\right],$$
$$a = \frac{e^{score_1(1)}}{e^{score_1(0)} + e^{score_1(1)}},$$
$$loss_2 = -\sum_{i=1}^{T}\left[y_{label} \times \log b + (1 - y_{label}) \times \log(1 - b)\right],$$
$$b = \frac{e^{score_2(1)}}{e^{score_2(0)} + e^{score_2(1)}},$$
where $y_{label}$ is the actual association label of the miRNA-disease pair: $y_{label} = 1$ if the association between the miRNA and the disease is known, and $y_{label} = 0$ otherwise. As in a binary classification problem, $score_1(0)$ and $score_1(1)$ are the left module's scores for $m_1$ and $d_5$ being unassociated and associated, respectively, and the softmax function converts them into the association probability $a$. The association probability $b$ of the right module is obtained in the same way from $score_2$. The second component of the combined score, $score(1)$, is the final prediction score between $m_1$ and $d_5$, and $T$ is the number of training samples.
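The scoring and loss formulas can be checked numerically with the small sketch below. It assumes each score is a length-2 vector ordered as (not associated, associated), as described above; the function names, toy scores, labels, and $\lambda = 0.6$ are made up for illustration. Subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the softmax result.

```python
import numpy as np

def assoc_prob(score):
    """Softmax probability of the 'associated' entry: e^{s(1)} / (e^{s(0)} + e^{s(1)})."""
    s = np.asarray(score, dtype=float).reshape(-1)
    z = np.exp(s - s.max())            # shifting by the max does not change the softmax
    return z[1] / z.sum()

def cross_entropy(probs, labels, eps=1e-12):
    """-sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)] over the T training pairs."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def combined_score(score1, score2, lam):
    """score = lam * score_1 + (1 - lam) * score_2 with lam in (0, 1)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return lam * np.asarray(score1, dtype=float) + (1.0 - lam) * np.asarray(score2, dtype=float)

# Toy scores for T = 3 pairs, each ordered as (not associated, associated)
score1 = [np.array([0.2, 1.5]), np.array([1.0, -0.5]), np.array([0.0, 0.3])]
score2 = [np.array([0.1, 0.9]), np.array([0.8, 0.1]), np.array([0.4, 0.2])]
labels = [1, 0, 1]
a = [assoc_prob(s) for s in score1]
b = [assoc_prob(s) for s in score2]
loss1, loss2 = cross_entropy(a, labels), cross_entropy(b, labels)
final = combined_score(score1[0], score2[0], lam=0.6)[1]   # score(1) for the first pair
```

For the first toy pair, the final score is $0.6 \times 1.5 + 0.4 \times 0.9 = 1.26$.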

3.4. Predicting Novel Disease-Related miRNAs

After its predictive performance had been evaluated by cross-validation and several case studies, CNNMDA was applied to predict candidate miRNAs for all 329 diseases, with all positive and negative samples used for training. The predicted results for the 329 diseases are listed in Supplementary Table S3, and the candidate miRNAs of the 3 diseases analyzed in the case studies are taken from this table.

4. Conclusions

CNNMDA is a novel method based on network representation learning and dual convolutional neural networks for predicting potential miRNA-disease associations. It captures the relationships among miRNAs and among diseases through their similarities, as well as the associations between miRNAs and diseases. In addition, the representations of the miRNA and disease nodes are learned from the entire miRNA-disease network and are integrated with the similarity and association information. The framework therefore learns both the original and the global representation of a miRNA-disease pair. CNNMDA’s performance was verified by cross-validation on 15 common diseases and by case studies on 3 diseases. Compared with the four competing methods, CNNMDA achieved the highest AUC for all 15 diseases and the highest AUPR for 13 of them (Tables 1 and 2). It is able to generate reliable candidate miRNA-disease associations for subsequent validation by biologists.

Supplementary Materials

Supplementary materials can be found at https://www.mdpi.com/1422-0067/20/15/3648/s1.

Author Contributions

P.X., H.S. and X.W. conceived the prediction method, and H.S. wrote the paper. H.S. and S.P. developed the computer programs. P.X. and T.Z. analyzed the results and revised the paper.

Funding

The work was supported by the Natural Science Foundation of China (61702296, 61302139), the Natural Science Foundation of Heilongjiang Province (LH2019F049, LH2019A029), the China Postdoctoral Science Foundation (2019M650069), the Heilongjiang Postdoctoral Scientific Research Starting Foundation (BHL-Q18104), the Fundamental Research Foundation of Universities in Heilongjiang Province for Technology Innovation (KJCX201805), the Fundamental Research Foundation of Universities in Heilongjiang Province for Youth Innovation Team (RCYJTD201805), and the Heilongjiang University Key Laboratory jointly built by Heilongjiang Province and the Ministry of Education (Heilongjiang University).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chen, K.; Rajewsky, N. The evolution of gene regulation by transcription factors and microRNAs. Nat. Rev. Genet. 2007, 8, 93–103.
2. Subramanian, S.; Fu, Y.; Sunkar, R.; Barbazuk, W.B.; Zhu, J.-K.; Yu, O. Novel and nodulation-regulated microRNAs in soybean roots. BMC Genom. 2008, 9, 160.
3. Zhang, B.; Wang, Q.; Pan, X. MicroRNAs and their regulatory roles in animals and plants. J. Cell. Physiol. 2007, 210, 279–289.
4. Calin, G.A.; Croce, C.M. MicroRNA signatures in human cancers. Nat. Rev. Cancer 2006, 6, 857–866.
5. Chen, X.; Xie, D.; Zhao, Q.; You, Z.-H. MicroRNAs and complex diseases: From experimental results to computational models. Brief. Bioinform. 2017, 20, 515–539.
6. Gaur, A.; Jewell, D.A.; Liang, Y.; Ridzon, D.; Moore, J.H.; Chen, C.; Ambros, V.R.; Israel, M.A. Characterization of microRNA expression levels and their biological correlates in human cancer cell lines. Cancer Res. 2007, 67, 2456–2468.
7. Meola, N.; Gennarino, V.A.; Banfi, S. microRNAs and genetic diseases. Pathogenetics 2009, 2, 7.
8. Bartel, D.P. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 2004, 116, 281–297.
9. Jiang, Q.; Hao, Y.; Wang, G.; Juan, L.; Zhang, T.; Teng, M.; Liu, Y.; Wang, Y. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst. Biol. 2010, 4, S2.
10. Qabaja, A.; Alshalalfa, M.; Bismar, T.A.; Alhajj, R. Protein network-based Lasso regression model for the construction of disease-miRNA functional interactions. EURASIP J. Bioinform. Syst. Biol. 2013, 2013, 3.
11. Shi, H.; Xu, J.; Zhang, G.; Xu, L.; Li, C.; Wang, L.; Zhao, Z.; Jiang, W.; Guo, Z.; Li, X. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst. Biol. 2013, 7, 101.
12. Xu, C.; Ping, Y.; Li, X.; Zhao, H.; Wang, L.; Fan, H.; Xiao, Y.; Li, X. Prioritizing candidate disease miRNAs by integrating phenotype associations of multiple diseases with matched miRNA and mRNA expression profiles. Mol. Biosyst. 2014, 10, 2800–2809.
13. Kertesz, M.; Iovino, N.; Unnerstall, U.; Gaul, U.; Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 2007, 39, 1278–1284.
14. Lewis, B.P.; Shih, I.-h.; Jones-Rhoades, M.W.; Bartel, D.P.; Burge, C.B. Prediction of mammalian microRNA targets. Cell 2003, 115, 787–798.
15. Bandyopadhyay, S.; Mitra, R.; Maulik, U.; Zhang, M.Q. Development of the human cancer microRNA network. Silence 2010, 1, 6.
16. Barabási, A.-L.; Gulbahce, N.; Loscalzo, J. Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 2011, 12, 56–68.
17. Paci, P.; Colombo, T.; Fiscon, G.; Gurtner, A.; Pavesi, G.; Farina, L. SWIM: A computational tool to unveiling crucial nodes in complex biological networks. Sci. Rep. 2017, 7, 44797.
18. Fiscon, G.; Conte, F.; Farina, L.; Paci, P. Network-based approaches to explore complex biological systems towards network medicine. Genes 2018, 9, 437.
19. Fiscon, G.; Conte, F.; Farina, L.; Pellegrini, M.; Russo, F.; Paci, P. Identification of Disease–miRNA Networks Across Different Cancer Types Using SWIM. In MicroRNA Target Identification; Humana Press: New York, NY, USA, 2019; pp. 169–181.
20. Xu, J.; Li, C.-X.; Lv, J.-Y.; Li, Y.-S.; Xiao, Y.; Shao, T.-T.; Huo, X.; Li, X.; Zou, Y.; Han, Q.-L. Prioritizing candidate disease miRNAs by topological features in the miRNA target–dysregulated network: Case study of prostate cancer. Mol. Cancer Ther. 2011, 10, 1857–1866.
21. Chen, X.; Liu, M.-X.; Yan, G.-Y. RWRMDA: Predicting novel human microRNA–disease associations. Mol. BioSyst. 2012, 8, 2792–2798.
22. Xuan, P.; Han, K.; Guo, Y.; Li, J.; Li, X.; Zhong, Y.; Zhang, Z.; Ding, J. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 2015, 31, 1805–1815.
23. Liu, Y.; Zeng, X.; He, Z.; Zou, Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 14, 905–915.
24. Luo, J.; Xiao, Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J. Biomed. Inform. 2017, 66, 194–203.
25. Chen, X.; Huang, L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput. Biol. 2017, 13, e1005912.
26. Shen, Z.; Zhang, Y.-H.; Han, K.; Nandi, A.K.; Honig, B.; Huang, D.-S. miRNA-disease association prediction with collaborative matrix factorization. Complexity 2017, 2017, 2498957.
27. Xiao, Q.; Luo, J.; Liang, C.; Cai, J.; Ding, P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2017, 34, 239–248.
28. Xuan, P.; Shen, T.; Wang, X.; Zhang, T.; Zhang, W. Inferring disease-associated microRNAs in heterogeneous networks with node attributes. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018.
29. Zhong, Y.; Xuan, P.; Wang, X.; Zhang, T.; Li, J.; Liu, Y.; Zhang, W. A non-negative matrix factorization based method for predicting disease-associated miRNAs in miRNA-disease bilayer network. Bioinformatics 2017, 34, 267–277.
30. Zeng, X.; Liu, L.; Lü, L.; Zou, Q. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics 2018, 34, 2425–2432.
31. Luo, J.; Ding, P.; Liang, C.; Cao, B.; Chen, X. Collective prediction of disease-associated miRNAs based on transduction learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 14, 1468–1475.
32. Chen, X.; Wang, L.; Qu, J.; Guan, N.-N.; Li, J.-Q. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018, 34, 4256–4265.
33. Chen, X.; Xie, D.; Wang, L.; Zhao, Q.; You, Z.-H.; Liu, H. BNPMDA: Bipartite network projection for MiRNA–disease association prediction. Bioinformatics 2018, 34, 3178–3186.
34. Che, K.; Guo, M.; Wang, C.; Liu, X.; Chen, X. Predicting MiRNA-Disease Association by Latent Feature Extraction with Positive Samples. Genes 2019, 10, 80.
35. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432.
36. Xuan, P.; Sun, C.; Zhang, T.; Ye, Y.; Shen, T.; Dong, Y. Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs. Front. Genet. 2019, 10.
37. Chen, M.; Liao, B.; Li, Z. Global Similarity Method Based on a Two-tier Random Walk for the Prediction of microRNA–Disease Association. Sci. Rep. 2018, 8, 6481.
38. Yang, Z.; Ren, F.; Liu, C.; He, S.; Sun, G.; Gao, Q.; Yao, L.; Zhang, Y.; Miao, R.; Cao, Y. dbDEMC: A database of differentially expressed miRNAs in human cancers. BMC Genom. 2010, 11, S5.
39. Ruepp, A.; Kowarsch, A.; Schmidl, D.; Buggenthin, F.; Brauner, B.; Dunger, I.; Fobo, G.; Frishman, G.; Montrone, C.; Theis, F.J. PhenomiR: A knowledgebase for microRNA expression in diseases and biological processes. Genome Biol. 2010, 11, R6.
40. Xie, B.; Ding, Q.; Han, H.; Wu, D. miRCancer: A microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013, 29, 638–644.
41. Tomczak, K.; Czerwińska, P.; Wiznerowicz, M. The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge. Contemp. Oncol. 2015, 19, A68–A77.
42. Chen, L.-T.; Xu, S.-D.; Xu, H.; Zhang, J.-F.; Ning, J.-F.; Wang, S.-F. MicroRNA-378 is associated with non-small cell lung cancer brain metastasis by promoting cell migration, invasion and tumor angiogenesis. Med. Oncol. 2012, 29, 1673–1680.
43. Daugaard, I.; Sanders, K.; Idica, A.; Vittayarukskul, K.; Hamdorf, M.; Krog, J.; Chow, R.; Jury, D.; Hansen, L.; Hager, H. miR-151a induces partial EMT by regulating E-cadherin in NSCLC cells. Oncogenesis 2017, 6, e366.
44. Hu, L.; Ai, J.; Long, H.; Liu, W.; Wang, X.; Zuo, Y.; Li, Y.; Wu, Q.; Deng, Y. Integrative microRNA and gene profiling data analysis reveals novel biomarkers and mechanisms for lung cancer. Oncotarget 2016, 7, 8441–8454.
45. Shen, W.; Liu, J.; Zhao, G.; Fan, M.; Song, G.; Zhang, Y.; Weng, Z.; Zhang, Y. Repression of Toll-like receptor-4 by microRNA-149-3p is associated with smoking-related COPD. Int. J. Chronic Obstr. Pulm. Dis. 2017, 12, 705–715.
46. Tang, Y.; Cui, Y.; Li, Z.; Jiao, Z.; Zhang, Y.; He, Y.; Chen, G.; Zhou, Q.; Wang, W.; Zhou, X. Radiation-induced miR-208a increases the proliferation and radioresistance by targeting p21 in human lung cancer cells. J. Exp. Clin. Cancer Res. 2016, 35, 7.
47. Bandi, N.; Vassella, E. miR-34a and miR-15a/16 are co-regulated in non-small cell lung cancer and control cell cycle progression in a synergistic and Rb-dependent manner. Mol. Cancer 2011, 10, 55.
48. Zhao, M.; Xu, P.; Liu, Z.; Zhen, Y.; Chen, Y.; Liu, Y.; Fu, Q.; Deng, X.; Liang, Z.; Li, Y. Dual roles of miR-374a by modulated c-Jun respectively targets CCND1-inducing PI3K/AKT signal and PTEN-suppressing Wnt/β-catenin signaling in non-small-cell lung cancer. Cell Death Dis. 2018, 9, 78.
49. Isobe, T.; Hisamori, S.; Hogan, D.J.; Zabala, M.; Hendrickson, D.G.; Dalerba, P.; Cai, S.; Scheeren, F.; Kuo, A.H.; Sikandar, S.S. miR-142 regulates the tumorigenicity of human breast cancer stem cells through the canonical WNT signaling pathway. eLife 2014, 3, e01977.
50. Zhu, Q.-N.; Renaud, H.; Guo, Y. Bioinformatics-based identification of miR-542-5p as a predictive biomarker in breast cancer therapy. Hereditas 2018, 155, 17.
51. D’aiuto, F.; Callari, M.; Dugo, M.; Merlino, G.; Musella, V.; Miodini, P.; Paolini, B.; Cappelletti, V.; Daidone, M. miR-30e* is an independent subtype-specific prognostic marker in breast cancer. Br. J. Cancer 2015, 113, 290–298.
52. Gui, Z.; Li, S.; Liu, X.; Xu, B.; Xu, J. Oridonin alters the expression profiles of microRNAs in BxPC-3 human pancreatic cancer cells. BMC Complement. Altern. Med. 2015, 15, 117.
53. Yu, J.; Li, A.; Hong, S.-M.; Hruban, R.H.; Goggins, M. MicroRNA alterations of pancreatic intraepithelial neoplasias. Clin. Cancer Res. 2012, 18, 981–992.
54. Chen, H.; Zhang, Z.; Lu, Y.; Song, K.; Liu, X.; Xia, F.; Sun, W. Downregulation of ULK1 by microRNA-372 inhibits the survival of human pancreatic adenocarcinoma cells. Cancer Sci. 2017, 108, 1811–1819.
55. Hao, J.; Zhang, S.; Zhou, Y.; Hu, X.; Shao, C. MicroRNA 483-3p suppresses the expression of DPC4/Smad4 in pancreatic cancer. FEBS Lett. 2011, 585, 207–213.
56. Backes, C.; Khaleeq, Q.T.; Meese, E.; Keller, A. miEAA: microRNA enrichment analysis and annotation. Nucleic Acids Res. 2016, 44, W110–W116.
57. Li, J.; Han, X.; Wan, Y.; Zhang, S.; Zhao, Y.; Fan, R.; Cui, Q.; Zhou, Y. TAM 2.0: Tool for MicroRNA set analysis. Nucleic Acids Res. 2018, 46, W180–W185.
58. Fan, Y.; Habib, M.; Xia, J. Xeno-miRNet: A comprehensive database and analytics platform to explore xeno-miRNAs and their potential targets. PeerJ 2018, 6, e5650.
59. Park, M.-T.; Lee, S.-J. Cell cycle and cancer. J. Biochem. Mol. Biol. 2003, 36, 60–65.
60. Collins, K.; Jacks, T.; Pavletich, N.P. The cell cycle and cancer. Proc. Natl. Acad. Sci. USA 1997, 94, 2776–2778.
61. Eymin, B.; Gazzeri, S. Role of cell cycle regulators in lung carcinogenesis. Cell Adhes. Migr. 2010, 4, 114–123.
62. Visvader, J.E. Cells of origin in cancer. Nature 2011, 469, 314–322.
63. Martin-Belmonte, F.; Perez-Moreno, M. Epithelial cell polarity, stem cells and cancer. Nat. Rev. Cancer 2012, 12, 23–38.
64. Deng, X.; Tannehill-Gregg, S.H.; Nadella, M.V.; He, G.; Levine, A.; Cao, Y.; Rosol, T.J. Parathyroid hormone-related protein and ezrin are up-regulated in human lung cancer bone metastases. Clin. Exp. Metastasis 2007, 24, 107–119.
65. Domagala-Kulawik, J.; Osinska, I.; Hoser, G. Mechanisms of immune response regulation in lung cancer. Transl. Lung Cancer Res. 2014, 3, 15–22.
66. Liu, G.; Pei, F.; Yang, F.; Li, L.; Amin, A.; Liu, S.; Buchan, J.; Cho, W. Role of autophagy and apoptosis in non-small-cell lung cancer. Int. J. Mol. Sci. 2017, 18, 367.
67. Li, Y.; Qiu, C.; Tu, J.; Geng, B.; Yang, J.; Jiang, T.; Cui, Q. HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013, 42, D1070–D1074.
68. Hoehndorf, R.; Schofield, P.N.; Gkoutos, G.V. The role of ontologies in biological and biomedical research: A functional perspective. Brief. Bioinform. 2015, 16, 1069–1080.
69. Wang, D.; Wang, J.; Lu, M.; Song, F.; Cui, Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010, 26, 1644–1650.
70. Hosoda, K.; Watanabe, M.; Wersing, H.; Körner, E.; Tsujino, H.; Tamura, H.; Fujita, I. A model for learning topographically organized parts-based representations of objects in visual cortex: Topographic nonnegative matrix factorization. Neural Comput. 2009, 21, 2605–2633.
71. Zheng, C.-H.; Huang, D.-S.; Zhang, L.; Kong, X.-Z. Tumor clustering using nonnegative matrix factorization with gene selection. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 599–607.
72. Facchinei, F.; Kanzow, C.; Sagratella, S. Solving quasi-variational inequalities via their KKT conditions. Math. Program. 2014, 144, 369–412.
Figure 1. ROC curves and precision-recall (PR) curves of CNNMDA and other methods for 15 diseases.
Figure 2. Recall values of top k candidates of CNNMDA and the other four methods.
Figure 3. Functional enrichment analysis of lung cancer-related miRNAs. The horizontal axis shows the 35 significantly enriched functions of the top 50 candidate miRNAs associated with lung neoplasms; the vertical axis shows the number of miRNAs associated with each enriched function.
Figure 4. Construction of a deep learning framework based on dual convolutional neural networks to learn original representation and global network representation.
Figure 5. Establishment of the left embedding layer of miRNA m1 and disease d5 by combining their similarities and associations.
Figure 6. Establishment of the right embedding layer of miRNA m1 and disease d5 by integrating their projection vectors in low-dimensional space.
Table 1. Prediction results of CNNMDA and the other four methods for 15 diseases in terms of the area under the receiver operating characteristic curve (AUC).
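| Disease name | CNNMDA | GSTRW | DMPred | BNPMDA | Liu’s method |
|---|---|---|---|---|---|
| Breast neoplasms | **0.991** | 0.822 | 0.939 | 0.906 | 0.896 |
| Hepatocellular carcinoma | **0.978** | 0.770 | 0.899 | 0.784 | 0.846 |
| Renal cell carcinoma | **0.960** | 0.801 | 0.897 | 0.830 | 0.785 |
| Squamous cell carcinoma | **0.932** | 0.821 | 0.894 | 0.793 | 0.897 |
| Colorectal neoplasms | **0.924** | 0.742 | 0.882 | 0.724 | 0.864 |
| Glioblastoma | **0.916** | 0.821 | 0.906 | 0.781 | 0.828 |
| Heart failure | **0.986** | 0.823 | 0.984 | 0.929 | 0.816 |
| Acute myeloid leukemia | **0.969** | 0.817 | 0.894 | 0.784 | 0.924 |
| Lung neoplasms | **0.987** | 0.795 | 0.941 | 0.903 | 0.931 |
| Melanoma | **0.994** | 0.788 | 0.909 | 0.909 | 0.859 |
| Ovarian neoplasms | **0.955** | 0.831 | 0.934 | 0.924 | 0.855 |
| Pancreatic neoplasms | **0.971** | 0.853 | 0.913 | 0.725 | 0.892 |
| Prostatic neoplasms | **0.982** | 0.828 | 0.947 | 0.896 | 0.895 |
| Stomach neoplasms | **0.994** | 0.781 | 0.922 | 0.740 | 0.838 |
| Urinary bladder neoplasms | **0.982** | 0.821 | 0.921 | 0.879 | 0.870 |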
Bold values indicate the highest AUC for each disease.
Table 2. Prediction results of CNNMDA and other four methods for 15 diseases in terms of the area under the precision–recall curve (AUPR).
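| Disease name | CNNMDA | GSTRW | DMPred | BNPMDA | Liu’s method |
|---|---|---|---|---|---|
| Breast neoplasms | **0.919** | 0.261 | 0.681 | 0.245 | 0.378 |
| Hepatocellular carcinoma | **0.871** | 0.234 | 0.539 | 0.574 | 0.335 |
| Renal cell carcinoma | **0.549** | 0.127 | 0.325 | 0.328 | 0.152 |
| Squamous cell carcinoma | **0.290** | 0.104 | 0.191 | 0.272 | 0.170 |
| Colorectal neoplasms | **0.425** | 0.136 | 0.279 | 0.177 | 0.273 |
| Glioblastoma | 0.277 | 0.142 | 0.270 | **0.452** | 0.166 |
| Heart failure | **0.874** | 0.160 | 0.669 | 0.451 | 0.157 |
| Acute myeloid leukemia | 0.262 | 0.118 | 0.236 | **0.367** | 0.207 |
| Lung neoplasms | **0.706** | 0.140 | 0.481 | 0.480 | 0.343 |
| Melanoma | **0.896** | 0.157 | 0.410 | 0.477 | 0.309 |
| Ovarian neoplasms | **0.543** | 0.152 | 0.453 | 0.386 | 0.239 |
| Pancreatic neoplasms | **0.593** | 0.133 | 0.308 | 0.136 | 0.283 |
| Prostatic neoplasms | **0.673** | 0.150 | 0.414 | 0.175 | 0.231 |
| Stomach neoplasms | **0.881** | 0.207 | 0.503 | 0.306 | 0.303 |
| Urinary bladder neoplasms | **0.694** | 0.134 | 0.331 | 0.292 | 0.229 |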
Bold values indicate the highest AUPR for each disease.
Table 3. Comparison of different methods based on AUCs and AUPRs with a paired t-test.
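| p-value between CNNMDA and another method | DMPred | GSTRW | BNPMDA | Liu’s method |
|---|---|---|---|---|
| p-values of ROC curves | $3.3219 \times 10^{-5}$ | $8.5916 \times 10^{-23}$ | $5.4483 \times 10^{-10}$ | $2.0247 \times 10^{-10}$ |
| p-values of PR curves | $1.4386 \times 10^{-8}$ | $2.7951 \times 10^{-13}$ | $1.181 \times 10^{-2}$ | $2.9012 \times 10^{-8}$ |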
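The paired t-test behind Table 3 can be approximated from the per-disease AUCs in Table 1. The sketch below computes only the paired t statistic (mean of the per-disease differences divided by its standard error, with n − 1 = 14 degrees of freedom); a two-sided p-value can then be read from the t distribution, for example with `scipy.stats.ttest_rel`. Because Table 1 lists rounded values and the exact test settings are not stated, the resulting statistics need not reproduce the p-values in Table 3; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Per-disease AUCs copied from Table 1 (same disease order as the table)
cnnmda = [0.991, 0.978, 0.960, 0.932, 0.924, 0.916, 0.986, 0.969,
          0.987, 0.994, 0.955, 0.971, 0.982, 0.994, 0.982]
others = {
    "GSTRW": [0.822, 0.770, 0.801, 0.821, 0.742, 0.821, 0.823, 0.817,
              0.795, 0.788, 0.831, 0.853, 0.828, 0.781, 0.821],
    "DMPred": [0.939, 0.899, 0.897, 0.894, 0.882, 0.906, 0.984, 0.894,
               0.941, 0.909, 0.934, 0.913, 0.947, 0.922, 0.921],
    "BNPMDA": [0.906, 0.784, 0.830, 0.793, 0.724, 0.781, 0.929, 0.784,
               0.903, 0.909, 0.924, 0.725, 0.896, 0.740, 0.879],
    "Liu's method": [0.896, 0.846, 0.785, 0.897, 0.864, 0.828, 0.816, 0.924,
                     0.931, 0.859, 0.855, 0.892, 0.895, 0.838, 0.870],
}

def paired_t(x, y):
    """Paired t statistic: mean difference / (sample std of differences / sqrt(n))."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

t_stats = {name: paired_t(cnnmda, auc) for name, auc in others.items()}
for name, t in t_stats.items():
    print(f"CNNMDA vs {name}: t = {t:.2f} (df = {len(cnnmda) - 1})")
```

A paired test is appropriate here because every method is evaluated on the same 15 diseases, so the per-disease differences, not the raw AUCs, carry the comparison.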
Table 4. The top 50 lung neoplasms-related candidates.
| Rank | miRNA name | Evidence |
|---|---|---|
| 1 | hsa-mir-106b | dbDEMC, PhenomiR |
| 2 | hsa-mir-15a | Literature [47] |
| 3 | hsa-mir-16 | dbDEMC, PhenomiR, miRCancer |
| 4 | hsa-mir-130a | dbDEMC, PhenomiR |
| 5 | hsa-mir-193b | dbDEMC, PhenomiR, TCGA |
| 6 | hsa-mir-520d | dbDEMC |
| 7 | hsa-mir-429 | dbDEMC, miRCancer |
| 8 | hsa-mir-122 | dbDEMC, PhenomiR, miRCancer |
| 9 | hsa-mir-149 | dbDEMC, PhenomiR |
| 10 | hsa-mir-424 | dbDEMC, PhenomiR |
| 11 | hsa-mir-451a | dbDEMC |
| 12 | hsa-mir-378a | Literature [42] |
| 13 | hsa-mir-708 | dbDEMC |
| 14 | hsa-mir-20b | dbDEMC, PhenomiR, TCGA |
| 15 | hsa-mir-15b | dbDEMC, PhenomiR, miRCancer |
| 16 | hsa-mir-520a | dbDEMC, TCGA |
| 17 | hsa-mir-10a | dbDEMC |
| 18 | hsa-mir-520b | dbDEMC |
| 19 | hsa-mir-625 | dbDEMC |
| 20 | hsa-mir-141 | dbDEMC, PhenomiR, miRCancer |
| 21 | hsa-mir-449a | dbDEMC, PhenomiR, miRCancer |
| 22 | hsa-mir-99a | dbDEMC, PhenomiR, TCGA |
| 23 | hsa-mir-195 | dbDEMC, PhenomiR, miRCancer |
| 24 | hsa-mir-151a | Literature [43] |
| 25 | hsa-mir-296 | Literature [44] |
| 26 | hsa-mir-449b | dbDEMC, PhenomiR, miRCancer |
| 27 | hsa-mir-28 | dbDEMC, PhenomiR |
| 28 | hsa-mir-342 | dbDEMC, PhenomiR |
| 29 | hsa-mir-372 | dbDEMC, PhenomiR, TCGA |
| 30 | hsa-mir-345 | dbDEMC, PhenomiR |
| 31 | hsa-mir-92b | dbDEMC, PhenomiR |
| 32 | hsa-mir-328 | dbDEMC, PhenomiR |
| 33 | hsa-mir-367 | dbDEMC, PhenomiR |
| 34 | hsa-mir-373 | dbDEMC, PhenomiR |
| 35 | hsa-mir-302b | dbDEMC, PhenomiR, miRCancer |
| 36 | hsa-mir-194 | dbDEMC, PhenomiR |
| 37 | hsa-mir-1258 | dbDEMC |
| 38 | hsa-mir-320a | dbDEMC, PhenomiR |
| 39 | hsa-mir-152 | dbDEMC, PhenomiR |
| 40 | hsa-mir-302c | dbDEMC, PhenomiR |
| 41 | hsa-mir-151b | dbDEMC |
| 42 | hsa-mir-204 | dbDEMC, PhenomiR |
| 43 | hsa-mir-23b | dbDEMC, PhenomiR |
| 44 | hsa-mir-129 | dbDEMC, PhenomiR, TCGA |
| 45 | hsa-mir-451b | Literature [45] |
| 46 | hsa-mir-374a | Literature [48] |
| 47 | hsa-mir-211 | dbDEMC, PhenomiR |
| 48 | hsa-mir-208a | Literature [46] |
| 49 | hsa-mir-1254 | dbDEMC, miRCancer |
| 50 | hsa-mir-337 | dbDEMC, PhenomiR, TCGA |

Share and Cite

MDPI and ACS Style

Xuan, P.; Sun, H.; Wang, X.; Zhang, T.; Pan, S. Inferring the Disease-Associated miRNAs Based on Network Representation Learning and Convolutional Neural Networks. Int. J. Mol. Sci. 2019, 20, 3648. https://doi.org/10.3390/ijms20153648

AMA Style

Xuan P, Sun H, Wang X, Zhang T, Pan S. Inferring the Disease-Associated miRNAs Based on Network Representation Learning and Convolutional Neural Networks. International Journal of Molecular Sciences. 2019; 20(15):3648. https://doi.org/10.3390/ijms20153648

Chicago/Turabian Style

Xuan, Ping, Hao Sun, Xiao Wang, Tiangang Zhang, and Shuxiang Pan. 2019. "Inferring the Disease-Associated miRNAs Based on Network Representation Learning and Convolutional Neural Networks" International Journal of Molecular Sciences 20, no. 15: 3648. https://doi.org/10.3390/ijms20153648

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
