CNNDLP: A Method Based on Convolutional Autoencoder and Convolutional Neural Network with Adjacent Edge Attention for Predicting lncRNA–Disease Associations

It is well known that the abnormal expression of long non-coding RNAs (lncRNAs) is closely related to the physiological and pathological processes of diseases. Therefore, inferring potential lncRNA–disease associations is helpful for understanding the molecular pathogenesis of diseases. Most previous methods have concentrated on constructing shallow learning models to predict lncRNA–disease associations, and they have failed to deeply integrate heterogeneous multi-source data and to learn low-dimensional feature representations from these data. We propose a method, referred to as CNNDLP, that is based on a convolutional neural network with an attention mechanism and a convolutional autoencoder for predicting candidate disease-related lncRNAs. CNNDLP integrates multiple kinds of data from heterogeneous sources, including the associations, interactions, and similarities related to lncRNAs, diseases, and miRNAs. Two different embedding layers are established by combining diverse biological premises about the cases in which lncRNAs are likely to be associated with diseases. We construct a novel prediction model based on the convolutional neural network with attention mechanism and the convolutional autoencoder to learn the attention representations and the low-dimensional network representations of lncRNA–disease pairs from the embedding layers. The different adjacent edges among the lncRNA, miRNA, and disease nodes contribute differently to association prediction. Hence, an attention mechanism at the adjacent edge level is established, and the left side of the model learns the attention representation of a lncRNA–disease pair. A new type of lncRNA similarity and a new type of disease similarity are calculated by incorporating the topological structures of multiple bipartite networks. The low-dimensional network representation of the lncRNA–disease pairs is further learned by the autoencoder-based convolutional neural network on the right side of the model. The cross-validation results confirm that CNNDLP has superior prediction performance compared to the state-of-the-art methods. Case studies on stomach cancer, breast cancer, and prostate cancer further show the ability of CNNDLP to discover potential disease lncRNAs.


Introduction
For many years, genetic information was thought to be stored only in protein-coding genes, while non-coding RNAs (ncRNAs) were regarded as mere byproducts of the transcription process [1,2]. However, accumulating evidence indicates that ncRNAs, especially long non-coding RNAs (lncRNAs) with lengths > 200 nucleotides, play important roles in various biological processes [3,4].
Previous methods for predicting lncRNA-disease associations can be classified into three categories. The methods in the first category utilize machine learning to identify candidate associations. Chen et al. developed a semi-supervised learning model, LRLSLDA, which uses Laplacian regularized least squares to identify possible associations between lncRNAs and diseases [5]. A model based on the Bayesian classifier was developed for predicting candidate disease lncRNAs [6]. However, most of the methods in this category fail to achieve good performance for lncRNAs with no known associated diseases.
The second category of methods makes use of the biological premise that lncRNAs with similar functions tend to be associated with similar diseases [7]. First, the similarity between two lncRNAs is calculated from the diseases associated with them, and a lncRNA network is constructed by using the similarities between lncRNAs [8]. Several methods predict the lncRNAs related to a given disease based on the lncRNA network, for instance, via random walks on the lncRNA network [9,10] or by utilizing the information of the neighboring nodes of a lncRNA [11]. These methods are ineffective for new diseases with no known related lncRNAs, as they rely on a set of seed lncRNAs that have been observed to be related to the disease. Some methods attempt to introduce additional information about diseases to overcome this shortcoming. Disease information is incorporated into the lncRNA network to create a heterogeneous lncRNA-disease network that contains lncRNA similarities, disease similarities, and lncRNA-disease associations. Several methods exploit this information, but they construct different models within the heterogeneous network to estimate the association scores between lncRNAs and diseases. For instance, the association scores are derived by random walks in the lncRNA-disease network [10,12], or by matrix factorization of lncRNA-disease associations [13,14]. Since lncRNAs are often involved in disease processes along with miRNAs, it is necessary to integrate the interactions and associations related to miRNAs. Nevertheless, most of the previous methods overlook this miRNA-related information.
The third category of methods integrates multiple biological data sources about lncRNAs, miRNAs, and proteins. MFLDA integrates various information about the genes and the miRNAs that interact with lncRNAs, and about the diseases associated with lncRNAs. The method constructs a data fusion model based on matrix factorization for predicting disease lncRNAs [15]. Zhang et al. introduced protein information to establish a lncRNA-protein-disease network and predicted the candidate associations between lncRNAs and diseases by propagating information streams in the network [16]. The diverse information available about lncRNAs, diseases, genes, and proteins reflects the associations of lncRNAs and diseases from different perspectives. However, it is difficult for these methods to deeply integrate heterogeneous data from multiple sources. Therefore, we present a novel prediction method based on dual convolutional neural networks to learn the latent representations of lncRNA-disease pairs from multiple-source data.

Evaluation Metrics
Five-fold cross-validation is used to evaluate the prediction performance of CNNDLP and several state-of-the-art methods for predicting lncRNA-disease associations. All the known lncRNA-disease associations are regarded as positive samples, and the unobserved associations are taken as negative samples. We randomly divide all the positive samples into five subsets, and four of them are used for training the model. As the number of positive samples is far smaller than that of the negative samples (the ratio of positive samples to negative samples is nearly 1:36 in our study), during the training process we randomly select as many negative samples as there are positive training samples, and these negative samples are also used for training the model. The positive samples in the remaining subset and all the negative samples are considered as the testing samples. The numbers of positive and negative samples during the cross-validation process are listed in Supplementary Table S1. In particular, during each round of cross-validation, the positive samples used for testing are removed and the lncRNA similarities are recalculated by using the remaining positive samples.
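The sampling protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `five_fold_splits`, the seed, and the representation of samples as (lncRNA, disease) index pairs are assumptions, and the recalculation of lncRNA similarities in each fold is omitted.

```python
import random

def five_fold_splits(positives, negatives, seed=0):
    """Yield (train_pos, train_neg, test_pos, test_neg) for each of five folds.

    positives / negatives are lists of (lncRNA, disease) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    folds = [pos[i::5] for i in range(5)]  # five roughly equal subsets
    for k in range(5):
        test_pos = folds[k]
        train_pos = [p for i, fold in enumerate(folds) if i != k for p in fold]
        # Balance the training set: as many random negatives as training positives.
        train_neg = rng.sample(list(negatives), len(train_pos))
        # All negative samples are kept for testing, as stated in the text.
        test_neg = list(negatives)
        yield train_pos, train_neg, test_pos, test_neg
```

Drawing the training negatives anew in every fold keeps the training set balanced while the test set retains the original class imbalance.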
We obtain the association scores of the testing samples and prioritize them by their scores. The higher the positive samples are ranked, the better the prediction performance. The lncRNA-disease node pairs whose scores are greater than a classification threshold θ are identified as positive samples, and the ones that have lower scores are identified as negative samples. The true positive rates (TPRs) and the false positive rates (FPRs) at various θ values are calculated as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
where TP and TN are the numbers of positive and negative samples that are identified correctly, while FN and FP are the numbers of misidentified positive and negative samples. The receiver operating characteristic (ROC) curve can be drawn according to the TPRs and FPRs at the various θ values, and the area under the ROC curve (AUC) is usually used to evaluate the overall performance of a prediction method [17]. There is a serious imbalance between the positive and the negative samples, since their ratio is 1:36. For such imbalanced cases, the precision-recall (PR) curve has been shown to be more informative than the ROC curve [18]. Therefore, the PR curve is used as another important measurement of the prediction performance of each method. Precision and recall are calculated as follows:
$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}$$
where precision is the proportion of the real positive samples among the samples that are identified as positive, while recall is the proportion of the real positive samples that are identified among all the actual positive ones. The area under the PR curve (AUPR) is also utilized to evaluate the performance of lncRNA-disease association prediction [19]. In addition, biologists usually select the top candidate lncRNAs for further experimental verification of their associations with a disease of interest. Therefore, we report the recall rates of the top 30, 60, and 240 candidates, which show how many of the positive samples are identified correctly within the top k of the ranking list.
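As a small worked illustration of the threshold-based rates and the top-k recall described above, the following sketch computes them from a list of scores and binary labels. The function names and the toy data are assumptions introduced for illustration only.

```python
def rates_at_threshold(scores, labels, theta):
    """Return TPR, FPR, precision and recall when score > theta means 'positive'."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s > theta
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "recall": tpr}

def recall_at_k(scores, labels, k):
    """Fraction of all positive samples that appear among the k highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(y for _, y in ranked[:k])
    total = sum(labels)
    return hits / total if total else 0.0
```

Sweeping `theta` over the observed scores yields the points of the ROC and PR curves, while `recall_at_k` corresponds to the top-k recall used for the candidate lists.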

Comparison with Other Methods
To assess the prediction performance of CNNDLP, we compare it with several state-of-the-art methods for predicting disease lncRNAs: SIMCLDA [20], Ping's method [21], MFLDA [15], LDAP [22] and CNNLDA [8]. CNNDLP and the other four methods have specific hyperparameters that are fine-tuned to achieve their best association prediction performance. We choose the values of CNNDLP's hyperparameters, α, β and λ, from {0.1, ..., 0.9}. CNNDLP achieved the best five-fold cross-validation performance when α = 0.9, β = 0.8 and λ = 0.3. The prediction performances of CNNDLP at different values of α, β, and λ are listed in Supplementary Tables S2, S3, and S4. In addition, the window size of all convolutional layers and pooling layers in CNNDLP is set to 2 × 2. The numbers of filters in the first and the second convolutional layers, $n_{conv1}$ and $n_{conv2}$, are set to 16 and 32 respectively. CNNDLP has a large number of parameters, which makes the model prone to overfitting the training samples. Therefore, we adopt dropout and batch normalization to prevent overfitting. To make a fair comparison, we set the hyperparameters of the other methods to the optimal values recommended in their respective papers (i.e., $\alpha_l$ = 0.8, $\alpha_d$ = 0.6 and λ = 1 for SIMCLDA, α = 0.6 for Ping's method, α = $10^5$ for MFLDA, and Gap open = 10, Gap extend = 0.5 for LDAP).
As shown in Figure 1a, CNNDLP yields the highest average performance over all of the 405 diseases (AUC = 0.969). In particular, it outperforms SIMCLDA by 21.2%, Ping's method by 9.3%, MFLDA by 34.4%, LDAP by 10.7%, and CNNLDA by 1.7%. The AUCs of the five methods on 10 well-characterized diseases are also listed in Table 1, and CNNDLP achieves the best performance on all of the 10 diseases. The AUC of CNNDLP is only slightly better than that of CNNLDA, but the AUPR of the former is 3.5% higher than that of the latter. A possible reason for this is that CNNLDA did not learn the low-dimensional network representation of a lncRNA-disease pair. Ping's method and LDAP achieved decent performance as they take advantage of the various similarities of different types of lncRNAs and diseases. MFLDA does not perform as well as the other four methods. A possible reason is that it ignores the lncRNA similarity and the disease similarity, which are exploited by the other methods. The improvement of CNNDLP over the compared methods is primarily due to the fact that it deeply learns the attention representation and the low-dimensional network-level representation of the lncRNA-disease node pairs.
As shown in Figure 1b, CNNDLP's average AUPR over the 405 diseases is also higher than those of the other methods (AUPR = 0.286). Its average AUPR is 22.7%, 13.4%, 24.7%, 15.9%, and 3.5% higher than those of SIMCLDA, Ping's method, MFLDA, LDAP and CNNLDA, respectively. In addition, CNNDLP achieves the best performance on nine of the ten well-characterized diseases (Table 2).
A higher recall value in the top k of the ranking list indicates that more real lncRNA-disease associations are identified correctly. Figure 2 shows that CNNDLP outperforms the other methods at different top k cutoffs, with recall rates of 88.6% in the top 30, 94.6% in the top 60, 97.5% in the top 90, and 98.3% in the top 120. Most of the recall rates of Ping's method are very close to those of LDAP. The former achieved 68.9%, 81.3%, 87.5% and 92.7% in the top 30, 60, 90 and 120, respectively, and the latter achieved 68.5%, 81.7%, 88.0% and 93.3%. MFLDA is still worse than the other methods, with 42.0%, 53.9%, 61.0% and 65.6%.
In addition, a paired Wilcoxon test is conducted to confirm whether CNNDLP's prediction performance is significantly better than that of the other methods. The statistical results in Table 3 show that CNNDLP yields significantly better performance than the other methods in terms of both AUCs and AUPRs at the p-value threshold of 0.05.
To further demonstrate the capability of CNNDLP to discover potential disease-related candidate lncRNAs, we conduct case studies on stomach cancer, breast cancer, and prostate cancer. For each of these three diseases, we prioritize the candidate lncRNA-disease associations based on their association scores and collect the top 15 candidates.
Stomach cancer is currently the fourth most common malignant tumor in the world and the second leading cause of cancer-related death [23]. First, Lnc2Cancer is a manually curated database of associations between lncRNAs and human cancers that have been verified by biological experiments [24]. Twelve of the 15 candidates are included in Lnc2Cancer (Table 4), which indicates that these lncRNAs are indeed associated with the disease. Second, LncRNADisease records more than 4564 lncRNA-disease associations that were obtained from experiments, the published literature, or computation, and for which the dysregulation of the lncRNAs was then manually confirmed [25]. Fourteen candidates are contained in LncRNADisease, indicating that they are upregulated or downregulated in stomach cancer tissues. In addition, one candidate labeled "literature" is supported by the literature, and it is confirmed to be dysregulated in the cancer when compared with normal tissues [26].
Among the top 15 candidates for breast cancer (Table 5), 11 candidates are reported in Lnc2Cancer with abnormal expression in breast cancer. LncRNADisease contains 12 candidates, which confirms the associations between these candidates and the disease. The remaining 2 candidates are confirmed by the literature to be dysregulated in breast cancer [27,28]. The top 15 prostate cancer-related candidates and the corresponding evidence are listed in Table 6. Fourteen candidates are included in Lnc2Cancer and 14 are contained in LncRNADisease, which indicates that they are truly related to the disease. All the case studies confirm that CNNDLP is effective for discovering potential candidate disease lncRNAs.

Prediction of Novel Disease lncRNAs
After confirming its prediction performance through five-fold cross-validation and case studies, we further apply CNNDLP to the 405 diseases. All the known lncRNA-disease associations are used for training CNNDLP to predict potential disease-related lncRNAs. The top 50 potential candidates for each of the 405 diseases are listed in Supplementary Table S5.

Datasets for lncRNA-Disease Association Prediction
We obtained thousands of lncRNA-disease associations, lncRNA-miRNA interactions and miRNA-disease associations from a published work [15]. The human lncRNA-disease database (LncRNADisease) consists of 2687 lncRNA-disease associations that were verified by biological experiments, covering 240 lncRNAs and 405 diseases [29]. The disease similarities were calculated based on directed acyclic graphs (DAGs), and the DAGs were constructed based on the disease terms from the U.S. National Library of Medicine (MeSH). The 1002 lncRNA-miRNA interactions were originally extracted from starBase v2.0; they have been confirmed by biological experiments [30] and involve 495 miRNAs. The 13,559 miRNA-disease associations were obtained from the HMDD database [31].

Bipartite Graphs about the lncRNAs, Diseases, miRNAs, and Representations
We first construct a bipartite graph composed of lncRNAs and diseases by connecting them according to the observed lncRNA-disease associations (Figure 3a). The graph is represented by the matrix $A = [A_{ij}] \in \mathbb{R}^{N_l \times N_d}$, where $N_l$ and $N_d$ are the numbers of lncRNAs and diseases, respectively. Each row corresponds to a lncRNA and each column represents a disease. If a lncRNA $l_i$ has been observed to be associated with a disease $d_j$, $A_{ij}$ is set to 1; otherwise $A_{ij}$ is 0.
There are a great many interactions between lncRNAs and miRNAs that have been confirmed by biological experiments [32]. A bipartite graph composed of lncRNA and miRNA nodes is established from the known interactions between them (Figure 3b). The interaction matrix $B = [B_{ij}] \in \mathbb{R}^{N_l \times N_m}$ is used to represent this graph, which includes $N_l$ lncRNAs and $N_m$ miRNAs. If lncRNA $l_i$ is known to interact with miRNA $m_j$, $B_{ij} = 1$; $B_{ij} = 0$ when their interaction has not been observed.
An edge is added to connect a miRNA and a disease when they are observed to be associated (Figure 3c). $C = [C_{ij}] \in \mathbb{R}^{N_m \times N_d}$ is a matrix representing the bipartite graph with $N_m$ miRNAs and $N_d$ diseases. We set $C_{ij}$ to 1 if miRNA $m_i$ is associated with disease $d_j$, or to 0 when no such association is observed.
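The three bipartite graphs can be stored as binary matrices built from edge lists. The sketch below assumes NumPy and zero-based integer indices for lncRNAs, miRNAs, and diseases; the helper name `bipartite_matrix` and the toy edges are hypothetical.

```python
import numpy as np

def bipartite_matrix(n_rows, n_cols, edges):
    """Binary matrix with M[i, j] = 1 for every observed edge (i, j), else 0."""
    M = np.zeros((n_rows, n_cols), dtype=np.float32)
    for i, j in edges:
        M[i, j] = 1.0
    return M

# Toy sizes for illustration only (the study uses 240 lncRNAs, 495 miRNAs, 405 diseases).
N_l, N_m, N_d = 3, 4, 2
A = bipartite_matrix(N_l, N_d, [(0, 1), (2, 0)])   # lncRNA-disease associations
B = bipartite_matrix(N_l, N_m, [(0, 0), (0, 2)])   # lncRNA-miRNA interactions
C = bipartite_matrix(N_m, N_d, [(1, 1), (2, 1)])   # miRNA-disease associations
```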

LncRNA-Disease Association Prediction Model Based on CNN
In this section, we describe the prediction model based on convolutional neural networks and an attention mechanism for learning the latent representation and predicting the association score of lncRNA $l_i$ and disease $d_j$. The embedding layer is first constructed by incorporating the associations, similarities, and interactions related to lncRNAs, diseases, and miRNAs. The prediction model is composed of a left part and a right part. The left side of the model learns the attention representation of $l_i$ and $d_j$, while the network representation of $l_i$ and $d_j$ is learned on the right side of the model. Each of the two representations goes through a fully connected layer and a softmax layer to obtain the probability that $l_i$ and $d_j$ are associated, which is regarded as their association score. The final score is the weighted sum of the two association scores.
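The final fusion step can be illustrated with a one-line sketch. The text only states that the final score is a weighted sum of the two association scores; the convex-combination form, the name `lam`, and the choice of which side it weights are assumptions made here for illustration.

```python
def fuse_scores(score_left, score_right, lam):
    """Weighted sum of the two association scores (assumed convex combination)."""
    return lam * score_left + (1.0 - lam) * score_right

# Toy usage: left-side score 0.8, right-side score 0.4, weight 0.3 on the left side.
final_score = fuse_scores(0.8, 0.4, 0.3)
```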

Embedding Layer on the Left

lncRNA Functional Similarity Measurement
On the basis of the biological premise that lncRNAs with similar functions are more likely to be associated with similar diseases, the similarity of two lncRNAs is measured by their associated diseases. For instance, lncRNA $l_a$ is associated with diseases $d_1$, $d_2$ and $d_4$, and lncRNA $l_b$ is associated with diseases $d_2$, $d_4$ and $d_5$. The similarity between $E_a = \{d_1, d_2, d_4\}$ and $E_b = \{d_2, d_4, d_5\}$ is regarded as the functional similarity of $l_a$ and $l_b$. The lncRNA similarity that we use is calculated according to Xuan's method [8]. The matrix $L = [L_{ij}] \in \mathbb{R}^{N_l \times N_l}$ is the lncRNA similarity matrix (Figure 3d), where $L_{ij}$ is the similarity of lncRNAs $l_i$ and $l_j$, and its value ranges between 0 and 1.

Disease Similarity Measurement
All semantic terms related to a disease form its directed acyclic graph (DAG). Wang et al. calculated the semantic similarities between diseases based on their DAGs [33]. We calculate the disease similarities according to Wang's method, and the similarities can be represented by the matrix $D = [D_{ij}] \in \mathbb{R}^{N_d \times N_d}$, where $D_{ij}$ is the similarity of diseases $d_i$ and $d_j$ (Figure 3e). The similarity of two diseases also varies between 0 and 1.

The Left Embedding Layer for Integrating the Original Information
If a lncRNA and a disease have similarity relationships and association relationships with more common lncRNAs, they are more likely to be associated with each other. We take the lncRNA $l_1$ and the disease $d_2$ as an example to explain the process of constructing the embedding layer on the left. As shown in Figure 4, let $L_1$ represent the first row of $L$, which records the similarities between $l_1$ and all the lncRNAs. The second row of $A^T$, denoted $A^T_2$, contains the associations between $d_2$ and all the lncRNAs. For example, as $l_1$ is similar to $l_2$ and $l_5$, and $d_2$ has been associated with $l_2$, $l_4$ and $l_5$, $l_1$ is possibly related to $d_2$. We stack $L_1$ and $A^T_2$ together as the first part of the embedding layer. Similarly, $l_1$ and $d_2$ are more likely to be associated when they have similarity and association connections with more common diseases. Therefore, we stack $A_1$ and $D_2$ as the second part of the embedding layer. In addition, when a lncRNA and a disease have interaction and association relationships with common miRNAs, they are more likely to be associated. For instance, there is a possible association between $l_1$ and $d_2$, since $l_1$ interacts with miRNAs $m_1$ and $m_3$, and disease $d_2$ is associated with $m_2$ and $m_3$. The first row of $B$ and the second row of $C^T$ are stacked as the third part of the embedding layer. The final embedding layer matrix between $l_1$ and $d_2$ is denoted as $X \in \mathbb{R}^{2 \times (N_l + N_m + N_d)}$.
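The stacking procedure for a pair of lncRNA and disease can be sketched with NumPy as follows. The function name `left_embedding` and the zero-based indices `i` and `j` are assumptions; `C` denotes the miRNA-disease matrix defined earlier, and the three parts are concatenated in the order in which the text introduces them.

```python
import numpy as np

def left_embedding(L, D, A, B, C, i, j):
    """Build the 2 x (N_l + N_d + N_m) embedding of lncRNA i and disease j.

    L: N_l x N_l lncRNA similarities, D: N_d x N_d disease similarities,
    A: N_l x N_d associations, B: N_l x N_m interactions, C: N_m x N_d associations.
    """
    part_lnc = np.vstack([L[i], A.T[j]])   # similarities of l_i / lncRNAs linked to d_j
    part_dis = np.vstack([A[i], D[j]])     # diseases linked to l_i / similarities of d_j
    part_mir = np.vstack([B[i], C.T[j]])   # miRNAs linked to l_i / miRNAs linked to d_j
    return np.hstack([part_lnc, part_dis, part_mir])
```

For the example in the text, `left_embedding(L, D, A, B, C, i=0, j=1)` would give the embedding of $l_1$ and $d_2$ under zero-based indexing.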

Attention at the Adjacent Edge Level
For a lncRNA node or a disease node, not all the adjacent edges of the node contribute equally to learning the representation of a lncRNA-disease pair. To address this issue, we establish an attention mechanism that enhances the adjacent edges that are important for predicting lncRNA-disease associations. In the embedding layer matrix $X$, $X_{ij}$ represents the connection between the i-th node and the j-th node, and $X_{ij}$ is assigned an attention weight $\alpha_{ij}$. In its definition, $W$ and $b$ are a weight matrix and a context vector respectively, and $u_e$ is a bias vector. $F_{ij}$ is an attention score that represents the importance of $X_{ij}$, and $\alpha_{ij}$ is the normalized importance of $X_{ij}$. $\tilde{X}$ is the enhanced embedding layer matrix obtained after the attention mechanism at the adjacent edge level is applied to $X$.
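To make the idea of edge-level attention concrete, the sketch below uses one common additive-attention form: each entry receives a score through a tanh layer, the scores are normalized with a softmax, and the entries are rescaled by the resulting weights. This specific form, the hidden size, the row-wise normalization, and the parameter names `weight`, `bias`, and `context` are all assumptions; they are not taken from the paper's equations, and no mapping to the paper's symbols is implied.

```python
import numpy as np

def edge_attention(X, weight, bias, context):
    """Assumed additive attention over the entries of an embedding matrix X.

    X: (rows, cols); weight, bias, context: vectors of one hidden size h.
    Returns the re-weighted matrix and the attention weights.
    """
    hidden = np.tanh(X[..., None] * weight + bias)      # (rows, cols, h)
    F = hidden @ context                                # attention scores, (rows, cols)
    F = F - F.max(axis=1, keepdims=True)                # numerical stability
    alpha = np.exp(F) / np.exp(F).sum(axis=1, keepdims=True)  # softmax per row
    return alpha * X, alpha
```

In a trained model the three parameter vectors would be learned jointly with the convolutional layers; here they are plain inputs so that the computation can be inspected in isolation.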

Embedding Layer on the Right
First, it is known that two lncRNA nodes are similar if they are associated with some common disease nodes [22]. In the lncRNA-disease bipartite network (Figure 5a), lncRNAs $l_1$ and $l_3$ are associated with a common disease node $d_2$, so $l_1$ and $l_3$ are similar. $l_3$ and $l_5$ are also similar because they are related to a common disease node $d_1$ (Figure 5b). Similarly, $d_1$ is similar to $d_2$ as they are associated with a common lncRNA node $l_3$ (Figure 5c). Second, if two lncRNA nodes have no common neighboring nodes but are related to some similar disease nodes, they are also similar to each other [22]. As shown in Figure 5d, $l_1$ and $l_5$ are similar, because their neighboring nodes $d_1$ and $d_2$ are similar. Similarly, $d_2$ and $d_3$ are similar as they are associated with similar neighboring nodes $l_3$ and $l_5$ (Figure 5e). Ping et al. measured the lncRNA similarities and the disease similarities by utilizing the lncRNA-disease bipartite network.
The first kind of lncRNA similarity $L^{(1)} = [L^{(1)}_{ij}] \in \mathbb{R}^{N_l \times N_l}$ and the first kind of disease similarity $D^{(1)} = [D^{(1)}_{ij}] \in \mathbb{R}^{N_d \times N_d}$ are calculated according to Ping's method. The second kind of lncRNA similarity is measured by exploiting the information of the common miRNA nodes and the similar ones interacting with two lncRNA nodes in the lncRNA-miRNA bipartite network, and it is denoted as $L^{(2)} = [L^{(2)}_{ij}] \in \mathbb{R}^{N_l \times N_l}$. Finally, the second kind of disease similarity $D^{(2)} = [D^{(2)}_{ij}] \in \mathbb{R}^{N_d \times N_d}$ is calculated based on the miRNA-disease bipartite network.
In order to incorporate the two kinds of lncRNA similarities $L^{(1)}$ and $L^{(2)}$, the final lncRNA similarity $L^{(c)}$ is defined as their weighted sum, where α is the parameter that controls the contributions of $L^{(1)}$ and $L^{(2)}$. Similarly, the final disease similarity $D^{(c)}$ is the weighted sum of $D^{(1)}$ and $D^{(2)}$, where β is a parameter for regulating the weights of $D^{(1)}$ and $D^{(2)}$.

The Right Embedding Layer for Integrating the Second Kinds of lncRNA and Disease Similarities
The establishment of the right embedding layer matrix $Y \in \mathbb{R}^{2 \times (N_l + N_m + N_d)}$ is similar to that of the left embedding layer matrix $X$. First, we stack the first row of $L^{(c)}$, $L^{(c)}_1$, and $A^T_2$ together as the first part of the embedding layer. Second, $A_1$ and $D^{(c)}_2$ are stacked as the second part of the embedding layer. Finally, the first row of $B$ and the second row of $C^T$ are stacked as the third part of the embedding layer.
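The combination of the two kinds of similarity can be illustrated as below. The text describes it as a weighted sum controlled by α (for lncRNAs) and β (for diseases); the convex-combination form and the assignment of the weight to the first kind of similarity are assumptions of this sketch, and the toy matrices are made up.

```python
import numpy as np

def combine_similarity(S1, S2, weight):
    """Assumed convex combination: weight * S1 + (1 - weight) * S2."""
    return weight * S1 + (1.0 - weight) * S2

# Toy 2 x 2 similarity matrices; 0.9 is the reported best value of alpha.
L1 = np.array([[1.0, 0.4], [0.4, 1.0]])
L2 = np.array([[1.0, 0.0], [0.0, 1.0]])
Lc = combine_similarity(L1, L2, 0.9)
```

Because both inputs lie in [0, 1], a convex combination keeps the combined similarity in the same range, so it can replace the single similarity matrix when the right embedding layer is stacked.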

Convolutional Module on the Left
In this section, we describe a novel model based on convolutional neural networks with adjacent edge attention for learning latent representations of lncRNA-disease node pairs. The overall architecture is shown in Figure 6. We describe the left convolutional module in detail. The left module includes a convolution and activation layer, a max-pooling layer, and a fully connected layer. The embedding matrix $X \in \mathbb{R}^{2 \times (N_l + N_m + N_d)}$ is fed into the convolutional module to learn an original representation of a lncRNA-disease node pair.
For a convolutional layer, the length and the width of a filter are set to $w$ and $h$ respectively, which means that the filter is applied to $w \times h$ features. In order to learn the marginal information of the enhanced embedding matrix $\tilde{X}$, we pad zeros around $\tilde{X}$. Let the number of filters be $n_{conv}$. The convolution filters $W_{conv} \in \mathbb{R}^{w \times h \times n_{conv}}$ are applied to $\tilde{X}$ to obtain the feature maps $Z \in \mathbb{R}^{n_{conv} \times (4-w+1) \times (2+N_l+N_m+N_d-h+1)}$. $\tilde{X}_{ij}$ is the element at the i-th row and j-th column of $\tilde{X}$, and $\tilde{X}_{k,i,j}$ represents the region covered by the k-th filter when it slides to the position $\tilde{X}_{ij}$:
$$\tilde{X}_{k,i,j} = \tilde{X}(i : i+w,\ j : j+h) \tag{8}$$
$$Z_k(i, j) = g\left(W_k * \tilde{X}_{k,i,j} + b(k)\right) \tag{9}$$
$$i \in [1, 4-w+1], \quad j \in [1, 2+N_l+N_m+N_d-h+1], \quad k \in [1, n_{conv}]$$
where $Z_k(i, j)$ is the element at the i-th row and the j-th column of the k-th feature map, $g$ is the ReLU function, a nonlinear activation function [34], $W_k$ is the weight matrix of the k-th filter, and $b$ is a bias vector.
The max-pooling layer down-samples the feature maps $Z_k(i, j)$ and outputs the most important features of each feature map. Given an input $Z_k(i, j)$, the output of the pooling layer is shown as follows, where $w_p$ is the length of a pooling filter and $h_p$ is its width. $V_k(i, j)$ is the element at the $i$-th row and the $j$-th column of the $k$-th pooled feature map. $X$ goes through two convolutional layers and two max-pooling layers, and we obtain the original representation $Z_{left}$ of $l_1$ and $d_2$ from the left convolutional module. Finally, $Z_{left}$ is flattened into a vector $z$, which is fed to the fully connected layer. A softmax layer is used to normalize the output of the fully connected layer, and we have

$$score_l = \mathrm{softmax}(Wz + b) \tag{11}$$

where $W$ is the weight matrix and $b$ is a bias vector. $score_l$ is the association probability distribution over $C$ classes ($C = 2$). $score_l$ is the probability that the lncRNA $l_1$ is associated with the disease $d_2$, and $score_l^{0}$ is the probability that $l_1$ and $d_2$ are not associated. Similarly, the embedding matrix $Y \in \mathbb{R}^{2 \times (N_l + N_m + N_d)}$ is fed to the convolutional module on the right side of the prediction model to learn the network representation $Z_{right}$ of $l_1$ and $d_2$. The $score_r$ of $l_1$ and $d_2$ is obtained when $Z_{right}$ is fed to the fully connected layer and the softmax layer.
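The pipeline described above (zero padding, convolution with ReLU, max-pooling, flattening, and a softmax over two classes) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: it uses a single convolution and pooling stage rather than the two stages of the model, a stride of 1 for convolution, non-overlapping pooling windows, random weights, and toy sizes for $N_l$, $N_m$, and $N_d$. All function and variable names are hypothetical.

```python
import numpy as np

def conv2d_relu(x_padded, filters, bias):
    """Compute Z_k(i, j) = relu(sum(W_k * window) + b(k)) with stride 1."""
    n_conv, w, h = filters.shape
    rows = x_padded.shape[0] - w + 1
    cols = x_padded.shape[1] - h + 1
    z = np.empty((n_conv, rows, cols))
    for k in range(n_conv):
        for i in range(rows):
            for j in range(cols):
                window = x_padded[i:i + w, j:j + h]  # region under the k-th filter
                z[k, i, j] = np.sum(filters[k] * window) + bias[k]
    return np.maximum(z, 0.0)  # ReLU activation

def max_pool(z, wp, hp):
    """Max-pooling with non-overlapping wp x hp windows (an assumption)."""
    n, rows, cols = z.shape
    out_r, out_c = rows // wp, cols // hp
    z = z[:, :out_r * wp, :out_c * hp]  # drop rows/columns that do not fill a window
    return z.reshape(n, out_r, wp, out_c, hp).max(axis=(2, 4))

def softmax(v):
    e = np.exp(v - np.max(v))  # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
n_l, n_m, n_d = 4, 3, 5                      # toy node counts, not the real data
x = rng.random((2, n_l + n_m + n_d))         # embedding matrix X, 2 x (N_l + N_m + N_d)
x_padded = np.pad(x, 1)                      # zero padding: 4 x (2 + N_l + N_m + N_d)

filters = rng.standard_normal((3, 2, 2))     # n_conv = 3 filters with w = h = 2
bias = np.zeros(3)
z = conv2d_relu(x_padded, filters, bias)     # shape (3, 4 - 2 + 1, 14 - 2 + 1)
v = max_pool(z, 2, 2)

flat = v.ravel()                             # flattened vector z of Equation (11)
w_fc = rng.standard_normal((2, flat.size))   # fully connected layer, C = 2 classes
b_fc = np.zeros(2)
score = softmax(w_fc @ flat + b_fc)          # probability distribution over 2 classes
```

The shape of `z` matches the feature-map size given in the text, $n_{conv} \times (4 - w + 1) \times (2 + N_l + N_m + N_d - h + 1)$, which is a quick way to check the padding and filter sizes.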

Convolutional Autoencoder Module on the Right
The lncRNA-disease association, lncRNA-miRNA interaction, and miRNA-disease association matrices are very sparse, so the embedding matrix $Y$ contains many zero elements. An autoencoder based on a convolutional neural network is constructed on the right side of CNNDLP to learn important low-dimensional feature representations of a lncRNA-disease pair. The encoding and decoding strategies are described as follows.

Encoding Strategy
The embedding layer matrix $Y \in \mathbb{R}^{2 \times (N_l + N_m + N_d)}$ is mapped into a low-dimensional feature space through encoding based on a convolutional neural network. $Y^{(n-1)}_{encode,k}$ is input to the $n$-th convolution layer to obtain $Z^{(n)}_{encode,k}$, and $Y^{(n)}_{encode,k}$ is formed after $Z^{(n)}_{encode,k}$ passes through the $n$-th max-pooling layer. They are defined as follows, where $H_e$ is the total number of encoding layers, and $Y^{(0)}_{encode} = Y$. $k$ denotes the $k$-th filter and $n_{encode}$ is the number of filters in the encoding process. $Z^{(n)}_{encode,k}(i, j)$ and $Y^{(n)}_{encode,k}(i, j)$ are the elements at the $i$-th row and the $j$-th column of the corresponding $k$-th feature maps. $W^{(n)}_{encode,k}$ is a weight matrix and $b^{(n)}_{encode}(k)$ is a bias vector.

Decoding Strategy
The output of the $H_e$-th encoding layer, $Y^{(H_e)}_{encode}$, is used as the input of the decoder. Decoding produces a matrix that is similar to $Y$. The decoding process includes both transpose convolution layers and transpose pooling layers, which are respectively defined as follows, where $Y^{(n)}_{decode,k}$ and $Z^{(n)}_{decode,k}$ are the outputs of the $n$-th transpose convolution layer and the $n$-th transpose max-pooling layer, respectively. $H_d$ is the total number of decoding layers, and $n_{decode}$ is the number of filters for decoding. As $Y^{(H_d)}_{decode}$ should be consistent with $Y$, we define the loss function as follows, where $Y_{decode}$ is the output of decoding and $Y$ is the input of encoding; $Y_i$ corresponds to the $i$-th training sample (lncRNA-disease pair), and $T$ is the number of training samples. The $score_r$ of $l_1$ and $d_2$ is obtained after $Y_{encode}$ is fed to the fully connected layer and the softmax layer.
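The requirement that the decoded matrix be consistent with the input can be illustrated with a reconstruction loss. The sketch below assumes a squared-error loss averaged over the $T$ training samples; this is a common choice for autoencoders, and the exact form used by CNNDLP may differ. The function name `reconstruction_loss` and the toy arrays are hypothetical.

```python
import numpy as np

def reconstruction_loss(y_true, y_decoded):
    """Mean over T samples of the squared error between Y_i and its decoding.

    Assumption: squared error is used; the inputs have shape
    (T, 2, N_l + N_m + N_d), one embedding matrix per lncRNA-disease pair.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_decoded = np.asarray(y_decoded, dtype=float)
    if y_true.shape != y_decoded.shape:
        raise ValueError("input and decoded output must have the same shape")
    per_sample = np.sum((y_decoded - y_true) ** 2, axis=(1, 2))
    return per_sample.mean()

# Two toy samples: the first is reconstructed with error 0.5 per entry,
# the second is reconstructed exactly.
y = np.zeros((2, 2, 3))
y_hat = np.zeros((2, 2, 3))
y_hat[0] = 0.5
loss = reconstruction_loss(y, y_hat)
```

A loss of zero means that the decoder reproduces every embedding matrix exactly; in practice, the sparse matrix $Y$ is only approximated.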

Combined Strategy
In our model, the cross-entropy is used as the loss function. The loss functions of the left and right parts of the prediction model are defined as follows,

$$loss_1 = -\sum_{i=1}^{T} \left[ y_{label} \log(score_l) + (1 - y_{label}) \log(1 - score_l) \right] \tag{17}$$

$$loss_2 = -\sum_{i=1}^{T} \left[ y_{label} \log(score_r) + (1 - y_{label}) \log(1 - score_r) \right] \tag{18}$$

where $y_{label}$ denotes the actual association label between a lncRNA and a disease: $y_{label}$ is 1 when the lncRNA is indeed associated with the disease, and 0 otherwise. $T$ is the number of training samples. The final score of our model is a weighted sum of $score_l$ and $score_r$ as follows, where the parameter $\lambda \in (0, 1)$ is used to adjust the importance of $score_l$ and $score_r$.
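The cross-entropy losses of Equations (17) and (18) and the weighted final score can be sketched as follows. The cross-entropy follows the equations above; the combination is assumed to take the form $\lambda \cdot score_l + (1 - \lambda) \cdot score_r$, which is one natural reading of "weighted sum" and is not confirmed by the text. The function names and the small clipping constant `eps` are illustrative additions.

```python
import numpy as np

def cross_entropy(y_label, score, eps=1e-12):
    """Binary cross-entropy summed over the T training samples.

    `eps` clips the scores away from 0 and 1 to avoid log(0); it is a
    numerical safeguard added here, not part of Equations (17) and (18).
    """
    y = np.asarray(y_label, dtype=float)
    s = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    return -np.sum(y * np.log(s) + (1.0 - y) * np.log(1.0 - s))

def combined_score(score_l, score_r, lam):
    """Assumed form of the final score: lam * score_l + (1 - lam) * score_r."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in (0, 1)")
    return lam * np.asarray(score_l) + (1.0 - lam) * np.asarray(score_r)

# Two toy training samples: one associated pair and one unassociated pair.
y_label = np.array([1.0, 0.0])
score_l = np.array([0.8, 0.3])
score_r = np.array([0.4, 0.1])

loss_1 = cross_entropy(y_label, score_l)
loss_2 = cross_entropy(y_label, score_r)
final = combined_score(score_l, score_r, lam=0.5)
```

Under this assumed form, a larger $\lambda$ gives more weight to the left module and a smaller $\lambda$ gives more weight to the autoencoder-based module on the right.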

Conclusions
A novel method based on the convolutional neural network with adjacent edge attention and the convolutional autoencoder, termed CNNDLP, is developed for inferring potential candidate lncRNA-disease associations. Two embedding layers are constructed from the biological perspective to integrate heterogeneous data about lncRNAs, diseases, and miRNAs from multiple sources. We construct the attention mechanism at the adjacent edge level to discriminate the different contributions of edges, and the latent representation of a lncRNA-disease pair is learned from the more informative edges by the left side of CNNDLP's prediction model. After the new types of lncRNA similarity and disease similarity are calculated, the right side of CNNDLP's model captures the complex relationships among these similarities and the lncRNA-disease associations, as well as the topological structures of multiple heterogeneous networks. The novel prediction model based on the convolutional neural network learns both the attention representation and the low-dimensional network representation of a lncRNA-disease pair. The experimental results demonstrate that CNNDLP outperforms the other methods in terms of both AUC and AUPR. In particular, CNNDLP is more useful for biologists because the top part of its ranking list may retrieve more real lncRNA-disease associations. Case studies on three diseases further confirm that CNNDLP is able to discover potential candidate disease-related lncRNAs. CNNDLP may serve as a powerful prioritization tool that screens prospective candidates for the subsequent discovery of actual lncRNA-disease associations through wet-lab experiments.