LDAPred: A Method Based on Information Flow Propagation and a Convolutional Neural Network for the Prediction of Disease-Associated lncRNAs

Long non-coding RNAs (lncRNAs) play a crucial role in the pathogenesis and development of complex diseases. Predicting potential lncRNA–disease associations can improve our understanding of the molecular mechanisms of human diseases and help identify biomarkers for disease diagnosis, treatment, and prevention. Previous research methods have mostly integrated the similarity and association information of lncRNAs and diseases, without considering the topological structure information among these nodes, which is important for predicting lncRNA–disease associations. We propose a method based on information flow propagation and convolutional neural networks, called LDAPred, to predict disease-related lncRNAs. LDAPred not only integrates the similarities, associations, and interactions among lncRNAs, diseases, and miRNAs, but also exploits the topological structures formed by them. In this study, we construct a dual convolutional neural network-based framework that comprises the left and right sides. The embedding layer on the left side is established by utilizing lncRNA, miRNA, and disease-related biological premises. On the right side of the frame, multiple types of similarity, association, and interaction relationships among lncRNAs, diseases, and miRNAs are calculated based on information flow propagation on the bi-layer networks, such as the lncRNA–disease network. They contain the network topological structure and they are learned by the right side of the framework. The experimental results based on five-fold cross-validation indicate that LDAPred performs better than several state-of-the-art methods. Case studies on breast cancer, colon cancer, and osteosarcoma further demonstrate LDAPred’s ability to discover potential lncRNA–disease associations.


Introduction
Many studies have indicated that protein-coding genes account for only ~2% of the human genome, whereas non-protein-coding sequences account for ~98% [1][2][3][4][5]. Non-coding RNAs, especially long non-coding RNAs (lncRNAs), which exceed 200 nucleotides in length, play an important role in various biological processes, such as transcription, translation, epigenetic regulation, splicing, differentiation, the immune response, and cell cycle control. Mutations and dysregulation of lncRNAs are associated with a variety of human diseases [6][7][8][9]. For example, lncRNA PCA3 is a potential biomarker for cancer diagnosis because its expression level in prostate tumors is about 60 times that in normal tissues [10,11]. Therefore, it is necessary to discover more potential lncRNA-disease

Parameter Settings
To achieve the best prediction results, we tuned the parameters through repeated experiments. The filters of the convolutional and pooling layers in both channels were set to a size of 2 × 2, and the convolution process was the same in the two channels. We set the numbers of first-layer filters, $n_{conv1}$ and $n_{conv3}$, of the left and right convolution modules to 8, and the numbers of second-layer filters, $n_{conv2}$ and $n_{conv4}$, of the left and right channels to 16. In the right embedding, the hyperparameter $\gamma$, which balances the proportion of one-hop and two-hop information, was set to 0.2. Finally, we balanced the score ratio of the two paths by setting the parameter $\lambda = 0.7$.
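The settings reported in this section can be collected in one place. The following is a minimal sketch; the class name `LDAPredSettings` and its field names are hypothetical and chosen only for illustration, while the values are those stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LDAPredSettings:
    """Hyperparameter values reported in the text (names are illustrative)."""
    filter_size: tuple = (2, 2)   # convolution and pooling filter size
    n_conv1: int = 8              # left module, first convolution
    n_conv2: int = 16             # left module, second convolution
    n_conv3: int = 8              # right module, first convolution
    n_conv4: int = 16             # right module, second convolution
    gamma: float = 0.2            # balance of one-hop and two-hop information
    lam: float = 0.7              # balance of the two prediction paths


settings = LDAPredSettings()
```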

Evaluation Metrics
To evaluate the performance of the prediction model, we used five-fold cross-validation. First, the 2687 known lncRNA-disease associations were divided into five groups, four of which were used as the training set and one as the test set. Second, we removed the associations in the test set when calculating the similarity of lncRNAs. We regarded the lncRNA-disease pairs with known associations in the test set as positive samples and the pairs without any known association as negative samples.
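The splitting step can be sketched as follows. This is an illustration only: the function name `five_fold_splits`, the fixed random seed, and the toy association list are assumptions, and the source does not state how the associations were shuffled or assigned to groups.

```python
import random


def five_fold_splits(associations, n_folds=5, seed=0):
    """Yield (train, test) lists; each fold serves as the test set exactly once."""
    shuffled = list(associations)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled pairs into n_folds groups of nearly equal size.
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [pair for m, fold in enumerate(folds) if m != k for pair in fold]
        yield train, test


# Toy stand-in for the 2687 known (lncRNA, disease) associations.
associations = [(i, i % 7) for i in range(2687)]
splits = list(five_fold_splits(associations))
```

In each round, the test pairs would also be set to 0 in the association matrix before the lncRNA similarity is recomputed, as described above.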
After our prediction model had estimated the association scores of the test samples, the samples were ranked by score in descending order. The higher the positive samples are ranked, the better the prediction performance of the model. We measured the global performance of our prediction model by drawing the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC). The true positive rate (TPR) and false positive rate (FPR) are defined as follows:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{TN + FP}$$

where TP is the number of positive samples identified as positive, TN is the number of negative samples identified as negative, FN is the number of positive samples identified as negative, and FP is the number of negative samples identified as positive. Finally, the average of the AUCs over all diseases was taken to represent the performance of the predictive model. The higher the value, the better the global performance of the model.

Because the known lncRNA-disease associations (positive samples) are far fewer than the unassociated or unobserved pairs (negative samples), the classes are severely imbalanced. Therefore, we also used the precision-recall (PR) curve to measure the overall performance of the model. The larger the area under the PR curve (AUPR), the better the prediction performance. The precision and recall are calculated as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

Biological experiments are costly and time-consuming, and they are limited by equipment precision and human error; thus, biologists usually select the top-ranked candidate lncRNAs for experimental verification. Therefore, we also calculated the recall rate of the top k (30, 60, 90, ..., 240) samples, i.e., the fraction of all positive samples that are ranked within the top k, as another performance index.
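The metrics in this section can be illustrated with a short sketch. The function names, the score threshold of 0.5, and the toy labels and scores are assumptions made for the example; they are not taken from the study.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when samples scoring >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, precision, recall); undefined ratios are reported as 0.0."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tpr, fpr, precision, tpr  # recall has the same definition as TPR


def recall_at_k(labels, scores, k):
    """Fraction of all positive samples ranked within the top k by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    hits = sum(y for _, y in ranked[:k])
    return hits / positives if positives else 0.0


# Toy data for one disease: 1 = known association, 0 = no known association.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

tp, fp, tn, fn = confusion_counts(labels, scores, threshold=0.5)
tpr, fpr, precision, recall = rates(tp, fp, tn, fn)
```

Sweeping the threshold over all observed scores yields the points of the ROC curve (FPR against TPR) and of the PR curve (recall against precision).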

Comparison with Other Methods
To reveal the advantages of considering network topology information in lncRNA-disease association prediction and to demonstrate the strong performance of our model, we selected four recent lncRNA-disease association prediction methods for comparison, namely SIMCLDA [25], Ping's method [26], MFLDA [27], and LDAP [16]. To make the comparison fair, we used the optimal values recommended in the corresponding articles as the hyperparameters of the four methods.
As shown in Figure 1a, our method, LDAPred, achieved the best average performance over all 405 diseases, with an average AUC of 0.963. This is 21.8% higher than that of SIMCLDA, 9.3% higher than that of Ping's method, 34% higher than that of MFLDA, and 10.1% higher than that of LDAP. We also list the AUCs of the five methods for 10 well-characterized diseases (Table 1), each of which is associated with at least 15 lncRNAs. Table 1 shows that LDAPred performs best for 8 out of the 10 diseases. Bold values in Table 1 indicate the higher AUCs. Both Ping's method and LDAP achieved good performance with similar AUCs, as they both use similarities calculated from different perspectives of lncRNAs and diseases. MFLDA performs the worst of the five methods because it does not consider the similarities of diseases and lncRNAs during the prediction process. LDAPred performs the best of the five methods because it considers the network topology among lncRNAs, diseases, and miRNAs, and learns deep representations of these topologies.
As shown in Figure 1b and Table 2, the average PR curve of LDAPred over the 405 diseases was higher than those of the other four methods. The average area under the PR curve (AUPR) of our method is 0.219, which is 19%, 6.7%, 18%, and 9.2% higher than those of SIMCLDA, Ping's method, MFLDA, and LDAP, respectively. Of the 10 well-characterized diseases, LDAPred performed the best for 6 diseases. Bold values in Table 2 indicate the higher AUPRs.
In addition, to assess whether the performance of LDAPred over all 405 diseases is significantly better than those of the other four methods, we performed a paired Wilcoxon test. The statistical results are shown in Table 3. For both AUC and AUPR, LDAPred performed significantly better than all the other methods at a significance level of 0.05.
The higher the recall rate of the top k lncRNAs, the greater the number of correctly identified lncRNAs that are related to the disease. Figure 2 shows the average recall rate for the top k samples over all 405 diseases. LDAPred is superior to the other methods at the different k values, with recall rates of 86.4% in the top 30, 92.8% in the top 60, 95.1% in the top 90, and 96.3% in the top 120. The recall rate of Ping's method is very close to that of LDAP. The former reaches 68.9%, 81.2%, 87.5%, and 92.7% in the top 30, 60, 90, and 120, whereas the latter reaches 68.5%, 81.7%, 88.0%, and 93.3%, respectively. SIMCLDA reaches 49.3% in the top 30, 63.0% in the top 60, 74.1% in the top 90, and 80.3% in the top 120, which is lower than Ping's method and LDAP. Among the five methods, MFLDA always shows the worst performance, with recall rates of 42.0%, 53.9%, 60.9%, and 65.5%, respectively.

Case Studies on Breast Cancer, Colon Cancer, and Osteosarcoma
To further demonstrate LDAPred's ability to detect disease-related lncRNAs, we used two databases (Lnc2Cancer and LncRNADisease) and the related literature to validate the candidate lncRNAs for breast cancer, colon cancer, and osteosarcoma. The top 15 candidate lncRNAs associated with each of these cancers were analysed (Table 4). Lnc2Cancer is a manually curated database of experimentally supported lncRNA associations with various human cancers [28]. It contains 1057 associations between 531 lncRNAs and 86 cancers, manually extracted from more than 1500 published papers, including the expression pattern (up- or down-regulated) of each lncRNA in cancer [29]. The LncRNADisease 2.0 database is not only a resource that curates experimentally supported lncRNA-disease association data, but also a platform that integrates tools for predicting novel lncRNA-disease associations. We used the labels lncRNADisease and lncRNADisease_P to denote associations supported by experiments and by prediction, respectively. As shown in Table 4, Lnc2Cancer contains 14 of the candidate lncRNAs and lncRNADisease contains 13 of the candidate lncRNAs, confirming these associations. lncRNADisease_P contains 23 of the candidate lncRNAs, indicating that these lncRNAs are more likely to be associated with the diseases.
Three further candidates that were reported in previous studies are marked as "literature" in Table 4. Among them, the expression of LATS2 is often down-regulated in breast cancer, and the oncogenic function of LINC00673 is determined in part by inhibiting the expression of KLF2 and LATS2 [30]. MEG8 can directly interact with the epigenetic mechanism and may have a predictive effect on the prognosis of breast cancer [31]. In the PWAR5 prediction experiment, the factors that affect the mother cell tumor also affect breast cancer. These three candidates may be involved in the progression of breast cancer. Another candidate is marked as "GEO" in Table 4. The GEO Dataset is a relatively comprehensive public gene expression database, and it indicates that NPSR1-AS1 is associated with colon cancer recurrence [32]. The remaining four candidates are labeled as "Unconfirmed", indicating that they appear neither in the databases nor in the related literature. The case studies of these three diseases confirm that LDAPred has a strong ability to detect potential disease-associated lncRNAs.

Dataset
To predict the relationships between lncRNAs and diseases, we needed to integrate the attributes and characteristics of the lncRNA, miRNA, and disease nodes. Therefore, we downloaded 2687 lncRNA-disease associations from the LncRNADisease [33] and Lnc2Cancer [28] databases and from GeneRIF [34], a database of lncRNA functional descriptions. We calculated the similarity of 249 lncRNAs based on the diseases associated with them. We obtained the interaction data of 1002 lncRNAs and miRNAs from starBase v2.0, an open source platform containing multiple types of RNA interactions [35]. We downloaded 13,559 miRNA-disease associations from HMDD v1.0 [36], a database of experimentally supported human miRNA-disease associations. We calculated the similarity of 495 miRNAs based on the diseases associated with them. Finally, we downloaded the similarity data of 405 diseases from DincRNA v1.0 [37], which were calculated based on the directed acyclic graphs of the diseases.

Semantic Similarity of Diseases
A disease can be represented as a directed acyclic graph (DAG), which can be obtained from Medical Subject Headings (MeSH) and which includes all the annotation terms related to the disease. Studies have shown that the more the DAGs of two diseases have in common, the more similar the two diseases are. Wang et al. [38] measured the semantic similarity between diseases according to their DAGs. In this study, we used this precomputed semantic similarity. We used the matrix $D \in \mathbb{R}^{n_d \times n_d}$ to represent the similarity of the diseases, where $n_d$ is the number of diseases, $D_{ij}$ denotes the similarity between diseases $d_i$ and $d_j$, and the similarity values range between 0 and 1.

Similarity of lncRNAs
The more similar the functions of two lncRNAs, the more similar their related diseases. Therefore, we calculated the similarity of two lncRNAs by calculating the similarity of the two sets of lncRNA-associated diseases. For example, if lncRNA $l_a$ is associated with the disease set $S_a = \{d_1, d_3, d_4, d_6\}$ and lncRNA $l_b$ is associated with the disease set $S_b = \{d_1, d_2, d_3\}$, then the similarity between $S_a$ and $S_b$ is calculated using the method of Xuan et al. [22], and the result is taken as the similarity between $l_a$ and $l_b$. We used a similarity matrix $L \in \mathbb{R}^{n_l \times n_l}$ to represent the similarity of lncRNAs, where $n_l$ is the number of lncRNAs, and $L_{ij}$ represents the similarity between lncRNA $l_i$ and lncRNA $l_j$, with similarity values ranging between 0 and 1.
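As an illustration of how a similarity between two disease sets can be derived from pairwise disease similarities, the sketch below uses a best-match average. This form is an assumption made for the example and is not necessarily the exact formulation of Xuan et al. [22]; the function name `set_similarity` and the toy similarity values are likewise hypothetical.

```python
def set_similarity(set_a, set_b, disease_sim):
    """Best-match average between two disease sets (an assumed formulation).

    Each disease is matched to its most similar disease in the other set, and
    the best-match values are averaged over both sets.
    """
    if not set_a or not set_b:
        return 0.0
    best_a = sum(max(disease_sim[d][e] for e in set_b) for d in set_a)
    best_b = sum(max(disease_sim[e][d] for d in set_a) for e in set_b)
    return (best_a + best_b) / (len(set_a) + len(set_b))


# Toy semantic similarity matrix for three diseases (0-based indices).
disease_sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]

similarity_ab = set_similarity([0, 1], [1, 2], disease_sim)
similarity_aa = set_similarity([0, 1], [0, 1], disease_sim)
```

With this choice, two lncRNAs with identical disease sets receive a similarity of 1, and the value stays within the range of the underlying disease similarities.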

Similarity of miRNAs
Similar to the lncRNA similarity, the miRNA similarity was calculated based on the associated diseases. We used the matrix $M \in \mathbb{R}^{n_m \times n_m}$ to represent the similarity of miRNAs, where $n_m$ is the number of miRNAs, $M_{ij}$ represents the similarity between miRNA $m_i$ and miRNA $m_j$, and the similarity values range between 0 and 1.

Interaction Matrix
In this study, heterogeneous data resources were integrated to establish three matrices: the lncRNA-disease association matrix $A \in \mathbb{R}^{n_l \times n_d}$, the lncRNA-miRNA interaction matrix $B \in \mathbb{R}^{n_l \times n_m}$, and the miRNA-disease association matrix $C \in \mathbb{R}^{n_m \times n_d}$, where $n_l$, $n_m$, and $n_d$ are the numbers of lncRNAs, miRNAs, and diseases, respectively. In matrix $A$, if lncRNA $l_i$ is associated with disease $d_j$, then $A_{ij}$ is 1; otherwise, $A_{ij}$ is 0. In matrix $B$, if lncRNA $l_i$ interacts with miRNA $m_j$, then $B_{ij}$ is 1; otherwise, $B_{ij}$ is 0. In matrix $C$, if miRNA $m_i$ is associated with disease $d_j$, then $C_{ij}$ is 1; otherwise, $C_{ij}$ is 0.
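A binary matrix of this kind can be built from a list of known pairs, as in the following minimal sketch. The helper name `association_matrix` and the toy identifiers are hypothetical.

```python
def association_matrix(pairs, row_ids, col_ids):
    """Return a 0/1 matrix whose entry (i, j) is 1 if (row_ids[i], col_ids[j]) is a known pair."""
    row_index = {name: i for i, name in enumerate(row_ids)}
    col_index = {name: j for j, name in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for row_name, col_name in pairs:
        matrix[row_index[row_name]][col_index[col_name]] = 1
    return matrix


# Toy example: two lncRNAs, three diseases, two known associations.
lncrnas = ["l1", "l2"]
diseases = ["d1", "d2", "d3"]
A = association_matrix([("l1", "d1"), ("l2", "d3")], lncrnas, diseases)
```

The same helper applies to the lncRNA-miRNA and miRNA-disease matrices by changing the row and column identifier lists.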

LncRNA-Disease Association Prediction Model Based on a Dual Convolutional Neural Network
We constructed a dual convolutional neural network (CNN) model to predict lncRNA-disease associations. The left side uses the original information of the node pair formed by lncRNA $l_i$ and disease $d_j$ to learn its original representation. The right side learns the path association representation of $l_i$ and $d_j$ from the network topology and information flow propagation. Each representation is then passed through a CNN and a fully connected layer, and the two resulting scores are combined to obtain the final association prediction score of $l_i$ and $d_j$.

Establishment of the Left Feature Matrix
We use lncRNA $l_2$ and disease $d_3$ as an example to describe the establishment of the feature matrix. First, if $l_2$ and $d_3$ are connected with more of the same lncRNAs, then $l_2$ and $d_3$ are more likely to be associated. Therefore, we took the similarity vector $s_1$ between $l_2$ and all lncRNAs, which is the second row of matrix $L$, and the association vector $s_2$ between $d_3$ and all lncRNAs, which is the third column of matrix $A$, and combined them. Second, if $l_2$ and $d_3$ are related to more of the same diseases, then $l_2$ and $d_3$ are more likely to be associated. Therefore, we combined the second row of matrix $A$, which is the association vector $s_3$ between $l_2$ and all diseases, with the third row of matrix $D$, which is the similarity vector $s_4$ between $d_3$ and all diseases. Third, if $l_2$ and $d_3$ are associated with more of the same miRNAs, then $l_2$ and $d_3$ are more likely to be associated. Therefore, we took the vector $s_5$ of interactions between $l_2$ and all miRNAs, i.e., the second row of matrix $B$, and the vector $s_6$ of associations between $d_3$ and all miRNAs, i.e., the third column of matrix $C$. Finally, we stitched these vector pairs into the feature matrix $S = \{s_1, s_2, s_3, s_4, s_5, s_6\} \in \mathbb{R}^{2 \times (n_d + n_l + n_m)}$ (Figure 3).
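The assembly of the left feature matrix can be sketched as follows. The function name `left_feature_matrix`, the 0-based indices, the toy matrices, and the left-to-right order of the three blocks (lncRNA, disease, miRNA) are assumptions made for illustration; the text fixes only which vectors are paired.

```python
def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def left_feature_matrix(i, j, L, A, D, B, C):
    """Build the 2 x (n_l + n_d + n_m) matrix S for lncRNA i and disease j (0-based)."""
    top = L[i] + A[i] + B[i]                      # s1, s3, s5: the lncRNA's view
    bottom = column(A, j) + D[j] + column(C, j)   # s2, s4, s6: the disease's view
    return [top, bottom]


# Toy data: n_l = 2 lncRNAs, n_d = 3 diseases, n_m = 2 miRNAs.
L = [[1.0, 0.3], [0.3, 1.0]]                              # lncRNA similarity
A = [[1, 0, 0], [0, 0, 1]]                                # lncRNA-disease associations
D = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]   # disease similarity
B = [[1, 0], [0, 1]]                                      # lncRNA-miRNA interactions
C = [[0, 1, 0], [1, 0, 1]]                                # miRNA-disease associations

S = left_feature_matrix(1, 2, L, A, D, B, C)  # l_2 and d_3 in 1-based notation
```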

Establishment of the Right Side Topological Information Matrix
Inspired by Chen et al. [14], we constructed a comprehensive matrix $T = \{t_1, t_2, t_3, t_4, t_5, t_6\} \in \mathbb{R}^{2 \times (n_d + n_l + n_m)}$, which further considers the topological structure of the lncRNA, miRNA, and disease-related bi-layer networks via information flow propagation.
In a network comprising lncRNAs, $L$ represents the original information between lncRNA nodes, i.e., the one-hop similarity information, and $L \times L$ represents the similarity of lncRNA nodes after two hops. $\gamma$ is a hyperparameter that balances the proportion of one-hop and two-hop information and ranges from 0 to 1. $L'$ integrates the one-hop and two-hop similarity information in the path by combining $L$ and $L \times L$ in the proportions set by $\gamma$, and $L'_{ij}$ represents the similarity of lncRNAs $l_i$ and $l_j$ after integrating the topological information. Similarly, $D'$ integrates the one-hop and two-hop similarity information of the diseases, and $D'_{ij}$ is the similarity between diseases $d_i$ and $d_j$ after integrating the information flow.
In a network comprising lncRNAs and diseases, $A$ represents the one-hop information between lncRNA and disease node pairs, and $(L \cdot A + A \cdot D)$ represents the degree of association between lncRNA and disease node pairs after two hops. $\gamma$ again balances the proportion of one-hop and two-hop information. $A'$ represents the association after integrating the path information, and $A'_{ij}$ is the degree of association between lncRNA $l_i$ and disease $d_j$ after two hops. $A'$ is calculated as shown in Equation (5). Similarly, the association information between diseases and miRNAs is expressed by $(C^T)'$, and its calculation is expressed by Equation (6). $(A^T)'$ is the transpose of $A'$ and indicates the association between diseases and lncRNAs obtained by information flow propagation on the bi-layer network; Equation (7) indicates the calculation process.
In the network composed of lncRNAs and miRNAs, $B$ represents the original interaction information between lncRNA and miRNA node pairs, i.e., the one-hop information, and $(L \cdot B + B \cdot M)$ represents the degree of association after two hops. $\gamma$ is used to balance the proportion of one-hop and two-hop information. The integrated one-hop and two-hop information is represented by $B'$, and $B'_{ij}$ represents the degree of association between lncRNA $l_i$ and miRNA $m_j$ with the bi-layer network information.
Finally, we took the second row of matrix $L'$ as vector $t_1$, the third row of matrix $(A^T)'$ as vector $t_2$, the second row of matrix $A'$ as vector $t_3$, the third row of matrix $D'$ as vector $t_4$, the second row of matrix $B'$ as vector $t_5$, and the third row of matrix $(C^T)'$ as vector $t_6$. We spliced these vectors into the path feature matrix $T = \{t_1, t_2, t_3, t_4, t_5, t_6\} \in \mathbb{R}^{2 \times (n_d + n_l + n_m)}$, which serves as the right embedding matrix (Figure 4).
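The propagation step can be sketched in a few lines. The weighting used here, $X' = \gamma X + (1 - \gamma) \cdot (\text{two-hop term})$, is an assumption made for illustration: the text states only that $\gamma$ balances the one-hop and two-hop information, so the exact form of the original equations may differ. The helper names and toy matrices are hypothetical, and using $D \times D$ as the two-hop term for diseases follows the lncRNA case by analogy.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def blend(one_hop, two_hop, gamma):
    """Assumed weighting: gamma * one_hop + (1 - gamma) * two_hop."""
    return [[gamma * a + (1.0 - gamma) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(one_hop, two_hop)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def propagate(L, D, A, gamma=0.2):
    """Return L', D', A', and (A^T)' for similarity matrices L, D and associations A."""
    L_prime = blend(L, matmul(L, L), gamma)                      # two-hop: L x L
    D_prime = blend(D, matmul(D, D), gamma)                      # two-hop: D x D
    A_prime = blend(A, add(matmul(L, A), matmul(A, D)), gamma)   # two-hop: L.A + A.D
    return L_prime, D_prime, A_prime, transpose(A_prime)


# Toy data: two lncRNAs and two diseases.
L = [[1.0, 0.5], [0.5, 1.0]]
D = [[1.0, 0.2], [0.2, 1.0]]
A = [[1, 0], [0, 1]]
L_prime, D_prime, A_prime, AT_prime = propagate(L, D, A, gamma=0.2)
```

The matrix $B'$ follows the same pattern with the two-hop term $L \cdot B + B \cdot M$.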

Convolution Module
Because the left and right convolution processes are similar, we describe only the left convolution process in detail. $S = \{s_1, s_2, s_3, s_4, s_5, s_6\} \in \mathbb{R}^{2 \times (n_d + n_l + n_m)}$ was used as the left input of the CNN module. In the first convolution, the length and width of the convolution filter were set to $w_f$ and $w_d$, respectively, and the number of convolution filters was set to $n_{conv1}$, so that the filters can be expressed as $W_{conv1} \in \mathbb{R}^{w_f \times w_d \times n_{conv1}}$. We applied the filters $W_{conv1}$ to $S$. In addition, to fully learn the edge information, we applied wide convolution by padding zeros before convolution. $S_{k,i,j}$ and $M_{conv1,k}$ are defined as follows:

$$S_{k,i,j} = S(i : i + w_f,\; j : j + w_d)$$

$$M_{conv1,k}(i, j) = f\left(W_{conv1}(:, :, k) * S_{k,i,j} + b_{conv1}(k)\right)$$

where $S(i, j)$ is the element in the $i$th row and $j$th column of the embedding layer $S$, and $S_{k,i,j}$ is the region within the filter when the $k$th filter slides to position $S(i, j)$. $f$ is the rectified linear unit (ReLU) activation function, and $b_{conv1} \in \mathbb{R}^{n_{conv1}}$ is the bias term. The output feature after convolution is $M_{conv1} \in \mathbb{R}^{n_{conv1} \times (S_w + 3 - w_d) \times (S_l + 3 - w_f)}$, where $S_w$ and $S_l$ denote the width and length of $S$.
In the pooling layer, a max pooling operation is applied to $M_{conv1}$; i.e., the output for each sub-area is the maximum value within that sub-area. The pooling layer reduces the size of the feature map output by the convolution and the number of parameters of the model. After two rounds of convolution and pooling, we obtained the final representation $M_{conv2}(k)$, $k \in [1, n_{conv2}]$, where $k$ indexes the filters and $n_{conv2}$ is the number of filters in the second convolution.
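One round of wide convolution, ReLU, and max pooling for a single filter can be sketched as follows. The stride of 1, the padding of one zero on every side, the non-overlapping pooling windows, the dropping of incomplete windows, and the toy filter values are assumptions made for illustration; the function names are hypothetical.

```python
def zero_pad(matrix, pad):
    """Surround a matrix with `pad` rows and columns of zeros on every side."""
    width = len(matrix[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in matrix:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded.extend([0.0] * width for _ in range(pad))
    return padded


def conv_relu(matrix, kernel, bias, pad=1):
    """Wide convolution with one filter (stride 1), followed by ReLU."""
    padded = zero_pad(matrix, pad)
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(padded) - kh + 1):
        row = []
        for j in range(len(padded[0]) - kw + 1):
            total = bias
            for u in range(kh):
                for v in range(kw):
                    total += kernel[u][v] * padded[i + u][j + v]
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out


def max_pool(feature, size=2):
    """Max pooling over non-overlapping size x size windows; incomplete windows are dropped."""
    pooled = []
    for i in range(0, len(feature) - size + 1, size):
        row = []
        for j in range(0, len(feature[0]) - size + 1, size):
            row.append(max(feature[i + u][j + v]
                           for u in range(size) for v in range(size)))
        pooled.append(row)
    return pooled


# Toy 2 x 3 input standing in for S, and one 2 x 2 filter.
S = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
kernel = [[1.0, 0.0], [0.0, -1.0]]
feature = conv_relu(S, kernel, bias=0.0)
pooled = max_pool(feature)
```

For a 2 × 3 input and a 2 × 2 filter, the convolution output has size 3 × 4, which agrees with the $S + 3 - w$ output dimensions given for the wide convolution.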
Finally, we flattened $M_{conv2}$ and fed it to the fully connected layer to obtain the association prediction score of $l_2$ and $d_3$, denoted $score_1$. Here, $H$ is the weight matrix between the fully connected layer and the output layer, and $score_1 \in \mathbb{R}^{2 \times 1}$ contains the associated score and the unassociated score. We used $score_1$ as the association score of $l_2$ and $d_3$ predicted by the left side.
Similarly, we employed $T = \{t_1, t_2, t_3, t_4, t_5, t_6\} \in \mathbb{R}^{2 \times (n_d + n_l + n_m)}$ as the input to the right CNN module and obtained the output of its second pooling layer, where $n_{conv4}$ is the number of filters. The association prediction score of $l_2$ and $d_3$, denoted $score_2$, was then obtained through the fully connected layer, where $K$ is the weight matrix between the fully connected layer and the output layer.

Dual Combination Strategy
To fully utilize the two prediction scores, we designed a dual combination strategy to train the model and obtain the final prediction score. We used $\lambda \in [0, 1]$ to balance the weights of the two paths when combining $score_1$ and $score_2$ into the final predicted score. The loss functions of the left and right CNNs are defined as

$$loss_1 = -\frac{1}{M} \sum_{i=1}^{M} \left[ y_{label} \times \log a + (1 - y_{label}) \times \log(1 - a) \right], \qquad a = \frac{e^{score_1}}{\sum_{j=1}^{2} e^{score_1(j)}}$$

$$loss_2 = -\frac{1}{M} \sum_{i=1}^{M} \left[ y_{label} \times \log b + (1 - y_{label}) \times \log(1 - b) \right], \qquad b = \frac{e^{score_2}}{\sum_{j=1}^{2} e^{score_2(j)}}$$

where $y_{label}$ represents the actual association label between the lncRNA and the disease: if the lncRNA is associated with the disease, $y_{label}$ is 1; otherwise, $y_{label}$ is 0. $score_1$ and $score_2$ are the association scores of $l_2$ and $d_3$ from the left and right sides, $M$ represents the number of training samples, and $a$ and $b$ represent the probabilities obtained by the Softmax function. The dual convolution and combining processes are displayed in Figure 5. The top 50 potential lncRNA candidates for 405 diseases are listed in supplementary Table S1.
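The Softmax probabilities, the cross-entropy loss, and a weighted combination of the two paths can be sketched as follows. Two points are assumptions made for illustration: the first entry of each two-element score is treated as the "associated" component, and the final score is taken as $\lambda a + (1 - \lambda) b$, since the text states only that $\lambda$ balances the two paths. All function names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, labels):
    """Mean binary cross-entropy over M samples, given P(associated) for each sample."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


def combine(a, b, lam=0.7):
    """Assumed combination of the two paths: lam * a + (1 - lam) * b."""
    return lam * a + (1.0 - lam) * b


# Toy two-element outputs of the left and right fully connected layers.
score_1 = [2.0, 0.0]
score_2 = [0.0, 0.0]
a = softmax(score_1)[0]   # probability of association from the left side
b = softmax(score_2)[0]   # probability of association from the right side
final_score = combine(a, b, lam=0.7)
loss_right = cross_entropy([b], [1])
```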

Conclusions
LDAPred, which is a new method based on a dual convolutional neural network, was developed to predict the potential associations between lncRNAs and diseases. According to the biological premise that lncRNAs are likely to possess associations with diseases, the embedding layer was established from a biological perspective. The left and right embedding layers capture the original similarities, associations, and interactions among lncRNAs, miRNAs, and diseases, as well as the topological structures of bi-layer networks. The original representation of lncRNA-disease pairs and their network representations were learned by the new framework based on dual convolutional neural networks and information flow propagation. Cross-validation results for 405 diseases and case studies on three diseases indicated that LDAPred has a strong ability to predict potential associations between lncRNAs and diseases.
Supplementary Materials: Supplementary Materials can be found at www.mdpi.com/xxx/s1. Table S1: The top 50 potential lncRNA candidates for 405 diseases.
Author Contributions: P.X. and L.J. conceived the prediction method; L.J. wrote the paper; N.S. and X.L. developed the computer programs; P.X., J.L., and T.Z. analyzed the results and revised the paper.
