Integration of Neighbor Topologies Based on Meta-Paths and Node Attributes for Predicting Drug-Related Diseases

Identifying new disease indications for existing drugs can help facilitate drug development and reduce development cost. The previous drug–disease association prediction methods focused on data about drugs and diseases from multiple sources. However, they did not deeply integrate the neighbor topological information of drug and disease nodes from various meta-path perspectives. We propose a prediction method called NAPred to encode and integrate meta-path-level neighbor topologies, multiple kinds of drug attributes, and drug-related and disease-related similarities and associations. The multiple kinds of similarities between drugs reflect the degrees of similarity between two drugs from different perspectives. Therefore, we constructed three drug–disease heterogeneous networks according to these drug similarities, respectively. A learning framework based on fully connected neural networks and a convolutional neural network with an attention mechanism is proposed to learn information of the neighbor nodes of a pair of drug and disease nodes. The multiple neighbor sets composed of different kinds of nodes were formed respectively based on meta-paths with different semantics and different scales. We established the attention mechanisms at the neighbor-scale level and at the neighbor topology level to learn enhanced neighbor feature representations and enhanced neighbor topological representations. A convolutional-autoencoder-based module is proposed to encode the attributes of the drug–disease pair in three heterogeneous networks. Extensive experimental results indicated that NAPred outperformed several state-of-the-art methods for drug–disease association prediction, and the improved recall rates demonstrated that NAPred was able to retrieve more actual drug–disease associations from the top-ranked candidates. Case studies on five drugs further demonstrated the ability of NAPred to identify potential drug-related disease candidates.


Introduction
The process of producing a new medicine is typically lengthy, expensive, and fraught with failure; it may require more than 10 years and cost between USD 0.8 billion and USD 1.5 billion on average [1][2][3][4][5]. Therefore, methods that reduce the time and cost of developing new medicines are needed. Because approved drugs have already passed clinical trials, they have favorable safety profiles. In contrast to developing a medicine from scratch, identifying new indications for existing drugs (drug repositioning) [6] can effectively reduce research and development costs and accelerate drug development [7][8][9].
Drug candidates can be further screened for wet laboratory validation using computational predictions of the relationship between licensed drugs and diseases [10,11]. Several approaches for predicting drug-related diseases that have been reported can be classified into two categories. The first category of methods predicts the disease indications for drugs based on the integration of multiple kinds of information about the drugs and diseases. A couple of methods integrate the known drug-disease associations, the drug similarities, and the disease similarities [12,13]. They estimate the association possibilities between drugs and diseases by utilizing a logistic regression classifier and matrix decomposition with a similarity constraint. Wang et al. employed kernel functions to incorporate drug and disease similarities and applied the support vector machine approach to forecast drug-disease correlations [14]. Liang et al. applied sparse subspace learning and graph Laplacian regularization to combine multiple types of drug characteristics to predict drug indications [15]. To infer drug-disease associations, relevant data from drugs and diseases are utilized or combined in these strategies. However, the above-mentioned approaches cannot consider topological information in a network to demonstrate the potential use of a specific drug.
The second category primarily bases prediction on the topology of the network. For example, heterogeneous network models based on diseases, drugs, and targets are used to infer drug candidates using iterative algorithms [16]. Several methods employ random walk algorithms to predict possible drug-disease associations on drug similarity networks, disease similarity networks, and integrated drug-disease heterogeneous networks [17][18][19][20][21]. However, because these methods do not consider the attribute information of drug and disease nodes, they cannot learn deep feature representations of the nodes. Furthermore, these shallow-model-based approaches cannot extract potentially complicated relationships between drug and disease nodes.
Deep learning technologies have been widely utilized for the prediction of miRNA-disease associations [22] and disease-related lncRNAs [23,24]. Owing to the development of deep learning, recent approaches identify the indications of drug candidates more accurately by integrating multiple sources of drug- and disease-relevant information. For the prediction of drug-related diseases, models employing graph convolutional and fully connected autoencoders with attention mechanisms have been used [25]. Xuan et al. [26] proposed a prediction model comprising a convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network. Jiang et al. devised a module for forecasting drug-disease associations by employing Gaussian interaction profile kernels and autoencoders [27]. Deep relationships between drugs and diseases can be extracted more easily using deep learning models. At the node pair level, however, the present deep learning approaches cannot combine and incorporate the drug-disease neighbor topology and attribute information. In addition, when the neighbor topology information in the three heterogeneous networks is captured, the sets of neighbor nodes obtained from multi-scale meta-paths provide important auxiliary information.
Herein, we propose and develop NAPred, a predictive model for capturing, encoding, and learning the neighbor topology and attribute representation of node pairs from diverse heterogeneous networks. The primary contributions of our proposed model are as follows:
• Three drug-disease heterogeneous networks were constructed, each with a different aspect of drug similarity, to facilitate the acquisition of topological information regarding drug and disease nodes from different perspectives. To construct sets of different types of neighbors of the nodes, multi-scale meta-path sets of drug or disease nodes were established;
• We present an approach based on fully connected and convolutional neural networks with attention mechanisms for learning topological information regarding the same type of neighbors for drug and disease nodes. Multiple neighbor feature representations extracted from drug and disease nodes were adaptively combined via a neighbor-scale-level attention mechanism;
• Because different types of neighbor topological features contribute differently to drug-disease association prediction, we developed a neighbor-topology-level attention mechanism to distinguish these contributions and obtain the neighbor topological representations of the nodes;
• The attribute information of the node pairs was extracted from the three heterogeneous networks using the proposed embedding mechanism and encoded using a convolutional autoencoder (CAE). The premise of this embedding mechanism is that drug-disease pairs are more likely to be associated with each other if they exhibit similarities or associations with more typical drugs or diseases.

Evaluation Metrics
The performances of all prediction models were analyzed and compared using five-fold cross-validation. Drug-disease pairs with known associations were treated as positive samples, and pairs without known associations as negative samples. In each fold of the cross-validation, the training set comprised 4/5 of the positive samples and 4/5 of the randomly selected negative samples. The remaining 1/5 of the positive samples, together with all negative samples, were used for testing. The association scores predicted for the test samples were ranked; the higher the positive samples ranked, the better the prediction performance.
Several evaluation metrics were used in this study, i.e., the true positive rate (TPR), false positive rate (FPR), receiver operating characteristic (ROC) curve, area under the ROC curve (AUC) [28], precision-recall (PR) curve, area under the PR (AUPR) curve [29], and recall at various top-k. The performances of all models in the cross-validation were compared based on the average AUC and AUPR.
The AUC is an accepted appraisal metric for comparing algorithms and probabilistic estimates [30]. The TPR and FPR at various thresholds yield the ROC curve. A sample was regarded as positive if the predicted association score of the drug-disease pair exceeded a threshold $\theta$; otherwise, it was considered negative. The TPR is the fraction of correctly detected positive samples among all positive samples, and the FPR is the fraction of negative samples incorrectly detected as positive among all negative samples:
$$TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN},$$
where TP and FN represent the numbers of positive samples correctly classified as positive and incorrectly classified as negative, respectively, and TN and FP indicate the numbers of negative samples correctly categorized as negative and incorrectly categorized as positive, respectively [31,32]. Because known drug-disease associations are far fewer than unknown pairs, the class distribution is uneven, and the AUPR is more informative than the AUC for assessing the predictive performance [29]. Precision and recall were determined as follows:
$$precision = \frac{TP}{TP + FP}, \quad recall = \frac{TP}{TP + FN},$$
where precision indicates the rate of TP samples among those predicted to be positive and recall expresses the rate of positive samples accurately recognized among the total positive samples. The AUC and AUPR were calculated for each fold of the cross-validation, and the final score is the average of the five results [33].
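The threshold-based definitions of TPR, FPR, precision, and recall can be illustrated with a short Python sketch. This is a minimal illustration and not the evaluation code used in the study: the function names (`confusion_counts`, `rates`, `roc_auc`) and the toy scores are hypothetical, and the AUC is computed through its pairwise-ranking equivalent rather than by integrating the ROC curve.

```python
def confusion_counts(scores, labels, theta):
    """Count TP, FP, TN, FN when a pair is called positive if score > theta."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > theta
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(scores, labels, theta):
    """Return TPR, FPR, precision, and recall at threshold theta."""
    tp, fp, tn, fn = confusion_counts(scores, labels, theta)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # Convention chosen here: precision is 0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"TPR": tpr, "FPR": fpr, "precision": precision, "recall": tpr}


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: five candidate pairs, two of which are known associations.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(rates(scores, labels, theta=0.5))
print(roc_auc(scores, labels))
```

Sweeping `theta` over all observed scores and plotting the resulting (FPR, TPR) and (recall, precision) points would trace the ROC and PR curves, respectively.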
Biologists typically choose the top-ranked candidates and confirm them through wet laboratory experiments, so it is critical that actual drug-disease associations appear among those candidates. Therefore, the recall rates of the top-k candidate drug-disease pairs were evaluated. The higher the recall in the top-k, the more actual associations are retrieved and the more trustworthy the prediction.
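Recall at top-k can be sketched as follows. This is an illustrative Python fragment under simple assumptions: the function name `recall_at_k` and the toy data are hypothetical, and ties in the scores are broken by input order.

```python
def recall_at_k(scores, labels, k):
    """Fraction of all positive samples that appear among the k top-scored candidates."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    hits = sum(label for _, label in ranked[:k])
    return hits / total_positives


# Toy example: two known associations among five candidates.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
for k in (1, 3, 5):
    print(k, recall_at_k(scores, labels, k))
```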
For each of the 763 drugs, we calculated the AUC and AUPR in each fold and then their five-fold means. The final results were the averages of these AUCs (or AUPRs) over the 763 drugs. As shown in Figure 1A, NAPred achieved the best mean AUC among all the methods investigated (AUC = 0.978), outperforming GFPred by 3.3%, CBPred by 5.2%, SCMFDD by 25.5%, LRSSL by 14.7%, MBiRW by 15%, and HGBI by 27.6%. The second-best model, GFPred, learned multiple attribute representations of nodes and extracted topological information from multiple heterogeneous networks. This suggests that constructing heterogeneous networks on the basis of multiple drug similarities and capturing topological information improved the prediction accuracy. CBPred, LRSSL, and MBiRW extract topology information from heterogeneous networks for drug repositioning; CBPred additionally considers the path information between pairs of diseases, whereas MBiRW disregards the properties of the nodes. Hence, CBPred performed better and MBiRW performed worse than LRSSL. SCMFDD is a matrix-decomposition-based model whose dimensionality reduction process may cause the loss of low-frequency valid information, and it did not exploit the multiple similarities of the drugs. Therefore, SCMFDD performed worse than these methods, although still better than HGBI. In conclusion, NAPred achieved the best results owing to its comprehensive learning of the neighborhood topology and of the attribute information of the drug-disease pairs. As shown in Figure 1B, NAPred performed better than GFPred, CBPred, LRSSL, MBiRW, SCMFDD, and HGBI by 14.8%, 22.8%, 28.4%, 34.6%, 37.8%, and 37.9%, respectively, in terms of the AUPR over the 763 drugs.
In addition, to validate the robustness of our model across datasets, we used the CC dataset [34] to replace the drug-related data and implemented another instance of our method, $NAPred_{DD}$. We utilized the A (chemistry), B (targets), and C (networks) data of the CC dataset to replace the original chemical substructure, protein structural domain, and gene ontology data of the drugs. In Figure 1, the AUC and AUPR of $NAPred_{DD}$ are still higher than those of the compared methods. These experimental results demonstrate the robustness of our model.
To evaluate the impact of the number of cross-validation folds on the performance of NAPred, we also performed ten-fold cross-validation. The number of training samples in the ten-fold cross-validation was larger than that in the five-fold cross-validation. As shown in Supplementary Table S1, the AUC and AUPR for the ten-fold cross-validation were 0.8% and 1.3% higher, respectively, than those for the five-fold cross-validation. NAPred thus achieved better performance when the amount of training data increased.

Case Studies of Five Drugs
Case studies of ampicillin, ceftriaxone, doxorubicin, erythromycin, and itraconazole were conducted to further illustrate the efficacy of NAPred in drug-disease association prediction. The candidate diseases of each drug were ranked in descending order of their association prediction scores, and the top-ten candidates for each of the five drugs are listed in Table 2.
The Comparative Toxicogenomics Database (CTD), which was manually curated and validated based on the literature, contains information regarding drugs and their effects on human health [35]. DrugBank is a database containing drug-related targets, mechanisms of action, interactions, and integrated molecular information [36]. A total of 16 candidate diseases are covered by CTD, and 23 candidates are recorded in DrugBank, which indicates that the corresponding drugs have reported therapeutic effects on these candidate diseases.
ClinicalTrials.gov, which is the world's largest searchable clinical trial database, contains data pertaining to clinical studies conducted worldwide; the National Library of Medicine in the United States contributes to its resources. As supporting material, we only used experimental records with a "completed" status. PubChem is a public database sponsored by the National Institutes of Health that includes information regarding chemicals and their biological activity, safety, and toxicity [37]. There were 23 candidate diseases supported by ClinicalTrials.gov, and 33 candidates were supported by PubChem. These records indicate that clinical trials established an association between the candidate disease and the relevant drug.
Besides manually validated drug-disease associations, CTD also includes associations inferred from the literature that have not yet been verified. The inferred section of the CTD contains two candidates, which suggests a plausible association between these diseases and their corresponding drugs. Among all 50 candidates, two were labeled as "unconfirmed".
In addition, we conducted case studies on an additional five drugs (betamethasone, acetaminophen, etoposide, flurbiprofen, and verapamil) and list their top-ten candidate diseases in Supplementary Table S2. There were 42 candidate diseases recorded by CTD, 29 and 42 candidates covered by DrugBank and PubChem, respectively, and 20 candidate diseases contained in ClinicalTrials.gov. This indicates that these candidates are more likely to be associated with the corresponding drugs. Only one candidate was labeled as "unconfirmed". All the above analyses indicate that NAPred is able to discover potential candidate drug-disease associations.

Prediction of Novel Drug-Related Diseases
Finally, we applied the trained NAPred to the 763 drugs to predict candidate diseases. The top-30 drug-related candidate diseases selected by our model are listed in Supplementary Table S3. They can be used by biologists to facilitate further wet experiments for validation.
Figure 3 shows our proposed predictive model for drug-related disease candidates; the model comprises two branches. Three drug-disease heterogeneous networks were first established to relate drugs and diseases through similarities and associations from different perspectives. In the first branch, we obtained the sets of neighbor nodes for drugs and diseases based on meta-paths of different scales. Neighbor-scale-level and neighbor-topology-level attention mechanisms are proposed for capturing drug and disease neighbor information, followed by the encoding of pairwise neighbor topology representations using convolutional neural networks. In the second branch, a CAE was utilized to learn the attribute representation of a drug-disease pair from the three drug-disease heterogeneous networks. The scores predicted by the two branches were weighted and summed to obtain the score for the corresponding association. A higher score signifies a higher possibility of an association.

Dataset
Based on previous studies, we obtained drug-disease association data [15], chemical substructure data of drugs, protein structural domain data of target proteins, and gene ontology information of target proteins. The drug-disease association data were originally obtained from the UMLS [38] and cover 763 drugs, 681 diseases, and 3051 known drug-disease associations. We extracted drug chemical substructure data from the PubChem database [39] and drug target protein structural domain data from the InterPro database [40]. The UniProt database was used to obtain gene ontology information regarding the target proteins of the drugs [41]. The numbers of drug chemical substructures, drug target protein structural domains, and drug target protein gene ontology terms in our dataset were 623, 1426, and 4447, respectively.

Matrix of Drug Properties
Let the matrix $T^c \in \mathbb{R}^{N_r \times N_c}$ record which chemical substructures each drug contains, where $N_r$ and $N_c$ indicate the numbers of drugs and of all relevant chemical substructures, respectively. A $T^c_{ij}$ value of 1 implies that drug $r_i$ contains the chemical substructure $c_j$, whereas a value of 0 implies otherwise. The vector of chemical substructure attributes of $r_i$, which is the $i$-th row vector of $T^c$, is represented as $T^c_i$. Let the matrix $T^p \in \mathbb{R}^{N_r \times N_p}$ record the protein structural domains found in the target proteins associated with the $N_r$ drugs, where $N_p$ is the number of protein structural domains of all drug target proteins. $T^p_{ij}$ is 1 if a target protein related to drug $r_i$ contains the $j$-th protein structural domain, and 0 otherwise. The protein structural domain attribute vector of $r_i$ is the $i$-th row of $T^p$.
The matrix $T^g \in \mathbb{R}^{N_r \times N_g}$ indicates which of the $N_g$ gene ontology terms are associated with the target proteins of the $N_r$ drugs. A $T^g_{ij}$ value of 1 implies that a target protein associated with drug $r_i$ is annotated with gene ontology term $g_j$, whereas a value of 0 implies otherwise. The target protein gene ontology attribute vector of $r_i$ is represented by the $i$-th row vector $T^g_i$.

Establishment of the Drug Network
For two drugs $r_i$ and $r_j$, a higher number of shared chemical substructures signifies a higher level of similarity between them. The cosine similarity of their chemical substructure vectors was calculated using the strategy previously described by Liang et al. [15] and used as the first similarity between $r_i$ and $r_j$.
Similarly, based on the protein domains or the gene ontology terms of the target proteins related to the two drugs, cosine similarity calculations were applied to determine the second and third similarities between two drugs.
We connected two drug nodes with an edge when the calculated drug similarity exceeded 0. The weight of the edge is the similarity between the two drugs (Figure 4). We used the matrices $R^c = [R^c_{ij}] \in \mathbb{R}^{N_r \times N_r}$, $R^p = [R^p_{ij}] \in \mathbb{R}^{N_r \times N_r}$, and $R^g = [R^g_{ij}] \in \mathbb{R}^{N_r \times N_r}$ to denote the drug networks obtained from the three kinds of drug similarities. For instance, $R^c_{ij}$ represents the similarity between $r_i$ and $r_j$ based on the chemical substructures.
Figure 4. Construction of three heterogeneous networks based on multiple kinds of drug similarities, drug-disease associations, and disease similarities.
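The construction of a drug network from binary attribute vectors can be sketched in plain Python. This is a minimal sketch under stated assumptions: the toy matrix `T_c`, the helper names (`cosine_similarity`, `similarity_network`, `weighted_edges`), and the choice to return 0 for an all-zero vector are illustrative and not taken from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two attribute vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_network(T):
    """Pairwise similarity matrix R, where R[i][j] compares rows i and j of T."""
    n = len(T)
    return [[cosine_similarity(T[i], T[j]) for j in range(n)] for i in range(n)]


def weighted_edges(R):
    """Edges (i, j, weight) between distinct drugs whose similarity exceeds 0."""
    n = len(R)
    return [(i, j, R[i][j]) for i in range(n) for j in range(i + 1, n) if R[i][j] > 0]


# Toy attribute matrix: 3 drugs x 4 chemical substructures (hypothetical values).
T_c = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
R_c = similarity_network(T_c)
print(weighted_edges(R_c))  # only drugs 0 and 1 share a substructure
```

The same functions would apply unchanged to the protein-domain and gene-ontology attribute matrices to give the second and third drug networks.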

Establishment of the Disease Network
The similarity of diseases was calculated to establish the disease network. Wang et al. [42] computed the similarity between diseases using their directed acyclic graphs (DAGs). A disease can be represented by a DAG that includes all semantic terms associated with it. A higher number of shared disease terms in the DAGs of two diseases implies a higher semantic similarity between them. An edge is added between any two diseases if their similarity exceeds 0, and the weight of the edge is the similarity between the two diseases. The matrix $D = [D_{ij}] \in \mathbb{R}^{N_d \times N_d}$ represents the disease network, with $D_{ij}$ denoting the semantic similarity of diseases $d_i$ and $d_j$. The attribute vector of $d_i$, which is the $i$-th row of $D$, is denoted as $D_i$.

Drug-Disease Heterogeneous Network
Edges were added between the nodes of the three drug networks and the disease network using the known drug-disease association data (Figure 4). Let the association matrix $A \in \mathbb{R}^{N_r \times N_d}$ denote the associations between drugs and diseases, with $A_{ij} = 1$ if an edge connects $r_i$ and $d_j$ and $A_{ij} = 0$ otherwise.
The first drug-disease heterogeneous network, which is derived from the first drug similarity, the drug-disease associations, and the disease semantic similarity, is represented by
$$U^1 = \begin{bmatrix} R^c & A \\ A^T & D \end{bmatrix} \in \mathbb{R}^{(N_r + N_d) \times (N_r + N_d)}.$$
Similarly, the second and third drug-disease heterogeneous networks are generated from the second and third drug similarities. These two heterogeneous networks are represented by
$$U^2 = \begin{bmatrix} R^p & A \\ A^T & D \end{bmatrix}, \quad U^3 = \begin{bmatrix} R^g & A \\ A^T & D \end{bmatrix}.$$
We denote these three drug-disease heterogeneous networks by $U^m$, where $m \in \{1, 2, 3\}$.
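A heterogeneous network of this kind can be assembled as a block matrix. The sketch below assumes the layout in which the drug similarity matrix and the association matrix form the top rows and the transposed association matrix and the disease similarity matrix form the bottom rows; this layout is inferred from the later description of the rows of the network, and the function name and toy values are hypothetical.

```python
def heterogeneous_network(R, A, D):
    """Stack [[R, A], [A^T, D]] into one (N_r + N_d) x (N_r + N_d) matrix."""
    n_r, n_d = len(R), len(D)
    # Drug rows: similarities to all drugs, then associations with all diseases.
    top = [list(R[i]) + list(A[i]) for i in range(n_r)]
    # Disease rows: associations with all drugs, then similarities to all diseases.
    bottom = [[A[i][j] for i in range(n_r)] + list(D[j]) for j in range(n_d)]
    return top + bottom


# Toy inputs: 2 drugs and 3 diseases (hypothetical values).
R_c = [[1.0, 0.5],
       [0.5, 1.0]]
A = [[1, 0, 0],
     [0, 1, 1]]
D = [[1.0, 0.4, 0.0],
     [0.4, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

U1 = heterogeneous_network(R_c, A, D)
for row in U1:
    print(row)
```

Passing the second or third drug similarity matrix in place of `R_c` would give the other two networks, since the association and disease blocks are shared.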

Multi-Scale Meta-Path Sets
The meta-path [43] can be expressed as a path of the form $G_1 \xrightarrow{R_1} G_2 \xrightarrow{R_2} \cdots \xrightarrow{R_t} G_{t+1}$ (abbreviated as $G_1 G_2 \cdots G_{t+1}$). The composite relationship between node types $G_1$ and $G_{t+1}$ is described by $R = R_1 \circ R_2 \circ \cdots \circ R_t$. Two nodes can be connected to each other via different meta-paths in a heterogeneous drug-disease network. Figure 1 shows the manner by which drugs $r_1$ and $r_4$ can be connected by the meta-paths $r - r - r$ and $r - d - r$, with different meta-paths carrying different semantics. For example, in $r_1 - r_2 - r_4$ ($rrr$), drugs $r_1$ and $r_4$ may be similar because both are similar to $r_2$. In $r_1 - d_5 - r_4$ ($rdr$), both drugs are associated with $d_5$, suggesting that $r_1$ may be similar to $r_4$. Based on the structural information from $U^m$, we can obtain the first-order meta-paths of drug nodes, $r - r$ and $r - d$, to form the set $P^{(1)}_r$.

Neighbor Sets Based on Meta-Paths at Different Scales
For node $r_i$ ($d_j$) and the set of meta-paths $P^{(k)}_r$ ($P^{(k)}_d$), we obtained the $k$th-order drug neighbor node set $N^{R(k)}$ and the disease neighbor node set $N^{D(k)}$, where the first-order neighbors of the node include itself.
For the drug (disease)-type neighbors of $r_i$ ($d_j$), we selected the top-$N_k$ neighbors that were the most similar to $r_i$ ($d_j$) based on its similarity to all other drugs (diseases). For the disease (drug)-type neighbors of $r_i$ ($d_j$), the disease (drug) nodes associated with $r_i$ ($d_j$) were ranked based on their occurrence frequency, and the top-$N_k$ nodes of the ranking were retained as neighbors of $r_i$ ($d_j$).
As shown in Figure 3, for $r_1$ and the sets of meta-paths $P^{(1)}_r$ and $P^{(2)}_r$, assuming $N_k = 3$, we can obtain the first-order drug neighbor nodes of $r_1$ via the meta-path $r - r$ in $P^{(1)}_r$, retain the three top-ranked neighbors of $r_1$, and obtain the set $N^{R(1)}_{r_1} = \{r_1, r_2, r_4\}$. Similarly, $r_1$ captures and retains the top-$N_k$ disease neighbors via the meta-paths $r - r - d$ and $r - d - d$ in $P^{(2)}_r$, thereby forming its second-order disease neighbor set $N^{D(2)}_{r_1}$.
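The neighbor selection can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the function names, the tie-breaking by node index, and the way path instances are counted for the occurrence frequency are assumptions, and the matrices are toy values using 0-based indices (so `r_1` is index 0).

```python
from collections import Counter


def top_similar(R, i, n_k):
    """First-order same-type neighbors of node i: the n_k most similar nodes.
    Node i itself is included because its self-similarity is the largest."""
    order = sorted(range(len(R)), key=lambda j: (-R[i][j], j))
    return order[:n_k]


def second_order_disease_neighbors(R, A, D, i, n_k):
    """Diseases reached from drug i via r-r-d and r-d-d, ranked by how often they occur."""
    counts = Counter()
    for j in range(len(R)):                      # meta-path r - r - d
        if j != i and R[i][j] > 0:
            for d in range(len(D)):
                if A[j][d] == 1:
                    counts[d] += 1
    for d1 in range(len(D)):                     # meta-path r - d - d
        if A[i][d1] == 1:
            for d2 in range(len(D)):
                if d2 != d1 and D[d1][d2] > 0:
                    counts[d2] += 1
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return ranked[:n_k]


# Toy network: 4 drugs, 3 diseases (hypothetical values).
R = [[1.0, 0.8, 0.0, 0.6],
     [0.8, 1.0, 0.3, 0.0],
     [0.0, 0.3, 1.0, 0.2],
     [0.6, 0.0, 0.2, 1.0]]
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [0, 1, 1]]
D = [[1.0, 0.4, 0.0],
     [0.4, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

print(top_similar(R, 0, 3))                           # drugs 0, 1, 3
print(second_order_disease_neighbors(R, A, D, 0, 3))  # diseases ranked by frequency
```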

Aggregation of Multi-Scale Neighbor Features
We propose a fully connected neural network with mean aggregation [44] to effectively combine the network topology in U m with the characteristics of same-type nodes to learn the low-dimensional features of same-type neighbors at different scales. Because the learning frameworks of both drug and disease nodes are similar, we describe r i and its drug (disease)-type neighbors as an example.
For the $k$th-order drug neighbor set $N^{R(k)}_{r_i}$ of $r_i$, the attribute vector $f_{r_n}$ of each neighbor node $r_n \in N^{R(k)}_{r_i}$ can be obtained from the drug attribute matrix ($T^c$, $T^p$, or $T^g$) corresponding to $U^m$. Because $f_{r_n}$ is high-dimensional and sparse, we first performed mean aggregation of the attribute vectors of the $k$th-order drug neighbors of $r_i$ to obtain the aggregated vector
$$h^{R(k)}_{r_i} = \mathrm{MEAN}\left(\left\{ f_{r_n} : r_n \in N^{R(k)}_{r_i} \right\}\right).$$
Subsequently, we projected $h^{R(k)}_{r_i}$ into a low-dimensional feature space through a fully connected network and obtained the low-dimensional $k$th-order drug neighbor feature vector $u^{R(k)}_{r_i}$ as follows:
$$u^{R(k)}_{r_i} = \sigma\left(W^{(k)}_R h^{R(k)}_{r_i} + b^{(k)}_R\right), \quad k = 1, \ldots, K,$$
where $\sigma$ denotes the ReLU activation function [45], $W^{(k)}_R$ the weight matrix when the neighbor type is a drug, and $b^{(k)}_R$ the bias vector. $K$ denotes the total number of orders, and $K = 2$ in our model.
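Mean aggregation followed by a fully connected ReLU layer can be written in a few lines. The following is a minimal sketch with hypothetical helper names (`mean_aggregate`, `dense_relu`) and hand-picked toy weights; in a trained model the weights and bias would be learned.

```python
def mean_aggregate(vectors):
    """Element-wise mean of the neighbors' attribute vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dense_relu(W, b, h):
    """Fully connected layer with ReLU: max(0, W h + b), computed row by row."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)]


# Two neighbors with 4-dimensional binary attribute vectors (toy values).
neighbor_attributes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
h = mean_aggregate(neighbor_attributes)   # [1.0, 0.5, 0.5, 0.0]

# Project from 4 dimensions to 2 with illustrative weights.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, -1.0, 0.0]]
b = [0.0, 0.5]
u = dense_relu(W, b, h)
print(h, u)
```

Averaging first keeps the input size of the dense layer independent of the number of retained neighbors.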

Same-Type Neighbor Topology Encoding Based on Neighbor-Scale-Level Attention
Because the drug (disease)-type neighbor node information at different scales of $r_i$ contributes differently to the learning of the drug (disease) neighbor topological representation of $r_i$, we established a neighbor-scale-level attention mechanism to learn the attention weights of the neighbor feature vectors of the same type from order 1 to order $K$. For the $k$th-order drug neighbor feature vector $u^{R(k)}_{r_i}$, an attention score $s^{Scale}_k$ is computed, where $h^{Scale}$ is the weight vector at the neighbor scale level, and $W^{Scale}$ and $b^{Scale}$ are the weight matrix and bias vector, respectively. The normalized attention coefficient $\alpha^{Scale}_k$ is obtained using the softmax function, as follows:
$$\alpha^{Scale}_k = \frac{\exp\left(s^{Scale}_k\right)}{\sum_{k'=1}^{K} \exp\left(s^{Scale}_{k'}\right)}.$$
The drug neighbor topology representation $u^{R}_{r_i}$ of $r_i$ obtained using the attention mechanism is:
$$u^{R}_{r_i} = \sum_{k=1}^{K} \alpha^{Scale}_k \, u^{R(k)}_{r_i}.$$

Neighbor Topology Encoding Based on Attention Enhancement at the Neighbor Topology Level
$r_i$ has two types of neighbor nodes, drug and disease, whose neighbor topologies are represented as $u^{R}_{r_i}$ and $u^{D}_{r_i}$, respectively. However, the importance of different types of neighbor nodes for association prediction varies, and neighbor-topology-level attention is proposed to enhance the neighbor topology representation of $r_i$. An attention score $s^{Topo}_t$ is computed for each same-type neighbor topology representation $u^{t}_{r_i}$ of $r_i$, where $t \in \{R, D\}$, $W^{Topo}$ and $h^{Topo}$ are the neighbor-topology-level weight matrix and weight vector, respectively, and $b^{Topo}$ is a bias vector. The normalized attention weights $\alpha^{Topo}_t$ are obtained by applying the softmax function to the scores. Finally, the enhanced representation of the neighbor topology of $r_i$ obtained using the attention mechanism is $u_{r_i}$, expressed as follows:
$$u_{r_i} = \sum_{t \in \{R, D\}} \alpha^{Topo}_t \, u^{t}_{r_i}.$$
Here, $u^{(m)}_{r_i}$ denotes this representation computed in $U^m$. Similarly, the neighbor topology representation $u^{(m)}_{d_j}$ of $d_j$ in $U^m$ can be obtained. These neighbor topological representations are used to form the feature matrix $S$ of the $r_i$-$d_j$ node pair, as follows:
$$S = \begin{bmatrix} u^{(1)}_{r_i} & u^{(2)}_{r_i} & u^{(3)}_{r_i} \\ u^{(1)}_{d_j} & u^{(2)}_{d_j} & u^{(3)}_{d_j} \end{bmatrix} \in \mathbb{R}^{2 \times 3N_f},$$
where $N_f$ denotes the dimension of the neighbor topology representation.
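Both attention levels follow the same pattern: score each candidate vector, normalize the scores with softmax, and take the weighted sum. The sketch below illustrates that pattern only. The scoring function used here, a weight vector applied to `tanh(W u + b)`, is an assumption (a common additive-attention form), since the text specifies only that a weight vector, a weight matrix, and a bias vector are involved; all names and values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(vectors, W, b, h):
    """Score each vector as h . tanh(W u + b) (assumed form), then return
    the softmax weights and the attention-weighted sum of the vectors."""
    scores = []
    for u in vectors:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, u)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(hi * z for hi, z in zip(h, hidden)))
    alphas = softmax(scores)
    dim = len(vectors[0])
    fused = [sum(a * u[d] for a, u in zip(alphas, vectors)) for d in range(dim)]
    return alphas, fused


# First-order and second-order neighbor feature vectors of one node (toy values).
u_order1 = [1.0, 0.0]
u_order2 = [0.0, 1.0]
W = [[1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 0.0]
h = [1.0, 0.0]

alphas, fused = attention_fuse([u_order1, u_order2], W, b, h)
print(alphas, fused)
```

Calling the same function on the drug-type and disease-type representations of a node would give the topology-level fusion.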

CNN-Based Pairwise Neighbor Topology Encoding
The feature matrix $S$ of the first branch is passed into the CNN, which learns the neighbor topology representation of the $r_i$-$d_j$ pair. We padded the periphery of $S$ with zeros to learn its edge features and obtained the new matrix $\hat{S}$.
We established a CNN module using convolutional and pooling layers. The filter length and width of the convolutional layer are denoted by $w_l$ and $w_h$, respectively, and a total of $n_{conv}$ filters were used. After applying the convolution filters $W_{conv} \in \mathbb{R}^{w_l \times w_h \times n_{conv}}$ to $\hat{S}$, the feature maps $Z \in \mathbb{R}^{n_{conv} \times (4 - w_l + 1) \times (2 + N_f + N_f + N_f - w_h + 1)}$ were generated. $\hat{S}_{k,i,j}$ represents the region covered by the $k$-th filter when it slides to position $(i, j)$ of $\hat{S}$, and it is defined as:
$$\hat{S}_{k,i,j} = \hat{S}(i : i + w_l - 1,\; j : j + w_h - 1),$$
$$Z_k(i, j) = \sigma\left(W_{conv}(:, :, k) * \hat{S}_{k,i,j} + b(k)\right),$$
where $i \in [1, 4 - w_l + 1]$, $j \in [1, 2 + 3N_f - w_h + 1]$, $k \in [1, n_{conv}]$, $\sigma$ is the ReLU function, and $b$ is the bias vector. The position $(i, j)$ in the feature map $Z_k$ is represented by $Z_k(i, j)$. The more significant features of $Z_k$ were extracted using the max-pooling layer. The filter length of the max-pooling layer is $w_e$, and the width is $w_b$. The $k$-th feature map of all feature maps $P$ output by the pooling layer is $P_k$, and $P_k(i, j)$ can be calculated as:
$$P_k(i, j) = \max\left(Z_k(i : i + w_e - 1,\; j : j + w_b - 1)\right).$$
In the CNN module, we set the number of filters in the convolutional layer to 16, the kernel size to 2 × 2, and the stride to 1. In the pooling layer, the kernel size was set to 2 × 2, and the stride and zero-padding were set to 1 and 0, respectively. After processing in the convolutional and max-pooling layers, the output vector $z_{NT}$ was obtained. Subsequently, $z_{NT}$ was input to the fully connected and softmax layers [46], which yielded the association probability distribution for the first branch, as follows:
$$Score_{NT} = \mathrm{softmax}\left(W_{soft1} z_{NT} + b_{soft1}\right), \quad (15)$$
where $W_{soft1}$ is the weight matrix of the fully connected layer of the first branch and $b_{soft1}$ is the corresponding bias vector. $Score_{NT}$ is the association probability distribution over the $C$ ($C = 2$) classes, i.e., the likelihoods that the drug and disease are and are not associated.
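The shapes produced by the padding, convolution, and pooling steps can be checked with a small pure-Python sketch. It uses the 2 × 2 kernels and stride 1 stated in the text, but only a single filter with illustrative weights instead of 16 learned filters, and a toy feature matrix with $N_f = 2$; the function names are hypothetical.

```python
def zero_pad(M, p=1):
    """Surround matrix M with p rows and columns of zeros."""
    width = len(M[0]) + 2 * p
    top = [[0.0] * width for _ in range(p)]
    middle = [[0.0] * p + list(row) + [0.0] * p for row in M]
    bottom = [[0.0] * width for _ in range(p)]
    return top + middle + bottom


def conv2d_relu(M, K, bias=0.0):
    """Slide kernel K over M with stride 1 and apply ReLU to each response."""
    kh, kw = len(K), len(K[0])
    out = []
    for i in range(len(M) - kh + 1):
        row = []
        for j in range(len(M[0]) - kw + 1):
            acc = bias + sum(K[a][c] * M[i + a][j + c]
                             for a in range(kh) for c in range(kw))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def max_pool(Z, ph=2, pw=2, stride=1):
    """Max-pooling with a ph x pw window and the given stride, no padding."""
    return [[max(Z[i + a][j + c] for a in range(ph) for c in range(pw))
             for j in range(0, len(Z[0]) - pw + 1, stride)]
            for i in range(0, len(Z) - ph + 1, stride)]


# Toy feature matrix S of size 2 x 3*N_f with N_f = 2.
S = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]
K = [[0.25, 0.25],
     [0.25, 0.25]]           # illustrative averaging filter

S_hat = zero_pad(S)          # 4 x 8
Z = conv2d_relu(S_hat, K)    # (4 - 2 + 1) x (8 - 2 + 1) = 3 x 7
P = max_pool(Z)              # 2 x 6
print(len(S_hat), len(S_hat[0]), len(Z), len(Z[0]), len(P), len(P[0]))
```

The printed sizes agree with the feature-map dimensions given in the text when $w_l = w_h = 2$ and $N_f = 2$.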

Attribute Embedding Matrix for Drug-Disease Pairs
We introduced an embedding strategy to extract the node attributes of drug-disease pairs (Figure 5). If $r_i$ and $d_j$ have similarities or associations with more of the same drugs and diseases, then $r_i$ and $d_j$ are more likely to be associated. Therefore, the attribute information of drugs and diseases must be learned at the pairwise node level.
For a heterogeneous drug-disease network $U^m$, the row $U^m_i$ contains the $m$-th similarity of $r_i$ with all drugs and its associations with all diseases, and the row $U^m_{N_r + j}$ contains the associations of $d_j$ with all drugs and its similarities with all diseases. Therefore, we concatenated the attribute vectors $U^m_i$ and $U^m_{N_r + j}$ ($m = 1, 2, 3$) to obtain the attribute embedding matrix $P$ of $r_i$ and $d_j$. $P$ is expressed as follows:
$$P = \begin{bmatrix} U^1_i & U^2_i & U^3_i \\ U^1_{N_r + j} & U^2_{N_r + j} & U^3_{N_r + j} \end{bmatrix},$$
where $P$ has a dimension of $2 \times ((N_r + N_d) \times 3)$.
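The construction of the pairwise attribute embedding matrix amounts to concatenating one drug row and one disease row from each of the three networks. A minimal sketch, with a hypothetical function name, 0-based indices, and toy networks of 2 drugs and 1 disease:

```python
def pair_embedding(networks, i, j, n_r):
    """Build the 2 x (3 * (N_r + N_d)) embedding of drug i and disease j.
    Row 0 concatenates the drug's row from every network; row 1 does the same
    for the disease, whose row index is offset by the number of drugs n_r."""
    drug_row = [x for U in networks for x in U[i]]
    disease_row = [x for U in networks for x in U[n_r + j]]
    return [drug_row, disease_row]


# Three toy heterogeneous networks that differ only in the drug-drug block.
U1 = [[1.0, 0.5, 1], [0.5, 1.0, 0], [1, 0, 1.0]]
U2 = [[1.0, 0.2, 1], [0.2, 1.0, 0], [1, 0, 1.0]]
U3 = [[1.0, 0.0, 1], [0.0, 1.0, 0], [1, 0, 1.0]]

P = pair_embedding([U1, U2, U3], i=0, j=0, n_r=2)
print(len(P), len(P[0]))   # 2 rows, 3 * (2 + 1) = 9 columns
```

Columns in which both rows are non-zero mark drugs or diseases that are related to both members of the pair.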

CAE-Based Pairwise Node Attribute Encoding
Because the node attribute matrix P obtained from the three heterogeneous networks is high-dimensional and sparse, meaningless and non-representative information may be present. Therefore, we performed encoding and decoding based on a CAE to comprehensively learn the attribute information of drug-disease pairs in the original data distribution, as shown in Figure 3.
Encoder: Two hidden layers, each comprising a convolutional layer and a max-pooling layer, constitute the encoder. Zero-padding is applied so that the edge features of $P$ are preserved and learned. The first hidden layer takes the zero-padded $P$ as input and yields the feature map $Z^{(1)}_{Encoder}$:
$$Z^{(1)}_{Encoder} = \max\left(\sigma\left(W^{(1)}_{Encoder} * P + b^{(1)}_{Encoder}\right)\right).$$
Subsequently, the feature map of the $t$-th layer $Z^{(t)}_{Encoder}$ is generated as follows:
$$Z^{(t)}_{Encoder} = \max\left(\sigma\left(W^{(t)}_{Encoder} * Z^{(t-1)}_{Encoder} + b^{(t)}_{Encoder}\right)\right), \quad t = 2, \ldots, L_{En},$$
where $\sigma$ is the ReLU function, $W^{(t)}_{Encoder}$ denotes the weight matrix of the encoder's $t$-th hidden layer, and $b^{(t)}_{Encoder}$ is the corresponding bias vector. $L_{En}$ indicates the encoder's total number of layers, "$*$" indicates the convolution computation, and $\max$ denotes max-pooling, which captures the most critical features within every feature map by downsampling the representations produced by the convolutional layer.
Decoder: Using the decoder, we projected the code $Z^{(L_{En})}_{Encoder}$ back to the initial space and reassembled it to obtain the decoding matrix. The difference between the decoding matrix and the initial matrix $P$ was evaluated to obtain an optimal coded feature map. Three hidden layers, each with a transposed convolutional layer, constitute the decoder. The first hidden layer of the decoder takes $Z^{(L_{En})}_{Encoder}$ as input, each subsequent layer takes the feature map of the previous layer as input, and the last layer outputs the decoding matrix $\hat{P}$.
Optimization: Our optimization objective was to render $\hat{P}$ as consistent as possible with the input $P$. The loss function $loss_{auto}$ measures the reconstruction error between the input and the output over the training samples, where $P$ is the input of the encoder, $\hat{P}$ the output of the decoder, $T_{train}$ the number of training samples, and $P_n$ the embedding matrix of the $n$th drug-disease pair in the training samples. The Adam algorithm [47] was used to minimize $loss_{auto}$, and the CAE was trained by back propagation [48]. After training, the output $Z^{(L_{En})}_{Encoder}$ of the last encoder layer was taken as the pairwise attribute encoding, denoted by $F_{PA}$.
To acquire the association probability $Score_{PA}$ of the node pair $r_i$-$d_j$ in the second branch, $F_{PA}$ was processed in the fully connected and softmax layers. $Score_{PA}$ is expressed as:
$$Score_{PA} = \mathrm{softmax}\left(W_{soft2} F_{PA} + b_{soft2}\right), \quad (22)$$
where $W_{soft2}$ and $b_{soft2}$ are the weight matrix and bias vector of the fully connected layer of the second branch, respectively. $Score_{PA}$ is the association probability distribution over the $C$ ($C = 2$) classes.

Final Integration and Optimization
The loss function of the first branch, $loss_{NT}$, is the cross-entropy between the true label $y^{NT}$ and the drug-disease association prediction result $Score_{NT}$, computed over $N_{train}$, where $N_{train}$ is the set of training samples and $y^{NT}_j$ is the true label: if an $r_i$-$d_j$ pair has a known association, then $y^{NT}_j$ is 1; otherwise, it is 0. In the second branch, the cross-entropy loss function $loss_{PA}$ is defined in the same way, as the negative sum over the training samples of the terms $y^{PA}_j \log\left(Score^{PA}_j\right)$. We trained the two branches separately until $loss_{NT}$ and $loss_{PA}$ reached their minimum values. The final association prediction score is calculated as follows:
$$Score = \lambda \, Score_{NT} + (1 - \lambda) \, Score_{PA},$$
where $\lambda$ denotes a hyperparameter that ranges from 0 to 1 and balances the contributions of the neighbor topologies and the pairwise node attributes to the association prediction score.
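The cross-entropy loss of one branch and the weighted fusion of the two branch scores can be sketched as follows. Two points are assumptions rather than details confirmed by the text: the loss is summed (not averaged) over the samples, and the hyperparameter weights the neighbor-topology branch while its complement weights the attribute branch. The function names and values are hypothetical.

```python
import math


def cross_entropy(prob_dists, labels):
    """Sum over samples of -log(probability assigned to the true class).
    prob_dists[n] is a softmax output over C = 2 classes; labels[n] is 0 or 1."""
    return -sum(math.log(dist[label]) for dist, label in zip(prob_dists, labels))


def fuse_scores(score_nt, score_pa, lam):
    """Weighted sum of the two branch scores (assumed: lam weights the NT branch)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * score_nt + (1.0 - lam) * score_pa


# Softmax outputs [P(not associated), P(associated)] for three toy pairs.
branch_outputs = [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]]
true_labels = [1, 0, 1]
print(cross_entropy(branch_outputs, true_labels))

# Fuse the "associated" probabilities predicted by the two branches for one pair.
print(fuse_scores(score_nt=0.8, score_pa=0.6, lam=0.5))
```

Setting `lam` to 1 or 0 reduces the prediction to a single branch, which gives a simple way to inspect what each branch contributes.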

Conclusions
We proposed NAPred to predict associations between drugs and candidate diseases. Three heterogeneous networks were constructed to support neighbor topology extraction based on multi-scale meta-paths and pairwise node attribute embedding. A framework comprising a convolutional neural network with attention mechanisms and a CAE was constructed to encode and integrate neighbor topological representations and pairwise attribute representations. Two attention mechanisms were proposed to assign greater weights to the more informative multi-scale neighbor features and neighbor topologies. The cross-validation results showed that NAPred outperformed existing methods, and the case studies of five drugs confirmed its ability to discover potential drug-related diseases. Our predictive model can serve as a screening tool for recognizing potential drug-disease associations, thereby allowing biologists to conduct wet laboratory research to determine real drug-disease associations.