A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation

The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has had a positive impact on the drug development pipeline. Numerous challenges have been addressed through the growing exploitation of drug-related data and the advancement of deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development, since computational results are not always confirmed experimentally. This systematic review summarizes the deep learning architectures used in the drug discovery process whose results were validated with subsequent in vivo experiments. For each presented study, the molecule or peptide generated or identified by the deep learning model was biologically evaluated in animal models. These state-of-the-art studies highlight that, even though artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce the required costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Across all the reviewed articles, seven algorithms were identified: recurrent neural networks, specifically long short-term memory networks (LSTM-RNNs); Autoencoders (AEs) and their Wasserstein Autoencoder (WAE) and Variational Autoencoder (VAE) variants; Convolutional Neural Networks (CNNs); Direct Message Passing Neural Networks (D-MPNNs); and Multitask Deep Neural Networks (MTDNNs). LSTM-RNNs were the most frequently used architectures, taking molecules or peptide sequences as inputs.


Introduction
A key aim of de novo drug development for curing diseases is the molecular design of new chemical entities with desired properties, or the identification of known molecules that can modulate the effect of a disease. The generalized steps in the drug discovery pipeline include target discovery, lead compound discovery and synthesis pathways, and lead optimization [1]. This process can take up to five years, and 5000-10,000 candidate compounds are tested to achieve a single approved drug. On average, it takes 10-15 years and a total cost of $2-3 billion for a new drug to reach the market [2,3]. Once a target has been identified, the pharmaceutical industry and academic centers follow several workflows to identify molecules with the characteristics that make them acceptable as drugs [4]. However, the chemical space is vast (an estimated 10^23 to 10^60 molecules), and finding a molecule that balances multiple properties, including safety and potency against a specific target, is challenging [5].

Figure 1. The workflow followed by most studies presented in this review. It contains molecules, molecular encoding, a deep architecture model, virtual screening, and/or molecular docking to reduce the number of candidate compounds to a final set of compounds. These are synthesized and tested for their activity in vitro and in vivo.

Protocol and Registration
This systematic review was registered on PROSPERO, the international prospective register of systematic reviews of the National Institute for Health Research. The review question was stated as: "Which Deep Learning methodologies have been used for drug design or drug discovery and have been validated with in vivo studies?" The protocol for the systematic review can be found at [17]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist was applied [18].

Eligibility Criteria
We considered published studies that applied a deep learning methodology to drug discovery and drug design and whose resulting molecules were validated in vivo. We considered studies that investigated small molecules and peptides as potential therapeutic candidates for a specific disease. All publications were written in English and published between January 2018 and April 2022. Further details on the characteristics of individual studies are covered in Section 3. Studies that did not include the in vivo evaluation of the compounds selected by the deep learning method were excluded, as were studies that did not contain information about the deep learning method developed.

Study Information Sources and Search Terms
The sources for the literature review were Scopus [19], PubMed [20], SciFinder [21], and Google Scholar [22]. These databases were selected because they contain an abundance of publications and peer-reviewed papers. The search was completed in April 2022. The terms used to search the abstracts, titles, and keywords of papers were:
1.
("in vivo" OR "animal" OR "mouse" OR "murine" OR "rat")

A balance between the sensitivity and the specificity of the search was sought in order to maximize the retrieval of high-quality data [23]. A sensitive search lowers the risk of missing relevant data; however, it also retrieves more irrelevant literature, increasing the time needed for filtering and screening [23]. A specific search, on the other hand, retrieves fewer irrelevant results and substantially reduces the time spent filtering and screening. The drawback is that the more specific the search becomes, the higher the risk of missing relevant literature [23,24]. In this systematic review, an example of a specific search is: de novo AND autoencoder. The corresponding example of a sensitive search is: drug discovery AND neural network AND in vivo. As indicated in Section 2.4, of a total of 283 papers, only 12 were selected for this systematic review. The low proportion of papers meeting the eligibility criteria is attributed to the inclusion of the search term "in vivo", which dramatically increased the number of results without necessarily increasing the number of eligible papers. Several papers contained the term "in vivo" without including an experimental evaluation of the model. On the other hand, removing this search term would reduce the number of identified papers at the risk of losing articles of interest, since an article could present in vivo studies verifying the in silico results without describing the selected animal model in the abstract.

Study Selection
The titles and abstracts of papers retrieved with the search terms presented in Section 2.3 were collected. Publications that did not meet the eligibility criteria were removed, and the remaining articles were carefully examined. Those satisfying the inclusion criteria were characterized and included in the present review. A total of 464 papers were initially identified. After removal of duplicates, 283 papers remained. All abstracts were screened, and 36 papers were retained for full-text screening. A list of the papers selected for full-text screening can be found in the Supplementary Material. Although many studies present interesting results on the application of deep learning models in drug design, "real world" applications of published algorithms are still relatively rare: only 12 of the 36 papers present a deep learning model whose results were validated in vivo. Most of the retrieved studies presented the possibilities of deep learning in drug discovery while highlighting the importance of further in vitro and/or in vivo evaluation [25]. Other studies took their research one step further and confirmed the in silico results with in vitro experiments [26]. Studies that continued the evaluation of the in silico results with in vivo experiments were scarce. Deep learning models can accelerate, for example, the hit identification and lead optimization steps of the early drug discovery phase, whereas in vitro and in vivo studies are integral to the preclinical phase of the drug development pipeline. Studies that combine early-stage research with preclinical evaluation are uncommon, probably because such work requires scientists from several fields. As a result, the papers that met the criteria of this review (validating the results of a deep learning algorithm with in vivo experiments) are very limited. However, even these few published papers are essential for directly evaluating the contribution of deep learning methods to drug discovery and development. Papers considered in each stage of the review process are shown in Figure 2.
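The screening funnel described above can be checked with simple arithmetic. The Python sketch below recomputes how many records were removed at each stage and the overall yield; only the four counts come from the text, while the stage labels and the function names (`records_removed`, `screening_yield`) are illustrative and not part of the registered protocol.

```python
# Record counts at each screening stage, as reported in the text.
STAGES = [
    ("identified in databases", 464),
    ("after duplicate removal", 283),
    ("retained after abstract screening", 36),
    ("included after full-text screening", 12),
]

def records_removed(stages):
    """Number of records dropped between consecutive stages."""
    return [(later[0], earlier[1] - later[1]) for earlier, later in zip(stages, stages[1:])]

def screening_yield(included, screened):
    """Fraction of screened records that ended up included."""
    return included / screened

for label, removed in records_removed(STAGES):
    print(f"{label}: {removed} records removed")
print(f"yield vs. deduplicated records: {100 * screening_yield(12, 283):.1f}%")
print(f"yield vs. full-text screening:  {100 * screening_yield(12, 36):.1f}%")
```

The arithmetic gives about 4.2% of the deduplicated records and one third of the full-text papers, consistent with the observation that studies with in vivo validation are scarce.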

Outcomes
This review covers the deep learning architecture developed in each study, the molecular representation, the animal model selected to validate the identified compounds, the drugs or compounds reported by each study as the most effective in an animal model, and the pipeline followed in each study.

Figure 2. A summary of the papers considered in each stage of the review process. Studies combining early-stage drug discovery and preclinical studies are very limited, resulting in 12 studies being included in the review.

Results
We first present the fundamentals of deep learning algorithms, and we review the latest developments in the application of various models in drug discovery. These include not only in silico applications, but also established cases with experimental verification results.

Applied Deep Learning Models Overview
The deep learning models presented here are divided into four categories: models based on AEs, GANs, RNNs, and CNNs. The basic principles and recent developments of these models are described, together with highlights of their use in drug discovery.

Autoencoders
AEs are deep learning structures for unsupervised learning that consist of an encoder and a decoder. They are a type of feed-forward neural network with an extra bias for calculating the error of reconstructing the original input [12]. They use unsupervised learning for dimensionality reduction, compressing the input in the hidden layer and generating an output that is as close as possible to the original input (Figure 3). One variant of AEs is the Adversarial Autoencoder (AAE), a probabilistic autoencoder that uses Generative Adversarial Networks (GANs) to perform variational inference by matching the aggregated posterior distribution of the autoencoder's latent representation to an arbitrary prior distribution [18]. Adversarial training is used to discriminatively predict whether samples originate from the hidden code or from a user-specified distribution [12]. AAEs can be used for semi-supervised classification, unsupervised clustering, dimensionality reduction, etc. [12,27]. A Variational Autoencoder (VAE) assumes that the data are sampled from an arbitrary statistical distribution [28]. It is trained in an unsupervised manner, with an encoder that provides a low-dimensional latent representation of the data vector and a decoder that attempts to reconstruct the input vector. The encoder transforms its input into the parameters of a multidimensional statistical distribution; a point is then sampled from this encoded distribution and fed into the decoder [28]. A VAE can thus be seen as a probabilistic version of the AE that can generate new data and transform existing data within an encoding-modification-decoding scheme [29]. A VAE that directly encodes from and decodes to discrete data represented as a parse tree from a context-free grammar is called a Grammar Variational Autoencoder (GVAE). This architecture ensures that the generated discrete outputs are syntactically valid [30].
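The VAE sampling step and training objective described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the formulation of any reviewed study: the encoder is assumed to output a mean and a log-variance per latent dimension (a diagonal Gaussian), the prior is a standard normal, and the reconstruction term is a squared error (molecular VAEs usually use a cross-entropy over tokens instead). The encoder and decoder networks are omitted, and all function names are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise.

    Writing the sample this way keeps z differentiable with respect to
    mu and log_var, which is what lets a VAE be trained by backpropagation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def squared_error(x, x_rec):
    """Reconstruction error between the input x and the decoder output x'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

def vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Reconstruction term plus a (optionally weighted) KL regularizer."""
    return squared_error(x, x_rec) + beta * kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
mu, log_var = [0.5, -0.2], [0.0, -1.0]  # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)    # z would be passed to the decoder
print(z, kl_to_standard_normal(mu, log_var))
```

The KL term pulls each encoded distribution toward the prior, which is what makes points sampled from the prior decode to plausible outputs and enables generation of new data.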

Figure 3. An autoencoder consists of an encoder, which translates an input into a latent space, and a decoder, which translates the latent space back to the original input space. The goal of the autoencoder is to compute a reconstruction x' with minimal error compared to the original input x.
AEs have been widely used in de novo drug design [31]. The encoder converts the discrete representation of a molecule into a multidimensional continuous representation, and the decoder converts these continuous vectors back into discrete molecular representations. This allows the chemical space to be explored and optimized chemical structures to be developed. Schultz et al. [32] developed VAE-based software that generated novel antagonists of the NMDA receptor. Data obtained in silico and experimentally were combined to train and refine the model, improving its predictive accuracy. A conditional VAE was employed in a new molecular design strategy that generated molecules with the desired target properties [33]. The AAE can perform well in generating new compounds while compressing the data into the latent space. An AAE was developed for the identification and generation of new compounds in oncology [34]. The same group compared the VAE and the AAE as molecular generator models in terms of reconstruction error and variability of the output molecular fingerprints, and published an improved model named druGAN [35]. Compared with the VAE, the AAE showed better capacity and efficiency in generating new molecules with specific anticancer properties.
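The comparison of generators "in terms of reconstruction error and variability of the output molecular fingerprints" can be made concrete with two common measures. The sketch below assumes fingerprints are binary vectors stored as sets of on-bit indices, measures reconstruction error as the fraction of mismatched bits, and measures variability as one minus the mean pairwise Tanimoto similarity. These are our assumed choices for illustration; the exact metrics used in the cited study are not given here, and the function names and toy fingerprints are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)

def bit_reconstruction_error(original, reconstructed, n_bits):
    """Fraction of the n_bits fingerprint positions that the decoder got wrong."""
    return len(original ^ reconstructed) / n_bits

def internal_diversity(fingerprints):
    """1 - mean pairwise Tanimoto similarity over a set of generated fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy fingerprints, for illustration only.
generated = [{1, 4, 9}, {1, 4, 10}, {2, 7, 9}]
print(internal_diversity(generated))
print(bit_reconstruction_error({1, 2, 3}, {1, 2, 4}, n_bits=8))
```

A low reconstruction error with a high diversity is the desirable combination: the model preserves the input faithfully yet does not collapse to near-identical outputs.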

Generative Adversarial Networks
In GANs, two neural networks are trained simultaneously: the generator and the discriminator (Figure 4). The objective of the generator is to create output so similar to real data that the discriminator has difficulty distinguishing real from fake data [10]. GANs have gained attention in applications such as image reconstruction, segmentation, detection, and classification [12,36], and various GAN architectures have been applied in drug discovery [37]. Sanchez-Lengeling et al. [38] introduced the Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC), a framework building on the previously published Objective-Reinforced Generative Adversarial Networks (ORGAN) architecture [39]. With the exception of solubility, ORGAN performed well compared with naïve Reinforcement Learning (RL) in terms of drug-likeness and synthesizability [39]. The main shortcomings of ORGANIC were the large number of invalid molecules and the frequent repetitions among the valid molecules. Another architecture, the reinforced adversarial neural computer (RANC), was developed for de novo drug design by combining GANs and RL. RANC used a differentiable neural computer (a type of RNN with external memory) as its generator. The explicit memory bank mitigated common problems encountered in adversarial settings [40].
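The adversarial game and the objective-reinforced idea can be sketched in a few lines. The first two functions give the standard binary cross-entropy losses for the discriminator and the (non-saturating) generator, written for a single real sample and a single generated sample. The third mixes the discriminator score with a chemistry objective, in the spirit of the ORGAN reward as we understand it: R = lam * D + (1 - lam) * O. All names are hypothetical; for discrete SMILES strings, ORGAN-type models feed such a reward into a policy-gradient update rather than backpropagating the generator loss directly.

```python
import math

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real, d_fake):
    """BCE for D: push D(x) toward 1 on real data and D(G(z)) toward 0 on generated data."""
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """Non-saturating generator loss: G tries to push D(G(z)) toward 1."""
    return -math.log(d_fake + EPS)

def objective_reinforced_reward(d_score, objective_score, lam):
    """Mix the discriminator's 'realness' score with a property objective
    (e.g. a drug-likeness score in [0, 1]), weighted by lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * d_score + (1.0 - lam) * objective_score

# At equilibrium D cannot tell real from fake and outputs 0.5 for both.
print(discriminator_loss(0.5, 0.5), generator_loss(0.5))
print(objective_reinforced_reward(d_score=0.8, objective_score=0.2, lam=0.5))
```

Setting lam to 1 recovers a purely adversarial reward, while lam equal to 0 reduces to plain RL on the objective, which is the naïve RL baseline that ORGAN was compared against.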

Figure 4. Two independent competing networks are trained simultaneously: the Generator (G), which takes an input z from probability distribution p(z) and generates data G(z); and the Discriminator (D), which receives as input the training data or the output from the generator G(z) and tries to predict whether the input is real or generated.

Recurrent Neural Networks
Like feed-forward networks, RNNs have no cycles among their conventional edges; however, they also have edges that connect adjacent time steps (Figure 5). These recurrent edges introduce a notion of time into the model. RNNs can pass information across sequential steps while processing data one element at a time, so they can model inputs consisting of sequences of elements that are not independent [41]. Each input is processed in turn, and a connection carries the output of the previous step into the current step. As the number of steps increases, the effect of an input on the hidden layer may decay or blow up during backpropagation, causing the so-called vanishing or exploding gradient problem, which impairs training. Many attempts have been made to mitigate this problem, with long short-term memory (LSTM) and gated recurrent units (GRU) being the most favored approaches [42,43].
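The vanishing and exploding gradient problem can be seen directly in a one-unit RNN. This sketch assumes a scalar hidden state with the update s_t = tanh(u * x_t + w * s_{t-1}) (the scalar form of the standard RNN recurrence) and computes the gradient of the last hidden state with respect to the initial one by the chain rule. The function names and parameter values are illustrative only.

```python
import math

def run_rnn(xs, u, w, s0=0.0):
    """Scalar RNN: s_t = tanh(u * x_t + w * s_{t-1}). Returns all hidden states."""
    states = [s0]
    for x in xs:
        states.append(math.tanh(u * x + w * states[-1]))
    return states

def grad_last_wrt_first(xs, u, w, s0=0.0):
    """d s_T / d s_0 via the chain rule: product over t of w * (1 - s_t^2)."""
    states = run_rnn(xs, u, w, s0)
    grad = 1.0
    for s in states[1:]:
        grad *= w * (1.0 - s * s)  # derivative of tanh(a) is 1 - tanh(a)^2
    return grad

# With |w| < 1 every factor is below 1, so the gradient shrinks with sequence length.
for steps in (5, 20, 50):
    print(steps, grad_last_wrt_first([1.0] * steps, u=0.5, w=0.9))

# In the linear regime (s_t = 0) with |w| > 1, the same product grows as w**T.
print(grad_last_wrt_first([0.0] * 10, u=0.0, w=1.5))
```

Because the gradient is a product of T factors, it tends to shrink or grow exponentially with sequence length; LSTM and GRU cells add gating so that information and gradients can pass through many steps largely unattenuated.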
Several examples of the use of RNNs in de novo drug design have been reported in the literature [44][45][46]. Olivecrona et al. [44] describe one of the first attempts to produce a generative model for molecular de novo design. A policy-based RL approach was proposed to fine-tune RNNs to generate molecules with given desirable properties. The RNN was trained by maximum likelihood estimation of the next token in a target sequence, given the tokens from the previous steps. Once trained, the RNN was used to generate new sequences. One year later, Popova et al. [45] proposed a stacked LSTM-RNN model that implemented RL to generate new chemical structures with desired physical and/or biological properties. Transfer learning (TL) approaches were also used to fine-tune the predictions of RNNs for specific molecular targets [47]. Gupta et al. [47] trained an LSTM-RNN model to generate libraries of valid SMILES strings. The model was fine-tuned with TL to generate molecules structurally similar to drugs with known bioactivities against a particular biological target. Like Popova et al. [45], Gupta et al. combined the RNN with another technique to reduce error and unwanted bias. A scaffold-based deep generative model was proposed by Arús-Pous et al. without the use of RL or TL [48]. One LSTM-RNN generated scaffold-decoration tuples and another LSTM-RNN decorated the scaffolds. The trained models became synthetic-chemistry-aware and generated molecules with synthetically feasible decorations without needing to be combined with other techniques [48].

Figure 5. For the standard RNN, the hidden state at time step $t$ is denoted $s_t$; it is the "memory" of the network and is calculated from the previous hidden state and the input at the current step: $s_t = f(U x_t + W s_{t-1})$. The function $f$ is usually a nonlinearity, such as tanh or the Rectified Linear Unit (ReLU).
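The two training phases described for Olivecrona et al. [44] can be sketched at the level of their loss functions. Maximum-likelihood pre-training maximizes the log-probability of each training SMILES, which is the sum of the next-token log-probabilities. For the RL fine-tuning, our understanding of that work is that the agent is trained toward an "augmented likelihood" that adds a scaled property score S(A) to the prior's log-likelihood of a sampled string A. The per-token log-probabilities are assumed to be supplied by the prior and agent RNNs, which are not implemented here; the function names and the symbol sigma are our own labels.

```python
def sequence_log_likelihood(token_log_probs):
    """log P(A) for a SMILES string A: sum of log P(token_t | tokens_<t).

    Maximizing this over the training SMILES is the maximum-likelihood
    (next-token) pre-training objective.
    """
    return sum(token_log_probs)

def augmented_likelihood_loss(prior_log_p, agent_log_p, score, sigma):
    """RL fine-tuning loss: (log P_prior(A) + sigma * S(A) - log P_agent(A))^2.

    score S(A) in [0, 1] rates the desired properties of A; sigma trades off
    staying close to the prior (chemically plausible SMILES) against
    maximizing the score.
    """
    augmented = prior_log_p + sigma * score
    return (augmented - agent_log_p) ** 2

# Hypothetical per-token log-probabilities for one sampled string.
prior_lp = sequence_log_likelihood([-0.5, -1.0, -0.25])
agent_lp = sequence_log_likelihood([-0.75, -1.0, -0.5])
print(augmented_likelihood_loss(prior_lp, agent_lp, score=0.9, sigma=2.0))
```

Anchoring the target to the prior's likelihood is what keeps the fine-tuned agent from drifting toward invalid or degenerate strings while it is pushed toward high-scoring molecules.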

Convolutional Neural Networks
CNNs are a specialized type of NN that performs convolution in at least one layer [12]. The first few stages of a CNN consist of two types of layers: convolutional layers and pooling layers (Figure 6). The convolutional layers generate new images called feature maps, in which each unit is connected through a set of weights to local patches in the feature maps of the previous layer [12]. The resulting feature maps are passed through a nonlinearity such as the ReLU activation function. The pooling layer reduces the size of its input by merging semantically similar features into one. Typically, a pooling unit computes the maximum of a local patch of units in one feature map. Several convolution, nonlinearity, and pooling stages are stacked, followed by fully connected layers. CNNs have a wide range of applications in image classification, video recognition, and image analysis [9,49]. In drug design, CNNs have been used to improve the performance of ligand-based virtual screening [50] and the prediction of drug-target interactions (DTIs) [51]. An efficient variant of CNNs on graphs is the graph convolutional network (GCN). GCNs stack layers of learned first-order spectral filters followed by a nonlinear activation function to learn graph representations [52]. GCN models can therefore leverage the graph structure and combine node information from neighborhoods in a convolutional manner. Ryu et al. [53] proposed an attention- and gate-augmented GCN for the prediction of molecular properties, and an edge-attention-based multirelational GCN was developed for the same purpose [54].
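The convolution, nonlinearity, and pooling stages described above can be illustrated on a one-dimensional input, such as an encoded sequence. This is a minimal sketch: the "convolution" is the cross-correlation that most deep learning libraries implement, the kernel is fixed by hand rather than learned, and pooling uses non-overlapping windows. The function names and the example signal are hypothetical.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]

def relu(xs):
    """Elementwise Rectified Linear Unit."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, size):
    """Non-overlapping max pooling: keep the largest value in each window of `size`."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

# A hand-picked difference kernel responds where the signal is decreasing.
signal = [0, 1, 2, 1, 0, -1, -2, -1]
feature_map = relu(conv1d(signal, [1, 0, -1]))
pooled = max_pool(feature_map, 2)
print(feature_map, pooled)
```

Pooling halves the length of the feature map while keeping the strongest local responses, which is how stacked modules progressively summarize the input before the fully connected layers.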
Figure 6. Schematic diagram of a CNN. A convolutional layer followed by a pooling layer forms a convolutional module. Each module learns to identify features while preserving spatial relationships. A fully connected layer follows, which uses the output of the convolution process to predict the class in a classification problem, based on the features extracted in previous stages.
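For the graph convolutional networks mentioned in the CNN subsection, one layer can be sketched with the widely used first-order propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix (for a molecule, atoms as nodes and bonds as edges), I adds self-loops, D is the degree matrix of A + I, H holds the node features, and W is a learnable weight matrix. This is an assumed formulation for illustration; the reviewed GCN variants add attention, gates, or edge types on top of it. Plain lists are used so the example needs no external library, and the function name is hypothetical.

```python
import math

def gcn_layer(adjacency, features, weights):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n 0/1 list of lists (undirected graph, e.g. atoms and bonds)
    features:  n x f node feature matrix H
    weights:   f x g weight matrix W
    """
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: normalized A times H.
    f = len(features[0])
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Project with W, then apply ReLU.
    g = len(weights[0])
    out = [[sum(agg[i][c] * weights[c][d] for c in range(f)) for d in range(g)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]

# Two bonded atoms with one scalar feature each and an identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```

In the usage example, each atom ends up with the degree-normalized average of its own feature and its neighbor's, which is the "combine node information from the neighborhoods" behavior described in the text; stacking layers widens the neighborhood one bond at a time.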

Generating Compounds and Searching Chemical Libraries
The goal of drug discovery is to discover new chemical structures with desired pharmacological properties. De novo molecular design leverages computational methods to automate molecule generation and shorten the search through a virtually infinite chemical space [55]. Most existing generative models represent molecules either as molecular graphs or as strings in the Simplified Molecular Input Line Entry System (SMILES), a line notation that encodes the topological and structural properties of molecules [56].
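Before a sequence model can learn from SMILES, each string has to be split into tokens. The sketch below is a minimal, assumption-based tokenizer (not taken from any of the reviewed studies): bracket atoms such as `[nH]`, the two-letter atoms `Br` and `Cl`, and two-digit ring closures (`%10`) become single tokens, and every other character is its own token. The names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Bracket atoms ([nH], [NH4+], ...), two-letter organic-subset atoms (Br, Cl) and
# two-digit ring closures (%10-%99) are single tokens; any other character stands alone.
SMILES_TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens that a sequence model can consume."""
    return SMILES_TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list, specials=("<pad>", "<bos>", "<eos>")):
    """Map every token seen in the corpus, plus special tokens, to an integer id."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize_smiles(s)})
    return {tok: i for i, tok in enumerate(list(specials) + tokens)}


def encode(smiles, vocab):
    """Token ids wrapped in begin/end-of-sequence markers."""
    body = [vocab[t] for t in tokenize_smiles(smiles)]
    return [vocab["<bos>"]] + body + [vocab["<eos>"]]
```

For example, `tokenize_smiles("ClCBr")` yields `["Cl", "C", "Br"]`, so chlorine is not mistaken for a carbon followed by a stray character.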
One notable study that includes biological evaluation of the in silico results was published by Zhavoronkov et al. [57], who introduced a generative tensorial RL pipeline named GENTRL. The model combined variational inference, tensor decomposition, and RL with three different self-organizing maps (SOMs) that served as reward functions. GENTRL discovered potent inhibitors of discoidin domain receptor 1 (DDR1). Within 23 days, the generative model produced 30,000 unique and valid structures, and six compounds were selected for further examination. By day 46, these molecules had been synthesized and tested for in vitro inhibitory activity. One compound was tested in mice and showed favorable pharmacokinetics, demonstrating the potential of the method for effective molecular design [57]. Compounds with a potent DDR1 inhibition profile were also designed by Tan et al. [58]. The authors identified a series of FGFR inhibitors, including compound DC-1, which was selected as the starting point for developing DDR1 inhibitors. They developed a scaffold-based molecular design method that combined the matched molecular pairs algorithm proposed by Arús-Pous et al. [48] with an AE as the generative model. The most potent compounds were selected based on kinase selectivity and molecular docking scores [58]. To evaluate the quality of the generated molecules, their synthetic accessibility score, octanol-water partition coefficient (clogP), and molecular weight were compared with those of published DDR1 inhibitors. These properties were consistent with known DDR1 inhibitors, showing that the model can design molecules with the desired properties. Two promising compounds were selected for synthesis and experimental validation, and one showed promising results in the dextran sulfate sodium-induced mouse colitis model [58].
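The property comparison described above can be sketched as follows, assuming the synthetic accessibility score, clogP, and molecular weight have already been computed for every molecule (for example with a cheminformatics toolkit). Checking whether generated molecules fall inside the min/max range of the reference inhibitors, and comparing means, is our simplification for illustration, not the procedure reported by Tan et al.; the property keys and all values are hypothetical.

```python
from statistics import mean

# Hypothetical property keys; values are assumed to be precomputed per molecule.
PROPERTIES = ("sa_score", "clogp", "mol_weight")


def reference_ranges(reference):
    """Min/max of each property over the reference set of known inhibitors."""
    return {p: (min(m[p] for m in reference), max(m[p] for m in reference))
            for p in PROPERTIES}


def within_reference(molecule, ranges):
    """True if every property of the molecule lies inside the reference range."""
    return all(lo <= molecule[p] <= hi for p, (lo, hi) in ranges.items())


def summarize(generated, reference):
    """Fraction of generated molecules inside the reference ranges, and the
    (generated mean, reference mean) pair for each property."""
    ranges = reference_ranges(reference)
    inside = sum(within_reference(m, ranges) for m in generated) / len(generated)
    means = {p: (mean(m[p] for m in generated), mean(m[p] for m in reference))
             for p in PROPERTIES}
    return inside, means
```

A range check is deliberately crude; comparing full distributions (histograms or density plots) is the more informative option when enough reference molecules are available.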
Apart from AEs, another successful deep generative model is the RNN. After training on a large number of SMILES sequences, an RNN can generate valid SMILES strings that are not present in the training dataset. LSTM models have shown significant improvements over vanilla RNNs and tend to replace them in drug discovery [47]. Several studies have applied transfer learning (TL): the model is first trained on a large dataset and then fine-tuned on a smaller, more specific one. For example, an LSTM-based generator can be trained on a chemical database and then fine-tuned to generate molecules with the desired activity against a target. Yang et al. [59] trained an LSTM-based neural network [60] on 200,000 compounds from the ChEMBL database. The model was fine-tuned with a smaller dataset of published p300 inhibitors and macrocyclic molecules with potential use against several targets to generate novel p300/CREB-binding protein (CBP) lead compounds. A focused library of 672 chemical structures was generated. After filtering, the top-ranked compounds by docking score were submitted for visual inspection and further systematic optimization. A potential candidate, B026, showed high inhibitory activity against p300/CBP in animal models of human cancer [59]. Similarly, Tan et al. [61] used an LSTM to design antipsychotic drugs. The LSTM was first pretrained to ensure that it could generate valid molecules and then fine-tuned to design molecules that target the D1, D2, 5-HT1A, and 5-HT2A receptors. Tan et al. combined the generative model with a multitask deep neural network (MTDNN) that predicts whether the generated molecules act on multiple G-protein coupled receptors (GPCRs), using bioactivities expressed as pIC50 and pEC50. Molecules with high predicted activity were used to expand the fine-tuning set at each iteration of the TL process. The validity of the generated compounds was 97% and the novelty was 87%.
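Validity and novelty figures like those above are typically computed from a batch of sampled strings. The sketch below uses one common set of definitions (validity over all samples, uniqueness over valid samples, novelty over unique valid samples); the reviewed studies may define the denominators differently. The `is_valid` argument is a placeholder: in practice it would be a SMILES parser, and strings would be canonicalized before comparison, both of which are assumptions outside this sketch.

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness and novelty of sampled molecules.

    generated: list of sampled strings (e.g. SMILES), duplicates allowed.
    training_set: strings seen during training.
    is_valid: placeholder predicate standing in for a real chemistry parser.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy usage with a crude validity check (balanced parentheses only).
metrics = generation_metrics(
    ["CCO", "CCO", "C(C", "CCN", "c1ccccc1"],
    training_set=["CCO"],
    is_valid=lambda s: s.count("(") == s.count(")"),
)
```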
The deep discriminative model achieved test-set performance of r² = 0.71 with a mean absolute error (MAE) of 0.47 on the IC50 dataset, and r² = 0.71 with an MAE of 0.54 on the EC50 dataset. A hit compound was obtained, and analogs of it were also designed. The activity profiles of six analogs were characterized in vitro. The antipsychotic activities of the selected compounds were then studied in the phencyclidine-induced locomotor hyperactivity test in ICR mice, showing good potential for subsequent development [61].
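The regression metrics quoted above can be computed as follows. This is a sketch: r² is shown here as the coefficient of determination, which is one common convention (some papers instead report the squared Pearson correlation). Because the MTDNN targets are expressed as pIC50/pEC50, the helper `pic50_from_nm` (a hypothetical name) also shows the standard conversion pIC50 = -log10(IC50 in mol/L), which for a value given in nM equals 9 - log10(IC50).

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 [M]); an IC50 in nM is 1e-9 M, hence 9 - log10(nM)."""
    return 9.0 - math.log10(ic50_nm)
```

On this scale, an IC50 of 10 nM corresponds to a pIC50 of 8, so an MAE of about 0.5 log units means predictions are typically within roughly a factor of three of the measured potency.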
A comparison of the deep generative models for de novo molecular design in [57], [59], and [61] shows that the models are pretrained to learn the general SMILES vocabulary and then fine-tuned on a smaller set of specific molecules to generate DDR1 inhibitors, p300/CBP inhibitors, and GPCR-targeting compounds, respectively. The generated compounds showed strong activity, with IC50 values of 10 nM against DDR1, 1.8 nM against p300 and 9.5 nM against CBP, and 1.6 nM against 5-HT1A, respectively. To optimize the generated molecules, Zhavoronkov et al. [57] and Tan et al. [61] incorporated RL into their models. In [57], the RL reward was based on a trending SOM that scored compound novelty, a general kinase SOM that distinguished kinase inhibitors from other molecules, and a specific kinase SOM that isolated DDR1 inhibitors. In [61], the MTDNN model provided the reward signals for generating more attractive molecules. A different approach was followed in [58]: selective DDR1 inhibitors were generated by decorating a potent scaffold, leading to the identification of a compound with a potent DDR1 inhibition profile (IC50 of 10.6 nM). This study implemented a global attention mechanism to assign different weights to the outputs of the hidden layers of the RNN.
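The global attention mentioned above can be illustrated with the standard Luong-style dot-product formulation. The exact variant used in [58] is not specified here, so the sketch below is an assumption: every RNN hidden state is scored against a query vector, the scores are normalized with a softmax, and the context vector is the resulting weighted sum of hidden states.

```python
import numpy as np


def global_attention(hidden_states, query):
    """Luong-style global (dot-product) attention, as an illustrative sketch.

    hidden_states: (T, d) array, one RNN output per time step.
    query: (d,) vector, e.g. the decoder state at the current step.
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = hidden_states @ query          # (T,) alignment scores
    scores = scores - scores.max()          # stabilize the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states       # weighted sum of hidden states
    return weights, context


# Toy example: the query aligns with the second hidden state.
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
weights, context = global_attention(H, np.array([0.0, 2.0]))
```

Because the weights sum to one, time steps whose hidden states align with the query dominate the context vector, which is how the model "focuses" on particular parts of the sequence.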
Although many deep learning models use SMILES to represent molecules [59,61], this notation has limitations. For example, a single molecule may be represented by several different SMILES strings. Moreover, SMILES may be too simple to convey the topological information of molecular structures. Molecular graphs intuitively express molecules with 2D topological information and are widely adopted as molecular representations for both generative and predictive models [55]. GCN models in drug-related applications construct graph representations of a molecule that include information about its chemical substructures by summing the features of adjacent atoms [13]. GCNs learn their own expert feature representations directly from the data and have been shown to capture complex relationships well, given sufficient data [62]. A model in this category was published by Yang et al., who proposed a directed message-passing paradigm for property prediction [62]. Their directed message-passing neural network (D-MPNN) matched or outperformed traditional models that use fixed molecular descriptors as well as other graph neural networks (GNNs). The main difference in their work was that, instead of using messages associated with vertices (atoms), the D-MPNN used messages associated with directed edges (bonds). Stokes et al. [63] applied this D-MPNN to structure-based antibiotic prediction, in the first reported study to use deep learning to explore a large-scale chemical library for antibiotic identification. A library of FDA-approved drugs and additional natural products was screened against E. coli, yielding a training dataset of molecules binarized as hit or non-hit. This dataset was used to train a D-MPNN binary classifier that predicts the probability that a new compound inhibits the growth of E. coli.
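The directed-bond message passing described above can be sketched in NumPy. This is an illustrative re-implementation under our own assumptions (random weights, ReLU activations, sum pooling, a fixed number of steps), not the published code: each bond yields two directed edges; the message to edge v→w sums the hidden states of edges k→v with k ≠ w, so information does not flow straight back along the same bond; atom states then aggregate incoming edges, and the molecule vector is concatenated with fixed molecule-level features before a sigmoid output. All function and weight names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def dmpnn_embedding(atom_feats, bonds, bond_feats, W, depth=3):
    """Molecule vector from directed-bond message passing (sketch).

    atom_feats: (n_atoms, a); bonds: list of undirected (u, v) pairs;
    bond_feats: (n_bonds, b), aligned with bonds;
    W: dict with "W_i" (a+b, d), "W_m" (d, d), "W_a" (a+d, d).
    """
    # Every undirected bond becomes two directed edges, u->v and v->u.
    edges, edge_feats = [], []
    for (u, v), f in zip(bonds, bond_feats):
        edges += [(u, v), (v, u)]
        edge_feats += [f, f]
    edge_feats = np.asarray(edge_feats)
    src = np.array([u for u, _ in edges])
    # Initial edge state from the source-atom and bond features.
    h0 = relu(np.concatenate([atom_feats[src], edge_feats], axis=1) @ W["W_i"])
    h = h0
    for _ in range(depth - 1):
        m = np.zeros_like(h)
        for i, (v, w) in enumerate(edges):
            for j, (k, tgt) in enumerate(edges):
                if tgt == v and k != w:      # incoming k->v, skipping reverse edge w->v
                    m[i] += h[j]
        h = relu(h0 + m @ W["W_m"])
    # Atom state: sum of hidden states on all incoming edges plus atom features.
    m_atom = np.zeros((atom_feats.shape[0], h.shape[1]))
    for j, (_, v) in enumerate(edges):
        m_atom[v] += h[j]
    h_atom = relu(np.concatenate([atom_feats, m_atom], axis=1) @ W["W_a"])
    return h_atom.sum(axis=0)                # sum-pool atoms into one vector


def hit_probability(mol_vec, fixed_feats, w_out, b_out=0.0):
    """Hybrid representation: learned vector + fixed descriptors -> sigmoid."""
    z = float(np.concatenate([mol_vec, fixed_feats]) @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-z))
```

The double loop over edges is quadratic and only meant to make the update rule explicit; a practical implementation would precompute, for each directed edge, the indices of its incoming edges.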
The resulting model achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.896 on the test data. An ensemble of trained models was then applied to molecules from the Drug Repurposing Hub [64]. After empirical testing, the authors proposed halicin, a preclinical nitrothiazole under investigation as a treatment for diabetes, as a candidate antibiotic. In vitro, halicin showed broad-spectrum bactericidal activity, and it effectively treated various infections in murine models [63]. Additionally, from a set of >107 million molecules from the ZINC and WuXi databases, the model identified eight antibacterial compounds that were structurally distant from known antibiotics. This study by Stokes et al. [63] indicates the potential of machine learning in antibiotic discovery to expand the antibiotic arsenal and increase the rate at which new molecular entities are discovered. Following this paradigm, Wang et al. used the same D-MPNN model [62] to identify Cav1.3 antagonists as Parkinson's-disease-relevant drug candidates [65]. They engineered a cell-based drug discovery platform for multiplexed analysis of Cav1 channel blockers, which was used for a pilot high-throughput screening (HTS) of plant essential oils. To identify the putative active constituents of the essential oils, in silico virtual screening was performed with the D-MPNN, which achieved an ROC-AUC of 0.978. Experimental testing of five candidate compounds confirmed that sclareol showed Cav1.3 antagonistic activity [65].
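Two steps in the paragraph above, averaging the predictions of an ensemble and summarizing classifier quality as ROC-AUC, can be sketched in plain Python. Averaging probabilities is one common way to combine ensemble members and is an assumption here, not necessarily the exact aggregation used in [63]. ROC-AUC is computed through its probabilistic interpretation: the chance that a randomly chosen hit scores higher than a randomly chosen non-hit, with ties counting one half.

```python
def ensemble_mean(prob_lists):
    """Average the per-molecule probabilities predicted by several models."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def roc_auc(labels, scores):
    """P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical models scoring four molecules.
avg = ensemble_mean([[0.9, 0.2, 0.6, 0.1], [0.7, 0.4, 0.8, 0.3]])
```

The pairwise comparison is quadratic in the number of molecules; for large screening sets, a rank-based formula or a library routine is the practical choice.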
Deep learning has also been employed to predict drug efficacy and the underlying pathogenic mechanisms. Using drugs and their corresponding transcriptional profiles as input, Zhu et al. [66] developed the deep-learning-based efficacy prediction system (DLEPS), which predicts drug efficacy from changes in transcriptional profiles. DLEPS uses chemical libraries and gene signatures to identify candidate disease treatments. In this algorithm, SMILES strings were passed through a CNN and encoded into a latent space by a GVAE, and a dense network then predicted the changes in transcriptional profiles. The predicted changes fit the training and test sets with ROC-AUC values of around 0.90 and 0.74, respectively. The study explored various gene signature inputs, including a dual up-/downregulated gene set from obesity studies, a dataset of multiple phenotypic manifestations of hyperuricemia, and independent disease-stage datasets for nonalcoholic steatohepatitis, yielding top drug candidates that were further tested experimentally [66].
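The idea of scoring a drug by how its predicted transcriptional change relates to a disease gene signature can be illustrated as follows. The sketch swaps in a simple mean-difference reversal score rather than reproducing the scoring used in DLEPS: a drug scores highly when it is predicted to lower genes that are upregulated in disease and raise genes that are downregulated. Gene names, drug names, and values are hypothetical.

```python
def reversal_score(predicted_change, up_genes, down_genes):
    """Positive when a drug pushes disease-up genes down and disease-down genes up.

    predicted_change: dict gene -> predicted expression change for one drug.
    """
    up = [predicted_change[g] for g in up_genes if g in predicted_change]
    down = [predicted_change[g] for g in down_genes if g in predicted_change]
    if not up or not down:
        raise ValueError("signature genes missing from the predicted profile")
    return sum(down) / len(down) - sum(up) / len(up)


def rank_candidates(profiles, up_genes, down_genes):
    """Order drugs from the strongest to the weakest predicted signature reversal."""
    scores = {drug: reversal_score(p, up_genes, down_genes) for drug, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


profiles = {
    "drug_a": {"G1": -1.0, "G2": -0.5, "G3": 1.0},   # reverses the signature
    "drug_b": {"G1": 0.8, "G2": 0.6, "G3": -0.4},    # mimics the signature
}
ranking = rank_candidates(profiles, up_genes=["G1", "G2"], down_genes=["G3"])
```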

De Novo Peptide Generation
For a model to be used in the discovery of drug-like molecules, it must first be trained to sort through the many characteristics of molecules and determine which properties should be retained or suppressed. Similarly, deep learning methods can be used in peptide science for tasks such as peptide identification, property prediction, and de novo peptide generation [67]. Müller et al. [68] presented a generative LSTM-RNN for combinatorial de novo peptide design. The LSTM-RNN was trained to recognize the patterns of helical antimicrobial peptides (AMPs), and the trained model was then used for sequence generation, producing 91.4% valid unique sequences. Of these sequences, 82% were predicted to be active AMPs, compared with 65% of randomly sampled sequences. Bolatchiev et al. [69] used this model for combinatorial de novo AMP design and in vivo evaluation of the most promising generated peptides. Unlike the original publication presenting the generative model [68], the authors trained it on all AMPs, not only helical peptides. An online tool with an accuracy of 87% was used to classify the generated peptides as AMPs, and 35 of the 200 generated sequences were selected. Further computational screening of these sequences yielded five peptides predicted to be active against various microorganisms, which were synthesized for further in vitro and in vivo studies [69]. Apart from sequence-based models such as RNNs, VAEs have also been used for peptide generation [70]. Like Bolatchiev et al., Das et al. trained a generative model to design AMPs, in their case with low toxicity. They used a large unlabeled dataset obtained from UniProt to train a VAE and a Wasserstein autoencoder (WAE). To sample peptides with the desired properties, the authors fitted a Gaussian mixture density estimator and linear property predictors on the latent variables of labeled peptide data.
They then used a rejection sampling scheme to sample the desired latent variables and control sequence generation. Das et al. showed that combining their VAE framework with molecular dynamics simulations and wet-lab experimentation yielded two novel AMPs within 48 days, highlighting the potential of AEs in peptide drug discovery [70]. This study shows that even when the deep generative AE is trained on a large unlabeled dataset, its latent space is informative of peptide properties. As a result, all generated AMPs were unique, valid, and optimized.
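The latent-space rejection sampling described above can be sketched as follows, under simplifying assumptions: a single diagonal Gaussian stands in for the fitted Gaussian mixture, each attribute predictor is a logistic model on the latent vector, and a proposal is accepted with probability equal to the product of the predicted attribute probabilities. Decoding accepted latent vectors back into peptide sequences (the job of the trained decoder) is omitted, and all parameter values are hypothetical.

```python
import math
import random


def logistic(z, w, b):
    """Linear attribute predictor on a latent vector, squashed to a probability."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))


def sample_latents(n, mean, std, classifiers, rng, max_tries=100_000):
    """Draw z from a Gaussian density over latent codes and keep it with
    probability equal to the product of the attribute probabilities."""
    accepted = []
    for _ in range(max_tries):
        z = [rng.gauss(m, s) for m, s in zip(mean, std)]
        p_accept = 1.0
        for w, b in classifiers:
            p_accept *= logistic(z, w, b)
        if rng.random() < p_accept:
            accepted.append(z)
            if len(accepted) == n:
                break
    return accepted


# Hypothetical 2-D latent space: attribute 1 ("antimicrobial") favors z[0] > 0,
# attribute 2 ("non-toxic") favors z[1] < 0.
classifiers = [([3.0, 0.0], 0.0), ([0.0, -3.0], 0.0)]
latents = sample_latents(200, [0.0, 0.0], [1.0, 1.0], classifiers, random.Random(0))
```

The design avoids retraining the generator for each new attribute: only the cheap predictors on the latent space change, which is the motivation the review gives for preferring this approach over RL.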
Combining a deep generative model with optimization or search methods, such as genetic algorithms or Bayesian optimization, can further improve peptide generation. Schissel et al. combined a generative model, a prediction model, and a genetic algorithm to generate optimized nuclear-targeting miniproteins [71]. An RNN-based generator produced novel cell-penetrating sequences, a CNN predictor estimated the activity of a given sequence, and a genetic algorithm optimized the sequences. In this predictor-optimizer loop, the sequences generated by the LSTM-RNN model were iteratively optimized. The predicted miniproteins were characterized as nontoxic and effectively delivered antisense cargo in animal studies [71]. For the inverse design model, multiple combinations of LSTM and nested LSTM layers were evaluated, achieving an accuracy of 76%.
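The predictor-optimizer loop above can be sketched as a small genetic algorithm. The predictor here is a hypothetical stand-in for the trained CNN (it simply rewards arginine content), and the seed sequences stand in for generator output. The GA uses truncation selection, single-point crossover, and per-residue mutation; these operator choices are our assumptions, not details reported in [71].

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_predictor(seq):
    """Hypothetical stand-in for the trained activity predictor."""
    return seq.count("R") / len(seq)


def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover of two parent sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def optimize(seeds, predictor, generations=30, rate=0.05, rng=None):
    """Evolve generator-proposed sequences toward higher predicted activity."""
    if rng is None:
        rng = random.Random(0)
    population = list(seeds)
    size = len(population)
    for _ in range(generations):
        population.sort(key=predictor, reverse=True)
        parents = population[: max(2, size // 2)]   # elitist truncation selection
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = parents + children
    return max(population, key=predictor)


seed_rng = random.Random(42)
seeds = ["".join(seed_rng.choice(AMINO_ACIDS) for _ in range(18)) for _ in range(12)]
best = optimize(seeds, toy_predictor, rng=random.Random(1))
```

Because the top half of each generation is carried over unchanged, the best predicted score can never decrease; the risk, as with any predictor-guided search, is drifting into regions where the predictor itself is unreliable.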

Interaction Prediction
Interaction prediction plays a vital role in drug discovery. According to polypharmacology, most drugs act on both primary and secondary targets. At the same time, neural networks can learn the properties of many types of data simultaneously. Thus, by combining deep learning with drug-protein (disease) networks, drug selectivity or protein promiscuity can be evaluated [72]. DTI prediction identifies the interaction sites between drug compounds and protein targets [73]. Furthermore, protein-protein interaction prediction is important in drug development for precisely locating interaction interfaces in pathway-regulatory approaches, and drug-drug interaction (DDI) prediction helps identify potential side effects and find new uses for existing drugs.
Machine learning methods, especially deep learning, are widely applied to DTI prediction. A crucial step in DTI prediction is feature extraction from drug-protein networks, for which AEs are commonly used. In [74,75], a stacked AE compressed the original high-dimensional vectors into low-dimensional vectors. Zeng et al. [74] proposed a deep learning methodology for identifying new targets of known drugs. A stacked AE encoded the relational properties, association information, and topological context of each node of a heterogeneous drug-gene-disease network into low-dimensional feature vectors. Topotecan was identified as a direct inhibitor (IC50 = 0.43 µM) of human retinoic-acid-receptor-related orphan receptor-gamma t (ROR-γt) with therapeutic effects in a multiple sclerosis mouse model. The proposed model, named deepDTnet, achieved high accuracy (ROC-AUC of 0.963). Similarly, Zhao et al. developed a DTI prediction framework, DLDTI [75]. A stacked AE was used to obtain low-dimensional feature vectors and an optimal mapping from the drug space to the protein space. The resulting feature vectors integrated the attribute characteristics, interaction information, and network topology of each target. The model was trained on these low-dimensional feature vectors to obtain the optimal mapping space, and a CNN was used to predict DTIs. DLDTI achieved promising performance, with an ROC-AUC of 0.917. New DTIs were identified by ranking candidates according to their proximity in the optimal mapping space. The predicted targets of tetramethylpyrazine were validated in a novel atherosclerosis model [75].
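Ranking candidate targets by proximity in a learned embedding space can be sketched as below. We assume that drugs and targets already have vectors in a shared low-dimensional space (for instance, produced by a stacked AE) and use cosine similarity as the proximity measure; the measure actually used by DLDTI may differ. Known interactions are excluded so that only new candidate targets are ranked. All identifiers and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_new_targets(drug_vec, target_vecs, known_targets=(), top_k=3):
    """Rank targets not already known for this drug by embedding proximity."""
    scores = {t: cosine(drug_vec, v) for t, v in target_vecs.items()
              if t not in known_targets}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


targets = {"T1": [1.0, 0.0, 0.0], "T2": [0.9, 0.1, 0.0],
           "T3": [0.0, 1.0, 0.0], "T4": [-1.0, 0.0, 0.0]}
candidates = rank_new_targets([1.0, 0.0, 0.0], targets, known_targets={"T1"}, top_k=2)
```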

Databases for Drug Discovery
In this section, we summarize the databases used to train the selected models presented above. Table 1 lists the databases used in Section 3.2 for models of de novo molecular design and molecular property prediction, together with the sizes of the training datasets. The ZINC database [76,77] contains a curated collection of commercially available chemical compounds prepared for virtual screening. Its newer version, ZINC15, contains over 120 million purchasable "drug-like" compounds, all in biologically relevant, ready-to-dock formats. The ZINC database was used as a pretraining dataset in [57,61]. Zhavoronkov et al. used it for the initial training of a VAE; the pretraining dataset was derived by filtering ZINC and removing structures containing atoms other than carbon, nitrogen, oxygen, sulfur, fluorine, chlorine, bromine, and hydrogen. In [61], a collection of molecules from ZINC was used to first train the LSTM model to ensure that it could generate rational "drug-like" molecules. Stokes et al. also used ZINC15 for virtual screening: the authors first trained a model on FDA-approved drugs and then predicted the antibiotic activity of >107 million molecules from ZINC15, identifying eight antibacterial compounds that are structurally distant from known antibiotics [63].
Another commonly used database for drug discovery is ChEMBL [78], which comprises bioactive molecules with drug-like properties. The database provides 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Zhavoronkov et al. fine-tuned the VAE to generate DDR1 kinase inhibitors using ChEMBL and the Integrity database [79], a collection of about half a million bioactive compounds. Tan et al. [58] filtered ChEMBL to retain only DDR/FGFR inhibitors and used it as a training set; slicing these inhibitors produced a scaffold-based library of 3603 million scaffold-decoration tuples. ChEMBL was used for both pretraining and fine-tuning in [59]: for pretraining, molecules that interact with human "single-protein" targets were retained, and after fine-tuning with p300 inhibitors, the LSTM-based molecular generator produced potential p300 inhibitors. Other, less frequently used databases for training deep learning models are presented in Table 1.

Table 1 (recovered rows)
Study | Database | Training dataset size
… | … [82], Reaxys, SciFinder [21] | 10,286
DLEPS [66] | L1000 project-Library of Integrated Network-Based Cellular Signatures [83] | 17,051
[63] | FDA (growth inhibition of E. coli) | 2335
 | Drug Repurposing Hub [64] | 6111
 | WuXi, ZINC | >107 million
[65] | Literature search (calcium channel blockers), MUV [84] | 240 400

The databases used in the studies presented in Sections 3.3 and 3.4 are shown in Table 2. Although publicly available datasets for protein informatics with labeled activity exist, their size is limited. For the generation of peptides with antimicrobial activity, which was the goal of the studies by Bolatchiev et al. [69] and Das et al. [70], labeled negative data are often scarcer than positive data. Bolatchiev et al. trained an LSTM-based generative model using the APD3 database [85]. The generated sequences were further filtered using online tools that predict AMPs.
A different approach was chosen by Das et al., who trained the generative model on unlabeled sequences from UniProt [86]. Labeled data collected from AmPEP [87], DBAASP [88], and ToxinPred [89] were used to train a classifier that distinguishes AMP from non-AMP and toxic from non-toxic sequences. By using a larger training dataset, Das et al. improved the generalizability of the generative model, and they controlled the generation of the desired peptides using a smaller labeled dataset. In [71], the goal was to generate nuclear-targeting abiotic miniproteins; thus, a more specific database of cell-penetrating peptides (CPPSite 2.) was used [90].
Section 3.4 presents models focused on DTI prediction. DrugBank [91], a comprehensive database of molecular information about drugs, was used to collect data in both studies [74,75]. DrugBank contains DTIs, DDIs, drug-disease networks (DDNs), and more. The Therapeutic Target Database (TTD) [92] contains information about known therapeutic protein and nucleic acid targets described in the literature. DrugBank, TTD, and PharmGKB [93] were used in [74] to build the DTI network. MetaADEDB [94], CTD [95], SIDER [96], and OFFSIDES [97] contain information on drugs and adverse effects and were used in [74] to build the drug-side effect network.

Drug Representation
String-based representations are the most frequent option for molecular encoding, and SMILES strings are the most widely used drug representation. Because SMILES is sequence-based, a string can be treated as a "sentence" from which representations are learned, and many deep generative techniques have been developed specifically for sequence generation. Therefore, SMILES is the most common molecular representation when generative models are applied to de novo drug design. An important advantage of SMILES is that it is easy to learn and human-readable compared with other molecular representations. Molecules are represented as SMILES strings in [58,59,61], where the deep generative models use LSTM-based networks to design novel molecules. Tan et al. [61] used canonical SMILES as input to the generative model and molecular-fingerprint-based descriptors in the discriminative model. A few years later, Tan et al. [58] used randomized SMILES, as previous studies had shown that models trained with randomized SMILES generate more unique molecules than models trained with canonical SMILES [98]. Molecules were also represented in the SMILES format in other studies [57,66]. In [66], the authors tried different ways of encoding the SMILES of chemical compounds: they encoded compounds into the latent space either as plain text through a VAE or as grammar trees through a GVAE, and the latter proved to be the best representation. Among the models included in this review, some studies focused on de novo peptide design. In these cases, peptide sequences were used as text input to train models that learn sequence grammar [69][70][71]. Apart from the generative model, Schissel et al. [71] trained a CNN to predict the activity of sequences, examining one-hot encodings and fingerprints as inputs to this classification model.
It was shown that the CNN-fingerprint model was able to extrapolate in the codomain and generate predicted activity values that were greater than any in the training set.
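For the one-hot option mentioned above, a peptide of length L is turned into an L × 20 binary matrix with one column per amino acid. The sketch assumes the 20 canonical residues and zero-padding to a fixed length; the miniproteins in [71] may include non-canonical residues, which would require a larger alphabet. The names `one_hot` and `INDEX` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 canonical residues (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a peptide as a max_len x 20 binary matrix, zero-padded at the end."""
    if len(sequence) > max_len:
        raise ValueError("sequence longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence):
        matrix[pos][INDEX[aa]] = 1
    return matrix
```

One-hot input tells the CNN only which residue sits at each position; fingerprint-style inputs instead describe chemical substructures, which is one plausible reason they extrapolated better to activity values outside the training range.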
In the studies of Stokes et al. [63] and Wang et al. [65], a molecular graph was constructed from the SMILES string of each molecule, following the initial work of Yang et al. [62]. A feature vector was initialized for each atom and bond based on computable features. The message-passing paradigm updated representations of directed bonds rather than atoms. Although message passing can extract features that depend on local chemistry, it may struggle to capture global features. For that reason, the molecular representation was a concatenation of learned features and fixed molecule-level features. For interaction prediction, more complex, heterogeneous networks were examined. A deep neural network for graph representation learning was employed to learn low-dimensional vector representations of drugs and targets. The drug-target network was described as a bipartite graph G(D,T,P), where D denotes the drug set, T the target set, and P the interaction set [74]. In [75], heterogeneous data were integrated, including circular fingerprints to capture the structural information of drugs, sequences of drug targets, and graph-embedding-based features for drugs and targets.
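The bipartite drug-target graph G(D,T,P) described above is commonly stored as a |D| × |T| biadjacency matrix, which is the kind of input a graph representation learner consumes. The sketch builds that matrix from the interaction set P and, as a simple derived quantity, counts the targets shared by each pair of drugs (A·Aᵀ). The representation-learning step itself is not shown, and the drug and target identifiers are hypothetical.

```python
def biadjacency(drugs, targets, interactions):
    """Binary |D| x |T| matrix of the bipartite graph G(D, T, P)."""
    d_idx = {d: i for i, d in enumerate(drugs)}
    t_idx = {t: j for j, t in enumerate(targets)}
    A = [[0] * len(targets) for _ in drugs]
    for d, t in interactions:                    # each (drug, target) pair in P
        A[d_idx[d]][t_idx[t]] = 1
    return A


def shared_targets(A):
    """A @ A^T: entry (i, k) counts targets shared by drugs i and k."""
    return [[sum(a * b for a, b in zip(row_i, row_k)) for row_k in A] for row_i in A]


A = biadjacency(["d1", "d2"], ["t1", "t2", "t3"],
                [("d1", "t1"), ("d1", "t3"), ("d2", "t3")])
```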

Discussion
The studies selected for this systematic review cover different applications of deep learning in drug discovery that were followed by in vivo evaluation: de novo molecular design, de novo peptide design (specifically AMPs and miniproteins), antibiotic discovery, drug repurposing, and drug efficacy prediction. The available code and tools from these studies are presented in Table 3.
Table 3. Open-source codes and web applications for different tasks of computational drug discovery presented in this systematic review.
Conditional molecular design samples new molecules from a conditional generative distribution without any additional optimization process. In [57] and [59], the generated molecules were narrowed down to the most "drug-like" ones by applying restrictions such as molecular weight, logP values, and no violation of Lipinski's rule of five. With conditional design, these models could directly produce molecules with the desired features. An interesting approach was presented by Das et al. [70], who did not use RL to design AMPs, since this method of targeted generation requires learning an optimal policy. Instead, they trained an attribute classifier on the latent space of a deep AE, which represented all known peptide sequences rather than only AMPs, and used it to select the informative region of the space for sampling. This study revealed that the latent space is linearly separable into different functional attributes and that sampling from the selected region can generate optimized peptides. By combining deep generative models with optimization methods such as genetic algorithms, generated samples can be further optimized toward improved functions. Schissel et al. [71] explored this idea, generating peptides with a deep generative model combined with a genetic algorithm. They also examined the representation of amino acids and concluded that topological fingerprints led to models with lower accuracy but enhanced generalizability to peptides with labels outside the range of the training data. For molecular representation in property prediction, the D-MPNN [62], which combines fixed and learned molecular features, was used in both [63] and [65]. This hybrid representation yielded higher performance and generalized better than either convolutional or fingerprint-based models. Both studies experimentally evaluated the results of virtual screening with the D-MPNN.
In [63], the authors identified an antibiotic that, although structurally divergent from conventional antibiotics, displays growth-inhibitory properties against a wide spectrum of pathogens. In [65], the authors identified an essential oil constituent, sclareol, that inhibits Cav1.3. Heterogeneous data sources of DTIs, DDNs, PPIs, and other relations were fed into AEs to generate low-dimensional but informative vectors for both drugs and targets [74,75].
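The drug-likeness restrictions mentioned above for [57] and [59], such as allowing no violation of Lipinski's rule of five, amount to a simple filter once descriptors are available. The sketch assumes molecular weight, logP, and hydrogen-bond donor and acceptor counts have already been computed (for example with a cheminformatics toolkit); the candidate names and values are hypothetical. Setting `max_violations=0` mirrors the "no violation" criterion in the text, whereas Lipinski's original formulation flags poor absorption only when more than one rule is broken.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Number of rule-of-five criteria the molecule breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def drug_like(candidates, max_violations=0):
    """Names of candidates with at most `max_violations` rule-of-five violations."""
    return [name for name, d in candidates.items()
            if lipinski_violations(d) <= max_violations]


candidates = {
    "mol_a": Descriptors(350.4, 2.1, 2, 5),
    "mol_b": Descriptors(612.7, 5.8, 3, 9),
}
```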
With this methodology, Zeng et al. [74] uncovered new targets of known drugs, contributing to drug repurposing. In [75], relationship-based features were obtained by training a stacked AE and used for DTI prediction, with a CNN prediction model trained on the deep features learned by the AE.
It is important to emphasize that machine learning is imperfect. Therefore, the success of drug discovery guided by deep neural network models rests heavily on coupling these approaches with appropriate experiments. Before in vitro studies, the outputs of the deep learning models were filtered using other methods. Generated molecules were evaluated using SOMs and pharmacophore modeling based on crystal structures in complex with DDR1 [57], kinase selectivity, and molecular docking [58,59]. Generated peptides were screened for antimicrobial activity, toxicity, and efficacy using sequence-level classifiers [70] and online prediction tools [69]. For the in vivo studies, the animal models used for biological evaluation and the compounds identified by the deep learning algorithms are presented in Tables 4 and 5, respectively.
Figure 7 shows how the frequencies of the selected model architectures changed per year. LSTM-RNN models were the most commonly published models between 2019 and 2022. The four LSTM-RNN models presented were used for the de novo design of antipsychotic drugs [61], p300/CBP lead compounds [59], nuclear-targeting miniproteins [71], and AMPs [69]. AEs were used in different architectures, including VAEs, GVAEs, and WAEs. Stacked AEs were used to generate low-dimensional vectors from the original high-dimensional vectors [74,75], and AE architectures were used to design DDR1 inhibitors [57,58]. A WAE was used as an alternative to a VAE for AMP design [70], and in the same year, a GVAE was used to encode SMILES into a latent space from which changes in transcriptional profiles were predicted [66]. CNNs were used to predict the activity of generated sequences [71] and to predict DTIs [75]. A multitask DNN was used for the virtual screening of molecules based on their activity scores [61]. The D-MPNN was presented in two studies: to predict the probability that a new compound inhibits the growth of a spectrum of pathogens [63], and to identify Parkinson's-disease-relevant drug candidates [65].
… | Chikusetsusaponin IV; Perillen; Trametinib
Stokes et al., 2020 [63] | c-Jun N-terminal kinase inhibitor SU3327 (halicin)
Tan et al., 2020 [61] | 1-(4-(4-(benzo[b]thiophen-4-yl)piperazin-1-yl)butyl)quinazoline-2,4(1H,3H)-dione

Conclusions
Drug discovery based on artificial intelligence has received much attention, since it has had a significant influence on developing novel drugs. Owing to rapid advancements in computer hardware, coupled with the growth in size and availability of publicly accessible datasets, deep learning has met unprecedented success in the field of CADD [99,100]. Advances in deep learning techniques have been successfully combined with well-established drug design strategies, such as drug repositioning, opening new pathways and prospects for identifying novel therapeutics with cutting-edge computational methods [101,102]. Particularly in de novo drug design, deep learning applications have gained increasing popularity, since numerous approaches (e.g., RNNs, AEs, GCNs) have been developed to build novel compounds with desired pharmacological and physicochemical properties [103,104].
In the present study, we present a systematic review of peer-reviewed research articles published from 2018 to April 2022. The selected articles were used to extract information on trends in deep learning models for drug design that were complemented with in vivo animal studies. The outcomes of this review include the deep learning architectures developed, the molecular representations, the workflow of each study, the animal models used to validate the selected compounds, and the resulting compounds. The deep learning algorithms identified were LSTM-RNNs, AEs, CNNs, MTDNNs, and D-MPNNs, with LSTM-RNNs being the most frequently used.
It is important to note that although several studies have examined the potential role of deep learning models in the discovery of new drugs, applications of these models in "real cases" are still uncommon because of the need for additional computational and experimental validation. This review selected breakthrough studies that started from a deep learning model and continued to in vivo studies, thus providing a validated process. We believe that deep learning will become an essential part of drug discovery in the near future and, as highlighted by the presented studies, will assist medicinal chemists in generating new ideas and accelerating the drug discovery cycle.