
Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development

Department of Pharmacology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), Virbhadra Road, Rishikesh 249203, Uttarakhand, India
Department of Psychiatry, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
Department of Medicine, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
Department of Anesthesiology, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), Mawdiangdiang, Shillong 793018, Meghalaya, India
Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore Campus, Tiruvalam Road, Katpadi, Vellore 632014, Tamil Nadu, India
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(3), 2026;
Submission received: 29 November 2022 / Revised: 27 December 2022 / Accepted: 28 December 2022 / Published: 19 January 2023
(This article belongs to the Special Issue Early-Stage Drug Discovery: Advances and Challenges)


The discovery and development of medicines may be regarded as the translational science effort most directly relevant to human health and well-being. However, bringing a new medication to market is a complex, costly, and protracted process, typically costing about USD 2.6 billion and taking an average of 12 years. The search for ways to cut costs and accelerate drug discovery has prompted intense effort across the pharmaceutical industry. The adoption of artificial intelligence (AI), and of its deep learning (DL) component in particular, has been facilitated across all fields by the availability of labelled big data, together with greatly increased computing power and cloud storage. AI has energized computer-aided drug discovery. This progress is sustained by the widespread adoption of machine learning (ML), especially DL, in many scientific disciplines and by technological improvements in computing hardware and software, among other factors. ML algorithms have been used extensively for computer-aided drug discovery. DL methods, such as artificial neural networks (ANNs) with multiple hidden processing layers, have recently seen a resurgence because they can extract features automatically from the input data and capture nonlinear input-output relationships. These properties of DL methods complement classical ML techniques, which rely on human-engineered molecular descriptors. Much of the early scepticism about the utility of AI in pharmaceutical discovery has begun to fade, to the benefit of medicinal chemistry. AI, together with modern experimental techniques, is expected to make the search for new and improved pharmaceuticals faster, cheaper, and more effective. DL-based methods have only just begun to address some fundamental problems in drug discovery.
Many methodological advances, such as “message-passing paradigms”, “spatial-symmetry-preserving networks”, “hybrid de novo design”, and other innovative ML approaches, are likely to become widespread and to help answer many of the largest and most intriguing open questions. Open data sharing and model development will play a decisive role in the progress of AI-driven drug discovery. This review addresses the prospective uses of AI to refine and strengthen the drug discovery process.

1. Introduction

The research and development of drugs comprises target identification, target validation, hit-to-lead discovery, lead optimization, preclinical candidate selection, preclinical evaluation, and clinical testing. Bringing a new prescription drug to market costs an average of almost USD 2.6 billion before tax [1] and takes roughly 10–15 years [2]. Despite these huge financial stakes, the estimated clinical approval success rate for novel small molecules during drug discovery and development is a meagre 13%, which implies a high risk of eventual failure. Computer-aided drug design has been hailed as the most promising way of improving this bleak situation, because it can guide decisions throughout the development process [3]. The methodology of drug discovery and the associated computer-aided drug design approaches are described in the treatise “Computer-Assisted Drug Design” [4]. Computational approaches allow a systematic assessment of molecular properties (such as physicochemical properties, selectivity, side effects, bioactivity, and pharmacokinetic parameters) at the theoretical level, and can generate optimized molecules with desirable properties in silico. Moreover, computational approaches with multi-objective optimization can be used to reduce the failure rate of preclinical lead molecules. In the context of drug design, artificial intelligence (AI) refers to the use of computer programs that analyse, learn from, and interpret pharmaceutical big data to discover new drug molecules, integrating advances in machine learning (ML) in a highly unified and automated way [5].
Driven by the advancement of ML methods and the growth of chemical and pharmacological data, AI has carved out a niche in drug design as a data-driven computational approach. In comparison with conventional approaches, ML-based approaches, as a branch of AI, do not rely on theoretical advances in complex, established physicochemical principles, but place greater emphasis on transforming massive biomedical big data into new insight and reusable knowledge (Figure 1). Typical ML algorithms include Logistic Regression (LR), Naive Bayesian Classification (NBC), k Nearest Neighbor (KNN), Multiple Linear Regression (MLR), Support Vector Machine (SVM), Probabilistic Neural Network (PNN), Binary Kernel Discrimination (BKD), Linear Discriminant Analysis (LDA), Random Forest (RF), Artificial Neural Network (ANN), Partial Least-Squares (PLS), Principal Component Analysis (PCA), and the like [6,7]. At present, AI technologies, specifically Deep Learning (DL), show tremendous promise in drug design because of their strong generalization and feature extraction power. Conventional ML approaches use manually crafted features, whereas DL approaches learn features from the input data automatically, combining simple features into complex ones through multi-layer feature extraction (Figure 1). Moreover, DL approaches commonly exhibit lower generalization error than conventional ML techniques, which helps them achieve better results on some benchmarks and competitive assessments. For instance, George Dahl’s team won the Merck Molecular Activity Challenge using AI technology, specifically DL algorithms [8].
Owing to these advantages, DL as a data mining approach has shown great promise in drug design. DL models generally include Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), autoencoders, and Restricted Boltzmann Machines (RBM). A brief review of DL algorithms can be found elsewhere [9,10,11], with a more comprehensive introduction to DL methodologies in the treatise “Deep Learning” [12]. This review acquaints the reader with the AI models relevant to drug design, with a particular focus on the application of DL algorithms in new drug discovery and development. The drug discovery and drug design topics and the AI approaches are summarized in Figure 2 and Figure 3.
This review also introduces ML techniques linked to drug design, including approaches to molecular representation, transfer learning for low-data settings, cross-validation strategies, and practical skills for training deep neural networks (DNNs). Lastly, it summarizes the uses of AI in drug design and offers an outlook on the future of AI in drug discovery and development.

1.1. Artificial Intelligence: Facts to Ponder

The last several years have witnessed a dramatic increase in data digitalization in the pharmaceutical sector. However, this digitalization brings with it the challenge of collecting, analysing, and applying that knowledge to solve complex clinical problems [13]. This encourages the use of AI, as it can process huge quantities of data with increased automation [14]. AI is a technology-based approach that uses multiple advanced tools and networks to mimic human intelligence. At the same time, it does not threaten to replace human physical presence altogether [15,16]. AI uses software and systems that can interpret and learn from input data to make independent decisions in pursuit of specific objectives. As described in this review, its use in the pharmaceutical sector has grown steadily. According to the McKinsey Global Institute, rapid advances in AI-guided automation are likely to change society’s work culture fundamentally [17,18].

1.2. AI: Networks and Tools

AI spans several method domains, such as knowledge representation, solution search, and reasoning, and, among them, the fundamental paradigm of ML. ML uses algorithms that can recognize patterns within a set of data that has been further classified. A subfield of ML is deep learning (DL), which uses artificial neural networks (ANNs). These networks consist of a set of interconnected sophisticated computing elements, or ‘perceptrons’, which are analogous to neurons in human nervous tissue and mimic the transmission of electrical impulses within the human central nervous system (CNS) [19]. ANNs comprise a set of nodes, each receiving a separate input and ultimately converting inputs to output, either singly or multi-linked, using algorithms to solve problems [20]. ANNs come in many varieties, such as multilayer perceptron (MLP) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs), which use either supervised or unsupervised training procedures [21,22]. MLP networks have applications including pattern recognition, optimization aids, process identification, and control; they are usually trained by supervised procedures operating in a single direction and may be used as universal pattern classifiers (UPCs) [23]. RNNs are networks with a closed loop that are able to memorize and store information, such as Boltzmann machines and Hopfield networks [23,24]. CNNs are a series of dynamic systems with local connections, each characterized by its topology, with uses in image and video processing, biological system modelling, processing complex brain functions, pattern recognition, and sophisticated signal processing [25].
The more complex forms include Kohonen networks, Radial Basis Function (RBF) networks, Learning Vector Quantization (LVQ) networks, counter-propagation networks (CPNs), and Adaptive Linear Neuron (later Adaptive Linear Element, ADALINE) networks [21,23]. Examples of AI method domains are depicted in Figure 2. Various algorithms have been developed based on the interconnections that form the fundamental framework of AI models. One example of an advanced tool built on AI is the Watson supercomputer by International Business Machines (IBM) (IBM, New York, NY, USA). This system was designed to support the analysis of a patient’s clinical data and its correlation with a vast database, leading to suggested treatment strategies for cancer. Such a system may also be used for the rapid detection of diseases. Its utility was demonstrated by its ability to detect breast cancer within 60 s [26,27].
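To make the notion of a ‘perceptron’ node described in this section concrete, the following is a minimal sketch of a single perceptron trained with the classic perceptron learning rule. The logical-AND toy data, the function names, and the integer learning rate are illustrative assumptions and do not come from the cited works.

```python
# Minimal single-perceptron sketch (illustrative only).
# The AND-gate data and all names below are assumptions made for this example.

def step(z):
    """Threshold activation: the node fires (1) when its weighted input is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the inputs followed by the threshold activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron learning rule: nudge the weights toward each misclassified sample."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Supervised training data: inputs paired with the desired output (logical AND).
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, x) for x, _ in AND_GATE]
```

A single perceptron can only separate classes with a straight line (or hyperplane); the multilayer networks discussed above stack many such nodes to represent nonlinear relationships.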

2. Futuristic Applications of AI in Drug Design

Drug design and development is an important area of research for pharmaceutical companies and chemical scientists. For a biological molecule to have any potential as a drug target, it must be “druggable”. In the post-genomic era, drug discovery has shifted towards applying new design principles to molecules or new strategies to bind, modulate, or degrade challenging biological targets for future innovative medicines. Traditionally, the pharmaceutical industry has focused on developing orally bioavailable small molecules against established (druggable) targets. Based on the physicochemical profiles of Phase II drugs, Lipinski’s Rule of Five (Ro5) was developed in 1997. Ro5 predicts that poor absorption or permeation is more likely when there are more than five hydrogen-bond donors (HBD > 5), more than ten hydrogen-bond acceptors (HBA > 10), the molecular weight is greater than 500 Da (MW > 500), and the calculated Log P is greater than five (cLogP > 5). Since then, Ro5 has served as a guide for designing developable molecules during drug discovery. While efforts to discover small-molecule Ro5 compounds that interact with established “druggable” targets have been productive, there is an increased demand for innovation to engage newer targets for transformative medicines. As a result, the identification and validation of novel biological targets have become a key focus in the early stages of drug discovery. Molecular modalities beyond the Rule of Five (bRo5) include small molecules acting via nontraditional modes of action (e.g., protein–protein interaction or PPI modulators), bifunctional bRo5 small molecules (e.g., proteolysis-targeting chimeras or PROTACs), peptides/peptidomimetics, and oligonucleotides (ONs). Carbohydrate-based drug discovery is an up-and-coming area of research in medicinal chemistry. Bioactive carbohydrates have opened up a new source for drug development.
More than 170 carbohydrate-based drugs have been successfully approved as anticoagulants, antitumor agents, antidiabetic agents, antibiotics, antiviral agents, and vaccines. However, most carbohydrates have low druggability. New methods and strategies to improve carbohydrates’ druggability are in high demand. Lipids are essential for life. They store energy, constitute cellular membranes, serve as signaling molecules, and modify proteins. In the long history of lipid research, many drugs targeting lipid receptors and enzymes that are responsible for lipid metabolism and function have been developed and applied to a variety of diseases. Lipid signaling pathways (prostanoids, leukotrienes, epoxy fatty acids, sphingolipids, lysophospholipids, endocannabinoids, and phosphoinositides) and lipid signaling proteins (lysophospholipid acyltransferases, phosphoinositide 3-kinase, and G protein-coupled receptors (GPCRs)) offer a wide array of druggable targets. However, the vast majority of the targets of approved drugs are proteins. A druggable protein is one that possesses folds that favor interactions with small drug-like molecules, be they endogenous or extraneous, and therefore is one that contains a binding site. These binding sites are expected to have certain attributes that enable high-affinity, site-specific binding with the drug-like molecule. As with all drug targets, a potential protein drug target must be linked to a disease process. Currently, there is a lack of knowledge about both the number of proteins that modern pharmaceuticals act upon and the number of potentially druggable proteins. The development of efficient and advanced systems for the targeted delivery of therapeutic agents with maximum efficiency and minimum risk poses a great challenge to chemical and biological scientists.
Researchers across the globe engage in traditional computational approaches like virtual screening (VS) and molecular docking to identify and characterize protein–protein as well as drug–protein interactions. But these approaches are imprecise and inaccurate. In addition, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an impediment in the drug discovery pipeline. AI and ML technology are innovative approaches that play a crucial role in drug discovery and development. This section of the review article provides crucial insights about how artificial neural networks (ANNs) and deep learning (DL) algorithms have modernized the techniques of elucidating the structure and function of proteins, unravelling hits, hit-to-lead optimization algorithmic schemes, and in silico evaluation of ADME/T properties. ML and DL algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modelling, quantitative structure–activity relationship (QSAR), drug repositioning, poly-pharmacology, and physico-chemical activity. Evidence from the past strengthens the implementation of AI and DL in this field. Moreover, novel data mining, curation, and management techniques provide critical support to recently developed modelling algorithms. In summary, AI and DL advancements provide an excellent opportunity for a rational drug design and discovery process, one which will eventually impact mankind.
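The Rule of Five thresholds quoted earlier in this section can be written as a simple filter. The sketch below is only an illustration: the `Descriptors` class, the function name, and both example compounds (including their descriptor values) are hypothetical, and in practice the descriptors would be computed with a cheminformatics toolkit rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # in Da
    clogp: float             # calculated Log P


def ro5_violations(d):
    """Return the list of Rule of Five criteria that the molecule exceeds."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("HBD > 5")
    if d.h_bond_acceptors > 10:
        violations.append("HBA > 10")
    if d.molecular_weight > 500:
        violations.append("MW > 500")
    if d.clogp > 5:
        violations.append("cLogP > 5")
    return violations


# Hypothetical small, drug-like molecule (values chosen for illustration).
small_molecule = Descriptors(h_bond_donors=1, h_bond_acceptors=4,
                             molecular_weight=180.2, clogp=1.2)

# Hypothetical bRo5 compound, e.g. a large bifunctional molecule.
large_molecule = Descriptors(h_bond_donors=4, h_bond_acceptors=14,
                             molecular_weight=950.0, clogp=4.5)

small_report = ro5_violations(small_molecule)
large_report = ro5_violations(large_molecule)
```

Returning the list of exceeded criteria, rather than a single pass/fail flag, keeps the filter usable for bRo5 modalities, where a violation is a property to be managed rather than an automatic rejection.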

2.1. The Structure and Function of Proteins

2.1.1. Prognostication of Protein Folding from Sequence (Predicting the 3D Structure of a Target Protein)

Most diseases are linked to dysfunctional proteins. By examining protein structures, structure-based drug design strategies can be used to discover active small molecules for protein targets. However, determining the three-dimensional (3D) structures of proteins currently requires a great deal of time and money, so it is useful to develop software that predicts the 3D structure of a protein. Although sequence data are available for almost all proteins, accurate de novo prediction of their 3D structures is still not possible. Recently, owing to their impressive feature extraction power, DL approaches have been applied to predict the secondary structure [28], backbone torsion angles [29], and residue contacts of proteins [30]. For example, a DL approach that combined one-dimensional (1D) and two-dimensional (2D) “Convolutional Neural Networks (CNN)” to predict residue contacts outperformed other approaches in the “12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12)” [30,31]. DL architectures can learn the relationship between sequence and structure through feature extraction. At present, accurately predicting the 3D structures of proteins remains an unrealized goal. Nevertheless, the DL approach has demonstrated tremendous potential to advance this field.

2.1.2. Prognostication of Protein–Protein Interactions

Protein–protein interactions (PPIs) are vital to many biological systems and may be associated with many disorders [32,33]. One PPI database, the “Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)”, stores close to 1.4 billion PPIs derived from both experimental and bioinformatics methods via literature curation [34]. STRING also stores computationally predicted interactions obtained from:
  • text mining of scientific documents,
  • interactions estimated from genomic attributes, and
  • interactions conveyed from model organisms, depending upon orthology.
All predicted or imported interactions are benchmarked against a common reference of functional association as annotated by the “Kyoto Encyclopedia of Genes and Genomes (KEGG)”.
The PPI interface is defined as the protein–protein association site, which is made up of multiple residues [35]. It has the potential to provide a new class of drug targets that differ from conventional targets such as ion channels, G-protein coupled receptors (GPCRs), kinases, and nuclear receptors [36,37]. For example, 1756 non-peptide inhibitors across 18 families of PPIs are recorded in the “inhibitors of protein–protein Database (iPPI-DB)” [38]. As a new class of targets, PPIs will expand the target space and promote the development of small-molecule compounds [39]. In contrast to conventional approaches, targeting PPIs may reduce adverse effects because it increases the biological specificity of the regulatory action [40]. For example, the compound DC_AC50 can block copper ion transport inside cells by binding to the copper-transfer machinery, and selectively suppresses tumor cell proliferation without affecting the survival of normal somatic cells [41].
To design drugs based on the structure of a protein–protein complex, it is essential to analyse the PPI interface. Unfortunately, in a great many cases precise PPI information is scarce [42], which has given rise to many computational methods for predicting the PPI interface. The template-based approach is straightforward and highly reliable because PPI interfaces are structurally conserved [43]. For example, “eFindSite” [44], a web server for PPI interface prediction, uses template-based, residue-based, and sequence-based features to train “Support Vector Machine (SVM)” and “Naïve Bayes Classifier (NBC)” models. Following the principle of complementarity, protein–protein docking methods (e.g., “ZDOCK” [45] and “SymmDock” [46]) can be used to predict the PPI interface when the structures of the two interacting proteins are available [47]. For these approaches, the difficult problem is how to predict the conformational changes that occur when two unbound proteins bind to each other. DL approaches can extract the most relevant sequence features to predict PPI interfaces, which represents a noticeable improvement over other ML technologies such as SVM [48].
Given the large buried surface area (1500–3000 Å²) of the interface [33], it is essential to search for druggable sites or regions on the interface. Detectable hot spots may represent druggable sites, since they contribute a large share of the binding free energy [35]. “Fragment Docking and Direct Coupling Analysis (FD-DCA)” has been used to identify druggable PPI sites [49]. The researchers first developed a fragment docking package named “iFitDock”, which can be used to search for druggable hot spots within PPI interfaces. Next, the small hot spots were clustered to form candidate binding sites. Finally, a scoring function based on the degree of evolutionary conservation was used to identify the optimal protein–protein binding sites. Overall, hot spots within the PPI interface have emerged as promising drug targets, and it is worthwhile to develop computational methods for identifying hot spots and designing small modulators that target PPI interfaces.

2.1.3. Prognosticating Drug–Protein Interactions

Drug–protein interactions (DPIs) play a crucial part in the success of a therapeutic entity. Predicting how a drug interacts with a receptor or protein is essential for understanding its efficacy and effectiveness, enables drug repurposing, and helps avoid poly-pharmacology [50]. A variety of AI approaches have proved useful for the accurate prediction of ligand–protein interactions, helping to ensure better therapeutic efficacy [50,51,52,53,54]. An SVM-based model, trained on about 15,000 protein–ligand interactions and built from primary protein sequences and the structural features of small molecules, has been reported to identify nine new compounds and their interactions with four important targets [55].
One research group used two RF models to predict possible DPIs by combining pharmacological and chemical data, and validated them against well-known algorithms, such as SVM, with high sensitivity and specificity. Moreover, these approaches were capable of predicting drug–target interactions and could be extended to target–disease and target–target interactions, thereby accelerating the drug discovery process [56]. The use of the “Synthetic Minority Over-Sampling Technique (SMOTE)” and the “Neighborhood Cleaning Rule (NCR)” to obtain optimized data for the subsequent development of iDrugTarget has also been reported. iDrugTarget combines four subpredictors (iDrug-Chl, iDrug-Enz, iDrug-GPCR, and iDrug-NR) for identifying interactions between a drug and ion channels, enzymes, G-protein-coupled receptors (GPCRs), and nuclear receptors (NR), respectively. When compared with existing predictors in target-jackknife tests, iDrugTarget outperformed them in both prediction accuracy and consistency [57].
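The synthetic minority over-sampling idea mentioned above addresses class imbalance (far fewer known interacting drug–target pairs than non-interacting ones) by interpolating new minority-class samples between existing neighbours. The sketch below illustrates only that core interpolation step, under simplifying assumptions: the function name, the toy two-dimensional feature vectors, and the random choice of base sample are assumptions for illustration, not the procedure used in the cited study.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric tuples (feature vectors).
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base sample, then one of its k nearest minority neighbours.
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class: four feature vectors (hypothetical values).
minority_class = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority_class, n_new=6)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stay inside the region the minority class already occupies; a cleaning step such as the Neighborhood Cleaning Rule would then be applied separately to remove noisy majority-class samples.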
The ability of AI to predict drug–target interactions has also been used to support the repurposing of existing drugs and the avoidance of poly-pharmacology. Repurposing an existing drug qualifies it directly for Phase II clinical trials [58]. This also reduces expenditure, as relaunching an existing drug costs about USD 8.4 million compared with the launch of a new drug entity (about USD 41.3 million) [59]. The ‘Guilt by association’ approach may be used to predict novel associations between a drug and a disease, using either a knowledge-based or a computationally driven network [60]. For computationally driven networks, ML is widely used, drawing on methods such as SVM, NN, and logistic regression, as well as DL. Logistic-regression-based platforms such as “PREDICT” and “SPACE”, as well as other ML techniques, consider drug–drug and disease–disease similarity, the similarity between target molecules, chemical structure, and gene expression profiles when repurposing a drug [61].
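Similarity-based repurposing of the kind described above needs a numerical measure of how alike two drugs are. A common choice for chemical structure is the Tanimoto coefficient between binary fingerprints; the sketch below is a minimal illustration in which each fingerprint is a set of "on" bit indices. The drug names and fingerprint values are hypothetical, and this is not the similarity measure of any specific platform named in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # Convention chosen here for two empty fingerprints.
    return len(a & b) / len(union)


def rank_by_similarity(query_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each number is the index of a structural feature bit.
query_drug = {1, 2, 3, 4}
approved_drugs = {
    "drug_A": {1, 2, 3, 4, 5},   # shares 4 of 5 features with the query
    "drug_B": {3, 4, 8, 9},      # shares 2 of 6 features
    "drug_C": {10, 11, 12},      # shares none
}

ranking = rank_by_similarity(query_drug, approved_drugs)
```

Under the "guilt by association" assumption, the indications or targets of the highest-ranked neighbours become hypotheses for the query drug, to be combined with the other similarity sources (targets, gene expression, disease–disease similarity) mentioned above.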
A cellular network-based DL technology (“deepDTnet”) has been explored to predict a new therapeutic use for topotecan, which is currently used as a topoisomerase inhibitor. It could also be used to treat multiple sclerosis by inhibiting human retinoic acid receptor-related orphan receptor-gamma t (ROR-γt) [62]. This platform is currently covered by a provisional US patent. Self-Organizing Maps (SOMs) belong to the unsupervised category of ML and are used in drug repurposing. SOMs use a ligand-based approach to find novel off-targets for a set of drug molecules: the system is trained on a defined number of compounds with known biological activities and is then used to analyse other compounds [63]. More recently, Deep Neural Networks (DNN) have been used to repurpose existing drugs with proven activity against influenza virus, SARS-CoV, and HIV, as well as drugs that are 3C-like protease inhibitors. For this, Extended Connectivity Fingerprints (ECFP), Functional-Class Fingerprints (FCFPs), and a Ghose-Crippen octanol-water partition coefficient (ALogP_count) were used to train the AI platform. The results indicated that 13 of the screened molecules could be taken forward for further development on the basis of their cytotoxicity and viral inhibition [64].
Drug–protein interactions can also be used to predict the likelihood of poly-pharmacology, that is, the tendency of a drug molecule to bind to multiple receptors and so produce off-target adverse effects [65]. AI can design a new molecule on the rationale of poly-pharmacology and aid the generation of safer drug molecules [66]. AI platforms such as SOM, together with the vast databases available, can be used to link several compounds to numerous targets and off-targets. Bayesian classifiers and Similarity Ensemble Approach (SEA) algorithms can be used to establish links between the pharmacological profiles of drugs and their possible targets [63].
One research group demonstrated the utility of “KinomeX”, an AI-based online platform that uses DNNs to detect the poly-pharmacology of kinases from their chemical structures. This platform uses a DNN trained with ~14,000 bioactivity data points developed from > 300 kinases. It therefore has practical relevance for studying the overall selectivity of a drug towards the kinase family and particular kinase subfamilies, which helps in designing novel chemical modifiers. This research group used NVP-BHG712 as a model compound to predict its primary targets, as well as its off-targets, with reasonable accuracy [67]. One prominent example is Cyclica’s cloud-based proteome-screening AI platform, named “Ligand Express”. It can be used to find receptors that interact with a particular small molecule (whose molecular description is given as a SMILES string) and to produce on- and off-target interactions. This helps in understanding the possible adverse effects of the drug [68].

2.1.4. De Novo Drug Design

Over the last several years, the de novo drug design approach has been widely used to design drug molecules. The conventional approach to de novo drug design is being replaced by emerging DL methods, because the former suffers from complicated synthesis routes and difficulty in predicting the bioactivity of the novel molecule [69]. Computer-aided synthesis planning can also suggest millions of structures that could be synthesized and predict several different synthesis routes for them [70].
The “Chematica program” [71], now renamed “Synthia”, can encode a set of rules into the machine and propose possible synthesis routes for eight medicinally essential targets. Synthia has proved efficient in terms of both improving yield and reducing expense. The program is also capable of providing alternative synthesis strategies for patented products and is thought to be helpful in the synthesis of compounds that have not yet been made. Similarly, DNNs that focus on the rules of organic chemistry and retrosynthesis, with the help of “Monte-Carlo Tree Search (MCTS)” and symbolic AI, assist in reaction prediction and in the process of drug discovery and design, and are much faster than conventional approaches [72,73].
A research group refined a framework where an inflexible forward reaction blueprint was practiced on a set of reactants to generate chemically achievable products having a convincing reaction rate. ML was utilized to analyze the principal product depending upon a score provided by the NNs [74]. A DNN framework termed the “Reinforced Adversarial Neural Computer (RANC)” based on Reinforcement Learning (RL) was utilized for small organic molecule de novo design. This system underwent training with compounds characterized as SMILES strings. This then originated compounds with preordained chemical descriptors with respect to MW, logP, and Topological Polar Surface Area (TPSA). RANC was probed against one different platform, ORGANIC, where RANC performed better in originating unique structures free from notable attenuation of the length of their structure [75].
RNN was also dependent upon the “Long Short-Term Memory (LSTM)” associated with compounds garnered from the ChEMBL database and introduced as SMILES strings. Such a module was engaged to originate a varied library of compounds for Virtual Screening (VS). Such a method was targeted to solicit innovative compounds for a specific target, like 5-HT2A receptor, Staphylococcus aureus, as well as Plasmodium falciparum target sites [76].
The Reinforcement Learning for Structural Evolution (RLSE) program dealing with de novo drug synthesis by engaging generative as well as predictive DNNs to evolve fresh molecules has been documented. For this, the generative paradigm delivers more exclusive molecules with respect to SMILE strings grounded upon a stack memory, although the predictive approaches are utilized to presage the attributes of the originated molecule [77]. Efforts are underway to harness the generative AI paradigm to craft retinoid X as well as PPAR agonist compounds, with coveted therapeutic outcomes free from necessitating baffling regulations. Five molecules have been successfully originated, four of these have exhibited convincing modulatory actions in cell assays, hence underscoring the utility of generative AI in fresh compound origination [78]. The performance of AI in the de novo modeling of compounds can be salutary to the pharmaceutical industry due to its multifarious benefits, such as making provision for online learning and concurrent refinement of the previously-learned data and proposing feasible synthesis pathways for compounds, with consequent brisk lead conception and progression [76,79].

2.2. Hit Discovery

2.2.1. Drug Repurposing

Drug repurposing, also known as drug repositioning, is the identification of new therapeutic uses for approved drugs [80,81], which can shorten the time and reduce the risks of drug development [80]. Drug repurposing is possible because many drugs have multiple targets [82], and these targets may mediate different effects, which illustrates the high heterogeneity of drug-disease associations. For example, metformin, which was approved for the management of type 2 diabetes, may prolong lifespan [83,84,85].
Drugs and diseases are the two fundamental components of drug repurposing. Auxiliary entities are also involved, such as the targets of drugs and the genes associated with diseases. Because these associations are so varied, network analysis can be used to describe the relationships among the entities [81]. For the purpose of drug design, there are nine types of relevant networks: gene regulatory, metabolic, protein–protein, drug–target, drug–drug, drug–disease, target–disease, drug–adverse effect, and disease–disease networks [81]. The primary assumption of the network-based approach is that similar drugs frequently share similar targets or activities [86]. The information contained in any single network is limited and incomplete, so it is essential to merge multiple networks into an integrated network when repositioning a drug. In particular, it is important to combine drug repurposing with drug target prediction, because the target can be viewed as the link between the drug and the disease. DTINet, a heterogeneous network approach that integrates information from multiple networks via a network diffusion algorithm and a dimensionality reduction method, was used to predict new targets and therapeutic indications [87]. For example, this approach predicted that alendronate, chlorpropamide, and telmisartan could have previously unrecognized cyclooxygenase-inhibiting activity. These activities were subsequently confirmed experimentally by measuring the production of proinflammatory factors, and the three molecules thus provide high-confidence hits for preventing inflammation.
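The network diffusion idea can be illustrated with a random walk with restart, one common form of diffusion on a graph. The sketch below is only a toy illustration and not the DTINet implementation: the node names, the edge list, and the `restart` value are hypothetical, and a real pipeline would diffuse over several networks before reducing dimensionality.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, iterations=100):
    """Diffuse probability mass from `seed` over an undirected graph.

    `adjacency` maps each node to a list of its neighbours.
    Returns a dict of steady-state visiting probabilities.
    """
    scores = {node: 0.0 for node in adjacency}
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, mass in scores.items():
            neighbours = adjacency[node]
            if not neighbours:
                # Isolated node: keep the walking mass where it is.
                updated[node] += (1.0 - restart) * mass
                continue
            share = (1.0 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        # With probability `restart` the walker jumps back to the seed.
        updated[seed] += restart
        scores = updated
    return scores


# Hypothetical toy network linking drugs, targets, and diseases.
toy_network = {
    "drug_A": ["target_1"],
    "drug_B": ["target_1", "target_2"],
    "target_1": ["drug_A", "drug_B", "disease_X"],
    "target_2": ["drug_B", "disease_Y"],
    "disease_X": ["target_1"],
    "disease_Y": ["target_2"],
}
proximity = random_walk_with_restart(toy_network, seed="drug_A")
```

In this toy graph, `disease_X` receives a higher score than `disease_Y` because it is reached from `drug_A` through a single shared target, which is the intuition behind treating the target as the link between drug and disease.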

2.2.2. Virtual Screening (VS)

Virtual screening is the use of algorithms and software to identify bioactive molecules (hits) from in-house compound collections or commercial chemical libraries. It offers a highly efficient way to discover novel hits and to filter out molecules with unfavorable scaffolds during the early stages of drug development [6]. Virtual screening approaches include docking-based, pharmacophore-based [88], similarity searching [89], and ML methods [90]. Broadly, these approaches fall into two categories: structure-based and ligand-based. Molecular docking has been applied extensively when the 3D structure of the target protein is available [91]. Although numerous successful applications of docking-based virtual screening have been reported [92], the technique has clear drawbacks. For instance, docking scoring functions cannot predict binding affinities accurately because they treat solvation and entropic effects incompletely [93], and protein flexibility complicates the problem further [91]. Furthermore, because most docking approaches consider only binding affinity and overlook other parameters such as residence time [94], the docking score is not an ideal indicator of drug efficacy, and the false-positive rate of docking-based VS is high [91,95].
In contrast to docking-based virtual screening, ligand-based virtual screening approaches do not rely on 3D protein structural data. Instead, they attempt to correlate molecular properties (descriptors) with bioactivity classes [6]. In this context, ML algorithms such as SVMs have often been used for virtual screening [7,90,96], and have shown high yields (the proportion of known hits that are predicted) together with low false-hit rates (the proportion of false hits among predicted hits) [97]. More recently, DL approaches have been tested in VS because of their strong classification capability, robust feature extraction, and small generalization error [10,98]. One difficulty is that active molecules are sparse in general databases, so a substantial amount of search time is typically spent in VS [77,97]. To address this, an LSTM network model based on the analogy between natural language and the Simplified Molecular-Input Line-Entry System (SMILES) was implemented to generate focused molecule libraries containing compounds similar to the training compounds [77]. The new molecular libraries generated by Recurrent Neural Networks (RNNs) can then be screened with ML algorithms such as Deep Neural Networks (DNNs) and Gradient Boosting Trees (GBTs). Likewise, owing to its strong generative capacity, an Adversarial AutoEncoder (AAE) model was trained on the NCI-60 cell line assay data [99], and can then be used to generate molecular fingerprints for exploring promising anticancer agents [100].
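As a minimal sketch of the similarity-searching idea, the snippet below ranks a small library by Tanimoto similarity to a query. It assumes that each molecule has already been converted to a set of "on" fingerprint bits; the bit sets and compound names are hypothetical, and in practice a cheminformatics toolkit would compute the fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(a & b) / len(union)


def rank_library(query_fp, library):
    """Return compound names sorted from most to least similar to the query."""
    return sorted(
        library,
        key=lambda name: tanimoto(query_fp, library[name]),
        reverse=True,
    )


# Hypothetical fingerprints expressed as sets of on-bit indices.
query = {1, 2, 3}
library = {
    "cmpd_1": {2, 3, 4},      # shares 2 of 4 distinct bits with the query
    "cmpd_2": {1, 2, 3, 9},   # shares 3 of 4 distinct bits
    "cmpd_3": {7, 8},         # shares nothing
}
ranking = rank_library(query, library)
```

Compounds at the top of `ranking` would be prioritized as candidate hits; the similarity cutoff used to define a hit is a project-specific choice.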

2.2.3. Activity Scoring

As stated earlier, the central component of molecular docking is the scoring function, which is designed to estimate the binding affinity of drug-like molecules for a target of interest [101]. Owing to their robust nonlinear mapping capacity, ML-based scoring functions perform better by efficiently extracting many features, such as geometric features, chemical properties, and physical force-field terms [102]. Such scoring functions can be regarded as data-driven black-box models, because they predict binding affinity or ligand-protein interactions directly from experimental data and bypass the complicated physical functional forms used in docking [103]. ML algorithms such as Random Forest (RF) and Support Vector Machine (SVM) can be used to improve the performance of scoring functions. For example, instead of assuming a linear additive combination of energy terms, an SVM model that captured the nonlinear relationship between the individual energy terms taken from the docking program “eHiTS” and experimental binding affinity data showed improved screening power as well as scoring power [104,105]. Wang and Zhang reported ΔvinaRF, a parameterized correction approach combining RF with the AutoDock scoring function [106], which performed well in comparison with GlideScore XP [107]. Recently, owing to the strong performance of Convolutional Neural Networks (CNNs) in image processing [108], several researchers have attempted to use CNNs to extract features from images of protein-ligand interactions in order to predict protein-ligand affinity. One research group used a 3D graph CNN to predict ligand-protein binding affinities [109], and showed that the predicted affinities correlated acceptably with experimental data within the datasets.
The real strength of DL lies in its ability to learn complex and abstract features from basic, raw inputs. Hence, it is important to represent the basic features of the compound-protein complex, such as atom types, atomic charges, interatomic distances, and amino acid types [108]. DeepVS, a CNN-based method, can learn abstract features from these basic characteristics (the atom context), and it performed better than conventional docking programs such as Internal Coordinate Mechanics (ICM) [109] and GLIDE SP [110] on the Directory of Useful Decoys (DUD) in terms of the “area under the receiver operating characteristic curve (AUC ROC)” and the enrichment factor [111]. In principle, the CNN approach predicts binding affinity by extracting features from the image of the protein-ligand interaction, which is closely analogous to a knowledge-based scoring function with enhanced predictive power.
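The two benchmark metrics named above can be computed as in the following sketch. It assumes that a higher score means a compound is predicted to be more likely active (docking energies, where lower is better, would need their sign flipped), and the scores and labels are invented purely for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of actives and decoys.

    Equals the probability that a random active is scored above a random
    decoy, with ties counted as one half.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top `fraction` of the ranked list over the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hit_rate_top = sum(y for _, y in ranked[:n_top]) / n_top
    hit_rate_all = sum(labels) / len(labels)
    return hit_rate_top / hit_rate_all


# Invented example: 3 actives (label 1) among 10 screened compounds.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc = roc_auc(scores, labels)                    # 20 of 21 active/decoy pairs ordered correctly
ef_20 = enrichment_factor(scores, labels, 0.20)  # top 2 compounds are both active
```

An enrichment factor above 1 means that the top of the ranked list contains proportionally more actives than the library as a whole.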

2.3. Hit-to-Lead Optimization

2.3.1. Quantitative Structure-Activity Relationship (QSAR)/Quantitative Structure-Property Relationship (QSPR) and Structure-Aided Modelling with AI

During hit-to-lead optimization, QSAR analysis can be used to identify potent lead molecules from a collection of hit analogues by predicting the biological activity of related molecules. QSAR refers to the use of mathematical techniques to establish a quantitative mapping between the structural or physicochemical properties of molecules and their associated biological activity [112]. A QSAR study mainly comprises data collection, selection and generation of molecular descriptors, construction of mathematical models, evaluation and interpretation of the models, and application of the models [113]. Of these, the crucial points are the representation of the chemical structure and the mathematical model that captures the structure-activity relationship. Once the descriptors have been chosen, an appropriate mathematical algorithm must be identified to fit that relationship. In 1964, Hansch et al. proposed the famous “Hansch Equation”, which applied linear regression to physicochemical descriptors (the hydrophobic parameter, the electronic parameter, and the steric parameter) to describe the 2D structure-activity relationship, opening the era of QSAR analysis [114]. In the same year, a research group developed the “Free-Wilson” technique to describe the relationship between chemical structure and bioactivity, based on the assumption that the contributions of substituents to the activity of a compound are additive [115]. In contrast to the Hansch technique, the Free-Wilson approach does not require physicochemical parameters and can predict biological activity directly from the chemical structure by encoding the chemical framework. With the progress of ML algorithms, numerous approaches, such as RF and SVM, have subsequently been used to build mathematical models [116,117,118].
More recently, DL algorithms have been integrated into QSAR modeling because they can handle diverse chemical features and extract features automatically. George Dahl’s team won the Merck Molecular Activity Challenge (a Kaggle competition held in 2012 and concerned with QSAR problems) with ensemble models comprising multi-task DNN, Gaussian process regression, and gradient boosting machine techniques [8]. Encouraged by the Kaggle results, Dahl et al. went on to investigate the multi-task DNN systematically, and their results showed that the multi-task DNN outperformed the single-task neural network approach, because the multi-task approach can learn general features by sharing parameters among different but related tasks [119]. A research group integrated multi-task neural networks into the “DeepChem” platform, which facilitates the use of multi-task neural network algorithms in drug development [120]. They also assessed performance and found that multi-task deep networks were robust and outperformed random forests (RFs) on diverse tasks. Another research group used a DNN with Canvas descriptors to build classification and regression models to predict the binding affinities of human β-secretase 1 (hBACE-1) inhibitors [121]. On the validation set, this DNN gave robust classification performance with an accuracy of 0.82 and, in addition, showed favorable regression performance with a coefficient of determination (R²) of 0.74 and a mean absolute error (MAE) of 0.52. Their results also showed that the DNN approach with 2D descriptors gave better results than force-field-based methods (e.g., CoMFA), which is partly due to the strong generalization ability of DL approaches.
Evidently, DL-aided QSAR algorithms with improved activity prediction performance will play an increasingly useful role in subsequent hit-to-lead optimization.
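For orientation, the two classical models can be written in the simplified linear forms commonly quoted in textbooks. These are not reproduced from [114,115]; the symbol choices are assumptions made here, and Hansch-type equations in the literature often include an additional quadratic term in the hydrophobic parameter. Here C is the concentration producing a defined biological response, π is the hydrophobic substituent constant, σ is the electronic (Hammett) constant, E_s is the steric parameter, a, b, c, and d are regression coefficients, μ is the activity of the reference compound, a_ij is the contribution of substituent j at position i, and X_ij is 1 if that substituent is present and 0 otherwise.

```latex
% Hansch-type linear relationship (coefficients fitted by regression)
\log\frac{1}{C} = a\,\pi + b\,\sigma + c\,E_s + d

% Free-Wilson model: substituent contributions simply add up
\log\frac{1}{C} = \mu + \sum_{i}\sum_{j} a_{ij}\,X_{ij}
```

The second form makes explicit why no physicochemical parameters are needed in the Free-Wilson approach: the only inputs are indicator variables describing which substituents are present.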
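The regression and classification metrics quoted for QSAR models in this review (R², MAE, RMSE, and accuracy) can be computed as in the sketch below. The formulas are the standard definitions; the numerical values are invented for illustration and are not taken from any cited study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of class labels predicted correctly."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Invented activity values (e.g., pIC50) for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
r2 = r_squared(observed, predicted)
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Because MAE and RMSE carry the units of the modeled property, they are only comparable between studies that predict the same quantity on the same scale.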

2.3.2. Generative Schemes for De Novo Drug Design with AI

De novo drug design refers to generating new chemical entities that modulate the target of interest [122]. Conventional de novo design approaches, such as fragment-based methods, can generate new molecules from scratch, but most of these are difficult to synthesize because the resulting structures are complex and impractical [123]. Moreover, it is difficult to assess their biological activity because of the shortcomings of scoring functions described in an earlier section.
Owing to their strong generative and learning capabilities, DL algorithms have been used to automatically generate new structures with desired properties [124]. One research group developed a deep reinforcement learning approach that fine-tunes an RNN to generate compounds with desired biological activity [125]. “Simplified Molecular-Input Line-Entry System (SMILES)” strings of molecules taken from ChEMBL were used to train the “Recurrent Neural Network (RNN)” to learn the syntax of SMILES, and the RNN could then generate molecules by sampling from the conditional probability distribution learned from the training set. In reinforcement learning, Agents are the decision-makers that take actions under given conditions. When an Agent’s action results in a positive reward, the Agent’s tendency to take that action is increased [126]. An SVM activity model, built on the ligands in the training dataset, served as the scoring function and steered the action policy toward a high expected reward. When the RNN was combined with the deep reinforcement learning (DRL) approach to generate compounds for dopamine receptor type 2, > 95% of the generated structures were predicted to be biologically active by the SVM scoring function.
Another use of DL-based generative models is the application of auto-encoders to produce novel molecules. A research group combined the Variational AutoEncoder (VAE) with the Multilayer Perceptron (MLP) to generate new molecules with desirable properties automatically [127]. The network comprised three components: the encoder, the decoder, and the predictor. The encoder converts discrete SMILES strings into continuous vectors in latent space, and the decoder converts these vectors back into discrete SMILES strings. The MLP predicts the properties of the compounds, and gradient-based optimization can be used to find continuous vectors with high predicted values of the property. Because the vector representation in the latent space is continuous, gradient-based optimization combined with Bayesian inference can be used to rapidly identify compounds with the desired properties. The model can automatically generate human-readable chemical structures with better predicted properties. However, it also produced many invalid SMILES strings that do not correspond to legitimate chemical structures. To overcome this problem, a group used the grammar VAE, which constrains the output to valid SMILES syntax [128]. More recently, a group introduced an AAE model named druGAN to generate molecular fingerprints, which outperformed the VAE model in terms of reconstruction error, generative capacity, and feature extraction [129].
To assess whether a generated molecule is synthetically accessible, Coley et al. defined a synthetic complexity metric by training a neural network on a reaction database [130]. The premise for calculating synthetic complexity is that a synthetic reaction is a process that increases the complexity of the reactants. For a synthetic reaction, this implies that the complexity score of the product must exceed that of the reactants. Hence, the group encoded each chemical reaction as a set of (reactant, product) pairs and set out to learn a scoring function that captures the inequality between reactant complexity and product complexity. Because neural networks have strong function approximation capabilities [131], the group used 22 million (reactant, product) pairs to train the neural network to learn the scoring function, and their results showed that the learned function (SCScore) captures the increase in complexity over the course of a synthesis. This approach will help chemists carry out retrosynthetic analysis and, in addition, help them set aside impractical molecules in drug design by assessing their synthetic complexity.
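The idea of generating molecules by sampling from a learned conditional probability distribution can be illustrated with a deliberately simple stand-in. The sketch below replaces the RNN with a first-order character model (each character depends only on the previous one), which is an assumption made here for brevity and not the method of [125]. The training strings are a hypothetical four-molecule set, tokenization is per character (so two-letter atoms such as Cl would need a proper tokenizer), and sampled strings are not guaranteed to be valid SMILES.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # artificial start and end tokens


def fit_bigram(smiles_list):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for smiles in smiles_list:
        tokens = [START] + list(smiles) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_smiles(counts, rng, max_len=40):
    """Sample one string character by character from P(next | previous)."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical training set: ethanol, ethylamine, benzene, acetic acid.
training = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
model = fit_bigram(training)
generated = sample_smiles(model, random.Random(0))
```

An RNN plays the same role as `counts` here but conditions on the whole preceding sequence, which is what allows it to learn ring closures and balanced parentheses; a reinforcement learning step would then reweight the sampling probabilities according to a reward such as a predicted activity score.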
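The inequality between reactant and product complexity can be turned into a training objective with a pairwise hinge loss, sketched below. This is an illustration of the principle rather than the published SCScore code: the `margin` value is an assumed parameter, and `toy_score`, which simply counts letters in a SMILES string, is a hypothetical stand-in for the neural network.

```python
def pairwise_hinge_loss(score, pairs, margin=0.25):
    """Mean hinge loss over (reactant, product) pairs.

    A pair contributes zero loss only when the product scores at least
    `margin` higher than the reactant.
    """
    total = 0.0
    for reactant, product in pairs:
        total += max(0.0, margin + score(reactant) - score(product))
    return total / len(pairs)


def toy_score(smiles):
    """Hypothetical complexity proxy: number of letters in the SMILES string."""
    return float(sum(ch.isalpha() for ch in smiles))


# Ethanol -> ethyl acetate: the product is scored as more complex.
forward_pairs = [("CCO", "CCOC(C)=O")]
# The same pair reversed violates the assumed inequality.
reversed_pairs = [("CCOC(C)=O", "CCO")]

loss_forward = pairwise_hinge_loss(toy_score, forward_pairs)
loss_reversed = pairwise_hinge_loss(toy_score, reversed_pairs)
```

Training a model to minimize this loss over many reaction pairs pushes it toward a score that rises along a synthesis, which is the behavior described for the learned function above.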

2.3.3. Automated Chemical Synthesis Planning with AI

Foretelling the Retrosynthesis Roadmap

Retrosynthesis is an established strategy for planning organic syntheses. With the progress of AI, this task can now be carried out much more effectively [73,74,132,133,134]. Once virtual screening has assessed a molecule for its likely biological activity and toxicity, the search begins for an efficient chemical synthesis route to the drug candidate. This task is often challenging and inexact. Although hundreds of thousands of transformations are known, there is no assurance that novel molecules can be made efficiently, owing to novel structural features or conflicting reactivities [135].
Retrosynthetic analysis recursively searches for ‘backward’ reaction pathways until a set of simpler, accessible precursor compounds is reached [74]. Because retrosynthesis route prediction involves successive disconnections of the target molecule at many positions, Monte Carlo Tree Search (MCTS) [136] is the algorithm of choice for making branching decisions. Monte Carlo simulations carry out random search steps without branching until an optimal result is reached. Software for Computer-Assisted Synthesis Planning (CASP) [137,138] was developed previously to support retrosynthetic analysis, but did not gain full acceptance among chemists. Such programs require human expertise to be encoded as executable rules, but formalizing chemistry by manual encoding cannot keep pace with exponentially growing knowledge, and the results retrieved from reaction databases often lacked chemical intelligence [74]. ML algorithms trained on experimental data can now be used (i) to predict the likelihood of a transformation at a particular branching point, and (ii) to guide the choice of the random steps. At each transformation step, the compound (or an intermediate) can be related to different precursor molecules through predefined transformation rules. AI programs can be trained on the scientific literature with regard to the yields and costs of these transformations, and the AI can then predict the most suitable retrosynthesis route for a selected compound.
A recently reported 3N-Monte Carlo Tree Search (3N-MCTS) technique [74] combines three different neural networks with MCTS to generate routes for Computer-Assisted Synthesis Planning (CASP). Each network performs a separate task: (i) an expansion node; (ii) a rollout node; and (iii) an update node. For the expansion node, the algorithm searches retrospectively for new ways to transform the compound (or an intermediate). This incorporates an ‘in-scope’ filter, in which the feasibility of a transformation is assessed on the basis of 12.4 million transformation rules from the scientific literature [139]. The neural networks must be trained in order to predict the best transformation for the compound (or intermediate) at hand, and hence to guide the selection of expansion routes. Because the literature contains predominantly positive data, a transformation is deemed less feasible if its reverse reaction gives a high yield. Choosing high-yielding transformations also helps to reduce the likelihood of by-products [74]. For the rollout node, the ‘in-scope’ filter is analogous to that in the expansion node, except that only commonly reported transformation rules are used. This design permits a slower, more thorough search for the best transformations during the expansion phase, but a faster evaluation of position values during the rollout phase [140]. For the update node, the evaluation of a particular route is incorporated into the search tree. For a molecule submitted for retrosynthetic analysis, these nodes are executed repeatedly to search for the transformations with the highest scores. These ultimately determine feasible precursors along the complete reaction pathway [74].
Besides finding a reaction route, the time taken to arrive at a solution is also an important measure of algorithm performance. A time limit can be imposed to evaluate the fraction of problems that an algorithm is able to solve. On the test set of compounds, MCTS outperformed various available software packages. MCTS solved 80% of retrosynthesis problems when a time limit of 5 s per molecule was imposed [74], and the solution rate can exceed 90% if the time limit is extended to 60 s. More strikingly, 3N-MCTS is 20 times faster per molecule than the conventional Monte Carlo approach [74].
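The recursive character of retrosynthetic analysis can be sketched with a depth-limited search over a hand-written rule table. Everything named below is hypothetical: the molecule labels, the `rules` table (product mapped to alternative precursor sets), and the `stock` of purchasable compounds are invented, and a real planner would derive its transformations from reaction data and rank the alternatives rather than trying them in order.

```python
def find_route(target, rules, stock, max_depth=5):
    """Return a list of (product, precursors) steps, or None if no route exists.

    The search stops descending when a molecule is in `stock` and gives up
    on a branch once `max_depth` disconnections have been used.
    """
    if target in stock:
        return []  # already purchasable: nothing to make
    if max_depth == 0:
        return None
    for precursors in rules.get(target, []):
        route = [(target, precursors)]
        for precursor in precursors:
            sub_route = find_route(precursor, rules, stock, max_depth - 1)
            if sub_route is None:
                break  # this disconnection fails, try the next alternative
            route.extend(sub_route)
        else:
            return route  # every precursor could be traced back to stock
    return None


# Hypothetical backward rules: product -> list of alternative precursor sets.
rules = {
    "amide": [("acid_chloride", "amine"), ("acid", "amine")],
    "acid": [("ester",)],
}
stock = {"ester", "amine"}

route = find_route("amide", rules, stock)
```

In this toy case the first disconnection fails because `acid_chloride` can be neither bought nor made, so the search falls back to the second alternative. Exhaustive search of this kind scales poorly with the number of rules, which is the motivation for guiding the branch choices with MCTS and learned models.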
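The way a tree search balances well-scoring transformations against rarely tried ones can be illustrated with the textbook UCT selection rule. This is a generic sketch and not necessarily the selection formula used in [74], where neural network outputs additionally bias the choice; the exploration constant `c`, the child names, and the statistics are all assumptions.

```python
import math


def uct_score(total_value, visits, parent_visits, c=1.4):
    """Upper confidence bound for one child of a search-tree node."""
    if visits == 0:
        return math.inf  # always try an unvisited child first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=1.4):
    """Pick the child name with the highest UCT score.

    `children` maps a name to a (total_value, visits) tuple.
    """
    return max(
        children,
        key=lambda name: uct_score(*children[name], parent_visits, c),
    )


# Hypothetical statistics for three candidate disconnections.
children = {
    "disconnection_a": (3.0, 5),
    "disconnection_b": (0.0, 0),   # never tried yet
    "disconnection_c": (1.0, 4),
}
first_pick = select_child(children, parent_visits=9)

# With exploration switched off, the best average value wins.
visited_only = {k: v for k, v in children.items() if v[1] > 0}
greedy_pick = select_child(visited_only, parent_visits=9, c=0.0)
```

After a rollout, the update step would add the rollout's result to `total_value` and increment `visits` along the path from the selected node back to the root.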

Prediction of Yield of Reaction and Understanding of Reaction Scheme

AI programs can not only propose synthesis routes but can also predict the products and yields of organic reactions from the molecular properties of the reactants. Previously, predicting the outcome of complex chemical reactions was a major bottleneck [133]. Quantum chemistry methods, for instance the “Hartree–Fock Method” [134], semi-empirical methods (AM1, PM3), and density functional theory, may overcome this obstacle, and in many cases the outcome of an experiment can be simulated well in silico. Many studies using AI to automate and improve yield prediction have recently been reported [141,142,143,144], and Doyle and Dreher reported that ML can be used to predict the yields of a Buchwald–Hartwig coupling reaction [145]. This reaction forms carbon–nitrogen bonds between aryl halides and amines using a palladium catalyst, and has been used extensively in the synthesis of pharmaceuticals, in which aryl amine bonds are widespread. In this study, vibrational frequencies and dipole moments computed by quantum chemistry were used as descriptors, and the product yields for a given set of reactants were measured by high-throughput experimental synthesis. An RF model was then used to learn the relationship between the input descriptors and product yields [133]. When reactant variants were used, the algorithm also predicted the yields of other expected products with high accuracy [145].

Synthesis Methods Digitized and Standardized

There are ambitious plans to use AI to automate chemical synthesis with minimal manual intervention. Established technologies, such as the ‘solid phase’ approach in which the growing polymer chain is attached to an insoluble matrix, have automated the production of many classes of compounds, including peptides [146] and oligonucleotides [147]. However, these rely on separate protocols, because standardized digital automation methods for computer control of chemical reactions are lacking and no universal software exists for the computational control of chemical operation systems [148] (Table 1). The “Chemputer platform” [149] was recently developed as a standard that integrates codified standard recipes, or chemical codes, for compound synthesis. The scheme is run by the “Chempiler program” [149], which takes codified methods from a scripting language, “Chemical Assembly (ChASM)”, and also manages the low-level instructions for the modules that make up the robotic system. ChASM draws upon a chemical descriptive language (ΧDL) that uniquely and systematically captures all the information required for a synthesis operation [149]. The physical modules (e.g., the source flask and the target flask) and their connectivity and representation are described as a directed graph using an open-source markup language termed “GraphML” [150]. With “GraphML”, “Chempiler” can control the robotic procedures so that users can carry out chemical syntheses exactly, without manual reconfiguration. The system was validated by the successful synthesis of three pharmaceutical molecules, diphenhydramine hydrochloride, rufinamide, and sildenafil, without any human intervention, and with yields and purities comparable to or better than those achieved manually [149].
This work represents a step toward the full automation of bench-scale chemistry, with the added benefits of improved reproducibility, safety, and accessibility of complex compounds.
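Representing the hardware as a directed graph makes it possible to work out how material can be moved between modules. The sketch below illustrates that idea with a plain adjacency dictionary and a breadth-first search. It is not Chempiler's code and does not read GraphML: the module names and connections are hypothetical, and a real rig description would be loaded from a GraphML file with a graph library.

```python
from collections import deque


def find_transfer_path(graph, source, target):
    """Breadth-first search for the shortest directed path between two modules.

    `graph` maps each module to the list of modules it can deliver liquid to.
    Returns the path as a list of module names, or None if none exists.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical rig: edges point in the direction liquid can flow.
rig = {
    "flask_source": ["valve_1"],
    "valve_1": ["pump_1", "waste"],
    "pump_1": ["valve_2"],
    "valve_2": ["flask_target"],
}
path = find_transfer_path(rig, "flask_source", "flask_target")
```

Because the edges are directed, no path is found from `waste` back to a flask, which mirrors the physical constraint that liquid can only flow one way through those connections.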

AI-Enabled Mechanized Reaction Space Sampling

Synthesis robots combined with AI can also be used to explore uncharted reaction space. Recently, Leroy Cronin and colleagues used a synthesis robot to run reactions with substrates that were not selected in advance, where the choice of substrates was encoded as a vector representation that served as the input to an SVM model [168]. Using automated analysis of each sample by infrared (IR) and NMR spectroscopy, the model performed a binary classification of the reactivity of each substrate pair. The reaction database was then updated accordingly, and a Linear Discriminant Analysis (LDA) [169] algorithm was trained on the chemical space to predict the likelihood of the remaining reactions. LDA searches for a linear combination of chemical features that predicts whether or not a reaction occurs. This iterative workflow predicted the reactivity of roughly 1000 reaction combinations with ~ 80% accuracy using real-time data from only a few experiments [170]. When this ‘self-driving’ methodology was applied to Suzuki–Miyaura reactions [171], the predicted reactive combinations were followed up manually by a chemist, leading to the discovery of four previously unknown reactions. Following comparison with the reactants and products of millions of reactions, the Tanimoto similarity scores [172] of the four previously unknown reactions were found to be in the top 10 percentile, suggesting that these reactions are distinct from others selected at random [170]. This method is an important step in the digitization of chemistry that could make real-time exploration of chemical space a reality and help chemists discover new drug leads in a more time- and cost-effective manner.

2.4. In Silico Assessment of ADME/T Attributes

2.4.1. Physico-Chemical Characteristics

Early identification of compounds with undesirable physico-chemical properties in a drug discovery pipeline reduces the likelihood of failure. Various DL-based algorithms have been developed for this purpose [173]. Duvenaud et al. used a CNN–ANN to predict solubility by extracting information directly from the molecular graph, with good predictive performance (MAE of 0.53 ± 0.07) [158]. The strength of this approach lies in its interpretability. For example, the fragments that confer solubility on a molecule, such as a hydrophilic R-OH substituent, can be identified by tracing back through the model. Encouraged by Duvenaud’s work, Coley et al. used a tensor-based convolutional embedding of attributed molecular graphs to predict aqueous solubility, which outperformed Duvenaud’s approach (MAE of 0.424 ± 0.005) [130]. The approach used a molecular tensor combining bond-level and atom-level features to represent the attributed molecular graph. Compared with Duvenaud’s model, Coley’s approach used more atom-level information to predict the aqueous solubility of the compound.
Because a strong relationship has been identified between oral drug absorption and the Caco-2 permeability coefficient (Papp) [174,175], predicting the Papp of a drug candidate plays a crucial role in assessing its pharmacokinetic properties. A set of 1272 molecules with Caco-2 permeability data was collected, and Boosting, Support Vector Machine (SVM) regression, Partial Least Squares (PLS), and Multiple Linear Regression (MLR) were used to build predictive models with 30 descriptors [176]. The Boosting model gave the best results, with good predictive performance (R² = 0.81, root mean square error (RMSE) = 0.31) on the test set, and this model strictly followed the Organization for Economic Co-operation and Development (OECD) principles for QSAR/QSPR [177]. A series of procedures adhering to the OECD principles ensures the consistency and reliability of the model.

2.4.2. Pharmacokinetic Parameters (Absorption, Distribution, Biotransformation and Excretion)

Drug absorption is the process by which a medication enters the bloodstream from the site of administration. Bioavailability is a critical pharmacokinetic attribute that reflects the extent of absorption. Predicting the bioavailability of a compound can help the medicinal chemist to refine its absorption characteristics. One research group assembled a dataset of 1014 compounds and used an MLR model to predict bioavailability from structural fingerprints and molecular properties [178]. A genetic function approximation method was used to select the molecular properties for model training automatically, and the model gave good predictive performance, with a correlation coefficient of 0.71 and an RMSE of 0.2355.
Drug distribution is the process by which drug molecules move from the blood into the interstitial and intracellular fluids [179]. The steady-state volume of distribution (VDss) of a medication is the ratio of the amount of drug in the body to its steady-state plasma concentration (CPss). The VDss indicates the extent to which a drug is distributed in the tissues and is a crucial parameter for appraising drug distribution. Predicting VDss can help medicinal chemists to make structural changes that improve pharmacokinetic characteristics. One research group assembled a dataset of 1096 molecules and built Partial Least Squares (PLS) and Random Forest (RF) models to predict VDss [180]. The predictions on the external test set were disappointing, as only about 50% of the molecules were within two-fold error. Evidently, it is difficult to predict VDss purely from molecular structure, as multiple unidentified factors may influence it.
After administration, a drug undergoes metabolism, which may diminish its effects or, in a few instances, generate toxic metabolites. Predicting the sites of biotransformation accurately can guide structural refinement to assure the metabolic stability of the molecule. A large body of drug metabolism data has been collected, and multiple ML schemes have been used to predict the sites at which molecules are biotransformed by different metabolic enzymes, such as cytochrome P450s (CYP450s), aldehyde oxidase, and uridine 5′-diphosphoglucuronosyltransferases (UGTs). For instance, the neural network-based “XenoSite” [181] can identify the sites at which small molecules are biotransformed by CYP450s with an overall accuracy of 87% [182]. Moreover, the “XenoSite” scheme also employs a neural network trained on a large database of UGT biotransformations to predict the UGT sites of metabolism of a molecule [183].
Drug excretion is the process by which medications and their metabolites are eliminated from the body. Metabolites are generally water-soluble and readily excreted, while some drugs are eliminated directly without biotransformation [184]. Lombardo et al. employed Principal Components Analysis (PCA) to predict the primary clearance pathway, and the model discriminated well among the clearance mechanisms, with a predictive accuracy of 84% [185]. Building on this elimination-pathway model, the group used a PLS algorithm to predict total human clearance; the PLS model performed satisfactorily and was comparable to animal scaling approaches.
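To make the VDss definition and the "within two-fold error" criterion concrete, the following is a minimal sketch. The function names and all numerical values are hypothetical and are not data from the cited study.

```python
def vdss(amount_in_body_mg, cpss_mg_per_l):
    """Steady-state volume of distribution (L): amount in body / plasma concentration."""
    return amount_in_body_mg / cpss_mg_per_l

def fold_error(predicted, observed):
    """Symmetric fold error: always >= 1, equal to 1 for a perfect prediction."""
    return max(predicted / observed, observed / predicted)

def fraction_within_fold(predicted, observed, fold=2.0):
    """Fraction of compounds whose prediction lies within the given fold error."""
    hits = sum(1 for p, o in zip(predicted, observed) if fold_error(p, o) <= fold)
    return hits / len(predicted)

# Hypothetical VDss values in litres (illustrative numbers only).
observed_vdss = [10.0, 50.0, 100.0, 200.0]
predicted_vdss = [15.0, 20.0, 250.0, 180.0]

print(vdss(100.0, 2.0))
print(fraction_within_fold(predicted_vdss, observed_vdss))
```

In this toy example, two of the four predictions differ from the observed value by a factor of 2.5, so only half of the compounds fall within two-fold error.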

2.4.3. Toxicity and the ADME/T Multi-Task Neural Network

During the development of new drugs, pre-clinical and clinical toxicity accounts for the attrition of roughly 33% of lead compounds [186]. Hence, predicting the toxic effects of compounds is invaluable for refining lead compounds and reducing the risk of failure during drug development. Conventionally, drug toxicity (e.g., hepatotoxicity and nephrotoxicity) is predicted from rule-based expert knowledge and structural alerts, which tend to produce false positives and cannot capture all relevant structural features. Because DL algorithms can handle varied chemical entities and extract features automatically, they now deliver strong performance in toxicity prediction. For instance, based on “Molecular Graph Encoding-Convolutional Neural Networks (MGE–CNN)”, Xu et al. built an acute oral toxicity prediction model whose results were superior to previously reported SVM-based approaches [187]. In the MGE–CNN scheme, molecular encoding, feature extraction, and model building are carried out together within neural network training. The MGE–CNN scheme is also flexible, since the molecular fingerprints can be tailored to the specific problem. A research group mapped the toxicological features of the fingerprints back to the atomic level and found several highlighted fragments that match structural alerts listed in “ToxAlerts” [188]. Hence, like Duvenaud’s model, the model by Xu et al. is also interpretable. Another group developed a multi-task DNN algorithm named “DeepTox” to predict toxic effects, and “DeepTox” outperformed numerous competitors in the Tox21 competition [163].
In a multi-task neural network, a single model with shared parameters is trained to predict several distinct but closely related tasks. Compared with a single-task neural network, a multi-task network usually performs better, because sharing parameters across tasks helps the model to learn features common to them.
Pharmacokinetic processes (drug absorption, distribution, biotransformation, excretion) and drug toxicity in the human body are related, so multi-task neural networks can improve predictive performance on these tasks. ADME/T experimental datasets from Vertex Pharmaceuticals have been used to compare single-task and multi-task neural networks, and the results indicated that, as anticipated, the multi-task models yielded superior results [189].
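The parameter sharing described above can be sketched as one hidden layer common to all tasks, followed by a separate output head per task. This is only an untrained forward pass under assumed layer sizes; the class name, task names and input features are hypothetical and do not reproduce the architecture used in the cited work.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def init_layer(n_out, n_in, rng):
    """Random weights and zero biases (no training is performed in this sketch)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class MultiTaskNet:
    """Shared hidden layer plus one sigmoid output head per task."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_features, rng)
        self.heads = {task: init_layer(1, n_hidden, rng) for task in tasks}

    def forward(self, features):
        # The hidden representation is computed once and reused by every head.
        hidden = relu(dense(features, *self.shared))
        return {task: sigmoid(dense(hidden, *head)[0])
                for task, head in self.heads.items()}

# Hypothetical endpoints and a hypothetical 4-feature compound description.
net = MultiTaskNet(n_features=4, n_hidden=3,
                   tasks=["absorption", "clearance", "toxicity"])
print(net.forward([1.0, 0.0, 1.0, 0.5]))
```

During training, the errors from all heads would update the shared layer, which is how related tasks can inform one another.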

3. Machine Learning Schemes and Usable Algorithms for Drug Design Scenarios

The representation of molecules has been of interest to scientists since the nineteenth century. Traditionally, molecules are represented as structure diagrams with bonds and atoms, and this is likely the representation most people picture when they think about molecules. However, alternative representations are imperative for the computational processing of chemical structures in cheminformatics. The advent of computers led to the development of a wide array of machine-readable chemical representations. Computers permitted the rapid digital storage and querying of compounds and their structures, swift modifications of digital data, and augmented physical storage efficiency. Algorithms were implemented to visualize compounds as 2D depictions, and the computational visualization of compounds in 3D space was popularized with the advent of specialized programs.
The lead optimization step of drug discovery is fundamentally a low-data problem. When biological studies provide proof that a particular molecule can modulate essential pathways to achieve therapeutic activity, the discovered molecule often fails as a potential drug for numerous reasons including toxicity, low activity, or low solubility. The central problem of small-molecule-based drug-discovery is to refine the candidate molecule by locating analogue molecules with enhanced pharmaceutical activity and reduced risks to the patient. Yet, with only a small amount of biological data available on the candidate and related molecules, it is challenging to form accurate predictions for novel compounds. Recent work has established that standard ML techniques such as random forests and simple deep-networks are capable of learning meaningful chemical information from only a few hundred compounds. Other recent advances in ML have demonstrated that in some circumstances, nontrivial predictors may be learned from only a few data points. These methods work by using related data to learn a meaningful distance metric over the space of possible inputs. This sophisticated metric is used to compare new data points to the limited available data and subsequently predict properties of these new data points. More broadly, these techniques are termed as “one-shot learning” methods. In ML, generalization usually refers to the ability of an algorithm to be effective across various inputs. It means that the ML model does not encounter performance degradation on the new inputs from the same distribution of the training data. Cross-validation (CV) is a technique for evaluating a ML model and testing its performance. CV is commonly used in applied ML tasks. It helps to compare and select an appropriate model for the specific predictive modeling problem. 
CV is easy to understand and implement, and it tends to give a less biased estimate of model performance than other evaluation methods. All this makes cross-validation a powerful tool for selecting the best model for a specific task. Although DL models outperform various conventional ML algorithms, they involve many more parameters and more varied architectures, which leads to several problems during training, particularly when the samples are insufficient or the feature matrix is sparse. This section of the review details the aforementioned drug design approaches utilizing ML algorithms. Many open-source implementations of AI-assisted drug design methods are listed in Table 1.
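As a minimal sketch of how cross-validation partitions a dataset, the hypothetical helper below splits sample indices into k folds, each of which serves once as the held-out test set. It assumes that the samples have already been shuffled; no model is trained here.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples once as test."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

A model would be fitted on each training part and scored on the matching test part, and the k scores averaged to compare candidate models.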

3.1. Approaches for Molecular Depiction

Molecular fingerprints, numerical descriptors, ASCII strings, and graphs that represent compounds may be used as input features for ML algorithms in drug design. Molecular fingerprints encode molecular features as a sequence of binary bits (“1” indicating that a feature is present and “0” that it is absent). In drug design, molecular fingerprints are routinely used to predict compound properties and compute molecular similarity, since they are a simple and powerful way to represent compounds. Currently, the molecular fingerprints most often used as neural network inputs are structure-based 2D fingerprints, such as the Molecular ACCess System (MACCS) [190], the Extended-Connectivity Fingerprint (ECFP) [191], the Functional Class Fingerprint (FCFP) and Molprint2D [192]. For instance, MACCS has been used to train an Adversarial AutoEncoder (AAE) algorithm to search for anti-neoplastic compounds [105].
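Molecular similarity between two binary fingerprints is commonly measured with the Tanimoto coefficient, the number of bits set in both fingerprints divided by the number set in either. The sketch below uses two hypothetical 8-bit fingerprints; real fingerprints such as MACCS or ECFP are considerably longer.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Convention chosen for this sketch: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0

# Hypothetical fingerprints (1 = feature present, 0 = feature absent).
compound_a = [1, 1, 0, 1, 0, 0, 1, 0]
compound_b = [1, 0, 0, 1, 0, 1, 1, 0]

print(tanimoto(compound_a, compound_b))  # 3 shared bits / 5 bits set in either = 0.6
```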
Chemists have long used 2D molecular graphs to depict molecular structures and reason about molecular properties qualitatively. The progress of AI now makes it feasible to do this computationally. A CNN is a powerful tool for extracting features from the molecular graph automatically, and these features can serve as compound representations for predicting bioactivity [193], toxicity [187], physicochemical properties [158] and protein-ligand affinity [113]. Compared with ECFP, graph-convolutional approaches are more adaptable, as the graph architecture can be tailored to the task at hand. Moreover, the graph-convolutional architecture can be combined with neural networks to predict molecular properties, so that feature extraction and model building are accomplished concurrently during training. Molecular graph CNN fingerprints include Duvenaud’s graph convolutional fingerprints, which are based on radially expanding atom neighborhoods [158], Kearnes’ graph convolutional fingerprints based on atoms, bonds and pairwise relationships [194], and Coley’s graph convolutional fingerprints based on the molecular tensor. The principle behind Duvenaud’s graph convolutional fingerprints is similar to that of ECFP: both progressively enlarge molecular substructures by expanding outward from each atom. Notably, Duvenaud et al. first encoded atomic features (e.g., valence, atomic identity, and number of hydrogens) and bond features into vectors, and then used these atom and bond feature vectors to build atom-neighborhood features that form the initial molecular structure vectors. A CNN is then used to extract features from these initial vectors at each iteration, and the results are aggregated into the molecular fingerprint.
The initial atom and bond features are expert-defined rather than learned from the molecular graph. The strength of Duvenaud’s graph CNN lies in its ability to generate molecular fingerprints suited to a given task, and it is interpretable, since the molecular fragments associated with a particular property can be identified by backtracking through the neural network nodes. This scheme has been implemented in the “DeepChem” toolbox, and the “MoleculeNet” benchmark results indicate that the graph CNN can learn useful molecular features and frequently outperforms other models [195]. Apart from CNNs, recursive neural networks can also be used for molecular representation. For instance, Urban et al. developed inner and outer recursive neural networks for graph representation of compounds [156]. Compared with Kearnes’ method, this approach generally yields better predictions on the public datasets of the “MoleculeNet” benchmark [195].
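The idea of progressively enlarging substructures around each atom can be illustrated with a simplified, hash-based circular fingerprint in the spirit of ECFP. This is a sketch under strong assumptions: atoms are described only by their element symbol, bond orders are ignored, and the function name and bit length are hypothetical. It is not the ECFP algorithm itself, nor Duvenaud’s neural fingerprint, which replaces the hashing step with learned, differentiable layers.

```python
import zlib

def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Simplified circular fingerprint from element symbols and a bond list."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its element symbol alone.
    identifiers = [zlib.crc32(symbol.encode()) for symbol in atoms]
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1

    # Each iteration folds the neighbors' identifiers into the atom's own
    # identifier, so the encoded substructure grows by one bond per step.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            environment = (identifiers[i],
                           tuple(sorted(identifiers[j] for j in neighbors[i])))
            updated.append(zlib.crc32(repr(environment).encode()))
        identifiers = updated
        for ident in identifiers:
            bits[ident % n_bits] = 1
    return bits

# Ethanol heavy atoms written with two different atom numberings.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
renumbered = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(ethanol == renumbered)  # the fingerprint does not depend on atom order
```

Because neighbor identifiers are sorted before hashing, the result depends only on the molecular graph and not on the order in which the atoms are listed.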
String representations of small molecules include the Wiswesser Line-formula Notation (WLN) [196], SYBYL line notation (SLN) [197], SMILES [198] and the International Chemical Identifier (InChI) [199]. Of these, SMILES is the most widely used and is supported by many software packages (such as ChemDraw, ChemoPy, and RDKit) and databases (e.g., PubChem and ZINC). PubChem (NCBI) is the world’s most comprehensive collection of freely available chemical information; it is a database of chemical molecules and their activities in biological assays. The ZINC database (UCSF) is a curated collection of commercially available compounds prepared specifically for virtual screening. Recurrent Neural Networks (RNNs) can be used to learn the grammar of SMILES [77], which can in turn be converted to a molecular graph. Moreover, SMILES can be used directly as RNN input for predicting molecular properties [200].
Molecular descriptors traditionally refer to the structural or physicochemical properties of a compound, which can be obtained by calculation from the molecular structure or from standard experiments [201]. These descriptors have been reviewed comprehensively elsewhere [202]. Appropriate descriptor selection is crucial for ML, as it can decrease the computational load, improve model generalization, and boost the performance and interpretability of the algorithm [203]. Common software for computing molecular descriptors includes Dragon [204], ChemoPy [205], PaDEL [206] and Cinfony [207].
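Before a SMILES string can be fed to an RNN, it is typically split into tokens and each token is converted to a numerical vector. The sketch below uses a hand-written regular expression that covers only a common subset of SMILES syntax (bracket atoms, the organic subset, ring-closure digits, bonds and branches); the pattern and function names are assumptions made for illustration and are not a complete SMILES grammar.

```python
import re

# Two-letter elements must be tried before one-letter ones ("Cl" before "C").
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#$\-+()/\\@.:]"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on unsupported characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported characters in SMILES: " + smiles)
    return tokens

def one_hot_encode(tokens, vocabulary):
    """Encode each token as a one-hot vector over the given vocabulary."""
    index = {token: i for i, token in enumerate(vocabulary)}
    return [[1 if index[token] == i else 0 for i in range(len(vocabulary))]
            for token in tokens]

tokens = tokenize_smiles("CC(=O)Cl")        # acetyl chloride
vocabulary = sorted(set(tokens))
print(tokens)                                # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(one_hot_encode(tokens, vocabulary)[0])
```

The resulting sequence of one-hot vectors is the kind of input an RNN would consume one token at a time.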

3.2. Transfer Learning Engagement for Low Data

DL schemes have shown considerable promise in drug design owing to their powerful data mining capability. However, DL approaches generally depend on large amounts of training data, which has limited their application. For instance, with only limited activity data available, it is difficult to predict the bioactivity of new compounds, since a small dataset cannot cover sufficient chemical space. Transfer learning can address this problem by exploiting knowledge acquired from other related datasets. Human experts can apply previously acquired knowledge to solve new problems, and one goal of AI research is to mimic this capacity through transfer learning [208]. The basic principle of transfer learning is to apply knowledge gained from earlier tasks to a related target task with sparse training data. Moreover, “one-shot learning” has been proposed, which refers to DL approaches that depend on only a few training examples. It transfers information among related but distinct tasks by learning a meaningful distance metric [209]. One research group developed a one-shot learning scheme that combined iterative refinement long short-term memory (LSTM) networks with graph CNNs for low-data training [12,210]. The model performed better than RF and other techniques on the “Tox21” and “SIDER” datasets. However, when a model trained on the toxicity data was used to predict the side-effect dataset, it failed completely, as the relationship between the two datasets is weak.
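The role of the distance metric in one-shot learning can be sketched as a similarity-weighted vote over a small labelled support set. In the sketch below, the compound embeddings are assumed to come from a network already trained on related tasks; here they are hypothetical two-dimensional vectors, and cosine similarity stands in for the learned metric. It does not reproduce the graph CNN and LSTM architecture of the cited work.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Turn similarity scores into weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_shot_predict(query, support):
    """Predict the label of a query as the similarity-weighted mean of support labels."""
    weights = softmax([cosine(query, embedding) for embedding, _ in support])
    return sum(w * label for w, (_, label) in zip(weights, support))

# Hypothetical support set: one active (1) and one inactive (0) compound.
support_set = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]

print(one_shot_predict([0.9, 0.1], support_set))  # closer to the active example
print(one_shot_predict([0.1, 0.9], support_set))  # closer to the inactive example
```

If the embedding was learned on unrelated data, similar vectors no longer imply similar labels, which is consistent with the failure reported when transferring from the toxicity data to the side-effect data.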

3.3. The Process of Cross-Validation

Cross-validation is used to assess model performance, and the conventional practice is random-split cross-validation. However, random-split cross-validation is usually too optimistic about a model’s predictive performance, as it ignores covariate shift in drug development by mixing data from unrelated compound series [211]. As an alternative, time-split cross-validation was proposed, in which the dataset is divided into training and test sets according to the chronological order in which the data were measured [212]. Sheridan et al. compared various cross-validation schemes for assessing QSAR models, and their results indicated that the R² value obtained by time-split cross-validation was more representative of true prospective predictive performance [213]. Guided by this result, Ma et al. used time-split cross-validation instead of conventional random-split cross-validation to evaluate deep neural networks (DNNs) in a realistic hit-to-lead setting [8]. In all of these studies, experimental time is a crucial attribute, and time-split cross-validation should be used in drug discovery whenever the experimental dates are available.
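The difference between the two splitting strategies can be sketched as follows. The record layout (an identifier plus an ISO-format assay date), the 25% test fraction and the function names are assumptions made for illustration only.

```python
import random

def time_split(records, test_fraction=0.25):
    """Train on the earliest measurements and test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["date"])  # ISO dates sort as strings
    cut = int(round(len(ordered) * (1 - test_fraction)))
    return ordered[:cut], ordered[cut:]

def random_split(records, test_fraction=0.25, seed=0):
    """Conventional split that ignores when each compound was measured."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical assay records (illustrative identifiers and dates).
assay_records = [
    {"id": "cpd-1", "date": "2019-03-01"}, {"id": "cpd-2", "date": "2021-07-15"},
    {"id": "cpd-3", "date": "2018-11-20"}, {"id": "cpd-4", "date": "2020-01-05"},
    {"id": "cpd-5", "date": "2022-02-10"}, {"id": "cpd-6", "date": "2019-09-30"},
    {"id": "cpd-7", "date": "2020-06-18"}, {"id": "cpd-8", "date": "2021-12-01"},
]

train, test = time_split(assay_records)
print([r["id"] for r in test])  # the two most recently measured compounds
```

With the time split, every test compound was measured after every training compound, which mimics predicting compounds that have not yet been made.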

3.4. What It Takes to Train the Deep Neural Networks

Although DL models outperform various conventional ML algorithms, they involve many more parameters and more varied architectures, which leads to several problems during training, particularly when the samples are insufficient or the feature matrix is sparse. Training may reach only a ‘local optimum’, and the resulting accuracy is then inadequate. To combat this issue, unsupervised pre-training approaches such as the “deep belief network” have been proposed to improve parameter initialization, and the results suggest that this is more effective than random initial values [153]. One study suggested that dropout could efficiently prevent overfitting when training on QSAR datasets [8]. Furthermore, compared with the sigmoid activation function, the “Rectified Linear Unit (ReLU)” activation function is better suited to QSAR tasks because it helps to avoid ‘vanishing gradients’ as well as ‘local optima’.
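The two remedies mentioned above can be illustrated numerically. The sketch below compares how the derivative of each activation function shrinks when it is multiplied through many layers, and shows one common form of dropout ("inverted" dropout). It deliberately ignores the weight matrices, so the chained gradient is only a simplified indication; the function names are hypothetical.

```python
import math
import random

def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    """Product of identical activation derivatives across `depth` layers."""
    return grad_fn(pre_activation) ** depth

def dropout(activations, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(chained_gradient(sigmoid_grad, 0.0, 10))  # 0.25 ** 10, about 9.5e-07
print(chained_gradient(relu_grad, 1.0, 10))     # 1.0
print(dropout([0.5, 1.0, 1.5, 2.0], rate=0.5, rng=random.Random(0)))
```

Even at its maximum, the sigmoid derivative shrinks the gradient by a factor of four per layer, whereas active ReLU units pass it through unchanged.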

3.5. The Accessible Drug Design AI Source Code

In the pharmaceutical sector, the commercial advantages of software-driven drug design are proven. Nevertheless, many software developers choose to release their programs on GitHub or other open-source repositories in order to combine AI algorithms with drug design approaches. Many open-source implementations of AI-assisted drug design methods are listed in Table 1. Such open-source repositories will promote the widespread adoption of AI technologies in this field.

4. Contribution of AI in the Lifecycle of Pharmaceutical Items

This section outlines AI solutions across the lifecycle of pharmaceutical products; they could find numerous applications, but no single solution currently on the market covers them all. Natural Language Processing (NLP) allows document summarization, document generation, and Named Entity Recognition (NER) based on novel Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT). It can be used for Real World Evidence (RWE)-based trials, reports, and summaries generation. Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM), as well as other methods, could be used to analyze large amounts of unstructured information for new drug target identification. Deep Neural Networks (DNNs), Reinforcement Learning (RL), and Principal Component Analysis (PCA) are most useful for novel molecule generation in silico and their activity prediction. Drug repositioning and repurposing could be done with text mining, coupled with Feed-forward Neural Network (FNN). Generation of synthetic biology is based mostly on NLP implementations for RNA-based sequencing. Clinical trials utilize Real World Data (RWD) and RWE approaches with AI, NLP, and NER support. Image classification with Convolutional Neural Networks (CNNs) can automatically discover, generate, and learn features of images which are useful in pre-clinical and clinical trial results processing. Personalized therapy could be aided with Neural Network (NN) patient risk prediction and multiple factors analysis, including genetics. Drug dispensing control is based on Electronic Medical Record (EMR) analysis for contraindications and drug combination interactions. Additionally, ML and AI technologies could be used for monitoring and predicting epidemic outbreaks around the world to align pharmaceutical development.

4.1. AI in Promoting Pharmaceutical Product Advancement

The discovery of a novel drug compound must be followed by its incorporation into a suitable dosage form with the desired delivery characteristics. Here, AI can replace the earlier trial-and-error approach [214]. Many computational techniques can resolve problems encountered in formulation design, such as stability, porosity, and dissolution, using Quantitative Structure Property Relationship (QSPR) [215]. Decision-support systems use rule-based algorithms to select the type, nature, and amount of excipients according to the physicochemical properties of the drug, and operate via a feedback mechanism to monitor the whole process and periodically adjust it [216].
A hybrid approach that combines “Model Expert Systems (MES)” with ANN has been reported for the development of direct-filling hard gelatin capsules of piroxicam that meet the specifications for its dissolution profile. The MES generates decisions and recommendations for formulation development based on the input parameters. The ANN, in turn, uses backpropagation learning to link formulation parameters to the desired response, working together with the control module, to ensure trouble-free formulation development [214].
Multiple mathematical tools, such as Computational Fluid Dynamics (CFD), Discrete Element Modeling (DEM), and the Finite Element Method (FEM), have been used to examine the influence of powder flow properties on die-filling and tablet compression [217,218]. In addition, CFD can be used to study the influence of tablet shape and size on the dissolution profile [219]. Combining these mathematical models with AI could prove very helpful for the rapid production of pharmaceutical products.

4.2. Contribution of AI towards Manufacturing of Pharmaceutical Products

Given the growing complexity of production systems, together with increasing demand for efficiency and better product quality, modern manufacturing approaches are attempting to transfer human knowledge to machines, continually changing manufacturing practice [220]. Integrating AI into manufacturing can bring a plethora of advantages for the pharmaceutical sector. Tools such as CFD use “Reynolds-Averaged Navier-Stokes” solvers to study the effect of agitation and stress levels in various equipment (such as stirred tanks), supporting the automation of many pharmaceutical operations. Similar approaches, such as ‘direct numerical simulations’ and ‘large eddy simulations’, use advanced methods to solve complex flow problems in manufacturing [217].
The innovative “Chemputer” system facilitates digital automation of the synthesis and manufacture of molecules, incorporating various chemical codes and operating through a scripting language named “Chemical Assembly (ChASM)” [221]. It has been used successfully for the synthesis and manufacture of diphenhydramine hydrochloride, rufinamide, and sildenafil, with yield and purity closely comparable to manual synthesis [151]. The completion of granulation in granulators with capacities of 25–600 L can be predicted effectively by AI technologies [222]. The technology and neuro-fuzzy logic correlated critical variables with their responses and derived a polynomial equation for predicting the proportion of granulation fluid to be added, the required speed, and the impeller diameter in both geometrically similar and dissimilar granulators [223].
“Discrete Element Modeling (DEM)” is widely used in the pharmaceutical sector, for example in studying the segregation of powders in a binary mixture, the effects of varying blade speed and geometry, predicting the likely path of tablets in the encapsulation procedure, and analyzing the time tablets spend in the spray zone [217]. ANNs, coupled with fuzzy models, have been used to study the relationship between machine settings and tablet capping in order to reduce capping on the production line [224].
AI tools such as meta-classifiers and tablet-classifiers can help to govern the quality standard of the final product, for example by flagging a possible error in tablet manufacturing [225]. A patent has been filed for a system capable of determining the best combination of drug and dosing regimen for an individual patient, using a processor that receives patient information and designs the desired transdermal patch accordingly [226].

4.3. Role of AI in Managing and Ensuring Quality

Manufacturing the desired product from raw materials requires balancing multiple parameters [225]. Stringent quality control of products, as well as maintenance of batch-to-batch consistency, requires manual intervention. This might not be the best approach in every case, indicating the need for AI at this stage [217]. The FDA amended the “Current Good Manufacturing Practices (cGMP)” by introducing a ‘Quality by Design’ approach to understand the critical operations and specific criteria that govern the final quality of the pharmaceutical product [227]. A combination of human effort and AI has been used, in which preliminary data from production batches were analyzed and decision trees developed. These were then translated into rules and analyzed by the operators to guide subsequent production cycles [225]. One study examined the dissolution profile, an indicator of batch-to-batch consistency, of theophylline pellets with the help of ANN, which correctly predicted the dissolution of the tested formulation with an error of < 8% [228].
AI can also be applied to the regulation of in-line manufacturing processes to achieve the desired product standard [227]. ANN-based monitoring of the freeze-drying process has been used, applying a combination of self-adaptive evolution with local search and backpropagation algorithms. This can be used to predict the temperature and dried-cake thickness at a future time point (t + Δt) for a given set of operating conditions, ultimately helping to keep a check on the final product quality [229].
Automated data entry platforms, such as an “Electronic Lab Notebook”, together with sophisticated, intelligent techniques, can support quality assurance of the product [230]. In addition, data mining and various knowledge discovery techniques in the “Total Quality Management (TQM)” expert system can be used as valuable approaches to making complex decisions, creating new technologies for intelligent quality control [231].

4.4. Role of AI Algorithms in Determining Clinical Trial Blueprints

Clinical trials aim to establish the safety and efficacy of a medication in humans for a specific disease and require 6–7 years along with a substantial financial investment. However, only 10% of the molecules tested in such trials gain approval, which is a massive loss for the industry [232]. These failures can result from inappropriate patient selection, a lack of technical infrastructure, and poor facilities. With the vast digital medical data now available, these failures can be reduced by using AI [233].
The recruitment of patients consumes about 33% of the clinical trial duration. The success of a clinical trial depends on enrolling suitable patients; unsuitable enrollment otherwise accounts for ~86% of failures [234]. AI can help to select only a specific diseased population for recruitment in Phase II and III trials by analyzing patient-specific genome-exposome profiles, which can help in the early prediction of the available drug targets in the selected patients [59,233]. Preclinical screening of molecules, as well as identification of lead compounds before the start of clinical trials using other aspects of AI, such as predictive ML and other reasoning techniques, helps in the early prediction of lead molecules that would pass clinical trials in the selected patient population [233].
Patient dropout accounts for the failure of about one-third of clinical trials and creates additional recruitment requirements to complete the trial, wasting time and money. This can be avoided by close monitoring of patients and by helping them to follow the trial protocol [234]. Mobile software developed by “AiCure” monitored regular medication intake by patients with schizophrenia in a Phase II trial, which increased patient adherence by 25% and ensured successful completion of the trial [59].

4.5. Role of AI in Pharmaceutical Product Management

4.5.1. Role of AI in Market Positioning

Market positioning is the process of creating a distinct identity for a product in the market so that buyers are attracted to purchase it, which makes it an essential component of most business strategies for companies seeking to establish their own niche [235,236]. This strategy was used in marketing the pioneer brand Viagra: the company targeted it not only at men's erectile dysfunction but also at related problems affecting quality of life [237].
With technology and e-commerce as a platform, it has become easier for companies to gain natural recognition of their brand in the public domain. Companies use search engines, among many available technological platforms, to occupy a prominent place in online marketing and to help position the product in the market, as also confirmed by the “Internet Advertising Bureau”. Companies continually try to rank their websites higher than those of competitors, which gives their brand recognition within a short time [238].
Other approaches, such as statistical analysis methods and particle swarm optimization (proposed in 1995 by Eberhart and Kennedy) combined with NNs, have given a better picture of markets. Such approaches can help select the marketing strategy for a product on the basis of accurate consumer-demand prediction [239].
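To make the particle swarm optimization idea concrete, the following minimal sketch applies the standard update rule to a toy one-parameter problem: finding the price that maximizes profit under a linear demand curve. The demand curve, cost figure, parameter values, and function names are all illustrative assumptions and are not taken from the cited work.

```python
import random


def profit(price):
    # Hypothetical linear demand: units sold fall as the price rises.
    demand = max(0.0, 1000.0 - 8.0 * price)
    unit_cost = 20.0  # assumed cost per unit
    return (price - unit_cost) * demand


def pso_maximize(objective, lower, upper, n_particles=20, n_iters=100, seed=0):
    """Maximize a one-dimensional objective with a basic particle swarm."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                       # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Move the particle and keep it inside the search interval.
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


best_price, best_profit = pso_maximize(profit, 20.0, 125.0)
print(f"best price ~ {best_price:.2f}, profit ~ {best_profit:.0f}")
```

For this toy objective the optimum can be derived by hand (price 72.5), which makes it easy to confirm that the swarm converges; a realistic demand model would replace `profit` with a learned predictor such as an NN.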

4.5.2. Role of AI in Market Forecasting and Scrutiny

The success of a company depends on the continuous development and growth of its business. Even with substantial funding, R&D output in the pharmaceutical sector is declining because companies have failed to adopt current marketing methods [240]. The rise of digital technologies, referred to as the “Fourth Industrial Revolution”, is supporting new forms of digital marketing through a multicriteria decision-making approach, which collects and analyzes statistical and mathematical data and incorporates human inferences so that AI-based decision-making models can explore new marketing opportunities [241].
AI also supports a detailed analysis of the core requirements of a product from the customer's point of view and an understanding of market needs, which aids decision-making through prediction models. It can also forecast sales and analyze the market. AI-based software engages consumers and raises awareness among healthcare professionals by displaying advertisements that take them to the product page with one click [242]. Moreover, these approaches use natural language processing (NLP) algorithms to analyze keywords entered by buyers and relate them to the probability of purchasing the product [243,244].
Many business-to-business (B2B) companies have announced self-service platforms that allow free browsing of health products, which can be found by entering their specifications, and that accept orders and track shipments. Pharmaceutical companies are also launching online sites such as “1 mg”, “Medline”, “Netmeds”, and “Ask Apollo” to address unmet patient needs [241]. Market prediction is also essential for pharmaceutical distribution companies, which can apply AI in this field, for example through “Business intelligent Smart Sales Prediction Analysis”, which combines time series forecasting with real-time application. This helps pharmaceutical companies predict product sales in advance, avoiding the cost of excess stock or the loss of customers due to shortages [245].
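As an illustration of how a time series forecast can guide stocking decisions, the sketch below uses simple exponential smoothing, one of the most basic forecasting techniques. It is not the method of the cited system, whose internals are not described here; the sales figures, the smoothing factor, and the stock level are hypothetical.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing."""
    if not sales:
        raise ValueError("sales history must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(sales[0])
    for observed in sales[1:]:
        # Blend the newest observation with the running level.
        level = alpha * observed + (1.0 - alpha) * level
    return level


monthly_units = [120, 130, 125, 140, 150, 145]  # hypothetical monthly sales
forecast = exponential_smoothing_forecast(monthly_units, alpha=0.5)

stock_on_hand = 100  # hypothetical current inventory
units_to_order = max(0.0, forecast - stock_on_hand)
print(forecast, units_to_order)  # 143.125 43.125
```

A larger `alpha` makes the forecast react faster to recent sales, while a smaller one smooths out noise; ordering only the gap between forecast and stock is the simplest way to limit both surplus and shortfall.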

4.5.3. Role of AI in Product Cost

The company decides the final price of a product on the basis of market analysis and the cost incurred in developing it. The key idea in applying AI to pricing is to use its ability to mimic the reasoning of a human expert when assessing the factors that govern the price of a product after it has been manufactured [245]. Factors such as expenditure on research and development of the drug, strict price regulation in the country concerned, length of the exclusivity period, market share of the innovator drug a year before patent expiration, price of the reference product, and price-fixing policies determine the price of branded and generic medications [246].
In ML, large sets of statistical data, such as product development cost, product demand in the market, inventory cost, manufacturing cost, and competitors' product prices, are analyzed by the algorithm to build software that predicts the product price. AI platforms such as “In competitor”, launched by “Intelligence Node” (founded in 2012), offer a complete competitive intelligence package that analyzes competitor pricing data and helps retailers and brands monitor the competition. “Wise Athena” and “Navetti PricePoint” let users set the price of their products, which suggests that pharmaceutical companies could adopt them to support product pricing [247].

4.6. A Snapshot of AI-Based Advanced Implementations

4.6.1. Drug Delivery Technologies Engaging AI-Grounded Nanorobots

Nanorobots consist mainly of integrated circuits, a power supply, sensors, and secure data backup, all supported by computational technologies such as AI [248,249]. The algorithms handle collision avoidance, target identification, detection and attachment, and finally excretion from the body. Advances in nano/microrobots allow these devices to navigate to the targeted site on the basis of physiological conditions such as pH, which improves efficacy and reduces systemic adverse effects [249]. Developing implantable nanorobots for the controlled delivery of drugs and genes requires attention to parameters such as dose adjustment, sustained release, and controlled release, and drug release itself needs automation controlled by AI tools such as NNs, integrators, and fuzzy logic [250]. Implantable microchips are used for programmed drug release and for locating the implant within the body.

4.6.2. Role of AI in Concerted Drug Delivery and Augury of Synergism/Antagonism

Several drug combinations have been approved and marketed to treat complex diseases such as TB and cancer, because they can provide a synergistic effect and faster recovery [251,252]. Selecting appropriate and promising drugs for combination requires high-throughput screening of a large number of drugs, which makes the process laborious; for instance, combination chemotherapy for cancer requires six or seven drugs. ANNs, network-based modeling, and logistic regression can be used to screen drug combinations and improve the overall dose regimen [251,253]. Rashid et al. proposed a ‘quadratic phenotypic optimization platform (QPOP)’, which identifies effective combination therapy for bortezomib-resistant multiple myeloma using a set of 114 FDA-approved drugs. This model recommended mitomycin C (MitoC) with decitabine (Dec) as the top two-drug combination and MitoC, mechlorethamine, and Dec as the best three-drug combination [252].
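Simple counting shows why exhaustive screening of combinations is laborious. The sketch below counts the unordered two- and three-agent combinations that can be drawn from a library of 114 agents, the library size mentioned above; it ignores dose levels, which would multiply the search space further.

```python
from math import comb

n_drugs = 114  # library size taken from the text

pairs = comb(n_drugs, 2)    # unordered two-agent combinations
triples = comb(n_drugs, 3)  # unordered three-agent combinations

print(pairs)    # 6441
print(triples)  # 240464
```

Even before doses are considered, a three-agent search over this library involves roughly a quarter of a million candidates, which is the kind of space that model-guided screening aims to prune.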
Combination therapy can be more effective when supported by information on the synergism or antagonism of drugs administered together. The “Master Regulator Inference Algorithm” uses ‘Master regulator genes’ to predict 56% of synergistic effects. Other methods, such as Network-based Laplacian regularized Least Square Synergistic (NLLSS) drug combination and ‘random forest (RF)’, can also be used for this purpose [253].
Li et al. developed a synergistic drug combination model that uses RF to predict synergistic anticancer drug combinations. The model was built on gene expression profiles and several networks, and the researchers successfully predicted 28 synergistic anticancer combinations. They reported three such combinations, although the remainder could also prove important [65]. Furthermore, an ML approach called Combination Synergy Estimation can predict potential synergistic antimalarial combinations from a library of 1540 antimalarial drug compounds [254].

4.6.3. The Materialization of AI in Nanomedicine

Nanomedicines combine nanotechnology and medications for the diagnosis, treatment, and monitoring of complex diseases such as malaria, cancer, HIV, asthma, and various inflammatory diseases. In recent years, nanoparticle-based drug delivery has become prominent in therapeutics and diagnostics because of its improved therapeutic efficacy [252,255]. Combining nanotechnology with AI may solve several problems in formulation development [256].
A methotrexate nanosuspension has been formulated computationally by studying the energy generated by the interaction of the drug molecules and examining the conditions that could lead to aggregation of the formulation [215]. ‘Coarse-grained simulation’, together with chemical calculation, can help characterize drug-dendrimer interactions and evaluate drug encapsulation within the dendrimer. Furthermore, software such as LAMMPS and GROMACS 4 can be used to examine the effect of surface chemistry on the cellular uptake of nanoparticles [215].
AI assisted the preparation of silicasomes, which combine iRGD (internalizing arginine-glycine-aspartic acid), a tumor-penetrating peptide, with irinotecan-loaded multifunctional mesoporous silica nanoparticles. Uptake of silicasomes may increase three- or four-fold because iRGD promotes silicasome transcytosis, with improved treatment outcomes and better long-term survival [255].

5. The Market Potential of AI Applications for Drug Discovery and Development

To reduce the financial cost and the risk of failure associated with Virtual Screening (VS), pharmaceutical companies are moving toward AI applications. The AI market grew from USD 200 million in 2015 to USD 700 million in 2018 and is expected to reach USD 5 billion by 2024 [257]. A projected growth of 40% from 2017 to 2024 suggests that AI is likely to reshape the medical and pharmaceutical sectors. Many pharmaceutical companies have invested, and continue to invest, in AI and have also collaborated with AI providers to develop essential healthcare tools. The collaboration between DeepMind Technologies, a subsidiary of Google, and the Royal Free London NHS Foundation Trust to help manage acute kidney injury is one example. Leading pharmaceutical companies and AI vendors are listed in Table 2 [258].
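The growth figures above can be checked with a compound annual growth rate (CAGR) calculation. The sketch assumes that the USD 200 million and USD 700 million values refer to 2015 and 2018, and that the USD 5 billion projection refers to 2024; how the 40% figure was derived is not stated in the text, so this is only a plausibility check.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumed reading of the market figures (USD).
historic_rate = cagr(200e6, 700e6, 2018 - 2015)
projected_rate = cagr(700e6, 5e9, 2024 - 2018)

print(f"2015-2018: {historic_rate:.1%} per year")
print(f"2018-2024: {projected_rate:.1%} per year")
```

Under these assumptions the projected rate comes out at a little under 40% per year, which is consistent with the 40% figure if that figure is read as an annual growth rate.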

6. Continuing Bottlenecks in Accepting AI: Hints on Methods to Conquer

Despite rapid advances in the AI and ML technologies applied in the pharmaceutical industry, numerous challenges remain in implementing and integrating these technologies into the drug discovery process in particular and the pharmaceutical industry in general.
One problem is poor data integration. It arises from the diversity of datasets, which may comprise raw data, processed data, metadata, or candidate data. Such datasets must be collected and collated for effective analysis, but at present there is no validated method for doing so. This step is essential before the drug discovery process begins, because without appropriately formatted data the output of ML algorithms will be imprecise. More efficient methods for integrating available data into databases before drug discovery starts are therefore needed.
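A small sketch can show what collating heterogeneous records involves in practice. The example below maps records from differently structured sources onto one shared schema and converts potency values to a single unit. The field names, compound labels, and values are hypothetical placeholders, and a real pipeline would need far more validation.

```python
def normalize_record(record):
    """Map a source-specific record onto a shared schema (hypothetical fields)."""
    name = record.get("compound") or record.get("compound_name") or record.get("name")
    if name is None:
        raise ValueError("record has no compound identifier")
    # Accept IC50 reported in nM or uM and store everything in nM.
    if "ic50_nM" in record:
        ic50_nm = float(record["ic50_nM"])
    elif "ic50_uM" in record:
        ic50_nm = float(record["ic50_uM"]) * 1000.0
    else:
        ic50_nm = None  # keep the record, but flag the missing measurement
    return {"compound": name.strip().lower(), "ic50_nM": ic50_nm}


# Placeholder records imitating three sources with different conventions.
raw_records = [
    {"compound": "Compound-A ", "ic50_uM": "1.5"},
    {"compound_name": "compound-b", "ic50_nM": 250},
    {"name": "Compound-C"},
]

unified = [normalize_record(r) for r in raw_records]
for row in unified:
    print(row)
```

Keeping incomplete records with an explicit `None` rather than dropping them lets later steps decide how to handle missing measurements.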
A separate recognized issue is occupational and skill-set immobility: many people currently working in the pharmaceutical sector lack the skills or qualifications needed to operate AI systems. Some of the workforce are proficient in data science and others in molecular chemistry and biology, but few are experts in both domains with the combination of skills needed to apply AI in a pharmaceutical context. An understanding of the underlying chemistry is essential for designing relevant algorithms, and vice versa.
Each company uses its own proprietary AI algorithms, which are not available in the public domain. As a result, there is skepticism about ML and AI in the pharmaceutical industry, stemming from a poor understanding of how the algorithms work (the “black box” phenomenon) and from distrust of the results they generate. Skeptics may hesitate to use data produced by AI and ML, which wastes both time and money and slows the industry's progress toward greater efficiency.
The success of AI depends on the availability of a large volume of data, because these data are used for the subsequent training of the algorithm. Access to data from multiple database vendors can impose additional costs on a company, and the data must also be reliable and of high quality to ensure accurate predictions. Other bottlenecks that hinder full adoption of AI in the pharmaceutical sector include the shortage of skilled personnel to operate AI-based systems, limited budgets in small organizations, fear that replacing humans will lead to job losses, lack of confidence in the data generated by AI, and the “black box effect” (i.e., the lack of clarity about how the AI algorithm arrives at its results) [18].
Automation of many steps in drug development, manufacturing, supply chains, clinical trials, and sales will take place over time, but all of these fall under ‘narrow AI’, in which the AI must be trained on a large volume of data and is therefore suited to one specific task. Human intervention thus remains mandatory for the successful implementation, development, and operation of an AI algorithm. The fear of job losses may, however, be unfounded, because AI is currently taking over repetitive tasks while leaving scope for human intelligence to be applied to more complex judgments and creativity.
Nevertheless, AI has been adopted by numerous pharmaceutical companies. Revenue of about USD 2.2 billion from AI-based solutions in the pharmaceutical sector was projected for 2022, with investment exceeding USD 7.20 billion across 300+ deals by the pharmaceutical industry between 2013 and 2018 [259]. Pharmaceutical companies need clarity about the potential of AI to solve problems once it has been implemented, together with an understanding of the reasonable goals that can be achieved. Skilled data scientists and software engineers with a sound knowledge of AI tools, along with a clear understanding of the company's business objectives and R&D focus, will enable the advances and engagement that the AI platform promises.

7. Conclusions and Future Promise

The progress of AI and its powerful tools, which continuously aim to reduce the bottlenecks that pharmaceutical companies face across the drug development pipeline and the overall lifecycle of the product, may explain the increase in the number of start-ups in this sector [260]. The healthcare sector currently faces several complex challenges, such as the rising prices of medications and treatments, and society needs specific, significant changes in these areas. With AI incorporated into pharmaceutical manufacturing, personalized medications with the appropriate dose, release characteristics, and other required features can be produced according to individual patient needs [234]. Using current AI-based technologies should shorten the time needed for products to reach the market, improve product quality and the overall safety of the manufacturing process, and make better and more cost-efficient use of available resources, which underscores the importance of automation [261].
The greatest concern about adopting these technologies is the job losses expected to follow and the strict regulations required for implementing AI. However, these technologies are intended to make work easier, not to replace humans completely [262]. Besides enabling quick and trouble-free identification of hit compounds, AI can also suggest synthesis routes for these molecules, predict the desired chemical structure, and provide an understanding of drug-target interactions and the associated SAR.
AI can also contribute substantially to incorporating the developed drug into its appropriate dosage form and to optimizing it, while supporting quick decision-making, which leads to faster manufacturing of better-quality products with batch-to-batch consistency. In addition, AI can help establish the safety and efficacy of products in clinical trials and ensure proper positioning and pricing in the market through comprehensive market analysis and forecasting. Although no drugs developed with AI-based approaches are currently on the market and specific challenges remain in implementing this technology, AI is likely to become an invaluable tool in the pharmaceutical sector in the near future.

Author Contributions

Conceptualization, B.D. and C.S.; resources, B.D. and C.S.; data curation, D.D. and B.D.; writing-original draft preparation, B.D. and C.S.; writing-review and editing, C.S., B.D., V.S.R., J.B.W., A.N., I.T., N.M.L., D.D., M.B. and H.T.S.; visualization, B.D., C.S. and D.D.; supervision, B.D. and C.S. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. DiMasi, J.A.; Grabowski, H.G.; Hansen, R.W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J. Health Econ. 2016, 47, 20–33.
  2. Turner, J.R. New Drug Development; Springer: New York, NY, USA, 2010.
  3. Hassan Baig, M.; Ahmad, K.; Roy, S.; Mohammad Ashraf, J.; Adil, M.; Haris Siddiqui, M.; Khan, S.; Amjad Kamal, M.; Provazník, I.; Choi, I. Computer aided drug design: Success and limitations. Curr. Pharm. Des. 2015, 22, 572–581.
  4. Mason, J.S. Introduction to the volume and overview of computer assisted drug design in the drug discovery process. In Comprehensive Medicinal Chemistry II; Taylor, J.B., Triggle, D.J., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 1–11.
  5. Swaminathan, K.; Meller, J. Artificial Intelligence Approaches for Rational Drug Design and Discovery. Curr. Pharm. Des. 2007, 13, 1497–1508.
  6. Lavecchia, A.; Di Giovanni, C. Virtual screening strategies in drug discovery: A critical review. Curr. Med. Chem. 2013, 20, 2839–2860.
  7. Melville, J.L.; Burke, E.K.; Hirst, J.D. Machine Learning in Virtual Screening. Comb. Chem. High Throughput Screen. 2009, 12, 332–343.
  8. Ma, J.; Sheridan, R.P.; Liaw, A.; Dahl, G.E.; Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 2015, 55, 263–274.
  9. Angermueller, C.; Pärnamaa, T.; Parts, L.; Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 2016, 12, 878.
  10. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  11. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117.
  12. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
  13. Ramesh, A.; Kambhampati, C.; Monson, J.R.; Drew, P.J. Artificial intelligence in medicine. Ann. R. Coll. Surg. Engl. 2004, 86, 334–338.
  14. Miles, J.; Walker, A. The potential application of artificial intelligence in transport. IEEE Proc. Intell. Transp. Syst. 2006, 153, 183–198.
  15. Yang, Y.; Siau, K. A Qualitative Research on Marketing and Sales in the Artificial Intelligence Age. In Proceedings of the Midwest United States Association for Information Systems (MWAIS), St. Louis, MO, USA, 17–18 May 2018.
  16. Wirtz, B.W.; Weyerer, J.C.; Geyer, C. Artificial Intelligence and the Public Sector—Applications and Challenges. Int. J. Public Adm. 2018, 42, 596–615.
  17. Smith, R.G.; Farquhar, A. The road ahead for knowledge management: An AI perspective. AI Mag. 2000, 21, 17.
  18. Lamberti, M.J.; Wilkinson, M.; Donzanti, B.A.; Wohlhieter, G.E.; Parikh, S.; Wilkins, R.G.; Getz, K. A Study on the Application and Use of Artificial Intelligence to Support Drug Development. Clin. Ther. 2019, 41, 1414–1426.
  19. Beneke, F.; Mackenrodt, M.-O. Artificial intelligence and collusion. IIC Int. Rev. Intellect. Prop. Compet. Law 2019, 50, 109–134.
  20. Steels, L.; Brooks, R. The Artificial Life Route to Artificial Intelligence: Building Embodied, Situated Agents; Routledge: London, UK, 2018.
  21. Bielecki, A.; Bielecki, A. Foundations of artificial neural networks. In Models of Neurons and Perceptrons: Selected Problems and Challenges; Janusz, K., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 15–28.
  22. Kalyane, D.; Sanap, G.; Paul, D.; Shenoy, S.; Anup, N.; Polaka, S.; Tambe, V.; Tekade, R.K. Artificial intelligence in the pharmaceutical sector: Current scene and future prospect. In The Future of Pharmaceutical Product Development and Research; Academic Press: Cambridge, MA, USA, 2020; pp. 73–107.
  23. da Silva, I.N.; Spatti, D.H.; Flauzino, R.A.; Liboni, L.H.B.; Alves, S.F.D.R. Artificial Neural Network Architectures and Training Processes. In Artificial Neural Networks; Springer: Cham, Switzerland, 2016; pp. 21–28.
  24. Medsker, L.; Jain, L.C. Recurrent Neural Networks: Design and Applications; CRC Press: Boca Raton, FL, USA, 1999.
  25. Hanggi, M.; Moschytz, G.S. Cellular Neural Networks: Analysis, Design and Optimization; Springer Science & Business Media: Berlin, Germany, 2000.
  26. Rouse, M. IBM Watson Supercomputer. 2017. Available online: (accessed on 13 October 2020).
  27. Vyas, M.; Thakur, S.; Riyaz, B.; Bansal, K.K.; Tomar, B.; Mishra, V. Artificial intelligence: The beginning of a new era in pharmacy profession. Asian J. Pharm. 2018, 12, 72–76.
  28. Spencer, M.; Eickholt, J.; Cheng, J. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2014, 12, 103–112.
  29. Li, H.; Hou, J.; Adhikari, B.; Lyu, Q.; Cheng, J. Deep learning methods for protein torsion angle prediction. BMC Bioinform. 2017, 18, 417.
  30. Wang, S.; Sun, S.; Li, Z.; Zhang, R.; Xu, J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput. Biol. 2017, 13, e1005324.
  31. Schaarschmidt, J.; Monastyrskyy, B.; Kryshtafovych, A.; Bonvin, A.M. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins Struct. Funct. Bioinform. 2017, 86, 51–66.
  32. Falchi, F.; Caporuscio, F.; Recanatini, M. Structure-based design of small-molecule protein–protein interaction modulators: The story so far. Futur. Med. Chem. 2014, 6, 343–357.
  33. Scott, D.E.; Bayly, A.R.; Abell, C.; Skidmore, J. Small molecules, big targets: Drug discovery faces the protein-protein interaction challenge. Nat. Rev. Drug Discov. 2016, 15, 533–550.
  34. Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K.P.; et al. STRING v10: Protein-Protein Interaction Networks, Integrated Over the Tree of Life. Nucleic Acids Res. 2015, 43, D447–D452.
  35. Cukuroglu, E.; Engin, H.B.; Gursoy, A.; Keskin, O. Hot spots in protein–protein interfaces: Towards drug discovery. Prog. Biophys. Mol. Biol. 2014, 116, 165–173.
  36. Higueruelo, A.P.; Jubb, H.; Blundell, T.L. Protein–protein interactions as druggable targets: Recent technological advances. Curr. Opin. Pharmacol. 2013, 13, 791–796.
  37. Santos, R.; Ursu, O.; Gaulton, A.; Bento, A.P.; Donadi, R.S.; Bologa, C.G.; Karlsson, A.; Al-Lazikani, B.; Hersey, A.; Oprea, T.I.; et al. A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 2017, 16, 19–34.
  38. Labbé, C.M.; Kuenemann, M.A.; Zarzycka, B.; Vriend, G.; Nicolaes, G.A.F.; Lagorce, D.; Miteva, M.A.; Villoutreix, B.O.; Sperandio, O. iPPI-DB: An online database of modulators of protein-protein interactions. Nucleic Acids Res. 2016, 44, D542–D547.
  39. Shin, W.-H.; Christoffer, C.W.; Kihara, D. In silico structure-based approaches to discover protein-protein interaction-targeting drugs. Methods 2017, 131, 22–32.
  40. Valkov, E.; Sharpe, T.; Marsh, M.; Greive, S.; Hyvonen, M. Targeting protein-protein interactions and fragment-based drug discovery. Top. Curr. Chem. 2012, 317, 145–179.
  41. Wang, J.; Luo, C.; Shan, C.; You, Q.; Lu, J.; Elf, S.; Zhou, Y.; Wen, Y.; Vinkenborg, J.L.; Fan, J.; et al. Inhibition of human copper trafficking by a small molecule significantly attenuates cancer cell proliferation. Nat. Chem. 2015, 7, 968–979.
  42. Xue, L.C.; Dobbs, D.; Bonvin, A.M.; Honavar, V. Computational prediction of protein interfaces: A review of data driven methods. FEBS Lett. 2015, 589, 3516–3526.
  43. Zhang, Q.C.; Petrey, D.; Norel, R.; Honig, B.H. Protein interface conservation across structure space. Proc. Natl. Acad. Sci. USA 2010, 107, 10896–10901.
  44. Maheshwari, S.; Brylinski, M. Template-based identification of protein–protein interfaces using eFindSitePPI. Methods 2016, 93, 64–71.
  45. Chen, R.; Li, L.; Weng, Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins Struct. Funct. Bioinform. 2003, 52, 80–87.
  46. Schneidman-Duhovny, D.; Inbar, Y.; Nussinov, R.; Wolfson, H.J. PatchDock and SymmDock: Servers for rigid and symmetric docking. Nucleic Acids Res. 2005, 33, W363–W367.
  47. Vakser, I.A. Protein-Protein Docking: From Interaction to Interactome. Biophys. J. 2014, 107, 1785–1793.
  48. Du, T.; Liao, L.; Wu, C.H.; Sun, B. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning. Methods 2016, 110, 97–105.
  49. Bai, F.; Morcos, F.; Cheng, R.R.; Jiang, H.; Onuchic, J.N. Elucidating the druggable interface of protein−protein interactions using fragment docking and coevolutionary analysis. Proc. Natl. Acad. Sci. USA 2016, 113, E8051–E8058.
  50. Wan, F.; Zeng, J. Deep learning with feature embedding for compound–protein interaction prediction. bioRxiv 2016, 086033.
  51. AlQuraishi, M. End-to-End Differentiable Learning of Protein Structure. Cell Syst. 2019, 8, 292–301.e3.
  52. Hutson, M. AI protein-folding algorithms solve structures faster than ever. Nature 2019.
  53. Avdagic, Z.; Purisevic, E.; Omanovic, S.; Coralic, Z. Artificial Intelligence in Prediction of Secondary Protein Structure Using CB513 Database. Summit Transl. Bioinform. 2009, 2009, 1–5.
  54. Tian, K.; Shao, M.; Wang, Y.; Guan, J.; Zhou, S. Boosting compound-protein interaction prediction by deep learning. Methods 2016, 110, 64–72.
  55. Wang, F.; Liu, D.; Wang, H.; Luo, C.; Zheng, M.; Liu, H.; Zhu, W.; Luo, X.; Zhang, J.; Jiang, H. Computational Screening for Active Compounds Targeting Protein Sequences: Methodology and Experimental Validation. J. Chem. Inf. Model. 2011, 51, 2821–2828.
  56. Yu, H.; Chen, J.; Xu, X.; Li, Y.; Zhao, H.; Fang, Y.; Li, X.; Zhou, W.; Wang, W.; Wang, Y. A systematic prediction of multiple drug–target interactions from chemical, genomic, and pharmacological data. PLoS ONE 2012, 7, e37608.
  57. Xiao, X.; Min, J.L.; Lin, W.Z.; Liu, Z.; Cheng, X.; Chou, K.C. iDrug-Target: Predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach. J. Biomol. Struct. Dyn. 2015, 33, 2221–2233.
  58. Mak, K.-K.; Pichika, M.R. Artificial intelligence in drug development: Present status and future prospects. Drug Discov. Today 2018, 24, 773–780.
  59. Persidis, A. The benefits of drug repositioning. Drug Discov. World 2011, 12, 9–12.
  60. Koromina, M.; Pandi, M.T.; Patrinos, G.P. Rethinking drug repositioning and development with artificial intelligence, machine learning, and omics. Omics 2019, 23, 539–548.
  61. Park, K. A review of computational drug repurposing. Transl. Clin. Pharmacol. 2019, 27, 59–63. [Google Scholar] [CrossRef] [Green Version]
  62. Zeng, X.; Zhu, S.; Lu, W.; Liu, Z.; Huang, J.; Zhou, Y.; Fang, J.; Huang, Y.; Guo, H.; Li, L.; et al. Target identification among known drugs by deep learning from heterogeneous networks. Chem. Sci. 2020, 11, 1775–1797. [Google Scholar] [CrossRef] [Green Version]
  63. Achenbach, J.; Tiikkainen, P.; Franke, L.; Proschak, E. Computational tools for polypharmacology and repurposing. Futur. Med. Chem. 2011, 3, 961–968. [Google Scholar] [CrossRef]
  64. Ke, Y.-Y.; Peng, T.-T.; Yeh, T.-K.; Huang, W.-Z.; Chang, S.-E.; Wu, S.-H.; Hung, H.-C.; Hsu, T.-A.; Lee, S.-J.; Song, J.-S.; et al. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed. J. 2020, 43, 355–362. [Google Scholar] [CrossRef]
  65. Li, X.; Xu, Y.; Cui, H.; Huang, T.; Wang, D.; Lian, B.; Li, W.; Qin, G.; Chen, L.; Xie, L. Prediction of synergistic anti-cancer drug combinations based on drug target network and drug induced gene expression profiles. Artif. Intell. Med. 2017, 83, 35–43. [Google Scholar] [CrossRef]
  66. Reddy, A.S.; Zhang, S. Polypharmacology: Drug discovery for the future. Expert Rev. Clin. Pharmacol. 2013, 6, 41–47. [Google Scholar] [CrossRef] [Green Version]
  67. Li, Z.; Li, X.; Liu, X.; Fu, Z.; Xiong, Z.; Wu, X.; Tan, X.; Zhao, J.; Zhong, F.; Wan, X.; et al. KinomeX: A web application for predicting kinome-wide polypharmacology effect of small molecules. Bioinformatics 2019, 35, 5354–5356. [Google Scholar] [CrossRef]
  68. Cyclica Launches Ligand ExpressTM, a Disruptive Cloud–Based Platform to Revolutionize Drug Discovery. Business Wire, 30 November 2017.
  69. Hessler, G.; Baringhaus, K.-H. Artificial Intelligence in Drug Design. Molecules 2018, 23, 2520. [Google Scholar] [CrossRef] [Green Version]
  70. Corey, E.J.; Wipke, W.T. Computer-Assisted Design of Complex Organic Syntheses. Science 1969, 166, 178–192. [Google Scholar] [CrossRef]
  71. Grzybowski, B.A.; Szymkuć, S.; Gajewska, E.P.; Molga, K.; Dittwald, P.; Wołos, A.; Klucznik, T. Chematica: A Story of Computer Code That Started to Think like a Chemist. Chem 2018, 4, 390–398. [Google Scholar] [CrossRef]
  72. Klucznik, T.; Mikulak-Klucznik, B.; McCormack, M.P.; Lima, H.; Szymkuć, S.; Bhowmick, M.; Molga, K.; Zhou, Y.; Rickershauser, L.; Gajewska, E.P.; et al. Efficient Syntheses of Diverse, Medicinally Relevant Targets Planned by Computer and Executed in the Laboratory. Chem 2018, 4, 522–532. [Google Scholar] [CrossRef] [Green Version]
  73. Segler, M.H.S.; Preuss, M.; Waller, M.P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 2018, 555, 604–610. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  74. Chan, H.C.S.; Shan, H.; Dahoun, T.; Vogel, H.; Yuan, S. Advancing Drug Discovery via Artificial Intelligence. Trends Pharmacol. Sci. 2019, 40, 592–604. [Google Scholar] [CrossRef] [PubMed]
  75. Putin, E.; Asadulaev, A.; Ivanenkov, Y.; Aladinskiy, V.; Sanchez-Lengeling, B.; Aspuru-Guzik, A.; Zhavoronkov, A. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 2018, 58, 1194–1204. [Google Scholar] [CrossRef] [PubMed]
  76. Segler, M.H.S.; Kogej, T.; Tyrchan, C.; Waller, M.P. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent. Sci. 2017, 4, 120–131. [Google Scholar] [CrossRef] [Green Version]
  77. Popova, M.; Isayev, O.; Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, eaap7885. [Google Scholar] [CrossRef] [Green Version]
  78. Merk, D.; Friedrich, L.; Grisoni, F.; Schneider, G. De Novo Design of Bioactive Small Molecules by Artificial Intelligence. Mol. Inform. 2018, 37, 1700153. [Google Scholar] [CrossRef] [Green Version]
  79. Schneider, G.; Clark, D.E. Automated de novo drug design: Are we nearly there yet? Angew. Chem. 2019, 131, 10906–10917. [Google Scholar] [CrossRef]
  80. Ashburn, T.T.; Thor, K.B. Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 2004, 3, 673–683. [Google Scholar] [CrossRef]
  81. Shahreza, M.L.; Ghadiri, N.; Mousavi, S.R.; Varshosaz, J.; Green, J. A review of network-based approaches to drug repositioning. Briefings Bioinform. 2017, 19, 878–892. [Google Scholar] [CrossRef]
  82. Klaeger, S.; Heinzlmeir, S.; Wilhelm, M.; Polzer, H.; Vick, B.; Koenig, P.-A.; Reinecke, M.; Ruprecht, B.; Petzoldt, S.; Meng, C.; et al. The target landscape of clinical kinase drugs. Science 2017, 358, eaan4368. [Google Scholar] [CrossRef] [Green Version]
  83. Cabreiro, F.; Au, C.; Leung, K.-Y.; Vergara-Irigaray, N.; Cochemé, H.M.; Noori, T.; Weinkove, D.; Schuster, E.; Greene, N.D.; Gems, D. Metformin Retards Aging in C. elegans by Altering Microbial Folate and Methionine Metabolism. Cell 2013, 153, 228–239. [Google Scholar] [CrossRef] [Green Version]
  84. De Haes, W.; Frooninckx, L.; Van Assche, R.; Smolders, A.; Depuydt, G.; Billen, J.; Braeckman, B.P.; Schoofs, L.; Temmerman, L. Metformin promotes lifespan through mitohormesis via the peroxiredoxin PRDX-2. Proc. Natl. Acad. Sci. USA 2014, 111, E2501–E2509. [Google Scholar] [CrossRef] [Green Version]
  85. Martin-Montalvo, A.; Mercken, E.M.; Mitchell, S.J.; Palacios, H.H.; Mote, P.L.; Scheibye-Knudsen, M.; Gomes, A.P.; Ward, T.M.; Minor, R.K.; Blouin, M.-J.; et al. Metformin improves healthspan and lifespan in mice. Nat. Commun. 2013, 4, 2192. [Google Scholar] [CrossRef]
  86. Yamanishi, Y.; Araki, M.; Gutteridge, A.; Honda, W.; Kanehisa, M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008, 24, i232–i240. [Google Scholar] [CrossRef] [Green Version]
  87. Luo, Y.; Zhao, X.; Zhou, J.; Yang, J.; Zhang, Y.; Kuang, W.; Peng, J.; Chen, L.; Zeng, J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 2017, 8, 573. [Google Scholar] [CrossRef] [Green Version]
  88. Kim, K.H.; Kim, N.D.; Seong, B.L. Pharmacophore-based virtual screening: A review of recent applications. Expert Opin. Drug Discov. 2010, 5, 205–222. [Google Scholar] [CrossRef]
  89. Willett, P. Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today 2006, 11, 1046–1053. [Google Scholar] [CrossRef] [Green Version]
  90. Leelananda, S.P.; Lindert, S. Computational methods in drug discovery. Beilstein J. Org. Chem. 2016, 12, 2694–2718. [Google Scholar] [CrossRef] [Green Version]
  91. Chen, Y.C. Beware of docking! Trends Pharmacol. Sci. 2015, 36, 78–95. [Google Scholar] [CrossRef]
  92. Talele, T.; Khedkar, S.; Rigby, A. Successful applications of computer aided drug discovery: Moving drugs from concept to the clinic. Curr. Top. Med. Chem. 2010, 10, 127–141. [Google Scholar] [CrossRef] [PubMed]
  93. Huang, S.Y.; Zou, X. Inclusion of solvation and entropy in the knowledge-based scoring function for protein-ligand interactions. J. Chem. Inf. Model. 2010, 50, 262–273. [Google Scholar] [CrossRef] [PubMed]
  94. Copeland, R.A. The dynamics of drug-target interactions: Drug-target residence time and its impact on efficacy and safety. Expert Opin. Drug Discov. 2010, 5, 305–310. [Google Scholar] [CrossRef] [PubMed]
  95. Xing, J.; Lu, W.; Liu, R.; Wang, Y.; Xie, Y.; Zhang, H.; Shi, Z.; Jiang, H.; Liu, Y.-C.; Chen, K.; et al. Machine-Learning-Assisted Approach for Discovering Novel Inhibitors Targeting Bromodomain-Containing Protein 4. J. Chem. Inf. Model. 2017, 57, 1677–1690. [Google Scholar] [CrossRef] [PubMed]
  96. Liew, C.Y.; Ma, X.H.; Liu, X.; Yap, C.W. SVM Model for Virtual Screening of Lck Inhibitors. J. Chem. Inf. Model. 2009, 49, 877–885. [Google Scholar] [CrossRef]
  97. Ma, X.; Jia, J.; Zhu, F.; Xue, Y.; Li, Z.; Chen, Y. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries. Comb. Chem. High Throughput Screen. 2009, 12, 344–357. [Google Scholar] [CrossRef]
  98. Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Ceulemans, H.; Wegner, J.K.; Hochreiter, S. Deep learning as an opportunity in virtual screening. In Proceedings of the Workshop on Deep Learning & Representation Learning, Montreal, QC, Canada, 12 December 2014. [Google Scholar]
  99. Shoemaker, R.H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 2006, 6, 813–823. [Google Scholar] [CrossRef]
  100. Kadurin, A.; Aliper, A.; Kazennov, A.; Mamoshina, P.; Vanhaelen, Q.; Khrabrov, K.; Zhavoronkov, A. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 2016, 8, 10883–10890. [Google Scholar] [CrossRef] [Green Version]
  101. Huang, S.-Y.; Grinter, S.Z.; Zou, X. Scoring functions and their evaluation methods for protein–ligand docking: Recent advances and future directions. Phys. Chem. Chem. Phys. 2010, 12, 12899–12908. [Google Scholar] [CrossRef]
  102. Khamis, M.A.; Gomaa, W.; Ahmed, W.F. Machine learning in computational docking. Artif. Intell. Med. 2015, 63, 135–152. [Google Scholar] [CrossRef]
  103. Ain, Q.U.; Aleksandrova, A.; Roessler, F.D.; Ballester, P.J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. WIREs Comput. Mol. Sci. 2015, 5, 405–424. [Google Scholar] [CrossRef]
  104. Kinnings, S.L.; Liu, N.; Tonge, P.J.; Jackson, R.M.; Xie, L.; Bourne, P.E. A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing. J. Chem. Inf. Model. 2011, 51, 408–419. [Google Scholar] [CrossRef] [Green Version]
  105. Zsoldos, Z.; Reid, D.; Simon, A.; Sadjad, S.B.; Johnson, A.P. eHiTS: A new fast, exhaustive flexible ligand docking system. J. Mol. Graph. Model. 2007, 26, 198–212. [Google Scholar] [CrossRef]
  106. Wang, C.; Zhang, Y. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J. Comput. Chem. 2016, 38, 169–177. [Google Scholar] [CrossRef]
  107. Repasky, M.P.; Shelley, M.; Friesner, R.A. Flexible Ligand Docking with Glide; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007. [Google Scholar]
  108. Jimenez, J.; Skalic, M.; Martinez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-ligand absolute binding affinity prediction via 3D convolutional neural networks. J. Chem. Inf. Model. 2018, 58, 287–296. [Google Scholar] [CrossRef]
  109. Abagyan, R.; Totrov, M.; Kuznetsov, D. ICM—A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 1994, 15, 488–506. [Google Scholar] [CrossRef]
  110. Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Repasky, M.P.; Knoll, E.H.; Shelley, M.; Perry, J.K.; et al. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004, 47, 1739–1749. [Google Scholar] [CrossRef]
  111. Pereira, J.C.; Caffarena, E.R.; Dos Santos, C.N. Boosting docking-based virtual screening with deep learning. J. Chem. Inf. Model. 2016, 56, 2495. [Google Scholar] [CrossRef] [Green Version]
  112. Esposito, E.X.; Hopfinger, A.J.; Madura, J.D. Methods for Applying the Quantitative Structure-Activity Relationship Paradigm. Methods Mol. Biol. 2004, 275, 131–213. [Google Scholar] [CrossRef]
  113. Myint, K.Z.; Xie, X.-Q. Recent Advances in Fragment-Based QSAR and Multi-Dimensional QSAR Methods. Int. J. Mol. Sci. 2010, 11, 3846–3866. [Google Scholar] [CrossRef]
  114. Hansch, C.; Fujita, T. Additions and corrections -ρ-σ-π analysis. A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 1964, 86, 5710. [Google Scholar] [CrossRef]
  115. Free, S.M.; Wilson, J.W. A Mathematical Contribution to Structure-Activity Studies. J. Med. Chem. 1964, 7, 395–399. [Google Scholar] [CrossRef] [PubMed]
  116. Dobchev, D.A.; Pillai, G.; Karelson, M. In Silico Machine Learning Methods in Drug Development. Curr. Top. Med. Chem. 2014, 14, 1913–1922. [Google Scholar] [CrossRef] [PubMed]
  117. Arodz, T.; Galvez, J. Computational Methods in Developing Quantitative Structure-Activity Relationships (QSAR): A Review. Comb. Chem. High Throughput Screen. 2006, 9, 213–228. [Google Scholar] [CrossRef]
  118. Ning, X.; Karypis, G. In silico structure-activity-relationship (SAR) models from machine learning: A review. Drug Dev. Res. 2010, 72, 138–146. [Google Scholar] [CrossRef]
  119. Dahl, G.E.; Jaitly, N.; Salakhutdinov, R. Multi-task neural networks for QSAR predictions. arXiv 2014, arXiv:1406.1231v1. [Google Scholar]
  120. Ramsundar, B.; Liu, B.; Wu, Z.; Verras, A.; Tudor, M.; Sheridan, R.P.; Pande, V. Is multitask deep learning practical for pharma? J. Chem. Inf. Model. 2017, 57, 2068–2076. [Google Scholar] [CrossRef]
  121. Subramanian, G.; Ramsundar, B.; Pande, V.; Denny, R.A. Computational Modeling of β-Secretase 1 (BACE-1) Inhibitors Using Ligand Based Approaches. J. Chem. Inf. Model. 2016, 56, 1936–1949. [Google Scholar] [CrossRef]
  122. Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. Biol. 2011, 672, 299–323. [Google Scholar]
  123. Schneider, G.; Funatsu, K.; Okuno, Y.; Winkler, D. De novo Drug Design—Ye olde Scoring Problem Revisited. Mol. Inform. 2017, 36, 1681031. [Google Scholar] [CrossRef] [Green Version]
  124. Mullard, A. The drug-maker’s guide to the galaxy. Nature 2017, 549, 445–447. [Google Scholar] [CrossRef] [Green Version]
  125. Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017, 9, 48. [Google Scholar] [CrossRef] [Green Version]
  126. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  127. Gómez-Bombarelli, R.; Wei, J.N.; Duvenaud, D.K.; Hernandez-Lobato, J.M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T.D.; Adams, R.P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276. [Google Scholar] [CrossRef]
  128. Pu, Y.; Wang, W.; Henao, R.; Chen, L.; Gan, Z.; Li, C.; Carin, L. Adversarial symmetric variational autoencoder. arXiv 2017, arXiv:1711.04915v2. [Google Scholar]
  129. Kadurin, A.; Nikolenko, S.; Khrabrov, K.; Aliper, A.; Zhavoronkov, A. druGAN: An advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol. Pharm. 2017, 14, 3098–3104. [Google Scholar] [CrossRef]
  130. Coley, C.W.; Barzilay, R.; Green, W.H.; Jaakkola, T.S.; Jensen, K.F. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J. Chem. Inf. Model. 2017, 57, 1757–1772. [Google Scholar] [CrossRef]
  131. Andras, P. High-Dimensional Function Approximation With Neural Networks for Large Volumes of Data. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 500–508. [Google Scholar] [CrossRef]
  132. Coley, C.W.; Green, W.H.; Jensen, K.F. Machine Learning in Computer-Aided Synthesis Planning. Accounts Chem. Res. 2018, 51, 1281–1289. [Google Scholar] [CrossRef]
  133. Maryasin, B.; Marquetand, P.; Maulide, N. Machine learning for organic synthesis: Are robots replacing chemists? Angew. Chem. Int. Ed. 2018, 57, 6978–6980. [Google Scholar] [CrossRef]
  134. Santos, C.B.R.; Lobato, C.C.; Braga, F.S.; Morais, S.S.S.; Santos, C.F.; Fernandes, C.P.; Brasil, D.S.B.; Hage-Melim, L.I.S.; Macêdo, W.J.C.; Carvalho, J.C.T. Application of Hartree-Fock Method for Modeling of Bioactive Molecules Using SAR and QSPR. Comput. Mol. Biosci. 2014, 4, 1–24. [Google Scholar] [CrossRef] [Green Version]
  135. Collins, K.D.; Glorius, F. A robustness screen for the rapid assessment of chemical reactions. Nat. Chem. 2013, 5, 597–601. [Google Scholar] [CrossRef] [PubMed]
  136. Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A Survey of Monte Carlo Tree Search Methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43. [Google Scholar] [CrossRef] [Green Version]
  137. Kayala, M.A.; Azencott, C.A.; Chen, J.H.; Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 2011, 51, 2209–2222. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  138. Cook, A.; Johnson, A.P.; Law, J.; Mirzazadeh, M.; Ravitz, O.; Simon, A. Computer-aided synthesis design: 40 years on. WIREs Comput. Mol. Sci. 2011, 2, 79–107. [Google Scholar] [CrossRef]
  139. Segler, M.H.S.; Waller, M.P. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem. A Eur. J. 2017, 23, 5966–5971. [Google Scholar] [CrossRef]
  140. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  141. Zhou, Z.; Li, X.; Zare, R.N. Optimizing Chemical Reactions with Deep Reinforcement Learning. ACS Cent. Sci. 2017, 3, 1337–1344. [Google Scholar] [CrossRef] [Green Version]
  142. Coley, C.W.; Barzilay, R.; Jaakkola, T.S.; Green, W.H.; Jensen, K.F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3, 434–443. [Google Scholar] [CrossRef] [Green Version]
  143. Monemian, S.A.; Shahsavan, H.; Bolouri, O.; Taranejoo, S.; Goodarzi, V.; Torabi-Angaji, M. A stacked neural network approach for yield prediction of propylene polymerization. J. Appl. Polym. Sci. 2010, 116, 1237–1246. [Google Scholar] [CrossRef]
  144. Rahman, M.B.A.; Chaibakhsh, N.; Basri, M.; Salleh, A.B.; Rahman, R.N.Z.R.A. Application of Artificial Neural Network for Yield Prediction of Lipase-Catalyzed Synthesis of Dioctyl Adipate. Appl. Biochem. Biotechnol. 2009, 158, 722–735. [Google Scholar] [CrossRef]
  145. Ahneman, D.T.; Estrada, J.G.; Lin, S.; Dreher, S.D.; Doyle, A.G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 2018, 360, 186–190. [Google Scholar] [CrossRef] [Green Version]
  146. Merrifield, R.B. Automated Synthesis of Peptides. Science 1965, 150, 178–185. [Google Scholar] [CrossRef]
  147. Alvarado-Urbina, G.; Sathe, G.M.; Liu, W.-C.; Gillen, M.F.; Duck, P.D.; Bender, R.; Ogilvie, K.K. Automated Synthesis of Gene Fragments. Science 1981, 214, 270–274. [Google Scholar] [CrossRef]
  148. Karp, P.D. Pathway Databases: A Case Study in Computational Symbolic Theories. Science 2001, 293, 2040–2044. [Google Scholar] [CrossRef] [Green Version]
  149. Steiner, S.; Wolf, J.; Glatzel, S.; Andreou, A.; Granda, J.M.; Keenan, G.; Hinkley, T.; Aragon-Camarasa, G.; Kitson, P.J.; Angelone, D.; et al. Organic synthesis in a modular robotic system driven by a chemical programming language. Science 2019, 363, eaav2211. [Google Scholar] [CrossRef] [Green Version]
  150. Fuhrman, J.A.; Schwalbach, M.S.; Stingl, U. Proteorhodopsins: An array of physiological roles? Nat. Rev. Microbiol. 2008, 6, 488–494. [Google Scholar] [CrossRef]
  151. Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710. [Google Scholar] [CrossRef]
  152. Zhu, H. Big Data and Artificial Intelligence Modeling for Drug Discovery. Annu. Rev. Pharmacol. Toxicol. 2020, 60, 573–589. [Google Scholar] [CrossRef] [Green Version]
  153. Ghasemi, F.; Mehridehnavi, A.; Fassihi, A.; Pérez-Sánchez, H. Deep neural network in biological activity prediction using deep belief network. Appl. Soft Comput. 2018, 62, 251–258. [Google Scholar] [CrossRef]
  154. Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: Toxicity Prediction using Deep Learning. Front. Environ. Sci. 2016, 3, 80. [Google Scholar] [CrossRef] [Green Version]
  155. Stork, C.; Chen, Y.; Šícho, M.; Kirchmair, J. Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters. J. Chem. Inf. Model. 2019, 59, 1030–1043. [Google Scholar] [CrossRef] [PubMed]
  156. Urban, G.; Subrahmanya, N.; Baldi, P. Inner and Outer Recursive Neural Networks for Chemoinformatics Applications. J. Chem. Inf. Model. 2018, 58, 207–211. [Google Scholar] [CrossRef] [PubMed]
  157. Jin, W.; Barzilay, R.; Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 2323–2332. [Google Scholar]
  158. Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Gomez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; pp. 1–9. [Google Scholar]
  159. Durrant, J.D.; McCammon, J.A. NNScore 2.0: A Neural-Network Receptor–Ligand Scoring Function. J. Chem. Inf. Model. 2011, 51, 2897–2903. [Google Scholar] [CrossRef] [PubMed]
  160. Wójcikowski, M.; Zielenkiewicz, P.; Siedlecki, P. Open Drug Discovery Toolkit (ODDT): A new open-source player in the drug discovery field. J. Cheminform. 2015, 7, 26. [Google Scholar] [CrossRef] [PubMed]
  161. Sanchez-Lengeling, B.; Outeiral, C.; Guimaraes, G.L.; Aspuru-Guzik, A. Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). ChemRxiv 2017, 1–18. [Google Scholar]
  162. Feinberg, E.N.; Sur, D.; Wu, Z.; Husic, B.E.; Mai, H.; Li, Y.; Sun, S.; Yang, J.; Ramsundar, B.; Pande, V.S. PotentialNet for molecular property prediction. ACS Cent. Sci. 2018, 4, 1520–1530. [Google Scholar] [CrossRef]
  163. Awale, M.; Reymond, J.-L. Polypharmacology browser PPB2: Target prediction combining nearest neighbors with machine learning. J. Chem. Inf. Model. 2019, 59, 10–17. [Google Scholar] [CrossRef]
  164. Cho, A. No room for error. Science 2020, 369, 130–133. [Google Scholar] [CrossRef]
  165. Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: An AI Tool for De Novo Drug Design. J. Chem. Inf. Model. 2020, 60, 5918–5922. [Google Scholar] [CrossRef]
  166. Coley, C.W.; Rogers, L.; Green, W.H.; Jensen, K.F. SCScore: Synthetic Complexity Learned from a Reaction Corpus. J. Chem. Inf. Model. 2018, 58, 252–261. [Google Scholar] [CrossRef]
  167. Yasuo, N.; Sekijima, M. Improved Method of Structure-Based Virtual Screening via Interaction-Energy-Based Learning. J. Chem. Inf. Model. 2019, 59, 1050–1061. [Google Scholar] [CrossRef] [Green Version]
  168. Caramelli, D.; Salley, D.; Henson, A.; Camarasa, G.A.; Sharabi, S.; Keenan, G.; Cronin, L. Networking chemical robots for reaction multitasking. Nat. Commun. 2018, 9, 3406. [Google Scholar] [CrossRef]
  169. Coomans, D.; Jonckheer, M.; Massart, D.; Broeckaert, I.; Blockx, P. The application of linear discriminant analysis in the diagnosis of thyroid diseases. Anal. Chim. Acta 1978, 103, 409–415. [Google Scholar] [CrossRef]
  170. Granda, J.M.; Donina, L.; Dragone, V.; Long, D.-L.; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 2018, 559, 377–381. [Google Scholar] [CrossRef] [Green Version]
  171. Perera, D.; Tucker, J.W.; Brahmbhatt, S.; Helal, C.J.; Chong, A.; Farrell, W.; Richardson, P.; Sach, N.W. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 2018, 359, 429–434. [Google Scholar] [CrossRef] [Green Version]
  172. Bajusz, D.; Rácz, A.; Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 2015, 7, 20. [Google Scholar] [CrossRef] [Green Version]
  173. Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575. [Google Scholar] [CrossRef]
  174. Artursson, P.; Karlsson, J. Correlation between oral drug absorption in humans and apparent drug permeability coefficients in human intestinal epithelial (Caco-2) cells. Biochem. Biophys. Res. Commun. 1991, 175, 880–885. [Google Scholar] [CrossRef]
  175. Hubatsch, I.; Ragnarsson, E.G.E.; Artursson, P. Determination of drug permeability and prediction of drug absorption in Caco-2 monolayers. Nat. Protoc. 2007, 2, 2111–2119. [Google Scholar] [CrossRef]
  176. Selvaraj, C.; Chandra, I.; Singh, S.K. Artificial intelligence and machine learning approaches for drug design: Challenges and opportunities for the pharmaceutical industries. Mol. Divers. 2021, 26, 1893–1913. [Google Scholar] [CrossRef] [PubMed]
  177. OECD. Guidance document on the validation of (quantitative) structure-activity relationships [(Q)SAR] models. In OECD Series on Testing and Assessment No. 69; ENV/JM/MONO; OECD: Paris, France, 2007; Volume 2, p. 154. [Google Scholar]
  178. Tian, S.; Li, Y.; Wang, J.; Zhang, J.; Hou, T. ADME Evaluation in Drug Discovery. 9. Prediction of Oral Bioavailability in Humans Based on Molecular Properties and Structural Fingerprints. Mol. Pharm. 2011, 8, 841–851. [Google Scholar] [CrossRef] [PubMed]
  179. Sim, D.S.M. Drug Distribution; Springer International Publishing: Cham, Switzerland, 2015. [Google Scholar]
  180. Lombardo, F.; Jing, Y. In Silico Prediction of Volume of Distribution in Humans. Extensive Data Set and the Exploration of Linear and Nonlinear Methods Coupled with Molecular Interaction Fields Descriptors. J. Chem. Inf. Model. 2016, 56, 2042–2052. [Google Scholar] [CrossRef] [PubMed]
  181. Matlock, M.; Hughes, T.B.; Swamidass, S.J. XenoSite server: A web-available site of metabolism prediction tool. Bioinformatics 2014, 31, 1136–1137. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  182. Zaretzki, J.; Matlock, M.; Swamidass, S.J. XenoSite: Accurately Predicting CYP-Mediated Sites of Metabolism with Neural Networks. J. Chem. Inf. Model. 2013, 53, 3373–3383. [Google Scholar] [CrossRef] [PubMed]
  183. Le Dang, N.; Hughes, T.B.; Krishnamurthy, V.; Swamidass, S.J. A simple model predicts UGT-mediated metabolism. Bioinformatics 2016, 32, 3183–3189. [Google Scholar] [CrossRef] [Green Version]
  184. Sim, D.S.M. Drug elimination. In Pharmacological Basis of Acute Care; Chan, Y., Ng, K., Sim, D., Eds.; Springer: Cham, Switzerland, 2015; pp. 37–47. [Google Scholar]
  185. Lombardo, F.; Obach, R.S.; Varma, M.V.; Stringer, R.; Berellini, G. Clearance Mechanism Assignment and Total Clearance Prediction in Human Based upon in Silico Models. J. Med. Chem. 2014, 57, 4397–4405. [Google Scholar] [CrossRef]
  186. Guengerich, F.P. Mechanisms of Drug Toxicity and Relevance to Pharmaceutical Development. Drug Metab. Pharmacokinet. 2011, 26, 3–14. [Google Scholar] [CrossRef]
  187. Xu, Y.; Pei, J.; Lai, L. Deep Learning Based Regression and Multiclass Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction. J. Chem. Inf. Model. 2017, 57, 2672–2685. [Google Scholar] [CrossRef]
  188. Sushko, I.; Salmina, E.; Potemkin, V.A.; Poda, G.; Tetko, I.V. ToxAlerts: A Web Server of Structural Alerts for Toxic Chemicals and Compounds with Potential Adverse Reactions. J. Chem. Inf. Model. 2012, 52, 2310–2316. [Google Scholar] [CrossRef]
  189. Kearnes, S.; Goldman, B.; Pande, V. Modeling industrial ADMET data with multitask networks. arXiv 2016, arXiv:1606.08793v3. [Google Scholar]
  190. Durant, J.L.; Leland, B.A.; Henry, D.R.; Nourse, J.G. Reoptimization of MDL Keys for Use in Drug Discovery. ChemInform 2003, 34. [Google Scholar] [CrossRef] [Green Version]
  191. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754. [Google Scholar] [CrossRef]
  192. Bender, A.; Mussa, A.H.Y.; Glen, R.C.; Reiling, S. Similarity Searching of Chemical Databases Using Atom Environment Descriptors (MOLPRINT 2D): Evaluation of Performance. J. Chem. Inf. Comput. Sci. 2004, 44, 1708–1718. [Google Scholar] [CrossRef]
  193. Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A deep convolutional neural network for bioactivity prediction in structure based drug discovery. arXiv 2015, arXiv:1510.02855. [Google Scholar]
  194. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular graph convolutions: Moving beyond fingerprints. J. Comput. Aided Mol. Des. 2016, 30, 595–608. [Google Scholar] [CrossRef] [Green Version]
  195. Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2017, 9, 513–530. [Google Scholar] [CrossRef] [Green Version]
  196. Smith, E.G.; Wiswesser, W.J. The Wiswesser Line-Formula Chemical Notation; McGraw-Hill: New York, NY, USA, 1975. [Google Scholar]
  197. Ash, S.; Cline, M.A.; Homer, R.W.; Hurst, T.; Smith, G.B. ChemInform Abstract: SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation. ChemInform 2010, 28. [Google Scholar] [CrossRef]
  198. Weininger, D. SMILES, a chemical Language and Information System 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. [Google Scholar]
  199. Heller, S.R.; McNaught, A.; Pletnev, I.V.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminform. 2015, 7, 1–34. [Google Scholar] [CrossRef] [Green Version]
  200. Goh, G.B.; Hodas, N.O.; Siegel, C.; Vishnu, A. SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties. arXiv 2017, arXiv:1712.02034v2. [Google Scholar]
  201. Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; Wiley-VCH: Weinheim, Germany, 2009. [Google Scholar]
  202. Sahoo, S.; Adhikari, C.; Kuanar, M.; Mishra, B. A Short Review of the Generation of Molecular Descriptors and Their Applications in Quantitative Structure Property/Activity Relationships. Curr. Comput. Aided-Drug Des. 2016, 12, 181–205. [Google Scholar] [CrossRef] [PubMed]
  203. Danishuddin; Khan, A.U. Descriptors and their selection methods in QSAR analysis: Paradigm for drug design. Drug Discov. Today 2016, 21, 1291–1302. [Google Scholar] [CrossRef] [PubMed]
  204. Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. DRAGON software: An easy approach to molecular descriptor calculations. Match Commun. Math Comput. Chem. 2006, 56, 237–248. [Google Scholar]
  205. Cao, D.-S.; Xu, Q.-S.; Hu, Q.-N.; Liang, Y.-Z. ChemoPy: Freely available python package for computational biology and chemoinformatics. Bioinformatics 2013, 29, 1092–1094. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  206. Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011, 32, 1466–1474. [Google Scholar] [CrossRef] [PubMed]
  207. O’Boyle, N.M.; Hutchison, G.R. Cinfony—Combining Open Source cheminformatics toolkits behind a common interface. Chem. Central J. 2008, 2, 24. [Google Scholar] [CrossRef] [Green Version]
  208. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  209. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the Neural Information Processing Systems Conference, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  210. Altae-Tran, H.; Ramsundar, B.; Pappu, A.S.; Pande, V. Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci. 2017, 3, 283–293. [Google Scholar] [CrossRef] [Green Version]
  211. Cortes, C.; Kuznetsov, V.; Mohri, M. Ensemble methods for structured prediction. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1134–1142. [Google Scholar]
  212. Chen, B.; Sheridan, R.P.; Hornak, V.; Voigt, J.H. Comparison of Random Forest and Pipeline Pilot Naïve Bayes in Prospective QSAR Predictions. J. Chem. Inf. Model. 2012, 52, 792–803. [Google Scholar] [CrossRef]
  213. Sheridan, R.P. Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. J. Chem. Inf. Model. 2013, 53, 783–790. [Google Scholar] [CrossRef] [PubMed]
  214. Wilson, W.I.; Peng, Y.; Augsburger, L.L. Generalization of a prototype intelligent hybrid system for hard gelatin capsule formulation development. AAPS PharmSciTech 2005, 6, E449–E457. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  215. Mehta, C.H.; Narayan, R.; Nayak, U.Y. Computational modeling for formulation design. Drug Discov. Today 2018, 24, 781–788. [Google Scholar] [CrossRef] [PubMed]
  216. Zhao, C.; Jain, A.; Hailemariam, L.; Suresh, P.; Akkisetty, P.; Joglekar, G.; Venkatasubramanian, V.; Reklaitis, G.V.; Morris, K.; Basu, P. Toward intelligent decision support for pharmaceutical product development. J. Pharm. Innov. 2006, 1, 23–35. [Google Scholar] [CrossRef]
  217. Rantanen, J.; Khinast, J. The Future of Pharmaceutical Manufacturing Sciences. J. Pharm. Sci. 2015, 104, 3612–3638. [Google Scholar] [CrossRef] [Green Version]
  218. Ketterhagen, W.R.; Am Ende, M.T.; Hancock, B.C. Process Modeling in the Pharmaceutical Industry using the Discrete Element Method. J. Pharm. Sci. 2009, 98, 442–470. [Google Scholar] [CrossRef]
  219. Chen, W.; Desai, D.; Good, D.; Crison, J.; Timmins, P.; Paruchuri, S.; Wang, J.; Ha, K. Mathematical Model-Based Accelerated Development of Extended-release Metformin Hydrochloride Tablet Formulation. AAPS PharmSciTech 2015, 17, 1007–1013. [Google Scholar] [CrossRef] [Green Version]
  220. Meziane, F.; Vadera, S.; Kobbacy, K.; Proudlove, N. Intelligent systems in manufacturing: Current developments and future prospects. Integr. Manuf. Syst. 2000, 11, 218–238. [Google Scholar] [CrossRef]
  221. Sahu, A.; Mishra, J.; Kushwaha, N. Artificial Intelligence (AI) in Drugs and Pharmaceuticals. Comb. Chem. High Throughput Screen. 2022, 25, 1818–1837. [Google Scholar] [CrossRef]
  222. Faure, A.; York, P.; Rowe, R. Process control and scale-up of pharmaceutical wet granulation processes: A review. Eur. J. Pharm. Biopharm. 2001, 52, 269–277. [Google Scholar] [CrossRef]
  223. Landin, M. Artificial intelligence tools for scaling up of high shear wet granulation process. J. Pharm. Sci. 2017, 106, 273–277. [Google Scholar] [CrossRef] [Green Version]
  224. Das, M.K.; Chakraborty, T. ANN in Pharmaceutical Product and Process Development. In Artificial Neural Network for Drug Design, Delivery and Disposition; Puri, M., Pathak, Y., Sutariya, V.K., Tipparaju, S., Moreno, W., Eds.; Academic Press: Boston, MA, USA, 2016; pp. 277–293. [Google Scholar]
  225. Gams, M.; Horvat, M.; Ožek, M.; Luštrek, M.; Gradišek, A. Integrating Artificial and Human Intelligence into Tablet Production Process. AAPS PharmSciTech 2014, 15, 1447–1453. [Google Scholar] [CrossRef] [Green Version]
  226. Kraft, D.L. System and Methods for the Production of Personalized Drug Products. U.S. Patent 20120041778A1, 29 January 2019. [Google Scholar]
  227. Aksu, B.; Paradkar, A.; de Matas, M.; Özer, Ö.; Güneri, T.; York, P. A quality by design approach using artificial intelligence techniques to control the critical quality attributes of ramipril tablets manufactured by wet granulation. Pharm. Dev. Technol. 2013, 18, 236–245. [Google Scholar] [CrossRef]
  228. Goh, W.Y.; Lim, C.; Peh, K.; Subari, K. Application of a Recurrent Neural Network to Prediction of Drug Dissolution Profiles. Neural Comput. Appl. 2002, 10, 311–317. [Google Scholar] [CrossRef]
  229. Drăgoi, E.N.; Curteanu, S.; Fissore, D. On the Use of Artificial Neural Networks to Monitor a Pharmaceutical Freeze-Drying Process. Dry. Technol. 2013, 31, 72–81. [Google Scholar] [CrossRef] [Green Version]
  230. Reklaitis, R. Towards Intelligent Decision Support for Pharmaceutical Product Development; PharmaHub: Olongapo, Philippines, 2008. [Google Scholar]
  231. Wang, X. Intelligent quality management using knowledge discovery in databases. In Proceedings of the 2009 International Conference on Computational Intelligence and Software Engineering, Wuhan, China, 11–13 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–4. [Google Scholar]
  232. Hay, M.; Thomas, D.W.; Craighead, J.L.; Economides, C.; Rosenthal, J. Clinical development success rates for investigational drugs. Nat. Biotechnol. 2014, 32, 40–51. [Google Scholar] [CrossRef]
  233. Harrer, S.; Shah, P.; Antony, B.; Hu, J. Artificial Intelligence for Clinical Trial Design. Trends Pharmacol. Sci. 2019, 40, 577–591. [Google Scholar] [CrossRef] [Green Version]
  234. Fogel, D.B. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp. Clin. Trials Commun. 2018, 11, 156–164. [Google Scholar] [CrossRef]
  235. Kalafatis, S.P.; Tsogas, M.H.; Blankson, C. Positioning strategies in business markets. J. Bus. Ind. Mark. 2000, 15, 416–437. [Google Scholar] [CrossRef]
  236. Jalkala, A.M.; Keränen, J. Brand positioning strategies for industrial firms providing customer solutions. J. Bus. Ind. Mark. 2014, 29, 253–264. [Google Scholar] [CrossRef]
  237. Ding, M.; Eliashberg, J.; Stremersch, S. Innovation and Marketing in the Pharmaceutical Industry; Springer: New York, NY, USA, 2016. [Google Scholar]
  238. Dou, W.; Lim, K.H.; Su, C.; Zhou, N.; Cui, N. Brand Positioning Strategy Using Search Engine Marketing. MIS Q. 2010, 34, 261. [Google Scholar] [CrossRef] [Green Version]
  239. Chiu, C.-Y.; Chen, Y.-F.; Kuo, I.-T.; Ku, H.C. An intelligent market segmentation system using k-means and particle swarm optimization. Expert Syst. Appl. 2009, 36, 4558–4565. [Google Scholar] [CrossRef]
  240. Toker, D.; Tozan, H.; Vayvai, O. A Decision Model for Pharmaceutical Marketing and a Case Study in Turkey. Econ. Res.-Ekon. 2013, 26, 101–114. [Google Scholar] [CrossRef]
  241. Singh, J.; Flaherty, K.; Sohi, R.S.; Deeter-Schmelz, D.; Habel, J.; Le Meunier-FitzHugh, K.; Malshe, A.; Mullins, R.; Onyemah, V. Sales profession and professionals in the age of digitization and artificial intelligence technologies: Concepts, priorities, and questions. J. Pers. Sell. Sales Manag. 2018, 39, 2–22. [Google Scholar] [CrossRef]
  242. Milgrom, P.R.; Tadelis, S. How Artificial Intelligence and Machine Learning Can Impact Market Design; National Bureau of Economic Research: Cambridge, MA, USA; University of Chicago Press: Chicago, IL, USA, 2019; pp. 567–586. [Google Scholar]
  243. Davenport, T.; Guha, A.; Grewal, D.; Bressgott, T. How artificial intelligence will change the future of marketing. J. Acad. Mark. Sci. 2020, 48, 24–42. [Google Scholar] [CrossRef] [Green Version]
  244. Syam, N.; Sharma, A. Waiting for a sales renaissance in the fourth industrial revolution: Machine learning and artificial intelligence in sales research and practice. Ind. Mark. Manag. 2018, 69, 135–146. [Google Scholar] [CrossRef]
  245. Duran, O.; Rodriguez, N.; Consalter, L.A. Neural networks for cost estimation of shell and tube heat exchangers. Expert Syst. Appl. 2009, 36, 7435–7440. [Google Scholar] [CrossRef] [Green Version]
  246. Park, Y.; Goto, D.; Yang, K.; Downton, K.; LeComte, P.; Olson, M.; Mullins, C. A Literature Review of Factors Affecting Price and Competition in the Global Pharmaceutical Market. Value Health 2016, 19, A265. [Google Scholar] [CrossRef]
  247. de Jesus, A. AI for Pricing—Comparing 5 Current Applications. Emerj. Artif. Intell. Res. 2019. [Google Scholar]
  248. Hassanzadeh, P.; Atyabi, F.; Dinarvand, R. The significance of artificial intelligence in drug delivery system design. Adv. Drug Deliv. Rev. 2019, 151–152, 169–190. [Google Scholar] [CrossRef]
  249. Luo, M.; Feng, Y.; Wang, T.; Guan, J. Micro-/nanorobots at work in active drug delivery. Adv. Funct. Mater. 2018, 28, 1706100. [Google Scholar] [CrossRef]
  250. Fu, J.; Yan, H. Controlled drug release by a nanorobot. Nat. Biotechnol. 2012, 30, 407–408. [Google Scholar] [CrossRef]
  251. Calzolari, D.; Bruschi, S.; Coquin, L.; Schofield, J.; Feala, J.D.; Reed, J.C.; McCulloch, A.D.; Paternostro, G. Search Algorithms as a Framework for the Optimization of Drug Combinations. PLoS Comput. Biol. 2008, 4, e1000249. [Google Scholar] [CrossRef] [Green Version]
  252. Wilson, B.; Km, G. Artificial intelligence and related technologies enabled nanomedicine for advanced cancer treatment. Nanomedicine 2020, 15, 433–435. [Google Scholar] [CrossRef]
  253. Tsigelny, I.F. Artificial intelligence in drug combination therapy. Briefings Bioinform. 2019, 20, 1434–1448. [Google Scholar] [CrossRef]
  254. Mason, D.J.; Eastman, R.T.; Lewis, R.P.I.; Stott, I.P.; Guha, R.; Bender, A. Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations With Novel Structures. Front. Pharmacol. 2018, 9, 1096. [Google Scholar] [CrossRef]
  255. Ho, D.; Wang, P.; Kee, T. Artificial intelligence in nanomedicine. Nanoscale Horiz. 2018, 4, 365–377. [Google Scholar] [CrossRef]
  256. Sacha, G.M.; Varona, P. Artificial intelligence in nanotechnology. Nanotechnology 2013, 24, 452002. [Google Scholar] [CrossRef] [Green Version]
  257. Pellat, G.; Anghelache, C. Governance in the EU Member States in the Era of Big Data; Editura Economică Distributie: Bucharest, Romania, 2019. [Google Scholar]
  258. van der Lee, M.; Swen, J.J. Artificial intelligence in pharmacology research and practice. Clin. Transl. Sci. 2022, 1–6. [Google Scholar] [CrossRef]
  259. Research and Markets. Global Growth Insight—Role of AI in the Pharmaceutical Industry 2018–2022: Exploring Key Investment Trends, Companies-to-Action, and Growth Opportunities for AI in the Pharmaceutical Industry; Research and Markets: Dublin, Ireland, 2019.
  260. Dong, J.; Zhao, M.; Liu, Y.; Su, Y.; Zeng, X. Deep learning in retrosynthesis planning: Datasets, models and tools. Briefings Bioinform. 2022, 23, bbab391. [Google Scholar] [CrossRef]
  261. Jämsä-Jounela, S.-L. Future trends in process automation. Annu. Rev. Control 2007, 31, 211–220. [Google Scholar] [CrossRef]
  262. Davenport, T.H.; Ronanki, R. Artificial intelligence for the real world. Harv. Bus. Rev. 2018, 96, 108–116. [Google Scholar]
Figure 1. Conceptual Interrelationships between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) for drug development.
Figure 2. Summary of AI and ML tools used in drug discovery and development.
Figure 3. Links between AI, ML, and DL for drug development.
Table 1. Enumeration of AI-Aided Computational Tools for Facilitating Drug Discovery.
All sources accessed on 28 November 2022.
Tool | Description
AlphaFold | Protein 3D (tertiary) structure prediction using a DNN
Chemputer | A comprehensive standardized scheme for documenting a chemical synthesis method (provides a complete compound synthesis recipe)
Conv_qsar_fast | Predicts molecular properties using a CNN algorithm
Chemical VAE | Automated chemical design using a variational autoencoder
DeepChem | A Python-based AI toolkit for predictions across drug discovery workflows, using a DL algorithm for molecule recognition
DeepNeuralNet-QSAR | Predicts molecular activity using a multilevel DNN
DeepTox | Toxicity prediction of chemical agents using DL
DeltaVina | Predicts the interaction affinity of small molecules with a drug using a combination of random forest (RF) and the AutoDock scoring function
Hit Dexter | ML models for predicting compounds that could be sensitive to biochemical assays
InnerOuterRNN | Predicts chemical, physical, and biological properties using inner and outer RNNs
JunctionTree VAE | De novo molecule generation using a junction tree variational autoencoder (VAE)
Neural Graph Fingerprints | Property prediction of novel molecules using CNN algorithms
NNScore | Predicts protein–ligand binding affinity using a neural network-based scoring function
Open Drug Discovery Toolkit (ODDT) | A comprehensive toolkit for chemoinformatics and molecular modelling, using the random forest score (RF-Score) and NNScore
ORGANIC | An efficient molecular generation tool for producing molecules with favourable properties using ML
PotentialNet | Predicts ligand-binding affinity using a graph CNN
PPB2 | Polypharmacology prediction using nearest neighbour and ML methods
QML | A Python toolkit for quantum ML (using qubits, leading to increased computational speed, data storage capacity, and learning optimization)
REINVENT | De novo molecule design using an RNN (recurrent neural network) and RL (reinforcement learning)
SCScore | A scoring scheme for estimating the synthesis complexity of a compound
SIEVE-Score | An improved method of structure-based virtual screening through interaction-energy-based learning
Table 2. Partnerships of AI establishments with pharmaceutical firms.
AI Establishment and Location | Utilization of AI | Partnership with the Pharmaceutical Establishment | Platform Advanced/Lead Agents for Clinical Trials
San Francisco, CA 94107 | A scheme for AI-facilitated drug design addressing oncology and gastroenterology specialities | Takeda | Agent S48168 in Phase 1 of clinical testing for Ryanodine receptor 2
San Francisco, CA 94107, USA | A scheme for AI-facilitated drug design addressing oncology and gastroenterology specialities | Servier | Drug advancement related to oncology, central nervous system, and gastroenterologic maladies
San Francisco, CA 94103 | A scheme for AI-enabled structural modelling | Lilly | Agent BBT-401 in Phase 2 of clinical testing
San Francisco, CA 94103 | A scheme for AI-enabled structural modelling | Bridge Biotherapeutics | Augmentation of Pellino Inhibitor Pipeline; Agent BBT-401 evaluated in Phase 2a of clinical testing
Benevolent AI, London, UK | AI-facilitated Judgement Augmented Cognition System (JACS) for originating and advancing novel clinical lead agents effective in neurodegenerative ailments | Janssen | Fresh set of drug compounds to be advanced via such collaboration
Benevolent AI, London, UK | AI-aided schemes to advance novel clinical lead agents effective in chronic kidney ailments | AstraZeneca | Drug candidate evaluated in Phase 2b clinical testing as a lead agent effective in chronic kidney ailments
Oxford, UK | A scheme for AI-enabled drug discovery and lead refinement | Sanofi | Drug discovery research in obsessive-compulsive disorder; Agent DSP-1181 in Phase 1 clinical testing; advance Centaur Chemist™ scheme for AI-enabled drug discovery
IBM Watson Health, Cambridge, MA 02142, USA | Furnishes a scheme for clinical and health-associated data evaluation | Pfizer | Accelerating drug discovery efforts in immuno-oncology
IBM Watson Health, Cambridge, MA 02142, USA | Furnishes a scheme for clinical and health-associated data evaluation | Novartis | Real-time surveillance of patients to augment breast cancer patient intervention results
Microsoft, Redmond, WA 98052, USA | A scheme for image processing as well as cell and gene-aided therapeutic interventions | Novartis | Engendering an AI Innovation lab to augment the drug discovery mechanism as well as its commercialization
Broadway, New York, NY, USA | Furnishes a scheme for clinical testing aided by ML technique | Roche | Originated and advanced Owkin’s Studio platform utilizing AI technology
Sensyne Health, Headington, Oxfordshire, UK | A tool serving clinical AI schemes | Bayer | Originated and advanced Sensyne Health’s proprietary clinical AI technology package
Shenzhen, Guangdong, China | A package enabling target identification and validation incorporating QM as well as ML schemes | Pfizer | Prediction and refinement of crystalline entities of drug candidates utilizable in early stages of drug screening
BioXcel Therapeutics, New Haven, CT, USA | A scheme facilitating drug discovery services incorporating AI mechanisms | Pfizer | Lead agent BXCL501 in assessment in Phase 3 clinical testing; drug agent BXCL701 in assessment in Phase 2 clinical testing
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Sarkar, C.; Das, B.; Rawat, V.S.; Wahlang, J.B.; Nongpiur, A.; Tiewsoh, I.; Lyngdoh, N.M.; Das, D.; Bidarolli, M.; Sony, H.T. Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development. Int. J. Mol. Sci. 2023, 24, 2026.
