Review

Integrative Computational Chemistry Approaches in Modern Drug Discovery: Advances in Docking, Pharmacophore Modeling, Molecular Dynamics, and Virtual Screening

Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
*
Author to whom correspondence should be addressed.
Pharmaceutics 2026, 18(5), 565; https://doi.org/10.3390/pharmaceutics18050565
Submission received: 28 January 2026 / Revised: 20 April 2026 / Accepted: 27 April 2026 / Published: 1 May 2026
(This article belongs to the Section Drug Targeting and Design)

Abstract

Computational chemistry has played a central role in early-stage drug discovery by accelerating target selection, hit identification, and lead optimization. This review summarizes recent developments in molecular docking, pharmacophore modeling, molecular dynamics (MD), and virtual screening (VS), with a focus on their application in practical drug discovery workflows. Advances in docking protocols, including consensus scoring, physics-based rescoring, and ensemble approaches, addressed the challenges of receptor flexibility. Both ligand-based and structure-based pharmacophore models facilitated scaffold hopping and guided library prioritization. MD simulations were used to assess binding pose stability, identify cryptic binding pockets, and characterize solvent interactions. These simulations also supported free-energy calculations using endpoint and alchemical methods. Large-scale VS campaigns employed curated compound libraries, often composed of make-on-demand molecules, and relied on high-performance computing or cloud infrastructure to screen up to 10⁹ compounds. Hits were validated using orthogonal biophysical assays and filtered by absorption, distribution, metabolism, excretion, and toxicity (ADMET) predictions. Integrated pipelines combining pharmacophore modeling, docking, MD, and free-energy calculations improved enrichment rates and reduced the number of compounds requiring synthesis. Several case studies demonstrated the identification of nanomolar-affinity leads from ultra-large screening campaigns. The review also addressed ongoing challenges, such as inconsistent scoring of binding affinity, protonation and tautomer assignment errors, dataset bias, and reproducibility issues. Strategies to mitigate these limitations included standardized library preparation, adherence to FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, and the use of prospective benchmarking protocols.
The review discussed emerging trends, including the use of quantum chemistry for electronic structure refinement, ensemble docking guided by cryo-electron microscopy (cryo-EM) data, and the integration of computational tools with automated synthesis and high-throughput screening in closed-loop discovery systems. These approaches have the potential to accelerate the design–make–test cycle, increase hit novelty, and improve decision-making in early drug development programs.

1. Introduction

Drug discovery is complex and resource-intensive, and it has historically combined empirical screening with rational design strategies, including early medicinal chemistry and structure-guided optimization [1]. It remains one of the most time-consuming, expensive, and high-risk processes in biomedical research: total costs exceed USD 2.6 billion on average, and timelines run 10–15 years for each compound that reaches market approval [2]. Despite significant investment, attrition of drug candidates remains very high, with reportedly only 1 in every 5000–10,000 compounds receiving approval for market use [3].
In this challenging landscape, computational chemistry has become an integral component of the drug discovery pipeline, offering cost-effective, rapid alternatives to early-stage experimental screening [4,5]. Drug discovery starts with target identification and proceeds sequentially through lead discovery, lead optimization, preclinical studies, and clinical trials. Through at least the first three of these stages, computational tools play an important role by helping to prioritize hits and optimize molecular interactions [6]. Computational chemistry enables virtual screening of millions of compounds, simulation of protein–ligand dynamics, and prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles [7]. Because even high-throughput experimental screening campaigns have finite capacity, in silico methods significantly reduce attrition by removing weak candidates before costly biological testing [8].
Structure-based drug design (SBDD) took root in the 1980s following the emergence of crystallographic techniques, but the computational revolution of the last two decades has made molecular modeling accessible to a much wider range of applications [9]. With the exponential growth of structural and chemical databases and the integration of artificial intelligence, in silico platforms can now screen billions of compounds in a matter of days [4,10]. The relevance of these computational methods has increased further during pandemics and emerging disease outbreaks, where time-to-discovery is critical [2]. Recent advances such as AlphaFold 3 and diffusion-based predictors extend structure prediction beyond single proteins to protein–ligand complexes, protein–nucleic acid complexes, and systems bearing post-translational modifications, providing more realistic interaction models for structure-based drug design [11,12]. By combining deep learning with principles of structural biology, these methods aim to bridge the gap between static structure prediction and the dynamics of molecular recognition, which opens structure-based design to targets that were previously challenging. Nevertheless, generalizable and precise modeling of such complex assemblies remains an active area of research.
Recent methods such as AlphaFold 3 have made significant strides, but they still cannot reliably predict binding affinities, ligand conformations, and interaction energetics across diverse chemical and biological systems. In particular, capturing induced-fit effects, solvent interactions, and the influence of post-translational modifications on binding interfaces remains difficult. Likewise, diffusion-based generative models, though powerful for exploring structural and chemical space, need additional validation to establish their robustness and reproducibility in future applications. These new approaches should therefore be regarded as complementary tools rather than substitutes for established computational and experimental approaches. Combining structure prediction models with molecular docking, molecular dynamics simulations, and experimental validation is necessary to enhance reliability and interpretability. Further methodological development, benchmarking against high-quality datasets, and prospective validation experiments will be essential to realize the full potential of these methods in drug discovery pipelines.
Among the most prominent computational approaches, molecular docking predicts the binding pose of a ligand in a biological target and estimates its binding affinity, and it is a foundational tool in virtual screening workflows [13]. Several docking programs, including AutoDock, GOLD, and Glide, have been refined to improve binding mode prediction and scoring functions [14]. Pharmacophore modeling, another pivotal methodology, abstracts the features critical for molecular recognition and is frequently used when crystal structures are unavailable or limited [15]. Both structure-based and ligand-based pharmacophores can also be incorporated into virtual screens to improve accuracy and minimize false positives.
Molecular dynamics (MD) simulations provide atomistic views of protein–ligand complexes over time, revealing conformational changes and confirming docking-predicted poses [8]. Enhanced-sampling methods such as accelerated MD and metadynamics, together with GPU-based parallelization, have made these simulations both more accessible and more accurate [16]. The integration of MD simulations, pharmacophore models, and docking provides a more realistic framework for studying drug–target interactions [17]. Virtual screening has become the standard procedure for filtering ultra-large libraries, allowing ligand- and structure-based workflows to rank and prioritize compounds for synthesis or testing with high throughput and accuracy [18]. Recent innovations in AI-guided screening, deep generative models, and transfer learning have led to adaptive systems that can design new compounds from scratch (i.e., de novo) [19]. Combining several computational methods has produced more robust workflows, with reported high hit-to-lead conversion rates and low late-stage attrition [19]. In light of these developments, computational methods have become indispensable tools not only for speeding up drug discovery but also for improving the precision, safety, and affordability of drugs [20]. However, limitations in accuracy, reliability, and generalizability still hinder universal acceptance [21]. This review examined recent advances in established computational chemistry methods, including molecular docking, pharmacophore modeling, molecular dynamics, and virtual screening, together with their practical uses and limitations in drug discovery pipelines.
Another objective was to provide an overview of how these computational approaches connect to one another and how integrating several methods can speed up drug discovery and address longstanding challenges in lead identification and optimization, setting the stage for the detailed discussions that follow.

2. Computational Approaches in Drug Discovery

Computational chemistry has emerged as an essential pillar of modern drug discovery, significantly accelerating design–make–test–analyze cycles and reducing experimental effort across target identification, hit finding, and lead optimization. Structure-based methods, such as molecular docking and molecular dynamics (MD) simulations, are now increasingly intertwined with ligand-based approaches, such as pharmacophore modeling and cheminformatics-guided virtual screening (VS). Mature public data repositories and advances in structure prediction have vastly expanded the feasible problem space from thousands to billions of compounds and from single-protein to pathway- and network-level predictions. These advances enable tighter integration between in silico hypotheses and orthogonal biophysical and cellular assays, improving the precision and reproducibility of medicinal chemistry campaigns [22,23].
Structure-based screening typically begins with a high-quality receptor model derived from crystallography, cryo-EM, or NMR and deposited in the Protein Data Bank (PDB). Docking engines (for example, AutoDock Vina and Glide) rapidly generate and score poses for millions of ligands, providing prioritized hit lists for purchase or synthesis [24,25]. In practice, computational triage is directly coupled to experimental validation using biophysical assays such as differential scanning fluorimetry (DSF), surface plasmon resonance (SPR), and isothermal titration calorimetry (ITC), together with cell-based readouts; iterative feedback refines receptor protonation states, binding-site water treatment, and ligand tautomers before promising scaffolds are escalated into prospective structure–activity relationship studies [26]. MD simulations then probe pose stability, desolvation, and protein flexibility on nanosecond–microsecond timescales, highlight cryptic pockets, and support free-energy calculations that rationalize affinity changes observed in bench campaigns [23]. Ligand-based methods complement this loop when structural data are sparse or when chemotypes are diversified around known actives. Pharmacophore modeling distills the essential 3D features (e.g., H-bond donors/acceptors, hydrophobes, aromatic centroids) required for activity, enabling scaffold hopping and fast database queries (Table 1); platforms like PHASE couple pharmacophore perception with 3D QSAR to guide substituent changes with quantitative predictions [27]. To control attrition due to poor ADME, rule-based filters, most prominently Lipinski’s “Rule of Five” (RO5), flag liabilities early and focus synthesis on developable regions of chemical space without over-constraining novelty [28].
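As a minimal sketch of how such a rule-based filter can be applied, the Python fragment below counts violations of Lipinski's four criteria (molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) from precomputed descriptors. The `Descriptors` container, the function names, and the choice to tolerate one violation are illustrative assumptions rather than part of any tool cited here; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def flag_library(library, max_violations=1):
    """Split a library into (kept, flagged) lists.

    Tolerating one violation is an assumption of this sketch; the
    threshold should be adapted to the project and to known RO5 exceptions.
    """
    kept, flagged = [], []
    for d in library:
        target = kept if ro5_violations(d) <= max_violations else flagged
        target.append(d)
    return kept, flagged


if __name__ == "__main__":
    # Descriptor values below are invented for illustration only.
    library = [
        Descriptors("cmpd_A", 350.0, 2.1, 2, 5),
        Descriptors("cmpd_B", 720.0, 6.3, 6, 12),
    ]
    kept, flagged = flag_library(library)
    print("kept:", [d.name for d in kept])
    print("flagged:", [d.name for d in flagged])
```

Flagging rather than deleting compounds keeps the filter advisory, which matches the intent of focusing synthesis without over-constraining novelty.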
At the target-identification stage, transcriptomic perturbation resources such as the Connectivity Map (CMap/LINCS) allow researchers to link compounds, genes, and diseases by shared expression signatures (Figure 1). These signatures can uncover actionable mechanisms and prioritize targets for phenotypic hits when genetics or pathway evidence is equivocal. Once targets are selected, structure-enabled campaigns exploit high-resolution complexes and, increasingly, AI-predicted structures to expand coverage beyond traditionally tractable protein families. Docking and shape/pharmacophore screens seed initial hits; consensus scoring and rescoring with physics-based or ML-augmented models improve enrichment. Hit-to-lead optimization then leverages MD for water-network analysis, conformational selection, and binding-mode stabilization, informing substituent choices that balance potency and physicochemical properties. Throughout, medicinal chemistry teams integrate developability heuristics (e.g., RO5 compliance and its known exceptions) with iterative synthesis to avoid late-stage failures [23,28]. Large-scale virtual screening has matured from millions to billions of candidates via cloud and HPC platforms, workflow automation, and fault-tolerant orchestration. VirtualFlow exemplifies this shift, enabling ultra-large docking campaigns with modular support for multiple docking engines and robust pre-processing, which in turn has led to prospective validations against diverse targets. Open bioactivity repositories such as ChEMBL supply target-annotated structure–activity data that seed pharmacophore/ligand-based models, enable decoy generation, and support external validation of computational pipelines [40]. Early computer-aided design leaned heavily on interpretable rules and linear models, with RO5 serving as a pragmatic filter for oral drug-likeness during lead optimization [28]. 
Docking programs evolved in parallel, improving sampling and scoring through knowledge-based potentials and empirical terms; method benchmarks on Vina and Glide demonstrated substantial gains in pose prediction and enrichment versus earlier tools [22]. Over the last decade, two inflection points reshaped the field. First, routine microsecond-scale MD powered by algorithmic and hardware advances made it practical to account for receptor dynamics, allostery, and water rearrangements in design decisions rather than treating the protein as static [23]. Second, AI-based structure prediction achieved near-experimental accuracy for many single-chain proteins, vastly expanding the structural coverage for previously “undockable” targets and enabling downstream SBDD at scale [41]. These methodological shifts catalyzed new workflows: predicted structures can be refined by MD, cross-checked against homology models, and used directly for grid generation and docking; ambiguous binding-site residues can be enumerated across protonation/tautomer states and validated by pose stability. Pharmacophore hypotheses extracted from known ligands can be projected onto AI-predicted pockets to rationalize activity cliffs. In parallel, ML models trained on curated ChEMBL bioactivity and assay metadata guide active-learning loops, prioritize syntheses, and calibrate uncertainty, while expression-signature matching from CMap/LINCS links chemical perturbations to pathway rewiring for mechanism-of-action inference and indication expansion [40].
Progress in computational discovery is inseparable from the growth of high-quality public datasets and shared infrastructure. The PDB standardizes macromolecular structures, enabling consistent binding-site preparation, water and ion curation, and comparative modeling pipelines. ChEMBL delivers assay-level activity annotations that are crucial for distinguishing kinetic from equilibrium measurements, gauging confidence in target engagement, and supplying negative data, all of which are ingredients for reproducible model training and unbiased external validation. Expression-profiling resources such as the modern CMap make it feasible to connect small molecules to genetic programs, triage series by pathway selectivity, and anticipate liabilities by monitoring off-target transcriptional fingerprints [24,25,40].

3. Molecular Docking: Theory, Tools, and Applications

Molecular docking aims to predict how a ligand binds within a macromolecular target and to estimate the strength of this interaction. Typical workflows comprise (i) receptor and ligand preparation (protonation/tautomer assignment, conformer generation, binding-site definition), (ii) pose generation via systematic, stochastic, or knowledge-guided search, and (iii) scoring and ranking of poses using physics-based, empirical, or knowledge-based functions. Most docking programs treat the receptor as rigid or semi-flexible (side-chain rotamers or soft potentials), while the ligand explores translation, rotation, and torsions; induced fit is approximated by side-chain sampling or post-docking relaxation. Scoring functions balance terms for van der Waals complementarity, electrostatics, desolvation, and sometimes hydrogen bonding and metal coordination; consensus or rescoring strategies are often used to mitigate function-specific bias. Docking paradigms divide broadly into protein–ligand docking (typical drug-like molecules) and protein–protein docking (PPDock), the latter contending with far larger interfaces, rugged energy landscapes, and pronounced conformational change. For PPDock, sampling schemes must address multiscale translations and rotations of both partners, often guided by experimental restraints (e.g., XL-MS, NMR, cryo-EM) [22,42]. Multiple mature platforms underpin the docking ecosystem. For instance, AutoDock/AutoDock Vina implemented an efficient stochastic search with multithreading and a reparametrized empirical scoring function, and it has served as one of the backbones of structure-based virtual screening (SBVS) and pose prediction. Glide combined hierarchical filtering with exhaustive pose refinement and proprietary scoring (GlideScore), and it showed strong early enrichment in SBVS benchmarks [22].
GOLD used a genetic algorithm to improve pose accuracy in challenging pockets by combining protein side-chain flexibility with user-specified constraints. Earlier releases of DOCK introduced a grid-based shape-complementarity approach and anchor-and-grow strategies, which eventually evolved into a modular suite widely used in academia [29]. For PPDock, RosettaDock combined rigid-body sampling with optimization of selected side-chain degrees of freedom under the Rosetta energy function, and it is accessible through an automated web server [42]. The available docking programs range from free and open-source (AutoDock Vina, DOCK) to commercial tools (Glide, GOLD) and from command-line engines to user-friendly servers, supporting workflows from small exploratory screens to ultra-large screening campaigns [22,29,42]. Ensemble docking addresses receptor plasticity by docking against multiple conformations derived from experimental structures, homology models, or molecular dynamics (MD) trajectories and aggregating results via consensus scoring or clustering. This strategy improves the chance of sampling near-native poses in flexible or allosteric sites and has shown practical benefits across GPCRs, kinases, and viral enzymes [43]. Deep learning for pose prediction and scoring is increasingly integrated into docking. Convolutional neural networks trained on protein–ligand grids can rescore poses to improve top-ranked accuracy without changing the search engine; GNINA 1.0, for example, couples CNN scoring to Vina-like sampling and demonstrates consistent pose improvements across standard benchmarks [44]. Beyond CNN rescoring, data-efficient graph and 3D-grid models continue to expand coverage to new protein families. QM/MM integration augments empirical scoring by recalculating key poses with quantum mechanical (QM) methods, capturing polarization, charge transfer, metal coordination, and covalent mechanisms that classical force fields struggle to model.
Used selectively (e.g., rescoring top poses or refining ambiguous chemotypes), QM or QM/MM can correct rankings and reduce false positives, albeit at a higher computational cost [45]. These advances narrow the gap between speed and physical realism and make prospective design more reliable when combined with orthogonal filters (e.g., pharmacophore constraints, a water thermodynamics model, or a machine learning-based ADME predictor) [43,44,45].
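The aggregation steps mentioned in this section, taking the best score per compound across a receptor ensemble and combining several scoring functions by consensus, can be sketched in a few lines of Python. This is one simple scheme (rank averaging), assumed here for illustration; the function names and the convention that lower scores are better are assumptions of the sketch, not the method of any cited program.

```python
def best_over_conformations(scores_by_conformation):
    """Ensemble docking aggregation: keep each compound's best (lowest)
    score over all receptor conformations it was docked against."""
    best = {}
    for table in scores_by_conformation.values():
        for compound_id, score in table.items():
            if compound_id not in best or score < best[compound_id]:
                best[compound_id] = score
    return best


def rank_positions(scores):
    """Map compound id -> rank (1 = best) for one scoring function.
    Lower scores are assumed better; ties are broken by compound id."""
    ordered = sorted(scores, key=lambda cid: (scores[cid], cid))
    return {cid: position + 1 for position, cid in enumerate(ordered)}


def consensus_rank(score_tables):
    """Average each compound's rank over all scoring functions.

    Only compounds scored by every function are ranked. Returns a list of
    (compound_id, mean_rank) sorted from best to worst.
    """
    common = set.intersection(*(set(t) for t in score_tables.values()))
    ranks = [
        rank_positions({cid: table[cid] for cid in common})
        for table in score_tables.values()
    ]
    mean_rank = {cid: sum(r[cid] for r in ranks) / len(ranks) for cid in common}
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Invented scores for illustration only.
    tables = {
        "scorer_a": {"c1": -9.0, "c2": -7.5, "c3": -8.0, "c4": -6.0},
        "scorer_b": {"c1": -8.0, "c2": -7.0, "c3": -9.0, "c4": -5.0},
    }
    for compound_id, rank in consensus_rank(tables):
        print(compound_id, rank)
```

Rank averaging sidesteps the fact that different scoring functions report values on incompatible scales, at the cost of discarding the size of score differences.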
Recent advancements in docking include tools such as SMINA, QuickVina, DiffDock, and GNINA, which incorporate improved scoring functions and machine learning-based pose prediction. Diffusion-based docking methods, such as DiffDock, further enhance pose accuracy by modeling ligand placement as a generative process [46,47].
Docking permeates multiple phases of drug discovery. In antiviral research, docking was instrumental in triaging and prioritizing inhibitors of the SARS-CoV-2 main protease (Mpro) once its crystal structure became available, supporting SBVS during hit identification and the subsequent medicinal chemistry optimization; this work was among the first or most extensive reports to combine structure determination with inhibitor discovery within an SBVS framework, detailing structure–activity relationships and illustrating how SBVS can support atomically precise approaches to hit discovery [48]. In the CNS area, docking to GPCRs has yielded novel chemotypes; structure-based docking to the μ-opioid receptor assisted the discovery of PZM21, a Gi-biased agonist that causes less respiratory depression in preclinical models (Figure 2), demonstrating how docking can surface scaffolds with specific signaling profiles [49]. In oncology, docking campaigns commonly seed kinase inhibitor series and target protein–protein interactions (e.g., BCL-2 family, KRAS effectors) by iteratively ranking fragments against poorly defined hot spots, frequently followed by induced-fit or MD refinement to assess selectivity. Successful efforts across therapeutic areas have been associated with careful target preparation (e.g., biologically relevant protonation states, metal or halogen treatment), diligent library design (physicochemical and substructure filters), and careful post-docking adjudication (pharmacophore fit, strain energy, water placement, QM/MM rescoring, and orthogonal experimental confirmation) [48,49]. Scoring inaccuracies remain the primary bottleneck: empirical functions trade accuracy for speed by approximating solvation and entropy, leading to false positives and negatives and limited rank-order correlation with affinity.
Protein flexibility and waters are imperfectly captured by single-structure docking; key rearrangements, cryptic pockets, or displaceable waters can defeat rigid-receptor assumptions. Chemistry edge cases, including covalent mechanisms, transition metals, tautomers/protonation microstates, halogen bonding, and pi-stacking anisotropy, require special handling. Dataset bias and reproducibility also matter. Retrospective enrichments may not translate prospectively if benchmarks overlap with training data (for ML-augmented scorers) or if preparation pipelines differ. Looking forward, several directions are promising. First, physics-aware ML hybrid models that preserve geometric and energetic constraints while learning from large structural corpora should improve pose plausibility and generalization beyond familiar pockets. Second, ensembles and water thermodynamics (e.g., GCMC water, water networks) will likely become standard in SBVS triage. Third, selective QM/MM rescoring and polarizable force fields can correct borderline decisions, especially for metalloproteins and covalent inhibitors. Workflow integration from pocket detection to docking to FEP/MD refinement, plus better uncertainty quantification and prospective benchmarking, will continue to raise confidence in docking-driven nominations [43,44,45].

4. Pharmacophore Modeling: From Concept to Clinical Candidates

A pharmacophore is the ensemble of steric and electronic features that a ligand must present to ensure optimal interactions with a biological target and elicit a response; common features include hydrogen-bond donors/acceptors, hydrophobes, aromatics, cations/anions, and metal binders [50]. In ligand-based modeling, multiple known actives are aligned to derive a consensus feature pattern that tolerates scaffold diversity and supports scaffold hopping. In structure-based modeling, features are extracted from a receptor–ligand complex (or a prepared apo pocket), often complemented by water and protonation analysis to refine the geometry and priorities of interaction hotspots (Figure 3). In practice, teams use ligand-based hypotheses for rapid library triage and structure-based (SB) models to rationalize SAR and steer substituent placement near obligatory interactions or displaceable waters, with the two approaches frequently combined in iterative cycles. Several mature tools support both hypothesis generation and large-scale screening. LigandScout derives 3D pharmacophores directly from protein-bound ligands and exports queries for fast shape/feature search. PHASE integrates common-pharmacophore identification, hypothesis scoring, and 3D-QSAR, and is widely used to couple pharmacophores with machine-learned activity models [27].
MOE provides comprehensive conformer generation and pocket/feature utilities; methods such as LowModeMD improve conformational sampling for alignment and hypothesis robustness [31]. For rapid, free ligand-based hypothesis building, PharmaGist aligns multiple ligands to identify common feature constellations. These platforms underpin both quick exploratory filters and production-grade virtual screening pipelines. Because pharmacophore hits are typically prioritized from very large libraries, early-recognition metrics are essential. The area under the receiver operating characteristic curve (ROC-AUC) provides a global view, but enrichment-focused measures such as the enrichment factor (EF) at low database fractions and BEDROC more faithfully reflect real-world screening, where only the top-ranked fraction is tested. Robust practice includes external decoy sets, retrospective recovery of known actives, and prospective blinded tests that fix decision thresholds before synthesis.
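To make the contrast between global and early-recognition metrics concrete, the following Python sketch computes the enrichment factor at a chosen database fraction and the ROC-AUC from a ranked list of active/decoy labels. It assumes a strict ranking without ties and uses hypothetical function names; BEDROC is not implemented here.

```python
def enrichment_factor(ranked_labels, fraction):
    """EF at a top fraction of the ranked database.

    ranked_labels: list of 1 (active) / 0 (decoy), best-ranked first.
    EF = (hit rate in the top fraction) / (hit rate in the whole database).
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(fraction * n_total)))
    hits_top = sum(ranked_labels[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def roc_auc(ranked_labels):
    """ROC-AUC as the probability that a randomly chosen active is ranked
    above a randomly chosen decoy (assumes a strict ranking, no ties)."""
    n_actives = sum(ranked_labels)
    n_decoys = len(ranked_labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    decoys_seen = 0
    favorable_pairs = 0
    for label in ranked_labels:
        if label == 1:
            # every decoy not yet seen is ranked below this active
            favorable_pairs += n_decoys - decoys_seen
        else:
            decoys_seen += 1
    return favorable_pairs / (n_actives * n_decoys)


if __name__ == "__main__":
    # Invented ranking: 3 actives among 10 compounds.
    ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print("EF at 20%:", enrichment_factor(ranking, 0.20))
    print("ROC-AUC:", roc_auc(ranking))
```

Two rankings can share a similar ROC-AUC while differing sharply in EF at 1%, which is why the early-recognition measure is the more relevant one when only a small top fraction will be purchased or synthesized.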
Energy-optimized pharmacophores (e-pharmacophores) project per-residue or per-feature energetic terms from a receptor–ligand complex onto pharmacophoric features, producing hypotheses that better reflect the underlying physics than purely geometric models. They are often used to re-rank or prune docking hits. MD-refined pharmacophores incorporate receptor dynamics by extracting features from conformational ensembles; compared with single-snapshot models, MD-derived hypotheses can better separate actives from decoys and reveal transient or cryptic interactions. Beyond physics-based refinements, AI-enhanced pharmacophore workflows now condition generative or predictive models on 3D feature maps, improving control during linker/R-group design and accelerating hypothesis testing [51]. These strategies increase hit quality by reconciling speed (feature search) with realism (energetics and dynamics). Pharmacophore queries routinely seed phenotypic-to-target campaigns (back-mapping features from chemotyped hits), guide scaffold hopping with preserved interactions in GPCRs and kinases, and pre-filter ultra-large libraries before docking. They are especially valuable when target structures are incomplete or flexible, and when medicinal chemistry requires diverse chemotypes with shared binding logic (Figure 4). Key limitations persist. Hypotheses may overfit sparse training sets; feature geometry can be sensitive to conformer quality; and static models can miss water-mediated or induced-fit interactions unless augmented by MD or energetic analysis. Best practice combines ligand- and structure-based evidence, validates with enrichment-focused metrics (Table 2), and closes the loop with prospective assays to refine features and tolerances.
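The idea of a pharmacophore query as a set of typed features with geometric tolerances can be illustrated with a small, alignment-free Python sketch: a ligand conformer matches if some assignment of its features to the query features reproduces all pairwise query distances within a tolerance. The feature representation, the tolerance value, and the function name are assumptions made for illustration and do not reproduce the matching algorithm of LigandScout, PHASE, MOE, or PharmaGist. Distance-only matching also cannot distinguish mirror-image arrangements.

```python
from itertools import product
from math import dist


def matches_query(query, ligand_features, tolerance=1.0):
    """Return True if the ligand conformer satisfies the pharmacophore query.

    query, ligand_features: lists of (feature_type, (x, y, z)) tuples,
    coordinates in angstroms. Each query feature must be assigned to a
    distinct ligand feature of the same type, and every pairwise distance
    in the query must be reproduced within `tolerance`.
    """
    # Candidate ligand feature indices for each query feature, by type.
    candidates = [
        [i for i, (ftype, _) in enumerate(ligand_features) if ftype == qtype]
        for qtype, _ in query
    ]
    n = len(query)
    for assignment in product(*candidates):
        if len(set(assignment)) < n:
            continue  # one ligand feature cannot serve two query features
        consistent = all(
            abs(
                dist(query[a][1], query[b][1])
                - dist(ligand_features[assignment[a]][1],
                       ligand_features[assignment[b]][1])
            ) <= tolerance
            for a in range(n)
            for b in range(a + 1, n)
        )
        if consistent:
            return True
    return False


if __name__ == "__main__":
    # Invented three-point query and conformer, for illustration only.
    query = [
        ("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (3.0, 0.0, 0.0)),
        ("aromatic", (0.0, 4.0, 0.0)),
    ]
    conformer = [
        ("donor", (1.0, 1.0, 1.0)),
        ("acceptor", (4.0, 1.0, 1.0)),
        ("aromatic", (1.0, 5.0, 1.0)),
        ("hydrophobe", (6.0, 6.0, 6.0)),
    ]
    print(matches_query(query, conformer))
```

Because the check is brute force over feature assignments, it is suitable only for small queries; production tools add conformer enumeration, excluded volumes, and indexed searches.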

5. Molecular Dynamics (MD) Simulations: Capturing Flexibility and Dynamics

Molecular dynamics (MD) provides time-resolved, atomistic movies of biomolecules by integrating Newton’s equations of motion on a potential energy surface defined by a force field. For proteins and protein–ligand complexes, modern biomolecular force fields such as CHARMM36m, AMBER ff14SB, and OPLS3e deliver improved secondary-structure balance, side-chain rotamer distributions, and small-molecule compatibility, enabling routine simulations from nanoseconds to multiple microseconds on commodity GPUs [23,33,34,35]. MD complements static models by revealing conformational selection, induced fit, cryptic pocket formation, and ordered water networks that strongly modulate ligand binding and selectivity [23].
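The core numerical step, integrating Newton's equations of motion, can be shown on a toy system. The sketch below applies the velocity Verlet scheme, a common choice in MD engines, to a one-dimensional harmonic oscillator in arbitrary units. The potential, parameters, and function names are assumptions chosen for illustration; a real simulation evaluates forces from a full force field over many atoms and adds thermostats, constraints, and periodic boundaries.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations for one particle in one dimension.

    x, v: initial position and velocity; force: callable returning F(x).
    Returns the trajectory as a list of (position, velocity) tuples.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k):
    """Kinetic plus harmonic potential energy."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    k, mass = 1.0, 1.0                     # toy spring constant and mass

    def spring_force(x):
        return -k * x

    traj = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                           mass=mass, dt=0.01, n_steps=1000)
    e_start = total_energy(*traj[0], mass, k)
    e_end = total_energy(*traj[-1], mass, k)
    print("energy drift:", e_end - e_start)
```

Monitoring energy drift in this way is a simple sanity check on the timestep, analogous to the stability checks applied to production trajectories.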
Conventional MD uses a fixed potential and timestep to sample thermally accessible motions and is the backbone for stability assessments of docked complexes. Enhanced sampling methods accelerate rare events. Accelerated MD lowers effective energy barriers to promote transitions between metastable states, improving exploration of loops, side chains, and allosteric sites [60]. Steered MD applies time-dependent external forces to study unbinding trajectories, rupture forces, or conformational switching. These studies typically yield mechanistic hypotheses that are then tested through mutagenesis or linker design. For thermodynamics, end-point free-energy approaches applied to MD trajectories, such as MM-PBSA or MM-GBSA, provide estimates of relative binding affinity and serve as efficient triage methods when coupled with adequate dielectric and entropy approximations [61]. Alchemical free-energy methods such as free energy perturbation (FEP) can determine affinity more accurately because they compute the free-energy change of transforming one ligand into another along non-physical (alchemical) paths. When the workflow provides sufficiently rigorous sampling and error control, FEP can deliver prospective estimates with sub-kcal/mol precision for congeneric series of ligands. While many academic and commercial docking programs can generate candidate poses, MD directly assesses the physical plausibility of these poses and resolves ambiguities arising from protonation, tautomerism, or the placement of waters between the ligand and the target. Although simulations may run for long periods, trajectories as short as 100 ns can be used to develop hypotheses for molecular design; even such short simulations may rule out strained poses, quantify hydrogen-bond persistence, or reveal water-mediated networks. Clustering of MD frames can yield enriched pharmacophore hypotheses that capture dynamic donor/acceptor and hydrophobic patterns not accessible from a single snapshot [23].
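For orientation, the end-point and alchemical estimates mentioned above are commonly written in the following textbook form. The notation is introduced here for illustration and is not taken from a specific protocol in the cited works: angle brackets denote averages over MD snapshots, $E_{\mathrm{MM}}$ is the molecular-mechanics energy, $G_{\mathrm{solv}}$ the implicit-solvent free energy (Poisson–Boltzmann or generalized Born polar term plus a nonpolar term), and $-TS$ an optional entropy estimate.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &\approx \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle, \\
G &= E_{\mathrm{MM}} + G_{\mathrm{solv}} - T S, \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}},
  \qquad G_{\mathrm{solv}} = G_{\mathrm{polar}} + G_{\mathrm{nonpolar}}, \\
\Delta\Delta G_{\mathrm{bind}}(A \to B) &=
  \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B).
\end{align}
```

The last line is the thermodynamic cycle behind relative alchemical calculations: ligand $A$ is transformed into ligand $B$ once in the bound complex and once free in solvent, and the difference gives the change in binding free energy.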
For prioritization, end-point methods (MM-GBSA/MM-PBSA) or targeted FEP runs on MD-relaxed poses help rank series and rationalize structure–activity relationships (SAR), while steered MD provides qualitative rankings of unbinding kinetics in transporters and GPCRs [61]. Machine learning is increasingly used to compress, classify, and predict behavior from large trajectories. Markov state models (MSMs) coarse-grain dynamics into metastable states connected by transition probabilities, enabling estimates of rates, pathways, and long-timescale observables from many short simulations [62]. Deep-learning architectures such as VAMPnets learn slow collective variables and state decompositions directly from coordinates, improving the objectivity and reproducibility of clustering and facilitating mechanism discovery and rare-event prediction [63]. These methods can detect pocket-opening events, identify ligandable conformations for ensemble docking, and select conformers for pathway-specific free-energy calculations. The distinctive advantages of MD are its physical interpretability and its temporal resolution: it can help explain why a docked pose is stable, how water reorganizes, and which conformations an allosteric modulator stabilizes. Its predictive power is limited by the accuracy of the force field, the adequacy of sampling, and the quality of system preparation. Common artifacts include poorly modeled salt bridges, misprotonated residues or ligands, unstable coordination of metal ions or other bound species, and inadequate ion or lipid composition.
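The core of a Markov state model can be sketched in a few lines: count transitions between discrete states at a chosen lag time, row-normalize the counts into transition probabilities, and iterate to the stationary distribution. The example below assumes that frames have already been assigned to states (for instance by clustering) and uses a made-up two-state trajectory; real MSM construction also involves featurization, lag-time selection, and statistical validation, which are omitted here.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition probabilities from a discrete trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Equilibrium state populations by repeated propagation (power iteration)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical state assignments: 0 = pocket closed, 1 = pocket open.
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
T = transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)
print("P(closed -> open) per lag:", round(T[0][1], 3))
print("equilibrium populations:", [round(p, 3) for p in pi])
```

The off-diagonal entries relate to transition rates between states, and the stationary distribution gives their long-time populations, which is how many short simulations can inform long-timescale observables.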
Practical strategies include (i) selecting a force field validated for the system class (CHARMM36m, AMBER ff14SB, OPLS3e), (ii) building ensembles of starting structures (multiple receptor conformations and ligand tautomers), (iii) monitoring stability with RMSD/RMSF, hydrogen-bond occupancy, and water residence times, (iv) using enhanced sampling when functional transitions are slow (accelerated or steered MD), and (v) reserving rigorous free-energy methods for late-stage prioritization where small potency differences matter [33,34,35,60,61]. With these safeguards and ML-assisted analysis, MD now routinely elevates docking and pharmacophore efforts from pose generation to mechanism-aware design [23,62,63].
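Two of the stability metrics in item (iii) can be illustrated with a short sketch: the RMSD between two conformations and the occupancy of a hydrogen bond across frames. The coordinates, distances, and the 3.5 Å donor–acceptor cutoff are assumptions for illustration; the RMSD function presumes the structures are already superimposed, and a full hydrogen-bond criterion would normally include an angle term as well.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z), assumed pre-aligned."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def hbond_occupancy(distances, cutoff=3.5):
    """Fraction of frames whose donor-acceptor distance (in Å) is within cutoff."""
    return sum(d <= cutoff for d in distances) / len(distances)


# Made-up data: two ligand atoms in a reference and a later frame,
# and one donor-acceptor distance sampled over four frames.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(reference, frame)
occupancy = hbond_occupancy([2.9, 3.1, 4.2, 3.4])
print(f"ligand RMSD = {pose_rmsd:.2f} Å, H-bond occupancy = {occupancy:.0%}")
```

In practice such values would be tracked over the whole trajectory, with a pose flagged for review when the RMSD keeps growing or a key hydrogen bond is present in only a small fraction of frames.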

6. Case Studies of Computational Drug Discovery

Several successful drugs have been developed using computational approaches. Representative examples are summarized below in Table 3.

7. Virtual Screening: Large-Scale Compound Prioritization

Virtual screening (VS) prioritizes candidates from massive chemical libraries by computationally estimating their likelihood of binding a biological target. In practice, VS takes two complementary forms. Ligand-based VS (LBVS) infers activity from molecular similarity, pharmacophoric patterns, or learned embeddings of molecules with known activity; it excels when target structures are uncertain but structure–activity relationship (SAR) data are rich. Structure-based VS (SBVS) relies on protein structures to dock and score candidates in silico; it is most informative when high-quality experimental or predicted (e.g., AlphaFold) structures are available. Modern discovery programs commonly combine the two, using LBVS to rapidly down-select ultra-large libraries and SBVS (docking, rescoring, pose filtering) to refine the final shortlist. Public and commercial repositories now routinely supply billions of purchasable or make-on-demand molecules (e.g., ZINC20/22, PubChem, ChEMBL, and enumerated universes such as GDB-17), enabling VS at unprecedented scale (Table 4). A typical SBVS pipeline proceeds as follows: (1) library acquisition/design and curation; (2) physicochemical and liability filtering (e.g., Lipinski/Veber-style criteria, medicinal chemistry rules); (3) optional LBVS or pharmacophore triage; (4) docking (HTVS/SP/XP) with consensus rescoring; (5) post-docking filtering (strain, protein–ligand contacts, synthetic tractability, novelty); and (6) selection for synthesis and experimental validation. Iterative cycles then integrate cheminformatics clustering, ADMET prediction, and medicinal chemistry feedback to converge on tractable chemotypes. For SBVS, widely used docking engines include Glide (HTVS/SP/XP), AutoDock Vina, DOCK, GOLD, and others; physics-based free-energy refinements (e.g., FEP+) increasingly sit downstream of docking for potency ranking among close analogs.
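Step (2) of the pipeline above can be sketched as a simple descriptor filter. The example applies the usual Rule-of-Five thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors, tolerating one violation) together with Veber-style limits (at most 10 rotatable bonds, polar surface area ≤ 140 Å²). The compound names and descriptor values are invented, and the descriptors are assumed to have been precomputed with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one library compound (hypothetical values)."""
    name: str
    mol_weight: float   # g/mol
    logp: float
    h_donors: int
    h_acceptors: int
    rot_bonds: int
    tpsa: float         # topological polar surface area, Å^2


def lipinski_violations(d):
    """Number of Rule-of-Five criteria the compound violates."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_veber(d):
    """Veber-style limits on flexibility and polarity."""
    return d.rot_bonds <= 10 and d.tpsa <= 140


def filter_library(library, max_violations=1):
    return [
        d for d in library
        if lipinski_violations(d) <= max_violations and passes_veber(d)
    ]


library = [
    Descriptors("cpd_A", 320.4, 2.1, 2, 5, 4, 78.0),
    Descriptors("cpd_B", 612.7, 6.3, 3, 9, 8, 110.0),   # two Rule-of-Five violations
    Descriptors("cpd_C", 455.5, 3.9, 1, 7, 13, 95.0),   # too many rotatable bonds
    Descriptors("cpd_D", 520.0, 4.2, 2, 8, 6, 120.0),   # one violation, tolerated
]
kept = [d.name for d in filter_library(library)]
print("compounds passed to docking:", kept)
```

Filters of this kind are cheap enough to run on an entire library before any docking, which is why they sit early in the pipeline.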
LBVS and field-based methods (e.g., from Cresset) complement docking with shape/electrostatic similarity and pharmacophore queries; deep-learning toolkits and benchmarks (e.g., GuacaMol, MOSES) support rapid model development for activity prediction and generative design. Enterprise platforms (e.g., Schrödinger VS workflows) orchestrate these steps with parallelization, while cloud/HPC solutions enable elastic scaling when libraries exceed 10⁸ compounds. Campaigns docking 10⁸–10⁹ or more molecules have uncovered entirely new chemotypes against diverse targets, demonstrating that scale itself is a discovery lever. For example, Lyu et al. [64] docked ~170 million make-on-demand compounds and experimentally confirmed nanomolar hits with novel scaffolds at AmpC β-lactamase and the D4 receptor. Open-source pipelines such as VirtualFlow subsequently industrialized billion-molecule SBVS on commodity cloud/HPC, providing turnkey workflows for pre-filtering, docking, and results management. Elastic compute has moved VS from fixed clusters to cloud-native scheduling and GPU acceleration, improving wall-clock throughput while enabling iterative enrichment strategies (e.g., quick-and-cheap docking→focused high-precision rescoring). This has normalized “campaigns” that iterate between computational enrichment and make-on-demand synthesis within weeks rather than months. Machine learning can now prioritize library regions before docking, reducing computation by orders of magnitude. “Deep Docking” learns from an initial sparse docking pass to triage the remainder of an ultra-large library, while independent deep-learning studies have prospectively discovered active chemotypes (e.g., halicin as a novel antibiotic) from millions of candidates. In parallel, benchmark suites (GuacaMol, MOSES) have standardized the evaluation of generative and predictive models used to guide VS.
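The quick-and-cheap docking followed by focused high-precision rescoring mentioned above amounts to a two-stage funnel, sketched below. Both scoring functions are toy stand-ins defined only for this example (they are not real docking or rescoring calls), compounds are represented by integer identifiers, and lower scores are taken to be better, following the usual docking convention.

```python
def two_stage_screen(library, cheap_score, precise_score, n_keep, n_final):
    """Rank everything with a fast score, then rescore only the shortlist."""
    shortlist = sorted(library, key=cheap_score)[:n_keep]
    rescored = sorted(shortlist, key=precise_score)
    return rescored[:n_final]


calls = {"precise": 0}  # counts how often the expensive score is evaluated


def toy_precise(i):
    """Hypothetical expensive score (stand-in for high-precision rescoring)."""
    calls["precise"] += 1
    return -((i * 37) % 101) / 10.0


def toy_cheap(i):
    """Hypothetical fast score: the precise score plus a systematic error."""
    return -((i * 37) % 101) / 10.0 + ((i * 13) % 7 - 3) * 0.3


library = list(range(10_000))
hits = two_stage_screen(library, toy_cheap, toy_precise, n_keep=100, n_final=5)
print("selected compounds:", hits)
print("expensive evaluations:", calls["precise"], "of", len(library))
```

The design choice is that the expensive method is applied to only a small shortlist, so total cost is dominated by the fast stage; the trade-off is that any true binder the fast score ranks poorly is lost before rescoring.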
Computational hits must be confirmed experimentally (biochemical/biophysical assays), then winnowed by developability. Orthogonal confirmation (e.g., thermal shift, SPR, ITC, crystallography/cryo-EM) helps verify binding mode and rule out assay artifacts. Prospective triage integrates in silico ADMET (solubility, permeability, metabolic stability) and liability filters. Classical guidelines (Rule-of-Five; polar surface area/rotatable bonds) remain useful first-pass heuristics, though modern practice couples them with model-based predictions and medicinal chemistry.

8. Future Directions and Role of AI in Drug Discovery

Multimethod pipelines (pharmacophore → docking → MD → free energy) support better decisions than any single technique. Pharmacophore queries (ligand- or structure-based) rapidly focus libraries on interaction hypotheses. Docking then proposes concrete poses consistent with the pharmacophore and receptor geometry. Short MD relaxations stabilize induced-fit states, identify conserved waters, and flag unstable poses; consensus contact/energy metrics prune false positives. Finally, relative binding free-energy calculations (e.g., FEP+) resolve tight structure–activity relationships within a chemotype, ranking analogs at the ~1 kcal·mol⁻¹ level to drive synthesis. Structural coverage is the linchpin of SBVS, and breakthroughs in structure prediction now “unlock” targets formerly inaccessible to docking. AlphaFold and related methods deliver atomic-level models that, after limited refinement, can seed docking/MD workflows when experimental structures are unavailable. In practice, teams assess multiple AlphaFold 2 (AF2) models or ensembles, apply binding-site refinements (e.g., side-chain repacking, MD sampling), and validate emergent poses against pharmacophore consistency and experimental structure–activity relationships. The introduction of AlphaFold 3 and diffusion-based modeling approaches marks a further major advance: unlike earlier versions, AlphaFold 3 can model multi-component biological systems, including ligand binding and macromolecular interactions, thereby bridging the gap between static structure prediction and functional molecular recognition [72]. However, challenges remain in achieving consistent accuracy for protein–ligand interactions, nucleic acid complexes, and post-translational modifications. Continued integration with molecular dynamics, docking, and experimental validation will be essential to fully realize the potential of these approaches in drug discovery workflows.
Integration is not only sequential but adaptive. AI-guided enrichment loops interleave with physics-based steps: an initial docking subset trains a model (e.g., Deep Docking) that triages the remaining library; top predictions are re-docked, MD-filtered, and funneled to synthesis. At larger scale, such iterations can be parallelized using VirtualFlow-style orchestrators across thousands of cores/GPUs, with cheminformatics clustering to maintain scaffold diversity. The result is higher hit rates and more chemotypes per compound synthesized than with conventional fixed-screen strategies.
Beyond conventional machine learning, recent advances have led to integrated artificial intelligence (AI)- and large language model (LLM)-driven platforms for drug discovery, including systems such as PharmAgents, FROGENT, LIDDiA, AgentD, and AutoBinder Agent [73,74]. These platforms aim to connect successive steps of the discovery process by combining generative modeling, cheminformatics, molecular design, and automated reasoning in end-to-end or closed-loop applications. With the help of LLMs, they can aid hypothesis generation, suggest new chemical structures, and refine candidate properties through iterative feedback, shortening the design–make–test cycle and promising gains in efficiency, scalability, and hit identification. Although promising, these platforms face several challenges. The interpretability of model decisions remains limited, which makes it difficult to rationalize proposed molecules or mechanisms. The quality and diversity of training data determine the reliability and generalizability of predictions, raising concerns about bias and robustness. Moreover, benchmarks for evaluating these platforms are not yet standardized, making objective comparison difficult. Above all, predictions from these systems must be confirmed by experimental validation, since in silico performance does not necessarily translate into biological activity. Thus, although LLM-based platforms are an important step toward autonomous drug discovery, they should be regarded as supplementary tools that complement, but do not replace, established computational and experimental strategies [75,76,77].
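The AI-guided enrichment loop described in this section (dock a subset, train a model, triage the rest, dock the top predictions, repeat) can be sketched as follows. This is a toy under strong assumptions: a nearest-neighbour lookup on a single made-up descriptor stands in for the learned model, the "docking" function is an invented smooth landscape, and all names are hypothetical. It is meant to show the control flow only and does not reproduce Deep Docking or any published implementation.

```python
import random


def active_learning_screen(library, dock, descriptor,
                           n_init=50, n_rounds=3, batch=50, seed=0):
    """Iteratively dock only the compounds a surrogate model ranks best."""
    rng = random.Random(seed)
    # Round 0: dock a random subset to obtain training data.
    docked = {c: dock(c) for c in rng.sample(library, n_init)}
    for _ in range(n_rounds):
        known = list(docked.items())

        def predict(c):
            # Surrogate: reuse the score of the most similar docked compound.
            x = descriptor(c)
            return min(known, key=lambda kv: abs(descriptor(kv[0]) - x))[1]

        # Rank the undocked pool by predicted score (lower is better).
        pool = sorted((c for c in library if c not in docked), key=predict)
        for c in pool[:batch]:
            docked[c] = dock(c)   # spend docking effort on the top predictions
    return sorted(docked.items(), key=lambda kv: kv[1])


dock_calls = {"n": 0}


def toy_descriptor(c):
    return c / 2000.0


def toy_dock(c):
    """Invented score landscape with its optimum near descriptor = 0.7."""
    dock_calls["n"] += 1
    return (toy_descriptor(c) - 0.7) ** 2


library = list(range(2000))
ranked = active_learning_screen(library, toy_dock, toy_descriptor)
print(f"docked {dock_calls['n']} of {len(library)} compounds")
print("best compound found:", ranked[0][0])
```

Only a tenth of this toy library is ever docked; the benefit in a real campaign depends on how well the surrogate generalizes, which is one reason the text stresses experimental validation and uncertainty estimates.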
Several studies demonstrate this synergistic effect. Ultra-large SBVS campaigns, combined with rapid make-on-demand synthesis of hits and iterative lead optimization, have reached the nanomolar range, showing that scale plus incremental improvement can yield first-in-class scaffolds. During medicinal chemistry cycles, FEP+ has prospectively guided analog selection and reduced the synthesis workload by focusing on the most promising substitutions. Limitations remain, however. Docking scores correlate poorly with affinity, and enrichment depends on the target and protein state. Consensus scoring, physics-based rescoring (MM/GBSA or FEP), and MD-derived stability metrics only partially close this gap (Table 5). In addition, VS results are highly dependent on the curation of protein structures, protonation/tautomer states, and assay descriptors. In line with community guidance, we recommend prioritizing careful curation of chemical/biological data, open workflows, and notebooks/containers that can be re-run for reproducibility and reuse.
The FAIR data principles also advocate for findable, accessible, interoperable, and reusable datasets and models. Popular enrichment benchmarks such as DUD-E catalyzed method development but also introduced analog/decoy biases that can inflate performance estimates; careful benchmark selection and “bias-controlled” tests are essential. When possible, prospective validation or blinded external sets should be used to verify generalization. As AI prioritizes larger fractions of libraries, interpretability (e.g., pharmacophore/interaction attributions), uncertainty quantification, and documentation of training data become central for decision-making, auditing, and regulatory dialogue. VS increasingly draws on integrated public/private data; teams should respect licenses, privacy constraints, and data-sharing policies while adhering to FAIR/traceability practices. Quantum algorithms promise advantages for electronic-structure problems central to binding and reactivity, potentially improving the fidelity of scoring and induced-fit modeling. Near-term devices (via VQE/ADAPT-VQE) and error-mitigated simulations target small active sites and fragments; for the medium term, hybrid quantum-classical workflows may refine docking poses or parameterize bespoke interactions that challenge classical force fields. Although timelines remain uncertain, the trajectory suggests domain-specific quantum acceleration will complement rather than replace classical VS/MD. Foundation models and reinforcement learning increasingly propose synthetically accessible, property-constrained molecules that meet multi-parameter objectives before docking/MD. Community benchmarks (GuacaMol, MOSES) have improved rigor in evaluating distribution learning, validity, novelty, and goal-directed optimization (Table 6). The emerging best practice is closed-loop design: generate→triage by QSAR/ADMET→dock/score→MD refine→pick for synthesis→feedback assay data to retrain. 
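Early-recognition performance on a benchmark is often summarized by the enrichment factor: the active rate in the top-ranked fraction divided by the active rate in the whole set. The sketch below computes it for an invented ranking (scores and activity labels are made up, and lower scores are assumed to be better). As the discussion of decoy bias above implies, a high value on a biased benchmark does not by itself demonstrate prospective performance.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor in the top `fraction` of a ranked list.

    scores: docking-style scores, lower = better.
    is_active: parallel list of booleans (known actives vs. decoys).
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: scores[i])   # best-scoring first
    hits_top = sum(is_active[i] for i in order[:n_top])
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("benchmark contains no actives")
    return (hits_top / n_top) / (total_actives / n)


# Invented benchmark: 100 compounds, 10 actives, 5 of them ranked in the top 10.
scores = list(range(100))
active_ranks = {0, 1, 2, 3, 4, 50, 51, 52, 53, 54}
is_active = [i in active_ranks for i in range(100)]
ef10 = enrichment_factor(scores, is_active, fraction=0.10)
print(f"EF at 10% = {ef10:.1f}")
```

A value of 1 corresponds to random selection, and the maximum attainable value is bounded by the ratio of library size to the number of actives, so enrichment factors are only comparable between benchmarks with similar composition.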
Integration of patient-derived omics, perturbation signatures (e.g., the next-generation Connectivity Map), and target structures will enable individualized VS campaigns: predict vulnerabilities from transcriptomic/proteomic profiles, prioritize compounds with matched mechanisms, and validate in patient-derived models (Figure 5). Computational scaling and causal modeling will be key to translating this promise to the clinic. Routine near-atomic cryo-EM now resolves dynamic states and ligandable cryptic pockets for previously intractable targets. Combining cryo-EM ensembles with MD-derived conformational sampling supports ensemble docking and pocket-state-aware VS, especially for allosteric and membrane proteins. The future of VS is elastic: orchestrators that spin up tens of thousands of CPU/GPU cores on demand, autoscale storage and databases, and couple active learning with make-on-demand synthesis vendors. Platforms such as VirtualFlow already demonstrate billion-scale SBVS with modular pre-/post-filters, while integration with enterprise cheminformatics ensures traceable data lineage from screen to clinic.

9. Conclusions

Recent advances across structure-based and ligand-based approaches have reshaped small-molecule discovery. Docking engines sampled poses more efficiently and, when coupled with consensus or physics-grounded rescoring and explicit treatment of waters, delivered stronger early enrichment. Pharmacophore modeling progressed from geometric templates to energy-weighted and ensemble hypotheses that captured receptor plasticity and enabled credible scaffold hopping. Molecular dynamics routinely reached microsecond scales on accessible hardware, revealed cryptic pockets, quantified water networks, and supported endpoint and alchemical free-energy calculations suitable for prospective decisions. Ultra-large virtual screening became practical through curated make-on-demand libraries, elastic computing, and reliable workflow automation, allowing campaigns across hundreds of millions of candidates. High-accuracy structure prediction broadened the tractable target set and, after limited refinement, supported routine structure-based campaigns. The greatest gains arose when methods were combined rather than used alone. Libraries shaped by medicinal-chemistry rules and pharmacophore filters were docked at scale, then triaged by contact patterns, strain energy, synthetic tractability, and diversity. Short dynamics runs stabilized promising poses, discarded strained ones, and supplied trajectory-derived pharmacophores. For closely related analogs, rigorous free-energy calculations ranked substitutions at decision-making resolution and reduced avoidable synthesis. Throughout, biophysical confirmation, structural follow-up, and cellular assays closed the loop from computation to experiment. This integrated cycle shortened design–make–test timelines, increased hit novelty, and improved the credibility of advancement decisions. Sustained progress depended on standardization, transparency, and collaboration.
Preparation protocols, protonation and tautomer rules, and threshold criteria were documented, versioned, and shared. Benchmarks addressed bias and emphasized prospective validation with metrics for early recognition and scaffold novelty. Datasets, models, and code followed FAIR principles to enable reuse and independent checks. Finally, partnerships across academia, industry, and computing providers, aligned around open libraries, make-on-demand synthesis, and orthogonal assay panels, turned scalable computation into more reproducible drug discovery with greater efficiency and lower cost.

Author Contributions

Conceptualization, A.A.; methodology, A.A.; validation, S.M.A.; formal analysis, A.A. and S.M.A.; investigation, A.A.; writing—original draft preparation, A.A.; writing—review and editing, A.A. and S.M.A.; visualization, S.M.A.; supervision, A.A.; project administration, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Prince Sattam bin Abdulaziz University research project number (PSAU/2025/03/36618).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable.

Acknowledgments

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/03/36618).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nagpure, N.; Raut, H.; Kamble, S.; Rasala, T. Redefining Preclinical Research Paradigms: AI-Driven Drug Discovery as a Transformative Approach to Accelerate Innovation, Improve Predictive Accuracy, and Reduce Reliance on Animal Testing. J. Drug Deliv. Ther. 2025, 15, 115–128.
  2. Pathak, S.; Kushwaha, S.P.; Verma, S.; Deep, P.; Gupta, D. Advances in Computational Chemistry for Drug Discovery. J. Drug Discov. Health Sci. 2024, 1, 146–152.
  3. Serrano-Morrás, Á.; Bertran-Mostazo, A.; Miñarro-Lleonar, M.; Comajuncosa-Creus, A.; Cabello, A.; Labranya, C.; Escudero, C.; Tian, T.V.; Khutorianska, I.; Radchenko, D.S.; et al. A bottom-up approach to find lead compounds in expansive chemical spaces. Commun. Chem. 2025, 8, 225.
  4. Corrêa Veríssimo, G.; Salgado Ferreira, R.; Gonçalves Maltarollo, V. Ultra-Large Virtual Screening: Definition, Recent Advances, and Challenges in Drug Design. Mol. Inform. 2025, 44, e202400305.
  5. Veríssimo, R.F.; Matias, P.H.F.; Barbosa, M.R.; Neto, F.O.S.; Neto, B.A.D.; de Oliveira, H.C.B. Integrating Machine Learning and SHAP Analysis to Advance the Rational Design of Benzothiadiazole Derivatives with Tailored Photophysical Properties. J. Chem. Inf. Model. 2025, 65, 7874–7886.
  6. Bhagat, R.T.; Butle, S.R.; Khobragade, D.S.; Wankhede, S.B.; Prasad, C.C.; Mahure, D.S.; Armarkar, A.V. Molecular Docking in Drug Discovery. J. Pharm. Res. Int. 2021, 33, 46–58.
  7. Khattri, R.B.; Morris, D.L.; Bilinovich, S.M.; Manandhar, E.; Napper, K.R.; Sweet, J.W.; Modarelli, D.A.; Leeper, T.C. Identifying Ortholog Selective Fragment Molecules for Bacterial Glutaredoxins by NMR and Affinity Enhancement by Modification with an Acrylamide Warhead. Molecules 2019, 25, 147.
  8. Awoonor-Williams, E.; Dickson, C.J.; Furet, P.; Golosov, A.A.; Hornak, V. Leveraging Advanced In Silico Techniques in Early Drug Discovery: A Study of Potent Small-Molecule YAP-TEAD PPI Disruptors. J. Chem. Inf. Model. 2023, 63, 2520–2531.
  9. Korb, O.; Finn, P.W.; Jones, G. The cloud and other new computational methods to improve molecular modelling. Expert Opin. Drug Discov. 2014, 9, 1121–1131.
  10. Zhou, G.; Rusnac, D.V.; Park, H.; Canzani, D.; Nguyen, H.M.; Stewart, L.; Bush, M.F.; Nguyen, P.T.; Wulff, H.; Yarov-Yarovoy, V.; et al. An artificial intelligence accelerated virtual screening platform for drug discovery. Nat. Commun. 2024, 15, 7761.
  11. Jiang, J.; Wang, G.; Li, D.; Hayes, N.; Jones, B.; Shi, Y.; Qiu, H.; Zhang, B.; Zhou, T.; Wei, G.W. Unexpected Applications of AlphaFold in Molecular Sciences. Annu. Rev. Biochem. 2026. ahead of print.
  12. Zhao, N.; Wu, T.; Wang, W.; Zhang, L.; Gong, X. Review and comparative analysis of methods and advancements in predicting protein complex structure. Interdiscip. Sci. Comput. Life Sci. 2024, 16, 261–288.
  13. Blake, J.; Laird, E. Chapter 30. Recent advances in virtual ligand screening. Annu. Rep. Med. Chem. 2003, 38, 305–314.
  14. Kumar, S.; Kumar, Y. Innovations in molecular docking: A detailed analysis of methodological developments and their applications in drug discovery. Int. J. Pharma Prof. Res. 2024, 15, 52–67.
  15. Cele, F.N.; Ramesh, M.; Soliman, M.E. Per-residue energy decomposition pharmacophore model to enhance virtual screening in drug discovery: A study for identification of reverse transcriptase inhibitors as potential anti-HIV agents. Drug Des. Dev. Ther. 2016, 10, 1365–1377.
  16. Joseph-McCarthy, D.; Thomas, B.E.; Belmarsh, M.; Moustakas, D.; Alvarez, J.C. Pharmacophore-based molecular docking to account for ligand flexibility. Proteins 2003, 51, 172–188.
  17. Mahrous, R.S.; Fathy, H.M.; Abu EL-Khair, R.M.; Omar, A.A.; Ibrahim, R.S. Molecular docking and pharmacophore modelling; A bridged explanation with emphasis on validation. J. Adv. Pharm. Sci. 2024, 1, 138–152.
  18. Jiménez-Luna, J.; Grisoni, F.; Weskamp, N.; Schneider, G. Artificial intelligence in drug discovery: Recent advances and future perspectives. Expert Opin. Drug Discov. 2021, 16, 949–959.
  19. Frank, Y.; Unger, R.; Senderowitz, H. Statistical analysis of sequential motifs at biologically relevant protein-protein interfaces. Comput. Struct. Biotechnol. J. 2024, 23, 1244–1259.
  20. Sadybekov, A.V.; Katritch, V. Computational approaches streamlining drug discovery. Nature 2023, 616, 673–685.
  21. Mihai, D.P.; Nitulescu, G.M. Computer-aided drug design and drug discovery. Pharmaceuticals 2025, 18, 436.
  22. Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Repasky, M.P.; Knoll, E.H.; Shelley, M.; Perry, J.K.; et al. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749.
  23. Hollingsworth, S.A.; Dror, R.O. Molecular Dynamics Simulation for All. Neuron 2018, 99, 1129–1143.
  24. Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898.
  25. Friesner, R.A.; Murphy, R.B.; Zhang, Y.; Xiong, Y.; Devlaminck, P.A.; Tubert-Brohman, I.; Jerome, S.V. Glide WS: Methodology and Initial Assessment of Performance for Docking Accuracy and Virtual Screening. J. Chem. Theory Comput. 2025, 21, 12696–12708.
  26. Garbagnoli, M.; Linciano, P.; Listro, R.; Rossino, G.; Vasile, F.; Collina, S. Biophysical Assays for Investigating Modulators of Macromolecular Complexes: An Overview. ACS Omega 2024, 9, 17691–17705.
  27. Dixon, S.L.; Smondyrev, A.M.; Knoll, E.H.; Rao, S.N.; Shaw, D.E.; Friesner, R.A. PHASE: A new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J. Comput. Aided Mol. Des. 2006, 20, 647–671.
  28. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2001, 46, 3–26.
  29. Ewing, T.J.; Makino, S.; Skillman, A.G.; Kuntz, I.D. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 2001, 15, 411–428.
  30. Zhong, S.; Zhang, Y.; Xiu, Z. Rescoring ligand docking poses. Curr. Opin. Drug Discov. Dev. 2010, 13, 326–334.
  31. Labute, P. LowModeMD—Implicit low-mode velocity filtering applied to conformational search of macrocycles and protein loops. J. Chem. Inf. Model. 2010, 50, 792–800.
  32. Klimenko, K. Computer-Aided Drug Design of Broad-Spectrum Antiviral Compounds. Ph.D. Thesis, Institut AV Bogatsky de Chimie Physique, Université de Strasbourg, Odessa, Ukraine, 2017.
  33. Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B.L.; Grubmüller, H.; MacKerell, A.D., Jr. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73.
  34. Maier, J.A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K.E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713.
  35. Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J.M.; Lu, C.; Dahlgren, M.K.; Mondal, S.; Chen, W.; Wang, L.; et al. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput. 2019, 15, 1863–1874.
  36. Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20-A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073.
  37. Mysinger, M.M.; Carchia, M.; Irwin, J.J.; Shoichet, B.K. Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking. J. Med. Chem. 2012, 55, 6582–6594.
  38. Venkatraman, V.; Colligan, T.H.; Lesica, G.T.; Olson, D.R.; Gaiser, J.; Copeland, C.J.; Wheeler, T.J.; Roy, A. Drugsniffer: An open source workflow for virtually screening billions of molecules for binding affinity to protein targets. Front. Pharmacol. 2022, 13, 874746.
  39. Gorgulla, C.; Boeszoermenyi, A.; Wang, Z.-F.; Fischer, P.D.; Coote, P.W.; Padmanabha Das, K.M.; Malets, Y.S.; Radchenko, D.S.; Moroz, Y.S.; Scott, D.A. An open-source drug discovery platform enables ultra-large virtual screens. Nature 2020, 580, 663–668.
  40. Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A.P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L.J.; Cibrián-Uhalte, E.; et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017, 45, D945–D954.
  41. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  42. Lyskov, S.; Gray, J.J. The RosettaDock server for local protein-protein docking. Nucleic Acids Res. 2008, 36, W233–W238.
  43. Amaro, R.E.; Baudry, J.; Chodera, J.; Demir, Ö.; McCammon, J.A.; Miao, Y.; Smith, J.C. Ensemble Docking in Drug Discovery. Biophys. J. 2018, 114, 2271–2278.
  44. McNutt, A.T.; Francoeur, P.; Aggarwal, R.; Masuda, T.; Meli, R.; Ragoza, M.; Sunseri, J.; Koes, D.R. GNINA 1.0: Molecular docking with deep learning. J. Cheminform. 2021, 13, 43.
  45. Cavasotto, C.N.; Adler, N.S.; Aucar, M.G. Quantum Chemical Approaches in Structure-Based Virtual Screening and Lead Optimization. Front. Chem. 2018, 6, 188.
  46. Teng, H.; Wang, R.; Shen, Y.; Yuan, Y.; Kingsford, C. DTMol: Pocket-based molecular docking using diffusion transformers. bioRxiv 2025.
  47. Shaker, B.; Barakat, K. Harnessing Deep Learning and Generative AI for Molecular Docking Simulations: Tools, Challenges, and Future Directions. In Molecular Docking in Biomedical Engineering and Computational Chemistry; IntechOpen: London, UK, 2025.
  48. Jin, Z.; Du, X.; Xu, Y.; Deng, Y.; Liu, M.; Zhao, Y.; Zhang, B.; Li, X.; Zhang, L.; Peng, C.; et al. Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors. Nature 2020, 582, 289–293.
  49. Manglik, A.; Lin, H.; Aryal, D.K.; McCorvy, J.D.; Dengler, D.; Corder, G.; Levit, A.; Kling, R.C.; Bernat, V.; Hübner, H.; et al. Structure-based discovery of opioid analgesics with reduced side effects. Nature 2016, 537, 185–190.
  50. IUPAC. Pharmacophore, 5.0.0 ed.; IUPAC, Ed.; International Union of Pure and Applied Chemistry (IUPAC): Research Triangle Park, NC, USA, 2025.
  51. Imrie, F.; Hadfield, T.E.; Bradley, A.R.; Deane, C.M. Deep generative design with 3D pharmacophoric constraints. Chem. Sci. 2021, 12, 14577–14589.
  52. Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461.
  53. Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; Taylor, R.D. Improved protein–ligand docking using GOLD. Proteins Struct. Funct. Bioinform. 2003, 52, 609–623.
  54. Wolber, G.; Langer, T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 2005, 45, 160–169.
  55. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380.
  56. Ruddigkeit, L.; van Deursen, R.; Blum, L.C.; Reymond, J.L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875.
  57. Chen, L.; Cruz, A.; Ramsey, S.; Dickson, C.J.; Duca, J.S.; Hornak, V.; Koes, D.R.; Kurtzman, T. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS ONE 2019, 14, e0220113.
  58. Brown, N.; Fiscato, M.; Segler, M.H.S.; Vaucher, A.C. GuacaMol: Benchmarking Models for de Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108.
  59. Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front. Pharmacol. 2020, 11, 565644.
  60. Hamelberg, D.; Mongan, J.; McCammon, J.A. Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004, 120, 11919–11929.
  61. Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics Poisson-Boltzmann Surface Area Method. Mol. Inform. 2012, 31, 114–122.
  62. Chodera, J.D.; Noé, F. Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135–144.
  63. Mardt, A.; Pasquali, L.; Wu, H.; Noé, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 2018, 9, 5.
  64. Lyu, J.; Wang, S.; Balius, T.E.; Singh, I.; Levit, A.; Moroz, Y.S.; O’Meara, M.J.; Che, T.; Algaa, E.; Tolmachova, K.; et al. Ultra-large library docking for discovering new chemotypes. Nature 2019, 566, 224–229.
  65. Ain, Q.U.; Aleksandrova, A.; Roessler, F.D.; Ballester, P.J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2015, 5, 405–424. [Google Scholar] [CrossRef]
  66. Chen, P.; Ke, Y.; Lu, Y.; Du, Y.; Li, J.; Yan, H.; Zhao, H.; Zhou, Y.; Yang, Y. DLIGAND2: An improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J. Cheminform. 2019, 11, 52. [Google Scholar] [CrossRef]
  67. Li, H.; Sze, K.H.; Lu, G.; Ballester, P.J. Machine-learning scoring functions for structure-based drug lead optimization. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020, 10, e1465.
  68. Shirts, M.R.; Mobley, D.L.; Brown, S.P. Free-energy calculations in structure-based drug design. In Drug Design: Structure- and Ligand-Based Approaches; Cambridge University Press: Cambridge, UK, 2010; Volume 1, pp. 61–86.
  69. Horn, H.W.; Swope, W.C.; Pitera, J.W.; Madura, J.D.; Dick, T.J.; Hura, G.L.; Head-Gordon, T. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120, 9665–9678.
  70. Sethi, A.; Agrawal, N.; Brezovsky, J. Impact of water models on the structure and dynamics of enzyme tunnels. Comput. Struct. Biotechnol. J. 2024, 23, 3946–3954.
  71. Sun, Q.; Li, Y.J.; Ning, S.B. Investigating the molecular mechanisms underlying the co-occurrence of Parkinson’s disease and inflammatory bowel disease through the integration of multiple datasets. Sci. Rep. 2024, 14, 17028.
  72. Desai, D.; Kantliwala, S.V.; Vybhavi, J.; Ravi, R.; Patel, H.; Patel, J. Review of AlphaFold 3: Transformative advances in drug design and therapeutics. Cureus 2024, 16, e63646.
  73. Mak, K.-K.; Wong, Y.-H.; Pichika, M.R. Artificial intelligence in drug discovery and development. In Drug Discovery and Evaluation: Safety and Pharmacokinetic Assays; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1461–1498.
  74. Zheng, Y.; Koh, H.Y.; Ju, J.; Yang, M.; May, L.T.; Webb, G.I.; Li, L.; Pan, S.; Church, G. Large language models for drug discovery and development. Patterns 2025, 6, 101346.
  75. Averly, R.; Baker, F.N.; Watson, I.A.; Ning, X. LIDDIA: Language-based Intelligent Drug Discovery Agent. Proc. Conf. Empir. Methods Nat. Lang. Process. 2025, 2025, 12015–12039.
  76. Ge, F.; Zhu, J.; Zhang, L.; Xiao, H.; Bao, X.; Xie, F.; Chen, D.; Lu, Y.; Wang, Y.; Guan, Z. AutoBinder Agent: An MCP-Based Agent for End-to-End Protein Binder Design. arXiv 2026, arXiv:2602.00019.
  77. Pan, Q.; Xu, D.; Yao, J.X.; Ma, L.; Zhu, Z.; Ji, J. Frogent: An end-to-end full-process drug design agent. arXiv 2025, arXiv:2508.10760.
  78. Kiss, L. Development of Novel Assays and Application of Innovative Screening Approaches for Improving Hit Discovery Efficiency of G Protein-Coupled Receptor Targets. Ph.D. Dissertation, Semmelweis University, Budapest, Hungary, 2021.
  79. Velazquez-Campoy, A.; Claro, B.; Abian, O.; Höring, J.; Bourlon, L.; Claveria-Gimeno, R.; Ennifar, E.; England, P.; Chaires, J.B.; Wu, D.; et al. A multi-laboratory benchmark study of isothermal titration calorimetry (ITC) using Ca(2+) and Mg(2+) binding to EDTA. Eur. Biophys. J. 2021, 50, 429–451.
  80. Wienken, C.J.; Baaske, P.; Rothbauer, U.; Braun, D.; Duhr, S. Protein-binding assays in biological liquids using microscale thermophoresis. Nat. Commun. 2010, 1, 100.
  81. Gucwa, M.; Bijak, V.; Zheng, H.; Murzyn, K.; Minor, W. CheckMyMetal (CMM): Validating metal-binding sites in X-ray and cryo-EM data. IUCrJ 2024, 11, 871–877.
  82. Roskoski, R., Jr. Rule of five violations among the FDA-approved small molecule protein kinase inhibitors. Pharmacol. Res. 2023, 191, 106774.
  83. Reese, T.C.; Devineni, A.; Smith, T.; Lalami, I.; Ahn, J.M.; Raj, G.V. Evaluating physiochemical properties of FDA-approved orally administered drugs. Expert Opin. Drug Discov. 2024, 19, 225–238.
  84. Hornberger, K.R.; Araujo, E.M.V. Physicochemical Property Determinants of Oral Absorption for PROTAC Protein Degraders. J. Med. Chem. 2023, 66, 8281–8287.
  85. O’Donovan, D.H.; De Fusco, C.; Kuhnke, L.; Reichel, A. Trends in Molecular Properties, Bioavailability, and Permeability across the Bayer Compound Collection. J. Med. Chem. 2023, 66, 2347–2360.
  86. Baell, J.B.; Holloway, G.A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 2010, 53, 2719–2740.
  87. Brenk, R.; Schipani, A.; James, D.; Krasowski, A.; Gilbert, I.H.; Frearson, J.; Wyatt, P.G. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 2008, 3, 435–444.
  88. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
  89. Fourches, D.; Muratov, E.; Tropsha, A. Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation. J. Chem. Inf. Model. 2016, 56, 1243–1252.
  90. Scantlebury, J.; Vost, L.; Carbery, A.; Hadfield, T.E.; Turnbull, O.M.; Brown, N.; Chenthamarakshan, V.; Das, P.; Grosjean, H.; von Delft, F.; et al. A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening. J. Chem. Inf. Model. 2023, 63, 2960–2974.
  91. Chen, L.; Blay, V.; Ballester, P.J.; Houston, D.R. SCORCH2: A Generalized Heterogeneous Consensus Model for High-Enrichment Interaction-Based Virtual Screening. Adv. Sci. 2025, 12, e08318.
  92. Durant, G.; Boyles, F.; Birchall, K.; Marsden, B.; Deane, C.M. Robustly interrogating machine learning-based scoring functions: What are they learning? Bioinformatics 2025, 41, btaf040.
Figure 1. Workflow from target identification to lead optimization. Data resources feed docking, pharmacophore modeling, and virtual screening. Integration and experimental workflows provide feedback, enabling ultra-large library search and prioritization of chemotypes.
Figure 2. Framework linking data repositories to docking, pharmacophore modeling, and molecular dynamics, supported by experimental workflows. Panels summarize software, advances, and limitations; integration feedback highlights key applications in antiviral, oncology, and CNS drug discovery.
Figure 3. Comprehensive schematic linking data sources to pharmacophore modeling, molecular dynamics, virtual screening, and docking, plus validation metrics. Arrows depict integration feedback, with applications and limitations: scoring, flexibility, waters, metals, uncertainty.
Figure 4. Data sources, docking, pharmacophore modeling, and virtual screening with representative software. Lower panels present advances, applications, and limitations; laboratory photographs illustrate assay context alongside computational outputs.
Figure 5. Approaches, libraries, tools, breakthroughs, validation, integration, challenges, and future directions. Boxes, arrows, and photographs summarize workflows from library design and docking to assay confirmation, along with the considerations that enable discovery.
Table 1. Core computational approaches and where they fit in drug discovery programs.
| Modality | Primary Goal | Typical Inputs | Key Outputs | Strengths | Main Limitations | Representative Tools | Citation(s) |
|---|---|---|---|---|---|---|---|
| Molecular docking | Predict binding mode and rank candidates | Prepared protein (X-ray/cryo-EM/AF2), curated ligands, microstates | Poses, interaction maps, docking scores | Fast triage; structure-aware hypotheses | Scoring/solvation approximations; limited receptor flexibility | AutoDock Vina, Glide, GOLD, DOCK | [22,29,30] |
| Pharmacophore modeling | Capture essential 3D features for activity | Active ligands and/or protein–ligand complexes | Feature hypotheses; 3D queries | Scaffold hopping; ultra-fast pre-filtering | Overfitting risk; conformer/feature quality sensitive | PHASE, LigandScout, MOE, PharmaGist | [27,31,32] |
| Molecular dynamics (MD) | Probe flexibility, water networks, pose stability | Protein–ligand complex, force field, solvent/ions | Trajectories, RMSD/RMSF, H-bond/water analyses; MM-GBSA/FEP | Mechanistic insight; validates poses; supports ΔΔG | Sampling cost; FF/setup sensitivity | CHARMM36m, AMBER ff14SB, OPLS3e | [23,33,34,35] |
| Virtual screening (VS) | Prioritize hits from ultra-large libraries | ZINC/ChEMBL/PubChem/GDB; docking/LBVS filters | Ranked shortlists; clustered chemotypes | Scales to 10⁸–10⁹; cloud/HPC ready | Benchmark bias; hit confirmation required | VirtualFlow; LBVS + SBVS pipelines | [36,37,38,39] |
Table 2. Software, platforms, and data resources used across workflows.
| Category | Tool/Resource | License | Notable Capabilities | Typical Scale | Citation |
|---|---|---|---|---|---|
| Docking engine | AutoDock Vina | Open source | Stochastic search; empirical scoring; multithreaded | 10⁵–10⁷ | [52] |
| Docking engine | Glide (HTVS/SP/XP) | Commercial | Hierarchical filters; pose refinement; XP scoring | 10⁵–10⁷ | [22] |
| Docking engine | GOLD | Commercial | GA search; side-chain flexibility; constraints | 10⁵–10⁶ | [53] |
| Docking engine | DOCK | Open source | Shape/grid; anchor-and-grow; modular | 10⁵–10⁷ | [29] |
| Protein–protein docking | RosettaDock (server) | Academic | Rigid-body + side-chain optimization; web access | Interface sets | [42] |
| Pharmacophore | PHASE | Commercial | Common-pharmacophore ID; 3D-QSAR | 10⁶–10⁸ (filter) | [27] |
| Pharmacophore | LigandScout | Commercial/academic | Feature extraction from complexes; 3D queries | 10⁶–10⁸ | [54] |
| Suite | MOE (+ LowModeMD) | Commercial | Conformers; pocket/feature tools; induced-fit aids | Project-scale | [31] |
| Orchestrator (VS) | VirtualFlow | Open source | Billion-scale docking; engine-agnostic; cloud/HPC | 10⁸–10⁹+ | [39] |
| Library | ZINC20/22 | Free | Purchasable make-on-demand; ready-to-dock | 10⁹-class | [36] |
| Library | ChEMBL | Free | Assay-annotated activities; targets | Millions of activities | [40] |
| Library | PubChem | Free | Open compounds and bioassays | 100M+ compounds | [55] |
| Enumerated universe | GDB-17 | Free | 166B enumerated molecules | 10¹¹+ | [56] |
| Structures | Protein Data Bank (PDB) | Free | Standardized macromolecular structures | 200k+ entries | [24,25] |
| Benchmark | DUD-E | Free | Actives/decoys for many targets | Benchmark sets | [37] |
| Benchmark critique | DUD-E bias analysis | Free | Analog/decoy bias diagnostics | Benchmark audit | [57] |
| Structure prediction | AlphaFold | Free for models | Near-experimental single-chain models | Proteome-scale | [41] |
| ML benchmarks | GuacaMol/MOSES | Open source | Generative design benchmarking | Model-dependent | [58,59] |
Table 3. Drugs delivered with computational pipelines (docking/pharmacophore/MD/VS).
| Drug (Year) | Target/Indication | Primary Computational Technique(s) Credited | What the Pipeline Contributed (Very Brief) |
|---|---|---|---|
| Captopril (1981) | ACE/hypertension | Early structure-based design guided by carboxypeptidase-A models | Transition-state/active-site modeling yielded thiol-bearing inhibitors with oral activity |
| Zanamivir (1999) | Influenza neuraminidase/influenza | SBDD + docking on NA crystal structures | Rational modifications to sialic-acid scaffold to exploit NA catalytic site; first-in-class NA inhibitor |
| Saquinavir (1995) | HIV-1 protease/HIV | SBDD from protease–peptide complexes | Transition-state mimic designed from active-site geometry; launched first HIV PI |
| Indinavir (1996) | HIV-1 protease/HIV | SBDD with iterative docking/optimization | Optimized P1/P2 groups for S1/S2 subsites; improved oral PK |
| Ritonavir (1996) | HIV-1 protease/HIV | SBDD (crystallography-guided) | Potent PI that became PK booster after metabolic insights |
| Dorzolamide (1995) | Carbonic anhydrase II/glaucoma | SBDD | Active-site geometry (Zn²⁺ coordination) guided sulfonamide design |
| Tirofiban (1998) | Integrin αIIbβ3/ACS | SBDD/mimetic design from RGD-ligand structures | Crystal structures with tirofiban/eptifibatide informed small-molecule antagonist design |
| Aliskiren (2007) | Renin/hypertension | SBDD + modeling on renin structures | Non-peptidic scaffold engineered for S1/S3/S1′ pockets; oral renin inhibitor |
| Boceprevir (2011) | HCV NS3 protease/HCV | SBDD | Warhead and P1/P2 optimization for covalent reversible inhibition |
| Rivaroxaban (2011) | Factor Xa/anticoagulation | SBDD + crystallography | Structure-guided optimization and FXa co-crystal analysis supported binding-mode tuning |
| Baloxavir marboxil (2018) | Influenza cap-dependent endonuclease/influenza | SBDD on PA endonuclease | Metal-chelation pharmacophore and pocket mapping drove first-in-class CEN inhibitor |
Table 4. Scoring families, force fields, and free-energy methods (what they model and when to use).
| Domain | Method/Model | What It Modeled/Optimized | Strengths | Limitations | Typical Use | Citation |
|---|---|---|---|---|---|---|
| Docking scoring (empirical) | GlideScore; AutoDock4/Vina | vdW, H-bonding, desolvation terms fit to data | Fast; good enrichment | Transferability limits | Primary SBVS ranking | [22] |
| Docking scoring (knowledge-based) | Statistical PMF/X-Score (generic) | Statistical atom–atom potentials | Simple; robust | Coarse physics | Complementary rescoring | [57,65,66] |
| Docking scoring (ML) | RF-Score; NNScore; GNINA CNN | Data-learned pose/affinity from structures | Captures nonlinearity; strong top-N | Needs curated data; bias risk | Rescoring top poses | [44] |
| Physics-like ranking | MM-PB/GBSA | Continuum solvation + force field | Interpretable; fast triage | Dielectric/entropy sensitive | Post-docking triage | [61] |
| Alchemical ΔΔG | FEP (RBFE) | Relative binding free energy | ~1 kcal·mol⁻¹ resolution | Setup/sampling cost | Lead optimization | [67] |
| Alchemical ΔG analysis | Thermodynamic integration | Gradient-based alchemy (λ windows) | Rigorous; general | Complex setup | Tight structure–activity relationship (SAR) decisions | [68] |
| Force field (proteins) | CHARMM36m | Folded and IDP proteins | Balanced backbone/IDP | χ issues for some residues | General MD | [33] |
| Force field (proteins) | AMBER ff14SB | Proteins | Updated side-chain/backbone | Needs matched ligand params | General MD | [34] |
| Force field (proteins/ligands) | OPLS3e | Drug-like ligands + proteins | Broad ligand coverage | Licensed | Lead optimization MD | [35] |
| Water models | TIP3P; TIP4P-Ew | Solvent representation | Standardized hydration | Model-specific limits | Routine MD | [69,70,71] |
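To put the ~1 kcal·mol⁻¹ resolution listed for FEP in Table 4 into affinity terms, the short Python sketch below converts a free-energy difference into a fold change in dissociation constant using the standard relation ΔG = RT ln K_D. The function names, the 298.15 K default temperature, and the 1 M standard state are illustrative assumptions and are not taken from any of the cited workflows.

```python
import math

# Gas constant in kcal/(mol*K); the temperature default below is an assumption.
R_KCAL = 1.987204e-3


def dg_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from K_D in mol/L, via dG = RT ln(K_D / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def fold_change_in_kd(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio K_D(B) / K_D(A) implied by ddG = dG(B) - dG(A)."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # A 1 kcal/mol difference (or error) expressed as a K_D ratio.
    print(f"1 kcal/mol ~ {fold_change_in_kd(1.0):.2f}-fold change in K_D")
    # A hypothetical 1 nM binder expressed as a free energy.
    print(f"K_D = 1 nM ~ {dg_from_kd(1e-9):.2f} kcal/mol")
```

Under these assumptions, a 1 kcal·mol⁻¹ uncertainty corresponds to roughly a fivefold uncertainty in K_D. This is one way to see why relative free-energy methods are usually reserved for ranking close analogs during lead optimization rather than for primary triage.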
Table 5. Future directions and role of AI in drug discovery.
| Direction | What AI/Tech Adds | Concrete Example(s) | Expected Impact |
|---|---|---|---|
| Ultra-large virtual screening (10⁸–10⁹+ molecules) | Cloud/HPC orchestration; adaptive scheduling | VirtualFlow enables billion-scale SBVS, modular docking stacks | Orders-of-magnitude expansion of search space; more novel chemotypes |
| AI-guided triage for VS | Learn from sparse docking to skip most of the library | Deep Docking cuts compute by ~50×; consensus/pose filters downstream | Same hit rate at fraction of cost/time |
| DL-rescoring and pose selection | CNN/GDL models refine ranks/poses post-docking | GNINA family improves top-n pose accuracy | Better early enrichment; fewer false positives |
| Structural coverage via AI | High-accuracy protein structures where experimental structures are lacking | AlphaFold proteome-scale structures | “Unlocks” SBDD for previously intractable targets |
| Bias-aware benchmarking | Detect spurious dataset signals in training/validation | DUD-E bias analysis cautions DL claims | More reliable, reproducible VS metrics |
| Omics-driven personalization | Match compounds to patient/pathway signatures | CMap/LINCS L1000 profiles (1.3M signatures) | Indication selection, MoA inference, repurposing |
| Cryo-EM + MD + ensemble docking | Multi-state targets and cryptic pockets | State-aware docking/MD on EM ensembles | Allosteric/drugging “undruggables” |
| Foundation models and generative design | Rapid de novo ideas under multi-param constraints | GuacaMol/MOSES benchmarks standardize eval | Tighter design-make-test loops |
Table 6. Validation metrics, assays, library filters, and practical mitigations.
| Stage | Item/Metric/Assay | Measures/Goal | Practical Notes/Action | When to Use | Citation |
|---|---|---|---|---|---|
| Benchmarking | EF1%, EF5%, BEDROC, ROC-AUC | Early recognition and global ranking | Emphasize EF/BEDROC for top-fraction testing | Retrospective method evaluation | [37] |
| Benchmark audit | DUD-E bias checks | Analog/decoy bias detection | Use bias-controlled splits; external sets | Before claiming generalization | [57] |
| Orthogonal confirmation | SPR | k_on, k_off, K_D | Control surface artifacts; kinetics insight | Hit/lead confirmation | [78] |
| Orthogonal confirmation | ITC | ΔH, ΔS, K_D, stoichiometry | Thermodynamics; gold standard | Characterize prioritized hits | [79] |
| Orthogonal confirmation | MST | K_D in solution | Low sample; buffer-flexible | Cross-validate binding | [80] |
| Mode validation | X-ray/cryo-EM | Bound structure and binding mode | Deposit to PDB for reuse | Structural follow-up | [81] |
| Library filter | Rule-of-Five | Oral developability | Use as soft gate with medicinal review | Pre-screening | [82,83] |
| Library filter | Veber criteria | Permeability/bioavailability | PSA/rotor thresholds | Pre-screening | [84,85] |
| Library filter | PAINS | Remove assay-interference chemotypes | Avoid over-filtering true actives | Library assembly | [86] |
| Library filter | Brenk alerts | Remove problematic fragments | Combine with expert review | Library assembly | [87] |
| Data stewardship | FAIR principles | Reproducibility and reuse | Version inputs; share code/models | All stages | [88] |
| Curation | Chemogenomics checklists | Correct labels/states/microstates | Scripted, versioned prep | Before modeling/VS | [89] |
| Common pitfall | Mis-protonated residues/ligands | Spurious contacts/energies | Enumerate microstates; pKa review | Prep and docking/MD | [89] |
| Common pitfall | Ignored conserved waters | Missed bridges/affinity | Retain/map waters; test displacement | Docking triage | [23] |
| Common pitfall | Metal coordination errors | Unrealistic poses/instability | Add constraints; spot-check with QM/MM | Targets with metals | [61] |
| Common pitfall | Overfitting benchmarks | Inflated enrichment | Prospective tests; external sets | Method claims | [90,91,92] |
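The benchmarking metrics in the first row of Table 6 can be illustrated with a minimal Python sketch. It computes the enrichment factor at a chosen top fraction (for example EF1%) and ROC-AUC from a ranked list of docking scores and active/decoy labels. The function names, the `higher_is_better` flag, and the toy data are assumptions made for illustration; established cheminformatics or statistics packages would normally be used in practice, and BEDROC is omitted here for brevity.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=False):
    """EF at a top fraction: hit rate in the top slice divided by the overall hit rate.

    labels are 1 for actives and 0 for decoys. Docking scores are often
    "lower is better", hence the default for higher_is_better.
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    # Rank compound indices from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    n_top = max(1, round(fraction * n))
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (n_actives / n)


def roc_auc(scores, labels, higher_is_better=False):
    """ROC-AUC as the probability that a random active outranks a random decoy.

    Ties count as 0.5. This pairwise form is O(actives * decoys), which is
    acceptable for a small illustration but not for very large libraries.
    """
    sign = 1.0 if higher_is_better else -1.0
    actives = [sign * s for s, y in zip(scores, labels) if y == 1]
    decoys = [sign * s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Toy data (hypothetical): ten docking scores in kcal/mol, three actives.
    toy_scores = [-9.5, -9.1, -8.7, -8.0, -7.9, -7.5, -7.2, -6.8, -6.1, -5.5]
    toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print("EF20%:", enrichment_factor(toy_scores, toy_labels, fraction=0.2))
    print("ROC-AUC:", roc_auc(toy_scores, toy_labels))
```

EF rewards only what happens at the very top of the ranked list, whereas ROC-AUC averages over the whole ranking. Reporting both, as Table 6 suggests, helps separate early recognition from global ranking quality.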
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Altharawi, A.; Alqahtani, S.M. Integrative Computational Chemistry Approaches in Modern Drug Discovery: Advances in Docking, Pharmacophore Modeling, Molecular Dynamics, and Virtual Screening. Pharmaceutics 2026, 18, 565. https://doi.org/10.3390/pharmaceutics18050565
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.