Integrative Computational Chemistry Approaches in Modern Drug Discovery: Advances in Docking, Pharmacophore Modeling, Molecular Dynamics, and Virtual Screening

Altharawi, Ali; Alqahtani, Safar M.

doi:10.3390/pharmaceutics18050565

by Ali Altharawi * and Safar M. Alqahtani

Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia

* Author to whom correspondence should be addressed.

Pharmaceutics 2026, 18(5), 565; https://doi.org/10.3390/pharmaceutics18050565

Submission received: 28 January 2026 / Revised: 20 April 2026 / Accepted: 27 April 2026 / Published: 1 May 2026

(This article belongs to the Section Drug Targeting and Design)

Abstract

Computational chemistry has played a central role in early-stage drug discovery by accelerating target selection, hit identification, and lead optimization. This review summarizes recent developments in molecular docking, pharmacophore modeling, molecular dynamics (MD), and virtual screening (VS), with a focus on their application in practical drug discovery workflows. Advances in docking protocols, including consensus scoring, physics-based rescoring, and ensemble approaches, addressed the challenges of receptor flexibility. Both ligand-based and structure-based pharmacophore models facilitated scaffold hopping and guided library prioritization. MD simulations were used to assess binding pose stability, identify cryptic binding pockets, and characterize solvent interactions. These simulations also supported free-energy calculations using endpoint and alchemical methods. Large-scale VS campaigns employed curated compound libraries, often composed of make-on-demand molecules, and relied on high-performance computing or cloud infrastructure to screen up to 10⁹ compounds. Hits were validated using orthogonal biophysical assays and filtered by absorption, distribution, metabolism, excretion, and toxicity (ADMET) predictions. Integrated pipelines combining pharmacophore modeling, docking, MD, and free-energy calculations improved enrichment rates and reduced the number of compounds requiring synthesis. Several case studies demonstrated the identification of nanomolar-affinity leads from ultra-large screening campaigns. The review also addressed ongoing challenges, such as inconsistent scoring of binding affinity, protonation and tautomer errors, dataset bias, and reproducibility issues. Strategies to mitigate these limitations included standardized library preparation, adherence to FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, and the use of prospective benchmarking protocols. The review discussed emerging trends, including the use of quantum chemistry for electronic structure refinement, ensemble docking guided by cryo-electron microscopy (cryo-EM) data, and the integration of computational tools with automated synthesis and high-throughput screening in closed-loop discovery systems. These approaches have the potential to accelerate the design–make–test cycle, increase hit novelty, and improve decision-making in early drug development programs.

Keywords:

molecular docking; pharmacophore modeling; molecular dynamics; virtual screening; free-energy calculations; structure prediction

1. Introduction

The process of drug discovery is complex and resource-intensive, historically involving both empirical screening approaches and rational drug design strategies, including early medicinal chemistry and structure-guided optimization [1]. It remains one of the most time-intensive, expensive, and high-risk processes in biomedical research: average total costs exceed USD 2.6 billion, and timelines average 10–15 years for each compound that receives market approval [2]. Despite significant investment, attrition of drug candidates remains very high; reportedly only 1 in every 5000–10,000 compounds receives approval for market use [3].

In this challenging landscape, computational chemistry has become an integral component of the drug discovery pipeline, offering cost-effective, rapid alternatives to early-stage experimental screening [4,5]. Drug discovery starts with target identification and proceeds sequentially through lead discovery, lead optimization, preclinical studies, and clinical trials. At least through the first three stages, computational tools play an important role by helping prioritize hits and optimize molecular interactions [6]. Computational chemistry enables virtual screening of millions of compounds, simulation of protein–ligand dynamics, and prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles [7]. Because even high-throughput experimental screening has finite capacity, in silico methods significantly reduce attrition by removing weak candidates before costly biological testing [8].

Structure-based drug design (SBDD) took root in the 1980s following the emergence of crystallographic techniques, but the computational revolution of the last two decades has made molecular modeling accessible to a much wider range of applications [9]. With the exponential growth of structural and chemical databases and the integration of artificial intelligence, in silico platforms can now screen billions of compounds in a matter of days [4,10]. The relevance of these computational methods has increased further during pandemics and emerging disease outbreaks, where time-to-discovery is critical [2].

Recent advances in structure prediction, notably AlphaFold 3 and diffusion-based predictors, extend modeling beyond single proteins to protein–ligand complexes, protein–nucleic acid complexes, and systems carrying post-translational modifications, providing more realistic interaction models for structure-based drug design [11,12]. By combining deep learning with structural biology, these methods aim to bridge the gap between the prediction of static structures and the dynamics of molecular recognition, opening structure-based design to targets that were previously challenging. Nevertheless, generalizable and precise modeling of such complex systems remains an active research area. Methods such as AlphaFold 3 have made significant strides but are still limited in their ability to predict binding affinities, ligand conformations, and interaction energetics with high confidence across diverse chemical and biological systems. In particular, capturing induced-fit effects, solvent interactions, and the influence of post-translational modifications on binding interfaces remains difficult. Likewise, diffusion-based generative models, though powerful for exploring structural and chemical space, need additional validation to establish their robustness and reproducibility in prospective applications. These approaches should therefore be regarded as complementary tools rather than substitutes for established computational and experimental methods. Combining structure prediction models with molecular docking, molecular dynamics simulations, and experimental validation is necessary to enhance reliability and interpretability. Further methodological development, benchmarking against high-quality datasets, and prospective validation will be essential to realize the potential of these methods in drug discovery pipelines.

Among the most prominent computational approaches, molecular docking predicts how ligands bind to biological targets, estimates their binding affinity, and is a foundational tool in virtual screening workflows [13]. Several docking algorithms, including AutoDock, GOLD, and Glide, have been refined to improve binding mode prediction and scoring functions [14]. Pharmacophore modeling, another pivotal methodology, abstracts the features critical for molecular recognition and is frequently used when crystal structures are absent or limited [15]. Both structure-based and ligand-based pharmacophores can be incorporated into virtual screens to improve accuracy and minimize false positives.

Molecular dynamics (MD) simulations provide atomistic perspectives of protein–ligand complexes over time, revealing conformational changes and testing docking-predicted poses [8]. Enhanced-sampling methods such as accelerated MD and metadynamics, together with GPU-based parallelization, have made these simulations both more accessible and more accurate [16]. The integration of MD simulations, pharmacophore models, and docking provides a more realistic framework for studying drug–target interactions [17]. Virtual screening has become the standard procedure for filtering ultra-large libraries, enabling ligand- and structure-based workflows to rank and prioritize compounds for synthesis or testing with high throughput and accuracy [18]. Recent innovations in AI-guided screening, deep generative models, and transfer learning have led to adaptive systems that can design new compounds from scratch (i.e., de novo) [19]. Combining several computational methods has produced more robust workflows, with high hit-to-lead conversion rates and low late-stage attrition reported [19].

In light of these developments, computational methods have become indispensable tools not just for speeding drug discovery but also for improving the precision, safety, and affordability of drugs [20]. However, limitations in accuracy, reliability, and generalizability still hinder universal acceptance [21]. This review examines recent advances in established computational chemistry methods, including molecular docking, pharmacophore modeling, molecular dynamics, and virtual screening, together with their uses and limitations in practical drug discovery pipelines. A further objective is to outline how these computational approaches connect and how integrating several of them can speed up drug discovery and address longstanding challenges in lead identification and optimization, setting the stage for the detailed discussions that follow.

2. Computational Approaches in Drug Discovery

Computational chemistry has emerged as an essential pillar of modern drug discovery, significantly accelerating design–make–test–analyze cycles and reducing experimental effort across target identification, hit finding, and lead optimization. Structure-based methods, such as molecular docking and molecular dynamics (MD) simulations, are now increasingly intertwined with ligand-based approaches, such as pharmacophore modeling and cheminformatics-guided virtual screening (VS). Mature public data repositories and advances in structure prediction have vastly expanded the feasible problem space from thousands to billions of compounds and from single-protein to pathway- and network-level predictions. These advances enable tighter integration between in silico hypotheses and orthogonal biophysical and cellular assays, improving the precision and reproducibility of medicinal chemistry campaigns [22,23].

Structure-based screening typically begins with a high-quality receptor model derived from crystallography, cryo-EM, or NMR and deposited in the Protein Data Bank (PDB). Docking engines (for example, AutoDock Vina and Glide) rapidly generate and score poses for millions of ligands, providing prioritized hit lists for purchase or synthesis [24,25]. In practice, computational triage is directly coupled to experimental validation through biophysical assays such as differential scanning fluorimetry (DSF), surface plasmon resonance (SPR), and isothermal titration calorimetry (ITC), together with cell-based readouts; iterative feedback refines receptor protonation states, binding-site water treatment, and ligand tautomers before promising scaffolds are escalated into prospective structure–activity relationship studies [26]. MD simulations then probe pose stability, desolvation, and protein flexibility on nanosecond–microsecond timescales, highlight cryptic pockets, and support free-energy calculations that rationalize affinity changes observed in bench campaigns [23].

Ligand-based methods complement this loop when structural data are sparse or when chemotypes are being diversified around known actives. Pharmacophore modeling distills the essential 3D features (e.g., H-bond donors/acceptors, hydrophobes, aromatic centroids) required for activity, enabling scaffold hopping and fast database queries (Table 1); platforms like PHASE couple pharmacophore perception with 3D QSAR to guide substituent changes with quantitative predictions [27]. To control attrition due to poor ADME, rule-based filters, most prominently Lipinski’s “Rule of Five”, flag liabilities early and focus synthesis on developable regions of chemical space without over-constraining novelty [28].
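The rule-based filtering described above can be illustrated with a minimal sketch. It assumes the four descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` container, the function names, and the default tolerance of one violation are illustrative choices made here, not part of any cited workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # count of H-bond donors
    h_bond_acceptors: int   # count of H-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five criteria are exceeded."""
    return sum([
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Flag a compound as acceptable if it stays within the allowed violations.

    Tolerating one violation is a common convention and an assumption here.
    """
    return ro5_violations(d) <= max_violations


# Example with made-up descriptor values
small = Descriptors(mol_weight=350.0, logp=2.5, h_bond_donors=2, h_bond_acceptors=5)
large = Descriptors(mol_weight=720.0, logp=6.1, h_bond_donors=3, h_bond_acceptors=9)
```

Returning a violation count rather than a bare pass/fail makes it easy to relax the filter for chemotypes that are known exceptions to the rule.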

At the target-identification stage, transcriptomic perturbation resources such as the Connectivity Map (CMap/LINCS) allow researchers to link compounds, genes, and diseases by shared expression signatures (Figure 1). These signatures can uncover actionable mechanisms and prioritize targets for phenotypic hits when genetics or pathway evidence is equivocal. Once targets are selected, structure-enabled campaigns exploit high-resolution complexes and, increasingly, AI-predicted structures to expand coverage beyond traditionally tractable protein families. Docking and shape/pharmacophore screens seed initial hits; consensus scoring and rescoring with physics-based or ML-augmented models improve enrichment. Hit-to-lead optimization then leverages MD for water-network analysis, conformational selection, and binding-mode stabilization, informing substituent choices that balance potency and physicochemical properties. Throughout, medicinal chemistry teams integrate developability heuristics (e.g., RO5 compliance and its known exceptions) with iterative synthesis to avoid late-stage failures [23,28]. Large-scale virtual screening has matured from millions to billions of candidates via cloud and HPC platforms, workflow automation, and fault-tolerant orchestration. VirtualFlow exemplifies this shift, enabling ultra-large docking campaigns with modular support for multiple docking engines and robust pre-processing, which in turn has led to prospective validations against diverse targets. Open bioactivity repositories such as ChEMBL supply target-annotated structure–activity data that seed pharmacophore/ligand-based models, enable decoy generation, and support external validation of computational pipelines [40]. Early computer-aided design leaned heavily on interpretable rules and linear models, with RO5 serving as a pragmatic filter for oral drug-likeness during lead optimization [28]. 
Docking programs evolved in parallel, improving sampling and scoring through knowledge-based potentials and empirical terms; benchmarks of Vina and Glide demonstrated substantial gains in pose prediction and enrichment over earlier tools [22].

Over the last decade, two inflection points reshaped the field. First, routine microsecond-scale MD powered by algorithmic and hardware advances made it practical to account for receptor dynamics, allostery, and water rearrangements in design decisions rather than treating the protein as static [23]. Second, AI-based structure prediction achieved near-experimental accuracy for many single-chain proteins, vastly expanding the structural coverage for previously “undockable” targets and enabling downstream SBDD at scale [41]. These methodological shifts catalyzed new workflows: predicted structures can be refined by MD, cross-checked against homology models, and used directly for grid generation and docking; ambiguous binding-site residues can be enumerated across protonation/tautomer states and validated by pose stability. Pharmacophore hypotheses extracted from known ligands can be projected onto AI-predicted pockets to rationalize activity cliffs. In parallel, ML models trained on curated ChEMBL bioactivity and assay metadata guide active-learning loops, prioritize syntheses, and calibrate uncertainty, while expression-signature matching from CMap/LINCS links chemical perturbations to pathway rewiring for mechanism-of-action inference and indication expansion [40].

Progress in computational discovery is inseparable from the growth of high-quality public datasets and shared infrastructure. The PDB standardizes macromolecular structures, enabling consistent binding-site preparation, water and ion curation, and comparative modeling pipelines. ChEMBL delivers assay-level activity annotations that are crucial for distinguishing kinetic from equilibrium measurements, for judging target-engagement confidence, and for supplying negative data, all of which are ingredients for reproducible model training and unbiased external validation. Expression-profiling resources such as the modern CMap make it feasible to connect small molecules to genetic programs, triage series by pathway selectivity, and anticipate liabilities by monitoring off-target transcriptional fingerprints [24,25,40].

3. Molecular Docking: Theory, Tools, and Applications

Molecular docking aims to predict how a ligand binds within a macromolecular target and to estimate the strength of this interaction. Typical workflows comprise (i) receptor and ligand preparation (protonation/tautomer assignment, conformer generation, binding-site definition), (ii) pose generation via systematic, stochastic, or knowledge-guided search, and (iii) scoring and ranking of poses using physics-based, empirical, or knowledge-based functions. Most docking programs treat the receptor as rigid or semi-flexible (sidechain rotamers or soft potentials), while the ligand explores translation, rotation, and torsions; induced fit is approximated by sidechain sampling or post-docking relaxation. Scoring functions balance terms for van der Waals complementarity, electrostatics, desolvation, and sometimes hydrogen bonding and metal coordination; consensus or rescoring strategies are often used to mitigate function-specific bias. Docking paradigms divide broadly into protein–ligand docking (typical drug-like molecules) and protein–protein docking (PPDock), the latter contending with far larger interfaces, rugged energy landscapes, and pronounced conformational change. For PPDock, sampling schemes must address multiscale translations and rotations of both partners, often guided by experimental restraints (e.g., XL-MS, NMR, cryo-EM) [22,42].

Multiple mature platforms underpin the docking ecosystem. AutoDock/AutoDock Vina implements an efficient stochastic search with multithreading and a reparametrized empirical scoring function, and has served as one of the backbones of structure-based virtual screening (SBVS) and pose prediction. Glide combines hierarchical filtering with exhaustive pose refinement and proprietary scoring (GlideScore), showing strong early enrichment in SBVS benchmarks [22]. GOLD uses a genetic algorithm to improve pose accuracy in challenging pockets by combining protein side-chain flexibility with user-specified constraints. Earlier releases of DOCK introduced a grid-based shape-complementarity approach and anchor-and-grow strategies, which eventually evolved into a modular suite widely used in academia [29]. For PPDock, RosettaDock performs rigid-body sampling and optimizes selected side-chain degrees of freedom under the Rosetta energy function, and it is accessible through an automated web server [42]. The available docking programs range from free/open-source (AutoDock Vina, DOCK) to commercial tools (Glide, GOLD) and from command-line engines to user-friendly servers, supporting workflows from small exploratory screens to ultra-large screening campaigns [22,29,42].

Ensemble docking addresses receptor plasticity by docking against multiple conformations derived from experimental structures, homology models, or molecular dynamics (MD) trajectories and aggregating results via consensus scoring or clustering. This strategy improves the chance of sampling near-native poses in flexible or allosteric sites and has shown practical benefits across GPCRs, kinases, and viral enzymes [43]. Deep learning for pose prediction and scoring is increasingly integrated into docking. Convolutional neural networks trained on protein–ligand grids can rescore poses to improve top-ranked accuracy without changing the search engine; GNINA 1.0, for example, couples CNN scoring to Vina-like sampling and demonstrates consistent pose improvements across standard benchmarks [44]. Beyond CNN rescoring, data-efficient graph and 3D-grid models continue to expand coverage to new protein families. QM/MM integration augments empirical scoring by recalculating key poses with quantum mechanical (QM) methods, capturing polarization, charge transfer, metal coordination, and covalent mechanisms that classical force fields struggle to model. Used selectively (e.g., rescoring top poses or refining ambiguous chemotypes), QM or QM/MM can correct rankings and reduce false positives, albeit at a higher cost [45]. These advances narrow the gap between speed and physical realism and make prospective design more reliable when combined with orthogonal filters (e.g., pharmacophore constraints, a water thermodynamics model, or an ADME predictor based on machine learning) [43,44,45].
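The ensemble and consensus strategies described above can be sketched in a few lines. This is an illustrative aggregation scheme rather than the method of any cited program: it assumes every score table maps ligand identifiers to scores where lower means more favorable (fitness-style scores, where higher is better, would need a sign flip first), and all function names are hypothetical.

```python
from collections import defaultdict


def ensemble_best(scores_by_conformer):
    """Keep, for each ligand, its most favorable (lowest) score across
    docking runs against several receptor conformations."""
    best = {}
    for conformer_scores in scores_by_conformer:
        for ligand, score in conformer_scores.items():
            if ligand not in best or score < best[ligand]:
                best[ligand] = score
    return best


def rank_positions(scores):
    """Map each ligand to its 1-based rank (lowest score first).
    Ties are broken alphabetically so the result is deterministic."""
    ordered = sorted(scores, key=lambda lig: (scores[lig], lig))
    return {lig: i + 1 for i, lig in enumerate(ordered)}


def consensus_rank(score_tables):
    """Average the per-function ranks over the ligands common to all tables
    and return (ligand, mean_rank) pairs, best first."""
    common = set.intersection(*(set(table) for table in score_tables))
    totals = defaultdict(float)
    for table in score_tables:
        ranks = rank_positions({lig: table[lig] for lig in common})
        for lig, rank in ranks.items():
            totals[lig] += rank
    n = len(score_tables)
    return sorted(((lig, totals[lig] / n) for lig in common),
                  key=lambda pair: (pair[1], pair[0]))


# Made-up scores: two receptor conformers, then a second scoring function
per_conformer = [
    {"A": -7.0, "B": -9.0, "C": -6.0},
    {"A": -8.5, "B": -8.0, "C": -6.5},
]
best = ensemble_best(per_conformer)
rescored = {"A": -50.0, "B": -40.0, "C": -30.0}
ranking = consensus_rank([best, rescored])
```

Rank averaging is used here because raw scores from different functions are not on a common scale; other schemes (for example, z-score averaging or voting on top fractions) follow the same structure.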

Recent advancements in docking include tools such as SMINA, QuickVina, DiffDock, and GNINA, which incorporate improved scoring functions and machine learning-based pose prediction. Diffusion-based docking methods, such as DiffDock, further enhance pose accuracy by modeling ligand placement as a generative process [46,47].

Docking permeates multiple phases of drug discovery. In antivirals, docking was instrumental in triaging and prioritizing inhibitors of the SARS-CoV-2 main protease (Mpro) once crystal structures became available, supporting SBVS in hit identification and in the subsequent medicinal chemistry optimization; the cited work was among the earliest and most extensive reports to combine structure determination with inhibitor discovery within an SBVS framework, detailing structure–activity relationships and showing how SBVS can support atomically precise hit discovery [48]. In the CNS area, docking to GPCRs provides novel chemotypes; structure-based docking to the μ-opioid receptor assisted the discovery of PZM21, a Gi-biased agonist that provokes less respiratory depression in preclinical models (Figure 2), demonstrating how docking can surface scaffolds with specific signaling profiles [49]. In oncology, docking campaigns commonly seed kinase inhibitor series and address protein–protein interaction targets (e.g., BCL-2 family, KRAS effectors), iteratively ranking fragments against poorly defined hot spots, frequently followed by induced-fit or MD refinement to assess selectivity. Successful efforts across therapeutic areas have been associated with careful target preparation (e.g., biologically relevant protonation states, metal or halogen treatment), diligent library design (physicochemical and substructure filters), and careful post-docking adjudication (pharmacophore fit, strain energy, water placement, QM/MM rescoring, experimental orthogonality, etc.) [48,49].

Scoring inaccuracies remain the primary bottleneck: empirical functions trade accuracy for speed by approximating solvation and entropy, leading to false positives/negatives and limited rank-order correlation with affinity. Protein flexibility and waters are imperfectly captured by single-structure docking; key rearrangements, cryptic pockets, or displaceable waters can defeat rigid-receptor assumptions. Chemistry edge cases, including covalent mechanisms, transition metals, tautomers/protonation microstates, halogen bonding, and pi-stacking anisotropy, require special handling. Dataset bias and reproducibility also matter: retrospective enrichments may not translate prospectively if benchmarks overlap with training data (for ML-augmented scorers) or if preparation pipelines differ.

Looking forward, several directions are promising. First, physics-aware ML hybrid models that preserve geometric and energetic constraints while learning from large structural corpora should improve pose plausibility and generalization beyond familiar pockets. Second, ensembles and water thermodynamics (e.g., GCMC water, water networks) will likely become standard in SBVS triage. Third, selective QM/MM rescoring and polarizable force fields can correct borderline decisions, especially for metalloproteins and covalent inhibitors. Finally, workflow integration from pocket detection to docking to FEP/MD refinement, plus better uncertainty quantification and prospective benchmarking, will continue to raise confidence in docking-driven nominations [43,44,45].

4. Pharmacophore Modeling: From Concept to Clinical Candidates

A pharmacophore is the ensemble of steric and electronic features that a ligand must present to ensure optimal interactions with a biological target and elicit a response; common features include hydrogen-bond donors/acceptors, hydrophobes, aromatics, cations/anions, and metal binders [50]. In ligand-based modeling, multiple known actives are aligned to derive a consensus feature pattern that tolerates scaffold diversity and supports scaffold hopping. In structure-based modeling, features are extracted from a receptor–ligand complex (or a prepared apo pocket), often complemented by water and protonation analysis to refine the geometry and priorities of interaction hotspots (Figure 3). In practice, teams use ligand-based hypotheses for rapid library triage and structure-based (SB) models to rationalize SAR and steer substituent placement near obligatory interactions or displaceable waters, with the two approaches frequently combined in iterative cycles. Several mature tools support both hypothesis generation and large-scale screening. LigandScout derives 3D pharmacophores directly from protein-bound ligands and exports queries for fast shape/feature search. PHASE integrates common-pharmacophore identification, hypothesis scoring, and 3D-QSAR, and is widely used to couple pharmacophores with machine-learned activity models [27].
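To make the idea of a pharmacophore query concrete, the following is a minimal sketch of distance-based feature matching. It is not the algorithm of LigandScout or PHASE; the feature labels, the single uniform tolerance, and the function name are assumptions chosen for illustration. A ligand conformer matches when distinct ligand features of the right types reproduce all pairwise query distances within the tolerance.

```python
from itertools import permutations
from math import dist


def matches_pharmacophore(query, ligand_features, tol=1.0):
    """Return True if the ligand conformer satisfies the query.

    query, ligand_features: lists of (feature_type, (x, y, z)) tuples,
    coordinates in angstroms. Matching is on inter-feature distances, so it
    is invariant to translation and rotation (and, as a known limitation of
    distance-only matching, to mirror images).
    """
    n = len(query)
    # Try every assignment of distinct ligand features to query features
    for combo in permutations(range(len(ligand_features)), n):
        # Feature types must agree position by position
        if any(ligand_features[j][0] != query[i][0] for i, j in enumerate(combo)):
            continue
        # All pairwise distances must agree within the tolerance
        if all(
            abs(dist(query[a][1], query[b][1])
                - dist(ligand_features[combo[a]][1], ligand_features[combo[b]][1])) <= tol
            for a in range(n) for b in range(a + 1, n)
        ):
            return True
    return False


# Made-up three-point query: donor, acceptor, aromatic centroid
query = [("donor", (0.0, 0.0, 0.0)),
         ("acceptor", (3.0, 0.0, 0.0)),
         ("aromatic", (0.0, 4.0, 0.0))]

# Same geometry translated by (10, 10, 10), plus an extra hydrophobe
hit = [("donor", (10.0, 10.0, 10.0)),
       ("acceptor", (13.0, 10.0, 10.0)),
       ("aromatic", (10.0, 14.0, 10.0)),
       ("hydrophobe", (20.0, 20.0, 20.0))]

# Donor-acceptor distance doubled, so the query is not satisfied
miss = [("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (6.0, 0.0, 0.0)),
        ("aromatic", (0.0, 4.0, 0.0))]
```

The exhaustive search over assignments is acceptable for the handful of features in a typical query; production tools use indexed searches and per-feature tolerances instead.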

MOE provides comprehensive conformer generation and pocket/feature utilities; methods such as LowModeMD improve conformational sampling for alignment and hypothesis robustness [31]. For rapid, free ligand-based hypothesis building, PharmaGist aligns multiple ligands to identify common feature constellations. These platforms underpin both quick exploratory filters and production-grade virtual screening pipelines. Because pharmacophore hits are typically prioritized from very large libraries, early-recognition metrics are essential. The area under the receiver operating characteristic curve (ROC-AUC) provides a global view, but enrichment-focused measures such as the enrichment factor (EF) at low database fractions and BEDROC more faithfully evaluate real-world screening, where only the top fraction is tested. Robust practice includes external decoy sets, retrospective recovery of known actives, and prospectively blinded tests that fix decision thresholds before synthesis.
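The difference between a global metric and an early-recognition metric can be seen in a short sketch. It assumes the screening output has already been reduced to a list of labels ordered from best-ranked to worst-ranked (1 for a known active, 0 for a decoy); the function names are illustrative, and BEDROC is deliberately left out to keep the example small.

```python
def enrichment_factor(ranked_labels, fraction):
    """EF at a database fraction: hit rate in the top fraction divided by
    the hit rate in the whole ranked list."""
    n = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(fraction * n)))
    hits_top = sum(ranked_labels[:n_top])
    return (hits_top / n_top) / (n_actives / n)


def roc_auc(ranked_labels):
    """ROC-AUC as the fraction of (active, decoy) pairs in which the active
    is ranked above the decoy. Assumes a strict ranking without ties."""
    n_actives = sum(ranked_labels)
    n_decoys = len(ranked_labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    decoys_below = 0
    correct_pairs = 0
    # Walk from the bottom of the list, counting decoys ranked below each active
    for label in reversed(ranked_labels):
        if label:
            correct_pairs += decoys_below
        else:
            decoys_below += 1
    return correct_pairs / (n_actives * n_decoys)


# Made-up ranked list: 10 compounds, 3 actives
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(ranked, 0.2)   # top 2 compounds are both active
auc = roc_auc(ranked)                    # 20 of 21 active-decoy pairs ordered correctly
```

For this toy list the EF at 20% is about 3.3 (the maximum possible with three actives in ten compounds), while the AUC of 20/21 summarizes the whole ranking, which is why the two are usually reported together.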

Energy-optimized pharmacophores (e-pharmacophores) project per-residue or per-feature energetic terms from a receptor–ligand complex onto pharmacophoric features, producing hypotheses that better reflect the underlying physics than purely geometric models. They are often used to re-rank or prune docking hits. MD-refined pharmacophores incorporate receptor dynamics by extracting features from conformational ensembles; compared with single-snapshot models, MD-derived hypotheses can better separate actives from decoys and reveal transient/cryptic interactions. Beyond physics-based refinements, AI-enhanced pharmacophore workflows now condition generative or predictive models on 3D feature maps, improving control during linker/R-group design and accelerating hypothesis testing [51]. These strategies increase hit quality by reconciling speed (feature search) with realism (energetics and dynamics). Pharmacophore queries routinely seed phenotypic-to-target campaigns (back-mapping features from chemotyped hits), guide scaffold hopping with preserved interactions in GPCRs/kinases, and pre-filter ultra-large libraries before docking. They are especially valuable when target structures are incomplete or flexible, and when medicinal chemistry requires diverse chemotypes with shared binding logic (Figure 4). Key limitations persist: hypotheses may overfit sparse training sets, feature geometry can be sensitive to conformer quality, and static models can miss water-mediated or induced-fit interactions unless augmented by MD or energetic analysis. Best practice combines ligand- and structure-based evidence, validates with enrichment-focused metrics (Table 2), and closes the loop with prospective assays to refine features and tolerances.

5. Molecular Dynamics (MD) Simulations: Capturing Flexibility and Dynamics

Molecular dynamics (MD) provides time-resolved, atomistic movies of biomolecules by integrating Newton’s equations of motion on a potential energy surface defined by a force field. For proteins and protein–ligand complexes, modern biomolecular force fields such as CHARMM36m, AMBER ff14SB, and OPLS3e deliver improved secondary-structure balance, sidechain rotamer distributions, and small-molecule compatibility, enabling routine simulations from nanoseconds to multi-microseconds on commodity GPUs [23,33,34,35]. MD complements static models by revealing conformational selection, induced fit, cryptic pocket formation, and ordered water networks that strongly modulate ligand binding and selectivity [23].
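As an illustration of what integrating Newton's equations of motion means in practice, the sketch below applies the velocity Verlet scheme, a standard MD integrator, to a one-dimensional harmonic oscillator. The toy potential stands in for a force field and is an assumption made purely for demonstration; production MD engines add thermostats, constraints, and neighbor lists that are not shown.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v with the velocity Verlet scheme.

    force: callable returning the force at a position.
    Returns the list of (x, v) states including the initial one.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(k):
    """Force for the toy potential U(x) = 0.5 * k * x**2."""
    return lambda x: -k * x


def total_energy(x, v, mass, k):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def analytic_position(x0, omega, t):
    """Exact solution for a start at rest: x(t) = x0 * cos(omega * t)."""
    return x0 * math.cos(omega * t)


# Unit mass and spring constant, start at x = 1 at rest, 1000 steps of dt = 0.01
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force(1.0),
                       mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
energy_drift = total_energy(x_end, v_end, 1.0, 1.0) - total_energy(1.0, 0.0, 1.0, 1.0)
```

Comparing `x_end` with `analytic_position(1.0, 1.0, 10.0)` and inspecting `energy_drift` shows the near-conservation of energy that makes this integrator suitable for long trajectories.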

Conventional MD uses a fixed potential and timestep to sample thermally accessible motions and is the backbone for stability assessments of docked complexes. Enhanced sampling methods accelerate rare events. Accelerated MD lowers effective energy barriers to promote transitions between metastable states, improving exploration of loops, side chains, and allosteric sites [60]. Steered molecular dynamics (MD) utilizes time-dependent forces to study unbinding trajectories, rupture forces, or conformational switching. These studies typically yield mechanistic hypotheses that are addressed through mutagenesis or linker design. Meanwhile, in thermodynamic studies, end-point free energy approaches with MD files, such as MM-PBSA or MM-GBSA, can provide estimates of relative binding affinity and serve as efficient triage methods when coupled with adequate dielectric and entropy approximations [61]. Alchemical free energy methods (FEP) can be even more accurate in determining affinity, as they more accurately convert ligands along non-physical paths. Furthermore, more recently, when the workflow provides sufficiently rigorous sampling and error control, FEP can provide future estimates with sub-kilo-caloric precision for congeneric series of ligands. While many academic and commercial docking software can generate candidate poses, MD directly assesses the physical plausibility of these poses and resolves ambiguities arising from protonation, tautomerism, or the placement of waters between the ligand and the target. Additionally, while simulations may run for long periods, settings as short as 100 ns can be used to develop hypotheses for molecular design; in fact, shorter simulations may obviate strained poses, quantify hydrogen bond persistence, or reveal water-mediated networks. Clustering of MD frames can yield enriched pharmacophore hypotheses that account for or quantify dynamic donor/acceptor and hydrophobic patterns not accessible from a single snapshot [23]. 
For prioritization, end-point methods (MM-GBSA/MM-PBSA) or targeted FEP runs on MD-relaxed poses help rank series and rationalize structure–activity relationships, while steered MD provides qualitative rankings of unbinding kinetics in transporters and GPCRs [61]. Machine learning is increasingly used to compress, classify, and predict behavior from large trajectories. Markov state models (MSMs) coarse-grain dynamics into metastable states connected by transition probabilities, enabling estimates of rates, pathways, and long-timescale observables from many short simulations [62]. Deep-learning architectures such as VAMPnets learn slow collective variables and state decompositions directly from coordinates, improving the objectivity and reproducibility of clustering and facilitating mechanism discovery and rare-event prediction [63]. These methods can detect pocket-opening events, identify ligandable conformations for ensemble docking, and select conformers suited to pathway-specific free-energy calculations. The distinctive advantages of MD are its physical interpretability and temporal resolution: it can help explain why a docked pose is stable, how water reorganizes, and which conformations an allosteric modulator stabilizes. Its predictive power, however, is limited by the accuracy of the force field, the adequacy of sampling, and the quality of system preparation. Common artifacts arise from poorly modeled salt bridges, misprotonated residues or ligands, unstable coordination of metal ions or other bound species, and unrepresentative ion or lipid composition.
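The Markov state model idea described above can be sketched in a few lines: count transitions between discrete states across several short trajectories, row-normalize the counts into transition probabilities, and iterate to the stationary populations. The state assignments below are invented toy data, and the function names are assumptions; dedicated MSM packages add reversible estimators, lag-time selection, and validation that are omitted here.

```python
def transition_matrix(dtrajs, n_states, lag=1):
    """Row-normalized transition counts from discrete-state trajectories."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for dtraj in dtrajs:
        for i, j in zip(dtraj[:-lag], dtraj[lag:]):
            counts[i][j] += 1.0
    # Assumes every state is visited as a source at least once.
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(T, n_iter=1000):
    """Equilibrium state populations obtained by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy data (assumption): two short trajectories over three conformational states,
# e.g. 0 = closed pocket, 1 = intermediate, 2 = open pocket.
dtrajs = [
    [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 2, 2, 2, 2, 1, 1, 0, 0],
    [2, 2, 2, 1, 1, 2, 2, 2, 1, 0, 0, 0, 1, 2, 2],
]
T = transition_matrix(dtrajs, n_states=3)
pi = stationary_distribution(T)
print("transition matrix:", T)
print("stationary populations:", pi)
```

The stationary populations estimate how often each state, such as an open cryptic pocket, is occupied at equilibrium even though no single trajectory samples the full landscape.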
Practical strategies include (i) selecting a force field validated for the system class (CHARMM36m, AMBER ff14SB, OPLS3e), (ii) building ensembles of starting structures (multiple receptor conformations and ligand tautomers), (iii) monitoring stability with RMSD/RMSF, hydrogen-bond occupancy, and water residence times, (iv) using enhanced sampling when functional transitions are slow (accelerated or steered MD), and (v) reserving rigorous free-energy methods for late-stage prioritization where small potency differences matter [33,34,35,60,61]. With these safeguards and ML-assisted analysis, MD now routinely elevates docking and pharmacophore efforts from pose generation to mechanism-aware design [23,62,63].
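Two of the stability metrics listed in item (iii) can be sketched as follows: ligand RMSD relative to a reference pose and distance-based hydrogen-bond occupancy. This is a simplified illustration that assumes frames are already aligned on the receptor and uses a 3.5 Å donor–acceptor cutoff with no angle criterion; the coordinates, distances, and function names are invented for the example.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets (in Å)."""
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def hbond_occupancy(distances, cutoff=3.5):
    """Fraction of frames in which the donor-acceptor distance is within the cutoff."""
    return sum(d <= cutoff for d in distances) / len(distances)


# Toy data (assumption): a three-atom ligand displaced by 1 Å along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
later_frame = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]

# Toy donor-acceptor distances (Å) sampled from five frames.
donor_acceptor_distances = [2.9, 3.1, 3.6, 4.2, 3.0]

print("ligand RMSD (Å):", rmsd(later_frame, reference_pose))
print("H-bond occupancy:", hbond_occupancy(donor_acceptor_distances))
```

In a real workflow these quantities would be computed per frame from the trajectory with a trajectory-analysis library, and the acceptable RMSD or occupancy thresholds would be chosen per target rather than fixed in advance.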

6. Case Studies of Computational Drug Discovery

Several successful drugs have been developed using computational approaches. Representative examples are summarized below in Table 3.

7. Virtual Screening: Large-Scale Compound Prioritization

Virtual screening (VS) prioritizes candidates from massive chemical libraries by computationally estimating their likelihood of binding a biological target. In practice, VS takes two complementary forms. Ligand-based VS (LBVS) infers activity from molecular similarity, pharmacophoric patterns, or learned embeddings of molecules with known activity; it excels when target structures are uncertain but structure–activity relationship (SAR) data are rich. Structure-based VS (SBVS) relies on protein structures to dock and score candidates in silico; it is most informative when high-quality experimental or predicted (e.g., AlphaFold) structures are available. Modern discovery programs commonly combine the two, using LBVS to rapidly down-select ultra-large libraries and SBVS (docking, rescoring, pose filtering) to refine the final shortlist. Public and commercial repositories now routinely supply billions of purchasable or make-on-demand molecules (e.g., ZINC20/22, PubChem, ChEMBL, and enumerated universes such as GDB-17), enabling VS at unprecedented scale (Table 4). A typical SBVS pipeline proceeds as follows: (1) library acquisition/design and curation; (2) physicochemical and liability filtering (e.g., Lipinski/Veber-style criteria, medicinal chemistry rules); (3) optional LBVS or pharmacophore triage; (4) docking (e.g., the HTVS/SP/XP precision tiers) with consensus rescoring; (5) post-docking filtering (strain, protein–ligand contacts, synthetic tractability, novelty); and (6) selection for synthesis and experimental validation. Iterative cycles then integrate cheminformatics clustering, ADMET prediction, and medicinal chemistry feedback to converge on tractable chemotypes. For SBVS, widely used docking engines include Glide (HTVS/SP/XP), AutoDock Vina, DOCK, GOLD, and others; physics-based free-energy refinements (e.g., FEP+) increasingly sit downstream of docking for potency ranking among close analogs.
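Step (2) of the pipeline above, physicochemical filtering, can be sketched with the classical Lipinski and Veber thresholds. The example assumes descriptors have already been computed by a cheminformatics toolkit; the `Compound` class, the compound names and values, and the choice to tolerate one Rule-of-Five violation are illustrative assumptions rather than a prescribed protocol.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # Da
    logp: float
    h_donors: int
    h_acceptors: int
    rot_bonds: int
    tpsa: float         # polar surface area, Å^2


def ro5_violations(c):
    """Number of Lipinski Rule-of-Five criteria that a compound violates."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def passes_veber(c):
    """Veber criteria: at most 10 rotatable bonds and polar surface area <= 140 Å^2."""
    return c.rot_bonds <= 10 and c.tpsa <= 140


def prefilter(library, max_ro5_violations=1):
    """Keep compounds within the violation budget that also satisfy Veber."""
    return [c for c in library
            if ro5_violations(c) <= max_ro5_violations and passes_veber(c)]


# Hypothetical compounds with invented descriptor values.
library = [
    Compound("cpd_1", 350.0, 2.5, 2, 5, 4, 80.0),
    Compound("cpd_2", 650.0, 6.2, 6, 12, 14, 180.0),
    Compound("cpd_3", 520.0, 4.0, 1, 6, 8, 95.0),
    Compound("cpd_4", 400.0, 3.0, 2, 6, 12, 100.0),
]
print([c.name for c in prefilter(library)])
```

Cheap filters of this kind run before docking, so they are usually tuned to be permissive: discarding a tractable chemotype at this stage cannot be corrected downstream.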
LBVS and field-based methods (e.g., from Cresset) complement docking with shape/electrostatic similarity and pharmacophore queries; deep-learning toolkits and benchmarks (e.g., GuacaMol, MOSES) support rapid model development for activity prediction and generative design. Enterprise platforms (e.g., Schrödinger VS workflows) orchestrate these steps with parallelization, while cloud/HPC solutions enable elastic scaling when libraries exceed 10⁸ compounds. Campaigns docking 10⁸–10⁹+ molecules have uncovered entirely new chemotypes against diverse targets, demonstrating that scale itself is a discovery lever. Lyu et al. [64] docked ~170 million make-on-demand compounds and experimentally confirmed nanomolar hits with novel scaffolds at AmpC β-lactamase and the D4 dopamine receptor. Open-source pipelines such as VirtualFlow subsequently industrialized billion-molecule SBVS on commodity cloud/HPC, providing turnkey workflows for pre-filtering, docking, and results management. Elastic compute has moved VS from fixed clusters to cloud-native scheduling and GPU acceleration, improving wall-clock throughput while enabling iterative enrichment strategies (e.g., fast, inexpensive docking followed by focused high-precision rescoring). This has normalized “campaigns” that iterate between computational enrichment and make-on-demand synthesis within weeks rather than months. Machine learning now prioritizes library regions before docking, reducing computation by orders of magnitude. “Deep Docking” learns from an initial sparse docking pass to triage the remainder of an ultra-large library, while independent deep-learning studies have prospectively discovered active chemotypes (e.g., halicin as a novel antibiotic) from millions of candidates. In parallel, benchmark suites (GuacaMol, MOSES) have standardized the evaluation of generative and predictive models used to guide VS.
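The learn-then-triage idea behind approaches such as Deep Docking can be illustrated with a toy loop: dock a random sample, fit a cheap surrogate model, and dock only the molecules the surrogate ranks highest. This is a sketch under strong assumptions and not the published method: `mock_dock` is a hypothetical stand-in for a docking engine, molecules are random two-dimensional descriptors, and a k-nearest-neighbour average replaces the deep neural network used in practice.

```python
import math
import random


def mock_dock(desc):
    """Hypothetical stand-in for a docking engine; more negative is better."""
    x, y = desc
    return -10.0 * math.exp(-((x - 0.7) ** 2 + (y - 0.3) ** 2) / 0.05)


def knn_predict(query, train_desc, train_scores, k=5):
    """Cheap surrogate model: mean score of the k nearest docked molecules."""
    nearest = sorted(range(len(train_desc)),
                     key=lambda i: math.dist(query, train_desc[i]))[:k]
    return sum(train_scores[i] for i in nearest) / k


rng = random.Random(0)
library = [(rng.random(), rng.random()) for _ in range(2000)]  # toy descriptors

# 1) Dock a small random sample of the library.
sample_ids = rng.sample(range(len(library)), 200)
docked = {i: mock_dock(library[i]) for i in sample_ids}

# 2) Train the surrogate on the sample and score the undocked remainder.
train_desc = [library[i] for i in sample_ids]
train_scores = [docked[i] for i in sample_ids]
remaining = [i for i in range(len(library)) if i not in docked]
predicted = {i: knn_predict(library[i], train_desc, train_scores) for i in remaining}

# 3) Dock only the top 10% of the remainder by predicted score.
shortlist = sorted(remaining, key=predicted.get)[: len(remaining) // 10]
docked.update({i: mock_dock(library[i]) for i in shortlist})

# Exhaustive docking is used here only to evaluate the toy example.
true_top = sorted(range(len(library)), key=lambda i: mock_dock(library[i]))[:20]
recall = sum(i in docked for i in true_top) / len(true_top)
print(f"docked {len(docked)} of {len(library)} molecules; top-20 recall = {recall:.2f}")
```

Real campaigns repeat the sample, train, and triage steps over several rounds and use molecular fingerprints as model inputs, but the saving has the same origin: most of the library is never docked explicitly.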

Computational hits must be confirmed experimentally (biochemical/biophysical assays), then winnowed by developability. Orthogonal confirmation (e.g., thermal shift, SPR, ITC, crystallography/cryo-EM) helps verify binding mode and rule out assay artifacts. Prospective triage integrates in silico ADMET (solubility, permeability, metabolic stability) and liability filters. Classical guidelines (Rule-of-Five; polar surface area/rotatable bonds) remain useful first-pass heuristics, though modern practice couples them with model-based predictions and medicinal chemistry.

8. Future Directions and Role of AI in Drug Discovery

Multimethod pipelines (pharmacophore → docking → MD → free energy) yield better decision quality than any single technique. Pharmacophore queries (ligand- or structure-based) rapidly focus libraries on interaction hypotheses. Docking then proposes concrete poses consistent with the pharmacophore and receptor geometry. Short MD relaxations stabilize induced-fit states, identify conserved waters, and flag unstable poses; consensus contact/energy metrics prune false positives. Finally, relative binding free-energy calculations (e.g., FEP+) resolve tight structure–activity relationships within a chemotype, ranking analogs at the ~1 kcal·mol⁻¹ level to drive synthesis. Integration of these stages is not only sequential but also adaptive. Structural coverage is the linchpin of SBVS, and breakthroughs in structure prediction now “unlock” targets formerly inaccessible to docking. AlphaFold and related methods deliver atomic-level models that, after limited refinement, can seed docking/MD workflows when experimental structures are unavailable. In practice, teams assess multiple AlphaFold2 (AF2) models/ensembles, apply binding-site refinements (e.g., side-chain repacking, MD sampling), and validate emergent poses against pharmacophore consistency and experimental structure–activity relationships. The introduction of AlphaFold 3 and diffusion-based modeling approaches represents a paradigm shift in computational drug discovery. Unlike earlier versions, AlphaFold 3 can model multi-component biological systems, including ligand binding and macromolecular interactions, thereby bridging the gap between static structure prediction and functional molecular recognition [72]. However, challenges remain in achieving consistent accuracy for protein–ligand interactions, nucleic acid complexes, and post-translational modifications.
Continued integration with molecular dynamics, docking, and experimental validation will be essential to fully realize their potential in drug discovery workflows. AI-guided enrichment loops interleave with physics-based steps: an initial docking subset trains a model (e.g., Deep Docking) that triages the remaining library; top predictions are re-docked, MD-filtered, and funneled to synthesis. At larger scale, such iterations can be parallelized across thousands of cores/GPUs using VirtualFlow-style orchestrators, with cheminformatics clustering to maintain scaffold diversity. The result is higher hit rates and more chemotypes per compound synthesized than with conventional fixed-screen strategies. Beyond conventional machine learning, recent advances have led to integrated artificial intelligence (AI) and large language model (LLM)-driven platforms for drug discovery [73,74], including systems such as PharmAgents, FROGENT, LIDDiA, AgentD, and AutoBinder Agent. These platforms aim to unify steps of the drug discovery process by combining generative modeling, cheminformatics, molecular design, and automated reasoning in end-to-end or closed-loop applications. With the help of LLMs, they can aid hypothesis generation, suggest new chemical structures, and refine candidate properties through iterative feedback loops, shortening the design–make–test cycle. Although these platforms are promising, several challenges remain. Rationalizing proposed molecules or mechanisms is difficult because the interpretability of model decisions remains limited. The quality and diversity of training data determine the reliability and generalizability of predictions, raising concerns about bias and robustness.
Moreover, benchmarks for evaluating these platforms have yet to be standardized, making it difficult to compare methods objectively. Above all, predictions from these systems must be confirmed by experimental validation, since in silico performance does not necessarily translate into biological activity. Thus, although LLM-driven platforms are an important step toward autonomous drug discovery, they should be regarded as supplementary tools that complement, but do not replace, established computational and experimental strategies [75,76,77]. These platforms operate within closed-loop frameworks that integrate computational prediction with iterative feedback, which can improve efficiency, scalability, and hit identification in modern drug discovery pipelines.

Several studies demonstrate this synergistic effect. Ultra-large SBVS campaigns, followed by rapid on-demand synthesis of hits and iterative lead optimization to the nanomolar range, have shown that scale plus incremental improvement can yield first-in-class scaffolds. During medicinal chemistry cycles, FEP+ has prospectively influenced analog selection and reduced the synthesis workload by focusing on the most promising substitutions. Nevertheless, docking scores do not correlate well with affinity, and enrichment depends on the target and protein state. Consensus scoring, physics-based rescoring (MM/GBSA or FEP), and MD-derived stability metrics only partially close this gap (Table 5). In addition, VS results are highly dependent on the curation of protein structures, protonation/tautomer states, and assay descriptors. In line with community guidance, we recommend prioritizing careful curation of chemical/biological data, open workflows, and notebooks/containers that can be re-run for reproducibility and reuse.
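Consensus scoring, mentioned above as a partial remedy for unreliable docking scores, is often implemented by averaging ranks rather than raw scores, since scores from different methods are on different scales. The sketch below shows one such rank-averaging variant; the ligand names, score values, and method labels are invented for illustration, and ties are not handled.

```python
def ranks(scores):
    """Map each ligand to its rank (1 = best); lower scores are better."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consensus_rank(score_tables):
    """Order ligands by their mean rank across several scoring methods."""
    names = score_tables[0].keys()
    per_method = [ranks(table) for table in score_tables]
    mean_rank = {n: sum(r[n] for r in per_method) / len(per_method) for n in names}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Hypothetical scores (kcal/mol-like units) from three methods.
docking_a = {"lig_A": -9.1, "lig_B": -7.5, "lig_C": -8.2, "lig_D": -6.0}
docking_b = {"lig_A": -8.0, "lig_B": -8.4, "lig_C": -8.1, "lig_D": -5.5}
endpoint = {"lig_A": -45.0, "lig_B": -30.0, "lig_C": -41.0, "lig_D": -20.0}

consensus = consensus_rank([docking_a, docking_b, endpoint])
for name, mean in consensus:
    print(f"{name}: mean rank {mean:.2f}")
```

Rank averaging rewards ligands that score consistently well across methods and dampens the influence of a single outlying score, although it cannot correct errors that all methods share.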

The FAIR data principles also advocate for findable, accessible, interoperable, and reusable datasets and models. Popular enrichment benchmarks such as DUD-E catalyzed method development but also introduced analog/decoy biases that can inflate performance estimates; careful benchmark selection and “bias-controlled” tests are essential. When possible, prospective validation or blinded external sets should be used to verify generalization. As AI prioritizes larger fractions of libraries, interpretability (e.g., pharmacophore/interaction attributions), uncertainty quantification, and documentation of training data become central for decision-making, auditing, and regulatory dialogue. VS increasingly draws on integrated public/private data; teams should respect licenses, privacy constraints, and data-sharing policies while adhering to FAIR/traceability practices.

Quantum algorithms promise advantages for electronic-structure problems central to binding and reactivity, potentially improving the fidelity of scoring and induced-fit modeling. Near-term devices (via VQE/ADAPT-VQE) and error-mitigated simulations target small active sites and fragments; for the medium term, hybrid quantum-classical workflows may refine docking poses or parameterize bespoke interactions that challenge classical force fields. Although timelines remain uncertain, the trajectory suggests domain-specific quantum acceleration will complement rather than replace classical VS/MD.

Foundation models and reinforcement learning increasingly propose synthetically accessible, property-constrained molecules that meet multi-parameter objectives before docking/MD. Community benchmarks (GuacaMol, MOSES) have improved rigor in evaluating distribution learning, validity, novelty, and goal-directed optimization (Table 6). The emerging best practice is closed-loop design: generate→triage by QSAR/ADMET→dock/score→MD refine→pick for synthesis→feedback assay data to retrain.
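The enrichment benchmarks discussed in this section are usually summarized by an enrichment factor (EF): the active rate in the top-ranked fraction of a screen divided by the active rate in the whole library. A minimal sketch follows, using an invented toy screen of 100 compounds; the function name and data are assumptions for illustration.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size)."""
    n_total = len(scores)
    n_top = max(1, round(n_total * top_fraction))
    ranked = sorted(range(n_total), key=lambda i: scores[i])  # lower score = better
    actives_top = sum(is_active[i] for i in ranked[:n_top])
    actives_total = sum(is_active)
    return (actives_top / n_top) / (actives_total / n_total)


# Toy screen (assumption): compound i has score i; ten compounds are true actives.
scores = [float(i) for i in range(100)]
active_ids = {0, 1, 2, 3, 4, 20, 30, 40, 50, 60}
is_active = [i in active_ids for i in range(100)]

ef10 = enrichment_factor(scores, is_active, top_fraction=0.10)
print("EF at 10%:", ef10)
```

An EF of 1 corresponds to random selection. Because decoy and analog bias in a benchmark can inflate EF without improving prospective performance, the same calculation is most informative when applied to blinded or prospectively collected sets.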
Integration of patient-derived omics, perturbation signatures (e.g., the next-generation Connectivity Map), and target structures will enable individualized VS campaigns: predict vulnerabilities from transcriptomic/proteomic profiles, prioritize compounds with matched mechanisms, and validate in patient-derived models (Figure 5). Computational scaling and causal modeling will be key to translating this promise to the clinic. Routine near-atomic cryo-EM now resolves dynamic states and ligandable cryptic pockets for previously intractable targets. Combining cryo-EM ensembles with MD-derived conformational sampling supports ensemble docking and pocket-state-aware VS, especially for allosteric and membrane proteins. The future of VS is elastic: orchestrators that spin up tens of thousands of CPU/GPU cores on demand, autoscale storage and databases, and couple active learning with make-on-demand synthesis vendors. Platforms such as VirtualFlow already demonstrate billion-scale SBVS with modular pre-/post-filters, while integration with enterprise cheminformatics ensures traceable data lineage from screen to clinic.

9. Conclusions

Recent advances across structure-based and ligand-based approaches have reshaped small-molecule discovery. Docking engines sampled poses more efficiently and, when coupled with consensus or physics-grounded rescoring and explicit treatment of waters, delivered stronger early enrichment. Pharmacophore modeling progressed from geometric templates to energy-weighted and ensemble hypotheses that captured receptor plasticity and enabled credible scaffold hopping. Molecular dynamics routinely reached microsecond scales on accessible hardware, revealed cryptic pockets, quantified water networks, and supported endpoint and alchemical free-energy calculations suitable for prospective decisions. Ultra-large virtual screening became practical through curated make-on-demand libraries, elastic computing, and reliable workflow automation, allowing campaigns across hundreds of millions of candidates. High-accuracy structure prediction broadened the tractable target set and, after limited refinement, supported routine structure-based campaigns. The greatest gains arose when methods were combined rather than used alone. Libraries shaped by medicinal-chemistry rules and pharmacophore filters were docked at scale, then triaged by contact patterns, strain energy, synthetic tractability, and diversity. Short dynamics runs stabilized promising poses, discarded strained ones, and supplied trajectory-derived pharmacophores. For closely related analogs, rigorous free-energy calculations ranked substitutions at decision-making resolution and reduced avoidable synthesis. Throughout, biophysical confirmation, structural follow-up, and cellular assays closed the loop from computation to experiment. This integrated cycle shortened design–make–test timelines, increased hit novelty, and improved the credibility of advancement decisions. Sustained progress depended on standardization, transparency, and collaboration.
Preparation protocols, protonation and tautomer rules, and threshold criteria were documented, versioned, and shared. Benchmarks addressed bias and emphasized prospective validation with metrics for early recognition and scaffold novelty. Datasets, models, and code followed FAIR principles to enable reuse and independent checks. Finally, partnerships across academia, industry, and computing providers, aligned around open libraries, make-on-demand synthesis, and orthogonal assay panels, helped translate scalable computation into reproducible drug discovery with greater efficiency and lower cost.

Author Contributions

Conceptualization, A.A.; methodology, A.A.; validation, S.M.A.; formal analysis, A.A. and S.M.A.; investigation, A.A.; writing—original draft preparation, A.A.; writing—review and editing, A.A. and S.M.A.; visualization, S.M.A.; supervision, A.A.; project administration, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Prince Sattam bin Abdulaziz University research project number (PSAU/2025/03/36618).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable.

Acknowledgments

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/03/36618).

Conflicts of Interest

The authors declare no conflict of interest.

References

Nagpure, N.; Raut, H.; Kamble, S.; Rasala, T. Redefining Preclinical Research Paradigms: AI-Driven Drug Discovery as a Transformative Approach to Accelerate Innovation, Improve Predictive Accuracy, and Reduce Reliance on Animal Testing. J. Drug Deliv. Ther. 2025, 15, 115–128. [Google Scholar] [CrossRef]
Pathak, S.; Kushwaha, S.P.; Verma, S.; Deep, P.; Gupta, D. Advances in Computational Chemistry for Drug Discovery. J. Drug Discov. Health Sci. 2024, 1, 146–152. [Google Scholar] [CrossRef]
Serrano-Morrás, Á.; Bertran-Mostazo, A.; Miñarro-Lleonar, M.; Comajuncosa-Creus, A.; Cabello, A.; Labranya, C.; Escudero, C.; Tian, T.V.; Khutorianska, I.; Radchenko, D.S.; et al. A bottom-up approach to find lead compounds in expansive chemical spaces. Commun. Chem. 2025, 8, 225. [Google Scholar] [CrossRef] [PubMed]
Corrêa Veríssimo, G.; Salgado Ferreira, R.; Gonçalves Maltarollo, V. Ultra-Large Virtual Screening: Definition, Recent Advances, and Challenges in Drug Design. Mol. Inform. 2025, 44, e202400305. [Google Scholar] [CrossRef]
Veríssimo, R.F.; Matias, P.H.F.; Barbosa, M.R.; Neto, F.O.S.; Neto, B.A.D.; de Oliveira, H.C.B. Integrating Machine Learning and SHAP Analysis to Advance the Rational Design of Benzothiadiazole Derivatives with Tailored Photophysical Properties. J. Chem. Inf. Model. 2025, 65, 7874–7886. [Google Scholar] [CrossRef]
Bhagat, R.T.; Butle, S.R.; Khobragade, D.S.; Wankhede, S.B.; Prasad, C.C.; Mahure, D.S.; Armarkar, A.V. Molecular Docking in Drug Discovery. J. Pharm. Res. Int. 2021, 33, 46–58. [Google Scholar] [CrossRef]
Khattri, R.B.; Morris, D.L.; Bilinovich, S.M.; Manandhar, E.; Napper, K.R.; Sweet, J.W.; Modarelli, D.A.; Leeper, T.C. Identifying Ortholog Selective Fragment Molecules for Bacterial Glutaredoxins by NMR and Affinity Enhancement by Modification with an Acrylamide Warhead. Molecules 2019, 25, 147. [Google Scholar] [CrossRef]
Awoonor-Williams, E.; Dickson, C.J.; Furet, P.; Golosov, A.A.; Hornak, V. Leveraging Advanced In Silico Techniques in Early Drug Discovery: A Study of Potent Small-Molecule YAP-TEAD PPI Disruptors. J. Chem. Inf. Model. 2023, 63, 2520–2531. [Google Scholar] [CrossRef]
Korb, O.; Finn, P.W.; Jones, G. The cloud and other new computational methods to improve molecular modelling. Expert Opin. Drug Discov. 2014, 9, 1121–1131. [Google Scholar] [CrossRef] [PubMed]
Zhou, G.; Rusnac, D.V.; Park, H.; Canzani, D.; Nguyen, H.M.; Stewart, L.; Bush, M.F.; Nguyen, P.T.; Wulff, H.; Yarov-Yarovoy, V.; et al. An artificial intelligence accelerated virtual screening platform for drug discovery. Nat. Commun. 2024, 15, 7761. [Google Scholar] [CrossRef]
Jiang, J.; Wang, G.; Li, D.; Hayes, N.; Jones, B.; Shi, Y.; Qiu, H.; Zhang, B.; Zhou, T.; Wei, G.W. Unexpected Applications of AlphaFold in Molecular Sciences. Annu. Rev. Biochem. 2026. ahead of print. [Google Scholar] [CrossRef]
Zhao, N.; Wu, T.; Wang, W.; Zhang, L.; Gong, X. Review and comparative analysis of methods and advancements in predicting protein complex structure. Interdiscip. Sci. Comput. Life Sci. 2024, 16, 261–288. [Google Scholar] [CrossRef]
Blake, J.; Laird, E. Chapter 30. Recent advances in virtual ligand screening. Annu. Rep. Med. Chem. 2003, 38, 305–314. [Google Scholar]
Kumar, S.; Kumar, Y. Innovations in molecular docking: A detailed analysis of methodological developments and their applications in drug discovery. Int. J. Pharma Prof. Res. 2024, 15, 52–67. [Google Scholar] [CrossRef]
Cele, F.N.; Ramesh, M.; Soliman, M.E. Per-residue energy decomposition pharmacophore model to enhance virtual screening in drug discovery: A study for identification of reverse transcriptase inhibitors as potential anti-HIV agents. Drug Des. Dev. Ther. 2016, 10, 1365–1377. [Google Scholar]
Joseph-McCarthy, D.; Thomas, B.E.; Belmarsh, M.; Moustakas, D.; Alvarez, J.C. Pharmacophore-based molecular docking to account for ligand flexibility. Proteins 2003, 51, 172–188. [Google Scholar] [CrossRef] [PubMed]
Mahrous, R.S.; Fathy, H.M.; Abu EL-Khair, R.M.; Omar, A.A.; Ibrahim, R.S. Molecular docking and pharmacophore modelling; A bridged explanation with emphasis on validation. J. Adv. Pharm. Sci. 2024, 1, 138–152. [Google Scholar] [CrossRef]
Jiménez-Luna, J.; Grisoni, F.; Weskamp, N.; Schneider, G. Artificial intelligence in drug discovery: Recent advances and future perspectives. Expert Opin. Drug Discov. 2021, 16, 949–959. [Google Scholar] [CrossRef] [PubMed]
Frank, Y.; Unger, R.; Senderowitz, H. Statistical analysis of sequential motifs at biologically relevant protein-protein interfaces. Comput. Struct. Biotechnol. J. 2024, 23, 1244–1259. [Google Scholar] [CrossRef]
Sadybekov, A.V.; Katritch, V. Computational approaches streamlining drug discovery. Nature 2023, 616, 673–685. [Google Scholar] [CrossRef]
Mihai, D.P.; Nitulescu, G.M. Computer-aided drug design and drug discovery. Pharmaceuticals 2025, 18, 436. [Google Scholar] [CrossRef]
Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Repasky, M.P.; Knoll, E.H.; Shelley, M.; Perry, J.K.; et al. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749. [Google Scholar] [CrossRef] [PubMed]
Hollingsworth, S.A.; Dror, R.O. Molecular Dynamics Simulation for All. Neuron 2018, 99, 1129–1143. [Google Scholar] [CrossRef] [PubMed]
Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898. [Google Scholar] [CrossRef] [PubMed]
Friesner, R.A.; Murphy, R.B.; Zhang, Y.; Xiong, Y.; Devlaminck, P.A.; Tubert-Brohman, I.; Jerome, S.V. Glide WS: Methodology and Initial Assessment of Performance for Docking Accuracy and Virtual Screening. J. Chem. Theory Comput. 2025, 21, 12696–12708. [Google Scholar] [CrossRef] [PubMed]
Garbagnoli, M.; Linciano, P.; Listro, R.; Rossino, G.; Vasile, F.; Collina, S. Biophysical Assays for Investigating Modulators of Macromolecular Complexes: An Overview. ACS Omega 2024, 9, 17691–17705. [Google Scholar] [CrossRef]
Dixon, S.L.; Smondyrev, A.M.; Knoll, E.H.; Rao, S.N.; Shaw, D.E.; Friesner, R.A. PHASE: A new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J. Comput. Aided Mol. Des. 2006, 20, 647–671. [Google Scholar] [CrossRef]
Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2001, 46, 3–26. [Google Scholar] [CrossRef]
Ewing, T.J.; Makino, S.; Skillman, A.G.; Kuntz, I.D. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 2001, 15, 411–428. [Google Scholar] [CrossRef] [PubMed]
Zhong, S.; Zhang, Y.; Xiu, Z. Rescoring ligand docking poses. Curr. Opin. Drug Discov. Dev. 2010, 13, 326–334. [Google Scholar]
Labute, P. LowModeMD—Implicit low-mode velocity filtering applied to conformational search of macrocycles and protein loops. J. Chem. Inf. Model. 2010, 50, 792–800. [Google Scholar] [CrossRef]
Klimenko, K. Computer-Aided Drug Design of Broad-Spectrum Antiviral Compounds. Ph.D. Thesis, Institut AV Bogatsky de Chimie Physique, Université de Strasbourg, Odessa, Ukraine, 2017. [Google Scholar]
Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B.L.; Grubmüller, H.; MacKerell, A.D., Jr. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73. [Google Scholar] [CrossRef]
Maier, J.A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K.E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. [Google Scholar] [CrossRef] [PubMed]
Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J.M.; Lu, C.; Dahlgren, M.K.; Mondal, S.; Chen, W.; Wang, L.; et al. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput. 2019, 15, 1863–1874. [Google Scholar] [CrossRef] [PubMed]
Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20-A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073. [Google Scholar] [CrossRef] [PubMed]
Mysinger, M.M.; Carchia, M.; Irwin, J.J.; Shoichet, B.K. Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking. J. Med. Chem. 2012, 55, 6582–6594. [Google Scholar] [CrossRef]
Venkatraman, V.; Colligan, T.H.; Lesica, G.T.; Olson, D.R.; Gaiser, J.; Copeland, C.J.; Wheeler, T.J.; Roy, A. Drugsniffer: An open source workflow for virtually screening billions of molecules for binding affinity to protein targets. Front. Pharmacol. 2022, 13, 874746. [Google Scholar] [CrossRef]
Gorgulla, C.; Boeszoermenyi, A.; Wang, Z.-F.; Fischer, P.D.; Coote, P.W.; Padmanabha Das, K.M.; Malets, Y.S.; Radchenko, D.S.; Moroz, Y.S.; Scott, D.A. An open-source drug discovery platform enables ultra-large virtual screens. Nature 2020, 580, 663–668. [Google Scholar] [CrossRef]
Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A.P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L.J.; Cibrián-Uhalte, E.; et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017, 45, D945–D954. [Google Scholar] [CrossRef]
Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
Lyskov, S.; Gray, J.J. The RosettaDock server for local protein-protein docking. Nucleic Acids Res. 2008, 36, W233–W238. [Google Scholar] [CrossRef]
Amaro, R.E.; Baudry, J.; Chodera, J.; Demir, Ö.; McCammon, J.A.; Miao, Y.; Smith, J.C. Ensemble Docking in Drug Discovery. Biophys. J. 2018, 114, 2271–2278. [Google Scholar] [CrossRef] [PubMed]
McNutt, A.T.; Francoeur, P.; Aggarwal, R.; Masuda, T.; Meli, R.; Ragoza, M.; Sunseri, J.; Koes, D.R. GNINA 1.0: Molecular docking with deep learning. J. Cheminform. 2021, 13, 43. [Google Scholar] [CrossRef]
Cavasotto, C.N.; Adler, N.S.; Aucar, M.G. Quantum Chemical Approaches in Structure-Based Virtual Screening and Lead Optimization. Front. Chem. 2018, 6, 188. [Google Scholar] [CrossRef] [PubMed]
Teng, H.; Wang, R.; Shen, Y.; Yuan, Y.; Kingsford, C. DTMol: Pocket-based molecular docking using diffusion transformers. bioRxiv 2025.
Shaker, B.; Barakat, K. Harnessing Deep Learning and Generative AI for Molecular Docking Simulations: Tools, Challenges, and Future Directions. In Molecular Docking in Biomedical Engineering and Computational Chemistry; IntechOpen: London, UK, 2025.
Jin, Z.; Du, X.; Xu, Y.; Deng, Y.; Liu, M.; Zhao, Y.; Zhang, B.; Li, X.; Zhang, L.; Peng, C.; et al. Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors. Nature 2020, 582, 289–293.
Manglik, A.; Lin, H.; Aryal, D.K.; McCorvy, J.D.; Dengler, D.; Corder, G.; Levit, A.; Kling, R.C.; Bernat, V.; Hübner, H.; et al. Structure-based discovery of opioid analgesics with reduced side effects. Nature 2016, 537, 185–190.
IUPAC. Pharmacophore, 5.0.0 ed.; IUPAC, Ed.; International Union of Pure and Applied Chemistry (IUPAC): Research Triangle Park, NC, USA, 2025.
Imrie, F.; Hadfield, T.E.; Bradley, A.R.; Deane, C.M. Deep generative design with 3D pharmacophoric constraints. Chem. Sci. 2021, 12, 14577–14589.
Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461.
Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; Taylor, R.D. Improved protein–ligand docking using GOLD. Proteins Struct. Funct. Bioinform. 2003, 52, 609–623.
Wolber, G.; Langer, T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 2005, 45, 160–169.
Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380.
Ruddigkeit, L.; van Deursen, R.; Blum, L.C.; Reymond, J.L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875.
Chen, L.; Cruz, A.; Ramsey, S.; Dickson, C.J.; Duca, J.S.; Hornak, V.; Koes, D.R.; Kurtzman, T. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS ONE 2019, 14, e0220113.
Brown, N.; Fiscato, M.; Segler, M.H.S.; Vaucher, A.C. GuacaMol: Benchmarking Models for de Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108.
Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front. Pharmacol. 2020, 11, 565644.
Hamelberg, D.; Mongan, J.; McCammon, J.A. Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004, 120, 11919–11929.
Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics Poisson-Boltzmann Surface Area Method. Mol. Inform. 2012, 31, 114–122.
Chodera, J.D.; Noé, F. Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135–144.
Mardt, A.; Pasquali, L.; Wu, H.; Noé, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 2018, 9, 5.
Lyu, J.; Wang, S.; Balius, T.E.; Singh, I.; Levit, A.; Moroz, Y.S.; O’Meara, M.J.; Che, T.; Algaa, E.; Tolmachova, K.; et al. Ultra-large library docking for discovering new chemotypes. Nature 2019, 566, 224–229.
Ain, Q.U.; Aleksandrova, A.; Roessler, F.D.; Ballester, P.J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2015, 5, 405–424.
Chen, P.; Ke, Y.; Lu, Y.; Du, Y.; Li, J.; Yan, H.; Zhao, H.; Zhou, Y.; Yang, Y. DLIGAND2: An improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J. Cheminform. 2019, 11, 52.
Li, H.; Sze, K.H.; Lu, G.; Ballester, P.J. Machine-learning scoring functions for structure-based drug lead optimization. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020, 10, e1465.
Shirts, M.R.; Mobley, D.L.; Brown, S.P. Free-energy calculations in structure-based drug design. In Drug Design: Structure- and Ligand-Based Approaches; Cambridge University Press: Cambridge, UK, 2010; Volume 1, pp. 61–86.
Horn, H.W.; Swope, W.C.; Pitera, J.W.; Madura, J.D.; Dick, T.J.; Hura, G.L.; Head-Gordon, T. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120, 9665–9678.
Sethi, A.; Agrawal, N.; Brezovsky, J. Impact of water models on the structure and dynamics of enzyme tunnels. Comput. Struct. Biotechnol. J. 2024, 23, 3946–3954.
Sun, Q.; Li, Y.J.; Ning, S.B. Investigating the molecular mechanisms underlying the co-occurrence of Parkinson’s disease and inflammatory bowel disease through the integration of multiple datasets. Sci. Rep. 2024, 14, 17028.
Desai, D.; Kantliwala, S.V.; Vybhavi, J.; Ravi, R.; Patel, H.; Patel, J. Review of AlphaFold 3: Transformative advances in drug design and therapeutics. Cureus 2024, 16, e63646.
Mak, K.-K.; Wong, Y.-H.; Pichika, M.R. Artificial intelligence in drug discovery and development. In Drug Discovery and Evaluation: Safety and Pharmacokinetic Assays; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1461–1498.
Zheng, Y.; Koh, H.Y.; Ju, J.; Yang, M.; May, L.T.; Webb, G.I.; Li, L.; Pan, S.; Church, G. Large language models for drug discovery and development. Patterns 2025, 6, 101346.
Averly, R.; Baker, F.N.; Watson, I.A.; Ning, X. LIDDIA: Language-based Intelligent Drug Discovery Agent. Proc. Conf. Empir. Methods Nat. Lang. Process. 2025, 2025, 12015–12039.
Ge, F.; Zhu, J.; Zhang, L.; Xiao, H.; Bao, X.; Xie, F.; Chen, D.; Lu, Y.; Wang, Y.; Guan, Z. AutoBinder Agent: An MCP-Based Agent for End-to-End Protein Binder Design. arXiv 2026, arXiv:2602.00019.
Pan, Q.; Xu, D.; Yao, J.X.; Ma, L.; Zhu, Z.; Ji, J. Frogent: An end-to-end full-process drug design agent. arXiv 2025, arXiv:2508.10760.
Kiss, L. Development of Novel Assays and Application of Innovative Screening Approaches for Improving Hit Discovery Efficiency of G Protein-Coupled Receptor Targets. Ph.D. Dissertation, Semmelweis University, Budapest, Hungary, 2021.
Velazquez-Campoy, A.; Claro, B.; Abian, O.; Höring, J.; Bourlon, L.; Claveria-Gimeno, R.; Ennifar, E.; England, P.; Chaires, J.B.; Wu, D.; et al. A multi-laboratory benchmark study of isothermal titration calorimetry (ITC) using Ca(2+) and Mg(2+) binding to EDTA. Eur. Biophys. J. 2021, 50, 429–451.
Wienken, C.J.; Baaske, P.; Rothbauer, U.; Braun, D.; Duhr, S. Protein-binding assays in biological liquids using microscale thermophoresis. Nat. Commun. 2010, 1, 100.
Gucwa, M.; Bijak, V.; Zheng, H.; Murzyn, K.; Minor, W. CheckMyMetal (CMM): Validating metal-binding sites in X-ray and cryo-EM data. IUCrJ 2024, 11, 871–877.
Roskoski, R., Jr. Rule of five violations among the FDA-approved small molecule protein kinase inhibitors. Pharmacol. Res. 2023, 191, 106774.
Reese, T.C.; Devineni, A.; Smith, T.; Lalami, I.; Ahn, J.M.; Raj, G.V. Evaluating physiochemical properties of FDA-approved orally administered drugs. Expert Opin. Drug Discov. 2024, 19, 225–238.
Hornberger, K.R.; Araujo, E.M.V. Physicochemical Property Determinants of Oral Absorption for PROTAC Protein Degraders. J. Med. Chem. 2023, 66, 8281–8287.
O’Donovan, D.H.; De Fusco, C.; Kuhnke, L.; Reichel, A. Trends in Molecular Properties, Bioavailability, and Permeability across the Bayer Compound Collection. J. Med. Chem. 2023, 66, 2347–2360.
Baell, J.B.; Holloway, G.A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 2010, 53, 2719–2740.
Brenk, R.; Schipani, A.; James, D.; Krasowski, A.; Gilbert, I.H.; Frearson, J.; Wyatt, P.G. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 2008, 3, 435–444.
Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
Fourches, D.; Muratov, E.; Tropsha, A. Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation. J. Chem. Inf. Model. 2016, 56, 1243–1252.
Scantlebury, J.; Vost, L.; Carbery, A.; Hadfield, T.E.; Turnbull, O.M.; Brown, N.; Chenthamarakshan, V.; Das, P.; Grosjean, H.; von Delft, F.; et al. A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening. J. Chem. Inf. Model. 2023, 63, 2960–2974.
Chen, L.; Blay, V.; Ballester, P.J.; Houston, D.R. SCORCH2: A Generalized Heterogeneous Consensus Model for High-Enrichment Interaction-Based Virtual Screening. Adv. Sci. 2025, 12, e08318.
Durant, G.; Boyles, F.; Birchall, K.; Marsden, B.; Deane, C.M. Robustly interrogating machine learning-based scoring functions: What are they learning? Bioinformatics 2025, 41, btaf040.

Figure 1. Workflow from target identification to lead optimization. Data resources feed docking, pharmacophore modeling, and virtual screening. Integration and experimental workflows provide feedback, enabling ultra-large library search and prioritization of chemotypes.

Figure 2. Framework linking data repositories to docking, pharmacophore modeling, and molecular dynamics, supported by experimental workflows. Panels summarize software, advances, and limitations; integration feedback highlights key antiviral, oncology, and CNS applications.

Figure 3. Comprehensive schematic linking data sources to pharmacophore modeling, molecular dynamics, virtual screening, and docking, plus validation metrics. Arrows depict integration feedback, with applications and limitations: scoring, flexibility, waters, metals, uncertainty.

Figure 4. Data sources, docking, pharmacophore modeling, and virtual screening with representative software. Lower panels present advances, applications, and limitations; laboratory photographs illustrate assay context alongside computational outputs.

Figure 5. Approaches, libraries, tools, breakthroughs, validation, integration, challenges, and future directions. Boxes, arrows, and photographs summarize workflows from library design and docking to assay confirmation, along with considerations that enable discovery.

Table 1. Core computational approaches and where they fit in drug discovery programs.

Modality	Primary Goal	Typical Inputs	Key Outputs	Strengths	Main Limitations	Representative Tools	Citation(s)
Molecular docking	Predict binding mode and rank candidates	Prepared protein (X-ray/cryo-EM/AF2), curated ligands, microstates	Poses, interaction maps, docking scores	Fast triage; structure-aware hypotheses	Scoring/solvation approximations; limited receptor flexibility	AutoDock Vina, Glide, GOLD, DOCK	[22,29,30]
Pharmacophore modeling	Capture essential 3D features for activity	Active ligands and/or protein–ligand complexes	Feature hypotheses; 3D queries	Scaffold hopping; ultra-fast pre-filtering	Overfitting risk; conformer/feature quality sensitive	PHASE, LigandScout, MOE, PharmaGist	[27,31,32]
Molecular dynamics (MD)	Probe flexibility, water networks, pose stability	Protein–ligand complex, force field, solvent/ions	Trajectories, RMSD/RMSF, H-bond/water analyses; MM-GB/SA/FEP	Mechanistic insight; validates poses; supports ΔΔG	Sampling cost; FF/setup sensitivity	CHARMM36m, AMBER ff14SB, OPLS3e	[23,33,34,35]
Virtual screening (VS)	Prioritize hits from ultra-large libraries	ZINC/ChEMBL/PubChem/GDB; docking/LBVS filters	Ranked shortlists; clustered chemotypes	Scales to 10⁸–10⁹; cloud/HPC ready	Benchmark bias; hit confirmation required	VirtualFlow; LBVS + SBVS pipelines	[36,37,38,39]

Table 2. Software, platforms, and data resources used across workflows.

Category	Tool/Resource	License	Notable Capabilities	Typical Scale	Citation
Docking engine	AutoDock Vina	Open source	Stochastic search; empirical scoring; multithreaded	10⁵–10⁷	[52]
Docking engine	Glide (HTVS/SP/XP)	Commercial	Hierarchical filters; pose refinement; XP scoring	10⁵–10⁷	[22]
Docking engine	GOLD	Commercial	GA search; side-chain flexibility; constraints	10⁵–10⁶	[53]
Docking engine	DOCK	Open source	Shape/grid; anchor-and-grow; modular	10⁵–10⁷	[29]
Protein–protein docking	RosettaDock (server)	Academic	Rigid-body + side-chain optimization; web access	Interface sets	[42]
Pharmacophore	PHASE	Commercial	Common-pharmacophore ID; 3D-QSAR	10⁶–10⁸ (filter)	[27]
Pharmacophore	LigandScout	Commercial/academic	Feature extraction from complexes; 3D queries	10⁶–10⁸	[54]
Suite	MOE (+ LowModeMD)	Commercial	Conformers; pocket/feature tools; induced-fit aids	Project-scale	[31]
Orchestrator (VS)	VirtualFlow	Open source	Billion-scale docking; engine-agnostic; cloud/HPC	10⁸–10⁹+	[39]
Library	ZINC20/22	Free	Purchasable make-on-demand; ready-to-dock	10⁹-class	[36]
Library	ChEMBL	Free	Assay-annotated activities; targets	Millions of activities	[40]
Library	PubChem	Free	Open compounds and bioassays	100M+ compounds	[55]
Enumerated universe	GDB-17	Free	166B enumerated molecules	10¹¹+	[56]
Structures	Protein Data Bank (PDB)	Free	Standardized macromolecular structures	200k+ entries	[24,25]
Benchmark	DUD-E	Free	Actives/decoys for many targets	Benchmark sets	[37]
Benchmark critique	DUD-E bias analysis	Free	Analog/decoy bias diagnostics	Benchmark audit	[57]
Structure prediction	AlphaFold	Free for models	Near-experimental single-chain models	Proteome-scale	[41]
ML benchmarks	GuacaMol/MOSES	Open source	Generative design benchmarking	Model-dependent	[58,59]

Table 3. Drugs delivered with computational pipelines (docking/pharmacophore/MD/VS).

Drug (Year)	Target/Indication	Primary Computational Technique(s) Credited	What the Pipeline Contributed (Very Brief)
Captopril (1981)	ACE/hypertension	Early structure-based design guided by carboxypeptidase-A models	Transition-state/active-site modeling yielded thiol-bearing inhibitors with oral activity
Zanamivir (1999)	Influenza neuraminidase/influenza	SBDD + docking on NA crystal structures	Rational modifications to sialic-acid scaffold to exploit NA catalytic site; first-in-class NA inhibitor
Saquinavir (1995)	HIV-1 protease/HIV	SBDD from protease–peptide complexes	Transition-state mimic designed from active-site geometry; launched first HIV PI
Indinavir (1996)	HIV-1 protease/HIV	SBDD with iterative docking/optimization	Optimized P1/P2 groups for S1/S2 subsites; improved oral PK
Ritonavir (1996)	HIV-1 protease/HIV	SBDD (crystallography-guided)	Potent PI that became PK booster after metabolic insights
Dorzolamide (1995)	Carbonic anhydrase II/glaucoma	SBDD	Active-site geometry (Zn²⁺ coordination) guided sulfonamide design
Tirofiban (1998)	Integrin αIIbβ3/ACS	SBDD/mimetic design from RGD-ligand structures	Crystal structures with tirofiban/eptifibatide informed small-molecule antagonist design
Aliskiren (2007)	Renin/hypertension	SBDD + modeling on renin structures	Non-peptidic scaffold engineered for S1/S3/S1′ pockets; oral renin inhibitor
Boceprevir (2011)	HCV NS3 protease/HCV	SBDD	Warhead and P1/P2 optimization for covalent reversible inhibition
Rivaroxaban (2011)	Factor Xa/anticoagulation	SBDD + crystallography	Structure-guided optimization and FXa co-crystal analysis supported binding-mode tuning
Baloxavir marboxil (2018)	Influenza cap-dependent endonuclease/influenza	SBDD on PA endonuclease	Metal-chelation pharmacophore and pocket mapping drove first-in-class CEN inhibitor

Table 4. Scoring families, force fields, and free-energy methods (what they model and when to use).

Domain	Method/Model	What It Models/Optimizes	Strengths	Limitations	Typical Use	Citation
Docking scoring (empirical)	GlideScore; AutoDock4/Vina	vdW, H-bonding, desolvation terms fit to data	Fast; good enrichment	Transferability limits	Primary SBVS ranking	[22]
Docking scoring (knowledge-based)	Statistical PMF/X-Score (generic)	Statistical atom–atom potentials	Simple; robust	Coarse physics	Complementary rescoring	[57,65,66]
Docking scoring (ML)	RF-Score; NNScore; GNINA CNN	Data-learned pose/affinity from structures	Captures nonlinearity; strong top-N	Needs curated data; bias risk	Rescoring top poses	[44]
Physics-like ranking	MM-PB/GBSA	Continuum solvation + force field	Interpretable; fast triage	Dielectric/entropy sensitive	Post-docking triage	[61]
Alchemical ΔΔG	FEP (RBFE)	Relative binding free energy	~1 kcal·mol⁻¹ resolution	Setup/sampling cost	Lead optimization	[67]
Alchemical ΔG analysis	Thermodynamic integration	Gradient-based alchemy (λ windows)	Rigorous; general	Complex setup	Tight structure–activity relationship (SAR) decisions	[68]
Force field (proteins)	CHARMM36m	Folded and IDP proteins	Balanced backbone/IDP	χ issues for some residues	General MD	[33]
Force field (proteins)	AMBER ff14SB	Proteins	Updated side-chain/backbone	Needs matched ligand params	General MD	[34]
Force field (proteins/ligands)	OPLS3e	Drug-like ligands + proteins	Broad ligand coverage	Licensed	Lead optimization MD	[35]
Water models	TIP3P; TIP4P-Ew	Solvent representation	Standardized hydration	Model-specific limits	Routine MD	[69,70,71]
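
The alchemical rows of Table 4 can be made concrete with a small numerical sketch. Thermodynamic integration estimates ΔG as the integral of ⟨∂U/∂λ⟩ over the λ windows, and a relative binding free energy closes a thermodynamic cycle by subtracting the solvent-leg ΔG from the complex-leg ΔG. The Python below is a minimal illustration under stated assumptions: the function names, the trapezoidal rule, and the example window averages are hypothetical and are not taken from the review or from any specific package. Production workflows typically rely on dedicated estimators and error analysis.

```python
import math
from typing import Sequence

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def ti_free_energy(lambdas: Sequence[float], dudl_means: Sequence[float]) -> float:
    """Trapezoidal estimate of dG = integral over lambda of <dU/dlambda>."""
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need at least two windows with matching averages")
    if any(b <= a for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambda values must be strictly increasing")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (dudl_means[i] + dudl_means[i + 1])
    return total


def relative_binding_free_energy(dg_complex: float, dg_solvent: float) -> float:
    """ddG_bind(A->B) = dG_complex(A->B) - dG_solvent(A->B) (thermodynamic cycle)."""
    return dg_complex - dg_solvent


def affinity_ratio(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """K_d(B) / K_d(A) = exp(ddG / RT); values below 1 mean B binds more tightly."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical window averages in kcal/mol, for illustration only.
    lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
    dudl_complex = [-6.0, -4.5, -3.0, -2.0, -1.5]
    dudl_solvent = [-4.0, -3.5, -2.5, -1.5, -1.0]
    ddg = relative_binding_free_energy(
        ti_free_energy(lambdas, dudl_complex),
        ti_free_energy(lambdas, dudl_solvent),
    )
    print(f"ddG = {ddg:.2f} kcal/mol, K_d ratio = {affinity_ratio(ddg):.2f}")
```

The `affinity_ratio` helper shows why the ~1 kcal·mol⁻¹ resolution quoted in Table 4 matters: at room temperature, a ΔΔG of 1 kcal·mol⁻¹ corresponds to roughly a five-fold change in K_d.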

Table 5. Future directions and role of AI in drug discovery.

Direction	What AI/Tech Adds	Concrete Example(s)	Expected Impact
Ultra-large virtual screening (10⁸–10⁹+ molecules)	Cloud/HPC orchestration; adaptive scheduling	VirtualFlow enables billion-scale SBVS, modular docking stacks	Orders-of-magnitude expansion of search space; more novel chemotypes
AI-guided triage for VS	Learn from sparse docking to skip most of the library	Deep Docking cuts compute by ~50×; consensus/pose filters downstream	Same hit rate at fraction of cost/time
DL-rescoring and pose selection	CNN/GDL models refine ranks/poses post-docking	GNINA family improves top-n pose accuracy	Better early enrichment; fewer false positives
Structural coverage via AI	High-accuracy protein structures where experimental structures are lacking	AlphaFold proteome-scale structures	“Unlocks” SBDD for previously intractable targets
Bias-aware benchmarking	Detect spurious dataset signals in training/validation	DUD-E bias analysis cautions DL claims	More reliable, reproducible VS metrics
Omics-driven personalization	Match compounds to patient/pathway signatures	CMap/LINCS L1000 profiles (1.3M signatures)	Indication selection, MoA inference, repurposing
Cryo-EM + MD + ensemble docking	Multi-state targets and cryptic pockets	State-aware docking/MD on EM ensembles	Allosteric targeting; drugging of “undruggable” targets
Foundation models and generative design	Rapid de novo ideas under multi-param constraints	GuacaMol/MOSES benchmarks standardize eval	Tighter design-make-test loops

Table 6. Validation metrics, assays, library filters, and practical mitigations.

Stage	Item/Metric/Assay	Measures/Goal	Practical Notes/Action	When to Use	Citation
Benchmarking	EF1%, EF5%, BEDROC, ROC-AUC	Early recognition and global ranking	Emphasize EF/BEDROC for top-fraction testing	Retrospective method evaluation	[37]
Benchmark audit	DUD-E bias checks	Analog/decoy bias detection	Use bias-controlled splits; external sets	Before claiming generalization	[57]
Orthogonal confirmation	SPR	kon, koff, KD	Control surface artifacts; kinetics insight	Hit/lead confirmation	[78]
Orthogonal confirmation	ITC	ΔH, ΔS, KD, stoichiometry	Thermodynamics; gold standard	Characterize prioritized hits	[79]
Orthogonal confirmation	MST	KD in solution	Low sample; buffer-flexible	Cross-validate binding	[80]
Mode validation	X-ray/cryo-EM	Bound structure and binding mode	Deposit to PDB for reuse	Structural follow-up	[81]
Library filter	Rule-of-Five	Oral developability	Use as soft gate with medicinal review	Pre-screening	[82,83]
Library filter	Veber criteria	Permeability/bioavailability	PSA/rotor thresholds	Pre-screening	[84,85]
Library filter	PAINS	Remove assay-interference chemotypes	Avoid over-filtering true actives	Library assembly	[86]
Library filter	Brenk alerts	Remove problematic fragments	Combine with expert review	Library assembly	[87]
Data stewardship	FAIR principles	Reproducibility and reuse	Version inputs; share code/models	All stages	[88]
Curation	Chemogenomics checklists	Correct labels/states/microstates	Scripted, versioned prep	Before modeling/VS	[89]
Common pitfall	Mis-protonated residues/ligands	Spurious contacts/energies	Enumerate microstates; pKa review	Prep and docking/MD	[89]
Common pitfall	Ignored conserved waters	Missed bridges/affinity	Retain/map waters; test displacement	Docking triage	[23]
Common pitfall	Metal coordination errors	Unrealistic poses/instability	Add constraints; spot-check with QM/MM	Targets with metals	[61]
Common pitfall	Overfitting benchmarks	Inflated enrichment	Prospective tests; external sets	Method claims	[90,91,92]
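
The benchmarking row of Table 6 names EF1%, EF5%, and ROC-AUC. As a rough sketch of how these are computed from a ranked screening list, the Python below defines the enrichment factor as the hit rate in the top fraction divided by the hit rate in the whole library, and computes ROC-AUC as the probability that a randomly chosen active outscores a randomly chosen decoy. The function names and the toy data are illustrative assumptions, and the code assumes that higher scores are better, so docking energies would need to be negated first.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """EF at a top fraction: (actives in top slice / slice size) / (actives / library size)."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(n_total * fraction))
    # Rank by score, best first (assumes higher score = better).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via pairwise comparison of actives and decoys; ties count as half.

    Quadratic in library size, so suitable only for small illustrative sets.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Toy library of 20 compounds with 4 actives (hypothetical data).
    demo_scores = [float(20 - i) for i in range(20)]
    demo_labels = [1 if i in (0, 1, 5, 12) else 0 for i in range(20)]
    print("EF10%:", enrichment_factor(demo_scores, demo_labels, 0.10))
    print("ROC-AUC:", roc_auc(demo_scores, demo_labels))
```

EF rewards only what lands in the tested top fraction, which is why Table 6 recommends emphasizing it over the global ROC-AUC when only a small shortlist will be assayed.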

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Altharawi, A.; Alqahtani, S.M. Integrative Computational Chemistry Approaches in Modern Drug Discovery: Advances in Docking, Pharmacophore Modeling, Molecular Dynamics, and Virtual Screening. Pharmaceutics 2026, 18, 565. https://doi.org/10.3390/pharmaceutics18050565

