CrypTothML: An Integrated Mixed-Solvent Molecular Dynamics Simulation and Machine Learning Approach for Cryptic Site Prediction
Abstract
1. Introduction
2. Results
2.1. Workflow of CrypTothML
2.2. Detection of Hotspots from MSMD Simulations
2.3. Feature Selection
2.4. Development and Evaluation of the Machine Learning Model
2.5. Performance Comparison with Existing Methods
3. Discussion
3.1. Case Study Analysis: Fascin and Androgen Receptor
3.2. Limitations and Future Perspectives
4. Materials and Methods
4.1. Selection of Protein and Dataset
4.2. MSMD Simulations
4.3. Detection of Hotspots Corresponding to Cryptic Sites
- (1) At least 80% of the voxels in the hotspot were within 3.5 Å of residue atoms that sterically clash with the ligand molecule.
- (2) Additionally, at least one voxel in the hotspot was within 4.5 Å of any ligand atom.
- (3) Manual inspection confirms its validity.
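The two geometric criteria, (1) and (2), can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: the function name `hotspot_matches_cryptic_site`, the array layout (one row of x, y, z coordinates in Å per voxel or atom), and the choice of inclusive distance comparisons are all assumptions. Criterion (3), manual inspection, is not automated here.

```python
import numpy as np


def hotspot_matches_cryptic_site(voxels, clash_atoms, ligand_atoms,
                                 clash_cutoff=3.5, ligand_cutoff=4.5,
                                 min_fraction=0.8):
    """Check criteria (1) and (2) for one hotspot (hypothetical helper).

    voxels       -- (N, 3) coordinates of the hotspot voxels, in Å
    clash_atoms  -- (M, 3) coordinates of residue atoms that clash with the ligand
    ligand_atoms -- (K, 3) coordinates of the ligand atoms
    """
    voxels = np.asarray(voxels, dtype=float)
    clash_atoms = np.asarray(clash_atoms, dtype=float)
    ligand_atoms = np.asarray(ligand_atoms, dtype=float)

    # Distance from each voxel to its nearest clashing-residue atom.
    d_clash = np.linalg.norm(
        voxels[:, None, :] - clash_atoms[None, :, :], axis=2
    ).min(axis=1)
    # Distance from each voxel to its nearest ligand atom.
    d_ligand = np.linalg.norm(
        voxels[:, None, :] - ligand_atoms[None, :, :], axis=2
    ).min(axis=1)

    # Criterion (1): at least 80% of voxels lie within 3.5 Å of a clashing atom.
    criterion_1 = np.mean(d_clash <= clash_cutoff) >= min_fraction
    # Criterion (2): at least one voxel lies within 4.5 Å of any ligand atom.
    criterion_2 = np.any(d_ligand <= ligand_cutoff)
    return bool(criterion_1 and criterion_2)
```

Pairwise distances are computed by broadcasting, which is simple but memory-hungry for large inputs; a k-d tree would be a reasonable substitute if hotspots or binding sites contain many points.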
4.4. Feature Extraction for Machine Learning
4.5. Evaluation of the Optimal Machine Learning Model
4.6. Comparison with Existing Methods
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
GAFF2 | General AMBER Force Field 2 |
GFE | Grid Free Energy |
LOOCV | Leave-One-Out Cross-Validation |
MD | Molecular Dynamics |
MSMD | Mixed-Solvent Molecular Dynamics |
PDB | Protein Data Bank |
RMSF | Root Mean Square Fluctuation |
SVM | Support Vector Machine |
Metric | AdaBoost | XGBoost | LightGBM | Random Forest | SVM |
---|---|---|---|---|---|
Accuracy | 0.827 | 0.811 | 0.822 | 0.822 | 0.789 |
Precision | 0.818 | 0.755 | 0.846 | 0.776 | 0.756 |
Recall | 0.600 | 0.617 | 0.550 | 0.633 | 0.517 |
F1-score | 0.692 | 0.679 | 0.667 | 0.697 | 0.614 |
Specificity | 0.936 | 0.904 | 0.952 | 0.912 | 0.920 |
ROC AUC | 0.879 | 0.830 | 0.827 | 0.825 | 0.806 |
PR AUC | 0.804 | 0.757 | 0.760 | 0.762 | 0.730 |
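As a quick consistency check on the table above, the F1-score can be recomputed from the reported precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below uses only the values printed in the table; the helper name `f1_score` is an illustrative choice. Because the inputs are already rounded to three decimals, agreement is only expected to within about 0.001.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) for each model, copied from the table.
reported = {
    "AdaBoost": (0.818, 0.600, 0.692),
    "XGBoost": (0.755, 0.617, 0.679),
    "LightGBM": (0.846, 0.550, 0.667),
    "Random Forest": (0.776, 0.633, 0.697),
    "SVM": (0.756, 0.517, 0.614),
}

for model, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_score(precision, recall)
    print(f"{model}: computed {f1_computed:.3f}, reported {f1_reported:.3f}")
```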
Method | CrypTothML (AdaBoost, LOOCV) | PocketMiner | CryptoSite |
---|---|---|---|
Accuracy | 0.827 | 0.768 | 0.719 |
Precision | 0.818 | 0.743 | 0.583 |
Recall | 0.600 | 0.433 | 0.467 |
F1-score | 0.692 | 0.547 | 0.519 |
Specificity | 0.936 | 0.928 | 0.840 |
ROC AUC | 0.879 | 0.821 | 0.780 |
PR AUC | 0.804 | 0.659 | 0.537 |
Method | Total Number of Proteins Containing Cryptic Sites | Number of Proteins Ranked as Top 1 | Number of Proteins Ranked Within Top 3 | Number of Proteins Ranked Within Top 5 |
---|---|---|---|---|
CrypTothML (AdaBoost, LOOCV) | 34 | 12 (35%) | 20 (59%) | 23 (68%) |
PocketMiner | 34 | 6 (18%) | 13 (38%) | 14 (41%) |
CryptoSite | 34 | 9 (26%) | 18 (53%) | 21 (62%) |
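The top-N counts in the table above can be illustrated with a short sketch. It assumes, as one plausible reading of the table, that each protein is represented by the rank of its best-placed correct prediction; the function name `top_n_success` and the toy rank list are hypothetical and are not data from the study. The second half recomputes the percentages from the counts printed in the table.

```python
def top_n_success(best_ranks, n):
    """Count proteins whose best correct prediction is ranked within the top n.

    best_ranks holds one entry per protein: the 1-based rank of the
    highest-ranked correct prediction, or None if none was correct.
    Returns (number of hits, percentage of proteins).
    """
    hits = sum(1 for rank in best_ranks if rank is not None and rank <= n)
    return hits, 100.0 * hits / len(best_ranks)


# Toy input (hypothetical ranks for five proteins, not from the paper).
toy_ranks = [1, 2, 4, 7, None]
for n in (1, 3, 5):
    hits, percent = top_n_success(toy_ranks, n)
    print(f"Top {n}: {hits} proteins ({percent:.0f}%)")

# Recompute the percentages from the counts reported in the table.
TOTAL_PROTEINS = 34
reported_counts = {
    "CrypTothML (AdaBoost, LOOCV)": (12, 20, 23),
    "PocketMiner": (6, 13, 14),
    "CryptoSite": (9, 18, 21),
}
for method, counts in reported_counts.items():
    percents = [round(100 * count / TOTAL_PROTEINS) for count in counts]
    print(method, percents)
```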
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Motono, C.; Yanagisawa, K.; Koseki, J.; Imai, K. CrypTothML: An Integrated Mixed-Solvent Molecular Dynamics Simulation and Machine Learning Approach for Cryptic Site Prediction. Int. J. Mol. Sci. 2025, 26, 4710. https://doi.org/10.3390/ijms26104710