PDCG: A Diffusion Model Guided by Pre-Training for Molecular Conformation Generation
Abstract
1. Introduction
2. Materials and Methods
3. Results and Discussion
3.1. The Performance with Different Diffusion Steps
3.2. Comparison with Baseline Methods
3.3. RMSD of Conformation Generation
3.4. Properties Prediction by SphereNet
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PDCG | Pre-training diffusion conformation generation |
| SMILES | Simplified Molecular Input Line Entry System |
| RMSD | Root mean square deviation |
| COV-R | Coverage—Recall |
| COV-P | Coverage—Precision |
| MAT-R | Matching—Recall |
| MAT-P | Matching—Precision |
| MAE | Mean absolute error |
| IQR | Interquartile range |


| Diffusion Steps | Sampling Steps | COV–R (%) ↑ | MAT–R (Å) ↓ | COV–P (%) ↑ | MAT–P (Å) ↓ |
|---|---|---|---|---|---|
| 3000 | 3000 | 0.877 | 0.354 | 0.477 | 0.551 |
| 5000 | 5000 | 0.919 | 0.194 | 0.528 | 0.445 |
| 7000 | 7000 | 0.762 | 0.288 | 0.493 | 0.518 |
| 10,000 | 10,000 | 0.677 | 0.393 | 0.431 | 0.515 |

| Diffusion Steps | Sampling Steps | COV–R (%) ↑ | MAT–R (Å) ↓ | COV–P (%) ↑ | MAT–P (Å) ↓ |
|---|---|---|---|---|---|
| 6000 | 6000 | 0.901 | 0.795 | 0.687 | 1.286 |
| 8000 | 8000 | 0.904 | 0.785 | 0.689 | 1.083 |
| 10,000 | 10,000 | 0.916 | 0.770 | 0.752 | 0.988 |
| 12,000 | 12,000 | 0.805 | 1.065 | 0.663 | 1.358 |

| Model | COV–R (%) ↑ | MAT–R (Å) ↓ | COV–P (%) ↑ | MAT–P (Å) ↓ |
|---|---|---|---|---|
| GraphDG | 0.733 | 0.425 | 0.439 | 0.581 |
| ConfVAE | 0.778 | 0.415 | 0.380 | 0.622 |
| CGCF | 0.781 | 0.422 | 0.662 | 0.661 |
| CONFGF | 0.885 | 0.267 | 0.522 | 0.464 |
| SDEGen | 0.815 | 0.357 | 0.484 | 0.566 |
| EC-Conf | 0.813 | 0.324 | 0.794 | 0.330 |
| GeoMol + MRL | 0.826 | 0.298 | 0.837 | 0.310 |
| Diffusion | 0.919 | 0.194 | 0.528 | 0.445 |
| PDCG | 0.934 | 0.188 | 0.752 | 0.369 |

| Model | COV–R (%) ↑ | MAT–R (Å) ↓ | COV–P (%) ↑ | MAT–P (Å) ↓ |
|---|---|---|---|---|
| GraphDG | 0.083 | 1.972 | 0.021 | 2.434 |
| ConfVAE | 0.552 | 1.238 | 0.230 | 1.829 |
| CGCF | 0.540 | 1.249 | 0.217 | 1.857 |
| CONFGF | 0.622 | 1.163 | 0.234 | 1.722 |
| SDEGen | 0.673 | 1.126 | 0.323 | 1.679 |
| EC-Conf | 0.864 | 0.902 | 0.701 | 1.108 |
| GeoMol + MRL | 0.815 | 1.132 | 0.756 | 1.045 |
| Diffusion | 0.916 | 0.770 | 0.752 | 0.988 |
| PDCG | 0.929 | 0.712 | 0.766 | 0.945 |
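
The coverage (COV) and matching (MAT) scores in the tables above are commonly computed from a matrix of pairwise RMSD values between reference and generated conformers. The following is a minimal sketch of that convention, not the authors' evaluation code. It assumes the RMSD matrix has already been computed, that a conformer counts as covered when its closest counterpart lies within a threshold `delta`, and that the precision variants simply swap the roles of the two sets. The function names and the value `delta=0.5` are illustrative assumptions; this section does not state the threshold used.

```python
from typing import Dict, List, Tuple


def coverage_and_matching(rmsd: List[List[float]], delta: float) -> Tuple[float, float]:
    """Return (COV, MAT) for the row set of an RMSD matrix.

    rmsd[i][j] is the RMSD in angstroms between row conformer i and
    column conformer j. COV is the fraction of rows whose closest column
    lies within `delta` (assumed convention: <=). MAT is the mean of the
    closest-column RMSD over all rows.
    """
    if not rmsd or not rmsd[0]:
        raise ValueError("rmsd must be a non-empty matrix")
    best = [min(row) for row in rmsd]  # closest counterpart per row
    cov = sum(1 for b in best if b <= delta) / len(best)
    mat = sum(best) / len(best)
    return cov, mat


def conformer_metrics(rmsd_ref_by_gen: List[List[float]], delta: float) -> Dict[str, float]:
    """Compute recall and precision metrics from a reference-by-generated matrix."""
    # Recall: iterate over reference conformers (rows).
    cov_r, mat_r = coverage_and_matching(rmsd_ref_by_gen, delta)
    # Precision: iterate over generated conformers (rows of the transpose).
    transposed = [list(col) for col in zip(*rmsd_ref_by_gen)]
    cov_p, mat_p = coverage_and_matching(transposed, delta)
    return {"COV-R": cov_r, "MAT-R": mat_r, "COV-P": cov_p, "MAT-P": mat_p}


# Toy example: 3 reference conformers (rows) and 2 generated conformers (columns).
toy_rmsd = [
    [0.2, 0.9],
    [0.7, 0.6],
    [1.1, 0.4],
]
metrics = conformer_metrics(toy_rmsd, delta=0.5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case two of the three reference conformers have a generated conformer within 0.5 Å, so COV-R is 2/3, while both generated conformers lie within 0.5 Å of some reference, so COV-P is 1. Higher COV and lower MAT are better, which matches the arrows in the table headers.
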

| Property | Unit | DFT | PDCG |
|---|---|---|---|
| μ | D | 0.0245 | 0.0330 |
| α | | 0.0449 | 0.0510 |
| ϵHOMO | eV | 0.0228 | 0.0329 |
| ϵLUMO | eV | 0.0189 | 0.0216 |
| ϵgap | eV | 0.0313 | 0.0511 |
| ⟨R²⟩ | | 0.2680 | 0.4367 |
| zpve | eV | 0.0011 | 0.0020 |
| Cv | cal/(mol·K) | 0.0215 | 0.0255 |
| U0 | eV | 0.0063 | 0.0634 |
| U | eV | 0.0064 | 0.0450 |
| H | eV | 0.0063 | 0.0568 |
| G | eV | 0.0078 | 0.0447 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, Y.; Zheng, Y.; Tariq, A.; Nan, X.; Qu, L.; Song, J. PDCG: A Diffusion Model Guided by Pre-Training for Molecular Conformation Generation. Chemistry 2026, 8, 29. https://doi.org/10.3390/chemistry8020029

