GPU-Accelerated Fock Matrix Computation with Efficient Reduction
Abstract
1. Introduction
Contributions
- Distributed atomic reduction across replicated Fock matrices:
- Hybrid approach with thread-local reduction:
- Computational experiments with relevant molecules:
2. Background
2.1. Hartree–Fock Method
- Step 1: Initialize the orbital coefficients.
- Step 2: Calculate the Fock matrix and update the orbital coefficients by solving Equation (5).
- Step 3: Iterate Step 2 until the orbital coefficients and the energy converge.
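The three steps above form a fixed-point iteration. The Python sketch below shows only that control flow, under stated assumptions: `scf_loop`, `build_fock`, `solve_coefficients`, and `energy` are hypothetical names that do not come from the paper, and the toy callables at the bottom are stand-ins chosen so that the loop converges. A real implementation would build the Fock matrix from the ERIs and solve Equation (5) in Step 2.

```python
def scf_loop(c0, build_fock, solve_coefficients, energy, tol=1e-8, max_iter=200):
    """Generic self-consistent loop; all callables are hypothetical placeholders."""
    c = list(c0)                       # Step 1: initial orbital coefficients
    e_prev = None
    for iteration in range(1, max_iter + 1):
        f = build_fock(c)              # Step 2: Fock matrix from current coefficients
        c_new = solve_coefficients(f)  # Step 2: updated coefficients
        e = energy(c_new)
        delta_c = max(abs(a - b) for a, b in zip(c_new, c))
        # Step 3: stop once both the coefficients and the energy stop changing
        if e_prev is not None and delta_c < tol and abs(e - e_prev) < tol:
            return c_new, e, iteration
        c, e_prev = c_new, e
    raise RuntimeError("SCF loop did not converge")


# Toy stand-ins (assumptions for illustration only): the scalar map
# c -> 0.5 * c + 1 has the fixed point c = 2.
coeffs, final_energy, n_iter = scf_loop(
    c0=[0.0],
    build_fock=lambda c: [0.5 * c[0] + 1.0],
    solve_coefficients=lambda f: list(f),
    energy=lambda c: c[0] ** 2,
)
```

The convergence test checks both quantities named in Step 3, so the loop does not stop when only the energy has stagnated.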
2.2. Fock Matrix Computation
2.2.1. Two-Electron Repulsion Integrals
2.2.2. Update of the Fock Matrix
2.3. GPU Programming Model
3. Related Works
4. Proposed Method
4.1. GPU Parallelization of ERIs
4.2. Distributed Atomic Reduction Through Replicated Fock Matrix Update
Algorithm 1. Fock matrix construction using the distributed atomic reduction method
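The body of Algorithm 1 is not reproduced here, so the following is only a sequential Python emulation of the general idea named in the caption: contributions are spread over several replicas of the Fock matrix, and the replicas are summed at the end. The function name `dar_accumulate`, the `(i, j, value)` contribution format, and the rule that picks a replica from the thread index modulo the replica count are assumptions made for illustration. On a GPU, the `+=` on a replica would be an atomic add.

```python
def dar_accumulate(contributions, m, num_replicas):
    """Emulate distributed atomic reduction over replicated m x m matrices."""
    replicas = [[[0.0] * m for _ in range(m)] for _ in range(num_replicas)]
    for thread_id, (i, j, value) in enumerate(contributions):
        r = thread_id % num_replicas   # assumed replica-selection rule
        replicas[r][i][j] += value     # stands in for an atomic add on the GPU
    # Final reduction: sum the replicas element-wise into one Fock matrix
    return [[sum(rep[i][j] for rep in replicas) for j in range(m)]
            for i in range(m)]


# Hypothetical contributions; three of the four target element (0, 0).
contribs = [(0, 0, 1.0), (0, 0, 2.0), (1, 0, 0.5), (0, 0, 3.0)]
fock_one = dar_accumulate(contribs, m=2, num_replicas=1)
fock_two = dar_accumulate(contribs, m=2, num_replicas=2)
```

With one replica, all three updates to element (0, 0) hit the same memory location. With two replicas, they are split across two locations, which is the mechanism by which replication reduces contention between atomic operations.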
4.3. Thread-Local Reduction in Registers
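As a minimal illustration of the idea in this section's title, the Python sketch below merges updates that one thread would send to the same Fock matrix element before any atomic operation is issued. The name `thread_local_reduce` and the `(destination, value)` update format are assumptions, and the dictionary merely stands in for the register variables a GPU kernel would use.

```python
from collections import defaultdict


def thread_local_reduce(updates):
    """Merge one thread's updates by destination before they reach memory."""
    local = defaultdict(float)      # stands in for per-thread register variables
    for destination, value in updates:
        local[destination] += value
    return dict(local)              # one atomic add per distinct destination


# Hypothetical updates from a single thread: two target element (0, 1).
thread_updates = [((0, 1), 1.0), ((0, 1), 2.0), ((2, 3), 0.5)]
reduced = thread_local_reduce(thread_updates)
atomic_ops_before = len(thread_updates)
atomic_ops_after = len(reduced)
```

In this toy case the thread issues two atomic adds instead of three. The saving depends on how often one thread's contributions share a destination.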
5. Performance Evaluation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Destination | |||||||
---|---|---|---|---|---|---|---|
Reduction rate | 1 | ||||||
1 | 1 | 1 | |||||
1 | |||||||
1 | 1 | 1 | |||||
1 | 1 | ||||||
1 | |||||||
#Additional register variables | 0 | 3 | 16 | 16 | 36 | 54 |
Compound | Azobenzene | Rivastigmine | Penicillin G | |
---|---|---|---|---|
Molecular Formula | C12H10N2 | C14H22N2O2 | C16H18N2O4S | |
Basis Set | 6-311G | 6-31G | 6-31G | |
#Fock replicas for DAR: | 1 | |||
2 | ||||
4 | ||||
8 | ||||
16 | ||||
32 | ||||
64 | ||||
128 | ||||
256 | ||||
Compound | Rivastigmine | Rivastigmine | Penicillin G | Penicillin G | ATP | ATP |
---|---|---|---|---|---|---|
Molecular Formula | C14H22N2O2 | C14H22N2O2 | C16H18N2O4S | C16H18N2O4S | C10H16N5O13P3 | C10H16N5O13P3 |
#Basis Functions M | 206 | 206 | 247 | 247 | 323 | 323 |
#Primitive Shells N | 340 | 340 | 406 | 406 | 534 | 534 |
Reduction Method | DAR | TLR + DAR | DAR | TLR + DAR | DAR | TLR + DAR |
#Fock replicas for DAR: 1 | 251 | 152 | 462 | 276 | 1232 | 699 |
#Fock replicas for DAR: 2 | 183 | 131 | 349 | 243 | 888 | 600 |
#Fock replicas for DAR: 4 | 150 | 120 | 287 | 224 | 704 | 551 |
#Fock replicas for DAR: 8 | 132 | 116 | 243 | 216 | 606 | 531 |
#Fock replicas for DAR: 16 | 121 | 115 | 222 | 213 | 547 | 525 |
#Fock replicas for DAR: 32 | 113 | 115 | 208 | 212 | 563 | 525 |
#Fock replicas for DAR: 64 | 109 | 115 | 282 | 215 | 937 | 559 |
#Fock replicas for DAR: 128 | 169 | 121 | 350 | 231 | 982 | 597 |
#Fock replicas for DAR: 256 | 187 | 130 | 383 | 265 | 1041 | 693 |
Speedup rate | | | | | | |
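The table above can be scanned for the replica count that gives the smallest value in each column. The short Python sketch below does this with the numbers copied from the table. The variable names are illustrative, the unit of the values is not stated in this extract, and the table's own "Speedup rate" row is not reproduced because its definition is not given here.

```python
replica_counts = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Values copied from the table, one list per (compound, reduction method) column.
measurements = {
    ("Rivastigmine", "DAR"): [251, 183, 150, 132, 121, 113, 109, 169, 187],
    ("Rivastigmine", "TLR + DAR"): [152, 131, 120, 116, 115, 115, 115, 121, 130],
    ("Penicillin G", "DAR"): [462, 349, 287, 243, 222, 208, 282, 350, 383],
    ("Penicillin G", "TLR + DAR"): [276, 243, 224, 216, 213, 212, 215, 231, 265],
    ("ATP", "DAR"): [1232, 888, 704, 606, 547, 563, 937, 982, 1041],
    ("ATP", "TLR + DAR"): [699, 600, 551, 531, 525, 525, 559, 597, 693],
}

# For each column, pick the (replica count, value) pair with the smallest value.
# min() returns the first minimum, so ties resolve to the fewest replicas.
best = {
    column: min(zip(replica_counts, values), key=lambda pair: pair[1])
    for column, values in measurements.items()
}

for (compound, method), (count, value) in best.items():
    print(f"{compound:13s} {method:10s} best at {count:3d} replicas: {value}")
```

In every column the smallest value occurs at an intermediate replica count (16 to 64) and the values rise again at 128 and 256 replicas, so adding replicas beyond that range did not help in these measurements.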
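Compound | Paclitaxel | Paclitaxel | Valinomycin | Valinomycin | Cyclosporine | Cyclosporine |
---|---|---|---|---|---|---|
Molecular Formula | C47H51NO14 | C47H51NO14 | C54H90N6O18 | C54H90N6O18 | C62H111N11O12 | C62H111N11O12 |
#Basis Functions M | 361 | 361 | 480 | 480 | 536 | 536 |
#Primitive Shells N | 711 | 711 | 972 | 972 | 1098 | 1098 |
Reduction Method | DAR | TLR + DAR | DAR | TLR + DAR | DAR | TLR + DAR |
#Fock replicas for DAR: 1 | | | | | | |
#Fock replicas for DAR: 2 | | | | | | |
#Fock replicas for DAR: 4 | | | | | | |
#Fock replicas for DAR: 8 | | | | | | |
#Fock replicas for DAR: 16 | | | | | | |
#Fock replicas for DAR: 32 | | | | | | |
#Fock replicas for DAR: 64 | | | | | | |
#Fock replicas for DAR: 128 | | | | | | |
#Fock replicas for DAR: 256 | | | | | | |
Speedup rate | | | | | | |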
ERI Types | #Fock Replicas | Stall [cycles] | Warp [cycles] | Stall/Warp [%] | L2 Cache [Gbyte/s] |
---|---|---|---|---|---|
1 | |||||
64 | |||||
pp | |||||
1 | |||||
64 | |||||
pp | |||||
1 | |||||
64 | |||||
pp | |||||
1 | |||||
64 | |||||
pp | |||||
1 | |||||
64 | |||||
pp | |||||
1 | |||||
64 | |||||
pp |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tsuji, S.; Ito, Y.; Fujii, H.; Yokogawa, N.; Suzuki, K.; Nakano, K.; Parque, V.; Kasagi, A. GPU-Accelerated Fock Matrix Computation with Efficient Reduction. Appl. Sci. 2025, 15, 4779. https://doi.org/10.3390/app15094779