# GPU Acceleration of CFD Simulations in OpenFOAM


## Abstract


This work presents: (a) the `amgx4Foam` library, which connects the open-source AmgX library from NVIDIA to OpenFOAM. Matrix generation, involving tasks such as numerical integration and assembly, is performed on the CPUs; the assembled matrix is then solved on the GPU. This approach accelerates the computationally intensive linear-solver phase of the simulations. (b) Enhancements to code performance in reactive flow simulations, obtained by relocating the solution of the finite-rate chemistry to GPUs, which serve as co-processors. We present code verification and validation along with performance metrics for two distinct application sets, namely, aerodynamics calculations and supersonic combustion with finite-rate chemistry.

## 1. Introduction

#### 1.1. Motivation of This Work

#### 1.2. Highlights

The highlights of this work are: (a) `amgx4Foam`, linking OpenFOAM to the NVIDIA AmgX library [21,30]; and (b) the vectorization/hybridization of GPGPU reactive flow solvers, which involves migrating compute-intensive operations to GPUs functioning as co-processors. The verification of these software components is conducted on benchmark test cases where reference solutions are available. Furthermore, assessment is carried out on impactful applications within the automotive and aerospace domains, namely, accelerating the solution of aerodynamics for external flows and speeding up combustion simulations.

#### 1.3. Paper Structure

## 2. Implicit Segregated Solution Method of the Flow Transport for All Flow Speeds

- The generalized form of Newton’s law of viscosity:$$\mathit{R}=\mu \left[\nabla \mathit{U}+{\left(\nabla \mathit{U}\right)}^{T}\right]-\left(\frac{2}{3}\mu \nabla \cdot \mathit{U}\right)\mathit{I}$$
- A nine-coefficient polynomial computes the thermodynamic properties in the standard state for each gaseous species, as in the NASA chemical equilibrium code [34], to define the internal energy as a function of the pressure and temperature.
- The Equation of State (EoS) for the gas, which is assumed to be a mixture of ${N}_{s}$ species:$$p=\rho {R}_{0}T\sum_{i=1}^{{N}_{s}}\frac{{Y}_{i}}{{W}_{i}}=\rho \frac{{R}_{0}}{W}T\quad \text{with}\quad \frac{1}{W}=\sum_{i=1}^{{N}_{s}}\frac{{Y}_{i}}{{W}_{i}}$$
- Eddy-viscosity-based models for turbulence closure.
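
As an illustration of the thermodynamic closure, the mixture EoS above can be evaluated directly from the species mass fractions and molar masses. The following sketch is ours (function names and sample values are illustrative, not from the paper):

```python
# Illustrative sketch (not the authors' code): evaluating the mixture
# Equation of State p = rho * (R0/W) * T, with 1/W = sum_i Y_i / W_i.
R0 = 8.31446  # universal gas constant [J/(mol K)]

def mixture_pressure(rho, T, Y, W):
    """Pressure of an Ns-species ideal-gas mixture.

    rho : density [kg/m^3]
    T   : temperature [K]
    Y   : species mass fractions [-] (must sum to 1)
    W   : species molar masses [kg/mol]
    """
    inv_W = sum(y / w for y, w in zip(Y, W))  # 1/W of the mixture
    return rho * R0 * inv_W * T

# Illustrative H2/O2/N2 mixture (mass fractions are made up for the example):
p = mixture_pressure(rho=1.0, T=1000.0,
                     Y=[0.028, 0.226, 0.746],
                     W=[2.016e-3, 31.998e-3, 28.014e-3])
```

The mixture molar mass enters only through the harmonic combination of the species values, so no per-species partial pressures need to be stored.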

## 3. Solution of Large Sparse Linear Systems in Segregated Solvers

The `amgx4Foam` library is designed to be compatible with any OpenFOAM solver. Its architecture favors ease of use, installation, and maintenance while ensuring compatibility with both current and future software releases. The current approach carries out matrix assembly on the CPUs, which implies a balance between the number of available GPU cards and CPU cores: a large number of GPUs paired with only a few CPU cores would not yield significant performance improvements, as data input/output and matrix assembly could become bottlenecks. Hence, for the calculations presented in this section (acceleration of flow transport calculations), the optimal hardware configuration assumes one GPU per CPU node.

This principle does not hold in scenarios involving the solution of ordinary differential equations (ODEs) within reacting flow computations. In such cases, as the number of GPU cards increases, the capacity to simultaneously solve a larger number of ODE systems (up to the mesh size) grows as well; having more GPUs in the same cluster node can therefore improve performance. However, where matrix assembly remains a computational bottleneck, the balance between GPU cards and CPU cores discussed earlier continues to hold. It is worth clarifying that the code implementation is unaffected by the hardware configuration and accommodates any number of CPU cores and GPU cards per node.

Offloading the linear algebra solution to the GPU involves several additional steps, including converting the matrix format from LDU to CSR and transferring data between RAM and GPU memory. These steps introduce time overheads; the overall computational efficiency of the GPU must therefore be judged on the net performance gain. It is reasonable to anticipate that larger problem sizes realize greater speedup from GPU utilization.
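
The LDU-to-CSR conversion mentioned above can be illustrated with a minimal sketch. This is our simplified reconstruction of the idea (OpenFOAM stores the diagonal plus per-face lower/upper coefficients addressed by owner/neighbour lists), not the actual `amgx4Foam`/`foam2CSR` code:

```python
# Illustrative LDU -> CSR conversion (not the amgx4Foam implementation).
# LDU stores the diagonal plus off-diagonal coefficients addressed by
# (lower_addr, upper_addr) face pairs; CSR needs row_ptr/col_idx/values.

def ldu_to_csr(n, diag, lower, upper, lower_addr, upper_addr):
    # Collect all (row, col, value) triplets.
    entries = [(i, i, diag[i]) for i in range(n)]
    for f in range(len(lower_addr)):
        lo, up = lower_addr[f], upper_addr[f]
        entries.append((up, lo, lower[f]))  # below-diagonal coefficient
        entries.append((lo, up, upper[f]))  # above-diagonal coefficient
    entries.sort()  # CSR wants row-major, column-sorted ordering

    row_ptr = [0] * (n + 1)
    col_idx, values = [], []
    for r, c, v in entries:
        row_ptr[r + 1] += 1  # count entries per row
        col_idx.append(c)
        values.append(v)
    for r in range(n):  # prefix sum turns counts into row offsets
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, col_idx, values

# 1D three-cell Laplacian: faces (0,1) and (1,2), off-diagonals -1.
row_ptr, col_idx, vals = ldu_to_csr(
    3, diag=[2.0, 2.0, 2.0],
    lower=[-1.0, -1.0], upper=[-1.0, -1.0],
    lower_addr=[0, 1], upper_addr=[1, 2])
```

In practice this conversion, together with the host-to-device copy of the arrays, is exactly the overhead the paragraph above refers to.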

The flow transport was solved with OpenFOAM’s steady-state incompressible solver (`simpleFoam`). The k-omega Shear Stress Transport (SST) model was employed for turbulence closure. The momentum predictor and turbulence transport calculations had minimal impact on the overall cost [8]; thus, GPU offloading was applied exclusively to the solution of the pressure (Poisson) equation. All computations were carried out in double precision. The Algebraic Multi-Grid (AMG) solver provided by AmgX [21] was employed as a preconditioner for various iterative Krylov outer solvers.
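
AmgX solvers are selected through a JSON configuration file. A minimal sketch of such a configuration, pairing a classical AMG preconditioner with a PCG outer solver, might look like the following (the field names follow the AmgX configuration format, but the specific values are illustrative and not those used in the paper):

```json
{
  "config_version": 2,
  "solver": {
    "solver": "PCG",
    "preconditioner": {
      "solver": "AMG",
      "algorithm": "CLASSICAL",
      "max_levels": 20
    },
    "max_iters": 100,
    "tolerance": 1e-08,
    "monitor_residual": 1
  }
}
```

Swapping the outer Krylov solver (e.g., PCG for GMRES) is a one-line change in this file, which is what makes the preconditioner comparison in this section convenient to run.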

The assessment is based on the `motorbike` tutorial test case using different grid sizes (S, M, L). These simulations were run on 8, 16, and 32 CPU cores. Two solver configurations were compared: (a) Conjugate Gradient (CG) with simplified Diagonal-based Incomplete Cholesky (DIC) preconditioning and (b) Geometric Algebraic Multi-Grid (GAMG).

## 4. Reactive Flows/Combustion Simulations

#### 4.1. Governing Equations

Chemical kinetic mechanisms are converted into OpenFOAM format through the `canteraToFoam` utility developed by the authors. As shown in Figure 5, the GPU solver for the finite-rate chemistry can be coupled to any solver available in OpenFOAM, regardless of whether it is pressure-based or density-based. In the following, a shock-capturing density-based solver for supersonic flows, chosen for the physics simulated in the validation tests, is used for the fluid transport.

#### 4.2. Multi-Cell Approach to Accelerating the Chemical Solution on Hybrid CPU–GPU Systems

#### 4.3. Further Notes about Domain Decomposition in Heterogeneous CPU/GPU Systems

#### 4.4. Validation and Verification

- Auto-ignition in a single-cell batch reactor. This assessment verifies the Direct Integration (DI) of kinetic mechanisms without fluid transport calculations. The GPGPU algorithm’s performance was compared with two alternatives: (a) an equivalent CPU version of the ODE solver already present in OpenFOAM [31,32] and (b) the solution obtained with the Cantera software [65]. This set of simulations was designed to assess the influence of data-transfer latency on the overall computation time and to show that the various ODE solvers yield similar results, enabling a fair comparison.
- Reactive flow simulations on multi-cell domains. This evaluation targeted the overall performance of the GPGPU solver within the context of fluid transport in the presence of multi-domain parallelization. The validation test cases encompassed the simulation of a scramjet engine.

`canteraToFoam`. All calculations on the GPUs were performed in double precision.

#### 4.5. Auto-Ignition in Single-Cell Batch Reactors

- The spatial distribution of temperature and species mass fraction;
- The computational time required by the calculation;
- The cumulative mass fraction of the chemical species:$${\overline{Y}}_{i}=\frac{1}{\Delta t}\int_{0}^{t}{Y}_{i}\,dt.$$
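
The cumulative mass fraction above can be approximated from a sampled time history with the trapezoidal rule. The following helper is an illustrative sketch of ours, not code from the paper:

```python
# Sketch: cumulative mass fraction Ybar_i = (1/dt) * integral of Y_i dt,
# approximated with the trapezoidal rule on a sampled time history.

def cumulative_mass_fraction(times, Y):
    """times: increasing sample instants [s]; Y: Y_i sampled at those times."""
    dt = times[-1] - times[0]  # total integration window
    integral = sum(0.5 * (Y[k] + Y[k + 1]) * (times[k + 1] - times[k])
                   for k in range(len(times) - 1))
    return integral / dt

# A constant Y over the window must return Y itself:
Ybar = cumulative_mass_fraction([0.0, 1e-4, 2e-4], [0.2, 0.2, 0.2])
```

This time-averaged quantity is useful for comparing solvers whose instantaneous histories differ only by small phase shifts in the ignition event.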

**Hydrogen Combustion.** The initial set of computations pertains to a stoichiometric hydrogen–air mixture reacting at a constant pressure of 2.0 bar and an initial temperature of 1000 K. The mechanism comprises 10 species and 27 reactions [66]. The absence of stiffness results from both the specific chemistry considered and the use of a small global time step.

#### 4.6. Validation Test: Supersonic Combustion in a Scramjet Engine

#### 4.7. Simulation of Supersonic Combustion in the Scramjet Engine

#### 4.8. Performance

Here, $t_{\mathrm{CPU}}$ and $t_{\mathrm{GPU}}$ represent the respective computation times for solving the finite-rate chemistry problem on the CPU and GPU. Additionally, we define $t_{f,\mathrm{GPU}}$ as the time required for GPU memory allocation, CPU data collection, and the forward data transfer (CPU-to-GPU); $t_{k,\mathrm{GPU}}$ as the time for the kernel call and the actual ODE integration on the GPU; and $t_{r,\mathrm{GPU}}$ as the duration of the backward data transfer (GPU-to-CPU). The use of the GPGPU solver is advantageous if$$t_{\mathrm{GPU}}=t_{f,\mathrm{GPU}}+t_{k,\mathrm{GPU}}+t_{r,\mathrm{GPU}}<t_{\mathrm{CPU}}.$$
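
To make the trade-off concrete, a small back-of-the-envelope model (the function name and the numbers are ours, purely illustrative) computes the speedup implied by these timings: offloading pays off only when the sum of transfer and kernel times stays below the CPU time.

```python
# Illustrative model of the GPU break-even condition: the offload is
# advantageous when t_gpu = t_f + t_k + t_r < t_cpu.

def gpu_speedup(t_cpu, t_f_gpu, t_k_gpu, t_r_gpu):
    """Speedup of the GPU chemistry path over the CPU path."""
    t_gpu = t_f_gpu + t_k_gpu + t_r_gpu  # transfers + kernel time
    return t_cpu / t_gpu

# Transfers erode the raw kernel advantage: the kernel alone is 5x
# faster here, but end-to-end the speedup drops to 2.5x.
s = gpu_speedup(t_cpu=10.0, t_f_gpu=1.0, t_k_gpu=2.0, t_r_gpu=1.0)
```

The same model explains why batching many cells per kernel call helps: $t_{f,\mathrm{GPU}}$ and $t_{r,\mathrm{GPU}}$ are amortized over a larger amount of useful work.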

## 5. Conclusions

## Author Contributions

## Funding

`exaFoam` [9]). The JU receives support from the European Union’s Horizon 2020 Research and Innovation Programme, as well as from France, the United Kingdom, Germany, Italy, Croatia, Spain, Greece, and Portugal.

## Acknowledgments

## Conflicts of Interest

## Appendix A. Link between Density and Pressure Correction

## Appendix B. Link between Velocity and Pressure Correction

## References

- Resolved Analytics, CFD Software Comparison. Available online: https://www.resolvedanalytics.com/theflux/comparing-cfd-software (accessed on 15 August 2023).
- NVIDIA. The Computational Fluid Dynamics Revolution Driven by GPU Acceleration. Available online: https://developer.nvidia.com/blog/computational-fluid-dynamics-revolution-driven-by-gpu-acceleration/ (accessed on 10 August 2023).
- Kiran, U.; Sharma, D.; Gautam, S.S. GPU-warp based finite element matrices generation and assembly using coloring method. J. Comput. Des. Eng.
**2019**, 6, 705–718. [Google Scholar] [CrossRef] - Accelerating ANSYS Fluent using NVIDIA GPUs, NVIDIA. 2014. Available online: https://www.nvidia.com/content/tesla/pdf/ansys-fluent-nvidiagpu-userguide.pdf (accessed on 11 August 2023).
- Siemens Digital Industries Software. Simcenter STAR-CCM+, version 2023; Siemens: Munich, Germany, 2023. [Google Scholar]
- Martineau, M.; Posey, S.; Spiga, F. OpenFOAM solver developments for GPU and arm CPU. In Proceedings of the 18th OpenFOAM Workshop, Genova, Italy, 11–14 July 2023. [Google Scholar]
- Piscaglia, F. Modern methods for accelerated CFD computations in OpenFOAM. In Proceedings of the Keynote Talk at the 6th French/Belgian OpenFOAM Users Conference, Grenoble, France, 13–14 June 2023. [Google Scholar]
- Ferziger, J.H.; Perić, M.; Street, R.L. Computational Methods for Fluid Dynamics, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- `exaFoam` EU Project. Available online: https://exafoam.eu/ (accessed on 6 August 2023).
- The European Centre of Excellence for Engineering Applications (EXCELLERAT P2). Available online: https://www.excellerat.eu/ (accessed on 6 August 2023).
- Ghioldi, F.; Piscaglia, F. GPU Acceleration of CFD Simulations in OpenFOAM. In Proceedings of the 18th OpenFOAM Workshop, Genova, Italy, 11–14 July 2023. [Google Scholar]
- Wichman, I.S. On the use of operator-splitting methods for the equations of combustion. Combust. Flame
**1991**, 83, 240–252. [Google Scholar] [CrossRef] - Descombes, S.; Duarte, M.; Massot, M. Operator Splitting Methods with Error Estimator and Adaptive Time-Stepping. Application to the Simulation of Combustion Phenomena. In Splitting Methods in Communication, Imaging, Science, and Engineering; Springer International Publishing: Cham, Switzerland, 2016; pp. 627–641. [Google Scholar] [CrossRef]
- Yang, B.; Pope, S. An investigation of the accuracy of manifold methods and splitting schemes in the computational implementation of combustion chemistry. Combust. Flame
**1998**, 112, 16–32. [Google Scholar] [CrossRef] - Singer, M.; Pope, S.; Najm, H. Modeling unsteady reacting flow with operator splitting and ISAT. Combust. Flame
**2006**, 147, 150–162. [Google Scholar] [CrossRef] - Ren, Z.; Xu, C.; Lu, T.; Singer, M.A. Dynamic adaptive chemistry with operator splitting schemes for reactive flow simulations. J. Comput. Phys.
**2014**, 263, 19–36. [Google Scholar] [CrossRef] - Lu, Z.; Zhou, H.; Li, S.; Ren, Z.; Lu, T.; Law, C.K. Analysis of operator splitting errors for near-limit flame simulations. J. Comput. Phys.
**2017**, 335, 578–591. [Google Scholar] [CrossRef] - Xue, W.; Jackson, C.W.; Roy, C.J. An improved framework of GPU computing for CFD applications on structured grids using OpenACC. J. Parallel Distrib. Comput.
**2021**, 156, 64–85. [Google Scholar] [CrossRef] - Ghioldi, F.; Piscaglia, F. A CPU-GPU Paradigm to Accelerate Turbulent Combustion and Reactive-Flow CFD Simulations. In Proceedings of the 8th OpenFOAM Conference, Virtual, 13–15 October 2020. [Google Scholar]
- Ghioldi, F.; Piscaglia, F. GPU-Accelerated Simulation of Supersonic Combustion in Scramjet Engines by OpenFOAM. In Proceedings of the 33rd International Conference on Parallel Computational Fluid Dynamics—ParCFD2022, Manhattan, NY, USA, 25–27 May 2022. [Google Scholar]
- Martineau, M.; Posey, S.; Spiga, F. AmgX GPU Solver Developments for OpenFOAM. In Proceedings of the 8th OpenFOAM Conference, Virtual, 13–15 October 2020. [Google Scholar]
- Nagy, D.; Plavecz, L.; Hegedűs, F. The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs. Commun. Nonlinear Sci. Numer. Simul.
**2022**, 112, 106521. [Google Scholar] [CrossRef] - Jaiswal, S.; Reddy, R.; Banerjee, R.; Sato, S.; Komagata, D.; Ando, M.; Okada, J. An Efficient GPU Parallelization for Arbitrary Collocated Polyhedral Finite Volume Grids and Its Application to Incompressible Fluid Flows. In Proceedings of the 2016 IEEE 23rd International Conference on High Performance Computing Workshops (HiPCW), Hyderabad, India, 19–22 December 2016; pp. 81–89. [Google Scholar] [CrossRef]
- Ghioldi, F. Fast Algorithms for Highly Underexpanded Reactive Spray Simulations. Master’s Thesis, Politecnico di Milano, Milan, Italy, 2019. Available online: https://hdl.handle.net/10589/146075 (accessed on 11 August 2023).
- Trevisiol, F. Accelerating Reactive Flow Simulations via GPGPU ODE Solvers in OpenFOAM. Master’s Thesis, Politecnico di Milano, Milan, Italy, 2020. Available online: https://hdl.handle.net/10589/170955 (accessed on 11 August 2023).
- Ghioldi, F. Development of Novel CFD Methodologies for the Optimal Design of Modern Green Propulsion Systems. Ph.D. Thesis, Politecnico di Milano, Milan, Italy, 2022. Available online: https://hdl.handle.net/10589/195309 (accessed on 11 August 2023).
- Dyson, J. GPU Accelerated Linear System Solvers for OpenFOAM and Their Application to Sprays. Ph.D. Thesis, Brunel University London, Uxbridge, UK, 2016. [Google Scholar]
- Molinero, D.; Galván, S.; Domínguez, F.; Ibarra, L.; Solorio, G. Francis 99 CFD through RapidCFD accelerated GPU code. IOP Conf. Ser. Earth Environ. Sci.
**2021**, 774, 012016. [Google Scholar] [CrossRef] - Jacobsen, D.A.; Senocak, I. Multi-level parallelism for incompressible flow computations on GPU clusters. Parallel Comput.
**2013**, 39, 1–20. [Google Scholar] [CrossRef] - Naumov, M.; Arsaev, M.; Castonguay, P.; Cohen, J.; Demouth, J.; Eaton, J.; Layton, S.; Markovskiy, N.; Reguly, I.; Sakharnykh, N.; et al. AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods. SIAM J. Sci. Comput.
**2015**, 37, S602–S626. [Google Scholar] [CrossRef] - The OpenFOAM Foundation. Available online: http://www.openfoam.org/dev.php (accessed on 15 August 2023).
- ESI OpenCFD OpenFOAM. Available online: http://www.openfoam.com/ (accessed on 15 August 2023).
- Caretto, L.S.; Gosman, A.D.; Patankar, S.V.; Spalding, D.B. Two calculation procedures for steady, three-dimensional flows with recirculation. In Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics; Lecture Notes in Physics; Cabannes, H., Temam, R., Eds.; Springer: Berlin/Heidelberg, Germany, 1973; Volume 1, pp. 60–68. [Google Scholar]
- Gordon, S.; McBride, B. Computer Program for Calculation of Complex Equilibrium Compositions, Rocket Performance, Incident and Reflected Shocks, and Chapman-Jouguet Detonations; NASA Technical Report SP-273; National Aeronautics and Space Administration: Washington, DC, USA, 1971. [Google Scholar]
- Intel one API Math Kernel Library (MKL) Developer Reference. Available online: https://software.intel.com/content/www/us/en/develop/articles/mkl-reference-manual.html (accessed on 15 August 2023).
- Balay, S.; Abhyankar, S.; Adams, M.F.; Benson, S.; Brown, J.; Brune, P.; Buschelman, K.; Constantinescu, E.M.; Dalcin, L.; Dener, A.; et al. PETSc Web Page. 2023. Available online: https://petsc.org/ (accessed on 15 August 2023).
- NVIDIA cuSPARSE Library. Available online: https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSE (accessed on 15 August 2023).
- Algebraic Multigrid Solver (AmgX) Library. Available online: https://github.com/NVIDIA/AMGX (accessed on 15 August 2023).
- Barrett, R.; Berry, M.; Chan, T.F.; Demmel, J.; Donato, J.; Dongarra, J.; Eijkhout, V.; Pozo, R.; Romine, C.; van der Vorst, H. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1994. [Google Scholar] [CrossRef]
- Saad, Y. A Flexible Inner-Outer Preconditioned GMRES Algorithm. SIAM J. Sci. Comput.
**1993**, 14, 461–469. [Google Scholar] [CrossRef] - Vogel, J.A. Flexible BiCG and flexible Bi-CGSTAB for nonsymmetric linear systems. Appl. Math. Comput.
**2007**, 188, 226–233. [Google Scholar] [CrossRef] - Bna, S. foam2CSR. 2021. Available online: https://gitlab.hpc.cineca.it/openfoam/foam2csr (accessed on 15 August 2023).
- Liang, L.; Stevens, J.G.; Raman, S.; Farrell, J.T. The use of dynamic adaptive chemistry in combustion simulation of gasoline surrogate fuels. Combust. Flame
**2009**, 156, 1493–1502. [Google Scholar] [CrossRef] - Goldin, G.M.; Ren, Z.; Zahirovic, S. A cell agglomeration algorithm for accelerating detailed chemistry in CFD. Combust. Theory Model.
**2009**, 13, 721–739. [Google Scholar] [CrossRef] - Singer, M.A.; Pope, S.B. Exploiting ISAT to solve the reaction–diffusion equation. Combust. Theory Model.
**2004**, 8, 361–383. [Google Scholar] [CrossRef] - Li, Z.; Lewandowski, M.T.; Contino, F.; Parente, A. Assessment of On-the-Fly Chemistry Reduction and Tabulation Approaches for the Simulation of Moderate or Intense Low-Oxygen Dilution Combustion. Energy Fuels
**2018**, 32, 10121–10131. [Google Scholar] [CrossRef] - Blasco, J.; Fueyo, N.; Dopazo, C.; Ballester, J. Modelling the Temporal Evolution of a Reduced Combustion Chemical System With an Artificial Neural Network. Combust. Flame
**1998**, 113, 38–52. [Google Scholar] [CrossRef] - Nikitin, V.; Karandashev, I.; Malsagov, M.Y.; Mikhalchenko, E. Approach to combustion calculation using neural network. Acta Astronaut.
**2022**, 194, 376–382. [Google Scholar] [CrossRef] - Ji, W.; Qiu, W.; Shi, Z.; Pan, S.; Deng, S. Stiff-PINN: Physics-Informed Neural Network for Stiff Chemical Kinetics. J. Phys. Chem. A
**2021**, 125, 8098–8106. [Google Scholar] [CrossRef] - Tap, F.; Schapotschnikow, P. Efficient Combustion Modeling Based on Tabkin® CFD Look-up Tables: A Case Study of a Lifted Diesel Spray Flame. In Proceedings of the SAE Technical Paper; SAE International: Warrendale, PA, USA, 2012. [Google Scholar] [CrossRef]
- Haidar, A.; Brock, B.; Tomov, S.; Guidry, M.; Billings, J.J.; Shyles, D.; Dongarra, J.J. Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. In Proceedings of the 2016 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 13–15 September 2016; pp. 1–7. [Google Scholar]
- Wilt, N. The CUDA Handbook: A Comprehensive Guide to GPU Programming; Addison-Wesley: Boston, MA, USA, 2013. [Google Scholar]
- Poinsot, T.; Veynante, D. Theoretical and Numerical Combustion, 3rd ed.; CNRS: Paris, France, 2012. [Google Scholar]
- Van Der Houwen, P.; Sommeijer, B.P. On the Internal Stability of Explicit, m-Stage Runge-Kutta Methods for Large m-Values. J. Appl. Math. Mech.
**1980**, 60, 479–485. [Google Scholar] [CrossRef] - Volkov, V. Better performance at lower occupancy. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Austin, TX, USA, 15–21 November 2008. [Google Scholar]
- Press, W.; Teukolsky, S.; Vetterling, W.; Flannery, B. Numerical Recipes in C, 2nd ed.; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
- Fehlberg, E. Some Experimental Results Concerning the Error Propagation in Runge-Kutta Type Integration Formulas; NASA Technical Report; National Aeronautics and Space Administration: Washington, DC, USA, 1970. [Google Scholar]
- Fehlberg, E. Low-order Classical Runge-Kutta Formulas with Stepsize Control and Their Application to Some Heat Transfer Problems; NASA Technical Report TR-R-315; National Aeronautics and Space Administration: Washington, DC, USA, 1969. [Google Scholar]
- Cash, J.; Karp, A. A Variable Order Runge-Kutta Method for Initial Value Problems with Rapidly Varying Right-Hand Sides. Acm Trans. Math. Softw.
**1990**, 16, 201–222. [Google Scholar] [CrossRef] - Nickolls, J.; Dally, W.J. The GPU Computing Era. IEEE Micro
**2010**, 30, 56–69. [Google Scholar] [CrossRef] - Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro
**2008**, 28, 39–55. [Google Scholar] [CrossRef] - Branch Statistics. Available online: https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/branchstatistics.htm (accessed on 3 February 2023).
- NVIDIA; Vingelmann, P.; Fitzek, F.H.P. CUDA, Release: 10.2.89. 2020. Available online: https://developer.nvidia.com/cuda-toolkit (accessed on 11 August 2023).
- Klingbeil, G.; Erban, R.; Giles, M.; Maini, P.K. Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions. IEEE Trans. Parallel Distrib. Syst.
**2012**, 23, 280–287. [Google Scholar] [CrossRef] - Goodwin, D.G.; Speth, R.L.; Moffat, H.K.; Weber, B.W. Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes. Version 2.4.0. 2018. Available online: https://www.cantera.org (accessed on 15 August 2023). [CrossRef]
- Hong, Z.; Davidson, D.F.; Hanson, R.K. An improved H$_2$/O$_2$ mechanism based on recent shock tube/laser absorption measurements. Combust. Flame **2011**, 158, 633–644. [Google Scholar] [CrossRef] - Guerra, R.; Waidmann, W.; Laible, C. An Experimental Investigation of the Combustion of a Hydrogen Jet Injected Parallel in a Supersonic Air Stream. In Proceedings of the AIAA 3rd International Aerospace Conference, Washington, DC, USA, 3–5 December 1991. [Google Scholar]
- Waidmann, W.; Alff, F.; Bohm, M.; Claus, W.; Oschwald, M. Experimental Investigation of the Combustion Process in a Supersonic Combustion Ramjet (SCRAMJET); Technical Report; DGLR Jahrestagung: Erlangen, Germany, 1994. [Google Scholar]
- Waidmann, W.; Alff, F.; Böhm, M.; Brummund, U.; Clauss, W.; Oschwald, M. Supersonic Combustion of Hydrogen/Air in a Scramjet Combustion Chamber. Space Technol.
**1994**, 6, 421–429. [Google Scholar] - Génin, F.; Menon, S. Simulation of Turbulent Mixing Behind a Strut Injector in Supersonic Flow. AIAA J.
**2010**, 48, 526–539. [Google Scholar] [CrossRef] - Potturi, A.; Edwards, J. Investigation of Subgrid Closure Models for Finite-Rate Scramjet Combustion. In Proceedings of the 43rd Fluid Dynamics Conference, San Diego, CA, USA, 24–27 June 2013; pp. 1–10. [Google Scholar] [CrossRef]
- Zhang, H.; Zhao, M.; Huang, Z. Large eddy simulation of turbulent supersonic hydrogen flames with OpenFOAM. Fuel
**2020**, 282, 118812. [Google Scholar] [CrossRef] - Berglund, M.; Fureby, C. LES of supersonic combustion in a scramjet engine model. Proc. Combust. Inst.
**2007**, 31, 2497–2504. [Google Scholar] [CrossRef]

**Figure 1.** Segregated solution of the fluid transport for incompressible flows in the Finite Volume Method for steady and unsteady problems: (**a**) Semi-Implicit Method for Pressure-Linked Equations (SIMPLE) and (**b**) Pressure Implicit with Splitting of Operators (PISO). Each equation requires the solution of a large sparse linear system.

**Figure 2.** Contour plots of the pressure variations across different equally spaced cross-sections of the studied case. Mesh size: M (18 M cells). (**a**) Computational mesh (M); (**b**) CPU; (**c**) GPU.

**Figure 3.** Contour plots of pressure and velocity fluctuations in the symmetry plane of the investigated case. Mesh size: M (18 M cells). (**a**) CPU and (**b**) GPU. Pressure is reported as the differential relative to the ambient atmospheric pressure.

**Figure 4.** Strong scalability (Amdahl’s Law) calculated across the three tested grids (S, M, L). The simulation conducted with eight CPU cores served as the reference benchmark. (**a**) PCG-DIC (CPU) vs. AMG-PCG (GPU) and (**b**) GAMG (CPU) vs. AMG-PCG (GPU).

**Figure 5.** Solution methods employed in the GPGPU compressible unsteady reactive flow solvers: (**a**) pressure-based SIMPLE type and (**b**) density-based shock-capturing type. The calculation of the chemical mass action ${\dot{\omega}}_{i}$ at each time step is performed on the GPUs.

**Figure 6.** Decomposition method for hybrid CPU/GPU computations. The heterogeneous solver employs three-level parallelization on the fluid dynamic problem, the chemistry problem, and the reaction mechanism.

**Figure 8.** Hydrogen/air auto-ignition predicted over time by the direct integration of the chemistry problem on the hybrid CPU/GPU code, by an explicit CPU solver, and by Cantera [65]: (**a**) temperature; (**b**) evolution in time of ${Y}_{H}$; (**c**) evolution in time of ${Y}_{{H}_{2}{O}_{2}}$.

**Figure 11.** Body-fitted hexahedral Finite Volume (FV) mesh of the scramjet engine. Adaptive Mesh Refinement (AMR) is dynamically applied at run-time in proximity to large temperature gradients between neighboring cells. The number of cell elements ranges between 1.5 M (initial mesh) and 15 M.

**Figure 12.** Comparison of density gradients for the hot simulation: Schlieren image [70] (**left**) vs. numerical solution (**right**).

**Figure 14.** Mean temperature (K) at different positions along the flowpath: (**a**) x = 11 mm; (**b**) x = 58 mm; (**c**) x = 106 mm.

**Figure 15.** Combustion simulation at two different time steps, showing a representation of the flame via the threshold (top) and density field (bottom): (**a**) $5\times {10}^{-4}$ s; (**b**) $7\times {10}^{-4}$ s.

**Figure 16.** Scalability of the reactive flow solver on the initial coarse mesh (1.5 M cells, AMR deactivated): (**a**) cold flow simulation; (**b**) cold flow simulation with species transport (combustion off); (**c**) reactive flow simulation with finite-rate chemistry calculations.

**Table 1.** Quantitative results for the motorbike test case using the same mesh discretization. Legend: ${C}_{d}$ is the drag coefficient, ${C}_{L}$ is the lift coefficient, ${C}_{m,i}$ represents the moment coefficients over x, y, and z, and ${C}_{s}$ is the side force coefficient.

| Iter | Method | ${C}_{d}$ | ${C}_{L}$ | ${C}_{m,x}$ | ${C}_{m,y}$ | ${C}_{m,z}$ | ${C}_{s}$ |
|---|---|---|---|---|---|---|---|
| 1000 | S | 0.82% | 1.2% | 0.3% | 0.15% | 1.1% | 0.96% |
| 1000 | M | 0.7% | 0.9% | 0.1% | 0.2% | 1% | 0.8% |
| 1000 | L | 0.62% | 0.8% | 0.25% | 0.18% | 1.05% | 0.89% |

**Table 2.** Strong scaling analysis for the three tested grids. In this context, “nProcs” refers to the number of CPU cores used to decompose the computational domain.

| grid | CPU (nProcs = 8) | GPU (8) | CPU (16) | GPU (16) | CPU (32) | GPU (32) |
|---|---|---|---|---|---|---|
| S | 100% | 100% | 82.8% | 83.3% | 65% | 70% |
| M | 100% | 100% | 91% | 89.8% | 80% | 79% |
| L | 100% | 100% | 75% | 100% | 82% | 98% |

| $c_i$ | $j=1$ | $j=2$ | $j=3$ | $j=4$ | $j=5$ | $j=6$ |
|---|---|---|---|---|---|---|
| $0$ | | | | | | |
| $\frac{1}{5}$ | $\frac{1}{5}$ | | | | | |
| $\frac{3}{10}$ | $\frac{3}{40}$ | $\frac{9}{40}$ | | | | |
| $\frac{3}{5}$ | $\frac{3}{10}$ | $-\frac{9}{10}$ | $\frac{6}{5}$ | | | |
| $1$ | $-\frac{11}{54}$ | $\frac{5}{2}$ | $-\frac{70}{27}$ | $\frac{35}{27}$ | | |
| $\frac{7}{8}$ | $\frac{1631}{55,296}$ | $\frac{175}{512}$ | $\frac{575}{13,824}$ | $\frac{44,275}{110,592}$ | $\frac{253}{4096}$ | |
| $b_j$ (5th order) | $\frac{37}{378}$ | $0$ | $\frac{250}{621}$ | $\frac{125}{594}$ | $0$ | $\frac{512}{1771}$ |
| ${\tilde{b}}_j$ (4th order) | $\frac{2825}{27,648}$ | $0$ | $\frac{18,575}{48,384}$ | $\frac{13,525}{55,296}$ | $\frac{277}{14,336}$ | $\frac{1}{4}$ |
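
As a concrete check of the coefficients above, a single Cash–Karp step can be sketched as follows. This is an illustrative scalar helper of ours, not the paper's GPU kernel: the 4th- and 5th-order embedded solutions share the six stage evaluations, and their difference drives adaptive step-size control.

```python
# Illustrative single Cash-Karp Runge-Kutta step (scalar ODE, not the
# paper's GPU kernel). Returns the 5th-order solution plus the embedded
# 4th/5th-order difference used as a local error estimate.
A = [[], [1/5], [3/40, 9/40], [3/10, -9/10, 6/5],
     [-11/54, 5/2, -70/27, 35/27],
     [1631/55296, 175/512, 575/13824, 44275/110592, 253/4096]]
C = [0, 1/5, 3/10, 3/5, 1, 7/8]
B5 = [37/378, 0, 250/621, 125/594, 0, 512/1771]                 # 5th order
B4 = [2825/27648, 0, 18575/48384, 13525/55296, 277/14336, 1/4]  # 4th order

def cash_karp_step(f, t, y, h):
    k = []
    for i in range(6):  # six stage evaluations, reused by both solutions
        yi = y + h * sum(a * kj for a, kj in zip(A[i], k))
        k.append(f(t + C[i] * h, yi))
    y5 = y + h * sum(b * kj for b, kj in zip(B5, k))
    y4 = y + h * sum(b * kj for b, kj in zip(B4, k))
    return y5, abs(y5 - y4)  # solution + local error estimate

# dy/dt = y, y(0) = 1: one step of h = 0.1 should be close to exp(0.1).
y5, err = cash_karp_step(lambda t, y: y, 0.0, 1.0, 0.1)
```

On the GPU, the per-cell chemistry ODE systems each advance through this stage loop independently, which is what makes the batched integration embarrassingly parallel.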

| Stream | U [m/s] | ${T}_{0}$ [K] | p [Pa] | ${Y}_{{N}_{2}}$ [-] | ${Y}_{{O}_{2}}$ [-] | ${Y}_{{H}_{2}O}$ [-] | ${Y}_{{H}_{2}}$ [-] |
|---|---|---|---|---|---|---|---|
| Air | 730 | 600 | $10^{5}$ | 0.736 | 0.232 | 0.032 | 0 |
| Fuel | 1200 | 300 | $10^{5}$ | 0 | 0 | 0 | 1 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Piscaglia, F.; Ghioldi, F.
GPU Acceleration of CFD Simulations in OpenFOAM. *Aerospace* **2023**, *10*, 792.
https://doi.org/10.3390/aerospace10090792
