Open Access Article

GPU Parallelization of a Hybrid Pseudospectral Geophysical Turbulence Framework Using CUDA

1 Cooperative Institute for Research in the Atmosphere, Boulder, CO 80305, USA
2 Departamento de Física, Facultad de Ciencias Exactas y Naturales & IFIBA, CONICET, Ciudad Universitaria, Buenos Aires 1428, Argentina
3 CSRA Inc., at NOAA/NWS/NCEP/Environmental Modeling Center, College Park, MD 20740, USA
4 Laboratory for Atmospheric and Space Physics, CU, Boulder, CO 80309, USA
5 National Center for Atmospheric Research, Boulder, CO 80307, USA
* Author to whom correspondence should be addressed.
Atmosphere 2020, 11(2), 178; https://doi.org/10.3390/atmos11020178
Received: 16 January 2020 / Revised: 2 February 2020 / Accepted: 3 February 2020 / Published: 8 February 2020
An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine-grain parallelization approach for multidimensional distributed Fourier transforms in a well-characterized pseudospectral fluid turbulence code. The basics of the hybrid scheme are reviewed, and heuristics are provided to show the potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to handle memory management and on the cuFFT library to compute local FFTs. We discuss how the interfaces to these libraries are constructed and how ISO bindings are used to facilitate platform portability. CUDA streams are implemented to overlap data transfer with cuFFT computation. Testing with a baseline solver demonstrates significant aggregate speed-up over the hybrid MPI-OpenMP solver when offloading to GPUs on an NVLink-based test system. While the batch streamed approach provides little benefit with NVLink, we see a performance gain of 30% on a PCIe-based system when tuned for the optimal number of streams. Strong GPU scaling is nearly ideal in all cases. Profiling of the CUDA kernels shows that the transform computation achieves 15% of the attainable peak FLOP rate based on a roofline model for the system. In addition to speed-up measurements for the fiducial solver, we consider several other solvers with different numbers of transform operations and find that aggregate speed-ups are nearly constant across all solvers.
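To make the roofline statement concrete, the following is a minimal sketch of how a "fraction of attainable peak" figure can be computed. It assumes the standard roofline bound, in which attainable performance is the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not the measurements or system parameters from the article.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound in FLOP/s.

    peak_flops           : compute peak of the device, FLOP/s
    mem_bandwidth        : sustained memory bandwidth, bytes/s
    arithmetic_intensity : FLOPs performed per byte moved, FLOP/byte
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def fraction_of_attainable(measured_flops, peak_flops, mem_bandwidth,
                           arithmetic_intensity):
    """Ratio of a measured FLOP rate to the roofline bound."""
    bound = attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity)
    return measured_flops / bound


# Hypothetical values for illustration only (not taken from the article).
peak = 7.0e12        # FLOP/s
bandwidth = 9.0e11   # bytes/s
intensity = 2.0      # FLOP/byte, memory-bound regime for these values
measured = 2.7e11    # FLOP/s

bound = attainable_flops(peak, bandwidth, intensity)
frac = fraction_of_attainable(measured, peak, bandwidth, intensity)
print(f"roofline bound: {bound:.3e} FLOP/s, fraction achieved: {frac:.2f}")
```

With these illustrative inputs the kernel is memory-bound, so the bound is set by bandwidth times intensity rather than by the compute peak. FFT kernels commonly fall in this regime, though where a given kernel sits depends on the transform size and the device.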
Keywords: computational fluids; numerical simulation; MPI; OpenMP; CUDA; GPU; parallel computing
MDPI and ACS Style

Rosenberg, D.; Mininni, P.D.; Reddy, R.; Pouquet, A. GPU Parallelization of a Hybrid Pseudospectral Geophysical Turbulence Framework Using CUDA. Atmosphere 2020, 11, 178.
