Article

A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters

1 Department of Informatics, Technical University of Munich, 85748 Garching, Germany
2 Department of Computer Science/Mathematics, University of Exeter, Exeter EX4 4QF, UK
3 Scientific Computing, University of Hamburg, 20146 Hamburg, Germany
* Author to whom correspondence should be addressed.
Computation 2017, 5(4), 48; https://doi.org/10.3390/computation5040048
Received: 16 October 2017 / Revised: 11 November 2017 / Accepted: 24 November 2017 / Published: 30 November 2017
(This article belongs to the Section Computational Engineering)
Heterogeneous clusters are a widely used class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, and offer huge computational potential. Programming them in a scalable way that exploits their maximal performance introduces numerous challenges, such as optimizations for different computing devices, dealing with multiple levels of parallelism, the application of different programming models, work distribution, and hiding communication behind computation. We use the lattice Boltzmann method for fluid flow as a representative scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. The result is an implementation that uses all available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability, making it future-proof for heterogeneous clusters of upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90% are achieved, leading to 2604.72 GLUPS when using 24,576 CPU cores and 2048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 × 10^9 lattice cells.
Keywords: GPU clusters; heterogeneous clusters; hybrid implementation; lattice Boltzmann method; multilevel parallelism; petascale; resource assignment; scalability
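
To make the performance figures in the abstract concrete, the following minimal sketch shows how the two metrics it quotes are commonly computed. GLUPS is taken here as giga lattice updates per second. The abstract does not state the baseline or the scaling mode behind the reported efficiency, so the weak-scaling definition below is an assumption, and the function names are hypothetical rather than taken from the article.

```python
def glups(cells: float, timesteps: int, seconds: float) -> float:
    """Giga lattice updates per second: total cell updates / wall time / 1e9."""
    return cells * timesteps / seconds / 1e9


def weak_scaling_efficiency(perf: float, units: int,
                            base_perf: float, base_units: int) -> float:
    """Per-unit throughput relative to a baseline run.

    Assumed definition; the abstract does not specify the baseline used.
    """
    return (perf / units) / (base_perf / base_units)


# Figures quoted in the abstract.
TOTAL_GLUPS = 2604.72
NUM_GPUS = 2048
NUM_CELLS = 6.8e9  # the abstract says "more than", so this is a lower bound

# Aggregate throughput attributed to each GPU (each paired with CPU cores).
glups_per_gpu = TOTAL_GLUPS / NUM_GPUS

# Lower bound on the wall time of one time step over the whole domain.
seconds_per_step = NUM_CELLS / (TOTAL_GLUPS * 1e9)

print(f"{glups_per_gpu:.3f} GLUPS per GPU")
print(f"{seconds_per_step * 1e3:.2f} ms per time step (lower bound)")
```

With the quoted figures this gives roughly 1.27 GLUPS per GPU and about 2.6 ms per time step for the full domain.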

MDPI and ACS Style

Riesinger, C.; Bakhtiari, A.; Schreiber, M.; Neumann, P.; Bungartz, H.-J. A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters. Computation 2017, 5, 48. https://doi.org/10.3390/computation5040048

AMA Style

Riesinger C, Bakhtiari A, Schreiber M, Neumann P, Bungartz H-J. A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters. Computation. 2017; 5(4):48. https://doi.org/10.3390/computation5040048

Chicago/Turabian Style

Riesinger, Christoph, Arash Bakhtiari, Martin Schreiber, Philipp Neumann, and Hans-Joachim Bungartz. 2017. "A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters" Computation 5, no. 4: 48. https://doi.org/10.3390/computation5040048

