Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation
Abstract
1. Introduction
2. Equation and Methods
2.1. The Numerical Methods Being Investigated
- f is the temperature [°C or K] or, in the case of diffusion, the concentration,
- t is the simulated time [s],
- is the position vector [m],
- D is the diffusion coefficient [m²/s], and
- ∇² is the Laplacian operator.
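The symbols listed above are those of the diffusion (heat) equation. Assuming a constant D and a uniform grid with spacings Δx and Δy, its standard form and the explicit Euler (forward-time, centred-space) update can be written as follows; the mesh ratios r_x and r_y and the grid indices i, j, n are notation introduced here for illustration.

```latex
\begin{aligned}
\frac{\partial f}{\partial t} &= D\,\nabla^{2} f, \\
f_{i,j}^{n+1} &= \bigl(1 - 2(r_x + r_y)\bigr) f_{i,j}^{n}
  + r_x \bigl(f_{i-1,j}^{n} + f_{i+1,j}^{n}\bigr)
  + r_y \bigl(f_{i,j-1}^{n} + f_{i,j+1}^{n}\bigr), \\
r_x &= \frac{D\,\Delta t}{\Delta x^{2}}, \qquad r_y = \frac{D\,\Delta t}{\Delta y^{2}}.
\end{aligned}
```

The update has the same structure as the expression evaluated in Listing 1, so Euler's method would correspond to coeffs = (r_x, r_y). It is stable only for r_x + r_y ≤ 1/2.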
2.2. Parallelization and Applying OpenCL
Listing 1: Kernel of CNe and the first stage of CpC.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

// One work-item updates one datapoint; work-item (i, j) maps to the
// row-major index i + Nx * j.
__kernel void calc_cell_2D(
        __global const double *before, __global double *after,
        double2 coeffs, uint2 size, long4 neighbours) {
    const size_t index = get_global_id(0) + size.x * get_global_id(1);
    __global const double *old_cell = before + index;
    // Weighted sum of the old value and its x- and y-neighbour pairs;
    // neighbours.even = (s0, s2) and neighbours.odd = (s1, s3).
    after[index] = (1 - 2 * (coeffs.x + coeffs.y)) * *old_cell
        + coeffs.x *
          (old_cell[neighbours.even.x] +
           old_cell[neighbours.odd.x])
        + coeffs.y *
          (old_cell[neighbours.even.y] +
           old_cell[neighbours.odd.y]);
}
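- before, after are pointers to the arrays containing the data before and after the timestep;
- coeffs holds the numerical coefficients of the scheme: coeffs.x multiplies the sum of the two x-neighbours and coeffs.y that of the two y-neighbours;
- size is a vector containing the numbers of datapoints [Nx, Ny];
- neighbours is a vector containing the pointer differences (offsets in elements) between the current datapoint and its four neighbours, components 0 and 1 for the x direction and components 2 and 3 for the y direction (introduced to simplify handling boundary conditions).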
Listing 2: Kernel of the second stage of CpC.
// Same stencil as calc_cell_2D, except that the centre value is read
// from stage0 (the data before stage 1) and the neighbour values from
// stage1 (the result of stage 1).
__kernel void stage2p05_calc_cell_2D(
        __global const double *stage0,
        __global const double *stage1,
        __global double *after,
        double2 coeffs, uint2 size, long4 neighbours) {
    const size_t index = get_global_id(0) + size.x * get_global_id(1);
    __global const double *old_cell = stage0 + index;
    __global const double *midpoint = stage1 + index;
    after[index] = (1 - 2 * (coeffs.x + coeffs.y)) * *old_cell
        + coeffs.x *
          (midpoint[neighbours.even.x] +
           midpoint[neighbours.odd.x])
        + coeffs.y *
          (midpoint[neighbours.even.y] +
           midpoint[neighbours.odd.y]);
}
- stage0 is a pointer to the array containing the data before stage 1;
- stage1 is a pointer to the array containing the result of stage 1;
- after is a pointer to the array receiving the data after all the stages;
- coeffs holds the numerical coefficients of the second stage: coeffs.x multiplies the sum of the two x-neighbours and coeffs.y that of the two y-neighbours;
- size is a vector containing the numbers of datapoints [Nx, Ny];
- neighbours is a vector containing the pointer differences (offsets in elements) between the current datapoint and its four neighbours, ordered as in Listing 1 (intended to simplify the handling of boundary conditions).
2.3. Initial and Boundary Conditions, Scale Parameters, and the Method of Investigation
- a 1D sine wave with nodes at the boundaries;
- a 1D Gaussian function [33];
- in 2D, the product of sine waves along each dimension, with nodes at the boundaries;
- a 2D planar wave propagating at an angle [34].
2.4. The Specifications of the Applied Computer and Further Circumstances of the Investigation
- CPU: Gen11 Intel Core i7-11700F, 2.5 GHz;
- RAM: 64 GB;
- OS: Windows 10 Professional (22H2), x64;
- GPU: NVIDIA;
- OpenCL: Intel v2020.3.494.
3. Results
3.1. The Dependency of CPU Time on the Number of Data Points
3.1.1. The CNe Method Compared to Euler’s Method
3.1.2. The CpC Method Compared to Euler’s Method
3.2. The Error as a Function of the Timestep Size
3.2.1. The CNe Method Compared to Euler’s Method
3.2.2. The CpC Method Compared to Euler’s Method
4. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Robert, E.T.; Larson, E. Parallel Processing Algorithms for the Optimal Control of Nonlinear Dynamic Systems. IEEE Trans. Comput. 1973, 100, 777–786.
- Patel, Y.F.; Dhodiya, J.M. Efficient algorithm to study the class of Burger’s Fisher equation. Int. J. Appl. Nonlinear Sci. 2022, 3, 179–266.
- Xue, G.; Gao, Y. A Samarskii domain decomposition method for two-dimensional convection-diffusion equations. Comput. Appl. Math. 2022, 41, 283.
- Kumar, V.; Chandan, K.; Nagaraja, K.V.; Reddy, M.V. Heat conduction with Krylov subspace method using FEniCSx. Energies 2022, 15, 8077.
- Köroğlu, C.; Aydin, A. Exact and nonstandard finite difference schemes for the Burgers equation B(2,2). Turk. J. Math. 2021, 45, 3.
- Ruan, C.; Dong, C.; Zhang, Z.; Chen, B.; Liu, Z. Finite difference-peridynamic differential operator. Comput. Model. Eng. Sci. 2024, 140, 2707–2728.
- Yang, J.; Kim, J. Consistently and unconditionally energy-stable linear method for the diffuse-interface model of narrow volume reconstruction. Eng. Comput. 2024, 40, 2617–2627.
- Mbroh, N.A.; Munyakazi, J.B. A robust numerical method for singularly perturbed parabolic reaction-diffusion problems via the method of lines. Comput. Math. 2021, 99, 1139–1158.
- Aydin, A.; Koroglu, C. A nonstandard numerical method for the modified KdV equation. Pramana—J. Phys. 2017, 89, 72.
- Beuken, L.; Cheffert, O.; Tutueva, A.; Butusov, D.; Legat, V. Numerical stability and performance of semi-explicit and semi-implicit predictor-corrector methods. Mathematics 2022, 10, 2015.
- Ji, Y.; Xing, Y. Highly accurate and efficient time integration methods with unconditional stability and flexible numerical dissipation. Mathematics 2023, 11, 593.
- Dou, N.; Dlamini, P.; Jacobs, B.A. Enhanced unconditionally positive finite difference method for advection-diffusion-reaction equations. Mathematics 2022, 10, 2639.
- Jaglan, J.; Singh, A.; Maurya, V.; Yadav, V.S.; Rajpoot, M.K. Strong stability preserving multiderivative time marching methods for stiff reaction-diffusion systems. Math. Comput. Simul. 2024, 225, 267–282.
- Essongue, S.; Ledoux, Y.; Ballu, A. Speeding up mesoscale thermal simulations of powder bed additive manufacturing thanks to the forward Euler time integration scheme: A critical assessment. Finite Elem. Anal. Des. 2022, 211, 103825.
- Kovács, E.; Nagy, Á.; Saleh, M. A set of new, stable, explicit, second-order schemes for the nonstationary heat conduction equation. Mathematics 2021, 9, 2284.
- Kovács, E. A Class of New Stable, Explicit Methods to Solve the Non-Stationary Heat Equation. Numer. Methods Partial Differ. Equ. 2020, 37, 2469–2489.
- Askar, A.H.; Nagy, Á.; Barna, I.F.; Kovács, E. Analytical and numerical results for the diffusion-reaction equation when the reaction coefficient depends on simultaneously the space and time coordinates. Computation 2023, 11, 127.
- Midkiff, S.P. Automatic Parallelization: An Overview of Fundamental Compiler Techniques; Springer Nature: Cham, Switzerland, 2022.
- Parhami, B. Introduction to Parallel Processing: Algorithms and Architectures; Springer Science & Business Media: Cham, Switzerland, 2006.
- Munshi, A. The OpenCL specification. In 2009 IEEE Hot Chips 21 Symposium (HCS); IEEE: Stanford, CA, USA, 2009; pp. 1–314.
- Kang, P. Programming for high-performance computing on edge accelerators. Mathematics 2023, 11, 1055.
- Takáč, M.; Petráš, I. Cross-Platform GPU-Based Implementation of Lattice Boltzmann Method Solver Using ArrayFire Library. Mathematics 2021, 9, 1793.
- Tavakkoli, V.; Mohsenzadegan, K.; Chedjou, J.C.; Kyamakya, K. Contribution to Speeding-Up the Solving of Nonlinear Ordinary Differential Equations on Parallel/Multi-Core Platforms for Sensing Systems. Sensors 2020, 20, 6130.
- Baskaran, M.M.; Bordawekar, R. Optimizing Sparse Matrix-Vector Multiplication on GPUs. IBM Research Report RC24704, (W0812–047) 2009. Available online: https://dominoweb.draco.res.ibm.com/reports/rc24704.pdf (accessed on 19 September 2024).
- Stanisławski, R.; Kozioł, K. Parallel Implementation of Modeling of Fractional-Order State-Space Systems Using the Fixed-Step Euler Method. Entropy 2019, 21, 931.
- Di Tucci, L.; O’Brien, K.; Blott, M.; Santambrogio, M.D. Architectural optimizations for high performance and energy efficient Smith-Waterman implementation on FPGAs using OpenCL. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31 March 2017; pp. 716–721.
- Khronos Group. OpenCL 3.0 Specification. 2020. Available online: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html (accessed on 19 September 2024).
- Jo, G.; Jung, J.; Park, J.; Lee, J. Memory-Access-Pattern Analysis Techniques for OpenCL Kernels. In International Workshop on Languages and Compilers for Parallel Computing; Springer International Publishing: Cham, Switzerland, 2017; pp. 109–126.
- Jääskeläinen, P.; de La Lama, C.S.; Schnetter, E.; Raiskila, K.; Takala, J.; Berg, H. pocl: A performance-portable OpenCL implementation. Int. J. Parallel Program. 2015, 43, 752–785.
- Wang, Z.; He, B.; Zhang, W.; Jiang, S. A performance analysis framework for optimizing OpenCL applications on FPGAs. In Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), Barcelona, Spain, 12–16 March 2016; pp. 114–125.
- Smith, G.D. Numerical Solution of Partial Differential Equations: Finite Difference Methods; Oxford University Press: Oxford, UK, 1985.
- MathWorks. MATLAB—ODE15S. 2023. Available online: https://www.mathworks.com/help/matlab/ref/ode15s.html (accessed on 19 September 2024).
- Weisstein, E.W. Gaussian Function. In MathWorld—A Wolfram Web Resource. Available online: https://mathworld.wolfram.com/GaussianFunction.html (accessed on 19 September 2024).
- Jackson, J.D. Classical Electrodynamics, 3rd ed.; Wiley: Hoboken, NJ, USA, 1998.
Nt | ∆t |
---|---|
9 | 1.1111 × 10⁻¹ |
15 | 6.6667 × 10⁻² |
30 | 3.3333 × 10⁻² |
50 | 2.0000 × 10⁻² |
90 | 1.1111 × 10⁻² |
150 | 6.6667 × 10⁻³ |
300 | 3.3333 × 10⁻³ |
500 | 2.0000 × 10⁻³ |
900 | 1.1111 × 10⁻³ |
1500 | 6.6667 × 10⁻⁴ |
3000 | 3.3333 × 10⁻⁴ |
5000 | 2.0000 × 10⁻⁴ |
9000 | 1.1111 × 10⁻⁴ |
15,000 | 6.6667 × 10⁻⁵ |
30,000 | 3.3333 × 10⁻⁵ |
50,000 | 2.0000 × 10⁻⁵ |
90,000 | 1.1111 × 10⁻⁵ |
150,000 | 6.6667 × 10⁻⁶ |
300,000 | 3.3333 × 10⁻⁶ |
500,000 | 2.0000 × 10⁻⁶ |
900,000 | 1.1111 × 10⁻⁶ |
1,500,000 | 6.6667 × 10⁻⁷ |
3,000,000 | 3.3333 × 10⁻⁷ |
5,000,000 | 2.0000 × 10⁻⁷ |
9,000,000 | 1.1111 × 10⁻⁷ |
Nx | ∆x |
---|---|
50 | 2.0408 × 10⁻¹ |
100 | 1.0101 × 10⁻¹ |
200 | 5.0251 × 10⁻² |
400 | 2.5063 × 10⁻² |
800 | 1.2516 × 10⁻² |
1200 | 8.3403 × 10⁻³ |
2000 | 5.0025 × 10⁻³ |
4000 | 2.5006 × 10⁻³ |
8000 | 1.2502 × 10⁻³ |
12,000 | 8.3340 × 10⁻⁴ |
Nt | ∆t |
---|---|
9 | 1.1111 × 10⁻² |
15 | 6.6667 × 10⁻³ |
30 | 3.3333 × 10⁻³ |
50 | 2.0000 × 10⁻³ |
90 | 1.1111 × 10⁻³ |
150 | 6.6667 × 10⁻⁴ |
300 | 3.3333 × 10⁻⁴ |
500 | 2.0000 × 10⁻⁴ |
900 | 1.1111 × 10⁻⁴ |
1500 | 6.6667 × 10⁻⁵ |
3000 | 3.3333 × 10⁻⁵ |
5000 | 2.0000 × 10⁻⁵ |
9000 | 1.1111 × 10⁻⁵ |
15,000 | 6.6667 × 10⁻⁶ |
30,000 | 3.3333 × 10⁻⁶ |
50,000 | 2.0000 × 10⁻⁶ |
90,000 | 1.1111 × 10⁻⁶ |
150,000 | 6.6667 × 10⁻⁷ |
300,000 | 3.3333 × 10⁻⁷ |
500,000 | 2.0000 × 10⁻⁷ |
900,000 | 1.1111 × 10⁻⁷ |
Nx | Ny | Nx·Ny | ∆x | ∆y | ∆x·∆y |
---|---|---|---|---|---|
25 | 25 | 625 | 4.1667 × 10⁻² | 4.1667 × 10⁻² | 1.7361 × 10⁻³ |
25 | 50 | 1250 | 4.1667 × 10⁻² | 2.0408 × 10⁻² | 8.5034 × 10⁻⁴ |
50 | 50 | 2500 | 2.0408 × 10⁻² | 2.0408 × 10⁻² | 4.1649 × 10⁻⁴ |
50 | 75 | 3750 | 2.0408 × 10⁻² | 1.3514 × 10⁻² | 2.7579 × 10⁻⁴ |
75 | 75 | 5625 | 1.3514 × 10⁻² | 1.3514 × 10⁻² | 1.8262 × 10⁻⁴ |
75 | 100 | 7500 | 1.3514 × 10⁻² | 1.0101 × 10⁻² | 1.3650 × 10⁻⁴ |
100 | 100 | 10,000 | 1.0101 × 10⁻² | 1.0101 × 10⁻² | 1.0203 × 10⁻⁴ |
100 | 150 | 15,000 | 1.0101 × 10⁻² | 6.7114 × 10⁻³ | 6.7792 × 10⁻⁵ |
150 | 150 | 22,500 | 6.7114 × 10⁻³ | 6.7114 × 10⁻³ | 4.5043 × 10⁻⁵ |
150 | 200 | 30,000 | 6.7114 × 10⁻³ | 5.0251 × 10⁻³ | 3.3726 × 10⁻⁵ |
200 | 200 | 40,000 | 5.0251 × 10⁻³ | 5.0251 × 10⁻³ | 2.5252 × 10⁻⁵ |
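The step sizes in the tables above are consistent with simple relations that the text does not state explicitly: ∆t = t_end/Nt with t_end = 1 for the first Nt table and t_end = 0.1 for the second, ∆x = 10/(Nx − 1) for the 1D spatial table, and ∆x = 1/(Nx − 1), ∆y = 1/(Ny − 1) for the 2D table. These relations are inferred from the numbers themselves; the short C check below regenerates a few entries at the printed precision, and its helper names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Inferred relation for the Nt tables: dt = t_end / Nt. */
static double step_from_count(double t_end, long nt) {
    assert(nt > 0);
    return t_end / (double)nt;
}

/* Inferred relation for the Nx and Ny columns: n points span a length,
 * both end points included, so there are n - 1 intervals. */
static double spacing_from_points(double length, long n) {
    assert(n > 1);
    return length / (double)(n - 1);
}

/* Relative comparison at the five significant digits used in the tables. */
static int matches_table(double value, double tabulated) {
    return fabs(value - tabulated) <= 5e-5 * fabs(tabulated);
}
```

For instance, spacing_from_points(10.0, 50) gives 0.204082, which rounds to the tabulated 2.0408 × 10⁻¹, and the product of the 2D spacings for Nx = 150, Ny = 200 gives 1/29,651 ≈ 3.3726 × 10⁻⁵.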
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Koics, D.; Kovács, E.; Hornyák, O. Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation. Computers 2024, 13, 250. https://doi.org/10.3390/computers13100250