Accelerating the Finite-Element Method for Reaction-Diffusion Simulations on GPUs with CUDA
Abstract
1. Introduction
2. Materials and Methods
2.1. Chemical System
2.2. Finite Element Method
2.3. Assembly of Stiffness and Damping Matrices
2.4. Resolution of the Matrix Differential Equations
Algorithm 1. User-level algorithm.
Algorithm 2. Addition of two vectors.
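As a rough illustration of what a GPU vector addition typically looks like, the sketch below emulates the usual one-thread-per-element pattern sequentially in Python. It is not the authors' listing: the function name `vector_add`, the parameter `threads_per_block`, and the block/thread loop structure are assumptions chosen to mirror how a CUDA grid is commonly indexed.

```python
def vector_add(a, b, threads_per_block=256):
    """Emulate a one-thread-per-element GPU vector addition on the CPU.

    Hypothetical sketch: names and launch layout are illustrative only.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    c = [0.0] * n
    # Number of blocks needed to cover all n elements (ceiling division).
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block in range(num_blocks):
        for thread in range(threads_per_block):
            # Global index, as a thread would derive it from its block and thread IDs.
            i = block * threads_per_block + thread
            # Bounds guard: the last block may be only partially filled.
            if i < n:
                c[i] = a[i] + b[i]
    return c
```

On a GPU the two loops disappear, because every (block, thread) pair runs concurrently; the bounds guard stays, since the number of launched threads is usually rounded up to a multiple of the block size.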
Algorithm 3. Naive dot product.
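One common naive formulation of a GPU dot product computes all element-wise products in parallel and then accumulates them serially. The Python sketch below emulates that formulation under this assumption; it is not taken from the paper, and the name `naive_dot` is hypothetical.

```python
def naive_dot(a, b):
    """Naive dot product: parallelizable products, serial accumulation.

    Hypothetical sketch of a common naive scheme, not the authors' listing.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Step 1: element-wise products. On a GPU each product would be
    # computed by its own thread.
    products = [x * y for x, y in zip(a, b)]
    # Step 2: serial accumulation. This O(n) sequential pass is the
    # part that a parallel reduction aims to shorten.
    total = 0.0
    for p in products:
        total += p
    return total
```

The serial second step is what makes this version naive: the multiplications scale with the number of threads, but the summation does not.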
Algorithm 4. Optimized dot product.
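A widely used way to optimize a GPU dot product is a tree reduction in per-block shared memory, which halves the number of active threads at each step. The sketch below emulates that pattern sequentially in Python. Whether the authors used exactly this scheme is an assumption, and the names `reduced_dot` and `block_size` are hypothetical.

```python
def reduced_dot(a, b, block_size=4):
    """Dot product via per-block tree reduction (CPU emulation).

    Hypothetical sketch of the standard shared-memory reduction pattern.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if block_size < 1 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    n = len(a)
    partial_sums = []
    for start in range(0, n, block_size):
        # Emulated shared memory: one product per thread, zero-padded
        # for threads that fall past the end of the data.
        shared = [a[i] * b[i] if i < n else 0.0
                  for i in range(start, start + block_size)]
        # Tree reduction: at each step, thread t adds the value held
        # `stride` slots away, then the stride is halved.
        stride = block_size // 2
        while stride > 0:
            for t in range(stride):
                shared[t] += shared[t + stride]
            stride //= 2
        partial_sums.append(shared[0])
    # Final combination of the per-block partial sums.
    return sum(partial_sums)
```

With this layout each block needs about log2(`block_size`) reduction steps instead of `block_size` serial additions. On a real GPU a synchronization barrier would separate successive steps, which the sequential emulation does not need.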
2.5. Comparison of GPU and CPU
2.6. Post-Processing
3. Results
3.1. Geometry
3.2. Performance Comparison
3.3. Profiling
4. Discussion
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
1. Brodtkorb, A.; Hagen, T.; Sætra, M. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput. 2013, 73, 4–13.
2. Ghorpade, J.; Parande, J.; Kulkarni, M.; Bawaskar, A. GPGPU processing in CUDA architecture. arXiv 2012, arXiv:1202.4347.
3. CUDA Performance Report. Available online: http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_6.5_Performance_Report.pdf (accessed on 7 September 2020).
4. Nickolls, J.; Buck, I.; Garland, M.; Skadron, K. Scalable parallel programming with CUDA. Queue 2008, 6, 40–53.
5. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
6. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the NIPS 2017 Workshop Autodiff, Long Beach, CA, USA, 9 December 2017.
7. Rovigatti, L.; Šulc, P.; Reguly, I.; Romano, F. A comparison between parallelization approaches in molecular dynamics simulations on GPUs. J. Comput. Chem. 2015, 36, 1–8.
8. Glaser, J.; Nguyen, T.; Anderson, J.; Lui, P.; Spiga, F.; Millan, J.; Morse, D.; Glotzer, S. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 2015, 192, 97–107.
9. Le Grand, S.; Götz, A.; Walker, R. SPFP: Speed without compromise—A mixed precision model for GPU accelerated molecular dynamics simulations. Comput. Phys. Commun. 2013, 184, 374–380.
10. Zienkiewicz, O.; Taylor, R.; Zhu, J. The Finite Element Method: Its Basis and Fundamentals; Elsevier: Amsterdam, The Netherlands, 2005.
11. Fu, Z.; Lewis, T.; Kirby, R.; Whitaker, R. Architecting the finite element method pipeline for the GPU. J. Comput. Appl. Math. 2014, 257, 195–211.
12. Wu, W.; Heng, P.A. A hybrid condensed finite element model with GPU acceleration for interactive 3D soft tissue cutting. Comput. Animat. Virtual Worlds 2004, 15, 219–227.
13. Göddeke, D.; Buijssen, S.H.; Wobker, H.; Turek, S. GPU acceleration of an unmodified parallel finite element Navier-Stokes solver. In Proceedings of the 2009 International Conference on High Performance Computing & Simulation, Leipzig, Germany, 21–24 June 2009; pp. 12–21.
14. Komatitsch, D.; Erlebacher, G.; Göddeke, D.; Michéa, D. High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. J. Comput. Phys. 2010, 229, 7692–7714.
15. Joldes, G.R.; Wittek, A.; Miller, K. Real-time nonlinear finite element computations on GPU–Application to neurosurgical simulation. Comput. Methods Appl. Mech. Eng. 2010, 199, 3305–3314.
16. Dziekonski, A.; Sypek, P.; Lamecki, A.; Mrozowski, M. Finite element matrix generation on a GPU. Prog. Electromagn. Res. 2012, 128, 249–265.
17. Knepley, M.G.; Terrel, A.R. Finite element integration on GPUs. ACM Trans. Math. Softw. (TOMS) 2013, 39, 1–13.
18. Wang, S.; Wang, C.; Cai, Y.; Li, G. A novel parallel finite element procedure for nonlinear dynamic problems using GPU and mixed-precision algorithm. Eng. Comput. 2020, 37.
19. Huthwaite, P. Accelerated finite element elastodynamic simulations using the GPU. J. Comput. Phys. 2014, 257, 687–707.
20. Johnsen, S.F.; Taylor, Z.A.; Clarkson, M.J.; Hipwell, J.; Modat, M.; Eiben, B.; Han, L.; Hu, Y.; Mertzanidou, T.; Hawkes, D.J.; et al. NiftySim: A GPU-based nonlinear finite element package for simulation of soft tissue biomechanics. Int. J. Comput. Assist. Radiol. Surg. 2015, 10, 1077–1095.
21. Bauer, P.; Klement, V.; Oberhuber, T.; Žabka, V. Implementation of the Vanka-type multigrid solver for the finite element approximation of the Navier–Stokes equations on GPU. Comput. Phys. Commun. 2016, 200, 50–56.
22. Carrascal-Manzanares, C.; Imperiale, A.; Rougeron, G.; Bergeaud, V.; Lacassagne, L. A fast implementation of a spectral finite elements method on CPU and GPU applied to ultrasound propagation. Adv. Parallel Comput. 2018, 32, 339–348.
23. Comsol, A. COMSOL Multiphysics User’s Guide; COMSOL: Stockholm, Sweden, 2005; Volume 10, p. 333.
24. Soloveichik, D.; Seelig, G.; Winfree, E. DNA as a universal substrate for chemical kinetics. Proc. Natl. Acad. Sci. USA 2010, 107, 5393–5398.
25. Kim, J.; Winfree, E. Synthetic in vitro transcriptional oscillators. Mol. Syst. Biol. 2011, 7, 465.
26. Montagne, K.; Plasson, R.; Sakai, Y.; Fujii, T.; Rondelez, Y. Programming an in vitro DNA oscillator using a molecular networking strategy. Mol. Syst. Biol. 2011, 7, 466.
27. Fujii, T.; Rondelez, Y. Predator–prey molecular ecosystems. ACS Nano 2013, 7, 27–34.
28. Padirac, A.; Fujii, T.; Estévez-Torres, A.; Rondelez, Y. Spatial waves in synthetic biochemical networks. J. Am. Chem. Soc. 2013, 135, 14586–14592.
29. Srinivas, N.; Parkin, J.; Seelig, G.; Winfree, E.; Soloveichik, D. Enzyme-free nucleic acid dynamical systems. Science 2017, 358.
30. Genot, A.J.; Bath, J.; Turberfield, A.J. Reversible logic circuits made of DNA. J. Am. Chem. Soc. 2011, 133, 20080–20083.
31. Genot, A.J.; Bath, J.; Turberfield, A.J. Combinatorial displacement of DNA strands: Application to matrix multiplication and weighted sums. Angew. Chem. Int. Ed. 2013, 52, 1189–1192.
32. Stojanovic, M.N.; Stefanovic, D.; Rudchenko, S. Exercises in molecular computing. Acc. Chem. Res. 2014, 47, 1845–1852.
33. Lopez, R.; Wang, R.; Seelig, G. A molecular multi-gene classifier for disease diagnostics. Nat. Chem. 2018, 10, 746–754.
34. Cherry, K.M.; Qian, L. Scaling up molecular pattern recognition with DNA-based winner-take-all neural networks. Nature 2018, 559, 370–376.
35. Woods, D.; Doty, D.; Myhrvold, C.; Hui, J.; Zhou, F.; Yin, P.; Winfree, E. Diverse and robust molecular algorithms using reprogrammable DNA self-assembly. Nature 2019, 567, 366–372.
36. Song, T.; Eshra, A.; Shah, S.; Bui, H.; Fu, D.; Yang, M.; Mokhtar, R.; Reif, J. Fast and compact DNA logic circuits based on single-stranded gates using strand-displacing polymerase. Nat. Nanotechnol. 2019, 14, 1075–1081.
37. Chirieleison, S.M.; Allen, P.B.; Simpson, Z.B.; Ellington, A.D.; Chen, X. Pattern transformation with DNA circuits. Nat. Chem. 2013, 5, 1000.
38. Weitz, M.; Kim, J.; Kapsner, K.; Winfree, E.; Franco, E.; Simmel, F.C. Diversity in the dynamical behaviour of a compartmentalized programmable biochemical oscillator. Nat. Chem. 2014, 6, 295–302.
39. Zambrano, A.; Zadorin, A.; Rondelez, Y.; Estévez-Torres, A.; Galas, J. Pursuit-and-evasion reaction-diffusion waves in microreactors with tailored geometry. J. Phys. Chem. B 2015, 119, 5349–5355.
40. Genot, A.; Baccouche, A.; Sieskind, R.; Aubert-Kato, N.; Bredeche, N.; Bartolo, J.; Taly, V.; Fujii, T.; Rondelez, Y. High-resolution mapping of bifurcations in nonlinear biochemical circuits. Nat. Chem. 2016, 8, 760.
41. Baccouche, A.; Okumura, S.; Sieskind, R.; Henry, E.; Aubert-Kato, N.; Bredeche, N.; Bartolo, J.F.; Taly, V.; Rondelez, Y.; Fujii, T.; et al. Massively parallel and multiparameter titration of biochemical assays with droplet microfluidics. Nat. Protoc. 2017, 12, 1912–1932.
42. Kurylo, I.; Gines, G.; Rondelez, Y.; Coffinier, Y.; Vlandas, A. Spatiotemporal control of DNA-based chemical reaction network via electrochemical activation in microfluidics. Sci. Rep. 2018, 8, 6396.
43. Amodio, A.; Del Grosso, E.; Troina, A.; Placidi, E.; Ricci, F. Remote Electronic Control of DNA-Based Reactions and Nanostructure Assembly. Nano Lett. 2018, 18, 2918–2923.
44. Zadorin, A.S.; Rondelez, Y.; Galas, J.C.; Estevez-Torres, A. Synthesis of programmable reaction-diffusion fronts using DNA catalyzers. Phys. Rev. Lett. 2015, 114, 068301.
45. Scalise, D.; Schulman, R. Emulating cellular automata in chemical reaction–diffusion networks. Nat. Comput. 2016, 15, 197–214.
46. Zadorin, A.S.; Rondelez, Y.; Gines, G.; Dilhas, V.; Urtel, G.; Zambrano, A.; Galas, J.C.; Estévez-Torres, A. Synthesis and materialization of a reaction–diffusion French flag pattern. Nat. Chem. 2017, 9, 990.
47. Abe, K.; Kawamata, I.; Shin-ichiro, M.; Murata, S. Programmable reactions and diffusion using DNA for pattern formation in hydrogel medium. Mol. Syst. Des. Eng. 2019, 4, 639–643.
48. Chen, S.; Seelig, G. Programmable patterns in a DNA-based reaction–diffusion system. Soft Matter 2020, 16, 3555–3563.
49. Bardi, I.; Biro, O.; Dyczij-Edlinger, R.; Preis, K.; Richter, K.R. On the treatment of sharp corners in the FEM analysis of high frequency problems. IEEE Trans. Magn. 1994, 30, 3108–3111.
50. Molnár, F., Jr.; Izsák, F.; Mészáros, R.; Lagzi, I. Simulation of reaction–diffusion processes in three dimensions using CUDA. Chemom. Intell. Lab. Syst. 2011, 108, 76–85.
51. Descombes, S.; Dhillon, D.; Zwicker, M. Optimized CUDA-based PDE Solver for Reaction Diffusion Systems on Arbitrary Surfaces. In International Conference on Parallel Processing and Applied Mathematics; Springer: Berlin/Heidelberg, Germany, 2015; pp. 526–536.
52. Sanderson, A.R.; Meyer, M.D.; Kirby, R.M.; Johnson, C.R. A framework for exploring numerical solutions of advection–reaction–diffusion equations using a GPU-based approach. Comput. Vis. Sci. 2009, 12, 155–170.
53. Sato, D.; Xie, Y.; Weiss, J.N.; Qu, Z.; Garfinkel, A.; Sanderson, A.R. Acceleration of cardiac tissue simulation with graphic processing units. Med. Biol. Eng. Comput. 2009, 47, 1011–1015.
54. Mena, A.; Ferrero, J.M.; Matas, J.F.R. GPU accelerated solver for nonlinear reaction–diffusion systems. Application to the electrophysiology problem. Comput. Phys. Commun. 2015, 196, 280–289.
55. Sjodin, B. What’s the Difference between FEM, FDM, and FVM. Mach. Des. 2016. Available online: https://www.machinedesign.com/3d-printing-cad/fea-and-simulation/article/21832072/whats-the-difference-between-fem-fdm-and-fvm (accessed on 7 September 2020).
56. Pera, D.; Málaga, C.; Simeoni, C.; Plaza, R. On the efficient numerical simulation of heterogeneous anisotropic diffusion models for tumor invasion using GPUs. Rend. Mat. E Sue Appl. 2019, 40, 233–255.
57. Gormantara, A.; Pranowo, P. Parallel simulation of pattern formation in a reaction-diffusion system of FitzHugh-Nagumo using GPU CUDA. In AIP Conference Proceedings; AIP Publishing LLC: College Park, MD, USA, 2020; Volume 2217, p. 030134.
58. Zaikin, A.; Zhabotinsky, A. Concentration wave propagation in two-dimensional liquid-phase self-oscillating system. Nature 1970, 225, 535–537.
59. Turing, A.M. The chemical basis of morphogenesis. Bull. Math. Biol. 1990, 52, 153–197.
60. Dalchau, N.; Seelig, G.; Phillips, A. Computational design of reaction-diffusion patterns using DNA-based chemical reaction networks. In International Workshop on DNA-Based Computers; Springer: Berlin/Heidelberg, Germany, 2014; pp. 84–99.
61. Zenk, J.; Scalise, D.; Wang, K.; Dorsey, P.; Fern, J.; Cruz, A.; Schulman, R. Stable DNA-based reaction–diffusion patterns. RSC Adv. 2017, 7, 18032–18040.
62. Smith, S.; Dalchau, N. Beyond activator-inhibitor networks: The generalised Turing mechanism. arXiv 2018, arXiv:1803.07886.
63. Smith, S.; Dalchau, N. Model reduction enables Turing instability analysis of large reaction–diffusion models. J. R. Soc. Interface 2018, 15, 20170805.
64. Joesaar, A.; Yang, S.; Bögels, B.; van der Linden, A.; Pieters, P.; Kumar, B.P.; Dalchau, N.; Phillips, A.; Mann, S.; de Greef, T.F. DNA-based communication in populations of synthetic protocells. Nat. Nanotechnol. 2019, 14, 369–378.
65. Urtel, G.; Estevez-Torres, A.; Galas, J.C. DNA-based long-lived reaction–diffusion patterning in a host hydrogel. Soft Matter 2019, 15, 9343–9351.
66. Gines, G.; Zadorin, A.; Galas, J.C.; Fujii, T.; Estevez-Torres, A.; Rondelez, Y. Microscopic agents programmed by DNA circuits. Nat. Nanotechnol. 2017, 12, 351–359.
67. Dupin, A.; Simmel, F.C. Signalling and differentiation in emulsion-based multi-compartmentalized in vitro gene circuits. Nat. Chem. 2019, 11, 32–39.
68. Kasahara, Y.; Sato, Y.; Masukawa, M.K.; Okuda, Y.; Takinoue, M. Photolithographic shape control of DNA hydrogels by photo-activated self-assembly of DNA nanostructures. APL Bioeng. 2020, 4, 016109.
69. Wolfram Research, Inc. Mathematica, Version 12.1; Wolfram Research, Inc.: Champaign, IL, USA, 2020.
70. Galerkin, B. Series occurring in various questions concerning the elastic equilibrium of rods and plates. Eng. Bull. (Vestn. Inzhenerov) 1915, 19, 897–908.
71. Strang, G. On the construction and comparison of difference schemes. SIAM J. Numer. Anal. 1968, 5, 506–517.
72. Ahamed, A.; Magoules, F. Conjugate gradient method with graphics processing unit acceleration: CUDA vs. OpenCL. Adv. Eng. Softw. 2017, 111, 32–42.
73. Barrett, R.; Berry, M.; Chan, T.; Demmel, J.; Donato, J.; Dongarra, J.; Eijkhout, V.; Pozo, R.; Romine, C.; Van der Vorst, H. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods; SIAM: Philadelphia, PA, USA, 1994.
74. Nvidia, C. Cublas Library; NVIDIA Corp.: Santa Clara, CA, USA, 2008; Volume 15, p. 31.
75. Naumov, M.; Chien, L.; Vandermersch, P.; Kapasi, U. CUSPARSE library: A set of basic linear algebra subroutines for sparse matrices. In Proceedings of the GPU Technology Conference, San Jose, CA, USA, 20–23 September 2010; Volume 2070.
76. Hestenes, M.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436.
77. Hutton, T.; Munafo, R.; Trevorrow, A.; Rokicki, T.; Wills, D. Ready, A Cross-Platform Implementation of Various Reaction-Diffusion Systems. 2015. Available online: https://github.com/GollyGang/ready (accessed on 7 September 2020).
78. Du, Q.; Wang, D.; Zhu, L. On mesh geometry and stiffness matrix conditioning for general finite element spaces. SIAM J. Numer. Anal. 2009, 47, 1421–1444.
79. Ramage, A.; Wathen, A. On preconditioning for finite element equations on irregular grids. SIAM J. Matrix Anal. Appl. 1994, 15, 909–921.
80. Bell, N.; Garland, M. Efficient Sparse Matrix-Vector Multiplication on CUDA; Technical Report, Nvidia Technical Report NVR-2008-004; Nvidia Corporation: Santa Clara, CA, USA, 2008.
Paper | Method | Problem | Speedup (GPU relative to CPU baseline) |
---|---|---|---|
Sanderson et al., 2009 [52] | FDM | grid mesh, advection-reaction-diffusion | ∼5–10x vs. one CPU core |
Molnár et al., 2011 [50] | FDM | grid mesh, Turing patterns, Cahn–Hilliard eq., … | ∼5–40x vs. one CPU thread |
Pera et al., 2019 [56] | FDM | grid mesh, tumor growth | ∼100–500x vs. 8-core CPU |
Gormantara et al., 2020 [57] | FDM | grid mesh, FitzHugh-Nagumo model | ∼10x vs. CPU |
Sato et al., 2009 [53] | FEM+ODE | 3D cardiac simulations | ∼0.6x vs. 32-CPU cluster |
Mena et al., 2015 [54] | FEM+ODE | 3D cardiac simulations | ∼50x vs. one CPU core |
Descombes et al., 2015 [51] | FEM | chemotactic reaction-diffusion on arbitrary surface | ∼100–300x vs. 4-core CPU |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sellami, H.; Cazenille, L.; Fujii, T.; Hagiya, M.; Aubert-Kato, N.; Genot, A.J. Accelerating the Finite-Element Method for Reaction-Diffusion Simulations on GPUs with CUDA. Micromachines 2020, 11, 881. https://doi.org/10.3390/mi11090881