Accelerating the FiniteElement Method for ReactionDiffusion Simulations on GPUs with CUDA
Abstract
:1. Introduction
2. Materials and Methods
2.1. Chemical System
2.2. Finite Element Method
2.3. Assembly of Stiffness and Damping Matrices
2.4. Resolution of the Matrix Differential Equations
Algorithm 1 Userlevel algorithm. 

Algorithm 2 Addition of two vectors. 

Algorithm 3 Naive dot product. 

Algorithm 4 Optimized dot product. 

2.5. Comparison GPU and CPU
2.6. PostProcessing
3. Results
3.1. Geometry
3.2. Comparison Performance
3.3. Profiling
4. Discussion
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
 Brodtkorb, A.; Hagen, T.; Sætra, M. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput. 2013, 73, 4–13. [Google Scholar] [CrossRef][Green Version]
 Ghorpade, J.; Parande, J.; Kulkarni, M.; Bawaskar, A. GPGPU processing in CUDA architecture. arXiv 2012, arXiv:1202.4347. [Google Scholar] [CrossRef]
 CUDA Performance Report. Available online: http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_6.5_Performance_Report.pdf (accessed on 7 September 2020).
 Nickolls, J.; Buck, I.; Garland, M.; Skadron, K. Scalable parallel programming with CUDA. Queue 2008, 6, 40–53. [Google Scholar] [CrossRef][Green Version]
 Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for largescale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
 Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the NIPS 2017 Workshop Autodiff, Long Beach, CA, USA, 9 December 2017. [Google Scholar]
 Rovigatti, L.; Šulc, P.; Reguly, I.; Romano, F. A comparison between parallelization approaches in molecular dynamics simulations on GPUs. J. Comput. Chem. 2015, 36, 1–8. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Glaser, J.; Nguyen, T.; Anderson, J.; Lui, P.; Spiga, F.; Millan, J.; Morse, D.; Glotzer, S. Strong scaling of generalpurpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 2015, 192, 97–107. [Google Scholar] [CrossRef][Green Version]
 Le Grand, S.; Götz, A.; Walker, R. SPFP: Speed without compromise—A mixed precision model for GPU accelerated molecular dynamics simulations. Comput. Phys. Commun. 2013, 184, 374–380. [Google Scholar] [CrossRef]
 Zienkiewicz, O.; Taylor, R.; Zhu, J. The Finite Element Method: Its Basis and Fundamentals; Elsevier: Amsterdam, The Netherlands, 2005. [Google Scholar]
 Fu, Z.; Lewis, T.; Kirby, R.; Whitaker, R. Architecting the finite element method pipeline for the GPU. J. Comput. Appl. Math. 2014, 257, 195–211. [Google Scholar] [CrossRef] [PubMed]
 Wu, W.; Heng, P.A. A hybrid condensed finite element model with GPU acceleration for interactive 3D soft tissue cutting. Comput. Animat. Virtual Worlds 2004, 15, 219–227. [Google Scholar] [CrossRef]
 Goddeke, D.; Buijssen, S.H.; Wobker, H.; Turek, S. GPU acceleration of an unmodified parallel finite element NavierStokes solver. In Proceedings of the 2009 International Conference on High Performance Computing & Simulation, Leipzig, Germany, 21–24 June 2009; pp. 12–21. [Google Scholar]
 Komatitsch, D.; Erlebacher, G.; Göddeke, D.; Michéa, D. Highorder finiteelement seismic wave propagation modeling with MPI on a large GPU cluster. J. Comput. Phys. 2010, 229, 7692–7714. [Google Scholar] [CrossRef]
 Joldes, G.R.; Wittek, A.; Miller, K. Realtime nonlinear finite element computations on GPU–Application to neurosurgical simulation. Comput. Methods Appl. Mech. Eng. 2010, 199, 3305–3314. [Google Scholar] [CrossRef][Green Version]
 Dziekonski, A.; Sypek, P.; Lamecki, A.; Mrozowski, M. Finite element matrix generation on a GPU. Prog. Electromagn. Res. 2012, 128, 249–265. [Google Scholar] [CrossRef][Green Version]
 Knepley, M.G.; Terrel, A.R. Finite element integration on GPUs. ACM Trans. Math. Softw. (TOMS) 2013, 39, 1–13. [Google Scholar] [CrossRef][Green Version]
 Wang, S.; Wang, C.; Cai, Y.; Li, G. A novel parallel finite element procedure for nonlinear dynamic problems using GPU and mixedprecision algorithm. Eng. Comput. 2020, 37. [Google Scholar] [CrossRef]
 Huthwaite, P. Accelerated finite element elastodynamic simulations using the GPU. J. Comput. Phys. 2014, 257, 687–707. [Google Scholar] [CrossRef][Green Version]
 Johnsen, S.F.; Taylor, Z.A.; Clarkson, M.J.; Hipwell, J.; Modat, M.; Eiben, B.; Han, L.; Hu, Y.; Mertzanidou, T.; Hawkes, D.J.; et al. NiftySim: A GPUbased nonlinear finite element package for simulation of soft tissue biomechanics. Int. J. Comput. Assist. Radiol. Surg. 2015, 10, 1077–1095. [Google Scholar] [CrossRef][Green Version]
 Bauer, P.; Klement, V.; Oberhuber, T.; Žabka, V. Implementation of the Vankatype multigrid solver for the finite element approximation of the Navier–Stokes equations on GPU. Comput. Phys. Commun. 2016, 200, 50–56. [Google Scholar] [CrossRef]
 CarrascalManzanares, C.; Imperiale, A.; Rougeron, G.; Bergeaud, V.; Lacassagne, L. A fast implementation of a spectral finite elements method on CPU and GPU applied to ultrasound propagation. Adv. Parallel Comput. 2018, 32, 339–348. [Google Scholar]
 Comsol, A. COMSOL Multiphysics User’s Guide; COMSOL: Stockholm, Sweden, 2005; Volume 10, p. 333. [Google Scholar]
 Soloveichik, D.; Seelig, G.; Winfree, E. DNA as a universal substrate for chemical kinetics. Proc. Natl. Acad. Sci. USA 2010, 107, 5393–5398. [Google Scholar] [CrossRef][Green Version]
 Kim, J.; Winfree, E. Synthetic in vitro transcriptional oscillators. Mol. Syst. Biol. 2011, 7, 465. [Google Scholar] [CrossRef]
 Montagne, K.; Plasson, R.; Sakai, Y.; Fujii, T.; Rondelez, Y. Programming an in vitro DNA oscillator using a molecular networking strategy. Mol. Syst. Biol. 2011, 7, 466. [Google Scholar] [CrossRef]
 Fujii, T.; Rondelez, Y. Predator–prey molecular ecosystems. ACS Nano 2013, 7, 27–34. [Google Scholar] [CrossRef] [PubMed]
 Padirac, A.; Fujii, T.; EstévezTorres, A.; Rondelez, Y. Spatial waves in synthetic biochemical networks. J. Am. Chem. Soc. 2013, 135, 14586–14592. [Google Scholar] [CrossRef]
 Srinivas, N.; Parkin, J.; Seelig, G.; Winfree, E.; Soloveichik, D. Enzymefree nucleic acid dynamical systems. Science 2017, 358. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Genot, A.J.; Bath, J.; Turberfield, A.J. Reversible logic circuits made of DNA. J. Am. Chem. Soc. 2011, 133, 20080–20083. [Google Scholar] [CrossRef]
 Genot, A.J.; Bath, J.; Turberfield, A.J. Combinatorial displacement of DNA strands: Application to matrix multiplication and weighted sums. Angew. Chem. Int. Ed. 2013, 52, 1189–1192. [Google Scholar] [CrossRef] [PubMed]
 Stojanovic, M.N.; Stefanovic, D.; Rudchenko, S. Exercises in molecular computing. Acc. Chem. Res. 2014, 47, 1845–1852. [Google Scholar] [CrossRef]
 Lopez, R.; Wang, R.; Seelig, G. A molecular multigene classifier for disease diagnostics. Nat. Chem. 2018, 10, 746–754. [Google Scholar] [CrossRef]
 Cherry, K.M.; Qian, L. Scaling up molecular pattern recognition with DNAbased winnertakeall neural networks. Nature 2018, 559, 370–376. [Google Scholar] [CrossRef]
 Woods, D.; Doty, D.; Myhrvold, C.; Hui, J.; Zhou, F.; Yin, P.; Winfree, E. Diverse and robust molecular algorithms using reprogrammable DNA selfassembly. Nature 2019, 567, 366–372. [Google Scholar] [CrossRef][Green Version]
 Song, T.; Eshra, A.; Shah, S.; Bui, H.; Fu, D.; Yang, M.; Mokhtar, R.; Reif, J. Fast and compact DNA logic circuits based on singlestranded gates using stranddisplacing polymerase. Nat. Nanotechnol. 2019, 14, 1075–1081. [Google Scholar] [CrossRef]
 Chirieleison, S.M.; Allen, P.B.; Simpson, Z.B.; Ellington, A.D.; Chen, X. Pattern transformation with DNA circuits. Nat. Chem. 2013, 5, 1000. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Weitz, M.; Kim, J.; Kapsner, K.; Winfree, E.; Franco, E.; Simmel, F.C. Diversity in the dynamical behaviour of a compartmentalized programmable biochemical oscillator. Nat. Chem. 2014, 6, 295–302. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Zambrano, A.; Zadorin, A.; Rondelez, Y.; EstévezTorres, A.; Galas, J. Pursuitandevasion reactiondiffusion waves in microreactors with tailored geometry. J. Phys. Chem. B 2015, 119, 5349–5355. [Google Scholar] [CrossRef] [PubMed]
 Genot, A.; Baccouche, A.; Sieskind, R.; AubertKato, N.; Bredeche, N.; Bartolo, J.; Taly, V.; Fujii, T.; Rondelez, Y. Highresolution mapping of bifurcations in nonlinear biochemical circuits. Nat. Chem. 2016, 8, 760. [Google Scholar] [CrossRef] [PubMed]
 Baccouche, A.; Okumura, S.; Sieskind, R.; Henry, E.; AubertKato, N.; Bredeche, N.; Bartolo, J.F.; Taly, V.; Rondelez, Y.; Fujii, T.; et al. Massively parallel and multiparameter titration of biochemical assays with droplet microfluidics. Nat. Protoc. 2017, 12, 1912–1932. [Google Scholar] [CrossRef] [PubMed]
 Kurylo, I.; Gines, G.; Rondelez, Y.; Coffinier, Y.; Vlandas, A. Spatiotemporal control of DNAbased chemical reaction network via electrochemical activation in microfluidics. Sci. Rep. 2018, 8, 6396. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Amodio, A.; Del Grosso, E.; Troina, A.; Placidi, E.; Ricci, F. Remote Electronic Control of DNABased Reactions and Nanostructure Assembly. Nano Lett. 2018, 18, 2918–2923. [Google Scholar] [CrossRef]
 Zadorin, A.S.; Rondelez, Y.; Galas, J.C.; EstevezTorres, A. Synthesis of programmable reactiondiffusion fronts using DNA catalyzers. Phys. Rev. Lett. 2015, 114, 068301. [Google Scholar] [CrossRef]
 Scalise, D.; Schulman, R. Emulating cellular automata in chemical reaction–diffusion networks. Nat. Comput. 2016, 15, 197–214. [Google Scholar] [CrossRef]
 Zadorin, A.S.; Rondelez, Y.; Gines, G.; Dilhas, V.; Urtel, G.; Zambrano, A.; Galas, J.C.; EstévezTorres, A. Synthesis and materialization of a reaction–diffusion French flag pattern. Nat. Chem. 2017, 9, 990. [Google Scholar] [CrossRef][Green Version]
 Abe, K.; Kawamata, I.; Shinichiro, M.; Murata, S. Programmable reactions and diffusion using DNA for pattern formation in hydrogel medium. Mol. Syst. Des. Eng. 2019, 4, 639–643. [Google Scholar] [CrossRef]
 Chen, S.; Seelig, G. Programmable patterns in a DNAbased reaction–diffusion system. Soft Matter 2020, 16, 3555–3563. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Bardi, I.; Biro, O.; DyczijEdlinger, R.; Preis, K.; Richter, K.R. On the treatment of sharp corners in the FEM analysis of high frequency problems. IEEE Trans. Magn. 1994, 30, 3108–3111. [Google Scholar] [CrossRef]
 Molnár, F., Jr.; Izsák, F.; Mészáros, R.; Lagzi, I. Simulation of reaction–diffusion processes in three dimensions using CUDA. Chemom. Intell. Lab. Syst. 2011, 108, 76–85. [Google Scholar] [CrossRef][Green Version]
 Descombes, S.; Dhillon, D.; Zwicker, M. Optimized CUDAbased PDE Solver for Reaction Diffusion Systems on Arbitrary Surfaces. In International Conference on Parallel Processing and Applied Mathematics; Springer: Berlin/Heidelberg, Germany, 2015; pp. 526–536. [Google Scholar]
 Sanderson, A.R.; Meyer, M.D.; Kirby, R.M.; Johnson, C.R. A framework for exploring numerical solutions of advection–reaction–diffusion equations using a GPUbased approach. Comput. Vis. Sci. 2009, 12, 155–170. [Google Scholar] [CrossRef]
 Sato, D.; Xie, Y.; Weiss, J.N.; Qu, Z.; Garfinkel, A.; Sanderson, A.R. Acceleration of cardiac tissue simulation with graphic processing units. Med. Biol. Eng. Comput. 2009, 47, 1011–1015. [Google Scholar] [CrossRef][Green Version]
 Mena, A.; Ferrero, J.M.; Matas, J.F.R. GPU accelerated solver for nonlinear reaction–diffusion systems. Application to the electrophysiology problem. Comput. Phys. Commun. 2015, 196, 280–289. [Google Scholar] [CrossRef]
 Sjodin, B. What’s the Difference between FEM, FDM, and FVM. Mach. Des. 2016. Available online: https://www.machinedesign.com/3dprintingcad/feaandsimulation/article/21832072/whatsthedifferencebetweenfemfdmandfvm (accessed on 7 September 2020).
 Pera, D.; Málaga, C.; Simeoni, C.; Plaza, R. On the efficient numerical simulation of heterogeneous anisotropic diffusion models for tumor invasion using GPUs. Rend. Mat. E Sue Appl. 2019, 40, 233–255. [Google Scholar]
 Gormantara, A.; Pranowo, P. Parallel simulation of pattern formation in a reactiondiffusion system of FitzHughNagumo using GPU CUDA. In AIP Conference Proceedings; AIP Publishing LLC: College Park, MD, USA, 2020; Volume 2217, p. 030134. [Google Scholar]
 Zaikin, A.; Zhabotinsky, A. Concentration wave propagation in twodimensional liquidphase selfoscillating system. Nature 1970, 225, 535–537. [Google Scholar] [CrossRef]
 Turing, A.M. The chemical basis of morphogenesis. Bull. Math. Biol. 1990, 52, 153–197. [Google Scholar] [CrossRef]
 Dalchau, N.; Seelig, G.; Phillips, A. Computational design of reactiondiffusion patterns using DNAbased chemical reaction networks. In International Workshop on DNABased Computers; Springer: Berlin/Heidelberg, Germany, 2014; pp. 84–99. [Google Scholar]
 Zenk, J.; Scalise, D.; Wang, K.; Dorsey, P.; Fern, J.; Cruz, A.; Schulman, R. Stable DNAbased reaction–diffusion patterns. RSC Adv. 2017, 7, 18032–18040. [Google Scholar] [CrossRef][Green Version]
 Smith, S.; Dalchau, N. Beyond activatorinhibitor networks: The generalised Turing mechanism. arXiv 2018, arXiv:1803.07886. [Google Scholar]
 Smith, S.; Dalchau, N. Model reduction enables Turing instability analysis of large reaction–diffusion models. J. R. Soc. Interface 2018, 15, 20170805. [Google Scholar] [CrossRef][Green Version]
 Joesaar, A.; Yang, S.; Bögels, B.; van der Linden, A.; Pieters, P.; Kumar, B.P.; Dalchau, N.; Phillips, A.; Mann, S.; de Greef, T.F. DNAbased communication in populations of synthetic protocells. Nat. Nanotechnol. 2019, 14, 369–378. [Google Scholar] [CrossRef]
 Urtel, G.; EstevezTorres, A.; Galas, J.C. DNAbased longlived reaction–diffusion patterning in a host hydrogel. Soft Matter 2019, 15, 9343–9351. [Google Scholar] [CrossRef]
 Gines, G.; Zadorin, A.; Galas, J.C.; Fujii, T.; EstevezTorres, A.; Rondelez, Y. Microscopic agents programmed by DNA circuits. Nat. Nanotechnol. 2017, 12, 351–359. [Google Scholar] [CrossRef]
 Dupin, A.; Simmel, F.C. Signalling and differentiation in emulsionbased multicompartmentalized in vitro gene circuits. Nat. Chem. 2019, 11, 32–39. [Google Scholar] [CrossRef]
 Kasahara, Y.; Sato, Y.; Masukawa, M.K.; Okuda, Y.; Takinoue, M. Photolithographic shape control of DNA hydrogels by photoactivated selfassembly of DNA nanostructures. APL Bioeng. 2020, 4, 016109. [Google Scholar] [CrossRef]
 Wolfram Research, Inc. Mathematica, Version 12.1; Wolfram Research, Inc.: Champaign, IL, USA, 2020. [Google Scholar]
 Galerkin, B. Series occurring in various questions concerning the elastic equilibrium of rods and plates. Eng. Bull. (Vestn. Inzhenerov) 1915, 19, 897–908. [Google Scholar]
 Strang, G. On the construction and comparison of difference schemes. SIAM J. Numer. Anal. 1968, 5, 506–517. [Google Scholar] [CrossRef]
 Ahamed, A.; Magoules, F. Conjugate gradient method with graphics processing unit acceleration: CUDA vs. OpenCL. Adv. Eng. Softw. 2017, 111, 32–42. [Google Scholar] [CrossRef]
 Barrett, R.; Berry, M.; Chan, T.; Demmel, J.; Donato, J.; Dongarra, J.; Eijkhout, V.; Pozo, R.; Romine, C.; Van der Vorst, H. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods; SIAM: Philadelphia, PA, USA, 1994. [Google Scholar]
 Nvidia, C. Cublas Library; NVIDIA Corp.: Santa Clara, CA, USA, 2008; Volume 15, p. 31. [Google Scholar]
 Naumov, M.; Chien, L.; Vandermersch, P.; Kapasi, U. CUSPARSE library: A set of basic linear algebra subroutines for sparse matrices. In Proceedings of the GPU Technology Conference, San Jose, CA, USA, 20–23 September 2010; Volume 2070. [Google Scholar]
 Hestenes, M.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
 Hutton, T.; Munafo, R.; Trevorrow, A.; Rokicki, T.; Wills, D. Ready, A CrossPlatform Implementation of Various ReactionDiffusion Systems. 2015. Available online: https://github.com/GollyGang/ready (accessed on 7 September 2020).
 Du, Q.; Wang, D.; Zhu, L. On mesh geometry and stiffness matrix conditioning for general finite element spaces. SIAM J. Numer. Anal. 2009, 47, 1421–1444. [Google Scholar] [CrossRef][Green Version]
 Ramage, A.; Wathen, A. On preconditioning for finite element equations on irregular grids. SIAM J. Matrix Anal. Appl. 1994, 15, 909–921. [Google Scholar] [CrossRef]
 Bell, N.; Garland, M. Efficient Sparse MatrixVector Multiplication on CUDA; Technical Report, Nvidia Technical Report NVR2008004; Nvidia Corporation: Santa Clara, CA, USA, 2008. [Google Scholar]
Paper  Method  Problem  Speedup (CPU vs. GPU) 

Sanderson et al., 2009 [52]  FDM  grid mesh, AdvectionReactionDiffusion  ∼5–10x vs. one CPU core 
Molnar et al., 2011 [50]  FDM  grid mesh, Turing Patterns, Cahn–Hilliard eq., …  ∼5–40x vs. one CPU thread 
Pera et al., 2019 [56]  FDM  grid mesh, tumor growth  ∼100–500x vs. 8core CPU 
Gormantara et al., 2020 [57]  FDM  grid mesh, FitzHughNahumo model  ∼10x vs. CPU 
Sato et al., 2009 [53]  FEM+ODE  3D cardiac simulations  ∼0.6x vs. 32CPU cluster 
Mena et al., 2015 [54]  FEM+ODE  3D cardiac simulations  ∼50x vs. one CPU core 
Descombes et al., 2015 [51]  FEM  chemotactic reactiondiffusion on arbitrary surface  ∼100–300x vs. 4core CPU 
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sellami, H.; Cazenille, L.; Fujii, T.; Hagiya, M.; AubertKato, N.; Genot, A.J. Accelerating the FiniteElement Method for ReactionDiffusion Simulations on GPUs with CUDA. Micromachines 2020, 11, 881. https://doi.org/10.3390/mi11090881
Sellami H, Cazenille L, Fujii T, Hagiya M, AubertKato N, Genot AJ. Accelerating the FiniteElement Method for ReactionDiffusion Simulations on GPUs with CUDA. Micromachines. 2020; 11(9):881. https://doi.org/10.3390/mi11090881
Chicago/Turabian StyleSellami, Hedi, Leo Cazenille, Teruo Fujii, Masami Hagiya, Nathanael AubertKato, and Anthony J. Genot. 2020. "Accelerating the FiniteElement Method for ReactionDiffusion Simulations on GPUs with CUDA" Micromachines 11, no. 9: 881. https://doi.org/10.3390/mi11090881