# Runtime Construction of Large-Scale Spiking Neuronal Network Models on GPU Devices


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Creation of Connections Directly in GPU Memory

New connections are created at runtime with a call of the form `ngpu.Connect(sources, targets, conn_dict, syn_dict)`, where the connection dictionary `conn_dict` specifies a connection rule, e.g., `one_to_one`, for establishing connections between source and target nodes. The successive creation of several individual sub-networks, according to deterministic or probabilistic rules, can then lead to a complex overall network. In the rules used here, we allow autapses (self-connections) and multapses (multiple connections between the same pair of nodes); see [21] for a summary of state-of-the-art connectivity concepts.

- One-to-one (`one_to_one`): $s = s_0 + i_{\mathrm{newconns}}$, $t = t_0 + i_{\mathrm{newconns}}$
- All-to-all (`all_to_all`): $s = s_0 + \lfloor i_{\mathrm{newconns}} / N_{\mathrm{targets}} \rfloor$, $t = t_0 + \mathrm{mod}(i_{\mathrm{newconns}}, N_{\mathrm{targets}})$
- Random, fixed out-degree with multapses (`fixed_outdegree`): $s = s_0 + \lfloor i_{\mathrm{newconns}} / K \rfloor$, $t = t_0 + \mathrm{rand}(N_{\mathrm{targets}})$
- Random, fixed in-degree with multapses (`fixed_indegree`): $s = s_0 + \mathrm{rand}(N_{\mathrm{sources}})$, $t = t_0 + \lfloor i_{\mathrm{newconns}} / K \rfloor$
- Random, fixed total number with multapses (`fixed_total_number`): $s = s_0 + \mathrm{rand}(N_{\mathrm{sources}})$, $t = t_0 + \mathrm{rand}(N_{\mathrm{targets}})$
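The index arithmetic above can be sketched in Python as a single mapping from a new-connection index to a (source, target) pair. This is an illustrative reconstruction, not the library's implementation: on the GPU, each value of $i_{\mathrm{newconns}}$ is handled by its own CUDA thread, and $\mathrm{rand}(\cdot)$ corresponds to a device-side random number generator rather than Python's `random` module.

```python
import random

def conn_pair(rule, i_newconns, s0, t0, n_sources, n_targets, K=None, rng=random):
    """Map a new-connection index to a (source, target) node pair.

    A host-side sketch of the per-thread index arithmetic of Section 2.1;
    `s0`/`t0` are the first source/target node indices of the connect call.
    """
    if rule == "one_to_one":
        return s0 + i_newconns, t0 + i_newconns
    if rule == "all_to_all":
        return s0 + i_newconns // n_targets, t0 + i_newconns % n_targets
    if rule == "fixed_outdegree":   # K outgoing connections per source
        return s0 + i_newconns // K, t0 + rng.randrange(n_targets)
    if rule == "fixed_indegree":    # K incoming connections per target
        return s0 + rng.randrange(n_sources), t0 + i_newconns // K
    if rule == "fixed_total_number":
        return s0 + rng.randrange(n_sources), t0 + rng.randrange(n_targets)
    raise ValueError(f"unknown rule: {rule}")
```

Because the pair is a pure function of $i_{\mathrm{newconns}}$ (plus independent random draws), all connections of one call can be generated in parallel without coordination between threads.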

#### 2.2. Data Structures Used for Connections

#### 2.3. The Spike Buffer

#### 2.4. Models Used for Performance Evaluation

Connections are established with the rule `fixed_total_number`, with autapses and multapses allowed. The dynamics of the membrane potentials and synaptic currents are integrated using the exact integration method proposed by Rotter and Diesmann [25], and the membrane potentials of the neurons of every population are initialized from a normal distribution with mean and standard deviation optimized per neuron population as in [26]. This approach avoids transients at the beginning of the simulation. Signals originating from outside of the local circuitry, i.e., from other cortical areas and the thalamus, can be approximated with Poisson-distributed spike input or DC current input. Tables 1–4 of [27] (see the fixed-total-number models) contain a detailed model description and report the values of the parameters. The model reproduces the experimentally observed cell-type- and layer-specific firing statistics, and it has been used in the past both as a building block for larger models (e.g., [28]) and as a benchmark for several validation studies [9,17,26,29,30,31,32].

The network is built with a sequence of `nestgpu.Connect()` calls. The total number of neurons in the network is N (i.e., $N/2$ per population), and the target total number of connections is $N\times K$, where K is the target number of connections per neuron. Depending on the connection rule used, the instantiated networks may exhibit small deviations from the following target values:

- `fixed_total_number`: the total number of connections used in each connect call is set to $\lfloor N\times K/4\rfloor$.
- `fixed_indegree`: the in-degree used in each connect call is set to $\lfloor K/2\rfloor$.
- `fixed_outdegree`: the out-degree used in each connect call is set to $\lfloor K/2\rfloor$.
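The per-call targets above can be summarized in a small helper. The function name is hypothetical; the arithmetic assumes, consistent with the $\lfloor N\times K/4\rfloor$ per-call total, that each of the four ordered population pairs of the two-population network receives its own connect call.

```python
def benchmark_conn_params(N, K):
    """Per-connect-call target values for the two-population benchmark.

    N: total number of neurons; K: target number of connections per neuron.
    With two populations there are four ordered population pairs, hence
    four connect calls, each carrying a quarter of the N*K connections
    (or half of the per-neuron degree K).
    """
    return {
        "fixed_total_number": (N * K) // 4,  # connections per call
        "fixed_indegree": K // 2,            # in-degree per call
        "fixed_outdegree": K // 2,           # out-degree per call
    }
```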

#### 2.5. Hardware and Software of Performance Evaluation

#### 2.6. Simulation Phases

- Initialization is a setup phase in the Python script that prepares both model and simulator, e.g., by importing modules, instantiating classes, and setting parameters.
`import nestgpu`

- Node creation instantiates all the neurons and devices of the model.
`nestgpu.Create()`

- Node connection instantiates the connections among network nodes.
`nestgpu.Connect()`

- Calibration is a preparation phase that orders the connections and initializes data structures for the spike buffers and the spike arrays just before the state propagation begins. In the CPU code, the pre-synaptic connection infrastructure is set up here. This stage can be triggered by simulating just one time step h.
`nestgpu.Simulate(h)`

Previously, the calibration phase of NEST GPU was used to finish moving data to the GPU memory and to instantiate additional data structures like the spike buffer (cf. Section 2.3). Now that no data transfer is needed and connection sorting is carried out instead (cf. Section 2.2), the calibration phase is conceptually closer to the operations carried out in the CPU version of NEST [37].
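The phase breakdown above can be sketched as a minimal wall-clock timing harness. The function name is hypothetical; in a real benchmark the callables would be the actual simulator calls from the list (e.g., `nestgpu.Create`, `nestgpu.Connect`, `nestgpu.Simulate`), which are replaced here by placeholders.

```python
import time

def time_phases(phases):
    """Measure the wall-clock time of each named phase.

    `phases` is a list of (name, callable) pairs executed in order,
    mirroring the initialization / node creation / node connection /
    calibration breakdown of Section 2.6.
    """
    timings = {}
    for name, fn in phases:
        t0 = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - t0
    return timings
```

Summing all phase timings yields the network construction time reported in the performance tables, while the state propagation is timed separately.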

- Model definition defines the neurons, devices, and synapses of the network model.
`from pygenn import genn_model`, `model = genn_model.GeNNModel()`, `model.add_neuron_population()`

- Building generates and compiles the simulation code.
`model.build()`

- Loading allocates memory and instantiates the network on the GPU.
`model.load()`

#### 2.7. Validation of the Proposed Network Construction Method

- Time-averaged firing rate for each neuron;
- Coefficient of variation of inter-spike intervals (CV ISI);
- Pairwise Pearson correlation of the spike trains obtained from a subset of 200 neurons for each population.
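The three observables can be sketched as follows. The helper names are illustrative, and the sketch omits details the validation presumably fixes (bin width for the correlation, the 200-neuron subset selection); it assumes spike times in seconds and pre-binned spike-count signals for the correlation.

```python
from statistics import mean, pstdev

def firing_rate(spikes, t_sim):
    """Time-averaged firing rate in Hz for one spike train (times in s)."""
    return len(spikes) / t_sim

def cv_isi(spikes):
    """Coefficient of variation of the inter-spike intervals."""
    isi = [b - a for a, b in zip(spikes, spikes[1:])]
    return pstdev(isi) / mean(isi)

def pearson(x, y):
    """Pairwise Pearson correlation of two (binned) spike-count signals."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

A perfectly regular spike train has CV ISI 0, while a Poisson process has CV ISI close to 1, which makes the metric a convenient probe of layer-specific irregularity.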

## 3. Results

#### 3.1. Cortical Microcircuit Model

#### 3.2. Two-Population Network

Figure 5 shows the network construction time of the two-population network using the `fixed_total_number` connection rule while varying the number of neurons and the number of connections per neuron. The performance obtained using the `fixed_indegree` and `fixed_outdegree` connection rules is fully consistent with the results shown in this figure; the respective plots are available in Appendix D for completeness.

## 4. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

The two versions of NEST GPU compared in this work are referred to as `nest-gpu_onboard` and `nest-gpu_offboard`.

## Acknowledgments

## Conflicts of Interest

## Appendix A. Block Sorting

**Figure A1.** The COPASS block-sort algorithm. (**A**) Unsorted array, divided into blocks (subarrays). Each element of the array is represented as a blue bar. The vertical solid lines represent the division into subarrays. (**B**) Each subarray is sorted using the underlying sorting algorithm. (**C**) The subarrays are each divided into two partitions using a common threshold, t, in such a way that the total size of the left partitions (represented in red) equals the size of the first block. (**D**) The left partitions are copied to the auxiliary array. (**E**) The right partitions are shifted to the right, and the auxiliary array is copied to the first block. (**F**) The auxiliary array is sorted. The procedure from (**C**) to (**E**) is then repeated on the new subarrays, delimited by the green dashed lines, in order to extract and sort the second block, and so on, until the last block.

#### Appendix A.1. The COPASS (Constrained Partition of Sorted Subarrays) Block-Sort Algorithm

#### Appendix A.2. The COPASS Partition Algorithm

- Case 1: $\sum_i \underline{\mu}_{i,s} \le m \le \sum_i \overline{\mu}_{i,s}$. In this case, $t = \tilde{t}_s$. The iteration is concluded, and the partition sizes $m_i$ are computed using the procedure described in Appendix A.4.
- Case 2: $\sum_i \underline{m}_{i,s} < m < \sum_i \underline{\mu}_{i,s}$. In this case, we set $\underline{m}_{i,s+1} = \underline{m}_{i,s}$, $\underline{t}_{s+1} = \underline{t}_s$, $\overline{m}_{i,s+1} = \underline{\mu}_{i,s}$, and $\overline{t}_{s+1} = \tilde{t}_s$.
- Case 3: $\sum_i \overline{\mu}_{i,s} < m < \sum_i \overline{m}_{i,s}$. In this case, we set $\underline{m}_{i,s+1} = \overline{\mu}_{i,s}$, $\underline{t}_{s+1} = \tilde{t}_s$, $\overline{m}_{i,s+1} = \overline{m}_{i,s}$, and $\overline{t}_{s+1} = \overline{t}_s$.
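The constrained partition the cases above search for can be condensed into a plain reference version. The sketch below is not COPASS itself: instead of iteratively narrowing the threshold interval (Cases 1–3), it reads the threshold off a full sort of all elements and then distributes ties, which reproduces the partition sizes but none of the algorithm's memory savings.

```python
from bisect import bisect_left, bisect_right

def constrained_partition(subarrays, m):
    """Split k sorted subarrays at a common threshold t so that the left
    partitions together hold exactly the m smallest elements overall.

    Reference (non-iterative) version of the COPASS partition: elements
    strictly below t always go left; ties at t are distributed until the
    left partitions total exactly m, mirroring the freedom of Case 1.
    """
    everything = sorted(x for sub in subarrays for x in sub)
    t = everything[m - 1]  # common threshold
    sizes = [bisect_left(sub, t) for sub in subarrays]  # elements < t
    need = m - sum(sizes)
    for i, sub in enumerate(subarrays):
        ties = bisect_right(sub, t) - sizes[i]  # elements == t
        take = min(ties, need)
        sizes[i] += take
        need -= take
    return t, sizes
```

Calling this with the first block's size as m yields the left-partition sizes used in step (**C**) of Figure A1.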

#### Appendix A.3. The COPASS Partition Last Step, Case 1

#### Appendix A.4. The COPASS Partition Last Step, Case 2

## Appendix B. Validation Details

The violin plots shown in Figure A2 are produced with the `seaborn.violinplot` function of the Seaborn [40] Python library. The function computes smoothed distributions through the kernel density estimation method [41,42] with a Gaussian kernel, with the bandwidth optimized using the Silverman method [43].

**Figure A2.** Violin plots of the distributions of firing rate (**A**), CV ISI (**B**), and Pearson correlation (**C**) for a simulation of the populations of the cortical microcircuit model using NEST GPU with (sky blue distributions, right) or without (orange distributions, left) the new method for network construction. (**D**–**F**) Same as (**A**–**C**), but the orange distributions are obtained using NEST 3.3. The central dashed line represents the median of the distributions, whereas the two outer dashed lines represent the interquartile range.

The Earth Mover's Distance (EMD) is computed using `scipy.stats.wasserstein_distance` of the SciPy library [44]; more details on this method can be found in [13]. We run sets of 100 simulations, changing the seed for random number generation. The sets of simulations for the two versions of the NEST GPU library are compared pairwise, yielding for each distribution and each population of the model a set of 100 EMD values that quantify the difference between the two versions of NEST GPU (offboard-onboard). Furthermore, we compute an additional set of simulations for the previous version of NEST GPU to be compared with the other set of the same version (offboard-offboard). This way, we can evaluate the differences that arise from using the same simulator with different seeds and compare them with the differences obtained by comparing the two versions of NEST GPU. Additionally, we perform the same validation to compare NEST and NEST GPU, i.e., NEST-NEST GPU (onboard) and NEST-NEST, to obtain a quantitative comparison between the most recent versions of the two simulators. Figure A3 shows the EMD box plots for all the distributions computed and for all the populations.
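For two equally weighted 1-D samples of the same size, the EMD reduces to the mean absolute difference of the sorted values. The sketch below is a pure-Python stand-in for the `scipy.stats.wasserstein_distance` call named above, restricted to that equal-size, equal-weight case.

```python
def emd_1d(u, v):
    """Earth Mover's Distance between two equally weighted 1-D samples
    of the same size: mean absolute difference of the sorted values."""
    if len(u) != len(v):
        raise ValueError("this simplified version requires equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)
```

An EMD of zero means the two empirical distributions coincide, and shifting one sample by a constant c shifts the EMD by exactly |c|, which makes the measure easy to interpret for firing-rate distributions.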

**Figure A3.** Box plots of the Earth Mover's Distance comparing side by side the firing rate (**A**), CV ISI (**B**), and Pearson correlation (**C**) of the two versions of NEST GPU (sky blue boxes, left) and of the previous version of NEST GPU using different seeds (orange boxes, right). Panels (**D**–**F**) are the same as (**A**–**C**), but distributions of NEST GPU (onboard) and NEST 3.3 are compared. In particular, the comparison between the different simulators is represented by the sky blue boxes on the left, whereas the comparison between two sets of NEST simulations is depicted with the orange boxes. The central line of each box plot represents the median of the distribution, whereas the extension of the boxes is determined by the interquartile range of the distribution formed by the EMD values of each comparison. Whiskers show the rest of the distribution as a function of the interquartile range, and dots represent the outliers.

## Appendix C. Additional Data for Cortical Microcircuit Simulations

**Figure A4.**Real-time factor, defined as ${T}_{\mathrm{wall}}/{T}_{\mathrm{model}}$, of cortical microcircuit model simulations for NEST GPU (onboard), NEST and GeNN. The biological model time we use to compute the real-time factor is ${T}_{\mathrm{model}}=10$ s, simulated driving the external stimulation using Poisson spike generators (left bars, pink) or DC input (right bars, dark red). GeNN (magenta bars) employs a different approach for simulating external stimuli. Error bars show the standard deviation of the simulation phase over ten simulations using different random seeds.

## Appendix D. Additional Data for the Two-Population Network Simulations

Section 3.2 reports the network construction time of the two-population network using the `fixed_total_number` connection rule. In Figure A5, we provide the corresponding data for the `fixed_indegree` and the `fixed_outdegree` rules.

**Figure A5.** Network construction time of the two-population network with N total neurons and K connections per neuron using different connection rules. (**A**) Performance obtained using the `fixed_indegree` connection rule, i.e., each neuron of the network has an in-degree of K. (**B**) Performance obtained using the `fixed_outdegree` connection rule, i.e., each neuron of the network has an out-degree of K. The value of the network construction time for the network with $10^6$ neurons and $10^4$ connections per neuron is not shown because of lack of GPU memory. Error bars indicate the standard deviation of the performance across 10 simulations using different seeds.

## References

1. Gewaltig, M.O.; Diesmann, M. NEST (NEural Simulation Tool). Scholarpedia **2007**, 2, 1430.
2. Carnevale, N.T.; Hines, M.L. The NEURON Book; Cambridge University Press: Cambridge, UK, 2006.
3. Stimberg, M.; Brette, R.; Goodman, D.F. Brian 2, an intuitive and efficient neural simulator. eLife **2019**, 8, e47314.
4. Bekolay, T.; Bergstra, J.; Hunsberger, E.; DeWolf, T.; Stewart, T.; Rasmussen, D.; Choo, X.; Voelker, A.; Eliasmith, C. Nengo: A Python tool for building large-scale functional brain models. Front. Neuroinform. **2014**, 7, 48.
5. Vitay, J.; Dinkelbach, H.U.; Hamker, F.H. ANNarchy: A code generation approach to neural simulations on parallel hardware. Front. Neuroinform. **2015**, 9, 19.
6. Yavuz, E.; Turner, J.; Nowotny, T. GeNN: A code generation framework for accelerated brain simulations. Sci. Rep. **2016**, 6, 18854.
7. Nageswaran, J.M.; Dutt, N.; Krichmar, J.L.; Nicolau, A.; Veidenbaum, A.V. A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors. Neural Netw. **2009**, 22, 791–800.
8. Niedermeier, L.; Chen, K.; Xing, J.; Das, A.; Kopsick, J.; Scott, E.; Sutton, N.; Weber, K.; Dutt, N.; Krichmar, J.L. CARLsim 6: An Open Source Library for Large-Scale, Biologically Detailed Spiking Neural Network Simulation. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–10.
9. Golosio, B.; Tiddia, G.; De Luca, C.; Pastorelli, E.; Simula, F.; Paolucci, P.S. Fast Simulations of Highly-Connected Spiking Cortical Models Using GPUs. Front. Comput. Neurosci. **2021**, 15, 627620.
10. Kumbhar, P.; Hines, M.; Fouriaux, J.; Ovcharenko, A.; King, J.; Delalondre, F.; Schürmann, F. CoreNEURON: An Optimized Compute Engine for the NEURON Simulator. Front. Neuroinform. **2019**, 13, 63.
11. Golosio, B.; De Luca, C.; Pastorelli, E.; Simula, F.; Tiddia, G.; Paolucci, P.S. Toward a possible integration of NeuronGPU in NEST. In Proceedings of the NEST Conference, Aas, Norway, 29–30 June 2020; Volume 7.
12. Stimberg, M.; Goodman, D.F.M.; Nowotny, T. Brian2GeNN: Accelerating spiking neural network simulations with graphics hardware. Sci. Rep. **2020**, 10, 410.
13. Tiddia, G.; Golosio, B.; Albers, J.; Senk, J.; Simula, F.; Pronold, J.; Fanti, V.; Pastorelli, E.; Paolucci, P.S.; van Albada, S.J. Fast Simulation of a Multi-Area Spiking Network Model of Macaque Cortex on an MPI-GPU Cluster. Front. Neuroinform. **2022**, 16, 883333.
14. Alevi, D.; Stimberg, M.; Sprekeler, H.; Obermayer, K.; Augustin, M. Brian2CUDA: Flexible and Efficient Simulation of Spiking Neural Network Models on GPUs. Front. Neuroinform. **2022**, 16, 883700.
15. Awile, O.; Kumbhar, P.; Cornu, N.; Dura-Bernal, S.; King, J.G.; Lupton, O.; Magkanaris, I.; McDougal, R.A.; Newton, A.J.H.; Pereira, F.; et al. Modernizing the NEURON Simulator for Sustainability, Portability, and Performance. Front. Neuroinform. **2022**, 16, 884046.
16. Abi Akar, N.; Cumming, B.; Karakasis, V.; Küsters, A.; Klijn, W.; Peyser, A.; Yates, S. Arbor—A Morphologically-Detailed Neural Network Simulation Library for Contemporary High-Performance Computing Architectures. In Proceedings of the 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Pavia, Italy, 13–15 February 2019; pp. 274–282.
17. Knight, J.C.; Komissarov, A.; Nowotny, T. PyGeNN: A Python Library for GPU-Enhanced Neural Networks. Front. Neuroinform. **2021**, 15, 659005.
18. Balaji, A.; Adiraju, P.; Kashyap, H.J.; Das, A.; Krichmar, J.L.; Dutt, N.D.; Catthoor, F. PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–10.
19. Eppler, J.; Helias, M.; Muller, E.; Diesmann, M.; Gewaltig, M.O. PyNEST: A convenient interface to the NEST simulator. Front. Neuroinform. **2009**, 2, 12.
20. Davison, A.P. PyNN: A common interface for neuronal network simulators. Front. Neuroinform. **2008**, 2, 11.
21. Senk, J.; Kriener, B.; Djurfeldt, M.; Voges, N.; Jiang, H.J.; Schüttler, L.; Gramelsberger, G.; Diesmann, M.; Plesser, H.E.; van Albada, S.J. Connectivity concepts in neuronal network modeling. PLoS Comput. Biol. **2022**, 18, e1010086.
22. Morrison, A.; Diesmann, M. Maintaining Causality in Discrete Time Neuronal Network Simulations. In Lectures in Supercomputational Neurosciences: Dynamics in Complex Brain Networks; Graben, P.b., Zhou, C., Thiel, M., Kurths, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 267–278.
23. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 3rd ed.; The MIT Press: Cambridge, MA, USA, 2009.
24. Potjans, T.C.; Diesmann, M. The Cell-Type Specific Cortical Microcircuit: Relating Structure and Activity in a Full-Scale Spiking Network Model. Cereb. Cortex **2014**, 24, 785–806.
25. Rotter, S.; Diesmann, M. Exact digital simulation of time-invariant linear systems with applications to neuronal modeling. Biol. Cybern. **1999**, 81, 381–402.
26. van Albada, S.J.; Rowley, A.G.; Senk, J.; Hopkins, M.; Schmidt, M.; Stokes, A.B.; Lester, D.R.; Diesmann, M.; Furber, S.B. Performance Comparison of the Digital Neuromorphic Hardware SpiNNaker and the Neural Network Simulation Software NEST for a Full-Scale Cortical Microcircuit Model. Front. Neurosci. **2018**, 12, 291.
27. Dasbach, S.; Tetzlaff, T.; Diesmann, M.; Senk, J. Dynamical Characteristics of Recurrent Neuronal Networks Are Robust Against Low Synaptic Weight Resolution. Front. Neurosci. **2021**, 15, 757790.
28. Schmidt, M.; Bakker, R.; Shen, K.; Bezgin, G.; Diesmann, M.; van Albada, S.J. A multi-scale layer-resolved spiking network model of resting-state dynamics in macaque visual cortical areas. PLoS Comput. Biol. **2018**, 14, e1006359.
29. Knight, J.C.; Nowotny, T. GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model. Front. Neurosci. **2018**, 12, 941.
30. Rhodes, O.; Peres, L.; Rowley, A.G.D.; Gait, A.; Plana, L.A.; Brenninkmeijer, C.; Furber, S.B. Real-time cortical simulation on neuromorphic hardware. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. **2019**, 378, 20190160.
31. Kurth, A.C.; Senk, J.; Terhorst, D.; Finnerty, J.; Diesmann, M. Sub-realtime simulation of a neuronal network of natural density. Neuromorphic Comput. Eng. **2022**, 2, 021001.
32. Heittmann, A.; Psychou, G.; Trensch, G.; Cox, C.E.; Wilcke, W.W.; Diesmann, M.; Noll, T.G. Simulating the Cortical Microcircuit Significantly Faster Than Real Time on the IBM INC-3000 Neural Supercomputer. Front. Neurosci. **2022**, 15, 728460.
33. Izhikevich, E. Simple model of spiking neurons. IEEE Trans. Neural Netw. **2003**, 14, 1569–1572.
34. Spreizer, S.; Mitchell, J.; Jordan, J.; Wybo, W.; Kurth, A.; Vennemo, S.B.; Pronold, J.; Trensch, G.; Benelhedi, M.A.; Terhorst, D.; et al. NEST 3.3. Zenodo **2022**.
35. Vieth, B.V.S. JUSUF: Modular Tier-2 Supercomputing and Cloud Infrastructure at Jülich Supercomputing Centre. J. Large-Scale Res. Facil. JLSRF **2021**, 7, A179.
36. Thörnig, P. JURECA: Data Centric and Booster Modules implementing the Modular Supercomputing Architecture at Jülich Supercomputing Centre. J. Large-Scale Res. Facil. JLSRF **2021**, 7, A182.
37. Jordan, J.; Ippen, T.; Helias, M.; Kitayama, I.; Sato, M.; Igarashi, J.; Diesmann, M.; Kunkel, S. Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers. Front. Neuroinform. **2018**, 12, 2.
38. Azizi, A. Introducing a Novel Hybrid Artificial Intelligence Algorithm to Optimize Network of Industrial Applications in Modern Manufacturing. Complexity **2017**, 2017, 8728209.
39. Schmitt, F.J.; Rostami, V.; Nawrot, M.P. Efficient parameter calibration and real-time simulation of large-scale spiking neural networks with GeNN and NEST. Front. Neuroinform. **2023**, 17, 941696.
40. Waskom, M.L. Seaborn: Statistical data visualization. J. Open Source Softw. **2021**, 6, 3021.
41. Rosenblatt, M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. **1956**, 27, 832–837.
42. Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. **1962**, 33, 1065–1076.
43. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986.
44. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods **2020**, 17, 261–272.
45. Albers, J.; Pronold, J.; Kurth, A.C.; Vennemo, S.B.; Mood, K.H.; Patronis, A.; Terhorst, D.; Jordan, J.; Kunkel, S.; Tetzlaff, T.; et al. A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations. Front. Neuroinform. **2022**, 16, 837549.

**Figure 1.** Example of connection creation using the all-to-all connection rule. (**A**) Each one of the four source nodes (green) is connected to all three target nodes (orange). The connections generated via this rule are identified with an index, $i_{\mathrm{newconns}}$, ranging from 0 to 11 (blue disks). (**B**) The connections are stored in blocks that are allocated dynamically; for demonstration purposes, a block size of ten connections is used. The black squares represent previous connections (established using an earlier connect call), while the twelve connections generated via the considered instance of the all-to-all rule are represented by the same blue disks labeled with $i_{\mathrm{newconns}}$ as in panel A. The new connections in different blocks are generated via separate CUDA kernels. In this example, $N_{\mathrm{prevconns},2}$ of the new connections are created in the previous block (grey frame), and the remaining ones in the current block ($b=2$, yellow frame), where $i_{\mathrm{newconns}}$ is computed by adding the CUDA thread index k to $N_{\mathrm{prevconns},2}$.
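The block indexing described in the caption can be sketched on the host side as follows. The function names are illustrative; in the library this arithmetic runs inside the per-block CUDA kernels.

```python
def block_location(i_conn, block_size=10):
    """Split a global connection index into (block index, offset in block),
    mirroring the contiguous storage across fixed-size blocks in Figure 1B."""
    return i_conn // block_size, i_conn % block_size

def kernel_index(thread_k, n_prevconns):
    """i_newconns handled by CUDA thread k of the kernel filling the current
    block, given the number of new connections placed in earlier blocks."""
    return thread_k + n_prevconns
```

With a block size of ten, connection 12 of Figure 1B lands at offset 2 of block 1, which is exactly why the figure's second kernel starts its indices at $N_{\mathrm{prevconns},2}$.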

**Figure 2.** Schematic representation of the networks used in this work. (**A**) Diagram of the cortical microcircuit model reproduced from [26]. (**B**) Scheme of the network of two populations of Izhikevich neurons.

**Figure 3.** Comparison of the network construction phase and the simulation of the network dynamics for the two versions of NEST GPU on the cortical microcircuit model. (**A**) Performance comparison of the network construction phase using different hardware configurations. (**B**) Real-time factor, defined as $T_{\mathrm{wall}}/T_{\mathrm{model}}$. The biological model time we use to compute the real-time factor is $T_{\mathrm{model}}=10$ s. The external drive is provided via Poisson spike generators (left bars, pink) or DC input (right bars, dark red). Error bars show the standard deviation of the simulation phase over ten simulations using different random seeds.

**Figure 4.** Performance comparison of the network construction phase for different simulators and hardware configurations on the cortical microcircuit model. Data for NEST GPU (onboard) are the same as in Figure 3. (**A**) Network construction time of the model on a linear scale for different simulators and hardware configurations. (**B**) As in (**A**), but with a logarithmic y-axis. In both panels, the building phase of GeNN is placed on top of the bar, breaking the otherwise chronological order, because this phase is not always required; this arrangement also makes the shorter loading phase visible in the plot with the logarithmic y-axis. Error bars show the standard deviation of the overall network construction phase over ten simulations using different random seeds.

**Figure 5.** Network construction time of the two-population network with N neurons in total and K connections per neuron using the `fixed_total_number` connection rule, i.e., the average number of connections per neuron is K, and the total number of connections is $N\times K$. Error bars indicate the standard deviation of the performance across 10 simulations using different seeds.

**Table 1.**Hardware configuration of the different systems used to measure the performance of the simulators. Cluster information is given on a per node basis.

| System | CPU | GPU |
|---|---|---|
| JUSUF cluster | 2× AMD EPYC 7742, 2× 64 cores, 2.25 GHz | NVIDIA V100 ^{1}, 1530 MHz, 16 GB HBM2e, 5120 CUDA cores |
| JURECA-DC cluster | 2× AMD EPYC 7742, 2× 64 cores, 2.25 GHz | NVIDIA A100 ^{2}, 1410 MHz, 40 GB HBM2e, 6912 CUDA cores |
| Workstation 1 | Intel Core i9-9900K, 8 cores, 3.60 GHz | NVIDIA RTX 2080 Ti ^{3}, 1545 MHz, 11 GB GDDR6, 4352 CUDA cores |
| Workstation 2 | Intel Core i9-10940X, 14 cores, 3.30 GHz | NVIDIA RTX 4090 ^{4}, 2520 MHz, 24 GB GDDR6X, 16384 CUDA cores |

^{1}Volta architecture: https://developer.nvidia.com/blog/inside-volta, accessed on 14 August 2023.

^{2}Ampere architecture: https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth, accessed on 14 August 2023.

^{3}Turing architecture: https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth, accessed on 14 August 2023.

^{4}Ada Lovelace architecture: https://www.nvidia.com/en-us/geforce/ada-lovelace-architecture, accessed on 14 August 2023.

**Table 2.** Performance metrics of NEST and NEST GPU when using Poisson spike generators to drive external stimulation to the neurons of the model. All times are in seconds, given as mean (standard deviation); columns labeled (on) and (off) refer to NEST GPU (onboard) and NEST GPU (offboard), respectively. Simulation time is calculated for a simulation of 10 s of biological time.

| Metrics | V100 (on) | A100 (on) | 2080Ti (on) | 4090 (on) | V100 (off) | A100 (off) | 2080Ti (off) | 4090 (off) | NEST 3.3 (2× 64 cores) |
|---|---|---|---|---|---|---|---|---|---|
| Initialization | $5.08(0.15)\times 10^{-4}$ | $1.44(0.15)\times 10^{-3}$ | $1.71(0.09)\times 10^{-4}$ | $1.91(0.04)\times 10^{-4}$ | $4.99(0.08)\times 10^{-4}$ | $1.44(0.15)\times 10^{-3}$ | $1.66(0.12)\times 10^{-4}$ | $1.84(0.05)\times 10^{-4}$ | $0.02(0.01)$ |
| Node creation | $0.02(0.004)$ | $0.03(0.007)$ | $1.63(0.09)\times 10^{-3}$ | $1.94(0.02)\times 10^{-3}$ | $3.02(0.02)$ | $3.32(0.05)$ | $1.93(0.04)$ | $1.781(0.018)$ | $0.39(0.02)$ |
| Node connection | $0.105(0.0003)$ | $0.08(0.002)$ | $0.308(0.009)$ | $0.1600(0.0005)$ | $54.65(0.11)$ | $56.02(0.27)$ | $41.16(0.28)$ | $44.2(0.7)$ | $1.72(0.17)$ |
| Calibration | $0.57(0.001)$ | $0.408(0.005)$ | $0.602(0.0006)$ | $0.3638(0.0004)$ | $1.99(0.01)$ | $2.06(0.01)$ | $2.202(0.01)$ | $2.183(0.014)$ | $2.39(0.01)$ |
| Network construction | $0.708(0.001)$ | $\mathbf{0.52(0.08)}$ | $0.91(0.09)$ | $\mathbf{0.5259(0.0008)}$ | $59.67(0.13)$ | $61.41(0.27)$ | $45.29(0.32)$ | $48.2(0.7)$ | $4.54(0.18)$ |
| Simulation (10 s) | $8.82(0.09)$ | $8.54(0.03)$ | $8.504(0.02)$ | $4.707(0.008)$ | $9.28(0.04)$ | $8.94(0.02)$ | $8.64(0.01)$ | $5.219(0.018)$ | $12.66(0.08)$ |

**Table 3.** Performance metrics of NEST and NEST GPU when using DC input to drive external stimulation to the neurons of the model. All times are in seconds, given as mean (standard deviation); columns labeled (on) and (off) refer to NEST GPU (onboard) and NEST GPU (offboard), respectively. Simulation time is calculated for a simulation of 10 s of biological time.

| Metrics | V100 (on) | A100 (on) | 2080Ti (on) | 4090 (on) | V100 (off) | A100 (off) | 2080Ti (off) | 4090 (off) | NEST 3.3 (2× 64 cores) |
|---|---|---|---|---|---|---|---|---|---|
| Initialization | $5.04(0.13)\times 10^{-4}$ | $1.44(0.08)\times 10^{-3}$ | $1.75(0.16)\times 10^{-4}$ | $1.97(0.09)\times 10^{-4}$ | $5.1(0.4)\times 10^{-4}$ | $1.5(0.4)\times 10^{-3}$ | $1.62(0.04)\times 10^{-4}$ | $1.86(0.04)\times 10^{-4}$ | $0.018(0.003)$ |
| Node creation | $7.0(0.5)\times 10^{-3}$ | $6.6(0.3)\times 10^{-3}$ | $1.43(0.13)\times 10^{-3}$ | $1.64(0.04)\times 10^{-3}$ | $3.01(0.02)$ | $3.28(0.03)$ | $1.91(0.02)$ | $1.79(0.03)$ | $0.392(0.003)$ |
| Node connection | $0.1028(0.0004)$ | $0.0790(0.0013)$ | $0.31(0.02)$ | $0.1538(0.0005)$ | $54.65(0.17)$ | $55.89(0.19)$ | $40.8(0.5)$ | $44.2(0.7)$ | $1.53(0.07)$ |
| Calibration | $0.5785(0.0013)$ | $0.412(0.008)$ | $0.6011(0.0006)$ | $0.3632(0.0003)$ | $1.993(0.012)$ | $2.059(0.016)$ | $2.194(0.015)$ | $2.181(0.015)$ | $2.352(0.005)$ |
| Network construction | $0.6888(0.0018)$ | $\mathbf{0.499(0.10)}$ | $0.91(0.02)$ | $\mathbf{0.5189(0.0005)}$ | $59.65(0.19)$ | $61.23(0.19)$ | $44.9(0.5)$ | $48.1(0.7)$ | $4.30(0.07)$ |
| Simulation (10 s) | $6.36(0.02)$ | $7.32(0.05)$ | $5.61(0.03)$ | $3.86(0.01)$ | $6.530(0.012)$ | $7.43(0.02)$ | $5.604(0.016)$ | $3.953(0.013)$ | $7.77(0.15)$ |

**Table 4.** Performance metrics of GeNN. All times are in seconds, given as mean (standard deviation). Simulation time is calculated for a simulation of 10 s of biological time.

| Metrics | V100 | A100 | 2080Ti | 4090 |
|---|---|---|---|---|
| Model definition | $1.704(0.008)\times 10^{-2}$ | $1.75(0.01)\times 10^{-2}$ | $1.07(0.01)\times 10^{-2}$ | $1.094(0.007)\times 10^{-2}$ |
| Building | $13.87(0.36)$ | $14.301(0.72)$ | $7.25(0.04)$ | $8.15(0.04)$ |
| Loading | $0.77(0.02)$ | $0.85(0.006)$ | $0.51(0.01)$ | $0.445(0.015)$ |
| Network construction (no building) | $0.79(0.02)$ | $0.85(0.006)$ | $0.52(0.01)$ | $0.456(0.015)$ |
| Network construction | $14.67(0.35)$ | $15.15(0.72)$ | $7.78(0.04)$ | $8.61(0.04)$ |
| Simulation (10 s) | $6.48(0.01)$ | $5.39(0.01)$ | $7.007(0.01)$ | $2.719(0.006)$ |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Golosio, B.; Villamar, J.; Tiddia, G.; Pastorelli, E.; Stapmanns, J.; Fanti, V.; Paolucci, P.S.; Morrison, A.; Senk, J.
Runtime Construction of Large-Scale Spiking Neuronal Network Models on GPU Devices. *Appl. Sci.* **2023**, *13*, 9598.
https://doi.org/10.3390/app13179598
