Simulation of Spiking Neural P Systems with Sparse Matrix-Vector Operations

To date, parallel simulation algorithms for spiking neural P (SNP) systems have been based on a matrix representation, so that the simulation is implemented with linear algebra operations, which can be easily parallelized on high-performance computing platforms such as GPUs. Although this representation was convenient for the first generation of GPU-based simulators, such as CuSNP, some bottlenecks remain to be sorted out. For example, the proposed matrix representations of SNP systems lead to very sparse matrices, in which the majority of values are zero. It is well known that sparse matrices can compromise the performance of algorithms, since they involve a waste of memory and time. This problem has been extensively studied in the parallel computing literature. In this paper, we analyze some of these ideas and apply them to represent several variants of SNP systems. We also provide a new simulation algorithm based on a novel compressed representation for sparse matrices, and we conclude which SNP system variant best suits this new compressed matrix representation.


Introduction
Membrane computing [1,2] is an interdisciplinary research area mainly at the intersection of computer science and cellular biology [3], but also connected with many other fields such as engineering, neuroscience, systems biology, and chemistry. The aim is to study computational devices called P systems, taking inspiration from how living cells process information. Spiking neural P (SNP) systems [4] are a type of P system based on a directed graph, inspired by how neurons are interconnected by axons and synapses in the brain. Neurons communicate through spikes, and the time difference between them plays an important role in the computation. Therefore, this model belongs to the so-called third generation of artificial neural networks, i.e., those based on spikes.
Aside from computing numbers, SNP systems can also compute strings, and hence, languages. More general ways to provide the input or receive the output include the use of spike trains, i.e., a stream or sequence of spikes entering or leaving the system. Further results and details on computability, complexity, and applications of spiking neural P systems are detailed in [5][6][7], a dedicated chapter in the Handbook in [8], and an extensive bibliography until February 2016 in [9]. Moreover, there is a wide range of SNP system variants: with delays, with weights [10], with astrocytes [11], with anti-spikes [12], dendrites [13], rules on synapses [14], scheduled synapses [15], stochastic firing [16], numerical [17], etc.
The research on applications and variants of SNP systems has required the development of simulators. The simulation of SNP systems was initially carried out with sequential simulators such as pLinguaCore [18]. In 2010, a matrix representation of SNP systems was introduced [19]. Since then, most simulation algorithms have been based on matrix and vector representations, and consist of a set of linear algebra operations. This way, parallel simulators can be efficiently implemented, since matrix-vector multiplications are easy to parallelize. Moreover, there are efficient algebra libraries that can be used out of the box, although they have not been explored yet for this purpose. For instance, GPUs are parallel devices optimized for certain matrix operations [20], and can handle them efficiently. In general, these matrix representations of SNP systems fit well the highly parallel architecture of such devices. This has already been harnessed by CuSNP, a set of simulators for SNP systems implemented with CUDA [21][22][23][24]. Simulators for specific solutions have also been defined in the literature [5,25]. Moreover, this is not unique to SNP systems: many simulators for other P system variants have been accelerated on GPUs [26][27][28].
However, this matrix representation can be sparse (i.e., having a majority of zero values), because the directed graph of an SNP system is not usually fully connected. A first approach to tackle this problem was presented in [29], where some of the ideas developed in this work were first outlined. Following these ideas, in [30] the transition matrix was split to reduce the memory footprint of the SNP representation. In many disciplines, sparse matrix-vector operations are very common, and hence many solutions based on compressed representations have been proposed in the literature [31].
In this paper, we introduce compressed representations for the simulation of SNP systems based on sparse vector-matrix operations. First, we provide two approaches to compress the transition matrix for the simulation of SNP systems with static graph. Second, we extend these algorithms and data structures for SNP systems with dynamic graphs (division, budding, and plasticity). Finally, we make a complexity analysis and comparison of the algorithms to draw some conclusions.
The paper is structured as follows: Section 2 provides required concepts for the methods and algorithms here defined; Section 3 defines the designs of the representations; Section 4 contains the detailed algorithms based on the compressed representations; Section 5 shows the results on complexity analyses of the algorithms; Section 6 provides final conclusions, remarks, and plans of future work.

Preliminaries
In this section we briefly introduce the concepts employed in this work. Firstly, we define the standard model of spiking neural P systems and three variants. Second, a matrix-based simulation algorithm for this model is revisited. Third, the fundamentals of compressed formats for sparse matrix-vector operations are given.

Spiking Neural P Systems
Let us first formally introduce the definition of spiking neural P system. This model was first introduced in [4].

Definition 1.
A spiking neural P system of degree q ≥ 1 is a tuple Π = (O, syn, σ_1, . . . , σ_q, i_out), where:
• O = {a} is the singleton alphabet (a is called spike);
• syn ⊆ {(i, j) | 1 ≤ i, j ≤ q, i ≠ j} represents the arcs of a directed graph G = (V, syn) whose nodes are V = {1, . . . , q};
• σ_1, . . . , σ_q are neurons of the form σ_i = (n_i, R_i), 1 ≤ i ≤ q, where: n_i ≥ 0 is the initial number of spikes within the neuron labeled by i; and R_i is a finite set of rules associated with the neuron labeled by i, of one of the following forms: (1) E/a^c → a^p, with E a regular expression over {a} and c ≥ p ≥ 1 (firing rules); (2) a^s → λ for some s ≥ 1, with the restriction that for each rule E/a^c → a^p of type (1) from R_i, we have a^s ∉ L(E) (forgetting rules).
A spiking neural P system of degree q ≥ 1 can be viewed as a set of q neurons {σ 1 , . . . , σ q } interconnected by the arcs of a directed graph syn, called synapse graph. There is a distinguished neuron label i out , called output neuron (σ i out ), which communicates with the environment.
If a neuron σ_i contains k spikes at an instant t, with a^k ∈ L(E) and k ≥ c, then the rule E/a^c → a^p can be applied. By its application, c spikes are removed from neuron σ_i and the neuron fires, immediately producing p spikes. Thus, each neuron σ_j such that (i, j) ∈ syn receives p spikes. For the output neuron (labeled i_out), the spikes are sent to the environment.
The rules of type (2) are forgetting rules, and they are applied as follows: If neuron σ i contains exactly s spikes, then the rule a s → λ from R i can be applied. By the application of this rule all s spikes are removed from σ i .
In spiking neural P systems, a global clock is assumed, marking the time for the whole system. Only one rule can be executed in each neuron at step t. As models of computation, spiking neural P systems are Turing complete, i.e., as powerful as Turing machines. On one hand, a common way to introduce the input (instance of the problem to solve) to the system is to encode it into some or all of the initial spikes n i 's (inside each neuron i). On the other hand, a common way to obtain the output is by observing neuron i out : either by getting the interval t 2 − t 1 = n, where σ i out sent its first two spikes at times t 1 and t 2 (we say n is computed or generated by the system), or by counting all the spikes sent by σ i out to the environment until the system halts.
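As an illustration of the firing semantics above, the applicability check for a rule E/a^c → a^p can be sketched in Python by encoding the spike content a^k as the string "a" repeated k times and E as an ordinary regular expression (the function name and encoding are ours, for illustration only):

```python
import re

def applicable(spikes: int, E: str, c: int) -> bool:
    """A rule E/a^c -> a^p is enabled when the neuron's content a^spikes
    belongs to L(E) and there are at least c spikes to consume."""
    return spikes >= c and re.fullmatch(E, "a" * spikes) is not None

# The rule a(aa)*/a -> a is enabled exactly on odd numbers of spikes
assert applicable(5, r"a(aa)*", 1)
assert not applicable(4, r"a(aa)*", 1)
```

Real simulators avoid general regex matching and exploit the restricted forms of E used in most SNP designs, but the check above captures the definition.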
For the rest of the paper, we call this model spiking neural P systems with static structure, or just static SNP, given that the graph associated with it does not change along the computation. Next, we briefly introduce and focus on three variants with a dynamic graph: division, budding, and plasticity. A broader explanation of them and more variants are provided at [32][33][34][35].

Spiking Neural P Systems with Budding Rules
Based on the idea of neuronal budding, where a cell is divided into two new cells, we can abstract budding rules. In this process, the new neurons can differ in some aspects: their connections, contents, and shape. A budding rule has the following form: [E]_i → [ ]_i / [ ]_j, where E is a regular expression and i, j ∈ {1, . . . , q}.
If a neuron σ_i contains s spikes, a^s ∈ L(E), and there is no neuron σ_j such that the synapse (i, j) exists in the system, then the rule [E]_i → [ ]_i / [ ]_j is enabled and can be executed. A new neuron σ_j is created, and both neurons σ_i and σ_j are empty after the execution of the rule. Neuron σ_i keeps all incoming synapses, while σ_j inherits all the synapses that were going out of σ_i in the previous configuration. There is also a synapse (i, j) between neurons σ_i and σ_j, and the rest of the synapses of σ_j are given by those defined in syn.

Spiking Neural P Systems with Division Rules
Inspired by the process of mitosis, division rules have been widely used within the field of membrane computing. In SNP systems, a division rule can be defined as follows: [E]_i → [ ]_j || [ ]_k, where E is a regular expression and i, j, k ∈ {1, . . . , q}.
If a neuron σ i contains s spikes, a s ∈ L(E), and there is no neuron σ g such that the synapse (g, i) or (i, g) exists in the system, g ∈ {j, k}, then the rule [E] i → [ ] j ||[ ] k is enabled and it can be executed. Neuron σ i is then divided into two new cells, σ j and σ k . The new cells are empty at the time of their creation. The new neurons keep the synapses previously associated to the original neuron σ i , that is, if there was a synapse from σ g to σ i , then a new synapse from σ g to σ j and a new one to σ k are created, and if there was a synapse from σ i to σ g , then a new synapse from σ j to σ g and a new one from σ k to σ g are created. The rest of synapses of σ j and σ k are given by the ones defined in syn.
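The synapse rewiring performed by a division rule can be sketched in Python, with syn kept as a set of pairs (the helper name and representation are ours, not part of any simulator):

```python
def divide(syn, i, j, k):
    """Sketch: apply [E]_i -> [ ]_j || [ ]_k to a synapse set.
    Every synapse touching neuron i is replicated for the two new
    neurons j and k; neuron i itself disappears from the graph."""
    new_syn = set()
    for (a, b) in syn:
        if a == i:                      # outgoing synapse of i
            new_syn |= {(j, b), (k, b)}
        elif b == i:                    # incoming synapse of i
            new_syn |= {(a, j), (a, k)}
        else:                           # untouched synapse
            new_syn.add((a, b))
    return new_syn

# Neuron 1 divides into neurons 4 and 5
assert divide({(1, 2), (3, 1)}, 1, 4, 5) == {(4, 2), (5, 2), (3, 4), (3, 5)}
```

The remaining synapses of σ_j and σ_k (those defined in syn) would be added on top of this inherited set.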

Spiking Neural P Systems with Plasticity Rules
It is known that new synapses can be created in the brain thanks to the process of synaptogenesis. We can recreate this process in the framework of spiking neural P systems by defining plasticity rules of the following form: E/a^c → αk(i, N_j), where E is a regular expression, c ≥ 1, α ∈ {+, −, ±, ∓}, k ≥ 1 and N_j ⊆ {1, . . . , q} (a.k.a. the neuron set). Recall that pres(i) is the set of presynapses of neuron σ_i.
If a neuron σ i contains s spikes, a s ∈ L(E), then the rule E/a c → αk(i, N j ) is enabled and can be executed. The rule consumes c spikes and, depending on the value of α, it performs one of the following:

• If α = + and |N_j − pres(i)| ≤ k, deterministically create a synapse to every σ_g, g ∈ N_j − pres(i). Otherwise, if |N_j − pres(i)| > k, then non-deterministically select k neurons in N_j − pres(i) and create one synapse to each selected neuron.

• If α = − and |pres(i)| ≤ k, deterministically delete all synapses in pres(i). Otherwise, if |pres(i)| > k, then non-deterministically select k neurons in pres(i) and delete each synapse to the selected neurons.

• If α ∈ {±, ∓}, create (respectively, delete) synapses at time t and then delete (respectively, create) synapses at time t + 1. Even when this rule is applied, neurons remain open; that is, they can continue receiving spikes.
Let us notice that if, for some σ_i, we apply a plasticity rule with α ∈ {+, ±, ∓}, then whenever a synapse is created, a spike is sent from σ_i to the newly connected neuron. That is, when σ_i attaches to σ_j through this mechanism, one spike is immediately transferred to σ_j.
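The deterministic and non-deterministic branches of the α = + case can be sketched as follows (function name and data types are ours; the α = − case is symmetric over pres(i), and the immediate spike transfer to each created synapse is noted but not modeled here):

```python
import random

def apply_plasticity_plus(pres_i, N_j, k):
    """Sketch of the alpha = '+' case of E/a^c -> alpha k(i, N_j):
    create up to k synapses from sigma_i to neurons of N_j that are
    not already connected (pres_i = current presynapses of sigma_i)."""
    candidates = sorted(N_j - pres_i)
    if len(candidates) <= k:
        created = candidates                    # deterministic: attach to all
    else:
        created = random.sample(candidates, k)  # non-deterministic choice
    # each created synapse would immediately carry one spike to its target
    return set(created)

# Only neuron 3 is a new candidate, and k = 5 >= 1, so the result is forced
assert apply_plasticity_plus({2}, {2, 3}, 5) == {3}
```

When |N_j − pres(i)| > k, repeated runs may return different subsets, matching the non-determinism of the rule.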

Matrix Representation for SNP Systems
Usually, parallel P system simulators make use of ad hoc representations tailored to a certain variant [26][27][28]. In order to ease the simulation of static SNP systems and their deployment on parallel environments, a matrix representation was introduced [19]. By using a set of algebraic operations, it is possible to reproduce the transitions of a computation. Although the baseline representation only involves SNP systems without delays and with a static structure, many extensions have followed, enabling delays [21,22], handling non-determinism [24], plasticity rules [36], rules on synapses [37], and dendrite P systems [38]. In this section we briefly introduce the definitions for the matrix representation of the basic model of spiking neural P systems without delays, as defined above. We also provide the pseudocode to simulate one computation of any P system of the variant using this matrix representation. In our notation, we use capital letters for vectors and matrices, and [] for accessing values: V[i] is the value at position i of the vector V, and M[i, j] is the value at row i and column j of matrix M.
For an SNP system Π of degree (q, m) (q neurons and m rules, where m = ∑_{i=1}^{q} |R_i|), we define the following vectors and matrices: Configuration vector: C_k is the vector containing the number of spikes in every neuron at the kth computation step, where C_0 denotes the initial configuration; i.e., C_0[i] = n_i for neuron σ_i = (n_i, R_i). It contains q elements.
Spiking vector: S_k indicates whether a rule is going to fire at transition step k (value 1) or not (value 0). Given the non-deterministic nature of SNP systems, there may be a set of valid spiking vectors, which is denoted as SV_k. However, for the computation of the next configuration vector, only one spiking vector is used. It contains m elements.
Transition matrix: M_Π is an m × q matrix whose element M_Π[i, j] is given as:
  −c, if rule r_i is in σ_j and is applied consuming c spikes;
  p, if rule r_i is in σ_s, with s ≠ j and (s, j) ∈ syn, and is applied producing p spikes;
  0, if rule r_i is in σ_s, with s ≠ j and (s, j) ∉ syn.
In this representation, rows represent rules and columns represent neurons in the spiking transition matrix. Note also that a negative entry corresponds to the consumption of spikes. Thus, it is easy to observe that each row has exactly one negative entry, and each column has at least one negative entry [19].
Hence, to compute the transition k, it is enough to select a spiking vector S_k from all the possibilities in SV_k and calculate: C_k = C_{k−1} + S_k · M_Π.
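Using NumPy, one transition of a toy two-neuron system can be computed exactly with this formula (the concrete system below is ours, chosen only for illustration):

```python
import numpy as np

# Toy system: 2 neurons, 3 rules.
# r1 = a/a -> a in sigma_1, with synapse (1, 2)
# r2 = a/a -> a in sigma_2, with synapse (2, 1)
# r3 = a^2 -> lambda (forgetting rule) in sigma_2
M = np.array([[-1,  1],    # r1: consumes 1 in sigma_1, sends 1 to sigma_2
              [ 1, -1],    # r2: sends 1 to sigma_1, consumes 1 in sigma_2
              [ 0, -2]])   # r3: removes 2 spikes from sigma_2
C0 = np.array([1, 1])      # one initial spike in each neuron
S0 = np.array([1, 1, 0])   # spiking vector: fire r1 and r2
C1 = C0 + S0 @ M           # C_k = C_{k-1} + S_k . M_Pi
assert C1.tolist() == [1, 1]
```

Note that each row of M has exactly one negative entry (the consumption) and each column at least one, as stated above.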
The pseudocode to simulate a computation of an SNP system is described in Algorithm 1. The selection of valid spiking vectors can be done in different ways, as in [21,22]; it returns a set of valid spiking vectors. In this work, we focus on just one computation, but non-determinism can be tackled by maintaining a queue of generated configurations [24].

Algorithm 1 MAIN PROCEDURE: simulating one computation for static spiking neural P systems.
Require: An SNP system Π of degree (q, m), and a limit L of time steps.
Ensure: A computation of the system
1: Calculate all possible spiking vectors
5: Pick one spiking vector randomly
6: C_{k+1} ← C_k + S_k · M_Π ▷ Compute next configuration
7: Stop condition: maximum steps or no more applicable rules
9: return C_0 . . . C_{k−1}

In this work we focus on compressing the representation, specifically the transition matrix, so the determination of the spiking vector does not affect these designs. Therefore, we use a straightforward approach and select just one valid spiking vector randomly. The representations depicted here only affect how the next configuration is computed (the matrix-vector multiplication at line 6 in Algorithm 1).

Sparse Matrix-Vector Operations
Algebraic operations have been deeply studied in parallel computing. Specifically, GPU computing provides large speedups when accelerating this kind of operation. This technology allows us to run scientific computations in parallel on the GPU, given that a GPU device typically contains thousands of cores and high memory bandwidth [39]. However, parallel computing on a GPU has more constraints than on a CPU: threads have to run in an SPMD fashion while accessing data in a coalesced way; that is, best performance is achieved when the execution of threads is synchronized and they access contiguous data in memory. In fact, GPUs have been employed for P system simulations since the introduction of CUDA.
Matrix computation is a highly optimized operation in CUDA [40], and there are many efficient libraries for algebra computations, such as cuBLAS. When working with large matrices, it is usual that they are almost "empty", with a majority of zero values. Such a matrix is known as a sparse matrix, and it degrades performance in two ways: a lot of memory is wasted, and a lot of operations are redundant.
Given the importance of linear algebra in many computational disciplines, sparse matrix-vector operations (SpMV) have been a subject of study in parallel computing (and so, on GPUs). Today there exist many approaches to tackle this problem [41]. Let us focus on two formats to represent sparse matrices in a compressed way, assuming that threads access rows in parallel:
• CSR format. Only non-zero values are represented, by using three arrays: row pointers, non-zero values, and columns (see Figure 1 for an example). First, the row-pointers array is accessed, which contains one position per row of the original matrix; each position gives the index where that row starts in the non-zero values and columns arrays. The non-zero values and columns arrays can be seen as a single array of pairs, since every entry has to be accessed at the same time. Once a row is indexed, a loop over the values in that row has to be performed to find the corresponding column, and therefore the value. If the column is not present, the value is assumed to be zero, since this data structure contains all non-zero values. The main advantage is that it is a fully compressed format, saving memory (ignoring the row pointers) whenever NumNonZeroValues · 2 < NumNonZeroValues + NumZeroValues, where NumNonZeroValues and NumZeroValues are the number of non-zero and zero values in the original matrix, respectively. However, the drawback is that the search for elements in the non-zero values and columns arrays is not coalesced when using parallelism per row. Moreover, since it is a fully compressed format, there is no room for modifying the values, such as introducing new non-zero values;
• ELL format. This representation aims at increasing the memory-coalesced access of threads in CUDA. This is achieved by using a matrix of pairs containing a trimmed, transposed version of the original matrix (see Figure 2 for an example). Each column of the ELL matrix is devoted to one row of the original matrix, even if that row is empty (all elements are zero). Every element is a pair whose first position denotes the column and whose second is the value, of only the non-zero elements in the corresponding row. The size of the ELL matrix is fixed: the number of columns equals the number of rows of the original matrix, while the number of rows is the maximum length of a row in terms of non-zero values; in other words, the maximum amount of non-zero elements in a row of the original matrix. Rows containing fewer elements pad the difference with null elements. The main advantage of this format is that threads always access the elements of all rows in a coalesced way, and the null elements padded by short rows can be utilized to incorporate new data. However, there is a waste of memory, which is worse when the rows are unbalanced in terms of number of zeros.
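The two layouts can be made concrete with a short Python sketch that builds both from a dense matrix (helper names are ours; production code would use a library such as cuSPARSE or scipy.sparse instead):

```python
def to_csr(M):
    """Sketch: build the three CSR arrays (row pointers, columns, values)
    from a dense matrix given as a list of lists."""
    row_ptr, cols, vals = [0], [], []
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                cols.append(j)
                vals.append(v)
        row_ptr.append(len(vals))  # row i spans vals[row_ptr[i]:row_ptr[i+1]]
    return row_ptr, cols, vals

def to_ell(M):
    """Sketch: build the ELL matrix of (column, value) pairs; one ELL column
    per original row, padded with None up to the longest row."""
    packed = [[(j, v) for j, v in enumerate(row) if v != 0] for row in M]
    depth = max(len(p) for p in packed)
    return [[p[r] if r < len(p) else None for p in packed]
            for r in range(depth)]

M = [[0, 0, 3],
     [0, 0, 0],
     [4, 0, 5]]
assert to_csr(M) == ([0, 1, 1, 3], [2, 0, 2], [3, 4, 5])
# 2 ELL rows (longest original row has 2 non-zeros), one column per row of M
assert to_ell(M) == [[(2, 3), None, (0, 4)], [None, None, (2, 5)]]
```

The empty middle row of M costs nothing in CSR (two equal row pointers) but a full padded column in ELL, illustrating the memory trade-off described above.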

Methods
SNP systems in the literature typically do not have fully connected graphs. This means that the transition matrix becomes very sparse and, therefore, both computing time and memory are wasted. However, further optimizations based on SpMV can be applied. In the following subsections we discuss some approaches. Of course, if the graph inherent to an SNP system leads to a dense transition matrix, then the normal, uncompressed format (referred to as the sparse representation in this paper) can be employed, because using compressed formats would then increase the memory footprint.
In this work, we focus on the basic model of spiking neural P systems without delays, as defined above, as well as three variants with dynamic networks: budding, division, and plasticity. The set of algorithms defined next is designed to take advantage of data parallelism, which is convenient for GPUs and vector co-processors. Their pseudocodes are detailed in Section 4 of this paper.
In Algorithm 2 we generalize the pseudocode presented in Algorithm 1 to handle both static and dynamic networks. This way, each variant only requires redefining some functions from those defined for the static SNP system variant using vectors and matrices without compression (i.e., the sparse representation). In order to understand the algorithms, we present in this section the main new data structures and their behavior. The detailed pseudocodes are available in Section 4.

Algorithm 2 MAIN PROCEDURE: simulating one computation for spiking neural P systems.
Require: An SNP system Π of degree (q, m), and a limit L of time steps.
Ensure: A computation of the system
1: Calculate all possible spiking vectors
5: Pick one spiking vector randomly
6: Stop condition: maximum steps or no more applicable rules
9: return C_0 . . . C_{k−1}

As a convention, vectors and matrices with subindex k are dynamic and can change during the simulation, while those with subindex Π are constructed at the beginning and are invariant. Capital letters refer to vectors and matrices, and lowercase letters to scalar numbers. A total order over the rules defined in the system is assumed, denoted as R = ∪_{i=1}^{q} R_i. For the sake of simplicity, we represent each rule r_j ≡ E_j/a^{c_j} → a^{p_j}, with 1 ≤ j ≤ m, as a tuple (i, E_j, c_j, p_j), where i is the subindex of the set R_i to which r_j belongs (i.e., the neuron that contains it). Forgetting rules simply have p_j = 0.
For static SNP systems using the sparse representation, we use the following vectors and matrices:
• Preconditions vector: P_Π is a vector storing the preconditions of the rules; that is, both the regular expression and the number of consumed spikes.
• Neuron-rule map vector: N_Π is a vector that maps each neuron index to its rule indexes. Specifically, N_Π[i] is the index of the first rule in the neuron. Given that rules are ordered in R as mentioned above, rules belonging to the same neuron have contiguous indexes, so it is enough to store just the first index. In this sense, the first rule in neuron i is N_Π[i] and the last one is N_Π[i + 1] − 1. In other words, N_Π contains q + 1 elements.
• Transition matrix: if the variant has a dynamic network, the transition matrix needs to be modified along the computation; therefore, we start with M_0. The following algorithms show how these structures are constructed.
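Since N_Π follows directly from the rule ordering, its construction amounts to a prefix sum over the per-neuron rule counts, as in this sketch (the helper name is ours):

```python
def neuron_rule_map(rule_counts):
    """Sketch: build N_Pi from the number of rules |R_i| of each neuron.
    N[i] is the index of the first rule of neuron i; the rules of
    neuron i are then N[i] .. N[i+1]-1 (1-based, as in the paper)."""
    N = [1]
    for c in rule_counts:
        N.append(N[-1] + c)
    return N  # q + 1 entries

# 3 neurons with 2, 1 and 3 rules: their rules are 1-2, 3, and 4-6
assert neuron_rule_map([2, 1, 3]) == [1, 3, 4, 7]
```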
Algorithm 2 can be easily transformed into Algorithm 1 by defining the INIT and COMPUTE_NEXT functions as in Algorithm 3. They work exactly as specified in Section 2.2; that is, using the usual vector-matrix multiplication to calculate the next configuration vector. We also detail how the selection of spiking vectors can be done. This is defined in Algorithm 4, and it is based on ideas already presented in [21,22]. First, the SPIKING_VECTORS function calculates the set of all possible spiking vectors by using a recursive function over the neuron index i. It gathers all the spiking vectors that can be generated for neurons with index greater than i; then, if neuron i contains applicable rules, it populates a spiking vector for each of these rules combined with each of the spiking vectors generated for the neurons with index greater than i. Finally, neuron i propagates these spiking vectors to the next neuron, i − 1.

Approach with ELL Format
Our first approach to compress the representation of the transition matrix, M Π , is to use the ELL format (see Figure 3 for an example). The reason for using ELL and not other compressed formats for sparse matrices (CSR, COO, BSR, ...) is to enable extensions for dynamic networks, as seen later. ELL can give some room for modifications without much memory re-allocations, while CSR requires us to modify the whole matrix to add new elements.
ELL format leads to the new compressed matrix M s Π . The following aspects have been taken into consideration:

• The ELL format represents the transpose of the original matrix, so now rows correspond to neurons and columns to rules. This is convenient for SIMD processors such as GPUs.

• The number of rows of M^s_Π equals the maximum amount of non-zero values in a row of M_Π, denoted by z′. It can be shown that z′ = z + 1, where z is the maximum output degree found in the neurons of the SNP system; specifically, z = max{outdegree(i) | 1 ≤ i ≤ q} (see the definition in Section 2.1). The value z′ can be derived from the composition of the transition matrix, where the row j devoted to rule r_j ≡ (i_j, E_j, c_j, p_j) contains the value +p_j for every neuron i (column) connected through an output synapse with the neuron containing the rule (i.e., i ∈ pres(i_j)), and a value −c_j for consuming the spikes in the neuron the rule belongs to (i.e., i_j).

• The values inside columns can be sorted so that the consumption of spikes (−c values) is placed in the first row. In this way, if implemented in parallel, all threads can start by doing the same task: consuming spikes.

• Every position of M^s_Π is a pair (as illustrated in Figure 3), where the first element is a neuron label, and the second is the number of spikes produced (+p).
A parallel code can be implemented with this design by assigning a thread to each rule, and so, one per column of the spiking vector S_k and one per column of M^s_Π (rows of the original transition matrix). For the vector-matrix multiplication, it is enough to have a loop of at most z′ steps through the columns. In the loop of each column j, the corresponding value in the spiking vector S_k[j] (either 0 or 1) is multiplied by the value x_j in the pair (n_j, x_j), and the result is added at the neuron id n_j in the configuration vector, C_k[n_j]. In case the SNP network contains hubs (nodes with a high number of input synapses), we can opt for a parallel reduction per column. Since several threads might write to the same positions of the configuration vector at the same time, a solution is to use atomic addition operations, which are available on devices such as GPUs.
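A sequential Python sketch of this multiplication over the ELL-compressed matrix may help; the per-rule loop is the one mapped to GPU threads, in which case the updates of C_next would need atomic additions (names, 0-based indices, and the use of None for padding are our conventions):

```python
def compute_next_ell(C, S, M_ell):
    """Sketch of COMPUTE_NEXT over an ELL-compressed transition matrix.
    M_ell[r] is the column for rule r: a list of (neuron, value) pairs
    padded with None, with the consumption entry (-c) placed first."""
    C_next = list(C)
    for r, col in enumerate(M_ell):   # one thread per rule in parallel
        if S[r] == 0:
            continue                  # rule not selected this step
        for entry in col:             # at most z' = z + 1 iterations
            if entry is None:
                break                 # padding: column exhausted
            neuron, value = entry
            C_next[neuron] += value   # -c in the first row, +p afterwards
    return C_next

# sigma_0 fires r0 = a/a -> a over synapse (0, 1): consume 1, send 1
M_ell = [[(0, -1), (1, 1)]]
assert compute_next_ell([1, 0], [1], M_ell) == [0, 1]
```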
Algorithm 3 Functions for static SNP systems with sparse matrix representation.
2: Initialize vectors only related to neurons.
3: Initialize matrices related to rules.
4: Create initial configuration
10: Get info of the neuron from Π
14: C_0[i] ← n_i ▷ Initial configuration
16: end for
17: return (C_0, N_Π)
18: end procedure
19: procedure INIT_RULE_MATRICES(Π)
20: (R, m, q, pres) ← Π ▷ Get information from Π
21: Create preconditions vector
22: Create transition matrix
23: for all r_j ∈ R, j ← 1 . . . m do ▷ For each rule (column). This loop is parallelizable.
24: Get info of the rule
25: Store it in the preconditions vector
27: for all i ∈ pres(i_j) do ▷ For each neuron connected to i_j
28: Construct transition matrix
29: end for
30: end for
31: return (M_Π, P_Π)
32: end procedure
33: procedure COMPUTE_NEXT(C_k, M_k, S_k)
34: Get some content of the transition tuple
35: Only the configuration is updated.
36: end procedure

Algorithm 4 Spiking vector selection with static SNP systems and sparse representation.
2: Start calculating combinations from neuron 1
3: end procedure
Get some content of the transition tuple
7: if i > q_k then ▷ If neuron i is out of index
8: return ∅ ▷ An empty set
9: else
10: The set for the rest of neurons.
11: All combinations for the rest of neurons.
12: if SV = ∅ then ▷ No spiking vectors yet for the rest of neurons
13: S ← EMPTY_VECTOR(m) ▷ Create an empty spiking vector
▷ If rule j is applicable
20: for all S ∈ SV do ▷ For each spiking vector, either from SV or the empty vector
21: S′ ← S ▷ Create a copy
22: Mark rule j as applicable
24: end for
25: end if
26: end for
27: if SV′ = ∅ then ▷ If there are no applicable rules
28: return SV ▷ Just propagate combinations
29: else
30: return SV′ ▷ Return calculated combinations
31: end if
32: end if
33: end procedure
34: procedure APPLICABLE(i, j, C_k, M_k)
35: Get some content of the transition tuple
36: Preconditions of the rule
37: If rule j is applicable in neuron i
38: end procedure
39: procedure GET_ONE_RANDOMLY(SV_k)
40: return the s-th spiking vector in SV_k ▷ Returns just one, randomly chosen
42: end procedure

In order to use this representation in Algorithm 2, we only need to re-define the functions INIT_RULE_MATRICES and COMPUTE_NEXT from Algorithm 3 (for the sparse representation) as shown in Algorithm 5. The rest of the functions remain unchanged.

Algorithm 5 Functions for static SNP systems with ELL compressed matrix representation.
2: (R, m, q, z′, pres) ← Π ▷ Get information from Π
3: Create transition matrix
5: for all r_j ∈ R, j ← 1 . . . m do ▷ For each rule (column). This loop is parallelizable.
6: ▷ Only for the first row
9: if p_j > 0 then ▷ Only if r_j is not a forgetting rule
10: k ← 2 ▷ k is our iterator, compacting the rows
11: for all i ∈ pres(i_j) do ▷ For each output synapse
12: Add output neuron and produced spikes
13: k ← k + 1 ▷ We know that k ≤ z′
14: end for
15: end if
16: end for
17: return (M_Π, P_Π)
18: end procedure
19: procedure COMPUTE_NEXT(C_k, M_k, S_k)
20: Create a copy of C_k
21: (M^s_Π, _, _) ← M_k ▷ Get some content of the transition tuple
22: for j ← 1 . . . m do ▷ For each rule (column). This loop is parallelizable.
23: i ← 1
24: repeat
25: Get info from the transition matrix
26: ▷ Until reaching an empty value or the maximum
29: end for
30: return (C_{k+1}, M_k)
31: end procedure

Optimized Approach for Static Networks
If, in general, more than one rule is associated with each neuron, many iterations of the main loop in the COMPUTE_NEXT function are wasted. Indeed, if the loop is parallelized and each iteration is assigned to a thread, then many of them will be inactive (having a 0 in the spiking vector), causing performance drops such as branch divergence and non-coalesced memory accesses on GPUs. Moreover, note in Figure 3 that columns corresponding to rules belonging to the same neuron contain redundant information: the generation of spikes (+p) is replicated for all synapses.
Therefore, a more efficient compressed matrix representation can be obtained by keeping the synapses separated from the rule information. We call this the optimized matrix representation, and it is based on the following data structures:
• Rule vector, Ru_Π. By using a CSR-like format (see Figure 4 for an example), rules of the form E/a^c → a^p (forgetting rules are also included, assuming p = 0) can be represented by an array storing the values c and p in a pair. We can use the already defined neuron-rule map vector N_k to relate the subset of rules associated with each neuron.
• Synapse matrix, Sy_Π. It is a transposed matrix, as in the ELL representation (to better fit SIMD architectures such as GPU devices), but it has a column per neuron i and a row for every neuron j such that (i, j) ∈ syn (there is a synapse). That is, every element of the matrix corresponds to a synapse (the neuron id), or a null value otherwise. Null values are employed for padding the columns, since the number of rows equals z (the maximum output degree in the neurons of the SNP system). See Figure 4 for an example.
Note that we replace the transition matrix with a pair formed by the rule vector and the synapse matrix: M_Π = (Ru_Π, Sy_Π). In order to compute the next configuration, it is enough to loop over the neurons. For each neuron i, we check which rule j is selected according to the spiking vector at position S_k[i]. This is used to fetch the pair (c_j, p_j) from the rule vector, and therefore consume c_j spikes in neuron i and add p_j spikes to the neurons in column i of the synapse matrix. The loop over the column can end prematurely if the output degree of neuron i is smaller than z (that is, when encountering a null value). This operation can be easily parallelized by assigning a thread to each column of the synapse matrix (requiring q threads, one per neuron).
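A sequential sketch of this per-neuron computation follows; the loop over i is the one mapped to GPU threads, with atomic additions for the updates (names, 0-based indices, the use of None for padding, and the convention that S[i] holds a 1-based rule id or 0 are ours):

```python
def compute_next_opt(C, S, Ru, Sy):
    """Sketch of COMPUTE_NEXT with the optimized representation.
    S[i] is the (1-based) id of the rule fired in neuron i, or 0;
    Ru[j] = (c_j, p_j); column i of Sy lists the targets of neuron i's
    output synapses, padded with None up to z rows."""
    C_next = list(C)
    for i in range(len(C)):            # one thread per neuron in parallel
        j = S[i]
        if j == 0:
            continue                   # no rule fired in this neuron
        c, p = Ru[j - 1]
        C_next[i] -= c                 # consume spikes in the firing neuron
        if p > 0:                      # forgetting rules skip the synapses
            for row in Sy:
                h = row[i]
                if h is None:
                    break              # padding: out-degree smaller than z
                C_next[h] += p         # produce spikes in connected neuron
    return C_next

# Two neurons; neuron 0 fires rule 1 = a/a -> a over synapse (0, 1)
Ru = [(1, 1)]
Sy = [[1, None]]                       # z = 1 row, one column per neuron
assert compute_next_opt([1, 0], [1, 0], Ru, Sy) == [0, 1]
```

Compared with the ELL sketch, only q threads are needed and the +p value is read once per neuron instead of being replicated per synapse.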
In order to use this optimized representation in Algorithm 2, we need to re-define the spiking selection function, since this vector works differently. It is enough to modify two lines in the definition of the COMBINATIONS function in Algorithm 4, so that the spiking vector keeps size q and stores the id of the selected rule instead of just 1 or 0 (see Section 4 for more detail). Moreover, we need to define tailored INIT_RULE_MATRICES and COMPUTE_NEXT functions as shown in Algorithm 6, replacing those from Algorithm 3.

Algorithm 6
Functions for static SNP systems with optimized compressed matrix representation.

procedure INIT_RULE_MATRICES(Π)
    (R, m, q, z, pres) ← Π                      ▷ Get information from Π
    P_Π ← EMPTY_VECTOR(m)                       ▷ Create preconditions vector
    Ru_Π ← EMPTY_VECTOR(m)                      ▷ Create rule vector
    for all r_j ∈ R (j ← 1 … m) do              ▷ For each rule. This loop is parallelizable.
        (i_j, E_j, c_j, p_j) ← r_j
        P_Π[j] ← E_j                            ▷ Store it in precondition vector
        Ru_Π[j] ← (c_j, p_j)                    ▷ Store it in rule vector
    end for
    Sy_Π ← EMPTY_MATRIX(z, q)                   ▷ Create synapse matrix
    for i ← 1 … q do                            ▷ For each neuron (column in synapse matrix)
        k ← 1                                   ▷ k is our iterator, compacting the rows
        for all h ∈ pres(i) do                  ▷ For each out synapse
            Sy_Π[k, i] ← h; k ← k + 1           ▷ We know that k ≤ z
        end for
    end for
    M_Π ← (Ru_Π, Sy_Π)
    return (M_Π, P_Π)
end procedure

procedure COMPUTE_NEXT(C_k, M_k, S_k)
    C_{k+1} ← COPY(C_k)                         ▷ Create a copy of C_k
    (M_Π, _, _) ← M_k                           ▷ Get some content of transition tuple
    (Ru_Π, Sy_Π) ← M_Π
    for i ← 1 … q do                            ▷ For each neuron. This loop is parallelizable.
        j ← S_k[i]                              ▷ Index of rule to fire in the neuron
        if j ≠ 0 then                           ▷ Only if there is a rule
            (c_j, p_j) ← Ru_Π[j]
            C_{k+1}[i] ← C_{k+1}[i] − c_j       ▷ Consume spikes in firing neuron
            w ← 1                               ▷ Next while stops if p_j = 0, i.e., a forgetting rule
            while w ≤ z and Sy_Π[w, i] ≠ null and p_j > 0 do   ▷ Until an empty value or the maximum
                h ← Sy_Π[w, i]                  ▷ Get connected neuron by a synapse
                C_{k+1}[h] ← C_{k+1}[h] + p_j   ▷ Produce spikes in connected neuron
                w ← w + 1
            end while
        end if
    end for
    return (C_{k+1}, M_k)
end procedure

Optimized Approach for Dynamic Networks
The optimized compressed matrix representation discussed in Section 3.2 can be further extended to support rules that modify the network, such as budding, division, or plasticity.

Budding and Division Rules
We start by analyzing how to simulate dynamic SNP systems with budding and division rules. They are supported at the same time, both to unify the pseudocode and because both kinds of rules usually appear together in the models.
First of all, the synapse matrix has to be flexible enough to host new neurons. This can be accomplished by allocating a matrix large enough to accommodate new neurons (possibly up to filling all the available memory). We denote by q_max the maximum number of neurons that the simulator is able to support, and by q_k the number of neurons at a given step k. The formula to calculate q_max is given in Section 5. It is important to point out that the simulator needs to differentiate between neuron label and neuron id [35]. The reason for this separation is that we can have more than one neuron (with different ids) sharing the same label (and hence, the same rules).
In order to achieve this separation, it is enough to have a vector mapping each neuron id to its label. We will call this new vector the neuron-id map vector Q_k, and the following holds at step k: q_k = |Q_k| = |C_k| = |S_k| ≤ q_max. That is, the neuron-id map vector, the configuration vector and the spiking vector are allocated with size q_max as well. Once the label of a neuron is obtained, the information of its corresponding rules can be accessed as usual, e.g., through the neuron-rule map vector N_Π. For simplicity, we attach the neuron-id map vector to the transition tuple. Moreover, the synapse matrix becomes dynamic, so it takes the k sub-index: Sy_k; hence, the transition matrix is also dynamic. Let us now introduce the new notation for the transition tuple and transition matrix:

• The transition matrix is now a dynamic pair: M_k = (Ru_Π, Sy_k).

• The transition tuple is extended as follows: M_k = (M_k, Q_k, P_Π, N_Π).
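The id/label indirection can be made concrete with a tiny sketch (names are ours, not the paper's): neuron ids index columns of the synapse matrix, while the neuron-id map vector translates each id into the label under which the rules are stored.

```python
def rules_of(neuron_id, Q, neuron_rule_map):
    """Fetch the rule ids of a neuron: id -> label -> rules.

    Q:               neuron-id map vector, Q[id] = label.
    neuron_rule_map: label -> list of rule ids (the N_Pi vector of the paper,
                     here a plain dict for illustration).
    """
    return neuron_rule_map[Q[neuron_id]]

# Two neurons (ids 0 and 2), e.g. produced by division, share label 1 and
# therefore the same rules, while keeping distinct columns in Sy_k:
Q = [1, 0, 1]
neuron_rule_map = {0: [0], 1: [1, 2]}
assert rules_of(0, Q, neuron_rule_map) == rules_of(2, Q, neuron_rule_map) == [1, 2]
```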
We use the following encoding for each type of rule. Spiking and forgetting rules remain unchanged. For a budding rule creating a new neuron with label l, the following operations have to be performed (see Figure 5 for an example):

1. Let i be the neuron id executing this rule.

2. Allocate a column l′ of the synapse matrix Sy_k for the new neuron, and use this index as its neuron id.

3. Add an entry to the neuron-id map vector Q_k at position l′ with the label l.

4. Copy column i to the new column l′ in Sy_k.

5. Delete the content of column i and add a single element at the first row with the id l′.
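The five budding steps above can be sketched directly on a column-major synapse matrix. This is our own illustrative version (`apply_budding` is a name we introduce): columns are Python lists padded with `None`, and instead of pre-allocated empty columns up to q_max we simply append.

```python
def apply_budding(syn, Q, i, label):
    """Neuron id i buds a new neuron with the given label.

    syn: column-major synapse matrix, syn[x] = out-synapses of neuron x,
         padded with None up to z.
    Q:   neuron-id map vector, Q[id] = label.
    """
    l_new = len(Q)                        # step 2: next free column is the new id
    Q.append(label)                       # step 3: record the label of the new neuron
    syn.append(list(syn[i]))              # step 4: the new neuron inherits i's column
    z = len(syn[i])
    syn[i] = [l_new] + [None] * (z - 1)   # step 5: i keeps a single synapse to l_new
    return l_new
```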
For a division rule [E]_i → []_j ‖ []_l, the following operations have to be performed (see Figure 6 for an example):

1. Let i be the neuron id executing this rule.

2. Allocate a new column l′ for the created neuron in the synapse matrix Sy_k.

3. Modify the neuron-id map vector Q_k as follows: replace the value at position i with the label j, and add a new entry at position l′ to associate it with the label l.

4. Copy column i to l′ in Sy_k (the generated neuron inherits the out synapses of the parent).

5. Find all occurrences of i in the synapse matrix, and add l′ to the columns where it is found.

Figure 5. Budding example: dark cells contain values greater than 0 (i.e., a neuron id), and light cells are empty columns allocated in memory (a total of q_max). Neuron 1 is applying budding, so its content is copied to an empty column (5) and replaced by a single synapse to the created neuron.

Figure 6. Division example: dark cells contain values greater than 0 (i.e., a neuron id), and light cells are empty columns allocated in memory (a total of q_max). Neuron 1 is being divided, so its content is copied to an empty column (5). Columns 0 and 3 represent neurons with a synapse to the dividing neuron (1), so they must be updated as well with a synapse to the created neuron (5). Neuron 3 has reached its maximum out degree, so the matrix must be expanded with a new row, or a COO-like scheme used to store the exceeded elements.
The last operation can be very expensive if the number of neurons is large, since it requires looping over the whole synapse matrix. Moreover, when adding l′ to all the columns containing i, it is possible to exceed the predetermined size z. For this situation, a special array of overflows is needed, as in the ELL + COO format for SpMV [41]. For simplicity, we will assume this situation is rare, and the algorithm will simply allocate a new row for the synapse matrix.
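The costly scan of step 5, together with the row-expansion fallback just mentioned, can be sketched as follows. This is our own illustration (the function name is ours, and we grow every column by one row instead of keeping a COO overflow array):

```python
def add_synapses_to_new_neuron(syn, i, l_new):
    """Step 5 of division: every neuron with a synapse to i also gets one to l_new.

    syn is column-major: syn[x] = out-synapses of neuron x, padded with None.
    The whole matrix is scanned, which is the expensive part; if a column is
    already full (out degree z reached), a new row is added by growing every
    column, mimicking the pseudocode's EXPAND_MATRIX.
    """
    for x, col in enumerate(syn):             # loop over all columns
        if i in col:
            try:
                col[col.index(None)] = l_new  # first free cell of the column
            except ValueError:                # column is full: expand the matrix
                for c in syn:
                    c.append(None)            # one extra row for every column
                col[-1] = l_new
```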
Some functions in the pseudocode are re-defined to support dynamic networks with division and budding:

• INIT functions, as in Algorithm 7. They now take into account the initialization of the structures at their maximum size q_max, including the new neuron-id map vector.

• SPIKING_VECTORS function, as defined in Algorithm 4 and modified in Section 3.2 for the optimized matrix representation, is slightly modified (just two lines) to support the neuron-id map vector.

• APPLICABLE function, as in Algorithm 8. This function, when dealing with division rules, has to search whether there are existing synapses for the neurons involved. If they exist, the division rule does not apply.

• COMPUTE_NEXT function, as in Algorithm 9, which includes the operations described above. It now needs to expand the synapse matrix Sy_k either by columns (when new neurons are created) or by rows, if there is a neuron from which we need to create a synapse to the new neuron and it already has the maximum out degree z. In this case, we need to re-allocate the synapse matrix in order to extend it by one row (written in the pseudocode as the function EXPAND_MATRIX).

Finally, let us remark that we can easily detect the type of a rule r_j through its associated c_j value: if it is 0, it is a budding rule; if it is a positive number, a spiking rule; otherwise (a negative value), it is a division rule.
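The sign-based encoding trick in the last remark amounts to a three-way dispatcher; a minimal sketch (helper name is ours):

```python
def rule_type(c):
    """Classify a rule by the sign of its stored c value."""
    if c > 0:
        return "spiking"    # spiking or forgetting rule
    if c == 0:
        return "budding"
    return "division"       # negative c encodes a division rule

assert [rule_type(c) for c in (2, 0, -1)] == ["spiking", "budding", "division"]
```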

Algorithm 7
Initialization functions for dynamic SNP systems with budding and division rules over optimized compressed matrix representation.

procedure INIT_CONFIG(Π)
    (q, q_max, σ_1, …, σ_q) ← Π                 ▷ Get information from Π
    C_0 ← EMPTY_VECTOR(q_max)                   ▷ Create initial configuration
    Q_0 ← EMPTY_VECTOR(q_max)                   ▷ Create neuron-id map vector
    for i ← 1 … q do
        (n_i, R_i) ← σ_i                        ▷ Get info of the neuron from Π
        C_0[i] ← n_i                            ▷ Initial spikes of neuron i
        Q_0[i] ← i                              ▷ Initially, each id maps to its own label
    end for
    return (C_0, Q_0)
end procedure

Plasticity Rules
For dynamic SNP systems with plasticity rules, the synapse matrix can be allocated in advance with the exact size q, since no new neurons are created. Thus, there is no need for a neuron-id map vector as before. However, enough rows (the value z) have to be pre-established in the synapse matrix to support the maximum number of synapses. Fortunately, this can be pre-computed by looking at the initial out degrees of the neurons and the sizes of the neuron sets in the plasticity rules that add synapses. We encode a plasticity rule r_j ≡ E_j/a^{c_j} → α_j k_j (i_j, N_j), with α_j ∈ {+, −, ±, ∓}, as the tuple r_j = (i_j, E_j, c_j, α_j, k_j, N_j). Next, we define the value z_p for SNP systems with plasticity rules: z_p is the maximum out degree (z) that a neuron can have initially, plus the number of new connections that can be created by the plasticity rules inside that neuron. This bound can be refined for plasticity rules with α ∈ {±, ∓}, because we know that up to k new synapses can be created at a time. However, for simplicity, we will use the bound above.

Algorithm 9 Compute next function for dynamic SNP systems with budding and division rules using optimized compressed matrix representation.

procedure COMPUTE_NEXT(C_k, M_k, S_k)
    C_{k+1} ← COPY(C_k)                         ▷ Create a copy of C_k
    (M_k, Q_k, _, _) ← M_k                      ▷ Extract info from transition tuple
    (Ru_Π, Sy_k) ← M_k                          ▷ Extract info from transition matrix
    q_k ← |C_k|                                 ▷ Get current number of neurons
    for i ← 1 … q_k do                          ▷ For each neuron. This loop is parallelizable.
        j ← S_k[i]                              ▷ Index of rule to fire in the neuron
        if j ≠ 0 then                           ▷ Only if there is a rule
            (c_j, p_j) ← Ru_Π[j]                ▷ Get rule info
            if c_j > 0 then                     ▷ Execution of a spiking or forgetting rule
                C_{k+1}[i] ← C_{k+1}[i] − c_j   ▷ Consume spikes in firing neuron
                w ← 1                           ▷ Next while stops if p_j = 0, i.e., a forgetting rule
                while w ≤ z and Sy_k[w, i] ≠ null and p_j > 0 do   ▷ Until an empty value
                    h ← Sy_k[w, i]              ▷ Get connected neuron by a synapse
                    C_{k+1}[h] ← C_{k+1}[h] + p_j   ▷ Produce spikes in connected neuron
                    w ← w + 1
                end while
            else if c_j = 0 then                ▷ Execution of a budding rule
                (h_j, l_j) ← (c_j, −p_j)        ▷ Get new neuron labels
                q_k ← q_k + 1                   ▷ Increment counter of neurons
                Sy_k[·, q_k] ← Sy_k[·, i]       ▷ The created neuron takes the out synapses of i
                Sy_k[·, i] ← (q_k, null, …, null)   ▷ Neuron i keeps a single synapse to the created neuron
                C_{k+1}[i] ← 0                  ▷ Empty the neuron i; neuron q_k is 0 already
                Q_k[q_k] ← l_j                  ▷ The new label of the neuron
            else                                ▷ Execution of a division rule (c_j < 0)
                (h_j, l_j) ← (−c_j, −p_j)       ▷ Get the two labels of the rule
                q_k ← q_k + 1                   ▷ Increment counter of neurons
                Sy_k[·, q_k] ← Sy_k[·, i]       ▷ The created neuron takes the out synapses of i
                Q_k[i] ← h_j; Q_k[q_k] ← l_j    ▷ Update the labels of both neurons
                C_{k+1}[i] ← 0                  ▷ Both neurons start empty
                for x ← 1 … q_k do              ▷ Search for in-synapses of the divided neuron
                    w ← 1; b ← false
                    while w ≤ z and Sy_k[w, x] ≠ null do
                        if Sy_k[w, x] = i then
                            b ← true            ▷ A synapse to i was found in column x
                        end if
                        w ← w + 1
                    end while
                    if b then                   ▷ If a synapse was found, add the new neuron at the end of the column
                        if w = z then           ▷ The neuron x has a larger out degree than z
                            Sy_k ← EXPAND_MATRIX(Sy_k)
                        end if
                        Sy_k[w, x] ← q_k
                    end if
                end for
            end if
        end if
    end for
    return (C_{k+1}, M_k)
end procedure

First, we need to represent plasticity rules as vectors. We assume that np is the total number of plasticity rules in the system, and that there is a total order between these rules. Given a plasticity rule, we can initialize the neuron-rule map and the precondition vector as with spiking rules. However, we need a couple of new vectors, and to modify existing ones, in order to represent all plasticity rules r_j = (i_j, E_j, c_j, α_j, k_j, N_j), with j ∈ {1 … np} (following the imposed total order):

• Rule vector Ru_Π stores the pair (c_j, −j) for a plasticity rule r_j, that is, the consumed spikes c_j and the unique index j of the plasticity rule. This index is used to access the following vector, and it is stored as a negative value in order to signal that this is a plasticity rule.

• Plasticity rule vector Pr_Π, of size np, contains a tuple (α_j, k_j, ni_j, ne_j) for each plasticity rule r_j. The values ni_j and ne_j are used as start and end indexes for the following vector.

• Plasticity neuron vector Pn_Π, of size npr = Σ_{j=1}^{np} |N_j|, represents all the neuron sets of the plasticity rules.
Thus, the elements of N_j are stored, in an ordered way, between positions ni_j and ne_j of Pn_Π.

• Time vector T_k is used to prevent neurons from applying rules during one step when the applied plasticity rule was of the type α_j ∈ {±, ∓}. It contains binary (0 or 1) values.

• The transition matrix is therefore M_k = (Ru_Π, Sy_k, T_k, Pr_Π, Pn_Π). Note that the synapse matrix can be modified at each step, so we use the sub-index k.
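The interplay between Ru_Π, Pr_Π and Pn_Π can be sketched in Python. This is our own illustration with 0-based indices (the paper's vectors are 1-based), and `neuron_set` is a helper name we introduce:

```python
def neuron_set(rule_id, Ru, Pr, Pn):
    """Recover the sorted neuron set N_j of a plasticity rule from its Ru entry."""
    c, tag = Ru[rule_id]
    assert tag < 0, "not a plasticity rule"
    alpha, k, ni, ne = Pr[-tag - 1]   # -tag is the 1-based plasticity index
    return Pn[ni:ne]                  # slice of the concatenated neuron sets

Ru = [(1, 2), (1, -1)]                # rule 0 is spiking, rule 1 is plasticity
Pr = [("+", 1, 0, 3)]                 # add up to 1 synapse from a 3-neuron set
Pn = [2, 4, 7]                        # the (sorted) neuron set of that rule
assert neuron_set(1, Ru, Pr, Pn) == [2, 4, 7]
```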
The following operations have to be performed to reproduce the behavior of plasticity rules (see Figure 7 for an illustration):

1. For each column in the synapse matrix executing a plasticity rule deleting x synapses:

(a) If the intersection of the rule's neuron set and the current synapses in Sy_k is larger than x, then randomly select x synapses.

(b) Loop through the rows (up to z_p iterations) to search for the selected neurons and set them to null. Given that holes might appear in the column, its values can be sorted (or compacted).

2. For each column in the synapse matrix executing a plasticity rule adding x synapses:

(a) If the difference of the rule's neuron set and the current synapses in Sy_k is larger than x, then randomly select x neurons.

(b) Loop through the rows (up to z_p iterations) to insert the selected new synapses while keeping the order.
Checking the applicability of plasticity rules is much simpler than for division rules, given that the preconditions only affect the local neuron, and we do not need to know whether there are existing synapses. However, for a plasticity rule r in a neuron i, in order to create new synapses or delete existing ones, we need to check which neurons declared in r are already present in column i of the synapse matrix. This search can take O(z_p · n_r), where n_r is the length of the neuron set in r. Nevertheless, by always keeping the column ordered, this search can be done in O(z_p + n_r).
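The O(z_p + n_r) search mentioned above is a plain sorted-list merge. A minimal sketch of such an intersection (both inputs assumed sorted, which the representation guarantees; the name `intersec` mirrors the INTERSEC helper of the pseudocode):

```python
def intersec(column, nset):
    """Intersection of two sorted id lists by a linear two-pointer merge."""
    out, a, b = [], 0, 0
    while a < len(column) and b < len(nset):
        if column[a] == nset[b]:
            out.append(column[a]); a += 1; b += 1
        elif column[a] < nset[b]:
            a += 1                  # advance the pointer on the smaller value
        else:
            b += 1
    return out

assert intersec([1, 3, 5, 8], [2, 3, 8, 9]) == [3, 8]
```

The set difference (DIFF) follows the same two-pointer scheme, keeping elements of the first list instead of the matches.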
Given that it is not usual to have budding and division rules together with plasticity rules, the pseudocode is based on the optimized matrix representation for static SNP systems from Section 3.2 (and not the one for division and budding). Algorithm 10 shows the redefinition of the INIT_RULE_MATRICES and COMPUTE_NEXT functions, replacing those from Algorithm 3. For COMPUTE_NEXT, the implementation is very similar to the original one, but it calls a new function, PLASTICITY, which actually modifies the synapses of the neuron (by modifying its corresponding column in the synapse matrix). This function and its auxiliaries are defined in Algorithms 11 and 12, respectively.

Algorithms
In this section we define the algorithms implementing the methods described in Section 3.
Let us first define a generic function to create a new, empty (all values to 0) vector of size s as follows: EMPTY_VECTOR(s). In order to create an empty matrix with f rows and c columns, we will use the following function: EMPTY_MATRIX( f , c). Next, the pseudocodes for simulating static SNP systems with sparse representation are given. Algorithm 3 shows the INIT and COMPUTE_NEXT functions, while Algorithm 4 shows the selection of spiking vectors.
For ELL-based matrix representation for static SNP systems, we need to re-define only two functions (INIT_RULE_MATRICES and COMPUTE_NEXT) from Algorithm 3 (static SNP systems with sparse representation) as shown in Algorithm 5.
For our optimized matrix representation for static SNP systems, we need to redefine only two functions (INIT_RULE_MATRICES and COMPUTE_NEXT) from Algorithm 3 (static SNP systems with sparse representation), as shown in Algorithm 6. Moreover, two lines in the definition of the COMBINATIONS function in Algorithm 4 have to be modified, so that the spiking vector has size q and stores, for each neuron, the id of the selected rule (or 0 if no rule is selected).

For dynamic SNP systems with plasticity rules, the pseudocode is based on the optimized matrix representation for static SNP systems from Section 3.2 (and not the one for division and budding). Algorithm 10 shows the re-definition of the INIT_RULE_MATRICES and COMPUTE_NEXT functions, replacing those from Algorithm 3. Regarding line 17, we assume that a function SORT exists which takes a set of neurons, sorts them by id, and generates a vector. Moreover, we assume that vectors can be copied directly from one position with a single assignment. The new PLASTICITY function is defined in Algorithm 11, and its auxiliaries are defined in Algorithm 12.

Algorithm 10 Initialization and compute-next functions for dynamic SNP systems with plasticity rules over optimized compressed matrix representation.
1: procedure INIT_RULE_MATRICES(Π)
2:     (R, m, q, z_p, np, npr, pres) ← Π          ▷ Get information from Π; z_p is different for plasticity
3:     P_Π ← EMPTY_VECTOR(m)                      ▷ Create preconditions vector
4:     Ru_Π ← EMPTY_VECTOR(m)                     ▷ Create rule vector
5:     Pr_Π ← EMPTY_VECTOR(np)                    ▷ Create plasticity rule vector
6:     Pn_Π ← EMPTY_VECTOR(npr)                   ▷ Create plasticity neuron vector
7:     pk ← 1                                     ▷ Counter for plasticity rule vector
8:     ni ← 1                                     ▷ Counter for plasticity neuron vector
9:     for all r_j ∈ R, j ← 1 … m do              ▷ For each rule (column). This loop is parallelizable.
10:        if r_j = (i_j, E_j, c_j, p_j) then     ▷ If spiking rule
11:            P_Π[j] ← E_j                       ▷ Store it in precondition vector
12:            Ru_Π[j] ← (c_j, p_j)               ▷ Store it in rule vector
13:        else if r_j = (i_j, E_j, c_j, α_j, k_j, N_j) then   ▷ If plasticity rule
14:            P_Π[j] ← E_j                       ▷ Store it in precondition vector
15:            Ru_Π[j] ← (c_j, −pk)               ▷ Store it in rule vector
16:            Pr_Π[pk] ← (α_j, k_j, ni, ni + |N_j|)   ▷ Store it in plasticity rule vector
17:            Pn_Π[ni] ← SORT(N_j)               ▷ Sort and store N_j in Pn after position ni
18:            pk ← pk + 1
19:            ni ← ni + |N_j|
20:        end if
21:    end for
22:    T_0 ← EMPTY_VECTOR(q)                      ▷ Create time vector
23:    Sy_0 ← EMPTY_MATRIX(z_p, q)                ▷ Create synapse matrix
24:    for all i ← 1 … q do                       ▷ For each neuron (column in synapse matrix)
25:        k ← 1                                  ▷ k is our iterator, compacting the rows
26:        for all h ∈ pres(i) do                 ▷ For each out synapse
27:            Sy_0[k, i] ← h
28:            k ← k + 1                          ▷ We know that k ≤ z_p
29:        end for
30:    end for
31:    M_0 ← (Ru_Π, Sy_0, T_0, Pr_Π, Pn_Π)        ▷ New transition matrix
32:    return (M_0, P_Π)
33: end procedure
34: procedure COMPUTE_NEXT(C_k, M_k, S_k)
35:    C_{k+1} ← COPY(C_k)                        ▷ Create a copy of C_k
36:    (M_k, P_Π, N_Π) ← M_k                      ▷ Get some content of transition tuple
37:    (Ru_Π, Sy_k, T_k, Pr_Π, Pn_Π) ← M_k
38:    for i ← 1 … q do                           ▷ For each neuron. This loop is parallelizable.
39:        j ← S_k[i]                             ▷ Index of rule to fire in the neuron
40:        if j ≠ 0 and T_k[i] = 0 then           ▷ Only if there is a rule and the neuron is not blocked
41:            (c_j, p_j) ← Ru_Π[j]
42:            C_{k+1}[i] ← C_{k+1}[i] − c_j      ▷ Consume spikes in firing neuron
43:            if p_j > 0 then                    ▷ If a spiking rule
44:                w ← 1
45:                while w ≤ z_p and Sy_k[w, i] ≠ null do   ▷ Until an empty value
46:                    h ← Sy_k[w, i]             ▷ Get connected neuron by a synapse
47:                    C_{k+1}[h] ← C_{k+1}[h] + p_j   ▷ Produce spikes in connected neuron
48:                    w ← w + 1
49:                end while
50:            else if p_j < 0 then               ▷ If a plasticity rule
51:                (Sy_k[·, i], T_{k+1}[i]) ← PLASTICITY(i, −p_j, Sy_k, Pr_Π, Pn_Π)
52:            end if
53:        end if
54:    end for
55:    return (C_{k+1}, M_k)
56: end procedure

Algorithm 11
Function for plasticity mechanism using optimized compressed matrix representation.
procedure PLASTICITY(i, j, Sy_k, Pr_Π, Pn_Π)
    (α_j, k_j, ni_j, ne_j) ← Pr_Π[j]              ▷ Get info of plasticity rule
    npr_j ← ne_j − ni_j                           ▷ Number of neurons in the neuron set
    if α_j = − then                               ▷ Delete synapses
        A_i ← column i of Sy_k without up to k_j synapses, selected with DELETE_RANDOM from INTERSEC(Sy_k[·, i], Pn_Π[ni_j … ne_j])
        t ← 0
    else if α_j = + then                          ▷ Add synapses
        A_i ← column i of Sy_k with up to k_j synapses inserted in order, selected with DELETE_RANDOM from DIFF(Pn_Π[ni_j … ne_j], Sy_k[·, i])
        t ← 0
    else if α_j = ± then                          ▷ Add and delete synapses
        A_i ← as in the two cases above, first adding and then deleting synapses
        t ← 1
    else if α_j = ∓ then                          ▷ Delete and add synapses
        A_i ← as in the two cases above, first deleting and then adding synapses
        t ← 1
    end if
    return (A_i, t)
end procedure

In order to keep Algorithm 12 simple, we assume that the functions INTERSEC, DIFF, and DELETE_RANDOM are already defined. As mentioned above, INTERSEC and DIFF can be implemented with algorithms of complexity O(z_p + n_r), given that the vectors involved (a column of the synapse matrix and a chunk of the plasticity neuron vector) are already sorted. We also assume that DELETE_RANDOM is a function that randomly selects k elements out of a total of n while keeping the order between elements. This can be done with an algorithm of complexity O(k²).
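One way to meet the DELETE_RANDOM contract described above (k random picks that preserve the original left-to-right order) is to sample k positions and traverse them in sorted order. A sketch, with a name mirroring the pseudocode's helper; here we simply sort the k sampled positions, whereas inserting each pick into an ordered list one by one would give the O(k²) bound quoted in the text:

```python
import random

def delete_random(items, k, rng=random):
    """Randomly select k elements of items, keeping their original order."""
    picked = sorted(rng.sample(range(len(items)), k))  # k ordered positions
    return [items[p] for p in picked]
```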

Results
In this section, we conduct a complexity analysis (for both time and memory) of the algorithms. In order to define the formulas, we need to introduce a set of descriptors for a spiking neural P system Π; these are described in Table 1. Moreover, Table 2 summarizes the vectors and matrices employed by each representation, and their corresponding sizes defined in terms of the descriptors. We use the following short names for the representations: sparse (original sparse representation, as in Section 3), ELL (ELL compressed representation, as in Section 3.1), optimized static (optimized static compressed representation, as in Section 3.2), division and budding (optimized dynamic compressed representation for division and budding, as in Section 3.3.1), and plasticity (optimized dynamic compressed representation for plasticity, as in Section 3.3.2).

Algorithm 12
Auxiliary functions for plasticity mechanism using optimized compressed matrix representation.
According to Table 2, we can bound the value of q_max for dynamic SNP systems with division and budding with the following formula: q_max = (MT − 4m + 2q + 1)/(z + 2), where MT is the maximum amount of memory in the system (measured in the word size employed to encode the elements of all the matrices and vectors; e.g., 4 Bytes). Moreover, we can infer when each matrix representation will be smaller for static SNP systems: ELL is better than sparse when z < (q − 2)/2; optimized is better than ELL when z > (q − m)/(2m − q); optimized is better than sparse when z < m − 2. In other words, our optimized compressed representation is worthwhile when the maximum out degree of the neurons is less than the total number of rules minus 2.
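Plugging the q_max formula above into code (a sketch with names of our choosing; MT is measured in words, as in the text, and we take the floor since q_max is a neuron count):

```python
def q_max(MT, m, q, z):
    """Maximum number of neurons a division/budding simulator can host in MT words."""
    return (MT - 4 * m + 2 * q + 1) // (z + 2)

# e.g., one mega-word of storage, m = 100 rules, q = 50 neurons, z = 4:
assert q_max(2 ** 20, 100, 50, 4) == 174712
```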
For dynamic SNP systems, we can say that a solution to a problem using an SNP system with plasticity rules is better than a solution based on division and budding when the corresponding condition on the descriptors holds; in other words, if we can know the maximum number of neurons to be generated, and this number is greater than a bound depending on the number of initial neurons, the number of rules, the number of elements in the neuron sets, and the maximum out degree, then the solution will need less memory using plasticity.
Finally, Table 3 shows the order of complexity of each function as defined for each representation. We can see that COMPUTE_NEXT is also reduced in complexity when using the optimized static representation against ELL and sparse, given that we expect m ≤ q and also z < m − 2. However, implementing division and budding blows up the complexity of the algorithms, since they need to loop over all the neurons checking for in-synapses; this also depends on the total number of neurons generated up to a given step. The same happens with the generation of spiking vectors, because the applicability function also needs to loop over all existing neurons. For dynamic networks, however, plasticity keeps the complexity bounded by the number of neurons, the value z_p, and the descriptors of the plasticity rules (the maximum value of k and the number of neurons in a neuron set, n_p). Table 3. Algorithmic order of complexity of the main functions employed in the simulation loop (i.e., excluding init functions) for each representation.

Table 3 columns: Sparse, ELL, Optimized Static, Division & Budding, Plasticity.
Therefore, using our compressed representations, both the memory footprint of the simulators and their complexity are reduced, as long as the maximum out degree of the neurons is low. Furthermore, for dynamic networks, plasticity is an option that keeps the complexity balanced, since we know in advance the number of neurons and synapses.
Let us make a simple comparison with an example from the literature. If we take the SNP system for sorting natural numbers as defined in [42], then we have q = 3n, m = n + n², and z = n, where n is the amount of natural numbers to sort. Thus:

• The size of the sparse representation is 3n³ + 6n² + 5n + 1 and the complexity of COMPUTE_NEXT is O(n³).

• The size of the ELL representation is 2n³ + 7n² + 11n + 1 and the complexity of COMPUTE_NEXT is O(n³).

• The size of the optimized representation is 7n² + 13n + 1 and the complexity of COMPUTE_NEXT is O(n²).

The optimized representation drastically decreases both the order of complexity and the amount of memory spent by the algorithms, going from orders of n³ to n². ELL has a similar order of complexity to that of sparse, and the amount of memory is only slightly decreased. Figure 8 shows that the reduction of the memory footprint achieved with the compressed representations takes effect after n > 3. Figure 9 shows that the optimized representation scales better than ELL and sparse. ELL is only slightly better than the sparse representation, demonstrating the need for the optimized one, which scales significantly better.
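The three size formulas of the sorting example can be checked directly (a small script of our own, evaluating the polynomials quoted above):

```python
def size_sparse(n):    return 3 * n ** 3 + 6 * n ** 2 + 5 * n + 1
def size_ell(n):       return 2 * n ** 3 + 7 * n ** 2 + 11 * n + 1
def size_optimized(n): return 7 * n ** 2 + 13 * n + 1

# The compressed formats pay off once n > 3 (cf. Figure 8):
assert size_optimized(4) < size_ell(4) < size_sparse(4)
assert (size_sparse(4), size_ell(4), size_optimized(4)) == (309, 285, 165)
```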
Finally, we also analyze a uniform solution to 3SAT with SNP systems without delays, as in [43] (Figure 10). We have that q = 8n³ + 3n + 3, m = 64n³ + 6n + 3, and z = 8n³, where n is the number of variables in the 3SAT instance. We can see that z < m − 2, so our optimized implementation will be able to save some memory. Therefore:

• The size of the ELL representation is 1024n⁶ + 96n⁴ + 384n³ + 36n + 22 and the complexity of COMPUTE_NEXT is O(n⁶).

• The size of the optimized representation is 64n⁶ + 24n⁴ + 304n³ + 33n + 22 and the complexity of COMPUTE_NEXT is O(n⁶).

We can see that the memory footprint is decreased, but it remains of the same order of magnitude (O(n⁶)), and the same happens with the computational complexity. Thus, our representation helps to reduce memory, although not significantly for this specific solution. This is mainly due to the high value of z. Figure 10 shows how the reduction of memory takes place only for the optimized representation as n increases. It is interesting to see that the ELL representation is even worse than the sparse representation here.
Finally, let us analyze the size of the uniform solution to Subset Sum with plasticity rules in [34]. The descriptors for the matrix representation of this dynamic SNP system with plasticity rules are the following: q = 4n + 9, m = 5n + 11, npr = 2n, z_p = 2, where n is the number of elements in the set V. Therefore, the memory footprint is 66n + 143, i.e., of order O(n). If we were using a sparse representation, where the transition matrix is of order m · q, then the amount of memory would be of order O(n²).

Conclusions
In this paper, we addressed the problem of having very sparse matrices in the matrix representation of SNP systems. Usually, the graph defined for an SNP system is not fully connected, leading to sparse matrices. This drastically downgrades the performance of the simulators. However, sparse matrices are a known issue in other disciplines, and efficient representations have been introduced in the literature. There are even solutions tailored for parallel architectures such as GPUs.
We propose two efficient compressed representations for SNP systems: one based on the classic ELL format, and an optimized one based on a combination of CSR and ELL. The latter representation leaves room to support rules for dynamic networks: division, budding, and plasticity. The representation for plasticity offers more advantages than the one for division and budding, since the synapse matrix size can be pre-computed. Thus, no label mapping nor empty columns to host new neurons are required. Moreover, simulating the creation of new neurons in parallel can damage the performance of the simulator significantly, because this operation can require sequential processing. Plasticity rules do not create new neurons, so this is avoided.
As future work, we plan to provide implementations of these designs within the cuSNP [21] and P-Lingua [44] frameworks, in order to provide high-performance simulations of real examples from the literature. We believe that these concepts will help to bring efficient tools to simulate SNP systems on GPUs, enabling the simulation of large networks in parallel. Specifically, we will use these designs to develop a new framework for automatically designing SNP systems using genetic algorithms [45]. Another tool that could benefit from the inclusion of this new type of representation is the family of visual tools for SNP systems [46]. Moreover, our optimized designs will enable the effective usage of spiking neural P systems in industrial processes such as [47][48][49][50], and in optimization applications such as [51,52]. SNP systems have been used in many applications [5], and in order to be employed in industrial settings, we need efficient simulators, where compressed representations of sparse matrices can help.
Numerical SNP systems (or NSNP systems) [17,53] are SNP system variants which are largely dissimilar to many variants of SNP systems, especially to the variants considered in this paper, for at least two main reasons: (1) rules in NSNP systems do not use regular expressions, and instead use linear functions, so that rules are applied when certain values or thresholds of the variables in such functions are satisfied; and (2) the variables in the functions are real-valued, unlike the natural numbers associated with strings and regular expressions. One of the main goals in [17] for introducing NSNP systems is to create an SNP system variant that, in future work, may be more amenable to the training algorithms of traditional neural networks [53]. For these reasons, we plan to extend our algorithms and compressed data structures to NSNP systems. We think that simulators for this variant can be effectively accelerated on GPUs, since GPUs are devices designed primarily for floating-point operations rather than integer arithmetic, although the latter is supported.
We also plan to include more models and ingredients in these new methods, such as delays, weights, dendrites, rules on synapses, and scheduled synapses, among others. Moreover, a recent work on SNP systems with plasticity shows that having the same set of rules in all neurons still leads to Turing completeness [54]. This means that the m descriptor can be shared by all neurons, leading to smaller representations for this kind of systems. We plan to study this further and combine it with our representations. Our focus on plasticity is also related to other results involving this ingredient in other fields, such as machine learning [55].