Comparison of Bagging and Sparsity Methods for Connectivity Reduction in Spiking Neural Networks with Memristive Plasticity

Developing a spiking neural network architecture that could prospectively be trained on energy-efficient neuromorphic hardware to solve various data analysis tasks requires satisfying the limitations of prospective analog or digital hardware, i.e., local learning and a limited number of connections, respectively. In this work, we compare two methods of connectivity reduction that are applicable to spiking networks with local plasticity: instead of a large fully connected network (used as the baseline for comparison), we employ either an ensemble of independent small networks or a network with probabilistic sparse connectivity. We evaluate both methods with a three-layer spiking neural network applied to handwritten and spoken digit classification tasks, using two memristive plasticity models and the classical spike-timing-dependent plasticity (STDP) rule. Both methods achieve an F1-score of 0.93–0.95 on the handwritten digit recognition task and 0.85–0.93 on the spoken digit recognition task. Combining both methods made it possible to obtain highly accurate models while reducing the number of connections by more than a factor of three compared to the base model.


Introduction
Neural network-based intelligent systems are employed in a wide range of tasks, from natural language processing to computer vision and signal processing. In edge computing, however, the use of deep learning methods still poses a variety of challenges, including latency and power consumption constraints during both training and inference.
Neuromorphic computing devices, in which information is encoded and processed in the form of binary events called spikes, offer a prospective solution to these problems. Modern neuroprocessors, e.g., TrueNorth [1], Loihi [2], or Altai https://motivnt.ru/neurochip-altai (accessed 8 February 2024), have been shown to achieve power consumption on the order of milliwatts [3]. Thus, these devices offer a powerful inference interface, and they can be used to deploy spiking neural networks (SNNs), allowing both inference and training directly on the neurochip. This can be extremely useful for edge computing applications by reducing power consumption and latency.
In turn, memristor-based training poses its own unique set of challenges and limitations. The most prominent one arises from the hardware implementation of synapses, where every neuron can have only a limited number of synapses [4][5][6], thus imposing limitations on the number of weights that a given network may have. In this regard, sparsely connected spiking networks, where the connectivity can be reduced depending on the hardware specifications, are a plausible solution.
In this study, we compared two methods of reducing connectivity in memristive spiking neural networks: a bagging ensemble of spiking neural networks and a probabilistic sparse SNN. Using a three-layer SNN with inhibitory and excitatory synapses, we solved the handwritten and spoken digit classification tasks and compared the outcomes for the proposed connectivity reduction types and three plasticity models. The main contributions of this work are as follows:
- We design a probabilistic sparse connectivity approach to creating a two-layer spiking neural network and a bagging ensemble of two-layer SNNs, and we compare these two methods.
- We propose an efficiency index that facilitates comparisons between different methods of connectivity reduction, and we apply it to the SNNs used in the study.
- We demonstrate that both connectivity reduction methods achieve competitive results on handwritten and spoken digit classification tasks and that they can be used with the memristive plasticity models.
- We show that the model that uses both connectivity reduction techniques simultaneously outperforms either method alone in terms of the accuracy-per-connection efficiency metric.
The rest of the study is structured as follows: In Section 2, we provide a brief overview of existing connectivity reduction methods for SNNs. In Section 3, we describe the datasets, the plasticity models, the base spiking network structure, and the sparsity methods used for comparison. In Section 4, we provide accuracy estimations for the proposed approaches, and we discuss the obtained results in Section 5. Finally, we present our conclusions in Section 6.

Literature Review
Connectivity reduction concerning spiking and artificial neural networks has been studied in several existing works.
A number of works have proposed using a probabilistic coefficient to form connections between neurons in a spiking neural network. For example, the work of [7] used a network consisting of three layers of neurons. The first layer is responsible for encoding the input samples (images) into Poisson spike sequences. The second layer consists of 4000 excitatory and 1000 inhibitory leaky integrate-and-fire (LIF) neurons. The output layer also consists of LIF neurons, the number of which corresponds to the number of classes in the selected dataset (EMNIST [8], YALE [9], or ORL [10]). The connections within the second layer are formed with a selected probability of 0.2. In this case, the weights of the synapses change depending on the spatial location of the neurons and the dynamics of spike activity. The connections between the encoding layer and the second layer are excitatory and exhibit no plasticity. Probabilistic linking and weight adaptation based on spatial location can achieve high classification accuracy on image datasets.
Another approach to connectivity reduction present in the literature is based on designing locally connected SNNs, in which the weights are created in a sparse fashion according to a certain rule. In [11], for example, a routing scheme was developed that used a hybrid of short-range direct connectivity and an address-event representation network. Without providing any benchmark results, the authors focused on the details of mapping a given SNN to the proposed architecture and showed that it yielded up to a 90% reduction in connectivity. The authors of [12] proposed a way to reduce connectivity in a three-layer network operating on the Winner-Takes-All principle. The input image, encoded by the first layer using frequency coding, was divided into small fragments that were then sent to individual neurons in the excitation layer. This made it possible to reduce the number of connections with local plasticity by up to 50% while maintaining the accuracy on the MNIST dataset at approximately 90%. The authors of [13] proposed a joint connectivity and weight learning approach inspired by synapse elimination and synaptogenesis in biological neurons. The gradient in this work was redefined as an additional synaptic parameter, thereby facilitating a better adaptation of the network topology. A multilayer convolutional SNN trained using a surrogate gradient approach and pruned according to the designed method demonstrated accuracies of about 89% and 92%, with less than 5% connectivity in the whole SNN, on the MNIST and CIFAR datasets, respectively. Increasing the proportion of connectivity to 40% improved the classification quality to 98.9% and 92.8%. The possibility of such a sharp decrease in the number of connections can be attributed to the large number of layers and neurons in the original network. Overall, this work demonstrates the fundamental applicability of approaches used in classical machine learning to spiking neural networks.
In [14], a two-stage pruning method for on-chip SNNs was developed. Pruning was first performed during training based on weight update history and spike timing; after training, it was then performed via weight thresholding. By training a deep SNN with time-to-first-spike coding using the proposed approach, the authors decreased latency by a factor of 2 and reduced the network connectivity by 92% without accuracy loss. Another example can be found in [15], where the authors used a method of zeroing weights above a given threshold and achieved a 70% reduction in connectivity. In that paper, the network consists of a mixture of formal and spiking convolutional layers, and the resulting sparse hybrid network achieved more than 71% accuracy on the IVS 3cls [16] dataset. In [17], sparsity in a multilayer convolutional spiking network is achieved by limiting the number of connections associated with each neuron, based on calculating the contribution of a neuron to the operation of the entire network. The proposed approach is shown to achieve high accuracy on such classical datasets as DVS-Gesture [18] (98%), MNIST (99%), CIFAR-10 (94%), and N-MNIST [19] (99%), with a 50% reduction in the number of connections. Finally, in [20], a sparse SNN topology was proposed in which connectivity reduction was performed via a combination of pruning and quantization based on the power-law weight-dependent plasticity model. Connectivity reduction was performed based on a threshold value at which the weights become zero. After training, the three-layer, fully connected SNN designed in the study achieved a classification accuracy of 92% on the MNIST dataset.
Thus, currently employed methods of reducing connectivity in spiking neural networks mostly encompass pruning, quantization, and local connectivity. However, ensemble learning, where multiple smaller networks are used together to form a stronger classifier, can also be viewed as a single sparse network. In this work, we explored this path to connectivity reduction and compared it to a probabilistic, locally connected SNN topology that was proposed in [21] and investigated in our previous research with different types of plasticity models [22][23][24].

Datasets and Preprocessing
To train and evaluate the proposed methods, we used two benchmark datasets: the scikit-learn Digits (Digits) [25] and the Free Spoken Digits Dataset (FSDD) [26]. The first consists of 1797 8 × 8 images of handwritten digits, and the second contains 3000 audio recordings of spoken digits from 0 to 9 in English. The choice of these datasets over larger and more widely used classification datasets such as MNIST, CIFAR-10, or N-MNIST was motivated by computational requirements: training spiking neural networks with local plasticity rules requires extensive computational experiments to select hyperparameter combinations. We automated this process (see Section 4 for details), thereby placing a limit on the time required to train the network. The Digits dataset is quite difficult compared to MNIST because it contains less information about the handwritten digits in terms of image size and dataset volume (see the examples in Figure 1). Both datasets have 10 classes, with 180 samples per class for Digits and 300 samples per class for FSDD. Additionally, the samples in FSDD vary by speaker: 6 speakers in total, with 50 recordings of each digit per speaker spoken with different intonations.
The raw data were preprocessed as follows: (1) Feature representation: for Digits, the original vector representation in the form of pixel intensities was used without changes; for FSDD, a vector representing an audio sample was obtained by splitting the audio into frames, extracting 30 Mel-frequency cepstral coefficients [27] (MFCCs) per frame, and then averaging them across frames. (2) Normalization: depending on the type of plasticity, the input vectors were normalized either by reducing them to zero mean and unit standard deviation (standard scaling) or by L2 normalization. (3) Gaussian receptive fields (GRFs): this step was intended to increase the separability of the data by transforming it into a higher-dimensional space. The range of each normalized feature was divided into M equal intervals. On each interval j = 1, . . . , M, a Gaussian peak was constructed with center µ_j and standard deviation σ (see Equation (1), Figure 2). The value of each component x_i of the input vector was replaced by the set of values G_j(x_i), which characterize the proximity of x_i to the center of the j-th receptive field. Thus, the dimension of the input vector increased by a factor of M. (4) Spike encoding: to convert the normalized and GRF-processed input vectors into spike sequences, we used a frequency-based approach. With this encoding method, each input neuron (spike generator) emits spikes at a frequency ν = ν_max · k during the entire sample presentation time t_e, where ν_max is the maximum spike emission frequency and k is the value of the input vector component. After time t_e has passed, the generators emit no spikes for t_p = 50 ms to allow the neuron potentials to return to their original values.
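The GRF expansion and frequency-based spike encoding described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `n_fields`, `sigma`, and `nu_max` are placeholder values, and the actual pipeline feeds the resulting rates to Poisson generators in NEST.

```python
import numpy as np

def gaussian_receptive_fields(x, n_fields=5, sigma=0.15):
    """Expand each scalar feature in [0, 1] into n_fields Gaussian activations.

    Centers mu_j are spaced evenly over the feature range; sigma is shared.
    The output dimension grows by a factor of n_fields, as in step (3).
    """
    centers = np.linspace(0.0, 1.0, n_fields)
    # shape (len(x), n_fields), flattened into one long feature vector
    g = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))
    return g.ravel()

def rate_encode(k, nu_max=500.0, t_e=350.0, rng=None):
    """Poisson rate coding: a generator fires at nu = nu_max * k for t_e ms.

    Returns sorted spike times (ms) within the presentation window.
    """
    rng = rng or np.random.default_rng(0)
    n_expected = nu_max * k * t_e / 1000.0   # expected spike count in t_e ms
    n_spikes = rng.poisson(n_expected)
    return np.sort(rng.uniform(0.0, t_e, size=n_spikes))
```

A feature value of 0.5 lands exactly on the middle receptive field and therefore produces a maximal activation of 1 in that component.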
To illustrate the relative complexity of both datasets, we performed dimensionality reduction using principal component analysis (PCA) on both datasets after the feature engineering stage. This method reduces the feature space to two dimensions and allows a visual assessment of the degree of nonlinear separability of the samples. The results are shown in Figure 3.
It can be clearly seen that the classification boundaries for the MFCC-encoded FSDD dataset have to be much more complex in order to achieve high accuracy.In other words, in this work, the handwritten digits dataset acts as a weak baseline, and it was used to assess the general capability of the sparsity methods under consideration, while the spoken digits dataset played the role of a more challenging benchmark.
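A projection of this kind can be reproduced for the Digits dataset roughly as follows. This is a simplified sketch: standard scaling stands in for the full feature engineering stage used in the paper.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Project the Digits dataset to two principal components, as in Figure 3.
X, y = load_digits(return_X_y=True)   # 1797 samples, 8x8 images -> 64 features
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
```

Plotting `X2` colored by `y` gives the kind of scatter shown for Digits; applying the same steps to MFCC vectors gives the FSDD panel.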

Synaptic Plasticity Models
In this work, we considered two memristive plasticity models: nanocomposite (NC) [28] and poly-p-xylylene (PPX) [29]. These models were proposed to approximate the experimentally measured dependence of the synaptic conductance change ∆w on the conductance w and on the time difference ∆t between presynaptic and postsynaptic spikes; they are defined in Equations (2) and (3).
Both memristive plasticity rules are shown in Figure 4. It can be seen that the models differ both in their dependence on the initial weight and in their spread along the ∆t axis: NC plasticity is localized within the [−25, 25] ms range and is relatively symmetric, while PPX plasticity covers a much larger ∆t range and exhibits asymmetric behavior depending on the initial weight and the sign of ∆t. Due to these differences, the training process differed between the two rules, thus facilitating a broader study of the capabilities and limitations of the proposed methods. Additionally, we considered a classical additive spike-timing-dependent plasticity (STDP) [30] model to study the impact of sparsity on the memristor-based network relative to simpler synapse models.
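Of the three rules, only classical additive STDP has a simple closed form; a sketch of its pair-based window is shown below. The parameter values are illustrative, not the paper's fitted ones, and the NC and PPX curves are empirical, weight-dependent fits that are not reproduced here.

```python
import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Classical additive STDP window (the third rule in the comparison).

    dt_ms = t_post - t_pre: potentiation for positive dt (pre before post),
    depression otherwise. Unlike NC/PPX, the update is weight-independent.
    """
    if dt_ms >= 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    return -a_minus * math.exp(dt_ms / tau_minus)
```

The exponential decay on both sides mirrors the localized NC window qualitatively, while the asymmetric, weight-dependent PPX behavior has no such compact expression.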

Spiking Classification Models
Within the framework of the frequency approach to encoding input data, we considered a hybrid architecture consisting of a three-layer Winner-Takes-All (WTA) network [21], which serves as a feature extraction module, in combination with a formal classifier.
The WTA network consists of three layers (see Figure 5). The input layer consists of spike generators that convert input vectors into spike sequences according to the algorithm described above; its size corresponds to the size of the input vector after preprocessing. The generated spike sequences are transmitted to a layer of leaky integrate-and-fire (LIF) neurons with an adaptive threshold (the excitatory layer). This layer is connected to the input via trainable all-to-all weights governed by one of the previously described plasticity rules. The number of neurons in the excitatory layer can be optimized depending on the complexity of the problem being solved. In turn, the excitatory layer is connected to a third layer of non-adaptive LIF neurons of the same size, called the inhibitory layer. Connections from the excitatory to the inhibitory layer are not trainable and have a fixed positive weight w_syn,exc > 0; each neuron in the excitatory layer is connected to a single partner neuron in the inhibitory layer. The connections directed from the inhibitory layer to the excitatory layer are called inhibitory connections. These connections are static and have a weight w_syn,inh < 0. Each neuron in the inhibitory layer is connected to all neurons in the excitatory layer except its partner. Finally, the generators in the input layer are also connected to inhibitory neurons by randomly distributed static links with a weight w_syn,gen > 0. In all our experiments, the number of such connections was equal to 10% of the number of connections between the input and excitatory layers. The spiking neural network was implemented using the NEST simulator [31]. As the formal classifier, we chose logistic regression (LGR) optimized for multi-class problems using the one-versus-all (OVR) scheme [32].
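The excitatory-layer dynamics can be illustrated with a minimal Euler-integrated LIF neuron with an adaptive threshold. This is a sketch with made-up parameter values; the actual networks were simulated in NEST with the settings listed in Table 1.

```python
def lif_adaptive_step(v, theta, i_syn, dt=0.1, tau_m=100.0, v_rest=0.0,
                      v_th0=1.0, tau_theta=1e4, d_theta=0.05):
    """One Euler step of a leaky integrate-and-fire neuron with an adaptive
    threshold, as used in the excitatory layer of the WTA network.

    Returns (v, theta, spiked). After a spike, v resets to rest and the
    threshold offset theta grows by d_theta, making the neuron harder to
    excite again -- the mechanism behind the WTA competition.
    """
    v = v + dt * ((v_rest - v) / tau_m + i_syn)   # leaky integration
    theta = theta + dt * (-theta / tau_theta)     # threshold decays back
    if v >= v_th0 + theta:                        # fire when over the
        return v_rest, theta + d_theta, True      # adapted threshold
    return v, theta, False
```

Driving such a neuron with a constant input makes it fire at a progressively lower rate as theta accumulates, which is the adaptation effect exploited by the network.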
In this work, we considered two methods for reducing the connectivity in the WTA network: an ensemble of several classifiers trained using the bagging technique and sparse connectivity between layers.

Classification Ensemble
The bagging method was chosen as the ensemble creation technique: several identical classifiers are trained on subsets of the input data, after which their predictions are aggregated by voting. This method has several advantages compared to using a single larger network; in particular, it reduces the total number of connections and increases classification speed due to parallelization. In addition, it breaks unwanted correlations in the training dataset, thus improving architecture stability.
Connectivity within an ensemble is controlled using the following parameters:
- n_estimators: the number of models within the ensemble.
- max_features: the proportion of input features passed to the input of each model in the ensemble.
In addition, the bagging architecture allows one to regulate the number of examples on which each network is trained using the max_samples parameter.
The ensemble was implemented using the BaggingClassifier method of the Scikit-Learn [25] library. In all experiments, based on preliminary empirical observations, the parameters max_features = 1.0 and max_samples = 0.7 were fixed.

Sparse Connectivity
Another way to reduce the connectivity of a spiking network is to set a rule that regulates the number of connections and their organization. To this end, we formally placed the neurons of the excitatory and inhibitory layers on two 1 mm × 1 mm square grids (the dimension was chosen for convenience of presentation), oriented mirror-image relative to each other. The neurons on the grids may be arranged irregularly. Sparse connections were initialized according to the following algorithm: (1) The presynaptic neuron is projected onto the plane of the postsynaptic neurons.
(2) The projection of the presynaptic neuron becomes the center of a circular neighborhood; all postsynaptic neurons within this neighborhood are connected to the presynaptic neuron with some probability.
This process is shown schematically in Figure 6. Thus, connectivity within the network is regulated using two parameters:
- The probability P of connection formation between pre- and postsynaptic neurons.
- The radius R of the circular neighborhood. This parameter is defined only for connections between the inhibitory and excitatory layers, since neurons in the input layer do not have a spatial structure.
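The two-step initialization above can be sketched as follows. This is an illustration of the rule in Figure 6, not the NEST implementation; the grid coordinates and the helper name are ours.

```python
import numpy as np

def sparse_connections(pre_pos, post_pos, radius, p, rng=None):
    """Probabilistic local connectivity between two mirrored 1 mm x 1 mm grids.

    Each presynaptic neuron is projected onto the postsynaptic plane (same
    (x, y) coordinates, since the grids are mirror images); every postsynaptic
    neuron within `radius` of the projection is connected with probability `p`.
    Positions are (n, 2) arrays in mm. Returns (pre, post) index pairs.
    """
    rng = rng or np.random.default_rng(0)
    pairs = []
    for i, center in enumerate(pre_pos):
        d = np.linalg.norm(post_pos - center, axis=1)   # distances on the plane
        for j in np.flatnonzero(d <= radius):
            if rng.random() < p:
                pairs.append((i, j))
    return pairs
```

Setting `radius` larger than the grid diagonal and `p = 1.0` recovers full connectivity, so the two parameters interpolate smoothly between the sparse and the base WTA topologies.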
Both methods, as well as their combination, are expressed in a generalized form in Algorithm 1, since the behavior of the network is determined by the parameters n_estimators, P_gen-exc, P_inh-exc, and R_inh-exc. In the case of a sparse WTA network, the ensemble consists of one network (n_estimators = 1); in the case of a bagging ensemble of fully connected WTA networks, randomized sparse connectivity is not applied (P_gen-exc = P_inh-exc = R_inh-exc = None). The combined approach, therefore, requires specifying all four parameters. The generalized decoding procedure using logistic regression is detailed in Algorithm 2.

Algorithm 1 SNN learning process
Input: training data matrix X_train with dimensions (M, N) of the preprocessed input vectors x_i for each sample in the dataset; neuron, plasticity, and synapse parameters.
Training parameters: epochs = 1.
Sparsity parameters: n_estimators, max_features, max_samples, n_neurons, P_gen-exc, P_inh-exc, R_inh-exc.
Network parameters: see Table 1.
Output: an ensemble of n_estimators spiking neural networks and the vectors v_i of neuron activity frequencies in the excitatory layer for each example of the training set.
Simulate all SNNs in the ensemble bound to x_i during t_e time steps using the spike sequence array x_seq_i.
15: Simulate the SNN without input signal during t_p time steps so that the membrane potentials relax to their initial values.

Algorithm 2 SNN decoding process
Input: a collection F containing the output frequencies of the excitatory layers of the SNNs in the ensemble for each sample in the dataset, where each element f_i contains K_i frequency vectors obtained from the K_i SNNs in the ensemble.
Output: a vector C of predicted class labels for each sample in the dataset.
Apply the logistic regression model to predict the class label c_ik for the sample v_i.
Apply majority voting to the obtained collection of class labels c_ik: the final predicted class is the most frequently occurring one.
6: Record the resulting class label in the vector C.
7: end for
8: Return the vector of predicted class labels C.
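The per-sample readout and voting steps of Algorithm 2 can be sketched as follows. Names are illustrative: `classifiers` stands for the per-network logistic-regression readouts, and `freq_vectors` for the excitatory-layer rate vectors of one sample.

```python
from collections import Counter

def ensemble_predict(freq_vectors, classifiers):
    """Aggregate per-network predictions by majority vote (Algorithm 2).

    freq_vectors[k] is the excitatory-layer rate vector produced by the k-th
    SNN for one sample; classifiers[k] is the logistic-regression readout
    fitted to that network's activity.
    """
    labels = [clf.predict([v])[0] for clf, v in zip(classifiers, freq_vectors)]
    return Counter(labels).most_common(1)[0][0]   # most frequent class wins
```

With a single network in the ensemble (the base and sparse WTA configurations), the vote degenerates to the single readout's prediction.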
The classification process is additionally visualized in Figure 7. Depending on the experimental settings, the number of models in the ensemble may be equal to one (base model).

Experiments and Results
Experiments on the Digits dataset were conducted using hold-out cross-validation, with 20% of the training examples used for testing. On FSDD, a fixed testing dataset was used. For all experiments, parameters such as the number of neurons in the networks, the number of receptive fields, and the number of networks in the ensemble were selected for each plasticity and each dataset by maximizing the training classification accuracy. The selection was performed automatically using the Tree-structured Parzen Estimator (TPE) algorithm implemented in HyperOpt [33], an open-source Python package. For all three methods under consideration, the parameters were fixed across plasticity models and datasets, thus ensuring a fair comparison. The source code of all the connectivity reduction methods is available in our repository (see the Funding information below).
In Table 1, we present the SNN hyperparameters used for each of the considered datasets and plasticity models:
- norm: the input normalization method, L2 or standard scaling (STD);
- n_fields: the number of Gaussian receptive fields (GRFs);
- n_neurons: the number of excitatory neurons in the network;
- n_estimators: the number of networks in the bagging ensemble (for the ensemble approach specifically; everywhere else it is equal to 1);
- τ_m,exc and τ_m,inh: the characteristic membrane potential decay times for the excitatory and inhibitory neurons, in milliseconds;
- frequency: the maximal spiking frequency of the Poisson generators;
- t_ref,exc and t_ref,inh: the refractory times for the excitatory and inhibitory neurons, in milliseconds;
- w_syn,exc and w_syn,inh: the synaptic weights of the excitatory-to-inhibitory and inhibitory-to-excitatory connections, respectively;
- P_gen-exc: the probability of forming a connection between an input and an excitatory neuron in the sparse WTA network;
- P_inh-exc: the probability of forming a connection between an inhibitory and an excitatory neuron in the sparse WTA network;
- R_inh-exc: the projection radius for inhibitory-to-excitatory connections in the sparse WTA network.
In all experiments, the time for presenting one example to the WTA network, t_e, was 350 ms, followed by a relaxation period t_p of 50 ms, resulting in a processing time of t_e + t_p = 400 ms per sample; learning took place over one epoch. As a baseline, we conducted an experiment using the classical WTA network (see Section 3.3).
Additionally, we present the number of connections within the network, broken down by the types of the pre- and postsynaptic neurons. Table 2 shows the connectivity within the base WTA network. The connections for the base network and the bagging ensemble were calculated as shown in Equation (4). Here, N_inp is the number of input features, N_exc denotes the number of excitatory neurons, p_Gen-to-Inh = 0.1 is the fraction of the inhibitory neurons connected to the generator layer, and N_ens is the number of estimators within the ensemble (N_ens = 1 for the base WTA network). For the sparse WTA network, the connections were counted directly from the weight checkpoints due to their stochastic nature.
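Since Equation (4) itself is not reproduced in this excerpt, the count can be sketched from the topology description in Section 3.3. The per-term breakdown below is our reconstruction and should be checked against the paper's equation.

```python
def wta_connections(n_inp, n_exc, n_ens=1, p_gen_inh=0.1):
    """Connection count for a fully connected WTA network or bagging ensemble.

    Terms follow the topology description: trainable all-to-all input->excitatory
    weights, one-to-one excitatory->inhibitory partner links, all-to-all-but-
    partner inhibitory->excitatory links, and generator->inhibitory links equal
    to 10% of the input->excitatory count.
    """
    gen_exc = n_inp * n_exc              # trainable, all-to-all
    exc_inh = n_exc                      # one-to-one partners
    inh_exc = n_exc * (n_exc - 1)        # all-to-all except the partner
    gen_inh = int(p_gen_inh * gen_exc)   # 10% of gen->exc connections
    return n_ens * (gen_exc + exc_inh + inh_exc + gen_inh)
```

For the sparse WTA network no closed form applies, which is why the paper counts connections directly from the weight checkpoints.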
After applying the considered connectivity reduction methods to the WTA network, the connectivity of the resulting models was obtained as presented in Table 3. The experimental results for the different reduction methods are presented in Table 4, expressed using the F1-score metric defined in Equation (5). The obtained accuracies are consistent with, and superior to, the results reported in our previous works on spiking neural networks with memristive plasticity and without sparse connectivity: 0.83–0.86 F1 on Digits in [23], where the best performance was obtained by a one-layer SNN with 1600 neurons and 2,660,800 connections (vs. the bagging WTA SNN model with 550 neurons and 221,000 connections), and 0.81–0.93 F1 on FSDD in [34], where the best scores were achieved by a WTA SNN with 400 neurons and 243,600 connections. For comparison, the proposed sparse bagging WTA SNN model containing 550 neurons and 151,217 connections achieved the same performance (F1: 0.93).
It also follows from Table 5 that the resulting SNN models with sparse connectivity and memristive plasticity demonstrated high accuracy compared to the other algorithms, including non-spiking ones.

Discussion
To evaluate the effectiveness of the connectivity reduction methods, we introduced the connectivity index κ, as defined in Equation (6). Here, N_sparse and N_base are the total numbers of connections in the sparse network and in the equivalent fully connected WTA network, respectively. Based on this definition, the efficiency of a connectivity reduction method can be assessed by calculating the ratio of the classification accuracy to the connectivity index (see Equation (7), where the efficiency is represented by the index η).
The values of the connectivity and efficiency indices for different datasets, plasticities, and network types are presented in Table 6. The motivation for the proposed indices lies in assessing the accuracy-per-connection ratio, which is a more robust comparison metric for the proposed methods than raw accuracy.
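The two indices can be written directly from their definitions, a reconstruction of Equations (6) and (7) from the surrounding text:

```python
def connectivity_index(n_sparse, n_base):
    """kappa = N_sparse / N_base (Equation (6)): fraction of connections kept."""
    return n_sparse / n_base

def efficiency_index(f1, n_sparse, n_base):
    """eta = F1 / kappa (Equation (7)): accuracy per unit of connectivity."""
    return f1 / connectivity_index(n_sparse, n_base)
```

A model that keeps half the connections while matching the baseline F1-score therefore scores an efficiency of twice its F1 value, which is how the combined method reaches the highest η in Table 6.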
From the table above, it follows that, in our experiments, the relative efficiency of spiking networks with sparse connectivity was slightly higher than that of ensembles of spiking networks: on average, across plasticities and datasets, the efficiency of the sparsely connected WTA network was 2.2, while that of bagging was 2.1. However, given the small scale of this difference, we conclude that both methods can be effectively used to reduce connectivity depending on the specifics of the problem and the hardware requirements. If the experimental setting allows only the reduction of static connections and supports ensembles, bagging is preferable, while sparse connectivity may be used where only a single larger network is feasible.
The combination of these two methods yielded the highest efficiency of 2.8, on average, with an average connectivity index equal to 0.32.Therefore, if combining both approaches is possible for a given problem, the resulting accuracy-per-connection efficiency will be the highest.

Conclusions
In this work, we compared two approaches to connectivity reduction in memristive spiking neural networks: the bagging ensemble technique and probabilistic sparse connectivity. Using a three-layer WTA network, we demonstrated that both methods achieved competitive performance on the handwritten and spoken digit classification tasks, with a combination of both approaches achieving the highest efficiency. On the Digits dataset, the bagging ensemble yielded F1-scores of 0.95, 0.94, and 0.94 for the STDP, NC, and PPX plasticity rules, respectively, while the sparse WTA network achieved 0.94, 0.93, and 0.94; the combined Bagging + Sparse model, in turn, yielded F1-scores of 0.87, 0.93, and 0.94. On FSDD, the F1-score values lay within the 0.89–0.93 range for the ensemble of WTA networks, within the 0.84–0.92 interval for the sparse WTA network, and within the 0.88–0.91 range for the combined model. The resulting models were superior in accuracy to well-known spiking neural network solutions and matched the level of non-spiking algorithms. Additionally, by studying the ratio between the proposed connectivity index and the F1-score, we showed that the bagging ensemble and the sparse WTA network achieved almost equal efficiency, while the combination of both methods yielded a 20% higher average efficiency coefficient.
Thus, the created combination of methods can be used as a computational technology for creating spiking neural network models for implementation on neurochips in inference mode. The developed spiking neural network architectures can also be used for the subsequent implementation of online learning on neuromorphic chips with memristive connections. In our future research, we plan to expand the scope of the image and audio classification problems that can be solved using the proposed methods (e.g., CIFAR-10, the Google Speech Commands dataset, etc.), as well as work on hardware implementations of the designed networks.

Figure 1 .
Figure 1. Averaged samples for each class of handwritten digits from the Digits and MNIST datasets.

Figure 2 .
Figure 2. An example of Gaussian receptive fields with the number of fields equal to 5. The input feature x_i is intersected with overlapping Gaussian fields to produce a vectorized feature representation: G_j(x_i), j ∈ [0, 5) ∩ N.

Figure 3 .
Figure 3. Principal component analysis for the Digits and FSDD datasets.

Figure 4 .
Figure 4. NC and PPX memristive plasticity curves for different values of the initial weight w_0.

Figure 5 .
Figure 5. WTA spiking neural network topology. Poisson generators, adaptive excitatory LIF neurons, and inhibitory LIF neurons are shown in yellow, green, and red, respectively. Trainable synapses are depicted in blue, excitatory-to-inhibitory connections are shown in green, and inhibitory-to-excitatory connections are denoted in red. Finally, generator-to-inhibitory connections are expressed using a dashed green arrow.

Figure 6 .
Figure 6. Sparse connectivity: neuron projection. The projection neighborhood is shown in red; all postsynaptic neurons inside it are connected to the projected presynaptic neuron.

Figure 7 .
Figure 7. The testing pipeline for the proposed approach. It is depicted for both audio and image data: the former requires MFCC preprocessing, while the latter does not. After preprocessing, the resulting feature vectors are normalized and analyzed by each model in the ensemble. The predictions of multiple estimators are aggregated via voting, producing the final class label.

1: Generate weights.
2: if P_gen-exc or (P_inh-exc and R_inh-exc) then perform neural network initialization: neurons, synapses, and initial weights.
8: Bagging: randomly attribute at most max_samples · M samples with max_features · N features to each of the networks in the ensemble.
9: for k in epochs do
10: for each x_i in X_train do
11: for each k_ij in x_i do generate a spike sequence seq_ij with length t_e and frequency f_ij.

Table 1 .
Experimental settings for the WTA network.

Table 2 .
Connectivity within the base WTA network.

Table 3 .
Connectivity within the sparse WTA network.

Table 4 .
Experimental results for the WTA network. The maximum F1-scores for each experiment are highlighted in bold.

Table 5 .
Comparison of results for both classification tasks. * indicates the accuracy metric. The maximum performance scores for our experiments and literature sources are highlighted in bold.

Table 6 .
Efficiency estimation of the different sparsity types.The best η values are marked in bold.