Extraction of Significant Features by Fixed-Weight Layer of Processing Elements for the Development of an Efficient Spiking Neural Network Classifier

Abstract: In this paper, we demonstrate that fixed-weight layers generated from a random distribution or from logistic functions can effectively extract significant features from input data, resulting in high accuracy on a variety of tasks, including the Fisher's Iris, Wisconsin Breast Cancer, and MNIST datasets. We have observed that logistic functions yield high accuracy with less dispersion in results. We have also assessed the accuracy of our approach while minimizing the number of spikes generated in the network, which is practically useful for reducing energy consumption in spiking neural networks. Our findings reveal that the proposed method demonstrates the highest accuracy on the Fisher's Iris and MNIST datasets with decoding using logistic regression. Furthermore, it surpasses the accuracy of the conventional (non-spiking) approach using only logistic regression in the case of Wisconsin Breast Cancer. We have also investigated the impact of non-stochastic spike generation on accuracy.


Introduction
Spiking neural networks (SNNs), considered to be the third generation of artificial neural networks, combine inspiration from biological neural networks with the capability to address traditional machine-learning tasks. In SNNs, neurons are dynamic systems that communicate through current impulses.
Over the past two decades, alongside the rapid development of spiking neural networks, neuromorphic processors have emerged, such as TrueNorth [1], Loihi [2], Tianjic [3], and others [4]. These processors realize models that imitate functionality existing in the nervous system and perform machine learning, including realizations of reservoir-based spiking neural network topologies rooted in short-term effects [5] and long-term effects [6]. Their primary advantages include low energy consumption, as demonstrated for a ZnO-based memristor [7]; high parallelism, as demonstrated for image skeletonizing [8]; signal processing efficiency, as shown in [9] for a device with a very low switching current level and self-rectifying characteristics that can be utilized for reservoir computing; and real-time adaptability, as discussed in [10]. Neuromorphic processors can provide efficient solutions for a range of tasks: sensory information processing, as demonstrated for a novel artificial retinal neuron with ultra-low power in [11]; pattern analysis, as for the Fashion MNIST dataset [12]; speech recognition, e.g., for the Texas Instruments digit sequences dataset in [13] and for the Heidelberg Digits and Speech Commands datasets [14]; and other activities related to perception and interaction with the environment, as for reinforcement learning in [15]. In some cases, this requires methods that use a limited number of trainable synaptic weights, which necessitates the development of spiking neural network topologies with limited-plasticity synapses [16]. This concern is especially pertinent when training networks on a chip. Lowering the number of trainable weights offers advantages such as reduced overfitting [17] and a simpler fitting algorithm, thus facilitating implementation on neuromorphic processors.
Our approach involves utilizing spiking networks with fixed synaptic weights that remain unchanged during training. This reduces the number of tunable network parameters, simplifying the potential hardware implementation of training on neuromorphic computing devices. This approach is inspired by reservoir neural networks [18]. However, instead of a reservoir with recurrent connections, we use a fully connected layer with weights fixed based on logistic functions or a uniform random distribution. This approach has been previously demonstrated for conventional neural networks [19] and subsequently extended to spiking networks [20]. In our work, we explore the capabilities of layers with fixed weights in solving benchmark tasks of real-valued vector and image classification. With this aim, the current paper presents the following contributions:

• We demonstrate the ability of the proposed layer to perform effective reduction in the dimension of the input data vectors without loss of classification performance;
• We explore the tradeoff between the number of spikes needed for encoding the input information and classification performance to find input encoding parameters that minimize the number of spikes while maintaining competitive classification performance;
• We compare different methods for initializing the weights of the proposed layer.

As a result, we find that:
1. Layers with random or logistic function-generated weights can efficiently extract meaningful features from input data;
2. Logistic functions enable achieving high accuracy with less result dispersion.

We conduct numerical experiments on a set of commonly used benchmark datasets described in Section 2: Fisher's Iris, Wisconsin Breast Cancer, and MNIST. The spiking network used is described in Section 3. Section 4 provides descriptions of the experiments and their results. Finally, an analysis of the results is presented in Section 5.

Datasets
Our methodology is applied to several tasks, each of which is a machine-learning benchmark. The datasets contain information from different domains: vectors describing the characteristics of flowers in the Fisher's Iris dataset, vectors of medical parameters in the Wisconsin Breast Cancer dataset, and pixel matrices portraying images of handwritten digits in MNIST. The sizes and distributions of the data vary significantly. This diversity allows a comprehensive assessment of the adaptability and robustness of the proposed fixed-weight layer across different tasks.

1. The MNIST dataset contains 60,000 training and 10,000 testing grayscale images of size 28 × 28 pixels, representing handwritten digits from 0 to 9. The brightness of each pixel ranges from 0 to 255, where 0 corresponds to an absolutely black pixel and 255 to an absolutely white pixel. This dataset has become a benchmark for evaluating the performance of various classification algorithms. Examples from the dataset are depicted in Figure 1.
2. The Fisher's Iris dataset contains 150 samples of iris flowers, with 50 samples for each of the three species. Each sample consists of four numeric features describing the length and width of the sepals and the length and width of the petals. The data visualization is presented in Figure 2A, illustrating the non-linearity of the task using only two features.
3. The Breast Cancer Wisconsin (Diagnostic) dataset consists of 569 samples containing information on cell characteristics from breast biopsy samples and their corresponding diagnosis: malignant or benign tumor, with 212 and 357 samples, respectively. The features are numeric and describe the morphological and structural characteristics of the cells, such as nucleus size, radius, area, and others. The data visualization, shown in Figure 2B, employs only two features, as in the case of Fisher's Iris.

Spike Generators
The transition from numeric input (green squares in Figure 3) to a spiking representation of the input is based on frequency encoding, where a higher input value (e.g., for digit images, higher brightness) means a higher number of spikes emitted by the corresponding generator in the spiking neural network. The advantage of frequency encoding lies in its naturalness and biological analogy, as neurons in the brain, for instance, in sensory cortical areas, often respond to signals by changing the spike generation frequency.
This numeric-to-spiking transition is performed by a layer of N spike generators (orange diamonds in Figure 3), where N is the input dimension. The output of each generator is a sequence of spikes, and the times of these spikes are aligned to numerical simulation timesteps of 0.1 ms. Poisson generators emit spikes at random times so that the total number of spikes in the sequence that encodes the corresponding input vector component x_i obeys a Poisson distribution with the mean x_i · Rate. Poisson generators, further referred to as stochastic generators, are used in all experiments except Section 4.6, where they are compared to non-stochastic generators that emit spikes with equal interspike intervals, keeping the total number of spikes equal to x_i · Rate. Here, Rate is an adjustable coefficient that is examined during the experiments.
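The two generator types described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names and the `window_ms` argument are hypothetical, and the stochastic generator draws an independent spike/no-spike decision at every 0.1 ms step, which gives a binomial spike count with mean x_i · Rate (close to Poisson when the per-step probability is small) rather than an exact Poisson count.

```python
import random

DT_MS = 0.1  # simulation timestep stated in the text


def regular_spike_times(x, rate, window_ms):
    """Non-stochastic generator: round(x * rate) spikes with equal intervals."""
    n_spikes = round(x * rate)
    if n_spikes == 0:
        return []
    interval = window_ms / n_spikes
    # Snap every spike time to the 0.1 ms simulation grid.
    return [round((k + 1) * interval / DT_MS) * DT_MS for k in range(n_spikes)]


def poisson_like_spike_times(x, rate, window_ms, rng):
    """Stochastic generator (assumed per-step Bernoulli approximation).

    Each grid step carries a spike with probability x * rate / n_steps,
    so the expected spike count over the window is x * rate.
    """
    n_steps = round(window_ms / DT_MS)
    p = min(1.0, x * rate / n_steps)
    return [(s + 1) * DT_MS for s in range(n_steps) if rng.random() < p]
```

With the same x_i and Rate, the regular generator always returns the same number of spikes, whereas the stochastic one scatters its count around that value from one presentation to the next.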

Processing Elements
We consider two types of processing elements (blue circles in Figure 3). The first one is the Leaky Integrate-and-Fire spiking neuron [21] with exponential post-synaptic currents. The dynamics of this neuron model are described by the following system:

$$\frac{dV_j}{dt} = -\frac{V_j - V_{rest}}{\tau} + \frac{I_{syn}(t)}{C}, \qquad I_{syn}(t) = \sum_i w_{ij} I_{ij}(t), \qquad I_{ij}(t) = \sum_{t^{ij}_{spike}} \left[ e^{-t/\tau_{syn}} \, \Theta(t) \right] * \delta\left(t - t^{ij}_{spike}\right),$$

where V_j is the membrane potential of the j-th neuron, V_rest = −70 mV is the resting potential, τ = 10 ms is the membrane time constant, I_syn is the total synaptic current, C = 250 pF is the membrane capacitance, w_ij is the synaptic weight, I_ij is the individual synaptic current, t_spike is the spike timing, τ_syn = 2 ms is the time constant of the synaptic current, Θ is the Heaviside step function, and δ is the Dirac delta function. The asterisk (*) denotes convolution, and the sums run over the input synapses i of neuron j and then over all times t^ij_spike of input spikes arriving at the i-th input of neuron j. The values of the parameters are set according to our previous study [20].
When the neuronal membrane potential reaches the threshold value V_th, the neuron fires a spike, and the membrane potential resets to V_rest and remains unchanged during the refractory period t_ref = 2 ms. V_th was adjusted during the experiments.
The spiking neuron model implementation was adopted from the NEST Python library, specifically referred to as iaf_psc_exp within that context.
After presenting an input vector to the spiking layer, the output of the spiking neuron is the number of spikes it emits.
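To make the role of the threshold concrete, here is a minimal sketch of a leaky integrate-and-fire neuron with exponential post-synaptic currents, using the parameter values quoted above. It is not the NEST implementation: it assumes the standard form of the model, integrates with a simple forward-Euler step (NEST integrates this model differently), and the function name and input format are hypothetical.

```python
import math

DT = 0.1          # ms, simulation timestep
V_REST = -70.0    # mV, resting and reset potential
TAU_M = 10.0      # ms, membrane time constant
C_M = 250.0       # pF, membrane capacitance
TAU_SYN = 2.0     # ms, synaptic current time constant
T_REF = 2.0       # ms, refractory period


def lif_spike_count(input_spike_steps, weights, v_th, n_steps):
    """Count the output spikes of one neuron over n_steps timesteps.

    input_spike_steps[i] is a set of timestep indices at which input i spikes;
    weights[i] is the jump in synaptic current (pA) caused by each such spike.
    """
    v = V_REST
    i_syn = 0.0
    refractory_left = 0
    n_out = 0
    decay = math.exp(-DT / TAU_SYN)
    for step in range(n_steps):
        # Exponential post-synaptic current: decay, then add arriving spikes.
        i_syn *= decay
        for w, steps in zip(weights, input_spike_steps):
            if step in steps:
                i_syn += w
        if refractory_left > 0:
            refractory_left -= 1  # potential is clamped at V_REST
            continue
        # Forward-Euler step of the membrane equation (pA / pF = mV / ms).
        v += DT * (-(v - V_REST) / TAU_M + i_syn / C_M)
        if v >= v_th:
            n_out += 1
            v = V_REST
            refractory_left = round(T_REF / DT)
    return n_out
```

Calling the function with the same input and progressively higher `v_th` values yields fewer output spikes, which is the effect examined in the threshold experiments.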
We also use another type of processing element, further referred to as an adder. It receives as input a vector of the numbers of spikes emitted by the spike generators (spike_count_i), multiplies it by the vector of weights (w_ij) corresponding to this element j, and outputs the result:

$$\mathrm{output}_j = \sum_i w_{ij} \cdot \mathrm{spike\_count}_i .$$

Effectively, the output of an adder is the upper limit of the output of a spiking neuron with an infinitely small threshold and simulation timestep. On the plots, the adder is therefore symbolically placed as if it had V_th = V_rest.
In the case of adders, the output data from processing elements is numeric and directly fed into logistic regression.
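The adder is simply a weighted sum of input spike counts. A minimal sketch follows; the function name and the nested-list layout `weights[i][j]` (input i to adder j) are illustrative choices, not taken from the paper's code.

```python
def adder_outputs(spike_counts, weights):
    """Return one weighted sum per adder.

    spike_counts[i] is the number of spikes emitted by generator i;
    weights[i][j] is the fixed weight from generator i to adder j.
    """
    n_adders = len(weights[0])
    return [
        sum(spike_counts[i] * weights[i][j] for i in range(len(spike_counts)))
        for j in range(n_adders)
    ]
```

For example, three generators feeding two adders produce a two-component feature vector, which is what the classifier receives.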

Weight Initialization
Below, we use and compare two methods of weight initialization:
1. Random values: the weights are generated from a uniform distribution within the range of −1 to 1;
2. Logistic functions: the weights are determined by the values of logistic functions, following the general form used in the original study [19], where N is the number of inputs and the parameters are r = 1.885, A = 0.3, and B = 5.9. These values are kept as in [19], since tuning them for the proposed spiking model did not have a significant impact on accuracy.
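Both initializations can be sketched as follows. The uniform case follows the text directly. For the logistic case, the exact expression is not reproduced here, so the sketch assumes a logistic-map scheme in which each input i is seeded with A·sin(i/N · π/B) and subsequent weights follow x ← 1 − r·x²; this form, the function names, and the `weights[i][j]` layout are assumptions for illustration and should be checked against [19].

```python
import math
import random


def random_weights(n_inputs, n_elements, rng):
    """Uniform weights in [-1, 1]; weights[i][j] links input i to element j."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_elements)]
            for _ in range(n_inputs)]


def logistic_weights(n_inputs, n_elements, r=1.885, a=0.3, b=5.9):
    """Deterministic weights from a logistic map (assumed form, see lead-in)."""
    weights = []
    for i in range(1, n_inputs + 1):
        x = a * math.sin(i / n_inputs * math.pi / b)  # assumed seed value
        row = [x]
        for _ in range(n_elements - 1):
            x = 1.0 - r * x * x                       # assumed map iteration
            row.append(x)
        weights.append(row)
    return weights
```

Under this assumed form the logistic weights are fully determined by N, r, A, and B, so repeated runs produce identical layers, whereas the random initialization changes with every seed.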

Decoding
During a fixed time window, whose length is determined through experiments, the generators emit spikes at rates from 0 up to the maximum rate, depending on the feature value. These spikes reach adders or spiking neurons via the weights, triggering the neurons to emit spikes themselves. The numbers of spikes fired by the neurons or the outputs of adders are recorded. At the end of the time window, these values are passed to a logistic regression (LR) classifier [22], which outputs a class label based on the received numeric vector. The one-vs-rest scheme was used for the training stage of the LR classifier, and the rest of the parameters were left at their defaults in the scikit-learn Python library. LR serves as an intermediate step in our research. In future investigations, we intend to replace it with a layer of spiking neurons with trainable weights. This transition aims to improve the model's generalization capabilities and achieve a more efficient solution to the task at hand.
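The decoding step can be illustrated with a small one-vs-rest logistic regression written in plain Python. This is a stand-in, not the classifier used in the experiments: the paper relies on scikit-learn's implementation with default parameters, while this sketch swaps in unregularized batch gradient descent, and the function names, learning rate, and epoch count are hypothetical.

```python
import math


def _sigmoid(z):
    # Clamp the score to avoid overflow in exp for extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_one_vs_rest(features, labels, n_classes, lr=0.1, epochs=500):
    """Train one binary logistic model per class by batch gradient descent."""
    n_features = len(features[0])
    models = []
    for cls in range(n_classes):
        w = [0.0] * n_features
        b = 0.0
        targets = [1.0 if y == cls else 0.0 for y in labels]
        for _ in range(epochs):
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, t in zip(features, targets):
                err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
                for k in range(n_features):
                    grad_w[k] += err * x[k]
                grad_b += err
            w = [wi - lr * g / len(features) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(features)
        models.append((w, b))
    return models


def predict(models, x):
    """Return the class whose binary model gives the highest score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in models]
    return scores.index(max(scores))
```

Here `features` would hold the recorded spike counts or adder outputs, one vector per sample, ideally rescaled to a comparable range before training so that the fixed learning rate remains stable.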

Agenda of Experiments
In our experiments, we aimed to achieve optimal accuracies on the aforementioned benchmarks while using a limited number of spikes.This problem formulation is of practical interest when developing energy-efficient biomorphic systems.
For the benchmarks, we conducted sequences of experiments to clarify:
1. The criteria for selecting the feed time window;
2. The dependence of accuracy on the number of processing elements;
3. The dependence of accuracy on the maximal number of spikes, given the most effective number of processing elements found in the experiments of item 2;
4. The dependence of accuracy on the number of output spikes for a given number of neurons with thresholds;
5. The influence of a stochastic input signal on the accuracy.
The experiments were conducted using both weight initialization methods: logistic functions and random values. The dependence of accuracy on the time window, the number of processing elements, and the number of input spikes has been obtained for adders as processing elements, while the influence of spiking neuron dynamics is assessed separately in Section 4.5.
Due to the extensive time required for experiments with MNIST data, they were conducted on a reduced training set. Its size was determined through additional experiments, the results of which are presented in Figure 4. All subsequent experiments were conducted on the reduced MNIST dataset.
To mitigate the influence of randomness caused by the stochasticity of Poisson spike generators and the weight initialization method, we repeated the accuracy calculation several times for each point in the experiments (each repetition is further called an attempt). Additionally, we conducted a 5-fold cross-validation on the Fisher's Iris and Wisconsin Breast Cancer data. The box-and-whisker plots below were obtained after five attempts for the MNIST dataset and ten attempts for the Fisher's Iris and Wisconsin Breast Cancer datasets. The box spans from the first quartile (Q1) to the third quartile (Q3) of the data, featuring an orange line at the mean value. The whiskers reach from the box to the furthest data point within 1.5 times the interquartile range from the box. Outliers that lie beyond the ends of the whiskers are denoted with circles.
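The box-and-whisker construction described above can be reproduced numerically. The sketch below uses only the Python standard library; the function name is hypothetical, and the "inclusive" quartile convention is an assumption, since the text does not say which interpolation rule the plots use.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),    # furthest point within the lower fence
        "whisker_high": max(inside),   # furthest point within the upper fence
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

Applied to the accuracies of the attempts at one experimental point, a single unusually poor attempt appears in `outliers` rather than stretching the whisker.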
Furthermore, in Table 1, the minimum and maximum accuracies obtained across all attempts are presented.
The dotted lines shown on the plots indicate the saturation level of accuracy determined by the mean values of the boxplots.The parameters with which the saturation level of accuracy was achieved were chosen for subsequent experiments.
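One way to turn the saturation criterion into a rule is to pick the smallest parameter value whose mean accuracy lies within a small tolerance of the best mean. The text does not state a numeric criterion, so the tolerance and the function name below are purely hypothetical.

```python
def saturation_point(param_values, mean_accuracies, tolerance=0.01):
    """Smallest parameter whose mean accuracy is within `tolerance` of the best.

    `tolerance` is an assumed cut-off, not a value taken from the paper.
    """
    best = max(mean_accuracies)
    return min(p for p, acc in zip(param_values, mean_accuracies)
               if acc >= best - tolerance)
```

A tolerance of zero selects the parameter with the best mean accuracy itself, while a larger tolerance favors smaller networks or fewer spikes at a slight cost in accuracy.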

Analyzing the Time Window Size for Each Dataset
In the experiments conducted in this study, the maximum spike generation frequency of the generators is capped at 1000 Hz. This limitation is imposed to prevent generators from emitting more than one spike per discretization step, which is physiologically implausible. The purpose of this experiment is to determine the minimum time window required to achieve optimal accuracy under these constraints.
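The frequency cap directly bounds how many spikes a single generator can contribute within a window. The following sketch spells out that arithmetic; the function names are hypothetical, and the defaults simply restate the 1000 Hz cap and the 0.1 ms timestep from the text.

```python
def max_spikes_per_input(window_ms, max_rate_hz=1000.0):
    """Largest number of spikes one generator can emit within the window."""
    return int(round(window_ms * max_rate_hz / 1000.0))


def timesteps_in_window(window_ms, dt_ms=0.1):
    """Number of simulation timesteps covered by the window."""
    return int(round(window_ms / dt_ms))
```

For a 200 ms window, this gives at most 200 spikes per input spread over 2000 timesteps, so shortening the window lowers the ceiling on the spike count available for encoding each feature.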
Figure 5 demonstrates that, for the Fisher's Iris dataset, the saturation of accuracy is achieved at a time window size of 5200 ms. Therefore, this time window size is used in subsequent experiments.
In Figure 6, it is shown that the optimal accuracy with the Wisconsin Breast Cancer dataset is achieved with a 200 ms time window. Therefore, this window size is used in subsequent experiments.

Searching the Optimal Number of Processing Elements
The objective of this experiment was to determine the minimum number of processing elements required to reach saturation accuracy; this number directly influences the number of spikes propagating through the network.
For the Fisher's Iris dataset, accuracy saturates at 3 processing elements for both weight initializations (Figure 8). For the Wisconsin Breast Cancer dataset, we chose 24 processing elements for the subsequent experiments, as the accuracy plateaus for all initializations at this number of elements (Figure 9).
Figure 10 indicates that maximum accuracies are achieved using 250 processing elements with logistic function initialization and 200 elements with random number initialization. These values have been selected for further experiments.

Searching the Optimal Number of Generated Spikes with a Given Number of PE from the Previous Experiments
During this experiment, generator frequencies were gradually increased to establish the minimum number of spikes required to achieve saturation accuracy.
Accuracy reaches saturation with the number of generated spikes set at 52 for logistic weights and 312 for random weights, as outlined in Figure 11. For subsequent experiments, we chose a value of 312 for logistic weights as well, because the variance in accuracy at this value is relatively small, while the average accuracy is comparable to that obtained with 52 spikes. It is also worth noting that when weights are initialized with random numbers, the variances in accuracy are higher compared to weight initialization using logistic functions.
Accuracy saturation is achieved with the number of generated spikes set at 8 for logistic weights and 24 for random weights, as indicated by the data presented in Figure 12. These respective values were selected for subsequent experiments.
Saturation is reached when the number of generated spikes is set to 160 for both weight initializations, as per Figure 13.
In the threshold experiment (Section 4.5), spiking neurons were employed in place of adders to examine their impact on the accuracy of the models. On the graphs, adders are denoted with a conditional threshold of −70.0 mV.
It is important to note that as the threshold potential value increases, the number of emitted spikes decreases.Therefore, the threshold value at which the firing rate saturates is sought from right to left rather than from left to right, as in previous experiments.
Figure 14 clearly demonstrates that processing elements without a threshold yield the highest level of accuracy. Similar to the previous experiment, we observe a relatively significant variance when weights are initialized with random numbers, in contrast to initializing them using logistic functions.
In Figure 15, it is evident that when weights are initialized with logistic functions, accuracy saturates at a threshold of −69.94 mV, whereas with random weights, the plateau accuracy is observed at a threshold of −69.8 mV, taking into account the variance. A higher threshold level indicates a lower number of spikes emitted by neurons.

The Influence of Stochastic Input Signal on the Accuracy
During the experiment, we compared stochastic spike generation, where spikes are emitted according to a Poisson distribution at each discretization step of 0.1 ms, with non-stochastic spike generation, where spikes are evenly spaced within the time window and their number is determined by the corresponding input.
As before, accuracies were calculated several times, and the data were visualized using box plots. However, to eliminate the potential impact of weight initialization on the model's accuracy, the weights were initialized once for all the attempts for both types of generators.
The use of stochastic generators results in outliers in accuracy, while the averages are comparable for both types of generators and both weight initialization methods, as depicted in Figure 17. When using logistic functions, the accuracies of stochastic and non-stochastic generators are similar, in line with Figure 18. However, in the case of random weight initialization, stochastic generators exhibit a considerable variance in accuracy compared to their non-stochastic counterparts.

In Figure 19, accuracies on the MNIST data are consistent for both initialization methods and generator types, with differences of no more than 1%. Non-stochastic generators completely remove the deviation in accuracy.

Efficiency of the Proposed Approach and Comparison with Other Existing Methods
In Table 1, we present the minimum spike counts required to achieve an average accuracy of 92% (referred to as Minimum Spike counts) and the spike counts needed to attain the highest achievable accuracy (referred to as Desired Spike counts) for each dataset and weight initialization method. The minimum and maximum F1-macro values across all attempts correspond to the Desired Spike counts, which can be found in Figure 14 for the Fisher's Iris dataset, in Figure 15 for the Wisconsin Breast Cancer dataset, and in Figure 16 for the MNIST dataset. The table includes LR and other existing spike-based approaches for comparative analysis. NC and PPX plasticities refer to the plasticity that models the conductance change in nanocomposite memristors [28] and the plasticity of the highly plastic poly-p-xylylene memristor [29], respectively. The table also includes other existing methods for the MNIST dataset for comparison, whose accuracies were obtained using the complete training dataset.
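For readers unfamiliar with the F1-macro metric reported in the table, a minimal standard-library sketch is given below. It follows the usual definition (unweighted mean of per-class F1 scores); the function name is illustrative, and classes with no true or predicted samples are assigned a score of 0 here by assumption.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally, the metric penalizes errors on the smaller class of an imbalanced dataset, such as the malignant cases in Wisconsin Breast Cancer, as heavily as errors on the larger one.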
Table 2 presents the model parameters that result in the highest accuracies.

Discussion
To keep the energy consumption of a prospective hardware implementation low, our experiments were aimed at minimizing the number of processing elements in the network and the total number of spikes emitted. We pursued this aim by selecting the temporal window for presenting examples, adjusting the number of processing elements and spike generator frequencies, and choosing the threshold value for neuron potentials. However, input generators emitting too few spikes per input sample would lead to loss of information at the input encoding stage, while neurons emitting too few output spikes would lead to information loss at the decoding stage. Accordingly, a lower neuron threshold makes samples of different classes more distinguishable by their output spike counts. Therefore, a processing element without a threshold, the adder, modeling an extreme case of a neuron with an infinitely low threshold, allows one to obtain the lower bound on the number of input spikes sufficient for acceptable classification performance.
As a result, for all the three datasets used, we find that competitive accuracy can be achieved with few elements in the layer (furthermore, for the Fisher's Iris dataset, just three neurons are sufficient, one neuron per class) and a few hundred input spikes per input vector.
Competitive accuracy was also achieved by the proposed approach on several other classification tasks. Our prior study [30] involving Free Spoken Digits data exhibited an accuracy of approximately 94%, while the Optical Recognition of Handwritten Digits data from our previous research [20] achieved an accuracy of around 92%. This shows the efficiency of the proposed approach for transforming input features into spike sequences.
Drawing weights from a random distribution rather than setting them on the basis of logistic functions introduces more variability in accuracy, particularly in cases with limited datasets, such as Fisher's Iris and Wisconsin Breast Cancer. Larger datasets, such as MNIST, yield more stable training results due to their larger number of examples.
The comparison of stochastic and non-stochastic spike generators did not reveal significant differences in the average accuracy between the two. However, the stochastic nature of the generators affects the accuracy variance, which increases for smaller datasets.

Conclusions
On a set of benchmarks (Fisher's Iris, Wisconsin Breast Cancer, and MNIST), we have empirically demonstrated the efficiency of the proposed non-trainable layer in transforming numeric input values into a lower-dimensional space of counts of spikes emitted during a specified time window, allowing the subsequent decoding of the classes from the spike counts. We also show that the number of elements in the proposed layer, as well as the number of spikes needed for encoding an input sample, can be kept minimal without much accuracy loss. The highest classification accuracies achieved using this approach (see Table 1) are on par with (in the case of Fisher's Iris and MNIST) or exceed (in the case of Wisconsin Breast Cancer) the accuracy that LR alone achieves on the respective tasks, thus demonstrating the efficiency of the proposed layer as a feature extraction step. Synaptic weights of the non-trainable layer can be set at random or be based on logistic functions; logistic functions reduce the resulting variability.
The results of this study are, to our knowledge, the first proof of concept of data preprocessing using a layer of spiking neurons with fixed weights, which has only been proposed earlier for conventional neural networks [19].
The obtained results lay a foundation for creating both efficient and economical spiking neural network topologies to be deployed on prospective biomorphic devices.

Figure 3 illustrates the proposed algorithm's general scheme: the input is encoded from numeric representation into spike sequences, which are fed into the spiking layer; the output spike sequences emitted by the layer are then decoded back into numeric representation and processed by the classifier.

Figure 4. The impact of the volume of the used MNIST training data on model accuracy.

Figure 5. The dependence of Fisher's Iris classification accuracy (solid blue line) on the time window length.

Figure 7. The dependence of MNIST classification accuracy (solid blue line) on the time window length.

Figure 8. The dependence of Fisher's Iris classification accuracy on the number of processing elements (PE) in the layer: left subplot, for the layer with weights based on logistic functions; right subplot, for random weights.

Figure 9. The dependence of Breast Cancer Wisconsin (Diagnostic) classification accuracy on the number of processing elements in the layers with logistic-function-based weights and with random weights.

4.5. Searching the Optimal Threshold Reflecting the Number of Output Spikes, with the Number of Processing Elements (Replaced by Neurons) and the Number of Generated Spikes Taken from the Previous Experiments

Figure 11. The dependence between accuracy and the number of spikes on the Fisher's Iris data.

Figure 12. The dependence between accuracy and the number of spikes generated by the generators on the Breast Cancer Wisconsin (Diagnostic) data.

Figure 13. The dependence between accuracy and the number of spikes generated by the generators on the MNIST data.

Figure 14. The dependence between accuracy and the threshold potential of spiking neurons on the Fisher's Iris data.

Figure 15. The dependence between accuracy and the threshold potential of spiking neurons on the Breast Cancer Wisconsin (Diagnostic) data.

Figure 17. The dependence between accuracy and spike generation method on the Fisher's Iris data.

Figure 19. The dependence between accuracy and spike generation method on the MNIST data.

Table 1. Comparison of accuracies with both weight initialization methods.

Table 2. Parameter values of the models that achieved the best performance on each dataset and weight initialization.