Constrain Bias Addition to Train Low-Latency Spiking Neural Networks

In recent years, the third generation of neural networks, namely spiking neural networks, has received a great deal of attention in the broad areas of machine learning and artificial intelligence. In this paper, a novel differential-based encoding method is proposed, and new spike-based learning rules for backpropagation are derived by constraining the addition of bias voltage in spiking neurons. The proposed differential encoding method can effectively exploit the correlation between data points and improve the performance of the proposed model, and the new learning rule takes full advantage of the modulating effect of the bias on the spike firing threshold. We evaluate the proposed model on the environmental sound dataset RWCP and the image datasets MNIST and Fashion-MNIST under various conditions to test its learning ability and robustness. The experimental results demonstrate that the proposed model achieves near-optimal results with a smaller time step, maintaining the highest accuracy and robustness with less training data. In particular, on the MNIST dataset, compared with an original spiking neural network with the same network structure, we achieve a 0.39% accuracy improvement.


Introduction
Due to their inherent energy efficiency and biological plausibility, spiking neural networks (SNNs) have made significant advances lately. Unlike conventional artificial neural networks (ANNs), SNNs attempt to mimic biological neural networks more closely through discrete events. SNNs encode information into sequences of spikes and are useful tools for complex spatio-temporal information processing compared to traditional ANNs, which use floating-point numbers for their operations ([1]). Additionally, spike generation is an asynchronous process and spike operations are binary, which makes SNNs highly amenable to hardware circuits ([2]). Many research institutions have developed neural computing devices based on SNNs, such as SpiNNaker ([3]), TrueNorth ([4]), and Loihi ([5]), among others. The emergence of these platforms has further expanded the potential applications and research prospects of SNNs.
SNNs and ANNs differ primarily in two respects: (1) as shown in Figure 1, when data enters the network, an SNN must first encode it into a spike train; (2) the spike-firing process of spiking neurons is non-differentiable, which complicates gradient-based training. We evaluate the proposed methods on classification tasks, and the results show that our models can achieve competitive results with fewer spikes.
The rest of this article is organized as follows: Section 2 introduces the Sigma-Delta encoding method and explores the proposed bias addition rule and the new gradient approximation, thereby deriving the backpropagation formula. Section 3 verifies the effectiveness of the proposed encoding method and learning rules through experiments on different tasks and compares them with other models. In Section 4, we summarize and discuss the effectiveness of the outcomes of the proposed work.
Figure 1. Information transfer and structure diagram of deep SNNs. If the input is floating point, it is usually encoded as a spike train and injected into the network at different time steps. The spiking neurons accumulate spikes from presynaptic neurons and fire when the membrane potential exceeds a threshold; otherwise, the output is zero.

Spiking Neuron Model
As the most commonly used spiking neuron, the Leaky Integrate-and-Fire (LIF) neuron adopts an event-driven simulation strategy and has limited neural computing characteristics ([23]). The first-order differential form of its kinetic equation is as follows:

    τ dV(t)/dt = −(V(t) − V_rest) + X(t),    (1)

where V(t) represents the membrane potential at the t-th moment; V_rest represents the resting potential; X(t) is the input at time t, including the output spikes of the previous layer and the bias voltage; τ is the decay time constant. When the membrane potential V(t) does not exceed the threshold V_th, the membrane potential decays, with the magnitude controlled by τ; otherwise, the neuron fires a spike and resets its membrane potential. We express the firing process with the following formula:

    S(t) = 1 if V(t) ≥ V_th, and S(t) = 0 otherwise.    (2)

At the t-th time step, the membrane potential of the j-th neuron in the l-th layer is given by:

    V_j^l(t) = λ V_j^l(t − 1) + Σ_{i=1}^{M^{l−1}} W_{ij}^l S_i^{l−1}(t) + X_bias,j^l,    (3)

where λ = 1 − 1/τ is the decay factor (λ = 1 for IF neurons), M^{l−1} is the number of spiking neurons in the (l − 1)-th layer, and X_bias,j^l represents the bias voltage. For the spike reset phase, we use a subtraction-based reset mechanism:

    V(t) ← V(t) − V_th.    (4)

Compared with directly resetting the membrane potential to a fixed value, the subtraction-based mechanism is better suited to training deep SNNs and makes more biological sense ([20,24]).
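To make the mechanism concrete, here is a minimal sketch (illustrative Python, not the authors' implementation) of a single IF neuron with the subtraction-based reset described above; all names are our own.

```python
def if_neuron_step(v, x, v_th):
    """Advance one time step: accumulate the input, fire if the threshold is crossed.

    v    -- membrane potential carried over from the previous step
    x    -- total input at this step (weighted presynaptic spikes plus bias)
    v_th -- firing threshold
    Returns (new_potential, spike) where spike is 0 or 1.
    """
    v = v + x                      # integrate the input (no decay: IF neuron)
    if v >= v_th:
        return v - v_th, 1         # subtraction-based reset, emit a spike
    return v, 0                    # no spike; the residual potential is retained

def run(inputs, v_th=1.0):
    """Run a sequence of scalar inputs through one IF neuron."""
    v, spikes = 0.0, []
    for x in inputs:
        v, s = if_neuron_step(v, x, v_th)
        spikes.append(s)
    return spikes
```

With a threshold of 1.0 and a constant input of 0.6 per step, the neuron fires roughly every other step, and the sub-threshold residue carries over instead of being discarded.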

Spike Encoding Method
Most present SNN encoding methods, such as time coding and rate coding, encode the input values directly and implicitly assume that the input data points are independent of one another. But there are correlations between data points in a single image or in the spectrogram of a single speech signal; in spectrograms especially, each row represents a subband. Differential Pulse-Code Modulation (DPCM) is a common method in classical speech coding schemes that can eliminate information redundancy in the speech signal and achieve considerable data compression. DPCM does not encode the input data directly, but rather makes use of the difference between adjacent samples. This technique has been applied not just in the field of communication, but also in digital image processing ([25,26]). However, DPCM has to rebuild the sample value of the previous time step during encoding and then compute its difference from the current input, which increases the computational complexity of the encoding process.
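As an illustration of that extra reconstruction step, a toy 1-bit DPCM encoder (illustrative Python; the step size and quantizer are our assumptions, not the paper's) might look like:

```python
def dpcm_encode(samples, step=1.0):
    """Toy 1-bit DPCM encoder: transmit only the sign of the difference
    between the current input and the reconstructed previous sample.
    The running `recon` variable is the extra work mentioned in the text:
    the encoder must rebuild the previous sample before it can form the
    difference."""
    recon, bits = 0.0, []
    for x in samples:
        diff = x - recon
        b = 1 if diff >= 0 else 0
        bits.append(b)
        recon += step if b else -step   # decoder-side reconstruction
    return bits
```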
The Sigma-Delta ADC is a very common circuit structure in the field of integrated circuits that converts analog signals to digital signals. We model the process by which a first-order Sigma-Delta ADC outputs a digital signal. In previous studies, researchers have combined Sigma-Delta ADCs with SNNs: they are used as spiking neurons to process spike trains in [27,28]; in [29], the researchers add one to each layer of the network and propose an STDP-like learning mechanism; the researchers in [30] quantize the activations of an ANN and convert it to an SNN using a method similar to Sigma-Delta modulation. We utilize the Sigma-Delta ADC as a rate encoder and simplify the model for better computational efficiency. The Sigma-Delta ADC encoding process, shown in Figure 2, is similar to DPCM but employs fixed return values instead of a predictor. When the input is an image or a spectrogram, i.e., a numeric matrix, we denote the entry in row i and column j by x(i, j) and encode the input data along the rows. Note that the number of rows of input data equals the number of encoders. The encoding process of the i-th row is as follows:

    e(i, j) = x(i, j) − x̂(i, j),    (5)
    I(i, j) = I(i, j − 1) + e(i, j),    (6)
    y(i, j) = 1 if I(i, j) ≥ V_th, and y(i, j) = 0 otherwise,    (7)

where e(i, j) represents the prediction error, I(i, j) denotes the value of the integrator, x̂(i, j) represents the predicted value, which takes a fixed return value determined by the output of Formula (7), y(i, j) is the output of the encoder, and V_th is the reference level of the comparator. When the time window is N, the above formulas loop N times, but at the end of each loop, the value of the integrator is retained and used for the next loop. It can be seen from Formulas (5)-(7) that any data point is affected by the already encoded data during the encoding process. If the input is a spectrogram, the data of any subband is influenced only by earlier data of the same subband during encoding. For image data, we still input along the rows during the experiments.
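A minimal sketch of this first-order encoder (illustrative Python; the ±1 fixed feedback values and the initial integrator state are our assumptions for inputs in [−1, 1], not details confirmed by the paper):

```python
def sigma_delta_encode(row, n_steps, v_th=0.0):
    """First-order Sigma-Delta rate encoding of one data row.

    The integrator accumulates the error between the input and a fixed
    feedback value (+1 after a spike, -1 otherwise -- the 'fixed return
    values instead of a predictor'); the comparator fires when the
    integrator reaches v_th. The integrator state is kept across the
    n_steps passes over the row, as described in the text."""
    integ, fb = 0.0, 0.0            # fb starts neutral before the first output
    out = []
    for _ in range(n_steps):
        step_out = []
        for x in row:
            integ += x - fb         # accumulated prediction error
            if integ >= v_th:
                step_out.append(1)
                fb = 1.0            # fixed return value after a spike
            else:
                step_out.append(0)
                fb = -1.0           # fixed return value after silence
        out.append(step_out)
    return out
```

An input of 1.0 yields a spike at every step, an input of −1.0 yields none, and intermediate values produce proportional spike rates, which is the rate-encoding behavior the text describes.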
Note that the input range of the Sigma-Delta encoder (SDE) is [−1, 1]. We use linear and nonlinear methods to map the input data to this range. The mapping formulas are as follows:

    Lin: x' = 2 (x − X_min) / (X_max − X_min) − 1,    (8)
    Sin: x' = sin( (π/2) · (2 (x − X_min) / (X_max − X_min) − 1) ),    (9)

where X_max and X_min represent the upper and lower bounds of the input values, respectively. Linear mapping does not affect the distribution of the data, while sinusoidal mapping increases the difference between higher- and lower-value regions.
As shown in Figure 3a, information is transmitted if and only if the input value exceeds zero; otherwise, the output is zero. The ReLU function introduces nonlinearity and alleviates the problem of exploding or vanishing gradients when training with the gradient backpropagation algorithm ([31]). As shown in Figure 3b, we can observe that there is also a positive correlation between the spike output frequency and the inputs of Integrate-and-Fire (IF) neurons in SNNs. But because their outputs are spike trains rather than continuous values, the spike firing process is non-differentiable, which makes SNNs difficult to train directly with the backpropagation algorithm. In this study, the role of the bias voltage in spiking neurons is reconsidered. By developing a new bias addition mechanism, a novel gradient approximation method is proposed based on the new rule's input and output characteristics. We observe that the gradient of a spiking neuron should be divided into two parts based on the range of the input: when the input is less than or equal to zero, the gradient is taken to be zero, and when the input is greater than zero, the gradient is calculated; the detailed derivation of the gradient is given below.
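A sketch of the two mappings (illustrative Python; the exact form of the sinusoidal mapping, sin((π/2)·u) applied to the linear mapping u, is our assumption about the paper's "Sin" formula):

```python
import math

def linear_map(x, x_min, x_max):
    """Linearly rescale x from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def sine_map(x, x_min, x_max):
    """Nonlinear variant: the linear mapping passed through sin(pi/2 * u).
    This keeps the output in [-1, 1] while reshaping the distribution of
    high- and low-valued regions (our assumed form of the 'Sin' mapping)."""
    return math.sin(0.5 * math.pi * linear_map(x, x_min, x_max))
```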
Spiking neurons in SNNs fire only when the membrane potential exceeds the threshold. The sources of the membrane potential can be separated into three parts, as shown in Figure 4: the membrane potential at the last time step, the input, and the bias. The bias voltage and the spike firing rate are positively associated. Since only the membrane potential and the input change during network inference, the neuron's actual spike firing threshold V_th' equals V_th − X_bias. In previous research, the treatment of the bias current mainly falls into two cases: (1) the bias is set to 0 and is not a variable during network learning; (2) as in ANNs, the bias is added at every moment and updated together with the weights during learning. In case 1, since the bias is permanently forced to zero, the regulating effect of the bias on the threshold is ignored; in case 2, adding the bias voltage at every time step violates the event-driven characteristics of SNNs, because as T → ∞, a spiking neuron receiving no input will spontaneously emit spikes, which harms both training and inference.
Figure 4. Left: the existing bias addition mechanism, which adds the bias in the spike-accumulation phase and triggers a spike when the membrane potential exceeds the threshold. Right: the new bias addition mechanism, which adds the bias only when the membrane potential is reset, ensuring that the membrane potential does not increase in the absence of input.
To address the above two cases, we modify the bias addition mechanism. We treat the bias as the reference voltage of the spiking neuron, which means that in the absence of any input, the membrane potential of the spiking neuron always stays at X_bias. When the output spikes of the previous layer arrive, the change in the membrane potential of the IF neuron is described by the following formula:

    V_j^l(t) = V_j^l(t−) + Σ_{i=1}^{M^{l−1}} W_{ij}^l S_i^{l−1}(t),    (10)

where V_j^l(t−) denotes the membrane potential just before the spikes of the previous layer arrive and V_j^l(t) denotes the membrane potential afterwards. When the spiking neuron emits a spike, the membrane potential is reset according to Formula (4). We set the bias to be consumed during the reset process and re-added at the end of the moment; the total process is shown on the right side of Figure 4. We denote the final membrane potential after inference ends by V_j^l(T). Over the whole process, a single neuron satisfies:

    X_bias,j^l + Σ_{t=1}^{T} X_j^l(t) = N_j^l (V_th − X_bias,j^l) + V_j^l(T),    (11)

where N_j^l is the total number of emitted spikes and X_j^l(t) is the total input from the previous layer at time t. To reduce the difficulty of solving, we ignore the final membrane potential V_j^l(T); because V_j^l(T) must contain the bias, a bias term is subtracted from the left side of the equation, and Equation (11) evolves into the following form:

    Σ_{t=1}^{T} X_j^l(t) = N_j^l (V_th − X_bias,j^l).    (12)

Since the value of a single spike is generally set to 1, N_j^l is the final output of the neuron, and we get:

    N_j^l = Σ_{t=1}^{T} X_j^l(t) / (V_th − X_bias,j^l).    (13)

We have thus approximated the input-output relationship of the IF neuron, and the relation can be applied approximately at each time step. The process is similar for the LIF neuron, but the decay factor must be considered when calculating the membrane potential, as in Formula (3), so there is more loss when approximating Formula (13). Therefore, only IF neurons are used in all experiments in this paper.
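A sketch of the bias-on-reset rule as we read it (illustrative Python, not the authors' code): the potential rests at the bias, only presynaptic input moves it, and each reset consumes the threshold and then re-adds the bias, so the effective threshold seen by the input is V_th − X_bias.

```python
def if_neuron_bias_on_reset(inputs, v_th, bias):
    """IF neuron under the proposed mechanism: the bias acts as a
    reference potential and is added back only at reset, so with zero
    input the potential stays fixed at `bias` and the neuron never
    spikes spontaneously. Returns the emitted spike train."""
    v = bias                        # reference potential, not accumulated per step
    spikes = []
    for x in inputs:
        v += x                      # only presynaptic input is integrated
        if v >= v_th:
            spikes.append(1)
            v = v - v_th + bias     # reset consumes the bias, then re-adds it
        else:
            spikes.append(0)
    return spikes
```

With bias 1 and threshold 3, a constant input of 1 fires every two steps, matching the effective threshold 3 − 1 = 2, while zero input leaves the potential pinned at the bias.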

Spike-Based Backpropagation Algorithm
In Formula (13), we obtained the input-output relationship of the spiking neuron. By simple derivation, we can easily obtain the following formula:

    ∂N_j^l / ∂X_j^l = 1 / (V_th − X_bias,j^l) when Σ_t X_j^l(t) > 0, and 0 otherwise.    (14)

After obtaining the approximate gradient of the neuron, we design our backpropagation algorithm using the BackPropagation Through Time (BPTT) concept. We define the output layer of the network as a linear layer that emits no spikes and accumulates the membrane potential, determining the category from the magnitude of the membrane potential. For an M-class classification problem, we utilize the Mean-Square Error (MSE) as our loss function. During backpropagation, we use the chain rule. We set the L-th layer as the output layer and derive the gradient calculation for the output layer neurons accordingly. For the hidden layers, the gradient of the j-th neuron in the l-th layer at moment t_0 follows the chain rule as in Formula (23). According to Formula (10), the first two terms of Formula (23) are known. Considering the case of N_j^{l,t_0} = 0, the last term of Formula (23) cannot be calculated directly, but can be eliminated by the chain rule. Finally, we obtain the gradients of the loss function with respect to the weights W^l and the biases X_bias,j^l. The pseudocode for the entire training phase is shown below: Algorithm 1, the training process of the proposed SNN model and SDE method.
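The two-piece gradient implied by Formula (13) can be sketched directly (illustrative Python; the names are ours): since N ≈ ΣX / (V_th − X_bias), the derivative with respect to the total input is a constant for positive input and zero otherwise, which is what keeps backpropagation sparse.

```python
def approx_grad(total_input, v_th, bias):
    """Approximate gradient of the spike count N with respect to the
    total input, following N = total_input / (v_th - bias):
    zero for non-positive input (no spikes, no gradient), and
    1 / (v_th - bias), the reciprocal of the effective threshold, otherwise."""
    if total_input <= 0:
        return 0.0
    return 1.0 / (v_th - bias)
```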

Experimental Environment
To validate the effectiveness of the proposed encoding method and learning rule, we apply the proposed model to environmental sound classification and image classification tasks. The network structures and hyperparameters differ between tasks, and the training process of the network is shown in Algorithm 1. For different tasks, the input data needs to be preprocessed differently. Note that in SNNs, the input must be presented to the network over multiple time steps. Hence, when using a dropout layer, the mask must be the same for the same set of data at every time step, as mentioned in [32]. Our model is constructed using the PyTorch-based SpikingJelly framework, which facilitates faster inference with GPU acceleration ([33]). The environmental sound classification experiments run on an Intel(R) Core(TM) i7-8700 CPU, and the image classification tasks are performed on an NVIDIA Titan Xp graphics card.

Datasets
In the experimental stage, the dataset used for the environmental sound recognition task is the Real World Computing Partnership (RWCP) sound scene dataset ([34]). Following the settings of previous research, we selected ten categories: bells, bottle, buzzer, cymbals, horn, kara, metal, phone, ring, and whistle, with durations ranging from 0.5 s to 3.0 s. We randomly select 80 files from each class, picking half for training and the other half for testing. To test robustness, we add noise to the test set while leaving the training set unchanged. Under such mismatched conditions, the test set is evaluated with noise at different signal-to-noise ratios. The experimental results in this part are the averages of ten runs. We use the MNIST and Fashion-MNIST datasets for the image classification tasks. The MNIST dataset is one of the most classic datasets in deep learning; it contains ten different handwritten digits with an input size of 28 × 28 ([35]). The training set has 60,000 samples, 6000 per class, and the test set has 10,000 samples, 1000 per class. Fashion-MNIST has the same number of classes and samples as MNIST, but its data is more complex: the pictures show various articles of clothing, with the same 28 × 28 input size and the same training/test split as MNIST ([36]).

Experimental Chapter Arrangement
Our experiments are divided into three parts. In Section 3.2, we complete the environmental sound classification task on the RWCP dataset. We briefly introduce the feature extraction algorithm we use and the whole process of feature extraction and encoding. We run the experiment under different normalized mapping modes and check the network performance under different time windows. To verify the network's robustness, noise at various signal-to-noise ratios is added to the test set in addition to training and testing on clean data.
In Section 3.3, we use the MNIST and Fashion-MNIST datasets for image classification tasks. We evaluate the network's performance with various time windows, compare the outcomes with those of other SNN models, and display the results in a table. By lowering the amount of training data, we further test the network's capacity for learning in Section 3.4. Finally, we analyze the algorithm's efficiency in Section 3.5.

Constant-Q Transform
The Constant-Q Transform (CQT) converts a time-domain signal x(n) into the frequency domain, with the center frequencies of the frequency bins geometrically spaced. The conversion from the discrete time-domain signal x(n) to the frequency domain X(k, n) is as follows:

    X(k, n) = Σ_{j = n − ⌊N_k/2⌋}^{n + ⌊N_k/2⌋} x(j) a_k^*(j − n + N_k/2),

where k is the index of the frequency bin (the number of bins is denoted by N_bins in Table 1), ⌊·⌋ denotes rounding towards negative infinity, and a_k^* represents the complex conjugate of a_k(n); details can be found in [37]. The center frequency f_k of each frequency bin is:

    f_k = f_1 · 2^{(k − 1)/B},

where f_1 is the center frequency of the lowest frequency bin. The value of Q is determined by ω, q, and B, where ω takes different values for different window functions, q is a scaling factor usually set to 1, and B is the number of bins per octave, which is important to the performance of the CQT. If B is set very high, the interval between center frequencies becomes very small, and for the same band length we obtain more frequency bins, which can be regarded as oversampling the frequency axis. In our experiments, to limit the size of the spectrogram, we set B to 3.
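The geometric spacing of the center frequencies can be computed directly (illustrative Python, using the f_k relation above):

```python
def cqt_center_freqs(f1, n_bins, bins_per_octave):
    """Geometrically spaced CQT center frequencies:
    f_k = f_1 * 2**((k - 1) / B) for k = 1..n_bins, so consecutive
    bins differ by a constant ratio and each octave holds B bins."""
    return [f1 * 2.0 ** ((k - 1) / bins_per_octave)
            for k in range(1, n_bins + 1)]
```

With f_1 = 100 Hz and B = 3 (the value used in the experiments), bin 4 lands exactly one octave up at 200 Hz.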

Network Structure and Parameter Setting
We use the CQT to extract audio features from the environmental sound data and create spectrograms. The main advantage of the CQT is that it does not suffer from uniform time-frequency resolution, because the ratio of each subband's bandwidth to its center frequency is constant. Therefore, high-frequency bands have a larger bandwidth and stronger temporal resolution, which tracks rapidly changing overtones; in contrast, low-frequency bands have a narrower bandwidth and higher frequency resolution, which resolves closely related notes. In practice, the CQT can evaluate the audio signal more thoroughly, since the energy of a sound signal is primarily concentrated in the low-frequency range. We use the Librosa library for the feature extraction part of the experiment ([38]). The hyperparameters we set in the experiments are shown in Table 1.
Because the environmental sound segments have different lengths, the dimensions of the spectrograms produced by the CQT also vary. Therefore, we divide the temporal dimension of each spectrogram into T segments; each segment covers a different number of samples for sound signals of different lengths. We then average each segment to obtain a spectrogram of fixed size. If M cochlear filters are used and T is the length of the time axis, after segmenting the temporal dimension we obtain a spectrogram of size (M, T). We input the spectrogram directly to the SDE; each SDE encodes only one data row, so M SDEs are needed for the encoding process, as shown in Figure 5. After encoding, we flatten the spike matrix into a 1-dimensional vector and feed it into fully connected SNNs. The network structure is 800 × 50 × 10, and the threshold of the last layer is set to infinity. In the initialization phase, the weights are drawn from a normal distribution, the fixed bias is 1, and the threshold is 3. Since the RWCP dataset is relatively small, the default batch size is set to 1.
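The fixed-size averaging step can be sketched as follows (illustrative Python; the integer segment boundaries are our choice, not a detail given in the paper):

```python
def segment_average(spectrogram, t_segments):
    """Average each row of a variable-length spectrogram over t_segments
    roughly equal chunks, yielding a fixed (n_rows, t_segments) matrix.
    `spectrogram` is a list of rows of possibly many time samples."""
    out = []
    for row in spectrogram:
        n = len(row)
        seg = []
        for s in range(t_segments):
            lo = s * n // t_segments        # chunk boundaries by integer division
            hi = (s + 1) * n // t_segments
            chunk = row[lo:hi]
            seg.append(sum(chunk) / len(chunk))
        out.append(seg)
    return out
```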

Classification Performance
We first conduct experiments with clean signals. As shown in Table 2, different mapping methods yield different results even with the same time window. When the time window equals 20, the network using linear mapping achieves the best performance, with an average over ten runs of 99.85%. When the time window is 16, the sinusoidal mapping method achieves its best performance of 99.68%. In Table 3, we compare our results with other SNN models and the traditional MFCC-HMM method; our approach with a fully connected SNN model achieves results close to the best and even exceeds some SNN models that include convolutional operations. We further test the robustness of the model by adding noise to the test data while still training on clean sound, testing the network's ability to extract audio features under mismatched conditions. As seen in Table 4, under mismatched conditions, the network using sinusoidal mapping achieves better robustness: its best performance occurs at a time window of 12, with an average accuracy of 71.49%, while linear scaling peaks at 67.23% with a time window of 14. We attribute this to the sinusoidal mapping introducing more nonlinearity, which improves the robustness of the network to a certain extent. Moreover, we find that network performance first increases and then decreases as the time window grows, consistent with our expectations. When the time window is too small, the encoding of the input feature map is relatively coarse, so increasing the time window may benefit feature extraction. However, if the time window is too large, the network overfits, and slight perturbations can then affect its performance.

Network Structure and Parameter Settings
Unlike the environmental sound classification task, most current SNN studies demonstrate network performance on image datasets; the relevant network parameters are shown in Table 5. The network structure we choose is 15C5-40C5-FC300-FC10, where 15C5 means the first layer has 15 output channels with a 5 × 5 convolution kernel, and FC300 means a fully connected layer with 300 neurons. We set the dropout rate to 0.2 and employ only linear mapping in the SDE. We use stochastic gradient descent to update the network parameters and cosine annealing to control the learning rate. The period of the cosine annealing is 100 and the whole training lasts 200 epochs, so the learning rate at the beginning and end of training is the maximum learning rate, and at the 100th epoch it decreases to its lowest value. The advantage of this setting is that the loss of the network is already very small in the later training phase, and a small learning rate can hardly escape a local optimum; using a higher learning rate at the end of training lets the network explore a larger space to achieve better performance. (Table 5 excerpt: the learning-rate decay period is 100; lr_ini, the initial learning rate, is 5 × 10⁻²; lr_min, the minimum learning rate, is 1 × 10⁻⁴.)
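The described schedule can be sketched as follows (illustrative Python; the exact formula the authors use may differ, but this reproduces the behavior stated above: lr_max at epochs 0 and 200, lr_min at epoch 100):

```python
import math

def cosine_annealing_lr(epoch, lr_max, lr_min, period=100):
    """Cosine-annealed learning rate with period 100 over 200 epochs:
    starts at lr_max, reaches lr_min at `period`, and climbs back to
    lr_max at 2 * period."""
    phase = (epoch % (2 * period)) / period       # 0..2 over one full cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * phase))
```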

MNIST Dataset
As one of the most widely used datasets for classification tasks, MNIST has been used to benchmark many SNN models trained on spikes. In Table 6, we show the performance of our model for different time windows. With a time window of 12, the network already reaches 99.37%; increasing the time window to 14, the classification accuracy reaches its highest value of 99.60%. We further compare our network with other models of similar structure in Table 7. Table 7 (excerpt): [44], spike-based BP, 15C5-P2-40C5-P2-FC300-FC10, 400 time steps, 99.49%; ST-RSBP [45], spike-based BP, 15C5-P2-40C5-P2-FC300-FC10, 400 time steps, 99.62%; Lee et al. [32], spike-based BP, 20C5-P2-50C5-P2-FC200-FC10, 120 time steps, 99.59%; ASF-BP [46], spike-based BP, 20C5-P2-50C5-P2-FC200-FC10, 300 time steps, 99.65%; Relaxation LIF [47], spike-based BP, 15C5-P2-40C5-P2-FC300- (entry truncated). For other networks with similar structures, our network shows advantages in both the time window and the performance. To examine the effect of different time windows on the classification task, we plot the training curves in Figure 6. From the curves in Figure 6a, we can see that our network achieves more than 97% accuracy in the initial stage of training, and the accuracy keeps increasing with training time. In the later stages of training, the performance remains very stable even though the learning rate is increasing again at that point. A smaller time window better highlights the low energy consumption of SNNs and reduces inference time. We also replaced the coding method with Poisson coding to test the influence of the SDE on model performance; the training curves are shown in Figure 6b. The SDE-based model performs better and converges faster.
Since the Sigma-Delta ADC is a mature technique in the field of analog circuits and the SDE is equivalent to a first-order Sigma-Delta ADC, it is very easy to implement in hardware.

Fashion-MNIST Dataset
The Fashion-MNIST dataset is more complex and challenging than MNIST. In Table 8, we show the performance of other SNN models on this dataset along with the performance of our model under different time windows. When the time window is 20, the model accuracy reaches 90.14%, and when the time window is increased to 100, the accuracy reaches 91.71%. Our model achieves high accuracy with a small time window, which also reflects the low power consumption of SNNs.

Training with Less Data
To test the learning ability of the network, we control the amount of data used to train the model and test on the three datasets above. For the RWCP dataset, we adjust the number of training samples. For the MNIST and Fashion-MNIST datasets, we set the fraction of the training set used for training to 0.1 and 0.5, with time windows of 14 and 20, respectively.
The experimental results for the RWCP dataset are shown in Table 9. When we reduce the training ratio to 0.1, our SNN model still achieves accuracies of 98.97% and 97.318%, respectively, and even under mismatched conditions the average accuracy reaches 70.53%. When we increase the ratio to 0.9, the performance improves significantly: the average accuracy reaches 99.70% and 99.75% without added noise, and the average performance under mismatched conditions is 68.00% and 73.20%, respectively. As shown in Figure 7a, for the MNIST dataset, when 6000 samples are used for training, our proposed SNN model achieves 98.41%, and when the training size increases to 30,000, the accuracy rises to 99.22%. For the Fashion-MNIST dataset, the training curves are shown in Figure 7b: training with 10% of the training set, the model reaches 84.07% and its training curve shows signs of overfitting; training with half the training set, the accuracy improves to 87.74%. We believe the images in Fashion-MNIST are more complex than those in MNIST; although the amount of training data is the same, feature extraction on Fashion-MNIST is harder and the model is more prone to overfitting.

Algorithm Efficiency
We compare the proposed SNN model with SNNs based on native IF neurons. The network model is still LeNet5, and the dataset is MNIST. We first compare the average running time and memory consumption per epoch to reflect the efficiency of the algorithm. A single epoch takes 94 s for the SNN based on native IF neurons, versus 100 s for our proposed model; the difference comes from the need to check whether to add the bias in the implementation. In terms of memory consumption, as seen in Figure 8a, our model occupies about 889 MB, while the native IF neuron model consumes 1003 MB, which is related to the spike sparsity of the network. Note that our derived approximate gradient has an obvious advantage over surrogate gradients: if a spiking neuron emits no spikes at time t, its gradient at time t is 0, which effectively reduces memory consumption.
We further studied the performance of the network. We tested the model under the same time window. The training curves of the two models are shown in the Figure 8b.
It can be seen that our model is not only markedly more accurate than the model using native IF neurons, but also has a clear advantage in convergence speed.

Discussion
For SNNs, spike encoding is a crucial process, and how to encode more efficiently is a question worth exploring, since the efficiency of the encoded spike trains has an important impact on model performance. The proposed SDE can be applied to sound recognition and image classification tasks and uses the correlation between data points to effectively reduce the time window. In traditional deep neural networks, the bias measures how easily positive and negative activations are produced, while in SNNs the bias adjusts how easily spikes are fired. Previous studies have not emphasized the effect of the bias. We argue that adjusting the bias is equivalent to changing the spike firing threshold. In this work, we restrict the addition of the bias so that it is tightly coupled to spike firing, which allows us to obtain an identity relating the input and output of a single neuron over the whole inference process and thereby a gradient approximation. The bias voltage differs between neurons, so the actual spike firing threshold also differs, reflecting the heterogeneity of biological neurons and remaining consistent with biological systems.
We apply the proposed SNN model to the classification of environmental sounds. Compared to images, sound signals contain rich spatiotemporal features and are better suited to evaluating the performance of SNNs. On the noise-free test set, our average precision reaches a maximum of 99.85%, and on the noisy test set it reaches 71.49%, reflecting the robustness of the network. We also test the performance of the proposed SNN model with only a few samples: using only 80 audio samples for training and 720 for testing, the network achieves a highest average precision of 98.97% without added noise and 70.53% with noise, reflecting the excellent learning ability of SNNs. We verify the effectiveness of the proposed learning rule and SDE on the MNIST and Fashion-MNIST image datasets. For MNIST, compared to previous SNN studies with similar network structures, our network achieves near-optimal results in only 14 time steps; for Fashion-MNIST, the network reaches 90.26% with a time window of 20, which is better than most networks.
In addition to proposing a novel SNN coding method that is highly viable for hardware implementation, the proposed bias addition mechanism is consistent with the event-driven nature of SNNs. Compared to most other SNN models, our network uses fewer time steps, which means that our model spikes more sparsely during inference. This affirms the low-energy advantage of SNNs and reduces the response time of the network, which benefits practical applications. Moreover, the training process does not rely on techniques such as gradient normalization or weight constraints to speed up training, which makes the model well suited for on-chip training on neuromorphic chips.