Optimization of Spiking Neural Networks Based on Binary Streamed Rate Coding

Abstract: Spiking neural networks (SNNs) increasingly attract attention for their similarity to the biological neural system. Hardware implementation of spiking neural networks, however, remains a great challenge due to their excessive complexity and circuit size. This work introduces a novel optimization method for a hardware-friendly SNN architecture based on a modified rate coding scheme called Binary Streamed Rate Coding (BSRC). BSRC combines the features of both rate and temporal coding. In addition, by employing a built-in randomizer, the BSRC SNN model provides higher accuracy and faster training. We also present SNN optimization methods including structure optimization and weight quantization. Extensive evaluations with MNIST SNNs demonstrate that the structure optimization of SNN (81-30-20-10) provides a 183.19-fold reduction in hardware compared with SNN (784-800-10), while providing an accuracy of 95.25%, a small loss compared with 98.89% and 98.93% reported in the previous works. Our weight quantization reduces 32-bit weights to 4-bit integers, leading to a further 4-fold hardware reduction with only 0.56% accuracy loss. Overall, the SNN model (81-30-20-10) optimized by our method shrinks the SNN's circuit area from 3089.49 mm² for SNN (784-800-10) to 4.04 mm², a reduction of 765 times.


Introduction
In recent years, various types of Artificial Neural Network (ANN) have been studied as effective solutions for many object recognition and image classification problems with increasing accuracy. The Modified National Institute of Standards and Technology (MNIST) dataset is one of the popular benchmarks for testing different types of ANN due to its simplicity. The MNIST dataset contains 60,000 and 10,000 images of handwritten digits for training and testing neural network models, respectively. Training and evaluating various types of ANNs on large datasets consumes a lot of time. Furthermore, designing an ANN of minimal model size or hardware cost while keeping a high level of accuracy is even more challenging. There is a growing demand for hardware-friendly ANN optimization due to the rapid growth in low-power AI for IoT and edge AI accelerator technologies. Among the ANN models, the SNN is receiving growing attention due to its structural resemblance to the biological neural system. In addition, SNNs are known to require less hardware, leading to smaller chip size and lower power consumption [1,2]. Many hardware accelerators have been reported for SNNs, including SpiNNaker/SpiNNaker-2 [3,4], Intel Loihi [5], Neurogrid [6], and the TrueNorth chip [7]. In this paper, we present a highly optimized SNN with relatively high accuracy aimed at a compact SNN hardware accelerator. We propose a novel spiking signal coding scheme, and present SNN model optimization and quantization techniques, which are summarized as follows.
First, we propose a new rate-based spike generation method called Binary Streamed Rate Coding (BSRC), which allows easy implementation in both software and hardware. BSRC eliminates the need for adding random noise to the input image pixels as in traditional rate coding [8][9][10]. Instead, we can directly generate spike signals corresponding to all pixels of the entire input image. Second, we introduce a training technique for the proposed SNN.
Recently, a direct supervised training algorithm for SNNs called STBP was published [11]. It can reportedly achieve a very high accuracy of 99.42% using the MNIST dataset. We show in the experimental section that our BSRC coding scheme, when combined with STBP, provides even higher accuracy.
It is a great challenge to design hardware accelerators of an SNN for low-power and compact embedded devices. This is especially the case because hardware resources like memory, multipliers, and adders are limited. The accelerator designs are also constrained by the size of the ANN model, design complexity, operation speed, and power consumption [12][13][14][15][16][17]. In [18], the authors discussed an efficient technique for reducing the number of bits representing SNN weights for the training and inference processes using either fixed-point or floating-point calculations. The authors in [19] further reduced the number of required weight bits to two bits for the inference process using an ANN training algorithm called BinaryConnect [20]. Then, they converted the ANN model to an SNN model and reported an accuracy of 99.43% on the MNIST dataset. In [21], the authors reported an energy-efficient convolutional SNN obtained by converting a deep CNN into an SNN for implementation on spike-based neuromorphic hardware. In [14], a hybrid updating algorithm was proposed, which combines the advantages of existing algorithms to reduce the hardware complexity and improve the system performance. They proposed a network module supporting up to 16,384 neurons with a total of 16.8 million synapses. The design of [14] reported a reduced power consumption of 0.477 W, while achieving a relatively high accuracy of 97.06% for the MNIST dataset.
Many research studies on SNNs have gradually improved the accuracy on the MNIST dataset. SNNs can be grouped into two types: fully connected SNNs [1,22] and convolutional SNNs [18,19]. The fully connected (FC) network is our choice because of its simplicity for hardware implementation. In fully connected networks, all neurons in one layer are connected to every neuron in the next layer. The nondifferentiable nature of the spiking function and the dynamic behavior of the neurons make the training process of an SNN incredibly challenging. Training an SNN can be done with three different methods:
• Unsupervised training is a kind of self-training for synapse weight modification inspired by the biological neural system exhibiting spike-timing-dependent plasticity [23][24][25].
• Indirect supervised training is a method that first trains the network model as an ANN using traditional training algorithms, and then converts the trained network model into an SNN version [26][27][28].
• Direct supervised training is a method that attempts to train an SNN directly by using an approximated version of the spiking function [11].
These training algorithms should have the capability to utilize the spatial domain property to increase the training accuracy [22].
The proposed work introduces a new type of direct supervised training method based on Binary Streamed Rate Coding (BSRC). We then efficiently combine the time and spatial domain information to obtain higher accuracy than other algorithms.

Overall Structure of SNN
Electronics 2020, 9, 1599
Spiking neural networks commonly consist of spiking neurons, synapses, and interconnections between neurons and synapses. Synapses are often modeled by adjustable weights. One type of SNN comprises only fully connected layer(s), while other types comprise convolutional layer(s) as well as fully connected layer(s). In the fully connected type of SNN, all neurons in the preceding layer are connected to every neuron in the subsequent layer [26]. Figure 1a shows the general structure of a fully connected SNN for the 28 × 28 MNIST dataset. The training or inference process starts by flattening the input image pixels from a two-dimensional array of 28 × 28 to a one-dimensional vector of size 784. After the input image flattening, the input layer of the network converts each input pixel's integer value to spike signals using various methods such as temporal coding and rate coding. The output layer consists of neurons whose outputs represent the classes of MNIST.
For low-cost hardware implementation, the full-scale input image can be scaled down to reduce the SNN size. For example, Figure 1b shows a reduced SNN that takes as input the MNIST images scaled down to 9 × 9.
The full scale SNNs and reduced SNNs have been used to evaluate our proposed SNN optimization and quantization methods.
Through an extensive analysis of the performance-to-cost metric, we chose an SNN structure consisting of two fully connected hidden layers with various image sizes (28 × 28, 14 × 14, 9 × 9) and various numbers of neurons in each layer. Our SNN model represents each pixel's integer value as a binary stream of spikes using the proposed rate coding scheme called Binary Streamed Rate Coding (BSRC). Like other SNN models, our SNN model also distinguishes between two types of synapses, excitatory and inhibitory synapses, which are denoted by red and blue colors, respectively, in Figure 1a,b.
Throughout our paper, we define an SNN's dimension by a list of each layer's size (the number of neurons). For example, (784-800-10) represents an SNN consisting of an input layer with 784 neurons, a hidden layer with 800 neurons, and an output layer with 10 neurons.

Spike Signal Representation
Spiking neural networks are more biologically plausible than other types of artificial neural networks. Most SNNs process spike signals in two domains (temporal and spatial) [11], which provides a prominent advantage over traditional ANNs, which have only the spatial domain. Figure 2 illustrates the differences between the ANN and SNN neuron models. Note the differences in the input signal, multiplication and addition processes, activation function, and the output signal.
Coding schemes play an important role in representing the spike signals in each layer and training the SNN with the temporal and spatial domains. There are two common coding schemes for converting input pixel values in SNNs: rate coding and temporal coding [10,11]. The rate-based coding scheme is regarded as highly demanding for training and implementation, since it results in a large number of weight lookups and high spike traffic in the routing fabric [29]. Another difficulty is that a high spike rate tends to mask the discrete nature of the spiking activity [30]. On the other hand, temporal coding SNNs tend to suffer from poor accuracy. For example, in [29,30], where temporal coding was used, the authors reported relatively low accuracies of 96.8% and 97.55%, respectively, for the SNN of size (784-800-10). In our work, we developed a hardware-friendly rate coding SNN model called Binary Streamed Rate Coding (BSRC), which can overcome the above drawbacks.
BSRC converts the input pixel values to rate-based spike signals represented by a binary stream. The length $T$ of the spike streams is determined with consideration of the hardware implementation cost. We represent each spike signal in each layer of the SNN by a stream of $T$ binary values, with 1 indicating the presence of a spike and 0 indicating no spike. Each image in the MNIST dataset consists of pixels represented by integer values, like most image data. In the proposed method, Algorithm 1 converts each input pixel's integer value to a stream of binary values representing the rate of spikes in the predetermined stream length $T$. For a pixel value of $n$ bits, the pixel is converted to a sequence of $2^n - 1$ bits. Hence, the sequence length $T$ is given by Equation (1):
$$T = 2^n - 1 \tag{1}$$
Lines 2 and 6 of Algorithm 1 separate the pixel values into two groups. A pixel value $Pv$ satisfying the condition $0 < Pv < \mathrm{int}(T/2)$ falls into the first group (line 2). For the first group, line 4 of Algorithm 1 sets to one the bits that correspond to spike positions. On the other hand, a pixel value $Pv$ meeting $\mathrm{int}(T/2) \le Pv$ falls into the second group (line 6). For the second group, line 7 of Algorithm 1 initializes the spike stream $S_{spikes}$ with all ones, and the bits that correspond to non-spike positions are then set to zero (line 14). As the final step in each group, the stream of spikes is rotated by a random number of positions $R$ ($0 < R < T - 1$), which provides regularization with randomness for robust classification results.

Algorithm 1 Generate Binary Stream of Spikes
Inputs: input image pixel values Pv, n bits representing each pixel value, length T of binary stream.
Output: stream of spikes (S_spikes) with length T = (2^n − 1).
For each pixel in the input image:
…
7. initialize S_spikes with all ones
8. if Pv_com = 1 then // Only 1 spike is needed in the stream
…
10. else if Pv_com > 1 then // generate equally distributed zeros
…
13. for …
14. S_spikes ← Random-rotate (S_spikes) // Rotate in range (0, T − 1)

As a running example, we use a reduced MNIST image with each pixel represented by 4 bits. Thus, the input layer of the SNN converts each pixel value to a binary sequence with a length of 15 spikes using Algorithm 1. For example, a pixel value of 5 is converted to the spike sequence shown below.
An example of Random-rotate (S_spikes) with R = 1 is given below.
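The encoding described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `bsrc_encode`, the even-spacing rule `(k * T) // marks`, and the handling of a pixel value of 0 are assumptions, since the exact spike positions used by Algorithm 1 are not recoverable from the text. Only the two-group split, the stream length T = 2^n − 1, and the random rotation with 0 < R < T − 1 are taken from the description.

```python
import random


def bsrc_encode(pv, n_bits=4, rng=None):
    """Encode an n-bit pixel value as a binary spike stream of length T = 2**n_bits - 1.

    The number of ones in the returned stream equals pv (rate coding).
    """
    if n_bits < 2:
        raise ValueError("n_bits must be at least 2")
    T = (1 << n_bits) - 1
    if not 0 <= pv <= T:
        raise ValueError("pixel value out of range for n_bits")
    if rng is None:
        rng = random.Random()

    if pv < T // 2:
        # First group: start from all zeros and place pv evenly spaced ones.
        stream, marks, fill = [0] * T, pv, 1
    else:
        # Second group: start from all ones and place (T - pv) evenly spaced zeros.
        stream, marks, fill = [1] * T, T - pv, 0

    # Assumed spacing rule: positions floor(k * T / marks) are distinct for marks <= T.
    for k in range(marks):
        stream[(k * T) // marks] = fill

    # Random rotation by R positions with 0 < R < T - 1, as stated in the text.
    r = rng.randrange(1, T - 1)
    return stream[r:] + stream[:r]
```

With 4-bit pixels, `bsrc_encode(5)` returns a list of 15 binary values containing exactly five ones; the rotation changes where the ones fall but not how many there are.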

Spiking Neural Network Model
Among various spiking neuron models, the leaky integrate-and-fire (LIF) neuron is regarded as more efficient and reliable than others [31]. The LIF neuron model is represented by Equation (2) [11]:
$$\tau \frac{dV_m(t)}{dt} = -V_m(t) + (W \times S) \tag{2}$$
Here, $V_m$ and $\tau$ represent the membrane voltage and time constant, respectively, while $(W \times S)$ is the dot product of synapse weights $W$ and pre-synaptic inputs $S$. For $V_m(t = 0)$, the initial condition is
$$V_m(t = 0) = V_{m(reset)}$$
Here, $V_{m(reset)}$ is the initial membrane voltage upon reset of the circuit. In our implementation, we used $V_{m(reset)} = 0$ without loss of generality. Equation (2) has been simplified to the approximate formula of Equation (3), which is suitable for the numerical implementation of fast training and inference processes. The resulting exponential decay term $e^{(t-(t+1))/\tau}$ from Equation (2) has been approximated by an addition operation with the previous membrane voltage scaled by a constant slope $D_{con}$. Equation (3) also makes the circuit implementation extremely compact even for large SNNs with thousands of synapses:
$$V_{m(i)}(t+1) = D_{con}\, V_{m(i)}(t)\, (1 - S_i(t)) + \sum_{j=1}^{N_{neurons}} W_{ij}\, S_j(t) \tag{3}$$
Here, $V_{m(i)}(t+1)$ is the i-th neuron's membrane voltage at time $t + 1$, which is a real value. $S_i(t)$ and $S_j(t)$ indicate the binary spike outputs of the post-synaptic and pre-synaptic neurons, respectively, while the real value $W_{ij}$ denotes the synaptic weight between the j-th pre-synaptic neuron and the i-th post-synaptic neuron. $N_{neurons}$ indicates the number of neurons in the previous layer $(l - 1)$ and $D_{con}$ is a decay constant.
In Equation (3), the term $(1 - S_i(t))$ gives a binary value 0 or 1, which resets the membrane voltage $V_{m(i)}(t+1)$ when the spike output $S_i(t)$ is 1, after each spiking process. To minimize the hardware implementation cost while maintaining high accuracy, we removed the decay constant from Equation (3), leading to Equation (4):
$$V_{m(i)}(t+1) = V_{m(i)}(t)\, (1 - S_i(t)) + \sum_{j=1}^{N_{neurons}} W_{ij}\, S_j(t) \tag{4}$$
In Section 4, we show that removing the decay constant does not degrade the accuracy by comparing two implementations based on Equations (3) and (4), respectively.
In general, the SNN training or inference process for each input image is computed in an iterative fashion to account for the rate or temporal coding of spikes. For the BSRC SNN model configured with the spike stream length $T$, the training or inference process expressed by Equation (3) or (4) is repeated $T$ times for each image to accumulate the effect of the entire spike stream in the membrane. Equation (1) expresses the length $T$ of the spike stream for an n-bit pixel value.
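The iterative membrane update can be illustrated with a short Python sketch for a single fully connected layer. It follows the description in this section: the previous membrane voltage is zeroed when the neuron fired in the previous step, optionally scaled by a decay constant, and the weighted input spikes are added. The function name `lif_layer_run`, the firing test `V_m >= V_th`, and the default threshold of 0.5 are assumptions made for illustration; setting `d_con=1.0` corresponds to the variant without decay.

```python
def lif_layer_run(weights, in_streams, v_th=0.5, d_con=1.0):
    """Run one layer of simplified LIF neurons over T time steps.

    weights[i][j]  : weight from pre-synaptic neuron j to post-synaptic neuron i
    in_streams[j]  : list of T binary spikes emitted by pre-synaptic neuron j
    Returns out[i] : list of T binary spikes emitted by post-synaptic neuron i
    """
    n_post = len(weights)
    T = len(in_streams[0])
    v_m = [0.0] * n_post      # membrane voltages, reset value 0
    prev_spike = [0] * n_post  # S_i(t) from the previous step
    out = [[0] * T for _ in range(n_post)]

    for t in range(T):
        for i in range(n_post):
            # Weighted sum of the pre-synaptic spikes at this step.
            drive = sum(w * s[t] for w, s in zip(weights[i], in_streams))
            # (1 - prev_spike) clears the membrane after a spike.
            v_m[i] = d_con * v_m[i] * (1 - prev_spike[i]) + drive
            out[i][t] = 1 if v_m[i] >= v_th else 0
        prev_spike = [out[i][t] for i in range(n_post)]
    return out
```

For example, a single neuron with one synapse of weight 0.3 driven by a constant spike stream needs two input spikes to cross the 0.5 threshold, so it fires on every second step.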

Optimization of SNN Model
The advantages of the proposed SNN optimization method are summarized below:
• The BSRC spike coding scheme significantly reduces the hardware cost by combining the advantages of both rate and temporal coding schemes using a built-in randomizer.
• BSRC achieves high training and testing accuracies, while keeping the training time short. It reduces the training time by 50% compared to STBP [11] for the same accuracy goal.
• For a network model of (784-800-10), BSRC achieves higher accuracy even with a small number of training epochs compared with the previous model.
• By splitting the one hidden layer into two hidden layers, we can substantially reduce the hardware cost with little loss in the classification accuracy.
• The proposed quantization algorithm provides further reduction in hardware cost.

BSRC Based Training
For the training algorithm of the proposed BSRC SNN model, we employ a backpropagation algorithm based on modified gradient descent to handle spike signals. To solve the problem of spike signals being nondifferentiable, we approximate a spike signal by the rectangular signal of Equation (5), which is differentiable, and express its derivative by Equation (6). Here, $S(x)$ represents a spike, with 1 indicating a spike event, $V_m$ indicates the membrane voltage, and $V_{th}$ denotes the threshold voltage.
Using Equations (5) and (6), our BSRC-based training algorithm optimizes the weight values of the target SNN by utilizing both the spatial and temporal features of the SNN. The training algorithm also randomly rotates the BSRC spike streams in order to increase the regularization of the SNN model and avoid overfitting. For faster training, we use a constant threshold voltage $V_{th} = 0.5$ for all neurons in all layers. In general, a fixed threshold value does not degrade the accuracy during the training process, because optimal weights are selected with respect to the constant threshold. We trained all the SNN models with a spike stream length $T = 15$, a batch size of 100, and a learning rate of $10^{-3}$ to compare their training accuracy.
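As a rough illustration of the rectangular approximation mentioned above, the following sketch pairs a hard-threshold spike function with a rectangular surrogate derivative, a common choice in direct SNN training. The window width `width`, its default value, and both function names are assumptions; the text does not give the parameters used by the authors.

```python
def spike(v_m, v_th=0.5):
    """Forward pass: emit a spike (1) when the membrane voltage reaches the threshold."""
    return 1 if v_m >= v_th else 0


def rect_surrogate_grad(v_m, v_th=0.5, width=1.0):
    """Backward pass: rectangular stand-in for the derivative of the spike function.

    The derivative is 1/width inside a window of the given width centred on the
    threshold and 0 elsewhere, so the window has unit area.
    """
    return 1.0 / width if abs(v_m - v_th) < width / 2 else 0.0
```

During backpropagation, `rect_surrogate_grad` would be used in place of the true derivative of `spike`, which is zero almost everywhere and undefined at the threshold.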

SNN Structure Optimization
This section describes how we optimize the SNN structure and size. The objective of SNN optimization is to minimize the size of the SNN (the number of neurons and synapses) under the following constraints and assumptions:
• Fully connected SNNs for MNIST are considered.
• Each pixel of the input image is represented by 4 bits.
• The target accuracy for MNIST is 94.60% or higher.
The structure optimization process explores SNN structures with various image sizes, numbers of layers, and layer sizes (in number of nodes). For the image size, we explored three different sizes, 28 × 28, 14 × 14, and 9 × 9, by scaling the MNIST dataset. Figure 4 shows three SNN models using the different image sizes and the accuracy obtained for each model using the proposed BSRC-based training method described above. The training was conducted with floating-point weight values before applying our weight quantization algorithm. The hardware cost of the three SNN models in terms of the number of synapses decreases from 635,200 to 164,800, and then to 72,800, which corresponds to 74.06% and 88.54% reduction, respectively. From this structure optimization, we can observe that resizing the input images to 9 × 9 pixels presents a significant size reduction at a negligible accuracy loss (only 0.84%).
The next structure optimization is reducing the number of neurons in the hidden layers to determine the minimum network size that meets the target accuracy. For example, Figure 5 compares the maximum testing accuracy of SNNs with various numbers of neurons in the hidden layer using the 9 × 9 MNIST dataset. For SNNs with the hidden layer size ranging from 800 to 50 neurons, the accuracy changes from 98.0% to 95.87%. This indicates that a substantial reduction from 72,800 to 4550 synapses (93.75% reduction) can be obtained at an accuracy loss of only 2.13%.
For a further reduction in the overall hardware cost, we explore SNN structures by splitting the hidden layer into two layers. Figure 1b shows such an SNN constructed by hidden layer splitting. Figure 6 shows various SNN structures obtained by splitting the hidden layer into two hidden layers with various ratios of the number of neurons. We can observe that accuracy decreases as we further reduce the total number of synapses. For the running example, we chose the SNN of (81-30-20-10) as the target SNN structure to meet the final accuracy goal of 94.60%.
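The synapse counts quoted in this section follow directly from the layer dimensions of a fully connected network: each pair of adjacent layers contributes the product of their sizes. The small Python sketch below reproduces that arithmetic; the helper names `synapse_count` and `reduction_percent` are introduced here only for illustration.

```python
def synapse_count(dims):
    """Number of synapses in a fully connected network with the given layer sizes."""
    return sum(a * b for a, b in zip(dims, dims[1:]))


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (1 - after / before)


full = synapse_count((784, 800, 10))   # 28 x 28 input
half = synapse_count((196, 800, 10))   # 14 x 14 input
small = synapse_count((81, 800, 10))   # 9 x 9 input
tiny = synapse_count((81, 50, 10))     # 9 x 9 input, 50 hidden neurons
```

Here `full`, `half`, `small`, and `tiny` evaluate to 635,200, 164,800, 72,800, and 4550 synapses, and `reduction_percent(small, tiny)` gives the 93.75% figure stated above.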

Figure 6. Maximum testing accuracy of SNNs with two hidden layers using various structures.

SNN Weight Quantization
After the iterative process of structure optimization and training, our optimization method applies a weight quantization algorithm to the trained SNN of selected structure to further reduce the circuit size and power consumption. To explain our weight quantization algorithm, we chose an SNN of (81-30-20-10) from the training and structure optimization results of Figure 6 (highlighted in blue color).
In the SNN model considered, every synapse has one of two types of weights:
1. Positive weights for excitatory synapses
2. Negative weights for inhibitory synapses
Upon receiving each spike signal, an excitatory synapse increases the spiking rate, while an inhibitory synapse decreases the spiking rate. For each synapse type, the quantization algorithm calculates the mean and the standard deviation of the trained weights, which are initially in floating-point. The proposed quantization algorithm determines the optimal range of the weights for each layer based on the mean and standard deviation of the floating-point weights. It then clips the weights of each layer by selecting two limits, a positive limit for excitatory synapses and a negative limit for inhibitory synapses. Algorithm 2 summarizes the key steps of our weight quantization algorithm. The selected clipping limits influence the final accuracy. Thus, Algorithm 2 determines the clipping limits for each layer in a way that maximizes the overall SNN's accuracy.

Algorithm 2 Find optimal Weight Quantization
Inputs: SNN with floating-point weights w, target accuracy A_Target, maximum allowable number of bits B_max for quantization.
Output: the number of bits N, quantized weights w_Exc_int and w_Inh_int.
1. for l = 0 to L // L is the num. of layers
// Group all weights into excitatory and inhibitory weights
2. for i = 0 to len(w)
…
11. for N = 0 to B_max // Find the best quantization bits N
…
// Calculate the quantization step
14. …
…
Acc = Calculate Accuracy of SNN (w_Q_int)
21. Select best (n_Exc, n_Inh) with min N that meets A_Target

Lines 2-4 of Algorithm 2 start by splitting the trained floating-point weights into excitatory and inhibitory weights. For each weight group, lines 5 and 6 compute the mean and standard deviation.
Meanwhile, lines 7-12 iteratively determine the minimum clipping limits of the weight groups, so that we can maximize the resolution of the quantized weights by clipping large outlier weights. Lines 14 and 15 calculate the excitatory and inhibitory quantization steps ($\Delta Exc_q$, $\Delta Inh_q$) based on the chosen number of bits N for weight quantization. Then, lines 16-21 determine the minimal number of bits for the quantized weights in each layer that can satisfy the target accuracy in the constraints.
To apply Algorithm 2 to the running example SNN of (81-30-20-10), we chose $n_{min} = 0.8$, $n_{max} = 1.6$, $n_{step} = 0.1$, and $B_{max} = 8$. As optimal parameters, Algorithm 2 selected N = 4 bits for all layers, with clipping limits of (1.04, −1.03) for hidden layer 1, (0.95, −1.19) for hidden layer 2, and (1.24, −2.20) for the output layer. Figure 7 shows the floating-point weights of the output layer; in this figure, the two horizontal red lines indicate the range of the floating-point weights before clipping. The purple and gray lines (at 1.24 and −2.20) in Figures 7 and 8 are two examples of the clipping levels of excitatory and inhibitory synapses, respectively. Figure 9 shows the optimized 4-bit integer quantized weights of the SNN output layer according to Algorithm 2.
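The clip-then-quantize step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reconstruction of Algorithm 2: the helper names, the use of 2^(N−1) − 1 integer levels per sign, and the form `mean + n * std` for a candidate clipping limit are all hypothetical, and the accuracy-driven search over clipping limits and bit widths is omitted.

```python
import statistics


def clipping_limit(magnitudes, n):
    """Candidate clipping limit from the mean and standard deviation (assumed form)."""
    return statistics.mean(magnitudes) + n * statistics.pstdev(magnitudes)


def quantize_layer(weights, pos_limit, neg_limit, n_bits=4):
    """Clip weights to [neg_limit, pos_limit] and map them to signed integers.

    Excitatory (positive) and inhibitory (negative) weights use separate
    quantization steps so that each group spans the full integer range.
    """
    levels = 2 ** (n_bits - 1) - 1        # e.g. 7 levels per sign for 4 bits
    step_exc = pos_limit / levels          # excitatory quantization step
    step_inh = abs(neg_limit) / levels     # inhibitory quantization step
    quantized = []
    for w in weights:
        if w >= 0:
            quantized.append(round(min(w, pos_limit) / step_exc))
        else:
            quantized.append(-round(min(-w, abs(neg_limit)) / step_inh))
    return quantized
```

Using the output-layer clipping limits reported above, `quantize_layer(w, 1.24, -2.20)` maps any weight at or beyond a limit to ±7 and the remaining weights to the nearest integer level in between.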

Integer Threshold Compensation
The final step of the proposed SNN optimization method is an integer threshold compensation algorithm. After applying the weight quantization algorithm, the floating-point threshold should be compensated to maintain the target accuracy [12,19]. The following equations describe why threshold compensation is required and derive a simplified formula to directly calculate the integer threshold compensation. Equation (7) represents the total number of spikes fired by each neuron (i) in any layer. In Equations (7) and (8), $N_{Spikes(post)i}$ is the number of generated spikes in one neuron (i), while $F_{r(post)i}$ indicates the firing rate using floating-point weights for post-neuron (i), and $V_{th}$ is the original threshold voltage. Equations (9)-(12) use matrix notation for the weights and input spike streams of all synapses for each neuron, where (i) and (j) index the post-neurons and the neurons in the previous layer, respectively, while $W$ denotes a floating-point weight matrix. In Equation (11), $f_{j-r(pre)}$ indicates the firing rate of the spike stream $T$ for each synapse j.
Equation (13) estimates the firing rate after N-bit weight quantization using the integer threshold, based on Equation (12).
Here, $W_{int}$ is the matrix of N-bit quantized weights, $V_{th-int}$ is the integer threshold to be determined for the current layer, and $F_{r(pre)-int}$ and $F_{r(post)-int}$ are the pre-synaptic and post-neuron firing rates, respectively. The goal of our threshold compensation algorithm is to determine an integer threshold $V_{th-int}$ for the quantized weights such that the firing rates of Equations (12) and (13) match as closely as possible; this goal is expressed by Equation (14).
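As a rough illustration of this matching goal, the Python sketch below computes one common integer threshold per layer from the ratio of accumulated integer weights to accumulated floating-point weights of each neuron. The exact form of the paper's equations is not reproduced here: the direction of the ratio, the rounding step, and the names `compensate_threshold`, `w_float`, and `w_int` are assumptions, and the weight values are made up for the example.

```python
import numpy as np

def compensate_threshold(w_float, w_int, v_th):
    """Estimate one integer threshold for a layer.

    w_float, w_int: arrays of shape (post_neurons, pre_neurons).
    Assumption: equal pre-synaptic firing rates before and after
    quantization, so the firing rate is preserved when the threshold
    is scaled by the same factor as the accumulated weights.
    """
    sum_float = w_float.sum(axis=1)   # accumulated float weights per neuron
    sum_int = w_int.sum(axis=1)       # accumulated integer weights per neuron
    # Note: a neuron whose float weights sum to ~0 would make this ratio unstable.
    ratios = sum_int / sum_float
    return int(np.rint(v_th * np.mean(np.abs(ratios))))

# Hypothetical layer: the integer weights are exactly 4x the float weights.
w_float = np.array([[0.5, 0.25], [1.0, -0.5]])
w_int = np.array([[2, 1], [4, -2]])
print(compensate_threshold(w_float, w_int, v_th=1.0))
```

Because every neuron in this toy layer is scaled by the same factor, the compensated threshold is simply the original threshold multiplied by that factor; with real quantized weights the per-neuron ratios differ, which is why a layer-wide mean is taken.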
To determine a fast solution that satisfies Equation (14), we assume that $F_{r(pre)-int}$ equals $F_{r(pre)}$ (that is, the pre-synaptic firing rates are the same before and after weight quantization), which yields the solution expressed by Equation (15). To further speed up the computation, we simplify Equation (15) to Equation (16), which provides a simple and fast way to find one common threshold per layer for all neurons in that layer.

Algorithm 3 summarizes the threshold compensation process for an SNN model of L layers. Line 2 reads the floating-point and integer weights for the entire layer. In line 4, the algorithm divides the accumulated floating-point and integer weights for each neuron (i) in the current layer l. Line 5 incrementally calculates the absolute mean of the results appended in line 4; the algorithm then uses Equation (16) to calculate the compensated integer threshold $V_{th-int(l)}$ for each layer l.

Figure 10 summarizes the accuracy of two SNN structures trained using the proposed BSRC method compared to the previous works in [1,22]. When we apply the proposed BSRC method to an SNN structure of (784-800-10) with floating-point weights using the full-scale MNIST dataset, the model achieves an accuracy of 98.84% after only 84 epochs. In contrast, the previous works HM2-BP [22] and STBP [11] require 100 and 200 epochs, respectively, to obtain the same level of accuracy for the same SNN structure. For a reduced SNN structure of (784-400-10), BSRC outperforms STBP [11] by 0.3% in accuracy after 43 epochs of training. These results support the effectiveness of the proposed spike representation and SNN training method.

Next, we evaluate the SNN structural optimization, which targets a hardware cost reduction of up to three orders of magnitude. We first shrink the input layer to 9 × 9 MNIST images and then minimize the number of neurons in each layer.
Figure 11 compares the accuracy and network size (number of synapses and neurons) for various SNN structures obtained by varying the sizes of the two hidden layers. We chose (81-30-20-10) as the final SNN structure, which reduces the number of synapses to 3230 compared with 4550 synapses for the SNN of (81-50-10), at a cost of only 0.7% loss in accuracy.
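The counts 3230 and 4550 quoted above are consistent with counting the connections of fully connected layers, as the following Python sketch shows. The helper name `count_synapses` is hypothetical, and the sketch assumes dense connectivity between consecutive layers with no bias connections.

```python
def count_synapses(layers):
    """Number of connections in a fully connected feed-forward network.

    layers: sequence of layer sizes, e.g. (81, 30, 20, 10).
    Assumption: every neuron connects to every neuron of the next layer.
    """
    return sum(n_pre * n_post for n_pre, n_post in zip(layers, layers[1:]))

for structure in [(81, 30, 20, 10), (81, 50, 10), (784, 800, 10)]:
    print(structure, count_synapses(structure))
```

Under this assumption, shrinking the input from 784 to 81 pixels dominates the savings, because the first layer holds most of the connections in every structure considered.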
Electronics 2020, 9, 1599

Figure 12 compares the accuracy of the reduced SNNs (81-30-20-10) obtained via our quantization algorithm. First, note that before the quantization step, the SNN with no decay factor (no leaky component) achieves an accuracy that is 0.27% higher than the SNN with a decay factor. This result shows that our simplified neuron model with no decay factor (see Equation (4)) incurs no accuracy loss. Figure 12 also compares the accuracy of the optimized SNNs before and after weight quantization, which demonstrates the effectiveness of our weight quantization method. For example, while the SNN (81-30-20-10) with floating-point weights gives an accuracy of 95.25%, the SNNs quantized to 8 bits and 4 bits obtain accuracies of 94.88% and 94.69%, respectively. This indicates that a substantial size reduction can be obtained at a negligible accuracy loss.

Figure 13 compares three performance metrics (accuracy, number of synapses, and circuit area) for four different SNNs. The first two groups of bar graphs show the performance metrics of the previous SNNs of (784-800-10) (HM2-BP [22]) and (784-400-10) (STBP [11]), respectively. These SNNs require a large number of synapses, which leads to excessive hardware cost. Here, the SNN circuit size is represented by the silicon area in mm². In contrast, the third and fourth groups of bar graphs represent our optimized network structure of (81-30-20-10) for 9 × 9 MNIST images. Compared with the previous SNNs, they drastically reduce the hardware size, by 183.19 times (for the SNN with floating-point weights) and 764.73 times (for the SNN with 4-bit quantized weights), at an accuracy loss of at most 4.24%, which still satisfies our target accuracy constraint of 94.60%. Compared with the HM2-BP SNN of (784-800-10), our SNN of (81-30-20-10) optimized by the proposed BSRC method reduces the number of synapses from 635,200 to 3230, while the accuracy decreases from 98.93% to 95.25%.

Figure 13. SNN circuit area (mm²) and accuracy using different models.
We use Equation (17) to estimate the relative area of the circuit implementation for the SNN model:

$$\text{Circuit Area} = N_{syn} \times Size_{syn} + N_{neuron} \times Size_{neuron} + H_{w,R} \qquad (17)$$

Here, $N_{syn}$ is the total number of synapses, and $Size_{syn}$ is the size of one synapse circuit, including the memory for its quantized weight. $N_{neuron}$ denotes the total number of neurons, and $Size_{neuron}$ is the size of one neuron circuit. $H_{w,R}$ is the wiring and routing overhead.
As shown in Figure 13, the proposed optimization method reduces the hardware area from 3089.49 mm² for the HM2-BP SNN (784-800-10) with 32-bit floating-point weights to 16.86 mm² for the SNN (81-30-20-10) with 32-bit floating-point weights. The 4-bit quantization method provides an additional size reduction of about 4 times, from 16.86 mm² to 4.04 mm², with a negligible accuracy loss of only 0.56%. These implementation and evaluation results demonstrate that the proposed optimization method using the BSRC SNN model is a highly effective approach to minimizing hardware size while managing the tradeoff between accuracy and hardware size.
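The reduction factors can be checked directly from the reported areas with a few lines of Python. The dictionary keys and the helper `reduction_factor` are names chosen for this sketch; the area values are the ones quoted above. Because those areas are rounded to two decimals, the first ratio comes out near 183.24 rather than exactly the reported 183.19, presumably due to rounding of the published figures.

```python
# Circuit areas in mm^2, as quoted in the text.
areas_mm2 = {
    "HM2-BP (784-800-10), 32-bit float": 3089.49,
    "BSRC (81-30-20-10), 32-bit float": 16.86,
    "BSRC (81-30-20-10), 4-bit integer": 4.04,
}

def reduction_factor(baseline_mm2, optimized_mm2):
    """How many times smaller the optimized circuit is than the baseline."""
    return baseline_mm2 / optimized_mm2

baseline = areas_mm2["HM2-BP (784-800-10), 32-bit float"]
structural = reduction_factor(baseline, areas_mm2["BSRC (81-30-20-10), 32-bit float"])
overall = reduction_factor(baseline, areas_mm2["BSRC (81-30-20-10), 4-bit integer"])
quantization = reduction_factor(
    areas_mm2["BSRC (81-30-20-10), 32-bit float"],
    areas_mm2["BSRC (81-30-20-10), 4-bit integer"],
)
print(f"structure: {structural:.2f}x, quantization: {quantization:.2f}x, overall: {overall:.2f}x")
```

The overall factor is the product of the structural and quantization factors, which is why the two optimization steps are reported separately and then combined.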
We implemented the final optimized SNN model, BSRC (81-30-20-10), using analog circuits for the synapse and neuron cells and digital standard cells for the weight and image memories. Figure 14a illustrates the overall block diagram of the SNN chip based on BSRC (81-30-20-10), while Figure 14b shows the full chip layout. The SNN chip is currently under fabrication in a 65 nm CMOS process. We plan to report the test results of the silicon in a future paper.

Conclusions
Although artificial neural networks can achieve high testing accuracy on the MNIST dataset, most of these models tend to incur excessive hardware cost and thus are not suitable for edge AI and mobile systems. In this work, we have developed an SNN model and optimization method by introducing a new spike representation scheme called Binary Streamed Rate Coding (BSRC). BSRC improves the model generality, eliminates the need for random noise, and consequently offers efficient training and high accuracy. Our optimization method consists of SNN structure optimization, BSRC spike generation, weight quantization, and threshold compensation algorithms. We applied the proposed optimization method to an SNN for the MNIST dataset and obtained a 764.73 times reduction in circuit size with an accuracy loss of 4.24% compared with the previous SNN reported in [22]. This result suggests that the proposed method can enable the design of extremely compact and low-power SNN hardware with reasonable accuracy for edge AI applications. For future work, we plan to extend the proposed method and apply it to larger SNNs with different datasets, such as CIFAR-10, which contains color images, and CIFAR-100, which has more classes. We also plan to extend it to hybrid neural networks comprising convolutional layers as well as spiking synapse layers.
