A Compact and Low-Power SoC Design for Spiking Neural Network Based on Current Multiplier Charge Injector Synapse

This paper presents a compact analog system-on-chip (SoC) implementation of a spiking neural network (SNN) for low-power Internet of Things (IoT) applications. The low-power implementation of an SNN SoC requires the optimization of not only the SNN model but also the architecture and circuit designs. In this work, the SNN is constructed from analog neuron and synapse circuits, which are designed to optimize both chip area and power consumption. The proposed synapse circuit is based on a current multiplier charge injector (CMCI) circuit, which significantly reduces power consumption and chip area compared with the previous work while allowing for design scalability to higher resolutions. The proposed neuron circuit employs an asynchronous structure, which makes it highly sensitive to input synaptic currents and enables it to achieve higher energy efficiency. To compare the performance of the proposed SoC in area and power consumption, we implemented a digital SoC for the same SNN model in FPGA. The proposed SNN chip, when trained using the MNIST dataset, achieves a classification accuracy of 96.56%. The presented SNN chip has been fabricated using a 65 nm CMOS process. The entire chip occupies 0.96 mm2 and consumes an average power of 530 μW, which is 200 times lower than its digital counterpart.


Introduction
In an effort to make Internet of Things (IoT) hardware more intelligent, artificial intelligence (AI) is being employed in the next generation of IoT applications. The inclusion of AI in IoT promises new horizons while presenting some challenges. Existing AI neural networks use CPU/GPU hardware architectures, which are not feasible for IoT applications with limited power. IoT applications have scarce energy sources and thus require low-power solutions to ensure the longevity of the devices [1].
On the other hand, neuromorphic systems for prospective computing systems that mirror a biological neural network have received great research interest. These systems are highly energy-efficient and perform parallel signal processing [2,3]. In many IoT applications, deep neural networks (DNNs) are being recognized for achieving high classification accuracy [4]. However, the excessive computation and memory demands of DNNs on a conventional von Neumann computing system make them power-hungry and bandwidth-intensive. Thus, they are not applicable to IoT and mobile applications such as object recognition in drones [5].
In contrast with today's digital microprocessors, the biological human brain is established on a non-von Neumann architecture [6], wherein neurons (processing elements) and synapses (memory) are collocated rather than separated.

Spiking Neural Network Model
A third-generation SNN has been chosen in this work for the on-chip implementation of neural networks due to its brain-like capabilities and efficiency in spatial-temporal coding [15]. An SNN is a network of neurons interconnected through synapses capable of performing inference and training tasks. The information delivery process in the SNN resembles a brain wherein interconnected neurons become activated upon receiving discrete input spikes that are evoked at different time intervals. A neuron in a layer receives input spikes from a neuron in a previous layer via synapses. The weight of the synapse modulates the incoming spikes and generates an equivalent current. The charge from all input synapses is accumulated in the form of potential on the neuron's membrane. The neuron evokes an output spike when its membrane potential accumulates up to a predefined threshold value. Therefore, synapses can be considered as a memory with a communication interface, while neurons can be considered as a processing unit able to accumulate and compare.

Leaky Integrate and Fire Model
The Leaky Integrate and Fire (LIF) neuronal model has been adopted for realizing the large-scale SNN in this work, as it encapsulates biological computational features of a neuron by integrating simpler circuits on silicon [16]. The robust CMOS-based LIF neuronal model finds advantages over other models [17] since it allows for a compact silicon implementation of a large SNN on a chip. A representation of the LIF neuronal model with elementary CMOS devices is shown in Figure 1. Here, the neuronal membrane potential Vmem can be modeled as a parallel Resistor-Capacitor network. The response of the parallel structure of Cmem (membrane capacitance) and Rmem (membrane resistance) can be modeled by Kirchhoff's current law [11] and is defined as

I(t) = IC(t) + IR(t) = Cmem · dVmem/dt + Vmem/Rmem    (1)

Here, the synapse acting as a current source injects a current I(t) into the neuron, charging Cmem with current IC(t) and discharging Cmem through Rmem (leakage path) with current IR(t). When Vmem ≤ Vth, input synaptic current from a multitude of input synaptic sources accumulates charge over Cmem and increases Vmem.
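The membrane dynamics of Equation (1) can be illustrated with a minimal discrete-time (Euler) sketch. The component values below are illustrative, not the fabricated chip's parameters.

```python
# Discrete-time sketch of the LIF membrane equation
# I(t) = Cmem * dVmem/dt + Vmem/Rmem.
# c_mem, r_mem, and dt are illustrative values, not chip parameters.

def step_membrane(v_mem, i_in, c_mem=10e-15, r_mem=1e9, dt=1e-9):
    """Advance Vmem by one time step dt under input current i_in."""
    i_leak = v_mem / r_mem               # IR(t): leakage through Rmem
    dv = (i_in - i_leak) / c_mem * dt    # IC(t) charges Cmem
    return v_mem + dv

v = 0.0
for _ in range(100):   # constant 1 nA synaptic current charges Vmem
    v = step_membrane(v, 1e-9)
print(v)
```

With no input current and zero membrane potential, Vmem stays at rest; with a constant input, Vmem rises toward the equilibrium set by the leakage path.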

When Vmem ≥ Vth, Vmem triggers the neuron to evoke an output spike signal, and Vmem is immediately reset to the resting potential Vreset. Equation (3) defines Vth as the threshold voltage value at which the comparator decides to evoke an output spike. Vreset is the resting potential of the neuronal membrane after an output spike is evoked. The total current I(t) is the sum of all the excitatory and inhibitory currents injected by all the input synapses and is expressed as

I(t) = Σi fi · Wi · Iref    (4)

Here, in Equation (4), Iref is the reference synaptic current generated upon receiving a pre-synaptic input spike, fi is the number of spikes arriving at the ith synapse, and Wi is the weight of the ith synapse; their weighted sum forms the total injected current I(t) [14].
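The fire-and-reset rule and the weighted current sum of Equation (4) can be sketched behaviorally. The threshold, reset, and reference-current values are assumed for illustration only.

```python
# Behavioral sketch of the LIF fire/reset rule and the total synaptic
# current of Equation (4): I = sum_i f_i * W_i * Iref.
# I_REF, V_TH, and V_RESET are illustrative, not chip parameters.

I_REF = 1e-9     # reference synaptic current (assumed)
V_TH = 0.5       # threshold voltage (assumed)
V_RESET = 0.0    # resting potential after a spike

def total_current(spike_counts, weights, i_ref=I_REF):
    """Equation (4): sum of weighted currents from all input synapses.
    Positive weights are excitatory, negative weights inhibitory."""
    return sum(f * w * i_ref for f, w in zip(spike_counts, weights))

def fire_and_reset(v_mem, v_th=V_TH, v_reset=V_RESET):
    """Evoke an output spike and reset Vmem once it reaches Vth."""
    if v_mem >= v_th:
        return True, v_reset
    return False, v_mem

i_total = total_current([1, 1, 2], [15, -3, 4])  # mixed excit./inhib. inputs
spiked, v = fire_and_reset(0.6)
print(i_total, spiked, v)
```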

Optimization of SNN Architecture
The proposed SNN SoC architecture adopts a spike signal representation called Binary Streamed Rate Coding (BSRC), introduced in [18]. It has been shown that BSRC allows for an optimized SNN model for compact hardware and employs direct training for high accuracy based on an off-chip training technique. This direct training technique uses the exact model of the SNN hardware implementation to determine floating-point weights by propagating spike signals in the form of binary streams through the synapses and neurons of each layer of the SNN model. Afterward, the floating-point weights are quantized into 5-bit integer weights, which are sufficient to achieve the target accuracy of 96% or higher on the MNIST dataset.
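The quantization step can be sketched as follows. The scaling rule below (linear mapping to the signed range −15..+15 of a 5-bit sign-magnitude weight) is an assumption for illustration; the paper's exact quantizer may differ.

```python
# Hedged sketch of quantizing a floating-point weight into the 5-bit
# sign-magnitude range the synapse stores (1 sign bit + 4 magnitude
# bits, i.e. integers -15..+15). Linear scaling is assumed.

def quantize_5bit(w_float, w_max):
    """Map a float weight in [-w_max, w_max] to an integer in [-15, 15]."""
    q = round(w_float / w_max * 15)
    return max(-15, min(15, q))      # clamp out-of-range weights

print([quantize_5bit(w, 1.0) for w in (-1.2, 0.0, 0.33, 1.0)])  # -> [-15, 0, 5, 15]
```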
The overall architecture of the proposed SNN chip implementation is shown in Figure 2. The architecture, after optimization using the BSRC SNN model, is composed of four fully connected layers, namely (196-30-20-10), integrating synapse circuits, neuron circuits, and flip-flops for storing image pixel and weight values. The first input layer (IPL) consists of 196 neurons that receive 196 individual grayscale pixels of the input image of size 14 × 14. The two hidden layers (HL1, HL2) consist of 30 and 20 neurons, respectively, while the output layer (OPL) of 10 neurons classifies each handwritten digit image of the MNIST dataset into the ten digits from 0 to 9. A total of 6680 synapses (196 × 30 + 30 × 20 + 20 × 10) fully connect the 256 neurons of consecutive layers. The 14 × 14 pixels of each image from the MNIST dataset are provided to the IPL synapses, which convert each pixel value into a stream of spike signals. As a result, the IPL generates 196 spike signal trains, each consisting of 1 to 15 spike pulses. The proposed architecture distributes the weight memories to all the synapses: small registers are collocated with the associated synapse circuits in each layer. The register in the IPL stores a 5-bit pixel value, while the registers in the remaining layers keep the weight values. The proposed architecture, with distributed memory, benefits from minimal routing for high-speed processing and lower power consumption. The 10 neurons of the OPL produce output spike trains. The proposed architecture has a digital controller (DC) that counts the number of spike pulses produced by each neuron of the OPL and classifies the image based on which output neuron gives the maximum spiking activity.
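The neuron and synapse totals above follow directly from the layer sizes, as this short sanity-check sketch shows:

```python
# Sanity check of the (196-30-20-10) fully connected sizing described
# above: neuron and synapse counts follow directly from the layer sizes.

layers = [196, 30, 20, 10]  # IPL, HL1, HL2, OPL

neurons = sum(layers)                                      # total neurons
synapses = sum(a * b for a, b in zip(layers, layers[1:]))  # per layer pair

print(neurons, synapses)  # -> 256 6680
```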

Analog SNN SoC Implementation
The four-layer SNN model in Figure 2 has been constructed from 6680 synapse circuits interconnecting different neurons. To keep size and power consumption low, a prototype synapse and neuron circuits were proposed earlier [13]. These circuits serve as a building block for our SNN hardware implementation in this work.
The neuron circuit is established on the LIF model of the neuron, as shown in Figure 3. A Metal-Insulator-Metal Capacitor (MIMCAP) Cmem of 10 fF is used to realize the neuronal membrane, which accumulates all the input synaptic currents to produce Vmem. A MIMCAP realizes Cmem to ensure optimum linearity and minimal power consumption. The comparator, a crucial part of the LIF model, determines when to fire output spikes using the condition Vmem ≥ Vth. The proposed asynchronous Schmitt Trigger circuit [13] serves as a comparator and benefits from an optimum Vth, high sensitivity, and low area and power consumption. The comparison of Vmem and Vth determines whether or not the Schmitt Trigger fires an output spike. When the Schmitt Trigger fires an output spike, the feedback path resets the membrane potential to the initial potential (Vreset). The leakage resistance and reset feedback path are implemented by an NMOS constant current source and switches, respectively. Four output buffer stages are implemented, each 4× larger than the previous, to drive the synapses of the next layer. The neuron circuit in [13] offers a more optimal and compact structure than that in [14].

The synapse circuit designed for the proposed SNN, as pictured in Figure 4, is a 5-bit binary-weighted current mirror structure named a "Current Multiplier Charge Injector" (CMCI). The circuit embodies two symmetric and complementary portions: an excitatory portion made up of four binary-weighted NMOS branches, and an inhibitory portion made up of four binary-weighted PMOS branches. Each synapse calculates either a positive current amount modeling an excitatory action or a negative current amount modeling an inhibitory action. Upon receiving an input spike event, the weight of each synapse determines the amount of current to be injected or ejected. The sign bit of the 5-bit weight parameter determines whether the synapse undertakes an excitatory or inhibitory action. The 4 LSBs of the weight value then determine the amount of binary-weighted current to be injected (excitatory) or ejected (inhibitory). The binary-weighted current is enabled by each weight bit connected to each branch of the synaptic circuit. The current mirroring transistors multiply the current of the right half by 1× and of the left half by 4×, thus forming binary-weighted currents using four branches that are turned on or off by the weight bits w3, w2, w1, and w0. The final current of the synapse circuit is accumulated on or ejected from the membrane potential capacitor. The splitting of the synapse circuit into symmetric right and left halves and the current multiplication reduce the size of the proposed CMCI synapse circuit by 60% compared to conventional binary-weighted circuits. Moreover, flip-flops are utilized in conjunction with each synapse circuit to store pre-trained weight values. The CMCI synapse consumes less area and power than the previous synapse circuit proposed in [14] while providing higher design scalability for higher resolutions.
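The weight decoding of the CMCI synapse can be summarized with a behavioral sketch: the MSB selects excitatory or inhibitory action, and the 4 LSBs gate binary-weighted current branches. The unit current below is an assumed illustrative value, not the fabricated bias current.

```python
# Behavioral sketch of the 5-bit CMCI synapse: bit 4 is the sign
# (MSB = 1: excitatory), bits 3..0 (w3..w0) gate binary-weighted
# branches. In silicon, the 8/4/2/1 ratios are formed by mirroring
# the left half at 4x and the right half at 1x.

I_UNIT = 1e-9  # current of the smallest (1x) branch, assumed

def cmci_current(weight_5bit):
    """Signed current injected per input spike for a 5-bit
    sign-magnitude weight."""
    sign = 1 if (weight_5bit >> 4) & 1 else -1   # MSB=1: inject (excitatory)
    magnitude = weight_5bit & 0xF                # w3 w2 w1 w0 -> 0..15
    return sign * magnitude * I_UNIT

print(cmci_current(0b11111))   # maximum excitatory: +15 * I_UNIT
print(cmci_current(0b01111))   # maximum inhibitory: -15 * I_UNIT
```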
One neuron cell comprising one synapse circuit and a neuron circuit was simulated, and the results are shown in Figure 5, which illustrates input spikes, accumulation of Vmem, and evoking of output spikes for different values of 5-bit weights. For this simulation, the synapse circuit first receives a 5-bit weight parameter, which is pre-determined by the training process of the SNN system of our concern. As shown in Figure 5a, a sequence of 15 input spike pulses acting as an enable signal is supplied to the CMCI synapse. As can be seen in Figure 5b,c, for a synapse with an excitatory weight (MSB = 1), each input spike makes the generated current charge up Vmem. Once Vmem exceeds the pre-determined threshold Vth, the neuron circuit fires an output spike. For example, if the weight value is +15, a single input spike generates the highest current amount, which rapidly charges Vmem. On the other hand, for a synapse with an inhibitory weight (MSB = 0), each input spike makes the current discharge Vmem. Therefore, an inhibitory weight refrains Vmem from evoking an output spike. After the neuron evokes an output spike, it resets Vmem to Vreset. In case of either no input spike or a weight value of 0, Vmem remains at the same value or decreases over time due to the leaky integration function of the LIF neuron.

Digital SNN SoC Implementation
In addition to the analog implementation of the SNN shown in Figure 2, we also implemented a full-digital design based upon the BSRC SNN model using Verilog HDL for the purpose of comparing the two implementations in terms of area and power consumption. Like the analog SNN implementation discussed above, the input layer of the digital SNN comprises 196 synapses to convert pixel values into spike signal trains. The digital SNN is constituted of multiple synapses connected to neuron logic that stores its membrane potential in its register memory. When an input spike arrives at a particular synapse, its weight value is accumulated in the membrane register. The neuron evokes an output spike when the value in the membrane register reaches a threshold value and then resets the membrane register. A discrete spike-event strobe signal, produced by dividing the system clock by a known factor, triggers the operation of neurons and synapses at its rising edge.
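The accumulate/fire/reset behavior of the digital neuron described above can be sketched behaviorally (the Verilog design itself is not reproduced here; the threshold value is illustrative):

```python
# Behavioral sketch of the digital SNN neuron: each strobe accumulates
# the weights of active synapses into a membrane register; reaching the
# threshold evokes an output spike and resets the register.
# The threshold value is illustrative.

class DigitalNeuron:
    def __init__(self, threshold=64):
        self.membrane = 0          # membrane potential register
        self.threshold = threshold

    def on_strobe(self, spikes, weights):
        """One spike-event strobe: accumulate weights of active synapses,
        then fire and reset if the threshold is reached."""
        self.membrane += sum(w for s, w in zip(spikes, weights) if s)
        if self.membrane >= self.threshold:
            self.membrane = 0      # reset on firing
            return 1               # output spike
        return 0

n = DigitalNeuron(threshold=20)
out = [n.on_strobe([1, 1], [15, -3]) for _ in range(3)]
print(out)  # -> [0, 1, 0]
```

Each strobe here adds 15 − 3 = 12, so the register crosses the threshold of 20 on the second strobe, fires, and resets.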

Analyzing Analog SNN
The proposed SNN architecture shown in Figure 2 was implemented and fabricated using a 65 nm CMOS process design kit. The entire chip layout of the analog SNN integrated with pads is highlighted and demarcated in the micrograph of the fabricated chip in Figure 6. The four fully connected layers of the SNN are tagged as IPL (input layer), HL1 (hidden layer 1), HL2 (hidden layer 2), and OPL (output layer), along with a DC (digital controller). To minimize the chip size, all the layers are aligned abreast to curtail the routing, which leads to an analog SNN chip with an active core area of 0.96 mm2.

The measurement setup, along with the printed circuit board (PCB) for the fabricated analog SNN, is shown in Figure 7a. The input image, weight data, and configuration parameters are provided to the SNN chip by a host CPU board (a Raspberry Pi 4 in our measurement setup) via a serial peripheral interface (SPI). Once configured, the on-chip digital controller (DC) takes weight values from the host CPU and stores them in the weight memories, which are connected to the synapses. It then takes input pixel data and converts them into input spike signal pulses based on the BSRC coding described in [18]. Then, the spike signals propagate from the input layer IPL through each layer, eventually reaching the output of the output layer OPL. The DC counts the spiking activity of the ten outputs of the OPL, converts it into a digital value, and forwards it to the host CPU board for further estimation of classification.
The outputs of the SNN chip are also measured using a logic analyzer, as shown in Figure 8a,b. In Figure 8a, one input spike is propagated to the first output node through the first neuron of all four layers when the maximum weight value of 15 is written to the first neuron's synapse of every layer and a weight value of zero to the rest of the synapses. Similarly, as shown in Figure 8b, keeping the same weight configuration as the former measurement, 15 input spikes are propagated to the first output node. These measured results demonstrate the successful spike propagation from the input layer to the output nodes of the fabricated analog SNN chip. The average power consumption of the analog SNN chip was 530 µW when measured using the MNIST dataset at a clock frequency of 10 MHz. We also calculate energy per spike, which is the total energy consumed for propagating all input spike events divided by the total number of input spike events. In the test of the proposed SNN chip shown in Figure 8a, which is operated at 10 MHz, a single input spike takes 2.5 µs for propagation, thus consuming 1.325 nJ of energy per spike.
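The energy-per-spike figure quoted above follows from the measured power and the propagation time, as this cross-check shows:

```python
# Cross-check of the measured analog figures: energy per spike equals
# average power times the propagation time of one input spike.

p_avg = 530e-6     # measured average power, W
t_spike = 2.5e-6   # propagation time of one input spike, s

e_spike = p_avg * t_spike   # about 1.325 nJ per spike
print(e_spike)
```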

Comparison with Digital SNN Chip
The full-digital SNN chip described in Section 3 has been implemented in FPGA for comparison purposes. The FPGA board is measured by employing the test setup shown in Figure 7b, wherein a host CPU board (Raspberry Pi 4) provides input image and weight values via an SPI. To test the digital SNN FPGA, the on-chip controller generates stimulus input spike signals as a sequence of digital pulses. The output spiking activities at the ten output nodes are forwarded to the host CPU for further classification, as in the case of the analog SNN chip. These outputs of the digital SNN are taken out of the FPGA and observed via a logic analyzer, as shown in Figure 9. Here, the spiking activities of the ten output nodes are visualized for correct and failed classification against each input test image. The digital SNN occupies an area of 0.75 mm2 and consumes 117 mW of power, as estimated by Synopsys Design Compiler using the same 65 nm CMOS process as the analog SNN. From the measurement results of Figure 9, wherein the digital SNN operates at 10 MHz, a single input spike takes 2.4 µs for propagation, thus consuming 4.660 nJ of energy per spike. Moreover, the digital SNN takes 8 µs per image to complete, thus consuming 936 nJ of energy per image. From this, the energy per image for the analog SNN can be calculated as 261 nJ.
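Two of the quoted comparison figures can be cross-checked directly from the stated power and timing numbers:

```python
# Cross-check of the digital-vs-analog comparison: digital energy per
# image and the power ratio quoted in the abstract (~200x).

p_digital = 117e-3   # W, digital SNN (synthesis estimate)
p_analog = 530e-6    # W, analog SNN (measured)
t_image = 8e-6       # s per image, digital SNN

e_image_digital = p_digital * t_image   # about 936 nJ per image
power_ratio = p_digital / p_analog      # about 220x, i.e. ~200x lower

print(e_image_digital, power_ratio)
```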

Performance Analysis
To evaluate the performance of the implemented SNN chips, we chose the SNN model for the digit image classification task based on the MNIST dataset of handwritten digits. MNIST consists of 50,000 training and 10,000 test images. Using the BSRC encoding [18], we optimized the SNN model and trained it in a Python framework. Afterward, the SNN chip utilized these trained weights for inference using the MNIST dataset. The SNN hardware, like its software counterpart, attained the targeted average classification accuracy of 96.56%. Table 2 compares the performance of the proposed SNN chip with previous works [5,11,19] and its predecessor chip [14]. A figure of merit, namely "complexity", is defined as the total number of weights, which allows for a fair comparison of the physical dynamics of different neuromorphic chips implemented for various applications. We then calculated the area and power efficiencies (η) by dividing the complexity (number of weights), respectively, by the consumed area and power. The proposed SNN implementation outperformed the previous works [5,11,19] and its predecessor chip [14] in terms of area, power, and accuracy while consuming slightly more energy. The previous work [5], based on a radial basis function network (RBFN) with multilayer perceptron (MLP), incorporates a compact analog core to provide good area efficiency. It, however, suffers from poor power efficiency due to its recursive operations. The neuromorphic chip called HICANN-DLS [11] incorporates an array of neuron cells based on compact LIF with wide tunable parameters, which benefits from its low energy per spike of 790 pJ. Its total power consumption, however, is excessively high, leading to very low power efficiency (598 times lower than the proposed work). In addition, it shows significantly lower area efficiency (24 times lower than the proposed work).
Compared with the predecessor work [14], the proposed work offers an improvement of 7.5 and 4.0 times in terms of area and power efficiency, respectively.
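The complexity-based figures of merit defined above can be computed from this work's own numbers (6680 weights, 0.96 mm2, 530 µW):

```python
# Sketch of the "complexity" figures of merit: area and power
# efficiency are complexity (total weights) divided by consumed
# area and power, using this work's reported numbers.

complexity = 6680        # total number of weights
area_mm2 = 0.96          # active core area
power_uw = 530.0         # average power, uW

eta_area = complexity / area_mm2    # weights per mm^2, ~6958
eta_power = complexity / power_uw   # weights per uW, ~12.6

print(eta_area, eta_power)
```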

Conclusions
This paper proposes a spiking neural network SoC hardware implementation optimized for area and power consumption. The four-layer SNN is constituted of compact synapse and neuron circuits, occupies a die area of 0.96 mm2, and consumes 530 µW of power. The SNN chip successfully achieves the targeted classification accuracy of 96.56% while consuming 1.325 nJ of energy per spike. The SNN can be easily extended to higher resolutions and numbers of classes, making it a suitable candidate for mobile applications. It will also expand the IoT horizon to cover an even more comprehensive range of applications (currently deemed impractical), due to the improvement in the ratio of performance to power consumption.