Implementation of Field-Programmable Gate Array Platform for Object Classification Tasks Using Spike-Based Backpropagated Deep Convolutional Spiking Neural Networks

This paper investigates the performance of deep convolutional spiking neural networks (DCSNNs) trained using spike-based backpropagation techniques. Specifically, the study examined temporal spike sequence learning via backpropagation (TSSL-BP) and surrogate gradient descent via backpropagation (SGD-BP) as techniques for training DCSNNs that are deployed on a field-programmable gate array (FPGA) platform for object classification tasks. The primary objective of this experimental study was twofold: (i) to determine the most effective backpropagation technique, TSSL-BP or SGD-BP, for deeper spiking neural networks (SNNs) with convolution filters across various datasets; and (ii) to assess the feasibility of deploying DCSNNs trained using backpropagation techniques on low-power FPGAs for inference, considering potential configuration adjustments and power requirements. These objectives are intended to inform researchers and companies in this field about the limitations and unique perspectives of deploying DCSNNs on low-power FPGA devices. The contributions of the study are threefold: (i) the design of a low-power FPGA board featuring a deployable DCSNN chip suitable for object classification tasks; (ii) the inference of TSSL-BP and SGD-BP models with novel network architectures on the FPGA board for object classification tasks; and (iii) a comparative evaluation of the selected spike-based backpropagation techniques and the object classification performance of DCSNNs across multiple metrics using both public (MNIST, CIFAR10, KITTI) and private (INHA_ADAS, INHA_KLP) datasets.


Introduction
Innovations in artificial neural networks [1,2] and vision-based technologies [3][4][5] have spawned intelligent applications in a variety of fields [6][7][8][9][10][11], despite some limitations in low-cost computing [12]. Owing to developments in spiking neural networks (SNNs), brain-inspired neuromorphic processing units have gained popularity [13]. In SNNs, neurons communicate with one another by means of spike patterns, transmitting spike information from the input neuron to other interconnected neurons, and ultimately to the output neuron [14]. SNNs have been utilized as neuromorphic processing units for artificial intelligence tasks requiring efficient energy consumption [15]. SNNs are considered the third generation of artificial neural networks (ANNs) and possess nearly the same computational capability as ANNs [16]. SNNs are distinguished from ANNs predominantly by their discrete output spikes: whereas ANN neurons respond with continuous values, SNN neurons emit discrete pulses that are typically repetitive and driven by the membrane potential [17][18][19][20]. The operation of SNNs is defined by the leaky integrate-and-fire (LIF) neuron, which depends on the membrane potential dynamics of the neuron. Presently, neuromorphic processors such as IBM's TrueNorth and Intel's Loihi enable researchers to deploy spiking neural networks with a performance comparable to that of PC-based convolutional neural networks [21,22]. However, these neuromorphic hardware devices are costly in terms of both initial price and maintenance [23]. This has encouraged numerous researchers to implement spiking neural networks on inexpensive field-programmable gate array (FPGA) boards [24][25][26][27][28].

Motivation
To operate object classification and detection algorithms, a number of mid-sized businesses favor low-power chips, particularly in the autonomous vehicle industry. Based on our previous interactions with such companies, their primary need is to acquire a low-cost, low-power computing platform with considerable accuracy and processing speed for their autonomous systems. Typically, the final autonomous system design for these companies includes redundant benchmarking protocols to evaluate the precision and processing speed provided by a low-power computing platform. In the future, these mid-sized companies might use neural networks for power analysis [29] and electrical load prediction [30,31] to compare low-power FPGAs with other computing platforms. Therefore, there is a need to report the shortcomings and borderline advantages of deploying powerful DCSNNs on low-power FPGAs, where higher accuracy is attained at the expense of processing latency.

Purpose of Study
The main objective of this study was to report on, and assist mid-level autonomous vehicle manufacturers with, the potential deployment of low-power networks such as DCSNNs for deep learning tasks on low-power FPGA boards. In addition, hybrid networks such as DCSNNs could match the accuracy of their ANN counterparts in object classification tasks on a variety of public and private datasets. The research conducted in this study on the deployment of DCSNNs on low-cost FPGA boards, together with the accuracy and processing latency measured on MNIST, CIFAR10, KITTI, INHA_ADAS, and INHA_KLP, could inform researchers and businesses in this field about the limitations and distinctive perspectives of this approach. The overall study analysis is depicted in Figure 1.

Literature Review
Multiple studies have been conducted to attain a comparable level of efficiency for SNNs using backpropagation techniques [32][33][34][35][36][37][38][39][40]. Some have utilized the timing information of the neuron action potential for backpropagation [34,41]. The limitation of these methodologies is that a reduction in the neuronal firing rates eventually leads to a decline in the network's capacity. To circumvent this issue, unsupervised learning techniques such as spike-timing-dependent plasticity (STDP) have been utilized to train SNNs. In [42,43], the STDP mechanism was used to design reward-modulated STDP for supervised learning, enabling networks to perform object recognition and autonomous tasks. However, these mechanisms have a high energy requirement and degrade the overall efficacy of the system when implemented on deeper SNNs [44,45]. Several of these studies employed spiking non-linearity to approximate the discontinuous spiking activation function, thereby flattening the activation function and rendering the SNN continuously differentiable [46]. Recent studies employing techniques such as surrogate gradient descent (SGD) [46][47][48] and temporal spike sequence learning (TSSL) [49] have substantially improved the consistency of the training process for deeper SNNs while maintaining the accuracy of the SNNs at the same level as that of ANNs on standard PC hardware.
The authors of [47] conducted a few exploratory analyses on SGD-BP SNNs executed on PCs in conjunction with NVIDIA TX2 embedded platforms. Similarly, a number of deeper SNN architectures trained with SGD-BP were developed and implemented on an embedded board [48,50]. In addition, a literature review of a few potential hybrid networks (involving SNNs with convolutions) on the FPGA platform was conducted to assess the scope of this study; the pertinent information is presented in Table 1. The networks compared in the literature review table alongside the proposed study are all either SNNs or hybrid networks (convolutional SNNs). However, no known study except for [49] has utilized deeper convolutional layers coupled with either integrate-and-fire (IF) or leaky integrate-and-fire (LIF) SNN layers. This combination is very powerful, as it harnesses the power of a convolutional filter alongside the spiking mechanism of IF or LIF neurons. When using low-power FPGA boards, as in several studies [49][51][52][53][54][55][56], it is challenging to balance both the deeper convolutions and spiking mechanisms. The current study was able to overcome several of the challenges faced by other works because of the following:

• We hosted deeper convolutions alongside SNNs with very few parameters compared to [49] and were still able to achieve similar accuracy over the MNIST and CIFAR10 datasets.

• We employed both real-valued and Poisson distribution spikes as input encoding schemes to capture most of the information before processing them through DCSNNs, which were not used in [49][51][52][53][54][55][56].

• We tested the DCSNNs on automotive-relevant datasets such as KITTI, INHA_ADAS, and INHA_KLP as opposed to just MNIST and CIFAR10, as was the case in [49][52][54][55][56].

• We customized the proposed SGD-BP to fit the low-power needs of several target medium-sized intelligent vehicle industries in the form of FPGA implementation while preserving accuracy.

Spiking Neuron Model
The spiking neural network employed in this study was constructed by adapting the LIF neuron model [13]. According to the LIF neuron model, the input spike train flows from the presynaptic neuron v to the postsynaptic neuron u. The input spike train can be denoted by $s_v(t) = \sum_{t_v^{(f)}} \delta\big(t - t_v^{(f)}\big)$, where $t_v^{(f)}$ represents the firing time of the presynaptic neuron v. The postsynaptic current $J_v(t)$ is produced from the incoming spikes through the synaptic connection between neuron v and neuron u. The membrane potential voltage $P_u(t)$ for the postsynaptic neuron u at a given time t is represented by

$$\tau_p \frac{dP_u(t)}{dt} = -P_u(t) + R_o \sum_{v} Q_{uv} J_v(t) + r_u(t) \tag{1}$$

where $R_o$ is the leaky resistance of the LIF neuron, $\tau_p$ is the membrane potential time constant, $Q_{uv}$ is the weight of the synaptic connection between the presynaptic and postsynaptic neurons, $J_v(t)$ is the postsynaptic current induced by the presynaptic neuron spike, and $r_u(t)$ is the reset mechanism in the spiking activity. The postsynaptic current and the reset mechanism can be denoted as

$$J_v(t) = (\alpha * s_v)(t), \qquad r_u(t) = (\beta * s_u)(t) \tag{2}$$

where $\alpha(\cdot)$ and $\beta(\cdot)$ are the response mechanism kernel and reset mechanism kernel, respectively. Accordingly, the first-order spike response is denoted in conjunction with a synaptic time constant $\tau_s$ as

$$\tau_s \frac{dJ_v(t)}{dt} = -J_v(t) + s_v(t) \tag{3}$$

The membrane potential is reduced by the reset mechanism for each neuron firing period by a specific firing equilibrium value. By applying the Euler method to (1), with the ratio of $R_o$ to $\tau_p$ absorbed into the synaptic weights, the membrane potential is simplified as

$$P_u[t] = \left(1 - \frac{1}{\tau_p}\right) P_u[t-1] + \sum_{v} Q_{uv} J_v[t] + r_u[t] \tag{4}$$

The overall firing mechanism is then followed by the reset scheme to obtain the output of the firing neuron as

$$s_u[t] = H\big(P_u[t] - V_{Eq}\big) \tag{5}$$

where $V_{Eq}$ is the firing equilibrium or threshold, and $H(\cdot)$ is the step function.
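
As an illustration of the discretized LIF dynamics described above, the following minimal Python sketch simulates a single neuron. It is not the authors' implementation: the function name `simulate_lif`, the default values of `tau_p` and `v_eq`, and the choice to reset the potential to zero after a spike (rather than subtracting a reset kernel) are assumptions made for the example, and the input is taken to be the already-weighted synaptic current.

```python
from typing import List


def simulate_lif(input_current: List[float], tau_p: float = 4.0, v_eq: float = 1.0) -> List[int]:
    """Simulate one discrete-time LIF neuron and return its 0/1 spike train.

    input_current[t] stands for the weighted synaptic input at step t
    (hypothetical values; tau_p and v_eq are illustrative defaults).
    """
    potential = 0.0
    spikes: List[int] = []
    for current in input_current:
        # Euler step: leak the previous potential, then add the input current.
        potential = (1.0 - 1.0 / tau_p) * potential + current
        # Step function H(P - V_Eq): fire when the potential exceeds the threshold.
        fired = 1 if potential > v_eq else 0
        spikes.append(fired)
        if fired:
            # Assumed reset scheme: return the potential to zero after firing.
            potential = 0.0
    return spikes


if __name__ == "__main__":
    # A constant sub-threshold input accumulates until the neuron fires.
    print(simulate_lif([0.5] * 8))
```

With a constant input of 0.5 and these defaults, the potential needs three steps to cross the threshold, so the neuron fires periodically rather than at every step.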

Deep Convolutional Spiking Neural Networks (DCSNNs)
The combination of convolutional kernels and pooling layers with spiking neural network components results in DCSNNs. This study employed such architectures with additional layers to perform classification tasks. Training occurs as the input spikes pass through the various layers. When the input spike train is processed by the filters in the convolutional layers, the input current is estimated. At each time step, the input current determines the membrane potential $P_u(t)$ of the neuron. When the neuron's $P_u(t)$ exceeds the threshold value $V_{Eq}$, the neuron spikes and its membrane potential reverts to its initial value of zero. Otherwise, the value of $P_u(t)$ is carried over, with residual leakage, to the subsequent time steps. The pooling layers in the DCSNN function similarly to those in ANNs; however, the spike representation of the input image corresponding to spatial information is governed by either average [53,58] or maximum [59] pooling. A schematic of DCSNNs is depicted in Figure 2. These factors contribute to the stability of the training of deep convolutional spiking neural networks.
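
The interplay between a convolutional filter and the spiking threshold can be sketched as follows. This is a simplified, single-channel illustration under stated assumptions, not the network used in the study: the helper names `conv2d_valid` and `spiking_conv_layer`, the kernel values, and the defaults for `tau_p` and `v_eq` are hypothetical, and pooling is omitted.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(frame: Grid, kernel: Grid) -> Grid:
    """Plain 'valid' 2D cross-correlation of one frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [
        [
            sum(frame[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def spiking_conv_layer(
    frames: List[Grid], kernel: Grid, tau_p: float = 4.0, v_eq: float = 1.0
) -> List[List[List[int]]]:
    """Feed a sequence of input spike frames through conv -> LIF neurons.

    Returns one 0/1 output spike map per time step.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frames[0]) - kh + 1, len(frames[0][0]) - kw + 1
    potential = [[0.0] * out_w for _ in range(out_h)]
    outputs = []
    for frame in frames:
        # The convolution of the input spikes gives the input current.
        current = conv2d_valid(frame, kernel)
        spikes = [[0] * out_w for _ in range(out_h)]
        for r in range(out_h):
            for c in range(out_w):
                # Leaky integration of the current into the membrane potential.
                potential[r][c] = (1.0 - 1.0 / tau_p) * potential[r][c] + current[r][c]
                if potential[r][c] > v_eq:
                    spikes[r][c] = 1
                    potential[r][c] = 0.0  # revert to zero after the spike
        outputs.append(spikes)
    return outputs


if __name__ == "__main__":
    frames = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    kernel = [[0.15, 0.15], [0.15, 0.15]]
    for step, spike_map in enumerate(spiking_conv_layer(frames, kernel)):
        print(step, spike_map)
```

In this toy setting each output neuron receives a current of about 0.6 per step, so it stays silent at the first step, fires at the second once the leaked potential plus the new current exceeds the threshold, and then starts again from zero.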

TSSL-BP for DCSNNs
The temporal spike loss function proposed in [49] was formulated as the sum of the squared error with respect to each time step for all neurons. This enabled the calculation of the difference between the desired spikes $D_{sp} = [D_{sp}|_{t=t_0}, D_{sp}|_{t=t_1}, \ldots, D_{sp}|_{t=t_{N_t}}]$ and the produced (actual) spikes $S_{sp} = [S_{sp}|_{t=t_0}, S_{sp}|_{t=t_1}, \ldots, S_{sp}|_{t=t_{N_t}}]$, where $D_{sp}|_t$ and $S_{sp}|_t$ are the desired and produced (actual) firing events, respectively, at time t for the output neurons, with the number of total time steps being $N_t$. The temporal spike loss function was calculated as

$$L_{temp.sp} = \sum_{n=0}^{N_t} \xi_{TSSL}[t_n] = \sum_{n=0}^{N_t} \varepsilon\big(D_{sp}|_{t_n}, S_{sp}|_{t_n}\big) \tag{6}$$

where $\xi_{TSSL}[t]$ represents the error at time t, and $\varepsilon(\cdot)$ represents a function that yields the Van Rossum difference between $D_{sp}|_t$ and $S_{sp}|_t$.
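
A Van Rossum style loss of this kind can be sketched by low-pass filtering both spike trains and summing the squared differences over the time steps. The sketch below is illustrative only: the exponential kernel, the value of `tau_s`, the factor of 0.5, and the function names are assumptions, not details confirmed by the text.

```python
from typing import List


def filter_spikes(spikes: List[int], tau_s: float = 2.0) -> List[float]:
    """Low-pass filter a 0/1 spike train with a first-order (exponential) kernel."""
    trace = 0.0
    filtered: List[float] = []
    for s in spikes:
        # Discrete first-order response: decay the trace, then add the new spike.
        trace = (1.0 - 1.0 / tau_s) * trace + s / tau_s
        filtered.append(trace)
    return filtered


def temporal_spike_loss(desired: List[int], produced: List[int], tau_s: float = 2.0) -> float:
    """Sum over time steps of the squared difference between the filtered trains."""
    fd = filter_spikes(desired, tau_s)
    fp = filter_spikes(produced, tau_s)
    return sum(0.5 * (d - p) ** 2 for d, p in zip(fd, fp))


if __name__ == "__main__":
    desired = [0, 1, 0, 0]
    print(temporal_spike_loss(desired, desired))       # identical trains
    print(temporal_spike_loss(desired, [0, 0, 0, 0]))  # the desired spike is missing
```

Because the comparison is made on filtered traces rather than raw spikes, the loss depends on when the spikes occur and not only on how many there are.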

SGD-BP for DCSNNs
The loss in surrogate gradient descent is defined as an integral over the time steps, in which the difference between the desired spikes $D_{sp} = [D_{sp}|_{t=t_0}, D_{sp}|_{t=t_1}, \ldots, D_{sp}|_{t=t_{N_t}}]$ and the actual spikes $S_{sp} = [S_{sp}|_{t=t_0}, S_{sp}|_{t=t_1}, \ldots, S_{sp}|_{t=t_{N_t}}]$ is coupled with the membrane potential $P(t_n)$ [46,60]; this yields the SGD error $\xi_{SGD}[t]$ at time t between $D_{sp}|_t$ and $S_{sp}|_t$ (Equation (7)). The membrane potential $P(t_n)$ influences the loss function in the case of SGD, as the growth in membrane potential corresponds to a reduction in loss when the output spike is absent. The Van Rossum distance is used to achieve stable control of the loss function. Additionally, the DCSNN uses convolutional kernels, which were taken into consideration according to previous studies [38] to further optimize the loss function. The entire loss function with respect to the total time period $N_t$ was calculated in the presence of the convolutional kernel $a(t) = 1$ (Equation (8)). Considering the Heaviside step function H, and with $Z_{gt} \to [0, 1]$ being the ground-truth labels for classification and $Z = \int_0^{N_t} S_{sp}(t_n)\,dt_n$ being the actual output of the network, the loss function could be modified accordingly (Equation (9)). The membrane potential and errors associated with the different loss expressions presented in (7)-(9) could be combined using the background presented in [46] (Equation (10)). The effect of the surrogate gradient with respect to the membrane potential on the loss function was derived in [46,60] such that the differentiable output of the actual spikes was directly associated with a function of the membrane potential coupled with the change in the membrane potential (Equation (11)). The function of the membrane potential was the combination of the hyperparameter $\chi$, the gradient thickness $c$, and the difference between the membrane potential and the equilibrium threshold, $f(P(t_n)) = \chi / [1 + c(P(t_n) - V_{Eq})]^2$; substituting this expression into (11) gives the final form of the gradient (Equation (12)).
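
The role of the surrogate function of the membrane potential can be illustrated with a short sketch: the forward pass keeps the hard step function, while the backward pass replaces its derivative with a smooth function that peaks at the threshold. The function names and default values of `chi`, `c`, and `v_eq` are hypothetical. The absolute value in the denominator is an assumption borrowed from the commonly used fast-sigmoid surrogate; the expression in the text is written without it.

```python
def spike_forward(potential: float, v_eq: float = 1.0) -> int:
    """Forward pass: Heaviside step on the membrane potential (non-differentiable)."""
    return 1 if potential > v_eq else 0


def surrogate_gradient(potential: float, v_eq: float = 1.0, chi: float = 1.0, c: float = 5.0) -> float:
    """Backward pass: smooth stand-in for the derivative of the step function.

    Evaluates chi / (1 + c * |P - V_Eq|)^2. The absolute value is an assumption
    (see the note above); chi scales the gradient and c sets how sharply it
    decays away from the threshold.
    """
    return chi / (1.0 + c * abs(potential - v_eq)) ** 2


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(p, spike_forward(p), surrogate_gradient(p))
```

The gradient is largest for potentials close to the threshold, so weight updates concentrate on neurons that were close to firing or had just fired.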

FPGA Design and Data Processing
The external view of the FPGA board with the on-chip SNN processor used in this investigation, which was intended to host spiking neural networks, is depicted in Figure 3. The external components of the FPGA board comprised JTAG, an SD card, flash memory, USB, UART, and SDRAM. The internal components consisted of a JTAG controller; an OpenRISC core; an SD card controller; a DNN accelerator; 512 KB of SRAM; and flash (SPI), SDRAM, USB, and UART controllers. The advanced microcontroller bus architecture (AMBA) was used to implement the master-slave AHB protocols. The overall block diagram of the FPGA design is depicted in Figure 4. The FPGA design elements, such as the maximum frequency, quantization, CLB LUTs and CLB registers utilized, DSPs, and BRAM, are shown in Table 2.

Flow of Data in FPGA Board
The training of the DCSNN was carried out on a PC with an NVIDIA GPU, whose processing power was leveraged for backpropagation. The data flow in the FPGA board is depicted in Figure 5. Both the training and testing datasets were deployed on the PC, for learning and inference, respectively, to obtain initial estimates. In the next step, the trained model weight file (.bin) was transferred to the SD card, which was then inserted into the FPGA board. The SD card created an environment inside the FPGA with arguments such as the header, APB set, layer parameters, and inputs, which allowed us to run the SNN model. The data prediction process based on the underlying spiking mechanism on the FPGA is shown in Figure 6. The highest spike count in a specific category provided the prediction of the corresponding class. Additionally, a GUI demo video visualizing the inference data transfer from the FPGA to the PC via the UART can be found using the following link: https://github.com/INHACVLAB/DCSNN-on-FPGA/blob/main/SNN%20Object%20Classification%20(KITTI%20Dataset)%20on%20FPGA.mp4, accessed on 29 September 2022. The spiking activity that has the highest numerical value corresponds to the predicted class, and the resultant membrane potential values relate to the firing neuron activity during the prediction of the class.
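
The spike-count decision rule described above can be sketched in a few lines. The data layout (`output_spikes[t][k]` holding the 0/1 spike of output neuron k at time step t) and the function names are assumptions for the example; ties are resolved in favor of the lowest class index, which the text does not specify.

```python
from typing import List, Sequence


def spike_counts(output_spikes: Sequence[Sequence[int]]) -> List[int]:
    """Count the spikes emitted by each output neuron over all time steps."""
    num_classes = len(output_spikes[0])
    return [sum(step[k] for step in output_spikes) for k in range(num_classes)]


def predict_class(output_spikes: Sequence[Sequence[int]]) -> int:
    """Return the index of the output neuron with the highest spike count."""
    counts = spike_counts(output_spikes)
    # max() returns the first maximum, so ties go to the lowest class index.
    return max(range(len(counts)), key=counts.__getitem__)


if __name__ == "__main__":
    # Three time steps, three output neurons (hypothetical spike raster).
    raster = [[0, 1, 0], [0, 1, 1], [1, 1, 0]]
    print(spike_counts(raster), predict_class(raster))
```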

DCSNN Architecture and Network Parameters
A significant number of papers have been published using the spike-timing-dependent plasticity (STDP) technique to deploy SNNs on an FPGA. However, STDP is not compatible with the backpropagation-based training used for DCSNNs. TSSL-BP is a well-designed backpropagation algorithm that supports significant open-source datasets. Therefore, TSSL-BP was evaluated and modified to match the FPGA deployment routines. SGD-BP, on the other hand, was chosen because we had already conducted a substantial amount of research utilizing this backpropagation technique, which allowed for better FPGA optimization. The TSSL-BP PC implementation and SGD-BP implementation codes can be found at https://github.com/INHACVLAB, accessed on 29 September 2022. The TSSL-BP-designed DCSNN architecture consisted of ten layers, which are listed in Table 3 along with the pertinent network characteristics. A total of 21,150,496 parameters are required for the processing of a TSSL-BP-coupled network. The ten layers of the SGD-BP-designed DCSNN architecture are listed in Table 4 along with the pertinent network characteristics. A total of 5,322,592 parameters are required for the processing of an SGD-BP-coupled network. In TSSL-BP, the input image is directly fed into the convolution filters, where real-valued spikes are processed during further training, as described in [49]. In contrast, SGD-BP encodes the input image using a Poisson distribution for the LIF neurons and simultaneously feeds the input image into the convolution and pooling layers. As stated in [47], the spike currents were calculated via a cumulative process incorporating LIF and convolutions, and the threshold was applied to the membrane potential.
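
Poisson input encoding of the kind mentioned for SGD-BP is often approximated by drawing, at every time step, an independent Bernoulli spike whose probability equals the normalized pixel intensity. The sketch below follows that common approximation; it is an assumption about the general technique rather than a description of the authors' encoder, and `poisson_encode`, `num_steps`, and `seed` are hypothetical names.

```python
import random
from typing import List


def poisson_encode(pixels: List[float], num_steps: int, seed: int = 0) -> List[List[int]]:
    """Rate-code normalized pixel intensities (0.0 to 1.0) as 0/1 spike frames.

    Each pixel spikes at each step with probability equal to its intensity,
    a Bernoulli approximation of a Poisson spike train.
    """
    rng = random.Random(seed)  # seeded for reproducible examples
    return [[1 if p > rng.random() else 0 for p in pixels] for _ in range(num_steps)]


if __name__ == "__main__":
    frames = poisson_encode([0.0, 0.5, 1.0], num_steps=10)
    rates = [sum(frame[i] for frame in frames) / len(frames) for i in range(3)]
    print(rates)  # firing rates approximate the pixel intensities
```

Brighter pixels therefore produce denser spike trains, and the average firing rate over many steps approaches the original intensity.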

Public and Private Datasets
The publicly available datasets included MNIST [61], CIFAR10 [62], and KITTI [63]. The MNIST dataset offers 10 digit classes from 0 to 9, and CIFAR10 contains the classes airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Similarly, the KITTI dataset consists of classes such as vehicle, cyclist, and pedestrian. Internally, cars, buses, and trucks were combined as vehicles for a generalized classification scenario, and the privately acquired INHA_ADAS dataset contains the same three classes of vehicle, cyclist, and pedestrian. The private datasets used in these experiments were designed for the sole purpose of classification tasks in the context of autonomous vehicle scenarios. The INHA_ADAS and INHA_KLP datasets were customized to support the experiments assessing the classification performance of the DCSNNs. The three classes of the INHA_ADAS dataset were kept in correspondence with the classes defined in the public KITTI dataset. Additionally, the INHA_KLP dataset was chosen and customized to test the inference capabilities of the DCSNNs on Korean license plates. A total of 50 classes were considered, combining the numbers 0 to 9, as in the public MNIST set, with 40 Korean alphabet classes, which provided a suitable scope to test the inference of the DCSNNs. The overall dataset specifications are stated in Table 5.

Performance Evaluations
The evaluations were carried out using mainly accuracy and processing time as fundamental performance criteria. The algorithms and networks were deployed on the FPGA for inference, and the corresponding accuracy and processing time metrics were collected for evaluation. The experiments were carried out on diverse datasets, and comparisons between the PC and FPGA with respect to TSSL-BP and SGD-BP were drawn. Usually, the mAP in traditional machine learning applications is calculated using a ground-truth bounding box. However, the mAP metric considered in this work was calculated according to the true-positive (TP) and false-positive (FP) precision values with respect to classes based on spike predictions. The mAP calculation is illustrated with a use case in Figure 7, where the spike count led to the predicted class and the mAP was thereby calculated from the average precision. Moreover, the average power consumption (in watts) was measured on a CPU (Intel i7-12700) alongside the Xilinx Kintex UltraScale FPGA (xcku115-flvf1924-2-i). Regarding the public dataset evaluations, Figure 8 shows the classification performance of the TSSL-BP- and SGD-BP-coupled DCSNNs on the MNIST dataset with 10 classes, and Table 6 presents the corresponding accuracy and processing time for the samples inferred on the PC and on the FPGA. Similarly, the confusion matrices in Figure 9 provide a glimpse of the classification performance of the TSSL-BP- and SGD-BP-coupled DCSNNs on the CIFAR10 dataset with 10 classes, with the corresponding PC and FPGA metrics presented in Table 7. Additionally, Figure 10 provides an overview of the classification performance of the TSSL-BP- and SGD-BP-coupled DCSNNs on the KITTI dataset with three classes, with the corresponding PC and FPGA metrics presented in Table 8. The latency of the DCSNN on the FPGA was calculated for the various datasets and was defined as the spiking FPGA inference latency relative to the spiking PC inference latency. For instance, in Table 6, the FPGA latency with respect to MNIST for TSSL-BP is 49 times that of the PC.
Regarding the private dataset evaluations, the confusion matrices in Figure 11 show the classification performance of the TSSL-BP- and SGD-BP-coupled DCSNNs on the INHA_ADAS dataset with three classes, and Table 9 presents the corresponding accuracy and processing time for the samples inferred on the PC and on the FPGA. Figure 12 shows the classification performance of the TSSL-BP- and SGD-BP-coupled DCSNNs on the INHA_KLP dataset with 50 classes, with the corresponding PC and FPGA metrics presented in Table 10.
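
The classification-style mAP described in this section, computed from per-class true positives and false positives, can be sketched as below. This is one plausible reading of the description, not the authors' evaluation script: the function name is hypothetical, and assigning a precision of zero to a class that is never predicted is an assumption.

```python
from typing import List


def mean_average_precision(true_labels: List[int], predicted_labels: List[int], num_classes: int) -> float:
    """Average the per-class precision TP / (TP + FP) and return it in percent."""
    precisions: List[float] = []
    for k in range(num_classes):
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if p == k and t == k)  # predicted k, truly k
        fp = sum(1 for t, p in pairs if p == k and t != k)  # predicted k, truly other
        # Assumption: a class that is never predicted contributes zero precision.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
    return 100.0 * sum(precisions) / num_classes


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]  # hypothetical spike-count predictions
    print(round(mean_average_precision(truth, preds, 3), 2))
```

In the hypothetical example the per-class precisions are 1/2, 2/3, and 1, which average to roughly 72.22%.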

Classification Accuracy
The classification accuracy of both backpropagation techniques is shown in Figure 13, where the green and purple column bar charts correspond to the performance of TSSL-BP and SGD-BP on the FPGA hardware, respectively. The classification accuracies of TSSL-BP on the datasets CIFAR10 and INHA_ADAS were higher than the classification accuracies of SGD-BP on the FPGA platform. However, the mAP metrics on the dataset INHA_KLP favored SGD-BP over TSSL-BP on the FPGA platform. Nevertheless, under the FPGA inference environment, the classification accuracies of both backpropagation techniques were similar on the datasets MNIST and KITTI.

Processing Time
Additionally, the inference processing times of both backpropagation techniques were measured in the FPGA environment. The green and purple line plots in Figure 14 correspond to the inference processing times of TSSL-BP and SGD-BP on the FPGA hardware, respectively. The processing times of TSSL-BP on almost all the datasets were higher than those of SGD-BP on the FPGA hardware. This suggests that, on the FPGA platform, SGD-BP generally classified a sample from these diverse datasets in less time than TSSL-BP.

Trade-Off between Accuracy and Processing Time
The trade-off between the accuracy and processing time must be investigated thoroughly, because considering either metric alone could lead to an ambiguous choice of backpropagation technique for coupling with DCSNNs deployed in classification tasks. For certain datasets with simple backgrounds and binary images as samples, the classification task can be considered less complex than in the case of color image samples with complex, cluttered backgrounds. The choice of hyperparameters and design options such as the dropout rate and batch normalization can also influence the overall classification accuracy. However, the employment of low-cost FPGA platforms for the classification task turns the network design into a complex set of restrictions. These hardware restrictions include the inability of the FPGA board to handle batch normalization, dropout, and several other features that are responsible for better accuracy. Therefore, instead of considering classification performance or processing time alone, it is appropriate to consider the trade-off between classification accuracy and processing time for a better choice of backpropagation technique. The trade-off metric can be estimated as a fraction, with the numerator being the classification accuracy and the denominator being the processing time. To attain a better trade-off, the mAP is required to be as high as possible and the processing time as low as possible. Accordingly, the trade-off is estimated by

$$\psi = \frac{Acc(mAP\%)}{PT} \tag{13}$$

where $\psi$ is the trade-off metric, $Acc(mAP\%)$ refers to the classification accuracy (mAP%), and $PT$ is the processing time in msec. This metric should be as high as possible. The comparisons between the trade-off metrics corresponding to TSSL-BP and SGD-BP on the FPGA and PC are shown in Figure 15. The blue and red column bar charts represent the trade-off metrics of TSSL-BP and SGD-BP in the PC environment, respectively. Similarly, the green and purple line plots represent the trade-off metrics of TSSL-BP and SGD-BP on the FPGA board, respectively.
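
The trade-off metric ψ, defined above as the classification accuracy (mAP%) divided by the processing time (msec), is straightforward to compute. The sketch below uses made-up accuracy and timing values purely for illustration; they are not results from the study, and the function name is hypothetical.

```python
def trade_off(accuracy_map_percent: float, processing_time_ms: float) -> float:
    """Return psi = Acc(mAP%) / PT, with PT in milliseconds; higher is better."""
    if not processing_time_ms > 0:
        raise ValueError("processing time must be positive")
    return accuracy_map_percent / processing_time_ms


if __name__ == "__main__":
    # Hypothetical numbers: a slightly less accurate but much faster model
    # can still win on the trade-off metric.
    slow_but_accurate = trade_off(accuracy_map_percent=95.0, processing_time_ms=50.0)
    fast_but_weaker = trade_off(accuracy_map_percent=90.0, processing_time_ms=20.0)
    print(slow_but_accurate, fast_but_weaker)
```

Here the faster model scores 4.5 against 1.9, which shows why a technique with lower accuracy can still be preferred once processing time is taken into account.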

Performance Analysis with Respect to Datasets on the FPGA Platform
The classification performance of both techniques varied with the datasets and other design aspects when deployed on the FPGA. An analysis of which technique should be preferred according to the classification accuracy on the FPGA is shown in Table 11. The table indicates the suitability of each technique when employed on the FPGA board in conjunction with specific datasets for better classification performance. In particular, SGD-BP could be considered a suitable candidate for coupling with DCSNNs deployed on FPGA platforms for classifying samples from KITTI, INHA_KLP, and MNIST. Both backpropagation techniques required considerable inference times to classify the samples on the FPGA board with respect to diverse datasets. These inference times are presented in Table 12, which shows that the processing time taken by SGD-BP to classify the samples from all five datasets on the FPGA board was far lower than that of TSSL-BP. Analogous to the processing time metric, the trade-off metric of SGD-BP was better than that of TSSL-BP. These trade-off values are shown in Table 13, which indicates that SGD-BP could be a suitable candidate for coupling with DCSNNs to achieve better performance on classification tasks in the context of FPGA environments for datasets such as MNIST, KITTI, INHA_ADAS, and INHA_KLP. In the tables below, the symbols + and − correspond to suitable and unsuitable BP candidates, respectively, for the respective dataset.

For the comparison with contemporary studies, the works [54,55] were chosen to evaluate the performance with respect to MNIST, and a single work [55] with respect to CIFAR10. The relevant hardware specifications of these works are reported in Table 1. The evaluations were conducted in terms of quantization (bits), the number of parameters, accuracy (%), and throughput (frames per second). Tables 14 and 15 show the corresponding results. However, with proper open-source algorithms, performance evaluations compared to other SNN works on FPGAs could be carried out with better precision in the future.
In addition to the performance evaluations against contemporary studies, the power consumption of the FPGA board was tested during this investigative study. The images in the public and private datasets were standard 32 × 32 pixel images. This size was maintained in all the experiments conducted in this study. A 16-bit precision and the aforementioned image size were maintained while measuring the power consumption on the major computation devices used in this study. An Intel i7-12700 CPU and a Xilinx Kintex UltraScale FPGA (xcku115-flvf1924-2-i) were used to run the DCSNNs with input images taken from all the datasets. As the input size was standardized across all the datasets, the power consumption of each computation device was the same for all the datasets. The comparative study showed that the Xilinx Kintex UltraScale FPGA (xcku115-flvf1924-2-i) consumed much less power than the Intel i7-12700 CPU; it was 18 times more power-efficient, as shown in Table 16.
To examine the on-chip power utilization of the overall FPGA, the utilization metrics were acquired from the chip, which drew 0.74 watts of dynamic power while the DCSNN was deployed on the FPGA board, as shown in Figure 16.

Discussions, Limitations, and Future Work
This study focused on the deployment of DCSNNs on a low-cost FPGA board and reported the accuracy and processing time latency with respect to the hardware. The MNIST, CIFAR10, KITTI, INHA_ADAS, and INHA_KLP datasets were used to inform researchers and enterprises about the limitations and unique perspectives of this methodology. This experimental study had two primary objectives: (i) to determine the most effective backpropagation technique, TSSL-BP or SGD-BP, for deeper SNNs with convolution filters across multiple datasets; and (ii) to evaluate the feasibility of deploying DCSNNs trained using backpropagation techniques on low-power FPGAs for inference, taking into account potential configuration adjustments and power consumption. The inference performed on the FPGA necessitated the customization of networks with respect to constraints such as the lack of support for batch normalization. The network porting (.yaml) file required to operate the DCSNN on an FPGA can be accessed via the following link: https://github.com/INHACVLAB/DCSNN-on-FPGA/tree/main/networks, accessed on 29 September 2022. When using low-cost FPGA devices to implement DCSNNs for classification tasks, the trade-off between accuracy and the processing time is crucial. The processing time varied depending on the dataset and has been reported as a latency parameter in Tables 6-10. The processing duration of the model deployed on the FPGA was on the order of 50 times longer than that of the model deployed on a PC (for example, 49 times for TSSL-BP on MNIST in Table 6). Due to the dearth of open-source FPGA-related working codes, the performance analysis was limited to a small number of recent works. The outcomes reported in Tables 14 and 15 served as the performance analysis of the current study compared to other works with respect to the MNIST and CIFAR10 datasets on an FPGA board.
As shown in Table 16, the power efficiency of the Xilinx Kintex UltraScale FPGA (xcku115-flvf1924-2-i) was 18 times that of the Intel CPU (i7-12700) for all datasets with an input image size of 32 × 32 and 16-bit precision. This is an essential observation for the investigation of the performance of custom FPGAs using DCSNN models.
The private datasets, such as INHA_ADAS and INHA_KLP, aided in the exploration of BP techniques in general on the PC and later on the FPGA, which provided insights into the feasibility of employing BP techniques in future experiments. In addition to the topics discussed, there were a few limitations associated with the current study, including:

1. The current work was limited to testing DCSNNs on a single FPGA model, Xilinx Kintex UltraScale. Due to the lack of open-source code, the performance analysis conducted in the study was unable to fully address the pros and cons of the model in comparison to contemporary works carried out on other FPGA models. In the future, this issue could be effectively resolved by contemplating multiple models of FPGA boards with similar on-chip SNN deployment design elements and evaluating various DCSNNs with respect to various datasets.

2. Experiments must be conducted to ensure that the surrogate gradient descent backpropagation technique is well-tuned to enhance classification accuracy on several ADAS-based private datasets while preserving the shallower network design layers.

3. Deeper networks (DCSNNs) are currently considered for massive datasets using TSSL-BP and SGD-BP. However, the network design could be expanded to shallow layered networks using the customized parametric surrogate gradient descent backpropagation technique (CPSGD-BP) for greater data size flexibility without compromising performance.

Conclusions
Deep convolutional spiking neural networks utilizing spike-based backpropagation techniques such as TSSL-BP and SGD-BP were successfully implemented and deployed on the Xilinx Kintex UltraScale FPGA platform. The efficacy of the DCSNN in terms of classification accuracy and processing time was evaluated using a variety of metrics on public and private datasets. Using both a PC and an FPGA, comparative deployment studies of spike-based backpropagation-coupled DCSNNs on various datasets were conducted, and the results were documented in terms of classification accuracy, processing time, and the trade-off metric. Similarly, a performance analysis of the current study was conducted alongside other works on the MNIST and CIFAR10 datasets. For the purpose of evaluating BP-trained DCSNNs deployed on FPGAs in relation to public and private datasets, a range of evaluation methods was investigated. Finally, the potential future directions that could aid researchers attempting to develop DCSNNs for FPGAs, with or without BP techniques, were discussed. The current work validated performance using metrics focusing on accuracy, processing time, and the trade-off between them; however, future work will need to employ multiple hardware metrics on diverse datasets. This investigation into deploying DCSNNs on a low-cost FPGA board and determining the accuracy and processing time latency with respect to the MNIST, CIFAR10, KITTI, INHA_ADAS, and INHA_KLP datasets can inform researchers and businesses about the limitations and unique perspectives of this approach. In the future, diverse optimization methods will be needed to reduce latency while sustaining the accuracy and low-power characteristics of FPGAs to benefit the medium-scale intelligent vehicle industry.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ongoing validations and continuous improvements.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

$Q_{uv}$    Weight of the synaptic connection
$r_u(t)$    Reset mechanism in the spiking activity
$\alpha(\cdot)$    Response mechanism kernel
$\beta(\cdot)$    Reset mechanism kernel
$V_{Eq}$    Firing equilibrium
$H(\cdot)$    Step function
$D_{sp}$    Distance between desired spikes
$S_{sp}$    Distance between produced (actual) spikes
$D_{sp}|_t$    Firing events for desired spikes
$S_{sp}|_t$    Firing events for produced (actual) spikes
$N_t$    Total time steps
$L_{temp.sp}$    Temporal spike loss function