Improving Spiking Neural Network Performance with Auxiliary Learning

Abstract: The use of the backpropagation through time learning rule has enabled the supervised training of deep spiking neural networks to process temporal neuromorphic data. However, their performance is still below that of non-spiking neural networks. Previous work pointed out that one of the main causes is the limited amount of neuromorphic data currently available, which are also difficult to generate. With the goal of overcoming this problem, we explore the usage of auxiliary learning as a means of helping spiking neural networks to identify more general features. Tests are performed on the neuromorphic CIFAR10-DVS and DVS128-Gesture datasets. The results indicate that training with auxiliary learning tasks improves their accuracy, albeit slightly. Different scenarios, including manual and automatic loss combinations using implicit differentiation, are explored to analyze the usage of auxiliary tasks.


Introduction
Spiking neural networks (SNNs) have great potential for low-power artificial intelligence applications when they are implemented on specialized hardware [1][2][3]. In contrast to non-spiking neural networks (ANNs) [4][5][6], which process information by successive nonlinear transformations, SNNs use dynamic units, called spiking neurons, for handling spatiotemporal data. SNNs use discrete "spike" events to represent the activation of neurons. This makes them more efficient, as communication through spikes is a sparse process. When implemented on neuromorphic hardware, such as Intel's Loihi2 neuromorphic chip [7,8], they use orders of magnitude less energy than ANNs. This energy efficiency makes them attractive for various applications, such as sign gesture recognition [9,10], heartbeat signal classification [11], continual learning in 3D scenarios [12], and optimization [13].
In practice, however, the usage of SNNs for solving real-life problems is still limited because of their performance. A key issue linked with their lower performance is the limited size of available temporal data for their training [14,15]. As dynamic systems, SNNs require and are better suited for processing temporal data, such as neuromorphic data [16]. Unfortunately, only a small number of temporal datasets are currently available, and, to make matters worse, they often contain a small number of instances. As a result, SNNs trained with such datasets often suffer from overfitting and unstable convergence. Other techniques are therefore needed to improve their efficiency on limited-size data.
The problems with training on small-size data are not specific to SNNs but are present in machine learning in general [17,18]. Two methods have been proposed to address this problem: data augmentation [17,18] (the creation of new synthetic data by the modification of input samples or latent feature vectors) and the use of regularization methods [19][20][21] (direct regularization by a penalty loss or indirect regularization with auxiliary learning). Only input data augmentation has been studied within the framework of SNNs [22].
In this paper, we study the use of auxiliary learning (AL) as an indirect regularization method for SNN training. Auxiliary learning has been used for improving the performance of ANNs. Works such as [19,20,23] explored the use of one or more secondary tasks as a means of regularization. These attempts have been helpful in increasing ANN performance. The limitations seen in the ANN framework are still present in SNNs. In this paper, we explore the selection of the auxiliary tasks as well as the effect of the number of auxiliary tasks used.
The specific contributions of our paper are as follows:
• We utilized AL for training SNNs and experimentally demonstrated that using one or more auxiliary tasks increases the performance.
• We performed an analysis of different auxiliary task learning setups and analyzed their influence on performance.
• We compared the results with state-of-the-art SNN solutions and showed that using AL improves their accuracy.
The rest of the paper is structured as follows. Section 2 presents a detailed overview of the related work. Section 3 describes the proposed framework for SNN auxiliary learning. Section 4 describes the experimental settings and results. The paper ends with conclusions.

Spiking Neural Networks
SNNs are designed to emulate the way that biological neurons generate and transmit electrical impulses, known as spikes, to other neurons [1][2][3]. This allows for processing information in an event-driven and temporally precise way, which is more efficient and well suited for solving a variety of tasks. Training SNNs requires specialized algorithms that can deal with the temporal processing of spiking neurons in the presence of the non-linearity associated with spiking neuron communication. Noteworthy works providing an overview of training methodologies for SNNs include [25? -27].
There are several different approaches to training SNNs that depend on the specific task to be solved and on the learning rule used for updating the weights. Some of the most used learning rules include spike-timing-dependent plasticity (STDP) for unsupervised local learning [28][29][30], reward-modulated STDP for reinforcement learning through STDP [31,32], and spike-based backpropagation (SBP) [33][34][35] and remote supervised learning (ReSuMe) [36,37] for supervised learning. Here, we use the SBP algorithm because of its advantages over the other training methods, including enhanced efficiency and the capability to leverage standard optimization techniques developed for deep neural network architectures.
SBP is a variant of the backpropagation algorithm commonly used to train ANNs. Like the standard backpropagation algorithm, SBP uses gradient descent to update the weights in order to minimize the error between the predicted and desired output. However, unlike the standard backpropagation algorithm, SBP is designed to handle the dynamic operation of spiking neurons and the associated non-linearity of the communication via spikes. To enable the incorporation of temporal dependencies in the training process, SBP uses backpropagation through time (BPTT) [35], and in order to overcome the non-linearity of the spiking mechanisms, it uses surrogate gradient functions (SGD) [34].
BPTT was originally developed for training recurrent neural networks, which can process sequences of inputs [38]. In BPTT, the error signal is propagated back through the network not just over a single time step but over multiple time steps. This allows the network to take into account the past history of inputs and outputs while adjusting its weights. The trick that allows the use of BPTT in SNNs is that the spiking neuron model can be unfolded into a recurrent computation system [35].
Although implementing the computational graph of the spiking operation is feasible through tools such as PyTorch or TensorFlow, the non-differentiability of the Heaviside function impedes the correct calculation of the error gradients. To overcome this issue, the SGD method is used [34]. The main idea behind SGD is to use a replacement function as the gradient for the non-differentiable function. This replacement is only used during the gradient calculation, i.e., the backward pass of training. The Heaviside function is still used during the forward pass to maintain the correct operation of the network.
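The forward/backward asymmetry described above can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid-derivative surrogate, the `slope` parameter, and the function names are assumptions chosen for the example, not necessarily the surrogate function used in this work.

```python
import math


def heaviside(x):
    """Forward pass: emit a spike (1.0) when x >= 0, otherwise 0.0."""
    return 1.0 if x >= 0.0 else 0.0


def surrogate_grad(x, slope=4.0):
    """Backward pass: derivative of sigmoid(slope * x), used in place of
    the Heaviside derivative (which is zero almost everywhere).
    Written with tanh to avoid overflow for large |x|."""
    return slope * 0.25 * (1.0 - math.tanh(0.5 * slope * x) ** 2)


def spike_forward_backward(u, threshold=1.0, upstream_grad=1.0):
    """Return the spike from the true step function and the gradient
    with respect to the membrane potential u from the surrogate."""
    x = u - threshold
    spike = heaviside(x)                        # exact step in the forward pass
    grad_u = upstream_grad * surrogate_grad(x)  # smooth stand-in in the backward pass
    return spike, grad_u


spike, grad = spike_forward_backward(u=1.2)
```

The surrogate is largest when the membrane potential is at the threshold and decays away from it, so neurons close to firing receive the strongest learning signal.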

Auxiliary Learning
Auxiliary learning (AL) is a technique developed to improve the performance of ANNs when training data are limited in size or expensive to collect [19][20][21]. In auxiliary learning, a model is trained on multiple tasks in a manner similar to that used in multitask learning (MTL); see Figure 1. The difference between MTL and AL is that while MTL strives for good performance on all tasks (all tasks being of equal importance), AL focuses on the performance of one main task that the network needs to solve and treats all the other tasks as auxiliary (used only to help improve the performance of the main task). The auxiliary tasks may or may not be related to the main task.
The AL approach has several advantages. Training the network on multiple tasks simultaneously using AL forces it to learn more general features that are transferable to other tasks. This improves performance on the main task. AL can also improve the efficiency of training, as the network can learn from the auxiliary tasks without the need for additional training data. This makes it practical for training complex neural networks. Finally, AL can serve as a regularization tool for network learning, which usually improves its generalization ability by reducing overfitting.
Learning multiple tasks, however, creates problems such as negative transfer [39], i.e., when different tasks have conflicting goals, so that increasing the performance for one task decreases the performance of the other task(s). Another challenge is knowing how to efficiently combine multiple loss functions, i.e., how to weight the losses so that the main task is preferred [40]. In this paper, we address these questions by exploring different setups for loss combination and the number of auxiliary tasks required. We also compare the efficiency of AL in training SNNs vs. training ANNs (also with AL).

Input Data Augmentation
Data augmentation is a technique used to increase the size and diversity of a dataset [17,18]. In input data augmentation, additional data are generated by applying various transformations to the original input data, such as rotation, scaling, cropping, or adding noise. Doing this provides more examples to learn from and can help the trained model to generalize better on new data. Researchers have studied the use of geometrical transformations for input data augmentation on neuromorphic data for training SNNs. Using this approach allowed for about a 4% accuracy increase [14]. This illustrates one of the problems of SNNs, namely, the scarcity of event-based data for their training. In this paper, in addition to input data augmentation, we use AL as a method to increase accuracy on limited-size training data.
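As a rough illustration of a geometrical transformation applied to event-based data, the sketch below flips and shifts a list of events. The event layout `(x, y, t, polarity)`, the function name, and the particular transformations (horizontal flip plus a random translation) are assumptions made for the example; they are not the exact augmentation pipeline of [14].

```python
import random


def augment_events(events, width, height, max_shift=5, flip_prob=0.5, rng=None):
    """Apply one random horizontal flip and one random translation to a
    list of events given as (x, y, t, polarity) tuples (assumed layout).
    Events shifted outside the sensor area are dropped."""
    rng = rng or random.Random()
    flip = rng.random() < flip_prob
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)

    augmented = []
    for x, y, t, p in events:
        if flip:
            x = width - 1 - x          # mirror around the vertical axis
        x, y = x + dx, y + dy          # translate the whole recording
        if 0 <= x < width and 0 <= y < height:
            augmented.append((x, y, t, p))
    return augmented


events = [(0, 0, 10, 1), (127, 5, 20, 0), (64, 64, 30, 1)]
new_events = augment_events(events, width=128, height=128, rng=random.Random(0))
```

The same transformation is applied to every event of a recording, so the timestamps and polarities, which carry the temporal information, are left untouched.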

Problem Definition
Consider an input space $X \subseteq \mathbb{R}^n$, a main task $T_{main}$, and one or more auxiliary tasks $T_{aux}^{(i)}$. The expected output for the main task is $Y_{main}$ and for the auxiliary tasks $Y_{aux}^{(i)}$. We want to train a spiking neural network, $f(x)$, with weights $W$ that minimize the loss of $T_{main}$ while using $T_{aux}^{(i)}$ as a regularization method during training. Note that $T_{aux}^{(i)}$ is used during training only.

Architecture
The auxiliary learning architecture for training SNNs is shown in Figure 2. It consists of a feature extraction block connected in a feed-forward fashion to the main task and auxiliary task(s) blocks. The spiking input signal is processed by the first block, the feature extraction block, into a latent p-dimensional spiking feature vector, which is then fed to the main and auxiliary task classifier blocks to find the outputs. The idea behind this architecture is to allow the feature extraction block to receive feedback from the main classifier block (main task loss) and also from the auxiliary task classifier block(s) (auxiliary task losses) during training. In this way, the auxiliary task classifier blocks act as regularization blocks for the feature extraction block.

Figure 2. Auxiliary learning architecture. The network uses a multitask architecture in which only one task, "the main task", is of importance. The other tasks, "the auxiliary tasks", are used as additional regularization losses for helping the main task performance. The auxiliary tasks are only used during training.
In this work, we use the parametric leaky integrate-and-fire (PLIF) neuron model [22], which is a leaky integrate-and-fire (LIF) neuron [41,42] with learnable time constants. The membrane potential, $U_i^{(l)}[n]$, and the synaptic current, $I_i^{(l)}[n]$, of neuron $i$ in layer $l$ are given by Equations (1) and (2), where $\alpha$ and $\beta$ are decay constants equal to $\alpha \equiv \exp(-t/\tau_{mem})$ and $\beta \equiv \exp(-t/\tau_{syn})$, with a small simulation time step $t > 0$ and membrane and synaptic time constants $\tau_{mem}$ and $\tau_{syn}$; $W_{ij}$ are the synaptic weights between postsynaptic neuron $i$ and presynaptic neurons $j$; and $S_i^{(l)}[n]$ is the output spike train of neuron $i$ in layer $l$ at time step $n$. The output spike train is expressed as the Heaviside step function, $\Theta$, of the difference between the membrane voltage and the firing threshold $\phi$ as follows:

$$S_i^{(l)}[n] = \Theta\left(U_i^{(l)}[n] - \phi\right) \quad (3)$$
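To make the neuron dynamics concrete, the following sketch simulates a single neuron in discrete time. It assumes the common current-based update U[n+1] = αU[n] + I[n] − S[n] and I[n+1] = βI[n] + Σ_j W_j S_j[n], with α tied to the membrane time constant and β to the synaptic time constant as in the definitions above. The exact update form, the reset by subtraction, the fixed (non-learnable) time constants, and all names and default values are assumptions made for illustration.

```python
import math


def simulate_lif(input_spikes, weights, tau_mem=10.0, tau_syn=5.0,
                 dt=1.0, threshold=1.0):
    """Simulate one current-based LIF neuron (assumed update form).

    input_spikes: list over time steps; each entry is a list with one
                  0/1 value per presynaptic neuron.
    weights:      one synaptic weight per presynaptic neuron.
    Returns the output spike train as a list of 0.0/1.0 values.
    """
    alpha = math.exp(-dt / tau_mem)  # membrane decay constant
    beta = math.exp(-dt / tau_syn)   # synaptic decay constant

    u, i_syn = 0.0, 0.0
    output = []
    for spikes_n in input_spikes:
        # Spike generation: Heaviside step of (membrane potential - threshold).
        s = 1.0 if u - threshold >= 0.0 else 0.0
        output.append(s)
        # Weighted sum of the presynaptic spikes at this time step.
        drive = sum(w * x for w, x in zip(weights, spikes_n))
        # Both updates use the values from the current step n.
        u, i_syn = alpha * u + i_syn - s, beta * i_syn + drive
    return output


train = simulate_lif([[1.0]] * 20, weights=[1.0])
```

In a PLIF neuron the time constants would be trainable parameters rather than the fixed `tau_mem` and `tau_syn` used here.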

Training and Testing
The goal is to learn a set of weights, $W^*$, that minimizes the loss of the main task while utilizing the auxiliary losses as regularization terms. This can be expressed as the following optimization problem:

$$W^* = \arg\min_{W} L \quad (4)$$

where $L$ represents the total loss, which is calculated from the main task loss, $L_M$, and the auxiliary task losses, $L_A^{(i)}$, combined through $h(L_A^{(1)}, L_A^{(2)}, \ldots, L_A^{(k)})$ for $k$ auxiliary tasks (Equation (5)). Here, $L_M$ is the main task loss; $L_A^{(i)}$ are the auxiliary task losses; and $h$ is a linear or nonlinear operation that processes the auxiliary losses. The simplest loss combination case is when $h(\cdot)$ is a linear combination of the auxiliary losses. In this scenario, the total loss, $L$, is given by Equation (6), where $\alpha$ is a loss rate constant that controls the ratio between the main and auxiliary losses, and $\gamma_i$ denotes the weights assigned to each auxiliary loss (they can be determined through manual tuning methods like grid search, or automatic tuning methods like implicit differentiation [43]). The latter approach can also be used to train the function $h(\cdot)$ when a non-linear model is chosen. In this work, we compare the results obtained by all three methods: the manual tuning of a linear combination, the automatic tuning of a linear model, and the automatic tuning of a non-linear model for $h(\cdot)$.
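A minimal sketch of the linear loss combination follows. It assumes one plausible reading of the linear case, namely L = L_M + α Σ_i γ_i L_A^(i); the exact form of the combination in this work may differ, and the function and argument names are hypothetical.

```python
def combine_losses(main_loss, aux_losses, alpha=0.5, gammas=None):
    """Linear loss combination (assumed form):
    total = main_loss + alpha * sum(gamma_i * aux_loss_i).

    alpha:  loss rate constant balancing main and auxiliary losses.
    gammas: one weight per auxiliary loss; defaults to uniform weights of 1.
    """
    if gammas is None:
        gammas = [1.0] * len(aux_losses)
    if len(gammas) != len(aux_losses):
        raise ValueError("one gamma is required per auxiliary loss")
    aux_term = sum(g * loss for g, loss in zip(gammas, aux_losses))
    return main_loss + alpha * aux_term


# Uniform weights and a loss rate of 0.5: 1.0 + 0.5 * (2.0 + 4.0) = 4.0
total = combine_losses(main_loss=1.0, aux_losses=[2.0, 4.0], alpha=0.5)
```

Setting a `gamma` to zero removes the corresponding auxiliary task from the total loss, which is how an automatic tuning method could learn to ignore an uninformative task.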
During testing, the samples are only fed into the feature extraction block and then into the main task classifier block. The auxiliary task blocks are not used since the focus is solely on evaluating the performance of the main task.
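The difference between the training and testing paths can be sketched with plain Python callables standing in for the network blocks. The class name, the `training` flag, and the toy lambda "blocks" are hypothetical; a real implementation would use spiking layers.

```python
class AuxiliaryLearningNet:
    """Shared feature extractor with one main head and any number of
    auxiliary heads (all blocks are plain callables in this sketch)."""

    def __init__(self, feature_extractor, main_head, aux_heads):
        self.feature_extractor = feature_extractor
        self.main_head = main_head
        self.aux_heads = list(aux_heads)

    def forward(self, x, training=True):
        features = self.feature_extractor(x)
        main_out = self.main_head(features)
        if not training:
            # Testing: auxiliary heads are skipped entirely.
            return main_out, []
        # Training: every head sees the same shared features.
        return main_out, [head(features) for head in self.aux_heads]


# Toy stand-ins for the three kinds of blocks.
net = AuxiliaryLearningNet(
    feature_extractor=lambda x: [2.0 * v for v in x],
    main_head=lambda f: sum(f),
    aux_heads=[lambda f: max(f), lambda f: min(f)],
)
train_main, train_aux = net.forward([1.0, 2.0, 3.0], training=True)
test_main, test_aux = net.forward([1.0, 2.0, 3.0], training=False)
```

Because the auxiliary heads only influence the shared features through their losses during training, dropping them at test time does not change the main task output.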

Implementation
The network is implemented using the SpikingJelly framework, which consists of a set of Python libraries for the supervised training of SNNs [44]. The code for the network implementation as well as all of the experiments and results is posted on GitHub (https://github.com/PaoloGCD/AuxiliaryLearning-SNN (accessed on 22 July 2023)).

Experiments and Results
We evaluate the effectiveness of AL in SNNs for solving recognition tasks using the CIFAR10-DVS [45] and DVS128-Gesture [46] neuromorphic datasets. All of the tests are performed using the architecture shown in Figure 2. For structuring the network, we followed the architecture used in [14,22], which is based on the VGG structure [47]. Specifically, the number of layers used for the feature extraction and classifier blocks for each dataset is shown in Table 1. Each layer of the feature extraction block is composed of a convolutional layer with batch normalization and PLIF neurons, followed by max pooling with a 2 × 2 kernel. All convolution operations use a kernel size of 3 × 3 with stride 1 and padding 1. The number of channels for all convolution layers is 128. The layers of the classifier blocks (the main and auxiliary) are composed of a fully connected layer of PLIF neurons with a dropout of 0.5. The number of features of the first fully connected layer is set to 1/4 of the number of input vector features. The number of features of the output layer (the last fully connected layer) is 10 times the number of classes, and average voting with stride 10 is used for computing the classification label. All of the results are presented as the average of ten runs.
First, we test the performance of training SNNs with just one auxiliary task. For each dataset, we test three different auxiliary task configurations. The labels used for the main (M) and auxiliary (A) tasks are shown in Table 2. For CIFAR10-DVS data, A1 is selected as a duplicate of the main task label; A2 is a categorization into living vs. non-living class labels; and A3 is based on morphological properties of the classes. For example, deer and horses are put into the same group (group 4) because of their morphological similarities. For DVS128-Gesture data, A1 is again a duplicate of the main task, while A2 and A3 are two different categorization tasks based on the morphological properties of the images. For example, hand clapping, arm rolling, and air drums are one auxiliary category (see Table 2, column A3) as they show the usage of both hands.
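Auxiliary labels of this kind can be derived directly from the main labels, so no extra annotation is needed. The sketch below builds the living vs. non-living task (A2) for the ten CIFAR10 classes. The class index order follows the standard CIFAR10 convention, and encoding "living" as 1 is an arbitrary choice made for the example; the actual label assignments are those given in Table 2.

```python
# Standard CIFAR10 class order (assumed to match the dataset's main labels).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]
LIVING = {"bird", "cat", "deer", "dog", "frog", "horse"}


def living_label(main_label):
    """Map a main-task class index to the auxiliary label:
    1 for living things, 0 for non-living things (assumed encoding)."""
    return 1 if CIFAR10_CLASSES[main_label] in LIVING else 0


main_labels = [0, 3, 7, 9]
aux_labels = [living_label(y) for y in main_labels]
```

Each training sample then carries a pair of targets, one for the main classifier block and one for the auxiliary classifier block.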
For the above cases, only a linear combination loss (Equation (6)) is used. We test different values of the loss rate constant α. Specifically, we use α values of 0.1, 0.2, 0.3, 0.4, and 0.5. Table 3 shows the validation accuracy after 250 training epochs for the two datasets while using different auxiliary tasks and loss rate constants. The validation dataset was randomly selected as a separate part of the training dataset; its size is equal to 10% of the training dataset. We chose 250 training epochs based on the observation that at around this number the network's accuracy reaches a plateau with no significant improvement (see Figure 3, which illustrates validation accuracy over 1024 training epochs on the CIFAR10-DVS and DVS128-Gesture datasets).
Both data augmentation and auxiliary learning improve the accuracy of the SNN. Data augmentation results in a more significant increase in performance, while the utilization of auxiliary learning further improves the performance achieved through data augmentation alone. It is worth noting that there is a decline in performance when using task A2 for CIFAR10-DVS data. This decrease can be attributed to the fact that auxiliary tasks should provide useful information to facilitate learning. Apparently, A2 does not provide such information for the network, since living vs. non-living categorization is based on a very abstract concept that the network is not able to handle. Regarding the choice of the loss rate constant, higher values (greater than 0.3) yield better results (except for case A2 for CIFAR10-DVS data). However, the difference in performance is not clear-cut, making the manual selection of this parameter quite challenging. Because of this, we use an automated method for selecting the loss rate constant; it is described below in Section 4.3.

Training with More Than One Auxiliary Task
Table 4 shows the testing accuracies of AL with two (AL-SNN-2T), three (AL-SNN-3T), and four (AL-SNN-4T) auxiliary tasks. The first three auxiliary tasks are the same classification tasks as in Table 2. The fourth auxiliary task is randomly generated as a four-label classification. For convenience of comparison, the results for ST-SNN and the best results for AL-SNN trained with one auxiliary task (repeated from Table 3) are also shown. Observe that training with more auxiliary tasks did not yield better results than using a single auxiliary task. Selecting the appropriate auxiliary tasks, determining their respective weights, and choosing a proper loss rate becomes highly challenging, rendering a manual grid search infeasible. In our test, the uniform combination of weights of 1 for all auxiliary losses and a loss rate of 0.5 is not the optimal choice, as seen from the results. Given the complexities involved in the manual combination of multiple auxiliary tasks, an automated method for combining them becomes essential to effectively leverage their strength; such a method is described next.

Using Implicit Differentiation
Here, we use implicit differentiation to train a loss combination function, h, such that L is minimized (Equation (5)). Table 5 shows the testing accuracies of training AL using all four auxiliary tasks and implicit differentiation. The function h is tested for both the linear (AL-SNN-IDL-4T) and non-linear (AL-SNN-IDNL-4T) cases. A traditional ANN with three hidden layers is used for the non-linear case. Observe that employing implicit differentiation with a non-linear function h yields the best overall result. When a linear function h is used, the obtained result is very close to the best outcome achieved through a manual grid search. These findings show that implicit differentiation not only mitigates the challenges associated with manual grid search but also improves the SNN system performance. It is important to highlight that A4 is a random task that does not provide any useful information, yet implicit differentiation successfully handles this task. This underscores the robustness and adaptability of implicit differentiation in effectively handling diverse tasks, even when they apparently do not provide additional information.
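The idea behind tuning a loss weight by implicit differentiation can be illustrated on a one-dimensional toy problem. This is not the procedure of [43] applied to an SNN: the quadratic losses, the scalar weight `gamma`, and the names `a`, `b`, and `c` are all assumptions chosen so that every quantity has a closed form. The inner problem trains a weight w on a main loss (w − a)² plus a weighted auxiliary loss gamma·(w − b)², and the outer problem measures the trained weight on a validation loss (w − c)².

```python
def inner_solution(gamma, a, b):
    """Minimizer over w of the training loss (w - a)**2 + gamma * (w - b)**2."""
    return (a + gamma * b) / (1.0 + gamma)


def hypergradient(gamma, a, b, c):
    """Gradient of the validation loss (w* - c)**2 with respect to gamma,
    obtained with the implicit function theorem at the inner optimum w*."""
    w = inner_solution(gamma, a, b)
    d2_ww = 2.0 + 2.0 * gamma      # second derivative of training loss in w
    d2_wg = 2.0 * (w - b)          # mixed derivative in w and gamma
    dw_dgamma = -d2_wg / d2_ww     # implicit function theorem
    dval_dw = 2.0 * (w - c)        # derivative of validation loss in w
    return dval_dw * dw_dgamma


# With a=0, b=2, c=1.5 and gamma=1: w* = 1.0 and the hypergradient is -0.5,
# so increasing the auxiliary weight would reduce the validation loss.
g = hypergradient(gamma=1.0, a=0.0, b=2.0, c=1.5)
```

The key point is that the derivative of the trained weights with respect to the loss weight is obtained from second derivatives at the optimum, without differentiating through the training loop itself.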

Comparison with State-of-the-Art SNNs
We compare the proposed training approach using auxiliary learning with state-of-the-art SNN methods on the CIFAR10-DVS and DVS128-Gesture neuromorphic datasets. To identify the best trained networks, we conduct an analysis using precision, recall, and the F1-score. We then select the top-performing network for each dataset. Figure 4 shows the confusion matrix for the selected networks, and Table 6 shows the above performance indicators. The results are shown for 1024 training epochs on the testing set. Overall, the SNN trained using auxiliary learning exhibits a well-balanced performance in predicting the labels for each dataset. It is worth highlighting a particular case, which is the prediction of class 3 (cat) for CIFAR10-DVS data. This specific class is the most challenging to predict in the CIFAR10-DVS dataset.
We compare the obtained results with state-of-the-art SNNs, which are shown in Table 7. Notice that training with auxiliary learning achieves the highest accuracy for the DVS128-Gesture dataset and the second highest for CIFAR10-DVS. The highest accuracy for CIFAR10-DVS is achieved by AIA, which is an SNN that uses a more advanced neuron model than the PLIF neuron model used in this work. In fact, we see that training with AL achieves higher accuracy when compared with the SNNs that use PLIF neurons (PLIF and NDA). We expect that AL with the AIA neuron model would achieve the best performance.
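For reference, per-class precision, recall, and F1-score can be computed from a confusion matrix as sketched below. The convention that rows are true classes and columns are predicted classes, as well as the function name, are assumptions of this example; how the per-class values are averaged in the reported results is not specified here.

```python
def per_class_metrics(confusion):
    """Compute (precision, recall, f1) for each class.

    confusion[i][j] is the number of samples with true class i that were
    predicted as class j (assumed layout).
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true other
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2.0 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics.append((precision, recall, f1))
    return metrics


scores = per_class_metrics([[8, 2], [1, 9]])
```

A class that is hard to predict shows up as a row with many off-diagonal entries, which lowers its recall even when the overall accuracy is high.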

Conclusions
In this paper, we present the usage of auxiliary learning, in addition to data augmentation, to improve the performance of SNNs. The network architecture used consists of a feature extraction block connected in a feed-forward fashion to a main classification block and one or more auxiliary task classification blocks. By using auxiliary tasks, we use additional information during training that helps in the regularization of the feature extraction block. As a result, the feature extraction block is forced to learn more general and robust features, which helps improve the SNN performance on the main task. The results confirm that using AL during training indeed results in improved performance. Moreover, the experiments demonstrate that the extent of the improvement depends on the careful tuning of the loss combination parameters. To overcome this challenge, we use implicit differentiation [43] to automatically adjust the loss combination parameters. Note that all the experiments presented in this study were conducted through simulation using the SpikingJelly neuromorphic library. However, in the future, we plan to leverage Intel's Lava framework, which enables one to directly deploy the network on the Loihi2 neuromorphic chip.

Figure 1. (Left): In multitask learning, the goal is to perform more than one learning task at the same time, with all tasks being equally important. (Right): In auxiliary learning, the goal is to learn one main task while using one or more auxiliary tasks.

Table 1. Network architecture used for analyzing CIFAR10-DVS and DVS128-Gesture neuromorphic data.

Table 3. Validation accuracy on CIFAR10-DVS and DVS128-Gesture datasets using auxiliary learning for 250 training epochs. Bold values represent the best result within the column.

Table 4. Validation accuracy on CIFAR10-DVS and DVS128-Gesture datasets using multiple auxiliary tasks for 250 training epochs. Bold values represent the best result within the column.

Table 5. Validation accuracy on DVS128-Gesture datasets using implicit differentiation for 250 training epochs. Bold values represent the best result within the column.

Table 6. Testing accuracy, precision, recall, and F1-score for the best performing SNN with AL for CIFAR10-DVS and DVS128-Gesture datasets.

Table 7. Comparison with state-of-the-art SNNs for CIFAR10-DVS and DVS128-Gesture datasets. Bold values represent the best result within the column.