A Homomorphic Encryption Framework for Privacy-Preserving Spiking Neural Networks

Machine learning (ML) is widely used today, especially through deep neural networks (DNNs); however, increasing computational load and resource requirements have led to cloud-based solutions. To address this problem, a new generation of networks called spiking neural networks (SNNs) has emerged, which mimic the behavior of the human brain to improve efficiency and reduce energy consumption. These networks often process large amounts of sensitive information, such as confidential data, and thus privacy issues arise. Homomorphic encryption (HE) offers a solution, allowing calculations to be performed on encrypted data without decrypting it. This research compares traditional DNNs and SNNs using the Brakerski/Fan-Vercauteren (BFV) encryption scheme. Both the DNN and SNN models are based on LeNet-5, a widely used convolutional architecture, and the networks are trained and compared using the FashionMNIST dataset. The results show that SNNs using HE achieve up to 40% higher accuracy than DNNs for low values of the plaintext modulus t, although their execution time is longer due to their time-coding nature with multiple time-steps.


I. Introduction
Machine learning (ML) has witnessed significant development in recent years, finding diverse applications in various sectors such as robotics, automotive, smart industries, economics, medicine, and security [1], [2], [3]. Several models based on the structure of the human brain have been implemented [4], including the widely used deep neural networks (DNNs) [5], [6] and spiking neural networks (SNNs) [7], which emulate the functioning of neurons relatively better than DNNs [8]. These models require large amounts of data to be trained and reach high accuracy. However, if such data are collected from users' private information, such as personal images, interests, web searches, and clinical records, the DNN deployment toolchain will access sensitive information that could be mishandled [9]. Moreover, the large computational load and resource requirements for training DNNs have led to outsourcing the computations to the cloud, where untrusted agents may undermine the confidentiality of the algorithms and the intellectual property of the service provider. Note that encrypting the data transmission from client to server using common techniques such as the Advanced Encryption Standard (AES) would not solve these issues, because untrusted agents on the server side have full access to the sensitive data and DNN model. Among privacy-preserving methods, homomorphic encryption (HE) employs polynomial encryption to encrypt input data, perform computations, and decrypt the output. Because the computations are conducted in the encrypted (ciphertext) domain, the ML algorithm and data remain confidential as long as the decryption key is unknown to adversaries. However, common HE-based methods focus on traditional DNNs, and the impact and potential of encryption techniques for SNNs remain unexplored.
In this work, we deploy the Brakerski/Fan-Vercauteren (BFV) HE scheme [10] for SNNs and compare it with its application to DNN architectures [11]. From the experimental results, we observed that the SNN models working on encrypted data yield better results than traditional DNN models, despite the increased computational time due to the intrinsic latency of SNNs that simulate human neurons.
Our novel contributions are summarized as follows (see an overview in Figure 1):
• We design an encryption framework based on the BFV HE scheme that can execute privacy-preserving DNNs and SNNs (Section III).
• The encryption parameters are properly selected to obtain good tradeoffs between security and computational efficiency (Section III-D).
• We implement the encryption framework, evaluate the accuracy of encrypted models, and compare the results between DNNs and SNNs. We observe that the SNNs achieve up to 40% higher accuracy than DNNs for low values of the plaintext modulus t (Section IV).
Paper organization: Section II contains the background information of the methods and algorithms used in this work, which are DNNs, SNNs, and HE, with a particular focus on the BFV scheme. Section III discusses the proposed encryption framework for DNNs and SNNs and describes our methodology for selecting the encryption parameters. Section IV reports the experimental results and a discussion on the comparison between DNNs and SNNs when using HE. Section V concludes the paper.

II. Background

A. Deep Neural Networks and Convolutional Neural Networks
DNNs, whose functionality is shown in Figure 2a, are a class of artificial neural networks composed of multiple layers of interconnected nodes called neurons. These networks are designed to mimic the structure and functioning of the human brain. DNNs are characterized by depth, referring to the many hidden layers between the input and output. This depth allows DNNs to learn complex patterns and representations from data, enabling them to solve intricate problems in fields such as image and speech recognition, natural language processing, and more.
Convolutional neural networks (CNNs) [12] are a specialized type of DNN designed to efficiently process grid-like data, such as images or time series. CNNs apply filters to input data, capturing local patterns and features. This allows CNNs to extract hierarchical representations from visual data, enabling object detection, image classification, and image generation tasks. CNNs have revolutionized the field of computer vision and have been widely adopted in various applications, including autonomous driving, medical imaging, and facial recognition.
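To make the filtering step concrete, the following is a minimal pure-Python sketch of a single convolutional filter sliding over a small image. The function name `conv2d_valid`, the 4 × 4 image, and the 2 × 2 edge filter are illustrative assumptions and are not taken from the models used in this work. Note that the filter response is built only from additions and multiplications.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Weighted sum of the image patch under the kernel.
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to left-to-right intensity increases.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is nonzero only where the patch straddles the edge, which is the sense in which a filter captures a local pattern.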

B. Spiking Neural Networks
SNNs [13], [14], [15] are a type of neural network model that aims to replicate the behavior of biological neurons. Unlike traditional DNNs that use continuous activation values, SNNs communicate through discrete electrical impulses called spikes. As shown in Figure 2b, these spikes encode the timing and intensity of neuron activations, allowing for more precise and efficient information processing [16], [17], [18], [19]. SNNs are particularly suited for modeling dynamic and time-varying data, as they can capture the temporal aspects of input signals. This enables SNNs to excel in temporal pattern recognition, event-based processing, and real-time sensory processing [20], [21], [22], [23]. SNNs provide an efficient and brain-inspired computing paradigm for executing ML workloads. However, processing SNNs on traditional (von Neumann) architectures demands high energy consumption and execution time. To overcome these issues, designers have developed specialized hardware platforms such as neuromorphic chips to execute SNNs in a fast and efficient manner. Compared to non-spiking DNNs, the communication between neurons in SNNs is discrete through spike trains, whereas DNNs have continuous activation values. The key advantage of SNNs is that computations are executed only in the presence of spikes. If the spikes are sparse in time, SNNs can save a large amount of energy compared to the non-spiking DNNs that process continuous values. By emulating the spiking behavior of biological neurons, SNNs offer a promising avenue for understanding and replicating the computational capabilities of the human brain. Because conventional ML datasets typically lack any form of temporal encoding, an additional encoding step is necessary to introduce the required temporal dimension [24]. In the case of SNNs, input spikes are treated as a sequence of tensors consisting of binary values [25], [26], [27].

C. Homomorphic Encryption and Brakerski/Fan-Vercauteren scheme
HE is a cryptographic technique that allows computations on encrypted data without decryption [28], [29]. A popular scheme used in HE is the BFV scheme [10] (see Figure 3). This scheme leverages polynomial encoding to enable encrypted data manipulation. In this scheme, the client encrypts their sensitive input data using a public key provided by the server [30], [31]. The server performs computations on the encrypted data using specialized algorithms that maintain the encryption. The encrypted results are then returned to the client, who can decrypt them using their private key to obtain the desired outputs. The BFV scheme supports addition and multiplication operations on encrypted variables, preserving the algebraic structures necessary for computation. By employing this scheme, sensitive data remain protected throughout the computation process, ensuring privacy and security [32], [33], [34], [35]. HE comes in different variants, such as partially HE (PHE), somewhat HE (SHE), and fully HE (FHE), each offering different levels of computation capabilities on encrypted data [36], [37], [38], [10], [39].

[Figure 3: BFV scheme diagram. Recoverable labels: encrypted output data; encrypted model; operations of addition and multiplication; private key.]
The BFV scheme is a type of FHE, which means that operations are fully encrypted, both on multiplications and additions. Consequently, there is no possibility of obtaining intermediate information during the process. To explain this concept more clearly, we can look at an example using an equation. In this case, we will apply homomorphic invariance only to addition, but FHE applies the same logic to multiplication as well. Our basic equation is Equation (1). Let us assume it undergoes a homomorphic transformation (encryption) represented as Equation (2). Let us calculate the result by choosing random values for x and y (see Equation (3)). Calculating both sides of Equation (1), we obtain Equations (4) and (5). Applying the homomorphic transformation of Equation (2), we obtain Equations (6) and (7). We obtained the same result on both sides of the equation, despite the homomorphic transformation applied in the middle. This is what HE accomplishes. In the case of the BFV scheme and FHE in general, homomorphism applies to both additions and multiplications.
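The additive invariance described above can be illustrated with a toy transformation. The sketch below is a stand-in chosen for illustration, not a reproduction of Equations (1) to (7): it assumes a transformation that multiplies by a secret constant modulo a prime, which is additively homomorphic but offers no real security. The names `MODULUS`, `SECRET`, `transform`, and `inverse_transform`, and the values of x and y, are hypothetical.

```python
MODULUS = 65537   # hypothetical prime modulus (toy value)
SECRET = 12345    # hypothetical secret multiplier, invertible modulo MODULUS


def transform(x):
    """Toy 'encryption': multiply by the secret modulo MODULUS (NOT secure)."""
    return (SECRET * x) % MODULUS


def inverse_transform(c):
    """Toy 'decryption': multiply by the modular inverse of the secret."""
    return (pow(SECRET, -1, MODULUS) * c) % MODULUS


x, y = 17, 25
plain_sum = x + y                                        # computed in the clear
encrypted_sum = (transform(x) + transform(y)) % MODULUS  # computed on transformed values
recovered = inverse_transform(encrypted_sum)

print(plain_sum, recovered)
```

Adding the transformed values and then inverting the transformation yields the same result as adding the plain values, which is the property that HE schemes provide with actual cryptographic security.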

III. Proposed Encryption Framework
In this work (see Figure 4), we implement a LeNet-5 CNN [11] and its equivalent SNN variant. For the dataset, we leveraged FashionMNIST [40] (see Figure 5), which is similar to MNIST [41] but consists of 10 classes of clothing items. Note that we adopt the same test conditions widely used by the SNN research community, where the typical evaluation settings [42] use the spiking LeNet and datasets such as MNIST and FashionMNIST. The hardware system used for conducting the experiments consisted of a Tesla P100-PCIE GPU, an Intel(R) Xeon(R) Gold 6134 CPU @ 3.20 GHz, and 100 GB of RAM. We developed the code in Python, utilizing the PyTorch framework [43], the Pyfhel library for the encryption [44], and the Norse library to implement the SNN [45].

A. FashionMNIST
FashionMNIST [40] (see Figure 5) is a widely used dataset in computer vision and machine learning. It serves as a benchmark for image classification tasks and is a variation of the classic MNIST dataset. Instead of handwritten digits, FashionMNIST consists of grayscale images of various clothing items, such as shirts, dresses, shoes, and bags. It contains 60,000 training and 10,000 testing samples, each a 28 × 28 pixel image. The dataset offers a diverse range of clothing categories, making it suitable for evaluating algorithms and models for tasks such as image recognition, object detection, and fashion-related applications. FashionMNIST provides a challenging yet realistic dataset for researchers and practitioners to explore and develop innovative solutions in computer vision.

B. LeNet-5 and AlexNet
LeNet-5 is a classic CNN architecture developed by Yann LeCun [11]. It was explicitly designed for handwritten digit recognition and played a crucial role in the early advancements of deep learning. LeNet-5 is composed of convolutional, pooling, and fully connected layers (see Figure 6). The convolutional layers extract features from the input images using convolutional filters. The pooling layers reduce the dimensionality of the extracted features while preserving their essential information. Finally, the fully connected layers classify the features and produce the output predictions. LeNet-5 revolutionized the field of computer vision by demonstrating the effectiveness of CNNs for image classification tasks. Since then, it has served as a foundational model for developing more advanced CNN architectures and has found applications in various domains, including character recognition, object detection, and facial recognition. AlexNet [6] is a nine-layer DNN composed of six convolutional layers and three fully connected layers. It represents the reference model for deep CNNs, where stacking several layers resulted in significant performance improvements compared to shallower CNNs. A sequence of several convolutional layers can learn high-level features from the inputs that are used by fully connected layers to generate the output predictions.
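As a rough illustration of how the convolutional and pooling layers of LeNet-5 shrink the feature maps before the fully connected layers, the sketch below traces the spatial sizes for a 28 × 28 FashionMNIST image. It assumes the classic LeNet-5 hyperparameters (5 × 5 filters, 6 and 16 channels, 2 × 2 pooling with stride 2) and no padding; the exact configuration used in this work is not stated in the text, so these values are assumptions.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=28):
    """Trace (layer, channels, spatial size) through the feature extractor."""
    shapes = []
    size = output_size(input_size, kernel=5)      # conv1: 6 filters of 5x5 (assumed)
    shapes.append(("conv1", 6, size))
    size = output_size(size, kernel=2, stride=2)  # pool1: 2x2, stride 2 (assumed)
    shapes.append(("pool1", 6, size))
    size = output_size(size, kernel=5)            # conv2: 16 filters of 5x5 (assumed)
    shapes.append(("conv2", 16, size))
    size = output_size(size, kernel=2, stride=2)  # pool2: 2x2, stride 2 (assumed)
    shapes.append(("pool2", 16, size))
    flattened = 16 * size * size                  # input width of the first fully connected layer
    return shapes, flattened


shapes, flattened = lenet5_shapes(28)
for name, channels, size in shapes:
    print(f"{name}: {channels} x {size} x {size}")
print("flattened features:", flattened)
```

Under these assumptions the fully connected layers receive 256 features per image; with padding on the first convolution the count would differ.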

C. Spiking-LeNet-5 and Norse, Spiking-AlexNet and Norse
Spiking-LeNet-5 [46], [47], [48], [49] is an extension of the LeNet-5 CNN architecture that incorporates the principles of SNNs [50]. It is specifically designed to process temporal data encoded as spike trains, mimicking the behavior of biological neurons. Unlike the traditional LeNet-5, which operates on static input values, Spiking-LeNet-5 receives input spikes as a sequence of binary tensors. It utilizes specialized spiking neuron models, such as the leaky integrate-and-fire (LIF) neuron, to simulate the firing behavior of biological neurons [51]. The temporal dimension introduced by spike encoding allows Spiking-LeNet-5 to capture the dynamics and temporal dependencies present in the data. This enables the network to learn and recognize patterns over time, making it suitable for tasks involving temporal data, such as event-based vision, audio processing, and other time-dependent applications. Spiking-LeNet-5 combines the power of traditional CNNs with the temporal processing capabilities of SNNs, opening up new possibilities for advanced SNN architectures. Similarly, Spiking-AlexNet [52] extends AlexNet by incorporating the principles of SNNs, such as spike trains and LIF neurons.
The LIF parameters [53] in Norse are specific settings that define the behavior of LIF neurons in SNNs. These parameters include:
• τ_syn^{-1} represents the inverse of the synaptic time constant. It determines the rate at which the synaptic input decays over time;
• τ_mem^{-1} represents the inverse of the membrane time constant. This parameter influences the rate at which the neuron's membrane potential decays without input;
• v_leak specifies the leak potential of the neuron. It is the resting potential of the neuron's membrane when there is no synaptic input or other stimuli;
• v_th defines the threshold potential of the neuron. The neuron generates an action potential when the membrane potential reaches or exceeds this threshold;
• v_reset represents the reset potential of the neuron. After firing an action potential, the membrane potential is reset to this value.
These parameters play a crucial role in shaping the dynamics of the LIF neuron in the SNN. They determine how the neuron integrates and responds to incoming synaptic input and when it generates an action potential. The specific values of these parameters can be adjusted to achieve desired behavior and control the firing rate and responsiveness of the neuron within the network.
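The role of the five LIF parameters can be sketched with a discrete-time (Euler) simulation of a single neuron. The code below is a simplified pure-Python illustration and not the Norse implementation; the class `LIFParams`, the functions `lif_step` and `simulate`, the time step `dt`, and all numeric values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LIFParams:
    tau_syn_inv: float = 200.0  # inverse synaptic time constant (assumed value, 1/s)
    tau_mem_inv: float = 100.0  # inverse membrane time constant (assumed value, 1/s)
    v_leak: float = 0.0         # leak (resting) potential
    v_th: float = 1.0           # firing threshold
    v_reset: float = 0.0        # potential after a spike


def lif_step(v, i, input_current, p, dt=0.001):
    """Advance membrane potential v and synaptic current i by one time step."""
    # Membrane potential relaxes toward v_leak and integrates the synaptic current.
    v = v + dt * p.tau_mem_inv * ((p.v_leak - v) + i)
    # Synaptic current decays at a rate set by tau_syn_inv.
    i = i - dt * p.tau_syn_inv * i
    # Fire when the threshold is reached or exceeded, then reset the membrane.
    spike = 1 if v >= p.v_th else 0
    if spike:
        v = p.v_reset
    # Incoming input is added to the synaptic current for the next step.
    i = i + input_current
    return spike, v, i


def simulate(input_currents, p, dt=0.001):
    """Return the spike train produced by a sequence of input currents."""
    v, i = p.v_leak, 0.0
    spikes = []
    for current in input_currents:
        spike, v, i = lif_step(v, i, current, p, dt)
        spikes.append(spike)
    return spikes


params = LIFParams()
for strength in (0.0, 0.3, 0.6):
    spike_train = simulate([strength] * 100, params)
    print(f"input {strength}: {sum(spike_train)} spikes in 100 steps")
```

In this sketch, a stronger input drives the membrane potential to the threshold more often, so the firing rate grows with the input, while a silent input produces no spikes at all.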
SNNs also require an encoder because they operate on temporal data represented as spikes. Because most ML datasets do not include any temporal encoding, it is necessary to add an encoding step to provide the required temporal dimension. The encoder transforms the input data into sequences of spikes, which are then processed by the SNN as tensors containing binary values. The constant-current LIF encoder is an encoding method used in the Norse library to transform input data into sparse spikes. This encoding technique converts the constant input current into constant voltage spikes. During a specified time interval, known as seq_length, spikes are simulated based on the input current. This encoding allows Norse to operate on sparse input data in a sequence of binary tensors, which the SNN can efficiently process.
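A minimal sketch of the constant-current idea follows: a pixel intensity is held as a constant current into an LIF membrane for seq_length steps, and the resulting spike train is the encoding. This is a simplified pure-Python re-implementation written for illustration, not the code of the Norse encoder; the function name `encode_constant_current`, the time step, and the parameter values are assumptions.

```python
def encode_constant_current(input_current, seq_length,
                            tau_mem_inv=100.0, v_leak=0.0,
                            v_th=1.0, v_reset=0.0, dt=0.001):
    """Encode one constant input value as a binary spike train of length seq_length."""
    v = v_leak
    spikes = []
    for _ in range(seq_length):
        # The membrane integrates the constant current while leaking toward v_leak.
        v = v + dt * tau_mem_inv * ((v_leak - v) + input_current)
        if v >= v_th:
            spikes.append(1)
            v = v_reset  # reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


# Hypothetical scaled pixel intensities; seq_length = 30 matches the value used in Section IV.
for current in (0.5, 2.0, 4.0):
    train = encode_constant_current(current, seq_length=30)
    print(current, "".join(str(s) for s in train))
```

With these assumed parameters, an input below the threshold never produces a spike, and larger inputs produce denser spike trains, so intensity is carried by the spike rate over the seq_length window.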

D. HE parameters and Pyfhel
The HE process, implemented in the Pyfhel library, allows computations on encrypted data without decryption, ensuring data privacy and security [54], [55]. Pyfhel is built on the BFV scheme, a fully HE scheme.
The encryption process in the BFV scheme involves transforming plaintext data into ciphertext using a public key [56]. The computations can be directly conducted on the ciphertext, preserving the confidentiality of the underlying plaintext [57]. The BFV scheme supports various mathematical operations on encrypted data, such as addition and multiplication. These operations can be performed on ciphertexts without decryption, enabling computations on sensitive data while maintaining its privacy [58].
The BFV scheme relies on three key parameters:
• m represents the polynomial modulus degree, influencing the encryption scheme's computational capabilities and security level;
• t denotes the plaintext modulus and determines the size and precision of the encrypted plaintext values;
• q represents the ciphertext modulus, determining the size of the encrypted ciphertext values and affecting the security and computational performance of the encryption scheme.
A balance between security and computational efficiency in HE computations can be achieved by selecting appropriate values for these parameters. Pyfhel provides a convenient interface to work with the BFV scheme, allowing for data encryption, computation, and decryption while maintaining privacy and confidentiality.
Another critical parameter is the noise budget (NB), which refers to the maximum amount of noise or error that can be introduced during the encryption and computation process without affecting the correctness of the results. When performing computations on encrypted data, operations such as additions and multiplications accumulate noise, degrading the accuracy of the decrypted results. The NB represents a limit on how much noise can be tolerated before the decrypted results become unreliable. The NB needs to be carefully managed and monitored throughout the computation process to ensure the security and correctness of the encrypted computations.
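The interplay between t, q, and the noise budget can be sketched with a toy scalar scheme that scales the message by q/t and adds a small error, in the spirit of BFV but without polynomials or any real security. Everything in the block is an assumption made for illustration: the constants `Q`, `T`, and `SECRET_KEY`, the deterministic masks and noise values (a real scheme samples them at random), and the helper names. It is not how Pyfhel is implemented.

```python
Q = 2 ** 20           # toy ciphertext modulus
T = 16                # toy plaintext modulus
DELTA = Q // T        # scaling factor that separates the message from the noise
SECRET_KEY = 271_828  # hypothetical secret key


def encrypt(message, mask, noise):
    """Hide DELTA * message behind mask * SECRET_KEY plus a small noise term."""
    body = (mask * SECRET_KEY + DELTA * (message % T) + noise) % Q
    return (mask, body)


def add(c1, c2):
    """Homomorphic addition: component-wise sum; the noise terms add up too."""
    return ((c1[0] + c2[0]) % Q, (c1[1] + c2[1]) % Q)


def decrypt(ciphertext):
    """Remove the mask, then round to the nearest multiple of DELTA."""
    noisy = (ciphertext[1] - ciphertext[0] * SECRET_KEY) % Q
    return ((noisy + DELTA // 2) // DELTA) % T


def sum_of_ones(count, noise_per_ciphertext=1000):
    """Homomorphically add `count` encryptions of 1, each carrying the same noise."""
    total = encrypt(0, mask=0, noise=0)
    for k in range(count):
        total = add(total, encrypt(1, mask=12_345 + k, noise=noise_per_ciphertext))
    return total


for count in (5, 32, 33):
    expected = count % T
    print(count, "additions -> decrypted", decrypt(sum_of_ones(count)), "expected", expected)
```

Decryption stays correct while the accumulated noise remains below DELTA / 2 and fails once it crosses that limit, which is the toy analogue of exhausting the noise budget. In this toy model, a larger T shrinks DELTA and therefore leaves less room for noise.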

IV. Results and Discussion
The experiments are divided into several parts to obtain accurate results:
• Training of the LeNet-5, AlexNet, Spiking-LeNet-5, and Spiking-AlexNet models on the training set of the FashionMNIST dataset;
• Validating the models on the test set of the same dataset;
• Creating encrypted models based on the previously trained models [59];
• Encrypting the test set;
• Evaluating the encrypted images on the encrypted LeNet-5, AlexNet, Spiking-LeNet-5, and Spiking-AlexNet models.

A. Training phase
For the training phase, optimal parameters were set to increase accuracy. The best learning rate was found using the learning rate finder technique [60], whereas the number of epochs was chosen based on early stopping to prevent overfitting [61]. Table I reports all the parameters chosen for the training phase.
Figure 7 shows the accuracy and loss during training, comparing the LeNet-5 CNN with Spiking-LeNet-5 and their respective validation values at each epoch.
Note that Spiking-LeNet-5 has slightly lower accuracy than (non-spiking) LeNet-5 due to the intrinsic complexity of the model itself, and its computational time is, on average, equal to that of LeNet-5 multiplied by the value of seq_length.

B. Encryption
It is necessary to determine the three fundamental parameters that define a BFV HE scheme to proceed with image encryption: m, t, and q.
The m parameter is chosen as a power of two and is directly proportional to the amount of NB. Values that are too small would be insecure, whereas values that are too large would make the computation too complex. Generally, m is never less than 1024, and in our specific case, we observe that values of 2048 or higher do not influence the results but incur exponentially longer computation time. For these reasons, we chose to keep the parameter m fixed at 1024.
The t parameter can also vary: low values do not allow for proper encryption, whereas excessively high values degrade the result due to computational complexity. In our case, we evaluated the results over values of t ranging from 10 to 5000.
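One contributing factor for the failure at low t can be sketched directly: in BFV, plaintext arithmetic is carried out modulo t, so any intermediate value that does not fit in the range representable modulo t wraps around. The example below is a plain (unencrypted) simulation of that wrap-around on a small integer dot product; the weights, inputs, and helper names are hypothetical, and the sketch does not model the noise effects that appear at high t.

```python
def centered_mod(value, t):
    """Map value to its signed representative modulo t."""
    r = value % t
    return r - t if r > t // 2 else r


def dot_mod_t(weights, inputs, t):
    """Integer dot product where every accumulation is reduced modulo t."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = (acc + w * x) % t
    return centered_mod(acc, t)


# Hypothetical quantized weights and inputs of a single neuron.
weights = [3, -2, 5]
inputs = [7, 4, 6]
true_value = sum(w * x for w, x in zip(weights, inputs))

for t in (10, 50, 100, 5000):
    result = dot_mod_t(weights, inputs, t)
    status = "ok" if result == true_value else "wrapped"
    print(f"t={t}: {result} ({status}), true value {true_value}")
```

The result is only recovered once t is large enough to hold the true value with its sign; for smaller t the neuron output wraps to an unrelated number.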
The q parameter is closely related to the m parameter in determining the NB. Hence, it is automatically calculated by the Pyfhel library to achieve proper encryption.
With the hardware at our disposal (Tesla P100-PCIE GPU, Intel(R) Xeon(R) Gold 6134 CPU @ 3.20 GHz, and 100 GB of RAM), it took approximately 30 s to encrypt each image and an additional 30 s to evaluate it on the encrypted LeNet-5. However, evaluation on the encrypted Spiking-LeNet-5 took around 15 min because the seq_length parameter was set to 30. For a clearer visualization, Table II shows a comparison of the computation times for each image along with estimates for other models: AlexNet [6], VGG-16 [64], and ResNet-50 [65]. These long execution times are aligned with the recent trend in the community that calls for building specialized accelerators for HE. A popular example is the data protection in virtual environments (DPRIVE) challenge, used by DARPA to sponsor organizations that pursue R&D of HE hardware [66], [67], [68].
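The reported timings are consistent with the rule of thumb that the spiking model costs roughly seq_length times the non-spiking one. The short check below uses only the approximate figures quoted above; the function name `estimate_spiking_seconds` is hypothetical, and the linear scaling is the paper's average-case observation rather than an exact law.

```python
def estimate_spiking_seconds(dnn_seconds, seq_length):
    """Estimate SNN evaluation time as the DNN time repeated once per time step."""
    return dnn_seconds * seq_length


ENCRYPTION_SECONDS = 30    # approximate time to encrypt one image
LENET5_EVAL_SECONDS = 30   # approximate encrypted LeNet-5 evaluation per image
SEQ_LENGTH = 30            # number of SNN time steps

spiking_seconds = estimate_spiking_seconds(LENET5_EVAL_SECONDS, SEQ_LENGTH)
print("Spiking-LeNet-5 evaluation:", spiking_seconds / 60, "min per image")
print("LeNet-5 total per image:", ENCRYPTION_SECONDS + LENET5_EVAL_SECONDS, "s")
```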

C. Evaluation
In Figures 8 to 11, we can observe the results of the encrypted executions compared to the standard ones, along with the correct labels, as the parameter t varies. The various parts of the bars in the figures are divided as follows:
• Blue (both correct): indicates the number of images classified correctly in both the standard and encrypted executions;
• Orange (standard correct): represents the case where images are classified correctly in the standard execution but not in the encrypted one. It can be observed that by summing the blue and orange columns, we always obtain the same result: the accuracy of validation during training (see pointer 1 - Figures 8 to 11);
• Green (encrypted correct): represents images classified correctly in the encrypted case but not in the standard one. It can be noticed that the percentages are generally low; this is because the encrypted model mistakenly classified the images differently from the standard model, but by chance, it happened to choose the correct label. Therefore, this column part does not represent a valid statistical case but rather randomness;
• Red (both wrong but equal): indicates cases where the encrypted model produced the same classification as the standard one, but the label was incorrect. This part is essential, as it shows the encrypted model working correctly by emulating the standard model, even though the classification is incorrect overall;
• Purple (both wrong and different): shows cases where the encrypted model made mistakes by not producing the same result as the standard model, and the standard model also made mistakes by not classifying correctly.
It can be noticed that for both low and high values of t, the results degrade rapidly. For a better understanding, let us compare LeNet-5 with Spiking-LeNet-5 by looking at Figures 12 and 14, and AlexNet with Spiking-AlexNet in Figures 13 and 15, where the accuracies are graphically displayed as t varies.
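The five bar segments described above amount to a simple decision rule on three values per image: the true label, the standard prediction, and the encrypted prediction. The sketch below applies that rule to a handful of made-up samples; the function name `categorize` and the sample data are hypothetical and do not come from the experiments.

```python
from collections import Counter


def categorize(label, standard_pred, encrypted_pred):
    """Assign one image to the bar segment it contributes to."""
    standard_ok = standard_pred == label
    encrypted_ok = encrypted_pred == label
    if standard_ok and encrypted_ok:
        return "both correct"               # blue
    if standard_ok:
        return "standard correct"           # orange
    if encrypted_ok:
        return "encrypted correct"          # green
    if standard_pred == encrypted_pred:
        return "both wrong but equal"       # red
    return "both wrong and different"       # purple


# Hypothetical (label, standard prediction, encrypted prediction) triples.
samples = [
    (3, 3, 3),
    (5, 5, 1),
    (2, 7, 2),
    (4, 9, 9),
    (6, 0, 8),
    (1, 1, 1),
]

counts = Counter(categorize(*sample) for sample in samples)
for category, count in counts.items():
    print(f"{category}: {count}")
```

Because the branches are mutually exclusive and exhaustive, the five counts always sum to the number of evaluated images.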
In Figures 12 and 13, we compared the LeNet-5, Spiking-LeNet-5, AlexNet, and Spiking-AlexNet models in the case where both the standard and encrypted models were correct, that is, the blue parts of Figures 8 to 11 plotted as curves. As can be seen, the Spiking-LeNet-5 version achieves acceptable levels of accuracy much earlier than LeNet-5, even with low values of t (see pointer 1 - Figure 12). For instance, when t = 50, Spiking-LeNet-5 achieves about 40% higher accuracy than LeNet-5. However, the final accuracy of the Spiking-LeNet-5 model is slightly lower than that of LeNet-5 (see pointer 2 - Figure 12); this can be attributed to the fact that the Spiking-LeNet-5 model itself had lower validation accuracy compared to LeNet-5, as shown in Figure 7. Similar observations can be derived by comparing AlexNet with Spiking-AlexNet. Spiking-AlexNet reaches higher accuracy than AlexNet for low values of t (see pointer 1 - Figure 13), but for larger t, the accuracy of AlexNet is slightly higher than that of Spiking-AlexNet (see pointer 2 - Figure 13). In contrast, in Figures 14 and 15, we compared the sums of the blue and red parts from Figures 8 to 11. In this manner, we can observe all the cases where the encrypted version produced the same result as the standard one, even if it was incorrect (see pointer 2 - Figures 14 and 15). From these graphs, we can notice that the encrypted version of the Spiking-LeNet-5 model performs better than the encrypted LeNet-5, and the encrypted Spiking-AlexNet performs better than the encrypted AlexNet. The SNNs achieve valid results with lower values of t (see pointer 1 - Figures 14 and 15) and higher overall accuracy. For excessively high values of t, the results degrade for both the DNN and SNN models due to the increased computational complexity, which hinders the attainment of acceptable outputs (see pointer 3 - Figures 12 to 15).
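The two families of curves discussed above can be derived from the per-category counts at each value of t: the "both correct" rate corresponds to the blue segment alone, and the agreement rate corresponds to the sum of the blue and red segments. The sketch below shows that arithmetic; the function name `summarize` and the counts are invented for illustration and are not experimental results.

```python
def summarize(counts):
    """Return (both-correct rate, agreement rate, standard accuracy) for one value of t."""
    total = sum(counts.values())
    both_correct = counts["both correct"] / total
    # Agreement: the encrypted model reproduces the standard prediction, right or wrong.
    agreement = (counts["both correct"] + counts["both wrong but equal"]) / total
    # Standard accuracy: blue plus orange, independent of the encryption parameters.
    standard_accuracy = (counts["both correct"] + counts["standard correct"]) / total
    return both_correct, agreement, standard_accuracy


# Hypothetical counts over 100 test images for a single value of t.
example_counts = {
    "both correct": 70,
    "standard correct": 15,
    "encrypted correct": 2,
    "both wrong but equal": 8,
    "both wrong and different": 5,
}

both_correct, agreement, standard_accuracy = summarize(example_counts)
print(f"both correct: {both_correct:.2f}")
print(f"agreement with standard model: {agreement:.2f}")
print(f"standard accuracy: {standard_accuracy:.2f}")
```

The agreement rate is the more direct measure of whether encryption preserved the behavior of the model, since it does not penalize the encrypted model for errors that the standard model also makes.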

V. Conclusions
In this work, we have demonstrated how SNNs can be a crucial factor in the development of future private and secure networks. Despite the increased time requirement, SNNs offer higher reliability, and further research can potentially reduce the time differences between DNNs and SNNs. The use of encryption systems such as HE is now more important than ever, considering the vast amount of data being exchanged worldwide. In this research, we have successfully shown how complete encryption systems can be applied to complex models, both CNNs and SNNs, ensuring correct final results without the possibility of decoding during the intermediate process, and how these results differ for CNNs and SNNs. Our work represents the first proof-of-concept that demonstrates the applicability of HE schemes to SNNs. In future works, we plan to design acceleration techniques for encrypted SNNs and extend the experiment set with deeper networks.

Fig. 2. Overview of (a) the functionality of a DNN and (b) the functionality of an SNN.

Fig. 4. Our proposed encryption framework with the experimental setup.

Fig. 5. The FashionMNIST dataset consists of 10 classes of monochrome clothing items and is divided into 60,000 images for the training set and 10,000 images for the test set.

Fig. 7. Accuracy and loss during training and validation of LeNet-5 and Spiking-LeNet-5 for the FashionMNIST dataset. The figure shows accuracy and loss values across different training epochs.

Fig. 13. Comparison of FashionMNIST accuracy between AlexNet and Spiking-AlexNet for t variations when both standard and encrypted versions classified correctly.

Fig. 14. Comparison of FashionMNIST accuracy between LeNet-5 and Spiking-LeNet-5 for t variations when the standard and encrypted versions coincide in both correct and incorrect classification.

TABLE II
Execution time for each image reported in seconds for each model. The total encrypted execution is broken down into encryption and processing time of encrypted data. The long processing times of encrypted data are due to the complexity of the encrypted computations.