FICConvNet: A Privacy-Preserving Framework for Malware Detection Using CKKS Homomorphic Encryption

Pang, Si; Wen, Jing; Liang, Shaoling; Huang, Baohua

doi:10.3390/electronics14101982



by Si Pang ¹, Jing Wen ², Shaoling Liang ² and Baohua Huang ¹,*

¹ School of Computer and Electronic Information, Guangxi University, Nanning 530004, China

² Guangxi Key Laboratory of Digital Infrastructure, Guangxi Zhuang Autonomous Region Information Center, Nanning 530000, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(10), 1982; https://doi.org/10.3390/electronics14101982

Submission received: 7 April 2025 / Revised: 30 April 2025 / Accepted: 9 May 2025 / Published: 13 May 2025

(This article belongs to the Section Artificial Intelligence)


Abstract

Recent advancements in cloud computing, edge computing, and the Internet of Things (IoT) have increased the complexity of network environments and provided fertile ground for malicious attacks. Existing deep learning (DL)-based malware detection methods have improved detection accuracy and generalization ability, but they face serious challenges in protecting user data privacy. To address this problem, this paper proposes FICConvNet, a non-interactive malware detection system based on CKKS homomorphic encryption. The system achieves end-to-end data privacy protection: sensitive data uploaded by users are processed only in encrypted form, which prevents data leakage and protects the privacy of the detection results. The key technology of FICConvNet is a lightweight ciphertext inference architecture that combines depthwise separable convolution (DS Conv) with structured sparse projection to significantly reduce the complexity of homomorphic computation. In addition, an adaptive learnable polynomial activation function (ALPolyAct) is designed to replace traditional fixed polynomial activation functions, enhancing the expressive power and inference accuracy of the model. Finally, a zero-decryption inference process further strengthens the privacy of user data and the security of detection results. Experimental results show that FICConvNet achieves a detection accuracy of 95.86%, significantly outperforming the existing ciphertext inference model CryptoNets (an improvement of about 15.5 percentage points) and approaching the performance of the plaintext model ResNet-18. Moreover, FICConvNet reduces ciphertext inference time by about 80% compared with Conv2d structures. This work provides an effective privacy-preserving solution for malware detection and explores new research directions for applying homomorphic encryption in this field.

Keywords:

malware detection; privacy protection; CKKS homomorphic encryption; adaptive activation function; zero-decryption inference

1. Introduction

Recent advancements in cloud computing, edge computing, and the Internet of Things (IoT) have increased the complexity of network environments and provided fertile ground for malicious attacks. According to the Malware Trends Report 2024 [1], malware attacks are projected to reach unprecedented levels this year, driven in particular by a surge in stealer malware, whose reported detections rose to 51,291 from 18,290 in the previous year. To counter these increasingly frequent attacks, deep learning (DL)-based malware detection has become a crucial area of research.

However, existing studies often prioritize optimizing deep learning models for accuracy while overlooking user data privacy and the risk of leakage of detection results. Given the increasingly stringent data privacy regulations (e.g., GDPR in Europe and CCPA in the U.S.), ensuring privacy protection without compromising detection performance has become essential in cybersecurity [2]. Consequently, our research aimed to explore strategies to achieve both privacy protection and effective detection performance.

Current malware detection schemes typically require users to upload raw binary files or plaintext features to the cloud for analysis. This practice poses significant privacy risks: sensitive data (e.g., user identity and device operation records) may be exposed during unencrypted transmission or plaintext computation, where they can be snooped on by third-party providers or intercepted by malicious actors. Furthermore, in high-security environments, users also need to keep the detection results confidential. Ensuring user privacy throughout the entire malware detection process is therefore an urgent challenge in cybersecurity [3].

Homomorphic encryption (HE) allows computation on encrypted data and can be categorized into partially homomorphic encryption (PHE) and fully homomorphic encryption (FHE) according to the types of operations it supports. HE has laid the theoretical foundation for privacy-preserving inference. Nonetheless, DL inference under HE still faces challenges, primarily its high computational complexity and the difficulty of evaluating nonlinear activation functions on encrypted data. Furthermore, most existing schemes rely on multiple rounds of interaction between the client and server or on coordination with trusted third parties, which significantly increases system complexity and security risks.

CKKS (Cheon-Kim-Kim-Song) is a fully homomorphic encryption (FHE) scheme that supports approximate arithmetic on floating-point (real) numbers and SIMD parallel processing, in which many values are packed into the slots of a single ciphertext. These properties make it well suited to the computational demands of DL model inference, and in recent years it has shown significant benefits in practical applications such as secure multi-party computation and cloud-based inference.

In response to these challenges, this paper introduces a non-interactive malware detection system based on the CKKS scheme and makes the following key contributions:

Lightweight Ciphertext Inference Architecture and Acceleration Strategy
− Integrating Depthwise Separable Convolution (DS Conv) with sparse projection reduces the number of homomorphic multiplications and ciphertext rotations, thereby decreasing computational overhead. Based on this, we designed a ciphertext-friendly convolution module, FICConv, to achieve efficient ciphertext inference.
− We proposed a Dynamic Multi-byte Mapping Algorithm that generates malicious code images from a weighted combination of the arithmetic sum and the mean value of each byte group, compressing the data volume while retaining key features.
Adaptive Learning Activation Function and Accuracy Compensation Mechanisms
− We designed a dynamic parametric polynomial activation function (ALPolyAct) that, combined with L2 regularization and residual connections, adapts to the feature distributions of different network layers, enhancing the model's expressiveness and inference accuracy.
End-to-End Non-Interactive Framework for Privacy Protection
− Zero-decryption inference is achieved through single-ciphertext inference. Users upload only encrypted data and receive encrypted results, which reduces communication overhead and eliminates the risk of server-side data leakage, since the server never performs decryption.

The remainder of this paper is organized as follows: Section 2 reviews related work on malware detection using deep learning and deep ciphertext inference models. Section 3 introduces our non-interactive malware detection system based on CKKS homomorphic encryption. Section 4 presents the evaluation of our method. In Section 5, we provide a comprehensive discussion of the proposed framework. Finally, Section 6 concludes the paper.

2. Related Works

Our review of the existing literature covers two primary directions: first, DL techniques for malware detection; second, neural network models based on HE. The following subsections critically analyze research advances in these two domains.

2.1. DL for Malware Detection

In the field of DL-based malware detection, researchers have constructed and optimized model structures to learn a variety of malware features, such as binary files, logs, and call instructions, thereby improving the accuracy and generalization capability of malware detection [4]. Nataraj et al. [5] proposed a method for mapping byte sequences into grayscale images, opening up the combination of malware detection with visualization techniques. To address data imbalance among the malware families in the Malimg dataset, Cui et al. [6] classified grayscale images using a CNN and developed an effective data balancing method, DRBA, based on the Bandwidth Allocation Technique (BAT) algorithm. Kalash et al. [7] proposed a malware classification method using deep CNNs and demonstrated the effectiveness of DL in malware detection, reporting classification accuracies of 98.52% and 98.99% on the Malimg and Big2015 datasets, respectively. Jang et al. [8] explored using generative adversarial networks (GANs) to synthesize malware images for enhancing malware detection models; by generating new malware images and training a CNN on them to cope with data imbalance, they showed the potential of this approach for dealing with complex malware code. Ma et al. [9] presented a model based on a convolutional neural network and an attention mechanism that converts malware binaries into grayscale images in order to extract more informative features. Wu et al. [10] proposed an ensemble DL method based on intuitionistic fuzzy sets; it first extracts six types of features from disassembled and byte files and then fuses them, addressing the reliance on a single feature type in traditional classification methods. Singh et al. [11] proposed a visualization- and machine learning-based framework to classify Android malware, in which handcrafted features extracted from image sections with algorithms such as the Global Image Descriptor (GIST) are fused with CNN features. Nobakht et al. [12] proposed DEMD-IoT, a lightweight framework that combines three one-dimensional convolutional neural networks (1D-CNNs) with a Random Forest meta-learner to detect IoT botnets; it achieves 99.9% accuracy on the IoT-23 dataset and supports highly concurrent traffic analysis. Agrawal et al. [13] proposed the ARI-LSTM model, which uses an attention mechanism to enhance the LSTM's ability to capture temporal dependencies in ransomware API call sequences and significantly improves detection performance on a Windows ransomware dataset. Aldehim et al. [14] designed a two-hidden-layer extreme learning machine (TELM) combined with Markov chain optimization for fast malware classification in specific scenarios, reducing detection latency while reaching 98.95% accuracy in medical and IoT scenarios. Miao et al. [15] introduced DistillMal, a lightweight malware detection approach based on knowledge distillation, in which a student network acquires valuable knowledge from a teacher network; their work underscores the potential of knowledge distillation for building efficient malware detection systems. Aldhafferi et al. [16] proposed an Android malware detection approach that extracts dynamic features from Android applications, applies preprocessing and normalization, and uses support vector regression (SVR) with a Radial Basis Function (RBF) kernel for classification, enabling the detection of both known and novel malware variants.

2.2. Neural Network Model Based on HE

In 2016, Dowlin et al. [17] proposed CryptoNets, the first neural network that can be applied to encrypted data: users send data to the cloud in encrypted form and receive the predictions in encrypted form. CryptoNets achieved 99% classification accuracy on the MNIST dataset but performed poorly with deeper neural network models. Chabanne et al. [18] combined batch normalization with polynomial approximation to devise a privacy-preserving classification method for neural networks with more than two layers, outperforming CryptoNets. Hesamifard et al. [19] designed CryptoDL, which replaces activation functions such as ReLU, Sigmoid, and Tanh with low-degree polynomial approximations to ensure compatibility with homomorphic encryption, achieving 99.52% and 91.5% accuracy on the MNIST and CIFAR-10 datasets, respectively. Lee et al. [20] implemented the standard ResNet-20 model using RNS-CKKS FHE with bootstrapping and validated the implementation on the CIFAR-10 dataset with plaintext model parameters. By binarizing the input data and weights of a CNN model, Zhou et al. [21] accelerated DL on encrypted data; they designed an efficient pooling layer to handle ciphertext comparison operations and reported a performance improvement of at least 6.3 s for convolution operations. Nandakumar et al. [22] evaluated the feasibility of training neural networks on encrypted data in a fully non-interactive manner; their system uses the open-source FHE toolkit HElib to implement neural network training based on stochastic gradient descent (SGD), and training is accelerated by simplifying the network, reducing precision, and selecting an appropriate data representation and resolution. Badawi et al. [23] implemented the first HE convolutional neural network (HCNN) with GPU acceleration, achieving classification accuracies of 99% and 77.55% on the MNIST and CIFAR-10 datasets, respectively. Chen et al. [24] proposed the THE-X framework, which applies HE to the Transformer model for the first time, with an accuracy loss of less than 1% on the GLUE benchmark through ReLU substitution and Softmax approximation. Zhu et al. [25] designed an FPGA-based HE CNN inference framework that achieves a 13.49 s speedup over a CPU implementation and supports MNIST and CIFAR-10 tasks. Kim et al. [26] combined the RAConv and CAConv algorithms to reduce the inference latency of ResNet-18 on ImageNet to 14.7 s while significantly reducing the memory footprint. Aremu et al. [27] proposed the PolyKervNet architecture, which uses polynomial kervolution to eliminate nonlinear layers entirely, improving the efficiency of private inference and substantially reducing latency in image classification tasks.

3. Method

3.1. Overall Architecture Design

In this paper, we presented an end-to-end, non-interactive DL inference model for malware detection in a ciphertext environment. The primary objective of this model was to facilitate efficient malware classification while ensuring the privacy of user data and detection results. The overall architecture of the malware detection system is illustrated in Figure 1, which includes several key modules.

3.1.1. Client Side

Visualization:

To preprocess the files under inspection and obtain standardized plaintext input images, we proposed a multi-byte mapping algorithm (refer to Section 3.2 for details).

Encrypted Input:

The input image is encrypted with the CKKS scheme using the Microsoft SEAL 4.1 [28] library; all pixel values of the input image are encoded into a single ciphertext through plaintext slot packing. The ciphertext and the public key are then sent to the server side.

Decrypted output:

After receiving the predicted result ciphertext returned by the server side, the prediction classification results are obtained by decrypting it with the private key.

3.1.2. Server Side

We designed a flexible and efficient malware detection network architecture on the server side. This architecture optimizes computational inference for CKKS ciphertext inputs.

Stem

In the Stem, we only use one layer of conventional 2D convolution (Conv2d) for feature down-sampling.

Backbone

In the Backbone, convolutional and pooling layers can be flexibly combined and stacked. This flexibility mitigates the constraint on feature-map sizes that arises because CKKS ciphertext convolution cannot use zero padding, so every convolution with a kernel larger than 1 × 1 shrinks the feature map.

Head

We discarded the linear layer and directly utilized a 1 × 1 Conv along with global average pooling to obtain the final output distribution.

Furthermore, this paper proposes a Fast-Inference Ciphertext Convolution (FICConv) module to replace Conv2d within the Backbone and introduces a learnable polynomial activation function (ALPolyAct) to replace activation functions (such as ReLU and sigmoid) that are difficult to implement in ciphertext inference.

3.2. Data Preprocessing

To reduce the volume of malware data that must be encrypted while preserving its features, we proposed a multi-byte mapping algorithm for visualizing malware.

First, the raw binary data are extracted from the malware file and treated as a sequence of bytes (8 bits each). Depending on the size of the malicious file, these bytes are grouped according to the rules in Table 1, where k denotes the number of bytes in each group. Each group of bytes is then mapped to one pixel as follows:

Step 1. Compute the arithmetic sum P₁ (modulo 256) and the mean value P₂ of the decimal byte values b_i (i = 1, …, k) in the group:

P_{1} = \left( \sum_{i=1}^{k} b_{i} \right) \bmod 256 \quad (1)

P_{2} = \left\lfloor \frac{1}{k} \sum_{i=1}^{k} b_{i} \right\rfloor \quad (2)

Step 2. Compute the pixel value as the weighted sum of P₁ and P₂:

P = 0.7 P_{1} + 0.3 P_{2} \quad (3)

Mapping multiple bytes to a single pixel effectively reduces the amount of data to be processed when large files must be inspected, and the weighted summation of the arithmetic sum (P₁) and the mean (P₂) captures both the local extreme features and the overall trend of each byte group in a balanced manner.
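The mapping in Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the group size k is passed in directly because the size-dependent grouping rules of Table 1 are not reproduced here, and the handling of a final partial group is an assumption.

```python
from typing import List


def map_group(group: bytes) -> float:
    """Map one byte group to a pixel value following Eqs. (1)-(3)."""
    k = len(group)
    total = sum(group)               # decimal values b_1..b_k summed
    p1 = total % 256                 # Eq. (1): arithmetic sum modulo 256
    p2 = total // k                  # Eq. (2): floor of the mean
    return 0.7 * p1 + 0.3 * p2       # Eq. (3): weighted combination, stays in [0, 255]


def bytes_to_pixels(data: bytes, k: int) -> List[float]:
    """Split the byte stream into consecutive groups of k bytes and map each to one pixel.

    Assumption: a trailing group shorter than k is mapped using its actual length.
    """
    return [map_group(data[i:i + k]) for i in range(0, len(data), k)]


pixels = bytes_to_pixels(bytes([10, 20, 30, 250, 250, 250]), k=3)
print(pixels)
```

Because both P₁ and P₂ lie in [0, 255] and the weights sum to 1, the pixel value also stays in [0, 255].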

To make the initially generated image close to square, the width of the image matrix is set to ⌈√N⌉, where N is the total number of pixels obtained from the previous mapping. The pixels are filled into the image matrix row by row, and the remaining positions are padded with zeros. The image is then resized to the standard input size of 64 × 64 using bilinear interpolation.
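A possible sketch of this layout step, under stated assumptions: pixels are placed row by row into a matrix of width ⌈√N⌉ with zero fill, and a hand-written bilinear resize to 64 × 64 is applied. The align-corners sampling grid is an assumption; library resizers (e.g., PyTorch's `torch.nn.functional.interpolate` with `align_corners=False`) sample slightly differently, so exact pixel values may differ from the authors' pipeline.

```python
import math

import numpy as np


def pixels_to_image(pixels, size=64):
    """Row-major layout in a ceil(sqrt(N))-wide matrix, zero fill, then resize."""
    n = len(pixels)
    width = math.ceil(math.sqrt(n))
    height = math.ceil(n / width)
    canvas = np.zeros(height * width, dtype=np.float64)
    canvas[:n] = pixels                      # remaining positions stay zero
    return bilinear_resize(canvas.reshape(height, width), size, size)


def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation on an align-corners grid (assumed convention)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                  # vertical interpolation weights
    wx = (xs - x0)[None, :]                  # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


image = pixels_to_image([float(v) for v in range(1, 11)])   # N = 10 -> 4 x 3 canvas
print(image.shape)   # (64, 64)
```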

In the final preprocessing stage, each pixel value of the generated image is sequentially mapped into the plaintext slots of the CKKS ciphertext. The CKKS encryption is then executed via the SEAL library, thereby transforming the plaintext image into its encrypted ciphertext representation.
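To illustrate how a 64 × 64 image fits into one ciphertext, the sketch below flattens the image row-major into a slot vector, so pixel (r, c) lands in slot r·64 + c. It assumes a polynomial modulus degree N = 8192, which gives N/2 = 4096 CKKS slots (the parameters actually used are listed in Table 4); in SEAL the resulting vector would then be passed to `CKKSEncoder::encode` and `Encryptor::encrypt`. Only the packing layout is shown, not the encryption itself.

```python
import numpy as np


def pack_into_slots(image, poly_modulus_degree=8192):
    """Flatten an image row-major into a zero-padded CKKS slot vector."""
    slot_count = poly_modulus_degree // 2                  # CKKS offers N/2 slots
    flat = np.asarray(image, dtype=np.float64).ravel()     # row-major (C order)
    if flat.size > slot_count:
        raise ValueError(f"{flat.size} values do not fit in {slot_count} slots")
    slots = np.zeros(slot_count)
    slots[:flat.size] = flat
    return slots


image = np.arange(64 * 64, dtype=float).reshape(64, 64)
slots = pack_into_slots(image)
print(slots.size, slots[1 * 64 + 1] == image[1, 1])   # 4096 True
```

Row-major packing is what lets the convolution and pooling steps below reach neighboring pixels with fixed slot rotations.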

3.3. Ciphertext Inference Model Optimization

This section focuses on optimizing ciphertext inference models to improve computational efficiency and accuracy. We explore improvements to the convolution algorithm, activation functions, and pooling operations under encryption to ensure that ciphertext inference models operate effectively.

3.3.1. Convolution Layer Optimization

Rotated Ciphertext Convolution Algorithm

Individual values cannot easily be extracted from, or rearranged within, a CKKS ciphertext, so it is impractical to compute each convolution window separately and reorganize the results into a feature matrix as in plaintext computation. Therefore, this paper employed the Rotated Ciphertext Convolution Algorithm from [29] to perform the ciphertext convolution operation, as illustrated in Figure 2.

First, the input ciphertext is rotated by Rot(A, x), where A is the ciphertext object and x is the number of rotation steps, so that its slots align with the corresponding parameters of the Conv kernel. The number of rotations depends on the kernel size k × k: for the 2 × 2 kernel in Figure 2 (k = 2), k² − 1 = 3 rotations are required to obtain all the copies needed for the Conv computation.

Subsequently, each rotated ciphertext block is multiplied with the corresponding Conv kernel parameter to obtain the intermediate result of each Conv, a process that involves performing k × k CKKS ciphertext multiplication operations.

Finally, all the Conv intermediate results are summed up to generate the ciphertext output containing the final Conv value.

It follows that computing a single-channel Conv kernel with the Rotated Ciphertext Convolution Algorithm requires k² − 1 rotations and k² CKKS ciphertext multiplications. Therefore, for a Conv layer with C_in input channels and C_out output channels, conventional 2D convolution (Conv2d) requires (k² − 1) × C_in rotations and k² × C_in × C_out ciphertext multiplications. To accelerate ciphertext Conv computation, we introduce Depthwise Separable Convolution (DS Conv) and sparse projection.
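The rotation-based convolution can be simulated in plaintext by treating a NumPy vector as the slot vector of a ciphertext and `np.roll` as `Rot`. This is a sketch under the assumptions of row-major packing (tap (i, j) needs a shift of i·W + j slots) and stride 1; as in DL frameworks, "convolution" here means cross-correlation. The counters reproduce the k² − 1 rotations and k² multiplications stated for a single-channel kernel.

```python
import numpy as np


def rot(ct, steps):
    """Simulated CKKS rotation Rot(A, x): slot i of the result holds slot i + x of A."""
    return np.roll(ct, -steps)


def rotated_conv(ct, kernel, width):
    """Single-channel convolution on a row-major packed image using rotations.

    Returns the output slot vector plus the number of rotations and multiplications.
    Only slots r * width + c with r, c <= size - k hold valid outputs.
    """
    k = kernel.shape[0]
    acc = np.zeros_like(ct)
    n_rot = n_mult = 0
    for i in range(k):
        for j in range(k):
            offset = i * width + j              # shift that aligns kernel tap (i, j)
            shifted = ct
            if offset:
                shifted = rot(ct, offset)
                n_rot += 1
            acc = acc + kernel[i, j] * shifted  # plaintext weight times ciphertext
            n_mult += 1
    return acc, n_rot, n_mult


H = W = 5
image = np.arange(H * W, dtype=float).reshape(H, W)
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])
out, n_rot, n_mult = rotated_conv(image.ravel(), kernel, W)
valid = out.reshape(H, W)[:H - 1, :W - 1]   # 4 x 4 valid outputs for k = 2
print(n_rot, n_mult)   # 3 4
```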

DS Conv

Depthwise Separable Convolution (DS Convolution) decomposes Conv2d into two steps:

Step 1. Depthwise Convolution (DW Conv): Each input channel is independently convolved with a single channel convolution kernel without fusing information across channels.

Step 2. Pointwise Convolution (Point Conv): Cross-channel feature fusion is performed by applying a 1 × 1 Conv kernel to the features obtained from the channel-by-channel convolution.

Therefore, to compute a DS Conv in the CKKS ciphertext state, the DW Conv is first performed using the Rotated Ciphertext Convolution Algorithm, followed by the Point Conv, which directly multiplies the DW Conv output ciphertexts by the pointwise convolution parameters and sums the products. Whereas conventional Conv2d needs k² × C_in × C_out ciphertext multiplications, DS Conv requires only k² × C_in + C_out × C_in, which effectively reduces the amount of computation.
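The two-step DS Conv can be simulated the same way. This sketch assumes that each channel is held in its own (simulated) ciphertext and counts the plaintext-weight multiplications of the DW and Point Conv steps, which should total k² × C_in + C_out × C_in; the sizes and random weights are illustrative only.

```python
import numpy as np


def rot(ct, steps):
    """Simulated CKKS rotation: slot i of the result holds slot i + steps."""
    return np.roll(ct, -steps)


def depthwise(cts, kernels, width):
    """DW Conv: each channel ciphertext is convolved with its own k x k kernel."""
    k = kernels.shape[-1]
    outs, mults = [], 0
    for ct, ker in zip(cts, kernels):
        acc = np.zeros_like(ct)
        for i in range(k):
            for j in range(k):
                acc = acc + ker[i, j] * rot(ct, i * width + j)
                mults += 1
        outs.append(acc)
    return outs, mults


def pointwise(cts, weights):
    """Point Conv: output o = sum_c weights[o, c] * cts[c] (one ciphertext per channel)."""
    outs, mults = [], 0
    for w_row in weights:                      # weights has shape (C_out, C_in)
        acc = np.zeros_like(cts[0])
        for w, ct in zip(w_row, cts):
            acc = acc + w * ct
            mults += 1
        outs.append(acc)
    return outs, mults


rng = np.random.default_rng(0)
H = W = 6
k, C_in, C_out = 3, 3, 4
images = rng.normal(size=(C_in, H, W))
dw_kernels = rng.normal(size=(C_in, k, k))
pw_weights = rng.normal(size=(C_out, C_in))
dw_out, dw_mults = depthwise([x.ravel() for x in images], dw_kernels, W)
pw_out, pw_mults = pointwise(dw_out, pw_weights)
print(dw_mults + pw_mults)   # 39 = 3*3*3 + 4*3
```

For comparison, a Conv2d layer of the same shape would need 3·3·3·4 = 108 multiplications under the counting above.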

Sparse projection

In DL, sparsifying the Conv kernel is a technical means of effectively enhancing the computational efficiency of the model. From the previous section, we know that the number of ciphertext rotations and multiplications in the Rotated Ciphertext Convolution Algorithm is determined by the number of parameters in the Conv kernel, and the sparse Conv kernel parameter matrix can effectively reduce the number of both to accelerate ciphertext computation. As shown in Figure 3, the Conv kernel parameter matrix is sparsified so that only the parameters K₀, K₁ are retained, and only one ROT operation and two ciphertext multiplications are required for the input CKKS ciphertext A.
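Skipping zero taps is what turns sparsity into savings. The sketch below extends the simulated rotated convolution to ignore zero kernel entries. The 2 × 2 kernel keeps only its first row, which is an assumed placement of K₀ and K₁ (Figure 3 is not reproduced here) and yields the one rotation and two multiplications described above. If the retained taps excluded position (0, 0), every retained tap would need its own rotation.

```python
import numpy as np


def sparse_rotated_conv(ct, kernel, width):
    """Rotated ciphertext convolution (simulated) that skips zero kernel taps."""
    k = kernel.shape[0]
    acc = np.zeros_like(ct)
    n_rot = n_mult = 0
    for i in range(k):
        for j in range(k):
            if kernel[i, j] == 0:
                continue                          # zero tap: no rotation, no multiplication
            offset = i * width + j
            shifted = ct
            if offset:
                shifted = np.roll(ct, -offset)    # simulated Rot(ct, offset)
                n_rot += 1
            acc = acc + kernel[i, j] * shifted
            n_mult += 1
    return acc, n_rot, n_mult


W = 4
A = np.arange(16, dtype=float)             # a 4 x 4 image packed row-major
K = np.array([[0.5, -1.0], [0.0, 0.0]])    # hypothetical K0, K1; other taps projected to 0
C, n_rot, n_mult = sparse_rotated_conv(A, K, W)
print(n_rot, n_mult)   # 1 2
```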

To standardize the sparse form of the Conv kernel parameter matrix and make it easy to skip zero parameters during ciphertext computation, we introduced sparse projection, which sparsifies specific regions of the Conv kernel parameter matrix during training. This method actively sets a specific region of the parameter matrix to zero during training, so that the connections in that region no longer learn. As shown in Figure 4, given the original Conv kernel K ∈ ℝ^{k×k} and the sparse projection matrix M ∈ {0, 1}^{k×k}, the kernel after sparse projection is

K_{sparse}(i, j) = \begin{cases} K_{i,j}, & \text{if } M_{i,j} = 1 \\ 0, & \text{if } M_{i,j} = 0 \end{cases} \quad (4)

that is, K_sparse = K ⊙ M (element-wise product). The Conv result then becomes

C = A \ast K_{sparse} \quad (5)

where ∗ denotes the ciphertext convolution of the input ciphertext A.
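Eq. (4) amounts to an element-wise product K ⊙ M. A minimal sketch of applying it during training, assuming the mask is re-applied after every gradient step (the exact point at which the projection is applied is not specified here) and using the corners-and-center 3 × 3 pattern later adopted for FICConvNet-4. The gradients are random placeholders, not a real training loop.

```python
import numpy as np

# Corners-and-center mask M (1 = keep, 0 = projected to zero), as used for FICConvNet-4
M = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)


def project(kernel, mask):
    """Eq. (4): keep K[i, j] where M[i, j] == 1, set it to 0 elsewhere (K * M)."""
    return kernel * mask


rng = np.random.default_rng(0)
K = rng.normal(size=(3, 3))
lr = 0.1
for _ in range(3):                        # toy loop; gradients are random placeholders
    grad = rng.normal(size=(3, 3))
    K = project(K - lr * grad, M)         # projected update keeps the sparsity pattern

s = M.sum() / M.size                      # sparse projection rate: fraction of taps kept
print(round(s, 3))   # 0.556
```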

Table 2 compares the computational cost of the above methods, where C_in and C_out are the numbers of input and output channels, respectively; S is the sparse projection rate, i.e., the fraction of kernel parameters retained (S ≤ 1); ROT is the number of ciphertext rotations; and HE-Mult is the number of homomorphic multiplications.
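Since Table 2 itself is not reproduced in this text, the following sketch tabulates operation counts from the formulas stated in this section. The rotation count for DS Conv and the sparse-DS entry are derived here under assumptions rather than copied from Table 2: the DW Conv rotates each input channel k² − 1 times, `kept = S·k²` taps survive per kernel, and the retained taps include position (0, 0), so each input channel needs kept − 1 rotations.

```python
def he_cost(k, c_in, c_out, kept=None):
    """(rotations, multiplications) per layer for Conv2d, DS Conv and sparse DS Conv.

    kept is the number of retained taps per k x k kernel (S = kept / k**2); it is
    assumed to include tap (0, 0), so kept - 1 rotations per input channel suffice.
    """
    taps = k * k
    kept = taps if kept is None else kept
    return {
        "Conv2d": ((taps - 1) * c_in, taps * c_in * c_out),
        "DS Conv": ((taps - 1) * c_in, taps * c_in + c_out * c_in),
        "DS Conv + sparse": ((kept - 1) * c_in, kept * c_in + c_out * c_in),
    }


for name, (rots, mults) in he_cost(k=3, c_in=16, c_out=16, kept=5).items():
    print(f"{name:18s} ROT={rots:4d} HE-Mult={mults:5d}")
```

With the corners-and-center pattern (kept = 5 of 9 taps) and 16 channels in and out, the multiplication count drops from 2304 for Conv2d to 336, and the rotation count halves relative to the dense kernels.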

3.3.2. An Adaptive Learnable Polynomial Activation Function (ALPolyAct)

The activation functions used in general plaintext models (e.g., ReLU, Sigmoid, Tanh) involve comparisons, exponentials, or other nonlinear operations that are difficult to implement directly in CKKS homomorphic encryption environments, so they are replaced by low-degree polynomial approximations. Low-degree polynomials contain only additions and multiplications, which are easy to evaluate on ciphertexts. However, conventional low-degree polynomial activation functions use fixed coefficients, which limits their adaptability and performance in convolution layers at different depths. To address this issue, this paper proposes a learnable low-degree polynomial activation function (ALPolyAct), defined as follows:

f(x) = α_{1} x + α_{2} x^{2} + α_{3} \quad (6)

where α₁, α₂, and α₃ are learnable polynomial coefficients. During plaintext training, we add an L2 regularization term to the loss function that encourages the sum of the coefficients to approach 1. The coefficients α₁, α₂, and α₃ are updated through backpropagation, so that the activation function adapts to the feature distribution of each layer and enhances the expressive power of the model. After each parameter update, a projection step (projected gradient method) rescales the coefficients so that they sum to 1, balancing nonlinear expressiveness against numerical stability and preventing gradient explosion. The procedure is given in Algorithm 1.

Algorithm 1 ALPolyAct Parameter Training

Input: training dataset D, regularization strength λ = 0.01, learning rate η, number of epochs N.
Output: trained coefficients α_{1}^{(l)}, α_{2}^{(l)}, α_{3}^{(l)} for each ALPolyAct layer l.
1: Start
2: Initialize ALPolyAct for each layer l: α_{1}^{(l)} = 0.4, α_{2}^{(l)} = 0.3, α_{3}^{(l)} = 0.2
3: for epoch = 1 to N do
4: Forward propagation (a^{(0)} is the input sample)
5: for each layer l = 1 to L do
6: z^{(l)} = \mathrm{Conv}^{(l)}(a^{(l-1)})
7: a^{(l)} = α_{1}^{(l)} z^{(l)} + α_{2}^{(l)} (z^{(l)})^{2} + α_{3}^{(l)} // per-layer ALPolyAct
8: Compute loss
9: L_{main} = \mathrm{TaskLoss}(a^{(L)}, y)
10: L_{reg} = λ \sum_{l=1}^{L} (α_{1}^{(l)} + α_{2}^{(l)} + α_{3}^{(l)} - 1)^{2}
11: L_{total} = L_{main} + L_{reg}
12: Backward propagation and parameter update
13: Compute gradients \nabla_{α_{i}^{(l)}} L_{total} for all layers
14: for each ALPolyAct layer l do
15: α_{i}^{(l)} \leftarrow α_{i}^{(l)} - η \nabla_{α_{i}^{(l)}} L_{total} (i = 1, 2, 3) // update parameters
16: S^{(l)} = α_{1}^{(l)} + α_{2}^{(l)} + α_{3}^{(l)}
17: α_{i}^{(l)} \leftarrow α_{i}^{(l)} / S^{(l)} (i = 1, 2, 3) // projection
18: End
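Algorithm 1 can be exercised on a toy problem. This sketch trains a single ALPolyAct layer in isolation with a mean-squared-error task loss against a hypothetical quadratic target, using the initial values, λ, the L_reg gradient, and the divide-by-sum projection of Algorithm 1. The convolution layers, the real task loss, and the learning rate are stand-ins, not the authors' training setup.

```python
import numpy as np


def alpolyact(x, a):
    """ALPolyAct, Eq. (6): f(x) = a1*x + a2*x^2 + a3."""
    return a[0] * x + a[1] * x ** 2 + a[2]


def train_alpolyact(x, y, epochs=500, lr=0.05, lam=0.01):
    """Fit one layer's coefficients with MSE + L_reg and renormalize (Algorithm 1, lines 10-17)."""
    a = np.array([0.4, 0.3, 0.2])                     # initial values from Algorithm 1
    feats = np.stack([x, x ** 2, np.ones_like(x)])    # df/d(a1, a2, a3), shape (3, n)
    for _ in range(epochs):
        err = alpolyact(x, a) - y
        grad_main = 2.0 * feats @ err / x.size        # gradient of the mean squared error
        grad_reg = 2.0 * lam * (a.sum() - 1.0)        # gradient of lam * (a1 + a2 + a3 - 1)^2
        a = a - lr * (grad_main + grad_reg)           # gradient step (line 15)
        a = a / a.sum()                               # projection: divide by S (lines 16-17)
    return a


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=256)
y = 0.5 * x + 0.3 * x ** 2 + 0.2      # hypothetical target whose coefficients sum to 1
alpha = train_alpolyact(x, y)
print(alpha, alpha.sum())
```

Because the target coefficients already sum to 1, the projection and the regularizer do not pull against the task loss here; with real data, the two can trade off, which is the balance the text describes.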

3.3.3. Pooling Layer

Since max pooling requires comparison operations, which are not CKKS-friendly, our method uses only average pooling to down-sample feature maps. Average pooling computes the mean of features over local regions, which is similar to a convolution. Specifically, k × k average pooling can be converted into the following two equivalent steps:

In step 1, a convolution is performed with a k × k kernel whose parameters are all 1 and a stride of k. In the CKKS ciphertext state, because all kernel parameters equal 1, the rotated ciphertext convolution algorithm reduces to purely additive operations: the copies obtained by Rot are simply summed homomorphically to obtain a CKKS ciphertext that holds the windowed local sums:

C_{sum} = \sum_{i=0}^{k^{2}-1} \mathrm{Rot}(C, i) \quad (7)

In step 2, the result is scaled to obtain the window average, which requires one ciphertext multiplication by the constant 1/k²:

C_{avg} = \frac{1}{k \times k} \cdot C_{sum} \quad (8)

Therefore, in ciphertext computation, k² − 1 ciphertext rotations and one ciphertext multiplication are required during the pooling of a single input channel.
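A plaintext simulation of Eqs. (7) and (8), assuming row-major packing of width W: the i-th window position (i = 0 to k² − 1) is reached by shifting (i div k)·W + (i mod k) slots. The stride-k results are read from slots (k·r)·W + k·c; the remaining slots hold partial sums that later layers must ignore or mask.

```python
import numpy as np


def he_avg_pool(ct, k, width):
    """k x k average pooling on a row-major packed image via Eqs. (7)-(8)."""
    c_sum = ct.copy()                          # i = 0: Rot(C, 0) is C itself
    n_rot = 0
    for i in range(1, k * k):
        offset = (i // k) * width + (i % k)    # assumed slot shift for window position i
        c_sum = c_sum + np.roll(ct, -offset)   # simulated Rot, then homomorphic addition
        n_rot += 1
    c_avg = c_sum * (1.0 / (k * k))            # Eq. (8): one multiplication by a constant
    return c_avg, n_rot


H = W = 4
image = np.arange(H * W, dtype=float).reshape(H, W)
out, n_rot = he_avg_pool(image.ravel(), k=2, width=W)
pooled = out.reshape(H, W)[::2, ::2]    # stride-2 outputs sit at slots (2r)*W + 2c
print(pooled)
print(n_rot)   # 3
```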

3.4. Fast-Inference Ciphertext Convolution (FICConv)

This paper presents FICConv, a convolution module suited to ciphertext model inference that improves feature extraction efficiency by combining DS Conv, sparse projection, ALPolyAct, and residual connections. Its structure is illustrated in Figure 5.

The FICConv module uses DS Conv as the core of feature extraction and applies sparse projection to the DW Conv kernel to accelerate ciphertext inference. Before the DW Conv computation, a 1 × 1 convolution restructures the input features across channels, optimizing their distribution. During training, the ALPolyAct activation function adapts to the activation requirements of FICConv modules at different layers of the model, thereby enhancing feature expressiveness. To mitigate the feature loss caused by sparse projection and DS Conv, residual connections are added between the input and the DW Conv output.

Because zero padding cannot be used for convolution in the ciphertext environment, any convolution with a kernel size greater than 1 changes the feature dimensions. To match the output feature size of the DW Conv, we employed a proportionate projection method to resize the residual features, as shown in Figure 6.

This method is similar to average pooling: a kernel of the same size and stride as the DW Conv on the main path, with all parameters equal to 1, resizes the input features to match the DW Conv output, and the result is multiplied by 1/k² to scale the feature values and obtain the residual output. Its computational cost in ciphertext is equivalent to that of average pooling, requiring k² − 1 rotations and one ciphertext multiplication per feature map.
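The residual resizing can be checked in plaintext. This sketch assumes the DW Conv uses stride 1 and no padding, builds a sparse 3 × 3 DW kernel like the one used in FICConvNet-4, and shows that the all-ones window scaled by 1/k² yields a residual with the same shape as the DW Conv output, so the two can be added. The 1 × 1 convolution and ALPolyAct of the full module are omitted.

```python
import numpy as np


def dw_conv_valid(x, kernel):
    """Stride-1, no-padding single-channel convolution (cross-correlation)."""
    k = kernel.shape[0]
    h, w = x.shape
    return np.array([[np.sum(kernel * x[r:r + k, c:c + k]) for c in range(w - k + 1)]
                     for r in range(h - k + 1)])


def residual_resize(x, k, stride=1):
    """All-ones k x k window with the DW Conv's stride, scaled by 1/k^2."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = x[r * stride:r * stride + k, c * stride:c * stride + k]
            out[r, c] = window.sum() / (k * k)
    return out


rng = np.random.default_rng(1)
x = rng.normal(size=(8, 8))
mask = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])    # corners-and-center sparsity
kernel = rng.normal(size=(3, 3)) * mask
main = dw_conv_valid(x, kernel)       # 6 x 6 after a 3 x 3 kernel without padding
residual = residual_resize(x, k=3)    # also 6 x 6
y = main + residual                   # residual connection
print(y.shape)   # (6, 6)
```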

4. Experimental Results

In this section, we present a series of experiments that demonstrate the effectiveness of the proposed approach. We first describe the experimental setup, including the dataset and the validation procedure. We then provide a comprehensive performance evaluation and compare our approach with existing models and methods.

4.1. Experimental Setup, Dataset, and Validation

The hardware configuration for the experiments is an Intel(R) Core(TM) i7-9700 CPU (Intel Corporation, Santa Clara, CA, USA), 32 GB of RAM, and an NVIDIA GeForce RTX 2080 GPU (NVIDIA Corporation, Santa Clara, CA, USA). Both plaintext network training and ciphertext network inference are performed on the Windows 10 operating system. Plaintext training is implemented in Python 3.8 with PyTorch 2.3.1, and homomorphic ciphertext inference is implemented in C++ with the Microsoft SEAL encryption library.

To analyze the performance of our approach, we used the Microsoft BIG [30] dataset, which is publicly released by Microsoft's security team and contains malware binaries (.bytes and .asm files) covering nine malware families (e.g., Ramnit, Lollipop, and Kelihos). To avoid the effect of class imbalance on the evaluation, we excluded the Simda family, which has a very small sample size in the Microsoft BIG dataset. In addition, we collected 700 benign files from Windows 10/11 system directories, open-source communities (e.g., GitHub), and commonly used software (e.g., Chrome, Firefox) as benign samples. During preprocessing, we converted the raw PE binary files of the dataset into grayscale images using the multi-byte mapping algorithm described in this paper. The class labels and the number of samples per class are shown in Figure 7.

To evaluate the model comprehensively, some experiments use a 75%/25% split into training and test sets, where the training set is used for model training and the test set for evaluating model performance. To further improve the reliability of the evaluation and reduce the influence of any particular data split, the ablation experiments use 5-fold cross-validation: the entire dataset is divided into five equal subsets, and in each round four subsets are used for training and the remaining one for testing.
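A minimal sketch of the 5-fold split described above. The shuffling and the unstratified round-robin assignment of samples to folds are assumptions; the text does not say whether folds were stratified by class.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)            # assumed: random, unstratified folds
    folds = [idx[i::k] for i in range(k)]       # fold sizes differ by at most one
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


for train, test in k_fold_indices(n=10, k=5):
    print(len(train), len(test))   # 8 2 in each of the five rounds
```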

To ensure the statistical significance of the results, each experiment was repeated 10 times, and the averaged results are reported as the final values. Confidence intervals were also calculated for each metric to reflect the variability of model performance. Four metrics were used to compare the methods: Accuracy, Precision, Recall, and F1 Score, defined as follows:

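\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{9}

\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{10}

\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{11}

\mathrm{F1\ Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{12}

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.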
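For the nine-class setting, Equations (9) to (12) have to be applied per class and then aggregated. The sketch below assumes macro-averaging (suggested by the Macro-F1 label in the ablation study), takes overall accuracy as the fraction of correctly classified samples, and computes a 95% Student-t interval over the 10 runs. The paper does not state how its intervals were obtained, so this interval form and the helper names `macro_metrics` and `mean_ci95` are assumptions.

```python
import math
import statistics


def macro_metrics(confusion):
    """confusion[t][p] = number of samples of true class t predicted as p.
    Per-class TP/FP/FN feed Equations (10)-(12); precision, recall and F1
    are macro-averaged (assumption). Accuracy = correct / total."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(n)) - tp
        fn = sum(confusion[c]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision": statistics.mean(precisions),
        "recall": statistics.mean(recalls),
        "f1": statistics.mean(f1s),  # mean of per-class F1 scores
    }


# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 runs).
T_CRIT_95_DF9 = 2.262


def mean_ci95(values):
    """Mean and half-width of a 95% t-interval over exactly 10 runs."""
    if len(values) != 10:
        raise ValueError("critical value above assumes 10 runs")
    m = statistics.mean(values)
    half = T_CRIT_95_DF9 * statistics.stdev(values) / math.sqrt(len(values))
    return m, half


print(macro_metrics([[5, 1], [2, 2]]))
```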

4.2. Comparative Experiments

We stacked four FICConv modules to obtain FICConvNet-4 and applied sparse projection to every FICConv module, keeping non-zero weights only at the four corners and the center of each 3 × 3 Conv kernel; this gives a sparse projection rate of s = 5/9 ≈ 0.556 (the fraction of retained weights). The structure of FICConvNet-4 is shown in Table 3.
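The corner-and-center pattern can be illustrated on a single 3 × 3 kernel. In this minimal sketch, `sparse_projection` and `retained_fraction` are hypothetical helpers; they only zero the pruned weights and do not model any rescaling of the retained weights (Figure 6 suggests the paper uses an equal-scaling projection whose details are not reproduced here). Zeroed positions matter in ciphertext because, as in Figure 3, a zero kernel weight means the corresponding rotated ciphertext does not need to be generated.

```python
# Positions of a 3x3 kernel in row-major order are k0..k8.
# Corners (k0, k2, k6, k8) and the center (k4) are kept, as in Figure 4.
KEEP = {0, 2, 4, 6, 8}


def sparse_projection(kernel_3x3):
    """Hypothetical helper: zero every weight outside the corner/center pattern."""
    flat = [w for row in kernel_3x3 for w in row]
    projected = [w if i in KEEP else 0.0 for i, w in enumerate(flat)]
    return [projected[r * 3:(r + 1) * 3] for r in range(3)]


def retained_fraction():
    """Sparse projection rate s: kept weights divided by all 9 weights."""
    return len(KEEP) / 9


kernel = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
print(sparse_projection(kernel))
print(round(retained_fraction(), 3))
```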

The CKKS encryption parameters and model training hyperparameters are listed in Table 4. During ciphertext inference, each layer's output ciphertext is rescaled to the next modulus level, which keeps the scale and noise of every layer's output within a safe range.
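To illustrate why rescaling is needed, the following plaintext analogue tracks only the fixed-point scale, with no encryption or noise. It assumes the Δ = 2³⁰ listed in Table 4 and does not model the other Table 4 parameters; in CKKS the division would be by a modulus prime close to Δ, and each rescale also consumes one level of the modulus chain.

```python
DELTA = 2 ** 30  # scaling factor Δ from Table 4


def encode(x, scale=DELTA):
    """Fixed-point encoding of a real number at the given scale."""
    return round(x * scale)


def decode(m, scale=DELTA):
    return m / scale


def mul_and_rescale(m1, m2, delta=DELTA):
    """Multiply two scale-Δ encodings (giving scale Δ²), then rescale back to Δ."""
    product = m1 * m2
    # Rounded integer division, analogous to dividing by a prime close to Δ in CKKS.
    return (product + delta // 2) // delta


x, w = 0.75, -1.3
m = encode(x)
for _ in range(3):  # three "layers", each a multiplication by a weight
    m = mul_and_rescale(m, encode(w))
print(decode(m), x * w ** 3)
```

Without the rescale step, the scale would grow from Δ to Δ⁴ after three multiplications, quickly exhausting the ciphertext modulus.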

In addition to our model, we evaluated (1) the DL models ResNet-18 and VGG19 and (2) the existing non-interactive ciphertext inference models CryptoNets and PolyKervNets on the Microsoft BIG dataset, using Accuracy, Precision, Recall, and F1 Score as evaluation metrics. The experimental results are shown in Table 5, and the training convergence curves are shown in Figure 8.

As shown in Table 5 and Figure 8, FICConvNet-4 performs well while supporting privacy-preserving ciphertext inference (marked √ in the Privacy column). Its training curve converges more slowly than those of conventional CNNs with nonlinear activation functions, such as ResNet-18 and VGG19, but its training stability is comparable to theirs and better than that of the ciphertext inference models CryptoNets and PolyKervNets. After sufficient training, its accuracy reaches 0.958, outperforming CryptoNets (0.803) and PolyKervNets (0.945), and all of its metrics (Accuracy, Precision, Recall, and F1 Score) are close to those of ResNet-18 (accuracy 0.976) and VGG19 (accuracy 0.965). These results demonstrate the effectiveness of the proposed method.

To validate the detection performance of FICConvNet-4 in the ciphertext state, we randomly selected 200 samples from the test set, covering all nine classes (the eight retained malware families and the benign class), recorded the output distributions and predicted classes under plaintext and ciphertext inference, and analyzed them statistically. The difference between the output distributions was measured with the Mean Absolute Error (MAE):

\mathrm{MAE} = \frac{1}{9} \sum_{i=1}^{9} \left| P_{p}^{(i)} - P_{c}^{(i)} \right|, \tag{13}

where P_{p}^{(i)} and P_{c}^{(i)} are the probabilities that a single sample belongs to the i-th class according to the inference output under plaintext and ciphertext, respectively.
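Equation (13) and the classification consistency rate can be computed from paired output vectors as sketched below. The function names are hypothetical, and the sketch assumes that the decrypted ciphertext outputs have already been converted to class probabilities on the client; the toy vectors use three classes for brevity instead of nine.

```python
def mae(p_plain, p_cipher):
    """Equation (13): mean absolute difference between two class-probability vectors."""
    assert len(p_plain) == len(p_cipher)
    return sum(abs(a - b) for a, b in zip(p_plain, p_cipher)) / len(p_plain)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def compare_outputs(plain_outputs, cipher_outputs):
    """Hypothetical helper: average MAE and top-1 classification consistency
    over paired plaintext/ciphertext outputs."""
    pairs = list(zip(plain_outputs, cipher_outputs))
    avg_mae = sum(mae(p, c) for p, c in pairs) / len(pairs)
    consistency = sum(argmax(p) == argmax(c) for p, c in pairs) / len(pairs)
    return avg_mae, consistency


# Toy three-class example; the second sample's top-1 class flips.
plain = [[0.70, 0.20, 0.10], [0.40, 0.35, 0.25]]
cipher = [[0.69, 0.21, 0.10], [0.34, 0.41, 0.25]]
print(compare_outputs(plain, cipher))
```

The second toy sample shows the failure mode discussed below: when the top two probabilities are close, a small ciphertext error can flip the top-1 class even though the MAE stays small.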

Table 6 reports the average MAE and the classification consistency rate for these samples. We also plotted a scatter plot (Figure 9a) that compares, for each sample, the top-1 class probability under plaintext inference with the corresponding probability under ciphertext inference, and a classification consistency matrix (Figure 9b) that shows the differences between the predictions in the two states.

The experiments show that although ciphertext computation introduces some precision loss, the error is very small (MAE = 0.007). Figure 9a shows that the top-1 value of the predicted distribution is essentially the same in plaintext and ciphertext for every sample, with confidence concentrated between 35% and 80%, which is consistent with the model's stable prediction behavior in realistic scenarios. The model's predictions agree in 96.5% of cases. As the classification consistency matrix in Figure 9b shows, the predictions are highly consistent overall, and most discrepancies occur in classes with fewer samples. For these classes, limited training data leads to less thorough learning and closely spaced output probabilities, so small ciphertext computation errors can change the top-1 prediction.

4.3. Ablation Experiment

First, to validate the effectiveness of the proposed multi-byte mapping method for our model, we trained the model under the configuration in Table 4 on a dataset produced by the conventional single-byte mapping method. The comparative results are shown in Table 7: the two mapping methods achieve comparable results, differing by less than 0.5 percentage points on every metric (e.g., 95.86% accuracy with multi-byte mapping versus 96.14% with single-byte mapping).

To evaluate the contribution of each component of the FICConv module, we designed the following ablation experiments on the Microsoft BIG dataset: replacement with Conv2d (A), replacement with regular convolution + sparse projection (B), removal of sparse projection (C), removal of the residual connection (D), and replacement of ALPolyAct with a fixed activation function (E). The evaluation metrics are accuracy (Accuracy), F1 score (Macro-F1), and ciphertext inference time (s). Table 8 compares the results and describes each ablation change, and Table 9 records the time each ablated model spends in each layer during ciphertext inference.

The results in Table 8 show that replacing conventional Conv2d with DS Conv and sparse projection in FICConv slightly reduces detection accuracy but greatly compresses the amount of computation, shortening ciphertext inference time from 830.97 s to 166.71 s. Removing the residual connection reduces inference time by 7.24 s but lowers Accuracy by 3.54 percentage points and F1 Score by 3.35 percentage points, indicating that, despite the added computational delay, the residual structure effectively alleviates the vanishing-gradient problem and improves the model's learning ability and performance. Using a fixed polynomial activation function (Fixed PolyAct) decreases Accuracy by 4.37 and F1 Score by 4.39 percentage points. This indicates that the learnable parameters of ALPolyAct let the activation function adapt to the feature distribution of each layer, enhancing the nonlinear expressiveness of the ciphertext model.

Table 9 shows that, with FICConv, both the total inference time and the time of each module layer are substantially lower than with Conv2d, with reductions of approximately 80%. FICConv is also more efficient than the lightweight alternatives that use DS Conv alone (C) or combine sparse projection with regular convolution (B). These results demonstrate the inference efficiency of the proposed method.
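The per-stage reductions behind this claim can be recomputed directly from the FICConv and Conv2d (A) columns of Table 9 with a few lines of Python:

```python
# Ciphertext inference times in seconds, copied from Table 9.
FICCONV = {"FICConv1": 2.48, "FICConv2": 8.91, "FICConv3": 29.65,
           "FICConv4": 105.61, "Total": 166.71}
CONV2D = {"FICConv1": 9.96, "FICConv2": 39.44, "FICConv3": 153.67,
          "FICConv4": 607.84, "Total": 830.97}  # ablation A


def reduction(ours, baseline):
    """Fractional time saved relative to the baseline."""
    return 1 - ours / baseline


for stage in FICCONV:
    print(f"{stage}: {reduction(FICCONV[stage], CONV2D[stage]):.1%}")
```

With these values, the reduction ranges from about 75% in FICConv1 to about 83% in FICConv4 and is about 80% (79.9%) for the total inference time.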

To verify the effectiveness of ALPolyAct more comprehensively, we replaced it with the fixed polynomial activation functions 0.4x + 0.3x² + 0.2 and x², and compared the training curves of FICConvNet-4 under the three activation settings (Figure 10a). We also plotted the learning curves of the ALPolyAct parameters in the fourth block, FICConv4, during training (Figure 10b).

In Figure 10a, the training curves are similar at first because ALPolyAct is initialized with the same values as the fixed polynomial. However, ALPolyAct achieves better training accuracy and stability than the fixed polynomial, because backpropagation continually adjusts its parameters to the activation needs of each layer. Figure 10b shows that the ALPolyAct parameters of FICConv4 change and gradually stabilize during training, converging to values that differ substantially from their initial values. These results show that, compared with fixed polynomials, the learnable mechanism of ALPolyAct adapts to the activation requirements of different layers, effectively improving convergence speed and stability in the later stages of training and confirming its advantage in ciphertext inference.
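The core idea of a learnable polynomial activation can be sketched without a deep learning framework. The class below is a hypothetical stand-in, not the paper's ALPolyAct: it assumes a degree-2 form f(x) = c2·x² + c1·x + c0 initialized to the fixed polynomial of ablation E, and it updates the coefficients with plain SGD against a toy regression target that stands in for the network loss. Because f uses only additions and multiplications, the trained activation remains evaluable on CKKS ciphertexts.

```python
class LearnablePolyAct:
    """Hypothetical degree-2 activation with trainable coefficients."""

    def __init__(self, c2=0.3, c1=0.4, c0=0.2):
        # Starts as the fixed polynomial 0.4x + 0.3x^2 + 0.2 (ablation E).
        self.c2, self.c1, self.c0 = c2, c1, c0

    def __call__(self, x):
        # Only additions and multiplications, so it can run under CKKS.
        return self.c2 * x * x + self.c1 * x + self.c0

    def step(self, xs, upstream, lr):
        """One SGD step given dLoss/df at each input; by the chain rule
        df/dc2 = x^2, df/dc1 = x and df/dc0 = 1."""
        n = len(xs)
        self.c2 -= lr * sum(g * x * x for x, g in zip(xs, upstream)) / n
        self.c1 -= lr * sum(g * x for x, g in zip(xs, upstream)) / n
        self.c0 -= lr * sum(upstream) / n


# Toy demo: a hypothetical regression target stands in for the network loss.
act = LearnablePolyAct()
xs = [i / 10 for i in range(-20, 21)]
target = [0.25 * x * x + 0.5 * x + 0.1 for x in xs]
for _ in range(2000):
    grads = [2 * (act(x) - y) for x, y in zip(xs, target)]  # d(MSE)/df
    act.step(xs, grads, lr=0.05)
print(round(act.c2, 3), round(act.c1, 3), round(act.c0, 3))
```

In a real network the upstream gradients would come from backpropagation through the following layers rather than from a fixed target, but the coefficient update has the same form.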

5. Discussion

5.1. Security Discussion

In the proposed scheme, we assume that the server is honest-but-curious. As the data owner, the user encrypts sensitive data with a fully homomorphic encryption (FHE) scheme and sends the ciphertext to the untrusted server. During inference, the server processes the encrypted data directly, without decrypting it or interacting with the user, and returns the ciphertext results to the user.

The security objective of this scheme is to ensure that neither the cloud server nor any potential eavesdropper can access the user's original sensitive data or infer information about it through the homomorphic convolutional neural network. Because the CKKS scheme provides security against chosen-plaintext attacks (CPA), the probability that the server or an adversary recovers the user's original sensitive data is negligible.

However, beyond stealing ciphertext data, an attacker might influence the outputs of encrypted inference by tampering with the server-side model parameters. This is particularly critical in binary classification tasks (such as benign versus malicious), where misclassifying a file could have irreversible consequences. Future work therefore needs to strengthen the security of the server-side model, for example by implementing anti-tampering mechanisms, encapsulating model parameters in hardware security modules (HSMs) or trusted execution environments (TEEs), and applying adversarial training (e.g., with generative adversarial networks, GANs) to improve resilience against parameter attacks.

5.2. Communications Cost Analysis

Unlike frameworks like CryptoDL and Hyphen that require communication across multiple servers for joint inference, the end-to-end non-interactive inference proposed in this paper incurs communication overhead only during the upload and reception phases.

As described in Section 3.1, the communication overhead is determined by the number of classes (num_class) and the size of a ciphertext (l_CKKS). For a single detection, the upload phase transmits l_CKKS and the reception phase transmits num_class × l_CKKS.
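As a back-of-the-envelope illustration, the uncompressed size of a CKKS ciphertext in RNS form is roughly 2 × N × (number of modulus primes) × 8 bytes. The sketch below uses N = 8192 from Table 4 and nine classes, but the number of primes (4) is a hypothetical value because the coefficient-modulus chain is not reported; ciphertexts at lower levels hold fewer primes, and serialized sizes can differ if compression is enabled.

```python
def ckks_ciphertext_bytes(poly_degree, num_primes, num_polys=2, bytes_per_word=8):
    """Rough uncompressed size of a CKKS ciphertext in RNS form: num_polys
    polynomials, each holding poly_degree 64-bit residues for every prime
    at the ciphertext's current level."""
    return num_polys * poly_degree * num_primes * bytes_per_word


def communication_cost(l_ckks, num_class):
    """Upload and reception costs for one detection, following Section 5.2."""
    upload = l_ckks                  # one encrypted input
    reception = num_class * l_ckks   # one result ciphertext per class
    return upload, reception


N = 8192        # Poly Modulus Degree from Table 4
NUM_PRIMES = 4  # hypothetical: the modulus chain length is not reported
l_ckks = ckks_ciphertext_bytes(N, NUM_PRIMES)
up, down = communication_cost(l_ckks, num_class=9)
print(f"upload {up / 1024:.0f} KiB, reception {down / 1024:.0f} KiB")
```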

5.3. Scalability Discussion

Because the framework imposes few constraints and has a simple processing pipeline, it scales well to practical scenarios. It can be applied to binary classification tasks that simply determine whether a piece of code is malicious, and to other areas such as network traffic monitoring, where it could help identify potential threats in real time. With task-specific adaptation and optimization, FICConvNet may be applicable to a broader range of contexts and contribute to overall network security.

5.4. Limitations Discussion

Despite the significant results achieved through the simulation of the end-to-end model, there are still various challenges to address before actual deployment.

Latency Issues

Although our method reduces ciphertext inference time by about 80% compared with Conv2d, a per-sample latency of about 167 s (166.71 s) is clearly unacceptable in environments with strict real-time requirements. This delay prevents the framework from providing protection in scenarios that require immediate responses and limits its feasibility for real-time applications.

Detection Accuracy

The current model achieves an accuracy of nearly 96%, which, although respectable, is still insufficient for critical applications. Further parameter tuning and evaluation on diverse datasets are needed to improve accuracy and robustness. In addition, the ALPolyAct results indicate that the model's fluctuation and convergence under varying conditions require further investigation, because such fluctuations may affect stability and reliability in practice. Future work will analyze and optimize these behaviors systematically so that the model delivers efficient and accurate results across application scenarios, including complex and challenging environments.

Multi-user Environment

We have designed only an end-to-end model, whereas an actual deployment requires the server to communicate with many user endpoints. In a multi-user environment, communication delays, bandwidth limits, and the computational overhead and complexity of fully homomorphic encryption may degrade overall system performance. As the number of users increases, the server load will rise significantly, extending response times.

Impact of Homomorphic Encryption

Although homomorphic encryption can ensure data privacy, its complex computation may further exacerbate delays and resource consumption, impacting efficiency in large-scale applications.

Dataset Limitations

The current research considers only a single dataset, which may not adequately capture the diversity present in real-world scenarios. In practical network environments, newly emerging malicious code could affect the model’s effectiveness. Thus, future work should investigate the model’s adaptability across different datasets and its ability to address threats that were not encountered during training.

6. Conclusions and Future Work

In this paper, we proposed FICConvNet, a non-interactive malware detection system based on CKKS homomorphic encryption that provides end-to-end privacy protection for malware detection. FICConvNet is built on the efficient FICConv convolution module, which combines DS Conv with structured sparse projection to reduce the amount of ciphertext computation and uses residual connections to compensate for the accuracy lost to sparsification. We also proposed the adaptive learnable activation function ALPolyAct to address the limited expressiveness of fixed polynomial activations in ciphertext convolutional networks.

The experimental results show that FICConvNet achieves a detection accuracy of 95.86%, significantly outperforming the existing ciphertext model CryptoNets (+15.5 percentage points) and approaching the plaintext model ResNet-18 (97.56%). It supports ciphertext inference throughout the entire process, with an average MAE of only 0.007 between plaintext and ciphertext outputs.

In the future, our research program includes the following:

Ciphertext inference acceleration: In this paper, we optimized inference computation only at the model level, which is far from meeting the requirements of practical applications. In the future, we will study how to further reduce computation latency with hardware acceleration techniques (GPU/FPGA).
Model capacity enhancement: Owing to the limitations of ciphertext computation, our current models cannot yet incorporate overly complex network building blocks. We will therefore explore how efficient network modules implemented on ciphertext can enhance the model's learning capability, and introduce knowledge distillation or federated learning frameworks to overcome the capacity constraints of the ciphertext model through knowledge transfer from a teacher model.
We will also focus on deployment issues in real-world scenarios, identifying and addressing the challenges and limitations that may arise in practical environments. This includes evaluating the model's encryption levels under varying security requirements, optimizing system communication in multi-user contexts, and ensuring the scalability and efficiency of the model when it is integrated into existing infrastructures.

Author Contributions

Conceptualization, S.P. and J.W.; methodology, S.P.; software, S.P.; validation, S.P., J.W. and S.L.; formal analysis, S.P., J.W. and S.L.; investigation, S.P. and J.W.; resources, S.P.; data curation, S.P.; writing—original draft preparation, S.P.; writing—review and editing, B.H.; visualization, S.P.; supervision, B.H.; project administration, B.H.; funding acquisition, B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Open Project Program of the Guangxi Key Laboratory of Digital Infrastructure (Project number: GXDINBC202406).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. ANY.RUN. Malware Trends Report: Q4. 2024. Available online: https://any.run/cybersecurity-blog/malware-trends-q4-2024 (accessed on 1 January 2024).
2. ENISA. ENISA Threat Landscape 2024. 2024. Available online: https://www.enisa.europa.eu/publications/enisa-threat-landscape-2024 (accessed on 25 January 2025).
3. Vellela, S.S.; Balamanigandan, R.; Praveen, S.P. Strategic Survey on Security and Privacy Methods of Cloud Computing Environment. J. Next Gener. Technol. 2022, 2, 70–78.
4. Choi, S.; Jang, S.; Kim, Y.; Kim, J. Malware Detection using Malware Image and Deep Learning. In Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 18–20 October 2017; pp. 1193–1195.
5. Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; pp. 1–7.
6. Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.G.; Chen, J. Detection of malicious code variants based on deep learning. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
7. Kalash, M.; Rochan, M.; Mohammed, N.; Bruce, N.D.; Wang, Y.; Iqbal, F. Malware classification with deep convolutional neural networks. In Proceedings of the 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 26–28 February 2018; pp. 1–5.
8. Jang, S.; Li, S.; Sung, Y. Generative adversarial network for global image-based local image to improve malware classification using convolutional neural network. Appl. Sci. 2020, 10, 7585.
9. Ma, X.; Guo, S.Z.; Li, H.Y.; Pan, Z.S.; Qiu, J.Y.; Ding, Y.; Chen, F.Q. How to Make Attention Mechanisms More Practical in Malware Classification. IEEE Access 2019, 7, 155270–155280.
10. Wu, X.; Song, Y.F. An Efficient Malware Classification Method Based on the AIFS-IDL and Multi-Feature Fusion. Information 2022, 13, 571.
11. Singh, J.; Thakur, D.; Gera, T.; Shah, B.B.; Abuhmed, T.; Ali, F. Classification and Analysis of Android Malware Images Using Feature Fusion Technique. IEEE Access 2021, 9, 90102–90117.
12. Nobakht, M.; Javidan, R.; Pourebrahimi, A. DEMD-IoT: A deep ensemble model for IoT malware detection using CNNs and network traffic. Evol. Syst. 2023, 14, 461–477.
13. Agrawal, R.; Stokes, J.W.; Selvaraj, K.; Marinescu, M. Attention in recurrent neural networks for ransomware detection. In Proceedings of the 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3222–3226.
14. Aldehim, G.; Arasi, M.A.; Khalid, M.; Aljameel, S.S.; Marzouk, R.; Mohsen, H.; Yaseen, I.; Ibrahim, S.S. Gauss-Mapping Black Widow Optimization With Deep Extreme Learning Machine for Android Malware Classification Model. IEEE Access 2023, 11, 87062–87070.
15. Miao, C.Y.; Kou, L.; Zhang, J.L.; Dong, G.Z. A Lightweight Malware Detection Model Based on Knowledge Distillation. Mathematics 2024, 12, 4009.
16. Aldhafferi, N. Android Malware Detection Using Support Vector Regression for Dynamic Feature Analysis. Information 2024, 15, 23.
17. Dowlin, N.; Gilad-Bachrach, R.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016.
18. Chabanne, H.; De Wargny, A.; Milgram, J.; Morel, C.; Prouff, E. Privacy-preserving classification on deep neural network. Cryptol. Eprint Arch. 2017, in press.
19. Hesamifard, E.; Takabi, H.; Ghasemi, M. CryptoDL: Deep neural networks over encrypted data. arXiv 2017, arXiv:1711.05189.
20. Lee, J.-W.; Kang, H.; Lee, Y.; Choi, W.; Eom, J.; Deryabin, M.; Lee, E.; Lee, J.; Yoo, D.; Kim, Y.-S. Privacy-preserving machine learning with fully homomorphic encryption for deep neural network. IEEE Access 2022, 10, 30039–30054.
21. Zhou, J.; Li, J.; Panaousis, E.; Liang, K. Deep binarized convolutional neural network inferences over encrypted data. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; pp. 160–167.
22. Nandakumar, K.; Ratha, N.; Pankanti, S.; Halevi, S. Towards deep neural network training on encrypted data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
23. Al Badawi, A.; Jin, C.; Lin, J.; Mun, C.F.; Jie, S.J.; Tan, B.H.M.; Nan, X.; Aung, K.M.M.; Chandrasekhar, V.R. Towards the AlexNet moment for homomorphic encryption: HCNN, the first homomorphic CNN on encrypted data with GPUs. IEEE Trans. Emerg. Top. Comput. 2020, 9, 1330–1343.
24. Chen, T.; Bao, H.; Huang, S.; Dong, L.; Jiao, B.; Jiang, D.; Zhou, H.; Li, J.; Wei, F. THE-X: Privacy-preserving transformer inference with homomorphic encryption. arXiv 2022, arXiv:2206.00216.
25. Zhu, Y.; Wang, X.; Ju, L.; Guo, S. FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference. In Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Montreal, QC, Canada, 25 February–1 March 2023; pp. 896–907.
26. Kim, D.; Park, J.; Kim, J.; Kim, S.; Ahn, J.H. Hyphen: A hybrid packing method and its optimizations for homomorphic encryption-based neural networks. IEEE Access 2023, 12, 3024–3038.
27. Aremu, T.; Nandakumar, K. PolyKervNets: Activation-free neural networks for efficient private inference. In Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Raleigh, NC, USA, 8–10 February 2023; pp. 593–604.
28. Microsoft SEAL (Release 3.2); Microsoft Research: Redmond, WA, USA, 2019; Available online: https://github.com/Microsoft/SEAL (accessed on 29 July 2024).
29. Juvekar, C.; Vaikuntanathan, V.; Chandrakasan, A. GAZELLE: A low latency framework for secure neural network inference. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 1651–1669.
30. Panconesi, A.; Marian, W.; Cukierski, W.; BIG Cup Committee. Microsoft Malware Classification Challenge (BIG 2015). Kaggle. 2015. Available online: https://kaggle.com/competitions/malware-classification (accessed on 16 November 2024).

Figure 1. Architecture of the detection system.

Figure 2. Rotated ciphertext convolution calculation process (A is the input ciphertext feature; rotating A yields A₁, A₂, and A₃, which correspond to the parameters of Conv kernel K).

Figure 3. Ciphertext Conv computation after sparsification (k₂ and k₃ are set to 0, so the corresponding rotated ciphertexts A₂ and A₃ need not be generated).

Figure 4. Sparse projection (after projection, the effective parameters of the Conv kernel are only k₀, k₂, k₄, k₆, k₈).

Figure 5. FICConv.

Figure 6. Equal scaling projection.

Figure 7. Class labels and number of samples in the dataset.

Figure 8. Training process curves.

Figure 9. (a) Comparison of top1 prediction probability consistency between plaintext and ciphertext; (b) classification consistency matrix.

Figure 10. (a) Training curves with three activation functions; (b) learning curves of ALPolyAct's a1, a2, and a3 in FICConv4.

Table 1. Mapping reference.

File Size Range	k
Tiny (<10 KB)	1
Small (10–50 KB)	4
Medium (50–200 KB)	8
Large (200–800 KB)	16
Oversize (>800 KB)	32

Table 2. Comparison of ciphertext computation by methods.

Method (k × k Conv)	ROT	HE-Mult
Conv2d	C_in × k² − 1	k² × C_out × C_in
DS Conv	C_in × k² − 1	k² × C_in + C_out × C_in
sparse projection	S × C_in × k² − 1	S × k² × C_out × C_in
DS Conv + sparse projection	S × C_in × k² − 1	S × k² × C_in + C_out × C_in
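As a worked example of the Table 2 formulas, the sketch below evaluates them for FICConv4 (C_in = 32, C_out = 64 per Table 3) with k = 3, taking S = 5/9, the retained-weight fraction of the corner-and-center pattern. These are operation counts derived from the formulas, not measured run times, and the helper name `op_counts` is hypothetical.

```python
from fractions import Fraction


def op_counts(c_in, c_out, k=3, s=Fraction(5, 9)):
    """(ROT, HE-Mult) counts for one k x k layer using the Table 2 formulas.
    s is the sparse projection rate (fraction of kernel weights kept)."""
    k2 = k * k
    return {
        "Conv2d": (c_in * k2 - 1, k2 * c_out * c_in),
        "DS Conv": (c_in * k2 - 1, k2 * c_in + c_out * c_in),
        "sparse projection": (s * c_in * k2 - 1, s * k2 * c_out * c_in),
        "DS Conv + sparse projection": (s * c_in * k2 - 1, s * k2 * c_in + c_out * c_in),
    }


# FICConv4 in Table 3: 32 input channels, 64 output channels.
for name, (rot, mult) in op_counts(32, 64).items():
    print(f"{name}: ROT = {rot!s}, HE-Mult = {mult!s}")
```

For this layer, DS Conv + sparse projection needs 2208 HE-Mult operations versus 18,432 for Conv2d (about 12%), and 159 rotations versus 287.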

Table 3. Structure of FICConvNet-4.

FICConvNet-4	Input Size	Output Size	Detail
Stem	1 × 64 × 64	4 × 62 × 62	3 × 3 conv, stride = 1
Avg_pool	4 × 62 × 62	4 × 31 × 31	-
FICConv1	4 × 31 × 31	8 × 15 × 15	3 × 3 dw_conv, stride = 2
FICConv2	8 × 15 × 15	16 × 7 × 7	3 × 3 dw_conv, stride = 2
FICConv3	16 × 7 × 7	32 × 5 × 5	3 × 3 dw_conv, stride = 1
FICConv4	32 × 5 × 5	64 × 3 × 3	3 × 3 dw_conv, stride = 1
Head	64 × 3 × 3	9 × 1 × 1	-
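The spatial sizes in Table 3 can be checked with the standard output-size formula. The sketch assumes no padding and a 2 × 2 average pool with stride 2, neither of which is stated in the table. It also confirms that a 64 × 64 input holds 4096 values, exactly the N/2 = 4096 slots available in a CKKS ciphertext with N = 8192 (Table 4), although this section does not describe how the paper packs inputs into slots.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def ficconvnet4_shapes(size=64):
    """Trace (stage, channels, spatial size) through FICConvNet-4 (Table 3).
    Assumes no padding and a 2x2, stride-2 average pool."""
    shapes = [("Input", 1, size)]
    size = conv_out(size, kernel=3)
    shapes.append(("Stem", 4, size))
    size = conv_out(size, kernel=2, stride=2)
    shapes.append(("Avg_pool", 4, size))
    for name, channels, stride in [("FICConv1", 8, 2), ("FICConv2", 16, 2),
                                   ("FICConv3", 32, 1), ("FICConv4", 64, 1)]:
        size = conv_out(size, kernel=3, stride=stride)
        shapes.append((name, channels, size))
    return shapes


for stage in ficconvnet4_shapes():
    print(stage)
print("input values:", 64 * 64, "CKKS slots:", 8192 // 2)
```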

Table 4. Experimental hyperparameter settings.

Hyperparameter	Setting Value
Poly Modulus Degree (N)	8192
Rescale Parameter of Q	2⁴⁰
Δ	2³⁰
optimizer	SGD
Learning Rate (η)	0.005
Momentum (μ)	0.9
Weight Decay (λ)	5 × 10⁻⁴

Table 5. Comparison of FICConvNet-4, ResNet18, VGG19, CryptoNets, and PolyKervNets experimental data.

Model	Accuracy	Precision	Recall	F1 Score	Privacy
FICConvNet-4	95.86% (±0.15)	94.84% (±0.11)	94.20% (±0.10)	94.47% (±0.12)	√
ResNet18	97.56% (±0.10)	96.40% (±0.08)	95.85% (±0.09)	96.12% (±0.10)	×
VGG19	96.51% (±0.11)	95.57% (±0.09)	94.89% (±0.09)	95.21% (±0.10)	×
CryptoNets	80.31% (±0.20)	80.71% (±0.18)	78.89% (±0.20)	79.76% (±0.19)	√
PolyKervNets	94.52% (±0.10)	93.53% (±0.09)	92.90% (±0.08)	93.15% (±0.09)	√

Table 6. MAE and classification consistency rate.

Model	MAE (Average)	Classification Consistency Rate
FICConvNet-4	0.007	96.5%

Table 7. Experimental results of different mapping methods.

Method	Accuracy	Precision	Recall	F1 Score
Multi-byte mapping	95.86% (±0.15)	94.84% (±0.11)	94.20% (±0.10)	94.47% (±0.12)
Single-byte mapping	96.14% (±0.11)	95.01% (±0.09)	94.67% (±0.08)	94.84% (±0.10)

Table 8. Results of ablation experiments.

Model	Accuracy	F1 Score	Time (s)	Description of Changes
FICConv	95.86% (±0.15)	94.47% (±0.12)	166.71	-
A	96.21% (+0.35%; ±0.14)	95.22% (+0.75%; ±0.12)	830.97	Replace DS Conv + sparse projection in FICConv with a standard 3 × 3 convolution (Conv2d).
B	96.10% (+0.24%; ±0.12)	94.79% (+0.32%; ±0.10)	767.95	Replacement with regular convolution + sparse projection.
C	95.91% (+0.05%; ±0.11)	94.61% (+0.14%; ±0.09)	183.59	Removal of sparse projection.
D	92.32% (−3.54%; ±0.16)	91.12% (−3.35%; ±0.15)	159.47	Remove residual branches and keep only FICConv main path Conv and activation.
E	91.49% (−4.37%; ±0.17)	90.19% (−4.39%; ±0.16)	170.37	Fix the ALPolyAct of FICConv to f(x) = 0.4x + 0.3x² + 0.2.

Table 9. Ciphertext inference time comparison.

Stage	FICConv	A	B	C	D
Stem	1.27 s	1.27 s	1.27 s	1.27 s	1.27 s
Avg_pool	0.22 s	0.22 s	0.22 s	0.22 s	0.22 s
FICConv1	2.48 s	9.96 s	8.03 s	3.10 s	2.35 s
FICConv2	8.91 s	39.44 s	34.53 s	9.23 s	8.12 s
FICConv3	29.65 s	153.67 s	143.21 s	36.06 s	27.71 s
FICConv4	105.61 s	607.84 s	562.12 s	122.44 s	101.23 s
Head	18.57 s	18.57 s	18.57 s	18.57 s	18.57 s
Sum	166.71 s	830.97 s	767.95 s	190.89 s	159.47 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pang, S.; Wen, J.; Liang, S.; Huang, B. FICConvNet: A Privacy-Preserving Framework for Malware Detection Using CKKS Homomorphic Encryption. Electronics 2025, 14, 1982. https://doi.org/10.3390/electronics14101982


