Article

A Reinforcement Learning-Based Optimization Strategy for Noise Budget Management in Homomorphically Encrypted Deep Network Inference

1
Faculty of Information Network Security, Yunnan Police College, Kunming 650223, China
2
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 275; https://doi.org/10.3390/electronics15020275
Submission received: 13 November 2025 / Revised: 26 December 2025 / Accepted: 31 December 2025 / Published: 7 January 2026
(This article belongs to the Special Issue Security and Privacy in Artificial Intelligence Systems)

Abstract

Homomorphic encryption provides a powerful cryptographic solution for privacy-preserving deep neural network inference, enabling computation on encrypted data. However, the practical application of homomorphic encryption is fundamentally constrained by the noise budget, a core component of homomorphic encryption schemes. The substantial multiplicative depth of modern deep neural networks rapidly consumes this budget, necessitating frequent, computationally expensive bootstrapping operations to refresh the noise. This bootstrapping process has emerged as the primary performance bottleneck. Current noise management strategies are predominantly static, triggering bootstrapping at pre-defined, fixed intervals. This approach is sub-optimal for deep, complex architectures, leading to excessive computational overhead and potential accuracy degradation due to cumulative precision loss. To address this challenge, we propose a Deep Network-aware Adaptive Noise-budget Management mechanism, which formulates noise budget allocation as a sequential decision problem optimized via reinforcement learning. The proposed mechanism comprises two core components. First, we construct a layer-aware noise consumption prediction model to accurately estimate the heterogeneous computational costs and noise accumulation across different network layers. Second, we design a Deep Q-Network-driven optimization algorithm. The trained agent derives a globally optimal policy, dynamically determining the optimal timing and network location for executing bootstrapping operations, based on the real-time output of the noise predictor and the current network state. This approach shifts from a static, pre-defined strategy to an adaptive, globally optimized one.
Experimental validation on several typical deep neural network architectures demonstrates that the proposed mechanism significantly outperforms state-of-the-art fixed strategies, markedly reducing redundant bootstrapping overhead while maintaining model performance.

1. Introduction

In the data-driven era, Deep Neural Networks (DNNs) have become a core technology for unlocking the value of massive data. However, a fundamental contradiction exists between this highly centralized data model and the growing societal and regulatory demands for privacy protection [1]. This conflict has catalyzed the rapid development of the privacy-preserving machine learning field [2]. Among the various technical approaches, Federated Learning (FL) and Multi-Party Computation (MPC) primarily address keeping data local and enabling collaborative training, but they still have limitations in non-interactive inference scenarios between clients and a server [3,4]; Differential Privacy (DP), in contrast, focuses on protecting the statistical information of the model’s output rather than the input data itself [5]. In this context, Homomorphic Encryption (HE) offers a unique cryptographic solution. HE permits arbitrary computations on ciphertexts without decryption, fundamentally achieving data’s “usability without visibility” [6]. Although HE is theoretically promising, a significant gap exists in its practical application to modern deep networks [7]. The computation in HE schemes is based on a core constraint known as the noise budget [8]. Each multiplication operation consumes this budget, and the inherent high multiplicative depth of deep networks can quickly exhaust the initial budget. To prevent decryption failure, a periodic bootstrapping operation must be performed to “refresh” the budget [9]. However, bootstrapping itself is an extremely computationally expensive operation, with overhead far exceeding standard ciphertext computations [10]. Consequently, the bootstrapping operation has become the key bottleneck restricting HE inference performance.
Currently, mainstream HE schemes predominantly adopt static noise management strategies [11,12]. These strategies trigger bootstrapping by setting uniform noise thresholds at pre-defined, regular layer intervals. While the performance impact of this strategy is acceptable for shallow networks, its inherent limitations become prominent when applied to modern DNNs with substantial depth, leading to significant performance issues, especially when handling multiple inference tasks [13]. Furthermore, excessive bootstrapping introduces and accumulates minor computational precision losses; during the iterations of multiple training cycles, this cumulative effect can be sufficient to damage the model’s gradients and weights, which in turn affects inference accuracy [14]. Therefore, the core problem of this paper is to construct a globally optimal dynamic bootstrapping strategy that simultaneously optimizes the computational efficiency and model performance of deep networks while strictly adhering to the HE security thresholds.
To address the above problems, this paper proposes a Deep Network-aware Adaptive Noise-budget Management mechanism (DN-ANM). First, to tackle the lack of awareness in static strategies, we construct a layer-aware noise consumption prediction model capable of accurately estimating the heterogeneous computational overhead of different layers. Second, to address the lack of global optimality in traditional decision-making, we design an optimization algorithm driven by reinforcement learning. We utilize a Deep Q-Network (DQN) [15] to train an intelligent agent that can achieve a globally optimal policy allocation for the noise budget based on the output of the noise predictor and the current network state. To facilitate understanding, the key mathematical notations and symbols used throughout this paper are summarized in Table 1. The main contributions of this paper are summarized as follows:
(1)
To address the problem that existing HE schemes use fixed noise budget allocation strategies, which cannot adapt to the heterogeneous computational characteristics between deep network layers, we propose a Deep Network-aware Adaptive Noise-budget Management mechanism. By constructing a layer-aware noise consumption prediction model, this mechanism achieves dynamic and precise allocation of the noise budget.
(2)
To solve the lack of global optimality and adaptability in traditional bootstrapping decision methods, we design a reinforcement learning-driven optimization algorithm for bootstrapping decisions. It uses a Deep Q-Network to learn the optimal timing for bootstrapping, minimizing the bootstrapping overhead while ensuring computational security.
(3)
Experimental validation on several typical deep networks demonstrates the effectiveness and scalability of the proposed mechanism. The results show that the mechanism can optimize the management strategy online according to the computational characteristics of the network. While guaranteeing security requirements, it significantly reduces the redundant overhead of bootstrapping operations compared to fixed strategies.
This paper is organized as follows: Section 2 introduces the related work. Section 3 details the DN-ANM model architecture. Section 4 presents the reinforcement learning-based adaptive bootstrapping decision-making. Section 5 provides the experimental setup and results analysis. Section 6 concludes the paper.

2. Related Work

2.1. Homomorphic Encryption in Privacy-Preserving Machine Learning

HE enables computation on ciphertexts without prior decryption. This cryptographic primitive ensures that operations performed in the ciphertext space map directly to operations in the plaintext space. Formally, for an encryption scheme E and a binary operation ⊙, the scheme satisfies the homomorphic property if Dec(E(m_1) ⊙ E(m_2)) = m_1 ⊙ m_2 for plaintexts m_1, m_2 [6]. HE classifications depend on the supported operations and computation depth. Partially Homomorphic Encryption (PHE) supports a single operation type with unbounded depth. Somewhat Homomorphic Encryption (SHE) supports both addition and multiplication but limits the circuit depth due to noise accumulation. Fully Homomorphic Encryption (FHE) supports arbitrary computations on ciphertexts. FHE achieves this by introducing a bootstrapping mechanism to reduce noise, theoretically allowing infinite evaluation depth. Leveled HE, a subset of FHE, supports arbitrary depth provided the parameters are selected based on the circuit complexity.
A generic HE scheme comprises four algorithms: (Gen, Enc, Dec, Eval). The key generation algorithm (pk, sk) ← Gen(1^λ) outputs a public and private key pair based on the security parameter λ. The encryption algorithm yields a ciphertext c ← Enc(pk, m), mapping the plaintext m ∈ M to the ciphertext space. The decryption algorithm recovers the message m ← Dec(sk, c). The evaluation algorithm c_res ← Eval(pk, f, c_1, ..., c_k) computes the function f over the input ciphertexts. Modern schemes, including BFV [16] and CKKS [17], rely on the Ring Learning with Errors (RLWE) assumption [17] and operate as noise-based cryptosystems. The RLWE problem can be formalized as: given samples (a_i, b_i = a_i · s + e_i mod q) over the polynomial ring R_q = Z_q[X]/(X^N + 1), where a_i ∈ R_q is uniformly random, s ∈ R_q is the secret, and e_i is sampled from an error distribution χ, the objective is to distinguish these samples from uniformly random pairs [18]. A freshly generated ciphertext c ← Enc(pk, m) contains an initial, low-level noise e_init, bounded by ‖e_init‖ < B_init. Every homomorphic operation performed on the ciphertext causes this noise to grow [17]. Specifically, for homomorphic addition, the noise satisfies ‖e_1 + e_2‖ ≤ ‖e_1‖ + ‖e_2‖; for homomorphic multiplication, the noise growth is more significant, satisfying ‖e_mult‖ ≤ ‖e_1‖·‖e_2‖ + ‖e_1‖·‖m_2‖ + ‖e_2‖·‖m_1‖. The correctness of the Dec algorithm is only guaranteed when the noise e is below a specific modulus-dependent maximum bound B_max. This characteristic poses a core challenge for DNN inference. A DNN is essentially an arithmetic circuit with a high multiplicative depth L [19]. To map DNNs onto HE, two paths have evolved in the academic community.
One path is to use the BFV/BGV schemes, which quantize network weights and activations into integers and perform exact computations over the integer ring Z_t. In the BFV scheme, the plaintext space is Z_t, and the ciphertext space is R_q^2. The decryption process can be represented as m = ⌊(t/q)·[⟨c, sk⟩]_q⌉ mod t. The other path uses the CKKS scheme, which natively supports approximate arithmetic on real numbers R. CKKS introduces a scaling factor Δ to control precision. The encoding process is m ↦ Δ·m, and after decryption, rescaling recovers the value: m ≈ (1/Δ)·Dec(sk, c). By adjusting the scaling factor Δ and the modulus q, CKKS allows for a flexible trade-off between precision and noise budget. This approximate nature enables CKKS to directly handle the floating-point weights and activations in neural networks without complex quantization procedures. Furthermore, while the integer quantization of BFV faces challenges with cumulative quantization errors when processing deep networks [20,21], the native support of CKKS for floating-point operations makes it more easily scalable to large-scale deep networks. Therefore, this paper selects the CKKS noise model as the foundation to construct a noise consumption predictor usable by the reinforcement learning agent, thereby achieving adaptive noise budget management for deep networks.
Regardless of the HE scheme chosen, non-polynomial activation functions (e.g., ReLU, Sigmoid, GeLU) pose a significant challenge. As HE natively supports only addition and multiplication, executing these functions requires relying on the costly bootstrapping operation Boot. This operation can “refresh” the ciphertext and reduce its noise: c_refresh ← Boot(pk, c), such that Dec(sk, c_refresh) = Dec(sk, c) but the noise is reset to its initial level. Alternatively, low-degree polynomials, such as p(x) = x² or p(x) = Σ_{i=0}^{d} a_i x^i, are used as approximations [22].
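As a concrete illustration of the polynomial-approximation path, the sketch below fits a degree-2 polynomial to ReLU on [−1, 1] by least squares, using only the Python standard library. The interval, degree, and fitting method are illustrative assumptions, not the approximation used in this paper or in any specific HE framework (which often use minimax or Chebyshev fits instead).

```python
# Least-squares fit of a degree-2 polynomial to ReLU on [-1, 1], stdlib only.
# Interval, degree, and method are illustrative assumptions.

def relu(x):
    return max(0.0, x)

def fit_poly(xs, ys, degree):
    """Solve the normal equations (A^T A) c = A^T y by Gaussian elimination."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                       # forward elimination w/ pivoting
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coeffs = [0.0] * n                         # back substitution
    for r in range(n - 1, -1, -1):
        s = aty[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / ata[r][r]
    return coeffs                              # [a0, a1, ..., a_degree]

def poly_eval(coeffs, x):
    return sum(a * x ** i for i, a in enumerate(coeffs))

xs = [i / 100.0 for i in range(-100, 101)]
coeffs = fit_poly(xs, [relu(x) for x in xs], degree=2)
max_err = max(abs(poly_eval(coeffs, x) - relu(x)) for x in xs)
```

The fitted quadratic costs one ciphertext-ciphertext multiplication (for x²) plus plaintext multiplications per evaluation, which is why low-degree fits are attractive under a tight noise budget.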

2.2. Noise Budget Management in HE Inference

In HE schemes, the noise growth e is always constrained by an upper bound B_max to guarantee decryption correctness [23], requiring ‖e_i‖ < B_max for all i ∈ [1, n], where n is the total number of computation steps. The performance of HE inference is essentially a constrained optimization problem: minimizing the computational overhead min_θ T(θ) subject to the noise constraint e_final < B_max, where θ denotes the HE parameter configuration and T is the total time overhead. Early work in HE-based machine learning, exemplified by CryptoNets [24], introduced the leveled homomorphic encryption paradigm. This strategy analyzes the total multiplicative depth L_total = Σ_{l=1}^{L} d_l of the entire DNN circuit C, where d_l is the multiplicative depth of the l-th layer. It then selects a set of sufficiently large initial HE parameters (λ, q, N), such that the initial noise budget B_init can accommodate the noise accumulation of the entire circuit, i.e., satisfying B_init · ∏_{i=1}^{L_total} α_i < B_max, where α_i is the noise growth factor of the i-th multiplication. This approach completely avoids the costly bootstrapping operation, exhibiting a time complexity of O(T_infer) without incurring the additional O(k·T_boot) overhead. The trade-off, however, is a high-order dependency of the HE parameters on the multiplicative depth [25,26]. Specifically, the modulus q must satisfy log q = O(L_total · log B_mult), causing the ciphertext size |c| = O(N · log q) and the computational complexity O(N log N · log q) to grow rapidly with the circuit depth. Consequently, this paradigm is only applicable to shallow networks such as LeNet-5, where L_total ≤ 5.
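The leveled-HE feasibility condition above can be sketched in a few lines: starting from B_init, scale the bound by each multiplication's growth factor α_i and check that it stays below B_max. All numbers are illustrative, not real BFV/CKKS parameters.

```python
import math

def noise_after(b_init, alphas):
    # Multiplicative noise growth across the circuit: B_init * prod(alpha_i).
    b = b_init
    for a in alphas:
        b *= a
    return b

def is_feasible(b_init, b_max, alphas):
    # Leveled-HE condition from the text: the final bound stays below B_max.
    return noise_after(b_init, alphas) < b_max

def max_depth(b_init, b_max, alpha):
    # Largest uniform multiplicative depth the budget can absorb.
    return int(math.log(b_max / b_init) // math.log(alpha))

# Illustrative check: a depth-5 circuit with growth factor 8 fits a budget of
# 1e6, while a depth-30 circuit does not.
shallow_ok = is_feasible(1.0, 1e6, [8.0] * 5)
deep_ok = is_feasible(1.0, 1e6, [8.0] * 30)
```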
To overcome the depth limitation, Fan et al. [16] shifted the focus to automated optimization through static graph optimization and compile-time decisions. HE compilers, such as EVA [27], perform parameter pre-planning by conducting static analysis on the DNN’s dataflow graph G = (V, E) at compile-time. The compiler constructs a noise propagation function φ : V → R_+, where φ(v_i) denotes the estimated upper bound of the noise at node v_i. For each computational node v ∈ V, the compiler statically allocates a noise budget b_v such that Σ_{v∈V} b_v ≤ B_total, and inserts bootstrapping operations at predetermined locations {v_{j_1}, v_{j_2}, ..., v_{j_k}}. While these methods enhance the efficiency of HE applications, they remain fundamentally static. Consider three different layer types: the noise consumption of a 3×3 convolutional layer is Δe_conv3x3 ≈ 9·|w|·e_prev, that of a 1×1 convolutional layer is Δe_conv1x1 ≈ |w|·e_prev, and that of a non-linear activation layer using a degree-d polynomial approximation is Δe_act = O(e_prev^d). These three operations consume the noise budget in a starkly different manner, satisfying Δe_conv1x1 ≪ Δe_conv3x3 ≪ Δe_act. However, the static analyzer assigns them the same conservative budget b_v = max{Δe_conv1x1, Δe_conv3x3, Δe_act}, resulting in suboptimal resource allocation efficiency η = B_used / B_allocated < 1. As a complement to static compilers, some works employ runtime heuristic rules. This includes the fixed-interval strategy S_interval: if (i mod k = 0) ⇒ Boot, which performs bootstrapping every k layers. Although this strategy possesses basic runtime awareness, enabling decisions based on the current noise state e_t, it disregards the noise consumption characteristics of the future computational path, E[Δe_{t+1:T}] [28]. Consequently, it triggers redundant bootstrapping operations at unnecessary locations, incurring additional time overhead.
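The gap between the fixed-interval rule and a noise-aware rule can be seen in a toy simulation: with alternating cheap and expensive per-layer increments (synthetic stand-ins, not measured values), bootstrapping only when the next increment would breach B_max triggers far fewer refreshes than S_interval with k = 2.

```python
# Toy comparison of S_interval (bootstrap every k layers) against a simple
# noise-aware rule. All increments and thresholds are synthetic illustrations.

B_INIT, B_MAX = 1.0, 100.0
# Alternating cheap 1x1-conv-like and expensive activation-like increments.
increments = [2.0 if i % 2 == 0 else 30.0 for i in range(20)]

def run(policy):
    """Simulate inference; return bootstrap count, or None if the budget is
    exceeded (decryption would fail)."""
    noise, boots = B_INIT, 0
    for i, de in enumerate(increments):
        if policy(i, noise, de):
            noise, boots = B_INIT, boots + 1
        noise += de
        if noise >= B_MAX:
            return None
    return boots

fixed_k2 = lambda i, noise, de: i > 0 and i % 2 == 0    # every 2 layers
noise_aware = lambda i, noise, de: noise + de >= B_MAX  # only when needed

boots_fixed = run(fixed_k2)
boots_aware = run(noise_aware)
```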

3. Noise Model for HE Deep Networks

3.1. Problem Formulation

Definition 1.
Algebraic Structure of CKKS. The CKKS homomorphic encryption scheme is defined over the quotient polynomial ring R_q = Z_q[X]/(X^N + 1), where N = 2^m is the ring dimension and X^N + 1 = Φ_2N(X) is the 2N-th cyclotomic polynomial, which is irreducible over the rational field Q. The ciphertext modulus q is typically chosen as a product of distinct primes, q = ∏_{i=1}^{k} q_i, where each prime q_i satisfies q_i ≡ 1 (mod 2N). This condition ensures the existence of a primitive 2N-th root of unity in Z_{q_i}, enabling an efficient implementation of negacyclic polynomial multiplication via the Number Theoretic Transform (NTT). The size and number of primes composing q are selected according to the required multiplicative depth, precision, and noise growth constraints of the homomorphic computation.
Although the polynomial X^N + 1 is irreducible over Q and plays a central role in the security analysis based on the RLWE assumption, irreducibility modulo q is not required for homomorphic computation. In practice, due to the choice q_i ≡ 1 (mod 2N), the polynomial X^N + 1 splits into linear factors over each Z_{q_i}. As a consequence, the ring R_q admits a Chinese Remainder Theorem (CRT) decomposition R_q ≅ ∏_{i=1}^{k} Z_{q_i}[X]/(X^N + 1) ≅ ∏_{i=1}^{k} Z_{q_i}^N, which enables a Single Instruction Multiple Data (SIMD) representation and efficient parallel arithmetic in the CRT domain. Arithmetic operations in R_q are defined as follows:
  • Addition: Polynomial addition modulo q, performed coefficient-wise (or component-wise in the CRT representation),
    (a + b)(X) = Σ_{j=0}^{N−1} ((a_j + b_j) mod q) X^j.
  • Multiplication: Negacyclic convolution modulo X^N + 1 and q, which is efficiently implemented using the NTT,
    (a · b)(X) = a(X)·b(X) mod (X^N + 1) mod q.
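A toy schoolbook implementation makes the negacyclic reduction X^N ≡ −1 concrete (N = 8 and q = 97 are illustrative; production CKKS uses the NTT and far larger parameters):

```python
# Arithmetic in R_q = Z_q[X]/(X^N + 1): coefficient-wise addition and an
# O(N^2) schoolbook negacyclic convolution. Toy parameters for illustration.

N, Q = 8, 97

def ring_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def ring_mul(a, b):
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                res[k] = (res[k] + ai * bj) % Q
            else:
                # Wrap around with a sign flip: X^(N+t) = -X^t.
                res[k - N] = (res[k - N] - ai * bj) % Q
    return res

x = [0, 1] + [0] * (N - 2)              # the polynomial X
x_pow_n_minus_1 = [0] * (N - 1) + [1]   # X^(N-1)
# X^(N-1) * X = X^N = -1 in R_q.
```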
The cryptographic primitives are defined over this structure as follows:
  • KeyGen(1^λ) → (pk, sk, evk): Generates the public key, secret key, and evaluation key.
  • Enc(pk, m; Δ) → ct: Encrypts a plaintext vector m ∈ C^{N/2} into a ciphertext ct = (c_0, c_1) ∈ R_q^2, where Δ is the scaling factor.
  • Dec(sk, ct) → m′: Decrypts the ciphertext to obtain an approximate plaintext m′ ≈ m.
Definition 2.
Computational graph of deep neural networks. We formally represent a Deep Neural Network (DNN), denoted as N, as a weighted Directed Acyclic Graph (DAG), N = (V, E, Θ). In this graph, the vertex set V = {v_0, v_1, ..., v_L} represents the nodes of the network, where v_0 is the input node, v_L is the output node, and the intermediate vertices v_i (i ∈ [1, L−1]) correspond to the i-th computational layer. The set of directed edges E ⊆ V × V defines the data dependencies between layers, where (v_i, v_j) ∈ E indicates that data flows from layer i to layer j. Θ = {θ_1, θ_2, ..., θ_L} is the set of parameters associated with each computational layer vertex. Each θ_i = (f_i, W_i, M_i) comprises three components: the layer’s computational function f_i : X_i → X_{i+1}; the layer’s weight parameters W_i ∈ R^{d_{i+1} × d_i}; and the meta-information M_i describing the layer’s topological structure, such as kernel size k_i, stride s_i, padding p_i, and the number of input/output channels C_i^in, C_i^out, among others. Consequently, for any given input x ∈ X_0, the network’s forward propagation process can be compactly expressed as the composite function N(x) = f_L ∘ f_{L−1} ∘ ⋯ ∘ f_1(x) [29].
Definition 3.
Homomorphic network reasoning. Given a network N = (V, E, Θ) and an input x ∈ X_0, homomorphic network inference is defined as a state transition process {(ct_i, e_i)}_{i=0}^{L} [24]. In the initialization phase, ct_0 = Enc(pk, x; Δ_0), with an initial noise e_0 = B_init. For each layer i ∈ [1, L], the homomorphic computation ct_i = H(f_i, ct_{i−1}, W_i) is executed, where H denotes the homomorphic computation operator that maps the plaintext function f_i to a sequence of ciphertext-domain operations. The output noise at layer i is determined by the noise growth function Φ_i:
e_i = Φ_i(e_{i−1}, θ_i) = e_{i−1} + ΔE_i(e_{i−1}, W_i, M_i)
In this formulation, Φ_i represents the noise growth function that maps the input error state and layer configuration to the output error bound. The variable ‖e_{i−1}‖ denotes the infinity norm of the noise inherited from preceding computations, serving as the baseline for the current step. The parameter set θ_i encapsulates the structural configuration of the layer, including the weight tensor W_i and topological metadata M_i. The term ΔE_i defines the noise increment function, which quantifies the specific additive error generated by the arithmetic operations within layer i as a function of the input noise magnitude and the layer’s multiplicative complexity.
Problem 1.
The Adaptive Noise Budget Management Problem. Given an initial noise budget B_init, a maximum noise threshold B_max, and the L computational layers of a network N, the objective is to design an adaptive policy π : S → A with a ternary action space A = {a_0, a_1, a_2}. This policy, for each state s_i = (i, e_i, {θ_j}_{j>i}) during inference, decides whether and when to execute the bootstrapping operation Boot. Formally, we define a decision sequence a = (a_1, a_2, ..., a_L), where a_i = π(s_i) ∈ {a_0, a_1, a_2} (a_i ∈ {a_1, a_2} indicates that bootstrapping is scheduled around layer i). The optimization goal is min_π K(π) = Σ_{i=1}^{L} I[a_i ∈ {a_1, a_2}], subject to the security constraint ∀i ∈ [1, L]: e_i < B_max. The noise evolution follows:
e_i = B_init + Σ_{k=j+1}^{i} ΔE_k,  if a_i ∈ {a_1, a_2}, where j = max{k < i : a_k ∈ {a_1, a_2}};
e_i = e_{i−1} + ΔE_i,  if a_i = a_0.
The goal of this problem, given only the current noise state e_i, is to accurately predict the future noise trajectory {e_j}_{j>i} to make optimal decisions, thereby avoiding both premature and delayed bootstrapping.
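One consistent reading of the recurrence (bootstrapping refreshes the budget to B_init immediately before layer i's increment is paid) can be simulated directly; the increments and thresholds below are illustrative:

```python
# Simulating the Problem 1 noise recurrence under a given decision sequence.
# A bootstrap at layer i (1-based) resets the budget to B_init before that
# layer's increment is added. All numeric values are illustrative.

def simulate(deltas, boot_at, b_init, b_max):
    """Return (noise trace, constraint_ok) for per-layer increments `deltas`
    and a set `boot_at` of 1-based layers where bootstrapping occurs."""
    e, trace, ok = b_init, [], True
    for i, de in enumerate(deltas, start=1):
        if i in boot_at:
            e = b_init          # budget refreshed
        e += de                 # e_i = e_{i-1} + dE_i
        trace.append(e)
        ok = ok and e < b_max   # security constraint e_i < B_max
    return trace, ok

trace, ok = simulate([10.0] * 6, boot_at={4}, b_init=1.0, b_max=35.0)
```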

3.2. Deep Network-Aware Adaptive Noise-Budget Management (DN-ANM)

To address Problem 1, we propose the DN-ANM model. As shown in Figure 1, this model is composed of three core components, forming a complete closed loop for noise perception, prediction, and monitoring.
Network Structure Analyzer: The Network Structure Analyzer is defined as a mapping A : N → G, which transforms the input DNN model N into a structured graph representation G = (V, E, Θ, Ψ), where Ψ = {ψ_1, ..., ψ_L} is the set of layer-wise feature vectors. For each layer i, the feature vector ψ_i ∈ R^d encodes the network’s structural features, including:
  • Topological features: layer depth d_i, the set of predecessor nodes pred(v_i) = {v_j : (v_j, v_i) ∈ E}, and the set of successor nodes succ(v_i).
  • Operational features: the one-hot encoding of the operation type τ_i ∈ {Conv, FC, Poly, Pool, ...}.
  • Parameter features: weight dimension dim(W_i), weight norm ‖W_i‖_F, and multiplicative complexity C_i^mult.
  • Dataflow features: input/output tensor shapes shape(x_i^in) and shape(x_i^out).
This component has a time complexity of O ( | V | + | E | ) and only needs to be executed once before the inference begins.
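A minimal sketch of the analyzer as a single pass over a toy chain-topology layer list follows; the dict-based layer schema is invented for illustration, and treating a Poly layer's multiplicative complexity as its depth ⌈log₂ d⌉ is an assumption:

```python
# One-pass O(|V| + |E|) analyzer sketch over a toy chain of layers.
# The schema and the Poly-layer complexity rule are illustrative assumptions.

LAYERS = [
    {"type": "Conv", "k": 3, "c_in": 3,  "c_out": 16},
    {"type": "Poly", "degree": 2},
    {"type": "Conv", "k": 1, "c_in": 16, "c_out": 32},
    {"type": "FC",   "d_in": 32768, "d_out": 10},
]

def mult_complexity(layer):
    if layer["type"] == "Conv":
        return layer["k"] ** 2 * layer["c_in"]     # k^2 * C_in multiplications
    if layer["type"] == "Poly":
        return (layer["degree"] - 1).bit_length()  # = ceil(log2 d)
    if layer["type"] == "FC":
        return layer["d_in"]
    return 0

def analyze(layers):
    feats = []
    for depth, layer in enumerate(layers):
        feats.append({
            "depth": depth,
            "op": layer["type"],
            "pred": depth - 1 if depth > 0 else None,               # chain DAG
            "succ": depth + 1 if depth < len(layers) - 1 else None,
            "c_mult": mult_complexity(layer),
        })
    return feats

features = analyze(LAYERS)
```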
Noise Consumption Prediction: This component constructs a forward noise propagation model P_pred : R_+ × [1, L] × [1, L] → R_+. Given the current layer index i, the current noise level e_i, and a target layer index j (j > i), it predicts the cumulative noise consumption:
P_pred(e_i, i, j) = Σ_{k=i+1}^{j} ΔE_k^pred(e_{k−1}^pred, θ_k)
The sequence of predicted noise states {‖e_k‖^pred}_{k=i}^{j} is computed recursively. The initialization is set by the current observed noise level, such that ‖e_i‖^pred = ‖e_i‖. For all subsequent layers k > i, the noise state evolves according to:
e_k^pred = e_{k−1}^pred + ΔE_k^pred(e_{k−1}^pred, θ_k)
The key to the predictor lies in accurately modeling the noise increment function ΔE_i^pred for each layer, the construction of which is detailed in Section 3.3. Theoretically, if ΔE_i^pred = ΔE_i, the prediction error would be zero.
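The recursion defining P_pred can be sketched directly; the linear per-layer increment models ΔE_k(e) = γ_k·e + b_k below are placeholders standing in for the calibrated layer-type models, with illustrative constants:

```python
# Recursive predictor sketch: P_pred(e_i, i, j) accumulates predicted
# increments starting from the observed noise at layer i. The linear models
# are placeholders for the calibrated layer models; constants are illustrative.

def make_delta(gamma, bias):
    return lambda e: gamma * e + bias

DELTAS = [make_delta(0.5, 1.0), make_delta(0.1, 0.5), make_delta(1.0, 2.0)]

def p_pred(e_i, i, j):
    """Predicted cumulative noise consumption over layers i+1 .. j (1-based)."""
    e, total = e_i, 0.0
    for k in range(i + 1, j + 1):
        de = DELTAS[k - 1](e)   # dE_k^pred evaluated at the predicted state
        e += de                 # e_k^pred = e_{k-1}^pred + dE_k^pred
        total += de
    return total
```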
Runtime Noise Monitor: This component tracks the current ciphertext noise level in real-time, as HE schemes typically provide functions to estimate the current noise level. The monitor integrates this real-time noise value, the current network layer index, and the future noise consumption predictions (provided by the prediction model) into a complete state vector. This vector is then supplied to the reinforcement learning decision agent, as detailed in Section 4, to support optimal decision-making. The component maintains a dynamic state space S = I × N × F, where I = [1, L] is the layer index space; N = [B_init, B_max) is the noise level space; and F = R^{L−i} is the future noise consumption prediction space. At the i-th inference step, the monitor constructs the state vector s_i = (i, e_i^cur, f_i). Here, e_i^cur represents the actual noise level of the current ciphertext, obtained via the CKKS scheme’s built-in function GetNoise(ct_i). The vector f_i = (P_pred(e_i^cur, i, i+1), P_pred(e_i^cur, i, i+2), ..., P_pred(e_i^cur, i, L)) is the prediction vector for the noise consumption of all subsequent layers. This state vector provides a complete decision-making context, serving as the informational foundation for the subsequent reinforcement learning agent.

3.3. Layer-Wise Sensing Noise Prediction Model Flowchart

The core of noise consumption prediction lies in the construction of a precise layer-wise noise increment function, Δ E i . We adopt a bottom-up modeling strategy: commencing with a theoretical analysis of homomorphic primitives, progressively extending to composite layer-level operations, and culminating in an end-to-end noise propagation model.

3.3.1. Noise Modeling of Basic Homomorphic Operations

Consider the basic operations of the CKKS scheme. Let two ciphertexts, ct_1 and ct_2, have noises e_1 and e_2, plaintexts m_1, m_2 ∈ C^{N/2}, and modulus q. Homomorphic Addition: ct_add = Add(ct_1, ct_2). Its output noise satisfies ‖e_add‖ ≤ ‖e_1‖ + ‖e_2‖. Homomorphic Multiplication: ct_mult = Mult(ct_1, ct_2) followed by relinearization Relin. Its output noise satisfies:
‖e_mult‖ ≤ Δ·(‖m_1‖·‖e_2‖ + ‖m_2‖·‖e_1‖) + ‖e_1‖·‖e_2‖ + e_relin
Here, e_relin = O(N·σ²) is the additional noise introduced by relinearization, where σ is the standard deviation of the key distribution. Since ‖e_1‖, ‖e_2‖ ≪ Δ, the cross-term Δ^{−1}·‖e_1‖·‖e_2‖ can be ignored. We thus define the dominant function for multiplicative noise growth as h_mult(e, m) = Δ·‖m‖·‖e‖ + e_relin.

3.3.2. Noise Modeling of Neural Network Layer-Wise Operations

Based on the analysis of fundamental homomorphic operations, we differentiate between theoretical worst-case bounds and calibrated predictive models to establish precise noise increments for DNN layers. Table 2 provides a summary of these layer-wise noise models.
(1) Convolutional Layer Noise Increment: Consider a convolutional layer l_i parameterized by kernel size k × k, input channels C_in, output channels C_out, and a weight tensor W_i with maximum magnitude w_max = max|W_i|. The computational cost is driven by the number of ciphertext-plaintext multiplications per activation, defined as N_mult(i) = k²·C_in. We first derive the theoretical upper bound for a single output activation, ‖e_single‖, based on the worst-case interaction between weight scaling and input noise:
‖e_single‖ ≤ Δ·w_max·N_mult(i)·‖e_{i−1}‖ + N_mult(i)·e_relin + N_mult(i)·‖e_{i−1}‖
Since this theoretical bound is often loose for efficient budgeting, we formulate a calibrated noise increment model. This model introduces learnable coefficients α_i, β_i to map the theoretical components to the actual noise behavior:
ΔE_i^conv(‖e_{i−1}‖, W_i) = α_i·N_mult(i)·(Δ·w_max + 1)·‖e_{i−1}‖ + β_i·N_mult(i)
where α_i and β_i are layer-specific calibration coefficients used to minimize the prediction error.
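Fitting the calibration coefficients can be sketched as a two-parameter least-squares problem. The "measurements" below are generated from a known ground-truth (α, β), so the fit recovers them exactly; in practice they would come from observed CKKS noise, and all parameter values are illustrative:

```python
# Calibrating dE_conv(e) = alpha*N_mult*(Delta*w_max + 1)*e + beta*N_mult by
# least squares. Observations are synthetic; parameters are illustrative.

N_MULT = 9 * 3                 # 3x3 kernel, 3 input channels
DELTA, W_MAX = 2.0 ** 10, 0.5

def delta_e_conv(e_prev, alpha, beta):
    return alpha * N_MULT * (DELTA * W_MAX + 1) * e_prev + beta * N_MULT

# Synthetic "measurements" generated with ground-truth alpha=0.3, beta=2.0.
obs = [(e, delta_e_conv(e, 0.3, 2.0)) for e in (1.0, 2.0, 4.0, 8.0)]

def fit(obs):
    # Least squares on y = alpha*u + beta*v, with u = N_mult*(Delta*w_max+1)*e
    # and v = N_mult, via the 2x2 normal equations.
    suu = svv = suv = suy = svy = 0.0
    for e, y in obs:
        u, v = N_MULT * (DELTA * W_MAX + 1) * e, float(N_MULT)
        suu += u * u; svv += v * v; suv += u * v
        suy += u * y; svy += v * y
    det = suu * svv - suv * suv
    alpha = (suy * svv - svy * suv) / det
    beta = (svy * suu - suy * suv) / det
    return alpha, beta

alpha_hat, beta_hat = fit(obs)
```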
(2) Polynomial Activation Noise Increment: For a square activation f(x) = x², involving ciphertext-ciphertext self-multiplication, the noise increment is modeled as:
ΔE_i^square(‖e_{i−1}‖) = 2Δ·m_max·‖e_{i−1}‖ + Δ^{−1}·‖e_{i−1}‖² + e_relin
where m_max denotes the maximum plaintext norm. For a general polynomial activation of degree d, the noise growth scales exponentially with the multiplicative depth ⌈log₂ d⌉:
ΔE_i^poly(‖e_{i−1}‖, d) ≈ (2Δ·m_max)^{⌈log₂ d⌉}·‖e_{i−1}‖ + O(d·e_relin)
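The two activation increments can be written directly as functions; Δ, m_max, and e_relin are illustrative constants, and the general degree-d formula keeps only the dominant term (it drops the Δ^{−1}‖e‖² cross-term that appears in the square case):

```python
import math

# Direct transcriptions of the activation-layer noise increments above.
# DELTA, M_MAX, E_RELIN are illustrative, not real CKKS parameters.

DELTA, M_MAX, E_RELIN = 2.0 ** 10, 1.0, 4.0

def delta_e_square(e_prev):
    # dE_square = 2*Delta*m_max*e + e^2/Delta + e_relin
    return 2.0 * DELTA * M_MAX * e_prev + e_prev ** 2 / DELTA + E_RELIN

def delta_e_poly(e_prev, d):
    # Dominant term grows with the multiplicative depth ceil(log2 d).
    depth = math.ceil(math.log2(d))
    return (2.0 * DELTA * M_MAX) ** depth * e_prev + d * E_RELIN
```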

3.4. Cross-Layer Noise Propagation and Cumulative Prediction

By integrating the noise models for each layer, we construct a complete inter-layer noise propagation equation. Let the network comprise L layers, with each layer’s noise increment function denoted as ΔE_i (where the corresponding model is selected based on the layer type). Starting from the initial noise budget ‖e_0‖ = B_init, we derive the cumulative noise at layer i by iteratively expanding the recursive relationship established in Equation (6). Since each step adds a strictly positive noise increment ΔE_k, the total noise is the sum of the initial state and the increments of all preceding layers k = 1 to i. This telescoping expansion yields the closed-form expression:
e_i^pred = e_0 + Σ_{k=1}^{i} ΔE_k^pred(e_{k−1}^pred, θ_k)
To evaluate the prediction accuracy, we define the layer-wise prediction error ξ_i = |e_i^pred − e_i^actual| and the layer-wise relative error η_i = ξ_i / e_i^actual.
Theorem 1.
(Noise Prediction Error Bound): Suppose the noise increment prediction for each layer satisfies |ΔE_i^pred − ΔE_i| ≤ ε, and the noise growth follows a linear pattern (i.e., ΔE_i ∝ e_{i−1}, with a growth factor of γ_i). Then, the absolute prediction error at the i-th layer satisfies: ξ_i ≤ ε · ((1 + γ_max)^i − 1) / γ_max, where γ_max = max_{k∈[1,i]} γ_k.
Proof of Theorem 1. 
By mathematical induction. Base Case (i = 1): ξ_1 = |ΔE_1^pred − ΔE_1| ≤ ε. Inductive Step: Assume the hypothesis holds for i − 1, i.e., ξ_{i−1} ≤ ε·((1 + γ_max)^{i−1} − 1)/γ_max. Then: ξ_i = |e_{i−1}^pred + ΔE_i^pred − e_{i−1}^actual − ΔE_i| ≤ ξ_{i−1} + |ΔE_i^pred(e_{i−1}^pred) − ΔE_i(e_{i−1}^actual)| ≤ ξ_{i−1} + γ_i·ξ_{i−1} + ε ≤ (1 + γ_max)·ξ_{i−1} + ε. Substituting the inductive hypothesis and expanding yields the result. Theorem 1 indicates that under the assumption of a noise growth factor γ_max < 1, the growth of the prediction error is bounded. By minimizing ε using the calibration coefficients α_i, β_i, it can be ensured that the cumulative prediction error for the entire network remains within an acceptable range.    □
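The bound of Theorem 1 can be checked numerically: with exactly linear growth ΔE_i = γ·e_{i−1} and a predictor whose per-layer increment is off by exactly ε, the error recursion ξ_i = (1+γ)·ξ_{i−1} + ε meets the bound with equality, so the realized-error-to-bound ratio stays at 1. All constants are illustrative.

```python
# Numeric check of Theorem 1: actual noise follows dE_i = GAMMA * e_{i-1};
# the predictor overshoots each increment by exactly EPS. The realized error
# xi_i then equals the bound EPS*((1+GAMMA)^i - 1)/GAMMA. Illustrative values.

EPS, GAMMA, B_INIT, L = 0.1, 0.5, 1.0, 12

e_act = e_pred = B_INIT
worst_ratio = 0.0
for i in range(1, L + 1):
    e_act = e_act + GAMMA * e_act             # true recursion
    e_pred = e_pred + GAMMA * e_pred + EPS    # prediction off by eps per layer
    xi = abs(e_pred - e_act)
    bound = EPS * ((1.0 + GAMMA) ** i - 1.0) / GAMMA
    worst_ratio = max(worst_ratio, xi / bound)
```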

4. Reinforcement Learning-Based Adaptive Bootstrapping Decision

This section formalizes the adaptive bootstrapping problem as a Markov Decision Process (MDP) and proposes a multi-objective driven decision algorithm. This algorithm incorporates the noise prediction model from the preceding section as part of the environment. It aims to learn an optimal policy function π* : S → A, seeking to achieve a Pareto optimum [30] between computational security and the overhead of homomorphic operations.

4.1. MDP Problem Formulation

Definition 4
(Adaptive Bootstrapping MDP). We formalize the adaptive noise budget management problem defined in Problem 1 as a finite-horizon Markov Decision Process (MDP) $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma, H)$. Here, $\mathcal{S}$ is the state space, representing all possible system configurations during inference; $\mathcal{A} = \{a_0, a_1, a_2\}$ is the action space, corresponding to the different bootstrapping decisions; $P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$ is the state transition probability function, defined as $P(s' \mid s, a) = \Pr[s_{t+1} = s' \mid s_t = s, a_t = a]$; $R: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reward function, quantifying the immediate utility of each state-action pair; $\gamma \in [0,1]$ is the discount factor, balancing immediate rewards and long-term gains; and $H = L$ is the decision horizon, corresponding to the total number of network layers. The objective of this MDP is to find an optimal policy $\pi^*: \mathcal{S} \to \Delta(\mathcal{A})$ that maximizes the expected cumulative reward: $$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^{H-1} \gamma^t R(s_t, a_t) \right]$$ where $\tau = (s_0, a_0, s_1, a_1, \dots, s_{H-1}, a_{H-1}, s_H)$ is a complete decision trajectory, and $\Delta(\mathcal{A})$ denotes a probability distribution over the action space [10,31].

4.1.1. Multi-Dimensional State Space

The design of the state space determines the quality of policy learning. Our state vector $s_i \in \mathcal{S} \subseteq \mathbb{R}^{d_s}$ integrates the multi-scale information provided by the DN-ANM model from Section 3 and is structurally defined as $s_i = [s_i^{instant}, s_i^{foresight}, s_i^{structural}]$. This vector is composed of three functional modules, which capture the instantaneous state, foresight prediction, and structural features, respectively.
Instantaneous State Features $s_i^{instant} \in \mathbb{R}^4$ encode the basic information of the current inference step: $s_i^{instant} = [r_{budget}, r_{progress}, r_{distance}, r_{util}]$, where each component is defined as follows:
  • Noise Budget Ratio: $r_{budget} = \|e_i^{cur}\| / B_{max} \in [0, 1)$, characterizing the proportion of the current noise relative to the security threshold. As this ratio approaches 1, the system nears the failure boundary.
  • Inference Progress Ratio: $r_{progress} = i / L \in [0, 1]$, quantifying the completeness of the inference process. Combined with the noise budget ratio, this feature helps the agent determine whether the current rate of noise consumption is reasonable.
  • Bootstrapping Distance: $r_{distance} = d_i / L$, where $d_i = i - \max\{j < i : a_j \in \{a_1, a_2\}\}$ is the number of layers elapsed since the last bootstrapping. This feature provides an explicit signal regarding the duration of noise accumulation.
  • Budget Utilization: $r_{util} = \frac{\|e_i^{cur}\| - B_{init}}{B_{max} - B_{init}} \in [0, 1]$, measuring the proportion of the effective noise budget used since the last bootstrapping, incentivizing the agent to fully utilize the budget provided by each bootstrapping operation.
Foresight Prediction Features $s_i^{foresight} \in \mathbb{R}^3$ directly leverage the future information provided by the noise predictor $P_{pred}$ from Section 3: $s_i^{foresight} = [r_{next}, r_{remain}, r_{horizon}]$. Its components are defined as follows:
  • Immediate Consumption Ratio: $r_{next} = \frac{P_{pred}(\|e_i^{cur}\|, i, i+1)}{B_{max} - \|e_i^{cur}\|}$, representing the proportion of the expected noise consumption of the next layer relative to the current remaining budget, providing a short-term risk assessment.
  • Remaining Consumption Ratio: $r_{remain} = \frac{P_{pred}(\|e_i^{cur}\|, i, L)}{B_{max} - \|e_i^{cur}\|}$, the ratio of the total expected noise consumption from the current layer to the end of the network against the remaining budget, offering a long-term risk assessment. If $r_{remain} > 1$, the inference will fail without bootstrapping.
  • Foresight Horizon: $r_{horizon} = \min\{k : P_{pred}(\|e_i^{cur}\|, i, i+k) \ge B_{max} - \|e_i^{cur}\|\}$, representing the maximum number of layers that can be safely traversed with the current budget, providing the agent with a clear decision-making time window.
Structural Features $s_i^{structural} \in \mathbb{R}^{|\mathcal{T}|+1}$ encode network topology information, where $\mathcal{T}$ is the set of all layer types: $s_i^{structural} = [r_{depth}, e_{layer}]$, where:
  • Normalized Multiplicative Depth: $r_{depth} = \frac{1}{L_{total}} \sum_{k=i+1}^{L} \mathbb{I}[\tau_k \in \{\text{Conv}, \text{FC}, \text{Poly}\}]$, characterizing the density of high-noise operations in the subsequent computations.
  • Layer Type Encoding: $e_{layer} \in \{0, 1\}^{|\mathcal{T}|}$ is the one-hot encoding of the next layer's operation type, enabling the agent to distinguish the noise consumption patterns of different operations.
Here, $\mathbb{I}[\cdot]$ denotes the indicator function, $\tau_k$ denotes the type of layer $k$, and $\Theta$ denotes the parameter space of the network's layers. For typical linear layers (convolution and fully-connected), we denote the parameter at layer $i+1$ by $W_{i+1} \in \Theta$.
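Putting the three feature groups together, the following sketch assembles the full state vector. All inputs are hypothetical scalars standing in for the quantities defined above, and the layer-type set is reduced to three types for brevity; this is an illustrative construction, not the paper's implementation.

```python
import numpy as np

def build_state(e_cur, b_max, b_init, i, L, last_boot,
                next_cost, remain_cost, horizon, depth_density, layer_onehot):
    """Assemble s_i = [s_instant, s_foresight, s_structural] (Section 4.1.1)."""
    remaining = b_max - e_cur
    instant = [e_cur / b_max,                        # r_budget
               i / L,                                # r_progress
               (i - last_boot) / L,                  # r_distance
               (e_cur - b_init) / (b_max - b_init)]  # r_util
    foresight = [next_cost / remaining,              # r_next
                 remain_cost / remaining,            # r_remain
                 float(horizon)]                     # r_horizon
    structural = [depth_density] + list(layer_onehot)  # r_depth, e_layer
    return np.array(instant + foresight + structural, dtype=np.float32)

s = build_state(e_cur=20.0, b_max=40.0, b_init=10.0, i=5, L=10, last_boot=3,
                next_cost=5.0, remain_cost=15.0, horizon=4,
                depth_density=0.6, layer_onehot=[0, 1, 0])
assert s.shape == (11,)  # 4 instant + 3 foresight + (|T| + 1) structural
```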
Theorem 2.
(Markov Property of the State Space). The defined state space $\mathcal{S}$ satisfies the Markov property. That is, given the current state $s_i$, the distribution of the future state $s_{i+1}$ is conditionally independent of the historical states $\{s_0, \dots, s_{i-1}\}$: $\Pr[s_{i+1} \mid s_0, \dots, s_i, a_i] = \Pr[s_{i+1} \mid s_i, a_i]$.
Proof of Theorem 2. 
The state $s_i$ completely encodes all sufficient statistics required for decision-making, as it contains the current noise level $\|e_i^{cur}\|$, the current layer index $i$, and all future layer noise predictions computed via $P_{pred}$. The recursive nature of the noise predictor $P_{pred}$ (as defined in Section 3) ensures that future noise evolution depends only on the current noise and the network structure, and this structural information is already encoded in $s_i^{structural}$. Therefore, the influence of historical decisions on the future is fully captured by the current noise level, thereby satisfying the Markov property.    □

4.1.2. Extended Action Space

To empower the agent with fine-grained control, we design a ternary action space $\mathcal{A} = \{a_0, a_1, a_2\}$, which overcomes the limitations of binary decisions.
  • a 0 (Continue): No bootstrapping; compute the next layer directly.
  • a 1 (Immediate Bootstrapping): Bootstrap first to reset noise and distance, then compute the next layer.
  • a 2 (Pre-scheduled Bootstrapping): Compute the next layer, then bootstrap immediately to prepare subsequent layers.
Formal transition compositions for these actions are provided in Equation (21).

4.1.3. Multi-Objective Reward Function

The reward function $R(s_i, a_i)$ serves as the sole signal guiding the agent's learning. To balance the two primary objectives of efficiency and security, we design a decomposed multi-objective function, formally defined as: $$R(s_i, a_i) = w_{eff} \cdot R_{eff}(s_i, a_i) + w_{sec} \cdot R_{sec}(s_i) + w_{comp} \cdot R_{comp}(s_i) - P_{penalty}(s_i, a_i)$$ where $w_{eff}, w_{sec}, w_{comp} > 0$ are adjustable weight coefficients satisfying the normalization constraint $w_{eff} + w_{sec} + w_{comp} = 1$.
Efficiency Reward ($R_{eff}$): This component incentivizes the agent to minimize the number of bootstrappings and to fully utilize the budget provided by each operation. It is expressed as: $$R_{eff}(s_i, a_i) = r_{util}(s_i) - \alpha \cdot \mathbb{I}[a_i \in \{a_1, a_2\}] - \beta \cdot \mathbb{I}[a_i = a_1 \wedge r_{next}(s_i) < \tau_{next}]$$ This function incorporates three mechanisms. The first term rewards full utilization of the budget via the budget utilization $r_{util}$. The second term imposes a fixed cost $\alpha$ on any bootstrapping action ($a_1$ or $a_2$), reflecting the computational cost of the Boot operation. The third term introduces an additional penalty $\beta$ for unnecessary immediate bootstrapping (i.e., selecting $a_1$ when the immediate next layer's noise consumption $r_{next}$ is below a threshold $\tau_{next}$).
Security Reward ( R sec ): This component guides the agent to maintain a safe noise level through a soft constraint, preventing it from approaching the failure boundary:
$$R_{sec}(s_i) = \begin{cases} 1 & \text{if } r_{budget}(s_i) < \theta_{safe} \\ \exp\left(-\kappa \cdot \dfrac{r_{budget}(s_i) - \theta_{safe}}{1 - \theta_{safe}}\right) & \text{if } \theta_{safe} \le r_{budget}(s_i) < \theta_{danger} \\ 0 & \text{otherwise} \end{cases}$$
Here, θ safe is the safe threshold, θ danger is the danger threshold, and κ > 0 controls the decay rate. This function creates three reward zones: a “safe zone” providing full reward, a “transition zone” where the reward decays exponentially, and a “danger zone” (beyond θ danger ) yielding zero reward. This structure guides the agent to maintain a safety margin. As shown in Equation (13), the security reward is defined piecewise over the safe, transition, and danger regions to enforce conservative behavior near the failure boundary.
Completion Reward ( R comp ): This reward is granted only upon the successful completion of the entire inference (i.e., at i = L without any layer failing). Its magnitude is correlated with resource utilization efficiency:
$$R_{comp}(s_L) = \begin{cases} R_{base} \cdot \left(1 + \dfrac{K_{baseline} - K_{actual}}{K_{baseline}}\right) & \text{if success} \\ 0 & \text{otherwise} \end{cases}$$
where R base is the base completion reward, K baseline is the average number of bootstrappings for a baseline policy (e.g., a fixed-threshold strategy), and K actual is the actual number of bootstrappings for the current trajectory. This design incentivizes the agent to find strategies that are more economical (i.e., use fewer bootstrappings) than the baseline. Equation (14) formalizes this completion reward, scaling the base reward by the efficiency gain over a baseline policy.
Penalty Term ( P penalty ): This term imposes a strong penalty on actions that lead to computation failure, thereby rapidly eliminating hazardous policies:
$$P_{penalty}(s_i, a_i) = \begin{cases} P_{fail} & \text{if } \|e_{i+1}\| \ge B_{max} \\ P_{risk} \cdot \big(r_{budget}(s_{i+1}) - \theta_{danger}\big)_{+} & \text{otherwise} \end{cases}$$
where P fail is a large failure penalty, far exceeding any positive reward, and ( x ) + = max ( 0 , x ) . P risk applies a progressive penalty for any action that causes the subsequent state s i + 1 to enter the danger zone ( r budget > θ danger ). As specified in Equation (15), the penalty term sharply discourages trajectories that violate safety constraints or approach hazardous regions.

4.2. Policy Optimization Based on Deep Q-Learning

We employ the Deep Q-Network (DQN) approach to solve the defined MDP. This method approximates the optimal action-value function $Q^*(s, a)$ with a deep neural network $Q_\theta: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$. The optimal action-value function is defined as:
$$Q^*(s, a) = \mathbb{E}\left[ \sum_{t=0}^{H-1} \gamma^t R(s_t, a_t) \,\middle|\, s_0 = s,\; a_0 = a,\; \pi = \pi^* \right]$$
This function represents the expected cumulative return obtained by taking action $a$ in state $s$ and subsequently following the optimal policy $\pi^*$ until the end of the task horizon. Equation (16) formalizes this optimal return under the policy $\pi^*$.

4.2.1. Value Network Architecture

We construct a fully connected neural network $Q_\theta: \mathbb{R}^{d_s} \to \mathbb{R}^{|\mathcal{A}|}$, with its architecture defined as:
$$Q_\theta(s) = W^{(3)} \sigma\big(W^{(2)} \sigma(W^{(1)} s + b^{(1)}) + b^{(2)}\big) + b^{(3)}$$
where $\theta = \{W^{(l)}, b^{(l)}\}_{l=1}^{3}$ are the learnable parameters, and $\sigma(\cdot)$ is the ReLU activation function, $\sigma(x) = \max(0, x)$. The network contains two hidden layers with dimensions $h_1 = 128$ and $h_2 = 64$, respectively, and the output layer has dimension $|\mathcal{A}| = 3$. This design accounts for the small input dimension $d_s$; the decreasing hidden-layer dimensions form an information bottleneck that promotes feature abstraction. See Equation (17) for the explicit forward mapping of the value network.
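The forward pass of Equation (17) can be sketched in plain NumPy; the He-style random initialization and the input dimension used below are illustrative choices, not specified by the paper.

```python
import numpy as np

def init_q_params(d_s, h1=128, h2=64, n_actions=3, seed=0):
    """Random parameters for the three-layer MLP Q_theta of Eq. (17)."""
    rng = np.random.default_rng(seed)
    dims = [(d_s, h1), (h1, h2), (h2, n_actions)]
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_out, n_in)),
             np.zeros(n_out)) for n_in, n_out in dims]

def q_forward(params, s):
    """Q_theta(s) = W3 relu(W2 relu(W1 s + b1) + b2) + b3: one value per action."""
    (w1, b1), (w2, b2), (w3, b3) = params
    relu = lambda x: np.maximum(x, 0.0)
    return w3 @ relu(w2 @ relu(w1 @ s + b1) + b2) + b3

params = init_q_params(d_s=11)
q = q_forward(params, np.zeros(11))
assert q.shape == (3,)  # one Q-value per action a0, a1, a2
```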

4.2.2. Temporal Difference Learning and Bellman Optimality

The core of Q-learning utilizes the Bellman optimality equation to iteratively update value estimates. For a deterministic MDP, the Bellman equation is:
$$Q^*(s, a) = R(s, a) + \gamma \max_{a'} Q^*(s', a')$$
where $s' = T(s, a)$ is the next state under the deterministic transition. For an observed transition $(s, a, r, s')$, the Temporal Difference (TD) target is defined as $y = r + \gamma \max_{a' \in \mathcal{A}} Q_{\hat{\theta}}(s', a')$. Here, $Q_{\hat{\theta}}$ denotes the target network, whose parameters $\hat{\theta}$ are periodically copied from the online network $\theta$ (i.e., $\hat{\theta} \leftarrow \theta$ every $C$ steps). This mechanism addresses the "moving target problem" in bootstrapped updates and significantly improves training stability. The network parameters $\theta$ are updated by minimizing the expected Huber loss on the TD error: $\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \text{Huber}(\delta) \right]$, where $\delta = y - Q_\theta(s, a)$ is the TD error and $\mathcal{D}$ is the experience replay buffer. The Huber loss is defined as
$$\text{Huber}(\delta) = \begin{cases} \frac{1}{2}\delta^2 & \text{if } |\delta| \le c \\ c\left(|\delta| - \frac{1}{2}c\right) & \text{otherwise} \end{cases}$$
with smoothing threshold c > 0 , which improves robustness to outliers compared to a pure squared loss.
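A minimal sketch of the TD target and the Huber loss above; the numeric values are illustrative.

```python
import numpy as np

def huber(delta, c=1.0):
    """Piecewise Huber loss: quadratic within |delta| <= c, linear outside."""
    a = np.abs(delta)
    return np.where(a <= c, 0.5 * delta ** 2, c * (a - 0.5 * c))

def td_target(r, q_next, gamma=0.99, done=False):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal states."""
    return r if done else r + gamma * float(np.max(q_next))
```

Note the terminal-state branch: when `done` is true, the target reduces to the reward alone, matching the termination flag stored in the replay buffer.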

4.2.3. Experience Replay and Exploration Strategy

The training objective of DQN is to minimize a TD loss function. To enhance training stability, we utilize experience replay and an exploration strategy.
Experience Replay: A circular buffer $\mathcal{D} = \{(s_j, a_j, r_j, s'_j, d_j)\}_{j=1}^{N}$ of fixed capacity is maintained, where $N = 10^5$ is the buffer size and $d_j \in \{0, 1\}$ is a termination flag. At each step, the agent stores the new transition $(s_t, a_t, r_t, s_{t+1}, d_t)$ in $\mathcal{D}$. When the buffer is full, the oldest sample is discarded following a First-In, First-Out (FIFO) policy. During training, a mini-batch $\mathcal{B} \subset \mathcal{D}$ of size $|\mathcal{B}| = 32$ is sampled uniformly at random from $\mathcal{D}$.
ϵ -Greedy Exploration Strategy: To balance exploration and exploitation, an adaptive ϵ -greedy strategy is adopted. At step t, the policy selects an action according to:
$$a_t = \begin{cases} \text{random}(\mathcal{A}) & \text{with probability } \epsilon_t \\ \arg\max_{a \in \mathcal{A}} Q_\theta(s_t, a) & \text{with probability } 1 - \epsilon_t \end{cases}$$
where $\text{random}(\mathcal{A})$ denotes sampling an action uniformly from the discrete action space $\mathcal{A}$, such that each action is chosen with probability $1/|\mathcal{A}|$. The exploration rate decays exponentially over time: $\epsilon_t = \max\left(\epsilon_{end},\; \epsilon_{start} \cdot \lambda^{t / T_{decay}}\right)$. The action selection rule is summarized in Equation (20).
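The replay buffer and the decaying ε-greedy schedule can be sketched as follows. The capacity and batch size mirror the text; the decay constants and the transition tuple layout are assumptions for illustration.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform mini-batch sampling."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(self.buf, batch_size)  # uniform, without replacement

    def __len__(self):
        return len(self.buf)

def epsilon(t, eps_start=1.0, eps_end=0.05, lam=0.5, t_decay=1000):
    """eps_t = max(eps_end, eps_start * lam ** (t / t_decay))."""
    return max(eps_end, eps_start * lam ** (t / t_decay))
```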

4.2.4. Integration with the DN-ANM Model

The reinforcement learning agent is integrated into the DN-ANM model to form a complete perception-prediction-decision-execution loop. This enables the system to achieve intelligent noise budget management during complex homomorphic inference tasks.
(1) Construction Mechanism of the Environment State Space: The first step in reinforcement learning is to define the state space $\mathcal{S}$, which determines the information the agent possesses. First, the current noise level $\|e_i^{cur}\|$ within the instantaneous state features $s_i^{instant}$ is provided by the runtime noise monitor $\mathcal{M}$. In a simulated environment, $\mathcal{M}$ maintains a virtual noise state variable, which is updated after each action according to the DN-ANM's noise increment function. In a real-world environment, $\mathcal{M}$ instead calls the CKKS scheme's built-in function GetNoise($ct_i$) to retrieve the actual noise level of the ciphertext. Based on this current noise level, $\mathcal{M}$ further computes derived features, such as the noise budget ratio $r_{budget} = \|e_i^{cur}\| / B_{max}$ and the budget utilization $r_{util}$. These normalized features map the raw noise values to the $[0, 1]$ interval, rendering states comparable across different HE parameter configurations and enhancing the policy's generalization capability. The inference progress ratio $r_{progress} = i / L$ and the bootstrapping distance $r_{distance}$ then provide temporal context, helping the agent understand the current stage of the inference process.
(2) State Transition Function: The DN-ANM provides a deterministic noise prediction model; consequently, the state transition is effectively deterministic and can be represented as a function T : S × A S . The implementation of this transition function must handle the three distinct action types, as each action induces a different state evolution pattern. To clearly express this process, we define two fundamental transition operators: the computation transition operator T compute and the bootstrapping transition operator  T boot .
The computation transition operator $T_{compute}: \mathcal{S} \times \Theta \to \mathcal{S}$ simulates the effect of executing one HE layer on the state. This operation updates the noise level according to the DN-ANM's noise increment model: $$\|e_{i+1}\| \leftarrow \|e_i\| + \Delta E_{i+1}^{pred}(\|e_i\|, W_{i+1})$$ where the specific form of $\Delta E_{i+1}^{pred}$ is selected from the noise models in Section 3 based on the type of layer $i+1$. For instance, if layer $i+1$ is a $3 \times 3$ convolutional layer, then $$\Delta E_{i+1}^{conv}(\|e_i\|, W_{i+1}) = \alpha_{i+1} \cdot N_{mult}^{(i+1)} \cdot (\Delta \cdot w_{max} + 1) \cdot \|e_i\| + \beta_{i+1} \cdot N_{mult}^{(i+1)}$$ where $N_{mult}^{(i+1)} = 9 \cdot C_{in}$ is the number of multiplications and $w_{max}$ is the maximum absolute value of the weights.
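For instance, the convolutional increment above can be sketched directly; the calibration coefficients α, β default to 1 below purely for illustration, not as calibrated values.

```python
def conv_noise_increment(e_prev, c_in, w_max, scale, alpha=1.0, beta=1.0, k=3):
    """Delta E^conv = alpha * N_mult * (scale * w_max + 1) * ||e|| + beta * N_mult,
    with N_mult = k^2 * C_in multiplications for a k x k kernel."""
    n_mult = k * k * c_in
    return alpha * n_mult * (scale * w_max + 1.0) * e_prev + beta * n_mult

# One input channel, zero-magnitude weights: increment = 9 * ||e|| + 9.
assert conv_noise_increment(1.0, c_in=1, w_max=0.0, scale=1.0) == 18.0
```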
This state transition necessitates recomputing the foresight features based on the updated noise level $\|e_{i+1}\|$ and the advanced layer index $i+1$. The immediate consumption ratio ($r_{next}$), the remaining consumption ratio ($r_{remain}$), and the foresight horizon ($r_{horizon}$) are updated according to the following rules:
$$r_{next} \leftarrow \frac{P_{pred}(\|e_{i+1}\|, i+1, i+2)}{B_{max} - \|e_{i+1}\|}, \quad r_{remain} \leftarrow \frac{P_{pred}(\|e_{i+1}\|, i+1, L)}{B_{max} - \|e_{i+1}\|}, \quad r_{horizon} \leftarrow \min\{k : P_{pred}(\|e_{i+1}\|, i+1, i+1+k) \ge B_{max} - \|e_{i+1}\|\}$$
The metric $r_{remain}$ functions as a global risk indicator. Unlike the immediate consumption ratio $r_{next}$, which restricts its assessment to the safety of the single upcoming layer, $r_{remain}$ evaluates the long-term viability of the inference task. It compares the cumulative predicted noise consumption aggregated from layer $i+1$ to the final layer $L$ against the currently available noise budget. A value of $r_{remain} > 1$ is a deterministic signal that the current budget is insufficient to reach the final layer without at least one future bootstrapping operation. Complementing this, $r_{horizon}$ quantifies the temporal safety window. By solving for the maximum integer $k$ such that the cumulative noise remains within bounds, it provides the agent with an explicit countdown of the safe computation steps available before a mandatory bootstrapping event is triggered. Together, these foresight features enable the agent to balance immediate safety with long-term planning.
The bootstrapping transition operator $T_{boot}: \mathcal{S} \to \mathcal{S}$ simulates the effect of executing the Boot operation on the state. Its primary effect is to reset the noise level to its initial value: $\|e_i\| \leftarrow B_{init}$. This reset has a cascading effect on multiple state features. First, the budget utilization $r_{util}$ is reset to zero, as no budget has been consumed since the bootstrapping. The bootstrapping distance $r_{distance}$ is also reset to zero, marking the start of a new noise accumulation cycle. Furthermore, due to the abrupt change in the noise level, all foresight features must be recomputed from this new starting point. This implies that, even at the same layer position, the states before and after bootstrapping have entirely different foresight feature values, representing a complete reset of the noise budget.
Using these two base operators, the complete state transition function is defined through composition. For the "Continue Computation" action $a_0$, the transition directly applies the computation operator: $T(s_i, a_0) = T_{compute}(s_i, W_{i+1})$. For the "Immediate Bootstrapping" action $a_1$, the transition first applies the bootstrapping operator and then the computation operator: $T(s_i, a_1) = T_{compute}(T_{boot}(s_i), W_{i+1})$; the noise is reset first, and the next layer's computation is then executed with a clean budget. For the "Pre-scheduled Bootstrapping" action $a_2$, the transition first applies the computation operator and then the bootstrapping operator: $T(s_i, a_2) = T_{boot}(T_{compute}(s_i, W_{i+1}))$; the next layer is computed using the current budget, and the noise is then immediately reset in preparation for subsequent layers. This strategy of deferred bootstrapping can be superior to immediate bootstrapping under certain conditions, as it avoids wasting budget.
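These compositions reduce to a few lines when the state is abstracted to its noise level; `inc` below is a hypothetical per-layer increment function standing in for $\Delta E^{pred}_{i+1}$, and the numeric values are illustrative.

```python
def t_compute(e, inc):
    """Computation operator: one HE layer adds its predicted noise increment."""
    return e + inc(e)

def t_boot(e, b_init):
    """Bootstrapping operator: reset the noise level to the initial budget."""
    return b_init

def transition(e, action, inc, b_init):
    """Compose the operators per action: a0 = compute, a1 = boot then compute,
    a2 = compute then boot (Section 4.1.2)."""
    if action == 0:
        return t_compute(e, inc)
    if action == 1:
        return t_compute(t_boot(e, b_init), inc)
    return t_boot(t_compute(e, inc), b_init)

inc = lambda e: 0.5 * e  # hypothetical 50% noise growth per layer
assert transition(10.0, 0, inc, 2.0) == 15.0  # continue
assert transition(10.0, 1, inc, 2.0) == 3.0   # reset to 2, then +50%
assert transition(10.0, 2, inc, 2.0) == 2.0   # compute, then reset
```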

4.3. Algorithm Implementation

We integrate the aforementioned components into a complete training pipeline, as shown in Algorithm 1. In each training episode, the agent interacts with the simulated homomorphic inference environment according to the current policy. The data generated during this interaction is stored in the experience replay buffer, and the Q-network is updated by minimizing the TD loss. For a practical homomorphic inference task, the online procedure is shown in Algorithm 2.
Algorithm 1 DQN Training
Require: Network $\mathcal{N}$, Noise model $P_{pred}$, Episodes $E_{max}$
Ensure: Trained Q-network parameters $\theta^*$
  1: Initialize Q-networks $\theta$, $\hat{\theta} \leftarrow \theta$, replay buffer $\mathcal{D}$, step_count $\leftarrow 0$
  2: $\epsilon \leftarrow \epsilon_{start}$
  3: for episode $= 1$ to $E_{max}$ do
  4:        $s \leftarrow$ InitialState($G$, $B_{init}$); $K_{episode} \leftarrow 0$; done $\leftarrow$ false
  5:        while not done do
  6:              if Random() $< \epsilon$ then
  7:                    $a \leftarrow$ UniformSample($\mathcal{A}$)
  8:              else
  9:                    $a \leftarrow \arg\max_{a' \in \mathcal{A}} Q_\theta(s, a')$
10:              end if
11:              $s' \leftarrow T(s, a)$; $r \leftarrow R(s, a)$ ▹ $T$ is the DN-ANM simulation
12:              if $a \in \{a_1, a_2\}$ then $K_{episode} \leftarrow K_{episode} + 1$
13:              end if
14: ▹ Check environment termination conditions
15:              if $s'$.layer_index $= L$ then
16:                    done $\leftarrow$ true; $r \leftarrow r + R_{comp}(K_{episode})$
17:              else if $s'$.noise_budget $\ge B_{max}$ then
18:                    done $\leftarrow$ true; $r \leftarrow r - P_{fail}$
19:              end if
20:              $\mathcal{D}$.Store($(s, a, r, s', \text{done})$)
21:              $s \leftarrow s'$; step_count $\leftarrow$ step_count $+ 1$
22:              if $|\mathcal{D}| \ge B$ then
23:                    UpdateNetworks($\mathcal{D}$, $\theta$, $\hat{\theta}$, step_count) ▹ Target-network updates with Huber loss
24:              end if
25:        end while
26:        $\epsilon \leftarrow \max(\epsilon_{end}, \epsilon \cdot \lambda^{1/T_{decay}})$ ▹ Decay $\epsilon$
27: end for
28: return $\theta^*$
Algorithm 2 Online Adaptive Bootstrapping Decision-Making
Require: Input ciphertext $ct_0$, Network $\mathcal{N}$, Trained Q-network $Q_{\theta^*}$, DN-ANM model $(\mathcal{A}, P, \mathcal{M})$
Ensure: Inference result $ct_L$, Total bootstrapping count $K$
  1: $G \leftarrow$ AnalyzeNetwork($\mathcal{N}$) ▹ Offline pre-processing: analyze network
  2: $n_0 \leftarrow$ GetNoise($ct_0$) ▹ Get initial noise
  3: $s \leftarrow$ InitializeState($G$, $n_0$) ▹ Initialize state
  4: $i \leftarrow 0$, $K \leftarrow 0$
  5: while $i < L$ do
  6:        $a \leftarrow \arg\max_{a' \in \mathcal{A}} Q_{\theta^*}(s, a')$ ▹ Greedy action selection
  7:       if $a = a_1$ then ▹ Immediate bootstrap
  8:              $ct_i \leftarrow$ Boot($ct_i$)
  9:              $K \leftarrow K + 1$
10:       end if
11:       $ct_{i+1} \leftarrow f_{i+1}(ct_i, W_{i+1})$ ▹ Execute homomorphic computation
12:       $i \leftarrow i + 1$
13:       if $a = a_2$ then ▹ Pre-scheduled bootstrap
14:              $ct_i \leftarrow$ Boot($ct_i$) ▹ Bootstrap the freshly computed layer output $ct_i$
15:              $K \leftarrow K + 1$
16:       end if
17:       $n_i \leftarrow$ GetNoise($ct_i$) ▹ Update noise information
18:       $s \leftarrow \mathcal{M}$.UpdateState($n_i$, $i$, $G$, $P$) ▹ Update state
19: end while
20: return $ct_L$, $K$
By organically integrating the noise perception capabilities from Section 3 with the adaptive decision-making capabilities from this section, we have constructed an end-to-end intelligent noise management system. This system provides a theoretically sound solution for the efficient homomorphic inference of deep neural networks.

Hyperparameter Configuration and Sensitivity Analysis

To facilitate reproducibility and provide a complete specification of the learning environment, we summarize the key hyperparameters used for training the DN-ANM agent in Table 3. As these parameters are not intrinsic to the theoretical model, their specific values were determined through an empirical grid search that optimized the stability of the Deep Q-Network. The discount factor is set to $\gamma = 0.99$ to ensure the agent prioritizes long-term safety over immediate computational gains, which is critical given the cumulative nature of noise growth. The reward weights were balanced ($w_{eff} = w_{sec} = 0.4$) to prevent policy collapse; we observed that unbalanced weights led to either reckless behavior or overly conservative strategies. The safety thresholds were calibrated against the error margin of the noise predictor: $\theta_{danger}$ was set to 0.95 rather than 1.0 to create a 5% buffer zone, accommodating potential worst-case deviations in the layer-wise noise prediction $\Delta E^{pred}$.
Sensitivity Analysis: The agent's performance is primarily sensitive to the balance between $w_{eff}$ and $w_{sec}$. We observed that setting $w_{eff} \gg w_{sec}$ encourages the agent to delay bootstrapping until the noise budget is critically low, increasing the risk of decryption failure near the danger threshold ($\theta_{danger}$). Conversely, dominant security weights ($w_{sec} > 0.6$) lead to conservative policies with redundant bootstrapping operations, approximating static fixed-interval strategies. The safety threshold $\theta_{safe}$ determines the "alert zone"; lowering $\theta_{safe}$ below 0.7 triggers earlier interventions but reduces budget utilization efficiency ($r_{util}$). The chosen configuration ($w_{eff} = w_{sec} = 0.4$) enables the agent to maximize budget usage while maintaining a 5% safety margin ($\theta_{danger} = 0.95$) against calculation errors.

5. Experimental Analysis

All experiments in this study were executed on a PC equipped with an Intel Ultra 7 258V 8-core processor and 32 GB of RAM. The operating system was Windows 11, and the experimental environment was based on Python 3.12.3.

5.1. Network Architectures and Evaluation Metrics

To comprehensively evaluate the performance and generalization capability of the proposed DN-ANM mechanism, we employed two representative Convolutional Neural Network (CNN) architectures: LeNet-5 and ResNet-20. These models differ significantly in depth and computational patterns, enabling a robust evaluation of efficiency across varying complexities. We uniformly adopted f ( x ) = x 2 as the activation function to maintain compatibility with HE constraints.
To ensure the experimental results are transferable to practical privacy-preserving deployments, we configured the CKKS scheme to target a standard 128-bit security level. We selected a polynomial ring dimension of $N = 16{,}384$, which, according to the HE security standard, supports a maximum modulus capacity sufficient for the depths of the tested networks while maintaining resistance against known lattice attacks (e.g., primal and dual hybrid attacks). Specifically, we utilized a coefficient modulus chain $Q = \{q_0, \dots, q_L\}$ with bit-sizes $[60, 30, \dots, 30, 60]$. The initial and special moduli are set to 60 bits to preserve precision during key switching, while the intermediate moduli are set to 30 bits to match the scaling factor $\Delta = 2^{30}$. This configuration ($\log Q \approx 210$ bits) adheres strictly to the security threshold for $N = 16{,}384$, balancing computational efficiency with 128-bit security guarantees.
We compare the proposed DN-ANM with a representative baseline method, PyCrCNN [11], which exemplifies a framework adopting a static noise management strategy. It adheres to a fixed bootstrapping policy, triggering bootstrapping at predefined layer intervals, and lacks the capability for dynamic awareness of the network’s real-time state. Consequently, our evaluation primarily focuses on the number of bootstrapping operations, model accuracy, and parameter robustness.

5.2. Experimental Results and Analysis

5.2.1. Noise Prediction Effectiveness

We evaluate the accuracy of the DN-ANM noise prediction under the CKKS scheme. The encryption parameters are: polynomial dimension $n = 16{,}384$, scaling factor $\Delta = 2^{30}$, and coefficient modulus sizes $[60, 30, 30, 30, 60]$ bits. The test network includes two convolutional layers (with 8 $3 \times 3$ filters and 16 $5 \times 5$ filters, respectively) and two fully-connected layers (with 64 and 10 output units). We compare DN-ANM against two methods: Worst-Case Static Analysis and Linear Extrapolation (LE). DN-ANM, the predictor for deep network-aware noise management, combines theoretical structure with empirical parameters to perform calibrated modeling of the noise increments of network layers such as convolution, fully-connected, and square (activation) layers; its objective is to estimate the noise consumption of each layer while maintaining trend alignment. Worst-Case Static Analysis is a static estimation guided by worst-case upper bounds, which typically overestimates the noise and serves as a conservative baseline. Linear Extrapolation uses the average increment of the first few layers to extrapolate linearly; it can track partial trends but is prone to deviation at non-linear layers and under parameter changes.
The actual measured noise (MN) of each layer is obtained by computing the difference between the decrypted value and the corresponding plaintext reference value, and is compared against the layer-wise predictions of each method. As can be seen in Table 4, the proposed DN-ANM outperforms the baseline methods in both noise estimation accuracy and trend alignment. Its predictions adapt well to changes in convolutional kernel size, channel count, and the input dimensions of fully-connected layers, while maintaining profile stability after non-linear activations.

5.2.2. Bootstrapping Efficiency Analysis

Bootstrapping operations constitute the primary computational bottleneck in HE inference; minimizing their frequency is critical for performance. Figure 2a contrasts the bootstrapping overhead of DN-ANM against the static PyCrCNN baseline across varying network scales. While PyCrCNN exhibits a rigid, linear increase in overhead (fixed at 17 and 22 operations for ResNet-20 Narrow and Deep, respectively), DN-ANM dynamically optimizes the schedule, reducing overhead by approximately 47% to 65%. Figure 2b details the epoch-wise fluctuations during RL agent training. Unlike the flat static schedule, DN-ANM actively explores the state space, converging to a minimal-cost strategy (approximately 6 operations for ResNet-20 Narrow) by the 7th epoch, validating the capability of the RL agent to identify globally optimal execution paths.

5.2.3. Model Training Accuracy

Excessive bootstrapping not only incurs latency but also accumulates numerical precision loss, effectively adding noise to the gradient updates. Figure 3 illustrates the test accuracy for ResNet-20 over 80 training epochs. DN-ANM demonstrates superior convergence characteristics, reaching 80% accuracy by epoch 25—ten epochs earlier than the baseline. Furthermore, the model stabilizes at a higher peak accuracy (95–96% vs. 91–92%). This 4–5% performance gap is directly attributable to the reduced frequency of bootstrapping operations, which preserves the fidelity of the ciphertext data and mitigates the cumulative error inherent in the refresh procedure.

5.2.4. Parameter Robustness

To validate that the effectiveness of the DN-ANM mechanism is not confined to a specific parameter configuration, we further evaluate its performance robustness across different combinations of critical HE hyperparameters. The initial noise budget and the security threshold are two core variables influencing HE performance. Figure 4 presents the optimal average reduction in bootstrapping operations achieved by DN-ANM compared to PyCrCNN under various initial budget settings. Here, optimal refers to the security threshold configuration adaptively identified by the DN-ANM agent for a given budget. The experimental data reveal that DN-ANM demonstrates strong robustness across a broad parameter space. Whether in budget-constrained ($Budget = 30$) or relatively ample ($Budget = 60$) scenarios, DN-ANM consistently delivers significant performance gains, achieving an average reduction of 14 to 17 bootstrapping operations. Particularly noteworthy is DN-ANM's adaptability in selecting the optimal threshold for different budgets: for instance, the optimal threshold is 5 when $Budget = 40$, whereas it is 7 when $Budget = 50$. This demonstrates that the DN-ANM agent can perceive changes in HE parameters and dynamically adjust its decision policy to match the current constraints, thereby finding a near-optimal execution path for any given parameter configuration.

5.2.5. Ablation Study

Table 5 presents the ablation study results across 21 experimental configurations, evaluated on the wide ResNet-20 architecture under varying HE constraints (noise budgets B ∈ {40, 50, 60} and thresholds θ_safe ∈ {5, 6, 8}). The key finding is that the ternary action space contributes primarily to security rather than efficiency. While the binary-action variant requires fewer raw re-encryptions (5.85 vs. 18.67), it does so at the cost of a 38.1-percentage-point drop in safe completion rate (76.2% → 38.1%). Unlike raw re-encryption counts, the Violations column quantifies how often the ciphertext noise budget falls below the security threshold; such events render the inference invalid, making configurations with non-zero violations impractical for production HE deployments. The foresight features improve the safe completion rate by 23.8 percentage points (52.4% → 76.2%) by anticipating future noise consumption and triggering preemptive re-encryptions. The completion reward mechanism maintains stable RL training convergence with minimal impact on final performance.
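The Violations and Safe Rate metrics above can be computed from per-inference traces of the remaining noise budget. The following Python sketch illustrates this bookkeeping; the function and variable names are illustrative assumptions, not the paper's implementation:

```python
def count_violations(budget_trace, threshold):
    """Count the steps at which the remaining noise budget falls below
    the security threshold, invalidating the encrypted inference."""
    return sum(1 for b in budget_trace if b < threshold)

def safe_completion_rate(traces, threshold):
    """Fraction of inference runs that finish without any violation."""
    safe = sum(1 for t in traces if count_violations(t, threshold) == 0)
    return safe / len(traces)
```

For example, with threshold θ_safe = 5, a budget trace [40, 22, 9, 3] incurs one violation, while [40, 22, 9, 6] completes safely.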

6. Conclusions

Existing HE schemes for Deep Neural Network (DNN) inference employ fixed noise strategies and thus fail to adapt to the heterogeneous computational characteristics across network layers. To address this limitation, this paper proposes DN-ANM. The core of the mechanism is a layer-aware noise consumption prediction model combined with a reinforcement-learning-driven optimization algorithm for bootstrapping decisions. This enables DN-ANM to allocate the noise budget dynamically and precisely, using a Deep Q-Network (DQN) to learn the globally optimal timing for bootstrapping and thereby minimize overhead. We conducted extensive experimental validation on two typical deep networks (LeNet-5 and ResNet-20). The results demonstrate that DN-ANM dynamically optimizes its management strategy based on the network's computational characteristics; compared to the fixed strategy of PyCrCNN, its adaptive nature yields significant efficiency improvements. Furthermore, by minimizing the bootstrapping operations that induce precision loss, DN-ANM preserves the fidelity of the ciphertext data to the greatest extent, accelerating the convergence of the learned bootstrapping policy and improving the stability of encrypted inference accuracy. In conclusion, DN-ANM overcomes the limitations of static strategies through layer-awareness and adaptive decision-making, offering a novel approach to balancing efficiency and precision in complex privacy-preserving computations. Future work will focus on deploying the mechanism from the simulated environment into a practical HE cryptography library to validate its real-world performance, and on using Graph Neural Networks (GNNs) to characterize the computational graphs of arbitrary DNNs, enabling the reinforcement learning agent to generalize to more complex architectures such as Transformers.
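The mechanism summarized above, in which the agent consumes the noise predictor's output and decides per layer whether to refresh the ciphertext, can be sketched as a simple loop. This is an illustrative simplification under assumed interfaces (predict_noise stands in for the layer-aware prediction model, agent for the trained DQN policy, and the budget constant is arbitrary); the actual agent selects among a ternary action space via a trained Q-network:

```python
INITIAL_BUDGET = 40.0  # assumed initial noise budget (illustrative units)

def run_encrypted_inference(layers, predict_noise, agent, theta_safe):
    """Walk the network layer by layer; before each layer, the agent decides
    from the current state whether to refresh (bootstrap) the ciphertext."""
    budget, bootstraps = INITIAL_BUDGET, 0
    for i, layer in enumerate(layers):
        # State: remaining budget, predicted next-layer cost, network depth progress.
        state = (budget, predict_noise(layer), i / len(layers))
        if agent(state) == "bootstrap":
            budget = INITIAL_BUDGET  # refreshing restores the full budget
            bootstraps += 1
        budget -= predict_noise(layer)
        if budget < theta_safe:
            raise RuntimeError("noise budget violation")
    return bootstraps
```

For example, a simple foresight heuristic standing in for the learned policy, agent = lambda s: "bootstrap" if s[0] - s[1] < 5 else "continue", completes a six-layer network with uniform per-layer cost 10 using a single refresh.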

Author Contributions

Conceptualization, C.Z.; Validation, F.B.; Data curation, Y.C.; Writing—original draft, C.Z.; Writing—review & editing, F.B., J.W., and Y.C.; Project administration, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China Fundamental Research Projects under Grant 2022YFC3320800.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, B.; Ding, M.; Shaham, S.; Rahayu, W.; Farokhi, F.; Lin, Z. When machine learning meets privacy: A survey and outlook. ACM Comput. Surv. (CSUR) 2021, 54, 1–36.
  2. Podschwadt, R.; Takabi, D.; Hu, P.; Rafiei, M.H.; Cai, Z. A survey of deep learning architectures for privacy-preserving machine learning with fully homomorphic encryption. IEEE Access 2022, 10, 117477–117500.
  3. Yuan, L.; Wang, Z.; Sun, L.; Yu, P.S.; Brinton, C.G. Decentralized federated learning: A survey and perspective. IEEE Internet Things J. 2024, 11, 34617–34638.
  4. Zhang, Q.; Xin, C.; Wu, H. Privacy-preserving deep learning based on multiparty secure computation: A survey. IEEE Internet Things J. 2021, 8, 10412–10429.
  5. Blanco-Justicia, A.; Sánchez, D.; Domingo-Ferrer, J.; Muralidhar, K. A critical review on the use (and misuse) of differential privacy in machine learning. ACM Comput. Surv. 2022, 55, 1–16.
  6. Marcolla, C.; Sucasas, V.; Manzano, M.; Bassoli, R.; Fitzek, F.H.; Aaraj, N. Survey on fully homomorphic encryption, theory, and applications. Proc. IEEE 2022, 110, 1572–1609.
  7. Falcetta, A.; Roveri, M. Privacy-preserving deep learning with homomorphic encryption: An introduction. IEEE Comput. Intell. Mag. 2022, 17, 14–25.
  8. Doan, T.V.T.; Messai, M.L.; Gavin, G.; Darmont, J. A survey on implementations of homomorphic encryption schemes. J. Supercomput. 2023, 79, 15098–15139.
  9. Zhang, Q.; Fu, Y.; Cui, J.; He, D.; Zhong, H. Efficient fine-grained data sharing based on proxy re-encryption in iiot. IEEE Trans. Dependable Secur. Comput. 2024, 21, 5797–5809.
  10. Kim, A.; Deryabin, M.; Eom, J.; Choi, R.; Lee, Y.; Ghang, W.; Yoo, D. General bootstrapping approach for RLWE-based homomorphic encryption. IEEE Trans. Comput. 2023, 73, 86–96.
  11. Disabato, S.; Falcetta, A.; Mongelluzzo, A.; Roveri, M. A privacy-preserving distributed architecture for deep-learning-as-a-service. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–8.
  12. Zhu, K.; Wang, Z.; Ding, D.; Dong, H.; Xu, C.Z. Secure state estimation for artificial neural networks with unknown-but-bounded noises: A homomorphic encryption scheme. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 6780–6791.
  13. Lou, Q.; Jiang, L. Hemet: A homomorphic-encryption-friendly privacy-preserving mobile neural network architecture. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 7102–7110.
  14. Schneider, T.; Wang, H.C.; Yalame, H. HE-SecureNet: An Efficient and Usable Framework for Model Training via Homomorphic Encryption. In Proceedings of the 24th Workshop on Privacy in the Electronic Society, Taipei, China, 13–17 October 2025.
  15. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
  16. Fan, J.; Vercauteren, F. Somewhat practical fully homomorphic encryption. In Cryptology ePrint Archive; Paper 2012/144; IACR: Bellevue, WA, USA, 2012.
  17. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security; Springer: Cham, Switzerland, 2017; pp. 409–437.
  18. Mouchet, C.; Troncoso-Pastoriza, J.; Bossuat, J.P.; Hubaux, J.P. Multiparty homomorphic encryption from ring-learning-with-errors. Proc. Priv. Enhancing Technol. 2021, 2021, 291–311.
  19. Lloret-Talavera, G.; Jorda, M.; Servat, H.; Boemer, F.; Chauhan, C.; Tomishima, S.; Shah, N.N.; Pena, A.J. Enabling homomorphically encrypted inference for large DNN models. IEEE Trans. Comput. 2021, 71, 1145–1155.
  20. Castro, F.; Impedovo, D.; Pirlo, G. An efficient and privacy-preserving federated learning approach based on homomorphic encryption. IEEE Open J. Comput. Soc. 2025, 6, 336–347.
  21. Mia, M.J.; Amini, M.H. QuanCrypt-FL: Quantized homomorphic encryption with pruning for secure federated learning. IEEE Trans. Artif. Intell. 2025. Early Access.
  22. Wu, L.; Wang, X.A.; Liu, J.; Su, Y.; Tu, Z.; Liu, W.; Lei, H.; Tang, D.; Cao, Y.; Zhang, J. Homomorphic Encryption for Machine Learning Applications with CKKS Algorithms: A Survey of Developments and Applications. Comput. Mater. Contin. 2025, 85, 89–119.
  23. Gentry, C. A Fully Homomorphic Encryption Scheme; Stanford University: Stanford, CA, USA, 2009; Volume 20.
  24. Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 201–210.
  25. Akram, A.; Khan, F.; Tahir, S.; Iqbal, A.; Shah, S.A.; Baz, A. Privacy preserving inference for deep neural networks: Optimizing homomorphic encryption for efficient and secure classification. IEEE Access 2024, 12, 15684–15695.
  26. Ha, J.; Kim, S.; Lee, B.; Lee, J.; Son, M. Rubato: Noisy ciphers for approximate homomorphic encryption. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Cham, Switzerland, 2022; pp. 581–610.
  27. Dathathri, R.; Kostova, B.; Saarikivi, O.; Dai, W.; Laine, K.; Musuvathi, M. EVA: An encrypted vector arithmetic language and compiler for efficient homomorphic computation. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, London, UK, 15–20 June 2020; pp. 546–561.
  28. Lee, Y.; Cheon, S.; Kim, D.; Lee, D.; Kim, H. ELASM: Error-Latency-Aware scale management for fully homomorphic encryption. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 4697–4714.
  29. Bengio, Y.; Goodfellow, I.; Courville, A.; Bengio, Y. Deep Learning; MIT Press: Cambridge, MA, USA, 2017; Volume 1.
  30. Miettinen, K. Nonlinear Multiobjective Optimization; Springer Science & Business Media: New York, NY, USA, 1999; Volume 12.
  31. Dathathri, R.; Saarikivi, O.; Chen, H.; Laine, K.; Lauter, K.; Maleki, S.; Musuvathi, M.; Mytkowicz, T. CHET: An optimizing compiler for fully-homomorphic neural-network inferencing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, Phoenix, AZ, USA, 22–26 June 2019; pp. 142–156.
Figure 1. Deep Network-aware Adaptive Noise-budget Management Model.
Figure 2. Comparative analysis of bootstrapping overhead. (a) Total bootstrapping counts across ResNet-20 variants, demonstrating a consistent 47–65% reduction by DN-ANM compared to the static PyCrCNN baseline. (b) RL agent learning dynamics over 20 RL training epochs, highlighting its ability to adaptively converge to a low-frequency bootstrapping policy.
Figure 3. Encrypted inference accuracy during RL agent training for ResNet-20.
Figure 4. Relationship Between Network Complexity and Bootstrapping Reduction Rate.
Table 1. Summary of Key Notations.

Symbol | Description
N | Deep Neural Network model
L | Total number of network layers
q | Ciphertext modulus
Δ | Scaling factor in CKKS scheme
B_init | Initial noise budget
B_max | Maximum noise threshold for decryption correctness
||e|| | Infinity norm of the noise error
S | State space in the Markov Decision Process
A | Action space {a_0, a_1, a_2}
s_t | State vector at step t
r_t | Reward received at step t
π* | Optimal bootstrapping policy
Q(s, a) | Action-value function
Table 2. Summary of Calibrated Noise Increment Models by Layer Type.

Layer Type | Key Parameters | Noise Increment Model (ΔE)
Convolutional | N_mult, w_max, α_i, β_i | α_i · N_mult · (Δ·w_max + 1) · ||e_in|| + β_i · N_mult
Square Activation | m_max | 2Δ·m_max·||e_in|| + Δ^(−1)·||e_in||² + e_relin
Poly Activation | m_max, d | (2Δ·m_max)^(log₂ d) · ||e_in|| + O(d · e_relin)
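The calibrated increment models summarized in Table 2 can be read directly as predictor functions. The Python sketch below illustrates this; the coefficient values ALPHA and BETA are hypothetical placeholders for the outputs of the offline per-layer calibration step:

```python
import math

# Hypothetical calibrated coefficients for one convolutional layer; real
# values come from the per-layer calibration procedure, not from the paper.
ALPHA, BETA = 1.2e-3, 4.0e-5

def conv_noise(n_mult, w_max, delta, e_in):
    """Convolutional layer: alpha*N_mult*(Delta*w_max + 1)*||e_in|| + beta*N_mult."""
    return ALPHA * n_mult * (delta * w_max + 1) * e_in + BETA * n_mult

def square_noise(m_max, delta, e_in, e_relin):
    """Square activation: 2*Delta*m_max*||e_in|| + ||e_in||^2 / Delta + e_relin."""
    return 2 * delta * m_max * e_in + e_in ** 2 / delta + e_relin

def poly_noise(m_max, d, delta, e_in, e_relin):
    """Degree-d polynomial activation, evaluated over log2(d) squaring levels:
    (2*Delta*m_max)^(log2 d) * ||e_in|| + O(d * e_relin)."""
    return (2 * delta * m_max) ** math.log2(d) * e_in + d * e_relin
```

Summing the applicable increment over a network's layer sequence yields the predicted cumulative noise consumption that drives the agent's bootstrapping decisions.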
Table 3. DN-ANM Hyperparameter Configuration (Empirically Optimized).

Category | Parameter | Value
Training Settings | Episodes (E_max) | 2000
 | Replay Buffer Size (N) | 10^5
 | Batch Size (B) | 32
 | Learning Rate (η) | 1 × 10^−4
 | Discount Factor (γ) | 0.99
Reward Weights | Efficiency (w_eff) | 0.4
 | Security (w_sec) | 0.4
 | Completion (w_comp) | 0.2
Thresholds | Safe Threshold (θ_safe) | 0.80
 | Danger Threshold (θ_danger) | 0.95
 | Lookahead Threshold (τ_next) | 0.15
Penalties | Bootstrapping Cost (α) | 0.1
 | Premature Boot. (β) | 0.2
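The weights, thresholds, and penalties listed in Table 3 combine into the agent's composite reward. The exact functional form is not reproduced in this excerpt, so the Python sketch below is an illustrative assumption of how such a reward can be assembled from those constants:

```python
# Constants taken from Table 3; the way they are combined below is an
# illustrative assumption, not the paper's exact reward definition.
W_EFF, W_SEC, W_COMP = 0.4, 0.4, 0.2      # reward weights
ALPHA_BOOT, BETA_PREMATURE = 0.1, 0.2     # penalties
THETA_SAFE, THETA_DANGER = 0.80, 0.95     # noise-consumption thresholds

def reward(consumed_frac, did_bootstrap, finished):
    """consumed_frac: fraction of the noise budget already consumed (0..1)."""
    r_eff = -ALPHA_BOOT if did_bootstrap else 0.0   # cost of each bootstrap
    if did_bootstrap and consumed_frac < THETA_SAFE:
        r_eff -= BETA_PREMATURE                     # premature refresh penalty
    if consumed_frac >= THETA_DANGER:
        r_sec = -1.0                                # entered the danger zone
    elif consumed_frac <= THETA_SAFE:
        r_sec = 0.1                                 # comfortably within budget
    else:
        r_sec = 0.0
    r_comp = 1.0 if finished else 0.0               # completion bonus
    return W_EFF * r_eff + W_SEC * r_sec + W_COMP * r_comp
```

Under this shaping, drifting into the danger zone dominates the signal, which is consistent with the ablation finding that safety, not raw re-encryption count, is the binding constraint.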
Table 4. Comparison of the performance of layer noise measurement and prediction methods.

Type | Params | MN (×10^7) | DN-ANM (×10^7) | Worst-Case | LE (×10^7)
Conv2d | IC = 1, OC = 8 | 3.7559 | 7.0809 | 3.53 × 10^9 | 9.1703
Conv2d | IC = 8, OC = 16 | 4.2449 | 7.6795 | 8.21 × 10^10 | 1.1921
Square | - | 8.1627 | 7.6811 | 8.21 × 10^10 | 2.1458
Linear | In = 64, Out = 64 | 3.7899 | 7.8763 | 1.06 × 10^11 | 3.1472
Linear | In = 64, Out = 10 | 3.5029 | 8.0765 | 1.29 × 10^11 | 4.6159
Table 5. Performance Comparison of Different Configurations (Ablation Study).

Configuration | Avg Reenc. | Violations | Safe Rate (%) | Adjusted Reenc. *
Full DN-ANM | 18.67 | 0 | 76.2 | 19.08
No Foresight | 18.97 | 0.16 | 52.4 | 19.75
Binary Action | 5.85 | 0.24 | 38.1 | 7.06
No Completion Reward | 18.40 | 0 | 76.2 | 18.82

* The number of re-encryption operations after introducing penalties for violations.

Share and Cite

MDPI and ACS Style

Zhang, C.; Bai, F.; Wan, J.; Chen, Y. A Reinforcement Learning-Based Optimization Strategy for Noise Budget Management in Homomorphically Encrypted Deep Network Inference. Electronics 2026, 15, 275. https://doi.org/10.3390/electronics15020275

