Article

Applying Reinforcement Learning to Protect Deep Neural Networks from Soft Errors

1
Department of Engineering Design, KTH Royal Institute of Technology, 10044 Stockholm, Sweden
2
Thrust of Microelectronics, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511458, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(13), 4196; https://doi.org/10.3390/s25134196
Submission received: 26 May 2025 / Revised: 1 July 2025 / Accepted: 2 July 2025 / Published: 5 July 2025
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

With the advance of Artificial Intelligence, Deep Neural Networks are widely employed in various sensor-based systems to analyze operational conditions. However, due to the inherently nondeterministic and probabilistic nature of neural networks, assuring overall system performance can become a challenging task. In particular, soft errors could weaken the robustness of such networks and thereby threaten the system’s safety. Conventional fault-tolerant techniques based on hardware redundancy and software correction mechanisms often involve a tricky trade-off between effectiveness and scalability in addressing the extensive design space of Deep Neural Networks. In this work, we propose a Reinforcement-Learning-based approach to protect neural networks from soft errors by addressing and identifying the vulnerable bits. The approach consists of three key steps: (1) analyzing layer-wise resiliency of Deep Neural Networks by a fault injection simulation; (2) generating layer-wise bit masks by a Reinforcement-Learning-based agent to reveal the vulnerable bits and to protect against them; and (3) synthesizing and deploying bit masks across the network with guaranteed operation efficiency by adopting transfer learning. As a case study, we select several existing neural networks to test and validate the design. The performance of the proposed approach is compared with that of baseline methods, including Hamming code and Most Significant Bits protection schemes. Specifically, we observe that the proposed method achieves a performance gain of at least 10% to 15% over the baseline methods on the test network. The results indicate that the proposed method protects the vulnerable bits more dynamically and efficiently than the baseline methods.

1. Introduction

With the advancement of Artificial Intelligence (AI), Deep Neural Networks (DNNs) are widely employed in sensor-based systems across various fields [1]. In these systems, DNNs are usually used to perceive complex operational conditions by processing unstructured high-dimensional data. However, the extensive parameters within a DNN lead to a computational process that is difficult to interpret and explain, posing challenges for the quality management of DNNs. In particular, the ability of DNNs to withstand potential functional faults is considered an important factor in safety-critical systems, such as Automated Driving Systems (ADSs) [2] and medical equipment [3]. As typical functional faults, soft errors caused by radiation, aging, device variation, and temperature [4] can affect a small proportion of the data within DNNs [5]. Consequently, they weaken robustness and threaten safety by leading to unexpected behaviors and system failure. Related work [4,5,6] shows that soft errors such as bit-flips could lead a DNN to misclassify a critical object (e.g., a truck) as a trivial one (e.g., a bird). Based on the reasoning of related system hazards, industrial standards (e.g., IEC 61508 [7] and ISO 26262 [8]) advise systematic approaches to guarantee functional safety by fault avoidance, fault tolerance, and fault detection [9]. In addition, soft errors, particularly bit-flips, can also act as a form of attack that threatens the security of distributed intelligent systems, such as the Internet of Vehicles (IoV) and Unmanned Aerial Vehicle (UAV) swarms [10]. For example, sensor data integrated into IoV systems is often used by DNNs to determine vehicle positions. An attacker could exploit this by flipping specific bits to falsify the input data [11]. Therefore, enhancing the security of these systems requires a thorough understanding of, and protection against, the impact of soft errors on DNNs.
Conventional fault-tolerant techniques [12], such as replicating critical components via Triple Module Redundancy (TMR) [13] or inspecting vital outputs via symptom of neurons within DNN [14], pose challenges in efficiently and effectively achieving fault tolerance. Such challenges often arise from the difficulty in effectively addressing and identifying vulnerabilities within DNNs. For example, while recent studies suggest that certain layers of DNNs are more susceptible [2,15], such findings may not provide sufficient granularity for redundancy design or checksum validation in certain safety-critical applications, particularly in embedded systems with AI techniques, where physical sizes and economic costs are limiting factors [16]. Current studies propose contributions from various strategies [4,17], including optimizing DNN architectures, algorithms, microarchitecture, and hardware implementation to address these issues. These approaches enable DNN protection and reliability analysis across the functional and technical designs, as well as the Validation and Verification (V&V) process [18].
However, there remains a need for increased effort in protecting DNNs from soft errors by dynamically and efficiently identifying vulnerable bits. Meanwhile, the identification of vulnerable bits can play a critical role during functional design exploration within safety-critical systems (e.g., ADS) to support the design of necessary safety mechanisms that provide the protection [18,19,20]. Therefore, we propose a Reinforcement Learning (RL)-based method to protect DNNs from soft errors. Regarding soft errors impacting DNN performance, the RL-based agent dynamically selects and protects vulnerable bits by generating bit masks. To improve the efficiency of the proposed method, we adopt transfer learning across DNNs by utilizing the layer-wise resiliency acquired from fault injection simulation as the prior knowledge for reducing the training time. As a case study, we select different DNNs to test and validate the proposed approach. Compared to several baseline methods, the results indicate that our approach decreases the cost of redundancy and improves the protection efficiency. The contributions of this article are summarized as follows:
  • Proposing a methodological framework integrating fault injection, layer-wise resiliency analysis, and a learning-based agent to reveal vulnerable bits and protect DNNs against soft errors.
  • Designing a layer-wise RL-based agent to identify vulnerable bits in each layer. This layer-wise learning agent efficiently and dynamically generates bit masks to protect DNNs from soft errors.
  • Adopting transfer learning to improve the training efficiency and flexibility in selecting protected bits by using layer-wise resiliency as prior knowledge.
The rest of this paper is organized as follows: Section 2 presents related work on fault injections within DNNs and the protection of DNNs from soft errors. Section 3 describes the methodology and workflow of the proposed framework, which consists of fault injection, resiliency analysis, the configuration of the DRL-based agent, and the design of transfer learning. In Section 4, we use a customized MLP and ResNet-18 with MNIST and CIFAR-10 datasets to test and verify the proposed approach. The conclusion and future work are detailed in Section 5.

2. Related Work

In this section, we review recent studies on fault injection techniques and fault models in the context of DNNs, elaborating on how fault injection techniques have been adapted. Next, we present approaches aimed at improving resiliency and protecting DNNs from soft errors and discuss their limitations. Finally, we present the background of Reinforcement Learning and its current adoption within DNNs.

2.1. Injecting Faults Within DNNs

Fault injection is a common approach to revealing the system-wide impacts of specific faults for requirements engineering, system and component design, and verification and validation [21]. Based on the characteristics of faults of concern, fault injection techniques for DNNs can be categorized into (1) functional fault injection, (2) technical fault injection, and (3) physical fault injection.
Functional fault injection involves the use of Platform-independent Models (PiMs) to simulate and analyze the functional impacts of faults [19,22]. This technique is typically applied during early design phases to reveal the sensitivity of DNN applications [17,23] to faults, commonly represented as bit-flips, stuck-at-0/1, and Gaussian noise with certain prior distributions. The injection affects either the structure or the parameters of specific DNN models [2,5,15,23,24,25], with the scope of injection ranging from the network layers to specific neurons and parameters. By enabling an analysis of fault behaviors early in system development, functional fault injection constitutes a basis for identifying additional requirements on DNN robustness, resilience, and protection. While injecting functional faults with high fidelity into applications can be very challenging, current studies have shown that a systematic approach to fault modeling and fault injection can still yield effectiveness similar to that of more detailed hardware simulation [26]. However, many current approaches [2,5,15,24,25] treat the injections of faults as individual and uncorrelated tasks, highlighting a significant area for improvement.
Technical fault injection is a technique used to assess the robustness and reliability of DNN using Platform-specific Models (PsMs) or detailed hardware-dependent solutions. It is focused on the fault behaviors of DNNs when implemented with hardware resources like CPU and AI accelerators [19,27,28,29]. The injections can be applied to the full DNN systems by targeting the microarchitectures, including Instruction Set Architectures (ISAs) and Register Transfer Level (RTL) properties, commonly through advanced virtual simulations (e.g., gem5 [30]). The aim is to effectively capture the actual system’s responses to hardware faults [26,27]. This type of injection is important for the detailed technical design of DNN systems, where the specification of hardware performance, robustness, and resilience are of particular importance.
Physical fault injection relies on gaining physical access or close proximity to the target DNN systems in their actual operational environments to intentionally introduce physical faults [22]. As an example of physical fault injection, prior studies [6,27] expose the hardware (e.g., CPU and General-Purpose GPU) to neutron beams. The results show that while the data corruption rate remains similar, physical fault injection exhibits a significantly higher system failure rate than the other fault injection methods. Unlike synthetic simulations, this type of fault injection (e.g., the beam test) often requires high-cost equipment, such as radiation sources, to induce data corruption in memory. Moreover, beam tests typically offer a statistical perspective on the impact of fault models across the devices, such as Failure In Time (FIT), which does not allow faults to be measured at the level of individual memory cells or mapped to specific parameters of DNNs [31]. Therefore, this method is tailored for reliability analysis during the final technical development phase, such as with physical microchips or products.

2.2. Protecting DNNs from Soft Errors

Robustness has been considered as a key concept in the design of DNN protection. Previous research [4,5,6,32,33] refers to the robustness of DNNs as their capability to resist faults and maintain performance under various disturbances. The corresponding protection strategies for DNN systems can be broadly categorized as follows [33]:
Passive strategies are characterized by a system that works without interfering with the faults. Such approaches usually design redundant modules (e.g., triple module redundancy) to filter the faulty values by voting for the majority value [34,35]. Considering the extensive parameters of DNNs, such a redundancy design could significantly increase the hardware dimensions. Some research selects critical neurons with redundant modules for fault tolerance [25,35]. However, such approaches can become inefficient due to the challenges of identifying and addressing critical neurons in large-scale DNNs. Moreover, it is normally a challenging task to protect all potential critical neurons from various errors [4,5]. Adding noise into the training data enhances the robustness of DNNs by avoiding overfitting [36]. Building on this training-based enhancement, recent studies also address the protection of DNNs against soft errors by introducing fault injections during training to improve robustness [37]. For example, a soft error library is developed to inject into the training dataset in [38]. However, these optimization-based approaches typically assume that faults occur in the input data rather than within the DNNs themselves (e.g., weights or biases of activation functions). This assumption may underestimate the impact of soft errors, as DNNs inherently have the ability to filter certain unexpected inputs (e.g., values with extremely large magnitudes) through their activation functions. As an improvement, some works add regularization terms in loss functions to resist potential soft errors inside the DNNs [39]. These modifications incorporating regularization terms only provide a generic improvement of soft error resilience in DNNs. Moreover, such regularization may demand extensive iterations for the training process to converge.
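The voting step behind module redundancy can be illustrated with a small sketch. The example below is only an illustration and is not taken from the cited works: it assumes three stored copies of 32-bit weight words and applies a bitwise majority vote. The function name `tmr_vote` and the sample values are hypothetical.

```python
import numpy as np

def tmr_vote(copy_a, copy_b, copy_c):
    """Bitwise majority vote over three redundant copies of uint32 words.

    A bit in the result is 1 only if it is 1 in at least two copies.
    """
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)

# Hypothetical weights, reinterpreted as raw 32-bit words.
golden = np.array([1.5, -0.25, 3.0], dtype=np.float32).view(np.uint32)

# One copy suffers a single bit-flip (bit 30 of the second word).
corrupted = golden.copy()
corrupted[1] ^= np.uint32(1 << 30)

voted = tmr_vote(golden, corrupted, golden.copy())
print(np.array_equal(voted, golden))
```

The vote masks the flipped bit, but every parameter is stored three times, which is the hardware cost the paragraph above refers to.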
Active strategies refer to the introduction of a specific mechanism that dynamically manages resources to mitigate the impacts of faults by addressing their locations [40]. One key effort is to detect errors in the DNN architectures. Some methods employ and optimize the checksum mechanisms for detecting errors [41] in the DNN accelerator. However, such an optimization requires additional computational overhead to implement the design [42]. Other research proposes to detect errors based on data-driven approaches (e.g., machine learning techniques [43,44]). Such approaches usually need a complete comprehension of fault behaviors, requiring massive training data.
Another effort in supporting active fault tolerance is to protect computational units within DNNs. Specifically, these efforts can be categorized as follows: (1) Layer-wise protection refers to the analysis of layer-wise features from DNNs. For example, a median feature selection scheme [45] is proposed to filter the impact of bit errors from each layer of the DNNs. Similarly, some work [2,46] reveals resiliency and corrects errors by analyzing data distributions extracted from the intermediate layers of the DNNs. Although this approach improves the robustness by synthesizing layer-wise median features across the DNN, such a fine-grained fault-tolerant module needs sophisticated data analysis techniques to extract the features from each layer, which limits its application to large-scale DNNs. (2) Neuron-level protection refers to techniques that make DNNs recoverable from errors with dynamic constraints on the output value [24,47]. For example, an activation function output larger than a predefined threshold could dramatically affect the performance of the DNNs [24]. Therefore, that work proposes a clipped function to constrain the maximum output. Although this approach mitigates the faulty behaviors by suppressing the output values of neurons, it cannot guarantee that the constrained value is corrected. (3) Bit-level protection is a fundamental solution for protecting DNNs from soft errors as it directly and explicitly safeguards the basic units within a DNN. In particular, bit-level protection is intended to cope with Silent Data Corruption (SDC) [5], which usually refers to corruption in data caused by flipped bits without any explicit indication [4,17]. These bits are identified as vulnerable bits, which require protection. Several approaches [48,49] based on the Error Correction Code (ECC) are used to protect the DNN from soft errors.
For example, RL-based approaches in [49] exhibit adaptability to select different modes by analyzing the power consumption and error-correcting schemes. However, these schemes are more efficient to protect bit-level data in conventional applications, such as bit streams in the communication systems. Some research studies [50,51] propose RL-based methods to address vulnerable bits in the context of DNN hardware. For example, an RL-based selective scheme [50] is proposed to protect DNNs by identifying critical bits. The agent protects the DNNs by masking the vulnerable bits across the network. Nevertheless, the extensive amount of DNN parameters could lead to the agent spending an unaffordable amount of time performing satisfactorily. To solve this issue, a compositional mechanism is used in [51] by designing global and local agents to boost the training of the RL. For instance, to protect a ResNet-18 [52] with more than 15 layers, the methods need to design local agents for each layer. Each of these local agents obtains independent local reward functions [50,51] to analyze the DNN performance. Meanwhile, a global agent with an additional reward function is used to evaluate the protection efficiency of these layer-wise agents. Such a design increases the complexity to design the reward functions and the computational costs to converge the training of the agents.

2.3. Adopting RL to Analyze Behaviors of DNNs

Reinforcement Learning (RL) is an interactive optimization process in which an agent makes decisions and receives feedback in the form of rewards from the environment. To ensure that an agent can make appropriate decisions in varying situations, Reinforcement Learning (RL) typically relies on a quality function (Q-function) or a value function (V-function) to quantify the impact of the agent’s decisions—such as specific states or state–action pairs—on the environment. By iteratively approximating these functions, an RL-based agent can optimize its decision-making process. As environments become increasingly complex, the evaluation of the decisions is often implemented using various neural network architectures. For example, Deep Q-Learning (DQN) employs neural networks—specifically policy and target networks—to approximate Q-values in environments with discrete and low-dimensional action spaces [53]. In contrast, the Deep Deterministic Policy Gradient (DDPG) uses actor and critic networks to optimize the agent’s actions in continuous and high-dimensional action spaces [54]. Due to their learning-enabled nature, DRL approaches are widely used for optimization-related tasks such as communication and networking in Internet of Things [55] and anomaly detection [56]. In particular, given the extensive state spaces associated with neural networks, DRL-based methods offer a promising approach for understanding and analyzing their behaviors and performance by formulating them as optimization problems. For example, DRL-based agents are widely employed to optimize network pruning by identifying and removing redundant components within a DNN. In such cases, the agent’s actions involve pruning specific parts of the target DNN—which serves as the environment to be compressed—while the states represent the DNN’s performance in response to these actions. One example is the use of a DRL-based agent for layer-wise pruning using the DDPG algorithm [57]. 
Building on this idea, a multi-agent design is proposed to perform channel-wise pruning within the DNN [58]. Although most existing approaches leverage DRL-based methods to optimize DNN architectures by targeting coarse-grained structures (e.g., layers or channels), only a few focus on using DRL to address vulnerabilities at a fine-grained level, such as individual parameters.
As mentioned above, this work proposes an RL-based method to protect DNNs from soft errors by addressing and identifying vulnerable bits during the functional development phase. Building on the reviewed fault injection techniques, we propose a functional fault injection service featuring a simulated memory hierarchy model. With the adoption of layer-wise resiliency analysis, the DQN-based agent protects vulnerable bits from soft errors by generating bit masks within a specific layer. To enhance the efficiency of protection and reduce the time cost of deploying the agent across DNNs, we utilize resiliency analysis as prior knowledge to adopt transfer learning.

3. Methodology

This work proposes a framework to protect DNNs with an RL-based agent that helps to reveal the vulnerable bits due to soft errors. As shown in Figure 1, we utilize a fault injection service to assess the impacts of soft errors on the DNN (Task I) and generate a layer-wise resiliency analysis (Task II). Based on the resiliency analysis in Task II, a layer-wise RL-based agent is trained to protect the vulnerable bits from soft errors by generating bit masks (Task III). To promote the training efficiency and employ the RL-based agent across DNN, we adopt transfer learning to utilize the acquired knowledge from a trained agent and resiliency analysis (Task IV).

3.1. Task I: Simulating Faults Within DNNs

As previously mentioned, current functional fault injection methods typically manipulate parameters without considering memory resource constraints [2,5,15,24,25]. As a result, faults are usually injected into parameters as independent and identically distributed (i.i.d.) events, disregarding the occurrence of faults in the context of memory allocation. To address this issue, the proposed framework introduces a simulated memory model for fault injection within the DNN parameters. The design of this simulated model follows the memory hierarchy of DNN accelerators. Specifically, a DNN accelerator usually consists of Processing Elements (PEs) to process the data and a memory hierarchy to store the parameters of the PEs [4,5]. Compared with potential faults occurring in the PEs, previous research [4,5,6,32] shows that fetching data (e.g., parameters of DNNs) from the memory hierarchy could be more vulnerable due to the highly frequent operations. Therefore, we mainly focus on faults occurring in the parameters of DNNs within the memory. To inject these faults, we first present a schematic of a typical DNN accelerator, as shown in Figure 2.
The memory hierarchy usually organizes extensive memory cells by words and entries so that each operation fetches a batch of parameters from specific entries. To characterize the operations from different entries, we allocate the parameters including weights and bias to this memory hierarchy. Such a hierarchy is analogous to the HBM2 memory, which is a common type of Dynamic Random Access Memory (DRAM) used in DNN accelerators [6,26]. To be specific, the parameters of the target DNN are converted to a generic machine-readable format (e.g., floating bytes) and then stored in the simulated memory representing the memory hierarchy. To design this simulated model for supporting the faults to be injected, we configure the memory hierarchy, which is similar to the memory in [6]. This memory contains multiple entries to store batches of parameters as a set of continuous words. These words are strings of bytes converted from the parameters of DNNs. A batch of parameters from the same layer is stored in the same entry in sequence until all of the parameters are loaded or these entries are fully stacked.
To define soft errors occurring in these units, we assign each entry $i$ an independent error distribution $r_{e_i}$. Meanwhile, we use the Bit Error Rate (BER) $r_b$ to indicate the number of bit errors per entry. The fault probability $p_e$ of the simulated memory storing the parameters of DNNs can then be formulated as follows:
$$p_e = \sum_{i=1}^{N_e} r_{e_i} \cdot r_b \tag{1}$$
where $N_e$ refers to the number of entries, determined by the number of parameters to be stored in the simulated model and the size specification for each entry.
The development flow based on the proposed fault injection service, shown in Figure 3, is as follows: (1) creating a simulated model by configuring the size of entries and words; (2) extracting the weights from pre-trained DNN models; (3) allocating the weights sequentially based on the constraints of the simulated model; (4) injecting the soft errors by assigning the fault parameters in Equation (1). Specifically, the fault injection simulation is configured by injecting bit-flips into the weight values of the DNNs, with a fault probability $p_e$ affecting different memory entries. Additionally, compared to most current work, which randomly injects faults across the parameters, the proposed fault injection along with the surrogate model enables the allocation of fault occurrences with respect to the memory index of the DNN’s weights.
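The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the fault injection service itself: weights are assumed to be 32-bit floats laid out sequentially, each entry holds a fixed number of words, and every bit inside entry $i$ is assumed to flip independently with probability $r_{e_i} \cdot r_b$. The names `inject_bit_flips`, `entry_words`, and `entry_error_rates` are hypothetical.

```python
import numpy as np

def inject_bit_flips(weights, entry_words, entry_error_rates, bit_error_rate, rng):
    """Flip bits in float32 weights stored sequentially in memory entries.

    weights:           array of layer parameters, in storage order
    entry_words:       number of 32-bit words per entry (assumed layout)
    entry_error_rates: one error rate per entry (r_e_i)
    bit_error_rate:    r_b, scales the flip probability inside each entry
    """
    # Reinterpret the float32 values as raw 32-bit words (on a copy).
    bits = np.asarray(weights, dtype=np.float32).ravel().view(np.uint32).copy()
    n_entries = int(np.ceil(bits.size / entry_words))
    if len(entry_error_rates) != n_entries:
        raise ValueError("need exactly one error rate per memory entry")

    bit_positions = np.arange(32, dtype=np.uint32)
    for i in range(n_entries):
        words = bits[i * entry_words:(i + 1) * entry_words]  # view into `bits`
        # Assumption: per-bit flip probability in entry i is r_e_i * r_b.
        p_flip = entry_error_rates[i] * bit_error_rate
        flips = rng.random((words.size, 32)) < p_flip
        # Pack the 32 flip flags of each word into a single XOR mask.
        xor_masks = (flips.astype(np.uint32) << bit_positions).sum(axis=1, dtype=np.uint32)
        words ^= xor_masks
    return bits.view(np.float32)

rng = np.random.default_rng(0)
layer_weights = np.arange(8, dtype=np.float32)
# Two entries of four words each; only the second entry is exposed to faults.
faulty = inject_bit_flips(layer_weights, 4, [0.0, 1.0], 1.0, rng)
```

Because the error rate is attached to the entry rather than to individual parameters, faults cluster by memory location, which is the property that distinguishes this setup from purely random injection across parameters.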

3.2. Task II: Analyzing Layer-Wise Resiliency of DNNs

Once these faults are injected, we analyze their impacts on the DNNs using a metric chosen according to the application. For example, the deviation of TOP-1 accuracy could be a metric that reflects the resiliency of supervised learning (e.g., image classification), while the F1 score could be used for evaluating semi-supervised and unsupervised learning (e.g., prediction and regression). To deal with the inherent probabilistic nature of the DNNs, we analyze the resiliency by collecting a set of performance data and their deviations, which are formulated in Equation (2).
$$\overline{P_t}^{p_e} = \frac{\sum_{j=1}^{N_k} P_j}{N_k}, \qquad \sigma_t = \sqrt{\frac{\sum_{j=1}^{N_k} \left(P_j - \overline{P_t}^{p_e}\right)^2}{N_k}} \tag{2}$$
where $P_j$ refers to the performance measured in the $j$-th execution after injecting faults with probability $p_e$ into the DNNs. $N_k$ refers to the total number of fault injection simulation cases for collecting the performance data after injecting faults at timestamp $t$. $\sigma_t$ refers to the deviation caused by the injected faults.
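As a rough illustration of this layer-wise analysis, the sketch below repeats a fault injection run $N_k$ times and summarizes the results by their mean and deviation. It assumes that $\sigma_t$ is the population standard deviation (division by $N_k$) and that a caller-supplied function returns the measured performance of one faulty run. The names `layer_resiliency` and `evaluate_with_faults` are hypothetical.

```python
import numpy as np

def layer_resiliency(evaluate_with_faults, n_runs):
    """Return (mean performance, deviation) over n_runs fault injection runs.

    evaluate_with_faults(j) is assumed to inject faults for run j and return
    a scalar performance figure such as TOP-1 accuracy.
    """
    perf = np.array([evaluate_with_faults(j) for j in range(n_runs)], dtype=float)
    mean_perf = perf.mean()
    # Population form: divide by N_k, not N_k - 1 (an assumption).
    sigma = np.sqrt(((perf - mean_perf) ** 2).mean())
    return mean_perf, sigma

# Hypothetical accuracies from four faulty runs of one layer.
accuracies = [0.9, 0.8, 0.7, 0.6]
mean_perf, sigma = layer_resiliency(lambda j: accuracies[j], len(accuracies))
```

A layer with a low mean or a large deviation under the same fault probability would be treated as less resilient.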

3.3. Task III: Generating Bit Masks by a RL-Based Agent

A conventional error protection and correction scheme (e.g., Hamming Code) usually protects a few arbitrary bits from soft errors [59]. Although such a protection scheme has been widely used, it is insufficient for soft errors with multiple vulnerable bits. Moreover, such protection schemes usually protect arbitrary bits without considering their significance in the context of DNN operations. However, given the extensive parameters of the DNNs, efficiently addressing the vulnerable bits across the DNNs is challenging for the following reasons: (1) Soft errors can have varying impacts across the DNNs; specifically, errors occurring in different layers can lead to different levels of performance degradation and result in varying numbers of vulnerable bits. (2) With more neurons in the layers, analyzing and addressing vulnerable bits becomes inefficient due to the need to inject faults across a proportion of neurons. (3) With deeper networks, it is time-consuming to identify vulnerable bits by iterating layer by layer. To address these challenges, an RL-based solution is developed to dynamically and efficiently identify the vulnerable bits through interaction between the agent and the environment. The backbone of the interaction is a Markov Decision Process (MDP) that contains the following elements [60,61]: $(S, A, T, r, \gamma)$. $S$ refers to the state space, and $A$ refers to a set of actions. $T(s_{t+1} \mid s_t, a_t)$ describes the transition dynamics from the current state $s_t$ to the next state $s_{t+1}$ at timestamp $t$ when taking an action $a_t$. $r$ refers to the immediate reward resulting from a policy $\pi(s, a)$ for the state–action pair $(s_t, a_t)$. A discount factor $\gamma \in [0, 1]$ is used to offset the impact of the immediate reward $r$. To evaluate the long-term reward $R$ of the current state–action pair, we use the following function [61]:
$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[R \mid s_t, a_t, \pi(s_t, a_t)\right] \tag{3}$$
The configuration of actions and states considers the following aspects to facilitate the training of the RL agent: (1) It is impractical to implement the protection scheme with small memory granularity (e.g., each weight assigned to an individual bit mask) in the memory hierarchy. Therefore, the protection scheme should be flexible enough to be deployed in the context of DNNs. (2) The protection scheme should leverage the resiliency in terms of layers to decrease the protection cost (e.g., the number of protected bits). Thus, we protect the parameters (e.g., weights) by specifying a layer-wise bit mask. To be specific, we propose an RL-based agent that generates bit masks to protect specific bits in the $l$-th layer. Such a bit mask can be formulated as follows:
$$m^l = (m_0^l, m_1^l, \ldots, m_{w-1}^l) \in \{0, 1\}^w \tag{4}$$
where $w$ refers to the number of bits required to store a parameter from the $l$-th layer, and $m_i^l = 1$ indicates that bit $i$ of every weight in layer $l$ is protected. By selecting the protected bits of parameters (e.g., weights in a layer of the DNNs), the state space $S$ contains bit masks with different combinations, which dynamically determine the efficiency of the protection. As an example, Figure 4 shows that the weights of the DNNs are stored in different memory locations. By assigning the bit masks according to the resiliency, the parameters of a layer can be protected from soft errors efficiently.
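To make the effect of a layer-wise bit mask concrete, the following sketch applies one mask to all weights of a layer. It is a simplified model under explicit assumptions: weights are 32-bit floats, bit 0 is the least significant bit, and a protected bit always keeps its fault-free value (ideal protection). The function name `apply_bit_mask` and the sample values are hypothetical.

```python
import numpy as np

def apply_bit_mask(faulty, golden, mask_bits):
    """Restore protected bit positions of every weight in a layer.

    mask_bits: sequence of w = 32 values in {0, 1}; mask_bits[i] == 1 means
               bit i (bit 0 = least significant, an assumption) is protected.
    """
    mask = np.uint32(sum(int(b) << i for i, b in enumerate(mask_bits)))
    f = np.asarray(faulty, dtype=np.float32).view(np.uint32)
    g = np.asarray(golden, dtype=np.float32).view(np.uint32)
    # Unprotected bits keep the faulty value; protected bits take the golden one.
    repaired = (f & ~mask) | (g & mask)
    return repaired.view(np.float32)

golden = np.array([1.0, 0.5], dtype=np.float32)
faulty = np.array([-1.0, 0.5], dtype=np.float32)   # sign bit of the first weight flipped

sign_only = [0] * 32
sign_only[31] = 1                                   # protect only the sign bit
protected = apply_bit_mask(faulty, golden, sign_only)
unprotected = apply_bit_mask(faulty, golden, [0] * 32)
```

One mask per layer keeps the bookkeeping at $w$ bits regardless of how many weights the layer holds, which matches the granularity argument made above.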
As shown in Figure 1, the actions of the RL-based agent involve marking the bit positions within the masks $m^l$ to explore and exploit the vulnerable bits. We define the action $a^l$ as follows:
$$a^l = (a_0^l, a_1^l, \ldots, a_{w-1}^l) \tag{5}$$
where $a_i^l \in \{0, 1\}$ refers to the action for configuring the bit mask to protect bit position $i$ at layer $l$. The primary reason for selecting binary representations of actions is to reduce the size of the state and action spaces. With this binary representation, the size of the action space is $|A| = 2^w$, which is smaller than when the agent samples actions from a continuous distribution (e.g., a Gaussian distribution). As an example of generating bit masks via current actions, Figure 4 presents how the initial bit mask $m_0^l = (0)$ is modified by the action $a_t^l$ at step $t$, where $a_1^l$ and $a_5^l$ are set to 1 to produce $m_t^l$. In addition, to encourage the agent to explore and exploit vulnerable bits in the next step, it receives rewards based on the policy $\pi(m_t^l, a_t^l)$.
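The example with bit positions 1 and 5 can be written out as a short sketch. The text does not state how an action is combined with the current mask, so the code assumes that an action marks additional bits through a bitwise OR; a toggle (XOR) would be an equally plausible reading. The function name `apply_action` and the width $w = 8$ are hypothetical.

```python
def apply_action(mask, action):
    """Mark the bit positions selected by a binary action in the current mask.

    Assumption: marking is cumulative (bitwise OR), so bits that are already
    protected stay protected.
    """
    if len(mask) != len(action):
        raise ValueError("mask and action must have the same width w")
    return tuple(m | a for m, a in zip(mask, action))

w = 8                                   # hypothetical word width for readability
initial_mask = (0,) * w                 # no bit protected yet
action = tuple(1 if i in (1, 5) else 0 for i in range(w))
mask_t = apply_action(initial_mask, action)

# Number of distinct binary actions for one layer.
action_space_size = 2 ** w
```

For 32-bit weights the same count is $2^{32}$ per layer, which is why the size of the action space matters for training time.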
We design the reward function by considering the following aspects: (1) The reward should encourage the generation of bit masks by the RL-based agent to protect the parameters of DNNs from soft errors. (2) The reward should balance the performance and redundancy cost by exploring and exploiting vulnerable bits. Therefore, the reward function designed for layer $l$ at timestamp $t$ is modeled as follows:
$$r_t^l = \alpha \cdot f_i\left(P_0 - \overline{P_t}^{p_e}\right) - \beta \cdot f_p\left(\sum_{i=0}^{w-1} I(m_i^l)\right) \tag{6}$$
where $P_0$ denotes the performance of the DNNs defined by the system requirements, and $\overline{P_t}^{p_e}$, derived from Equation (2), refers to the performance after injecting faults with the fault probability $p_e$. The functions $f_i$ and $f_p$ are scaling functions that normalize the performance and the number of protected bits to facilitate the convergence of the DRL-based agent. Specifically, $f_i(\cdot)$ gives the agent a greater reward when there is less discrepancy between $P_0$ and $\overline{P_t}^{p_e}$, while $f_p(\cdot)$ gives the agent a greater penalty when more bits need to be protected. When a specific bit is masked by $m_i^l$, the indicator function $I(m_i^l) = 1$. The coefficients $\alpha$ and $\beta$ balance the performance and the number of masked bits.
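A possible instantiation of this reward is sketched below. The scaling functions are not specified in the text, so the code assumes simple linear normalizations: $f_i$ maps the performance gap to $[0, 1]$ with 1 meaning no gap, and $f_p$ is the fraction of protected bits. The function name `layer_reward` and the default coefficients are hypothetical.

```python
def layer_reward(p_target, p_faulty_mean, mask, alpha=1.0, beta=0.5):
    """Reward for one layer: performance term minus protection-cost term.

    p_target:      required performance P_0 (must be positive)
    p_faulty_mean: mean performance after fault injection with the mask applied
    mask:          sequence of w values in {0, 1}, 1 = protected bit
    """
    gap = max(p_target - p_faulty_mean, 0.0)
    # Assumed f_i: 1.0 when the target is met, falling linearly to 0.0.
    f_i = 1.0 - min(gap / p_target, 1.0)
    # Assumed f_p: share of protected bit positions in the mask.
    f_p = sum(mask) / len(mask)
    return alpha * f_i - beta * f_p

no_protection_needed = layer_reward(0.9, 0.9, [0] * 8)   # target met, nothing protected
everything_protected = layer_reward(0.9, 0.9, [1] * 8)   # target met at full cost
half_performance = layer_reward(0.9, 0.45, [0] * 8)      # cheap mask, poor accuracy
```

With these assumed scalings, protecting every bit is never optimal if the same performance can be reached with fewer protected bits, which is the trade-off the coefficients are meant to tune.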
Based on the modeling of the design space and reward functions of the RL-agent, we conclude the states and actions are discrete with numerical rewards. Therefore, we adopt Deep Q-learning (DQN) [53], a model-free and off-policy algorithm, to approximate the optimal action-value functions Q * ( s , a ) , which is represented as follows:
$$Q^*(s, a) = \max_{\pi} \mathbb{E}\left[R_t \mid s_t = s, a_t = a, \pi(s, a)\right] \qquad (7)$$
where a refers to the action for marking bits defined in Equation (5), and s refers to the bit masks defined in Equation (4).
To optimize and approximate the Q-value stably, the DQN follows an implementation similar to that in [53], where the Q-network contains a target network and a policy network with the same configurations and parameters. The weights of the target network are slowly updated from the learned policy network. We model this training process as Equation (8), which minimizes the loss function $L_i(\theta_i)$ at each iteration $i$:
$$L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)}\left[\left(r_i^l + \gamma \cdot Q(s, a; \theta_{i-1}) - Q(s, a; \theta_i)\right)^2\right] \qquad (8)$$
where $\rho(\cdot)$ refers to the probability distribution of the state–action pairs, $\theta_{i-1}$ and $\theta_i$ refer to the parameters of the target and policy networks, respectively, and $\gamma$ refers to the discount factor for the long-term reward. To encourage the agent to explore and exploit in an off-policy manner, the DQN selects and executes actions with an $\epsilon$-greedy policy and is trained on transitions sampled from the replay memory, a buffer that accumulates experience over many training episodes. Sampling past experiences from this buffer breaks the sequential correlations that arise during online training, which improves the training efficiency of the agent. The replay memory also improves data efficiency, because each stored experience can be reused for several weight updates.
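The two mechanisms described above, a bounded replay memory and randomized action selection, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the names `ReplayMemory`, `select_action`, and `explore_prob` are hypothetical, and `explore_prob` is simply the probability of taking a random action, which may not map one-to-one onto the $\epsilon$ convention used in Algorithm 1.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def select_action(q_values, explore_prob, rng=random):
    """Pick a random action with probability explore_prob, else the greedy one."""
    if rng.random() < explore_prob:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(step, 0, 0.0, step + 1, False)
print(len(memory))  # the buffer never exceeds its capacity
```

Uniform sampling is the simplest choice; the prioritized variant mentioned in Section 5 would replace `sample` with a weighted draw.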

3.4. Task IV: Deploying the Agent with Transfer Learning

Although this DRL-based agent could protect the parameters in a specific layer, such protection needs to be effective across the entire DNN. Considering the complexity of the state and action spaces, it is impractical to train such an agent for each layer in terms of time consumption and scalability. To illustrate these issues, we use a five-layer neural network as an example: the action space of bit masks contains $2^w$ actions to protect the weights within each layer, and obtaining all potential optimal actions simultaneously, following the methods in [50,51], increases the action space from $2^w$ to $2^{5w}$. In addition, with an increase in the number of neurons in deeper layers, training the layer-wise DRL agent becomes more time-consuming because faults need to be injected across a proportion of the neurons. To cope with this issue, transfer learning provides a solution to accelerate the learning process by reusing the knowledge of an agent trained in similar scenarios [62]. Additionally, with the results from the fault injection service, the layer-wise resiliency of the DNNs provides prior knowledge for designing the reward functions. Such prior knowledge could speed up the convergence of the agent's training [63]. As a common method of transfer learning, we adopt reward shaping by formulating the MDP as follows [62,64,65]:
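For a concrete sense of scale, the growth of the action space can be worked out under the assumption that $w = 8$, matching the INT-8 weights used in Section 4.1. The labels $A_{\text{layer}}$ and $A_{\text{joint}}$ are introduced here only for this illustration.

```latex
|A_{\text{layer}}| = 2^{w} = 2^{8} = 256,
\qquad
|A_{\text{joint}}| = 2^{5w} = 2^{40} = 1{,}099{,}511{,}627{,}776 \approx 1.1 \times 10^{12}
```

Under this assumption, a joint search over five layers has roughly four billion times as many actions as a single layer-wise search.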
$$M = \langle S, A, T, r, \gamma \rangle \rightarrow M' = \langle S, A, T, r', \gamma \rangle \qquad (9)$$
where $M$ and $M'$ refer to the source and target domains for transferring the knowledge. To be specific, $M$ is the MDP trained at the $l$-th layer in Section 3.3, and $M'$ is the MDP for the other layers. $r'$ refers to the newly shaped reward, defined as $r' = r^l + r_e$. We specify $r_e$ as $r_e^n$ for the layer index $n$, where $n \neq l$. We define the reshaped reward $r_e^n$ in Equation (10) to balance the number of protected bits against the prior knowledge obtained from the fault injection service.
$$r_e^n = -\tau \cdot \left(\sum_{i=0}^{w} I(m_i^n) - \sigma_n\right) \qquad (10)$$
where $\tau$ refers to a coefficient that adjusts the importance of the prior knowledge for training agents, and $m_i^n$ refers to the protected bit $i$ in a specific layer $n$. The number of protected bits estimated by the DQN-based agent is $\sum_{i=0}^{w} I(m_i^n)$. By combining the knowledge from bit masks trained in the specific layers with the resiliency analysis, $\sigma_n$ refers to an approximated number of protected bits. For example, $\sigma_n$ could be larger in some non-trivial layers (e.g., the first layer of the DNNs) than in others. This threshold is determined by the resiliency of each layer under fault injection, which helps reduce the search cost within the state space. When the number of protected bits estimated by the DQN-based agent exceeds the threshold defined by prior knowledge, the agent receives a penalty and is encouraged to continue exploring. Conversely, if the estimated number of protected bits is within the threshold, the agent receives a reward and is encouraged to exploit its current policy to optimize performance. Equation (10) thus steers the agent toward the approximated number of protected bits given by prior knowledge. In particular, when attaining a similar performance, the agent receives a penalty if it masks more bits than the approximated number.
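The reward shaping described above can be sketched as follows. The sign convention is an assumption taken from the prose (protecting more than $\sigma_n$ bits is penalized, protecting fewer is rewarded), and the function name `shaped_reward` and its parameters are hypothetical; the default `tau=0.5` follows the value reported in Section 4.3.

```python
def shaped_reward(base_reward, mask, sigma_n, tau=0.5):
    """Add a prior-knowledge term to the layer reward (assumed sign convention)."""
    protected = sum(1 for bit in mask if bit == 1)
    # Assumption: exceeding the prior estimate sigma_n lowers the reward,
    # staying below it raises the reward, and matching it leaves it unchanged.
    r_e = -tau * (protected - sigma_n)
    return base_reward + r_e


# With sigma_n = 2, a three-bit mask is penalized relative to a two-bit mask.
print(shaped_reward(1.0, [1, 1, 1, 0, 0, 0, 0, 0], sigma_n=2))  # 0.5
print(shaped_reward(1.0, [1, 1, 0, 0, 0, 0, 0, 0], sigma_n=2))  # 1.0
```

Because the shaping term depends only on the bit count and $\sigma_n$, the same trained agent can be reused across layers by changing a single number.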

4. Case Study

We follow a similar experimental design to that of existing works [57,58] by selecting an MLP and a ResNet-18 and evaluating them on the MNIST and CIFAR-10 datasets to test and validate our framework. We implement the case study in the following steps: (1) Identifying the layer-wise resiliency of these networks by simulating the fault injection. (2) Conducting the training process of the RL-based agent for a specific layer. (3) Measuring the time consumption of the agent training with other layers by transferring the knowledge. (4) Evaluating the performance of bit masks generated by the proposed method.

4.1. Evaluating Resiliency by Injecting Faults

Considering the energy efficiency and storage footprint of large-scale neural networks, more and more hardware platforms use 8-bit fixed-point (INT-8) instead of 32-bit floating-point representations to convert and store the parameters of the DNNs [66]. However, a side effect of using INT-8 is that the DNNs require more precision and reliability to maintain the performance [67]. To meet these requirements, we test and validate our design by converting the weights of the DNNs to INT-8 and storing them in the simulated model described in Section 3.1. Given the significance and severity of soft errors [4,5], including bit-flips and stuck-at-0/1 errors, we inject bit-flips, recognized as the most critical faults affecting DNN performance, into the weights of the simulated model according to Equation (1). The model is configured with an entry size of 256 bits, with each entry storing four 64-bit words. With different parameters, we collect the metrics to evaluate the performance and robustness of the DNNs by using Equation (2). Specifically, Figure 5 presents the layer-wise resiliency evaluated under $r_b = 10\%$, where $r_e^i$ follows the normal distribution with $\sum_{i=1}^{N_e} r_e^i = 1$. The value of $N_e$ depends on the number of weights within the target DNN model, which are automatically allocated in sequence to entries of 256 bits. Figure 5a illustrates that Layers 1 and 2 could dramatically affect the performance when soft errors occur. Additionally, Layer 3 shows better resiliency than the others.
Based on the resiliency of this MLP, we infer that the number of protected bits is correlated with the layer index. For example, the neurons in the intermediate layers could require fewer protected bits than those in other layers. However, such a correlation is more complex when the layers become deeper in ResNet-18. Layer 1 of the ResNet-18 shows a similar pattern to the MLP. Meanwhile, Layers 7, 8, and 9, located in the middle of the network, are more sensitive to soft errors. This shows that a unified protection scheme (e.g., a conventional ECC scheme) could perform differently across DNN architectures, indicating the need for a scheme tailored to each DNN.
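A minimal sketch of bit-flip injection into INT-8 weights, with an optional bit mask marking protected positions, is given below. It is a simplification under stated assumptions: each unprotected bit flips independently with probability `bit_rate`, the entry-level distribution $r_e^i$ and the 256-bit entry layout are ignored, mask index 0 is taken to be the most significant bit, and protected bits are treated as never flipping. Equation (1) is not reproduced here, and the function name `inject_bit_flips` is hypothetical.

```python
import random

WIDTH = 8  # bits per INT-8 weight


def inject_bit_flips(weights, bit_rate, mask=None, rng=None):
    """Flip unprotected bits of signed 8-bit weights with probability bit_rate."""
    rng = rng or random.Random()
    mask = mask or [0] * WIDTH
    faulty = []
    for value in weights:
        raw = value & 0xFF  # two's-complement byte of the signed weight
        for pos in range(WIDTH):
            if mask[pos] == 1:
                continue  # assumed: a protected bit is always recovered
            if rng.random() < bit_rate:
                raw ^= 1 << (WIDTH - 1 - pos)  # pos 0 is the MSB (assumption)
        # Convert the byte back to a signed integer in [-128, 127].
        faulty.append(raw - 256 if raw >= 128 else raw)
    return faulty


weights = [5, -128, 37]
print(inject_bit_flips(weights, bit_rate=0.1, rng=random.Random(0)))
print(inject_bit_flips(weights, bit_rate=1.0, mask=[1] * WIDTH))  # fully protected
```

Flipping the most significant bit of a two's-complement weight changes its value by 128, which is one reason protection schemes typically start from the high-order positions.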

4.2. Training the DRL-Based Agent with a Specific Layer

From Figure 5, we observe that input and intermediate layers are significantly affected by the soft errors. Considering the extensive parameters in the intermediate layers (e.g., Layer 8 of the ResNet-18 contains more than 140,000 parameters), we select Layer 2 of the MLP and Layer 1 of the ResNet-18, which have fewer parameters than the intermediate layers, to serve as the environment for the DRL-based agent. The agent consists of two MLP networks: a target network and a policy network. Each of them contains three hidden dense layers with sizes of 64, 128, and 128, and the output neurons of the first two hidden layers are activated by Leaky ReLU functions. The actions generated by the last layer are activated by a ReLU function. We configure the initialized bit masks as $m_0 = (0)$. The discount factor is set as $\gamma = 0.99$ with the learning rate $lr = 0.001$. During the training of the agent, we use the Adam optimizer for stochastic gradient descent. We adopt the $\epsilon$-greedy algorithm to balance exploration and exploitation across different state–action pairs. At the start of the $M$ training episodes, $\epsilon$ could be configured as 0.5. After some iterations of the training episodes, we improve the efficiency of exploiting the optimal actions by increasing the value of $\epsilon$.
We specify the requirements $P_0$ and $\sigma_0$ as thresholds for the desired metrics of the DNNs. To help the training of the DRL-based agent converge, we end each episode as soon as the bit masks satisfy the requirements instead of iterating through the entire loop. Such an implementation could decrease the number of training episodes and avoid over-fitting. The overall training algorithm, integrated with the fault injection service, is listed in Algorithm 1. When the performance and deviation consistently meet the requirements across episodes (line 14 in Algorithm 1), the agent is considered to have converged.
Algorithm 1 Training the DRL-based Agent for a Specific Layer
Require: Configurations of the agent    ▹ such as $\gamma$, $lr$, $l$
Ensure: Tailored bit masks $m^l$ for a specific layer $l$
 1: for $k \leftarrow 1$ to $M$ do    ▹ Start episodes, $M = 200$
 2:     Initialize the bit masks $m_0^l$
 3:     Configure the fault parameters $p_e$    ▹ Equation (1)
 4:     Inject faults into a specific layer $l$
 5:     for $t = 1$ to $T$ do    ▹ Start iterations, $T = 500$
 6:         Run a random seed $p_t$
 7:         if $p_t > p_\epsilon$ then    ▹ $\epsilon$-greedy algorithm
 8:             Select a random action $a_t^l$
 9:         else
10:             Select an action $a_t$ by the DQN    ▹ Equations (6) and (7)
11:         end if
12:         Execute $m_t^l \leftarrow a_t^l$    ▹ Equations (4) and (5)
13:         Protect the weights with bit masks $m_t^l$
14:         if $\bar{P}_t^{p_e} \geq P_0$ and $\sigma_t \leq \sigma_0$ then    ▹ Equation (2)
15:             Break the iteration and start new episodes
16:         else
17:             Continue the iterations
18:         end if
19:         Optimize DQN by gradient descent    ▹ Equation (8)
20:     end for
21: end for
22: Execute $m^l = m_t^l$
23: return $m^l$
A sample of training the agent with the MLP and ResNet-18 can be found in Figure 6a,d. The upper and lower bounds of the red region in the performance curves refer to $\bar{P}_t \pm \sigma_t$. All these tests run on a hardware configuration with an Intel i7-9750H and an NVIDIA GeForce GTX 1660 Ti. In these experiments, we configure $r_e^i$ as a normal distribution and $r_b = 20\%$. We set $f_i$ and $f_p$ as scaling functions with $\alpha = 1.15$ and $\beta = 1$ to normalize the output. Once the training of the agent converges, we observe that the bit masks stabilize and maintain the performance by protecting a few bits of each weight. To be specific, the optimal bit mask of the MLP is $m_{mlp}^{2} = (1, 0, 0, 1, 0, 1, 0, 0)$, and that of the ResNet-18 is $m_{res}^{1} = (1, 1, 0, 0, 0, 0, 0, 0)$. Additionally, we find that the agent deployed in ResNet-18 requires around 100 episodes (27,675 s) to converge, which is less time than the agent for the MLP needs (31,668.79 s). One clear reason is that the number of weights affects the time needed to train the agent: Layer 1 of the ResNet-18 has fewer parameters (9472) than Layer 2 of the MLP (12,800). Another potential reason is that the deeper ResNet-18 has a better-optimized network topology than the MLP we designed.

4.3. Ablation Study of Transfer Learning

In Section 4.2, we find that training a layer-wise agent in the MLP and ResNet-18 requires 31,668 s and 27,675 s, respectively. At this cost, it is impractical to train every layer with a similar workload. In addition, with more neurons in the deeper layers, the training of the agents is more time-consuming. Therefore, following Equations (9) and (10), we use transfer learning by modifying the rewards and reusing the agent from Algorithm 1. We configure $\tau = 0.5$ to weigh the current bit masks against the approximated ones. According to the bit masks $m_{mlp}^{2}$ and $m_{res}^{1}$ learned in Section 4.2, we specify $\sigma_n = 2$ for the rest of the layers of the MLP. Considering the complexity of the ResNet-18, $\sigma_n = 3$ is used for the critical layers, which are those that exhibit significant performance degradation in the fault injection simulations in Figure 5 (e.g., Layers 7, 8, and 9), and $\sigma_n = 2$ is used for the rest of the network (e.g., Layers 2, 3, and 4). To implement transfer learning in the remaining layers, we reload the parameters of the DRL-based agent once its training by Algorithm 1 concludes. Next, the DRL-based agent with transfer learning continues the training with the reshaped rewards $r_e^n$ following Equation (10). The terminal states of this agent follow the condition defined in lines 14–15 of Algorithm 1. In Figure 6b,e, we present the performance of bit masks obtained with transfer learning in Layer 1 of the MLP and Layer 14 of the ResNet-18, respectively, which are layers sensitive to soft errors. Compared with Figure 6a,d, these agents need fewer episodes to converge while still maintaining the performance of the DNNs. Moreover, the number of protected bits in these layers approaches, or even falls below, the number $\sigma_n$ defined by prior knowledge. Figure 6c,f show the results of adopting transfer learning to train the agents for the layers that the resiliency analysis identifies as robust.
The results from these layers, including Layer 4 of the MLP and Layer 3 of the ResNet-18, show that the agents need only a few episodes to attain acceptable bit masks. Although soft errors in these layers only slightly impact the performance, the bit masks still show superior results to the baseline.
To assess the benefit of transfer learning, we conduct an ablation study in Figure 7 by comparing it with the training approach outlined in [50,51], where the layer-wise agents are trained to protect bits across the entire DNN simultaneously. In this figure, we also present the number and index of vulnerable bits obtained by configuring different $\sigma_n$ during transfer learning. Figure 7a,b compare the time cost of RL with and without transfer learning. Adopting transfer learning significantly reduces the training time. Compared to the RL-based method without transfer learning, the training time of the agent for the MLP is reduced by an average of 62.76%, decreasing from 72,170 s to 26,874 s, and for ResNet-18, it is reduced by 44.03%, decreasing from 652,225 s to 363,991 s. We observe one exception in Figure 7b, where Layer 16 of ResNet-18 takes more time than with the RL-based training method. A potential explanation is the tension between the resiliency of the layer and the $\sigma_n$ assigned by prior knowledge: the agent spends more time trying to reach both the desired performance and the target number of protected bits. In fact, the agent in this layer suggests bit masks that protect 3 bits, while $\sigma_n = 2$. This mismatch is consistent with that explanation. Figure 7c,d show the number of protected bits and their allocation in each layer. The labels of each bar refer to the positions protected by bit masks. Although the agents protect the significant bits of the weights, they also discover other bit positions that affect the performance. In addition, the value of $\sigma_n$ determines the number of protected bits to a certain extent, decreasing the redundancy cost of protecting the DNNs. For example, half of the layers configured with $\sigma_n = 2$ select only one bit to protect in Figure 7d.

4.4. Ablation Study of the Proposed Method

The bits highlighted by the bit masks could be protected by different practical schemes such as the BCH code and Hamming code. We use a comparative analysis similar to that in [50,51] by selecting the Hamming code as the ECC scheme to protect the bits within the weights. We use the MSB method and the conventional Hamming code, which protect significant and arbitrary bits from soft errors, respectively, as benchmarks to compare with the proposed method. To further analyze the protection efficiency of the proposed framework, we also design a comparative scheme (referred to as Top-k) that protects the k bit positions where faults occur most frequently. The number of protected positions, k, is the same as the number selected by the DRL-based agents shown in Figure 7c,d. We first assess the proportion between information and redundant bits to present the effectiveness of these methods. Next, we introduce an SDC ratio to quantify the protection efficiency. Finally, we measure the performance of the DNNs under various soft error rates with the different protection schemes. All these evaluations are based on the following configurations: $r_e^i$ follows the normal distribution, and $r_b$ is set to 10% and 20% so that the SDC visibly affects the performance of the DNNs.
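The Top-k baseline described above can be sketched as a simple frequency count. The article does not specify how fault occurrences are recorded or how ties are broken, so the input format (a flat list of faulty bit positions), the tie-breaking rule (lower index first), and the name `top_k_mask` are assumptions made for illustration.

```python
from collections import Counter


def top_k_mask(fault_positions, k, width=8):
    """Build a bit mask protecting the k positions with the most recorded faults."""
    counts = Counter(fault_positions)
    # Rank positions by descending fault count; ties go to the lower index (assumption).
    ranked = sorted(range(width), key=lambda pos: (-counts[pos], pos))
    chosen = set(ranked[:k])
    return [1 if pos in chosen else 0 for pos in range(width)]


# Positions 3 and 0 are hit most often, so they are the two protected bits.
print(top_k_mask([0, 0, 3, 3, 3, 5], k=2))  # [1, 0, 0, 1, 0, 0, 0, 0]
```

Unlike the DRL-based agent, this baseline considers only how often a position is hit, not how much a fault at that position degrades the network.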
We evaluate the efficiency of the protection schemes by modifying the metrics from [50,51], where they measure the proportion between the information and redundant bits.
$$\eta = \frac{k_{pro} \cdot (n_i - k_i)}{k_i} \qquad (11)$$
where $(n_i, k_i)$ refers to a linear code with $n_i$ bits and $k_i$ information bits for a weight in the DNNs, and $k_{pro}$ refers to the protected bits among the information bits. Equation (11) indicates the redundancy rate needed to protect the same bits in a weight by using different schemes. With the same measurement as in [50,51], the average redundancy rates for the proposed method are $\eta_{mlp} = 50.05\%$ and $\eta_{resnet} = 51.25\%$, while $\eta_{msb} = 56.24\%$ and $\eta_{hamming} = 75.00\%$. Since the Top-k scheme protects the same number of bits as the proposed method, its average redundancy rate is the same as $\eta_{mlp}$ and $\eta_{resnet}$.
Due to the inherent robustness and probabilistic nature of the DNNs, a bit-by-bit comparison may not reflect the degradation of the overall performance [4,5,6]. We adopt the metric defined in [5,44], given as Equation (12), to further evaluate the protection schemes by normalizing the SDC ratio under the specific fault probability $p_e$. A smaller SDC ratio indicates that the DNN is more robust against soft errors.
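To make the ECC baseline concrete, the sketch below implements the textbook Hamming(7,4) code, which adds three parity bits to four information bits and corrects any single bit flip. Whether the authors used this exact code is an assumption; it is chosen here because its overhead, $(7 - 4)/4 = 75\%$, coincides with the reported $\eta_{hamming}$. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit and return the 4 recovered data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a soft error in one stored bit
print(hamming74_correct(word))  # [1, 0, 1, 1]
```

Protecting only the bits selected by a mask means fewer information bits pass through such a code, which is where the lower redundancy rates of the MSB, Top-k, and proposed schemes come from.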
$$p(SDC \mid p_e) = \frac{\sum_{i=1}^{N_t} (P_g - P_i)}{N_t \, P_g} = \frac{P_g - \bar{P}_{N_t}^{p_e}}{P_g} \qquad (12)$$
where $P_g$ refers to the performance of the DNNs when they are error-free (golden run), and $N_t$ refers to the number of runs of the DNNs with different protection schemes under fault injection. We configure $N_t = 500$ across these experiments. $\bar{P}_{N_t}^{p_e}$ is defined as in Equation (2) and refers to the average performance of the DNNs protected by different methods over $N_t$ tests.
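Equation (12) reduces to a relative drop in average performance, which a short Python sketch makes explicit. The function name `sdc_ratio` is hypothetical, and the example input is a constant list built from the averages reported for the MLP at $r_b = 20\%$, not the per-run data.

```python
def sdc_ratio(p_golden, runs):
    """Normalized SDC ratio: mean performance loss relative to the golden run."""
    n_t = len(runs)
    return sum(p_golden - p for p in runs) / (n_t * p_golden)


# Golden accuracy 98.9%, average accuracy under faults 97.67% over 500 runs.
ratio = sdc_ratio(98.9, [97.67] * 500)
print(f"{100 * ratio:.2f}%")  # 1.24%
```

With these inputs the ratio is about 1.24%, in line with the MLP figure discussed for Table 1.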
In Table 1, we list the SDC ratio computed by Equation (12). Although the Hamming code scheme outperforms the other baseline solutions, it incurs a higher average redundancy rate than they do. On the other hand, the MSB and Top-k schemes demonstrate varying protection efficiency across different layers and DNN structures. Specifically, we observe that the performance of the MSB scheme tends to be better than that of the Top-k scheme in less sensitive layers of ResNet-18. This is because Top-k protects only a few bits, the same number as the DRL-based agent shown in Figure 7, which is fewer than the number protected by the MSB scheme. Beyond these baseline methods, the proposed method significantly decreases the SDC ratio under different BERs, improving the robustness of the DNNs. Compared to baseline methods that only protect a few crucial or random bits, we conclude that this improvement stems from the reward design of the RL-based agent, which effectively explores vulnerable bits within weights. In particular, the proposed method shows better performance than the Top-k scheme, which protects the same number of bits, indicating a higher protection efficiency of the proposed method. For the MLP and ResNet-18 at $r_b = 20\%$, the average performance of the proposed method decreases from 98.9% to 97.67% and from 89.60% to 80.80%, respectively, when protecting the critical layers, including Layers 1 and 2 of the MLP and Layers 1, 7, 8, and 9 of the ResNet-18. These results still significantly outperform those of the baseline schemes. Meanwhile, the SDC ratios significantly decrease to 1.24 within the MLP and to 7.54 within ResNet-18. In addition, our method achieves superior results in ResNet-18, implying that it is more suitable for neural networks with deeper layers.
To further evaluate the proposed method, we illustrate the overall performance of the DNNs using both the proposed method and the baseline methods across varying soft error rates in Figure 8. Although these methods demonstrate similar performance in the MLP with $r_b = 10\%$, as depicted in Figure 8a, the proposed method outperforms them as $r_b$ increases. In particular, the proposed method demonstrates significant improvement over the baseline methods on ResNet-18 in Figure 8c,d, suggesting that the proposed framework is more efficient and effective as the DNNs adopt a more complex model.

5. Conclusions and Future Work

This paper presents a dynamic error protection method that combines RL with transfer learning to protect DNNs from soft errors. The proposed method can dynamically and flexibly identify and address the vulnerable bits in each layer of the DNNs. In comparison with current work, we summarize the contributions of the proposed method as follows: (1) Compared to current work in functional fault injection [2,5,15,24,25], we introduce a simulated model for injecting faults within the weights of the DNNs, which enables RL-based methods to address vulnerable bits in the early function design phase. (2) Compared with conventional protection schemes that protect arbitrary bits, we train a layer-wise agent that takes into account the resiliency revealed by the fault injection service. As a result, this agent dynamically selects vulnerable bits by balancing the performance and redundancy. (3) Compared with the related work using RL-based agents in [50,51], our work reduces the time consumption by training a layer-wise agent instead of using global and local ones. By specifying an explicit reward mechanism, the proposed layer-wise agent is designed to suppress the hyper-parameters' impact on the agent, avoiding the need to configure sophisticated reward functions as in [50,51]. (4) With the help of the prior knowledge acquired by layer-wise resiliency analysis, the proposed method accelerates the agent's convergence by adopting transfer learning. The proposed method offers the flexibility to configure the number of protected bits according to the prior knowledge. Moreover, the adoption of transfer learning enhances the potential to generalize the DRL-based agent to other DNN structures by refining the reshaped rewards. (5) In the experiments, the proposed method remains effective on ResNet-18, which is more complex than a simple MLP, suggesting the potential for our framework to be extended to larger and more complex DNNs.
Based on the above conclusions, we plan to extend our work in the following directions: (1) The implementation of protection schemes using bit masks generated by the DRL-based agent could be further strengthened by integrating it with specific hardware configurations to optimize the proposed framework. (2) The design of DRL and transfer learning can be further enhanced through various optimization techniques (e.g., knowledge distillation) to improve the training efficiency. For example, the hyperparameters within this framework can be fine-tuned in accordance with specific system requirements. Specifically, the correlation between the specification of numerical rewards and injected faults can be further parameterized to mitigate the coupling effects caused by the random fault injection presented in Task I. In addition, algorithmic optimizations can be applied in the framework to improve the training efficiency. For example, prioritized replay buffers offer an enhanced version of the standard replay buffer by sampling transitions that are less frequent or more informative, thereby improving the agent's performance. (3) Given the detailed and realistic evaluation offered by microarchitecture fault injection, combining the RL-based agent with a technical fault injection seems promising. Using the simulator gem5 and its fault injector gemifi, we plan to conduct a detailed evaluation of the RL-based agent, taking into account timing and resource constraints. (4) The proposed framework can be further integrated into the design of safety-critical systems to ensure functional safety. For example, the DRL-based agent can be deployed online or offline in the learning-enabled components of the ADS to improve reliability. Specifically, during the technical design for deploying these components in hardware (e.g., FPGA/ASIC), the proposed framework provides solutions to enhance functional safety by identifying and addressing vulnerabilities in the context of bit-level data. (5) Although we evaluate the framework on an MLP and ResNet-18 with the INT-8 quantization scheme, further experiments are needed to analyze its generalizability and scalability. Specifically, we plan to implement the proposed framework in other, more complex DNN structures with different quantization schemes to extract the critical layers and protect them from soft errors.

Author Contributions

Conceptualization, D.C. and P.S.; Methodology, D.C. and P.S.; Software, P.S. and Y.L.; Formal analysis, P.S.; Data curation, Y.L.; Writing—original draft, P.S. and D.C.; Writing—review & editing, all authors; Supervision, Z.L. and D.C.; Project administration, D.C.; Funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by KTH Royal Institute of Technology with the industrial research project ADinSOS (grant number: 2019065006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cerquitelli, T.; Meo, M.; Curado, M.; Skorin-Kapov, L.; Tsiropoulou, E.E. Machine learning empowered computer networks. Comput. Netw. 2023, 230, 109807.
  2. Su, P.; Chen, D. Using fault injection for the training of functions to detect soft errors of dnns in automotive vehicles. In Proceedings of the International Conference on Dependability and Complex Systems, Wrocław, Poland, 27 June–1 July 2022; Springer: Cham, Switzerland, 2022; pp. 308–318.
  3. Zhu, Z.; Su, P.; Zhong, S.; Huang, J.; Ottikkutti, S.; Tahmasebi, K.N.; Zou, Z.; Zheng, L.; Chen, D. Using a VAE-SOM architecture for anomaly detection of flexible sensors in limb prosthesis. J. Ind. Inf. Integr. 2023, 35, 100490.
  4. Ibrahim, Y.; Wang, H.; Liu, J.; Wei, J.; Chen, L.; Rech, P.; Adam, K.; Guo, G. Soft errors in DNN accelerators: A comprehensive review. Microelectron. Reliab. 2020, 115, 113969.
  5. Li, G.; Hari, S.K.S.; Sullivan, M.; Tsai, T.; Pattabiraman, K.; Emer, J.; Keckler, S.W. Understanding error propagation in deep learning neural network (DNN) accelerators and applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 12–17 November 2017; pp. 1–12.
  6. Sullivan, M.B.; Saxena, N.; O’Connor, M.; Lee, D.; Racunas, P.; Hukerikar, S.; Tsai, T.; Hari, S.K.S.; Keckler, S.W. Characterizing and mitigating soft errors in gpu dram. In Proceedings of the MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Online, 18–22 October 2021; pp. 641–653.
  7. Brown, S. Overview of IEC 61508. Design of electrical/electronic/programmable electronic safety-related systems. Comput. Control Eng. 2000, 11, 6–12.
  8. Su, P.; Warg, F.; Chen, D. A simulation-aided approach to safety analysis of learning-enabled components in automated driving systems. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 6152–6157.
  9. Su, P. Supporting Self-Management in Cyber-Physical Systems by Combining Data-Driven and Knowledge-Enabled Methods. Ph.D. Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2025.
  10. Alsboui, T.; Qin, Y.; Hill, R.; Al-Aqrabi, H. Distributed intelligence in the internet of things: Challenges and opportunities. SN Comput. Sci. 2021, 2, 277.
  11. Li, Y.; Liu, W.; Liu, Q.; Zheng, X.; Sun, K.; Huang, C. Complying with iso 26262 and iso/sae 21434: A safety and security co-analysis method for intelligent connected vehicle. Sensors 2024, 24, 1848.
  12. Gao, Z.; Cecati, C.; Ding, S.X. A survey of fault diagnosis and fault-tolerant techniques—Part I: Fault diagnosis with model-based and signal-based approaches. IEEE Trans. Ind. Electron. 2015, 62, 3757–3767.
  13. Bertoa, T.G.; Gambardella, G.; Fraser, N.J.; Blott, M.; McAllister, J. Fault-tolerant neural network accelerators with selective TMR. IEEE Des. Test 2022, 40, 67–74.
  14. Kim, S.; Howe, P.; Moreau, T.; Alaghi, A.; Ceze, L.; Sathe, V. MATIC: Learning around errors for efficient low-voltage neural network accelerators. In Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 19–23 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
  15. Chen, Z.; Narayanan, N.; Fang, B.; Li, G.; Pattabiraman, K.; DeBardeleben, N. Tensorfi: A flexible fault injection framework for tensorflow applications. In Proceedings of the 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), Coimbra, Portugal, 12–15 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 426–435.
  16. Su, P.; Fan, T.; Chen, D. Scheduling Resource to Deploy Monitors in Automated Driving Systems. In Proceedings of the International Conference on Dependability and Complex Systems, Brunów, Poland, 3–7 July 2023; Springer: Cham, Switzerland, 2023; pp. 285–294.
  17. Moghaddasi, I.; Gorgin, S.; Lee, J.A. Dependable DNN Accelerator for Safety-critical Systems: A Review on the Aging Perspective. IEEE Access 2023, 11, 89803–89834.
  18. Salay, R.; Queiroz, R.; Czarnecki, K. An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv 2017, arXiv:1709.02435.
  19. Chen, D.; Johansson, R.; Lönn, H.; Blom, H.; Walker, M.; Papadopoulos, Y.; Torchiaro, S.; Tagliabo, F.; Sandberg, A. Integrated safety and architecture modeling for automotive embedded systems. Elektrotechnik Informationstechnik 2011, 128, 196.
  20. Su, P.; Kang, S.; Tahmasebi, K.N.; Chen, D. Enhancing safety assurance for automated driving systems by supporting operation simulation and data analysis. In Proceedings of the ESREL 2023, 33rd European Safety and Reliability Conference, Southampton, UK, 3–7 September 2023. [Google Scholar]
  21. Koopman, P.; Wagner, M. Challenges in autonomous vehicle testing and validation. SAE Int. J. Transp. Saf. 2016, 4, 15–24. [Google Scholar] [CrossRef]
  22. Shuvo, A.M.; Zhang, T.; Farahmandi, F.; Tehranipoor, M. A Comprehensive Survey on Non-Invasive Fault Injection Attacks. Cryptology ePrint Archive. 2023. Available online: https://eprint.iacr.org/2023/1769 (accessed on 25 May 2025).
  23. Taheri, M.; Ahmadilivani, M.H.; Jenihhin, M.; Daneshtalab, M.; Raik, J. Appraiser: Dnn fault resilience analysis employing approximation errors. In Proceedings of the 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), Tallinn, Estonia, 3–5 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 124–127. [Google Scholar]
  24. Hoang, L.H.; Hanif, M.A.; Shafique, M. Ft-clipact: Resilience analysis of deep neural networks and improving their fault tolerance using clipped activation. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1241–1246. [Google Scholar]
  25. Ruospo, A.; Gavarini, G.; Bragaglia, I.; Traiola, M.; Bosio, A.; Sanchez, E. Selective hardening of critical neurons in deep neural networks. In Proceedings of the 2022 25th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), Prague, Czech Republic, 6–8 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 136–141. [Google Scholar]
  26. He, Y.; Balaprakash, P.; Li, Y. Fidelity: Efficient resilience analysis framework for deep learning accelerators. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 270–281. [Google Scholar]
  27. Chatzidimitriou, A.; Bodmann, P.; Papadimitriou, G.; Gizopoulos, D.; Rech, P. Demystifying soft error assessment strategies on ARM CPUs: Microarchitectural fault injection vs. neutron beam experiments. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA, 24–27 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 26–38. [Google Scholar]
  28. Tsai, T.; Hari, S.K.S.; Sullivan, M.; Villa, O.; Keckler, S.W. Nvbitfi: Dynamic fault injection for gpus. In Proceedings of the 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Taipei, Taiwan, 21–24 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 284–291. [Google Scholar]
  29. Papadimitriou, G.; Gizopoulos, D. Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers. In Proceedings of the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, 14–18 June 2021; pp. 902–915. [Google Scholar]
  30. Parasyris, K.; Tziantzoulis, G.; Antonopoulos, C.D.; Bellas, N. Gemfi: A fault injection tool for studying the behavior of applications on unreliable substrates. In Proceedings of the 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Atlanta, GA, USA, 23–26 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 622–629. [Google Scholar]
  31. dos Santos, F.F.; Hari, S.K.S.; Basso, P.M.; Carro, L.; Rech, P. Demystifying GPU reliability: Comparing and combining beam experiments, fault simulation, and profiling. In Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Portland, OR, USA, 17–21 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 289–298. [Google Scholar]
  32. Moskalenko, V.; Kharchenko, V.; Moskalenko, A.; Kuzikov, B. Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods. Algorithms 2023, 16, 165. [Google Scholar] [CrossRef]
  33. Torres-Huitzil, C.; Girau, B. Fault and error tolerance in neural networks: A review. IEEE Access 2017, 5, 17322–17341. [Google Scholar] [CrossRef]
  34. Krcma, M.; Kotasek, Z.; Lojda, J. Triple modular redundancy used in field programmable neural networks. In Proceedings of the 2017 IEEE East-West Design & Test Symposium (EWDTS), Novi Sad, Serbia, 29 September–2 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar]
  35. Arifeen, T.; Hassan, A.S.; Lee, J.A. Approximate triple modular redundancy: A survey. IEEE Access 2020, 8, 139851–139867. [Google Scholar] [CrossRef]
  36. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  37. Dutta, S.; Bai, Z.; Low, T.M.; Grover, P. CodeNet: Training large scale neural networks in presence of soft-errors. arXiv 2019, arXiv:1903.01042. [Google Scholar]
  38. Shi, Y.; Wang, B.; Luo, S.; Xue, Q.; Zhang, X.; Ma, S. Understanding and Mitigating the Soft Error of Contrastive Language-Image Pre-training Models. In Proceedings of the 2024 IEEE International Test Conference in Asia (ITC-Asia), Changsha, China, 18–20 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  39. Liu, T.; Wen, W.; Jiang, L.; Wang, Y.; Yang, C.; Quan, G. A fault-tolerant neural network architecture. In Proceedings of the 56th Annual Design Automation Conference, Las Vegas, NV, USA, 2–6 June 2019; pp. 1–6. [Google Scholar]
  40. Abbaspour, A.; Mokhtari, S.; Sargolzaei, A.; Yen, K.K. A survey on active fault-tolerant control systems. Electronics 2020, 9, 1513. [Google Scholar] [CrossRef]
  41. Beyer, M.; Schorn, C.; Fabarisov, T.; Morozov, A.; Janschek, K. Automated hardening of deep neural network architectures. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, Online, 1–5 November 2021; American Society of Mechanical Engineers: New York, NY, USA, 2021; Volume 85697, p. V013T14A046. [Google Scholar]
  42. Abich, G.; Gava, J.; Garibotti, R.; Reis, R.; Ost, L. Applying lightweight soft error mitigation techniques to embedded mixed precision deep neural networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 4772–4782. [Google Scholar] [CrossRef]
  43. Xiao, H.; Cao, M.; Peng, R. Artificial neural network based software fault detection and correction prediction models considering testing effort. Appl. Soft Comput. 2020, 94, 106491. [Google Scholar] [CrossRef]
  44. Lu, Q.; Li, G.; Pattabiraman, K.; Gupta, M.S.; Rivers, J.A. Configurable detection of SDC-causing errors in programs. ACM Trans. Embed. Comput. Syst. (TECS) 2017, 16, 1–25. [Google Scholar] [CrossRef]
  45. Ozen, E.; Orailoglu, A. Boosting bit-error resilience of DNN accelerators through median feature selection. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2020, 39, 3250–3262. [Google Scholar] [CrossRef]
  46. Wang, Y.; Su, H.; Zhang, B.; Hu, X. Interpret neural networks by identifying critical data routing paths. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8906–8914. [Google Scholar]
  47. Chen, Z.; Li, G.; Pattabiraman, K. A low-cost fault corrector for deep neural networks through range restriction. In Proceedings of the 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Taipei, Taiwan, 21–24 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–13. [Google Scholar]
  48. Wang, K.; Louri, A.; Karanth, A.; Bunescu, R. High-performance, energy-efficient, fault-tolerant network-on-chip design using reinforcement learning. In Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy, 25–29 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1166–1171. [Google Scholar]
  49. Wang, K.; Louri, A. Cure: A high-performance, low-power, and reliable network-on-chip design using reinforcement learning. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 2125–2138. [Google Scholar] [CrossRef]
  50. Huang, K.; Siegel, P.H.; Jiang, A.A. Functional Error Correction for Reliable Neural Networks. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2694–2699. [Google Scholar]
  51. Huang, K.; Siegel, P.H.; Jiang, A. Functional error correction for robust neural networks. IEEE J. Sel. Areas Inf. Theory 2020, 1, 267–276. [Google Scholar] [CrossRef]
  52. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  53. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  54. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  55. Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tutorials 2019, 21, 3133–3174. [Google Scholar] [CrossRef]
  56. Su, P.; Lu, Z.; Chen, D. Combining self-organizing map with reinforcement learning for multivariate time series anomaly detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1964–1969. [Google Scholar]
  57. He, Y.; Lin, J.; Liu, Z.; Wang, H.; Li, L.J.; Han, S. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–800. [Google Scholar]
  58. Chen, J.; Chen, S.; Pan, S.J. Storage efficient and dynamic flexible runtime channel pruning via deep reinforcement learning. In Proceedings of the Annual Conference on Neural Information Processing Systems 2020: Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Online, 6–12 December 2020; pp. 14747–14758. [Google Scholar]
  59. Hamming, R.W. Error detecting and error correcting codes. Bell Syst. Tech. J. 1950, 29, 147–160. [Google Scholar] [CrossRef]
  60. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
  61. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  62. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  63. Badreddine, S.; Spranger, M. Injecting prior knowledge for transfer learning into reinforcement learning algorithms using logic tensor networks. arXiv 2019, arXiv:1906.06576. [Google Scholar]
  64. Tang, M.; Cai, S.; Lau, V.K. Online system identification and control for linear systems with multiagent controllers over wireless interference channels. IEEE Trans. Autom. Control. 2022, 68, 6020–6035. [Google Scholar] [CrossRef]
  65. Zhu, Z.; Lin, K.; Jain, A.K.; Zhou, J. Transfer learning in deep reinforcement learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13344–13362. [Google Scholar] [CrossRef] [PubMed]
  66. Sun, X.; Choi, J.; Chen, C.Y.; Wang, N.; Venkataramani, S.; Srinivasan, V.V.; Cui, X.; Zhang, W.; Gopalakrishnan, K. Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019: Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  67. Wang, N.; Choi, J.; Brand, D.; Chen, C.Y.; Gopalakrishnan, K. Training deep neural networks with 8-bit floating point numbers. In Proceedings of the Annual Conference on Neural Information Processing Systems 2018: Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
Figure 1. The proposed framework for applying RL to protect DNNs from soft errors.
Figure 2. A simple structure of a DNN accelerator. Depending on the layer-wise characteristics of the DNN (e.g., the dimensions), the parameters from the PE could be saved either in the same memory entry or in different entries. Therefore, the fault rate $p_e$ could impact the parameters (e.g., parameter #a and parameter #b) when they are stored in different entries.
Figure 3. The development flow of the proposed fault injection service. The dashed box represents the contribution of our proposed method.
Figure 4. An example illustrating how bit masks protect weights from different layers by selecting vulnerable bits.
Figure 5. Layer-wise resiliency of the DNN, obtained by injecting faults over 500 iterations.
Figure 6. Snapshot of agent training, showing performance, rewards, and redundancy. The red boundaries in the figures denote the upper and lower values observed across training episodes.
Figure 7. Overall results of adopting transfer learning in different layers. The protected bit position is shown at the top of each bar.
Figure 8. Performance of different protection schemes under soft errors.
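The idea behind Figure 4, that a bit mask selects which bit positions of a weight are protected, can be made concrete with a minimal sketch. It assumes weights stored as IEEE 754 single-precision values and models protection simply as suppressing any flip that lands on a masked bit. Neither the number format nor the names `inject_flips` and `protect_mask` come from the paper, whose masks are generated by the RL agent.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def inject_flips(weight: float, flip_mask: int, protect_mask: int = 0) -> float:
    """Flip the bits set in flip_mask, skipping bits covered by protect_mask.

    Assumption: a protected bit is treated as perfectly corrected, so a
    soft error on that position has no effect.
    """
    effective = flip_mask & ~protect_mask & 0xFFFFFFFF
    return bits_to_float(float_to_bits(weight) ^ effective)


w = 0.5
high_exponent_bit = 1 << 30   # most significant exponent bit
low_mantissa_bit = 1 << 0     # least significant mantissa bit

# A flip in the exponent changes the weight by many orders of magnitude.
unprotected = inject_flips(w, high_exponent_bit)

# The same fault is harmless once the mask covers that bit position.
protected = inject_flips(w, high_exponent_bit, protect_mask=high_exponent_bit)

# A flip in the lowest mantissa bit barely changes the weight.
benign = inject_flips(w, low_mantissa_bit)

print(unprotected, protected, benign - w)
```

The contrast between the exponent flip and the mantissa flip is the reason a mask that covers only a few positions can be worthwhile: most bit positions contribute little to the error, so redundancy can be concentrated on the vulnerable ones.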
Table 1. Performance evaluation between different protection schemes. The golden run performances $P_g$ of MLP and ResNet-18 are 98.90% and 89.6%.

Perf. = Average Performance Protecting in Critical Layers (%); SDC = SDC Ratio Across the DNN.

| DNN Type | $r_b$ (%) | Perf.: Our Method | Perf.: Without Protection | Perf.: Hamming Code | Perf.: MSB | Perf.: Top-k | SDC: Our Method | SDC: Without Protection | SDC: Hamming Code | SDC: MSB | SDC: Top-k |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 10 | 98.30 | 94.32 | 98.26 | 98.16 | 96.83 | 0.47 | 2.92 | 0.50 | 0.62 | 1.40 |
| MLP | 20 | 97.67 | 70.55 | 93.91 | 90.35 | 86.61 | 1.24 | 17.30 | 3.50 | 5.64 | 8.15 |
| ResNet-18 | 10 | 83.60 | 62.98 | 70.29 | 65.89 | 76.63 | 5.55 | 14.63 | 11.17 | 12.93 | 9.91 |
| ResNet-18 | 20 | 80.80 | 32.91 | 70.98 | 66.61 | 71.11 | 7.57 | 31.99 | 10.92 | 18.03 | 9.93 |
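As a reading aid for Table 1, the short sketch below computes two derived quantities from the tabulated values: the margin of the proposed method over the best protection baseline in each row, and the fraction of golden-run performance it retains. The dictionary layout and helper names are illustrative assumptions; only the numbers are taken from the table and its caption.

```python
# Golden-run performance P_g (%) from the Table 1 caption.
GOLDEN = {"MLP": 98.90, "ResNet-18": 89.6}

# (network, r_b) -> average performance (%) when protecting critical layers.
PERF = {
    ("MLP", 10): {"ours": 98.30, "none": 94.32, "hamming": 98.26, "msb": 98.16, "topk": 96.83},
    ("MLP", 20): {"ours": 97.67, "none": 70.55, "hamming": 93.91, "msb": 90.35, "topk": 86.61},
    ("ResNet-18", 10): {"ours": 83.60, "none": 62.98, "hamming": 70.29, "msb": 65.89, "topk": 76.63},
    ("ResNet-18", 20): {"ours": 80.80, "none": 32.91, "hamming": 70.98, "msb": 66.61, "topk": 71.11},
}


def margin_over_best_baseline(row: dict) -> float:
    """Percentage-point gap between 'ours' and the best protection baseline."""
    best = max(v for k, v in row.items() if k not in ("ours", "none"))
    return round(row["ours"] - best, 2)


def retained_fraction(network: str, row: dict) -> float:
    """Share of the golden-run performance kept by 'ours' under faults."""
    return round(row["ours"] / GOLDEN[network], 3)


for (network, r_b), row in PERF.items():
    print(network, r_b, margin_over_best_baseline(row), retained_fraction(network, row))
```

Both quantities are plain arithmetic on the published figures and carry no information beyond the table itself.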
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Su, P.; Li, Y.; Lu, Z.; Chen, D. Applying Reinforcement Learning to Protect Deep Neural Networks from Soft Errors. Sensors 2025, 25, 4196. https://doi.org/10.3390/s25134196
