Micromachines
  • Article
  • Open Access

29 January 2026

HCCA-SAFE: A Hybrid Cascaded Control Architecture for FPGA-Based Fault Injection in Safety-Critical Automotive SoCs

1
School of Microelectronics, Xidian University, Xi’an 710071, China
2
Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China
3
Shenzhen Institute of Technology, Xidian University, Shenzhen 518000, China
*
Authors to whom correspondence should be addressed.
This article belongs to the Section D1: Semiconductor Devices

Abstract

Automotive System-on-Chips (SoCs) must meet stringent functional safety standards, such as ISO 26262 and IEC 61508, to ensure reliable operation under hardware faults. FPGA-based fault injection has emerged as a practical and cost-effective technique for functional safety verification. However, instrumentation-based methods face scalability challenges when applied to the high fault densities typical of automotive SoCs. To address these challenges, we propose a hybrid cascaded fault-injection controller architecture (HCCA-SAFE) that simultaneously reduces high-fanout global nets and eliminates long serial propagation paths. The architecture constrains the enable-signal cluster width and distributes control across cascaded stages, improving timing results and routability under limited FPGA resources. The proposed architecture is evaluated on multiple open-source RISC-V processor cores. On openE902, HCCA-SAFE reduces net delay from 27.276 ns to 22.535 ns and achieves 32.2% and 63.8% lower net delay than the representative centralized and shift-chain approaches, respectively. On openE906, HCCA-SAFE limits the net delay to 12.959 ns, compared with 25.825 ns and 40.442 ns for the conventional methods, and reduces the maximum control-signal fanout to 1763. On openC906, the proposed design lowers the maximum control-signal fanout from 7725 to 570 and reduces the net delay to 7.506 ns. Furthermore, HCCA-SAFE produces results fully consistent with software-based RTL simulation while delivering substantial performance gains. Speed-up factors of 127×, 206×, and 2123× are achieved on openE902, openE906, and openC906, respectively, with efficiency improvements scaling with processor complexity. These results confirm that HCCA-SAFE delivers scalable, timing-robust fault-injection control suitable for large automotive SoCs.

1. Introduction

Electronic systems in automobiles are required to comply with stringent functional safety standards, most notably ISO 26262 [1] and IEC 61508 [2]. A central requirement of these standards is that electronic systems must be capable of preventing failures that could arise from inherent faults. Such faults are typically mitigated or detected through hardware redundancies integrated within automotive System-on-Chips (SoCs), which are referred to as safety mechanisms in ISO 26262. With the rapid growth in the size, architectural complexity, and safety-critical functionalities of automotive SoCs, functional safety verification has become an indispensable requirement for both SoC and IP-level designs. To guarantee that safety goals are met even under the most adverse operating conditions, comprehensive and efficient fault injection campaigns, supported by high-speed simulation, are essential. Despite advancements in commercial fault simulators such as Cadence Xcelium Fault Simulator [3], software-based fault injection methods still struggle to meet the stringent efficiency and scalability requirements inherent to large-scale SoC designs and long-term validation campaigns. To address these challenges, researchers have increasingly turned to hardware-assisted solutions. In particular, Field-Programmable Gate Array (FPGA)-based fault injection techniques have been widely explored in the literature.
There are two primary ways to perform FPGA-based fault injection [4]: reconfiguration-based and instrumentation-based methods. The first method emulates logic faults in FPGA prototypes by dynamically altering the contents of configuration memory (CM) during runtime. However, this method suffers from a lack of direct correlation between the Register Transfer Level (RTL) design nodes and the corresponding bit location of configuration frames, making precise fault targeting difficult [5]. In contrast, the instrumentation-based approach involves modifying the design under test (DUT) by inserting additional logic to simulate fault effects explicitly within the design. Nevertheless, this method incurs additional area and timing overhead [6], and it becomes increasingly impractical when the DUT involves thousands to millions of potential fault sites, due to scalability and resource limitations.
In this paper, we propose HCCA-SAFE, a novel hybrid cascaded fault-injection controller architecture for the verification of safety-critical automotive SoCs, which redefines the way control is distributed in FPGA-based fault injection. The proposed architecture introduces a distributed cascaded topology that localizes control distribution while preserving global flexibility. This hybrid design enables simultaneous improvements in timing performance, scalability, and resource efficiency, addressing a long-standing trade-off in existing solutions. To the best of our knowledge, this is the first architecture that explicitly targets large-scale fault injection campaigns under strict FPGA area and performance constraints, thereby offering a new pathway toward practical and high-throughput fault injection platforms. The remainder of this paper is organized as follows. First, we review the background and related work. Next, the proposed fault injection controller architecture is presented in detail, followed by a description of the fault injection flow. Subsequently, the experimental setup and results are discussed. Finally, conclusions are drawn.

3. A Hybrid Cascaded Fault Injection Controller Architecture

To overcome the challenges inherent in the instrumentation-based method, we propose a hybrid cascaded fault injection controller architecture that is optimized for handling a large number of potential fault sites and supporting SA0, SA1, and SEU fault models, thereby improving both scalability and applicability in complex SoC designs. Leveraging a hierarchical and decoupled control design, the proposed architecture reformulates fault injection from a fine-grained instrumentation problem into a scalable control distribution problem. Specifically, the FCU and FU decouple fault model specification from fault execution, allowing fault semantics to be configured independently of injection targets. Under this organization, the FCU centrally manages fault activation logic and distributes control through bounded-width enable clusters, while the FU performs localized fault injection at designated sites. To further enable deterministic fault activation, time registers are introduced as a first-class temporal control mechanism, providing cycle-accurate triggering of fault events. This combination establishes a programmable, time-aware, and scalable fault injection architecture, rather than a simple aggregation of control logic. Figure 3 illustrates the proposed architecture. At the core of the design is the FCU, which integrates a Fault Control Finite State Machine (FSM), a Fault Bit Selection Module, a Fault Injection Timer, and a Fault Enable Signal Cluster.
Figure 3. The hybrid cascaded fault injection controller architecture.

3.1. Microarchitecture

3.1.1. Fault Control FSM

This component is driven by the system clock and reset signals, and it additionally receives the control signal (nxt_cycle_i) propagated from the preceding FCU. These inputs collectively determine the scheduling and regulation of fault injection events. The corresponding state-transition diagram of the controller FSM is shown in Figure 4. After reset, the FSM starts in the IDLE state and transitions to a wait state when the fault ID matches and a start signal is received. Once the internal timer reaches the target injection cycle, the FSM executes the injection according to the fault type: it generates a single-cycle pulse for an SEU, or holds the enable signal until the round concludes for SA faults. Afterward, the FSM enters the NEXT_BIT state, where it resets the timer and left-shifts the selection vector; it then either loops back to inject the next bit or transitions to FINISH.
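The state sequence described above can be illustrated with a small behavioral model. The following Python sketch is only an approximation under stated assumptions: the state names WAIT and INJECT, the parameters `round_len` and `n_bits`, and the function `simulate_fcu` are hypothetical and do not come from the paper, and the remainder of each round is collapsed into a single step rather than simulated clock by clock.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT = auto()      # hypothetical name for the "wait state"
    INJECT = auto()    # hypothetical name for the injection step
    NEXT_BIT = auto()
    FINISH = auto()


def simulate_fcu(fault_type, inject_cycle, round_len, n_bits):
    """Return one (selection_vector, enable_trace) pair per injected bit.

    fault_type   : "SEU", "SA0" or "SA1"
    inject_cycle : target cycle within a round (0-based)
    round_len    : assumed number of cycles in one injection round
    n_bits       : number of bits covered by the selection vector
    """
    if not 0 <= inject_cycle < round_len:
        raise ValueError("inject_cycle must lie inside the round")
    state, sel, timer = State.IDLE, 1, 0
    trace, traces = [], []
    while state is not State.FINISH:
        if state is State.IDLE:
            # Assume the fault ID matched and the start signal arrived.
            state = State.WAIT
        elif state is State.WAIT:
            if timer == inject_cycle:
                state = State.INJECT
            else:
                trace.append(0)
                timer += 1
        elif state is State.INJECT:
            remaining = round_len - timer
            if fault_type == "SEU":
                trace += [1] + [0] * (remaining - 1)   # single-cycle pulse
            else:
                trace += [1] * remaining               # hold until round end
            traces.append((sel, trace))
            state = State.NEXT_BIT
        elif state is State.NEXT_BIT:
            timer, trace = 0, []        # reset the timer
            sel <<= 1                   # left-shift the selection vector
            state = State.WAIT if sel < (1 << n_bits) else State.FINISH
    return traces
```

For example, `simulate_fcu("SEU", 2, 5, 3)` yields three rounds whose enable trace is high only in cycle 2, whereas an SA fault keeps the enable high from cycle 2 to the end of each round.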
Figure 4. State transition diagram of the fault control finite-state machine.

3.1.2. Fault Bit Selection Module

Based on the control signals generated by the Fault Control FSM, this module identifies the specific FUs to be activated in a given fault injection campaign. The corresponding enable signals are then propagated to the subsequent stage to ensure correct fault activation and synchronization across the system.

3.1.3. Fault Injection Timer

A configuration register receives the fault injection timing signal (time). Once initialized, the internal timer increments autonomously on each clock cycle, enabling precise scheduling of FU activation at the target clock cycle prescribed by the fault injection campaign.

3.1.4. Fault Enable Signal Cluster

To minimize the risk that the FU control signals (FU_en) become high-fanout nets, which would otherwise degrade timing closure and increase implementation complexity, we constrain the output of the fault enable signal cluster to a maximum width of 128 bits. This value is derived from empirical design experience, balancing scalability and feasibility. In addition, the architecture retains parallel enable signals, ensuring flexible and efficient FU control, particularly in cases where the ordering of fault injection events has not been determined in advance. Although the width of FU_en is limited to 128 bits, fault units are activated in a time-multiplexed manner across multiple injection cycles. As a result, FU_en can be configured as a one-hot vector when triggering a single fault unit in a fault injection campaign, or as a multi-bit enable vector when multiple fault units are triggered simultaneously.
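As a rough illustration of the one-hot and multi-bit configurations of FU_en, the Python sketch below builds 128-bit enable vectors and maps a global fault-site index to a controller stage. The helper names `locate` and `enable_vector` are hypothetical, and the simple quotient/remainder mapping of fault sites to FCUs is an assumption made for illustration, not the allocation used in the paper.

```python
CLUSTER_WIDTH = 128  # maximum FU_en width stated in the text


def locate(fault_index, width=CLUSTER_WIDTH):
    """Map a global fault-site index to (fcu_index, bit_position).

    Assumes fault sites are packed into FCUs in index order.
    """
    return divmod(fault_index, width)


def enable_vector(bits, width=CLUSTER_WIDTH):
    """Build an FU_en value with the given bit positions set.

    One position gives a one-hot vector; several positions give a
    multi-bit vector that triggers those fault units together.
    """
    vec = 0
    for b in bits:
        if not 0 <= b < width:
            raise ValueError(f"bit {b} is outside the {width}-bit cluster")
        vec |= 1 << b
    return vec


fcu, bit = locate(300)            # fault site 300 -> FCU 2, bit 44
one_hot = enable_vector([bit])    # single fault unit
multi = enable_vector([0, 127])   # two fault units at once
```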

3.1.5. Fault Unit

To reduce area overhead, the design of the FU prioritizes simplicity and minimal hardware resources. To comprehensively evaluate the proposed architecture, three commonly used types of fault model units are implemented. Each FU modifies the original stream (origin_data) to produce faulty data (faulty_data) when activated.
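The behavior of the three fault units can be sketched for a single-bit signal as follows. This is a minimal model assuming that SA0 and SA1 force the value to 0 and 1, respectively, and that an SEU inverts the bit while the enable is high; the function names are hypothetical, and the actual FU logic in the paper may differ (for example, in how an SEU is written back into a register).

```python
def fu_sa0(origin_data, enable):
    """Stuck-at-0: force the bit to 0 while enabled."""
    return 0 if enable else origin_data


def fu_sa1(origin_data, enable):
    """Stuck-at-1: force the bit to 1 while enabled."""
    return 1 if enable else origin_data


def fu_seu(origin_data, enable):
    """Single-event upset: invert the bit while enabled (assumed model)."""
    return origin_data ^ 1 if enable else origin_data


# With the enable low, every unit passes origin_data through unchanged.
faulty_data = [fu(1, 0) for fu in (fu_sa0, fu_sa1, fu_seu)]
```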

3.2. Latency and Resource Overhead Model

The worst-case control latency model of the proposed hybrid cascaded fault-injection controller can be written as:
$$T_{\max} = H \cdot t_{\mathrm{hop}} \tag{1}$$
where $T_{\max}$ denotes the end-to-end worst-case control latency, $t_{\mathrm{hop}}$ represents the single-hop propagation delay between two adjacent FCU stages ($t_{\mathrm{hop}} = 1$ cycle in our implementation), and $H$ is the hop count that a control signal must traverse in the worst case.
The hop count can be approximated by
$$H = \frac{N_{\mathrm{FU}}}{C} \tag{2}$$
where $N_{\mathrm{FU}}$ is the total number of instrumented FUs and $C$ is the number of fault units that a single FCU can control, corresponding to the width of the FU_en signal in Figure 3 ($C = 128$ in our implementation).
The resource overhead introduced by the FCUs and FUs can be expressed as follows:
$$R_{\mathrm{total}} = N_{\mathrm{FU}} \cdot r_{\mathrm{FU}} + N_{\mathrm{FCU}} \cdot r_{\mathrm{FCU}} \tag{3}$$
where $R_{\mathrm{total}}$ denotes the overall FPGA resource overhead (e.g., LUTs and FFs). The term $r_{\mathrm{FU}}$ is the per-FU resource cost, which is typically lightweight and often merged with the existing logic of the target design. In contrast, $r_{\mathrm{FCU}}$ represents the fixed per-controller cost of an FCU, including control logic, state registers, and local routing (71 LUTs, 293 FFs, and 6 CARRY8 blocks in our implementation). Overall, the resource overhead scales approximately linearly with the numbers of FUs and FCUs. The FU contribution reflects fine-grained incremental cost, while the FCU contribution introduces coarse-grained but fixed per-stage overhead.
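To make the latency and resource model concrete, the following Python sketch evaluates the hop count, the worst-case latency, and the FCU share of the resource overhead. It rests on two assumptions that are not stated in the text: the hop count is rounded up to an integer, and the number of FCUs equals the hop count. The per-FU cost is not given in the paper, so only the FCU term is computed; all function names are hypothetical.

```python
def hop_count(n_fu, c=128):
    """Hop count H, rounded up (assumption: a partial cluster needs a full FCU)."""
    return -(-n_fu // c)  # integer ceiling division


def worst_case_latency(n_fu, c=128, t_hop=1):
    """T_max = H * t_hop, in clock cycles when t_hop = 1."""
    return hop_count(n_fu, c) * t_hop


def fcu_overhead(n_fu, c=128, luts_per_fcu=71, ffs_per_fcu=293):
    """FCU contribution N_FCU * r_FCU, assuming N_FCU equals the hop count."""
    n_fcu = hop_count(n_fu, c)
    return n_fcu * luts_per_fcu, n_fcu * ffs_per_fcu


# Example: 24,063 fault sites, the count reported for s38417 in Section 5.2.2.
h = hop_count(24063)            # 188 cascaded stages
t_max = worst_case_latency(24063)
luts, ffs = fcu_overhead(24063)
```

Under these assumptions, 24,063 fault sites would need 188 FCU stages, giving a worst-case latency of 188 cycles and an FCU-only overhead of 13,348 LUTs and 55,084 FFs.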
In summary, to scale the architecture to a large number of potential fault sites, multiple FCUs are organized in a cascaded topology, with each unit receiving the cycle control signal from its immediate predecessor. Compared with conventional architectures, the proposed hybrid cascaded structure restricts the width of the fault enable signal cluster to 128 bits, thereby mitigating the risk of high-fanout nets while preserving parallel enable lines to support flexible and simultaneous FU activation.

4. Fault Injection Flow

Based on the architecture introduced in Section 3, we establish a fully automated workflow, as illustrated in Figure 5. This workflow consists of three major steps: (i) parsing the DUT design to generate a comprehensive fault list, (ii) performing FU instrumentation on the DUT, and (iii) deploying the instrumented DUT onto the FPGA platform.
Figure 5. Automated fault injection flow for emulation.

4.1. Fault List Generation

The workflow begins with the generation of a comprehensive fault list from the original DUT design. This step integrates multiple analysis procedures, including design parsing, fault modeling, and static testability analysis. The design parser extracts fault-injection site information from the DUT, while the fault model defines potential fault scenarios relevant to functional safety verification. Static testability analysis further refines and reorders the fault list by identifying candidate sites with high controllability and observability, and by distributing the faults of the same module across appropriate FCUs according to their quantity.

4.2. Fault Unit Instrumentation

Once the fault list is established, the next stage focuses on FU instrumentation. As summarized in Algorithm 1, the DUT is automatically modified according to a user-defined constraints file, which specifies the FU fault types (e.g., SA or SEU) and the target DUT modules; the corresponding FUs are then inserted at the designated fault sites. The hybrid cascaded FCUs are instantiated at the top level of the design to coordinate the activation of different FUs. Following instrumentation, the modified DUT undergoes synthesis, placement, and routing to generate the FPGA bitstream.

4.3. FPGA Deployment and Experimental Evaluation

The final step involves deploying the instrumented DUT onto an FPGA platform for experimental fault injection. The FPGA-based prototype is subjected to test stimulation, during which input vectors are applied to activate both normal and faulted behaviors. The output of the faulty DUT is compared with the golden DUT directly in hardware on an FPGA platform. The fault injection results are then stored in the on-chip memory and transferred to the host PC after all fault injection campaigns are completed. Finally, the evaluation results are compiled into automatically generated reports, providing quantitative insights into fault coverage, error propagation, and system robustness.
Algorithm 1 Automated Fault Injection Instrumentation Flow
Require: Original design D, fault list FL, user constraints Cons, FU library L_FU
Ensure: Instrumented design D′, top-level controller FI_Ctrl
  1: S_tgt ← IdentifyCandidates(D, FL, Cons)        ▹ Select target signals based on constraints
  2: for all modules m ∈ D containing signals in S_tgt do
  3:    for all target signals s ∈ (m ∩ S_tgt) do
  4:      FU ← Instantiate(L_FU, Type(s))     ▹ Instantiate specific FU based on constraints
  5:      InterceptSignal(m, s, FU)      ▹ Insert FU between driver and loads
  6:    end for
  7:    RoutePorts(m, FU_ctrl)      ▹ Aggregate and expose control ports to module boundary
  8: end for
  9: M ← BuildHierarchyMap(D)        ▹ Build hierarchical mapping tables
 10: FI_Ctrl ← SynthesizeController(M)   ▹ Generate logic to address and control all FUs
 11: return D′, FI_Ctrl
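The control flow of Algorithm 1 can be mirrored on a toy data structure. The Python sketch below is not the authors' tool: it represents a design as a dictionary of module names and signal lists, treats the constraints as a per-module fault type, and returns a record of inserted FUs together with a hierarchy map and an FCU count. All names and data layouts here are hypothetical, chosen only to show the order of the steps.

```python
def instrument(design, fault_list, constraints, cluster_width=128):
    """Toy version of the instrumentation flow.

    design      : {module_name: [signal_name, ...]}
    fault_list  : set of (module_name, signal_name) fault sites
    constraints : {module_name: fu_type}, e.g. {"ifu": "SEU"}
    """
    # Step 1: select target signals that are in the fault list and
    # belong to a module named in the constraints.
    targets = [(m, s) for m, sigs in design.items() for s in sigs
               if (m, s) in fault_list and m in constraints]

    instrumented = {}   # per-module record of inserted FUs
    hierarchy = {}      # FU id -> hierarchical signal path
    for fu_id, (m, s) in enumerate(targets):
        # Steps 4-5: pick the FU type and attach it to the signal.
        fu = {"signal": s, "fu": constraints[m], "fu_id": fu_id}
        instrumented.setdefault(m, []).append(fu)
        # Step 9: record where each FU lives in the hierarchy.
        hierarchy[fu_id] = f"{m}.{s}"

    # Step 10: size the controller (assumes one FCU per cluster of FUs).
    n_fcu = -(-len(targets) // cluster_width)
    return instrumented, {"map": hierarchy, "n_fcu": n_fcu}


design = {"ifu": ["pc", "valid"], "alu": ["sum"]}
fault_list = {("ifu", "pc"), ("ifu", "valid"), ("alu", "sum")}
instrumented, ctrl = instrument(design, fault_list, {"ifu": "SEU"})
```

In this toy run only the two signals of the constrained module receive FUs, and a single FCU is enough to address them.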

5. Experiment Setup and Results

In this paper, we present two types of fault-injection experiments to evaluate the efficiency of the HCCA-SAFE control architecture. The first campaign targets ISCAS’85 and ISCAS’89 benchmark circuits to evaluate the scalability and resource overhead of the proposed architecture. The second campaign is conducted on three open-source RISC-V processor cores to evaluate the architecture’s applicability and effectiveness in a realistic design scenario, and to compare its performance against conventional approaches such as centralized and shift-chain-based control schemes.

5.1. Experimental Setup

In both the ISCAS and processor campaigns, all designs were synthesized and implemented on an AMD Zynq UltraScale+ FPGA platform (ZCU102; Advanced Micro Devices, Inc., Santa Clara, CA, USA) using the Vivado toolchain (version 2024.1) with the default synthesis and implementation strategies, as shown in Table 3. For the ISCAS fault injection campaign, all effective fault sites in the benchmark circuits were systematically analyzed to provide a full coverage evaluation. In contrast, for the processor core campaign, a full-chip fault injection evaluation would exceed the available FPGA resources on the ZCU102 platform due to the large number of potential fault sites (e.g., 142,095) and the required instrumentation overhead. Therefore, the instruction fetch unit (IFU), a timing-critical and control-intensive module on the processor’s critical path, was selected as a representative target to evaluate the timing overhead introduced by the additional FUs and FCUs. This selection is driven by experimental resource constraints rather than architectural limitations. The proposed fault injection architecture inherently supports full-chip fault coverage, which can be achieved given sufficient FPGA resources through scalable instantiation of the cascaded FCU/FU hierarchy.
Table 3. Vivado default synthesis and implementation strategy details.

5.1.1. ISCAS Benchmark

To assess the scalability and resource overhead of the proposed fault-injection architecture, we performed experiments on seven representative circuits from the ISCAS’85 and ISCAS’89 benchmarks. These designs span a broad spectrum of structural complexity—from small, simple modules to large, complex circuits. The LUT and FF utilization reported by Vivado synthesis tools, together with the potential fault sites of each circuit extracted by our analysis scripts, are summarized in Table 4.
Table 4. Overview of the benchmark circuits and processors.

5.1.2. RISC-V Processor

To further evaluate the practical applicability of the proposed fault-injection architecture beyond benchmark-scale circuits, experiments were conducted on three different open-source RISC-V processor cores: openE902 [18], openE906 [19], and openC906 [20]. These processors exhibit distinctly different architectural characteristics, spanning from lightweight embedded cores to high-performance application-class designs. Specifically, openE902 is a 32-bit lightweight embedded core with a shallow two-stage pipeline and a compact microarchitectural organization. openE906 is a 32-bit RISC-V processor configured in this study with a five-stage pipeline, supporting the integer, multiplication/division, atomic, floating-point, and compressed instruction set extensions (IMACF). At the high-performance end, openC906 is a 64-bit application-class processor featuring deeper pipelines and an integrated memory management unit (MMU), enabling the execution of Linux systems. Together, these processor cores form a realistic and structurally diverse evaluation platform that complements the ISCAS benchmark circuits and enables a more comprehensive analysis of the scalability of the proposed fault-injection architecture. In addition, the fault injection efficiency of these processors is evaluated using practical matrix-based workloads, and the results are presented in Section 5.2.4 in detail.

5.2. Results

5.2.1. Scalability Analysis on ISCAS Benchmarks

We evaluated the resource overhead using seven representative ISCAS benchmark circuits, whose fault counts range from 374 to 48,126. Table 5 summarizes the incremental LUT and FF utilization, together with the corresponding degradation in maximum operating frequency. As circuit complexity increases, the number of instrumented fault sites grows proportionally, resulting in a consistent upward trend in hardware overhead. The increase in FF usage is noticeably steeper than that of LUTs. This is because the additional FU combinational logic can be partially merged with existing logic during synthesis, thereby limiting LUT expansion. In contrast, the FFs required by the control architecture scale directly with the number of fault sites, leading to significantly faster growth in FF utilization.
Table 5. ISCAS circuits, resource, and frequency overhead.
Figure 6 further illustrates the scalability trend by fitting the LUT and FF overhead as a function of the number of faults, as described in Equation (3). Both resource metrics exhibit a clear linear growth pattern, indicating that the proposed architecture scales proportionally with the number of instrumented fault sites. The fitted models provide a reliable basis for predicting resource utilization for arbitrary fault counts, and they further enable analysis of the maximum number of fault sites that can be supported on a given FPGA platform. The fitted models can be written as:
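$$\mathrm{LUT}_0 + (a_{\mathrm{LUT}} N_f + b_{\mathrm{LUT}}) \le \mathrm{LUT}_{\mathrm{FPGA}}, \qquad \mathrm{FF}_0 + (a_{\mathrm{FF}} N_f + b_{\mathrm{FF}}) \le \mathrm{FF}_{\mathrm{FPGA}} \tag{4}$$
where $\mathrm{LUT}_0$ and $\mathrm{FF}_0$ denote the resource utilization of the DUT, and $a_{\mathrm{LUT}}$, $b_{\mathrm{LUT}}$, $a_{\mathrm{FF}}$, and $b_{\mathrm{FF}}$ are the fitted parameters of the linear resource models. $\mathrm{LUT}_{\mathrm{FPGA}}$ and $\mathrm{FF}_{\mathrm{FPGA}}$ represent the maximum available LUT and FF resources of the target FPGA platform, and $N_f$ is the number of faults. The maximum number of fault sites supported on a given FPGA platform can be expressed as:
$$N_f \le \min\left( \frac{\mathrm{LUT}_{\mathrm{FPGA}} - \mathrm{LUT}_0 - b_{\mathrm{LUT}}}{a_{\mathrm{LUT}}},\; \frac{\mathrm{FF}_{\mathrm{FPGA}} - \mathrm{FF}_0 - b_{\mathrm{FF}}}{a_{\mathrm{FF}}} \right) \tag{5}$$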
Figure 6. Resource overhead scaling with the number of fault targets.
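The bound on the number of supported fault sites follows directly from the two linear resource constraints and can be evaluated with a few lines of Python. In the sketch below, every numeric value is a hypothetical placeholder (the fitted coefficients and DUT utilization are not reproduced from the paper), and the function name `max_fault_sites` is an illustrative choice.

```python
import math


def max_fault_sites(lut_fpga, ff_fpga, lut0, ff0, a_lut, b_lut, a_ff, b_ff):
    """Largest integer N_f satisfying both the LUT and the FF constraint."""
    lut_bound = (lut_fpga - lut0 - b_lut) / a_lut
    ff_bound = (ff_fpga - ff0 - b_ff) / a_ff
    return max(0, math.floor(min(lut_bound, ff_bound)))


# Hypothetical device capacity, DUT utilization, and fitted coefficients.
n_max = max_fault_sites(
    lut_fpga=200_000, ff_fpga=400_000,   # available LUTs / FFs
    lut0=10_000, ff0=8_000,              # DUT utilization before instrumentation
    a_lut=1.5, b_lut=200,                # LUT model: slope, intercept
    a_ff=4.0, b_ff=300,                  # FF model: slope, intercept
)
```

With these placeholder numbers the FF constraint is the tighter one, which matches the qualitative observation that FF usage grows faster than LUT usage.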
Despite the increase in resource usage, the timing degradation remains within a predictable range. The change in maximum frequency ranges from −155 MHz to −784 MHz, depending primarily on circuit size and structural depth. Importantly, no abrupt timing collapse or nonlinear bottleneck is observed as circuit size increases, which confirms that the proposed architecture maintains stable and predictable timing behavior when scaling to tens of thousands of faults.
Nevertheless, because this is an instrumentation-based approach, the area overhead, particularly the FF consumption, scales linearly with the number of fault sites and may become a limiting factor on resource-constrained FPGA platforms. In scenarios involving full-chip fault injection on small or mid-range FPGAs, the proposed approach may therefore be impractical due to area and associated power constraints. The approach is best suited for safety-critical validation scenarios that require high fault coverage, cycle-accurate fault modeling, and fast fault simulation, where the additional area and power overhead represent a deliberate trade-off for accuracy and observability. On platforms with limited resources, practical deployment can be achieved through selective module-level fault injection or a reduced number of fault sites, without architectural modification.

5.2.2. Impact of Cluster Width on Scalability and Implementation

To further investigate the impact of cluster width on timing, fan-out, routing congestion, and resource usage, a simple design space exploration (DSE) experiment is conducted. The results are summarized in Table 6. The maximum achievable clock frequency is jointly influenced by circuit scale, fault count, and the fault-enable cluster width, exhibiting distinct sensitivity across designs. For the larger s38417 circuit (24,063 fault nodes), the clock frequency peaks at 171.5 MHz with a cluster width of 1024, while a comparable frequency of 141.75 MHz is observed at a width of 128. Similarly, the smaller s15850 circuit (10,399 fault nodes) achieves its highest frequency of 127.54 MHz at a width of 1024, with performance degrading to 121.15 MHz and 55.45 MHz as the width decreases to 512 and 128, respectively. Beyond global reset buffers, the fault-enable register width is the second most significant contributor to maximum fan-out, indicating that wider clusters can adversely impact timing due to increased local control fan-out.
Table 6. Different cluster width in s38417 and s15850.
Routing congestion correlates with the selected cluster width and circuit scale. For s38417, congestion levels of 3, 6, and 5 are reported at cluster widths of 128, 512, and 1024, respectively, whereas s15850 remains below Vivado’s congestion reporting threshold (Level 3) at widths of 128 and 512. In terms of resource utilization, the cluster width directly determines the number of FCUs, according to Equation (3). A wider cluster results in fewer required FCUs and consequently lower control-related resource overhead. Based on the observed routing congestion levels, a cluster width of 128 is selected in this work, and all processor-level experiments are conducted using this configuration.

5.2.3. Performance Evaluation on Processor Cores

The proposed architecture is compared with conventional centralized and shift-chain designs across the openE902, openE906, and openC906 processor cores. The corresponding resource usage and performance overhead of each fault-control strategy are reported in Table 7. Our architecture and the best results of different methods are highlighted in bold. For openE902, HCCA-SAFE reduces the ΔFmax degradation by 94.7% compared with shift-chain, with a resource overhead of 154.7% LUTs and 17.2% FFs, and also improves over the centralized method by 4 MHz. For openE906, the proposed architecture can achieve 23.5% and 27.0% improvements compared with conventional methods. For openC906, the proposed design improves timing by 81.0% compared with shift-chain, and 45.5% compared with centralized. Although the Shift-Chain approach yields the smallest hardware overhead, it also causes the most significant frequency degradation on both processors. Conversely, the proposed architecture requires more resources to implement the distributed control logic, but it consistently achieves the least performance loss, demonstrating superior timing scalability.
Table 7. Resource and frequency overhead of different fault-control architectures on processor cores.
The shift-chain architecture is structurally simpler than both the centralized and proposed methods, resulting in lower LUT and FF overhead. However, its serialized flip-flop–based control path introduces a long critical path, as control signals must propagate sequentially across hierarchical boundaries and FPGA tiles. This sequential propagation leads to substantial net delay, which dominates the overall critical path and causes severe performance degradation, as confirmed by the timing results in Table 8. The results of this work are highlighted with an orange background.
Table 8. Timing analysis on all processor cores across different fault-control architectures.
The centralized architecture places all fault control logic in a single module and relies on a wide, high-fanout control signal to activate individual FUs. Although its critical-path timing is better than that of the shift-chain design, the fanout grows rapidly with the number of faults—for instance, reaching 7725 on openC906 compared with 570 in our method—creating severe routing pressure and hotspots. This excessive fanout ultimately degrades timing and limits scalability, even when small designs appear manageable. As shown in Figure 7, the high-fanout signal generates congested routing regions and forces the critical path to traverse multiple overloaded channels, directly contributing to increased net delay and degraded performance.
Figure 7. Critical timing path and routing hotspot analysis of the centralized architecture on openC906.

5.2.4. Efficiency Results on Processor Cores

Figure 8 illustrates the fault simulation and verification system constructed for processor-level evaluation. On the FPGA board side, the simple System-on-Chip (SoC) integrates the processor core, on-chip memories, peripherals, fabric bus system, and the proposed fault-injection infrastructure. Faults are injected into the target processor IFU units under the control of the hybrid cascaded FCUs. A cycle-level timer and stop-flag logic are employed to accurately measure the execution latency of each fault injection run.
Figure 8. Experimental fault injection platform.
To verify the correctness of the fault injection campaign, a fault-free golden SoC is instantiated in parallel with the DUT. Both SoC systems execute the same matrix workload. During execution, the architectural states and system bus output signals of the faulty SoC and the golden SoC are continuously compared using an on-chip comparator. Any mismatch caused by injected faults is captured and reported through the JTAG interface at the end of all fault campaigns, while detailed execution information is transferred to the host PC.
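The golden-versus-faulty comparison can be pictured with a short software analogue. The Python sketch below compares per-cycle output traces and records the first mismatching cycle for each fault run; it is only a host-side illustration, since the paper performs this comparison in hardware, and the trace format and function names are hypothetical.

```python
def first_mismatch(golden_trace, faulty_trace):
    """Return the first cycle where the two traces differ, or None."""
    for cycle, (g, f) in enumerate(zip(golden_trace, faulty_trace)):
        if g != f:
            return cycle
    return None


def summarize(golden_trace, faulty_traces):
    """Map each fault ID to its first mismatching cycle (None if no mismatch)."""
    return {fault_id: first_mismatch(golden_trace, trace)
            for fault_id, trace in faulty_traces.items()}


golden = [1, 2, 3, 4]
report = summarize(golden, {0: [1, 2, 3, 4], 1: [1, 2, 7, 4]})
```

Here fault 0 produces no observable mismatch, while fault 1 first diverges from the golden trace in cycle 2.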
After the completion of all fault injection runs, the comparison results and timing statistics are collected on the host side for offline analysis. Experimental results show that the outcomes produced by the FPGA-based fault simulation are fully consistent with those obtained from software-based RTL simulation, confirming the functional correctness and reliability of the proposed fault-injection framework. Table 9 summarizes the fault simulation results obtained on the three processor cores.
Table 9. Software-based and FPGA-based simulation results.
Table 10 reports the mean time per fault injection run and the corresponding simulation speed-up achieved by the proposed HCCA-SAFE framework, compared with conventional RTL-based fault simulation. For openE902, the average injection time is reduced from 80.7 μs to 0.6367 μs, resulting in a speed-up factor of 127×. Similar trends are observed for openE906, where HCCA-SAFE achieves a 206× acceleration over RTL simulation. The most significant improvement is observed on openC906, for which the average injection time is reduced from 528.6 μs to 0.2490 μs, corresponding to a speed-up factor of 2123×.
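The reported speed-up factors are simply the ratio of the two mean injection times, which can be checked with a short calculation. The Python sketch below uses only the times quoted in the text for openE902 and openC906; the function name `speedup` is an illustrative choice.

```python
def speedup(t_rtl_us, t_fpga_us):
    """Ratio of mean RTL-simulation time to mean FPGA time per injection run."""
    return t_rtl_us / t_fpga_us


e902 = round(speedup(80.7, 0.6367))    # openE902
c906 = round(speedup(528.6, 0.2490))   # openC906
```

Rounded to the nearest integer, the ratios come out as 127 and 2123, in agreement with the factors stated above.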
Table 10. Mean time per injection run and estimated speed-up factor.

6. Conclusions

This paper presented HCCA-SAFE, a hybrid cascaded controller architecture that addresses the scalability, timing, and routing bottlenecks of conventional instrumentation-based fault-injection methods. Across three RISC-V processors, the proposed design delivers consistently better overall performance. On openE902, HCCA-SAFE reduces net delay from 27.276 ns to 22.535 ns (17.4%) compared with the baseline, and achieves 32.2% and 63.8% lower net delay than the centralized and shift-chain designs, respectively. On openE906, the proposed architecture exhibits timing improvements comparable to those observed on openE902. On openC906, it significantly reduces the dominant control-signal fanout to 570, compared with 7725 and 876 in prior methods, and lowers net delay to 7.506 ns. Beyond timing improvements, the proposed framework enables highly efficient and accurate fault simulation on processor-level systems. Experimental results show that the FPGA-based fault simulation outcomes are fully consistent with those obtained from software-based RTL simulation, while achieving substantial acceleration. Specifically, HCCA-SAFE provides speed-up factors of 127×, 206×, and 2123× over RTL-based fault simulation on openE902, openE906, and openC906, respectively, with the efficiency gains becoming more pronounced as processor complexity increases. These results demonstrate that HCCA-SAFE provides superior timing robustness, reduced global fanout, and improved routability, establishing a scalable and efficient foundation for large-scale FPGA-based functional-safety fault injection in automotive SoCs.

Author Contributions

Conceptualization, J.H. and X.L.; Methodology, J.H., Y.Z. and X.L.; Software, J.H. and X.L.; Validation, J.H.; Formal analysis, J.H.; Investigation, J.H., Y.Z., W.L. and X.L.; Resources, J.H. and X.L.; Data curation, J.H. and X.L.; Writing—original draft, J.H.; Writing—review & editing, J.H.; Visualization, J.H., Y.Z., Y.L., C.X. and Y.Y.; Supervision, J.H., Y.L., C.X. and Y.Y.; Project administration, J.H., Y.L., C.X., X.L. and Y.Y.; Funding acquisition, J.H., Y.L., C.X., X.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangdong Basic and Applied Basic Research Foundation under Grant 2025A1515012058, by the Natural Science Basic Research Program of Shaanxi under Grant 2025JC-YBQN-822, by the fund of innovation center of radiation application under Grant KFZC2025010203, by the Shenzhen Science and Technology Program under Grant KJZD20240903100506009, by the National Science Foundation of China under Grants 62090043 and 61934006, and by the National Key Research and Development Program of China under Grant 2022YFB4401301.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. ISO 26262-11:2018; Road Vehicles—Functional Safety—Part 11: Guidelines on Application of ISO 26262 to Semiconductors. ISO: Geneva, Switzerland, 2018.
  2. IEC 61508:2010; Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems. IEC: Geneva, Switzerland, 2010.
  3. Cadence Design Systems Inc. Xcelium Logic Simulator: Industry-Leading, Highest Performance Simulation Platform. Available online: https://www.cadence.com/en_US/home/solutions/automotive-solution/functional-safety.html (accessed on 24 December 2025).
  4. Lopez-Ongil, C.; Entrena, L.; Garcia-Valderas, M.; Portela, M.A.P.M.; Aguirre, M.A.; Tombs, J. A Unified Environment for Fault Injection at Any Design Level Based on Emulation. IEEE Trans. Nucl. Sci. 2007, 54, 946–950.
  5. Tuzov, I.; de Andrés, D.; Ruiz, J.C.; Hernández, C. BAFFI: A Bit-Accurate Fault Injector for Improved Dependability Assessment of FPGA Prototypes. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 17–19 April 2023; pp. 1–6.
  6. Bailan, O.; Rossi, U.; Wantens, A.; Daveau, J.-M.; Nappi, S.; Roche, P. Verification of Soft Error Detection Mechanism through Fault Injection on Hardware Emulation Platform. In Proceedings of the International Conference on Dependable Systems and Networks Workshops (DSN-W), Chicago, IL, USA, 28 June–1 July 2010; pp. 113–118.
  7. Xilinx Inc. ICAP—Internal Configuration Access Port. Available online: https://docs.amd.com/r/en-US/pg036_sem/ICAP-Interface (accessed on 24 December 2025).
  8. Fibich, C.; Horauer, M.; Obermaisser, R. Bitstream-Level Interconnect Fault Characterization for SRAM-Based FPGAs. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 17–19 April 2023; pp. 1–2.
  9. Ullah, A.; Sanchez, E.; Sterpone, L.; Cardona, L.A.; Ferrer, C. An FPGA-Based Dynamically Reconfigurable Platform for Emulation of Permanent Faults in ASICs. Microelectron. Reliab. 2017, 75, 110–120.
  10. Nowosielski, R.; Gerlach, L.; Bieband, S.; Payá-Vayá, G.; Blume, H. FLINT: Layout-Oriented FPGA-Based Methodology for Fault Tolerant ASIC Design. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2015; pp. 297–300.
  11. Ejlali, A.; Miremadi, S.G.; Zarandi, H.; Asadi, G.; Sarmadi, S.B. A Hybrid Fault Injection Approach Based on Simulation and Emulation Co-Operation. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), San Francisco, CA, USA, 22–25 June 2003; p. 479.
  12. Entrena, L.; Garcia-Valderas, M.; Fernandez-Cardenal, R.; Lindoso, A.; Portela, M.; Lopez-Ongil, C. Soft Error Sensitivity Evaluation of Microprocessors by Multilevel Emulation-Based Fault Injection. IEEE Trans. Comput. 2012, 61, 313–322.
  13. Lopez-Ongil, C.; Garcia-Valderas, M.; Portela-Garcia, M.; Entrena, L. Autonomous Fault Emulation: A New FPGA-Based Acceleration System for Hardness Evaluation. IEEE Trans. Nucl. Sci. 2007, 54, 252–261.
  14. Serrano, F.; Clemente, J.A.; Mecha, H. A Methodology to Emulate Single Event Upsets in Flip-Flops Using FPGAs through Partial Reconfiguration and Instrumentation. IEEE Trans. Nucl. Sci. 2015, 62, 1617–1624.
  15. Abideen, Z.U.; Rashid, M.H. EFIC-ME: A Fast Emulation-Based Fault Injection Control and Monitoring Enhancement. IEEE Access 2020, 8, 207705–207716.
  16. Huang, Z.-M.; Yang, D.-A.; Chen, H.H. FPGA-Based Emulation for Accelerating Transient Fault Injection in Microprocessors. In Proceedings of the IEEE Asian Test Symposium (ATS), Taichung, Taiwan, 21–24 November 2022; pp. 106–111.
  17. Aranda, L.A.; Ruano, O.; Garcia-Herrero, F.; Maestro, J.A. ACME-2: Improving the Extraction of Essential Bits in Xilinx SRAM-Based FPGAs. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1577–1581.
  18. XUANTIE-RV. OpenXuantie—OpenE902 Core. GitHub. Available online: https://github.com/XUANTIE-RV/opene902.git (accessed on 24 December 2025).
  19. XUANTIE-RV. OpenXuantie—OpenE906 Core. GitHub. Available online: https://github.com/XUANTIE-RV/opene906.git (accessed on 24 December 2025).
  20. XUANTIE-RV. OpenXuantie—OpenC906 Core. GitHub. Available online: https://github.com/XUANTIE-RV/openc906.git (accessed on 24 December 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
