Simulating the Effects of Sensor Failures on Autonomous Vehicles for Safety Evaluation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article presents a very interesting proposal for a sensor fault injection framework based on CARLA and ROS. As the "Design Under Test" (DUT), a current Autoware AD stack is used -- which could also be interesting for several readers.
However, some points definitely need to be corrected before acceptance. Some needed changes can be done quickly. Others might be solvable by a literature review, or by adding further own experiments.
Main points:
- How reliable is the simulation framework? Are the results valid? Are they meaningful for a later transfer of an AD stack to a real vehicle --> a comparison with real data is needed!
- "Results show that faults in LiDAR and IMU’s gyroscope ... In contrast, faults in GNSS and IMU’s accelerometers..." --> depend on the motion control / decision making components under test -- not depended on the "simulation-based fault injection framework" presented in this work. The authors should clearly distinguish between the meta layer of the framework they present and the design under test which gives just a demo case to study and demonstrate the framework. This distinction is possible with a small change in section 4 but needs some more work in section 5.
- several failures in the basics (see below)
- Relevance for the industry should be shown. Autoware/ROS is used in university research while OEM use other frameworks, e.g., AUTOSAR. Relevance for AUTOSAR users should be discussed, e.g., compare Hong and Moon: "Autonomous Driving System Architecture with Integrated ROS2 and Adaptive AUTOSAR"
Further points:
- line 67-70: how did the authors select these kinds of failures and these sensors? Why not offset/drift/stuck-at errors? Why not radar, ultrasonic, wheel encoders?
- line 116: Fig 1 is a self-citation and [12] is a self-citation -- which are not really needed and:
- line 96: the SPA architecture is definitely outdated (compare even textbooks like the Siciliano, chapter 8.2 History (of Robot Architectures)) and should at least be treated as a simplification and just one option for rare cases. line 105/106 "Autonomous vehicles (AVs) are built on a modular software stack that follows a sense-plan-act design" definitely needs to be corrected or explained.
- line 106/107 is wrong the way it is put here, and, moreover, [11] is no proof for it.
- page 3+4: such a separation of perception -- localization -- prediction does not make much sense
- 2.1.5 the terms "trajectory" and "plan" were mixed and at least not explained, as well as the stated separation between speed control and steering control.
- GNSS is proprioceptive? really???
- lines 212-215: sounds like the frequency shift gives both, relative speed and distance -- correct to only speed or explain distance computation, e.g., FMCW principle
- lines 221-223: how do [27-28] support the statement that interference is becoming a growing concern in dense traffic scenarios??? How can you make such a statement?
- lines 225-226: TOF just one of multiple LiDAR measurement methods
- lines 230-231: 1550 nm or NIR, and not only pulses. Better not call it "typical".
- lines 257-258: Why are the variations between the satellite’s atomic clock and the receiver’s crystal clock the critical point? The receiver clock bias must be known only, e.g., at the 1 μs level!?! (compare Maciuk 2021)
- In general: at several parts, rather strict statements are made. These cannot be maintained when taking a closer look... be careful on what you claim!
- section 2.4: missing is a description of what Autoware brings on top of ROS, some readers might be familiar with SLAM etc. on ROS but do not know Autoware
- especially 2.4.7 needs more details, but also for 2.4.3 - 2.4.6, e.g., used/available algorithms would be interesting
- in section 2.5 or in section 3: a discussion on types of sensor faults would be helpful: What are typical, relevant, and/or because of other reasons selected faults for simulation? There should be two motivations: (a) finding faults in the implementation by any means (e.g., as described with DriveFI) and (b) evaluating the system's behaviour at specifically those faults that can occur later in the real vehicle (this is missing at this point, comes later in section 5, but even there all types of faults are reduced to a single Gaussian -- this needs to be justified)
- line 693: "represents"?? meant was "presents", maybe?
- line 737: see above: why no other fault types? stuck-at? ceiling/floor effects? drifts? What are faults in real vehicles? If you have no own data, check studies.
- Especially in section 5 (and a bit in section 4) two points/questions need to be separated in order that the article is helpful and applicable for the reader:
(1) Is the proposed fault injection framework working correctly and is it useful and meaningful?
(2) How is it applied to a DUT and what are results (e.g., found faults) for the DUT (here: the current Autoware AD stack)?
- line 785: why is the camera not working?
- line 815-816: simulating all these measurement errors just by a single Gaussian distribution is rather simple. Ok for a very first trial but this should be said / other models be mentioned.
- line 983: "suggest that", later "likely due"??? The methods are open in ROS, and should be in Autoware, too. Which method is used in this version? What is the influence of GNSS and LiDAR?
- line 990: "Autoware demonstrated good tolerance": Autoware is a framework and architecture, integrated methods can/will change. Meant is not "Autoware" but a set of specific methods.
Author Response
We thank the reviewer for the careful, technically informed review. Their comments helped us correct inaccuracies, tighten claims, and clarify scope and contributions. Below we respond point‑by‑point. For each item we indicate how the manuscript was revised.
The revisions on the new version of the manuscript are highlighted in yellow.
Comment 1:
How reliable is the simulation framework? Are the results valid? Are they meaningful for a later transfer of an AD stack to a real vehicle --> a comparison with real data is needed!
Response 1:
This is the second step of our research. We started by analyzing sensor failures [1], and in this step, we present a framework able to emulate some of these sensor failures, which can be used in laboratory experiments aimed at understanding how sensor failures impact AD vehicles. The framework uses components which are already established in academia and industry (CARLA and Autoware). We performed a validation of the framework by executing several driving scenarios with and without faults (section 6.1). When driving without faults, we emulated minor sensor disturbances that fall inside the limits of their nominal values, according to the sensors’ specifications. In the scenarios of driving with major faults, we used the information on the ways that sensors can fail [1]. We agree with the importance of validation against real-world datasets and plan to do this in the next step. We have now explicitly acknowledged this limitation in the Conclusions and Future Work (section 7, lines 1219-1229), citing potential datasets (e.g., LDF, ROD2021) that can be used in future studies.
[1] Matos, F.; Bernardino, J.; Durães, J.; Cunha, J. A Survey on Sensor Failures in Autonomous Vehicles: Challenges and Solutions. Sensors 2024, 24, 5108, doi:10.3390/s24165108.
Comment 2:
"Results show that faults in LiDAR and IMU’s gyroscope ... In contrast, faults in GNSS and IMU’s accelerometers..." --> depend on the motion control / decision making components under test -- not depended on the "simulation-based fault injection framework" presented in this work. The authors should clearly distinguish between the meta layer of the framework they present and the design under test which gives just a demo case to study and demonstrate the framework. This distinction is possible with a small change in section 4 but needs some more work in section 5.
Response 2:
We have revised the Abstract (lines 25-26), Sections 4.3 (lines 823-830), Section 5.1 (lines 900-906), and Section 7 (lines 1219-1229) to clarify this distinction, ensuring it is clear which results pertain to the framework itself, and which are specific to the Autoware DUT demonstration.
Comment 3:
several failures in the basics (see below)
Response 3:
These points concerning the basics are addressed individually in our responses to Comments 5 through 27 below, where each one is corrected in the manuscript.
Comment 4:
Relevance for the industry should be shown. Autoware/ROS is used in university research while OEM use other frameworks, e.g., AUTOSAR. Relevance for AUTOSAR users should be discussed, e.g., compare Hong and Moon: "Autonomous Driving System Architecture with Integrated ROS2 and Adaptive AUTOSAR"
Response 4:
We added industry relevance and an AUTOSAR discussion in the Related Work (lines 712-731), noting recent ROS 2 to AUTOSAR Adaptive interoperability (e.g., ROS 2 SOME/IP bridges) and explaining how our fault profiles can be replayed at AUTOSAR communication endpoints (ara::com/SOME‑IP) or at simulator interfaces. We also added Future Work (lines 1240-1244) to prototype a ROS 2 to SOME/IP gateway and an AUTOSAR‑native injector.
Comment 5:
line 67-70: how did the authors select these kinds of failures and these sensors? Why not offset/drift/stuck-at errors? Why not radar, ultra sonic, wheel encoders?
Response 5:
This paper aims first to present the framework's utility for laboratory experiments on AD resilience in the presence of sensor failures. We used a set of common sensors and failures to show its use and validity towards this aim. An extensive simulation campaign using a wide range of realistic failures to systematically evaluate the robustness of the AV stack is a natural future work we intend to carry out subsequently. Therefore, the framework was built in a way that easily adapts to other sensors and failure modes. To make this clearer, Section 4.3 (lines 872-883) now justifies the initial scope, focused on LiDAR, GNSS, and IMU because they are central to the tested Autoware stack and well supported in CARLA for reproducible fault injection. We selected silent failures and severe noise as representative extremes, covering complete signal loss to substantial measurement degradation. We acknowledge other fault classes (bias/offset, drift, stuck‑at, latency/jitter, dropouts) and additional sensors (radar, ultrasonic, wheel encoders) and note the framework’s extensibility (lines 1230-1232); these are included in future work.
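For illustration, the sketch below (hypothetical names, not the framework's code) shows how the two selected extremes, silent failure and severe Gaussian noise, can be expressed as interchangeable fault modes, which is how the framework remains open to additional fault classes:
```python
# Minimal, hypothetical sketch (not the framework's implementation): the two
# fault modes used in this study, silent failure and severe Gaussian noise,
# written as interchangeable callables so further modes can be added later.
import random
from typing import Callable, List, Optional

FaultMode = Callable[[List[float]], Optional[List[float]]]

def silent_failure(values: List[float]) -> Optional[List[float]]:
    # Complete signal loss: returning None means "do not publish this reading".
    return None

def severe_gaussian_noise(values: List[float], sigma: float = 5.0) -> Optional[List[float]]:
    # Substantial degradation: zero-mean Gaussian noise added to every channel.
    return [v + random.gauss(0.0, sigma) for v in values]

# Example: apply the selected mode to one (illustrative) GNSS reading.
mode: FaultMode = severe_gaussian_noise
faulty = mode([40.7128, -74.0060, 10.0])
```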
Comment 6:
line 116: Fig 1 is a self-citation and [12] is a self-citation -- which are not really needed and:
Response 6:
We revised the captions to remove the “from” wording and minimized self-citations. Nevertheless, we retained a reference in lines 200–201 to explicitly acknowledge that the description of sensors used in AVs within the Background Concepts section builds upon our previous publication.
Comment 7:
line 96: the SPA architecture is definitely outdated (compare even textbooks like the Siciliano, chapter 8.2 History (of Robot Architectures)) and should at least treated as a simplification and just one option for rare cases. line 105/106 "Autonomous vehicles (AVs) are built on a modular software stack that follows a sense-plan-act design" definitely needs to be corrected or explained.
Response 7:
We presented the SPA model as a conceptual foundation and simplification, but the reviewer is right: this can give a wrong idea about the AV architecture used. Therefore, Section 2.1 (lines 105-123) has been rewritten: SPA is now presented solely as a historical, simplified pedagogical model; the text describes contemporary hybrid/reactive architectures (e.g., layered control, behavior trees, continuous re‑planning with MPC) and clarifies that modern AV stacks do not follow a pure SPA pipeline.
Comment 8:
line 106/107 is wrong the way it is put here, and, moreover, [11] is no proof for it.
Response 8:
Section 2.1 is now revised, and this claim no longer exists.
Comment 9:
page 3+4: such a separation of perception -- localization -- prediction does not make much sense
Response 9:
We revised Section 2.1 and the architecture figure to present perception, localization, and prediction as interdependent processes with continuous feedback, consistent with modern AV architectures.
Comment 10:
2.1.5 the terms "trajectory" and "plan" were mixed and at least not explained, as well as the stated separation between speed control and steering control.
Response 10:
We clarified terminology in Sections 2.1.2–2.1.3 (lines 151-171): “plan/behavior plan” vs. “motion/trajectory,” and “lateral (steering) vs. longitudinal (speed/brake)” control. We removed residual SPA wording to avoid confusion.
Comment 11:
GNSS is proprioceptive? really???
Response 11:
It was an error; GNSS is, of course, exteroceptive, and the text in Section 2.2 (line 209) has been fixed.
Comment 12:
lines 212-215: sounds like the frequency shift gives both, relative speed and distance -- correct to only speed or explain distance computation, e.g., FMCW principle
Response 12:
Revised to explain FMCW principles succinctly (lines 224-227): the beat frequency yields range and the Doppler shift yields relative velocity; references were added.
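For the interested reader, a small numerical illustration of these FMCW relations follows (all parameter values are examples, not taken from the manuscript):
```python
# Illustrative numeric check of the FMCW relations: the beat frequency encodes
# range, the Doppler shift encodes relative radial velocity. Example values only.
c = 3.0e8          # speed of light [m/s]
B = 1.0e9          # chirp bandwidth [Hz]
T = 50e-6          # chirp (sweep) duration [s]
f_c = 77e9         # carrier frequency [Hz]

f_beat = 1.0e6     # measured beat frequency [Hz]
f_doppler = 5.0e3  # measured Doppler shift [Hz]

range_m = c * f_beat * T / (2 * B)      # R = c * f_b * T / (2 * B)
speed_mps = c * f_doppler / (2 * f_c)   # v = c * f_d / (2 * f_c) = lambda * f_d / 2

print(f"range ~= {range_m:.1f} m, relative speed ~= {speed_mps:.2f} m/s")
# -> range ~= 7.5 m, relative speed ~= 9.74 m/s
```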
Comment 13:
lines 221-223: how do [27-28] support the statement that interference is becoming a growing concern in dense traffic scenarios??? How can you make such a statement?
Response 13:
We replaced/augmented (lines 233-235) with citations to works explicitly addressing automotive radar interference and mitigation in dense environments, and we softened the claim to be evidence‑based.
Comment 14:
lines 225-226: TOF just one of multiple LiDAR measurement methods
Response 14:
Updated Section 2.2.3 (lines 237-242) to cover ToF, AMCW/phase‑shift, and FMCW LiDAR ranging with appropriate references.
Comment 15:
lines 230-231: 1550 nm or NIR, and not only pulses. Better not call it "typical".
Response 15:
Wording adjusted (lines 245-246). Wavelengths are described as common options (905 nm, 1550 nm/NIR), and both pulsed ToF and continuous‑wave modalities are mentioned; we avoid “typical.”
Comment 16:
lines 257-258: Why are the variations between the satellite’s atomic clock and the receiver’s crystal clock the critical point? The receiver clock bias must be known only, e.g., at the 1 μs level!?! (compare Maciuk 2021)
Response 16:
Section 2.2.5 (lines 273-276) now clarifies receiver clock bias and stability, aligning with Maciuk (2021). We note that the absolute bias is estimated together with position/time, so stability (and atmospheric/multipath effects) dominates accuracy; we cite the suggested reference.
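For reference, the standard single-frequency pseudorange model underlying this clarification can be written as follows (general GNSS notation, not quoted from the manuscript):
```latex
% Standard pseudorange model: the receiver clock bias \delta t_r is estimated
% jointly with the position (x, y, z), so its stability, together with the
% atmospheric (I, T) and noise/multipath terms, governs positioning accuracy
% rather than its absolute value.
\rho^{s} = \lVert \mathbf{r}^{s} - \mathbf{r}_{r} \rVert
         + c\,\bigl(\delta t_{r} - \delta t^{s}\bigr)
         + I^{s} + T^{s} + \varepsilon^{s}
```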
Comment 17:
In general: at several parts, rather strict statements are made. These cannot be maintained when taking a closer look... be careful on what you claim!
Response 17:
We audited the manuscript to remove over‑generalized claims and tightened wording to better align with the available evidence.
Here are some examples:
Example 1
Original: “Autonomous vehicles (AVs) are built on a modular software stack that follows a sense-plan-act design…”
Revised lines 105-108: “The Sense–Plan–Act (SPA) model has served as a conceptual foundation… While useful as a simplification, SPA is rarely used in its pure form for modern AVs because it struggles with real-time uncertainty…”
Example 2
Original: “Additionally, as more vehicles adopt frequency-modulated continuous-wave (FMCW) radar systems, interference caused by overlapping signal frequencies is becoming a growing concern in dense traffic scenarios [27,28].”
Revised lines 233-235: “Additionally, FMCW automotive radars can experience mutual interference, particularly when many vehicles operate nearby, which has been documented experimentally and is an active area of mitigation research [34,35].”
Example 3
Original: “LiDAR systems typically use 1550 nm laser pulses and can detect objects at distances up to 300 meters [28,29].”
Revised lines 237-242: “LiDAR sensors estimate distance by emitting laser light and inferring range from the returned signal. LiDAR distance can be measured using time-of-flight (ToF), amplitude/phase-modulated continuous wave (AMCW/phase-shift), and frequency-modulated continuous wave (FMCW) methods; ToF relies on round-trip time, AMCW estimates the phase shift of an intensity-modulated carrier, and FMCW uses coherent beat-frequency detection and can directly yield Doppler velocity [36–40].”
and lines 245-247: “LiDARs operate in the near-infrared (e.g., around 905 nm or 1550 nm) and can detect objects at distances up to 300 meters [41,42].”
Example 4
Original: “Timing errors, due to variations between the satellite’s atomic clock and the receiver’s crystal clock.”
Revised lines 273-276: “Clock bias and synchronization, due to residual satellite-clock error and receiver clock drift/jitter. Clock/synchronization effects are present, but the receiver’s absolute bias is estimated together with (x, y, z), so stability, not absolute offset, governs positioning accuracy [49].”
Comment 18:
section 2.4: missing is a description of what Autoware brings on top of ROS, some readers might be familiar with SLAM etc. on ROS but do not know Autoware
Response 18:
Expanded Section 2.4 (lines 434-438) to outline Autoware’s added layers over ROS 2: perception pipelines, LiDAR‑map and lane‑based localization, behavior/motion planning, traffic light interface, and control interfaces.
Comment 19:
especially 2.4.7 needs more details, but also for 2.4.3 - 2.4.6, e.g., used/available algorithms would be interesting
Response 19:
We enriched Sections 2.4.3–2.4.7 (lines 486-516, 523-524, 555-586, 593-618, 623-633) with brief algorithmic descriptions and configuration notes (such as NDT for LiDAR localization, behavior planners, lateral/longitudinal controllers, supervision).
Comment 20:
in section 2.5 or in section 3: a discussion on types of sensor faults would be helpful: What are typical, relevant, and/or because of other reasons selected faults for simulation? There should be two motivations: (a) finding faults in the implementation by any means (e.g., as described with DriveFI) and (b) evaluating the system's behaviour at specifically those faults that can occur later in the real vehicle (this is missing at this point, comes later in section 5, but even there all types of faults are reduced to a single Gaussian -- this needs to be justified)
Response 20:
We added a short discussion on common sensor fault classes (bias/offset, drift, stuck‑at, latency/jitter, dropouts, outliers) in lines 872-883, and clarified our two motivations: (a) demonstrate the framework with structured, reproducible faults, and (b) target faults that plausibly occur in practice. We explicitly justify the initial Gaussian noise simplification and list additional non‑Gaussian/colored models we plan to evaluate.
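For illustration, the fault classes mentioned above can be sketched as simple per-sample transformations (hypothetical names and default parameters, not the framework's implementation):
```python
# Hypothetical sketch of common sensor fault classes as per-sample transformations.
import random

def bias(value: float, offset: float = 0.5) -> float:
    return value + offset                           # constant offset

def drift(value: float, t: float, rate: float = 0.01) -> float:
    return value + rate * t                         # error grows with elapsed time t [s]

def stuck_at(value: float, frozen: float = 0.0) -> float:
    return frozen                                   # output frozen at a fixed level

def dropout(value: float, p: float = 0.1):
    return None if random.random() < p else value   # reading randomly lost

def gaussian_noise(value: float, sigma: float = 1.0) -> float:
    return value + random.gauss(0.0, sigma)         # the first-trial model used here
```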
Comment 21:
line 693: "represents"?? meant was "presents", maybe?
Response 21:
Yes, corrected to “presents” (line 803).
Comment 22:
line 737: see above: why no other fault types? stuck-at? ceiling/floor effects? drifts? What are faults in real vehicles? If you have no own data, check studies.
Response 22:
Addressed together with the expanded justification in Section 4.3 and the added discussion of fault types, in lines 872-883.
Comment 23:
Especially in section 5 (and a bit in section 4) two points/questions need to be separated in order that the article is helpful and applicable for the reader: (1) Is the proposed fault injection framework working correctly and is it useful and meaningful? (2) How is it applied to a DUT and what are results (e.g., found faults) for the DUT (here: the current Autoware AD stack)?
Response 23:
We added Section 5.7 Experimental Setup Validation Process (lines 1098-1115) to demonstrate that fault messages are injected as intended (pre/post‑fault stream inspection at the ROS Bridge). Results are then clearly attributed to the DUT. We also state that the goal of this work is not to find DUT defects exhaustively, but to illustrate how the framework surfaces system‑level effects of sensor faults.
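As an illustration of injection at the ROS 2 bridge layer, a minimal sketch follows; the node name, topic names, and noise level are assumptions for demonstration and do not reflect the exact Autoware/CARLA topic remapping:
```python
# Hypothetical rclpy sketch in the spirit of Section 5.7: messages from the
# simulator topic are inspected, optionally perturbed, and republished on the
# topic consumed by the DUT, enabling pre/post-injection stream comparison.
import random
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import NavSatFix

class GnssFaultInjector(Node):
    def __init__(self):
        super().__init__('gnss_fault_injector')
        self.sigma = 5e-5  # illustrative noise level in degrees
        self.pub = self.create_publisher(NavSatFix, '/sensing/gnss/fix_faulty', 10)
        self.sub = self.create_subscription(NavSatFix, '/carla/ego/gnss', self.on_fix, 10)

    def on_fix(self, msg: NavSatFix):
        # Pre-fault values could be logged here for the validation comparison.
        msg.latitude += random.gauss(0.0, self.sigma)
        msg.longitude += random.gauss(0.0, self.sigma)
        # Post-fault values are what the DUT receives.
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(GnssFaultInjector())
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```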
Comment 24:
line 785: why is the camera not working?
Response 24:
We expanded the explanation in Section 5.1 (lines 915-925): the camera‑based traffic light detection was not functioning reliably in the Autoware version used, and we include references to the relevant GitHub issues. Since the goal of the paper is to demonstrate our framework, and not the robustness of the DUT itself, this did not affect the study.
Comment 25:
line 815-816: simulating all these measurement errors just by a single Gaussian distribution is rather simple. Ok for a very first trial but this should be said / other models be mentioned.
Response 25:
We agree; in lines 872-883, we now state explicitly that Gaussian noise is a first‑trial simplification to validate the setup. We enumerate additional noise/fault models (bias/offset, drift, stuck‑at, latency/jitter, heavy‑tailed/mixture, colored noise) supported by the framework and planned for evaluation.
Comment 26:
line 983: "suggest that", later "likely due"??? The methods are open in ROS, and should be in Autoware, too. Which method is used in this version? What is the influence of GNSS and LiDAR?
Response 26:
We removed ambiguous phrasing and specified, based on Autoware documentation/configuration, that localization relies primarily on LiDAR‑map matching (NDT), with GNSS used as auxiliary input (lines 1146-1153). This explains the observed robustness to GNSS noise/dropout in our scenarios.
Comment 27:
line 990: "Autoware demonstrated good tolerance": Autoware is a framework and architecture, integrated methods can/will change. Meant is not "Autoware" but a set of specific methods.
Response 27:
Reformulated (lines 1146-1163, 1168-1171, 1176-1178) to attribute outcomes to the tested configuration and methods of the Autoware version under study (e.g., NDT localization, selected controllers), rather than to “Autoware” in general.
Reviewer 2 Report
Comments and Suggestions for Authors
- On what basis are the faults considered tolerated or not?
- Needs to be more specific about the novelty of this work.
- The research gap is not mentioned properly.
- There is a missing connection between the existing work and the proposed framework.
- Discusses other research but fails to thoroughly examine the shortcomings that the study resolves.
- The core components, including the "Sensor Failure Model," "Sensor Fusion Module," and "Autonomous Control System," are deficient in essential specifics regarding their implementation, particular algorithms, and parameters.
- It is unclear how the faults are precisely controlled (timing, duration, intensity).
- In Figures, it is not necessary to add from [12]; only [12] is enough.
- The discussion is qualitative and requires precise numerical data, such as error values and percentages.
- Explains outcomes but fails to elucidate the reasons for performance variations in the “Result” section.
- Comparison with other fusion techniques or fault injection methods is absent.
Author Response
We would like to sincerely thank the reviewer for their constructive feedback, which has helped us improve the clarity, technical depth, and overall quality of our manuscript. Below, we provide detailed, point-by-point responses to each of the reviewer’s comments.
The revisions on the new version of the manuscript are highlighted in yellow.
Comment 1:
On what basis are the faults considered tolerated or not?
Response 1:
We clarified in Section 5.5 (lines 1082-1086) that faults are considered tolerated if the vehicle reaches its destination without any collision, lane departure, or timeout. Specifically, we classify run outcomes into four categories: Collision, Out, Timeout, and OK. This classification was already present but has now been explicitly linked to the notion of fault tolerance and system failure to address the reviewer’s request.
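For clarity, this outcome classification can be summarised by the following sketch (hypothetical function name, not the evaluation script itself):
```python
# Hypothetical sketch of the run-outcome classification described in Section 5.5;
# a fault is considered tolerated only when the run outcome is "OK".
def classify_run(collided: bool, left_lane: bool, elapsed_s: float, timeout_s: float) -> str:
    if collided:
        return "Collision"
    if left_lane:
        return "Out"
    if elapsed_s >= timeout_s:
        return "Timeout"
    return "OK"

tolerated = classify_run(False, False, 312.0, 600.0) == "OK"
```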
Comment 2:
Needs to be more specific about the novelty of this work.
Response 2:
We have emphasized in the Related Work (lines 712-731) section that this is the first framework that integrates CARLA and Autoware with a structured sensor fault injection mechanism operating at the ROS 2 bridge layer. Unlike prior works, it enables reproducible scenario-based testing with sensor faults across multiple sensors.
Comment 3:
The research gap is not mentioned properly.
Response 3:
We have revised the related work (lines 713-732) to explicitly state the gap: existing works either focus on ideal or mildly perturbed conditions, or use proprietary AV stacks without structured, reproducible sensor fault injection for open-source production-grade stacks such as Autoware. Our work fills this gap by providing an open, reproducible framework.
Comment 4:
There is a missing connection between the existing work and the proposed framework.
Response 4:
We clarified in Related Work (lines 712-721) how our framework builds upon prior tools (AVFI, DriveFI, CarFASE) while addressing their limitations by highlighting differences in scope, methodology, and intended use cases.
Comment 5:
Discusses other research but fails to thoroughly examine the shortcomings that the study resolves.
Response 5:
In Related Work, we now highlight specific shortcomings of prior frameworks, such as limited sensor coverage, focus on automated search rather than controlled analysis, or use of non-production-grade AV stacks, and explain how our framework addresses each of these (lines 712-731).
Comment 6:
The core components, including the 'Sensor Failure Model,' 'Sensor Fusion Module,' and 'Autonomous Control System,' are deficient in essential specifics.
Response 6:
Section 5.2 presents the Fault Model we adopt in our framework, which emulates the sensor failures affecting the AV system, therefore representing the Sensor Failure Model from the perspective of the isolated sensors. We added in line 927 a clarification that a failure from the perspective of a subsystem (the sensor) appears as a fault inside the larger system (the AV).
The goal of this paper is to present a framework able to emulate these sensor failures and evaluate AV behavior when facing these failures. The paper does not aim to re-implement these AV modules, but rather to use an existing AV stack (Autoware) to perform fault injection experiments and demonstrate the capability of the framework. We have slightly expanded the Background Concepts section to clarify how these components are implemented in Autoware for context (lines 431, 434-438, 486-514, 523-524, 555-586, 593-618, 623-633).
Comment 7:
It is unclear how the faults are precisely controlled (timing, duration, intensity).
Response 7:
We added clarifications in Section 5.2 describing the activation mechanism for each fault, its duration, trigger, and location (lines 935-936).
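For illustration, such activation parameters could be captured as follows (a sketch with assumed field names, not the framework's actual configuration format):
```python
# Hypothetical sketch of how a fault's activation can be parameterised
# (target, type, timing, duration, intensity).
from dataclasses import dataclass

@dataclass
class FaultSpec:
    target_topic: str       # where the fault is injected (location)
    fault_type: str         # e.g., "silent" or "gaussian_noise"
    start_s: float          # trigger time after scenario start [s]
    duration_s: float       # how long the fault stays active [s]
    intensity: float = 0.0  # e.g., noise sigma; unused for silent faults

    def active(self, t: float) -> bool:
        return self.start_s <= t < self.start_s + self.duration_s

spec = FaultSpec('/carla/ego/gnss', 'gaussian_noise', start_s=30.0, duration_s=20.0, intensity=5e-5)
```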
Comment 8:
In Figures, it is not necessary to add 'from [12]'; only '[12]' is enough.
Response 8:
We have removed 'from' in all figure captions where applicable.
Comment 9:
The discussion is qualitative and requires precise numerical data.
Response 9:
Since we are describing our fault injection framework and presenting its capabilities, we opted for a qualitative analysis of the experiments instead of a quantitative one, as would be expected if the goal were to evaluate the robustness of the AV stack.
Where applicable, such as in the noise fault definitions and severity levels, we have ensured that numerical parameters are explicitly stated; therefore, the classifications we used for the outcomes are clear.
Comment 10:
Explains outcomes but fails to elucidate the reasons for performance variations in the 'Result' section.
Response 10:
While our study primarily focuses on observing system behavior under faults, with the objective of validating the framework, we have added some explanations based on Autoware documentation in Section 6 (lines 1146-1163, 1168-1171, 1176-1178) to provide explanations for observed trends.
Comment 11:
Comparison with other fusion techniques or fault injection methods is absent.
Response 11:
Fusion techniques are not the central focus of our paper. We have extended Related Work to briefly note other fault injection methods and position our work in relation to them (lines 712-720).
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Authors have modified the paper.