This section discusses the key patterns of safety failures identified through the hybrid Swiss Cheese–SHELL analysis. By examining real-world AV incidents, we highlight how technical weaknesses and human-system interactions contribute to these failures, and how the findings can inform practical safety improvements.
5.1. An Integrated Swiss Cheese–SHELL Framework
First, we use the Swiss Cheese model to study failures in AVs. The goal is to characterize how multilayer barrier failures align to produce autonomous driving safety incidents, identify the most common alignment pathways, and prioritize the highest-impact mitigations. The Swiss Cheese model applied to AV failures is depicted in Figure 6. The following discussion examines each slice that must be checked for holes in every incident.
The Department of Motor Vehicles of the State of California reports safety risks exposed by accidents and disengagements to the public during on-road testing [54]. When an AV disengages during an on-road test, this does not necessarily lead to a traffic accident, but it represents a risk event that requires the human operator to step in and take control of the vehicle [55]. Apple classifies disengagement events into two general types: manual takeover and software disengagement. A manual takeover occurs when the AV operators determine that manual control is necessary instead of automated control. In some cases, these events result from actual driving conditions, such as emergency vehicles, construction zones, or unexpected objects near the road. A software disengagement occurs when an issue with perception, motion planning, control, or communication is detected. For instance, if a sensor cannot adequately detect and track an object in the surrounding environment, a human driver will take over driving. Whenever the decision layer fails to generate a motion plan, or the actuator fails to respond in a timely or appropriate manner, a disengagement occurs. Nevertheless, different manufacturers may define disengagement events differently, so the disengagement events reported by some companies may be incomplete. There may be significant differences, primarily due to the maturity of autonomous technology; even so, the possibility that differing definitions of disengagement during on-road testing contribute to the variation cannot be ruled out. Policymakers can play a significant role in defining disengagement events in a way that accounts for perception errors, decision errors, action errors, and system faults, among others.
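The two-way taxonomy above can be expressed as a small coding scheme. The sketch below is illustrative; the enum names and helper are assumptions for demonstration, not an official reporting schema:

```python
from dataclasses import dataclass
from enum import Enum

class DisengagementType(Enum):
    MANUAL_TAKEOVER = "manual_takeover"   # operator judged manual control necessary
    SOFTWARE = "software"                 # system detected an internal issue

class FaultLayer(Enum):
    PERCEPTION = "perception"         # e.g., sensor fails to detect/track an object
    PLANNING = "planning"             # no valid motion plan generated
    CONTROL = "control"               # actuator response late or inappropriate
    COMMUNICATION = "communication"   # connectivity or V2X issue
    NONE = "none"                     # operator caution only; no fault flagged

@dataclass
class DisengagementEvent:
    description: str
    dtype: DisengagementType
    fault: FaultLayer = FaultLayer.NONE

def classify(operator_initiated: bool) -> DisengagementType:
    """Top-level split of the taxonomy: who triggered the disengagement."""
    return (DisengagementType.MANUAL_TAKEOVER if operator_initiated
            else DisengagementType.SOFTWARE)
```

A regulator-defined scheme along these lines would let heterogeneous manufacturer reports be coded against the same fault-layer categories.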
A perception layer acquires data from multiple sensors to obtain a real-time picture of the environment and make decisions [
56]. Sensor technology plays a major role in determining the development of AVs, especially its complexity, reliability, suitability, and maturity [
57]. Various sensors can be used to detect the environment, including light detection and ranging (LiDAR) sensors, cameras, radars, ultrasonic sensors, contact sensors, and global positioning systems. Each sensing technology has its own functions and capabilities [
58]. Perception of other road users, traffic signals, and other hazards may be disturbed if AVs misperceive their status, location, or movement, or fail to detect potential hazards.
A variety of factors, including hardware, software, and communication, can cause a perception error. Sensing technology plays an important role in perception; therefore, perception errors can result from hardware, including sensors. In particular, when sensors fail or degrade, the vehicle's perception can be compromised, leading to confusion in decision making and risky driving conditions. Considering this, the development of reliable and fault-tolerant sensing technology may provide a solution to these problems. Perception errors may also derive from software malfunctions that can affect the decision and action layers, potentially leading to mission failure or safety issues. As AVs approach full automation, communication errors will become increasingly important. In some cases, communication errors are caused by problems between the AVs and the corresponding infrastructure [59], other road users [58], and the internet [60]. Modern transportation systems require effective interpersonal communication [61]. Communication is the foundation of AVs, enabling coordinated movements and ensuring the safety of all road users, including pedestrians, bicyclists, and construction workers [62]. Communication methods include gestures, facial expressions, and vehicular devices, and cultural, contextual, and experiential factors all play a role in how these messages are comprehended. These factors also represent AV technology's most significant challenges [61].
All information processed in the perception layer is analyzed by the decision layer. Decisions are made and actions are taken based on the information generated by the decision layer [
63]. The decision-making system relies upon SA for both short- and long-term planning. Several tasks are included in short-term planning, including trajectory generation, obstacle avoidance, event management, and maneuver management [
64]. Meanwhile, route planning and mission planning play a vital role in long-term planning [65,66].
There are two main causes of decision errors: system failures and human error. Effective use of an AV requires that it take over driving or warn the driver whenever necessary, with a minimal number of false alarms and acceptable performance on outcomes such as safety [67]. A significant reduction in false alarm rates can be achieved while maintaining adequate accuracy and safety as AV technology improves over time [68]. If the algorithm is incapable of detecting all hazards effectively and efficiently, the safety of AVs will be compromised. When drivers must take over from the automated vehicle, a few seconds may elapse before the driver can assume full control [69]. This adds uncertainty to the safe operation of the AV.
The reliability of an AV system depends on its architecture, hardware, and software. It is important to note, however, that AV architecture is highly dependent upon the level of automation, and therefore, AV safety may vary from one stage to another. It is also possible to find variations in AV architecture even at the same automation level across studies. A typical AV comprises a sensor-based perception system, an algorithm-based decision system, an actuator-based actuation system, and the interconnections between systems [
70]. As a rule of thumb, all components of an AV should function well to ensure its safety.
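This rule of thumb reflects the series structure of the pipeline: if any stage fails, the system fails, so stage reliabilities multiply. A minimal sketch with illustrative (not measured) numbers:

```python
from math import prod

def series_reliability(stage_reliabilities):
    """A serial AV pipeline (perception -> decision -> actuation, plus the
    interconnections between them) works only if every stage works, so the
    stage reliabilities multiply."""
    return prod(stage_reliabilities)

# Four stages at 99% each leave roughly a 4% chance of pipeline failure,
# which is why every component must "function well" for the whole to be safe.
overall = series_reliability([0.99, 0.99, 0.99, 0.99])
```

The sketch also shows why adding stages without improving each stage's reliability lowers end-to-end reliability.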
The action controller operates a conventional vehicle's steering wheel, throttle, or brake after the decision layer transmits the command [71]. Moreover, the actuators also monitor feedback variables and, based on this feedback, make new actuation decisions.
Human–machine interfaces (HMIs) for vehicles typically serve as support for the role of human dominance. Even though modern driving assistance systems allow vehicles to take over control in some situations, the typical HMI has not changed radically in the past few decades. As deep learning neural network technologies become more prevalent in automotive applications, multi-modal communications between drivers and vehicles can be enabled effectively [
72]. An intuitive multimodal HMI supports smooth switching between human manual control and automated operation, allowing drivers to interact with AVs easily. Unlike the steering wheel, AVs do not have a standard multimodal HMI. It should also be noted that multimodal communications differ from buttons and knobs, which are less susceptible to cultural variations across countries. Original Equipment Manufacturers (OEMs) in the automotive industry can distribute their typical HMI across many countries and markets with very little adaptation. However, OEMs need to adapt multimodal HMIs to cultural influences, driving habits, social cognition, and the legal traffic system. An elaborate description of design methodologies for HMI systems at various automation levels is given in [18]. In partially automated AVs (SAE Level 2) and conditionally automated AVs (SAE Level 3), multimodal HMI is intended not only to reduce driver fatigue but also to maintain a level of driver engagement that ensures the driver can take over control quickly when switching occurs. Multimodal HMI design therefore involves a trade-off between these two goals.
Figure 6.
Hybrid Swiss Cheese and SHELL Approach.
Following the safety failure analysis performed with the Swiss Cheese model, we use the SHELL model to systematically identify, code, classify, and prioritize human-system interface failures that contribute to safety incidents in AVs. Human actors, such as safety drivers and other road users, are at the center of the conceptual framing, serving as liveware. The four interfaces in the SHELL model are discussed as follows.
The liveware-software (L-S) interface refers to the interactions, communication, and dependencies between human actors (liveware) and the software components of an autonomous driving system. The software term here covers the user interface, the automation logic, tool chains, development pipelines, and telemetry and diagnostics systems. Failures in L-S occur when software fails to appropriately present, manage, or support human interaction with those functions, or when human expectations/actions misalign with what the software is doing.
Table 2 shows common ways in which L-S failures manifest in AV safety incidents.
When the software interface fails as an L-S hole, it reduces the effectiveness of one of the defensive layers in the Swiss Cheese model, such as the HMI layer or the human-monitoring barrier. That hole must then align with failures in other layers, such as perception and planning, for a safety failure to occur. For instance, poor alert design combined with sensor occlusion and misprediction can lead to a collision if the human does not see the pedestrian in time and is not sufficiently alerted or able to intervene.
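This alignment condition can be made explicit with a minimal sketch. The five layer names below follow the decomposition used in this paper, and the all-layers-breached condition is a deliberate simplification of the model:

```python
# Defensive layers assumed here, following this paper's decomposition:
# governance, perception, planning/decision, control/actuation, and HMI.
DEFENSIVE_LAYERS = ("governance", "perception", "planning", "control", "hmi")

def hazard_passes_through(failed_layers):
    """Simplified alignment check: a hazard reaches the road user only when
    every defensive layer has a hole, i.e., no intact barrier remains."""
    return all(layer in failed_layers for layer in DEFENSIVE_LAYERS)
```

For example, `hazard_passes_through({"perception", "hmi"})` is false because intact barriers remain, whereas a failure set covering all five layers returns true.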
The liveware-hardware (L-H) interface refers to the interactions, dependencies, and potential mismatches between human actors (liveware) and the physical components and mechanical or electronic hardware of an AV system. Hardware includes sensors, actuators, mechanical components and mounting elements, calibration and alignment, and maintenance infrastructures or tools. If hardware malfunctions, degrades, is poorly maintained or installed, or is badly calibrated, human operators must respond. Failures at this interface can degrade perception, lead to misinterpretation of the environment, and delay or prevent corrective action.
Table 3 shows the most common types of hardware-related failures in AV settings.
In the Swiss Cheese model, L-H failures are holes in the hardware, perception, or actuation layer defenses. For instance, a misaligned LiDAR-camera extrinsic calibration, acting as a hardware hole, can cause perception to misclassify or mislocalize an object, opening another hole in the perception layer. The hardware error, combined with a planning module that does not account for uncertainty, amplifies the alignment of holes. If the human operator or safety driver is not notified, or misses the notification due to user interface issues, then that layer of defense fails, letting the hazard through. Therefore, L-H holes often trigger or magnify failures in higher layers, such as perception, prediction, planning, and control. Analyses that neglect hardware issues may underestimate the chance of hole alignment leading to collisions or near-misses.
The liveware-environment (L-E) interface concerns how human operators, such as safety drivers, interact with, are affected by, or must adapt to the environmental conditions under which autonomous driving systems operate. Environment in this case spans ambient conditions, road geometry and infrastructure, traffic complexity, and regulatory, legal or societal context. The human component (liveware) must perceive, understand, anticipate, and respond effectively, given the environmental context. If the environment degrades or presents surprises, those human processes can fail.
Table 4 shows typical failure modes in L-E, how they happen, and how they can undermine safety in autonomous driving systems.
Within the Swiss Cheese model, L-E holes often act as enablers or amplifiers: they weaken upstream defenses or worsen the consequences of other holes. For instance, under heavy rain (an environment hole), sensor perception (the hardware or software layer) degrades, leading to missed pedestrians. If the human safety driver is fatigued or the alert user interface is low in salience, the takeover is delayed or absent, and a collision can result. Poor road geometry (curves) and faded lane markings degrade lane detection (perception), while low lighting (environment) worsens visibility, leading to system misalignments and operator misinterpretations; the planning layer may then assume more confidence than is warranted. Another example of an L-E hole is regulatory or temporary signage: construction zones might not be encoded in maps, so the system may not know about the change, and even if the human sees it, the system's behavior may lag or misinterpret it. Thus, L-E holes often set the stage for active failures elsewhere or force human operators to work under a higher workload or with degraded SA, increasing the probability of error.
The liveware-liveware (L-L) interface refers to human-to-human interactions that are relevant to safety in autonomous driving systems. These human actors may include a safety or fallback driver, remote operators or teleoperators, operations or fleet management staff, engineers or developers maintaining the system, first responders or regulators, and other road users such as pedestrians, human-driven vehicles, and cyclists, especially when informal communication (gestures and eye contact) matters. L-L covers communication, coordination, shared SA, handover protocols, expectations, organizational culture, rules of engagement, escalation procedures, and informal behavioral norms among these agents.
Table 5 shows some typical failure modes in L-L.
The L-L holes often serve as latent or active contributors to multi-layer failure alignment in the Swiss Cheese model. Several pathways exist. The automation layers (software or perception) may detect a hazard, but if the remote operator or safety driver is not made aware of it, for example through a poor handover, that layer fails because human intervention is late or incorrect. Hardware degradation or environmental conditions can degrade system performance; if operators or maintenance teams do not communicate the issue to safety drivers, the hole persists. Regulatory or cultural lapses at the organizational or governance layer may lead to vague roles, leaving no one in charge during incidents. Informal communication, such as gestures and expectations among road users, interacts with the environment and perception; if other road users misinterpret AV behavior, or human drivers do not understand AV behavior, risky situations can result.
While the Swiss Cheese model and the SHELL model have been widely applied independently in safety-critical domains, such as aviation and healthcare, their combined application has not been formally proposed in the context of autonomous driving systems. This study intentionally integrates the two models to address the complex socio-technical nature of AV safety failures. The Swiss Cheese model provides a system-level perspective by illustrating how latent failures align across multiple defensive layers, including governance, perception, planning, control, and human–machine interface. However, it does not explicitly capture the detailed mechanisms of human–system interaction failures. Conversely, the SHELL model focuses on interface-level mismatches between humans and software, hardware, environment, and other humans, but lacks a structured representation of how these failures propagate across system defenses. By integrating the two models, this hybrid approach enables a unified analysis that links interface-level human factors to system-level failure pathways, offering a more comprehensive and empirically grounded understanding of autonomous driving safety incidents. This integration emerged directly from the analysis of real-world AV incident data, where failures were rarely isolated and instead resulted from the alignment of technical, organizational, and human factors.
Table 6 presents the core operational contribution of the hybrid framework by explicitly mapping SHELL interface failures to the Swiss Cheese model layers, the nature of the resulting vulnerability, and their safety implications. This mapping transforms the conceptual integration into an applied analytical tool.
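Such a mapping can also be operationalized, for example when batch-coding incident reports. The sketch below is illustrative only; the specific layer assignments are assumptions for demonstration, not a transcription of Table 6:

```python
# Hypothetical SHELL-interface -> Swiss Cheese layer lookup. The layer
# assignments here are assumed for illustration, not Table 6 verbatim.
SHELL_TO_LAYERS = {
    "L-S": {"hmi", "planning"},         # software/interface mismatches
    "L-H": {"perception", "control"},   # sensor, actuator, calibration faults
    "L-E": {"perception", "planning"},  # weather, road geometry, signage
    "L-L": {"governance", "hmi"},       # handover, coordination, norms
}

def layers_at_risk(coded_interfaces):
    """Union of defensive layers weakened by the SHELL interfaces coded
    for an incident."""
    risk = set()
    for interface in coded_interfaces:
        risk |= SHELL_TO_LAYERS.get(interface, set())
    return risk
```

An analyst coding an incident as involving L-H and L-L failures could then read off which defensive layers to scrutinize for aligned holes.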
5.2. Research Contributions
AVs represent a transformative technology with the potential to reshape mobility, but their safe deployment remains a critical concern. Understanding how and why safety failures occur in AVs, from both technical and human–organizational standpoints, is essential for building trust, improving regulations, and reducing real-world risks. Based on the detailed analysis presented above, this study makes the following key contributions:
This study develops and applies a hybrid Swiss Cheese–SHELL framework that systematically links system-level defense layers with human–system interaction failures, enabling a more comprehensive understanding of autonomous driving safety incidents.
There are five layers identified in the Swiss Cheese model. By decomposing autonomous driving into governance, perception, planning and decision, control and actuation, and human–machine interface layers, this work demonstrates how failures propagate and align across layers, rather than occurring as isolated faults.
There are four interfaces analyzed in the SHELL model. The study adapts the classical SHELL model to autonomous driving by detailing how Liveware–Software, Liveware–Hardware, Liveware–Environment, and Liveware–Liveware interactions contribute to safety failures, supported by concrete failure modes observed in real-world incidents.
The analysis highlights that human contribution to AV safety failures extends beyond on-board driving, encompassing remote operators, maintenance teams, organizational communication, and interactions with other road users, emphasizing the socio-technical nature of AV systems.
The findings provide a foundation for targeted mitigation strategies by showing how technical degradation, environmental complexity, interface design, and communication breakdowns jointly influence risk, offering actionable insights for system designers, operators, and regulators.
These contributions are derived from in-depth analysis of real-world AV incident data sourced from NHTSA. The study reveals that safety failures often stem from the alignment of latent failures across governance, perception, planning, control, and HMI layers. Moreover, human factors—such as unclear alerting, sensor misalignment, or remote operation breakdowns—play a major role in these incidents, demonstrating that AV safety is not merely a technical problem but also socio-technical.