Article

Integrating System-Theoretic Process Analysis and System Dynamics for Systemic Risk Analysis in Safety-Critical Systems

by Ahmed Shaban 1,2,*, Ahmed Abdelwahed 2,3, Islam H. Afefy 2, Giulio Di Gravio 4 and Riccardo Patriarca 4,*
1 Department of Mechanical and Industrial Engineering, College of Engineering, Sultan Qaboos University, Al-Khoud, Muscat 123, Oman
2 Mechanical Engineering Department, Faculty of Engineering, Fayoum University, Fayoum 63514, Egypt
3 School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Brisbane, QLD 4001, Australia
4 Mechanical and Aerospace Engineering Department, Sapienza University of Rome, Via Eudossiana, 18, 00184 Rome, Italy
* Authors to whom correspondence should be addressed.
Infrastructures 2026, 11(1), 3; https://doi.org/10.3390/infrastructures11010003
Submission received: 20 October 2025 / Revised: 10 December 2025 / Accepted: 12 December 2025 / Published: 19 December 2025

Abstract

This paper presents a novel integration of System-Theoretic Process Analysis (STPA) and System Dynamics (SD) for hazard and resilience analysis in safety-critical infrastructure systems. The methodology is applied iteratively to assess the safety and continuity of a hospital’s oxygen supply system, a key element of critical health infrastructure, addressing both technical and managerial factors. STPA identifies unsafe interactions between system components, which are systematically translated into a system dynamics simulation model. This dynamic perspective allows for the exploration of how hazards evolve over time and how control strategies influence overall system resilience. Unlike previous conceptual approaches, this study applies the integrated framework to a real-world incident of oxygen supply failure. The model structure is derived from STPA artifacts and validated using expert input and incident data. Simulation experiments uncovered emergent risk patterns, such as alarm delays, staff stress, and insufficient training, that are not evident through STPA alone. These insights support targeted interventions, including enhanced drill frequency and resource allocation, to strengthen infrastructure resilience. By embedding dynamic simulation within the STPA framework, this research moves beyond static hazard identification to enable scenario-based testing and conditional estimation of system response to support risk-informed decision-making. The resulting methodology is traceable, repeatable, and adaptable, offering a practical and generalizable tool for systemic risk analysis in critical infrastructures.

1. Introduction

Complex systems involve a large number of interactions among their components [1]. These interactions can create hazardous situations that require dedicated analysis. Hazard analysis is an essential part of any risk assessment process that aims to prevent accidents. A robust risk management approach should therefore include an analysis and assessment of the hazards associated with the system using structured approaches [2]. Extensive research in system safety has highlighted that accidents in complex socio-technical systems often arise not from individual component failures but from unsafe interactions between components, particularly under dynamic operating conditions [1,3,4,5]. To assess and manage these interactions, practitioners have traditionally relied on hazard analysis techniques such as Failure Mode and Effects Analysis (FMEA) [6], Hazard and Operability Study (HAZOP) [7], and Bow-Tie analysis [8,9]. However, these approaches tend to focus on isolated component failures and often fall short in addressing the systemic interactions and feedback mechanisms that can lead to hazardous scenarios [10,11]. In response to these limitations, there is growing reliance on methodologies that specifically address the complexities of system interactions. Modern hazard analysis methods aim to scrutinize all system components and their potentially unsafe interactions, providing a clearer understanding of hazards associated with the entire system and its constituent elements [12,13]. Notably, increasing interest has been directed towards methodologies such as System-Theoretic Process Analysis (STPA), which analyze hazards from a system perspective [14].
According to Leveson & Thomas [1], STPA is grounded in the System-Theoretic Accident Model and Processes (STAMP), and leverages system theory to enhance safety analysis [15,16]. Unlike traditional methods, STPA systematically identifies hazards arising from unsafe interactions within system control loops, rather than solely focusing on individual components [3,14,17]. Several studies have demonstrated its superior capability in uncovering hazards that conventional methods like FMEA, HAZOP, and Bow-Tie analysis frequently overlook due to their linear and component-focused nature [8,18]. Moreover, STPA provides additional advantages by offering a comprehensive understanding of system behavior, highlighting not only immediate hazards but also long-term implications and interactions among system components. This systemic perspective significantly enhances hazard management strategies, moving beyond traditional static outputs towards dynamic and interaction-aware safety assessments, ultimately contributing to more robust safety management decisions and reduced accident rates [14,19,20,21,22]. However, quantifying how loss scenarios or unsafe control actions evolve over time is beyond the intended scope of STPA, and requires the use of simulation-based modeling approaches that allow risk-free experimentation with the system’s behavior [23,24]. Accordingly, modelling, accident analysis, and hazard analysis approaches have been integrated to analyze hazards and accidents in complex systems [25]. In particular, recent research has attempted to integrate hazard analysis methodologies with other modelling techniques that can capture the dynamic behavior of complex systems under different conditions.
System dynamics modelling methodology relies on studying the behavior of systems by analyzing the relations between system components [26]. System dynamics can be used to study and understand the behavior of complex system components over time [27]. While STPA is not intended to model how system behaviors evolve over time or to support quantitative risk decision-making, integrating it with dynamic modeling techniques such as system dynamics can enhance its practical applicability. This integration allows safety analysts to simulate the progression of identified loss scenarios under varying conditions and to assess the potential effectiveness of proposed safety constraints in a dynamic context. Although STPA identifies hazards based on structural and causal analysis, simulating these scenarios helps explore how contributing factors interact over time and supports decision-making by evaluating the anticipated impact of preventive measures. In this way, the use of simulation serves to complement STPA by providing a virtual environment in which to test and refine proposed control modifications [28], without questioning the inherent validity of the original hazard identification. Accordingly, the integrated STPA–SD approach used in this study is intended both to deepen system understanding and to provide conditional, scenario-based predictions of system response under specified assumptions, rather than long-term or context-free forecasting.
This paper contributes to the field of systemic risk analysis in safety-critical systems by operationalizing a structured integration between STPA and system dynamics simulation. STPA focuses on identifying hazards resulting from unsafe interactions and feedback flaws, but it does not extend to modeling their progression over time or enabling quantitative analysis of mitigation strategies. To address this methodological gap, we propose an approach that systematically translates STPA outputs, such as loss scenarios and control structures, into system dynamics models capable of simulating time-dependent behaviors, feedback effects, and the impact of various operational factors. This integration enhances the analytical depth of STPA and supports dynamic, risk-informed decision-making. The methodology is demonstrated through a real-world case study involving a hospital oxygen supply system, where a previously identified loss scenario is simulated to examine how alarm delays, staff stress, and preparedness measures influence patient safety outcomes. The results offer both empirical validation and practical insights for safety practitioners, particularly in healthcare settings where timely control actions are critical. This research extends the previous work presented in Shaban, Abdelwahed, Di Gravio, et al. [29] in which STPA has been employed for hazard analysis in safety-critical medical gas pipeline and oxygen supply systems. It will involve the development of a system dynamics simulation model derived from the loss scenarios identified in the previous related study.
The subsequent sections of this paper are structured as follows. Section 2 provides a summary of previous research related to STPA and system dynamics and their applicability across various domains. Section 3 outlines the proposed methodology for integrating STPA and system dynamics for hazard analysis and assessment in safety-critical healthcare systems. Section 4 elaborates on the implementation of the proposed methodology in a hospital setting. Section 5 provides a discussion of the results. Finally, Section 6 offers conclusions, summarizes the subject and limitations of this research, and suggests avenues for future research.

2. Literature Review

As summarized in a recent scoping review by Patriarca et al. [14], STPA has seen increasing adoption across industries such as aerospace, healthcare, energy, transportation, and automation, underscoring its methodological maturity and broad applicability for systemic hazard analysis. Recent research supports using STPA in system safety analyses across diverse domains to overcome the limitations of traditional methodologies. Abrecht et al. [18] applied STPA to a rotorcraft flight control system and identified hazards related to unsafe control actions and feedback flaws that were not detected by traditional methods like FMEA and FTA. These overlooked hazards stemmed from system-level interactions rather than component failures, illustrating how STPA enables a more comprehensive analysis, an approach particularly relevant to complex socio-technical systems such as healthcare infrastructures studied in this paper. Similarly, Ishimatsu et al. [17] demonstrated STPA’s ability to reveal design flaws in complex spacecraft systems, surpassing traditional fault tree analysis in coverage and depth. Beyond aerospace, STPA has been applied in the medical domain to analyze anesthesiologic processes [30], and radiotherapy safety [31], and has been recommended for risk assessments in radiological device licensing [32]. Applications also extend to autonomous maritime systems [28], and robotic operations [33], reflecting the method’s adaptability to complex socio-technical environments.
Although STPA is increasingly applied to complex socio-technical systems through the analysis of unsafe control actions and flawed feedback mechanisms [17,29,32,34], it is fundamentally a qualitative method focused on structural and causal reasoning. As such, it was not designed to predict how system behavior may change over time or to provide direct quantitative support for risk-informed decision-making [35,36]. These aspects represent boundaries of its intended scope rather than methodological shortcomings. For example, Dakwat & Villani [34] noted the lack of formalism in STPA due to its reliance on expert judgment, which may introduce variability depending on the analyst’s experience. This reinforces the motivation to incorporate structured, simulation-based methods to enhance consistency and traceability in hazard analysis.
To complement STPA’s qualitative outputs, researchers have increasingly explored the integration of simulation-based approaches such as system dynamics. Quantification enables the analysis of temporal dynamics, sensitivity to system variables, and evaluation of proposed control strategies under different operational conditions [33,36]. In this context, simulation serves not to validate STPA findings but to dynamically explore how identified loss scenarios may develop, interact, and escalate in realistic environments. Virtual simulation environments offer a safe and controlled means for testing the effectiveness of control measures, especially in complex domains where physical experimentation is impractical or risky [27,28]. This supports decision-makers by providing insights into the potential timing and impact of safety interventions, thereby enhancing the practical utility of the hazard analysis. Consequently, the term “verification” in this study refers to examining the behavior and outcomes of STPA-identified scenarios under dynamic conditions, rather than questioning the validity of the scenarios themselves.
To overcome these challenges, more recent studies have integrated STPA with system dynamics simulation or broader dynamic analysis to identify hazard scenarios and model complex systems to assess the validity of the system control structure. For instance, Jiao et al. [36] introduced a framework which integrates system dynamics and STPA for hazard analysis in the coal mining process. Leveson et al. [25] used system engineering based on control structure modeling and static and dynamic analysis to assess pharmaceutical safety. Systems-theoretic approaches such as STAMP have been used to analyze accident causation mechanisms and support accident investigation in complex socio-technical domains, including aerospace and infrastructure-related systems [3,37].
Forrester [38] defined system dynamics as a method for understanding and analyzing the behavior of industrial systems, identifying how the elements of a system, such as procedures and policies, are interconnected and how their interactions affect the growth and stability of the system. Various studies have used the system dynamics methodology to study and analyze the safety of industrial systems and the causes of accidents. Yu et al. [39] assessed the safety of a nuclear power plant system by using a dynamic model to examine the human and regulatory factors that could affect the safety of the system. In addition, Kang & Jae [40] used system dynamics to analyze system reliability and evaluate operating limitations in a nuclear power plant. Bouloiz et al. [41] applied system dynamics to assess the safety of a storage unit of hazardous chemicals in Morocco. Garbolino et al. [11] applied system dynamics simulation to evaluate the safety system in the chlorine transfer unit of a plastic synthesis plant. System dynamics simulation was also used by Cooke [26] to analyze the conditions and causes of the 1992 Westray mine explosion.
While previous studies, such as Jiao et al. [36], proposed integrating STPA with system dynamics (SD), their work focused mainly on conceptual feasibility and lacked a detailed, replicable method for translating STPA outputs into simulation model components. Similarly, Leveson’s foundational contributions emphasize hazard identification through control structure analysis but do not address the temporal evolution of hazards or the dynamic evaluation of control strategies. Previous works have largely focused on demonstrating the feasibility of STPA–SD integration without establishing a standardized, reusable protocol for converting static STPA artefacts into dynamic simulation elements. To address these gaps, this study introduces a structured and repeatable translation methodology for mapping STPA artifacts, such as unsafe control actions and loss scenarios, into system dynamics constructs, including stocks, flows, variables, and feedback loops. Additionally, we define a formalized translation mechanism that systematically maps specific STPA outputs, including Unsafe Control Actions (UCAs), directly into corresponding System Dynamics constructs, thereby operationalizing STPA findings for dynamic analysis. This integration is demonstrated through a healthcare case study, where a detailed mapping illustrates the transformation of STPA findings into a dynamic simulation model, enabling time-based risk analysis and supporting simulation-informed safety decision-making.
The integration between STPA and SD allows practitioners to evaluate the anticipated impact of safety constraints under dynamic conditions, offering quantifiable insights that support proactive and risk-informed decision-making [35]. In this framework, simulation does not question or re-validate STPA findings; rather, it extends their utility by modelling how identified loss scenarios evolve over time and how technical and human factors interact through feedback mechanisms. By embedding the formalized translation protocol within a dynamic modelling environment, the proposed approach converts qualitative STPA outputs into executable, scenario-based analyses that reveal the temporal progression of hazards and the conditions under which interventions are most effective. This structured integration enhances both the explanatory depth and the practical applicability of STPA, supporting scenario-based evaluation and enabling more rigorous and transparent risk-informed decision-making in complex socio-technical systems.

3. Proposed Methodology for Integrating STPA and System Dynamics

The proposed methodology, illustrated in Figure 1, uses system dynamics simulation to simulate the loss scenarios detected by STPA. It aims to extend the use of system dynamics as a complementary step to the STPA methodology, reinforcing the validity of the current controls identified in STPA. Furthermore, this approach offers a streamlined method for testing the loss scenarios uncovered by STPA and verifying the factors that contribute to these scenarios and their anticipated outcomes. The result is an integrated framework that combines STPA and system dynamics simulation to enhance the effectiveness of STPA in identifying hazards of complex systems.
The outcomes of the STPA phase, which include the identified loss scenarios, are used in the system dynamics phase to perform simulation modelling. This simulation phase seeks to replicate the occurrence of these loss scenarios and analyze the behavior of system components during these scenarios. This process is meant to offer a clear understanding of unsafe interactions that could occur and lead to loss scenarios. To some extent, although it uses a different simulation logic, this paper relates to the integration of STPA-Sec with simulation models in the context of cyber-resilience [42]. The specifics of each phase of the methodology are outlined in Section 3.1, Section 3.2 and Section 3.3.

3.1. Identifying Loss Scenarios Using STPA

The standard STPA procedure consists of four main steps: (1) identifying system-level accidents, associated hazards, and safety constraints; (2) modeling the system’s control structure, including human, organizational, and technical components; (3) identifying unsafe control actions (UCAs) that could lead to hazardous states; and (4) analyzing potential causal scenarios that may trigger these UCAs. The outcome of this process is a set of detailed loss scenarios representing how safety constraints might be violated under various operational conditions. These scenarios form the analytical basis for subsequent modeling using system dynamics. Figure 1 illustrates how the outputs of STPA are mapped into the simulation framework. In this study, the STPA process was conducted using structured templates following the STPA Handbook. The specific application of these steps in the context of a hospital oxygen supply system is presented in Section 4.2.
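As an illustration of how the outputs of steps (3) and (4) might be recorded so that they remain traceable in later modelling, the following minimal Python sketch stores unsafe control actions and loss scenarios as plain data records. The class names, field names, identifiers ("LS-1", "UCA-1") and example wording are assumptions made for this sketch; they are not taken from the STPA Handbook templates or from the original analysis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class UnsafeControlAction:
    """An unsafe control action (UCA) identified in STPA step 3."""
    uca_id: str
    controller: str
    description: str


@dataclass
class LossScenario:
    """A causal scenario (STPA step 4) that may trigger one or more UCAs."""
    scenario_id: str
    description: str
    ucas: List[UnsafeControlAction] = field(default_factory=list)
    causal_factors: List[str] = field(default_factory=list)


def trace(scenario: LossScenario) -> List[Tuple[str, str]]:
    """Return (scenario_id, uca_id) pairs so each UCA stays linked to its scenario."""
    return [(scenario.scenario_id, uca.uca_id) for uca in scenario.ucas]


# Placeholder content for illustration only (hypothetical identifiers and wording).
example_scenario = LossScenario(
    scenario_id="LS-1",
    description="Supply restoration takes too long",
    ucas=[UnsafeControlAction("UCA-1", "Operator", "Corrective action provided too late")],
    causal_factors=["alarm delay", "staff stress", "insufficient training"],
)

traceability = trace(example_scenario)
```

Keeping the scenario and UCA identifiers together in this way is one possible means of preserving the link between a simulation variable and the hazard artifact it was derived from.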

3.2. Simulate Unsafe Control Scenario Using System Dynamics

System dynamics focuses on analyzing the causal relationships among system components, which contribute to the complexity of the system. Its distinctive features are the systematic exploration of feedback and the examination of system behavior. The approach explains how the structure of a feedback system, along with the loops it contains, influences its dynamic behavior. It is a methodology grounded in the concept of feedback, emphasizing the interactions between structural components and behavior [41].
According to Forrester [38], constructing a system dynamics model involves four steps. The first step includes defining the problem by specifying the modelling goal and identifying the entities, relationships, and behavior to be emphasized. The second step entails creating a causal relationship diagram to illustrate the causal links between system elements. Additionally, causal diagrams depict feedback cycles between system elements that either reinforce (positive feedback loop) or counteract (negative feedback loop) changes in system variables [43]. The third step involves developing a stock-flow diagram to introduce stock variables and flow into the system. In the fourth step, simulation models are formulated, considering the values of variables that have an impact and from which the model derives information.
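The third and fourth steps can be illustrated with a minimal stock-and-flow sketch in Python. It is not the model developed in this study: it assumes a single generic stock with a constant inflow and an outflow proportional to the stock (a balancing, i.e., negative, feedback loop), integrated with a simple Euler scheme. The function name, parameter names and numerical values are hypothetical.

```python
from typing import List


def simulate_stock(initial: float, inflow: float, drain_fraction: float,
                   dt: float, steps: int) -> List[float]:
    """Euler-integrate one stock: d(stock)/dt = inflow - drain_fraction * stock."""
    stock = initial
    history = [stock]
    for _ in range(steps):
        outflow = drain_fraction * stock   # balancing feedback: a larger stock drains faster
        stock += (inflow - outflow) * dt   # the stock accumulates the net flow over one step
        history.append(stock)
    return history


# Hypothetical values: the stock should approach inflow / drain_fraction = 20.
trajectory = simulate_stock(initial=0.0, inflow=10.0, drain_fraction=0.5, dt=0.1, steps=200)
```

Because the outflow grows with the stock, the trajectory rises quickly at first and then levels off near the equilibrium value, which is the goal-seeking behavior characteristic of a balancing loop.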
Integrating STPA facilitated the process of defining the system and its components, thereby reducing the steps required to build the system dynamics model. As shown in the system dynamics phase in Figure 1, the loss scenarios discovered through STPA can be used as the primary input for constructing the model. Following this, the variables in the model will be identified, considering both the identified loss scenarios and the STPA control structure. In the third step, a simulation model will be developed by constructing a stock and flow diagram to represent the system components and their interactions. The system dynamics model will be built using the selected loss scenario and control structure from the STPA phase as a foundation for its design. Subsequently, the model will undergo testing, and the results will be reviewed to confirm its effectiveness.

3.3. Transition from STPA to System Dynamics

The proposed methodology incorporates system dynamics simulation as a subsequent step following the discovery of loss scenarios by STPA. The most critical loss scenarios are selected, and a simulation model is developed to reflect the occurrence of these scenarios, as outlined by STPA.
During the model-building stage, the control structure established in the STPA methodology serves as the foundation for model-building to define the model variables and the interrelations between variables. This structure aids in identifying the system components, their interrelationships, the current control methods utilized, and their interactions with the system elements, whether safe or unsafe. Additionally, using control structure and loss scenarios can inform the stock and flow diagram where stock variables accumulate hazard or loss. Furthermore, actions and feedback loops addressed by the control structure aid in determining the initial conditions or trigger points for variables in the simulation model.
To systematically translate the STPA results into the simulation model, we established a direct mapping protocol where specific hazard artifacts correspond to distinct system dynamics elements. The identified Loss Scenarios are modeled as Stocks to represent the cumulative state of the system over time (e.g., the number of affected patients). Unsafe Control Actions (UCAs), particularly those involving timing failures, are quantified as Flow Regulators or delay functions that control the rate of change. These flows are influenced by causal factors, such as panic, alarm failure, or stress, which are mapped as Auxiliary Variables. Finally, the control loops defined in the STPA structure are implemented as Feedback Loops, creating the information links that connect corrective actions back to the system state.
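The mapping described above can be written down as a small lookup table. The Python sketch below encodes only the four correspondences stated in the text; the dictionary keys, the enum, the function name and the example artifact names are labels assumed for this illustration, not part of the published protocol.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class SDElement(Enum):
    STOCK = "stock"
    FLOW = "flow regulator or delay function"
    AUXILIARY = "auxiliary variable"
    FEEDBACK_LOOP = "feedback loop"


# Correspondences as stated in the text; the key names are our own labels.
STPA_TO_SD = {
    "loss_scenario": SDElement.STOCK,
    "unsafe_control_action": SDElement.FLOW,
    "causal_factor": SDElement.AUXILIARY,
    "control_loop": SDElement.FEEDBACK_LOOP,
}


def translate(artifacts: Iterable[Tuple[str, str]]) -> List[Tuple[str, SDElement]]:
    """Map (artifact_type, name) pairs to (name, SD element); reject unknown types."""
    result = []
    for kind, name in artifacts:
        if kind not in STPA_TO_SD:
            raise ValueError(f"No SD mapping defined for STPA artifact type: {kind}")
        result.append((name, STPA_TO_SD[kind]))
    return result


# Hypothetical artifact names loosely based on the examples given in the text.
mapped = translate([
    ("loss_scenario", "affected patients"),
    ("unsafe_control_action", "late alarm response"),
    ("causal_factor", "stress"),
    ("control_loop", "alarm-response-restoration"),
])
```

Raising an error for an unmapped artifact type is a design choice of this sketch: it forces every element entering the simulation model to have a documented STPA origin.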
In practical terms, the translation protocol was implemented through a sequence of steps. First, we selected a representative loss scenario (Loss Scenario 11.4) from the STPA analysis as the primary “case” to be simulated, thereby fixing the system boundary, time horizon, and key actors. Second, the STPA control structure was used to identify the main controllers, controlled processes, sensors, and actuators (e.g., nursing staff, maintenance team, AVSU, AAPU, main panel), which were then mapped into the causal structure and feedback loops of the SD model. For example, the feedback loop linking alarms, staff response, and restoration actions is directly derived from the control loop that connects the alarm panels, nursing team, and maintenance team in the STPA control structure.
Third, timing-related Unsafe Control Actions (UCAs) and causal scenarios (e.g., delayed alarms, delayed maintenance response, omitted temporary restoration) were converted into quantitative delay parameters and rate modifiers in the SD model (e.g., “Alert Delay”, “Response Time”, “Repair Time”). Their baseline values and plausible ranges were elicited through structured interviews and group discussions with hospital staff and subsequently calibrated by reconstructing the 2017 oxygen supply incident, as detailed in Section 4.3.4. In this way, the STPA outcomes do not merely inspire the SD model but actively constrain its structure and parameterization, ensuring that the dynamic representation remains traceable to the original hazard analysis.
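To show how such delay parameters can drive a stock, the following Python sketch accumulates patient-minutes without oxygen until the supply is restored. It is a deliberately simplified stand-in rather than the Vensim model of this study: it assumes that the three delays act in sequence and that the number of dependent patients is constant, and all numerical values are hypothetical, not the elicited or calibrated ones.

```python
def exposure_patient_minutes(alert_delay: float, response_time: float, repair_time: float,
                             patients_on_oxygen: int, dt: float = 0.5,
                             horizon: float = 120.0) -> float:
    """Accumulate patient-minutes without oxygen (a stock) until supply is restored.

    Assumption of this sketch: supply fails at t = 0 and is restored once the three
    delays have elapsed one after another. Times are in minutes.
    """
    restore_at = alert_delay + response_time + repair_time
    exposure = 0.0                      # stock: cumulative patient-minutes
    for i in range(int(horizon / dt)):
        t = i * dt
        supply_down = t < restore_at
        inflow = patients_on_oxygen if supply_down else 0   # patient-minutes per minute
        exposure += inflow * dt
    return exposure


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
baseline = exposure_patient_minutes(alert_delay=2, response_time=5, repair_time=8,
                                    patients_on_oxygen=10)
late_alarm = exposure_patient_minutes(alert_delay=6, response_time=5, repair_time=8,
                                      patients_on_oxygen=10)
```

With these illustrative inputs, lengthening the alert delay by four minutes adds four minutes of exposure for each of the ten patients, which is the kind of conditional, scenario-based comparison the delay parameters are meant to support.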
More broadly, using STPA as a front-end to system dynamics substantially facilitates the development of the SD model. STPA provides a systematically derived set of loss scenarios, UCAs, and control-structure relationships that define: (i) which variables must be represented, (ii) which interactions and feedbacks are safety-relevant, and (iii) which pathways are most critical for simulation. Rather than constructing the SD model from scratch based solely on modeler intuition, the proposed approach anchors each stock, flow, and feedback loop in explicitly documented STPA artefacts, thereby reducing modelling subjectivity and enhancing the transparency and repeatability of the simulation model.
The methodology focuses on simulating each loss scenario independently to explore how the unsafe interactions identified through STPA may evolve over time and to assess the dynamic behavior of contributing factors. The simulation model was developed using Vensim PLE, a widely used tool for system dynamics modeling. Validation of the model was conducted through structured consultations with hospital personnel, including staff from the safety, engineering, and clinical departments, to ensure that the structure and assumptions reflected the real-world system. Additionally, the simulation results were compared with a previously documented incident to assess the plausibility and practical coherence of the model. This process supports the evaluation of both individual and combined impacts of scenario elements and strengthens the model’s utility in risk-informed decision-making.
While this study focused on modeling a single, representative loss scenario derived from the STPA process, the system dynamics approach used is highly extensible. One of the key advantages of simulation modeling, particularly using system dynamics, is its ability to represent complex, non-linear interactions among multiple system elements in a risk-free virtual environment. The developed model can be expanded to include multiple, concurrent, or cascading loss scenarios, enabling exploration of more realistic and dynamic accident conditions that cannot be safely tested in real-world settings. There are no inherent limitations on the complexity or scale of the scenarios that can be incorporated or the types of simulation experiments that can be conducted. This modeling flexibility is especially valuable for supporting proactive risk management and control strategy evaluation in safety-critical systems. Future research will build on the current framework to simulate the simultaneous occurrence of multiple loss scenarios and assess the effectiveness of combined control measures under a range of operational conditions.

4. Case Study

The integrated methodology is applied to a real-world case involving the medical oxygen supply system of a large-scale hospital in Cairo, Egypt. The study uses data from prior research by Shaban, Abdelwahed, Di Gravio, et al. [29], in which the STPA methodology was employed to establish the control structure and identify loss scenarios; that work serves as the foundation for this study.

4.1. System Description

To define the system accurately and engage system operators, discussion and brainstorming sessions were conducted among safety researchers, the facility’s occupational safety and health personnel, maintenance department representatives, and biomedical engineering staff. These sessions aimed to understand hospital policies, the design of the medical gas system, and emergency procedures for oxygen interruptions.
The oxygen supply system in the Medical Gas Pipeline System (MGPS) consists of three independent sources: primary, secondary, and reserve. The primary source is a Vacuum Insulated Evaporator (VIE) unit, which includes an insulated tank with an 18,000-L capacity. Secondary and reserve oxygen sources consist of two banks of compressed oxygen cylinders, each containing 12 D-type cylinders with a 50-L capacity.
The oxygen supply system utilizes a pipeline system made of copper seamless pipes to transfer oxygen from the manifold room to hospital units. Each hospital unit contains Area Valve Service Units (AVSUs) and Area Alarm Panels (AAPs). AVSUs are breakable glass-locked boxes containing gas valves for each type of medical gas connected to the pipeline system. AAPs, located at the entrance of each clinical unit, monitor the pressure of oxygen and other medical gases in the outlet points.
The alarm system comprises panels in the manifold control panel and AAPs, which trigger visual and audible alarms in case of pressure drops. Terminal units serve as the final delivery points, color-coded and gas-specific to prevent the wrong gas from being administered to patients.

4.2. STPA Results

In the previous study by Shaban, Abdelwahed, Di Gravio, et al. [29], the STPA identified 13 possible loss scenarios. This study focuses on scenario 11.4, which examines the effect of time required to temporarily restore the O2 supply, as outlined in Table 1. Table 1, adopted from Shaban, Abdelwahed, Di Gravio, et al. [29], presents the potential loss scenarios, associated unsafe control actions, and the corresponding threats that may arise from these scenarios. This scenario was selected particularly due to its emphasis on the interaction between human elements such as nursing and maintenance staff, and system components, including alarm systems (AAPs) and gas control systems (AVSUs).
Moreover, this scenario depicts a realistic incident that occurred at the hospital in 2017, facilitating the process of testing the model’s effectiveness and validity. Studying this loss scenario aims to validate the ability of the simulation model to capture the interactive relationships between contributing elements of the loss scenario and their impact on potential outcomes.
Loss Scenario 11.4 “Long time to temporarily restore the oxygen supply system” was selected for system dynamics simulation based on its significance, complexity, and practical relevance. First, the scenario presents a critical risk to patient safety and hospital operations, as any delay in restoring oxygen supply can lead to life-threatening consequences and disrupt essential clinical functions. Second, this scenario captures complex interactions among multiple system components, such as alarm systems, human operators, logistical processes, and emergency protocols, as outlined in the STPA control structure. These interactions involve dynamic feedback and time-dependent behaviors, which are best explored using simulation-based approaches. Finally, Scenario 11.4 aligns closely with the hospital’s operational priorities, as it enables the evaluation of different policies related to training effectiveness, response procedures, and resource allocation. These characteristics make this scenario highly suitable for system dynamics modeling and allow for a meaningful assessment of potential safety interventions in a controlled, virtual environment.

4.3. Using System Dynamics to Simulate Loss Scenarios

The simulation model was developed according to the steps shown in Figure 1. STPA provides the essential inputs, including the definition of the system, its elements, the control structure, and the loss scenarios. These inputs are used to identify the model variables, build the simulation model, and formulate the mathematical equations that describe the relationships between the components of the system.
For this case study, Vensim PLE software was used to construct the system dynamics model. The main objective is to simulate the loss scenario and examine unsafe interactions among the factors that may increase the severity of its consequences. Although the original STPA control structure included hospital and operations management, these controllers were not modeled directly, in order to keep the model focused on real-time emergency operations; the timing of the actions executed by these agents does not significantly affect operations at the level of granularity adopted in the model. Their influence is captured through structural parameters such as work-related stress and training frequency, which reflect institutional preparedness and managerial oversight.
The primary input to the simulation process, as depicted in Figure 1, is the control structure, which represents the system components and their interactions and supports model building and variable identification. The loss scenarios are a second primary input: they define the model objectives and the outcomes to be simulated.
System dynamics variables were identified from the control structure outlined by Shaban, Abdelwahed, Di Gravio, et al. [29]. These include the nursing staff, maintenance staff, manifold control panel, Area Valve Service Unit (AVSU), and Area Alarm Panel Unit (AAPU), which were incorporated in the stock-and-flow diagram shown in Figure 2. Figure 2 presents the stock-and-flow diagram developed in Vensim PLE, summarizing the core feedback loops and system components derived from the STPA control structure; this representation serves as the structural basis for the simulation model. The selected loss scenario is used to identify where risk accumulates in the model, namely in the stock variable representing the number of affected patients.
In the simulation model, variables such as panic, drills, and workplace stress were incorporated to assess the effectiveness of actions and feedback introduced by the operation management and hospital team respectively.

4.3.1. The Scope of the Simulation

The simulation scope is limited to Loss Scenario 11.4, identified by STPA; other loss scenarios are excluded. The aim of studying Loss Scenario 11.4, which concerns the long time required to restore the oxygen supply system, is to examine how factors such as repair time, work-related stress, and training affect the number of affected patients. The model also investigates how unsafe interactions among panic, work-related stress, and equipment failure prolong system recovery time and thus increase the number of affected patients.

4.3.2. Model Assumptions

The total or partial failure of the oxygen supply system poses a significant threat to patient safety, as oxygen is vital for managing various medical conditions. Therefore, minimizing the time required to restore the oxygen supply is paramount. For this simulation, it is assumed that the loss scenario affects a total of 50 patients. Additionally, the survival time for all patients is set to three minutes, which is the time a patient can tolerate an oxygen cutoff before being affected. The model assumptions were set with the advice of hospital staff through brainstorming sessions. In addition to expert input, several core assumptions were anchored in objective and verifiable information. The total number of patients (50) is directly derived from the architectural layout and bed capacity of the affected intensive care zone, as documented in the hospital’s design and occupancy records. Likewise, the configuration of the oxygen supply system, including the AVSU–AAPU arrangement and alarm logic, was taken from technical documentation and standard operating procedures. These objective sources constrain the model structure, boundary conditions, and upper limits on the number of affected patients, while expert judgement is used primarily to parameterize behavioral and timing-related factors that are not routinely recorded in hospital databases.
These brainstorming sessions were conducted as structured group discussions with six hospital staff members involved in or dealing with the oxygen supply system, including representatives from occupational safety, maintenance, biomedical engineering, medical services, and intensive care. Each session took approximately three hours and focused on reviewing past incidents, validating unsafe control actions, and identifying realistic system behaviors during emergency responses. These sessions provided the practical and operational basis for defining the remaining variables, such as panic, work-related stress, and the number of drills executed, which were further refined through individual interviews with nursing and maintenance teams to better understand how they handle emergency situations. This approach ensured that all model components, whether directly derived from the STPA analysis or not, were grounded in real-world experience and reflected the actual events at the hospital.

4.3.3. Setting Model Variables

The variables used in the simulation model were created based on the control structure model shown in Figure 2, as presented in Shaban, Abdelwahed, Di Gravio, et al. [29]. In particular, the Unsafe Control Actions (UCA) and loss scenarios identified through STPA, most notably in clause 5.7 (unsafe control actions) and clause 5.8 (loss scenarios), were used to guide the selection of relevant variables and their interactions. As detailed in Table 2, these variables are categorized into (1) STPA-related variables, which are directly linked to loss scenarios or unsafe interactions in the control structure, and (2) remaining variables that were developed to support the simulation logic, based on expert input and operational understanding. To provide full transparency regarding variable provenance, Table 2 explicitly classifies each model variable as either STPA-related (directly derived from loss scenarios, UCAs, or control-structure elements (CA)) or as a remaining variable introduced to operationalize the dynamics, and it links STPA-derived variables to their specific clauses and scenarios in the original analysis.
The system dynamics model focuses on simulating the escalation of Loss Scenario 11.4, which concerns delays in the temporary restoration of the oxygen supply system. The variables represent the interacting system elements and conditions contributing to this scenario. For instance, the “alert” variable refers to alarms triggered from the main panel, the area alarm panel (AAPU), and the nursing team when a drop in oxygen pressure is visually detected at the area valve service unit (AVSU). Upon activation of these alerts, the maintenance team is notified of a malfunction and the repair process begins. Alarm system failures or staff panic, as identified in the STPA findings, may cause delays in alert activation, leading to longer repair times. These delays are further exacerbated by late notification and insufficient emergency response.
The “repair time” is affected by several factors, primarily the “response time,” which is influenced by work-related stress (stemming from hospital-level control actions such as CA1.1) and the level of panic among staff. Both panic and repair delays can be mitigated through preparedness activities, represented in the model by the “number of drills executed,” which is based on control actions such as CA2.3 and CA3.3 identified in the STPA analysis. As the repair time increases and patient “survival time” expires, the “effect rate” grows, leading to a gradual increase in the “number of affected patients.” These accumulations and feedback loops are central to capturing the dynamics of Loss Scenario 11.4.
While some variables, such as panic, response time, repair time, work-related stress, and patient survivability, are not explicitly named in the STPA outputs, they are logically inferred from the unsafe interactions, human factors, and system weaknesses identified in the analysis. These remaining variables help translate qualitative STPA findings into dynamic behavior that can be modeled and analyzed over time. This approach ensures that the system dynamics simulation remains grounded in the STPA framework while providing deeper insight into the systemic propagation of risk.
The scientific basis for selecting behavioral variables such as panic, drills, and work-related stress lies in the STPA analysis, where these factors were identified as causal contributors to unsafe control actions in the observed case study. For example, the maintenance team’s failure to respond (UCA) was linked to inadequate training resulting from infrequent drills, as well as psychological pressure that manifested as stress and panic. In this model, the variable ‘Panic’ refers specifically to the psychological state of hospital staff (nurses and maintenance technicians), not patients, who act as passive recipients of care in this scenario. Panic is operationalized as cognitive freezing or confusion that can delay detection and prolong repair activities. The quantitative values assigned to these behavior-related variables were derived through expert judgement and incident reconstruction, as detailed in Section 4.3.4.

4.3.4. System Dynamics Equations and Parameters Setting

The model depicted in Figure 2 was constructed using the equations listed in Table 3, which represent the interactions among the components of the oxygen supply system. The initial values of the variables are defined in Table 4. For variables such as training, work pressure factor, panic, and alarm system failure rate, the initial values were determined through discussions with the nursing staff and the hospital’s safety and maintenance department. Variables such as response time, repair time, and survival time were set to reasonable values in collaboration with the hospital team.
The initial values used in the simulation are presented in Table 4. These values were obtained through a structured and collaborative data collection process that included two group meetings and four individual interviews with hospital staff. The group discussions involved six experienced practitioners: the occupational safety manager, maintenance manager, biomedical engineering manager, director of medical services, chief nurse of the intensive care unit, and a medical gas technician. Each meeting lasted approximately three hours and was designed to clarify system functions, emergency procedures, past incidents, and expected response behaviors under failure conditions. All participants had more than 10 years of experience in their respective roles and had been directly involved in the operation, supervision, and emergency management of the hospital’s oxygen supply system. To address the inherent uncertainty and subjectivity of these expert-derived values, the model was calibrated by reconstructing the timeline of the documented 2017 incident, ensuring that the simulated behaviors matched the qualitative historical outcome. These parameter values therefore serve as semi-quantitative, expert-elicited approximations of behavior observed in past emergency events. In particular, timing-related Unsafe Control Actions (e.g., delayed alarm activation, late notification of maintenance staff, slow initiation of temporary restoration) were converted into the numerical delay parameters used in the SD model, specifically “Alert Delay,” “Response Time,” and “Repair Time,” so that each quantified parameter directly reflects the causal mechanisms identified in the STPA analysis.
Where direct measurement was not possible, quantitative bounds for delay-related variables were derived through a triangulation process that combined the documented event timeline, expert judgment, and the causal pathways identified in the STPA analysis, ensuring that each numerical parameter remained traceable to an explicit unsafe control action or causal scenario.
Taken together, these steps mean that the model is calibrated at a semi-quantitative level: the absolute values of behavioral parameters are not intended as precise statistical estimates, but as plausible representations of the timing and intensity of delays consistent with the documented 2017 incident. The primary objective is therefore to reproduce realistic temporal patterns and relative changes in system behavior under different configurations, rather than to forecast exact numerical outcomes for individual events.
The initial values, such as panic rate, work-related stress factor, number of drills, survival time, and response/repair times, were derived from these discussions by synthesizing operational knowledge, expert judgement, and past incident reviews (including a real partial oxygen outage). Further refinement of behavioral and procedural variables was obtained through four individual interviews (totaling six hours) with nursing and maintenance staff involved in emergency response. This approach ensured that all values in Table 4 reflect realistic, experience-based approximations of the system’s behavior during an oxygen supply failure. Human factors such as stress and panic are complex psychological responses and difficult to measure with the same precision as physical system variables. Consequently, the values assigned to these variables in the simulation are not intended to be exact empirical measurements, but rather semi-quantitative estimates. These were derived through a consensus of experienced hospital staff and validated against historical incident reports to ensure the model produces behavior that aligns with real-world observations. Accordingly, the model behavior reflects the combined influence of documented system characteristics and expert-elicited behavioral parameters, producing escalation patterns that are consistent with both the STPA causal analysis and the historical 2017 incident.
The alert is activated by an alarm from the main panel or by the nursing staff, who respond to an AAPU alarm or to visual monitoring of the AVSU when a decrease in oxygen pressure is detected. Alert activation in the model is influenced by multiple factors identified in the STPA analysis, including technical failure in the main or area alarm panels, human delay in visually detecting the pressure drop, and late notification of the maintenance team. These factors interact within the system’s control structure and are represented in the simulation model through causal loops that dynamically influence the timing of alert activation and subsequent restoration efforts in Loss Scenario 11.4.
Repair time refers to the duration needed to restore the oxygen supply, which is influenced by three factors beyond the time required to fix the failure itself. The main factor is the response time of the maintenance staff, which is affected by stress from operations management that can slow their reaction. The additional factors are alarm delay, due to alarm system failures (main panel and AAPU), and panic, which can critically delay the alert process and directly lengthen the response time of maintenance staff.
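The composition of repair time described above can be illustrated with a minimal Python sketch. The functional form is an assumption made for illustration only: the paper's actual equations are those in Table 3, which are not reproduced here. The function names (`response_time`, `effective_repair_time`) and the dimensionless `stress_factor` and `panic_level` inputs are hypothetical.

```python
def response_time(base_response_min: float,
                  stress_factor: float = 0.0,
                  panic_level: float = 0.0) -> float:
    """Maintenance response time, inflated by work-related stress and panic.

    Assumed form (not taken from Table 3): multiplicative inflation, so a
    factor of 0.0 means no effect and 1.0 doubles the response time.
    """
    for value in (base_response_min, stress_factor, panic_level):
        if value < 0:
            raise ValueError("inputs must be non-negative")
    return base_response_min * (1.0 + stress_factor) * (1.0 + panic_level)


def effective_repair_time(fix_time_min: float,
                          alarm_delay_min: float,
                          base_response_min: float,
                          stress_factor: float = 0.0,
                          panic_level: float = 0.0) -> float:
    """Total time to restore supply: alarm delay + response time + fix time.

    Assumption: the three delays simply add up in sequence.
    """
    if fix_time_min < 0 or alarm_delay_min < 0:
        raise ValueError("times must be non-negative")
    return (alarm_delay_min
            + response_time(base_response_min, stress_factor, panic_level)
            + fix_time_min)
```

Under these assumptions, a call such as `effective_repair_time(1.0, 2.0, 0.5, stress_factor=1.0, panic_level=1.0)` adds a two-minute alarm delay and a stress- and panic-inflated response time to a one-minute fix. It shows how upstream delays can dominate the total even when the fix itself is fast.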
The panic rate in the simulation model was modeled as inversely related to the number of emergency drills executed. This relationship was formulated based on insights gathered during structured brainstorming sessions and individual interviews with hospital personnel, including representatives from safety, maintenance, nursing, and clinical departments. Participants consistently emphasized that regular emergency drills enhance preparedness, reduce uncertainty, and significantly lower panic levels during emergency situations. These expert insights were used to parameterize the causal link between drills and panic in the model. The inclusion of this relationship is further supported by reflections on a real oxygen supply disruption that occurred at the hospital in 2017, during which inadequate training and drill frequency were identified as contributing factors to delayed response.
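One simple way to express an inverse relationship between drills and panic is sketched below. The specific form `baseline / (1 + drills)` and the default baseline of 1.0 are illustrative assumptions; they are not the parameterization used in the paper (Tables 3 and 4), and the function name `panic_rate` is hypothetical.

```python
def panic_rate(drills_executed: int, baseline_panic: float = 1.0) -> float:
    """Panic level that decreases as more emergency drills are executed.

    Assumed form: baseline_panic / (1 + drills_executed). The "+ 1" keeps
    the value finite when no drills have been run.
    """
    if drills_executed < 0:
        raise ValueError("drills_executed must be non-negative")
    if baseline_panic < 0:
        raise ValueError("baseline_panic must be non-negative")
    return baseline_panic / (1.0 + drills_executed)
```

With this form, each additional drill lowers panic but with diminishing returns, which is one plausible reading of the experts' statement. A linear or saturating curve would be an equally valid modeling choice.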
The effect rate is the main variable representing the relationship between repair time and patient survival time. As the repair proceeds, the remaining patient survival time runs down; once it expires, the effect rate is triggered, leading to a gradual rise in the number of affected patients for as long as the situation persists.
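The stock-and-flow logic of the effect rate can be sketched with a simple Euler integration in Python. The 50 patients and the three-minute survival time come from the model assumptions in Section 4.3.2; everything else is an assumption for illustration, including the proportional form of the effect rate, the hypothetical `harm_fraction_per_min` parameter, the time step, and the function name. This is not the Vensim model or the equations of Table 3, and it is not expected to reproduce the curves in Figures 3–5.

```python
def simulate_affected_patients(repair_time_min: float,
                               survival_time_min: float = 3.0,
                               total_patients: float = 50.0,
                               harm_fraction_per_min: float = 1.0,
                               horizon_min: float = 10.0,
                               dt: float = 0.01):
    """Euler integration of the 'affected patients' stock.

    Assumed flow: while oxygen is cut AND the survival time has expired,
    effect_rate = harm_fraction_per_min * (patients not yet affected);
    otherwise the effect rate is zero.
    Returns a list of (time_min, affected_patients) tuples.
    """
    if dt <= 0 or horizon_min <= 0:
        raise ValueError("dt and horizon_min must be positive")
    affected = 0.0                      # stock: number of affected patients
    history = [(0.0, affected)]
    n_steps = int(round(horizon_min / dt))
    for step in range(n_steps):
        t = step * dt
        oxygen_cut = t < repair_time_min
        survival_expired = t >= survival_time_min
        if oxygen_cut and survival_expired:
            effect_rate = harm_fraction_per_min * (total_patients - affected)
        else:
            effect_rate = 0.0           # inflow switched off
        affected += effect_rate * dt    # stock accumulates the inflow
        history.append((t + dt, affected))
    return history
```

Calling `simulate_affected_patients(repair_time_min=5.0)[-1]` returns the final time and stock value for a five-minute outage. In this simplified form the stock only starts to grow once the outage outlasts the survival time, and it saturates toward the total number of patients the longer the repair takes.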

4.3.5. Model Running

Several experiments were conducted to study the impact of factors such as repair time, panic, management pressure, and alarm delay on the number of affected patients. The experiments evaluated how these variables contribute to an increase in the number of affected patients, even when the repair time is reduced.
This case study includes three experiments. The first experiment investigates the effect of repair time on the number of patients affected. The second experiment analyses the impact of alarm delay and increased work-related stress on the number of affected patients. Finally, the third experiment explores how panic and management pressure can increase repair times and, consequently, the number of affected patients.
A: The Effect of Repair Time on the Number of Affected Patients
In the first experiment, the number of affected patients was examined under varying repair times, while all other influencing factors, such as emergency drills, alarm delay, and response time, were set to their minimum values. Work-related stress was excluded from this experiment to isolate the effect of repair time. Table 5 summarizes the parameter settings used for this scenario.
The simulation results, shown in Figure 3, demonstrate a strong, direct relationship between repair time and the number of affected patients. When the repair occurs within one minute, the number of affected patients remains below 45. However, increasing the repair time to 3 and 5 min leads to a sharp rise in patient impact, with the number escalating to approximately 50. The steep increase during the first two minutes highlights the system’s high sensitivity to early repair delays.
These findings emphasize the critical importance of minimizing repair time to limit the spread of harm in emergency scenarios. From a policy standpoint, the results advocate for allocating emergency repair teams in advance, streamlining fault escalation procedures, and ensuring targeted training to reduce both the initiation and execution delays in restoration activities.
B: The Effect of Alarm Delay on Affected Patients
In the second experiment, the repair time was fixed at one minute, and the combined effects of alarm delay and work-related stress were examined. Alarm delay, caused by a failure in the alarm system, was set to two minutes, and work stress was elevated to its maximum level to observe its influence on system response.
The simulation results, shown in Figure 4, reveal that alarm delay alone (with no work stress) leads to a noticeable increase in the number of affected patients, despite the short repair time. When both alarm delay and work stress are present (blue line), the number of affected patients escalates rapidly. Conversely, the most favorable outcome (green line) occurs when there is no alarm delay and staff are not under stress, limiting patient impact to fewer than 45.
These findings confirm that alarm responsiveness and staff workload management are critical factors in emergency preparedness. Even with optimal repair conditions, upstream delays such as alarm failures or heightened stress can significantly compromise system performance. The results indicate that the system is most vulnerable when these two factors converge: an upstream technical delay amplifies the negative impact of human stress, compromising system performance more severely than when these failures occur independently. This highlights the importance of designing resilient early-warning mechanisms and implementing proactive stress-reduction strategies to maintain safety in time-critical healthcare environments.
C: The Effect of Drills on Increasing the Repair Time
In the third experiment, the repair time was fixed at five minutes, the alarm delay was set to two minutes, and work-related stress was elevated to its maximum level. These conditions were selected to represent a highly adverse scenario. The study then examined the effect of increasing the number of emergency drills on staff response and system performance, particularly in mitigating panic-induced delays.
As shown in Figure 5, increasing the number of drills led to a noticeable reduction in the number of affected patients, despite the presence of alarm delays and high stress levels. The scenario with the highest number of drills resulted in a slower rise and lower overall number of affected patients compared to scenarios with fewer or no drills. This effect is attributed to improved staff preparedness and reduced panic, which translated into faster alerting and shorter effective repair times.
These results highlight the significant value of frequent emergency drills as a non-technical safety intervention. By reducing behavioral delays and enhancing coordination under stress, drills help mitigate cascading effects during emergencies. Institutions should therefore institutionalize regular, scenario-based training as a critical component of safety management and emergency response planning.
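The design of the three experiments can be summarized as a small scenario table in Python. Repair times and alarm delays are the values stated in the text. Stress and drill levels are kept as qualitative labels because their numerical settings are given only in the paper's tables, and the zero alarm delay used for Experiment A is an assumption standing in for "minimum value". The `Scenario` class and `build_scenarios` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    experiment: str         # "A", "B", or "C"
    label: str              # short description of the run
    repair_time_min: float  # repair time in minutes (from the text)
    alarm_delay_min: float  # alarm delay in minutes (0.0 = no delay)
    stress: str             # qualitative level: "excluded", "none", "max"
    drills: str             # qualitative level; numeric values not given here


def build_scenarios() -> List[Scenario]:
    scenarios = []
    # Experiment A: vary repair time, other factors at minimum, stress excluded.
    # Assumption: "minimum" alarm delay is represented as 0.0 minutes.
    for repair in (1.0, 3.0, 5.0):
        scenarios.append(
            Scenario("A", f"repair {repair:g} min", repair, 0.0, "excluded", "minimum"))
    # Experiment B: repair fixed at 1 min; alarm delay (2 min) and stress toggled.
    scenarios.append(Scenario("B", "no delay, no stress", 1.0, 0.0, "none", "unspecified"))
    scenarios.append(Scenario("B", "alarm delay only", 1.0, 2.0, "none", "unspecified"))
    scenarios.append(Scenario("B", "alarm delay and stress", 1.0, 2.0, "max", "unspecified"))
    # Experiment C: adverse conditions fixed; only the number of drills varies.
    for drills in ("none", "fewer", "highest"):
        scenarios.append(
            Scenario("C", f"drills: {drills}", 5.0, 2.0, "max", drills))
    return scenarios
```

Listing the runs this way makes explicit which factor is varied in each experiment and which are held fixed, which helps when comparing the curves across Figures 3–5.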

5. Discussion

The simulation model examines how repair time affects the number of affected patients. It also examines how other factors, such as alarm delay due to system failure, work pressure, and training and drills, influence the repair time and the number of affected patients.

5.1. Results Discussion

The effect of the repair time on the number of affected patients is demonstrated in dynamic simulation, considering the other factors such as training, drills, work pressure, and the failure in the alarm systems. The simulation showed the impact of various factors on increasing repair time. The failure of the alarm system, for instance, delays the alarm process, which in turn extends the time required for system repair and leads to an increase in the number of patients affected. Importantly, these patterns were preserved when key behavioral parameters were varied within plausible ranges, indicating that the qualitative insights are structurally robust and primarily driven by the feedback mechanisms identified through the STPA-based causal analysis.
The analysis highlights a critical synergistic effect between technical malfunctions and human responses. The simulation demonstrates that the interaction between alarm delays and work-related stress creates a compounding impact on patient safety. When technical controls (e.g., alarms) fail, the uncertainty of the situation triggers an increase in panic and stress among the staff. This human response acts as a reinforcing feedback loop: heightened stress degrades the cognitive performance required for rapid decision-making, which further increases the repair time. These findings also illustrate how the integrated STPA–SD model provides conditional projections of incident evolution under defined parameter settings, thereby supporting both explanatory insight and scenario-based evaluation of safety interventions.
Work pressure is an important factor affecting both the repair time and the number of affected patients: an increase in the work pressure factor, driven by panic and management pressure, leads to wasted time, which further extends the repair time and increases the number of patients affected.
Additionally, the simulation shows the effect of training and drills in raising staff efficiency and improving response, even in the presence of other factors such as work pressure and delays in warning systems. Regular emergency training for staff helps reduce the delays due to panic, which reduces the repair time and ultimately reduces the number of affected patients. The simulation experiments provide actionable insights into how specific safety measures, such as reducing repair time, improving alarm responsiveness, and conducting regular drills, can mitigate patient harm during system failures. These findings offer practical guidance for hospital managers and policymakers aiming to strengthen emergency preparedness and response protocols.

5.2. Model Verification

As previously mentioned, the simulation model reproduces the key behavioural patterns of a past accident that occurred in the hospital. To this end, the model and its variables were selected on the basis of numerous discussions with hospital personnel, particularly the medical maintenance team, engineering maintenance team, and safety management team. During the simulation runs, the interactions among system elements were verified and compared with the actual events of the accident. These comparisons revealed agreement regarding the factors influencing the increase in fault repair time, such as reporting delays due to panic and response delays caused by management pressure. The close alignment between observed and simulated behavior further supports the use of the model for conditional scenario analysis, within the parameter envelope defined by expert input and historical data.
In the actual accident, heightened pressure from management on maintenance workers increased the time required for repairs. Furthermore, because no mock drills simulating this type of situation had been held within the hospital, the nursing team was not adequately prepared to address the interruption of the oxygen system, although they responded quickly in caring for affected patients. Interestingly, in the actual accident, there was no significant impact from response delays due to alarm system failures, as the hospital regularly conducts preventive maintenance on alarm panels, thus mitigating sudden failures in low-oxygen warning systems. Additionally, the modeling of human factors (e.g., stress and panic) relies on a simplified representation of complex behaviors; future research could benefit from integrating more granular human reliability data or psychological models (e.g., human reliability analysis) to further refine these variables.
To further enhance the reliability verification, a qualitative pattern matching assessment was conducted to compare the simulation outputs against the documented narrative of the 2017 incident. This comparison revealed critical alignments between the modelled behavior and historical reality. First, regarding the delay mechanism, the incident report cited staff confusion as the primary cause of the prolonged response. This was mirrored by the simulation results, which showed that ‘Repair Time’ was significantly more sensitive to the ‘Panic’ variable than to technical factors. Second, the model captured the cascade effect observed during the real event, reproducing the exponential growth in the number of affected patients once the initial response window was missed.
Overall, while the STPA enabled the identification of unsafe control actions and systemic hazards, the SD model operationalized these insights to simulate how risk factors evolve and interact over time. The simulation results dynamically verified the selected loss scenario (Loss Scenario 11.4) by demonstrating how alarm delays, staff stress, and insufficient training contribute to the escalation of harm, even under controlled repair conditions. This confirms the validity of the STPA findings and adds practical depth by showing how targeted interventions, such as increased drill frequency, can mitigate risks. Thus, the integrated approach offers a structured and repeatable framework for bridging qualitative hazard analysis with quantitative scenario testing in complex socio-technical systems.

6. Conclusions

This study introduced an integrated framework that fuses STPA and SD simulation to strengthen hazard identification and risk assessment in safety-critical systems.
Using hospital oxygen supply, adapted from Shaban, Abdelwahed, Di Gravio, et al. (2022) [29], as an illustrative case, we translated core STPA artefacts (control structures, unsafe control actions, and causal scenarios) into an executable SD model. Time-based simulations verified the STPA-derived loss scenarios and revealed how risk escalates as component interactions evolve, giving safety managers operationally meaningful early-warning indicators. Beyond explanatory insight, the model also provides conditional projections of system response under different operational assumptions, allowing stakeholders to compare the expected impact of alternative training, workload, or alarm-response strategies.
While the case study confirms the explanatory power of the approach, it is limited to one loss scenario and relies on expert-selected, semi-quantitative behavioral parameters. The present work therefore prioritizes structural and qualitative validation (through incident reconstruction and pattern matching) over exhaustive numerical sensitivity analysis. Future work should model multiple, interacting hazards and validate the method across varied socio-technical domains through real-world trials. Building a library of reusable model components would further streamline adoption.
From a practical perspective, the application of this framework to the hospital oxygen supply system revealed that human factors often outweigh technical component failures in determining the severity of an incident. The simulation results demonstrated that emergency drills act as a critical buffer that decouples technical alarms from panic-induced delays. Consequently, we recommend that hospital administrators prioritize high-frequency, scenario-based training over static procedure reviews, as this directly mitigates panic and accelerates repair times. Furthermore, the study suggests that resource allocation should focus on upstream stress reduction for maintenance staff, as the interaction between high workload and alarm delays was shown to cause a non-linear escalation in patient risk. These insights provide a concrete basis for revising safety protocols to address both the physics of the infrastructure and the psychology of the operators. However, the modelling of human responses in the current framework remains simplified and based on expert-informed estimates, and future work should incorporate more detailed human-reliability data to enhance behavioral realism. Future research should also include formal, quantitative sensitivity analysis to systematically explore parameter uncertainty and to further refine the ranges obtained through expert elicitation and incident-based calibration.
By embedding STPA outputs in a dynamic simulation environment, the framework converts static hazard lists into a test bench for policy experimentation. It offers practitioners a scalable, structured tool for proactive safety planning, enabling them to compare control strategies, prioritize interventions, and make risk-informed decisions in complex socio-technical systems. Although specific parameter values remain context-dependent, we expect the integrated framework itself to be adaptable to other safety-critical domains for scenario-based analysis.

Author Contributions

Conceptualization, A.S., I.H.A., G.D.G. and R.P.; methodology, A.S., A.A., G.D.G. and R.P.; software, A.A.; validation, A.S., A.A. and R.P.; formal analysis, A.S., A.A. and R.P.; investigation, A.S., A.A., G.D.G. and R.P.; resources, A.S., I.H.A., G.D.G. and R.P.; data curation, A.S., A.A. and R.P.; writing—original draft preparation, A.S., A.A. and R.P.; writing—review and editing, A.S., A.A., G.D.G. and R.P.; visualization, A.A.; supervision, A.S., I.H.A., G.D.G. and R.P.; project administration, A.S. and R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Leveson, N.; Thomas, J. STPA Handbook; Cambridge, MA, USA, 2018; pp. 1–188.
  2. Brauer, R.L. Safety and Health for Engineers; John Wiley & Sons: Hoboken, NJ, USA, 2016.
  3. Leveson, N. A new accident model for engineering safer systems. Saf. Sci. 2004, 42, 237–270.
  4. Guillerm, R.; Demmou, H.; Sadou, N. Safety evaluation and management of complex systems: A system engineering approach. Concurr. Eng. Res. Appl. 2012, 20, 149–159.
  5. Bugalia, N.; Maemura, Y.; Ozawa, K. A system dynamics model for near-miss reporting in complex systems. Saf. Sci. 2021, 142, 105368.
  6. Huang, J.; You, J.X.; Liu, H.C.; Song, M.S. Failure mode and effect analysis improvement: A systematic literature review and future research agenda. Reliab. Eng. Syst. Saf. 2020, 199, 106885.
  7. Lilli, G.; Sanavia, M.; Oboe, R.; Vianello, C.; Manzolaro, M.; De Ruvo, P.L.; Andrighetto, A. A semi-quantitative risk assessment of remote handling operations on the SPES Front-End based on HAZOP-LOPA. Reliab. Eng. Syst. Saf. 2024, 241, 109609.
  8. Bensaci, C.; Zennir, Y.; Pomorski, D.; Innal, F.; Liu, Y.; Tolba, C. STPA and Bowtie risk analysis study for centralized and hierarchical control architectures comparison. Alex. Eng. J. 2020, 59, 3799–3816.
  9. Wu, X.; Huang, H.; Xie, J.; Lu, M.; Wang, S.; Li, W.; Huang, Y.; Yu, W.; Sun, X. A novel dynamic risk assessment method for the petrochemical industry using bow-tie analysis and Bayesian network analysis method based on the methodological framework of ARAMIS project. Reliab. Eng. Syst. Saf. 2023, 237, 109397.
  10. Baybutt, P. Requirements for improved process hazard analysis (PHA) methods. J. Loss Prev. Process Ind. 2014, 32, 182–191.
  11. Garbolino, E.; Chery, J.P.; Guarnieri, F. A Simplified Approach to Risk Assessment Based on System Dynamics: An Industrial Case Study. Risk Anal. 2016, 36, 16–29.
  12. Stemn, E.; Fosu, S.; Addo, L.N.A. Assessment of occupational hazards exposures of artisanal and small-scale mining in Ghana. J. Saf. Sustain. 2025, in press.
  13. Wang, S.; Zhu, Y. A theoretical framework for analyzing firefighters’ situational awareness and information requirements in large chemical tank firefighting. J. Saf. Sustain. 2025, in press.
  14. Patriarca, R.; Chatzimichailidou, M.; Karanikas, N.; Di Gravio, G. The past and present of System-Theoretic Accident Model And Processes (STAMP) and its associated techniques: A scoping review. Saf. Sci. 2022, 146, 105566.
  15. Rasmussen, J. Risk management in a dynamic society: A modelling problem. Saf. Sci. 1997, 27, 183–213.
  16. Young, W.; Leveson, N.G. Inside risks an integrated approach to safety and security based on systems theory: Applying a more powerful new safety methodology to security risks. Commun. ACM 2014, 57, 31–35.
  17. Ishimatsu, T.; Leveson, N.G.; Thomas, J.P.; Fleming, C.H.; Katahira, M.; Miyamoto, Y.; Ujiie, R.; Nakao, H.; Hoshino, N. Hazard Analysis of Complex Spacecraft Using Systems-Theoretic Process Analysis. J. Spacecr. Rocket. 2014, 51, 509–522.
  18. Abrecht, B.; Arterburn, D.; Horney, D.; Schneider, J.; Abel, B.; Leveson, N. A new approach to hazard analysis for rotorcraft. In Proceedings of the Specialists’ Meeting on Development, Affordability and Qualification of Complex Systems 2016, Huntsville, AL, USA, 9–10 February 2016; American Helicopter Society International: Fairfax, VA, USA, 2016.
  19. Dghaym, D.; Hoang, T.S.; Turnock, S.R.; Butler, M.; Downes, J.; Pritchard, B. An STPA-based formal composition framework for trustworthy autonomous maritime systems. Saf. Sci. 2021, 136, 105139.
  20. Lu, X.; Zeng, S.; Guo, J.; Deng, W.; He, M.; Che, H. An integrated method of extended STPA and BN for safety assessment of man-machine phased-mission system. Reliab. Eng. Syst. Saf. 2025, 253, 110569.
  21. Nakashima, T.; Kureta, R.; Khastgir, S. Addressing systemic risks in autonomous maritime navigation: A structured STPA and ODD-based methodology. Reliab. Eng. Syst. Saf. 2025, 261, 111041.
  22. Riccardi, L.; Compare, M.; Mascherona, R.; Zio, E. Structural causal modeling and STPA for the risk analysis of a rail system powered by H2 fuel. Reliab. Eng. Syst. Saf. 2025, 256, 110758.
  23. Flavio Vismari, L.; Camargo Junior, J.B. A safety assessment methodology applied to CNS/ATM-based air traffic control system. Reliab. Eng. Syst. Saf. 2011, 96, 727–738.
  24. Bendib, R.; Mechhoud, E.; Bendjama, H.; Boulksibat, H. Risk Assessment of a Gas Plant (Unit 30 Skikda Refinery) Using Hazop & Bowtie Methods, Simulation of Dangerous Scenarios Using ALOHA Software. Alger. J. Signals Syst. 2020, 5, 25–32.
  25. Leveson, N.; Couturier, M.; Thomas, J.; Dierks, M.; Wierz, D.; Psaty, B.M.; Finkelstein, S. Applying System Engineering to Pharmaceutical Safety. J. Healthc. Eng. 2012, 3, 391–414.
  26. Cooke, D.L. A system dynamics analysis of the Westray mine disaster. Syst. Dyn. Rev. 2003, 19, 139–166.
  27. Shire, M.I.; Jun, G.T.; Robinson, S. The application of system dynamics modelling to system safety improvement: Present use and future potential. Saf. Sci. 2018, 106, 104–120.
  28. Yamada, T.; Sato, M.; Kuranobu, R.; Watanabe, R.; Itoh, H.; Shiokari, M.; Yuzui, T. Evaluation of effectiveness of the STAMP/STPA in risk analysis of autonomous ship systems. J. Phys. Conf. Ser. 2022, 2311, 012021.
  29. Shaban, A.; Abdelwahed, A.; Di Gravio, G.; Afefy, I.H.; Patriarca, R. A systems-theoretic hazard analysis for safety-critical medical gas pipeline and oxygen supply systems. J. Loss Prev. Process Ind. 2022, 77, 104782.
  30. Patriarca, R.; Di Gravio, G.; Costantino, F.; Fedele, L.; Tronci, M.; Bianchi, V.; Caroletti, F.; Bilotta, F. Systemic safety management in anesthesiological practices. Saf. Sci. 2019, 120, 850–864.
  31. Silvis-Cividjian, N.; Verbakel, W.; Admiraal, M. Using a systems-theoretic approach to analyze safety in radiation therapy-first steps and lessons learned. Saf. Sci. 2020, 122, 104519.
  32. Blandine, A. Systems Theoretic Hazard Analysis (STPA) Applied to the Risk Review of Complex Systems: An example from the Medical Device Industry. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2013.
  33. Bensaci, C.; Zennir, Y.; Pomorski, D.; Innal, F.; Lundteigen, M.A. Collision hazard modeling and analysis in a multi-mobile robots system transportation task with STPA and SPN. Reliab. Eng. Syst. Saf. 2023, 234, 109138.
  34. Dakwat, A.L.; Villani, E. System safety assessment based on STPA and model checking. Saf. Sci. 2018, 109, 130–143.
  35. Bjerga, T.; Aven, T.; Zio, E. Uncertainty treatment in risk analysis of complex systems: The cases of STAMP and FRAM. Reliab. Eng. Syst. Saf. 2016, 156, 203–209.
  36. Jiao, J.; Jing, Y.; Pang, S. An Integrated Quantitative Safety Assessment Framework Based on the STPA and System Dynamics. Systems 2022, 10, 137.
  37. Leveson, N.; Daouk, M.; Dulac, N.; Marais, K. A Systems Theoretic Approach to Safety Engineering; Massachusetts Institute of Technology: Cambridge, MA, USA, 2003.
  38. Forrester, J.W. Industrial Dynamics; The M.I.T. Press: Cambridge, MA, USA, 1961.
  39. Yu, J.-H.; Ahn, N.; Jae, M. A Quantitative Assessment of Organizational Factors Affecting Safety Using System Dynamics Model. Nucl. Eng. Technol. 2004, 36, 64–72.
  40. Kang, K.M.; Jae, M. A quantitative assessment of LCOs for operations using system dynamics. Reliab. Eng. Syst. Saf. 2005, 87, 211–222.
  41. Bouloiz, H.; Garbolino, E.; Tkiouat, M.; Guarnieri, F. A system dynamics model for behavioral analysis of safety conditions in a chemical storage unit. Saf. Sci. 2013, 58, 32–40.
  42. Simone, F.; Akel, A.J.N.; Di Gravio, G.; Patriarca, R. Thinking in Systems, Sifting Through Simulations: A Way Ahead for Cyber Resilience Assessment. IEEE Access 2023, 11, 11430–11450.
  43. Sterman, J. System Dynamics: Systems Thinking and Modeling for a Complex World; Working Paper; Massachusetts Institute of Technology, Engineering Systems Division: Cambridge, MA, USA, 2002; Available online: https://dspace.mit.edu/handle/1721.1/102741 (accessed on 11 December 2025).

Figure 1. STPA and System Dynamics Methodology.
Figure 2. Stock and Flow Diagram.
Figure 3. Effect of Repair Time on The Number of Affected Patients.
Figure 4. Effect of Alarm Delay and Work Stress on Affected Patients.
Figure 5. Study the effect of Drills on the affected patients.

Table 1. Loss scenarios discovered by STPA [29].

| Loss Scenario Description | Control Action and Unsafe Control Action | Threats Affecting the Scenario |
|---|---|---|
| The maintenance team failed to carry out the temporary restoration needed in the affected ward. | Control Action: Conduct temporary maintenance to reinstate oxygen service in the impacted area. Unsafe Control Action: The temporary restoration task was omitted or not performed by the responsible personnel. | An extended loss of oxygen supply caused by failure to implement immediate restoration measures. |
| Temporary restoration efforts were completed without first addressing the most critical clinical zones. | Control Action: Execute temporary system repair to resume oxygen delivery to all affected units. Unsafe Control Action: Restoration was conducted without prioritizing life-support or intensive care departments. | Delay in re-supplying oxygen to high-risk areas, heightening the exposure of vulnerable patients. |
| The emergency process to re-establish oxygen flow was started too late after the disruption occurred. | Control Action: Initiate the emergency response procedure to restore oxygen distribution. Unsafe Control Action: The activation of the emergency response was delayed beyond the acceptable timeframe. | Longer interruption of oxygen delivery results in extended downtime and safety risk escalation. |
| Reconnection and cylinder transfer operations significantly slowed the temporary restoration process. | Control Action: Apply temporary recovery measures through cylinder replacement or alternate pipeline link. Unsafe Control Action: Excessive time was consumed due to logistical delays in moving and connecting the backup source. | Prolonged outage of oxygen service and elevated hazard to patient safety. |

Table 2. Variables Descriptions.

| Variable | Description | Type | STPA Source Reference |
|---|---|---|---|
| Total Number of Patients | Number of patients in the study | Remaining Variable | Hazard H3 definition, and loss L1 & L2 definition |
| AVSU Monitor | Represents AVSU monitored by nursing staff | STPA Variable | CA-7 |
| AAPU Alarm | Area Alarm Panel | STPA Variable | CA-5 and UCA-8 & UCA-5 |
| Main Panel Alarm | The alarm of the Main Panel | STPA Variable | CA-6 & CA-9 |
| Nurse Alarm | Shows that the nurse staff were notified | STPA Variable | CA-5 and UCA-8 & UCA-5 |
| Alert | Initiating alert in case of O2 drop | Remaining Variable | Derived from brainstorming interviews |
| Work-Related Stress Factor | Stress from top management on maintenance staff | Remaining Variable | Derived from brainstorming interviews |
| Number of Drills Executed | Drills executed over the year | Remaining Variable | Derived from brainstorming interviews |
| Panic | Panic experienced by nurse and maintenance staff delays the alarm or repair process | Remaining Variable | Derived from brainstorming interviews |
| Alarm System Failure | Failure of the alarm system to work | STPA Variable | UCA-4, UCA-5, UCA-8 |
| Alert Delay | Delay due to panic or alarm system failure | STPA Variable | CA-4, CA-5, CA-6, UCA-4 & UCA-5 |
| Response Time | Time for maintenance staff to reach affected area | STPA Variable | CA-11, UCA-11, and Scenario 11.4 |
| Survival Time | Time a patient can survive O2 deficiency | STPA Variable | Hazard H3 definition, and loss L1 & L2 definition |
| Repair Time | Time to restore system (maintenance) | STPA Variable | CA-11, UCA-11, and Scenario 11.4 |
| Patients Survivability | Rate of all patients to survive before getting affected | STPA Variable | Scenario 11.4 |
| Number of Affected Patients | Patients affected by O2 deficiency | STPA Variable | Scenario 11.4 |
| Effect Rate | Rate at which patients are affected (normal to affected) | STPA Variable | Scenario 11.4 |

Table 3. Model Equations.

| Variable | Equation |
|---|---|
| Alert | IF THEN ELSE(Main Panel Alarm = 1 :OR: Nurse Alarm = 1, 1, 0) |
| Alert Delay | Alarm System Failure + Panic |
| Effect rate | IF THEN ELSE(Repair Time ≥ Survival Time, IF THEN ELSE((Repair Time / Survival Time) * Patients Survivability > 1, 1, (Repair Time / Survival Time) * Patients Survivability) * Total Number of Patients, 0) |
| Number of Affected Patients | INTEG(Effect rate, 0) |
| Nurse Alarm | IF THEN ELSE(AAPU Alarm = 1 :OR: AVSU Monitoring = 1, 1, 0) |
| Panic | 2 * (1 / Number of Drills Executed) |
| Repair Time | IF THEN ELSE(Alert = 1, 5 + Response Time + Alert Delay, 0) |
| Response Time | (1 * (1 + Work Related Stress Factor)) + Panic |
| Total Number of Patients | INTEG(-Effect rate, 50) |
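
To make the algebraic part of Table 3 easier to follow, the auxiliary equations for Panic, Response Time, Alert Delay, and Repair Time can be sketched as plain Python functions. This is only an illustrative transcription, not the authors' simulation model: the `Inputs` container, the function names, and the assumption that all times are in minutes are ours, and the default values are taken from the initial values reported in Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inputs:
    """Hypothetical container for the exogenous inputs (names are ours)."""
    drills_per_year: float = 1.0        # Number of Drills Executed (1 to 12 per year)
    stress_factor: float = 0.0          # Work-Related Stress Factor (0 to 1)
    alarm_system_failure: float = 0.0   # alarm failure delay, assumed minutes (0 to 5)
    alert: int = 1                      # 1 if an alarm has raised the alert, else 0


def panic(x: Inputs) -> float:
    # Panic = 2 * (1 / Number of Drills Executed)
    return 2.0 * (1.0 / x.drills_per_year)


def response_time(x: Inputs) -> float:
    # Response Time = (1 * (1 + Work Related Stress Factor)) + Panic
    return 1.0 * (1.0 + x.stress_factor) + panic(x)


def alert_delay(x: Inputs) -> float:
    # Alert Delay = Alarm System Failure + Panic
    return x.alarm_system_failure + panic(x)


def repair_time(x: Inputs) -> float:
    # Repair Time = IF THEN ELSE(Alert = 1, 5 + Response Time + Alert Delay, 0)
    if x.alert == 1:
        return 5.0 + response_time(x) + alert_delay(x)
    return 0.0


if __name__ == "__main__":
    # Sweep the drill frequency while holding the other inputs at their defaults.
    for drills in (1, 4, 12):
        x = Inputs(drills_per_year=drills)
        print(f"drills={drills:2d}  panic={panic(x):.2f}  repair_time={repair_time(x):.2f}")
```

Because Panic enters both Response Time and Alert Delay, each additional drill shortens Repair Time through two paths in this transcription, which is consistent with the buffering role of drills described in the conclusions.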

Table 4. Model Variables Initial Values.

| Variable | Initial Value | Max Value |
|---|---|---|
| Total Number of Patients | 50 | 50 |
| Alarm System Failure | 0 min delay | 5 min delay |
| Number of Drills Executed | 1 per year | 12 per year |
| Panic | 2 min delay | Infinity |
| Patients Survivability | 0.3 | 0.3 |
| Response Time | 1 min delay | Infinity |
| Survival Time | 3 min | 5 min |
| Work-Related Stress Factor | 0 | 1 |

Table 5. First study Variables Values.

| Variable | Value |
|---|---|
| Repair Time | 1, 3, 5 min |
| Drills | 1 |
| Panic | 2 min delay |
| Alarm System Failure | 0 min delay |
| Stress Factor | 0 |
| Response Time | 1 min delay |
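
The stock-and-flow part of the model (the two INTEG equations and the Effect rate in Table 3) can be illustrated with a simple Euler integration for the first study, in which Repair Time is set directly to 1, 3, and 5 min. The sketch below is a hedged approximation: the function names, the one-minute time step, and the ten-step horizon are our assumptions, since the integration settings are not stated here, so the numbers it produces should not be read as the published results.

```python
def effect_rate(repair_time, survival_time, survivability, patients):
    """Effect rate from Table 3; the inner IF THEN ELSE caps the fraction at 1."""
    if repair_time < survival_time:
        return 0.0
    fraction = min(1.0, (repair_time / survival_time) * survivability)
    return fraction * patients


def simulate(repair_time, survival_time=3.0, survivability=0.3,
             total_patients=50.0, steps=10, dt=1.0):
    """Euler integration of the two stocks (dt and steps are assumed values)."""
    patients = total_patients   # Total Number of Patients = INTEG(-Effect rate, 50)
    affected = 0.0              # Number of Affected Patients = INTEG(Effect rate, 0)
    history = [affected]
    for _ in range(steps):
        flow = effect_rate(repair_time, survival_time, survivability, patients) * dt
        patients -= flow
        affected += flow
        history.append(affected)
    return history


if __name__ == "__main__":
    # First study (Table 5): Repair Time swept over 1, 3, and 5 min.
    for rt in (1.0, 3.0, 5.0):
        print(f"repair_time={rt:.0f} min  affected after 10 steps={simulate(rt)[-1]:.1f}")
```

In this sketch no patients are affected while Repair Time stays below Survival Time, and above that threshold the affected stock grows faster as Repair Time increases, which mirrors the threshold behavior encoded in the Effect rate equation.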
