Integrating Commercial-Off-The-Shelf Components into Radiation-Hardened Drone Designs for Nuclear-Contaminated Search and Rescue Missions

This paper conducts a focused probabilistic risk assessment (PRA) on the reliability of commercial off-the-shelf (COTS) drones deployed for surveillance in areas with diverse radiation levels following a nuclear accident. The study employs the event tree/fault tree digraph approach, integrated with the dual-graph error propagation method (DEPM), to model sequences that could lead to loss of mission (LOM) scenarios due to combined hardware–software failures in the drone’s navigation system. The impact of radiation is simulated by a comparison of the total ionizing dose (TID) with the acceptable limit for each component. Errors are then propagated within the electronic hardware and software blocks to determine the navigation system’s reliability in different radiation zones. If the system is deemed unreliable, a strategy is suggested to identify the minimum radiation-hardening requirement for its subcomponents by reverse-engineering from the desired mission success criteria. The findings of this study can aid in the integration of COTS components into


Introduction
Uncrewed Aerial Vehicles (UAVs), commonly known as drones, have become increasingly popular in various applications due to their versatility and cost-effectiveness. One such application is search and rescue (SAR) activities, particularly in the aftermath of a nuclear accident. These missions are critical to assess the extent of damage, locate survivors, and monitor radiation levels [1,2]. However, the high radiation levels in such environments pose a significant risk to the electronic components of UAVs, potentially leading to mission failure. The use of commercial off-the-shelf (COTS) drones in SAR missions offers several advantages, such as cost-effectiveness, rapid deployment, and ease of operation. However, these drones are not typically designed to withstand the harsh radiation environments encountered in nuclear accidents. On the other hand, radiation-hardened UAVs may be employed, but their use may not be justified in all cases due to factors such as availability, cost, and mission-specific requirements.
Given these challenges, there is a pressing need for a systematic approach to assess the reliability of COTS drones in nuclear SAR missions. This paper presents a probabilistic risk assessment (PRA) approach to determining radiation-hardening limits for COTS drones in radiological SAR operations based on predefined mission success criteria. PRA offers a comprehensive framework that is well suited to modeling intricate dependencies and failure modes in complex systems. PRA methods have contributed significantly to ensuring safety in current nuclear and aerospace operations.

Literature Review
Recent years have seen a surge in the use of drones for SAR missions, with studies such as that by Murphy et al. [3] highlighting their potential in providing real-time situational awareness during disaster management. The effects of radiation on electronic components are also well documented. Fleetwood [4], Normand [5], and Dodd et al. [6] provide comprehensive reviews of these effects, including single-event upsets (SEUs) and total ionizing dose (TID) effects, which are particularly relevant to our study. As such, the use of COTS components in radiation environments presents both challenges and opportunities. Barnaby et al. [7,8] underscore the need for radiation-hardening in microelectronics, a concept explored in depth by Ladbury [9]. Separately, PRA has been widely used to evaluate the reliability of systems, as outlined by Apostolakis [10] and Modarres et al. [11]. However, the application of PRA to drone systems, particularly in the context of nuclear SAR missions, is a novel research area. While the literature provides valuable insights into the individual aspects of drone design for nuclear-contaminated SAR missions, a comprehensive study is needed that integrates these aspects into a systematic approach for assessing drone system reliability. This study aims to fill this gap by proposing a PRA-based approach for assessing the reliability of COTS drones in nuclear SAR missions.
Our proposed approach utilizes event trees, fault trees, and a Markov-chain-based dual-graph error propagation methodology to model sequences leading to Loss of Mission (LOM) due to component failures in the drone's navigation system. Radiation effects are simulated by comparing the total ionizing dose (TID) received by each component against its permissible limit, and errors are propagated within the electronic hardware and software blocks to quantify navigation system availability per radiation zone. The results provide a demonstrative assessment of the drone's availability in nuclear SAR missions.
The remainder of this paper is organized as follows: Section 2 provides a brief overview of the problem scope and the methodology behind the proposed solution, including the modeling of radiation effects on electronic components and the integration of dynamic failure scenarios into the PRA. Section 3 presents a case study demonstrating the application of the proposed approach to a COTS drone navigation system. Section 4 discusses the results and their implications for drone design, component selection, and radiation-hardening. Finally, Section 5 concludes the paper by addressing the limitations and outlining ideas for future research.

Methodology
In this section, we provide a brief overview of the problem scope and the methodology behind our proposed solution to the assessment of COTS drone availability in nuclear SAR missions. We introduce a few terms from a resilience ontology in the context of temporal logic, using the Kripke structure notation to model time-dependent risk [12–15]. The formal equation and definitions are introduced in Equation (1) and Table 1. A system M consists of states S that transition along a bounded but countably infinitely long path P, starting with the initial state i. R is the transition relation that maps all the valid state transitions. An equivalent definition can be expressed using the edge list e = {A, B, C, D, E, F, G, H, I}, which forms our alphabet. The traversal of path P produces words W. This allows us to build a grammar with which we can define events, system properties, and emergent behaviors. This grammar can include words of our choosing; some examples are listed in Table 2. By extension, each trajectory or word is expressible in a temporal sense, as depicted in Figure 1 [16].

Drones 2023, 7, 528 3 of 17
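The Kripke-style definitions above can be illustrated with a minimal Python sketch. It assumes the three states of Table 1 and a fully connected relation R, which has exactly nine edges and can therefore be labeled with the nine letters A to I; the letter assigned to each edge here is an assumption for illustration, not the paper's own mapping. Words W are enumerated as label sequences along bounded paths from the initial state, and a simple word property ("the path enters the failure state") shows how a grammar can define events.

```python
from itertools import product

# States from Table 1.
STATES = ("nominal", "degraded", "failure")

# Fully connected transition relation R ⊆ S × S (3 × 3 = 9 edges).
# ASSUMPTION: this letter-to-edge mapping is illustrative only.
EDGES = dict(zip("ABCDEFGHI", product(STATES, STATES)))
R = set(EDGES.values())


def is_left_total(relation, states):
    """True if every state has at least one outgoing transition."""
    return set(states) <= {src for src, _ in relation}


def words(initial, length):
    """Enumerate every word (sequence of edge labels) of the given length
    that corresponds to a valid path starting at `initial`."""
    result = []

    def extend(state, word):
        if len(word) == length:
            result.append(word)
            return
        for label, (src, dst) in EDGES.items():
            if src == state:
                extend(dst, word + label)

    extend(initial, "")
    return result


def reaches(word, state):
    """A simple word property: does the path ever enter `state`?"""
    return any(EDGES[label][1] == state for label in word)


print(is_left_total(R, STATES))  # True
print(words("nominal", 2))
print([w for w in words("nominal", 2) if reaches(w, "failure")])
```

Bounding the word length keeps the enumeration finite, which mirrors the bounded path P in the definition.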

Term | Definition | Description
S | S = {nominal, degraded, failure} | A set of possible states.
R | R ⊆ S × S | A mapping or transition relation, where R is left-total (if the source set S equals the domain, R ⊆ S × S is left-total), and R is fully connected.

Risk is evaluated by identifying potential failure scenarios, assessing their likelihood, and determining the consequences if these failures occur. Consequently, risk is formally expressed as a complete set of N triplets that include a scenario description s_i, its probability p_i, and the consequences, i.e., the resulting damage measure or evaluation metric x_i:

$$\text{Risk} = \left\{ \langle s_i, p_i, x_i \rangle \right\}, \quad i = 1, 2, \ldots, N \qquad (2)$$

Conventional PRA approaches involve sequence-based modeling where initiating events are chosen and conditional event progressions are analyzed, leading to end states of interest. By incorporating consequence information into these PRA models, frequency-consequence curves can be formulated [17]. In event tree analysis, probabilities are assigned to functional events depicting various components, systems, or operator actions using fault trees [18]. These probabilities take into account either time-dependent or on-demand failure modes given predetermined mission durations.
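To make the triplet definition concrete, the following minimal sketch stores risk as a list of (s_i, p_i, x_i) triplets and builds a frequency-consequence (exceedance) curve from them, assuming the scenarios are mutually exclusive so that their probabilities can be summed. The scenario names, probabilities, and consequence scale are hypothetical placeholders, not results from this study.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    scenario: str        # s_i: scenario description
    probability: float   # p_i: probability of the scenario
    consequence: float   # x_i: damage measure or evaluation metric


def exceedance_curve(risk):
    """Return (x, P[X >= x]) for each distinct consequence level,
    assuming mutually exclusive scenarios."""
    levels = sorted({t.consequence for t in risk})
    return [(x, sum(t.probability for t in risk if t.consequence >= x))
            for x in levels]


# HYPOTHETICAL triplets on an arbitrary consequence scale.
risk = [
    Triplet("mission completed", 0.97, 0.0),
    Triplet("navigation degraded, mission aborted", 0.02, 1.0),
    Triplet("loss of vehicle in Zone C", 0.01, 10.0),
]
for x, p in exceedance_curve(risk):
    print(f"P(consequence >= {x:g}) = {p:.3f}")
```

Plotting these pairs on log axes gives the frequency-consequence curve referred to above.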
In certain situations, however, event tree/fault tree methods may need to be supplemented with specialized analysis techniques to model systems that involve error propagation failure modes or incorporate multiple failure paths, such as the example provided in the previous section [19]. Tracking the propagation of errors from discrete sub-components to system or functional levels in such systems presents a unique challenge. DEPM is an extension of Markov chains that enables the modeling of data flows and control flows as two separate Markov chains within a cyber-physical system [20]. DEPM terms and definitions are listed in Equation (3) and Table 3, respectively. The interaction between the control and data Markov chains is specified by the DEPM algorithm. In DEPM, we consider a system as a set of independent, discrete elements, which can be sensor modules or software blocks in a mechatronic system [21]. When a fault activates, it can propagate to connected elements, where it may corrupt data or alter the control and data flows. We present the DEPM formalism along with an example in the following figures.
Table 3. Definitions for terms in dual-graph error propagation model (DEPM).

Term | Definition
E | A set of elements, always non-empty.
D | A set of optional data terms.
A_c | An edge-list representing control flows.
A_d | An edge-list representing data flows.
C | A list of conditional expressions, which apply to the element set E.
Flows from an element may branch erroneously, depending on its corresponding failure rates/probabilities. By extension, error propagation analyses can simulate single-event upsets (SEUs). To perform quantitative evaluations, DEPM models are automatically transformed into continuous-time Markov chains (CTMCs) or discrete-time Markov chains (DTMCs). Figure 2 and Table 4 illustrate an example DEPM with associated conditional logic expressions. This DEPM computes the probability that the output variable, data variable 4, is corrupted. Given that SEUs are stochastic in nature, this may occur at any time [22]. To achieve this goal, expressions can be evaluated using quantified Boolean formula (QBF) satisfiability solvers [23,24]. Relevant metrics such as the mean time to failure (MTTF), the total number of failures, and the time-dependent failure probability can be quantified directly using formal verification and model-checking methods. Since it is based on probabilistic model checking, DEPM is better suited than traditional methods such as fault trees and Markov chains to modeling the behavior of smaller but highly interdependent systems.
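The transformation from an error propagation model to a DTMC can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the DEPM algorithm of [20] or the tooling used in this study: it assumes a single data item flowing along the control flow, a per-element fault activation probability, and a per-element probability that an already corrupted input stays corrupted at the output. Each DTMC state is (element about to execute, whether its input is corrupted), and the probability of a corrupted output is the absorption probability into the corrupted END state, computed by Gauss-Seidel iteration so that control-flow loops are handled. All element names and numbers are hypothetical.

```python
def corruption_probability(flow, p_fail, propagate, start, tol=1e-12, max_iter=100_000):
    """Probability that the data item reaching 'END' is corrupted.

    flow      : element -> list of (next element, branch probability); 'END' absorbs
    p_fail    : element -> probability that a fault activates and corrupts its output
    propagate : element -> probability that a corrupted input stays corrupted
    """
    # h[(e, c)] = P(corrupted at END | about to execute e with input status c)
    h = {(e, c): 0.0 for e in flow for c in (False, True)}
    h[("END", False)], h[("END", True)] = 0.0, 1.0
    for _ in range(max_iter):
        delta = 0.0
        for e in flow:
            for c in (False, True):
                if c:  # corrupted input: propagates, or is masked but a new fault activates
                    p_bad = propagate[e] + (1.0 - propagate[e]) * p_fail[e]
                else:  # clean input: only a newly activated fault corrupts the output
                    p_bad = p_fail[e]
                new = sum(w * (p_bad * h[(nxt, True)] + (1.0 - p_bad) * h[(nxt, False)])
                          for nxt, w in flow[e])
                delta = max(delta, abs(new - h[(e, c)]))
                h[(e, c)] = new  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    return h[(start, False)]


# HYPOTHETICAL Kalman-filter-like loop: predict/update repeats with p = 0.9.
flow = {
    "read_imu": [("predict", 1.0)],
    "predict": [("update", 1.0)],
    "update": [("predict", 0.9), ("write_state", 0.1)],
    "write_state": [("END", 1.0)],
}
p_fail = {"read_imu": 1e-4, "predict": 1e-5, "update": 1e-5, "write_state": 1e-4}
propagate = {e: 1.0 for e in flow}
print(corruption_probability(flow, p_fail, propagate, "read_imu"))
```

For a straight chain with full propagation this reduces to 1 minus the product of the per-element success probabilities, which is a useful sanity check.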

Demonstration Case
In this demonstration case, we consider a COTS drone equipped with a navigation system, a communication system, and a radiation sensor payload. The drone's primary mission is to perform SAR activities in a nuclear-contaminated environment, which includes monitoring radiation levels, identifying damaged infrastructure, and locating survivors. The drone's navigation system comprises a power subsystem, inertial measurement unit (IMU) sensors, positioning sensors, and a Kalman filter. The drone is tasked with flying over a predefined search area, which is divided into three zones with varying radiation levels. The drone starts its mission in the low-radiation zone (Zone A), transitions to the medium-radiation zone (Zone B), and finally enters the high-radiation zone (Zone C) before returning to base. Radiation levels in Zone A are based on background radiation [25,26]. Radiation levels in Zones B and C are sourced from samples taken in and around Unit 4 and the surrounding buildings at the Chernobyl nuclear power plant (NPP) shortly after the explosion [27]. Since mission success depends on the UAV successfully performing SAR activities in each zone before Loss of Vehicle (LOV) occurs, the analysis ends once the drone completes its mission objectives in Zone C.
The absorbed dose rates, time in each zone, and total absorbed dose are sampled from a truncated normal distribution, N(µ, σ), truncated to represent a realistic and physically meaningful sampling space (for example, time cannot be negative), and from a log-uniform distribution, LU(min, max), with lower bound min and upper bound max. These distributions are listed in Table 5 and illustrated in Figure 3. The proposed approach is applied to assess the drone's availability in each radiation zone by considering the potential failure scenarios due to radiation-induced component failures. TID is calculated for each component in the drone's navigation system and compared with the component's permissible limit to determine the likelihood of failure.
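The sampling step can be sketched as follows. The sketch assumes that dose rates follow a log-uniform distribution and that the time spent in each zone follows a normal distribution truncated at zero, and it accumulates dose zone by zone so that the value at the end of each zone is the cumulative TID. All numerical parameters and units are hypothetical placeholders, not the values in Table 5. Truncation is implemented by simple rejection, which is adequate when the truncated tail carries little probability mass.

```python
import math
import random


def truncated_normal(rng, mu, sigma, low=0.0, high=math.inf):
    """Draw from N(mu, sigma) restricted to [low, high] by rejection."""
    while True:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            return x


def log_uniform(rng, lo, hi):
    """Draw from LU(lo, hi): log(x) is uniform on [log(lo), log(hi)]."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


# HYPOTHETICAL parameters (arbitrary units), NOT the values of Table 5:
# dose-rate bounds for LU(min, max) and (mu, sigma) of the time in each zone.
ZONES = {
    "A": {"rate": (1e-4, 1e-3), "time": (0.3, 0.05)},
    "B": {"rate": (1e0, 1e2), "time": (0.3, 0.05)},
    "C": {"rate": (1e2, 1e4), "time": (0.3, 0.05)},
}


def sample_cumulative_dose(rng):
    """Cumulative absorbed dose at the end of each zone for one mission."""
    total, cumulative = 0.0, {}
    for zone, p in ZONES.items():  # zones are visited in order A -> B -> C
        rate = log_uniform(rng, *p["rate"])
        time = truncated_normal(rng, *p["time"], low=0.0)
        total += rate * time
        cumulative[zone] = total
    return cumulative


rng = random.Random(42)
missions = [sample_cumulative_dose(rng) for _ in range(10_000)]
```

Each entry of `missions` can then be compared against per-component TID limits to decide which components survive each zone.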

Scenario Description
The event tree for the drone's mission is constructed based on the sequence of events that the drone is expected to encounter during its mission. Starting with the initiating event, the likelihood of navigation system availability is computed for Zones A, B, and C. At each functional event, the tree branches into two outcomes: success or failure. The success branch leads to the next event in the sequence, while the failure branch leads to a Loss of Mission (LOM) end state. The failure probabilities at each node are calculated using the navigation system fault tree. Figure 4 illustrates this event tree, modeled in the OpenPRA framework [28].
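Because each functional event either succeeds (and the sequence continues) or fails (and the sequence ends in LOM), this event tree reduces to a product of conditional success probabilities. A minimal sketch, assuming each zone's navigation failure probability is already conditional on success in the preceding zones; the numbers are hypothetical placeholders, not fault tree results from this study:

```python
def quantify_event_tree(functional_events):
    """Quantify a linear success/failure event tree.

    functional_events: ordered list of (name, conditional failure probability).
    Returns the probability of each end state.
    """
    end_states = {}
    p_reach = 1.0  # probability of reaching the current functional event
    for name, p_fail in functional_events:
        end_states[f"LOM at {name}"] = p_reach * p_fail
        p_reach *= 1.0 - p_fail
    end_states["mission success"] = p_reach
    return end_states


# HYPOTHETICAL per-zone navigation failure probabilities.
events = [("Zone A", 1e-4), ("Zone B", 1e-4), ("Zone C", 5e-2)]
for state, p in quantify_event_tree(events).items():
    print(f"{state:>16}: {p:.3e}")
print("P(LOM) =", 1.0 - quantify_event_tree(events)["mission success"])
```

The end-state probabilities sum to one, so P(LOM) is simply the complement of the success sequence.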

Assumptions and Simplifications
In order to focus our analysis on the presented methodology, we made several assumptions and simplifications. These are necessary to streamline the discussion and concentrate on the core concepts, but it is important to note that they may limit the comprehensiveness of the model.
The event tree depicted in Figure 4 only considers the availability of the navigation system. A more comprehensive model would consider all components of the UAV and their interdependencies, including the potential for common cause failures (CCFs). As a result, the baseline failure probabilities presented in this study may appear lower than they would in a more complex model that includes CCFs. Additionally, mechatronic systems exposed to radiation environments can fail through a variety of mechanisms. Expressing failure likelihoods in terms of TID effects abstracts away the underlying failure mechanisms and simplifies the associated failure modes. Ionizing radiation can accelerate degradation in digital hardware through single-event upsets, gate oxide breakdown, and hot carrier injection, amongst others. In this study, we only consider TID effects, which are the cumulative effects of ionizing radiation on materials and devices. In addition, our model does not explicitly account for system recoveries or environmental factors beyond ambient radiation levels. While resilience is often associated with recovery capabilities after failures or disturbances, this aspect is not fully explored in this work. A previous work presented a DEPM-based analysis that explicitly modeled the temperature dependence of digital systems on partial or full recovery per mission phase [29].
The model does not take into account the potential impact of weather conditions or terrain on the drone's ability to navigate each zone. These factors could significantly affect the drone's performance and likelihood of mission success. Lastly, our model does not factor in the potential for human error in drone operation. In real-world scenarios, human error can contribute significantly to mission failure; however, incorporating such effects would add a layer of complexity that is beyond the scope of this study.

Navigation System Fault Tree
The fault tree for the drone's navigation system is constructed based on the potential failure modes of the system's components. This is depicted in Figure 5. A fault tree is a graphical representation of the logical relationships between the failures, or "basic events", and the system-level failure, or "top event". The basic events are the lowest-level failures that can occur in the system, while the top event is the failure of the entire system. The intermediate events represent the failure of subsystems or groups of components. The logical relationships between these events are represented by gates, which can be "AND" gates, "OR" gates, or more complex logical gates.
In the given fault tree, the top event is the failure of the drone's navigation system, represented by the gate "TOP". This event can occur due to the failure of the power system "SYSTEM_POW", the positioning sensors "SENSOR_POS", the Kalman filter "FILTER_KAL", or the inertial measurement unit sensors "SENSOR_IMU". The intermediate events are represented by the gates "SENSOR_POS", "FILTER_KAL", "SYSTEM_POW", "SENSOR_GPS", "SENSOR_VIZ", "BATT_FAIL", and "GPS_SIGNAL". Each of these gates represents a failure mode that can contribute to the top event. For example, the "SENSOR_POS" gate represents the failure of the positioning sensors, which can occur due to the failure of the GPS hardware "GPS_HW" or the visual sensors "SENSOR_VIZ".
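The gate logic can be sketched as a small recursive evaluator. It assumes independent basic events that each appear only once in the tree (so no minimal-cut-set reduction is needed) and a constant failure rate λ per basic event, giving P = 1 - exp(-λt) for an exposure time t. The tree uses the gate names from the text, but the gate types and the assignment of basic events below the first level are assumptions for illustration rather than a reproduction of Figure 5, and the rates are hypothetical placeholders rather than the values of Tables 6 and 7. BAT_LOW is set to zero on the assumption that the battery outlasts the mission.

```python
import math

# HYPOTHETICAL constant failure rates per hour, NOT the values of Tables 6 and 7.
RATES_PER_HOUR = {
    "SUPPLY_POW": 2e-6, "BATT_LOSS": 1e-6, "GPS_HW": 5e-6, "GPS_LOSSY": 1e-4,
    "CAM_HW": 5e-6, "RAD_HW": 5e-6, "CODE_KAL": 1e-6, "DSP_KAL": 2e-6,
    "SENSOR_IMU": 3e-6,
}

# ASSUMED gate structure using the gate names from the text (not Figure 5).
TREE = {
    "TOP": ("OR", ["SYSTEM_POW", "SENSOR_POS", "FILTER_KAL", "SENSOR_IMU"]),
    "SYSTEM_POW": ("OR", ["SUPPLY_POW", "BATT_FAIL"]),
    "BATT_FAIL": ("OR", ["BAT_LOW", "BATT_LOSS"]),
    "SENSOR_POS": ("OR", ["SENSOR_GPS", "SENSOR_VIZ"]),
    "SENSOR_GPS": ("OR", ["GPS_HW", "GPS_SIGNAL"]),
    "GPS_SIGNAL": ("OR", ["GPS_LOSSY"]),
    "SENSOR_VIZ": ("OR", ["CAM_HW", "RAD_HW"]),
    "FILTER_KAL": ("OR", ["CODE_KAL", "DSP_KAL"]),
}


def basic_event_probability(rate, hours):
    """Constant-failure-rate model: P(failure by t) = 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * hours)


def evaluate(node, tree, basic):
    """Probability of `node`, assuming independent, non-repeated basic events."""
    if node in basic:
        return basic[node]
    gate, children = tree[node]
    probs = [evaluate(child, tree, basic) for child in children]
    if gate == "AND":
        return math.prod(probs)
    if gate == "OR":
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unsupported gate type: {gate}")


mission_hours = 1.0  # HYPOTHETICAL exposure time
basic = {name: basic_event_probability(r, mission_hours) for name, r in RATES_PER_HOUR.items()}
basic["BAT_LOW"] = 0.0  # ASSUMPTION: battery outlasts the mission
print(f"P(TOP) = {evaluate('TOP', TREE, basic):.3e}")
```

A tree with shared basic events would instead need minimal cut sets or a BDD-based quantification, which this sketch does not attempt.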
The basic events are the lowest-level failures that can occur in the system. These include the failure of the power supply "SUPPLY_POW", the battery running low "BAT_LOW", the loss of the battery "BATT_LOSS", the failure of the GPS hardware "GPS_HW", the loss of the GPS signal "GPS_LOSSY", the failure of the camera hardware "CAM_HW", the failure of the radiation sensor hardware "RAD_HW", the failure of the Kalman filter code "CODE_KAL", the failure of the Kalman filter DSP "DSP_KAL", and the failure of the IMU sensors "SENSOR_IMU". Failure rates for each hardware component and basic event are listed in Tables 6 and 7. These rates were acquired from the Texas Instruments reliability database [30]. They are used to calculate the probability of each basic event for the elapsed time in each radiation zone, which in turn is used to calculate the probability of the intermediate and top events using the logical relationships defined by the gates. This allows for a quantitative assessment of the reliability of the drone's navigation system. Basic event BATT_LOW models battery drain, with the cumulative distribution function (CDF) plotted in Figure 6. We can observe that the battery was chosen to last well beyond the mission time.

In the DEPM, the assembly code is first translated into a control flow graph (CFG) and a data flow graph (DFG). The CFG represents the flow of control in the program, while the DFG represents the flow of data between operations. The DEPM then combines these two graphs into a dual graph, which represents both the control flow and data flow in the program. The DEPM is used to analyze the propagation of accumulated errors in the software, caused by TID effects in the DSP hardware. Figure 7 illustrates the DEPM for the Kalman filter assembly in Table 8, compiled using the LLVMPars framework [21,31].
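The CFG/DFG construction can be illustrated on a toy instruction list. The sketch below is not LLVMPars and does not parse real assembly: each hypothetical instruction records the register it defines, the registers it uses, and any explicit branch targets. Control-flow edges follow fall-through and branches; data-flow (def-use) edges come from a standard iterative reaching-definitions analysis, so definitions that reach a use around a loop are captured. The pair of graphs forms the dual graph on which errors can then be propagated.

```python
# HYPOTHETICAL toy instruction list (not real assembly, not LLVMPars output).
# Each entry: (label, defined register or None, used registers, branch targets or None)
PROGRAM = [
    ("i0", "x", [], None),              # x = read_sensor()
    ("i1", "p", ["x"], None),           # p = predict(x)
    ("i2", "k", ["p"], None),           # k = gain(p)
    ("i3", "x", ["x", "k"], None),      # x = update(x, k)
    ("i4", None, ["x"], ["i1", "i5"]),  # loop back to i1 or exit to i5
    ("i5", None, ["x"], None),          # store(x)
]


def build_cfg(program):
    """Control-flow edges: explicit branch targets, else fall-through."""
    labels = [ins[0] for ins in program]
    cfg = {}
    for idx, (label, _, _, targets) in enumerate(program):
        if targets is not None:
            cfg[label] = list(targets)
        else:
            cfg[label] = labels[idx + 1:idx + 2]  # next instruction, if any
    return cfg


def build_dfg(program, cfg):
    """Def-use edges (def_site, use_site, register) via reaching definitions."""
    defs = {label: d for label, d, _, _ in program}
    preds = {label: [] for label in cfg}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds[dst].append(src)
    reach_in = {label: set() for label in cfg}
    reach_out = {label: set() for label in cfg}
    changed = True
    while changed:  # iterate to a fixed point so loop-carried definitions are found
        changed = False
        for label in cfg:
            new_in = set().union(*(reach_out[p] for p in preds[label]))
            d = defs[label]
            new_out = {(v, site) for v, site in new_in if v != d}  # kill
            if d is not None:
                new_out.add((d, label))  # gen
            if new_in != reach_in[label] or new_out != reach_out[label]:
                reach_in[label], reach_out[label] = new_in, new_out
                changed = True
    return {(site, label, v)
            for label, _, uses, _ in program
            for v, site in reach_in[label] if v in uses}


cfg = build_cfg(PROGRAM)
dfg = build_dfg(PROGRAM, cfg)
dual_graph = {"control": cfg, "data": dfg}
print(cfg)
print(sorted(dfg))
```

Note that `x` used at `i1` is reached both by its initial definition at `i0` and by the loop-carried definition at `i3`, which is the kind of dependency an error propagation model must track.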

Modeling Total Ionizing Dose Limits for Electronic Hardware
TID is a measure of the amount of radiation absorbed by electronic components. Excessive TID can cause degradation or failure of these components, leading to mission failure. In order to assess the availability of the drone's electronic hardware in each radiation zone, the TID limits for each component need to be determined. The TID limits for electronic components are typically provided by manufacturers and are based on the radiation hardness of the components [32]. These limits specify the maximum TID that a component can withstand without experiencing significant degradation or failure. The TID limits for the components considered here are consolidated from manufacturers' specifications or empirical tests and are listed in Table 9, together with their corresponding references. Using the TID limits from Table 9, the probability of failure for each component can be calculated based on the total received dose. This probability is then used to determine the likelihood of component failure in each radiation zone. For example, consider the COTS inertial measurement unit (IMU). The TID limit for the IMU is uniformly distributed as $U(\min = 1.00, \max = 5.50) \times 10^{4}$ [33]. Based on the total received dose in each radiation zone, the probability of exceeding the TID limit for the IMU can be calculated. If the total received dose does not exceed the TID limit, the IMU is considered to have survived in that radiation zone; otherwise, it is considered to have failed. Similarly, the probability of failure can be calculated for other components, such as the power-switching circuit, lithium-ion battery [34], GPS sensor module, vision SoC module, mmWave radar module, and filter DSP hardware [35,36].
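The comparison between received dose and TID limit can be sketched as a Monte Carlo estimate. The sketch assumes that, in each sample, the component fails if its cumulative received dose exceeds a TID limit drawn from the IMU range U(1.00, 5.50) × 10^4 quoted above, with the dose expressed in the same (unstated) units as the limit; the dose samples are hypothetical. For a uniform limit, the exact failure probability at a fixed dose d is (d - min)/(max - min), clipped to [0, 1], which serves as a check on the sampled estimate.

```python
import random

IMU_TID_MIN, IMU_TID_MAX = 1.00e4, 5.50e4  # U(1.00, 5.50) x 10^4, units as in Table 9


def tid_failure_probability(doses, lo, hi, rng):
    """Fraction of samples in which the received dose exceeds a TID limit
    drawn independently from U(lo, hi) for each sample."""
    failures = sum(1 for dose in doses if dose > rng.uniform(lo, hi))
    return failures / len(doses)


def tid_failure_probability_exact(dose, lo, hi):
    """P(limit < dose) for a limit ~ U(lo, hi) and a fixed dose."""
    return min(max((dose - lo) / (hi - lo), 0.0), 1.0)


rng = random.Random(7)
doses = [3.25e4] * 20_000  # HYPOTHETICAL: a fixed cumulative dose, for checking
print(tid_failure_probability(doses, IMU_TID_MIN, IMU_TID_MAX, rng))
print(tid_failure_probability_exact(3.25e4, IMU_TID_MIN, IMU_TID_MAX))  # 0.5
```

In practice `doses` would be the per-mission cumulative doses sampled for each zone rather than a constant.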
Components can be targeted for radiation-hardening measures, such as shielding or the use of radiation-hardened components, to improve their availability in nuclear-contaminated environments. In the next section, we present the results of this analysis and re-run it after radiation-hardening the chosen components.

Results and Discussion
The results of the event sequence analysis for the drone's navigation system are presented in this section. The modeling, quantification, and visualization were performed using the OpenPRA framework, which integrates the event tree/fault tree approach with DEPM. OpenPRA is under active development, and this analysis represents its current modeling capability. The results are presented in terms of the LOM likelihood in each radiation zone.

Probability of Loss of Mission (LOM) Using Commercial Off-The-Shelf (COTS) Components
The probability of LOM in each radiation zone is calculated based on the failure probabilities of the components in the drone's navigation system. The results are plotted in Figure 8 and listed in Table 10. As expected, the probability of LOM increases with the radiation level, with the highest probability occurring in Zone C, the highest-radiation zone. This is due to the higher TID received by the components in this zone, which increases the likelihood of component failure. There are no TID-related failures in Zones A and B. To highlight the contribution from TID failures, the Zone C distribution was split into two.

Table 10 presents the probability of LOM due to the failure of the drone's navigation system in each radiation zone (A, B, and C). The sampled probabilities are parametrized using a log-normal distribution (LN), with the mean (m) and error factor (EF) parameters provided. The probability of LOM is low and dependent on non-radiation-related phenomena for all parts of Zones A and B and most parts of Zone C. This suggests that the drone's navigation system is relatively reliable in low-radiation environments, averaging about one LOM per ten thousand missions. However, the probability of LOM increases significantly in Zone C because of the higher TID received by the components in this zone. This is a significant concern, as it suggests that the drone may not be able to complete its mission in high-radiation environments. In SAR missions, this could prevent the drone from reaching survivors or accurately assessing the extent of the damage.
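For readers reproducing Table 10, the (m, EF) parametrization can be converted to the parameters of the underlying normal distribution. This sketch assumes the common PRA convention that EF is the ratio of the 95th percentile to the median, EF = exp(z_0.95 σ) with z_0.95 ≈ 1.645, and that m is the arithmetic mean, so μ = ln m - σ²/2. The example values are hypothetical, not entries of Table 10.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # ~1.645, standard normal 95th percentile


def lognormal_from_mean_ef(mean, error_factor):
    """Return (mu, sigma) of ln(X) for a log-normal X given its mean and EF,
    assuming EF = x95 / x50."""
    sigma = math.log(error_factor) / Z95
    mu = math.log(mean) - 0.5 * sigma ** 2
    return mu, sigma


def lognormal_quantile(mu, sigma, q):
    """q-quantile of the log-normal distribution with parameters (mu, sigma)."""
    return math.exp(mu + NormalDist().inv_cdf(q) * sigma)


mu, sigma = lognormal_from_mean_ef(1.0e-4, 3.0)  # HYPOTHETICAL (m, EF)
for q in (0.05, 0.5, 0.95):
    print(f"{q:.2f} quantile: {lognormal_quantile(mu, sigma, q):.3e}")
```

If a tool defines EF or m differently (for example, m as the median), the expression for μ changes accordingly.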
In terms of mission success, the results indicate a relatively low probability of success in high-radiation environments. This suggests that the current design of the drone's navigation system may not be suitable for SAR missions in nuclear-contaminated environments. Therefore, design improvements, such as radiation-hardened components or shielding, may be necessary to increase the probability of mission success.

Selective Radiation-Hardening Using Mission Success Criteria
With the objective of improving the unacceptably low likelihood of mission success, we propose a strategy to selectively harden components of the navigation system. We begin by choosing a component and assigning a wide distribution to its TID limit. For instance, we choose the DSP and assign its TID limit as $\mathrm{TID}_{\mathrm{DSP}} = \mathcal{LU}(\min = 1.0 \times 10^{0}, \max = 1.00 \times 10^{6})$. Here, $\mathrm{TID}_{\mathrm{DSP}}$ follows a log-uniform distribution that is much wider than the nominal value specified in Table 9. Next, we invert the probability of mission success, making it conditional on the event $\mathrm{TID}_{\mathrm{DSP}}$, and accept sampled $\mathrm{TID}_{\mathrm{DSP}}$ values only when LOM does not occur. Figure 9 plots the kernel density estimates for the sampled, accepted, and rejected DSP TID limit ranges at the 95th percentile for 1 in 10,000 mission failures. By extension, sampling over a range of expected mission failure rates, we can construct a radiation-hardening vs. mission failure curve, illustrated in Figure 10.
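A minimal Monte Carlo sketch of this conditioning step follows. It assumes, hypothetically, that the received dose is log-normal with a median of 10^3 in the same units as the TID limit, and that LOM occurs only when that dose reaches the DSP TID limit (non-radiation failures and all other components are ignored). Sampled limits are accepted when the mission succeeds and rejected otherwise; the function and parameter names are illustrative, not from the study.

```python
import math
import random
import statistics


def sample_log_uniform(rng: random.Random, lo: float, hi: float) -> float:
    """Draw one value from a log-uniform distribution LU(lo, hi)."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def condition_tid_limit(n: int, dose_mu: float, dose_sigma: float, seed: int = 0):
    """Split sampled DSP TID limits into accepted (no LOM) and rejected (LOM) lists.

    Simplifying assumption: LOM occurs only when the received dose reaches the
    DSP TID limit; every other failure mode is ignored.
    """
    rng = random.Random(seed)
    accepted, rejected = [], []
    for _ in range(n):
        tid_limit = sample_log_uniform(rng, 1.0, 1.0e6)  # TID_DSP ~ LU(1e0, 1e6)
        dose = rng.lognormvariate(dose_mu, dose_sigma)   # hypothetical received dose
        if dose >= tid_limit:
            rejected.append(tid_limit)  # LOM: this limit is rejected
        else:
            accepted.append(tid_limit)  # mission success: this limit is accepted
    return accepted, rejected


# Hypothetical received-dose distribution with a median of 1e3.
accepted, rejected = condition_tid_limit(20_000, math.log(1e3), 1.0, seed=42)
sampled = accepted + rejected
print("acceptance fraction:", len(accepted) / len(sampled))
print("median sampled vs. accepted:", statistics.median(sampled), statistics.median(accepted))
```

Because higher limits survive more often, the accepted samples shift toward larger TID limits than the log-uniform prior, which is the kind of separation between accepted and rejected limits that kernel density plots make visible.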
The results of this analysis allow us to choose a radiation-hardening limit based on target mission success criteria.
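Under strong simplifying assumptions, each point of such a curve can be computed in closed form. The sketch below assumes a hypothetical log-normal received dose $D$, a constant non-radiation baseline LOM probability $p_0$, and deterministic DSP failure once the dose reaches its TID limit $L$, so that $P(\mathrm{LOM}) = 1 - (1 - p_0)\,P(D < L)$. It returns the smallest $L$ that keeps $P(\mathrm{LOM})$ at or below a target. These assumptions and names are ours for illustration; the study itself obtains the curve by sampling.

```python
import math
from statistics import NormalDist


def required_tid_limit(target_lom: float, baseline_lom: float,
                       dose_mu: float, dose_sigma: float) -> float:
    """Smallest TID limit L such that P(LOM) <= target_lom (requires target_lom < 1).

    Assumed model: P(LOM) = 1 - (1 - baseline_lom) * P(D < L), where ln(D) is
    normal with mean dose_mu and standard deviation dose_sigma.
    """
    # Largest admissible probability that the received dose reaches the limit.
    q = 1.0 - (1.0 - target_lom) / (1.0 - baseline_lom)
    if q <= 0.0:
        # Target at or below the non-radiation baseline: no TID limit is enough.
        return math.inf
    # L is the (1 - q) quantile of the log-normal dose; z_(1-q) = -z_q is
    # computed from q directly to stay accurate when q is very small.
    z = -NormalDist().inv_cdf(q)
    return math.exp(dose_mu + dose_sigma * z)


# Hypothetical inputs: baseline LOM of 1e-5 and a dose with median 1e3.
for target in (1e-2, 1e-3, 1e-4):
    limit = required_tid_limit(target, 1e-5, math.log(1e3), 1.0)
    print(f"target P(LOM) = {target:.0e} -> minimum TID limit ~ {limit:.3g}")
```

Tightening the target pushes the required limit further into the tail of the dose distribution, and a target below the baseline cannot be met by hardening alone.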

Conclusions
This study presented a probabilistic risk assessment (PRA) approach for assessing the reliability of commercial off-the-shelf (COTS) drones in nuclear-contaminated search and rescue (SAR) missions. The approach integrated the event tree/fault tree digraph method with the dual-graph error propagation method (DEPM) to model potential loss of mission (LOM) scenarios due to combined hardware-software failures in the drone's navigation system. The impact of radiation on the drone's components was simulated by comparing the total ionizing dose (TID) with the acceptable limit for each component. To mitigate TID-based component failures, a strategy was presented for selectively hardening components based on the desired mission success criteria.
The results of this analysis can aid in the integration of COTS components into radiation-hardened designs, optimizing the balance between cost, performance, and reliability in drone systems for nuclear-contaminated SAR missions.
While the study provides valuable insights into the reliability of COTS drones in nuclear-contaminated environments, it also has several limitations. The analysis considered only the availability of the navigation system and did not account for all components of the UAV and their interdependencies, including common-cause failures. The model also did not consider the potential impact of weather conditions or terrain on the drone's ability to navigate each zone, or the potential for human error in drone operation. Furthermore, the study considered only TID effects and did not account for other failure mechanisms such as displacement damage and single-event effects (SEEs). In addition, the analysis was limited to the failure of a single drone, which is not fully representative of fleet-based systems.
Future work will address these limitations by developing a more comprehensive model that includes all components of the UAV and their interdependencies and considers other failure mechanisms and environmental factors. Future research will also explore the use of dynamic PRA methods and integrate them into a framework that collectively assesses the cost, performance, and reliability of drone systems for nuclear-contaminated SAR missions. In addition, we will explore the broader context of resilience, including recoveries, evolving mission objectives, and the occurrence of unspecified failures. This will provide a more holistic understanding of the resilience of drone systems in nuclear-contaminated environments.

Drones 2023, 7, x FOR PEER REVIEW 3 of 18

Figure 1. State transitions within a three-state system, initialized as nominal. (left) Temporal; (right) state machine.

Figure 2. Example DEPM with a legend. The DEPM model in Figure 2 depicts the execution of serial code. Assembly operations, represented as elements A, B, and C, read from and write to data variables 1 and 2 and CPU registers. Element A changes variables 1 and 3. Elements B and C change variables 2 and 4. Element B reads from data variable 1, while element C reads from both variable 2 and variable 3.
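The data-flow part of the Figure 2 example can be sketched as follows. This is a deliberately simplified, deterministic stand-in for DEPM, not the method itself. It assumes that element A reads no data variables, that a faulty element always corrupts everything it writes, that reading any corrupted variable corrupts an element's writes, that a correct write restores a variable, and that element faults are independent. The probabilities in the usage lines are hypothetical; Table 4 gives the study's own conditional logic for this example.

```python
from itertools import product

# Hypothetical encoding of the Figure 2 example, executed serially A -> B -> C:
# (element, variables read, variables written). A is assumed to read nothing.
ELEMENTS = [
    ("A", set(), {1, 3}),
    ("B", {1}, {2, 4}),
    ("C", {2, 3}, {2, 4}),
]


def corrupted_variables(faulty: set[str]) -> set[int]:
    """Variables holding erroneous data after one serial pass over ELEMENTS."""
    bad: set[int] = set()
    for name, reads, writes in ELEMENTS:
        if name in faulty or reads & bad:
            bad |= writes  # error activated here or propagated through an input
        else:
            bad -= writes  # a correct write overwrites any earlier error
    return bad


def p_corrupted(var: int, p_fault: dict[str, float]) -> float:
    """Exact probability that var ends erroneous, enumerating all fault combinations."""
    names = [name for name, _, _ in ELEMENTS]
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        prob = 1.0
        for name, is_faulty in zip(names, states):
            prob *= p_fault[name] if is_faulty else 1.0 - p_fault[name]
        faulty = {name for name, is_faulty in zip(names, states) if is_faulty}
        if var in corrupted_variables(faulty):
            total += prob
    return total


print(corrupted_variables({"A"}))  # a fault in A reaches B and C through variables 1 and 3
print(p_corrupted(4, {"A": 0.1, "B": 0.2, "C": 0.3}))
```

Exhaustive enumeration is fine for three elements; a realistic assembly listing would need the probabilistic propagation machinery that DEPM provides.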

Figure 3. Kernel density estimates for total received dose by zone.

Figure 3 .
Figure 3. Kernel density estimates for total received dose by zone.
TID effects are the cumulative effects of ionizing radiation on materials and devices.

Figure 4. Event tree description of navigation system availability for radiation zones A, B, and C.

Figure 7. Dual-graph error propagation method (DEPM) representation of assembly for a single-variable Kalman filter.

Figure 8. Density estimates of COTS navigation system failure probabilities by zone and TID effects.

Table 1. Three-tuple Kripke terms for a three-state transition system.
X × Y is left-total), and M is fully connected.

Table 2. State transition definitions for the three-state model referenced in Figure 1.
B(C*|I): Fault-Tolerant. Avoid transition to failure, given a fault.
A+|(B(C*|I)): Failure-Avoidant. No failures occur.
G|F: Resilient System. Recover from a failure, either fully or partially.
B(C*|D(E*|G))|(H(E*|G)): Irrecoverable System. Neither completely fails, nor returns to nominal.
BC*(ϵ|DE*): Permanently Failed. System remains irrecoverable forever.

(ϵ): empty set, (x*): zero or more instances of x, (x+): one or more instances of x.
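The Table 2 terms read as regular expressions over the transition labels of the three-state model, using the notation defined in the footnote. As a minimal sketch, the code below checks which of three of these terms a complete transition sequence satisfies, using Python's `re.fullmatch`. It treats each label A to I as a single opaque character; what each label means is defined by Figure 1 and is not encoded here, and the names `TERMS` and `matching_terms` are hypothetical.

```python
import re

# Three Table 2 terms as regular expressions over single-letter transition labels.
TERMS = {
    "Fault-Tolerant": r"B(C*|I)",
    "Failure-Avoidant": r"A+|(B(C*|I))",
    "Resilient System": r"G|F",
}


def matching_terms(transitions: str) -> list[str]:
    """Return the terms whose expression matches the entire transition sequence."""
    return [term for term, pattern in TERMS.items()
            if re.fullmatch(pattern, transitions)]


print(matching_terms("AAA"))  # only nominal self-transitions
print(matching_terms("BCC"))  # a fault followed by no transition to failure
```

Matching the whole sequence (rather than searching for a substring) mirrors the idea that each term characterizes an entire system history.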

Table 3. Definitions for terms in the dual-graph error propagation model (DEPM).

Table 4. Conditional logic table for the example DEPM in Figure 2.

Table 5. Ambient radiation dose rates for radiation zones A, B, and C.

Table 6. Failure rates provided by the manufacturer (Texas Instruments) for generic drone hardware components.

Table 8. Assembler code for the single-variable Kalman filter algorithm.
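For readers unfamiliar with the algorithm that Figure 7 and Table 8 model, a generic textbook scalar Kalman filter for a constant state is sketched below. It is not a translation of the study's assembler code, and the function name and noise parameters `q` and `r` are illustrative assumptions.

```python
def kalman_1d(measurements, x0: float, p0: float, q: float, r: float) -> list[float]:
    """Scalar Kalman filter for a constant (random-walk) state.

    x0, p0: initial estimate and its variance; q: process-noise variance;
    r: measurement-noise variance. Returns the filtered estimate after each step.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: state unchanged, uncertainty grows by q
        k = p / (p + r)          # Kalman gain: weight given to the new measurement
        x = x + k * (z - x)      # update the estimate with the measurement residual
        p = (1.0 - k) * p        # posterior variance after the update
        estimates.append(x)
    return estimates


# Illustrative run: noiseless readings of 5.0 pull the estimate from 0.0 toward 5.0.
print(kalman_1d([5.0] * 10, x0=0.0, p0=1.0, q=1e-5, r=0.1)[-1])
```

Each iteration is a handful of loads, multiply-adds, and one division, which is why such a filter maps onto a short assembly listing whose error propagation can be modeled element by element.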

Table 10. Probability of loss of mission (LOM) due to COTS drone navigation system failure.