Model-Based Approach to Engineering Resilience in Multi-UAV Systems

Abstract: Multi-UAV operations are an area of great interest to government, industry, and the research community. In multi-UAV operations, a group of unmanned aerial vehicles (UAVs) is deployed to carry out missions such as search and rescue or disaster relief. Because multi-UAV systems operate in an open environment, many disrupting events can occur, which makes the resilience of these systems of great importance. The research reported in this paper uses a simulation-based methodology and demonstrates that resilience of multi-UAV systems can be achieved by evaluating resilience alternatives in real time during system operation. This evaluation uses a dynamic utility function whose priorities change as a function of context. Simulation results show that the resilience response can in fact change depending on the context.

A multi-UAV system is essentially a network of UAVs (agents) in which managing interactions and dependencies is important for successful mission execution. Operating multiple UAVs simultaneously has several advantages. It enables flexible allocation of requirements to multiple vehicles, which reduces operational complexity while increasing overall mission coverage. Component systems can collect information from multiple sources (using onboard sensors), share that information with other members, and execute actions in a coordinated fashion in multiple locations [9]. This capability makes mission execution more time-efficient, as vehicles perform assigned or negotiated tasks in parallel to fulfill the mission objective [10].
Multi-UAV systems typically operate in an open environment and are susceptible to multiple disruptions. Uncertainties and unexpected factors in the environment disrupt the system's operation and adversely impact overall system performance. It is highly desirable that these systems maintain an acceptable level of performance while dealing with disruptions in dynamic environments. This ability is the hallmark of resilient systems.
Resilience is an important capability needed in 21st century systems and system-of-systems [11]. As systems continue to grow in complexity while operating in dynamic environments, the need to handle unexpected disruptions has become extremely important. Various organizations have invested in research on different aspects of resilience engineering. For example, the U.S. Department of Defense has invested in programs such as Engineered Resilient Systems (ERS) to develop resilience methods for engineering systems [12]. The International Council on Systems Engineering (INCOSE) has a dedicated working group for resilient systems that focuses on resilience engineering research and application [13,14].
Current methods and approaches for achieving resilience are rooted in safety and risk analysis [13,15]. These methods, while applicable to single systems, do not scale to address resilience in system-of-systems (SoS) [13,16]. They require anticipating disruptions and planning how to circumvent or recover from them. This activity is mainly done during the system design stages. However, since real-world systems operate in open, dynamic environments, anticipating and planning for all potential disruptions is a daunting task. This recognition forms the basis of the research presented in this paper.
There are multiple ways to deal with a particular disruption; however, the question of which alternative is appropriate given the current situation has not been properly addressed in the resilience literature. To overcome this shortcoming, a methodological framework is required to explore resilience alternatives in terms of their impact on the system. This is the main focus of the research reported in this paper. One of the key questions this research aimed to answer is whether resilience can be achieved by dynamically evaluating resilience alternatives during system operation.
The state of the art in multi-UAV mission execution is primarily limited to pre-loaded mission plans with limited flexibility to handle known disruptions [17,18]. As UAVs operate in an open environment, disruptions can take a variety of forms: systemic (within a UAV or within the UAV network), environmental (e.g., jamming, loss of communication, loss of sensor, or loss of observability due to extreme weather), or human-triggered (e.g., an operator sending a wrong command or a third party hacking into the system).
Multi-UAV systems can be homogeneous or heterogeneous [19]. In a homogeneous multi-UAV system, constituent UAVs share similar physical and functional characteristics. In a heterogeneous system, by contrast, UAVs can have different physical shapes or perform different functions [20]. Heterogeneity can contribute to system resilience through the complementary capabilities of the UAVs. At the same time, it can pose a challenge in devising a resilient strategy to deal with a disruption, since the required functionality may not exist in any of the remaining constituent systems. In addition, multi-UAV systems demonstrate higher availability since each vehicle has a degree of fault tolerance and reliability. These systems also enable flexible communication protocols and allow adaptable functional allocations, which further contribute to system resilience.
Multi-UAV systems can be viewed as a system-of-systems [10]. This perspective makes explicit the interactions and dependencies among the UAVs (agents) and facilitates understanding of how disruptions propagate throughout the SoS. When multiple systems come together under a system-of-systems umbrella, they offer additional functionality that does not reside in any single system, and multi-UAV systems are no exception. Thus, they can exhibit unique capabilities. Additionally, because they are composed of existing systems, there is no longer a need to design a single vehicle against a complicated set of requirements to meet a specific need, which substantially reduces cost. One key aspect that must be considered carefully is the interoperability and integration of constituent systems to ensure successful operation.
This paper is organized as follows. Section 2 discusses the main characteristics of multi-UAV systems and their operation. Section 3 discusses resilience in the context of multi-UAV systems. Section 4 identifies the research problem and questions. Section 5 formulates the resilient multi-UAV system problem and introduces the research hypothesis. Section 6 discusses the main components of the solution approach. Section 7 presents the experimentation setup and simulation results, in which three alternatives are considered and compared with each other during mission execution. Section 8 summarizes the paper.

Multi-UAV Systems and Operation
A multi-UAV system is essentially a system-of-systems (SoS) as it satisfies the requirements for SoS defined by Maier [21]. These are discussed next within the multi-UAV system context. Each UAV has operational independence, as it performs its assigned function while also participating in the SoS. High-level planning, plan decomposition, task allocation, and conflict resolution are the key functions for successful operation. As such, proper coordination and cooperation are essential [10,17]. Coordination requires allocating sufficient temporal and spatial resources. Temporal coordination is key to synchronized operations, and spatial coordination is key to ensuring the safety of vehicles. Cooperation of vehicles requires integration of sensing, planning, and control within a model-based decision framework. Both centralized and decentralized decision-driven architectures can be employed in parallel [22]. The choice of decision architecture depends on the characteristics of the operational mission, each vehicle's sensing and computing capabilities, and security considerations [9,10].
Vehicles can also have different governance while participating in the SoS, which can affect the interaction and communication protocols among vehicles. Furthermore, a multi-UAV system can evolve, with functions and purposes added, removed, and modified with experience and with changing needs or mission objectives. It also exhibits emergent behavior, as the overall functionality of the SoS does not reside within any single UAV. Finally, UAVs are geographically distributed, since they primarily exchange information rather than mass or energy.
The Office of the Under Secretary of Defense for Acquisition, Technology and Logistics (OUSD AT&L) Systems Engineering Guide for SoS [23] defines the following classification for SoS. Virtual: there is no agreed-upon purpose and no central authority. Collaborative: each system voluntarily collaborates with other systems on an agreed-upon purpose. Acknowledged: there are recognized objectives, a central authority, and resources; however, constituent systems retain their independent operation. Directed: there is a central authority, and each component system's operational mode is subordinate to the centrally managed purpose. Since multi-UAV systems are typically deployed to carry out a specific mission, they can be collaborative, acknowledged, or directed. The specific configuration, however, depends on the mission that the system is intended to carry out.
A multi-UAV system's operation, regardless of mission type, falls into four operational phases [24]. During the Deployment (or Takeoff) phase, the multi-UAV system is put into operation. Depending on the UAV type, various take-off methods can be used in this phase, such as vertical take-off, conventional take-off, or assisted take-off. During the En-Route (or Cruise) phase, the deployed UAVs fly from the deployment location to the location where the mission has to be performed. In this phase, path planning and navigation play a major role. The Actions on Objective phase is the central part of the overall multi-UAV system operation. In this phase, the multi-UAV system performs its intended mission, such as surveillance, data collection, or sensing. During the Redeployment phase, the UAVs return to base and the multi-UAV system is taken apart.
Furthermore, in each phase, a UAV's maneuvers fall into different patterns. For instance, a quadcopter can perform the following maneuvers: Vertical Take-off and Land, which is flying straight up to a specified altitude or landing from a certain altitude; Hover, in which the vehicle stays stationary at a specified location for a certain period of time to perform a task; Straight Path with/without angle, in which the vehicle follows a straight path from point A to point B (e.g., between waypoints) in a specified time; Flying in an arc, in which the vehicle goes from point A to point B on an arc of radius R; and Combined Maneuvers, which are combinations of the basic maneuvers. For example, a zigzag is essentially a sequence of straight paths between alternating waypoints. Other combined maneuvers are the "Cuban 8" (also known as "Figure 8") or other geometric shapes such as a square or a circle [25,26].
In general, there are three Command and Control (C2) typologies for multi-UAV systems [27][28][29][30]. In human-centered control, the human is the central part of the system and issues all commands and controls [31,32]. In this case, the system has less autonomy, which is a disadvantage for long missions. This is mainly because humans can experience cognitive overload when working with multiple vehicles simultaneously and suffer from loss of attention when monitoring displays for long periods of time [27][28][29]33]. In decentralized control with the human as a supervisor, each vehicle has control over its own mission and shares data with its neighbor vehicles. This requires some degree of autonomy at the system level. The human operator is brought into the loop in case of emergency.
In the decentralized control typology, the system aims to defer human intervention as much as possible and operate autonomously [5,18]. Furthermore, in this typology, UAVs can have a leader-subordinate configuration [34] in which one UAV is the designated commander of the group and makes high-level decisions for the multi-UAV system. In fully autonomous command and control, the human has almost no role in the system. The system distributes the tasks and operates fully autonomously. This puts extra emphasis on attributes such as resilience and self-regulation. It is usually costly and poses several risks, since system validation and verification require substantial effort. This typology is mostly suitable for long missions and requires no personnel training, which reduces operations cost [15]. However, in most complex missions, especially military missions, having humans in the loop in some capacity is still one of the main requirements [27][28][29].

Resilience in Multi-UAV Systems
A multi-UAV system operates in an open environment and is susceptible to various forms of disruption [10,11,35]. As uncertainties and unexpected events in the environment disrupt the system's operation, the system should be able to deal with these events while maintaining an acceptable level of performance. A multi-UAV system is called resilient if it is able to accomplish the original mission within an acceptable level of performance in the face of disruptions [10,11].
Madni and Jackson [36] identified several resilience heuristics. Some of these heuristics can be applied during system design to ensure resilient behavior during system operation. Others can be viewed as resilience mechanisms (alternatives) that a multi-UAV system can dynamically employ to deal with a disruption during operation. These mechanisms are the following.
Human as a Backup brings humans into the loop when the system is unable to handle the disruption. It is viewed as the last option for a collection of autonomous systems. Pre-planned Protocols executes a pre-defined plan to handle known and some unknown disruptions. Physical Redundancy in a multi-UAV system is a resilience alternative in which another identical UAV replaces an incapacitated UAV, for example deploying a new UAV and integrating it into the system when one of the UAVs unexpectedly lands due to a disruption. Functional Redundancy achieves the same functionality by other means. Function Re-allocation redistributes the overall functions (or remaining tasks) among the remaining UAVs upon a disruption (e.g., loss of a UAV). Circumvention avoids a disrupting event by re-planning. Neutral State puts the system into a safe mode to prevent further damage.
For severe disruptions, these alternatives can be combined to develop more sophisticated strategies. Not all resilience alternatives are going to be affordable and compatible with system capabilities; some may even violate system constraints. Therefore, before choosing a resilience solution, the capabilities and constraints of the multi-UAV system and the environment must be taken into account.
In general, there are three categories of disruptions: external, systemic, and human-triggered [36]. External disruptions are largely associated with environmental obstacles and incidents. These disruptions are often random, and their severity and duration cannot be predicted. In a multi-UAV SoS, for example, a disruption in communication among UAVs caused by a flock of birds can be considered an external disruption.
Systemic disruptions happen when an internal component's functionality, capability, or capacity causes performance degradation. They are perhaps the most easily detectable disruptions in technological systems, since a component failure affects the functionality of the node or the overall system. For example, an internal failure in a UAV's communication device will prevent the vehicle from sending its coordinates to its neighbor vehicle, which can potentially lead to a collision [36].
Human-triggered disruptions are associated with human operators inside or outside of the system boundary. Even though the human operator's role can be limited to commanding the overall objective [15] and monitoring activities, this does not make the multi-UAV system immune to human-triggered disruptions. Multi-UAV systems are highly adaptable, complex systems that can adapt at a much faster rate than humans. Thus, human operators can inadvertently cause disruptions within the system [36].
Disruptions can also be predictable or random [37]. Predictable disruptions are known in terms of time of occurrence, location of occurrence, and triggering event. Random disruptions can occur unexpectedly and put the mission in jeopardy. To determine the severity of a disruption, its context and duration must be taken into account [15,38]. Disruption context reflects the system's state and current operational use. Disruption duration determines whether a disruptive event is temporary or an indication of a trend [38]. In some situations, temporary disruptive conditions can be safely ignored without resulting in system failure. Persistent conditions need to be monitored so that an appropriate resilience response can be triggered [38]. The operational context determines the appropriate system response to a disruption. Responses that need to be fast are likely to be performed autonomously by the UAVs. Less time-critical responses can be performed with a human in the loop, or planned ahead of time.

Research Problem and Questions
In complex engineering systems, the main dimensions of resilience are stability, robustness, vulnerability, safety, and adaptability [39][40][41]. Stability is the system's ability to preserve or return to the same equilibrium state when failures occur. Robustness is the ability to maintain basic functionality when subjected to disrupting events. Vulnerability is the sensitivity of the system to disruptions, measured as the degree of loss or damage caused by their impact. Safety concerns property damage or human injuries and is addressed through a process of identifying hazards and managing risks. Adaptability involves transformation, learning, and self-organization of the system to deal with disruptions [39][40][41].
In the resilience research community for engineering systems, redundancy and connectivity are considered the main enablers of resilience [39]. Redundancy, both functional and physical, is one way to achieve resilience [36,39]. It is primarily rooted in fault-tolerance and reliability analysis. Connectivity refers mainly to communication among systems (or nodes) that share information through different pathways, which protects the network against the disconnection of a node or disruptions in general. It is widely assumed that these two enablers positively impact system resilience. Other resilience enablers, such as function re-allocation or human-in-the-loop, have unfortunately not been emphasized. The relationship between system topology and resilience in terms of robustness and vulnerability has mainly been investigated using network theory, through tools such as critical node detection and shortest path identification [39,40]. For networked systems, the main dimensions of resilience are recovery and adaptive capacity, with the main enablers being link redundancy and node centrality.
Research methodologies used for resilience of engineering systems are mainly focused on case studies or conceptual models [39,40]. Conceptual models are mainly employed when research aims to identify resilience drivers theoretically. Case studies are mainly used to clarify and confirm relationships between drivers and overall system resilience. Unfortunately, quantitative and simulation-based research methodologies are rarely used [39]. As a result, theories developed in resilience research lack operational capability and have not been tested. This can be rectified by employing simulation and quantitative methods to investigate resilience drivers further, move the concepts toward operation, and test their viability under different conditions. The research presented in this paper used a simulation-based methodology to address resilient behavior during system operation and to keep the research close to reality and usable in the real world.
Previously, there have been attempts to categorize resilience techniques for acknowledged system-of-systems in the space domain [42]. MITRE's cyber resiliency framework [43] was extended to include non-cyber disruptions, and the main resilience attributes were identified [42]. Resilience techniques were then mapped to resilience goals and objectives. Furthermore, conflicts and harmonies among resilience techniques were identified. However, the analysis was primarily qualitative and did not have a quantitative basis. Key questions, such as why one resilience technique (mechanism) is preferred over another or how trade-offs can be performed among competing resilience techniques, were not addressed. The research presented in this paper aimed to answer these questions.
Current resilience methods are rooted in safety-based analysis and are primarily limited to the system design or verification stages [10,24,40]. Most of these methods require anticipating disrupting events and/or depend heavily on the system designer's judgment [41]. This forces a system designer to identify when, where, and how disruptions can occur and to plan resilience in specific areas. If systems were to operate in a closed and static environment, this would be quite appropriate. However, most real-life systems operate in open, dynamic, and uncertain environments. Identifying all possible disruptions and mitigating them during the design stage is a daunting task. Hence, it is important to ensure that the system can make reasonable decisions to deal with unexpected disruptions during operation. Only a limited number of techniques address resilience in real time (i.e., during the operation stage), where the system itself chooses an appropriate resilience response to handle the disrupting event [44].
Safety- and risk-based analysis techniques are suitable for single systems; however, their application to system-of-systems is limited [40]. It is rarely the case that one designs the constituent systems of an SoS from scratch [21]. Existing systems are usually integrated under an SoS umbrella to satisfy a mission requirement. Often, constituent systems are fully developed systems with some degree of fault tolerance and robustness [45]. When integrating constituent systems and forming the SoS, the systems engineer can ensure that the SoS is robust and can withstand disrupting events; however, anticipating every disruption that can occur is again a daunting task [46]. Therefore, it is important for the SoS to show resilience to unknown and unexpected events during operation, given that each system already has some degree of fault tolerance and robustness. This requires high-level decision making to select an appropriate resilience response to disruptions. Because the number of alternatives available during system operation to cope with a disruption can be quite large, the question of which alternative is appropriate given the system condition is something that current methods do not address. Thus, one of the main questions that the presented research aimed to answer is: Can system resilience be achieved during operation by evaluating the impact of resilience alternatives? If so, how?
The constraints imposed on the system limit the choices for resilience alternative selection. The list of resilience alternatives can vary with the operational context; however, it is never going to be a short list. Furthermore, alternatives can be combined to deal with a set of disruptions. Thus, the number of possible ways to deal with a specific disruption or set of disruptions can be quite large. Not all of these alternatives will be suitable to execute given the system constraints, operational context, and disruption type. Furthermore, these alternatives need to be evaluated within the given context and selected based on criteria such as vehicle safety, available resources, or mission objectives.
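As a rough illustration of how constraints prune the space of single and combined alternatives, the following Python sketch enumerates combinations and keeps only those that satisfy the constraints. The alternative names follow Section 3, but the attribute fields (`battery_cost`, `needs_spare_uav`, `needs_comm_link`), the `SystemState` structure, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the reported simulation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alternative:
    name: str
    battery_cost: float      # hypothetical: fraction of fleet battery consumed
    needs_spare_uav: bool    # hypothetical: requires deploying a replacement UAV
    needs_comm_link: bool    # hypothetical: requires inter-vehicle communication


@dataclass(frozen=True)
class SystemState:
    battery_budget: float    # hypothetical: fraction of fleet battery available
    spare_uavs: int
    comm_link_up: bool


def is_feasible(combo, state):
    """Check one combination of alternatives against the system constraints."""
    total_battery = sum(a.battery_cost for a in combo)
    spares_needed = sum(a.needs_spare_uav for a in combo)
    needs_link = any(a.needs_comm_link for a in combo)
    return (total_battery <= state.battery_budget
            and spares_needed <= state.spare_uavs
            and (state.comm_link_up or not needs_link))


def feasible_strategies(alternatives, state, max_size=2):
    """Enumerate single and combined alternatives, keeping the feasible ones."""
    result = []
    for size in range(1, max_size + 1):
        for combo in combinations(alternatives, size):
            if is_feasible(combo, state):
                result.append(tuple(a.name for a in combo))
    return result


ALTERNATIVES = [
    Alternative("physical_redundancy", 0.10, True, True),
    Alternative("function_reallocation", 0.30, False, True),
    Alternative("circumvention", 0.20, False, False),
    Alternative("neutral_state", 0.05, False, False),
]

state = SystemState(battery_budget=0.40, spare_uavs=0, comm_link_up=True)
strategies = feasible_strategies(ALTERNATIVES, state)
for strategy in strategies:
    print(strategy)
```

In this made-up state, no spare UAV is available, so every strategy involving physical redundancy is pruned, and the battery budget rules out combining function re-allocation with circumvention. The surviving strategies are the ones that would then need to be evaluated and ranked.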
When selecting the best alternative, trade-offs among alternatives become important. Each alternative needs to be evaluated based on its impact on a set of attributes, each of which is linked to the safety of the systems, the resources available, or the mission objectives. Furthermore, these attributes are unlikely to have equal priorities all the time. For example, during normal operation, accomplishing the mission on time has higher priority, while safety can become more important when dealing with a disruption.

Multi-UAV System Problem Formulation and Research Hypothesis
Multi-UAV systems have gained interest in many domains due to their unique capabilities [20]. Researchers around the world have made significant advances in various aspects of multi-UAV systems. These advances can be categorized into five major groups.
Observation and Monitoring: This area is concerned with developing mechanisms to ensure that appropriate information is collected, for example, techniques that ensure vehicles are equipped with proper sensors to collect information [47] about multi-UAV system performance.
Detection and Identification: This area is concerned with developing methods to analyze the collected information, for example, using Bayesian Belief Networks (BBN) to calculate a belief for the system's current state [48]. This information can also be used to detect disruptions and their impact on the system.
Planning and Decision Making: This area is concerned with high-level decision making, planning, and coordination among the vehicles. Questions such as "what is the best course of action given the system's current state?" fall into this category [22,49].
Communication and Networking: This area focuses on communication and networking challenges, such as ensuring uninterrupted data flow among systems or identifying the best topology for smooth data exchange [3].
Execution and Control: This area is concerned with ensuring that commands are executed properly, for example, developing appropriate guidance, navigation, and control at the single-vehicle level [50].
Figure 1 summarizes the five research areas in multi-UAV systems. Resilience, being a broad topic of research, can be found in all of the areas discussed above. The research presented in this paper falls into the "Planning and Decision Making" category.
To formulate the problem, the conceptual framework shown in Figure 2 was created. A multi-UAV system receives the mission information from the operator on the ground. It senses the environment through the sensors mounted on each vehicle. Vehicles share their collected information to enhance situational awareness. The system operates in an environment that contains static obstacles such as buildings, towers, and trees; dynamic obstacles such as birds and other flying objects; different weather patterns that can impact communication among vehicles; and hostile actors such as hackers. Because of this uncertainty, the environment can cause disruptions for the system. The system then needs to deal with disruptions while carrying out the mission.
Given the nature of the problem, it is desirable to continue the operation despite disruptions, either within the original objectives or within de-scoped objectives. Furthermore, while handling disruptions, the system needs to take into account constraints such as physical, regulatory, communication, sensing, battery/power, and payload constraints in order to select a proper alternative. In light of the above, what is needed is a methodological framework to determine the best course of action to deal with a disruption given the system's current condition and the disruption type.
In this research, it is hypothesized that resilience of a multi-UAV system during operation can be achieved if resilience alternatives (mechanisms) are evaluated by a utility function with attributes concerning safety, resources available, and mission objective, where the priorities of these attributes change during mission execution. For example, as discussed in Section 3, physical redundancy and function (task) re-allocation are two possible alternatives to deal with a disruption. However, which alternative is appropriate given the system's current situation depends on multiple factors. If one alternative is chosen without proper consideration, it can lead to unintended consequences such as waste of resources, improper task re-allocation, or jeopardizing the safety of vehicles.

Solution Approach
The overall solution approach leverages concepts from decision science and system modeling, together with strategy and tactics from military science. To address the research question and hypothesis, a utility function was defined with three attributes and tied to a decision support module that includes models and algorithms. Furthermore, the weightings of these attributes change depending on system tactics. The details of the solution approach are discussed in this section.
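To make the idea of tactic-dependent weightings concrete, the following Python sketch ranks two alternatives under two tactics. It assumes a simple weighted-sum utility over the three attributes (safety, resources, mission); the functional form, the tactic names, the weight values, and the attribute scores are all hypothetical and are not the values used in the reported experiments.

```python
ATTRIBUTES = ("safety", "resources", "mission")

# Hypothetical look-up table: tactic -> attribute weights (each row sums to 1).
TACTIC_WEIGHTS = {
    "normal_operation":    {"safety": 0.2, "resources": 0.2, "mission": 0.6},
    "disruption_handling": {"safety": 0.6, "resources": 0.2, "mission": 0.2},
}


def utility(scores, tactic):
    """Weighted-sum utility (an assumed form) of one alternative under a tactic.

    `scores` maps each attribute to a normalized value in [0, 1],
    where higher is better.
    """
    weights = TACTIC_WEIGHTS[tactic]
    return sum(weights[attr] * scores[attr] for attr in ATTRIBUTES)


def best_alternative(candidates, tactic):
    """Return the name of the candidate with the highest utility."""
    return max(candidates, key=lambda name: utility(candidates[name], tactic))


# Hypothetical attribute scores for two alternatives from Section 3.
candidates = {
    "function_reallocation": {"safety": 0.5, "resources": 0.8, "mission": 0.9},
    "physical_redundancy":   {"safety": 0.9, "resources": 0.4, "mission": 0.6},
}

for tactic in TACTIC_WEIGHTS:
    print(tactic, "->", best_alternative(candidates, tactic))
```

With these illustrative numbers, the same pair of alternatives is ranked differently under the two tactics: the mission-weighted tactic favors function re-allocation, while the safety-weighted tactic favors physical redundancy.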

Assumptions
In this research, it was assumed that UAVs can perform their mission autonomously without human intervention. The human operator plays a monitoring role and is brought into the loop only if the multi-UAV system has exhausted all other possible alternatives or is dealing with a disruption beyond the system's capability and capacity. As such, the system tries to avoid the "Human-in-the-loop" alternative as much as possible.
It was also assumed that the multi-UAV system is a directed system-of-systems in which a central authority (the commander UAV) makes high-level decisions, assigns tasks, and is responsible for directing the multi-UAV system when disruptions occur. While each UAV is concerned with the specific task(s) it is given, the commander is concerned with the overall mission that the multi-UAV system is performing. Each UAV has some degree of decision-making autonomy with respect to the task(s) it is given. For example, if a vehicle is given a task to go to a certain location within acceptable performance margins (e.g., acceptable accuracy and time to complete the task) and monitor the area, it is allowed to make decisions about handling disruptions that impact its task as long as it stays within the specified performance margins. If for any reason the vehicle cannot satisfy this requirement, it notifies the commander UAV.
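The local-versus-escalate rule described above can be sketched as a single if-then check. This is only an illustrative Python sketch: the `TaskMargins` fields, the function name, and the idea of comparing predicted accuracy and completion time against the margins are assumptions made here, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskMargins:
    max_position_error_m: float   # hypothetical accuracy margin for the task
    deadline_s: float             # hypothetical latest acceptable completion time


def handle_task_disruption(predicted_error_m, predicted_completion_s, margins):
    """Decide whether a vehicle may handle a disruption on its own.

    The vehicle acts locally only if its predicted performance after the
    disruption still satisfies the margins of its assigned task; otherwise
    it notifies the commander UAV.
    """
    within_margins = (predicted_error_m <= margins.max_position_error_m
                      and predicted_completion_s <= margins.deadline_s)
    return "handle_locally" if within_margins else "notify_commander"


margins = TaskMargins(max_position_error_m=2.0, deadline_s=120.0)
print(handle_task_disruption(1.5, 110.0, margins))  # still within margins
print(handle_task_disruption(1.5, 150.0, margins))  # deadline would be missed
```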
During operation, all vehicles exchange messages with each other at 100 Hz. This ensures that the vehicles do not collide and that, if vehicles are on a collision course, there is enough time to avoid the collision. To this end, each message has two main groups of information. The first group describes the vehicle's state and task status. The second group describes the multi-UAV system. It includes the IDs of the vehicles within the SoS with which the sending vehicle has successfully exchanged messages. The commander UAV also includes information such as available resources and high-level decision-making criteria in the second group. The rationale for including UAV IDs in the messages is to ensure that the multi-UAV system can identify whether it has lost a member. If a UAV misses three cycles and does not send messages to any of the other vehicles, it is assumed that the system has lost a member, which is treated as a disruption that requires the commander to take action. The rationale for the commander sending the decision-making criteria and other mission information as part of its messages is to ensure that this information is not lost if the commander leaves the group due to a disruption.
To ensure continuity in the commanding position, as well as to keep track of information, each vehicle is given a unique ID number. The UAV with the lowest ID number assumes the role of commander. If the commander UAV is incapacitated by a disruption, the UAV with the next lowest ID number takes over the role. Figure 3 shows the system architecture and highlights the main components of the vehicles.
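The membership and succession rules above lend themselves to a short sketch. The Python code below is a simplified illustration under stated assumptions: the `MembershipTracker` class and its methods are hypothetical names, a single tracker merges what all vehicles have heard (rather than modeling each vehicle's view separately), and the check is assumed to run once at the end of every message cycle. The 100 Hz rate, the three-cycle limit, and the lowest-ID rule come from the text.

```python
MESSAGE_RATE_HZ = 100      # message rate stated in the text
MISSED_CYCLES_LIMIT = 3    # cycles without a message before a UAV is declared lost


class MembershipTracker:
    """Tracks which UAVs are still heard from and which one is the commander."""

    def __init__(self, member_ids):
        # Cycle index at which each UAV was last heard by any other vehicle.
        self.last_heard = {uav_id: 0 for uav_id in member_ids}

    def record_message(self, uav_id, cycle):
        """Note that a message from `uav_id` was received during `cycle`."""
        if uav_id in self.last_heard:
            self.last_heard[uav_id] = cycle

    def detect_lost(self, current_cycle):
        """Run at the end of each cycle; return IDs newly declared lost."""
        lost = sorted(
            uav_id for uav_id, last in self.last_heard.items()
            if current_cycle - last >= MISSED_CYCLES_LIMIT
        )
        for uav_id in lost:
            del self.last_heard[uav_id]
        return lost

    def commander(self):
        """The remaining UAV with the lowest ID holds the commander role."""
        return min(self.last_heard) if self.last_heard else None


# UAV 1 (the initial commander) goes silent; UAVs 2 and 3 keep transmitting.
tracker = MembershipTracker([1, 2, 3])
events = []
for cycle in range(1, 4):
    for uav_id in (2, 3):
        tracker.record_message(uav_id, cycle)
    events.append(tracker.detect_lost(cycle))

print(events)               # [[], [], [1]]
print(tracker.commander())  # 2

# Three missed cycles at 100 Hz correspond to 0.03 s before a loss is declared.
detection_delay_s = MISSED_CYCLES_LIMIT / MESSAGE_RATE_HZ
```

Because the commander includes the decision-making criteria in its outgoing messages, the successor in this scheme would already hold the information it needs when it takes over; that state transfer is not modeled in the sketch.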
It was also assumed that the vehicles share an identical platform (e.g., quadcopters) but are capable of performing different functions. Each vehicle has a decision-making block, which is responsible for making high-level decisions with respect to the mission (if it is the commander) or the specific task. Furthermore, each vehicle has guidance, navigation, and control units to execute the specific commands issued by the decision-making algorithm. Each vehicle also has a communication unit, which is responsible for sending and receiving messages to and from other vehicles. Task-related decisions are made using a rule-based system (i.e., if-then rules), while mission-related decisions follow the more sophisticated process described in the next section.

Decision-Making Process
The commander UAV makes decisions for the entire system using a utility function. Utility functions are mathematical equations that are used to rank alternatives based on their usefulness [51], and they can be used to make informed decisions about a situation [51]. Hence, they can be applied to decide which resilience alternative to execute in order to deal with a disruption. However, a utility function alone is not enough to make a decision. Various models and algorithms are needed to determine the impact of each alternative on the overall mission. Once an alternative is constructed, it is then evaluated using the utility function. The overall decision-making process is shown in Figure 4.
Every decision-making process includes activities such as assessing the current situation, generating a set of alternatives, evaluating each alternative, and then selecting the best alternative [52]. The decision-making process in Figure 4 includes these components as well. The system tactics block determines the appropriate tactic based on the current situation and the set of alternatives that need to be evaluated. The details of this block are discussed in Section 6.3. Once the system tactic is defined, the weighting parameters for the utility function are picked from a database (i.e., a look-up table). The "Construct" module is responsible for generating and constructing the alternatives. It pulls the set of possible alternatives from the list of resilience alternatives and constructs the details of each alternative using a combination of models and algorithms. For example, for the functional re-allocation alternative, this block calculates the time to complete the mission, the covered area, the amount of utilized resources, the remaining resources, and safety concerns such as possible collisions between vehicles. This information, along with the weightings, is then passed to the utility function, which calculates the utility of the alternative. Once the utility of each alternative is calculated, this information is sent to the "Select" block, where the alternative with the highest utility is picked. Based on the selected alternative, the "Command" module issues a high-level command for the multi-UAV system. Details of the utility function are discussed in Section 6.4.
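The construct, evaluate, and select steps can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the look-up table `WEIGHTS_BY_TACTIC` holds placeholder weights (not the values used in this research), `toy_construct` returns made-up attribute scores in place of the models and algorithms of the "Construct" module, and `linear_utility` is a simple weighted sum.

```python
# Hypothetical look-up table: tactic -> (w_mission, w_resource, w_safety).
# The numbers are placeholders chosen only so that each row sums to 1.
WEIGHTS_BY_TACTIC = {
    "T1": (0.6, 0.2, 0.2),  # mission-focused
    "T4": (0.2, 0.4, 0.4),  # safety- and resource-focused
}


def decide(tactic, alternatives, construct, utility):
    """Score every alternative under the tactic's weights and pick the best one."""
    weights = WEIGHTS_BY_TACTIC[tactic]
    utilities = {alt: utility(construct(alt), weights) for alt in alternatives}
    best = max(utilities, key=utilities.get)
    return best, utilities


def toy_construct(alternative):
    """Stand-in for the "Construct" module: returns made-up attribute scores."""
    table = {
        "physical_redundancy": {"mission": 1.0, "resource": 0.5, "safety": 1.0},
        "task_reallocation": {"mission": 0.7, "resource": 1.0, "safety": 1.0},
        "continue": {"mission": 0.4, "resource": 1.0, "safety": 1.0},
    }
    return table[alternative]


def linear_utility(scores, weights):
    """Weighted sum of the three attribute scores."""
    w_mission, w_resource, w_safety = weights
    return (w_mission * scores["mission"]
            + w_resource * scores["resource"]
            + w_safety * scores["safety"])


ALTERNATIVES = ["physical_redundancy", "task_reallocation", "continue"]
best_offense, _ = decide("T1", ALTERNATIVES, toy_construct, linear_utility)
best_defense, _ = decide("T4", ALTERNATIVES, toy_construct, linear_utility)
```

With these placeholder numbers, the same three alternatives yield a different selection under the two weight sets, which is the behavior the process is designed to produce.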

Multi-UAV System Tactics
Tactics are used in different contexts. For example, a chess master can execute multiple tactics in a chess game to defeat an opponent [53,54]. These tactics depend on the opponent's moves and the current state of the game [53,54]. Similarly, on a battlefield, commanders can use different tactics to defeat the enemy [55]. To select a tactic on a battlefield, the commander considers factors such as available resources, situational awareness, and the safety and security of the troops [55]. In general, tactics can be categorized as "Offensive", to defeat the opponent or enemy; "Defensive", to protect assets and survive by sacrificing the primary objective (winning a game or battle); or "Trade-Off", where trade-offs are made between defeating the opponent or enemy and protecting assets or conserving resources [53][54][55].
A multi-UAV system dealing with a disruption is very similar to a commander or a chess master. To the chess master or the commander on a battlefield, the moves made by the opponent or enemy are essentially disruptions. Tactics are then used to deal with these disruptions. A multi-UAV system can also benefit from similar tactics when dealing with a disruption. The tactics for the multi-UAV system are defined below.

• T1: Execute Mission (Offense): The priority is to accomplish the mission within the required performance or better. Safety and resource consumption have equal low priorities.
• T2: Resource Conservation-Execute Mission: The priority is to conserve resources. Executing the mission and safety have second and third priorities, respectively.
• T3: Safe/Secure-Execute Mission: The priority is safety and security. Execution of the mission and resource consumption have second and third priorities, respectively.
• T4: Safe/Secure-Resource Conservation (Defense): Safety and security and resource conservation have equal high priority, while executing the intended mission has low priority.
The first tactic is an offensive tactic where accomplishing the mission is the priority. The second and third tactics are trade-off tactics where the system is trying to balance mission execution with resource utilization and safety concerns. The last tactic is a defensive tactic where accomplishing the mission is not a priority and surviving is more important.
In addition to the tactics, the multi-UAV system needs criteria to transition (switch) between tactics. The transition between tactics depends on the damage to the system, the current state of the mission, and the remaining resources. For example, if there are enough resources, the system is at full capacity, and only a small portion of the mission has been accomplished, then the system chooses the "Execute Mission" tactic. For brevity, Table 1 presents a subset of the transition criteria.
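A rule-based transition of this kind might be sketched as below. This is an illustration only and does not reproduce Table 1. The 25% mission threshold and the switch to T3 after a repeated loss are taken from the simulation cases reported later in the paper; the parameter `min_reserves_for_offense` is a hypothetical threshold; damage is represented only through the `repeated_loss` flag; and the criteria for entering T4 are not stated in this text, so the sketch never returns it.

```python
from enum import Enum


class Tactic(Enum):
    EXECUTE_MISSION = "T1"
    RESOURCE_CONSERVATION_EXECUTE = "T2"
    SAFE_SECURE_EXECUTE = "T3"
    SAFE_SECURE_CONSERVE = "T4"  # listed for completeness; no rule below selects it


def choose_tactic(accomplished_fraction, reserves_left, min_reserves_for_offense,
                  repeated_loss=False, early_mission_fraction=0.25):
    """Illustrative if-then rules for switching tactics (assumed, not Table 1)."""
    # A second vehicle lost for the same reason: prioritize safety and security.
    if repeated_loss:
        return Tactic.SAFE_SECURE_EXECUTE
    # Little of the mission accomplished and enough reserves: go on the offense.
    if (accomplished_fraction < early_mission_fraction
            and reserves_left >= min_reserves_for_offense):
        return Tactic.EXECUTE_MISSION
    # Otherwise keep executing the mission while conserving resources.
    return Tactic.RESOURCE_CONSERVATION_EXECUTE
```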

Constructing Utility Function
To construct a useful and meaningful utility function, three criteria (attributes) need to be considered: safety, mission goal, and resource constraints [56]. Assuming the utility function is the linear sum of these attributes, it takes the following form (Equation (1)):
$$U = w_1 \, Mission_{Score} + w_2 \, Resource_{Score} + w_3 \, Safety_{Score} \qquad (1)$$
$Mission_{Score}$ is the sum of $MissionETC_{Score}$ and $AccomplishedMission_{Score}$. $MissionETC_{Score}$ is the score associated with completing the mission on time. Depending on the mission type, two types of functions can be constructed to calculate the score associated with the Estimated Time of Completion (ETC). Assuming that the optimum value (desired ETC) and the deviations from this value are defined by the mission planner, a bell-shaped curve can be constructed to score the mission completion time. In this curve, the score associated with the optimum value is equal to 1, and at the end points (the deviations) the score is equal to zero. Figure 5a shows the shape and the variables associated with the curve: the score is equal to 1 when the value of the attribute is equal to $O$ and is equal to zero at $D_{min}$ and $D_{max}$. To this end, Wymore's Standard Scoring Function #6 (SSF6) was used [57]. The original function was modified so that the values of $S_1$, $S_2$, $B_1$, and $B_2$ can be calculated from the values of $O$, $D_{min}$, and $D_{max}$. SSF6 was chosen because of its shape and because it is a continuous function. Additionally, since the values of $S_1$, $S_2$, $B_1$, and $B_2$ can be modified, different shapes for the curve can be achieved, which result in different scoring. For example, if the value of $S_2$ is a large negative number, the score for larger values of ETC drops more rapidly.
Similarly, for time-critical missions such as finding a lost hiker, the sooner the mission is accomplished the better. Therefore, if the mission is completed before the desired ETC, the score is still equal to 1. Assuming the maximum allowable time to complete the mission is also given, a smooth curve can be constructed between the desired value and the maximum value. Wymore's Standard Scoring Function #10 [57] was chosen for this purpose due to its shape. This function was further modified and combined with a step function to construct the overall shape given in Figure 5b.
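The two ETC scoring shapes can be illustrated with simple stand-in curves. The functions below are not Wymore's standard scoring functions: they use raised-cosine segments, chosen here only because they are continuous and reproduce the stated end-point behavior (a score of 1 at the optimum or desired value and a score of zero at the limits). The function and parameter names are hypothetical.

```python
import math


def bell_etc_score(etc, d_min, optimum, d_max):
    """Bell-shaped stand-in: 1 at the optimum ETC, 0 at d_min and d_max."""
    if not d_min < optimum < d_max:
        raise ValueError("expected d_min < optimum < d_max")
    if etc <= d_min or etc >= d_max:
        return 0.0
    if etc <= optimum:
        x = (etc - d_min) / (optimum - d_min)  # 0 at d_min, 1 at the optimum
        return 0.5 * (1.0 - math.cos(math.pi * x))
    x = (etc - optimum) / (d_max - optimum)    # 0 at the optimum, 1 at d_max
    return 0.5 * (1.0 + math.cos(math.pi * x))


def time_critical_etc_score(etc, desired, maximum):
    """Step plus smooth decay: 1 up to the desired ETC, 0 at the maximum time."""
    if not desired < maximum:
        raise ValueError("expected desired < maximum")
    if etc <= desired:
        return 1.0
    if etc >= maximum:
        return 0.0
    x = (etc - desired) / (maximum - desired)
    return 0.5 * (1.0 + math.cos(math.pi * x))
```

Unlike the modified SSF6, these stand-ins have no slope parameters, so the rate at which the score drops away from the optimum cannot be tuned independently on each side.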
$AccomplishedMission_{Score}$ is the score associated with the amount of accomplished mission. For example, for covering an area, since the total number of waypoints to cover is known, the amount of accomplished mission is essentially the ratio of covered waypoints to the total number of waypoints.
$Resource_{Score}$ is the score associated with utilizing resources (i.e., deploying UAVs on reserve to replace an incapacitated vehicle). Assuming UAVs on reserve have equal functionalities and worth, each vehicle's worth is essentially $1/n$, where $n$ is the total number of UAVs on reserve. Therefore, a linear curve can be constructed where the score associated with zero utilization is equal to 1 and the score associated with maximum utilization is equal to zero. This curve is depicted in Figure 5c.
$Safety_{Score}$ is the score associated with vehicle safety (i.e., collision). If an alternative puts vehicles on a collision course, it receives a score of zero; otherwise, it receives a score of 1. Therefore, the scoring function for safety is essentially a step function. Figure 5d depicts this function.
In Equation (1), $w_1$, $w_2$, and $w_3$ are the weights for the parameters discussed above, where $\sum_{i=1}^{n} w_i = 1$ and $n = 3$. The specific values are chosen based on the multi-UAV system tactics defined in the previous section. Table 2 defines the priority of the attributes for each tactic and specifies the weighting values. These values are calculated based on the priorities and the condition that the weights sum to 1. For example, for the Execute Mission tactic, since $w_1$ needs to be much bigger than the other two weights, it is assumed to be at least twice as big as their sum. The values shown in Table 2 are calculated on this basis, and the weights for the other tactics are calculated in a similar way.
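The scoring functions and the weighted sum of Equation (1) can be combined in a short sketch. Several points are assumptions: the function names are hypothetical, $w_2$ is paired with the resource score and $w_3$ with the safety score (following the order in which the attributes are discussed), and the weights for the Execute Mission tactic are derived here only as an illustration and are not the Table 2 values. With $w_1 = 2(w_2 + w_3)$, $w_2 = w_3$ (equal low priorities), and $w_1 + w_2 + w_3 = 1$, the weights come out as $w_1 = 2/3$ and $w_2 = w_3 = 1/6$.

```python
import math


def accomplished_mission_score(covered_waypoints, total_waypoints):
    """Ratio of covered waypoints to the total number of waypoints."""
    if total_waypoints <= 0:
        raise ValueError("total_waypoints must be positive")
    return covered_waypoints / total_waypoints


def resource_score(reserves_used, reserves_total):
    """Linear score: 1 when no reserve UAV is used, 0 when all are used."""
    if reserves_total <= 0:
        raise ValueError("reserves_total must be positive")
    return 1.0 - reserves_used / reserves_total


def safety_score(on_collision_course):
    """Step function: 0 if the alternative risks a collision, otherwise 1."""
    return 0.0 if on_collision_course else 1.0


def utility(etc_score, accomplished, resource, safety, weights):
    """Linear sum of the three attribute scores with weights (w1, w2, w3)."""
    w1, w2, w3 = weights
    if not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights must sum to 1")
    mission = etc_score + accomplished  # mission score = ETC score + accomplished score
    return w1 * mission + w2 * resource + w3 * safety


# Illustrative weights for Execute Mission: w1 = 2 * (w2 + w3) and w2 = w3.
EXECUTE_MISSION_WEIGHTS = (2 / 3, 1 / 6, 1 / 6)
```

For example, an alternative that finishes on time (`etc_score = 1.0`), has covered half of the waypoints, uses one of three reserve vehicles, and poses no collision risk can be scored with `utility(1.0, 0.5, resource_score(1, 3), safety_score(False), EXECUTE_MISSION_WEIGHTS)`.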

Experimentation Setup and Results
In this research, an agent-based modeling technique was utilized to model the multi-UAV SoS. Each UAV was considered an agent with dynamic properties. The rationale for the dynamic properties was to obtain realistic behavior of individual UAVs. The specific type of UAV modeled and implemented in this research was the quadcopter. These vehicles are relatively easy to model compared to RC planes and have different maneuvering capabilities [5]. Quadcopters (rotorcraft) have gained interest in the drone industry due to their size and unique capabilities such as hovering.
To have flexibility in the simulation environment and the ability to modify parameters easily, the simulations for this research were written in Java. Object-oriented programming languages such as Java are suitable for agent-based simulation since the main classes can be defined and multiple agents can be instantiated during simulation. Furthermore, since multiple packages exist for Java to solve differential equations, the vehicle's dynamics model can be easily implemented and solved at every simulation time step. Simulations ran at 100 Hz to ensure that the differential equations within the dynamics model were solved accurately and that solver error was kept to a minimum.
The overall modeling construct is shown in Figure 6. Each UAV was modeled following the sense-plan-act construct. Each UAV senses its position and sends the information to the plan module. The plan module has the information regarding the task that needs to be performed and decides the next action based on the sensed information. The decisions regarding specific tasks are made using a rule-based system. For example, if the vehicle encounters a physical static obstacle, it calculates how long it would take to bypass the obstacle. If the result falls within the acceptable level of performance, the vehicle bypasses the obstacle and continues the operation. If it does not, the vehicle notifies the commander UAV. The commander UAV makes decisions for itself using the same rule-based system; however, decisions concerning the overall multi-UAV system are made using the decision-making process discussed in Section 6.2. The decision-making process is also executed at every simulation step. Once a decision is made, the command to act upon is communicated to all the participating UAVs. The act module for each vehicle includes the vehicle's dynamics model and low-level position and attitude controllers.
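The obstacle-handling rule in the plan module can be written as a single if-then check. The sketch below assumes that the acceptable level of performance is expressed as a maximum task time; the function name, parameter names, and the returned action labels are hypothetical.

```python
def plan_for_static_obstacle(elapsed_s, remaining_s, bypass_delay_s, max_task_time_s):
    """Rule-based task decision: bypass the obstacle or escalate to the commander."""
    # Projected task time if the vehicle detours around the obstacle.
    projected_s = elapsed_s + remaining_s + bypass_delay_s
    if projected_s <= max_task_time_s:
        return "BYPASS"            # still within the acceptable performance margin
    return "NOTIFY_COMMANDER"      # margin violated: hand the decision to the commander
```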

Simulation Scenario
A monitoring scenario was simulated as a representative scenario to test the research hypothesis. In this scenario, three alternatives (physical redundancy, function (task) re-allocation, and continue (do nothing)) are available to the commander UAV to handle disrupting event(s). This scenario assumes there are N UAVs on reserve. The mission was to monitor a 10 m by 10 m area, for which five UAVs are deployed. The UAV with ID = 0 is the designated group leader (commander). To demonstrate a leader-follower configuration, it hovers at a certain altitude and monitors the activity of the remaining vehicles. The other vehicles are given a square pattern with a specific set of waypoints to follow. Figure 7 shows the mission profile for the described scenario.
The commander has access to information such as the number of extra vehicles on reserve, the desired time to complete the monitoring mission, and the maximum allowable time to complete the mission. Based on this information and the information gathered from the vehicles in the SoS, the commander decides which tactic to employ and, based on that, determines the best course of action in case of disrupting events. The goal of the following set of simulations was to demonstrate that the multi-UAV system tactic can change as a function of available resources, amount of damage to the system, and amount of accomplished mission. Furthermore, as the tactic changes the priorities associated with the mission goal, resource utilization, and safety attributes in the utility function, the resilience options executed by the system change.
During mission execution, one of the UAVs lands unexpectedly due to a malfunction. It communicates its decision to land to the rest of the vehicles and the commander. To the commander, this is a disruption that puts the mission at risk since a portion of the area will no longer be monitored. To remedy this, the commander can either deploy a new vehicle from the deployment site or reallocate (re-assign) the remaining task of the incapacitated UAV to the vehicle that is closest to that area.
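The re-allocation step, selecting the vehicle closest to the unmonitored area, can be sketched as a nearest-neighbor search. The function name and the data layout (a dictionary from UAV ID to position) are assumptions for illustration; ties in distance are broken by the lower ID.

```python
import math


def closest_vehicle(positions, target, excluded=()):
    """Return the ID of the UAV nearest to `target`, skipping any excluded IDs."""
    candidates = {uid: pos for uid, pos in positions.items() if uid not in excluded}
    if not candidates:
        raise ValueError("no vehicle available for re-allocation")
    # Sort key: Euclidean distance first, then UAV ID to break ties.
    return min(candidates, key=lambda uid: (math.dist(candidates[uid], target), uid))
```

For instance, the caller can pass the incapacitated vehicle and the hovering commander in `excluded` so that only the vehicles performing monitoring tasks are considered.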
Figure 8 shows the result of a simulation where one of the UAVs (UAV 3) experiences a malfunction at the end of its en-route phase, just before executing the main task. In this case, five UAVs are deployed and three are on reserve (N = 8). Since at the beginning of the mission the amount of accomplished mission is less than 25% of the overall mission, the tactic chosen by the system is Offensive (Execute Mission). This is to ensure that at least 25% of the mission is accomplished even with a limited number of resources. For the same reason, when UAV 3 malfunctions, the system replaces it with a new vehicle to ensure that some portion of the mission is executed on time. However, once 25% of the mission is accomplished, and due to the drop in the number of available resources, the system changes its tactic to "Resource Conservation-Execute Mission". This is to ensure that the system can deal with possible future disruptions and does not utilize its resources excessively.
Figure 9 shows the result of a simulation where the same UAV (UAV 3) experiences a malfunction during execution of the main task. Since more than 25% of the mission is already accomplished at that time, the system is now using the "Resource Conservation-Execute Mission" tactic. This is mainly due to the limited number of resources. As can be seen from the vehicle paths in Figure 9, the system has chosen reallocating the tasks as the best alternative to deal with the disruption. The change in the system's choice, compared with the previous case, is due to the earlier change in the system's tactic.
Figure 10 shows simulation results where multiple disruptions happen to the multi-UAV system. In this case, five UAVs are initially deployed and five are on reserve (N = 10). At time T = 30 s, UAV 3 loses communication with the rest of the group. This is viewed as a disruption by the commander UAV. Since there are enough resources and not enough of the mission has been accomplished, the commander decides to deploy a new vehicle (UAV 5) to replace the lost UAV. Consequently, because of the drop in available resources after deploying a vehicle, the multi-UAV system changes its tactic to "Resource Conservation-Execute Mission". At time T = 50 s, UAV 5 loses communication with the rest of the group as well. Since the system has just lost a second vehicle for the same reason, it switches its tactic to "Safe/Secure-Execute Mission". Under this tactic, the commander decides to continue the mission in degraded mode (de-scoped mission). It does not re-allocate the remaining tasks to the other vehicles and does not deploy a new vehicle. This is because the commander is prioritizing the safety of the vehicles over mission execution and is marking that area as unsafe.
From the simulation results presented in this section, it can be concluded that a change in the system's tactic (i.e., the priorities) can lead to selecting a different resilience alternative to deal with the same disruption. The simulation results demonstrated that resilience of a multi-UAV system can be achieved during operation by evaluating resilience alternatives using a dynamic utility function, which validates the hypothesis proposed in Section 5.

Summary
Multi-UAV operations are important for a variety of applications in both military and civilian domains. In this paper, multi-UAV systems are discussed from a system-of-systems perspective. As these systems operate in an open environment, the need for resilience is emphasized. This paper addresses the gap associated with real-time evaluation of resilience mechanisms to handle disrupting events. It focuses on the resilience of multi-UAV systems and demonstrates that the resilience of these systems can be achieved by evaluating resilience alternatives (mechanisms) during system operation. It introduces the notion of multi-UAV system tactics to dynamically change the priorities of the system based on the current state of the mission, the remaining resources, and the damage caused by the disrupting events. The simulation results validated the research hypothesis that such real-time evaluation is possible and demonstrated that a change in the system's tactic leads to selecting a different resilience alternative.

Figure 1. Research areas in multi-UAV systems.

Figure 8. Case I: UAV 3 malfunctions at time T = 30 s.