Start-Up Strategy-Based Resilience Optimization of Onsite Monitoring Systems Containing Multifunctional Sensors

Abstract: In nonrepairable multifunctional systems, the lost function of a component can be restored by the same function from another component; this activation mechanism of redundant functions gives multifunctional systems resilience features. This study first evaluates the resilience of multifunctional systems and analyzes the properties of system resilience. To determine the optimal start-up strategy, a resilience-oriented start-up strategy optimization model for onsite monitoring systems (OMSs) is established to maximize system resilience under a limited budget. Real-time reliability is regarded as the system performance used to evaluate system resilience, and a two-stage local search-based genetic algorithm (TLSGA) is proposed to solve the resilience optimization problem. The results of our numerical experiments on 48 systems show that, compared with classical genetic algorithms, the TLSGA solves the problems more effectively for OMSs with high function failure rates and low component failure rates. Moreover, the optimal combinations of unmanned aerial vehicles (UAVs) for an OMS under a limited budget show that UAVs with a higher carrying capacity should be given priority for selection. Therefore, this study provides an effective solution for determining the optimal start-up strategy to maximize the resilience of OMSs, which is beneficial for OMS configuration.


Introduction
As technology continues to advance, a single component can perform multiple functions; such components usually have a basic function with one or more additional capabilities. Multifunctional systems are widely used in various engineering fields, including power transmission [1], energy storage and conversion [2,3], 3D manufacturing [4,5], and water quality monitoring [6]. A multifunctional system containing multifunctional components aims to provide the several functions required to complete one task; however, all the functions and components are nonrepairable. If a function in a component is lost, the corresponding functions of other components are activated and connected to the system to ensure the required functions. A multifunctional system does not fail until a required function is completely lost; thus, it can resist external interference. Some researchers have focused on the reliability analysis of multifunctional systems. Moreover, researchers are focusing on the ability of complex systems to resist external disruptive events, for which resilience is an important indicator of system performance. Many complex systems should consider resilience because system suspension causes substantial economic losses, especially in major engineering systems and applications such as interdependent infrastructures [7], transportation systems [8], power systems [9], container ports [10], oil and gas supply chain management [11], logistics systems [12], international trade [13], the large-scale optimization of damage functions [14], UAV networks [15], and epidemic vaccination [16]. The automatic replenishment of failed functions in a multifunctional system indicates the resistance ability of the system; therefore, it is necessary to analyze the resilience of multifunctional systems with multifunctional components.
Resilience has become an important consideration in system design, evaluation, and optimization [17]. Many scholars have focused on evaluating and enhancing system resilience to resist external risks. Several evaluation methods are available to quantify the resistance of complex systems. In a smart grid, the resilience of a power system can be quantitatively measured by the total load of the service [18]. Wu et al. [19] reviewed the optimization methods for cyber-physical power system resilience from three perspectives. For a comprehensive performance management system, a data envelopment analysis (DEA) framework for a dynamic network was proposed to assess the performance of a supply chain based on resilience over time [20]. A figure of merit-based method was developed to determine system resilience according to the constituent elements of the tradespace processes that are currently used to select the preferred design alternatives [21]. Under the resilience evaluation framework, a sequential Monte Carlo simulation method was proposed to integrate real-time weather conditions [22]. Considering the influence of hurricane wind, fully probabilistic and analytical measurement frameworks have been proposed to evaluate the resilience of linear power distribution systems [23,24]. A twofold procedure based on data envelopment analysis and the Tobit model has been proposed to evaluate the effect of resilience conditions on the energy sector [25]. Supply chain resilience against diverse types of disturbances is an increasing concern in quantitative research, including analysis frameworks [26]. Moreover, some scholars have focused on enhancing system resilience to reduce the degradation process. The upgrading and repair of network poles have been developed as hardening strategies to improve the resilience of the distribution network [9]. Improving network resilience is often associated with reducing vulnerability by using a tri-level optimization method [27]. To maximize infrastructure resilience, limited budgets and resources must be wisely allocated to components to reduce the consequences of low-probability high-impact events [28]. The optimal defense strategy of reconfigurable systems was obtained using a genetic algorithm to maximize the defensive capability for resisting external risks [29]. Resilience enhancement strategies can be obtained using a fast non-dominated sorting genetic algorithm to minimize costs and maximize system resilience [30]. The location selection strategy of emergency rescue facilities in a multimodal transport network was obtained using a cooperative coverage model to improve system resilience [31].
In summary, considerable work has been devoted to quantifying and describing resilience from different perspectives; however, there are no universal metrics for evaluating the resilience of different systems under different hazards. Resilience optimization is one of the most popular topics in the field of system performance optimization under external risks. Based on the research reported in the literature, a gap analysis for system resilience is summarized in Table 1 from the perspectives of the system performance index and optimization algorithms. Different indices were selected to evaluate system performance; however, system reliability was not selected as the performance index. Many studies have focused on resilience evaluation instead of resilience optimization, and the optimization algorithms that were used are mainly classical ones, including genetic algorithms and Monte Carlo simulations. Therefore, it is necessary to develop a new system performance index and effective optimization algorithms.

Table 1. Gap analysis for system resilience.
References; System Performance Indexes; Optimization Algorithms
[18] Integrated response
[19] The disaster probability, caused damage, and response measures in grid; Generation decomposition algorithm
[20] Failure time and recovery probability; Segmentation algorithm
[21] Scale function
[22] Recovery curve; Network topology optimization
[23] Fuzzy logical relationship
[24] Probability for decision making; Monte Carlo simulation algorithm
[25] Energy efficiency and energy security
[26] Analog derivation
[27] Failure time and recovery rate
[28] Examination time and functional level
[29] Defensive capability; Genetic algorithm
[30] Examination time and functional level
[31] Cooperative coverage model

Many scholars have analyzed and evaluated the performance of multifunctional systems. The reliability of a one-shot system containing multifunctional components was analyzed using an integrated analysis of component reliability, function reliability, and the interrelationship among the components and functions [32]. A new goal-oriented method was presented to analyze the reliability of repairable systems with multifunctional components [33]. The functional resonance analysis method was used to determine the risk of multifunctional flood defenses [34]. A reliability evaluation methodology was presented for multifunctional processes that use a reliability-centered maintenance approach and modify general power generation functions in a large-scale manufacturing environment [35]. The reliability of a multifunctional inverter was analyzed by considering actual industrial reactive power curve injection [36]. The safety of multifunctional flood-defense systems has been evaluated based on the failure probability of multiple reinforcement strategies [37]. A continuous-time Markov chain method was developed to evaluate the time-dependent reliability of a multifunctional system using a known start-up strategy [38]. The reliability of multifunctional complex systems has been evaluated using a hazard rate matrix and Markovian approximation [39].
In addition, some researchers have studied the performance optimization of multifunctional systems. A multifunctional automatized forging station with a supervisory system for process and production management was developed in [40] to produce high-quality and low-price forgings. A reliability optimization model for multifunctional systems with multistate units was built to minimize system cost in [41]. By considering multifunctional maintenance windows, a state-based maintenance decision problem was introduced by the authors of [42] to ensure a flexible balance of inspection and spare parts. A validated thermodynamic model was established to determine the optimal energy and exergy configurations of a two-stage multifunctional hybrid open absorption system [43]. An effective metaheuristic algorithm model was established to reduce the total cost and solve the problem of preventive maintenance scheduling based on production planning and reliability [44]. A sequential quadratic programming algorithm was designed to solve the reliability optimization problem of multistage supply chains [45]. In summary, many scholars have focused on the reliability evaluation of multifunctional systems; however, few scholars have focused on the resilience evaluation and optimization of multifunctional systems.
The aim of this study is to evaluate the real-time reliability of a multifunctional system for quantifying system resilience. Unlike in the reported studies, the real-time reliability analysis of a multifunctional system must consider when and how the start-up strategy changes before a specific point in time, which makes the problem more complicated. Once a component fails, all of its functions can no longer be used. Determining the reliability of components or functions according to their lifetime distributions is difficult because of the dependency between a component and its functions. The evaluation of system resilience mainly depends on changes in the actual system performance over time, so tracking these changes is one of the most important steps in calculating system resilience. Obtaining an optimal start-up strategy for multifunctional systems is another challenge.
The remainder of this paper is organized as follows: Section 2 introduces the definition, assumptions, and resilience evaluation of a nonrepairable multifunctional system containing multifunctional components. Section 3 describes the resilience-oriented start-up strategy optimization of multifunctional systems. In Section 4, a two-stage local search-based genetic algorithm is developed to solve the optimization problem. Section 5 presents two experiments to illustrate the performance of the proposed algorithm and the optimal start-up strategy for onsite monitoring systems under a limited budget. Finally, concluding remarks and directions for future research are provided in Section 6.

Onsite Monitoring Systems Containing Multifunctional Sensors
An onsite monitoring system (OMS) consists of multifunctional sensors that are used to complete one task with several required monitoring functions. In onsite monitoring systems, the required functions are satisfied by combining the multifunctional sensors with components. Maintenance activities cannot be implemented when the system is operating, and the lost function of one component can be supplemented with the same function of the other components. The system fails when a required function is lost and cannot be complemented by other components. The structure of an onsite monitoring system with m components and n required functions is shown in Figure 1.
To better understand the structure, an onsite monitoring system consisting of three UAVs, each carrying two different types of sensors, is introduced to collect onsite information in a certain area, as shown in Figure 2. Each UAV can be regarded as a component with a carrying capacity of four sensors, and the system should collect three types of data (Types I, II, and III), which are the three required functions, via the coordination of the three UAVs. If sensor 2 in UAV3 fails, the system temporarily loses the required Type II data. UAV2 can then activate its sensor 2 to complement the lost function, but the system fails if sensor 2 in UAV2 also fails. In general, the system fails if no function that can complement a lost required function is available from the other components.
The start-up strategy includes the initial selection of functions and the redundancy levels, that is, the number of copies of each function in each component. An available start-up strategy should select a subset of functions from the components to satisfy the required functions. Moreover, the complementary mechanism of the OMS is important for guaranteeing the required functions. Therefore, it is necessary to determine the optimal start-up strategy to extend the system survival time.
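As a rough illustration of the start-up strategy described above, the following sketch encodes a three-UAV system as two small matrices. The sensor layout, the names `redundancy`, `selected`, `is_available`, `system_failed`, and `used_components`, and the rule that a strategy is available when every required function is selected at least once are all assumptions made for this example; they are not taken from the paper.

```python
# Rows are components (UAV1..UAV3); columns are required functions (data Types I..III).
# redundancy[i][j]: number of sensors of type j carried by UAV i (0 means the UAV cannot provide it).
# The layout below is hypothetical.
redundancy = [
    [1, 0, 1],  # UAV1
    [1, 1, 0],  # UAV2
    [0, 1, 1],  # UAV3
]

# selected[i][j] = 1 if function j of component i is switched on at start-up.
selected = [
    [1, 0, 0],  # UAV1 provides Type I
    [0, 0, 0],  # UAV2 is kept in reserve
    [0, 1, 1],  # UAV3 provides Types II and III
]


def is_available(selected, redundancy):
    """Assumed rule: only existing functions may be selected, and every
    required function must be selected in at least one component."""
    for sel_row, red_row in zip(selected, redundancy):
        if any(s == 1 and r == 0 for s, r in zip(sel_row, red_row)):
            return False
    n_functions = len(selected[0])
    return all(any(row[j] == 1 for row in selected) for j in range(n_functions))


def system_failed(redundancy):
    """The system fails once no component can provide some required function."""
    n_functions = len(redundancy[0])
    return any(all(row[j] == 0 for row in redundancy) for j in range(n_functions))


def used_components(selected):
    """A component counts as used if any of its functions is activated."""
    return [i for i, row in enumerate(selected) if any(row)]
```

In this layout UAV2 stays unused at start-up, and its Type II sensor is the backup that would be activated if the Type II sensor in UAV3 failed.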

Assumptions of OMSs
Some reasonable assumptions made to evaluate the resilience and optimize the start-up strategy of OMSs are summarized as follows:
1. In a multifunctional system, the failure of one component does not affect the failure of other components, and a functional failure does not affect the failure of another function in the same component.
2. The failure of one component makes all the functions of this component unavailable, regardless of the conditions of the functions.
3. If any function of a component is activated, the component is used.
4. If no components can provide one of the required functions, the system will fail.
5. Considering that sensors are electronic components, the lifetimes of the components and their functions follow exponential distributions.
6. Information on hazardous events, including the occurrence time and effect of events on components and their functions, is known, and the failure rates of the affected components or functions become larger.

Resilience Evaluation of OMSs

Real-Time Reliability Analysis of OMSs
To better analyze the real-time reliability of the OMS, changes in the start-up strategy within a specified time duration should be considered; however, it is difficult to enumerate all failure or activation situations. Furthermore, the usage time of components or functions is difficult to determine; therefore, their reliability is difficult to obtain. Consequently, a new method for evaluating the real-time reliability of OMSs is needed. Considering the lifetime distribution assumptions, a reliability analysis method based on a continuous-time Markov chain (CTMC) can be used to determine the state transition process by considering the activation and redundancy level of functions [46]. However, the difficulty of the CTMC model lies in defining the system states and evaluating the transition probability, which requires the following: (1) define all the possible system states of the OMS; (2) generate all possible states based on the redundancy levels and initial selection of functions; (3) analyze the transition rates between any two states by considering the failure of components or functions; and (4) calculate system reliability based on the Kolmogorov equations. The detailed process we used is summarized as follows:
1. Definition of the OMS states: For an OMS containing m components with n required functions, $x_{ij} = 1$ indicates that function j in component i is initially selected in the start-up strategy; otherwise, $x_{ij} = 0$. All possible system states can be represented by an $m \times 2n$ matrix that combines the initial selection of functions $x_{ij}$ with the redundancy levels $y_{ij}$, where $y_{ij}$ represents the redundancy level of function j in component i, and $y_{ij} = 0$ indicates that the function is lost. Once all elements in column j of the redundancy level part are 0, the system fails because no component can provide the required function.
2. Generation of all possible states of the OMS based on a two-step iterative method: Set all the non-zero redundancy levels to one and generate all the system states based on the start-up strategy. Then, consider the combinations of the real redundancy levels $y_{ij}$ to enumerate all the system states.
3. Evaluation of transition rates between different states: To better analyze the transition probability between the different states, the transition rates from state u to state v can be divided into four situations. Situation 1 indicates that state u cannot be transferred to state v by the failure of a function or a component, and the transition rates in Situation 1 are 0. Situation 2 indicates that state u can be transferred to state v by the failure of a component, and the transition rates are the failure rates of the components. Situation 3 indicates that state u can be transferred to state v by the failure of a function, and the transition rates are the product of the function's failure rate and its redundancy level under state u. Situation 4 indicates that state u can be transferred to state v by the failure of a component or of a function whose redundancy level is 1 in state u, and the transition rates are the sum of the function's failure rate and the component's failure rate. In summary, the transition rate $q_{uv}$ ($u \neq v$) can be determined using the above four situations, and $q_{uu} = -\sum_{v=1, v \neq u}^{N_B} q_{uv}$. The transition rate matrix $Q$ consists of all $q_{uv}$.
4. Calculation of real-time system reliability: Let E contain all the system states, and let W be the set of all the working states. The probability that the system state is u at time t is denoted as $P_u(t)$, and $P(t)$ is the row vector of these probabilities. According to the Kolmogorov equations, $P'(t) = P(t)Q$, and the probabilities of all states can be obtained using $P(t) = P(0)e^{Qt}$. Therefore, the system reliability $R(t)$ of the OMS can be evaluated as the sum of the probabilities that the system is in the working states, as shown in Equation (1).
$$R(t) = \sum_{u \in W} P_u(t) \qquad (1)$$
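The four steps above can be sketched on a deliberately small example. The code below assumes a toy system with a single component carrying two copies of one required function, builds the transition rate matrix from Situations 2 to 4, and evaluates $R(t)$ from $P(t) = P(0)e^{Qt}$. The failure rates, the state numbering, and the helper names `mat_mul`, `mat_exp`, and `reliability` are illustrative assumptions; the matrix exponential uses a plain scaled Taylor series rather than any method stated in the paper.

```python
import math


def mat_mul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(a, terms=20):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = 0
    if norm > 0:
        squarings = max(0, math.ceil(math.log2(norm))) + 1
    scale = 2.0 ** squarings
    scaled = [[v / scale for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def reliability(q, p0, working, t):
    """R(t): sum of the working-state entries of P(t) = P(0) exp(Qt)."""
    e = mat_exp([[v * t for v in row] for row in q])
    size = len(q)
    p_t = [sum(p0[u] * e[u][v] for u in range(size)) for v in range(size)]
    return sum(p_t[v] for v in working)


# Toy system (assumed): state 0 = two copies left, state 1 = one copy left, state 2 = failed.
LAM_F = 0.05  # failure rate of one function copy (hypothetical)
LAM_C = 0.01  # failure rate of the component (hypothetical)
Q = [
    [-(2 * LAM_F + LAM_C), 2 * LAM_F, LAM_C],   # Situation 3 (rate x redundancy) and Situation 2
    [0.0, -(LAM_F + LAM_C), LAM_F + LAM_C],     # Situation 4 (function rate + component rate)
    [0.0, 0.0, 0.0],                            # failed state is absorbing
]
P0 = [1.0, 0.0, 0.0]
WORKING = [0, 1]
```

For this toy chain the result can be checked by hand: the system works while the component is alive and at least one of the two copies survives, so $R(t) = e^{-\lambda_c t}(2e^{-\lambda_f t} - e^{-2\lambda_f t})$.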

Resilience Definition for OMSs
Under hazardous events, system resilience can be defined as the ability of a system to resist the external environment and reduce system performance losses [47]. When hazardous events occur, system performance will degrade over time. In the classical evaluation method, system resilience is equal to the ratio of the area enclosed by the real performance curve and the time axis to the area enclosed by the ideal performance curve and the time axis [48], as shown in Figure 3.
If system reliability is selected as the index of system performance, the resilience of OMSs can be evaluated by the ratio of the area enclosed by the real performance curve and the time axis, $S_{real}$, to the area enclosed by the ideal performance curve and the time axis, $S_{ideal}$. $S_{real}$ is equal to $\int_0^T R(t)\,dt$, and $S_{ideal}$ is rectangular. Therefore, the system resilience at time T, $f_{SR}(T)$, can be calculated as
$$f_{SR}(T) = \frac{S_{real}}{S_{ideal}} = \frac{\int_0^T R(t)\,dt}{T}. \qquad (2)$$
$\int_0^T R(t)\,dt$ can be directly evaluated using the differential element method, though this can be time-consuming. When $R(t)$ is evaluated via the CTMC, $\int_0^T R(t)\,dt$ can instead be calculated directly from the transition matrix, which reduces the computation time. The detailed derivation of $\int_0^T R(t)\,dt$, which is related to the expected residual lifetime after $t_0$, is listed in the Proof of Proposition 1.
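To make the two evaluation routes concrete, the sketch below computes the resilience of a toy system in both ways: by summing small time elements (a trapezoidal rule standing in for the differential element method) and by the closed form $(P_W(T) - P_W(0))B^{-1}e_W/T$ that appears in Corollary 1. The toy chain (one component, two copies of one function), its rates, and all function names are assumptions for illustration only.

```python
import math

LAM_F = 0.05  # failure rate of one function copy (hypothetical)
LAM_C = 0.01  # failure rate of the component (hypothetical)

# Transition matrix between the two working states:
# state 0 = two copies left, state 1 = one copy left.
B = [[-(2 * LAM_F + LAM_C), 2 * LAM_F],
     [0.0, -(LAM_F + LAM_C)]]


def p_working(t):
    """Closed-form working-state probabilities of this toy chain, starting in state 0."""
    p0 = math.exp(-(2 * LAM_F + LAM_C) * t)
    p1 = 2 * math.exp(-(LAM_F + LAM_C) * t) * (1 - math.exp(-LAM_F * t))
    return [p0, p1]


def reliability(t):
    """R(t) is the sum of the working-state probabilities."""
    return sum(p_working(t))


def solve_2x2(m, rhs):
    """Solve m z = rhs for a 2 x 2 matrix by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det]


def resilience_closed_form(horizon):
    """(P_W(T) - P_W(0)) B^{-1} e_W / T, with z = B^{-1} e_W solved once."""
    z = solve_2x2(B, [1.0, 1.0])
    p_end, p_start = p_working(horizon), p_working(0.0)
    area = sum((a - b) * zi for a, b, zi in zip(p_end, p_start, z))
    return area / horizon


def resilience_numeric(horizon, steps=20000):
    """Trapezoidal approximation of the integral of R(t) over [0, T], divided by T."""
    h = horizon / steps
    area = 0.5 * (reliability(0.0) + reliability(horizon))
    area += sum(reliability(k * h) for k in range(1, steps))
    return area * h / horizon
```

The closed form needs one linear solve and two evaluations of the state probabilities, whereas the numerical route evaluates $R(t)$ at every grid point, which is the cost difference the text refers to.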

Proposition 1. According to the assumptions of nonrepairable multifunctional systems, the expected residual lifetime after $t_0$ of this type of system can be calculated using
$$\int_{t_0}^{\infty} R(t)\,dt = -P_W(t_0)B^{-1}e_W,$$
where $P_W(t_0)$ is the probability vector of the working states at $t_0$, $B$ is the transition matrix between all working states, and $e_W$ is a column vector whose elements are all one and whose length equals the number of working states.
In this study, we only considered the resistance ability of a multifunctional system under several hazardous events without considering the restoration process because the aforementioned multifunctional systems are nonrepairable. The effects of each hazardous event on the components or functions are known, meaning that the real-time system reliability in the interval $[t_l, t_{l+1}]$ ($l = 0, 1, \ldots, n_d$) is recorded as $R_{l+1}(t)$, which can be evaluated using Equation (1), where $t_l$ is the occurrence time of the l-th hazardous event and $B_{l+1}$ is the transition matrix between all working states from $t_l$ to $t_{l+1}$. The proof of Proposition 1 is as follows.
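Proof of Proposition 1. The block matrix form of the Kolmogorov equation can be shown as follows:
$$(P'_W(t), P'_F(t)) = (P_W(t), P_F(t)) \begin{pmatrix} B & C \\ D & E \end{pmatrix},$$
where $P_W(t) = (P_1(t), P_2(t), \ldots)$ is the probability vector of the working states and $P_F(t)$ is that of the failed states. Because the system is nonrepairable, no transition leads from a failed state back to a working state, so $D = 0$ and $P'_W(t) = P_W(t)B$. To solve this equation quickly, a Laplace transform was introduced to derive the final result. The Laplace form of $P_i(t)$ can be represented as follows:
$$P_i^*(s) = \int_0^{\infty} e^{-st}P_i(t)\,dt.$$
The Laplace transform of $P'_W(t) = P_W(t)B$ can be shown as follows:
$$\int_0^{\infty} e^{-st}P'_W(t)\,dt = P_W^*(s)B.$$
The left side of the above equation can be derived using partial integration as follows:
$$\int_0^{\infty} e^{-st}P'_W(t)\,dt = sP_W^*(s) - P_W(0).$$
Through substituting the result of the two equations above, we can obtain the following:
$$P_W^*(s) = P_W(0)(sI - B)^{-1}.$$
The expected lifetime of a nonrepairable system is equal to the mean time to the first failure; thus,
$$\int_0^{\infty} R(t)\,dt = \int_0^{\infty} P_W(t)e_W\,dt = \lim_{s \to 0} P_W^*(s)e_W.$$
These two expressions give the same result. Therefore, we can conclude that
$$\int_0^{\infty} R(t)\,dt = -P_W(0)B^{-1}e_W.$$
Based on the above expression, the expected residual lifetime of a nonrepairable system after time $t_0$ can be calculated as follows:
$$\int_{t_0}^{\infty} R(t)\,dt = -P_W(t_0)B^{-1}e_W.$$

Proposition 2. The expected lifetime of a multifunctional system in the interval $[t_0, T]$ ($t_0 = 0$) under $n_d$ known hazardous events can be calculated as
$$\int_{t_0}^{T} R(t)\,dt = \sum_{l=0}^{n_d} \left(P_W(t_{l+1}) - P_W(t_l)\right)B_{l+1}^{-1}e_W, \quad t_{n_d+1} = T.$$
Proof of Proposition 2. The expected lifetime of a nonrepairable onsite monitoring system is the mean time to the first failure. Thus, because $P'_W(t) = P_W(t)B_{l+1}$ holds on each interval $[t_l, t_{l+1}]$,
$$\int_{t_0}^{T} R(t)\,dt = \sum_{l=0}^{n_d} \int_{t_l}^{t_{l+1}} P_W(t)e_W\,dt = \sum_{l=0}^{n_d} \left(P_W(t_{l+1}) - P_W(t_l)\right)B_{l+1}^{-1}e_W.$$
Through combining Equation (2) and Proposition 2, the system resilience at T under $n_d$ hazardous events can be evaluated as follows:
$$f_{SR}(T \mid n_d) = \frac{1}{T}\sum_{l=0}^{n_d} \left(P_W(t_{l+1}) - P_W(t_l)\right)B_{l+1}^{-1}e_W. \qquad (3)$$
Corollary 1. The system resilience at T without hazardous events can be evaluated using $f_{SR}(T \mid 0) = (P_W(T) - P_W(t_0))B_1^{-1}e_W/T$.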
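The interval-by-interval evaluation can be illustrated with the smallest possible case: a system with a single working state, so that each $B_{l+1}$ reduces to the scalar $-\lambda_{l+1}$ and $e_W = 1$. The sketch below sums $(P_W(t_{l+1}) - P_W(t_l))B_{l+1}^{-1}e_W$ over the intervals between hazardous events, in the same form as Corollary 1. The failure rates are hypothetical, the event times follow the $t_1 = 5$, $t_2 = 10$, $t_3 = 15$ setting used for Figure 4, and the function name is an assumption.

```python
import math


def resilience_with_hazards(rates, event_times, horizon):
    """Resilience at `horizon` for a one-state system with exponential lifetime.

    rates[l] is the failure rate on the l-th interval between events, so
    len(rates) == len(event_times) + 1; event_times must be increasing and
    smaller than horizon.
    """
    bounds = [0.0] + list(event_times) + [horizon]
    p_start = 1.0   # P_W(t_0): the system works at time 0
    area = 0.0
    for rate, t_lo, t_hi in zip(rates, bounds, bounds[1:]):
        p_end = p_start * math.exp(-rate * (t_hi - t_lo))
        # (P_W(t_{l+1}) - P_W(t_l)) * B^{-1} * e_W with B = [-rate]
        area += (p_end - p_start) / (-rate)
        p_start = p_end
    return area / horizon


# Each hazardous event is assumed to raise the failure rate (hypothetical values).
no_event = resilience_with_hazards([0.02], [], 50.0)
one_event = resilience_with_hazards([0.02, 0.04], [5.0], 50.0)
two_events = resilience_with_hazards([0.02, 0.04, 0.06], [5.0, 10.0], 50.0)
three_events = resilience_with_hazards([0.02, 0.04, 0.06, 0.08], [5.0, 10.0, 15.0], 50.0)
```

With these assumed rates the four values decrease as events are added. For systems with several working states, the same loop applies with matrix products and a linear solve in place of the scalar division.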

Resilience Properties of OMSs

Monotonic Analysis of System Resilience
According to the definition of system resilience in Equation (3), the monotonic properties of $f_{SR}(t)$ are summarized in Proposition 3.

Proposition 3. If the system reliability of the OMSs $R(t)$ is known, the resilience of the multifunctional system $f_{SR}(t)$ decreases with an increase in t, and the range of $f_{SR}(t)$ is [0, 1].

Proof of Proposition 3. (1) Proof of the range of system resilience: Through using $f_{SR}(t) = \int_0^t R(\tau)\,d\tau / t$, we can obtain the corresponding limits of $f_{SR}(t)$:
$$\lim_{t \to 0^+} f_{SR}(t) = R(0) = 1, \qquad \lim_{t \to \infty} f_{SR}(t) = 0.$$
(2) Proof of monotonicity for system resilience: The derivative of $f_{SR}(t)$ is
$$f'_{SR}(t) = \frac{tR(t) - \int_0^t R(\tau)\,d\tau}{t^2} = \frac{g(t)}{t^2}, \qquad g(t) = tR(t) - \int_0^t R(\tau)\,d\tau.$$
Because $t^2 > 0$ when $t > 0$, the purpose is to determine whether $g(t)$ is less than 0. Moreover, $R(t)$ is differentiable and decreasing, so $R'(t) < 0$. We can obtain $g'(t) = tR'(t) < 0$, and $g(t)$ decreases as time increases. At the same time, $g(t)|_{t=0} = 0$. Therefore, $g(t) < 0$ for $t > 0$. It follows that $f'_{SR}(t) < 0$, which means that $f_{SR}(t)$ decreases with an increase in time t.
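The monotonicity argument can be checked numerically on a simple case. The sketch below assumes a single exponential lifetime, $R(t) = e^{-\lambda t}$ with a hypothetical rate, for which $f_{SR}(t) = (1 - e^{-\lambda t})/(\lambda t)$ and $g(t) = te^{-\lambda t} - (1 - e^{-\lambda t})/\lambda$; the names `f_sr` and `g` are chosen for this example.

```python
import math

LAM = 0.05  # hypothetical failure rate


def f_sr(t):
    """Resilience of a system with R(t) = exp(-LAM * t), for t > 0."""
    return (1 - math.exp(-LAM * t)) / (LAM * t)


def g(t):
    """Numerator of the derivative of f_sr: t * R(t) minus the integral of R over [0, t]."""
    return t * math.exp(-LAM * t) - (1 - math.exp(-LAM * t)) / LAM


times = [0.5 * k for k in range(1, 401)]        # t = 0.5, 1.0, ..., 200.0
values = [f_sr(t) for t in times]

is_decreasing = all(a > b for a, b in zip(values, values[1:]))
stays_in_range = all(0.0 < v < 1.0 for v in values)
g_is_negative = all(g(t) < 0.0 for t in times)
```

On this grid all three checks hold, in line with Proposition 3: the resilience starts near 1 for small t and decays toward 0 as t grows.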

Changes in System Resilience as the Number of Hazardous Events Increases
System resilience is closely related to the number of hazardous events: the higher the number of hazardous events, the lower the system resilience. To illustrate this effect, the change in system resilience over time under different numbers of hazardous events with a fixed start-up strategy is shown in Figure 4. The occurrence times of the hazardous events were $t_1 = 5$, $t_2 = 10$, and $t_3 = 15$.
From Figure 4, it is easy to observe that the system resilience without any hazardous events is much greater than that of a system with hazardous events. At t = 50, the difference between the system resilience at $n_d = 0$ and that at $n_d = 1$ was approximately 0.12. However, the difference in system resilience narrowed with an increase in hazardous events. For example, the system resilience at $n_d = 2$ is very close to that at $n_d = 3$, and the difference between them is no greater than 0.0115. Therefore, the system resilience with the same start-up strategy decreases as more hazardous events occur; however, the differences become small when the number of hazardous events is large.

Changes in System Resilience as Start-Up Strategy Adjusts
When the number of hazardous events and the parameters of the components and functions are fixed, different start-up strategies can cause significant differences in system resilience. Consider a multifunctional system with five required functions and three components. Figure 5 shows that the differences among three start-up strategies are significant, and that the system with start-up strategy 3 has the highest resilience over time. Start-up strategy 3 has two activated components in the initial selection of functions, whereas start-up strategies 1 and 2 have three. We can therefore conclude that a smaller number of activated components can generate higher system resilience when the redundancy levels of the functions are similar [37]. To maximize the system resilience at a specific time point, the optimal start-up strategy should be determined to increase the ability of a multifunctional system to resist the decrease in system performance.

Resilience-Oriented Start-Up Strategy Optimization of Onsite Monitoring Systems
Start-up strategy optimization determines the start-up strategy that maximizes the resilience of an onsite monitoring system with known hazardous events. The occurrence times and effects of all hazardous events are known.
The objective is to maximize system resilience under a limited budget. The decision variables are the start-up strategy, including the initial selection of functions and the redundancy levels of all functions. The mathematical model was established with this objective and Constraints (5)-(9), which are described as follows.
Constraints (5) and (6) are related to the initial selection of the start-up strategy; Constraint (6) indicates that each function must be selected from a single component. Constraints (7) and (8) define the limits of the redundancy levels of functions. Constraint (7) shows that $x_{ij}$ must equal zero if $y_{ij} = 0$ (i.e., component $i$ does not have function $j$), and the redundancy levels must be integers. Constraint (8) describes the carrying capacity of each UAV: the total redundancy level of each UAV must be less than or equal to $n_0$. Constraint (9) is the cost limitation: the total cost of the UAVs and the corresponding redundancy of functions must stay within the budget $C_{max}$, where $C_0$ is the unit cost of a UAV.
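The constraints described above can be checked programmatically. The following is a minimal sketch in Python, not the authors' implementation. It assumes a 0/1 matrix `select` for the initial selection, an integer matrix `x` for the redundancy levels $x_{ij}$, and a 0/1 matrix `y` for $y_{ij}$. The name `select`, the per-sensor cost `c_sensor`, the exact form of the cost expression, and the requirement that a selected function be carried at least once are assumptions made for illustration.

```python
def is_feasible(select, x, y, n0, c0, c_sensor, c_max):
    """Check a candidate start-up strategy against the described constraints.

    select[i][j] = 1 if function j is initially provided by component i
                   (assumed encoding of the initial selection).
    x[i][j]      = redundancy level of function j on component i.
    y[i][j]      = 1 if component i can carry function j, else 0.
    """
    m, n = len(x), len(x[0])

    # Each function is initially selected from exactly one component.
    for j in range(n):
        if sum(select[i][j] for i in range(m)) != 1:
            return False

    for i in range(m):
        for j in range(n):
            # Redundancy levels are nonnegative integers.
            if not isinstance(x[i][j], int) or x[i][j] < 0:
                return False
            # No redundancy where the component lacks the function.
            if y[i][j] == 0 and x[i][j] != 0:
                return False
            # Assumption: a selected function must actually be carried.
            if select[i][j] == 1 and x[i][j] < 1:
                return False
        # Carrying capacity of each UAV.
        if sum(x[i]) > n0:
            return False

    # Assumed cost model: UAV cost plus a per-sensor cost for every unit of redundancy.
    total_cost = m * c0 + c_sensor * sum(sum(row) for row in x)
    return total_cost <= c_max
```

A checker of this kind could serve as a repair or rejection step inside a heuristic search, since crossover and mutation can easily produce infeasible strategies.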

Solving Algorithm
A genetic algorithm (GA) is a well-known meta-heuristic algorithm that applies the survival-of-the-fittest principle to search for the optimal solution [49]. A GA has good global search ability because it can dynamically increase the diversity of the population via crossover and mutation; however, its local search ability is weak [50]. Because importance measures are effective for local search, a two-stage local search was developed to enhance the local search ability of the GA [51]. To solve the start-up strategy optimization model for onsite monitoring systems, a two-stage local search-based genetic algorithm (TLSGA) was developed by adding a two-stage local search process to the GA.

Two-Stage Local Search Method
Importance measures identify the weakest links of a system, so increasing the resources allocated to the components with the highest importance can improve system performance cost-effectively [52]. The local search method updates the start-up strategy to maximize system resilience in two stages: the first stage adjusts the redundancy of a component based on the redundancy importance measure; the second stage adjusts the initial selection for a function. Considering the constraints on the initial selection and the redundancy level, the local search method can be summarized as follows:
(1) To adjust the redundancy level, note that the number of functions each component can carry is limited by $n_0$. A component is selected, and one redundant function is removed and another added so as to maximize the improvement in system resilience while the carrying capacity of the component is unchanged. The detailed redundancy adjustment process of component $i^*$ is as follows:

• Choose the function $j \in \{1, \dots, n\}$ with the minimum resilience reduction when its redundancy is reduced by one.

• Choose the function $j \in \{1, \dots, n\}$ with the maximum resilience increase when its redundancy is increased by one.

• Determine the component $i^*$ with the maximum improvement in system resilience.

(2) To adjust the initial selection of functions, the same function from other components should be considered to replace the current function; therefore, the component with the maximum improvement in system resilience should be selected to provide the same function. The start-up strategy function adjustment can be summarized as follows:

• Find the component $i_s^*(j)$ with the maximum improvement in system resilience by adjusting the start-up strategy of function $j$.

• Determine the function of the component that maximizes the improvement in system resilience by $j_s^* = \arg\max\{j \mid i_s^*(j),\ j = 1, \dots, n\}$.

In summary, we should choose the optimal component redundancy and start-up strategy adjustment to maximize the improvement in system resilience.
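The first stage of the local search can be illustrated with a short Python sketch. This is one possible reading of the steps above, not the authors' code. The function `resilience` is a hypothetical callable supplied by the user, and `toy_resilience` is an invented stand-in used only to make the example runnable; it is not the resilience measure of this study.

```python
import copy

def redundancy_adjustment(x, y, resilience):
    """One pass of a stage-one style redundancy swap (illustrative).

    x[i][j]: redundancy level of function j on component i.
    y[i][j]: 1 if component i can carry function j, else 0.
    resilience: callable mapping a redundancy matrix to a number (hypothetical).
    Returns a new matrix after the best single swap, or a copy of x if no swap helps.
    """
    base = resilience(x)
    best_gain, best_x = 0.0, copy.deepcopy(x)

    def with_change(i, j, delta):
        trial = copy.deepcopy(x)
        trial[i][j] += delta
        return trial

    for i in range(len(x)):
        removable = [j for j in range(len(x[i])) if x[i][j] > 0]
        addable = [j for j in range(len(x[i])) if y[i][j] == 1]
        if not removable or not addable:
            continue
        # Function whose removal costs the least resilience.
        j_rem = min(removable, key=lambda j: base - resilience(with_change(i, j, -1)))
        # Function whose addition gains the most resilience.
        j_add = max(addable, key=lambda j: resilience(with_change(i, j, +1)) - base)
        # Swap one unit so the load carried by component i is unchanged.
        trial = with_change(i, j_rem, -1)
        trial[i][j_add] += 1
        gain = resilience(trial) - base
        if gain > best_gain:
            best_gain, best_x = gain, trial
    return best_x


def toy_resilience(x, q=0.5):
    # Invented stand-in: product over functions of (1 - q ** total redundancy).
    value = 1.0
    for j in range(len(x[0])):
        total = sum(row[j] for row in x)
        value *= 1.0 - q ** total
    return value
```

Because each swap removes one unit and adds one unit on the same component, the carrying-capacity constraint is preserved automatically. The second stage (reassigning which component initially provides a function) could follow the same pattern: try each alternative component and keep the change with the largest gain.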

TLSGA Procedures
The TLSGA combines the global search ability of the GA with the local search ability of the two-stage local search method. The procedures of the TLSGA are similar to those of the GA, a classical meta-heuristic algorithm with a standard process that includes initialization, selection, crossover, and mutation. The real number encoding method is used to represent the start-up strategy in the TLSGA as an $m \times 2n$ matrix. The difference between the TLSGA and the GA is that a local search process is added after the mutation process in the TLSGA. A flow chart of the TLSGA procedures is shown in Figure 6.
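The overall loop can be outlined as a generic skeleton in Python. This is a hedged sketch rather than the published procedure: the operators `init`, `fitness`, `crossover`, `mutate`, and `local_search` are hypothetical callables to be supplied by the user, binary tournament selection is an assumption (the selection scheme is not specified here), and the population size is assumed to be even. The default parameter values mirror those reported for the experiments.

```python
import random

def tlsga(init, fitness, crossover, mutate, local_search,
          pop_size=100, generations=200, pc=0.9, pm=0.1, seed=0):
    """GA loop with a local search step applied after mutation (illustrative)."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        # Selection: binary tournament (an assumption, not taken from the source).
        parents = [max(rng.sample(population, 2), key=fitness)
                   for _ in range(pop_size)]

        # Crossover on consecutive parent pairs with probability pc.
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:
                a, b = crossover(a, b, rng)
            offspring.extend([a, b])

        # Mutation with probability pm, then local search on every individual.
        offspring = [mutate(c, rng) if rng.random() < pm else c for c in offspring]
        offspring = [local_search(c) for c in offspring]

        population = offspring
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

The operators should return new objects rather than modify their arguments in place, because the same parent can be selected more than once in a generation.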

The performance of the TLSGA and GA in different systems was compared in an experiment (Experiment 1). For Experiment 1, the high and low failure rates of components and their functions were categorized into four groups: low $\lambda_f$ and low $\lambda_c$, high $\lambda_f$ and low $\lambda_c$, low $\lambda_f$ and high $\lambda_c$, and high $\lambda_f$ and high $\lambda_c$. For each group, there are 12 systems, the symbols of which are listed in Table 2; each system has a different number of components $m$, required number of system functions $n$, carrying capacity of a component $n_0$, and number of hazardous events $n_d$. The UAV fails more easily than the sensors; therefore, the failure rate of the components is higher than that of the functions. A low $\lambda_c$ is randomly selected from the interval [0.1, 0.3], and a high $\lambda_c$ from [0.8, 1]; a low $\lambda_f$ is randomly selected from [0.01, 0.02], and a high $\lambda_f$ from [0.08, 0.1]. Hazardous events occur at the 3rd and 7th hours. Both the TLSGA and the GA used a population size of 100, a maximum of 200 generations, a crossover probability of 0.9, and a mutation probability of 0.1. For each system, the TLSGA and the GA were each run 50 times.
Two indices were used to analyze the performance of the TLSGA and GA: the improved number (IN) and the mean ratio of system resilience (MRSR). IN counts the number of runs in which the result of the TLSGA is not less than that of the GA, and MRSR is the average ratio of the system resilience obtained by the TLSGA to that obtained by the GA. A higher IN or a higher MRSR indicates that the TLSGA performs better than the GA. Moreover, the applicability of the TLSGA can be analyzed using the average percentage improvement in system resilience. The improvement percentage $P_{isr}$ can be evaluated using Equation (10) as follows:
$$P_{isr} = \frac{R_{TLSGA} - R_{GA}}{R_{GA}} \times 100\% \qquad (10)$$
where $R_{TLSGA}$ is the system resilience obtained by the TLSGA, and $R_{GA}$ is the system resilience obtained by the GA.
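As a worked illustration, the indices can be computed from paired run results in a few lines of Python. The sketch assumes that run $k$ of the TLSGA is compared with run $k$ of the GA, that MRSR is the mean of the per-run ratios, and that the improvement percentage is the relative gain of the TLSGA over the GA expressed in percent; the function name `compare_runs` is hypothetical.

```python
def compare_runs(r_tlsga, r_ga):
    """Return (IN, MRSR, mean improvement percentage) for paired runs.

    r_tlsga, r_ga: equal-length sequences of positive system resilience values.
    """
    if len(r_tlsga) != len(r_ga) or len(r_tlsga) == 0:
        raise ValueError("both result lists must be non-empty and of equal length")
    pairs = list(zip(r_tlsga, r_ga))
    n_runs = len(pairs)

    # IN: runs in which the TLSGA result is not less than the GA result.
    improved_number = sum(1 for a, b in pairs if a >= b)
    # MRSR: mean of the per-run ratios TLSGA / GA (assumed pairing).
    mrsr = sum(a / b for a, b in pairs) / n_runs
    # Mean relative improvement in percent (assumed form of the percentage).
    p_isr = sum((a - b) / b * 100.0 for a, b in pairs) / n_runs
    return improved_number, mrsr, p_isr
```

For example, an MRSR above 1 together with an IN close to the number of runs would indicate that the TLSGA matched or exceeded the GA in nearly every run.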

Experimental Results
The experimental results are listed in Table 3, which shows that IN decreased as the scale of the systems increased. The INs are almost 50 for systems S1-S6 in all four groups, but they become smaller for systems S7-S12. The lowest IN was 45, which

Numerical Case for Onsite Monitoring Systems
The optimal design of the onsite monitoring system determines the optimal combination of the number of UAVs $m$ and the carrying capacity of a UAV $n_0$ to maximize the system resilience. Once the number of required system functions and the maximum budget are determined, the optimal system resilience with different $m$ and $n_0$ can be obtained using the TLSGA. The hazardous event occurs at the third hour, and its effect was recorded. The maximum budget was USD 55,000, the unit price of the sensors was USD 1000, and the unit price of a UAV depended on its carrying capacity and was equal to USD $1000 n_0$. The maximum system resilience at $t = 10$ h was obtained by the TLSGA for each combination of $m$ and $n_0$.
From Figure 8, we can see that the maximum system resilience at $t = 10$ h is 0.2003, and the optimal combination occurs when the number of UAVs is three and the carrying capacity of each UAV is five. When the maximum budget is USD 55,000, the system resilience increases with the carrying capacity when $m = 3$ and 4. Similarly, system resilience increases with the number of UAVs when $n_0 = 2$ and 3 because the carrying capacity of each UAV is small: as the number of UAVs increases, the redundancy of the entire system increases, and thus the system resilience increases. However, the system resilience decreases when $m$ is 5 and $n_0$ is greater than 3. To some extent, the sensors in a UAV may have lower redundancy when the UAV carries more types of sensors, which may decrease the system resilience. The results of Experiment 2 show that the maximum budget cannot provide sufficient sensors as $m$ and $n_0$ increase; therefore, some UAVs may carry fewer sensors than their maximum carrying capacity. From our experimental results, we can observe that the number of UAVs cannot be too large because of the limited budget, and UAVs with a higher carrying capacity should be selected first. Moreover, the TLSGA can determine the optimal configuration of OMSs under a limited budget by considering the combination of the number of UAVs and the carrying capacity to maximize system resilience. The TLSGA can also be applied to integer programming problems under nonlinear constraints in practical engineering by adjusting the initialization method and fitness function.
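The budget arithmetic behind this discussion can be sketched in Python. The cost model below is an assumption made for illustration: total cost equals the number of UAVs times the UAV price (USD 1000 per unit of carrying capacity) plus USD 1000 per sensor. The exact cost constraint of the model may differ, and the function name `affordable_sensors` is hypothetical.

```python
def affordable_sensors(m, n0, budget=55_000, sensor_price=1_000, price_per_slot=1_000):
    """Number of sensors that fit both the budget and the total carrying capacity.

    m: number of UAVs; n0: carrying capacity of each UAV.
    Returns None when the UAVs alone exceed the budget.
    """
    uav_price = price_per_slot * n0          # UAV price grows with its capacity
    remaining = budget - m * uav_price       # money left for sensors
    if remaining < 0:
        return None
    capacity = m * n0                        # total sensor slots across all UAVs
    return min(capacity, remaining // sensor_price)
```

Under this assumed model, three UAVs with capacity five leave enough budget to fill all 15 slots, whereas five UAVs with capacity six could fund only 25 of their 30 slots. This is consistent in spirit with the observation that some UAVs may fly below their maximum carrying capacity as $m$ and $n_0$ grow.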

Conclusions
This study evaluated the resilience of OMSs by considering real-time reliability as the system performance; its novelty lies in evaluating real-time system reliability based on CIMC. A TLSGA was developed to effectively solve the resilience-oriented start-up strategy optimization problem, especially for systems with high $\lambda_f$ and low $\lambda_c$. Moreover, under a limited budget, a UAV with a higher carrying capacity should be selected first when determining the optimal combination of $m$ and $n_0$. The proposed method focuses on exponential distributions and known risk situations. In the future, more research should focus on complex and uncertain risk situations and other types of distributions of components and their functions.

Figure 1. Structure of an onsite monitoring system with $m$ multifunctional components.

Figure 2. Example of an OMS containing three UAVs.

Figure 3. General description of performance-based system resilience.
… $\int R(t)\,dt$, which could reduce time. The detailed derivation of $\int_0^T R(t)\,dt$, which is the expected residual lifetime after $t_0$, is listed in the Proof of Proposition 1.


Proposition 3. (1) Proof of system resilience bounds: In Section 2.3.1, $R(t)$ is a decreasing and continuous function of time. We obtain $\lim_{t \to \infty} R(t) = 0$.

Figure 4. Changes in system resilience with an increasing number of hazards.

Figure 5. Changes in system resilience with three different start-up strategies.

Figure 6. Flow chart of the procedures of the TLSGA.

Figure 7. The improvement percentage of system resilience obtained by the TLSGA.

Figure 8. The optimal system resilience with different combinations of $m$ and $n_0$.

Table 1. Research gap analysis for system resilience.
Because $R(t)$ decreases and is bounded, $R(t)$ is integrable, and $\int_0^t R(t)\,dt$ is finite. Letting $t \to \infty$, we can obtain $\lim_{t \to \infty} f_{SR}(t) = 0$. Therefore, the lower bound of $f_{SR}(t)$ is zero, and the upper bound of $f_{SR}(t)$ is one.

Table 2. The symbols of the 12 systems with different parameters.