Survivability Analysis on a Cyber-Physical System

Abstract: A cyber-physical system (CPS) is composed of interdependent physical-resource and cyber-resource networks that are tightly coupled. The malfunction of nodes in one network may trigger failures in the other network and further cause cascading failures, which could potentially lead to the complete collapse of the entire system. The number and communication of operating nodes at steady state are closely related to the initial failure nodes and the topology of the network system. To address this issue, this paper studies the survivability of CPS in the presence of initial failure nodes, proposes (m, k)-survivability, which is defined as the probability that at least k nodes are still working in CPS after m nodes are attacked, and discusses the problem of cascading failure based on reliability (CFR). Further, we propose an algorithm to calculate (m, k)-survivability and find that the minimum survivability of a system with a regular allocation strategy decreases with k for a fixed m, and that the proportion of initial failure node groups that cause the system to completely fragment increases with m. The simulation shows the properties and the CFR results of a system with 12 nodes.


Introduction
A CPS consists of a physical network and a cyber network that are tightly coupled together, and it has been widely applied in many fields. The smart grid is a typical application of CPS: a power grid is coupled with a computer network to automate the grid and improve its efficiency. The most striking feature of CPS is interdependency. Although interdependent networks make the system more intelligent and complex, Vespignani [1] showed that such a system is more vulnerable than a single network when facing attacks, failures, and natural disasters. A random removal of a small fraction of nodes from one network can cause an iterative cascade of failures in the interdependent network and further lead to a disastrous impact on the overall cyber-physical system, such as the 2003 blackout in the Northeastern United States and Southeastern Canada [2], and the electrical blackout that caused many losses in Italy on 28 September 2003 [3].

The cascading effect is one of the important characteristics of CPS. In the case of initial random node failures, cascading failures sometimes do not trigger the complete collapse of the CPS, leaving several surviving nodes at steady state, such as micro grids. The network performance of the surviving nodes is also an important issue to be addressed. In engineering, survivability is the quantified ability of a system, subsystem, equipment, process, or procedure to continue to function during and after a natural or man-made disturbance [4]. For CPS, the steady state is itself a subsystem of the CPS and is obtained after a disturbance, i.e., random node failures. Survivability is therefore consistent with the issue we address. Further, we focus on the communication ability of surviving nodes at steady state. Reliability is an effective parameter for evaluating network performance and has been utilized in CPS [5]. In this article, we study the survivability of CPS based on reliability.

For convenience, some definitions are listed as follows. The definition is based on all-terminal reliability with node failures. Because of interdependence, cascading failures may occur after some nodes are attacked. All-terminal reliability with node failures can describe the connectivity of the two networks. Therefore, it can represent the ability of the system to meet the required level of service when a certain number of nodes fail.
Based on these definitions, we propose (m, k)-survivability, which is defined as the probability that at least k nodes are still working in CPS after m nodes are attacked, and discuss the problem of cascading failure based on reliability. The proposed (m, k)-survivability reflects the connectivity of nodes in the networks and gives the probability that the system meets the service requirements. Further, we propose an algorithm to calculate (m, k)-survivability and find that the minimum survivability of a system with a regular allocation strategy decreases with k for a fixed m, and that the proportion of initial failure node groups that cause the system to fragment increases with m.
The rest of this paper is organized as follows. Section 2 presents background and related work. In Section 3, we introduce the definition of (m, k)-survivability and analyze cascading failures. Section 4 presents simulations and experiments on the survivability of a CPS. The analysis and conclusion appear in Sections 5 and 6, respectively.

Background and Related Work
The fragility of interdependent networks and the design of robust networks have been active research topics for CPS. Buldyrev et al. [3] put forward a "one-to-one correspondence" model for interdependent networks and studied its robustness. They found that a random removal of a small fraction of nodes from one network can cause cascading failures and even complete fragmentation of the interdependent networks. In order to estimate the robustness of the system, they calculated the size of the functioning parts at each stage of the cascading failures. Moreover, they presented a critical threshold p_c and showed that if a fraction 1 − p of the nodes in one network fails and p < p_c, the two networks will completely fragment. This work [3] has received widespread attention and has aroused the interest of researchers in interdependent networks. Further research has proceeded in different directions [7][8][9]. For instance, Buldyrev et al. [7] also considered the "one-to-one correspondence" model, but unlike [3], mutually dependent nodes are assumed to have the same intraedges in their own networks. Shao et al. [9] considered multiple support-dependence relations between two coupled networks. They used a model in which some nodes in one network are autonomous, meaning that these nodes can operate without supporting nodes from the other network. Furthermore, Yagan et al. [10] described the dynamic characteristics of cascading failures in a system where the topology of each individual network is unknown and showed that the regular allocation strategy is more robust than all other possible strategies. The above literature considers different CPS models and studies the robustness of CPS.
In engineering, survivability concepts appear in many other networks and research areas. For example, Neumann et al. [11] proposed the first definition of survivability for network systems. In a communication system, survivability is seen as the probability that the service is still available when the system is damaged or fails. Deutsh et al. [12] defined survivability in the context of software engineering as the property that the basic services remain reachable even if some parts of the system do not work. Ellison et al. [13] described survivability as the ability of the system to complete its task in time in the face of attacks, failures, and contingencies. Moitra et al. [14] argued that survivability is the ability of the system to resist an attack and to provide certain services after being attacked. For wireless networks, Panirahi [15] defined survivability as the ability of the system to complete its tasks in time in the face of attacks, failures, and contingencies. Levitin et al. [16] argued that the survivability of information systems is the ability to maintain a working state when a fault event occurs. In the above articles, the authors give different descriptive definitions of survivability.

On the other hand, the study of survivability varies with the researchers' fields. Liang et al. [17] studied the survivability of time-varying networks and proposed a new survivability framework for them. Wan et al. [18] researched the node survivability of a sensor network and considered how to schedule each sensor between active and sleep modes to maximize the network lifetime while meeting survival requirements. Petridou et al. [19] put forward a quantitative analysis for evaluating the survivability of wireless sensor networks and defined four measures of network survivability (the frequency of failures, the data loss, the delay, and the compromised data) under three different types of failures, namely, link, node, and attack failures.
In the model of network reliability, K-terminal network reliability is the probability that G_k is connected, where G_k is a subgraph with k specified nodes [20]. Two nodes are connected to each other if there is a path between them in the network [21]. The following literature provides results for calculating network reliability. Moskowitz [22] investigated a network model in which nodes never fail but each edge fails independently with a given probability, and put forward a factoring algorithm to calculate the reliability of the network. On the basis of this result, Carlier et al. [23] applied a factoring algorithm and reductions to a network model in which both vertices and edges may fail. As the calculation of network reliability is an NP-hard problem, network reduction may reduce the computation time.

On the other hand, many studies have concentrated on approximating reliability. A discrete and dynamic model was defined to evaluate the reliability of telecommunication networks [24]; in this case, the set of terminals K of the network is specified arbitrarily. Three methods were presented, and they showed that failures have a negative impact on the service quality offered by the network. Ayoub et al. [25] developed an algorithm that uses Monte-Carlo simulation and breadth-first search to calculate the reliability of telecommunication networks. They obtained an accurate reliability estimate in feasible and practical time after a sufficient number of simulations. The above studies on network reliability do not restrict the length of the path. Petingi et al. [26] proposed a polynomial-time algorithm for detecting and deleting irrelevant edges that make no difference to the source-to-terminal diameter-constrained network reliability. They integrated this algorithm within an exact recursive factorization approach based upon Moskowitz's edge decomposition, tested it on different real-world topologies, and confirmed a substantial computational gain.

Survivability of Cyber-Physical Systems
A cyber-physical system is made up of two interacting networks, say, Networks A and B, both with N nodes, so the total number of nodes is n = 2N. We denote the node sets of A and B by {1, 2, ..., N} and {N + 1, N + 2, ..., n}, respectively. We call edges that connect the two networks interedges and edges within the same network intraedges. A node functions only if it has at least one interedge and at least one intraedge. We are concerned with whether a functioning giant component [10] exists or the system completely separates when cascading failures occur.

Definition 3. If m nodes are attacked, the probability that at least k nodes still operate is called (m, k)-survivability, denoted by S(m, k).
Discussion: Definition 3 does not indicate where the m nodes are located and thus implies both a best-case and a worst-case survivability. The parameter m is the number of failed nodes in the system, and the parameter k indexes the communication quality of the surviving nodes after the node failures. In fact, S(m, k) taken over all possible m and k is a matrix.
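The functioning rule of the model (a node needs at least one interedge and at least one intraedge) can be turned into a steady-state computation by pruning nodes repeatedly until nothing changes. The following is only a minimal sketch: the text does not spell out the cascade dynamics, so the iterative pruning, the adjacency-dictionary representation, the helper names `build_adj` and `steady_state`, and the six-node toy topology are all assumptions made for illustration.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def steady_state(adj, N, attacked):
    """Return the set of nodes still functioning after the cascade stops.

    Nodes 1..N belong to Network A and N+1..2N to Network B.
    Assumed rule: a node keeps functioning only while it has at least one
    functioning intra-network neighbour and one functioning inter-network
    neighbour; nodes violating the rule are pruned until a fixed point.
    """
    alive = set(adj) - set(attacked)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            live = [u for u in adj[v] if u in alive]
            has_intra = any((u <= N) == (v <= N) for u in live)
            has_inter = any((u <= N) != (v <= N) for u in live)
            if not (has_intra and has_inter):
                alive.discard(v)
                changed = True
    return alive


# Hypothetical toy CPS: two triangles (A = {1,2,3}, B = {4,5,6})
# coupled one-to-one by the interedges 1-4, 2-5, 3-6.
TOY_EDGES = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4),
             (1, 4), (2, 5), (3, 6)]
toy = build_adj(TOY_EDGES)
```

In this toy system, attacking node 1 also disables node 4 (it loses its only interedge), while attacking nodes 1 and 2 together makes the whole system collapse.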
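For any $1 \le i, j \le n$,

$$S(m, k) = \left(S_{ij}\right)_{n \times n},$$

where $S_{ij}$ denotes the probability that at least $j$ nodes are still working after $i$ nodes are attacked. Obviously, different locations of the node failures may lead to different steady states. Thus, $S_{ij}$ is a vector whose elements denote the survivability after $i$ nodes in different positions are invalidated. That is,

$$S_{ij} = \left(S_1(i, j),\, S_2(i, j),\, \ldots,\, S_{C_n^i}(i, j)\right).$$

The survivability $S_x(i, j)$ is stated mathematically as follows:

$$S_x(i, j) = \sum_{r=j}^{|V_x|} S_r\, p^r (1 - p)^{|V_x| - r},$$

where $1 \le x \le C_n^i$, $S_x(i, j)$ is the survivability of the system after the $x$th combination of $i$ nodes is invalidated, $|V_x|$ is the number of nodes in the steady state, $S_r$ is the number of connected node sets of size $r$ (Steps 4 and 5 of Algorithm 1), and $p$ is the node probability.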
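For a single attacked group, the survivability can be evaluated with the sum that Algorithm 1 accumulates in its update R = R + S_r · p^r · (1 − p)^(n1 − r). The sketch below is an illustration under assumptions: S_r is read as the number of size-r node sets that induce a connected subgraph of the steady state, and the function names `is_connected` and `survivability`, as well as the triangle and path test graphs, are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """True if the subgraph induced by `nodes` is non-empty and connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes


def survivability(alive, adj, k, p):
    """Sum S_r * p**r * (1 - p)**(n1 - r) for r = k .. n1.

    alive: nodes of the steady state, n1 = len(alive).
    S_r:   number of r-node subsets of `alive` inducing a connected subgraph
           (an assumed reading of "connected endpoint sets of r").
    """
    alive = sorted(alive)
    n1 = len(alive)
    total = 0.0
    for r in range(k, n1 + 1):
        s_r = sum(1 for c in combinations(alive, r) if is_connected(c, adj))
        total += s_r * p**r * (1 - p) ** (n1 - r)
    return total


# Hypothetical three-node steady states used only to exercise the formula.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
path = {1: {2}, 2: {1, 3}, 3: {2}}
```

The subset enumeration is exponential in the size of the steady state, which is acceptable for systems as small as the 12-node instance studied later but not for large networks.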
We propose Algorithm 1, which is based on the network reliability method, to calculate (m, k)-survivability in the presence of initial node failures.
Step 0. Let M = ∅.
Step 1. Choose all node groups of size m and store them in the collection M.
Step 2. For each group v ∈ M, obtain the adjacency matrix A of the steady state.
Step 4. Calculate the number of nodes n_1 of the steady state for each node group, find all connected node sets of size r, and store them in the collection S.
The complexity of computation: the first step, choosing all groups, is a combinatorial enumeration and is computationally expensive. That is, for a given m, we need to choose $C_n^m = \frac{n!}{(n-m)!\,m!}$ groups of nodes. For example, in Figure 1, i = 1 and j = 2.
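To give a feeling for the size of the enumeration in Step 1, the short sketch below lists the attack groups and counts them for a system with n = 12 nodes, the size used in the simulation section. The function name `attack_groups` and the 1-based node labels are assumptions for illustration.

```python
from itertools import combinations
from math import comb


def attack_groups(n, m):
    """Return every group of m attacked nodes out of nodes 1..n."""
    return list(combinations(range(1, n + 1), m))


# Number of groups that Step 1 has to visit for each m when n = 12.
group_counts = {m: comb(12, m) for m in range(1, 13)}
```

For n = 12 the largest single case is m = 6 with 924 groups, and all values of m together give 4095 groups, so exhaustive enumeration is feasible here; the count grows quickly as n increases.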
In order to explore the survivability of CPS, we analyze the nonzero elements in the matrix.
For any i and j, $S_{ij}$ is a vector of length $C_n^i$. We use $\bar{S}_{ij}$ to denote the average of $S_{ij}$. The corresponding matrix is $\bar{S}(m, k) = (\bar{S}_{ij})$. An average is a measure of the central tendency of a data set. Through the above matrix, we can get the average survival level of the system for different m- and k-values. If the average value is large, the system has strong survivability when m nodes are attacked. Otherwise, it has weak survivability.
For an interdependent network, whether the system can still provide a certain service after some nodes are attacked is a central concern. Thus, we classify the survivability of the system relative to a critical value R_0, which determines whether node attacks cause cascading failures. Different from [5], we require the maximum and minimum survivability and the number of node groups whose survivability is less than or more than R_0. The situations where k = 0 or R_0 = 1 are specifically discussed in [2], and the situation k = |V| indicates the all-terminal reliability of the system, where |V| denotes the number of nodes at steady state. The value of R_0 is related to m and k.
For example, in Figure 1, R_0 = 0.98 and min S(1, 2) > R_0, so the system is (1, 2)-RS; max S(1, 5) = 0.9743 < R_0 = 0.98, so the system is (1, 5)

The regular allocation strategy allocates the same number of bidirectional inter-links to each node of the system. We study systems with the regular allocation strategy and obtain the following results.

Proposition 1. For a fixed m, the survivability of CPS decreases monotonically with k for the same attacked locations.

Proof. For the same attacked locations, the event that at least k + 1 nodes are still working is contained in the event that at least k nodes are still working, so $S_x(m, k+1) \le S_x(m, k)$. Therefore, the survivability of CPS decreases monotonically with k at the same position.

Proposition 2. The survivability of a graph G = (V, E) is not less than that of its subgraphs if the same locations are attacked.

Proof. The survivability $S^*_x(m, k)$ of a subgraph G* is calculated in the same way as that of G, on the steady state of G*. According to probability theory, we obtain $S_x(m, k) \ge S^*_x(m, k)$. For example, Figure 2b is a subgraph of Figure 2a; when the same nodes are attacked, the steady state of (b) is a subgraph of the steady state of (a). Accordingly, the survivability of (a) is not less than that of (b).

From Propositions 1 and 2, the survivability of the system is closely related to m and k. Therefore, it is meaningful to evaluate CPS by checking m and k of the survivability.

Proposition 3. For a fixed m, the number of initial failed node groups that provide the system with minimal survivability increases monotonically as k increases.

Proof. According to Proposition 1, the survivability of CPS decreases monotonically with k at the same position. Let $n_k$ and $n_{k+1}$ denote the numbers of initial failure node groups that provide the system with minimal survivability for k and k + 1, respectively, and let $N_k$ and $N_{k+1}$ denote the corresponding sets of initial failure node groups. Then $N_k \subseteq N_{k+1}$, so $n_k \le n_{k+1}$.
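The quantities used above (the mean survivability, the minimum survivability, the number of node groups attaining the minimum, and the comparison with R_0) can all be read off the vector of per-group survivabilities. The sketch below is a hedged illustration: the function names `summarize` and `meets_threshold` are hypothetical, the input values in the usage are made up apart from the thresholds quoted in the text, and treating "min S > R_0" as the test for (m, k)-RS is an assumption drawn from the Figure 1 example.

```python
def summarize(values, tol=1e-12):
    """Return (mean, minimum, number of groups attaining the minimum).

    values: survivabilities S_x(m, k), one per attacked node group.
    tol:    tolerance used when comparing floating-point values.
    """
    lowest = min(values)
    mean = sum(values) / len(values)
    count = sum(1 for v in values if abs(v - lowest) <= tol)
    return mean, lowest, count


def meets_threshold(values, r0):
    """Assumed reading of (m, k)-RS: even the worst group exceeds R_0."""
    return min(values) > r0
```

For instance, `summarize([1.0, 0.5, 0.5, 0.75])` gives a mean of 0.6875, a minimum of 0.5, and two groups attaining that minimum; the count plays the role of n_k in Proposition 3.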
We also study the robustness of a system with the regular allocation strategy. Let $n_s$ denote the number of remaining nodes of the system at steady state. Then, for a fixed m, $n_s$ is a vector of length $C_n^m$. We use the average number of remaining nodes of $n_s$ (denoted by ANRN) to describe the robustness of the entire system macroscopically, and we have the following result on robustness.
Proposition 4. The proportion of node groups that cause the system to completely fragment increases with m.
Proof. Let $V_m$ denote the set of node groups that cause the system to completely separate for a fixed m, so that the proportion in question is $|V_m| / C_n^m$.
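Proposition 4 and the ANRN measure can be checked numerically on a small system by enumerating every attacked group. The sketch below is an illustration under assumptions, not the procedure used for the paper's tables: the iterative pruning rule, the reading of "completely fragment" as an empty steady state, the function names, and the six-node toy topology (two triangles coupled one-to-one) are all hypothetical.

```python
from itertools import combinations
from math import comb


def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def steady_state(adj, N, attacked):
    """Prune nodes lacking a live intra- or inter-neighbour until stable."""
    alive = set(adj) - set(attacked)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            live = [u for u in adj[v] if u in alive]
            has_intra = any((u <= N) == (v <= N) for u in live)
            has_inter = any((u <= N) != (v <= N) for u in live)
            if not (has_intra and has_inter):
                alive.discard(v)
                changed = True
    return alive


def fragmentation_stats(adj, N, m):
    """Return (proportion of fragmenting groups, ANRN) for attacks of size m."""
    nodes = sorted(adj)
    sizes = [len(steady_state(adj, N, g)) for g in combinations(nodes, m)]
    total = comb(len(nodes), m)
    proportion = sum(1 for s in sizes if s == 0) / total
    anrn = sum(sizes) / total
    return proportion, anrn


# Hypothetical toy CPS with N = 3 nodes per network.
TOY_EDGES = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4),
             (1, 4), (2, 5), (3, 6)]
toy = build_adj(TOY_EDGES)
stats = {m: fragmentation_stats(toy, 3, m) for m in range(1, 7)}
```

On this toy system the proportion of fragmenting groups rises from 0 at m = 1 to 0.8 at m = 2 and to 1 for m ≥ 3, while ANRN falls from 4 to 0.8 to 0, which matches the trend stated in Proposition 4.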

Simulation and Examples
In this part, we choose a CPS with a regular allocation strategy for simulation and experiment. Each network has six nodes, and each node has the same number of interedges, which are assumed to be bidirectional. Let the reliability of each node be p = 0.9. The network topology is shown in Figure 3.

According to the definition, we obtain the (m, k)-survivability matrix of this system. In order to analyze the survivability of the system, we need to calculate each element. First, according to the above results, if i + j > 12, then $S_{ij} = 0$, so only the elements with i + j ≤ 12 remain in the simplified matrix, and the corresponding average matrix simplifies in the same way. Next, we analyze the remaining elements of the matrix. The mean and minimum values of survivability for different m- and k-values are given, and we also obtain the number of initial failure node groups that provide the system with minimal survivability. The results are given in Table 1. In Table 2, we give a constant R_0 for different m- and k-values to estimate the survivability of the system. In Table 3, we give the number and proportion of node groups that cause the system to fragment, together with the average number of remaining nodes, for each m. When m > 9, the system always collapses. Therefore, we just analyze m < 9.

Discussion
The average survivability represents, macroscopically, the trend of the survivability of a system for different m- and k-values. From Table 1, the mean survivability of the system decreases with the parameters k and m. When both m and k are small, the system has strong survivability; i.e., when m nodes are attacked, there is a high probability that k nodes are still working, so the system is very likely to satisfy the specified level of service. Otherwise, the system has weak survivability; i.e., there is a low probability that k nodes are still working, and the system is unlikely to satisfy the specified level of service. Moreover, when m > 9, the mean survivability is always equal to zero, which shows that the system always collapses for m > 9. For other values, some node groups cause the system to collapse, and the system has the best survivability after it is attacked at other node groups.

The second parameter in Table 1 indicates that the minimum survivability decreases with k for a fixed m. That is, the minimum probability that k nodes are still functioning decreases with k when m nodes are attacked. We also find that the number of initial failure node groups that provide the system with minimal survivability increases with k for a fixed m. Let $p = n_k / C_n^m$; if an intelligent adversary chooses some nodes to attack, p gives the probability of producing the worst impact.

From Table 2, the survivability of the system changes as R_0, m, and k vary. R_0 is a critical threshold for determining the impact that random attacks cause. Moreover, the system experiences cascading failure when m > 9 and R_0 = 0. From Table 3, we can see that the average number of remaining nodes decreases with m, and that the number of node groups that cause the system to fragment increases with m. The other two sets of data illustrate this point. The more nodes fail, the less able the system is to reach the specified service.
For a clearer explanation of Propositions 1 and 3, see Figure 4, which shows how $S_1(1, k)$ and $S_8(1, k)$ change with k, and Figure 5, which shows how num changes with k for a fixed m.

Conclusions
We investigate the survivability problem of a cyber-physical system and define the survivability of the cyber-physical system, namely (m, k)-survivability. By analyzing (m, k)-survivability, we find that the survivability of a system with a regular allocation strategy is not less than that of its subsystem for given node failures and that, for a fixed m, the number of initial failed node groups that provide the system with minimal survivability increases monotonically as k increases. We also find that the proportion of node groups that cause the system to completely fragment increases with m. Therefore, the survivability of a system is closely related to m and k. Our results are beneficial for network design. For example, the evaluation of survivability can be used to predict whether a designed network can meet given requirements and a certain level of communication quality. As for future work, we will research the survivability of a system with a random allocation strategy and a unidirectional interedge allocation strategy. Furthermore, we will investigate the relationship between the threshold R_0 and the network.


Algorithm 1: (m, k)-survivability in G = (V, E)
Input: A connected graph G = (V, E) with node set V, edge set E, and node probability p; the required number of normally working nodes k; and the number of failed nodes m.
Output: (m, k)-survivability S(m, k) of G = (V, E).
Step 5. S_r = |S|.
Step 6. R = R + S_r · p^r · (1 − p)^(n_1 − r).
Step 7. r = r + 1, S = ∅, go to Step 4.
Machines 2017, 5, 17


Figure 2. (a) A CPS with 10 nodes; (b) its subgraph.

Figure 3. Instance with n = 12 nodes for survivability, where Networks A and B both have six nodes.

Figure 5. The value of num for different k-values.

Table 1. The mean and minimum values of survivability for different m- and k-values and the number of initial failure node groups that provide the system with minimal survivability. (The first parameter indicates the mean survivability, the second parameter indicates the minimum survivability, and the third parameter indicates the number of node groups.)

Table 3. The number and proportion of node groups that cause the system to completely fragment, and the ANRN, for the system with 12 nodes.