Defense Strategies for Asymmetric Networked Systems with Discrete Components

We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models.


Introduction
Infrastructures for cloud computing, science experiments and computations, and smart energy grids consist of complex, geographically dispersed systems that are connected over long-haul networks. In these infrastructures, the communications network plays a critical, asymmetric role of providing the vital connectivity between systems such as cloud computing sites, supercomputers or energy distribution centers. Network failures render the individual systems unreachable and in extreme cases can render the entire infrastructure unavailable. Such an infrastructure is represented by its constituent systems, $S_i$, $i = 1, 2, \ldots, N$, and the network connecting them is represented as a separate system $S_{N+1}$. The individual systems themselves are complex, consisting of several discrete cyber and physical components, which must be operational and connected to the network. The individual components of $S_i$ may be disabled or disconnected, and $S_i$ as a system may be disconnected, by cyber and physical attacks on the components. We formulate the problem of ensuring the infrastructure performance as a game between an attacker that launches cyber or physical attacks on the components and a provider that reinforces them to withstand the attacks. Since both attacks and reinforcements incur costs, the two players weigh the costs against the expected benefits by minimizing utility functions. We derive Nash Equilibrium (NE) conditions that provide expressions for individual system survival probabilities and also the expected capacity specified by the total number of operational components. This paper extends and presents a unified view of the partial results presented in earlier conference papers on sum- and product-form utilities [1], composite utilities [2,3] and multi-site cloud infrastructures [4].
The components can be reinforced to survive direct attacks, but they can still be unavailable due to attacks on other components. For example, a supercomputer at a site may be hardened against cyber attacks, but can still be made unavailable by cutting fiber connections to the site. On the other hand, we consider that non-reinforced cyber and physical components will be disabled by direct cyber and physical attacks, respectively. Furthermore, in addition to propagating within system $S_i$, the effects of attacks on its components may propagate to components of other systems $S_j$, $j \neq i$. Thus, both the correlations between components within individual systems and those between systems represent the propagation of disruptions across the infrastructure. The infrastructure provider is tasked with developing strategies to choose a number of components of each system to reinforce against the attacks by taking into account both types of correlations.
Let $n_i$ denote the number of components of $S_i$, $i = 1, 2, \ldots, N + 1$, of which $y_i$ and $x_i$ denote the numbers of components attacked and reinforced, respectively. Let $P_i$ be the survival probability of $S_i$ and $P_I$ be the survival probability of the entire infrastructure. Furthermore, let $S_{-i}$ denote the infrastructure without $S_i$ and $P_{-i}$ be its survival probability. The relative importance of $S_i$ is captured by the aggregate failure correlation function $C_i$ given by the failure probability of $S_{-i}$ given the failure of $S_i$. Under the asymmetric network conditions, the specific role of the network is specified by two conditions: (a) $C_{N+1} = 1$ indicates that network failure will disrupt the entire infrastructure; and (b) $C_i = 0$, for $i = 1, 2, \ldots, N$, indicates that disruptions of individual systems are uncorrelated. The correlations between components of individual systems are captured by simple first-order differential conditions on $P_i$ using the system multiplier functions that capture correlations within systems and also abstract the effects of system-level parameters [5]. This two-level characterization helps to conceptualize the basic correlations in infrastructures, such as cloud computing and smart grid infrastructures, and provides insights into the needed defense strategies by naturally "separating" the system-level and component-level aspects.
A game between an attacker and a provider involves balancing the costs of attacks and reinforcements of systems, given by $L_A(y_1, \ldots, y_{N+1})$ and $L_D(x_1, \ldots, x_{N+1})$, respectively, with the survival probability of the infrastructure. We consider that the provider minimizes the composite utility function given by:
$$U_D = F_{D,G}\, G_D + F_{D,L}\, L_D,$$
where each term is a function of $(x_1, \ldots, x_{N+1}, y_1, \ldots, y_{N+1})$, the first product term corresponds to the reward and the second product term corresponds to the cost. Within the product terms, $F_{D,G}$ and $F_{D,L}$ are the reward and cost multipliers, respectively, of the provider, and $G_D$ and $L_D$ represent the reward and cost terms, respectively, of keeping the infrastructure operational. Similarly, we consider that the attacker minimizes:
$$U_A = F_{A,G}\, G_A + F_{A,L}\, L_A,$$
where $F_{A,G}$ and $F_{A,L}$ are the reward and cost multipliers, respectively, of the attacker, and $G_A$ and $L_A$ represent the reward and cost terms of disrupting the infrastructure operation, respectively. The expected capacity of the infrastructure, $N_I$, is the expected number of available components, which reflects the part of the infrastructure that survives the attacks. In the example of the cloud infrastructure, it represents the number of operational servers that are available to users on average.
The composite utility function can be specialized to obtain sum-form and product-form utilities by using appropriate terms, as summarized in Table 1, and their choice represents different values in keeping the infrastructure operational:
(a) The sum-form utility function, which will be minimized by the provider, includes an explicit gain term: the scalar $g_D \geq 0$ represents the benefit of keeping the infrastructure operational, such as income from an operational cloud computing infrastructure. Thus, the sum-form represents a weak coupling between gain and cost terms, since the effect of their minimization on the utility function is independent. For a provider, this form is used when an explicit "gain" in keeping the infrastructure up can be identified and balanced against the cost.
(b) The product-form utility function is given by $(1 - P_I)\, L_D$, which will be minimized by the provider; it represents the "wasted" cost to the provider since it is the expected cost under the condition that the infrastructure fails. Thus, the product-form represents a strong coupling between probability and cost terms, since the effect of minimization of one term gets multiplied by the other. This utility is used when the main goal of the provider is to keep the infrastructure up with the cost incurred, since there is no explicit "gain" term.
Table 1. Gain and cost terms and their multipliers for sum-form and product-form utilities of the provider.
The Nash Equilibrium (NE) conditions based on the utility functions can be used to estimate the $x_i$'s for the provider using various methods [6,7]. Our objective in this paper is to show that critical insights can be gained by deriving estimates of system survival probabilities and expected capacity explicitly in terms of various correlations, without relying on explicit solutions for the $x_i$'s. The differences in the goals of sum- and product-form utilities lead to qualitatively different defense strategies, which are derived separately in earlier works, and to corresponding expressions for the survival probabilities that are structurally different [5,8]. We show that under the asymmetric network conditions, NE conditions of this game lead to expressions for the $P_i$'s and $N_I$ with the same structure. In particular, the estimates of $P_i$ for sum-form and product-form utilities have the same expression in Theorem 3 except for one term, given by $\xi$. To consider the case where the sum-form and the product-form utilities are equivalent, we equate the two terms and obtain the following "equivalent" gain term of the sum-form: $g_D = L_D/(1 - P_I)$ for $0 < P_I < 1$, which is an increasing function in both $P_I$ and $L_D$; or, equivalently, we have $P_I = 1 - L_D/g_D$. This similarity is striking since the sum-form and product-form utilities represent two quite different objectives. The composite utility functions lead to simple expressions for $P_i$, $i = 1, 2, \ldots, N$, and $N_I$ at NE, which subsume the above cases. In general, the dependence of $P_i$ on cost terms and aggregate correlation functions, as well as their partial derivatives, is presented in a compact form by using the composite gain-cost and composite multiplier terms (defined in Section 4). The expected capacity at NE is expressed in terms of the cost term $L_D$ and its derivative, the aggregate correlation functions $C_i$, $i = 1, 2, \ldots, N + 1$, and the system multiplier functions $\Lambda_i$, $i = 1, 2, \ldots, N + 1$ (defined in Section 3.2).
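The equivalence between the two utilities can be explored numerically. The following is a minimal sketch, assuming the equivalent gain term has the form $g_D = L_D/(1 - P_I)$ (the form that the later comparison of the sum-form and product-form terms refers to); the function names are hypothetical and chosen only for illustration.

```python
def equivalent_gain(cost_L_D: float, survival_P_I: float) -> float:
    """Sum-form gain g_D that matches the product-form term,
    assuming g_D = L_D / (1 - P_I) with 0 < P_I < 1."""
    if not 0.0 < survival_P_I < 1.0:
        raise ValueError("P_I must lie strictly between 0 and 1")
    return cost_L_D / (1.0 - survival_P_I)


def survival_from_gain(cost_L_D: float, gain_g_D: float) -> float:
    """Inverse relation P_I = 1 - L_D / g_D (same assumption as above)."""
    if gain_g_D <= 0.0:
        raise ValueError("g_D must be positive")
    return 1.0 - cost_L_D / gain_g_D


# The equivalent gain grows with both P_I and L_D.
g_low = equivalent_gain(10.0, 0.5)
g_high_survival = equivalent_gain(10.0, 0.8)
g_high_cost = equivalent_gain(15.0, 0.5)
```

Under this assumption, a provider with a higher infrastructure survival probability or a higher reinforcement cost corresponds to a sum-form provider with a larger gain term.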
The expression provides critical information on the dependence of the expected capacity on system parameters, in particular $C_i$ and $\Lambda_i$, and utility functions. Furthermore, by decomposing the system models into sub-models, such as cyber and physical sub-models, finer relationships can be inferred between system parameters, such as refined versions of $C_i$ and $\Lambda_i$, and the expected capacity. We apply these results to a simplified model of cloud computing infrastructure with multiple server sites connected over a communications network.
The organization of this paper is as follows. We describe related work in Section 2. In Section 3, we describe the infrastructure model along with the aggregate correlation function and differential conditions on system survival probabilities. We present our game-theoretic formulation using sum-form, product-form and composite utility functions in Section 4 and derive NE conditions and estimates for the system survival probabilities and expected capacity. We apply the analytical results to a model of cloud computing infrastructure in Section 5. We present conclusions in Section 6.

Related Work
Critical infrastructures of power grids, cloud computing and transportation systems provide vital public and private services [9,10]. They depend on complex communications networks that connect the constituent systems, which by themselves consist of many disparate cyber and physical components [10]. The communications network plays a very critical role in these infrastructures [11], in some ways more so than the constituent systems, and its failure can significantly degrade the entire infrastructure [12,13]. These infrastructures are under increasing cyber and physical attacks, which the providers are required to counter by applying defense measures and strategies.
By capturing the interactions between providers and attackers of these infrastructures, game-theoretic methods have been extensively applied to develop the needed defense strategies [14][15][16], which attempt to ensure continued infrastructure operations in the presence of evolving threats [17]. Partial differential equations and discrete component models have been used in several of these infrastructures to model the physical and cyber systems [18] in formulating the underlying games. The game-theoretic formulations and the solutions developed for such infrastructures are quite varied and extensive. They include: multiple-period games that address multiple time-scales of system dynamics [19]; incomplete information games that account for partial knowledge about the system dynamics and attack models [20]; and multiple-target games that account for possibly competing objectives [21].
A comprehensive review of the defense and attack models in various game-theoretic formulations has been presented in [22]. Recent interest in cyber and cyber-physical systems led to the application of game theory to a variety of cyber security scenarios [16,23] and, in particular, for securing cyber-physical networks [24] with applications to power grids [11][25][26][27].
The system availability, reliability and robustness aspects can be explicitly integrated into the game formulations [14] for infrastructures such as power grids, cloud computing and transportation systems. In particular, discrete models of cyber-physical infrastructures have been studied in various forms under Stackelberg game formulations [28]. A subclass of these models, using the numbers of cyber and physical components that are attacked and reinforced as the main variables, has been studied in [29]. These models characterize infrastructures with a large number of components and are coarser compared to the models that consider the attacks and reinforcements of individual cyber and physical components. Various forms of correlation functions [5,8,29] are used in these works to capture the dependencies between the survival probabilities of constituent systems, such as the cyber and physical sub-infrastructures.
Complex interacting collections of systems have been studied using game-theoretic formulations in [30], and their two-level correlations have been studied using the sum-form utility functions in [5] and the product-form utility functions in [8]. The sum-form utility represents a gain-centric priority, wherein an independent gain term weighted by $P_I$ represents the gain to be maximized by the provider. The product-form utility, on the other hand, represents a cost-centric priority, wherein the expected cost is to be minimized. The sum-form utility function [5] and the product-form utility function [8] are considered separately for a generic version of this game, wherein all systems play a similar role, unlike the asymmetric role of the network considered here. In terms of analysis, these two formulations have a certain degree of commonality, but there are also differences; in particular, estimates of $P_I$ can be obtained somewhat directly for the product-form as shown in [8]. These two utility functions also lead to qualitatively different defense strategies, and in particular, $P_I$ appears explicitly in the sensitivity estimates of system survival probabilities in product-form, but not in sum-form. These two utility functions are unified in [2], and the sum-form utility function has been studied under the asymmetric role of the communications network in [1].
The infrastructures for smart energy grids, cloud computing and intelligent transportation systems are composed of complex constituent systems that rely on communications networks. For wide-area operations, these networks play a critical asymmetric role of providing the vital connectivity needed for continued infrastructure operations. The asymmetric network correlations have been incorporated into multiple system infrastructures for sum-form and product-form utilities in [1], and these two works are unified in [3] by using the composite utility functions. The multi-site cloud computing infrastructure was discussed as an example for sum-form and product-form utility functions in [1] and composite utility functions in [3], wherein the network plays a critical asymmetric role. This model is further extended by including an HVAC system in [4], and also, additional details of NE conditions and capacity estimates are provided. In this paper, we consolidate these results and present a unified treatment of the sum-form, product-form and composite utilities under asymmetric network correlation conditions. For multi-site cloud infrastructures, we explicitly relate these utility functions and interpret the abstract definitions of correlation functions and system multiplier functions in terms of systems and their components.

Discrete System Models
We consider infrastructures with constituent systems consisting of discrete components [5,8] and connected over a communications network [1]. We first consider the correlations at the systems and network levels and then consider the correlations between the components of individual systems.

System-Level Correlations
The correlations between systems, including the network, in these infrastructures are characterized in terms of their survival probabilities as follows.
Condition 1. Aggregate correlation function: Let $C_i$ denote the failure probability of the rest of the infrastructure $S_{-i}$ given the failure of $S_i$, and let $C_{-i}$ denote the failure probability of $S_i$ given the failure of $S_{-i}$ such that:
$$C_i\,(1 - P_i) = C_{-i}\,(1 - P_{-i}).$$
Then, the survival probability of the infrastructure is given by:
$$P_I = P_i + P_{-i} - 1 + C_i\,(1 - P_i).$$
The aggregate failure correlation function captures the dependence of the rest of the infrastructure $S_{-i}$ on the failure of $S_i$, which can be illustrated using the following special cases.
(a) Asymmetric network: In a cloud computing infrastructure, consider that the fiber connections to $N$ sites, each with $l$ servers, constitute the network system $S_F = S_{N+1}$. The corresponding expression involves a normalization constant $K$, since the fiber failure rate is amplified by $l$ in rendering the servers unavailable.
(b) Statistical independence: The system failures satisfy a statistical independence condition given by $C_i = 1 - P_{-i}$, indicating that the failure probability of $S_{-i}$ is not dependent on $P_i$. This condition in turn leads to $P_I = P_i P_{-i}$, which indicates the statistical independence of the survival processes of $S_i$ and $S_{-i}$. More generally, if $C_i > 1 - P_{-i}$, the failures in $S_{-i}$ are positively correlated with failures in $S_i$; that is, they occur with a higher probability following the latter. If we denote the failure probability of $S_i$ by $\bar{P}_i$, then we have $\bar{P}_{-i|\bar{i}} > \bar{P}_{-i}$, or equivalently, failure in $S_i$ leads to a higher probability of failure in $S_{-i}$. If $C_i < 1 - P_{-i}$, failures in $S_{-i}$ are negatively correlated with the latter failures; that is, $\bar{P}_{-i|\bar{i}} < \bar{P}_{-i}$.
(c) Definite failure: In another case, when the failure of $S_i$ leads to a definite failure of the rest of the infrastructure, we have $C_i(P_i) = 1$ such that $P_I = P_{-i}$. This condition indicates that the infrastructure survival probability solely depends on the marginal failure probability of $S_{-i}$.
(d) OR systems: The OR systems as modeled in [29] correspond to the special case $N = 2$ where the infrastructure consists of uncorrelated cyber and physical systems (denoted by $i = C$ and $-i = P$, respectively) that can be independently analyzed. For OR systems, the failure probabilities of $S_i$ and $S_{-i}$ are uncorrelated such that $C_i = C_{-i} = 0$, and hence, we have $P_I = P_i + P_{-i} - 1$. We apply this condition next in Condition 2 for the $N$ systems considered in this paper.
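The special cases above can be checked with a short numerical sketch. It assumes that Condition 1 takes the form $P_I = P_i + P_{-i} - 1 + C_i(1 - P_i)$, which reproduces each of the stated special cases; the function name and the sample probabilities are hypothetical.

```python
import math


def infrastructure_survival(p_i: float, p_rest: float, c_i: float) -> float:
    """Survival probability P_I of the infrastructure, assuming
    P_I = P_i + P_{-i} - 1 + C_i * (1 - P_i).

    p_i    : survival probability of system S_i
    p_rest : survival probability of the rest of the infrastructure S_{-i}
    c_i    : failure probability of S_{-i} given the failure of S_i
    """
    return p_i + p_rest - 1.0 + c_i * (1.0 - p_i)


p_i, p_rest = 0.9, 0.8  # illustrative values only

# Statistical independence: C_i = 1 - P_{-i} gives P_I = P_i * P_{-i}.
independent = infrastructure_survival(p_i, p_rest, 1.0 - p_rest)

# Definite failure: C_i = 1 gives P_I = P_{-i}.
definite = infrastructure_survival(p_i, p_rest, 1.0)

# OR systems: C_i = 0 gives P_I = P_i + P_{-i} - 1.
or_system = infrastructure_survival(p_i, p_rest, 0.0)
```

Values of $C_i$ between 0 and $1 - P_{-i}$ interpolate between the OR-system and independent cases, and values above $1 - P_{-i}$ correspond to positively correlated failures.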
The important asymmetric role of the communications network is characterized using the following condition.
Condition 2. Asymmetric network correlations: (i) $C_{N+1} = 1$; and (ii) $C_i = 0$ for $i = 1, 2, \ldots, N$.
Part (i) of Condition 2 leads to $P_I = P_{-(N+1)}$, which indicates the role of the rest of the infrastructure $S_{-(N+1)}$ without the network; namely, its survival probability is the same as that for the server sites together. Part (ii) of Condition 2 leads to $P_I = P_i + P_{-i} - 1$, $i = 1, 2, \ldots, N$, which linearly depends on each of the failure probabilities of the constituent system $S_i$ and the rest of the infrastructure $S_{-i}$. It is important to note that although the direct correlations between the site failures are zero (Part (ii) above), these site failures are still indirectly related through the network. In particular, the failures of $S_i$ and $S_j$, which are parts of $S_{-(N+1)}$, are correlated with the network via $C_{N+1}$; for example, they both become simultaneously unavailable when the wide-area network fails.
The effects of reinforcements and attacks on host sites and wide-area networks can be separated using the following two conditions: (i) the first condition, $\partial P_{-i}/\partial x_i = 0$, indicates that reinforcing the server site $S_i$ does not directly impact the survival probability of other sites or networks; and (ii) the second condition, $\partial P_i/\partial x_j = 0$ for $i = 1, 2, \ldots, N + 1$, $j = 1, 2, \ldots, N$ and $j \neq i$, indicates that reinforcing server sites or network $S_j$ does not directly impact the survival probability of server sites or network $S_i$.
While the reinforcements to individual server sites or networks are not directly reflected in other systems, their failures may still be correlated due to the underlying system structures as reflected in the aggregated correlation function of the network $C_{N+1}$. These system-level considerations for the provider are captured by the following condition, which is obtained by differentiating $P_I$ in Condition 1 with respect to $x_i$ and ignoring the terms corresponding to Parts (i) and (ii) above.
Condition 3. The survival probability of the infrastructure satisfies:
$$\frac{\partial P_I}{\partial x_i} = (1 - C_i)\,\frac{\partial P_i}{\partial x_i} + (1 - P_i)\,\frac{\partial C_i}{\partial x_i}.$$
The condition captures the effect on the increment in $P_I$ as a result of the change in the number of reinforced components $x_i$ of $S_i$. It is the sum of (i) the increment in the individual system survival probability $P_i$ weighted by the "non-correlation" term $(1 - C_i)$ and (ii) the increment in correlation $C_i$ weighted by the failure probability $1 - P_i$ of $S_i$. For the sites $S_i$, $i = 1, 2, \ldots, N$, we have $C_i = 0$, so the weight of the first term is one. For the network $S_{N+1}$, we have $C_{N+1} = 1$, so the first term vanishes. Under Condition 2, $C_i$ is constant, but its partial derivatives with respect to $x_i$ could be non-zero, as other parameters change to keep it constant.
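The differential condition described above can be verified numerically. The sketch below assumes that $P_I = P_i + P_{-i} - 1 + C_i(1 - P_i)$ and that $P_{-i}$ does not depend on $x_i$; the particular forms chosen for $P_i(x_i)$ and $C_i(x_i)$ are hypothetical and serve only to exercise the identity.

```python
import math

P_REST = 0.9  # P_{-i}, held constant: reinforcing S_i does not affect S_{-i}


def p_i(x: float) -> float:
    """Hypothetical survival probability of S_i as a function of x_i."""
    return 1.0 - math.exp(-0.3 * x)


def c_i(x: float) -> float:
    """Hypothetical aggregate failure correlation as a function of x_i."""
    return 0.5 * math.exp(-0.1 * x)


def p_infra(x: float) -> float:
    """Assumed Condition 1 form of the infrastructure survival probability."""
    return p_i(x) + P_REST - 1.0 + c_i(x) * (1.0 - p_i(x))


def derivative(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def weighted_sum(x: float) -> float:
    """(1 - C_i) dP_i/dx_i + (1 - P_i) dC_i/dx_i, as described in the text."""
    return ((1.0 - c_i(x)) * derivative(p_i, x)
            + (1.0 - p_i(x)) * derivative(c_i, x))


lhs = derivative(p_infra, 2.0)  # direct derivative of P_I at x_i = 2
rhs = weighted_sum(2.0)         # weighted sum of the two increments
```

The two quantities agree up to finite-difference error, which illustrates how the increment in $P_I$ splits into a survival part and a correlation part.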

Component-Level Correlations
The system survival probabilities satisfy the following differential condition that specifies the correlations at the component level [5,31].

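Condition 4. System multiplier functions: The survival probabilities $P_i$ and $P_{-i}$ of system $S_i$ and $S_{-i}$, respectively, satisfy the following conditions: there exist system multiplier functions $\Lambda_i$ and $\Lambda_{-i}$ such that:
$$\frac{\partial P_i}{\partial x_i} = \Lambda_i(x_1, \ldots, x_{N+1}, y_1, \ldots, y_{N+1})\,P_i, \qquad \frac{\partial P_{-i}}{\partial x_i} = \Lambda_{-i}(x_1, \ldots, x_{N+1}, y_1, \ldots, y_{N+1})\,P_{-i}$$
for $i = 1, 2, \ldots, N + 1$.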
The derivative in the above condition is linear in $P_i$ for $\Lambda_i = 1$, faster than linear if $\Lambda_i > 1$ and slower than linear if $\Lambda_i < 1$. These system multiplier functions capture the underlying system structure including its parameters, in addition to the game variables $x_i$ and $y_i$. For example, in the case of the multi-site server infrastructure, $\Lambda_i$ in Section 5.2 depends on the number of servers $l_i$ at site $i$. This somewhat abstract condition enables us to capture such a structure in a generic manner and indeed is satisfied in two special cases studied extensively in the literature.
(a) Statistically independent components: The special case when component survival probabilities are statistically independent with and without reinforcements has been studied in [31]. Let $p_{i|R}$ and $p_{i|W}$ denote the conditional survival probability of a component of $S_i$ with and without reinforcement, respectively. Under the statistical independence condition of component failures, the probability that $S_i$ with $n_i$ components survives the attacks is $P_i = p_{i|R}^{x_i}\, p_{i|W}^{n_i - x_i}$, as in [31], or equivalently $\ln P_i = x_i \ln p_{i|R} + (n_i - x_i) \ln p_{i|W}$. By differentiating the equation with respect to $x_i$, we obtain:
$$\frac{\partial P_i}{\partial x_i} = \ln\!\left(\frac{p_{i|R}}{p_{i|W}}\right) P_i.$$
The condition for the faster than linear derivative is $\ln\left(p_{i|R}/p_{i|W}\right) > 1$ or equivalently $p_{i|R} > e\,p_{i|W}$, where $e$ is the base of the natural logarithm. The condition that the survival probability of a reinforced component is higher than that of a non-reinforced component, but less than $e\,p_{i|W}$, namely, $e\,p_{i|W} > p_{i|R} > p_{i|W}$, corresponds to only the slower than linear derivative.
(b) Contest survival functions: The contest survival functions are used to express $P_i$ in [30] such that $P_i = \frac{\xi + x_i}{\xi + x_i + y_i}$ for a suitably selected slack variable $\xi$, which in turn leads to:
$$\frac{\partial P_i}{\partial x_i} = \frac{y_i}{(\xi + x_i)(\xi + x_i + y_i)}\,P_i.$$
The condition for the slower than linear derivative is $y_i - (\xi + x_i)(\xi + x_i + y_i) < 0$, which is satisfied for larger values of $x_i$ sufficient to make the left-hand side negative.
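The two special cases can be made concrete with a small sketch of their system multiplier functions. It assumes the independent-component form $P_i = p_{i|R}^{x_i} p_{i|W}^{n_i - x_i}$, which yields the stated threshold $p_{i|R} > e\,p_{i|W}$, and the contest form $P_i = (\xi + x_i)/(\xi + x_i + y_i)$; the function names and numerical values are hypothetical.

```python
import math


def multiplier_independent(p_reinforced: float, p_unreinforced: float) -> float:
    """Lambda_i = ln(p_{i|R} / p_{i|W}) for statistically independent components."""
    return math.log(p_reinforced / p_unreinforced)


def contest_survival(x: float, y: float, slack: float) -> float:
    """Contest survival function P_i = (xi + x) / (xi + x + y)."""
    return (slack + x) / (slack + x + y)


def multiplier_contest(x: float, y: float, slack: float) -> float:
    """Lambda_i = y / ((xi + x)(xi + x + y)), so that dP_i/dx_i = Lambda_i * P_i."""
    return y / ((slack + x) * (slack + x + y))


# Faster than linear: p_{i|R} exceeds e * p_{i|W}.
lam_fast = multiplier_independent(0.9, 0.3)
# Slower than linear: p_{i|W} < p_{i|R} < e * p_{i|W}.
lam_slow = multiplier_independent(0.9, 0.6)

# Contest case: compare Lambda_i * P_i with a finite-difference derivative.
x, y, slack, h = 2.0, 3.0, 1.0, 1e-6
numeric = (contest_survival(x + h, y, slack)
           - contest_survival(x - h, y, slack)) / (2.0 * h)
analytic = multiplier_contest(x, y, slack) * contest_survival(x, y, slack)
```

In the contest case, the multiplier decreases as $x_i$ grows, which is why sufficiently large reinforcements place the system in the slower than linear regime.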

Game Theoretic Formulation
The provider's objective is to make the infrastructure resilient by reinforcing $x_i$ components of $S_i$ to optimize the utility function. Similarly, the attacker's objective is to disrupt the infrastructure by attacking $y_i$ components of $S_i$ to optimize the corresponding utility function. NE conditions are derived by equating the corresponding derivatives of the utility functions to zero, which yields:
$$\frac{\partial U_D}{\partial x_i} = \frac{\partial F_{D,G}}{\partial P_I}\frac{\partial P_I}{\partial x_i}\,G_D + F_{D,G}\,\frac{\partial G_D}{\partial x_i} + \frac{\partial F_{D,L}}{\partial P_I}\frac{\partial P_I}{\partial x_i}\,L_D + F_{D,L}\,\frac{\partial L_D}{\partial x_i} = 0$$
for $i = 1, 2, \ldots, N + 1$ for the provider. We define:
$$L^{D}_{G,L} = G_D\,\frac{\partial F_{D,G}}{\partial P_I} + L_D\,\frac{\partial F_{D,L}}{\partial P_I}$$
as the composite gain-cost term, wherein the gain $G_D$ and cost $L_D$ are "amplified" by the derivatives of their corresponding multiplier functions with respect to $P_I$. We then define:
$$F^{D,i}_{G,L} = F_{D,G}\,\frac{\partial G_D}{\partial x_i} + F_{D,L}\,\frac{\partial L_D}{\partial x_i}$$
as the composite multiplier term, wherein the gain multiplier $F_{D,G}$ and cost multiplier $F_{D,L}$ are "amplified" by the derivatives of their corresponding gain and cost terms with respect to $x_i$, $i = 1, 2, \ldots, N + 1$, respectively. These two terms lead to the compact NE condition $\frac{\partial P_I}{\partial x_i} = -F^{D,i}_{G,L} / L^{D}_{G,L}$. These NE conditions can be used to solve for the $x_i$'s using available methods whose complexity depends on the details of gain and cost terms [14][15][16]. Indeed, different methods and trade-offs may be required to derive such solutions by exploiting the details of the infrastructure [7]. We show in the next section that estimates for system survival probabilities and expected capacity can be obtained without explicitly solving for the $x_i$'s, and yet, they provide valuable qualitative insights about the infrastructure. Various terms of the composite utility function specialized to sum-form and product-form utilities are shown in Table 2, which are considered separately in Section 4.3.
Table 2. Gain and cost terms, their multipliers and other terms for sum-form and product-form utilities of the provider.
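The composite terms can be illustrated with a short sketch. It assumes that the multipliers depend on $x_i$ only through $P_I$, so that the chain rule gives $\partial P_I/\partial x_i = -F^{D,i}_{G,L}/L^{D}_{G,L}$. The sum-form and product-form specializations used below (multipliers $1 - P_I$ and $1$ with a constant gain $g_D$ for the sum-form, and a cost multiplier $1 - P_I$ with no gain for the product-form) are assumptions chosen to be consistent with a composite gain-cost of $-g_D$ for the sum-form; they are not taken from Table 2, and all names and values are hypothetical.

```python
def composite_gain_cost(gain, cost, d_gain_mult_dP, d_cost_mult_dP):
    """L^D_{G,L}: gain and cost amplified by the derivatives of their
    multipliers with respect to P_I."""
    return gain * d_gain_mult_dP + cost * d_cost_mult_dP


def composite_multiplier(gain_mult, cost_mult, d_gain_dx, d_cost_dx):
    """F^{D,i}_{G,L}: multipliers amplified by the derivatives of the gain
    and cost terms with respect to x_i."""
    return gain_mult * d_gain_dx + cost_mult * d_cost_dx


def ne_gradient(gain_cost, multiplier):
    """Value of dP_I/dx_i implied by the compact NE condition."""
    if gain_cost == 0.0:
        raise ValueError("composite gain-cost term must be non-zero")
    return -multiplier / gain_cost


P_I, L_D, dL_dx, g_D = 0.5, 10.0, 2.0, 20.0  # illustrative values only

# Assumed sum-form: F_{D,G} = 1 - P_I, G_D = g_D (constant), F_{D,L} = 1.
sum_gain_cost = composite_gain_cost(g_D, L_D, -1.0, 0.0)
sum_gradient = ne_gradient(
    sum_gain_cost, composite_multiplier(1.0 - P_I, 1.0, 0.0, dL_dx))

# Assumed product-form: no gain term, F_{D,L} = 1 - P_I.
product_gain_cost = composite_gain_cost(0.0, L_D, 0.0, -1.0)
product_gradient = ne_gradient(
    product_gain_cost, composite_multiplier(0.0, 1.0 - P_I, 0.0, dL_dx))
```

With these values, $g_D$ equals $L_D/(1 - P_I)$, and the two forms yield the same gradient; any other choice of $g_D$ would separate them.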

OR Systems
The OR systems [31] constitute a sub-class of abstract infrastructures where simultaneous failures of two or more systems are extremely unlikely; namely, their probability is zero. These abstract models are used to illustrate the simplifications that result from ignoring the correlations and are generally used for analysis purposes. Here, OR systems ignore the asymmetric role played by the communications network. These systems are simpler to analyze due to the absence of system-level correlation terms, and they serve as base study cases when the correlations can be ignored. Indeed, an estimate of $P_i$ can be derived as a simple ratio of the gain-cost gradient and the system multiplier function $\Lambda_i$. Using $P_I = P_i + P_{-i} - 1$, we obtain $\frac{\partial P_i}{\partial x_i} = \Theta_i(\cdot)$ at NE, where $\Theta_i(\cdot)$ is called the scaled gain-cost gradient of system $S_i$. Then, Condition 4 provides us an estimate for the survival probability of $S_i$ as the ratio of the scaled gain-cost gradient and the system multiplier function given by:
$$\hat{P}_{i;D} = \frac{\Theta_i}{\Lambda_i}$$
for $i = 1, 2, \ldots, N$. These estimates for individual systems depend mainly on the corresponding scaled gain-cost gradients and thus represent a "separation" of the individual systems at this level.
In this sense, OR systems constitute an important analytical case wherein the correlations between the individual systems may be ignored. In addition, these estimates provide the sensitivity information of the survival probabilities of the individual systems with respect to various quantities of $S_i$. In particular, the survival probability estimate $\hat{P}_{i;D}$ is proportional to the corresponding weighted cost and reward functions and inversely proportional to their weighted derivatives. This seemingly counter-intuitive trend applies only to the set of Nash equilibria and not to the overall system behavior. In the rest of the paper, we denote $\Lambda_i(x_1, \ldots, x_N, y_1, \ldots, y_N)$ and $\Theta_i(x_1, \ldots, x_N, y_1, \ldots, y_N)$ by $\Lambda_i$ and $\Theta_i$, respectively, to simplify the notation.

System Survival Probabilities and Expected Capacity
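We now derive estimates for $P_i$ at NE using aggregated correlation functions and their partial derivatives to infer qualitative information about their sensitivities to different parameters.
Theorem 1. Survival probability estimates: Under Conditions 1, 3 and 4, estimates of the survival probability of system $S_i$, for $i = 1, 2, \ldots, N + 1$, are given by:
$$\hat{P}_{i;D} = \frac{-F^{D,i}_{G,L}/L^{D}_{G,L} - \frac{\partial C_i}{\partial x_i}}{(1 - C_i)\,\Lambda_i - \frac{\partial C_i}{\partial x_i}}.$$
Under the asymmetric network correlation coefficient $C_{N+1} = 1$, the estimate for the network case is given by:
$$\hat{P}_{-(N+1);D} = -\frac{F^{D,N+1}_{G,L}}{\Lambda_{-(N+1)}\,L^{D}_{G,L}}.$$
Proof. Our proof is based on deriving NE conditions for the utility function. At NE, we have $\frac{\partial P_I}{\partial x_i} = -F^{D,i}_{G,L}/L^{D}_{G,L}$. Then, using the equation in Condition 3 and $\frac{\partial P_i}{\partial x_i} = \Lambda_i P_i$ from Condition 4, we have:
$$(1 - C_i)\,\Lambda_i P_i + (1 - P_i)\,\frac{\partial C_i}{\partial x_i} = -\frac{F^{D,i}_{G,L}}{L^{D}_{G,L}}.$$
Under the condition $C_i < 1$ or $\frac{\partial C_i}{\partial x_i} \neq 0$, the coefficient of $P_i$ is taken to be non-zero, and hence, we obtain the estimate $\hat{P}_{i;D}$ by solving for $P_i$. Consider the survival probability of the infrastructure; under the asymmetric network condition, we have $C_{N+1} = 1$ and $\frac{\partial C_{N+1}}{\partial x_{N+1}} = 0$, which imply that the condition $C_i < 1$ or $\frac{\partial C_i}{\partial x_i} \neq 0$ is not satisfied; hence, the above formula cannot be used directly since the denominator is zero. Instead, using $C_{N+1} = 1$ in Condition 1, we obtain $P_I = P_{-(N+1)}$, which implies:
$$\frac{\partial P_I}{\partial x_{N+1}} = \frac{\partial P_{-(N+1)}}{\partial x_{N+1}} = \Lambda_{-(N+1)}\,P_{-(N+1)}.$$
Then, the NE condition is given by $\Lambda_{-(N+1)}\,P_{-(N+1)} = -F^{D,N+1}_{G,L}/L^{D}_{G,L}$, which completes the proof.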
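The structure of the estimate can be sketched in code. The sketch assumes that Condition 3 has the form $\partial P_I/\partial x_i = (1 - C_i)\,\partial P_i/\partial x_i + (1 - P_i)\,\partial C_i/\partial x_i$ and that Condition 4 gives $\partial P_i/\partial x_i = \Lambda_i P_i$; solving the resulting linear equation for $P_i$ gives the estimate. The argument `ne_gradient` stands for the value of $\partial P_I/\partial x_i$ fixed by the NE condition, and all names and numbers are hypothetical.

```python
def survival_estimate(ne_gradient: float, c_i: float,
                      dc_dx: float, lam: float) -> float:
    """Solve (1 - C_i) * Lambda_i * P + (1 - P) * dC_i/dx_i = ne_gradient for P.

    The result is an estimate and need not lie within [0, 1].
    """
    denominator = (1.0 - c_i) * lam - dc_dx
    if denominator == 0.0:
        # This covers the network case C = 1 with dC/dx = 0,
        # which needs the separate treatment given in the proof.
        raise ZeroDivisionError("estimate undefined: denominator is zero")
    return (ne_gradient - dc_dx) / denominator


# General case with a non-constant correlation function.
estimate = survival_estimate(ne_gradient=0.1, c_i=0.2, dc_dx=0.05, lam=0.5)

# Substituting the estimate back should recover the NE gradient.
recovered = (1.0 - 0.2) * 0.5 * estimate + (1.0 - estimate) * 0.05

# Uncorrelated case (C_i = 0, dC_i/dx_i = 0): the ratio ne_gradient / Lambda_i.
uncorrelated = survival_estimate(ne_gradient=0.1, c_i=0.0, dc_dx=0.0, lam=0.5)
```

The guard on the denominator mirrors the remark in the proof that the general formula cannot be applied directly to the network under the asymmetric condition.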
The system survival probability estimates $\hat{P}_{i;D}$ provide qualitative information about the effects of various parameters including the aggregated correlation coefficient $C_i$, the system multiplier functions $\Lambda_i$, the composite gain-cost $L^{D}_{G,L}$ and the composite multiplier $F^{D,i}_{G,L}$; note that the estimates may not necessarily lie within the range [0,1]. In particular, $\hat{P}_{i;D}$ (i) increases and decreases with $F^{D,i}_{G,L}$ and $L^{D}_{G,L}$, respectively, (ii) increases with $\Lambda_i$ and (iii) depends both on $C_i$ and its derivative for $i = 1, 2, \ldots, N$. For the network, $\hat{P}_{-(N+1);D}$ is in a simpler form since $C_{N+1} = 1$.
We now consider the asymmetric role played by the network described in Condition 2; namely, its failure renders the entire infrastructure unavailable, and failures of individual systems are uncorrelated with others. The following theorem provides a single, simplified expression for the expected capacity under these conditions.
Theorem 2. Expected capacity under asymmetric network correlations: Under Conditions 1-4, the expected capacity $N_I$ at NE is expressed in terms of the composite gain-cost $L^{D}_{G,L}$, the composite multipliers $F^{D,i}_{G,L}$ and the system multiplier functions $\Lambda_i$.
This expression indicates that lower $L^{D}_{G,L}$ and higher composite multiplier $F^{D,i}_{G,L}$ lead to lower expected capacity. Typically, the composite gain-cost $L^{D}_{G,L}$ is negative (e.g., $-g_D$ for sum-form) since it is minimized by the provider; thus, its lower value is more negative and has a higher magnitude. Furthermore, larger values of $\Lambda_i$ also lead to lower expected capacity. In particular, the condition $\Lambda_i > 1$, called the faster than linear growth of $\frac{\partial P_i}{\partial x_i}$, leads to lower expected capacity. This seems counter-intuitive since faster improvement in $P_i$ due to the increase in $x_i$ leads to lower expected capacity, but note that it only characterizes the states that satisfy NE conditions.

Sum-Form and Product-Form Utility Functions
The NE conditions for sum-form and product-form utilities are derived by equating the corresponding derivatives to zero, which yields the following conditions, respectively: $g_D\,\frac{\partial P_I}{\partial x_i} = \frac{\partial L_D}{\partial x_i}$ and $L_D\,\frac{\partial P_I}{\partial x_i} = (1 - P_I)\,\frac{\partial L_D}{\partial x_i}$, for $i = 1, 2, \ldots, N + 1$ for the provider. We now derive estimates for $P_i$ at NE using partial derivatives of the cost and failure correlation functions to infer qualitative information about their sensitivities to different parameters.
where A = + and A = × correspond to sum-form and product-form, respectively, such that: Under the asymmetric network correlation coefficient C N+1 = 1, the survival probability of the network is given by:
Proof. Our proof is based on deriving NE conditions separately for sum-form and product-form utility functions and then comparing them to identify their common structure and the difference terms. At NE, for the sum-form, we have: Then, using the equation in Condition 3 and ∂P i /∂x i = Λ i P i from Condition 4, we have: Under the condition C i < 1 or ∂C i /∂x i ≠ 0, we obtain: for i = 1, 2, . . . , N + 1. Similarly, for the product-form, we have: Then, using the equation in Condition 3 and ∂P i /∂x i = Λ i P i from Condition 4, we have: Then, we have:
Consider the survival probability of the infrastructure; under the asymmetric network condition, we have C N+1 = 1 and ∂C N+1 /∂x N+1 = 0, which imply that the condition C i < 1 or ∂C i /∂x i ≠ 0 is not satisfied for i = N + 1; hence, the above formula cannot be used directly since its denominator vanishes. Instead, using C N+1 = 1 in Condition 1, we obtain P I = P −(N+1) , which implies: Then, the NE condition for the sum-form is given by: Similarly, for the product-form, we obtain, which completes the proof.
The estimates P i;D above provide sensitivity information about the corresponding survival probabilities with respect to various parameters; note that the estimates may not necessarily lie within [0,1]. In particular, they qualitatively relate P i to the corresponding aggregate correlation function C i and its derivative, and also to Λ i . These dependencies are identical for both sum-form and product-form utility functions. Indeed, the difference between the two formulae is captured by the single term ξ A i , which is proportional to the derivative term ∂L D /∂x i in both cases. The main difference is that ξ × i is an increasing function of P I , whereas ξ + i does not depend on P I . Furthermore, the dependence on L D is different for these two terms: the role of g D in the former is played by L D /(1 − P I ) in the latter. Typically, g D is chosen as a constant in the sum-form, and P I is a function of x i and y i .
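Condition 4, invoked in the proof above through ∂P i /∂x i = Λ i P i , can be checked numerically in a simple special case. The sketch below is only an illustration under a loud assumption: the multiplier Λ i is taken to be a constant (the paper treats it as a function), so that the condition integrates to P i (x i ) = P i (0) exp(Λ i x i ). The function names and numeric values are hypothetical and are not taken from the paper.

```python
import math


def survival_probability(x, p0, lam):
    """P(x) = p0 * exp(lam * x), which solves dP/dx = lam * P.

    Assumes a constant multiplier lam; this is a simplification made
    only for this sketch.
    """
    return p0 * math.exp(lam * x)


def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical values chosen only for illustration.
p0, lam, x = 0.2, 0.05, 10.0


def P(t):
    return survival_probability(t, p0, lam)


lhs = finite_difference(P, x)  # numerical dP/dx
rhs = lam * P(x)               # Lambda * P, as in Condition 4
print(abs(lhs - rhs) < 1e-6)
```

With a constant multiplier, the survival probability grows exponentially in the number of reinforced components, so the assumed form is only meaningful over the range of x i for which it stays below one.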
We now consider the case where network failure renders the entire infrastructure unavailable and the failures of individual systems are uncorrelated with others, as given by Condition 2. The following theorem provides a single, simplified expression for the expected capacity under these conditions. Theorem 4. Asymmetric network correlations: Under Conditions 1-4, the expected capacity is given by: where A = + and A = × correspond to sum-form and product-form, respectively, such that:

Proof. Under Part (ii) of Condition 2, Equations (2) and (3) in Theorem 3 simplify to the same equation, which provides the expression for N A I .
For the sum-form, the expression indicates that higher gain g D leads to a lower number of operational components. For the product-form, it indicates that higher survival probability of the network leads to a lower number of operational components. The dependence on Λ i is similar in both cases, namely faster than linear growth leads to a lower number of available components, and vice versa. The dependence on L D is somewhat different due to its presence in the denominator for the product-form, even though ∂L D /∂x i appears in the numerator in both forms.
The expressions of N I for the composite utility are simpler due to the generality of the composite gain-cost and composite multiplier, which are complex by themselves in that the sum-form and product-form are subsumed by them as indicated in Table 1. Typically, the composite gain-cost L D G,L is negative (e.g., −g D for the sum-form) since it is minimized by the provider; thus, its lower value is more negative and has a higher magnitude. Furthermore, larger values of Λ i also lead to lower expected capacity. In particular, the condition Λ i > 1, called the faster than linear growth of ∂P i /∂x i , leads to lower expected capacity. This seems counter-intuitive, since faster improvement in P i with increasing x i leads to lower expected capacity, but note that it only characterizes the states that satisfy NE conditions.

Multi-Site Server Infrastructure
A distributed cloud computing infrastructure consisting of N sites, with l i servers at site i, i = 1, 2, . . . , N, has been studied in [2] using separate cyber and physical models for each site. Here, we expand this model to incorporate both cyber and physical aspects of the heating, ventilation and air conditioning (HVAC) system of a site, namely its mobile phone app and the cooling tower located outside the facility. The sites are connected over a wide-area network S N+1 , as shown in Figure 1. The components of the network include routers, each of which manages l N+1 connections, as shown in Figure 2. This infrastructure is subject to a variety of cyber and physical attacks on its components. Cyber attacks on the servers may be launched remotely over the network since the servers are accessible to users. Routers, in contrast, are located at geographically-separated sites, and access to them is limited to network administrators, so they are not as easily accessible over the network. Cyber attacks on routers therefore require different techniques and incur different costs for the attacker compared to server attacks. Furthermore, this infrastructure is subject to physical attacks in the form of fiber cuts, which require the attacker to be in physical proximity. Cutting the network fibers that connect a server site to routers will disconnect the entire site, making it inaccessible to the users. Such attacks may also be launched on the network fibers between routers at different locations on the network.
The infrastructure provider may employ a number of reinforcements to protect against attacks, including replicating the servers and routers to support fail-over operations and installing physically-separated redundant fiber lines to the sites and between router locations. These measures could require significant costs and hence must be strategically chosen.

System-Level Correlations
The cyber and physical aspects of a site S i can be represented by using two finer sub-models S (i,c) and S (i,p) that correspond to the cyber and physical model, respectively. Similarly, those of the network S N+1 are represented by S (N+1,c) and S (N+1,p) , which are the cyber and physical models, respectively, as illustrated in Figure 3. Let n (i,c) and n (i,p) represent the numbers of cyber and physical components of S i , which correspond to the number of components of S (i,c) and S (i,p) , respectively, such that n i = n (i,c) + n (i,p) . Let x (i,c) and x (i,p) denote the number of cyber and physical components that are reinforced, respectively, such that x i = x (i,c) + x (i,p) . Similarly, y (i,c) and y (i,p) denote the number of cyber and physical components that are attacked, respectively, such that y i = y (i,c) + y (i,p) . The relationships between these system-level models can be captured using refined versions of the aggregate correlation function as follows. For the wide-area network, we have: which reflects that a cyber attack on a router will disrupt all of its l N+1 connections, thereby illustrating the amplification effect of these cyber attacks. For the server sites, we have a similar effect due to physical fiber attacks denoted by label p f reflected by: which indicates that at site S i , the fiber disruption will disconnect all of its l i servers. Similarly, the cyber attack on the site's HVAC app denoted by label c h leads to: which indicates that at site S i , the HVAC disruption will affect all of its l i servers. In the limiting case, each component can be represented as a singleton sub-model S i,j such that $x_i = \sum_{j=1}^{n_i} x_{(i,j)}$ and $y_i = \sum_{j=1}^{n_i} y_{(i,j)}$. Here, x (i,j) ∈ {0, 1} and y (i,j) ∈ {0, 1} indicate if the component represented by S i,j is reinforced and attacked, respectively.
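The bookkeeping of the limiting case can be sketched in a few lines. The example below is a minimal illustration, not part of the original model: it stores the Boolean indicators x (i,j) and y (i,j) of singleton sub-models and recovers x i , y i and their cyber and physical parts as sums. The `Component` class, the `kind` labels and the sample site are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Singleton sub-model S_(i,j) with Boolean indicators."""
    kind: str        # "cyber" or "physical" (hypothetical labels)
    reinforced: int  # x_(i,j) in {0, 1}
    attacked: int    # y_(i,j) in {0, 1}


def totals(components):
    """Return (x_i, y_i) as sums of the Boolean indicators."""
    x_i = sum(c.reinforced for c in components)
    y_i = sum(c.attacked for c in components)
    return x_i, y_i


def totals_by_kind(components, kind):
    """Return the (x, y) totals restricted to one sub-model."""
    return totals([c for c in components if c.kind == kind])


# A made-up site with three cyber and two physical components.
site = [
    Component("cyber", 1, 0),
    Component("cyber", 0, 1),
    Component("cyber", 1, 1),
    Component("physical", 0, 1),
    Component("physical", 1, 0),
]

x_i, y_i = totals(site)
x_c, y_c = totals_by_kind(site, "cyber")
x_p, y_p = totals_by_kind(site, "physical")
print(x_i == x_c + x_p, y_i == y_c + y_p)
```

The final line checks the decompositions x i = x (i,c) + x (i,p) and y i = y (i,c) + y (i,p) stated in the text.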

Component-Level Correlations
We now consider a special case where the attacker and provider choose the components of a constituent system to attack and reinforce, respectively, according to a uniform distribution. Corresponding to the site physical model S (i,p) , i = 1, 2, . . . , N, there are [n (i,p) − x (i,p) ] + non-reinforced fiber connections, where [x] + = x for x > 0, and [x] + = 0 otherwise. Similarly, there are [n (i,c) − x (i,c) ] + non-reinforced servers. If a cyber component (i.e., a server) is reinforced, it will survive a cyber attack, but can be brought down indirectly by a fiber attack. Then, the probability that a cyber-reinforced component survives y (i,p) fiber attacks is approximated by: where the normalization constant f (i,c) is appropriately chosen.
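The uniform-choice setting can be explored with a small simulation. The sketch below is an illustration under stated assumptions and does not reproduce the paper's survival-probability approximation: the provider and attacker each pick components uniformly at random and independently, a component counts as exposed if it is attacked but not reinforced, and the indirect effect of fiber attacks on servers is ignored. Under these assumptions, the expected number of exposed components is y [n − x] + / n. All function names and parameter values are hypothetical.

```python
import random


def positive_part(v):
    """[v]_+ = v for v > 0, and 0 otherwise."""
    return v if v > 0 else 0


def exposed_count(n, x, y, rng):
    """One trial: number of components attacked but not reinforced."""
    reinforced = set(rng.sample(range(n), min(x, n)))  # provider, uniform
    attacked = rng.sample(range(n), min(y, n))         # attacker, uniform
    return sum(1 for j in attacked if j not in reinforced)


def mean_exposed(n, x, y, trials=20000, seed=0):
    """Monte Carlo mean of exposed_count over independent trials."""
    rng = random.Random(seed)
    return sum(exposed_count(n, x, y, rng) for _ in range(trials)) / trials


# Hypothetical sizes: 20 components, 8 reinforced, 5 attacked.
n, x, y = 20, 8, 5
closed_form = y * positive_part(n - x) / n
estimate = mean_exposed(n, x, y)
print(abs(estimate - closed_form) < 0.05)
```

When x is at least n, the positive part is zero and no attack finds a non-reinforced component, which mirrors the role of the [·] + operator in the text.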
On the other hand, if a cyber component is not reinforced, it can be brought down either directly by a cyber attack or indirectly through a fiber attack. Thus, we approximate the survival probability of a cyber component at site i as: which reflects the additional lowering of the survival probability in inverse proportion to the level of cyber attack y (i,c) . Under the independence of component attacks and reinforcements, the survival probability of the cyber sub-model S (i,c) is given by: which in turn provides: Using the above formulae, for cyber model S (i,c) of site S i , we have: It is interesting to note that the system multiplier function Λ (i,c) does not depend on the cyber reinforcement term x (i,c) even though it corresponds to the partial derivative with respect to x (i,c) . The function, however, depends on the physical reinforcement term x (i,p) . Under the statistical independence of cyber and physical attacks, for the cyber and physical sub-models, namely, S (i,c) and S (i,p) , respectively, we have the following generalization of Equation (4): or equivalently: By differentiating the equation with respect to x (i,c) , we obtain: Then, by noting that ∂x i /∂x (i,c) = 1, we obtain: which enables us to approximate Λ i by Λ (i,c) . Consider that the HVAC sub-model S (i,h) of site i is further decomposed into cyber and physical singleton sub-models represented by S (i,c h ) and S (i,p h ) , respectively. Then, we have: which corresponds to a cyber attack on and defense of the HVAC app. Similarly, we have: which corresponds to a physical attack on and defense of the HVAC cooling tower.

Expected Capacity Estimates
We now consider the capacity of the infrastructure under x i reinforcements and y i attacks on components of S i , which can be further partitioned into the corresponding values of sub-systems of S i .

Sum-Form and Product-Form
Based on the estimates from Section 4.3, for the expected capacity N A I of the sub-models of S i , the dependence on y (i,c) and [y (i,p) − x (i,p) ] + is more direct, and it is qualitatively similar for both sum-form and product-form, since the term Λ i appears in the denominator. Then, we obtain the following expected capacity estimates: for the sum-form, and for the product-form. In both cases, the multipliers n i , g D and L D are positive, and it is reasonable to assume the condition ∂L D /∂x i ≥ 0, as described above. Thus, the expected capacity decreases with the number of cyber attacks y (i,c) . The opposite trend is true with respect to [y (i,p) − x (i,p) ] + , which implies no effect if the number of reinforced components is at least as large as the number of component attacks, and otherwise, the expected capacity increases with the difference. In both cases, the dependence on the number of servers l i at site i is qualitatively similar in that the expected capacity increases in proportion to its logarithm.
The term n i that corresponds to site S i can be further refined by decomposing it into its sub-models, which provides insight into their individual effects. The impact of the HVAC control app at site i is reflected in its corresponding term: obtained from Equation (5), which shows that reinforcing the app, that is x (i,c h ) = 1, nullifies the amplification effect of l i since [y (i,c h ) − x (i,c h ) ] + = 0 for both sum-form and product-form utility functions. Such an analysis can be carried out for other critical components of the sites to identify which components to reinforce for higher utility. In particular, reinforcing the site fiber routes will have a similar nullifying effect, but server reinforcements will have a somewhat smaller impact.
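The nullification argument can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the amplification enters as the product l i [y (i,c h ) − x (i,c h ) ] + ; the exact term is the one obtained from Equation (5), which is not reproduced here. The function name `amplified_exposure` and the value of l i are hypothetical.

```python
def positive_part(v):
    """[v]_+ = v for v > 0, and 0 otherwise."""
    return v if v > 0 else 0


def amplified_exposure(l_i, y, x):
    """Hypothetical amplification term l_i * [y - x]_+.

    Counts the servers affected through a singleton component (such as
    the HVAC app) that is attacked (y = 1) and possibly reinforced
    (x = 1). The product form is an assumption of this sketch.
    """
    return l_i * positive_part(y - x)


l_i = 40  # hypothetical number of servers at site i

unprotected = amplified_exposure(l_i, y=1, x=0)  # app attacked, not reinforced
protected = amplified_exposure(l_i, y=1, x=1)    # app attacked and reinforced
print(unprotected, protected)
```

A single reinforcement of the singleton component removes an exposure that scales with l i , which is why such components are natural candidates for reinforcement compared with individual servers.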

Composite Utility Functions
We now obtain the following expected number of servers for the composite utility functions: In the equation, n i is positive, and it is reasonable to assume that − at NE, and the survival probability of the entire infrastructure P I does not decrease with x i . Thus, the expected capacity decreases with y (i,c) , and the opposite is true with respect to [y (i,p) − x (i,p) ] + , as discussed in the previous section. The dependence on the number of servers l i at site i is again qualitatively similar, in that the expected capacity increases in proportion to its logarithm. As in sum-form and product-form utility functions, the term: can be decomposed using sub-models of site i to assess the impacts of its parts, in particular its components. For the HVAC app at site i, we have the corresponding term: . Furthermore, such an analysis can be carried out for other components, and in the limiting case, each component can be modeled as a singleton sub-model, in which case its attack and reinforcement variables are Boolean.
The dependencies considered here for the sub-models are quite simple as a result of the statistical independence and uniform distributions of reinforcements and attacks. Even under such simple conditions, the detailed NE conditions are quite complex to characterize, but they do provide qualitative insights into the effects of underlying parameters.

Conclusions
We considered a class of infrastructures with multiple systems, wherein the communications network plays an asymmetric role by providing the critical connectivity between them. By utilizing correlations at the system and component levels, we formulated the problem of ensuring the infrastructure survival as a game between an attacker and a provider, by using composite utility functions that generalize the sum-form and product-form utility functions. We derived Nash equilibrium conditions in terms of composite gain-cost and composite multiplier, which provide compact expressions for individual system survival probabilities and also the expected number of operational components. This paper presented a unified account of partial results that were separately developed for: sum-form utility functions [5] and under asymmetric network conditions [1]; product-form utility functions [8]; composite utility functions [2]; composite utility functions under asymmetric network conditions [3]; and detailed derivations for multi-site cloud server infrastructure [4]. These results extend previous results on interconnected systems [30,32] and cyber-physical infrastructures [31] by using the composite utility functions. We presented a comprehensive treatment of the three utility functions, including more illustrative details of the sum-form and product-form utility functions. For multi-site cloud infrastructures, we explicitly related the correlation functions and system multiplier functions to the infrastructure parameters, which in turn provided insights into the estimates for system survival probabilities and the expected capacity. In particular, by employing sub-models of the sites, the effect of parts of the system on the expected capacity could be inferred by using the corresponding multiplier functions.
The formulation studied in this paper can be extended to include cases where targeted attacks and reinforcements of specific individual components are explicitly represented. The system models here incorporate the same level of detail in that they all consist of components, and it would be of future interest to incorporate varying levels of detail in them, for example by replacing components with recursively defined systems. The utility functions considered in this paper do not explicitly use the capacity term. Instead, they are driven by infrastructure-level considerations through P I , which in turn leads to expressions for the capacity that involve other terms that contribute to P I . It is of future interest to compare this formulation to ones whose utility functions contain the expected capacity term in place of infrastructure survival probability terms. Another future direction is to consider simultaneous cyber and physical attacks on multiple systems and components, as well as sequential game formulations of this problem. Performance studies of our approach using more detailed models of cloud computing infrastructures, smart energy grid infrastructures and high-performance computing complexes would also be of future interest.