Security from the Adversary's Inertia – Controlling Convergence Speed When Playing Mixed Strategy Equilibria

Game-theoretic models are a convenient tool to systematically analyze competitive situations. This makes them particularly handy in the field of security where a company or a critical infrastructure wants to defend against an attacker. When the optimal solution of the security game involves several pure strategies (i.e., the equilibrium is mixed), this may induce additional costs. Minimizing these costs can be done simultaneously with the original goal of minimizing the damage due to the attack. Existing models assume that the attacker instantly knows the action chosen by the defender (i.e., the pure strategy he is playing in the i-th round) but in real situations this may take some time. Such adversarial inertia can be exploited to gain security and save cost. To this end, we introduce the concept of information delay, which is defined as the time it takes an attacker to mount an attack. In this period it is assumed that the adversary has no information about the present state of the system, but only knows the last state before commencing the attack. Based on a Markov chain model we construct strategy policies that are cheaper in terms of maintenance (switching costs) when compared to classical approaches. The proposed approach yields slightly larger security risk but overall ensures a better performance. Furthermore, by reinvesting the saved costs in additional security measures it is possible to obtain even more security at the same overall cost.


Playing a Mixed Strategy Causes Costs
Implementing a pure strategy equilibrium of a game is straightforward, and the installation costs of the strategy occur only once at the beginning of the game, since the optimal strategy profile is pure and will never be altered. When playing repeated games, however, it may occur that the optimal strategy is mixed, i.e., the optimal strategy is obtained by assigning a positive probability to two or more pure strategies. A mixed strategy is an assignment of probabilities, which declares a law for randomly selecting the individual pure strategies in each round of the game to ensure an optimal result regarding the expected utility. In standard models it is assumed that players can switch strategies as frequently as they want. Yet, in real life, switching strategies will incur additional costs. For example, if we consider game-theoretic models in cybersecurity, strategies may include different configurations of servers, firewalls or other system components. If switching strategies means changing configurations, the change may be costly in terms of time or money (e.g., downtime of servers, hourly rates of staff, etc.).
One possibility to consider switching costs is to compute multiple Nash equilibria and choose the one with the smallest (Shannon) entropy. This approach yields the "purest" of all equilibria. Another possibility is to model the problem in terms of dynamic games: the aim is to find the optimal Markov chain, i.e., to find the best (mixed) strategy based on the current state and the cost of switching to new states. Rass, König and Schauer have discussed these approaches in [1]. They point out that solely considering "more pure" strategies (and thus reducing the frequency of action changes) or minimizing the costs for the next choice is not sufficient: a defense strategy needs to be implemented in such a way that the defender's moves are not predictable for an attacker, as predictability facilitates security breaches. In other words, when employing a security strategy, it should not be possible to get a better forecast of the defender's next action by considering the current state of the system and the costs incurred by switching to another strategy. Thus, instead of calling for a dynamic optimization, the authors of [1] suggest a static framework, where all actions are taken stochastically independently of the current state while still minimizing the switching costs. Despite the strong focus of this framework on security principles, there exist even more efficient solutions if we take an "information delay" for the adversary into account, i.e., the time it takes the attacker to recognize a changed situation and adapt to it.
The concept introduced in this work incorporates the time it takes for an attacker to mount an attack. It may happen that an attacker does not have complete information about the present state of the attacked system (such as the current strategy of a defender), but only knows the state of the system some rounds ago. This may happen, for example, if it takes the attacker some time to carry out the attack, i.e., the adversary has some inertia. During this period, the attacker may not be able to keep track of the system, and will not detect if the state of the system changes. Thus, his attack is performed after some delay, during which no new information can be processed. As a vivid example, consider an intruder who tries to gain unauthorized access to some critical infrastructure that is surrounded by a wall. Before he starts his attack, he knows the current position of a guard, as he can see him through a window or a compromised camera, but while he is entering the premises, the position of the guard may change without the attacker noticing.
In the following, we will refer to this scenario as an information delay. By taking into account the average time the system is unobservable for an attacker prior to his attack, we can construct strategy policies that are cheaper in terms of switching cost. This saving is traded for a slightly larger risk in the primary security goal, but ultimately yields a better overall performance. Security is never only an economic matter of cost-benefit balance, and impractical security solutions are practically worthless (say, if the optimal security strategy prescribes frequent changes in server configurations, such a strategy would simply not be doable in practice). Taking into account the cost of "running" an optimal defense is, in our opinion, as important an aspect of defense as the security precaution itself. This work aims at providing means to keep the running costs of a defense under control and in balance with the security benefits gained from it.

Related Work
This work essentially deals with convergence to a Nash equilibrium, which is a well-studied matter in the literature, but usually with a totally different goal than ours. Some work [2] indeed assumes a certain "speed" of the attacker and adapts the defense to it. However, this prior work (and related follow-ups) disregards the potential of moving slightly faster than the adversary to gain an explicit profit from this. Most studies of convergence relate to the speed at which behavior can be adapted to become optimal in the long run [3][4][5][6][7], with some consideration spent on specific settings such as congestion or load balancing. The cost borne in switching between configurations has been considered in [8], where the authors use entropy as a measure to prefer certain strategies with less cost in the change. In the context of password policy choice, [9] considered games about choosing passwords that are (i) easy to remember, (ii) hard to guess, and (iii) easy to change (for the owner). The latter aspect is a well-known cause of passwords following certain patterns, such as having counters attached to them. Taking the password change (switching) cost into account can aid in looking for a best password policy and prevent the issue to some extent. Related on different grounds is also [10,11], where convergence to an ($\varepsilon$-approximate) equilibrium is studied using Markov chains. Our work relates to this in the sense that we also design Markov chains to play a desired equilibrium, but use an $\varepsilon$-approximation to the equilibrium as an area of trade-off to avoid costs from switching. In that sense, we provide a novel use of $\varepsilon$-approximations to equilibria for the sake of security economics [12].

Contribution and Structure of the Article
This contribution aims at generalizing the switching cost model [1] for games where the attacker has incomplete information that can be described by an information delay. By taking into account information delay, the implementation costs can be reduced while still ensuring the security principle that the opponent cannot forecast the next move more precisely. The resulting policy can be described using a Markov chain model. We will show in fact that the switching cost model is a special case of our information delay model.
The structure of the article is as follows: first, we introduce some preliminary concepts and notations required to describe the game setup as well as the costs for switching strategies in Section 2. In Sections 3 and 4, the theoretical framework is explained and all mathematical derivations are stated. A numerical example completes Section 3. Finally, Section 5 summarizes the findings and highlights some open questions that might be relevant for future research.

Preliminaries
In the following, we will use uppercase letters to denote random variables and sets. Vectors are printed in bold-face. We will write $X \sim F$ if a random variable $X$ is distributed according to a probability distribution $F$. Distributions on finite ordered sets are described using probability vectors $x = (x_1, \ldots, x_n)$, $\sum_{i=1}^{n} x_i = 1$, which represent the probability mass function of the underlying random variable. It is assumed that the random variable follows a discrete distribution, hence it has a density w.r.t. the counting measure. We will use the notation $d \overset{x}{\leftarrow} PS$ to express that an element $d$ was sampled from the set $PS$ with distribution $x$; i.e., $\Pr(X = d) = x_i$ if $d$ is the $i$-th element in the ordered set $PS$.
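The sampling notation can be made concrete with a minimal Python sketch. The strategy names and the probability vector below are purely illustrative assumptions, and the helper `sample_strategy` is a hypothetical name, not part of the original text.

```python
import random

def sample_strategy(pure_strategies, x, rng=random):
    """Draw one element d from the ordered set PS with Pr(X = PS[i]) = x[i]."""
    if len(pure_strategies) != len(x):
        raise ValueError("one probability per pure strategy is required")
    if min(x) < 0 or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be a probability vector")
    # random.choices draws with replacement according to the given weights
    return rng.choices(pure_strategies, weights=x, k=1)[0]

# Hypothetical pure strategies and an illustrative mixed strategy
PS = ["config_a", "config_b", "config_c"]
x = [0.2, 0.0, 0.8]

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_strategy(PS, x, rng) for _ in range(20000)]
freq_c = draws.count("config_c") / len(draws)
```

Over many rounds the empirical frequencies approach x, which is the law-of-large-numbers argument used later to link the marginal distribution to the switching behavior.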

Definitions and Game Setup
We consider a finite two-player game between player 1 and player 2 with pure strategy sets $PS_1$ and $PS_2$, respectively. Let $|PS_1| = n$ and $|PS_2| = m$ where $n, m \in \mathbb{N}$. We write $\Delta(PS)$ to denote the simplex over a strategy set $PS$ that contains all probability distributions on $PS$. The extension to more than two players will be obvious, so we only consider the case with two players. We assume a zero-sum situation, i.e., the attacker (who is player 2) has the payoff $u_2 = -u_1$. In our security game scenario let us adopt the defender's perspective, i.e., we act as player 1 in the game. Throughout this work the defender's strategies are determined by the expected damage and the switching costs. We assume that the defender is minimizing two objectives: the primary security goal is minimization of the damage due to a risk and the second goal is reduction of the switching cost.

Costs for Playing Mixed Strategies
The damage that is minimized as the first objective is modeled by a utility function $u_1^{(1)}(x, y) = x^T A y$ that describes the expected damage depending on both players' actions. For simplicity we assume that $A \in \mathbb{R}^{n \times m}$ is a constant matrix.
The second goal is switching cost minimization: by our definition, a switch from strategy $i \in PS_1$ to strategy $j \in PS_1$ will cause cost $s_{ij} \in \mathbb{R}^+$ for player 1. Note that the cost of switching strategies only depends on player 1's actions, i.e., on his past and present strategy, which we denote by $X_{t-1}$ and $X_t$ respectively, where $t \in \mathbb{N}$ denotes the $t$-th gameplay. As the player's switching costs, and therefore his next move, only depend on the present state, the switching process is a first order Markov process. As any stochastic process is fully determined by its finite dimensional distributions, we can describe the switching behavior by specifying the joint probability distribution (jpd) of $X_{t-1}$ and $X_t$, $t \in \mathbb{N}$. As we assume the switching costs are constant over time, the optimal jpd that determines the mode of changing strategies will be constant over time as well. Thus, the resulting optimal switching policy $\Pr(X_{t-1} = i, X_t = j)$ can be modeled as a time-homogeneous process, i.e., it holds that $\Pr(X_{t+h-1} = i, X_{t+h} = j) = \Pr(X_{t-1} = i, X_t = j)$ for all $h \in \mathbb{N}$. Homogeneity implies that the expected switching cost can be described by
$$u_1^{(2)} = \sum_{i=1}^{n} \sum_{j=1}^{n} s_{ij} \cdot \Pr(X_{t-1} = i, X_t = j).$$
We now model the simultaneous optimization of damage and cost as a multi-objective game (MOG). In a MOG, each player $i$ can have several utility functions, each defined on $\Delta(PS_i) \times \Delta(PS_{-i})$, where $\Delta(PS_{-i})$ denotes the strategy space of the remaining players. In our two-player zero-sum game we have 2 objectives and both players have vector-valued payoffs $u_1$, $-u_1$:
$$u_1 = \left( u_1^{(1)}, u_1^{(2)} \right)^T.$$
For this situation the following definition is convenient.
Definition 1 (Pareto-Nash Equilibrium). In a game with a minimizing player 1, a Pareto-Nash equilibrium is a strategy profile $(x^*, y^*) \in \Delta(PS_1) \times \Delta(PS_2)$ that fulfills […], where $x \geq_1 y$ means that there exists at least one coordinate $i$ for which $x_i \geq y_i$ holds, regardless of the other coordinates.
Lozovanu, Solomon and Zelikovsky [13] have studied the computation of Pareto-Nash equilibria by scalarizing the utility vector. To this end, each player $i$ defines weights $\alpha_i > 0$, $\|\alpha_i\|_1 = 1$, to scalarize his utilities via $\alpha_i^T \cdot u_i$. In [13] it was proven that the Nash equilibria of so scalarized games are exactly the Pareto-Nash equilibria in the original multi-objective game.
Letting the defender prioritize the two goals by assigning weights $\alpha$ and $1 - \alpha$, the scalarized payoff for the defender is the correspondingly weighted sum of $u_1^{(1)}$ and $u_1^{(2)}$. For readability we will drop the coefficients $(1 - \alpha)$ and $\alpha$, as we can just include them in the constant matrices $A$ and $S = (s_{ij})_{i,j=1,\ldots,n}$.

The Switching Cost Model (SCM)
The model introduced in [1] assumes that the switching of strategies is performed independently of the current strategy, i.e., $\Pr(X_{t-1} = i, X_t = j) = \Pr(X_{t-1} = i) \cdot \Pr(X_t = j)$. Thus, any future change in strategy is not predictable with more accuracy when the current system state is known. Hence the utility function $u_1$ can be written as
$$u_1 = x^T A y + \sum_{i=1}^{n} \sum_{j=1}^{n} s_{ij} \cdot \Pr(X_{t-1} = i) \cdot \Pr(X_t = j).$$
This way, the whole behavior of the system can be described using only the marginal probability vector $x$:
$$u_1(x, y) = x^T A y + x^T S x,$$
with constant payoff matrices $A$ as well as $S = (s_{ij})_{i,j=1,\ldots,n}$. We stress the fact that $S$ need not be a symmetric matrix. As a simple example, consider a security guard driving to different assets $i$ and $j$, where $j$ is on top of a mountain and $i$ is in the valley. The ascent from $i$ to $j$ will certainly take up more resources (e.g., fuel) than the descent from $j$ to $i$. Thus, $s_{ij} > s_{ji}$ holds indeed. Yet, we assume that $s_{ii} = 0$ (remaining in the current strategy does not incur any switching costs).
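As a minimal sketch of the SCM utility, the following Python snippet evaluates the damage term x^T A y and the switching term x^T S x for independently drawn strategies. The matrices A and S below are hypothetical illustrations (they are not the matrices of the numerical example later in the text); S is deliberately asymmetric with a zero diagonal.

```python
def scm_utility(x, y, A, S):
    """Return (total, damage, switching) with total = x^T A y + x^T S x."""
    n, m = len(A), len(A[0])
    damage = sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(m))
    # Independent switching: Pr(X_{t-1} = i, X_t = j) = x_i * x_j
    switching = sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))
    return damage + switching, damage, switching

# Hypothetical payoff and switching cost matrices (assumptions for illustration)
A = [[3.0, 8.0],
     [5.0, 4.0],
     [7.0, 6.0]]          # 3 defender strategies x 2 attacker strategies
S = [[0.0, 1.0, 1.5],
     [1.0, 0.0, 1.0],
     [0.5, 1.0, 0.0]]     # s_13 > s_31: "uphill" costs more than "downhill"

x = [0.2, 0.0, 0.8]       # defender's mixed strategy
y = [0.5, 0.5]            # attacker's mixed strategy
total, damage, switching = scm_utility(x, y, A, S)
```

Because the diagonal of S is zero, only genuine changes of strategy contribute to the switching term.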
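In the absence of an accurate adversary model [14], we may strive for a worst-case analysis and assume that the attacker will always try to cause as much damage as possible, i.e., the attacker aims to maximize $u_1$:
$$\max_{y \in \Delta(PS_2)} u_1(x, y) = \max_{y \in \Delta(PS_2)} \left( x^T A y + x^T S x \right).$$
Note that for player 2 the expression $x^T S x$ is constant. Thus
$$\arg\max_{y \in \Delta(PS_2)} \left( x^T A y + x^T S x \right) = \arg\max_{y \in \Delta(PS_2)} x^T A y,$$
and, since $x^T A y$ is linear in $y$, the maximum is attained at a pure strategy, i.e., $\max_{y \in \Delta(PS_2)} x^T A y = \max_i x^T A e_i$, where $e_i \in \mathbb{R}^m$ denotes the $i$-th coordinate unit vector. By substituting $v := \max_i (x^T A e_i + x^T S x)$, the resulting problem can be described through the following optimization problem:
$$\min_{x, v} \; v \quad \text{subject to} \quad v \geq x^T A e_i + x^T S x \;\; (i = 1, \ldots, m), \quad \sum_{j=1}^{n} x_j = 1, \quad x_j \geq 0 \;\; (j = 1, \ldots, n). \tag{3}$$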
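For very small games, the worst-case problem (minimize over x the quantity max_i x^T A e_i + x^T S x) can be explored with a brute-force grid search over the simplex. This is only a sketch under stated assumptions: the grid search is a stand-in chosen for transparency, not the solution method of the original work, and the quadratic term makes the problem non-convex in general, so a grid only approximates the optimum. The matrices in the usage lines are hypothetical.

```python
def worst_case_cost(x, A, S):
    """max_i x^T A e_i + x^T S x for a defender strategy x."""
    n, m = len(A), len(A[0])
    damage = max(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    switching = sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))
    return damage + switching

def simplex_grid(n, steps):
    """Yield all probability vectors of length n with entries k/steps."""
    def compositions(remaining, parts):
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest
    for counts in compositions(steps, n):
        yield [c / steps for c in counts]

def solve_scm_grid(A, S, steps=100):
    """Approximate minimizer of the worst-case cost on a simplex grid."""
    best = min(simplex_grid(len(A), steps),
               key=lambda x: worst_case_cost(x, A, S))
    return best, worst_case_cost(best, A, S)

# Hypothetical 2x2 game: without switching costs the optimum is mixed ...
A = [[2.0, 0.0], [0.0, 2.0]]
x_free, v_free = solve_scm_grid(A, [[0.0, 0.0], [0.0, 0.0]])
# ... while expensive switching pushes the optimum to a pure strategy.
x_costly, v_costly = solve_scm_grid(A, [[0.0, 10.0], [10.0, 0.0]])
```

The two runs illustrate the trade-off discussed in the text: once switching is costly enough, the defender accepts a higher worst-case damage in exchange for never switching.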

Extension of SCM: Taking into Account Information Delay
In this paper we extend the switching cost model by relaxing the independence assumption, i.e., we let the choice of the next pure strategy $X_t$ depend on the current state $X_{t-1}$. Thus, we want to model the switching behavior as a Markov process. In order to reduce the switching costs, we may add some inertia to player 1 by increasing the conditional probability to remain in the current strategy for each state. We will control the amount of inertia in such a way that we can guarantee that the distribution of the system after a predetermined number $k$ of gameplays, conditional on the last observed state, is almost the same as the unconditional distribution.
Figure 1. Comparison of switching cost optimization in [1] to our approach. (a) The model in [1] assumes an independent choice of the next pure strategy; (b) our model allows for first-order dependence when choosing the next pure strategy but controls the deviation from the independence assumption after $k$ or more subsequent gameplays.
Hence, we demand the resulting marginal distribution after a fixed number $k$ of consecutive repetitions of the game to be "almost independent" of the initial state, i.e., the conditional and unconditional probabilities after $k$ or more steps need to be almost the same; that is, we require
$$\left| \Pr(X_{t+\kappa} \mid X_t = i) - \Pr(X_{t+\kappa}) \right| \leq \varepsilon \quad \forall i \in PS_1, \; \forall \kappa \geq k, \tag{4}$$
where $| \cdot |$ is the sum of absolute deviations of the two probability vectors. We call $k$ the information delay; it specifies the length of the period during which an attacker is not able to gain insight into the system prior to attacking (see Figure 1). Furthermore, we call $\varepsilon$ the maximum deviation from independence. In a seemingly alternative view, one could propose wrapping up a batch of $\kappa$ interdependent rounds of the game in a single "larger" round, yet such an approach could be flawed for two reasons: first, this would impose an independence assumption between any two batches of rounds in the game. Second, the timing of the game rounds may be naturally induced by the "periodicity" of the business as such (e.g., work hours per day, shifts, or similar).
In this dynamic framework we need to redefine the objective function $u_1$. Obviously, there is a conflict in notation and conceptualization here: $u_1^{(1)}$ is a function with arguments $x$ and $y$, i.e., the arguments are the marginal distributions each player assigns to his set of pure strategies, but the switching cost part of $u_1$ (in contrast to the formulation in (3)) is a function of the joint probability distribution of player 1's strategies. Yet, there is a direct connection between $x$ and $\Pr(X_{t-1} = i, X_t = j)$: as we are dealing with mixed strategies, the defender will often switch pure strategies, and by the law of large numbers the distribution over the pure strategies $PS_1$ will converge to $x$ after an infinitude of gameplays. Accordingly, the dynamic (i.e., switching) behavior of player 1, which is described using a homogeneous discrete Markov chain (HDMC), needs to have $x$ as its stationary as well as its unique limiting distribution in order for the two objectives to be consistent.
Bearing in mind that the limiting behavior of any HDMC can be described using a one-step transition matrix $P$ of dimension $|PS_1| \times |PS_1|$ and an initial distribution $\pi_0$ that describes the starting state of the process, we will make use of the following theorems for our results:
Theorem 1. (Limit [15]) Every aperiodic irreducible HDMC with finite state space has a unique limiting state $\pi$.
So if we are dealing with aperiodic irreducible homogeneous discrete Markov chains with finite state space $E = \{1, \ldots, \ell\}$, we can ensure the existence of a unique limiting state $\pi = (\pi_1, \ldots, \pi_\ell)^T$, which is always a stationary state. Additionally, it can be shown that the limiting distribution $\pi$ of such a stochastic process is independent of the initial distribution $\pi_0$. Moreover, by the following ergodic theorem, it is possible to specify the speed of convergence to the limit state for an aperiodic irreducible HDMC with finite state space $E = \{1, \ldots, \ell\}$ and transition matrix $P$. Let $P^k(i, j)$ denote the transition probability from state $i \in E$ to $j \in E$ after $k$ steps. Note that the following theorem is a consequence of the Perron-Frobenius Theorem (see Footnote 1).
Theorem 2. (Geometric Ergodicity [16]) Let $P$ be the transition matrix of an irreducible, aperiodic Markov chain with finite state space $E = \{1, \ldots, \ell\}$. Then for all probability vectors $\pi_0$ it holds that
$$\lim_{k \to \infty} \pi_0^T P^k = \pi^T,$$
where $\pi = (\pi_1, \ldots, \pi_\ell)^T$, $\pi_j > 0$ for all $j \in E$, and $\pi$ is the only solution to
$$\pi^T P = \pi^T, \quad \pi^T \mathbf{1} = 1. \tag{5}$$
Moreover, the speed of convergence to the limiting state $\pi$ is geometric, i.e., there exists a constant $c > 0$ (that depends on $P$ only) such that for all $i, j \in E$
$$\left| P^k(i, j) - \pi_j \right| \leq c \cdot |\lambda_2|^k, \tag{6}$$
where $\mathbf{1}$ is the vector with all ones and $\lambda_2$ denotes the second largest eigenvalue of $P$ in terms of absolute values.
Footnote 1. By the Perron-Frobenius Theorem, for aperiodic irreducible HDMCs with finite state space the largest eigenvalue of the transition matrix is always 1 and its left eigenvector is the steady state distribution. Further, the second largest eigenvalue determines the speed of convergence to the steady state.
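The following proof is from [17]. We will limit ourselves to the case when $P$ is diagonalizable, which is the case for our construction of $P(\theta)$.
Proof. An irreducible aperiodic Markov chain has a positive transition matrix $P$. Let $h_i \in \mathbb{R}^{\ell \times 1}$, $i \in E$, denote the right eigenvectors of $P$ and $g_i^T$, $g_i \in \mathbb{R}^{\ell \times 1}$, $i \in E$, the left eigenvectors of $P$, normalized such that $g_i^T h_j = 1$ if $i = j$ and $0$ otherwise. By the Perron-Frobenius Theorem the largest eigenvalue $\lambda_1$ is unique and possesses a strictly positive left eigenvector. For stochastic matrices like $P$ it additionally holds that the largest eigenvalue is $\lambda_1 = 1$, the right eigenvector to $\lambda_1$ is $\mathbf{1}$ and its left eigenvector is the one that fulfills (5). Thus, $g_1^T = \pi^T$. Now we can write $P$ in its spectral representation
$$P = \sum_{l=1}^{\ell} \lambda_l h_l g_l^T = \mathbf{1} \pi^T + \sum_{l=2}^{\ell} \lambda_l B_l, \qquad B_l := h_l g_l^T,$$
so that $P^k = \mathbf{1} \pi^T + \sum_{l=2}^{\ell} \lambda_l^k B_l$. Now for all initial states $i$ and resulting states $j$ the absolute difference of the components of $P^k$ and the corresponding entries in $\mathbf{1} \pi^T$ (i.e., $|P^k(i, j) - \pi_j|$) is bounded by
$$\left| P^k(i, j) - \pi_j \right| = \left| \sum_{l=2}^{\ell} \lambda_l^k B_l(i, j) \right| \leq |\lambda_2|^k \sum_{l=2}^{\ell} \left| B_l(i, j) \right|,$$
where $B_l(i, j)$ denotes the respective entry of $B_l$, $l \in \{2, \ldots, \ell\}$. Setting $c := \max_{i, j \in E} \sum_{l=2}^{\ell} |B_l(i, j)|$, we get Expression (6).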
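The geometric bound can be checked numerically on the smallest non-trivial case. The sketch below assumes a hypothetical two-state chain with exit probabilities a and b; for such a chain the second eigenvalue is 1 − a − b and the stationary distribution is (b, a)/(a + b), so the deviation of P^k from the limit can be compared against c·|λ₂|^k with c = max(a, b)/(a + b).

```python
def mat_mul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][r] * Q[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, k):
    """k-step transition matrix P^k for k >= 1."""
    result = P
    for _ in range(k - 1):
        result = mat_mul(result, P)
    return result

# Hypothetical two-state chain: leave state 0 w.p. a, leave state 1 w.p. b
a, b = 0.3, 0.1
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary distribution
lam2 = 1 - a - b                  # second eigenvalue of this 2x2 chain
c = max(a, b) / (a + b)           # constant of the bound for this chain

deviations = []
for k in range(1, 11):
    Pk = mat_pow(P, k)
    dev = max(abs(Pk[i][j] - pi[j]) for i in range(2) for j in range(2))
    deviations.append((k, dev, c * abs(lam2) ** k))
```

For this chain the bound is attained with equality, and each additional step shrinks the deviation by the factor |λ₂| = 0.6.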
Geometric ergodicity means that the absolute difference between the steady state and the marginal distribution after $k$ steps, given any initial distribution, is bounded by $c \cdot |\lambda_2|^k$. Consequently, $\lambda_2$ determines the speed of convergence to the steady state distribution given an arbitrary initial distribution $\pi_0$: the smaller $|\lambda_2|$, the faster the convergence to the steady state. Considering Equation (4), it is obvious that if we want the distributions of $X_{t+\kappa}$ and $X_{t+\kappa} \mid X_t$ for an arbitrary instantiation of $X_t$ to differ by at most $\varepsilon$ for all $\kappa \geq k$, we need to control the second largest eigenvalue of $P$, i.e.,
$$\left| \Pr(X_{t+\kappa} = j \mid X_t = i) - \Pr(X_{t+\kappa} = j) \right| \leq c \cdot |\lambda_2|^{\kappa} \leq \varepsilon \tag{8}$$
for all $i, j \in E$, $t \in \mathbb{N}$ and $\kappa \geq k$. Now we want to construct an irreducible aperiodic HDMC, described by a transition probability matrix, for which the following properties hold:

• $\varepsilon$-Convergence: the resulting conditional probabilities after $k$ or more repetitions of the game, given an initial state, are approximately the ergodic state (i.e., they satisfy (4)).
• Equilibrium: the limiting as well as the marginal distribution of the process equal the Nash equilibrium solution from (3).
• Cost reduction: the total costs are reduced.
In (8) we have seen that $\varepsilon$-convergence can be achieved by controlling the second largest eigenvalue of the conditional probability matrix. The following result will help construct the sought transition matrix for the intended convergence control:
Theorem 3. (Sklar [18]) Every cumulative distribution function $F_X(x)$ of a random vector $X = (X_1, \ldots, X_n)^T$ can be expressed by its marginal distributions $F_{X_1}(x_1), \ldots, F_{X_n}(x_n)$ and a copula $C$:
$$F_X(x_1, \ldots, x_n) = C\left( F_{X_1}(x_1), \ldots, F_{X_n}(x_n) \right).$$
Note that we are dealing with first order HDMCs and that the whole behaviour of the chain is determined by one single two-dimensional joint distribution function that is the same for all pairs $(X_{t-1}, X_t)$, $t \in \mathbb{N}$. For brevity, we will abbreviate $F_{X_{t-1}, X_t}(i, j)$ by $F_{ij}$. As we require both the marginal probabilities of $X_t$ and $X_{t-1}$ to equal the Nash equilibrium solution from (3), the joint distribution of the random vector $(X_{t-1}, X_t)$ can be constructed using the marginal distribution of $X_t$ only, i.e., $F_{ij} = C\left( F_{X_t}(i), F_{X_t}(j) \right)$. Using Sklar's Theorem and the fact that we are only considering absolutely continuous discrete random variables $X_t$ with $\sigma$-finite measures $F_{X_t}$, the first order Markov process has not only a joint cdf, but also a discrete density $\Pr(X_{t-1} = i, X_t = j)$. Thus, there exists an $f \in [0, 1]^{n \times n}$ for which it holds that for all $1 \leq i, j \leq n$: $f_{ij} = \Pr(X_{t-1} = i, X_t = j)$, $\sum_{i=1}^{n} f_{ij} = x_j$ for all $j$ and $\sum_{j=1}^{n} f_{ij} = x_i$ for all $i$. Therefore, the discrete density, which is represented by $(f_{ij})_{i,j=1,\ldots,n}$, has marginals prescribed by $x$, and we hereafter write $f_{ij}(x)$ to denote this dependency. As such, $f_{ij}(x)$, $i, j \in PS_1$, is not necessarily a function of $x$, but rather chosen in a way constrained by $x$ regarding the marginals.
Under the above-mentioned prerequisites, we are able to redefine $u_1$ so that it only depends on $x$: for the just defined joint probability matrix $f = (f_{ij})_{i,j=1,\ldots,n}$,
$$u_1(x, y) = x^T A y + \sum_{i=1}^{n} \sum_{j=1}^{n} s_{ij} \cdot f_{ij}(x).$$
Note that it is necessary to specify parametric functions $f(x, \theta)$ to model the jpd, as $n^2$ parameters are not estimable given the number of constraints. It is not possible to directly optimize the individual $f_{ij}$ over $[0, 1]^{n \times n}$. Therefore, we need to constrain $f$ to a parametric family of functions, i.e., $f =: f(\theta, x)$, where the optimization is performed by adjusting the parameter vector $\theta \in \Theta$. Then, given $f(\theta, x)$, which represents the jpd, the respective one-step transition matrix $P$ can directly be computed via
$$P = \operatorname{diag}(x)^{-1} f(\theta, x), \tag{9}$$
where $\operatorname{diag}(x)$ is the diagonal matrix with the entries of $x$ on its diagonal. Unfortunately, even when using parametric families of functions for $f$, in most cases controlling the value of $\lambda_2$ from $\operatorname{diag}(x)^{-1} f(\theta, x)$ will be difficult. For reversible Markov chains one could compute upper bounds via Cheeger's and Poincaré's inequalities [19], yet we will work with a direct construction scheme for $f(\theta, x)$ for which it is possible to obtain exact control of $\lambda_2$.

Efficient Switching by Considering Information Delay
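In the defined framework it is possible to construct an aperiodic irreducible HDMC with state space $E \subseteq PS_1$ that satisfies $\varepsilon$-convergence as well as the equilibrium condition while reducing costs at the same time.
To do so, we first set the parameters
• $k \in \mathbb{N}$: the information delay
• $\varepsilon > 0$: the maximum deviation from the steady state distribution after $k$ rounds of game play when an arbitrary initial state $X_0$ is given. W.l.o.g. we assume $X_0$ is the last instantiation of the process which is known to the attacker.
In the next step we compute the optimal solution $x^*$ for $x$ from (3). Then, using $x^*$, we only include those pure strategies $i \in PS_1$ in our framework for which $x_i^* > 0$ holds in the solution of (3). Excluding zero probability states is necessary, as otherwise the transition matrix that we will construct is not positive, which is a necessary condition in Theorem 2. W.l.o.g. let the zero entries of $x^*$ be $x^*_{\ell+1}, \ldots, x^*_n$, and denote the included strategies by $E = \{1, \ldots, \ell\}$, $1 \leq \ell \leq n$, and their probability vector by $\tilde{x}^* = (\tilde{x}^*_1, \ldots, \tilde{x}^*_\ell)^T$. For the class of functions $f$ we choose the following family that depends on one parameter $\theta \in [0, 1)$ and the probability vector $\tilde{x}^*$:
$$f_{ij}(\theta, \tilde{x}^*) = \begin{cases} \theta \, \tilde{x}^*_i + (1 - \theta) (\tilde{x}^*_i)^2 & \text{if } i = j, \\ (1 - \theta) \, \tilde{x}^*_i \tilde{x}^*_j & \text{if } i \neq j. \end{cases} \tag{10}$$
For $f$ one can easily verify that it is a joint probability distribution whose row and column sums both equal $\tilde{x}^*$. Now observe that by the definition of $f$ in (10), statement (9) is equivalent to
$$P(\theta) = \theta \cdot I + (1 - \theta) \cdot \mathbf{1} \cdot (\tilde{x}^*)^T,$$
where $I$ denotes the identity matrix and $\mathbf{1} \in \mathbb{R}^{\ell}$ is the vector of all 1s. Note that by strict positivity of $(1 - \theta)$ the constructed HDMC with one-step transition matrix $P(\theta)$ is aperiodic and irreducible. Furthermore, the so constructed Markov chain has the limiting state $\tilde{x}^*$, i.e., the limiting marginal distribution of the chain is the Nash equilibrium solution from (3). In this setting it can easily be verified that the largest eigenvalue of $P(\theta)$ is 1 and that all remaining eigenvalues are $\theta$:
$$\lambda_1 = 1, \qquad \lambda_2 = \ldots = \lambda_\ell = \theta.$$
Proof. The characteristic polynomial of $P(\theta)$ is
$$\det\left( P(\theta) - \lambda I \right) = \det\left( Q - (\lambda - \theta) I \right), \qquad Q := (1 - \theta) \cdot \mathbf{1} \cdot (\tilde{x}^*)^T.$$
Thus, in order to determine the eigenvalues $\lambda_i$ of $P(\theta)$ we need to determine the eigenvalues $\tilde{\lambda}_i$ of $Q$, and it holds (see Footnote 2) that $\lambda_i = \tilde{\lambda}_i + \theta$. As $Q$ consists of equal rows $(1 - \theta) \cdot (\tilde{x}^*_1, \tilde{x}^*_2, \ldots, \tilde{x}^*_\ell)$, the rank of $Q$ is one. Thus, there exists only one eigenvalue $\tilde{\lambda}_1$ of $Q$ which is not 0. As the trace of $Q$ is the sum of its eigenvalues, $\tilde{\lambda}_1 = \operatorname{tr}(Q) = (1 - \theta) \sum_{i=1}^{\ell} \tilde{x}^*_i = 1 - \theta$ and $\tilde{\lambda}_2 = \ldots = \tilde{\lambda}_\ell = 0$. Hence $\lambda_1 = 1$ and $\lambda_2 = \ldots = \lambda_\ell = \theta$.
Footnote 2. Note that the eigenvectors of $P(\theta)$ and $\mathbf{1} \cdot (\tilde{x}^*)^T$ are equal, as rescaling and adding a multiple of $I$ does not alter the eigenvectors.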
[…] or, by deletion of the zero entries in $x^*$ and the corresponding rows in $A$: […] Expression (16) states that the total cost is guaranteed to be reduced if the switching cost reduction is larger than the costs incurred by deviating from $x^*$. We will now show how $\theta$ can be chosen with respect to a given maximum deviation from independence $\varepsilon > 0$ in order to ensure a total cost reduction. First, assume that (4) holds; a sufficient criterion for this is $c \cdot |\theta|^k \leq \varepsilon$. By (7), $|c|$ […]. Then, for the resulting conditional probability vector with information delay, it holds for all $j \in E$ and for all $\kappa \geq k$: […] and, as the maximum deviation from independence is $\varepsilon$, we have […]. Thus, we have proven the following theorem:
Theorem 5 (Cost reduction). […]
This implies that, if $\theta$ and $\varepsilon$ satisfy condition (18), then the total costs are reduced. The following example illustrates the results.
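The construction can be sketched in a few lines of Python. The sketch assumes the one-parameter family that corresponds to the transition matrix P(θ) = θ·I + (1 − θ)·1·xᵀ, i.e., a joint distribution f with f_ij = θ·x_i·[i = j] + (1 − θ)·x_i·x_j; the function names, the probability vector, θ and the cost matrix S are hypothetical illustrations.

```python
def joint_distribution(theta, x):
    """Assumed family: f_ij = theta * x_i * [i == j] + (1 - theta) * x_i * x_j."""
    n = len(x)
    return [[(theta * x[i] if i == j else 0.0) + (1 - theta) * x[i] * x[j]
             for j in range(n)] for i in range(n)]

def transition_matrix(f, x):
    """P = diag(x)^{-1} f: divide row i of the joint distribution by x_i."""
    n = len(x)
    return [[f[i][j] / x[i] for j in range(n)] for i in range(n)]

def switching_cost(f, S):
    """Expected switching cost: sum over i, j of s_ij * Pr(X_{t-1}=i, X_t=j)."""
    n = len(f)
    return sum(S[i][j] * f[i][j] for i in range(n) for j in range(n))

# Hypothetical inputs: strictly positive equilibrium, inertia theta, costs S
x = [0.5, 0.3, 0.2]
theta = 0.4
S = [[0.0, 2.0, 3.0],
     [1.0, 0.0, 2.0],
     [1.0, 1.0, 0.0]]

f = joint_distribution(theta, x)
P = transition_matrix(f, x)
row_marginals = [sum(row) for row in f]
col_marginals = [sum(f[i][j] for i in range(3)) for j in range(3)]
independent_cost = switching_cost(joint_distribution(0.0, x), S)  # SCM case
delayed_cost = switching_cost(f, S)
```

With θ = 0 the family reduces to the independent switching of the SCM; since the diagonal of S is zero, the expected switching cost under inertia θ is (1 − θ) times the SCM switching cost, here 0.618 instead of 1.03.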

Example and Sensitivity Analysis
Consider the following two-player zero-sum matrix game with switching costs. We define the parameters […]. For simplicity it is assumed that the scalarization constants are already included in the payoff matrices. Furthermore, assume that the information delay is $k = 3$.
The standard equilibrium (we will denote it as $x$), if we only consider $A$, is $x = (0.511, 0.311, 0.178)^T$ with a value of 6.0889. Switching strategies independently, as is assumed in standard models, would however incur an extra cost of 0.88 per round, which yields 6.9689 in total. Now, computing $x^*$ from (3) yields $x^* = (0.2, 0, 0.8)^T$. The switching costs are 0.32 and the maximum damage caused by the adversary is 6.4. This yields average costs of 6.72 in each repetition of the game. Note that this solution only includes the first and the last strategy in $PS_1$.
We will lower this average cost by taking into account the adversary's inertia, which is represented by the information delay of $k = 3$ rounds. As mentioned before, in the information delay model we only include those strategies $j$ where $x^*_j > 0$, hereafter denoted as $\tilde{x}^* = (0.2, 0.8)^T$. […] $\theta \approx 0.149$ and $\varepsilon = \frac{4 \cdot \theta}{225} \approx 0.00265$, i.e., if the adversary knows the initial state, the individual conditional probabilities $P^k(i, j)$ after 3 or more steps will differ from each component of $\tilde{x}^*$ by about a quarter of a percentage point at most. Thus, the total average cost incurred is 6.67497. The switching costs were reduced by 14.9% in each round, while the maximum damage caused by an adversary was only increased by 0.04140625%. Hence, taking into account the adversary's inertia can cause a dramatic cost reduction, while still ensuring almost the same security.
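The reported figures can be re-derived with a short arithmetic check. The sketch assumes, as the reported numbers suggest, that the switching cost scales by the factor (1 − θ) and that the worst-case damage rises by ε relative to the switching cost model; the variable names are illustrative.

```python
theta = 0.149            # inertia parameter reported in the example
scm_switching = 0.32     # switching cost of x* under independent switching
scm_damage = 6.4         # maximum damage under x*

epsilon = 4 * theta / 225                   # maximum deviation from independence
switching = (1 - theta) * scm_switching     # switching cost with inertia
damage = scm_damage + epsilon               # assumption: damage rises by epsilon
total = damage + switching

# The percentage quoted in the text corresponds to epsilon rounded to 0.00265
relative_damage_increase = round(epsilon, 5) / scm_damage * 100
scm_total = scm_damage + scm_switching      # 6.72 under the switching cost model
```

Under these assumptions the total of about 6.67497 lies below the 6.72 of the switching cost model, which matches the comparison made in the example.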
Remark 1. Another possibility to find an admissible $\theta$ is to apply the bisection method to $\theta$ until the maximum deviation of the entries of $P^3$ from $\tilde{x}^*$ is smaller than a predetermined $\varepsilon$. As an example we chose $\varepsilon = 0.005$ and obtained $\theta = 0.187$, which yields even lower switching costs (0.26016). We obtained
$$P = \begin{pmatrix} 0.3496 & 0.6504 \\ 0.1626 & 0.8374 \end{pmatrix}, \qquad P^3 = \begin{pmatrix} 0.205231 & 0.794769 \\ 0.198692 & 0.801308 \end{pmatrix}.$$
In this case, the average value of $u_1$ is 6.405228 and the total cost per round is 6.665388.
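The bisection idea of the remark can be sketched as follows, assuming the transition matrix P(θ) = θ·I + (1 − θ)·1·xᵀ and reading "maximum deviation" as the largest absolute entrywise difference between P^k and x. The stopping tolerance and the function names are assumptions; the source does not state which tolerance or deviation measure was used.

```python
def transition_matrix(theta, x):
    """P(theta) = theta * I + (1 - theta) * 1 * x^T."""
    n = len(x)
    return [[(theta if i == j else 0.0) + (1 - theta) * x[j]
             for j in range(n)] for i in range(n)]

def mat_pow(P, k):
    """k-step transition matrix P^k for k >= 1."""
    n = len(P)
    result = P
    for _ in range(k - 1):
        result = [[sum(result[i][r] * P[r][j] for r in range(n))
                   for j in range(n)] for i in range(n)]
    return result

def max_deviation(theta, x, k):
    """Largest entrywise gap between P(theta)^k and the limiting state x."""
    Pk = mat_pow(transition_matrix(theta, x), k)
    n = len(x)
    return max(abs(Pk[i][j] - x[j]) for i in range(n) for j in range(n))

def largest_admissible_theta(x, k, eps, tol=1e-9):
    """Bisection on theta in [0, 1]; the deviation grows with theta."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_deviation(mid, x, k) <= eps:
            lo = mid
        else:
            hi = mid
    return lo

x = [0.2, 0.8]
theta_hat = largest_admissible_theta(x, k=3, eps=0.005)

# Matrices and switching cost for the theta reported in the remark
P = transition_matrix(0.187, x)
P3 = mat_pow(P, 3)
switching = (1 - 0.187) * 0.32
```

Evaluated at θ = 0.187, the sketch reproduces the matrices P and P³ and the switching cost 0.26016 given in the remark. Under the entrywise reading of the deviation, the bisection itself stops near θ ≈ 0.184, slightly below the reported 0.187, so the reported value presumably rests on a different tolerance or deviation measure.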

Minimizing the Total Cost
If one wishes not only to find a way to efficiently implement the Nash equilibrium solution $x^*$ from (3), but also allows for other ergodic states $\pi \neq x^*$ to minimize the total cost while still ensuring $\varepsilon$-convergence after $k$ steps, one can rewrite the utility function (13) using the transition matrix $P$: […] Here, $k$ is the information delay (an input parameter), and $\pi \in \mathbb{R}^{\ell}$ is an $\ell \times 1$ probability vector over $E = \{1, \ldots, \ell\} \subseteq PS_1$ that only includes non-zero-probability strategies from $PS_1$. $\tilde{A}$ and $\tilde{S}$ likewise denote the payoff and switching cost matrices cleaned from zeros, i.e., only including the strategies from $E$. For $\pi$, the transition matrix is defined as $P = \theta \cdot I + (1 - \theta) \cdot \mathbf{1} \cdot \pi^T$. The global optimum can then be found by solving the following optimization problem: […] subject to $c \cdot \theta^k \leq \varepsilon$, $\theta \in [0, 1)$, $\pi_j > 0 \; \forall j \in E$, $\sum_{j \in E} \pi_j = 1$. The optimization is performed in the following way: first, it is decided which strategies from $PS_1$ to include in $E$, i.e., we choose a subset of strategies from $PS_1$ and w.l.o.g. denote them $\{1, \ldots, \ell\}$. Then, the matrix $A$ is reduced to $\tilde{A} \in \mathbb{R}^{\ell \times m}$, which is obtained by deleting all rows of strategies that are not included in $E$. Analogously, $\tilde{S} \in \mathbb{R}^{\ell \times \ell}$ is obtained by deleting all columns and rows of strategies $\notin E$. Having obtained $\tilde{A}$ and $\tilde{S}$ and using the spectral representation of $P^k$, we can reformulate (20): […] subject to $c \cdot \theta^k \leq \varepsilon$, $\theta \in [0, 1)$, $\pi_j > 0 \; \forall j \in E$, $\sum_{j \in E} \pi_j = 1$. Note that all $B_i$, $i = 2, \ldots, \ell$, depend continuously on $\pi$.

Discussion
It is interesting to note that despite optimality-by-design, some short-term deviations from an equilibrium can indeed be rewarding. Extending the concepts put forth in this work to dynamic games (e.g., leader-follower scenarios) is a natural next step. The methods used here also lend themselves to a treatment of continuous time chains, as limits of sequences of discrete chains with vanishing pauses in the limit. For practical matters, our work can provide a tool to fix implausible or impractical equilibria, by avoiding "hectic" changes if the equilibrium is mixed, while retaining a good security-investment trade-off. In the end, reinvesting the saved cost in additional security measures will yield even more security at the same cost.
At first glance our results seem to contrast with earlier findings. For example, Reference [20] states that a defending player may actually benefit from revealing information about the defense strategy to the adversary, and Reference [21] suggests that centrally allocating resources and publicly announcing the defensive allocation yields higher success probabilities for a defender. Both approaches deal with publicly announcing defense strategies to influence potential attackers. This is different from our situation, as we do not consider influencing the attacker (neither by providing potentially misleading information nor by hiding information). Rather, we investigate how players behave if an information delay is part of the setting of the game, i.e., if it needs to be taken into account due to the situation at hand.