
A Complete Analysis on the Risk of Using Quantal Response: When Attacker Maliciously Changes Behavior under Uncertainty

1 Department of Computer and Information Science, University of Oregon, Eugene, OR 97403, USA
2 College of Information Sciences and Technology, Pennsylvania State University, State College, PA 16801, USA
* Authors to whom correspondence should be addressed.
Games 2022, 13(6), 81; https://doi.org/10.3390/g13060081
Submission received: 2 November 2022 / Revised: 22 November 2022 / Accepted: 23 November 2022 / Published: 2 December 2022
(This article belongs to the Special Issue Game-Theoretic Analysis of Network Security and Privacy)

Abstract

In security games, the defender often has to predict the attacker's behavior based on some observed attack data. However, a clever attacker can intentionally change its behavior to mislead the defender's learning, leading to an ineffective defense strategy. This paper investigates the attacker's imitative behavior deception under uncertainty, in which the attacker mimics a (deceptive) Quantal Response behavior model by consistently playing according to a certain parameter value of that model, given that it is uncertain about the defender's actual learning outcome. We have three main contributions. First, we introduce a new maximin-based algorithm to compute a robust attacker deception decision under uncertainty, given that the defender is unaware of the attacker's deception. Our polynomial-time algorithm is built by characterizing the decomposability of the attacker's deception space as well as the attacker's optimal deception behavior against the worst case of uncertainty. Second, we propose a new counter-deception algorithm to tackle the attacker's deception. We theoretically show that there is a universal optimal defense solution, regardless of any private knowledge the defender has about the relation between their learning outcome and the attacker's deception choice. Third, we conduct extensive experiments in various security game settings, demonstrating the effectiveness of our proposed counter-deception algorithms in handling the attacker's manipulation.

1. Introduction

In many real-world security domains, security agencies (defender) attempt to predict the attacker’s future behavior based on some collected attack data, and use the prediction result to determine effective defense strategies. A lot of existing work in security games has thus focused on developing different behavior models of the attacker [1,2,3]. Recently, the challenge of playing against a deceptive attacker has been studied, in which the attacker can manipulate the attack data (by changing its behavior) to fool the defender, making the defender learn a wrong behavior model of the attacker [4]. Such deceptive behavior by the attacker can lead to an ineffective defender strategy.
A key limitation in existing work is the assumption that the defender has full access to the attack data, which means the attacker knows exactly what the learning outcome of the defender would be. However, in many real-world domains, the defender often has limited access to the attack data, e.g., in wildlife protection, park rangers typically cannot find all the snares laid out by poachers in entire conservation areas [5]. As a result, the learning outcome the defender obtains (with limited attack data) may be different from the deception behavior model that the attacker commits to. Furthermore, the attacker (and the defender) may have imperfect knowledge about the relation between the deception choice of the attacker and the actual learning outcome of the defender.
We address this limitation by studying the challenge of attacker deception given such uncertainty. We consider a security game model in which the defender adopts Quantal Response (QR), a well-known behavior model in economics and game theory [2,6,7], to predict the attacker's behavior, where the model parameter $\lambda \in \mathbb{R}$ is trained based on some attack data. On the other hand, the attacker plays deceptively by mimicking a QR model with a different value of λ, denoted by λ dec. In this work, we incorporate the deception-learning uncertainty into this game model, where the learning outcome of the defender (denoted by λ learnt) can be any value within a range centered at λ dec.
We provide the following key contributions. First, we present a new maximin-based algorithm to compute an optimal robust deception strategy for the attacker. At a high level, our algorithm works by maximizing the attacker's utility under the worst case of uncertainty. The problem comprises three nested optimization levels and is not straightforward to solve. We thus propose an alternative single-level optimization problem based on partial discretization. Despite this simplification, the resulting optimization is still challenging to solve due to the non-convexity of the attacker's utility and the dependence of the uncertainty set on λ dec. By exploiting the decomposability of the deception space and the monotonicity of the attacker's utility, we show that the alternative relaxed problem can be solved optimally in polynomial time. The idea is to decompose the problem into a polynomial number of sub-problems (according to the decomposition of the deception space); each sub-problem can be solved in polynomial time because the attacker's optimal deception decision within each sub-space is shown to be one of the extreme points of that sub-space, despite the non-convexity of the sub-problem.
Second, we propose a new counter-deception algorithm, which generates an optimal defense function that outputs a defense strategy for each possible (deceptive) learning outcome. Our key finding is that there is a universal optimal defense function for the defender, regardless of any additional information they have about the relation between their learning outcome and the deception choice of the attacker (besides the common knowledge that the learning outcome is within a range around the deception choice). Importantly, this optimal defense function, which can be determined by solving a single non-linear program, only outputs two different strategies despite the infinite-sized learning outcome space. Our counter-deception algorithm is built on an extensive, in-depth analysis of intrinsic characteristics of the attacker's adaptive deception response to any deception-aware defense solution. That is, under our proposed defense mechanism, the attacker's deception space remains decomposable (although the sub-spaces vary depending on the counter-deception mechanism) and the attacker's optimal deception remains one of the extreme points of the deception sub-spaces.
Third, we conduct extensive experiments to evaluate our proposed algorithms in various security game settings with different numbers of targets, various ranges of defender capacity, different levels of attacker uncertainty, and finally, different correlations between players' payoffs. Our results show that (i) despite the uncertainty, the attacker still obtains a significantly higher utility by playing deceptively when the defender is unaware of the attacker's deception; and (ii) the defender can substantially diminish the impact of the attacker's deception by following our counter-deception algorithm.

Outline of the Article

We outline the rest of our article as follows. We discuss the Related Work and Background in Section 2 and Section 3. In Section 4, we present our detailed theoretical analysis of the attacker behavior deception under uncertainty about the defender's learning outcome, given that the defender is unaware of the attacker's deception. In Section 5, we describe our new counter-deception algorithm for the defender to tackle the attacker's manipulation. In that section, we first extend the theoretical results of Section 4 to analyze how the attacker adapts its manipulation to the defender's counter-deception. Based on this result, we then provide theoretical results on computing the defender's optimal counter-deception. In Section 6, we present our experimental results evaluating our proposed algorithms. Finally, Section 7 concludes our article.

2. Related Work

Parameterized models of attacker behavior such as Quantal Response, and other machine learning models, have been studied for Stackelberg security games (SSGs) [5,8,9]. These models provide general techniques for modeling the attacker's decision making. Prior work assumes that the attacker always plays truthfully. Thus, existing algorithms for generating defense strategies would be vulnerable to deceptive attacks by an attacker who is aware of the defender's learning. Our work addresses such a strategic deceptive attacker by planning counter-deception defense strategies.
Deception is widely studied in security research [10,11,12,13,14,15]. In the SSG literature, a lot of prior work has studied deception by the defender, i.e., the defender exploits their knowledge regarding uncertainties to mislead the attacker's decision making [16,17,18,19]. Recently, deception on the attacker's side has been studied. Existing work focuses on situations in which the defender is uncertain about the attacker type [20,21,22]. Some works study the attacker behavior deception problem [4,23]; they assume that the attacker knows the learning outcome exactly, whereas in our problem the attacker is uncertain about that learning outcome.
Our work is also related to poisoning attacks in adversarial machine learning, in which an adversary can contaminate the training data to mislead ML algorithms [24,25,26,27]. Existing work in adversarial learning uses prediction accuracy as the measure for analyzing such attacks, while in our game setting, the players' final goal is to optimize their utilities given some learning outcome.

3. Background

Stackelberg Security Games (SSGs). There is a set of $T = \{1, 2, \ldots, T\}$ targets that a defender has to protect using $L < T$ security resources. A pure strategy of the defender is an allocation of these L resources over the T targets. A mixed strategy of the defender is a probability distribution over all pure strategies. In this work, we consider the no-scheduling-constraint game setting, in which each defender mixed strategy can be compactly represented as a coverage vector $x = \{x_1, x_2, \ldots, x_T\}$, where $x_t \in [0, 1]$ is the probability that the defender protects target t and $\sum_t x_t \le L$ [28]. We denote by X the set of all defense strategies. In SSGs, the defender plays first by committing to a mixed strategy, and the attacker responds against this strategy by choosing a single target to attack.
When the attacker attacks target t, it obtains a reward $R^a_t$ while the defender receives a penalty $P^d_t$ if the defender is not protecting that target. Conversely, if the defender is protecting t, the attacker gets a penalty $P^a_t < R^a_t$ while the defender receives a reward $R^d_t > P^d_t$. The expected utilities of the defender, $U^d_t(x_t)$, and of the attacker, $U^a_t(x_t)$, if the attacker attacks target t, are computed as follows:
$U^d_t(x_t) = x_t R^d_t + (1 - x_t) P^d_t$
$U^a_t(x_t) = x_t P^a_t + (1 - x_t) R^a_t$
Quantal Response Model ( QR ). QR is a well-known behavioral model used to predict boundedly rational (attacker) decision making in security games [2,6,7]. Essentially, QR predicts the probability that the attacker attacks each target t using the softmax function:
$q_t(x, \lambda) = \frac{e^{\lambda U^a_t(x_t)}}{\sum_{t'} e^{\lambda U^a_{t'}(x_{t'})}}$
where λ is the parameter that governs the attacker's rationality. When λ = 0, the attacker attacks every target uniformly at random. When λ = +∞, the attacker is perfectly rational. Given that the attacker follows QR, the defender's and attacker's expected utilities are computed as expectations over all targets:
$U^d(x, \lambda) = \sum_t q_t(x, \lambda)\, U^d_t(x_t)$
$U^a(x, \lambda) = \sum_t q_t(x, \lambda)\, U^a_t(x_t)$
The attacker’s utility U a ( x , λ ) was proved to be increasing in λ  [4]. We leverage this monotonicity property to analyze the attacker’s deception. In SSG s, the defender can learn λ based on some collected attack data, denoted by λ learnt , and find an optimal strategy which maximizes their expected utility accordingly:
$\max_{x \in X} U^d(x, \lambda^{learnt})$
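To make these quantities concrete, the following is a minimal Python sketch of the QR computations and of the defender's best response to a learnt λ. It is our own illustration, not the authors' implementation: the function names are ours, and a generic multi-start non-linear solver stands in for whatever method is used to handle the non-convex defender problem.

```python
import numpy as np
from scipy.optimize import minimize

def attacker_target_utility(x, Ra, Pa):
    # U^a_t(x_t) = x_t * P^a_t + (1 - x_t) * R^a_t
    return x * Pa + (1 - x) * Ra

def defender_target_utility(x, Rd, Pd):
    # U^d_t(x_t) = x_t * R^d_t + (1 - x_t) * P^d_t
    return x * Rd + (1 - x) * Pd

def qr_attack_probs(x, lam, Ra, Pa):
    # Quantal Response: softmax over the attacker's target utilities.
    ua = attacker_target_utility(x, Ra, Pa)
    w = np.exp(lam * (ua - ua.max()))          # shift by max for numerical stability
    return w / w.sum()

def expected_utilities(x, lam, Ra, Pa, Rd, Pd):
    # Returns (U^d(x, lam), U^a(x, lam)).
    q = qr_attack_probs(x, lam, Ra, Pa)
    return q @ defender_target_utility(x, Rd, Pd), q @ attacker_target_utility(x, Ra, Pa)

def defender_best_response(lam, Ra, Pa, Rd, Pd, L, restarts=20, seed=0):
    # max_x U^d(x, lam) s.t. 0 <= x_t <= 1, sum_t x_t <= L (non-convex; multi-start heuristic).
    T = len(Ra)
    rng = np.random.default_rng(seed)
    budget = {'type': 'ineq', 'fun': lambda x: L - x.sum()}
    best_x, best_u = None, -np.inf
    for _ in range(restarts):
        x0 = rng.uniform(0.0, 1.0, T)
        x0 *= min(1.0, L / x0.sum())
        res = minimize(lambda x: -expected_utilities(x, lam, Ra, Pa, Rd, Pd)[0],
                       x0, bounds=[(0.0, 1.0)] * T, constraints=[budget])
        if res.success and -res.fun > best_u:
            best_u, best_x = -res.fun, res.x
    return best_x, best_u
```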

4. Attacker Behavior Deception under Unknown Learning Outcome

We first study the problem of imitative behavior deception in a security scenario in which the attacker does not know exactly the defender's learning outcome. Formally, if the attacker plays according to a particular parameter value of QR, denoted by λ dec, the learning outcome of the defender can be any value within the interval $[\max\{\lambda^{dec} - \delta, 0\}, \lambda^{dec} + \delta]$, where δ > 0 represents the extent to which the attacker is uncertain about the learning outcome of the defender. We term this interval the uncertainty range of λ dec. We are particularly interested in the research question:
Given uncertainty about learning outcomes of the defender, can the attacker still benefit from playing deceptively?
In this section, we consider the scenario in which the attacker plays deceptively while the defender does not take into account the prospect of the attacker's deception. We aim at analyzing the attacker's deception decision in this no-counter-deception scenario. We assume that the attacker plays deceptively by mimicking any λ dec within the range $[0, \lambda^{max}]$. We consider λ ≥ 0 as this is the widely accepted range of the attacker's bounded rationality in the literature. The value λ max represents the limit to which the attacker plays deceptively. When $\lambda^{max} \to +\infty$, the deception range of the attacker covers the whole range of λ. We examine the impact of λ max on the deception outcome of the attacker later in our experiments. Given uncertainty about the learning outcome of the defender, the attacker attempts to find the optimal $\lambda^{dec} \in [0, \lambda^{max}]$ to imitate that maximizes its utility in the worst-case scenario of uncertainty, which can be formulated as follows:
$(P^{dec}): \max_{\lambda^{dec}} \min_{\lambda^{learnt}} \; U^a(x(\lambda^{learnt}), \lambda^{dec})$
$\text{s.t.} \quad \lambda^{dec} \in [0, \lambda^{max}]$
$\qquad \max\{\lambda^{dec} - \delta, 0\} \le \lambda^{learnt} \le \lambda^{dec} + \delta$
$\qquad x(\lambda^{learnt}) \in \arg\max_{x \in X} U^d(x, \lambda^{learnt})$
where x(λ learnt) is the defender's optimal strategy w.r.t. their learning outcome λ learnt. The objective $U^a(x(\lambda^{learnt}), \lambda^{dec})$ is the attacker's utility when the defender plays x(λ learnt) and the attacker plays according to QR with λ dec (see Equations (1)–(3) for the detailed computation). In addition, $U^d(x, \lambda^{learnt})$ is the defender's expected utility that the defender aims to maximize, where x is the defender's strategy and λ learnt is the learning outcome of the defender regarding the attacker's behavior. Essentially, the last constraint of $(P^{dec})$ ensures that the defender will play an optimal defense strategy according to their learning outcome. Finally, due to potential noise in learning, the defender's learning outcome λ learnt may fall outside of the deception range of the attacker, which is captured by our constraint that $\lambda^{learnt} \le \lambda^{dec} + \delta$.

4.1. A Polynomial-Time Deception Algorithm

The optimization problem $(P^{dec})$ involves three nested optimization levels and is not straightforward to solve. We thus propose to limit the possible learning outcomes of the defender by discretizing the domain of λ learnt into a finite set $\Lambda^{learnt}_{discrete} = (\lambda^{learnt}_1, \lambda^{learnt}_2, \ldots, \lambda^{learnt}_K)$ where $\lambda^{learnt}_1 = 0$, $\lambda^{learnt}_K = \lambda^{max} + \delta$, and $\lambda^{learnt}_{k+1} - \lambda^{learnt}_k = \eta$ for all $k < K$, where $\eta > 0$ is the discretization step size and $K = \frac{\lambda^{max} + \delta}{\eta} + 1$ is the number of discrete learning values. For each deception choice λ dec, the attacker's uncertainty set of the defender's possible learning outcomes λ learnt is now given by:
$\Lambda^{learnt}_{discrete}(\lambda^{dec}) = \Lambda^{learnt}_{discrete} \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta]$
For each λ k learnt , we can easily compute the corresponding optimal defense strategy x ( λ k learnt ) in advance [2]. We thus obtain a simplified optimization problem:
$(P^{dec}_{discrete}): \max_{\lambda^{dec} \in [0, \lambda^{max}]} \; U$
$\text{s.t.} \quad U \le U^a(x(\lambda^{learnt}_k), \lambda^{dec}), \;\text{for all } \lambda^{learnt}_k \in \Lambda^{learnt}_{discrete}(\lambda^{dec})$
where U is the attacker's maximin utility under the worst-case learning outcome.
Remark on computational challenge. Although $(P^{dec}_{discrete})$ is a single-level optimization problem, solving it is still challenging because (i) it is non-convex, since the attacker's utility $U^a(x(\lambda^{learnt}_k), \lambda^{dec})$ is non-convex in λ dec; and (ii) the number of inequality constraints in $(P^{dec}_{discrete})$ varies with λ dec, which complicates the problem further. By exploiting the decomposability of the deception space $[0, \lambda^{max}]$ and the monotonicity of the attacker's utility function $U^a(x(\lambda^{learnt}_k), \lambda^{dec})$, we show that $(P^{dec}_{discrete})$ can be solved optimally in polynomial time.
Theorem 1
(Time complexity). $(P^{dec}_{discrete})$ can be solved optimally in polynomial time.
Overall, the proof of Theorem 1 is based on (i) Lemma 1, which shows that the deception space can be divided into O(K) sub-intervals, each of which leads to a single uncertainty set; and (ii) Lemma 4, which shows that $(P^{dec}_{discrete})$ can be divided into O(K) sub-problems corresponding to the decomposition of the deception space (as shown in Lemma 1), each of which can be solved in polynomial time.

4.1.1. Decomposability of Deception Space

In the following, we first present our theoretical analysis on the decomposability of the deception space. We then describe in detail our decomposition algorithm.
Lemma 1
(Decomposability of deception space). The attacker deception space $[0, \lambda^{max}]$ can be decomposed into a finite number of disjoint sub-intervals, denoted by $int^{dec}_j$, $j = 1, 2, \ldots$, with $int^{dec}_j \cap int^{dec}_{j'} = \emptyset$ for all $j \ne j'$ and $\cup_j int^{dec}_j = [0, \lambda^{max}]$, such that every $\lambda^{dec} \in int^{dec}_j$ leads to the same uncertainty set of learning outcomes, denoted by $\Lambda^{learnt}_j \subseteq \Lambda^{learnt}_{discrete}$. Furthermore, these sub-intervals and uncertainty sets $(int^{dec}_j, \Lambda^{learnt}_j)$ can be found in polynomial time.
The proof of Lemma 1 is derived from Lemmas 2 and 3. An example of the deception-space decomposition is illustrated in Figure 1. Intuitively, although the deception space $[0, \lambda^{max}]$ is infinite, the total number of possible learning-outcome uncertainty sets is at most $2^K$ (i.e., the number of subsets of the discrete learning space $\Lambda^{learnt}_{discrete}$). Therefore, the deception space can be divided into a finite number of disjoint subsets such that any deception value λ dec within each subset leads to the same uncertainty set. Moreover, each of these deception subsets forms a sub-interval of $[0, \lambda^{max}]$, which is derived from Lemma 2:
Lemma 2.
Given two deception values $\lambda^{dec}_1 < \lambda^{dec}_2 \in [0, \lambda^{max}]$, if the learning uncertainty sets corresponding to these two values are the same, i.e., $\Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$, then for any deception value $\lambda^{dec}_1 < \lambda^{dec} < \lambda^{dec}_2$, its uncertainty set is also the same, that is:
$\Lambda^{learnt}_{discrete}(\lambda^{dec}) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$
The remaining analysis for Lemma 1 is to show that these deception sub-intervals can be found in polynomial time, which is obtained based on Lemma 3:
Lemma 3.
For each learning outcome λ k learnt , there are at most two deception sub-intervals such that λ k learnt is the smallest learning outcome in the corresponding learning uncertainty set. As a result, the total number of deception sub-intervals is O ( K ) , which is polynomial.
Since there are O(K) deception sub-intervals, we can now develop a polynomial-time algorithm (Algorithm 1) which iteratively divides the deception range $[0, \lambda^{max}]$ into multiple intervals, denoted by $\{int^{dec}_j\}_j$. Each of these intervals, $int^{dec}_j$, corresponds to the same uncertainty set of possible learning outcomes for the defender, denoted by $\Lambda^{learnt}_j$.
In this algorithm, for each λ k learnt, we denote by $lb_k = \lambda^{learnt}_k - \delta$ and $ub_k = \lambda^{learnt}_k + \delta$ the smallest and largest possible values of λ dec such that λ k learnt belongs to the uncertainty set of λ dec. In Algorithm 1, start is the variable representing the left bound of each interval $int^{dec}_j$. The variable open indicates whether $int^{dec}_j$ is left-open (open = true) or not (open = false). If start is known for $int^{dec}_j$, the uncertainty set $\Lambda^{learnt}_j$ can be determined as follows:
$\Lambda^{learnt}_j = \{\lambda^{learnt}_k : \lambda^{learnt}_k \in [start - \delta, start + \delta]\}$ if $int^{dec}_j$ is left-closed
$\Lambda^{learnt}_j = \{\lambda^{learnt}_k : \lambda^{learnt}_k \in (start - \delta, start + \delta]\}$ if $int^{dec}_j$ is left-open
Initially, start is set to 0, which is the lowest possible value of λ dec such that the uncertainty range $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ contains λ 1 learnt, and open = false. Given start and its uncertainty range $[start - \delta, start + \delta]$, the first interval $int^{dec}_1$ of λ dec corresponds to the uncertainty set determined as follows:
$\Lambda^{learnt}_1 = \{\lambda^{learnt}_k \in \Lambda^{learnt} : \lambda^{learnt}_k \in [start - \delta, start + \delta]\}$
At each iteration j, given the left bound start and the uncertainty set $\Lambda^{learnt}_j$ of the interval $int^{dec}_j$, Algorithm 1 determines the right bound of $int^{dec}_j$, the left bound of the next interval $int^{dec}_{j+1}$ (by updating start), and the uncertainty set $\Lambda^{learnt}_{j+1}$ (lines 6–15). Finally, we prove the correctness of Algorithm 1 in Proposition 1, which shows that for any λ dec within each interval $int^{dec}_j$, the corresponding uncertainty range $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ covers the same uncertainty set $\Lambda^{learnt}_j$.
Algorithm 1: Imitative behavior deception—Decomposition of QR parameter domain into sub-intervals
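The pseudocode of Algorithm 1 is provided as a figure in the original article. As an illustration only, the following Python sketch realizes the same decomposition in a simplified way: instead of the start/open bookkeeping of Algorithm 1, it collects the breakpoints $\lambda^{learnt}_k \pm \delta$ at which the uncertainty set can change and probes one deception value inside each resulting piece (open/closed endpoint handling is omitted). All names are ours.

```python
import numpy as np

def uncertainty_set(lam_dec, learn_grid, delta):
    # Indices k with lam_dec - delta <= learn_grid[k] <= lam_dec + delta.
    return tuple(np.flatnonzero((learn_grid >= lam_dec - delta) &
                                (learn_grid <= lam_dec + delta)))

def decompose_deception_space(learn_grid, delta, lam_max):
    """Partition [0, lam_max] into sub-intervals with a constant uncertainty set.

    The uncertainty set can only change where lam_dec - delta or lam_dec + delta
    crosses a grid point, so we sort those crossing points and probe a deception
    value inside each piece.  Returns a list of (left, right, uncertainty_set).
    """
    breaks = {0.0, lam_max}
    for lam_k in learn_grid:
        for b in (lam_k - delta, lam_k + delta):
            if 0.0 < b < lam_max:
                breaks.add(b)
    points = sorted(breaks)
    return [(left, right, uncertainty_set(0.5 * (left + right), learn_grid, delta))
            for left, right in zip(points[:-1], points[1:])]
```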
Proposition 1
(Correctness of Algorithm 1). Each iteration j of Algorithm 1 returns an interval $int^{dec}_j$ such that each $\lambda^{dec} \in int^{dec}_j$ leads to the same uncertainty set:
$\Lambda^{learnt}_j = \{\lambda^{learnt}_{k^{min}_j}, \ldots, \lambda^{learnt}_{k^{max}_j}\}$
The rest of this section will provide details of missing proofs for the aforementioned theoretical results.
Proof of Lemma 2.
For any $\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$, we have:
$\lambda^{dec}_1 - \delta \le \lambda^{learnt} \le \lambda^{dec}_1 + \delta$
$\lambda^{dec}_2 - \delta \le \lambda^{learnt} \le \lambda^{dec}_2 + \delta$
Since $\lambda^{dec} \in (\lambda^{dec}_1, \lambda^{dec}_2)$, we obtain:
$\lambda^{dec} - \delta \le \lambda^{learnt} \le \lambda^{dec} + \delta$
which implies $\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec})$. As a result,
$\Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2) \subseteq \Lambda^{learnt}_{discrete}(\lambda^{dec}) \quad (*)$
On the other hand, let us consider a $\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec})$, or equivalently, $\lambda^{dec} - \delta \le \lambda^{learnt} \le \lambda^{dec} + \delta$. We are going to show that this $\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$ as well. Indeed, suppose for contradiction that $\lambda^{learnt} \notin \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$. Then the following inequalities must hold:
$\lambda^{dec}_1 + \delta < \lambda^{learnt} < \lambda^{dec}_2 - \delta$
which means that the uncertainty ranges with respect to $\lambda^{dec}_1$ and $\lambda^{dec}_2$ do not overlap, i.e., $[\lambda^{dec}_1 - \delta, \lambda^{dec}_1 + \delta] \cap [\lambda^{dec}_2 - \delta, \lambda^{dec}_2 + \delta] = \emptyset$, and hence $\Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \not\equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$, which is a contradiction.
Therefore, $\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2)$, meaning that:
$\Lambda^{learnt}_{discrete}(\lambda^{dec}) \subseteq \Lambda^{learnt}_{discrete}(\lambda^{dec}_1) \equiv \Lambda^{learnt}_{discrete}(\lambda^{dec}_2) \quad (**)$
The combination of (*) and (**) concludes our proof.   □
Proof of Lemma 3.
First, although the deception space $[0, \lambda^{max}]$ is infinite, the total number of possible learning-outcome uncertainty sets is at most $2^K$ (i.e., the number of subsets of the discrete learning space $\Lambda^{learnt}_{discrete}$). Therefore, the deception space can be divided into a finite number of disjoint subsets such that any deception value λ dec within each subset leads to the same uncertainty set. Moreover, each of these deception subsets forms a sub-interval of $[0, \lambda^{max}]$, which is a result of Lemma 2.
Now, in order to prove that the number of disjoint sub-intervals is O(K), we will show that for each learning outcome λ k learnt, there are at most two deception sub-intervals such that λ k learnt is the smallest learning outcome in the corresponding learning uncertainty set. Let us assume there is a deception sub-interval $[\lambda^{dec}_1, \lambda^{dec}_2]$ which leads to an uncertainty set $\{\lambda^{learnt}_k, \lambda^{learnt}_{k+1}, \ldots, \lambda^{learnt}_{k'}\}$ for some $k' \ge k$. We will prove that the following inequalities must hold:
$\frac{2\delta}{\eta} - 2 < k' - k \le \frac{2\delta}{\eta} \quad (4)$
where η is the discretization step size. Indeed, for any $\lambda^{dec} \in [\lambda^{dec}_1, \lambda^{dec}_2]$, we have:
$\lambda^{dec} - \delta \le \lambda^{learnt}_k \le \lambda^{dec} + \delta$
$\lambda^{dec} - \delta \le \lambda^{learnt}_{k'} \le \lambda^{dec} + \delta$
$\lambda^{learnt}_{k-1} < \lambda^{dec} - \delta \;\text{ and }\; \lambda^{learnt}_{k'+1} > \lambda^{dec} + \delta$
Therefore,
$\lambda^{learnt}_{k'} - \lambda^{learnt}_k \le 2\delta \;\Rightarrow\; k' - k \le \frac{2\delta}{\eta}$
$\lambda^{learnt}_{k'+1} - \lambda^{learnt}_{k-1} > 2\delta \;\Rightarrow\; k' - k > \frac{2\delta}{\eta} - 2$
which concludes (4). Now, since the interval $(\frac{2\delta}{\eta} - 2, \frac{2\delta}{\eta}]$ has length two, it contains at most two integers; hence, for every k there are at most two possible values of k', which means that there are at most two deception sub-intervals such that λ k learnt is the smallest learning outcome in their learning uncertainty sets. □
Proof of Proposition 1.
Note that, for each λ k learnt, we denote by $lb_k = \lambda^{learnt}_k - \delta$ and $ub_k = \lambda^{learnt}_k + \delta$ the smallest and largest possible values of λ dec such that λ k learnt belongs to the uncertainty set of λ dec. In addition, $k^{min}_j$ and $k^{max}_j$ are the indices of the smallest and largest learning outcomes in the learnt uncertainty set for every deception value in the j-th deception interval.
At each iteration j, given the j t h learnt uncertainty set, Algorithm 1 attempts to find the corresponding j t h deception interval as well as the next ( j + 1 ) t h learnt uncertainty set. Essentially, Algorithm 1 considers two cases:
  • Case 1: $k^{max}_j < K$ and $lb_{k^{max}_j+1} \le ub_{k^{min}_j}$. This is when (i) the j-th deception interval does not cover the maximum possible learning outcome λ K learnt; and (ii) the smallest deception value w.r.t. the learning outcome $\lambda^{learnt}_{k^{max}_j+1}$ is no greater than the largest deception value w.r.t. the learning outcome $\lambda^{learnt}_{k^{min}_j}$. Intuitively, (ii) implies that the upper bound of the j-th deception interval is strictly less than $lb_{k^{max}_j+1}$. Otherwise, this deception upper bound would correspond to an uncertainty set which covers the learning outcome $\lambda^{learnt}_{k^{max}_j+1}$, contradicting the fact that $\lambda^{learnt}_{k^{max}_j}$ (which is strictly less than $\lambda^{learnt}_{k^{max}_j+1}$) is the maximum learning outcome for the j-th deception interval.
In this case, the interval i n t j dec is determined as follows:
$int^{dec}_j = [start, \; lb_{k^{max}_j+1})$ if $open = false$
$int^{dec}_j = (start, \; lb_{k^{max}_j+1})$ if $open = true$
Note that, since $\Lambda^{learnt}_j$ is the uncertainty set of start with smallest and largest indices $(k^{min}_j, k^{max}_j)$, we have: $lb_{k^{min}_j} \le lb_{k^{max}_j} \le start$ and $ub_{k^{min}_j-1} < start$. Therefore, for any $\lambda^{dec} \in int^{dec}_j$, we obtain:
$lb_{k^{min}_j} \le start \le \lambda^{dec}$ and $\lambda^{dec} < lb_{k^{max}_j+1} \le ub_{k^{min}_j}$
$lb_{k^{max}_j} \le start \le \lambda^{dec}$ and $\lambda^{dec} < ub_{k^{min}_j} \le ub_{k^{max}_j}$
$\lambda^{dec} < lb_{k^{max}_j+1}$ and $\lambda^{dec} \ge start > ub_{k^{min}_j-1}$
which means that $\lambda^{learnt}_{k^{min}_j}$ and $\lambda^{learnt}_{k^{max}_j}$ belong to the uncertainty set of λ dec while $\lambda^{learnt}_{k^{min}_j-1}$ and $\lambda^{learnt}_{k^{max}_j+1}$ do not. Thus, $\Lambda^{learnt}_j$ is the uncertainty set of λ dec. Since $int^{dec}_j$ is right-open, the left bound of $int^{dec}_{j+1}$ is $start = lb_{k^{max}_j+1}$ with $open = false$, and $\Lambda^{learnt}_{j+1}$ is determined accordingly.
  • Case 2: $k^{max}_j = K$ or $lb_{k^{max}_j+1} > ub_{k^{min}_j}$. Note that when $lb_{k^{max}_j+1} > ub_{k^{min}_j}$, the upper bound of the j-th deception interval must be at most $ub_{k^{min}_j}$. This is to ensure that this upper bound still covers the learning outcome $\lambda^{learnt}_{k^{min}_j}$.
In this case, deception interval i n t j dec is determined as follows:
$int^{dec}_j = [start, \; ub_{k^{min}_j}]$ if $open = false$
$int^{dec}_j = (start, \; ub_{k^{min}_j}]$ if $open = true$
The argument for this case is similar. For the sake of analysis, when $k^{max}_j = K$, which is the largest index of λ learnt in the entire set Λ learnt, we set $lb_{k^{max}_j+1} = +\infty$. For any $\lambda^{dec} \in int^{dec}_j$, we have:
$lb_{k^{min}_j} \le start \le \lambda^{dec} \le ub_{k^{min}_j}$
$lb_{k^{max}_j} \le start \le \lambda^{dec} \le ub_{k^{min}_j} \le ub_{k^{max}_j}$
$\lambda^{dec} \le ub_{k^{min}_j} < lb_{k^{max}_j+1}$ and $\lambda^{dec} \ge start > ub_{k^{min}_j-1}$
which implies that $\Lambda^{learnt}_j$ is the uncertainty set of λ dec. Since $int^{dec}_j$ is right-closed, the left bound of $int^{dec}_{j+1}$ is $start = ub_{k^{min}_j}$ with $open = true$, concluding our proof. □

4.1.2. Divide and Conquer: Dividing $(P^{dec}_{discrete})$ into O(K) Polynomially Solvable Sub-Problems

Lemma 4
(Divide-and-conquer). The problem $(P^{dec}_{discrete})$ can be decomposed into O(K) sub-problems $\{(P^{dec}_j)\}$ according to the decomposability of the deception space. Each of these sub-problems can be solved in polynomial time.
Indeed, we can now divide the problem ( P discrete dec ) into multiple sub-problems which correspond to the decomposition of the deception space (Lemma 1). Essentially, each sub-problem optimizes λ dec (and λ learnt ) over the deception sub-interval i n t j dec (and its corresponding uncertainty set Λ j learnt ), as shown in the following:
$(P^{dec}_j): \max_{\lambda^{dec} \in int^{dec}_j} \; U^a$
$\text{s.t.} \quad U^a \le U^a(x(\lambda^{learnt}_k), \lambda^{dec}), \;\forall \lambda^{learnt}_k \in \Lambda^{learnt}_j$
which maximizes the attacker's worst-case utility w.r.t. the uncertainty set $\Lambda^{learnt}_j$. Note that the defender strategies x(λ k learnt) can be pre-computed for every outcome λ k learnt. Each sub-problem $(P^{dec}_j)$ has a constant number of constraints but still remains non-convex. Our Lemma 5 shows that, despite the non-convexity, the optimal solution for $(P^{dec}_j)$ is actually straightforward to compute.
Lemma 5.
The optimal solution of λ dec for each sub-problem, P j dec , is the (right) upper limit of the corresponding deception sub-interval i n t j dec .
This observation is derived from the fact that the attacker's utility, U a(x, λ), is an increasing function of λ [4]. Therefore, in order to solve $(P^{dec}_{discrete})$, we only need to iterate over the right bounds of the intervals $int^{dec}_j$ and select the best j such that the attacker's worst-case utility (i.e., the objective of $(P^{dec}_j)$) is the highest among all sub-intervals. Since there are O(K) sub-problems, $(P^{dec}_{discrete})$ can be solved optimally in polynomial time, concluding our proof for Theorem 1.
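As an illustration of this procedure, the following Python sketch (reusing the hypothetical helpers `decompose_deception_space`, `defender_best_response`, and `expected_utilities` sketched earlier; not the authors' code) pre-computes the defender's strategy for every discrete learning outcome and then evaluates the attacker's worst-case utility at the right bound of each sub-interval:

```python
def solve_attacker_deception(learn_grid, delta, lam_max, Ra, Pa, Rd, Pd, L):
    # Pre-compute the defender's optimal strategy for every discrete learning outcome.
    x_of = [defender_best_response(lam, Ra, Pa, Rd, Pd, L)[0] for lam in learn_grid]

    best_lam, best_worst = None, -np.inf
    for _, right, unc_set in decompose_deception_space(learn_grid, delta, lam_max):
        lam_dec = right                       # Lemma 5: the right bound is optimal
        # Worst case over the learning outcomes the defender may end up with.
        worst = min(expected_utilities(x_of[k], lam_dec, Ra, Pa, Rd, Pd)[1]
                    for k in unc_set)
        if worst > best_worst:
            best_worst, best_lam = worst, lam_dec
    return best_lam, best_worst
```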

4.2. Solution Quality Analysis

We now focus on analyzing the solution quality of our method presented in Section 4.1 for approximately solving the deception problem $(P^{dec})$. Let us denote by $\lambda^{dec}_*$ the optimal solution of $(P^{dec})$ and by $U^a_{worst\text{-}case}(\lambda^{dec}_*)$ the corresponding worst-case utility of the attacker under the uncertainty of learning outcomes in $(P^{dec})$. We also denote by $\lambda^{dec}_{discrete}$ the optimal solution of $(P^{dec}_{discrete})$. Then, Theorem 2 states that:
$U^a_{worst\text{-}case}(\lambda^{dec}_*) \ge U^a_{worst\text{-}case}(\lambda^{dec}_{discrete}) \ge U^a_{worst\text{-}case}(\lambda^{dec}_*) - \epsilon$
Theorem 2.
For any arbitrary ϵ > 0 , there always exists a discretization step size η > 0 such that the optimal solution of the corresponding ( P discrete dec ) is ϵ-optimal for ( P dec ) .
Proof. 
Let us denote by λ * dec the optimal solution of ( P dec ) . Then the worst-case utility of the attacker is determined as follows:
$U^{worst}(\lambda^{dec}_*) = \min_{\lambda^{learnt} \in [\lambda^{dec}_* - \delta, \lambda^{dec}_* + \delta]} U^a(x(\lambda^{learnt}), \lambda^{dec}_*)$
On the other hand, let us denote by λ discrete dec the optimal solution of ( P discrete dec ) . Then the discretized worst-case utility of the attacker is determined as follows:
$U^{worst}_{discrete}(\lambda^{dec}_{discrete}) = \min_{\lambda^{learnt} \in \Lambda^{learnt}_{discrete}(\lambda^{dec}_{discrete})} U^a(x(\lambda^{learnt}), \lambda^{dec}_{discrete})$
Note that, U discrete worst ( λ discrete dec ) is not the actual worst-case utility of the attacker for mimicking λ discrete dec since it is computed based on the discrete uncertainty set, rather than the original continuous uncertainty set. In fact, the actual attacker worst-case utility is U worst ( λ discrete dec ) . We will show that for any ϵ > 0 , there exists a discretization step size η such that:
$U^{worst}(\lambda^{dec}_*) \ge U^{worst}(\lambda^{dec}_{discrete}) \ge U^{worst}(\lambda^{dec}_*) - \epsilon \quad (5)$
Observe that the first inequality holds trivially since $\lambda^{dec}_*$ is the optimal solution of $(P^{dec})$. Therefore, we will focus on the second inequality. First, we obtain the following inequalities:
$U^{worst}(\lambda^{dec}_*) \le U^{worst}_{discrete}(\lambda^{dec}_*) \le U^{worst}_{discrete}(\lambda^{dec}_{discrete})$
The first inequality follows from the fact that the discretized uncertainty set is a subset of the actual continuous uncertainty range, $\Lambda^{learnt}_{discrete}(\lambda^{dec}_*) \subseteq [\lambda^{dec}_* - \delta, \lambda^{dec}_* + \delta]$. The second inequality follows from the fact that $\lambda^{dec}_{discrete}$ is the optimal solution of $(P^{dec}_{discrete})$. Therefore, in order to obtain the second inequality of (5), we are going to prove that for any ϵ > 0, there exists η > 0 such that:
$U^{worst}(\lambda^{dec}_{discrete}) + \epsilon \ge U^{worst}_{discrete}(\lambda^{dec}_{discrete}) \quad (6)$
Let us denote by $\lambda^{learnt}_*$ the worst-case learning outcome with respect to $\lambda^{dec}_{discrete}$ within the uncertainty range $[\lambda^{dec}_{discrete} - \delta, \lambda^{dec}_{discrete} + \delta]$. That is,
$U^{worst}(\lambda^{dec}_{discrete}) = U^a(x(\lambda^{learnt}_*), \lambda^{dec}_{discrete})$
Since $\Lambda^{learnt}_{discrete}(\lambda^{dec}_{discrete})$ is a discretization of $[\lambda^{dec}_{discrete} - \delta, \lambda^{dec}_{discrete} + \delta]$, there exists a $\lambda^{learnt}_k \in \Lambda^{learnt}_{discrete}(\lambda^{dec}_{discrete})$ such that $|\lambda^{learnt}_k - \lambda^{learnt}_*| \le \eta$. Now, according to the definition of the discretized worst-case utility of the attacker, we have:
$U^{worst}_{discrete}(\lambda^{dec}_{discrete}) \le U^a(x(\lambda^{learnt}_k), \lambda^{dec}_{discrete})$
Therefore, proving (6) now reduces to proving that there exists η such that:
$U^a(x(\lambda^{learnt}_k), \lambda^{dec}_{discrete}) - U^a(x(\lambda^{learnt}_*), \lambda^{dec}_{discrete}) \le \epsilon$
where $|\lambda^{learnt}_k - \lambda^{learnt}_*| \le \eta$. First, according to [23], for any λ, the defender's corresponding optimal strategy x(λ) is a differentiable function of λ. Second, the attacker's utility $U^a(x, \lambda^{dec}_{discrete})$ is a differentiable function of the defender's strategy x for any $\lambda^{dec}_{discrete}$. Therefore, $U^a(x(\lambda), \lambda^{dec}_{discrete})$ is differentiable (and thus continuous) in λ. By continuity, for any ϵ > 0, there always exists η > 0 such that:
$|U^a(x(\lambda), \lambda^{dec}_{discrete}) - U^a(x(\lambda^{learnt}_*), \lambda^{dec}_{discrete})| \le \epsilon$
for all λ such that $|\lambda - \lambda^{learnt}_*| \le \eta$, concluding our proof. □

4.3. Heuristic to Improve Discretization

According to Theorem 2, we can obtain a high-quality solution for $(P^{dec})$ by using a fine discretization of the learning outcome space with a small step size η. In practice, it is not necessary to use a fine discretization over the entire learning space right from the beginning. Instead, we can start with a coarse discretization and solve the corresponding $(P^{dec}_{discrete})$ to obtain a solution $\lambda^{dec}_{discrete}$. We then refine the discretization only within the uncertainty range of the current solution, $[\lambda^{dec}_{discrete} - \delta, \lambda^{dec}_{discrete} + \delta]$. We repeat this process until the step size used within the uncertainty range of the latest deception solution reaches the limit that guarantees ϵ-optimality. By doing so, we obtain a much smaller discretized learning outcome set (i.e., a smaller K). As a result, the computational time for solving $(P^{dec}_{discrete})$ is substantially reduced while the solution quality remains the same.
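A minimal sketch of this coarse-to-fine heuristic, built on the hypothetical `solve_attacker_deception` helper above (the refinement factor and stopping rule below are our own illustrative choices, not values from the article):

```python
def refine_and_solve(delta, lam_max, Ra, Pa, Rd, Pd, L, eta0=0.5, eta_target=0.01):
    grid = np.arange(0.0, lam_max + delta + 1e-9, eta0)   # coarse global discretization
    eta = eta0
    while True:
        lam_dec, worst = solve_attacker_deception(grid, delta, lam_max, Ra, Pa, Rd, Pd, L)
        if eta <= eta_target:          # step size small enough to guarantee eps-optimality
            return lam_dec, worst
        eta /= 5.0
        # Refine the grid only inside the uncertainty range of the current solution.
        local = np.arange(max(0.0, lam_dec - delta),
                          min(lam_max + delta, lam_dec + delta) + 1e-9, eta)
        grid = np.unique(np.concatenate([grid, local]))
```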

5. Defender Counter-Deception

In order to counter the attacker's imitative deception, we propose to find a counter-deception defense function $H: [0, \lambda^{max} + \delta] \to X$ which maps a learnt parameter λ learnt to a strategy x of the defender. In designing an effective H, we need to take into account that the attacker will adapt its deception choice accordingly, denoted by $\lambda^{dec}(H)$. Essentially, the problem of finding an optimal defense function which maximizes the defender's utility against the attacker's deception can be abstractly represented as follows:
$\max_{H} \; U^d(H, \lambda^{dec}(H))$
where $\lambda^{dec}(H)$ is the deception choice of the attacker with respect to the defense function H and $U^d$ is the defender's utility corresponding to $(H, \lambda^{dec}(H))$. Finding an optimal H is challenging since the domain $[0, \lambda^{max} + \delta]$ of λ learnt is continuous and there is no explicit closed-form expression of H as a function of λ learnt. For the sake of our analysis, we divide the entire domain $[0, \lambda^{max} + \delta]$ into a number of sub-intervals $\mathcal{I} = \{I^d_1, I^d_2, \ldots, I^d_N\}$ where $I^d_1 = [\lambda^{def}_1, \lambda^{def}_2]$, $I^d_2 = (\lambda^{def}_2, \lambda^{def}_3]$, ..., $I^d_N = (\lambda^{def}_N, \lambda^{def}_{N+1}]$ with $0 = \lambda^{def}_1 \le \lambda^{def}_2 \le \cdots \le \lambda^{def}_{N+1} = \lambda^{max} + \delta$, and N is the number of sub-intervals. We define a defense function with respect to the interval set, $H_{\mathcal{I}}: \mathcal{I} \to X$, which maps each interval $I^d_n \in \mathcal{I}$ to a single defense strategy $x_n$, i.e., $H_{\mathcal{I}}(I^d_n) = x_n \in X$ for all $n \le N$. We denote the set of these strategies by $X^{def} = \{x_1, \ldots, x_N\}$. Intuitively, all $\lambda^{learnt} \in I^d_n$ will lead to a single strategy $x_n$. Our counter-deception problem now becomes finding an optimal defense function $H^* = (\mathcal{I}^*, H^*_{\mathcal{I}^*})$ that comprises (i) an optimal interval set $\mathcal{I}^*$; and (ii) corresponding defense strategies determined by the defense function $H^*_{\mathcal{I}^*}$ with respect to $\mathcal{I}^*$, taking into account the attacker's deception adaptation. Essentially, $(\mathcal{I}^*, H^*_{\mathcal{I}^*})$ is the optimal solution of the following optimization problem:
$\max_{\mathcal{I}, H_{\mathcal{I}}} \; U^d(H_{\mathcal{I}}, \lambda^{dec}(H_{\mathcal{I}})) \quad (7)$
$\text{s.t.} \quad \lambda^{dec}(H_{\mathcal{I}}) \in \arg\max_{\lambda^{dec} \in [0, \lambda^{max}]} \min_{x \in X(\lambda^{dec})} U^a(x, \lambda^{dec})$
where $\lambda^{dec}(H_{\mathcal{I}})$ is the maximin deception choice of the attacker. Here, $X(\lambda^{dec}) = \{x_n : I^d_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset\}$ is the uncertainty set of the attacker when playing λ dec. This uncertainty set contains all possible defense strategy outcomes with respect to the deceptive value λ dec.
Main Result. So far, we have not explicitly defined the objective function $U^d(H_{\mathcal{I}}, \lambda^{dec}(H_{\mathcal{I}}))$, except that we know this utility depends on the defense function $H_{\mathcal{I}}$ and the attacker's deception response $\lambda^{dec}(H_{\mathcal{I}})$. Now, since $H_{\mathcal{I}}$ maps each possible learning outcome λ learnt to a defense strategy, we know that if $\lambda^{learnt} \in I^d_n$, then $U^d(H_{\mathcal{I}}, \lambda^{dec}(H_{\mathcal{I}})) = U^d(x_n, \lambda^{dec}(H_{\mathcal{I}}))$, which can be computed using Equation (3). However, due to the deviation of λ learnt from the attacker's deception choice $\lambda^{dec}(H_{\mathcal{I}})$, different possible learning outcomes λ learnt within $[\lambda^{dec}(H_{\mathcal{I}}) - \delta, \lambda^{dec}(H_{\mathcal{I}}) + \delta]$ may belong to different intervals $I^d_n$ (which correspond to different strategies $x_n$), leading to different utility outcomes for the defender. One may argue that to cope with this deception-learning uncertainty, we can apply the maximin approach to determine the defender's worst-case utility if the defender only has the common knowledge that $\lambda^{learnt} \in [\lambda^{dec}(H_{\mathcal{I}}) - \delta, \lambda^{dec}(H_{\mathcal{I}}) + \delta]$. Furthermore, depending on any additional (private) knowledge the defender has regarding the relation between the attacker's deception and the actual learning outcome, we could incorporate such knowledge into our model and algorithm to obtain an even better utility outcome for the defender. Interestingly, we show that there is, in fact, a universal optimal defense function for the defender, $H^*$, regardless of any additional knowledge they may have. That is, the defender obtains the highest utility by following this defense function, and additional knowledge besides the common knowledge cannot make the defender do better. Our main result is formally stated in Theorem 3.
Theorem 3.
There is a universal optimal defense function, regardless of any additional information (besides the common knowledge) the defender has about the relation between their learning outcome and the deception choice of the attacker. Formally, let us consider the following optimization problem:
$(P^{counter}): \max_{x, \lambda} \; U^d(x, \lambda)$
$\text{s.t.} \quad U^a(x, \lambda) \ge \min_{x' \in X} U^a(x', \lambda^{max})$
$\qquad 0 \le \lambda \le \lambda^{max}, \; x \in X$
Denote by $(x^*, \lambda^*)$ an optimal solution of $(P^{counter})$; then an optimal solution $H^*$ of (7) can be determined as follows:
  • If $\lambda^* = \lambda^{max}$, choose the interval set $\mathcal{I}^* = \{I^d_1\}$ with $I^d_1 = [0, \lambda^{max} + \delta]$ covering the entire learning space, and the function $H^*_{\mathcal{I}^*}(I^d_1) = x_1$ where $x_1 = x^*$.
  • If $\lambda^* < \lambda^{max}$, choose the interval set $\mathcal{I}^* = \{I^d_1, I^d_2\}$ with $I^d_1 = [0, \lambda^* + \delta]$, $I^d_2 = (\lambda^* + \delta, \lambda^{max} + \delta]$. In addition, choose the defender strategies $x_1 = x^*$ and $x_2 \in \arg\min_{x \in X} U^a(x, \lambda^{max})$ correspondingly.
The attacker’s optimal deception against this defense function is to mimic λ * . As a result, the defender always obtains the highest utility, U d ( x * , λ * ) , while the attacker receives the maximin utility of U a ( x * , λ * ) .
Example 1.
Let us give a concrete example illustrating the result in Theorem 3. Consider a 3-target security game with the payoff matrix shown in Table 1:
In this game, the defender has 1 security resource. The maximum deception value of the attacker is λ m a x = 3 and the uncertainty level δ = 0.25 . By solving ( P counter ) , we obtain a corresponding defender strategy x * = [ 0 , 1 , 0 ] and the attacker behavior parameter λ * = 0 . Since λ * < λ m a x , the optimal counter-deception defense function is as follows:
  • If the defender learns $\lambda^{learnt} \in [0, 0.25]$, the defender will play the strategy $x_1 = x^* = [0, 1, 0]$.
  • Otherwise, if the defender learns $\lambda^{learnt} \in (0.25, 3.25]$, the defender then plays $x_2 = [0.34, 0.20, 0.46] \in \arg\min_{x \in X} U^a(x, \lambda^{max})$.
Given the defender follows this counter-deception function, the attacker’s optimal deception is to mimic λ * = 0 , meaning the attacker just simply attacks each target uniformly at random. Here is the reason why this is the optimal choice for the attacker:
  • If the attacker chooses $\lambda^{dec} = \lambda^* = 0$, the corresponding learning outcome for the defender can be any value within the range $[0, 0.25]$. According to the defense function, the defender will always play the strategy $x_1 = [0, 1, 0]$. As a result, the attacker's expected utility is $\frac{1}{3} \times 2 + \frac{1}{3} \times (-2) + \frac{1}{3} \times 3 = 1.0$.
  • Now, if the attacker chooses $\lambda^{dec} > \lambda^* = 0$, the corresponding learning outcome for the defender may fall into either $[0, 0.25]$ or $(0.25, 3.25]$. In particular, if the learning outcome $\lambda^{learnt} \in (0.25, 3.25]$, the defender plays $x_2 = [0.34, 0.20, 0.46] \in \arg\min_{x \in X} U^a(x, \lambda^{max})$. In this case, the resulting attacker utility is $U^a(x_2, \lambda^{dec}) \le U^a(x_2, \lambda^{max}) = 0.33$ (this inequality is due to the fact that the attacker utility is an increasing function of λ dec). As a result, the worst-case utility of the attacker is no more than 0.33, which is strictly lower than the utility of 1.0 obtained when the attacker mimics $\lambda^{dec} = \lambda^* = 0$.
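The resulting defense function is simple enough to write down directly. Below is a small Python sketch of $H^*$ for the numbers of Example 1 (the strategies and thresholds are taken from the example; the helper name is ours):

```python
import numpy as np

LAM_STAR, DELTA, LAM_MAX = 0.0, 0.25, 3.0
X1 = np.array([0.0, 1.0, 0.0])      # x* obtained from solving (P_counter) in Example 1
X2 = np.array([0.34, 0.20, 0.46])   # a minimizer of U^a(x, lam_max) in Example 1

def optimal_defense_function(lam_learnt):
    """H*: map the learnt QR parameter to a strategy (Theorem 3, case lam* < lam_max)."""
    return X1 if lam_learnt <= LAM_STAR + DELTA else X2

# With lam_dec = lam* = 0, every possible learning outcome stays in [0, 0.25], so the
# defender always plays X1 and the attacker's utility is (2 - 2 + 3) / 3 = 1.0, which
# beats the worst case (at most 0.33) of any larger deception value.
```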
Corollary 1.
When $\lambda^{max} = +\infty$, the defense function $H^*$ (specified in Theorem 3) gives the defender a utility which is no less than their Strong Stackelberg equilibrium (SSE) utility.
The proof of Corollary 1 is straightforward. Since $(x^{sse}, \lambda^{max} = +\infty)$ is a feasible solution of $(P^{counter})$, the optimal utility of the defender $U^d(x^*, \lambda^*)$ is thus no less than $U^d(x^{sse}, \lambda^{max})$ ($x^{sse}$ denotes the defender's SSE strategy).
The rest of this section is devoted to proving Theorem 3. The full proof can be decomposed into three main parts: (i) we first analyze the attacker deception adapted to the defender's counter-deception; (ii) based on the result of the attacker adaptation, we provide theoretical results on computing the defender's optimal defense function given a fixed set of sub-intervals $\mathcal{I}$; and (iii) finally, we complete the proof of the theorem leveraging the result in (ii).

5.1. Analyzing Attacker Deception Adaptation

In this section, we aim at understanding the behavior of the attacker deception against H I . Overall, as discussed in the previous section, since the attacker is uncertain about the actual learning outcome of the defender, the attacker can attempt to find an optimal deception choice λ dec ( H I ) that maximizes its utility under the worst case of uncertainty. Essentially, λ dec ( H I ) is an optimal solution of the following maximin problem:
$\max_{\lambda^{dec} \in [0, \lambda^{max}]} \min_{x \in X(\lambda^{dec})} U^a(x, \lambda^{dec})$
where $X(\lambda^{dec}) = \{x_n : I^d_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset\}$ is the uncertainty set of the attacker with respect to the defender's sub-intervals $\mathcal{I}$. In this problem, the uncertainty set $X(\lambda^{dec})$ depends on the variable λ dec that we need to optimize, making the problem challenging to solve.

5.1.1. Decomposability of Attacker Deception Space

First, given $H_{\mathcal{I}}$, we show that we can divide the range of λ dec into several intervals, each of which corresponds to the same uncertainty set. This characteristic of the attacker uncertainty set is, in fact, similar to the no-counter-deception scenario described in the previous section. We propose Algorithm 2 to determine these intervals of λ dec, which works in a similar fashion to Algorithm 1. The main difference is that, in the presence of the defender's defense function, the attacker's uncertainty set $X(\lambda^{dec})$ is determined based on whether the uncertainty range of the attacker $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ overlaps with the defender's intervals $\mathcal{I} = \{I^d_n\}$ or not.
Algorithm 2: Counter-deception—Decomposition of QR parameter into sub-intervals
Essentially, similar to Algorithm 1, Algorithm 2 also iteratively divides the range of λ dec into multiple intervals, (with an abuse of notation) denoted by $\{int^{dec}_j\}$. Each of these intervals, $int^{dec}_j$, corresponds to the same uncertainty set of defense strategies $x_n$, denoted by $X^{def}_j$. In this algorithm, for each interval $I^d_n$ of the defender, $lb_n = \lambda^{def}_n - \delta$ and $ub_{n+1} = \lambda^{def}_{n+1} + \delta$ represent the smallest and largest possible deceptive values of λ dec such that $I^d_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset$. In addition, $n^{min}_j$ and $n^{max}_j$ denote the smallest and largest indices of the defender's strategies in the set $X^{def} = \{x_1, x_2, \ldots, x_N\}$ that belong to $X^{def}_j$. Algorithm 2 relies on Lemmas 6 and 7. Note that Algorithm 2 does not check whether each interval $int^{dec}_j$ of λ dec is left-open or not, since all intervals of the defender $I^d_n$ are left-open (except for n = 1), making all $int^{dec}_j$ left-closed (except for j = 1).
Lemma 6.
Given a deceptive λ dec, for any $n_1 < n_2$ such that $x_{n_1}, x_{n_2} \in X(\lambda^{dec})$, we have $x_n \in X(\lambda^{dec})$ for any $n_1 < n < n_2$.
Lemma 7.
For any λ dec such that $lb_n < \lambda^{dec} \le ub_{n+1}$, the uncertainty range of λ dec overlaps with the defender's interval $I^d_n$, i.e., $I^d_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset$, or equivalently, $x_n \in X(\lambda^{dec})$. Otherwise, if $\lambda^{dec} \le lb_n$ or $\lambda^{dec} > ub_{n+1}$, then $x_n \notin X(\lambda^{dec})$.
The proofs of these two lemmas are straightforward, so we omit them for the sake of presentation. An example of decomposing the deceptive range of λ dec is shown in Figure 2.
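The following Python sketch makes this decomposition concrete (a simplified reconstruction, not the original Algorithm 2: defender intervals are given by their breakpoints $\lambda^{def}_1, \ldots, \lambda^{def}_{N+1}$, and the open/closed endpoint bookkeeping is again omitted):

```python
import numpy as np

def defender_strategy_set(lam_dec, def_breaks, delta):
    # Indices n of defender intervals I_n = (def_breaks[n-1], def_breaks[n]] (the first
    # interval being closed on the left) that overlap [lam_dec - delta, lam_dec + delta],
    # following Lemma 7: lb_n < lam_dec <= ub_{n+1}.
    n_set = []
    for n in range(1, len(def_breaks)):
        lb_n = def_breaks[n - 1] - delta
        ub_n1 = def_breaks[n] + delta
        if (lam_dec > lb_n or n == 1) and lam_dec <= ub_n1:
            n_set.append(n)
    return tuple(n_set)

def decompose_against_defense(def_breaks, delta, lam_max):
    # Split [0, lam_max] into pieces on which X(lam_dec) is constant: the set can only
    # change where lam_dec - delta or lam_dec + delta crosses a defender breakpoint.
    breaks = {0.0, lam_max}
    for b in def_breaks:
        for c in (b - delta, b + delta):
            if 0.0 < c < lam_max:
                breaks.add(c)
    points = sorted(breaks)
    return [(left, right, defender_strategy_set(0.5 * (left + right), def_breaks, delta))
            for left, right in zip(points[:-1], points[1:])]
```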

5.1.2. Characteristics of Attacker Optimal Deception

We denote by M the number of attacker intervals. Given the division of the attacker's deception range $\{int^{dec}_j\}$, we can divide the problem of attacker deception into M sub-problems, each corresponding to a particular $int^{dec}_j$ where $j \in \{1, \ldots, M\}$, as follows:
$(\bar{P}^{dec}_j): \; U^{a,*}_j = \max_{\lambda^{dec} \in int^{dec}_j} \min_{x_n \in X^{def}_j} U^a(x_n, \lambda^{dec})$
Lemma 8.
For each sub-problem ( P ¯ j dec ) with respect to the deception sub-interval i n t j dec , the attacker optimal deception is to imitate the right-bound of i n t j dec , denoted by λ ¯ j dec .
The proof of Lemma 8 is derived from the fact that the attacker's utility $U^a(x_n, \lambda^{dec})$ is increasing in λ dec. As a result, the attacker only has to search over the right bounds, $\{\bar{\lambda}^{dec}_j\}$, of all intervals $\{int^{dec}_j\}$ to find the one among the sub-problems that maximizes the attacker's worst-case utility. We consider these bounds $\bar{\lambda}^{dec}_j$ to be the deception candidates of the attacker. Let us assume $j^{opt}$ is the best deception choice for the attacker among these candidates, that is, the attacker will mimic $\bar{\lambda}^{dec}_{j^{opt}}$. We obtain the following observations about important properties of the attacker's optimal deception, which we leverage to determine an optimal defense function later.
Our following Lemma 9 says that, for any non-optimal deception candidate of the attacker, $\bar{\lambda}^{dec}_j \ne \bar{\lambda}^{dec}_{j^{opt}}$, such that the maximum index of the defender strategy in the corresponding uncertainty set $X^{def}_j$, denoted by $n^{max}_j$, satisfies $n^{max}_j \le n^{max}_{j^{opt}}$, the deception candidate $\bar{\lambda}^{dec}_j$ is strictly less than $\bar{\lambda}^{dec}_{j^{opt}}$, or equivalently, $j < j^{opt}$. Otherwise, $j^{opt}$ cannot be a best deception response.
Lemma 9.
For any $j \ne j^{opt}$ s.t. $n^{max}_j \le n^{max}_{j^{opt}}$, we have $\bar{\lambda}^{dec}_j < \bar{\lambda}^{dec}_{j^{opt}}$, or equivalently, $j < j^{opt}$.
Proof. 
Lemma 9 can be proved by contradiction as follows. Assume there is a $j > j^{opt}$ such that $n^{max}_j \le n^{max}_{j^{opt}}$. According to Algorithm 2, for any attacker interval indices $j' > j''$, the minimum and maximum indices of the defender's strategies in the corresponding uncertainty sets must satisfy $n^{min}_{j'} \ge n^{min}_{j''}$ and $n^{max}_{j'} \ge n^{max}_{j''}$, and they cannot both be equal. This is because the intervals $\{int^{dec}_j\}$ returned by Algorithm 2 are sorted in a strictly increasing order. Therefore, if there is $j > j^{opt}$ such that $n^{max}_j \le n^{max}_{j^{opt}}$, it means $n^{min}_j > n^{min}_{j^{opt}}$ and $n^{max}_j = n^{max}_{j^{opt}}$. In other words, the uncertainty set $X^{def}_j \subset X^{def}_{j^{opt}}$. Thus, the attacker's optimal worst-case utilities w.r.t. deception intervals j and $j^{opt}$ must satisfy:
$U^{a,*}_{j^{opt}} = \min_{x \in X^{def}_{j^{opt}}} U^a(x, \bar{\lambda}^{dec}_{j^{opt}}) \le \min_{x \in X^{def}_j} U^a(x, \bar{\lambda}^{dec}_{j^{opt}}) < \min_{x \in X^{def}_j} U^a(x, \bar{\lambda}^{dec}_j) = U^{a,*}_j$
since $U^a(x, \lambda)$ is a strictly increasing function of λ. This strict inequality shows that $j^{opt}$ cannot be an optimal deception for the attacker, concluding our proof of Lemma 9.
Note that we denote the right bounds of the attacker intervals by $\{\bar{\lambda}^{dec}_1, \ldots, \bar{\lambda}^{dec}_M = \lambda^{max}\}$. Our Lemma 10 then says that if the maximum index $n^{max}_{j^{opt}}$ of the defender strategy in the set $X^{def}_{j^{opt}}$ is equal to the maximum index of the whole defense set, N, then $\bar{\lambda}^{dec}_{j^{opt}}$ equals the highest value of the entire deception range, i.e., $\bar{\lambda}^{dec}_{j^{opt}} = \bar{\lambda}^{dec}_M = \lambda^{max}$, or equivalently, $j^{opt} = M$.
Lemma 10.
If n j o p t m a x = N , then j o p t = M .
Proof. 
We also prove this observation by contradiction. Assume that $j^{opt} < M$. Again, according to Algorithm 2, for any $j' > j''$, we have $n^{min}_{j'} \ge n^{min}_{j''}$ and $n^{max}_{j'} \ge n^{max}_{j''}$, and they cannot both be equal. Therefore, if $n^{max}_{j^{opt}} = N$, then for all $j > j^{opt}$, we have $n^{max}_j = N$ and $n^{min}_j > n^{min}_{j^{opt}}$, which means $X^{def}_j \subset X^{def}_{j^{opt}}$. Therefore, if $j^{opt} < M$, then we obtain:
$U^{a,*}_{j^{opt}} = \min_{x \in X^{def}_{j^{opt}}} U^a(x, \bar{\lambda}^{dec}_{j^{opt}}) \le \min_{x \in X^{def}_M} U^a(x, \bar{\lambda}^{dec}_{j^{opt}}) < \min_{x \in X^{def}_M} U^a(x, \bar{\lambda}^{dec}_M) = U^{a,*}_M$
which shows that j o p t cannot be an optimal deception of the attacker, concluding the proof of Lemma 10. □
Remark 1.
According to Lemmas 9 and 10, we can easily determine which deception choices among the set $\{\bar{\lambda}^{dec}_1, \ldots, \bar{\lambda}^{dec}_M\}$ cannot be an optimal attacker deception, regardless of the defense strategies $\{x_1, \ldots, x_N\}$. These non-optimal choices are determined as follows: the deception choice $\bar{\lambda}^{dec}_j$ cannot be optimal for:
  • Any j such that there is a $j' > j$ with $n^{max}_{j'} \le n^{max}_j$
  • Any $j < M$ such that $n^{max}_j = N$
For any other choice $\bar{\lambda}^{dec}_j$, there always exist defense strategies $\{x_1, \ldots, x_N\}$ such that $\bar{\lambda}^{dec}_j$ is an optimal attacker deception.
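Putting Lemmas 6–10 together, the attacker's adaptive deception against a given defense function can be sketched as follows (reusing the hypothetical `decompose_against_defense` helper above; `attacker_utility(x, lam)` stands in for $U^a(x, \lambda)$ from Equation (3)):

```python
def attacker_adaptive_deception(def_breaks, strategies, delta, lam_max, attacker_utility):
    # strategies[n-1] is the strategy x_n the defender commits to on interval I_n.
    best_lam, best_worst = None, -np.inf
    for _, right, n_set in decompose_against_defense(def_breaks, delta, lam_max):
        lam_dec = right                                  # Lemma 8: right bound is optimal
        worst = min(attacker_utility(strategies[n - 1], lam_dec) for n in n_set)
        if worst > best_worst:                           # best deception candidate so far
            best_worst, best_lam = worst, lam_dec
    return best_lam, best_worst
```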

5.2. Finding Optimal Defense Function H I Given Fixed I: Divide-and-Conquer

Given a set of sub-intervals $\mathcal{I}$, we aim at finding the optimal defense function $H_{\mathcal{I}}$ or, equivalently, the strategies $X^{def} = \{x_1, x_2, \ldots, x_N\}$ corresponding to these sub-intervals. According to the previous analysis of the attacker's deception adaptation, since the attacker's best deception is one of the right bounds $\{\bar{\lambda}^{dec}_1, \ldots, \bar{\lambda}^{dec}_M\}$, we propose to decompose the problem of finding an optimal defense function $H_{\mathcal{I}}$ into multiple sub-problems $P^{counter}_j$, each corresponding to a particular best deception choice for the attacker. In particular, for each sub-problem $P^{counter}_j$, we attempt to find $H_{\mathcal{I}}$ such that $\bar{\lambda}^{dec}_j$ is the best response of the attacker. As discussed in the remark of the previous section, we can easily determine which sub-problems $P^{counter}_j$ are infeasible. For any feasible optimal deception candidate $j^{fea}$, i.e., $P^{counter}_{j^{fea}}$ is feasible, $P^{counter}_{j^{fea}}$ can be formulated as follows:
$(P^{counter}_{j^{fea}}): \max_{H_{\mathcal{I}}} \; U^d(H_{\mathcal{I}}, \bar{\lambda}^{dec}_{j^{fea}})$
$\text{s.t.} \quad \min_{x \in X^{def}_{j^{fea}}} U^a(x, \bar{\lambda}^{dec}_{j^{fea}}) \ge \min_{x \in X^{def}_j} U^a(x, \bar{\lambda}^{dec}_j), \;\forall j$
where $U^d(H_{\mathcal{I}}, \bar{\lambda}^{dec}_{j^{fea}})$ is the defender's utility when the defender commits to $H_{\mathcal{I}}$ and the attacker plays $\bar{\lambda}^{dec}_{j^{fea}}$. The constraints in $(P^{counter}_{j^{fea}})$ guarantee that the attacker's worst-case utility for playing $\bar{\lambda}^{dec}_{j^{fea}}$ is no less than that for playing any other $\bar{\lambda}^{dec}_j$. Finally, our Propositions 2 and 3 determine an optimal solution for $(P^{counter}_{j^{fea}})$.
Proposition 2
(Sub-problem P j fea counter ). If n j fea m a x < N , the best defense function for the defender is determined as follows:
  • For all $n > n^{max}_{j^{fea}}$, choose $x_n = x^*_>$ where $x^*_>$ is an optimal solution of the following optimization problem:
    $\min_{x \in X} U^a(x, \lambda^{max})$
  • For all $n \le n^{max}_{j^{fea}}$, choose $x_n = x^*_<$ where $x^*_<$ is the optimal solution of the following optimization problem:
    $U^d_* = \max_{x \in X} U^d(x, \bar{\lambda}^{dec}_{j^{fea}})$ s.t. $U^a(x, \bar{\lambda}^{dec}_{j^{fea}}) \ge U^a(x^*_>, \lambda^{max})$
By following the above defense function, an optimal deception of the attacker is to mimic $\bar{\lambda}^{dec}_{j^{fea}}$, and the defender obtains a utility of $U^d_*$.
Proof. 
First, we show that the attacker's optimal deception response is $\bar{\lambda}^{dec}_{j^{fea}}$. Indeed, we have the uncertainty set $X^{def}_{j^{fea}} \equiv \{x^*_<\}$ because the defender plays $x_n = x^*_<$ for all $n \le n^{max}_{j^{fea}}$. In addition, for all j such that $n^{max}_j > n^{max}_{j^{fea}}$, the uncertainty set $X^{def}_j$ contains $x^*_>$. Therefore, the attacker's worst-case utility satisfies:
$U^{a,*}_j \le U^a(x^*_>, \bar{\lambda}^{dec}_j) \le U^a(x^*_>, \lambda^{max}) \le U^a(x^*_<, \bar{\lambda}^{dec}_{j^{fea}}) = U^{a,*}_{j^{fea}}$
Furthermore, for all j such that $n^{max}_j \le n^{max}_{j^{fea}}$, we have $j \le j^{fea}$ according to Lemma 9. Thus, we obtain:
$U^{a,*}_j = U^a(x^*_<, \bar{\lambda}^{dec}_j) \le U^a(x^*_<, \bar{\lambda}^{dec}_{j^{fea}}) = U^{a,*}_{j^{fea}}$
Based on the above defense function and the fact that the attacker will choose $\bar{\lambda}^{dec}_{j^{fea}}$, the defender receives a utility of $U^d_*$. Next, we prove that this is the best the defender can obtain, by showing that any defense function $\{x_1, \ldots, x_N\}$ such that $j^{fea}$ is the attacker's best response leads to a defender utility of at most $U^d_*$. Indeed, since $n^{max}_{j^{fea}} < N$, it means $j^{fea} < M$, or in other words, $\bar{\lambda}^{dec}_{j^{fea}} < \bar{\lambda}^{dec}_M = \lambda^{max}$. On the other hand, since $\bar{\lambda}^{dec}_{j^{fea}}$ is the best choice of the attacker, the following inequality must hold:
$U^{a,*}_{j^{fea}} \ge U^{a,*}_M = \min_{x \in X^{def}_M} U^a(x, \lambda^{max}) \ge \min_{x \in X} U^a(x, \lambda^{max})$
This means that any defense function $\{x_1, \ldots, x_N\}$ such that $j^{fea}$ is the attacker's best response has to satisfy the above inequality. As defined, $U^d_*$ is the highest utility for the defender among the defense functions that satisfy the above inequality. □
Proposition 3
(Sub-problem P j fea counter ). If n j fea m a x = N , the best counter-deception of the defender can be determined as follows: for all n, we set: x n = x ^ where x ^ is an optimal solution of
$\max_{x \in X} U^d(x, \lambda^{max})$
By following this defense function, the attacker's best deception is to mimic λ max and the defender obtains a utility of $U^d(\hat{x}, \lambda^{max})$.
Proof. 
First, we observe that given x̂, λ̄_{j_fea}^dec is the best response of the attacker. Indeed, since j_fea = M or, equivalently, λ̄_{j_fea}^dec = λ_max according to Observation 10, we have:
U_{j_fea}^{a,*} = U_a(x̂, λ_max) ≥ U_a(x̂, λ̄_j^dec) = U_j^{a,*}, ∀ j
Second, since λ̄_{j_fea}^dec = λ_max, for any defense function such that λ̄_{j_fea}^dec is the best deception choice of the attacker, the resulting utility for the defender must be no more than:
max_{x ∈ X_{j_fea}^def} U_d(x, λ_max) ≤ max_{x ∈ X} U_d(x, λ_max)
regardless of the learning outcome λ^learnt ∈ [λ_max − δ, λ_max + δ]. This is because the defender eventually plays one of the defense strategies in the set X_{j_fea}^def. The RHS is the defender's utility obtained by playing the counter-deception specified by the proposition. □
Based on Propositions 2 and 3, we can easily find the optimal counter-deception by choosing the solution of the sub-problem that provides the highest defender utility.
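Concretely, the overall counter-deception routine is a simple loop over the feasible deception candidates: for each candidate λ̄_j^dec, build the corresponding defense function via Proposition 2 (or Proposition 3 when n_j^max = N) and keep the candidate with the highest defender utility. The sketch below assumes the feasibility test and the indices n_j^max have already been computed; feasible_candidates and its flag reaches_last_interval are illustrative names, and the code reuses solve_prop2 and the utility helpers from the previous sketches.

```python
def best_counter_deception(feasible_candidates, lam_max, L,
                           def_reward, def_penalty, att_reward, att_penalty):
    # feasible_candidates: list of (lam_j, reaches_last_interval) pairs; the flag is
    # True iff n_j^max = N, i.e., Proposition 3 applies to this candidate.
    T = len(att_reward)
    x0 = np.full(T, min(1.0, L / T))
    bounds = [(0.0, 1.0)] * T
    budget = {"type": "ineq", "fun": lambda x: L - x.sum()}
    best = None
    for lam_j, reaches_last_interval in feasible_candidates:
        if reaches_last_interval:
            # Proposition 3: play argmax_x U_d(x, lam_max) for every learning outcome.
            res = minimize(lambda x: -defender_utility(x, lam_max, def_reward, def_penalty,
                                                       att_reward, att_penalty),
                           x0, bounds=bounds, constraints=[budget], method="SLSQP")
            candidate = (-res.fun, (res.x,))
        else:
            # Proposition 2: x_<^* on the low sub-intervals, x_>^* on the high ones.
            x_gt, x_lt, util = solve_prop2(lam_j, lam_max, L, def_reward, def_penalty,
                                           att_reward, att_penalty)
            candidate = (util, (x_lt, x_gt))
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best   # (defender utility, defense strategies used by the defense function)
```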

5.3. Completing the Proof of Theorem 3

According to Propositions 2 and 3, given an interval set I, the resulting defense function will only lead the defender to play either {x_>^*, x_<^*} or {x̂}, whichever provides a higher utility for the defender. Based on this result, our Theorem 3 then identifies an optimal interval set, and corresponding optimal defense strategies, as we prove below.
First, we show that if the defender follows the defense function specified in Theorem 3, then the attacker's optimal deception is to mimic λ*. Indeed, if λ* = λ_max, then since the defender always plays x*, the attacker's optimal deception is to mimic λ* = λ_max, which yields its highest possible utility U_a(x*, λ_max).
On the other hand, if λ* < λ_max, we consider two cases:
Case 1: if λ_max − 2δ ≤ λ* < λ_max, then the attacker's sub-intervals are int_1^dec = [0, λ*] and int_2^dec = (λ*, λ_max]. The corresponding uncertainty sets are X_1^def = {x_1} and X_2^def = {x_1, x_2}. In this case, the attacker's optimal deception is to mimic λ*, since:
min_{x ∈ X_1^def} U_a(x, λ*) = U_a(x*, λ*) ≥ U_a(x_2, λ_max) ≥ min_{x ∈ X_2^def} U_a(x, λ_max)
Case 2: if λ* < λ_max − 2δ, then the corresponding sub-intervals of the attacker are int_1^dec = [0, λ*], int_2^dec = (λ*, λ* + 2δ], and int_3^dec = (λ* + 2δ, λ_max]. These sub-intervals have uncertainty sets X_1^def = {x_1}, X_2^def = {x_1, x_2}, and X_3^def = {x_2}, respectively. The attacker's best deception is again to mimic λ*, since the attacker's worst-case utility for doing so is min_{x ∈ X_1^def} U_a(x, λ*) = U_a(x*, λ*), and
U_a(x*, λ*) ≥ U_a(x_2, λ_max) ≥ min_{x ∈ X_2^def} U_a(x, λ* + 2δ),
U_a(x*, λ*) ≥ U_a(x_2, λ_max) = min_{x ∈ X_3^def} U_a(x, λ_max)
Now, since the attacker's best deception is to mimic λ*, according to the above analysis the uncertainty set is X_1^def = {x_1 = x*}, and thus the defender will play x* in the end, leading to a utility of U_d(x*, λ*). This is the highest possible utility the defender can obtain, since both optimization problems presented in Propositions 2 and 3 are special cases of (P^counter) in which we fix the variable λ = λ_max (for Proposition 3) or λ = λ̄_{j_fea}^dec (for Proposition 2).
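The case analysis above can be summarized by a small helper that maps λ*, λ_max, and δ to the attacker's sub-intervals and, for each sub-interval, the set of defense strategies the defender might end up playing (index 1 stands for x_1 = x* and index 2 for x_2, as in Theorem 3). This is only a bookkeeping sketch of the three cases; computing x* and x_2 themselves is assumed to happen elsewhere.

```python
def theorem3_intervals(lam_star, lam_max, delta):
    # Attacker sub-intervals and the defense strategies (by index) they can trigger.
    if lam_star == lam_max:
        return [((0.0, lam_max), {1})]                       # defender always plays x^*
    if lam_star >= lam_max - 2 * delta:                      # Case 1
        return [((0.0, lam_star), {1}),
                ((lam_star, lam_max), {1, 2})]
    return [((0.0, lam_star), {1}),                          # Case 2
            ((lam_star, lam_star + 2 * delta), {1, 2}),
            ((lam_star + 2 * delta, lam_max), {2})]
```

For instance, with λ_max = 2.0, δ = 0.4, and λ* = 1.0, the helper returns the sub-intervals [0, 1], (1, 1.8], and (1.8, 2] with uncertainty sets {x_1}, {x_1, x_2}, and {x_2}, matching the example in Figure 2.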

6. Experimental Evaluation

Our experiments are run on a 2.8 GHz Intel Xeon processor with 256 GB RAM. We use Matlab (https://www.mathworks.com, accessed on 1 October 2022) to solve non-linear programs and Cplex (https://www.ibm.com/analytics/cplex-optimizer, accessed on 1 October 2022) to solve the MILPs involved in the evaluated algorithms. We use a value of λ_max = 5 in all our experiments (except in Figure 3g,h), and discretize the range [0, λ_max] using a step size of 0.2: λ ∈ {0, 0.2, …, λ_max}. We use the covariance game generator of GAMUT (http://gamut.stanford.edu, accessed on 1 October 2022) to generate rewards and penalties of players within the ranges [1, 10] (for attacker) and [−10, −1] (for defender). GAMUT takes as input a covariance value r ∈ [−1, 0], which controls the correlation between the defender's and the attacker's payoffs. Our results are averaged over 50 runs. All our results are statistically significant under bootstrap-t (p = 0.05).
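For reference, the sketch below reproduces the λ-grid used here and shows one simple way to couple the players' payoffs through a covariance value r. The actual experiments use the GAMUT generator itself; the linear-mixture construction below is only an illustrative stand-in that shares the same endpoints (zero-sum at r = −1, uncorrelated payoffs at r = 0).

```python
import numpy as np

rng = np.random.default_rng(0)

lam_max, step = 5.0, 0.2
lam_grid = np.linspace(0.0, lam_max, int(round(lam_max / step)) + 1)   # {0, 0.2, ..., 5.0}

def draw_covariance_payoffs(num_targets, r):
    # Illustrative stand-in for GAMUT's covariance games: the defender's payoff mixes
    # the (negated) attacker payoff with independent noise, so r = -1 recovers a
    # zero-sum game and r = 0 gives uncorrelated payoffs.
    att_reward = rng.uniform(1.0, 10.0, num_targets)
    att_penalty = rng.uniform(-10.0, -1.0, num_targets)
    noise_r = rng.uniform(1.0, 10.0, num_targets)
    noise_p = rng.uniform(-10.0, -1.0, num_targets)
    def_reward = -r * (-att_penalty) + (1.0 + r) * noise_r    # stays in [1, 10] for r in [-1, 0]
    def_penalty = -r * (-att_reward) + (1.0 + r) * noise_p    # stays in [-10, -1]
    return def_reward, def_penalty, att_reward, att_penalty
```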
Algorithms. We compare three cases: (i) Non-Dec: the attacker is non-deceptive and the defender assumes so; as a result, both play Strong Stackelberg equilibrium strategies; (ii) Dec-δ: the attacker is deceptive while the defender does not handle the attacker's deception (Section 4); we examine different uncertainty ranges by varying the value of δ; and (iii) Dec-Counter: the attacker is deceptive while the defender tackles the attacker's deception (Section 5).
Figure 3a,b compare the performance of our algorithms with an increasing number of targets. These figures show that: (i) the attacker benefits from playing deceptively (Dec-0 achieves 61% higher attacker utility than Non-Dec); (ii) the benefit of deception to the attacker is reduced when the attacker is uncertain about the defender's learning outcome; in particular, Dec-0.25 achieves 4% lower attacker utility than Dec-0; (iii) the defender suffers a substantial utility loss due to the attacker's deception, and this loss is reduced in the presence of the attacker's uncertainty; and finally, (iv) the defender benefits significantly (in terms of utility) by employing counter-deception against a deceptive attacker.
In Figure 3c,d, we show the performance of our algorithms with varying r (i.e., covariance) values. In zero-sum games (i.e., r = −1), the attacker has no incentive to be deceptive [4]. Therefore, we only plot results for r from −0.2 to −0.8 with a step size of 0.2. This figure shows that when r gets closer to −1.0 (i.e., closer to zero-sum), the attacker's utility with deception (i.e., Dec-0 and Dec-0.25) gradually moves closer to its utility with Non-Dec, reflecting that the attacker has less incentive to play deceptively. Furthermore, the defender's average utility in all cases gradually decreases as the covariance value gets closer to −1.0. These results show that in SSGs, the defender's utility is always governed by the adversarial level (i.e., the payoff correlation) between the players, regardless of whether the attacker is deceptive or not.
Figure 3e,f compare the attacker and defender utilities with varying uncertainty range, i.e., δ values, on 60-target games. These figures show that attacker utilities decrease linearly with increasing values of δ . On the other hand, defender utilities increase linearly with increasing values of δ . This is reasonable as increasing δ corresponds to a greater width of the uncertainty interval that the attacker has to contend with. This increased uncertainty forces the attacker to play more conservatively, thereby leading to decreased utilities for the attacker and increased utilities for the defender.
In Figure 3g,h, we analyze the impact of varying λ_max on the players' utilities in 60-target games. These figures show that: (i) with increasing values of λ_max, the action space of a deceptive attacker grows, and the attacker utility increases as a result (Dec-0 and Dec-0.25 in both sub-figures); (ii) when λ_max is close to zero, the attacker is limited to a less strategic attack zone, so the defender's strategies have less influence on how the attacker would respond; the defender thus receives a lower utility when λ_max gets close to zero; and (iii) most importantly, the attacker utility against a counter-deceptive defender decreases with increasing values of λ_max. This result shows that when the defender plays counter-deception, the attacker can actually gain more benefit by committing to a more limited deception range.
Finally, we evaluate the runtime performance of our algorithms in Figure 4. We provide results for resource-to-target ratios L/T = 0.3 and 0.5. This figure shows that (i) even on 100-target games, Dec-0 finishes in ∼5 min, and (ii) due to the simplicity of the proposed counter-deception algorithm, Dec-Counter finishes in ∼13 s on 100-target games.

Additional Experiment Results

Figure 5 shows the performance of our algorithms as we vary the number of defender resources L on 80-target and 20-target games. This figure shows that the benefits of deception and counter-deception to the players are observed consistently when varying L: (i) the defender (attacker) utilities steadily increase (decrease) with increasing L; and (ii) the relative trends between the different algorithms are observed consistently at different values of L. In Figure 6, we compare the different algorithms with an increasing number of targets when L/T = 0.5, and we observe similar trends in these additional results.

7. Conclusions

This paper provides a comprehensive analysis of attacker deception and defender counter-deception under uncertainty. Our algorithms are developed based on the decomposability of the attacker's deception space and the discretization of the defender's learning outcome. Our key finding is that the optimal counter-deception defense solution only depends on the players' common knowledge about the uncertainty range of the defender's learning outcome. Finally, our extensive experiments show the effectiveness of our counter-deception solutions in handling the attacker's deception.
As for future work, this article focuses on attacker deception and defender counter-deception in the context of the Quantal Response model, which has only a single model parameter. Given the promising results of this article, investigating attacker deception in more complex behavior-model settings, such as neural networks, would be an interesting future direction.

Author Contributions

Conceptualization, T.H.N. and A.Y.; methodology, T.H.N.; validation, A.Y.; writing—original draft preparation, T.H.N. and A.Y.; writing—review and editing, T.H.N. All authors have read and agreed to the published version of the manuscript.

Funding

Dr. Nguyen was supported by ARO Grant No. W911NF-20-1-0344 and Dr. Yadav was supported in part by ARO Grant No. W911NF-21-1-0047.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1. We use a uniform discretization for the sake of solution quality analysis (as we describe later). Our approach can be generalized to any non-uniform discretization.
2. Lemma 7 is stated for the general case n > 1, in which the defender's interval I_n^d is left-open. When n = 1, the left bound is included and we have lb_n ≤ λ^dec ≤ ub_{n+1}.
3. There is a degenerate case in which U_a(x, λ) is constant for all λ, namely when the defense strategy x leads to an identical expected utility for the attacker across all targets. To avoid this case, we can add a small amount of noise to such a defense strategy x so that the attacker's expected utilities vary across targets, while ensuring that this noise only leads to a small change in the defender's utility.

References

1. Tambe, M. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned; Cambridge University Press: Cambridge, UK, 2011.
2. Yang, R.; Kiekintveld, C.; Ordonez, F.; Tambe, M.; John, R. Improving resource allocation strategy against human adversaries in security games. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011.
3. Nguyen, T.H.; Yang, R.; Azaria, A.; Kraus, S.; Tambe, M. Analyzing the effectiveness of adversary modeling in security games. In Proceedings of the AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013.
4. Nguyen, T.H.; Vu, N.; Yadav, A.; Nguyen, U. Decoding the Imitation Security Game: Handling Attacker Imitative Behavior Deception. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020.
5. Gholami, S.; Yadav, A.; Tran-Thanh, L.; Dilkina, B.; Tambe, M. Do not Put All Your Strategies in One Basket: Playing Green Security Games with Imperfect Prior Knowledge. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, Montreal, QC, Canada, 13–17 May 2019; pp. 395–403.
6. McFadden, D. Conditional Logit Analysis of Qualitative Choice Behavior. In Frontiers in Econometrics; Zarembka, P., Ed.; Academic Press: Cambridge, MA, USA, 1973.
7. McKelvey, R.D.; Palfrey, T.R. Quantal response equilibria for normal form games. Games Econ. Behav. 1995, 10, 6–38.
8. Kar, D.; Nguyen, T.H.; Fang, F.; Brown, M.; Sinha, A.; Tambe, M.; Jiang, A.X. Trends and applications in Stackelberg security games. Handb. Dyn. Game Theory 2017.
9. An, B.; Shieh, E.; Yang, R.; Tambe, M.; Baldwin, C.; DiRenzo, J.; Maule, B.; Meyer, G. A Deployed Quantal Response Based Patrol Planning System for the US Coast Guard. Interfaces 2013, 43, 400–420.
10. Carroll, T.E.; Grosu, D. A game theoretic investigation of deception in network security. Secur. Commun. Netw. 2011, 4, 1162–1172.
11. Fraunholz, D.; Anton, S.D.; Lipps, C.; Reti, D.; Krohmer, D.; Pohl, F.; Tammen, M.; Schotten, H.D. Demystifying Deception Technology: A Survey. arXiv 2018, arXiv:1804.06196.
12. Horák, K.; Zhu, Q.; Bošanskỳ, B. Manipulating adversary’s belief: A dynamic game approach to deception by design for proactive network security. In Proceedings of the International Conference on Decision and Game Theory for Security, Vienna, Austria, 23–25 October 2017; Springer: Cham, Switzerland, 2017; pp. 273–294.
13. Zhuang, J.; Bier, V.M.; Alagoz, O. Modeling secrecy and deception in a multiple-period attacker–defender signaling game. Eur. J. Oper. Res. 2010, 203, 409–418.
14. Han, X.; Kheir, N.; Balzarotti, D. Deception techniques in computer security: A research perspective. ACM Comput. Surv. (CSUR) 2018, 51, 1–36.
15. Fugate, S.; Ferguson-Walter, K. Artificial Intelligence and Game Theory Models for Defending Critical Networks with Cyber Deception. AI Mag. 2019, 40, 49–62.
16. Guo, Q.; An, B.; Bosansky, B.; Kiekintveld, C. Comparing strategic secrecy and Stackelberg commitment in security games. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017.
17. Rabinovich, Z.; Jiang, A.X.; Jain, M.; Xu, H. Information disclosure as a means to security. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, Istanbul, Turkey, 4–8 May 2015; pp. 645–653.
18. Sinha, A.; Fang, F.; An, B.; Kiekintveld, C.; Tambe, M. Stackelberg Security Games: Looking Beyond a Decade of Success. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 5494–5501.
19. Xu, H.; Rabinovich, Z.; Dughmi, S.; Tambe, M. Exploring Information Asymmetry in Two-Stage Security Games. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 1057–1063.
20. Gan, J.; Xu, H.; Guo, Q.; Tran-Thanh, L.; Rabinovich, Z.; Wooldridge, M. Imitative Follower Deception in Stackelberg Games. arXiv 2019, arXiv:1903.02917.
21. Nguyen, T.H.; Wang, Y.; Sinha, A.; Wellman, M.P. Deception in finitely repeated security games. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019.
22. Estornell, A.; Das, S.; Vorobeychik, Y. Deception Through Half-Truths. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
23. Nguyen, T.H.; Sinha, A.; He, H. Partial Adversarial Behavior Deception in Security Games. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), Virtual Conference, 7–15 January 2021.
24. Biggio, B.; Nelson, B.; Laskov, P. Poisoning attacks against support vector machines. arXiv 2012, arXiv:1206.6389.
25. Huang, L.; Joseph, A.D.; Nelson, B.; Rubinstein, B.I.; Tygar, J.D. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, Chicago, IL, USA, 21 October 2011; ACM: New York, NY, USA, 2011; pp. 43–58.
26. Steinhardt, J.; Koh, P.W.W.; Liang, P.S. Certified defenses for data poisoning attacks. In Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3517–3529.
27. Tong, L.; Yu, S.; Alfeld, S. Adversarial Regression with Multiple Learners. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4946–4954.
28. Kiekintveld, C.; Jain, M.; Tsai, J.; Pita, J.; Ordóñez, F.; Tambe, M. Computing optimal randomized resource allocations for massive security games. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, Budapest, Hungary, 10–15 May 2009; pp. 689–696.
Figure 1. An example of discretizing λ learnt , Λ learnt = { 0 , 0.9 , 1.7 , 2.3 } , and the six resulting attacker sub-intervals and corresponding uncertainty sets, with λ m a x = 2 , δ = 0.5 . In particular, the first sub-interval of deceptive λ dec is i n t 1 = [ 0 , 0.4 ) in which any λ dec corresponds to the same uncertainty set of possible learning outcomes Λ 1 learnt = { 0 } .
Figure 2. An example of a defense function with corresponding sub-intervals and uncertainty sets of the attacker, where λ m a x = 2.0 and δ = 0.4 . The defense function is determined as: I 1 d = [ 0 , 1.4 ] , I 2 d = ( 1.4 , 2.4 ] with corresponding defense strategies { x 1 , x 2 } . Then the deception range of the attacker can be divided into three sub-intervals: i n t 1 dec = [ 0 , 1 ] , i n t 2 dec = ( 1 , 1.8 ] , i n t 3 dec = ( 1.8 , 2 ] with corresponding uncertainty sets X 1 def = { x 1 } , X 2 def = { x 1 , x 2 } , X 3 def = { x 2 } . For example, if the attacker plays any λ dec i n t 2 dec , it will lead the defender to play either x 1 or x 2 , depending on the actual learning outcome of the defender.
Figure 3. Evaluations on player utilities.
Figure 4. Runtime performance.
Figure 5. Player Utilities with Varying Number of Resources.
Figure 6. Player Utilities with Varying Number of Targets.
Table 1. The payoff matrix of a 3-target game.

               Target 1   Target 2   Target 3
Def. Reward        2          3          1
Def. Penalty      −1         −2          0
Att. Reward        2          1          3
Att. Penalty      −3         −2         −3