1. Introduction
In many real-world security domains, security agencies (the defender) attempt to predict the attacker's future behavior based on collected attack data, and use the prediction result to determine effective defense strategies. A lot of existing work in security games has thus focused on developing different behavior models of the attacker [1,2,3]. Recently, the challenge of playing against a deceptive attacker has been studied, in which the attacker can manipulate the attack data (by changing its behavior) to fool the defender, making the defender learn a wrong behavior model of the attacker [4]. Such deceptive behavior by the attacker can lead to an ineffective defender strategy.
A key limitation of existing work is the assumption that the defender has full access to the attack data, which means the attacker knows exactly what the learning outcome of the defender would be. However, in many real-world domains, the defender often has limited access to the attack data; e.g., in wildlife protection, park rangers typically cannot find all the snares laid out by poachers across entire conservation areas [5]. As a result, the learning outcome the defender obtains (with limited attack data) may differ from the deceptive behavior model that the attacker commits to. Furthermore, the attacker (and the defender) may have imperfect knowledge about the relation between the deception choice of the attacker and the actual learning outcome of the defender.
We address this limitation by studying the challenge of attacker deception given such uncertainty. We consider a security game model in which the defender adopts Quantal Response (QR), a well-known behavior model in economics and game theory [2,6,7], to predict the attacker's behavior, where the model parameter $\lambda$ is trained based on some attack data. On the other hand, the attacker plays deceptively by mimicking a QR model with a different value of $\lambda$, denoted by $\lambda^{dec}$. In this work, we incorporate the deception-learning uncertainty into this game model, where the learning outcome of the defender (denoted by $\lambda^{learnt}$) can be any value within a range centered at $\lambda^{dec}$.
We provide the following key contributions. First, we present a new maximin-based algorithm to compute an optimal robust deception strategy for the attacker. At a high level, our algorithm works by maximizing the attacker's utility under the worst case of uncertainty. The problem comprises three nested optimization levels, which is not straightforward to solve. We thus propose an alternative single-level optimization problem based on partial discretization. Despite this simplification, the resulting optimization problem is still challenging to solve due to the non-convexity of the attacker's utility and the dependence of the uncertainty set on $\lambda^{dec}$. By exploiting the decomposability of the deception space and the monotonicity of the attacker's utility, we show that the alternative relaxed problem can be solved optimally in polynomial time. The idea is to decompose the problem into a polynomial number of sub-problems (according to the decomposition of the deception space); each sub-problem can then be solved in polynomial time, since the attacker's optimal deception decision within each sub-space is shown to be one of the extreme points of that sub-space, despite the non-convexity of the sub-problem.
Second, we propose a new counter-deception algorithm, which generates an optimal defense function that outputs a defense strategy for each possible (deceptive) learning outcome. Our key finding is that there is a universal optimal defense function for the defender, regardless of any additional information he has about the relation between his learning outcome and the deception choice of the attacker (besides the common knowledge that the learning outcome is within a range around the deception choice). Importantly, this optimal defense function, which can be determined by solving a single non-linear program, only outputs two different strategies despite the infinite-sized learning outcome space. Our counter-deception algorithm is built on an extensive in-depth analysis of the intrinsic characteristics of the attacker's adaptive deception response to any deception-aware defense solution. That is, under our proposed defense mechanism, the attacker's deception space remains decomposable (although the sub-spaces vary depending on the counter-deception mechanism), and the attacker's optimal deception remains one of the extreme points of the deception sub-spaces.
Third, we conduct extensive experiments to evaluate our proposed algorithms in various security game settings with different numbers of targets, various ranges of the defender's capacity, different levels of the attacker's uncertainty, and finally, different correlations between the players' payoffs. Our results show that (i) despite the uncertainty, the attacker still obtains a significantly higher utility by playing deceptively when the defender is unaware of the attacker's deception; and (ii) the defender can substantially diminish the impact of the attacker's deception by following our counter-deception algorithm.
Outline of the Article
We outline the rest of the article as follows. We discuss the related work and background in Section 2 and Section 3, respectively. In Section 4, we present our detailed theoretical analysis of attacker behavior deception under uncertainty about the defender's learning outcome, given that the defender is unaware of the attacker's deception. In Section 5, we describe our new counter-deception algorithm for the defender to tackle the attacker's manipulation. In this section, we first extend the theoretical results of Section 4 to analyze how the attacker adapts its manipulation to the defender's counter-deception. Based on the result of the attacker adaptation, we then provide theoretical results on computing the defender's optimal counter-deception. In Section 6, we show our experimental results, evaluating our proposed algorithms. Finally, Section 7 concludes the article.
3. Background
Stackelberg Security Games (SSGs). There is a set of $T$ targets that a defender has to protect using $L$ security resources. A pure strategy of the defender is an allocation of these $L$ resources over the $T$ targets. A mixed strategy of the defender is a probability distribution over all pure strategies. In this work, we consider the no-scheduling-constraint game setting, in which each defender mixed strategy can be compactly represented as a coverage vector $\mathbf{x} = (x_1, \ldots, x_T)$, where $x_t$ is the probability that the defender protects target $t$ and $\sum_t x_t \le L$ [28]. We denote by $\mathbf{X}$ the set of all defense strategies. In SSGs, the defender plays first by committing to a mixed strategy, and the attacker responds against this strategy by choosing a single target to attack.
When the attacker attacks target $t$, it obtains a reward $R^a_t$ while the defender receives a penalty $P^d_t$ if the defender is not protecting that target. Conversely, if the defender is protecting $t$, the attacker gets a penalty $P^a_t$ while the defender receives a reward $R^d_t$. The expected utility of the defender, $U^d_t(\mathbf{x})$ (and the attacker's, $U^a_t(\mathbf{x})$), if the attacker attacks target $t$, are computed as follows:

$$U^d_t(\mathbf{x}) = x_t R^d_t + (1 - x_t) P^d_t \qquad (1)$$
$$U^a_t(\mathbf{x}) = x_t P^a_t + (1 - x_t) R^a_t \qquad (2)$$
Quantal Response Model (QR). QR is a well-known behavioral model used to predict boundedly rational (attacker) decision making in security games [2,6,7]. Essentially, QR predicts the probability that the attacker attacks each target $t$ using the softmax function:

$$q_t(\mathbf{x}, \lambda) = \frac{e^{\lambda U^a_t(\mathbf{x})}}{\sum_{t'} e^{\lambda U^a_{t'}(\mathbf{x})}}$$

where $\lambda \ge 0$ is the parameter that governs the attacker's rationality. When $\lambda = 0$, the attacker attacks every target uniformly at random. When $\lambda = +\infty$, the attacker is perfectly rational. Given that the attacker follows QR, the defender's and attacker's expected utilities are computed as an expectation over all targets:

$$U^d(\mathbf{x}, \lambda) = \sum\nolimits_t q_t(\mathbf{x}, \lambda)\, U^d_t(\mathbf{x}), \qquad U^a(\mathbf{x}, \lambda) = \sum\nolimits_t q_t(\mathbf{x}, \lambda)\, U^a_t(\mathbf{x}) \qquad (3)$$
The attacker's utility $U^a(\mathbf{x}, \lambda)$ was proved to be increasing in $\lambda$ [4]. We leverage this monotonicity property to analyze the attacker's deception. In SSGs, the defender can learn $\lambda$ based on some collected attack data, with the learning outcome denoted by $\lambda^{learnt}$, and find an optimal strategy which maximizes their expected utility accordingly:

$$\mathbf{x}^*(\lambda^{learnt}) \in \arg\max\nolimits_{\mathbf{x} \in \mathbf{X}} U^d(\mathbf{x}, \lambda^{learnt})$$
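To make these formulas concrete, the following minimal sketch (in Python, with hypothetical payoff values; the function names are ours, not the article's) computes the QR attack distribution and the expected utilities of Equations (1)–(3):

```python
import numpy as np

def qr_attack_probs(x, R_a, P_a, lam):
    """QR attack distribution over targets: softmax of attacker utilities."""
    U_a = x * P_a + (1.0 - x) * R_a      # attacker utility per target, Eq. (2)
    z = lam * U_a
    z -= z.max()                          # stabilize the exponentials
    q = np.exp(z)
    return q / q.sum()

def expected_utilities(x, R_a, P_a, R_d, P_d, lam):
    """Defender and attacker expected utilities under QR, Eq. (3)."""
    q = qr_attack_probs(x, R_a, P_a, lam)
    U_d = x * R_d + (1.0 - x) * P_d       # defender utility per target, Eq. (1)
    U_a = x * P_a + (1.0 - x) * R_a
    return q @ U_d, q @ U_a

# Example: 3 targets, one resource spread uniformly (illustrative payoffs).
x = np.array([1/3, 1/3, 1/3])
R_a, P_a = np.array([5., 8., 6.]), np.array([-4., -5., -3.])
R_d, P_d = np.array([3., 4., 2.]), np.array([-6., -9., -5.])
print(expected_utilities(x, R_a, P_a, R_d, P_d, lam=0.8))
```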
4. Attacker Behavior Deception under Unknown Learning Outcome
We first study the problem of imitative behavior deception in a security scenario in which the attacker does not know exactly the defender's learning outcome. Formally, if the attacker plays according to a particular parameter value of $\lambda$, denoted by $\lambda^{dec}$, the learning outcome of the defender can be any value within the interval $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$, where $\delta \ge 0$ represents the extent to which the attacker is uncertain about the learning outcome of the defender. We term this interval, $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$, the uncertainty range of $\lambda^{dec}$. We are particularly interested in the research question:
Given uncertainty about learning outcomes of the defender, can the attacker still benefit from playing deceptively?
In this section, we consider the scenario when the attacker plays deceptively while the defender does not take into account the prospect of the attacker's deception. We aim at analyzing the attacker's deception decision in this no-counter-deception scenario. We assume that the attacker plays deceptively by mimicking any $\lambda^{dec}$ within the range $[0, \lambda^{max}]$. We consider $\lambda \ge 0$ as this is the widely accepted range of the attacker's bounded rationality in the literature. The value $\lambda^{max}$ represents the limit to which the attacker plays deceptively. When $\lambda^{max} = +\infty$, the deception range of the attacker covers the whole range of $\lambda$. We examine the impact of $\lambda^{max}$ on the deception outcome of the attacker later in our experiments. Given uncertainty about the learning outcome of the defender, the attacker attempts to find the optimal $\lambda^{dec}$ to imitate that maximizes its utility in the worst-case scenario of uncertainty, which can be formulated as follows:

$$(P): \max\nolimits_{\lambda^{dec} \in [0, \lambda^{max}]}\; \min\nolimits_{\lambda^{learnt}}\; U^a(\mathbf{x}^*(\lambda^{learnt}), \lambda^{dec})$$
$$\text{s.t.} \quad \lambda^{learnt} \in [\lambda^{dec} - \delta,\, \lambda^{dec} + \delta]$$
$$\mathbf{x}^*(\lambda^{learnt}) \in \arg\max\nolimits_{\mathbf{x} \in \mathbf{X}} U^d(\mathbf{x}, \lambda^{learnt})$$
where $\mathbf{x}^*(\lambda^{learnt})$ is the defender's optimal strategy w.r.t. their learning outcome $\lambda^{learnt}$. The objective $U^a(\mathbf{x}^*(\lambda^{learnt}), \lambda^{dec})$ is the attacker's utility when the defender plays $\mathbf{x}^*(\lambda^{learnt})$ and the attacker mimics QR with $\lambda^{dec}$ to play (see Equations (1)–(3) for the detailed computation). In addition, $U^d(\mathbf{x}, \lambda^{learnt})$ is the defender's expected utility that the defender aims to maximize, where $\mathbf{x}$ is the defender's strategy and $\lambda^{learnt}$ is the learning outcome of the defender regarding the attacker's behavior. Essentially, the last constraint of $(P)$ ensures that the defender will play an optimal defense strategy according to their learning outcome. Finally, due to potential noise in learning, the defender's learning outcome $\lambda^{learnt}$ may fall outside of the deception range $[0, \lambda^{max}]$ of the attacker, which is captured by our constraint that $\lambda^{learnt} \in [\lambda^{dec} - \delta, \lambda^{dec} + \delta]$, a range that may extend beyond $[0, \lambda^{max}]$.
4.1. A Polynomial-Time Deception Algorithm
The optimization problem $(P)$ involves three nested optimization levels, which is not straightforward to solve. We thus propose to limit the possible learning outcomes of the defender by discretizing the domain of $\lambda^{learnt}$ into a finite set $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_K\}$, where $\lambda_1 = 0$, $\lambda_{k+1} = \lambda_k + \eta$, and $\lambda_K = \lambda^{max} + \delta$, with $\eta$ the discretization step size and $K$ the number of discrete learning values. For each deception choice $\lambda^{dec}$, the attacker's uncertainty set of the defender's possible learning outcomes is now given by:

$$\Lambda(\lambda^{dec}) = \{\lambda_k \in \Lambda : \lambda^{dec} - \delta \le \lambda_k \le \lambda^{dec} + \delta\}$$
For each $\lambda_k \in \Lambda$, we can easily compute the corresponding optimal defense strategy $\mathbf{x}^*(\lambda_k)$ in advance [2]. We thus obtain a simplified optimization problem:

$$(P'): \max\nolimits_{\lambda^{dec} \in [0, \lambda^{max}],\, U} U \quad \text{s.t.} \quad U \le U^a(\mathbf{x}^*(\lambda_k), \lambda^{dec}) \;\; \forall \lambda_k \in \Lambda(\lambda^{dec})$$

where $U$ is the maximin utility of the attacker in the worst case of learning outcomes.
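For illustration, a small sketch of this discretization step, assuming a hypothetical helper `attacker_utility(x, lam)` that evaluates $U^a$ and a mapping `strategies[k]` of pre-computed strategies $\mathbf{x}^*(\lambda_k)$ (both names are ours):

```python
import numpy as np

def discretize(lam_max, delta, eta):
    """Discrete learning outcomes: 0, eta, 2*eta, ..., up to lam_max + delta."""
    return np.arange(0.0, lam_max + delta + 1e-12, eta)

def uncertainty_indices(lam_dec, Lambda, delta):
    """Indices k with lambda_k inside [lam_dec - delta, lam_dec + delta]."""
    return np.where((Lambda >= lam_dec - delta) & (Lambda <= lam_dec + delta))[0]

def objective_P_prime(lam_dec, Lambda, delta, strategies, attacker_utility):
    """Objective of (P'): worst-case attacker utility over the discrete
    uncertainty set, with strategies[k] = x*(lambda_k) pre-computed."""
    ks = uncertainty_indices(lam_dec, Lambda, delta)
    return min(attacker_utility(strategies[k], lam_dec) for k in ks)
```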
Remark on computational challenge. Although $(P')$ is a single-level optimization problem, solving it is still challenging because (i) $(P')$ is a non-convex optimization problem, since the attacker's utility is non-convex in $\lambda^{dec}$; and (ii) the number of inequality constraints in $(P')$ varies with respect to $\lambda^{dec}$, which complicates the problem further. By exploiting the decomposability property of the deception space and the monotonicity of the attacker's utility function $U^a$, we show that $(P')$ can be solved optimally in polynomial time.
Theorem 1 (Time complexity). $(P')$ can be solved optimally in polynomial time.
Overall, the proof of Theorem 1 is derived based on (i) Lemma 1, showing that the deception space can be divided into a polynomial number of sub-intervals, where each sub-interval leads to the same uncertainty set; and (ii) Lemma 4, showing that $(P')$ can be divided into sub-problems corresponding to the decomposition of the deception space (as shown in Lemma 1), each of which can be solved in polynomial time.
4.1.1. Decomposability of Deception Space
In the following, we first present our theoretical analysis on the decomposability of the deception space. We then describe in detail our decomposition algorithm.
Lemma 1 (Decomposability of deception space). The attacker deception space $[0, \lambda^{max}]$ can be decomposed into a finite number of disjoint sub-intervals, denoted by $\Lambda^{dec}_1, \ldots, \Lambda^{dec}_J$, where $\bigcup_j \Lambda^{dec}_j = [0, \lambda^{max}]$ and $\Lambda^{dec}_j \cap \Lambda^{dec}_{j'} = \emptyset$ for all $j \ne j'$, such that each $\Lambda^{dec}_j$ leads to the same uncertainty set of learning outcomes, denoted by $\Lambda_j \subseteq \Lambda$. Furthermore, these sub-intervals and uncertainty sets can be found in polynomial time.
The proof of Lemma 1 is derived based on Lemmas 2 and 3. An example of the deception-space decomposition is illustrated in Figure 1. Intuitively, although the deception space $[0, \lambda^{max}]$ is infinite, the total number of possible learning-outcome uncertainty sets is at most $2^K$ (i.e., the number of subsets of the discrete learning space $\Lambda$). Therefore, the deception space can be divided into a finite number of disjoint subsets such that any deception value $\lambda^{dec}$ within each subset will lead to the same uncertainty set. Moreover, each of these deception subsets forms a sub-interval of $[0, \lambda^{max}]$, which is derived from Lemma 2:
Lemma 2. Given two deception values $\lambda^{dec}_1 < \lambda^{dec}_2$, if the learning uncertainty sets corresponding to these two values are the same, i.e., $\Lambda(\lambda^{dec}_1) = \Lambda(\lambda^{dec}_2)$, then for any deception value $\lambda^{dec} \in (\lambda^{dec}_1, \lambda^{dec}_2)$, its uncertainty set is also the same, that is: $\Lambda(\lambda^{dec}) = \Lambda(\lambda^{dec}_1) = \Lambda(\lambda^{dec}_2)$.

The remaining analysis for Lemma 1 is to show that these deception sub-intervals can be found in polynomial time, which is obtained based on Lemma 3:
Lemma 3. For each learning outcome $\lambda_k \in \Lambda$, there are at most two deception sub-intervals such that $\lambda_k$ is the smallest learning outcome in the corresponding learning uncertainty set. As a result, the total number of deception sub-intervals is $O(K)$, which is polynomial.
Since there is a polynomial number of deception sub-intervals, we can now develop a polynomial-time algorithm (Algorithm 1) which iteratively divides the deceptive range $[0, \lambda^{max}]$ into multiple intervals, denoted by $\Lambda^{dec}_1, \ldots, \Lambda^{dec}_J$. Each of these intervals, $\Lambda^{dec}_j$, corresponds to the same uncertainty set of possible learning outcomes for the defender, denoted by $\Lambda_j$.
In this algorithm, for each $\lambda_k \in \Lambda$, we denote by $lb_k = \lambda_k - \delta$ and $ub_k = \lambda_k + \delta$ the smallest and largest possible values of $\lambda^{dec}$ so that $\lambda_k$ belongs to the uncertainty set of $\lambda^{dec}$. In Algorithm 1, $a$ is the variable which represents the left bound of each interval $\Lambda^{dec}_j$. The variable $o$ indicates if $\Lambda^{dec}_j$ is left-open ($o = 1$) or not ($o = 0$). If $(a, o)$ is known for $\Lambda^{dec}_j$, the uncertainty set $\Lambda_j$ can be determined as follows:

$$\Lambda_j = \{\lambda_k \in \Lambda : lb_k \le a \le ub_k\} \text{ if } o = 0; \qquad \Lambda_j = \{\lambda_k \in \Lambda : lb_k \le a < ub_k\} \text{ if } o = 1$$
Initially, $a$ is set to 0, which is the lowest possible value of $\lambda^{dec}$, such that the uncertainty range $[a - \delta, a + \delta]$ contains $\lambda_1 = 0$, and $o = 0$. Given $a = 0$ and its uncertainty range $[-\delta, \delta]$, the first interval $\Lambda^{dec}_1$ of $\lambda^{dec}$ corresponds to the uncertainty set determined as follows:

$$\Lambda_1 = \{\lambda_k \in \Lambda : \lambda_k \le \delta\}$$
At each iteration $j$, given the left bound $a$ and the uncertainty set $\Lambda_j$ of the interval $\Lambda^{dec}_j$, Algorithm 1 determines the right bound of $\Lambda^{dec}_j$, the left bound of the next interval $\Lambda^{dec}_{j+1}$ (by updating $a$), and the uncertainty set $\Lambda_{j+1}$ (lines 6–15). Finally, we prove the correctness of Algorithm 1 by presenting Proposition 1, which shows that for any $\lambda^{dec}$ within each interval $\Lambda^{dec}_j$, the corresponding uncertainty interval $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ covers the same uncertainty set $\Lambda_j$.
Algorithm 1: Imitative behavior deception—Decomposition of QR parameter domain into sub-intervals.
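The pseudo-code of Algorithm 1 is not reproduced here. The following sketch computes the same decomposition by sweeping over the breakpoints $\lambda_k \pm \delta$; this is an equivalent formulation under our assumptions, not the article's exact iterative procedure, and it ignores the open/closed status of interval boundaries for simplicity:

```python
import numpy as np

def decompose_deception_space(Lambda, delta, lam_max):
    """Decompose [0, lam_max] into sub-intervals that share the same
    uncertainty set of discrete learning outcomes (cf. Algorithm 1).

    Lambda: sorted array of discrete learning values lambda_1 < ... < lambda_K.
    Returns a list of (left, right, uncertainty_set) triples, where
    uncertainty_set holds the indices k with |lambda_k - lam_dec| <= delta
    for every lam_dec strictly between left and right.
    """
    # Candidate breakpoints: deception values at which some lambda_k enters
    # or leaves the range [lam_dec - delta, lam_dec + delta].
    breaks = {0.0, lam_max}
    for lam_k in Lambda:
        for b in (lam_k - delta, lam_k + delta):
            if 0.0 < b < lam_max:
                breaks.add(b)
    breaks = sorted(breaks)

    intervals = []
    for left, right in zip(breaks[:-1], breaks[1:]):
        mid = 0.5 * (left + right)  # any interior point yields the same set
        unc = [k for k, lam_k in enumerate(Lambda)
               if mid - delta <= lam_k <= mid + delta]
        intervals.append((left, right, unc))
    return intervals
```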
Proposition 1 (Correctness of Algorithm 1). Each iteration $j$ of Algorithm 1 returns an interval $\Lambda^{dec}_j$ such that each $\lambda^{dec} \in \Lambda^{dec}_j$ leads to the same uncertainty set $\Lambda_j$.

The rest of this section provides the details of the missing proofs for the aforementioned theoretical results.
Proof of Lemma 2. For any $\lambda_k \in \Lambda(\lambda^{dec}_1) = \Lambda(\lambda^{dec}_2)$, we have:

$$\lambda^{dec}_1 - \delta \le \lambda_k \le \lambda^{dec}_1 + \delta \quad \text{and} \quad \lambda^{dec}_2 - \delta \le \lambda_k \le \lambda^{dec}_2 + \delta$$

Since $\lambda^{dec}_1 < \lambda^{dec} < \lambda^{dec}_2$, we obtain:

$$\lambda^{dec} - \delta < \lambda^{dec}_2 - \delta \le \lambda_k \quad \text{and} \quad \lambda_k \le \lambda^{dec}_1 + \delta < \lambda^{dec} + \delta$$

which implies $\lambda_k \in \Lambda(\lambda^{dec})$. As a result,

$$\Lambda(\lambda^{dec}_1) \subseteq \Lambda(\lambda^{dec}) \qquad (*)$$

On the other hand, let us consider a $\lambda_k \in \Lambda(\lambda^{dec})$, or equivalently, $\lambda^{dec} - \delta \le \lambda_k \le \lambda^{dec} + \delta$. We are going to show that this $\lambda_k \in \Lambda(\lambda^{dec}_1)$ as well. Indeed, let us assume $\lambda_k \notin \Lambda(\lambda^{dec}_1) = \Lambda(\lambda^{dec}_2)$. It means the following inequalities must hold true:

$$\lambda_k > \lambda^{dec}_1 + \delta \quad \text{and} \quad \lambda_k < \lambda^{dec}_2 - \delta$$

which means that the uncertainty ranges with respect to $\lambda^{dec}_1$ and $\lambda^{dec}_2$ do not overlap, i.e., $\lambda^{dec}_1 + \delta < \lambda^{dec}_2 - \delta$, or equivalently, $\Lambda(\lambda^{dec}_1) \cap \Lambda(\lambda^{dec}_2) = \emptyset$, which is contradictory. Therefore, $\lambda_k \in \Lambda(\lambda^{dec}_1)$, meaning that:

$$\Lambda(\lambda^{dec}) \subseteq \Lambda(\lambda^{dec}_1) \qquad (**)$$

The combination of (*) and (**) concludes our proof. □
Proof of Lemma 3.
First, although the deception space is infinite, the total number of possible learning-outcome uncertainty sets is at most $2^K$ (i.e., the number of subsets of the discrete learning space $\Lambda$). Therefore, the deception space can be divided into a finite number of disjoint subsets such that any deception value within each subset will lead to the same uncertainty set. Moreover, each of these deception subsets forms a sub-interval of $[0, \lambda^{max}]$, which is a result of Lemma 2.
Now, in order to prove that the number of disjoint sub-intervals is $O(K)$, we will show that for each learning outcome $\lambda_k \in \Lambda$, there are at most two deception sub-intervals such that $\lambda_k$ is the smallest learning outcome in the corresponding learning uncertainty set. Let us assume there is a deception sub-interval $\Lambda^{dec}_j$ which leads to an uncertainty set $\Lambda_j$ whose smallest learning outcome is $\lambda_k$, for some $j$. We will prove that the following inequalities must hold:

$$\lambda_k + \delta - \eta < \lambda^{dec} \le \lambda_k + \delta \quad \text{for all } \lambda^{dec} \in \Lambda^{dec}_j \qquad (4)$$

where $\eta$ is the discretization step size. Indeed, for any $\lambda^{dec} \in \Lambda^{dec}_j$, we have:

$$\lambda^{dec} - \delta \le \lambda_k \quad \text{and} \quad \lambda_{k-1} = \lambda_k - \eta < \lambda^{dec} - \delta$$

since $\lambda_k$ belongs to the uncertainty set of $\lambda^{dec}$ while $\lambda_{k-1}$ does not. Therefore,

$$\lambda_k + \delta - \eta < \lambda^{dec} \le \lambda_k + \delta$$

which concludes (4). Now, according to (4), for every $k$, all such deception values lie within a window of length $\eta$, and at most one boundary between consecutive deception sub-intervals can fall strictly inside this window, which means that there are at most two deception sub-intervals such that $\lambda_k$ is the smallest learning outcome in their learning uncertainty sets. □
Proof of Proposition 1. Note that, for each $\lambda_k \in \Lambda$, we denote by $lb_k = \lambda_k - \delta$ and $ub_k = \lambda_k + \delta$ the smallest and largest possible values of $\lambda^{dec}$ so that $\lambda_k$ belongs to the uncertainty set of $\lambda^{dec}$. In addition, $k^{min}_j$ and $k^{max}_j$ are the indices of the smallest and largest learning outcomes in the learnt uncertainty set $\Lambda_j$ for every deception value in the deception interval $\Lambda^{dec}_j$.

At each iteration $j$, given the learnt uncertainty set $\Lambda_j$, Algorithm 1 attempts to find the corresponding deception interval $\Lambda^{dec}_j$ as well as the next learnt uncertainty set $\Lambda_{j+1}$. Essentially, Algorithm 1 considers two cases:

Case 1: $k^{max}_j < K$ and $lb_{k^{max}_j + 1} \le ub_{k^{min}_j}$. This is when (i) the deception interval does not cover the maximum possible learning outcome $\lambda_K$; and (ii) the smallest deception value w.r.t. the learning outcome $\lambda_{k^{max}_j + 1}$ is less than the largest deception value w.r.t. the learning outcome $\lambda_{k^{min}_j}$. Intuitively, (ii) implies that the upper bound of the deception interval $\Lambda^{dec}_j$ is strictly less than $ub_{k^{min}_j}$. Otherwise, this deception upper bound would correspond to an uncertainty set which covers the learning outcome $\lambda_{k^{max}_j + 1}$, which contradicts the fact that $\lambda_{k^{max}_j}$ (which is strictly less than $\lambda_{k^{max}_j + 1}$) is the maximum learning outcome for the deception interval.

In this case, the interval $\Lambda^{dec}_j$ is determined as follows: its left bound is $a$ (left-open iff $o = 1$) and its right bound is $lb_{k^{max}_j + 1}$, excluded. Note that, since $\Lambda_j$ is the uncertainty set of $\lambda^{dec} = a$ with the smallest and largest learning-outcome indices $(k^{min}_j, k^{max}_j)$, we have $lb_{k^{max}_j} \le a$ and $a \le ub_{k^{min}_j}$. Therefore, for any $\lambda^{dec} \in \Lambda^{dec}_j$, we obtain:

$$lb_{k^{max}_j} \le \lambda^{dec} < lb_{k^{max}_j + 1} \quad \text{and} \quad ub_{k^{min}_j - 1} < \lambda^{dec} \le ub_{k^{min}_j}$$

which means $\lambda_{k^{min}_j}$ and $\lambda_{k^{max}_j}$ belong to the uncertainty set of $\lambda^{dec}$ while $\lambda_{k^{min}_j - 1}$ and $\lambda_{k^{max}_j + 1}$ do not. Thus, $\Lambda_j$ is the uncertainty set of every $\lambda^{dec} \in \Lambda^{dec}_j$. Since $\Lambda^{dec}_j$ is open-right, the left bound of $\Lambda^{dec}_{j+1}$ is $a = lb_{k^{max}_j + 1}$ with $o = 0$, and $\Lambda_{j+1}$ is determined accordingly.

Case 2: $k^{max}_j = K$ or $lb_{k^{max}_j + 1} > ub_{k^{min}_j}$. In this case, the deception interval $\Lambda^{dec}_j$ is determined as follows: its left bound is $a$ (left-open iff $o = 1$) and its right bound is $ub_{k^{min}_j}$, included. The argument for this case is similar. For the sake of analysis, when $k^{max}_j = K$, which is the largest index of learning outcomes in the entire set $\Lambda$, we set $lb_{K+1} = +\infty$. For any $\lambda^{dec} \in \Lambda^{dec}_j$, we have:

$$lb_{k^{max}_j} \le \lambda^{dec} < lb_{k^{max}_j + 1} \quad \text{and} \quad ub_{k^{min}_j - 1} < \lambda^{dec} \le ub_{k^{min}_j}$$

which implies $\Lambda_j$ is the uncertainty set of $\lambda^{dec}$. Since $\Lambda^{dec}_j$ is closed-right, the left bound of $\Lambda^{dec}_{j+1}$ is $a = ub_{k^{min}_j}$ with $o = 1$, concluding our proof. □
4.1.2. Divide and Conquer: Dividing $(P')$ into a Polynomial Number of Sub-Problems
Lemma 4 (Divide-and-conquer). The problem $(P')$ can be decomposed into $O(K)$ sub-problems according to the decomposability of the deception space. Each of these sub-problems can be solved in polynomial time.
Indeed, we can now divide the problem $(P')$ into multiple sub-problems which correspond to the decomposition of the deception space (Lemma 1). Essentially, each sub-problem optimizes $\lambda^{dec}$ (and $U$) over the deception sub-interval $\Lambda^{dec}_j$ (and its corresponding uncertainty set $\Lambda_j$), as shown in the following:

$$(P_j): \max\nolimits_{\lambda^{dec} \in \Lambda^{dec}_j}\; \min\nolimits_{\lambda_k \in \Lambda_j}\; U^a(\mathbf{x}^*(\lambda_k), \lambda^{dec})$$

which maximizes the attacker's worst-case utility w.r.t. the uncertainty set $\Lambda_j$. Note that the defender strategies $\mathbf{x}^*(\lambda_k)$ can be pre-computed for every outcome $\lambda_k \in \Lambda$. Each sub-problem $(P_j)$ has a constant number of constraints but still remains non-convex. Our Lemma 5 shows that, despite the non-convexity, the optimal solution of $(P_j)$ is actually straightforward to compute.
Lemma 5. The optimal solution of $\lambda^{dec}$ for each sub-problem $(P_j)$ is the (right) upper limit of the corresponding deception sub-interval $\Lambda^{dec}_j$.
This observation is derived based on the fact that the attacker's utility, $U^a(\mathbf{x}, \lambda^{dec})$, is an increasing function of $\lambda^{dec}$ [4]. Therefore, in order to solve $(P')$, we only need to iterate over the right bounds of the sub-intervals $\Lambda^{dec}_j$ and select the best $j$ such that the attacker's worst-case utility (i.e., the objective of $(P_j)$) is the highest among all sub-intervals. Since there are $O(K)$ sub-problems, $(P')$ can be solved optimally in polynomial time, concluding our proof for Theorem 1.
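Putting Lemmas 1–5 together, a sketch of the full polynomial-time solver for $(P')$, reusing `decompose_deception_space` and the same hypothetical helpers as in the earlier sketches:

```python
def solve_P_prime(Lambda, delta, lam_max, strategies, attacker_utility):
    """Solve (P') optimally: by Lemma 5, within each deception sub-interval
    the attacker's best choice is its right bound, so only O(K) candidates
    need to be evaluated."""
    best_lam, best_val = None, -float("inf")
    for left, right, unc in decompose_deception_space(Lambda, delta, lam_max):
        # Worst case over the sub-interval's own uncertainty set, evaluated
        # at the sub-interval's right bound (the sub-problem's optimum).
        value = min(attacker_utility(strategies[k], right) for k in unc)
        if value > best_val:
            best_lam, best_val = right, value
    return best_lam, best_val
```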
4.2. Solution Quality Analysis
We now focus on analyzing the solution quality of our method presented in Section 4.1 to approximately solve the deception problem $(P)$. Intuitively, let us denote by $\lambda^{dec}_{opt}$ the optimal solution of $(P)$ and by $U^a_{opt}$ the corresponding worst-case utility of the attacker under the uncertainty of learning outcomes in $[\lambda^{dec}_{opt} - \delta, \lambda^{dec}_{opt} + \delta]$. We also denote by $\bar{\lambda}^{dec}$ the optimal solution of $(P')$. Then, Theorem 2 states that:
Theorem 2. For any arbitrary $\epsilon > 0$, there always exists a discretization step size $\eta$ such that the optimal solution $\bar{\lambda}^{dec}$ of the corresponding $(P')$ is $\epsilon$-optimal for $(P)$.
Proof. Let us denote by $\lambda^{dec}_{opt}$ the optimal solution of $(P)$. Then the worst-case utility of the attacker is determined as follows:

$$U^a_{opt} = \min\nolimits_{\lambda^{learnt} \in [\lambda^{dec}_{opt} - \delta,\, \lambda^{dec}_{opt} + \delta]} U^a(\mathbf{x}^*(\lambda^{learnt}), \lambda^{dec}_{opt})$$

On the other hand, let us denote by $\bar{\lambda}^{dec}$ the optimal solution of $(P')$. Then the discretized worst-case utility of the attacker is determined as follows:

$$\bar{U}^a = \min\nolimits_{\lambda_k \in \Lambda(\bar{\lambda}^{dec})} U^a(\mathbf{x}^*(\lambda_k), \bar{\lambda}^{dec})$$

Note that $\bar{U}^a$ is not the actual worst-case utility of the attacker for mimicking $\bar{\lambda}^{dec}$, since it is computed based on the discrete uncertainty set rather than the original continuous uncertainty set. In fact, the actual attacker worst-case utility is

$$\hat{U}^a = \min\nolimits_{\lambda^{learnt} \in [\bar{\lambda}^{dec} - \delta,\, \bar{\lambda}^{dec} + \delta]} U^a(\mathbf{x}^*(\lambda^{learnt}), \bar{\lambda}^{dec})$$

We will show that for any $\epsilon > 0$, there exists a discretization step size $\eta$ such that:

$$U^a_{opt} \ge \hat{U}^a \ge U^a_{opt} - \epsilon \qquad (5)$$

Observe that the first inequality is easily obtained since $\lambda^{dec}_{opt}$ is the optimal solution of $(P)$. Therefore, we will focus on the second inequality. First, we obtain the following inequalities:

$$U^a_{opt} \le \min\nolimits_{\lambda_k \in \Lambda(\lambda^{dec}_{opt})} U^a(\mathbf{x}^*(\lambda_k), \lambda^{dec}_{opt}) \le \bar{U}^a$$

The first inequality is obtained based on the fact that the discretized uncertainty set is a subset of the actual continuous uncertainty range. The second inequality is derived from the fact that $\bar{\lambda}^{dec}$ is the optimal solution of $(P')$. Therefore, in order to obtain the second inequality of (5), we are going to prove that for any $\epsilon > 0$, there exists $\eta$ such that:

$$\hat{U}^a \ge \bar{U}^a - \epsilon \qquad (6)$$

Let us denote by $\hat{\lambda}^{learnt}$ the worst-case learning outcome with respect to $\bar{\lambda}^{dec}$ within the uncertainty range $[\bar{\lambda}^{dec} - \delta, \bar{\lambda}^{dec} + \delta]$. That is,

$$\hat{\lambda}^{learnt} \in \arg\min\nolimits_{\lambda^{learnt} \in [\bar{\lambda}^{dec} - \delta,\, \bar{\lambda}^{dec} + \delta]} U^a(\mathbf{x}^*(\lambda^{learnt}), \bar{\lambda}^{dec})$$

Since $\Lambda$ is a discretization of the learning space with step size $\eta$, there exists a $\lambda_k \in \Lambda(\bar{\lambda}^{dec})$ such that $|\lambda_k - \hat{\lambda}^{learnt}| \le \eta$. Now, according to the definition of the discretized worst-case utility of the attacker, we have:

$$\bar{U}^a \le U^a(\mathbf{x}^*(\lambda_k), \bar{\lambda}^{dec})$$

Therefore, proving (6) now reduces to proving:

$$U^a(\mathbf{x}^*(\lambda_k), \bar{\lambda}^{dec}) - U^a(\mathbf{x}^*(\hat{\lambda}^{learnt}), \bar{\lambda}^{dec}) \le \epsilon$$

where $|\lambda_k - \hat{\lambda}^{learnt}| \le \eta$. First, according to [23], for any $\lambda^{learnt}$, the defender's corresponding optimal strategy $\mathbf{x}^*(\lambda^{learnt})$ is a differentiable function of $\lambda^{learnt}$. Second, the attacker's utility $U^a(\mathbf{x}, \bar{\lambda}^{dec})$ is a differentiable function of the defender's strategy $\mathbf{x}$ for any $\bar{\lambda}^{dec}$. Therefore, $U^a(\mathbf{x}^*(\lambda^{learnt}), \bar{\lambda}^{dec})$ is differentiable (and thus continuous) at $\hat{\lambda}^{learnt}$. According to the continuity property, for any $\epsilon > 0$, there always exists $\eta > 0$ such that:

$$\left| U^a(\mathbf{x}^*(\lambda^{learnt}), \bar{\lambda}^{dec}) - U^a(\mathbf{x}^*(\hat{\lambda}^{learnt}), \bar{\lambda}^{dec}) \right| \le \epsilon$$

for all $\lambda^{learnt}$ such that $|\lambda^{learnt} - \hat{\lambda}^{learnt}| \le \eta$, concluding our proof. □
4.3. Heuristic to Improve Discretization
According to Theorem 2, we can obtain a high-quality solution for $(P)$ by having a fine discretization of the learning outcome space with a small step size $\eta$. In practice, it is not necessary to have a fine discretization over the entire learning space right from the beginning. Instead, we can start with a coarse discretization and solve the corresponding $(P')$ to obtain a solution $\bar{\lambda}^{dec}$. We then refine the discretization only within the uncertainty range of the current solution, $[\bar{\lambda}^{dec} - \delta, \bar{\lambda}^{dec} + \delta]$. We keep doing so until the uncertainty range of the latest deception solution reaches the step-size limit which guarantees $\epsilon$-optimality. Practically, by doing so, we obtain a much smaller discretized learning outcome set (i.e., a smaller $K$). As a result, the computational time for solving $(P')$ is substantially faster while the solution quality remains the same.
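A possible implementation of this coarse-to-fine loop, reusing the earlier sketches; the initial step size, the stopping threshold, and the halving factor are illustrative choices, not values from the article:

```python
import numpy as np

def refine_and_solve(delta, lam_max, strategies_for, attacker_utility,
                     eta0=0.5, eta_min=0.01):
    """Coarse-to-fine heuristic for (P'): start from a coarse grid and keep
    halving the step size, but only inside the uncertainty range of the
    current solution. strategies_for(lam) is a hypothetical solver that
    returns the defender's optimal strategy x*(lam)."""
    grid = set(np.arange(0.0, lam_max + delta + 1e-12, eta0))
    eta, lam_dec = eta0, None
    while eta > eta_min:
        Lambda = np.array(sorted(grid))
        # Recomputed per iteration for simplicity; could be cached.
        strategies = {k: strategies_for(lam) for k, lam in enumerate(Lambda)}
        lam_dec, _ = solve_P_prime(Lambda, delta, lam_max,
                                   strategies, attacker_utility)
        eta /= 2.0  # refine only around the current solution
        lo = max(0.0, lam_dec - delta)
        hi = min(lam_max + delta, lam_dec + delta)
        grid.update(np.arange(lo, hi + 1e-12, eta))
    return lam_dec
```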
5. Defender Counter-Deception
In order to counter the attacker's imitative deception, we propose to find a counter-deception defense function $H$ which maps a learnt parameter $\lambda^{learnt}$ to a strategy $\mathbf{x}$ of the defender. In designing an effective $H$, we need to take into account that the attacker will also adapt its deception choice accordingly, denoted by $\lambda^{dec}(H)$. Essentially, the problem of finding an optimal defense function which maximizes the defender's utility against the attacker's deception can be abstractly represented as follows:

$$\max\nolimits_{H} U^d(H, \lambda^{dec}(H)) \qquad (7)$$
where $\lambda^{dec}(H)$ is the deception choice of the attacker with respect to the defense function $H$ and $U^d(H, \lambda^{dec}(H))$ is the defender's utility corresponding to $H$. Finding an optimal $H$ is challenging since the domain $[0, \lambda^{max} + \delta]$ of $H$ is continuous and there is no explicit closed-form expression of $\lambda^{dec}(H)$ as a function of $H$. For the sake of our analysis, we divide the entire domain $[0, \lambda^{max} + \delta]$ into a number of sub-intervals $I = \{I_1, I_2, \ldots, I_N\}$, where $I_1 = [0, \beta_1]$, $I_n = (\beta_{n-1}, \beta_n]$ for $n > 1$, with $\beta_N = \lambda^{max} + \delta$, and $N$ is the number of sub-intervals. We define a defense function with respect to the interval set:

$$H(I, X): \lambda^{learnt} \mapsto \mathbf{x}_n \quad \text{for } \lambda^{learnt} \in I_n$$

which maps each interval $I_n$ to a single defense strategy $\mathbf{x}_n$, i.e., $H(\lambda^{learnt}) = \mathbf{x}_n$ for all $\lambda^{learnt} \in I_n$. We denote the set of these strategies by $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. Intuitively, all $\lambda^{learnt} \in I_n$ will lead to a single strategy $\mathbf{x}_n$. Our counter-deception problem now becomes finding an optimal defense function $H(I, X)$ that comprises (i) an optimal interval set $I$; and (ii) corresponding defense strategies determined by the defense function $H$ with respect to $I$, taking into account the attacker's deception adaptation. Essentially, $(I, X)$ is the optimal solution of the following optimization problem:

$$\max\nolimits_{I, X} U^d(H(I, X), \lambda^{dec}(H))$$

where

$$\lambda^{dec}(H) \in \arg\max\nolimits_{\lambda^{dec} \in [0, \lambda^{max}]}\; \min\nolimits_{\mathbf{x}_n \in X(\lambda^{dec})} U^a(\mathbf{x}_n, \lambda^{dec})$$

is the maximin deception choice of the attacker. Here, $X(\lambda^{dec}) = \{\mathbf{x}_n : I_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset\}$ is the uncertainty set of the attacker when playing $\lambda^{dec}$. This uncertainty set contains all possible defense strategy outcomes with respect to the deceptive value $\lambda^{dec}$.
Main Result. To date, we have not explicitly defined the objective function $U^d(H, \lambda^{dec}(H))$, except that we know this utility depends on the defense function $H$ and the attacker's deception response $\lambda^{dec}(H)$. Now, since $H$ maps each possible learning outcome to a defense strategy, we know that if $\lambda^{learnt} \in I_n$, then the defender's utility is $U^d(\mathbf{x}_n, \lambda^{dec})$, which can be computed using Equation (3). However, due to the deviation of the learning outcome from the attacker's deception choice $\lambda^{dec}$, different possible learning outcomes within $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ may belong to different intervals (which correspond to different strategies $\mathbf{x}_n$), leading to different utility outcomes for the defender. One may argue that to cope with this deception-learning uncertainty, we can apply the maximin approach to determine the defender's worst-case utility if the defender only has the common knowledge that $\lambda^{learnt} \in [\lambda^{dec} - \delta, \lambda^{dec} + \delta]$. Furthermore, perhaps, depending on any additional (private) knowledge the defender has regarding the relation between the attacker's deception and the actual learning outcome of the defender, we can incorporate such knowledge into our model and algorithm to obtain an even better utility outcome for the defender. Interestingly, we show that there is, in fact, a universal optimal defense function for the defender, $H^*$, regardless of any additional knowledge that he may have. That is, the defender obtains the highest utility by following this defense function, and additional knowledge besides the common knowledge cannot make the defender do better. Our main result is formally stated in Theorem 3.
Theorem 3. There is a universal optimal defense function $H^*$, regardless of any additional information (besides the common knowledge) the defender has about the relation between their learning outcome and the deception choice of the attacker. Formally, let us consider the following optimization problem:

$$(P^*): \max\nolimits_{\mathbf{x} \in \mathbf{X},\, \lambda \in [0, \lambda^{max}]} U^d(\mathbf{x}, \lambda) \quad \text{s.t.} \quad U^a(\mathbf{x}, \lambda) \ge \min\nolimits_{\mathbf{x}' \in \mathbf{X}} U^a(\mathbf{x}', \lambda^{max})$$

Denote by $(\mathbf{x}^*, \lambda^*)$ an optimal solution of $(P^*)$; then an optimal solution of (7) can be determined as follows: If $\lambda^* = \lambda^{max}$, choose the interval set $I = \{I_1\}$ with $I_1 = [0, \lambda^{max} + \delta]$ covering the entire learning space, and the function $H^*$ where $H^*(\lambda^{learnt}) = \mathbf{x}^*$ for all $\lambda^{learnt}$.
If $\lambda^* < \lambda^{max}$, choose the interval set $I = \{I_1, I_2\}$ with $I_1 = [0, \lambda^* + \delta]$, $I_2 = (\lambda^* + \delta, \lambda^{max} + \delta]$. In addition, choose the defender strategies $\mathbf{x}_1 = \mathbf{x}^*$ and $\mathbf{x}_2 \in \arg\min_{\mathbf{x} \in \mathbf{X}} U^a(\mathbf{x}, \lambda^{max})$ correspondingly.
The attacker's optimal deception against this defense function is to mimic $\lambda^*$. As a result, the defender always obtains the highest utility, $U^d(\mathbf{x}^*, \lambda^*)$, while the attacker receives its maximin utility of $U^a(\mathbf{x}^*, \lambda^*)$.
Example 1. Let us give a concrete example illustrating the result in Theorem 3. Consider a 3-target security game with the payoff matrix shown in Table 1. In this game, the defender has 1 security resource; the maximum deception value of the attacker is $\lambda^{max}$ and the uncertainty level is $\delta$. By solving $(P^*)$, we obtain a corresponding defender strategy $\mathbf{x}^*$ and the attacker behavior parameter $\lambda^* = 0$. Since $\lambda^* < \lambda^{max}$, the optimal counter-deception defense function is as follows:
If the defender learns $\lambda^{learnt} \le \delta$, the defender will play the strategy $\mathbf{x}_1 = \mathbf{x}^*$.
Otherwise, if the defender learns $\lambda^{learnt} > \delta$, the defender then plays $\mathbf{x}_2$.
Given that the defender follows this counter-deception function, the attacker's optimal deception is to mimic $\lambda^{dec} = 0$, meaning the attacker simply attacks each target uniformly at random. Here is the reason why this is the optimal choice for the attacker:
If the attacker chooses $\lambda^{dec} = 0$, the corresponding learning outcome for the defender can be any value within the range $[0, \delta]$. According to the defense function, the defender will always play the strategy $\mathbf{x}_1 = \mathbf{x}^*$. As a result, the attacker's expected utility is $U^a(\mathbf{x}^*, 0)$.
Now, if the attacker chooses $\lambda^{dec} > 0$, the corresponding learning outcome for the defender may fall into either $I_1$ or $I_2$. In particular, if the learning outcome falls into $I_2$, it means the defender plays $\mathbf{x}_2$. In this case, the resulting attacker utility is $U^a(\mathbf{x}_2, \lambda^{dec}) \le U^a(\mathbf{x}_2, \lambda^{max})$ (this inequality is due to the fact that the attacker utility is an increasing function of $\lambda^{dec}$). As a result, the worst-case utility of the attacker is no more than $U^a(\mathbf{x}_2, \lambda^{dec})$, which is strictly lower than the utility of $U^a(\mathbf{x}^*, 0)$ when the attacker mimics $\lambda^{dec} = 0$.
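The defense function of Theorem 3 is simple enough to state in code. In the sketch below, `x_star`, `lam_star`, and `x_2` are assumed to come from solving $(P^*)$ with an external solver; the function name is ours:

```python
def make_defense_function(x_star, lam_star, x_2, delta, lam_max):
    """Universal two-strategy defense function H* of Theorem 3:
    play x* whenever the learnt parameter is consistent with the target
    deception value lam_star, and the punishment strategy x_2 otherwise."""
    if lam_star >= lam_max:            # degenerate case: a single interval
        return lambda lam_learnt: x_star
    def H(lam_learnt):
        return x_star if lam_learnt <= lam_star + delta else x_2
    return H

# Usage: H = make_defense_function(x_star, lam_star, x_2, delta, lam_max)
#        x_played = H(learnt_lambda)
```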
Corollary 1. When $\lambda^{max} = +\infty$, the defense function $H^*$ (specified in Theorem 3) gives the defender a utility which is no less than their Strong Stackelberg equilibrium (SSE) utility.

The proof of Corollary 1 is straightforward. Since $(\mathbf{x}^{SSE}, \lambda^{max})$ is a feasible solution of $(P^*)$, the optimal utility of the defender is thus no less than $U^d(\mathbf{x}^{SSE}, \lambda^{max})$ (where $\mathbf{x}^{SSE}$ denotes the defender's SSE strategy).
Now the rest of this section will be devoted to proving Theorem 3. The full proof of Theorem 3 can be decomposed into three main parts: (i) we first analyze the attacker's deception adapted to the defender's counter-deception; (ii) based on the result of the attacker adaptation, we provide theoretical results on computing the defender's optimal defense function given a fixed set of sub-intervals $I$; and (iii) finally, we complete the proof of the theorem leveraging the result in (ii).
5.1. Analyzing Attacker Deception Adaptation
In this section, we aim at understanding the behavior of the attacker's deception against $H(I, X)$. Overall, as discussed in the previous section, since the attacker is uncertain about the actual learning outcome of the defender, the attacker can attempt to find an optimal deception choice $\lambda^{dec}$ that maximizes its utility under the worst case of uncertainty. Essentially, $\lambda^{dec}$ is an optimal solution of the following maximin problem:

$$\max\nolimits_{\lambda^{dec} \in [0, \lambda^{max}]}\; \min\nolimits_{\mathbf{x}_n \in X(\lambda^{dec})} U^a(\mathbf{x}_n, \lambda^{dec})$$

where:

$$X(\lambda^{dec}) = \{\mathbf{x}_n : I_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset\}$$

is the uncertainty set of the attacker with respect to the defender's sub-intervals $I$. In this problem, the uncertainty set $X(\lambda^{dec})$ depends on the variable $\lambda^{dec}$ that we need to optimize, making this problem challenging to solve.
5.1.1. Decomposability of Attacker Deception Space
First, given $H(I, X)$, we show that we can divide the range $[0, \lambda^{max}]$ of $\lambda^{dec}$ into several intervals, where each interval corresponds to the same uncertainty set. This characteristic of the attacker uncertainty set is, in fact, similar to the no-counter-deception scenario described in the previous section. We propose Algorithm 2 to determine these intervals of $\lambda^{dec}$, which works in a similar fashion to Algorithm 1. The main difference is that, in the presence of the defender's defense function, the attacker's uncertainty set $X(\lambda^{dec})$ is determined based on whether the uncertainty range $[\lambda^{dec} - \delta, \lambda^{dec} + \delta]$ of the attacker overlaps with the defender's intervals $I_n$ or not.
Algorithm 2: Counter-deception—Decomposition of QR parameter domain into sub-intervals.
Essentially, similar to Algorithm 1, Algorithm 2 also iteratively divides the range $[0, \lambda^{max}]$ of $\lambda^{dec}$ into multiple intervals, (with an abuse of notation) denoted by $\Lambda^{dec}_1, \ldots, \Lambda^{dec}_M$. Each of these intervals, $\Lambda^{dec}_j$, corresponds to the same uncertainty set of defense strategies, denoted by $X_j$. In this algorithm, for each interval $I_n$ of the defender, $lb_n$ and $ub_n$ represent the smallest and largest possible deceptive values of $\lambda^{dec}$ so that $\mathbf{x}_n \in X(\lambda^{dec})$. In addition, $n^{min}_j$ and $n^{max}_j$ denote the smallest and largest indices of the defender's strategies in the set $X$ that belong to $X_j$. Algorithm 2 relies on Lemmas 6 and 7. Note that Algorithm 2 does not check if each interval of $\lambda^{dec}$ is left-open or not, since all intervals of the defender are left-open (except for $I_1$), making all $\Lambda^{dec}_j$ left-closed (except for $\Lambda^{dec}_1$).
Lemma 6. Given a deceptive value $\lambda^{dec}$, for any $n_1 < n_2$ such that $\mathbf{x}_{n_1}, \mathbf{x}_{n_2} \in X(\lambda^{dec})$, then $\mathbf{x}_n \in X(\lambda^{dec})$ for any $n_1 < n < n_2$.
Lemma 7. For any $\lambda^{dec}$ such that $lb_n \le \lambda^{dec} \le ub_n$, the uncertainty range of $\lambda^{dec}$ overlaps with the defender's interval $I_n$, i.e., $I_n \cap [\lambda^{dec} - \delta, \lambda^{dec} + \delta] \ne \emptyset$, or equivalently, $\mathbf{x}_n \in X(\lambda^{dec})$. Otherwise, if $\lambda^{dec} < lb_n$ or $\lambda^{dec} > ub_n$, then $\mathbf{x}_n \notin X(\lambda^{dec})$.

The proofs of these two lemmas are straightforward, so we omit them for the sake of presentation. An example of decomposing the deceptive range of $\lambda^{dec}$ is shown in Figure 2.
5.1.2. Characteristics of Attacker Optimal Deception
We denote by $M$ the number of attacker intervals. Given the division of the attacker's deception range into $\Lambda^{dec}_1, \ldots, \Lambda^{dec}_M$, we can divide the problem of attacker deception into $M$ sub-problems. Each corresponds to a particular interval $\Lambda^{dec}_j$, $j \in \{1, \ldots, M\}$, as follows:

$$\max\nolimits_{\lambda^{dec} \in \Lambda^{dec}_j}\; \min\nolimits_{\mathbf{x}_n \in X_j} U^a(\mathbf{x}_n, \lambda^{dec})$$
Lemma 8. For each sub-problem with respect to the deception sub-interval $\Lambda^{dec}_j$, the attacker's optimal deception is to imitate the right bound of $\Lambda^{dec}_j$, denoted by $b_j$.
The proof of Lemma 8 is derived based on the fact that the attacker's utility is increasing in $\lambda^{dec}$. As a result, the attacker only has to search over the right bounds, $\{b_1, \ldots, b_M\}$, of all the intervals to find the best one among the sub-problems that maximizes the attacker's worst-case utility. We consider these bounds to be the deception candidates of the attacker. Let us assume $b_{j^*}$ is the best deception choice for the attacker among these candidates; that is, the attacker will mimic $b_{j^*}$. We obtain the following observations about important properties of the attacker's optimal deception, which we leverage to determine an optimal defense function later.
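For intuition, the attacker's adaptive deception can be sketched as a search over candidate right bounds. The helper below recomputes the uncertainty set $X(\lambda^{dec})$ directly from the defender's intervals rather than via Algorithm 2, and treats boundary openness only approximately; all names are ours:

```python
def attacker_best_response(bounds, X, delta, lam_max, attacker_utility):
    """Attacker's maximin deception against a defense function H(I, X).

    bounds: right endpoints beta_1 < ... < beta_N of the defender's intervals.
    X: list of the corresponding defense strategies x_1, ..., x_N.
    By Lemma 8, only the right bounds of the attacker's deception
    sub-intervals need to be checked.
    """
    # Breakpoints where the overlap pattern with defender intervals changes.
    candidates = {lam_max}
    for b in bounds:
        for c in (b - delta, b + delta):
            if 0.0 < c < lam_max:
                candidates.add(c)

    def strategy_set(lam_dec):
        lo, hi = lam_dec - delta, lam_dec + delta
        lefts = [0.0] + list(bounds[:-1])  # I_n = (left_n, beta_n]
        return [x for x, l, b in zip(X, lefts, bounds) if lo <= b and hi > l]

    return max(candidates,
               key=lambda c: min(attacker_utility(x, c) for x in strategy_set(c)))
```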
Our following Lemma 9 says that any non-optimal deception candidate for the attacker, $b_j$ with $j \ne j^*$, such that the max index of the defender strategy in the corresponding uncertainty set $X_j$, denoted by $n^{max}_j$, satisfies $n^{max}_j = n^{max}_{j^*}$, must be strictly less than $b_{j^*}$, or equivalently, $j < j^*$. Otherwise, $b_j$ cannot be a best deception response.
Lemma 9. For any $j \ne j^*$ s.t. $n^{max}_j = n^{max}_{j^*}$, then $b_j < b_{j^*}$, or equivalently, $j < j^*$.
Proof. Lemma 9 can be proved by contradiction as follows. Let us assume there is $j > j^*$ such that $n^{max}_j = n^{max}_{j^*}$. According to Algorithm 2, for any attacker interval indices $j_1 < j_2$, the min and max indices of the defender's strategies in the corresponding uncertainty sets must satisfy $n^{min}_{j_1} \le n^{min}_{j_2}$ and $n^{max}_{j_1} \le n^{max}_{j_2}$, and they cannot both be equal. That is because the intervals $\Lambda^{dec}_1, \ldots, \Lambda^{dec}_M$ returned by Algorithm 2 are sorted in a strictly increasing order. Therefore, if there is $j > j^*$ such that $n^{max}_j = n^{max}_{j^*}$, it means $n^{min}_j > n^{min}_{j^*}$ and $b_j > b_{j^*}$. In other words, the uncertainty set $X_j \subseteq X_{j^*}$. Thus, the attacker's optimal worst-case utilities w.r.t. deception intervals $j$ and $j^*$ must satisfy:

$$\min\nolimits_{\mathbf{x}_n \in X_j} U^a(\mathbf{x}_n, b_j) \ge \min\nolimits_{\mathbf{x}_n \in X_{j^*}} U^a(\mathbf{x}_n, b_j) > \min\nolimits_{\mathbf{x}_n \in X_{j^*}} U^a(\mathbf{x}_n, b_{j^*})$$

since $U^a(\mathbf{x}, \lambda)$ is a strictly increasing function of $\lambda$. This strict inequality shows that $b_{j^*}$ cannot be an optimal deception for the attacker, which is a contradiction, concluding our proof for Lemma 9. □
Note that we denote the right bounds of the attacker intervals by $\{b_1, \ldots, b_M\}$. Our Lemma 10 then says that if the max index of the defender strategy in the set $X_{j^*}$ is equal to the max index of the whole defense set, $N$, then $b_{j^*}$ is equal to the highest value of the entire deception range, i.e., $b_{j^*} = \lambda^{max}$, or equivalently, $j^* = M$.
Lemma 10. If $n^{max}_{j^*} = N$, then $j^* = M$.
Proof. We also prove this observation using contradiction. Let us assume that $j^* < M$. Again, according to Algorithm 2, for any $j_1 < j_2$, we have $n^{min}_{j_1} \le n^{min}_{j_2}$ and $n^{max}_{j_1} \le n^{max}_{j_2}$, and they cannot both be equal. Therefore, if $n^{max}_{j^*} = N$, then for all $j > j^*$, we have: $n^{max}_j = N$ and $n^{min}_j > n^{min}_{j^*}$, which means $X_j \subseteq X_{j^*}$. Therefore, if $j^* < M$, then we obtain:

$$\min\nolimits_{\mathbf{x}_n \in X_M} U^a(\mathbf{x}_n, b_M) \ge \min\nolimits_{\mathbf{x}_n \in X_{j^*}} U^a(\mathbf{x}_n, b_M) > \min\nolimits_{\mathbf{x}_n \in X_{j^*}} U^a(\mathbf{x}_n, b_{j^*})$$

which shows that $b_{j^*}$ cannot be an optimal deception of the attacker, concluding the proof of Lemma 10. □
Remark 1. According to Lemmas 9 and 10, we can easily determine which deception choices among the set $\{b_1, \ldots, b_M\}$ cannot be an optimal attacker deception, regardless of the defense strategies. These non-optimal choices are determined as follows: the deception choice $b_j$ cannot be optimal for any $j < M$ such that $n^{max}_j = n^{max}_{j+1}$ or $n^{max}_j = N$.
For any other choices, there always exist defense strategies $X$ such that $b_j$ is an optimal attacker deception.
5.2. Finding Optimal Defense Function Given Fixed I: Divide-and-Conquer
Given a set of sub-intervals $I = \{I_1, \ldots, I_N\}$, we aim at finding an optimal defense function $H(I, X)$, or equivalently, the strategies $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ corresponding to these sub-intervals. According to the previous analysis of the attacker's deception adaptation, since the attacker's best deception is one of the right bounds $\{b_1, \ldots, b_M\}$, we propose to decompose the problem of finding an optimal defense function $H$ into multiple sub-problems, each of which corresponds to a particular best deception choice for the attacker. In particular, for each sub-problem $j$, we attempt to find $X$ such that $b_j$ is the best response of the attacker. As discussed in the remark of the previous section, we can easily determine which sub-problems are not feasible. For any feasible optimal deception candidate $b_j$, the corresponding sub-problem can be formulated as follows:

$$\max\nolimits_{X} U^d(X, b_j) \quad \text{s.t.} \quad \min\nolimits_{\mathbf{x}_n \in X_j} U^a(\mathbf{x}_n, b_j) \ge \min\nolimits_{\mathbf{x}_n \in X_{j'}} U^a(\mathbf{x}_n, b_{j'}) \;\; \forall j' \ne j$$

where $U^d(X, b_j)$ is the defender's utility when the defender commits to $H(I, X)$ and the attacker plays $b_j$. The constraints guarantee that the attacker's worst-case utility for playing $b_j$ is better than playing any other $b_{j'}$. Finally, our Propositions 2 and 3 determine an optimal solution for this sub-problem.
Proposition 2 (Sub-problem $j < M$). If the defender aims to make $b_j < \lambda^{max}$ the attacker's best deception response, the best defense function for the defender is determined as follows:

For all $n \le n^{max}_j$, choose $\mathbf{x}_n = \mathbf{x}^1$, where $\mathbf{x}^1$ is an optimal solution of the following optimization problem:

$$\max\nolimits_{\mathbf{x} \in \mathbf{X}} U^d(\mathbf{x}, b_j) \quad \text{s.t.} \quad U^a(\mathbf{x}, b_j) \ge \min\nolimits_{\mathbf{x}' \in \mathbf{X}} U^a(\mathbf{x}', \lambda^{max})$$

For all $n > n^{max}_j$, choose $\mathbf{x}_n = \mathbf{x}^2$, where $\mathbf{x}^2$ is the optimal solution of the following optimization problem:

$$\min\nolimits_{\mathbf{x} \in \mathbf{X}} U^a(\mathbf{x}, \lambda^{max})$$

By following the above defense function, an optimal deception of the attacker is to mimic $b_j$, and the defender obtains a utility of $U^d(\mathbf{x}^1, b_j)$.
Proof. First, we show that the attacker's optimal deception response is to mimic $b_j$. Indeed, we have the uncertainty set $X_j = \{\mathbf{x}^1\}$ because the defender plays $\mathbf{x}^1$ for all $n \le n^{max}_j$. In addition, for all $j'$ such that $j' < j$, the uncertainty set $X_{j'}$ contains only $\mathbf{x}^1$. Therefore, we have the attacker worst-case utility satisfying:

$$\min\nolimits_{\mathbf{x}_n \in X_{j'}} U^a(\mathbf{x}_n, b_{j'}) = U^a(\mathbf{x}^1, b_{j'}) \le U^a(\mathbf{x}^1, b_j)$$

Furthermore, for all $j'$ such that $j' > j$, we have $n^{max}_{j'} > n^{max}_j$ according to Lemma 9, meaning that $\mathbf{x}^2 \in X_{j'}$. Thus, we obtain:

$$\min\nolimits_{\mathbf{x}_n \in X_{j'}} U^a(\mathbf{x}_n, b_{j'}) \le U^a(\mathbf{x}^2, b_{j'}) \le U^a(\mathbf{x}^2, \lambda^{max}) \le U^a(\mathbf{x}^1, b_j)$$

Based on the above defense function and the fact that the attacker will choose $b_j$, the defender receives a utility of $U^d(\mathbf{x}^1, b_j)$. Next, we prove that this is the best the defender can obtain by showing that any defense function $H$ such that $b_j$ is the attacker's best response will lead to a defender utility no more than $U^d(\mathbf{x}^1, b_j)$. Indeed, since $b_j < \lambda^{max}$, it means $n^{max}_j < N$, or in other words, $\mathbf{x}_N \notin X_j$. On the other hand, since $b_j$ is the best choice of the attacker, the following inequality must hold:

$$\min\nolimits_{\mathbf{x}_n \in X_j} U^a(\mathbf{x}_n, b_j) \ge \min\nolimits_{\mathbf{x}_n \in X_M} U^a(\mathbf{x}_n, \lambda^{max}) \ge \min\nolimits_{\mathbf{x} \in \mathbf{X}} U^a(\mathbf{x}, \lambda^{max})$$

This means that any defense function such that $b_j$ is the attacker's best response has to satisfy the above inequality. As defined, $U^d(\mathbf{x}^1, b_j)$ is the highest utility for the defender among these defense functions that satisfy the above inequality. □
Proposition 3 (Sub-problem $j = M$). If the defender aims to make $b_M = \lambda^{max}$ the attacker's best deception response, the best counter-deception of the defender can be determined as follows: for all $n$, we set $\mathbf{x}_n = \hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ is an optimal solution of

$$\max\nolimits_{\mathbf{x} \in \mathbf{X}} U^d(\mathbf{x}, \lambda^{max})$$

By following this defense function, the attacker's best deception is to mimic $\lambda^{max}$ and the defender obtains a utility of $U^d(\hat{\mathbf{x}}, \lambda^{max})$.
Proof. First, we observe that given $\mathbf{x}_n = \hat{\mathbf{x}}$ for all $n$, mimicking $\lambda^{max}$ is the best response of the attacker. Indeed, since $X(\lambda^{dec}) = \{\hat{\mathbf{x}}\}$ for every $\lambda^{dec}$, we have:

$$\min\nolimits_{\mathbf{x}_n \in X(\lambda^{dec})} U^a(\mathbf{x}_n, \lambda^{dec}) = U^a(\hat{\mathbf{x}}, \lambda^{dec}) \le U^a(\hat{\mathbf{x}}, \lambda^{max})$$

Second, since $b_M = \lambda^{max}$, then for any defense function such that $\lambda^{max}$ is the best deception choice of the attacker, the resulting utility for the defender must be no more than:

$$\max\nolimits_{\mathbf{x} \in \mathbf{X}} U^d(\mathbf{x}, \lambda^{max})$$

regardless of the learning outcome $\lambda^{learnt}$. This is because the defender eventually plays one of the defense strategies in the set $X$. The RHS is the defender's utility obtained by playing the counter-deception specified by the proposition. □
Based on Propositions 2 and 3, we can easily find the optimal counter-deception by choosing the solution of the sub-problem that provides the highest defender utility.
5.3. Completing the Proof of Theorem 3
According to Propositions 2 and 3, given an interval set $I$, the resulting defense function will only lead the defender to play either $(\mathbf{x}^1, \mathbf{x}^2)$ or $\hat{\mathbf{x}}$, whichever provides a higher utility for the defender. Based on this result, our Theorem 3 then identifies an optimal interval set and corresponding optimal defense strategies, as we prove below.
First, we will show that if the defender follows the defense function specified in Theorem 3, then the attacker's optimal deception is to mimic $\lambda^*$. Indeed, if $\lambda^* = \lambda^{max}$, then since the defender always plays $\mathbf{x}^*$, the attacker's optimal deception is to play $\lambda^{max}$ to obtain its highest utility $U^a(\mathbf{x}^*, \lambda^{max})$.
On the other hand, if $\lambda^* < \lambda^{max}$, we consider two cases:
Case 1, if $\lambda^* + 2\delta \ge \lambda^{max}$, then the intervals of the attacker are $[0, \lambda^*]$ and $(\lambda^*, \lambda^{max}]$. The corresponding uncertainty sets are $\{\mathbf{x}_1\}$ and $\{\mathbf{x}_1, \mathbf{x}_2\}$. In this case, the attacker's optimal deception is to mimic $\lambda^*$, since:

$$U^a(\mathbf{x}^*, \lambda^*) \ge \min\nolimits_{\mathbf{x} \in \mathbf{X}} U^a(\mathbf{x}, \lambda^{max}) = U^a(\mathbf{x}_2, \lambda^{max}) \ge \min\{U^a(\mathbf{x}_1, \lambda^{max}),\, U^a(\mathbf{x}_2, \lambda^{max})\}$$

Case 2, if $\lambda^* + 2\delta < \lambda^{max}$, then the corresponding intervals for the attacker are $[0, \lambda^*]$, $(\lambda^*, \lambda^* + 2\delta]$, and $(\lambda^* + 2\delta, \lambda^{max}]$. These intervals of the attacker have uncertainty sets $\{\mathbf{x}_1\}$, $\{\mathbf{x}_1, \mathbf{x}_2\}$, and $\{\mathbf{x}_2\}$, respectively. The attacker's best deception is thus to mimic $\lambda^*$, since the attacker's worst-case utility for mimicking $\lambda^*$ is $U^a(\mathbf{x}^*, \lambda^*)$, and

$$U^a(\mathbf{x}^*, \lambda^*) \ge U^a(\mathbf{x}_2, \lambda^{max}) \ge \min\nolimits_{\mathbf{x}_n \in X_j} U^a(\mathbf{x}_n, b_j) \quad \text{for } j = 2, 3$$
Now, since the attacker's best deception is to mimic $\lambda^*$, according to the above analysis, the uncertainty set is $\{\mathbf{x}_1\} = \{\mathbf{x}^*\}$; thus the defender will play $\mathbf{x}^*$ in the end, leading to a utility of $U^d(\mathbf{x}^*, \lambda^*)$. This is the highest possible utility that the defender can obtain, since both optimization problems presented in Propositions 2 and 3 are special cases of $(P^*)$ when we fix the variable $\lambda = \lambda^{max}$ (for Proposition 3) or $\lambda = b_j$ (for Proposition 2).
6. Experimental Evaluation
Our experiments are run on a 2.8 GHz Intel Xeon processor with 256 GB RAM. We use Matlab (https://www.mathworks.com, accessed on 1 October 2022) to solve non-linear programs and Cplex (https://www.ibm.com/analytics/cplex-optimizer, accessed on 1 October 2022) to solve the LPs involved in the evaluated algorithms. We use a fixed value of $\lambda^{max}$ in all our experiments (except in Figure 3g,h), and discretize the range $[0, \lambda^{max} + \delta]$ using a fixed step size $\eta$. We use the covariance game generator, CovariantGame, of GAMUT (http://gamut.stanford.edu, accessed on 1 October 2022) to generate rewards and penalties of the players within fixed ranges for the attacker and the defender, respectively. GAMUT takes as input a covariance value $r$ which controls the correlation between the defender's and the attacker's payoffs. Our results are averaged over 50 runs. All our results are statistically significant under bootstrap-t.
Algorithms. We compare three cases: (i) the attacker is non-deceptive and the defender also assumes so; as a result, both play Strong Stackelberg equilibrium strategies; (ii) the attacker is deceptive, while the defender does not handle the attacker's deception (Section 4); here we examine different uncertainty ranges by varying the values of $\delta$; and (iii) the attacker is deceptive while the defender tackles the attacker's deception (Section 5).
Figure 3a,b compare the performance of our algorithms with an increasing number of targets. These figures show that (i) the attacker benefits from playing deceptively (deception achieves a 61% higher attacker utility than no deception); (ii) the benefit of deception to the attacker is reduced when the attacker is uncertain about the defender's learning outcome; in particular, deception under uncertainty achieves a 4% lower attacker utility; (iii) the defender suffers a substantial utility loss due to the attacker's deception, and this utility loss is reduced in the presence of the attacker's uncertainty; and finally, (iv) the defender benefits significantly (in terms of utility) by employing counter-deception against a deceptive attacker.
In Figure 3c,d, we show the performance of our algorithms with varying $r$ (i.e., covariance) values. In zero-sum games (i.e., $r = -1$), the attacker has no incentive to be deceptive [4]. Therefore, we only plot the results for covariance values $r > -1$. This figure shows that when $r$ gets closer to $-1$ (which implies zero-sum behavior), the attacker's utility with deception (with or without the defender's counter-deception) gradually moves closer to its utility without deception, reflecting that the attacker has less incentive to play deceptively. Furthermore, the defender's average utility in all cases gradually decreases as the covariance value gets closer to $-1$. These results show that in SSGs, the defender's utility is always governed by the adversarial level (i.e., the payoff correlation) between the players, regardless of whether the attacker is deceptive or not.
Figure 3e,f compare the attacker and defender utilities with a varying uncertainty range, i.e., different $\delta$ values, on 60-target games. These figures show that attacker utilities decrease linearly with increasing values of $\delta$. On the other hand, defender utilities increase linearly with increasing values of $\delta$. This is reasonable, as increasing $\delta$ corresponds to a greater width of the uncertainty interval that the attacker has to contend with. This increased uncertainty forces the attacker to play more conservatively, thereby leading to decreased utilities for the attacker and increased utilities for the defender.
In Figure 3g,h, we analyze the impact of varying $\lambda^{max}$ on the players' utilities in 60-target games. These figures show that (i) with increasing values of $\lambda^{max}$, the action space of a deceptive attacker increases; hence, the attacker utility increases as a result in both sub-figures; (ii) when $\lambda^{max}$ is close to zero, the attacker is limited to a less-strategic-attack zone and thus the defender's strategies have less influence on how the attacker would respond; the defender thus receives a lower utility when $\lambda^{max}$ gets close to zero; and (iii) most importantly, the attacker utility against a counter-deceptive defender decreases with increasing values of $\lambda^{max}$. This result shows that when the defender plays counter-deception, the attacker can actually gain more benefit by committing to a more limited deception range.
Finally, we evaluate the runtime performance of our algorithms in Figure 4. We provide results for two resource-to-target ratios. This figure shows that (i) even on 100-target games, our deception algorithm finishes in ∼5 min; and (ii) due to the simplicity of the proposed counter-deception algorithm, it finishes in 13 s on 100-target games.
Additional Experiment Results
Figure 5 shows the performance of our algorithms as we vary the number of resources $L$ on 80-target games and 20-target games. This figure shows that the benefits of deception and counter-deception to the players are observed consistently when varying $L$: (i) the defender (attacker) utilities steadily increase (decrease) with increasing $L$; and (ii) the trends observed between the different algorithms in Figure 5 are consistent across different values of $L$. In Figure 6, we compare the different algorithms with an increasing number of targets. We observe similar trends in these additional results.