Article

A Two-Player Resource-Sharing Game with Asymmetric Information

by Mevan Wijewardena * and Michael J. Neely
Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089-2565, USA
* Author to whom correspondence should be addressed.
Games 2023, 14(5), 61; https://doi.org/10.3390/g14050061
Submission received: 11 August 2023 / Revised: 12 September 2023 / Accepted: 13 September 2023 / Published: 17 September 2023
(This article belongs to the Special Issue Applications of Game Theory with Mathematical Methods)

Abstract: This paper considers a two-player game where each player chooses a resource from a finite collection of options. Each resource brings a random reward. Both players have statistical information regarding the rewards of each resource. Additionally, there exists an information asymmetry where each player knows the reward realizations of a different subset of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource. We first implement the iterative best response algorithm to find an ϵ-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium may not be desirable when players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.

1. Introduction

We consider the following game with two players, A and B. There are n resources, each denoted by an integer between 1 and n. Each player selects a resource without knowledge of the other player's selection. The state of the game is described by the random vector $W = (W_1, W_2, \ldots, W_n)$, where $W_k$ is the reward random variable of resource k. We assume the $W_k$, $1 \le k \le n$, are independent random variables taking non-negative real values. If both players choose the same resource k, each gets a utility of $W_k/2$. If they choose different resources k and l, they receive utilities of $W_k$ and $W_l$, respectively. It is assumed that the mean and the variance of $W_k$ exist and are finite for each $1 \le k \le n$. Both players know the distribution of $W$. Our formulation allows for an information asymmetry between the players. In particular, $\{1, 2, \ldots, n\}$ can be partitioned into four sets $\{\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{AB}\}$, where only player A observes the realizations of $W_k$ for $k \in \mathcal{A}$, only player B observes the realizations of $W_k$ for $k \in \mathcal{B}$, neither player observes the realizations of $W_k$ for $k \in \mathcal{C}$, and both players observe the realizations of $W_k$ for $k \in \mathcal{AB}$.
This game can be used to model different real-world scenarios where the agents have asymmetric information regarding the underlying rewards. One classic example is the problem of Multiple-Access Control (MAC) in communication systems. Here, communication channels are accessed by multiple users, and the data rate of a channel is shared amongst the users who select it [1]. A channel can be shared using Time Division Multiple Access (TDMA) or Frequency Division Multiple Access (FDMA): in TDMA, the channel is time-shared among the users [2,3], whereas in FDMA, the channel is frequency-shared among the users [4]. In both cases, the total data rate supported by the channel can be considered the utility of the channel. The problem of information asymmetry arises since a user might have precise information regarding the total data rate offered by some channels but not others, and the known channels can differ across users. On the other hand, the users in such a system cannot be trusted, since the system may have malicious users (for instance, jammers) who focus on reducing the data rate available to genuine users.
Modified versions of this game apply to problems in economics. For instance, consider a firm that chooses a market to enter from a pool of market options. The chosen market may also be chosen by another firm. The reward of a market is the revenue it brings. Assume a simplified model where there exists a total revenue for each market, and the total revenue is divided equally among the firms entering the market. A reward known to all firms can be considered public information, while a reward known only to one firm is private information of that firm.
The game defined above can be viewed as a stochastic version of the class of games defined in [5], which are resource-sharing games, also known as congestion games. In resource-sharing games, players compete for a finite collection of resources. In a single turn of the game, each player is allowed to select a subset of the collection of resources, where the allowed subsets make up the action space of the player. Each resource offers a reward to each player who selected the particular resource, where the reward offered depends on the number of players who selected it. The relationship between the reward offered to a player by a resource and the number of users selecting it is captured by the reward function of the resource. A player’s utility is equal to the sum of the rewards offered by the resources in the subset selected by the player. In [5], it is established that the above game has a pure-strategy (deterministic) Nash equilibrium.
Although these games ignore the stochastic nature of the rewards offered by the resources in the classical setting, the idea of resource-sharing games has been extended to different stochastic versions [6,7]. Versions of the game with information asymmetry have been considered through the work of [8] in the context of Bayesian games, which considers the information design problem for resource sharing under uncertainty. Similar Bayesian games have also been considered in [9,10]. It should be noted that in general resource-sharing games, no conditions are placed on the reward functions of the resources. The special case where the reward functions are non-decreasing in the number of players selecting the resource is called a cost-sharing game [11]. These games are typically treated as games where a cost is minimized rather than a utility being maximized. In fair cost-sharing games, the cost of a resource is divided equally among the players selecting the resource. We consider a fair reward allocation model, where the reward of a resource is equally shared among the players selecting it. It should be noted that in this model, the players have opposite incentives compared to a fair cost-sharing model.
The work on resource-sharing games assumes that the players either cooperate or have the incentive to maximize a private or a social utility. It is interesting to consider a stochastic version of the game with asymmetric information between players who do not necessarily trust each other and who place no assumptions on the incentives of the opponents. In this context, the players have no signaling or external feedback and take actions based only on their personal knowledge of the reward realizations for a subset of the resource options. In this paper, we consider the above problem and limit our attention to the two-player singleton case, where each player can choose only one resource.
In the first part of the paper, we provide an iterative best response algorithm to find an ϵ-approximate Nash equilibrium of the system. In the second part, we solve the problem of maximizing the worst-case expected utility of the first player. We solve the problem in two cases. The first case is when neither player knows the realizations of the reward random variables of any of the resources, in which case an explicit solution can be constructed. This case yields a counter-intuitive solution that provides insight into the problem. One such insight is that, while it is always optimal to choose from a subset of resources with the highest average rewards, within that subset, one chooses the higher-valued rewards with lower probability. For the second case, we solve the general version of the problem by developing an algorithm that leverages online optimization techniques [12,13] and the drift-plus-penalty method [14]. This algorithm generates a mixture of $O(1/\varepsilon^2)$ pure strategies, which, when used in an equiprobable mixture, provides a utility within ε of optimality on average. Below, we summarize our major contributions.
  • We consider the problem of a two-player singleton stochastic resource-sharing game with asymmetric information. We first provide an iterative best response algorithm to find an ϵ-approximate Nash equilibrium of the system. This equilibrium analysis uses potential game concepts.
  • When the players do not trust each other and place no assumptions on the incentives of the opponent, we solve the problem of maximizing the worst-case expected utility of the first player using a novel algorithm that leverages techniques from online optimization and the drift-plus-penalty method. The algorithm developed can be used to solve the general unconstrained problem of finding the randomized decision $\alpha \in \{1, 2, \ldots, n\}$ which maximizes $E\{h(x; \Theta)\}$, where $x \in \mathbb{R}^n$ with $x_k = E\{\Gamma_k 1\{\alpha = k\}\}$, $\Theta \in \mathbb{R}^m$ and $\Gamma \in \mathbb{R}^n$ are non-negative random vectors with finite second moments, and h is a concave function such that $\tilde{h}(x) = E\{h(x; \Theta)\}$ is Lipschitz continuous, entry-wise non-decreasing, and has bounded subgradients.
  • We show that our algorithm uses a mixture of only $O(1/\varepsilon^2)$ pure strategies, using a detailed analysis of the sample path of the related virtual queues (our preliminary work on this algorithm used a mixture of $O(1/\varepsilon^3)$ pure strategies). Virtual queues are also used for constrained online convex optimization in [13], but our problem structure is different and requires a different and more involved treatment.

1.1. Background on Resource-Sharing Games

The classical resource-sharing game defined in [5] is a tuple $(\mathcal{M}, \mathcal{N}, \mathcal{T}, r)$, where $\mathcal{M}$ is a set of m players, $\mathcal{N}$ is a set of n resources, $\mathcal{T} = \mathcal{T}_1 \times \mathcal{T}_2 \times \cdots \times \mathcal{T}_m$, where $\mathcal{T}_j$ is the set of possible actions of player j (a subset of $2^{\mathcal{N}}$), and $r = (r_1, r_2, \ldots, r_n)$, where $r_i : \mathbb{N}_0 \to \mathbb{R}$ is the reward function of resource i. Here, we use the notation $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Each player has complete knowledge of the tuple $(\mathcal{M}, \mathcal{N}, \mathcal{T}, r)$, but not of the actions chosen by the other players. For an action profile $a = (a_1, a_2, \ldots, a_m) \in \mathcal{T}$, the count function $\#$ is a function from $\mathcal{N} \times \mathcal{T}$ to $\mathbb{N}_0$ with $\#(i, a) = \sum_{k=1}^m 1\{i \in a_k\}$. In other words, $\#(i, a)$ is the number of players choosing resource i under action profile $a$. We call the quantity $r_i(\#(i, a))$ the per-player reward of resource i under action profile $a$. The utility $u_j$ of player j is a function from $\mathcal{T}$ to $\mathbb{R}$, where $u_j(a) = \sum_{i=1}^n 1\{i \in a_j\} r_i(\#(i, a))$. In other words, $u_j(a)$ is the sum of the per-player rewards of the resources chosen by player j under action profile $a$. Resource-sharing games fall under the general category of potential games [15]. Potential games are the class of games for which the change in reward of any player as a result of changing their strategy can be captured by the change in a global potential function.
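As a concrete illustration of these definitions, the following minimal Python sketch (ours, not from the paper) evaluates the count function and a player's utility for a given action profile, using a fair-sharing reward function as an example:

```python
from typing import List, Set, Dict, Callable

def count(i: int, profile: List[Set[int]]) -> int:
    # #(i, a): number of players whose chosen subset contains resource i
    return sum(1 for a_k in profile if i in a_k)

def utility(j: int, profile: List[Set[int]],
            r: Dict[int, Callable[[int], float]]) -> float:
    # u_j(a): sum of per-player rewards r_i(#(i, a)) over resources chosen by j
    return sum(r[i](count(i, profile)) for i in profile[j])

# Example: two resources whose reward is divided equally among selectors
r = {0: lambda x: 6.0 / x, 1: lambda x: 4.0 / x}
profile = [{0}, {0, 1}]          # player 0 picks {0}; player 1 picks {0, 1}
print(utility(0, profile, r))    # 3.0: resource 0 is shared by both players
print(utility(1, profile, r))    # 7.0 = 3.0 + 4.0
```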
Many game variations of the resource-sharing game have been studied [16]. Weighted resource-sharing games [17], games with player-dependent reward functions [18], and games with resources having preferences over players [19] are some of the extensions. Singleton games, where each player is allowed to choose only one resource, have also been explored explicitly in the literature [20,21]. Some of the extensions of the classical resource-sharing game possess a pure Nash equilibrium in the singleton case. Two examples would be the games with player-specific reward functions for a resource [18] and the games with priorities where the resources have preferences over the players [19].
Resource-sharing games have been extended to several stochastic versions. For instance, ref. [6] considers the selfish routing problem with risk-averse players in a network with stochastic delays. The work of [7] considers two scenarios where, in the first scenario, each player participates in the game with a certain probability, and in the second scenario, the reward functions are stochastic. The problem of information asymmetry in resource-sharing games has been addressed through the work of [8,9,10,22]. The work of [22] considers a network congestion game where the players have different information sets regarding the edges of the network. Further, ref. [8] considers a scenario with a single random state θ, which determines the reward functions. The realization of θ is known to a game manager who strategically provides recommendations (signaling) to the players to minimize the social cost. An information asymmetry arises among the players in this case due to the actions of the game manager during private signaling, where the game manager provides player-specific recommendations.
Resource-sharing games appear in a variety of applications such as service chain composition [23], congestion control [24], network design [25], load balancing networks [26,27], resource sharing in wireless networks [28], spectrum sharing [29], radio access selection [30], non-orthogonal multiple access [31,32], network selection [33,34], and migration of species [35].
Our formulation differs from the literature on resource-sharing games since we consider a scenario that is difficult to be analyzed using the standard equilibrium-based approaches. This is due to the fact that the players do not trust each other and place no assumptions on the incentives of the opponents, and they take action in the absence of a signaling mechanism or external feedback by just using their knowledge of the reward random variables. This motivates our formulation as a one-shot problem tackled using worst-case expected utility maximization.

1.2. Notation

We use calligraphic letters to denote sets. Vectors and matrices are denoted by boldface characters. For integers n and m, we denote by $[n:m]$ the inclusive set of integers between n and m. Given a vector $w \in \mathbb{R}^m$, $w_k$ denotes the k-th element of $w$; $w_{k:l}$ for $l \ge k$ represents the $(l-k+1)$-dimensional sub-vector $(w_k, w_{k+1}, \ldots, w_l)$ of $w$; for a subset $\mathcal{S}$ of integers from 1 to n, $\{w_k; k \in \mathcal{S}\}$ represents the sub-vector of $w$ with indices in $\mathcal{S}$. For $z \in \mathbb{R}^m$, we use $\|z\|_2$ to denote the standard Euclidean norm ($L_2$ norm) of $z$. For a function $f : \mathbb{R}^m \to \mathbb{R}$ and $z \in \mathbb{R}^m$, we use $\nabla f(z) = (\nabla f_1(z), \nabla f_2(z), \ldots, \nabla f_m(z))$ to denote a subgradient of f at $z$.

2. Materials and Methods

The code used for the simulations is implemented in Python in the notebook https://rb.gy/wvt33, accessed on 10 August 2023.

3. Formulation

Denote $X = \{W_k; k \in \mathcal{A}\}$, $Y = \{W_k; k \in \mathcal{B}\}$, $Z = \{W_k; k \in \mathcal{AB}\}$, and $V = \{W_k; k \in \mathcal{C}\}$. Recall that $X$ is known only to player A, $Y$ is known only to player B, $Z$ is known to both players, and $V$ is known to neither. Let us define $\mathcal{A}^c = [1:n] \setminus \mathcal{A}$ and $\mathcal{B}^c = [1:n] \setminus \mathcal{B}$. Let $|\mathcal{A}| = a$, $|\mathcal{B}| = b$, $|\mathcal{C}| = c$, and $|\mathcal{AB}| = d$, so that $a + b + c + d = n$. Without loss of generality, we assume $\mathcal{A} = [1:a]$, $\mathcal{B} = [a+1:a+b]$, $\mathcal{C} = [a+b+1:a+b+c]$, and $\mathcal{AB} = [a+b+c+1:n]$.
Let $R_C(g_A, g_B)$ be the random variable representing the utility of player $C \in \{A, B\}$, given that player A uses strategy $g_A$ and player B uses strategy $g_B$. General strategies for players A and B can be represented by the Borel-measurable functions,
$$g_A : [0,1) \times \mathbb{R}_{\ge 0}^{a+d} \to [1:n], \tag{1}$$
$$g_B : [0,1) \times \mathbb{R}_{\ge 0}^{b+d} \to [1:n], \tag{2}$$
where
$$\alpha_A = g_A(U_A, X, Z), \tag{3}$$
$$\alpha_B = g_B(U_B, Y, Z), \tag{4}$$
are the resources chosen by players A and B, respectively. Here, $U_A$ and $U_B$ are independent randomization variables, uniformly distributed on $[0,1)$ and independent of $W$. A pure strategy for player A is a function $g_A$ that does not depend on $U_A$, whereas a mixed strategy is a function $g_A$ that depends on $U_A$. Hence, we drop the randomization variable when writing a pure strategy. Pure and mixed strategies for player B are defined similarly. Let $\mathcal{S}_A$ and $\mathcal{S}_B$ denote the sets of all possible strategies for players A and B, respectively.
It turns out that our analysis is simplified when Z is fixed. Fixing Z does not affect the symmetry between players A and B since Z is observed by both players A and B. Hereafter, we conduct the analysis by considering all quantities conditioned on Z .
Define
$$p_k^A = E\{1\{\alpha_A = k\} \mid Z\} \text{ for } 1 \le k \le n, \qquad q_k^A = E\{W_k 1\{\alpha_A = k\} \mid Z\} \text{ for } k \in \mathcal{A}, \tag{5}$$
and,
$$p_k^B = E\{1\{\alpha_B = k\} \mid Z\} \text{ for } 1 \le k \le n, \qquad q_k^B = E\{W_k 1\{\alpha_B = k\} \mid Z\} \text{ for } k \in \mathcal{B}. \tag{6}$$
Note that $p_k^A$ and $p_k^B$ are the conditional probabilities of players A and B choosing k given $Z$. Define the vectors $p^A = \{p_k^A; 1 \le k \le n\}$, $q^A = \{q_k^A; k \in \mathcal{A}\}$, $p^B = \{p_k^B; 1 \le k \le n\}$, and $q^B = \{q_k^B; k \in \mathcal{B}\}$. For $1 \le k \le n$, define $E_k = E\{W_k \mid Z\}$. Hence, we have
$$E_k = \begin{cases} W_k & \text{if } k \in \mathcal{AB}, \\ E\{W_k\} & \text{otherwise}, \end{cases} \tag{7}$$
which uses the independence of $W_k$ and $Z$ when $k \notin \mathcal{AB}$.
Note that the utility achieved by player A given the strategies $g_A$ and $g_B$ can be written as
$$R_A(g_A, g_B) = \sum_{k=1}^n W_k \left( 1\{\alpha_A = k\} - \frac{1}{2} 1\{\alpha_A = k\} 1\{\alpha_B = k\} \right). \tag{8}$$
Given the strategies $g_A$ and $g_B$, we provide an expression for the expected utility of player A given $Z$, where the expectation is over the random variables $X, Y, V$ and the possibly random actions $\alpha_A$ and $\alpha_B$. Taking expectations of (8) gives,
$$\begin{aligned} E\{R_A(g_A, g_B) \mid Z\} &= \sum_{k=1}^n E\{W_k 1\{\alpha_A = k\} \mid Z\} - \frac{1}{2}\sum_{k=1}^n E\{W_k 1\{\alpha_A = k\} 1\{\alpha_B = k\} \mid Z\} \\ &= \sum_{k \in \mathcal{A}} E\{W_k 1\{\alpha_A = k\} \mid Z\} + \sum_{k \in \mathcal{A}^c} E\{W_k \mid Z\} E\{1\{\alpha_A = k\} \mid Z\} - \frac{1}{2}\sum_{k=1}^n E\{W_k 1\{\alpha_A = k\} 1\{\alpha_B = k\} \mid Z\} \\ &= \sum_{k \in \mathcal{A}} q_k^A + \sum_{k \in \mathcal{A}^c} E_k p_k^A - \frac{1}{2}\sum_{k=1}^n E\{W_k 1\{\alpha_A = k\} 1\{\alpha_B = k\} \mid Z\}. \end{aligned} \tag{9}$$
Note that given $Z$, the random variables $\alpha_A$ and $\alpha_B$ are independent. Hence, we can split the last term of (9) as follows,
$$\begin{aligned} \sum_{k=1}^n E\{W_k 1\{\alpha_A=k\}1\{\alpha_B=k\} \mid Z\} &= \sum_{k\in\mathcal{A}} E\{W_k 1\{\alpha_A=k\}1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{B}} E\{W_k 1\{\alpha_A=k\}1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E\{W_k 1\{\alpha_A=k\}1\{\alpha_B=k\}\mid Z\} \\ &= \sum_{k\in\mathcal{A}} E\{W_k 1\{\alpha_A=k\}\mid Z\}E\{1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{B}} E\{1\{\alpha_A=k\}\mid Z\}E\{W_k 1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k E\{1\{\alpha_A=k\}\mid Z\}E\{1\{\alpha_B=k\}\mid Z\} \\ &= \sum_{k\in\mathcal{A}} q_k^A p_k^B + \sum_{k\in\mathcal{B}} p_k^A q_k^B + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k p_k^A p_k^B. \end{aligned} \tag{10}$$

4. Computing the ϵ-Approximate Nash Equilibrium

This section focuses on finding an ϵ-approximate Nash equilibrium of the game. Fix $\epsilon > 0$. A strategy pair $(g_A, g_B)$ is defined as an ϵ-approximate Nash equilibrium if neither player can improve its expected reward by more than ϵ by changing its strategy (while the strategy of the other player is held fixed).
Combining (10) with (9), we have that
$$E\{R_A(g_A,g_B)\mid Z\} = \sum_{k\in\mathcal{A}} q_k^A + \sum_{k\in\mathcal{A}^c} E_k p_k^A - \frac{1}{2}\left[\sum_{k\in\mathcal{A}} q_k^A p_k^B + \sum_{k\in\mathcal{B}} p_k^A q_k^B + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k p_k^A p_k^B\right]. \tag{11}$$
Similarly, for player B, we have
$$E\{R_B(g_A,g_B)\mid Z\} = \sum_{k\in\mathcal{B}} q_k^B + \sum_{k\in\mathcal{B}^c} E_k p_k^B - \frac{1}{2}\left[\sum_{k\in\mathcal{A}} q_k^A p_k^B + \sum_{k\in\mathcal{B}} p_k^A q_k^B + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k p_k^A p_k^B\right]. \tag{12}$$
First, we focus on finding the best response for players A and B, given the other player’s strategy is fixed.
Lemma 1.
The best responses for players A and B are given by $\alpha_A = \arg\max_{1\le k\le n} A_k$ and $\alpha_B = \arg\max_{1\le k\le n} B_k$, where $A_k$ and $B_k$ are given by,
$$A_k = \begin{cases} W_k\left(1 - \frac{p_k^B}{2}\right) & \text{if } k\in\mathcal{A}, \\ E_k - \frac{q_k^B}{2} & \text{if } k\in\mathcal{B}, \\ E_k\left(1 - \frac{p_k^B}{2}\right) & \text{if } k\in\mathcal{C}\cup\mathcal{AB}, \end{cases} \qquad B_k = \begin{cases} E_k - \frac{q_k^A}{2} & \text{if } k\in\mathcal{A}, \\ W_k\left(1 - \frac{p_k^A}{2}\right) & \text{if } k\in\mathcal{B}, \\ E_k\left(1 - \frac{p_k^A}{2}\right) & \text{if } k\in\mathcal{C}\cup\mathcal{AB}. \end{cases} \tag{13}$$
Proof of Lemma 1.
We find the best response for A; the best response for B follows similarly. Notice that we can rearrange (11) as,
$$\begin{aligned} E\{R_A(g_A,g_B)\mid Z\} &= \sum_{k\in\mathcal{A}} q_k^A\left(1-\frac{p_k^B}{2}\right) + \sum_{k\in\mathcal{B}} p_k^A\left(E_k-\frac{q_k^B}{2}\right) + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} p_k^A E_k\left(1-\frac{p_k^B}{2}\right) \\ &= \sum_{k\in\mathcal{A}} E\{W_k 1\{\alpha_A=k\}\mid Z\}\left(1-\frac{p_k^B}{2}\right) + \sum_{k\in\mathcal{B}} E\{1\{\alpha_A=k\}\mid Z\}\left(E_k-\frac{q_k^B}{2}\right) + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k E\{1\{\alpha_A=k\}\mid Z\}\left(1-\frac{p_k^B}{2}\right) \\ &= E\left\{\sum_{k\in\mathcal{A}} W_k\left(1-\frac{p_k^B}{2}\right)1\{\alpha_A=k\} + \sum_{k\in\mathcal{B}}\left(E_k-\frac{q_k^B}{2}\right)1\{\alpha_A=k\} + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k\left(1-\frac{p_k^B}{2}\right)1\{\alpha_A=k\} \,\Big|\, Z\right\}. \end{aligned} \tag{14}$$
The above expectation is maximized when A chooses according to the given policy.    □
Next, we find a potential function for the game. A potential function is a function of the strategies of the players such that the change in the utility of a player when he changes his strategy (while the strategies of other players are held fixed) is equal to the change in the potential function [15].
Theorem 1.
The function $H(g_A, g_B)$ given by,
$$H(g_A,g_B) = \sum_{k\in\mathcal{A}}(q_k^A + E_k p_k^B) + \sum_{k\in\mathcal{B}}(q_k^B + E_k p_k^A) + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k(p_k^A + p_k^B) - \frac{1}{2}\left[\sum_{k\in\mathcal{A}} q_k^A p_k^B + \sum_{k\in\mathcal{B}} p_k^A q_k^B + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k p_k^A p_k^B\right], \tag{15}$$
is a potential function for the game, where $p_k^A, p_k^B$ for $1\le k\le n$, $q_k^A$ for $k\in\mathcal{A}$, and $q_k^B$ for $k\in\mathcal{B}$ are defined in (5) and (6). Moreover, we have that for all $(g_A,g_B)\in\mathcal{S}_A\times\mathcal{S}_B$, $H(g_A,g_B) \le 2\sum_{k=1}^n E_k$.
Proof of Theorem 1.
The key to the proof is separating (15) (using (11) and (12)) as,
$$H(g_A,g_B) = E\{R_A(g_A,g_B)\mid Z\} + \sum_{k\in\mathcal{B}^c} E_k p_k^B + \sum_{k\in\mathcal{B}} q_k^B \tag{16}$$
$$\hphantom{H(g_A,g_B)} = E\{R_B(g_A,g_B)\mid Z\} + \sum_{k\in\mathcal{A}} q_k^A + \sum_{k\in\mathcal{A}^c} p_k^A E_k. \tag{17}$$
Consider updating the strategy of player A while holding the strategy of player B fixed. Since $\sum_{k\in\mathcal{B}^c} E_k p_k^B + \sum_{k\in\mathcal{B}} q_k^B$ is not affected in this process, from (16), the change in the expected utility of player A is equal to the change in the H function. Similarly, by (17), the same holds when player B updates its strategy while holding player A's strategy fixed. Hence, H is indeed a potential function. The proof that $H(g_A,g_B) \le 2\sum_{k=1}^n E_k$ is omitted for brevity (see the technical report [36] for details).    □
Using Theorem 1 with standard potential game theory (see, for example, [37]), the iterative best response algorithm, with the best responses found in Lemma 1, converges to an ϵ-approximate Nash equilibrium in at most $(2\sum_{k=1}^n E_k)/\epsilon$ iterations.
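To make the iteration concrete, the following is a minimal Python sketch (ours, not from the paper) of the iterative best response for the special case $a = b = d = 0$, where each strategy reduces to a probability vector over resources and the best response of Lemma 1 reduces to choosing $\arg\max_k E_k(1 - p_k/2)$ against an opponent who plays the mixed strategy $p$:

```python
import numpy as np

def best_response(E, p_other):
    # Lemma 1 specialized to a = b = d = 0: put all mass on the
    # resource maximizing E_k (1 - p_k / 2) against the opponent's mix
    p = np.zeros_like(E)
    p[int(np.argmax(E * (1.0 - p_other / 2.0)))] = 1.0
    return p

def iterative_best_response(E, eps=1e-3):
    n = len(E)
    pA = np.full(n, 1.0 / n)  # arbitrary initial mixed strategies
    pB = np.full(n, 1.0 / n)
    uA = lambda pA, pB: float(np.sum(E * pA * (1.0 - pB / 2.0)))
    uB = lambda pA, pB: float(np.sum(E * pB * (1.0 - pA / 2.0)))
    while True:
        brA = best_response(E, pB)
        gainA = uA(brA, pB) - uA(pA, pB)
        if gainA > eps:
            pA = brA               # accept only improving moves
        brB = best_response(E, pA)
        gainB = uB(pA, brB) - uB(pA, pB)
        if gainB > eps:
            pB = brB
        if gainA <= eps and gainB <= eps:
            return pA, pB          # eps-approximate Nash equilibrium

pA, pB = iterative_best_response(np.array([2.0, 1.0, 1.0]))
```

By Theorem 1, every accepted move raises the (bounded) potential H by more than ϵ, so the loop terminates after finitely many iterations.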

5. Worst-Case Expected Utility

Finding a Nash equilibrium using the above algorithm may not be desirable when the players do not trust each other and place no assumptions on the incentives of the opponent. To mitigate this issue, we consider maximizing the worst-case expected utility of player A. Similar to the case of finding the Nash equilibrium, the analysis is simplified when Z is fixed.
Notice that we can simplify (10) to yield,
$$\sum_{k=1}^n E\{W_k 1\{\alpha_A=k\}1\{\alpha_B=k\}\mid Z\} = \sum_{k\in\mathcal{A}} q_k^A E\{1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{B}} p_k^A E\{W_k 1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{C}\cup\mathcal{AB}} E_k p_k^A E\{1\{\alpha_B=k\}\mid Z\} \tag{18}$$
$$= \sum_{k\in\mathcal{A}} E\{\Omega_k q_k^A 1\{\alpha_B=k\}\mid Z\} + \sum_{k\in\mathcal{A}^c} E\{\Omega_k p_k^A 1\{\alpha_B=k\}\mid Z\}, \tag{19}$$
where
$$\Omega_k = \begin{cases} 1 & \text{if } k\in\mathcal{A}, \\ W_k & \text{if } k\in\mathcal{B}, \\ E_k & \text{if } k\in\mathcal{C}\cup\mathcal{AB}. \end{cases} \tag{20}$$
Plugging the above into (9), we find that
$$E\{R_A(g_A,g_B)\mid Z\} = \sum_{k\in\mathcal{A}} q_k^A + \sum_{k\in\mathcal{A}^c} E_k p_k^A - \frac{1}{2} E\left\{\sum_{k\in\mathcal{A}}\Omega_k q_k^A 1\{\alpha_B=k\} + \sum_{k\in\mathcal{A}^c}\Omega_k p_k^A 1\{\alpha_B=k\} \,\Big|\, Z\right\}. \tag{21}$$
The difficulty in dealing with $E\{R_A(g_A,g_B)\mid Z\}$ is that it depends on the strategy $g_B$ of player B, which is not known to player A. Hence, given a strategy $g_A$ of player A, we first focus on obtaining the worst-case strategy $\hat{g}_A$ of player B against $g_A$. Then we focus on finding the strategy $g_A$ of player A that maximizes $E\{R_A(g_A,\hat{g}_A)\mid Z\}$. This way, we can guarantee a minimum expected utility for player A irrespective of player B's strategy.
Lemma 2.
For a given $g_A\in\mathcal{S}_A$, the strategy $g_B\in\mathcal{S}_B$ that minimizes $E\{R_A(g_A,g_B)\mid Z\}$ chooses $\alpha_B = \arg\max_{1\le k\le n}\Lambda_k$, where
$$\Lambda_k = \begin{cases} \Omega_k q_k^A & \text{if } k\in\mathcal{A}, \\ \Omega_k p_k^A & \text{if } k\in\mathcal{A}^c, \end{cases} \tag{22}$$
and $\Omega_k$ is defined in (20).
Proof of Lemma 2.
Notice that the only term of $E\{R_A(g_A,g_B)\mid Z\}$ in (21) that depends on the strategy of player B is the last expectation. This expectation is maximized when player B chooses the k for which $\Lambda_k$ is maximized.    □
Hence, we have
$$E\{R_A(g_A,\hat{g}_A)\mid Z\} = \sum_{k\in\mathcal{A}} q_k^A + \sum_{k\in\mathcal{A}^c} E_k p_k^A - \frac{1}{2} E\{\max\{\Lambda_k; 1\le k\le n\}\mid Z\}, \tag{23}$$
where $\Lambda_k$ is defined in (22). We formulate a strategy for player A using the following optimization problem
$$\text{(P1):} \quad \underset{g\in\mathcal{S}_A}{\text{maximize}}\;\; f(q, p_{a+1:n}) \quad \text{subject to} \quad q\in\mathbb{R}^a,\; p\in\mathbb{R}^n, \quad q_k = E_{W,U_A}\{W_k 1\{g(U_A,X,Z)=k\}\mid Z\},\; 1\le k\le a, \quad p_l = E_{W,U_A}\{1\{g(U_A,X,Z)=l\}\mid Z\},\; 1\le l\le n, \tag{24}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is defined by,
$$f(x) = \sum_{k\in\mathcal{A}} x_k + \sum_{k\in\mathcal{A}^c} E_k x_k - \frac{1}{2}E\{\max\{\Omega_j x_j; 1\le j\le n\}\mid Z\}. \tag{25}$$
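Since f involves the expectation of a maximum, it generally lacks a closed form. The following minimal Python sketch (ours, not from the paper) estimates $f(x)$ by Monte Carlo, given independent draws of $\Omega$ conditioned on the observed $Z$:

```python
import numpy as np

def f_hat(x, E, in_A, omega_samples):
    # x: point to evaluate; E: conditional means E_k; in_A: boolean mask of A;
    # omega_samples: (m, n) array of independent draws of Omega, see (20)
    linear = np.sum(x[in_A]) + np.sum(E[~in_A] * x[~in_A])
    penalty = 0.5 * np.mean(np.max(omega_samples * x, axis=1))
    return linear - penalty
```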
Although not used immediately, we derive certain properties of f in the following theorem, which are useful later.
Theorem 2.
The function f
1. is concave;
2. is entry-wise non-decreasing;
3. satisfies,
$$|f(x) - f(y)| \le \frac{3}{2}\sum_{j\in\mathcal{A}}|x_j - y_j| + \frac{3}{2}\sum_{j\in\mathcal{A}^c}E_j|x_j - y_j|, \tag{26}$$
for any $x, y\in\mathbb{R}^n$.
Proof of Theorem 2.
See Appendix A.    □
It turns out that when a = b = d = 0 , an explicit solution can be obtained to (P1), which we describe in Section 5.1. In Section 5.2, we describe the solution to the general case. In the technical report [36], we provide simpler alternative solutions to the special cases a = 0 (with no restriction on b) and a = 1 (with the additional assumption that W 1 has a continuous CDF).

5.1. Explicit Solution for a = b = d = 0

When neither player knows any of the reward realizations, we have $a = b = d = 0$, and the problem reduces to the following.
$$\text{(P2):} \quad \underset{p\in I}{\text{maximize}}\;\; \sum_{k=1}^n p_k E_k - \frac{1}{2}\max\{p_k E_k; 1\le k\le n\}, \tag{27}$$
where
$$I = \Big\{p\in\mathbb{R}^n : \sum_{i=1}^n p_i = 1,\; p_i\ge 0\;\forall i\Big\} \tag{28}$$
is the n-dimensional probability simplex. For this section, we assume without loss of generality that $E_k > 0$ for all k. If at least one of the $E_k$'s is zero, we could transform (P2) into a lower-dimensional problem with non-zero $E_k$'s. The following lemma constructs an explicit solution for $a = b = d = 0$.
Lemma 3.
Assume without loss of generality that $E_k \ge E_{k+1}$ for $1\le k\le n-1$. Further, let,
$$r = \arg\max_{1\le k\le n} \frac{k-\frac{1}{2}}{\sum_{j=1}^k \frac{1}{E_j}}, \tag{29}$$
where the lowest index is chosen in the case of ties. The optimal solution for (P2) is given by $p^*$, where
$$p_k^* = \begin{cases} \dfrac{1/E_k}{\sum_{j=1}^r 1/E_j} & \text{if } k\le r, \\ 0 & \text{otherwise}. \end{cases} \tag{30}$$
Proof of Lemma 3.
See Appendix B.    □
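For illustration, the following short Python sketch (ours, not from the paper) computes r and the optimal solution of Lemma 3, sorting the means first if necessary:

```python
import numpy as np

def worst_case_strategy(E):
    # E: vector of means with E_k > 0; returns p* of Lemma 3
    order = np.argsort(-E)                 # indices with means in non-increasing order
    S = np.cumsum(1.0 / E[order])          # S_k = sum_{j<=k} 1/E_j
    k = np.arange(1, len(E) + 1)
    r = int(np.argmax((k - 0.5) / S)) + 1  # argmax takes the lowest index on ties
    p = np.zeros(len(E))
    p[order[:r]] = (1.0 / E[order[:r]]) / S[r - 1]
    return p

print(worst_case_strategy(np.array([2.0, 1.0])))  # [1. 0.], as in the example below
```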
It should be noted that this solution is not unique. For instance, consider the case when $n = 2$, $E_1 = 2$, and $E_2 = 1$. In this case, the lemma finds the solution $(p_1^*, p_2^*) = (1, 0)$, but it should be noted that $(p_1^*, p_2^*) = (1/3, 2/3)$ is also a solution. It is also interesting that the solution assigns positive probabilities to the r resources with the highest average rewards, although within these r resources, higher probabilities are assigned to the resources with lower rewards.
It should also be noted that the worst-case strategy can be arbitrarily worse than the Nash equilibrium strategy. For instance, consider the simple scenario with two resources such that $E_1 = E_2$, where neither player observes any of the reward realizations. In this case, one Nash equilibrium has player A always choosing resource 1 and player B always choosing resource 2; another has the roles reversed. In either case, player A's expected utility is $E_1$. However, notice that, from Lemma 3, the maximum worst-case expected utility of player A is $3E_1E_2/(2E_1+2E_2) = 3E_1/4$. Hence, $E_1$ can be scaled to obtain an arbitrarily large deviation between the worst-case and Nash equilibrium solutions.
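This value can be verified directly from Lemma 3: since $p_k^* E_k = 1/S_r$ for every $k \le r$ (with $S_r = \sum_{j=1}^r 1/E_j$), the optimal value of (P2) is
$$\sum_{k=1}^n p_k^* E_k - \frac{1}{2}\max_k p_k^* E_k = \frac{r}{S_r} - \frac{1}{2S_r} = \frac{r-\frac{1}{2}}{S_r},$$
and with $E_1 = E_2$ we get $r = 2$ and $S_2 = 2/E_1$, giving $\frac{3/2}{2/E_1} = \frac{3E_1}{4}$, matching the value stated above.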

5.2. Solving the General Case

In this section, we focus on solving the most general version of (P1) (with no restrictions on the sets $\mathcal{A}, \mathcal{B}, \mathcal{AB}, \mathcal{C}$). In particular, we focus on finding a mixed strategy to optimize the worst-case expected utility for player A. It turns out that our optimal solution chooses from a mixture of pure strategies parameterized by $Q\in\mathbb{R}^n$, of the following form
$$g_Q^A(X) = \arg\max_{1\le j\le n}\big\{\{Q_j W_j;\; j\in\mathcal{A}\}\cup\{Q_j;\; j\in\mathcal{A}^c\}\big\}. \tag{31}$$
We name this special class of pure strategies threshold strategies. We develop a novel algorithm to solve this problem. Our algorithm leverages techniques from drift-plus-penalty theory [14] and online convex optimization [12,13]. It should be noted that our algorithm runs offline and is used to construct an appropriate strategy for player A that approximately solves (P1) conditioned on the observed realization of $Z$. We show that we can obtain values arbitrarily close to the optimal value of (P1) by using a finite equiprobable mixture of pure strategies of the above form. It should be noted that the algorithm developed in this section can be used to solve the general unconstrained problem of finding the randomized decision $\alpha\in\{1,2,\ldots,n\}$ which maximizes $E\{h(x;\Theta)\}$, where $x\in\mathbb{R}^n$ with $x_k = E\{\Gamma_k 1\{\alpha=k\}\}$, $\Theta\in\mathbb{R}^m$ and $\Gamma\in\mathbb{R}^n$ are non-negative random vectors with finite second moments, and h is a concave function such that $\tilde{h}(x) = E\{h(x;\Theta)\}$ is Lipschitz continuous, entry-wise non-decreasing, and has bounded subgradients.
We first provide an algorithm that generates a mixture of T pure strategies, after which we establish the closeness of the mixture to optimality. We generate a mixture of T pure strategies $\{g^A_{Q(t)}\}_{t=1}^T$ by iteratively updating the vector $Q$ for T iterations, where $Q(t)$ and $g^A_{Q(t)}$ denote the state of $Q$ and the pure strategy generated in the t-th iteration, respectively. In addition to $Q(t)$, we require another state vector $\gamma(t)\in\mathbb{R}^n$, which we also update in each iteration, and a parameter V, which decides the convergence properties of the algorithm. We provide the specific details on setting V later in our analysis. We begin with $Q(1) = \gamma(0) = 0$. In the t-th iteration ($t\ge1$), we independently sample $X(t)$ and $\Omega(t)$ from the distributions of $X$ and $\Omega$, respectively, where $\Omega$ is defined in (20), while keeping $Z$ fixed to its observed value. Then, we update $\gamma(t)$ and $Q(t+1)$ as follows. First, we solve,
$$\text{(P3):} \quad \underset{\gamma(t)}{\text{minimize}}\;\; -V\nabla f_t(\gamma(t-1))^\top \gamma(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 + \sum_{j=1}^n Q_j(t)\gamma_j(t) \tag{32a}$$
$$\text{subject to}\;\; \gamma(t)\in\mathcal{K}, \tag{32b}$$
to find $\gamma(t)$, where
$$f_t(x) = \sum_{k\in\mathcal{A}} x_k + \sum_{k\in\mathcal{A}^c} x_k E_k - \frac{1}{2}\max\{x_k\Omega_k(t); 1\le k\le n\}, \tag{33}$$
$\alpha>0$, and $\mathcal{K} = \prod_{j\in\mathcal{A}}[0,E_j]\times[0,1]^{n-a}$. Notice that a subgradient $\nabla f_t(x)$ is given by,
$$\nabla f_{t,j}(x) = \begin{cases} 1 - \frac{1}{2}1\{\arg\max_{1\le k\le n}\{x_k\Omega_k(t)\} = j\} & \text{if } j\in\mathcal{A}, \\ E_j - \frac{1}{2}1\{\arg\max_{1\le k\le n}\{x_k\Omega_k(t)\} = j\}\,\Omega_j(t) & \text{if } j\in\mathcal{A}^c, \end{cases} \tag{34}$$
where $\arg\max$ returns the lowest index in the case of ties. Notice that $f_t$ is a concave function, which can be established by repeating the argument used to establish the concavity of f in Theorem 2. Then, we choose the action for the t-th iteration as $\alpha_A(t) = g^A_{Q(t)}(X(t))$ (see (31)). Finally, to update $Q(t+1)$, we use,
$$Q_j(t+1) = \max\big(Q_j(t) + \gamma_j(t) - X_j(t)1\{\alpha_A(t)=j\},\, 0\big),\;\; j\in\mathcal{A}, \qquad Q_j(t+1) = \max\big(Q_j(t) + \gamma_j(t) - 1\{\alpha_A(t)=j\},\, 0\big),\;\; j\in\mathcal{A}^c. \tag{35}$$
The algorithm is summarized as Algorithm 1 for clarity.
Algorithm 1: Algorithm for the generation of the optimal mixture of T pure strategies.
1. Initialize $Q(1) = 0$ and $\gamma(0) = 0$; fix T, $V>0$, and $\alpha>0$.
2. For $t = 1, 2, \ldots, T$:
3.  Sample $X(t)$ and $\Omega(t)$ independently from the distributions of $X$ and $\Omega$, keeping $Z$ fixed.
4.  Solve (P3) to find $\gamma(t)$.
5.  Choose the action $\alpha_A(t) = g^A_{Q(t)}(X(t))$ according to (31).
6.  Update $Q(t+1)$ according to (35).
7. Output the pure strategies $\{g^A_{Q(t)}\}_{t=1}^T$.
After creating the mixture $\{g^A_{Q(t)}\}_{t=1}^T$ of pure strategies, we choose one of them uniformly at random (each with probability $1/T$) to take the decision. In the following two sections, we focus on solving (P3) and evaluating the performance of Algorithm 1.

5.2.1. Solving (P3)

Notice that the objective of (P3) can be written as
$$-V\nabla f_t(\gamma(t-1))^\top\gamma(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 + \sum_{j=1}^n Q_j(t)\gamma_j(t) = \sum_{j=1}^n\Big[-V\nabla f_{t,j}(\gamma(t-1))\gamma_j(t) + \alpha(\gamma_j(t)-\gamma_j(t-1))^2 + Q_j(t)\gamma_j(t)\Big]. \tag{36}$$
Hence, (P3) seeks to minimize a separable convex function over the box constraint $\gamma(t)\in\mathcal{K}$. The solution vector $\gamma(t)$ is found by separately minimizing each component $\gamma_j(t)$ over $[0,u_j]$, where
$$u_j = \begin{cases} E_j & \text{if } j\in\mathcal{A}, \\ 1 & \text{if } j\in\mathcal{A}^c. \end{cases} \tag{37}$$
The resulting solution is,
$$\gamma_j(t) = \Pi_{[0,u_j]}\left(\gamma_j(t-1) + \frac{V\nabla f_{t,j}(\gamma(t-1)) - Q_j(t)}{2\alpha}\right), \tag{38}$$
where $\Pi_{[0,u_j]}$ denotes the projection onto $[0,u_j]$. Notice that the above solution is obtained by projecting the global minimizer of the function to be minimized onto $[0,u_j]$.
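Putting the pieces together, the following condensed Python sketch (ours; the sampler arguments sample_X and sample_Omega are hypothetical stand-ins for draws of $X$ and $\Omega$ given the observed $Z$) runs the main loop of Algorithm 1 using the closed-form update (38), the threshold strategy (31), and the queue update (35):

```python
import numpy as np

def algorithm1(sample_X, sample_Omega, E, A_idx, T, V, alpha):
    # E: vector of conditional means E_k; A_idx: boolean mask of the set A
    n = len(E)
    u = np.where(A_idx, E, 1.0)            # box upper bounds, see (37)
    Q = np.zeros(n)
    gamma = np.zeros(n)
    Q_history = []
    for t in range(T):
        X, Omega = sample_X(), sample_Omega()
        # subgradient of f_t at gamma(t-1), see (34)
        j_star = int(np.argmax(gamma * Omega))
        grad = np.where(A_idx, 1.0, E).astype(float)
        grad[j_star] -= 0.5 * Omega[j_star]
        # gamma update: project the unconstrained minimizer onto [0, u], see (38)
        gamma = np.clip(gamma + (V * grad - Q) / (2.0 * alpha), 0.0, u)
        # threshold strategy (31): maximize Q_j X_j over A and Q_j over A^c
        action = int(np.argmax(np.where(A_idx, Q * X, Q)))
        # virtual queue update (35)
        x = np.zeros(n)
        x[action] = X[action] if A_idx[action] else 1.0
        Q_history.append(Q.copy())
        Q = np.maximum(Q + gamma - x, 0.0)
    # to play: pick t uniformly from 1..T, then use the threshold strategy g_{Q(t)}
    return Q_history
```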

5.2.2. How Good Is the Mixed Strategy Generated by Algorithm 1

Without loss of generality, we assume that $E_k > 0$ for all $1\le k\le n$. The following theorem establishes the closeness of the expected utility generated by Algorithm 1 to the optimal value $f_{\mathrm{opt}}$ of (P1).
Theorem 3.
Assume α is set such that $\alpha\ge V^2$, and suppose we use the mixed strategy $g_A$ generated by Algorithm 1 to make the decision. Then,
$$E\{R_A(g_A,\hat{g}_A)\mid Z\} \ge f_{\mathrm{opt}} - \frac{D_1}{V} - \frac{VD_2}{16\alpha} - \frac{\alpha D_3}{VT} - \frac{3}{2T}\sum_{k\in\mathcal{A}}\Big((1+2\sqrt{2}E_k)\sqrt{\alpha}+E_k\Big) - \frac{3}{2T}\sum_{k\in\mathcal{A}^c}E_k\Big((E_k+2\sqrt{2})\sqrt{\alpha}+1\Big), \tag{39}$$
where
$$D_1 = n-a+\frac{1}{2}\sum_{j\in\mathcal{A}}\big(E_j^2+E\{W_j^2\}\big), \qquad D_2 = 4a + E\{\|\Omega\|_2^2\mid Z\} + 4\sum_{j\in\mathcal{A}^c}E_j^2, \qquad D_3 = n-a+\sum_{j\in\mathcal{A}}E_j^2, \tag{40}$$
$\Omega$ is defined in (20), and $f_{\mathrm{opt}}$ is the optimal value of (P1). Hence, by fixing $\varepsilon>0$ and using $V=1/\varepsilon$, $\alpha=1/\varepsilon^2$, and $T\ge1/\varepsilon^2$, the average error is $O(\varepsilon)$.
Proof of Theorem 3.
The key to the proof is noticing that $Q(t)$ can be treated as n virtual queues. Before proceeding with the proof, we define some quantities. Define the history up to time t by $H(t) = \{X(\tau); 1\le\tau<t\}\cup\{\Omega(\tau); 1\le\tau\le t\}$. Notice that we include $\Omega(t)$ in $H(t)$, since this allows us to treat $\gamma(t)$ and $Q(t)$ as deterministic functions of $H(t)$ and $Z$. Let us define the Lyapunov function $L(t) = \frac{1}{2}\|Q(t)\|_2^2 = \frac{1}{2}\sum_{j=1}^n Q_j(t)^2$ and the drift $\Delta(t) = E\{L(t+1)-L(t)\mid H(t),Z\}$. Now, notice that
$$E\{R_A(g_A,\hat{g}_A)\mid Z\} = f\left(\frac{1}{T}\sum_{t=1}^{T}E\{x(t)\mid Z\}\right), \tag{41}$$
where
$$x_k(t) = \begin{cases} X_k(t)1\{g^A_{Q(t)}(X(t))=k\} & \text{if } k\in\mathcal{A}, \\ 1\{g^A_{Q(t)}(X(t))=k\} & \text{if } k\in\mathcal{A}^c. \end{cases} \tag{42}$$
We begin with the following two lemmas, which will be useful in the proof.
Lemma 4.
The drift is bounded above as
$$\Delta(t) \le D_1 + \sum_{j=1}^n Q_j(t)\big(\gamma_j(t) - E\{x_j(t)\mid H(t),Z\}\big), \tag{43}$$
where $D_1$ is defined in (40).
Proof of Lemma 4.
See Appendix C. □
The following is a well-known result regarding the minimization of strongly convex functions (see, for example, a more general pushback result in [38]).
Lemma 5.
For a convex function $h:\mathbb{R}^n\to\mathbb{R}$, a convex subset $\mathcal{C}$ of $\mathbb{R}^n$, $y\in\mathbb{R}^n$, and $\alpha>0$, let,
$$x^* \in \arg\min_{x\in\mathcal{C}}\big[h(x) + \alpha\|x-y\|_2^2\big]. \tag{44}$$
Then,
$$h(x^*) + \alpha\|x^*-y\|_2^2 \le h(z) + \alpha\|z-y\|_2^2 - \alpha\|z-x^*\|_2^2, \tag{45}$$
for all $z\in\mathcal{C}$.
Now, we move on to the main proof. Notice that the objective of (P3) can be written as
$$\nabla g_t(\gamma(t-1))^\top\gamma(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2, \tag{46}$$
where
$$g_t(x) = -Vf_t(x) + \sum_{j=1}^n Q_j(t)x_j. \tag{47}$$
Let $g^{A,*}$ be the strategy that is optimal for (P1). Let us define $x^*(t)\in\mathbb{R}^n$, where
$$x_k^*(t) = \begin{cases} X_k(t)1\{g^{A,*}(U_A(t),X(t),Z)=k\} & \text{if } k\in\mathcal{A}, \quad \text{(48a)} \\ 1\{g^{A,*}(U_A(t),X(t),Z)=k\} & \text{if } k\in\mathcal{A}^c, \quad \text{(48b)} \end{cases}$$
where $U_A(t)$ for $1\le t\le T$ is a collection of independent and identically distributed uniform $[0,1)$ random variables. Notice that $y = E\{x^*(t)\mid Z\}$ is independent of t and belongs to $\mathcal{K}$. Hence, y is feasible for (P3). Notice that
$$\begin{aligned} -V\nabla f_t(\gamma(t-1))^\top\gamma(t) + \sum_{j=1}^n Q_j(t)\gamma_j(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 &= \nabla g_t(\gamma(t-1))^\top\gamma(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 \\ &\overset{(a)}{\le} \nabla g_t(\gamma(t-1))^\top y + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2 \\ &= -V\nabla f_t(\gamma(t-1))^\top y + \sum_{j=1}^n Q_j(t)y_j + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2, \end{aligned} \tag{49}$$
where (a) follows from Lemma 5 applied to the convex function $h(x) = \nabla g_t(\gamma(t-1))^\top x$ and $\mathcal{C} = \mathcal{K}$, since $\gamma(t)$ is the solution to (P3) and y is feasible for (P3). Further, step 5 of each iteration of Algorithm 1 (finding the action) can be represented as the maximization of
$$\sum_{j\in\mathcal{A}} Q_j(t)E\{X_j(t)1\{\alpha_A=j\}\mid H(t),Z\} + \sum_{j\in\mathcal{A}^c} Q_j(t)E\{1\{\alpha_A=j\}\mid H(t),Z\} \tag{50}$$
over all possible actions $\alpha_A\in\{1,2,\ldots,n\}$ at time-slot t. Hence, comparing the scenario where $g^A_{Q(t)}$ is used in the t-th iteration with the scenario where $g^{A,*}$ is used with the randomization variable $U_A(t)$ in the t-th iteration, we have the inequality,
$$\sum_{j=1}^n Q_j(t)E\{x_j(t)\mid H(t),Z\} \ge \sum_{j=1}^n Q_j(t)E\{x_j^*(t)\mid H(t),Z\} = \sum_{j=1}^n Q_j(t)y_j, \tag{51}$$
where the last equality follows since $x^*(t)$ is independent of $H(t)$. Summing (49) and (51),
$$-V\nabla f_t(\gamma(t-1))^\top\gamma(t) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 + \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) \le -V\nabla f_t(\gamma(t-1))^\top y + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2. \tag{52}$$
Adding $D_1 + V\nabla f_t(\gamma(t-1))^\top\gamma(t-1)$ to both sides and using Lemma 4 yields,
$$\begin{aligned} \Delta(t) - V\nabla f_t(\gamma(t-1))^\top(\gamma(t)-\gamma(t-1)) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 &\le D_1 - V\nabla f_t(\gamma(t-1))^\top(y-\gamma(t-1)) + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2 \\ &\le D_1 - V\{f_t(y)-f_t(\gamma(t-1))\} + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2, \end{aligned} \tag{53}$$
where the last inequality follows from the subgradient inequality for the concave function $f_t$. Now, we introduce the following lemma.
Lemma 6.
We have
$$-V\nabla f_t(\gamma(t-1))^\top(\gamma(t)-\gamma(t-1)) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 \ge -\frac{V^2}{4\alpha}\Big(a + \sum_{j\in\mathcal{A}^c}E_j^2\Big) - \frac{V^2}{16\alpha}\|\Omega(t)\|_2^2. \tag{54}$$
Proof of Lemma 6.
See Appendix D. □
Substituting the bound from Lemma 6 into (53), we have that
$$\Delta(t) \le \frac{V^2}{4\alpha}\Big(a+\sum_{j\in\mathcal{A}^c}E_j^2\Big) + \frac{V^2}{16\alpha}\|\Omega(t)\|_2^2 + D_1 - V\{f_t(y)-f_t(\gamma(t-1))\} + \alpha\|y-\gamma(t-1)\|_2^2 - \alpha\|y-\gamma(t)\|_2^2. \tag{55}$$
The above holds for each $t\in\{1,2,\ldots,T\}$. Hence, we first take the expectation of both sides conditioned on $Z$, after which we sum from 1 to T, which results in,
$$E\{L(T+1)\mid Z\} - E\{L(1)\mid Z\} \le \frac{TV^2}{4\alpha}\Big(a+\sum_{j\in\mathcal{A}^c}E_j^2\Big) + \frac{TV^2}{16\alpha}E\{\|\Omega\|_2^2\mid Z\} + D_1T - V\sum_{t=1}^T E\{f_t(y)\mid Z\} + V\sum_{t=1}^T E\{f_t(\gamma(t-1))\mid Z\} + \alpha E\{\|y-\gamma(0)\|_2^2\mid Z\} - \alpha E\{\|y-\gamma(T)\|_2^2\mid Z\}. \tag{56}$$
Notice that
$$E\{f_t(y)\mid Z\} = f(y) = f_{\mathrm{opt}}, \tag{57}$$
where the functions f and $f_t$ are defined in (25) and (33), respectively. Further, we have that
$$E\{f_t(\gamma(t-1))\mid Z\} = E\{E_{\Omega(t)}\{f_t(\gamma(t-1))\mid H(t-1),Z\}\mid Z\} \overset{(a)}{=} E\{f(\gamma(t-1))\mid Z\}, \tag{58}$$
where (a) follows from the definition of $f_t$ in (33), since $\gamma(t-1)$ is a function of $H(t-1)$ and $\Omega(t)$ is independent of $H(t-1)$. Substituting (57) and (58) into (56), we have that
$$\begin{aligned} E\{L(T+1)\mid Z\} - E\{L(1)\mid Z\} - \frac{TV^2}{4\alpha}\Big(a+\sum_{j\in\mathcal{A}^c}E_j^2\Big) - \frac{TV^2}{16\alpha}E\{\|\Omega\|_2^2\mid Z\} &\le D_1T - VTf_{\mathrm{opt}} + V\sum_{t=1}^T E\{f(\gamma(t-1))\mid Z\} + \alpha E\{\|y-\gamma(0)\|_2^2\mid Z\} - \alpha E\{\|y-\gamma(T)\|_2^2\mid Z\} \\ &\overset{(a)}{\le} D_1T - VTf_{\mathrm{opt}} + V\sum_{t=1}^T E\{f(\gamma(t-1))\mid Z\} + \alpha\Big(n-a+\sum_{k\in\mathcal{A}}E_k^2\Big) \\ &\le D_1T - VTf_{\mathrm{opt}} + VTf\left(\frac{1}{T}\sum_{t=1}^T E\{\gamma(t-1)\mid Z\}\right) + \alpha D_3, \end{aligned} \tag{59}$$
where (a) follows since $y, \gamma(T), \gamma(0)\in\mathcal{K}$, and the last inequality follows from Jensen's inequality applied to the concave function f (see the definitions of $D_2$ and $D_3$ in (40)). Since $Q(1) = 0$ and $E\{L(T+1)\mid Z\}\ge0$, after some rearrangement the above translates to,
$$f_{\mathrm{opt}} - \frac{D_1}{V} - \frac{VD_2}{16\alpha} - \frac{\alpha D_3}{VT} \le f\left(\frac{1}{T}\sum_{t=0}^{T-1}E\{\gamma(t)\mid Z\}\right), \tag{60}$$
where $D_2$ is defined in (40). Now, we prove the following lemma.
Lemma 7.
We have
$$f\left(\frac{1}{T}\sum_{t=0}^{T-1}E\{\gamma(t)\mid Z\}\right) \le f\left(\frac{1}{T}\sum_{t=1}^{T}E\{x(t)\mid Z\}\right) + \frac{3}{2T}\sum_{k\in\mathcal{A}}\Big((1+2\sqrt{2}E_k)\sqrt{\alpha}+E_k\Big) + \frac{3}{2T}\sum_{k\in\mathcal{A}^c}E_k\Big((E_k+2\sqrt{2})\sqrt{\alpha}+1\Big). \tag{61}$$
Proof of Lemma 7.
We first introduce the following two lemmas.
Lemma 8.
The queues $Q_j(t)$, $1\le j\le n$, updated according to Algorithm 1 satisfy,
$$\max\left(\frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma_j(t)-x_j(t)\mid Z\},\, 0\right) \le \frac{E\{Q_j(T)\mid Z\}}{T}. \tag{62}$$
Proof of Lemma 8.
See Appendix E. □
The following lemma is vital in constructing the $O(\sqrt{\alpha})$ bound on the queue sizes, which leads to the $O(1/\varepsilon^2)$ solution. It should be noted that an easier bound can be obtained on the queue sizes, which leads to an $O(1/\varepsilon^3)$ solution.
Lemma 9.
Given that $\alpha\ge V^2$, $Q_j(t)$ satisfies the bound
$$Q_j(t) \le \begin{cases} (1+2\sqrt{2}E_j)\sqrt{\alpha} + E_j & \text{if } j\in\mathcal{A}, \\ (E_j+2\sqrt{2})\sqrt{\alpha} + 1 & \text{if } j\in\mathcal{A}^c, \end{cases} \tag{63}$$
for each $t\in[1:T]$.
Proof of Lemma 9.
See Appendix F. □
Now, we move on to the main proof. Notice that
$$\begin{aligned} f\left(\frac{1}{T}\sum_{t=0}^{T-1}E\{\gamma(t)\mid Z\}\right) &= f\left(\frac{\gamma(0)}{T} + \frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma(t)\mid Z\}\right) \\ &= f\left(\frac{1}{T}\sum_{t=1}^{T}E\{x(t)\mid Z\} + \frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma(t)-x(t)\mid Z\} - \frac{E\{x(T)\mid Z\}}{T}\right) \\ &\overset{(a)}{\le} f\left(\frac{1}{T}\sum_{t=1}^{T}E\{x(t)\mid Z\} + \max\left(\frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma(t)-x(t)\mid Z\},\,0\right)\right) \\ &\overset{(b)}{\le} f\left(\frac{1}{T}\sum_{t=1}^{T}E\{x(t)\mid Z\}\right) + \frac{3}{2}\sum_{k\in\mathcal{A}}\max\left(\frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma_k(t)-x_k(t)\mid Z\},\,0\right) + \frac{3}{2}\sum_{k\in\mathcal{A}^c}E_k\max\left(\frac{1}{T}\sum_{t=1}^{T-1}E\{\gamma_k(t)-x_k(t)\mid Z\},\,0\right), \end{aligned} \tag{64}$$
where the max is taken entry-wise, (a) follows from the entry-wise non-decreasing property of f (Theorem 2-2), and (b) follows from Theorem 2-3. Combining (64) and Lemma 8 with the bound on $Q(T)$ given by Lemma 9, we are finished with the proof of the lemma. □
Combining Lemma 7 with (60), we are finished with the proof of the theorem.

6. Simulations

For the simulations, we take the $W_j$ to be exponential random variables. Notice that since we condition on $Z$ to solve the problem, the objective of (P1) defined in (25) has the same structure for the two scenarios $(a,b,c,d)$ and $(a,b,c+d,0)$. Hence, we use $d=0$ for all the simulations. Notice that the sets $\mathcal{A}$ and $\mathcal{B}$ contain the private information of players A and B, respectively. We consider the three scenarios given below.
  • $a=0$, $b=0$, $c=3$, $d=0$: Neither player has private information.
  • $a=0$, $b=1$, $c=2$, $d=0$: Only player B has private information.
  • $a=1$, $b=1$, $c=1$, $d=0$: Both players have private information.
Figure 1, Figure 2 and Figure 3 show pictorial representations of these cases.
We first consider scenario 1. For Figure 4 (top-left), we fix $E_2 = E_3 = 1$ and plot the expected utilities of players A and B at the ϵ-approximate Nash equilibrium as functions of $E_1$, where $\epsilon = 10^{-3}$ is used. For Figure 4 (top-middle and top-right), we use the same configuration and plot a solution for the probabilities of choosing different resources as a function of $E_1$ at the ϵ-approximate Nash equilibrium for players A and B, respectively. For scenarios 2 and 3, Figure 4 (middle) and Figure 4 (bottom) have similar descriptions.
We consider the same three scenarios for the simulations on maximizing the worst-case expected utility. In each scenario, for the top figure, we fix $E_2 = E_3 = 1$ and plot the maximum worst-case expected utility of player A as a function of $E_1$. For the bottom figure, we use the same configuration and plot a solution for the probabilities of choosing different resources for player A as a function of $E_1$. Notice that the solutions may not be unique, as discussed in Section 5.1. Additionally, for Figure 5 (top-middle and top-right), we also indicate the maximum possible error of the solution, calculated using the error bound derived in Theorem 3. For scenarios 2 and 3, we have obtained the solutions by averaging over $10^2$ independent simulations. Further, we have used $T = 10^5$, $\alpha = 4\times10^4$, and $V = 2\times10^2$.
Notice that it is difficult to compare the worst-case strategy and the ϵ-approximate Nash equilibrium strategy in general, since the first can be computed without any cooperation between the players, whereas computing the second requires cooperation among players. Further, as described in Section 5.1, the worst-case strategy can be arbitrarily worse than the Nash equilibrium strategy. Nevertheless, comparing Figure 4 (left) and Figure 5 (top), it can be seen that the worst-case strategy and the strategy at the ϵ-approximate Nash equilibrium yield comparable expected utilities for player A when $E_1 \ge 2$. For instance, in scenario 1, for $E_1 \ge 2$, the approximate Nash equilibrium strategy coincides with the worst-case strategy of choosing resource 1 with probability 1. However, it should be noted that our algorithm for finding the ϵ-approximate Nash equilibrium does not necessarily converge to a socially optimal solution. For instance, in scenario 1 with $E_1 = 2$, having player A choose resource 1 with probability 1 and player B choose resource 2 with probability 1 gives a higher utility to player A without changing the utility of player B.
In Figure 5, it is interesting to notice the variation in the choice probabilities of different resources with $E_1$. Notice that in scenario 1, the choice probability of resource 1 is non-decreasing for $E_1\in[0.1, 0.8]$, non-increasing for $E_1\in[0.8, 1.9]$, and non-decreasing for $E_1\ge1.9$. Similar behavior can also be observed for scenario 3. This is surprising, since intuition suggests that the probability of choosing a resource should increase with the increasing mean of its reward random variable. However, notice that in scenarios 1 and 3, player B does not observe the reward realization of resource 1. This might force player A, playing for the worst case, to believe that player B increases the probability of choosing resource 1 as $E_1$ increases, as a result of which player A chooses resource 1 with a lower probability. Notice that the probability of choosing resource 1 in scenario 3 does not grow as fast as in the other two scenarios. This is because player A observes $W_1$ and hence can refrain from choosing it when $W_1$ takes low values.

7. Conclusions

We have implemented the iterative best response algorithm to find an ϵ-approximate Nash equilibrium of a two-player stochastic resource-sharing game with asymmetric information. To handle situations where the players do not trust each other and place no assumptions on the incentives of the opponent, we solved the problem of maximizing the worst-case expected utility of the first player using a novel algorithm that combines drift-plus-penalty theory and online optimization techniques. An explicit solution can be constructed when neither player observes the realizations of any of the reward random variables. This special case leads to counter-intuitive insights.
In our approach, we have assumed that the reward random variables of different resources are independent. It should be noted that this assumption can be relaxed without affecting the analysis for the special case when both players do not observe the realizations of any of the reward random variables. An interesting question would be what happens in the general case when the reward random variables are not independent. While it is still possible to implement our algorithm in this setting, it is not guaranteed that the algorithm will converge to the optimal solution. Hence, finding an algorithm for this case that exploits the correlations between the reward random variables could be potential future work.
Several other extensions can be considered as well. One would be a scenario with multiple players. The general multiplayer case yields a complex information structure, since the set of resources has to be split into $2^m$ subsets, where m is the number of players. Additionally, the idea of conditioning on the common information is difficult to adapt to this case. Nevertheless, various simplified schemes could be considered. One example would be a case with no common information. In this case, the set of resources is split into $m+1$ disjoint subsets, where the i-th ($1\le i\le m$) subset is the subset of resources whose rewards are observed by the i-th player, and the $(m+1)$-th subset is the subset of resources whose rewards are observed by none of the players. Another interesting scenario is when no player observes any of the reward realizations. In both these cases, the expected utility can be calculated following a similar procedure to the two-player case, but finding the worst-case expected utility is difficult. Hence, we believe both cases could be potential future work. Another extension would be implementing the algorithm within a repeated game structure and in an online scenario.

Author Contributions

Conceptualization, M.W. and M.J.N.; methodology, M.W.; software, M.W.; validation, M.W.; writing—original draft preparation, M.W.; writing—review and editing, M.J.N.; visualization, M.W.; supervision, M.J.N.; project administration, M.J.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by one or more of: NSF CCF-1718477, NSF SpecEES 1824418.

Data Availability Statement

This paper does not use any data from external sources.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2

Notice that the term $E\{\max\{\Omega_j x_j; 1\le j\le n\}\mid Z\}$ appearing in f is convex in x, since the max function is convex and expectation preserves convexity. Since this term enters f with a negative sign and the remaining terms of f are linear, f is concave.
For 2 and 3, we use the two inequalities,
$$f(x)-f(y) \ge \sum_{j\in\mathcal{A}}(x_j-y_j) + \sum_{j\in\mathcal{A}^c}E_j(x_j-y_j) - \frac{1}{2}E\{\max\{\Omega_j(x_j-y_j); j\in[1:n]\}\mid Z\}, \tag{A1}$$
and
$$f(x)-f(y) \le \sum_{j\in\mathcal{A}}(x_j-y_j) + \sum_{j\in\mathcal{A}^c}E_j(x_j-y_j) + \frac{1}{2}E\{\max\{\Omega_j(y_j-x_j); j\in[1:n]\}\mid Z\}, \tag{A2}$$
both of which follow from the fact that for real numbers $\gamma_1,\gamma_2,\gamma_3,\gamma_4$, $\max\{\gamma_1+\gamma_2,\gamma_3+\gamma_4\}\le\max\{\gamma_1,\gamma_3\}+\max\{\gamma_2,\gamma_4\}$.
For 2, we consider $x\ge y$, where the inequality is entry-wise. Notice that
$$f(x)-f(y) \ge \sum_{j\in\mathcal{A}}(x_j-y_j) + \sum_{j\in\mathcal{A}^c}E_j(x_j-y_j) - \frac{1}{2}E\Big\{\sum_{j=1}^n\Omega_j(x_j-y_j)\,\Big|\,Z\Big\} = \sum_{j\in\mathcal{A}}\frac{1}{2}(x_j-y_j) + \sum_{j\in\mathcal{A}^c}\frac{E_j}{2}(x_j-y_j) \ge 0, \tag{A3}$$
where the inequality follows from (A1) and the fact that for $\gamma_1,\gamma_2\ge0$, $\max\{\gamma_1,\gamma_2\}\le\gamma_1+\gamma_2$.
For 3, note that
$$f(x)-f(y) \overset{(a)}{\le} \sum_{j\in\mathcal{A}}|x_j-y_j| + \sum_{j\in\mathcal{A}^c}E_j|x_j-y_j| + \frac{1}{2}E\Big\{\sum_{j=1}^n\Omega_j|x_j-y_j|\,\Big|\,Z\Big\} = \frac{3}{2}\sum_{j\in\mathcal{A}}|x_j-y_j| + \frac{3}{2}\sum_{j\in\mathcal{A}^c}E_j|x_j-y_j|, \tag{A4}$$
where (a) follows from (A2) and the fact that for $\gamma_1,\gamma_2\ge0$, $\max\{\gamma_1,\gamma_2\}\le\gamma_1+\gamma_2$. Swapping the roles of x and y yields the same bound for $f(y)-f(x)$, which establishes 3.

Appendix B. Proof of Lemma 3

We begin with several results which are used in the proof.
Lemma A1.
If $(p^*,\gamma^*)$ solves the problem,
$$\text{(P2-1):} \quad \underset{p,\gamma}{\text{maximize}}\;\; \sum_{k=1}^n p_kE_k - \frac{1}{2}\gamma \quad \text{subject to} \quad p\in I,\;\; \gamma\ge p_kE_k \;\; \forall\, 1\le k\le n,$$
where I is the n-dimensional probability simplex defined in (28), then $p^*$ solves (P2).
Proof of Lemma A1.
Define,
$$f_1(p,\gamma) = \sum_{k=1}^n p_kE_k - \frac{1}{2}\gamma. \tag{A5}$$
Notice that $f(p) = f_1(p,\max\{p_kE_k; 1\le k\le n\})$. Let $(p^*,\gamma^*)$ be a solution of (P2-1). For $(p^*,\gamma^*)$ to be feasible for (P2-1), we must have $\gamma^*\ge\max\{p_k^*E_k; 1\le k\le n\}$. However, if $\gamma^*>\max\{p_k^*E_k; 1\le k\le n\}$, we have $f_1(p^*,\max\{p_k^*E_k; 1\le k\le n\})>f_1(p^*,\gamma^*)$, which contradicts the optimality of $(p^*,\gamma^*)$ for (P2-1). Hence, $\gamma^*=\max\{p_k^*E_k; 1\le k\le n\}$, and so $f(p^*)=f_1(p^*,\gamma^*)$.
Now, consider $\tilde{p}\in I$. Define $\tilde{\gamma}=\max\{\tilde{p}_kE_k; 1\le k\le n\}$. Since $(\tilde{p},\tilde{\gamma})$ is also feasible for (P2-1), we must have $f_1(\tilde{p},\tilde{\gamma})\le f_1(p^*,\gamma^*)$. This implies $f(\tilde{p})\le f(p^*)$. Hence, $p^*$ is an optimal solution of (P2). □
Lemma A2.
Consider a fixed $\mu\in\mathbb{R}^n$ such that $\mu_k\ge0$ for all $1\le k\le n$. Now, consider the problem (unconstrained in γ),
$$\text{(P2-2):} \quad \underset{p\in I,\;\gamma\in\mathbb{R}}{\text{maximize}}\;\; f_2(p,\gamma) = \sum_{k=1}^n p_kE_k - \frac{1}{2}\gamma + \sum_{k=1}^n\mu_k(\gamma - p_kE_k). \tag{A6}$$
Assume $(p^*,\gamma^*)$ is a solution to (P2-2). Additionally, assume that
$$E_kp_k^* \le \gamma^* \text{ for all } 1\le k\le n, \qquad E_kp_k^* = \gamma^* \text{ whenever } \mu_k>0. \tag{A7}$$
Then $(p^*,\gamma^*)$ is a solution for (P2-1).
Proof of Lemma A2.
First, notice that $(p^*,\gamma^*)$ satisfies the constraints of (P2-1). To show that it maximizes the objective of (P2-1), consider any $(p,\gamma)$ that is feasible for (P2-1). Notice that
$$f_1(p^*,\gamma^*) = f_2(p^*,\gamma^*) - \sum_{\mu_k>0}\mu_k(\gamma^*-p_k^*E_k) \overset{(a)}{\ge} f_2(p,\gamma) - \sum_{\mu_k>0}\mu_k(\gamma^*-p_k^*E_k) = f_1(p,\gamma) + \sum_{\mu_k>0}\mu_k(\gamma-p_kE_k-\gamma^*+p_k^*E_k) \overset{(b)}{=} f_1(p,\gamma) + \sum_{\mu_k>0}\mu_k(\gamma-p_kE_k) \overset{(c)}{\ge} f_1(p,\gamma), \tag{A8}$$
where $f_1$ is the objective of (P2-1) defined in (A5), $f_2$ is the objective of (P2-2), (a) follows from the optimality of $(p^*,\gamma^*)$ for (P2-2), (b) follows from (A7), and (c) follows since $\mu_k\ge0$ and $(p,\gamma)$ is feasible for (P2-1). Hence, we have the result. □
Define,
$$S_k = \sum_{j=1}^k\frac{1}{E_j}, \tag{A9}$$
for $1\le k\le n$. We also establish the following lemma, which is useful in our solution.
Lemma A3.
Let
$$r = \arg\max_{1\le k\le n}\frac{k-\frac{1}{2}}{S_k}, \tag{A10}$$
where $\arg\max$ returns the lowest index in the case of ties. Let us also define $\mu\in\mathbb{R}^n$ as
$$\mu_k = \begin{cases} 1 - \dfrac{1}{E_k}\cdot\dfrac{r-\frac{1}{2}}{S_r} & \text{if } 1\le k\le r, \\ 0 & \text{otherwise}. \end{cases} \tag{A11}$$
Then we have
1. $\mu_k\ge0$ for all k such that $1\le k\le n$;
2. $\sum_{k=1}^n\mu_k=\frac{1}{2}$;
3. $E_k(1-\mu_k)=\frac{r-\frac{1}{2}}{S_r}$ for $1\le k\le r$;
4. $E_k(1-\mu_k)\le\frac{r-\frac{1}{2}}{S_r}$ for $r+1\le k\le n$.
Proof of Lemma A3.
  • Notice that by the definition of $\mu_k$, it is enough to prove the result for $1\le k\le r$. We are required to prove that
$$\frac{1}{E_k}\cdot\frac{r-\frac{1}{2}}{S_r} \le 1, \tag{A12}$$
for all $1\le k\le r$. Since $E_k\ge E_{k+1}$ for $1\le k\le n-1$, it suffices to prove that
$$\frac{1}{E_r}\cdot\frac{r-\frac{1}{2}}{S_r} \le 1. \tag{A13}$$
We consider two cases.
Case 1: $r=1$. Since $S_1 = 1/E_1$, this case reduces to,
$$\frac{1}{2E_1} \le \frac{1}{E_1}, \tag{A14}$$
which is trivial.
Case 2: $r>1$. Note that from the definition of r in (A10), we have
$$\frac{r-\frac{1}{2}}{S_r} \ge \frac{r-\frac{3}{2}}{S_{r-1}}. \tag{A15}$$
After substituting $S_{r-1} = S_r - \frac{1}{E_r}$ and rearranging, we have the desired result.
  • Notice that
$$\sum_{k=1}^n\mu_k = \sum_{k=1}^r\mu_k = \sum_{k=1}^r\left(1 - \frac{1}{E_k}\cdot\frac{r-\frac{1}{2}}{S_r}\right) = r - \frac{r-\frac{1}{2}}{S_r}\sum_{k=1}^r\frac{1}{E_k} = r - \Big(r-\frac{1}{2}\Big) = \frac{1}{2}. \tag{A16}$$
  • This follows from the definition of $\mu_k$ for $1\le k\le r$.
  • There is nothing to prove if $r=n$; hence, we can assume $r<n$. Since $\mu_k=0$ for $k\ge r+1$, it suffices to prove that $E_k\le\frac{r-\frac{1}{2}}{S_r}$. Notice that if we can prove the result for $k=r+1$, we are finished, since $E_k\ge E_{k+1}$ for $1\le k\le n-1$. Note that from the definition of r in (A10), we have
$$\frac{r-\frac{1}{2}}{S_r} \ge \frac{r+\frac{1}{2}}{S_{r+1}}. \tag{A17}$$
After substituting $S_{r+1} = S_r + \frac{1}{E_{r+1}}$ and rearranging, we have the desired result.
Now, we solve the problem using the above lemmas. Consider the problem defined in Lemma A2 with μ as defined in Lemma A3. Specifically, consider the problem,
$$\text{(P2-3):} \quad \underset{p\in I,\;\gamma\in\mathbb{R}}{\text{maximize}}\;\; f_2(p,\gamma) = \sum_{k=1}^n p_kE_k - \frac{1}{2}\gamma + \sum_{k=1}^n\mu_k(\gamma - p_kE_k),$$
where μ and r are defined in (A11) and (A10). For this choice of $\mu_k$, we have
$$f_2(p,\gamma) = \sum_{k=1}^n p_kE_k(1-\mu_k) + \gamma\left(\sum_{k=1}^n\mu_k - \frac{1}{2}\right) = \sum_{k=1}^n p_kE_k(1-\mu_k), \tag{A18}$$
where the last equality follows from Lemma A3-2. Now, due to Lemma A3-3 and Lemma A3-4, an optimal solution for (P2-3) is any $(p,\gamma)$ such that $\gamma\in\mathbb{R}$ and $p\in I$ with $p_k=0$ for $k>r$. In particular, consider the solution $(p^*,\gamma^*)$ given by,
$$p_k^* = \begin{cases} \dfrac{1}{E_kS_r} & \text{if } k\le r, \\ 0 & \text{otherwise}, \end{cases} \tag{A19}$$
and $\gamma^*=\frac{1}{S_r}$. Notice that for $1\le k\le r$ we have $p_k^*E_k=\gamma^*$, and $p_k^*E_k=0\le\gamma^*$ for $r+1\le k\le n$. Hence, from Lemma A2, $(p^*,\gamma^*)$ is a solution for (P2-1). Hence, from Lemma A1, $p^*$ is a solution for (P2), as desired.

Appendix C. Proof of Lemma 4

Notice that
$$\begin{aligned} \Delta(t) &= E\{L(t+1)-L(t)\mid H(t),Z\} = \frac{1}{2}E\Big\{\sum_{j=1}^n\big(Q_j(t+1)^2 - Q_j(t)^2\big)\,\Big|\,H(t),Z\Big\} \\ &= \frac{1}{2}\sum_{j=1}^n E\{Q_j(t+1)^2\mid H(t),Z\} - \frac{1}{2}\sum_{j=1}^n Q_j(t)^2 \\ &= \frac{1}{2}\sum_{j=1}^n E\{\max(Q_j(t)+\gamma_j(t)-x_j(t),\,0)^2\mid H(t),Z\} - \frac{1}{2}\sum_{j=1}^n Q_j(t)^2 \\ &\le \frac{1}{2}\sum_{j=1}^n E\{(Q_j(t)+\gamma_j(t)-x_j(t))^2\mid H(t),Z\} - \frac{1}{2}\sum_{j=1}^n Q_j(t)^2 \\ &\le \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) + \frac{1}{2}\sum_{j=1}^n\gamma_j(t)^2 + \frac{1}{2}\sum_{j=1}^n E\{x_j(t)^2\mid H(t),Z\} \\ &\overset{(a)}{\le} \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) + \frac{1}{2}\sum_{j\in\mathcal{A}}E_j^2 + \frac{n-a}{2} + \frac{1}{2}\sum_{j\in\mathcal{A}}E\{(X_j(t)1\{\alpha_A(t)=j\})^2\mid H(t),Z\} + \frac{1}{2}\sum_{j\in\mathcal{A}^c}E\{(1\{\alpha_A(t)=j\})^2\mid H(t),Z\} \\ &\le \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) + \frac{1}{2}\sum_{j\in\mathcal{A}}E_j^2 + \frac{n-a}{2} + \frac{1}{2}\sum_{j\in\mathcal{A}}E\{X_j(t)^2\mid H(t),Z\} + \frac{n-a}{2} \\ &\overset{(b)}{=} \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) + \frac{1}{2}\sum_{j\in\mathcal{A}}\big(E_j^2+E\{W_j^2\}\big) + n-a \\ &= \sum_{j=1}^n Q_j(t)\big(\gamma_j(t)-E\{x_j(t)\mid H(t),Z\}\big) + D_1, \end{aligned} \tag{A20}$$
where inequality (a) follows since $\gamma(t)\in\mathcal{K}$, and equality (b) follows from the fact that $X(t)$ is independent of $H(t)$ and $Z$.

Appendix D. Proof of Lemma 6

Notice that
$$\nabla f_t(\gamma(t-1)) = v - \frac{1}{2}\tilde{\Omega}(t), \tag{A21}$$
where v is defined by,
$$v_j = \begin{cases} 1 & \text{if } j\in\mathcal{A}, \\ E_j & \text{if } j\in\mathcal{A}^c, \end{cases} \tag{A22}$$
and $\tilde{\Omega}(t)$ is given by $\tilde{\Omega}_k(t) = \Omega_k(t)1\{\arg\max_{1\le j\le n}\{\gamma_j(t-1)\Omega_j(t)\}=k\}$, where $\arg\max$ returns the lowest index in the case of ties. Notice that
$$\begin{aligned} -V\nabla f_t(\gamma(t-1))^\top(\gamma(t)-\gamma(t-1)) + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 &\overset{(a)}{\ge} -V\|\nabla f_t(\gamma(t-1))\|_2\|\gamma(t)-\gamma(t-1)\|_2 + \alpha\|\gamma(t)-\gamma(t-1)\|_2^2 \\ &= \alpha\left(\|\gamma(t)-\gamma(t-1)\|_2 - \frac{V}{2\alpha}\|\nabla f_t(\gamma(t-1))\|_2\right)^2 - \frac{V^2}{4\alpha}\|\nabla f_t(\gamma(t-1))\|_2^2 \\ &\ge -\frac{V^2}{4\alpha}\|\nabla f_t(\gamma(t-1))\|_2^2 = -\frac{V^2}{4\alpha}\Big\|v-\frac{1}{2}\tilde{\Omega}(t)\Big\|_2^2 \\ &\overset{(b)}{\ge} -\frac{V^2}{4\alpha}\|v\|_2^2 - \frac{V^2}{16\alpha}\|\tilde{\Omega}(t)\|_2^2 \\ &\ge -\frac{V^2}{4\alpha}\Big(a+\sum_{j\in\mathcal{A}^c}E_j^2\Big) - \frac{V^2}{16\alpha}\|\Omega(t)\|_2^2, \end{aligned} \tag{A23}$$
where (a) follows from the Cauchy–Schwarz inequality, and (b) follows since $v_k\ge0$ and $\tilde{\Omega}_k(t)\ge0$ for all $1\le k\le n$.

Appendix E. Proof of Lemma 8

Notice that from the definition of Q j ( t + 1 ) in (35) and the definition of x j ( t ) in (42) we have that
Q j ( t + 1 ) Q j ( t ) γ j ( t ) x j ( t ) ,
for all 1 j n and 1 t T 1 . Summing the above from 1 to T 1 , we have that
Q j ( T ) Q j ( 1 ) t = 1 T 1 { γ j ( t ) x j ( t ) } .
After using $Q_j(1) = 0$, taking expectations conditioned on $Z$, and some algebraic manipulation, we have
$$\frac{\mathbb{E}\{Q_j(T) \mid Z\}}{T} \ge \frac{1}{T}\sum_{t=1}^{T-1}\mathbb{E}\{\gamma_j(t) - x_j(t) \mid Z\}.$$
The desired inequality follows from the above, since $Q_j(T)$ is non-negative.
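The telescoping argument above can also be sanity-checked numerically: under the update (35), the queue $Q_j(t+1)$ dominates the running sum $\sum_{s \le t}(\gamma_j(s) - x_j(s))$ along every sample path. A minimal sketch with synthetic values for $\gamma_j(t)$ and $x_j(t)$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
Q = 0.0         # Q_j(1) = 0
partial = 0.0   # running sum of gamma_j(s) - x_j(s)
for t in range(1, T):
    g = rng.uniform(0.0, 1.0)      # synthetic gamma_j(t)
    x = rng.uniform(0.0, 1.0)      # synthetic x_j(t)
    Q = max(Q + g - x, 0.0)        # queue update as in (35)
    partial += g - x
    assert Q >= partial - 1e-9     # Q_j(t+1) >= sum_{s<=t} (gamma_j(s) - x_j(s))
print("Q_j(T) dominates the running constraint sum, as in Appendix E")
```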

Appendix F. Proof of Lemma 9

Define $v, u$ as follows:
$$v_k = \begin{cases} 1 & \text{if } k \in A, \\ E_k & \text{if } k \in A^c, \end{cases} \qquad u_k = \begin{cases} E_k & \text{if } k \in A, \\ 1 & \text{if } k \in A^c. \end{cases}$$
Hence, we are required to prove that $Q_j(t) \le (v_j + 4u_j)\alpha + u_j$ for all $t \in [1:T]$.
We begin with several important results.
Lemma A4.
We have the following results regarding $Q_j(t)$.
1. $Q_j(t+1) \le Q_j(t) + u_j$ for all $t \ge 1$.
2. Assume $Q_j(t) \ge (v_j + 2u_j)\alpha$ for some $t \ge 1$. Then we have either $\gamma_j(t) = 0$ or
$$\gamma_j(t) \le \gamma_j(t-1) - \frac{u_j}{2\alpha}.$$
3. Assume $Q_j(\tau) \ge (v_j + 2u_j)\alpha$ for all $\tau \in [t : t + t_0]$, where $t \ge 1$ and $t_0 \ge 0$. Additionally, assume $\gamma_j(t-1) = 0$. Then $\gamma_j(\tau) = 0$ for all $\tau \in [t-1 : t + t_0]$.
Proof of Lemma A4.
  • Notice that from the definition of $Q_j(t+1)$ in (35), for $j \in A$ we have
$$Q_j(t+1) = \max\big(Q_j(t) + \gamma_j(t) - X_j(t)\,\mathbb{1}\{\alpha_A(t) = j\},\, 0\big) \le \max(Q_j(t) + u_j,\, 0) = Q_j(t) + u_j,$$
    where the inequality follows from the definition of $u_j$ in (A28), which gives $\gamma_j(t) \le u_j$. The same argument can be repeated for $j \in A^c$.
  • Notice that if $\gamma_j(t) \ne 0$, then we have
$$\gamma_j(t) \le \gamma_j(t-1) - \frac{V\,\nabla f_{t,j}(\gamma(t-1)) + Q_j(t)}{2\alpha},$$
    which follows since $\gamma_j(t)$ is the projection of $\gamma_j(t-1) - \frac{V\,\nabla f_{t,j}(\gamma(t-1)) + Q_j(t)}{2\alpha}$ onto $[0, u_j]$ (see (38)). Hence, we have that
$$\gamma_j(t) \le \gamma_j(t-1) - \frac{V\,\nabla f_{t,j}(\gamma(t-1)) + Q_j(t)}{2\alpha} \stackrel{(a)}{\le} \gamma_j(t-1) - \frac{-V v_j + (v_j + 2u_j)\alpha}{2\alpha} \stackrel{(b)}{\le} \gamma_j(t-1) - \frac{u_j}{2\alpha},$$
    where (a) follows from the bound on the subgradients of $f_t$ found in (34) together with the assumption $Q_j(t) \ge (v_j + 2u_j)\alpha$, and (b) follows from $\alpha \ge V \ge 2$.
  • Notice that if we prove $\gamma_j(t) = 0$, we can use the same argument inductively to establish the result. Assume, to the contrary, that $\gamma_j(t) \ne 0$. Then, from part 2, we should have
$$\gamma_j(t) \le \gamma_j(t-1) - \frac{u_j}{2\alpha} = -\frac{u_j}{2\alpha},$$
    which is a contradiction since $\gamma_j(t) \ge 0$. Hence, we have the result.
Now, we use an inductive argument to prove the main result. Notice that the result is true for $t = 1$, since $Q_j(1) = 0 \le (v_j + 4u_j)\alpha + u_j$. Now, we prove that $Q_j(t+1) \le (v_j + 4u_j)\alpha + u_j$ for $t \ge 1$, under the induction hypothesis that $Q_j(t) \le (v_j + 4u_j)\alpha + u_j$.
We consider three cases.
Case 1:  $Q_j(t) \le (v_j + 4u_j)\alpha$. This case follows from Lemma A4-1.
Case 2:  $t \le 2\alpha + 1$. Notice that
$$Q_j(t+1) \le Q_j(1) + u_j t \le (2\alpha + 1)u_j \le (v_j + 4u_j)\alpha + u_j,$$
where the first inequality follows from Lemma A4-1.
Case 3:  $t > 2\alpha + 1$ and $Q_j(t) > (v_j + 4u_j)\alpha$. For this case, we prove that $\gamma_j(t) = 0$, which establishes the claim from the definition of $Q_j(t+1)$ in (35) and the induction hypothesis.
Notice that for all $u \in [1:t]$ we have
$$Q_j(u) \stackrel{(a)}{\ge} Q_j(t) - (t-u)u_j > (v_j + 4u_j)\alpha - (t-u)u_j = (v_j + 2u_j)\alpha + 2\alpha u_j - (t-u)u_j,$$
where (a) follows from Lemma A4-1.
Hence, for all $u \in \mathbb{Z}$ such that $t - 2\alpha \le u \le t$, we have that
$$Q_j(u) \ge (v_j + 2u_j)\alpha.$$
Now, we prove that there exists $u \in \mathbb{Z}$ such that $t - 2\alpha \le u \le t$ and $\gamma_j(u) = 0$, which will establish that $\gamma_j(t) = 0$ from Lemma A4-3. For the proof, assume the contrary: $\gamma_j(u) > 0$ for all $u \in \mathbb{Z}$ such that $t - 2\alpha \le u \le t$ (equivalently, for all $u \in [t - \lfloor 2\alpha \rfloor : t]$, where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$). From Lemma A4-2, we have that
$$\gamma_j(t) \le \gamma_j\big(t - \lfloor 2\alpha \rfloor - 1\big) - \big(\lfloor 2\alpha \rfloor + 1\big)\frac{u_j}{2\alpha} \le 0,$$
where the last inequality follows since $\lfloor x \rfloor + 1 \ge x$ and $\gamma_j(t - \lfloor 2\alpha \rfloor - 1) \le u_j$ (since $\gamma_j(\tau) \in [0, u_j]$ for all $\tau \in [1:T]$ by the projection definition of $\gamma_j(\tau)$ in (38), and $\gamma_j(0) = 0$). Hence, we should have that $\gamma_j(t) = 0$, which contradicts our initial assumption. Hence, we are finished.
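To illustrate Lemma A4-2 concretely, the sketch below implements the projected update reconstructed from (38) (hypothetical names; the subgradient is pinned at its worst-case value $-v_j$) and checks that once $Q_j(t) \ge (v_j + 2u_j)\alpha$, each step either lands on zero or decreases $\gamma_j$ by at least $u_j/(2\alpha)$:

```python
import numpy as np

def gamma_update(gamma_prev, grad, Q, V, alpha, u):
    """Projected update (38): step from gamma(t-1) against V*grad + Q,
    then clip to [0, u] coordinatewise."""
    return np.clip(gamma_prev - (V * grad + Q) / (2 * alpha), 0.0, u)

# Illustration of Lemma A4-2: once Q_j >= (v_j + 2 u_j) * alpha, the update
# either hits 0 or decreases gamma_j by at least u_j / (2 alpha).
v, u, V, alpha = 1.0, 1.0, 2.0, 4.0    # illustrative values with alpha >= V >= 2
Q = (v + 2 * u) * alpha
gamma = u                              # worst case: start at the upper end of [0, u]
for _ in range(10):
    new = float(gamma_update(gamma, -v, Q, V, alpha, u))  # grad = -v_j (worst case)
    assert new == 0.0 or new <= gamma - u / (2 * alpha) + 1e-12
    gamma = new
print("gamma_j driven to 0:", gamma)
```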

Notes

1. Ideally, player B may not have information about $q_j^A$ and $p_j^A$, and hence may not be able to utilize this exact strategy. Nevertheless, obtaining a better bound is impossible, since we place no assumptions on player B's incentives and have no information about player B's strategy. For instance, suppose player B assumes that player A uses a particular strategy, and this assumption turns out to be correct; then, since player B knows the distributions of all $W_j$ for $1 \le j \le n$, player B's estimates of $q_j^A$ and $p_j^A$ are exact.
2. The same problem structure arises in the case with symmetric information between the players (the case $a = b = 0$ with $d$ arbitrary). Hence, the solution obtained in this section can be used for that case as well.

References

  1. Akkarajitsakul, K.; Hossain, E.; Niyato, D.; Kim, D.I. Game Theoretic Approaches for Multiple Access in Wireless Networks: A Survey. IEEE Commun. Surv. Tutor. 2011, 13, 372–395.
  2. Aryafar, E.; Keshavarz-Haddad, A.; Wang, M.; Chiang, M. RAT selection games in HetNets. In Proceedings of the 2013 Proceedings IEEE INFOCOM, Turin, Italy, 14–19 April 2013; pp. 998–1006.
  3. Felegyhazi, M.; Cagalj, M.; Bidokhti, S.S.; Hubaux, J.P. Non-Cooperative Multi-Radio Channel Allocation in Wireless Networks. In Proceedings of the IEEE INFOCOM 2007—26th IEEE International Conference on Computer Communications, Anchorage, AK, USA, 6–12 March 2007; pp. 1442–1450.
  4. Li, B.; Qu, Q.; Yan, Z.; Yang, M. Survey on OFDMA based MAC protocols for the next generation WLAN. In Proceedings of the 2015 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), New Orleans, LA, USA, 9–12 March 2015; pp. 131–135.
  5. Rosenthal, R.W. A class of games possessing pure-strategy Nash equilibria. Int. J. Game Theory 1973, 2, 65–67.
  6. Nikolova, E.; Stier-Moses, N.E. Stochastic Selfish Routing. In Proceedings of the Algorithmic Game Theory, Amalfi, Italy, 17–19 October 2011; pp. 314–325.
  7. Angelidakis, H.; Fotakis, D.; Lianeas, T. Stochastic Congestion Games with Risk-Averse Players. In Proceedings of the SAGT 2013, Lecture Notes in Computer Science, Aachen, Germany, 21–23 October 2013.
  8. Zhou, C.; Nguyen, T.H.; Xu, H. Algorithmic Information Design in Multi-Player Games: Possibilities and Limits in Singleton Congestion. In Proceedings of the 23rd ACM Conference on Economics and Computation, EC'22, Boulder, CO, USA, 11–15 July 2022; Association for Computing Machinery: New York, NY, USA, 2022; p. 869.
  9. Castiglioni, M.; Celli, A.; Marchesi, A.; Gatti, N. Signaling in Bayesian Network Congestion Games: The Subtle Power of Symmetry. Proc. AAAI Conf. Artif. Intell. 2021, 35, 5252–5259.
  10. Wu, M.; Liu, J.; Amin, S. Informational aspects in a class of Bayesian congestion games. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 3650–3657.
  11. Syrgkanis, V. The complexity of equilibria in cost sharing games. In Proceedings of the Internet and Network Economics: 6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 366–377.
  12. Zinkevich, M. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 928–935.
  13. Yu, H.; Neely, M.; Wei, X. Online Convex Optimization with Stochastic Constraints. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Brooklyn, NY, USA, 2017; Volume 30.
  14. Neely, M.J. Stochastic Network Optimization with Application to Communication and Queueing Systems; Morgan & Claypool: Kentfield, CA, USA, 2010.
  15. Monderer, D.; Shapley, L.S. Potential Games. Games Econ. Behav. 1996, 14, 124–143.
  16. Chien, S.; Sinclair, A. Convergence to approximate Nash equilibria in congestion games. Games Econ. Behav. 2011, 71, 315–327.
  17. Bhawalkar, K.; Gairing, M.; Roughgarden, T. Weighted Congestion Games: Price of Anarchy, Universal Worst-Case Examples, and Tightness. In Proceedings of the Algorithms—ESA 2010, Lecture Notes in Computer Science, Liverpool, UK, 6–8 September 2010; pp. 17–28.
  18. Milchtaich, I. Congestion Games with Player-Specific Payoff Functions. Games Econ. Behav. 1996, 13, 111–124.
  19. Ackermann, H.; Goldberg, P.W.; Mirrokni, V.S.; Röglin, H.; Vöcking, B. A Unified Approach to Congestion Games and Two-Sided Markets. Internet Math. 2008, 5, 439–458.
  20. Fotakis, D.; Kontogiannis, S.; Koutsoupias, E.; Mavronicolas, M.; Spirakis, P. The structure and complexity of Nash equilibria for a selfish routing game. Theor. Comput. Sci. 2009, 410, 3305–3326.
  21. Gairing, M.; Lücking, T.; Mavronicolas, M.; Monien, B. Computing Nash Equilibria for Scheduling on Restricted Parallel Links. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC'04, Chicago, IL, USA, 13–15 June 2004; Association for Computing Machinery: New York, NY, USA, 2004; pp. 613–622.
  22. Acemoglu, D.; Makhdoumi, A.; Malekian, A.; Ozdaglar, A. Informational Braess' Paradox: The Effect of Information on Traffic Congestion. Oper. Res. 2018, 66, 893–917.
  23. Le, S.; Wu, Y.; Toyoda, M. A Congestion Game Framework for Service Chain Composition in NFV with Function Benefit. Inf. Sci. 2020, 514, 512–522.
  24. Zhang, L.; Gong, K.; Xu, M. Congestion Control in Charging Stations Allocation with Q-Learning. Sustainability 2019, 11, 3900.
  25. Anshelevich, E.; Dasgupta, A.; Kleinberg, J.; Tardos, E.; Wexler, T.; Roughgarden, T. The price of stability for network design with fair cost allocation. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, Rome, Italy, 17–19 October 2004; pp. 295–304.
  26. Caragiannis, I.; Flammini, M.; Kaklamanis, C.; Kanellopoulos, P.; Moscardelli, L. Tight Bounds for Selfish and Greedy Load Balancing. Algorithmica 2006, 58, 311–322.
  27. Zhang, F.; Wang, M.M. Stochastic Congestion Game for Load Balancing in Mobile-Edge Computing. IEEE Internet Things J. 2021, 8, 778–790.
  28. Liu, M.; Ahmad, S.H.A.; Wu, Y. Congestion games with resource reuse and applications in spectrum sharing. In Proceedings of the 2009 International Conference on Game Theory for Networks, Istanbul, Turkey, 13–15 May 2009; pp. 171–179.
  29. Liu, M.; Wu, Y. Spectrum sharing as congestion games. In Proceedings of the 2008 46th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 23–26 September 2008; pp. 1146–1153.
  30. Ibrahim, M.; Khawam, K.; Tohme, S. Congestion Games for Distributed Radio Access Selection in Broadband Networks. In Proceedings of the 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, Miami, FL, USA, 6–10 December 2010; pp. 1–5.
  31. Seo, J.B.; Jin, H. Two-User NOMA Uplink Random Access Games. IEEE Commun. Lett. 2018, 22, 2246–2249.
  32. Seo, J.B.; Jin, H. Revisiting Two-User S-ALOHA Games. IEEE Commun. Lett. 2018, 22, 1172–1175.
  33. Malanchini, I.; Cesana, M.; Gatti, N. Network Selection and Resource Allocation Games for Wireless Access Networks. IEEE Trans. Mob. Comput. 2013, 12, 2427–2440.
  34. Trestian, R.; Ormond, O.; Muntean, G.M. Game Theory-Based Network Selection: Solutions and Challenges. IEEE Commun. Surv. Tutor. 2012, 14, 1212–1231.
  35. Quint, T.; Shubik, M. A Model of Migration; Cowles Foundation Discussion Paper No. 1331; 1994. Available online: https://elischolar.library.yale.edu/cowles-discussion-paper-series/1331 (accessed on 1 May 2023).
  36. Wijewardena, M.; Neely, M.J. A Two-Player Resource-Sharing Game with Asymmetric Information. arXiv 2023, arXiv:2306.08791.
  37. Nisan, N.; Roughgarden, T.; Tardos, E.; Vazirani, V.V. Algorithmic Game Theory; Cambridge University Press: Cambridge, UK, 2007.
  38. Wei, X.; Yu, H.; Neely, M.J. Online Primal-Dual Mirror Descent under Stochastic Constraints. Proc. ACM Meas. Anal. Comput. Syst. 2020, 4, 1–36.
Figure 1. $(a, b, c, d) = (0, 0, 3, 0)$.
Figure 2. $(a, b, c, d) = (0, 1, 2, 0)$.
Figure 3. $(a, b, c, d) = (1, 1, 1, 0)$.
Figure 4. Top: Case $a = 0, b = 0, c = 3, d = 0$. Middle: Case $a = 0, b = 1, c = 2, d = 0$. Bottom: Case $a = b = c = 1, d = 0$. Left: The expected utility of the players at the $\epsilon$-approximate Nash equilibrium vs. $E_1$. Middle: One possible solution for the probabilities of choosing different resources at the $\epsilon$-approximate Nash equilibrium for player A vs. $E_1$. Right: One possible solution for the probabilities of choosing different resources at the $\epsilon$-approximate Nash equilibrium for player B vs. $E_1$.
Figure 5. Left: Case $a = 0, b = 0, c = 3, d = 0$. Middle: Case $a = 0, b = 1, c = 2, d = 0$. Right: Case $a = b = c = 1, d = 0$. Top: The maximum expected worst-case utility of player A and the error margin (shaded in blue) vs. $E_1$. Bottom: One possible solution for the probabilities of choosing different resources for player A vs. $E_1$.