Analytical Method for Mechanism Design in Partially Observable Markov Games

Clempner, Julio B.; Poznyak, Alexander S.

doi:10.3390/math9040321

Open AccessArticle

Analytical Method for Mechanism Design in Partially Observable Markov Games

by

Julio B. Clempner

^1,*

and

Alexander S. Poznyak

²

¹

Escuela Superior de Física y Matemáticas, Instituto Politécnico Nacional, School of Physics and Mathematics, National Polytechnic Institute, Edificio 9 U.P. Adolfo Lopez Mateos, Col. San Pedro Zacatenco, 07730 Mexico City, Mexico

²

Center for Research and Advanced Studies Av. IPN 2508, Department of Control Automatics, Col. San Pedro Zacatenco, 07360 Mexico City, Mexico

^*

Author to whom correspondence should be addressed.

Mathematics 2021, 9(4), 321; https://doi.org/10.3390/math9040321

Submission received: 29 November 2020 / Revised: 25 December 2020 / Accepted: 11 January 2021 / Published: 6 February 2021

(This article belongs to the Section E1: Mathematics and Computer Science)

Download

Browse Figures

Versions Notes

Abstract

:

A theme that become common knowledge of the literature is the difficulty of developing a mechanism that is compatible with individual incentives that simultaneously result in efficient decisions that maximize the total reward. In this paper, we suggest an analytical method for computing a mechanism design. This problem is explored in the context of a framework, in which the players follow an average utility in a non-cooperative Markov game with incomplete state information. All of the Nash equilibria are approximated in a sequential process. We describe a method for the derivative of the player’s equilibrium that instruments the design of the mechanism. In addition, it showed the convergence and rate of convergence of the proposed method. For computing the mechanism, we consider an extension of the Markov model for which it is introduced a new variable that represents the product of the mechanism design and the joint strategy. We derive formulas to recover the variables of interest: mechanisms, strategy, and distribution vector. The mechanism design and equilibrium strategies computation differ from those in previous literature. A numerical example presents the usefulness and effectiveness of the proposed method.

Keywords:

dynamic mechanism design; incentive-compatible mechanisms; Markov games with private information; partially observable Markov chains; incomplete state information

1. Introduction

1.1. Brief Review

Hurwicz [1] published his seminal work on mechanism design that has emerged as a practical framework for tackling game theory problems with an engineering viewpoint when considering players that interact rationally [2,3,4]. For a survey see [5]. This theory is based on games with incomplete information for modeling mechanisms (that implements a social choice function) compatible with individual incentives that result in efficient decisions for maximizing the total reward. The primary aim consists of establishing games that consider independent private values and quasilinear payoffs [6,7], in which players receive messages containing information that is relevant to payoffs [8]. In the evolutions of the game, players commit to a mechanism that presents a result in terms of a function of the possibly untruthfully reported type. It should be pointed out that the mechanism is unknown. The mechanism designer determines a social choice function that is a mapping of the true type profile directly to the alternatives. However, a mechanism maps the reported type profile to the alternatives. The main task in computational mechanism design is to find a mechanism that both maintains the game-theoretic original futures and is computationally “efficient” and “feasible”.

This approach makes it possible managing the restrictions and controlling the information of the players that are engaged in a game. From this perspective, Arrow [9] presented a framework to claim revelation that realizes efficiency and avoids the spend of resources in the incentive payments. d’Aspremont and Gerard-Varet [10] suggested two separate methods to design a mechanism with incomplete information: a) the former consist of the fact that players beliefs are not considered, and the second where they are. Saari [11] presented a mechanism design, which involves types of information. Rogerson [12] proposed a general approach of the hold-up problem, in which several players make relation-specific investments and then decide on some cooperative action proving that first-best solutions exist under a variety of different assumptions regarding the nature of information asymmetries. Mailath and Postlewaite [13] established an approach for the bargaining problems with asymmetric information while considering multiple agents. Miyakawa [14] provided a necessary and sufficient condition for the existence of a stationary perfect Bayesian equilibrium. Athey and Bagwell [15] and Hörner et al. [16] have relevant results on equilibria in repeated games, which consider communication. Clempner and Poznyak [17] suggested a Bayesian partially observable Markov games model supported by an AI approach. Different approaches are presented in the literature, for instance, see [18,19,20].

1.2. Main Results

We contribute to this literature by proposing original outcomes, presenting an analytical method for developing a mechanism that considers incomplete state information whose preferences evolve following a Markov process, and characterizing an approximately equilibrium behavior in game theory models [17]. The foundation of the proposed method is the derivation of formulas for computing the mechanism

μ

. Subsequently, given the mechanism, compute the equilibrium strategy. The derivation of these formulas rely on a direct mechanism design. We propose an extension of the Markov model, suggesting a new variable z that represents the product of the mechanism

μ

and the joint strategy c. Additionally, the joint strategy c is defined by the product of the strategy

π

, the observer q, and the distribution vector P. We derive formulas to recover the variables of interest: mechanism

μ

, the strategies

π

, and the distribution vectors P. We describe a method for the derivative of the player’s equilibrium that instruments the design of the mechanism and we also showed the convergence of the proposed method.

1.3. Organization of the Paper

For ease of exposition, in the next section, we describe the Markov game model. In Section 3, we introduce the variables c and z and suggest the derivation of the formulas. The ergodicity condition expressed in z variables is proven in Section 4. The convergence to a Nash equilibrium is presented in Section 5. Section 6 concludes with some remarks.

2. Markov Games with Incomplete Information

Let us introduce a probability space

(Ω, F, P)

, where

Ω

is a finite set of elementary events,

F

is the discrete

σ -

algebra of the subsets of

Ω

, and

P

is a given probability measure defined on

F

. Let us also consider the natural sequence

t = 1, 2, . . .

as a time argument. Let S be a finite set that consists of states

\{s_{1}, \dots, s_{N}\}

,

N \in N

, called the state space. A Stationary Markov chain [21,22] is a sequence of S-valued random variables

s (t),

t \in N

, satisfying the Markov condition:

\begin{matrix} P (s (t + 1) = s_{j} | s (t) = s_{i}, s (t - 1) = s_{i_{t - 1}}, \dots, s (1) = s_{i_{1}}) = \\ P (s (t + 1) = s_{j} | s (t) = s_{i}) = : p_{j | i} . \end{matrix}

(1)

The random variables

s (t)

are defined on the sample space

Ω

and they take values in S. The stochastic process

\{s (t), t \in N\}

is assumed to be a Markov chain. The Markov chain can be represented by a complete graph whose nodes are the states, where each edge

(s_{i}, s_{j}) \in S^{2}

is labeled by the transition probability in Equation (1). The matrix

P = {(p_{j | i})}_{(s (i), s (j)) \in S} \in {[0, 1]}^{N \times N}

determines the evolution of the chain: for each

n \in N

, the power

P^{n}

has in each entry

(s_{i}, s_{j})

the probability of going from state

s_{i}

to state

s_{j}

in exactly n steps.

Let

M C = (S, A, {A (s)}_{s \in S}, K, P)

be a Markov chain [21,22], where S is a finite set of states,

S \subset N

and A is a finite set of actions. For each

s \in S,

A (s) \subset A

is the non-empty set of admissible actions at state

s \in S

. Without loss of generality we may take

A = \cup_{s \in S} A (s)

. Whereas,

K = \{(s, a) | s \in S, a \in A (s)\}

is the set of admissible state-action pairs. The variable

p_{j | i k}

is a stationary controlled transition matrix, where

p_{j | i k} : = P (X_{t + 1} = s_{j} | X_{t} = s_{i}, A_{t} = a_{k})

\forall t \in N

represents the probability that is associated with the transition from state

s_{i}

to state

s_{j}

,

i = \bar{1, N}

(

i = 1, . . ., N

) and

j = \bar{1, N}

(

j = 1, . . ., N

), under an action

a_{k} \in A (s_{i}),

k = \bar{1, K}

(

k = 1, . . ., K

). The distribution vector is given by

P (X_{t} = s_{i}) = P_{i}

, such that

P_{i} \in S^{N},

where

S^{N} = {s \in R^{N} : \sum_{i = 1}^{N} P (s_{i}) = 1, P (s_{i}) \geq 0}

.

We consider the case where the process is not directly observable [23]. Let us associate with S the observation set Y, which takes values in a finite space

\{1, . . ., M\},

M \in N

. The stochastic process

\{Y_{t}, t \in N\}

is called the observation process. By observing

Y_{t}

at time t information regarding the true value of

X_{t}

is obtained. If

X_{t} = s_{i}

and

A_{t} = a_{k}

an observation

Y_{t} = y_{m}

will have a probability

q_{m | i k} : = P (Y_{t} = y_{m} | X_{t} = s_{i}, A_{t} = a_{k}),

that denotes the relationship between the state and observation when an action

a_{k} \in A (s_{i})

is chosen at time t. The observation kernel is a stochastic kernel on Y, as given by

Q = [q_{m | i k}]

. We restrict ourselves to consider

Q = [q_{m | i}]

.

Definition 1.

A controllable Partially Observable Markov Decision Process (POMDP) is a tuple

P O M D P = {M C, Y, Q, Q_{0}, P, V}

where: (i)

M C

is a Markov chain; (ii) Y is the observation set, which takes values in a finite space

\{1, . . ., M\},

M \in N;

(iii)

Q = {[q_{m | i}]}_{m = \bar{1, M}, i = \bar{1, N}}

denotes the observation kernel is a stochastic kernel on Y, such that

\sum_{m} q_{m | i} = 1

; (iv)

Q_{0} = {[q_{m | i}]}_{m = \bar{1, M}, i = \bar{1, N}}

denotes the initial observation kernel; (v) P is the (a priori) initial distribution; and, (vi)

V_{i j m k}

, is the reward function at time t, given the state

s_{i}

, the observable state

y_{m}

, when the action

a_{k} \in A (s_{i}, y_{m})

is taken.

A realization of the partially observable system at time t is given by the sequence

(s_{0}, y_{0}, a_{0},

s_{1}, y_{1}, a_{1}, . . .) \in Ω : = {(S Y A)}^{\infty}

, where

s_{0}

has a given by the distribution

P (X_{0} = s_{0})

and

\{A_{t}\}

is a control sequence in A that is determined by a control policy. To define a policy we cannot use the (unobservable) states

s_{0}, s_{1}, . . .

. Then, we introduce the observable histories

h_{0} : = (p, Y_{0}) \in H_{0}

and

h_{t} : = (s_{0}, y_{0}, a_{0}, . . ., y_{t - 1}, a_{t - 1}, y_{t}) \in H_{t} for all t \geq 1

and

H_{t} : = H_{t - 1} (A Y)

, if

t \geq 1

. Now, a policy is defined as a sequence

\{π_{k | m} (t)\}

, such that, for each t,

π_{k | m} (t)

is a stochastic kernel on A given

H_{t}

. The set of all policies is denoted by

Π

. A policy

π_{k | m} (t) \in Π

and an initial distribution

P (X_{0} = s_{0})

, also denoted as

P_{0}

, determine all possible realizations of the POMDP. A control strategy satisfies that

\sum_{k} π_{k | m} (t) = 1 and π_{k | m} (t) \geq 0, m = 1, . . ., M

.

A game consists of a set

N = \{1, . . ., n\}

of players (indexed by

l = \bar{1, n}

). We employ l in order to emphasize the l-th player’s variables and

- l

subsumes all the other players’ variables. The dynamics is described, as follows. At time

t = 0

, the initial state

s_{0}

has a given a priori distribution

P_{i}^{l}

, and the initial observation

y_{0}

is generated according to the initial observation kernel

Q_{0}^{l} (y_{0} | s_{0})

. If, at time t, the state of the system is

X_{t}

and the control

A_{t}^{l} \in A^{l}

is applied, then each of strategy is allowed to randomize, with distribution

π_{k | m}^{l} (t)

, over the pure action choices

A_{t}^{l} \in A^{l} (X_{t})

. These choices induce immediate utilities

V_{i j m k}^{l}

. The system tries to maximize the corresponding one-step utility. Next, the system moves to new state

X_{t + 1} = s_{j}

, according to the transition probabilities

P^{l} (π_{k | m}^{l} (t))

. Subsequently, the observation

Y_{t}

is generated by the observation kernel

Q^{l} (Y_{t} | X_{t})

. Based on the obtained utility, the systems adapt a mixed strategy computing

π_{k | m}^{l} (t + 1)

for the next selection of the control actions. For any stationary strategies

π_{k | m}^{l} (t) = π_{k | m}^{l}

, we have

P_{j}^{l} = \sum_{i = 1}^{N} (\sum_{k = 1}^{M} p_{j | i k}^{l} π_{k | m}^{l} q_{m | i}^{l}) P_{i}^{l}

. Then,

\begin{matrix} U^{l} (π) : = \sum_{m = 1}^{M} \sum_{i = 1}^{N} \sum_{k = 1}^{K} W_{i m k}^{l} \prod_{ι = 1}^{n} π_{k | m}^{ι} q_{m | i}^{ι} P_{i}^{ι}, \end{matrix}

where

W_{i m k}^{l} = \sum_{j}^{N} V_{i j m k}^{l} p_{j | i k}^{l} .

Each player maximizes the individual payoff function

U^{l} (π_{k | m})

, realizing the rule that is given by

(π_{k | m}^{l *}) \in A r g max_{π^{l} \in Π^{l}} \sum_{l \in N} U^{l} (π_{k | m}^{l *}) .

(2)

where for a given strategies

π_{k | m}^{*}

satisfy the Nash equilibrium [24,25] fulfilling, for all admissible

π_{k | m}

, the condition

U^{l} (π_{k | m}^{*}) \geq U^{l} (π_{k | m}^{l}, π_{k | m}^{- l *}) .

(3)

3. Main Relations

Following [21,26] and [27], let us introduce a matrix of elements

c = [c_{i m k}]

, as follows

c_{i m k}^{l} = d_{k m}^{l} P_{i}^{l} = π_{k | m}^{l} q_{m | i}^{l} P_{i}^{l} .

Let us define

Ξ_{i | m}^{l} = {(Q^{l})}^{- 1},

Q^{l} = [q_{m | i}^{l}]

. Formally, a mechanism is any function

μ_{k | m}

, such that given

c_{i m k}^{l}

represents the nonlinear programming problem

\begin{matrix} {\tilde{U}}^{l} (c) = \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} W_{i m k}^{l} \prod_{ι = 1}^{n} (μ_{k^{'} | m}^{ι} c_{i m k}^{ι}) \to max_{c^{l} \in C_{a d m}^{l}, μ_{k^{'} | m}^{l} \in M_{a d m}^{l}} \end{matrix}

and defining

μ_{k^{'} | m}^{l} = μ_{k^{'} | m} \forall l = 1, . . ., n

, we have that

\begin{matrix} {\tilde{U}}^{l} (c) = \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} W_{i m k}^{l} \prod_{ι = 1}^{n} (μ_{k^{'} | m} c_{i m k}^{ι}) \to max_{c^{l} \in C_{a d m}^{l}, μ_{k^{'} | m} \in M_{a d m}} \end{matrix}

(4)

such that

\begin{matrix} C_{a d m}^{l} : = \{c_{i m k}^{l} \geq 0 ∣ \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} c_{i m k}^{l} = 1, \sum_{m = 1}^{M} \sum_{k = 1}^{K} c_{i m k}^{l} = P_{i}^{l} > 0, \\ \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} [δ_{i j} - p_{j | i k}^{l}] c_{i m k}^{l} = 0, j = \bar{1, N}; \\ \sum_{h = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} [δ_{h m} - q_{m | i}^{l}] c_{i h k}^{l} = 0, m = \bar{1, M}, \\ \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} Ξ_{h | m}^{l} c_{i m k}^{l} \geq 0, h = \bar{1, N},\}, \end{matrix}

(5)

\begin{matrix} M_{a d m} = \{μ_{k^{'} | m} \geq 0 ∣ \sum_{k^{'} = 1}^{K} μ_{k^{'} | m} = 1, m = \bar{1, M}\} . \end{matrix}

(6)

Now, let us introduce the z-variable, as follows

\begin{matrix} z_{i m k k^{'}}^{l} : = μ_{k^{'} | m} c_{i m k}^{l} = μ_{k^{'} | m} π_{k | m}^{l} q_{m | i}^{l} P_{i}^{l} \end{matrix}

{\tilde{U}}^{l} (z) = \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} W_{i m k}^{l} \prod_{ι = 1}^{n} z_{i m k k^{'}}^{ι} \to max_{z \in Z_{a d m}}

(7)

where

\begin{matrix} Z_{a d m}^{l} : = \{z_{i m k k^{'}}^{l} \geq 0 ∣ \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} = 1, \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} = P_{i}^{l} > 0, \\ \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} [δ_{i j} - p_{j | i k}^{l}] \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} = 0, j = \bar{1, N}; \sum_{h = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} [δ_{h m} - q_{m | i}^{l}] \\ \sum_{k^{'} = 1}^{K} z_{i h k k^{'}}^{l} = 0, m = \bar{1, M}, \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{i = 1}^{N} Ξ_{h | m}^{l} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} \geq 0, h = \bar{1, N},\} \end{matrix}

(8)

Notice that by the relations

\begin{matrix} \sum_{k^{'} = 1}^{K} μ_{k^{'} | m} = 1, & \sum_{k = 1}^{K} π_{k | m}^{l} = 1, & \sum_{m = 1}^{M} q_{m | i}^{l} = 1, & \sum_{i = 1}^{N} P_{i}^{l} = 1 \end{matrix}

μ_{k^{'} | m} \geq 0

, it is easy to check that

z^{l} \in S^{l}

, where

\begin{matrix} S^{l} : = \{z_{i m k k^{'}}^{l} \geq 0 ∣ \sum_{i = 1}^{N} \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} = 1, \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l} = P_{i} > 0\} \end{matrix}

(9)

We define the solution of the problem (7) as

z^{l *}

. The next lemma clarifies how we may recover

μ_{k^{'} | m}^{*}

and

c_{i m k}^{l *}

.

Lemma 1.

Variables

μ_{k^{'} | m}^{*}

and

c_{i m k}^{l *}

can be recovered from

z_{i m k k^{'}}^{l *}

, as follows:

\begin{matrix} (a) & μ_{k^{'} | m}^{*} = \frac{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i m k k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i m k κ}^{l *}} & (b) & c_{α β γ}^{l *} = \frac{1}{K} \sum_{k^{'} = 1}^{K} \frac{z_{α β γ k^{'}}^{l *}}{μ_{k^{'} | β}^{*}} \end{matrix}

(10)

Proof.

See Appendix A. □

Corollary 1.

In addition,

c_{α β γ}^{l *} = \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l *} .

Now, in order to derive

π_{k | m}^{l *}

and

{\bar{P}}_{m}^{l *}

we have that

π_{k i m}^{l *} = \{\begin{matrix} \frac{\sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l *}}{\sum_{h = 1}^{K} z_{i m h k^{'}}^{l *}} & if & \sum_{h = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m h k^{'}}^{l *} > 0 \\ 0 & if & \sum_{h = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m h k^{'}}^{l *} = 0, \end{matrix}

(11)

P_{i}^{l *} = \sum_{m = 1}^{M} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l *}

(12)

Corollary 2.

The strategy

π_{k | m}^{l *}

constructed from

π_{k i m}^{l *}

(11), and the distribution

{\bar{P}}_{m}^{l *}

are given by

\begin{matrix} π_{k | m}^{l *} = \frac{1}{N} \sum_{i = 1}^{N} π_{k i m}^{l *}, & {\bar{P}}_{m}^{l *} = \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{k^{'} = 1}^{K} z_{i m k k^{'}}^{l *} . \end{matrix}

(13)

4. Ergodicity Conditions Expressed in $z$ Variables

We have derived the formulas, which maximize Equation (7) that is based on the variables

z_{i m k k^{'}}^{l *}

and the formulas to recover the policy

π_{k | m}^{l *}

, the mechanism

μ_{k | m}^{*}

and

{\bar{P}}_{m}^{l *}

. Accordingly, we focus our attention on the ergodicity restrictions.

Theorem 1.

The strategy

π_{k | m}^{l *}

and the mechanism

μ_{k^{'} | m}^{l *}

are in Nash equilibrium, where every agent maximizes its expected utility, for every

l = \bar{1, n}

,

{\tilde{U}}^{l} (μ_{k^{'} | m}^{l *} π_{k | m}^{l *} q_{m | i}^{l *} P_{i}^{l *}) \geq {\tilde{U}}^{l} (μ_{k^{'} | m}^{l} π_{k | m}^{l} q_{m | i}^{l} P_{i}^{l})

if the quantities of

z_{i m k k^{'}}^{l}

satisfies the following restrictions

\sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} [δ_{α j} - p_{j | α γ}^{l}] \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} = 0, j = \bar{1, N} .

(14)

\sum_{h = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} = 0, β = \bar{1, M} .

(15)

\sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} Ξ_{h | β}^{l} \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} \geq 0, h = \bar{1, N} .

(16)

Proof.

See Appendix B. □

5. Convergence Analysis

The Nash Equilibrium is a game theory concept that involves several players that determines the solution in a non-cooperative game in which each player lacks any incentive to only change his/her own strategy. A practical notion in deriving Nash equilibria is a player’s best reply. The best reply is the strategy (or set of strategies) that maximizes/minimizes his/her payoff taking other players’ strategies as given. Then, a player has not just one best-reply strategy, however he/she has a best-reply strategy for each arrangement of strategies for the other players. All of the Nash’s equilibrium can be approximated in a (best reply) sequential process. We want to compute the solution of the problem (7), defined as

z^{l *} = μ_{k^{'} | m}^{l *} π_{k | m}^{l *} q_{m | i}^{l} P_{i}^{l *}

, when considering the best reply approach. For solving the problem (7), let us consider a game whose strategies are denoted by

x^{l} \in X^{l}

, where X is a convex and compact set, where

x^{l} : = c o l (z_{i m k k^{'}}^{l})

and

X^{l} : = Z_{a d m}^{l}

. Let

x = {(x^{1}, . . ., x^{n})}^{⊤} \in X

be the joint strategy of the players and

x^{\hat{l}} : = {(x^{1}, . . ., x^{l - 1}, x^{l + 1}, . . ., x^{n})}^{⊤} \in X^{\hat{l}}

be a strategy of the rest of the players adjoint to

x^{l} \in X_{a d m}^{l}

. We consider a Nash equilibrium problem with n players and denote, by

x = (x^{l}, x^{\hat{l}}) \in R^{n}

, the vector representing the x-th player’s strategy

X_{a d m} = X_{a d m}^{l} \times X_{a d m}^{\hat{l}}

. The method of Lagrange multipliers is an optimization approach for finding the local minimum (maximum) of a function subject to equality constraints (

A_{e q}

), as given in Equation (8). Let us consider the Lagrange function that is given by

L (x, \hat{x} (x), λ) : = U (x, \hat{x} (x)) + λ^{⊺} A_{e q} x

where the Lagrange vector-multipliers

λ \in Λ

may have any sign. The optimization problem

L (x, \hat{x} (x), λ) \to min_{x \in X_{a d m}, \hat{x} (x) \in {\hat{X}}_{a d m}} max_{λ \in Λ}

for which we propose the following iteration algorithm:

1. Proximal prediction step:

\begin{matrix} {\bar{λ}}_{n} = arg min_{λ \geq 0} \{- \frac{1}{2} {∥ λ - λ}_{n} ∥^{2} + θ L (x_{n}, {\hat{x}}_{n}, λ)\} \\ {\bar{x}}_{n} = arg min_{x \in X} \{\frac{1}{2} {∥ x - x}_{n} ∥^{2} + θ L (x, {\hat{x}}_{n} {\bar{λ}}_{n})\} \\ {\bar{\hat{x}}}_{n} = arg min_{\hat{x} \in \hat{X}} \{\frac{1}{2} {∥ \hat{x} - \hat{x}}_{n} ∥^{2} + θ L (x_{n}, \hat{x}, {\bar{λ}}_{n})\} \end{matrix}

(17)

2. Gradient approximation step:

\begin{matrix} λ_{n + 1} = arg min_{λ \geq 0} \{- \frac{1}{2} {∥ λ - λ}_{n} ∥^{2} + θ L ({\bar{x}}_{n}, {\bar{\hat{x}}}_{n}, λ)\} \\ x_{n + 1} = arg min_{x \in X} \{\frac{1}{2} {∥ x - x}_{n} ∥^{2} + θ L (x, {\bar{\hat{x}}}_{n} {\bar{λ}}_{n})\} \\ {\hat{x}}_{n + 1} = arg min_{\hat{x} \in \hat{X}} \{\frac{1}{2} {∥ \hat{x} - \hat{x}}_{n} ∥^{2} + θ L ({\bar{x}}_{n}, \hat{x}, {\bar{λ}}_{n})\} \end{matrix}

(18)

Let us define the following variables

\tilde{x} : = (\begin{matrix} x \\ \hat{x} \end{matrix}) \in \tilde{X} : = X \times \hat{X}, \tilde{y} : = (λ) \in \tilde{Y} : = R^{+}

Subsequently, the Lagrange function can be expressed as

\tilde{L} (\tilde{x}, \tilde{y}) : = L (x, \hat{x}, λ)

The equilibrium point that satisfies Equations (17) and (18) can be represented by

\begin{matrix} {\tilde{x}}^{*} = arg min_{\tilde{x} \in \tilde{X}} \{\frac{1}{2} {∥ \tilde{x} - {\tilde{x}}^{*} ∥}^{2} + γ \tilde{L} (\tilde{x}, {\tilde{y}}^{*})\} \\ {\tilde{y}}^{*} = arg max_{\tilde{y} \in \tilde{Y}} \{\frac{1}{2} {∥ \tilde{y} - {\tilde{y}}^{*} ∥}^{2} + γ \tilde{L} ({\tilde{x}}^{*}, \tilde{y})\} \end{matrix}

In addition, let us introduce the following variables

\tilde{w} = (\begin{matrix} {\tilde{w}}_{1} \\ {\tilde{w}}_{2} \end{matrix}) \in \tilde{X} \times \tilde{Y}, \tilde{v} = (\begin{matrix} {\tilde{v}}_{1} \\ {\tilde{v}}_{2} \end{matrix}) \in \tilde{X} \times \tilde{Y}

and let us define the Lagrangian in terms of the previous variables

L (\tilde{w}, \tilde{v}) : = \tilde{L} ({\tilde{w}}_{1}, {\tilde{v}}_{2}) - \tilde{L} ({\tilde{v}}_{1}, {\tilde{w}}_{2})

For

{\tilde{w}}_{1}

=

\tilde{x},

{\tilde{w}}_{2}

= \tilde{y},

{\tilde{v}}_{1}

=

{\tilde{v}}_{1}^{*}

=

{\tilde{x}}^{*}

and

{\tilde{v}}_{2}

=

{\tilde{v}}_{2}^{*}

=

{\tilde{y}}^{*}

, we have

L (\tilde{w}, {\tilde{v}}^{*}) : = \tilde{L} (\tilde{x}, {\tilde{y}}^{*}) - {\tilde{L}}_{δ} ({\tilde{x}}^{*}, \tilde{y})

In these variables, the relation Equations (17) and (18) can be represented by

{\tilde{v}}^{*} = arg min_{\tilde{w} \in \tilde{X} \times \tilde{Y}} \{\frac{1}{2} {∥ \tilde{w} - \tilde{v}}^{*} ∥^{2} + γ L {(\tilde{w}, \tilde{v}}^{*})\}

(19)

We provide the convergence analysis of the sequence

{\{v_{n}\}}_{n \in N}

in the following theorem [28].

Theorem 2.

Let

L (w, v)

be a convex and differentiable function with the gradient satisfying the Lipschitz condition, i.e.,

∥\nabla L (v) - \nabla L (w)∥

≤

L ∥v - w∥

for all

v, w \in V_{a d m}

, where

V_{a d m}

is a convex and compact set. Let

{\{v_{n}\}}_{n \in N}

be a sequence defined by the local search and proximal iteration algorithm that is given by

\begin{matrix} {\bar{v}}_{n} = arg min_{v \in V_{a d m}} \{\frac{1}{2} {∥ {v - v}_{n} ∥}^{2} + θ_{n} L (w, v)\} \\ v_{n + 1} = arg min_{v \in V_{a d m}} \{\frac{1}{2} {∥ {v - v}_{n} ∥}^{2} + θ_{n} L ({\bar{w}}_{n} . {\bar{v}}_{n})\} \end{matrix}

(20)

then, the sequence

{\{v_{n}\}}_{n \in N}

converges to a Nash equilibrium point

v^{*} \in V_{a d m}

.

Proof.

See Appendix C. □

6. Political Numerical Example

The theory that is related to electoral competition originates in the original contributions of Hotelling [29] and Downs [30]. The proposed framework suggests a majority rule election, where political candidates compete for a position by simultaneously and independently proposing a model from a unidimensional policy space. It is common knowledge that the equilibrium of this model is fundamentally determined on the candidates’ incentives for running for such a position. This example considers a three-player game (

l = \bar{1, 3}

) that is engaged in a political contest, in which the player with the highest performance wins. A question arises as to what is the design of a mechanism to select a candidate? The goal of each candidate is to end up on top. The next time a political position rolls around, pay attention to the campaigning. Candidates who are behind will talk about not only what a good choice they are for such a position, but also what a bad choice the front-runner is.

The assumption that is involved in this example considers the incomplete information version of the game, in which candidates have the same relative weight to their preference strategies versus their desire to win the position. This case is relevant from a theoretical point of view, and it is empirically important. The dynamics are modeled when considering

N = 4

,

M = 4

, and

K = 2

with transition matrices for describing the evolution of the partially observed Markov game. The initial transition matrices are defined, as follows:

\begin{matrix} p_{j | i 1}^{1 *} = [\begin{matrix} 0.2307 & 0.2130 & 0.3120 & 0.2443 \\ 0.4989 & 0.1075 & 0.2250 & 0.1686 \\ 0.1857 & 0.0801 & 0.3938 & 0.3404 \\ 0.3235 & 0.2372 & 0.1065 & 0.3327 \end{matrix}] & p_{j | i 2}^{1 *} = [\begin{matrix} 0.4344 & 0.0728 & 0.3970 & 0.0959 \\ 0.5605 & 0.1505 & 0.1423 & 0.1467 \\ 0.1866 & 0.2866 & 0.3168 & 0.2100 \\ 0.1140 & 0.2091 & 0.2878 & 0.3892 \end{matrix}] \end{matrix}

\begin{matrix} p_{j | i 1}^{2 *} = [\begin{matrix} 0.1613 & 0.4207 & 0.1745 & 0.2435 \\ 0.3372 & 0.1160 & 0.2305 & 0.3163 \\ 0.2488 & 0.3219 & 0.0323 & 0.3970 \\ 0.3696 & 0.5068 & 0.0363 & 0.0873 \end{matrix}] & p_{j | i 2}^{2 *} = [\begin{matrix} 0.3587 & 0.1023 & 0.1044 & 0.4346 \\ 0.1796 & 0.3039 & 0.2303 & 0.2862 \\ 0.0115 & 0.3002 & 0.2537 & 0.4346 \\ 0.2102 & 0.3296 & 0.4079 & 0.0523 \end{matrix}] \end{matrix}

\begin{matrix} p_{j | i 1}^{3 *} = [\begin{matrix} 0.1354 & 0.3183 & 0.0631 & 0.4832 \\ 0.2442 & 0.2663 & 0.2572 & 0.2323 \\ 0.4768 & 0.2447 & 0.0144 & 0.2641 \\ 0.3380 & 0.1812 & 0.3172 & 0.1636 \end{matrix}] & p_{j | i 2}^{3 *} = [\begin{matrix} 0.1201 & 0.0840 & 0.4017 & 0.3942 \\ 0.3531 & 0.1164 & 0.2559 & 0.2746 \\ 0.2919 & 0.0985 & 0.3721 & 0.2375 \\ 0.5341 & 0.0798 & 0.0850 & 0.3010 \end{matrix}] \end{matrix}

As well as, the initial observation matrices are defined, as follows:

\begin{matrix} q_{m | i}^{1 *} = [\begin{matrix} 0.2423 & 0.3692 & 0.0931 & 0.3550 \\ 0.1881 & 0.0397 & 0.2043 & 0.2126 \\ 0.2848 & 0.3933 & 0.1156 & 0.3773 \\ 0.2847 & 0.1978 & 0.5870 & 0.0552 \end{matrix}] & q_{m | i}^{2 *} = [\begin{matrix} 0.1749 & 0.4870 & 0.2563 & 0.4140 \\ 0.2719 & 0.0190 & 0.2748 & 0.4030 \\ 0.2814 & 0.4030 & 0.2405 & 0.1692 \\ 0.2718 & 0.0910 & 0.2284 & 0.0137 \end{matrix}] \end{matrix}

\begin{matrix} q_{m | i}^{3 *} = [\begin{matrix} 0.1734 & 0.0374 & 0.0430 & 0.3581 \\ 0.4350 & 0.2568 & 0.4207 & 0.0150 \\ 0.2747 & 0.3091 & 0.3390 & 0.3458 \\ 0.1170 & 0.3967 & 0.1973 & 0.2811 \end{matrix}] \end{matrix}

Fixing

θ = 0.055

in the extraproximal method that is given in Equations (17) and (18), we have that the Nash equilibrium results from computing the strategies and the mechanism design applying Equations (10) and (13), which are given, as follows:

\begin{matrix} π_{k | m}^{1 *} = [\begin{matrix} 0.4633 & 0.5367 \\ 0.4543 & 0.5457 \\ 0.4850 & 0.5150 \\ 0.3859 & 0.6141 \end{matrix}] & π_{k | m}^{2 *} = [\begin{matrix} 0.3092 & 0.6908 \\ 0.3707 & 0.6293 \\ 0.3879 & 0.6121 \\ 0.3863 & 0.6137 \end{matrix}] \end{matrix}

\begin{matrix} π_{k | m}^{3 *} = [\begin{matrix} 0.3937 & 0.6063 \\ 0.4165 & 0.5835 \\ 0.3850 & 0.6150 \\ 0.4990 & 0.5010 \end{matrix}] & μ_{k | m}^{*} = [\begin{matrix} 0.2929 & 0.7071 \\ 0.3321 & 0.6679 \\ 0.2410 & 0.7590 \\ 0.2187 & 0.7813 \end{matrix}] \end{matrix}

We present a full characterization of the Nash equilibrium for the case of partially observable Markov games. Figure 1, Figure 2 and Figure 3 show the convergence of the strategies

z_{i m k k^{'}}^{l *}

.

7. Conclusions

This paper contributed to the literature on mechanism design for Markov games with incomplete state information (partially observable). We suggested an analytical method for the design of a mechanism. The main result of this work is based on the introduction of the new variable z, which makes the game problem computationally tractable and allow for obtaining the mechanism solution

μ

and the strategies

π

for all of the players in the game. The variable z allows for the introduction of new natural additional linear restrictions for computing the Nash equilibrium of the game. A no feasible solution can be detected with a simple test on the variable z, i.e., it is possible to detect unusual conditions in the solver of the game given the information available for the simplex. A major advantage of introducing this variable relies on the fact it can be efficiently implemented for real settings, which is consistent with the engineering approach for designing economic mechanisms or incentives, toward desired objectives, where players act rationally. We applied these results to a numerical example that is related to political promotion.

In relation to future work, there are several challenges that are left to address. One interesting technical challenge is that of addressing extremum seeking in the context of mechanism design [31,32,33]. Another interesting challenge would be to consider the observer design approach in order to extend the mechanism design theory [23].

Author Contributions

Authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof Lemma 3.1

Proof. Since

μ_{k^{'} | m}^{*}

does not depend on indices

l, i, k^{'}

it may be obtained from Equations (8) and (9):

\begin{matrix} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i m k k^{'}}^{l *} : = μ_{k^{'} | m}^{*} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} c_{i m k}^{l *} = μ_{k^{'} | m}^{*} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} (π_{k | m}^{l *} q_{m | i}^{l} P_{i}^{l *}) = \\ μ_{k^{'} | m}^{*} \sum_{l = 1}^{n} \sum_{i = 1}^{N} (q_{m | i}^{l} P_{i}^{l *}) \end{matrix}

Hence,

\begin{matrix} μ_{k^{'} | m}^{*} = \frac{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i m k k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} (q_{m | i}^{l} P_{i}^{l *})} = \frac{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i m k k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i m k κ}^{l *}} . \end{matrix}

Let us define

z_{α β γ k^{'}}^{l *}

as follows

\begin{matrix} c_{α β γ}^{l *} = \frac{1}{K} \sum_{k^{'} = 1}^{K} \frac{z_{α β γ k^{'}}^{l}}{μ_{k^{'} | β}^{*}} = \frac{1}{K} \sum_{k^{'} = 1}^{K} (\frac{z_{α β γ k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{l *}} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{l *}) \end{matrix}

(A1)

To verify that the definitions of

μ_{k^{'} | m}^{*}

and

c_{i m k}^{l *}

(10) are correct we need to check the fulfilling of Equations (5) and (6), i.e.,

μ_{k^{'} | m}^{*} \in M_{a d m}

and

c_{i m k}^{l *} \in C_{a d m}^{l}

.

(a) As for variables

μ_{k^{'} | m}^{*}

, these properties follow directly:

μ_{k^{'} | m}^{*} = \frac{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i m k k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i m k κ}^{l *}} \geq 0

(A2)

since

z_{i m k k^{'}}^{l *} \geq 0

. Summing (A2) by

k^{'}

directly leads to the property

\sum_{k^{'} = 1}^{K} μ_{k^{'} | m}^{*} = 1

.

(b) To prove that

c_{i m k}^{l *} \in C_{a d m}^{l}

defined by (A1) notice that

\begin{matrix} \sum_{γ = 1}^{K} c_{α β γ}^{l *} = \frac{1}{K} \sum_{γ = 1}^{K} \sum_{k^{'} = 1}^{K} (\frac{z_{α β γ k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{l *}} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{l *}) = \\ \frac{1}{K} \sum_{k^{'} = 1}^{K} (\frac{\sum_{γ = 1}^{K} z_{α β γ k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{l *}} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{l *}) \end{matrix}

and

\begin{matrix} \sum_{β = 1}^{M} \sum_{γ = 1}^{K} c_{α β γ}^{l *} = \frac{1}{K} \sum_{k^{'} = 1}^{K} \sum_{β = 1}^{M} (\frac{\sum_{γ = 1}^{K} z_{α β γ k^{'}}^{l *}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{l *}} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{l *}), \end{matrix}

which leads to following relation

\begin{matrix} \sum_{α = 1}^{N} \sum_{β = 1}^{M} \sum_{γ = 1}^{K} c_{α β γ}^{l *} = \frac{1}{K} \sum_{k^{'} = 1}^{K} \sum_{β = 1}^{M} \sum_{α = 1}^{N} (\frac{\sum_{γ = 1}^{K} z_{α β γ k^{'}}^{l *}}{\sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{h *}} \sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{h *}) = \\ \frac{1}{K} \sum_{β = 1}^{M} \sum_{k^{'} = 1}^{K} (\frac{\sum_{α = 1}^{N} \sum_{γ = 1}^{K} z_{α β γ k^{'}}^{l *}}{\sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{h *}} \sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{h *}) = \\ \frac{1}{K} \sum_{β = 1}^{M} [(\sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i β k κ}^{h *}) \sum_{k^{'} = 1}^{K} \frac{\sum_{α = 1}^{N} \sum_{γ = 1}^{K} z_{α β γ k^{'}}^{l *}}{\sum_{h = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} z_{i β k k^{'}}^{h *}}] = \sum_{β = 1}^{M} q_{i | β}^{l} P_{i}^{l} \frac{\sum_{α = 1}^{N} q_{α | β}^{h} P_{α}^{l}}{\sum_{h = 1}^{M} \sum_{i = 1}^{N} q_{i | β}^{l} P_{i}^{l}} = \\ \sum_{α = 1}^{N} [\sum_{β = 1}^{M} q_{α | β}^{h}] P_{α}^{h} = \sum_{α = 1}^{N} P_{α}^{h} = 1 \end{matrix}

Then,

z_{α β γ k^{'}}^{l *} \in S

, see Equation (9). The Lemma is proved.

Appendix B. Proof of Theorem 4.1

Proof. This means that new variables

z_{α β θ k^{'}}^{l}

should satisfy the following linear ergodicity constraints:

\begin{matrix} P_{j}^{l} = \sum_{α = 1}^{N} (\sum_{β = 1}^{M} \sum_{γ = 1}^{K} p_{j | α γ}^{l} μ_{k^{'} | β} π_{γ | β}^{l} q_{β | α}^{l}) P_{α}^{l} = \sum_{α = 1}^{N} (\sum_{β = 1}^{M} \sum_{γ = 1}^{K} p_{j | α γ}^{l} z_{α β γ k^{'}}^{l}), \end{matrix}

\begin{matrix} \sum_{j = 1}^{N} (\sum_{β = 1}^{M} \sum_{γ = 1}^{K} p_{j | α γ}^{l} z_{j β γ k^{'}}^{l}) = \sum_{α = 1}^{N} (\sum_{β = 1}^{M} \sum_{γ = 1}^{K} p_{j | α γ}^{l} z_{α β γ k^{'}}^{l}), \end{matrix}

which implies

\sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} [δ_{α j} - p_{j | α γ}^{l}] \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} = 0, j = \bar{1, N}

Then,

\begin{matrix} z \in E : = \{z_{α β γ k^{'}}^{l} ∣ \sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} [δ_{α j} - p_{j | α γ}^{l}] \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} = 0, j = \bar{1, N},\} \end{matrix}

(A3)

The Equation (15) is fulfilled automatically since

\begin{matrix} \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i h k κ}^{l} \sum_{k^{'} = 1}^{K} \frac{μ_{k^{'} | h} \sum_{γ = 1}^{K} π_{γ | h}^{l} q_{α | h}^{l} P_{α}^{l}}{μ_{k^{'} | h} \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} π_{k | h}^{l} q_{i | h}^{l} P_{i}^{l}} = \\ \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i h k κ}^{l} \sum_{k^{'} = 1}^{K} \frac{(\sum_{γ = 1}^{K} π_{γ | h}^{l}) q_{α | h}^{l} P_{α}^{l}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} (\sum_{k = 1}^{K} π_{k | h}^{l}) q_{i | h}^{l} P_{i}^{l}} = \\ K \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{l = 1}^{n} \sum_{i = 1}^{N} \sum_{k = 1}^{K} \sum_{κ = 1}^{K} z_{i h k κ}^{l} \frac{q_{α | h}^{l} P_{α}^{l}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} q_{i | h}^{l} P_{i}^{l}} = \\ K \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{l = 1}^{n} \sum_{i = 1}^{N} (\sum_{k = 1}^{K} (\sum_{κ = 1}^{K} μ_{κ | h}) π_{k | h}^{l}) q_{i | h}^{l} P_{i}^{l} \frac{q_{α | h}^{l} P_{α}^{l}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} q_{i | h}^{l} P_{i}^{l}} \\ = K \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] \sum_{l = 1}^{n} \sum_{i = 1}^{N} q_{i | h}^{l} P_{i}^{l} \frac{q_{α | h}^{l} P_{α}^{l}}{\sum_{l = 1}^{n} \sum_{i = 1}^{N} q_{i | h}^{l} P_{i}^{l}} = K \sum_{h = 1}^{M} \sum_{α = 1}^{N} [δ_{h β} - q_{α | β}^{l}] (q_{α | h}^{l} P_{α}^{l}) = \\ K [\sum_{h = 1}^{M} \sum_{α = 1}^{N} δ_{h β} (q_{α | h}^{l} P_{α}^{l}) - \sum_{h = 1}^{M} \sum_{α = 1}^{N} q_{α | β}^{l} (q_{α | h}^{l} P_{α}^{l})] = \\ K [\sum_{h = 1}^{M} \sum_{α = 1}^{N} δ_{h β} q_{α | h}^{l} P_{α}^{l} - \sum_{α = 1}^{N} q_{α | β}^{l} (\sum_{h = 1}^{M} q_{α | h}^{l}) P_{α}^{l}] = K [\sum_{α = 1}^{N} (q_{α | β}^{l} P_{α}^{l}) - \sum_{α = 1}^{N} q_{α | β}^{l} P_{α}^{l}] = 0 \end{matrix}

Now, we prove the relation given in Equation (16) as follows:

\begin{matrix} \sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} Ξ_{h | β}^{l} \sum_{k^{'} = 1}^{K} z_{α β γ k^{'}}^{l} = \sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} Ξ_{h | β}^{l} c_{α β γ}^{l} = \sum_{β = 1}^{M} \sum_{γ = 1}^{K} \sum_{α = 1}^{N} Ξ_{h | β}^{l} π_{γ | β}^{l} q_{α | β}^{l} P_{α}^{l} = \\ \sum_{α = 1}^{N} P_{α}^{l} \sum_{β = 1}^{M} q_{α | β}^{l} Ξ_{h | β}^{l} \sum_{γ = 1}^{K} π_{γ | β}^{l} = \sum_{α = 1}^{N} P_{α}^{l} δ_{α, h} = P_{h}^{l} \geq 0 . \end{matrix}

The Theorem is proved.

Appendix C. Proof of Theorem 5.1

Let us define

ω_{0} = \sum_{l = 1}^{∣ N ∣} ω_{0, l} \leq ∣ N ∣ max_{l = \bar{1, ∣ N ∣}} ω_{0, l} = ∣ N ∣ ω_{0}^{+}

and

\tilde{w} = {\tilde{v}}_{n + 1}

, then

\frac{1}{2} {∥ \hat{v}}_{n} {- \tilde{v}}_{n} ∥^{2} {+ θ L (\hat{v}}_{n} {, \tilde{v}}_{n}) \leq \frac{1}{2} {∥ \tilde{v}}_{n + 1} {- \tilde{v}}_{n} ∥^{2} {+ θ L (\tilde{v}}_{n + 1} {, \tilde{v}}_{n}) - \frac{1}{2} {∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2}

(A4)

Let also

\tilde{w} = {\tilde{v}}^{*} \in {\tilde{X}}^{*} \times {\tilde{Y}}^{*}

then

\frac{1}{2} {∥ \tilde{v}}_{n + 1} {- \tilde{v}}_{n} ∥^{2} {+ θ L (\tilde{v}}_{n + 1} {, \hat{v}}_{n}) \leq \frac{1}{2} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n} ∥^{2} {+ θ L (\tilde{v}}^{*} {, \hat{v}}_{n}) - \frac{1}{2} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2}

(A5)

Adding Equations (A4) and (A5) and multiplying by two yields

\begin{matrix} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} {+ ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2} {+ ∥ \hat{v}}_{n} {- \tilde{v}}_{n} ∥^{2} {- 2 θ L (\tilde{v}}^{*} {, \hat{v}}_{n}) + {2 θ [L (\tilde{v}}_{n + 1} {, \hat{v}}_{n}) {+ L (\hat{v}}_{n} {, \tilde{v}}_{n}) - \\ L ({\tilde{v}}_{n + 1} {, \tilde{v}}_{n})] \leq {∥ \tilde{v}}^{*} {- \tilde{v}}_{n} ∥^{2} \end{matrix}

(A6)

Adding and subtracting L

({\hat{v}}_{n}, {\hat{v}}_{n})

in Equation (A6) we have

\begin{matrix} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} {+ ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2} {+ ∥ \hat{v}}_{n} {- \tilde{v}}_{n} ∥^{2} + 2 θ [L ({\hat{v}}_{n}, {\hat{v}}_{n}) - L ({\tilde{v}}^{*}, {\hat{v}}_{n})] + 2 θ [L ({\tilde{v}}_{n + 1}, {\hat{v}}_{n}) - \\ L ({\hat{v}}_{n}, {\hat{v}}_{n}) + L {(\hat{v}}_{n} {, \tilde{v}}_{n} {) - L (\tilde{v}}_{n + 1} {, \tilde{v}}_{n})] \leq {∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2} \end{matrix}

(A7)

Let

\tilde{w} + h = {\tilde{v}}_{n + 1}, \tilde{w} = {\hat{v}}_{n}, \tilde{v} + k = {\tilde{v}}_{n}, \tilde{v} = {\hat{v}}_{n}

having h=

{\tilde{v}}_{n + 1} - {\hat{v}}_{n}

and k=

{\tilde{v}}_{n} - {\hat{v}}_{n}

. The inequality (A7) becomes

\begin{matrix} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} {+ ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2} {+ ∥ \hat{v}}_{n} {- \tilde{v}}_{n} ∥^{2} + 2 θ [L ({\hat{v}}_{n}, {\hat{v}}_{n}) - L ({\tilde{v}}^{*}, {\hat{v}}_{n})] - \\ {2 θ ω ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} {∥ ∥ \tilde{v}}_{n} {- \hat{v}}_{n} {∥ \leq ∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2} \end{matrix}

Using

L {({\hat{v}}_{n}, {\hat{v}}_{n}) - L ({\tilde{v}}^{*}, {\hat{v}}_{n}) \geq ∥ \hat{v}}_{n} {- \tilde{v}}^{*} ∥^{2}

we obtain

{∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} {+ ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2} + 2 θ

∥ {\hat{v}}_{n} - {\tilde{v}}^{*} ∥^{2} + ({1 - 2 θ}^{2} ω^{2}) {∥ \tilde{v}}_{n} {- \hat{v}}_{n} ∥^{2} {\leq ∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2}

. Now, by the fact

\begin{matrix} {2 〈 a - c, c - b 〉 = ∥ a - b ∥}^{2} {- ∥ a - c ∥}^{2} {- ∥ c - b ∥}^{2} \end{matrix}

and replacing

a = {\hat{v}}_{n}

,

b = {\tilde{v}}^{*}

and

c = {\tilde{v}}_{n},

to the left-hand side of the last inequality we have

\begin{matrix} {∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} {+ ∥ \tilde{v}}_{n + 1} {- \hat{v}}_{n} ∥^{2} + ({1 - 2 θ}^{2} ω^{2}) {∥ \tilde{v}}_{n} {- \hat{v}}_{n} ∥^{2} + 2 θ [2 〈 {\hat{v}}_{n} - {\tilde{v}}_{n}, {\tilde{v}}_{n} - {\tilde{v}}^{*} 〉 + {∥ {\tilde{v}}_{n} {- \hat{v}}_{n} ∥}^{2} + \\ ∥ {\tilde{v}}_{n} - {\tilde{v}}^{*} ∥^{2} \leq {∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2} \end{matrix}

Computing the square form of the third and fourth terms we have that

∥ {\tilde{v}}^{*} - {\tilde{v}}_{n + 1} ∥^{2} \leq (1 - 2 θ + \frac{{(2 θ)}^{2}}{{1 + 2 θ - 2 θ}^{2} ω^{2}}) {∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2}

Let

ξ = 1 - 2 θ + \frac{{(2 θ)}^{2}}{{1 + 2 θ - 2 θ}^{2} ω^{2}} < 1 .

then iterating over the previous inequality, we have

{∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} \leq q {∥ {\tilde{v}}^{*} - {\tilde{v}}_{n} ∥}^{2} \leq . . . \leq e^{n + 1 ln ξ} {∥ {\tilde{v}}^{*} - {\tilde{v}}_{0} ∥}^{2}

(A8)

By Equation (A8) we have that

{∥ \tilde{v}}^{*} {- \tilde{v}}_{n + 1} ∥^{2} \underset{n \to \infty}{\to} 0

. Taking into account that

\tilde{v}

is a sequence that is bounded, we have that there is a point

{\tilde{v}}^{'}

such that any subsequence

{\tilde{v}}_{n_{i}}

fulfills that

{\tilde{v}}_{n_{i}} \underset{n_{i} \to \infty}{\to} {\tilde{v}}^{'}

(Weierstrass theorem). Now, we have that

{∥{\tilde{v}}_{n_{i}} - {\tilde{v}}_{n_{i} + 1}∥}^{2} \to 0

. Leting

n = n_{i}

in Equation (19) and taking the limit when

n_{i} \to \infty

we obtain

{\tilde{v}}^{'} = arg min_{\tilde{w} \in \tilde{X} \times \tilde{Y}} \{\frac{1}{2} {∥ \tilde{w} - \tilde{v}}^{'} ∥^{2} + θ L {(\tilde{w}, \tilde{v}}^{'})\}

As a result, we have that

{\tilde{v}}^{'} = {\tilde{v}}^{*}

. Provided that

{∥{\tilde{v}}_{n} - {\tilde{v}}^{*}∥}^{2}

is monotonically decreasing then there exists a unique limit point (equilibrium point). As a result, the sequence

{\tilde{v}}_{n}

satisfies that

{\tilde{v}}_{n} \underset{n \to \infty}{\to} {\tilde{v}}^{*}

with a rate given by

e^{n + 1 ln ξ}

.

References

Hurwicz, L. Optimality and informational efficiency in resource allocation processes. In Mathematical Methods in the Social Sciences: Proceedings of the First Stanford Symposium; Arrow, K.J., Karlin, S., Suppes, P., Eds.; Stanford University Press: Palo Alto, CA, USA, 1960; pp. 27–46. [Google Scholar]
Nobel. The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2007: Scientific Background; Technical Report; The Nobel Foundation: Stockholm, Sweden, 2007. [Google Scholar]
Myerson, R.B. Allocation, Information and Markets; Chapter Mechanism Design; The New Palgrave; Palgrave Macmillan: London, UK, 1989; pp. 191–206. [Google Scholar]
Vickrey, W. Counterspeculation, auctions, and competitive sealed tenders. J. Financ. 1961, 16, 8–37. [Google Scholar] [CrossRef]
Bergemann, D.; Välimäki, J. Dynamic mechanism design: An introduction. J. Econ. Perspect 2019, 52, 235–274. [Google Scholar] [CrossRef] [Green Version]
Clarke, E. Multi-part pricing of public goods. Public Choice 1971, 11, 17–23. [Google Scholar] [CrossRef]
Groves, T. Incentives in teams. Econometrica 1973, 41, 617–631. [Google Scholar] [CrossRef]
Harsanyi, J.C. Games with incomplete information played by bayesian players. part i: The basic model. Manag. Sci. 1967, 14, 159–182. [Google Scholar] [CrossRef]
Arrow, K. Economics and Human Welfare; Chapter The Property Rights Doctrine and Demand Revelation under Incomplete Information; Academic Press: New York, NY, USA, 1979; pp. 23–39. [Google Scholar]
D’Aspremont, C.; Gerard-Varet, L. Incentives and incomplete information. J. Public Econ. 1979, 11, 25–45. [Google Scholar]
Saari, D.G. On the types of information and mechanism design. J. Comput. Appl. Math. 1988, 22, 231–242. [Google Scholar] [CrossRef] [Green Version]
Rogerson, W. Contractual solutions to the hold-up problem. Rev. Econ. Stud. 1992, 59, 777–793. [Google Scholar] [CrossRef] [Green Version]
Mailath, G.; Postlewaite, A. Asymmetric information bargaining problems with many agents. Rev. Econ. Stud. 1990, 57, 351–360. [Google Scholar] [CrossRef]
Miyakawa, T. Non-Cooperative Foundation of Nash Bargaining Solution under Incomplete Informational; Osaka University of Economics Working Paper Serier No. 2012-2.; Osaka University: Suita, Japan, 2012. [Google Scholar]
Athey, S.; Bagwell, K. Collusion with persistent cost shocks. Econometrica 2008, 76, 493–540. [Google Scholar] [CrossRef] [Green Version]
Hörner, J.; Takahashi, S.; Vieille, N. Truthful equilibria in dynamic bayesian games. Econometrica 2015, 83, 1795–1848. [Google Scholar] [CrossRef] [Green Version]
Clempner, J.B.; Poznyak, A.S. A nucleus for bayesian partially observable markov games: Joint observer and mechanism design. Eng. Appl. Artif. Intell. 2020, 95, 103876. [Google Scholar] [CrossRef]
Rahman, D. The power of communication. Am. Econ. Rev. 2014, 104, 3737–3751. [Google Scholar] [CrossRef]
Bernheim, B.; Madsen, E. Price cutting and business stealing in imperfect cartels. Am. Econ. Rev. 2017, 107, 387–424. [Google Scholar] [CrossRef] [Green Version]
Escobar, J.F.; Llanes, G. Cooperation dynamics in repeated games of adverse selection. J. Econ. Theory 2018, 176, 408–443. [Google Scholar] [CrossRef]
Poznyak, A.S.; Najim, K.; Gómez-Ramírez, E. Self-Learning Control of Finite Markov Chains; Marcel Dekker, Inc.: New York, NY, USA, 2000. [Google Scholar]
Clempner, J.B.; Poznyak, A.S. Simple computing of the customer lifetime value: A fixed local-optimal policy approach. J. Syst. Sci. Syst. Eng. 2014, 23, 439–459. [Google Scholar] [CrossRef]
Clempner, J.B.; Poznyak, A.S. Observer and control design in partially observable finite markov chains. Automatica 2019, 110, 108587. [Google Scholar] [CrossRef]
Clempner, J.B. On lyapunov game theory equilibrium: Static and dynamic approaches. Int. Game Theory Rev. 2018, 20, 1750033. [Google Scholar] [CrossRef] [Green Version]
Clempner, J.B.; Poznyak, A.S. Finding the strong nash equilibrium: Computation, existence and characterization for markov games. J. Optim. Theory Appl. 2020, 186, 1029–1052. [Google Scholar] [CrossRef]
Sragovich, V.G. Mathematical Theory of Adaptive Control; World Scientific Publishing Company: Singapore, 2006. [Google Scholar]
Asiain, E.; Clempner, J.B.; Poznyak, A.S. A reinforcement learning approach for solving the mean variance customer portfolio for partially observable models. Int. J. Artif. Intell. Tools 2018, 27, 1850034-1–1850034-30. [Google Scholar] [CrossRef]
Trejo, K.K.; Clempner, J.B.; Poznyak, A.S. Computing the lp-strong nash equilibrium for markov chains games. Appl. Math. Model. 2017, 41, 399–418. [Google Scholar] [CrossRef]
Hotelling, H. Stability in competition. Econ. J. 1929, 39, 41–57. [Google Scholar] [CrossRef]
Downs, A. An Economic Theory of Democracy; Harper & Brothers: New York, NY, USA, 1957. [Google Scholar]
Solis, C.; Clempner, J.B.; Poznyak, A.S. Robust extremum seeking for a second order uncertain plant using a sliding mode controller. Int. J. Appl. Math. Comput. Sci. 2019, 29, 703–712. [Google Scholar] [CrossRef] [Green Version]
Solis, C.; Clempner, J.B.; Poznyak, A.S. Robust integral sliding mode controller for optimisation of measurable cost functions with constraints. Int. J. Control 2019, 1–13, To be published. [Google Scholar] [CrossRef]
Solis, C.; Clempner, J.B.; Poznyak, A.S. Continuous-time gradient-like descent algorithm for constrained convex unknown functions: Penalty method application. J. Comput. Appl. Math. 2019, 355, 268–282. [Google Scholar] [CrossRef]

Figure 1. Convergence of strategies

z_{i m k k^{'}}

for player 1.

Figure 1. Convergence of strategies

z_{i m k k^{'}}

for player 1.

Figure 2. Convergence of strategies

z_{i m k k^{'}}

for player 2.

Figure 2. Convergence of strategies

z_{i m k k^{'}}

for player 2.

Figure 3. Convergence of strategies

z_{i m k k^{'}}

for player 3.

Figure 3. Convergence of strategies

z_{i m k k^{'}}

for player 3.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Clempner, J.B.; Poznyak, A.S. Analytical Method for Mechanism Design in Partially Observable Markov Games. Mathematics 2021, 9, 321. https://doi.org/10.3390/math9040321

AMA Style

Clempner JB, Poznyak AS. Analytical Method for Mechanism Design in Partially Observable Markov Games. Mathematics. 2021; 9(4):321. https://doi.org/10.3390/math9040321

Chicago/Turabian Style

Clempner, Julio B., and Alexander S. Poznyak. 2021. "Analytical Method for Mechanism Design in Partially Observable Markov Games" Mathematics 9, no. 4: 321. https://doi.org/10.3390/math9040321

APA Style

Clempner, J. B., & Poznyak, A. S. (2021). Analytical Method for Mechanism Design in Partially Observable Markov Games. Mathematics, 9(4), 321. https://doi.org/10.3390/math9040321

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Analytical Method for Mechanism Design in Partially Observable Markov Games

Abstract

1. Introduction

1.1. Brief Review

1.2. Main Results

1.3. Organization of the Paper

2. Markov Games with Incomplete Information

3. Main Relations

4. Ergodicity Conditions Expressed in $z$ Variables

5. Convergence Analysis

6. Political Numerical Example

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Proof Lemma 3.1

Appendix B. Proof of Theorem 4.1

Appendix C. Proof of Theorem 5.1

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Analytical Method for Mechanism Design in Partially Observable Markov Games

Abstract

1. Introduction

1.1. Brief Review

1.2. Main Results

1.3. Organization of the Paper

2. Markov Games with Incomplete Information

3. Main Relations

4. Ergodicity Conditions Expressed in z Variables

5. Convergence Analysis

6. Political Numerical Example

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Proof Lemma 3.1

Appendix B. Proof of Theorem 4.1

Appendix C. Proof of Theorem 5.1

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

4. Ergodicity Conditions Expressed in $z$ Variables