Article

Analytical Method for Mechanism Design in Partially Observable Markov Games

by Julio B. Clempner 1,* and Alexander S. Poznyak 2

1 School of Physics and Mathematics (Escuela Superior de Física y Matemáticas), National Polytechnic Institute (Instituto Politécnico Nacional), Edificio 9 U.P. Adolfo López Mateos, Col. San Pedro Zacatenco, 07730 Mexico City, Mexico
2 Department of Automatic Control, Center for Research and Advanced Studies, Av. IPN 2508, Col. San Pedro Zacatenco, 07360 Mexico City, Mexico
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(4), 321; https://doi.org/10.3390/math9040321
Submission received: 29 November 2020 / Revised: 25 December 2020 / Accepted: 11 January 2021 / Published: 6 February 2021
(This article belongs to the Section E1: Mathematics and Computer Science)

Abstract:

A theme that has become common knowledge in the literature is the difficulty of developing a mechanism that is compatible with individual incentives and simultaneously results in efficient decisions maximizing the total reward. In this paper, we suggest an analytical method for computing a mechanism design. The problem is explored in a framework in which the players follow an average utility in a non-cooperative Markov game with incomplete state information. All of the Nash equilibria are approximated in a sequential process. We describe a method for the derivation of the players' equilibrium that instruments the design of the mechanism, and we show the convergence and the rate of convergence of the proposed method. For computing the mechanism, we consider an extension of the Markov model in which a new variable is introduced representing the product of the mechanism design and the joint strategy. We derive formulas to recover the variables of interest: the mechanism, the strategies, and the distribution vectors. The computation of the mechanism design and of the equilibrium strategies differs from that in the previous literature. A numerical example illustrates the usefulness and effectiveness of the proposed method.

1. Introduction

1.1. Brief Review

Hurwicz [1] published his seminal work on mechanism design, which has emerged as a practical framework for tackling game-theory problems from an engineering viewpoint, considering players that interact rationally [2,3,4]; for a survey, see [5]. The theory is based on games with incomplete information, modeling mechanisms (implementing a social choice function) that are compatible with individual incentives and result in efficient decisions maximizing the total reward. The primary aim is to establish games with independent private values and quasilinear payoffs [6,7], in which players receive messages containing payoff-relevant information [8]. As the game evolves, players commit to a mechanism that produces a result as a function of the possibly untruthfully reported types. It should be pointed out that the mechanism is unknown. The mechanism designer determines a social choice function, which maps the true type profile directly to the alternatives, whereas a mechanism maps the reported type profile to the alternatives. The main task in computational mechanism design is to find a mechanism that both preserves the original game-theoretic features and is computationally "efficient" and "feasible".
This approach makes it possible to manage the restrictions and to control the information of the players engaged in a game. From this perspective, Arrow [9] presented a claim-revelation framework that achieves efficiency and avoids wasting resources on incentive payments. d'Aspremont and Gerard-Varet [10] suggested two separate methods for designing a mechanism with incomplete information: in the first, the players' beliefs are not considered, while in the second they are. Saari [11] presented a mechanism design that involves types of information. Rogerson [12] proposed a general treatment of the hold-up problem, in which several players make relation-specific investments and then decide on some cooperative action, proving that first-best solutions exist under a variety of assumptions regarding the nature of the information asymmetries. Mailath and Postlewaite [13] established an approach for bargaining problems with asymmetric information and multiple agents. Miyakawa [14] provided a necessary and sufficient condition for the existence of a stationary perfect Bayesian equilibrium. Athey and Bagwell [15] and Hörner et al. [16] obtained relevant results on equilibria in repeated games with communication. Clempner and Poznyak [17] suggested a Bayesian partially observable Markov game model supported by an AI approach. Further approaches are presented in the literature; see, for instance, [18,19,20].

1.2. Main Results

We contribute to this literature by proposing original results: we present an analytical method for developing a mechanism under incomplete state information whose preferences evolve following a Markov process, and we characterize approximate equilibrium behavior in game-theoretic models [17]. The foundation of the proposed method is the derivation of formulas for computing the mechanism $\mu$ and, given the mechanism, the equilibrium strategies. The derivation of these formulas relies on a direct mechanism design. We propose an extension of the Markov model, introducing a new variable $z$ that represents the product of the mechanism $\mu$ and the joint strategy $c$. The joint strategy $c$ is, in turn, defined by the product of the strategy $\pi$, the observation kernel $q$, and the distribution vector $P$. We derive formulas to recover the variables of interest: the mechanism $\mu$, the strategies $\pi$, and the distribution vectors $P$. We describe a method for the derivation of the players' equilibrium that instruments the design of the mechanism, and we also show the convergence of the proposed method.

1.3. Organization of the Paper

For ease of exposition, we describe the Markov game model in the next section. In Section 3, we introduce the variables $c$ and $z$ and derive the main formulas. The ergodicity conditions expressed in the $z$ variables are proven in Section 4. The convergence to a Nash equilibrium is presented in Section 5. Section 6 presents a numerical example, and Section 7 concludes with some remarks.

2. Markov Games with Incomplete Information

Let us introduce a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a finite set of elementary events, $\mathcal{F}$ is the discrete $\sigma$-algebra of the subsets of $\Omega$, and $P$ is a given probability measure defined on $\mathcal{F}$. Let us also consider the natural sequence $t = 1, 2, \ldots$ as a time argument. Let $S$ be a finite set consisting of states $s_1, \ldots, s_N$, $N \in \mathbb{N}$, called the state space. A stationary Markov chain [21,22] is a sequence of $S$-valued random variables $s(t)$, $t \in \mathbb{N}$, satisfying the Markov condition:

$$P(s(t+1) = s_j \mid s(t) = s_i, s(t-1) = s_{i_{t-1}}, \ldots, s(1) = s_{i_1}) = P(s(t+1) = s_j \mid s(t) = s_i) =: p_{j|i}. \quad (1)$$

The random variables $s(t)$ are defined on the sample space $\Omega$ and take values in $S$. The stochastic process $\{s(t),\, t \in \mathbb{N}\}$ is assumed to be a Markov chain. The Markov chain can be represented by a complete graph whose nodes are the states, where each edge $(s_i, s_j) \in S^2$ is labeled by the transition probability in Equation (1). The matrix $P = (p_{j|i})_{(s_i, s_j) \in S^2} \in [0,1]^{N \times N}$ determines the evolution of the chain: for each $n \in \mathbb{N}$, the power $P^n$ has in entry $(s_i, s_j)$ the probability of going from state $s_i$ to state $s_j$ in exactly $n$ steps.
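As a concrete illustration (our sketch, not part of the original formulation), the $n$-step transition probabilities can be obtained by raising the transition matrix to the $n$-th power; the 3-state matrix below is a hypothetical example.

```python
import numpy as np

# A hypothetical 3-state transition matrix (rows sum to 1); the values are
# illustrative and not taken from the paper.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])

# Entry (i, j) of P^n is the probability of going from s_i to s_j
# in exactly n steps, as stated above.
n = 4
Pn = np.linalg.matrix_power(P, n)
print(Pn[0, 2])  # probability of reaching s_3 from s_1 in 4 steps
```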
Let $MC = (S, A, \{A(s)\}_{s \in S}, \mathbb{K}, P)$ be a Markov chain [21,22], where $S$ is a finite set of states, $S \subseteq \mathbb{N}$, and $A$ is a finite set of actions. For each $s \in S$, $A(s) \subseteq A$ is the non-empty set of admissible actions at state $s \in S$. Without loss of generality, we may take $A = \bigcup_{s \in S} A(s)$. Moreover, $\mathbb{K} = \{(s, a) \mid s \in S,\, a \in A(s)\}$ is the set of admissible state-action pairs. The stationary controlled transition matrix has entries $p_{j|i}^k := P(X_{t+1} = s_j \mid X_t = s_i, A_t = a_k)$, $t \in \mathbb{N}$, representing the probability of the transition from state $s_i$ to state $s_j$, $i = \overline{1,N}$ and $j = \overline{1,N}$, under an action $a_k \in A(s_i)$, $k = \overline{1,K}$. The distribution vector is given by $P(X_t = s_i) = P_i$, such that $P \in S^N$, where $S^N = \{P \in \mathbb{R}^N : \sum_{i=1}^N P_i = 1,\, P_i \geq 0\}$.
We consider the case where the process is not directly observable [23]. Let us associate with $S$ the observation set $Y$, which takes values in a finite space $\{1, \ldots, M\}$, $M \in \mathbb{N}$. The stochastic process $\{Y_t,\, t \in \mathbb{N}\}$ is called the observation process. By observing $Y_t$ at time $t$, information regarding the true value of $X_t$ is obtained. If $X_t = s_i$ and $A_t = a_k$, an observation $Y_t = y_m$ occurs with probability $q_{m|i}^k := P(Y_t = y_m \mid X_t = s_i, A_t = a_k)$, which describes the relationship between the state and the observation when an action $a_k \in A(s_i)$ is chosen at time $t$. The observation kernel is a stochastic kernel on $Y$, given by $Q = [q_{m|i}^k]$. We restrict ourselves to the case $Q = [q_{m|i}]$.
Definition 1.
A controllable Partially Observable Markov Decision Process (POMDP) is a tuple

$$POMDP = \{MC, Y, Q, Q_0, P, V\},$$

where: (i) $MC$ is a Markov chain; (ii) $Y$ is the observation set, which takes values in a finite space $\{1, \ldots, M\}$, $M \in \mathbb{N}$; (iii) $Q = [q_{m|i}]_{m = \overline{1,M},\, i = \overline{1,N}}$ is the observation kernel, a stochastic kernel on $Y$ such that $\sum_m q_{m|i} = 1$; (iv) $Q_0 = [q_{m|i}^0]_{m = \overline{1,M},\, i = \overline{1,N}}$ denotes the initial observation kernel; (v) $P$ is the (a priori) initial distribution; and (vi) $V_{ijmk}$ is the reward at time $t$, given the state $s_i$ and the observable state $y_m$, when the action $a_k \in A(s_i, y_m)$ is taken.
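For readers who prefer code, the components of Definition 1 can be collected in a simple container; the field names and array layouts below are our own illustrative choices, not the paper's notation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    """Container for the tuple of Definition 1; array shapes are assumptions."""
    p: np.ndarray    # controlled transitions, shape (K, N, N): p[k, i, j] = p_{j|i}^k
    q: np.ndarray    # observation kernel, shape (M, N): q[m, i] = q_{m|i}
    q0: np.ndarray   # initial observation kernel, same shape as q
    P0: np.ndarray   # a priori state distribution, shape (N,)
    V: np.ndarray    # rewards, shape (N, N, M, K): V[i, j, m, k] = V_{ijmk}
```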
A realization of the partially observable system at time $t$ is given by the sequence $(s_0, y_0, a_0, s_1, y_1, a_1, \ldots) \in \Omega := (S \times Y \times A)^{\infty}$, where $s_0$ is distributed according to $P(X_0 = s_0)$ and $A_t$ is a control sequence in $A$ determined by a control policy. To define a policy, we cannot use the (unobservable) states $s_0, s_1, \ldots$. Therefore, we introduce the observable histories $h_0 := (P_0, y_0) \in H_0$ and $h_t := (P_0, y_0, a_0, \ldots, y_{t-1}, a_{t-1}, y_t) \in H_t$ for all $t \geq 1$, where $H_t := H_{t-1} \times (A \times Y)$ if $t \geq 1$. A policy is then defined as a sequence $\pi_{k|m}(t)$ such that, for each $t$, $\pi_{k|m}(t)$ is a stochastic kernel on $A$ given $H_t$. The set of all policies is denoted by $\Pi$. A policy $\pi_{k|m}(t) \in \Pi$ and an initial distribution $P(X_0 = s_0)$, also denoted by $P_0$, determine all possible realizations of the POMDP. A control strategy satisfies $\sum_k \pi_{k|m}(t) = 1$ and $\pi_{k|m}(t) \geq 0$, $m = 1, \ldots, M$.
A game consists of a set $\mathcal{N} = \{1, \ldots, n\}$ of players (indexed by $l = \overline{1,n}$). We employ $l$ to emphasize the $l$-th player's variables, and $-l$ subsumes the variables of all the other players. The dynamics are described as follows. At time $t = 0$, the initial state $s_0$ has a given a priori distribution $P_i^l$, and the initial observation $y_0$ is generated according to the initial observation kernel $Q_0^l(y_0 \mid s_0)$. If, at time $t$, the state of the system is $X_t$ and the control $A_t^l \in A^l$ is applied, then each player is allowed to randomize, with distribution $\pi_{k|m}^l(t)$, over the pure action choices $A_t^l \in A^l(X_t)$. These choices induce the immediate utilities $V_{ijmk}^l$. Each player tries to maximize the corresponding one-step utility. Next, the system moves to a new state $X_{t+1} = s_j$ according to the transition probabilities $P^l(\pi_{k|m}^l(t))$. Subsequently, the observation $Y_t$ is generated by the observation kernel $Q^l(Y_t \mid X_t)$. Based on the obtained utility, the players adapt their mixed strategies, computing $\pi_{k|m}^l(t+1)$ for the next selection of the control actions. For any stationary strategy $\pi_{k|m}^l(t) = \pi_{k|m}^l$, we have $P_j^l = \sum_{i=1}^N \left( \sum_{m=1}^M \sum_{k=1}^K p_{j|i}^{kl}\, \pi_{k|m}^l\, q_{m|i}^l \right) P_i^l$. Then,
$$U^l(\pi) := \sum_{m=1}^M \sum_{i=1}^N \sum_{k=1}^K W_{imk}^l \prod_{\iota=1}^n \pi_{k|m}^\iota\, q_{m|i}^\iota\, P_i^\iota,$$

where $W_{imk}^l = \sum_{j=1}^N V_{ijmk}^l\, p_{j|i}^{kl}$. Each player maximizes the individual payoff function $U^l(\pi_{k|m})$, realizing the rule given by

$$(\pi_{k|m}^l)^* \in \operatorname*{Arg\,max}_{\pi^l \in \Pi^l} U^l(\pi_{k|m}^l), \quad l \in \mathcal{N},$$

where the resulting strategies $\pi_{k|m}^*$ satisfy the Nash equilibrium condition [24,25]: for all admissible $\pi_{k|m}^l$,

$$U^l(\pi_{k|m}^*) \geq U^l(\pi_{k|m}^l, \pi_{k|m}^{-l*}).$$
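Purely as an illustration (our sketch, not the authors' code), the average utility above maps directly onto array operations; the array layouts are assumptions.

```python
import numpy as np

def utility(W, pi_list, q_list, P_list):
    """Average utility U^l(pi) of the display above, as a sketch.

    W       : (N, M, K) array of aggregated rewards W^l_{imk}
    pi_list : per-player strategies, each (K, M) with pi[k, m] = pi_{k|m}
    q_list  : per-player observation kernels, each (M, N) with q[m, i] = q_{m|i}
    P_list  : per-player state distributions, each (N,)
    """
    N, M, K = W.shape
    prod = np.ones((N, M, K))
    # Product over players iota of pi_{k|m} * q_{m|i} * P_i
    for pi, q, P in zip(pi_list, q_list, P_list):
        prod *= np.einsum('km,mi,i->imk', pi, q, P)
    # Triple sum over m, i, k
    return float(np.sum(W * prod))
```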

3. Main Relations

Following [21,26,27], let us introduce a matrix with elements $c = [c_{imk}]$, as follows:

$$c_{imk}^l = d_{km}^l P_i^l = \pi_{k|m}^l\, q_{m|i}^l\, P_i^l.$$

Let us define $\Xi_{i|m}^l = [Q^l]^{-1}$, $Q^l = [q_{m|i}^l]$. Formally, a mechanism is any function $\mu_{k'|m}$ such that, given $c_{imk}^l$, it represents the nonlinear programming problem

$$\tilde{U}^l(c) = \sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K W_{imk}^l \prod_{\iota=1}^n \mu_{k'|m}^\iota\, c_{imk}^\iota \to \max_{c^l \in C_{adm}^l,\; \mu_{k'|m}^l \in M_{adm}^l},$$

and, defining $\mu_{k'|m}^l = \mu_{k'|m}$ for $l = 1, \ldots, n$, we have that

$$\tilde{U}^l(c) = \sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K W_{imk}^l \prod_{\iota=1}^n \mu_{k'|m}\, c_{imk}^\iota \to \max_{c^l \in C_{adm}^l,\; \mu_{k'|m} \in M_{adm}},$$

such that

$$C_{adm}^l := \left\{ c_{imk}^l \geq 0 \;\middle|\;
\begin{array}{l}
\sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K c_{imk}^l = 1, \qquad \sum_{m=1}^M \sum_{k=1}^K c_{imk}^l = P_i^l > 0, \\
\sum_{m=1}^M \sum_{k=1}^K \sum_{i=1}^N [\delta_{ij} - p_{j|i}^{kl}]\, c_{imk}^l = 0, \quad j = \overline{1,N}, \\
\sum_{h=1}^M \sum_{k=1}^K \sum_{i=1}^N [\delta_{hm} - q_{m|i}^l]\, c_{ihk}^l = 0, \quad m = \overline{1,M}, \\
\sum_{m=1}^M \sum_{k=1}^K \sum_{i=1}^N \Xi_{h|m}^l\, c_{imk}^l \geq 0, \quad h = \overline{1,N}
\end{array} \right\}, \quad (5)$$

$$M_{adm} = \left\{ \mu_{k'|m} \geq 0 \;\middle|\; \sum_{k'=1}^K \mu_{k'|m} = 1, \quad m = \overline{1,M} \right\}. \quad (6)$$
Now, let us introduce the $z$-variable, as follows:

$$z_{imkk'}^l := \mu_{k'|m}\, c_{imk}^l = \mu_{k'|m}\, \pi_{k|m}^l\, q_{m|i}^l\, P_i^l,$$

$$\tilde{U}^l(z) = \sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K W_{imk}^l \prod_{\iota=1}^n z_{imkk'}^\iota \to \max_{z \in Z_{adm}}, \quad (7)$$

where

$$Z_{adm}^l := \left\{ z_{imkk'}^l \geq 0 \;\middle|\;
\begin{array}{l}
\sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l = 1, \qquad \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l = P_i^l > 0, \\
\sum_{m=1}^M \sum_{k=1}^K \sum_{i=1}^N [\delta_{ij} - p_{j|i}^{kl}] \sum_{k'=1}^K z_{imkk'}^l = 0, \quad j = \overline{1,N}, \\
\sum_{h=1}^M \sum_{k=1}^K \sum_{i=1}^N [\delta_{hm} - q_{m|i}^l] \sum_{k'=1}^K z_{ihkk'}^l = 0, \quad m = \overline{1,M}, \\
\sum_{m=1}^M \sum_{k=1}^K \sum_{i=1}^N \Xi_{h|m}^l \sum_{k'=1}^K z_{imkk'}^l \geq 0, \quad h = \overline{1,N}
\end{array} \right\}. \quad (8)$$

Notice that, by the relations

$$\sum_{k'=1}^K \mu_{k'|m} = 1, \quad \sum_{k=1}^K \pi_{k|m}^l = 1, \quad \sum_{m=1}^M q_{m|i}^l = 1, \quad \sum_{i=1}^N P_i^l = 1$$

and $\mu_{k'|m} \geq 0$, it is easy to check that $z^l \in S^l$, where

$$S^l := \left\{ z_{imkk'}^l \geq 0 \;\middle|\; \sum_{i=1}^N \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l = 1, \quad \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l = P_i^l > 0 \right\}. \quad (9)$$

We define the solution of the problem (7) as $z^{l*}$. The next lemma clarifies how we may recover $\mu_{k'|m}$ and $c_{imk}^l$.
Lemma 1.
The variables $\mu_{k'|m}$ and $c_{imk}^l$ can be recovered from $z_{imkk'}^l$ as follows:

$$(a)\ \ \mu_{k'|m} = \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{imkk'}^l}{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{imk\kappa}^l}, \qquad (b)\ \ c_{\alpha\beta\gamma}^l = \frac{1}{K} \sum_{k'=1}^K \frac{z_{\alpha\beta\gamma k'}^l}{\mu_{k'|\beta}}. \quad (10)$$

Proof. 
See Appendix A. □

Corollary 1.
In addition, $c_{\alpha\beta\gamma}^l = \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l$.

Now, in order to derive $\pi_{k|m}^l$ and $\bar{P}_m^l$, we have that

$$\pi_{k|im}^l = \begin{cases} \dfrac{\sum_{k'=1}^K z_{imkk'}^l}{\sum_{h=1}^K \sum_{k'=1}^K z_{imhk'}^l} & \text{if } \sum_{h=1}^K \sum_{k'=1}^K z_{imhk'}^l > 0, \\[6pt] 0 & \text{if } \sum_{h=1}^K \sum_{k'=1}^K z_{imhk'}^l = 0, \end{cases} \quad (11)$$

$$P_i^l = \sum_{m=1}^M \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l. \quad (12)$$

Corollary 2.
The strategy $\pi_{k|m}^l$ constructed from $\pi_{k|im}^l$ in (11), and the distribution $\bar{P}_m^l$, are given by

$$\pi_{k|m}^l = \frac{1}{N} \sum_{i=1}^N \pi_{k|im}^l, \qquad \bar{P}_m^l = \sum_{i=1}^N \sum_{k=1}^K \sum_{k'=1}^K z_{imkk'}^l. \quad (13)$$
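Under the assumption that each $z^l$ is stored as an $(N, M, K, K')$ NumPy array, the recovery formulas of Lemma 1, Equations (11)-(12), and Corollary 2 translate directly into array operations; the following is our illustrative sketch, not the authors' code.

```python
import numpy as np

def recover(z_list):
    """Recover (mu, pi^l, P^l) from per-player tensors z^l of shape
    (N, M, K, K'), assuming the index layout z[i, m, k, kp] = z_{imkk'}."""
    Z = sum(z_list)                              # sum over players l
    num = Z.sum(axis=(0, 2))                     # sum over i, k  -> (M, K')
    den = Z.sum(axis=(0, 2, 3))                  # also over k'   -> (M,)
    mu = (num / den[:, None]).T                  # (10a): mu[kp, m] = mu_{k'|m}

    pis, Ps = [], []
    for z in z_list:
        c = z.sum(axis=3)                        # Corollary 1: c_{imk} = sum_{k'} z
        denom = c.sum(axis=2, keepdims=True)     # sum over k -> (N, M, 1)
        pi_kim = np.divide(c, denom, out=np.zeros_like(c), where=denom > 0)  # (11)
        pis.append(pi_kim.mean(axis=0).T)        # (13): pi[k, m] = (1/N) sum_i pi_{k|im}
        Ps.append(z.sum(axis=(1, 2, 3)))         # (12): P_i = sum_{m,k,k'} z
    return mu, pis, Ps
```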

4. Ergodicity Conditions Expressed in z Variables

We have derived the formulas that maximize Equation (7), based on the variables $z_{imkk'}^l$, together with the formulas to recover the policy $\pi_{k|m}^l$, the mechanism $\mu_{k'|m}$, and $\bar{P}_m^l$. Accordingly, we now focus our attention on the ergodicity restrictions.
Theorem 1.
The strategy $\pi_{k|m}^{l*}$ and the mechanism $\mu_{k'|m}^*$ are in Nash equilibrium, where every agent maximizes its expected utility, i.e., for every $l = \overline{1,n}$,

$$\tilde{U}^l(\mu_{k'|m}^*\, \pi_{k|m}^{l*}\, q_{m|i}^l\, P_i^l) \geq \tilde{U}^l(\mu_{k'|m}\, \pi_{k|m}^l\, q_{m|i}^l\, P_i^l),$$

if the quantities $z_{imkk'}^l$ satisfy the following restrictions:

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N [\delta_{\alpha j} - p_{j|\alpha}^{\gamma l}] \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l = 0, \quad j = \overline{1,N}, \quad (14)$$

$$\sum_{h=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N [\delta_{h\beta} - q_{\alpha|\beta}^l] \sum_{k'=1}^K z_{\alpha h\gamma k'}^l = 0, \quad \beta = \overline{1,M}, \quad (15)$$

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N \Xi_{h|\beta}^l \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l \geq 0, \quad h = \overline{1,N}. \quad (16)$$
Proof. 
See Appendix B. □
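Numerically, the restrictions of Theorem 1 can be checked by evaluating their residuals on a candidate $z$; the sketch below is ours and assumes the array layouts stated in the comments, with restriction (15) checkable by the same pattern.

```python
import numpy as np

def ergodicity_residuals(z, p, Xi):
    """Residuals of restrictions (14) and (16) for one player; a sketch
    assuming z has shape (N, M, K, K'), p has shape (K, N, N) with
    p[g, a, j] = p_{j|a}^g, and Xi has shape (N, M) with Xi[h, b] = Xi_{h|b}."""
    zk = z.sum(axis=3)                                 # sum over k' -> (N, M, K)
    # (14): sum_{b,g,a} [delta_{aj} - p_{j|a}^g] zk[a,b,g] should vanish for each j
    r14 = zk.sum(axis=(1, 2)) - np.einsum('gaj,abg->j', p, zk)
    # (16): sum_{b,g,a} Xi_{h|b} zk[a,b,g] should be nonnegative for each h
    r16 = np.einsum('hb,abg->h', Xi, zk)
    return r14, r16

# Example usage with random (purely illustrative) data:
N, M, K = 4, 4, 2
rng = np.random.default_rng(0)
z = rng.random((N, M, K, K)); z /= z.sum()
p = rng.random((K, N, N)); p /= p.sum(axis=2, keepdims=True)
Xi = rng.random((N, M))
print(ergodicity_residuals(z, p, Xi))
```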

5. Convergence Analysis

The Nash equilibrium is a game-theory concept that determines the solution of a non-cooperative game with several players, in which no player has an incentive to unilaterally change his/her own strategy. A practical notion for deriving Nash equilibria is a player's best reply: the strategy (or set of strategies) that maximizes his/her payoff, taking the other players' strategies as given. Thus, a player does not have just one best-reply strategy, but rather a best-reply strategy for each arrangement of strategies of the other players. All of the Nash equilibria can be approximated in a (best-reply) sequential process. We want to compute the solution of the problem (7), defined as $z^{l*} = \mu_{k'|m}^*\, \pi_{k|m}^{l*}\, q_{m|i}^l\, P_i^l$, considering the best-reply approach. For solving the problem (7), let us consider a game whose strategies are denoted by $x^l \in X^l$, where $X$ is a convex and compact set, $x^l := \operatorname{col}(z_{imkk'}^l)$, and $X^l := Z_{adm}^l$. Let $x = (x^1, \ldots, x^n) \in X$ be the joint strategy of the players and $\hat{x}^l := (x^1, \ldots, x^{l-1}, x^{l+1}, \ldots, x^n) \in \hat{X}^l$ be the strategy of the rest of the players adjoint to $x^l \in X_{adm}^l$. We consider a Nash equilibrium problem with $n$ players and denote by $x = (x^l, \hat{x}^l)$ the vector representing the players' joint strategy, with $X_{adm} = X_{adm}^l \times \hat{X}_{adm}^l$. The method of Lagrange multipliers is an optimization approach for finding the local minimum (maximum) of a function subject to the equality constraints ($A_{eq}$) given in Equation (8). Let us consider the Lagrange function

$$\mathcal{L}(x, \hat{x}(x), \lambda) := U(x, \hat{x}(x)) + \lambda^\top A_{eq} x,$$

where the Lagrange vector-multipliers $\lambda \in \Lambda$ may have any sign. This yields the optimization problem

$$\mathcal{L}(x, \hat{x}(x), \lambda) \to \min_{x \in X_{adm},\; \hat{x}(x) \in \hat{X}_{adm}}\ \max_{\lambda \in \Lambda},$$

for which we propose the following iterative algorithm:
1. Proximal prediction step:

$$\bar{\lambda}_n = \operatorname*{arg\,min}_{\lambda \geq 0} \left\{ \tfrac{1}{2}\|\lambda - \lambda_n\|^2 - \theta \mathcal{L}(x_n, \hat{x}_n, \lambda) \right\}, \quad \bar{x}_n = \operatorname*{arg\,min}_{x \in X} \left\{ \tfrac{1}{2}\|x - x_n\|^2 + \theta \mathcal{L}(x, \hat{x}_n, \bar{\lambda}_n) \right\}, \quad \bar{\hat{x}}_n = \operatorname*{arg\,min}_{\hat{x} \in \hat{X}} \left\{ \tfrac{1}{2}\|\hat{x} - \hat{x}_n\|^2 + \theta \mathcal{L}(x_n, \hat{x}, \bar{\lambda}_n) \right\}. \quad (17)$$

2. Gradient approximation step:

$$\lambda_{n+1} = \operatorname*{arg\,min}_{\lambda \geq 0} \left\{ \tfrac{1}{2}\|\lambda - \lambda_n\|^2 - \theta \mathcal{L}(\bar{x}_n, \bar{\hat{x}}_n, \lambda) \right\}, \quad x_{n+1} = \operatorname*{arg\,min}_{x \in X} \left\{ \tfrac{1}{2}\|x - x_n\|^2 + \theta \mathcal{L}(x, \bar{\hat{x}}_n, \bar{\lambda}_n) \right\}, \quad \hat{x}_{n+1} = \operatorname*{arg\,min}_{\hat{x} \in \hat{X}} \left\{ \tfrac{1}{2}\|\hat{x} - \hat{x}_n\|^2 + \theta \mathcal{L}(\bar{x}_n, \hat{x}, \bar{\lambda}_n) \right\}. \quad (18)$$
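For intuition, the scheme can be approximated by extragradient-style projected steps, replacing each proximal subproblem with a linearization of $\mathcal{L}$; the sketch below merges $x$ and $\hat{x}$ into a single primal block and assumes the caller supplies the gradients and a Euclidean projection, so it is an illustration of Equations (17)-(18), not a literal transcription.

```python
import numpy as np

def extraproximal_step(x, lam, grad_x, grad_lam, proj_x, theta):
    """One prediction step plus one basic step, with each arg-min of
    Equations (17)-(18) replaced by a projected (extra)gradient step."""
    # 1. Prediction: descend in the primal block, ascend in the multiplier.
    lam_bar = np.maximum(lam + theta * grad_lam(x, lam), 0.0)
    x_bar = proj_x(x - theta * grad_x(x, lam_bar))
    # 2. Basic step: re-evaluate the gradients at the predicted point,
    #    but move from the current iterate (x, lam).
    lam_new = np.maximum(lam + theta * grad_lam(x_bar, lam_bar), 0.0)
    x_new = proj_x(x - theta * grad_x(x_bar, lam_bar))
    return x_new, lam_new
```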
Let us define the following variables:

$$\tilde{x} := \begin{pmatrix} x \\ \hat{x} \end{pmatrix} \in \tilde{X} := X \times \hat{X}, \qquad \tilde{y} := \lambda \in \tilde{Y} := \mathbb{R}_+.$$

Subsequently, the Lagrange function can be expressed as

$$\tilde{L}(\tilde{x}, \tilde{y}) := \mathcal{L}(x, \hat{x}, \lambda).$$

The equilibrium point that satisfies Equations (17) and (18) can be represented by

$$\tilde{x}^* = \operatorname*{arg\,min}_{\tilde{x} \in \tilde{X}} \left\{ \tfrac{1}{2}\|\tilde{x} - \tilde{x}^*\|^2 + \gamma \tilde{L}(\tilde{x}, \tilde{y}^*) \right\}, \qquad \tilde{y}^* = \operatorname*{arg\,max}_{\tilde{y} \in \tilde{Y}} \left\{ -\tfrac{1}{2}\|\tilde{y} - \tilde{y}^*\|^2 + \gamma \tilde{L}(\tilde{x}^*, \tilde{y}) \right\}. \quad (19)$$

In addition, let us introduce the variables

$$\tilde{w} = \begin{pmatrix} \tilde{w}_1 \\ \tilde{w}_2 \end{pmatrix} \in \tilde{X} \times \tilde{Y}, \qquad \tilde{v} = \begin{pmatrix} \tilde{v}_1 \\ \tilde{v}_2 \end{pmatrix} \in \tilde{X} \times \tilde{Y},$$

and let us define the Lagrangian in terms of these variables:

$$L(\tilde{w}, \tilde{v}) := \tilde{L}(\tilde{w}_1, \tilde{v}_2) - \tilde{L}(\tilde{v}_1, \tilde{w}_2).$$

For $\tilde{w}_1 = \tilde{x}$, $\tilde{w}_2 = \tilde{y}$, $\tilde{v}_1 = \tilde{v}_1^* = \tilde{x}^*$, and $\tilde{v}_2 = \tilde{v}_2^* = \tilde{y}^*$, we have

$$L(\tilde{w}, \tilde{v}^*) := \tilde{L}(\tilde{x}, \tilde{y}^*) - \tilde{L}(\tilde{x}^*, \tilde{y}).$$

In these variables, the relations in Equations (17) and (18) can be represented by

$$\tilde{v}^* = \operatorname*{arg\,min}_{\tilde{w} \in \tilde{X} \times \tilde{Y}} \left\{ \tfrac{1}{2}\|\tilde{w} - \tilde{v}^*\|^2 + \gamma L(\tilde{w}, \tilde{v}^*) \right\}.$$
We provide the convergence analysis of the sequence $\{v_n\}_{n \in \mathbb{N}}$ in the following theorem [28].

Theorem 2.
Let $L(w, v)$ be a convex and differentiable function whose gradient satisfies the Lipschitz condition $\|\nabla L(v) - \nabla L(w)\| \leq \omega \|v - w\|$ for all $v, w \in V_{adm}$, where $V_{adm}$ is a convex and compact set. Let $\{v_n\}_{n \in \mathbb{N}}$ be the sequence defined by the local-search and proximal iteration algorithm

$$\bar{v}_n = \operatorname*{arg\,min}_{v \in V_{adm}} \left\{ \tfrac{1}{2}\|v - v_n\|^2 + \theta_n L(v, v_n) \right\}, \qquad v_{n+1} = \operatorname*{arg\,min}_{v \in V_{adm}} \left\{ \tfrac{1}{2}\|v - v_n\|^2 + \theta_n L(v, \bar{v}_n) \right\};$$

then the sequence $\{v_n\}_{n \in \mathbb{N}}$ converges to a Nash equilibrium point $v^* \in V_{adm}$.
Proof. 
See Appendix C. □

6. Political Numerical Example

The theory related to electoral competition originates in the seminal contributions of Hotelling [29] and Downs [30]. The proposed framework considers a majority-rule election, where political candidates compete for a position by simultaneously and independently proposing a policy from a unidimensional policy space. It is common knowledge that the equilibrium of this model is fundamentally determined by the candidates' incentives for running for the position. This example considers a three-player game ($l = \overline{1,3}$) engaged in a political contest in which the player with the highest performance wins. A question arises: how should a mechanism for selecting a candidate be designed? The goal of each candidate is to end up on top: the next time a political position rolls around, pay attention to the campaigning; candidates who are behind will talk not only about what a good choice they are for the position, but also about what a bad choice the front-runner is.

The assumptions in this example consider the incomplete-information version of the game, in which candidates give the same relative weight to their preferred strategies versus their desire to win the position. This case is relevant from a theoretical point of view, and it is empirically important. The dynamics are modeled considering $N = 4$, $M = 4$, and $K = 2$, with transition matrices describing the evolution of the partially observed Markov game. The initial transition matrices are defined as follows:
$$p_{j|i}^{1,1} = \begin{pmatrix} 0.2307 & 0.2130 & 0.3120 & 0.2443 \\ 0.4989 & 0.1075 & 0.2250 & 0.1686 \\ 0.1857 & 0.0801 & 0.3938 & 0.3404 \\ 0.3235 & 0.2372 & 0.1065 & 0.3327 \end{pmatrix}, \qquad p_{j|i}^{2,1} = \begin{pmatrix} 0.4344 & 0.0728 & 0.3970 & 0.0959 \\ 0.5605 & 0.1505 & 0.1423 & 0.1467 \\ 0.1866 & 0.2866 & 0.3168 & 0.2100 \\ 0.1140 & 0.2091 & 0.2878 & 0.3892 \end{pmatrix},$$

$$p_{j|i}^{1,2} = \begin{pmatrix} 0.1613 & 0.4207 & 0.1745 & 0.2435 \\ 0.3372 & 0.1160 & 0.2305 & 0.3163 \\ 0.2488 & 0.3219 & 0.0323 & 0.3970 \\ 0.3696 & 0.5068 & 0.0363 & 0.0873 \end{pmatrix}, \qquad p_{j|i}^{2,2} = \begin{pmatrix} 0.3587 & 0.1023 & 0.1044 & 0.4346 \\ 0.1796 & 0.3039 & 0.2303 & 0.2862 \\ 0.0115 & 0.3002 & 0.2537 & 0.4346 \\ 0.2102 & 0.3296 & 0.4079 & 0.0523 \end{pmatrix},$$

$$p_{j|i}^{1,3} = \begin{pmatrix} 0.1354 & 0.3183 & 0.0631 & 0.4832 \\ 0.2442 & 0.2663 & 0.2572 & 0.2323 \\ 0.4768 & 0.2447 & 0.0144 & 0.2641 \\ 0.3380 & 0.1812 & 0.3172 & 0.1636 \end{pmatrix}, \qquad p_{j|i}^{2,3} = \begin{pmatrix} 0.1201 & 0.0840 & 0.4017 & 0.3942 \\ 0.3531 & 0.1164 & 0.2559 & 0.2746 \\ 0.2919 & 0.0985 & 0.3721 & 0.2375 \\ 0.5341 & 0.0798 & 0.0850 & 0.3010 \end{pmatrix}.$$
Likewise, the initial observation matrices are defined as follows:

$$q_{m|i}^1 = \begin{pmatrix} 0.2423 & 0.3692 & 0.0931 & 0.3550 \\ 0.1881 & 0.0397 & 0.2043 & 0.2126 \\ 0.2848 & 0.3933 & 0.1156 & 0.3773 \\ 0.2847 & 0.1978 & 0.5870 & 0.0552 \end{pmatrix}, \qquad q_{m|i}^2 = \begin{pmatrix} 0.1749 & 0.4870 & 0.2563 & 0.4140 \\ 0.2719 & 0.0190 & 0.2748 & 0.4030 \\ 0.2814 & 0.4030 & 0.2405 & 0.1692 \\ 0.2718 & 0.0910 & 0.2284 & 0.0137 \end{pmatrix},$$

$$q_{m|i}^3 = \begin{pmatrix} 0.1734 & 0.0374 & 0.0430 & 0.3581 \\ 0.4350 & 0.2568 & 0.4207 & 0.0150 \\ 0.2747 & 0.3091 & 0.3390 & 0.3458 \\ 0.1170 & 0.3967 & 0.1973 & 0.2811 \end{pmatrix}.$$
Fixing $\theta = 0.055$ in the extraproximal method given in Equations (17) and (18), the Nash equilibrium results from computing the strategies and the mechanism design by applying Equations (10) and (13), which yields:

$$\pi_{k|m}^1 = \begin{pmatrix} 0.4633 & 0.5367 \\ 0.4543 & 0.5457 \\ 0.4850 & 0.5150 \\ 0.3859 & 0.6141 \end{pmatrix}, \qquad \pi_{k|m}^2 = \begin{pmatrix} 0.3092 & 0.6908 \\ 0.3707 & 0.6293 \\ 0.3879 & 0.6121 \\ 0.3863 & 0.6137 \end{pmatrix},$$

$$\pi_{k|m}^3 = \begin{pmatrix} 0.3937 & 0.6063 \\ 0.4165 & 0.5835 \\ 0.3850 & 0.6150 \\ 0.4990 & 0.5010 \end{pmatrix}, \qquad \mu_{k|m} = \begin{pmatrix} 0.2929 & 0.7071 \\ 0.3321 & 0.6679 \\ 0.2410 & 0.7590 \\ 0.2187 & 0.7813 \end{pmatrix}.$$
We present a full characterization of the Nash equilibrium for the case of partially observable Markov games. Figure 1, Figure 2 and Figure 3 show the convergence of the strategies $z_{imkk'}^l$.
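As a quick sanity check (ours, not part of the paper), each row of the reported strategies and of the mechanism must be a probability distribution over the $K = 2$ actions:

```python
import numpy as np

# Equilibrium strategy of player 1 and the mechanism, as reported above.
pi1 = np.array([[0.4633, 0.5367],
                [0.4543, 0.5457],
                [0.4850, 0.5150],
                [0.3859, 0.6141]])
mu = np.array([[0.2929, 0.7071],
               [0.3321, 0.6679],
               [0.2410, 0.7590],
               [0.2187, 0.7813]])

# Every row pi_{.|m} and mu_{.|m} sums to one, as the admissible sets require.
assert np.allclose(pi1.sum(axis=1), 1.0, atol=1e-3)
assert np.allclose(mu.sum(axis=1), 1.0, atol=1e-3)
```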

7. Conclusions

This paper contributed to the literature on mechanism design for Markov games with incomplete (partially observable) state information. We suggested an analytical method for the design of a mechanism. The main result of this work is the introduction of the new variable $z$, which makes the game problem computationally tractable and allows obtaining the mechanism solution $\mu$ and the strategies $\pi$ for all of the players in the game. The variable $z$ allows introducing natural additional linear restrictions for computing the Nash equilibrium of the game. An infeasible solution can be detected with a simple test on the variable $z$, i.e., it is possible to detect unusual conditions in the solver of the game, given the information available for the simplex. A major advantage of introducing this variable lies in the fact that it can be efficiently implemented in real settings, which is consistent with the engineering approach of designing economic mechanisms, or incentives, toward desired objectives where players act rationally. We applied these results to a numerical example related to political promotion.

In relation to future work, several challenges are left to address. One interesting technical challenge is addressing extremum seeking in the context of mechanism design [31,32,33]. Another interesting challenge would be to consider the observer design approach in order to extend the mechanism design theory [23].

Author Contributions

Authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

Proof. Since $\mu_{k'|m}$ does not depend on the indices $l, i, k$, it may be obtained from Equations (8) and (9):

$$\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{imkk'}^l := \mu_{k'|m} \sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K c_{imk}^l = \mu_{k'|m} \sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \pi_{k|m}^l\, q_{m|i}^l\, P_i^l = \mu_{k'|m} \sum_{l=1}^n \sum_{i=1}^N q_{m|i}^l\, P_i^l. \quad (A1)$$

Hence,

$$\mu_{k'|m} = \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{imkk'}^l}{\sum_{l=1}^n \sum_{i=1}^N q_{m|i}^l\, P_i^l} = \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{imkk'}^l}{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{imk\kappa}^l}. \quad (A2)$$

Let us define $c_{\alpha\beta\gamma}^l$ from $z_{\alpha\beta\gamma k'}^l$ as follows:

$$c_{\alpha\beta\gamma}^l = \frac{1}{K} \sum_{k'=1}^K \frac{z_{\alpha\beta\gamma k'}^l}{\mu_{k'|\beta}} = \frac{1}{K} \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l\, \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{i\beta k\kappa}^l}{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{i\beta kk'}^l}. \quad (A3)$$
To verify that the definitions of $\mu_{k'|m}$ and $c_{imk}^l$ in (10) are correct, we need to check the fulfillment of Equations (5) and (6), i.e., $\mu_{k'|m} \in M_{adm}$ and $c_{imk}^l \in C_{adm}^l$.

(a) As for the variables $\mu_{k'|m}$, these properties follow directly:

$$\mu_{k'|m} = \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{imkk'}^l}{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{imk\kappa}^l} \geq 0,$$

since $z_{imkk'}^l \geq 0$. Summing (A2) over $k'$ directly leads to the property $\sum_{k'=1}^K \mu_{k'|m} = 1$.

(b) To prove that $c_{imk}^l \in C_{adm}^l$ as defined by (A3), notice that

$$\sum_{\gamma=1}^K c_{\alpha\beta\gamma}^l = \frac{1}{K} \sum_{k'=1}^K \sum_{\gamma=1}^K z_{\alpha\beta\gamma k'}^l\, \frac{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{i\beta k\kappa}^l}{\sum_{l=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{i\beta kk'}^l}$$

and

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K c_{\alpha\beta\gamma}^l = \frac{1}{K} \sum_{k'=1}^K \sum_{\beta=1}^M \sum_{\gamma=1}^K z_{\alpha\beta\gamma k'}^l\, \frac{\sum_{h=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{i\beta k\kappa}^h}{\sum_{h=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{i\beta kk'}^h},$$

which leads to the relation

$$\sum_{\alpha=1}^N \sum_{\beta=1}^M \sum_{\gamma=1}^K c_{\alpha\beta\gamma}^l = \frac{1}{K} \sum_{\beta=1}^M \sum_{k'=1}^K \left( \sum_{\alpha=1}^N \sum_{\gamma=1}^K z_{\alpha\beta\gamma k'}^l \right) \frac{\sum_{h=1}^n \sum_{i=1}^N \sum_{k=1}^K \sum_{\kappa=1}^K z_{i\beta k\kappa}^h}{\sum_{h=1}^n \sum_{i=1}^N \sum_{k=1}^K z_{i\beta kk'}^h} = \sum_{\beta=1}^M \sum_{\alpha=1}^N q_{\alpha|\beta}^l\, P_\alpha^l = \sum_{\alpha=1}^N P_\alpha^l = 1.$$

Then, $z_{\alpha\beta\gamma k'}^l \in S$; see Equation (9). The Lemma is proved. □

Appendix B. Proof of Theorem 1

Proof. The new variables $z_{\alpha\beta\gamma k'}^l$ should satisfy the following linear ergodicity constraints:

$$P_j^l = \sum_{\alpha=1}^N \sum_{\beta=1}^M \sum_{\gamma=1}^K p_{j|\alpha}^{\gamma l}\, \mu_{k'|\beta}\, \pi_{\gamma|\beta}^l\, q_{\beta|\alpha}^l\, P_\alpha^l = \sum_{\alpha=1}^N \sum_{\beta=1}^M \sum_{\gamma=1}^K p_{j|\alpha}^{\gamma l}\, z_{\alpha\beta\gamma k'}^l,$$

that is,

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K z_{j\beta\gamma k'}^l = \sum_{\alpha=1}^N \sum_{\beta=1}^M \sum_{\gamma=1}^K p_{j|\alpha}^{\gamma l}\, z_{\alpha\beta\gamma k'}^l,$$

which implies

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N [\delta_{\alpha j} - p_{j|\alpha}^{\gamma l}] \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l = 0, \quad j = \overline{1,N}.$$

Then,

$$E := \left\{ z_{\alpha\beta\gamma k'}^l \;\middle|\; \sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N [\delta_{\alpha j} - p_{j|\alpha}^{\gamma l}] \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l = 0, \quad j = \overline{1,N} \right\}.$$

Equation (15) is fulfilled automatically: substituting $z_{ihk\kappa}^l = \mu_{\kappa|h}\, \pi_{k|h}^l\, q_{i|h}^l\, P_i^l$ and using $\sum_{\kappa=1}^K \mu_{\kappa|h} = 1$ and $\sum_{k=1}^K \pi_{k|h}^l = 1$, the left-hand side reduces to

$$K \sum_{h=1}^M \sum_{\alpha=1}^N [\delta_{h\beta} - q_{\alpha|\beta}^l]\, q_{\alpha|h}^l\, P_\alpha^l = K \left[ \sum_{\alpha=1}^N q_{\alpha|\beta}^l\, P_\alpha^l - \sum_{\alpha=1}^N q_{\alpha|\beta}^l \left( \sum_{h=1}^M q_{\alpha|h}^l \right) P_\alpha^l \right] = K \left[ \sum_{\alpha=1}^N q_{\alpha|\beta}^l\, P_\alpha^l - \sum_{\alpha=1}^N q_{\alpha|\beta}^l\, P_\alpha^l \right] = 0.$$

Now, we prove the relation given in Equation (16) as follows:

$$\sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N \Xi_{h|\beta}^l \sum_{k'=1}^K z_{\alpha\beta\gamma k'}^l = \sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N \Xi_{h|\beta}^l\, c_{\alpha\beta\gamma}^l = \sum_{\beta=1}^M \sum_{\gamma=1}^K \sum_{\alpha=1}^N \Xi_{h|\beta}^l\, \pi_{\gamma|\beta}^l\, q_{\alpha|\beta}^l\, P_\alpha^l = \sum_{\alpha=1}^N P_\alpha^l \sum_{\beta=1}^M q_{\alpha|\beta}^l\, \Xi_{h|\beta}^l \sum_{\gamma=1}^K \pi_{\gamma|\beta}^l = \sum_{\alpha=1}^N P_\alpha^l\, \delta_{\alpha h} = P_h^l \geq 0.$$

The Theorem is proved. □

Appendix C. Proof of Theorem 2

Proof. Let us define $\omega^0 = \sum_{l=1}^N \omega^{0,l} \leq N \max_{l = \overline{1,N}} \omega^{0,l} = N \omega^{0+}$ and let $\tilde{w} = \tilde{v}_{n+1}$; then

$$\tfrac{1}{2}\|\hat{v}_n - \tilde{v}_n\|^2 + \theta L(\hat{v}_n, \tilde{v}_n) \leq \tfrac{1}{2}\|\tilde{v}_{n+1} - \tilde{v}_n\|^2 + \theta L(\tilde{v}_{n+1}, \tilde{v}_n) - \tfrac{1}{2}\|\tilde{v}_{n+1} - \hat{v}_n\|^2. \quad (A4)$$

Let also $\tilde{w} = \tilde{v}^* \in \tilde{X} \times \tilde{Y}$; then

$$\tfrac{1}{2}\|\tilde{v}_{n+1} - \tilde{v}_n\|^2 + \theta L(\tilde{v}_{n+1}, \hat{v}_n) \leq \tfrac{1}{2}\|\tilde{v}^* - \tilde{v}_n\|^2 + \theta L(\tilde{v}^*, \hat{v}_n) - \tfrac{1}{2}\|\tilde{v}^* - \tilde{v}_{n+1}\|^2. \quad (A5)$$

Adding Equations (A4) and (A5) and multiplying by two yields

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 + \|\tilde{v}_{n+1} - \hat{v}_n\|^2 + \|\hat{v}_n - \tilde{v}_n\|^2 - 2\theta L(\tilde{v}^*, \hat{v}_n) + 2\theta \left[ L(\tilde{v}_{n+1}, \hat{v}_n) + L(\hat{v}_n, \tilde{v}_n) - L(\tilde{v}_{n+1}, \tilde{v}_n) \right] \leq \|\tilde{v}^* - \tilde{v}_n\|^2. \quad (A6)$$

Adding and subtracting $L(\hat{v}_n, \hat{v}_n)$ in Equation (A6), we have

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 + \|\tilde{v}_{n+1} - \hat{v}_n\|^2 + \|\hat{v}_n - \tilde{v}_n\|^2 + 2\theta \left[ L(\hat{v}_n, \hat{v}_n) - L(\tilde{v}^*, \hat{v}_n) \right] + 2\theta \left[ L(\tilde{v}_{n+1}, \hat{v}_n) - L(\hat{v}_n, \hat{v}_n) + L(\hat{v}_n, \tilde{v}_n) - L(\tilde{v}_{n+1}, \tilde{v}_n) \right] \leq \|\tilde{v}^* - \tilde{v}_n\|^2. \quad (A7)$$

Let $\tilde{w} + h = \tilde{v}_{n+1}$, $\tilde{w} = \hat{v}_n$, $\tilde{v} + k = \tilde{v}_n$, and $\tilde{v} = \hat{v}_n$, so that $h = \tilde{v}_{n+1} - \hat{v}_n$ and $k = \tilde{v}_n - \hat{v}_n$. Using the Lipschitz condition, the inequality (A7) becomes

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 + \|\tilde{v}_{n+1} - \hat{v}_n\|^2 + \|\hat{v}_n - \tilde{v}_n\|^2 + 2\theta \left[ L(\hat{v}_n, \hat{v}_n) - L(\tilde{v}^*, \hat{v}_n) \right] - 2\theta\omega \|\tilde{v}_{n+1} - \hat{v}_n\| \|\tilde{v}_n - \hat{v}_n\| \leq \|\tilde{v}^* - \tilde{v}_n\|^2.$$

Using $L(\hat{v}_n, \hat{v}_n) - L(\tilde{v}^*, \hat{v}_n) \geq \|\hat{v}_n - \tilde{v}^*\|^2$, we obtain

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 + \|\tilde{v}_{n+1} - \hat{v}_n\|^2 + 2\theta \|\hat{v}_n - \tilde{v}^*\|^2 + (1 - 2\theta^2\omega^2) \|\tilde{v}_n - \hat{v}_n\|^2 \leq \|\tilde{v}^* - \tilde{v}_n\|^2.$$

Now, by the identity

$$2\langle a - c, c - b \rangle = \|a - b\|^2 - \|a - c\|^2 - \|c - b\|^2,$$

replacing $a = \hat{v}_n$, $b = \tilde{v}^*$, and $c = \tilde{v}_n$ in the left-hand side of the last inequality, we have

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 + \|\tilde{v}_{n+1} - \hat{v}_n\|^2 + (1 - 2\theta^2\omega^2) \|\tilde{v}_n - \hat{v}_n\|^2 + 2\theta \left[ 2\langle \hat{v}_n - \tilde{v}_n, \tilde{v}_n - \tilde{v}^* \rangle + \|\tilde{v}_n - \hat{v}_n\|^2 + \|\tilde{v}_n - \tilde{v}^*\|^2 \right] \leq \|\tilde{v}^* - \tilde{v}_n\|^2.$$

Computing the square form of the third and fourth terms, we have that

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 \leq \left( 1 - 2\theta + \frac{(2\theta)^2}{1 + 2\theta} - 2\theta^2\omega^2 \right) \|\tilde{v}^* - \tilde{v}_n\|^2. \quad (A8)$$

Let $\xi = 1 - 2\theta + \dfrac{(2\theta)^2}{1 + 2\theta} - 2\theta^2\omega^2 < 1$; then, iterating the previous inequality, we have

$$\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 \leq \xi \|\tilde{v}^* - \tilde{v}_n\|^2 \leq \cdots \leq e^{(n+1)\ln\xi} \|\tilde{v}^* - \tilde{v}_0\|^2.$$

By Equation (A8), we have that $\|\tilde{v}^* - \tilde{v}_{n+1}\|^2 \to 0$ as $n \to \infty$. Taking into account that $\{\tilde{v}_n\}$ is a bounded sequence, there is a point $\tilde{v}^{**}$ such that some subsequence $\{\tilde{v}_{n_i}\}$ fulfills $\tilde{v}_{n_i} \to \tilde{v}^{**}$ as $n_i \to \infty$ (Weierstrass theorem). Moreover, $\|\tilde{v}_{n_i} - \tilde{v}_{n_i + 1}\|^2 \to 0$. Letting $n = n_i$ in Equation (19) and taking the limit as $n_i \to \infty$, we obtain

$$\tilde{v}^{**} = \operatorname*{arg\,min}_{\tilde{w} \in \tilde{X} \times \tilde{Y}} \left\{ \tfrac{1}{2}\|\tilde{w} - \tilde{v}^{**}\|^2 + \theta L(\tilde{w}, \tilde{v}^{**}) \right\}.$$

As a result, we have that $\tilde{v}^{**} = \tilde{v}^*$. Provided that $\|\tilde{v}_n - \tilde{v}^*\|^2$ is monotonically decreasing, there exists a unique limit point (the equilibrium point). Consequently, the sequence $\{\tilde{v}_n\}$ satisfies $\tilde{v}_n \to \tilde{v}^*$ as $n \to \infty$, with a rate given by $e^{(n+1)\ln\xi}$. □
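To make the geometric rate concrete, take the step size $\theta = 0.055$ used in Section 6 and, purely as an assumption for illustration, a Lipschitz constant $\omega = 1$; the contraction factor reconstructed in (A8) then evaluates as follows.

```python
import math

theta, omega = 0.055, 1.0  # theta from Section 6; omega is an assumed value
xi = 1 - 2 * theta + (2 * theta) ** 2 / (1 + 2 * theta) - 2 * theta ** 2 * omega ** 2
print(xi)  # ~0.8949: the squared error contracts by this factor per iteration
# Iterations needed to shrink the initial squared error by a factor of 10^6:
print(math.ceil(math.log(1e-6) / math.log(xi)))  # ~125
```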

References

  1. Hurwicz, L. Optimality and informational efficiency in resource allocation processes. In Mathematical Methods in the Social Sciences: Proceedings of the First Stanford Symposium; Arrow, K.J., Karlin, S., Suppes, P., Eds.; Stanford University Press: Palo Alto, CA, USA, 1960; pp. 27–46. [Google Scholar]
  2. Nobel. The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2007: Scientific Background; Technical Report; The Nobel Foundation: Stockholm, Sweden, 2007. [Google Scholar]
  3. Myerson, R.B. Allocation, Information and Markets; Chapter Mechanism Design; The New Palgrave; Palgrave Macmillan: London, UK, 1989; pp. 191–206. [Google Scholar]
  4. Vickrey, W. Counterspeculation, auctions, and competitive sealed tenders. J. Financ. 1961, 16, 8–37. [Google Scholar] [CrossRef]
  5. Bergemann, D.; Välimäki, J. Dynamic mechanism design: An introduction. J. Econ. Lit. 2019, 57, 235–274. [Google Scholar] [CrossRef] [Green Version]
  6. Clarke, E. Multi-part pricing of public goods. Public Choice 1971, 11, 17–23. [Google Scholar] [CrossRef]
  7. Groves, T. Incentives in teams. Econometrica 1973, 41, 617–631. [Google Scholar] [CrossRef]
  8. Harsanyi, J.C. Games with incomplete information played by Bayesian players. Part I: The basic model. Manag. Sci. 1967, 14, 159–182. [Google Scholar] [CrossRef]
  9. Arrow, K. Economics and Human Welfare; Chapter The Property Rights Doctrine and Demand Revelation under Incomplete Information; Academic Press: New York, NY, USA, 1979; pp. 23–39. [Google Scholar]
  10. D’Aspremont, C.; Gerard-Varet, L. Incentives and incomplete information. J. Public Econ. 1979, 11, 25–45. [Google Scholar]
  11. Saari, D.G. On the types of information and mechanism design. J. Comput. Appl. Math. 1988, 22, 231–242. [Google Scholar] [CrossRef] [Green Version]
  12. Rogerson, W. Contractual solutions to the hold-up problem. Rev. Econ. Stud. 1992, 59, 777–793. [Google Scholar] [CrossRef] [Green Version]
  13. Mailath, G.; Postlewaite, A. Asymmetric information bargaining problems with many agents. Rev. Econ. Stud. 1990, 57, 351–360. [Google Scholar] [CrossRef]
  14. Miyakawa, T. Non-Cooperative Foundation of Nash Bargaining Solution under Incomplete Information; Osaka University of Economics Working Paper Series No. 2012-2; Osaka University: Suita, Japan, 2012. [Google Scholar]
  15. Athey, S.; Bagwell, K. Collusion with persistent cost shocks. Econometrica 2008, 76, 493–540. [Google Scholar] [CrossRef] [Green Version]
  16. Hörner, J.; Takahashi, S.; Vieille, N. Truthful equilibria in dynamic Bayesian games. Econometrica 2015, 83, 1795–1848. [Google Scholar] [CrossRef] [Green Version]
  17. Clempner, J.B.; Poznyak, A.S. A nucleus for Bayesian partially observable Markov games: Joint observer and mechanism design. Eng. Appl. Artif. Intell. 2020, 95, 103876. [Google Scholar] [CrossRef]
  18. Rahman, D. The power of communication. Am. Econ. Rev. 2014, 104, 3737–3751. [Google Scholar] [CrossRef]
  19. Bernheim, B.; Madsen, E. Price cutting and business stealing in imperfect cartels. Am. Econ. Rev. 2017, 107, 387–424. [Google Scholar] [CrossRef] [Green Version]
  20. Escobar, J.F.; Llanes, G. Cooperation dynamics in repeated games of adverse selection. J. Econ. Theory 2018, 176, 408–443. [Google Scholar] [CrossRef]
  21. Poznyak, A.S.; Najim, K.; Gómez-Ramírez, E. Self-Learning Control of Finite Markov Chains; Marcel Dekker, Inc.: New York, NY, USA, 2000. [Google Scholar]
  22. Clempner, J.B.; Poznyak, A.S. Simple computing of the customer lifetime value: A fixed local-optimal policy approach. J. Syst. Sci. Syst. Eng. 2014, 23, 439–459. [Google Scholar] [CrossRef]
  23. Clempner, J.B.; Poznyak, A.S. Observer and control design in partially observable finite Markov chains. Automatica 2019, 110, 108587. [Google Scholar] [CrossRef]
  24. Clempner, J.B. On Lyapunov game theory equilibrium: Static and dynamic approaches. Int. Game Theory Rev. 2018, 20, 1750033. [Google Scholar] [CrossRef] [Green Version]
  25. Clempner, J.B.; Poznyak, A.S. Finding the strong Nash equilibrium: Computation, existence and characterization for Markov games. J. Optim. Theory Appl. 2020, 186, 1029–1052. [Google Scholar] [CrossRef]
  26. Sragovich, V.G. Mathematical Theory of Adaptive Control; World Scientific Publishing Company: Singapore, 2006. [Google Scholar]
  27. Asiain, E.; Clempner, J.B.; Poznyak, A.S. A reinforcement learning approach for solving the mean variance customer portfolio for partially observable models. Int. J. Artif. Intell. Tools 2018, 27, 1850034-1–1850034-30. [Google Scholar] [CrossRef]
  28. Trejo, K.K.; Clempner, J.B.; Poznyak, A.S. Computing the Lp-strong Nash equilibrium for Markov chains games. Appl. Math. Model. 2017, 41, 399–418. [Google Scholar] [CrossRef]
  29. Hotelling, H. Stability in competition. Econ. J. 1929, 39, 41–57. [Google Scholar] [CrossRef]
  30. Downs, A. An Economic Theory of Democracy; Harper & Brothers: New York, NY, USA, 1957. [Google Scholar]
  31. Solis, C.; Clempner, J.B.; Poznyak, A.S. Robust extremum seeking for a second order uncertain plant using a sliding mode controller. Int. J. Appl. Math. Comput. Sci. 2019, 29, 703–712. [Google Scholar] [CrossRef] [Green Version]
  32. Solis, C.; Clempner, J.B.; Poznyak, A.S. Robust integral sliding mode controller for optimisation of measurable cost functions with constraints. Int. J. Control 2019, 1–13, To be published. [Google Scholar] [CrossRef]
  33. Solis, C.; Clempner, J.B.; Poznyak, A.S. Continuous-time gradient-like descent algorithm for constrained convex unknown functions: Penalty method application. J. Comput. Appl. Math. 2019, 355, 268–282. [Google Scholar] [CrossRef]
Figure 1. Convergence of strategies $z_{imkk'}$ for player 1.
Figure 2. Convergence of strategies $z_{imkk'}$ for player 2.
Figure 3. Convergence of strategies $z_{imkk'}$ for player 3.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

