Abstract
In this paper, we consider discrete-time constrained average stochastic games with independent state processes. The state space of each player is denumerable and the one-stage cost functions can be unbounded. In these game models, each player chooses an action at each time which influences the transition probability of a Markov chain controlled only by this player. Moreover, each player needs to pay some costs which depend on the actions of all the players. First, we give an existence condition for stationary constrained Nash equilibria based on the technique of average occupation measures and the best response linear program. Then, combining the best response linear program and its dual program, we present a non-convex mathematical program and prove that each stationary Nash equilibrium is a global minimizer of this mathematical program. Finally, a controlled wireless network is presented to illustrate our main results.
MSC:
90C40; 91A25
1. Introduction
Stochastic games, introduced by Shapley in Reference [1], have been actively studied over the last few decades because of their applications, mainly in economics and queueing systems; see, for instance, References [2,3,4,5]. In this paper, we study a special class of constrained stochastic games with independent state processes. More precisely, at each stage each player chooses an action and pays some costs which depend on the actions of all players. However, each player’s action only influences the transition probability of the Markov chain controlled by herself. In this case, each player wishes to minimize a certain expected average cost which she is most concerned about, while keeping other kinds of expected average costs within given bounds. Game models under this formulation have been considered in References [6,7]. In Reference [6], the authors present an existence condition for stationary Nash equilibria in constrained average games. Reference [7] derives a necessary and sufficient condition for a stationary Nash equilibrium to be a global minimizer of a mathematical program.
Different from the framework of the game model considered in this paper, References [2,8,9] study so-called centralized stochastic games in which all players jointly control a single Markov chain and the one-stage costs of each player are influenced by the actions of all players. Reference [8] considers game models with expected discounted and expected average cost criteria and establishes the existence of stationary Nash equilibria in the context of a finite state space and compact action spaces. Using a vanishing discount approach, Reference [9] generalizes the existence result in Reference [8] to average games with denumerable states. Reference [2] extends the existence result in Reference [8] to discounted games with denumerable states, but under some special hypotheses on the transition laws and a boundedness condition imposed on the one-stage costs. By a special finite-state approximation approach, Reference [10] weakens the above existence condition of Reference [8]. Regarding applications of constrained games, Reference [11] studies wireless powered networks consisting of a user and an attacker. Based on the constrained game theory in Reference [6], Reference [11] gives the energy request and data transmission strategy of the user and the attack strategy of the attacker in the sense of Nash equilibria. Reference [12] considers wireless networks in which each mobile wants to maximize its expected capacity subject to some power and buffer length constraints. Reference [13] introduces a neighbor-aware energy-efficient monitoring system for energy harvesting Internet of Things; a constrained stochastic game model is established to minimize the number of transmissions while keeping a desired monitoring probability, and a best response dynamics-based algorithm is developed.
In this paper, we consider constrained average games with denumerable states and unbounded costs. The main contributions of the present paper are as follows. First, by introducing average occupation measures, we establish a so-called best-response linear program to characterize the occupation measures of stationary constrained Nash equilibria as fixed points of a certain multifunction from the product space of occupation measures into itself. The standard weak convergence technique used in References [6,7] cannot be applied directly to the case wherein the costs are unbounded. Meanwhile, the arguments in References [6,7] employ the finiteness of the state spaces. However, the costs are often unbounded and the state spaces are not finite in some controlled stochastic models, such as stochastic inventory models and queueing systems; see, for instance, References [14,15]. Therefore, we introduce the so-called w-weak convergence topology and impose the standard drift condition on the transition kernels, a growth condition and an additive condition on the costs, a w-uniform geometric ergodicity condition and continuity-compactness conditions. Using the properties of w-weak convergence, we study the asymptotic properties of average occupation measures and expected average costs, which are used to establish the upper semi-continuity of the multifunction. Then, we show the existence of stationary constrained Nash equilibria by a fixed-point theorem. It should be mentioned that the vanishing discount approach in Reference [9] is based on the existence of discounted Nash equilibria established in Reference [10]; that is, one shows that the limit of discounted Nash equilibria, as the discount factors tend to one, is an average Nash equilibrium. However, in this paper we use the fixed-point method directly, because the existence of discounted Nash equilibria for stochastic games with independent state processes has not been established. Finally, we characterize each stationary constrained Nash equilibrium as a global minimizer of a mathematical program which can be viewed as the combination of the best response linear program and its dual program.
The rest of the paper is organized as follows. In Section 2, we introduce the constrained average game model we are concerned with. In Section 3, we introduce the best response linear program and study the corresponding convergent property of the best response linear program based on the average occupation measures. In Section 4, we establish the main statements, including the existence result and the characterization of stationary constrained Nash equilibria. In Section 5, a controlled wireless network is presented to illustrate our main results.
2. The Game Model
If S is a Borel space, we denote by its Borel σ-algebra, by the set of all probability measures on it endowed with the topology of weak convergence, and by the set of all Borel measurable functions on S. Let I stand for the indicator function, and let N be a fixed integer. Let us define , and use indices i or j to denote a player. Now, we introduce the N-person constrained game model
For each , the state space is assumed to be a denumerable set and is the action space, which is assumed to be a Polish space endowed with the Borel σ-algebra . Without loss of generality, we assume and is enumerated by the natural order. Let be the product space of the state spaces and be the product space of the action spaces. For each , the nonempty measurable subset of denotes the set of actions available when the state of player i is . Let and . is the stochastic kernel on given controlled by player i, that is, the probability of moving from state to if player i chooses action . For each , the one-stage cost is a real-valued measurable function on . The constants () denote the constraints and denotes the initial distribution of player i. Let .
For each , let be the set of all histories up to time t and be the set of all stochastic kernels on given for each i, where denotes the t-power product space of .
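The primitives above can be collected in a small container. The following Python sketch is purely illustrative; the names (`PlayerModel`, `states`, `transition`, and so on) are hypothetical and not taken from the paper, and only record the data that define player i's component of the game model (her own state space and transition kernel, admissible action sets, the cost functions shared with the other players, constraint bounds, and initial distribution).

```python
# An illustrative container (not from the paper) for player i's primitives.
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class PlayerModel:
    states: Sequence[int]                          # denumerable state space X_i (truncated here)
    actions: Callable[[int], Sequence[float]]      # admissible action set A_i(x)
    transition: Callable[[int, float, int], float] # q_i(y | x, a), controlled by player i only
    costs: List[Callable[..., float]]              # c_{ik}(x, a_1, ..., a_N), k = 0, ..., p_i
    bounds: Sequence[float]                        # constraint constants d_{i1}, ..., d_{ip_i}
    initial: Dict[int, float]                      # initial distribution gamma_i
```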
Definition 1.
- (1)
- A randomized history-dependent strategy for player i is a sequence of stochastic kernels on the action space given satisfying
- (2)
- A randomized history-dependent strategy for player i is said to be randomized stationary if there is a stochastic kernel such that for each . We will write such a stationary strategy as .
Assumption 1.
The players do not observe their costs, that is, the strategy chosen by any player does not depend on the realization of the cost.
Remark 1.
Assumption 1 is imposed in References [6,7]; it ensures that a player cannot use the one-stage costs to estimate the states and actions of the other players.
The sets of all randomized history-dependent strategies and randomized stationary strategies for player i are denoted by and , respectively. A multi-strategy is a vector , where . Let denote the set of all randomized stationary multi-strategies.
Let and be the corresponding product -algebra. Then, for each and each initial distribution , the well-known Tulcea’s Theorem ([16], p. 178) ensures the existence of a unique probability measure on such that, for each and ,
where and denote the state and the action at the decision epoch t, respectively. The expectation operator with respect to is denoted by . If is concentrated at some state , we will write and as and , respectively. For each , the product measure of is denoted by and the corresponding expectation operator is denoted by . For each , we denote by the N-vector multi-strategy obtained from by replacing with . Similarly, for each , we denote by the N-vector in which the jth component is for , while the ith component is .
For each and , the expected average criteria are defined, for each player and , as
where and denote the state vector and action vector at time t, respectively.
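The displayed formula for the expected average criteria did not survive extraction. One standard form, stated here as an assumption consistent with the surrounding definitions, is the limsup Cesàro average of the expected one-stage costs:

```latex
% Assumed standard form of the expected average cost of player i for the
% k-th cost function (k = 0 is the cost to be minimized, k = 1, ..., p_i
% index the constrained costs), under initial distribution gamma and
% multi-strategy pi:
\[
  V_{ik}(\gamma,\pi)
  \;=\;
  \limsup_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}^{\pi}_{\gamma}\!\left[\sum_{t=0}^{T-1} c_{ik}(x_t,a_t)\right],
  \qquad k=0,1,\dots,p_i .
\]
```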
Definition 2.
- (1)
- For a fixed , a strategy is said to be feasible for player i against π if for each . Let be the set of all feasible strategies for player i against π.
- (2)
- is called a feasible multi-strategy for if is in for each . We denote by Δ the set of all feasible multi-strategies.
- (3)
- (Nash equilibrium) is called a constrained Nash equilibrium if and . A constrained Nash equilibrium is said to be stationary if it belongs to .
3. The Technique Preliminary
A monotone nondecreasing function such that will be referred to as a Lyapunov function on for each . For each function g and constant , we denote by the power of the function g.
To guarantee the finiteness of the expected average costs, we need the following drift conditions, which are widely used in References [14,16,17,18] for discrete-time Markov decision processes and in References [5,10] for stochastic games.
Assumption 2.
For each , there exist a Lyapunov function on , constants and such that
- (1)
- ;
- (2)
- .
By Jensen’s inequality, it follows that there exist constants and (depending on ) for each such that
Now, we give two different kinds of hypotheses imposed on the cost functions to ensure the existence of Nash equilibria.
Assumption 3.
- (1)
- (Growth condition) There exists a constant such that for each and .
- (2)
- The function is continuous in for each fixed .
Assumption 4.
Suppose that there exist a constant and functions on with for each and , such that
- (1)
- and ;
- (2)
- is continuous in for each .
Remark 2.
- (1)
- It should be mentioned that the cost functions in References [6,7,9] are assumed to be bounded. However, the cost functions are often unbounded in some controlled stochastic models, such as stochastic inventory models and queueing systems; see, for instance, References [14,15].
- (2)
- As in Assumption 4, the case that the immediate costs are additive is also considered in References [7,19] for discrete-time unconstrained stochastic games.
Lemma 1.
Suppose that Assumptions 2 and 3(1) (resp. 4(1)) hold. For each , and ,
For each function (resp. ) and , let us define the f-norm by (resp. ) and the Banach space (resp. ). The set of all bounded measurable functions on is denoted by .
For each and , let us define
For each , let denote the t-step transition probability from to corresponding to . Obviously, .
Definition 3.
For each and , the transition kernel is said to be irreducible if for all .
Assumption 5.
For each ,
- (1)
- let , there exist a probability measure on and constants and such that for each ;
- (2)
- is irreducible for all .
Remark 3.
- (1)
- Under Assumptions 2 and 5, it is easy to see that is the unique invariant probability measure of the transition kernel ; see Reference [17] (p. 12). Moreover, we have for each and for each .
- (2)
- The w-uniform geometric ergodicity condition in Assumption 5(1) has been widely used in discrete-time Markov decision processes; see References [17,18] and the references therein. Since the state space in References [6,7] is finite, only the standard ergodicity condition is required there. Moreover, as the cost functions in Reference [9] are assumed to be bounded, the weaker uniform geometric ergodicity condition is imposed in Reference [9].
For each , we define the average occupation measure for each and the set
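The defining formula of the average occupation measure was also lost in extraction. One standard form, stated as an assumption consistent with the cited literature on average-cost Markov decision processes, is the Cesàro limit of the expected state–action frequencies of player i's own chain:

```latex
% Assumed form of the average occupation measure of player i under the
% multi-strategy pi and initial distribution gamma_i:
\[
  \mu^{\pi}_{\gamma_i}(y,B)
  \;=\;
  \limsup_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}
  \mathbb{P}^{\pi}_{\gamma}\bigl(x^i_t = y,\; a^i_t \in B\bigr),
  \qquad y\in X_i,\ B\in\mathcal{B}\bigl(A_i(y)\bigr).
\]
```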
For each and , we introduce the following Markov decision processes :
where
for each and , the other components are the same as in (1).
For each and , the expected average cost is defined by
Definition 4.
For a fixed , strategy is said to be feasible for if
Let be the set of all feasible strategies of . A strategy is said to be optimal for if and
We denote by the optimal value of .
For each and , under Assumptions 2, 3 (resp. 4) and 5, we have
Now, we introduce the following constrained optimality problem
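Problem (6) itself is not legible in this version. By Definition 4 it is the single-player constrained average-cost problem faced by player i when the strategies of the other players are frozen; we restate it here in an assumed standard form:

```latex
% Assumed form of the constrained optimality problem (6) for player i,
% with the strategies pi^{-i} of the other players fixed:
\[
  \text{minimize } V_{i0}\bigl(\gamma_i,[\pi^{-i},\sigma]\bigr)
  \ \text{ over } \sigma\in\Pi_i
  \quad\text{subject to}\quad
  V_{ik}\bigl(\gamma_i,[\pi^{-i},\sigma]\bigr)\le d_{ik},
  \qquad k=1,\dots,p_i .
\]
```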
Lemma 2.
Suppose that Assumptions 1, 2 and 5 are satisfied and in addition either Assumption 3 or 4 holds. Let and with for each . Then
Proof.
We only prove for . Let us define
First, we assume Assumption 3 holds. It follows from Assumption 5 that, for each player , we have
where for each . For notational ease, we define for and for and . Hence,
Thus, we have that
which implies the desired result. On the other hand, if Assumption 4 holds,
□
Let be the function as in Assumption 2 for each . Let us define and .
Definition 5.
For each , let . is said to converge to with respect to the -weak topology if and only if for each . In this case, we denote it by .
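Written out, the convergence in Definition 5 is the usual weighted weak convergence; the precise normalization below is our assumption, chosen to match the w-weighted norms used earlier:

```latex
% mu_n converges to mu in the w_i-weak topology iff the integrals of every
% continuous function of w_i-bounded growth converge:
\[
  \mu_n \xrightarrow{\,w_i\,} \mu
  \quad\Longleftrightarrow\quad
  \lim_{n\to\infty}\int u\,d\mu_n=\int u\,d\mu
  \ \ \text{for all continuous } u \text{ with }
  \sup_{(x,a)}\frac{|u(x,a)|}{w_i(x)}<\infty .
\]
```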
Let and with . We say if for each . For each and , we define for each .
To ensure the existence of constrained Nash equilibria, we also need the following standard continuity-compactness conditions, which are widely used; see, for instance, References [4,14,16,17,19] and the references therein.
Assumption 6.
- (1)
- For each and , is a compact set.
- (2)
- The function is continuous in for each fixed and .
- (3)
- The function is continuous in for each fixed and .
Lemma 3.
Suppose that Assumptions 2, 5 and 6 hold. For each , the set is convex and compact with respect to the -weak topology.
Proof.
Fix . It follows from Remark 3 that and the set is compact in for each . Hence, it follows from Corollary A.46 in Reference [20] that the set is compact with respect to the -weak topology. The other statement can be obtained by Lemma 5.2.2 in Reference [18]. □
Remark 4.
The standard weak convergence technique used in References [6,7] for bounded costs does not apply directly to the case wherein costs are unbounded in this paper.
Lemma 4.
Suppose that Assumptions 1, 2, 5 and 6 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed, and , such that and weakly in . Then,
for each .
Proof.
For each , we assume . According to Proposition D.8 in Reference [16], there exist and such that and for each . For an arbitrary , it follows from Lemma 5.1.2 in Reference [18] that
Since , it follows from Assumption 6(2) that
which together with Lemma 5.1.2 in Reference [18] implies that for each . Similarly, we also have . Thus, we can see that
and weakly in for each . It follows from Assumption 3(2) or Assumption 4 that
for each , where for each .
Under Assumptions 2(2) and 5, we have that
If Assumption 3 is replaced by Assumption 4, then we have
□
The following Slater condition is common for constrained games; see References [2,6,7,8,9,10].
Assumption 7.
(Slater condition) For each stationary multi-strategy and each player i, there exists such that
Lemma 5.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let and be fixed. Then,
- (1)
- for each , there exists a stationary strategy (depending on , and ), such that for each ;
- (2)
- there exists a stationary strategy such that for each .
Proof.
- (1)
- See Lemma 5.7.10 in Reference [16].
- (2)
- Let such that for each . For the fixed , let be the corresponding strategy as in Assumption 7. It follows from Lemma 2 that for each . By part (1), there exists a stationary strategy (depending on and ) such that for each .
□
It follows from Lemma 5.1.2 in Reference [18] that if and only if
According to Lemma 5, the constrained optimality problem (6) can be restricted to the set of all stationary strategies. Hence, for each and , the constrained optimality problem (6) is equivalent to the following best response linear program :
For a fixed and , is said to be feasible for LP if it satisfies (10). The sets of all feasible solutions and optimal solutions are denoted by and , respectively. The optimal value of is denoted by .
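To make the occupation-measure formulation concrete, the following Python sketch solves a best-response linear program of the LP type on a hypothetical finite truncation of the state space with a finite action grid; all numerical data (the transition law, both cost functions, and the constraint bound `d1`) are illustrative assumptions, not taken from the paper. It minimizes one average cost over occupation measures subject to the balance equations, normalization, and one constrained average cost.

```python
# A minimal sketch of a best-response linear program under the stated assumptions.
import numpy as np
from scipy.optimize import linprog

nS, nA = 4, 3                       # truncated state set and action grid sizes
rng = np.random.default_rng(0)

# p[x, a, y]: transition law of the player's own controlled chain (illustrative).
p = rng.random((nS, nA, nS))
p /= p.sum(axis=2, keepdims=True)

c0 = rng.random((nS, nA))           # cost to be minimized (others' strategies frozen)
c1 = rng.random((nS, nA))           # constrained cost
d1 = 0.8                            # illustrative constraint bound

nz = nS * nA                        # variables z(x, a) >= 0, indexed by x * nA + a

# Balance equations sum_a z(y, a) = sum_{x, a} p(y | x, a) z(x, a), plus normalization.
A_eq = np.zeros((nS + 1, nz))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = (1.0 if x == y else 0.0) - p[x, a, y]
A_eq[nS, :] = 1.0
b_eq = np.append(np.zeros(nS), 1.0)

# Minimize <c0, z> subject to the constrained average cost <c1, z> <= d1.
res = linprog(c0.ravel(), A_ub=c1.reshape(1, nz), b_ub=[d1],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
if not res.success:
    raise RuntimeError(res.message)

# Recover a stationary strategy phi(a | x) from the optimal occupation measure.
z = res.x.reshape(nS, nA)
phi = z / np.maximum(z.sum(axis=1, keepdims=True), 1e-12)
print("optimal constrained average cost:", res.fun)
print("stationary best-response strategy:\n", np.round(phi, 3))
```

The last two lines mirror the disintegration step (Proposition D.8 in Reference [16]) used in the proofs above to pass from an occupation measure back to a stationary strategy.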
Proposition 1.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let and such that for each . Then,
- (1)
- is an optimal strategy of for each if and only if φ is a stationary constrained Nash equilibrium;
- (2)
- is an optimal strategy of LP for each if and only if φ is a stationary constrained Nash equilibrium.
Proof.
- (1)
- ⟹. Let be fixed and . It follows from Lemma 2 that . Thus, we have . Then, Lemma 2 yields that , which implies that is a constrained Nash equilibrium. ⟸. For each , we take an arbitrary strategy such that for each . Lemma 2 yields that for each , which implies that . As is a constrained Nash equilibrium, it follows that , which implies that is an optimal strategy of .
- (2)
- is optimal for LP if and only if is optimal for . Hence, we get the desired result by part (1). □
Lemma 6.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and such that , and for each . Then, is relatively compact in with respect to the -weak topology and each accumulation point is a feasible solution of LP.
Proof.
It follows from Lemma 3 that there exists a subsequence such that with respect to the -weak topology. Then, Lemma 4 yields that
As , it follows that
which together with (11) implies that . □
Lemma 7.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and take an arbitrary . Then, LP has an optimal solution.
Proof.
First, it follows from Lemma 5(2) that . Let be a minimizing sequence such that
By Lemma 6, there exists a subsequence of such that . Then, it follows from Lemma 4 and (12) that
That is, . □
The idea of the proof of the following lemma is from Lemma 5.1 and Theorem 3.9 in Reference [21] for constrained discrete-time Markov decision processes with discounted cost criteria.
Lemma 8.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let be fixed and such that . Then,
- (1)
- for each , there exist an integer N and for all , such that weakly in ;
- (2)
- if is an accumulation point of sequence in which is an optimal solution of LP for each , then
Proof.
- (1)
- Lemma 5(2) gives the existence of and a constant , such that and for all . By Lemma 4, it follows that for all . Let be such that . By (13)–(14), there exists an integer (depending on ) such that, for each and , . Let , where . Then, we derive from Lemma 3 that , and for each and , which implies that for all . Then, let , such that and . For each fixed (corresponding to a given ), as in the previous argument, there exists an integer (depending on s), which is assumed to be increasing in , such that . Let
- (2)
- By Lemma 6, without loss of generality, we assume that weakly in . On the other hand, for any , it follows from part (1) that there exist an integer N and for all , such that weakly in , which together with , gives . Then, by Lemma 4 and (18), we have , which implies .
4. The Main Results
In this section, we give the existence and characterization of stationary constrained Nash equilibria. For each , we introduce the following two multi-functions:
Theorem 1.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, there exists a stationary constrained Nash equilibrium.
Proof.
Let be fixed, and such that and . Then, by Lemma 8(2), we know that , which implies that is upper semi-continuous. By the arbitrariness of i, we can derive that is upper semi-continuous. Moreover, Lemmas 3 and 7 and (10) imply that the set is nonempty and convex for each . Moreover, we can see that is compact by the upper semi-continuity of . Hence, it follows from Fan’s fixed point theorem in Reference [22] that has a fixed point . By Proposition D.8 in Reference [16], there exists such that , which together with Proposition 1 implies that is a stationary constrained Nash equilibrium. □
By the same arguments as in the proofs of Theorems 5.3.1 and 5.4.1 in Reference [18] for discrete-time Markov decision processes, we can obtain the following statements.
Lemma 9.
Suppose that Assumptions 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, for each given ,
- (1)
- there exists a function and a vector for each such that
- (2)
- for each , there exists such that and .
Now, we introduce the following dual program for each and :
Combining and together, we introduce the following mathematical program :
where . Let us define the function , for each .
In Reference [7], the authors characterize each stationary constrained Nash equilibrium in finite-state games as a global minimum of a certain mathematical program. The following theorem generalizes this result to denumerable-state games.
Theorem 2.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then,
- (1)
- there exists such that it is a global minimum of the mathematical program MP;
- (2)
- suppose that is a global minimum of the mathematical program MP with . Then, there exists such that and is a constrained Nash equilibrium of the model .
Proof.
- (1)
- Let be a constrained Nash equilibrium and let be the average occupation measure corresponding to . By Proposition 1, satisfies (10) if is replaced by . Theorem 5.3.1 in Reference [18] and Theorem 10.3.6 in Reference [17] yield that there exists which satisfies (20), by replacing with , and for each . Letting , we have . In turn, let be a feasible point of ; we get that , which implies .
- (2)
Then, let and take an arbitrary feasible solution of ; we have
Then, proceeding as in the proof of (22), it follows that
which implies is optimal for each . Then, Proposition D.8 in Reference [16] yields that there exists a stationary strategy such that for each . By Proposition 1, is a stationary constrained Nash equilibrium of the model .
□
5. An Example
In this section, we use a wireless network to illustrate our conditions and main results.
Example 1.
Consider a wireless network in which there are N nodes and each node consists of a mobile, a buffer and a channel. Let denote the number of packets in the buffer of node i; we assume that new packets are not admitted if the buffer is not empty. At time t, if , each mobile transmits a packet with power , where . We assume each mobile will retransmit the packet at time if the packet has not been transmitted successfully. When , the number of new arrivals at time t is denoted by , which are assumed to be identically distributed. The number of packets in buffer i and the action are only available to mobile i itself. Hence, the transition probabilities are defined as follows: if
and , where with for each . Let denote the initial distribution of the buffer i, be some base value of the power, be the cost of power for node i and denote the delay cost for node i. As in Reference [12], we assume that the signal to interference ratio of mobile i denoted by is given by
where are the coding orthogonality coefficients, is the thermal noise in the medium, and are some constants. Let denote the cost that each mobile wants to minimize, and denote the constraints of node i corresponding to and .
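The following Python sketch simulates the node dynamics of Example 1 under illustrative assumptions: a buffer that holds at most one packet batch, a transmission success probability that increases with the signal-to-interference ratio, and the standard ratio sir_i = h_i a_i / (sigma + sum_{j≠i} alpha_{ij} h_j a_j). All numbers (available powers, arrival law, channel gains, orthogonality coefficients, noise level) and the success-probability model are hypothetical, not taken from the paper.

```python
# A sketch of Example 1's buffer dynamics under the stated assumptions.
import numpy as np

rng = np.random.default_rng(1)
N = 3                                          # number of nodes
powers = np.array([1.0, 2.0, 3.0])             # available transmission powers (assumed)
alpha = 0.1 * (np.ones((N, N)) - np.eye(N))    # coding orthogonality coefficients (assumed)
h = np.array([0.8, 1.0, 1.2])                  # channel gain constants (assumed)
sigma = 0.5                                    # thermal noise in the medium (assumed)

def sir(i, a):
    """Signal-to-interference ratio of mobile i given the power vector a."""
    interference = sigma + np.sum(alpha[i] * h * a)   # own term drops since alpha[i, i] = 0
    return h[i] * a[i] / interference

def step(x, a):
    """One transition of the buffer lengths x under the power vector a."""
    x_next = x.copy()
    for i in range(N):
        if x[i] > 0:
            # Transmission succeeds with a probability increasing in sir_i (assumed model);
            # otherwise the packet is retransmitted at the next epoch.
            if rng.random() < 1.0 - np.exp(-sir(i, a)):
                x_next[i] -= 1
        else:
            # New packets arrive only when the buffer is empty (assumed Poisson arrivals).
            x_next[i] = rng.poisson(1.0)
    return x_next

x = np.zeros(N, dtype=int)
for t in range(5):
    a = rng.choice(powers, size=N)             # placeholder for the players' strategies
    x = step(x, a)
    print(t, x)
```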
Condition 1.
- (1)
- (2)
- There exists an interval such that for each .
- (3)
- There exists such that for each i.
- (4)
- There exists an action such that and for each i.
Proposition 2.
Under Condition 1, Example 1 satisfies Assumptions 1, 2, 3, 5, 6 and 7. Hence, by Theorem 1, there exists a stationary constrained Nash equilibrium.
Proof.
For each i and , let denote the invariant measure and , which satisfies
Hence, it follows from Condition 1(1) that
which implies that
Let us define , the functions and and for each . It is obvious that
If ,
and
Thus, by Condition 1(2), it follows that
Hence, the model satisfies Assumption 6(3). Let and . Similarly, if ,
if ,
Therefore, we have
for each , which together with Condition 1(3) and Proposition 10.2.5 in Reference [17], implies Assumptions 2 and 5 with . Let for each i and , it follows from (23) and Condition 1(1) that
which together with Condition 1(4) implies Assumption 7. It is obvious that Assumptions 3 and 6(1)–(2) hold for this model. Thus, by Theorem 1, there exists a stationary constrained Nash equilibrium. □
6. Conclusions
In the present paper, we have studied discrete-time constrained stochastic games with denumerable state spaces under the average cost criteria. By introducing average occupation measures, we have established the best-response linear program and characterized the average occupation measures of stationary Nash equilibria as fixed points of a certain multifunction. By introducing the so-called w-weak convergence topology, we have studied the asymptotic properties of average occupation measures and given a new existence condition for stationary constrained Nash equilibria. Furthermore, we have established a mathematical program and shown that each stationary Nash equilibrium is a global minimizer of this mathematical program. It should be mentioned that the arguments in References [6,7] employ the compactness of the space of occupation measures with respect to the standard weak convergence topology and the solvability of the best response linear program, which cannot be applied directly to the case of unbounded costs and denumerable state spaces. However, the costs are usually unbounded and the state space is not finite in some controlled stochastic models. Moreover, the fixed point method in the present paper is also different from the vanishing discount method in Reference [9], which is based on the existence of constrained discounted Nash equilibria. Finally, a wireless network has been presented to illustrate the applications of our main results.
Funding
This research was funded by the National Natural Science Foundation of China (Grant No. 11801080).
Conflicts of Interest
The author declares no conflict of interest.
References
- Shapley, L.S. Stochastic games. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100.
- Alvarez-Mena, J.; Hernández-Lerma, O. Existence of Nash equilibria for constrained stochastic games. Math. Methods Oper. Res. 2006, 63, 261–285.
- Filar, J.; Vrieze, K. Competitive Markov Decision Processes; Springer: New York, NY, USA, 1997.
- Nowak, A.S. On stochastic games in economics. Math. Methods Oper. Res. 2007, 66, 513–530.
- Yang, J.; Guo, X.P. Zero-sum stochastic games with average payoffs: New optimality conditions. Acta Math. Sin. Engl. Ser. 2009, 25, 1201–1216.
- Altman, E.; Avrachenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Sadoc Menasche, D. Constrained cost-coupled stochastic games with independent state processes. Oper. Res. Lett. 2008, 36, 160–164.
- Singh, V.V.; Hemachandra, N. A characterization of stationary Nash equilibria of constrained stochastic games with independent state processes. Oper. Res. Lett. 2014, 42, 48–52.
- Altman, E.; Shwartz, A. Constrained Markov Games: Nash Equilibria. In Annals of the International Society of Dynamic Games: Advances in Dynamic Games and Applications; Filar, J.A., Gaitsgory, V., Mizukami, K., Eds.; Birkhäuser: Boston, MA, USA, 2000; Volume 5, pp. 213–221.
- Wei, Q.D.; Chen, X. Constrained stochastic games with the average payoff criteria. Oper. Res. Lett. 2015, 43, 83–88.
- Zhang, W.Z.; Huang, Y.H.; Guo, X.P. Nonzero-sum constrained discrete-time Markov games: The case of unbounded costs. TOP 2014, 22, 1074–1102.
- Niyato, D.; Wang, P.; Kim, D.I.; Han, Z.; Xiao, L. Game theoretic modeling of jamming attack in wireless powered communication networks. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 6018–6023.
- Altman, E.; Avratchenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Menasché, D.S. Constrained stochastic games in wireless networks. In Proceedings of the IEEE GLOBECOM 2007—IEEE Global Telecommunications Conference, Washington, DC, USA, 26–30 November 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 315–320.
- Ko, H.; Pack, S. Neighbor-aware energy-efficient monitoring system for energy harvesting internet of things. IEEE Internet Things J. 2019, 6, 5745–5752.
- Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; Wiley: New York, NY, USA, 1994.
- Sennott, L.I. Stochastic Dynamic Programming and the Control of Queueing Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Hernández-Lerma, O.; Lasserre, J.B. Discrete-Time Markov Control Processes: Basic Optimality Criteria; Springer: New York, NY, USA, 1996.
- Hernández-Lerma, O.; Lasserre, J.B. Further Topics on Discrete-Time Markov Control Processes; Springer: New York, NY, USA, 1999.
- Mendoza-Pérez, A. Pathwise Average Reward Markov Control Processes. Ph.D. Thesis, CINVESTAV-IPN, Mexico City, Mexico, 2008. Available online: http://www.math.cinvestav.mx/ohernand_students (accessed on 22 October 2019).
- Nowak, A.S. Remarks on sensitive equilibria in stochastic games with additive reward and transition structure. Math. Methods Oper. Res. 2006, 64, 481–494.
- Föllmer, H.; Schied, A. Stochastic Finance: An Introduction in Discrete Time; Walter de Gruyter: Berlin, Germany, 2004.
- Alvarez-Mena, J.; Hernández-Lerma, O. Convergence of the optimal values of constrained Markov control processes. Math. Methods Oper. Res. 2002, 55, 461–484.
- Fan, K. Fixed-point and minimax theorems in locally convex topological linear spaces. Proc. Natl. Acad. Sci. USA 1952, 38, 121–126.
© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).