
Games 2018, 9(2), 21; https://doi.org/10.3390/g9020021

Article
Bifurcation Mechanism Design—From Optimal Flat Taxes to Better Cancer Treatments
1 Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78705, USA
2 Integrated Mathematical Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
3 Engineering Systems and Design (ESD), Singapore University of Technology and Design, 8 Somapah Road, Singapore 487372, Singapore
* Author to whom correspondence should be addressed.
Received: 25 February 2018 / Accepted: 18 April 2018 / Published: 26 April 2018

Abstract
Small changes to the parameters of a system can lead to abrupt qualitative changes in its behavior, a phenomenon known as bifurcation. Such instabilities are typically considered problematic; however, we show that their power can be leveraged to design novel types of mechanisms. Hysteresis mechanisms use transient changes of system parameters to induce a permanent improvement in performance via optimal equilibrium selection. Optimal control mechanisms induce convergence to states whose performance is better than even the best equilibrium. We apply these mechanisms in two different settings that illustrate the versatility of bifurcation mechanism design. In the first, we explore how introducing flat taxation could improve social welfare, despite decreasing agent “rationality,” by destabilizing inefficient equilibria. We then consider a well-known game of tumor metabolism and use our approach to derive potential new cancer treatment strategies.
Keywords:
game theory; cancer; economics; hysteresis

1. Introduction

The term bifurcation, which means splitting in two, is used to describe abrupt qualitative changes in system behavior due to smooth variation of its parameters. Bifurcations are ubiquitous and permeate all natural phenomena. Effectively, they produce discrete events (e.g., rain breaking out) out of smoothly varying, continuous systems (e.g., small changes to humidity or temperature). Typically, they are studied through bifurcation diagrams, multi-valued maps that prescribe how each parameter configuration translates to possible system behaviors (e.g., Figure 1).
Bifurcations arise in a natural way in game theory. Games are typically studied through their Nash correspondences, multi-valued maps connecting the parameters of the game (i.e., payoff matrices) to system behavior, in this case Nash equilibria. As we slowly vary the parameters of the game, the Nash equilibria typically also vary smoothly, except at bifurcation points where, for example, the number of equilibria abruptly changes as some equilibria appear or disappear altogether. Such singularities may substantially impact both system behavior and system performance. For example, if the system state was at an equilibrium that disappeared during the bifurcation, then a turbulent transitional period ensues where the system tries to reorganize itself at one of the remaining equilibria. Moreover, the quality of all remaining equilibria may be significantly worse than the original. Even more disturbingly, it is not a priori clear that the system will equilibrate at all. Successive bifurcations that lead to increasingly complicated recurrent behavior are a standard route to chaos [1], which may have devastating effects on system performance.
Game theorists are particularly aware of the need to produce “robust” predictions, i.e., predictions that allow for deviations from an idealized exact specification of the parameters of the setting [2]. For example, $\epsilon$-approximate Nash equilibria allow for the possibility of computationally bounded agents, whereas $\epsilon$-regret outcomes allow for persistently non-equilibrating behavior [3]. These approaches, however, do not address the problem at its core, as any solution concept defines a map from parameter space to behavioral space and no such map is immune to bifurcations. If pushed hard enough, any system will destabilize. The question is: what happens next?
Well, a lot of things may happen. It is intuitively clear that if we are allowed to play around arbitrarily with the payoffs of the agents then we can reproduce any game and no meaningful analysis is possible. Using payoff entries as controlling parameters is problematic for another reason. It is not clear that there exists a compelling parametrization of the payoff space that captures how real life decision-makers deviate from the Platonic ideal of the payoff matrix. Instead, we focus on another popular aspect of economic theory: agent “rationality”.
We adopt a standard model of boundedly rational learning agents. Boltzmann Q-learning dynamics [4,5,6] is a well-studied behavioral model in which agents are parameterized by a temperature/rationality term $T$. Each agent keeps track of the collective past performance of his/her actions (i.e., learns from experience) and chooses an action according to a Boltzmann/Gibbs distribution with parameter $T$. When applied to a multi-agent game, the behavioral fixed points of Q-learning are known as quantal response equilibria (QREs) [7]. Naturally, QREs depend on the temperature $T$. As $T \to 0$, players become perfectly rational and play approaches a Nash equilibrium,¹ whereas as $T \to \infty$ all agents use uniformly random strategies. As we vary the temperature, the QRE($T$) correspondence moves between these two extremes, producing bifurcations along the way at critical points where the number of QREs changes (Figure 1).
Our goal in this paper is to quantify the effects of these rationality-driven bifurcations on the social welfare of two-player two-strategy games. At this point, a moment of pause is warranted. Why is this a worthy goal? Games of small size (2 × 2 games in particular) rarely seem like a subject worthy of serious scientific investigation. This, however, could not be further from the truth.
First, the correct way to interpret this setting is from the point of view of population games, where each agent is better understood as a large homogeneous population (e.g., men and women, attackers and defenders, cells of phenotype A and cells of phenotype B). Each of a handful of different types of users has only a few meaningful actions available to them. In fact, from the perspective of applied game theory, only such games with a small number of parameters are practically meaningful. The reason should be clear by now. Any game theoretic modeling of a real-life scenario is invariably noisy and inaccurate. In order for game-theoretic predictions to be practically binding, they have to be robust to these uncertainties. If the system intrinsically has a large number of independent parameters, e.g., 20, then this parameter space will almost certainly encode a vast number of bifurcations, which invalidate any theoretical prediction. Practically useful models need to be small.
Secondly, game theoretic models applied for scientific purposes are often small. Specifically, the exact setting studied here, with Boltzmann Q-learning dynamics applied in 2 × 2 games, has been used to model the effects of taxation on agent rationality [9] (see Section 6.2 for a more extensive discussion) as well as to model the effects of treatments that trigger phase transitions in cancer dynamics [10] (see Section 6.1). Our approach yields insights into explicit open questions in both of these application areas. In fact, direct application of our analysis can address similar inquiries for any other phenomenon modeled by Q-learning dynamics applied in 2 × 2 games.
Finally, the analysis itself is far from straightforward as it requires combining sets of tools and techniques that have so far been developed in isolation from each other. On one hand, we need to understand the behavior of these dynamical systems using tools from topology of dynamical systems, whose implications are largely qualitative (e.g., prove the lack of cyclic trajectories). On the other hand, we need to leverage these tools to quantify at which exact parameter values bifurcations occur and produce price-of-anarchy guarantees, which by definition are quantitative. As far as we know, this is the first instance of a fruitful combination of these tools. In fact, not only do we show how to analyze the effects of bifurcations to system efficiency, we also show how to leverage this understanding (e.g., knowledge of the geometry of the bifurcation diagrams) to design novel types of mechanisms with good performance guarantees.

Our Contribution

We introduce two different types of mechanisms: hysteresis and optimal control mechanisms.
Hysteresis mechanisms use transient changes to the system parameters to induce permanent improvements to its performance via optimal (Nash) equilibrium selection. The term hysteresis is derived from an ancient Greek word that means “to lag behind.” It reflects a time-based dependence between the system’s present output and its past inputs. For example, assume that we start from a game theoretic system of Q-learning agents with temperature $T = 0$ and that the system has converged to an equilibrium. By increasing the temperature beyond some critical threshold and then bringing it back to zero, we can force the system to provably converge to another equilibrium, e.g., the best (Nash) equilibrium (Figure 1, Theorem 4). Thus, we can ensure performance equivalent to that of the price of stability instead of the price of anarchy. One attractive feature of this mechanism is that, from the perspective of the central designer, it is relatively “cheap” to implement. Whereas typical mechanisms require the designer to continuously intervene (e.g., by paying the agents) to offset their greedy tendencies, this mechanism is transient and requires only a finite amount of total effort from the designer. Further, the idea that game theoretic systems effectively have systemic memory is rather interesting and could find other applications within algorithmic game theory.
Optimal control mechanisms induce convergence to states whose performance is better than even the best Nash equilibrium. Thus, we can at times even beat the price of stability (Theorem 5). Specifically, we show that by controlling the exploration/exploitation tradeoff, we can achieve strictly better states than those achievable by perfectly rational agents. In order to implement such a mechanism, it does not suffice to identify the right set of agents’ parameters/temperatures so that the system has some QRE whose social welfare is better than the best Nash. We need to design a trajectory through the parameter space so that this optimal QRE becomes the final resting point.

2. Preliminaries

2.1. Game Theory Basics: 2 × 2 Games

In this paper, we focus on 2 × 2 games, i.e., games with two players, each of whom has two actions. We write the payoff matrices of the game for the two players as
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
respectively. The entry $a_{ij}$ denotes the payoff for Player 1 when s/he chooses action $i$ and his/her opponent chooses action $j$; similarly, $b_{ij}$ denotes the payoff for Player 2 when s/he chooses action $i$ and his/her opponent chooses action $j$. We define $x$ as the probability that Player 1 chooses his/her first action, and $y$ as the probability that Player 2 chooses his/her first action. We also define the two vectors $\mathbf{x} = (x, 1-x)^T$ and $\mathbf{y} = (y, 1-y)^T$ as the strategies of the two players. For simplicity, we denote the $i$-th entry of vector $\mathbf{x}$ by $x_i$. We call the tuple $(x, y)$ the system state or the strategy profile.
An important solution concept in game theory is the Nash equilibrium, in which no user can profit by unilaterally changing his/her strategy:
Definition 1 (Nash equilibrium).
A strategy profile $(x_{NE}, y_{NE})$ is a Nash equilibrium (NE) if
$$x_{NE} \in \arg\max_{x \in [0,1]} \mathbf{x}^T A \mathbf{y}_{NE}, \qquad y_{NE} \in \arg\max_{y \in [0,1]} \mathbf{y}^T B \mathbf{x}_{NE}.$$
We call $(x_{NE}, y_{NE})$ a pure Nash equilibrium (PNE) if both $x_{NE} \in \{0, 1\}$ and $y_{NE} \in \{0, 1\}$. A Nash equilibrium assumes each user is fully rational. An alternative solution concept is the quantal response equilibrium [7], which assumes that each user has bounded rationality:
Definition 2 (Quantal response equilibrium).
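A strategy profile $(x_{QRE}, y_{QRE})$ is a QRE with respect to temperatures $T_x$ and $T_y$ if
$$\begin{aligned}
x_{QRE} &= \frac{e^{\frac{1}{T_x}(A\mathbf{y}_{QRE})_1}}{\sum_{j \in \{1,2\}} e^{\frac{1}{T_x}(A\mathbf{y}_{QRE})_j}}, &
1 - x_{QRE} &= \frac{e^{\frac{1}{T_x}(A\mathbf{y}_{QRE})_2}}{\sum_{j \in \{1,2\}} e^{\frac{1}{T_x}(A\mathbf{y}_{QRE})_j}}, \\
y_{QRE} &= \frac{e^{\frac{1}{T_y}(B\mathbf{x}_{QRE})_1}}{\sum_{j \in \{1,2\}} e^{\frac{1}{T_y}(B\mathbf{x}_{QRE})_j}}, &
1 - y_{QRE} &= \frac{e^{\frac{1}{T_y}(B\mathbf{x}_{QRE})_2}}{\sum_{j \in \{1,2\}} e^{\frac{1}{T_y}(B\mathbf{x}_{QRE})_j}}.
\end{aligned}$$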
Analogous to the definition of Nash equilibria, we can view a QRE as the case where each player maximizes not only the expected utility but also the entropy of his/her strategy. Specifically, the QREs are the solutions of the following program, whose objectives are linear combinations of expected utility and entropy:
$$x_{QRE} \in \arg\max_{x} \Big\{ \mathbf{x}^T A \mathbf{y}_{QRE} - T_x \sum_j x_j \ln x_j \Big\}, \qquad y_{QRE} \in \arg\max_{y} \Big\{ \mathbf{y}^T B \mathbf{x}_{QRE} - T_y \sum_j y_j \ln y_j \Big\}.$$
This formulation appears widely in the Q-learning dynamics literature (e.g., [9,11,12]). It shows that the two parameters $T_x$ and $T_y$ control the weighting between the utility and the entropy. We call $T_x$ and $T_y$ the temperatures, and their values define the level of irrationality. If $T_x$ and $T_y$ are zero, then both players are fully rational, and the system state is a Nash equilibrium. However, if both $T_x$ and $T_y$ are infinite, then each player chooses his/her action according to a uniform distribution, which corresponds to fully irrational players.
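As a minimal sketch of Definition 2 (not taken from the paper), the code below evaluates the Boltzmann response for two actions and measures how far a candidate state is from satisfying the QRE conditions. The function names `logit_response` and `qre_residual` and the example game are hypothetical, and the convention $\mathbf{x} = (x, 1-x)^T$, $\mathbf{y} = (y, 1-y)^T$ from Section 2.1 is assumed.

```python
import math

def logit_response(u1: float, u2: float, temp: float) -> float:
    """Probability of the first action under a Boltzmann choice rule."""
    z = (u1 - u2) / temp
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def qre_residual(A, B, x, y, Tx, Ty) -> float:
    """Largest violation of the two QRE conditions at the state (x, y)."""
    Ay1 = A[0][0] * y + A[0][1] * (1 - y)   # (A y)_1
    Ay2 = A[1][0] * y + A[1][1] * (1 - y)   # (A y)_2
    Bx1 = B[0][0] * x + B[0][1] * (1 - x)   # (B x)_1
    Bx2 = B[1][0] * x + B[1][1] * (1 - x)   # (B x)_2
    return max(abs(x - logit_response(Ay1, Ay2, Tx)),
               abs(y - logit_response(Bx1, Bx2, Ty)))

# Hypothetical symmetric coordination game: by symmetry, the uniform
# state (0.5, 0.5) satisfies the QRE conditions at any temperatures.
identity = [[1, 0], [0, 1]]
print(qre_residual(identity, identity, 0.5, 0.5, Tx=1.0, Ty=3.0))
```

A residual of zero means the state is a QRE for the given temperatures; the same check can be applied to any candidate state.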

2.2. Efficiency of an Equilibrium

The performance of a system state can be measured via the social welfare. Given a system state $(x, y)$, we define the social welfare as the sum of the expected payoffs of all users in the system:
Definition 3.
Given a 2 × 2 game with payoff matrices $A$ and $B$, and a system state $(x, y)$, the social welfare is defined as
$$SW(x, y) = xy(a_{11} + b_{11}) + x(1-y)(a_{12} + b_{21}) + y(1-x)(a_{21} + b_{12}) + (1-x)(1-y)(a_{22} + b_{22}).$$
In the context of algorithmic game theory, we can measure the efficiency of a game by comparing the best social welfare with the social welfare of equilibrium system states. We call the strategy profile that achieves the maximal social welfare the socially optimal (SO) strategy profile. The efficiency of a game is often described through the notions of the price of anarchy (PoA) and the price of stability (PoS). Given a set of equilibrium states $S$, we define the PoA/PoS as the ratio of the social welfare of the socially optimal state to the social welfare of the worst/best equilibrium state in $S$, respectively. Formally,
Definition 4.
Given a 2 × 2 game with payoff matrices $A$ and $B$, and a set of equilibrium system states $S \subseteq [0, 1]^2$, the price of anarchy (PoA) and the price of stability (PoS) are defined as
$$PoA(S) = \frac{\max_{(x,y) \in [0,1]^2} SW(x, y)}{\min_{(x,y) \in S} SW(x, y)}, \qquad PoS(S) = \frac{\max_{(x,y) \in [0,1]^2} SW(x, y)}{\max_{(x,y) \in S} SW(x, y)}.$$
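The following sketch, which is illustrative rather than part of the paper, evaluates Definitions 3 and 4 for a finite set of equilibrium states. The function names and the example game are hypothetical. It relies on the fact that $SW$ is bilinear in $(x, y)$, so its maximum over $[0,1]^2$ is attained at one of the four pure profiles, and it assumes the equilibrium welfare values are positive.

```python
from itertools import product

def social_welfare(A, B, x, y):
    """SW(x, y) from Definition 3; B[i][j] is Player 2's payoff for own action i."""
    return (x * y * (A[0][0] + B[0][0])
            + x * (1 - y) * (A[0][1] + B[1][0])
            + y * (1 - x) * (A[1][0] + B[0][1])
            + (1 - x) * (1 - y) * (A[1][1] + B[1][1]))

def optimal_welfare(A, B):
    # A bilinear function on the unit square attains its maximum at a corner.
    return max(social_welfare(A, B, x, y) for x, y in product((0, 1), repeat=2))

def price_of_anarchy(A, B, S):
    return optimal_welfare(A, B) / min(social_welfare(A, B, x, y) for x, y in S)

def price_of_stability(A, B, S):
    return optimal_welfare(A, B) / max(social_welfare(A, B, x, y) for x, y in S)

# Hypothetical coordination game whose pure Nash equilibria are (0, 0) and (1, 1).
A = [[3, 0], [0, 1]]
B = [[3, 0], [0, 1]]
pne = [(0, 0), (1, 1)]
print(price_of_anarchy(A, B, pne), price_of_stability(A, B, pne))  # prints 3.0 1.0
```

Here the worst equilibrium has welfare 2 and the best has welfare 6, which is also the social optimum, so the gap between PoA and PoS is exactly what an equilibrium selection mechanism can recover.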

3. Our Model

3.1. Q-Learning Dynamics

In this paper, we are particularly interested in the scenario when both players’ strategies are evolving under Q-learning dynamics:
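$$\begin{aligned}
\dot{x}_i &= x_i \Big[ (A\mathbf{y})_i - \mathbf{x}^T A \mathbf{y} + T_x \sum_j x_j \ln(x_j / x_i) \Big], \\
\dot{y}_i &= y_i \Big[ (B\mathbf{x})_i - \mathbf{y}^T B \mathbf{x} + T_y \sum_j y_j \ln(y_j / y_i) \Big].
\end{aligned} \tag{2}$$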
Q-learning dynamics has been studied because of its connection with multi-agent learning problems. For example, it has been shown in [13,14] that Q-learning dynamics captures the system evolution of a repeated game, where each player learns his/her strategy through Q-learning and Boltzmann selection rules. More details are provided in Appendix A.
An important observation about the dynamics of Equation (2) is that they exhibit the exploration/exploitation tradeoff [14]. The right-hand side of Equation (2) is composed of two parts. The first part, $x_i [ (A\mathbf{y})_i - \mathbf{x}^T A \mathbf{y} ]$, is exactly the vector field of the replicator dynamics [15]. The replicator dynamics drive the system toward states of higher utility for both players. As a result, we can consider this part as a selection process in terms of population evolution, or as an exploitation process from the perspective of a learning agent. For the second part, $x_i [ T_x \sum_j x_j \ln(x_j / x_i) ]$, we show in the appendix that, if the time derivative of $\mathbf{x}$ contains this part alone, the system entropy increases.
The system entropy is a function that captures the randomness of the system. From the perspective of population evolution, the system entropy corresponds to the variety of the population. As a result, this term can be considered as the mutation process. The level of mutation is controlled by the temperature parameters $T_x$ and $T_y$. In terms of reinforcement learning, this term can also be considered as an exploration process, as it gives the agent the opportunity to gain information about actions that do not look the best so far.

3.2. Convergence of the Q-Learning Dynamics

By observing the Q-learning dynamics of Equation (2), we can find that the interior rest points of the dynamics are exactly the QREs of the 2 × 2 game. It is claimed in [16] (albeit without proof) that the Q-learning dynamics for a 2 × 2 game converge to interior rest points of the probability simplexes for any positive temperatures $T_x > 0$ and $T_y > 0$. We provide a formal proof in Appendix B. The idea is that, for positive temperatures, the system is dissipative and, by leveraging the planar nature of the system, it can be argued that it converges to fixed points.
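To make the relation between rest points and QREs concrete, here is a small numerical sketch (not the authors' code) that integrates Equation (2) with an explicit Euler scheme. For two actions, the right-hand side for $x_1 = x$ simplifies to $x(1-x)\,[(A\mathbf{y})_1 - (A\mathbf{y})_2 + T_x \ln((1-x)/x)]$, and similarly for $y$. The step size, the number of steps, the clipping constant, and the example game are all assumptions of this sketch.

```python
import math

def q_learning_step(A, B, x, y, Tx, Ty, dt):
    """One explicit Euler step of the 2x2 Q-learning dynamics, Equation (2)."""
    # Utility gaps (A y)_1 - (A y)_2 and (B x)_1 - (B x)_2.
    gap_x = (A[0][0] - A[1][0]) * y + (A[0][1] - A[1][1]) * (1 - y)
    gap_y = (B[0][0] - B[1][0]) * x + (B[0][1] - B[1][1]) * (1 - x)
    dx = x * (1 - x) * (gap_x + Tx * math.log((1 - x) / x))
    dy = y * (1 - y) * (gap_y + Ty * math.log((1 - y) / y))
    eps = 1e-12  # keep the state strictly inside (0, 1) so the logs stay finite
    x = min(max(x + dt * dx, eps), 1 - eps)
    y = min(max(y + dt * dy, eps), 1 - eps)
    return x, y

def simulate(A, B, x, y, Tx, Ty, dt=0.01, steps=20000):
    for _ in range(steps):
        x, y = q_learning_step(A, B, x, y, Tx, Ty, dt)
    return x, y

# Hypothetical coordination game, positive temperatures, interior start.
G = [[2, 0], [0, 1]]
x, y = simulate(G, G, 0.3, 0.6, Tx=1.0, Ty=1.0)
# At a rest point, x must equal the logit response to y (the QRE condition).
print(x, 1 / (1 + math.exp(-(3 * y - 1))))
```

The two printed numbers should agree to many digits, which is the numerical counterpart of the statement that interior rest points are QREs.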

3.3. Rescaling the Payoff Matrix

At the end of this section, we discuss the transformation of the payoff matrices that preserves the dynamics in Equation (2). This idea is proposed in [17,18], where the rescaling of a matrix is defined as follows:
Definition 5 ([18]).
$A'$ and $B'$ are said to be a rescaling of $A$ and $B$ if there exist constants $c_j$, $d_i$, and $\alpha > 0$, $\beta > 0$ such that $a'_{ij} = \alpha a_{ij} + c_j$ and $b'_{ji} = \beta b_{ji} + d_i$.
It is clear that rescaling the game payoff matrices is equivalent to updating the temperature parameters of the two agents in Equation (2). Therefore, it suffices to study the dynamics under the assumption that the 2 × 2 payoff matrices A and B are in the following diagonal form.
Definition 6.
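Given 2 × 2 matrices $A$ and $B$, their diagonal form is defined as
$$A_D = \begin{pmatrix} a_{11} - a_{21} & 0 \\ 0 & a_{22} - a_{12} \end{pmatrix}, \qquad B_D = \begin{pmatrix} b_{11} - b_{21} & 0 \\ 0 & b_{22} - b_{12} \end{pmatrix}.$$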
Note that, although rescaling the payoff matrices to their diagonal form preserves the equilibria, it does not preserve the social optimality, i.e., the socially optimal strategy profile in the transformed game is not necessarily the socially optimal strategy profile in the original game.
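A short sketch of Definition 6, under the assumption $\alpha = \beta = 1$ in Definition 5: the diagonal form subtracts a constant from each column of $A$ (and the analogous constant for $B$), which leaves the utility gap between a player's two actions unchanged. The function names and the example matrices are hypothetical.

```python
def diagonal_form(A, B):
    """Diagonal form of Definition 6 (rescaling with alpha = beta = 1)."""
    A_D = [[A[0][0] - A[1][0], 0], [0, A[1][1] - A[0][1]]]
    B_D = [[B[0][0] - B[1][0], 0], [0, B[1][1] - B[0][1]]]
    return A_D, B_D

def utility_gap(M, p):
    """(M p)_1 - (M p)_2 for the opponent strategy (p, 1 - p)."""
    return (M[0][0] - M[1][0]) * p + (M[0][1] - M[1][1]) * (1 - p)

# Hypothetical payoff matrices.
A = [[3, 1], [2, 5]]
B = [[4, 0], [1, 2]]
A_D, B_D = diagonal_form(A, B)
print(A_D, B_D)
# The gaps that drive Equation (2) coincide before and after the transformation.
print(utility_gap(A, 0.25), utility_gap(A_D, 0.25))
```

Because only the gaps enter the QRE conditions and the dynamics, equilibria are preserved; the sum of payoffs, and hence the social welfare, is not.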

4. Hysteresis Effect and Bifurcation Analysis

4.1. Hysteresis effect in Q-Learning Dynamics: An Example

We begin our discussion with an example:
Example 1 (Hysteresis effect).
Consider a 2 × 2 game with reward matrices
$$A = \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}.$$
There are two PNEs in this game: $(x, y) = (0, 0)$ and $(1, 1)$. By fixing different values of $T_y$, we can plot the QREs with respect to $T_x$ as in Figure 2 and Figure 3, which we call the bifurcation diagrams. For simplicity, we only show the value of $x$ in the figures since, according to Equation (4), given $x$ and $T_y$, the value of $y$ is uniquely determined. Assuming the system follows the Q-learning dynamics, as we slowly vary $T_x$, $x$ tends to stay on the branch closest to where it was originally, which corresponds to a stable but inefficient fixed point. We consider the following process:
1. The initial state is $(0.05, 0.14)$, where $T_x \approx 1$ and $T_y \approx 2$. Plot $x$ versus $T_x$ with $T_y = 2$ fixed, as in Figure 3.
2. Fix $T_y = 2$ and increase $T_x$ until only one QRE remains.
3. Fix $T_y = 2$ and decrease $T_x$ back to 1. Now $x \approx 0.997$.
In the above example, although the temperature parameters are eventually set back to their initial values, the system state ends up at an entirely different equilibrium. This behavior is known as the hysteresis effect. In this section, we answer the question of when this happens. In the next section, we show how we can take advantage of this phenomenon.
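The three-step process of Example 1 can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it integrates the Q-learning dynamics for the game of Example 1 with an explicit Euler scheme, and the ramp limit $T_x = 3$, the ramp increment, the step size, and the relaxation time per temperature are arbitrary choices.

```python
import math

def relax(x, y, Tx, Ty, dt=0.01, steps=4000):
    """Integrate the Q-learning dynamics of Example 1 at fixed temperatures."""
    eps = 1e-12  # keep the state strictly inside (0, 1)
    for _ in range(steps):
        # Utility gaps for A = diag(10, 5) and B = diag(2, 4).
        dx = x * (1 - x) * (15 * y - 5 + Tx * math.log((1 - x) / x))
        dy = y * (1 - y) * (6 * x - 4 + Ty * math.log((1 - y) / y))
        x = min(max(x + dt * dx, eps), 1 - eps)
        y = min(max(y + dt * dy, eps), 1 - eps)
    return x, y

Ty = 2.0
# Step 1: settle at the QRE near (0.05, 0.14) with T_x = 1.
x, y = relax(0.05, 0.14, 1.0, Ty)
x_start = x

# Steps 2 and 3: slowly raise T_x from 1 to 3, then lower it back to 1.
ramp = [1.0 + 0.05 * k for k in range(41)]
for Tx in ramp + ramp[::-1]:
    x, y = relax(x, y, Tx, Ty)

print(round(x_start, 3), round(x, 3))
```

Both printed values correspond to $T_x = 1$ and $T_y = 2$: the first stays close to 0.05, while the second ends close to 0.997, which is the hysteresis effect described above.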

4.2. Characterizing QREs

We consider the bifurcation diagrams for QREs in 2 × 2 games. Without loss of generality, we consider a properly rescaled 2 × 2 game with payoff matrices in the diagonal form:
$$A_D = \begin{pmatrix} a_X & 0 \\ 0 & b_X \end{pmatrix}, \qquad B_D = \begin{pmatrix} a_Y & 0 \\ 0 & b_Y \end{pmatrix}.$$
We can also assume that the action indices are ordered and the payoffs rescaled so that $a_X > 0$ and $|a_X| \geq |b_X|$. For simplicity, we assume that $a_X = b_X$ and $b_X = b_Y$ do not hold at the same time. At a QRE, we have
$$x = \frac{e^{\frac{1}{T_x} y a_X}}{e^{\frac{1}{T_x} y a_X} + e^{\frac{1}{T_x}(1-y) b_X}}, \qquad y = \frac{e^{\frac{1}{T_y} x a_Y}}{e^{\frac{1}{T_y} x a_Y} + e^{\frac{1}{T_y}(1-x) b_Y}}. \tag{4}$$
Given $T_x$ and $T_y$, there could be multiple solutions to Equation (4). However, if we know the equilibrium state, then we can recover the temperature parameters. Solving Equation (4) for $T_x$ and $T_y$ gives
$$T_X^{I}(x, y) = \frac{-(a_X + b_X)\, y + b_X}{\ln\left(\frac{1}{x} - 1\right)}, \qquad T_Y^{I}(x, y) = \frac{-(a_Y + b_Y)\, x + b_Y}{\ln\left(\frac{1}{y} - 1\right)}. \tag{5}$$
We call this the first form of representation, where $T_x$ and $T_y$ are written as functions of $x$ and $y$. Here the capital subscripts in $T_X$ and $T_Y$ indicate that they are considered as functions. A direct observation from Equation (5) is that both are continuous functions over $(0, 1) \times (0, 1)$ except at $x = 1/2$ and $y = 1/2$, respectively.
An alternative way to describe the QRE is to write $T_x$ and $y$ as functions of $x$, parameterized by $T_y$, in the following second form of representation. This is the form that we use to prove many useful characteristics of QREs.
$$T_X^{II}(x, T_y) = \frac{-(a_X + b_X)\, y^{II}(x, T_y) + b_X}{\ln\left(\frac{1}{x} - 1\right)},$$
$$y^{II}(x, T_y) = \left( 1 + e^{\frac{1}{T_y}\left( -(a_Y + b_Y)\, x + b_Y \right)} \right)^{-1}.$$
In this way, given $T_y$, we are able to analyze how $T_x$ changes with $x$. This helps us determine what the QREs are for given $T_x$ and $T_y$.
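The two representations translate directly into code. The sketch below uses hypothetical function names and evaluates the second form at the initial state of Example 1 (diagonal entries $a_X = 10$, $b_X = 5$, $a_Y = 2$, $b_Y = 4$), as a consistency check rather than as the authors' implementation.

```python
import math

def T_X_first(x, y, aX, bX):
    """First form, Equation (5): the T_x that makes (x, y) satisfy Equation (4)."""
    return (-(aX + bX) * y + bX) / math.log(1 / x - 1)

def y_second(x, Ty, aY, bY):
    """Second form: the unique y consistent with x at temperature T_y."""
    return 1 / (1 + math.exp((-(aY + bY) * x + bY) / Ty))

def T_X_second(x, Ty, aX, bX, aY, bY):
    """Second form: T_x as a function of x, parameterized by T_y."""
    return T_X_first(x, y_second(x, Ty, aY, bY), aX, bX)

# Example 1 with T_y = 2 and x = 0.05.
print(y_second(0.05, 2, 2, 4), T_X_second(0.05, 2, 10, 5, 2, 4))
```

The printed values are approximately 0.136 and 1.006, in line with the initial state $(0.05, 0.14)$ and $T_x \approx 1$ quoted in Example 1. Sweeping $x$ over $(0, 1)$ and plotting $x$ against $T_X^{II}(x, T_y)$ wherever the latter is positive gives one way to draw a bifurcation diagram for a fixed $T_y$.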
We also want to analyze the stability of the QREs. From dynamical system theory (e.g., [19]), a fixed point of a dynamical system is said to be asymptotically stable if all of the eigenvalues of its Jacobian matrix have a negative real part; if it has at least one eigenvalue with a positive real part, then it is unstable. It turns out that, under the second form representation, we are able to determine whether a segment in the diagram is stable or not.
Lemma 1.
Given $T_y$, the system state $(x, y^{II}(x, T_y))$ is a stable equilibrium if and only if
1. $\frac{\partial T_X^{II}}{\partial x}(x, T_y) > 0$ if $x \in (0, 1/2)$;
2. $\frac{\partial T_X^{II}}{\partial x}(x, T_y) < 0$ if $x \in (1/2, 1)$.
Proof. 
The given condition is equivalent to the case where both eigenvalues of the Jacobian matrix of the dynamics (2) are negative. ☐
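Lemma 1 suggests a simple numerical stability test: estimate the slope of $T_X^{II}$ in $x$ and compare its sign with the side of $1/2$ on which $x$ lies. The sketch below uses a central finite difference, which is an assumption of this illustration (an analytic derivative would serve equally well), and it presumes that $x$ is a QRE at a positive temperature, i.e., that $T_X^{II}(x, T_y) > 0$. The function names are hypothetical.

```python
import math

def T_X_second(x, Ty, aX, bX, aY, bY):
    """Second form of representation: T_x as a function of x for fixed T_y."""
    y = 1.0 / (1.0 + math.exp((-(aY + bY) * x + bY) / Ty))
    return (-(aX + bX) * y + bX) / math.log(1.0 / x - 1.0)

def is_stable(x, Ty, aX, bX, aY, bY, h=1e-6):
    """Sign test of Lemma 1 using a central finite difference in x (x != 1/2)."""
    slope = (T_X_second(x + h, Ty, aX, bX, aY, bY)
             - T_X_second(x - h, Ty, aX, bX, aY, bY)) / (2 * h)
    return slope > 0 if x < 0.5 else slope < 0

# Example 1 with T_y = 2: two points on the lower branches, one on the upper branch.
game = (10, 5, 2, 4)
for x in (0.05, 0.39, 0.9):
    print(x, is_stable(x, 2.0, *game))
```

For this game the test reports the points $x = 0.05$ and $x = 0.9$ as stable and $x = 0.39$ as unstable.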
Finally, we define the principal branch. In Example 1, we call the branch on $x \in (0.5, 1)$ the principal branch given $T_y = 2$, since, for any $T_x > 0$, there is some $x \in (0.5, 1)$ such that $T_X^{II}(x, T_y) = T_x$. We define it formally as follows, with the help of the second form representation.
Definition 7.
Given $T_y$, the region $(a, b) \subseteq (0, 1)$ contains the principal branch of the QRE correspondence if it satisfies the following conditions:
1. $T_X^{II}(x, T_y)$ is continuous and differentiable for $x \in (a, b)$.
2. $T_X^{II}(x, T_y) > 0$ for $x \in (a, b)$.
3. For any $T_x > 0$, there exists $x \in (a, b)$ such that $T_X^{II}(x, T_y) = T_x$.
Further, for a region $(a, b)$ that contains the principal branch, $x \in (a, b)$ is on the principal branch if it satisfies the following conditions:
1. The equilibrium state $(x, y^{II}(x, T_y))$ is stable.
2. There is no $x' \in (a, b)$, $x' < x$, such that $T_X^{II}(x', T_y) = T_X^{II}(x, T_y)$.

4.3. Coordination Games

We begin our analysis with the class of coordination games, where $a_X$, $b_X$, $a_Y$, and $b_Y$ are all positive. Additionally, without loss of generality, we assume $a_X \geq b_X$. In this case, there is no dominant strategy for either player, and there are two PNEs.
Revisiting Example 1, we can make the following observations from Figure 2 and Figure 3:
  • Given $T_y$, there are three branches. One is the principal branch, while the other two appear in pairs and occur only when $T_x$ is less than some value.
  • For small $T_y$, the principal branch goes toward $x = 0$; for large $T_y$, the principal branch goes toward $x = 1$.
We now show that these observations hold in general for coordination games. The proofs in this section are deferred to Appendix D, where we provide a detailed discussion of the proof techniques.
The first idea we introduce is the inverting temperature, which is the threshold of $T_y$ in Observation (2). We define it as
$$T_I = \max\left\{ 0, \frac{b_Y - a_Y}{2 \ln(a_X / b_X)} \right\}.$$
We note that $T_I$ is positive only if $b_Y > a_Y$, which is the case where the two players have different preferences. When $T_y < T_I$, as the first player increases his/her rationality from fully irrational, i.e., as $T_x$ decreases from infinity, s/he is likely to be influenced by the second player’s preference. If $T_y$ is greater than $T_I$, then the first player prefers to follow his/her own preference, making the principal branch move toward $x = 1$. We formalize this idea in the following theorem:
Theorem 1 (Direction of the principal branch).
Given a 2 × 2 coordination game, and given $T_y$, the following statements are true:
1. If $T_y > T_I$, then $(0.5, 1)$ contains the principal branch.
2. If $T_y < T_I$, then $(0, 0.5)$ contains the principal branch.
The second idea is the critical temperature, denoted as $T_C(T_y)$, which is a function of $T_y$. The critical temperature is defined as the infimum of the values of $T_x$ such that, for any $T_x > T_C(T_y)$, there is a unique QRE under $(T_x, T_y)$. Generally, there is no closed form for the critical temperature. However, we can still compute it efficiently, as we show in Theorem 2. Another interesting value of $T_y$ is $T_B = \frac{b_Y}{\ln(a_X / b_X)}$, which is the maximum value of $T_y$ for which QREs not on the principal branch exist. Intuitively, as $T_y$ goes beyond $T_B$, the first player ignores the decision of the second player and turns to the action that s/he thinks is better. We formalize the ideas of $T_C$ and $T_B$ in the following theorem:
Theorem 2 (Properties about the second QRE).
Given a 2 × 2 coordination game, and given $T_y$, the following statements are true:
1. For almost every $T_x > 0$, all QREs not lying on the principal branch appear in pairs.
2. If $T_y > T_B$, then there is no QRE correspondence in $x \in (0, 0.5)$.
3. If $T_y > T_I$, then there is no QRE correspondence for $T_x > T_C(T_y)$ in $x \in (0, 0.5)$.
4. If $T_y < T_I$, then there is no QRE correspondence for $T_x > T_C(T_y)$ in $x \in (0.5, 1)$.
5. $T_C(T_y)$ is given as $T_X^{II}(x_L, T_y)$, where $x_L$ is the solution to the equation
$$y^{II}(x, T_y) + x(1 - x) \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x}(x, T_y) = \frac{b_X}{a_X + b_X}.$$
6. $x_L$ can be found using binary search.
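The thresholds $T_I$ and $T_B$ and the binary search of Theorem 2 can be sketched as follows. This is an illustration under explicit assumptions rather than the authors' procedure: it takes $a_X > b_X > 0$ so that $\ln(a_X / b_X) \neq 0$, it takes $T_I < T_y < T_B$ so that the non-principal branches lie in $(0, 0.5)$, and it assumes that the left-hand side of the equation in Part 5 minus $b_X / (a_X + b_X)$ changes sign exactly once on that interval. All function names are hypothetical.

```python
import math

def inverting_temperature(aX, bX, aY, bY):
    return max(0.0, (bY - aY) / (2 * math.log(aX / bX)))

def branch_limit_temperature(aX, bX, bY):
    """T_B: above this T_y no QRE exists off the principal branch."""
    return bY / math.log(aX / bX)

def y_second(x, Ty, aY, bY):
    return 1 / (1 + math.exp((-(aY + bY) * x + bY) / Ty))

def T_X_second(x, Ty, aX, bX, aY, bY):
    return (-(aX + bX) * y_second(x, Ty, aY, bY) + bX) / math.log(1 / x - 1)

def critical_temperature(aX, bX, aY, bY, Ty, tol=1e-12):
    """Bisection on the stationarity condition of Theorem 2, Part 5."""
    def h(x):
        y = y_second(x, Ty, aY, bY)
        dy = y * (1 - y) * (aY + bY) / Ty  # derivative of y^II with respect to x
        return y + x * (1 - x) * math.log(1 / x - 1) * dy - bX / (aX + bX)
    lo, hi = 1e-9, 0.5 - 1e-9  # h(lo) < 0 < h(hi) under the stated assumptions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    x_L = 0.5 * (lo + hi)
    return T_X_second(x_L, Ty, aX, bX, aY, bY), x_L

# Example 1 with T_y = 2.
T_I = inverting_temperature(10, 5, 2, 4)
T_B = branch_limit_temperature(10, 5, 4)
T_C, x_L = critical_temperature(10, 5, 2, 4, Ty=2.0)
print(T_I, T_B, T_C, x_L)
```

For Example 1 this gives $T_I \approx 1.44$, $T_B \approx 5.77$, and a critical temperature of roughly 1.51 attained near $x_L \approx 0.26$, so any $T_x$ above about 1.51 leaves a single QRE when $T_y = 2$.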
The next aspect of the QRE correspondence is its stability. According to Lemma 1, the stability of the QREs can also be inspected with the help of the second form representation by analyzing $\frac{\partial T_X^{II}}{\partial x}$. We state the results in the following theorem:
Theorem 3 (Stability).
Given a 2 × 2 coordination game, and given $T_y$, the following statements are true:
1. If $a_Y \geq b_Y$, then the principal branch is continuous.
2. If $T_y < T_I$, then the principal branch is continuous.
3. If $T_y > T_I$ and $a_Y < b_Y$, then the principal branch may not be continuous.
4. If $T_x$ is fixed, for each pair of QREs not lying on the principal branch, the one closer to $x = 0.5$ is unstable, while the other one is stable.
Note that Part 3 of Theorem 3 implies that there is potentially an unstable segment between segments of the principal branch. This phenomenon is illustrated in Figure 4 and Figure 5. Though the guarantee in this case is weaker than in the other cases, this does not hinder us from designing a control mechanism, as we do in Section 5.3.

4.4. Non-Coordination Games

Due to space constraints, the analysis for non-coordination games is deferred to Appendix C.

5. Mechanism Design

In this section, we aim to design a systematic way to improve the social welfare in a 2 × 2 game by changing the temperature parameters. We focus our discussion on the class of coordination games. Recall that any 2 × 2 game has more than one PNE if and only if its diagonal form is a coordination game. This means that, in a coordination game, given any temperature parameters, there could be more than one equilibrium. In this case, we are not guaranteed to achieve the socially optimal equilibrium state even if we set the system to the correct temperatures, due to the hysteresis effects that we have discussed in the previous section. Therefore, our main task in this section is to determine when and how we can get to the socially optimal equilibrium state. In Section 5.1, we consider the case when the socially optimal state is one of the PNEs. Since rescaling the payoff matrices to their diagonal form does not preserve social optimality, in Section 5.3, we generalize our discussion to the case when the socially optimal state does not coincide with any PNE.

5.1. Hysteresis Mechanism: Select the Best Nash Equilibrium via QRE Dynamics

First, we consider the case when the socially optimal state is one of the PNEs. Our main task in this case is to determine when and how we can get to the socially optimal PNE. In Example 1, by sequentially changing $T_x$, we move the equilibrium state from around $(0, 0)$ to around $(1, 1)$, which is the socially optimal state. We formalize this idea as the hysteresis mechanism and present it in Theorem 4. The hysteresis mechanism takes advantage of the hysteresis effect discussed in Section 4: we use transient changes of system parameters to induce permanent improvements to system performance via optimal equilibrium selection.
Theorem 4 (Hysteresis Mechanism).
Consider a 2 × 2 game that satisfies the following properties:
1. Its diagonal form satisfies $a_X, b_X, a_Y, b_Y > 0$.
2. Exactly one of its pure Nash equilibria is the socially optimal state.
Without loss of generality, we can assume $a_X \geq b_X$. Then there is a mechanism to control the system to the social optimum by sequentially changing $T_x$ and $T_y$, provided that (1) $a_Y \geq b_Y$ and (2) the socially optimal state is $(0, 0)$ do not hold at the same time.
Proof. 
First, note that, if a Y b Y , by Theorem 1, the principal branch is always in the region x > 0.5 . As a result, once T y is increased beyond the critical temperature, the system state will no longer return to x < 0.5 at any positive temperature. Therefore, ( 0 , 0 ) cannot be approached from any state in x > 0.5 through the QRE dynamics.
On the other hand, if a Y b Y and the socially optimal state is the PNE ( 1 , 1 ) , then we can approach that state by first getting onto the principal branch. The mechanism can be described as
(C1)(a)Raise T x to some value above the critical temperature T C ( T y ) .
(b)Reduce T x and T y to 0.
Though in this case the initial choice of T y does not affect the result, if the social designer is taking the costs from assigning large T x and T y values into account, s/he is going to trade off between T C and T y since a typically smaller T y induces a larger T C .
Next, consider a Y < b Y . If we are aiming for state ( 0 , 0 ) , then we can undergo the following procedure:
(D1) (a) Keep $T_y$ at some value below $T_I = \frac{b_Y - a_Y}{2 \ln(a_X / b_X)}$. Now the principal branch is at $(0, 0.5)$.
(b) Raise $T_x$ to some value above the critical temperature $T_C(T_y)$.
(c) Reduce $T_x$ to 0.
(d) Reduce $T_y$ to 0.
On the other hand, if we are aiming for state $(1, 1)$, then the following procedure suffices:
(D2) (a) Keep $T_y$ at some value above $T_I = \frac{b_Y - a_Y}{2 \ln(a_X / b_X)}$. Now the principal branch is at $(0.5, 1)$.
(b) Raise $T_x$ to some value above the critical temperature $T_C(T_y)$.
(c) Reduce $T_x$ to 0.
(d) Reduce $T_y$ to 0.
Note that the order of the last two steps matters: only reducing $T_y$ after $T_x$ keeps the state around $x = 1$. We refer the interested reader to Figure 11 for Case (D1) and Figure 12 for Case (D2) for more insight. ☐
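The procedure (C1) can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes a hypothetical diagonal-form game with $a_X = a_Y = 2$ and $b_X = b_Y = 1$, and it uses the two-action reduction of the continuous-time dynamics of Appendix A.3, $\dot{x} = x(1-x)\,[a_X y - b_X(1-y) - T_x \ln(x/(1-x))]$ (and symmetrically for $y$), integrated with a plain Euler scheme. The temperature values 0.3 and 10 are illustrative choices, and temperatures are lowered to a small positive value rather than exactly 0.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def settle(x, y, Tx, Ty, aX=2.0, bX=1.0, aY=2.0, bY=1.0, dt=0.01, steps=20000):
    """Euler-integrate the two-action Q-learning dynamics for a fixed time.

    The payoff parameters are hypothetical and chosen only for illustration.
    """
    eps = 1e-12  # keep the state strictly inside (0, 1) so logit is defined
    for _ in range(steps):
        dx = x * (1 - x) * (aX * y - bX * (1 - y) - Tx * logit(x))
        dy = y * (1 - y) * (aY * x - bY * (1 - x) - Ty * logit(y))
        x = min(max(x + dt * dx, eps), 1 - eps)
        y = min(max(y + dt * dy, eps), 1 - eps)
    return x, y

# Start near the inferior equilibrium (0, 0) at low temperatures.
start = settle(0.05, 0.05, Tx=0.3, Ty=0.3)
# Step (a): raise T_x (assumed to exceed the critical temperature for this game).
heated = settle(*start, Tx=10.0, Ty=0.3)
# Step (b): lower T_x again; the state does not return to where it started.
final = settle(*heated, Tx=0.3, Ty=0.3)
print(start, heated, final)
```

In this sketch the temperatures at the end are identical to those at the start, yet the state has moved from the neighborhood of $(0, 0)$ to the neighborhood of $(1, 1)$, which is the hysteresis effect the mechanism relies on.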

5.2. Efficiency of QREs: An Example

A question that arises with the solution concept of QRE is whether QRE can improve social welfare. Here we show that the answer is yes. We begin with an illustrative example:
Example 2.
Consider a standard coordination game with the payoff matrices of the form
$$A = \begin{pmatrix} \epsilon & 1 \\ 0 & 1 + \epsilon' \end{pmatrix}, \qquad B = \begin{pmatrix} 1 + \epsilon & 0 \\ 1 & \epsilon' \end{pmatrix},$$
where $\epsilon > \epsilon' > 0$ are some small numbers. Note that, in this game, there are two PNEs, $(x, y) = (1, 1)$ and $(x, y) = (0, 0)$, with social welfare values $1 + 2\epsilon$ and $1 + 2\epsilon'$, respectively. We can see that, for small $\epsilon$ and $\epsilon'$ values, the socially optimal state is $(x, y) = (1, 0)$, with social welfare value 2. In this case, the state $(x, y) = (1, 1)$ is the PNE with the best social welfare. However, we are able to achieve a state with better social welfare than any NE through QRE dynamics. We illustrate the social welfare of the QREs of this example at different temperatures in Figure 6. In this figure, we can see that, at the PNE, which is the point $T_x = T_y = 0$, the social welfare is $1 + 2\epsilon$. However, we are able to increase the social welfare by increasing $T_y$. We will show in Section 5.3 a general algorithm for finding the particular temperatures as well as a mechanism, which we refer to as the optimal control mechanism, that drives the system to the desired state.
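The welfare values quoted in Example 2 can be checked with a few lines of code. This is a small sketch under stated assumptions: the payoff matrices are taken as $A = [[\epsilon_1, 1], [0, 1 + \epsilon_2]]$ and $B = [[1 + \epsilon_1, 0], [1, \epsilon_2]]$ with $\epsilon_1 > \epsilon_2 > 0$, the numerical values 0.10 and 0.05 are arbitrary, and $B$ is indexed by the column player's action first (matching the expected payoff $[B x]_a$ used in Appendix A.3). The helper name `social_welfare` is hypothetical.

```python
def social_welfare(x, y, A, B):
    """Expected total payoff when the row player plays action 1 with
    probability x and the column player plays action 1 with probability y.

    A[i][j]: row player's payoff for (row action i, column action j).
    B[j][i]: column player's payoff for (column action j, row action i).
    """
    xs, ys = (x, 1 - x), (y, 1 - y)
    row = sum(xs[i] * A[i][j] * ys[j] for i in range(2) for j in range(2))
    col = sum(ys[j] * B[j][i] * xs[i] for i in range(2) for j in range(2))
    return row + col

eps1, eps2 = 0.10, 0.05  # illustrative values with eps1 > eps2 > 0
A = [[eps1, 1.0], [0.0, 1.0 + eps2]]
B = [[1.0 + eps1, 0.0], [1.0, eps2]]

for state in [(1, 1), (0, 0), (1, 0), (1, 0.5)]:
    print(state, social_welfare(*state, A, B))
```

Under these assumptions, the state $(1, 0.5)$ has welfare $1.5 + \epsilon_1$, which lies strictly between the best PNE value $1 + 2\epsilon_1$ and the social optimum 2 whenever $\epsilon_1 < 0.5$.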

5.3. Optimal Control Mechanism: Better Equilibrium with Irrationality

Here, we show a general approach to improve the PoS bound for coordination games from Nash equilibria by QREs and Q-learning dynamics. We denote by $QRE(T_x, T_y)$ the set of QREs with respect to $T_x$ and $T_y$. Further, we denote by $QRE$ the union of $QRE(T_x, T_y)$ over all positive $T_x$ and $T_y$. Additionally, we denote the set of pure Nash equilibrium system states by $NE$. Since the set $NE$ is the limit of $QRE(T_x, T_y)$ as $T_x$ and $T_y$ approach zero, we have the bounds:
$$\mathrm{PoA}(QRE) \ge \mathrm{PoA}(NE), \qquad \mathrm{PoS}(QRE) \le \mathrm{PoS}(NE).$$
Then, we define QRE-achievable states:
Definition 8.
A state $(x, y) \in [0, 1]^2$ is a QRE-achievable state if, for every $\epsilon > 0$, there are positive finite $T_x$ and $T_y$ and a state $(x', y')$ such that $|(x', y') - (x, y)| < \epsilon$ and $(x', y') \in QRE(T_x, T_y)$.
Note that, with this definition, pure Nash equilibria are QRE-achievable states. However, the socially optimal states are not necessarily QRE-achievable. For example, we illustrate in Figure 7 the set of QRE-achievable states for Example 2. We can see that the socially optimal state, $(x, y) = (1, 0)$, is not QRE-achievable. Nevertheless, it is easy to see from Figure 7 and Figure 8 that we can achieve a higher social welfare at $(x, y) = (1, 0.5)$, which is a QRE-achievable state. Formally, we can describe the set of QRE-achievable states as the positive support of $T_X^I$ and $T_Y^I$:
$$S = \left( \left\{ x \in \left[\tfrac{1}{2}, 1\right],\ y \in \left[\tfrac{b_X}{a_X + b_X}, 1\right] \right\} \cup \left\{ x \in \left[0, \tfrac{1}{2}\right],\ y \in \left[0, \tfrac{b_X}{a_X + b_X}\right] \right\} \right) \cap \left( \left\{ x \in \left[\tfrac{b_Y}{a_Y + b_Y}, 1\right],\ y \in \left[\tfrac{1}{2}, 1\right] \right\} \cup \left\{ x \in \left[0, \tfrac{b_Y}{a_Y + b_Y}\right],\ y \in \left[0, \tfrac{1}{2}\right] \right\} \right).$$
An example of this region for a game with $a_Y \ge b_Y$ is illustrated in Figure 7. For the case $a_Y < b_Y$, we demonstrate it in Figure 9.
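Membership in the interior of this region can be tested by recovering the temperatures that a given state would require. The sketch below is an illustration under an explicit assumption: it takes the inverse maps to be the ones obtained by solving the Boltzmann fixed-point conditions for a diagonal-form game, $T_X^I(x, y) = \frac{a_X y - b_X (1 - y)}{\ln(x / (1 - x))}$ and $T_Y^I(x, y) = \frac{a_Y x - b_Y (1 - x)}{\ln(y / (1 - y))}$. The exact form of Equation (5) is not restated in this section, so these expressions, the function names, and the payoff values are all hypothetical.

```python
import math

def implied_temperatures(x, y, aX, bX, aY, bY):
    """Temperatures at which (x, y) solves the Boltzmann fixed-point equations.

    Assumes x and y lie strictly inside (0, 1) and differ from 0.5,
    so that both logarithms are finite and nonzero.
    """
    Tx = (aX * y - bX * (1 - y)) / math.log(x / (1 - x))
    Ty = (aY * x - bY * (1 - x)) / math.log(y / (1 - y))
    return Tx, Ty

def is_interior_qre(x, y, aX, bX, aY, bY):
    """True if both implied temperatures are positive."""
    Tx, Ty = implied_temperatures(x, y, aX, bX, aY, bY)
    return Tx > 0 and Ty > 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

params = dict(aX=0.2, bX=0.1, aY=0.2, bY=0.1)  # illustrative payoffs
Tx, Ty = implied_temperatures(0.9, 0.6, **params)
# Consistency check: plugging the temperatures back reproduces the state.
x_back = sigmoid((0.2 * 0.6 - 0.1 * 0.4) / Tx)
y_back = sigmoid((0.2 * 0.9 - 0.1 * 0.1) / Ty)
print(Tx, Ty, x_back, y_back)
```

With these illustrative payoffs, the state $(0.9, 0.6)$ yields two positive temperatures, while $(0.9, 0.1)$ would require a negative $T_x$ and is therefore not an interior QRE.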
In the following theorem, we propose the optimal control mechanism for a general process to achieve an equilibrium that is better than the PoS bound from Nash equilibria.
Theorem 5 (Optimal Control Mechanism).
Consider a 2 × 2 game that satisfies the following properties:
1. Its diagonal form satisfies $a_X, b_X, a_Y, b_Y > 0$.
2. None of its pure Nash equilibria is the socially optimal state.
Without loss of generality, we can assume $a_X \ge b_X$. Then
1. there is a stable QRE-achievable state whose social welfare is better than that of any Nash equilibrium;
2. there is a mechanism to control the system to this state from the best Nash equilibrium by sequentially changing $T_x$ and $T_y$.
Proof. 
Note that, given those properties, there are two PNEs ( 0 , 0 ) and ( 1 , 1 ) . Since we know neither of them is socially optimal, the socially optimal state must be either ( 0 , 1 ) or ( 1 , 0 ) .
First, consider $a_Y \ge b_Y$. In this case, we know from Theorem 3 that all states with $x \in (0.5, 1)$ belong to a principal branch for some $T_y > 0$ and are stable, while for $x < 0.5$ not all of them are stable. We illustrate the region of stable QRE-achievable states in Figure 10. By Theorems 2 and 3, we can infer that the states near the border $x = 0$ are stable. As a result, we can claim that the following states are what we are aiming for:
(A1) If $(1, 1)$ is the best NE and $(0, 1)$ is the SO state, then we select $(0.5, 1)$.
(A2) If $(1, 1)$ is the best NE and $(1, 0)$ is the SO state, then we select $(1, 0.5)$.
(A3) If $(0, 0)$ is the best NE and $(0, 1)$ is the SO state, then we select $\left(0, \frac{b_X}{a_X + b_X}\right)$.
(A4) If $(0, 0)$ is the best NE and $(1, 0)$ is the SO state, then we select $\left(\frac{b_Y}{a_Y + b_Y}, 0\right)$.
It is clear that these choices of states improve the social welfare. It is known that, for the class of games we are considering, the price of stability is no greater than 2. In fact, in Cases A1 and A2, we reduce this factor to $4/3$. Additionally, in Cases A3 and A4, we reduce this factor to $\left(\frac{1}{2} + \frac{b_X / 2}{a_X + b_X}\right)^{-1}$.
The next step is to show the mechanism to drive the system to the desired state. Due to symmetry, we only discuss Cases A1 and A3; Cases A2 and A4 can be handled analogously. For Case A1, the state corresponds to the temperatures $T_x \to \infty$ and $T_y \to 0$. For any small $\delta > 0$, we can always find the state $(0.5 + \delta, 1 - \delta)$ on the principal branch of some $T_y$. This means that we can achieve this state from any initial state, not only from the NEs. With the help of the first form representation of the QREs in Equation (5), given any QRE-achievable system state $(x, y)$, we are able to recover the corresponding temperatures through $T_X^I$ and $T_Y^I$. The mechanism can be described as follows:
(A1) (a) From any initial state, raise $T_x$ to $T_X^I(0.5 + \delta, 1 - \delta)$.
(b) Decrease $T_y$ to $T_Y^I(0.5 + \delta, 1 - \delta)$.
For Case A3, the state we selected is not on the principal branch. This means that we cannot increase the temperatures too much; otherwise, the system state will move to the principal branch and will never return. We assume that initially the system state is at $(\delta, \delta)$ for some small $\delta > 0$, which is a state close to the best NE. Additionally, we can assume that the initial temperatures are $T_x = T_X^I(\delta, \delta)$ and $T_y = T_Y^I(\delta, \delta)$. Our goal is to arrive at the state $\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$ for some small $\delta_1 > 0$ and $\delta_2 > 0$ such that $\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$ is stable. We present the mechanism in the following:
(A3) (a) From the initial state $(\delta, \delta)$, move $T_x$ to $T_X^I\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$.
(b) Increase $T_y$ to $T_Y^I\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$.
Here, note that Step (b) should not precede Step (a) because, if we increase $T_y$ first, then we risk leaving the current branch for the principal branch.
Next, consider the case where a Y < b Y . Similarly to the previous case, we know from Theorems 2 and 3 that states near the borders x = 0 , 0.5 , 1 and y = 0 , 0.5 , 1 are basically stable states. Hence, we can claim the following results:
(B1) If $(1, 1)$ is the best NE and $(0, 1)$ is the SO state, then we select $\left(\frac{b_Y}{a_Y + b_Y}, 1\right)$.
(B2) If $(1, 1)$ is the best NE and $(1, 0)$ is the SO state, then we select $(1, 0.5)$.
(B3) If $(0, 0)$ is the best NE and $(0, 1)$ is the SO state, then we select $\left(0, \frac{b_X}{a_X + b_X}\right)$.
(B4) If $(0, 0)$ is the best NE and $(1, 0)$ is the SO state, then we select $(0.5, 0)$.
It is clear that these choices of states create improvement on the social welfare. An interesting result for this case is that basically these desired states can be reached from any initial state. Due to symmetry, we demonstrate the mechanisms for Cases (B3) and (B4), and the remaining ones can be done analogously.
For Case (B3), we are aiming for the state $\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$ for some small $\delta_1 > 0$ and $\delta_2 > 0$. We propose the following mechanism:
(B3) Phase 1: Getting to the principal branch.
(a) From any initial state, fix $T_y$ at some value less than $T_I = \frac{b_Y - a_Y}{2 \ln(a_X / b_X)}$.
(b) Increase $T_x$ above the critical temperature $T_C(T_y)$.
(c) Decrease $T_x$ to $T_X^I\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$.
Phase 2: Staying at the current branch.
(a) Increase $T_y$ to $T_Y^I\left(\delta_1, \frac{b_X}{a_X + b_X} - \delta_2\right)$.
This process is illustrated in Figure 11 and Figure 12. In Phase 1, we keep $T_y$ low, meaning that the second player is more rational. As the first player becomes more rational, s/he is more likely to be influenced by the second player’s preference, and the system eventually reaches a Nash equilibrium. In Phase 2, we make the second player more irrational to increase the social welfare. The level of irrationality added in Phase 2 should be capped to prevent the first player from deviating from his/her decision.
For Case (B4), since our desired state is on the principal branch, the mechanism will be similar to Case (A1).
(B4) (a) From any initial state, raise $T_x$ to $T_X^I(0.5 + \delta, \delta)$.
(b) Decrease $T_y$ to $T_Y^I(0.5 + \delta, \delta)$.
 ☐
As a remark, in Cases (A3) and (A4), if we do not start from ( δ , δ ) but from some other states on the principal branch, we can instead aim for state ( 0.5 , 1 ) . This state is not better than the best Nash equilibrium, but still makes improvements over the initial state. The process can be modified as
(A3’) (a) From any initial state, raise $T_x$ to $T_X^I(0.5 + \delta, 1 - \delta)$ (above $T_C(T_y)$).
(b) Reduce $T_y$ to $T_Y^I(0.5 + \delta, 1 - \delta)$.

6. Applications

6.1. Evolution of Metabolic Phenotypes in Cancer

Evolutionary game theory (EGT) has been instrumental in studying evolutionary aspects of the somatic evolution that characterizes cancer progression. As opposed to conventional game theory, in evolutionary game theory, the strategies are fixed for the player and constitute its phenotype. Tumors are very heterogeneous, and frequency-dependent selection is a driving force in somatic evolution. While evolutionary outcomes can change depending on initial conditions or on the exact features and microenvironment of the relevant tumor phenotypes, evolutionary game theory can explain why certain clonal populations, usually the more aggressive and faster proliferating ones, emerge and overtake the previous ones. Tomlinson and Bodmer were the first to explore the role of cell–cell interactions in cancer using EGT [20]. This pioneering work was followed by others that built on those initial ideas to study key aspects of cancer evolution, such as the role of space [21], treatment [22,23], or metabolism [10,24].
Work by Kianercy and colleagues [10] shows how microenvironmental heterogeneity impacts somatic evolution. Kianercy and colleagues show how the tumor’s genetic instability adapts to the heterogeneous microenvironment (with regard to oxygen concentration) to better tune metabolism to the dynamic microenvironment. While evolutionary dynamics can help a tumor population evolve to acquire all relevant mutations to become an aggressive cancer [25], they also help it become treatment-resistant, which leads to treatment failure as well as increased toxicity for the patient, which can result in patient death. Researchers such as Axelrod and colleagues [26] have speculated that tumor cells do not need to acquire all the hallmarks of cancer to become an aggressive cancer but that the cooperation between different cells with different abilities might allow the tumor as a whole to acquire all the hallmarks. A few years ago, Hanahan and Weinberg updated their original research to include dysregulated metabolism as one of the hallmarks of cancer [27]. Here we suggest that cooperation between cells with different metabolic needs and abilities could allow the tumor to grow faster but also present a new therapeutic target that could be clinically exploited. Namely, this cooperation, as described by Kianercy and colleagues, allows hypoxic cells to benefit from the presence of oxygenated non-glycolytic cells with modest glucose requirements, whereas cells with aerobic metabolism can benefit from the lactic acid that is the byproduct of anaerobic metabolism (see Figure 13).
By targeting this cooperation, a tumor’s growth and progression could be disrupted using novel microenvironmental pH normalizers. What our work suggests is that small perturbations could return the system back to a state different from the one it started so that the microenvironmental impact does not need to be too substantial for the therapy to have an impact. The work we have described here supports the hypothesis that hysteresis would allow us to apply treatments for a short duration of time with the aim of changing the nature of the game instead of killing tumor cells. This would have the combined advantages of reducing toxicity and side effects and decreasing selection for resistant tumor phenotypes and thus reducing the emergence of resistance to the treatment. For instance, treatments that aim to reduce the acidity of the environment [28] would impact not only acid producing cells but also the acid-resistant normoxic ones.
Our techniques (the hysteresis mechanism and the optimal control mechanism) can be applied to the cancer game [10] with two types of tumor phenotypic strategies: hypoxic cells and oxygenated cells (Table 1). These cells inhabit regions where oxygen could be either abundant or lacking. In the former, oxygenated cells with regular metabolism thrive but in the latter, hypoxic cells whose metabolism is less reliant on the presence of oxygen (but more on the presence of glucose) have higher fitness.

6.2. Taxation

A direct application of the QRE solution concept is to analyze the effect of taxation, which has been discussed in [9]. Unlike Nash equilibria, QREs do change if we multiply the payoff matrix by some factor $\alpha$. This is because multiplying the payoffs by $\alpha$ is effectively the same as dividing the temperature parameters by $\alpha$. This means that, if we charge taxes to the players with some flat tax rate $\alpha \le 1$, the QREs will differ. Formally, we define the base temperature $T_0$ as the temperature when no tax is applied to either player. Then, we can define the tax rate for each player as $\alpha_x = 1 - T_0 / T_x$ and $\alpha_y = 1 - T_0 / T_y$, respectively.
We demonstrate how the hysteresis mechanism can be applied in a 2 × 2 game via taxation with Example 1. Recall that in Example 1, we have two types of agents. We can consider these two types of agents as corresponding to two different sectors of the economy (e.g., aircraft manufacturing versus car manufacturing), which need to coordinate on their choice between two different competing technologies that are related to both sectors (e.g., 3D-printing). We can consider the row player as being the aircraft manufacturer and the column player as being the car manufacturer, with payoff matrices specified in Table 2. By assuming both players are of bounded rationality with temperature 1, we assume the base temperatures for both players are T 0 = 1 . In this game, the equilibrium where both players choose Technology 1 has greater social welfare than the equilibrium where both players choose Technology 2. Consider the situation where, initially, the system is in an equilibrium state where both players choose Technology 2 with high probability. Then, with taxation, we have shown in the previous sections that we are able to increase the social welfare via the hysteresis mechanism or the optimal control mechanism. Here, we demonstrate how the simplified process that we have described in Example 1 can improve the social welfare in this game (see Figure 2 for the bifurcation diagram of this game):
  • The initial state is $(0.05, 0.14)$, where the row agent chooses Technology 1 with probability 0.05 and the column agent chooses Technology 1 with probability 0.14. This is an equilibrium state when we impose the tax rate $\alpha_x \approx 0$ on the row agent and the tax rate $\alpha_y \approx 0.5$ on the column agent (where $T_x \approx 1$ and $T_y \approx 2$).
  • Fix the tax rate for the column agent at $\alpha_y = 0.5$ (where $T_y = 2$) and increase the tax rate for the row agent to $\alpha_x = 0.8$ (where $T_x = 5$). Under this assignment of tax rates, there is only one QRE correspondence.
  • Fix the tax rate for the column agent at $\alpha_y = 0.5$ (where $T_y = 2$) and decrease the tax rate $\alpha_x$ for the row agent back to 0 (where $T_x = 1$). Now $x \approx 0.997$, and both agents choose Technology 1 with high probability.
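The correspondence between tax rates and temperatures used in the steps above follows directly from $\alpha = 1 - T_0 / T$. The short sketch below (with hypothetical helper names `tax_rate` and `temperature`, and base temperature $T_0 = 1$ as in the example) converts in both directions.

```python
def tax_rate(T, T0=1.0):
    """Flat tax rate that turns base temperature T0 into effective temperature T."""
    return 1.0 - T0 / T

def temperature(alpha, T0=1.0):
    """Effective temperature under a flat tax rate alpha (requires alpha < 1)."""
    return T0 / (1.0 - alpha)

for T in (1.0, 2.0, 5.0):
    print(T, tax_rate(T))
```

For instance, the temperatures $T = 1$, $2$, and $5$ correspond to the tax rates 0, 0.5, and 0.8 quoted in the steps above.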
In [9], the authors considered three approaches (“anarchy,” “socialism,” and “market”) to how taxes can be dynamically adjusted by the society, depending on whether the taxes are determined in a decentralized manner, by an external regulator, or through bargaining, respectively. Our mechanisms are a variant of the “socialism” scheme, since in our model the mechanism, which can be thought of as an external regulator, determines the tax rates. Our mechanisms are systematic approaches that optimize an objective, whereas in [9] the trajectories toward maximizing expected utilities are considered.

7. Connection to Previous Works

Recently, there has been a growing interplay between game theory, dynamical systems, and computer science. Examples include the integration of replicator dynamics and topological tools [29,30,31] in algorithmic game theory, and Q-learning dynamics [5] in multi-agent systems [6]. Q-learning dynamics has been studied extensively in game settings, e.g., by Sato et al. in [13] and Tuyls et al. in [14]. In [12], Q-learning dynamics is considered as an extension of replicator dynamics driven by a combination of payoffs and entropy. Recent advances in our understanding of evolutionary dynamics in multi-agent learning can be found in the survey in [32].
We are particularly interested in the connection between Q-learning dynamics and the concept of QRE [7] in game theory. In [11], Cominetti et al. study this connection in traffic congestion games. The hysteresis effect of Q-learning dynamics was first identified in 2012 by Wolpert et al. [9]. Kianercy et al. in [16] observed the same phenomenon and provided discussions on bifurcation diagrams in 2 × 2 games. The hysteresis effect has also been highlighted in recent follow-up work by [10] as a design principle for future cancer treatments. It was also studied in [33] in the context of minimum-effort coordination games. However, our current understanding is still mostly qualitative and in this work we have pushed towards a more practically applicable, quantitative, and algorithmic analysis.
Analyzing the characteristics of various dynamical systems has also been attracting the attention of the computer science community in recent years. For example, besides the Q-learning dynamics, the (simpler) replicator dynamics has been studied extensively due to its connections [30,34,35] to the multiplicative weight update (MWU) algorithm in [36].
Much attention has also been devoted to biological systems and their connections to game theory and computation. In recent work by Mehta et al. [37], the connection with genetic diversity was discussed in terms of the complexity of predicting whether genetic diversity persists in the long run under evolutionary pressures. This paper builds upon a rapid sequence of related results [38,39,40,41,42,43]. The key result is [39,40], where it was made clear that there is a strong connection between studying replicator dynamics in games and standard models of evolution. Follow-up works show how dynamics that incorporate errors (i.e., mutations) can be analyzed [44] and how such mutations can have a critical effect on ensuring survival in the presence of dynamically changing environments. Our paper makes progress along these lines by examining how noisy dynamics can introduce, for example, bifurcations.
We were inspired by recent work by Kianercy et al. establishing a connection between cancer dynamics and cancer treatment and studying Q-learning dynamics in games. This is analogous to the connections [39,40,45] between MWU and evolution detailed above. It is our hope that by starting off a quantitative analysis of these systems we can kickstart similarly rapid developments in our understanding of the related questions.

8. Conclusions

In this paper, we perform a quantitative analysis of bifurcation phenomena connected to Q-learning dynamics in 2 × 2 games. Based on this analysis, we introduce two novel mechanisms, the hysteresis mechanism and the optimal control mechanism. Hysteresis mechanisms use transient changes to the system parameters to induce permanent improvements to its performance via optimal (Nash) equilibrium selection. Optimal control mechanisms induce convergence to states whose performance is better than the best Nash equilibrium, showing that by controlling the exploration/exploitation tradeoff, we can achieve strictly better states than those achievable by perfectly rational agents.
We believe that these new classes of mechanisms could lead to interesting new questions within game theory. Importantly they could also lead to a more thorough understanding of cancer biology and how treatments could be designed not to kill tumor cells but to induce transient changes in the game with long-lasting consequences, impacting the equilibrium in ways that would be therapeutically useful.

Author Contributions

G.Y. worked on the analysis, experiments, figures and writeup, D.B. worked on the writeup, G.P. proposed the research direction and worked on the analysis and writeup.

Acknowledgments

Georgios Piliouras would like to acknowledge SUTD grant SRG ESD 2015 097 and MOE AcRF Tier 2 Grant 2016-T2-1-170 and an NRF 2018 Fellowship (NRF-NRFF2018-07). Ger Yang is supported in part by NSF grant numbers CCF-1216103, CCF-1350823, CCF-1331863, and CCF-1733832. David Basanta is partly funded by an NCI U01 (NCI) U01CA202958-01. Part of the work was completed while Ger Yang and Georgios Piliouras were visiting scientists at the Simons Institute for the Theory of Computing.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. From Q-Learning to Q-Learning Dynamics

In this section, we provide a quick sketch on how we can get to the Q-learning dynamics from Q-learning agents. We start with an introduction to the Q-learning rule. Then, we discuss the multi-agent model when there are multiple learners in the system. The goal of this section is to identify the dynamics of the system in which there are two learning agents playing a 2 × 2 game repeatedly over time.

Appendix A.1. Q-Learning Introduction

Q-learning [4,5] is a value-iteration method for solving for the optimal strategies in Markov decision processes. It can be used as a model in which users learn about their optimal strategy when facing uncertainty. Consider a system that consists of a finite number of states, with one player who has a finite number of actions. The player decides his/her strategy over an infinite time horizon. In Q-learning, at each time $t$, the player stores a value estimate $Q_{(s,a)}(t)$ for the payoff of each state–action pair $(s, a)$. S/he chooses the action $a_{t+1}$ that maximizes the Q-value $Q_{(s_t, \cdot)}(t)$ for time $t + 1$, given that the system state is $s_t$ at time $t$. In the next time step, if the agent plays action $a_{t+1}$, s/he will receive a reward $r(t+1)$, and the value estimate is updated according to the rule:
$$Q_{(s_t, a_{t+1})}(t+1) = (1 - \alpha)\, Q_{(s_t, a_{t+1})}(t) + \alpha \left( r(t+1) + \gamma \max_{a} Q_{(s_{t+1}, a)}(t) \right)$$
where α is the step size, and γ is the discount factor.
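As a concrete illustration of this update rule, the following minimal sketch applies one tabular Q-learning step. The state and action labels, the dictionary-based table, and the parameter values are hypothetical and serve only to make the arithmetic visible.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: blend the old estimate with the observed reward
    plus the discounted best estimate available in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

Q = defaultdict(float)          # unseen state-action pairs start at 0
actions = ["left", "right"]
Q[("s1", "right")] = 2.0        # pretend this value was learned earlier
new_value = q_update(Q, "s0", "left", r=1.0, s_next="s1", actions=actions)
print(new_value)
```

Here the new estimate is $0.9 \cdot 0 + 0.1 \cdot (1 + 0.9 \cdot 2) = 0.28$.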

Appendix A.2. Joint-Learning Model

Next, we consider the joint learning model as in [16]. Suppose there are multiple players in the system that are learning concurrently. Denote the set of players as $P$. We assume that the system state is a function of the action each player is playing, and the reward observed by each player is a function of the system state. Their learning behaviors are modeled as simplified models based on the Q-learning algorithm described above. More precisely, we consider the case where each player assumes that the system has only one state, which corresponds to the case where the player has very limited memory and has discount factor $\gamma = 0$. The reward observed by player $i \in P$, given that s/he plays action $a$ at time $t$, is denoted as $r_a^i(t)$. We can write the updating rule of the Q-value for agent $i$ as follows:
$$Q_a^i(t+1) = Q_a^i(t) + \alpha \left[ r_a^i(t) - Q_a^i(t) \right].$$
For the selection process, we consider the mechanism in which each player $i \in P$ selects his/her action according to the Boltzmann distribution with temperature $T_i$:
$$x_a^i(t) = \frac{e^{Q_a^i(t) / T_i}}{\sum_{a'} e^{Q_{a'}^i(t) / T_i}}, \tag{A1}$$
where $x_a^i(t)$ is the probability that agent $i$ chooses action $a$ at time $t$. The intuition behind this mechanism is that we are modeling the irrationality of the users by the temperature parameter $T_i$. For small $T_i$, the selection rule corresponds to the case of more rational agents. We can see that, for $T_i \to 0$, Equation (A1) corresponds to the best-response rule, that is, each agent selects the action with the highest Q-value with probability one. On the other hand, for $T_i \to \infty$, Equation (A1) corresponds to the rule of selecting each action uniformly at random, which models the case of fully irrational agents.
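The two limits of the Boltzmann selection rule can be seen numerically. The sketch below is a minimal illustration with hypothetical Q-values; subtracting the maximum before exponentiating is a standard numerical safeguard and does not change the resulting probabilities.

```python
import math

def boltzmann(q_values, T):
    """Action probabilities proportional to exp(Q / T)."""
    m = max(q_values)  # shift for numerical stability at small T
    weights = [math.exp((q - m) / T) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

q = [1.0, 0.0]                 # hypothetical Q-values for two actions
print(boltzmann(q, 0.01))      # nearly deterministic best response
print(boltzmann(q, 1000.0))    # nearly uniform
```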

Appendix A.3. Continuous-Time Dynamics

This underlying Q-learning model has been studied in previous decades. It is known that, if we take the time interval to be infinitely small, this sequential joint learning process can be approximated by a continuous-time model ([13,14]) that has some interesting characteristics. To see this, consider the 2 × 2 game described in Section 2.1. The expected payoff for the first player at time $t$, given that s/he chooses action $a$, can be written as $r_a^x(t) = [A y(t)]_a$; similarly, the expected payoff for the second player at time $t$, given that s/he chooses action $a$, is $r_a^y(t) = [B x(t)]_a$. The continuous-time limit for the evolution of the Q-value for each player can be written as
$$\dot{Q}_a^x(t) = \alpha \left[ r_a^x(t) - Q_a^x(t) \right], \qquad \dot{Q}_a^y(t) = \alpha \left[ r_a^y(t) - Q_a^y(t) \right].$$
Then, we take the time derivative of Equation (A1) for each player to obtain the evolution of the strategy profile:
$$\dot{x}_i = \frac{1}{T_x}\, x_i \left( \dot{Q}_i^x - \sum_k x_k \dot{Q}_k^x \right), \qquad \dot{y}_i = \frac{1}{T_y}\, y_i \left( \dot{Q}_i^y - \sum_k y_k \dot{Q}_k^y \right).$$
Putting these together, and rescaling the time horizon to $\alpha t / T_x$ and $\alpha t / T_y$, respectively, we obtain the continuous-time dynamics:
$$\dot{x}_i = x_i \left[ (A y)_i - x^T A y + T_x \sum_j x_j \ln(x_j / x_i) \right],$$
$$\dot{y}_i = y_i \left[ (B x)_i - y^T B x + T_y \sum_j y_j \ln(y_j / y_i) \right].$$
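The right-hand side of these dynamics is straightforward to evaluate. The following sketch is an illustration rather than the authors' implementation: it computes the vector field for arbitrary interior mixed strategies, with $B$ indexed by the second player's action first (consistent with the payoff $[B x]_a$). The payoff matrices and temperatures in the usage lines are hypothetical.

```python
import math

def q_learning_field(x, y, A, B, Tx, Ty):
    """Time derivatives (dx, dy) of the continuous-time Q-learning dynamics.

    x, y: mixed strategies with strictly positive entries summing to 1.
    A[i][j]: first player's payoff; B[j][i]: second player's payoff.
    """
    n, m = len(x), len(y)
    Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    Bx = [sum(B[j][i] * x[i] for i in range(n)) for j in range(m)]
    xAy = sum(x[i] * Ay[i] for i in range(n))
    yBx = sum(y[j] * Bx[j] for j in range(m))

    def explore(p, i):
        # Exploration term: sum_k p_k * ln(p_k / p_i)
        return sum(p[k] * math.log(p[k] / p[i]) for k in range(len(p)))

    dx = [x[i] * (Ay[i] - xAy + Tx * explore(x, i)) for i in range(n)]
    dy = [y[j] * (Bx[j] - yBx + Ty * explore(y, j)) for j in range(m)]
    return dx, dy

A = [[2.0, 0.0], [0.0, 1.0]]   # illustrative coordination game
B = [[2.0, 0.0], [0.0, 1.0]]
dx, dy = q_learning_field([0.3, 0.7], [0.6, 0.4], A, B, Tx=0.5, Ty=0.7)
print(dx, dy)
```

Because the components of `dx` and of `dy` each sum to zero, the flow keeps both strategies on the probability simplex.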

Appendix A.4. The Exploration Term Increases Entropy

Now, we show that the exploration term in the Q-learning dynamics results in the increase of the entropy:
Lemma A1.
Suppose $A = 0$ and $B = 0$. The system entropy
$$H(x, y) = H(x) + H(y) = -\sum_i x_i \ln x_i - \sum_i y_i \ln y_i$$
for the dynamics (2) increases with time, i.e.,
$$\dot{H}(x, y) > 0$$
if $x$ and $y$ are not uniformly distributed.
Proof of Lemma A1.
It suffices to consider the single-agent dynamics:
$$\dot{x}_i = x_i T_x \left( -\ln x_i + \sum_j x_j \ln x_j \right).$$
Taking the derivative of the entropy $H(x)$, we have
$$\dot{H}(x) = \sum_i (-\ln x_i - 1)\, \dot{x}_i = T_x \left[ \sum_i x_i (\ln x_i)^2 - \left( \sum_j x_j \ln x_j \right)^2 \right],$$
and since we have $\sum_i x_i = 1$, by Jensen’s inequality, we can find that
$$\left( \sum_j x_j \ln x_j \right)^2 \le \sum_i x_i (\ln x_i)^2,$$
where equality holds if and only if $x$ is a uniform distribution. Consequently, if we have $x_i \in (0, 1)$, and $x$ is not a uniform distribution, $\dot{H}(x)$ is strictly positive, which means that the system entropy increases with time. ☐
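The closed-form entropy rate in the proof can be compared against a finite-difference estimate. The sketch below is a numerical sanity check under illustrative choices (a three-action distribution, $T_x = 1$, and a single small Euler step); it is not part of the proof.

```python
import math

def entropy(x):
    return -sum(p * math.log(p) for p in x)

def exploration_field(x, T):
    """Single-agent dynamics with zero payoffs: x_i * T * (-ln x_i + sum_j x_j ln x_j)."""
    mean_log = sum(p * math.log(p) for p in x)
    return [p * T * (-math.log(p) + mean_log) for p in x]

def entropy_rate(x, T):
    """Closed form: T * (sum_i x_i (ln x_i)^2 - (sum_j x_j ln x_j)^2)."""
    mean_log = sum(p * math.log(p) for p in x)
    mean_sq = sum(p * math.log(p) ** 2 for p in x)
    return T * (mean_sq - mean_log ** 2)

x0, T, dt = [0.7, 0.2, 0.1], 1.0, 1e-6
x1 = [p + dt * v for p, v in zip(x0, exploration_field(x0, T))]
finite_difference = (entropy(x1) - entropy(x0)) / dt
print(entropy_rate(x0, T), finite_difference)
```

The closed form is $T_x$ times the variance of $\ln x_i$ under the distribution $x$, which makes it clear why it vanishes only at the uniform distribution.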

Appendix B. Convergence of Dissipative Learning Dynamics in 2 × 2 Games

Appendix B.1. Liouville’s Formula

Liouville’s formula can be applied to any system of autonomous differential equations with a continuously differentiable vector field $V$ on an open domain $S \subset \mathbb{R}^k$. The divergence of $V$ at $x \in S$ is defined as the trace of the corresponding Jacobian at $x$, i.e., $\mathrm{div}[V(x)] \equiv \sum_{i=1}^{k} \frac{\partial V_i}{\partial x_i}(x) = \mathrm{tr}(DV(x))$. Since divergence is a continuous function, we can compute its integral over measurable sets $A \subset S$ (with respect to the Lebesgue measure $\mu$ on $\mathbb{R}^k$). Given any such set $A$, let $\phi_t(A) = \{\phi(x_0, t) : x_0 \in A\}$ be the image of $A$ under the map $\phi$ at time $t$. $\phi_t(A)$ is measurable and its measure is $\mu(\phi_t(A)) = \int_{\phi_t(A)} dx$. Liouville’s formula states that the time derivative of the volume of $\phi_t(A)$ exists and is equal to the integral of the divergence over $\phi_t(A)$: $\frac{d}{dt}[A(t)] = \int_{\phi_t(A)} \mathrm{div}[V(x)]\, dx$. Equivalently,
Theorem A1
([46], p. 356). $\frac{d}{dt} \mu(\phi_t(A)) = \int_{\phi_t(A)} \mathrm{tr}(DV(x))\, d\mu(x)$.
A vector field is called divergence free if its divergence is zero everywhere. Liouville’s formula trivially implies that volume is preserved in such flows.
This theorem extends in a straightforward manner to systems where the vector field $V : X \to TX$ is defined on an affine set $X \subset \mathbb{R}^n$ with tangent space $TX$. In this case, $\mu$ represents the Lebesgue measure on the (affine hull) of $X$. Note that the derivative of $V$ at a state $x \in X$ must be represented using the derivative matrix $DV(x) \in \mathbb{R}^{n \times n}$, which by definition has rows in $TX$. If $\hat{V} : \mathbb{R}^n \to \mathbb{R}^n$ is a $C^1$ extension of $V$, then $DV(x) = D\hat{V}(x) P_{TX}$, where $P_{TX} \in \mathbb{R}^{n \times n}$ is the orthogonal projection² of $\mathbb{R}^n$ onto the subspace $TX$.

Appendix B.2. Poincaré–Bendixson Theorem

The Poincaré–Bendixson theorem is a powerful theorem that implies that two-dimensional systems cannot effectively exhibit chaos. Effectively, the limit behavior is either going to be an equilibrium, a periodic orbit, or a closed loop, punctuated by one (or more) fixed points. Formally, we have
Theorem A2
([47,48]). Given a differentiable real dynamical system defined on an open subset of the plane, then every non-empty compact ω-limit set of an orbit, which contains only finitely many fixed points, is either a fixed point, a periodic orbit, or a connected set composed of a finite number of fixed points together with homoclinic and heteroclinic orbits connecting these.

Appendix B.3. Bendixson–Dulac Theorem

By excluding the possibility of closed loops (i.e., periodic orbits, homoclinic cycles, and heteroclinic cycles), we can effectively establish global convergence to equilibrium. The following criterion, which was first established by Bendixson in 1901 and further refined by the French mathematician Dulac in 1933, allows us to do that. It is typically referred to as the Bendixson–Dulac negative criterion. It applies to planar systems in which the measure of initial conditions always shrinks (or always increases) with time, i.e., dynamical systems with vector fields whose divergence is always negative (or always positive).
Theorem A3
([49], p. 210). Let $D \subset \mathbb{R}^2$ be a simply connected region and $(f, g)$ in $C^1(D, \mathbb{R})$ with $\mathrm{div}(f, g) = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y}$ being not identically zero and without a change of sign in $D$. Then the system
$$\frac{dx}{dt} = f(x, y),$$
$$\frac{dy}{dt} = g(x, y)$$
has no loops lying entirely in $D$.
The function φ ( x , y ) is typically called the Dulac function.
Remark A1.
This criterion can also be generalized. Specifically, it holds for the system
$$\frac{dx}{dt} = \rho(x, y) f(x, y), \qquad \frac{dy}{dt} = \rho(x, y) g(x, y)$$
if $\rho(x, y) > 0$ is continuously differentiable. Effectively, we are allowed to rescale the vector field by a scalar function (as long as this function does not have any zeros) before we prove that the divergence is positive (or negative). That is, it suffices to find a continuously differentiable $\rho(x, y) > 0$ such that $\frac{\partial (\rho f)}{\partial x} + \frac{\partial (\rho g)}{\partial y}$ has a fixed sign.
By [16], after the change of variables $u_k = \ln(x_{k+1}) - \ln x_1$, $v_k = \ln(y_{k+1}) - \ln y_1$ for $k = 1, \dots, n-1$, the replicator system transforms to the following system:
$$\dot{u}_k = \frac{\sum_j \hat{a}_{kj} e^{v_j}}{1 + \sum_j e^{v_j}} - T_x u_k, \qquad \dot{v}_k = \frac{\sum_j \hat{b}_{kj} e^{u_j}}{1 + \sum_j e^{u_j}} - T_y v_k, \qquad \text{(II)}$$
where $\hat{a}_{kj} = a_{k+1,j+1} - a_{1,j+1}$ and $\hat{b}_{kj} = b_{k+1,j+1} - b_{1,j+1}$.
In the case of $2 \times 2$ games, we can apply both the Poincaré–Bendixson theorem and the Bendixson–Dulac theorem, since the resulting dynamical system is planar and $\frac{\partial \dot{u}_1}{\partial u_1} + \frac{\partial \dot{v}_1}{\partial v_1} = -(T_x + T_y) < 0$. Hence, for any initial condition, System (II) converges to equilibria. The flow of the original replicator system in the $2 \times 2$ game is diffeomorphic$^{3}$ to the flow of System (II); thus, the replicator dynamics with positive temperatures $T_x$, $T_y$ converges to equilibria for all initial conditions as well.
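As a quick numerical sanity check of the divergence claim above, the following sketch evaluates the divergence of the planar ($n = 2$) case of System (II) by central finite differences. It assumes that the second equation of System (II) carries the coefficient $\hat{b}_{11}$ and the temperature $T_y$, which is what the stated divergence $-(T_x + T_y)$ requires. The function names and the sample values are illustrative and are not taken from the paper.

```python
import math

def field(u, v, a_hat, b_hat, T_x, T_y):
    """Planar (n = 2) vector field of System (II), under the reading stated above."""
    du = a_hat * math.exp(v) / (1.0 + math.exp(v)) - T_x * u
    dv = b_hat * math.exp(u) / (1.0 + math.exp(u)) - T_y * v
    return du, dv

def divergence(u, v, a_hat, b_hat, T_x, T_y, h=1e-6):
    """Central-difference estimate of d(du)/du + d(dv)/dv at the point (u, v)."""
    du_plus, _ = field(u + h, v, a_hat, b_hat, T_x, T_y)
    du_minus, _ = field(u - h, v, a_hat, b_hat, T_x, T_y)
    _, dv_plus = field(u, v + h, a_hat, b_hat, T_x, T_y)
    _, dv_minus = field(u, v - h, a_hat, b_hat, T_x, T_y)
    return (du_plus - du_minus) / (2.0 * h) + (dv_plus - dv_minus) / (2.0 * h)

if __name__ == "__main__":
    T_x, T_y = 0.5, 0.7  # hypothetical temperatures
    for (u, v) in [(0.3, -1.2), (-2.0, 0.4), (1.5, 1.5)]:
        # The two printed numbers should agree up to finite-difference error.
        print((u, v), divergence(u, v, 2.0, -1.0, T_x, T_y), -(T_x + T_y))
```

Because the payoff terms in each equation depend only on the other player's variable, the estimate does not depend on the chosen payoff coefficients or on the evaluation point.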

Appendix C. Bifurcation Analysis for Games with Only One Nash Equilibrium

In this section, we present the results for the class of games with only one Nash equilibrium, where it can be either a pure one or a mixed one, where the mixed Nash equilibrium is defined as follows.
Definition A1 (Mixed Nash equilibrium).
A strategy profile ( x N E , y N E ) is a mixed Nash equilibrium if
$$x_{NE} \in \arg\max_{x \in [0,1]} x^{T} A\, y_{NE}, \qquad y_{NE} \in \arg\max_{y \in [0,1]} y^{T} B\, x_{NE}.$$
This corresponds to the case where $b_X$, $a_Y$, or $b_Y$ is negative. As before, our analysis is based on the second form representation described in Equations (6) and (7), which provides insights from the first player's perspective.

Appendix C.1. No Dominating Strategy for the First Player

More specifically, this is the case when there is no dominating strategy for the first player, i.e., both $a_X$ and $b_X$ are positive. From Equation (7), we can expect the characteristics of the bifurcation diagrams to depend on the sign of $a_Y + b_Y$, since it determines whether $y^{II}$ is increasing in $x$ or not. The discussion below also reveals some interesting phenomena.
First, we consider the case when $a_Y + b_Y > 0$. This can be considered a more general version of the case discussed in Section 4.3. In fact, the statements made in Theorems 1–3 apply to this case. However, there are some subtle differences that should be noted. If $a_Y > b_Y$, where we can assume $b_Y < 0$, then by the second part of Theorem 2, there are no QREs in $x \in (0, 0.5)$, since $T_B$ is now a negative number. This means that we always have only the principal branch. On the other hand, if $a_Y < b_Y$, where we can assume $a_Y < 0$, then, similar to the example in Figure 4 and Figure 5, there could still be two branches. However, we can expect the second branch to vanish before $T_y$ actually goes to zero, as the state $(1, 1)$ is not a Nash equilibrium.
Theorem A4.
Given a $2 \times 2$ game in which the diagonal form has $a_X, b_X > 0$, $a_Y + b_Y > 0$, and $a_Y < b_Y$, and given $T_y$, if $T_y < T_A$, where $T_A = \frac{-a_Y}{\ln(a_X / b_X)}$, then there is no QRE correspondence in $x \in (0.5, 1)$.
The proof of the above theorem directly follows from Proposition A4 in the appendix. An interesting observation here is that we can still make the first player achieve his/her desired state by changing T y to some value that is greater than T A .
Next, we consider $a_Y + b_Y \le 0$. The bifurcation diagrams are illustrated in Figure A1 and Figure A2. In this case, the principal branch goes directly toward the unique Nash equilibrium. We present the result formally in the following theorem, whose proof follows from Appendix D.1.2.
Figure A1. Bifurcation diagram for a game with no dominating strategy for the first player, a Y + b Y < 0 , and a low T Y .
Figure A2. Bifurcation diagram for a game with no dominating strategy for the first player, a Y + b Y < 0 , and a high T Y .
Theorem A5.
Given a $2 \times 2$ game in which the diagonal form has $a_X, b_X > 0$ and $a_Y + b_Y \le 0$, the QRE is unique for any given $T_x$ and $T_y$.
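The uniqueness claim can be probed numerically. The sketch below counts the solutions of $T_X^{II}(x, T_y) = T_x$ by looking for sign changes on a grid, using Equations (6) and (7) in the form in which they are used in Appendix D. The logit reparametrization $s = \ln(x / (1 - x))$ (so that $\ln(1/x - 1) = -s$), the cutoff `s_max`, the grid size, and the example payoffs are assumptions made for illustration. Roots with $|s|$ beyond the cutoff, or tangential roots, would be missed, so this is a sanity check rather than a proof.

```python
import math

def count_qre(T_x, T_y, a_X, b_X, a_Y, b_Y, s_max=60.0, n=20000):
    """Count sign changes of T_X^II(x, T_y) - T_x over x in (0, 1/2) and (1/2, 1)."""

    def gap(s):
        x = 1.0 / (1.0 + math.exp(-s))                              # x in (0, 1)
        y = 1.0 / (1.0 + math.exp((-(a_Y + b_Y) * x + b_Y) / T_y))  # Equation (7)
        # Equation (6) rewritten with ln(1/x - 1) = -s
        return ((a_X + b_X) * y - b_X) / s - T_x

    count = 0
    # s = 0 (x = 1/2) is a singularity of Equation (6), so each side is scanned separately.
    for lo, hi in ((-s_max, -1e-9), (1e-9, s_max)):
        prev = gap(lo)
        for i in range(1, n + 1):
            cur = gap(lo + (hi - lo) * i / n)
            if prev * cur < 0.0:
                count += 1
            prev = cur
    return count

if __name__ == "__main__":
    # Hypothetical game with a_Y + b_Y < 0 (the setting of Theorem A5).
    print(count_qre(T_x=0.5, T_y=0.5, a_X=3.0, b_X=1.0, a_Y=-2.0, b_Y=1.0))
    # Hypothetical coordination game with a_Y + b_Y > 0 and a low T_x, for contrast.
    print(count_qre(T_x=0.1, T_y=1.0, a_X=3.0, b_X=1.0, a_Y=1.0, b_Y=2.0))
```

Working in the logit variable spreads out the regions near $x = 0$ and $x = 1$, where $T_X^{II}$ decays only logarithmically and a uniform grid in $x$ would miss low-temperature roots.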

Appendix C.2. Dominating Strategy for the First Player

Finally, we consider the case when there is a dominating strategy for the first player, i.e., $b_X < 0$. According to Figure A3 and Figure A4, the principal branch always seems to go toward $x = 1$. This means that the first player always prefers his/her dominating strategy. We formalize this observation, as well as some important characteristics of this case, in the theorem below; the proof can be found in Appendix D.2.
Figure A3. Bifurcation diagram for a game with one dominating strategy for the first player and a Y + b Y < 0 .
Figure A4. Bifurcation diagram for a game with one dominating strategy for the first player, a Y + b Y > 0 , and a Y < b Y .
Theorem A6.
Given a $2 \times 2$ game in which the diagonal form has $a_X > 0$, $b_X < 0$, $a_X + b_X > 0$, and given $T_y$, the following statements are true:
1. The region $(0.5, 1)$ contains the principal branch.
2. There is no QRE correspondence for $x \in (0, 0.5)$.
3. If $a_Y + b_Y < 0$ or $a_Y > b_Y$, then the principal branch is continuous.
4. If $a_Y + b_Y > 0$ and $b_Y > a_Y$, then the principal branch may not be continuous.
As we can see from Theorem A6, for most cases, the principal branch is continuous. One special case is when a Y + b Y > 0 with b Y > a Y . In fact, this can be seen as a duality, i.e., flipping the role of two players, of the case we have discussed in Part 3 of Theorem A4, where, if T y is within T A and T I , there can be three QRE correspondences.

Appendix D. Detailed Bifurcation Analysis for General 2 × 2 Game

In this section, we provide technical details for the results we stated in Section 4.3 and Appendix C. Before we get into details, we state some results that will be useful throughout the analysis in the following lemma. The proof of this lemma is straightforward and we omit it in this paper.
Lemma A2.
The following statements are true.
1. The derivative of $T_X^{II}$ is given as
$$\frac{\partial T_X^{II}}{\partial x}(x, T_y) = \frac{-(a_X + b_X) L(x, T_y) + b_X}{x(1 - x)\left[\ln(1/x - 1)\right]^2},$$
where
$$L(x, T_y) = y^{II} + x(1 - x) \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x}.$$
2. The derivative of $y^{II}$ is given as
$$\frac{\partial y^{II}}{\partial x} = y^{II}(1 - y^{II}) \frac{a_Y + b_Y}{T_y}.$$
3. For $x \in (0, 1/2) \cup (1/2, 1)$, $\frac{\partial T_X^{II}}{\partial x} > 0$ if and only if $L(x, T_y) < \frac{b_X}{a_X + b_X}$; on the other hand, $\frac{\partial T_X^{II}}{\partial x} < 0$ if and only if $L(x, T_y) > \frac{b_X}{a_X + b_X}$.
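Since the proof of Lemma A2 is omitted, a numerical cross-check may be useful. The sketch below implements $y^{II}$ and $T_X^{II}$ as in Equations (6) and (7) (in the form used throughout this appendix), the function $L$, and the closed-form derivative of Part 1, and compares the latter with a central finite difference. The function names, the step size, and the example payoffs are assumptions chosen for illustration.

```python
import math

def y_II(x, T_y, a_Y, b_Y):
    """Equation (7): second player's response to the first player's mixture x."""
    return 1.0 / (1.0 + math.exp((-(a_Y + b_Y) * x + b_Y) / T_y))

def T_X_II(x, T_y, a_X, b_X, a_Y, b_Y):
    """Equation (6): temperature T_x at which x is a QRE, given T_y."""
    y = y_II(x, T_y, a_Y, b_Y)
    return (-(a_X + b_X) * y + b_X) / math.log(1.0 / x - 1.0)

def L(x, T_y, a_Y, b_Y):
    """The auxiliary function L(x, T_y) of Lemma A2."""
    y = y_II(x, T_y, a_Y, b_Y)
    dy_dx = y * (1.0 - y) * (a_Y + b_Y) / T_y  # Part 2 of Lemma A2
    return y + x * (1.0 - x) * math.log(1.0 / x - 1.0) * dy_dx

def dT_X_II_dx(x, T_y, a_X, b_X, a_Y, b_Y):
    """Closed-form derivative from Part 1 of Lemma A2."""
    log_term = math.log(1.0 / x - 1.0)
    return (-(a_X + b_X) * L(x, T_y, a_Y, b_Y) + b_X) / (x * (1.0 - x) * log_term ** 2)

def finite_difference(x, T_y, a_X, b_X, a_Y, b_Y, h=1e-6):
    """Central-difference approximation of the same derivative."""
    return (T_X_II(x + h, T_y, a_X, b_X, a_Y, b_Y)
            - T_X_II(x - h, T_y, a_X, b_X, a_Y, b_Y)) / (2.0 * h)

if __name__ == "__main__":
    params = dict(T_y=1.0, a_X=3.0, b_X=1.0, a_Y=1.0, b_Y=2.0)  # hypothetical game
    for x in (0.1, 0.3, 0.8):
        print(x, dT_X_II_dx(x, **params), finite_difference(x, **params))
```

The two printed columns should agree up to finite-difference error; points close to $x = 1/2$ are avoided because $T_X^{II}$ is singular there.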

Appendix D.1. Case 1: bX ≥ 0

First, we consider the case $b_X \ge 0$. As we are going to show in Proposition A1, the direction of the principal branch relies on $y^{II}(0.5, T_y)$, which is the strategy the second player plays, assuming the first player is indifferent to his/her payoff (i.e., plays $x = 0.5$). The idea is that if $y^{II}(0.5, T_y)$ is large, then the second player pays more attention to the action that the first player thinks is better. This is more likely to happen when the second player is less rational, i.e., has a high temperature $T_y$. On the other hand, if the second player pays more attention to the other action, the first player is forced to choose that action, as it yields a higher expected payoff.
We show that, for $T_y > T_I$, the principal branch lies on $x \in (1/2, 1)$; otherwise, the principal branch lies on $x \in (0, 1/2)$. This result follows from the following proposition:
Proposition A1.
For Case 1, if $T_y > T_I$, then $y^{II}(1/2, T_y) > \frac{b_X}{a_X + b_X}$; hence,
$$\lim_{x \to \frac{1}{2}^{+}} T_X^{II}(x, T_y) = +\infty \quad \text{and} \quad \lim_{x \to \frac{1}{2}^{-}} T_X^{II}(x, T_y) = -\infty.$$
On the other hand, if $T_y < T_I$, then $y^{II}(1/2, T_y) < \frac{b_X}{a_X + b_X}$; hence,
$$\lim_{x \to \frac{1}{2}^{+}} T_X^{II}(x, T_y) = -\infty \quad \text{and} \quad \lim_{x \to \frac{1}{2}^{-}} T_X^{II}(x, T_y) = +\infty.$$
Proof. 
First, consider the case where $b_Y > a_Y$. Then, for $T_y > T_I = \frac{b_Y - a_Y}{2 \ln(a_X / b_X)}$,
$$y^{II}\left(\tfrac{1}{2}, T_y\right) = \left(1 + e^{\frac{b_Y - a_Y}{2 T_y}}\right)^{-1} > \left(1 + e^{\frac{b_Y - a_Y}{2 T_I}}\right)^{-1} = \left(1 + \frac{a_X}{b_X}\right)^{-1} = \frac{b_X}{a_X + b_X}.$$
Then, for the case where $a_Y > b_Y$,
$$y^{II}\left(\tfrac{1}{2}, T_y\right) = \left(1 + e^{\frac{b_Y - a_Y}{2 T_y}}\right)^{-1} > \left(1 + e^{0}\right)^{-1} = \frac{1}{2} \ge \frac{b_X}{a_X + b_X}.$$
For the case where $a_Y = b_Y$, since we assumed $a_X \ne b_X$,
$$y^{II}\left(\tfrac{1}{2}, T_y\right) = \left(1 + e^{\frac{b_Y - a_Y}{2 T_y}}\right)^{-1} = \left(1 + e^{0}\right)^{-1} = \frac{1}{2} > \frac{b_X}{a_X + b_X}.$$
As a result, the numerator of Equation (6) at $x = \frac{1}{2}$ is negative for $T_y > T_I$, which proves the first two limits.
For the remaining two limits, we only need to consider the case $b_Y > a_Y$; otherwise, $T_I = 0$, which is meaningless. For $b_Y > a_Y$ and $T_y < T_I$,
$$y^{II}\left(\tfrac{1}{2}, T_y\right) = \left(1 + e^{\frac{b_Y - a_Y}{2 T_y}}\right)^{-1} < \left(1 + e^{\frac{b_Y - a_Y}{2 T_I}}\right)^{-1} = \left(1 + \frac{a_X}{b_X}\right)^{-1} = \frac{b_X}{a_X + b_X}.$$
This makes the numerator of Equation (6) at $x = \frac{1}{2}$ positive and proves the last two limits.
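The threshold $T_I$ and the sign flip of $T_X^{II}$ around $x = 1/2$ can be illustrated numerically. The sketch below assumes $a_X > b_X > 0$, takes $T_I$ as the maximum of zero and the expression used in the proof (the proof sets $T_I = 0$ when $b_Y \le a_Y$), and uses hypothetical payoffs; the function names are illustrative.

```python
import math

def y_II(x, T_y, a_Y, b_Y):
    """Equation (7)."""
    return 1.0 / (1.0 + math.exp((-(a_Y + b_Y) * x + b_Y) / T_y))

def T_X_II(x, T_y, a_X, b_X, a_Y, b_Y):
    """Equation (6)."""
    y = y_II(x, T_y, a_Y, b_Y)
    return (-(a_X + b_X) * y + b_X) / math.log(1.0 / x - 1.0)

def T_I(a_X, b_X, a_Y, b_Y):
    """Threshold temperature from the proof of Proposition A1 (assumes a_X > b_X > 0)."""
    return max(0.0, (b_Y - a_Y) / (2.0 * math.log(a_X / b_X)))

if __name__ == "__main__":
    a_X, b_X, a_Y, b_Y = 3.0, 1.0, 1.0, 2.0  # hypothetical game with b_Y > a_Y
    t_i = T_I(a_X, b_X, a_Y, b_Y)
    # At T_y = T_I the second player's response at x = 1/2 equals b_X / (a_X + b_X).
    print(y_II(0.5, t_i, a_Y, b_Y), b_X / (a_X + b_X))
    eps = 1e-6
    for T_y in (2.0 * t_i, 0.5 * t_i):
        right = T_X_II(0.5 + eps, T_y, a_X, b_X, a_Y, b_Y)
        left = T_X_II(0.5 - eps, T_y, a_X, b_X, a_Y, b_Y)
        # The signs of `right` and `left` indicate which side carries the principal branch.
        print(T_y, right, left)
```

Evaluating just to the right and left of $x = 1/2$ is only a proxy for the one-sided limits, but the signs and the large magnitudes match the statement of the proposition.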

Appendix D.1.1. Case 1a: bX ≥ 0, aY + bY > 0

In this section, we consider a relaxed version of the class of coordination games discussed in Section 4.3. We prove the theorems presented in Section 4.3, showing that these results can in fact be extended to the case where $a_Y + b_Y > 0$, instead of requiring $a_Y > 0$ and $b_Y > 0$.
First, when $a_Y + b_Y > 0$, $y^{II}$ is an increasing function of $x$, meaning
$$\frac{\partial y^{II}}{\partial x} = y^{II}(1 - y^{II}) \frac{a_Y + b_Y}{T_y} > 0.$$
This implies that both players tend to agree with each other. Intuitively, if $a_Y \ge b_Y$, then both players agree that the first action is the better one. For this case, we can show that, no matter what $T_y$ is, the principal branch lies on $x \in (1/2, 1)$. In fact, this can be extended to the case whenever $T_y > T_I$, which is the first part of Theorem 1.
Proof of Part 1 of Theorem 1.
According to Proposition A1, for $T_y > T_I$, we have $y^{II}(1/2, T_y) > \frac{b_X}{a_X + b_X}$. Since $y^{II}$ is monotonically increasing in $x$, $y^{II} > \frac{b_X}{a_X + b_X}$ for $x > 1/2$. This means that $T_X^{II} > 0$ for any $x \in (1/2, 1)$. Additionally, it is easy to see that $\lim_{x \to 1^{-}} T_X^{II} = 0$. As a result, $(0.5, 1)$ contains the principal branch. ☐
For Case 1a with $a_Y \ge b_Y$, on the principal branch, the lower the $T_x$, the closer $x$ is to 1. We show this monotonicity in Proposition A2; it can be used to justify stability owing to Lemma 1.
Proposition A2.
In Case 1a, if $a_Y \ge b_Y$, then $\frac{\partial T_X^{II}}{\partial x} < 0$ for $x \in (1/2, 1)$.
Proof. 
It suffices to show that $L(x, T_y) > \frac{b_X}{a_X + b_X}$ for $x \in (1/2, 1)$. Note that, according to Proposition A1, if $a_Y \ge b_Y$,
$$L(1/2, T_y) = y^{II}(1/2, T_y) \ge \frac{1}{2} \ge \frac{b_X}{a_X + b_X}.$$
Since $y^{II}(x, T_y)$ is monotonically increasing when $a_Y + b_Y > 0$, $y^{II}(x, T_y) > \frac{1}{2}$ for $x \in (1/2, 1)$. As a result, $1 - 2 y^{II} < 0$; hence, we can see that, for $x \in (1/2, 1)$,
$$\frac{\partial L}{\partial x} = \left[(1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}\right] \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x} > 0.$$
Consequently, for $x \in (1/2, 1)$, $L(x, T_y) > \frac{b_X}{a_X + b_X}$; hence, $\frac{\partial T_X^{II}}{\partial x} < 0$ according to Lemma A2.
Proof of Part 1 of Theorem 3.
According to Lemma 1, Proposition A2 implies that every $x \in (0.5, 1)$ is on the principal branch. This directly leads us to Part 1 of Theorem 3. ☐
Next, if we look into the region $x \in (0, 1/2)$, we find that QREs appear in this region only when $T_x$ and $T_y$ are low. This observation is formalized in the proposition below, which directly proves Parts 2 and 3 of Theorem 2, as well as Part 2 of Theorem 3.
Proposition A3.
Consider Case 1a. Let $x_1 = \min\left\{\frac{1}{2}, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\}$ and $T_B = \frac{b_Y}{\ln(a_X / b_X)}$. The following statements are true for $x \in (0, 1/2)$:
1. If $T_y > T_B$, then $T_X^{II} < 0$.
2. If $T_y < T_B$, then $T_X^{II} > 0$ if and only if $x \in (0, x_1)$.
3. $\frac{\partial L}{\partial x} > 0$ for $x \in (0, x_1)$.
4. If $T_y < T_I$, then $\frac{\partial T_X^{II}}{\partial x} > 0$.
5. If $T_y > T_I$, then there is a nonnegative critical temperature $T_C(T_y)$ such that $T_X^{II}(x, T_y) \le T_C(T_y)$ for $x \in (0, 1/2)$. If $T_y < T_B$, then $T_C(T_y)$ is given as $T_X^{II}(x_L)$, where $x_L \in (0, x_1)$ is the unique solution to $L(x, T_y) = \frac{b_X}{a_X + b_X}$.
Proof. 
For the first and second parts, consider any $x \in (0, 1/2)$. We can see that
$$T_X^{II} > 0 \iff y^{II} < \frac{b_X}{a_X + b_X} \iff \left(1 + e^{\frac{1}{T_y}\left(-(a_Y + b_Y) x + b_Y\right)}\right)^{-1} < \frac{b_X}{a_X + b_X} \iff x < \min\left\{\frac{1}{2}, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\} = x_1.$$
Note that for $T_y > \frac{b_Y}{\ln(a_X / b_X)} = T_B$, we have $x_1 < 0$; hence, $T_X^{II} < 0$.
From the above derivation, for all $x \in (0, 1/2)$ such that $T_X^{II}(x, T_y) > 0$, we have $y^{II} < 1/2$ since $\frac{b_X}{a_X + b_X} < 1/2$. Then
$$\frac{\partial L}{\partial x} = \left[(1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}\right] \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x} > 0.$$
Further, when $T_y < T_I$, $y^{II}(1/2, T_y) < \frac{b_X}{a_X + b_X}$. This implies that, for $x \in (0, 1/2)$, $y^{II}(x, T_y) < \frac{b_X}{a_X + b_X}$. Since $\frac{\partial L}{\partial x} > 0$ and $L$ is continuous, $L(x, T_y) < \frac{b_X}{a_X + b_X}$ for $x \in (0, 1/2)$. This implies the fourth part of the proposition.
Next, if we look at the derivative of $T_X^{II}$,
$$\frac{\partial T_X^{II}}{\partial x}(x, T_y) = \frac{-(a_X + b_X) L(x, T_y) + b_X}{x(1 - x)\left[\ln(1/x - 1)\right]^2},$$
we can see that any critical point in $x \in (0, 1/2)$ must satisfy $L(x, T_y) = \frac{b_X}{a_X + b_X}$. When $T_y > T_I$, $x_1 < 1/2$ and $L(x_1, T_y) > y^{II}(x_1, T_y) = \frac{b_X}{a_X + b_X}$. If $T_y < \frac{b_Y}{\ln(a_X / b_X)}$, then $\lim_{x \to 0^{+}} L(x, T_y) = y^{II}(0, T_y) < \frac{b_X}{a_X + b_X}$. Hence, there is exactly one critical point of $T_X^{II}$ for $x \in (0, x_1)$, which is a local maximum of $T_X^{II}$. If $T_y > \frac{b_Y}{\ln(a_X / b_X)}$, then $T_X^{II}$ is always negative, in which case the critical temperature is zero. ☐
The results in Proposition A3 apply not only to the case $a_Y \ge b_Y$ but also, more generally, to the behavior on $(0, 1/2)$. According to this proposition, we can conclude the following for the case $a_Y \ge b_Y$, as well as the case $a_Y < b_Y$ when $T_y > T_I$:
  • The temperature $T_B = \frac{b_Y}{\ln(a_X / b_X)}$ determines whether a branch appears in $x \in (0, 1/2)$.
  • There is some critical temperature $T_C$. If we raise $T_x$ above $T_C$, then the system is always on the principal branch.
  • The critical temperature $T_C$ is obtained from the solution $x_L$ of the equality $L(x, T_y) = \frac{b_X}{a_X + b_X}$, as $T_C = T_X^{II}(x_L, T_y)$.
When there is a positive critical temperature, though it has no closed-form solution, we can perform a binary search for the $x \in (0, x_1)$ that satisfies $L(x, T_y) = \frac{b_X}{a_X + b_X}$.
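A minimal sketch of such a binary search follows. It relies on Part 3 of Proposition A3 ($L$ is increasing on $(0, x_1)$) and assumes Case 1a with $a_X > b_X > 0$ and $T_I < T_y < T_B$, so that $L$ crosses the threshold exactly once on the search interval. The function names, the tolerance, the small left endpoint, and the example payoffs are illustrative choices, not part of the paper.

```python
import math

def y_II(x, T_y, a_Y, b_Y):
    """Equation (7)."""
    return 1.0 / (1.0 + math.exp((-(a_Y + b_Y) * x + b_Y) / T_y))

def T_X_II(x, T_y, a_X, b_X, a_Y, b_Y):
    """Equation (6)."""
    y = y_II(x, T_y, a_Y, b_Y)
    return (-(a_X + b_X) * y + b_X) / math.log(1.0 / x - 1.0)

def L(x, T_y, a_Y, b_Y):
    """The auxiliary function L(x, T_y) of Lemma A2."""
    y = y_II(x, T_y, a_Y, b_Y)
    dy_dx = y * (1.0 - y) * (a_Y + b_Y) / T_y
    return y + x * (1.0 - x) * math.log(1.0 / x - 1.0) * dy_dx

def critical_temperature(T_y, a_X, b_X, a_Y, b_Y, tol=1e-12):
    """Return (x_L, T_C) for the branch on (0, 1/2), following Proposition A3."""
    target = b_X / (a_X + b_X)
    x_1 = min(0.5, (-T_y * math.log(a_X / b_X) + b_Y) / (a_Y + b_Y))
    if x_1 <= 0.0:
        return None, 0.0  # T_y >= T_B: no branch on (0, 1/2), critical temperature is zero
    lo, hi = 1e-12, x_1
    if not (L(lo, T_y, a_Y, b_Y) < target < L(hi, T_y, a_Y, b_Y)):
        raise ValueError("no sign change of L - b_X/(a_X + b_X) on (0, x_1)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L(mid, T_y, a_Y, b_Y) < target:
            lo = mid  # L is increasing on (0, x_1), so the root lies to the right
        else:
            hi = mid
    x_L = 0.5 * (lo + hi)
    return x_L, T_X_II(x_L, T_y, a_X, b_X, a_Y, b_Y)

if __name__ == "__main__":
    # Hypothetical coordination game with T_I < T_y < T_B.
    x_L, T_C = critical_temperature(T_y=1.0, a_X=3.0, b_X=1.0, a_Y=1.0, b_Y=2.0)
    print(x_L, T_C)
```

Raising $T_x$ above the returned `T_C` removes the pair of QREs on $(0, 1/2)$ for this value of $T_y$, which is the quantity a hysteresis mechanism would need to exceed.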
Another result we are able to obtain from Proposition A3 is that the principal branch for Case 1a when $T_y < T_I$ lies on $(0, 1/2)$.
Proof of Part 2 of Theorem 1.
First, we note that $T_y < T_I$ is meaningful only when $b_Y > a_Y$, in which case we always have $T_I < T_B$. From Proposition A3, we can see that for $T_y < T_I$, we have $x_1 = 1/2$; hence, $T_X^{II} > 0$ for $x \in (0, 1/2)$. From Proposition A1, we already have $\lim_{x \to \frac{1}{2}^{-}} T_X^{II} = +\infty$. Additionally, it is easy to see that $\lim_{x \to 0^{+}} T_X^{II} = 0$. As a result, since $T_X^{II}$ is continuously differentiable over $(0, 0.5)$, for any $T_x > 0$, there exists $x \in (0, 0.5)$ such that $T_X^{II}(x, T_y) = T_x$. ☐
What remains to be shown is the characteristics on the side ( 1 / 2 , 1 ) when b Y > a Y . In Figure 4 and Figure 5, for low T y , the branch on the side ( 1 / 2 , 1 ) demonstrated a similar behavior as what we have shown in Proposition A3 for the side ( 0 , 1 / 2 ) . However, for a high T y , while we still can find that ( 0 , 1 / 2 ) contains the principal branch, the principal branch is not continuous. These observations are formalized in the following proposition. From this proposition, the proof of Part 4 of Theorem 2 directly follows.
Proposition A4.
Consider Case 1a with $b_Y > a_Y$. Let $x_2 = \max\left\{\frac{1}{2}, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\}$ and $T_A = \max\left\{0, \frac{-a_Y}{\ln(a_X / b_X)}\right\}$. The following statements are true for $x \in (1/2, 1)$.
  • If $T_y < T_A$, then $T_X^{II} < 0$.
  • If $T_y > T_A$, then $T_X^{II} > 0$ if and only if $x \in (x_2, 1)$.
  • For $x \in \left[\frac{b_Y}{a_Y + b_Y}, 1\right)$, we have $\frac{\partial L}{\partial x} > 0$.
  • If $T_y \in (T_A, T_I)$, then there is a positive critical temperature $T_C(T_y)$ such that $T_X^{II}(x, T_y) \le T_C(T_y)$ for $x \in (1/2, 1)$, given as $T_C(T_y) = T_X^{II}(x_L)$, where $x_L \in (1/2, 1)$ is the unique solution of $L(x, T_y) = \frac{b_X}{a_X + b_X}$.
Proof. 
For the first part and the second part, consider $x \in (1/2, 1)$. We can find that
$$T_X^{II} > 0 \iff y^{II} > \frac{b_X}{a_X + b_X} \iff \left(1 + e^{\frac{1}{T_y}\left(-(a_Y + b_Y) x + b_Y\right)}\right)^{-1} > \frac{b_X}{a_X + b_X} \iff x > \max\left\{\frac{1}{2}, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\} = x_2.$$
Note that, for $T_y > T_I$, $x_2 = 1/2$. Additionally, if $T_y < T_A$, then $T_X^{II} < 0$ for all $x \in (1/2, 1)$.
For the third part, $y^{II} \ge \frac{1}{2}$ for all $x \ge \frac{b_Y}{a_Y + b_Y}$, and $\frac{b_Y}{a_Y + b_Y} > \frac{1}{2}$. Thus,
$$\frac{\partial L}{\partial x} = \left[(1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}\right] \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x} > 0.$$
For the fourth part, we can find that any critical point of $L(x, T_y)$ in $(0, 1)$ must either be $x = \frac{1}{2}$ or satisfy the following equation:
$$(1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y} = 0.$$
Consider $G(x, T_y) = (1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}$. For $b_Y > a_Y$, $y^{II}(1/2, T_y)$ is strictly less than $1/2$. Additionally, $\frac{b_Y}{a_Y + b_Y} > 1/2$. Now, $G(1/2, T_y) > 0$ and $G\left(\frac{b_Y}{a_Y + b_Y}, T_y\right) < 0$. Next, we can see that $G(x, T_y)$ is monotonically decreasing with respect to $x$ for $x \in \left(\frac{1}{2}, \frac{b_Y}{a_Y + b_Y}\right)$ by looking at its derivative:
$$\frac{\partial G(x, T_y)}{\partial x} = -2 + \frac{a_Y + b_Y}{T_y}\left[(1 - 2x)(1 - 2 y^{II}) - 2x(1 - x) \frac{\partial y^{II}}{\partial x}\right] < 0.$$
As a result, there is some $x^{*} \in \left(\frac{1}{2}, \frac{b_Y}{a_Y + b_Y}\right)$ such that $G(x^{*}, T_y) = 0$. This implies that $L(x, T_y)$ has exactly one critical point for $x \in \left(\frac{1}{2}, \frac{b_Y}{a_Y + b_Y}\right)$. Additionally, if $G(x, T_y) > 0$, then $\frac{\partial L}{\partial x} < 0$; if $G(x, T_y) < 0$, then $\frac{\partial L}{\partial x} > 0$. Therefore, $x^{*}$ is a local minimum for $L$.
From the above arguments, we can conclude that the shape of $L(x, T_y)$ for $T_y < T_I$ is as follows:
  • There is a local maximum at $x = 1/2$, where $L(1/2, T_y) = y^{II}(1/2, T_y) < \frac{b_X}{a_X + b_X}$.
  • $L$ is decreasing on the interval $\left(\frac{1}{2}, x^{*}\right)$, where $x^{*}$ is the unique solution to Equation (A7).
  • $L$ is increasing on the interval $(x^{*}, 1)$. If $T_y > T_A$, then $\lim_{x \to 1} L(x, T_y) = y^{II}(1, T_y) > \frac{b_X}{a_X + b_X}$.
Finally, we can claim that there is a unique solution to $L(x, T_y) = \frac{b_X}{a_X + b_X}$, and such a point gives a local maximum of $T_X^{II}$. ☐
The above proposition suggests that, for $T_y \in (T_A, T_I)$, we are able to use binary search to find the critical temperature. For $T_y > T_I$, unfortunately, an argument similar to that of Proposition A4 shows that there can be up to two critical points of $T_X^{II}$ on $(1/2, 1)$, as shown in Figure 5, which may induce an unstable segment between two stable segments. This also proves Part 3 of Theorem 3.
Now, we have enough material to prove the remaining statements in Section 4.3.
Proof of Parts 1, 5, and 6 of Theorem 2, Part 4 of Theorem 3.
For $T_y > T_I$, by Proposition A3, we can conclude that, for $x \in (0, x_L)$, we have $\frac{\partial T_X^{II}}{\partial x} > 0$, for which the QREs are stable by Lemma 1. With similar arguments, we can conclude that the QREs on $x \in (x_L, x_1)$ are unstable. Additionally, given $T_x$, the stable QRE $x_a \in (0, x_L)$ and the unstable QRE $x_b \in (x_L, x_1)$ that satisfy $T_X^{II}(x_a, T_y) = T_X^{II}(x_b, T_y) = T_x$ appear in pairs. For $T_y < T_I$, with the same technique and by Proposition A4, we can claim that the QREs in $x \in (x_2, x_L)$ are unstable, while the QREs in $x \in (x_L, 1)$ are stable. This proves the first part of Theorem 2 and Part 4 of Theorem 3.
Parts 5 and 6 of Theorem 2 are corollaries of Part 5 of Proposition A3 and Part 4 of Proposition A4. ☐

Appendix D.1.2. Case 1b: bX > 0, aY + bY < 0

In this case, the two players have different preferences. For games within this class, there is only one Nash equilibrium (either pure or mixed). We presented examples in Figure A1 and Figure A2, in which there is only one QRE for given $T_x$ and $T_y$. We show in the following two propositions that this observation holds for all instances.
Proposition A5.
Consider Case 1b. Let $x_3 = \max\left\{0, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\}$. If $T_y < T_I$, then the following statements are true:
1. $T_X^{II}(x, T_y) < 0$ for $x \in (1/2, 1)$.
2. $T_X^{II}(x, T_y) > 0$ for $x \in \left(x_3, \frac{1}{2}\right)$.
3. $\frac{\partial T_X^{II}(x, T_y)}{\partial x} > 0$ for $x \in \left(x_3, \frac{1}{2}\right)$.
4. $\left(x_3, \frac{1}{2}\right)$ contains the principal branch.
Proof. 
Note that, if $T_y < T_I$, then $x_3 < 1/2$. Additionally, according to Proposition A1, $y^{II}(1/2, T_y) < \frac{b_X}{a_X + b_X}$. Since $y^{II}$ is continuous and monotonically decreasing in $x$, $y^{II} < \frac{b_X}{a_X + b_X}$ for $x > 1/2$. Therefore, the numerator of Equation (6) is always positive for $x \in (1/2, 1)$, which makes $T_X^{II}$ negative. This proves the first part of the proposition.
For the second part, observe that, for $x \in (0, 1/2)$, $T_X^{II} > 0$ if and only if $y^{II} < \frac{b_X}{a_X + b_X}$. This is equivalent to $x > \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}$.
For the third part, note that, for $x \in (0, 1/2)$, $x(1 - x) \ln(1/x - 1) \frac{\partial y^{II}}{\partial x} < 0$. This implies $L(x, T_y) < y^{II}(x, T_y) < \frac{b_X}{a_X + b_X}$ for $x \in (x_3, 1/2)$, from which we can conclude that $\frac{\partial T_X^{II}(x, T_y)}{\partial x} > 0$.
Finally, we note that if $x_3 > 0$, then $T_X^{II}(x_3, T_y) = 0$. If $x_3 = 0$, we have $\lim_{x \to 0^{+}} T_X^{II} = 0$. As a result, we can conclude that $(x_3, 1/2)$ contains the principal branch. ☐
With similar arguments, we are able to show the following proposition for $T_y > T_I$:
Proposition A6.
Consider Case 1b. Let $x_3 = \min\left\{1, \frac{-T_y \ln(a_X / b_X) + b_Y}{a_Y + b_Y}\right\}$. If $T_y > T_I$, then the following statements are true:
1. $T_X^{II}(x, T_y) < 0$ for $x \in (0, 1/2)$.
2. $T_X^{II}(x, T_y) > 0$ for $x \in \left(\frac{1}{2}, x_3\right)$.
3. $\frac{\partial T_X^{II}(x, T_y)}{\partial x} < 0$ for $x \in \left(\frac{1}{2}, x_3\right)$.
4. $\left(\frac{1}{2}, x_3\right)$ contains the principal branch.

Appendix D.1.3. Case 1c: aY + bY = 0

In this case, we have $T_I = \frac{b_Y}{\ln(a_X / b_X)}$, and $y^{II}$ is constant with respect to $x$. The proof of Theorem A5 for $a_Y + b_Y = 0$ directly follows from the following proposition.
Proposition A7.
Consider Case 1c. The following statements are true:
1. If $T_y < T_I$, then $T_X^{II}(x, T_y) < 0$ for $x \in (0.5, 1)$, and $T_X^{II}(x, T_y) > 0$ for $x \in (0, 0.5)$.
2. If $T_y > T_I$, then $T_X^{II}(x, T_y) < 0$ for $x \in (0, 0.5)$, and $T_X^{II}(x, T_y) > 0$ for $x \in (0.5, 1)$.
3. If $T_y < T_I$, then $\frac{\partial T_X^{II}(x, T_y)}{\partial x} > 0$ for $x \in (0, 0.5)$.
4. If $T_y > T_I$, then $\frac{\partial T_X^{II}(x, T_y)}{\partial x} < 0$ for $x \in (0.5, 1)$.
Proof. 
Note that $y^{II} = \left(1 + e^{b_Y / T_y}\right)^{-1}$.
First, consider the case when $a_Y > b_Y$. In this case, $T_I = 0$ and $b_Y < 0$. Therefore, $y^{II} > \frac{b_X}{a_X + b_X}$, from which we can conclude that $T_X^{II} > 0$ for $x \in (0.5, 1)$ and $T_X^{II} < 0$ for $x \in (0, 0.5)$, for any positive $T_y$.
Now consider the case where $a_Y < b_Y$. If $T_y < T_I$, then $y^{II} < \frac{b_X}{a_X + b_X}$; hence, we get $T_X^{II}(x, T_y) < 0$ for $x \in (0.5, 1)$ and $T_X^{II}(x, T_y) > 0$ for $x \in (0, 0.5)$, which is the first part of the proposition statement. Similarly, if $T_y > T_I$, then $y^{II} > \frac{b_X}{a_X + b_X}$, from which the second part of the proposition follows.
For the third part and the fourth part, note that $L(x, T_y) = y^{II}$ in this case, as $\frac{\partial y^{II}}{\partial x} = 0$ as per Equation (A5), and the sign of the derivative of $T_X^{II}$ can be seen from Lemma A2. ☐

Appendix D.2. Case 2: bX < 0

In this case, the first action is a dominating strategy for the first player. Note that both $-(a_X + b_X)$ and $b_X$ are not positive, which means that the numerator of Equation (6) is always smaller than or equal to zero. This implies that all QRE correspondences appear on $x \in (1/2, 1)$. In fact, since $y^{II} > 0$ for $x \in (1/2, 1)$, the numerator of Equation (6) is always negative, so we have $T_X^{II} > 0$ for $x \in (1/2, 1)$. Additionally, we can easily see that
$$\lim_{x \to \frac{1}{2}^{+}} T_X^{II}(x, T_y) = +\infty.$$
This implies that $(1/2, 1)$ contains the principal branch. First, we show the result when $a_Y + b_Y < 0$ in the following proposition. The corresponding bifurcation diagram is presented in Figure A3.
Proposition A8.
For Case 2, if $a_Y + b_Y < 0$, then for $x \in (1/2, 1)$, $\frac{\partial T_X^{II}}{\partial x} < 0$.
Proof. 
In this case, $y^{II}$ is monotonically decreasing in $x$. We can see that
$$L(x, T_y) = y^{II} + x(1 - x) \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x} > y^{II} > 0,$$
since $x(1 - x) \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}}{\partial x}$ is positive for $x \in (1/2, 1)$. Substituting this into Equation (A4), we have $\frac{\partial T_X^{II}}{\partial x} < 0$. ☐
For $a_Y + b_Y > 0$, if $a_Y > b_Y$, the bifurcation diagram follows a trend similar to that in Figure A3, while, if $a_Y < b_Y$, we lose the continuity of the principal branch.
Proposition A9.
For Case 2, if $a_Y + b_Y > 0$, then for $x \in (1/2, 1)$, we have
1. if $a_Y > b_Y$, then $\frac{\partial T_X^{II}}{\partial x} < 0$.
2. if $a_Y < b_Y$, then $T_X^{II}$ has at most two local extrema.
Proof. 
In this case, $y^{II}$ is monotonically increasing in $x$. For $a_Y > b_Y$, we can find that $y^{II}(1/2, T_y) > 0$ and $L(1/2, T_y) = y^{II}(1/2, T_y) > 0$. Additionally, we can obtain that $L$ is monotonically increasing for $x \in (1/2, 1)$ by inspecting
$$\frac{\partial L(x, T_y)}{\partial x} = \left[(1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}\right] \ln\left(\frac{1}{x} - 1\right) \frac{\partial y^{II}(x, T_y)}{\partial x} > 0.$$
Hence, for $x \in (1/2, 1)$, $L(x, T_y) > 0$. This implies $\frac{\partial T_X^{II}}{\partial x} < 0$ for $x \in (1/2, 1)$.
For the second part, we can find that, for $a_Y < b_Y$, $y^{II}(1/2) < 1/2$. Let $x_2 = \min\left\{1, \frac{b_Y}{a_Y + b_Y}\right\}$. First note that, if $x_2 < 1$, then, for $x > x_2$, we have $y^{II} > 1/2$, and further we can get $\frac{\partial L(x, T_y)}{\partial x} > 0$ for $x \in (x_2, 1)$. We use the same technique as in the proof of Proposition A4. Let $G(x, T_y) = (1 - 2x) + x(1 - x)(1 - 2 y^{II}) \frac{a_Y + b_Y}{T_y}$. Note that $G(1/2, T_y) > 0$ and $G(x_2, T_y) < 0$. Next, observe that $G(x, T_y)$ is monotonically decreasing for $x \in \left(\frac{1}{2}, x_2\right)$. Hence, there is an $x^{*} \in (1/2, x_2)$ such that $G(x^{*}, T_y) = 0$. This $x^{*}$ is a local minimum for $L$. We can conclude that, for $x \in (1/2, 1)$, $L$ has the following shape:
  • There is a local maximum at $x = 1/2$, where $L(1/2, T_y) = y^{II}(1/2, T_y) > 0$.
  • $L$ is decreasing on the interval $x \in (1/2, x^{*})$, where $x^{*}$ is the solution to $G(x^{*}, T_y) = 0$.
  • $L$ is increasing on the interval $x \in (x^{*}, x_2)$. Note that $\lim_{x \to 1} L(x, T_y) = y^{II}(1, T_y) > 0$.
As a result, if $L(x^{*}, T_y) > \frac{b_X}{a_X + b_X}$, then $T_X^{II}$ is monotonically decreasing; otherwise, $T_X^{II}$ has a local minimum and a local maximum on $(1/2, 1)$. ☐

References

  1. Devaney, R.L. A First Course in Chaotic Dynamical Systems; Westview Press: Boulder, CO, USA, 1992. [Google Scholar]
  2. Roughgarden, T. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC 2009), Bethesda, MD, USA, 31 May–2 June 2009; pp. 513–522. [Google Scholar]
  3. Palaiopanos, G.; Panageas, I.; Piliouras, G. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5874–5884. [Google Scholar]
  4. Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D Thesis, University of Cambridge, Cambridge, UK, 1989. [Google Scholar]
  5. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  6. Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning, Amherst, MA, USA, 27–29 June 1993; pp. 330–337. [Google Scholar]
  7. McKelvey, R.D.; Palfrey, T.R. Quantal response equilibria for normal form games. Games Econ. Behav. 1995, 10, 6–38. [Google Scholar] [CrossRef]
  8. Nash, J. Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 1950, 36, 48–49. [Google Scholar] [CrossRef] [PubMed]
  9. Wolpert, D.H.; Harré, M.; Olbrich, E.; Bertschinger, N.; Jost, J. Hysteresis effects of changing the parameters of noncooperative games. Phys. Rev. E 2012, 85, 036102. [Google Scholar] [CrossRef] [PubMed]
  10. Kianercy, A.; Veltri, R.; Pienta, K.J. Critical transitions in a game theoretic model of tumour metabolism. Interface Focus 2014, 4, 20140014. [Google Scholar] [CrossRef] [PubMed]
  11. Cominetti, R.; Melo, E.; Sorin, S. A payoff-based learning procedure and its application to traffic games. Games Econ. Behav. 2010, 70, 71–83. [Google Scholar] [CrossRef]
  12. Coucheney, P.; Gaujal, B.; Mertikopoulos, P. Entropy-Driven Dynamics and Robust Learning Procedures in Games. Available online: https://hal.inria.fr/hal-00790815/document (accessed on 25 April 2018).
  13. Sato, Y.; Crutchfield, J.P. Coupled replicator equations for the dynamics of learning in multiagent systems. Phys. Rev. E 2003, 67, 015206. [Google Scholar] [CrossRef] [PubMed]
  14. Tuyls, K.; Verbeeck, K.; Lenaerts, T. A selection-mutation model for q-learning in multi-agent systems. In Proceedings of the 2nd international joint conference on Autonomous agents and multiagent systems, Melbourne, Australia, 14–18 July 2003; pp. 693–700. [Google Scholar]
  15. Sandholm, W.H. Evolutionary game theory. In Encyclopedia of Complexity and Systems Science; Springer: Berlin, Germany, 2009; pp. 3176–3205. [Google Scholar]
  16. Kianercy, A.; Galstyan, A. Dynamics of Boltzmann q learning in two-player two-action games. Phys. Rev. E 2012, 85, 041145. [Google Scholar] [CrossRef] [PubMed]
  17. Hofbauer, J.; Hopkins, E. Learning in perturbed asymmetric games. Games Econ. Behav. 2005, 52, 133–152. [Google Scholar] [CrossRef]
  18. Hofbauer, J.; Sigmund, K. Evolutionary Games and Population Dynamics; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
  19. Perko, L. Differential Equations and Dynamical Systems, 3rd ed.; Springer: Berlin, Germany, 1991. [Google Scholar]
  20. Tomlinson, I.; Bodmer, W. Modelling the consequences of interactions between tumour cells. Br. J. Cancer 1997, 75, 157–160. [Google Scholar] [CrossRef] [PubMed]
  21. Kaznatcheev, A.; Scott, J.G.; Basanta, D. Edge effects in game-theoretic dynamics of spatially structured tumours. J. R. Soc. Interface 2015, 12, 20150154. [Google Scholar] [CrossRef] [PubMed]
  22. Basanta, D.; Scott, J.G.; Fishman, M.N.; Ayala, G.; Hayward, S.W.; Anderson, A.R. Investigating prostate cancer tumour–stroma interactions: Clinical and biological insights from an evolutionary game. Br. J. Cancer 2012, 106, 174–181. [Google Scholar] [CrossRef] [PubMed][Green Version]
  23. Kaznatcheev, A.; Velde, R.V.; Scott, J.G.; Basanta, D. Cancer treatment scheduling and dynamic heterogeneity in social dilemmas of tumour acidity and vasculature. arXiv, 2016; arXiv:1608.00985. [Google Scholar]
  24. Basanta, D.; Simon, M.; Hatzikirou, H.; Deutsch, A. Evolutionary game theory elucidates the role of glycolysis in glioma progression and invasion. Cell Prolif. 2008, 41, 980–987. [Google Scholar] [CrossRef] [PubMed]
  25. Hanahan, D.; Weinberg, R.A. The hallmarks of cancer. Cell 2000, 100, 57–70. [Google Scholar] [CrossRef]
  26. Axelrod, R.; Axelrod, D.E.; Pienta, K.J. Evolution of cooperation among tumor cells. Proc. Natl. Acad. Sci. USA 2006, 103, 13474–13479. [Google Scholar] [CrossRef] [PubMed]
  27. Hanahan, D.; Weinberg, R.A. Hallmarks of cancer: The next generation. Cell 2011, 144, 646–674. [Google Scholar] [CrossRef] [PubMed]
  28. Ribeiro, M.; Silva, A.S.; Bailey, K.M.; Kumar, N.B.; Sellers, T.A.; Gatenby, R.A.; Ibrahim-Hashim, A.; Gillies, R.J. Buffer Therapy for Cancer. J. nutr. Food Sci. 2012, 2, 6. [Google Scholar] [CrossRef] [PubMed]
  29. Piliouras, G.; Nieto-Granda, C.; Christensen, H.I.; Shamma, J.S. Persistent Patterns: Multi-agent Learning Beyond Equilibrium and Utility. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems (AAMAS), Paris, France, 5–9 May 2014; pp. 181–188.
  30. Papadimitriou, C.; Piliouras, G. From Nash Equilibria to Chain Recurrent Sets: Solution Concepts and Topology. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, Cambridge, MA, USA, 14–16 January 2016; pp. 227–235.
  31. Panageas, I.; Piliouras, G. Average case performance of replicator dynamics in potential games via computing regions of attraction. In Proceedings of the 2016 ACM Conference on Economics and Computation, Maastricht, The Netherlands, 24–28 July 2016; pp. 703–720.
  32. Bloembergen, D.; Tuyls, K.; Hennes, D.; Kaisers, M. Evolutionary dynamics of multi-agent learning: A survey. J. Artif. Intell. Res. 2015, 53, 659–697.
  33. Romero, J. The effect of hysteresis on equilibrium selection in coordination games. J. Econ. Behav. Organ. 2015, 111, 88–105.
  34. Kleinberg, R.; Ligett, K.; Piliouras, G.; Tardos, É. Beyond the Nash equilibrium barrier. In Proceedings of the Symposium on Innovations in Computer Science (ICS), Beijing, China, 7–9 January 2011.
  35. Piliouras, G.; Shamma, J.S. Optimization Despite Chaos: Convex Relaxations to Complex Limit Sets via Poincaré Recurrence. In Proceedings of the Symposium of Discrete Algorithms (SODA), Portland, OR, USA, 5–7 January 2014.
  36. Kleinberg, R.; Piliouras, G.; Tardos, É. Multiplicative Updates Outperform Generic No-Regret Learning in Congestion Games. In Proceedings of the ACM Symposium on Theory of Computing (STOC), Bethesda, MD, USA, 31 May–2 June 2009.
  37. Mehta, R.; Panageas, I.; Piliouras, G.; Yazdanbod, S. The Computational Complexity of Genetic Diversity. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA 2016), Aarhus, Denmark, 22–24 August 2016; Sankowski, P., Zaroliagis, C., Eds.; Leibniz International Proceedings in Informatics (LIPIcs). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik: Dagstuhl, Germany, 2016; Volume 57, p. 65.
  38. Livnat, A.; Papadimitriou, C.; Dushoff, J.; Feldman, M.W. A mixability theory for the role of sex in evolution. Proc. Natl. Acad. Sci. USA 2008, 105, 19803–19808. Available online: http://www.pnas.org/content/105/50/19803.full.pdf+html (accessed on 20 April 2018).
  39. Chastain, E.; Livnat, A.; Papadimitriou, C.H.; Vazirani, U.V. Multiplicative updates in coordination games and the theory of evolution. In Proceedings of the 4th Innovations in Theoretical Computer Science (ITCS) conference, Berkeley, CA, USA, 10–12 January 2013; pp. 57–58.
  40. Chastain, E.; Livnat, A.; Papadimitriou, C.; Vazirani, U. Algorithms, games, and evolution. Proc. Natl. Acad. Sci. USA 2014, 111, 10620–10623. Available online: http://www.pnas.org/content/early/2014/06/11/1406556111.full.pdf+html (accessed on 20 April 2018).
  41. Livnat, A.; Papadimitriou, C.; Rubinstein, A.; Valiant, G.; Wan, A. Satisfiability and evolution. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), Philadelphia, PA, USA, 18–21 October 2014; pp. 524–530.
  42. Meir, R.; Parkes, D. A Note on Sex, Evolution, and the Multiplicative Updates Algorithm. In Proceedings of the 12th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 15), Istanbul, Turkey, 4–8 May 2015.
  43. Mehta, R.; Panageas, I.; Piliouras, G. Natural Selection as an Inhibitor of Genetic Diversity: Multiplicative Weights Updates Algorithm and a Conjecture of Haploid Genetics. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS 2015, Rehovot, Israel, 11–13 January 2015.
  44. Mehta, R.; Panageas, I.; Piliouras, G.; Tetali, P.; Vazirani, V.V. Mutation, Sexual Reproduction and Survival in Dynamic Environments. In Proceedings of the 2017 Conference on Innovations in Theoretical Computer Science (To Appear), ITCS’ 17, Berkeley, CA, USA, 9–11 January 2017.
  45. Livnat, A.; Papadimitriou, C. Sex as an algorithm: The theory of evolution under the lens of computation. Commun. ACM (CACM) 2016, 59, 84–93.
  46. Sandholm, W.H. Population Games and Evolutionary Dynamics; MIT Press: Cambridge, MA, USA, 2010.
  47. Bendixson, I. Sur les courbes définies par des équations différentielles. Acta Math. 1901, 24, 1–88.
  48. Teschl, G. Ordinary Differential Equations and Dynamical Systems; American Mathematical Soc.: Providence, RI, USA, 2012; Volume 140.
  49. Müller, J.; Kuttler, C. Methods and Models in Mathematical Biology; Springer: Berlin, Germany, 2015.
  50. Meiss, J. Differential Dynamical Systems; SIAM: Philadelphia, PA, USA, 2007.
1. Mixed strategies in the QRE model are sometimes interpreted as frequency distributions of deterministic actions in a large population of users. This population interpretation of mixed strategies is standard and dates back to Nash [8]. Depending on context, we will use either the probabilistic interpretation or the population one.
2. To find the matrix of the orthogonal projection onto $TX$ (or any subspace $Y$ of $\mathbb{R}^n$) it suffices to find a basis $(v_1, v_2, \dots, v_m)$ of that subspace. Let $B$ be the matrix with columns $v_i$; then $P = B (B^T B)^{-1} B^T$.
3. A function $f$ between two topological spaces is called a diffeomorphism if it has the following properties: $f$ is a bijection, $f$ is continuously differentiable, and $f$ has a continuously differentiable inverse. Two flows $\Phi_t : A \to A$ and $\Psi_t : B \to B$ are diffeomorphic if there exists a diffeomorphism $g : A \to B$ such that for each $x \in A$ and $t \in \mathbb{R}$, $g(\Phi_t(x)) = \Psi_t(g(x))$. If two flows are diffeomorphic, then their vector fields are related by the derivative of the conjugacy. That is, we get precisely the same result that we would have obtained if we simply transformed the coordinates in their differential equations [50].
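The projection formula in footnote 2 can be sketched in a few lines. The block below is only an illustration: the helper names (`transpose`, `matmul`, `inverse`, `projection_matrix`) are hypothetical, and the example subspace (vectors in $\mathbb{R}^3$ whose coordinates sum to zero) is an assumed stand-in for $TX$, not taken from the article. Exact rational arithmetic is used so that the result can be checked without rounding error.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular, which happens here
    when the supplied basis vectors are linearly dependent.
    """
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def projection_matrix(basis):
    """Compute P = B (B^T B)^{-1} B^T, where the columns of B are the basis vectors."""
    B = transpose([[Fraction(v) for v in vec] for vec in basis])
    Bt = transpose(B)
    return matmul(matmul(B, inverse(matmul(Bt, B))), Bt)


# Assumed example subspace: vectors of R^3 whose coordinates sum to zero.
basis = [(1, -1, 0), (0, 1, -1)]
P = projection_matrix(basis)
```

For this choice of basis, `P` has $2/3$ on the diagonal and $-1/3$ elsewhere; it is symmetric and satisfies $P^2 = P$, as an orthogonal projection should.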
Figure 1. Bifurcation diagram for a 2 × 2 population coordination game. The x axis corresponds to the system temperature $T$, whereas the y axis corresponds to the projection of the proportion of the first population using the first strategy at equilibrium. For small $T$, the system exhibits multiple equilibria. By starting at $T = 0$, increasing the temperature beyond the critical threshold $T_C = 6$, and then bringing it back to zero, we can force the system to converge to another equilibrium.
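The heat-then-cool procedure described in the Figure 1 caption can be illustrated with a toy simulation. This is a sketch under several assumptions that are not from the article: a single-population symmetric coordination game with hypothetical payoffs `a = 2` and `b = 1`, logit (smoothed best-response) dynamics integrated with explicit Euler steps, and a small positive `T_LOW` in place of $T = 0$. The critical temperature of this toy game therefore differs from the $T_C = 6$ shown in the figure.

```python
import math


def logit_choice(x, T, a=2.0, b=1.0):
    """Share choosing strategy 1 under logit response at temperature T.

    a and b are hypothetical payoffs for coordinating on strategy 1 or 2.
    """
    gain = a * x - b * (1.0 - x)  # payoff advantage of strategy 1
    z = gain / T
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relax(x, T, steps=500, dt=0.1):
    """Euler-integrate dx/dt = logit_choice(x, T) - x at fixed temperature."""
    for _ in range(steps):
        x += dt * (logit_choice(x, T) - x)
    return x


def run_schedule(x0, temperatures):
    """Let the state settle at each temperature in turn; return (T, x) pairs."""
    x = x0
    trajectory = []
    for T in temperatures:
        x = relax(x, T)
        trajectory.append((T, x))
    return trajectory


T_LOW, T_HIGH, N = 0.05, 1.0, 20
ramp_up = [T_LOW + k * (T_HIGH - T_LOW) / N for k in range(N + 1)]
heat_pulse = ramp_up + ramp_up[::-1]  # raise the temperature, then lower it

# Both runs start near the inferior equilibrium (almost everyone on strategy 2).
stay_cold = run_schedule(0.01, [T_LOW] * len(heat_pulse))
pulsed = run_schedule(0.01, heat_pulse)
```

In this toy model the cold run stays near the inferior equilibrium, while the pulsed run ends near the superior one even though both finish at the same low temperature, which is the hysteresis effect the caption describes.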
Figure 2. The bifurcation diagram for Example 1 with $T_y = 0.5$. The horizontal axis corresponds to the temperature $T_x$ for the first (row) player and the vertical axis corresponds to the probability that the first player chooses the first action in equilibrium. There exist three branches (two stable and one unstable). For $x > 0.5$, two branches appear as a pair, and they exist only when $T_x$ is below some value. For $x < 0.5$, there is a branch, which we call the principal branch, on which a quantal response equilibrium (QRE) exists for every $T_x > 0$.
Figure 3. Bifurcation diagram for Example 1 with $T_y = 2$. The horizontal axis corresponds to the temperature $T_x$ for the first (row) player and the vertical axis corresponds to the probability that the first player chooses the first action in equilibrium. Similar to Figure 2, there exist three branches (two stable and one unstable). However, unlike Figure 2, the paired branches now lie in $x < 0.5$, and the principal branch lies in $x > 0.5$.
Figure 4. Bifurcation diagram for a coordination game with $a_Y < b_Y$ and a low $T_y$. The horizontal axis corresponds to the temperature $T_x$ for the first (row) player and the vertical axis corresponds to the probability that the first player chooses the first action in equilibrium. The principal branch is contained in $x < 0.5$.
Figure 5. Bifurcation diagram for a coordination game with $a_Y < b_Y$ and a high $T_Y$. The horizontal axis corresponds to the temperature $T_x$ for the first (row) player and the vertical axis corresponds to the probability that the first player chooses the first action in equilibrium. The principal branch is contained in $x > 0.5$. In addition, there is an unstable segment on the principal branch.
Figure 6. The left panel shows the social welfare on the principal branch for Example 2, and the right panel shows the case $T_X = 0$. By increasing $T_y$, we can obtain an equilibrium with a social welfare higher than that of the best Nash equilibrium (which corresponds to $T_x = T_y = 0$).
Figure 7. Set of QRE-achievable states for Example 2. A point $(x, y)$ represents a mixed strategy profile where the first agent chooses its first strategy with probability $x$ and the second agent chooses its first strategy with probability $y$. The grey areas depict the set of mixed strategy profiles $(x, y)$ that can be reproduced as QRE states for Example 2, i.e., outcomes for which there exist temperature parameters $(T_x, T_y)$ that make the profile $(x, y)$ a QRE.
Figure 8. Social welfare for all states in Example 2. A point $(x, y)$ represents a mixed strategy profile where the first agent chooses its first strategy with probability $x$ and the second agent chooses its first strategy with probability $y$. The color of the point $(x, y)$ corresponds to the social welfare of that mixed strategy profile, with states of higher social welfare corresponding to lighter shades. The optimal state is $(1, 0)$, whereas the worst state is $(0, 1)$.
Figure 9. Set of QRE-achievable states for a coordination game with $a_Y < b_Y$. A point $(x, y)$ represents a mixed strategy profile where the first agent chooses its first strategy with probability $x$ and the second agent chooses its first strategy with probability $y$. The grey areas depict the set of mixed strategy profiles $(x, y)$ that can be reproduced as QRE states of a coordination game with $a_Y < b_Y$, i.e., outcomes for which there exist temperature parameters $(T_x, T_y)$ that make the profile $(x, y)$ a QRE.
Figure 10. Stable QRE-achievable states for a coordination game with $a_Y > b_Y$. A point $(x, y)$ represents a mixed strategy profile where the first agent chooses its first strategy with probability $x$ and the second agent chooses its first strategy with probability $y$. The grey areas depict the set of mixed strategy profiles $(x, y)$ that can be reproduced as stable QRE states of a coordination game with $a_Y > b_Y$, i.e., outcomes for which there exist temperature parameters $(T_x, T_y)$ that make the profile $(x, y)$ a stable QRE.
Figure 11. Illustration for Phase 1 in Case (B3), where we keep $T_Y$ low but increase $T_X$ and then decrease $T_X$ back to a small value. In this phase, the equilibrium state moves from the branch where $x \in (0.7, 1.0)$ to the principal branch (the branch where $x < 0.5$).
Figure 12. Illustration for Phase 2 in Case (B3). In this phase, we increase T Y to T Y I δ 1 , b X a X + b X δ 2 . The principal branch switches from x < 0.5 to x > 0.5 and the equilibrium state stays on the branch x < 0.5 (the branch pointed out by the blue arrow) only if T X is low.
Figure 13. Interaction diagram between different types of cells. The hypoxic cells can benefit from the presence of oxygenated non-glycolytic cells with modest glucose requirements, whereas cells with aerobic metabolism can benefit from the lactic acid that is a byproduct of anaerobic metabolism.
Table 1. Payoff matrix for the cancer game in [10], where $L > G_o/2$. This 2 × 2 game represents the tumor metabolic symbiosis rewards (ATP generation). The row agent represents hypoxic cells and the column agent represents oxygenated cells; each entry lists the energy generated by the hypoxic and the oxygenated cells, respectively, given their joint actions. Specifically, oxygenated cells can use both glucose and lactate for energy generation, whereas the hypoxic cells can use only glucose. Empirical data as discussed in [10] suggest that $L > G_o/2$.

| Hypoxic / Oxygenated | Glucose | Lactate |
| --- | --- | --- |
| Glucose | $G_h/2,\ G_o/2$ | $G_h,\ L$ |
| Lactate | $0,\ G_o$ | $0,\ 0$ |
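The role of the condition $L > G_o/2$ in Table 1 can be checked with a short sketch. The numerical values of `G_h`, `G_o`, and `L` below are hypothetical and chosen only to satisfy that condition; the function names are likewise illustrative, not from [10].

```python
ACTIONS = ("Glucose", "Lactate")


def cancer_game(G_h, G_o, L):
    """Return (hypoxic, oxygenated) payoff matrices laid out as in Table 1.

    Rows index the hypoxic action, columns the oxygenated action.
    """
    hypoxic = [[G_h / 2, G_h], [0.0, 0.0]]
    oxygenated = [[G_o / 2, L], [G_o, 0.0]]
    return hypoxic, oxygenated


def strictly_dominant_row(payoff):
    """Return the index of a strictly dominant row action, or None."""
    for i, row in enumerate(payoff):
        others = [r for k, r in enumerate(payoff) if k != i]
        if all(all(u > v for u, v in zip(row, other)) for other in others):
            return i
    return None


# Hypothetical values with L > G_o / 2.
hypoxic, oxygenated = cancer_game(G_h=2.0, G_o=2.0, L=1.5)

row_action = strictly_dominant_row(hypoxic)  # hypoxic cells' dominant action
col_payoffs = oxygenated[row_action]         # oxygenated payoffs given that action
col_action = max(range(len(col_payoffs)), key=col_payoffs.__getitem__)
equilibrium = (ACTIONS[row_action], ACTIONS[col_action])
```

With these assumed values, glucose is strictly dominant for the hypoxic cells, and the oxygenated cells' best response to it is lactate precisely because $L > G_o/2$.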
Table 2. Payoff matrix for a coordination game between two agents where neither of the two pure Nash equilibria Pareto dominates the other. States where both agents play the first strategy (Technology 1) are nearly socially optimal and they can be selected via a bifurcation argument.

| Sector A / Sector B | Technology 1 | Technology 2 |
| --- | --- | --- |
| Technology 1 | $10,\ 2$ | $0,\ 0$ |
| Technology 2 | $0,\ 0$ | $5,\ 4$ |
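The claims in the Table 2 caption can be verified directly from the payoffs. The following is a minimal sketch (function names are illustrative) that enumerates the pure Nash equilibria, compares them for Pareto dominance, and computes social welfare as the sum of the two payoffs, which is an assumed but standard definition.

```python
from itertools import product

# Payoffs from Table 2: rows are Sector A's technology, columns are Sector B's.
A = [[10, 0], [0, 5]]  # Sector A (row agent)
B = [[2, 0], [0, 4]]   # Sector B (column agent)


def pure_nash(A, B):
    """Return all pure profiles (i, j) where neither agent gains by deviating."""
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        row_ok = all(A[i][j] >= A[k][j] for k in range(rows))
        col_ok = all(B[i][j] >= B[i][l] for l in range(cols))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria


def pareto_dominates(p, q):
    """True if payoff vector p is at least as good as q for all, better for some."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


equilibria = pure_nash(A, B)
payoffs = {s: (A[s[0]][s[1]], B[s[0]][s[1]]) for s in equilibria}
welfare = {s: sum(u) for s, u in payoffs.items()}
```

Here the two pure equilibria are the profiles where both sectors adopt the same technology, with welfare 12 for Technology 1 and 9 for Technology 2. Sector A prefers the first and Sector B the second, so neither equilibrium Pareto dominates the other.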

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).