Equilibrium Selection under the Bayes-Based Strategy Updating Rules

In this paper, an evolutionary game model with Bayes-based strategy updating rules is first constructed, in which players cannot observe an opponent's strategy type directly but only a noisy signal that may deviate from it. Equilibrium selection is then studied for an asymmetric game, the Battle of the Sexes (BoS), and for a symmetric coordination game, where individuals make decisions based on the signals released by the other players. In the BoS game, when the accuracy of the signal is low, the population eventually reaches an incompatible state; if the accuracy of the signal is improved, the population finally reaches a coordinated state. In the coordination game, when the accuracy of the signal is low, the population eventually chooses the payoff-dominant equilibrium; with the improvement of signal accuracy, the equilibrium finally selected by the population depends on its initial state.


Introduction
Because a coordination game has multiple Nash equilibria, it is difficult to predict the final equilibrium of the game. Individuals constantly change their strategies in the course of play; this dynamic adjustment is a process of equilibrium selection. In evolutionary game theory, individuals with bounded rationality adjust their strategies in the process of natural selection [1][2][3][4][5][6] and finally reach equilibrium.
In a uniformly mixed infinite population [5,7], the evolutionary stability of game equilibria was studied by constructing a deterministic evolutionary dynamic process, replicator dynamics [1,4]. Stochastic replicator dynamic models [8][9][10][11][12][13][14][15] were then built on the deterministic model to study the impact of external random disturbances on population equilibrium selection. For uniformly mixed finite populations [6], Markov chain-based stochastic evolutionary dynamic models were constructed [16][17][18][19][20][21][22][23] to study the evolution of population strategies. For ergodic Markov processes, the stationary distribution over the population state space was calculated [17]; for non-ergodic Markov processes, the fixation probability of reaching each absorbing state was calculated [16,24].
In the above studies, it is assumed that players can directly observe the strategies adopted by other players and the payoffs obtained; players update their strategies based on the payoffs, which determine the probability of choosing each strategy. In the real world, however, it is often difficult to directly observe the strategies adopted by opponents, which must instead be inferred from the signals they release. Therefore, some researchers have begun to use Bayes' theorem to construct learning and updating rules based on Bayesian inference, to study the mechanism of collective behavior formation [25][26][27][28][29][30] and to analyze the equilibrium selection of the population in the long-term evolution process. Fudenberg and Imhof (2006) [17] studied a symmetric game with a rare mutation rate in a finite population. They constructed a Markov chain whose transition probabilities were related to the payoffs. Their research shows that the evolutionary process simplifies significantly and is approximated by a simpler process over the pure states, which dramatically reduces the state space and the calculation of the stationary distribution. Based on this work, Veller and Hayward (2016) [18] further studied an asymmetric game with a rare mutation rate in a finite population and presented analytic results for the stationary distribution of the evolutionary process. Moreover, recent studies have identified Bayesian inference as a mechanism behind collective behavior in nature and human society [26,27]. It is therefore necessary to analyze equilibrium selection from the perspective of Bayesian inference.
In this paper, strategy updating rules based on Bayesian inference were constructed, in which players cannot directly observe the payoffs and strategies of their opponents but only receive related signals, and strategy evolution under these rules was analyzed. For the problem of selecting among multiple equilibria in a coordination game, these rules were used to analyze equilibrium selection in the long-term evolution process. Equilibrium selection was studied both in an asymmetric game, represented by the Battle of the Sexes (BoS) game [17,18], and in a symmetric game. In the asymmetric BoS game [31][32][33][34], players in different roles prefer different strategies: the male player prefers a football match, while the female player prefers a ballet performance. Each must convince the other, otherwise both get nothing; for the collective payoff, it is optimal for them to choose the same strategy. Players thus hold different preferences but hope to adopt the same strategy. In contrast, in a symmetric coordination game, players hold the same preferences and hope to adopt the same strategy. We constructed the evolutionary game model with Bayes-based strategy updating rules and studied evolutionary equilibrium selection in these games.

Model in the BoS Game
Suppose a 2 × 2 Battle of the Sexes game with two populations, men (m) and women (w), each of size N. At each time t, one population is randomly selected from the two populations to update its strategy, and all players in that population update their strategies at the same time. Each individual in the two populations has the same strategy space, with strategy type S ∈ {A, B}. The payoff matrix of the game is given in Table 1, where a > b. There are two pure-strategy Nash equilibria, (A, A) and (B, B); both populations want to choose the same strategy.

At each time t, a player cannot directly observe the strategy types of the players in the other population, but only receives signals $S_{mt} = (S_{1mt}, S_{2mt}, \cdots, S_{Nmt})$ and $S_{wt} = (S_{1wt}, S_{2wt}, \cdots, S_{Nwt})$ about their strategy types. The signal of each player's strategy type is public knowledge; that is, the signals of all players in each population are observed by all players in the opposite population, and all observers receive the same signals. There is a deviation between the strategy type of player i in m (or player j in w) and the signal it sends. The conditional probability that a signal accurately reflects the strategy type of player i or j is

$$P(S_{imt} = s_{im} \mid s_{im}) = P(S_{jwt} = s_{jw} \mid s_{jw}) = u,$$

where 1/2 < u < 1. Meanwhile, because every player updates its strategy from the same public signal information at the same time, all players in a population adopt the same strategy at every time t. Let $P_t^m(A) = P(X(t) = N)$ and $P_t^w(A) = P(Y(t) = N)$ denote the probabilities that all players in m and w, respectively, adopt strategy A at time t. The strategy type of the opposite population is inferred according to Bayesian updating.
The posterior probability, as assessed by each population, that the opposite population plays strategy A is

$$P_t^m(A \mid S_{wt}) = \frac{1}{1 + \left(\frac{1-u}{u}\right)^{\delta_t^w}}, \qquad P_t^w(A \mid S_{mt}) = \frac{1}{1 + \left(\frac{1-u}{u}\right)^{\delta_t^m}},$$

where $\delta_t^w = 2n_t^w - N$ and $\delta_t^m = 2n_t^m - N$; $n_t^w$ and $n_t^m$ are the numbers of strategy-A signals sent by populations w and m at time t, respectively, so $\delta_t^w$ and $\delta_t^m$ are the differences between the numbers of strategy-A and strategy-B signals sent by the two populations at time t. As in the previous model, it is assumed that the prior probabilities of strategies A and B are equal for both populations at each time t, i.e., $P_{t_0}^m(A) = P_{t_0}^w(A) = 1/2$; the probability of choosing strategy A or B at the initial time is 1/2.
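Under the stated assumptions (equiprobable prior, conditionally independent signals with accuracy u), the posterior reduces to a logistic function of the signal difference δ. A minimal sketch in Python (function names are illustrative, not from the paper):

```python
def posterior_A(n_A, N, u):
    """Posterior probability that the opposite population plays A, given that
    n_A out of N signals read 'A', signal accuracy u, and a uniform prior
    P(A) = P(B) = 1/2."""
    like_A = u**n_A * (1 - u)**(N - n_A)   # P(signals | population plays A)
    like_B = (1 - u)**n_A * u**(N - n_A)   # P(signals | population plays B)
    return like_A / (like_A + like_B)

def posterior_A_delta(delta, u):
    """Equivalent closed form in terms of delta = 2*n_A - N."""
    return 1.0 / (1.0 + ((1 - u) / u)**delta)
```

Both forms agree, and a signal tie (δ = 0) returns the prior 1/2.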
In the BoS game, there is a mixed Nash equilibrium $p^* = (p_m^*, p_w^*) = \left(\frac{a}{a+b}, \frac{b}{a+b}\right)$. For population m, if the posterior probability that population w plays strategy A at time t satisfies $P_t^m(A \mid S_{wt}) \ge p_w^*$, then population m adopts strategy A; otherwise, it adopts strategy B. Similarly, population w adopts strategy A if $P_t^w(A \mid S_{mt}) \ge p_m^*$, and strategy B otherwise. From $P_t^m(A \mid S_{wt}) \ge p_w^*$ and $P_t^w(A \mid S_{mt}) \ge p_m^*$, we derive that population m adopts strategy A when $\delta_t^w \ge \log\frac{a}{b} \big/ \log\frac{1-u}{u}$, and population w adopts strategy A when $\delta_t^m \ge \log\frac{b}{a} \big/ \log\frac{1-u}{u}$.

Players update the posterior probability of each other's strategy type only from the currently received signals and the prior probability at time t. It is also assumed that, at every time t, players are not affected by the posterior probability of the previous period; that is, the prior probability each population assigns to the other's strategy type is always equiprobable, $P_0^m(A) = P_0^w(A) = 1/2$, so $P_t^m(A \mid S_{wt})$ and $P_t^w(A \mid S_{mt})$ affect only the current decision, not future ones. The strategy update of the two populations therefore forms a Markov process over the four joint states (A, A), (A, B), (B, A), and (B, B), whose transition probabilities are determined by the adoption probabilities $\Phi_m$ and $\Phi_w$ of the two populations. When the decision thresholds do not exceed the largest attainable signal difference N, the Markov process is ergodic with a stationary distribution $\pi = (\pi_{A,A}, \pi_{A,B}, \pi_{B,A}, \pi_{B,B})$. According to Proposition 1, as $t \to \infty$ the population is in a coordinated state in the sense that the probabilities of the (A, A) and (B, B) states are equal. At the same time, it is derived from $\Phi_w \ge \Phi_m$ that $\pi_{A,B} > \pi_{B,A}$: when the population is in an incompatible state, strategy A brings higher payoffs to population m, while strategy B brings higher payoffs to population w.
Compared with the incompatible state (B, A), the population will choose the incompatible state (A, B) with higher probability.
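The updating process described above can also be checked by direct simulation. The following toy Monte-Carlo sketch (all parameter values are illustrative; the per-period update follows the threshold rule derived earlier) estimates the long-run frequency of each joint state:

```python
import random
from math import log

def simulate_bos(a=2.0, b=1.0, u=0.7, N=51, T=200_000, seed=1):
    """Monte-Carlo sketch of the Bayes-based updating rule in the BoS game.
    State is (s_m, s_w). Each period one population is picked at random; it
    observes N noisy signals of the other population's strategy (each correct
    with probability u) and adopts A iff delta = 2*n_A - N clears its
    population-specific threshold."""
    rng = random.Random(seed)
    thr_m = log(a / b) / log((1 - u) / u)   # negative: m adopts A easily
    thr_w = log(b / a) / log((1 - u) / u)   # positive: w needs strong evidence
    s_m, s_w = 'A', 'B'
    counts = {('A', 'A'): 0, ('A', 'B'): 0, ('B', 'A'): 0, ('B', 'B'): 0}
    for _ in range(T):
        if rng.random() < 0.5:              # population m updates
            n_A = sum(rng.random() < (u if s_w == 'A' else 1 - u) for _ in range(N))
            s_m = 'A' if 2 * n_A - N >= thr_m else 'B'
        else:                               # population w updates
            n_A = sum(rng.random() < (u if s_m == 'A' else 1 - u) for _ in range(N))
            s_w = 'A' if 2 * n_A - N >= thr_w else 'B'
        counts[(s_m, s_w)] += 1
    return {k: v / T for k, v in counts.items()}
```

In runs with these illustrative parameters, most of the probability mass tends to sit on the coordinated states; lowering u toward 1/2 pushes the thresholds apart and shifts mass toward the incompatible state (A, B), in line with the propositions.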
Proposition 2. When $\frac{a}{b} > \left(\frac{u}{1-u}\right)^N$, population m and population w will eventually be in the incompatible state (A, B).

It is proven that when $\frac{a}{b} > \left(\frac{u}{1-u}\right)^N$, the decision threshold $\log\frac{a}{b} \big/ \log\frac{u}{1-u}$ exceeds the largest attainable signal difference N, so no realization of the signals can make population w adopt strategy A, while population m adopts strategy A regardless of the signals. At this time, the transition probability matrix of the Markov process becomes degenerate: (A, B) is the unique absorbing state, and the other three states are transient. Proposition 2 shows that in this asymmetric game, if there is a big difference in payoff between the two coordinated equilibria, each population chooses the strategy that brings itself the higher payoff, which keeps the two populations in an incompatible state, unable to reach a pure Nash equilibrium.
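The boundary in Proposition 2 can be phrased as the decision threshold exceeding the largest attainable signal difference N. A small helper (the closed form is reconstructed from the threshold condition above; treat it as a sketch):

```python
from math import log

def bos_regime(a, b, u, N):
    """Classify the long-run regime of the BoS updating process.
    threshold = log(a/b)/log(u/(1-u)) is the signal difference population w
    must see before adopting A (population m's threshold is its negative).
    If it exceeds N, no realizable signal can clear it and the populations
    lock into the incompatible state (A, B)."""
    threshold = log(a / b) / log(u / (1 - u))
    return "incompatible (A,B) absorbing" if threshold > N else "ergodic"
```

For example, a large payoff ratio with nearly uninformative signals (a/b = 100, u = 0.51, N = 10) falls in the absorbing regime, while a modest ratio with accurate signals (a/b = 2, u = 0.9, N = 51) is ergodic.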
Figure 1a shows the probability that the population is in coordination, (A, A) or (B, B), under different combinations of the parameters u and N. Figure 1b shows the final evolutionarily stable state of the strategy updating process in each region of the parameter space.
Figure 1b also shows that when the population size N or the signal reliability u is smaller, each population is more inclined to choose the strategy that brings itself the higher payoff. Because the two populations earn different payoffs at the two coordinated equilibria, population m leans toward the (A, A) equilibrium while population w leans toward the (B, B) equilibrium, which keeps the two populations in the incompatible state (A, B). With the improvement of signal accuracy, the four states coexist. When signal accuracy improves further, after long-run evolution the population must be in an (A, A) or (B, B) equilibrium state. It is thus concluded that increasing signal accuracy and population size is more conducive to population coordination.

Proposition 3. When $N \to \infty$, there is a $u^* = 1/2$ such that when $u > u^*$, population m and population w will finally be in a coordinated state, (A, A) or (B, B).
It is proven that when $N \to \infty$, $j_w^*/N \le 1/2 < u$, so by the law of large numbers $\lim_{N\to\infty} \Phi_m = \lim_{N\to\infty} \Phi_w = 1$. It is derived from Proposition 3 that when the population size is infinite, there are two absorbing states, (A, A) and (B, B), in the Markov process. After a long period of evolution, the population finally reaches an absorbing state; that is, the population is in an equilibrium state. In the replicator-dynamics model of an infinite population, as shown in Figure 2, (A, A) and (B, B) are the evolutionarily stable equilibria of the replicator dynamic system. After a long series of games between the populations, an evolutionarily stable equilibrium is reached, and the final evolutionarily stable state of the population depends on its initial state. Replicator dynamics is a deterministic imitation process. Under the Bayes-based strategy updating rules, by contrast, as long as the infinite population satisfies the signal-accuracy assumption 1/2 < u < 1 (that is, each signal is more likely than not to reflect the true strategy type), players can accurately infer the opponent population's strategy from the large number of signals it sends, even though they cannot observe strategy types directly. The population therefore still reaches an equilibrium state.
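For the infinite-population benchmark of Figure 2, two-population replicator dynamics can be integrated numerically. A sketch with simple Euler steps (payoff values, step size, and horizon are illustrative; payoffs assume (A, A) pays (a, b), (B, B) pays (b, a), and miscoordination pays 0, as in the text):

```python
def replicator_bos(x0, y0, a=2.0, b=1.0, dt=0.01, steps=20_000):
    """Euler integration of two-population replicator dynamics for BoS.
    x, y: shares of strategy A in populations m and w."""
    x, y = x0, y0
    for _ in range(steps):
        fx = a * y - b * (1 - y)   # payoff advantage of A for population m
        fy = b * x - a * (1 - x)   # payoff advantage of A for population w
        x += dt * x * (1 - x) * fx
        y += dt * y * (1 - y) * fy
        x = min(max(x, 0.0), 1.0)  # keep shares in [0, 1]
        y = min(max(y, 0.0), 1.0)
    return x, y
```

Starting near (1, 1) the shares converge to the (A, A) corner and starting near (0, 0) to (B, B), illustrating that the selected equilibrium depends on the initial state.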


Model for Coordination Game
As shown in Table 2, suppose a 2 × 2 coordination game G with two populations, 1 and 2, each of size N. At each time t, one population is randomly selected from the two populations to update its strategy, and all players in that population update their strategies at the same time. Each individual in the two populations has the same strategy space, with strategy type S ∈ {A, B}. The payoff matrix is given in Table 2, where a > b. There are two pure-strategy Nash equilibria, (A, A) and (B, B); both populations want to choose the same strategy. As in the previous model, at each time t, a player cannot directly observe the strategy types of the players in the other population, but only receives signals $S_t = (S_{1t}, S_{2t}, \cdots, S_{Nt})$ about their strategy types. The signals are public knowledge; that is, the signals of all players in each population are observed by all players in the opposite population, and all observers receive the same signals. There is a deviation between the strategy type of player i in each population and the signal it sends; the conditional probability that a signal accurately reflects the strategy type of player i is u, with 1/2 < u < 1.

In the coordination game, there is a mixed Nash equilibrium $p^* = \frac{b}{a+b}$. If the posterior probability that the other population plays strategy A at time t satisfies $P_t(A \mid S_t) \ge p^*$, the updating population adopts strategy A; otherwise, it adopts strategy B. It is derived from $P_t(A \mid S_t) \ge p^*$ that the population chooses strategy A when $\delta_t \ge \log\frac{a}{b} \big/ \log\frac{1-u}{u}$.
Similarly, the strategy update forms a Markov process over the four joint states; when the decision threshold does not exceed the largest attainable signal difference N, the process is ergodic with a stationary distribution $\pi$. It is derived from $\Phi_A \ge \Phi_B$ that $\pi_{A,A} > \pi_{B,B}$: because the equilibrium point (A, A) brings higher payoffs than (B, B), the population is in the equilibrium state (A, A) with higher probability.
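The adoption probabilities behind this stationary distribution can be computed from binomial tails. The sketch below reconstructs them from the threshold condition $\delta_t \ge \log\frac{a}{b}/\log\frac{1-u}{u}$ (the function name and the exact rounding of the threshold are assumptions, not taken from the paper):

```python
from math import comb, log, ceil

def adopt_A_prob(true_strategy, a, b, u, N):
    """P(population adopts A | the opponent truly plays `true_strategy`).
    The rule adopts A iff delta = 2*n_A - N >= log(a/b)/log((1-u)/u), where
    n_A ~ Binomial(N, u) if the opponent plays A, else Binomial(N, 1-u)."""
    threshold = log(a / b) / log((1 - u) / u)   # negative, since a > b, u > 1/2
    j_min = max(0, ceil((N + threshold) / 2))   # smallest signal count clearing it
    p = u if true_strategy == 'A' else 1 - u
    return sum(comb(N, j) * p**j * (1 - p)**(N - j) for j in range(j_min, N + 1))
```

With a > b and u > 1/2 the threshold is negative, so adopting A needs slightly less than a signal majority; consequently $\Phi_A$ = `adopt_A_prob('A', ...)` is at least $\Phi_B$ = `1 - adopt_A_prob('B', ...)`, consistent with $\pi_{A,A} > \pi_{B,B}$.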

It is proved that when $\frac{a}{b} > \left(\frac{u}{1-u}\right)^N$, the decision threshold $\log\frac{a}{b} \big/ \log\frac{1-u}{u}$ lies below $-N$, so the updating population adopts strategy A regardless of the received signals. At this time, in the Markov process the states (A, B), (B, A), and (B, B) are transient, and the state (A, A) is the absorbing state of the process.

Proposition 5 shows that in this coordination game, if there is a big difference in payoff between the two coordinated equilibria, the population chooses the equilibrium that brings the higher payoff, so the two populations are always in a coordinated state, finally reaching the Nash equilibrium. Figure 3 shows the probability that the population is in coordination, (A, A) or (B, B), under different combinations of the parameters u and N, together with the final evolutionarily stable state of the strategy updating process in each region of the parameter space.
Figure 3 shows that when the accuracy of the signal is low and the population size is small, it is difficult for the population to determine the opponent's strategy type from the signals, and the population prefers the payoff-dominant equilibrium (A, A). With the improvement of signal accuracy and the increase of population size, players can infer the opponent's strategy type to some extent, and the four states coexist. When signal accuracy is very high or the population size is very large, the population can accurately determine the opponent's strategy type and stays in equilibrium. Compared with the BoS game, the probability that players in the coordination game are in a non-equilibrium state is very small.

Proposition 6. When $N \to \infty$, there is a $u^* = 1/2$ such that when $u > u^*$, the two populations will finally be in a coordinated state, (A, A) or (B, B).

The proof is the same as that of Proposition 3. When $N \to \infty$, $\lim_{N\to\infty} \Phi_A = \lim_{N\to\infty} \Phi_B = 1$, and the coordinated states (A, A) and (B, B) become the two absorbing states of the process; the final state of the process depends on its initial state.
It is derived from Proposition 6 that when the population size is infinite, there are two absorbing states, (A, A) and (B, B), in the Markov process. After a long period of evolution, the population finally reaches an absorbing state; that is, the population is in an equilibrium state. In the replicator-dynamics model of an infinite population, as shown in Figure 4, (A, A) and (B, B) are the evolutionarily stable equilibria of the replicator dynamic system. After a long period of games between the populations, an evolutionarily stable equilibrium is reached, and the final evolutionarily stable state of the population depends on its initial state. Replicator dynamics is a deterministic imitation process. Under the Bayes-based strategy updating rules, by contrast, as long as the infinite population satisfies the signal-accuracy assumption 1/2 < u < 1 (that is, each signal is more likely than not to reflect the true strategy type), players can accurately infer the opponent population's strategy from the large number of signals it sends, even though they cannot observe strategy types directly. The population therefore still reaches an equilibrium state.
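The initial-state dependence in Figure 4 can be reproduced with a one-population replicator sketch for the symmetric game (the payoff entries a for (A, A), b for (B, B), and 0 off-diagonal are assumptions for illustration, since Table 2 is not reproduced here; they are consistent with the mixed equilibrium $p^* = b/(a+b)$):

```python
def replicator_coordination(x0, a=2.0, b=1.0, dt=0.01, steps=20_000):
    """Euler sketch of single-population replicator dynamics for a symmetric
    coordination game: (A, A) pays a, (B, B) pays b, miscoordination pays 0.
    x: share of strategy A; the interior rest point sits at x* = b/(a+b)."""
    x = x0
    for _ in range(steps):
        fA = a * x               # expected payoff of A against the population
        fB = b * (1 - x)         # expected payoff of B
        x += dt * x * (1 - x) * (fA - fB)
        x = min(max(x, 0.0), 1.0)
    return x
```

Initial shares of A above the interior rest point $x^* = b/(a+b)$ flow to x = 1 and shares below it flow to x = 0, so the finally selected equilibrium depends on the initial state, as stated after Proposition 6.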

Conclusions
In this paper, we analyzed equilibrium selection from the perspective of Bayesian inference. In the study of Veller and Hayward (2016) [18], the population state switches among the pure states but tends toward the coordinated states. Our research shows, however, that in the BoS game a finite population finally reaches the incompatible state when signal accuracy is low. With the improvement of signal accuracy, the Markov process has a stationary distribution, and the population moves between coordinated and incompatible states. When signal accuracy is very high, the population eventually reaches a coordinated state. Because the state of the population is closely tied to signal accuracy, inaccurate signals keep populations tending toward uncoordinated states, which differs from the results of Veller and Hayward (2016) [18]. In infinite populations, as long as signal accuracy exceeds one half, the population eventually reaches a coordinated state. In the coordination game, when signal accuracy is low, the population reaches a coordinated state and finally chooses the payoff-dominant equilibrium, unlike in the BoS case. With the improvement of signal accuracy, the population moves between coordinated and incompatible states, and it eventually reaches a coordinated state when signal accuracy is very high, similar to the BoS case. In infinite populations, the population always reaches a coordinated state, and the finally selected equilibrium depends on its initial state.
Author Contributions: C.Z. contributed to the conception of the study; J.L. performed the data analyses and wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.