Entropy-Based Measure of Statistical Complexity of a Game Strategy

In this note, we introduce excess strategic entropy, an entropy-based measure of the complexity of a strategy. It quantifies the complexity and predictability of a player's (mixed) strategy. We establish properties of this measure and discuss its possible applications.


Introduction
Many social (economic, political, etc.) interactions have been modeled as formal games. In particular, repeated games play a central role in models of long-term competition in economic theory and in modeling interactions which are repeated frequently [1,2]. In such games, a strategy is a set of history-contingent plans of action. Each plan of action (play) is an infinite sequence of pairs (for two-player games) or n-tuples (for n-player games) of strategies in each stage game. In every stage game, each player uses a mixed strategy (to avoid possible confusion, we point out that, as is standard in the literature, we consider mixed strategies so long as their support lies in the set of feasible pure strategies; a possible interpretation of mixed strategies in games in general is that they are distributions of pure strategies in a population of potential players). Generally, in the literature, it is assumed that players can carry out any strategy in a specified strategy set, should they choose to play it. While this assumption may seem innocuous in a model where few strategies are available to each player, it may be criticized as being unrealistically rational in more complex models, where a theoretical definition of strategy leads to a strategy set that contains a large number of choices, many of which are impractically complex. This is usually the case in repeated games: at every stage, players are playing a history-(in)dependent game, and therefore the set of all strategies can be huge. Thus, it is reasonable to consider the complexity of a strategy (the idea that the assumption of fully, or unboundedly, rational players is unrealistic is not new; see, e.g., [3]). In addition, putting the complexity problem in a wider context, one of the main benchmark results in the theory of repeated games is the folk theorem.
It demonstrates that any individually rational and feasible payoff vector of a two-player normal form game is a perfect equilibrium outcome of the repeated game when the discount factor is sufficiently near one [4]. This multiplicity of equilibria has led some researchers, both theoretical and applied, to restrict attention to certain equilibria. Along with other restrictions, the restriction to certain simple strategies, such as the grim trigger strategy, is commonly used. Moreover, these types of equilibrium selection arguments are not restricted to repeated games. Such restrictions are the norm rather than the exception in dynamic models in economics. Most analyses of dynamic macro models, search models, and random matching models restrict attention to equilibria in which agents use history-independent strategies. This is also true for many models in dynamic games, including stochastic games and multi-person bargaining.
In general, restricting the set of feasible strategies with regard to the complexity of strategies seems particularly relevant. A popular approach to the problem of complexity of strategies is to assume bounded rationality of an agent. There have been many attempts to model feasible (implementable) sets of strategies that reflect some aspects of the bounded rationality of players. Finite automata, bounded recall, and Turing machines are a few of the approaches taken. These models are useful because they provide us with quantitative measures of the complexity of strategies, e.g., the number of states of an automaton or the length of recall. This approach, used e.g., in [5,6], describes the complexity of a strategy from the point of view of an individual who wants to implement it, and is based on an idea similar to the Kolmogorov-Chaitin complexity of a channel in information theory. However, the complexity of a strategy comes not only from how complicated it is to implement. A complex, structured strategy is one that is difficult to predict. A player may face the problem of choosing the best strategy against the strategy of the opponent, which he learns during the repeated game. The more structured the strategy is, the more difficult it is to predict what the opponent will do, and the more difficult it is to find the best response strategy. Therefore, there is a need for measures that would help us to detect more or less structured strategies.
In a seminal article, Neyman and Okada introduced entropy-based measures of uncertainty for mixed strategies of repeated games: strategic entropy and the strategic entropy rate [7]. They studied repeated two-player zero-sum games in which a player, say Player 1, with a restricted set of strategies plays against an unrestricted player, Player 2. Concerning the values of such games, two questions arise: "What is the number of repetitions needed for Player 2 to take advantage of Player 1's restriction?" and "How long can Player 1 protect himself against an unrestricted Player 2?" These questions can be posed in the language of the asymptotic behavior of the value of the repeated game: "What is the relationship between the number of repetitions and the unpredictability bound so that, as they tend to infinity, either (a) Player 2 can hold Player 1's payoff down to his one-shot game max-min value in pure actions or (b) Player 1 can still secure the value of the one-shot game?" They imposed a restriction directly on mixed strategies. To this end, they employed entropy as a measure of uncertainty of the mixed strategy relative to the other player's strategy. The strategic entropy rate of a mixed strategy is the maximal entropy rate of the play with respect to the other player's strategy. Thus, it is the maximal uncertainty of the play that the other player faces against the (mixed) strategy. Of course, their entropy concept is not a measure of strategic complexity in the way that the size of an automaton or the length of recall was intended to be, but it captures an abstract informational feature common to bounded recall restrictions, and thereby serves as a useful tool to analyze them.
To provide a proper and more precise insight into the structure and complexity of a strategy, we propose taking the next step: looking at convergence rates of the entropy. It is commonly agreed that a slow entropy convergence rate is a sign of the complex structure of a process [8]. Obviously, a more structured strategy is more difficult to predict. Therefore, strategies with a slow convergence rate are less predictable. Hence, restricting the set of feasible strategies to those with bounded unpredictability can give more precise insight into the theory of long-term competition.
We propose to use the concept of excess entropy, which measures the entropy convergence rate. It can result in a better understanding of the strategies played: it measures not only the randomness of a strategy but its structure, regularity, and predictability as well. To this end, we will construct a measure based on a quantity widely used in information theory and physics.
Structure and correlation are not completely independent of randomness. It is generally agreed that both maximally random and perfectly ordered systems possess no structure [9]. Nevertheless, at a given level of randomness away from these extremes, there can be an enormously wide range of differently structured processes. There are many ad hoc methods for detecting structure, but none are as widely applicable as entropy is for indicating randomness. The quantities that have been proposed as general structural measures are often referred to as complexity measures. To reduce confusion, it has become convenient to refer to them instead as statistical complexity measures. In doing so, they are immediately distinguished from deterministic complexities, which assign maximal complexity to purely random objects. Statistical complexity measures, in contrast, discount for randomness, and so provide a measure of the regularities present in an object above and beyond pure randomness.
In 1983, Grassberger and Procaccia, in their seminal paper [10], introduced a measure of complexity, the excess entropy (they called it the effective measure complexity), to study the complexity of chaotic signals. It made a huge impact, in particular on statistical mechanics and information theory (Google Scholar records over 1600 citations of this article), and in economics as well [11]. The most precise and consistent approach to this measure can be found in the article by Crutchfield and Feldman [8]. The main idea is to look carefully at the manner in which the block entropy H(n), n ∈ N, converges to its asymptotic form. Let us consider infinite sequences defined over the alphabet A. We define the n-block entropy as

H(n) = − ∑_{sⁿ ∈ Aⁿ} µ(sⁿ) log µ(sⁿ),

where sⁿ are blocks of length n and µ is a probability distribution. The average (per-symbol) uncertainty is given by

h = lim_{n→∞} H(n)/n.

After defining these quantities, the excess entropy can be introduced as

E = lim_{n→∞} (H(n) − n·h).

The excess entropy has a number of different interpretations and goes by a variety of different names (stored information, effective measure complexity, Grassberger's complexity, predictive information). Among others, it measures the excess randomness of the system, and therefore tells us how much additional information must be gained about the sequences in order to reveal the actual per-symbol uncertainty h. Thus, it can be interpreted as a measure of statistical complexity. The aim of this note is to incorporate the concept of excess entropy into the game-theoretic setting. It can be used to answer the question of how fast the strategic entropy of the first n stage games of a repeated game converges to its limit, that is, to the (strategic) entropy rate of the strategy. To this aim, we propose an analogue of excess entropy, which we call excess strategic entropy (shortly: excess s-entropy). We will use it to measure the statistical complexity and unpredictability of strategies.
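To make these quantities concrete, the following Python sketch estimates H(n), h, and E empirically for a period-2 symbol sequence. The window-based estimator and function names are our own illustrative choices, not part of the cited constructions; logarithms are taken base 2, so entropies are in bits.

```python
import math
from collections import Counter

def block_entropy(seq, n):
    """Empirical n-block entropy H(n) in bits, from overlapping windows."""
    blocks = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())

# A period-2 sequence: zero entropy rate (it is perfectly predictable),
# yet excess entropy log2(2) = 1 bit -- the "phase" information an observer
# must acquire before predictions become exact.
seq = [0, 1] * 500

H = {n: block_entropy(seq, n) for n in range(1, 7)}
h_est = H[6] - H[5]        # entropy-rate estimate h(n) = H(n) - H(n-1)
E_est = H[6] - 6 * h_est   # excess-entropy estimate E = H(n) - n*h
```

For this sequence every n-block is one of two equiprobable words, so H(n) stays near 1 bit, h_est is near 0, and E_est is near 1 bit, illustrating that excess entropy captures structure rather than randomness.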
Natural interpretations of this measure suggest its applicability, e.g., to bounded recall and reputation models. The research undertaken and the results obtained fit into the growing field of applications of entropy-based concepts in economics, and in game theory in particular (see [12–17] and references therein).

Formal Description of the Game
This section is intended to make the reader familiar with basic concepts of game theory. For clarity of presentation, we consider a two-player zero-sum game. A play of a repeated game is an infinite sequence of action pairs: at every stage, Player 1 chooses an action from a finite set A and Player 2 chooses an action from a finite set B. We denote the set of all plays by Ω, i.e., Ω = (A × B)^∞. Two plays ω = (ω_k)_{k=1}^∞ and ω′ = (ω′_k)_{k=1}^∞ are said to be n-equivalent if ω_k = ω′_k for k = 1, . . . , n. The n-equivalence is clearly an equivalence relation on Ω. Denote by Q_n the finite partition of Ω into n-equivalence classes. Each n-equivalence class of plays is called an n-history, and it represents the information available to the players at the end of stage n.
S_n and T_n denote the sets of measurable mappings from (Ω, Σ_{n−1}) to A and B, respectively, where Σ_{n−1} denotes the σ-algebra generated by the partition Q_{n−1} (with Σ_0 trivial). Each element of S_n and T_n represents a 'strategy at stage n'. For each s_n ∈ S_n (and similarly for t_n ∈ T_n), since s_n(ω) depends only on the first n − 1 coordinates of ω, we sometimes write s_n(ω_1, . . . , ω_{n−1}). A pure strategy of Player 1 (resp. 2) is a sequence s = (s_n) with s_n ∈ S_n (resp. t = (t_n) with t_n ∈ T_n).
Thus, the sets of pure strategies of the two players are S = ×_{n≥1} S_n and T = ×_{n≥1} T_n. We consider S and T to be endowed with the product topologies, with the discrete topology on each factor. We write 𝒮 and 𝒯 for the Borel σ-algebras of S and T, respectively. A mixed strategy of Player 1 (resp. 2) is then a probability measure on (S, 𝒮) (resp. (T, 𝒯)).

Information Theoretic Concepts
In this subsection, we recall well-known quantities and facts from information theory. Let Γ be a finite set and X be a random variable which takes values in Γ and whose distribution is p ∈ ∆(Γ), i.e., p(γ) = Prob(X = γ) for each γ ∈ Γ. To simplify the notation, we define a function η : [0, ∞) → R by

η(x) = −x log x for x > 0, and η(0) = 0.

Definition 1.
The entropy H(X) of X, defined as the expected value of the information, is equal to

H(X) = ∑_{γ∈Γ} η(p(γ)) = − ∑_{γ∈Γ} p(γ) log p(γ).

The notion of entropy can be naturally extended to an arbitrary finite dimensional vector of random variables or probability distributions.

Definition 2.
The conditional entropy H(X_2|X_1) of X_2 given X_1 is defined as

H(X_2|X_1) = ∑_{x_1} Prob(X_1 = x_1) H(X_2 | X_1 = x_1).

Lemma 1 (see [18]). H(Y|X) = H(X, Y) − H(X) and, more generally,

H(X_1, . . . , X_n) = ∑_{i=1}^n H(X_i | X_1, . . . , X_{i−1}) ≤ ∑_{i=1}^n H(X_i).

Definition 3. Let (X_n) be a stochastic process, where each X_n takes values in a finite set Γ. The entropy rate of the process (X_n) is defined as

h((X_n)) = lim sup_{k→∞} (1/k) H(X_1, . . . , X_k). (1)
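As a quick sanity check of Definition 2 and Lemma 1, the following Python snippet computes the conditional entropy from its definition and verifies H(Y|X) = H(X, Y) − H(X) numerically. The joint distribution is an arbitrary illustrative choice, and logarithms are base 2.

```python
import math

def eta(x):
    """eta(x) = -x * log2(x), with eta(0) = 0."""
    return -x * math.log2(x) if x > 0 else 0.0

# An arbitrary joint distribution p(x, y) on {0, 1} x {0, 1}.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

H_XY = sum(eta(v) for v in p.values())             # joint entropy H(X, Y)
p_x = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}   # marginal of X
H_X = sum(eta(v) for v in p_x.values())

# Conditional entropy by definition: the p_x-weighted average of H(Y | X = x).
H_Y_given_X = sum(
    p_x[x] * sum(eta(p[(x, y)] / p_x[x]) for y in (0, 1)) for x in (0, 1)
)
```

The identity of Lemma 1 holds exactly here, up to floating-point error.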

Stationary Processes and Symbolic Dynamics
Stationary processes (and the stationary strategies they induce) will be an important tool in our considerations. In this section, we recall the concept of a stationary process and show the connection between stationary processes and the existence of a shift-invariant measure (an in-depth discussion of these connections can be found in [19]).
We say that a process (X_n) is stationary if the joint distributions do not depend on the choice of time origin, that is,

Prob(X_1 = a_1, . . . , X_n = a_n) = Prob(X_{m+1} = a_1, . . . , X_{m+n} = a_n)

for all m, n ∈ N and a = (a_1, a_2, . . .) ∈ A^N. The statement that a process (X_n) on the finite alphabet A is stationary translates into the statement that the Kolmogorov measure µ introduced on cylinder sets

[a_m^n] = {x ∈ A^N : x_m = a_m, . . . , x_n = a_n} (3)

is invariant under the shift transformation τ : A^N → A^N, that is, µ(τ^{−1}(B)) = µ(B) for any measurable set B.
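The connection between stationarity and shift invariance of the cylinder-set measure can be illustrated on a two-state Markov chain started from its stationary distribution. The transition matrix and distribution below are arbitrary illustrative choices; the snippet checks shift invariance on 2-cylinders by comparing the laws of (X_1, X_2) and (X_2, X_3).

```python
# A two-state Markov chain started from its stationary distribution pi
# (pi P = pi) induces a stationary process: the Kolmogorov measure of every
# cylinder set is invariant under the shift.
P = [[0.9, 0.1], [0.4, 0.6]]   # transition matrix (arbitrary example)
pi = [0.8, 0.2]                # solves pi P = pi

# Law of (X_1, X_2) vs. law of (X_2, X_3): shift invariance on 2-blocks.
law_12 = {(a, b): pi[a] * P[a][b] for a in (0, 1) for b in (0, 1)}
law_23 = {(b, c): sum(pi[a] * P[a][b] * P[b][c] for a in (0, 1))
          for b in (0, 1) for c in (0, 1)}
```

Because summing out X_1 reproduces pi, the two laws coincide, which is exactly the shift invariance of the measure on 2-cylinders.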
In this note, we will focus our attention on stationary strategies.

Definition 4.
A strategy profile (σ, t) ∈ ∆(S) × T is called stationary (strategy σ is stationary with respect to t) if a random play (X n ) induced by (σ, t) is stationary.
For stationary processes, the upper limit in Formula (1) can be replaced by the limit. In the next section (Lemma 3), we will show a similar fact for stationary strategies.

Derivation of the Strategic Entropy Rate
We define the strategic entropy rate of a strategy σ ∈ ∆(S), given a strategy t ∈ T, following the construction proposed in [7]. It is a measure of the average degree of uncertainty that Player 2 faces playing the pure strategy t when Player 1 plays a mixed strategy σ.
For each n ∈ N, we define a function H_n(·, ·) : ∆(S) × T → R as follows. For a given (σ, t) ∈ ∆(S) × T, let (X_k) be the random play induced by (σ, t). Then, H_n(σ, t) is defined as the entropy of this random play up to stage n, that is,

H_n(σ, t) = H(X_1, . . . , X_n) = ∑_{C∈Q_n} η(P_{σ,t}(C)).
Recall that Q_n is the partition of Ω with respect to the actions in the first n stages. Therefore, H_n(σ, t) is the uncertainty about the play up to stage n that Player 2 faces when Player 1 uses σ and Player 2 uses t. The dual interpretation is that it is the amount of information on the play of the game which Player 2 can obtain using t when Player 1 uses σ. Properties of H_n are listed in the following lemma, whose proof can be found in [7]:
1. H_n(·, ·) is continuous on ∆(S) × T.
2. For each t ∈ T, H_n(·, t) is concave on ∆(S) and constant on each equivalence class of ∆(S).
3. For each σ ∈ ∆(S), H_n(σ, ·) is constant on each n-equivalence class of T.
4. H_n(σ, t) ∈ [0, n log card A] for every (σ, t) ∈ ∆(S) × T.
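A minimal illustration of the upper bound in item 4, assuming base-2 logarithms and a hypothetical setup in which Player 1 mixes uniformly and independently over a two-element action set against a pure t: every one of the 2^n n-histories with positive probability then has probability 2^(−n), and H_n attains n · log(card A).

```python
import math
from itertools import product

# Hypothetical setup: Player 1 mixes uniformly and independently over
# A = {Top, Bottom} at every stage; against any pure t, each of the 2**n
# positive-probability n-histories in Q_n has probability 2**(-n), so H_n
# attains the upper bound n * log(card A) (here in bits).
A = ("Top", "Bottom")
n = 5
probs = [0.5 ** n for _ in product(A, repeat=n)]   # P_{sigma,t}(C) for C in Q_n
H_n = sum(-q * math.log2(q) for q in probs)        # = n * log2(card A) = 5 bits
```

Any deviation from uniform i.i.d. mixing can only lower H_n, which is why the bound in item 4 is tight exactly for maximally mixing strategies.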

Definition 5.
If (X_n) is the random play induced by (σ, t) ∈ ∆(S) × T, then the entropy rate of the strategy σ with respect to the strategy t is

h(σ, t) = lim sup_{n→∞} (1/n) H_n(σ, t) = lim sup_{n→∞} (1/n) H(X_1, . . . , X_n), (4)

and the strategic entropy rate of the strategy σ is the supremum of h(σ, t) over t ∈ T:

h(σ) = sup_{t∈T} h(σ, t).

In general, the limit of H_n(σ, t)/n need not exist. However, in the next section, we will show that, for a large class of strategies, it does.

Limits for Stationary Strategies
We start this section with a discussion on the existence of the limit in Formula (4) for stationary strategies.
For (σ, t) ∈ ∆(S) × T, put h_n(σ, t) = (1/n) H_n(σ, t) and g_n(σ, t) = H_n(σ, t) − H_{n−1}(σ, t), with H_0 ≡ 0. If, additionally, σ is stationary, then both sequences are decreasing and have a common limit.
From now on, we assume that (X_n) is stationary. Therefore, we can consider the Kolmogorov measure µ (see (3)). To simplify notation, let p(ι) = µ([ι]) for ι = (i_1, . . . , i_n) ∈ A^n. Then, from the invariance of the measure µ and the strong subadditivity of Shannon entropy [20], we obtain

g_{n+1}(σ, t) = H_{n+1}(σ, t) − H_n(σ, t) ≤ H_n(σ, t) − H_{n−1}(σ, t) = g_n(σ, t).

Therefore, the sequence (g_n(σ, t)) is non-negative and decreasing; thus, it has a limit.
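The monotonicity of (g_n(σ, t)) can be observed numerically when the induced play is distributed as a stationary Markov process. This is an illustrative sketch with an arbitrary transition matrix; entropies are in bits.

```python
import math
from itertools import product

def eta(x):
    return -x * math.log2(x) if x > 0 else 0.0

# A stationary two-state Markov source: g_n = H_n - H_{n-1} should be
# non-negative and decreasing, as the subadditivity argument asserts.
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]   # stationary distribution of P

def H_block(n):
    """Exact n-block entropy of the stationary chain, in bits."""
    total = 0.0
    for word in product((0, 1), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        total += eta(prob)
    return total

H = [0.0] + [H_block(n) for n in range(1, 8)]
g = [H[n] - H[n - 1] for n in range(1, 8)]
```

For this Markov source, g_1 = H(pi) and g_n equals the conditional entropy of one step given the previous one for all n ≥ 2, so the sequence decreases once and then stays constant, consistent with the general monotonicity statement.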
Naturally, we can ask whether we can consider limits instead of the upper limits in Formula (6). Unfortunately, in general, neither h_n(σ, t) (see the discussion after Definition 5) nor g_n(σ, t) need have a limit; Example 1 provides a strategy for which the sequence (g_n) does not converge.
We can see that the assumption that the strategy is stationary gives a nice theory of the strategic entropy (and, as we will see in the next section, of the excess s-entropy). One may ask whether the assumption of stationarity is not too restrictive. Using only stationary strategies may indeed seem quite restrictive, but we know that, for many types of games, there exists an optimal stationary strategy [21,22]. Such strategies can be used widely because of their simplicity and optimality.

Excess s-Entropy
Now, we are able to introduce the fundamental tool of this note. Definition 6. The excess strategic entropy of the strategy σ ∈ ∆(S) with respect to the strategy t ∈ T (shortly, the excess s-entropy of (σ, t)) is defined as

E(σ, t) = lim sup_{n→∞} ∑_{k=1}^n (g_k(σ, t) − h(σ, t)).

We define the excess strategic entropy of the strategy σ ∈ ∆(S) (the excess s-entropy of σ) as

E(σ) = sup_{t∈T} E(σ, t).

Excess s-entropy is an analogue of the excess entropy concept discussed thoroughly, e.g., in [8]. It measures how structured and how complex the strategy is. From the point of view of Player 1, it can be interpreted as a measure of the complexity of his choices. On the other hand, from the point of view of Player 2, it can be understood as a measure of the predictability of Player 1, that is, of how eager Player 1 is to change the (one-shot) strategy. In the following considerations, we show properties of this quantity and calculate the excess s-entropy of strategies in a few games.

Theorem 1.
For any σ ∈ ∆(S) and t ∈ T, we have

E(σ, t) = lim sup_{n→∞} n(h_n(σ, t) − h(σ, t)).

Proof. Let M ∈ N. Then, since the sum of g_1, . . . , g_M telescopes to H_M(σ, t),

∑_{k=1}^M (g_k(σ, t) − h(σ, t)) = H_M(σ, t) − M h(σ, t) = M(h_M(σ, t) − h(σ, t)).

Taking the upper limit over M, we complete the proof.
Directly from Theorem 1, we obtain the condition for finiteness of excess s-entropy.

Corollary 2.
If h_n(σ, t) − h(σ, t) ∼ 1/n, that is, if the sequence (n(h_n(σ, t) − h(σ, t))) is bounded, then the excess s-entropy E(σ, t) is finite. Obviously, having equivalent formulas for the excess s-entropy, one may ask which of the two is better to use. Both of them are useful. Because h_n is a Cesàro average of g_n, the sequence (g_n) converges to the (upper) limit faster than (h_n), so it is better to use g_n in numerical computations. On the other hand, calculating the upper limit of (n(h_n − h)) can be easier analytically.
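The point about convergence speed can be illustrated on a stationary Markov source (illustrative parameters, base-2 logarithms): g_n reaches the entropy rate h already at n = 2, while h_n, being the Cesàro average of g_1, . . . , g_n, approaches h only at rate of order 1/n.

```python
import math
from itertools import product

def eta(x):
    return -x * math.log2(x) if x > 0 else 0.0

# Stationary two-state Markov source (arbitrary illustrative parameters).
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]

def H_block(n):
    """Exact n-block entropy of the stationary chain, in bits."""
    total = 0.0
    for word in product((0, 1), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        total += eta(prob)
    return total

# Entropy rate of the chain: h = sum_a pi(a) * H(P(a, .)).
h = sum(pi[a] * eta(P[a][b]) for a in (0, 1) for b in (0, 1))

n = 8
h_n = H_block(n) / n              # Cesaro average: converges like 1/n
g_n = H_block(n) - H_block(n - 1) # already equals h for every n >= 2
```

Here g_8 matches h to machine precision, while h_8 still carries an error of (g_1 − h)/8, which is why g_n is the better choice in numerical computations.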

Examples
Now, we give examples of the excess s-entropy of a strategy for a few games.
From Examples 2 and 3, we can see that the excess s-entropy is not sensitive to the unpredictability of a one-stage game. In both examples, a player plays in an unpredictable way (using a mixed strategy) but not in a complex way. Nevertheless, the measure does detect the unpredictability of a long-run play, that is, how predictable the changes in a player's mixed strategy (probability distribution) are.
We can see that we should expect infinite excess s-entropy when changes of strategies are frequent, while, if changes are rare, the excess s-entropy should be finite. Now, we show another example which captures this intuition. Player 1 plays "Top" as long as Player 2 chooses "Left". If Player 2 plays "Right" at stage m for the first time, then Player 1 chooses a mixed action with distribution (1/2, 1/2) ("Top" and "Bottom" are equally probable) for the following m² stages. After this time, he always chooses "Top". Therefore, for every t ∈ T, the play (X_n) induced by (σ, t) is either X_n = (Top, Left) for every n, or there exists m ∈ N such that
X_n = (Top, Left) for n = 1, . . . , m − 1,
X_n = (Top, Right) for n = m,
X_n = ((1/2, 1/2), t_n) for n = m + 1, . . . , m + m²,
X_n = (Top, t_n) for n ≥ m + m² + 1.
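Assuming base-2 logarithms and a fixed pure t whose first "Right" occurs at stage m, the entropy bookkeeping for this strategy can be sketched as follows; the closed form for H_n below is our own reading of the example, not a formula from the text.

```python
# Against a pure t that first plays "Right" at stage m, the only randomness
# in the play is Player 1's m**2 fair coin tosses, so H_n(sigma, t) grows by
# one bit per stage on stages m+1, ..., m + m**2 and is constant elsewhere.
# Hence h(sigma, t) = 0 and E(sigma, t) = lim (H_n - n*h) = m**2 bits:
# finite, because the strategy changes only once.
def H_n(n, m):
    """Entropy (in bits) of the first n stages of the induced play."""
    return min(max(n - m, 0), m * m)

m = 3
h = 0.0                          # randomness stops after stage m + m**2
E = H_n(10**6, m) - 10**6 * h    # excess s-entropy = m**2 = 9 bits
```

This matches the intuition above: a single, rare change of the one-shot strategy contributes only a finite amount of excess s-entropy.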
One may ask how this concept works and how this quantity can be calculated in more sophisticated and complex problems. Although this is not the aim of this article, we discuss it briefly here. The purpose of this section was to give illustrative examples showing the intuition behind the introduced quantity. Nevertheless, to describe how excess s-entropy can be used in more complex games, we observe that we are building on the theory of entropy convergence rates. Therefore, we are able to use tools and ideas designed to study these rates. First, looking through the lens of information theory, Feldman et al. showed that excess entropy is the proper tool to detect and analyze patterns produced by a process [23,24]. It is able to distinguish between different patterns that have the same structure factors. Moreover, E(σ, t) is the excess entropy of the process generated by (σ, t). Thus, the excess s-entropy of a strategy σ with respect to a given strategy t can be used to detect and quantify patterns produced by the play induced by (σ, t) (E(σ, t) can be calculated numerically). Second, from a dynamical systems perspective, we can study entropy convergence rates using different tools, especially those of symbolic dynamics.

Excess s-Entropy for Stationary Strategies
As we can see from the previous section, the excess s-entropy agrees with the intuition of the complexity of a strategy. The natural question to ask is how we can bound the excess s-entropy. Here, we concentrate on stationary strategies. First, by Corollary 1, if the play induced by (σ, t) is a stationary process, then E(σ, t) = lim_{n→∞} n((1/n) H_n(σ, t) − h(σ, t)).
In Theorems 2 and 3, we give a few properties of the excess s-entropy.

Discussion
In this article, we proposed a new measure of the statistical complexity and unpredictability of a strategy: excess strategic entropy. Because of its entropy-based origin, we were able to capture the intuition behind this quantity by understanding how entropy and entropy convergence rates work in the context of game theory. We showed and discussed its properties. To simplify the discussion, we introduced this quantity for two-player games, but it can be easily extended to games with more players. Lastly, in this section, we look at two possible applications of this quantity.
It is natural to restrict the strategies of a player to those of finite statistical complexity. Obviously, starting from a Nash equilibrium, the excess s-entropy (and the complexity of the strategy) will be equal to zero, as the player will not be eager to change his strategy. Nevertheless, if a player is not in a Nash equilibrium from the beginning, he may prefer to choose one of the mixed strategies which is not too complicated, for instance, one that does not force him to change his one-shot strategy too much. Thus, restricting the set of strategies to those with bounded excess s-entropy seems to be a natural choice. Even if the player wants to "converge to the Nash equilibrium", he may prefer those Nash equilibria to which his sequence of one-shot strategies converges using strategies with bounded excess s-entropy. Moreover, results obtained by Neyman et al. [7,25–27] suggest that restricting strategies to those of bounded strategic entropy can impact both the Nash equilibria of the player and the response of unrestricted players. Similarly to their considerations on strategies with small strategic entropy, restricting strategies to those with small excess s-entropy will impose high predictability of the mixed strategy (the probability distribution which the player uses is predictable), and will impact the possibility of choosing the best response strategy by the player with an unrestricted set of strategies. Therefore, bounded excess s-entropy strategies should be thoroughly explored.
Another research area where studying excess s-entropy may give better insight is reputation models. The notion that commitment is valuable has long been a critical insight of non-cooperative game theory, and it has deeply affected a number of social science fields, including macroeconomics, international finance, and industrial organization. It seems natural to expect that agents playing in long-run competition will base their decisions on the reputation of the opponent. Moreover, if the opponent is more predictable (in the sense that the probability distribution which he uses is predictable), then the player can find a better response to the predicted strategy, or may prefer to play with the more predictable player. It should be added that using entropy-based measures in reputation models is not a new idea. For instance, in [28], Ekmekci et al. used entropy bounds to study the impact of unobservable stochastic replacements of the long-run player in the classical reputation model with a long-run player and a series of short-run players. Other applications of the entropy concept to reputation models were made, e.g., in [29,30].

Materials and Methods
The methods used in this article are standard and are thoroughly discussed in Section 2.