Iterated Economic Games, Logic Gates, and the Flow of Shared Information in Strategic Interactions

Iterated games, in which the same economic interaction is repeatedly played between the same agents, are an important framework for understanding the effectiveness of strategic choices over time. To date very little work has applied information theory to the information sets used by agents in order to decide what action to take next in such strategic situations. This article looks at the mutual information between previous game states and an agent’s next action by introducing two new classes of games: ‘invertible games’ and ‘cyclical games’. By explicitly expanding out the mutual information between past states and the next action we show under what circumstances these expressions can be simplified. These information measures are then applied to the Traveler’s Dilemma game and the Prisoner’s Dilemma game, the Prisoner’s Dilemma being invertible, to illustrate their use. In the Prisoner’s Dilemma a novel connection is made between the computational principles of logic gates and both the structure of games and the agents’ decision strategies. This approach is applied to the cyclical game Matching Pennies to analyse the foundations of a behavioural ambiguity between two well studied strategies: ‘Tit-for-Tat’ and ‘Win-Stay, Lose-Switch’.


Introduction
Game theory as it was originally framed by von Neumann and Morgenstern [1] and Nash [2] is concerned with agents (decision-makers) selecting actions to take when they interact strategically with other agents. Strategic interactions in the economic sense are situations in which the reward, or utility, one agent receives is based upon the action they choose to take as well as the actions taken by other agents. In much of non-cooperative game theory [3] it is assumed that agents are maximising their personal utility in one-off encounters between agents that know nothing of past behaviours, and that they choose their actions independently of one another, i.e. they do not discuss their strategies or collaborate with one another before choosing their actions. Relaxing these assumptions has been very fruitful in understanding the fundamental principles of strategic interactions: repeated games with learning can lead to chaotic dynamics [4], and spatially structured games can lead to cooperation where cooperation would not usually occur [5] or to a lack of cooperation where cooperation would usually occur [6].
An important approach to broadening the interaction model between agents has been to include interactions over time, as opposed to a single one-off game, and this has a significant impact on the possible outcomes. Each game can be thought of as occurring at a discrete point in time, i.e. agents' moves and utilities are said to occur at time t, and we then consider what choices the agents make at time t + 1 based on the moves and utilities at times t, t − 1, etc. These are called iterated games and they have been extensively studied by Axelrod [7], Nowak [8] and many others. In order to explore this approach, Axelrod ran a tournament in which contestants submitted algorithms that would compete against each other in playing a game, the Prisoner's Dilemma (see below for details), with the algorithm that accumulated the highest utility being the winner. The winning algorithm, submitted by Anatol Rapoport, played a very simple strategy called Tit-for-Tat in which the algorithm initially cooperates with its opponent and thereafter chooses the strategy its opponent had used in the prior round [9]. These and subsequent results led to a series of fundamental insights into the complexity of strategic interactions in dynamic games [10][11][12][13].
This article is concerned with iterated non-cooperative economic games with finite choices and a finite number of decision-making agents, the strategies the agents use, and the information each agent uses in choosing an action based on past states of the game. First we define two sub-classes of games whose definitions appear to be new to this area: invertible games and cyclical games. Invertible games include the Prisoner's Dilemma and cyclical games include the Matching Pennies game. We are interested in studying two specific strategies for these classes of games, 'Tit-for-Tat' and 'Win-Stay, Lose-Shift', for which the principal methods of analysis are from information theory and computational logic. We use the chain rule for mutual information to explicitly expand out the terms that describe the combinations of and relationships between stochastic variables in iterated games. Using these identities we prove a number of theorems, whose properties are then explored by applying them to specific examples. In particular we show that both the game structure and the decision rules can be described using logic gates (XOR, NOT etc.), and this description is used to analyse the computational complexity of games and the decision processes involved in the Tit-for-Tat and Win-Stay, Lose-Shift strategies.

Non-Cooperative Game Theory
A normal form, non-cooperative game is composed of \(i = 1, \ldots, N\) agents who are each able to select an act (often called a 'pure strategy' in game theory) \(a^i \in A^i\) (with \(A = A^1 \times \ldots \times A^N\)), where the joint acts of all agents collectively determine the utility for each agent \(i\), \(u^i : A \to \mathbb{R}\). An act \(a^{i*}\) is said to be preferable to an act \(a^i\) via the utility function if we have: \(u^i(a^1, \ldots, a^{i*}, \ldots, a^N) > u^i(a^1, \ldots, a^i, \ldots, a^N)\). We use \(u^i\) to denote agent \(i\)'s utility function, taking the joint action \(a = a^1 \times \ldots \times a^N\) as an argument, and \(u_n\) as the utility value (a real number) for agent \(i\) in the \(n\)th round of an iterated game (see below); if there is any uncertainty as to which agent's utility value we are referring to we use \(u^i_n\). We will represent the actions available to the agents and their subsequent utility values using the conventional bi-matrix notation. For example, the Prisoner's Dilemma [14] is given by the following payoff bi-matrix, in which each entry lists the years in jail for (agent i, agent −i):

                         agent −i
                         Cooperate   Defect
    agent i  Cooperate   (1, 1)      (5, 0)
             Defect      (0, 5)      (3, 3)

In this game there are two agents (prisoners held in two different jail cells) who have been arrested for a crime and they are being questioned (independently) by the police regarding the crime. Each agent chooses from the same set of possible acts: cooperate with their fellow detainee and remain silent about the crime, or defect and tell the police about the crime and implicate their fellow detainee. If they both cooperate with each other they each get 1 year in jail, if they both defect they each get 3 years in jail, and if one defects and one cooperates the defector gets no jail time (0 years) while the cooperator gets 5 years in jail. This form of a story describing the strategic situation combined with numerical values provides the motivation for studying these types of socio-economic interactions.
We also define a game's state space \(S^i\) for agent \(i\): a specific set of acts and utilities (variables) for agent \(i\). For example, \(S^i = \{\text{co-op}, \text{defect}, \text{5yrs}\}\) is a specific state space of variables when \(i\) cooperates and \(-i\) defects. The variables in \(S^i\) are deterministically related to one another via the bi-matrix of the game being played. The following definitions form the two sub-classes of games that we will use in the following section:
Definition 2.1. Invertible Games: An invertible game for agent \(i\) is a game for which \([u^i]^{-1} : \mathbb{R} \to A\) is uniquely defined for each of agent \(i\)'s utilities. A game is invertible if it is invertible for all agents.
Remark 2.1. For any game as described above, every joint action \(a = a^1 \times \ldots \times a^N\) defines a single real-valued utility, but the inverse, that a real-valued utility defines a unique joint action, does not necessarily hold. However, in an invertible game each element of agent \(i\)'s payoff matrix is unique, so that the utility identifies the single joint act associated with it.
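Definition 2.1 can be checked mechanically for a small game. The following is a minimal sketch, assuming a game is stored as a dictionary from joint acts to utility tuples; the names `is_invertible_for`, `is_invertible` and `inverse_utility` are hypothetical helpers introduced only for this illustration, and the Prisoner's Dilemma entries are the jail terms described in the text.

```python
# Games as dictionaries: joint act (agent 0, agent 1) -> (utility 0, utility 1).
# Prisoner's Dilemma values are the years in jail described in the text.
PD = {
    ("C", "C"): (1, 1),
    ("C", "D"): (5, 0),
    ("D", "C"): (0, 5),
    ("D", "D"): (3, 3),
}

# Matching Pennies: agent 0 wins (utility 1) when the coins match.
MP = {
    ("H", "H"): (1, 0),
    ("H", "T"): (0, 1),
    ("T", "H"): (0, 1),
    ("T", "T"): (1, 0),
}


def is_invertible_for(game, agent):
    """True if every joint act gives this agent a distinct utility value."""
    utilities = [payoffs[agent] for payoffs in game.values()]
    return len(set(utilities)) == len(utilities)


def is_invertible(game):
    """A game is invertible if it is invertible for all agents."""
    n_agents = len(next(iter(game.values())))
    return all(is_invertible_for(game, i) for i in range(n_agents))


def inverse_utility(game, agent):
    """The inverse map: utility value -> joint act (invertible games only)."""
    if not is_invertible_for(game, agent):
        raise ValueError("utility does not identify a unique joint act")
    return {payoffs[agent]: acts for acts, payoffs in game.items()}


print(is_invertible(PD))          # the four utilities of each agent are distinct
print(inverse_utility(PD, 0)[5])  # joint act recovered from agent 0's utility
print(is_invertible(MP))          # two joint acts share each utility value
```

In this encoding the Prisoner's Dilemma passes the check while Matching Pennies does not, since each of its utility values is shared by two joint acts.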

Information Theory
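Information theory measures the degree of stochastic variability and dependency in a system based on the probability distributions that describe the relationships between the elements of the system. For a discrete stochastic variable \(x \in \{x_1, \ldots, x_j\} = X\) the (Shannon) Entropy is:
\[ H(x) = -\sum_{x_i \in X} p(x = x_i) \log_2 p(x = x_i), \]
measured in bits, as \(\log_2\) is assumed throughout. This is maximised if \(x\) is uniformly distributed over all \(x_i \in X\) and is zero if \(p(x = x_i) = 1\) for any \(i\). We will write \(p(x_i)\) or \(p(x)\) if there is little ambiguity. There are many possible extensions to the notion of Entropy; in this work we will make use of the Mutual Information and the conditional Mutual Information, respectively defined as [17]:
\[ I(x : y) = \sum_{x, y} p(x, y) \log \frac{p(x \mid y)}{p(x)}, \qquad I(x : y \mid z) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x \mid y, z)}{p(x \mid z)}. \]
If \(x\) is fully determined by \(z\), then both conditional probabilities in the log term of \(I(x : y \mid z)\) are either 0 or 1; we take \(\log(\tfrac{1}{1}) = 0\) and \(\log(\tfrac{0}{0}) = 0\), and so \(I(x : y \mid z) = 0\) bits.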
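These quantities can be computed directly from a joint probability table. The sketch below is illustrative only: it assumes the joint distribution is stored as a dictionary from outcome tuples to probabilities, and it uses the standard entropy decompositions of mutual information (equivalent to the log-ratio definitions in [17]). The function names are hypothetical.

```python
from collections import defaultdict
from math import log2


def marginal(joint, keep):
    """Marginalise a joint pmf {outcome tuple: prob} onto the positions in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in keep)] += p
    return dict(out)


def entropy(joint, idx):
    """Shannon entropy (bits) of the variables at positions `idx`."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)


def mutual_information(joint, x, y):
    """I(x : y) = H(x) + H(y) - H(x, y)."""
    return entropy(joint, x) + entropy(joint, y) - entropy(joint, x + y)


def conditional_mutual_information(joint, x, y, z):
    """I(x : y | z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))


# Example 1: x and y are independent fair bits and z = x XOR y.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
i_xy = mutual_information(xor_joint, (0,), (1,))
i_xy_given_z = conditional_mutual_information(xor_joint, (0,), (1,), (2,))

# Example 2: x is a copy of z, so x is fully determined by z.
copy_joint = {(z, y, z): 0.25 for z in (0, 1) for y in (0, 1)}
i_determined = conditional_mutual_information(copy_joint, (0,), (1,), (2,))

print(i_xy, i_xy_given_z, i_determined)
```

The second example is the deterministic case discussed above: once \(z\) is known, \(y\) can add nothing about \(x\), and the conditional mutual information vanishes.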
A useful special case of the conditional mutual information is the Transfer Entropy (TE) [18]. For a system with two stochastic variables \(X\) and \(Y\) that have discrete realisations \(x_n\) and \(y_n\) at time points \(n \in \{1, 2, \ldots\}\), the TE of the joint time series \(\{\{x_n, y_n\}, \{x_{n+1}, y_{n+1}\}, \ldots\}\) is the mutual information between one variable's current state and the second variable's next state, conditional on the second variable's current state:
\[ T_{Y \to X} = I(y_n : x_{n+1} \mid x_n) = \sum_{x_{n+1}, x_n, y_n} p(x_{n+1}, x_n, y_n) \log \frac{p(x_{n+1} \mid x_n, y_n)}{p(x_{n+1} \mid x_n)}. \]
This is interpreted as the degree to which the previous state of \(Y\) is informative about the next state of \(X\), excluding the past information \(X\) carries about its next state. This is one of many different specifications of TE: generalisations based on history length appear in Schreiber's original article [18], and further considerations of delays [19] as well as averaging the TE over whole systems of stochastic variables [20] have also been developed; for a recent review see [21].
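For observed time series the TE can be estimated by replacing probabilities with relative frequencies. The following is a rough plug-in sketch with history length 1, not a bias-corrected estimator; `transfer_entropy` is a hypothetical helper, and the synthetic series (a target that copies the source with a one-step delay) is an assumption made purely for illustration.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (bits) of I(source_n : target_{n+1} | target_n)."""
    # Each triple is (target_{n+1}, target_n, source_n).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_all = Counter(triples)
    c_next_now = Counter((t_next, t_now) for t_next, t_now, _ in triples)
    c_now_src = Counter((t_now, s_now) for _, t_now, s_now in triples)
    c_now = Counter(t_now for _, t_now, _ in triples)
    te = 0.0
    for (t_next, t_now, s_now), c in c_all.items():
        p_full = c / c_now_src[(t_now, s_now)]             # p(next | now, source)
        p_self = c_next_now[(t_next, t_now)] / c_now[t_now]  # p(next | now)
        te += (c / n) * log2(p_full / p_self)
    return te


rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(5000)]  # source: independent fair bits
x = [0] + y[:-1]                              # target copies the source one step later

te_y_to_x = transfer_entropy(y, x)  # close to 1 bit
te_x_to_y = transfer_entropy(x, y)  # close to 0 bits
print(te_y_to_x, te_x_to_y)
```

The copying rule used for `x` is the same dependence that Tit-for-Tat creates between one agent's act and the other agent's next act.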
In iterated games we wish to know how much information passes from each variable in agent \(i\)'s state space \(S^i_n\) at time \(n\) to their next act \(a^i_{n+1}\), and what simplifications can be made. It is assumed that each agent \(i\) has a strategy \(Z^i(\ldots)\) that maps previous system states to actions at time \(n + 1\), and in general this may depend on system states an arbitrary length of time \(l + 1\) into the past:
\[ a^i_{n+1} = Z^i(S^i_n, S^i_{n-1}, \ldots, S^i_{n-l}), \]
where \(\{S^i_n, S^i_{n-1}, \ldots, S^i_{n-l}\}\) is assumed to be the totality of an agent's information set used to make a decision. For example, one of the simplest 1-step strategies is the Tit-for-Tat (TfT) strategy, whereby agent \(i\) simply copies the previous act of the other agent [8,9]:
\[ a^i_{n+1} = Z^i(S^i_n) = a^{-i}_n. \]
It is possible for \(Z^i\) to take no arguments, in which case the agent chooses from a distribution over their next act that is independent of any information set. This is the case for the Matching Pennies game considered below: two agents simultaneously toss one coin each and the outcome, either matched coins or mismatched coins, decides the payoff to either agent, and so for this '0-step strategy' each action \(a^i_n\) is independent of all past states of the game for both agents. Given a maximum history length of \(l\) we will say that a strategy \(Z^i\) is an \(l\)-step Markovian strategy, and an \(l\)-step Markovian game is a sequence of the same game played repeatedly for which each agent has an \(l\)-step Markovian strategy.
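A strategy in this sense is simply a function from an agent's recorded states to its next act. The sketch below is one possible encoding, assuming each state is stored as a tuple (own act, other agent's act, own utility) and using the Prisoner's Dilemma jail terms from the text; `play`, `tit_for_tat` and `always_defect` are hypothetical names, and the opening cooperative move of Tit-for-Tat follows the description of the tournament strategy in the Introduction.

```python
# Jail terms (agent i, agent -i) for each joint act, as described in the text.
PD_YEARS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (5, 0),
    ("D", "C"): (0, 5),
    ("D", "D"): (3, 3),
}


def tit_for_tat(history):
    """1-step Markovian strategy: copy the other agent's previous act."""
    if not history:
        return "C"  # cooperate in the first round
    own_act, other_act, utility = history[-1]
    return other_act


def always_defect(history):
    """A strategy that ignores its information set entirely."""
    return "D"


def play(strategy_i, strategy_j, rounds):
    """Iterate the game; each agent only sees its own sequence of states."""
    hist_i, hist_j = [], []
    for _ in range(rounds):
        a_i, a_j = strategy_i(hist_i), strategy_j(hist_j)
        u_i, u_j = PD_YEARS[(a_i, a_j)]
        hist_i.append((a_i, a_j, u_i))  # state of agent i at this round
        hist_j.append((a_j, a_i, u_j))  # state of agent -i at this round
    return hist_i, hist_j


hist_i, hist_j = play(tit_for_tat, always_defect, rounds=4)
print([state[0] for state in hist_i])  # acts of the Tit-for-Tat agent
```

Passing the whole history to each strategy keeps the interface general: an \(l\)-step strategy would simply read the last \(l\) entries rather than only the most recent one.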

Information Chain Rule and Iterated Games
We are interested in constructing probability distributions over the variables in \(S^i_n\) for iterated games. As \(n\) indexes time, for large \(n\) the probability of an event is the number of occurrences of the event (an element in \(S^i_n\)) divided by \(n\). Other probabilities, such as conditional or joint probabilities, are natural extensions of this approach. For the moment we make no assumptions about the relationship between elements of \(S^i_n\) and \(a^i_{n+1}\) except that they are not statistically independent of each other. Given a 2-agent, 1-step Markovian game with states \(S^i_n = \{a^i_n, a^{-i}_n, u_n\}\), from the chain rule for information ([17] Theorem 2.5.2) we have the following six identities for the total information between the previous game state and agent \(i\)'s subsequent act \(a^i_{n+1}\):
\begin{align}
I(S^i_n : a^i_{n+1}) &= I(a^i_n : a^i_{n+1} \mid u_n, a^{-i}_n) + I(u_n : a^i_{n+1} \mid a^{-i}_n) + I(a^{-i}_n : a^i_{n+1}) \tag{3.19a} \\
&= I(a^{-i}_n : a^i_{n+1} \mid a^i_n, u_n) + I(u_n : a^i_{n+1} \mid a^i_n) + I(a^i_n : a^i_{n+1}) \tag{3.19b} \\
&= I(a^{-i}_n : a^i_{n+1} \mid a^i_n, u_n) + I(a^i_n : a^i_{n+1} \mid u_n) + I(u_n : a^i_{n+1}) \tag{3.19c} \\
&= I(a^i_n : a^i_{n+1} \mid a^{-i}_n, u_n) + I(a^{-i}_n : a^i_{n+1} \mid u_n) + I(u_n : a^i_{n+1}) \tag{3.19d} \\
&= I(u_n : a^i_{n+1} \mid a^i_n, a^{-i}_n) + I(a^{-i}_n : a^i_{n+1} \mid a^i_n) + I(a^i_n : a^i_{n+1}) \tag{3.19e} \\
&= I(u_n : a^i_{n+1} \mid a^i_n, a^{-i}_n) + I(a^i_n : a^i_{n+1} \mid a^{-i}_n) + I(a^{-i}_n : a^i_{n+1}) \tag{3.19f}
\end{align}
We will say that \(I(S^i_n : a^i_{n+1})\) measures the amount of information \(S^i_n\) shares with \(a^i_{n+1}\). These expressions for the shared information between game states and next actions can be simplified in useful ways as follows:
Theorem 3.1. For any normal form, non-cooperative 1-step Markovian game (not necessarily cyclical or invertible):
\[ I(u^i_n : a^i_{n+1} \mid a^i_n, a^{-i}_n) = 0. \]
Proof. From the definition of a non-cooperative game the utility value \(u^i(a_n) = u_n\) is determined by the joint strategies, so the log term in the conditional mutual information (equation 3.04) can be written:
\[ \log \frac{p(u_n \mid a^i_{n+1}, a^i_n, a^{-i}_n)}{p(u_n \mid a^i_n, a^{-i}_n)}. \]
In this case both conditional probabilities in the log term will be either 0 or 1; note the discussion following equation 3.04.
Remark 3.1. From Theorem 3.1 the first terms in Equations 3.19e and 3.19f are zero. While the joint actions of the agents unambiguously identify a utility value, in general the utility value does not unambiguously identify a joint action. See the Traveler's Dilemma example below.
Corollary 3.1. For a 2-agent, 1-step Markovian game the total information from agent \(i\)'s previous state space to \(i\)'s next act is encoded in the sum of agent \(i\)'s Transfer Entropy from agent \(-i\)'s previous act to agent \(i\)'s current act and agent \(i\)'s 'active memory' [22] of their own past acts:
\[ I(S^i_n : a^i_{n+1}) = I(a^{-i}_n : a^i_{n+1} \mid a^i_n) + I(a^i_n : a^i_{n+1}). \]
Theorem 3.2. For an invertible 1-step Markovian game: […]
Proof. This follows the same approach as Theorem 3.1, where knowing the joint act \(a_n\) implies knowing any single agent's act \(a^i_n\), and from the definition of an invertible game, for which \([u^i]^{-1}\) is the necessary deterministic map.
Remark 3.2. From Theorem 3.2 the first terms in Equations 3.19a - 3.19f are zero for invertible 1-step Markovian games.
Corollary 3.2. For an invertible 1-step Markovian game the total information from agent \(i\)'s previous state space to agent \(i\)'s next act is encoded in agent \(i\)'s active memory between the previous utility and their subsequent act:
\[ I(S^i_n : a^i_{n+1}) = I(u_n : a^i_{n+1}). \]
Proof. By Theorem 3.2 the first two terms of Equation 3.19c are zero, which leaves only \(I(u_n : a^i_{n+1})\).
Theorem 3.3. For a cyclical 1-step Markovian game: I(x : […]
Proof. This follows the same approach as Theorem 3.1, noting that for a three element \(S^i_n\) in a cyclical game there is a \(c \in S^i_n\) and distinct \(\{a, b\} = S^i_n \setminus c\) for which there is a deterministic relationship between any pairing of \(a\) or \(b\) with \(c\), and consequently the conditional mutual information in Theorem 3.3 is zero.
Remark 3.4. Equality 3.111 is notable: it states that the transfer entropy from the utility of agent \(i\) to their next act is the same as the transfer entropy from agent \(-i\)'s act to agent \(i\)'s next act.
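The chain-rule expansions and Theorem 3.1 can be checked numerically on a toy distribution. The sketch below is an illustration under stated assumptions rather than part of the proofs: it assumes binary acts, a win-loss utility that is 1 when the two acts match (so the game is not invertible), randomly chosen probabilities for the joint acts, and a randomly chosen stochastic 1-step strategy. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import permutations, product
from math import log2


def entropy(joint, idx):
    """Shannon entropy (bits) of the variables at positions `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[k] for k in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cmi(joint, x, y, z=()):
    """I(x : y | z) from entropies; z=() gives the unconditional I(x : y)."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))


# Outcome layout: (own act a_i, other act a_j, utility u, next act of agent i).
rng = random.Random(0)
weights = [rng.random() for _ in range(4)]
joint = {}
for w, (a_i, a_j) in zip(weights, product((0, 1), repeat=2)):
    u = 1 if a_i == a_j else 0   # utility is a function of the joint act
    p_next_is_1 = rng.random()   # arbitrary stochastic 1-step strategy
    joint[(a_i, a_j, u, 1)] = w / sum(weights) * p_next_is_1
    joint[(a_i, a_j, u, 0)] = w / sum(weights) * (1 - p_next_is_1)

NEXT = (3,)
total = cmi(joint, (0, 1, 2), NEXT)  # I(S_n : a_{n+1})

# One three-term expansion per ordering of the three state variables.
expansions = {
    order: cmi(joint, (order[0],), NEXT, (order[1], order[2]))
    + cmi(joint, (order[1],), NEXT, (order[2],))
    + cmi(joint, (order[2],), NEXT)
    for order in permutations((0, 1, 2))
}

# Theorem 3.1: the utility adds nothing once the joint act is known.
theorem_3_1_term = cmi(joint, (2,), NEXT, (0, 1))

print(total, max(abs(v - total) for v in expansions.values()), theorem_3_1_term)
```

Each of the six orderings sums to the same total, and the term in which the utility is conditioned on both acts is zero up to floating-point error.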

Example Games
In this section we consider in some detail three games belonging to three distinct classes. The first example, the Traveler's Dilemma, belongs to the class of games that are neither invertible nor cyclical. The second example, the Prisoner's Dilemma, is an invertible game and the final example, the Matching Pennies game, is a cyclical game. The Traveler's Dilemma and the Prisoner's Dilemma illustrate the information theoretical aspects of iterated games and the Matching Pennies game illustrates the computational properties of cyclical games.

The Traveler's Dilemma
The story describing the Traveler's Dilemma (TD) game is often told in the following form [23]: Lucy and Pete, returning from a remote Pacific island, find that the airline has damaged the identical antiques that each had purchased. An airline manager says that he is happy to compensate them but is handicapped by being clueless about the value of these strange objects. Simply asking the travelers for the price is hopeless, he figures, for they will inflate it.
Instead he devises a more complicated scheme. He asks each of them to write down the price of the antique as any dollar integer between $2 and $100 without conferring together. If both write the same number, he will take that to be the true price, and he will pay each of them that amount. But if they write different numbers, he will assume that the lower one is the actual price and that the person writing the higher number is cheating. In that case, he will pay both of them the lower number along with a bonus and a penalty: the person who wrote the lower number will get $2 more as a reward for honesty and the one who wrote the higher number will get $2 less as a punishment. For instance, if Lucy writes $46 and Pete writes $100, Lucy will get $48 and Pete will get $44.
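The manager's payment rule can be written down directly from the story. The sketch below is a plain conditional version of that rule, not the Heaviside-function form cited from [24]; the function name `td_payoffs` and the `bonus` parameter are hypothetical.

```python
def td_payoffs(claim_i, claim_j, bonus=2):
    """Payments (to i, to j) under the rule described in the story."""
    low = min(claim_i, claim_j)
    if claim_i == claim_j:
        return low, low                  # equal claims are paid in full
    if claim_i < claim_j:
        return low + bonus, low - bonus  # i wrote the lower number
    return low - bonus, low + bonus      # j wrote the lower number


print(td_payoffs(46, 100))  # Lucy writes 46, Pete writes 100
print(td_payoffs(46, 47))   # Lucy's payment is unchanged by Pete's exact claim
```

The second call shows that different joint claims can leave one traveller with exactly the same payment, so a payment on its own need not identify the pair of claims that produced it.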
The utility function for the TD game can be defined using the Heaviside step function \(\Theta(z)\): agent \(i\)'s action \(a^i\) is the cost \(i\) claims for the vase, agent \(-i\)'s action \(a^{-i}\) is the cost \(-i\) claims for the vase, and the utility for agent \(i\) is then written in terms of \(\Theta\) as in [24]. This game is not invertible because, in the case where the two agents disagree with each other, the agent \(i\) who offers the lowest value \(a^i\) has a utility of \(u^i = a^i + 2\) whereas the other agent has a utility of \(u^{-i} = a^i - 2\), i.e. agent \(-i\)'s utility is independent of the precise value that \(-i\) offered. The utility is therefore invertible for agent \(i\) but not for agent \(-i\), so the game is not invertible. For iterated TD games the simplest calculation of the information shared between the previous state and the next action is:

Prisoner's Dilemma
The Prisoner's Dilemma (PD) was introduced earlier; the payoff matrix, with entries giving the years in jail for (agent i, agent −i), is:

                         agent −i
                         Cooperate   Defect
    agent i  Cooperate   (1, 1)      (5, 0)
             Defect      (0, 5)      (3, 3)

Because the game is invertible for both agents (from the uniqueness of each of the four utilities for each agent), the total information from the previous time step to the next act for either agent is encoded in the Mutual Information between the previous utility and the next act: \(I(S^i_n : a^i_{n+1}) = I(u_n : a^i_{n+1})\). For a 1-step Markovian strategy no other past information is needed to select the next act. In this sense the utility acts as a reward that an agent can use to distinguish between the actions of other players, and the entire joint action space of the game's previous state can be reconstructed by agent \(i\) using only the utility value.
To explore this further we follow Nowak [8] [pg. 78 - 89] in defining two deterministic strategies in the iterated version of the PD game in terms of previous game states and subsequent acts. In the iterated PD there are four possible joint acts at time \(n\): Cc, Cd, Dc and Dd (upper case for agent \(i\)'s act, lower case for agent \(-i\)'s act). A vector of these joint acts is mapped to a vector of acts at time \(n + 1\) in the following way: [Cc, Cd, Dc, Dd] → [\(x_1, x_2, x_3, x_4\)] where \(x_i \in \{C, D\}\). In the Tit-for-Tat strategy (TfT): [Cc, Cd, Dc, Dd] → [C, D, C, D], in which agent \(i\) copies the other agent's previous act, and in the Win-Stay, Lose-Switch strategy (WSLS): [Cc, Cd, Dc, Dd] → [C, D, D, C], in which agent \(i\) repeats their act if they receive a high payoff (a 'win' of either 1 or 0 years) but changes their act if they receive a low payoff (a 'loss' of 3 or 5 years). This representation of the WSLS strategy is not entirely accurate though: WSLS monitors the success (measured by utility) of the previous act and adjusts the next act accordingly, i.e. WSLS uses the information set \(\{a^i_n, u_n\}\) to select \(a^i_{n+1}\), not \(\{a^i_n, a^{-i}_n\}\) as is implied by the representation [Cc, Cd, Dc, Dd] → [C, D, D, C]. This distinction is not important for Nowak's analysis of PD ([8] pg. 88) but it is important in the analysis below.
It can be seen that these two strategies have the same amount of shared information between their previous game states and their next acts, i.e. TfT and WSLS are informationally equivalent. For TfT the shared information between \(S^i_n\) and \(a^i_{n+1}\) is:
\[ I(S^i_n : a^i_{n+1}) = I(a^{-i}_n : a^i_{n+1}). \]
The two expansions of \(I(S^i_n : a^i_{n+1})\) that contain this mutual information term are Equations 3.19a and 3.19f, so the first two terms in these two equations must be zero for PD. The first term is zero by Theorem 3.2, and we observe that in TfT \(a^{-i}_n\) explicitly determines \(a^i_{n+1}\), so the second terms, the conditional mutual informations in which \(a^{-i}_n\) is the conditioning variable, must also be zero. For WSLS we measure \(I(S^i_n : a^i_{n+1})\) in terms of \(\{a^i_n, u_n\}\) and then show that this is equivalent to the TfT result. By Theorem 3.2 for invertible games the first term in any expansion of \(I(S^i_n : a^i_{n+1})\) is zero, and the equations that then only have \(\{a^i_n, u_n\}\) in their expression are Equations 3.19b and 3.19c. These equations measure the shared information between the information set of WSLS and the agent's next act, i.e. \(I(a^i_n, u_n : a^i_{n+1})\). Equation 3.19c is then: \(I(a^i_n, u_n : a^i_{n+1}) = I(a^i_n : a^i_{n+1} \mid u_n) + I(u_n : a^i_{n+1})\), and by Corollary 3.2 \(I(a^i_n, u_n : a^i_{n+1}) = I(u_n : a^i_{n+1})_{WSLS}\). By definition Equation 3.19c = Equation 3.19a and therefore \(I(u_n : a^i_{n+1})_{WSLS} = I(a^{-i}_n : a^i_{n+1})_{TfT}\). However, TfT is not behaviourally equivalent to WSLS for the PD: WSLS \(\not\equiv\) TfT, as can be seen by direct comparison of the strategies described above.
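The equality of shared information for the two strategies can be illustrated numerically. The sketch below assumes, purely for illustration, that the four joint acts at time \(n\) are equally likely; it does not model the distribution that actual iterated play would generate. The jail terms and the two strategy maps are taken from the text, while the helper names are hypothetical.

```python
from collections import defaultdict
from math import log2


def entropy(joint, idx):
    """Shannon entropy (bits) of the variables at positions `idx`."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[k] for k in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def mi(joint, x, y):
    """I(x : y) = H(x) + H(y) - H(x, y)."""
    return entropy(joint, x) + entropy(joint, y) - entropy(joint, x + y)


# Agent i's jail term for each joint act (own act, other agent's act).
YEARS_I = {("C", "C"): 1, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 3}

# Next act of agent i for each joint act, in Nowak's notation.
TFT = {("C", "C"): "C", ("C", "D"): "D", ("D", "C"): "C", ("D", "D"): "D"}
WSLS = {("C", "C"): "C", ("C", "D"): "D", ("D", "C"): "D", ("D", "D"): "C"}


def joint_pmf(strategy):
    """Outcomes (a_i, a_j, u, next act), assuming uniform joint acts."""
    return {(a_i, a_j, YEARS_I[(a_i, a_j)], strategy[(a_i, a_j)]): 0.25
            for (a_i, a_j) in YEARS_I}


A_I, A_J, U, NEXT, STATE = (0,), (1,), (2,), (3,), (0, 1, 2)
tft, wsls = joint_pmf(TFT), joint_pmf(WSLS)

tft_total, wsls_total = mi(tft, STATE, NEXT), mi(wsls, STATE, NEXT)
tft_from_other = mi(tft, A_J, NEXT)    # I(a^{-i}_n : a^i_{n+1}) for TfT
wsls_from_utility = mi(wsls, U, NEXT)  # I(u_n : a^i_{n+1}) for WSLS
wsls_from_other = mi(wsls, A_J, NEXT)  # the other agent's act alone
wsls_from_own = mi(wsls, A_I, NEXT)    # the agent's own act alone

print(tft_total, wsls_total, tft_from_other, wsls_from_utility)
```

Under this assumption each strategy shares one bit with the previous state. For TfT that bit is carried entirely by the other agent's act, while for WSLS it is carried by the utility and neither act on its own is informative.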
We now consider the relationship between WSLS and TfT for the PD game by making the following substitutions: C, c → 0, D, d → 1, a 'win' → 1, and a 'loss' → 0. Replacing the utilities with a win-loss binary variable is justified as WSLS (the only strategy that uses the utilities) only considers wins and losses, not the numerical values of the utilities. It can then be seen that this modified version of the Prisoner's Dilemma is equivalent to a NOT logic gate for agent \(i\) that inverts the action of agent \(-i\):

    a^i_n   a^{-i}_n   win-loss   WSLS a^i_{n+1}   TfT a^i_{n+1}
    0       0          1          0                0
    0       1          0          1                1
    1       0          1          1                0
    1       1          0          0                1

The first three columns are the actions of the two agents in round \(n\) of an iterated game and the outcome of the game based on these actions, i.e. each row is an instance of the 3-element set \(S^i_n\). The next two columns are the actions of agent \(i\) in round \(n + 1\) for WSLS and TfT. When the game is interpreted as a logic gate it relates inputs (agent actions) to outputs (wins and losses) through the elements of \(S^i_n\). In this case the 4×3 matrix made up of the 4 rows of possible combinations of \(S^i_n\) is the 'truth table' of the PD game. Note that while both agents always have an incentive to defect irrespective of what the other agent does (see footnote 5), it is only agent \(-i\)'s action that decides whether or not agent \(i\) will win or lose (see footnote 6). Figure 1 represents these relationships for TfT being played in an iterated PD game.
Figure 1: The modified Prisoner's Dilemma game for agent \(i\) based on wins and losses using the TfT strategy. \(S^i_{n-1}\) and \(S^i_n\) are the game states at time \(n - 1\) and \(n\). The NOT gate is the logical operator that connects the variable \(a^{-i}_n\) to the win-loss status of the game; in this sense the NOT operator is the logic of the modified PD game. The only connection between successive game states for the TfT strategy is a direct connection between \(a^{-i}_{n-1}\) and \(a^i_n\); neither agent \(i\)'s act nor the win-loss status of the game is in the information set of the TfT strategy. The \(a^i_n\) variable influences the win-loss outcome in a similar diagram for agent \(-i\), just as \(a^{-i}_n\) influences the win-loss outcome for agent \(i\) in this diagram.
Next we show that WSLS is a non-linear function of its two inputs while TfT is a linear function of its inputs. To see this, note that TfT and WSLS share the same amount of information with the previous game state and that TfT is linearly (anti-)correlated with win-loss; however, there is zero pairwise linear correlation between the WSLS action \(a^i_{n+1}\) and either \(a^i_n\) or \(u_n\). Consequently, because mutual information measures both linear and non-linear relationships between variables, all of the shared information between \(S^i_n\) and \(a^i_{n+1}\) is either linear for TfT or non-linear for WSLS.
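The correlation claims can be checked by hand or with a few lines of code. The sketch below uses the binary encoding of the modified PD (C → 0, D → 1, win → 1, loss → 0) and weights the four joint acts equally, which is an assumption made for illustration; `pearson` is a hypothetical helper computing the ordinary linear correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Rows are the joint acts Cc, Cd, Dc, Dd in the binary encoding.
own_act = [0, 0, 1, 1]    # a^i_n
other_act = [0, 1, 0, 1]  # a^{-i}_n
win = [1, 0, 1, 0]        # agent i wins exactly when agent -i cooperates
wsls_next = [0, 1, 1, 0]  # [Cc, Cd, Dc, Dd] -> [C, D, D, C]
tft_next = [0, 1, 0, 1]   # [Cc, Cd, Dc, Dd] -> [C, D, C, D]

print(pearson(tft_next, win))        # TfT against win-loss
print(pearson(wsls_next, own_act))   # WSLS against the agent's own act
print(pearson(wsls_next, win))       # WSLS against win-loss
```

TfT is perfectly anti-correlated with the win-loss variable, while WSLS has zero linear correlation with each of its two inputs taken separately, even though the two inputs together determine it completely.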

Footnote 5: In the PD defection is said to strictly dominate cooperation, see [25] page 10.
Footnote 6: Whether this is a large or a small win or loss is controlled by agent \(i\) though; this aspect was lost when the utility was made into a binary win-loss outcome in modifying the PD game in the table.

A truth table can also be constructed from the information set of WSLS (columns 1 and 3 in the table above) and the WSLS action in the next round (column 4 in the table above), and the truth table is that of an XNOR logic gate, in which matching inputs (00 or 11) are mapped to 1 and mismatched inputs (10 or 01) are mapped to 0. Encoding the utilities of the PD as win-loss and representing it as a NOT gate makes clear that learning the relationships between the variables of \(S^i\) for the PD game (as encoded in the first three columns of the table above) is a linearly separable task: if agent \(i\) wants to learn the PD, it only needs to divide the state space of agent \(-i\)'s actions up into wins and losses for agent \(i\), a problem that can be solved by single layer perceptrons [26]. However, learning the WSLS strategy, equivalent to learning an XNOR operation, is not a linearly separable task and requires a multi-layer perceptron [27]. Figure 2 represents these relationships for WSLS being played in an iterated PD game.
Figure 2: The modified Prisoner's Dilemma game for agent \(i\) based on wins and losses and using the WSLS strategy. The WSLS strategy is equivalent to an XNOR gate that has as its information set (inputs) \(\{a^i_{n-1}, \text{win-loss}\}\) and outputs \(a^i_n\).
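The difference in learning difficulty can be illustrated with a brute-force search over single threshold units. This is a sketch under stated assumptions: a unit outputs 1 when \(w_1 x_1 + w_2 x_2 + b > 0\), and only small integer weights are searched. Finding a unit demonstrates linear separability; finding none on a finite grid does not by itself prove non-separability, although for XNOR that is the standard result cited as [27]. All names are hypothetical.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def threshold_unit(w1, w2, b, x1, x2):
    """A single-layer perceptron with a hard threshold at zero."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def find_single_unit(target, grid=range(-3, 4)):
    """Return integer weights (w1, w2, b) reproducing `target`, or None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(threshold_unit(w1, w2, b, *x) == target[x] for x in INPUTS):
            return (w1, w2, b)
    return None


# The modified PD as a map (a^i_n, a^{-i}_n) -> win-loss: a NOT of a^{-i}_n.
pd_game = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 0}

# WSLS as a map (a^i_n, win-loss) -> a^i_{n+1}: an XNOR gate.
wsls = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

pd_unit = find_single_unit(pd_game)
wsls_unit = find_single_unit(wsls)
print(pd_unit)    # some weight triple that reproduces the NOT gate
print(wsls_unit)  # None: no single unit on this grid reproduces XNOR
```

A weight on agent \(-i\)'s act alone is enough to reproduce the game, whereas no choice of weights on the two WSLS inputs reproduces the strategy.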
In the modified PD the utility values were converted to binary variables. This works because TfT does not have the utility value in its information set, and WSLS has the utility value in its information set but only considers the outcome as a binary win-loss variable. In the next example these substitutions are unnecessary. We take a similar approach to establish that the Matching Pennies game is not a linearly separable problem and that WSLS is both informationally and behaviourally identical to TfT for one agent but not the other.

Matching Pennies
In the Matching Pennies (MP) game two agents have one coin each and each coin has two states, either Heads (H) or Tails (T). When the agents compare coins one agent wins if the two coins match and the other agent wins if they do not match. In the usual description of the game the two agents toss their coins before comparing them; this randomising of the coin states can be interpreted as one step of an iterated game using a 0-step Markovian strategy. In what follows though we use the name 'Matching Pennies' to describe the iterated game where WSLS and TfT are deterministic strategies that use the action-utility bi-matrix given by:

                       agent −i
                       Heads    Tails
    agent i  Heads     (1, 0)   (0, 1)
             Tails     (0, 1)   (1, 0)

In the same fashion as for the PD game above, we can map agents' acts to numerical values: H, h → 1, T, t → 0, and a win = 1, a loss = 0 (unlike the PD game we do not need to simplify the utility values and so the game is not modified as the PD was). Then the MP game can be interpreted as XNOR and XOR gates in the following two tables for the two agents:

XNOR logic gate for agent i
    a^i_n   a^{-i}_n   u^i_n   WSLS a^i_{n+1}   TfT a^i_{n+1}
    1       1          1       1                1
    1       0          0       0                0
    0       1          0       1                1
    0       0          1       0                0

XOR logic gate for agent −i
    a^{-i}_n   a^i_n   u^{-i}_n   WSLS a^{-i}_{n+1}   TfT a^{-i}_{n+1}
    1          1       0          0                   1
    1          0       1          1                   0
    0          1       1          0                   1
    0          0       0          1                   0

As for the PD game, the first three columns of each row represent an instance of \(S^i_n\) (top table) and \(S^{-i}_n\) (bottom table), and the four possible permutations of the variables of \(S^i_n\) and \(S^{-i}_n\) form a 4×3 table that can be interpreted as the truth table of a logic gate. However, unlike in the PD game, both agents now have an input into each other's logic gate; see Figures 3 and 4 for schematic representations for agent \(i\). Just as in the PD, for MP the WSLS strategy is not linearly correlated with any element of its information set and TfT is linearly correlated with its information set. But unlike the PD, now WSLS is perfectly (linearly) correlated with TfT for agent \(i\) and perfectly anti-correlated for agent \(-i\). It can also be seen that knowing any two variables in \(S^i_n = \{a^i_n, a^{-i}_n, u^i_n\}\) is sufficient to derive the third variable. This is a property of the XOR (\(\oplus\)) and XNOR (\(\overline{\oplus}\)) logic operations: given the three variables \(a^i_n, a^{-i}_n, u^i_n \in \{0, 1\}\) related by the XNOR operator, then \(a^i_n \,\overline{\oplus}\, a^{-i}_n = u^i_n\), \(a^{-i}_n \,\overline{\oplus}\, u^i_n = a^i_n\), and \(u^i_n \,\overline{\oplus}\, a^i_n = a^{-i}_n\). We will refer to this as the cyclical property of cyclical games. We can now prove the following relationship:
Theorem 4.1. TfT ≡ WSLS for agent \(i\): For the Matching Pennies iterated game the Tit-for-Tat strategy is behaviourally indistinguishable from the Win-Stay-Lose-Shift strategy for agent \(i\).

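Proof. By the cyclical property for Matching Pennies, the game can be read as a gate \(G_{MP}\) that takes the inputs \(\{a^i_{n-1}, u^i_{n-1}\}\) and outputs \(a^{-i}_{n-1}\):
\[ G_{MP}(a^i_{n-1}, u^i_{n-1}) = a^i_{n-1} \,\overline{\oplus}\, u^i_{n-1} = a^{-i}_{n-1}. \]
The WSLS strategy \(G_{WSLS}\) takes these same inputs \(\{a^i_{n-1}, u^i_{n-1}\}\) but outputs \(a^i_n\):
\[ G_{WSLS}(a^i_{n-1}, u^i_{n-1}) = a^i_{n-1} \,\overline{\oplus}\, u^i_{n-1} = a^i_n. \]
Because \(G_{MP}\) and \(G_{WSLS}\) are equivalent to XNOR logic gates, for the same input they produce the same output: \(G_{MP} \equiv G_{WSLS}\) and therefore \(a^{-i}_{n-1} \equiv a^i_n\). By definition the TfT strategy is \(a^{-i}_{n-1} \equiv a^i_n\), and so TfT ≡ WSLS.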
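Theorem 4.1 and the cyclical property can be verified exhaustively, since there are only four joint acts. The following sketch uses the binary encoding H → 1, T → 0, win → 1, loss → 0 for agent \(i\), who wins when the coins match; the function names are hypothetical.

```python
def xnor(a, b):
    """XNOR of two bits: 1 when the inputs match, 0 otherwise."""
    return 1 - (a ^ b)


def wsls(own_act, win):
    """Win-Stay, Lose-Switch: repeat the act after a win, flip it after a loss."""
    return own_act if win == 1 else 1 - own_act


def tft(other_act):
    """Tit-for-Tat: copy the other agent's previous act."""
    return other_act


rows = []
for a_i in (0, 1):
    for a_j in (0, 1):
        u_i = xnor(a_i, a_j)  # agent i wins when the coins match
        # Cyclical property: any two variables determine the third.
        assert xnor(a_j, u_i) == a_i
        assert xnor(u_i, a_i) == a_j
        rows.append((a_i, a_j, u_i, wsls(a_i, u_i), tft(a_j)))

tft_equals_wsls = all(row[3] == row[4] for row in rows)
print(rows)
print(tft_equals_wsls)
```

For every joint act the two strategies prescribe the same next act for agent \(i\), even though they read different parts of the previous state.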
Remark 4.1. TfT is a prototypical herd-like strategy: it simply follows the behaviour of another agent. In contrast, WSLS is a prototypical fundamental strategy: it uses past actions and their payoffs to decide whether to stay with the current strategy or change. The fact that they are indistinguishable from each other is not guaranteed: for the Prisoner's Dilemma TfT is behaviourally distinguishable from WSLS. The indistinguishability in the Matching Pennies game comes from identifying the MP game and the WSLS strategy with the same (XNOR) logic gate and from the cyclical property of \(G_{MP}\).
Before the next result we introduce a variation on the WSLS strategy, called 'Win-Switch-Lose-Stay', WSLS\(^{-1}\), in which an agent changes strategy if they win but will stay with a losing strategy. The result is that, for agent \(-i\) in the iterated Matching Pennies game, TfT is behaviourally indistinguishable from WSLS\(^{-1}\).
Proof. The XOR logic gate for agent \(-i\) shows the WSLS strategy is anti-correlated with the TfT strategy; this is the opposite behaviour of the XNOR logic gate for agent \(i\), so flipping the behaviours of WSLS, 0 → 1 and 1 → 0, results in the strategy WSLS\(^{-1}\) that is behaviourally the same as TfT. In Nowak's notation in the Matching Pennies game for agent \(-i\) (who wins if pennies are mismatched) this results in an inverted WSLS: [Hh, Ht, Th, Tt] → [H, T, H, T], which is equivalent to [11, 10, 01, 00] → [1, 0, 1, 0].
The TfT strategy for the MP game is a linearly separable learning task: given \(S^i_{n-1}\), a single layer perceptron is sufficient to map \(a^{-i}_{n-1}\) to the correct \(a^i_n\) output [26]. The WSLS strategy for the MP game is not a linearly separable task because it is equivalent to an XNOR gate [27]: given \(S^i_{n-1}\), a multi-layer perceptron is necessary to map \(a^i_{n-1}\) and \(u^i_{n-1}\) to \(a^i_n\). This difference in the complexity of the computational task, together with the indistinguishable character of the subsequent behaviour of the agents, suggests that cognitive decision-making processes are not easily untangled by observing behaviour.
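The same exhaustive check can be made for agent \(-i\), who wins when the coins are mismatched. The sketch below again uses the binary encoding H → 1, T → 0, win → 1, loss → 0; `wsls_inverse` is a hypothetical name for the Win-Switch-Lose-Stay rule described above.

```python
def wsls(own_act, win):
    """Win-Stay, Lose-Switch."""
    return own_act if win == 1 else 1 - own_act


def wsls_inverse(own_act, win):
    """Win-Switch, Lose-Stay: flip the act after a win, repeat it after a loss."""
    return 1 - own_act if win == 1 else own_act


rows = []
for a_j in (1, 0):          # act of agent -i (own act)
    for a_i in (1, 0):      # act of agent i (the other agent)
        u_j = a_j ^ a_i     # XOR: agent -i wins when the coins are mismatched
        tft_next = a_i      # Tit-for-Tat copies the other agent's act
        rows.append((a_j, a_i, u_j, wsls(a_j, u_j), wsls_inverse(a_j, u_j), tft_next))

wsls_is_anti_tft = all(r[3] == 1 - r[5] for r in rows)
wsls_inverse_is_tft = all(r[4] == r[5] for r in rows)
print([r[4] for r in rows])  # next acts for Hh, Ht, Th, Tt under WSLS^-1
print(wsls_is_anti_tft, wsls_inverse_is_tft)
```

The printed list reproduces the mapping [11, 10, 01, 00] → [1, 0, 1, 0] given in Nowak's notation.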

Discussion
This article is, to the best of the author's knowledge, the first to use Transfer Entropy in the analysis of iterated games, while also contributing theoretical concepts to the analysis of the computational foundations of games and strategies. In some respects the analysis of games discussed here is very similar to the Elementary Cellular Automata (ECAs) work of Lizier and colleagues [29,30]. In ECAs the number of agents is much larger than the two agent games considered here: they form a potentially infinite spatial array of locally connected agents that switch states based on their own states and the states of their neighbours. ECAs are simpler than the agents considered in this article as they do not have a utility function associated with their collective behaviour, an added complexity that has been shown here to be either informative or redundant depending on the game and the strategy being played. The potential for the utility values to be behaviourally redundant is important in reward based learning and it is a poorly studied area. This can be seen in economic theory, in which 'revealed preference theory' plays a significant role in understanding the connection between behaviour and reward. As Binmore points out in Section 1.5 of [31] (also see Savage's foundational work in statistics [32]), if an economic decision-maker consistently acts to select one option over another, then their subjective preferences are revealed by their behaviour and their acts can then be interpreted as though they are maximising a real valued utility function. The emphasis here is on what people do, the acts described in the current article, and it is this behavioural focus and not the form of a utility function that economics is conventionally based upon. The alternative point of view, and one often implicitly adopted in agent based modelling of economic learning and behaviour, is that decision-makers act consistently because they have an internal utility function that allows them to order their preferences; this is called the causal utility fallacy in economics, see pages 19-22 of [31] for a discussion.
The classification of games and strategies as logic gates also has analogies in ECAs, where Rule 90 [33] (using Wolfram's numbering system for ECAs) is based on the XOR operation and is the simplest non-trivial ECA [34]. If we label an agent in a Rule 90 ECA as i, its two nearest neighbours as i − 1 and i + 1, and the state of i at time t as $s^i_t$, then Rule 90 updates agent i's state at t + 1 to: $s^i_{t+1} = s^{i-1}_t \oplus s^{i+1}_t$. This can be seen as an elementary version of the more complex interactions depicted in the figures above and emphasises the point that both game theory and strategies in iterated games can be seen as computational processes, and potentially universal Turing processes, just as ECAs are [35]. From this point of view, if an idealised economy is seen as a collection of agents strategically interacting in a game-theoretical fashion, this can be interpreted as a large network of parallel and sequential computational operations that continuously 'computes' the output of an economy.

These results are also important for understanding strategic behaviour at (at least) two different levels. At an individual's cognitive level the neural processing of information can be represented as a combination of logic operations implemented by a (biological) neural network [36]. Similarly, earlier neuro-economic work experimentally connected the level of individual neural recordings with adaptive learning and overt behaviour in the iterated Matching Pennies game [37,38]. In [39], Lee and collaborators identified the ambiguity between Win-Stay, Lose-Switch and copying the other agent's previous move (Tit-for-Tat) but did not consider the issue any further. Previously Nowak [8] had examined the strategic effectiveness of Win-Stay, Lose-Switch relative to Tit-for-Tat for the Prisoner's Dilemma but did not note the possibility of an ambiguity in Matching Pennies. The results here show why ambiguities occur and provide an approach for understanding how different computational processes (decision strategies), in combination with the particular game being played, result in a fundamental indeterminacy in differentiating 'internal' strategies by observing 'external' behaviour.
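As a minimal sketch of the XOR computation behind Rule 90, the following Python fragment applies synchronous updates to a finite row of binary cells, setting each cell to the XOR of its two nearest neighbours. The function name `rule90_step` and the periodic (wrap-around) boundary are assumptions made for illustration only; the ECA discussed above is a potentially infinite array.

```python
def rule90_step(cells):
    """Return the next Rule 90 generation for a list of 0/1 cell states.

    Each new state is the XOR of the left and right neighbours.
    Assumption: the row wraps around (periodic boundary).
    """
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]


# A single active cell in the middle of a row of seven cells.
start = [0, 0, 0, 1, 0, 0, 0]
first = rule90_step(start)
second = rule90_step(first)

print(first)
print(second)
```

Each cell's own current state does not enter the update, which is why the rule reduces to a single two-input logic gate applied in parallel along the row.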
At the larger scale of collective behaviour we would like to understand, and distinguish between, the strategies of those who are following the behaviour of others and those who are processing information in a more complex fashion based on their own past experience. It seems likely that realistic strategies would include a combination of both social influences and fundamental computations because both aspects play a part in the pay-off an agent receives, but splitting these processes out provides an insight into the foundations of collective behaviour. Herding behaviour and financial contagion have been suggested as possible mechanisms that drive financial market collapses [40,41] and so it is important both theoretically and empirically to understand the limits of our ability to detect these two different behaviours. Previously, information theory has been used to measure abrupt transitions in general [20,42] and in financial market collapses specifically [43,44], but little progress has been made in relating information sets and strategic computation in economic theory, particularly as it relates to fundamentalists versus herders [45].

Figure 3. Two steps in the MP iterated game. For agent i the elements of $S^i_n$ are related to one another via the XNOR logic gate representation $g_{MP}$ of the MP game. TfT is the copy operation (identity logic gate) from $a^{-i}_{n-1}$ to $a^i_n$.

Figure 4. Prisoner's Dilemma as a NOT logic gate for agent i in round n.
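The Matching Pennies relationships named in the Figure 3 caption can be checked by enumeration. The sketch below assumes binary actions, assumes that agent i is the agent who wins (utility 1) when the two actions match so that the utility is the XNOR of the actions, and uses hypothetical function names. Under those assumptions it lists agent i's next action under Tit-for-Tat and under Win-Stay, Lose-Switch for every possible previous state.

```python
from itertools import product


def xnor(a, b):
    """XNOR gate: 1 when the two binary inputs are equal, else 0."""
    return int(a == b)


def tit_for_tat(own_prev, other_prev, utility_prev):
    """Copy the other agent's previous action (identity gate)."""
    return other_prev


def win_stay_lose_switch(own_prev, other_prev, utility_prev):
    """Repeat the previous action after a win, flip it after a loss."""
    return own_prev if utility_prev == 1 else 1 - own_prev


# All previous states (own action, other's action, own utility) for agent i,
# assuming agent i wins when the actions match.
previous_states = [(a, b, xnor(a, b)) for a, b in product((0, 1), repeat=2)]

tft_actions = [tit_for_tat(*state) for state in previous_states]
wsls_actions = [win_stay_lose_switch(*state) for state in previous_states]

for state, t, w in zip(previous_states, tft_actions, wsls_actions):
    print(state, "TfT ->", t, "WSLS ->", w)
```

The two lists are identical, so an observer who sees only agent i's moves could not tell the two strategies apart in this game. Under the same assumptions the equivalence does not carry over to the other agent, who wins on a mismatch.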

⁸ Note the work in [28] using a different measure of 'information flow' for the iterated Matching Pennies game.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 1 March 2017 | doi:10.20944/preprints201703.0002.v1 | Peer-reviewed version available at Entropy 2017, 19, 201; doi:10.3390/e19050201

For example the Prisoner's Dilemma and Stag Hunt games are invertible but the Matching Pennies and Rock-Paper-Scissors games are not. Matching Pennies and Prisoner's Dilemma are discussed in detail below, and see for example [15] for Rock-Paper-Scissors and [16] for Stag Hunt.

Definition 2.2. Cyclical Games: Given the cardinality of variables in agent i's state space $\phi = |S^i|$, a cyclical game for agent i is a game for which knowing any combination of $\phi - 1$ variables of $S^i$ is both necessary and sufficient to derive the remaining variable. A game is cyclical if it is cyclical for all agents.
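One reading of the cyclical-game definition (any φ − 1 variables of agent i's state space are both necessary and sufficient to derive the remaining one) can be tested by brute force. The sketch below assumes that the state space holds three variables (own action, other agent's action, own utility), that Matching Pennies pays agent i 1 on a match and 0 otherwise, and that the Prisoner's Dilemma uses the illustrative pay-offs 3, 0, 5, 1; the helper names and pay-off values are hypothetical and not taken from the article.

```python
from itertools import combinations, product


def determines(rows, known_idx, target_idx):
    """True if the values at known_idx fix the value at target_idx."""
    seen = {}
    for row in rows:
        key = tuple(row[k] for k in known_idx)
        if seen.setdefault(key, row[target_idx]) != row[target_idx]:
            return False
    return True


def is_cyclical_for_agent(rows):
    """Check sufficiency and necessity of knowing phi - 1 variables."""
    phi = len(rows[0])
    for target in range(phi):
        others = [k for k in range(phi) if k != target]
        if not determines(rows, others, target):
            return False  # phi - 1 variables are not sufficient
        for size in range(phi - 1):
            for subset in combinations(others, size):
                if determines(rows, subset, target):
                    return False  # fewer variables suffice: not necessary
    return True


actions = list(product((0, 1), repeat=2))

# Matching Pennies for the matching agent: utility is XNOR of the actions.
mp_rows = [(a, b, int(a == b)) for a, b in actions]

# Prisoner's Dilemma (0 = cooperate, 1 = defect), illustrative pay-offs.
pd_payoff = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}
pd_rows = [(a, b, pd_payoff[(a, b)]) for a, b in actions]

mp_cyclical = is_cyclical_for_agent(mp_rows)
pd_cyclical = is_cyclical_for_agent(pd_rows)
print(mp_cyclical, pd_cyclical)
```

With these assumed pay-offs the Prisoner's Dilemma fails the necessity half of the test, because agent i's utility alone already identifies both actions, whereas in Matching Pennies no single variable determines another.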