Cooperation between Emotional Players

Abstract: This paper uses the framework of stochastic games to propose a model of emotions in repeated interactions. An emotional player can be in either a friendly, a neutral, or a hostile state of mind. The player transitions between the states of mind as a response to observed actions taken by the other player. The state of mind determines the player’s psychological payoff which together with a material payoff constitutes the player’s utility. In the friendly (hostile) state of mind the player has a positive (negative) concern for other players’ material payoffs. This paper shows how emotions can both facilitate and obstruct cooperation in a repeated prisoners’ dilemma game. In finitely repeated games a player who cares only for their own material payoffs can have an incentive to manipulate an emotional player into the friendly state of mind. In infinitely repeated games with two emotional players less patience is required to sustain cooperation. However, emotions can also obstruct cooperation if they make the players unwilling to punish each other, or if the players become hostile when punished.


Introduction
Repeated interactions in everyday life are often modeled as repeated games. However, a crucial assumption of a repeated game is that the stage game is identical in each period. This is a strong assumption even when the material consequences of the interaction remain the same. For one thing, emotions are likely both to be affected by the outcomes of earlier periods and to influence preferences. Consequently, the stage games differ and the interaction cannot properly be modeled as a repeated game. While conditional preferences have been studied before [1][2][3], this paper is the first to study preferences conditional on emotional states of mind in repeated interactions.
This paper proposes a model of players with emotionally driven social preferences in repeated interactions. The assumption of a unique stage game is relaxed through use of the stochastic games framework. Stochastic games, first introduced by Shapley [4], are a generalization of repeated games, in which the stage game can change between periods. Stochastic games have been frequently used in models of oligopoly and competition, e.g., to model endogenous demand. This paper is, to my knowledge, the first to apply the stochastic games framework to a model of social preferences. In this model, the players can react emotionally to the actions of other players, and, as a response, transition between different states of mind. A cooperative act by another player may induce a player to become friendly and value the other player's material payoff positively, whereas a hurtful act may induce the player to become hostile and value the other player's material payoff negatively. Each state of mind defines a vector of stage game utilities. A player's type defines a transition matrix which, given a current state and an action profile, defines a probability distribution over the next period's states of mind.
The focus is on a set of relatively simple player types whose state transitions are solely determined by the last observed action of the other player. The traditionally studied Homo oeconomicus corresponds to a type who cannot transition from the neutral state of mind. The Homo oeconomicus is contrasted with an emotional type who transitions between three states of mind: friendly, neutral, and hostile.
The model is applied to a repeated prisoners' dilemma game form with fixed material payoffs, but where the players' preferences over outcomes evolve with the history of play. Both the finite and the infinite interaction are studied. In the infinitely repeated interaction, the focus is on the Grim Trigger and mutual minmax strategy profiles and on the conditions required for them to be subgame perfect Nash equilibria.
The paper shows how a history-sensitive concern for the other player's material payoffs can be beneficial for cooperation. A positive concern, for example after a good history of play, can encourage further cooperation. Likewise, a negative concern, after a bad history of play, can make threats of punishment more credible than in the standard model. Moreover, a Homo oeconomicus can have an incentive to manipulate the other player for future gains. The two main contributions of this paper are (i) the framework for modeling emotions in a repeated interaction using stochastic games and (ii) a better understanding of how emotions can affect cooperation.
The observation that altruistic preferences can be conditional has been made before. In the model by Levine [1], altruistic preferences are conditional on the degree of altruism of the other player. Altruistic individuals want to be altruistic toward other altruistic individuals, but not toward those who only care for their own well-being. Cox et al. [2] and Cox et al. [3] model reciprocity in extensive form games. That is, the players can become more or less altruistic toward someone depending on how altruistically the other player has acted. In the model proposed in this paper, the altruistic preferences are conditional on the players' emotional state of mind, which is affected by the history of play. Related is also Oechssler [9]. Oechssler shows that when players in a repeated interaction have social preferences, utility may not be separable across periods since outcomes in earlier periods may affect the utility from choices in the last period.
This paper is outlined as follows. Section 2 presents the model of a stochastic prisoners' dilemma game with emotional players. Section 3 analyzes the finitely and infinitely repeated interaction between (potentially) emotional players. Section 4 concludes and relates the model to earlier literature.

Model
This paper studies repeated interactions between players with emotionally driven social preferences. The assumption is that two players who have never met before, and who are in the initial state of a repeated interaction, care only for own material payoff. As the interaction progresses the players become more friendly, or more hostile, toward each other depending on the other's behavior.
Focus is on two-player games, I = {1, 2}, and more precisely on a repeated prisoners' dilemma game (form) with fixed material payoffs, as illustrated in Table 1. Each player has two actions, cooperate (C) and defect (D). If both players choose C, they each receive a material payoff of c; if both choose D, they each receive a material payoff of d. If one of them chooses C while the other chooses D, the first player receives a material payoff of 0 and the second player receives a material payoff of b. With b > c > d > 0 each player has a dominant action, D, but the associated payoff vector is Pareto dominated by the payoff vector received if they both choose C.
Let $A = \{C, D\}$ denote the set of available actions. An action profile, $a = (a_1, a_2) \in A^2$, is a pair of actions.
Both players have the same finite set of states of mind, $M$. When the players' preferences over own and other's material payoffs depend on their current state of mind, the repeated interaction can be modeled as a stochastic game. Stochastic games generalize repeated games by allowing the stage game, or state, to change between periods and were first introduced by Shapley [4]. In this model the state space corresponds to the Cartesian product of the two players' states of mind, $S = M^2$.
Each player $i$ has transition probabilities $P_i(m \mid s, a)$ that, given the current state $s \in S$ and an action profile $a \in A^2$, specify the probability of transitioning to the state of mind $m \in M$. Attention is restricted to players whose state of mind is only affected by the last observed action of the other player, and not by any action further back in the history of play. Given the interaction's material payoffs, each state of mind defines a vector of corresponding utilities.
The stage payoff function for player $i$ is denoted $u_i : S \times A^2 \to \mathbb{R}$:
$$u_i(s, a) = \pi_i(a) + \varphi(m_i)\,\pi_j(a). \tag{1}$$
The first term is utility derived from a material payoff, $\pi_i$. The second term is utility derived from a psychological payoff, $\varphi(m_i)\pi_j$, that is, utility derived from a positive or negative concern for the other player's material payoff. This paper considers three states of mind: friendly (F), neutral (N), and hostile (H). In the friendly state of mind, the player cares positively for the other player's material payoff, $\varphi(F) = \alpha \in \mathbb{R}_+$. In the neutral state of mind, the player only cares for own material payoff, $\varphi(N) = 0$. In the hostile state of mind the player cares negatively for the other player's material payoff, $\varphi(H) = -\gamma$ with $\gamma \in \mathbb{R}_+$.
Focus is on two types of players, traditional and emotional. The player types differ in their transition probabilities. The traditional player, or Homo oeconomicus, starts in the neutral state of mind and cannot transition to any other state (such transitions have a zero probability). The emotional player starts in the neutral state of mind and can transition between all states of mind.
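As a minimal illustration of the stage utility described above, the following Python sketch adds the material payoff to the weight of the current state of mind times the other player's material payoff. The hostile weight is written as −γ, in line with the utility d − γd used later in the paper. The parameter values (b = 4, c = 3, d = 2, α = γ = 0.2) are those of the paper's later numerical example; the function and variable names are illustrative assumptions, not part of the model.

```python
# Illustrative sketch of the stage utility u_i = pi_i + phi(m_i) * pi_j.
# Names and numeric values are assumptions chosen for illustration.
B, C_PAY, D_PAY = 4.0, 3.0, 2.0   # material payoffs with b > c > d > 0
ALPHA, GAMMA = 0.2, 0.2           # strength of friendly / hostile concern


def material(own, other):
    """Material payoff to the player choosing `own` against `other`."""
    table = {("C", "C"): C_PAY, ("C", "D"): 0.0,
             ("D", "C"): B, ("D", "D"): D_PAY}
    return table[(own, other)]


def phi(mind):
    """Weight on the other player's material payoff in each state of mind."""
    return {"F": ALPHA, "N": 0.0, "H": -GAMMA}[mind]


def utility(mind, own, other):
    """Stage utility of a player in state of mind `mind`."""
    return material(own, other) + phi(mind) * material(other, own)


if __name__ == "__main__":
    # One row per state of mind; columns are (own, other) = CC, CD, DC, DD.
    for mind in "FNH":
        print(mind, [utility(mind, a, o) for a in "CD" for o in "CD"])
```

In the neutral state the function reduces to the material payoffs of the prisoners' dilemma, which is the Homo oeconomicus case.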
The game is finitely or infinitely repeated. The players' common discount factor is denoted $\delta \in (0, 1)$. In real life, the material and psychological discount factors most likely differ. However, for simplicity of analysis, I assume that material and psychological payoffs are discounted identically. The game has perfect monitoring. In each period a simultaneous-move game of complete information is played. Before the players choose their actions the state is drawn and revealed to them. The interpretation is that the players are able to recognize their own state of mind and identify the other player's state of mind from their facial expression (see footnote 2). This is a strong assumption, made to simplify the analysis. The model is, however, not restricted to games of complete information: uncertainty over others' state of mind can be modeled using a stochastic game of incomplete information [12].
The set of period-$t$ ex ante histories, denoted $\tilde{H}^t$, is the set $(S \times A^2)^t$ for each $t > 0$, which identifies the state and action profile in each period. The set of ex post histories also specifies the current state of the game, $(S \times A^2)^t \times S$, and is denoted by $H^t$; let $H$ denote the set of all ex post histories. A pure behavior strategy profile, $\sigma : H \to A^2$, defines an action for each player and each ex post history. Given a strategy profile $\sigma \in \Sigma$, the stochastic game defines a Markov chain on the finite state space $S$.
Footnote 2: There is a rich literature on the identification of emotions in facial expressions [10,11]. The basic emotions (happy, sad, angry, fearful, disgusted, and surprised) can be identified across all cultures.
The normalized expected present value for player $i$ is (see footnote 3)
$$U_i(\sigma) = (1 - \delta)\, \mathbb{E}_{\sigma}\!\left[\sum_{t=0}^{T-1} \delta^{t}\, u_i(s_t, a_t)\right], \tag{2}$$
where $T$ is a positive integer or infinity. A strategy profile $\sigma$ constitutes a Nash equilibrium of the stochastic game if $U_i(\sigma) \ge U_i(\sigma_i', \sigma_{-i})$ for all $i \in I$ and all $\sigma_i' \in \Sigma_i$. A strategy profile is a subgame perfect equilibrium of the stochastic game if, for any ex post history $h \in H$, the continuation strategy profile $\sigma|_h$ is a Nash equilibrium of the continuation game.

Analysis
The focus of this analysis is on repeated interactions between two pairs of player types. The first pair consists of one Homo oeconomicus and one emotional player, and the second of two identical emotional players. Further, each pair is compared to the benchmark case of play between two Homo oeconomicus. For the infinitely repeated interaction, two subgame perfect strategy profiles, Grim Trigger and a mutual minmax, are compared.
The emotional player type has deterministic state transitions as illustrated in Figure 1.
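The transition rule can be sketched in code. The rule below is read off from the analysis in the following sections: a cooperative act by the other player moves the emotional player one step toward the friendly state, and a defection moves the player directly to the hostile state. It is an assumption about what Figure 1 depicts, and the function names are illustrative.

```python
# Deterministic transitions of the emotional type, as inferred from the
# analysis in the text (an assumption about the content of Figure 1).
STEP_UP = {"H": "N", "N": "F", "F": "F"}   # other player chose C


def next_mind(mind, observed, emotional=True):
    """Next state of mind given the other player's last action."""
    if not emotional:          # Homo oeconomicus never leaves neutral
        return "N"
    if observed == "C":
        return STEP_UP[mind]
    return "H"                 # any observed D leads to the hostile state


def mind_path(start, observed_actions):
    """States of mind visited while observing a sequence of actions."""
    path = [start]
    for action in observed_actions:
        path.append(next_mind(path[-1], action))
    return path


if __name__ == "__main__":
    # Friendly after C, hostile after D, then two cooperative steps back.
    print(mind_path("N", "CDCC"))
```

Under this rule it takes two consecutive cooperative acts to return from the hostile to the friendly state, which is the "calming down" delay that appears in the conditions for the mutual minmax profile.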

The Finitely Repeated Interaction
Consider the finitely repeated interaction between one Homo oeconomicus, player 1, and one emotional player, player 2, with three states of mind, $M = \{F, N, H\}$. The corresponding stochastic game has nine states, $S = M^2$. Only three are reachable from the initial state, $\{NF, NN, NH\} \subset S$, since by definition the Homo oeconomicus never leaves the neutral state of mind. The utility matrices are illustrated in Table 2. Player 1 is the row player and player 2 is the column player.
Footnote 3: Multiplying by $(1 - \delta)$ as in (2) is a non-essential normalization, made to simplify calculations.
Footnote 4: The transitions are initially assumed to be deterministic. An analysis of players with stochastic state transitions can be found in Appendix A.
Footnote 5: An alternative is to let the player transition to the neutral state rather than to the hostile state. This would, however, not affect the analysis of the game.
Player 2's utility from choosing D is higher in the friendly state of mind than in the neutral state of mind. However, so is the utility from choosing C, and the increase in utility from choosing C is larger than the increase in utility from choosing D. Suppose player 2 is in the friendly state of mind and believes that player 1 will choose C with probability $p$. Then player 2's expected utility from choosing C is $p(c + \alpha c) + (1 - p)\alpha b$. Likewise, player 2's expected utility from choosing D is $pb + (1 - p)(d + \alpha d)$. Thus, given $p$, an increase in $\alpha$ increases player 2's utility from choosing C more than it increases the utility from choosing D.
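The comparison can be made explicit with a short derivation, a sketch that uses only the payoffs defined in the model. The labels $EU_C$ and $EU_D$ are introduced here for convenience and denote player 2's expected utilities in the friendly state of mind.

```latex
\begin{align*}
EU_C(p) &= p\,(c + \alpha c) + (1 - p)\,\alpha b, \\
EU_D(p) &= p\,b + (1 - p)\,(d + \alpha d), \\
\frac{\partial}{\partial \alpha}\bigl[EU_C(p) - EU_D(p)\bigr]
        &= p\,c + (1 - p)\,(b - d) \;>\; 0 .
\end{align*}
```

The derivative is positive for every $p \in [0, 1]$ because $c > 0$ and $b > d$, so a stronger friendly concern always tilts player 2 toward C.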
For (D, D) to be a Nash equilibrium in the stage game in the friendly state, $\alpha$ has to be sufficiently small (see footnote 6). If $\alpha b > d + \alpha d$, then (D, C) is the unique Nash equilibrium in that stage game, and if $c + \alpha c > b$, then C is the emotional player's best response to the other player choosing C (see footnote 7). Consider the finitely repeated game with a payoff structure that satisfies the above conditions, and let $T = 2$. In the last period, the players have to play the Nash equilibrium of that stage game: if they are in the friendly state, player 1 chooses D and player 2 chooses C. In the first period, player 1 has to choose C for the players to transition to the friendly state. Player 2 best responds by choosing D. Player 1 thus uses the first period to manipulate player 2 into the friendly state of mind. Player 1 is willing to do so given sufficient patience, $\delta b \ge d + \delta d$, that is, $\delta \ge d/(b - d)$: cooperating yields 0 in the first period and $b$ in the last, whereas defecting yields $d$ in both periods. Let $\Sigma^*$ be the set of subgame perfect equilibria of the finitely repeated interaction.
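The stage-game claims above can be checked numerically. The sketch below enumerates the pure-strategy Nash equilibria of the stage game for given states of mind. The parameter values (b = 5, c = 3, d = 1, α = 0.8, γ = 0.5) are hypothetical and chosen only so that both αb > d + αd and c + αc > b hold; the function names are likewise illustrative.

```python
# Sketch: pure-strategy Nash equilibria of the stage game for given states
# of mind. Parameter values are hypothetical, chosen so that
# alpha*b > d + alpha*d and c + alpha*c > b.
from itertools import product

b, c, d = 5.0, 3.0, 1.0
alpha, gamma = 0.8, 0.5
PHI = {"F": alpha, "N": 0.0, "H": -gamma}
PI = {("C", "C"): c, ("C", "D"): 0.0, ("D", "C"): b, ("D", "D"): d}


def u(mind, own, other):
    """Stage utility: own material payoff plus weighted payoff of the other."""
    return PI[(own, other)] + PHI[mind] * PI[(other, own)]


def pure_nash(mind1, mind2):
    """All action profiles (a1, a2) with no profitable unilateral deviation."""
    equilibria = []
    for a1, a2 in product("CD", repeat=2):
        best1 = all(u(mind1, a1, a2) >= u(mind1, x, a2) for x in "CD")
        best2 = all(u(mind2, a2, a1) >= u(mind2, x, a1) for x in "CD")
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    # Player 1 is Homo oeconomicus (always neutral); player 2 is emotional.
    for mind2 in "FNH":
        print("N" + mind2, pure_nash("N", mind2))
```

With these values the only pure equilibrium in state NF is (D, C), while (D, D) remains the only one in NN and NH, which is what makes the first-period manipulation worthwhile for a sufficiently patient player 1.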
Consider next the case of $T = 3$. Player 1 still has to be sufficiently patient to choose C in the first period. Further, in the second period, if the players are in the friendly state, then player 2 chooses C, player 2's dominant action, whereas player 1 chooses between C and D. By choosing C, player 1 postpones the higher utility until the last period, and receives a total utility of $\delta c + \delta^2 b$. By choosing D, player 1 receives the higher utility in this period, but receives a lower utility in the last period, $\delta b + \delta^2 d$. Player 1 chooses C in the second period given sufficient patience, $\delta \in \left[\frac{b - c}{b - d}, 1\right)$. Finally, in the last period, the players play the Nash equilibrium of that stage game.
Footnote 6: The stage game is thus not a prisoner's dilemma in the state $s = NF$.
Footnote 7: The action profile (D, C) is an unintuitive Nash equilibrium. It is an equilibrium because the players first receive their payoffs, both material and psychological, and then transition to the other state of mind. If the players instead transitioned before receiving their payoffs, and thus evaluated the payoffs according to their new state, (D, C) would not be an equilibrium.
Hence for T > 2, player 1 has more opportunities to benefit from player 2's friendly state of mind. Player 1 will take these opportunities given sufficient patience. The patience required for player 1 to manipulate player 2 into the friendly state of mind in the first period is then decreasing in T.
For each finite time horizon $T$, there is a minimal discount factor, $\delta^{\bullet}(T)$, required for the action profile (C, C) to be played on the equilibrium path of the finitely repeated game.
Finally, consider the finitely repeated interaction between two emotional players. The corresponding stochastic game has nine states, $S = M^2$, all reachable from the initial state (see footnote 8). Note that a player's transitions are independent of the other player's state of mind. One player can be in the neutral state of mind while the other is in the hostile state of mind. If $\alpha$ is sufficiently large, $c + \alpha c > b$, then the players can sustain cooperation by strategically manipulating each other's states of mind. Notice that once both players are in the friendly state of mind the unique stage game Nash equilibrium is for both to choose C.
Let $\Sigma^*$ be defined as before.
To summarize, in the finitely repeated prisoner's dilemma game between a Homo oeconomicus and an emotional player, the Homo oeconomicus may cooperate initially to manipulate the emotional player into a friendly state of mind. Similarly, two emotional players may strategically manipulate each other's state of mind. If their friendly emotion is sufficiently strong, they will end up cooperating for the full duration of the game.

The Infinitely Repeated Interaction
Consider the infinitely repeated interaction between a Homo oeconomicus, player 1, and an emotional player, player 2 (see footnote 9). The set of feasible payoff vectors (utility vectors) in a stochastic game is the convex hull of the range of the present values from the pure stationary Markov strategy profiles. The two players have eight pure stationary Markov strategies each: choose C regardless of state; choose C in the friendly and neutral states and D in the hostile state; choose C in the friendly state and D in the neutral and hostile states; choose C in the friendly and hostile states and D in the neutral state; choose D in the friendly state and C in the neutral and hostile states; choose D in the friendly and neutral states and C in the hostile state; choose D in the friendly and hostile states and C in the neutral state; and choose D regardless of state.
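The list of eight strategies can be generated mechanically. The sketch below assumes, as in the enumeration above, that a stationary Markov strategy conditions only on the three-valued state of mind of the emotional player; the names are illustrative.

```python
# Sketch: the eight pure stationary Markov strategies of a player whose
# choice depends only on a three-valued state of mind.
from itertools import product

MINDS = ("F", "N", "H")


def stationary_strategies():
    """Each strategy maps every state of mind to an action, C or D."""
    return [dict(zip(MINDS, actions)) for actions in product("CD", repeat=3)]


if __name__ == "__main__":
    for strategy in stationary_strategies():
        print(strategy)
```

Two actions in each of three states give 2^3 = 8 strategies per player, hence 64 pure stationary Markov strategy profiles whose present values span the feasible set.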
Player 1's utilities are state independent, and the minmax utility is $v_1(s) = d$. Since both players start in the neutral state of mind, $s_0 = NN$, player 2's minmax utility is $v_2(\delta, s_0) = d - \gamma d$. Figure 2 illustrates the set of feasible and individually rational payoff vectors. The shaded areas mark the differences to the benchmark case of play between two Homo oeconomicus. Player 2 can achieve a higher utility compared to a Homo oeconomicus given that sufficient time is spent in the friendly state of mind. Player 2 can also achieve a lower utility compared to a Homo oeconomicus given that sufficient time is spent in the hostile state of mind. Note also that a Homo oeconomicus can receive a higher utility in a game with an emotional player compared to the benchmark case. This is because player 2 can have a negative concern for player 1's material utility and thus has a lower minmax payoff. In other words, a Homo oeconomicus can bring about a higher payoff for him- or herself by using the emotional player's self-knowledge of own hostile state of mind.
Footnote 8: The observant reader may note that the state $s = FF$, with two emotional players, also transforms the stage game. In short, the stage game of $s = FF$ is a prisoner's delight if $c + \alpha c > b$ and $\alpha b > d + \alpha d$, a stag hunt if $c + \alpha c > b$ and $d + \alpha d > \alpha b$, and a hawk-dove game if $c + \alpha c < b$ and $\alpha b > d + \alpha d$.
Footnote 9: The following analysis relies on the folk theorem for stochastic games proved by Dutta [13].
Figure 2. The set of feasible and individually rational utility vectors when one emotional player with three states of mind and one Homo oeconomicus play the game.
In addition, there is a subset of feasible and individually rational utility vectors in which both players receive a higher utility than what is possible for two Homo oeconomicus. This is because player 2's positive concern for the material payoff of player 1 increases the maximum utility that can be received in each period. If b = 4, c = 3, d = 2, then the maximum utility for any player in the benchmark case is 3.29, and the cooperative utility vector is (3,3). If, in addition, γ = α = 0.2, the maximum utility player 1 can receive from the interaction with the emotional player is 3.45, and the maximum utility the emotional player can receive is 3.74. The maximum symmetric utility vector is (3.11, 3.11).
The cooperative utility vector, (c, c + αc), belongs to the set of feasible and individually rational utility vectors. Moreover, as we will see, both the Grim Trigger and mutual minmax strategy profiles can sustain this outcome as a subgame perfect Nash equilibrium, given that the players are sufficiently patient.

Grim trigger
In a standard repeated game the one-shot deviation principle is used to verify that a strategy profile is a subgame perfect equilibrium. In a stochastic game the players have two reasons to deviate from a strategy. As in the repeated game they may do so for the immediate payoff, but there is also an incentive to deviate in order to influence future state transitions. As the players become more patient the second incentive becomes more important. In a stochastic game with either a finite set of states or a deterministic transition function, a strategy $\hat{\sigma}_i$ is a one-shot deviation for player $i$ from strategy $\sigma_i$ if there is only one history at which the strategies disagree. A strategy profile $\sigma$ in such a game is subgame perfect if, and only if, there are no profitable one-shot deviations [14].
First consider the game between one Homo oeconomicus, player 1, and one emotional player, player 2. The Grim Trigger strategy profile is subgame perfect if neither of the two players has a profitable one-shot deviation. Player 1 has no incentive to deviate if the normalized discounted utility from using the strategy, c, is higher than the sum of the normalized discounted utilities from player 1's best one-shot deviation, (1 − δ)b, and from continued play of the Nash equilibrium, δd.
Player 1 has no incentive to deviate iff
$$c \ge (1 - \delta) b + \delta d \iff \delta \ge \frac{b - c}{b - d}. \tag{3}$$
This condition is identical to the corresponding one in the benchmark case. Note that player 2 cannot hurt player 1 more than by choosing D, and since player 1 only cares for own material payoff, player 1 is unaffected by player 2's state of mind.
Now turn to the emotional player, player 2. Assume that player 2 uses the strategy. In the first period player 2 is in the neutral state of mind and receives a normalized discounted utility of $(1 - \delta)c$ before transitioning to the friendly state of mind and remaining there for the rest of the interaction, receiving a normalized discounted utility of $\delta(c + \alpha c)$.
Player 2 can deviate either in the neutral or in the friendly state of mind. It is most profitable for player 2 to deviate already in the first period because the immediate benefit from cooperation is lower in the neutral state. The normalized discounted utility in the first period is then $(1 - \delta)b$. In the second period the Nash equilibrium action profile (D, D) is played. Player 2 is in the friendly state of mind since player 1 chose C in the first period. Player 2's normalized discounted utility in that period is $(1 - \delta)\delta(d + \alpha d)$. In the third period, player 2 has transitioned to the hostile state of mind and will stay there for the remaining interaction, receiving a normalized discounted utility of $\delta^2(d - \gamma d)$.
If $s = NN$, then player 2 has no profitable one-shot deviation from the Grim Trigger strategy profile iff
$$(1 - \delta) c + \delta (c + \alpha c) \ge (1 - \delta) b + (1 - \delta)\delta (d + \alpha d) + \delta^2 (d - \gamma d). \tag{4}$$
When $\alpha = \gamma = 0$, (4) is identical to (3). An increase in either $\alpha$ or $\gamma$ decreases player 2's profitability of deviating. Player 2's positive concern for the other player's material utility increases the benefit from cooperation, and the negative concern increases the cost of deviating. Player 2 requires less patience than player 1 not to deviate, and the common discount factor required for Grim Trigger to be subgame perfect is determined by the condition for player 1.
Let $\delta^{GT}_{H}$ and $\delta^{GT}$ be the minimal discount factors required for the Grim Trigger strategy profile to be subgame perfect in the case of play between one Homo oeconomicus and one emotional player, and in the benchmark case respectively.
Next consider the game between two emotional players. Note that an emotional player receives no utility from the state of mind of the other player, only from their material payoff. Therefore the condition for an emotional player not to deviate from a strategy profile does not depend on the type of the other player.
Thus, in the infinitely repeated interaction between two identical emotional players, the Grim Trigger strategy profile is subgame perfect iff (4) holds. An increase in either α or γ decreases the patience required not to deviate; both states of mind facilitate cooperation when the Grim Trigger strategy is used.
Let $\delta^{GT}$ be defined as before and let $\delta^{GT}_{E}$ be the minimal discount factor required for the Grim Trigger strategy profile to be subgame perfect in the game between two emotional players.

Proposition 5.
Suppose that both players use the Grim Trigger strategy. Then $\delta^{GT}_{E} < \delta^{GT}$, and $\delta^{GT}_{E}$ is decreasing in $\alpha$ and $\gamma$.
To summarize, two emotional players can sustain cooperation more easily than two Homo oeconomicus if they use the Grim Trigger strategy. With that strategy, both the friendly and the hostile state of mind facilitate cooperation.
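The comparison of minimal discount factors can be illustrated numerically. The sketch below compares the payoff stream from following Grim Trigger with the stream from deviating in the first period, exactly as described in the text, and finds the threshold by bisection. The values b = 4, c = 3, d = 2 and α = γ = 0.2 are taken from the paper's numerical example; the function names are illustrative, and the bisection relies on the difference being negative at δ = 0 and positive at δ = 1.

```python
# Sketch: minimal discount factor for Grim Trigger, found by bisection.
# Uses the payoff streams described in the text; b, c, d, alpha, gamma
# follow the paper's numerical example. Function names are illustrative.

def gt_slack(delta, alpha, gamma, b=4.0, c=3.0, d=2.0):
    """Value of following Grim Trigger minus value of deviating at once."""
    follow = (1 - delta) * c + delta * (c + alpha * c)
    deviate = ((1 - delta) * b
               + (1 - delta) * delta * (d + alpha * d)
               + delta ** 2 * (d - gamma * d))
    return follow - deviate


def min_delta(alpha, gamma, tol=1e-10):
    """Smallest delta in (0, 1) at which deviating stops being profitable."""
    lo, hi = 0.0, 1.0          # slack is negative at 0 and positive at 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gt_slack(mid, alpha, gamma) >= 0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(round(min_delta(0.0, 0.0), 4))   # benchmark: (b - c) / (b - d)
    print(round(min_delta(0.2, 0.2), 4))   # emotional player
```

Setting α = γ = 0 recovers the benchmark threshold (b − c)/(b − d), and raising either parameter lowers the threshold, in line with Proposition 5.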

Mutual Minmax
Consider next the mutual minmax strategy profile in which the players choose C as long as both have always chosen C; they choose C also if the history contains L consecutive periods of mutual play of D after which only C has been played; they play the mutual minmax action, D, for L consecutive periods after all other histories. Let any history after which the players choose C be denoted the cooperative phase, and any history after which the players choose D be denoted the punishment phase.
A noteworthy implication of the transition probabilities in Figure 1 is that two players who punish each other by choosing D will both transition to the hostile state of mind. When the punishment period is over, the players are still in a hostile state of mind.
In the benchmark case the players may have an incentive to deviate either during the cooperative phase or during the punishment phase. The players have no incentive to deviate during the punishment phase since the mutual minmax action profile, (D, D), coincides with the unique Nash equilibrium. In the cooperative phase the players have no incentive to deviate iff
$$c \ge (1 - \delta) b + (1 - \delta)(\delta + \dots + \delta^{L}) d + \delta^{L+1} c. \tag{5}$$
The normalized discounted utility from deviating is the sum of three terms: the one-shot deviation utility of $(1 - \delta)b$, the utility received during the punishment phase of $L$ periods, $(1 - \delta)(\delta + \dots + \delta^{L})d$, and the utility received from returning to choosing C after the punishment period, $\delta^{L+1}c$.
The above condition can also be presented as a cost-benefit calculation. The benefit of deviating is the utility from the one-period deviation, $b$, minus the one-period utility from using the strategy, $c$. The cost of deviating is the utility received if no one had deviated, $c$, minus the utility received during the punishment phase, $d$, for the $L$ consecutive periods of punishment. Thus (5) can be rewritten as
$$b - c \le (c - d)(\delta + \dots + \delta^{L}). \tag{6}$$
Now consider the case of one Homo oeconomicus and one emotional player. The condition required for the Homo oeconomicus not to deviate is the same as the condition in the benchmark case. The reasoning is similar to the case of the Grim Trigger strategy profile.
Now turn to player 2, the emotional player. Player 2 can deviate either during the cooperative phase or during the punishment phase. Moreover, the incentives to deviate are affected by player 2's states of mind.
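The cost-benefit form of the benchmark condition lends itself to a quick numerical check. The sketch below computes, for a given punishment length L, the smallest discount factor at which the discounted punishment cost covers the one-period gain b − c. The default payoffs b = 4, c = 3, d = 2 are those of the paper's numerical example; the function names are illustrative assumptions.

```python
# Sketch: benchmark mutual-minmax condition in cost-benefit form,
#   b - c <= (c - d) * (delta + ... + delta**L).

def punishment_cost(delta, L, c=3.0, d=2.0):
    """Discounted loss from L periods of (D, D) instead of (C, C)."""
    return (c - d) * sum(delta ** k for k in range(1, L + 1))


def min_delta_benchmark(L, b=4.0, c=3.0, d=2.0, tol=1e-10):
    """Smallest delta in (0, 1) deterring a deviation, or None if none does."""
    if punishment_cost(1.0, L, c, d) <= b - c:
        return None            # even delta close to 1 cannot deter deviation
    lo, hi = 0.0, 1.0          # the cost is increasing in delta
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if punishment_cost(mid, L, c, d) >= b - c:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for length in (1, 2, 3):
        print(length, min_delta_benchmark(length))
```

With these payoffs a single punishment period is never enough for two Homo oeconomicus, whereas longer punishment phases lower the required patience.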
First consider the equilibrium path history, in which no player thus far has chosen D. Start by assuming a punishment phase of length one, $L = 1$. On the equilibrium path, player 2 can deviate either in the neutral or in the friendly state of mind, and it is most profitable to deviate in the neutral state. Player 2's benefit from the best one-shot deviation is $b - c$. After the deviation the players punish each other for $L$ periods. In the first punishment period, however, player 2 is in the friendly state of mind since player 1 cooperated in the previous period, and incurs a normalized discounted cost of $(1 + \alpha)(c - d)\delta$. Player 2 then transitions to the hostile state of mind, and therefore receives a smaller utility from choosing C during the two periods required to transition to the friendly state of mind. This implies an additional cost of $\delta^2(\alpha + \gamma)c + \delta^3\alpha c$.
Suppose that $L = 1$ and $s = NN$; then player 2 has no profitable one-shot deviation in the cooperative phase iff
$$b - c \le \delta(1 + \alpha)(c - d) + \delta^2(\alpha + \gamma)c + \delta^3\alpha c. \tag{7}$$
When $\alpha = \gamma = 0$, (7) is identical to (6). When $\alpha, \gamma > 0$, player 2 has a higher cost of deviating than player 1.
Now assume $L = 2$. Player 2's one-period benefit from deviating is as before, and so is the cost in the first punishment period. In the second period of the punishment phase, player 2 has transitioned to the hostile state of mind, and incurs a normalized discounted cost of $((1 + \alpha)c - (1 - \gamma)d)\delta^2$. In addition, player 2 also incurs the cost of transitioning back to the friendly state of mind once the players return to cooperation.
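Suppose that $L = 2$ and $s = NN$; then player 2 has no profitable one-shot deviation in the cooperative phase iff
$$b - c \le \delta(1 + \alpha)(c - d) + \delta^2\bigl((1 + \alpha)c - (1 - \gamma)d\bigr) + \delta^3(\alpha + \gamma)c + \delta^4\alpha c. \tag{8}$$
For punishment phases of length $L > 2$, the above condition can be generalized to
$$b - c \le \delta(1 + \alpha)(c - d) + \sum_{k=2}^{L}\delta^{k}\bigl((1 + \alpha)c - (1 - \gamma)d\bigr) + \delta^{L+1}(\alpha + \gamma)c + \delta^{L+2}\alpha c. \tag{9}$$
When $\alpha = \gamma = 0$, (9) is identical to (6). An increase in either $\alpha$ or $\gamma$ increases the cost of deviation and thus decreases the patience required to use this strategy.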
Player 2's states of mind have three effects on the profitability of deviating in the neutral state of mind. First, the utility from continued cooperation is larger because of the positive concern, in the friendly state of mind, for the other player's material payoff. Second, the utility during the punishment phase is smaller if $L > 1$, due to the negative concern for the other player's material payoff. Third, it takes time for player 2 to calm down and transition back to a friendly state of mind once the players have returned to cooperation.
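The three effects can be made concrete with a small calculation. The sketch below assembles the emotional player's discounted cost of a cooperative-phase deviation from the terms described in the text: the first punishment period spent in the friendly state, any further punishment periods spent in the hostile state, and the two periods of reduced utility while calming down. It then finds the smallest discount factor at which this cost covers the gain b − c. The payoffs follow the paper's numerical example; the function names and the bisection are illustrative assumptions.

```python
# Sketch: cooperative-phase condition for the emotional player under the
# mutual minmax profile, built from the cost terms described in the text.

def emotional_cost(delta, L, alpha, gamma, c=3.0, d=2.0):
    """Discounted cost of a deviation made in the neutral state."""
    cost = delta * (1 + alpha) * (c - d)             # first punishment period
    for k in range(2, L + 1):                        # later punishment periods
        cost += delta ** k * ((1 + alpha) * c - (1 - gamma) * d)
    cost += delta ** (L + 1) * (alpha + gamma) * c   # cooperating while hostile
    cost += delta ** (L + 2) * alpha * c             # cooperating while neutral
    return cost


def min_delta_emotional(L, alpha, gamma, b=4.0, c=3.0, d=2.0, tol=1e-10):
    """Smallest delta in (0, 1) deterring the deviation, or None."""
    if emotional_cost(1.0, L, alpha, gamma, c, d) <= b - c:
        return None
    lo, hi = 0.0, 1.0          # the cost is increasing in delta
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if emotional_cost(mid, L, alpha, gamma, c, d) >= b - c:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(min_delta_emotional(2, 0.0, 0.0))   # reduces to the benchmark case
    print(min_delta_emotional(2, 0.2, 0.2))
    print(min_delta_emotional(1, 0.2, 0.2))
```

With α = γ = 0 the cost collapses to the benchmark expression (c − d)(δ + ... + δ^L); with positive α and γ the threshold falls, and even a one-period punishment can be enough to deter the emotional player.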
Next consider the off-equilibrium-path histories. If one player deviated in the previous period, the players are supposed to play the mutual minmax action profile, (D, D), for $L$ consecutive periods. If player 1 deviated then player 2 is in the hostile state of mind and has no incentive to deviate from the punishment. However, if player 2 deviated and player 1 cooperated, player 2 has transitioned to the friendly state of mind and might be less inclined to punish the other player. If player 2 deviates from punishment, the immediate benefit is $\alpha b - (1 + \alpha)d$. After this deviation, the players are supposed to restart the punishment phase, and now player 2 has transitioned to the hostile state of mind. The normalized discounted utility from the extra period of punishment is $\delta^{L+1}(d - \gamma d)$. Player 2 would have received $\delta^{L+1}(c - \gamma c)$ from using the strategy. In addition, player 2 also faces the delayed cost of transitioning back to the friendly state of mind.
If $s = NF$, then player 2 has no profitable one-shot deviation from punishment iff
$$\alpha b - (1 + \alpha)d \le \delta^{L+1}(1 - \gamma)(c - d) + \delta^{L+2}\gamma c + \delta^{L+3}\alpha c. \tag{10}$$
The friendly state of mind has two effects on the profitability of deviating during the punishment phase. It increases the cost of punishing the other player due to the positive concern for the other player's material payoff, but also increases the cost of deviating due to the cost of transitioning back to the friendly state of mind after the punishment phase. When $\delta^{L+3} > (b - d)/c$ the first effect dominates and an increase in $\alpha$ increases the cost of punishing the other player.
Further, the hostile state of mind also has two effects on the cost of punishing the other player. It increases the cost of deviating from punishment by increasing the cost of remaining in the hostile state, but also decreases the cost of deviating by decreasing the difference in utility between cooperation and punishment. If player 2 is sufficiently patient, δ > (c − d)/c, the first effect dominates, and an increase in γ increases the cost of deviating from the punishment. Thus, while player 2's positive concern for the other player's material payoff can make threats of punishment less credible, the possibility of transitioning to the hostile state of mind offsets this effect.
Finally, after a punishment phase the players are supposed to return to cooperation, but once in the hostile state of mind, player 2 may require more patience to do so.
First suppose L = 1, and that player 2 is in the hostile state of mind. Player 2's immediate benefit from deviating is b − c + γc. Since player 1 cooperated, player 2 transitions to the neutral state of mind, and the cost of the single punishment period is (c − d)δ. Following the punishment period, player 2 transitions to the hostile state of mind. Once cooperation is resumed, player 2 also faces the cost of transitioning from the hostile to the friendly state of mind.
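The immediate deviation benefits quoted in this section, αb − (1 + α)d for deviating from punishment in the friendly state and b − c + γc for deviating from cooperation in the hostile state, can be checked with a minimal sketch. It assumes that utility is the player's own material payoff plus a state-dependent weight (α, 0, or −γ) times the other player's material payoff, and that a cooperator facing a defector receives a material payoff of 0, which is what the two expressions imply. The function names and numerical values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def concern(state, alpha, gamma):
    """Weight on the other player's material payoff in each state of mind."""
    return {"F": alpha, "N": 0, "H": -gamma}[state]

def material(own, other, b, c, d):
    """(own, other's) material payoffs; the sucker payoff of 0 is an assumption."""
    table = {("C", "C"): (c, c), ("C", "D"): (0, b),
             ("D", "C"): (b, 0), ("D", "D"): (d, d)}
    return table[(own, other)]

def utility(state, own, other, b, c, d, alpha, gamma):
    """Stage utility: material payoff plus the psychological payoff."""
    m_own, m_other = material(own, other, b, c, d)
    return m_own + concern(state, alpha, gamma) * m_other

def deviation_gain(state, prescribed, deviation, other, **params):
    """One-period gain from playing `deviation` instead of `prescribed`."""
    return (utility(state, deviation, other, **params)
            - utility(state, prescribed, other, **params))

# Hypothetical parameter values with b > c > d > 0.
params = dict(b=Fraction(5), c=Fraction(3), d=Fraction(1),
              alpha=Fraction(1, 2), gamma=Fraction(1, 2))

# Friendly player playing C instead of D while the other player punishes with D.
gain_punishment = deviation_gain("F", "D", "C", "D", **params)
# Hostile player playing D instead of C while the other player cooperates.
gain_cooperation = deviation_gain("H", "C", "D", "C", **params)
```

Exact fractions are used so that the computed gains can be compared with the closed-form expressions without rounding error.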
If L = 1, and s = NH, then player 2 has no profitable one-shot deviation from cooperation iff (11) holds.
When α = γ = 0, (11) is identical to (6). An increase in α increases player 2's cost of deviating, and an increase in γ increases player 2's benefit from deviating. Player 2 might thus require more patience than player 1 for the strategy profile to be subgame perfect.
Now suppose L = 2. Then player 2's one-period benefit from deviating is as before, and so is the cost in the first period of the punishment phase. In the second period of the punishment phase, player 2 has transitioned to the hostile state of mind. In the hostile state of mind player 2 receives a lower utility, d − γd, than in the friendly state of mind. After the punishment phase has ended, player 2 faces the cost associated with transitioning back to the friendly state of mind.
For punishment phases of length L > 1, the condition in (11) can be generalized to (12).
When α = γ = 0, (12) is identical to (6). An increase in α increases player 2's cost of deviating, whereas an increase in γ now increases both the benefit and the cost of deviating. The cost increase dominates if (13) holds. If this inequality holds, then the Homo oeconomicus has the binding restriction on the minimal discount factor required for the mutual minmax strategy profile to be subgame perfect.
Given that α is sufficiently small, such that player 2 has no incentive to deviate during the punishment phase, and that (13) holds, player 2 requires less patience than player 1 not to deviate from the mutual minmax strategy profile. The minimal discount factor required is therefore determined by the condition for player 1.
Let δ^{MM}_H and δ^{MM} be the minimal discount factors required for the mutual minmax strategy profile to be subgame perfect in the case of play between one Homo oeconomicus and one emotional player, and in the benchmark case, respectively.
Next consider the case of two emotional players. Both the positive and the negative concern for the other player's material payoff have conflicting effects on cooperation. The friendly state of mind makes it more costly to deviate from cooperation, but also more costly to punish the other player for deviating. The hostile state of mind can obstruct cooperation by making it more difficult to return to cooperation after a punishment phase, but can at the same time facilitate cooperation by making threats of punishment more credible. When L = 1, the mutual minmax strategy profile is subgame perfect iff (7), (10), and (11) hold. When L > 1, it is subgame perfect iff (9), (10), and (12) hold.
Only if α is sufficiently small, such that the players have no incentive to deviate during the punishment phase, and the players are sufficiently patient, such that (13) holds, can two emotional players sustain cooperation more easily than in the benchmark case when using the mutual minmax strategy.
Let δ^{MM} be defined as before and let δ^{MM}_E be the minimal discount factor required for the mutual minmax strategy profile to be subgame perfect in the game between two emotional players.

Proposition 7.
Suppose that both players use the mutual minmax strategy with a punishment length L > 1.
If αb < (1 + α)d, and (13)
To summarize, two emotional players who use the mutual minmax strategy may find it either easier or more difficult to sustain cooperation than two Homo oeconomicus. Both the hostile and the friendly state of mind have conflicting effects on cooperation.

Discussion
This paper has proposed a model of repeated interactions between players who react emotionally to the history of play, and who can become friendly if the other player cooperates, and hostile if the other player defects. Focus is on players who transition between three states of mind as a response to the last observed action of the other player. A player in a friendly (hostile) state of mind values the other player's material payoff positively (negatively), and a player in a neutral state of mind only cares for own material payoffs. The traditionally studied Homo oeconomicus corresponds to a player who never leaves the neutral state of mind.
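The transition rule summarized above can be sketched in a few lines. The sketch assumes the deterministic case as it can be read off the surrounding text: observing D moves a player to the hostile state from any state, while observing C moves a hostile player to neutral, a neutral player to friendly, and keeps a friendly player friendly. The names `next_state` and `state_path` are hypothetical.

```python
# States of mind: "F" friendly, "N" neutral, "H" hostile.
# Transition on observing C (assumed deterministic, as read from the text).
AFTER_C = {"H": "N", "N": "F", "F": "F"}

def next_state(state, observed):
    """State of mind after observing the other player's last action."""
    if observed == "D":
        return "H"          # a defection makes the player hostile
    return AFTER_C[state]   # a cooperative act moves the player one step up

def state_path(initial, observed_actions):
    """Sequence of states of mind along a history of observed actions."""
    path = [initial]
    for action in observed_actions:
        path.append(next_state(path[-1], action))
    return path

# A neutral player who observes C, D, C, C in turn.
example = state_path("N", ["C", "D", "C", "C"])
# example == ["N", "F", "H", "N", "F"]
```

A Homo oeconomicus would correspond to replacing `next_state` with a function that always returns "N".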
Cooperation in the infinitely repeated prisoners' dilemma game is facilitated by the possibility that a player's concern for the other player's material payoff changes, given that the players use the Grim Trigger strategy profile. A player in the friendly state of mind finds it less profitable to deviate from cooperation. If the other player deviates, the player transitions to a hostile state of mind in which it is less profitable to deviate from punishing the player. The finding that emotions can help sustain cooperation is in line with the reasoning by Frank [15] who proposes that emotions have evolved to help sustain cooperation and that cooperative individuals experience strong emotions. However, if the players use the mutual minmax strategy profile then emotions can obstruct cooperation. A player who is in the hostile state of mind after a punishment phase finds it less profitable to return to cooperation. These results highlight the importance for emotional players to be able to commit to a Grim Trigger strategy profile.
An assumption of the proposed model is that the players discount their material and psychological payoffs identically. In reality, the discount factors are likely to differ. Suppose for example that the discount factor is higher for the psychological payoff than for the material payoff. This would imply that the psychological payoff receives a higher weight in the repeated game, further decreasing the patience required to sustain cooperation for players using the Grim Trigger strategy profile.
Another assumption of the model is that the players only care about, or remember, the last observed action of the other player. This is admittedly a simplification. Several other factors affect an individual's state of mind. Consider for example the case with a single player who is affected by his or her own actions. The player could then potentially choose actions such that he or she transitions to the state of mind with the most beneficial preferences. The player could then choose his or her preferences [16]. To which degree this is a real phenomenon is an empirical question.
The assumption of a one-period memory can be relaxed. The effect of a longer memory depends on how the players reason about, and justify, their own actions. If the players remember the last two periods, they may not only react to the last two observed actions of the other player, but also take into account how their own action in the second to last period affected the other player's choice of action in the last period. To what extent individuals take responsibility for how their own actions affect the outcome is an empirical question, but the growing literature on motivated beliefs in both economics and psychology suggests that individuals have a remarkable ability to justify their own behavior [17][18][19][20]. An interesting avenue for further research is to study the difference in cooperative outcomes between individuals who justify their behavior, and individuals who acknowledge the consequences of their behavior.
A limitation of the model is that the players receive their payoffs, both material and psychological, before transitioning to another state. In real life, the state transitioning is likely to occur before the evaluation of the psychological payoff. For example, take the player in the friendly state of mind who observes the other player choosing D. This player receives a payoff of αb before transitioning to the hostile state of mind. If the transition had occurred first, the payoff would have been −γb, a notable difference. This assumption does not substantially affect the analysis of the infinitely repeated game. However, the analysis of the finitely repeated game between one Homo oeconomicus and one emotional player is more sensitive since it relies on the equilibrium (D, C) being played in the last period.
The strategic consequences of a player's state of mind depend on the structure of the game. In the infinitely repeated prisoners' dilemma game, the players need only concern themselves with their own state of mind. The emotional player can neither hurt nor reward the other player more than a Homo oeconomicus would. In other games, for example a game where two players cooperate to produce a public good by supplying complementary inputs, emotional players can both hurt and reward the other player. In such games the players also need to consider the other player's state of mind.
The model proposed in this paper shares similarities with the literature that models bounded rationality using finite automata. A finite automaton consists of a finite set of states, a transition function, and an output function. The output function determines the one-shot game strategy to be played in that state [21,22]. An emotional player does not have determined one-shot game strategies for each state of mind, but a state dependent utility function which puts restrictions on the strategies for the repeated game.
Traditionally in game theory, and in this paper as well, utility depends solely on actions. Utility can also depend directly on the players' beliefs, and it is common for emotions to depend on the individual's beliefs or expectations [23]. Emotions modeled using Psychological game theory [24,25] can depend on the players' expectations and beliefs about intentions [26][27][28].
A related concern is that of social norms. When games are played in a historical and cultural context the players know how similar games have been played in the past, and as a consequence social norms specifying which actions the players are expected to consider usually develop [29]. If the social norm is for everyone to defect in the prisoners' dilemma game, the emotional reaction is most likely not the same as if the social norm is for everyone to cooperate. The social norm affects the players' expectations which in turn affect the players' transition probabilities. This is left as an avenue for future work.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A
Consider the case of two emotional players with stochastic state transitions. The transition probabilities are presented in Figure A1, where P|C denotes the probability, conditional on observing action C, of transitioning from the current state to the neutral or friendly state, and P|D denotes the probability, conditional on observing action D, of transitioning from the current state to the hostile state. The corresponding stochastic game has nine states, S = M^2, all reachable from the initial state. Consider for example the specific case where the players have p = q = w = 1 and v < 1. In this case the players remain longer in the hostile state of mind before transitioning to the neutral state of mind after observing C. This allows for more or less resentful players, where a player is less resentful the larger v is.
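The role of v in the special case p = q = w = 1 can be illustrated with a short sketch. It assumes that v is the probability of moving from the hostile to the neutral state after observing C, and that all other transitions occur with probability one as in the deterministic model; the assignment of p, q, and w to specific arrows in Figure A1 is not reproduced here. Under these assumptions the number of observed cooperative actions needed to leave the hostile state is geometrically distributed with mean 1/v. The function names are hypothetical.

```python
def next_state_distribution(state, observed, v):
    """Distribution over next states of mind when only the H -> N step is stochastic."""
    if not 0 < v <= 1:
        raise ValueError("v must lie in (0, 1]")
    if observed == "D":
        return {"H": 1.0}                  # assumed: D leads to hostility for sure
    if state == "H":
        return {"N": v, "H": 1.0 - v}      # resentment: stay hostile with prob. 1 - v
    return {"F": 1.0}                      # assumed: N -> F and F -> F for sure

def expected_hostile_periods(v):
    """Mean number of observed C actions before a hostile player turns neutral."""
    if not 0 < v <= 1:
        raise ValueError("v must lie in (0, 1]")
    return 1.0 / v                         # mean of a geometric distribution

# A fairly resentful player (hypothetical value of v).
dist = next_state_distribution("H", "C", 0.25)
mean_periods = expected_hostile_periods(0.25)
```

With v = 1 the sketch collapses to the deterministic transitions of the main model.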
Let the punishment phase be of length L > 1. If no player so far has deviated, the players can either deviate in the first period, s = NN, or any subsequent period, s = FF. The players find it most profitable to deviate in the neutral state due to the lower utility from cooperation.
If L > 1, and s = NN, then the players have no profitable one-shot deviation from cooperation iff (A1) holds.
When v = 1, (A1) is identical to (9). A decrease in v increases the players' cost of deviating by increasing the number of periods they spend in the hostile state of mind.
After a deviation, the players are supposed to choose their mutual minmax action for L consecutive periods. However, if they care sufficiently for the other player, they may have an incentive to deviate from punishment.
If the players are in the friendly state of mind, then they have no profitable one-shot deviation from punishment iff (A2) holds.
When v = 1, (A2) is identical to (10), and a decrease in v increases the cost of deviating from punishment by increasing the time spent in the hostile state of mind.
After a punishment phase the players should return to cooperation, but once they are in the hostile state of mind they might be less inclined to do so.
If L > 1, and s = HH, then the players have no profitable one-shot deviation from cooperation iff (A3) holds.
When v = 1, (A3) is identical to (12). The cost of the punishment phase is lower in the hostile state of mind than in the neutral. A decrease in v decreases the cost of punishment and makes it more profitable for the players to deviate. The players also face the cost of remaining in the hostile state of mind even after cooperation is resumed. The probability is decreasing in v and makes it less profitable for the players to deviate. The first effect dominates, and a decrease in v increases the players' cost of deviating also in the hostile state of mind.
Let δ^{MM}_{EES} be the minimal discount factor required for the mutual minmax strategy profile to be subgame perfect between the two emotional players with stochastic state transitions.
Proposition A1. Suppose that both players use the mutual minmax strategy with a punishment length L. If αb < (1 + α)d holds, then δ^{MM}_{EES} is decreasing in v.
In other words, more resentful players require more patience to sustain cooperation in the prisoners' dilemma game when they use the mutual minmax strategy due to their difficulty of returning to cooperation after a punishment phase.