This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The paper presents an evolutionary model based on the assumption that agents may revise their current strategies if they previously failed to attain the maximum level of potential payoffs. We offer three versions of this reflexive mechanism, each of which describes a distinct type: spontaneous agents, rigid players, and ‘satisficers’. We use simulations to examine the performance of these types. Agents who change their strategies relatively easily tend to perform better in coordination games, whereas antagonistic games generally lead to more favorable outcomes if individuals change their strategies only when disappointment from previous rounds surpasses some predefined threshold.

Individuals are averse to unpleasant experiences. When such experiences occur, it makes sense to assert that the individuals affected will choose strategies different from those that brought about the unsatisfactory outcomes. In this paper we present three variations of a learning model that reflects the intuitive fact that agents try to eschew perceived disappointing outcomes. Disappointment emerges when a player fails to achieve the maximum level of payoffs that would have been possible had a different strategy been chosen. In other words, the core assumption is that a differential between someone’s actual payoffs and the maximum level of potential payoffs (with the opponent’s choice taken as a given) generates a tendency to choose a different strategy in the next round of the same game.
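To make the core assumption concrete, the disappointment differential can be computed directly from a payoff table. The sketch below is our own illustration (the function name and the Stag-Hunt-like payoff numbers are not part of the model's formal definition): it returns the gap between the best payoff attainable against the opponent's fixed choice and the payoff actually realized.

```python
def disappointment(payoffs, my_choice, opp_choice):
    """Disappointment differential: the best payoff attainable against the
    opponent's (given) choice minus the payoff actually realized."""
    realized = payoffs[my_choice][opp_choice]
    best = max(row[opp_choice] for row in payoffs)
    return best - realized

# Illustrative 2x2 payoffs (rows: my strategy, columns: opponent's strategy)
stag_hunt = [[3, 0],
             [2, 2]]

print(disappointment(stag_hunt, 0, 1))  # prints 2: hunting stag against hare
```

A zero differential means the chosen strategy was already a best reply, so no tendency to switch arises.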

While it seems safe to conjecture that individuals avoid disappointment in general, individual reactions to past disappointing outcomes are contingent on psychological factors and, as such, may vary dramatically across persons. For example, a person with relatively low tolerance for disappointment might be expected to change their strategy after a disappointing outcome with higher probability than another person who is more patient. Evidently, the individual psychological profile is important in determining action. Each of the three variations of the learning model we describe represents a distinct behavioral type: we deal with spontaneous and impatient agents, players who are rigid and display high inertia, and ‘satisficers’. We use a simulation program to study the evolutionary equilibria in an assortment of 2 × 2 games with populations consisting of the aforementioned behavioral types, the aim being to compare these types’ performances in coordination and antagonistic games.

Several prominent authors (such as Sugden [

One might wonder why we choose to introduce new learning rules, rather than use something from the wealth of rules found in the literature. The three rules presented in this paper have a clear behavioral foundation, based on the postulation that what determines human action now is possible disappointment experienced in past play. Therefore, even if the adaptive procedures we suggest could be approximated by an existing rule, we are interested in exploring how the

The paper is structured in five sections.

Evolutionary game theory took off in the first half of the 1970s, after Maynard Smith and Price [

Usually, evolutionary selection processes do not just portray how a population evolves over time; they are also based on an underlying series of assumptions that offer an explicit rationale of how agents adapt in historical time. Apart from natural selection, Young (1998) distinguishes between rules of imitation (for example, [

Evolutionary dynamics can be deterministic or stochastic. Deterministic evolutionary dynamics (such as the replicator dynamics) describe the evolutionary path with systems of differential (or difference) equations; each one of these equations expresses the increasing (or decreasing) rate of some strategy’s frequency as a function of the portion of the population who chooses this strategy, and in accordance to the assumed revision protocol [
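For concreteness, a deterministic dynamic of the kind just described can be integrated numerically with a simple Euler scheme. The following is a generic textbook sketch of the two-strategy replicator dynamic, not the model developed in this paper; the Stag-Hunt-like payoff matrix is purely illustrative.

```python
def replicator_step(x, payoffs, dt=0.01):
    """One Euler step of the two-strategy replicator dynamic
    dx/dt = x * (f1 - fbar), where f1 is strategy 1's expected payoff
    against the current mix x and fbar is the population average payoff."""
    f1 = payoffs[0][0] * x + payoffs[0][1] * (1 - x)
    f2 = payoffs[1][0] * x + payoffs[1][1] * (1 - x)
    fbar = x * f1 + (1 - x) * f2
    return x + dt * x * (f1 - fbar)

# Illustrative payoffs: strategy 1 spreads only if its initial share
# exceeds the unstable interior mix (here x = 2/3).
stag_hunt = [[3, 0],
             [2, 2]]
x = 0.8
for _ in range(1000):
    x = replicator_step(x, stag_hunt)
```

Starting above the interior rest point, the share of the first strategy grows monotonically toward 1; starting below it, it shrinks toward 0, which is the initial-conditions dependence discussed later in the paper.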

This paper first describes an agent-driven, stochastic evolutionary model featuring heuristic learning. In each new period, players observe whether their past choices were best replies to those of their opponents. If not, then “disappointment” emerges, and the players exhibit a tendency to switch to alternative strategies in the future. Individual choice based on the avoidance of disappointment or regret has been discussed in the literature by various authors [

In Loomes and Sugden’s seminal contribution [

The paper explores these dynamics by use of simulation software, confining the analysis to 2 × 2 symmetric games. The aim is to gain insight into the properties of the different revision protocols (each of which corresponds to a specific behavioral type), and, thus, investigate the possibility that some of these types consistently outperform the rest. The software also allows for stochastic shocks in the form of ‘intruders’ (i.e., preprogrammed automata), who may be matched, with positive probability, with members of the original population. Allowing for such perturbations is important, because they account for random errors or even calculated deviations (or interventions), which can potentially affect the evolutionary course, often in unexpected ways. Simulation software is, in any case, used increasingly in the literature in the study of evolutionary games; see, for instance, [

Let G = (N, S_1, …, S_n, u_1, …, u_n) be a finite normal-form game, where N = {1, 2, …, n} is the set of players, S_i is player i’s nonempty finite set of pure strategies, and u_i : S_1 × S_2 × … × S_n → ℝ is player i’s payoff function.

We use s^{-i} to denote a pure strategy combination of all players except i. Define the set of best replies to s^{-i} to be the nonempty finite set B_i(s^{-i}) = {s_i ∈ S_i : u_i(s_i, s^{-i}) ≥ u_i(s_i', s^{-i}) ∀ s_i' ∈ S_i}, and the set of worst replies to s^{-i} to be the nonempty finite set W_i(s^{-i}) = {s_i ∈ S_i : u_i(s_i, s^{-i}) ≤ u_i(s_i', s^{-i}) ∀ s_i' ∈ S_i}. Writing b_i(s^{-i}) ∈ B_i(s^{-i}) and w_i(s^{-i}) ∈ W_i(s^{-i}), we set D_i = max{u_i(b_i(s^{-i}), s^{-i}) − u_i(w_i(s^{-i}), s^{-i})}, where the maximum is taken over all s^{-i} ∈ S_1 × S_2 × … × S_{i−1} × S_{i+1} × … × S_n. That is, for each s^{-i}, we calculate the difference between the payoffs associated with the best reply and the payoffs associated with the worst reply to s^{-i}; D_i is the largest such difference.
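Under our reading of the definition above, the differential D_i for a finite two-player game can be computed by scanning each of the opponent's pure strategies. This is an illustrative sketch with names of our own choosing:

```python
def payoff_differential(payoffs):
    """D_i: the largest gap between the best-reply payoff and the
    worst-reply payoff, taken over all of the opponent's pure strategies.
    payoffs[s][o] is the row player's payoff for own strategy s against
    opponent strategy o."""
    own = range(len(payoffs))
    opp = range(len(payoffs[0]))
    return max(
        max(payoffs[s][o] for s in own) - min(payoffs[s][o] for s in own)
        for o in opp
    )
```

For example, with the illustrative payoffs [[3, 0], [2, 2]] the gaps are 1 (against the first column) and 2 (against the second), so the function returns 2.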

Suppose that s_t = (s_{t,1}, …, s_{t,n}) is the pure strategy combination played in round t, so that u_{t,i}(s_{t,i}, s_t^{-i}) is player i’s realized payoff at t. Let b_{t,i} denote a best reply of player i to s_t^{-i} (that is, b_{t,i} ∈ B_i(s_t^{-i})). Similarly, we use w_{t,i} to denote a worst reply to s_t^{-i}. The following subsections describe three revision protocols, by providing different possible probability distributions used by i to select a strategy for round t + 1.

The short-sightedness protocol is described by the following probability distribution:
p_{t+1,i}(s_i') = (u_{t,i}(b_{t,i}, s_t^{-i}) − u_{t,i}(s_{t,i}, s_t^{-i})) / D_i, for s_i' ≠ s_{t,i},
p_{t+1,i}(s_{t,i}) = 1 − (u_{t,i}(b_{t,i}, s_t^{-i}) − u_{t,i}(s_{t,i}, s_t^{-i})) / D_i.    (1)
The numerator is player i’s realized disappointment in round t: the payoff of a best reply to s_t^{-i} minus the payoff actually obtained with s_{t,i}.
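A minimal sketch of the short-sightedness rule as we read it: the probability of abandoning the current strategy equals the realized disappointment normalised by the maximum payoff differential (passed in here as d_max). All names are our own; this is an illustration, not the paper's software.

```python
import random

def short_sighted_update(payoffs, my_choice, opp_choice, d_max):
    """Return next round's choice for a 2x2 game: switch to the other pure
    strategy with probability (best-reply payoff - realized payoff) / d_max."""
    realized = payoffs[my_choice][opp_choice]
    best = max(row[opp_choice] for row in payoffs)
    switch_prob = (best - realized) / d_max
    if random.random() < switch_prob:
        return 1 - my_choice  # the alternative pure strategy
    return my_choice
```

When the round's choice was already a best reply the switching probability is zero, so the agent never abandons a strategy that just performed as well as possible.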

In this revision protocol, players are depicted as more rigid or patient, as they switch only after experiencing disappointment in each of the last three rounds in which they were selected to play. Let τ run over these three rounds. If ∏_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) ≠ 0, then
p_{t+1,i}(s_i') = Σ_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) / (3D_i), for s_i' ≠ s_{t,i},
p_{t+1,i}(s_{t,i}) = 1 − Σ_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) / (3D_i).    (2)
If ∏_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) = 0, then p_{t+1,i}(s_i') = 0 for all s_i' ≠ s_{t,i} and p_{t+1,i}(s_{t,i}) = 1; note that u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i}) = 0 when
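Under this reading, the three-period rule can be sketched as a function of the player's disappointment differentials in the last three rounds played. The normalisation by 3·d_max follows our reconstruction of (2), so treat the details as illustrative:

```python
def three_period_switch_prob(disappointments, d_max):
    """Probability of abandoning the current strategy: zero unless the
    player was disappointed in each of the last three rounds played, in
    which case it is the average disappointment normalised by d_max."""
    last_three = disappointments[-3:]
    if len(last_three) < 3 or 0 in last_three:
        return 0.0
    return sum(last_three) / (3 * d_max)
```

A single round without disappointment resets the player's tendency to switch, which is precisely the inertia that distinguishes this type from the short-sighted one.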

If

Let v_i ∈ {0, 1}^{T+1}, v_i = [v_{0,i}, v_{1,i}, …, v_{T,i}], and define v_{0,i} = 1; for τ ≥ 1, v_{τ,i} = 1 if s_{τ,i} ≠ s_{τ−1,i}, and v_{τ,i} = 0 if s_{τ,i} = s_{τ−1,i}. This vector effectively keeps track of when player i changes strategy: whenever v_{τ,i} is equal to 1, we know that a change of strategy happened at τ. Let τ* denote the most recent round with v_{τ*,i} = 1 (player i’s last change of strategy). We can now define the additive disappointment protocol: if Σ_{τ=τ*}^{t} δ^{t−τ}·(u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) ≥ c_i, then p_{t+1,i}(s_i') = 1 for s_i' ≠ s_{t,i} and p_{t+1,i}(s_{t,i}) = 0; if Σ_{τ=τ*}^{t} δ^{t−τ}·(u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) < c_i, then p_{t+1,i}(s_{t,i}) = 1, where δ ∈ (0, 1) is a discount factor and c_i ∈ ℝ^+ is player i’s disappointment threshold.    (3)

The above reflects players who change their current strategy only once the disappointment accumulated since their last change of strategy reaches the threshold c_i, with disappointment experienced in earlier rounds discounted at rate δ.
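The accumulation rule can be sketched as follows. This is our own illustration of the mechanism (the list convention and names are assumptions, not the paper's implementation):

```python
def additive_switch(disappointments, delta, threshold):
    """Deterministic switch rule: True once the discounted sum of
    disappointment accumulated since the last change of strategy reaches
    the tolerance threshold. disappointments[0] is the most recent round;
    older rounds are discounted by successive powers of delta."""
    total = sum(delta ** k * d for k, d in enumerate(disappointments))
    return total >= threshold
```

With delta = 0.5 and threshold = 2, for instance, a string of unit disappointments can never trigger a switch (the discounted sum stays below 2), whereas two recent disappointments of size 2 do.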

The software assumes a population of 1,000 agents and is amenable to symmetric 2 × 2 games.

At the outset, agents choose strategies with predefined (by the user) probabilities, with p_t denoting the population share of the first strategy in round t; the default value of p_0 is 0.5.

One crucial difference between our model and the canonical version of evolutionary game theory is that it makes no allusion to expected values but, rather, uses realized ones. This leads to a more plausible representation of the evolutionary course, consistent with the empirical assertion that someone’s current choices reflect their tendency to avoid unpleasant past experiences attributed to perceived erroneous choices. The specific mechanics of the tendency to switch strategies following such ‘disappointment’ depend on the three behavioral types described in the previous section. The modifications of revision protocols (1), (2), and (3) necessary to introduce them into the software are straightforward. As only two players are randomly selected to participate in each round, the probability distributions (1), (2), and (3) apply only to the periods in which player i is actually selected to play.
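The round structure just described (two randomly matched agents per round, each revising according to their protocol) can be sketched as follows; the switch-probability rule is supplied as a callback, and all names are our own rather than the original software's:

```python
import random

def play_round(strategies, payoffs, switch_prob):
    """One round: draw two distinct agents, let them play the symmetric
    2x2 game, then let each switch with the protocol's probability.
    strategies is a list of 0/1 choices, mutated in place."""
    i, j = random.sample(range(len(strategies)), 2)
    si, sj = strategies[i], strategies[j]
    for agent, mine, theirs in ((i, si, sj), (j, sj, si)):
        if random.random() < switch_prob(payoffs, mine, theirs):
            strategies[agent] = 1 - mine  # the other pure strategy
```

A full run repeats play_round many times and tracks the share of agents on the first strategy (p_t in the text).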

The following subsections present the modifications to protocols (1), (2), and (3), as implemented by the software program.

We denote the strategy currently used by player i with s_{t,i} and the alternative pure strategy with s_i'; s_t^{-i} is the opponent’s choice. The time periods indicate the rounds in which player i actually plays. The 2 × 2 version of the short-sightedness protocol then reads:
p_{t+1,i}(s_i') = (u_{t,i}(b_{t,i}, s_t^{-i}) − u_{t,i}(s_{t,i}, s_t^{-i})) / D_i,
p_{t+1,i}(s_{t,i}) = 1 − (u_{t,i}(b_{t,i}, s_t^{-i}) − u_{t,i}(s_{t,i}, s_t^{-i})) / D_i.    (1')

In addition to its intuitive appeal, the introduction of behavioral inertia helps explain why the evolutionary process manages to escape its original state p_0 (see Subsection 4.5 for an example). Probability distribution (2') below is a special case of revision protocol (2) for 2 × 2 games. If ∏_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) ≠ 0, then
p_{t+1,i}(s_i') = Σ_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) / (3D_i),
p_{t+1,i}(s_{t,i}) = 1 − Σ_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) / (3D_i).    (2')
If ∏_τ (u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) = 0, then p_{t+1,i}(s_i') = 0 and p_{t+1,i}(s_{t,i}) = 1, where τ runs over the last three rounds in which player i played, and u_{τ,i}(b_{τ,i}, s_τ^{-i}) = u_{τ,i}(s_{τ,i}, s_τ^{-i}) = 0 when

Three-period memory implies patient players who do not switch strategies at the first or second setback: a change of the current strategy requires three consecutive disappointing rounds of play.

Additive disappointment offers a less stylized variation of the previous protocol: agent i switches with certainty once discounted cumulative disappointment reaches the threshold. If Σ_{τ=τ*}^{t} δ^{t−τ}·(u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) ≥ 2, then p_{t+1,i}(s_i') = 1 and p_{t+1,i}(s_{t,i}) = 0; if Σ_{τ=τ*}^{t} δ^{t−τ}·(u_{τ,i}(b_{τ,i}, s_τ^{-i}) − u_{τ,i}(s_{τ,i}, s_τ^{-i})) < 2, then p_{t+1,i}(s_{t,i}) = 1, where τ* denotes the round of player i’s most recent change of strategy.    (3')

Under this protocol, players switch strategies when they feel they have ‘had enough’ of the disappointment generated by their current strategy choice. The specified discount factor and disappointment threshold, while arbitrary, model nicely agents who adopt satisficing behavior, in that they stick to some possibly suboptimal strategy until their tolerance level (which factors in the relative importance of more recent disappointment) is exceeded.

Five classic games.

In each of these games, the “social welfare” standpoint rejects certain outcomes (as inferior). Unfortunately, individually rational action often leads players to these very outcomes. In the ‘

In this subsection, we assume that there are no intruders (

Unsurprisingly, in the ‘

In ‘

In these three variants of the coordination problem, standard evolutionary game theory admits each game’s pure Nash equilibria as evolutionarily stable. Which of the two obtains depends on the initial conditions: In ‘_{0} > 1/2). In ‘_{0} > 1/3).

In contrast, under the short-sightedness protocol (for any 0 < p_0 < 1) the system will perform a random walk, oscillating back and forth in each round with equal probabilities, coming to rest only if one of the two equilibria is accidentally reached. The explanation lies in the fact that both players have the same probability of changing their strategy when coordination fails; therefore, the probability that p_t increases in a given round equals the probability that p_t decreases.
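The random-walk claim can be illustrated with a reduced-form simulation in which, after every miscoordination, each of the two matched agents switches with the same probability q (the value q = 1/3 and all other parameters are purely illustrative). The count of strategy-1 players then moves up or down with equal probability until an absorbing state is hit:

```python
import random

def coordination_walk(k0, n=100, q=1/3, max_rounds=10**6, seed=7):
    """k agents currently play strategy 1. On a miscoordinating match the
    strategy-1 agent switches with probability q (k falls by 1) and the
    strategy-2 agent switches with the same probability (k rises by 1),
    so k performs an unbiased walk absorbed at 0 or n."""
    random.seed(seed)
    k = k0
    for _ in range(max_rounds):
        if k in (0, n):
            break
        p = k / n
        if random.random() < 2 * p * (1 - p):  # matched pair miscoordinates
            if random.random() < q:
                k -= 1  # the strategy-1 player switches away
            if random.random() < q:
                k += 1  # the strategy-2 player switches away
    return k
```

Because upward and downward steps are equally likely, which equilibrium the walk eventually settles in is a matter of chance, matching the "randomness, possible convergence" behavior reported for short-sighted players in pure coordination.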

It follows that under the short-sightedness protocol, agents caught in a pure ‘_{t} = p_{0} for all

The situation is different when the short-sightedness scenario is used in the context of ‘_{0} = 0. In short, the efficient outcome is attained as long as even a minor fraction of individuals initially opts for the first strategy, because an instance of non-coordination in this game brings greater disappointment to the player who chose the second strategy. Hence, the probability with which the player who chose the second strategy switches to the first strategy is greater than the probability with which the player who chose the first strategy switches to the second. This explains the convergence to the efficient outcome from any state except the one where (nearly) all players choose the second strategy.

Turning now to the p_0 = 0.5, then the system converges to either p_0 <(>) 0.5, it tends to p_0 > 0.42. When the equilibria are Pareto-ranked, and unlike the situation in which agents are short-sighted, the evolution of the optimal equilibrium requires many more initial adherents (i.e., a much higher p_0). Clearly, the players’ relatively high inertia may inhibit the evolution of the Pareto optimal outcome since, if a critical mass of at least 42% of players opting for the Pareto optimal strategy is not present from the outset, it is sufficiently likely that those who choose it will experience three disappointing results in a row, causing them to switch to the strategy that corresponds to the suboptimal equilibrium.

On the other hand, under the _{0} > 0.17. Interestingly, when the suboptimal strategy (strategy 2) carries less risk in case of coordination failure, as in the ‘_{0} > 1/2, and condemned if _{0} < 1/2.

It is worth noticing that the analysis is sensitive to the relative attractiveness of the efficient equilibrium (just as the relative unattractiveness of the conflictual outcome made a crucial difference in ‘_{0} = 0). However, this can also work the other way round: if we make the inferior outcome less unattractive, then we get the same effect in reverse. For example, if we change the payoffs of the sub-optimal outcome of ‘_{0} = 1). In this last example, the three-period memory and the additive disappointment protocols lead to the efficient equilibrium as long as _{0} > 0.55 and _{0} > 0.74, respectively.

It is now clear that the agents’ behavioral type is a crucial determinant of the evolutionary path. The players’ attitude to disappointing resolutions may not make a difference in games with unique dominant strategy equilibria, like the ‘

Our first insight is that, while short-sighted players perform poorly when coordinating on equally desirable equilibria (e.g., pure ‘

Games 4 and 5, amended.

Simulation results in the case without random perturbations.

| | Short-Sightedness | Three-Period Memory | Additive Disappointment |
| --- | --- | --- | --- |
| | Randomness, possible convergence | | |
| | Same results as ‘ | | |
| | Same results as ‘ | | |

Agents described by the additive disappointment protocol fare better in antagonistic interactions; their attitude of sticking to a strategy (unless the amassed disappointment from previous rounds surpasses some threshold) is bound to turn them into a peaceful group of people, not necessarily because they favor peace, but because they are contented more easily and have no incentive to strive for more at peace’s expense. Moreover, these agents perform remarkably well in ‘

The three-period memory protocol credits the agents with some level of sophistication. Their elevated inertia does not seem to be in their favor in several cases, especially in ‘_{0} values. These players are sometimes not flexible enough to let the evolutionary course work in their favor, but this very rigidity is what may protect them against possibly unpleasant situations (such as being attracted by the sub-optimal equilibrium in ‘_{0} = 1, as happens under short-sightedness).

We have already noticed how short-sighted players may be heavily influenced by minor perturbations. To explore this further, the simulation software has been augmented to optionally include ‘intruders’. The latter interact with our population members with probability
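The intruder mechanism can be sketched as a biased draw of the opponent: with some probability the opponent is a preprogrammed automaton rather than a population member. The names and the automaton's fixed strategy below are illustrative assumptions:

```python
import random

def draw_opponent_strategy(strategies, me, intruder_prob, intruder_strategy):
    """With probability intruder_prob the opponent is a preprogrammed
    automaton that always plays intruder_strategy; otherwise it is a
    randomly drawn member of the population (other than agent me)."""
    if random.random() < intruder_prob:
        return intruder_strategy
    others = [s for idx, s in enumerate(strategies) if idx != me]
    return random.choice(others)
```

Setting intruder_prob to zero recovers the unperturbed dynamics of the previous section, while even small positive values can redirect the evolutionary course, as the scenarios below illustrate.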

To illustrate, let us consider ‘_{0} = 0, _{t}

‘

We now turn to ‘_{0} = 0.5, our (non-stochastic) simulation took more than one million iterations for the system to hit one of the two absorbing barriers. Naturally, the closer _{t} is to one of the two barriers/equilibria, the more probable convergence is to that equilibrium point. However, the addition of intruders can change this. Consider the case where _{0} = 0.1. In

‘

Notwithstanding the obvious merits of short-sightedness, the implied impatience of the agents and the ease with which they switch to the other strategy may not always be a virtue.

Our analysis in the case of no intruders in the previous section suggested that, in games of the ‘_{0} = 0.5. Without intruders, the evolutionary course would have taken the population to the sub-optimal outcome (Series 1). However, a sizeable population of intruders may avert this: with

‘

‘

The three-period memory behavioral code is generally resistant to shocks. While a minor shock may have dramatic effects under short-sightedness, the same does not apply when agents are more patient, even when the perturbation is far from discreet.

In ‘_{0} = 0.5. We find that, while the percentage of aggressive players indeed decreases, the effect is rather minor (the difference of the two series is less than 10%, which is insignificant

Turning to ‘_{0} = 0.35. Without intruders, as we saw in previous sections, the system will rest at the inefficient equilibrium (_{0} = 0.35, _{0}, the greater the value of

‘_{0} = 0.5. Series 1:

‘_{0} = 0.35. Series 1:

Meanwhile, in ‘p_0 < 0.55 as its catchment area, and when p_0 is slightly greater than that, the efficient equilibrium is not threatened, not even when there is a 10% probability of meeting an intruder who always selects the second strategy (p_0 = 0.6. Series 1 emerges when there are no intruders, while Series 2 illustrates the scenario

‘_{0} = 0.4,

‘_{0} = 0.6. Series 1:

Our additive disappointment protocol stands as some kind of middle ground between the impatience of the short-sighted players and the inertia of agents behaving under the three-period memory protocol. This section concludes with several instructive scenarios based on the disappointment protocols.

Under additive disappointment, _{0} > 0.17). Even if _{0} < 0.17, a small perturbation is enough to avert convergence to the sub-optimal equilibrium (see _{0} = 0.1 and

‘Hawk-Dove’ with _{0} = 0.5. Series 1:

The relative inertia of the additive disappointment protocol is also illustrated in _{0} = 0.5,

‘_{0} = 0.1. Series 1:

‘_{0} = 0.1. Series 1:

‘_{0} = 0.5,

Simulation results for short-sightedness protocol with stochastic perturbations.

| Protocol: Short-Sightedness | | | |
| --- | --- | --- | --- |
| Game | Initial Condition | Convergence | Comments |
| Stag-Hunt | If | | |
| Stag-Hunt | If | | |
| Stag-Hunt #3 | If | | |

Simulation results for three-period memory protocol with stochastic perturbations.

| Protocol: Three-Period Memory | | | |
| --- | --- | --- | --- |
| Game | Initial Condition | Convergence | Comments |
| Hi-Lo | | | |
| Stag-Hunt | Convergence at | | |
| Stag-Hunt #3 | If | | |

Simulation results for additive disappointment protocol with stochastic perturbations.

| Protocol: Additive disappointment | | | |
| --- | --- | --- | --- |
| Game | Initial Condition | Convergence | Comments |
| Hi-Lo | If | | |
| Stag-Hunt | If | | |
| Stag-Hunt #3 | | | |

The simulations presented in the previous section allow for comparisons across the three behavioral ‘types’ modeled in

The paper also illuminated the central role of stochastic perturbations in the determination of the relative social welfare effects of the different behavioral ‘types’. Short-sighted agents were shown to be highly sensitive to random perturbations and more likely to benefit from a benevolent third party (a social planner, perhaps?) who directs such shocks in a bid to steer the population to the desired equilibrium. On the other hand, if the planner’s intentions are not benign, the population risks being led to a sub-optimal outcome just as easily. In this sense, the three-period memory protocol shields a population from malevolent outside interventions at the expense of reducing the effectiveness of social policy that would, otherwise, have yielded effortless increases in social welfare.

Additive disappointment combines elements from both the short-sightedness and the three-period memory protocols, but its main theoretical disadvantage is that the system seems too sensitive to the choice of two exogenous parameters (the discount factor

On a similar note, for players acting under additive disappointment, there can be combinations of

Finally, a critical note: This paper has confined its attention to homogeneous populations comprising agents who subscribe exclusively to one of the three revision protocols. A more realistic analysis would allow not only for the coexistence of these protocols in the same population but also for heterogeneity within a single protocol (i.e., agents who, while adopting the additive disappointment protocol, feature different values for parameters

I am indebted to Yanis Varoufakis for his valuable comments. I would also like to thank three anonymous reviewers for their helpful suggestions.

The author declares no conflict of interest.

For comprehensive reviews, see [

As agent _{t,i}

The software was written by the author in Microsoft Visual Basic 6.0. Random numbers are generated by use of the language’s Rnd and Randomize functions.

As this adjustment causes no ambiguity, we will simplify notation by not adding