Imitation Dynamics in Oligopoly Games with Heterogeneous Players

Abstract: We investigate the role and performance of imitative behavior in a class of quantity-setting Cournot games. Within a framework of evolutionary competition between rational, myopic best-response and imitation heuristics with differential heuristic costs, we find that equilibrium stability depends on the sign of the cost differential between the unstable heuristic (Cournot best-response) and the stable one (imitation), and on the intensity of the evolutionary pressure. When this cost differential is positive (i.e., imitation is cheaper than Cournot), most firms use the imitation heuristic and the Cournot equilibrium is stabilized for market sizes for which it is unstable under homogeneous Cournot learning. However, as the number of firms increases, instability eventually sets in (at n = 7). When the cost differential is negative (imitation is more expensive than Cournot), complicated quantity fluctuations, along with the co-existence of heuristics, arise already for the triopoly game.


Introduction
Ref. [1] remarkably shows that, when firms compete on quantity using the [2] adjustment process$^{1}$, the Cournot-Nash equilibrium becomes unstable if the number of firms exceeds two. In fact, with linear demand and constant marginal costs, the Cournot-Nash equilibrium loses stability and bounded but perpetual oscillations arise already for a triopoly. For more than three firms oscillations grow unbounded, but they are limited once the non-negativity constraints on price and demand bind.
Whereas [1] focused only on the homogeneous (Cournot) adjustment process, more recent research extends to models of heterogeneous expectations$^{2}$. For instance, Ref. [3] focus on the Cournot heuristic in competition with rational firms, under replicator dynamics. Ref. [4] extend this framework to a quantity-setting oligopoly with an arbitrary number of firms and general monotone selection dynamics. Each firm chooses a behavioral rule from a finite set of rules, which are assumed to be commonly known. When choosing among the behavioral rules, a firm compares them by their past performance, i.e., the past realized profit net of the cost associated with each rule. Both past performance and the costs associated with the behavioral rules are publicly available. This implies that successful heuristics will continue to be used, while unsuccessful behavioral rules are dropped.
Experimental evidence on (in)stability and homogeneous learning in Cournot games, even in linear environments, is rather mixed. Ref. [5] discuss a linear Cournot oligopoly experiment with four firms. They do not find that quantities explode as the [1] model predicts; instead, the time-average quantities converge to the Cournot-Nash equilibrium quantity, although there is substantial period-by-period volatility of actual quantities around the Cournot-Nash equilibrium quantity. Ref. [6] run laboratory experiments on two types of linear Cournot oligopolies: a Type I oligopoly that displays a unique, stable interior Nash equilibrium, and a Type II oligopoly for which the interior equilibrium is unstable and two additional, stable boundary Nash equilibria emerge (with one firm acting as a de facto monopolist). Subjects learn to play the stable interior equilibrium of the Type I oligopoly, but neither the unstable interior equilibrium nor the stable boundary equilibria of the Type II oligopoly. Importantly, on the equilibration path, they reject the predictions of homogeneous learning theories such as Cournot adjustment, fictitious play or best-reply to the average of opponents' entire past play$^{3}$. The experimental rejection of these homogeneous learning theories naturally leads us to consider alternative heuristics and, more generally, heterogeneous learning with rule updating.
Among the alternative heuristics, there is an increasing interest, both theoretical and experimental, in understanding the performance of imitative players. Ref. [8] show that Cournot markets with players imitating the past successful strategy of their opponents converge to the Walrasian equilibrium. However, the hypothesis that players actually use the imitate-the-best heuristic in Cournot games is invalidated, experimentally, by [9]. They posit that (naive) imitate-the-best players may realize that past-period "success" may be purely driven by a particular realization of opponents' types. Ref. [10] adds myopic optimizers (Cournot best-responders) to the population of imitate-the-best players in [8], and finds that in the stationary distribution of the stochastic process the imitators are better off. Moreover, Ref. [11] prove that imitation can be unbeatable if the imitate-the-best heuristic is not subject to a money pump (i.e., the game is not of the circular best-reply, Rock-Scissors-Paper type). Subsequently, Ref. [12] show that unconditional imitation, of the tit-for-tat variety, is essentially unbeatable in the class of potential games. Ref. [13] analyze a Cournot duopoly in which subjects earn, on average, higher profits when playing against "best-response" computers than against "imitate-the-successful" computers.
Ref. [5] find that a process where participants mix between the Cournot adjustment heuristic and imitating the previous period's average quantity gives the best description of behavior. Our choice of modelling imitation as imitating the average lagged strategy in the population is informed, on the one hand, by this experimental result and, on the other hand, by the evidence against the imitate-the-most-successful past-period strategy in [9]. The imitate-the-average heuristic will be tested, firstly, against the standard Cournot heuristic, i.e., firms best-respond to average population past play, and secondly, against 'rational' players, i.e., players who understand that they are facing a distribution of updating player types (heuristics) and, consequently, respond optimally to the expected configuration of types in the market. We show that, even if the imitate-the-average heuristic tends to stabilize the Cournot equilibrium (the threshold at which instability sets in is typically higher, ceteris paribus, than in models of heterogeneous learning without imitation), the differential cost of the more sophisticated heuristic over the simpler one remains critical for the instability threshold number of players, as well as for the non-equilibrium dynamics around the Cournot-Nash equilibrium.
Outside the oligopoly realm, the evolution of learning rules has also been investigated in two-person strategic games. Ref. [14] look at the interplay of three heuristics (best-response, imitate-the-majority and imitate-the-minority) and show that it may generate complex, non-Nash equilibrium attractors, such as limit cycles, even in simple 2 × 2 coordination games. Ref. [15] finds that, in a coordination game with endogenous updating of the myopic best-response and imitate-the-best heuristics, the efficient equilibrium is selected. Ref. [16] also considers 2 × 2 coordination games, but with a three-way ecology of best-reply, better-reply and imitate-the-best action among a sample of past play, and proves that the risk-dominant equilibrium is selected for small sample sizes. Ref. [17] studies the evolution of bargaining rules (i.e., pie-sharing demand rules) via an external stability argument, by which incumbent rules should do better than mutant demands, reminiscent of the concept of ESS of [18]. To this rule-evolution literature, we contribute the study of a three-way ecology of heuristics (naive best-reply, equilibrium (rational) play and imitate-the-average past play) which, to the best of our knowledge, has not been investigated before. It would be a direct exercise to apply our methodology to the simple 2 × 2 coordination games that the previous work on heterogeneous learning rules has focused on.
A somewhat parallel stream of literature deals with the evolution of preferences$^{4}$ as the underlying factor upon which selection acts and which, in turn, determines the actual strategy choice in the game played. Pioneered by [19], who shows that utility maximization (homo economicus) may be just one among many preference types selected by evolution in the quest for adaptation to the environment, this "indirect evolutionary approach" was developed further in the context of two-person games by, e.g., [20,21]. They look at stable configurations of preferences and game equilibria and document the striking possibility that non-Nash outcomes and non-maximizing preferences can be evolutionarily stable. In the IO literature, Ref. [22] use this indirect approach to show that owners' preferences over managers' incentives do not always converge to profit maximization, but may favor sales maximization in Cournot competition, or even sales minimization in differentiated Bertrand oligopolies. Relatedly, Ref. [23] prove that oligopolists' preferences may evolve towards total surplus (producer and consumer) maximization, rather than the (narrow) profit maximization assumed by the rational producer theory.
Our approach is largely driven by the quest for understanding the behaviour of boundedly rational players, disciplined by the evolutionary selection operating at the level of behavioral rules. In particular, we ask whether learning converges to the Cournot equilibrium in an environment with competing and costly learning rules. To this end, we put forth three ecologies of heterogeneous heuristics (Cournot vs. Imitation, Rational vs. Imitation, Rational vs. Cournot vs. Imitation). Both the fixed-fractions scenario and the scenario with evolutionary switching between heuristics are discussed. Our concern is, first and foremost, the role and performance of the imitate-the-average heuristic within a given menu of heuristics and, secondly, whether the Cournot-Nash equilibrium is (un)stable for a given configuration of heuristics. Unlike, for instance, Ref. [10], our imitate-the-average rule does not always outperform Cournot or sophisticated (rational) play, but may co-exist with other heuristics in equilibrium. In contrast to the strong convergence result in [8], complicated, non-Cournotian equilibrium limit sets (two-cycles, strange attractors) may emerge once learning heterogeneity is endogenized.
Our main findings are as follows. First, the stability of the Cournot equilibrium critically depends on the cost advantage of the 'stable' heuristic (imitation) vis-a-vis the unstable one (Cournot myopic best-reply): when the (relatively) cheaper behavioral rule is imitation, the dynamics converge to a situation where most firms use this behavioral rule and all firms produce the Cournot-Nash equilibrium quantity, for market sizes for which homogeneous Cournot play is known to destabilize the interior equilibrium (e.g., triopoly). However, instability eventually sets in if the number of firms passes a higher threshold (n = 7). When the relatively cheaper heuristic is the unstable one, complicated endogenous fluctuations may already occur for n = 3, in particular when the evolutionary pressure is high. The nonlinearity causing this erratic behavior comes from the endogenous updating of the fractions, because in our leading example the oligopoly specifications are linear. Secondly, when rational firms compete with imitators, in the particular scenario of linear inverse demand and constant marginal cost, the system is always stable regardless of the game and behavioral parameters. Last, when rational firms, Cournot firms and imitators compete, stability depends on the differential cost of rational play versus the Cournot and imitation heuristics, as well as on the magnitude of the evolutionary pressure. For very large intensity of selection (β → ∞) and for costly rational choice and costless Cournot and imitation rules, we are able to recover analytically the n = 7 instability threshold of the pairwise contest Cournot vs. imitation.
The remainder of this paper is organized as follows. In Section 2 the theoretical framework is introduced, namely the quantity-setting oligopoly, the menu of heuristics and the population evolutionary dynamics. In Section 3 the quantity dynamics are investigated under exogenous population fractions, whereas in Section 4 the stability of systems made of two learning rules is investigated under endogenous population dynamics. In Section 5 the model is extended to a three-heuristics environment, with rational players, Cournot players and imitators competing and switching learning heuristics according to their relative performance. Finally, we conclude in Section 6.

Set-Up
Consider a finite population of firms competing on the market for a certain good. In each discrete-time period all producers have to decide their production plans for the next period. However, instead of simultaneously choosing the supplied quantities directly, the firms act according to behavioral rules that exactly prescribe the quantity to be supplied. Before the evolutionary model is studied, a brief review of the traditional, static Cournot model is given.
Consider a symmetric Cournot oligopoly game, where $q_i$ denotes the quantity supplied by firm $i$, $i = 1, \ldots, n$, and let $Q = \sum_{i=1}^{n} q_i$ be aggregate production. Furthermore, let $P(Q)$ denote the twice differentiable, nonnegative and non-increasing inverse demand function and let $C(q_i)$ denote the twice differentiable, non-decreasing cost function, which is the same for all firms. The resulting profit function of firm $i$ is given by

$$\pi_i(q_i, Q_{-i}) = P(Q_{-i} + q_i)\, q_i - C(q_i),$$

where $Q_{-i} = \sum_{j \neq i} q_j$. Assume that the profit function of a firm is strictly concave in its own output $q_i$. Maximizing profit while taking the quantity supplied by the competitors as given yields the well-known best-reply function of firm $i$,

$$R(Q_{-i}) = \arg\max_{q_i} \left[ P(Q_{-i} + q_i)\, q_i - C(q_i) \right].$$

Due to symmetry, all firms have the same best-reply function $R(\cdot)$. Moreover, the symmetric Cournot-Nash equilibrium quantity $q^*$ corresponds to the solution of $q^* = R((n-1)q^*)$.
In the sequel, we restrict the analysis to oligopolies displaying a unique, interior Nash equilibrium, the so-called Type I oligopoly in [6]$^{5}$. This very general class includes, as special cases, oligopolies with linear demand and (i) constant, (ii) increasing and (iii) even decreasing marginal costs, as long as marginal costs do not decrease too fast compared to the demand function. In Type I duopolies most learning processes (Cournot best-reply, adaptive dynamics, fictitious play, etc.) are known to converge to the unique, interior fixed point. This is not the case for Type II duopolies$^{6}$. This is the main reason why we focus on Type I oligopolies: we want to isolate the instability built into the oligopoly game itself from the destabilizing forces that reside in the heuristics of choice and their updating mechanism.
The original Cournot specification of [1] falls within the class of Type I oligopolies and will be used as the leading example: an oligopoly game with linear inverse demand and linear costs, given by

$$P(Q) = a - bQ \quad \text{and} \quad C(q_i) = c\, q_i,$$

respectively. First, in order to have a strictly concave profit function, assume that $b > 0$. Furthermore, for strictly positive prices we assume that $q_i < \frac{a}{nb}$ (so that aggregate output $Q < \frac{a}{b}$ and $P(Q) > 0$). For these specifications of the inverse demand function and cost function the reaction function is given by

$$R(Q_{-i}) = \frac{a - c}{2b} - \frac{Q_{-i}}{2}. \tag{2}$$

Note that if the other firms produce on average more (less) than the Cournot-Nash equilibrium quantity, firm $i$ reacts by producing less (more) than that quantity.
Straightforward calculations show that in this case the Cournot-Nash equilibrium quantity, aggregate production, price and profit are equal to

$$q^* = \frac{a - c}{b(n+1)}, \qquad Q^* = \frac{n(a - c)}{b(n+1)}, \qquad P^* = \frac{a + nc}{n+1}, \qquad \pi^* = \frac{(a - c)^2}{b(n+1)^2}.$$

Traditional Cournot analysis refers to a static environment. However, in a dynamic setting the reaction function introduced above can be used to study the so-called Cournot dynamics, where firms best-reply to their expectations,

$$q_{i,t} = R(Q^e_{-i,t}),$$

where $q_{i,t}$ denotes the quantity supplied by player $i$ in period $t$ and $Q^e_{-i,t}$ stands for player $i$'s expectation about her opponents' total output at time $t$. The symmetric Cournot-Nash equilibrium where all firms produce $q^*$ is stable under the Cournot dynamics if [...]. Our main interest is in how firm $i$ comes to play $q^*$ and, more specifically, in what firm $i$ believes about $Q_{-i}$ when the production decision has to be made. Our key assumption is that firms choose their expectation-formation rule (heuristic) based on its performance.
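The equilibrium values of the leading example can be checked numerically. The following is a minimal sketch, assuming the linear specification $P(Q) = a - bQ$, $C(q) = cq$; the parameter values (a = 17, b = 1, c = 1, n = 3) and the function names are illustrative choices, not taken from the paper.

```python
def reaction(q_others_total, a, b, c):
    """Best reply to the rivals' total output for P(Q) = a - b*Q, C(q) = c*q."""
    return (a - c) / (2 * b) - q_others_total / 2


def cournot_nash(n, a, b, c):
    """Symmetric Cournot-Nash quantity, aggregate output, price and profit."""
    q_star = (a - c) / (b * (n + 1))
    total = n * q_star
    price = a - b * total
    profit = (price - c) * q_star
    return q_star, total, price, profit


# Illustrative (hypothetical) parameter values.
a, b, c, n = 17.0, 1.0, 1.0, 3
q_star, total, price, profit = cournot_nash(n, a, b, c)

# Fixed-point property q* = R((n - 1) q*).
fixed_point_gap = abs(reaction((n - 1) * q_star, a, b, c) - q_star)
print(q_star, total, price, profit, fixed_point_gap)
```

The last line verifies that the equilibrium quantity is a fixed point of the best-reply map, which is the defining property used throughout the stability analysis.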

Learning Heuristics
In the Cournot oligopoly game the producers have to form expectations about opponents' production plans. Based on this expectation, firms decide how much to produce in the next period. One approach is to assume complete information, i.e., rational firms with common knowledge of rationality. This implies that firms have perfect foresight about competitors' aggregate production plan, i.e., $Q^e_{-i,t+1} = Q_{-i,t+1}$. This results in the following production plan:

$$q_{i,t+1} = R(Q_{-i,t+1}).$$

Alternatively one may consider rules that require less information, for example $Q^e_{-i,t+1} = Q_{-i,t}$. This results in the following production plan:

$$q_{i,t+1} = R(Q_{-i,t}), \tag{3}$$

where firms expect that aggregate production in the next period equals current aggregate production. This is the so-called Cournot adjustment heuristic. It is a straightforward exercise to show that, if all firms use the Cournot heuristic, the Cournot-Nash equilibrium is a locally asymptotically stable$^{7}$ fixed point of system (3) whenever

$$|\lambda(n)| = (n-1)\,\left|R'(Q^*_{-i})\right| < 1, \tag{4}$$

where $\lambda(n)$ is defined as the largest eigenvalue (in absolute value) of the Jacobian, evaluated at the equilibrium.

Leading example. From Equation (2) it can easily be seen that $R'(Q^*_{-i}) = -\frac{1}{2}$, meaning that if others' aggregate output increases by one unit, a best-responding firm decreases its output by $\frac{1}{2}$ unit. From stability condition (4) it follows that the Cournot-Nash equilibrium is stable for this specification only when n = 2 and unstable when n > 3 (and neutrally stable, resulting in bounded oscillations, for n = 3). The reason for this instability is 'overshooting': if aggregate output is above (below) the Cournot-Nash equilibrium quantity, firms react by reducing (increasing) their output. For n > 3 this aggregate reduction (increase) in output is so large that the resulting deviation of aggregate output from the equilibrium quantity is larger in the next period than in the current one, and so on.
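The overshooting mechanism can be illustrated by iterating the Cournot adjustment heuristic directly. The sketch below assumes the linear leading example with illustrative parameter values (a = 17, b = 1, c = 1) and a small symmetric initial shock; it deliberately ignores the non-negativity constraints, so it is only meaningful over short horizons. All names are hypothetical.

```python
def simulate_cournot(n, periods, a=17.0, b=1.0, c=1.0, shock=0.1):
    """Iterate q_{i,t+1} = R(Q_{-i,t}) and record |q_{1,t} - q*| each period.

    Assumes linear demand and cost; non-negativity constraints are not imposed.
    """
    q_star = (a - c) / (b * (n + 1))
    q = [q_star + shock] * n  # every firm starts slightly above equilibrium
    deviations = []
    for _ in range(periods):
        total = sum(q)
        # Each firm best-replies to the rivals' output observed last period.
        q = [(a - c) / (2 * b) - (total - qi) / 2 for qi in q]
        deviations.append(abs(q[0] - q_star))
    return deviations


dev_duopoly = simulate_cournot(n=2, periods=6)
dev_triopoly = simulate_cournot(n=3, periods=6)
dev_quadropoly = simulate_cournot(n=4, periods=6)
print(dev_duopoly[-1], dev_triopoly[-1], dev_quadropoly[-1])
```

With a symmetric shock the deviation is multiplied by $-(n-1)/2$ every period, so it shrinks for n = 2, keeps its size for n = 3 and grows for n = 4, in line with condition (4).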
It is a broadly supported idea that not all producers best-reply to their expectations. Experiments [5] show that people often imitate others' behavior. A heuristic that captures this behavior is the so-called imitate-the-average heuristic. Imitators believe that "everyone else can't be wrong" and will therefore produce, in the next period, the average of the other players' current production, i.e.,

$$q_{i,t+1} = \frac{Q_{-i,t}}{n-1} = \frac{1}{n-1} \sum_{j \neq i} q_{j,t}. \tag{5}$$

Local stability of the Cournot-Nash equilibrium with only imitation firms depends on the eigenvalues of the Jacobian matrix of the system of Equation (5), evaluated at the Cournot-Nash equilibrium $q^*$. This Jacobian matrix is given by

$$J = \begin{pmatrix} 0 & \frac{1}{n-1} & \cdots & \frac{1}{n-1} \\ \frac{1}{n-1} & 0 & \cdots & \frac{1}{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{n-1} & \frac{1}{n-1} & \cdots & 0 \end{pmatrix}. \tag{6}$$

Imitators only respond to other firms' production and do not respond to their own production; therefore all diagonal elements are equal to zero. If one competitor increases current production by one unit, an imitator will increase next-period production by $\frac{1}{n-1}$ unit; therefore all off-diagonal elements are equal to $\frac{1}{n-1}$. The Jacobian matrix (6) thus has $n - 1$ eigenvalues equal to $-\frac{1}{n-1}$ and one eigenvalue equal to $(n-1)\frac{1}{n-1} = 1$, which is the largest in absolute value. It follows immediately that the Cournot-Nash equilibrium is neutrally stable, independent of $n$ and of the game structure (demand and cost functions). The reason is that, if one producer changes his production plan, the economy will settle at a new equilibrium different from $q^*$ and will remain there until some producer deviates again. In fact this system has infinitely many neutrally stable equilibria: if $q_i = q^\dagger$ for all $i$, the system is neutrally stable for every $q^\dagger$.
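The eigenvalue claim for the imitation Jacobian can be verified without any linear algebra library by applying the matrix to two candidate eigenvectors. This is a minimal sketch; the market size n = 5 and the helper names are illustrative assumptions.

```python
def imitation_jacobian(n):
    """Jacobian of imitate-the-average: zeros on the diagonal, 1/(n-1) elsewhere."""
    w = 1.0 / (n - 1)
    return [[0.0 if i == j else w for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


n = 5
J = imitation_jacobian(n)

# The all-ones vector should be an eigenvector with eigenvalue 1.
ones = [1.0] * n
image_of_ones = matvec(J, ones)

# A contrast vector (sum zero) should have eigenvalue -1/(n-1).
contrast = [1.0, -1.0] + [0.0] * (n - 2)
image_of_contrast = matvec(J, contrast)

print(image_of_ones)
print(image_of_contrast)
```

The all-ones direction corresponds to a common shift of all quantities, which imitation never corrects; this is the source of the continuum of neutrally stable equilibria.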

Population Dynamics
In the previous section it was shown how the supplied quantities evolve over time under the Cournot and the imitation heuristics. In this section it will be explained how the population fractions evolve over time. Let us first introduce the vector $\eta_t$ with entries $\eta_{k,t}$, the fraction of the population that uses heuristic $k$ at time $t$. Thus for every time $t$, $\eta_t$ denotes the $K$-dimensional vector of fractions for each strategy/heuristic and belongs to the $K$-dimensional unit simplex $\Delta^K = \{\eta_t \in \mathbb{R}^K : \sum_{k=1}^{K} \eta_{k,t} = 1,\ \eta_{k,t} \geq 0\ \forall k\}$. We will now describe how the fractions $\eta_{k,t}$ evolve over time. It is assumed that the choice of a behavioral rule is based on its past performance, capturing the idea that more successful rules will be used more frequently.
Evolutionary game theory deals with games played within a (large) population over a long time horizon. Its main ingredients are the underlying game, e.g., the Cournot one-shot game, and the class of evolutionary dynamics, which defines a dynamical system on the state of the population. The evolutionary dynamical system depends on current fractions $\eta_t$ and current fitness $U_t$. In general, such an evolutionary dynamic in discrete time, describing how the population fractions evolve, is given by

$$\eta_{k,t+1} = K(U_t, \eta_t),$$

with $U_t = (U_{1,t}, \ldots, U_{K,t})'$ the vector of average utilities and $\eta_t = (\eta_{1,t}, \ldots, \eta_{K,t})'$ the vector of fractions. To make sure that the population dynamics are well-behaved, we assume that $K(\cdot, \cdot)$ is continuous, nondecreasing in $U_{k,t}$, and such that the population state remains in the $K$-dimensional unit simplex $\Delta^K$. One widely used evolutionary dynamic in the literature on learning in games is the Logit dynamics (see, e.g., [26] for an extensive treatment). It can be derived from discrete choice/random utility models and it specifies that the fractions of the different heuristics are updated according to

$$\eta_{k,t+1} = \frac{e^{\beta U_{k,t}}}{\sum_{j=1}^{K} e^{\beta U_{j,t}}}.$$

The parameter $\beta$ represents the intensity of selection, and it captures the idea of boundedly rational play, since individuals do not necessarily select the rule that yields the highest utility. Notice that in the extreme case where $\beta = 0$ we have completely random behavior: the noise is so large that differences in observed average utility play no role. Each behavioral rule is thus chosen with equal probability, $\eta_{k,t} = \frac{1}{K}\ \forall k$. In the other extreme case, when $\beta \to \infty$, everybody switches to the most profitable strategy each period$^{8}$.
In case of equal costs of the heuristics, equilibrium fractions are thus given by $\eta^*_k = \frac{1}{K}\ \forall k$, since production is equal and thus profits are equal.
In the leading examples we will focus on the Logit evolutionary dynamics: firstly, because the Logit rule is a standard tool in economics for modelling boundedly rational choice and, secondly, because it displays nice regularity/continuity properties ($0 \leq \eta_k \leq 1$).
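The two limiting cases of the Logit rule are easy to see numerically. The sketch below implements the standard logit (softmax) update; subtracting the maximum utility before exponentiating is a numerical-stability device added here and does not change the resulting fractions. The utility values are made up for illustration.

```python
import math


def logit_fractions(utilities, beta):
    """Logit update: fraction of rule k is exp(beta*U_k) / sum_j exp(beta*U_j)."""
    shift = max(utilities)  # shifting by the maximum avoids overflow
    weights = [math.exp(beta * (u - shift)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical fitness values for K = 3 heuristics.
utilities = [1.0, 1.5, 0.5]

random_choice = logit_fractions(utilities, beta=0.0)     # no selection pressure
moderate = logit_fractions(utilities, beta=2.0)          # intermediate case
near_best_reply = logit_fractions(utilities, beta=200.0)  # strong selection
print(random_choice, moderate, near_best_reply)
```

With beta = 0 every rule receives weight 1/K, while for large beta almost all mass moves to the rule with the highest fitness.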

Heterogeneity in Behavior in Cournot Oligopolies
In this section we introduce heterogeneity in production-decision heuristics. In a population game set-up, $n$ firms are randomly picked from a large population of firms in which a fraction $\eta_k$ plays according to heuristic $k$. We first focus on pairwise competition between two heuristics, Cournot vs. Imitation and Rational vs. Imitation, respectively, under the no-switching scenario. The assumption of fixed shares $\eta$ in each period will be relaxed in Section 4.

Cournot vs. Imitation Firms with Fixed Fractions
In order to facilitate studying the aggregate behavior of a heterogeneous set of interacting quantity-setting heuristics, we study the Cournot model as a population game. Consider a large population of firms from which, in each period, groups of $n$ firms are sampled randomly and matched to play the one-shot $n$-player Cournot game. We assume that a fixed fraction $\eta$ of the large population of firms uses the Cournot heuristic and the others use the imitation heuristic. After each one-shot Cournot game, the random matching procedure is repeated, leading to new combinations of firm types. The distribution of possible samples follows a binomial distribution with parameters $n$ and $\eta$. Below, the Cournot vs. Imitation example will be discussed again, but now under the assumption of random matching.
Suppose that a fraction $\eta$ of the population of firms uses the Cournot heuristic: these firms observe the population-wide average quantity $\bar{q}_t$ and best respond to it, $q^C_{t+1} = R((n-1)\bar{q}_t)$, where $q^C_t$ is the quantity produced by each Cournot firm in period $t$. Consequently, a fraction $1 - \eta$ of the large population makes use of the imitation heuristic. By the law of large numbers, the average quantity played in period $t$ can be expressed as $\bar{q}_t = \eta q^C_t + (1-\eta) q^I_t$. Recall that imitate-the-average firms produce, in the next period, the other firms' average quantity produced in the current period, $q^I_{i,t+1} = \frac{Q_{-i,t}}{n-1}$. Again, by a law of large numbers, $\frac{Q_{-i,t}}{n-1} \to \bar{q}_t$ when $n \to \infty$. Therefore we obtain the following quantity dynamics:

$$q^C_{t+1} = R\big((n-1)(\eta q^C_t + (1-\eta) q^I_t)\big), \qquad q^I_{t+1} = \eta q^C_t + (1-\eta) q^I_t. \tag{9}$$

Note that this is a 2-dimensional dynamical system whose dimension cannot be reduced. Furthermore, the Cournot-Nash equilibrium is not the unique equilibrium under the homogeneous imitation rule; in fact, every common quantity is an equilibrium. The Cournot-Nash equilibrium is, however, still the unique equilibrium quantity of the heterogeneous-heuristics dynamical system (9).

Proposition 1. The Cournot-Nash equilibrium, where all firms produce the Cournot-Nash quantity $(q^*, q^*)$, is a locally stable fixed point for the model with exogenous fractions of Cournot and imitation firms if and only if

$$\left| \eta (n-1) R'(Q^*_{-i}) + 1 - \eta \right| < 1. \tag{10}$$

Proof. It can easily be shown that the Jacobian matrix, evaluated at the Cournot-Nash equilibrium $(q^*, q^*)$, is given by

$$\begin{pmatrix} \eta (n-1) R'(Q^*_{-i}) & (1-\eta)(n-1) R'(Q^*_{-i}) \\ \eta & 1 - \eta \end{pmatrix}.$$

The corresponding eigenvalues are $\lambda_1 = 0$ and $\lambda_2 = \eta (n-1) R'(Q^*_{-i}) + 1 - \eta$. Here $\lambda_2$ is the largest eigenvalue in absolute value. Thus the system is stable if $|\lambda_2| < 1$, which is the condition stated in the proposition.
Leading example. Here $R'(Q^*_{-i}) = -\frac{1}{2}$; substituting this in Equation (10) gives, after some simplification,

$$n < \frac{4 - \eta}{\eta}.$$

This means that an economy with as many Cournot firms as imitators ($\eta = \frac{1}{2}$) is stable if n < 7. Moreover, as found earlier, an economy with only Cournot firms ($\eta = 1$) is stable if n < 3. Finally, an economy where almost all firms use the imitation heuristic but some Cournot firms exist ($\eta$ close to zero) is stable even for very large $n$.
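The thresholds quoted for the leading example can be reproduced from the nonzero eigenvalue of the two-dimensional system with fixed fractions. The following is a minimal sketch, assuming that eigenvalue equals $\eta (n-1) R' + 1 - \eta$ with $R' = -1/2$; the function names and the search bound `n_max` are hypothetical.

```python
def lambda_2(eta, n, r_prime=-0.5):
    """Nonzero eigenvalue of the Cournot vs. imitation system with fixed fractions."""
    return eta * (n - 1) * r_prime + 1 - eta


def is_stable(eta, n, r_prime=-0.5):
    """Local stability requires the eigenvalue to lie strictly inside the unit circle."""
    return abs(lambda_2(eta, n, r_prime)) < 1


def largest_stable_n(eta, n_max=10_000):
    """Largest market size (starting from a duopoly) for which the equilibrium is stable."""
    n = 1
    while n + 1 <= n_max and is_stable(eta, n + 1):
        n += 1
    return n


half_half = largest_stable_n(0.5)      # as many Cournot firms as imitators
all_cournot = largest_stable_n(1.0)    # homogeneous Cournot population
few_cournot = largest_stable_n(0.01)   # almost everybody imitates
print(half_half, all_cournot, few_cournot)
```

For $\eta = 1/2$ the eigenvalue reaches $-1$ exactly at n = 7, so the equilibrium is stable up to n = 6; lowering $\eta$ pushes the critical market size up roughly like $4/\eta$.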

Rational vs. Imitation Firms with Fixed Fractions
In this section we focus on the dynamics when there is competition between rational and imitation firms. We set the fraction of rational firms equal to $\eta$. A fully rational firm is assumed to know the fraction of imitation firms. Moreover, it knows exactly how much all firms will produce. However, we assume that it does not know the composition of firms in its market (or has to make a production decision before observing this). The rational quantity dynamics therefore have the following structure:

$$q^R_t = \arg\max_{q_i} U^R_t(q_i \mid q^R_t, q^I_t, \eta).$$

The rational firm forms expectations over all possible mixtures of heuristics resulting from randomly drawing $n - 1$ other players from a large population, each of which is a rational firm too with chance $\eta$ and an imitator with chance $1 - \eta$. Rational firm $i$ therefore chooses quantity $q_i$ such that its objective function, its own expected utility

$$U^R_t(q_i \mid q^R_t, q^I_t, \eta) = \sum_{k=0}^{n-1} \binom{n-1}{k} \eta^k (1-\eta)^{n-1-k} \left[ P\big(q_i + k q^R_t + (n-1-k) q^I_t\big)\, q_i - C(q_i) \right],$$

is maximized given the production of the other players and the population fractions. Here $q^R_t$ is the symmetric output level of all of the other rational firms in period $t$, and $q^I_t$ is the output level of all of the imitation firms. The first order condition for an optimum is characterized by equality between marginal cost and expected marginal revenue. Typically, marginal revenue in the realized market will differ from marginal cost.
Given the value of $q^I_t$ and the fraction $\eta$, all rational firms coordinate on the same output level $q^R_t$. This gives the first order condition $\frac{\partial U^R_t(q_{i,t} \mid q^R_t, q^I_t, \eta)}{\partial q_{i,t}} = 0$, evaluated at $q_{i,t} = q^R_t$, which is equivalent to

$$\sum_{k=0}^{n-1} \binom{n-1}{k} \eta^k (1-\eta)^{n-1-k} \left[ P(Q_k) + q^R_t P'(Q_k) - C'(q^R_t) \right] = 0, \qquad Q_k = (k+1) q^R_t + (n-1-k) q^I_t. \tag{14}$$

Let the solution$^{9}$ to Equation (14), the rational play quantity, be given by $q^R_t = H^R(q^I_t, \eta)$. The full system of equations is thus given by

$$q^R_{t+1} = H^R(q^I_{t+1}, \eta), \qquad q^I_{t+1} = \eta q^R_t + (1-\eta) q^I_t. \tag{15}$$

It is easily checked that if the imitators play the Cournot-Nash equilibrium quantity $q^*$, or if all firms are rational, the rational firms will play the Cournot-Nash equilibrium quantity, that is, $H^R(q^*, \eta) = q^*$ for all $\eta$ and $H^R(q^I, 1) = q^*$ for all $q^I$. Moreover, if a rational firm is certain it will only meet imitation firms (that is, $\eta = 0$), it plays a best response to the currently played average quantity, that is, $H^R(q^I_t, 0) = R((n-1)q^I_t)$ for all $q^I_t$. In the remainder we will denote the partial derivatives of $H^R(q, \eta)$ with respect to $q$ and $\eta$ by $H^R_q(q, \eta)$ and $H^R_\eta(q, \eta)$, respectively.
Proposition 2. The Cournot-Nash equilibrium, where all firms produce the Cournot-Nash quantity $(q^*, q^*)$, is a locally stable fixed point for the model with exogenous fractions of rational and imitation firms if and only if

$$\left| \eta H^R_q(q^*, \eta) + 1 - \eta \right| < 1.$$

Proof. In order to determine the local stability of the equilibrium $(q^*, q^*)$ where all firms produce the Cournot-Nash quantity, we need to determine the eigenvalues of the Jacobian matrix of system (15), evaluated at the equilibrium. It can be shown that this Jacobian matrix is given by

$$\begin{pmatrix} \eta H^R_q(q^*, \eta) & (1-\eta) H^R_q(q^*, \eta) \\ \eta & 1 - \eta \end{pmatrix},$$

which has eigenvalues $\lambda_1 = \eta H^R_q(q^*, \eta) + 1 - \eta$ and $\lambda_2 = 0$. Consequently the system is locally stable when $|\lambda_1| < 1$, which is exactly the condition stated.
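Leading example. In the leading example the implicit function defining $q^R_t$ (Equation (14)) can be solved explicitly when using that $P(Q) = a - bQ$ and $C(q) = cq$:

$$q^R_t = H^R(q^I_t, \eta) = \frac{a - c - b(n-1)(1-\eta)\, q^I_t}{b\,\big(2 + (n-1)\eta\big)}.$$

The system of equations for the leading example is given by

$$q^R_{t+1} = \frac{a - c - b(n-1)(1-\eta)\, q^I_{t+1}}{b\,\big(2 + (n-1)\eta\big)}, \qquad q^I_{t+1} = \eta q^R_t + (1-\eta) q^I_t.$$

The eigenvalues of this system are given by $\lambda_1 = 0$ and $\lambda_2 = \frac{2(1-\eta)}{2 + (n-1)\eta}$. Since $0 \leq \lambda_2 < 1$ whenever a positive fraction of firms is rational ($\eta > 0$), the stability condition always holds and the economy is always stable in the linear-linear oligopoly.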
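The properties of the rational quantity in the linear case can be checked numerically. The sketch below assumes the closed form obtained from the first order condition with $P(Q) = a - bQ$ and $C(q) = cq$, namely $q^R = \frac{a - c - b(n-1)(1-\eta) q^I}{b(2 + (n-1)\eta)}$; the parameter values and function names are illustrative assumptions.

```python
def rational_output(q_imit, eta, n, a=17.0, b=1.0, c=1.0):
    """Symmetric rational quantity for linear demand and cost (assumed closed form)."""
    return (a - c - b * (n - 1) * (1 - eta) * q_imit) / (b * (2 + (n - 1) * eta))


def rational_eigenvalue(eta, n):
    """Nonzero eigenvalue eta*H_q + 1 - eta of the rational vs. imitation system."""
    h_q = -(n - 1) * (1 - eta) / (2 + (n - 1) * eta)
    return eta * h_q + 1 - eta


a, b, c, n = 17.0, 1.0, 1.0, 3
q_star = (a - c) / (b * (n + 1))

# If imitators play q*, rational firms play q* for any fraction eta.
at_equilibrium = [rational_output(q_star, eta, n) for eta in (0.1, 0.5, 0.9)]

# If everybody is rational, the output is q* whatever the imitators would do.
all_rational = rational_output(1.23, 1.0, n)

# With eta = 0 the rule collapses to the best reply R((n - 1) q_I).
best_reply_case = rational_output(3.0, 0.0, n)

# The eigenvalue stays in [0, 1) on a grid of positive fractions and market sizes.
eigenvalues = [rational_eigenvalue(k / 10, m) for k in range(1, 11) for m in range(2, 50)]
print(at_equilibrium, all_rational, best_reply_case, max(eigenvalues))
```

The grid check is only a numerical illustration of the stability claim, not a proof; the closed-form eigenvalue $\frac{2(1-\eta)}{2+(n-1)\eta}$ makes the bound evident.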

Evolutionary Competition between Two Heuristics
In this section we develop an evolutionary version of the model outlined in Section 3, i.e., we relax the assumption that $\eta$ is fixed. As before, in every period $t$, $n$ firms play the $n$-player Cournot game. We now assume that the fraction $\eta$ of firms using a heuristic evolves over time according to a general monotone selection dynamic, capturing the idea that heuristics that perform relatively better are more likely to spread through the population. Under the assumption of random interactions, the fitness of heuristic $k$ is determined by averaging the payoffs from each interaction, with weights given by the chance of that specific state, minus the information cost of using the heuristic. Denoting by $\Pi_t$ the expected payoff vector in period $t$, its first entry (the individual payoff, or fitness in biological terms, of strategy 1) is given by

$$\Pi_1 = F(q_1, q_2, \eta) = \sum_{k=0}^{n-1} \binom{n-1}{k} \eta^k (1-\eta)^{n-1-k} \left[ P\big((k+1) q_1 + (n-1-k) q_2\big)\, q_1 - C(q_1) \right],$$

and expected profits for heuristic 2 are given by $\Pi_2 = F(q_2, q_1, 1 - \eta)$. If the population of firms and the number of groups of $n$ firms drawn from that population are large enough, average profits will be approximated well by these expected profits, which we will therefore use as a proxy for average profits from now on.
There might be a substantial difference in sophistication between different heuristics. As a consequence, some heuristics may require more information or effort to implement than others. Therefore we allow for the possibility that heuristics involve an information cost $C_k \geq 0$, $k \in \{C, I\}$, that may differ across heuristics. The fitness of a heuristic is then given by the average profits generated in the game minus the information cost of acquiring heuristic $k$, $U_k = \Pi_k - C_k$. We only use the realized profit to determine the fitness measure of a behavioral rule. The fitness measure can be generalized by weighting the utility of the past $M$ periods, yielding similar results [27]. We assume that the above fitness measures $U_k$ are publicly observable.
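The expected-profit and fitness calculations can be sketched as a binomial average over the possible market compositions. The code below assumes the linear leading example and that each of the $n - 1$ rivals independently uses the firm's own heuristic with probability $\eta$; the parameter values, the function names and the cost value 0.25 are illustrative, and price non-negativity is not enforced.

```python
from math import comb


def expected_profit(q_own, q_other, eta, n, a=17.0, b=1.0, c=1.0):
    """Binomially weighted profit of a firm producing q_own.

    k of the n - 1 rivals produce q_own (probability eta each),
    the remaining n - 1 - k rivals produce q_other.
    """
    total = 0.0
    for k in range(n):
        prob = comb(n - 1, k) * eta**k * (1 - eta) ** (n - 1 - k)
        market_output = (k + 1) * q_own + (n - 1 - k) * q_other
        total += prob * ((a - b * market_output) * q_own - c * q_own)
    return total


def fitness(q_own, q_other, eta, n, info_cost):
    """Fitness U_k = Pi_k - C_k: expected profit net of the information cost."""
    return expected_profit(q_own, q_other, eta, n) - info_cost


n, eta = 3, 0.4
q_star = 16.0 / 4.0  # (a - c) / (b * (n + 1)) for the default parameters
profit_at_equilibrium = expected_profit(q_star, q_star, eta, n)
fitness_gap = fitness(q_star, q_star, eta, n, 0.25) - fitness(q_star, q_star, 1 - eta, n, 0.0)
print(profit_at_equilibrium, fitness_gap)
```

At the Cournot-Nash equilibrium both heuristics earn the same profit, so the fitness difference reduces to minus the cost differential, which is what pins down the equilibrium fractions in the evolutionary model.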
Having defined the fitness measure, we are ready to introduce the population dynamics. Let the fraction of firms using the first heuristic in period $t$ be given by $\eta_t$. This fraction evolves endogenously according to an evolutionary dynamic which is an increasing function of the difference between the current fitness of the two heuristics, that is,

$$\eta_{t+1} = K(U_{1,t} - U_{2,t}).$$

The map $K : \mathbb{R} \to [0, 1]$ is a continuously differentiable, monotonically increasing function with [...]. In the following two subsections we derive dynamical versions of the two models discussed in Section 3 and investigate their stability. First we investigate the stability of the Cournot-Nash equilibrium for the model with endogenous fractions of Cournot and imitation firms, and second we investigate the stability of the Cournot-Nash equilibrium for the model with endogenous fractions of rational and imitation firms.

Cournot vs. Imitation Firms with Endogenous Switching
The dynamics in this section consist of three equations: two describing the quantity dynamics (the production of the Cournot firms and the production of the imitation firms) and one describing the dynamics of the population fraction. The population and quantity dynamics form the following system of three equations: […] where $\Delta U_t = U_{C,t} - U_{I,t}$. Note that this is a 3-dimensional dynamical system whose dimension cannot be reduced. Furthermore, the Cournot-Nash equilibrium quantity $q^*$ is the unique equilibrium quantity of the complete dynamical system. Let $\eta^*$ be the unique equilibrium fraction of Cournot players, so that $\eta^* = K(-C)$, with $C = C_C - C_I$ the cost differential of the Cournot over the imitation heuristic. For general monotone selection dynamics $K(\cdot)$ we obtain the result stated in the proposition below.
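For the linear specification of the leading example, one iteration of such a three-dimensional system can be sketched as follows. The functional forms are assumptions based on the verbal description of the heuristics (Cournot firms best-respond to the expected aggregate output of their $n - 1$ rivals, imitators adopt the population-average quantity, and the Cournot share follows a Logit rule applied to $\Delta U_t$); they are not a transcription of the authors' equations. Truncation of the price at zero is ignored in the profit calculation, and the names `step` and `simulate` are illustrative.

```python
import math

def step(q_c, q_i, eta, n, a=17.0, b=1.0, c=1.0, beta=0.05, cost_diff=0.0):
    """One period of the assumed Cournot-vs-imitation dynamics (linear case)."""
    avg = eta * q_c + (1.0 - eta) * q_i           # population-average output
    margin = a - c - b * (n - 1) * avg            # expected margin before own output
    profit_c = q_c * (margin - b * q_c)           # expected profit, Cournot firm
    profit_i = q_i * (margin - b * q_i)           # expected profit, imitator
    delta_u = profit_c - profit_i - cost_diff     # Delta U_t = U_C,t - U_I,t
    # Cournot firms: best response R(Q) = (a - c - b*Q) / (2b), floored at zero.
    next_q_c = max((a - c - b * (n - 1) * avg) / (2.0 * b), 0.0)
    next_q_i = avg                                # imitate the average
    # Logit share 1 / (1 + exp(-beta*delta_u)), written with tanh to avoid overflow.
    next_eta = 0.5 * (1.0 + math.tanh(0.5 * beta * delta_u))
    return next_q_c, next_q_i, next_eta

def simulate(n, periods, state=(0.8, 0.8, 0.5), **params):
    """Iterate the map from the given initial state and return the final state."""
    for _ in range(periods):
        state = step(*state, n, **params)
    return state
```

Under these assumptions, the Cournot-Nash quantity $q^* = (a - c)/(b(n + 1))$ together with $\eta^* = K(-C)$ is a fixed point of `step`, and for $n = 3$ the iterates started from $(0.8, 0.8, 0.5)$ settle on $q^* = 4$ and $\eta = 1/2$.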
Proposition 3. For $C \geq 0$, i.e., when the Cournot heuristic is at least as costly as the imitation heuristic, the Cournot-Nash equilibrium $(q^*, q^*)$, where all firms produce the Cournot-Nash quantity, along with the equilibrium fraction $\eta^* = K(-C)$, is a locally asymptotically stable fixed point for the model with endogenous fractions of Cournot and imitation firms if and only if $|\eta^*(n-1)R'((n-1)q^*) + 1 - \eta^*| < 1$ (21). Proof. See Appendix A.
Leading example. The equilibrium quantities are given by $q^*$. Here $R'((n-1)q^*) = -\frac{1}{2}$; substituting this into Equation (21) gives the stability condition for the leading example. Thus the equilibrium $(q^*, q^*, \eta^*)$ is stable when $n < \frac{4 - \eta^*}{\eta^*}$. In equilibrium, when all firms produce the same quantity, profits are equal and therefore the equilibrium fraction under Logit dynamics simplifies to $\eta^* = \frac{e^{-\beta C}}{e^{-\beta C} + 1}$. For $C > 0$ (strictly positive cost differential of the Cournot over the imitation heuristic) and very large intensity of selection $\beta \to \infty$, $\eta^* \to \frac{1}{2}$ and, therefore, the stability threshold becomes $n < 7$ (see Note 10). The same critical instability threshold $n$ obtains for $C = 0$ coupled with any level of the intensity of selection $\beta$.
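The threshold can be checked numerically. The sketch below assumes that the eigenvalue governing stability in the linear example is $\lambda = \eta^*(n-1)R' + 1 - \eta^*$ with $R' = -\frac{1}{2}$, an assumption that reproduces the bound $n < (4 - \eta^*)/\eta^*$ stated above. The function names are illustrative.

```python
def eigenvalue(n, eta_star, r_prime=-0.5):
    """Assumed nonzero eigenvalue of the linearized quantity dynamics."""
    return eta_star * (n - 1) * r_prime + 1.0 - eta_star

def critical_n(eta_star):
    """Market size at which the eigenvalue reaches -1: n = (4 - eta*) / eta*."""
    return (4.0 - eta_star) / eta_star

for eta_star in (0.5, 1.0):
    print(eta_star, critical_n(eta_star), eigenvalue(critical_n(eta_star), eta_star))
```

With $\eta^* = \frac{1}{2}$ the critical market size is 7, and with $\eta^* = 1$ (Cournot firms only) it is 3, the threshold of [1].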
In Figure 1 the model is simulated under Logit dynamics with intensity-of-choice parameter $\beta$, as in [26]. Panel (a) depicts a period-doubling route to chaotic quantity dynamics as the number of firms $n$ increases. The first period-doubling bifurcation occurs at $n = 7$, as calculated analytically. Panel (b) displays oscillating time series of the quantities produced by the Cournot and imitation firms, together with the equilibrium quantity $q^*$. The Cournot quantities fluctuate more than the imitation quantities. The stabilizing effect of the imitation firms is clearly visible here: when Cournot firms produce more (less) than the Cournot-Nash equilibrium quantity, the imitation firms produce less (more) than the Cournot-Nash equilibrium quantity and therefore decrease the aggregate deviation from the equilibrium. Panel (c) displays the resulting profit differential $\Pi_C - \Pi_I$. Panel (d) displays the resulting oscillating time series of the Cournot and imitation fractions. Panel (e) shows a phase portrait for the Cournot heuristic and Panel (f) a phase portrait for the imitation heuristic. Panel (g) shows the largest Lyapunov exponent for an increasing number of firms. Game and behavioural parameters are set to $n = 10$, $a = 17$, $b = 1$, $c = 1$, $C_C = 0$, $C_I = 0$, $\beta = 0.05$. Initial conditions are set to $q^C_0 = 0.8$, $q^I_0 = 0.8$, $\eta_0 = 0.5$.

When the evolutionary pressure increases, the system evolves to an equilibrium different from the Cournot-Nash equilibrium, in which the imitation firms produce more than the Cournot-Nash equilibrium quantity whereas the Cournot firms produce less. Imitation profits are therefore much higher and, as a consequence, the complete population switches to the imitation heuristic. The bifurcation diagram is re-plotted in Figure 2 under the same game and behavioral parameters and initial conditions; the only difference is that now $\beta = 3$. When $1.7 < n < 2.8$, the imitation firms produce more than the Cournot-Nash equilibrium quantity while the Cournot firms produce less. This results in higher profits for the imitators and therefore the complete population switches to imitation ($\eta = 0$). When $2.8 \leq n \leq 3.2$, all firms produce the Cournot-Nash equilibrium quantity again, so profits and thus fractions are equal. When $n > 3.2$, the imitation firms again produce more than the equilibrium quantity while the Cournot firms produce less, except when $n$ is close to 3.65, in which case all firms produce the Cournot-Nash equilibrium quantity. Finally, when $n > 5.6$ the imitation firms produce so much that the Cournot firms decide to produce nothing ($q^C = 0$).

Rational vs. Imitation Firms with Switching
As in the previous section, we need a 3-dimensional system to describe the dynamics of the model. In each period the rational firms produce the quantity that maximizes their expected profit, whereas an imitator produces in the next period the average quantity currently played.
The rational quantity dynamics therefore have the following structure: […] A rational firm forms expectations over all possible mixtures of heuristics resulting from randomly drawing $n - 1$ other players from a large population, each of which is a rational firm with probability $\eta_t$ and an imitator with probability $1 - \eta_t$. Rational firm $i$ therefore chooses the quantity $q_i$ that maximizes its objective function, its own expected utility, given the production of the other players and the population fraction. Here $q^R_t$ is the symmetric output level of each of the other rational firms in period $t$, and $q^I_t$ is the output level of each of the imitator firms in period $t$. The first order condition for an optimum is characterized by equality between marginal cost and expected marginal revenue.
Given the value of $q^I_t$ and the fraction $\eta_t$, all rational firms coordinate on the same output level $q^R_t$. This gives the first order condition […] (23). Let the solution to Equation (23) be given by $q^R_t = H^R(q^I_t, \eta_t)$. The full system of equations is thus given by $q^R_{t+1} = H^R(q^I_{t+1}, \eta_{t+1})$ […] where […]. It is easily checked that if the imitators play the Cournot-Nash equilibrium quantity $q^*$, or if all firms are rational, then the rational firms will play the Cournot-Nash equilibrium quantity, that is, $H^R(q^*, \eta) = q^*$ for all $\eta$ and $H^R(q^I, 1) = q^*$ for all $q^I$. Moreover, if a rational firm is certain it will meet only imitation firms (that is, $\eta = 0$), it plays a best response to the average quantity currently played, that is, $H^R(q^I_t, 0) = R((n-1)q^I_t)$ for all $q^I_t$. In the remainder we denote the partial derivatives of $H^R(q, \eta)$ with respect to $q$ and $\eta$ by $H^R_q(q, \eta)$ and $H^R_\eta(q, \eta)$, respectively.
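For linear demand $P(Q) = a - bQ$ and constant marginal cost $c$, these three properties can be verified directly. The closed form below is derived here under those assumptions and is not quoted from the source: the symmetric first order condition $a - c - 2bq^R - b(n-1)(\eta q^R + (1-\eta)q^I) = 0$ is solved for $q^R$. Function names and parameter values are illustrative, and non-negativity of output is not enforced.

```python
def best_response(rivals_output, a=17.0, b=1.0, c=1.0):
    """Cournot best response R(Q) for linear demand and constant marginal cost."""
    return (a - c - b * rivals_output) / (2.0 * b)

def rational_output(q_i, eta, n, a=17.0, b=1.0, c=1.0):
    """Symmetric rational output H_R(q_I, eta) solving the linear first order condition."""
    return (a - c - b * (n - 1) * (1.0 - eta) * q_i) / (b * (2.0 + (n - 1) * eta))

n = 5
q_star = (17.0 - 1.0) / (1.0 * (n + 1))  # Cournot-Nash quantity (a - c) / (b (n + 1))

# H_R(q*, eta) = q*, H_R(q_I, 1) = q*, and H_R(q_I, 0) = R((n - 1) q_I)
checks = (
    rational_output(q_star, 0.3, n) - q_star,
    rational_output(1.234, 1.0, n) - q_star,
    rational_output(1.5, 0.0, n) - best_response((n - 1) * 1.5),
)
```

All three entries of `checks` vanish up to rounding error.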
Proposition 4. The Cournot-Nash equilibrium $(q^*, q^*)$, where all firms produce the Cournot-Nash quantity, is a locally stable fixed point for the model with endogenous fractions of rational and imitation firms if and only if $|\eta^* H^R_q(q^*, \eta^*) + 1 - \eta^*| < 1$. Proof. See Appendix A.
Leading example. Since the stability condition is similar to the condition derived in Section 3.2, the equilibrium quantities $(q^*, q^*)$ and the fraction of rational players $\eta^*$ are stable for all $n$ in this linear specification.
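To see why stability holds for every $n$ in the linear case, the eigenvalue $\eta^* H^R_q(q^*, \eta^*) + 1 - \eta^*$ can be evaluated explicitly. The derivation below assumes linear demand $P(Q) = a - bQ$, constant marginal cost $c$, and the closed form of $H^R$ implied by the symmetric first order condition; it is a sketch under these assumptions rather than the authors' proof.

```latex
\begin{align*}
H^R(q^I,\eta) &= \frac{a - c - b(n-1)(1-\eta)\,q^I}{b\,\bigl(2 + (n-1)\eta\bigr)},
\qquad
H^R_q(q^*,\eta^*) = -\frac{(n-1)(1-\eta^*)}{2 + (n-1)\eta^*},\\[4pt]
\lambda_1 &= \eta^* H^R_q(q^*,\eta^*) + 1 - \eta^*
 = (1-\eta^*)\left[1 - \frac{(n-1)\eta^*}{2 + (n-1)\eta^*}\right]
 = \frac{2\,(1-\eta^*)}{2 + (n-1)\eta^*}.
\end{align*}
```

For any $0 < \eta^* \leq 1$ and $n \geq 1$ this gives $0 \leq \lambda_1 < 1$, so $|\lambda_1| < 1$ regardless of the market size.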

Rational vs. Cournot vs. Imitation with Switching Heuristics
In this section we combine the ideas gathered in Section 4 and investigate the dynamics when the three heuristics discussed before compete. As before, in every round $n$ firms are drawn from a large pool of firms to play the one-shot Cournot game. From this large pool a fraction $\eta^R_t$ plays according to the rational strategy in period $t$, a fraction $\eta^C_t$ plays according to the Cournot heuristic in period $t$, and consequently the fraction of imitators in period $t$ is $1 - \eta^R_t - \eta^C_t$. As in Section 4, the fitness of a heuristic is determined by the average payoff minus the information cost of using that heuristic. Again, average profits are approximated by expected profits but, in contrast to Section 4, the distribution of states now follows a multinomial instead of a binomial distribution. The average profit of a firm producing $q_1$ and competing with other firms that produce either $q_2$ or $q_3$, given the fractions $\eta_1$ and $\eta_2$, is stated below; in this approximation the profit in each state is weighted by the probability of that state.
$\Pi_{1,t} = F(q_{1,t}, q_{2,t}, q_{3,t}, \eta_{1,t}, \eta_{2,t}) =$ […]. The summation is over all possible combinations of $k_1$ and $k_2$, which stand for the number of other firms producing $q_1$ and $q_2$ respectively, that is: […]. Expected profits for heuristic 2 in period $t$ are given by $F(q_{2,t}, q_{1,t}, q_{3,t}, \eta_{2,t}, \eta_{1,t})$, and expected profits for heuristic 3 in period $t$ are given by $F(q_{3,t}, q_{2,t}, q_{1,t}, 1 - \eta_{1,t} - \eta_{2,t}, \eta_{2,t})$.
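A multinomially weighted expected profit of this kind can be sketched as follows, assuming linear inverse demand $P(Q) = \max(a - bQ, 0)$ and constant marginal cost $c$. The function name `expected_profit_3` and the default parameters are illustrative, and the explicit form of the weights is an assumption based on the verbal description of $k_1$ and $k_2$.

```python
from math import comb

def expected_profit_3(q1, q2, q3, eta1, eta2, n, a=17.0, b=1.0, c=1.0):
    """Expected profit of a firm producing q1 when each of its n-1 rivals
    independently produces q1, q2 or q3 with probabilities eta1, eta2 and
    1 - eta1 - eta2 (linear demand, constant marginal cost)."""
    eta3 = 1.0 - eta1 - eta2
    total = 0.0
    for k1 in range(n):                  # rivals producing q1
        for k2 in range(n - k1):         # rivals producing q2
            k3 = n - 1 - k1 - k2         # remaining rivals produce q3
            weight = (comb(n - 1, k1) * comb(n - 1 - k1, k2)
                      * eta1**k1 * eta2**k2 * eta3**k3)
            aggregate = (k1 + 1) * q1 + k2 * q2 + k3 * q3
            price = max(a - b * aggregate, 0.0)
            total += weight * (price - c) * q1
    return total
```

Profits of the other two heuristics follow by permuting the arguments in the same way as in the text, for example `expected_profit_3(q2, q1, q3, eta2, eta1, n)` for heuristic 2.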
The complete dynamical system consists of five equations: three for the quantity dynamics and two describing how the fractions evolve. As in all previous sections, the Cournot firms play in the next period a best response to the current aggregate output of the others, and imitators play in the next period the average quantity produced by the others in the current period. Rational players produce in every period the quantity that maximizes expected payoff given the fractions and production plans of all other firms (imitators and Cournot players, but also other rational players). The rational firms form expectations over all possible mixtures of heuristics resulting from randomly drawing the $n - 1$ other players from the large population of firms. In this setting the rational objective function, the firm's own expected utility, is of the following form: […] (27), where […] denotes the multinomial distribution of the three heuristics in the population of firms and $x = (q^R_t, q^I_t, q^C_t, \eta^R_t, \eta^C_t)$ collects the system's state variables. The first order condition for an optimum of (27) is characterized by equality between marginal cost and expected marginal revenue.
Given the values of $q^C_t$, $q^I_t$, $\eta^R_t$ and $\eta^C_t$, all rational firms coordinate on the same output level $q^R_t$. Differentiating Equation (27) with respect to $q_{i,t}$ gives the first order condition, which is the same for all rational firms: $\frac{\partial U^R_t(q_{i,t} \mid x)}{\partial q_{i,t}} = 0$, which is equivalent to: […]. Let the solution to this condition be given by $q^R_t = H^R(q^C_t, q^I_t, \eta^R_t, \eta^C_t)$. The system of quantity dynamics is thus given by […]. Note that a rational player produces such that expected marginal revenue equals marginal cost at $t + 1$, whereas a Cournot firm produces such that its marginal revenue in period $t$ equals marginal cost in period $t$. Therefore the Cournot heuristic is a lagged version of the rational heuristic if and only if […]. Thus the Cournot heuristic is a lagged version of the rational heuristic only if the inverse demand is linear. In this specific case the analysis becomes easier because the dimension of the dynamical system can be lowered.
It is easily checked that if the imitation and Cournot firms play the Cournot-Nash equilibrium quantity $q^*$, or if all firms are rational, the rational firms will play the Cournot-Nash equilibrium quantity, that is, $H^R(q^*, q^*, \eta^R_t, \eta^C_t) = q^*$ for all $\eta^R$ and $\eta^C$, and $H^R(q^C_{t+1}, q^I_{t+1}, 1, 0) = q^*$ for all $q^C$, $q^I$. In the remainder we denote by […] the partial derivatives of […] with respect to $q^R$, $q^C$, $q^I$, $\eta^R$ and $\eta^C$ respectively, evaluated at the equilibrium $(q^*, q^*, q^*, \eta^{R*}, \eta^{C*})$, which we denote by $x^*$ for notational convenience.

Now that we have the quantity dynamics we can turn to the population dynamics. These are related to the population dynamics of Section 4 but differ significantly, since we are now in a three-heuristic environment. As in Section 4, the population dynamics depend on relative fitness. Let the fraction dynamics be given by $\eta^R_{t+1} = K^R(\Delta U^R_t, \Delta U^C_t)$ and $\eta^C_{t+1} = K^C(\Delta U^R_t, \Delta U^C_t)$, where $\eta^R_{t+1}$ is the fraction of rational firms in period $t + 1$ and $\eta^C_{t+1}$ is the fraction of Cournot firms in that period. Denoting the rational, Cournot and imitation heuristics' costs by $C_R$, $C_C$, $C_I$, respectively, we obtain $\Delta U^R_t = (\Pi^R_t - C_R) - (\Pi^C_t - C_C)$, the difference in average fitness of the rational and the Cournot heuristic, and $\Delta U^C_t = (\Pi^C_t - C_C) - (\Pi^I_t - C_I)$, the difference in average fitness of the Cournot and the imitation heuristic. Note that $K^R$ and $K^C$ are continuously differentiable functions $\mathbb{R}^2 \to [0, 1]$ that take as inputs the difference in fitness of the rational and Cournot heuristics and the difference in fitness of the Cournot and imitation heuristics. The difference in fitness of the rational and imitation heuristics is not used as an input, since this information is captured implicitly in the other two differences. Note that $K^R$ is monotonically increasing in its first and second argument, whereas $K^C$ is decreasing in its first argument but increasing in its second. Furthermore, $K^R(0, 0) = K^C(0, 0) = \frac{1}{3}$. In the remainder of this paper we denote by $K^R_1$ and $K^R_2$ the partial derivatives of $K^R$ with respect to its first and second argument, respectively, and by $K^C_1$ and $K^C_2$ the partial derivatives of $K^C$ with respect to its first and second argument, respectively.
The full system of quantity and population dynamics is given by: […]. Since a dynamical system can depend only on lagged variables, we substituted $\varphi_2$, $\varphi_3$, $\varphi_4$, $\varphi_5$ into $H^R(\cdot)$. To determine the local stability of the unique interior equilibrium $x^*$, we need the eigenvalues of the Jacobian matrix evaluated at that equilibrium. For a general oligopoly and a general selection dynamic $K(\cdot)$, the Jacobian has very complicated eigenvalues that cannot be expressed in a useful form (see Appendix A, Proof of Proposition 5, for the full Jacobian matrix); we therefore restrict the subsequent analysis to the linear-demand, linear-cost oligopoly with Logit dynamics.
Leading example. Since the inverse demand function is linear in this leading example, the Cournot heuristic is a lagged version of the rational heuristic, and the dimension of the dynamical system can therefore be reduced by one. Note that only the Cournot production is a lagged version of the rational production; the Cournot profits and resulting fractions are in general not lagged rational profits and fractions. Therefore, the restriction of the complete 5-dimensional dynamical system to the linear-linear oligopoly with Logit dynamics becomes 4-dimensional: […]. This system has a unique equilibrium in which all firms produce the Cournot-Nash quantity $q^*$. Since production is equal at the equilibrium, profits are equal as well. The equilibrium fractions are therefore a function of the information costs and the evolutionary pressure, given by […]. For general heuristics' costs, and therefore general fractions $\eta^{R*}$ and $\eta^{C*}$, the eigenvalues of the corresponding Jacobian at the rest point are convoluted expressions of the model's parameters, which cannot be meaningfully simplified. Nevertheless, because the rational heuristic is more informationally and computationally intensive than the Cournot and the imitation heuristics, it is reasonable to assume a strictly positive cost $C_R > 0$ associated with its use, while setting the costs of the other two heuristics to $C_C = C_I = 0$. We obtain the following result:

Proposition 5. For a costly rational heuristic ($C_R > 0$) and costless Cournot and imitation heuristics ($C_C = C_I = 0$), the interior Cournot-Nash equilibrium $(q^*, q^*, q^*)$, along with the equilibrium fractions […], is a locally stable fixed point for the model with endogenous fractions of rational, Cournot and imitation firms if and only if […] (34).

From Equation (34) we notice first that, when the information cost of rational play $C_R$ approaches zero, the system is stable for all market sizes $n$. If the cost differential of rational vs. Cournot and imitation play is strictly positive ($C_R > 0$) and the intensity of selection $\beta \to \infty$, the equilibrium is stable for $n < 7$. For intermediate, finite $\beta$, the equilibrium fractions $(\eta^{R*}, \eta^{C*}, \eta^{I*})$, as well as the instability threshold $n$, are functions of both $C_R$ and $\beta$. In the simulations below, with $C_R = 1$ and $\beta = 3$, the equilibrium is stable when $n < 7.42$. When $n = 7.42$, the system undergoes its first bifurcation. The largest eigenvalue is equal to $-1$ at the bifurcation, indicating that the first bifurcation is a period-doubling bifurcation. This is confirmed by the simulations below.
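The thresholds quoted above can be reproduced with a short calculation. The sketch assumes (i) Logit equilibrium fractions with $C_C = C_I = 0$, so that $\eta^{R*} = u/(u+2)$ and $\eta^{C*} = \eta^{I*} = 1/(u+2)$ with $u = e^{-\beta C_R}$, and (ii) that in the linear example deviations of the average output from equilibrium are multiplied each period by $\lambda = (\eta^{I*} - m\,\eta^{C*})/(1 + m\,\eta^{R*})$ with $m = (n-1)/2$. Both the multiplier and the resulting threshold $(7 + u)/(1 - u)$ are derived here from the verbal description of the three heuristics; they are not quoted from the source, and the function names are illustrative.

```python
import math

def equilibrium_fractions(beta, cost_r):
    """Logit equilibrium shares (rational, Cournot, imitation) with C_C = C_I = 0."""
    u = math.exp(-beta * cost_r)
    return u / (u + 2.0), 1.0 / (u + 2.0), 1.0 / (u + 2.0)

def eigenvalue(n, beta, cost_r):
    """Assumed multiplier of average-output deviations in the linear example."""
    eta_r, eta_c, eta_i = equilibrium_fractions(beta, cost_r)
    m = (n - 1) / 2.0
    return (eta_i - m * eta_c) / (1.0 + m * eta_r)

def critical_n(beta, cost_r):
    """Market size at which the multiplier reaches -1 (requires cost_r > 0)."""
    u = math.exp(-beta * cost_r)
    return (7.0 + u) / (1.0 - u)

print(round(critical_n(3.0, 1.0), 2))
```

Under these assumptions the threshold is about 7.42 for $\beta = 3$ and $C_R = 1$, tends to 7 as $\beta \to \infty$, and grows without bound as $C_R \to 0$, in line with the three statements in the text.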
The leading example is simulated in Figure 3. Panel (a) depicts the bifurcation diagram for an increasing number of firms $n$. The first period-doubling bifurcation appears at $n = 7.42$, as calculated analytically. At $n = 11.85$ the system undergoes a Hopf bifurcation, which creates highly non-linear dynamics. For $13 \leq n \leq 14.4$ the system is in a 10-cycle, whereas for $n > 14.4$ the system becomes chaotic again. Panel (b) displays oscillating time series of the quantities produced by the Cournot and imitation firms, together with the equilibrium quantity $q^*$. Since the Cournot quantity in period $t + 1$ equals the rational quantity in period $t$, the rational time series is not included. Panel (c) displays the resulting profits. Note that $\Pi^I_t > \Pi^R_t$ for all $t$ and $\Pi^I_t \geq \Pi^C_t$ for all $t$. Panel (d) displays the resulting oscillating time series of the fractions. Owing to the information cost, the sophisticated rational firms do not perform better than the Cournot and imitation firms, resulting in low fractions of rational firms. Moreover, since the imitation profit is at least as high as the Cournot profit, the resulting imitation fraction is at least as high as the Cournot fraction. Panel (e) shows the largest Lyapunov exponent for an increasing number of firms, whereas Panel (f) shows the largest Lyapunov exponent for increasing $\beta$. Game and behavioural parameters are set equal to: […].

Last, Figure 4 shows some attractors of the evolutionary model for increasing evolutionary pressure, with (quasi-)periodic motion just after the second bifurcation and breaking of the invariant circles into a strange attractor as the number of firms increases further. A similar 'breaking of the invariant circles' route to chaos appears for the rational and Cournot series.

Concluding Remarks
In this paper we set out to fill a gap in the literature on heterogeneous heuristics in a Cournot oligopoly with boundedly rational players. Partly motivated by the experimental evidence for imitate-the-average behavior in oligopoly games, we focus on better understanding the role this specific imitation rule plays in a competitive Cournot environment populated by myopic best-reply and rational (equilibrium) firms. In a population game framework (random re-matching of players, with fitness approximated by the expected payoffs accruing in all possible realizations of opponents' heuristic types), the interplay between two of the model's parameters, the differential cost $C$ between heuristics and the intensity $\beta$ of evolutionary selection acting upon heuristics, determines the critical market size at which the unique interior Cournot-Nash equilibrium of the underlying oligopoly loses stability.
For the pairwise evolutionary contests between heuristics, we first showed that when Cournot firms compete with imitators, if $C > 0$ (the imitation heuristic enjoys a strict cost advantage over the myopic Cournot adjustment) and the intensity of selection $\beta$ is very large (the $\beta \to \infty$ limit), then the Cournot equilibrium is locally stable for $n < 7$. Absent an information cost differential (i.e., $C = C_C - C_I = 0$), the threshold on the number of firms at which the system turns from stable to unstable remains 7, irrespective of the magnitude of the evolutionary pressure $\beta$. In contrast with the model comprising Cournot players only (where the Cournot equilibrium is already unstable for $n = 3$), the addition of players using the relatively cheaper imitate-the-average heuristic stabilizes the equilibrium. Secondly, when rational firms compete with imitators, the system is always stable, regardless of the information costs of the more sophisticated rational choice.
For the full ecology of behavioral rules (rational firms, Cournot firms and imitators) we are able to derive the (in)stability threshold on the number of firms for the linear inverse demand, linear cost oligopoly with Logit dynamics and for a particular configuration of heuristics' costs: a costly rational heuristic ($C_R \geq 0$) and costless Cournot and imitation rules ($C_C = C_I = 0$). On the one hand, if rational play is costless, the equilibrium is stable for all possible market sizes $n$. On the other hand, even for small, strictly positive $C_R$, the interior Cournot equilibrium loses stability at $n = 7$ (in the $\beta \to \infty$ limit) and at larger market sizes $n > 7$ (in the case of finite $\beta$). Complicated endogenous fluctuations (two-cycles, strange attractors) in the quantities played and in the fractions of the various heuristics may occur, in particular when the evolutionary pressure is high.
By deriving the stability conditions for our pairwise and three-way evolutionary models, we conclude that introducing imitators tends to stabilize the dynamics, provided that imitation is the least costly heuristic in the menu of potential learning rules.
At least two avenues seem promising for future research. Firstly, our analysis deals with Cournot oligopolies that display a unique interior Cournot-Nash equilibrium and, therefore, we conjecture that our local stability analysis also holds globally. This is, of course, not valid for the Type II oligopolies that display multiple (interior and boundary) Nash equilibria. How the (in)stability of these equilibria, along with the relative size of their basins of attraction, varies with the market size $n$ under the full ecology of three switching heuristics proposed in this paper remains an unexplored question.
Secondly, related to the implications of imitative behavior in heterogeneous-heuristics environments, it would be worth considering alternative models of imitation. Our way of modelling imitation favors average play (imitate the average behavior in the population), but other imitation rules well studied in the literature may be envisaged: for instance, a version of the imitation heuristic that copies the past production decision of the most profitable firm in the entire population, or one that imitates the successful players only among the $m$ closest neighbors (in a location model à la Hotelling). In these alternative imitation scenarios, coordination on a non-Cournot equilibrium may arise, for instance on the Walrasian equilibrium, as in [8] under homogeneous imitate-the-best play.

[…] This leaves us to examine the partial derivatives of $\psi_1$ with respect to $q^R_t$, $q^I_t$ and $\eta_t$, evaluated at the equilibrium: […]. Therefore the Jacobian matrix, evaluated at the equilibrium, is given by […], which has eigenvalues $\lambda_1 = \eta^* H^R_q(q^*, \eta^*) + 1 - \eta^*$, $\lambda_2 = 0$ and $\lambda_3 = 0$. Consequently the system is locally stable when $|\lambda_1| < 1$, which is exactly the condition stated in Proposition 4. Note again the similarity with the condition in Section 3, where we fixed the fraction $\eta$.
Proof of Proposition 5. It can easily be shown that the partial derivatives of $\varphi_3$ with respect to $q^R$, $q^C$, $q^I$, $\eta^R$ and $\eta^C$, evaluated at the equilibrium, are $\eta^{R*}$, $\eta^{C*}$, $1 - \eta^{R*} - \eta^{C*}$, 0 and 0, respectively. To determine the partial derivatives of $\varphi_4$ and $\varphi_5$ we need the partial derivatives of $\Delta U^R_t$ and $\Delta U^C_t$. In accordance with Section 4.2 we can write the first profit differential as […], which does not depend upon the produced quantities, and […]. In addition, the partial derivatives of $D_{k_1,k_2}(x)$ evaluated at the equilibrium are given by […], where the second equalities follow from the fact that $P'(Q^*)q^* + P(Q^*) - C'(q^*) = 0$ is the first order condition of any firm in a Cournot-Nash equilibrium. Using this, it follows immediately that the partial derivatives of $\varphi_4$ are given by: […]. Furthermore, the partial derivatives of $\varphi_5$ are given by […]. The Jacobian of the system, evaluated at the equilibrium $x^*$, is thus given by […]. The Jacobian of the system in the leading example, evaluated at the equilibrium, is therefore given by […]. For the calculation of the eigenvalues, $J_{13}$ and $J_{14}$ are irrelevant because the third and fourth rows contain only zeros. For $C_R > 0$ and $C_C = C_I = 0$ the eigenvalues are a function of $n$, $\beta$ and $C_R$ only: […]. For the system to be stable we need $|\lambda_4(n, C_R \beta)| < 1$. Note that $\lambda_4$ is always less than 1. Rearranging gives the threshold number of firms $\psi(C_R \cdot \beta)$ stated in the proposition (see Note 11).

Notes

1 Firms that display Cournot behaviour take the current period's aggregate output of their competitors as a predictor for the competitors' aggregate output in the next period and best-respond to that.

2 In models with heterogeneous expectations producers can have different heuristics to adjust their production.

4 We thank an anonymous referee for pointing out this literature on preferences evolution as an alternative to modelling heuristics (rules) evolution in game dynamics.

5 Sufficient conditions for the existence and uniqueness of the interior Cournot-Nash equilibrium are that $P(\cdot)$ is twice continuously differentiable and nonincreasing, and that $C(\cdot)$ is twice continuously differentiable, nondecreasing and convex; see [24].

6 Ref. [6] provides an example of a Type II duopoly (linear demand, and marginal costs decreasing faster than the demand). Such Type II oligopolies display, in addition to the interior Cournot-Nash equilibrium, two boundary Nash equilibria with one of the firms producing the monopoly output and the other producing nothing.

7 Throughout the paper, a (locally) asymptotically stable fixed point (or steady state) $x^*$ is a fixed point of the dynamical system which attracts all initial conditions in a neighborhood of $x^*$. Technically, $x^*$ is the $\omega$-limit set of all initial conditions $x_\varepsilon(0)$ that lie within $\varepsilon$ distance from it. A fixed point is locally asymptotically stable (a sink) if all eigenvalues of the Jacobian matrix, evaluated at the equilibrium, lie within the unit circle. If at least one eigenvalue lies outside the unit circle, the fixed point is called unstable [25].

8 Note that the population dynamics remain in the interior of the unit simplex for finite $\beta$. This implies that in each time period all behavioral rules are present in the population and no behavioral rule will ever vanish (the so-called no-extinction condition). Furthermore, no new behavioral rules emerge from this model (the so-called no-creation condition). The property that the simplex is invariant under the $K(\cdot)$ dynamic also ensures that fractions, and therefore quantities played, remain bounded.

9 This rational choice solution exists and is unique under standard assumptions such as strictly concave inverse demand functions $P(\cdot)$ and nondecreasing cost functions $C(\cdot)$. Outside this class of oligopolies, for the case of multiple (local) maxima, we make the additional assumption that rational players are able to coordinate on the global maximum of the profit function.

10 Notice that in the opposite case $C < 0$ (i.e., the imitation heuristic is costlier than the Cournot heuristic) and large selection pressure $\beta \to \infty$, the equilibrium fraction of Cournot players $\eta^*$ approaches 1 and we recover the instability threshold of [1] with Cournot-only players, $n = 3$.

11 Economically, this threshold number of firms can only be an integer, but mathematically the number of firms can be treated as a continuous variable.

Figure 1. Linear n-player Cournot game with endogenous fraction dynamics.

Figure 3. Linear n-player Cournot competition between rational, Cournot and imitation firms with endogenous fraction dynamics.

3 Essentially, a Cournot adjustment process with long memory, introduced by [7].