Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model

Cieśla, Michał

doi:10.3390/e27060639

Open AccessArticle

Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model

by

Michał Cieśla

Institute of Theoretical Physics and Mark Kac Center for Complex Systems Research, Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland

Entropy 2025, 27(6), 639; https://doi.org/10.3390/e27060639

Submission received: 4 May 2025 / Revised: 1 June 2025 / Accepted: 14 June 2025 / Published: 15 June 2025

(This article belongs to the Collection Social Sciences)

Download

Browse Figures

Versions Notes

Abstract

This study introduces a simple model, which can be used to examine the influence of reputation on expected income achieved within the Iterated Prisoner’s Dilemma (IPD) game framework. The research explores how different reputation distributions among society members impact overall outcomes by modeling a society of agents, each characterized by a reputation score that dictates their likelihood of cooperation. Due to the simplicity of the model, we can analytically determine the expected incomes based on the distribution of agents’ reputations and model parameters. The results show that a higher reputation generally leads to greater expected income, thereby promoting cooperation over defection. However, in some cases, where there are more defecting individuals, the expected income reaches the maximum for agents with an average reputation, and then decreases for individuals who cooperate more. Various scenarios, including uniform, increasing, and decreasing reputation distributions, are analyzed to understand their effects on the promoted interaction strategy. Finally, we outline future extensions of the model and potential research directions, including the exploration of alternative reputation distributions, variable interaction parameters, and different payoff structures in the dilemma games.

Keywords:

iterated prisoner’s dilemma; reputation; cooperation and defection; agent model

1. Introduction

The Prisoner’s Dilemma (PD) is a simple two-player game in which each player must choose between two available strategies: cooperate (C) or defect (D) [1,2]. When both players cooperate, each receives a reward of R. If both of them are defective, each receives a punishment of P. If one player cooperates while the other defects, the cooperating player receives the sucker’s payoff S, while the defecting player receives the temptation payoff T. Typically,

T > R > P > S

and

2 R > T + S

. For example,

T = 5

,

R = 3

,

P = 1

, and

S = 0

[2]. These rules can be written in the form of a payoff matrix:

\begin{array}{c} \begin{matrix} C D \end{matrix} \\ \begin{matrix} C & R S \\ D & T P \end{matrix} \end{array} ⟶ \begin{array}{c} \begin{matrix} C D \end{matrix} \\ \begin{matrix} C & 3 0 \\ D & 5 1 \end{matrix} \end{array}

(1)

When PD is played only once, a rational player should defect, as it always yields a higher payoff than cooperation, regardless of what the other player does. Therefore, the strategy when both players defect

(D, D)

corresponds to PD’s Nash equilibrium [3]. It remains unchanged when the game is repeated a definite number of times. In the last game, the optimal strategy for a rational player is to defect. As a result of the last game, both players are aware that the penultimate game becomes the last game, which should be taken into consideration. Therefore, using backward induction, one can prove that the Nash equilibrium corresponds to a

(D, D)

strategy for each game in a sequence. However, this strategy is not optimal, as both players could receive a higher reward of

R > P

if they cooperate. Therefore, especially when the number of consecutive games between the same two players is unknown, for example, if the probability of a repeated interaction with the same player is high enough, the optimal strategy can change to

(C, C)

.

In general, such an iterated version of the Prisoner’s Dilemma (IPD) is one of the toy models in game theory actively used to study the phenomenon of cooperation, detection, and reputation [4,5,6,7,8,9,10,11,12,13,14]. For example, Cooper et al. explored the mechanisms behind cooperative behavior in PD games, particularly in settings where reputation effects are minimized or eliminated, and argued that a combination of altruism and limited reputation effects may better explain observed behavior [5]. However, numerous studies indicate that reputation is a crucial factor in strengthening cooperation [10,11,15,16,17,18,19]. Second-order reputation evaluation, where an individual’s reputation is updated based on their strategy and the reputations of their neighbors, has been found to enhance cooperation beyond spatial reciprocity [15]. The value of reputation is demonstrated by its impact on expected future payoffs and its tradability in reputation markets [16]. Experiments with random-matching PD games that incorporate information about opponents’ past actions and previous opponents’ behaviors have shown significant improvements in cooperation, especially when subjects have experienced low cooperation in no-information games [17]. These findings have important implications for the design and implementation of reputation systems in various contexts, including e-commerce and credit reporting [16,17].

Interestingly, in a spatial version of IPD, when cooperating or defecting players interact with their closest neighbors, clusters of cooperating agents can form [20,21]. Similar behavior was observed for humans playing the IPD game [22].

It is worth stressing that the type of interaction described in the Prisoner’s Dilemma game is observed in real life [1,9]. The IPD framework is also used to study trust and conflict resolution in long-term relationships [23]. It has been applied in disciplines such as economics, where businesses decide whether to compete or collaborate over repeated interactions [2,24], and in international relations, where countries develop policies of cooperation or retaliation based on others’ past actions, such as in arms control agreements or trade negotiations [25]. In evolutionary biology, IPD models how cooperative behavior can emerge and stabilize within populations over time, even when short-term incentives favor selfish behavior [23,26]. A key insight from the influential study by Axelrod and Hamilton was that simple strategies such as Tit-for-Tat can promote robust cooperation, even in environments where defection seems advantageous individually [2,25].

In this study, the PD game models the bilateral trading within a society of agents [27,28,29]. The general idea is based on assumptions taken from [29], where a randomly selected agent chooses a business partner based on the wealth of the partner. However, here, the second agent is chosen based on its reputation, which reflects its ability to cooperate rather than defect. It is based on the observation that in a modern world with almost no barriers to spreading information, reputation, along with price, is the most important factor when making trading arrangements [30,31]. Thus, agents who cooperate will be chosen more frequently as transaction parties than those who defect [6,32]. The main objective of this study is to propose a simple model that takes into account the most important factors in bilateral trading and can also lead to nontrivial conclusions. The study of the proposed model focuses on determining how the expected income of a member of such a society depends on their reputation, measured in terms of the probability of cooperation.

This paper is organized as follows: Section 2 describes how the trading agents are chosen and what the results of their interaction are. In Section 3, the expected income dependence on the reputation of the agents is derived for several different distributions of the reputation of the agents in the society. The results obtained are discussed in Section 4, after which the study is concluded.

2. Model

The model is based on two assumptions. First, each agent can play the PD game, and second, they have a limited opportunity to choose an opponent. They want to play with one who cooperates the most. To formulate it, let us assume that there are

N ≫ 1

independent agents. Each agent is characterized by a single parameter q, which represents the agent’s reputation. This reputation is defined here as the probability that the agent will cooperate with another agent during a game, regardless of the other agent’s reputation. Two interacting agents are selected from the entire population using the following rules:

The first agent is chosen randomly with uniform probability. Thus, the probability of selecting a given agent is $P_{1} = 1 / N$ , and it does not depend on the reputation of the agent.
The second agent is selected in a two-stage process. Firstly, $n < N$ agents are randomly selected according to a uniform probability (the probability of selecting a given agent is $n / (N - 1)$ ). Then, from this group of n agents, one with the highest reputation is taken.

This type of matching is often referred to as a selective assortment [33,34], however, there is no classical setup when agents are divided into two groups: always cooperating and always defecting [35]. Here, most players can use both strategies simultaneously, albeit with different probabilities, which seems to be closer to reality. The probability that the highest reputation

q_{m a x}

among the group of n agents is smaller than a given value of x is

P r o b (q_{m a x} < x) = P r o b (q_{1} < x) \cdot P r o b (q_{2} < x) \cdot . . . \cdot P r o b (q_{n} < x) = {[P r o b (q < x)]}^{n} .

(2)

Note that the above equation uses the cumulative distribution functions, and thus, the probability density function of the second agent’s reputation is

p_{q_{m a x}} (x) = \frac{d}{d x} P r o b (q_{m a x} < x) = n {[P r o b (q < x)]}^{n - 1} p_{q} (x),

(3)

where

p_{q} (x)

is the probability density function of finding an agent of reputation q in the agents’ society.

Two selected agents interact with each other and choose their strategy (cooperate or defect) randomly and independently, based on their reputation. Thus, the result of the game for the first agent is

G_{1} (q_{1}, q_{2}) = q_{1} [q_{2} R + (1 - q_{2}) S] + (1 - q_{1}) [q_{2} T + (1 - q_{2}) P] .

(4)

Analogously,

G_{2} (q_{1}, q_{2}) = G_{1} (q_{2}, q_{1})

. Each agent collects their gain from this game, and the entire process repeats: two agents are selected, they interact, gain the prize, and the process continues. For convenience and without loss of generality, let us assume that there are N such iterations. It is also assumed that the agents’ reputations remain unchanged during the whole process, which differs from other approaches, where agents’ reputations are adjusted to their former games [36].

3. Results

The expected income of the first selected agent of reputation q is

I_{1} (q) = \int_{0}^{1} p_{q_{m a x}} (x) G_{1} (q, x) d x,

(5)

because his partner’s reputation is given by the

p_{q_{m a x}} (x)

probability density function. On the other hand, the expected gain of the second agent is

I_{2} (q) = \int_{0}^{1} p_{q} (x) G_{1} (q, x) d x .

(6)

Thus, the total expected income of a given agent after N iterations is

I (q) = N [\frac{1}{N} I_{1} (q) + \frac{n}{N - 1} {(\int_{0}^{q} p_{q} (x) d x)}^{n - 1} I_{2} (q)],

(7)

where the two components relate to the possibility of being the first or the second chosen agent, respectively. The prefactors before

I_{1} (q)

and

I_{2} (q)

correspond to the probabilities that the agent will be selected as a first and second player, respectively. The integral in the second component corresponds to the probability that the reputation of the second agent is the highest among the group of n chosen agents—see (2).

In the limit of large N, the expected income of a given agent (7) depends only on his reputation and parameters of the model, which here are n—the size of the group of agents from which we select the one with highest reputation, and

p_{q} (x)

—the reputation probability density function. Because we do not know which distribution of

p_{q} (x)

resembles reality the best, we will analyze the model output for several typical example forms of

p_{q} (x)

, namely, the uniform distribution, increasing and decreasing distributions—where more players are of higher or lower reputation, respectively. Next, we look at a distribution condensed around

q = 0.5

, and one that promotes simultaneously agents of very high and very low reputation.

3.1. Case 1: $p_{q} (x) = C o n s t$

The first studied case is of a reputation uniformly distributed in the society of the agents:

p_{q} (x) = \{\begin{matrix} 1 & x \in [0, 1] \\ 0 & otherwise \end{matrix} .

(8)

Here, the expected income can be calculated analytically:

\begin{matrix} I (q) & = & \frac{1}{2} n q^{n - 1} (- P q + P + q R + q S - q T + T) \end{matrix}

(9)

\begin{matrix} + \frac{n q R - n q T + n T - P q + P + q S}{n + 1}, \end{matrix}

(10)

which, for standard PD parameters

T = 5

,

R = 3

,

P = 1

and

S = 0

, reduces to

I (q) = \frac{1}{2} n (6 - 3 q) q^{n - 1} + \frac{- 2 n q + 5 n - q + 1}{n + 1} .

(11)

The character of income dependence on q strongly depends on the parameter n. For example, for

n = 1

,

I_{n = 1} (q) = 6 - 3 q

, and its maximum corresponds to

q = 0

, which is in accordance with the Nash equilibrium of the PD game. For

n = 2

,

I_{n = 2} (q) = \frac{1}{3} (- 9 q^{2} + 13 q + 11)

, and the “always defect” strategy becomes the worst. The expected income has a maximum at

q = \frac{13}{18} \approx 0.722

. For

n = 3

,

I_{n = 3} (q) = - \frac{9}{2} q^{3} + 9 q^{2} - \frac{7}{4} q + 4

, and here we see a global minimum for

q = \frac{1}{18} (12 - \sqrt{102}) \approx 0.106

, but the maximum income is achieved by the most reputable agent:

I (q = 1) = 6.75

. For higher n, the character of

I (q)

does not change qualitatively. As n increases, the minimum shifts to the right, and the maximum at

q = 1

increases. These and other results are presented in Figure 1.

3.2. Case 2: $p_{q} (x) \sim x$

Now, assume that the probability of finding an individual with a given reputation increases linearly with it. For example,

p_{q} (x) = \{\begin{matrix} 2 x & x \in [0, 1] \\ 0 & otherwise \end{matrix},

(12)

which leads to

\begin{matrix} I (q) & = & \frac{1}{3} n q^{2 (n - 1)} (- P q + P + 2 q R + q S - 2 q T + 2 T) \end{matrix}

(13)

\begin{matrix} + \frac{2 n q R - 2 n q T + 2 n T - P q + P + q S}{2 n + 1} . \end{matrix}

(14)

For

T = 5, R = 3, P = 1

and

S = 0

the above relation reduces to

I (q) = \frac{1}{3} n (11 - 5 q) q^{2 (n - 1)} + \frac{- 4 n q + 10 n - q + 1}{2 n + 1} .

(15)

As in the previous case, the expected income depends on the parameter n. These dependencies are shown in Figure 2.

Although there are not many qualitative differences with the previous case of constant reputation distribution

p_{q} (x)

, it is worth noting the following:

For $n = 1$ , the maximum for $q = 0$ is higher as there are more opponents, who prefer to cooperate;
For $n = 2$ , there is no maximum for medium q. In contrast, we observe the minimum near $q = 0.134$ . The position of this minimum shifts to the right as n increases;
For $n \geq 2$ , the maximum is at $q = 1$ and it grows with the increase of n.

3.3. Case 3: $p_{q} (x)$ Decreases with an Increase of x

An opposite case is when the majority of agents have a poor reputation. Here, let us consider two possibilities. In the first one, the decrease is given by a linear function

p_{q} (x) = \{\begin{matrix} 2 - 2 x & x \in [0, 1] \\ 0 & otherwise \end{matrix},

(16)

and in the second one, the decrease is exponential

p_{q} (x) = \frac{a exp (a)}{exp (a) - a - 1} \{\begin{matrix} exp (- a x) - exp (- a) & x \in [0, 1] \\ 0 & otherwise \end{matrix} .

(17)

Note that the prefactor

[a exp (a)] / [exp (a) - a - 1]

is to normalize the distribution, and the component

exp (- a)

is to ensure that

p_{q} (x = 1) = 0

. The parameter

a > 0

corresponds to the slope of the exponential decrease—the larger a is, the faster

p_{q} (x)

decreases.

In the case of linear decrease, some of the integrals (5)–(7) that define the expected income for arbitrary n cannot be evaluated using only elementary functions. However, for specific n values,

I (q)

has a polynomial form. For exponential decrease, the general form of the expected income is complicated, but it can still be calculated analytically for specific values of parameters. The results are presented in Figure 3.

For

n = 1

, as previously, the “always defect” (

q = 0

) strategy is the best, but it is much less profitable due to the higher probability of selecting a defecting opponent:

I_{n = 1} (0) \approx 4.667

for the linear decay, and

I_{n = 1} (0) \approx 3.460

for exponential one. For

n = 2

, we observe the maximum

I_{n = 2} (0.489) \approx 4.634

for the linear decay, and

I_{n = 2} (0.297) \approx 3.897

for exponential one. These maxima increase with the growth of the parameter n and also move in the direction of larger reputation q. For example,

I_{n = 3} (0.687) \approx 5.571

,

I_{n = 5} (0.831) \approx 7.624

, and

I_{n = 10} (0.924) \approx 12.775

for the linear decay, and

I_{n = 3} (0.413) \approx 4.746

,

I_{n = 5} (0.532) \approx 6.352

, and

I_{n = 10} (0.661) \approx 9.965

for exponential decay. In contrast to previous cases, all these maxima are not reached for the maximum reputation of

q = 1

. It is worth mentioning that in the exponential case, there are fewer individuals with higher reputation than in the linear case. Thus, in general, the more defective players in the society, the lower the reputation, which guarantees the largest profit.

To complete the analysis, we study two more cases. The first is when the distribution of

p_{q} (x)

has the maximum for

q = 0.5

and the minima for

q \in {0, 1}

. The second, which is opposite to the first one, namely,

p_{q} (x)

, has a minimum of

q = 0.5

and maxima of

q \in {0, 1}

.

3.4. Case 4: $p_{q} (x)$ with Maximum at $q = \frac{1}{2}$ and Minima for $q \in {0, 1}$

To model this case, we chose the probability distribution function of reputation q defined as follows:

p_{q} (x) = \{\begin{matrix} 6 x (1 - x) & x \in [0, 1] \\ 0 & otherwise \end{matrix} .

(18)

Similarly to the previous case, the total income for arbitrary n does not have a compact, elementary form, but for specific n, it is again given by a polynomial in reputation q. The results are shown in Figure 4.

Qualitatively, the results are similar to those from the previous case. The main difference is that the “always defect” strategy yields better results here, especially for

n = 1

. For larger n, the maxima are higher and also closer to the

q = 1

limit.

3.5. Case 5: $p_{q} (x)$ with Minimum at $q = \frac{1}{2}$ and Maxima for $q \in {0, 1}$

In the last studied case, we also use a quadratic function to model agents’ reputation distribution:

p_{q} (x) = \{\begin{matrix} - 12 x (1 - x) + 3 & x \in [0, 1] \\ 0 & otherwise \end{matrix}

(19)

Here, also, income

I (q)

has a polynomial form for specific values of the parameter n. The results are presented in Figure 5.

The results here are the most similar to those from Case 2, where the decreasing distribution of agents’ reputation was studied. However, it should be noted that for moderate values of n, we observe some fine structure of local maxima around

q \approx 0.4

and minima near

q \approx 0.8

. Finally, for the studied models, we can estimate the expected agent’s income in the society

E (I) = \int_{0}^{1} p_{q} (x) I (x) d x,

(20)

and its dispersion

σ (I) = \sqrt{\int_{0}^{1} p_{q} (x) {(I (x) - E [I])}^{2} d x} .

(21)

The results are collected in Table 1.

The largest overall results are obtained for case 2, where cooperation is more probable. On the other hand, the lowest expected income is for case 3 with exponential probability decay, where most agents defect. Similarly, the dispersion is larger where more cooperating agents are present. Interestingly, if we treat dispersion as a measure of inequality, we see that more egalitarian societies tend to be poorer and contain a higher proportion of defecting members. Comparing the two other cases, the better results are for case 5, where, again, there are many agents with higher reputations. Thus, on average, the studied model fosters cooperation over defection, as is the case in the IPD game.

All notebooks with the results, calculations, and analytical formulas are attached as supplementary materials. The reader is encouraged to use them and experiment with the model’s results using different parameters.

4. Discussion

Despite the significant differences in the assumed reputation distribution among the agents’ society, all the results presented are qualitatively quite similar.

First, for

n = 1

, the “always defect” strategy is the most profitable [7,8]. In this case, the reputation does not affect the number of possible interactions that could lead to more opportunities to gain a positive reward from the game. In N games, the player plays twice on average and obtains a temptation (T) or a punishment (P) payoff, depending on the opponent’s behavior. The strategy is more prominent where opponents have higher reputations, as the temptation (T) payoff occurs more often.

On the other hand, the “always defect” strategy is the worst for

n = 2

. To better understand this, let us revisit the first case and the relation (7) that describes the income

I (q)

. The “always defect” agent will play only if they are chosen as the first player, and the payoff from this game is

I (q = 0) = I_{1} (q = 0) = \int_{0}^{1} p_{q} (x) [x T + (1 - x) P] d x,

(22)

which, in the case of

p_{q} (x) = c o n s t

,

T = 5

, and

P = 1

, reduces to

I (q = 0) = \frac{1 + 5 n}{1 + n}

, while for

p_{q} (x) \sim x

and the same T and P, to

I (q = 0) = \frac{1 + 10 n}{1 + 2 n}

. Both of these relations grow with increasing n. This is because the larger the set of agents selected as potential opponents, the higher the expected reputation of the most reputable agent in this set. Thus, the temptation (T) payoff for the first, “always defect”, player becomes more probable than the punishment (P) one. Therefore,

n = 2

is the worst possible case for such a player.

For higher n, agents with a high reputation gain greater expected rewards. This is because their reputation often leads them to play more often as second-tier players. It is especially profitable when there are many reputable players with whom they can cooperate and receive a reward of R. Only in the third and fourth of the studied cases, where there are less reputable agents, is the “always cooperate” strategy not optimal, as opponents often defect, resulting in no payoff for the cooperating player. In such a case, the optimal reputation is lower than

q = 1

, especially when most players are defective, as in case 3 with exponential decay of the probability of finding a cooperative opponent. Either way, the larger the parameter n, the higher the optimal reputation. On the other hand, for each studied case, the dependence of expected income on the player’s reputation is typically highly nonlinear and nonmonotonic, which agrees with the previous observation that the coexistence of cooperating and defective players may lead to changes in the Nash equilibrium [37].

5. Conclusions

We presented a simple model of a society of agents that interact with themselves based on the rules of the Prisoner’s Dilemma game. Agents can cooperate or defect according to their reputation attribute. We examined the model with six specific distributions of agents’ reputation in the society. The results obtained suggest that, despite significant differences in reputation distributions, all variants studied here have many common characteristics. For example, agents with a higher reputation generally reach the maximum expected income [38]. On the other hand, a medium reputation often gave worse incomes than the “always defect” strategy.

Because the model can be solved analytically, it opens up various opportunities for future studies, ranging from other distributions of reputations to the introduction of the variable parameter n from one game to another. Other possible extensions can include fluctuations in the reputation distribution, interaction probabilities based on the distance between agents defined by a graph structure, or in a different manner [39]. Additionally, a player’s strategy can change depending on its location [40] or history of past games [41]. Moreover, instead of total income, one can study income distribution and wealth inequality [42]. Agents could also adjust their strategies to maximize the expected rewards [43]. Lastly, different Prisoner’s Dilemma game payoff tables can be tested, or even other dilemma games, e.g., the Snowdrift game or Stag Hunt game [21,44].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e27060639/s1.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the supplementary material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The research for this publication has been supported by a grant from the Priority Research Area digiWorld under the Strategic Programme Excellence Initiative at Jagiellonian University.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PD	Prisoner’s Dilemma
IPD	Iterated Prisoner’s Dilemma

References

Rapoport, A.; Chammah, A.M. Prisoner’s Dilemma: A Study in Conflict and Cooperation; University of Michigan Press: Ann Arbor, MI, USA, 1965; Volume 165. [Google Scholar]
Axelrod, R. The Evolution of Cooperation; Basic Books: New York, NY, USA, 1984. [Google Scholar]
Nash, J.F., Jr. Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 1950, 36, 48–49. [Google Scholar] [CrossRef] [PubMed]
Perc, M.; Jordan, J.J.; Rand, D.G.; Wang, Z.; Boccaletti, S.; Szolnoki, A. Statistical physics of human cooperation. Phys. Rep. 2017, 687, 1–51. [Google Scholar] [CrossRef]
Cooper, R.; DeJong, D.V.; Forsythe, R.; Ross, T.W. Cooperation without Reputation: Experimental Evidence from Prisoner’s Dilemma Games. Games Econ. Behav. 1996, 12, 187–218. [Google Scholar] [CrossRef]
Press, W.H.; Dyson, F.J. Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. Proc. Natl. Acad. Sci. USA 2012, 109, 10409–10413. [Google Scholar] [CrossRef] [PubMed]
Hilbe, C.; Nowak, M.A.; Sigmund, K. Evolution of extortion in iterated prisoner’s dilemma games. Proc. Natl. Acad. Sci. USA 2013, 110, 6913–6918. [Google Scholar] [CrossRef]
Wang, Z.; Zhou, Y.; Lien, J.W.; Zheng, J.; Xu, B. Extortion can outperform generosity in the iterated prisoner’s dilemma. Nat. Commun. 2016, 7, 11125. [Google Scholar] [CrossRef]
Dal Bó, P.; Fréchette, G.R. Strategy Choice in the Infinitely Repeated Prisoner’s Dilemma. Am. Econ. Rev. 2019, 109, 3929–3952. [Google Scholar] [CrossRef]
Mieth, L.; Buchner, A.; Bell, R. Moral labels increase cooperation and costly punishment in a Prisoner’s Dilemma game with punishment option. Sci. Rep. 2021, 11, 10221. [Google Scholar] [CrossRef] [PubMed]
Gong, Y.; Liu, S.; Bai, Y. Reputation-based co-evolutionary model promotes cooperation in prisoner’s dilemma game. Phys. Lett. A 2020, 384, 126233. [Google Scholar] [CrossRef]
Gächter, S.; Lee, K.; Sefton, M.; Weber, T.O. The role of payoff parameters for cooperation in the one-shot Prisoner’s Dilemma. Eur. Econ. Rev. 2024, 166, 104753. [Google Scholar] [CrossRef]
You, T.; Zhang, H.; Zhang, Y.; Li, Q.; Zhang, P.; Yang, M. The influence of experienced guider on cooperative behavior in the Prisoner’s dilemma game. Appl. Math. Comput. 2022, 426, 127093. [Google Scholar] [CrossRef]
Yang, Z.; Zheng, L.; Perc, M.; Li, Y. Interaction state Q-learning promotes cooperation in the spatial prisoner’s dilemma game. Appl. Math. Comput. 2024, 463, 128364. [Google Scholar] [CrossRef]
Dong, Y.; Sun, S.; Xia, C.; Perc, M. Second-Order Reputation Promotes Cooperation in the Spatial Prisoner’s Dilemma Game. IEEE Access 2019, 7, 82532–82540. [Google Scholar] [CrossRef]
Pfeiffer, T.; Tran, L.; Krumme, C.; Rand, D.G. The value of reputation. J. R. Soc. Interface 2012, 9, 2791–2797. [Google Scholar] [CrossRef]
Gong, B.; Yang, C.L. Reputation and Cooperation: An Experiment on Prisoner’s Dilemma with Second-Order Information. 2010. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1549605 (accessed on 13 June 2025).
Lu, P.; Wang, F. Heterogeneity of inferring reputation probability in cooperative behaviors for the spatial prisoners’ dilemma game. Phys. A Stat. Mech. Its Appl. 2015, 433, 367–378. [Google Scholar] [CrossRef]
Samu, F.; Számadó, S.; Takács, K. Scarce and directly beneficial reputations support cooperation. Sci. Rep. 2020, 10, 11486. [Google Scholar] [CrossRef]
Nowak, M.A.; May, R.M. Evolutionary games and spatial chaos. Nature 1992, 359, 826–829. [Google Scholar] [CrossRef]
Benko, T.P.; Pi, B.; Li, Q.; Feng, M.; Perc, M.; Blažun Vošner, H. Evolutionary games for cooperation in open data management. Appl. Math. Comput. 2025, 496, 129364. [Google Scholar] [CrossRef]
Cuesta, J.A.; Gracia-Lázaro, C.; Ferrer, A.; Moreno, Y.; Sánchez, A. Reputation drives cooperative behaviour and network formation in human groups. Sci. Rep. 2015, 5, 7843. [Google Scholar] [CrossRef]
Trivers, R.L. The evolution of reciprocal altruism. Q. Rev. Biol. 1971, 46, 35–57. [Google Scholar] [CrossRef]
Bao, A.R.H.; Liu, Y.; Dong, J.; Chen, Z.P.; Chen, Z.J.; Wu, C. Evolutionary Game Analysis of Co-Opetition Strategy in Energy Big Data Ecosystem under Government Intervention. Energies 2022, 15, 2066. [Google Scholar] [CrossRef]
Axelrod, R.; Hamilton, W.D. The Evolution of Cooperation. Science 1981, 211, 1390–1396. [Google Scholar] [CrossRef]
Nowak, M.A.; May, R.M. Tit for tat in heterogeneous populations. Nature 1992, 355, 250–253. [Google Scholar] [CrossRef]
Parsons, S.D.; Gymtrasiewicz, P.; Wooldridge, M. Game Theory and Decision Theory in Agent-Based Systems; Springer Science & Business Media: New York, NY, USA, 2012; Volume 5. [Google Scholar]
Chen, S.H. Agent-Based Computational Economics: How the Idea Originated and Where It Is Going; Routledge: London, UK, 2017. [Google Scholar]
Cieśla, M.; Snarska, M. A simple mechanism causing wealth concentration. Entropy 2020, 22, 1148. [Google Scholar] [CrossRef] [PubMed]
Gray, J.; Hicks, R.P. Reputations, perceptions, and international economic agreements. Int. Interact. 2014, 40, 325–349. [Google Scholar] [CrossRef]
Chen, S.H.; Xiao, H.; Huang, W.d.; He, W. Cooperation of Cross-border E-commerce: A reputation and trust perspective. J. Glob. Inf. Technol. Manag. 2022, 25, 7–25. [Google Scholar] [CrossRef]
Liberman, V.; Samuels, S.M.; Ross, L. The name of the game: Predictive power of reputations versus situational labels in determining prisoner’s dilemma game moves. Personal. Soc. Psychol. Bull. 2004, 30, 1175–1185. [Google Scholar] [CrossRef] [PubMed]
Eshel, I.; Cavalli-Sforza, L.L. Assortment of encounters and evolution of cooperativeness. Proc. Natl. Acad. Sci. USA 1982, 79, 1331–1335. [Google Scholar] [CrossRef]
Izquierdo, S.S.; Izquierdo, L.R.; Hauert, C. Positive and negative selective assortment. J. Theor. Biol. 2025, 608, 112129. [Google Scholar] [CrossRef]
Bergstrom, T.C. The Algebra of Assortative Encounters and the Evolution of Cooperation. Int. Game Theory Rev. 2003, 05, 211–228. [Google Scholar] [CrossRef]
Feng, M.; Pi, B.; Deng, L.J.; Kurths, J. An Evolutionary Game With the Game Transitions Based on the Markov Process. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 609–621. [Google Scholar] [CrossRef]
Taylor, C.; Nowak, M.A. Evolutionary game dynamics with non-uniform interaction rates. Theor. Popul. Biol. 2006, 69, 243–252. [Google Scholar] [CrossRef] [PubMed]
Glynatsi, N.E.; Knight, V.; Harper, M. Properties of winning Iterated Prisoner’s Dilemma strategies. PLoS Comput. Biol. 2024, 20, e1012644. [Google Scholar] [CrossRef]
Li, Q.; Zhao, G.; Feng, M. Prisoner’s dilemma game with cooperation-defection dominance strategies on correlational multilayer networks. Entropy 2022, 24, 822. [Google Scholar] [CrossRef] [PubMed]
Perc, M.c.v.; Szolnoki, A. Social diversity and promotion of cooperation in the spatial prisoner’s dilemma game. Phys. Rev. E 2008, 77, 011904. [Google Scholar] [CrossRef]
Pi, B.; Feng, M.; Deng, L.J. A memory-based spatial evolutionary game with the dynamic interaction between learners and profiteers. Chaos Interdiscip. J. Nonlinear Sci. 2024, 34, 063120. [Google Scholar] [CrossRef]
Burda, Z.; Krawczyk, M.J.; Malarz, K.; Snarska, M. Wealth rheology. Entropy 2021, 23, 842. [Google Scholar] [CrossRef]
Kim, Y. The Influence of Conformity and Global Learning on Social Systems of Cooperation: Agent-Based Models of the Spatial Prisoner’s Dilemma Game. Systems 2025, 13, 288. [Google Scholar] [CrossRef]
Osborne, M.J. An Introduction to Game Theory; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3. [Google Scholar]

Figure 1. Dependence of the agents’ expected income (7) on its reputation for uniform reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For

n = 2

, a global maximum at average q is observed. For larger n, we observe a local minimum and then the largest income at

q = 1

.

Figure 1. Dependence of the agents’ expected income (7) on its reputation for uniform reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For

n = 2

, a global maximum at average q is observed. For larger n, we observe a local minimum and then the largest income at

q = 1

.

Figure 2. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, we observe a local minimum and then the largest income at

q = 1

.

Figure 2. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, we observe a local minimum and then the largest income at

q = 1

.

Figure 3. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. The left plot corresponds to the linear decrease of

p_{q} (x)

and the right if for exponential decay with

a = 5

. In both, for

n = 1

, the income decreases linearly with increased q, and for larger n, a global maximum at average q is observed. The maximums shift to the right as n increases. The maximums are closer to the

q = 1

in the case of linear decrease (left panel), where there are fewer agents of very low reputation.

Figure 3. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. The left plot corresponds to the linear decrease of

p_{q} (x)

and the right if for exponential decay with

a = 5

. In both, for

n = 1

, the income decreases linearly with increased q, and for larger n, a global maximum at average q is observed. The maximums shift to the right as n increases. The maximums are closer to the

q = 1

in the case of linear decrease (left panel), where there are fewer agents of very low reputation.

Figure 4. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, a global maximum at

q < 1

is observed, similar to that in the left panel in Figure 3.

Figure 4. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, a global maximum at

q < 1

is observed, similar to that in the left panel in Figure 3.

Figure 5. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, we observe global (

n = 2

) or local maxima for smaller q, then local minima for larger q. For

n \geq 3

, the income reaches its highest value at

q = 1

.

Figure 5. Dependence of the agents’ expected income (7) on its reputation for growing reputation distribution and different values of the parameter n. For

n = 1

, the income decreases linearly with increased q. For larger n, we observe global (

n = 2

) or local maxima for smaller q, then local minima for larger q. For

n \geq 3

, the income reaches its highest value at

q = 1

.

Table 1. Expected income (20) and its dispersion (21) in brackets of the agents’ society for each studied case and several values of the parameter n.

	$n = 1$	$n = 2$	$n = 3$	$n = 5$	$n = 10$
case 1	4.50 (0.87)	4.83 (0.45)	5.00 (0.94)	5.17 (1.67)	5.32 (2.86)
case 2	5.11 (0.79)	5.33 (0.71)	5.42 (1.44)	5.52 (2.40)	5.59 (3.94)
case 3 (linear)	3.78 (0.63)	4.09 (0.51)	4.27 (0.88)	4.47 (1.41)	4.70 (2.25)
case 3 (exponential)	3.03 (0.40)	3.25 (0.55)	3.41 (0.87)	3.60 (1.28)	3.87 (1.89)
case 4	4.50 (0.67)	4.76 (0.63)	4.89 (1.19)	5.02 (1.95)	5.16 (3.16)
case 5	4.50 (1.16)	4.95 (0.43)	5.14 (0.71)	5.32 (1.48)	5.43 (2.76)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cieśla, M. Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model. Entropy 2025, 27, 639. https://doi.org/10.3390/e27060639

AMA Style

Cieśla M. Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model. Entropy. 2025; 27(6):639. https://doi.org/10.3390/e27060639

Chicago/Turabian Style

Cieśla, Michał. 2025. "Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model" Entropy 27, no. 6: 639. https://doi.org/10.3390/e27060639

APA Style

Cieśla, M. (2025). Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model. Entropy, 27(6), 639. https://doi.org/10.3390/e27060639

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model

Abstract

1. Introduction

2. Model

3. Results

3.1. Case 1: $p_{q} (x) = C o n s t$

3.2. Case 2: $p_{q} (x) \sim x$

3.3. Case 3: $p_{q} (x)$ Decreases with an Increase of x

3.4. Case 4: $p_{q} (x)$ with Maximum at $q = \frac{1}{2}$ and Minima for $q \in {0, 1}$

3.5. Case 5: $p_{q} (x)$ with Minimum at $q = \frac{1}{2}$ and Maxima for $q \in {0, 1}$

4. Discussion

5. Conclusions

Supplementary Materials

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Reputation in the Iterated Prisoner’s Dilemma: A Simple, Analytically Solvable Agents’ Model

Abstract

1. Introduction

2. Model

3. Results

3.1. Case 1: p q ( x ) = C o n s t

3.2. Case 2: p q ( x ) ∼ x

3.3. Case 3: p q ( x ) Decreases with an Increase of x

3.4. Case 4: p q ( x ) with Maximum at q = 1 2 and Minima for q ∈ { 0 , 1 }

3.5. Case 5: p q ( x ) with Minimum at q = 1 2 and Maxima for q ∈ { 0 , 1 }

4. Discussion

5. Conclusions

Supplementary Materials

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.1. Case 1: $p_{q} (x) = C o n s t$

3.2. Case 2: $p_{q} (x) \sim x$

3.3. Case 3: $p_{q} (x)$ Decreases with an Increase of x

3.4. Case 4: $p_{q} (x)$ with Maximum at $q = \frac{1}{2}$ and Minima for $q \in {0, 1}$

3.5. Case 5: $p_{q} (x)$ with Minimum at $q = \frac{1}{2}$ and Maxima for $q \in {0, 1}$