1. Introduction
According to Darwin's theory of natural selection [1,2], individuals tend to be selfish and adopt strategies that maximize their own benefit as far as possible. However, cooperation is ubiquitous in natural and social systems [3,4,5]. Understanding how cooperative behavior among selfish individuals is generated, maintained, and evolves is therefore a challenging problem [6,7]. The Prisoner's Dilemma Game (PDG) is a typical paradigm for studying the evolution of cooperation between two selfish individuals, and the Public Goods Game (PGG) extends it to the evolution of cooperation among multiple selfish individuals; the PGG has received extensive attention in studies of cooperation [8,9,10]. In the simplest PGG [11,12], N participants decide whether to contribute to a common pool (i.e., cooperate) or not (i.e., defect); subsequently, the total contribution multiplied by an enhancement factor is distributed equally among all participants, regardless of whether they contributed. Obviously, the overall payoff is maximized when all participants choose to cooperate. However, participants face the temptation to free-ride and selfishly choose to defect in order to obtain the maximum individual payoff, because no matter what the other participants choose, an individual who defects always obtains a higher payoff; this leads to the Tragedy of the Commons [13].
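To make the dilemma concrete, consider a standard formalization of a single group of N players, each endowed with one unit, where x_i ∈ {0, 1} indicates whether player i contributes and r (1 < r < N) is the enhancement factor. Player i's payoff is

\pi_i \;=\; \frac{r}{N}\sum_{k=1}^{N} x_k \;-\; x_i ,

so switching from cooperation (x_i = 1) to defection (x_i = 0) always increases \pi_i by 1 - r/N > 0 regardless of the others' choices, even though universal cooperation yields r - 1 > 0 for everyone while universal defection yields 0.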
Based on years of research on social dilemmas, Nowak summarized five main mechanisms for the evolution of cooperation [14], namely direct reciprocity, indirect reciprocity, kin selection, group selection, and network reciprocity. Due to the rapid development of complex networks, considerable effort has been invested over the last decade in the evolution of cooperation in structured populations and their network reciprocity [15,16,17,18,19,20,21,22,23]. Scholars in economics, biology, and the social sciences have tried to explain social dilemmas from different perspectives and have proposed many important mechanisms, such as rewarding cooperators [24,25,26,27], punishing defectors [28,29,30,31,32,33], taxation [34,35], noise [36,37,38], social diversity [39,40,41,42], reputation [43,44,45], and so on.
In addition to mechanisms promoting cooperation, the influence of strategy update rules on cooperation cannot be ignored. Theoretical scholars from various disciplines have proposed a variety of strategy update rules, including desire-driven rules [46,47], the Fermi rule [48], Moran processes [49,50], and so on. Some scholars have also introduced ideas from swarm intelligence into strategy update rules. However, the update rules mentioned above apply only to discrete strategies, that is, to either pure cooperation or pure defection. In real life, people's strategy choices are unlikely to be simply black or white.
Scholars have therefore introduced the particle swarm optimization (PSO) algorithm to study the effect of PSO learning rules on the evolution of cooperation with continuous strategies. Under the PSO learning rule, participants adjust their strategies by considering the payoffs of their opponents and their own historical experience. Chen et al. introduced PSO into the PGG in [51]; their simulations showed that cooperative behavior in the PGG can be effectively generated and maintained using swarm intelligence algorithms. Quan et al. [52] proposed two punishment mechanisms for the public goods game: self-punishment driven by personal guilt and peer punishment driven by peer dissatisfaction, with individual-tolerance and social-tolerance parameters determining whether individuals are punished or punish others; their simulations showed that both punishment mechanisms significantly promote cooperation. Quan et al. [53] investigated the effect of PSO on the evolution of cooperation for two structured populations, a square lattice and a nearest-neighbor coupled network, and found that the nearest-neighbor coupled network promoted cooperation in the spatial PGG (SPGG) more effectively. Lv et al. [54] proposed a public goods model in which each participant adjusts, based on PSO, the proportion of their endowment spent on investment versus punishment. They investigated the effect of PSO learning rules on the evolution of cooperation and punishment and found that the effect of the propensity coefficient on cooperative evolution and on punishment inputs varies with the punishment intensity. The propensity coefficient determines whether players prefer to stick to their historically best investments or to imitate the best-performing investment in their current neighborhood; a higher value indicates that a participant prefers to learn from their own history. For lower punishment intensities, lower values of the propensity coefficient facilitate cooperative evolution, whereas as punishment intensity increases, higher propensity coefficients facilitate cooperation. The PSO-based PGG studies described above are concerned with learning from neighbors' performance versus one's own historically best memories. However, in real life, people's memory mechanisms are complex and the memory span is not infinite; moreover, as the external environment changes, memories that are too old may no longer be applicable to the new environment.
In this paper, we introduce the concept of memory stability [55] and use a combination of two factors, the size of the payoff and the difference between the current time step and the time step at which the memory was formed, to decide which historical strategy to learn. The effects of the continuous public goods game and of memory stability on the evolution of cooperation are considered on a square lattice network. Participants continuously adjust the amount they invest in the public pool according to the PSO strategy update rule. We focus on the effect of the PSO-based update rule on the evolution of cooperation in public goods investment under different memory-stability scenarios, including its effect on the average level of cooperation, the distribution of strategies, and the magnitude of strategy changes. The innovations of this paper are as follows: (1) Previous PSO-based PGG studies ignored the fact that human memory is not infinite and is subject to forgetting; we supplement and improve on this point. (2) In previous PSO algorithms, participants who learn almost exclusively from their own memory rather than imitating their neighbors always learn their initial memory, so the average investment remains at its initial level and cannot rise any higher; the memory-stability PSO algorithm introduced here solves this problem.
The rest of this paper is organized as follows. In Section 2, we describe the continuous-strategy public goods game and the PSO learning rule with memory stability. Section 3 presents the main simulation results and discussion. In Section 4, we summarize our main conclusions.
2. Models and Methods
The participants are located on a square lattice with periodic boundary conditions. Each participant is connected to their four nearest neighbors and takes part in the public goods games organized by themselves and by each of their neighbors. At first, each participant is assigned a strategy drawn with uniform probability from the interval [0, 1], which represents their willingness to cooperate. Referring to the payoff model in [52], the payoff of participant i in the game centered on participant j at moment t is given by Equation (1), where Ω_j denotes the set of neighbors of participant j, including participant j themselves; c is the total amount each participant can invest in a single game and is set to the same fixed value for all participants; x_i(t) denotes the investment of participant i in the public pool at moment t, representing their willingness to cooperate; and r denotes the enhancement factor. At moment t, the cumulative payoff of participant i from participating in the self-centered and the neighbor-centered games is given by Equation (2).
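In our notation, a sketch of these two payoff quantities that is consistent with the equal-sharing structure described in the Introduction (the exact typeset form of Equations (1) and (2) follows the model of [52]) is

\pi_{i,j}(t) \;=\; \frac{r}{|\Omega_j|}\sum_{k\in\Omega_j} c\,x_k(t) \;-\; c\,x_i(t), \qquad \Pi_i(t) \;=\; \sum_{j\in\Omega_i} \pi_{i,j}(t),

where \pi_{i,j}(t) is the payoff of participant i in the game centered on participant j and \Pi_i(t) is the cumulative payoff of participant i at moment t.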
After the payoffs of all games have been accumulated, all participants update their strategies simultaneously, guided by the PSO-based update rule. Because individuals who invest in the common pool face the risk of free-riding by their neighbors, rational individuals tend not to invest, which leads to the worst-case scenario in which the overall payoff is minimized. How to maximize the overall investment therefore becomes the problem.
Each participant maintains a strategy, a magnitude of strategy change, and all of their previous memories. Depending on the magnitude of strategy change, each participant increases or decreases their investment in the common pool. The investment of participant i in the common pool at time step t + 1 is given by Equation (3). In the investment update process, participant investments are subject to the following boundary conditions: if the updated investment exceeds 1, it is set to 1; if it falls below 0, it is set to 0. Each participant adjusts their magnitude of change in the direction of the most profitable strategy, based on their own past behavior and on the current best strategy in their neighborhood. The magnitude of change in the public goods investment is adjusted according to Equation (4). The propensity coefficient ω in Equation (4) is a predetermined value that determines whether players tend to learn their own maximum-intensity memory or to imitate their current highest-payoff neighbor: a smaller ω indicates that participants are more inclined to imitate their neighbor, and a larger ω indicates that they are more inclined to learn their own maximum-intensity memory [56]. To prevent participants from adjusting their investment excessively in a single step, the magnitude of strategy change is also bounded: if it exceeds the maximum allowed value, it is set to that maximum, and if it falls below the minimum allowed value, it is set to that minimum. Initially, every participant is assigned the same magnitude of strategy change.
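In symbols, writing v_i(t+1) for the magnitude of strategy change of participant i (the notation is ours), the investment update and its boundary conditions read

x_i(t+1) \;=\; \min\{1,\ \max\{0,\ x_i(t) + v_i(t+1)\}\},

which is Equation (3) together with the clipping to the interval [0, 1] described above.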
The neighborhood best investment in Equation (4) denotes the investment in the public good of the participant with the best payoff among participant i's neighbors (including participant i) at moment t and is defined by Equation (5). The maximum-intensity memory in Equation (4) denotes the investment with the highest memory strength among all of participant i's memories and is calculated according to Equation (6). The memory strength of a memory formed at a given moment is determined by two aspects: the first is the payoff obtained at that moment, and the second is the memory extractability. The payoff is proportional to the memory strength, and the memory extractability describes how easily, at the current moment, a memory formed at an earlier moment can be retrieved; it is calculated as described below.
The memory extractability is determined by the time difference and the memory stability coefficient. The time difference is the difference between the current time step and the time step at which the memory was formed, and S denotes the memory stability coefficient. When S tends to infinity, participant i's learning from historical memory is influenced only by the payoff and is independent of memory extractability, so the maximum-intensity memory learned each time is simply the memory with the best payoff, as in the literature [51,52,53,54]. When S is not infinite, the memory intensity gradually decreases as the time difference increases. In this paper, the memory stability coefficient S takes one of six levels, namely 1, 2, 3, 4, 5, and infinity. When S is infinite, the above PSO algorithm is the same as that in the literature [53]; therefore, the following simulations use an infinite S as a baseline for comparison. Obviously, for the same time difference, the lower the memory stability coefficient S, the more easily the memory is forgotten; for the same stability coefficient, the larger the time difference, the more likely the memory is to be forgotten. We explore the effect of memory strength on the evolution of cooperation by adjusting the level of the memory stability coefficient S.
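For illustration, one consistent way of writing the quantities of Equations (4)-(6) and the extractability term, under the assumptions that the forgetting curve is exponential in the spirit of [55] and that the propensity coefficient ω linearly interpolates between the two learning terms (both are assumptions made for this sketch, not verbatim reproductions of the original equations), is

E(t,\tau) \;=\; e^{-(t-\tau)/S} \quad \text{(memory extractability; } E \to 1 \text{ as } S \to \infty\text{)},
I_i(\tau;t) \;=\; \Pi_i(\tau)\,E(t,\tau) \quad \text{(memory intensity)},
l_i(t) \;=\; x_i(\tau^{*}), \quad \tau^{*} = \arg\max_{\tau \le t} I_i(\tau;t) \quad \text{(maximum-intensity memory, Equation (6))},
n_i(t) \;=\; x_{j^{*}}(t), \quad j^{*} = \arg\max_{j \in \Omega_i} \Pi_j(t) \quad \text{(neighborhood best investment, Equation (5))},
v_i(t+1) \;=\; v_i(t) + \omega\,[\,l_i(t)-x_i(t)\,] + (1-\omega)\,[\,n_i(t)-x_i(t)\,] \quad \text{(magnitude of strategy change, Equation (4))},

with v_i(t+1) subsequently clipped to the allowed range as described above.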
The models and formulas above constitute the problem description and constraints of the PSO optimization problem. For an individual, the goal of the PSO algorithm when adjusting their strategy is to maximize their own payoff, whether the participant tends to learn from memory or to imitate their neighbors. For the population as a whole, maximizing the overall payoff is equivalent to maximizing the overall investment in the common pool, because all payoffs are obtained by multiplying the funds in the common pool by the enhancement factor; thus, the optimization goal of the PSO algorithm is to maximize the average investment.
We now introduce the conceptual diagram of the model under study. As shown in Figure 1, PSO integrates three factors to adjust the strategy: the magnitude of strategy change, the maximum-intensity memory, and the best strategy in the neighborhood. Figure 2 shows a schematic diagram of how the PSO algorithm guides the strategy adjustment. The pseudo-code and flowchart for the above model are given in Algorithm 1 and Figure 3, respectively.
Algorithm 1: PGG Based on PSO with Memory Stability
Set the parameters: enhancement factor r, propensity coefficient ω, memory stability coefficient S, iteration number T, and number of participants N = 10,000.
Create the lattice network and initialize each participant's strategy and magnitude of strategy change.
While (t < T)
    For each participant i
        Update the payoff of participant i using Equation (2).
    End for
    For each participant i
        Update the investment with the best payoff in participant i's neighborhood (including i) using Equation (5).
        Update the maximum-intensity memory of participant i using Equation (6).
        Update the magnitude of strategy change of participant i using Equation (4).
        Update the strategy of participant i using Equation (3).
    End for
    t = t + 1.
End while
Output the average level of cooperation.
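The following is a minimal, self-contained C++ sketch of one run of Algorithm 1. All parameter values, the zero initial magnitude of strategy change, the clipping bound, the exponential forgetting curve, and the exact form of the velocity update are illustrative assumptions rather than the paper's exact settings, and the lattice size and generation count are reduced so that the naive scan over all memories stays fast.

// Sketch of the memory-stability PSO public goods game on a periodic square lattice.
// Assumed for illustration: c = 1, extractability exp(-dt/S), velocity update
// v <- v + w*(memoryBest - x) + (1 - w)*(neighborBest - x), |v| clipped to vMax.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const int L = 50, N = L * L, T = 200;   // reduced size; the paper uses N = 10,000 and 50,000 generations
    const double r = 4.6;                   // enhancement factor (example value)
    const double w = 0.9;                   // propensity coefficient
    const double S = 3.0;                   // memory stability coefficient
    const double vMax = 0.1;                // assumed bound on the magnitude of strategy change

    std::mt19937 rng(12345);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<double> x(N), v(N, 0.0), payoff(N);
    std::vector<std::vector<double>> memX(N), memPay(N);    // each participant's remembered investments and payoffs
    for (int i = 0; i < N; ++i) x[i] = uni(rng);            // initial willingness to cooperate in [0, 1]

    auto idx = [&](int row, int col) { return ((row + L) % L) * L + (col + L) % L; };
    auto group = [&](int i) {                               // participant i plus its four lattice neighbors
        int row = i / L, col = i % L;
        return std::vector<int>{i, idx(row - 1, col), idx(row + 1, col), idx(row, col - 1), idx(row, col + 1)};
    };

    for (int t = 0; t < T; ++t) {
        // Equation (2): cumulative payoff from the five games each participant takes part in.
        std::fill(payoff.begin(), payoff.end(), 0.0);
        for (int j = 0; j < N; ++j) {
            auto g = group(j);
            double pool = 0.0;
            for (int k : g) pool += x[k];
            double share = r * pool / g.size();
            for (int k : g) payoff[k] += share - x[k];
        }
        for (int i = 0; i < N; ++i) { memX[i].push_back(x[i]); memPay[i].push_back(payoff[i]); }

        // Synchronous PSO update of all strategies (Equations (3)-(6)).
        std::vector<double> newX(N), newV(N);
        for (int i = 0; i < N; ++i) {
            auto g = group(i);
            int best = g[0];                                // best-paid participant in the neighborhood, including i
            for (int k : g) if (payoff[k] > payoff[best]) best = k;
            double neighborBest = x[best];
            int bestTau = 0;                                // maximum-intensity memory: payoff weighted by extractability
            double bestStrength = -1e300;
            for (int tau = 0; tau <= t; ++tau) {
                double strength = memPay[i][tau] * std::exp(-double(t - tau) / S);
                if (strength > bestStrength) { bestStrength = strength; bestTau = tau; }
            }
            double memoryBest = memX[i][bestTau];
            double vi = v[i] + w * (memoryBest - x[i]) + (1.0 - w) * (neighborBest - x[i]);
            vi = std::clamp(vi, -vMax, vMax);               // bound the magnitude of strategy change
            newV[i] = vi;
            newX[i] = std::clamp(x[i] + vi, 0.0, 1.0);      // Equation (3) with investment clipped to [0, 1]
        }
        x.swap(newX);
        v.swap(newV);
    }

    double fc = 0.0;
    for (double xi : x) fc += xi;
    std::cout << "average level of cooperation: " << fc / N << "\n";
    return 0;
}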
3. Results
We conducted simulation experiments on a square lattice network with periodic boundary conditions, in which each node is connected to its four neighbors above, below, to the left, and to the right. N = 10,000 participants are located at the vertices of the network. The key metric used to characterize the cooperative behavior of the system is the average level of cooperation, defined as the average of the participants' strategies once the system has reached dynamic equilibrium, i.e., when this average no longer varies or only fluctuates within a small range. The optimization objective mentioned in the model section, maximizing the average investment, is precisely this average level of cooperation. In the following simulations, the average level of cooperation is obtained by averaging over the last 500 generations of a 50,000-generation run, after the system has reached stability, and each data point is the average of 10 independent runs. The model was simulated with the simultaneous (synchronous) update rule. All simulation results were generated by code developed in C++. The experimental hardware was a server with an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz (Intel, California, USA) and 128 GB of RAM.
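In symbols (our notation), with x_i denoting the strategy of participant i once the system has reached dynamic equilibrium, the average level of cooperation is

f_c \;=\; \frac{1}{N}\sum_{i=1}^{N} x_i ,

and the reported values are additionally averaged over the last 500 generations of each 50,000-generation run and over the 10 independent runs.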
We first investigated the impact of changes in the memory stability coefficient S on the evolution of cooperation for high and low values of the propensity coefficient ω. As shown in Figure 4, cooperation was greatly enhanced over the whole range of the enhancement factor r when strategy updates were guided by swarm intelligence, compared to the case where all participants adjusted their strategies using the Fermi rule (i.e., the original curve). In addition, in the low-r case, an infinite S kept the cooperation level at a low value, and decreasing S caused the level of cooperation to drop to 0. In the high-r case, decreasing S effectively promoted cooperation. Overall, decreasing S enlarged the range of r values for which a high level of cooperation is reached, compared to an infinite S, as shown in Table 1. Comparing the left and right plots, the change in cooperation level caused by a change in S followed roughly the same trend across r, but a higher ω widened the gap in cooperation levels produced by a change in S compared to a lower ω.
Figure 4 also shows that the curves for high memory stability coefficients were smoother, while the curves for low memory stability coefficients undulated more. Counter-intuitively, the curve for a low memory stability coefficient could even decrease as r increased. As shown in Figure 5, in the case of a low memory stability coefficient (with the other parameters fixed), the evolutionary process at r = 3.8 generated more clusters of high investors than at a smaller r because the enhancement factor was larger, yet the average level of cooperation finally reached at stability was lower. The reason lies in the following situation during evolution: full investors expand outward through network reciprocity and form clusters, and eventually a small number of low investors remain where these clusters intersect; they are difficult to eliminate because low investment there yields high returns. For the lower r, the population contains fewer clusters of high investors formed through network reciprocity, so fewer hard-to-eliminate low investors end up at the intersections, whereas for the higher r there are many clusters and their intersection areas are larger, resulting in more hard-to-eliminate low investors and a decrease in the average level of cooperation.
We next explored the effect of changes in S on the evolution of cooperation for extreme values of ω. For extremely small ω, participants imitate their neighbors almost exclusively, the effect of memory can be ignored, and changes in S have no effect on the evolution of cooperation; we therefore do not include figures for the extremely small ω case. For extremely large ω, participants learn almost exclusively from their own memories, and the effect of changes in S on the evolution of cooperation is significant. As shown in Figure 6, in the extreme case of a very large ω with an infinite S, higher cooperation levels could not be reached even with a high r, because each participant learned almost only their initial memory and not from their surrounding neighbors. In contrast, in this case, decreasing the memory stability coefficient allowed high cooperation rates to be reached even though participants learned almost nothing from their surrounding neighbors. Therefore, in the extremely large ω case, an infinite S could not achieve a high level of cooperation, and a high level of cooperation could be reached only by decreasing S.
To investigate the combined effect of the memory stability coefficient S, the enhancement factor r, and the propensity coefficient ω on the level of population cooperation, we took 100 points in each of the two parameter intervals for r and ω and conducted a simulation experiment for each pair of parameter combinations. Figure 7 shows the population cooperation level as a function of r and ω for different memory stability coefficients S. Each data point was obtained by averaging the average cooperation level over the last 500 time steps of a total of 50,000 time steps, and each data point was calculated as the average of 10 independent realizations.
As can be seen from Figure 7, a decrease in S at higher r promoted cooperation, whereas a lower value of S at lower r reduced the overall cooperation level. Changes in S affect only the component of learning from one's own history, not the component of learning from one's neighbors; thus, the effect of S on the evolution of cooperation is greater at higher values of ω. Therefore, in what follows we chose a relatively large ω as a prerequisite to explore the effect of different memory stability coefficients on the evolution of cooperation in detail.
To explore the effect of changes in S on the distribution of group strategies for different r, we simulated three values of the memory stability coefficient for a lower and a higher value of r, respectively. Figure 8 gives snapshots after the evolution of the different combinations of r and S has reached stability. For the lower r, an infinite S could maintain a certain level of cooperation, whereas a smaller S made cooperation unsustainable. For the higher r, it can be seen that high investors formed clusters through network reciprocity and that participants on the periphery of a cluster defended themselves against complete non-investors through lower investments; however, after a certain amount of expansion, a high S made participants on the periphery of a cluster retain the memory of investing less while earning more, which led to a decrease in investment. Thus, the effect of S on the level of cooperation of the group was bidirectional: when r was relatively small, lowering S made cooperation unsustainable, while when r was large, lowering S significantly promoted cooperation.
To explore the effect of changes in S on the distribution of the magnitude of strategy change, we show the corresponding snapshots in Figure 9. At low r with an infinite S, participants learned the initial investment memory and most participants updated their strategies slowly, allowing the system to maintain a low level of cooperation rather than falling into complete non-cooperation. At high r, the strategy update rate increased in the positive direction as S decreased, and for low S the positive increase in the magnitude of strategy change was obvious for participants forming clusters; only a very small number of participants at the intersections of clusters increased their strategy update rate in the negative direction. Thus, at low r, the magnitude of strategy change increased in the negative direction as S decreased, while at high r it increased in the positive direction as S decreased.
To understand more intuitively the effect of S on the diversity of public goods investments, Figure 10 gives the distribution of public goods investments for different values of S. At low r, as S decreased, the average individual strategy update rate increased in the negative direction and participants' investments in the common pool decreased to 0. At high r, as S decreased, most participants' strategy update rates increased in the positive direction and a small number increased in the negative direction, so that intermediate investment values gradually disappeared and the population ended up with only the two strategy values 0 and 1. Thus, as the memory stability coefficient decreased, the diversity of investments also decreased.
Figure 11 shows, for different combinations of r and S, the mean difference between the participants' current time step and the time step at which the learned memory was formed. Because the system reached stability by time step 5000, we only plot the first 5000 time steps. Larger values indicate that, on average, the learned memories are older. In the low-r case with an infinite S, most participants learned the initial strategy, and a certain level of cooperation could be maintained. A lower S cannot promote cooperation in this case: after participants forget the initial strategy memory, the investment environment gradually deteriorates and cooperation cannot be maintained. In the higher-r case with an infinite S, most participants were still learning the initial memory, which caused the rise in the level of cooperation to stagnate. When S is lower, a higher r can promote the formation of clusters of high-investment participants through network reciprocity; the environment gradually improves, and as participants forget the initial memories, higher levels of cooperation can be reached. As shown in Figure 12, at lower r, lowering the memory stability coefficient causes participants to forget their initial memories, and the newly learned memories correspond to ever lower investments, so the average cooperation level falls rapidly; at higher r, lowering the memory stability coefficient causes the low investors around the clusters formed by high-investing participants to forget the memories of earlier low investments with high returns and to learn from the high-investing participants, so the average cooperation level rises. Therefore, with an infinite S, participants learn a very old memory: at low r, this memory enables cooperation to be maintained at a low level, while at high r it causes the level of cooperation to stagnate. Lowering S causes participants to learn newer memories, which is not conducive to maintaining cooperation at low r but facilitates the generation of cooperation at high r.
To explore the effect of changes in S on the change in the cooperation level during evolution for different r, Figure 13 shows the evolution of the average cooperation level. All six curves reach stability within 5000 time steps, so we only plot the evolution up to 5000 time steps. At low r, a lower memory stability coefficient causes the average cooperation level to fall faster, while at high r, a lower memory stability coefficient causes the average cooperation level to rise faster; however, a faster rise does not mean that the average cooperation level is higher when the system reaches stability. The effect of the memory stability coefficient on the average cooperation level is diametrically opposed at higher and lower r because the average cooperation level itself evolves differently in the two cases. At lower r, high investors are exploited by low investors, which is unfavorable to high investors overall, so the average cooperation level decreases; a high memory stability coefficient makes participants keep learning their initial investment strategy, which keeps the average cooperation level at a low value without further decline. When r is relatively high, high-investment participants tend to converge into clusters and, through network reciprocity, obtain payoffs greater than those of the surrounding low-investment participants; in this situation, when the propensity coefficient ω is large, participants are more inclined to learn from their own history, and the low investors keep learning from past memories of low investment with higher payoff, so the average investment level stops rising after reaching a certain point. Thus, as S decreases, the average cooperation level changes faster, but in opposite directions: it decreases at low r and increases at high r.
As shown in Figure 14, because the average strategy update rate reaches stability within 5000 time steps, we only plot the change in the average magnitude of strategy change within 5000 time steps. It can be seen that at higher r, decreasing the memory stability coefficient leads to a decrease in the average magnitude of strategy change, and the smaller the memory stability coefficient, the faster the average magnitude of strategy change decreases. At lower r, decreasing the memory stability coefficient leads to a rise in the average magnitude of strategy change, and the smaller the memory stability coefficient, the faster the average magnitude of strategy change rises.
Combining the results in Figure 13 and Figure 14, it can be seen that a high strategy update rate does not necessarily lead to a high average level of final cooperation. This is because an excessively large magnitude of strategy change causes the investments of some participants to change rapidly: some low investors become surrounded by high investors before they have had time to learn from them, gain high returns in this position, and turn into exploiters. A more slowly rising magnitude of strategy change gives low investors time to learn from their high-investing neighbors, which makes the final average level of cooperation higher.