Social Learning Strategies and Cooperative Behaviour: Evidence of Payoff Bias, but Not Prestige or Conformity, in a Social Dilemma Game

Abstract: Human cooperation, occurring without reciprocation and between unrelated individuals in large populations, represents an evolutionary puzzle. One potential explanation is that cooperative behaviour may be transmitted between individuals via social learning. Using an online social dilemma experiment, we find evidence that participants’ contributions were more consistent with payoff-biased transmission than prestige-biased transmission or conformity. We also found some evidence for lower cooperation (i) when exposed to social information about peer cooperation levels than without such information, and (ii) in the prisoners’ dilemma game compared to the snowdrift game. A simulation model established that the observed cooperation was more likely to be caused by participants’ general propensity to cooperate than by the effect of social learning strategies employed within the experiment, but that this cooperative propensity could be reduced through selection. Overall, our results support previous experimental evidence indicating the role of payoff-biased transmission in explaining cooperative behaviour, but we find that this effect was small and was overwhelmed by participants’ general propensity for cooperation.


Introduction
The spread of behaviour that benefits others is difficult to explain through natural selection, as such behaviour risks exploitation from others [1]. Scenarios where prosocial behaviours can be exploited by others are termed social dilemmas [2]. Classic mechanisms to maintain cooperation include kin selection [3], punishment of non-cooperators [4] and reciprocity [5]. Given this, human cooperation is especially surprising because it occurs between unrelated individuals and is often unreciprocated [6,7]. Laboratory studies (usually from WEIRD samples but see [8]) also show that individuals often cooperate at higher levels than would be predicted by game theory [9].
Forms of cooperation may be culturally transmitted within or across social groups through social learning [10,11]. Through social learning, individuals acquire traits or information by observing or interacting with other individuals or the products of their behaviour [12]. Social learning allows individuals to obtain adaptive traits that are difficult to acquire asocially but can also result in the spread of outdated or maladaptive information [13,14]. For this reason, complete reliance on social learning is unlikely to be adaptive [15,16]. Instead, scholars have suggested that selection should favour strategic use of social learning via strategies that influence when, what and from whom individuals socially learn [17,18]. Three strategies have received particular attention: payoff biased transmission (copy traits that yield a high payoff; [16], henceforth payoff bias); conformity (disproportionate propensity to copy common traits; [19]); and prestige biased transmission (copy individuals of high status; [20], henceforth prestige bias). Kendal et al. [21] review evidence for contexts in which these strategies are used, either individually or in combination.
Given that both cooperation and social learning are thought to underpin the massive habitat expansion and the evolution of complex cultural systems characteristic of our species [22,23], it is perhaps surprising that relatively few studies have addressed how they interact. Conformity may be able to sustain cooperation when combined with punishment [24] or when cooperation is already common [25]. In the lab, participants conformed to an external group's donations [26] or cooperated with a previously cooperative partner [27] but direct reciprocity was an overall stronger influence on behaviour. Henrich and Gil-White [20] suggest that in prestige bias, followers grant voluntary deference towards leaders in exchange for learning opportunities. This could incentivise copying of cooperation among followers and increased cooperation from leaders [28]. Models suggest that prestige can maintain cooperation in a larger range of scenarios than other social learning strategies [28][29][30]. In the lab, participants have exhibited a bias to copy large contributions made by leaders [31]. Furthermore, prosocial leaders (measured by a questionnaire) elicited greater cooperation from their group than selfish leaders and used punishment less than traditional peer sanctioning groups [32]. Likewise, experimental and ethnographic studies suggest that leader fairness and charisma can positively affect cooperation [33]. There is evidence in strictly hierarchical institutions that team performance and information flow is correlated with the degree of informal prestige conferred upon leaders [34,35] although, contrary to common marketing strategy, there is also evidence that real-life cooperative behaviours are not highly influenced by celebrity endorsement [36,37]. Formal status or rank has received little attention, although one social dilemma experiment found participants labelled with stars (indicating a superior quiz performance) were copied more than those without stars [38].
Because payoff-biased social learning results in the adoption of traits proportional to their relative fitness, as formalised in the replicator equation [39,40], it would be expected to spread selfish behaviour. An experimental study showed that participants exhibited a bias to copy their more successful neighbours and reduce their cooperative contributions to a public good [41]. Further experimental evidence suggests that participants are more likely to exhibit a payoff bias than conformity in a cooperation game and reduce their contributions [42,43] and also decrease their contributions when reminded how their behaviour was benefiting others [44]. A recent analysis of 237 PGGs also showed that declines in contributions were most consistent with improving personal payoffs [45]. Furthermore, cooperation was also higher when participants had no information on the behaviour of their group mates [46]. However, payoff biased learning may not be detrimental for cooperation in all cases, for example when defection is less rewarding [47] or when group migration and punishment is possible [48]. Generally, though, payoff biased learning results in the decline of cooperative behaviour, and payoff information is the information that participants preferentially attend to.
While strategic defection can maximise payoff, the pattern of results points towards payoff biased social learning being the preferred strategy adopted by participants in social dilemmas. While conformity can increase cooperation in some contexts, it appears to be the weakest cue when compared with other social learning strategies [29,42,43]. Prestige (specifically, high status) biased social learning is comparatively understudied in cooperative dilemmas but is predicted to sustain cooperation in a wide array of circumstances [28,30]. Because no experimental study has considered all three strategies simultaneously in a cooperative context, this is the primary aim of our study.
The evolution of cooperation can also be affected by the payoff structure of the social dilemma. Typically, cooperation games assume a prisoner's dilemma (PD) payoff structure (see Table 1), where game theory always predicts defection as the rational choice [49]. An alternative is the snowdrift game (SD, see Table 1 for payoff structure), also sometimes referred to as the chicken game or the hawk-dove game [50]. Whereas models based on the PD predict defection as the evolutionarily stable strategy [51], SD games predict stable populations of both cooperators and defectors [52,53]. This is because, in the SD game, a cooperator paired with a defector still outperforms a defector paired with another defector (CD > DD), and so cooperation is favoured when defection is common. For example, the production of enzymes in the environment by yeast and bacteria equates to a SD game as enzyme producers benefit from their enzyme production as much as defectors. As predicted, while many cells defect by abstaining from enzyme production and freeride on neighbouring cells, the production of enzymes is not extinguished [54,55].
Table 1. Payoffs associated with cooperation (C) and defection (D) depending on the behaviour of a partner in a PD and a SD game, adapted from Doebeli and Hauert [50]. Both tables show benefit (b) compared to the costs (c). Payoff rankings are DC > CC > DD > CD for the PD game and DC > CC > CD > DD for the SD game.
There are fewer experimental studies using the SD game than the PD game, perhaps because the evolution of cooperation is a harder problem in the latter. Nonetheless, both scenarios can be seen in the real world. For example, climate issues are commonly seen as a PD game or dilemmas of collective action [56], whereas scenarios like constructing communal flood defences or watching for predators are more akin to SD games. One experimental study comparing an iterated binary PD with a SD game found higher cooperation in the SD game [57]. Similar patterns have also been found in other experiments, often using one-shot binary decision games [58][59][60]. Payoff structure also affects the spatial patterns by which cooperation is predicted to evolve, where localised clusters and dendritic spines of cooperators form in models of PD and SD games, respectively [50].
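The payoff rankings given with Table 1 can be checked numerically. The sketch below assumes the cost-benefit parameterisation commonly used for these two games (PD: CC = b − c, CD = −c, DC = b, DD = 0; SD: CC = b − c/2, CD = b − c, DC = b, DD = 0) with b > c > 0. These cell values are an assumption here, since only the rankings are stated in the text, and the function names are illustrative.

```python
def pd_payoffs(b, c):
    """Focal player's payoffs in the PD, keyed by (own move, partner's move).

    Cell values are an assumed standard parameterisation, not taken from the text.
    """
    return {"CC": b - c, "CD": -c, "DC": b, "DD": 0.0}


def sd_payoffs(b, c):
    """Focal player's payoffs in the SD, under the same assumed parameterisation."""
    return {"CC": b - c / 2, "CD": b - c, "DC": b, "DD": 0.0}


def ranking(payoffs):
    """Return the outcomes ordered from highest to lowest payoff."""
    return sorted(payoffs, key=payoffs.get, reverse=True)


b, c = 2.0, 1.0  # hypothetical benefit and cost with b > c > 0
print(ranking(pd_payoffs(b, c)))  # ['DC', 'CC', 'DD', 'CD']
print(ranking(sd_payoffs(b, c)))  # ['DC', 'CC', 'CD', 'DD']
```

The only difference between the two rankings is the order of CD and DD, which is what makes cooperation worthwhile against defectors in the SD game but not in the PD.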
Despite these patterns in findings, comparatively little is known about the dynamics of the SD game compared with the PD. While there are many examples of PD models which consider cooperation on a continuum [61], few have considered SD games along these lines [62][63][64][65]. Exact payoff structures vary slightly, but they each follow the characteristic hierarchy shown in Table 1 and described by Doebeli and Hauert [50]. Typical findings in such models are a convergence towards contributions of around 50%. No experiments have considered iterated continuous SD games in a group context or alongside social learning, so addressing this limitation is the second aim of this study.

Research Questions
Our experiment addresses several key gaps in previous research. Rather than forcing participants to adopt a particular social learning strategy across experimental conditions, we adopt a more naturalistic approach by permitting participants free access to the information required for all three (prestige, conformity, and payoff bias) of the major social learning strategies in a cooperative game. We then use statistical models to infer which social learning strategies were used. We compare both the PD and SD games played across 6 rounds in groups of 4. Each round, participants could contribute between 0 and 10 units to a pot which was doubled and split between all participants. In the SD game, participants received no points for the round if the total did not reach 10. This allows a comparison of cooperation rates and social learning strategy use between games beyond a one-shot context and allows participants to express differing degrees of cooperation. To this end, our experiment employs a between participants 2 (social versus asocial) × 2 (PD versus SD) factorial design with a PD and SD condition alongside asocial (no access to social information) and social learning (access to social information) conditions. This experiment addressed 4 research questions (RQ) (Appendix A).

1. How do social learning strategies influence cooperative behaviour?
1a. Which social learning strategies, if any, do participants use? We predict that payoff biased learning will have the strongest influence on cooperative behaviour [45] followed by prestige and then conformity [38,42,43].
1b. Are the patterns of social learning strategies consistent across the PD and SD game? Due to a lack of prior studies, we make no predictions over the direction of the interaction between social learning strategy use and payoff structure.
2. What effect does payoff structure have on cooperative behaviour? We predict higher cooperation in the SD game than the PD [50,57].
3. What effect does access to social information have on cooperative behaviour? We predict lower cooperation with access to social information than when individuals make decisions asocially because we expect a payoff bias to decrease cooperation [45,46].
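The round payoffs described under Research Questions can be sketched as follows. The sketch assumes a standard public goods payoff in which each participant keeps whatever they do not contribute from a 10-unit endowment; the endowment, the function name and its parameters are illustrative assumptions rather than the experiment's actual implementation.

```python
def round_payoffs(contributions, game="PD", endowment=10, multiplier=2, threshold=10):
    """Payoff to each group member for one round.

    Assumption: players keep the uncontributed part of their endowment.
    In the SD game, everyone earns nothing if the pot is below the threshold.
    """
    if any(not 0 <= c <= endowment for c in contributions):
        raise ValueError("each contribution must lie between 0 and the endowment")
    total = sum(contributions)
    if game == "SD" and total < threshold:
        return [0.0] * len(contributions)
    share = multiplier * total / len(contributions)  # pot doubled, split equally
    return [endowment - c + share for c in contributions]


print(round_payoffs([10, 0, 0, 0], game="PD"))  # [5.0, 15.0, 15.0, 15.0]
print(round_payoffs([0, 0, 0, 0], game="PD"))   # [10.0, 10.0, 10.0, 10.0]
print(round_payoffs([2, 2, 2, 2], game="SD"))   # [0.0, 0.0, 0.0, 0.0]
```

Under these assumptions each contributed unit returns 2/4 = 0.5 units to the contributor. A lone full contributor earns 5 in either game, whereas universal defection earns 10 in the PD and 0 in the SD, which matches the DD > CD and CD > DD orderings of the two games.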

Results
Figure 1 shows the mean cooperation rates from rounds 2-6 for the four experimental conditions. Generally, mean cooperation was around 6 points at round 2 and showed little change across subsequent rounds. This suggests that overall cooperation rates were relatively consistent throughout the experiment. Participants also generally indicated a good understanding (using a scale of 1/poor to 10/good) of how the game worked (Median = 8, IQR = 3).
Although there appears to be little variation between rounds, economic games commonly find declines in cooperation across rounds [46,66]. Therefore, it may still be necessary to control for variation between rounds. Two competing models were compared, one which ignored round ("No round") and another which added a varying intercept for round ("Round"). No round had a WAIC score of 5502.3 (SE = 76.9, weight = 0.73) and Round had a WAIC score of 5504.3 (SE = 77.0, weight = 0.27), indicating no improvement in out-of-sample predictive ability by varying intercepts by round. The results were similar when round was included as a continuous linear predictor (No round; WAIC = 5501.4, SE = 76.9, weight = 0.71, Round; WAIC = 5504.2, SE = 77.0, weight = 0.29). Therefore, all further models excluded the effect of round.
Some modelling concerns needed to be addressed before answering which social learning strategies participants used (research question 1a). First, data from the asocial condition were retained in the model for analysis to ensure that parameter estimates for the effects of payoff structure can be evaluated across the social and asocial condition. However, data from the asocial condition cannot be used to estimate the social learning parameters because participants did not view any social information. To address this, we modelled the interaction of the three social learning strategy parameters with the social information condition: the predicted effect is always 0 if the data come from the asocial condition.

A second concern is that, for participants who are themselves either prestigious (having scored highest in a pre-game quiz relating to understanding of how social groups work) or have the highest payoff, the prestige and payoff social information is not strictly social as it refers to their own previous behaviour. To address this, the model used binary variables to exclude each participant from using social learning strategy data about themselves to construct the social learning parameter estimates. Specifically, prestige interacts with a binary variable where a value of one indicates that the participant is not the prestigious individual. Payoff interacts with a binary variable where a value of one indicates that the participant is not currently the highest earner. Accordingly, parameter estimation occurs only for cases where the slopes are not inflated by one's own behaviour.
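The two masking steps described above can be sketched as a linear predictor. This is a minimal illustration under assumed names and an assumed additive form; the actual likelihood, priors and coding of the social information are not specified in this section, and the slope values in the example are hypothetical.

```python
def expected_contribution(intercept, slopes, social, snowdrift,
                          prestige_info, conformity_info, payoff_info,
                          is_prestigious, is_top_earner):
    """Assumed linear predictor with the masking indicators described in the text.

    social and snowdrift are 0/1 condition indicators; slopes is a dict of
    hypothetical coefficients.
    """
    # Mask information that refers to the participant's own behaviour.
    not_prestigious = 0 if is_prestigious else 1
    not_top_earner = 0 if is_top_earner else 1
    social_terms = (slopes["prestige"] * prestige_info * not_prestigious
                    + slopes["conformity"] * conformity_info
                    + slopes["payoff"] * payoff_info * not_top_earner)
    controls = (slopes["snowdrift"] * snowdrift
                + slopes["social"] * social
                + slopes["is_prestigious"] * int(is_prestigious))
    # Social learning terms contribute nothing in the asocial condition.
    return intercept + controls + social * social_terms


slopes = {"prestige": 0.1, "conformity": 0.1, "payoff": 0.3,
          "snowdrift": 0.4, "social": -0.6, "is_prestigious": 0.0}  # hypothetical
asocial = expected_contribution(6.0, slopes, 0, 0, 8, 5, 9, False, False)
print(asocial)  # 6.0: social information is ignored in the asocial condition
```

Writing the masks as multiplicative indicators keeps all observations in one model while ensuring that the social learning slopes are informed only by observations in which the information shown was genuinely about other players.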
The conformity information presented to participants (average group behaviour) included their own behaviour, but not exclusively. While reconstructing this variable to exclude their own behaviour would correct for this issue, this introduces an inconsistency between the modelled variable and the information participants were presented with in the experiment. Therefore, the following analysis was repeated for uncorrected (includes their behaviour) and corrected (excludes each participant's own behaviour) conformity information. The main text details the uncorrected analysis while Appendix B shows the main model predictions with the corrected variable and the difference in estimated parameters. Qualitatively, the primary conclusions do not differ from one another.
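The difference between the uncorrected and corrected conformity variables can be illustrated with a short sketch. The function name and the example contributions are hypothetical; only the definitions (group average with or without the focal participant) come from the text.

```python
def conformity_information(contributions, focal_index, corrected=False):
    """Average group contribution as seen by one participant.

    Uncorrected: mean over all group members, including the focal participant.
    Corrected: mean over the other group members only.
    """
    if not corrected:
        return sum(contributions) / len(contributions)
    others = [c for i, c in enumerate(contributions) if i != focal_index]
    return sum(others) / len(others)


group = [10, 4, 6, 4]  # hypothetical contributions in a group of four
print(conformity_information(group, focal_index=0))                  # 6.0
print(conformity_information(group, focal_index=0, corrected=True))  # 4.666...
```

In a group of four, a participant's own contribution makes up a quarter of the uncorrected average, so the two versions diverge most for participants whose behaviour is far from that of their group mates.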
Eight different models were fit to the data, consistent with the constraints described above, covering all possible combinations of the three social learning parameters (Prestige, Conformity and Payoff). The compared models ranged from a model containing only the control variables of experimental condition (SD/PD and Social/Asocial) and being the prestigious participant, to a model additionally containing all the social learning strategies (Prestige + Conformity + Payoff). The WAIC values and associated model weights are displayed in Table 2.
Table 2. WAIC values and model weights for models evaluating the impact of social learning strategies. Standard error difference shows the standard error in the difference between each model and the model with the lowest WAIC value. All social learning parameters interact with the social information condition. Model names indicate which social learning strategies are included. All models include the parameters for game structure and social information condition.
The pattern of WAIC scores does not provide conclusive support for any particular social learning strategy. Overall, the strongest evidence is for payoff bias, as the two top models, which are favoured over the asocial model and have a combined weight of 0.61, both include payoff bias. Conversely, the four models which include conformity have the lowest overall model weights (0.10), indicating that models that include conformity are overfit compared to the asocial model. There appears to be a small effect associated with a prestige bias, as adding prestige to a model containing payoff does slightly improve its out-of-sample predictive ability. However, prestige alone is not favoured over an asocial model, which suggests that it is primarily payoff that is improving the model fit. Additionally, the asocial model is (modestly) favoured over those which do not contain a payoff bias or which contain conformity. This includes the Prestige + Conformity + Payoff model which, despite containing payoff, is penalised by WAIC for including conformity and prestige. This further suggests that conformity bias and prestige bias are overfit compared with payoff bias.
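The model set and the weights used in this comparison can be sketched as follows. The sketch enumerates every combination of the three social learning parameters and converts WAIC values into weights, assuming Akaike-type weights (each model's relative likelihood exp(−ΔWAIC/2), normalised to sum to one). The text does not state how the weights were computed, so this formula is an assumption, and the function names are illustrative. The WAIC values used are those reported earlier for the No round and Round models.

```python
from itertools import combinations
from math import exp

STRATEGIES = ("Prestige", "Conformity", "Payoff")


def candidate_models(strategies=STRATEGIES):
    """All subsets of the social learning parameters, from none to all three."""
    models = []
    for k in range(len(strategies) + 1):
        for combo in combinations(strategies, k):
            models.append(" + ".join(combo) if combo else "Asocial")
    return models


def waic_weights(waic):
    """Akaike-type weights from a dict of model name -> WAIC (assumed formula)."""
    best = min(waic.values())
    relative = {name: exp(-0.5 * (value - best)) for name, value in waic.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


print(len(candidate_models()))  # 8
weights = waic_weights({"No round": 5502.3, "Round": 5504.3})
print({name: round(w, 2) for name, w in weights.items()})
# {'No round': 0.73, 'Round': 0.27}
```

Under this assumption a WAIC difference of 2 corresponds to weights of roughly 0.73 and 0.27, which illustrates why the small WAIC differences reported here translate into only modest preferences between models.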

Parameter estimates (Figure 2) and model predictions (Figure 3) from the Prestige + Conformity + Payoff model are displayed in the plots below. Figure 3 is split between the three social learning strategies and predictions are generated for increases in the respective social learning information while holding all other variables constant. The slope for payoff is positive which indicates that, generally, participants' behaviour aligned with the direction (increase/decrease) of this social information. The slopes for prestige and conformity are weakly positive but have wider prediction intervals and the parameter estimates overlap 0. This, combined with the distribution of model weights, suggests that out of the three social learning strategies, a payoff bias shows the strongest influence on participant cooperation. Therefore, the changes in cooperation observed are most consistent with a payoff bias.

Are the Patterns of Social Learning Strategies Consistent across the PD and SD Game? (RQ 1b)
To evaluate any differences in social learning strategy use between PD and SD games (research question 1b), the Prestige + Payoff + Conformity model was compared to a model where the social learning parameters also interacted with game structure. This allowed the model to estimate different slopes for the social learning parameters between the PD and the SD game. This did not improve model fit (Prestige + Conformity + Payoff: WAIC = 5503.7; se = 77.3; weight = 0.77, Interaction: WAIC = 5506.2; se = 77.6; weight = 0.23), indicating that social learning strategy use, or the influence of any social learning strategy, did not differ between the PD and SD games.

Evaluating the Experimental Conditions (RQs 2 and 3)
To evaluate the effects of game structure and the availability of social information on cooperative behaviour (research questions 2 and 3), model comparisons were run between the Prestige + Conformity + Payoff model and models that dropped different combinations of binary variables pertaining to game structure and social condition or allowed them to interact. This means these effects can be evaluated while controlling for social learning strategy use and remain comparable to the models presented above. WAIC values and model weights are displayed in Table 3.
Table 3. WAIC values and model weights for models evaluating the impact of binary experimental condition variables. Standard error difference shows the standard error in the difference between each model and the model with the lowest WAIC value. Model names indicate which predictor variables are included in addition to the social learning strategy parameters. The Prestige + Conformity + Payoff model is the full model containing the social learning strategies and the parameters for game structure and social information condition.
Overall, there was no clear distinction between any of the models. It is therefore unclear whether including either (or both) predictors (or their interaction) benefits out-of-sample model fit or not, though both top models contain the social information condition (combined weight 0.60). Figure 4 shows model predictions generated from the interaction model. There is some indication that cooperation was lower in the social information condition than the asocial condition and (to a lesser degree) higher in the SD than the PD game (Social = −0.58, 95% PI = −1.14; −0.04, Snowdrift = 0.39, 95% PI = −0.15; 0.94).

Simulation Model Dynamics
We used a simulation model to evaluate the longer-term consequences of the patterns of behaviour observed in this experiment. This permits the predictions from the Bayesian model (and the role of social learning) to be investigated for larger group sizes and under selection. This model samples from the Bayesian posterior estimates from the Prestige + Conformity + Payoff model to establish each agent's intercept propensity for cooperation and the influence of the social learning strategies (taking into account that the simulation model considers the PD and social condition). Note that the social learning strategies are assumed to operate non-independently of one another. Figure 5 shows that for the basic horizontal-transmission simulation, mean cooperation quickly stabilised to a relatively steady state at around a contribution of 5.7, indicating that social learning strategy use is not predicted to cause long-term change in the frequency of cooperation in a population. By comparison, Figure 6 shows that if we force agents to adopt a particular intercept propensity for cooperation (high, low), cooperation stabilises at different levels. Thus, over a long timeframe and provided participants continue to behave on average as they did in the experiment, cooperation levels are far more strongly affected by the intercept propensity for cooperation than by the effects of social learning strategies. We found that group size did not affect these qualitative findings (see Appendix D).
We introduced selection and small random mutation on the intercept propensity for cooperation by assuming intercept values in one round are represented in the next round in proportion to payoffs earned and then altered by a small amount by sampling from a normal distribution around the inherited intercept value. This simulation can either represent selection and mutation across biological generations, or modification of an individual's propensity for cooperation over time within a generation.
We found that cooperation declined as agents with small intercepts contribute less overall and gain greater payoffs than those with large intercepts (Figure 7). This result illustrates that, as expected for a PD game, the stable degree of cooperation shown in the horizontal transmission model and observed in the experiment is susceptible to selection.
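The selection-and-mutation step described above can be sketched in a few lines. This is a deliberately simplified illustration, not the simulation used in the study: it omits the social learning strategies, treats the whole population as a single PD group, lets each agent contribute exactly its intercept, and assumes a 10-unit endowment. The starting spread, mutation size, population size and all function names are hypothetical.

```python
import random


def payoffs_pd(contributions, endowment=10.0, multiplier=2.0):
    """PD payoffs: keep what is not contributed plus an equal share of the doubled pot."""
    share = multiplier * sum(contributions) / len(contributions)
    return [endowment - c + share for c in contributions]


def clip(value, low=0.0, high=10.0):
    """Keep an intercept within the range of possible contributions."""
    return min(high, max(low, value))


def next_generation(intercepts, rng, mutation_sd=0.1):
    """Resample intercepts in proportion to payoff, then add small normal mutation."""
    fitness = payoffs_pd(intercepts)  # each agent contributes its intercept
    parents = rng.choices(intercepts, weights=fitness, k=len(intercepts))
    return [clip(rng.gauss(parent, mutation_sd)) for parent in parents]


def simulate(n_agents=200, generations=300, seed=1):
    """Return the mean intercept in each generation, starting near 5.7."""
    rng = random.Random(seed)
    intercepts = [clip(rng.gauss(5.7, 2.0)) for _ in range(n_agents)]
    means = [sum(intercepts) / n_agents]
    for _ in range(generations):
        intercepts = next_generation(intercepts, rng)
        means.append(sum(intercepts) / n_agents)
    return means


means = simulate()
print(round(means[0], 2), round(means[-1], 2))  # initial and final mean intercept
```

Because everyone receives the same share of the pot, agents with lower intercepts always earn more than agents with higher intercepts, so payoff-proportional resampling steadily shifts the population towards lower contributions.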

Discussion
This experiment sought to test multiple predictions. Specifically, whether there is evidence for the use of social learning in social dilemmas and if so, which social learning strategy between payoff bias, conformity, and prestige bias did participants appear to be following. Further, comparisons were made between Prisoner's dilemma and Snowdrift public goods games. Finally, the statistical estimates of parameters contributing to behaviour in the experiment were fed into a simulation model to predict long-term trends, examining group size and the effects of selection on the propensity for cooperation.
We found evidence for the use of payoff biased learning in social dilemmas, but little support for prestige or conformity. However, the overall impact of the social learning strategies on cooperative behaviour was small. There was little evidence of an interaction between game structure and social learning strategy use. Payoff biased copying has also been found in previous social dilemma experiments where, in each case, social learning and specifically, payoff biased copying eroded cooperation [43,44]. These findings add to the growing evidence of payoff biased social learning in a variety of other contexts and species [67][68][69][70][71].
In our experiment we found no strong evidence for the use of conformity. Theoretically, conformity may influence patterns of cooperation, but it can often depend on the initial composition of the population [25], or other complementary mechanisms such as network reciprocity [72]. In social dilemma experiments, conformity can sometimes increase cooperation, though it is outperformed by stronger cues such as reciprocity [26,27], is often ignored [43], or increases cheating [73,74]. Outside of cooperative contexts, frequency information is only used if payoff information is unreliable [75], which may explain our findings. Despite this, a null result in our experiment does not necessarily imply conformity is unimportant for the evolution of cooperation. One of the benefits of strong conformity, often absent from experimental research [76], is the spread of shared cultural norms or values, which in turn, can facilitate cooperation [77].
The absence of a strong prestige bias was unexpected. Of the little research available, the effect of prestige or leadership on cooperative behaviour seems overwhelmingly positive [28,29,31,33,38]. While our study suggests that prestige does not influence cooperation as strongly as other research has suggested, there are several possible reasons for this. Like conformity, it may be that prestige was not used because accurate payoff information was available. By definition, prestige serves as a heuristic to be used when payoff information is ambiguous or unavailable [78], which has been demonstrated in an experimental setting [79]. It is also important to consider the way prestige was defined in this experiment. A prestigious individual is defined as someone with either high general skill and knowledge and/or with a large following [20,80]. Our operationalization of prestige using a quiz follows other studies that have successfully used this approach [38,79,81]. Nonetheless, the possibility remains that our participants did not consider the winner of the quiz to be prestigious in the context of the social dilemma. Moreover, high scoring individuals demonstrated skill in the same domain as the context in which they could be copied (the social dilemma game) rather than a potentially less "useful" general knowledge.
It should be noted that, even for payoff bias, the effect sizes associated with social learning strategies were not particularly large and were all associated with a good deal of uncertainty. This was reflected in the patterns of model comparison which showed only small differences in WAIC scores between competing models, which suggests that each model would make roughly similar out-of-sample predictions. In addition, the simulation model indicated that social learning strategies did not cause a significant change in cooperation which, instead, was determined by individual propensities for cooperation (determined by intercepts). Furthermore, when asked, after their participation in the game, whether they used the social information in some way, only 28% of participants (that responded) said yes.
We found lower levels of cooperation in the social information condition than the asocial condition. Although a concern for reputation might suggest that cooperative acts are more common when such behaviour is observable [82,83], overwhelmingly, classic economic games which provide breakdowns of group mates' behaviours, find free riding to be the dominant strategy [2,9]. In a study which compared playing with and without information about group mates' behaviour, higher contributions were found in groups where no information was available [46,84]. These, and our, findings suggest that providing social information reduces cooperation. One explanation is that social information is used to update beliefs about how little other group members are contributing [85].
Finally, as predicted, we found evidence of higher levels of cooperation in the SD game compared with the PD. Although the effect was small, this result is consistent with existing theoretical work [50,53] as well as biological [54,55] and experimental evidence [57,59] which predict higher cooperation in SD than PD.
There are several methodological aspects of our study worth addressing. Unlike most other studies which consider SD games using one shot or binary interactions, we allowed contributions on a continuous scale across multiple rounds. The setup of our SD game represented an extremely harsh SD game (e.g., compared to [62,65]) where a failure to meet the public good threshold resulted in a complete loss of all individuals' payoffs for that round. Many formulations (though often binary cases) consider such an outcome to result in no change in individuals' payoffs [50]. In that sense, our experiment may be more akin to a Chicken game, where mutual defection (or failure to swerve) produces an actively deleterious outcome. Nevertheless, the formulation of our experiment still conforms to the characteristic payoff hierarchy of the SD game (where cooperating against defector(s) is preferable to defecting) which applies to real life contexts. For instance, the failure of a population to reach the investment necessary for functional flood defences or invest sufficiently in predator defence could result in the collapse of that population. Therefore, we maintain that the setup of this experiment is a useful approximation of real-life cooperative dilemmas.
The mean group donation displayed in this experiment was around six units (of a possible ten), which showed little decline across rounds. This is unusual for PD social dilemma games, which generally show high initial donations which decline sharply towards the end of the game [9] and average contributions of around 37% [66]. One possibility is the relatively low number of rounds in our experiment, though previous experiments have shown declines within this timeframe [32,43]. Alternatively, participants may have been confused about how the game worked [86]. While this is possible, our self-report measure suggested that participants generally believed they understood how the game worked. A more likely explanation for the elevated contribution rate is the multiplication factor of 2 used in this experiment. High multiplication factors have been found to both raise cooperation rates and slow declines across rounds [87,88].
In our experiment, participants could be socially influenced by others taking part in the same iteration of the social dilemma game. This contrasts with other experimental designs which only allow social learning between groups playing different iterations of the social dilemma game [27,43]. The latter approach has benefits, as it allows social learning to be decoupled from other factors such as reciprocity or the possibility of participants attempting to influence their group mates' behaviour through their own behaviour. Nonetheless, we consider that the within-group social influence design holds greater ecological validity in simulating situations where people may be socially influenced by those that are participating in the same social dilemma. The decision not to manipulate what social information was offered to participants also approximated a more realistic scenario, allowing each participant to adopt one or more social learning strategies [21,42]. Of course, we cannot discount that participants used some other strategy (or combination of strategies) aside from the ones considered here [75].
Future research could address individual differences in social learning strategy use in the context of cooperation [21,68,89]. A larger sample size than was feasible in our experiment would allow the GLMMs to be extended to include an individual slope for each participant and calculate the proportion of participants who employed a given strategy [90].
An alternative might be to allow participants to choose what information they viewed [91]. Further attention should also be given to prestige as we failed to document a strong effect in contrast to clear predictions from theoretical models [28,29]. To address the possibility that our operationalization of prestige was not relevant to participants in this experiment, it would be useful to consider a different definition of prestige, perhaps one based on popularity [79]. Experiments could also investigate other game structures than those considered here, such as the stag hunt game [92,93]. Finally, it would be useful to consider social learning strategies within real-world cooperative scenarios. For example, normative messages are widely used in interventions to reduce household energy use [94] and cultural group selection has been applied to understand the transmission of lobstering practices in Maine [95]. Both our study and the literature suggest that payoff bias may affect cooperative behaviour within applied settings. However, given the overwhelming effect of intercept variation in our study, it may also be important to consider factors such as personality and the socio-cultural environment that shapes the development of inclinations to cooperate.

Design
The experiment involved four conditions in a between-participants 2 × 2 factorial design. Factor one was the social information condition (Asocial vs. Social), which manipulated whether the participants had access to social information (see below). Factor two was the payoff structure (Prisoner's dilemma vs. Snowdrift). We used post hoc model selection to infer which social learning strategies had been used.

Materials and Procedure
The experiment was executed using the experimental automation platform Dallinger [96] which recruits participants via Amazon's Mechanical Turk. Upon arrival into the virtual environment, participants were assigned to a group and a unique numerical participant ID was generated for them. Once the group contained four participants, the experiment began. It was split into two parts and all participants completed both parts.
Participants first completed a ten-item quiz containing a variety of questions assessing their understanding of how social groups work, which acted as a proxy for prestige (see Supplementary Materials). Each question had three possible answers with one (pre-determined) correct answer. At the end of the quiz, a public congratulation was displayed on screen for the participant who gave the most correct answers. For the rest of the experiment, this participant's ID was displayed surrounded by stars (*) and participants were aware this identified the top quiz scoring participant. The questions and scoring were identical across experimental conditions and at no point were participants' actual scores revealed.
Following the quiz, all four participants took part in a six-round public goods social dilemma game (PGG) with either a PD or SD payoff structure, designed in accordance with typical PGGs found in the literature [97]. Participants did not know ahead of time how many rounds they would be playing, but received detailed instructions on how the game worked (see Supplementary Materials) and were told that they would receive a bonus payment depending on their score. In the PD, in each round, participants were granted 10 points and could then decide how much of this to donate to a pot. The pot was then doubled and split evenly between all players. The points received from the pot were then added to what the participant had kept for themselves, which formed their total score for that round.
The SD game had the following modification: if the donations to the pot were less than 10, all participants received nothing for that round (including losing whatever points they had kept for themselves). This was motivated by precedent, as SD models have previously employed similar payoff structures [53]. Moreover, the snowdrift game requires that defection offers the best payoff against a cooperator but cooperating is favourable to defection against another defector [50]. In a real-life environment, this implies the public good is unreachable without some minimum investment. Therefore, a value of 10 was chosen because it allowed a single participant to meet the necessary threshold in a single round.
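The two payoff rules described above can be illustrated with a minimal Python sketch. The function name `round_payoffs` and its parameter names are illustrative assumptions and are not taken from the experiment's code; the endowment (10), multiplier (2), and SD threshold (10) follow the description in the text.

```python
def round_payoffs(donations, game="PD", endowment=10, multiplier=2, threshold=10):
    """Return each player's score for one round (hypothetical helper).

    donations: list of donations (0..endowment), one per player.
    game: "PD" (prisoner's dilemma) or "SD" (snowdrift).
    """
    if any(d < 0 or d > endowment for d in donations):
        raise ValueError("donations must lie between 0 and the endowment")
    pot = sum(donations)
    # SD modification: if the pot falls short of the threshold, every
    # player scores zero for the round, including the points they kept.
    if game == "SD" and pot < threshold:
        return [0.0] * len(donations)
    # The pot is multiplied and split evenly between all players.
    share = multiplier * pot / len(donations)
    # Each score is the points kept plus the share of the pot.
    return [(endowment - d) + share for d in donations]


print(round_payoffs([10, 0, 0, 0], game="PD"))  # one full donor, three free riders
print(round_payoffs([3, 2, 2, 2], game="SD"))   # pot of 9 misses the threshold
```

In the first call the single donor ends the round with fewer points than the free riders, which is the exploitation risk at the heart of the dilemma; in the second, the pot falls one point short and the whole group loses the round.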
In the asocial condition, participants received no information about their groupmates' behaviour and only received feedback on their own earnings from the round (calculated based on the amount received from the pot and what they kept). In the social information condition, at the end of each round, participants viewed a table showing each group member's ID, their contribution to the pot for that round, their cumulative score and the average donation across the group. The participant with the most cumulative points was labelled as the current leader with text beside their ID (see Supplementary Materials). The participant who scored the highest in the quiz was also labelled with stars (*) around their name. This information could be used if participants engaged in particular social learning strategies: the average donation for conformity; the identification of the PGG leader for a payoff bias; and the identity of the individual who scored the highest in the quiz as a proxy for prestige.
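As an illustration of how the three pieces of social information map onto the three candidate strategies, the following Python sketch derives them from one round of data. The function name, the dictionary layout, and the tie-breaking rule (the first participant with the maximum cumulative score is treated as leader) are assumptions made for the example; the text does not specify how ties were handled.

```python
def social_information(round_donations, cumulative_scores, quiz_winner_id):
    """Summarise the cues visible after a round (hypothetical helper).

    round_donations, cumulative_scores: dicts keyed by participant ID.
    quiz_winner_id: ID of the top quiz scorer (the prestige proxy).
    """
    # Conformity cue: the average donation across the group.
    average = sum(round_donations.values()) / len(round_donations)
    # Payoff-bias cue: the donation of the current leader, i.e. the
    # participant with the most cumulative points (ties: first found).
    leader_id = max(cumulative_scores, key=cumulative_scores.get)
    return {
        "conformity": average,
        "payoff_bias": round_donations[leader_id],
        # Prestige cue: the donation of the top quiz scorer.
        "prestige": round_donations[quiz_winner_id],
        "leader_id": leader_id,
    }


info = social_information(
    round_donations={1: 4, 2: 8, 3: 6, 4: 2},
    cumulative_scores={1: 30, 2: 25, 3: 40, 4: 33},
    quiz_winner_id=2,
)
print(info)
```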
Once the social dilemma game was completed, participants were debriefed, and basic demographic information was collected. Participants also answered two short free text questions to explain their decisions in the experiment (see Supplementary Materials). Participants were also asked to rate their understanding of the game to ensure they had understood the protocol [86].

Participants
Participants were recruited using Amazon's Mechanical Turk and completed the experiment online. After filtering out groups with missing data (exclusion criteria are described below), 286 participants remained in the dataset. Of those participants for whom demographic information was available, the median age was 32.5 years (IQR = 13 years), with 181 men, 97 women, and 1 non-binary individual. Participants self-identified as White (195), Black (23), Asian (21), Hispanic (11), other (12), or did not report this information (21). The experiment took between 5 and 10 min to complete. Participants earned a minimum of $1 for completing the experiment but could earn a further maximum bonus of $3 dependent upon their cumulative points score. On average, participants earned a bonus of $1.62, resulting in them earning above US federal minimum wage. Despite equal recruitment across conditions, there were unequal completion rates (see Table 4). The required sample size was determined using simulated data (code available at https://osf.io/vx78c/ accessed on: 21 November 2021).

Table 4. Distribution of participants and groups across experimental conditions.

Data Analysis
All analyses were pre-registered on the Open Science Framework (https://osf.io/vx78c/ accessed on: 21 November 2021), though minor deviations from this are explained below. Analyses were performed in R studio version 4.0.0 [98] using the packages tidyverse, ggplot2, and brms [99]. Though the use of rethinking was pre-registered, brms creates equivalent models and both packages use MCMC (Markov chain Monte Carlo) to construct the posterior distribution. Groups were excluded from analysis if either the group had fewer than three participants complete the experiment (to avoid 1:1 return from investment and collapse of the social dilemma [2]) or the individual who scored the highest in the quiz had dropped out. This also deviated from pre-registration, where any groups of fewer than four were planned to be dropped. This was necessary to maintain sufficient power because far more groups than anticipated were incomplete. The analysis excluded the first round's donation behaviour, as participants in the social information condition had not viewed any social information by this point. Finally, it was pre-registered that only the social condition would be used in models evaluating social learning strategies. Adding the interaction with the binary variable social meant it was not necessary to filter the data in this way.
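The reason for the three-participant minimum can be shown with a short worked example in Python. It assumes the PD payoff rule with a multiplier of 2, and uses the standard public goods quantity of marginal per-capita return (the amount a donor gets back per point donated); the function names are illustrative only.

```python
def marginal_per_capita_return(group_size, multiplier=2):
    """Points returned to a donor for each point they donate."""
    return multiplier / group_size


def is_social_dilemma(group_size, multiplier=2):
    """True when donating is individually costly but collectively beneficial."""
    mpcr = marginal_per_capita_return(group_size, multiplier)
    # Donating one point costs the donor 1 and returns mpcr, so it is
    # individually costly only when mpcr < 1. The group as a whole gains
    # from each donated point only when the multiplier exceeds 1.
    return mpcr < 1 and multiplier > 1


for n in (4, 3, 2):
    print(n, marginal_per_capita_return(n), is_social_dilemma(n))
```

With two remaining players each donated point is returned in full (a 1:1 return), so donating carries no individual cost and the dilemma collapses, whereas groups of three or four retain it.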
Cooperation (whole number of points donated between 0 and 10) was treated as a categorical variable for the purposes of the models, so Bayesian ordered probit regressions were fit to the data. An ordinal outcome was appropriate for cooperation, as it is not truly a continuous measure and using a cumulative log odds link function permits non-linearity between the levels of the variable. Bayesian models were fit using the brms package [99] and the posterior distribution constructed using MCMC and main model results validated in JAGS. All models were fit with four chains of 1000 warmup samples and 5500 samples for inference, using weakly regularising priors. Model diagnostics indicated good parameter identification and model convergence (Rhat values between 1.00 and 1.01 and lowest bulk effective sample size > 2500).
Multiple models were fit to evaluate each research question, each containing different combinations of predictor variables. Though each model differed in terms of additional predictor variables (described in the respective sections), all models contained a random effect of participant to account for autocorrelation between repeat observations and variability between participants. The models used to infer the effect of social learning strategies on the degree of cooperation also contained binary variable fixed effects of game structure (snowdrift), social information condition (social) and being prestigious (prestigious participant, see Appendix C) to act as control variables. Each was coded so that the integer 1 indicated, respectively, the social information condition (social), the snowdrift condition (snowdrift), and that the participant was the prestigious participant (prestigious participant).
The structure of the Prestige + Conformity + Payoff model is shown below, where Cooperation is predicted by a vector of probabilities $p$ and each response value $k$ is linked to an intercept parameter $\alpha_k$, with additional deviation from participant-level effects ($\varepsilon_{\text{participant}}$) and slopes for each of the possible predictors (e.g., payoff bias, prestige bias, and conformity). This produced an estimate of the cumulative log odds for all values of cooperation.

$$
\begin{aligned}
\text{Cooperation} &\sim \text{Categorical}(p) \\
p_k &= \begin{cases} q_k, & k = 1 \\ q_k - q_{k-1}, & k > 1 \end{cases} \\
\text{logit}(q_k) &= \begin{cases} \alpha_k - \phi, & k < 11 \\ \infty, & k = 11 \end{cases} \\
\phi &= \beta_1 \cdot \text{snowdrift} + \beta_2 \cdot \text{prestigious participant} + S \cdot \text{social} + \varepsilon_{\text{participant}} \\
S &= \beta_3 + \beta_4 \cdot \text{prestige bias} + \beta_5 \cdot \text{payoff bias} + \beta_6 \cdot \text{conformity bias} \\
\alpha_{1:10} &\sim \text{Normal}(0, 1.5)
\end{aligned}
$$

The predictive ability of competing models was evaluated using the Widely Applicable Information Criterion (WAIC), which is computed from the log likelihoods of models and a parameter penalty to ensure predictive ability is balanced against the risks of overfitting. The calculated value indicates predicted out-of-sample deviance, where lower values indicate lower deviance (and thus, better fit). Following model evaluation, model predictions were plotted to visualise the results. Note that all model predictions were generated using the average of the participant effects. For further discussions on this approach, see [100,101].
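To make the WAIC calculation concrete, the sketch below computes it from a matrix of pointwise log likelihoods using the standard definition, WAIC = −2 (lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC sums the posterior variance of each observation's log likelihood. This is a plain-Python illustration under that definition, not the routine used in the analysis (which relied on brms); the function names and the nested-list input format are assumptions.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log likelihood of observation i under
    posterior draw s; at least two draws are required.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0     # log pointwise predictive density
    penalty = 0.0  # effective number of parameters (p_WAIC)
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        penalty += variance(column)  # sample variance across draws
    return -2.0 * (lppd - penalty)


# Toy input: three posterior draws, two observations.
draws = [[-0.7, -1.4], [-0.6, -1.5], [-0.8, -1.3]]
print(waic(draws))
```

Because the penalty grows with the variability of the log likelihood across posterior draws, a more flexible model must improve lppd by more than its added penalty to achieve a lower WAIC.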

Simulation Model
We simulated the long-term consequences of participant behaviour, as estimated by the Bayesian model (see Appendix D for full details of the simulation). In the basic horizontal-transmission model, agents played repeated PD games. As in the experiment, at each round agents donated between 0 and 10 units and received a payoff according to their donation and the total in the public good. Each agent's contribution for the next round was calculated by sampling from the posterior of the Bayesian model that includes all three social learning strategies. Additionally, each agent had an intercept parameter, which can be thought of as their baseline propensity towards cooperation, independent of their social learning strategy use, drawn from the posterior distribution of intercepts estimated from the experiment.
In addition to matching the simulated conditions to those of the experiment, we also examined dynamics across different group sizes (see Appendix D) and varied agent intercept values to establish their influence on dynamics over that of the social learning strategies estimated from the experiment. We also introduced selection to examine what happens if, consistent with the replicator equation [40], the distribution of agent intercept values changes each round (or timestep) in proportion to the payoffs received by the agents.
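One simple way to realise selection of this kind is payoff-proportional resampling of intercepts, sketched below in Python. This is an assumption made for illustration: the text states only that the distribution of intercepts changes in proportion to payoffs, so the actual update rule in the simulation may differ, and the function name is hypothetical.

```python
import random


def select_intercepts(intercepts, payoffs, rng=random):
    """Resample agent intercepts with probability proportional to payoff.

    intercepts: baseline cooperative propensity of each agent.
    payoffs: non-negative payoff each agent received this round.
    rng: the random module or a random.Random instance.
    """
    if len(intercepts) != len(payoffs):
        raise ValueError("need one payoff per intercept")
    if sum(payoffs) <= 0:
        # No positive payoffs to select on: leave the population unchanged.
        return list(intercepts)
    # Agents with higher payoffs contribute more copies of their
    # intercept to the next round's population (population size fixed).
    return rng.choices(intercepts, weights=payoffs, k=len(intercepts))


rng = random.Random(42)
population = [-1.0, 0.0, 1.0, 2.0]
print(select_intercepts(population, payoffs=[15, 12, 9, 5], rng=rng))
```

In a PD round the lowest contributors earn the highest payoffs, so repeated application of a rule like this tends to shift the population toward lower intercepts, which is consistent with the reported reduction of cooperative propensity under selection.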
Supplementary Materials: The following are available online at https://osf.io/vx78c. Table S1: Quiz questions and instructions for part 1, Table S2

Informed Consent Statement: Informed consent was obtained from all participants involved with the study.

Data Availability Statement:
The data and analysis scripts used in this study are openly available here (https://osf.io/vx78c (accessed on 21 November 2021)).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Free Text Responses
During debriefing and collection of demographic information, participants were asked to provide free text responses (of any length) to two questions: "How do you think most people would say you should behave in this game?" and "Why did you choose to behave how you did in the game?". Overall, most responses were comments unrelated to the question, such as "good", or were too vague to determine their meaning (for example, "like you normally would behave"). Where possible, comments were grouped according to themes in participants' responses.
Question 1 (Table A1) sought to identify if there was any pattern in social norms participants felt were associated with the game. Of those that could be grouped, 90 participants either directly suggested cooperative behaviour (or a synonym of this) or made some reference to "fair" or group beneficial behaviour. Twenty-one suggested behaviours should be sensitive to the behaviour of others, which indicates concerns for reciprocity. In this sample, only 11 participants noted selfish behaviour as the expected norm. A further 14 participants indicated they didn't know or that they expected variability in individuals' opinions. Nevertheless, the general expected norm for participants in this experiment was for cooperative behaviour.

Table A1. Summed responses from the question "How do you think most people would say you should behave in the game" grouped according to the theme of the response.

| Theme | Count |
| --- | --- |
| Unrelated/Unclear comments | 148 |
| Behave cooperatively/alluding to "fair play" | 90 |
| Reciprocate others' cooperation or behave tactically in response to others | 21 |
| Unsure/provided no strong indication | 14 |
| Behave selfishly or for your own personal benefit | 11 |

Question 2 (Table A2) sought to better understand the conscious decision making of participants in this experiment. The two most common themes (40 and 39 participants respectively) were comments suggesting maximising earnings or that decisions were based on observing their groupmates. It should be noted that, although 39 participants reported basing their decisions on their group mates, no participant made explicit mention of any social learning strategies. Twenty-six participants claimed to be seeking to actively benefit their group and a further nine did so irrespective of their group's contributions. This conforms somewhat with findings from experimental games which find most participants can be grouped into strong cooperators/free riders, or conditional cooperators [102]. Seventeen participants referenced attempting to behave "fairly" to their group or that it was the "right thing to do". Five participants reported following no particular strategy (reportedly choosing randomly, in some cases), two alluded to the warm glow effect [103] and feeling good from donating points, and one participant reported wanting to "punish" those withholding points.

Table A2. Summed responses from the question "Why did you choose to behave how you did in the game" grouped according to the theme of the response.

Overall, there was considerable variation in participants' self-reported behavioural strategies but (aside from being generally influenced by their group) this did not include any explicit references to the social learning strategies addressed in this experiment. This ties into debates from the social learning literature regarding the conscious or unconscious use of social learning strategies [104]. In this case, the patterns associated with social information in our experiment do not appear to have occurred consciously, though the frequent mention of maximising earnings may imply the use of a payoff bias. Of note, participants made no mention of the highest scoring participant in the quiz (the proxy for prestige), which further supports concerns that this individual was not granted high status by participants.

Appendix B. Conformity Variable
As described in the main text, the conformity information used in the models included participants' own behaviour. Though this is consistent with what participants saw in the experiment, it means that participants contribute to their own social learning parameter, which may inflate the estimate associated with conformity. Therefore, the conformity information was reconstructed to exclude each participant's own behaviour from the average, and the Prestige + Conformity + Payoff model was rerun with this corrected variable. The prestige and payoff bias variables remained the same. Figure A1 shows plotted model predictions from the new model for varying levels of social information and Figure A2 shows model parameters from the corrected model plotted beside the Prestige + Conformity + Payoff model from the main text (uncorrected). The new conformity variable does not change the pattern of predictions. Payoff bias remains the strongest influence on participant cooperation while conformity and prestige remain weakly positive. The effect associated with conformity is virtually unchanged from that of the uncorrected model.
The parameter values are also virtually unchanged. The effect of the social condition remains negative while the difference between the SD and PD remains weakly positive. The social learning parameters did not change at all, meaning payoff bias remains as the most impactful social learning strategy. Therefore, the qualitative interpretations (and, largely, quantitative results) of the study remain unchanged whichever conformity variable is used in the analysis.
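The difference between the two conformity variables can be illustrated with a small Python sketch. The function names are hypothetical, and the example donations are invented purely for illustration.

```python
def conformity_including_self(donations):
    """Group average as displayed to participants (uncorrected variable)."""
    return sum(donations) / len(donations)


def conformity_excluding_self(donations, index):
    """Average donation of everyone except the participant at `index`."""
    others = donations[:index] + donations[index + 1:]
    if not others:
        raise ValueError("need at least one other group member")
    return sum(others) / len(others)


group = [10, 4, 4, 2]  # invented donations for a four-person group
print(conformity_including_self(group))
print([conformity_excluding_self(group, i) for i in range(len(group))])
```

For the highest donor the corrected average is lower than the displayed average, and for the lowest donor it is higher, which shows how including a participant's own donation pulls the conformity variable toward their own behaviour.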

Appendix C. Evaluating the Impact of Being Prestigious on Levels of Cooperation
To evaluate the impact of being prestigious on cooperative behaviour, a similar strategy to that of evaluating social condition and game structure was taken. Note that the goal was to evaluate whether the subset of prestigious participants was more cooperative than non-prestigious participants. This is separate from the question addressed in the main text of whether participants were being influenced by the prestigious individual. The Prestige + Conformity + Payoff model from the main text was compared to an equivalent model which dropped the binary variable for prestige and another which also permitted an interaction with the social condition. This is to assess the possibility that prestigious individuals may have sought to lead their group by example, as is described in theoretical work [28]. If this is the case, it would be expected that increased cooperation would only occur in the social condition (where behaviour is observable). WAIC scores and model weights are reported below (Table A3).

Table A3. WAIC values and model weights for models evaluating the impact of prestige. Standard error difference shows the standard error in the difference between each model and the model with the lowest WAIC value. The * denotes an interaction in the model.

As in the evaluation between game structure and social condition, there is similarly no clear distinction between any of the models. The small differences between WAIC indicate considerable uncertainty as to whether prestige should be in the model or not and that each model is likely to make very similar predictions. However, the models which include prestige are slightly favoured over the model that does not, which indicates some improvement in fit (combined weight = 0.82). To help visualise the predicted effects, model predictions from the Prestige * Social condition model are plotted below in Figure A3. There is a slightly greater spread of predicted values for prestigious individuals which is likely explainable by the smaller number of observations. Otherwise, there is little indication of any difference between the prestigious and non-prestigious participants and further, no clear indication of any patterns in the interaction effect. The mean parameter estimate of the fixed effect for prestige in the interaction model was 0.41 (95% PI = −0.18; 1.00) which overlaps 0 quite considerably, indicating uncertainty in the effect. Overall, there is little evidence that prestigious individuals are more cooperative than non-prestigious individuals.

Figure A3. 1000 predictions of mean cooperation drawn from the posterior distribution across all experimental conditions for prestigious (grey) and non-prestigious individuals (yellow).

Appendix D. Description of the Simulation Model
We constructed a simulation model which emulated the experiment and used the Bayesian posterior distributions and the Prestige + Conformity + Payoff model formula to effect changes in cooperation under various conditions. At each round, each agent could contribute between 0 and 10 to the public good and then received a payoff calculated from the sum contributions of others divided by group size plus how much they kept (as in the experiment). Agents played in groups of varying sizes (group size is noted for each model below) across 100 rounds. This was repeated for 100 groups. One difference from the experiment was that the prestigious agent was assumed not to partake in the public good. Instead, it was assumed that the prestigious agent could only be observed by the population, and their contribution level was set to either 2 or 8 (the value used is specified for each model). This kept the prestige effect consistent between groups, as otherwise the prestigious agent's behaviour would change alongside the other agents. The population was finite and did not change size over time. For simplicity, we simulated only the PD and not the SD payoff structure used in the experiment. Agents changed their behaviour using the model formula of the Prestige + Conformity + Payoff model shown below.
$$
\begin{aligned}
\text{Cooperation} &\sim \text{Categorical}(p) \\
p_k &= \begin{cases} q_k, & k = 1 \\ q_k - q_{k-1}, & k > 1 \end{cases} \\
\text{logit}(q_k) &= \alpha_k - \phi \\
\phi &= S + \text{Intercept} \\
S &= \beta_{\text{social}} + \beta_{\text{prestige}} \cdot \text{prestigious contribution} + \beta_{\text{payoff}} \cdot \text{top payoff contribution} \cdot \text{not the top earner} + \beta_{\text{conformity}} \cdot \text{average contribution}
\end{aligned}
$$

At the start of the simulation, each agent samples from the posterior distribution intercepts ($\alpha_k$) from the Bayesian model. These, alongside the cooperation values of other agents in the current round, are used to calculate $S$. The exception is the level of cooperation practiced by the prestigious agent, which is always set to 2 (low cooperation) or 8 (high cooperation).
At each round, we calculate a new cooperation value using the model formula above. A vector of probabilities for each level of cooperation ($p$), comprised of the probability of each individual level of cooperation ($k$), is used to generate a new value of cooperation from 0 to 10. $p_k$ is calculated from the vector of cumulative probabilities ($q_k$), which is obtained by reversing the logit after subtracting the linear model term ($\phi$) from the cut points ($\alpha_k$) sampled from the posterior distribution. $\phi$ is given by the sum of $S$ (social learning effect) and a varying intercept for each agent, which can be thought of as their baseline propensity to cooperate irrespective of social learning. $S$ is calculated using $\beta$ values sampled from the posterior distribution alongside the social information of the current round. Note that to maintain the degree of parameter uncertainty expressed by the Bayesian model, each agent sampled unique values for the parameters each round when using horizontal transmission.
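The sampling step described above can be sketched in Python as follows. The cut points and the value of the linear term φ in the usage example are invented for illustration rather than taken from the fitted posterior, and the function names are hypothetical. Response category k = 1, …, 11 is mapped to a donation of k − 1.

```python
import math
import random


def inv_logit(x):
    """Numerically stable inverse of the logit function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def category_probabilities(cutpoints, phi):
    """Probability of each donation level 0..10 under an ordered logit.

    cutpoints: ten increasing cut points (alpha_1..alpha_10).
    phi: linear term (social learning effect plus agent intercept).
    """
    # Cumulative probabilities q_k; the final category has q = 1.
    q = [inv_logit(a - phi) for a in cutpoints] + [1.0]
    # p_1 = q_1, and p_k = q_k - q_(k-1) for k > 1.
    return [q[0]] + [q[k] - q[k - 1] for k in range(1, len(q))]


def sample_cooperation(cutpoints, phi, rng=random):
    """Draw one donation (0..10) from the category probabilities."""
    p = category_probabilities(cutpoints, phi)
    return rng.choices(range(len(p)), weights=p, k=1)[0]


cutpoints = [-2.5 + 0.5 * k for k in range(10)]  # illustrative values only
rng = random.Random(0)
print([sample_cooperation(cutpoints, phi=0.5, rng=rng) for _ in range(6)])
```

Because φ is subtracted from every cut point, a larger φ lowers all cumulative probabilities and shifts probability mass toward higher donations, which is how both the agent intercept and the social learning term raise expected cooperation.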
See main text for main results.

Appendix D.3. Varying Cooperative Inclination of the Prestigious Individual
Here we compare dynamics across different constant contributions of the prestigious individual, taking a value of 2 or 8. Figure A4 shows that the degree of cooperation exhibited by the prestigious individual is weakly correlated with mean cooperation, but does not change the overall pattern of results.

Figure A4. Cooperation rates across 100 rounds (N = 100) where cooperation changes by horizontal transmission only. Low cooperation from the prestigious individual (left) and high cooperation from the prestigious individual (right). Black line shows mean cooperation rates across all groups and coloured lines show cooperation rates for 8 randomly drawn groups.

Appendix D.4. Varying Group Size
Here we compare the effect of group size upon mean levels of cooperation exhibited across the population. Figure A5 shows that mean cooperation rates stabilise to approximately the same level irrespective of group size, although, on average, smaller groups can sustain slightly higher cooperation levels than larger groups.