Behavioral Sciences
  • Review
  • Open Access

13 December 2025

Decision-Making in Repeated Games: Insights from Active Inference

1 State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
2 Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Behavioral Economics

Abstract

This review systematically explores the potential of the active inference framework in illuminating the cognitive mechanisms of decision-making in repeated games. Repeated games, characterized by multi-round interactions and social uncertainty, closely resemble real-world social scenarios in which the decision-making process involves interconnected cognitive components such as inference, policy selection, and learning. Unlike traditional reinforcement learning models, active inference, grounded in the principle of free energy minimization, unifies perception, learning, planning, and action within a single generative model. Belief updating occurs by minimizing variational free energy, while the exploration–exploitation dilemma is balanced by minimizing expected free energy. Based on partially observable Markov decision processes, the framework naturally incorporates social uncertainty, and its hierarchical structure allows for simulating mentalizing processes, providing a unified account of social decision-making. Future research can further validate its effectiveness through model simulations and behavioral fitting.

1. Introduction

Decision-making in repeated games is a crucial aspect of human intelligence and has been the focus of extensive research across fields such as psychology and artificial intelligence (Akata et al., 2025; van Dijk & De Dreu, 2021). In repeated games, participants engage in multiple rounds of interpersonal interactions that closely resemble the real social world—a world frequently portrayed as more unpredictable and uncertain than the non-social world (Feldman Hall & Shenhav, 2019). Decision-making within social contexts presents significant complexity and difficulty. This stems not only from the challenge of identifying choices that maximize self-interest given the unpredictability of others’ behaviors, but also from the necessity of reconciling one’s own self-interest with the interests of others, which requires a trade-off between cooperation and competition (Lee, 2008). Such decision-making presents significant challenges for both artificial intelligence and human agents. Consequently, a deeper investigation into its cognitive mechanisms is crucial not only for enhancing human decision-making capabilities but also for advancing AI towards greater intelligence, enhanced adaptability to complex social environments, and more effective collaboration with humans.
Traditional psychological methods are limited in their ability to assess such complex dynamic processes. Computational modeling, as a quantitative methodology characterized by rigor, scientific precision, and interpretability, has been extensively applied in domains such as computational psychiatry (Hitchcock et al., 2022; Montague et al., 2012), and has demonstrated emerging utility in social psychology (Cushman, 2024; Hackel & Amodio, 2018). It enables the simulation of complex cognitive processes underlying behavioral phenomena through mathematical formalisms, provides rigorous scientific characterization of the dynamic mechanisms governing human behavior, and derives latent variables from behavioral data that are not directly observable (Montague, 2018). In doing so, it establishes a novel theoretical framework for advancing understanding of psychological and neural mechanisms, while overcoming the limitations of traditional research methods.
Various families of computational models have been developed to characterize the decision-making process. Unlike reinforcement learning, which frames decision-making as an adaptive behavior driven by the principle of maximizing external rewards (Sutton & Barto, 1998), active inference conceptualizes decisions as a process of minimizing free energy (K. Friston et al., 2016). In this framework, the decision-making agent explores the environment to reduce uncertainty, minimizing the discrepancy between anticipated outcomes and preferred outcomes, while dynamically integrating perception and policy selection. The active inference framework offers a promising model for simulating human cognitive processes in partially observable environments, such as social contexts. This approach could significantly deepen our understanding of how human intelligence navigates adaptive decision-making within ever-changing social settings.

2. Game Theory and Repeated Games

2.1. The History of Game Theory Development

The development of game theory has undergone a long process from ideas to theory, and from simplicity to complexity, as shown in Figure 1. Long before the formal establishment of modern game theory, game-theoretic ideas were already present: Sun Tzu’s The Art of War in China and Machiavelli’s The Prince in the West already contained core game-theoretic concepts, such as strategic interaction and the balancing of interests. In the 16th century, the Italian mathematician Girolamo Cardano, in his book Liber de Ludo Aleae, was the first to apply mathematical methods to analyze the probabilities and payoffs of gambling games like dice, marking the germination of game-theoretic thought.
Figure 1. The history of game theory development.
The work of John von Neumann established game theory as an independent field in the first half of the 20th century. In 1928, von Neumann published On the Theory of Parlor Games, proving the minimax theorem for the first time and thereby providing a solid mathematical foundation for two-person zero-sum games (Neumann, 1928). He later simplified its proof using an extension of Brouwer’s fixed-point theorem, a method that subsequently became standard in game theory and mathematical economics. Collaborating with the economist Oskar Morgenstern, von Neumann developed these ideas into a monumental work published in 1944, Theory of Games and Economic Behavior. This book was the first to establish a systematic axiomatic framework for game theory; it explicitly positioned game theory as an analytical tool for economics, laying the foundation for its later widespread application (Von Neumann & Morgenstern, 1944).
After game theory became an independent discipline, its core concepts were repeatedly extended and refined, substantially strengthening its theoretical system. John Nash made outstanding contributions by proposing the most important concept in game theory, the Nash Equilibrium (Nash, 1950). In a multi-player game, when the strategies of all players form a stable combination such that no player can improve their own payoff by unilaterally changing their own strategy, this strategy profile constitutes a Nash Equilibrium. The Nash Equilibrium is a more general solution concept than the traditional minimax solution, applicable not only to zero-sum games but to all game models. John Harsanyi proposed the Bayesian Nash Equilibrium, applicable to situations where players do not have complete information about their opponents’ types (Harsanyi, 1968). By introducing a probability distribution over opponents’ types, he transformed games of incomplete information into games of complete but imperfect information, breaking through the previous limitation of game theory’s focus on complete information. This allowed for a more accurate simulation of human decision-making under uncertainty. Reinhard Selten proposed the Subgame Perfect Nash Equilibrium, which ensures that equilibrium strategies are reasonable and feasible in every subgame by eliminating non-credible threats in sequential games, thus refining the Nash equilibrium from a dynamic perspective (Selten, 1975). For constructing a complete, rigorous, and realistic theoretical framework for game theory and making groundbreaking contributions to equilibrium analysis, John Nash, John Harsanyi, and Reinhard Selten jointly received the Nobel Memorial Prize in Economic Sciences in 1994.
In recent years, the development of game theory has gradually extended beyond the boundaries of economics, deeply integrating with psychology, biology, computer science, and other fields, leading to a series of interdisciplinary studies. Scholars like Daniel Kahneman introduced cognitive biases into game analysis, revising the assumption of perfect rationality and promoting the development of behavioral game theory (Kahneman & Tversky, 1979). John Maynard Smith, among others, pioneered evolutionary game theory, using the concept of an Evolutionarily Stable Strategy to explain competition and cooperation in animal behavior (J. M. Smith & Price, 1973). In the field of computer science, game theory provides a framework for strategy optimization and equilibrium decision-making in multi-agent interactive scenarios, while computer science, in turn, provides efficient algorithms for solving game-theoretic problems (Nisan et al., 2007).

2.2. Repeated Games

Based on the sequence of players’ actions and the availability of information, games can be classified into static games and dynamic games. In static games, players act simultaneously with no knowledge of the other players’ actions, whereas in dynamic games, players move sequentially and have access to the observable history of play. Repeated games are a special form of dynamic game in which a stage game with the same structure is played multiple times in succession. The conditions, rules, and content of each iteration remain identical, and participants can observe historical behaviors after each round of interaction, requiring them to balance short-term gains against long-term interests when making decisions.
As noted above, John Nash and Reinhard Selten made significant contributions to establishing equilibrium concepts relevant to repeated games. Beyond their work, many other researchers have conducted in-depth studies of repeated games. Robert Aumann pioneered the theory of infinitely repeated games, demonstrating that rational players can achieve stable cooperation through trigger strategies (e.g., continuing to cooperate as long as the opponent cooperates, but triggering retaliatory defection upon observing the opponent’s betrayal). This resolved the core puzzle of how cooperation can emerge in repeated games and provided a theoretical framework for long-term interactive scenarios (Aumann, 1981). Thomas Schelling integrated repeated games with real-world social contexts, refining the classification framework into conflict games, cooperative games, and coordination games. He also introduced concepts such as focal points and commitment strategies, explaining how participants in repeated interactions can quickly reach cooperation through shared cognition (Schelling, 1960). In the Prisoner’s Dilemma tournaments, researchers demonstrated the effectiveness of the tit-for-tat strategy (i.e., imitating the opponent’s move from the previous round in each subsequent round) in repeated games (Axelrod, 1980). This strategy, being simple and clear, consistently promoted cooperation and provided strong empirical and simulation-based support for understanding cooperative behavior in the real world.
Depending on the nature of interdependence among players’ interests, repeated games can be categorized into three distinct types: conflict games, cooperation games, and coordination games (Schelling, 1960).
Conflict games, also known as zero-sum games, are strategic situations in which the interests of players are inherently antagonistic. In these games, the total sum of payoffs and losses across all players always equals zero, meaning one player’s gain necessarily implies another’s loss (Schelling, 1960). Within this framework, cooperation among players is structurally impossible; instead, decision-making is premised on competition. Participants focus on concealing their own strategic intentions while attempting to infer opponents’ strategies and psychological states to optimize their own outcomes. Conflict games manifest ubiquitously across real-world scenarios, encompassing competitive sports (e.g., table tennis), board games (e.g., chess), as well as the game of rock-paper-scissors. In psychological research, such adversarial frameworks are frequently used as experimental paradigms to investigate deception, inequality, and related strategic performance (Lacomba et al., 2017; Zhang et al., 2017).
Cooperation games, also known as mixed-motive games, characterize interactions where individuals face both competitive and collaborative incentives, resulting in payoff structures that are neither purely adversarial nor fully aligned. These games inherently involve resource allocation between self and others, compelling players to navigate a critical trade-off: whether to cooperate to maximize collective benefits or defect to secure personal gains (Miller Moya, 2007). Unlike zero-sum conflict games, participants in cooperative frameworks exhibit stronger motivations to understand opponents’ strategies (Wang & Kwan, 2023). Cooperation games such as the Prisoner’s dilemma, trust game, and dictator game are widely employed to investigate the formation of cooperation and defection (Bonowski & Minnameier, 2022; Loennqvist & Walkowitz, 2019; Press & Dyson, 2012). The Prisoner’s dilemma stands as a canonical example, in which two players independently choose between “cooperation” and “defection” without communication. Its payoff structure is characterized by four critical outcomes: mutual cooperation yields moderate rewards for both players; mutual defection results in moderate penalties; while unilateral defection provides the defector with maximum gains at the expense of the cooperator, who incurs the highest penalty. This matrix is shown in Figure 2. The payoff matrix of the game describes the strategies available to players and their corresponding outcomes. Based on this matrix, strategies and their expected utilities can be quantitatively analyzed. In the one-shot Prisoner’s dilemma, defection constitutes the dominant strategy that maximizes individual payoff under rationality. However, in the repeated game, persistent defection risks triggering retaliatory defection from opponents in subsequent rounds, reducing long-term cumulative gains. It is within this dynamic that cooperation gradually forms (A. M. Colman et al., 2018).
Figure 2. Decision and payoff matrix of the Prisoner’s dilemma game.
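To make the expected-utility analysis above concrete, the following Python sketch encodes a Prisoner’s dilemma payoff matrix and computes the expected utility of each action under a belief about the opponent’s move. The specific payoff values are illustrative assumptions (any ordering with temptation > reward > punishment > sucker’s payoff yields the same conclusion):

```python
import numpy as np

# Illustrative payoffs (assumed values): reward, sucker, temptation, punishment.
R, S, T, P = 3, 0, 5, 1
payoff = np.array([[R, S],   # own action: cooperate; opponent: (cooperate, defect)
                   [T, P]])  # own action: defect;    opponent: (cooperate, defect)

# Expected utility of each action under a belief about the opponent's move.
belief = np.array([0.5, 0.5])      # p(opponent cooperates), p(opponent defects)
expected_utility = payoff @ belief
print(expected_utility)            # [1.5 3.] - defection dominates, since T > R and P > S
```

In the one-shot game this calculation favors defection for any belief; it is only the shadow of future rounds, described above, that makes cooperation viable.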
The coordination game is a type of game in which the interests of participants are consistently aligned. In such a game, the payoffs and losses for all participants are identical, meaning that both parties will either succeed or fail together (Schelling, 1960). Under the rules of this game, due to the congruence of interests between the two sides, cooperation becomes their inevitable choice. The players only need to consider how to coordinate with each other, rather than deliberating on how to allocate resources or resolve conflicts of interest. Coordination games, such as matching games (e.g., the Heads-Tails game), involve scenarios in which players receive a positive payoff only if they choose the same option, whereas mismatched choices result in no payoff for either. Although less frequently studied in decision-making research, these games are commonly used to investigate inter-brain synchronization during cooperation (Cui et al., 2012; Pan et al., 2017).

3. Decision-Making in Repeated Games

Within social environments, human decision-making requires explicit consideration of others’ behaviors. Games provide a valuable methodological framework for quantitatively assessing such social decisions, with structured interaction scenarios enabling the study of social cognitive processes such as cooperation, competition, and trust formation (Maurer et al., 2018; Ng & Au, 2016; Zhang et al., 2017). Unlike single-round games in which participants have no shared history or future, repeated games more closely resemble real-life social interactions. These interactions involve multiple sequential encounters, with participants receiving feedback between rounds. The feedback enables participants to update their beliefs about the opponent and adjust their strategy accordingly, resulting in dynamically evolving behavior over time (Cochard et al., 2004).
On the spectrum of games, pure coordination games and zero-sum games represent the extremes, while mixed-motive games, which incorporate elements of both cooperation and conflict, occupy the intermediate space (Schelling, 1960). However, in the context of repeated games, regardless of the players’ incentive structures, the opponent’s behavior directly impacts the decision-maker’s choices and outcomes. The opponent’s hidden states and actions introduce social uncertainty that extends beyond non-social uncertainty. Social uncertainty refers to a person’s inability to precisely predict their own future states and actions due to uncertainty about others’ states and actions (Feldman Hall & Shenhav, 2019). The repeated game continuum provides a powerful framework for investigating decision-making processes under such social uncertainty.
Extensive research has focused on decision-making under non-social uncertainty. Individuals prioritize attention to cues that can reduce uncertainty (Walker et al., 2019) and pay greater attention to options with higher uncertainty (Stojic et al., 2020a). Increased uncertainty enhances individuals’ learning rates (Speekenbrink & Shanks, 2010) and promotes exploratory behaviors (Stojic et al., 2020b). While the mechanisms by which non-social uncertainty influences decision-making can be extended to social uncertainty to some extent, social contexts inherently involve greater unpredictability. This is because the behavioral motives of others are difficult to directly observe and evolve dynamically over time. Additionally, in contexts involving repeated interactions with others—such as repeated games—suboptimal decisions can have long-lasting negative effects. These effects may hinder the formation of cooperation with others and ultimately compromise one’s own interests.
To resolve social uncertainty and make adaptive decisions, individuals develop specialized mechanisms in which social inference plays an important role. Inference can be categorized into automatic and controlled inference based on the degree of cognitive control and the corresponding effort costs (Feldman Hall & Shenhav, 2019). When encountering opponents in a game, individuals rapidly and automatically form initial impressions—including whether they are trustworthy, threatening or risk-seeking—based on the opponent’s features like skin color and clothing (Hughes et al., 2017). When information about others is scarce, people quickly resort to established social norms, assuming others are more likely to make decisions based on principles like cooperation and trust (Fleischhut et al., 2022). Automatic inference consumes minimal cognitive resources and can strongly constrain subsequent predictions about opponents. In contrast, controlled inference requires greater cognitive control to infer the opponent’s motives and specific behaviors in a more nuanced manner, thereby updating the initial impression. People often engage in perspective-taking, adopting the viewpoint of others or their situational context to imagine or infer their perspectives and attitudes (Devaine et al., 2014; Galinsky et al., 2005). This process narrows the scope of predicting others’ behaviors and further reduces social uncertainty.
Based on inferences about their opponent, individuals need to engage in appropriate policy selection in games. Traditional behavioral game theory posits that policy selection is based on the principle of expected utility maximization (Von Neumann & Morgenstern, 1944). Individuals are assumed to rationally weigh the future rewards of different action sequences and choose the policy that yields the highest payoff. However, this principle faces explanatory limitations in social decision-making contexts characterized by high uncertainty (A. Colman, 2003). It relies on the individual’s ability to form and update precise probabilistic beliefs about others’ behavior, which poses a significant challenge when the opponent’s strategy is unknown or dynamically changing. The recently developed active inference framework offers a novel perspective. It proposes that policy selection is grounded in the principle of free energy minimization—that is, minimizing prediction error (Parr & Friston, 2019). This process entails both minimizing the discrepancy between actual outcomes and the individual’s preferred outcomes, and actively reducing uncertainty about the environment. Consequently, the driver of policy selection expands from merely pursuing reward to a dual pursuit of both pragmatic and epistemic value.
In repeated games, the outcomes resulting from policy selections directly guide an individual’s subsequent decisions. Participants can directly observe opponents’ responses to their behavioral choices, receive feedback between game rounds, and engage in learning (Behrens et al., 2007). They integrate new feedback evidence with prior inferences by weighting these information sources, thereby updating their predictions about the opponent (Speekenbrink & Shanks, 2010). Depending on whether the feedback is consistent with prior predictions, it can broaden or narrow the belief distribution concerning the opponent’s likely behaviors (Feldman Hall & Shenhav, 2019). As the game unfolds, participants continuously gather information about the opponent, learning their behavioral patterns. The learning rate is influenced by uncertainty; at the beginning of repeated games, when social uncertainty is highest, the learning rate is also at its peak (Courville et al., 2006; Speekenbrink & Shanks, 2010). At the same time, participants face a critical exploration–exploitation trade-off during learning (Gershman, 2019; Krafft et al., 2021). They must weigh the choice between exploring unknown strategies to observe the opponent’s response and gain information about the value of such behaviors, versus exploiting known strategies that maximize immediate payoff based on past experience (Speekenbrink & Konstantinidis, 2015). Excessive exploration may lead to reduced efficiency, while prematurely exploiting existing experience may result in missing out on better strategies.
Social gaming situations evoke high levels of uncertainty, during which the decision-making process, as illustrated in Figure 3, involves interrelated mechanisms of inference, policy selection, and learning. Automatic inference and controlled inference operate simultaneously as two poles of a continuum, constraining predictions regarding the intentions and behaviors of opponents. These predictions guide participants’ strategic choices based on different principles and are continuously updated through learning in repeated games, thereby further reducing social uncertainty. Human social cognition resembles a sophisticated computational system, allowing individuals to predict, respond to, and coordinate with others’ behaviors, thereby underpinning the stability of social interactions.
Figure 3. Decision-making process in repeated games.

4. Computational Modeling for Decision-Making

With the advancement of computational science, computational ideas pervade many areas of science and play an integrative explanatory role in the cognitive sciences and neurosciences (Montague et al., 2012). While focusing on human social behavior, social psychology is committed to explaining social life based on the general principles of human cognition (Cushman & Gershman, 2019). Computational ideas have thus been increasingly adopted and utilized by social psychology researchers. Computational modeling offers a rigorous framework for describing abstract theories underlying human sociality in a clear and interpretable manner (Cushman, 2024). Decision-making in repeated games involves recursive cognitive processes such as inference and learning, and recursivity, characterized by cyclic causal relationships, is a core concept in computational modeling (Qinglin & Yuan, 2021). Additionally, computational modeling establishes a system of interacting relationships, allowing models of inference and learning to be integrated to capture more complex cognitive processes. Thus, employing computational modeling to investigate the underlying social psychological mechanisms of repeated games is highly feasible.
Current computational models for decision-making research can be primarily categorized into two major classes: reinforcement learning models and Bayesian models. Reinforcement learning models, grounded in the rational agent assumption, are widely used in the field to simulate how individuals learn from outcome feedback during interactive behaviors (Tomov et al., 2021; Tump et al., 2024; Yifrah et al., 2021). These models conceptualize the individual-environment interaction as a Markov Decision Process (MDP), in which the agent can observe all possible states of the environment and influence it through actions. As the state of the environment transitions, a reward function provides feedback to the individual, who in turn adapts their behavior based on the principle of maximizing cumulative rewards (Puterman, 1994). Researchers have integrated various cognitive strategies with reinforcement learning frameworks to develop a range of decision-making models. For instance, the Prospect-Valence Learning model posits that individuals evaluate the expected utility of different options and update expected valences using reinforcement learning rules to guide decisions (Ahn et al., 2008). Similarly, the Valence-Plus-Perseverance model combines heuristic strategies with reinforcement learning, proposing that choices are influenced by previous actions and their outcomes, which are integrated with expected valence to inform subsequent decision-making (Worthy et al., 2013).
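As a concrete illustration of this model family, the following Python sketch implements a delta-rule valence update with a prospect-style utility function and a softmax choice rule, in the spirit of the Prospect-Valence Learning model; the parameter names and default values are illustrative assumptions rather than the published model specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def pvl_update(Ev, choice, outcome, alpha=0.5, lam=2.0, phi=0.2):
    """Delta-rule valence update with a prospect-style utility (illustrative parameters).

    Ev: expected valences per option; alpha: outcome sensitivity;
    lam: loss-aversion weight; phi: learning rate.
    """
    u = outcome**alpha if outcome >= 0 else -lam * abs(outcome)**alpha
    Ev = Ev.copy()
    Ev[choice] += phi * (u - Ev[choice])   # move valence toward experienced utility
    return Ev

def softmax_choice(Ev, theta=1.0):
    """Sample an option with probability proportional to exp(theta * valence)."""
    p = np.exp(theta * (Ev - Ev.max()))
    return int(rng.choice(len(Ev), p=p / p.sum()))

Ev = np.zeros(2)
choice = softmax_choice(Ev)
Ev = pvl_update(Ev, choice, outcome=-5.0)  # a loss lowers the chosen option's valence
print(Ev)
```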
The rational agent assumption of reinforcement learning models is not fully consistent with individuals’ decision-making behaviors in daily life, and the environment in which individuals operate is not fully observable—instead, it is fraught with uncertainty. Therefore, unlike reinforcement learning models, Bayesian models for decision-making are based on the Partially Observable Markov Decision Process (POMDP), which formalizes the state of the decision-making environment as only partially observable (Itoh & Nakamura, 2007). Bayesian models assume that individuals exhibit bounded rationality and hold their own prior beliefs about environmental states; they can update these beliefs based on feedback obtained from interactions to form posterior beliefs, and then make decisions accordingly. In comparison, the Bayesian framework conceptualizes decision-making as a process of belief updating based on both internal and external information under uncertainty, thereby offering a more reasonable explanation for decision-making behavior in everyday social interactions (Feldman Hall & Shenhav, 2019). Meanwhile, the Bayesian framework does not require individuals to behave in a strictly Bayes-optimal manner. Rather, approximate Bayesian inference processes can effectively account for human decision-making behavior.

5. Basic Concepts of Active Inference

Active inference, as a specific implementation and computational framework of approximate Bayesian inference, is founded on a key premise: that organisms possess an internal generative model of the statistical regularities of their environment. Equipped with this model, an organism can infer the hidden causes of its perception and select optimal actions to achieve desired outcomes. Simultaneously, the active inference framework is underpinned by two core concepts (R. Smith et al., 2022). The first is that decision-makers are not simply passive Bayesian observers. Instead, they actively engage with the environment to gather information and seek preferred observations. The second is Bayesian inference, which refers to the uncertainty-weighted updating of prior beliefs based on the observed distribution of possible outcomes, as formalized by Bayes’ theorem:
$$
p(s \mid o) = \frac{p(o \mid s)\, p(s)}{p(o)} \tag{1}
$$
In Bayes’ theorem, the left-hand side, p(s|o), represents the posterior belief about possible states (s)—that is, the updated belief distribution after incorporating a new observation (o). Here, s is an abstract variable that can represent any entity about which beliefs can be formed. On the right-hand side, p(s) denotes the prior belief about s before acquiring the new observation. The term p(o|s) is the likelihood term, representing the probability of observing o given that the state is s. The denominator p(o) represents the model evidence, also known as the marginal likelihood, which indicates the total probability of observing o. It serves as a normalization constant, ensuring that the posterior belief constitutes a valid probability distribution.
For instance, in a cooperation game, we need to infer whether an opponent is more inclined to cooperate or defect. Here, s represents the opponent’s behavioral type. Assuming the opponent can be coarsely categorized into two types: s1 represents a cooperative type and s2 represents a defecting type. The variable o corresponds to the observed actual behavior of the opponent in each round of the game. Prior to the game, we can form an initial impression of the opponent through social inference. For example, if we believe the opponent is inclined to cooperate, we may assign a higher prior probability to s1, such as p(s1) = 0.8, and consequently p(s2) = 0.2. Suppose during the feedback phase of the game, we observe our opponent’s defection. In this context, p(o|s1) signifies the probability that a cooperative-type opponent would defect; this value is expected to be low, for instance, p(o|s1) = 0.3. Conversely, p(o|s2) denotes the probability that a defecting-type opponent would defect, which would be higher, e.g., p(o|s2) = 0.7. The marginal likelihood p(o), which acts as a normalizing constant, is calculated as the total probability of observing a defection across all possible opponent types: p(o) = p(o|s1) × p(s1) + p(o|s2) × p(s2) = (0.3 × 0.8) + (0.7 × 0.2) = 0.38. The posterior belief that the opponent is cooperative is then updated via Bayes’ theorem: p(s1|o) = [p(o|s1) × p(s1)]/p(o) = (0.3 × 0.8)/0.38 ≈ 0.632. Similarly, the posterior belief that the opponent is defecting is: p(s2|o) = [p(o|s2) × p(s2)]/p(o) = (0.7 × 0.2)/0.38 ≈ 0.368. Based on this Bayesian update, after observing one instance of defection, the player’s belief about the opponent’s behavioral pattern shifts: the probability assigned to the cooperative type decreases from 80% to approximately 63.2%, while the probability of the defecting type increases from 20% to 36.8%. In subsequent rounds of the game, this posterior belief becomes the new prior. The player continues to perform Bayesian updates based on the opponent’s new actions, iteratively refining their beliefs to better approximate the opponent’s true behavioral pattern.
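The arithmetic in this example can be reproduced in a few lines of Python, using the same assumed probabilities:

```python
import numpy as np

# Prior beliefs over the opponent's type: s1 = cooperative, s2 = defecting.
prior = np.array([0.8, 0.2])
# Likelihood of observing a defection under each type (values from the example above).
likelihood_defect = np.array([0.3, 0.7])

p_o = likelihood_defect @ prior                # marginal likelihood: 0.3*0.8 + 0.7*0.2 = 0.38
posterior = likelihood_defect * prior / p_o    # Bayes' theorem
print(posterior)                               # approximately [0.632 0.368]

# In later rounds, this posterior serves as the new prior and the update repeats.
```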
In this simple example, we were able to easily compute Bayes’ theorem numerically. However, beyond the simplest belief distributions, the marginal likelihood p(o) in Bayes’ theorem is computationally intractable. It requires summing the probability of observations under all possible states. As the number of state dimensions increases, the number of terms to sum grows exponentially. For instance, in a game, an opponent’s type might be defined by multiple parameters such as cooperative tendency, risk aversion, and capability. In such cases, calculating p(o) becomes an integration over an ultra-high-dimensional space, which is infeasible to perform directly. Since computing posterior beliefs through exact Bayesian inference is infeasible in complex models, approximation techniques are required to solve this problem. This computational infeasibility is the fundamental reason for the central concept of variational free energy (VFE) in active inference.
Since the exact posterior distribution p(s|o) is computationally intractable, a simple approximate posterior distribution q(s) is introduced. By adjusting its parameters through optimization, q(s) is made to approximate p(s|o) as closely as possible, transforming the intractable Bayesian inference into the problem of minimizing a computable function. The metric used to quantify the discrepancy between two distributions is the Kullback–Leibler (KL) divergence: the closer two distributions match, the smaller the KL divergence. Starting from the definition of the KL divergence, the divergence between q(s) and p(s|o) can be expressed as follows:
$$
\begin{aligned}
D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s \mid o)\right] &= \sum_s q(s) \ln \frac{q(s)}{p(s \mid o)} \\
&= \sum_s q(s) \left[\ln q(s) - \ln p(s \mid o)\right] \\
&= \sum_s q(s) \left[\ln q(s) - \ln \frac{p(o, s)}{p(o)}\right] \\
&= \sum_s q(s) \left[\ln q(s) - \ln p(o, s)\right] + \ln p(o) \\
&= \mathbb{E}_{q(s)}\!\left[\ln \frac{q(s)}{p(o, s)}\right] + \ln p(o)
\end{aligned}
\tag{2}
$$
According to Equation (2), making q(s) as close as possible to p(s|o) requires minimizing $D_{\mathrm{KL}}[q(s)\,\|\,p(s \mid o)]$. Since $\ln p(o)$ is a constant term independent of q(s), minimizing the KL divergence is equivalent to minimizing $\mathbb{E}_{q(s)}[\ln q(s) - \ln p(o, s)]$, which is defined as the variational free energy (F). Since $D_{\mathrm{KL}}[q(s)\,\|\,p(s \mid o)] \geq 0$, it follows by rearranging Equation (2) that $F \geq -\ln p(o)$. The quantity $-\ln p(o)$ represents the surprise of sensory input, where p(o) denotes the overall probability of obtaining an observation, also known as the model evidence. A lower value of p(o) corresponds to a higher degree of surprise in the sensory input. For example, the probability of seeing an elephant in an office is very low, so the surprise associated with such an observation is high. As the variational free energy F is an upper bound on surprise, minimizing the variational free energy consequently minimizes surprise indirectly.
The equation for variational free energy and its further derivation are shown below. It is worth noting that in the active inference framework, variational free energy is typically computed conditioned on different policies (π). This is because active inference emphasizes that agents actively infer future observations based on potential courses of action, hence the approximate posterior and other quantities are conditioned on policies.
$$
\begin{aligned}
F(\pi) &= \mathbb{E}_{q(s \mid \pi)}\!\left[\ln \frac{q(s \mid \pi)}{p(o, s \mid \pi)}\right] \\
&= \mathbb{E}_{q(s \mid \pi)}\left[\ln q(s \mid \pi) - \ln p(o, s \mid \pi)\right] \\
&= \mathbb{E}_{q(s \mid \pi)}\left[\ln q(s \mid \pi) - \ln p(s \mid \pi)\right] - \mathbb{E}_{q(s \mid \pi)}\left[\ln p(o \mid s, \pi)\right] \\
&= D_{\mathrm{KL}}\!\left[q(s \mid \pi)\,\|\,p(s \mid \pi)\right] - \mathbb{E}_{q(s \mid \pi)}\left[\ln p(o \mid s)\right]
\end{aligned}
\tag{3}
$$
According to the last line of Equation (3), minimizing variational free energy is equivalent to minimizing $D_{\mathrm{KL}}[q(s \mid \pi)\,\|\,p(s \mid \pi)]$ while maximizing $\mathbb{E}_{q(s \mid \pi)}[\ln p(o \mid s)]$. The KL term quantifies the difference between the approximate posterior distribution q(s|π) and the prior distribution p(s|π) under a given policy π. It represents the cost incurred to align posterior beliefs with prior expectations, serving as a measure of complexity. Minimizing this term means the agent tends to choose policies that minimize changes in its beliefs. Excessively high complexity leads to a phenomenon analogous to overfitting in statistics, where the agent adjusts its model to accommodate the randomness of a particular observation, thereby reducing its overall predictive capability. The term $\mathbb{E}_{q(s \mid \pi)}[\ln p(o \mid s)]$ is the expected log-likelihood, i.e., the expected logarithmic probability of observing o given state s. It reflects how accurately the model predicts observations based on beliefs. Therefore, minimizing variational free energy is achieved by simultaneously minimizing model complexity and maximizing predictive accuracy.
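Continuing the two-type opponent example from above, the following Python sketch computes the variational free energy for an arbitrary approximate posterior, verifies the complexity-minus-accuracy decomposition (here without policy conditioning), and confirms that F upper-bounds surprise; the value chosen for q(s) is arbitrary and purely illustrative:

```python
import numpy as np

prior = np.array([0.8, 0.2])       # p(s) for the two opponent types
lik = np.array([0.3, 0.7])         # p(o = defect | s)
joint = lik * prior                # p(o, s)
surprise = -np.log(joint.sum())    # -ln p(o)

def vfe(q):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)]."""
    return np.sum(q * (np.log(q) - np.log(joint)))

def vfe_decomposed(q):
    """Equivalent form: complexity (KL from the prior) minus accuracy."""
    complexity = np.sum(q * (np.log(q) - np.log(prior)))
    accuracy = np.sum(q * np.log(lik))
    return complexity - accuracy

q = np.array([0.7, 0.3])                           # an arbitrary approximate posterior
print(np.isclose(vfe(q), vfe_decomposed(q)))       # True: the two forms agree
print(vfe(q) >= surprise)                          # True: F is an upper bound on surprise

exact_posterior = joint / joint.sum()
print(np.isclose(vfe(exact_posterior), surprise))  # True: bound is tight at the exact posterior
```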
The active inference framework treats perception and learning as processes of minimizing variational free energy (K. J. Friston, 2010). Perception corresponds to updating posterior beliefs in real-time based on each new observation, providing the best interpretation of sensory input; learning corresponds to gradually adjusting model parameters over long-term observations to align with accumulated experience. In the processes of perception and learning, the agent is not solely concerned with finding the best-fitting posterior. It also strives to update its beliefs in the most parsimonious way, avoiding excessive deviations from prior beliefs, thereby achieving a balance between accuracy and complexity.
Active inference addresses not only the processing of past and present information but also planning and action selection concerning future states. Similar to the principles underlying perception and learning, the goal of planning and action selection is to choose a policy π that minimizes variational free energy in the future. The key distinction lies in the extension of the variational free energy to incorporate expected future observations, yielding the expected free energy (EFE). The derivation of the expected free energy (G) is shown below. The first two lines of Equation (4) closely resemble those of the variational free energy, with the only difference being the inclusion of future observations o under the expectation. After the third line decomposes expected free energy into two components related to information seeking and reward seeking, the fourth line introduces the agent’s preferences C. As a decision-making model, active inference likewise requires encoding preferences. Unlike reinforcement learning, which encodes preferences as an external reward function, active inference internalizes them as part of the generative model by incorporating preferred observations $p(o \mid C)$ into the expected free energy.
$$
\begin{aligned}
G(\pi) &= \mathbb{E}_{q(o, s \mid \pi)}\!\left[\ln \frac{q(s \mid \pi)}{p(o, s \mid \pi)}\right] \\
&= \mathbb{E}_{q(o, s \mid \pi)}\left[\ln q(s \mid \pi) - \ln p(o, s \mid \pi)\right] \\
&= \mathbb{E}_{q(o, s \mid \pi)}\left[\ln q(s \mid \pi) - \ln p(s \mid o, \pi)\right] - \mathbb{E}_{q(o \mid \pi)}\left[\ln p(o \mid \pi)\right] \\
&\approx -\,\mathbb{E}_{q(o, s \mid \pi)}\left[\ln q(s \mid o, \pi) - \ln q(s \mid \pi)\right] - \mathbb{E}_{q(o \mid \pi)}\left[\ln p(o \mid C)\right]
\end{aligned}
\tag{4}
$$
In Equation (4), the first term in the final line, $\mathbb{E}_{q(o, s \mid \pi)}[\ln q(s \mid o, \pi) - \ln q(s \mid \pi)]$, represents the epistemic value, which expresses the expected log difference between the approximate posterior (based on the policy and the observations it generates) and the prior (based on the policy). A greater difference between the prior and the posterior indicates that the agent acquires more new information, implying a greater information gain from the policy and, consequently, a higher epistemic value. Because this term enters the equation with a negative sign, a higher epistemic value leads to a lower expected free energy for that policy. The second term, $\mathbb{E}_{q(o \mid \pi)}[\ln p(o \mid C)]$, represents the pragmatic value, which expresses the expected log probability of the agent’s preferred observations. A higher probability of preferred outcomes indicates that the agent is more likely to obtain rewards, resulting in a greater pragmatic value. Similarly, a higher pragmatic value also leads to a lower expected free energy.
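These quantities can be computed directly for simple models. The following Python sketch evaluates the expected free energy of two hypothetical one-step policies in the two-type opponent setting, where each policy is summarized by the state beliefs it is predicted to produce; the likelihood and preference values are illustrative assumptions:

```python
import numpy as np

A = np.array([[0.9, 0.1],   # p(o | s): rows = observations (cooperate, defect), cols = states
              [0.1, 0.9]])
log_C = np.log(np.array([0.8, 0.2]))   # log preferences ln p(o | C): cooperation is preferred

def efe(qs):
    """G = -(epistemic value) - (pragmatic value) for predicted state beliefs q(s|pi)."""
    qo = A @ qs                        # predicted observations q(o|pi)
    epistemic = 0.0
    for o in range(len(qo)):           # expected KL between q(s|o,pi) and q(s|pi)
        post = A[o] * qs / qo[o]       # posterior after a hypothetical observation
        epistemic += qo[o] * np.sum(post * (np.log(post) - np.log(qs)))
    pragmatic = qo @ log_C             # E_q(o|pi)[ln p(o | C)]
    return -epistemic - pragmatic

# Two hypothetical policies, summarized by the state beliefs they are predicted to yield.
print(efe(np.array([0.5, 0.5])))   # uncertain beliefs about the opponent
print(efe(np.array([0.9, 0.1])))   # confident beliefs that the opponent cooperates (lower G)
```

Note that the epistemic value is computed here as the expected KL divergence between posterior and prior beliefs, an equivalent way of writing the expected log difference in Equation (4).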
Planning and action selection are conceptualized as a process of minimizing expected free energy in active inference—that is, the agent seeking a policy that maximizes the sum of pragmatic value and epistemic value (Hodson et al., 2024; Parr & Friston, 2019). Consequently, during behavioral selection, the agent not only seeks to maximize reward returns by exploiting known resources but also pursues exploring unknown information to reduce uncertainty. Expected free energy offers a principled solution to the exploration–exploitation dilemma in repeated games (Gijsen et al., 2022). Exploration seeking epistemic value and exploitation seeking pragmatic value are regarded as two equally important aspects of minimizing expected free energy. The exploration–exploitation trade-off transitions from a sequential decision-making problem in reinforcement learning to the optimization of a single objective function associated with expected free energy minimization. The choice to explore or exploit depends on current levels of uncertainty and the level of expected reward (R. Smith et al., 2022). It is worth emphasizing that the epistemic value term within expected free energy formally instantiates a mechanism for directed exploration, analogous to curiosity, driving the agent to autonomously and actively seek observations that reduce uncertainty about hidden states (K. Friston et al., 2015; Parr & Friston, 2017).

6. Why Active Inference? Advantages for Decision-Making in Repeated Games

Active inference, a theoretical framework grounded in generative models commonly formulated as partially observable Markov decision processes (POMDPs), offers both depth and breadth for understanding decision-making in repeated games. Its strength lies not only in its ability to unify multiple cognitive processes under the single principle of minimizing free energy, but also in its capacity to more closely capture the core essence of human social interaction. Future research could validate the effectiveness of active inference models through computational model comparisons and simulations. Furthermore, by fitting these models to behavioral game data and neuroimaging data, we can rigorously test the psychological and neural significance of their parameters, thereby bridging the gap between computation, behavior, and brain function.

6.1. The Free-Energy Principle: A Unified Framework for Cognitive Integration and Behavioral Optimization

The free-energy principle posits that any self-organizing system (such as the brain) within a changing environment must minimize its free energy to maintain the homeostasis necessary for survival (K. J. Friston et al., 2006). Free energy, serving as an upper bound on surprise, is a tractable measure of prediction error. The most fundamental theoretical advantage of the active inference framework lies precisely in this free-energy principle. It transcends the paradigm of traditional models that treat perception, learning, planning, and action selection as separate modules, offering instead a unified and biologically plausible computational framework. This framework conceptualizes all these cognitive processes as different manifestations of a single principle: free energy minimization (K. J. Friston, 2010). This characteristic endows it with exceptional explanatory power for modeling the complex decision-making processes inherent in repeated games. Within the context of repeated games, a player’s perception of their opponent (inferring their hidden states), immediate action selection (choosing to cooperate or defect), and learning from multi-round interactions (updating beliefs about the opponent’s behavioral patterns) are no longer treated as separate cognitive processes. Instead, they are integrated into a unified process of active inference under this single framework.
Conventional cognitive science research often presupposes different optimization objectives for distinct cognitive functions such as perception and action. For instance, perception optimizes for accuracy, while action optimizes for utility. In social decision-making, the accuracy of individuals’ inferences about another person’s thoughts and feelings is crucial, as it pertains to prediction, control, and decision outcomes (Vorauer et al., 2025). Researchers have accordingly focused on how to optimize the accuracy of interpersonal perception within social interactions (Kenny & Albright, 1987). At the action level, from a game theory perspective, the goal of an individual in a game is to continually optimize their behavior to approximate an optimal strategy, seeking to maximize their own utility (Camerer, 2003). Although perception and action thus have different optimization targets, the active inference framework aligns them around the same fundamental objective: minimizing the model’s prediction error, that is, minimizing free energy. In terms of perception, minimizing free energy involves updating internal representations of others’ thoughts and feelings to reduce the discrepancy between the “anticipated thoughts of others” and the “actually observed interactive cues,” ultimately enhancing the accuracy of interpersonal perception. In terms of action, minimizing free energy involves adjusting one’s own behavior to make the “actual outcome of the interaction” align more closely with the “preferred anticipated outcome,” thereby increasing behavioral utility.
Furthermore, regarding the core issue of the exploration–exploitation dilemma in repeated games, the free energy principle similarly offers an endogenous solution, enabling a dynamic balance in behavioral optimization. Traditional reinforcement learning models often depend on externally tuned parameters to regulate exploratory versus exploitative behaviors. For instance, in the ε-greedy strategy, ε represents the probability of exploration, while 1 − ε corresponds to exploitation, with ε commonly scheduled to decay over the learning process (Vermorel & Mohri, 2005). The setting and adjustment of ε are generally guided by empirical or mathematical considerations rather than being grounded in a well-founded cognitive mechanism. In active inference, by contrast, this trade-off is internalized through the minimization of expected free energy, which reframes it as a unified optimization problem with a more principled cognitive basis (K. Friston et al., 2015). In the mathematical formulation of expected free energy, pragmatic value motivates exploitative behavior, whereas epistemic value promotes exploratory actions (Kirsh & Maglio, 1994). Exploration and exploitation are integrated into a single objective function (surprise minimization), where action selection aims to maximize the combined epistemic and pragmatic value (K. J. Friston, 2010). Thus, compared to classical expected utility maximization, surprise minimization under the free energy principle does not contradict but extends it by incorporating epistemic value. This allows the model to adapt dynamically to the game context. In early interaction stages, when social uncertainty is high, epistemic value dominates, leading to pronounced exploratory behavior. As beliefs about the opponent become more precise, uncertainty decreases, the weight of epistemic value diminishes, and pragmatic value increasingly guides decision-making, resulting in a natural shift toward exploitation. This dynamic balance arises from the evolution of internal belief states rather than from external parameter tuning, significantly enhancing the model’s realism and explanatory power in describing human behavior in repeated games. Moreover, the free energy principle provides a compelling account of commonly observed behaviors, such as risk-taking in high-reward contexts and curiosity-driven exploration in the absence of extrinsic incentives (R. Smith et al., 2022). By incorporating the exploration–exploitation trade-off into the unified framework of the free energy principle, the active inference framework exhibits strong explanatory power in both behavioral optimization and cognitive integration.
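To make the contrast concrete, the following minimal Python sketch shows the ε-greedy scheme described above, with an assumed exponential decay schedule; the exploration rate is an externally tuned parameter, whereas in active inference the analogous balance emerges from the agent’s evolving beliefs:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, trial, eps0=0.5, decay=0.01):
    """epsilon-greedy selection with an assumed exponential decay schedule for epsilon."""
    eps = eps0 * np.exp(-decay * trial)            # exploration probability, set by hand
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))    # explore: random action
    return int(np.argmax(q_values))                # exploit: currently best action

print([epsilon_greedy(np.array([1.0, 0.5]), t) for t in (0, 100, 500)])
```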

6.2. Simulating the Nature of Social Interaction

The core challenge of social decision-making stems from social uncertainty, which is ubiquitous in the social world—not merely in repeated game scenarios. During social interactions, the thoughts and intentions of others are largely hidden, making it difficult to infer their behavior and its implications for us (Feldman Hall & Shenhav, 2019; Kappes et al., 2019). Active inference, grounded in the framework of partially observable Markov decision processes (POMDPs), naturally incorporates this uncertainty and unobservability into its models (R. Smith et al., 2022). Furthermore, active inference inherently builds the behavioral drive to resolve this uncertainty into its mathematical formulation. Extensive prior research has focused on exploring the influence of rewards and punishments on social decision-making, revealing that reward-related computations and neural circuits play a crucial role in guiding choices, similar to non-social decision-making (Ruff & Fehr, 2014). However, another equally important guiding factor has been overlooked: the motivation to reduce social uncertainty (Alchian, 1950). In social interactions, people often engage in exploratory behaviors to reduce uncertainty; these actions do not confer direct rewards and may even involve risks and costs. The emergence of such behavior indicates that beyond the practical rewards brought by decision-making, understanding others’ thoughts and reducing social uncertainty also have intrinsic value in themselves, a value distinct from external rewards (Loewenstein, 1994). In social interactions, individuals possess a proactive desire to explore others’ thoughts and intentions, known as interpersonal curiosity (Way & Taffe, 2025). To satisfy interpersonal curiosity, individuals need to reduce uncertainty. The expected free energy of active inference incorporates epistemic value into behavioral choices. Epistemic value, also known as intrinsic value, mathematically corresponds to information gain or a reduction in uncertainty (K. Friston et al., 2015). According to the theory of expected free energy, an individual’s behavioral choices aim not only to maximize pragmatic value but are simultaneously driven by the maximization of intrinsic epistemic value. This computational mechanism provides a clear explanation for why individuals exhibit interpersonal curiosity during social interactions.
In social interactions, individuals need to infer others’ thoughts, perspectives, emotional states, behavior patterns, and more. This involves both automatic inferences that form initial impressions and controlled inferences like perspective-taking (Devaine et al., 2014; Hughes et al., 2017). When modeling this mentalizing process in repeated game scenarios, the hierarchical model of active inference offers an elegant framework (K. J. Friston et al., 2017; Proietti et al., 2023). Its Bayesian network representation is shown in Figure 4d. To facilitate understanding of the complex hierarchical model, Figure 4 illustrates the evolution of various active inference models, progressing from simple to complex. In the figure, circles represent variables, squares represent factors that modulate relationships, and arrows indicate dependencies between variables. Figure 4a depicts a generative model for static perception, involving only a single time point, analogous to standard Bayesian inference. Consistent with the earlier description of Bayes’ theorem, s represents the abstract hidden state, o denotes the observable outcome, D represents the prior belief about the hidden state s, and A is the likelihood function, specifying the probability of observing o given the state s. Figure 4b shows a generative model for dynamic perception. Compared to Figure 4a, it incorporates two or more time points and includes a state transition matrix B, which describes how the hidden state s evolves over time. Figure 4c builds upon Figure 4b by adding policy selection, representing dynamic perception with action choices. Here, π denotes a policy, where different policies correspond to different state transition matrices. G represents the expected free energy, and C represents the preferred outcomes. Figure 4d illustrates the hierarchical model, which comprises two levels of generative models. In this architecture, the hidden state s2 at the higher level provides the prior belief for the hidden state s1 at the lower level. Conversely, the posterior belief of the lower-level model is treated as an observation at a given time point for the higher-level model. The likelihood matrix A2 of the higher-level model mediates this bidirectional information flow between levels. This structure allows the higher-level model to update at a slower temporal scale than the lower-level model, which is why this framework is also referred to as the deep temporal model.
Figure 4. Bayesian network representation of various active inference models.
In repeated game scenarios, understanding the thoughts and goals underlying others’ actions typically requires observing a series of specific behaviors within the game. This is analogous to inferring a higher-level intention behind a shot (e.g., forehand or backhand) in tennis by observing the movement characteristics of various parts of the opponent’s body (Proietti et al., 2023). Within the hierarchical active inference model, the agent uses the lower level of the hierarchy to process the specific choices in each round of the game, while the upper level characterizes the opponent’s higher-level, more stable attributes—such as intentions, strategies, or personality traits. For example, in multi-round prisoner’s dilemma games, the prior beliefs D2 in the upper-level model encode the automatically inferred initial impression of the opponent. The hidden states s2 encode the opponent’s different strategic modes (such as cooperative, deceptive, tit-for-tat, etc.). The opponent’s strategy may change slowly over time, and the agent gradually updates these high-level beliefs through perspective-taking and learning processes, which is consistent with the characteristics of the upper-level model. The high-level beliefs about the opponent’s strategy pattern, combined with the high-level likelihood matrix A2, generate predictions regarding the opponent’s specific behavior in the next round. These predictions serve as prior beliefs for the lower-level model, thereby guiding the agent’s own specific behavioral choices. The hierarchical model offers considerable flexibility, allowing it to be tailored to different game contexts. It enables the modeling of an individual’s high-order beliefs, making it possible to simulate mentalizing during social interactions and thereby providing a closer approximation to human social decision-making processes.
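To indicate what such a generative model looks like in practice, the following Python sketch assembles single-level versions of the components shown in Figure 4 (A, B, C, D) for a repeated Prisoner’s Dilemma and performs one round of belief updating; all probability values are illustrative assumptions, and a hierarchical model would add a second level whose states s2 supply the lower-level prior:

```python
import numpy as np

# Hidden states s: the opponent's strategic mode; observations o: the opponent's action.
states = ["cooperative", "tit-for-tat", "defecting"]

D = np.array([0.5, 0.3, 0.2])    # D: prior belief (an automatically inferred first impression)

A = np.array([[0.9, 0.6, 0.1],   # A: likelihood p(o | s); row 0 = p(cooperate | s)
              [0.1, 0.4, 0.9]])  #                         row 1 = p(defect | s)

B = 0.9 * np.eye(3) + 0.1 / 3    # B: transitions p(s' | s); strategies change slowly

C = np.array([0.8, 0.2])         # C: preferred observations (unused in this perception-only step)

o = 1                               # observe a defection this round
posterior = A[o] * D / (A[o] @ D)   # perception: Bayesian belief update
next_prior = B @ posterior          # beliefs drift through B to prime the next round
print(posterior.round(3), next_prior.round(3))
```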

6.3. Future Directions: Model Simulation and Behavioral Fitting

The explanatory power of active inference is demonstrated not only at the theoretical level, such as through the free energy principle, but also in its provision of a computationally tractable and testable generative model that effectively bridges theory and data. Using simulation studies, researchers have shown that active inference models generate predictions about perception and behavior across domains such as action understanding, cognitive control, and sport anticipation that closely match empirically observed patterns (Harris et al., 2022; Proietti et al., 2025; Proietti et al., 2023). Other researchers have employed model fitting on behavioral data from cognitive processes such as approach-avoidance conflict, interoceptive inference, and directed exploration, extracting key model parameters that act as effective biomarkers for distinguishing between healthy individuals and those with psychopathology (R. Smith et al., 2021, 2020a, 2020b). Beyond computational psychiatry, studies that fit active inference models to behavioral data from sport anticipation tasks have likewise yielded parameters that effectively differentiate experts from novices (Harris et al., 2023). However, relatively few researchers have applied active inference models to the study of social cognition using either simulation or behavioral fitting methods. The theoretical advantages of the active inference framework for understanding social decision-making in repeated games likewise still require rigorous testing.
Model simulation and behavioral fitting are two complementary methodologies that form a progression from theoretical verification to empirical testing. Model simulation constructs models based on theory to generate simulated data. It is a deductive process conducted in an idealized environment, which allows researchers to test a theory’s internal consistency and generative capacity. Future research could define the state space (hidden states s, observations o), likelihood matrix A, and state transition matrix B for generative models based on specific game tasks. By adjusting parameters such as prior beliefs and the precision of expected free energy, researchers can observe whether the agent’s policy patterns align with empirically observed behavior. For instance, a hierarchical active inference model can be constructed on the basis of the Prisoner’s Dilemma game rules. By manipulating the priors of the higher-level model to simulate agents with different initial impressions of their opponent (e.g., expecting cooperation vs. defection), researchers could investigate whether the process of forming cooperation differs, thereby testing whether the model can reproduce empirically observed strategic patterns. Building upon the preliminary validation of a model’s effectiveness through simulation, behavioral fitting further applies this model to empirical behavioral data. Starting from real data, this approach either compares the model’s goodness of fit or estimates its parameters. As a form of model inversion, behavioral fitting tests the theory’s external validity and explanatory power. By collecting behavioral data (e.g., specific choices, reaction times, subjective reports) from participants in game experiments, researchers can compare the active inference model with classic computational models such as reinforcement learning. This comparison tests whether the active inference model can effectively explain human behavioral data and evaluates its goodness of fit. The parameters estimated from fitting can serve as computational biomarkers to distinguish between groups and to quantify individual differences in social decision-making. Future work may design experimental tasks that encompass different types of games to collect human behavioral data. Using the same active inference model, researchers could fit data across game types by adjusting specific structural parameters only as a function of the game types. If the model successfully explains human behavior across diverse game contexts, this would demonstrate its potential and generality as a unified theory of social decision-making. Furthermore, integrating neuroimaging data with computational model parameters will help reveal the neurocomputational mechanisms underlying social decision-making, thereby promoting the broader and deeper application of active inference in social cognitive neuroscience.
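As a schematic of the fitting step, the following Python sketch estimates a single softmax precision parameter by maximum likelihood, mapping negative expected free energy to choice probabilities. The per-trial EFE values and choices are random placeholders standing in for quantities computed from a task-specific generative model and participants’ data:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

rng = np.random.default_rng(1)
n_trials, n_policies = 100, 2
G_trials = rng.normal(size=(n_trials, n_policies))   # placeholder per-policy EFE values
choices = rng.integers(n_policies, size=n_trials)    # placeholder observed choices

def neg_log_likelihood(gamma):
    logits = -gamma * G_trials                       # lower EFE -> higher choice probability
    log_p = logits - logsumexp(logits, axis=1, keepdims=True)
    return -log_p[np.arange(n_trials), choices].sum()

fit = minimize_scalar(neg_log_likelihood, bounds=(0.01, 20.0), method="bounded")
print(f"estimated precision gamma = {fit.x:.2f}")    # a candidate computational biomarker
```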
However, it must be acknowledged that applying active inference to the study of social decision-making presents several challenges. Active inference remains a predominantly theoretical framework, and its empirical application is still at an early stage. Its high computational complexity and large number of parameters make the modeling process demanding for researchers in psychology. Moreover, existing guidelines for constructing active inference models are based largely on relatively simple experimental tasks (R. Smith et al., 2022); transferring them to more complex repeated games is a formidable undertaking. At the same time, researchers in the philosophy of science have questioned the free energy principle (Colombo & Wright, 2021), arguing that it is overly abstract and general and cannot substitute for research into the specific mechanisms underlying cognition and behavior. This is a crucial reminder that, alongside recognition of the framework’s substantial explanatory power as a theory, equal emphasis must be placed on empirical research that validates and substantiates the concrete implementation of its computational models.

7. Conclusions

This review has systematically examined the potential of the active inference framework to illuminate the cognitive mechanisms underlying decision-making in repeated games. Unlike one-shot games, repeated games more closely resemble real-life social interactions and thus serve as a classic paradigm for studying social decision-making. Because others’ thoughts and motivations cannot be observed directly, repeated games involve social uncertainty, which significantly increases the complexity of decision-making. Individuals must engage in real-time learning and strategic adjustment based on feedback across multiple rounds of interaction, a process that involves interconnected cognitive components such as inference, strategy generation, and learning.

Faced with such complex cognitive processes, traditional reinforcement learning models, while adept at capturing experience-driven value updating, are limited in their ability to handle incomplete information and to infer opponents’ strategies. Bayesian inference can accurately describe belief updating and uncertainty management, yet traditional exact Bayesian methods often struggle to integrate fully with action and policy selection. Against this backdrop, the active inference framework, through the free-energy principle, unifies perception, learning, planning, and action within a single generative model. Perception and learning are achieved by minimizing variational free energy, thereby balancing model complexity against accuracy; planning and action selection are realized by minimizing expected free energy, thereby maximizing both epistemic and pragmatic value. Compared with other models, active inference thus offers a powerful approach to the critical exploration–exploitation dilemma inherent in repeated games. Formulated in terms of partially observable Markov decision processes (POMDPs), the framework naturally incorporates social uncertainty, and its hierarchical structure affords a plausible representation of the mentalizing processes that are central to social interaction.

Future research should further evaluate the framework’s empirical validity using the complementary methods of model simulation and behavioral fitting. In summary, the active inference framework provides a unified and explanatory perspective on human behavior in repeated games. It offers a rigorous methodology for uncovering the hidden dynamics of social decision-making and thus stands as a potent tool for advancing computational social psychology.
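As a formal footnote to this summary, the two objective functions invoked above take their standard forms (Friston et al., 2015; Parr & Friston, 2019), where q denotes the agent’s approximate posterior over hidden states s, o denotes observations, π a policy, and C the prior preferences over outcomes:

$$
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s)\,\big]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}},
$$

$$
G(\pi) \;=\; -\underbrace{\mathbb{E}_{q(o,s \mid \pi)}\big[\ln q(s \mid o, \pi) - \ln q(s \mid \pi)\big]}_{\text{epistemic value}} \;-\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o \mid C)\big]}_{\text{pragmatic value}}.
$$

Minimizing F trades off complexity against accuracy, while minimizing G jointly maximizes expected information gain (epistemic value) and the expected attainment of preferred outcomes (pragmatic value).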

Author Contributions

Conceptualization, H.Y. and L.W.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and L.W.; supervision and discussion, W.G., T.T. and C.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Major Project (2022ZD0116403).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ahn, W.-Y., Busemeyer, J. R., Wagenmakers, E.-J., & Stout, J. C. (2008). Comparison of decision learning models using the generalization criterion method. Cognitive Science, 32(8), 1376–1402.
2. Akata, E., Schulz, L., Coda-Forno, J., Oh, S. J., Bethge, M., & Schulz, E. (2025). Playing repeated games with large language models. Nature Human Behaviour, 9, 1380–1390.
3. Alchian, A. A. (1950). Uncertainty, evolution, and economic theory. Journal of Political Economy, 58(3), 211–221.
4. Aumann, R. J. (1981). Survey of repeated games. In Essays in game theory and mathematical economics in honor of Oskar Morgenstern (pp. 11–42). Bibliographisches Institut.
5. Axelrod, R. (1980). Effective choice in the prisoner’s dilemma. The Journal of Conflict Resolution, 24(1), 3–25.
6. Behrens, T. E. J., Woolrich, M. W., Walton, M. E., & Rushworth, M. F. S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10(9), 1214–1221.
7. Bonowski, T., & Minnameier, G. (2022). Morality and trust in impersonal relationships. Journal of Economic Psychology, 90, 102513.
8. Camerer, C. (2003). Behavioural studies of strategic thinking in games. Trends in Cognitive Sciences, 7(5), 225–231.
9. Cochard, F., Van, P., & Willinger, M. (2004). Trusting behavior in a repeated investment game. Journal of Economic Behavior & Organization, 55(1), 31–44.
10. Colman, A. (2003). Cooperation, psychological game theory, and limitations of rationality in social interaction. Behavioral and Brain Sciences, 26(2), 139–153.
11. Colman, A. M., Pulford, B. D., & Krockow, E. M. (2018). Persistent cooperation and gender differences in repeated prisoner’s dilemma games: Some things never change. Acta Psychologica, 187, 1–8.
12. Colombo, M., & Wright, C. (2021). First principles in the life sciences: The free-energy principle, organicism, and mechanism. Synthese, 198(Suppl. S14), 3463–3488.
13. Courville, A. C., Daw, N. D., & Touretzky, D. S. (2006). Bayesian theories of conditioning in a changing world. Trends in Cognitive Sciences, 10(7), 294–300.
14. Cui, X., Bryant, D. M., & Reiss, A. L. (2012). NIRS-based hyperscanning reveals increased interpersonal coherence in superior frontal cortex during cooperation. NeuroImage, 59(3), 2430–2437.
15. Cushman, F. (2024). Computational social psychology. Annual Review of Psychology, 75, 625–652.
16. Cushman, F., & Gershman, S. (2019). Editors’ introduction: Computational approaches to social cognition. Topics in Cognitive Science, 11(2), 281–298.
17. Devaine, M., Hollard, G., & Daunizeau, J. (2014). The social Bayesian brain: Does mentalizing make a difference when we learn? PLoS Computational Biology, 10(12), e1003992.
18. Feldman Hall, O., & Shenhav, A. (2019). Resolving uncertainty in a social world. Nature Human Behaviour, 3(5), 426–435.
19. Fleischhut, N., Artinger, F. M., Olschewski, S., & Hertwig, R. (2022). Not all uncertainty is treated equally: Information search under social and nonsocial uncertainty. Journal of Behavioral Decision Making, 35(2), e2250.
20. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience and Biobehavioral Reviews, 68, 862–879.
21. Friston, K., Rigoli, F., Ognibene, D., Mathys, C., FitzGerald, T., & Pezzulo, G. (2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4), 187–214.
22. Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
23. Friston, K. J., Rosch, R., Parr, T., Price, C., & Bowman, H. (2017). Deep temporal models and active inference. Neuroscience and Biobehavioral Reviews, 77, 388–402.
24. Friston, K. J., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100(1–3), 70–87.
25. Galinsky, A., Ku, G., & Wang, C. (2005). Perspective-taking and self-other overlap: Fostering social bonds and facilitating social coordination. Group Processes & Intergroup Relations, 8(2), 109–124.
26. Gershman, S. J. (2019). Uncertainty and exploration. Decision, 6(3), 277–286.
27. Gijsen, S., Grundei, M., & Blankenburg, F. (2022). Active inference and the two-step task. Scientific Reports, 12(1), 17682.
28. Hackel, L. M., & Amodio, D. M. (2018). Computational neuroscience approaches to social cognition. Current Opinion in Psychology, 24, 92–97.
29. Harris, D. J., Arthur, T., Broadbent, D. P., Wilson, M. R., Vine, S. J., & Runswick, O. R. (2022). An active inference account of skilled anticipation in sport: Using computational models to formalise theory and generate new hypotheses. Sports Medicine, 52(9), 2023–2038.
30. Harris, D. J., North, J. S., & Runswick, O. R. (2023). A Bayesian computational model to investigate expert anticipation of a seemingly unpredictable ball bounce. Psychological Research, 87(2), 553–567.
31. Harsanyi, J. C. (1968). Games with incomplete information played by “Bayesian” players. Part II: Bayesian equilibrium points. Management Science, 14, 320–334.
32. Hitchcock, P. F., Fried, E. I., & Frank, M. J. (2022). Computational psychiatry needs time and context. Annual Review of Psychology, 73, 243–270.
33. Hodson, R., Mehta, M., & Smith, R. (2024). The empirical status of predictive coding and active inference. Neuroscience and Biobehavioral Reviews, 157, 105473.
34. Hughes, B. L., Zaki, J., & Ambady, N. (2017). Motivation alters impression formation and related neural systems. Social Cognitive and Affective Neuroscience, 12(1), 49–60.
35. Itoh, H., & Nakamura, K. (2007). Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence, 171(8), 453–490.
36. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
37. Kappes, A., Nussberger, A.-M., Siegel, J. Z., Rutledge, R. B., & Crockett, M. J. (2019). Social uncertainty is heterogeneous and sometimes valuable. Nature Human Behaviour, 3(8), 764.
38. Kenny, D. A., & Albright, L. (1987). Accuracy in interpersonal perception: A social relations analysis. Psychological Bulletin, 102, 390–402.
39. Kirsh, D., & Maglio, P. (1994). On distinguishing epistemic from pragmatic action. Cognitive Science, 18(4), 513–549.
40. Krafft, P. M., Shmueli, E., Griffiths, T. L., Tenenbaum, J. B., & Pentland, A. (2021). Bayesian collective learning emerges from heuristic social learning. Cognition, 212, 104469.
41. Lacomba, J. A., Lagos, F., Reuben, E., & van Winden, F. (2017). Decisiveness, peace, and inequality in games of conflict. Journal of Economic Psychology, 63, 216–229.
42. Lee, D. (2008). Game theory and neural basis of social decision making. Nature Neuroscience, 11(4), 404–409.
43. Lönnqvist, J.-E., & Walkowitz, G. (2019). Experimentally induced empathy has no impact on generosity in a monetarily incentivized dictator game. Frontiers in Psychology, 10, 337.
44. Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116(1), 75–98.
45. Maurer, C., Chambon, V., Bourgeois-Gironde, S., Leboyer, M., & Zalla, T. (2018). The influence of prior reputation and reciprocity on dynamic trust-building in adults with and without autism spectrum disorder. Cognition, 172, 1–10.
46. Miller Moya, L. M. (2007). Coordination and collective action. Revista Internacional de Sociología, 65(46), 161–183.
47. Montague, P. R. (2018). Computational phenotypes revealed by interactive economic games. In Computational psychiatry: Mathematical modeling of mental illness (pp. 273–292). Academic Press.
48. Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72–80.
49. Nash, J. F. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1), 48–49.
50. Neumann, J. v. (1928). Zur Theorie der Gesellschaftsspiele [On the theory of parlor games]. Mathematische Annalen, 100(1), 295–320.
51. Ng, G. T. T., & Au, W. T. (2016). Expectation and cooperation in prisoner’s dilemmas: The moderating role of game riskiness. Psychonomic Bulletin & Review, 23(2), 353–360.
52. Nisan, N., Roughgarden, T., Tardos, E., & Vazirani, V. V. (Eds.). (2007). Algorithmic game theory. Cambridge University Press.
53. Pan, Y., Cheng, X., Zhang, Z., Li, X., & Hu, Y. (2017). Cooperation in lovers: An fNIRS-based hyperscanning study. Human Brain Mapping, 38(2), 831–841.
54. Parr, T., & Friston, K. J. (2017). Uncertainty, epistemics and active inference. Journal of the Royal Society Interface, 14(136), 20170376.
55. Parr, T., & Friston, K. J. (2019). Generalised free energy and active inference. Biological Cybernetics, 113(5–6), 495–513.
56. Press, W. H., & Dyson, F. J. (2012). Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences of the United States of America, 109(26), 10409–10413.
57. Proietti, R., Parr, T., Tessari, A., Friston, K., & Pezzulo, G. (2025). Active inference and cognitive control: Balancing deliberation and habits through precision optimization. Physics of Life Reviews, 54, 27–51.
58. Proietti, R., Pezzulo, G., & Tessari, A. (2023). An active inference model of hierarchical action understanding, learning and imitation. Physics of Life Reviews, 46, 92–118.
59. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons.
60. Qinglin, G., & Yuan, Z. (2021). Psychological and neural mechanisms of trust formation: A perspective from computational modeling based on the decision of the investor in the trust game [计算模型视角下信任形成的心理和神经机制——基于信任博弈中投资者的角度]. Advances in Psychological Science, 29(1), 178–189.
61. Ruff, C. C., & Fehr, E. (2014). The neurobiology of rewards and values in social decision making. Nature Reviews Neuroscience, 15(8), 549–562.
62. Schelling, T. (1960). The strategy of conflict. Harvard University Press.
63. Selten, R. (1975). Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4(1), 25–55.
64. Smith, J. M., & Price, G. R. (1973). The logic of animal conflict. Nature, 246(5427), 15–18.
65. Smith, R., Friston, K. J., & Whyte, C. J. (2022). A step-by-step tutorial on active inference and its application to empirical data. Journal of Mathematical Psychology, 107, 102632.
66. Smith, R., Kirlic, N., Stewart, J. L., Touthang, J., Kuplicki, R., McDermott, T. J., Taylor, S., Khalsa, S. S., Paulus, M. P., & Aupperle, R. L. (2021). Long-term stability of computational parameters during approach-avoidance conflict in a transdiagnostic psychiatric patient sample. Scientific Reports, 11(1), 11783.
67. Smith, R., Kuplicki, R., Feinstein, J., Forthman, K. L., Stewart, J. L., Paulus, M. P., Tulsa 1000 Investigators, & Khalsa, S. S. (2020a). A Bayesian computational model reveals a failure to adapt interoceptive precision estimates across depression, anxiety, eating, and substance use disorders. PLoS Computational Biology, 16(12), e1008484.
68. Smith, R., Schwartenbeck, P., Stewart, J. L., Kuplicki, R., Ekhtiari, H., Paulus, M. P., & Tulsa 1000 Investigators. (2020b). Imprecise action selection in substance use disorder: Evidence for active learning impairments when solving the explore-exploit dilemma. Drug and Alcohol Dependence, 215, 108208.
69. Speekenbrink, M., & Konstantinidis, E. (2015). Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science, 7(2), 351–367.
70. Speekenbrink, M., & Shanks, D. R. (2010). Learning in a changing environment. Journal of Experimental Psychology: General, 139(2), 266–298.
71. Stojic, H., Orquin, J. L., Dayan, P., Dolan, R. J., & Speekenbrink, M. (2020a). Uncertainty in learning, choice, and visual fixation. Proceedings of the National Academy of Sciences of the United States of America, 117(6), 3291–3300.
72. Stojic, H., Schulz, E., Analytis, P., & Speekenbrink, M. (2020b). It’s new, but is it good? How generalization and uncertainty guide the exploration of novel options. Journal of Experimental Psychology: General, 149(10), 1878–1907.
73. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
74. Tomov, M. S., Schulz, E., & Gershman, S. J. (2021). Multi-task reinforcement learning in humans. Nature Human Behaviour, 5(6), 764–773.
75. Tump, A. N., Deffner, D., Pleskac, T. J., Romanczuk, P., & Kurvers, R. H. J. M. (2024). A cognitive computational approach to social and collective decision-making. Perspectives on Psychological Science, 19(2), 538–551.
76. van Dijk, E., & De Dreu, C. K. W. (2021). Experimental games and social decision making. Annual Review of Psychology, 72, 415–438.
77. Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In J. Gama, R. Camacho, P. Brazdil, A. Jorge, & L. Torgo (Eds.), Lecture notes in computer science: Vol. 3720. Machine learning: ECML 2005 (pp. 437–448). Springer.
78. Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.
79. Vorauer, J. D., Hodges, S. D., & Hall, J. A. (2025). Thought-feeling accuracy in person perception and metaperception: An integrative perspective. Annual Review of Psychology, 76, 413–441.
80. Walker, A. R., Luque, D., Le Pelley, M. E., & Beesley, T. (2019). The role of uncertainty in attentional and choice exploration. Psychonomic Bulletin & Review, 26(6), 1911–1916.
81. Wang, H., & Kwan, A. C. (2023). Competitive and cooperative games for probing the neural basis of social decision-making in animals. Neuroscience and Biobehavioral Reviews, 149, 105158.
82. Way, N., & Taffe, R. (2025). Interpersonal curiosity: A missing construct in the field of human development. Human Development, 69(2), 79–90.
83. Worthy, D. A., Pang, B., & Byrne, K. A. (2013). Decomposing the roles of perseveration and expected value representation in models of the Iowa gambling task. Frontiers in Psychology, 4, 640.
84. Yifrah, B., Ramaty, A., Morris, G., & Mendelsohn, A. (2021). Individual differences in experienced and observational decision-making illuminate interactions between reinforcement learning and declarative memory. Scientific Reports, 11(1), 5899.
85. Zhang, M., Liu, T., Pelowski, M., & Yu, D. (2017). Gender difference in spontaneous deception: A hyperscanning study using functional near-infrared spectroscopy. Scientific Reports, 7, 7508.