Playing Extensive Games with Learning of Opponent’s Cognition

Decision-making is a basic component of agents’ (e.g., intelligent sensors) behaviors, in which one’s cognition plays a crucial role in the process and outcome. Extensive games, a class of interactive decision-making scenarios, have been studied in diverse fields. Recently, a model of extensive games was proposed in which agent cognition of the structure of the underlying game and the quality of the game situations are encoded by artificial neural networks. This model refines the classic model of extensive games, and the corresponding equilibrium concept, cognitive perfect equilibrium (CPE), differs from the classic subgame perfect equilibrium, since the CPE takes agent cognition into consideration. However, this model neglects the fact that game-playing processes are greatly affected by agents’ cognition of their opponents. To this end, in this work, we go one step further by proposing a framework in which agents’ cognition of their opponents is incorporated. A method is presented for evaluating opponents’ cognition about the game being played, and an algorithm for playing such games is designed and analyzed. The resulting equilibrium concept is defined as adversarial cognition equilibrium (ACE). By means of a running example, we demonstrate that the ACE is more realistic than the CPE, since it involves learning about opponents’ cognition. Further results are presented regarding the computational complexity, soundness, and completeness of the game-solving algorithm and the existence of the equilibrium solution. This model suggests the possibility of enhancing an agent’s strategic ability by evaluating opponents’ cognition.


Introduction

Background
Decision-making is a basic component of various agents' behaviors [1], such as autonomous driving and sensor systems [2], and has been studied in many fields, such as psychology, economics, and artificial intelligence, due to its ubiquity [3,4]. One's cognition plays a crucial role in decision-making processes and outcomes, since available alternatives can be identified and weighed effectively only when meaningful information is collected. Cognition involves several aspects, including memory, learning and perception, and has thus attracted the interest of researchers in psychology, neuroscience, cognitive science [5][6][7], etc.
Focusing on the mathematical analysis of interactive multi-agent decision-making processes, game theory has gained increasing acknowledgement as a classic tool in areas [8,9] including wireless networks, blockchains [10], robotics, and so on [11]. The game-playing process can be significantly influenced by a player's cognition regarding the possibilities of different choices and their suitability, since a player makes his/her choices based on such cognition. To model and explain the various interactive decision-making scenarios in social and economic activities [12,13], different game-theoretic models have been studied. A typical game model is the extensive game [14][15][16][17], which is used in sequential decision-making (SDM) scenarios [18]. In an extensive game, players take turns choosing actions; thus, a game tree is normally used to represent the process of an extensive game. In the game tree, each node represents a game situation, while each edge represents a move between game situations [19].

The Challenge
To find optimal solutions to extensive games, backward induction (BI) [20,21] is a well-known method. It computes backwards from the terminal nodes of a game tree to its root. During this process, the player is assumed to be fully rational, always pursuing optimal choices by searching over the whole game tree. Consequently, the solution concept resulting from the BI algorithm is referred to as subgame perfect equilibrium (SPE) [22,23].
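As a concrete illustration, the BI procedure described above can be sketched as follows. This is a minimal sketch, not the paper's implementation; the dictionary-based game representation and all node names are illustrative.

```python
# A minimal sketch of backward induction (BI) on a two-player extensive game
# with perfect information. All names and payoffs are illustrative.

def backward_induction(tree, payoffs, turn, node):
    """Return (payoff vector, optimal path) for the subgame rooted at `node`.

    tree:    dict mapping each nonterminal node to its list of successors
    payoffs: dict mapping each terminal node to a tuple (u_player0, u_player1)
    turn:    dict mapping each nonterminal node to the player (0 or 1) who moves
    """
    if node not in tree:                      # terminal node (leaf)
        return payoffs[node], [node]
    player = turn[node]
    # The mover picks the successor whose induced outcome maximizes her payoff.
    best = max((backward_induction(tree, payoffs, turn, s) for s in tree[node]),
               key=lambda res: res[0][player])
    return best[0], [node] + best[1]

# A depth-2 toy game: player 0 moves at the root "a", player 1 at "b" and "c".
tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}
turn = {"a": 0, "b": 1, "c": 1}
payoffs = {"d": (3, 1), "e": (0, 2), "f": (2, 0), "g": (1, 3)}

value, path = backward_induction(tree, payoffs, turn, "a")
print(value, path)
```

Note how player 0 avoids branch "b" even though it contains her best leaf "d", because the fully rational player 1 would never choose "d"; this anticipation is exactly what BI encodes.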
However, in the actual game-playing process, the limitations of computing power, memory, time, skills, etc., must be taken into account. It is therefore both infeasible and unnecessary for a player to search the entire game tree, especially in large games; instead, the player considers merely a portion of the game tree [24,25]. Meanwhile, based on prior knowledge, accumulated experience and playing tactics, players hold their own opinions about the plausibility of future actions and the suitability of the game situations following the current decision point. Hence, both the classic model of extensive games and its equilibrium are too idealized to represent how the game is actually played and the practical game-playing outcome. As a result, there is a need to develop alternative models and equilibrium concepts for extensive games that provide more realistic insights into the actual game-playing process.
Recently, [26] proposed a novel model of extensive games called extensive games with cognition, in which the agents' cognition (including of the underlying game being played and of the quality of different game states) is simulated by artificial neural networks (ANNs) [27,28]. Unlike the standard SPE, the equilibrium concept for extensive games with cognition is dubbed cognitive perfect equilibrium (CPE). The CPE more accurately reflects players' actual game-playing, and the ideal assumption that the complete game tree is visible to the players is weakened.
Despite the progress of the aforementioned framework, in which players' cognition plays an indispensable role, essential work remains in pursuit of modeling practical game-playing. A key point is that the cognition in the existing model precludes the modeling of the opponent's cognition. However, acquiring the opponent's cognition of the underlying game would naturally benefit a player's decision-making by allowing them to anticipate their opponent's strategy. This characteristic coincides with how humans play games, since they take advantage of their opponents' expected reactions. Although the importance of reasoning about an opponent's strategy has been noted [29,30], not much attention has been paid to modeling game playing in consideration of an opponent's cognition.

Our Contribution
In this paper, we build upon the model of cognitive extensive games and propose a model of extensive games with learning of the opponent's cognition. The resulting equilibrium concept is called adversarial cognitive equilibrium (ACE). In contrast with the SPE and the CPE, the ACE drops the ideal assumption of full rationality by considering both the player's cognition and his or her view of the opponent's cognition. More specifically, we focus on the following issues. (1) Modeling of adversarial cognition in extensive games. After recalling the existing model of extensive games with cognition in Section 2, we propose in Section 3 a model of extensive games involving the opponent's cognition, which we call extensive games with adversarial cognition, in which each opponent is endowed with his expected cognition of the game tree and of the evaluation of the game situations therein. (2) Game solving with adversarial cognition. In Section 4, a new algorithmic procedure for solving extensive games with adversarial cognition is presented, and the resulting solution of this algorithm is defined. Since the new solution concept is obtained based on a player's reasoning about their opponents' cognition, the strategy is not guaranteed to be the absolute best: this scenario is the reality that practical players face. A series of issues are discussed regarding the correctness and computational complexity of the game-solving algorithm, the existence of the ACE, and its connection with the CPE. (3) Examples and reasonability of the model. For a better understanding of this model, Section 6 is devoted to an illustrative example. In addition, this framework is shown to be reasonable for practical game-playing.

Preliminaries: Cognitive Extensive Games
This section aims to introduce a game framework [26] that characterizes players' cognition when playing extensive games by incorporating ANNs into the classic model of extensive games.

Game Models
First, we introduce the concept of (finite) extensive games characterized by pure strategies with perfect information.
Extensive form-perfect information games. An extensive form-perfect information game [31] is formally defined as a tuple G = (N, T, t, Σ_i, ρ_i), in which

• N represents the set of players;
• T = (V, A, {a→}a∈A) denotes the (directed, irreflexive and finite) game tree, which consists of a set of nodes (or vertices) V, a set of moves or actions A, and a set of arcs {a→}a∈A ⊆ V². For any two nodes v and v′, v′ is said to be an immediate successor of v if v a→ v′. Nodes without successors are called leaves (terminal nodes) and are denoted by Z. The set of moves available at v is denoted as A_v;
• t assigns to each nonterminal node v ∈ V\Z the player t(v) ∈ N who moves at v;
• the utility function ρ_i : Z → R determines the utility of each terminal node in Z for each player i ∈ N;
• Σ_i denotes the set of strategies σ_i for player i. Each σ_i : {v ∈ V\Z | t(v) = i} → V is a function assigning to every nonterminal node v with t(v) = i an immediate successor of v.
A strategy profile σ = (σ_i)i∈N is a combination of the strategies of every player. The set of all strategy profiles is denoted as Σ. For any player i ∈ N, σ_−i is the combination of the strategies of the players in N \ {i}. An outcome function O : Σ → Z is a function that assigns a terminal node to each strategy profile. O(σ_−i) denotes the outcomes that can be achieved by player i given that all the other players follow σ, and O(σ′_i, σ_−i) is the outcome when player i follows σ′_i and the other players follow σ.
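The strategy-profile and outcome-function definitions above can be made concrete in a small sketch; the representation (one successor choice per decision node) and all names are illustrative assumptions, not the paper's notation.

```python
# Sketch of a strategy profile as one successor choice per decision node,
# and the outcome function O mapping a profile to a terminal node.

def outcome(tree, turn, strategies, root):
    """Follow each mover's strategy from the root until a leaf is reached."""
    node = root
    while node in tree:                      # nonterminal nodes appear in `tree`
        node = strategies[turn[node]][node]  # mover's chosen successor
    return node

tree = {"a": ["b", "c"], "b": ["d", "e"]}
turn = {"a": 0, "b": 1}
# sigma_0 chooses at "a"; sigma_1 chooses at "b"
sigma = {0: {"a": "b"}, 1: {"b": "e"}}
result = outcome(tree, turn, sigma, "a")
print(result)
```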
An alternative way to depict players' payoffs is through a preference relation ⪰_i, such that for each player i and terminal nodes v, v′ ∈ Z, v ⪰_i v′ if and only if ρ_i(v) ≥ ρ_i(v′). We focus on games with a finite horizon and no infinite branches in the game tree. In this paper, by "extensive games", we refer to extensive form-perfect information games with a finite horizon.
For an extensive game G and any node v ∈ V, the subgame of G following v, written G|v, is the restriction of G to v and the nodes following v in T. The outcome of σ|v in subgame G|v is written as O|v(σ|v).
While this approach gives a full picture of the game from an omniscient observer's point of view, the extensive game lacks consideration of the players' vision of the game, which might differ from the real game being played. Normally, players view the game according to accumulated experience, including their judgment of the plausibility of future moves and the suitability of game configurations. That is, a player's view of the game is only a part of the original game tree, which is narrow and short due to their limited abilities.
In [26], a new model of extensive games was proposed by considering players' cognition of the game. Technically, two kinds of ANNs were introduced into the game model to model and simulate agents' cognition.
The first type of ANN, called a filter net, represents the players' cognition regarding the plausibility of future moves. For a filter net FN of a game G, the input is any state of G, and the output is a probability function ff : V × A → R over all future moves at that state. For a state v and a possible move a at v, the probability of choosing a at v is defined as ff(v)(a), which is usually written as ff_v^a. The second type of ANN, the evaluation net, simulates players' evaluation of the quality of game states. For an evaluation net EN of a game G, the input is any state of G, and the output is an evaluation function ef : V → R assigning a probability to each state. For any state v, ef(v) predicts the probability of winning the game following v and is usually simply written as ef_v.
The cognitive gameplay process can be modeled based on the filter net and the evaluation net. For each decision at a current state s, four subprocedures are involved; the first three capture players' cognition by obtaining the filtration T^s of the game tree T of an extensive game G.

Algorithm for obtaining the filtration Fil_s(T)
1. Branch-Pick. According to prior knowledge, the branches of the current decision point can be narrowed by selecting several (say, b) of the most plausible alternatives among all the available actions. Formally, for any state s, the branches s_0 corresponding to arg max_b {ff(s)(a) | a ∈ A_s} are chosen; that is, only the top b elements of ff(s)(·) are selected.
2. Subtree-Pick. To make decisions, the current player searches the subsequent game tree by following these branches. Due to computational limitations, the exploration of the future involves only a finite number of steps (say, l). Each branch rooted at s_0 is extended to depth l, i.e., the subtree that can be reached within l steps is obtained.
3. Evidence-Pick. To choose the best of the b branches, the player evaluates their goodness based on the payoffs of the final states of the subtrees. This evaluation of a game state depends on two aspects: (1) an evaluation via prior knowledge and (2) a vague prediction of the future. The former is given by the evaluation net, while the latter requires a simulation of future moves; this is necessary since, even if players cannot obtain the complete game tree, they can still hold a vague prediction of the far future. The process is completed as follows. First, for each leaf s_l of the subtree following s_0, a path is computed until the end of the game, where at each node along the path the most plausible action according to ff is chosen. Then, the payoff of the final outcome following s_l is taken as the simulation result of s_l. The overall payoff of s_l is the weighted sum of the two parts.
The fourth subprocedure is performed based on the filtration Fil_s(T). Given this cognition of the game tree and of the suitability of game states, the player who moves at s can choose the optimal move from the b options via BI.
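The three filtration subprocedures, together with the evaluation-plus-simulation averaging used later for cognition-game utilities, can be sketched as follows. Here ff and ef are toy dictionaries standing in for the trained filter and evaluation nets, and the game data are illustrative assumptions.

```python
# Sketch of Branch-Pick, Subtree-Pick and Evidence-Pick. `ff` and `ef`
# stand in for the filter and evaluation nets; all game data are toy values.

def filtration(tree, ff, ef, payoff, s, b=2, l=2):
    # 1. Branch-Pick: keep the b most plausible moves at s according to ff.
    branches = sorted(tree.get(s, []), key=lambda v: ff[(s, v)], reverse=True)[:b]
    # 2. Subtree-Pick: expand each kept branch to depth l.
    frontier, sub = branches[:], {s: branches}
    for _ in range(l - 1):
        nxt = []
        for v in frontier:
            sub[v] = tree.get(v, [])
            nxt += sub[v]
        frontier = nxt
    # 3. Evidence-Pick: for each leaf of the filtration, roll out greedily
    # (most plausible move each step) to the end of the game, then average
    # the evaluation-net value with the simulated final payoff.
    utilities = {}
    for leaf in frontier:
        v = leaf
        while v in tree:                       # greedy rollout via ff
            v = max(tree[v], key=lambda w: ff[(v, w)])
        utilities[leaf] = (ef[leaf] + payoff[v]) / 2
    return sub, utilities

tree = {"s": ["a", "b", "c"], "a": ["a1"], "b": ["b1"],
        "a1": ["z1"], "b1": ["z2"]}
ff = {("s", "a"): .5, ("s", "b"): .3, ("s", "c"): .2,
      ("a", "a1"): 1.0, ("b", "b1"): 1.0, ("a1", "z1"): 1.0, ("b1", "z2"): 1.0}
ef = {"a1": 0.8, "b1": 0.4}
payoff = {"z1": 1.0, "z2": 0.0}

sub, util = filtration(tree, ff, ef, payoff, "s")
print(sub["s"], util)
```

With b = 2, the least plausible branch "c" is pruned, and each retained leaf receives the average of its evaluation-net score and its simulated endgame payoff.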
Therefore, the game actually being played differs from the ideal model of extensive games: players' cognition regarding the game must be included. The following model, called a cognition game, was proposed in [26].
Cognition Games. Given an extensive game G = (N, T, t, Σ_i, ρ_i), a filter net FN and an evaluation net EN for G, a cognition game G^s for G at any state s is a tuple (N^s, T^s, t^s, Σ^s_i, ρ^s_i), in which

• T^s denotes the filtration Fil_s(T) of T at s;
• N^s, t^s and Σ^s_i are the restrictions of N, t and Σ_i to T^s, respectively;
• the utility function ρ^s_i is an integration of the results obtained via the evaluation net and simulation. Let z^s be any leaf of T^s, and let z be the final point following z^s in the simulation process. Then, ρ^s_i(z^s) = (ef(z^s) + ρ_i(z))/2.

Equilibrium Concepts
A substantial concern in games is the equilibrium concept, which characterizes a state of dynamic balance in which no player can improve their expected payoff by unilaterally deviating from their strategy. Two classic equilibrium concepts for extensive games are the Nash equilibrium [32] and the SPE.
Nash equilibrium. Let σ* be any strategy profile of an extensive game G. σ*_i is a best response for player i if ρ_i(O(σ*)) ≥ ρ_i(O(σ′_i, σ*_−i)) for every σ′_i ∈ Σ_i; σ* is a Nash equilibrium if σ*_i is a best response for every player i. Subgame perfect equilibrium. Let σ* be any strategy profile of an extensive game G. If, for each player i and any node v with t(v) = i, ρ_i(O|v(σ*|v)) ≥ ρ_i(O|v(σ′_i|v, σ*_−i|v)) for every σ′_i ∈ Σ_i, then σ* is a subgame perfect equilibrium. The best response is the best option for each player given what the other players do. Consequently, the Nash equilibrium requires that a strategy profile consist of the best response of every player. The SPE considers sequential moves and must be a Nash equilibrium in every subgame.
For any extensive game G with root v_0, an SPE σ of G determines a node sequence (v_0, v_1, ..., v_k), where each v_{j+1} is the successor chosen by σ at v_j and v_k is a terminal node. We call this node sequence the σ-SPE solution of G, which is written as q_σ. A fundamental way to find an SPE of an extensive game is BI [33], which identifies the best move for the player who is the last to move in the game. Subsequently, the optimal move for the next-to-last mover is found. This process is repeated from the end of the game to the beginning to determine the strategies of all players.
An apparent weakness of BI is the need to search the full game tree, which makes the approach impractical for large-scale games. Due to resource limitations or ability constraints [34], players cannot make perfect predictions of the future in practical game-playing processes. Normally, players must make decisions via a heuristic search over a limited part of the game tree based on prior knowledge. By taking players' cognition about the game into consideration, cognitive game-solving provides a more realistic framework for playing extensive games. The resulting equilibrium concept is called CPE.
Cognitive Perfect Equilibrium [26]. For an extensive game G = (N, T, t, Σ_i, ρ_i), a filter net FN and an evaluation net EN of G, a strategy profile σ* is called a CPE of G if and only if, for every nonterminal node v, the move σ*(v) coincides with the move prescribed at v by an SPE of the cognition game G^v. The intuition of the CPE is that at every decision point, the CPE is consistent with the SPE of the corresponding cognition game.
Cognitive games provide realistic representations of extensive games, and the CPE reflects the gameplay procedure. However, a major drawback exists concerning cognitive games and the CPE: in cognitive games, players assume that their cognition is consistent with that of their opponents, i.e., that their opponents have the same view of the game being played and the same evaluation of the game states.
As a result, cognitive games omit the player's reasoning about their opponents' cognition, which may play a significant role in the player's strategies. In particular, the player may benefit from exploiting the opponents' cognition. This paper aims to refine cognitive games by endowing players with the ability to learn their opponents' cognition of the game being played and of the evaluation of game situations. An appropriate solution concept is obtained under this new game model.

Adversarial Cognition Game
In this section, we introduce a refinement of cognition games in which players are allowed to learn and reason about the cognition of their opponents, namely, the game the opponents believe is being played and their evaluation of the game situations, i.e., the utility functions they use. We first introduce the notion of a state pair, a formal structure that allows reasoning about the cognition of opponents.
State Pairs. Consider an extensive game G = (N, T, t, Σ_i, ρ_i), a filter net FN, and an evaluation net EN for G. A state pair π of G is a pair of states of the form (v_0, v_1) satisfying v_1 ∈ T^{v_0}, i.e., the second state of the pair lies within the filtration at v_0.
Without loss of generality, we assume that an opponent's cognitive ability is encoded by the number of future steps foreseeable to him, i.e., his search depth. The opponent's evaluation of the goodness of the leaves of the search tree can be modeled as a static payoff function p; the set of such utility functions is denoted by P. The intuition behind state pairs is to capture the adversarial cognition of the player who moves at v_0: an expression of the form (v_0, v_1) encodes the cognition that the player moving at v_0 holds about the cognition of the player moving at v_1, including what the latter can foresee of the future and what his utility function is. We use Π to denote the set of state pairs of G.
Based on the notion of state pairs, we can represent adversarial cognition structures by associating each state pair with a set of states and an evaluation over the terminal states therein.
Adversarial cognition structures. Let G be an extensive game and FN and EN be the filter net and evaluation net for G, respectively. An adversarial cognition structure for G is a pair C = (C_V, C_E), where C_V is a function C_V : Π → 2^V associating a subset of the nodes following v_1 with each state pair (v_0, v_1), and C_E is a function C_E : Π → P associating a payoff function with each state pair. C_V satisfies the following condition: for π = (v_0, v_1), C_V(π) consists of the nodes of the subtree rooted at v_1 up to some depth d ∈ N; i.e., the player's cognition at v_0 about what the player moving at v_1 can see is represented by the nodes of the subtree limited by depth d. For π = (v_0, v_1), C_E(π) denotes the player's cognition regarding the utility function of the player moving at v_1.
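Under the definitions above, an adversarial cognition structure can be sketched as a pair of maps over state pairs; the depth map and the attributed payoff functions below are illustrative assumptions, not data from the paper.

```python
# Sketch of an adversarial cognition structure C = (C_V, C_E): for a state
# pair (v0, v1), C_V returns the nodes the v0-player believes the v1-player
# can see (the subtree below v1 cut at depth d), and C_E returns the payoff
# function attributed to that player. All values here are illustrative.

def nodes_to_depth(tree, v, d):
    """All nodes reachable from v within d steps (the believed search tree)."""
    seen, frontier = {v}, [v]
    for _ in range(d):
        frontier = [w for u in frontier for w in tree.get(u, [])]
        seen.update(frontier)
    return seen

def make_C(tree, depth_of, payoff_of):
    C_V = lambda pair: nodes_to_depth(tree, pair[1], depth_of[pair])
    C_E = lambda pair: payoff_of[pair]
    return C_V, C_E

tree = {"v0": ["v1"], "v1": ["x", "y"], "x": ["z"]}
depth_of = {("v0", "v1"): 1}                       # believed search depth d
payoff_of = {("v0", "v1"): {"x": 0.2, "y": 0.8}}   # believed leaf payoffs
C_V, C_E = make_C(tree, depth_of, payoff_of)
print(C_V(("v0", "v1")), C_E(("v0", "v1")))
```

With depth 1, the player at v0 believes the player at v1 sees only v1 and its immediate successors, even though a deeper node z exists in the real tree.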
A game model with the agent's cognition regarding their opponents can then be obtained by assigning an adversarial cognition structure to a cognition game.
Adversarial Cognition Game.An adversarial cognition game (ACG) is defined as a tuple Γ = (G, FN, EN, C), with G being an extensive game, while FN, EN and C are the filter net, evaluation net, and adversarial cognition structure for G, respectively.
Note that an ACG induces a family of extensive games, one for each state pair, reflecting the player's adversarial cognition. For any ACG Γ, we denote the game induced by a state pair π as Γ_π, whose game tree is restricted by C_V(π) and whose utility function is C_E(π).

Game Solving and Equilibrium
Based on the player's adversarial cognition at each state pair, the player can search the current game and make an optimal decision with regard to the possible moves and the payoffs of the corresponding outcomes. The combination of all these optimal decisions yields a solution to the ACG. Following this idea, Algorithm 1 presents the process for solving an ACG, which is called adversarial cognitive game solving. The process starts from the root v_0 and extends the sequence q by successively adding the successor node that results from the optimal move in the cognition game at the current node, as determined by Algorithm 2. This process is depicted in Figure 1. The most important parts of Algorithm 2 are the function Search (Algorithm 3) and the function BB (Algorithm 4). Algorithm 3 computes, for a state pair π = (v, v′), the node sequences determined by the SPE of Γ_π and then yields the optimal successors following v. Given the optimal move of each node v′ in the cognition game G^v of the current player at v, Algorithm 4 computes a best path following v for t(v) and returns the immediate successor of v as required. Note that the above game-solving algorithm differs from standard BI and from the cognitive game-playing process in [26]; consequently, the resulting equilibrium of an ACG differs from both the SPE and the CPE. Algorithm 3 searches the game Γ_(v,v′) induced by the state pair (v, v′), which yields an SPE of Γ_(v,v′). The optimal move at state v should be consistent with the SPE of Γ_(v,v′) for every state v′ within t(v)'s cognition game G^v. Therefore, the first equilibrium we define is the one obtained at a state v, called the local adversarial cognition equilibrium (LACE).
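To convey the flavor of opponent-aware move selection (in the spirit of Algorithms 2-4, though much simplified to a single step of lookahead), consider the following sketch; the helper is our own stand-in for ACM/Search/BB, and all names and payoffs are illustrative.

```python
# One-step opponent-aware choice: at my node v, predict that at each
# successor w the opponent picks the leaf SHE believes best (per the payoff
# function I attribute to her), then pick the w whose predicted outcome is
# best for me. All game data are illustrative.

def best_move(tree, my_payoff, believed_opp_payoff, v):
    def opp_reply(w):
        # the opponent is predicted to choose her believed-best leaf below w
        return max(tree[w], key=lambda z: believed_opp_payoff[z])
    return max(tree[v], key=lambda w: my_payoff[opp_reply(w)])

tree = {"v": ["L", "R"], "L": ["l1", "l2"], "R": ["r1", "r2"]}
my_payoff = {"l1": 1.0, "l2": 0.0, "r1": 0.6, "r2": 0.4}
believed = {"l1": 0.1, "l2": 0.9, "r1": 0.3, "r2": 0.7}
choice = best_move(tree, my_payoff, believed, "v")
print(choice)
```

Branch L contains the mover's best leaf l1, but under the payoff function attributed to the opponent, she would steer play to l2; exploiting this belief makes R the better choice, which is precisely the benefit of modeling the opponent's cognition.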
LACE. Let Γ = (G, FN, EN, C) be an ACG, and let v be any node in V\Z. A strategy profile σ is a local adversarial cognition equilibrium (LACE) of Γ at v if the move σ(v) is consistent with an SPE of the induced game Γ_(v,v′) for every node v′ in the cognition game G^v. We denote by LACS(Γ, v) the set of LACE outcomes of Γ at v. The composition of such outcomes yields the global solution concept for Γ: a strategy profile σ is an adversarial cognition equilibrium (ACE) of Γ if σ is a LACE of Γ at every nonterminal node v. An ACE is thus the composition of the best strategies of the players at each decision node. Each such strategy is the best response for the player, given the player's cognition about the opponents' beliefs about the game being played and the quality of the game states.
As is the case for other equilibrium concepts, each ACE of an ACG determines a specific sequence of nodes. Suppose σ is an ACE of Γ with root r_0. The ACE solution determined by σ, dubbed q_σ, is the sequence of nodes (r_0, v_1, ..., v_k) in which each node is the successor chosen by σ at the previous one and v_k is terminal; the set of adversarial cognition solutions of Γ is denoted as ACS.
For game theory, another fundamental concern is the existence of an equilibrium.The following lemma clarifies that every ACG has an ACE.

Lemma 1. (Existence) Every ACG Γ has an ACE.
Proof. It suffices to show the existence of a LACE at any position v. The first step is to prove the existence of an SPE for each Γ_(v,v′). We can obtain an SPE via induction on the height h of the nodes. Let f be a function associating a path with each state v′′ ∈ Γ_(v,v′). When h = 0, i.e., v′′ is a leaf, define f(v′′) = (v′′). Suppose f(v′′) is defined for all nodes with height h ≤ k for some k ≥ 0, and let v* be a node with h(v*) = k + 1; define f(v*) as (v*) followed by f(w*), where w* is a successor of v* whose path f(w*) ends in an outcome maximal for the mover t(v*). The moves chosen in this way constitute an SPE of Γ_(v,v′), and hence a strategy σ′ exists that, according to the definition, is a LACE (an ACE at v) of Γ. Finally, we can construct a strategy σ* from σ′(v) for every state v in Γ; thus, it is evident that σ* is an ACE of Γ.
The observation below illustrates the connection between the ACE and the two previously mentioned equilibrium concepts, SPE and CPE, by specifying the conditions under which the ACE collapses into the SPE or CPE.
Observation 1. (1) If for every nonterminal node v and any node v′ in the filtration G^v at v, Γ_(v,v′) = G^v|v′, then an ACE of Γ is also a CPE, and vice versa. (2) If for every nonterminal node v, G^v = G|v and Γ_(v,v′) = G|v′ for any node v′ in the filtration G^v at v, then an ACE of Γ is also an SPE of G, and vice versa.

Proof. (1) (⇐) Let σ* be a CPE of Γ. For all nonterminal nodes v and any node v′ in G^v, σ*(v) is consistent with an SPE of G^v and hence with an SPE of Γ_(v,v′) = G^v|v′; thus, σ* is a LACE of Γ at every v, i.e., an ACE. The converse direction is symmetric. (2) (⇐) Let σ* be an SPE of G. For every nonterminal node v with t(v) = i and any node v′ in G^v = G|v, σ*(v) is consistent with an SPE of G|v′ = Γ_(v,v′); thus, σ* is a LACE of Γ at every v, i.e., an ACE. The converse direction is symmetric.

Therefore, if the current player's cognition regarding the following players' cognition is the same as his cognition of himself, then the ACE is equivalent to the CPE; if the player's cognition coincides with the complete subtree, then the ACE is equivalent to the SPE. However, these conditions normally fail in real gameplay, which reflects the rationality of our framework.
Crucial issues concerning the game-solving algorithm include its correctness and complexity. The following theorem establishes that each solution returned by Algorithm 1 is an ACS of the game.

Theorem 1. (Correctness) For any ACG Γ with root r_0 and any path q* returned by Sol(Γ) in Algorithm 1, there exists an ACE σ* of Γ such that q_σ* = q*.

Proof. This can be proved by induction on the depth d of the game tree.
Base case: d = 1 is trivial, with only the single node r_0 in the game. When d = 2, let q* = (r_0, z_1). According to Algorithm 1, (z_1) is a successor of r_0 obtained by executing ACM(Γ, r_0) (Lines 5-7). That is, (z_1) is a sequence returned by Search(Γ_(r_0,z_1), z_1) (line 4 in Algorithm 2), and the action a such that r_0 a→ z_1 is a best move returned by BB(Γ, r_0, Continuations). Therefore, (z_1) is an SPE solution of Γ_(r_0,z_1); at the same time, (z_1) is a LACS of Γ at r_0. We can define a strategy profile σ* as σ*(r_0) = z_1 and σ*(z) = z for any z ∈ Z_{r_0}. Observe that σ* is an ACE of Γ according to the definition of the ACE; hence, q* is determined by an ACE. For d > 2, the same argument applied at r_0, combined with the induction hypothesis on the subgame following the chosen successor, completes the proof.
The complexity of Algorithm 1 is analyzed in the following proposition.
Considering that b and l are normally much smaller than m and d, the cost of the filtration at each node is dominated by the cost of the search. To obtain a node sequence q, ACM must be called O(d) times. The overall complexity of Algorithm 1 is hence as stated in Proposition 2.

An Example: Tic-Tac-Toe
After establishing a model of extensive games involving players' cognition of their opponents and the new solution concept, we proceed with an example illustrating this framework, through which the procedure for solving such games is demonstrated. A comparison with the case without cognition about opponents confirms the feasibility of opponent modeling in gameplay.
We consider the example first presented in [26], which starts with a scenario in a Tic-Tac-Toe game. Tic-Tac-Toe is a simple game that is suitable for illustrating our model and has been used extensively in the literature due to its simplicity. According to [26], with a player's own cognition regarding the game, a cognition game model of Tic-Tac-Toe consists of three components: a classic extensive game model G, a filter net FN and an evaluation net EN, where:
(1) G = (N, T, t, Σ_i, ρ_i), such that
• the set of players is N = {1, 2}, with 1 for × and 2 for •;
• the game tree is T = (V, A, {a→}a∈A), with V = {legal layouts of the 3 × 3 board} and A = {legal actions by the game rules};
• t(v) = 1 for nodes in which it is player 1's turn to move, and t(v) = 2 for those in which it is player 2's turn;
• player i's strategies are Σ_i = {σ_i}, with σ_i(v) ∈ A_v for each σ_i and each v ∈ V\Z with t(v) = i;
• the utility ρ_i(z) is defined as 1 for any terminal node z where i wins the game, ρ_i(z) = 0 when player i loses at z, and ρ_i(z) = 0.5 when there is a draw.
(2) FN is a multi-layer backpropagation (BP) neural network, in which the number of input neurons is nine, representing the features of the nine grid cells (−1 for ×, 1 for •, and 0 for idle). There are also nine output neurons, one for each grid cell, and 50 hidden neurons. The filter function ff is determined by the output probability p(s, a) of the filter net for any state s and any possible move following s.
(3) EN shares the same structure as FN, but it has only one output neuron, which outputs a probability p(s) for s.
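The shapes of the two BP networks can be sketched as follows. This is a minimal sketch with random weights standing in for trained networks; biases are omitted, and the softmax/sigmoid readouts are our own illustrative choices for turning raw outputs into the probabilities ff and ef.

```python
import math, random

# Sketch of the two network shapes from the example: a filter net
# (9 inputs -> 50 hidden -> 9 outputs, one score per board cell) and an
# evaluation net (9 -> 50 -> 1). Weights are random stand-ins for trained
# networks; cells are encoded as -1 (X), 1 (O), 0 (idle). Biases omitted.

random.seed(0)

def mlp(sizes):
    """Random weight matrices (nested lists) for consecutive layer pairs."""
    return [[[random.gauss(0, 0.1) for _ in range(n)] for _ in range(m)]
            for m, n in zip(sizes, sizes[1:])]

def forward(net, x):
    for i, W in enumerate(net):
        x = [sum(xj * W[j][k] for j, xj in enumerate(x))
             for k in range(len(W[0]))]
        if i < len(net) - 1:
            x = [math.tanh(v) for v in x]      # hidden-layer activation
    return x

filter_net = mlp([9, 50, 9])   # ff: plausibility score per cell
eval_net = mlp([9, 50, 1])     # ef: winning probability for the state

board = [-1, 1, 0, 0, -1, 0, 0, 0, 1]            # a mid-game layout
logits = forward(filter_net, board)
total = sum(math.exp(v) for v in logits)
ff = [math.exp(v) / total for v in logits]       # softmax over moves
ef = 1 / (1 + math.exp(-forward(eval_net, board)[0]))   # sigmoid
print(len(ff), round(sum(ff), 6), 0 < ef < 1)
```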
The process of game solving in [26] under this model computes the CPE. The decisions at each point are made based on the two output probabilities from the filter net and the evaluation net, which characterize players' cognition of the plausibility of moves and of the quality of game states, respectively.
For comparison with the model proposed here, we consider the same instance, viz., a partial game of Tic-Tac-Toe with a starting point v_0 (see Figure 2, along with two of its successors b_1, b_2) in which it is player O's turn to move. The game tree after filtration via the filter net FN is shown in Figure 3, and the board configurations of its nodes are shown in Figure 4 (for intermediate nodes) and Figure 5 (for terminal nodes). The final utilities of the terminal nodes (based on the cognition of player O) are shown in Table 1. Note that these nodes are not terminal nodes of the original game but the terminal ones within the cognition of O. For each node, the utility is given as the average of the probability returned by the evaluation net EN and the utility of the leaf of the most plausible subsequent path. For each pair of utilities, the first value is the utility of player X, and the second is the utility of player O. For simplicity, further details on obtaining the above figures and table are omitted, since this information does not affect our considerations.
Conclusions

In practical gameplay, players' cognition of their opponents greatly affects the quality of their decision-making. Considering the importance of improving the game-playing outcome by utilizing the opponent's cognition of the underlying game, this paper proposed a new model of extensive games, based on which a new equilibrium concept, the ACE, was derived. An algorithmic procedure for adversarial cognitive game playing and for learning opponents' cognition was also presented. The proposed model and solution concept are shown to be superior to the standard ones.
It is acknowledged that alternative methods of adversarial learning exist. Focusing on the modeling of adversarial cognition, we provide only one possible procedure; in particular, optimized algorithms can be adopted for different concrete games [38]. Nevertheless, the process is expected to offer some direction regarding the realization of abstract game models in practical game-playing scenarios.
Several topics remain to be explored in the future. First, to concentrate on the effects of a player's cognition about their opponents, adversarial cognition is modeled here as a one-level reasoning result. Notably, the opponent of a player may also hold an adversarial cognition about this player, and the player may give further consideration to the opponent's cognition of his cognition, and so on. This process therefore represents a kind of higher-level cognitive reasoning, which can be explored in future work. Another issue is the dynamic evolution of adversarial cognition. For instance, the current algorithm for learning the opponent's cognition is a one-shot process based on the playing history. However, as more information about the opponent is observed, more knowledge can be gained, which should lead to a more accurate learning result about the opponent's cognition. Thus, the online incremental learning of an opponent's cognition would also be interesting to explore. Given its close relation to cognitive theory [39,40], our study also raises questions about logical methods of reasoning and verification; this suggests that our framework offers a good platform for theoretical exploration of practical scenarios regarding the correlation between logic and game theory.

Proposition 2. (Complexity) The worst-case time complexity of Algorithm 1 is O(n log² n), where n is the number of nodes in the underlying game.

Proof. First, let b be the number of branches selected in the filtration and let d be the depth of the game; then, the time complexity of Algorithm 4 is T(BB) = O(b · d). For Algorithm 3, let t(d) be the complexity for a game tree of depth d. Then t(d) = O(b · t(d − 1)), and for any k = 1, ..., l, t(d − k) = O(m · t(d − k − 1)), where m is the width of the game tree; meanwhile, t(d − l − j) = 1 for j = 1, ..., (d − 2). Through iterative computation, the time complexity of Algorithm 3 is obtained, i.e., t(d) = O(b · m^l). Algorithm 2 must first obtain the filtration at v, then call Algorithm 3 for each v′ in the filtration, and finally call Algorithm 4; its time complexity is the sum of the three parts, where the complexities of the filtration subprocedures are bounded by those of the search. Since b is a constant, m^d = O(n), l ≪ d, and d = O(log n), the stated bound follows.

Figure 5. Board configuration of terminal nodes.
Algorithm 3 (fragment): for each successor v_succ of v, compute Search(Γ|v_succ, v_succ) to obtain a candidate sequence q_tmp; let newZ be the final node in q_tmp; if newZ ≻_t(v) bestZ, then update the current optimal outcome and branch as bestZ ← newZ and q ← q_tmp; finally, return (v, q).

Algorithm 4: Best Branch. BB(Γ, v, Continuations). Input: a game Γ, a state v, and an array Continuations. Output: a best move a′ following v determined by Continuations. The procedure composes the move chosen at each decision point (stored in the array Continuations), thus obtains all the node sequences following v, chooses a best move following v, and returns bestmove.