Designing Behavioural Artificial Intelligence to Record, Assess and Evaluate Human Behaviour
1. Introduction and Outline
1.1. Motivation
1.2. Context
1.3. Outline
1.4. Disclaimer
2. Behavioural Sciences
- (a) We elaborate on the claim that experts in psychology disagree on what affects behaviour (Section 2.1).
- (b) We argue that game theoretic models regularly fail to correctly predict human behaviour (Section 2.2).
- (c) We discuss the dilemma of competitive versus cooperative behaviour (Section 2.3) which our proof-of-concept behavioural artificial intelligence (cf. Section 6) was designed to face.
- (d) We introduce (Section 2.4) the model from behavioural psychology used for our formalization (Section 3).
2.1. Behavioural Psychology
2.2. Game Theory and Rational Choice
2.3. Cooperative/Competitive Behaviour
2.4. Modelling Behaviour
- Behavioural beliefs: Behavioural beliefs are someone’s expectations about the likely outcome of actions, paired with the subjective view of these outcomes.
- Normative beliefs: Normative beliefs are the opinions of others regarding the outcomes of actions, the personal intention to adhere to these peer standards, and the individual’s desire to live up to the expectations of their peers.
- Control beliefs: Control beliefs are a person’s level of confidence that they have control over all relevant factors required to bring about an outcome.
3. Formalisms
- (a) behavioural psychology (i.e., the TACT paradigm, Section 3.1) to formally describe behaviour;
- (b) philosophy and logic (i.e., classical propositional logic as well as modal logic, Section 3.2); and
- (c) game theory (i.e., game theory, utility theory and subjective expected utility theory, Section 3.3).
3.1. The TACT Paradigm
- Action: walking, exercising, working out
- Target: on a treadmill, on a stair master, on a walking machine
- Context: at home, in a physical fitness centre, in the gym
- Time: for at least 30 min each day in the forthcoming month
- (1) This summer, Person sells a boat to Person in Edinburgh.
- (2) Last year Person bought a boat from Person in Glasgow.
- (3) The next five months Person will not sail on the Clyde.
- Action: buy, sell, sail
- Target: Person , Person , Person
- Context: in Glasgow, in Edinburgh, on the Clyde
- Time: this summer, last year, the next five months
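The four TACT elements map naturally onto a small record type. The following is a minimal sketch rather than the representation used in the paper; the class name `TactStatement`, its field names and the `describe` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TactStatement:
    """One behaviour described by its Target, Action, Context and Time."""

    action: str
    target: str
    context: str
    time: str

    def describe(self) -> str:
        # Join the four elements into one readable phrase.
        return f"{self.action} {self.target} {self.context} {self.time}"


exercise = TactStatement(
    action="walking",
    target="on a treadmill",
    context="in a physical fitness centre",
    time="for at least 30 min each day in the forthcoming month",
)
print(exercise.describe())
```

Keeping the four elements as separate fields means two statements can be compared element by element, for example to find behaviours that share an action but differ in their time frame.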
3.2. Propositional and Modal Logic
3.2.1. Propositional Logic
Syntax and Semantics
Syntactic Sugar
- (p ∨ q) is equivalent to ¬(¬p ∧ ¬q)
- (p → q) is equivalent to ¬p ∨ q
- (p ↔ q) is equivalent to (p → q) ∧ (q → p)
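The three equivalences can be checked mechanically by comparing both sides on every assignment of truth values. The sketch below does this with Python booleans; the names `EQUIVALENCES` and `holds_everywhere` are illustrative, and `p <= q` is used as a compact way to write material implication on booleans.

```python
from itertools import product

# Each entry pairs a connective (left) with its rewritten form (right).
EQUIVALENCES = {
    "p ∨ q": (lambda p, q: p or q, lambda p, q: not (not p and not q)),
    "p → q": (lambda p, q: p <= q, lambda p, q: (not p) or q),
    "p ↔ q": (lambda p, q: p == q, lambda p, q: (p <= q) and (q <= p)),
}


def holds_everywhere(lhs, rhs) -> bool:
    # Compare both sides on all four assignments of truth values to p and q.
    return all(
        lhs(p, q) == rhs(p, q) for p, q in product([False, True], repeat=2)
    )


results = {
    name: holds_everywhere(lhs, rhs)
    for name, (lhs, rhs) in EQUIVALENCES.items()
}
print(results)
```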
3.2.2. Modal Logic
3.3. Game Theory and Rational Choice
3.3.1. Game Theory
3.3.2. Utility Theory
3.3.3. Subjective Expected Utility (SEU) Theory
- A set of behaviours from which to choose
- A subset thereof to consider
- A set of possible futures, resulting from the different behaviours/actions
- A payoff function representing the utility
- A function to determine the outcome a certain action will bring about
- Some information about the probability of that particular outcome occurring
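To make these ingredients concrete, the sketch below computes a subjective expected utility for each considered behaviour and picks the best one. It is a toy illustration under stated assumptions: the behaviours, outcomes, probabilities and payoffs are invented, and the function names `seu` and `choose` are not taken from the paper.

```python
from typing import Callable, Dict, Iterable

# Hypothetical data: behaviours to choose from and the subset considered.
BEHAVIOURS = ["bid", "pass", "trade"]
CONSIDERED = ["bid", "pass"]

# Subjective beliefs: for each behaviour, possible futures and their probability.
BELIEFS: Dict[str, Dict[str, float]] = {
    "bid": {"win shop": 0.5, "lose money": 0.5},
    "pass": {"keep funds": 1.0},
    "trade": {"break even": 1.0},
}

# Payoff of each possible future (invented utility values).
PAYOFF: Dict[str, float] = {
    "win shop": 4.0,
    "lose money": -2.0,
    "keep funds": 0.5,
    "break even": 0.0,
}


def seu(
    action: str,
    beliefs: Callable[[str], Dict[str, float]],
    payoff: Callable[[str], float],
) -> float:
    # Weight the payoff of every possible outcome by its subjective probability.
    return sum(prob * payoff(out) for out, prob in beliefs(action).items())


def choose(considered: Iterable[str], beliefs, payoff) -> str:
    # Select the considered behaviour with the highest expected utility.
    return max(considered, key=lambda a: seu(a, beliefs, payoff))


best = choose(CONSIDERED, BELIEFS.__getitem__, PAYOFF.__getitem__)
print(best, seu(best, BELIEFS.__getitem__, PAYOFF.__getitem__))
```

Note that "trade" is available but never evaluated, which mirrors the distinction between the full set of behaviours and the subset actually considered.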
4. Games as a Discrete Environment for Controlled Behaviour Assessment
- (a) We open by mentioning the use of games in psychology and education in general.
- (b) We introduce a certain type of game which fits our purpose. The aim is to advocate the use of games as controllable environments where test subjects can be subjected to comparable decisions (with the obvious aim to then record and compare these choices).
- (c) We formally define what we understand to be a game (Section 4.3).
- (d) The bulk of this section provides a formal model of games (Section 4.4). This model can then be used to interpret behavioural statements (which are defined in Section 5) when expressed in the formalism presented in Section 3.
4.1. Psychology and Computer Games
4.2. Resource-Management Games for Serious Games
- They are challenging due to restricted or limited resources, location and time, the need to plan ahead and the multitude of potentially conflicting objectives.
- They stimulate the fantasy by putting the player into an unfamiliar and imaginary position.
- They constantly require the player to control issues arising from the continuity of the game and from the actions of competing AI or human players. Choosing a behavioural strategy in response to actions of others is a substantial part of the game.
4.3. Defining a Game
- the set of players.
- the exhaustive set of histories.
- p: a function mapping histories to individual payoffs for the n players.
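The three components listed above (players, histories and a payoff function) can be held in one small structure. The sketch below is an assumption-laden illustration: the class name `Game`, the one-shot cooperate/defect game and all payoff numbers are invented, and only complete histories are enumerated to keep the example short.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet, Tuple

History = Tuple[str, ...]  # one action per player, in player order


@dataclass(frozen=True)
class Game:
    players: Tuple[str, ...]
    histories: FrozenSet[History]
    # Maps a history to one payoff per player.
    payoff: Callable[[History], Tuple[float, ...]]


# Hypothetical payoffs for a one-shot cooperate/defect game.
PAYOFFS = {
    ("cooperate", "cooperate"): (2.0, 2.0),
    ("defect", "cooperate"): (3.0, 0.0),
    ("cooperate", "defect"): (0.0, 3.0),
    ("defect", "defect"): (1.0, 1.0),
}

game = Game(
    players=("Player 1", "Player 2"),
    histories=frozenset(product(("cooperate", "defect"), repeat=2)),
    payoff=lambda history: PAYOFFS[history],
)

for history in sorted(game.histories):
    print(history, game.payoff(history))
```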
4.4. Modelling a Game
4.4.1. Possible World Models (Kripke Semantics)
- the set of states .
- a set of rules .
- the set of states .
- a set of rules .
- a valuation
Practical Considerations
4.4.2. Potentially Infinite Games
4.4.3. Disjoint Submodels
4.4.4. Characterizing Submodels
Practical Considerations
4.4.5. Specific Instances of a Game
- 1.
- and with
- 2.
- and
- 3.
- and
4.4.6. Uncertainties
- ,
- :
5. Formalizing and Evaluating Human and Machine Behaviour
5.1. Formal Statements about Individual States of a Game
5.2. Formal Statements about Behaviour in Games
5.2.1. Behaviour in General
5.2.2. Behavioural Statements
5.2.3. Nested Statements
5.2.4. Complex Behavioural Statements
- Cooperative
- Player i is not bidding against any player this round
- Player i is not bidding against any player that has played cooperatively the last round
- Player i is not bidding against any player (that has played competitively against any player that has played cooperatively the last five rounds) this round
- Competitive
- Player i is bidding against another player this round
- Player i is bidding against all other players the next three rounds
- Player i is only bidding against players (that have bid on Player i the last round) this round
- Action: Player i is bidding
- Target: Player 1, Player 2, Player 3, Player 4
- Context: Player j has played ϕ (where ϕ is another TACT statement)
- Time: this round, last round, last five rounds, next three rounds
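Statements of this kind can be evaluated against a recorded sequence of rounds. The sketch below assumes a hypothetical record format, one dictionary per round mapping each player to the set of players they bid against; the function names are illustrative and the code is not the paper's evaluation procedure.

```python
from typing import Dict, List, Set

Round = Dict[int, Set[int]]  # player -> players whose resources they bid on


def bids_against_anyone(history: List[Round], i: int, rounds_back: int = 0) -> bool:
    # "Player i is bidding against another player", in the selected round.
    return bool(history[-1 - rounds_back].get(i, set()))


def played_cooperatively(history: List[Round], j: int, rounds_back: int = 0) -> bool:
    # "Player j is not bidding against any player", in the selected round.
    return not bids_against_anyone(history, j, rounds_back)


def spares_cooperators(history: List[Round], i: int) -> bool:
    # Nested statement: "Player i is not bidding against any player that
    # has played cooperatively the last round", for the current round.
    targets = history[-1].get(i, set())
    return all(not played_cooperatively(history, j, rounds_back=1) for j in targets)


history = [
    {1: set(), 2: {1}, 3: set()},  # last round
    {1: {2}, 2: set(), 3: {1}},    # this round
]

for player in (1, 2, 3):
    print(player, bids_against_anyone(history, player), spares_cooperators(history, player))
```

In this invented record, Player 1 only retaliates against Player 2, who bid against them last round, whereas Player 3 bids against a player who had been cooperative.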
5.3. Automated Translation of Behaviour Statements
5.3.1. Formalism ⇔ Natural Language
- Translate atomic statements into their propositions.
- Replace occurrences of “it is not the case that” by ¬.
- Rewrite “statement1 and statement2” to (“statement1” ∧ “statement2”) and, correspondingly, “statement1 or statement2” to (“statement1” ∨ “statement2”).
- Sentences of the form “if statement1 then statement2” are replaced by (“statement1” → “statement2”).
- Finally, “statement1 if and only if statement2” is replaced by (“statement1” ↔ “statement2”).
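The same phrase templates can be applied in the opposite direction, rendering a formula as a sentence. The sketch below assumes formulae are stored as nested tuples such as `("if", "b12", ("not", "b21"))`; this encoding, the proposition names and the two function names are assumptions made for illustration.

```python
# Hypothetical atomic statements and the propositions that stand for them.
ATOMS = {
    "b12": "Player 1 is bidding against Player 2 this round",
    "b21": "Player 2 is bidding against Player 1 this round",
}

SYMBOLS = {"and": "∧", "or": "∨", "if": "→", "iff": "↔"}
PHRASES = {
    "and": "{0} and {1}",
    "or": "{0} or {1}",
    "if": "if {0} then {1}",
    "iff": "{0} if and only if {1}",
}


def to_symbols(formula) -> str:
    # Render a nested-tuple formula using the logical operators.
    if isinstance(formula, str):
        return formula
    op, *args = formula
    if op == "not":
        return "¬" + to_symbols(args[0])
    left, right = (to_symbols(a) for a in args)
    return f"({left} {SYMBOLS[op]} {right})"


def to_english(formula, atoms) -> str:
    # Render the same formula using the natural language templates.
    if isinstance(formula, str):
        return atoms[formula]
    op, *args = formula
    if op == "not":
        return "it is not the case that " + to_english(args[0], atoms)
    left, right = (to_english(a, atoms) for a in args)
    return PHRASES[op].format(left, right)


statement = ("if", "b12", ("not", "b21"))
print(to_symbols(statement))
print(to_english(statement, ATOMS))
```

The English rendering drops the brackets, so deeply nested formulae can become ambiguous; a fuller implementation would need some way of marking scope.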
5.3.2. Formalism ⇔ Normal Form
- (p ↔ q) becomes ¬(p ∧ ¬q) ∧ ¬(¬p ∧ q)
- (p → q) becomes ¬(p ∧ ¬q)
- (p ∨ q) becomes ¬(¬p ∧ ¬q)
- (p ↔ q) becomes ¬(¬(¬p ∨ q) ∨ ¬(p ∨ ¬q))
- (p → q) becomes ¬p ∨ q
- (p ∧ q) becomes ¬(¬p ∨ ¬q)
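The first three rewriting rules (those producing formulae over ¬ and ∧ only) can be applied recursively, and the result checked against the original by exhaustive evaluation. The sketch below assumes a nested-tuple encoding of formulae; the encoding and the names `to_and_not`, `evaluate` and `equivalent` are illustrative.

```python
from itertools import product


def to_and_not(f):
    """Rewrite a formula so that it only uses negation and conjunction."""
    if isinstance(f, str):
        return f
    op, *args = f
    args = [to_and_not(a) for a in args]
    if op == "not":
        return ("not", args[0])
    p, q = args
    if op == "and":
        return ("and", p, q)
    if op == "or":
        return ("not", ("and", ("not", p), ("not", q)))
    if op == "if":
        return ("not", ("and", p, ("not", q)))
    if op == "iff":
        return ("and", ("not", ("and", p, ("not", q))),
                ("not", ("and", ("not", p), q)))
    raise ValueError(f"unknown connective: {op}")


def evaluate(f, valuation):
    """Truth value of a formula under an assignment of atoms to booleans."""
    if isinstance(f, str):
        return valuation[f]
    op, *args = f
    vals = [evaluate(a, valuation) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "if":
        return (not vals[0]) or vals[1]
    if op == "iff":
        return vals[0] == vals[1]
    raise ValueError(f"unknown connective: {op}")


def equivalent(f, g, atoms):
    # Two formulae are equivalent if they agree under every valuation.
    rows = (dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms)))
    return all(evaluate(f, row) == evaluate(g, row) for row in rows)


original = ("iff", "p", ("or", "q", "r"))
normal = to_and_not(original)
print(normal)
print(equivalent(original, normal, ["p", "q", "r"]))
```

Once two statements are in the same normal form, comparing them reduces to comparing their structure or their truth tables.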
5.4. Automated Behaviour Evaluation
5.5. Automated Generation of Consistent Behaviour Statements
5.6. Behavioural Artificial Intelligence
- the set of actions available to the player.
- the set of consequences of these actions.
- r: the rules, i.e., a function mapping actions to consequences.
- a (subjective) strategy, i.e., a function mapping subsets of to an action a (, ).
- g: a function mapping consequences to multi-valued goodness values representing specific aspects of these consequences (such as reaching the specific goal of, e.g., taking a pawn).
- u: a function mapping goodness values to utility values (representing, for example, how taking that pawn will serve one of a number of strategic objectives, which ultimately lead to winning the game). The arity of u and that of its output may differ.
- an evaluation function, which maps a state to k boolean values. Each of these values indicates whether a formula ϕ is true in that state (i.e., whether the formula holds under the valuation, the assignment of truth values to propositions, for that state). Each of these k formulae represents a behaviour which we want to support.
- a function that combines the k boolean values calculated by the evaluation function and the m utility values. The outputs are m values which combine the utility as well as the behavioural preference of the action.
- ≻: a preference relation over such that iff .
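The interplay of these components can be sketched as a small decision procedure: apply the rules to each action, derive utility values from the consequence, evaluate the behaviour formulae in the resulting state, and combine both into a score. Everything concrete below is an assumption made for illustration, including the state fields, the numbers, the additive `combine` rule and the use of a summed score as the preference relation.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, object]

# Rules r: hypothetical consequences of two available actions.
RULES: Dict[str, State] = {
    "bid_on_rival": {"stock": 6, "funds": 1.5, "bids_against_other": True},
    "buy_own": {"stock": 5, "funds": 3.0, "bids_against_other": False},
}


def goodness(state: State) -> Sequence[float]:
    # g: aspects of the consequence that matter (stock held, funds left).
    return (float(state["stock"]), float(state["funds"]))


def utility(good: Sequence[float]) -> List[float]:
    # u: a single utility value (m = 1), valuing stock at its sale price.
    stock, funds = good
    return [stock * 2.0 + funds]


def cooperative(state: State) -> bool:
    # One behaviour formula (k = 1): the player bids against nobody.
    return not state["bids_against_other"]


def combine(flags: Sequence[bool], utilities: Sequence[float], bonus: float = 1.0) -> List[float]:
    # Hypothetical rule: add a fixed bonus per supported behaviour that holds.
    support = bonus * sum(flags)
    return [value + support for value in utilities]


def preferred(actions, behaviours: Sequence[Callable[[State], bool]]) -> str:
    def score(action: str) -> float:
        state = RULES[action]
        flags = [phi(state) for phi in behaviours]
        return sum(combine(flags, utility(goodness(state))))

    return max(actions, key=score)


print(preferred(RULES, behaviours=[]))             # purely rational choice
print(preferred(RULES, behaviours=[cooperative]))  # with a behavioural stance
```

With these invented numbers the purely rational player bids on the rival, while the player with a cooperative stance prefers to buy only its own offer, which is the kind of shift a behavioural preference is meant to produce.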
6. Proof of Concept Implementations
6.1. Aims and Objectives
- The formalism is suitable for controlled environments such as simulations and computer games (assuming that their data structures are designed appropriately). The evaluation and comparison of formally stated behaviours, as well as the translation thereof into their natural language equivalent, is straightforward and can be automated. The algorithms for doing so are computationally efficient and scale well.
- Using formal behaviour statements (expressed in our formalism), we can augment standard models for rational decision making from the literature to include behavioural stances. Using this model, we can design and implement a game-playing AI whose choices exhibit clear (and human-like) behavioural preferences.
6.2. Proof of Concept Game—Utility Tycoon
6.2.1. Objectives
6.2.2. Brief Description of the Game
6.2.3. Game Design and Implementation
Serious Game Principles
6.2.4. Validation of Our Approach
Representation and Evaluation of Formalised Behaviour Statements
Computational Efficiency and Performance
6.3. Proof of Concept Game—SoxWars
6.3.1. Objectives
6.3.2. Brief Description of the Game
- The players initially start with a small amount of money (resource).
- Using that money they can purchase supplies of socks for stock (products).
- The shops where these socks can be sold are limited and there is a system in place that favours the supplier that has, in the past, supplied the respective shop.
- The game is turn-based, and turns consist of a number of phases, the order of which is fixed.
- The revenue from selling socks is fixed while the cost for restocking (i.e., the acquisition of socks) varies depending on the phase of the turn when it happens.
- During each phase, players make their choices simultaneously. These decisions, which can affect the outcome for all players, are revealed directly afterwards (i.e., before the next phase).
- The game is designed to converge to situations where trade-offs are required. While a balanced and mutually fair distribution of opportunities is possible, any player can upset this balance and force the game into a series of conflicts (i.e., situations where players will compete for something).
6.3.3. Game Design
The Different Phases of a Turn
- A limited number of new products can be acquired per round in three separate RAC phases:
- RAC1 Buying: Each player is offered the same number of socks, for $1 per unit.
- RAC2 Bidding: The players are offered additional socks at a cost of $1.50 per product. Players can also choose to bid on the products offered to other players (at the inflated price of $2.50).
- RAC3 Trading: Players can offer remaining resources to other players for $2 (market value).
- There are a number of territories, each with a number of shops where the resources are sold for $2 during RAL. Assignment happens in three phases and only to territories, not to specific shops:
- RAS1 Shops only accept resources from the player that delivered to them in the last round.
- RAS2 Shops that had a supplier last round only accept resources from that player.
- RAS3 Delivery to any remaining (not yet supplied) shop.
Conflicting deliveries are handled during the RAL part of a turn.
- RAL: There is a bias towards players who supplied shops in the last round, making it beneficial to reliably supply your shops. This is especially relevant since the number of shops is finite. Shops first accept delivery from players that supplied them in the last round but then accept supplies evenly from all supplying players (on a territory by territory basis). Conflicts are resolved by random allocation such that all players are favoured in turn. If a fair (but random) distribution is not mathematically possible, the human player is favoured (by design).
Essentially, it is in the players’ interest to deliver (and keep delivering) to as many shops as possible. This will let the game converge to a state where all shops are loyal. At this stage, the game will produce only the number of resources required to satisfy the demand of all shops. Once this happens, the only way to acquire more territory is to bid on another player’s resources (at a loss) in the hope of keeping the shops this player can no longer deliver to in the next round.
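The remark that bidding on another player's resources happens at a loss follows directly from the prices given for the acquisition phases and the $2 sale price. The short calculation below makes the per-unit margins explicit; the dictionary keys are informal labels chosen for this example.

```python
SALE_PRICE = 2.00  # revenue per unit sold to a shop

# Cost per unit in each acquisition phase, as stated in the phase descriptions.
UNIT_COST = {
    "RAC1 buying": 1.00,
    "RAC2 bidding (own offer)": 1.50,
    "RAC2 bidding (other player's offer)": 2.50,
    "RAC3 trading": 2.00,
}

# Margin per unit: what is left of the sale price after acquisition.
margins = {phase: SALE_PRICE - cost for phase, cost in UNIT_COST.items()}

for phase, margin in margins.items():
    print(f"{phase}: {margin:+.2f}")
```

Only bidding on another player's offer has a negative margin, so it pays off only if the shops gained that way stay loyal in later rounds.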
Three Variations for the Order of Phases in a Turn
6.3.4. Modelling the Game
Modelling Resource-Acquisition
- Modelling RAC1: Phase one of the resource acquisition (RAC1) does not contain any behaviour of interest to us. Whether a player decides to purchase resources does not appear as part of the considered behavioural statements. Obviously, there are changes in the state of the game, but these can be represented as a single world in the model: if a specific player purchases resources, this only affects the propositions related to this player’s stock and funds. These propositions are disjoint with the propositions of the other players and thus we can express all the changes in within a single world. The only minor liberty taken in this approach is the fact that the globally available resources are of course decreasing every time a player purchases stock, which happens multiple times in the stage. However, global resources are not considered for our behavioural statements as they are not under the control of the player. Therefore, they are not included in . The difference in the frames (i.e., the models without the propositions assigned) for the stages is thus mainly in the number of possible future states (e.g., i in Figure 9: to .) This means that if the player can decide on the number of resources to purchase, and if there are exactly n products offered to each of the j players, there are different results for each player (allowing for zero products being purchased), resulting in in the model for RAC1. In our implementation, we included this information as it was relevant for the expression of the rational aspect of the AI; however, we restricted this decision to “buying” and “¬ buying”, so that in our implementation .
- Modelling RAC2: The second stage of the resource acquisition is the most important one for the behaviour analysis. Again, the model can be collapsed to the model shown in Figure 9. This time, however, we consider the actions of the individual players with regard to the other players, as the bidding on resources happens by one player but targets the resources of a specific other player. As above for RAC1, we only allowed the bidding on resources and did not enable a quantification for this (i.e., it is not possible to bid on a few resources of a player; it is either bidding on all offered resources or bidding on none). We did not offer the option to “bid on all players”, forcing the player to select every opponent individually. We also required that one bids on one’s own resources before bidding on those of other players. This is rational strategic behaviour and removes a number of complex behavioural constructions such as bidding on other players’ resources at the cost of not bidding on your own (which would be cheaper). This means that for j players there are j − 1 other players to bid on, plus the options “bidding only on the resources offered to the player” and “not bidding at all”. Due to this, there are possible combinations, and thus in the model for RAC2 .
- Modelling RAC3: In the last stage where resources can be acquired, we ignored the decision to accept resources offered. The justification for this was that including this increases the complexity of the represented behaviour by allowing for sulking and other emotional responses. The main justification for being able to omit these more complex behaviours is that the AI players will make the decision to purchase such resources on a purely tactical basis. The idea is that the human player is aware that the opponents are played by a computer and it is assumed that emotional responses are not exhibited towards these players. The decisions in RAC3 are very similar to the one in RAC2, in that each player gets to decide whether to offer a fixed amount of stock to another player. Therefore, for model RAC3 as well.
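The size of such a collapsed model can be estimated with simple counting. As a hedged illustration for RAC1 only: if every player chooses independently, the number of joint outcomes is the number of options per player raised to the number of players. The function name and the example values of n and j below are assumptions, and the figures are not taken from the paper.

```python
def joint_outcomes(options_per_player: int, players: int) -> int:
    # Independent choices multiply: one factor per player.
    return options_per_player ** players


n, j = 3, 4  # hypothetical: 3 products offered to each of 4 players

# Each player may purchase 0, 1, ..., n products, i.e. n + 1 options.
full = joint_outcomes(n + 1, j)

# Restricting the choice to "buying" or "not buying" leaves 2 options each.
binary = joint_outcomes(2, j)

print(full, binary)
```

The gap between the two counts illustrates why restricting each decision to a yes/no choice keeps the model small.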
Modelling Resource-Allocation
Modelling Resource-Assignment
6.3.5. Implementation
Implementation: Cardboard Version
Implementation: Mobile Phone App
Implementation: Web-Based Game
6.3.6. Validation and Evaluation of Our Approach
Lessons Learned: Cardboard Version
Lessons Learned: Mobile Phone App
Lessons Learned: Web-Based Game
7. Summary and Conclusions
Conflicts of Interest
AI | Artificial Intelligence | |
DOAJ | Directory of open access journals | |
ToPB | Theory of Planned Behaviour | Section 2.4 |
TACT | Target, Action, Context and Time | Section 3.1 |
PL | Propositional Logic | Section 3.2.1 |
ML | Modal Logic | Section 3.2.2 |
SEU | Subjective Expected Utility | Section 3.3.3 |
RMG | Resource-Management Games | Section 4.2 |
 | Cooperate Player 1 | Defect Player 1 |
Cooperate Player 2 | (B, B) | (A, D) |
Defect Player 2 | (D, A) | (C, C) |
Connector | Usage | Operator | Usage | Rewritten | |
not | “it is not the case that statement” | ¬ | |||
or | “statement1 or statement2” | ∨ | |||
and | “statement1 and statement2” | ∧ | ≡ | ||
if ... then | “if statement1 then statement2” | → | ≡ | ||
if and only if | “statement1 if (and only if) statement2” | ↔ | ≡ |
