The core idea of the presented framework is to replace the manual definition of rules that govern agent behaviour with an automated, generic process. This shifts the effort during model development away from finding rules that correctly depict reality towards defining an equation for a score or a utility function for an agent. This utility function can be tailored specifically to model the investigated system, and it is also possible to use different utility functions for different agents. In most cases, the definition of a utility function will be a simpler and less subjective task than finding rules for behaviour. The actual decision making, i.e., the decision on what action to perform given the current sensory input and past experience, is then handled by an artificial neural network. A simple example where this advantage is evident would be an agent-based model that describes a game of chess. Both players want to win the game, so their utility function would be trivial to find.
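For illustration only (the framework does not prescribe a specific form, so the exact values here are an assumption), such a utility function could simply reward the outcome of the game:

U = \begin{cases} 1 & \text{if the game is won} \\ 0 & \text{if the game is drawn} \\ -1 & \text{if the game is lost} \end{cases}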
However, finding a rule-based description of which moves the players think will lead them to maximise this function is virtually impossible. A neural network can close this gap.
2.1. Artificial Neural Networks
Artificial neural networks [25,26,27,28] are based on the idea of building artificial networks that closely resemble the principles of biological neural networks. A signal enters the network through some sensory cells or input nodes and is transferred to a complex network. Depending on the individual connections between nodes, this input signal is then transformed into an output at an output node (see Figure 1). Similar to a biological neural network, an artificial neural network also needs to be trained. This means it needs to be confronted with various input signals and receive feedback regarding the desired output, so that the required connections can be formed. There are various techniques to train artificial neural networks, with different benefits and downsides [29,30,31,32,33,34,35], but they all have in common that they rely on a large amount of data, either measured or generated.
Trained neural networks can then be used for many applications, such as data processing [36], function approximation [37], solving prediction problems [38] and solving classification problems [39]. In a classification problem, a neural network is presented with certain inputs (images or other data) and has to classify each input as a member of a set of predefined categories or classes. The most prominent example of a neural network that solves a classification problem is the automated reading of handwriting. The inputs are images of handwritten letters and each of them corresponds to an actual letter. In that case, the input layer of the network would encode the image and the output layer would have one node for each letter in the alphabet. Using images that are already classified, the network can then be trained to produce a signal in the correct output node.
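As a brief illustration of such a setup (using scikit-learn and its bundled handwritten digits rather than letters; the library choice and network size are our assumptions, not part of the original description), a classifier with one output class per symbol can be trained as follows:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Handwritten digits as a stand-in for letters: each input is a flattened image,
# each label is the symbol it depicts (one output class per symbol).
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# The input layer encodes the image, the output layer has one node per class.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(X_train, y_train)                     # train on already classified images
print("correct classifications:", net.score(X_test, y_test))
```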
For the framework, we focus on the classification problem, since this is the problem agents are faced with. They have a set of possible actions available to them and need to decide for each whether it would be a good or a bad decision in their current situation. Alternatively, this could be framed as a prediction problem. In a prediction problem, agents would not classify decisions as good or bad but would try to approximate the utility that each decision would lead to. Thus, they would search for a quantitative value for each possible decision and not for a qualitative one, making the process more complicated. Furthermore, the formulation as a prediction problem would assume that agents have a quantitative understanding of their utility function. Since this assumption is unjustified in many systems, the basis of our framework is the aforementioned classification problem, where a qualitative understanding (worse, the same, or better) of their utility is sufficient.
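The distinction can be made concrete with a small sketch: the classification variant only needs the qualitative rating, while the prediction (regression) variant would need the quantitative change in utility as a training target. The library and the random stand-in data below are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 10))            # stand-in for (sensory input, action) encodings
delta_u = rng.normal(size=100)       # stand-in for the change in utility per decision

# Classification (used by the framework): only a qualitative rating is needed.
MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, (delta_u > 0).astype(int))

# Prediction/regression (the alternative): requires the quantitative utility change.
MLPRegressor(hidden_layer_sizes=(16,), max_iter=500).fit(X, delta_u)
```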
2.2. A Framework for Agent-Based Modelling
The core of the presented framework is a trained artificial neural network that solves the following classification problem: “Which of the actions available to me can increase my score?” To train the network, we first need to gather a sufficient amount of data. Each entry should consist of the sensory input of the agent, the decision it made and whether this decision increased its score or not. The modelling process is split into two parts: the experience phase, in which agents make random decisions in order to gather experience data, and the application phase, in which the agents use the trained neural network to make realistic decisions. These two phases are depicted schematically in Figure 2.
During the experience phase, agents are initialised, in general, in the same environment and under the same conditions in which the actual modelling will take place. Agents take turns and perform the following steps in each turn (a minimal sketch of this loop follows the list):
Save all the sensory inputs in a vector $\vec{s}$.
Calculate the current score from the utility function and save it as $S_\text{old}$.
Perform a random action $a$ from the list of available actions.
Calculate the new score and save it as $S_\text{new}$.
Rate the decision and save the rating as $r$: good if $S_\text{new} > S_\text{old}$, bad otherwise.
Add an entry to the experience database: $(\vec{s}, a, r)$.
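The list above can be condensed into a short sketch; all names (sense, available_actions, perform, utility) are placeholders introduced here for illustration, not part of the framework itself:

```python
import random

def experience_phase(agents, environment, utility, n_steps, database):
    """Collect (sensory input, action, rating) entries from random decisions."""
    for _ in range(n_steps):
        for agent in agents:
            s = agent.sense(environment)                 # sensory input vector s
            score_old = utility(agent, environment)      # score before the action
            action = random.choice(agent.available_actions(environment))
            agent.perform(action, environment)
            score_new = utility(agent, environment)      # score after the action
            r = 1 if score_new > score_old else 0        # good (1) if the score increased
            database.append((s, action, r))              # one experience entry (s, a, r)
    return database
```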
After the experience phase, we end up with a large database that contains information about whether a certain action in a certain situation can be classified as good or bad. Similar to the problem of reading handwritten letters, this information of correctly classified inputs can be used to train the neural network, so that it can then classify inputs, even if they are not exactly the same as the ones it was trained with. When applying a neural network for this purpose, one has to take great care regarding validation: Validating a neural network that is used for reinforcement learning is simple; the closer the rate of correct classifications is to 100%, the better. Here, the situation is more complicated: We do not want perfect, but realistic decisions. Thus, reaching a correct classification rate of close to 100% cannot be used as an indicator for successful training. For our application, one has to look at convergence: If the rate of correct classifications stops improving, this is a sign that the agents are sufficiently trained, even if they only reach a rate of, e.g., 60%.
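One way to implement this convergence criterion is to train in small increments and stop once the rate of correct classifications stops improving. The snippet below is a sketch under the assumption that the experience database has already been split into training and validation arrays (replaced here by random stand-in data); the use of scikit-learn's MLPClassifier with warm_start is our choice, not prescribed by the framework:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 10)), rng.integers(0, 2, 1000)  # stand-in experience data
X_val, y_val = rng.random((200, 10)), rng.integers(0, 2, 200)

# warm_start=True lets repeated fit() calls continue training the same weights.
net = MLPClassifier(hidden_layer_sizes=(32,), warm_start=True, max_iter=1)
best_rate, stagnant = 0.0, 0
for epoch in range(500):
    net.fit(X_train, y_train)                # one more pass over the experience data
    rate = net.score(X_val, y_val)           # rate of correct good/bad classifications
    if rate > best_rate + 1e-3:
        best_rate, stagnant = rate, 0
    else:
        stagnant += 1
    if stagnant >= 10:                       # rate has stopped improving; agents count as trained
        break
```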
After the training of the neural network, the application phase begins. Agents are reinitialised, so that the random decisions they took during training have no influence on their state. Again, the agents take turns and perform the following steps in each turn (a sketch of this decision procedure follows the list):
Save all the sensory inputs in a vector $\vec{s}$.
For each available action $a$, use $(\vec{s}, a)$ as an input for the neural network.
Rank all actions according to the certainty that they are good decisions.
Perform the action which was ranked highest.
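A corresponding sketch of one decision in the application phase, assuming a trained classifier with a predict_proba method and an encode() helper that turns an action into numbers (both are assumptions made for this illustration), could look like this:

```python
import numpy as np

def choose_action(agent, environment, net, encode):
    """Rank all available actions by the network's certainty that they are good."""
    s = agent.sense(environment)                       # sensory input vector s
    actions = agent.available_actions(environment)
    inputs = np.array([np.concatenate([s, encode(a)]) for a in actions])
    p_good = net.predict_proba(inputs)[:, 1]           # certainty of a "good" classification
    return actions[int(np.argmax(p_good))]             # perform the highest-ranked action
```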
2.3. Applying the Framework
To showcase the presented framework, we apply it to the well-known segregation model by Schelling [23,24]. Agents are distributed randomly on a grid (see Figure 3). Each agent is a member of one of two groups. Every time step, agents can choose to stay at their position or move to a random unoccupied place on the grid. In the original model, one has to define a certain percentage of neighbours from the other group, above which the agents should move away. Depending on this value, one can observe weak or strong segregation [40]. Using the presented framework, there is no need to determine such specific rules. We simply define how to calculate a score $S$: Agents gain a point for every neighbour in the same group and lose a point for every neighbour in the other group:

S_{i,j} = -1 + \sum_{k=-1}^{1} \sum_{l=-1}^{1} g_{i,j}\, g_{i+k,\,j+l}   (2)

with $S_{i,j}$ being the score of the agent at position $(i,j)$, and $g_{i,j}$ being the group membership of the agent (1 for members of Group 1, $-1$ for members of Group 2 and 0 for unoccupied cells). The first term of Equation (2) counteracts the contribution of the sums where $k = l = 0$, in which an agent would otherwise be awarded a point for being its own neighbour. Note that the actual formulation of this equation has no impact on the results, as long as each “correct” neighbour increases the score and each “incorrect” neighbour decreases the score by the same amount. Having different weights for positive and negative contributions is possible, but seems counter-intuitive in the context of segregation [41].
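As a small sketch of Equation (2) in code (assuming a numpy array holding 1, −1 and 0 for the two groups and empty cells, and, purely for simplicity here, periodic boundary conditions, which the model description does not specify):

```python
import numpy as np

def score(grid, i, j):
    """Score of the agent at (i, j) according to Equation (2)."""
    rows, cols = grid.shape
    s = -1                                   # cancels the k = l = 0 self-neighbour term
    for k in (-1, 0, 1):
        for l in (-1, 0, 1):
            s += grid[i, j] * grid[(i + k) % rows, (j + l) % cols]
    return s
```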
In addition to this score, we have to define what agents can sense or know: In this implementation, we assume that agents know which group they are in and that they can see the group of all agents in their Moore neighbourhood [42]. These are all the assumptions that are required for the experience phase.
During the experience phase, agents make random decisions. They store their sensory input and their decision as a positive experience if the score increased as an effect of the decision, and as a negative experience if it decreased. Since the decisions of all agents are random, the states of the system are also more or less random. This means we need to collect data over long periods of time, in order to also have information about situations that have only a small probability of occurring randomly. However, since the data can be generated automatically and this process only involves a random decision, an action and the recalculation of the score, it is computationally cheap. For this study, we generated 1000 time steps in a system consisting of 200 agents, leading to 200,000 experiences, which were used for training the neural network.
In the application phase, agents use the trained neural network in order to classify all possible actions as positive or negative and choose what they think is the best option. Note that in this phase the network will encounter data it has not seen before. Ideally, the agents should show patterns of segregation, similar to the original model. However, in the framework, we never defined any specific rules or equipped agents with the knowledge of what the input parameters mean. The agents receive a vector containing numbers, but they do not know what information is encoded there or how it is encoded. They learn its meaning during the experience phase, which makes the framework highly flexible.
In addition to replicating the original model, we can also experiment with the experience phase. On the one hand, we can change the environment in which the agents train, so that it is different from the one the agents encounter during application. That way, we can investigate any change in behaviour that comes from a different training environment. On the other hand, we can truncate some of the senses of the agents during training. This enables us to pinpoint which sensory input is necessary for which kind of behaviour. This is related to the established technique of feature extraction [43]. However, in feature extraction, one is limited to removing or adding features, while here we can make changes to the learning environment that go beyond that.
To showcase these two methods, we train agents in a population with an asymmetric share of the two groups and restrict their vision to fewer cells, to see how the resulting behaviour changes.
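As an illustration of how such a truncation of senses could look in practice (the sensory vector layout and the chosen indices are assumptions made for this sketch, not taken from the model description), one can simply zero out part of the Moore-neighbourhood input before it enters the experience database:

```python
import numpy as np

def truncate_senses(sensory_input, visible_cells):
    """Keep only the listed neighbourhood cells of the sensory vector; hide the rest.
    sensory_input: values of the 8 Moore-neighbourhood cells (assumed layout).
    visible_cells: indices the agent is still allowed to see during training."""
    truncated = np.zeros_like(sensory_input)
    truncated[visible_cells] = sensory_input[visible_cells]
    return truncated

# Example: an agent that only sees its four orthogonal neighbours (assumed indices 1, 3, 4, 6).
restricted = truncate_senses(np.array([1, -1, 0, 1, 1, 0, -1, 1]), [1, 3, 4, 6])
```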