Replacing Rules by Neural Networks: A Framework for Agent-Based Modelling

Agent-based modelling is a successful technique in many different fields of science. As a bottom-up method, it is able to simulate complex behaviour based on simple rules and show results at both micro and macro scales. However, developing agent-based models is not always straightforward. The most difficult step is defining the rules for the agent behaviour, since one often has to rely on many simplifications and assumptions in order to describe the complicated decision making processes. In this paper, we investigate the idea of building a framework for agent-based modelling that relies on an artificial neural network to depict the decision process of the agents. As a proof of principle, we use this framework to reproduce Schelling’s segregation model. We show that it is possible to use the presented framework to derive an agent-based model without the need of manually defining rules for agent behaviour. Beyond reproducing Schelling’s model, we show expansions that are possible due to the framework, such as training the agents in a different environment, which leads to different agent behaviour.


Introduction
The main idea behind agent-based modelling [1] is to describe a system by using its constituent parts as a starting point. These so-called "agents" have certain properties and actions available to them and the whole system is then modelled based on the actions and interactions of the agents. Thus, the dynamic in this bottom-up approach originates at the micro scale, in contrast to most macroscopic modelling approaches, where the dynamic is mostly governed by equations that describe system properties. Agent-based modelling is relevant for different scientific disciplines, in which systems can be seen as comprised of a large number of interacting entities. There are many examples of areas where agent-based modelling was used successfully: for evacuation models [2], it is quite intuitive to describe the system as interacting agents. The goal of each agent is clear, but due to their interactions interesting effects can emerge. Another important area for agent-based models is traffic simulation [3][4][5]. The goals of the agents are also clear here (getting from one point to another), but, due to interactions with other agents, they must behave differently from how they would if they had all roads to themselves, leading to congestion or even stop-and-go traffic. Somewhat related to this field is the field of city planning [6], where models with mobile agents, which have various needs, can give crucial input about how efficient a specific plan for a city is. Beyond that, there are also various applications in social sciences [7,8] and the simulation of human systems [9]. In the fields of ecology [10] and complexity research [11], agent-based models can also be used. Another important application of agent-based modelling is computational economics [12]. This field is especially active and it is possible to simulate whole economies from the bottom up [13][14][15][16].
An agent-based approach can also be used if one is interested in (social) networks [17] and phenomena that are influenced by network effects. A prominent example would be tax compliance and tax evasion [18], which can be better understood by models that see this behaviour not as an external feature of the system, but as something that emerges at the agent level and is disseminated via a network.
For the development of each agent-based model certain steps are necessary: (i) defining the agents; (ii) defining the environment; (iii) defining what agents know and sense; (iv) defining the goal of the agents; and (v) finding rules for their behaviour. While the first four steps are relatively straightforward for most models, the last step features some unique challenges. The rules should use the information the agents have access to as an input and yield what action they take in this situation as an output, considering bounded rationality [19]. The input that the agents use for their decision is relatively easy to find and to justify. In an evacuation model for example, agents sense their immediate surroundings and choose their path accordingly, but they have no information about the situation behind a visual barrier such as a wall. In a traffic model, agents know which roads are usually congested and which are not and may have access to information about recent construction sites or accidents and mainly base their path decision on this input. However, specifying the rules that lead to a decision is much more difficult and often relies on assumptions from psychology or economics, which are often hard to back up empirically or by theories. This makes the search for valid rules for agent behaviour one of the biggest challenges in agent-based modelling [20].
In this paper, a framework is presented that uses an artificial neural network to simulate the decision making process in an agent-based model. The framework is generic enough that it can be used for any kind of agent-based model, since the training of the artificial neural network is included in the process and therefore independent of the investigated system or model. The framework is related to reinforcement learning [21,22], but there are important differences. While the goal of reinforcement learning is to arrive at an optimal solution, the framework focuses on providing a realistic decision process, by training agents in realistic environments and not in hypothetical environments proven to show the best results, as is done for reinforcement learning. This also includes the possibility of wrong decisions or misjudgement, allowing us to model a wide range of systems, in which agents are not able to find the optimal solution, realistically. This paper is organised as follows. Section 2 details the used methods. Section 2.1 gives a short introduction to artificial neural networks. A detailed description of the proposed framework can be found in Section 2.2. As a proof of principle, the framework is applied to reproduce Schelling's famous segregation model [23,24] in Section 2.3. Results of this application are presented in Section 3. Section 4 discusses these results. Finally, Section 5 gives a short conclusion of this study.

Methods
The core idea of the presented framework is to replace the manual definition of rules that govern agent behaviour by an automated, generic process. This shifts the effort during model development away from finding rules that correctly depict reality towards defining an equation for a score or a utility function for an agent. This utility function can be tailored specifically to model the investigated system and it is also possible to use different utility functions for different agents. In most cases, the definition of a utility function will be a simpler and less subjective task than finding rules for behaviour. The actual decision making, i.e. the decision on what action to perform given the current sensory input and past experience, is then handled by an artificial neural network. A simple example where this advantage is evident would be an agent-based model that describes a game of chess. Both players want to win the game, so their utility function would be trivial to find. However, finding a rule-based description of which moves the players think will lead them to maximise this function is virtually impossible. A neural network can close this gap.

Artificial Neural Networks
Artificial neural networks [25][26][27][28] are based on the idea of building artificial networks that closely resemble the principles of biological neural networks. A signal enters the network through some sensory cells or input nodes and is transferred to a complex network. Depending on the individual connections between nodes, this input signal is then transformed into an output at an output node (see Figure 1). Similar to a biological neural network, an artificial neural network also needs to be trained. This means it needs to be confronted with various input signals and receive feedback regarding the desired output, so that the required connections can be formed. There are various techniques to train artificial neural networks with different benefits and downsides [29][30][31][32][33][34][35], but they all have in common that they rely on a large amount of data, either measured or generated.  Trained neural networks can then be used for many applications, such as data processing [36], function approximation [37], solving prediction problems [38] and solving classification problems [39]. In a classification problem, a neural network is presented with certain inputs (images or other data) and has to classify each input as a member of a set of predefined categories or classes. The most prominent example for a neural network that solves a classification problem is the automated reading of handwriting. The inputs are images of handwritten letters and each of them corresponds to an actual letter. In that case, the input layer of the network would encode the image and the output layer would have one node for each letter in the alphabet. Using images that are already classified, the network can then be trained to produce a signal in the correct output node.

For the framework, we focus on the classification problem, since this is the problem agents are faced with. They have a set of possible actions available to them and need to decide for each whether this would be a good or a bad decision in their current situation. Alternatively, this could be framed as a prediction problem. In a prediction problem, agents would not classify a decision as good or bad but would try to approximate the utility that this decision would lead to. Thus, they would search for a quantitative value for each possible decision and not for a qualitative one, making the process more complicated. Furthermore, the formulation as a prediction problem would assume that agents have a quantitative understanding of their utility function. Since this assumption is unjustified in many systems, the basis of our framework is the aforementioned classification problem, where a qualitative understanding (worse, the same, or better) of their utility is sufficient.

A Framework for Agent-Based Modelling
The core of the presented framework is a trained artificial neural network that solves the following classification problem: "Which of the actions available to me can increase my score?" To train the network, we first need to gather a sufficient amount of data. Each entry should consist of the sensory input of the agent, the decision it made and whether this decision increased its score or not. The modelling process is split into two parts: the experience phase, in which agents make random decisions in order to gather experience data, and the application phase, in which the agents use the trained neural network to make realistic decisions. These two phases are depicted schematically in Figure 2. During the experience phase, agents are initialised (in general) in the same environment and under the same conditions in which the actual modelling will take place. Agents take turns and perform the following steps in each turn:

1. Save all the sensory inputs in a vector s_in.
2. Calculate the current score from the utility function and save it as u_1.
3. Perform a random action a from the list of available actions.
4. Calculate the new score and save it as u_2.
5. Rate the decision and save the rating as r: good if u_2 > u_1, bad otherwise.
6. Add an entry to the experience database: (s_in, a, r).
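The six steps above can be sketched as a short loop. This is only a minimal illustration, not the implementation used in this study: the callables `sense`, `score` and `apply_action` are hypothetical placeholders for whatever the concrete model provides, and the toy usage (agents on a line that score higher the closer they are to zero) is invented purely for demonstration.

```python
import random

def experience_phase(agents, sense, score, actions, apply_action, n_steps, rng=None):
    """Collect (s_in, a, r) entries while all agents act randomly."""
    rng = rng or random.Random()
    database = []
    for _ in range(n_steps):
        for agent in agents:                  # agents take turns
            s_in = tuple(sense(agent))        # step 1: sensory input
            u1 = score(agent)                 # step 2: score before acting
            a = rng.choice(actions)           # step 3: random action
            apply_action(agent, a)
            u2 = score(agent)                 # step 4: score after acting
            r = "good" if u2 > u1 else "bad"  # step 5: rate the decision
            database.append((s_in, a, r))     # step 6: store the experience
    return database

# Toy usage (hypothetical model): agents live on a line and prefer x = 0.
def move(agent, step):
    agent["x"] += step

toy_agents = [{"x": 3}, {"x": -2}]
db = experience_phase(
    toy_agents,
    sense=lambda ag: [ag["x"]],
    score=lambda ag: -abs(ag["x"]),
    actions=[-1, 1],
    apply_action=move,
    n_steps=5,
    rng=random.Random(0),
)
```

Each pass over the agents adds one entry per agent, so the database grows by the number of agents in every time step.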
After the experience phase, we end up with a large database that contains information about whether a certain action in a certain situation can be classified as good or bad. Similar to the problem of reading handwritten letters, this information of correctly classified inputs can be used to train the neural network, so that it can then classify inputs, even though they are not exactly the same as the ones it trained with. When applying a neural network for this purpose, one has to take great care regarding validation: Validating a neural network that is used for reinforcement learning is simple. The closer the rate of correct classifications is to 100%, the better. Here, the situation is more complicated: We do not want perfect, but realistic decisions. Thus, reaching a correct classification rate of close to 100% cannot be used as an indicator for successful training. For our application, one has to look at convergence: If the rate of correct classifications stops improving, this is a sign that the agents are sufficiently trained, even if they only reach a rate of, e.g., 60%.
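One possible way to turn this convergence criterion into a stopping rule is sketched below. The function name and the parameters `patience` and `tol` are assumptions made for illustration; the text does not prescribe a specific test, only that training is judged by whether the rate of correct classifications still improves, not by how close it gets to 100%.

```python
def has_converged(accuracies, patience=3, tol=0.005):
    """True once the last `patience` epochs no longer improve on earlier ones.

    `accuracies` holds the rate of correct classifications after each epoch.
    The absolute level is deliberately ignored: a plateau at 0.60 counts as
    converged just like a plateau at 0.99.
    """
    if len(accuracies) <= patience:
        return False                      # not enough history to judge
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    return best_recent - best_before <= tol
```

With the history `[0.50, 0.55, 0.58, 0.60, 0.601, 0.600, 0.602]`, the last three epochs improve on the earlier best by only 0.002, so training would be considered finished at roughly 60%.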
After the training of the neural network, the application phase begins. Agents are reinitialised, so that the random decisions they took during training have no influence on their state. As in the experience phase, the agents take turns, now performing the following steps:

1. Save all the sensory inputs in a vector s_in.
2. For each available action a_i, use (s_in, a_i) as an input for the neural network.
3. Rank all actions a_i according to the certainty that they are good decisions.
4. Perform the action which was ranked highest.
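The selection step can be sketched as follows. Here `p_good` is a hypothetical callable standing in for the trained network: it is assumed to return the certainty (between 0 and 1) that performing action `a` with sensory input `s_in` is a good decision. How this certainty is obtained from a particular neural network library is left open.

```python
def choose_action(s_in, actions, p_good):
    """Return the action the classifier is most certain is a good decision."""
    # Step 2: query the network once per available action.
    certainties = [(p_good(s_in, a), a) for a in actions]
    # Steps 3 and 4: rank by certainty and take the top-ranked action.
    _, best_action = max(certainties, key=lambda pair: pair[0])
    return best_action

# Toy stand-in for a trained network (made-up certainties for illustration).
toy_net = lambda s_in, a: {"stay": 0.2, "move": 0.7}[a]
chosen = choose_action((1, 0, -1), ["stay", "move"], toy_net)
```

Because only the ranking matters, an agent acts even if no action is classified as good with high certainty; it simply takes the least bad option.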

Applying the Framework
To showcase the presented framework, we apply it to the well-known segregation model by Schelling [23,24]. Agents are distributed randomly on a grid (see Figure 3). Each agent is a member of one of two groups. Every time step, agents can choose to stay at their position or move to a random unoccupied place on the grid. In the original model, one has to define a certain percentage of neighbours from the other group, above which the agents should move away. Depending on this value, one can observe weak or strong segregation [40]. Using the presented framework, there is no need to determine such specific rules. We simply define how to calculate a score S: Agents gain a point for every neighbour in the same group and lose a point for every neighbour in the other group:

$$S_{ij} = -1 + \sum_{dx=-1}^{1} \sum_{dy=-1}^{1} G_{ij}\, G_{i+dx,\,j+dy} \qquad (2)$$

with $S_{ij}$ being the score of the agent at position x = i, y = j and $G_{ij} \in \{-1, 0, 1\}$ being the group membership of the agent (1 for members of Group 1, −1 for members of Group 2 and 0 for unoccupied cells). The first term of Equation (2) counteracts the contribution of the sums, where dy = dx = 0 and an agent would be awarded a point for being its own neighbour. Note that the actual formulation of this equation has no impact on the results, as long as each "correct" neighbour increases the score and each "incorrect" neighbour decreases the score by the same amount. Having different weights for positive and negative contributions is possible, but seems counter-intuitive in the context of segregation [41]. In addition to this score, we have to define what agents can sense or know: In this implementation, we assume that agents know which group they are in and that they can see the group of all agents in their Moore neighbourhood [42]. These are all the assumptions that are required for the experience phase.
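The scoring rule described above (one point gained per same-group neighbour, one point lost per other-group neighbour in the Moore neighbourhood) could be computed as in the following sketch. The grid layout as a list of rows, the function name and the wrap-around indexing are assumptions made for illustration; the wrap-around corresponds to the periodic boundary conditions mentioned with the results.

```python
def score(grid, i, j):
    """Score of the agent at column i, row j.

    `grid` is a list of rows holding 1 (Group 1), -1 (Group 2) or
    0 (unoccupied). Indices wrap around, i.e. the grid is periodic.
    """
    height, width = len(grid), len(grid[0])
    own = grid[j][i]
    total = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                  # an agent is not its own neighbour
            neighbour = grid[(j + dy) % height][(i + dx) % width]
            total += own * neighbour      # +1 same group, -1 other group, 0 empty
    return total

example_grid = [
    [1,  1, 0],
    [1, -1, 0],
    [0,  0, 0],
]
```

Skipping the centre cell in the loop has the same effect as summing over all nine cells and subtracting the agent's contribution for being its own neighbour. On a fully occupied single-group grid, every agent reaches the maximum score of 8.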

During the experience phase, agents make random decisions. They store their sensory input and their decision as a positive experience, if the score increased as an effect of the decision and as a negative experience if it decreased. Since the decisions of all agents are random, the states of the system are also more or less random. This means we need to collect data over long periods of time, in order to also have information about situations that have only a small probability of occurring randomly. However, since the data can be generated automatically and this process only involves a random decision, an action and the recalculation of the score, it is computationally cheap. For this study, we generated 1000 time steps in a system consisting of N = 200 agents, leading to 200,000 experiences, which were used for training the neural network.
In the application phase, agents use the trained neural network in order to classify all possible actions as positive or negative and choose what they think is the best option. Note that in this phase the network will encounter data not seen before. Ideally, the agents should show patterns of segregation, similar to the original model. However, in the framework, we never defined any specific rules or equipped agents with the knowledge of what the input parameters mean. The agents receive a vector containing numbers, but they do not know what information is encoded there or how it is encoded. They learn the meaning of these numbers during the experience phase, which makes the framework highly flexible.
In addition to replicating the original model, we can also experiment with the experience phase. On the one hand, we can change the environment in which the agents train, so that it is different from the one the agents encounter during application. That way we can investigate any change in behaviour that comes from a different training environment. On the other hand, we can truncate some of the senses of the agents during training. This enables us to pinpoint which sensory input is necessary for which kind of behaviour. This is related to the established technique of feature extraction [43]. However, in feature extraction, one is limited to removing or adding features, while here we can make changes to the learning environment that go beyond that.
To showcase these two methods, we train agents in a population with an asymmetrical population share and restrict their vision to fewer cells, to see how the resulting behaviour changes.

Results
Figure 4 shows the results of reproducing the original model. Shown is the initial, random configuration of agents (left) and the final configuration after the trained neural network governed the decision making of the agents over 100 time steps (right). We can clearly see segregation: both blue and red agents form connected groups. Note that we use periodic boundary conditions [44], meaning that the cells on the left edge of the image are connected to the ones on the right edge and the cells on the top edge are connected to the ones on the bottom edge. Thus, the large groups are indeed connected. Additionally, we see the scores of individual agents and overall scores. Agents represented as squares have high scores (4 or more), medium circles have a medium score (2-3), small circles have a small score (1) and dots have a score of less than 1. A violin plot of the overall score distribution is also shown, which is similar to a histogram, but rotated so that different distributions can be compared more easily. This plot reveals that, at t = 0, scores are Gaussian distributed around 0 (Figure 4, left), while they tend towards the maximum of 8 at t = 100 (Figure 4, right). This result cannot be taken for granted, since application of the framework to reproduce a model is more than a simple optimisation. Agents have no knowledge about the way their score is calculated, nor do they know the meaning of the sensory input data and how their actions affect them and their score. All behaviour is learned during the generic experience phase.
Nevertheless, it was possible to obtain results very similar to those of the original segregation model without the need to define rules. This serves as a proof of concept, suggesting that other systems for which it is difficult to find correct rules can also be modelled using the framework.

Training in a Different Environment
Beyond this proof of concept, we also investigate what happens if we use an environment during application that is different from the one used in training. Here, we use a population consisting of a majority of blue agents and only a few red agents. Results of this simulation can be seen in Figure 5. Figure 5 (left) shows a random configuration during training. Even though the positioning is completely random, blue agents tend to have a higher score, simply because the density of blue agents is much higher. After the training phase, we change the population to a symmetrical population share and use the trained neural network for decision making. In Figure 5 (right), we see the final configuration after 100 time steps. It is clearly visible that red and blue agents learned completely different behaviour. Blue agents are only satisfied with very high scores and cluster together very closely. Red agents learned that high scores are difficult to obtain and are thus satisfied with a lower score, even though higher scores would be obtainable in this system. We also see blue agents that are encircled by red agents. The reason for this behaviour is that the blue agents never encountered such a situation during training and are not sure what to do, leading to a wrong decision.

Figure 5. With an asymmetric population share during training, the different groups of agents learn different behaviour. Even if the groups are the same size during application, the group that was under-represented during learning is satisfied with a lower score.

Truncating Input during Training
Models where the agents are not capable of perfectly classifying all actions are especially interesting for the presented framework. Such a situation can be achieved by truncating the senses of the agents during training, representing a system with a suboptimal learning environment. Figure 6 shows the results of such a simulation run. Even with all their senses enabled during application, agents tend to ignore input that they cannot relate to a previous experience. Figure 6 (left) shows the final configuration of a simulation in which agents were only able to see their left and their right neighbour during training, but nothing else. This leads to the agents forming horizontal lines. In Figure 6 (right), red agents could only sense left and right, while blue agents could only sense up and down during training. The result is the formation of horizontal lines for the red agents and vertical lines for the blue agents. The overall score is higher than for Figure 6 (left), since the different orientation aids pattern formation.
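One simple way to truncate senses during training is to mask the sensory vector before it is stored in the experience database. The sketch below assumes, purely for illustration, that the eight Moore neighbours are encoded in a fixed order and that a hidden neighbour is replaced by 0; the encoding actually used in this study is not specified here.

```python
# Assumed fixed ordering of the eight Moore neighbours as (dx, dy) offsets.
MOORE_OFFSETS = [(-1, -1), (0, -1), (1, -1),
                 (-1, 0),           (1, 0),
                 (-1, 1),  (0, 1),  (1, 1)]

LEFT_RIGHT = {(-1, 0), (1, 0)}   # senses kept in Figure 6 (left)
UP_DOWN = {(0, -1), (0, 1)}      # senses of the blue agents in Figure 6 (right)

def truncate_senses(s_in, visible_offsets):
    """Zero out every neighbour reading whose offset is not visible."""
    return [value if offset in visible_offsets else 0
            for offset, value in zip(MOORE_OFFSETS, s_in)]

full_view = [1, -1, 1, 1, -1, 0, 1, -1]
horizontal_view = truncate_senses(full_view, LEFT_RIGHT)
vertical_view = truncate_senses(full_view, UP_DOWN)
```

A consequence of this particular masking choice is that a hidden neighbour looks the same as an unoccupied cell, so during training the network never sees any variation in the masked entries and cannot learn to use them.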

Discussion
The obtained results show that it is possible to use the framework to generate a model that produces very similar results to Schelling's original model. The most striking difference is that trained agents do not make perfect decisions all the time. Sometimes they perform actions that decrease their scores, which would not happen if they correctly classified all actions available to them. This is possible because the neural network is not trained until it has 100% accuracy in solving any classification problem, but under realistic conditions to produce realistic behaviour. This inaccuracy goes beyond the typical problem of generalisation in neural networks: Part of the information that would be needed for correct classification is simply not available to the agents. In the presented example, the average density of agents of a certain colour would be crucial to determine if moving away from a spot would most likely increase or decrease the current score, yet this density is not perceived by the agents and therefore has no influence on their decision.
To highlight the difference between a perfect decision (i.e., correct classifications throughout, as would be the goal of reinforcement learning) and a realistic decision, we truncate the sensory input that agents receive. As expected, agents are then unable to correctly judge many of the possible situations and behave differently. Again, the model provides realistic results without the need to manually define agent behaviour.
While the model was able to successfully reproduce Schelling's segregation model, there are also several possible ways to expand it in order to increase its scope. Currently, the agents only consider the direct effects of their action. This is sufficient for the used model, but other models would require the agents to think ahead several steps in order to behave realistically. To solve this problem, the model would need a small adaptation. Instead of judging the effect of their next action, they need to judge the effect of their next few actions. If the number of available actions or the number of steps becomes too large to evaluate all possible options, a Monte-Carlo approach using a Markov process [45,46] may be used, or even more sophisticated techniques from reinforcement learning, such as efficient exploration [47][48][49]. These techniques are able to filter out the most promising options and evaluate them carefully, while ignoring options that have a low chance of success, thus drastically reducing the sample space that needs to be considered in order to make a decision.
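As a rough illustration of judging the next few actions by sampling, the following sketch estimates the effect of a first action by averaging over random continuations. It is a plain Monte-Carlo rollout under stated assumptions, not one of the cited methods: `step` is a hypothetical transition function that returns the next state without modifying the current one, and `score` is the utility of a state.

```python
import random

def rollout_value(state, first_action, actions, step, score,
                  depth, n_rollouts, rng):
    """Average score change after `first_action` plus random follow-up actions."""
    start = score(state)
    total = 0.0
    for _ in range(n_rollouts):
        s = step(state, first_action)          # the action being judged
        for _ in range(depth - 1):             # random continuation
            s = step(s, rng.choice(actions))
        total += score(s) - start
    return total / n_rollouts

# Toy usage (hypothetical model): walk on a line, utility peaks at x = 3.
rng = random.Random(0)
toy_step = lambda x, a: x + a
toy_score = lambda x: -abs(x - 3)
value_right = rollout_value(0, +1, [-1, 1], toy_step, toy_score, 3, 200, rng)
value_left = rollout_value(0, -1, [-1, 1], toy_step, toy_score, 3, 200, rng)
```

Only `n_rollouts` sampled continuations are evaluated per first action, instead of all possible action sequences of length `depth`, which is what keeps the cost manageable when the number of actions or steps grows.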
Another difficulty that one could encounter when applying the framework to different systems is related to the experience phase. For the presented application, random decisions were sufficient to generate a satisfying training environment. However, this is not true for all systems. In many systems, the environment produced by realistic decisions is fundamentally different from the environment produced by random decisions, thus making it unsuitable for training the agents. In that case, it might be better to train the agents iteratively. After the first training, agents could enter a second experience phase in which they collect data in the same way as previously, but with decisions utilising the neural network instead of random decisions. This process could be repeated until convergence. However, since generating smart decisions is computationally more expensive than generating random decisions, such a process would drastically increase the time required for training.
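The iterative scheme described above could be organised as in the following sketch. All three callables are hypothetical placeholders: `collect(model)` runs an experience phase (with random decisions when `model` is `None`), `train(data)` returns a trained network, and `evaluate(model, data)` returns the rate of correct classifications. Using the change in this rate as the convergence test is an assumption; the text leaves the exact criterion open.

```python
def train_iteratively(collect, train, evaluate, max_rounds=10, tol=0.005):
    """Alternate experience collection and training until accuracy stalls."""
    model = None                       # None means: decide randomly
    history = []
    for _ in range(max_rounds):
        data = collect(model)          # experience phase under current behaviour
        model = train(data)            # retrain on the new experiences
        history.append(evaluate(model, data))
        if len(history) > 1 and abs(history[-1] - history[-2]) <= tol:
            break                      # classification rate no longer changes
    return model, history
```

The `max_rounds` cap reflects the cost argument made above: every additional round requires an experience phase in which each decision involves evaluating the neural network.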
Using the framework offers unique benefits. While rules are hard to define and even harder to justify, finding a way of scoring the success of an agent is straightforward most of the time, as is apparent for the example of modelling a game of chess. In addition, finding an appropriate training environment for the agents is easier than finding rules, mainly because such an environment is more visible in the real world than some underlying rules. Furthermore, in many cases, the training environment is completely identical to the application environment. However, using a different training environment offers new flexibility and ways of influencing agent behaviour, as was demonstrated by changing the agent density during the training phase. As long as the training environment is realistic, the resulting agent behaviour is also realistic. On the other hand, an unrealistic training environment will lead to unrealistic behaviour. Since an unrealistic training environment is easier to spot and identify than an unrealistic rule set, using the framework also reduces the risk of errors in model development, to which agent-based models are usually very susceptible [50][51][52].
Further steps in the development of the framework will be its application to many different systems where established models are available for comparison, in order to gain insight into its flexibility and improve it where necessary. After this step, application can be expanded to systems where there is as of yet no satisfying agent-based solution. Possible areas for such tests would be cooperation research [11,53,54], game theory [55][56][57] or sociology [58][59][60][61].
After the framework is sufficiently tested, so that we can ensure that it can be used for most systems suitable for agent-based modelling, it will be made available as open-source software, so that it can be used and modified by the whole research community and beyond. This would facilitate the diffusion of agent-based modelling and provide easy access to this methodology. The current implementation of the framework and the presented model is done in Python, thus all data are transferred directly. Future plans include a flexible interface, so that models programmed in a different language (e.g., NetLogo) can also benefit from the framework.

Conclusions
We identified the search for rules that govern the agent behaviour as the most challenging step in the development of agent-based models. Since the decision process for agents in most systems consists of picking the best alternative, given a finite set of choices, we propose to use a trained neural network to handle this decision as a classification problem. We introduced a generic framework, in which we do not need to define rules, but a score and a training environment for the agents. As a proof of principle, we use the framework to reproduce Schelling's segregation model. We found that, even without specifying rules, we can develop a model that produces the same results as Schelling's original segregation model, but with the added possibility to change the training environment, giving us more flexibility. Thus, this study substantiates the claim that it is possible to derive a generic framework for agent-based models using artificial neural networks. Nevertheless, more work needs to be done before the framework can be applied to an arbitrary system, which would be the final goal of developing this framework.
Funding: This research received no external funding.