ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics

Hayashida, Tomohiro; Sekizaki, Shinya; Furuya, Yushi; Nishizaki, Ichiro

doi:10.3390/appliedmath5030088

Open AccessArticle

ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics

by

Tomohiro Hayashida

^*,†

,

Shinya Sekizaki

,

Yushi Furuya

^† and

Ichiro Nishizaki

Graduate School of Advanced Science and Engineering, Hiroshima University (Higashihiroshima Campus), 1-4-1, Kagamiyama, Higashihiroshima 739-8527, Hiroshima, Japan

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

AppliedMath 2025, 5(3), 88; https://doi.org/10.3390/appliedmath5030088

Submission received: 26 May 2025 / Revised: 21 June 2025 / Accepted: 25 June 2025 / Published: 9 July 2025

Download

Browse Figures

Versions Notes

Abstract

Pedestrian flow simulations play a pivotal role in urban planning, transportation engineering, and disaster response by enabling the detailed analysis of crowd dynamics and walking behavior. While physical models such as the Social Force model and Boids have been widely used, they often struggle to replicate complex inter-agent interactions. On the other hand, reinforcement learning (RL) methods, although adaptive, suffer from limited interpretability due to their opaque policy structures. To address these limitations, this study proposes a pedestrian simulation framework based on the Anticipatory Classifier System 2 (ACS2), a rule-based evolutionary learning model capable of extracting explicit behavior rules through trial-and-error learning. The proposed model captures the interactions between agents and environmental features while preserving the interpretability of the acquired strategies. Simulation experiments demonstrate that the ACS2-based agents reproduce realistic pedestrian dynamics and achieve comparable adaptability to conventional reinforcement learning approaches such as tabular Q-learning. Moreover, the extracted behavior rules enable systematic analysis of movement patterns, including the effects of obstacles and crowd composition on flow efficiency and group alignment. The results suggest that the ACS2 provides a promising approach to constructing interpretable multi-agent simulations for real-world pedestrian environments.

Keywords:

pedestrian simulation; classifier system; Anticipatory Classifier System (ACS2); multi-agent systems; interpretable machine learning

1. Introduction

Pedestrian flow simulation has become an indispensable analytical framework across multiple disciplines, including urban design, transportation systems engineering, architecture, crowd safety, and emergency evacuation planning. As cities grow denser and public spaces become more dynamic and multifunctional, the ability to understand and predict pedestrian movement is critical to ensuring not only the efficiency of infrastructure but also the safety and comfort of individuals in shared environments. From large-scale transit terminals and commercial complexes to narrow corridors and emergency exits, modeling pedestrian behavior allows stakeholders to anticipate congestion, improve spatial layouts, and design policies that promote smooth and safe pedestrian flows.

The traditional modeling techniques for pedestrian dynamics generally fall into two main categories: macroscopic and microscopic models. Macroscopic models, such as the Lighthill–Whitham–Richards (LWR) model [1], treat pedestrian flows as continuous media, borrowing heavily from hydrodynamic analogies. These models are computationally efficient and suitable for large-scale simulation, but they lack the ability to reflect the discrete, intentional, and often unpredictable behavior of individuals in crowded environments. As a result, while macroscopic models are useful for estimating overall flow patterns and densities, they fall short in scenarios where localized interactions and decision-making play critical roles.

Microscopic models aim to fill this gap by representing each pedestrian as an individual agent with unique decision-making capabilities. Among the most established of these are the Social Force model [2] and the Boids model [3]. The Social Force model expresses pedestrian dynamics through attractive and repulsive forces, capturing phenomena such as lane formation and bottleneck congestion. Boids models, originally designed to simulate flocking in animals, emphasize local rule-based interactions among agents and have been adapted for human crowd simulation. While these models provide more realistic and granular behavior than macroscopic models, they still rely heavily on manually tuned parameters and predefined interaction rules, limiting their ability to adapt to diverse or unforeseen scenarios.

In recent years, advancements in artificial intelligence and data-driven modeling have paved the way for learning-based simulation techniques. Reinforcement learning (RL), in particular, has been applied to agent-based pedestrian models to facilitate adaptive decision-making in complex environments [4]. These agents can learn optimal policies by interacting with the environment, receiving feedback through rewards or penalties. While these RL models have achieved commendable performance in various tasks, they often suffer from a lack of interpretability due to the opaque nature of neural policies and the large number of parameters involved. This “black-box” character complicates model validation and restricts their use in safety-critical or policy-sensitive applications.

To address the trade-off between adaptability and transparency, this study proposes a novel approach using the Anticipatory Classifier System 2 (ACS2) [5,6], a type of learning classifier system that integrates symbolic rule representation with reinforcement learning principles. Originally inspired by cognitive systems theory [7], the ACS2 extends traditional classifier systems [8,9] by incorporating an “effect” component that encodes expected environmental changes, thereby allowing agents to form predictive models of their environment. Through evolutionary mechanisms such as genetic algorithms and covering strategies, the rule population evolves over time, enabling agents to adapt to complex tasks while maintaining a set of interpretable behavior rules.

One of the central motivations of this study is the interpretability of the resulting behavior. While black-box models such as deep reinforcement learning can outperform traditional algorithms in many cases, their internal workings remain opaque, often making them unsuitable for domains that require transparency, such as public policy, urban safety planning, or collaborative robotic systems. In contrast, the if–then rules derived through ACS2 learning can be directly examined, allowing human researchers and practitioners to understand, verify, and even manually refine agent behavior. This opens the door for more trustworthy and verifiable pedestrian simulation frameworks, especially in high-stakes applications.

The design of our pedestrian simulation environment incorporates realistic elements, including static obstacles, moving agents, and multiple destinations. Agent perception is modeled through sector-based visual fields, and the state space includes information about surrounding pedestrians and obstacles. A discrete set of actions, including movement in various directions and stopping, allows agents to choose from a limited but sufficient set of behavioral responses. Rewards are provided based on goals such as reaching destinations, avoiding collisions, and minimizing stopping behavior. These components collectively define a dynamic and responsive environment suitable for learning-based navigation.

To validate our approach, we conduct multiple simulation experiments designed to assess the behavior of ACS2-based agents in realistic scenarios. These include bottleneck navigation under various density conditions [10], escape dynamics similar to panic evacuation [11], and the emergence of cooperative patterns in bidirectional pedestrian flows [12,13]. Performance metrics include throughput rates, average travel time, conflict frequency, and convergence of behavior. The outcomes demonstrate that ACS2 agents are capable of acquiring socially plausible and efficient navigation strategies under diverse environmental constraints.

Additionally, we compare the ACS2 with conventional reinforcement learning models, such as tabular Q-learning. While both approaches are capable of learning effective behavior in the pedestrian domain, the ACS2 shows a distinct advantage in terms of rule compactness and interpretability. Moreover, the ACS2’s evolutionary learning component promotes generalization, allowing agents to transfer their knowledge across scenarios with minimal retraining [14,15]. These properties make the ACS2 a promising candidate for scalable and human-understandable agent modeling in pedestrian simulation and beyond.

This paper is organized as follows. Section 2 discusses the related work in pedestrian simulation, reinforcement learning, and classifier systems, establishing the theoretical foundation for our study [16,17,18]. Section 3 describes the ACS2-based simulation framework, including the agent architecture, state representation, action space, reward formulation, and learning mechanisms. Section 4 presents a comprehensive evaluation of the simulation experiments and provides insightful comparisons with the baseline models. Finally, Section 5 summarizes the findings and outlines potential directions for future research.

2. Related Work

This section reviews prior studies on pedestrian simulation and classifier systems, which provide the theoretical foundation for our proposed model. We first introduce classical and modern approaches to pedestrian modeling, including macroscopic and microscopic techniques [1,2,3,10]. We then discuss learning classifier systems with a focus on the Anticipatory Classifier System 2 (ACS2), which serves as the basis for our simulation framework [5,6,8,9].

2.1. Pedestrian Simulation Models

Pedestrian simulation has become a cornerstone of research in urban informatics, transportation engineering, and safety science [19,20]. By modeling the movement of individuals or groups within built environments, researchers can explore a wide range of phenomena, including congestion dynamics, evacuation efficiency, and lane formation [11,12]. Pedestrian modeling techniques are generally classified into two major types: macroscopic models and microscopic models. Macroscopic models describe pedestrian flow in aggregate, often drawing from fluid dynamics analogies [1], while microscopic models represent each individual as an autonomous agent interacting with others and the environment [2,3]. This section provides an overview of both modeling approaches, highlighting their assumptions, applications, and limitations in the context of realistic crowd dynamics.

2.1.1. Agent-Based Approaches for Pedestrian Dynamics

Microscopic modeling techniques treat each pedestrian as an autonomous agent, capable of perceiving its surroundings and making decisions based on local information. These agent-based models (ABMs) have been widely employed to simulate crowd dynamics at high fidelity, especially in scenarios involving complex spatial constraints or diverse individual behaviors.

One of the most influential models in this category is the Social Force model proposed by Helbing and Molnár [2]. In this model, pedestrians are subject to a combination of attractive forces (towards their goals) and repulsive forces (from other pedestrians and obstacles), resulting in emergent behaviors such as lane formation in counterflows and clogging at bottlenecks. Although the model is simple and interpretable, it relies on predefined interaction rules and parameter settings that may require fine-tuning for different scenarios.

Another notable example is the Boids model introduced by Reynolds [3], originally designed to simulate flocking behavior in animals. The Boids framework defines three simple local rules: separation, alignment, and cohesion. These principles have been adapted to pedestrian simulation, where agents maintain personal space, align direction with nearby individuals, and move cohesively as a group. Boids-based models are particularly useful in capturing emergent collective behavior with minimal assumptions.

Despite their simplicity and effectiveness, the traditional agent-based models still suffer from limitations. Most notably, they lack adaptability and learning capability. Once their parameters are set, agents behave deterministically or stochastically based on fixed rules. This makes it difficult to simulate realistic learning behaviors or policy adjustments in response to changing environments. To overcome this, learning-based agent models, such as those incorporating reinforcement learning or evolutionary strategies, have recently been proposed [4,15].

These learning-based approaches enhance the flexibility of agent decision-making, allowing agents to acquire new behavior policies through interaction with the environment. However, many of them—especially those based on deep reinforcement learning—lack transparency, making it difficult to interpret or validate their learned policies. This motivates the exploration of hybrid models that combine adaptability with interpretability, as discussed in the subsequent sections.

Recent theoretical analyses and advancements in Q-learning, such as those presented in [21,22], have rigorously examined its convergence properties and effectiveness. These studies provide important theoretical insights into the traditional RL paradigm. In contrast, our work leverages the ACS2’s predictive and interpretable nature, thus complementing these recent findings by demonstrating an alternative approach with clear advantages in interpretability and predictive rule evaluation.

2.1.2. Empirical Data and Its Integration into Simulations

Empirical data plays a crucial role in the development and validation of pedestrian simulation models. While theoretical models and learning-based approaches can capture essential behavioral mechanisms, they often rely on assumptions that need to be grounded in real-world observations. Empirical studies provide insights into pedestrian density, velocity, spatial distribution, and collective behaviors, enabling model calibration and evaluation under realistic conditions [19,20].

One of the most fundamental forms of empirical data in pedestrian dynamics is the relationship between pedestrian density and walking speed. Originally studied in the context of vehicular traffic [20], this relationship was later adapted to pedestrian contexts and formalized in various fundamental diagrams [19]. These diagrams typically show that, as density increases, walking speed decreases nonlinearly, and flow efficiency reaches a peak before rapidly declining under congested conditions. Such relationships are essential for calibrating simulation models and are widely used in urban planning, evacuation design, and safety assessment.

Empirical data can be collected through controlled laboratory experiments, video tracking in public spaces, or wearable sensors. For example, Liu et al. [10] conducted controlled bottleneck experiments with bidirectional pedestrian flows, recording trajectory data to quantify passing behavior and congestion patterns. These data sets are valuable for validating simulation models, especially in scenarios involving physical constraints, limited space, or group heterogeneity.

Integrating empirical data into simulations involves both qualitative and quantitative techniques. Quantitatively, model parameters can be adjusted to fit observed metrics, such as flow rate, throughput, or speed distributions. Qualitatively, simulated behaviors such as lane formation or arching near exits can be visually compared with empirical footage. This calibration process is essential to ensure that models generalize well across environments and yield reliable predictions in practical applications. Without empirical grounding, models may overfit to specific scenarios or fail to capture critical crowd phenomena, reducing their utility in real-world decision-making.

In this study, we utilize empirical findings to evaluate the realism of behaviors generated by ACS2-based agents. For instance, we compare the simulated density–velocity relationships and bottleneck flow characteristics against published experimental results [10,11]. This empirical grounding not only validates our approach but also demonstrates its potential for interpreting emergent crowd behaviors through an interpretable learning-based framework.

2.2. Classifier Systems and the ACS2 Framework

Learning classifier systems (LCSs) are a family of rule-based machine learning algorithms that combine reinforcement learning with evolutionary computation. Originally introduced by Holland [16], LCSs are designed to discover sets of if–then rules that enable agents to adapt to changing environments through trial-and-error interactions. These systems have been successfully applied in various domains, including robotics, adaptive control, and knowledge discovery [15,23].

This section provides an overview of classifier system architectures, with a particular focus on the Anticipatory Classifier System 2 (ACS2) [5], which serves as the core mechanism of the agent model proposed in this study. We begin by outlining the foundational structure and learning principles of conventional classifier systems and then explain the extensions introduced by the ACS2 to incorporate environmental prediction and cognitive modeling.

2.2.1. Fundamentals of Learning Classifier Systems

Learning classifier systems (LCSs) are a class of adaptive systems that learn and evolve sets of condition–action rules—often expressed as if–then rules—through interaction with their environment. First introduced by John Holland [16,17], LCSs combine principles of reinforcement learning and evolutionary computation to generate and refine rule populations capable of solving complex decision-making tasks.

An LCS typically consists of four main components: a rule population (also called a classifier set), a reinforcement learning mechanism, a genetic algorithm (GA), and an environmental interface. When an input state is received from the environment, the system matches it against rule conditions to form a match set. An action is selected from this set and executed, leading to a reward or penalty. Based on the received feedback, the fitness of the rules involved in the decision is updated. Over time, fit rules are favored during reproduction via the genetic algorithm, while less useful rules are eliminated or mutated. This evolutionary learning process enables the system to refine its rule base adaptively and autonomously.

Several notable variants of LCSs have been developed, each emphasizing different aspects of the learning process. The ZCS (Zeroth-Level Classifier System) [8] was one of the earliest practical implementations and used a simple bucket-brigade algorithm for credit assignment. Later, the XCS (eXtended Classifier System) [9] introduced the idea of accuracy-based fitness, focusing on prediction accuracy rather than mere reward accumulation. The XCS became a widely adopted standard in LCS research and laid the foundation for subsequent extensions, including systems capable of model-building, memory-based processing, and anticipatory learning mechanisms.

2.2.2. Anticipatory Classifier System 2 (ACS2)

The Anticipatory Classifier System 2 (ACS2) is an extension of traditional learning classifier systems that incorporates environmental prediction into the rule structure. Unlike the XCS, which evaluates if–then rules exclusively based on reward outcomes obtained after actions, the ACS2 evaluates rules based on both the accuracy of predicted environmental changes (effects) and the rewards. This predictive capability is particularly beneficial in scenarios like pedestrian simulations, where immediate reward feedback is limited, and the effectiveness of rules is primarily evaluated at the end of an episode. Building on the foundations of the XCS, the ACS2 enhances the representational power of rules by introducing an additional component—known as the effect part—that encodes the expected outcome of executing a rule in a given state [5,6]. This allows agents not only to associate actions with conditions and rewards but also to form internal models of environmental dynamics, improving both foresight and adaptability.

Each rule in the ACS2 consists of three components: a condition, an action, and an effect. The condition specifies the set of states in which the rule can apply, the action denotes the behavior to execute, and the effect predicts the resulting state after the action is taken. During learning, the system observes actual state transitions and compares them with predicted effects. Rules that accurately predict the environmental response are reinforced, while those that fail are penalized. This anticipatory learning process enables the agent to refine not only its action policy but also its internal model of the world.

The ACS2 employs a genetic algorithm to maintain diversity and discover novel rules. Reproduction is biased toward rules with high prediction accuracy and environmental consistency. Moreover, when no suitable rule exists for a given state–action pair, a covering mechanism generates new candidate rules that incorporate the observed transition as an initial effect. This integration of reinforcement and supervised-like learning allows the ACS2 to handle both reactive and model-based decision-making within a unified framework. A key theoretical advantage of the ACS2 over conventional RL methods, such as Q-learning, lies in its predictive evaluation mechanism: rules are maintained and updated not only based on immediate rewards but also on their accuracy in predicting future environmental states. This predictive feature enables ACS2 agents to adapt effectively, especially in environments with sparse and delayed rewards.

One of the major advantages of the ACS2 is its interpretability. Since each rule is explicit and symbolic, human analysts can directly inspect the agent’s learned behavior, unlike in black-box models such as deep neural networks. In this study, we leverage the ACS2 to design pedestrian agents capable of learning navigational strategies while retaining rule-level transparency. This supports post hoc analysis of learned behaviors and offers insights into emergent crowd phenomena, such as path optimization, conflict avoidance, and self-organization.

3. Simulation Framework

To evaluate the applicability of the Anticipatory Classifier System 2 (ACS2) to pedestrian flow modeling, we designed a simulation framework in which agents interact with a dynamic environment through perceptual input, rule-based decision-making, and reinforcement-guided learning. This section describes the architecture of ACS2-powered pedestrian agents, the state representation derived from the environment, the available action space, and the reward structure that drives learning.

3.1. Agent Architecture and Visual Perception

Each pedestrian agent in the simulation is implemented as an autonomous decision-making entity equipped with a sector-based field of view and a rule-based controller. Inspired by human visual perception, the field of view is divided into nine discrete sectors, spanning ±60° from the agent’s facing direction. Each sector contains information on whether it is occupied by an obstacle, another agent, or represents a valid destination cell. This discretized sensory input forms the condition part of the ACS2 rules and serves as the agent’s perception of the environment at each time step. Figure 1 illustrates the perceptual field of a pedestrian agent. The visual field spans 120 degrees, divided into nine sectors covering the forward direction and peripheral views. Each sector is encoded with a 2-bit value representing the type of object detected—whether it is empty, an obstacle, another agent, or a goal cell. This structured representation enables compact and interpretable rule matching in the ACS2 framework.

The sensory input from the nine sectors is encoded into a binary string, which constitutes the environmental state as perceived by the agent. Specifically, each sector is assigned a 2-bit code indicating its contents: 00 for empty, 01 for obstacle, 10 for another agent, and 11 for a goal-related cell. The full state representation thus becomes an 18-bit string that succinctly captures the spatial configuration around the agent. This binary encoding enables efficient rule matching and classification within the ACS2 framework as rules are defined over condition strings of identical length.

Each agent selects its action from a discrete set of seven possible movement directions at every time step. These include moving forward, turning ±10° or ±60° relative to the current heading, stopping in place, or moving directly toward the goal if it is within the field of view. This action set is designed to balance navigational flexibility with computational tractability, enabling agents to react adaptively to dynamic surroundings while maintaining a manageable policy space. The selected action is executed only if the target cell is not occupied by an obstacle or another agent; otherwise, the agent is forced to stop, receiving a penalty in the reward function.

To guide the learning process, agents receive scalar rewards based on their interactions with the environment. A positive reward is granted when an agent successfully reaches its goal, whereas penalties are applied in the case of collisions, invalid moves, or unnecessary stopping. More specifically, a high positive reward (e.g., +100) is assigned upon goal arrival, a moderate negative reward (e.g., −10) is given for collisions or blocked movements, and a small penalty (e.g., −1) is applied when the agent chooses to stop without necessity. This reward structure encourages agents to navigate efficiently and safely, balancing goal-directed movement with collision avoidance.

Learning in the ACS2 is driven by a combination of reinforcement signals and anticipatory predictions. When an agent selects and executes an action, the ACS2 evaluates the corresponding rule based on the reward received and the accuracy of its predicted environmental outcome. Rules that produce both effective actions and accurate predictions are reinforced and selected for reproduction through a genetic algorithm. Conversely, poorly performing rules are penalized or replaced. In cases where no matching rule exists for the current situation, a covering mechanism generates a new rule based on the observed condition, action, and resulting effect. This dynamic rule evolution enables agents to construct compact and interpretable policy sets that adapt to diverse scenarios over time.

3.2. Action Set and Movement Dynamics

The action set available to each agent is designed to balance simplicity, navigational realism, and learning efficiency. At every time step, the agent selects one of seven possible actions: move forward, turn ±10° or ±60° relative to its current heading, stop, or move directly toward the goal direction if the goal is visible. Each action corresponds to a transition to an adjacent cell in a discretized grid environment. The resulting movement is subject to environmental constraints: agents cannot move into cells occupied by obstacles or other agents. If the selected move is invalid, the agent remains in place and incurs a penalty, as defined by the reward structure described in Section 3.1.

The grid-based environment assumes that agents move at a uniform speed of one cell per time step, provided the selected action is valid. The direction of movement is updated according to the agent’s current heading and the relative angle associated with the chosen action. In the case of goal-directed movement, the agent computes the angular offset between its current heading and the goal direction and selects the action that most closely aligns with that vector. After each movement decision, the agent’s heading is updated to reflect the direction of the most recent action, which affects its future perception and decision-making. This orientation-aware mechanism allows agents to exhibit more realistic and consistent motion trajectories over time.

To prevent overlapping and simulate realistic crowd behavior, the model enforces collision avoidance through a simple but effective mechanism. When two or more agents attempt to move into the same target cell simultaneously, a conflict resolution rule is applied: all conflicting agents are prevented from moving and instead remain in their current positions. This approach models the hesitation and re-planning often observed in real pedestrian interactions. Additionally, agents maintain a soft repulsion effect by assigning higher penalties for selecting actions that bring them into close proximity with others, even if no direct collision occurs. These mechanisms contribute to smoother flow dynamics and more natural lane formation during dense crowd scenarios.

4. Simulation Experiments

To evaluate the effectiveness and validity of the proposed ACS2-based pedestrian simulation framework, we conducted a series of experiments under diverse crowd scenarios. The simulations were conducted using custom-developed Python code (Python 3.11 with Numpy 1.23.5, pandas 1.5.3, matplotlib 3.7.0, and other standard numerical libraries), incorporating standard numerical libraries such as NumPy. The simulation explicitly implemented the ACS2 algorithm as detailed earlier in the manuscript. The objectives of these experiments are threefold: (1) to assess whether the learned behavior of ACS2 agents aligns with empirical pedestrian flow data; (2) to compare the performance and interpretability of the ACS2 against conventional reinforcement learning models such as tabular Q-learning; and (3) to analyze the emergent crowd dynamics under different environmental configurations, including obstacle placements and flow ratios. This section describes the experimental settings, performance metrics, and evaluation results in detail. (Note: Due to copyright and intellectual property constraints, we are unable to publicly share the simulation source code. However, the ACS2 algorithm and experimental procedures are comprehensively detailed in the manuscript, ensuring reproducibility. Additional clarifications can be provided upon request.)

5. Experiments

To evaluate the effectiveness and validity of the proposed ACS2-based pedestrian simulation framework, we conducted a series of experiments under diverse crowd scenarios.

Table 1 summarizes the key parameters used in the simulation. These include environmental configurations such as agent speed and field of view, as well as ACS2-specific learning and evolutionary parameters. The reward structure reflects incentives for goal-reaching and penalties for collisions and unnecessary idling.

5.1. Validation with Empirical Pedestrian Flow

The first set of experiments focuses on validating whether ACS2-powered agents can reproduce pedestrian flow patterns observed in empirical studies. We use data from controlled bottleneck experiments reported by Liu et al. [10], which provide measurements of pedestrian density, velocity, and throughput under various inflow conditions. The simulation environment is configured to mimic these settings, including corridor width, entrance size, and agent arrival rates.

The simulation environment, illustrated in Figure 2, models a corridor with a fixed width of 3 m and a variable agent inflow rate. The bottleneck is represented by a narrowed passage 1 m wide, located at the corridor exit. Agents are generated at the entrance with randomized arrival intervals following a Poisson distribution, replicating inflow variability. Each agent is assigned a random destination at the far end of the corridor, and its behavior is governed by ACS2-learned rules. The simulation runs for 300 time steps per episode, and results are averaged over 20 episodes to ensure statistical robustness. Evaluation metrics include the density–velocity relationship, the number of agents passing through the bottleneck per unit time, and the spatial distribution of congestion.

To assess the realism of the learned pedestrian behavior, we compare the simulation results with the empirical density–velocity relationship. A designated measurement area is placed in front of the bottleneck, where pedestrian density and average velocity are recorded. Throughput is also computed by tracking the number of agents passing through the bottleneck per unit time. These metrics jointly assess the plausibility and generalizability of the learned behavior under realistic flow conditions.

Figure 3 shows the relationship between pedestrian density and average velocity, measured within the designated area. The ACS2-based simulation reproduces the characteristic inverse correlation observed in empirical studies: as density increases, average velocity declines, with a critical threshold near 1.8 persons/m², beyond which flow efficiency drops sharply. This pattern aligns well with the experimental data reported by Liu [10]. In this figure, the horizontal axis ρ denotes pedestrian density [1/m²], and the vertical axis v represents the average walking speed [m/s] measured within the designated area.

Liu [10] report a similar trend, particularly beyond 3.0 persons/m², where velocity drops significantly. Our results exhibit comparable behavior under different inflow conditions (labeled as 1.6-1, 1.6-2, and 1.6-3), with the transition from free to congested flow clearly emerging near 2.0 persons/m².

5.2. Comparison with Reinforcement Learning (Q-Learning)

To further evaluate the performance and interpretability of the proposed ACS2-based model, we compared it with a conventional reinforcement learning approach—namely tabular Q-learning. Q-learning is a widely used algorithm for learning optimal policies through trial-and-error interactions with the environment. In our implementation, agents maintain a Q-table indexed by discretized state–action pairs and update the values using the standard Bellman equation with ϵ-greedy exploration. The state representation mirrors the 18-bit perception vector used in the ACS2 setting, and the action set is identical. Both models are trained in the same environment under equivalent simulation conditions, allowing for a fair comparison in terms of convergence behavior, throughput efficiency, and policy complexity.

The simulation environment used for this comparison is illustrated in Figure 4. Unlike the bottleneck scenario described in Section 5.1, this setup features a T-shaped corridor layout with two symmetric entry points and a shared central zone. This configuration is designed to induce moderate agent interference and decision-making complexity while maintaining a balanced flow across both learning models.

Figure 5 compares the learning performance of the ACS2 and tabular Q-learning in terms of average episode reward over training iterations. Both models show an upward trend in reward, indicating successful learning; however, the convergence behavior differs significantly. The ACS2 model exhibits smoother and more stable convergence, with less variance across episodes. In contrast, tabular Q-learning shows oscillatory behavior during early training and requires more iterations to reach a comparable level of performance. These results suggest that the ACS2 is more sample-efficient and robust in sparse-reward multi-agent environments like pedestrian flow scenarios.

From Figure 5, we observe that both the ACS2 and tabular Q-learning achieve similar levels of average episode reward over training iterations, indicating no significant difference in overall performance. However, their internal policy representations differ substantially in terms of structural compactness and interpretability. In the ACS2 model, knowledge is encoded as a set of condition–action–effect rules, which can generalize across similar perceptual inputs. As a result, the final rule population typically contains fewer than 500 unique rules, even after extended training. In contrast, tabular Q-learning stores a value for each discretized state–action pair, leading to a significantly larger table. Given the 18-bit state encoding and seven possible actions, the full Q-table contains over 180,000 entries, most of which remain unused or sparsely updated due to the combinatorial explosion of the state space. Although our experimental setup allowed a maximum of 300,000 rules for both the ACS2 and Q-learning, most experiments reached this upper bound. However, we did not record specific quantitative data on the number or simplicity of generated rules during experiments, which represents a limitation in quantifying interpretability. Nonetheless, the modular nature of ACS2 rules still qualitatively enhances human interpretability compared to Q-learning’s large numerical tables.

Despite these strengths, the ACS2 may face limitations in extremely diverse environments, where the number of rules required for comprehensive coverage can grow significantly, potentially affecting scalability and learning efficiency. This growth in the rule population could slow down learning processes and complicate rule management in highly heterogeneous scenarios. Therefore, careful tuning of the genetic algorithm parameters and periodic rule pruning might be necessary to maintain efficiency and effectiveness in practical applications.

5.3. Analysis of Crowd Dynamics and Rule-Based Behavior

While the previous sections focused on performance metrics and learning stability, we now turn to a qualitative analysis of agent behavior and crowd-level dynamics emerging from ACS2-based learning. Since the ACS2 produces human-readable rules that map perceptual inputs to actions, we can inspect these rules to uncover typical behavioral patterns, such as lane formation, collision avoidance, and turn-taking at intersections. Furthermore, we analyze spatial flow patterns and congestion formation over time to examine how local rule execution leads to global crowd phenomena. This analysis provides insight into the interpretability and emergent coordination properties of the learned system.

To investigate the influence of environmental design on pedestrian flow, we adopt a simulation environment in which a central obstacle is placed, as illustrated in Figure 6. This layout is motivated by urban design considerations, aiming to assess how such obstacles affect collision avoidance behavior and overall flow efficiency. The simulation runs for 800 episodes per trial, with 10 independent trials conducted for statistical validity. The first 700 episodes are used for learning, and the remaining 100 episodes are used for evaluation. We also analyze the acquired ACS2 rules to evaluate how learned behaviors contribute to crowd-level coordination and improved flow.

To quantitatively evaluate the emerging coordination and behavioral convergence of agents, we introduce three metrics: the separation rate, information entropy, and action selection probability. The separation rate measures the degree to which pedestrian groups are spatially divided when passing around the central obstacle. It is defined as

Separation Rate = \frac{| L_{y} - L_{g} | + | R_{g} - R_{y} |}{L_{y} + L_{g} + R_{y} + R_{g}},

(1)

where

L_{y}

and

L_{g}

represent the number of agents passing on the left side from the lower and upper groups, respectively, and

R_{y}

and

R_{g}

represent those passing on the right side. Information entropy quantifies behavioral diversity, capturing the uncertainty in pedestrian decision-making. High entropy indicates more varied and less predictable movements, while low entropy reflects convergence towards stable coordinated behaviors. To assess behavioral diversity, we calculate the information entropy H, which captures how uniformly actions are distributed under observed states:

H = - \sum_{s \in S} p (s) \sum_{a \in A} p (a | s) log p (a | s),

(2)

where S is the set of observed states, A is the action set,

p (s)

is the empirical frequency of state s, and

p (a | s)

is the probability of choosing action a in state s. This selection probability is computed based on the fitness of ACS2 classifiers as

p (a | s) = \frac{\sum_{i \in C (s, a)} f_{i}}{\sum_{i \in C (s)} f_{i}},

(3)

where

C (s)

is the set of classifiers matching state s,

C (s, a)

is the subset proposing action a, and

f_{i}

is the fitness of classifier i. These metrics jointly allow us to analyze how local rule adaptations scale up to collective order in pedestrian flows.

To assess the emergence of coordinated behavior between pedestrian groups, we analyze the entropy of action distributions for agents originating from the upper and lower sides of the corridor. In this study, we define the convergence of collective behavior as the point at which the entropy for both groups drops below a predefined threshold—specifically the maximum entropy value during the evaluation period—continuously for 20 episodes. The number of episodes until this convergence is reached is used as the time to consensus formation.

Figure 7 shows the maximum flow rate and the separation rate observed under varying obstacle widths w. Figure 8 depicts the number of episodes required to achieve consensus under the same conditions. These results reveal a clear relationship between obstacle width and emergent flow efficiency. To derive the summary metric in Figure 8, we computed information entropy (Equation (2) and action selection probability Equation (3)) to quantitatively evaluate pedestrian behavioral diversity and convergence. Due to the extensive volume of numerical data, individual values are not reported as they offer limited additional insight.

From Figure 7 and Figure 8, we observe that reduced consensus time and high separation rates contribute to increased maximum flow rate. When obstacle width is zero (i.e., no obstacle is present), pedestrians naturally separate according to their direction of travel, which results in a high flow rate. However, the time required to reach consensus in this condition is longer since the wide corridor allows agents to choose diverse paths freely, delaying the emergence of coordinated flow. When the obstacle width ranges from 0.2 to 0.6 m, the consensus time remains long, while the flow rate tends to decrease. This is because, although the obstacle exists, the passage is still wide enough for divergent path choices, resulting in dispersed agent movement and delayed coordination. In contrast, when the width is between 1.2 and 1.4 m, both the maximum flow rate and separation rate significantly increase, and the time to consensus shortens. A narrower passage restricts directional options and promotes group alignment, leading to more efficient movement. Notably, a width of 1.4 m yields the best performance across all three metrics. For widths above 1.6 m, both flow rate and separation rate decline again, and consensus takes longer to achieve. This degradation is attributed to the difficulty of self-organizing in excessively narrow paths, which increases agent interference and delays agreement on travel direction.

These findings suggest that inserting a central obstacle can indeed improve pedestrian flow, but the effect is highly dependent on the obstacle’s width. An appropriately sized obstacle facilitates alignment and separation, promoting smoother movement. Conversely, poorly sized obstacles can delay consensus formation and reduce efficiency. Therefore, in urban design, careful adjustment of obstacle width can lead to improved pedestrian throughput and spatial organization.

6. Conclusions

This study proposed a rule-based pedestrian simulation framework using the Anticipatory Classifier System 2 (ACS2), aiming to model emergent behavior in crowd dynamics through interpretable learning-based agent policies. Each pedestrian agent perceives its local environment through a sector-based visual field and selects actions based on compact condition–action–effect rules evolved via reinforcement learning and genetic algorithms.

Through simulation experiments, we validated the plausibility of learned behaviors under bottleneck scenarios, demonstrating that ACS2 agents can reproduce realistic density–velocity relationships and adapt to variable inflow conditions. A comparative analysis with tabular Q-learning revealed that, while both methods achieved similar performance in terms of cumulative reward, the ACS2 offered notable advantages in rule compactness, learning stability, and interpretability.

Further experiments in environments containing a central obstacle showed that the ACS2 agents developed coordinated avoidance and alignment strategies. Metrics such as separation rate, information entropy, and consensus time revealed that appropriately sized obstacles promote faster convergence to structured movement and improve the overall flow efficiency. In particular, obstacle widths around 1.4 m yielded optimal performance across all the evaluation metrics.

Overall, this research highlights the potential of rule-based learning approaches like the ACS2 in modeling complex social behaviors in pedestrian dynamics. The interpretability of evolved rules provides valuable insight for analyzing collective decision-making processes, and the framework can inform future work in urban design and adaptive crowd management systems. Future research will explore practical applications, such as evacuation simulations during emergencies or crowd management at large public events, leveraging the interpretability and adaptability of ACS2-generated behavior rules.

Author Contributions

Conceptualization, T.H.; methodology, T.H. and Y.F.; software, Y.F.; validation, T.H., S.S., Y.F. and I.N.; formal analysis, T.H. and Y.F.; investigation, T.H. and I.N.; resources, T.H. and I.N.; data curation, Y.F.; writing—original draft preparation, T.H.; writing—review and editing, T.H.; visualization, T.H. and Y.F.; supervision, I.N.; project administration, T.H.; funding acquisition, T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lighthill, M.J.; Whitham, G.B. On kinematic waves. I. Flood movement in long rivers. II. A theory of traffic flow on long crowded roads. Proc. R. Soc. A Math. Phys. Eng. Sci. 1955, 229, 281–345. [Google Scholar]
Helbing, D.; Molnár, P. Social force model for pedestrian dynamics. Phys. Rev. E 1995, 51, 4282–4286. [Google Scholar]
Reynolds, C.W. Flocks, herds and schools: A distributed behavioral model. ACM SIGGRAPH Comput. Graph. 1987, 21, 25–34. [Google Scholar]
Xu, D.; Huang, X.; Mango, J.; Li, X.; Li, Z. Simulating multi-exit evacuation using deep reinforcement learning. Trans. GIS 2021, 25, 1542–1564. [Google Scholar]
Butz, M.V.; Stolzmann, W. An algorithmic description of ACS2. Lect. Notes Comput. Sci. LNCS 2002, 2321, 361–390. [Google Scholar]
Stolzmann, W. Anticipatory classifier systems. In Genetic Programming 1998, Proceedings of the Third Annual Conference, Madison, WI, USA, 22–25 July 1998; University of Wisconsin: Madison, WI, USA, 1998; pp. 658–664. [Google Scholar]
Holland, J.H.; Reitman, J.S. Cognitive systems based on adaptive algorithms. ACM SIGART Bull. 1977, 63, 49. [Google Scholar]
Wilson, S.W. ZCS: A zeroth-level classifier system. Evol. Comput. 1994, 2, 1–18. [Google Scholar]
Wilson, S.W. Classifier fitness based on accuracy. Evol. Comput. 1995, 3, 149–175. [Google Scholar]
Liu, X.; Song, W.; Lv, W. Empirical data for pedestrian counterflow through bottlenecks in the channel. Transp. Res. Procedia 2014, 2, 26–34. [Google Scholar]
Helbing, D.; Farkas, I.J.; Vicsek, T. Simulating dynamical features of escape panic. Nature 2001, 407, 487–490. [Google Scholar]
Schadschneider, A.; Seyfried, A.; Kretz, T.; Seyfried, A.; Rogsch, C.; Klüpfel, H. Evacuation dynamics: Empirical results, modelling and applications. In Encyclopedia of Complexity and Systems Science; Meyers, R.A., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 3142–3176. [Google Scholar]
Rindsfüser, G.; Klügl, F. Agent-based pedestrian simulation: A case study of the Bern railway station. DISP— Plan. Rev. 2007, 43, 9–18. [Google Scholar]
Stolzmann, W.; Butz, M.V. Latent learning and action planning in robots with anticipatory classifier systems. In Learning Classifier Systems: From Foundations to Applications; Lanzi, P.L., Stolzmann, W., Wilson, S.W., Eds.; Springer: London, UK, 2000; pp. 301–320. [Google Scholar]
Holmes, J.H.; Sager, J.A. Rule discovery in epidemiologic surveillance data using EpiXCS: An evolutionary computation approach. In Artificial Intelligence in Medicine; Carbonell, J.G., Siekmann, J., Eds.; Springer: Berlin, Germany, 2005; pp. 444–452. [Google Scholar]
Holland, J.H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, USA, 1975. [Google Scholar]
Holland, J.H. Adaptation. In Progress in Theoretical Biology; Rosen, R., Snell, F.M., Eds.; Academic Press: New York, NY, USA, 1976; Volume 4, pp. 263–293. [Google Scholar]
De, J.; Kenneth, A. Learning with genetic algorithms: An overview. In Machine Learning 3; Langley, P., Ed.; Kluwer Academic Publishers: Philip Drive Norwell, MA, USA, 1988; pp. 121–138. [Google Scholar]
Fruin, J.J. Pedestrian Planning and Design; Metropolitan Association of Urban Designers and Environmental Planners: New York, NY, USA, 1971. [Google Scholar]
Greenshields, B.D. A study of traffic capacity. Proc. Highw. Res. Board 1935, 14, 448–477. [Google Scholar]
Wan, X.; Yang, Z.; Liu, Y.; Chen, G. A reinforcement learning-based strategy updating model for the cooperative evolution. Phys. Stat. Mech. Appl. 2023, 618, 128699. [Google Scholar] [CrossRef]
Alavizadeh, H.; Alavizadeh, H.; Jang-Jaccard, J. Deep Q-learning based reinforcement learning approach for network intrusion detection. Computers 2022, 11, 41. [Google Scholar] [CrossRef]
Dorigo, M.; Colombetti, M. Robot Shaping: An Experiment in Behavior Engineering; MIT Press: Cambridge, MA, USA, 1998; Bradford Books. [Google Scholar]

Figure 1. Perceptual field.

Figure 2. Corridor configuration used in the simulation, based on the empirical setup reported by Liu et al. [10].

Figure 3. Density–velocity relationships observed in simulation under three distinct inflow scenarios (labeled 1.6-1, 1.6-2, and 1.6-3, representing different initial inflow densities).

Figure 4. Simulation environment used for the comparison between ACS2 and tabular Q-learning.

Figure 5. Learning performance of ACS2 and tabular Q-learning measured by average episode reward over training iterations.

Figure 6. Simulation environment with central obstacle used for analyzing crowd dynamics.

Figure 7. Maximum flow rate and left–right separation rate as a function of obstacle width.

Figure 8. Number of episodes required to achieve consensus formation at varying obstacle widths.

Table 1. Experimental parameters.

Parameter	Value
Agent Speed	1.2 m/s
Field of View Distance	1.2 m
Field of View Angle	±60°
Number of Sectors	9
Collision Radius	0.25 m
ACS2 Max Rule Set Size	300,000
Genetic Algorithm Threshold ( $θ_{GA}$ )	40
Crossover Probability ( $p_{c}$ )	0.5
Mutation Probability ( $p_{m}$ )	0.01
Discount Factor ( $γ$ )	0.71
Reward Structure	Goal: +100
	Collision: −10
	Idle penalty: −0.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hayashida, T.; Sekizaki, S.; Furuya, Y.; Nishizaki, I. ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics. AppliedMath 2025, 5, 88. https://doi.org/10.3390/appliedmath5030088

AMA Style

Hayashida T, Sekizaki S, Furuya Y, Nishizaki I. ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics. AppliedMath. 2025; 5(3):88. https://doi.org/10.3390/appliedmath5030088

Chicago/Turabian Style

Hayashida, Tomohiro, Shinya Sekizaki, Yushi Furuya, and Ichiro Nishizaki. 2025. "ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics" AppliedMath 5, no. 3: 88. https://doi.org/10.3390/appliedmath5030088

APA Style

Hayashida, T., Sekizaki, S., Furuya, Y., & Nishizaki, I. (2025). ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics. AppliedMath, 5(3), 88. https://doi.org/10.3390/appliedmath5030088

Article Menu

ACS2-Powered Pedestrian Flow Simulation for Crowd Dynamics

Abstract

1. Introduction

2. Related Work

2.1. Pedestrian Simulation Models

2.1.1. Agent-Based Approaches for Pedestrian Dynamics

2.1.2. Empirical Data and Its Integration into Simulations

2.2. Classifier Systems and the ACS2 Framework

2.2.1. Fundamentals of Learning Classifier Systems

2.2.2. Anticipatory Classifier System 2 (ACS2)

3. Simulation Framework

3.1. Agent Architecture and Visual Perception

3.2. Action Set and Movement Dynamics

4. Simulation Experiments

5. Experiments

5.1. Validation with Empirical Pedestrian Flow

5.2. Comparison with Reinforcement Learning (Q-Learning)

5.3. Analysis of Crowd Dynamics and Rule-Based Behavior

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI