Optimizing Evacuation for Disabled Pedestrians with Heterogeneous Speeds: A Floor Field Cellular Automaton and Reinforcement Learning Approach

Lyu, Yimiao; Wang, Hongchun

doi:10.3390/buildings15224191


by Yimiao Lyu and Hongchun Wang *

School of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

* Author to whom correspondence should be addressed.

Buildings 2025, 15(22), 4191; https://doi.org/10.3390/buildings15224191

Submission received: 24 August 2025 / Revised: 24 October 2025 / Accepted: 12 November 2025 / Published: 20 November 2025

(This article belongs to the Topic Sustainability in Buildings: New Trends in the Management of Construction and Demolition Waste, 2nd Volume)


Abstract

Safe and efficient building evacuation for heterogeneous populations, particularly individuals with disabilities, remains a critical challenge in emergency management. This study proposes a hybrid evacuation framework that integrates Floor Field Cellular Automaton (FFCA) with reinforcement learning, specifically a Deep Q-Network (DQN), to enhance adaptive decision-making in dynamic and complex environments. The model incorporates velocity heterogeneity, friction-based conflict resolution, and real-time path planning to capture diverse mobility capabilities and interactions among evacuees. Simulation experiments were conducted under varying population densities, walking speeds, and exit configurations, considering four types of occupant groups: able-bodied individuals, wheelchair users, and people with visual or hearing impairments. The results demonstrate that the DQN-enhanced model consistently outperforms the conventional SFF + DFF approach, achieving significant reductions in evacuation time, particularly under high-density and reduced-speed scenarios. Notably, the DQN dynamically adapts evacuation paths to mitigate congestion, thereby improving both system efficiency and the safety of vulnerable groups. These findings highlight the potential of combining CA-based environmental modeling with reinforcement learning to develop adaptive and inclusive evacuation strategies. The proposed framework provides practical insights for designing evacuation protocols and intelligent navigation systems in public buildings. Future work will extend the proposed FFCA + DQN framework to more complex and realistic environments, including multi-exit and multi-level buildings, and further integrate multi-agent reinforcement learning (MARL) architectures to enable decentralized adaptation among heterogeneous evacuees. 
Furthermore, lightweight DQN variants and distributed training schemes will be explored to enhance computational scalability, while empirical data from evacuation drills and real-world case studies will be used for model calibration and validation, thereby improving predictive accuracy and generalizability.

Keywords:

evacuation efficiency; cellular automaton; heterogeneous speeds; DQN; simulation

1. Introduction

Safe and efficient building evacuation is a fundamental component of emergency management; however, vulnerable populations, particularly persons with disabilities, continue to face disproportionate risks during emergencies [1,2,3]. Globally, an estimated 1.3 billion people live with significant disabilities [4], representing a substantial proportion of the population that must be explicitly considered in safety planning. In Canada, nearly one in five adolescents has an impairment [5], while more than 14% of the population in the United Kingdom and approximately 12% in the United States experience mobility, sensory, or cognitive disabilities. Conventional evacuation procedures typically require individuals to descend staircases or move rapidly through corridors [6], conditions that are infeasible for wheelchair users and challenging for individuals with visual, hearing, or cognitive impairments [7,8]. Previous studies have also shown that wheelchair users and individuals relying on mobility aids often contribute to localized congestion [9], while suboptimal or irrational decision-making among evacuees can further increase uncertainty and delay [10]. In addition, the rapid spread of smoke, heat, and toxic gases during fires can render pre-planned evacuation routes impassable [11]. These challenges are particularly critical in densely occupied public spaces such as shopping malls, schools, airports, and train stations, where the risks are amplified. Consequently, ensuring safe and inclusive evacuation for individuals with disabilities remains a vital objective in building safety design and emergency planning.

Ensuring safe and efficient evacuation for vulnerable populations, particularly individuals with disabilities, requires not only accessible infrastructure but also accurate modeling of human movement in complex environments. Over the past two decades, extensive research on crowd dynamics and evacuation modeling has aimed to reduce casualties and enhance evacuation efficiency by simulating human behavior and decision-making under emergency conditions. Among these approaches, the Cellular Automaton model, originally proposed by John von Neumann, has become one of the most widely adopted frameworks in evacuation studies due to its simplicity, computational efficiency, and scalability [12]. However, the discrete nature of CA in time, space, and state transitions often leads to oversimplified motion rules, limiting its capacity to represent realistic pedestrian positions, environmental interactions, walking speeds, and behavioral heterogeneity [13].

To address these limitations, various enhancements to the CA framework have been proposed. Pelechano evaluated several CA-based evacuation tools such as EXODUS and STEPS, highlighting their limitations in representing complex human behaviors [14]. Goldengorin enhanced environmental realism by incorporating obstacle geometry into CA simulations using a Hopfield network [15]. Wąs et al. further optimized floor field models through modified neighborhood structures, refined static field construction, and field optimization methods [16]. Similarly, Sirakoulis et al. introduced autonomous obstacle avoidance, leader-following mechanisms, and electrostatic potential fields to simulate more adaptive and realistic crowd movement [17,18].

Applying CA models to the evacuation of individuals with disabilities introduces several additional challenges. First, accurately representing heterogeneous movement speeds is particularly important, as empirical studies show that hearing-impaired individuals move at speeds comparable to the general population, visually impaired individuals move more slowly, and those with mobility impairments exhibit substantially reduced speeds [19]. Traditional CA models, however, typically assign uniform movement velocities to all agents, thereby failing to capture this diversity. To address this limitation, enhanced formulations such as the Extended Floor Field Cellular Automaton (FFCA) [20] and dynamic-parameter CA models [21] have introduced variable step lengths, circular interaction zones, and deceleration factors to better represent environmental constraints and individual mobility differences.

Second, simulating complex architectural environments is essential for realistic evacuation modeling. Early CA-based studies often focused on simplified, obstacle-free geometries and were generally limited to unidirectional [22], bidirectional [23], or multidirectional pedestrian flows [24,25]. In contrast, real-world buildings exhibit far greater spatial complexity, incorporating multiple exits [24,25], internal obstacles [26,27], staircases [28], corners [29], elevators [30], and escalators [31]. Previous studies have demonstrated that obstacle configuration, architectural features, and priority rules for wheelchair users significantly influence congestion patterns, pedestrian flow dynamics, and overall evacuation efficiency in mixed heterogeneous populations [29,32,33,34,35].

Third, integrating dynamic hazard conditions such as fire and smoke remains a major challenge in evacuation modeling. For example, Wang et al. [36] coupled Fire Dynamics Simulator (FDS) outputs with CA models to optimize escape routes based on real-time hazard propagation, while Dong et al. [37] linked field parameters to dynamic visibility levels, refining the decision-making processes of evacuees navigating obstructed environments.

To enhance realism, extensions of the floor field model have incorporated a range of behavioral and physical factors, such as visibility-dependent movement patterns, emotional and cognitive responses, interaction forces, and multi-speed dynamics, thereby improving the representation of conflicts and collision avoidance mechanisms [38,39,40,41,42,43,44,45,46].

Building on these advances, it is important to recognize that while CA models effectively capture pedestrian motion and local interactions, they remain limited in adaptive decision-making under dynamic and uncertain conditions. Reinforcement learning (RL), particularly Q-learning, has demonstrated notable success in domains such as control theory, robotics, and image processing [47,48,49,50,51]. The core principle of RL is that an agent interacts with its environment by performing actions, receiving rewards, and iteratively learning an optimal policy that maximizes cumulative rewards.

Wharton [52] employed a multi-agent RL framework to simulate building evacuations, where a state-action matrix was used to represent action values at each grid cell. However, the simplified grid design limited the ability to capture realistic agent movement dynamics and collision behaviors. Tian and Jiang [53] applied an RL-based model to tunnel evacuations under fire conditions, designing a multi-path escape strategy based on Nash equilibria analysis. Li et al. [54] proposed an RL model that simultaneously considers traffic conditions, hazard severity, and the availability of safe routes to improve decision-making realism. Nayeem et al. [55] introduced a Q-learning-based Grey Wolf Optimizer (QGWO), in which Q-learning dynamically adjusts key algorithmic parameters to enhance adaptability and convergence in dynamic environments—overcoming limitations of standard GWO such as susceptibility to local optima and lack of adaptive learning capabilities.

A key limitation of conventional RL approaches is their low efficiency when generating solutions in high-dimensional or continuous environments. The integration of deep neural networks (DNNs) with RL, collectively referred to as deep reinforcement learning (DRL), has significantly improved the ability to model large-scale, continuous, and partially observable systems. Sharma et al. [56] used a Deep Q-Network (DQN), a DRL algorithm that replaces the Q-table with a neural network and incorporates experience replay and target network updates, to simulate fire evacuation scenarios. Their graph-based model effectively captured fire spread, bottlenecks, and random evacuation behaviors but was limited in its ability to represent continuous evacuation actions realistically. Zhang et al. [57] employed a DQN-based approach integrated with a particle dynamics environment to evacuate from a room with obstacles, leveraging a shared network pre-trained on a single agent. Their findings demonstrated the feasibility of integrating single-agent transfer learning and shared neural networks for efficient evacuation to the nearest exit.

In summary, CA models have been extensively advanced for simulating pedestrian evacuation, achieving significant progress in representing heterogeneous movement speeds, complex architectural layouts, dynamic hazard conditions, and diverse behavioral patterns. These developments have substantially improved the realism of CA-based evacuation simulations, particularly for mixed populations that include individuals with disabilities. Nevertheless, CA models inherently rely on predefined movement rules and therefore lack adaptive decision-making capabilities in dynamic and uncertain environments, limiting their effectiveness in capturing optimal evacuation strategies under changing conditions.

To address these limitations, researchers have increasingly explored integrating CA models with reinforcement learning methods. RL, especially Q-learning, has demonstrated strong adaptability and autonomous decision-making capabilities across various domains, and recent studies have begun examining its potential for evacuation modeling. Although these approaches show promise in optimizing escape strategies, most RL-based evacuation studies still rely on oversimplified spatial representations, neglect detailed movement dynamics, or assume homogeneous populations. Furthermore, conventional RL methods exhibit limited scalability in high-dimensional state spaces, constraining their applicability to realistic, complex evacuation scenarios. The advent of DRL has improved the ability to manage large-scale, continuous, and partially observable environments. Nevertheless, DRL-based evacuation research has rarely considered the distinct mobility characteristics and behavioral diversity of people with disabilities, nor has it fully exploited the strengths of CA models in capturing micro-level pedestrian interactions.

To bridge this gap, this study introduces a hybrid evacuation optimization framework that integrates the Floor Field Cellular Automaton model with Q-learning and the Deep Q-Network. The proposed approach combines the structural and behavioral modeling strengths of FFCA with the adaptive decision-making capabilities of reinforcement learning. The framework is evaluated under varying population densities, walking speeds, and exit configurations to systematically assess its adaptability across diverse and dynamic evacuation scenarios. Beyond improving overall evacuation efficiency, this study also investigates how heterogeneous populations, particularly individuals with disabilities, benefit from adaptive path selection and congestion mitigation strategies. By incorporating ε-greedy exploration, target network updates, experience replay, and policy transfer across agents, the proposed approach efficiently learns evacuation strategies that respond to real-time environmental changes and generalize across diverse evacuee groups.

2. Models and Methods

To comprehensively evaluate the adaptability of the proposed evacuation framework, simulations were conducted under multiple scenarios featuring varying population densities, walking speeds, and exit configurations. Unless otherwise specified, the default simulation environment was a rectangular room measuring 40 m × 20 m with a 2 m wide exit. At the initial time step, the room contained 160 occupants, including 120 individuals with disabilities (40 wheelchair users, 40 visually impaired, and 40 hearing-impaired persons) and 40 able-bodied individuals. These controlled experiments were used to validate the effectiveness and robustness of the proposed evacuation model.

2.1. Cellular Automaton Model

The CA model, originally proposed by John von Neumann in the 1950s [58], is a discrete computational framework used to simulate complex systems through simple, localized interaction rules. In its general form, a CA model can be formally expressed as

$$\mathrm{CA} = (L^{d}, S, N, F) \tag{1}$$

where L denotes the d-dimensional lattice (cell space) that represents the environment, S is the finite set of possible states for each cell (e.g., occupied, unoccupied, obstacle), N defines the neighborhood configuration for each cell, and F specifies the local transition rules that determine the next state of a cell based on its current state and the states of its neighboring cells.

In this study, the Moore neighborhood is adopted, allowing each pedestrian to move to one of the eight cells surrounding its current cell. As illustrated in Figure 1, the central grid cell (i,j) has eight potential movement directions with a 45° angular resolution. Compared with the von Neumann neighborhood, which restricts movement to four orthogonal directions (north, south, east, and west), the Moore neighborhood provides a wider range of movement options and thus offers a more realistic representation of pedestrian motion patterns.

In this study, the two-dimensional lattice represents the building floor plan, with the evacuation domain discretized into grid cells of 0.5 m × 0.5 m [59]. In the classical CA formulation, the evolution rules are deterministic, meaning that once the rules are defined, the system’s state updates follow a fixed trajectory without any stochastic variation.
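As a minimal sketch of this discretization, the following Python snippet builds the grid dimensions for the default 40 m × 20 m room at 0.5 m resolution and enumerates the Moore neighborhood of a cell. The function names (`grid_shape`, `moore_neighbors`) and the row/column orientation are illustrative assumptions, not part of the original model description.

```python
import math

CELL_SIZE_M = 0.5  # cell edge length stated in the text

# Eight Moore-neighbourhood offsets (di, dj), one per 45-degree direction.
MOORE_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                 (0, -1),           (0, 1),
                 (1, -1),  (1, 0),  (1, 1)]


def grid_shape(length_m: float, width_m: float,
               cell_m: float = CELL_SIZE_M) -> tuple[int, int]:
    """Number of cells (rows, cols) needed to cover a rectangular room.

    Assumption: rows run along the room width, columns along its length.
    """
    return math.ceil(width_m / cell_m), math.ceil(length_m / cell_m)


def moore_neighbors(i: int, j: int, shape: tuple[int, int]) -> list[tuple[int, int]]:
    """In-bounds Moore neighbours of cell (i, j)."""
    rows, cols = shape
    return [(i + di, j + dj) for di, dj in MOORE_OFFSETS
            if 0 <= i + di < rows and 0 <= j + dj < cols]


shape = grid_shape(40.0, 20.0)  # default room from Section 2
print(shape)                    # (40, 80)
print(len(moore_neighbors(0, 0, shape)), len(moore_neighbors(5, 5, shape)))  # 3 8
```

A corner cell has only three in-bounds neighbors, which is why boundary handling has to be explicit in any grid-based implementation.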

2.2. Floor Field Cellular Automaton Model

The FFCA model enhances the behavioral realism of agent-based evacuation simulations by incorporating the concept of floor fields to guide pedestrian movement through probabilistic decision-making. Within the FFCA framework, the simulation environment is represented by one or more scalar fields, primarily the static floor field (SFF, $S_{i,j}$) and the dynamic floor field (DFF, $D_{i,j}$), which collectively determine an agent’s movement probability [60]. The SFF encodes the shortest distance or minimal-cost path from any given cell to an exit, accounting for the spatial configuration of obstacles and walls. Conversely, the DFF models virtual traces deposited by moving agents, which diffuse and decay over time, thereby simulating herding behavior as agents are influenced by the paths of others. An additional stochastic term (ξ) is commonly introduced to capture unpredictable variations in individual pedestrian behavior. The probability $P_{(i,j) \to (m,n)}$ of an agent transitioning from its current cell (i,j) to an adjacent cell (m,n) is subsequently derived from the weighted combination of these elements:

$$P_{(i,j) \to (m,n)} = \frac{1}{Z_{ij}} M_{mn} (1 - n_{mn}) \exp(k_{S} S_{mn} + k_{D} D_{mn} + \xi_{mn}) \tag{2}$$

where $P_{(i,j) \to (m,n)}$ represents the transition probability from the current cell (i,j) to a neighboring cell (m,n). The terms $S_{mn}$ and $D_{mn}$ represent the static and dynamic floor field values at the target cell, respectively. $M_{mn} \in \{0,1\}$ is the accessibility indicator, $n_{mn} \in \{0,1\}$ indicates the occupancy status of the target cell, $\xi_{mn}$ is a perturbation (either $\xi_{mn} = 0$ or a noise term $\xi_{mn} \sim N(0, \sigma^{2})$), and $Z_{ij}$ is the normalization factor, ensuring that the sum of all transition probabilities within the neighborhood equals unity. Finally, $k_{S} \in [0,1]$ and $k_{D} \in [0,1]$ are sensitivity parameters that modulate the relative influence of the static and dynamic floor fields.

$$M_{mn} = \begin{cases} 0, & \text{cell } (m,n) \text{ is forbidden, e.g., walls} \\ 1, & \text{otherwise} \end{cases} \tag{3}$$

$$n_{mn} = \begin{cases} 0, & \text{cell } (m,n) \text{ is empty} \\ 1, & \text{cell } (m,n) \text{ is occupied} \end{cases} \tag{4}$$

$$Z_{ij} = \sum_{(u,v) \in N(i,j)} M_{uv} (1 - n_{uv}) \exp(k_{S} S_{uv} + k_{D} D_{uv} + \xi_{uv}) \tag{5}$$
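Equations (2)–(5) can be illustrated with a short Python sketch that evaluates the transition probabilities over the Moore neighborhood of one cell. Several details are assumptions made for illustration: the function name, the default values of `k_S` and `k_D`, the name `occ` for the occupancy indicator, and in particular the sign convention of the array `S`. Because Equation (2) weights cells by exp(+k_S S), the sketch assumes `S` is stored so that larger values mark more attractive cells (for example, a negated distance to the exit).

```python
import numpy as np

MOORE_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]


def transition_probabilities(pos, S, D, M, occ, k_S=1.0, k_D=0.5, xi=None):
    """Return {(m, n): P} for the in-bounds Moore neighbours of `pos`.

    S, D : static and dynamic field arrays (larger = more attractive, assumed).
    M    : accessibility indicator (0 = wall, 1 = free), Equation (3).
    occ  : occupancy indicator (0 = empty, 1 = occupied), Equation (4).
    xi   : optional noise array; None corresponds to xi = 0.
    """
    i, j = pos
    rows, cols = S.shape
    weights = {}
    for di, dj in MOORE_OFFSETS:
        u, v = i + di, j + dj
        if not (0 <= u < rows and 0 <= v < cols):
            continue
        noise = 0.0 if xi is None else xi[u, v]
        w = M[u, v] * (1 - occ[u, v]) * np.exp(k_S * S[u, v] + k_D * D[u, v] + noise)
        weights[(u, v)] = float(w)
    Z = sum(weights.values())  # normalization factor, Equation (5)
    if Z == 0.0:
        # Every neighbour is blocked or occupied: no move is possible.
        return {cell: 0.0 for cell in weights}
    return {cell: w / Z for cell, w in weights.items()}


# Hypothetical 3 x 3 example: exit side on the right, one wall, one occupied cell.
S = -np.array([[2.0, 1.0, 1.0], [2.0, 1.0, 0.0], [2.0, 1.0, 1.0]])
D = np.zeros((3, 3))
M = np.ones((3, 3), dtype=int)
M[0, 0] = 0
occ = np.zeros((3, 3), dtype=int)
occ[1, 2] = 1
probs = transition_probabilities((1, 1), S, D, M, occ)
```

In this example the wall cell (0, 0) and the occupied cell (1, 2) receive zero probability, and the remaining mass is shared among the free neighbors in proportion to their exponential weights.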

The SFF functions as an environmental potential that represents the distance or path cost from each cell to the exit. Once this field is established, each agent executes a greedy, discrete gradient-descent step: it examines the Moore neighborhood of its current position and, subject to accessibility and occupancy constraints, moves to the neighboring cell with the lowest field value, corresponding to the steepest local decrease in cost. By convention, smaller field values indicate closer proximity to the exit. The SFF value from cell (i,j) to the exit cell (m,n) is formally defined in Equation (6).

$$S_{ij} = k_{s} \times \max(|i - m|, |j - n|) \tag{6}$$

where $k_{s}$ is a positive scaling parameter that regulates the gradient steepness of the floor field.
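The static field of Equation (6) can be sketched in a few lines of NumPy. Equation (6) is written for a single exit cell; taking the minimum over several exit cells (a 2 m exit spans four 0.5 m cells) is an assumption made here, as are the function name and the hypothetical exit position on the right-hand wall. Obstacles are ignored because Equation (6) is purely geometric.

```python
import numpy as np


def static_floor_field(shape, exit_cells, k_s=1.0):
    """Chebyshev-distance static field: S_ij = k_s * max(|i - m|, |j - n|).

    For several exit cells the nearest one is used (assumption).
    """
    rows, cols = shape
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    dists = [np.maximum(np.abs(ii - m), np.abs(jj - n)) for m, n in exit_cells]
    return k_s * np.min(dists, axis=0)


# Hypothetical exit: four cells centred on the right-hand wall of a 40 x 80 grid.
exit_cells = [(18, 79), (19, 79), (20, 79), (21, 79)]
S = static_floor_field((40, 80), exit_cells)
print(S[18, 79], S[20, 70], S[0, 0])  # 0.0 9.0 79.0
```

The Chebyshev metric matches the Moore neighborhood, in which a diagonal step costs the same as an orthogonal one.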

Building upon the DFF evolution based on the von Neumann neighborhood proposed by Nishinari et al. [61], we extend the formulation to the Moore neighborhood. The corresponding DFF expression is presented in Equation (7).

$$D_{i,j}^{t+1} = (1 - \alpha)(1 - \delta) D_{i,j}^{t} + \frac{\alpha (1 - \delta)}{8} \left( D_{i-1,j}^{t} + D_{i+1,j}^{t} + D_{i,j-1}^{t} + D_{i,j+1}^{t} + D_{i-1,j-1}^{t} + D_{i-1,j+1}^{t} + D_{i+1,j-1}^{t} + D_{i+1,j+1}^{t} \right) \tag{7}$$

where $D_{i,j}^{t+1}$ and $D_{i,j}^{t}$ represent the dynamic field value at position (i,j) at times t + 1 and t, respectively. $\alpha$ and $\delta$ are parameters that control the update of the dynamic field: $\alpha$ sets the rate at which traces diffuse to neighboring cells (referred to here as the learning rate), and $\delta$ sets the rate at which they decay (the forgetting rate).
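A minimal NumPy sketch of the Moore-neighborhood update in Equation (7) is given below. The zero padding at the room boundary (traces outside the grid are treated as zero) and the default values of `alpha` and `delta` are assumptions; the deposit of new traces by moving agents is not part of Equation (7) and is therefore omitted.

```python
import numpy as np


def update_dynamic_field(D, alpha=0.3, delta=0.2):
    """One diffusion-and-decay step of the dynamic floor field, Equation (7)."""
    rows, cols = D.shape
    padded = np.pad(D, 1, mode="constant", constant_values=0.0)
    neighbor_sum = np.zeros_like(D, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            # padded[i + 1 + di, j + 1 + dj] equals D[i + di, j + dj] inside the grid.
            neighbor_sum += padded[1 + di:1 + di + rows, 1 + dj:1 + dj + cols]
    return (1 - alpha) * (1 - delta) * D + alpha * (1 - delta) / 8.0 * neighbor_sum


D = np.zeros((5, 5))
D[2, 2] = 1.0
D_next = update_dynamic_field(D)
print(round(D_next[2, 2], 4), round(D_next[1, 1], 4), round(D_next.sum(), 4))  # 0.56 0.03 0.8
```

Away from the boundary the total field mass shrinks by the factor (1 − δ) per step, while α only redistributes it between a cell and its eight neighbors.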

2.3. Velocity Heterogeneity Modeling

In real-world evacuation scenarios, individuals display heterogeneous mobility capabilities influenced by factors such as age, physical condition, and disability status. To account for this variability in the FFCA-based simulations, the proposed model assigns each pedestrian a fixed velocity coefficient $v_{c} \in (0, 1.2]$ at the initial state, which remains constant throughout the simulation. In this study, four pedestrian categories are considered: wheelchair users ($v_{c}$ = 0.6), visually impaired individuals ($v_{c}$ = 0.8), hearing-impaired individuals ($v_{c}$ = 1.0), and able-bodied pedestrians ($v_{c}$ = 1.2). The velocity coefficient serves as a scaling factor for the base transition probability $P_{ij}$ computed by the FFCA model [62], yielding an adjusted probability $P_{ij}^{\prime}$ for movement toward cell (i,j):

$$P_{ij}^{\prime} = v_{c} \times P_{ij} \tag{8}$$

This formulation, adapted from the original FFCA framework [63], ensures that pedestrians with lower mobility have a proportionally reduced chance of movement per time step, while preserving the influence of static and dynamic floor fields. By keeping $v_{c}$ constant, the heterogeneity effect is explicitly controlled, allowing comparative analysis of evacuation efficiency across mixed populations.

These coefficients were chosen as representative values inside ranges reported in empirical studies and engineering reviews for horizontal walking speeds. Specifically, recent systematic reviews and pedestrian studies report typical free/horizontal walking speeds for able-bodied adults at roughly 0.9–1.2 m/s (design and observational studies), while empirical and community measurements of wheelchair mobility and mobility-impaired users show a wider context-dependent range commonly between 0.3 and 0.9 m/s (depending on wheelchair type, propulsion, environment and whether the measurement is a short bout or free movement) (Appendix A). Studies focusing on visually impaired evacuees report horizontal walking speeds that may approach able-bodied values in familiar or guided environments but tend to be lower (typical horizontal ranges reported 0.7–1.0 m/s) in unfamiliar or low-visibility situations. Hearing impairment primarily affects detection/notification and pre-movement stages rather than steady walking speed; observational studies therefore commonly report no systematic decrease in horizontal walking speed for hearing-impaired individuals relative to able-bodied peers. Appendix A (Table A1) summarizes the representative literature ranges, contexts (horizontal plane, sample sizes, type of measurement) and the mapping between coefficient and approximate m/s values used for sensitivity analysis. The coefficient values above are used only to reflect typical relative differences reported in the literature; the FFCA rules, reward design, and experiment conditions remain unchanged. To further analyze the impact of mobility variations, additional experiments adjust these baseline velocity coefficients by ±30% and ±50% for all agent categories, enabling evaluation of evacuation performance under different walking speed scenarios.
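The scaling in Equation (8) and the ±30% and ±50% speed scenarios can be sketched as follows. The dictionary keys and function names are illustrative. Capping the adjusted probability at 1 is an assumption, since the text does not state how products above 1 (possible when $v_{c}$ exceeds 1) are handled.

```python
# Baseline velocity coefficients stated in the text.
BASE_VC = {
    "wheelchair": 0.6,
    "visually_impaired": 0.8,
    "hearing_impaired": 1.0,
    "able_bodied": 1.2,
}


def scaled_coefficients(relative_change: float) -> dict[str, float]:
    """Apply the same relative change (e.g. -0.3 for -30%) to every category."""
    return {group: round(v * (1.0 + relative_change), 4) for group, v in BASE_VC.items()}


def adjusted_probability(p: float, v_c: float) -> float:
    """Equation (8): P' = v_c * P, capped at 1.0 (the cap is an assumption)."""
    return min(1.0, v_c * p)


for change in (-0.5, -0.3, 0.0, 0.3, 0.5):
    print(f"{change:+.0%}", scaled_coefficients(change))
```

Because every category is scaled by the same factor, the ratios between groups are preserved across scenarios, so only the overall pace changes.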

From a methodological standpoint, these experiments also serve as a form of sensitivity analysis, as they systematically vary key mobility-related parameters (i.e., individual velocity coefficients and density conditions) to evaluate the robustness of the evacuation outcomes. Rather than fine-tuning numerical parameters, this analysis focuses on the behavioral implications of mobility heterogeneity, confirming that the overall evacuation dynamics and comparative trends remain consistent under a wide range of plausible parameter values.

2.4. Friction-Based Conflict Resolution

When multiple pedestrians simultaneously attempt to occupy the same target cell, spatial competition and interpersonal interference may result in movement delays or blockages. To represent this phenomenon, the friction mechanism proposed by Kirchner et al. [64] is incorporated into the FFCA framework. The friction parameter $\mu \in [0,1]$ represents the probability that a movement conflict remains unresolved. Specifically, if k pedestrians compete for the same target cell during a given time step, the outcome is determined by $\mu$. With probability $\mu$, the conflict remains unresolved, and all competing pedestrians retain their current positions; with probability $1 - \mu$, the conflict is resolved, and one pedestrian is permitted to occupy the cell, chosen either uniformly at random or in proportion to their velocity-adjusted transition probabilities $P_{ij}^{\prime}$. Formally, the probability $P_{move}^{(a)}$ that pedestrian a successfully moves into the contested cell is defined as

$$P_{move}^{(a)} = (1 - \mu) \times \frac{P_{ij}^{\prime (a)}}{\sum_{b=1}^{k} P_{ij}^{\prime (b)}} \tag{9}$$

where $P_{ij}^{\prime (a)}$ denotes the velocity-adjusted transition probability for agent a. Together, velocity heterogeneity and the friction mechanism capture both individual speed variations and congestion-induced interactions in high-density scenarios, thereby producing more realistic evacuation dynamics [65].
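The friction rule and Equation (9) can be sketched in Python as below. The function names and the dictionary-based interface are illustrative assumptions; the sketch uses the proportional selection variant and assumes every competitor has a strictly positive velocity-adjusted probability.

```python
import random


def move_probabilities(candidates: dict, mu: float) -> dict:
    """Equation (9): success probability of each agent competing for one cell.

    candidates maps agent id -> velocity-adjusted probability P' (> 0, assumed).
    """
    total = sum(candidates.values())
    return {agent: (1.0 - mu) * p / total for agent, p in candidates.items()}


def resolve_conflict(candidates: dict, mu: float, rng: random.Random):
    """Return the winning agent id, or None if friction blocks every competitor."""
    if rng.random() < mu:
        return None  # conflict unresolved: all competitors stay in place
    agents = list(candidates)
    weights = [candidates[a] for a in agents]
    return rng.choices(agents, weights=weights, k=1)[0]


# Hypothetical conflict between a wheelchair user and an able-bodied pedestrian.
candidates = {"wheelchair_7": 0.3, "able_12": 0.6}
probs = move_probabilities(candidates, mu=0.2)
print({a: round(p, 4) for a, p in probs.items()})  # {'wheelchair_7': 0.2667, 'able_12': 0.5333}
winner = resolve_conflict(candidates, mu=0.2, rng=random.Random(0))
```

The success probabilities sum to 1 − μ, so the remaining probability μ corresponds to the time steps in which nobody enters the contested cell.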

The velocity heterogeneity and friction mechanisms described above establish the core micro-level movement rules within the proposed FFCA framework. To improve the model’s adaptability under dynamic and uncertain evacuation conditions, these deterministic rules are further integrated with a Q-learning and DQN–based decision-making layer, as detailed in Section 2.5.

2.5. Reinforcement Learning Integration

The integration of FFCA and DQN is conceptually grounded in their complementary modeling paradigms. FFCA captures local pedestrian interactions, spatial fields, and boundary effects through deterministic transition rules, while DQN provides adaptive global decision-making based on environmental feedback. In the hybrid FFCA + DQN framework, the CA layer governs the physical movement and interaction constraints, whereas the DQN layer learns optimal action policies through iterative reward optimization. This coupling enables agents to make context-aware movement decisions beyond predefined rules, thus bridging microscopic behavior modeling and macroscopic optimization. A supplementary ablation experiment was conducted to isolate the effect of the DQN component, confirming that the learning-based decision layer significantly improves adaptability and congestion avoidance compared to the pure FFCA model (see Appendix A and Table A2).

Figure 2 illustrates the closed perception–action–feedback cycle integrating the FFCA simulation layer, the DQN learning layer, and the performance evaluation layer. The FFCA provides environmental states, $S_{t}$, to the DQN, which generates actions, $A_{t}$, for agent movement. The environment then returns rewards, $R_{t}$, and new states, $S_{t+1}$, which are stored in the experience replay buffer to update the Q-networks through mini-batch training and periodic target network updates. The aggregated performance indicators (e.g., evacuation time, congestion level) are used in an external adaptive optimization loop to refine learning parameters and reward structures, forming a self-improving evacuation strategy.

To enable agents to exhibit adaptive decision-making beyond the deterministic FFCA dynamics, RL is incorporated into the movement decision layer. We first establish a discrete, tabular Q-learning baseline for low-dimensional state representations and subsequently extend it to a DQN framework for high-dimensional states. In the integrated model, the RL component generates an action intent (e.g., selection of a target neighbor cell), which is then executed in accordance with FFCA feasibility constraints and the friction-based conflict resolution mechanism described in Section 2.4.

2.5.1. Q-Learning Algorithm

In the Q-learning formulation shown in Equation (10), each pedestrian agent is modeled as an autonomous decision-maker that selects an action a ∈ A from a finite action set based on its current state s. After performing an action, the agent receives an immediate reward, r, from the environment and updates its Q-value table, Q (s,a), according to the standard Q-learning rule [66]:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a^{\prime}} Q(s^{\prime}, a^{\prime}) - Q(s, a) \right] \tag{10}$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $s^{\prime}$ is the next state after executing action a. The term $\max_{a^{\prime}} Q(s^{\prime}, a^{\prime})$ represents the best estimated future reward, guiding the agent toward actions with higher long-term returns.

In this study, the discount factor γ = 0.98 is adopted to appropriately balance short-term evacuation efficiency and long-term strategic decision-making. This value is consistent with prior reinforcement learning-based evacuation models, where γ typically ranges from 0.90 to 0.99 to ensure stable convergence and adequate foresight in dynamic environments [47,48].
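The tabular update of Equation (10) with γ = 0.98 can be illustrated with the short sketch below. The learning rate `alpha = 0.1`, the use of grid coordinates as state keys, and the `done` flag that zeroes the bootstrap term at terminal states are assumptions introduced for the example.

```python
from collections import defaultdict

N_ACTIONS = 8  # one action per Moore-neighbourhood direction


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.98, done=False):
    """Apply Equation (10) in place and return the updated Q(s, a)."""
    best_next = 0.0 if done else max(Q[s_next])  # max over a' of Q(s', a')
    td_error = r + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return Q[s][a]


# Unseen states default to a row of zeros.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

first = q_update(Q, s=(1, 1), a=3, r=-1.0, s_next=(1, 2))
Q[(1, 2)][0] = 10.0  # pretend the next state has already learned a good action
second = q_update(Q, s=(1, 1), a=3, r=-1.0, s_next=(1, 2))
print(round(first, 4), round(second, 4))  # -0.1 0.79
```

The second update shows how value propagates backwards: once the successor state holds a high estimate, the discounted term 0.98 × 10 dominates the step penalty.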

To encourage pedestrians to learn efficient evacuation routes, a composite instantaneous reward function is designed as

$$R(s, a) = R_{exit} + R_{time} + R_{crowd} + R_{obstacle} \tag{11}$$

where $R_{exit}$ provides a large positive reward upon reaching the exit, $R_{time}$ imposes a small negative reward per time step to discourage delays, $R_{crowd}$ penalizes entering high-density areas, and $R_{obstacle}$ penalizes collisions or moving into obstacle cells. The detailed reward components are summarized in Table 1.
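The additive structure of Equation (11) can be sketched as follows. The numerical values and the crowding threshold are purely hypothetical placeholders chosen for illustration; the values actually used in the study are those of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical reward magnitudes (placeholders, not the Table 1 values)."""
    exit: float = 100.0       # R_exit: granted on reaching the exit
    time: float = -1.0        # R_time: charged every time step
    crowd: float = -5.0       # R_crowd: charged in high-density areas
    obstacle: float = -10.0   # R_obstacle: charged on collisions or obstacle moves
    crowd_threshold: int = 4  # occupied neighbours that count as "high density"


def reward(reached_exit: bool, occupied_neighbors: int, hit_obstacle: bool,
           w: RewardWeights = RewardWeights()) -> float:
    """Composite instantaneous reward R = R_exit + R_time + R_crowd + R_obstacle."""
    r = w.time
    if reached_exit:
        r += w.exit
    if occupied_neighbors >= w.crowd_threshold:
        r += w.crowd
    if hit_obstacle:
        r += w.obstacle
    return r


print(reward(True, 0, False), reward(False, 5, True), reward(False, 0, False))  # 99.0 -16.0 -1.0
```

Keeping the components separate makes it straightforward to switch individual terms off when studying their effect on the learned policy.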

Action selection follows an ε-greedy exploration strategy: with probability $1 - \varepsilon_{t}$ the greedy action $\arg\max_{a} Q(s_{t}, a)$ is chosen, and with probability $\varepsilon_{t}$ a random action is selected. The exploration rate $\varepsilon_{t}$ decays over time according to Equation (12):

$$\varepsilon_{t} = \varepsilon_{min} + (\varepsilon_{0} - \varepsilon_{min}) \exp(-k t) \tag{12}$$

where $\varepsilon_{0}$ is the initial exploration rate, $\varepsilon_{min}$ is the minimum exploration rate, and k is the exponential decay constant [67].
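A minimal sketch of the decay schedule in Equation (12) and the corresponding ε-greedy choice is shown below. The default values of `eps0`, `eps_min`, and `k` are hypothetical, and ties between equal Q-values are broken by taking the first maximal action, which is an implementation choice rather than something specified in the text.

```python
import math
import random


def epsilon(t: int, eps0: float = 1.0, eps_min: float = 0.05, k: float = 0.001) -> float:
    """Equation (12): exponentially decaying exploration rate."""
    return eps_min + (eps0 - eps_min) * math.exp(-k * t)


def select_action(q_values: list, eps: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over a list of Q-values for the current state."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=q_values.__getitem__)    # exploit


rng = random.Random(42)
for t in (0, 1000, 5000):
    print(t, round(epsilon(t), 4))
greedy = select_action([0.0, 5.0, 1.0], eps=0.0, rng=rng)
```

With these placeholder settings the rate starts at 1, falls to roughly 0.4 after 1000 steps, and approaches the floor of 0.05 asymptotically.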

2.5.2. Deep Q-Network (DQN)

When agent observations are high dimensional (e.g., local occupancy grids, field values, hazard maps), the Q-table is replaced by a function approximator $Q(s, a; \theta)$ parameterized by neural network weights $\theta$ [68]. The DQN is trained by minimizing the temporal-difference loss, defined as

$$L(\theta) = \mathbb{E}_{(s, a, r, s^{\prime}) \sim D} \left[ \left( r + \gamma \max_{a^{\prime}} Q(s^{\prime}, a^{\prime}; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right] \tag{13}$$

where $\theta^{-}$ denotes the target network parameters, and D represents the experience replay buffer used to mitigate temporal correlations [69]. The DQN framework relies on two stabilizing components. First, experience replay stores transitions $(s, a, r, s^{\prime})$ in a buffer and randomly samples minibatches to break temporal correlations. Second, a target network with parameters $\theta^{-}$ is updated periodically, every C steps, to provide stable learning targets. The pseudo-code combining FFCA and RL algorithms is presented in Algorithm 1. To ensure methodological transparency, parameter values used in both FFCA and DQN components were systematically derived from empirical references and calibrated through pre-tests. The physical parameters, including friction and velocity coefficients, were adapted from evacuation behavior studies and verified to fall within the reported empirical ranges. The reinforcement learning parameters (γ, α, ε-decay) were determined based on convergence stability tests, ensuring consistent policy optimization across multiple simulation runs. Detailed parameter settings and sources are summarized in Table 2, while the rationale for integration and parameter harmonization between FFCA and DQN is provided in Appendix A.
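The loss in Equation (13) can be illustrated on a minibatch with plain NumPy, leaving the neural network itself aside. In this sketch the arrays `q_online` and `next_q_target` stand in for the outputs of the online network (θ) and the target network (θ⁻). The function names and the `dones` mask that removes the bootstrap term at terminal states are assumptions, as Equation (13) does not show terminal handling.

```python
import numpy as np


def td_targets(rewards, next_q_target, dones, gamma=0.98):
    """r + gamma * max_a' Q(s', a'; theta_minus), with no bootstrap at terminal states."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def dqn_loss(q_online, actions, targets):
    """Mean squared temporal-difference error over a minibatch, Equation (13)."""
    q_sa = q_online[np.arange(len(actions)), actions]  # Q(s, a; theta) of the taken actions
    return float(np.mean((targets - q_sa) ** 2))


# Hypothetical minibatch of two transitions with two actions each.
rewards = np.array([-1.0, 100.0])
dones = np.array([0.0, 1.0])                      # second transition reached the exit
next_q_target = np.array([[0.0, 2.0], [5.0, 7.0]])
q_online = np.array([[0.5, 0.0], [0.0, 90.0]])
actions = np.array([0, 1])

targets = td_targets(rewards, next_q_target, dones)
loss = dqn_loss(q_online, actions, targets)
print(targets, round(loss, 4))  # [  0.96 100.  ] 50.1058
```

In a full implementation the gradient of this loss would be taken with respect to θ only, with the target values treated as constants.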

Algorithm 1. Hybrid

Initialize: Q-table or network Q(s, a; θ), target network θ⁻ = θ, replay buffer D.

1: For episode = 1 to MaxEpisodes:
2:     Reset environment; initialize agent states.
3:     For t = 1 to Tmax:
4:         For each agent:
5:             Observe state s_t.
6:             Select action a_t using ε-greedy policy according to Equation (12).
7:         Aggregate action intents for all agents.
8:         Apply FFCA feasibility checks; resolve conflicts via friction mechanism.
9:         Execute successful moves; compute rewards r_t for each agent.
10:        Observe next state s_{t+1}.
11:        Store (s_t, a_t, r_t, s_{t+1}) in D.
12:        If DQN:
13:            Sample minibatch from D; update θ using Equation (13).
14:            Every C steps: θ⁻ ← θ.
15:        Else if Tabular Q:
16:            Update Q-table using Equation (10).
17:        Decay ε according to Equation (12).
18:    End For
19: End For
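Steps 7 and 8 of Algorithm 1 can be illustrated with a short sketch of friction-based conflict resolution. It assumes the convention widely used in floor field models: when several agents target the same cell, all of them stay put with probability μ, and otherwise one contender is chosen uniformly at random. The exact rule used in this study (for example, any dependence on agent type or speed) may differ, and the function and argument names are hypothetical. Feasibility checks such as walls and occupied targets are left out.

```python
import random
from collections import defaultdict

def resolve_conflicts(intents, mu, rng):
    """Resolve competing move intents with a friction parameter mu.

    intents: dict mapping agent id -> target cell (row, col).
    Returns a dict agent id -> cell for the agents that are allowed to move.
    """
    by_target = defaultdict(list)
    for agent, cell in intents.items():
        by_target[cell].append(agent)
    moves = {}
    for cell, agents in by_target.items():
        if len(agents) == 1:
            moves[agents[0]] = cell
        elif rng.random() >= mu:
            # With probability 1 - mu one contender wins; otherwise nobody moves.
            moves[rng.choice(agents)] = cell
    return moves

rng = random.Random(42)
moves = resolve_conflicts({"a": (0, 1), "b": (0, 1), "c": (2, 2)}, mu=0.3, rng=rng)
```

Agents whose intent is rejected simply keep their current cell for that time step, which is how friction slows the flow near bottlenecks.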

In this study, FFCA functions as the environmental simulation layer. It models spatial layout, floor fields, and heterogeneous pedestrian mobility within a discrete grid space. Within this framework, the state of each agent is defined by its current cell position, neighboring cell accessibility, and local floor field values. The DQN functions as the decision-making layer, mapping each observed FFCA state to an action (i.e., movement toward one of the neighboring cells). After an action is executed in the FFCA environment, the agent receives an immediate reward that reflects both evacuation objectives (e.g., distance reduction toward exits, avoidance of congestion) and safety constraints. This reward signal is then used to update the Q-values through experience replay and target network updates. In this way, FFCA provides the micro-level interaction rules and environmental dynamics, while DQN adaptively learns evacuation strategies that optimize long-term efficiency. The coupling of FFCA and DQN thus combines realistic crowd movement modeling with reinforcement learning-based adaptive path selection, enabling a more robust representation of evacuation processes in complex scenarios.

From a methodological perspective, the integration of FFCA and DQN was designed to ensure internal model consistency and computational stability. Both components operate under a unified time-step and share a common state representation, which defines each agent’s observable environment. The FFCA layer governs physical feasibility and local interactions, while the DQN layer optimizes decision policies based on cumulative rewards. This consistent coupling enables the hybrid model to combine deterministic rules with adaptive learning while preserving the microscopic constraints of the cellular automaton framework.

2.6. Model Validation

To ensure methodological clarity and robustness, this section provides an overview of how each simulation experiment was designed to validate specific capabilities of the proposed FFCA + DQN evacuation framework. The validation process follows a stepwise structure, progressing from physical field correctness to learning adaptability, heterogeneity handling, and scalability testing. Each subsequent subsection builds upon the previous one, forming a coherent chain of verification that supports the reliability and applicability of the model.

(1): Baseline validation

The first experiment verifies the correctness of the SFF used for global path guidance. By computing the SFF through a discrete Chebyshev distance transform and visualizing its gradient distribution, the model ensures that pedestrians are guided along monotonically decreasing potential values. The resulting trajectories (Figure 3) align with theoretical shortest paths, confirming that the SFF provides accurate and consistent global navigation cues.

(2): Dynamic adaptability

The second stage introduces the DFF to assess local coordination and adaptive movement in varying crowd densities. Through comparative simulations under SFF-only and SFF + DFF conditions (Figure 4, Figure 5 and Figure 6), the results demonstrate that the DFF enhances local adaptability by reducing bottleneck conflicts and enabling spontaneous detours without compromising overall path optimality.

(3): Learning-based adaptation

To evaluate the adaptive learning capacity of reinforcement learning agents, single-agent Q-learning and DQN experiments are conducted (Section 3.3). The Q-learning test examines convergence and behavioral improvement in low-dimensional spaces (Table 3), while the DQN experiment extends this analysis to high-dimensional environments. The results confirm that both methods enable agents to learn congestion-aware evacuation strategies, with DQN offering faster convergence and improved generalization.

(4): Collective intelligence

The fourth validation stage assesses how a policy learned by a single agent performs when transferred to multi-agent scenarios with heterogeneous walking speeds and densities (Section 3.4). The results show that the DQN-enhanced model maintains superior evacuation efficiency and robustness, even without retraining, thereby validating its adaptability and generalization in collective environments.

(5): Scalability and robustness

Finally, a series of sensitivity analyses in Section 3.5 examine how variations in physical and learning parameters influence performance. The model consistently exhibits stable convergence and similar evacuation trends under different configurations, confirming its scalability and numerical robustness.

In summary, the sequence of experiments, from static field validation to multi-agent simulations and sensitivity testing, provides comprehensive evidence that the FFCA + DQN framework is theoretically sound, computationally stable, and practically applicable for modeling complex, adaptive evacuation behaviors.

3. Results and Discussion

This section presents the simulation outcomes and analytical findings derived from the proposed FFCA + DQN hybrid framework. The results are organized according to key experimental scenarios, including variations in population density, exit configurations, and pedestrian heterogeneity. The focus is placed on comparing evacuation efficiency, route optimization, and adaptive behaviors across different modeling approaches.

The results demonstrate that the FFCA + DQN model consistently achieves shorter evacuation times and improved flow dynamics compared with the baseline FFCA and SFF + DFF models. Moreover, it exhibits strong adaptability across diverse population compositions and environmental conditions, validating its potential for inclusive evacuation analysis.

3.1. Static Floor Field Calculation and Optimal Path Planning

The static floor field (SFF) was first computed to provide pedestrians with global guidance toward the designated exit. In the SFF, each grid cell is assigned a scalar value representing the shortest distance from that cell to the exit under static conditions [70]. The field was generated with a discrete distance transform based on the Chebyshev metric. This approach ensures a monotonically decreasing field along the optimal evacuation path [71]. As a result, pedestrians tend to move toward neighboring cells with lower SFF values, thereby following the shortest-distance evacuation strategy.
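A minimal sketch of such a distance transform is given below, assuming the grid is supplied as a list of strings in which `#` marks an obstacle. A breadth-first search over the Moore neighborhood assigns every move, including diagonals, a cost of one step, which reproduces the Chebyshev distance on an open grid. The grid encoding and function name are assumptions for illustration, and diagonal moves past obstacle corners are permitted here for simplicity.

```python
from collections import deque

def static_floor_field(grid, exits):
    """Breadth-first distance transform with 8-connected unit-cost moves.

    grid: list of equal-length strings; '#' is an obstacle, anything else is walkable.
    exits: list of (row, col) exit cells, which receive the value 0.
    Returns a 2D list of step counts; obstacles and unreachable cells hold None.
    """
    rows, cols = len(grid), len(grid[0])
    field = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r, c in exits:
        field[r][c] = 0
        queue.append((r, c))
    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while queue:
        r, c = queue.popleft()
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and field[nr][nc] is None:
                field[nr][nc] = field[r][c] + 1
                queue.append((nr, nc))
    return field

sff = static_floor_field(["....", "....", "...."], exits=[(0, 0)])
```

Because every walkable cell other than an exit has at least one neighbor whose value is smaller by one, following decreasing values always leads to an exit.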

To validate the accuracy of the field construction, the SFF was visualized as a two-dimensional gradient map, with the corresponding optimal trajectories superimposed (Figure 3). The results demonstrate that the resulting gradient field yields paths consistent with the theoretical shortest routes under the Chebyshev distance, thereby confirming the correctness and reliability of the SFF formulation.

This analysis confirms that the SFF provides reliable global guidance for evacuation. In the subsequent sections, the static field is integrated with the DFF and RL algorithms to model adaptive and heterogeneous pedestrian behaviors under realistic evacuation scenarios.

3.2. Dynamic Floor Field Simulation

The dynamic floor field (DFF) was introduced to capture local interactions among pedestrians and to represent the transient virtual traces generated by their movement. Following the formulation of Nishinari et al. [61], the DFF is defined within the Moore neighborhood, and its evolution is governed by deposition and decay parameters. In this framework, the field is reinforced when pedestrians occupy or traverse cells and gradually decays over time to prevent unbounded accumulation. This mechanism helps pedestrians adapt their movements using frequently traversed paths. It effectively represents herding and congestion effects.
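The deposit-and-decay mechanism can be sketched as follows, assuming a simple rule in which every cell first loses a fraction `decay` of its value and each cell visited during the current step then gains `deposit`. Both parameter names and their default values are hypothetical, and diffusion between neighboring cells, which some DFF formulations include, is omitted.

```python
def update_dff(dff, visited_cells, deposit=1.0, decay=0.2):
    """One DFF update: decay every cell, then reinforce the cells just visited.

    dff: 2D list of non-negative floats.
    visited_cells: iterable of (row, col) cells occupied or traversed this step.
    """
    new_field = [[(1.0 - decay) * value for value in row] for row in dff]
    for r, c in visited_cells:
        new_field[r][c] += deposit
    return new_field

dff = update_dff([[0.0, 2.0], [0.0, 0.0]], visited_cells=[(0, 0)], decay=0.5)
```

With a constant visiting rate, the value of a cell settles at `deposit / decay`, so the decay term is what keeps the field bounded.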

To evaluate the influence of the DFF, simulations were performed under two configurations: (i) SFF-only guidance and (ii) combined SFF + DFF guidance. The spatiotemporal snapshots (Figure 4, Figure 5 and Figure 6) illustrate the pedestrian distributions at representative time steps. Compared with the SFF-only scenario, incorporating the DFF produces more realistic collective dynamics, characterized by temporary clustering near bottlenecks and adaptive detours in high-density regions.

These findings underscore the complementary roles of the DFF and SFF. While the SFF provides global guidance toward the exit, the DFF facilitates adaptive local coordination under crowded conditions. Together, they establish the foundation for subsequent RL extensions, in which agents further optimize their movement decisions by balancing global and local field information.

Figure 6 illustrates the pedestrian evacuation trajectories under various simulation configurations. Specifically, group a corresponds to the SFF-only mode without obstacles, group b represents the SFF + DFF mode without obstacles, and group c denotes the SFF + DFF mode with obstacles. Within each group, subplots X1, X2, and X3 correspond to low (0.2 persons/m²), medium (0.4 persons/m²), and high (0.6 persons/m²) density evacuation scenarios, respectively.

In the early evacuation stage (Figure 4 and Figure 5), pedestrians from the four room regions generally follow the SFF toward the exit. Their paths are short and nearly optimal. At this stage, overall pedestrian density is low. Therefore, overlaps and collisions rarely occur. However, in the later stage of evacuation, most pedestrians converge near the exit, where the limited exit width forms a bottleneck. Consequently, the local density in this region increases, leading to occasional overlapping and collision events. As illustrated in Figure 6, the pedestrian trajectories under both the SFF-only and the SFF + DFF modes remain largely consistent, which is expected in a single-exit configuration. This finding indicates that incorporating the DFF does not alter the global shortest path but rather enhances the local coordination of pedestrian movements.

As shown in Figure 7, when only the SFF mode is applied, pedestrian trajectories exhibit substantial overlap (the heatmap reveals a highly concentrated exit path), indicative of localized congestion. In this configuration, pedestrians rely exclusively on the shortest path strategy for evacuation and lack adaptive obstacle-avoidance capability. The underlying reason is that SFF is a pre-computed static field that cannot respond to real-time variations in crowd density. After incorporating the DFF, trajectories near the exit become more dispersed compared with the SFF-only mode (the heatmap displays a reduced overlap frequency), suggesting that the dynamic field helps alleviate local congestion. Through footprint-based feedback, pedestrians can partially avoid high-density regions (i.e., areas with elevated DFF values). However, this approach introduces several challenges. First, pedestrians must continuously balance between the SFF (shortest-path guidance) and DFF (congestion avoidance), which increases decision-making complexity. Second, the local propagation nature of DFF may lead to oscillatory detours, while its decay rate and weight parameters require manual tuning and exhibit limited generalization capability. Because traditional rule-based models struggle to simultaneously optimize such multi-objective decisions, RL is introduced to automatically learn the balance between SFF and DFF through reward-driven adaptation. In particular, the DQN leverages neural network function approximation to generalize across diverse scenarios (e.g., exit layouts, crowd densities) and achieve more efficient global coordination.

As illustrated in Figure 8 and Figure 9, the presence of dense obstacles leads to pronounced detours and oscillatory pedestrian movements. A portion of pedestrians failed to reach the exit within the simulation time. Their trajectories exhibit repetitive back-and-forth motions near obstacle clusters, a phenomenon referred to as oscillatory local trapping. These inefficient trajectories indicate that local congestion and the limited adaptability of the current field-based model impede smooth pedestrian flow. Consequently, a subset of pedestrians remains near their initial positions at the end of the simulation, underscoring the limitations of the underlying path-planning mechanism. These findings demonstrate that rule-based field models (SFF + DFF) are insufficient to fully capture the complex and dynamic nature of realistic evacuation scenarios. To address these shortcomings, reinforcement learning approaches such as Q-learning and DQN can be introduced, enabling agents to autonomously learn adaptive strategies that minimize detours, suppress oscillations, and enhance overall evacuation efficiency.

3.3. Reinforcement Learning-Based Evacuation Optimization

To address the limitations of static or rule-based floor field models, RL methods are introduced to enhance pedestrian evacuation strategies. RL allows agents to autonomously learn adaptive decision-making policies through continuous interaction with the environment, providing adaptability under dynamic crowd conditions. This section examines three levels of RL-based evacuation optimization: single-agent Q-learning, DQN for high-dimensional path planning, and multi-agent RL for collective evacuation coordination.

3.3.1. Single-Agent Path Optimization Based on Q-Learning

In the baseline SFF + DFF models, pedestrian decision-making relies on predefined potential fields that lack adaptivity, often leading to local congestion and repetitive detours. To overcome these limitations, a single-agent path planning framework based on Q-learning is introduced. In this approach, the evacuation process is formulated as a Markov Decision Process (MDP), where the agent’s current grid position represents the state, the candidate movement directions correspond to actions, and the reward function is designed to promote proximity to the exit, avoidance of high-density regions, and penalization of collisions.

During training, the agent continuously updates its Q-table through iterative interactions with the environment, gradually learning to balance shortest path optimization with congestion avoidance in diverse scenarios. Compared with the fixed SFF + DFF approach, the Q-learning framework enables dynamic adaptation to local environmental variations and produces more flexible path selection behavior. Simulation results indicate that after 5000 training episodes, the agent’s trajectories converge toward smoother routes with reduced detours, resulting in a notable improvement in overall evacuation efficiency.
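The tabular update can be sketched as below. It uses the standard Q-learning rule, which Equation (10) is assumed to follow, with the baseline values α = 0.2 and γ = 0.98 reported in the sensitivity analysis as defaults. The nine-action set (eight Moore directions plus staying in place) is borrowed from the DQN description and is an assumption for the tabular case, as are the names used.

```python
from collections import defaultdict

# Eight Moore-neighborhood moves plus "stay" (assumed action set).
ACTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def q_update(q, s, a, r, s_next, done, alpha=0.2, gamma=0.98):
    """Apply one standard Q-learning update in place and return the TD error.

    q: defaultdict(float) keyed by (state, action index).
    s, s_next: grid positions (row, col); a: index into ACTIONS.
    """
    best_next = 0.0 if done else max(q[(s_next, b)] for b in range(len(ACTIONS)))
    td_error = r + gamma * best_next - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error

q_table = defaultdict(float)
q_table[((0, 1), 3)] = 10.0
error = q_update(q_table, (0, 0), 2, 1.0, (0, 1), False, alpha=0.5, gamma=0.5)
```

Since the table is keyed by grid position, its size grows with the number of cells and actions, which is the scalability limit that motivates the move to a DQN.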

To validate the learned policy, pedestrian trajectories generated by the Q-learning model were compared with those obtained from the SFF + DFF baseline. As illustrated in Figure 10, the Q-learning agent demonstrates fewer oscillations and reduced detours in congested regions, while still adhering to the globally optimal path determined by the SFF. These findings confirm that Q-learning preserves global optimality while significantly enhancing local coordination and flow smoothness.

Table 3 presents the quantitative performance comparison between the baseline FFCA model and the Q-learning approach. The results indicate that although Q-learning incurs a higher computational cost due to iterative training, it achieves fewer collision events and a shorter overall evacuation time than the FFCA baseline. This performance difference can be attributed to the reactive nature of the FFCA model. FFCA does not involve a learning process; at each time step, agents determine their movement directions instantaneously based on the SFF (shortest path to exits) and dynamic information (local pedestrian density). This makes FFCA computationally efficient and capable of effective local obstacle avoidance, serving as a robust and efficient baseline in most scenarios. In contrast, standard Q-learning encounters significant challenges in complex multi-agent environments. When multiple agents learn simultaneously, the environment perceived by each agent becomes highly non-stationary, which impedes convergence and can destabilize the learning process. Furthermore, the state and action spaces expand exponentially with the number of agents. Without careful algorithmic design and extensive training, simple Q-learning may perform worse than empirically optimized models such as FFCA.

Despite its simplicity, Q-learning achieves a measurable improvement in average pedestrian evacuation time, particularly in environments with multiple obstacles. The tabular Q-learning experiments demonstrate that RL enhances local adaptability in evacuation dynamics, particularly under the congested conditions of the final evacuation phase. Nonetheless, the Q-table representation suffers from scalability limitations, as the state space expands rapidly in complex environments or among heterogeneous pedestrian populations. To address this issue, the framework is extended to a Deep Q-Network (DQN), in which the explicit Q-table is replaced by a neural network function approximator, enabling efficient learning in high-dimensional state spaces. The corresponding results and analysis are presented in Section 3.3.2.

3.3.2. DQN Path Optimization with High-Dimensional State

As the state space expands in complex environments or in scenarios involving multiple agents, the Q-table becomes computationally intractable. To address this limitation, a DQN is implemented, which leverages a neural network to approximate the state-action value function and generalize across similar states [68].

In this configuration, the input layer encodes the agent’s local observations, comprising SFF values, DFF values, and occupancy states within the Moore neighborhood. The output layer produces the action-value estimates corresponding to the eight possible movement directions and the option of remaining stationary. The DQN is trained using the standard replay buffer and target network mechanisms, where the objective is to minimize the mean squared error (MSE) loss between the predicted and target Q-values.
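One possible encoding of this observation is sketched below: the SFF value, DFF value, and occupancy flag of each cell in the 3 × 3 block around the agent are flattened into 27 inputs, and the nine network outputs are masked so that occupied targets are not selected. Whether the central cell is part of the input, how out-of-grid cells are padded, and whether infeasible actions are masked before the FFCA feasibility check are all assumptions of this sketch, as are the function names.

```python
# Row-major 3x3 offsets; index 4 is (0, 0), i.e. the "stay" action.
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def encode_observation(pos, sff, dff, occupied, pad_value=-1.0):
    """Flatten (SFF, DFF, occupancy) over the 3x3 block around pos into 27 numbers."""
    rows, cols = len(sff), len(sff[0])
    features = []
    for dr, dc in OFFSETS:
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < rows and 0 <= c < cols:
            features += [sff[r][c], dff[r][c], 1.0 if (r, c) in occupied else 0.0]
        else:
            # Cells outside the grid get a sentinel SFF value and count as blocked.
            features += [pad_value, 0.0, 1.0]
    return features

def greedy_feasible_action(q_values, features):
    """Pick the highest-valued action whose target cell is free; staying is always allowed."""
    feasible = [i for i in range(9) if i == 4 or features[3 * i + 2] == 0.0]
    return max(feasible, key=lambda i: q_values[i])

sff = [[2.0, 1.0, 0.0], [2.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
dff = [[0.0] * 3 for _ in range(3)]
obs = encode_observation((1, 1), sff, dff, occupied={(1, 1), (0, 2)})
action = greedy_feasible_action([0, 5, 9, 0, 1, 0, 0, 0, 0], obs)
```

In this example the highest Q-value belongs to the occupied upper-right cell, so the agent falls back to the next-best free cell directly above it.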

Figure 11 illustrates the learning curves of Q-learning and DQN. The Q-learning curve appears relatively flat and smooth, exhibiting neither a distinct upward nor downward trend. This behavior arises from the tabular representation of Q-learning, in which updates are localized and convergence is slow, especially in environments with large state spaces or sparse reward signals. In contrast, the DQN curve exhibits pronounced oscillations characterized by alternating rises and drops, reflecting the instability introduced by neural network function approximation, where parameter updates simultaneously affect multiple state-action estimates. Despite this variability, the DQN curve demonstrates a clear upward trend, indicating progressive performance improvement relative to the nearly stagnant Q-learning baseline.

As illustrated in Figure 12, DQN agents achieve smoother trajectories, with fewer oscillations and reduced congestion near the exit.

To evaluate the performance of the proposed algorithm, a comparative analysis was conducted between Q-learning and DQN. As presented in Table 4, Q-learning achieved an average reward of 60.42 and a final reward of 60.92, whereas DQN attained substantially higher values, with an average reward of 311.51 and a final reward of 429. These results indicate that DQN is capable of learning more effective policies and attaining superior long-term performance. The stability metric, which quantifies the variance in reward convergence, is markedly higher for DQN (318.12) than for Q-learning (6.08), suggesting that although DQN achieves better performance, its training process is inherently more volatile. In terms of training efficiency, Q-learning requires an average of only 16.9 steps, whereas DQN requires 496.2 steps, demonstrating that DQN sacrifices computational efficiency in exchange for enhanced reward optimization. Overall, these results highlight a clear trade-off: Q-learning ensures stable and efficient convergence at the expense of suboptimal policy performance, while DQN delivers substantially higher rewards but with reduced stability and higher computational demands.

3.4. Multi-Agent Evacuation Optimization

To further evaluate the applicability of RL-based path planning under realistic crowd conditions, the single-agent DQN policy is extended to a multi-agent setting, in which multiple pedestrians with heterogeneous walking speeds (four groups) simultaneously navigate the same environment. With the transfer-DQN approach, the pretrained single-agent model is directly deployed without additional retraining. The reward function follows the previous design, augmented with a mild congestion penalty near exits to discourage local crowding and promote smoother flow.

We first investigate the impact of crowd density by comparing low (0.2 persons/m², N = 160), medium (0.4 persons/m², N = 320), and high (0.6 persons/m², N = 480) scenarios. Figure 13 illustrates the effect of different population densities on evacuation performance in a single-exit scenario, comparing the SFF + DFF method with the DQN approach. The results clearly demonstrate that the DQN method consistently outperforms SFF + DFF across all density levels. Under low-density scenarios, the difference in evacuation time between the two methods is relatively small. However, as population density increases, the performance advantage of DQN becomes increasingly pronounced. In particular, under high-density conditions, the average evacuation time for crowds optimized by DQN decreases more significantly compared to that in low-density conditions, indicating that congestion-induced delays are effectively alleviated. These findings suggest that DQN exhibits superior adaptability and efficiency in dealing with complex evacuation environments.

A plausible explanation for DQN’s advantage lies in its dynamic path-planning capability, which enables agents to adjust their strategies in response to real-time environmental changes, thereby avoiding congestion more effectively in high-density scenarios. In contrast, the SFF + DFF method exhibits limited dynamic adaptability, resulting in reduced evacuation efficiency in high-density conditions compared to low-density and medium-density settings. Moreover, the adaptability of the DQN method may have significant implications for practical emergency evacuation management, particularly in the design and evaluation of evacuation strategies under varying density conditions. Future research could explore the performance of DQN in multi-exit layouts and other complex environmental settings, as well as its integration with complementary evacuation models (e.g., social force models), to further enhance the efficiency and reliability of evacuation systems.

Next, we compare single-exit and multi-exit layouts. Figure 14 illustrates the impact of a dual-exit configuration on evacuation performance under high, medium, and low-density conditions, comparing the SFF + DFF approach with the DQN method. The figure consists of six subplots, representing the SFF + DFF and DQN results under the three population densities: high (N = 480), medium (N = 320), and low (N = 160). Each experimental scenario considers four types of evacuees: able-bodied individuals (normal), wheelchair users (wheelchair), visually impaired individuals (vis-impaired), and hearing-impaired individuals (hear-impaired), reflecting heterogeneous mobility profiles in realistic evacuation conditions.

Across all density levels, the DQN method consistently achieves shorter evacuation times compared to SFF + DFF. This performance advantage is particularly pronounced under high-density conditions, where the evacuation times of able-bodied individuals and wheelchair users are significantly reduced with DQN. Although the improvement for hearing-impaired individuals is less pronounced, it remains observable. A plausible explanation for these patterns is that, in our experiments, each type of evacuee is initially concentrated near one of the four corners of the evacuation space. Under a single-exit layout, able-bodied individuals and wheelchair users are positioned farther from the exit, resulting in longer evacuation distances compared to the other two groups, which may account for the differential impact of DQN across evacuee types.

Under medium-density and low-density conditions, the DQN method continues to achieve shorter average evacuation times. This advantage likely arises from DQN’s ability to dynamically allocate optimal routes for each agent, thereby maximizing the utilization of multiple exits, reducing congestion, and enhancing overall evacuation efficiency. In contrast, the SFF + DFF approach relies on relatively fixed path allocations, which can limit efficiency, particularly in high-density settings. Furthermore, DQN demonstrates superior adaptability to diverse evacuee characteristics (e.g., wheelchair users), likely because it can incorporate individual differences into its policy and assign more suitable evacuation routes accordingly.

These findings highlight the superior adaptability and efficiency of the DQN method in complex evacuation environments, particularly under high-density and multi-exit conditions. Future work could further explore the performance of DQN in more complex layouts (e.g., scenarios with three or more exits) and under evacuation strategies that incorporate both static and dynamic hazards, such as fire and smoke. Moreover, exploring the integration of DQN with complementary evacuation models and algorithms could provide additional improvements, thereby enhancing the overall efficiency and reliability of evacuation systems.

Table 5 summarizes the average evacuation times and percentage improvements between the FFCA and FFCA + DQN models under different population densities. Each scenario was simulated ten times using the built-in randomization module of the simulation platform. The software automatically outputs averaged evacuation times; therefore, individual run-level values were not directly accessible for calculating standard deviations. Nonetheless, this internal averaging effectively reduces stochastic variability and provides a robust measure of the expected evacuation performance for each scenario.

To test the statistical significance of performance improvements, we conducted independent two-sample t-tests. The tests compared average evacuation times between the baseline SFF + DFF model and the proposed FFCA + DQN model under various density and exit configurations. The resulting p-values are reported in Table 5.
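As an illustration of this comparison, the sketch below computes a two-sample t statistic in Welch's unequal-variance form, together with the percentage improvement in evacuation time. It assumes that run-level evacuation times are available for both models; whether the tests in this study used the pooled-variance or the Welch form is not stated, so the choice here is an assumption. The p-value would then be read from a t distribution with the returned degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def percent_improvement(baseline_time, new_time):
    """Relative reduction in evacuation time, in percent of the baseline."""
    return 100.0 * (baseline_time - new_time) / baseline_time
```

A positive t value with the baseline passed as `sample_a` indicates that the baseline times are longer on average.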

The statistical results indicate that most of the observed reductions in evacuation time are statistically significant at the 0.05 level (p < 0.05), confirming that the performance enhancement is not due to random variation. However, several scenarios—particularly those corresponding to low-density or dual-exit configurations—exhibited non-significant differences (p > 0.05). This behavior is expected, as evacuation flows under low congestion conditions inherently provide limited potential for further optimization.

Overall, even in cases where the differences are not statistically significant, the consistent downward trend in average evacuation time demonstrates that the FFCA + DQN model maintains robust performance across all conditions. This statistical validation strengthens the credibility of the proposed approach and supports the claim that the model offers a stable and generalizable improvement over the baseline method.

Finally, we analyze heterogeneous pedestrian speeds by adjusting the relative velocity (“velocity −50%”, “velocity −30%”, “velocity”, “velocity +30%”, “velocity +50%”) of all pedestrians. The results (Figure 15) illustrate the impact of varying walking speeds on evacuation performance when applying the SFF + DFF and DQN methods. The figure is organized into two groups, each containing three subplots corresponding to low-density (N = 160), medium-density (N = 320), and high-density (N = 480) scenarios. Each subplot presents evacuation times for able-bodied individuals, wheelchair users, visually impaired individuals, and hearing-impaired individuals under five speed conditions.

Across all density and speed conditions, the DQN method consistently outperforms the SFF + DFF approach. This advantage is particularly pronounced when walking speeds are reduced, where DQN achieves a substantial reduction in evacuation time, highlighting its ability to dynamically adjust evacuation strategies. As walking speed increases, both methods show improvements in evacuation time; however, the improvements are more significant with DQN.

A plausible explanation for DQN’s superior performance is its capacity to dynamically adapt route planning based on real-time environmental changes and individual differences in walking speed. This enables DQN to exhibit better adaptability and efficiency when handling evacuation problems under diverse speed conditions. In contrast, the SFF + DFF approach appears to have limited adaptability, resulting in lower evacuation efficiency when speed varies.

These findings underscore the strong adaptability and efficiency of the DQN method for complex evacuation environments, particularly under varying speed conditions and heterogeneous populations. Future research could further explore adaptive training strategies for DQN that eliminate the need for separate training under each speed condition. In addition, strategies such as incorporating obstacle-avoidance penalties and utilizing exploratory DQN variants could be investigated in conjunction with evacuation models under variable-speed scenarios to enhance the overall efficiency and reliability of evacuation systems. Together, these experiments demonstrate that DQN generalizes effectively to multi-agent evacuation scenarios, capturing both density effects and heterogeneous pedestrian behaviors, while maintaining adaptability under different exit layouts.

Compared with previous international studies, the uniqueness of this research lies in its focus on heterogeneous populations and the integration of adaptive decision-making into a cellular automata framework. While most existing evacuation models based on CA or reinforcement learning have concentrated on homogeneous agents, simplified spatial layouts, or generic optimization goals, our proposed FFCA + DQN framework explicitly incorporates diverse mobility characteristics of people with disabilities and dynamically adapts their evacuation strategies. This hybrid approach bridges the gap between rule-based micro-level interaction modeling and learning-based global optimization, enabling more realistic and inclusive evacuation simulations. As such, this study advances the global body of research by providing one of the first systematic validations of RL-enhanced CA for mixed-population evacuations, with direct implications for inclusive emergency management and building safety design.

At the same time, it should be acknowledged that the present validation was conducted in a simplified, self-designed simulation environment, rather than on case-specific layouts or datasets established by previous researchers. This inevitably limits the persuasiveness of practical feasibility. To address this limitation, future work will extend the FFCA + DQN framework to real-world building plans (e.g., malls, transport hubs) and calibrate it against empirical drill or field data, thereby enhancing both the realism and applicability of the proposed approach.

3.5. Sensitivity and Robustness Analysis

To evaluate the robustness of the proposed FFCA + DQN evacuation framework, a series of sensitivity analyses were conducted on both physical and learning parameters. The results are summarized in Table 6, Table 7 and Table 8. These tests aim to verify that the model’s convergence behavior and evacuation performance remain consistent under reasonable perturbations of the main parameters.

As shown in Table 6, variations in physical parameters, particularly walking speed and friction coefficient, exert a measurable but non-destabilizing influence on overall evacuation efficiency. When walking speed was adjusted by ±50%, average evacuation times ranged from about 32.6 s to 68.4 s. These values align closely with the empirical distributions in Table 6 and Table A2. Changes in the friction coefficient produced moderate effects: higher friction values slightly increased the average evacuation time due to intensified local conflicts near exit areas. These outcomes indicate that the macroscopic evacuation dynamics of the model remain physically consistent and resilient to moderate perturbations in input parameters.
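As a rough illustration of the walking-speed perturbation, the following sketch scales the per-group velocity coefficients listed in Table A1 by the five levels that appear in Table A2 (−50%, −30%, baseline, +30%, +50%). Only the coefficients and the perturbation levels are taken from the paper; the dictionary keys and the function name are assumptions introduced for this example, and the snippet is not the authors' implementation.

```python
# Velocity coefficients per occupant group, as listed in Table A1.
BASE_COEFF = {
    "normal": 1.2,
    "wheelchair": 0.6,
    "visually_impaired": 0.8,
    "hearing_impaired": 1.0,
}

# Relative speed perturbations used for the velocity conditions in Table A2.
PERTURBATIONS = (-0.5, -0.3, 0.0, 0.3, 0.5)


def perturbed_coefficients(base=BASE_COEFF, levels=PERTURBATIONS):
    """Return {level: {group: scaled coefficient}} for every perturbation level.

    Hypothetical helper: the paper does not specify how the scaling is coded.
    """
    return {
        level: {group: round(coeff * (1.0 + level), 3) for group, coeff in base.items()}
        for level in levels
    }


if __name__ == "__main__":
    for level, coeffs in perturbed_coefficients().items():
        print(f"{level:+.0%}", coeffs)
```

Each resulting coefficient set would then correspond to one simulated velocity condition per group.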

Table 7 examines the sensitivity of key reinforcement learning hyperparameters. The baseline configuration (α = 0.2, γ = 0.98, dynamic ε decay) achieved the best balance between convergence speed and policy stability. Learning rates exceeding 0.3 led to oscillatory convergence, whereas overly small rates delayed learning progress. Similarly, a discount factor γ = 0.98 ensured adequate strategic foresight and numerical stability, whereas smaller values (≤0.90) degraded long-term optimization. The dynamic exploration ε-decay schedule consistently outperformed fixed ε values, promoting smoother convergence and more diverse evacuation trajectories. These findings validate the robustness of the selected hyperparameter configuration.
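To make the contrast between a fixed ε and a dynamic ε-decay schedule concrete, the sketch below uses an exponential decay toward a floor value. The exponential form is an assumption; this section names only a "dynamic ε decay" and Table 2 gives an initial rate of 1.0, a minimum of 0.05, and a decay constant k between 0.001 and 0.01. The function name and the choice k = 0.005 are illustrative.

```python
import math


def epsilon_at(episode, eps_start=1.0, eps_min=0.05, k=0.005):
    """Exploration rate after `episode` episodes.

    Assumed form: eps_min + (eps_start - eps_min) * exp(-k * episode).
    Defaults follow Table 2; k = 0.005 is an arbitrary value inside the
    reported range 0.001-0.01.
    """
    return eps_min + (eps_start - eps_min) * math.exp(-k * episode)


if __name__ == "__main__":
    for ep in (0, 100, 500, 1000, 5000):
        print(ep, round(epsilon_at(ep), 4))
```

Under this assumed schedule the agent explores almost uniformly at the start and becomes nearly greedy (ε close to 0.05) late in training, whereas a fixed ε keeps the same exploration rate throughout.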

The effect of reward scaling is reported in Table 8. Uniformly reducing all reward magnitudes by 50% led to unstable convergence and longer evacuation times, while moderate amplification (approximately 2×) improved convergence stability without compromising behavioral diversity. However, over-scaling beyond 3× led to premature convergence and reduced exploration. These results underscore the need to keep reward magnitudes balanced for policy generalization; between these extremes, the composite reward structure maintained reliable learning performance.
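The uniform reward scaling examined here can be sketched as follows, using the reward values from Table 1 (+100 for reaching an exit, −1 per step, −5 in high-density areas, −10 for obstacle or collision events). Summing the components into a single per-step reward is an assumption made for this example, as are the class and function names; the paper's exact composition rule is not restated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardTable:
    r_exit: float = 100.0      # successful evacuation reward (Table 1)
    r_time: float = -1.0       # step time penalty (Table 1)
    r_crowd: float = -5.0      # high-density area penalty (Table 1)
    r_obstacle: float = -10.0  # obstacle/collision penalty (Table 1)

    def scaled(self, factor):
        """Return a copy with every reward magnitude multiplied by `factor`."""
        return RewardTable(
            self.r_exit * factor,
            self.r_time * factor,
            self.r_crowd * factor,
            self.r_obstacle * factor,
        )


def step_reward(table, reached_exit=False, in_crowd=False, hit_obstacle=False):
    """Additive per-step reward (assumed composition, not confirmed by the paper)."""
    reward = table.r_time
    if reached_exit:
        reward += table.r_exit
    if in_crowd:
        reward += table.r_crowd
    if hit_obstacle:
        reward += table.r_obstacle
    return reward


if __name__ == "__main__":
    base = RewardTable()
    for factor in (0.5, 1.0, 2.0, 3.0):
        print(factor, step_reward(base.scaled(factor), in_crowd=True))
```

Because every component is multiplied by the same factor, the relative ordering of actions is unchanged; what changes is the magnitude of the temporal-difference errors seen by the learner.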

Overall, the sensitivity analyses confirm that the proposed hybrid FFCA + DQN framework exhibits strong robustness and numerical stability under a wide spectrum of physical and learning parameter variations. Parameter perturbations within empirically valid bounds did not significantly alter the qualitative evacuation dynamics, thereby supporting the reproducibility and credibility of the simulation results.

4. Conclusions

This study proposed and evaluated a hybrid evacuation modeling framework that integrates the FFCA with DQN reinforcement learning to address the challenges of heterogeneous populations and dynamic environments. The proposed FFCA + DQN model was tested across varying population densities, mobility levels, exit configurations, and walking speed conditions.

The simulation results demonstrate that the DQN-enhanced framework consistently outperforms the conventional SFF + DFF approach across all experimental scenarios. The improvement is particularly pronounced for groups with mobility impairments, such as wheelchair users and individuals with visual or hearing impairments. This advantage stems from DQN’s capacity to adapt route planning dynamically in response to environmental changes and heterogeneous agent characteristics. Collectively, these findings highlight the potential of RL-based approaches to improve evacuation efficiency and safety in complex, mixed-population scenarios.

The work makes three main contributions: (i) It develops a unified FFCA + DQN framework that supports adaptive evacuation strategies for heterogeneous populations. (ii) It quantitatively demonstrates how reinforcement learning improves evacuation performance under varying density and velocity. (iii) It provides insights for allocating strategies to vulnerable groups, offering theoretical guidance for inclusive emergency management and human-centered building design.

Beyond its theoretical contributions, the findings of this study have direct implications for crisis management and emergency response. The proposed FFCA + DQN framework can inform the design of evacuation strategies that better accommodate individuals with reduced mobility, optimize exit configurations in complex facilities, and serve as the computational foundation for intelligent navigation tools assisting vulnerable groups during emergencies. The model generates adaptive strategies that respond to real-time congestion and different walking speeds. This helps planners and first responders allocate resources more effectively and improve safety during large-scale emergencies such as fires or earthquakes.

Importantly, while the simulation environment adopted in this study was deliberately simplified to ensure clarity and reproducibility, the key parameters (e.g., velocity coefficients and discount factor γ) were not arbitrarily assigned. Instead, they were derived from published empirical studies, engineering codes, and peer-reviewed evacuation experiments, ensuring their consistency with observed behavioral ranges. Conducting direct experiments involving mobility-impaired participants would entail significant ethical and safety risks; therefore, this study relied on validated simulation-based calibration. Future research will focus on integrating safe, drill-based datasets to further strengthen empirical validation and parameter calibration.

Nonetheless, several limitations should be acknowledged. First, the current simulations were conducted in a simplified rectangular space to ensure clarity and computational tractability. However, this simplification constrains the direct applicability of the results to complex architectural layouts with obstacles, corridors, or multi-level structures. Future work will extend the FFCA + DQN framework to real-world building layouts such as malls, hospitals, and transport hubs, and incorporate three-dimensional spatial features to evaluate its performance under realistic evacuation conditions.

Second, the computational cost associated with DQN training is non-negligible. The present experiments were performed on a workstation equipped with an Intel i9-13900K CPU, 64 GB RAM, and an NVIDIA RTX 4090 GPU. Each training iteration required approximately 1.5–2.3 s, and a full convergence run (10⁴ episodes) took 4–6 h depending on density. However, once trained, the policy network operates in real time (less than 10 ms per decision step). The main computational burden lies in offline training, suggesting that the framework is feasible for real-world deployment in its inference form. Future work will explore distributed or transfer reinforcement learning to speed up training for large-scale or multi-scenario environments. It will also examine lightweight model variants for embedded or edge-computing devices.
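The reported training figures can be cross-checked with simple arithmetic. The sketch below assumes that one "training iteration" corresponds to one episode, which the text does not state explicitly; the function names are illustrative.

```python
def seconds_per_episode(total_hours, episodes=10_000):
    """Average wall-clock seconds per episode for a full training run."""
    return total_hours * 3600.0 / episodes


def decisions_per_second(latency_ms):
    """Decision steps per second at a given per-decision inference latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # 4-6 h over 10^4 episodes, as reported for a full convergence run.
    print(seconds_per_episode(4.0), seconds_per_episode(6.0))
    # An upper-bound latency of 10 ms per decision step.
    print(decisions_per_second(10.0))
```

Under that assumption, 4–6 h over 10⁴ episodes corresponds to about 1.44–2.16 s per episode, which overlaps the reported 1.5–2.3 s per iteration, and a latency below 10 ms implies more than 100 decision steps per second at inference time.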

Finally, this study extends a single-agent DQN policy to multi-agent evacuation scenarios without retraining. While this simplification facilitates computational efficiency, it inevitably limits the model’s capacity to capture emergent cooperative behaviors and adaptive interactions among evacuees. Future research will investigate multi-agent reinforcement learning (MARL) extensions that explicitly model communication, coordination, and collective adaptation in dynamic crowd settings. Such MARL-based formulations are expected to enhance behavioral realism, improve generalization to unseen environments, and strengthen the scalability of the proposed evacuation framework.

In summary, this research provides a novel and effective approach for inclusive evacuation modeling by combining FFCA with deep reinforcement learning (DRL). With further refinement and empirical calibration, the proposed FFCA + DQN framework holds significant potential to support real-time decision-making in emergency management, contributing to the advancement of intelligent, adaptive, and human-centered evacuation systems.

Author Contributions

Conceptualization, Y.L. and H.W.; methodology, Y.L.; software, Y.L.; validation, Y.L. and H.W.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data processing, H.W.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and H.W.; visualization, Y.L.; supervision, H.W.; project administration, H.W.; funding acquisition, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, “Research on Collaborative Regulation of Urban Logistics Supply Chain Based on Network Theory”, grant number 61772062, and the National 14th Five-Year Key Research and Development Plan Project “Multi-Layer Complex Network Pinning Coordination Mechanism of Large-scale Parts Supply Chain”, grant number 2022YFB3305600.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to the anonymous reviewers for their constructive comments and suggestions, which have significantly improved the quality of this manuscript. We are also deeply appreciative of the editorial team and the reviewers at Buildings for their meticulous work and guidance throughout the review process. Their expertise and dedication have been invaluable in enhancing the clarity and rigor of our research.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CA	Cellular Automaton
FFCA	Floor Field Cellular Automaton
SFF	Static Floor Field
DFF	Dynamic Floor Field
RL	Reinforcement Learning
DRL	Deep Reinforcement Learning
DQN	Deep Q-Network

Appendix A

Table A1. Representative horizontal walking speed ranges and mapping to model coefficients.

Group	Representative Coefficient (This Study)	Typical Horizontal Speed Range Reported in the Literature (m/s)	Representative References
Able-bodied/normal (adult, horizontal)	1.2 (study baseline choice)	≈0.9–1.2 m/s (free/pedestrian studies; dependent on age, sex, context)	Giannoulaki & Christoforou (2024) [72]—systematic review
Wheelchair users/mobility-impaired	0.6	≈0.3–0.9 m/s (context dependent; short bouts and community mobility show median ~0.4–0.5 m/s in many studies; some controlled experiments report higher values under unimpeded conditions)	Sonenblum et al. (2012); Tsuchiya et al. (2007); Geoerg et al. (2019) [73,74,75]
Visually impaired	0.8	≈0.7–1.0 m/s (horizontal; speeds decline under unfamiliar/low-visibility conditions)	Sørensen (2014)—DTU experiments; Clark-Carter et al. (1986) [76,77]
Hearing impaired	1.0	No consistent reduction in steady horizontal walking speed reported (primary impacts are on detection/pre-movement)	Hashemi (2018); Choi et al. (2019) [78,79]

This appendix presents the detailed numerical results that complement the summary findings shown in Table 6 of the main text. Specifically, the tables below report the average evacuation times of the FFCA and FFCA + DQN models under varying combinations of pedestrian velocity coefficients and exit configurations. Each simulation scenario was executed 10 times with randomized initial conditions using the built-in stochastic module of the simulation platform Vadere.3.0. The software automatically outputs the averaged evacuation time for each condition; therefore, individual run-level data were not available for direct computation of statistical dispersion. This averaging process, however, effectively minimizes stochastic variability and provides a representative measure of the expected evacuation performance. The results consistently indicate that the FFCA + DQN framework outperforms the baseline FFCA across all speed and exit conditions, demonstrating its adaptability and efficiency in dynamic evacuation environments.

Table A2. Comparison of average evacuation time under different velocities and densities.

Density Level	Group	Velocity Condition	SFF + DFF, 1 Exit (s)	DQN, 1 Exit (s)	Time Difference (s)	Improvement (%)
High (N = 480)	Normal	Velocity −50%	75.6	59.4	16.2	21.4%
		Velocity −30%	43.1	39.9	3.2	8%
		Velocity	38.6	26.7	11.9	30.8%
		Velocity +30%	29.9	23.2	6.7	22.4%
		Velocity +50%	28.8	20.5	8.3	28.8%
	Wheelchair User	Velocity −50%	187.9	120.8	67.1	35.7%
		Velocity −30%	87.2	85.0	2.2	2.5%
		Velocity	83.1	60.6	22.5	27%
		Velocity +30%	62.3	46.2	16.1	25.8%
		Velocity +50%	57.4	41.1	16.3	28.4%
	Visually Impaired	Velocity −50%	132.7	101.1	31.6	23.8%
		Velocity −30%	65.2	64.4	0.8	1.2%
		Velocity	63.1	49.5	13.6	21.6%
		Velocity +30%	53.4	39.0	14.4	26.9%
		Velocity +50%	45.2	33.1	12.1	26.7%
	Hearing Impaired	Velocity −50%	103.7	72.5	31.2	30%
		Velocity −30%	53.2	43.3	9.9	18.6%
		Velocity	48.3	37.8	10.5	21.7%
		Velocity +30%	38.6	28.9	9.7	25.1%
		Velocity +50%	30.3	25.4	4.9	16.1%
Medium (N = 320)	Normal	Velocity −50%	50.2	30.4	19.8	39.4%
		Velocity −30%	43.8	35.0	8.8	25.1%
		Velocity	25.0	23.1	1.9	7.6%
		Velocity +30%	21.0	19.9	1.1	5.5%
		Velocity +50%	20.1	18.2	1.9	10.4%
	Wheelchair User	Velocity −50%	159.5	73.8	85.7	53.7%
		Velocity −30%	96.3	96.1	0.2	0.2%
		Velocity	61.1	59.2	1.9	3.2%
		Velocity +30%	45.2	43.1	2.1	4.6%
		Velocity +50%	39.8	35.2	4.6	11.6%
	Visually Impaired	Velocity −50%	73.8	46.5	27.3	36.9%
		Velocity −30%	63.4	59.8	3.6	6%
		Velocity	41.8	38.5	3.3	7.8%
		Velocity +30%	36.7	32.9	3.8	10.3%
		Velocity +50%	42.8	23.5	19.3	45%
	Hearing Impaired	Velocity −50%	60.0	41.2	18.8	31.3%
		Velocity −30%	59.9	48.8	11.1	18.5%
		Velocity	41.6	38.4	3.2	7.6%
		Velocity +30%	32.5	28.6	3.9	12%
		Velocity +50%	23.5	22.9	0.6	2.5%
Low (N = 160)	Normal	Velocity −50%	46.8	39.9	6.9	14.7%
		Velocity −30%	34.3	30.1	4.2	12.2%
		Velocity	23.9	19.6	4.3	17.9%
		Velocity +30%	19.8	15.2	4.6	23.2%
		Velocity +50%	18.6	14.8	3.8	20.4%
	Wheelchair User	Velocity −50%	98.2	88.4	9.8	9.9%
		Velocity −30%	69.0	63.3	5.7	8.2%
		Velocity	49.2	45.6	3.6	7.3%
		Velocity +30%	39.6	35.3	4.3	10.9%
		Velocity +50%	28.6	26.6	2.0	6.9%
	Visually Impaired	Velocity −50%	81.3	77.6	3.7	4.5%
		Velocity −30%	59.4	58.1	1.3	2.1%
		Velocity	40.0	35.1	4.9	12.2%
		Velocity +30%	31.9	25.3	6.6	20.6%
		Velocity +50%	38.6	23.2	15.4	39.9%
	Hearing Impaired	Velocity −50%	60.5	58.4	2.1	3.4%
		Velocity −30%	43.6	38.7	4.9	11.2%
		Velocity	35.1	25.6	9.5	27%
		Velocity +30%	25.0	20.0	5.0	20%
		Velocity +50%	23.2	16.8	6.4	27.6%

NOTE: Improvement (%) = (Evacuation time_FFCA − Evacuation time_DQN)/Evacuation time_FFCA × 100%. Each reported value represents the average evacuation time automatically computed from 10 randomized simulation runs by the platform.
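The improvement metric defined in the note can be reproduced directly from the two time columns. The sketch below is a minimal helper under the stated formula; the function names are illustrative, and the example values are taken from the first row of Table A2 (high density, normal group, velocity −50%).

```python
def time_difference(t_baseline, t_dqn):
    """Absolute reduction in average evacuation time, in seconds."""
    return t_baseline - t_dqn


def improvement_percent(t_baseline, t_dqn):
    """Improvement (%) = (t_baseline - t_dqn) / t_baseline * 100."""
    if t_baseline <= 0:
        raise ValueError("baseline evacuation time must be positive")
    return time_difference(t_baseline, t_dqn) / t_baseline * 100.0


if __name__ == "__main__":
    # First row of Table A2: SFF + DFF = 75.6 s, DQN = 59.4 s.
    print(round(time_difference(75.6, 59.4), 1))      # 16.2
    print(round(improvement_percent(75.6, 59.4), 1))  # 21.4
```

Applying the same function to the remaining rows is a simple way to cross-check the tabulated percentages against the reported times.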

References

1. Mao, H.; Zhu, L. Simulation and optimization of hybrid evacuation model for high-rise building under emergency condition. Int. J. Model. Simul. Sci. Comput. 2023, 14, 2350039.
2. Huang, Z.; Fan, R.; Fang, Z.; Ye, R.; Li, X.; Xu, Q.; Gao, H.; Gao, Y. Performance of occupant evacuation in a super high-rise building up to 583 m. Phys. A Stat. Mech. Its Appl. 2022, 589, 126643.
3. Glauberman, G. Scoping review of fire safety behaviors among high-rise occupants: Implications for public health nursing. Public Health Nurs. 2020, 37, 371–379.
4. Disability. Available online: https://www.who.int/news-room/fact-sheets/detail/disability-and-health (accessed on 31 October 2024).
5. Morris, S.; Fawcett, G.; Brisebois, L.; Hughes, J. A Demographic, Employment and Income Profile of Canadians with Disabilities Aged 15 Years and Over, 2017; Statistics Canada: Ottawa, ON, Canada, 2018.
6. Zhang, C.; Wu, X.; Shen, H. Research on coupling evacuation of escalator and staircase in fire scenario. PLoS ONE 2025, 20, e0314455.
7. Chen, N.; Zhao, M.; Gao, K.; Zhao, J. The Physiological Experimental Study on the Effect of Different Color of Safety Signs on a Virtual Subway Fire Escape. Int. J. Environ. Res. Public Health 2020, 17, 5903.
8. Yan, Z.; Wang, Y.; Chao, L. Simulation study on fire and evacuation of super high-rise commercial building. Case Stud. Therm. Eng. 2023, 52, 103519.
9. Fu, L.; Qin, H.; He, Y.; Shi, Y. Application of the social force modelling method to evacuation dynamics involving pedestrians with disabilities. Appl. Math. Comput. 2024, 460, 128297.
10. Wang, Z.; He, R.; Rebelo, F.; Vilar, E. Human interaction with virtual reality: Investigating pre-evacuation efficiency in building emergency. Virtual Real. 2023, 27, 1039–1050.
11. Zheng, Y.; Jia, B.; Li, X.G.; Jiang, R. Evacuation dynamics considering pedestrians’ movement behavior change with fire and smoke spreading. Saf. Sci. 2017, 92, 180–189.
12. Li, Y.; Chen, M.Y.; Dou, Z.; Zheng, X.P.; Cheng, Y.; Mebarki, A. A review of cellular automata models for crowd evacuation. Phys. A Stat. Mech. Its Appl. 2019, 526, 120752.
13. Sahin, C. Social Media Emergency Analysis and Realistic Evacuation Modeling. Doctoral Thesis, University of Calgary, Calgary, AB, Canada, 2021. Available online: https://prism.ucalgary.ca (accessed on 24 August 2025).
14. Pelechano, N.; Malkawi, A. Evacuation simulation models: Challenges in modeling high rise building evacuation with cellular automata approaches. Autom. Constr. 2008, 17, 377–385.
15. Goldengorin, B.; Makarenko, A.; Smelyanec, N. Some applications and prospects of cellular automata in traffic problems. In Proceedings of the International Conference on Cellular Automata for Research and Industry, Heidelberg, Germany, 4–6 October 2004; Springer: Berlin/Heidelberg, Germany, 2006; pp. 532–537.
16. Wąs, J.; Porzycki, J.; Lubaś, R.; Miller, J.; Bazior, G. Agent-based approach and cellular automata-a promising perspective in crowd dynamics modeling. Acta Phys. Pol. B Proc. Suppl. 2016, 9, 133–144.
17. Sirakoulis, G.C. Cellular automata for crowd dynamics. In Proceedings of the International Conference on Implementation and Application of Automata, Giessen, Germany, 30 July–2 August 2014; Springer International Publishing: Cham, Switzerland, 2014; pp. 58–69.
18. Sirakoulis, G.C. The computational paradigm of cellular automata in crowd evacuation. Int. J. Found. Comput. Sci. 2015, 26, 851–872.
19. You, Y.; Ye, R.; Fang, Z.; Ren, X.; Xie, S.; Huang, P.; Yu, L.; Yu, T.; Yan, J. Evacuation dynamics of heterogeneous crowds involving individuals with different types of disabilities. Saf. Sci. 2023, 168, 106297.
20. Xing, S.; Wang, C.; Wang, W.; Cao, R.F.; Yuen, A.C.Y.; Lee, E.W.M.; Yeoh, G.H.; Chan, Q.N. A fine discrete floor field cellular automaton model with natural step length for pedestrian dynamics. Simul. Model. Pract. Theory 2024, 130, 102841.
21. Yue, H.; Zhang, J.; Chen, W.; Wu, X.; Zhang, X.; Shao, C. Simulation of the influence of spatial obstacles on evacuation pedestrian flow in walking facilities. Phys. A Stat. Mech. Its Appl. 2021, 571, 125844.
22. Fu, L.; Liu, Y.; Shi, Y.; Zhao, Y. Dynamics of bidirectional pedestrian flow in a corridor including individuals with disabilities. Phys. A Stat. Mech. Its Appl. 2021, 580, 126140.
23. Fu, L.; Liu, Y.; Shi, Y.; Zhao, Y. Unidirectional pedestrian flow in a corridor involving individuals with disabilities: A modified floor field modelling approach. J. Stat. Mech. Theory Exp. 2021, 2021, 073402.
24. Yang, X.; Yang, X.; Wang, Q. Pedestrian evacuation under guides in a multiple-exit room via the fuzzy logic method. Commun. Nonlinear Sci. Numer. Simul. 2020, 83, 105138.
25. Liu, T.; Yang, X.; Wang, Q.; Zhou, M.; Xia, S. A fuzzy-theory-based cellular automata model for pedestrian evacuation from a multiple-exit room. IEEE Access 2020, 8, 106334–106345.
26. Rasa, A.R.; Xia, L.; Song, X.; Yu, H.; Karim, R.; Zhang, J.; Song, W. Understanding human-obstacle interaction dynamics on staircases: Implications for emergency evacuation and fire safety in high-rise buildings. J. Build. Eng. 2024, 98, 111082.
27. Hu, Y.H.; Bi, Y.B.; Zhang, J.; Lian, L.P.; Song, W.G.; Gao, W. Effect of a static pedestrian as an exit obstacle on evacuation. Chin. Phys. B 2023, 32, 018901.
28. Pouw, C.A.; Corbetta, A.; Gabbana, A.; Van der Laan, C.; Toschi, F. High-statistics pedestrian dynamics on stairways and their probabilistic fundamental diagrams. Transp. Res. Part C Emerg. Technol. 2024, 159, 104468.
29. Xie, Q.; Wu, Y.; Wang, Y.; Zhang, H. A multi-grid evacuation model considering the effects of different turning types. Phys. A Stat. Mech. Its Appl. 2024, 635, 129497.
30. Duan, Y.; Li, C.; He, J. A refined simulation method of building earthquake evacuation processes considering multi-exits and time-variant velocities. Earthq. Spectra 2024, 40, 705–731.
31. Yue, F.R.; Chen, J.; Ma, J.; Song, W.G.; Lo, S.M. Cellular automaton modeling of pedestrian movement behavior on an escalator. Chin. Phys. B 2018, 27, 124501.
32. Wang, J.; Li, J.; Feng, J.; Xu, S.; Liu, J.; Wang, Y. Performance optimization of the obstacle to corner bottleneck under emergency evacuation. J. Build. Eng. 2022, 45, 103658.
33. Huo, F.; Li, C.; Li, Y.; Lv, W.; Ma, Y. An extended model for describing pedestrian evacuation considering the impact of obstacles on the visual view. Phys. A Stat. Mech. Its Appl. 2022, 604, 127932.
34. Ding, N.; Chen, T.; Zhang, H. Simulation of high-rise building evacuation considering fatigue factor based on cellular automata: A case study in China. In Building Simulation; Tsinghua University Press: Beijing, China, 2017; Volume 10, pp. 407–418.
35. Zou, B.; Lu, C.; Li, Y. Simulation of a hospital evacuation including wheelchairs based on modified cellular automata. Simul. Model. Pract. Theory 2019, 99, 102018.
36. Wang, K.; Hao, H.; Jiang, S.; Wu, Z.; Cui, C.; Shao, H.; Zhang, W. Escape route optimization by cellular automata based on the multiple factors during the coal mine disasters. Nat. Hazards 2019, 99, 91–115.
37. Dong, L.; Yuan, W.; Deng, Y. A Study of Evacuation Model Based on Personnel Vision Change. J. Intell. Fuzzy Syst. 2023, 44, 6231–6247.
38. Zheng, Y.; Jia, B.; Li, X.G. Evacuation dynamics with fire spreading based on cellular automaton. Phys. A Stat. Mech. Its Appl. 2011, 390, 3147–3156.
39. Cao, S.; Song, W.; Lv, W.; Fang, Z. A multi-grid model for pedestrian evacuation in a room without visibility. Phys. A Stat. Mech. Its Appl. 2015, 436, 45–61.
40. Yue, H.; Guan, H.Z.; Zhang, X.; Shao, C.F. Simulation of pedestrian evacuation with asymmetrical exits layout. Phys. A Stat. Mech. Its Appl. 2010, 390, 198–207.
41. Guan, J.B.; Wang, K.H.; Chen, F. A cellular automaton model for evacuation flow using game theory. Phys. A Stat. Mech. Its Appl. 2016, 461, 655–661.
42. Song, W.G.; Yu, Y.F.; Wang, B.H.; Fan, W.C. Evacuation behaviors at exit in CA model with force essentials: A comparison with social force model. Phys. A Stat. Mech. Its Appl. 2006, 371, 658–666.
43. Henein, C.M.; White, T. Macroscopic effects of microscopic forces between agents in crowd models. Phys. A Stat. Mech. Its Appl. 2007, 373, 694–712.
44. Guo, R.Y.; Huang, H.J. A mobile lattice gas model for simulating pedestrian evacuation. Phys. A Stat. Mech. Its Appl. 2008, 387, 580–586.
45. Guo, R.Y.; Huang, H.J. A modified floor field cellular automata model for pedestrian evacuation simulation. J. Phys. A Math. Gen. 2008, 41, 385104.
46. Georgoudas, G.; Sirakoulis, G.C.; Andreadis, I. An Intelligent Cellular Automaton Model for Crowd Evacuation in Fire Spreading Conditions. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), Patra, Greece, 29–31 October 2007; IEEE Computer Society Press: Washington, DC, USA, 2007; pp. 36–43.
47. Takabatake, T.; Asai, K.; Kakuta, H.; Hasegawa, N. Optimizing evacuation paths using agent-based evacuation simulations and reinforcement learning. Int. J. Disaster Risk Reduct. 2025, 117, 105173.
48. Zhou, Y.; Yang, Y.; Liu, D.; Liu, Y.; Namilae, S.; Song, H. Real-Time Deep Reinforcement Learning for Evacuation Under Emergencies (No. CATM-2024-R1-ERAU); Center for Advanced Transportation Mobility, North Carolina A&T State University: Greensboro, NC, USA, 2024.
49. Pascucci, F.; Rinke, N.; Schiermeyer, C.; Friedrich, B.; Berkhahn, V. Modeling of shared space with multi-modal traffic using a multi-layer social force approach. Transp. Res. Procedia 2015, 10, 316–326.
50. Huang, L.; Wu, J.; You, F.; Lv, Z.; Song, H. Cyclist social force model at unsignalized intersections with heterogeneous traffic. IEEE Trans. Ind. Inform. 2017, 13, 782–792.
51. Jiang, Y.; Chen, B.; Li, X.; Ding, Z. Dynamic navigation field in the social force model for pedestrian evacuation. Appl. Math. Model. 2020, 80, 815–826.
52. Wharton, A. Simulation and Investigation of Multi-Agent Reinforcement Learning for Building Evacuation Scenarios. Report, St Catherine’s College. 2009. Available online: https://www.robots.ox.ac.uk/~ash/4YP%20Report.pdf (accessed on 24 August 2025).
53. Tian, K.; Jiang, S. Reinforcement learning for safe evacuation time of fire in Hong Kong-Zhuhai-Macau immersed tube tunnel. Syst. Sci. Control Eng. 2018, 6, 45–56.
54. Li, D.; Zhang, Z.; Alizadeh, B.; Zhang, Z.; Duffield, N.; Meyer, M.A.; Thompson, C.M.; Gao, H.; Behzadan, A.H. A reinforcement learning-based routing algorithm for large street networks. Int. J. Geogr. Inf. Sci. 2024, 38, 183–215.
55. Nayeem, G.M.; Fan, M.; Daiyan, G.M. Adaptive Q-Learning Grey Wolf Optimizer for UAV Path Planning. Drones 2025, 9, 246.
56. Sharma, J.; Andersen, P.A.; Granmo, O.C.; Goodwin, M. Deep Q-learning with Q-matrix transfer learning for novel fire evacuation environment. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 7363–7381.
57. Zhang, Y.; Chai, Z.; Lykotrafitis, G. Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles. Phys. A Stat. Mech. Its Appl. 2021, 571, 125845.
58. Von Neumann, J.; Taub, A.H. The General and Logical Theory of Automata; MIT Press: Cambridge, MA, USA, 1963; Volume 5.
59. Ji, J.; Lu, L.; Jin, Z.; Wei, S.; Ni, L. A cellular automata model for high-density crowd evacuation using triangle grids. Phys. A Stat. Mech. Its Appl. 2018, 509, 1034–1045.
60. Kirchner, A.; Schadschneider, A. Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Phys. A Stat. Mech. Its Appl. 2002, 312, 260–276.
61. Nishinari, K.; Kirchner, A.; Namazi, A.; Schadschneider, A. Extended floor field CA model for evacuation dynamics. IEICE Trans. Inf. Syst. 2004, 87, 726–732.
62. Burstedde, C.; Klauck, K.; Schadschneider, A.; Zittartz, J. Simulation of pedestrian dynamics using a two-dimensional cellular automaton. Phys. A Stat. Mech. Its Appl. 2001, 295, 507–525.
63. Kirchner, A.; Nishinari, K.; Schadschneider, A. Friction effects and clogging in a cellular automaton model for pedestrian dynamics. Phys. Rev. E 2003, 67, 056122.
64. Kirchner, A.; Klüpfel, H.; Nishinari, K.; Schadschneider, A.; Schreckenberg, M. Simulation of competitive egress behavior: Comparison with aircraft evacuation data. Phys. A Stat. Mech. Its Appl. 2003, 324, 689–697.
65. Helbing, D.; Farkas, I.; Vicsek, T. Simulating dynamical features of escape panic. Nature 2000, 407, 487–490.
66. Clifton, J.; Laber, E. Q-learning: Theory and applications. Annu. Rev. Stat. Appl. 2020, 7, 279–301.
67. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1.
68. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
69. Lin, L.J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 1992, 8, 293–321.
70. Galán, S.F. Fast Evacuation Method: Using an effective dynamic floor field based on efficient pedestrian assignment. Saf. Sci. 2019, 120, 79–88.
71. Saeed, M.A.; Ahmed, Z.; Yang, J.; Zhang, W. An optimal approach of wind power assessment using Chebyshev metric for determining the Weibull distribution parameters. Sustain. Energy Technol. Assess. 2020, 37, 100612.
72. Giannoulaki, M.; Christoforou, Z. Pedestrian Walking Speed Analysis: A Systematic Review. Sustainability 2024, 16, 4813.
73. Sonenblum, S.E.; Sprigle, S.; Lopez, R.A. Manual wheelchair use: Bouts of mobility in everyday life. Rehabil. Res. Pract. 2012, 2012, 753165.
74. Tsuchiya, S.; Hasemi, Y.; Furukawa, Y. Evacuation characteristics of group with wheelchair users. In Proceedings of the Asia-Oceania Symposium on Fire Science and Technology, Hong Kong, China, 20–22 September 2007; International Association for Fire Safety Science: London, UK, 2007; pp. 1–12.
75. Geoerg, P.; Berchtold, F.; Gwynne, S.; Boyce, K.; Holl, S.; Hofmann, A. Engineering egress data considering pedestrians with reduced mobility. Fire Mater. 2019, 43, 759–781.
76. Sørensen, J.G. Evacuation of People with Visual Impairments; Technical University of Denmark: Lyngby, Denmark, 2014.
77. Clark-Carter, D.D.; Heyes, A.D.; Howarth, C.I. The efficiency and walking speed of visually impaired people. Ergonomics 1986, 29, 779–789.
78. Hashemi, M. Emergency evacuation of people with disabilities: A survey of drills, simulations, and accessibility. Cogent Eng. 2018, 5, 1506304.
79. Choi, M.; Lee, S.; Hwang, S.; Park, M.; Lee, H.S. Comparison of emergency response abilities and evacuation performance involving vulnerable occupants in building fire situations. Sustainability 2019, 12, 87.

Figure 1. Moore neighborhood.

Figure 2. Conceptual workflow of the proposed FFCA + DQN hybrid evacuation modeling framework.

Figure 3. Optimal path planning with SFF.

Figure 4. Pedestrian distribution at different time steps under SFF-only guidance. Gray rectangles represent the initial positions of pedestrians, solid circles represent evacuated pedestrians, and orange rectangles represent exits. The colors from green to blue indicate static field values decreasing from high to low, and points along the same arc line represent locations with the same field value.

Figure 5. Pedestrian distribution at different time steps under SFF + DFF guidance.

Figure 6. Comparison of pedestrian evacuation trajectories under different densities and modes.

Figure 7. Comparison of overlap frequency maps under different densities and modes.

Figure 8. Evacuation trajectories in medium-density pedestrian flows with obstacles (0.4 p/m²). Green rectangles represent the initial positions of pedestrians, orange rectangles represent exits, blue dots represent evacuated pedestrians, and blue lines indicate the evacuation trajectories of different pedestrians.

Figure 9. Trajectory of a representative pedestrian who did not complete evacuation (SFF + DFF, pedestrian ID = 279).

Figure 10. Q-learning trajectories (α = 0.2, γ = 0.98, epsilon_start = 0.4, epsilon_end = 0.05).

Figure 11. Learning curves of Q-learning vs. DQN (average reward vs. training episode).

Figure 12. Representative trajectories under DQN.

Figure 13. Effect of population density on evacuation performance with single exit (SFF + DFF vs. DQN).

Figure 14. Effect of exit on evacuation performance with dual exits (SFF + DFF vs. DQN).

Figure 15. Effect of velocity on evacuation performance (SFF + DFF vs. DQN).

Table 1. List of reward values.

Reward	Description	Value
$R_{exit}$	Successful evacuation reward	+100
$R_{time}$	Step time penalty	−1
$R_{crowd}$	High-density area penalty	−5
$R_{obstacle}$	Obstacle/collision penalty	−10
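
The reward terms in Table 1 can be illustrated with a minimal sketch. The numeric values come from the table; how the terms are combined within a single step (here, a plain sum of whichever terms apply) and the names `StepOutcome` and `step_reward` are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

# Values taken from Table 1. Combining them by summation is an assumption.
R_EXIT = 100.0      # successful evacuation reward
R_TIME = -1.0       # step time penalty
R_CROWD = -5.0      # high-density area penalty
R_OBSTACLE = -10.0  # obstacle/collision penalty


@dataclass
class StepOutcome:
    """Hypothetical per-step flags for one agent (names are illustrative)."""
    reached_exit: bool = False
    in_high_density: bool = False
    hit_obstacle: bool = False


def step_reward(outcome: StepOutcome) -> float:
    """Sum the Table 1 terms that apply to a single move."""
    reward = R_TIME  # assumed: the time penalty is charged on every step
    if outcome.reached_exit:
        reward += R_EXIT
    if outcome.in_high_density:
        reward += R_CROWD
    if outcome.hit_obstacle:
        reward += R_OBSTACLE
    return reward
```

Under these assumptions, an uneventful step yields −1, while a step that reaches the exit yields +99.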

Table 2. RL and simulation parameters.

Parameter	Description	Value/Range
$α$	learning rate	0.2
$γ$	discount factor	0.98
$ε_{0}$	initial exploration rate	1.0
$ε_{min}$	minimum exploration rate	0.05
k	ε decay rate constant	0.001–0.01
Replay buffer size	capacity of experience storage	10⁴–10⁵
Target update frequency C	steps between target network updates	1000
Batch size	training batch size (DQN)	64–128
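
As a rough illustration of how the exploration parameters in Table 2 interact, the sketch below decays ε from $ε_{0}$ toward $ε_{min}$. The exponential form, the choice k = 0.005 (a value inside the tabulated range), and the function name `epsilon` are assumptions; the table itself specifies only the endpoints and the range of k.

```python
import math

EPS_0 = 1.0     # initial exploration rate (Table 2)
EPS_MIN = 0.05  # minimum exploration rate (Table 2)


def epsilon(episode: int, k: float = 0.005) -> float:
    """Assumed exponential schedule: eps_min + (eps_0 - eps_min) * exp(-k * episode)."""
    return EPS_MIN + (EPS_0 - EPS_MIN) * math.exp(-k * episode)


if __name__ == "__main__":
    # Print the exploration rate at a few episodes to show the decay.
    for ep in (0, 200, 1000, 5000):
        print(ep, round(epsilon(ep), 4))
```

A larger k shifts the agent from exploration to exploitation earlier; with this form, ε never falls below $ε_{min}$.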

Table 3. Comparison of FFCA vs. Q-learning.

Method	Convergence Episodes	Avg. Detour Index	Avg. Evacuation Time (s)	Conflict Count	Success %
FFCA	-	1.2912	91.12	9	90.9375
Q-learning	5000	1.9617	71.2	5	100

Table 4. Comparison of Q-learning vs. DQN.

Algorithm	Avg. Reward	Final Reward	Stability	Avg. Steps
Q-learning	60.42	60.92	6.08	16.9
DQN	311.51	429	318.12	496.2

Table 5. Comparison of average evacuation time under different modes and densities.

Density Level	Group	SFF + DFF, 1 Exit (s)	DQN, 1 Exit (s)	Time Difference, 1 Exit (s)	Improvement, 1 Exit (%)	p-Value (1 Exit)	SFF + DFF, 2 Exits (s)	DQN, 2 Exits (s)	Time Difference, 2 Exits (s)	Improvement, 2 Exits (%)	p-Value (2 Exits)
High (N = 480)	Normal	62.3	29.3	33.0	52.9	** (0.0009)	28.7	17.8	10.9	37.9	** (0.0041)
	Wheelchair User	83.3	78.2	5.1	6.1	* (0.0183)	60.6	42.1	18.5	30.5	** (0.0038)
	Visually Impaired	59.3	42.6	16.7	28.2	** (0.0046)	49.5	31.2	18.3	36.9	** (0.0045)
	Hearing Impaired	38.7	39.7	1	2.6	ns (0.3156)	37.8	28.3	4.5	11.9	* (0.0238)
Medium (N = 320)	Normal	49.6	26.2	23.4	47.1	** (0.0078)	25.0	15.5	9.5	38.0	** (0.0093)
	Wheelchair User	70.8	75.3	5.5	7.8	ns (0.1972)	49.2	38.2	11.0	22.4	* (0.0263)
	Visually Impaired	46.3	37.3	9.0	19.4	* (0.025)	41.2	29.1	12.1	29.4	** (0.0052)
	Hearing Impaired	32.1	31.4	0.7	2.2	ns (0.3023)	30.9	26.8	4.1	13.3	ns (0.1237)
Low (N = 160)	Normal	35.8	23.5	12.2	34.1	** (0.0043)	19.6	13.7	5.9	30.1	** (0.0033)
	Wheelchair User	56.3	68.7	12.4	22.0	** (0.0034)	45.6	34.3	11.3	24.8	* (0.0219)
	Visually Impaired	37.2	33.2	4.0	10.8	ns (0.1272)	35.6	26.2	9.4	26.4	* (0.0236)
	Hearing Impaired	26.1	29.6	3.5	13.4	ns (0.1197)	25.4	22.6	2.8	11.0	ns (0.3087)

NOTE: (1) Improvement (%) = (Evacuation time_FFCA − Evacuation time_DQN)/Evacuation time_FFCA × 100%, where FFCA denotes the SFF + DFF baseline. Each reported value is the average evacuation time over 10 randomized simulation runs, computed automatically by the platform. (2) The values in parentheses following **, *, and ns are approximate p-values: ** denotes p < 0.01 (typically p ≈ 0.005), * denotes p < 0.05 (typically p ≈ 0.02), and ns denotes p ≥ 0.05. (3) Independent two-sample t-tests were used to evaluate the statistical significance of differences in evacuation times between the SFF + DFF and FFCA + DQN models. The improvements are statistically significant (p < 0.05) in most density scenarios, while a few low-density and dual-exit cases show non-significant differences (p ≥ 0.05), reflecting limited congestion and smaller performance margins. The overall trend nevertheless supports the robustness and general efficiency improvement of the proposed model.
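
The improvement formula in note (1) can be checked with a short sketch. The function name `improvement_pct` is hypothetical, and the example reuses one pair of tabulated averages (Medium density, Normal group, 2 exits) rather than any per-run data, which the table does not report.

```python
def improvement_pct(t_baseline: float, t_dqn: float) -> float:
    """Relative reduction in evacuation time, in percent, per note (1)."""
    if t_baseline <= 0:
        raise ValueError("baseline evacuation time must be positive")
    return (t_baseline - t_dqn) / t_baseline * 100.0


if __name__ == "__main__":
    # Medium density, Normal group, 2 exits: 25.0 s (SFF + DFF) vs. 15.5 s (DQN).
    print(round(improvement_pct(25.0, 15.5), 1))
```

Under this formula, a negative value would indicate that the DQN time exceeds the baseline time.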

Table 6. Model performance under varying physical and learning parameters.

Parameter	Tested Range/Values	Avg. Evac. Time (s)	Std. Dev.	Convergence Stability
Velocity coefficient (v)	±30%, ±50%, baseline	68.4 → 32.6	±2.8 → ±4.2	Stable
Friction coefficient (μ)	0.2 → 0.8, ±50% of baseline	34.7 → 39.8	±3.1	Moderate sensitivity
Density (N)	160 → 480	26.2 → 78.3	±5.7 → ±7.9	Stable

NOTE: Physical parameters were adjusted according to the empirical ranges adopted in previous evacuation studies. Results show moderate sensitivity to friction and strong dependence on walking speed, consistent with Table 5 and Table A2.

Table 7. Reinforcement learning parameter sensitivity.

Parameter	Tested Range/Values	Avg. Evac. Time (s)	Episodes to Converge	Convergence Stability
Learning rate (α)	0.1/0.2/0.3/0.5	36.8/35.6/35.9/37.5	1600/1200/1400/2100	Stable (0.1–0.3), unstable (≥0.5)
Discount factor (γ)	0.90/0.95/0.98/0.99	37.8/35.5/34.9/35.3	1800/1400/1300/1500	Stable near 0.98
Exploration decay (ε)	dynamic (1 → 0.05) vs. fixed 0.1	35.6 vs. 38.2	1200 vs. 2000	Dynamic schedule more stable

NOTE: The baseline DQN configuration (α = 0.2, γ = 0.98, dynamic ε) achieved the best convergence-stability trade-off. Excessively high learning rates or fixed exploration led to oscillatory behavior.

Table 8. Reward scaling sensitivity.

Reward Scaling Factor	Avg. Evac. Time (s)	Episodes to Converge	Stability Assessment	Observed Agent Behavior
0.5× (reduced)	41.7	>2000	Unstable	Frequent oscillation; suboptimal routes
1.0× (baseline)	35.6	1200	Stable	Balanced evacuation flow
2.0× (amplified)	34.9	1300	Stable	Slightly faster convergence; reduced exploration
3.0× (over-scaled)	35.8	1700	Partially unstable	Premature convergence; reduced behavioral diversity

NOTE: Reward scaling experiments indicate that moderate amplification (≈2×) improves convergence speed, whereas excessive scaling (≥3×) reduces behavioral diversity and training stability. Proper normalization is thus critical for robust policy learning.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).