1. Introduction
Airports are high-flow, time-sensitive, and tightly coupled critical infrastructures in which passenger-facing operations depend increasingly on digital systems. Global passenger traffic reached about 9.5 billion in 2024, exceeding pre-pandemic levels, which means that even short operational disruptions can affect very large passenger volumes across the airport network [1]. Modern terminals also rely heavily on digital support for check-in, security coordination, flight information display systems, and wayfinding. During the 2024 CrowdStrike outage, failures in systems supporting booking, check-in, baggage, and crew-related operations quickly produced visible disruption and long airport queues even though aircraft systems and air traffic control were unaffected [2]. In terminal settings, such incidents can therefore appear as slower service, queue growth, congestion spillback, misrouting, and throughput loss.
Existing research has already explained several important parts of this problem but mostly in separate streams. Smart-airport cybersecurity has been framed as a question of resilience controls and operational continuity rather than a purely technical protection issue, which places cyber incidents within the broader challenge of keeping airport operations functioning under stress [3]. Discrete-event simulation has also been shown to represent ordered service stages, waiting lines, and resource use in airport terminals, establishing why DES remains a natural way to model passenger processing, queueing, and resource utilization [4]. Explicit spatial representation has likewise been shown to change how terminal dynamics are understood because local interference and route competition matter, which is why passenger movement, density, and route choice cannot be reduced to queue metrics alone [5]. Collectively, these streams explain the service-logic side and the spatial-movement side of airport terminals reasonably well.
The critical gap is that existing research still lacks a terminal-level, mechanism-based, and operationally interpretable framework that explains how a cyber disruption propagates simultaneously through service degradation, passenger movement, and behavior uncertainty under impaired guidance. Malicious events in linked infrastructures have been shown to create cascading airport impacts across interdependent systems, but that cascade perspective does not explain how disruption unfolds inside the terminal at the passenger-process level [6]. Airport-complex analysis has also been argued to require integrated modeling views rather than isolated process descriptions, which points directly to the need for a framework that can explain how queues, walking paths, and local congestion interact during disruption [7]. A second difficulty is that behavior uncertainty matters in degraded guidance scenarios, yet it cannot be handled by letting a large language model improvise freely. LLM-based simulation is useful when the model role is bounded and explicitly controlled, so behavior uncertainty must be represented in a way that remains interpretable and reproducible rather than autonomous [8].
This study addresses this gap by proposing a hybrid DES–JuPedSim–LLM framework that couples service events, passenger movement, and guidance-sensitive behavior within a common terminal representation. The framework models passenger processing and queue dynamics with DES, movement and congestion with JuPedSim, and post-security behavioral responses under degraded guidance through structured LLM-generated passenger-level decisions. The LLM is used only to support post-security decision modeling and does not control the simulation or modify service capacities. The framework is evaluated through a 500-passenger departure-terminal case study with one baseline case and four cyber-disruption cases, together with Rotterdam checkpoint validation, a Palma benchmark, and an LLM ablation study. The results show that the framework can capture baseline stability, post-attack throughput loss, subsystem-specific bottleneck shifts, spatial congestion patterns, and guidance-related post-security behavior changes. The contribution of the paper is therefore a methodological framework for exploratory airport cyber-resilience analysis under coupled service, movement, and degraded-guidance conditions.
2. Literature Review
The literature relevant to this study spans airport resilience, airport process simulation, pedestrian microsimulation, hybrid terminal modeling, and LLM-based behavior simulation. Reviewing these streams together is necessary because airport cyber disruption affects service rates, path choice, dwell behavior, and congestion at the same time. The review below is therefore organized around what each stream explains well and what each stream still leaves unresolved for a behavior-aware airport cyber-resilience case study.
2.1. Airport Cyber Resilience and Operational Disruption
Airport cyber-resilience literature explains why airport disruption should be interpreted through continuity and operational performance rather than only through technical failure. Lykou et al. [3] examined cybersecurity controls for smart airports and argued that resilience measures must be integrated with airport operations. Their study is important here because it frames cyber risk as an operational continuity problem rather than a purely technical security problem. Huang et al. [9] developed an assessment model for airport resilience and translated resilience into explicit evaluation dimensions. Their framework is useful because it gives a structured way to think about how airport performance can be judged under disruption.
Studies closer to operational disruption reinforce the continuity perspective and extend it toward terminal conditions. Zhou and Chen [10] measured airport resilience under severe weather and treated resilience as retained operational performance during disruption. Their study matters here because it shows that continuity-oriented resilience metrics remain useful even when the disruption source is not cyber. Zapola et al. [11] proposed a resilience assessment framework for airport passenger terminal operations and focused their attention directly on passenger-facing terminal functions. Their contribution is relevant because it moves resilience analysis closer to terminal operations, even though it remains framework-oriented rather than mechanism-oriented. Piekert et al. [6] examined malicious events affecting linked critical infrastructures and showed how airport impacts could cascade across interdependent systems. Their analysis is valuable because it highlights cascading disruption, but it still does not explain how the disruption propagates inside the terminal itself.
What remains underdeveloped in this literature is the terminal-level mechanism of disruption propagation. The resilience perspective is now well established, but the passenger-facing pathway from cyber degradation to queue growth, walking delay, misrouting, and local congestion has not yet been modeled in operational detail.
2.2. Discrete-Event Simulation in Airport Operations
Discrete-event simulation literature shows that airport service processes can be modeled clearly when the main concern is queueing logic, waiting time, and resource utilization. Jim and Chang [4] presented one of the early airport terminal simulators for planning and design and showed how passenger processing could be decomposed into service stages and waiting lines. Their study remains foundational because it demonstrates why DES is well suited to ordered service logic in terminals. Takakuwa and Oyama [12] analyzed international-departure passenger flow with a staged simulation structure that connected arrivals, processing times, and congestion. Their work is important because it shows how DES can represent bottleneck formation in an operationally interpretable way.
Later airport studies extended the same modeling logic to more specific operational questions. Ruiz and Cheu [13] developed a simulation model for airport security screening checkpoints and used it to evaluate checkpoint operations. Their study is relevant because it shows how DES can capture service-stage performance in passenger-facing subsystems. Dorton and Liu [14] used queueing networks and discrete-event simulation to study airport security screening checkpoint efficiency under different baggage-volume and alarm-rate conditions. Their study is especially useful here because it shows that simulation can directly analyze passenger-facing screening operations, checkpoint congestion mechanisms, and service-efficiency sensitivity in airport security settings. Oprea et al. [15] modeled passenger flows in an airport terminal with discrete simulation and evaluated waiting time and level of service. Their results reinforce the point that DES remains very effective when the goal is to understand service capacity, queueing, and process efficiency.
This leaves a clear boundary on what DES can explain by itself. It remains highly effective for service flows, queues, and resource utilization, but it becomes less informative once a disruption begins to operate through spatial movement, local crowding, spillback, and competition for access to the next subsystem.
2.3. Pedestrian Simulation and Spatial Congestion Modeling
Pedestrian simulation literature shows why explicit movement modeling is necessary when local congestion, route choice, and guidance quality matter. Fonseca et al. [5] simulated passenger flow in a hub airport and demonstrated that microscopic movement models revealed spatial interactions that queue-only models could not represent. Their work is valuable because it shows how corridor use, local interference, and movement competition shape terminal performance. Lin et al. [16] studied guiding-sign optimization for an airport terminal and linked wayfinding quality to pedestrian efficiency. Their results are especially relevant here because degraded information systems are likely to alter route choice, hesitation, and travel time before those effects appear in queue metrics.
Other pedestrian studies further show that movement behavior changes terminal outcomes in ways that aggregated process models cannot capture well. Kalakou and Moura [17] analyzed passenger behavior in airport terminals through activity preferences and showed that passenger choices affected how terminal dynamics evolved. Their study matters here because it demonstrates that behavior-sensitive movement assumptions change system outcomes. Alam et al. [18] simulated airport pedestrian movement under social-distancing conditions and emphasized density and circulation effects. Their work reinforces the point that crowding patterns and movement constraints can reshape terminal performance even before service logic is considered.
The limitation here lies in the opposite direction. Pedestrian models capture movement, density, route choice, and guidance effects well, yet they do not naturally represent first-come, first-served queues, server occupancy, or service-time degradation. On their own, they therefore cannot provide a full account of passenger-processing disruption.
2.4. Hybrid Simulation Approaches for Terminal Analysis
The hybrid airport modeling literature shows that service processes and movement processes need to be coupled when terminal interactions matter. Wu et al. [19] proposed a hybrid queue-based Bayesian network framework for passenger facilitation modeling and explicitly connected process states with passenger movement logic. Their work is important because it shows that queue dynamics and movement conditions can be analyzed within one representation rather than as disconnected layers. Metzner [20] compared agent-based and discrete-event simulation for airport terminal resilience assessment. That comparison is useful because it demonstrates that different modeling paradigms capture different mechanisms and therefore need to be combined thoughtfully.
Recent studies continue to support hybrid terminal analysis, but they do so mainly for planning and operational design purposes. Ma et al. [21] developed an integrated passenger-flow model that connected stochastic passenger behavior with terminal analysis. Their study showed that combining movement and decision logic could improve how passenger flows were represented. Anagnostopoulou et al. [22] used AI-supported simulation to analyze passenger flows in an airport terminal as a decision-making tool. Their contribution reinforced the practical value of coupled representations, but it remained focused on terminal analysis rather than cyber-resilience propagation.
The key unresolved issue is not whether hybrid modeling is useful, but what it has been used for. Most existing hybrid studies are oriented toward efficiency, capacity, planning, or design, whereas airport cyber resilience requires a mechanism-based account of how degraded passenger-facing digital functions cause service degradation and guidance uncertainty to propagate together.
2.5. LLMs and Behavior-Aware Simulation
LLM-based simulation has emerged as a promising approach for representing human-like behavior in synthetic environments, but existing studies also show that such use requires careful control and validation. Aher et al. [23] used LLMs to simulate multiple humans and replicate human-subject studies in synthetic settings, showing that LLMs can approximate plausible response patterns under structured prompts and evaluation designs. Park et al. [24] developed a generative-agent environment in which agents produced coherent social behavior over time, demonstrating the potential of LLM-driven agents while also highlighting the importance of structure and memory design.
More recent studies further emphasize the need for evaluation, grounding, and robustness checks. Park et al. [25] evaluated individual-level simulation against participant-specific responses after grounding the simulation in rich interview data. This study is especially relevant because it treats LLM-based simulation as behavior that must be validated against observed responses rather than assumed to be realistic. Xie et al. [26] examined whether LLM agents could simulate trust behavior and identified important alignment and robustness limitations. These studies suggest that free-form LLM agents may be useful for exploratory simulation, but they remain difficult to interpret, reproduce, and validate.
In transportation-related research, LLMs have also begun to be applied to cybersecurity-oriented simulation. Gao et al. [27] proposed a multi-agent framework that uses LLMs to automate traffic scenario generation, cyberattack design, and defense strategy development. This work shows the potential of LLM-based simulation for transportation cybersecurity analysis. However, its focus remains on road traffic environments and does not address airport terminal operations, where service queues, pedestrian movement, and passenger decisions are tightly coupled.
For degraded airport information environments, the relevant behavioral problem is not general human simulation, but passenger decision-making when digital guidance becomes incomplete, delayed, or unreliable. Airport wayfinding studies show that signage, flight information displays, and guidance systems affect passenger route choice, hesitation, and the need for assistance [28,29]. Passenger-behavior studies further show that travelers may visit displays, ask staff, wait, enter optional activity areas, or return to the gate depending on their activity preferences and available information [30,31]. These findings suggest that degraded information should be modeled as a behavior-sensitive disruption, not only as a reduction in service rate.
Transportation cybersecurity studies support this framing. Smart-airport cybersecurity research treats cyber incidents as operational continuity problems rather than purely technical failures [3], while malicious-event studies show that disruptions in connected infrastructure can propagate into airport operations [6]. However, existing LLM-based transportation simulation studies mainly focus on road traffic environments and do not examine passenger-level airport terminal behavior under degraded guidance. This gap motivates a constrained use of LLMs, where the model maps passenger-specific degraded-information states to bounded post-security actions rather than acting as an autonomous agent or generating unconstrained behavior.
2.6. Research Gap
Across these strands, the central gap is still the lack of an operationally interpretable airport-terminal framework that jointly represents degraded service performance, spatial passenger movement, and uncertainty in behavior under impaired guidance. Resilience studies explain why continuity matters. DES studies explain service logic. Pedestrian studies explain spatial interference. Hybrid studies explain why process and movement should be coupled. LLM studies explain why behavior uncertainty can be modeled only under explicit control. What is still missing is a terminal-level, mechanism-based, and unified framework that brings those insights together for airport cyber-disruption analysis.
3. Methodology
This section presents the hybrid simulation framework used to evaluate airport performance and resilience under cyber disruption. The framework combines two main layers: (i) a discrete-event simulation (DES) layer that models passenger-processing logic, service events, and queues, and (ii) a microscopic pedestrian layer in JuPedSim that models movement, congestion, and spatial interactions. A large language model (LLM) is used only in a bounded role to generate passenger-level post-security decisions for travelers affected by degraded display or guidance conditions. Each LLM-generated decision specifies an allowed next action, target area, dwell time, reason code, and confidence score. Service capacities and processing rules remain explicitly defined within the simulation model.
The workflow begins with scenario definition, including layout, demand, service, and cyber-disruption inputs. The DES and pedestrian layers then run in a coupled manner so that service events and physical movement remain synchronized. Under degraded display or guidance conditions, bounded LLM-based decision-making is applied to affected post-security passengers. Each simulation run is then evaluated using performance and resilience metrics.
Figure 1 summarizes this workflow and the coupling among the DES, pedestrian, and cyber layers. It shows the main information exchanges in the framework, including service-state updates from the DES layer, movement and density feedback from the pedestrian layer, and bounded passenger-level post-security decisions injected from the cyber layer for display- or guidance-affected passenger flows.
3.1. Discrete-Event Layer (DES): Queues and Service
3.1.1. Passenger Arrivals
Passenger arrivals define how travelers enter the simulation. A fixed passenger population $P$ is generated for each scenario. Each passenger $i \in P$ is assigned a release time $t_i^{\mathrm{rel}}$ and an entrance $e_i \in E$, where $E$ denotes the set of available terminal entrances. Passengers enter the terminal according to their assigned release times. This setup allows the simulation to represent gradual passenger arrivals rather than assuming that all passengers enter the terminal at the same time. The same passenger population and arrival schedule are used across nominal and disrupted scenarios, so that performance differences can be attributed to the cyber-disruption mechanisms rather than to changes in demand.
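As an illustration of the fixed-population idea, the following minimal sketch generates one seeded population and reuses it across scenarios. The uniform release window, the entrance labels, the seed, and the names `Passenger` and `generate_population` are assumptions made for this example, not the arrival profile used in the study.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Passenger:
    pid: int
    release_time: float  # seconds after simulation start
    entrance: str


def generate_population(n, entrances, window_sec, seed):
    """Draw release times uniformly over a window (illustrative assumption)
    and assign each passenger a random entrance from the given set."""
    rng = random.Random(seed)
    passengers = [
        Passenger(i, rng.uniform(0.0, window_sec), rng.choice(entrances))
        for i in range(n)
    ]
    # Passengers enter in order of release time.
    return sorted(passengers, key=lambda p: p.release_time)


# The same seed yields the same population for baseline and disrupted runs.
population = generate_population(500, ["E1", "E2"], window_sec=3600.0, seed=42)
```

Because the generator is seeded, any difference between a nominal and a disrupted run of the same population comes from the disruption settings rather than from demand.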
3.1.2. Event Scheduling and Simulation Clock
The DES layer uses an event-driven simulation clock. The model updates system states only when key events occur, such as passenger arrival, queue entry, service start, service completion, and transfer to the next stage. Between two consecutive events, the DES state remains unchanged. This event-based structure is suitable for airport passenger-processing systems because queues and services change only when passengers arrive, begin service, or complete service. It also supports coupling with the pedestrian layer: physical arrival at a service point triggers DES queue entry, while service completion releases the passenger to the next movement stage. This design keeps the service process simple, efficient, and synchronized with passenger movement.
3.1.3. Service Stations and Queues
Passenger processing is modeled through service stations and queues. The main service stations are check-in (CI), information desks (IDs), and security screening (SC). Each station has a fixed number of service channels and a nominal service rate. If all channels are busy, passengers wait in a first-come, first-served queue. Queues have finite capacity because the terminal layout provides only a limited number of queue positions. When a queue is full, incoming passengers are redirected to the upstream waiting area until space becomes available. This rule allows the model to capture spillback, where delays at one service point create congestion in nearby areas. The DES layer controls queueing and service logic, while the pedestrian layer controls movement between entrances, service stations, waiting areas, and exits.
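The event-driven clock and the finite first-come, first-served queue described above can be sketched for a single station as follows. This is a simplified illustration under stated assumptions: service time is deterministic, `queue_capacity` is at least 1, and the function name and event encoding are hypothetical rather than taken from the study's implementation.

```python
import heapq
from collections import deque

DEPART, ARRIVE = 0, 1  # departures sort first when event times tie


def simulate_station(arrival_times, service_time, channels, queue_capacity):
    """Event-driven FCFS station with a finite queue.

    Returns the service start time per passenger and the peak number of
    passengers held in the upstream waiting area (spillback).
    """
    events = [(t, ARRIVE, pid) for pid, t in enumerate(arrival_times)]
    heapq.heapify(events)
    queue, upstream = deque(), deque()
    busy, peak_upstream = 0, 0
    service_start = {}
    while events:
        now, kind, pid = heapq.heappop(events)  # clock jumps to next event
        if kind == DEPART:
            busy -= 1
        elif len(queue) < queue_capacity:
            queue.append(pid)
        else:
            upstream.append(pid)  # queue full: hold passenger upstream
        # Start service while a channel is free and someone is queued.
        while busy < channels and queue:
            nxt = queue.popleft()
            service_start[nxt] = now
            busy += 1
            heapq.heappush(events, (now + service_time, DEPART, nxt))
            if upstream:  # the freed queue slot is refilled from upstream
                queue.append(upstream.popleft())
        peak_upstream = max(peak_upstream, len(upstream))
    return service_start, peak_upstream


starts, peak = simulate_station([0, 0, 0, 0, 0], service_time=10,
                                channels=1, queue_capacity=2)
```

In this toy run, five simultaneous arrivals at a single-channel station with two queue positions leave two passengers upstream at the peak, which is the spillback effect the finite-capacity rule is meant to capture.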
3.1.4. Passenger Routing Between Service Stages
The DES layer defines the logical order in which passengers move through service stages. In the nominal process, passengers complete check-in (CI) and then proceed to security screening (SC). Some passengers may also visit the information desk (ID) before security if they need clarification or assistance. This optional branch is controlled by a predefined probability $p_{\mathrm{ID}}$: passengers not selected for this branch follow CI–SC, while selected passengers follow CI–ID–SC. The route defines the required service sequence, but the actual arrival time at each stage depends on queueing delay, walking time, congestion, and access conditions in the pedestrian layer.
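The optional information-desk branch amounts to a single Bernoulli draw per passenger. A minimal sketch, in which the function name `assign_route`, the seed, and the branch probability of 0.1 are illustrative assumptions:

```python
import random


def assign_route(rng, p_id):
    """Return the ordered service sequence for one passenger.

    With probability p_id the passenger visits the information desk (ID)
    between check-in (CI) and security screening (SC).
    """
    return ["CI", "ID", "SC"] if rng.random() < p_id else ["CI", "SC"]


rng = random.Random(7)
routes = [assign_route(rng, p_id=0.1) for _ in range(1000)]
share_with_id = sum("ID" in route for route in routes) / len(routes)
```

The route only fixes the order of stages; the time at which each stage is reached still depends on queueing and walking conditions.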
3.1.5. Cyber-Disruption Model
A cyber disruption starts at $t_a$ and degrades selected airport functions. For service-related disruptions, the affected service rate is reduced as

$$\mu_k^{\mathrm{deg}} = \alpha_k \, \mu_k, \qquad t \ge t_a,$$

where $\mu_k$ is the nominal service rate, and $\alpha_k$ is the degradation multiplier. A smaller $\alpha_k$ indicates slower service. Guidance-related disruptions are modeled through changes in passenger post-security decisions.
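A short sketch of the service-rate degradation rule, writing the nominal rate as `nominal_rate`, the multiplier as `alpha`, and the attack start as `t_attack`. These names, the restriction of `alpha` to the interval (0, 1], and the numbers in the usage lines are assumptions for illustration only.

```python
def effective_service_rate(nominal_rate, alpha, t, t_attack):
    """Service rate at time t: nominal before the attack, scaled afterwards."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1]")
    return alpha * nominal_rate if t >= t_attack else nominal_rate


# Illustrative numbers: 2 passengers/min nominal, multiplier 0.5, attack at t = 600 s.
rate_before = effective_service_rate(2.0, 0.5, t=300.0, t_attack=600.0)
rate_after = effective_service_rate(2.0, 0.5, t=900.0, t_attack=600.0)
mean_service_time_after = 60.0 / rate_after  # seconds per passenger
```

With these values the rate halves from 2 to 1 passenger per minute, so the mean service time doubles from 30 s to 60 s.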
3.2. JuPedSim Pedestrian Layer: Movement, Congestion, and Route Choice
JuPedSim, based on the Optimal Steps Model (OSM), is used to model passenger movement in the terminal layout [32]. In OSM, each passenger selects the next feasible position by minimizing a local movement cost:

$$x_i^{\mathrm{next}} = \arg\min_{x \in \mathcal{R}_i} C_i(x),$$

where $\mathcal{R}_i$ is the set of positions passenger $i$ can reach in one step, and $C_i(x)$ represents the movement cost at candidate position $x$. This cost reflects movement toward the target, obstacle avoidance, and interaction with nearby passengers.
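The step-selection rule can be illustrated with a toy version that evaluates a handful of candidate positions and keeps the cheapest one. This is a sketch under strong assumptions and does not reproduce JuPedSim's internals: the cost is simply distance to the target plus an exponential repulsion from neighbours, obstacles are omitted, and the step length, candidate count, and function name are hypothetical.

```python
import math


def osm_step(position, target, neighbours, step_length=0.7,
             n_candidates=16, repulsion=1.0):
    """Choose the lowest-cost position among staying put and points on a
    circle of radius step_length around the current position."""

    def cost(p):
        to_target = math.dist(p, target)
        crowding = sum(repulsion * math.exp(-math.dist(p, q)) for q in neighbours)
        return to_target + crowding

    candidates = [position] + [
        (position[0] + step_length * math.cos(2 * math.pi * k / n_candidates),
         position[1] + step_length * math.sin(2 * math.pi * k / n_candidates))
        for k in range(n_candidates)
    ]
    return min(candidates, key=cost)


free_step = osm_step((0.0, 0.0), (10.0, 0.0), neighbours=[])
blocked_step = osm_step((0.0, 0.0), (10.0, 0.0), neighbours=[(0.7, 0.0)])
```

With no neighbours the passenger steps straight toward the target; when another passenger occupies that spot, the repulsion term makes a sideways candidate cheaper, which is the local interference effect that queue-only models cannot represent.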
The terminal layout includes walkable areas, barriers, corridor openings, service zones, queueing regions, waiting areas, post-security activity areas, and gates. These spatial elements define JuPedSim movement stages, where each stage represents a spatial objective such as a waiting area, queue slot, service point, activity area, or gate. The DES layer controls service logic and queue release, while JuPedSim controls passenger movement and spatial occupancy.
After security screening, passengers may move to flight information displays, retail areas, food areas, restrooms, or gates. Under degraded display or guidance conditions, LLM-generated decisions may redirect affected passengers to a display, staff-help point, waiting area, optional activity area, wrong intermediate area, or directly to the gate. The possible actions and targets are predefined by the simulation model, and these decisions affect only post-security movement. They do not change DES service capacities.
The pedestrian layer records realized paths, area occupancy, display use, optional activity visits, and gate-arrival behavior. These outputs help identify whether delays are caused by queue pressure, route confusion, post-security dwell, or spatial congestion.
3.3. Coupling DES and JuPedSim
The hybrid model couples service progression in the DES layer with physical movement in the JuPedSim layer. A passenger can join a DES queue only after reaching the corresponding queue-entry region in the pedestrian map. After service is completed in the DES layer, the passenger is released back to the pedestrian layer and moves to the next spatial stage.
Queue spillback is handled through the physical capacity of each queueing area. Let $Q_k(t)$ be the DES queue length at station $k$, and let $Q_k^{\max}$ be the maximum number of passengers that can physically fit in that queueing area. The number of passengers placed in the physical queue is

$$Q_k^{\mathrm{phys}}(t) = \min\bigl(Q_k(t),\, Q_k^{\max}\bigr).$$

Passengers beyond this capacity remain in the upstream walking area and increase local pedestrian density. This coupling allows service delay, queue spillback, and walking congestion to affect one another during disrupted operation.
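The spillback rule reduces to a minimum and a remainder. A minimal sketch, with the function name and the numbers chosen only for illustration:

```python
def split_queue(des_queue_length, physical_capacity):
    """Split the logical DES queue into passengers who fit in the physical
    queueing area and passengers who spill back upstream."""
    in_physical_queue = min(des_queue_length, physical_capacity)
    spillback = des_queue_length - in_physical_queue
    return in_physical_queue, spillback


# Example: 12 passengers waiting logically, room for 8 in the queueing area.
placed, upstream = split_queue(12, 8)
```

Here 8 passengers occupy queue positions and the remaining 4 stay in the upstream walking area, where they add to local pedestrian density.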
3.4. LLM-Generated Passenger-Level Post-Security Decisions
The LLM is used only after passengers complete security and enter the post-security area. Its role is to generate structured passenger-level decisions under degraded display or guidance conditions.
For each affected passenger, the input state includes the intended gate, time to departure, display status, guidance status, local density, display-area density, gate-area density, current time, scenario, and previous action. The LLM returns one decision with five fields: action, next_target, dwell_time_sec, reason_code, and confidence. The possible actions and targets are predefined by the simulation model.
The prompt used for passenger-level post-security decision generation is shown below.
Generate one passenger-level post-security decision for an airport disruption scenario.
You are simulating one passenger inside an airport terminal.
The passenger has already passed security and is now in the post-security area.
Current passenger state:
* Passenger ID: {passenger_id}
* Intended destination: {intended_destination}
* Time to departure: {time_to_departure}
* Previous action: {previous_action}
Current airport condition:
* Scenario: {scenario}
* Display status: {display_status}
* Guidance status: {guidance_status}
* Local/display/gate density: {density_state}
Choose one allowed action and target:
* action: gate/display/staff/wait/shop/food/restroom/wrong route
* next target: gate/display area/staff help/shop/food/restroom/wrong area
* dwell time: 0–180 s
Return ONLY JSON:
{"action": "…", "next_target": "…", "dwell_time_sec": number, "reason_code": "…", "confidence": number}
The generated decision is applied only in the JuPedSim layer. A passenger may go directly to the gate, check a display, ask staff, wait, visit a shop, food area, or restroom, or first move to a wrong intermediate area before being redirected to the gate. These decisions change only the passenger’s post-security movement path and dwell time.
For reproducibility, each run records the prompt template, model setting, passenger state, random seed, and generated decision. The ablation study compares these LLM-generated decisions with rule-based, random, and no-guidance baselines.
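One way to keep the LLM role bounded is to validate every returned decision against the predefined action and target sets before it reaches the movement layer. The sketch below is an assumption about how such a guard could look, not the study's code: the allowed values and the 0–180 s dwell range are taken from the prompt, while the fallback to the gate, the [0, 1] confidence range, and the function name are hypothetical.

```python
import json
import math

ALLOWED_ACTIONS = {"gate", "display", "staff", "wait", "shop", "food",
                   "restroom", "wrong route"}
ALLOWED_TARGETS = {"gate", "display area", "staff help", "shop", "food",
                   "restroom", "wrong area"}
# Assumed fallback: send the passenger to the gate when output is unusable.
FALLBACK = {"action": "gate", "next_target": "gate", "dwell_time_sec": 0.0,
            "reason_code": "fallback", "confidence": 0.0}


def parse_decision(raw_text):
    """Parse one LLM reply; return a bounded decision or the fallback."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)
    action, target = data.get("action"), data.get("next_target")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return dict(FALLBACK)
    if not isinstance(target, str) or target not in ALLOWED_TARGETS:
        return dict(FALLBACK)
    try:
        dwell = float(data.get("dwell_time_sec", 0.0))
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return dict(FALLBACK)
    if not (math.isfinite(dwell) and math.isfinite(confidence)):
        return dict(FALLBACK)
    return {
        "action": action,
        "next_target": target,
        "dwell_time_sec": min(max(dwell, 0.0), 180.0),  # clamp to 0-180 s
        "reason_code": str(data.get("reason_code", "")),
        "confidence": min(max(confidence, 0.0), 1.0),    # assumed [0, 1]
    }


decision = parse_decision(
    '{"action": "display", "next_target": "display area", '
    '"dwell_time_sec": 240, "reason_code": "display_off", "confidence": 0.8}'
)
```

Logging both the raw reply and the parsed decision alongside the seed would support the reproducibility records described above.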
3.5. Performance, Validation, and Benchmark Metrics
This study evaluates airport performance and resilience through operational, spatial, validation, and benchmark metrics. Here, resilience is interpreted as the ability of the terminal system to retain service performance under cyber disruption. The main operational metrics are system-exit throughput, queue length, waiting time, and total completion time.
Let $N(t)$ denote the cumulative number of passengers who have exited the modeled system by time $t$. System-exit throughput over a sampling interval $\Delta t$ is computed as

$$\mathrm{TP}(t) = \frac{N(t) - N(t - \Delta t)}{\Delta t}.$$

Queue burden is measured by the average and peak queue length at each service subsystem. Passenger waiting time is measured as the time between joining a queue and starting service. Total completion time is defined as the time when the last passenger exits the modeled system:

$$T_{\mathrm{total}} = \max_{i \in P} t_i^{\mathrm{exit}},$$

where $t_i^{\mathrm{exit}}$ is the exit time of passenger $i$.
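Both operational metrics can be computed directly from the list of passenger exit times. The sketch below assumes that each sampling interval is the half-open window ending at the sample time and that at least one passenger exits; the function name and the toy exit times are illustrative.

```python
from bisect import bisect_right


def throughput_series(exit_times, dt):
    """Return [(t, exits per unit time in (t - dt, t])] and the completion time."""
    exits = sorted(exit_times)
    completion_time = exits[-1]  # time at which the last passenger exits
    series = []
    k = 1
    while (k - 1) * dt < completion_time:
        t = k * dt
        # Cumulative exit counts at the two ends of the sampling interval.
        n_now = bisect_right(exits, t)
        n_prev = bisect_right(exits, t - dt)
        series.append((t, (n_now - n_prev) / dt))
        k += 1
    return series, completion_time


series, t_total = throughput_series([10, 20, 30, 70], dt=30)
```

For these four toy exit times the three 30 s windows contain 3, 0, and 1 exits, and the completion time is 70 s.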
Spatial outputs from the pedestrian layer include passenger paths, area occupancy, display use, optional activity visits, and gate-arrival behavior. These outputs help explain whether a disruption causes delay mainly through queue buildup, route confusion, post-security dwell, or spatial congestion.
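For empirical validation, simulated outputs are compared with observed airport data using MAE and RMSE:

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2},$$

where $y_j$ is the observed value and $\hat{y}_j$ is the simulated value. Additional validation measures include calibration error, $R^2$, KS distance, and Wasserstein distance when matched observed data are available.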
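The two error measures are straightforward to compute for matched observed and simulated values. A minimal sketch using only the standard library; the sample numbers are placeholders for illustration and are not data from the study.

```python
import math


def mae(observed, simulated):
    """Mean absolute error between matched observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root mean squared error between matched observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


# Placeholder waiting times in seconds.
observed = [10.0, 12.0, 15.0]
simulated = [11.0, 12.0, 13.0]
mae_value = mae(observed, simulated)
rmse_value = rmse(observed, simulated)
```

RMSE is never smaller than MAE and penalizes large individual errors more heavily, so reporting both indicates whether the mismatch is spread evenly or concentrated in a few observations.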
For benchmark analysis, service utilization is computed as

$$U_k = \frac{B_k}{A_k},$$

where $B_k$ is accumulated busy server time, and $A_k$ is accumulated available server time at subsystem $k$.
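Utilization can be accumulated from the service intervals logged by the DES layer. The sketch assumes that every channel is available for the whole observation horizon, so available server time is the number of channels multiplied by the horizon; the function name and the intervals are illustrative.

```python
def utilization(service_intervals, channels, horizon):
    """Ratio of accumulated busy server time to accumulated available time.

    service_intervals: (start, end) pairs, one per served passenger.
    Assumes all channels are available for the full horizon.
    """
    busy_time = sum(end - start for start, end in service_intervals)
    available_time = channels * horizon
    return busy_time / available_time


# Three services on a two-channel station observed for 100 s.
u = utilization([(0, 30), (10, 50), (60, 90)], channels=2, horizon=100)
```

Here 100 s of busy time against 200 s of available server time gives a utilization of 0.5.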
3.6. Implementation and Scenario Execution
The framework is implemented as a coupled simulation pipeline. Each scenario is defined by the terminal layout, passenger population, service rates, queue capacities, attack start time $t_a$, degradation factors $\alpha_k$, and post-security decision rules for guidance-affected passengers. The same layout and demand settings are used for the baseline and disrupted scenarios.
For each run, the model records operational and spatial outputs, including queue lengths, system exits, throughput, passenger trajectories, area occupancy, post-security activity use, and total processing time. These outputs are used to compare nominal and disrupted conditions and to evaluate how cyber degradation affects both service logic and passenger movement.
3.7. Experiment Protocol
For each scenario, the simulation follows the same procedure. First, the baseline DES + JuPedSim case is run under normal service and guidance conditions. Second, the disrupted scenario is run by applying the specified service degradation factors and, when applicable, degraded display or guidance conditions. Third, for passengers affected by degraded guidance after security, the LLM generates structured post-security decisions, including action, target, dwell time, reason code, and confidence. These decisions are then applied only in the JuPedSim movement layer.
Each scenario is repeated under multiple random seeds. The reported results are averaged across repeated runs. The final outputs include throughput, queue length, waiting time, completion time, passenger trajectories, area occupancy, and post-security activity use.
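Averaging across seeds can be sketched as a small aggregation step over per-run metric dictionaries. Reporting the sample standard deviation next to the mean is an addition suggested here, not something the protocol states; the function name and the numbers are placeholders, not results from the study.

```python
import statistics


def summarize_runs(runs):
    """Mean and sample standard deviation of each metric across seeded runs."""
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (statistics.mean(values), spread)
    return summary


# Placeholder outputs from three seeds of one scenario.
runs = [
    {"throughput": 10.0, "mean_wait_sec": 110.0},
    {"throughput": 12.0, "mean_wait_sec": 120.0},
    {"throughput": 14.0, "mean_wait_sec": 130.0},
]
summary = summarize_runs(runs)
```

Keeping the spread alongside the mean makes it easier to judge whether a difference between two scenarios exceeds seed-to-seed variation.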
4. Case Studies
This section applies the proposed hybrid DES–JuPedSim–LLM framework through three case studies. Case Study 1 uses a 500-passenger departure-terminal scenario to evaluate cyber-disruption effects. Case Study 2 uses the Rotterdam The Hague Airport checkpoint dataset to compare the security-checkpoint queue and service logic with observed passenger-level data. Case Study 3 uses published Palma de Mallorca Airport parameters to examine whether the simulation logic produces plausible queue, waiting-time, throughput, and utilization patterns under real-airport operating assumptions.
4.1. Case Study 1: Cyber-Disruption Terminal Simulation
Case Study 1 evaluated how cyber-induced degradation affects passenger processing and movement in a 500-passenger departure-terminal scenario. The study included one baseline scenario and four disruption scenarios: check-in degraded, guidance degraded, security degraded, and all degraded.
The terminal layout used in Case Study 1 is shown in Figure 2 and Figure 3. Figure 2 shows the main spatial components, including entrances, waiting areas, ticket counters, information desks, security stations, post-security activity areas, restrooms, and gates. Figure 3 shows the feasible passenger routes encoded in the layout. These layout elements define the physical environment used by the coupled DES and JuPedSim model.
Table 1 summarizes the main layout and simulation inputs used in Case Study 1, including terminal size, service resources, queue capacities, destinations, and pedestrian movement assumptions.
Table 2 summarizes the main behavioral and disruption parameters used in Case Study 1. The parameter values are based on published airport simulation studies, airport passenger-behavior studies, and reported airport cyber/IT disruption evidence. The check-in settings follow prior airport check-in modeling and reported manual check-in delays during the Collins Aerospace cyberattack [33,34,35]. The degraded security setting is treated as a stress scenario informed by reported cyber-disrupted terminal operations at BER [36,37]. The information-desk, display-use, wrong-route, optional-activity, and dwell-time assumptions are informed by airport capacity, wayfinding, passenger-activity, flight-information display, and dwell-time studies [28,29,30,31,38,39,40].
Cyber-Disruption Simulation Results
Figure 4 compares average system-exit throughput across the baseline and disruption scenarios. The baseline case has the highest throughput at 21.68 passengers/h. Guidance disruption causes only a small reduction to 21.32 passengers/h. In contrast, check-in and security disruptions reduce throughput substantially, to 11.65 and 11.36 passengers/h, respectively. The combined disruption produces the lowest throughput at 11.31 passengers/h. These results show that throughput loss is driven mainly by degraded check-in and security processing. Guidance disruption alone has a limited effect on total throughput, and the combined case remains close to the check-in and security disruption cases because the main bottleneck is already created by degraded service capacity.
Figure 5 compares average check-in and security queue lengths across the baseline and disruption scenarios. Under baseline conditions, the average queue lengths are 3.589 passengers at check-in and 3.104 at security. When check-in service is degraded, the check-in queue increases to 3.823, while the security queue decreases to 1.237 because fewer passengers reach the downstream security stage. When security service is degraded, the security queue increases to 3.722, while the check-in queue remains close to baseline at 3.413. In the guidance-degraded case, both queues sit near the upper end of the range observed across the scenarios, at 3.921 for check-in and 3.707 for security. The combined case produces the highest average values at both stages, with queue lengths of 3.935 at check-in and 3.843 at security.
Figure 6 compares total completion time across the baseline and disruption scenarios. The baseline case clears the modeled passenger population in 23.06 h, while guidance degradation causes only a small increase to 23.45 h. In contrast, check-in and security degradation substantially increase completion time to 43.29 h and 44.00 h, respectively. The combined case produces the longest completion time at 44.21 h. These results show that system clearing time is driven mainly by degradation at the main service stations, while guidance degradation alone has a limited effect on total completion time.
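The link between completion time and average throughput can be checked with simple arithmetic. The sketch below assumes that average system-exit throughput equals the 500-passenger population divided by the clearing time; the paper does not state this definition, so it is an assumption made here.

```python
PASSENGERS = 500  # scenario size stated for Case Study 1

# Completion times in hours, as reported for Figure 6.
completion_time_h = {
    "baseline": 23.06,
    "guidance_degraded": 23.45,
    "check_in_degraded": 43.29,
    "security_degraded": 44.00,
    "all_degraded": 44.21,
}


def implied_throughput(hours: float, passengers: int = PASSENGERS) -> float:
    """Average passengers per hour if the whole population clears in `hours`."""
    return passengers / hours


implied = {
    name: round(implied_throughput(h), 2)
    for name, h in completion_time_h.items()
}

# Clearing time relative to the baseline scenario.
slowdown = {
    name: round(h / completion_time_h["baseline"], 2)
    for name, h in completion_time_h.items()
}
```

Under this assumption the ratio matches the reported throughput to two decimals for four of the five scenarios. For the check-in-degraded case it gives about 11.55 passengers/h against the reported 11.65, so the reported averages may be computed slightly differently.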
Figure 7 compares average and 95th-percentile waiting times at check-in and security. Guidance degradation keeps waiting times close to baseline, while service degradation shifts delay toward the affected stage. Under check-in degradation, the average check-in wait rises to 19.86 min and the P95 check-in wait rises to 23.39 min, while security waiting decreases because fewer passengers reach the downstream stage. Under security degradation, the opposite pattern appears, with the average security wait rising to 19.79 min and the P95 security wait rising to 23.50 min. In the combined case, both check-in and security waits remain high, showing that multi-stage service degradation distributes delay across both major processing stages.
Figure 8 shows system-exit throughput over time for the baseline and four disruption scenarios. The baseline case remains relatively stable after the initial transient. In contrast, the check-in-degraded, security-degraded, and combined-degraded cases show a clear throughput drop after the attack onset and remain in a lower operating range for the rest of the simulation. The guidance-degraded case stays comparatively close to the baseline trajectory, indicating that degraded guidance alone has a smaller effect on throughput than degraded processing capacity. This figure complements the aggregate results by showing not only the magnitude of throughput loss but also when the change occurs and how long it persists.
Figure 9 shows spatial congestion heatmaps for the four disruption scenarios. Check-in degradation concentrates congestion near the entrance and check-in waiting area, while security degradation shifts the main hotspot to the security subsystem. The guidance-degraded case remains lighter and more dispersed, consistent with its smaller effect on throughput and waiting-time metrics. In the combined case, both check-in and security hotspots appear in the same layout, showing that congestion is sustained across both major processing stages. These spatial patterns reveal where disruption burdens form and show that the framework identifies bottleneck locations and upstream–downstream congestion effects, rather than only reporting aggregate throughput loss.
4.2. Case Study 2: Rotterdam Checkpoint Validation
Case Study 2 validated the security-checkpoint queue and service logic using the Rotterdam The Hague Airport security-checkpoint dataset released with Janssen et al. [41,42]. The dataset provides passenger-level timing observations from a real airport checkpoint, allowing simulated outputs to be compared with observed security-stage time, occupation, and throughput.
The Rotterdam data contain 2277 passenger records organized into 13 observation blocks. A leave-one-block-out validation procedure was used. For each held-out block, the security-stage time distribution was calibrated using the remaining blocks, and the observed passenger arrivals to the checkpoint were used as the simulation input stream. The security-only DES model was then run with 30 stochastic replications, and the simulated security-stage time, occupation, and throughput were compared with the observed values. Checkpoint occupation was used as a queue-related proxy.
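The leave-one-block-out loop can be sketched as below. The toy timing values are invented for illustration and are not the Rotterdam data, and fitting a mean stands in for the full distribution calibration used in the study.

```python
from statistics import mean


def leave_one_block_out(blocks: dict):
    """Yield (held_out_id, calibration_times, held_out_times) for each block."""
    for held_out in blocks:
        calibration = [
            t
            for block_id, times in blocks.items()
            if block_id != held_out
            for t in times
        ]
        yield held_out, calibration, blocks[held_out]


# Toy security-stage times in seconds, keyed by observation block.
toy_blocks = {
    "b1": [60.0, 75.0, 90.0],
    "b2": [55.0, 80.0],
    "b3": [70.0, 95.0, 100.0],
}

for block_id, calibration, held_out in leave_one_block_out(toy_blocks):
    calibrated_mean = mean(calibration)  # placeholder for a distribution fit
    error = abs(calibrated_mean - mean(held_out))
    print(block_id, round(calibrated_mean, 2), round(error, 2))
```

Because each block is scored only against parameters fitted on the other blocks, the reported errors reflect out-of-sample agreement rather than fit to the calibration data.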
Table 3 reports the validation results against the Rotterdam checkpoint observations. The calibrated security-stage timing distribution closely matches the observed data, with a timing MAE of 16.11 s, an RMSE of 23.26 s, and a timing-distribution KS distance of 0.132. The throughput comparison is also close in aggregate: under 15 min buckets, the throughput MAE is 7.50 passengers/h and the corresponding goodness-of-fit statistic reaches 0.933. Overall, the validation supports the security-checkpoint queue and service component of the model.
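The bucketed throughput comparison can be sketched as follows. The exit times are toy values, and the function names are illustrative; only the 15 min bucket width and the passengers/h unit come from the text.

```python
from collections import Counter

BUCKET_S = 15 * 60  # 15 min validation buckets, in seconds


def hourly_throughput(exit_times_s, n_buckets: int):
    """Count exits per bucket and convert each count to passengers per hour."""
    counts = Counter(int(t // BUCKET_S) for t in exit_times_s)
    return [counts.get(i, 0) * 3600 / BUCKET_S for i in range(n_buckets)]


def mae(observed, simulated) -> float:
    """Mean absolute error between two equally long series."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


# Toy exit timestamps in seconds from the start of the observation window.
observed_exits = [30, 400, 850, 910, 1500, 1790]
simulated_exits = [45, 380, 700, 950, 1600, 1850]

obs = hourly_throughput(observed_exits, n_buckets=3)
sim = hourly_throughput(simulated_exits, n_buckets=3)
bucket_mae = mae(obs, sim)
```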
Figure 10 compares observed and simulated checkpoint throughput using 15 min validation buckets. The simulated mean closely follows the observed Rotterdam throughput series, with only short-period deviations. This visual agreement is consistent with the quantitative validation results in Table 3, where the 15 min throughput MAE is 7.50 passengers/h and the throughput goodness-of-fit statistic is 0.933.
Figure 11 compares observed and simulated checkpoint occupation using 15 min validation buckets. The simulated series captures the general scale of and variation in the observed occupation pattern, although some bucket-level differences remain. Because the public Rotterdam data do not provide a separate pre-entry queue length, occupation is used as a queue-related proxy rather than as a direct observed queue-length measure.
Figure 12 compares the empirical cumulative distributions of observed and simulated security-stage times. The two curves closely overlap across most of the distribution, consistent with the KS distance of 0.132 and Wasserstein distance of 16.11 s reported in Table 3. This result shows that the validation captures not only the mean timing value but also the overall shape of the passenger-level timing distribution.
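Both distribution-fit measures can be computed directly from the two empirical cumulative distributions. The following is a minimal standard-library sketch of the two-sample KS distance and the 1-Wasserstein distance; it is not the authors' implementation, and the sample values are toy data.

```python
from bisect import bisect_right


def _ecdf(sorted_sample, x: float) -> float:
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_distance(a, b) -> float:
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)


def wasserstein_1(a, b) -> float:
    """Area between the two empirical CDFs (1-Wasserstein distance)."""
    a, b = sorted(a), sorted(b)
    pooled = sorted(a + b)
    total = 0.0
    for left, right in zip(pooled, pooled[1:]):
        # Both ECDFs are constant on [left, right), so the gap is fixed there.
        total += abs(_ecdf(a, left) - _ecdf(b, left)) * (right - left)
    return total


observed_s = [1.0, 2.0, 3.0]   # toy observed stage times
simulated_s = [2.0, 3.0, 4.0]  # toy simulated stage times
ks = ks_distance(observed_s, simulated_s)
w1 = wasserstein_1(observed_s, simulated_s)
```

The KS distance is unitless and sensitive to the single largest gap, while the Wasserstein distance carries the units of the data (seconds here) and accumulates the gap over the whole distribution, which is why the two are usefully reported together.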
4.3. Case Study 3: Palma Benchmark and Supporting Evidence
Case Study 3 provided a supporting benchmark for the simulation logic using published Palma de Mallorca Airport parameters and related operational evidence. The Palma benchmark was used as the main normal-operation reference because it provides published arrival profiles, routing split, service rates, dynamic security resources, and benchmark queue statistics [43]. Additional evidence from TSA and CATSA provided broader throughput and wait-time context [44,45,46], while SEA, Collins, and EUROCONTROL records support the plausibility of cyber-disruption scenarios involving degraded guidance, manual check-in, and multi-system passenger-processing disruption [34,47,48,49].
The benchmark followed a five-part DES structure: departure hall, check-in, corridor, security control, and boarding gates. Passenger arrivals used the published Weibull profiles for international and Schengen/domestic flows. The routing split followed the Palma study, with 87% of passengers entering check-in and 13% proceeding directly to security. Check-in and security processing times, reference capacities, and dynamic security desk rules are summarized in Table 4.
Figure 13 shows the five-cell DES benchmark structure adapted from the Palma passenger-flow model.
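The arrival and routing inputs can be sketched as a simple passenger generator. Only the 87%/13% routing split comes from the text; the Weibull scale and shape values below are placeholders rather than the published Palma parameters, and how the Weibull profile maps to clock time is also an assumption.

```python
import random

CHECK_IN_SHARE = 0.87  # routing split reported for the Palma benchmark


def generate_passengers(n: int, seed: int,
                        weibull_scale_min: float = 60.0,
                        weibull_shape: float = 2.0):
    """Sample arrival times and first destinations for n passengers.

    The Weibull parameters are illustrative placeholders.
    """
    rng = random.Random(seed)
    passengers = []
    for pid in range(n):
        arrival = rng.weibullvariate(weibull_scale_min, weibull_shape)
        first_stop = "check_in" if rng.random() < CHECK_IN_SHARE else "security"
        passengers.append(
            {"id": pid, "arrival_min": arrival, "first_stop": first_stop}
        )
    # Sort by arrival time so the list can feed an event queue directly.
    return sorted(passengers, key=lambda p: p["arrival_min"])


passengers = generate_passengers(500, seed=1)
check_in_share = sum(
    p["first_stop"] == "check_in" for p in passengers
) / len(passengers)
```

With 500 passengers the realized share fluctuates around 0.87 from seed to seed, which is one reason the benchmark is run over multiple seeds.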
The Palma benchmark was executed under three normal-operation arrival settings: international, Schengen/domestic, and mixed demand. Each setting used 500 passengers and 30 random seeds. The international and Schengen/domestic cases used two check-in desks, while the mixed case was tested with 2, 4, 6, and 8 open check-in desks.
Table 5 summarizes the two single-profile runs. In both cases, the realized routing pattern follows the Palma assumption that most passengers enter check-in before security. With two check-in desks, check-in utilization is high, reaching 0.955 in the international case and 0.927 in the Schengen/domestic case. This produces large check-in queues and long check-in waits, while security queues remain small because dynamic security resources absorb the downstream flow.
Table 6 reports the mixed-demand sensitivity analysis. Increasing the number of open check-in desks from two to eight reduces the average check-in queue from 128.36 to 7.01 passengers and the average check-in wait from 67.43 to 0.61 min. Check-in utilization also decreases from 0.941 to 0.547. These results show that queue formation is mainly driven by near-saturated check-in capacity; once additional check-in desks are opened, the upstream bottleneck is largely relieved.
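Why queues grow so sharply near saturation can be illustrated with the stationary M/M/c (Erlang C) formulas. This is a textbook approximation offered for intuition only: the benchmark uses time-varying Weibull arrivals and dynamic resources, so it is not an M/M/c system, and the rates below are hypothetical.

```python
from math import factorial


def erlang_c_queue(arrival_rate: float, service_rate: float, servers: int):
    """Return (utilization, mean queue length, mean wait) for an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    rho = offered_load / servers
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    tail = offered_load ** servers / (factorial(servers) * (1.0 - rho))
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)          # probability an arrival must queue
    mean_queue = p_wait * rho / (1.0 - rho)
    return rho, mean_queue, mean_queue / arrival_rate


# Hypothetical rates: 1.9 arrivals/min, 1 passenger/min per desk.
for desks in (2, 4, 6, 8):
    rho, lq, wq = erlang_c_queue(1.9, 1.0, desks)
    print(desks, round(rho, 3), round(lq, 3), round(wq, 3))
```

The mean queue length scales with 1 / (1 - rho), so a stage operating above 0.9 utilization is disproportionately sensitive to small capacity changes, which is qualitatively the pattern seen when desks are added.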
Figure 14 shows service utilization profiles for the mixed-demand Palma benchmark under different check-in desk configurations. With two check-in desks, check-in utilization remains near saturation for a long period, which is consistent with the large check-in queues reported in Table 6. As additional check-in desks are opened, check-in utilization decreases and the upstream bottleneck is relieved. Security utilization remains lower in comparison, indicating that the main capacity constraint in this benchmark is the check-in stage rather than security screening.
4.4. Ablation Study of the LLM Decision Module
This section evaluates the contribution of the LLM decision module through an ablation study. The goal is to test whether LLM-generated post-security decisions produce different passenger behavior from simpler alternatives. The comparison includes four variants: an LLM decision variant, a rule-based variant, a random-decision variant, and a no-guidance-effect variant. The analysis focuses on post-security outcomes because the LLM acts only after passengers complete security and does not change DES service capacity or queue rules.
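The non-LLM comparison variants can be sketched as interchangeable policy functions. The state keys, thresholds, and rules below are hypothetical and are not the baselines used in the study; the point is only that every variant shares one interface, so the LLM variant can be swapped in as another callable.

```python
import random

ACTIONS = [
    "go_to_gate", "check_display", "ask_staff",
    "visit_optional_activity", "wrong_intermediate_area",
]


def no_guidance_policy(state: dict, rng: random.Random) -> str:
    """Ignore guidance effects: every passenger walks straight to the gate."""
    return "go_to_gate"


def random_policy(state: dict, rng: random.Random) -> str:
    """Pick any allowed action uniformly at random."""
    return rng.choice(ACTIONS)


def rule_based_policy(state: dict, rng: random.Random) -> str:
    """Fixed, hypothetical rules keyed on time pressure and display status."""
    if state["minutes_to_boarding"] < 20:
        return "go_to_gate"
    if not state["display_working"]:
        return "ask_staff"
    return "check_display"


# An LLM-backed policy would be a fourth callable with the same signature.
VARIANTS = {
    "rule_based": rule_based_policy,
    "random": random_policy,
    "no_guidance": no_guidance_policy,
}
```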
Table 7 reports the LLM inference settings used in the ablation study. The model was accessed through the DeepSeek API with temperature set to zero and a fixed top-p setting to reduce sampling variability. Randomness in the experiments comes from the simulation seeds used for passenger generation, routing, and dwell-time realization.
The ablation used 20 post-security passengers for each guidance-sensitive scenario.
Table 8 compares how the LLM, rule-based, random, and no-guidance variants affect post-security actions and time outcomes. The table is intended to show whether the LLM module changes the specific behavior layer it controls: where passengers go after security and how long they remain in the post-security area. The key comparisons are the action counts for display checking, staff assistance, wrong-route movement, and optional activities, as well as the resulting dwell time, post-security time, and gate delay.
The selected paired t-tests in Table 9 compare the LLM decision variant with the rule-based, random, and no-guidance baselines using matched seeds. The results are consistent with the descriptive ablation results in Table 8. For both display-degraded and combined-degraded scenarios, the LLM variant produces statistically distinguishable post-security travel times compared with the random and no-guidance baselines. The LLM variant also shows significant differences in wrong-route counts compared with the rule-based baseline. These results indicate that the LLM decision module does not simply reproduce rule-based or random behavior. Instead, it generates a distinct post-security behavior pattern, mainly by changing dwell burden, wrong-route behavior, and gate-arrival delay.
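A matched-seed paired comparison reduces to a one-sample test on the per-seed differences. The sketch below computes the paired t statistic and its degrees of freedom with the standard library; the travel-time values are toy numbers, not results from Table 9.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Toy post-security travel times (min), one pair per matched seed.
random_variant = [10.0, 12.0, 14.0, 16.0]
llm_variant = [9.0, 10.0, 13.0, 13.0]

t_stat, dof = paired_t_statistic(random_variant, llm_variant)
```

Converting the statistic to a p-value requires the Student t distribution, for example via `scipy.stats.ttest_rel`. Pairing on seeds removes between-seed variation from the comparison, which matters when each scenario has only a small number of passengers.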
Overall, the ablation results demonstrate the added value of the LLM decision module. Compared with the random baseline, the LLM variant produces lower dwell burden, shorter post-security travel time, and lower gate delay, showing that its decisions are more structured than simple random action selection. Compared with the rule-based baseline, the LLM variant produces fewer wrong-route decisions while still generating diverse post-security behaviors, including display checking, staff assistance, and optional activity visits. Compared with the no-guidance baseline, the LLM variant captures guidance-related detours and dwell behavior that would otherwise be absent. These results show that the LLM module contributes a more context-sensitive and operationally interpretable behavior layer for degraded guidance conditions, while DES and JuPedSim continue to control service processing and physical movement.
5. Conclusions
This paper presented a hybrid simulation framework for airport cyber-resilience assessment by integrating DES-based passenger-processing logic, JuPedSim-based pedestrian movement, and LLM-supported post-security decision modeling. The framework captures both operational effects, such as degraded service rates, queue growth, and throughput loss, and spatial-behavioral effects, such as detours, dwell, route confusion, and congestion in terminal space. The main contribution is a coupled methodology for examining how cyber disruption propagates through service processes, passenger movement, and degraded guidance conditions.
A key contribution is the use of structured passenger-specific LLM decisions to represent post-security behavior under degraded display and guidance conditions. The LLM does not control the full simulation or modify service capacities. Instead, it maps passenger-specific local states to predefined post-security actions, such as going to the gate, checking a display, asking staff, waiting, visiting optional activity areas, or first moving to a wrong intermediate area. This design allows the framework to include guidance-sensitive behavior while keeping the simulation interpretable and auditable.
The case-study results show that the framework produces consistent disruption mechanisms. Check-in and security degradation create major throughput loss and longer completion times, while guidance degradation alone has a smaller effect on aggregate throughput but changes post-security behavior. The spatial heatmaps further show where bottlenecks form and how congestion shifts across the terminal. These findings demonstrate that the framework does more than report aggregate performance loss; it identifies how and where disruption burdens emerge.
The additional validation and benchmark results further support the framework. The Rotterdam checkpoint validation shows that the security-processing component can reproduce observed passenger-level timing and throughput patterns with reasonable MAE, RMSE, calibration error, and distribution-fit results. The Palma benchmark shows that the simulation logic produces plausible queue, waiting-time, throughput, and utilization patterns when parameterized with published real-airport inputs. The LLM ablation study further demonstrates the added value of the LLM decision module. Compared with the random baseline, the LLM variant produces lower dwell burden, shorter post-security travel time, and lower gate delay. Compared with the rule-based baseline, it produces fewer wrong-route decisions while still generating diverse post-security behaviors, including display checking, staff assistance, and optional activity visits. These results show that the LLM module provides a more context-sensitive and operationally interpretable behavior layer than simple random or fixed-rule alternatives.
Future work should extend the framework with broader airport-wide calibration, larger passenger volumes, richer recovery strategies, and additional real-world disruption datasets. These extensions would allow the model to support more detailed resilience planning and mitigation analysis across different airport terminal settings.