1. Introduction
Hospitals face the dual challenge of delivering timely, high-quality care while operating under constraints of finite bed availability, limited staffing, and volatile patient inflows. Among the most critical operational problems is scheduling—the real-time assignment of patients to resources such as emergency department (ED) beds, wards, intensive care units (ICUs), clinicians, and diagnostic services—where errors can directly translate into preventable harm.
Real-world clinical evidence highlights the scale of this challenge. In acute care environments like EDs and ICUs, inefficiencies can rapidly become dangerous: in the U.S., median ED boarding times (i.e., the time admitted patients wait in the ED before transfer to inpatient beds) are around 2.0 h, with the upper spread (95th percentile) approaching 8 h during periods of high occupancy. When hospital occupancy exceeds 85%, boarding times routinely surpass the widely referenced 4-hour threshold and have been reported to reach approximately 6.6 h. Delays, misallocations, and rejections from ICU admission are also common: one study found that nearly 47% of ED requests for ICU admission were denied or delayed due to capacity constraints. These patterns are echoed internationally, with systematic reviews linking ED crowding to increased waiting times, elevated rates of patients leaving without being seen, higher morbidity and mortality, and poorer patient satisfaction. In many hospitals, approximately 10% of patients depart before clinical evaluation during peak load periods. ICUs, which typically represent only 10–15% of hospital bed capacity, are recurrent bottlenecks; constrained critical care resources often force EDs to “board” critically ill patients or delay ICU transfers, increasing clinical risk [1,2,3,4,5].
These pressures highlight the urgent need for scheduling methods that jointly optimize efficiency by reducing delays and diversions, equity by balancing waiting times across triage groups, and safety by ensuring accurate ICU placement and mitigating staff overload.
Historically, hospitals have relied on heuristics such as First-Come-First-Served (FCFS) or fixed triage queues, which are simple to defend but structurally myopic, ignoring demand variability, resource contention, and staff fatigue. During surges such as seasonal outbreaks, static queues fail to redistribute loads or anticipate peaks, often leading to excessive waiting, walkouts, and ICU congestion [1,2,3]. Recent data-driven approaches have also demonstrated the value of learning-based decision systems in hospital operations, including RL-guided treatment strategies in ICU care [6] and fairness-aware queueing control for clinical scheduling [7]. Classical optimization techniques, including mixed-integer and linear programming, have improved planned operations but often lack scalability in noisy real-time ED/ICU contexts [8,9]. These limitations underscore the need for adaptive, data-driven methods that can anticipate demand and dynamically rebalance resources.
Reinforcement learning (RL) offers a natural fit by framing scheduling as a sequential decision-making problem under uncertainty, optimizing long-run outcomes rather than immediate throughput.
It is important to note that all numerical performance metrics reported later in this paper, e.g., a baseline mean waiting time of 215.3 min under FCFS, are not clinical statistics from real hospitals but values generated by our benchmark-aligned synthetic dataset. These simulation baselines reflect the assumed demand intensity, capacity levels, and triage distributions encoded in the synthetic environment and should be interpreted as internally consistent reference points rather than real-world hospital measurements.
In this study, we propose a fairness-aware Deep Q-Network (DQN) tailored to hospital scheduling, where the state representation integrates patient attributes (triage, waiting time, expected length-of-stay), system attributes (bed availability, ICU pressure, staff workload), and forecast attributes capturing short-horizon arrival means and volatility. Clinically interpretable actions include allocating ED, ward, or ICU beds, imposing short or long waits, or diverting/transferring patients. Earlier studies have begun exploring fairness-aware decision frameworks in ICU and clinical resource allocation, demonstrating how learning-based policies can incorporate ethical constraints into operational decisions [10,11]. Moreover, recent advances in model interpretability, particularly through unified SHAP-based attribution methods [12], have demonstrated the feasibility of generating transparent, clinically meaningful explanations for complex learning models. The reward function is multi-objective, balancing efficiency, fairness, and safety by penalizing waiting times, diversions, ICU misuse, overtime, and disparities across triage categories [13,14].
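The multi-objective reward can be sketched as follows; the weights and exact penalty terms below are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

# Hypothetical weights -- illustrative only, not the paper's calibrated values.
W_WAIT, W_DIVERT, W_ICU, W_OT, W_FAIR = 1.0, 5.0, 3.0, 0.5, 2.0

def step_reward(wait_min, diverted, icu_mismatch, overtime_h, triage_waits):
    """Multi-objective reward: penalizes waiting time, diversions/transfers,
    ICU misuse, staff overtime, and cross-triage waiting-time disparity."""
    # Disparity term: variance of per-triage mean waits (the fairness penalty).
    disparity = np.var([np.mean(w) for w in triage_waits if len(w) > 0])
    return -(W_WAIT * wait_min
             + W_DIVERT * float(diverted)
             + W_ICU * float(icu_mismatch)
             + W_OT * overtime_h
             + W_FAIR * disparity)
```

Because every term enters as a penalty, the agent maximizes reward only by jointly reducing delays, avoidable diversions, ICU mismatches, overtime, and inter-group disparity.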
To ensure clinical trust, every scheduling decision is accompanied by SHAP-based explainability, attributing outcomes to inputs such as triage level, current wait, ICU availability, and forecast signals, thereby supporting both global model interpretability and per-decision auditability [12]. Broader advances in clinical AI and electronic health record modeling [15,16] and extensive reports on operational stressors in emergency care systems [17,18] further underscore the need for predictive tools that support real-time decision-making. Short-term forecasts of arrivals and volatility are injected into the state space to transform reactive queuing into proactive control, enabling the system to buffer against imminent demand surges using lightweight, transparent forecasting models [19,20,21,22,23,24,25]. Fairness is embedded directly in the reward as a penalty on the variance of mean waiting times across triage categories, with a fairness index monitored during training and evaluation to prevent inequitable trade-offs [9,10,23,24,25,26,27].
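A minimal sketch of the forecast-augmented state construction, using a rolling mean and standard deviation as a lightweight stand-in for the paper's SARIMAX/GARCH forecasts; the field names and the exact feature set are illustrative assumptions:

```python
import numpy as np

def forecast_features(hourly_arrivals, window=24):
    """Short-horizon demand context: mean and volatility of recent arrivals
    (a simple stand-in for model-based forecasts)."""
    recent = np.asarray(hourly_arrivals[-window:], dtype=float)
    return recent.mean(), recent.std()

def build_state(patient, system, hourly_arrivals):
    """Concatenate patient, system, and forecast attributes into one vector."""
    mu, sigma = forecast_features(hourly_arrivals)
    return np.array([
        patient["triage"],        # 1 (most urgent) .. 5
        patient["wait_min"],      # current waiting time in minutes
        patient["exp_los_h"],     # expected length of stay in hours
        system["free_ed_beds"],
        system["free_icu_beds"],
        system["staff_load"],     # e.g., patients per clinician
        mu, sigma,                # anticipatory demand signals
    ], dtype=float)
```

Appending the two forecast features is what turns a reactive controller into an anticipatory one: the policy can learn to defer non-urgent assignments when predicted arrivals or volatility rise.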
To evaluate performance without compromising confidentiality, we constructed a synthetic, benchmark-aligned dataset simulating approximately 60,000 annual ED visits, with diurnal and seasonal cycles, triage-specific length-of-stay distributions, holiday surges, and staffing rosters of ∼450 personnel. ICU capacity was set at 10–15% of total beds, consistent with operational norms [1,2,3,4,5]. This dataset provides realism and reproducibility, enabling controlled comparison against FCFS and ablation models.
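A synthetic arrival process of this kind can be sketched as a nonhomogeneous Poisson draw with a diurnal cycle; a base rate of roughly 6.8 arrivals/h yields about 60,000 visits per year, but the amplitude and phase below are illustrative assumptions, not the study's calibrated generator:

```python
import numpy as np

rng = np.random.default_rng(0)

def hourly_rate(hour, base=6.8, amplitude=0.45):
    """Diurnal arrival intensity: base * (1 + sinusoidal daily cycle).
    base = 6.8/h gives ~6.8 * 24 * 365 = ~59,600 visits per year."""
    return base * (1.0 + amplitude * np.sin(2 * np.pi * (hour - 8) / 24))

def simulate_day():
    """Draw hourly ED arrival counts from a nonhomogeneous Poisson process."""
    return [int(rng.poisson(hourly_rate(h))) for h in range(24)]
```

Seasonal cycles and holiday surges would enter the same way, as multiplicative factors on the hourly rate.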
Empirical results show that the proposed policy halves the mean waiting time, reduces diversions and transfers, improves ICU matching, raises the fairness index, and lowers staff overtime, all with statistical significance. Moreover, forecast-informed state features yield measurable gains over forecast-naïve DQNs, particularly under surge conditions and shifts in triage mix.
Beyond empirical performance, the integration of fairness constraints, anticipatory context, and SHAP-based explainability provides a principled framework for transparent, auditable, and ethically defensible hospital scheduling. In doing so, this work contributes fourfold:
(1) the design of a fairness-aware DQN optimizing multi-objective outcomes across delays, ICU accuracy, fairness, and overtime [9,10,23,24,25,26,27];
(2) the injection of short-horizon forecast signals for proactive scheduling [19,20,21,22,23,24,25];
(3) the embedding of explainability through SHAP to produce global and per-decision justifications [12,28];
(4) reproducible evaluation on benchmark-aligned synthetic data [1,2,3,4,5,7,8,9,13,14].
Taken together, these contributions demonstrate that reinforcement learning can move hospital scheduling from static heuristics to adaptive, auditable control. The broader significance lies in its extensibility: the same principles apply to perioperative flow, disaster triage, vaccination surge planning, and outpatient capacity management—domains where demand volatility, equity, and explainability must be treated as first-class operational constraints [27,29,30,31,32,33].
2. Related Work
Research on hospital scheduling spans rule-based heuristics, operations research (OR), reinforcement learning (RL), forecasting for operational preparedness, and, more recently, fairness and explainability in clinical AI. This section reviews these strands and positions our contribution: a fairness-aware Deep Q-Network (DQN) that integrates short-horizon forecasts directly into the online decision state and exposes per-decision explanations, evaluated against a transparent FCFS baseline.
Hospitals have historically relied on simple rule-based approaches for admission and bed management. Among these, First-Come-First-Served (FCFS) and fixed triage queues remain the most widely deployed, particularly in Emergency Departments (EDs) and inpatient units, because they are transparent, easy to implement, and straightforward to defend in audits. However, these methods are structurally limited. FCFS does not differentiate by patient severity, sometimes prioritizing low-acuity patients over critically ill ones during surges, while static triage queues become brittle under load, especially when ICU capacity saturates. Such failures manifest as prolonged delays, boarding, and even systematic denial of critical care. Empirical evidence shows that these limitations are not confined to a single jurisdiction but are systemic, driven by globally high bed occupancy, chronic crowding, and workforce shortages [1,2,3,4,5]. Thus, transparency alone cannot guarantee safety or equity in stochastic, resource-constrained settings.
To move beyond heuristics, the operations research (OR) community has contributed formal optimization models to healthcare scheduling and capacity allocation. Techniques such as mixed-integer programming and linear programming have been used for staff rostering, surgical block scheduling, and bed assignment [8,9]. These frameworks excel in encoding complex constraints and yield efficiency gains in tactical and planned contexts. For instance, OR models can balance staff shifts, maximize utilization, and ensure coverage across units. Yet, such methods often assume static inputs and tractable problem sizes. In reality, ED and ICU arrivals are stochastic, non-stationary, and high-dimensional, with interactions between patient severity, length-of-stay, and bed turnover. Under these conditions, optimization approaches frequently become brittle: they must recompute plans in real-time and may not scale fast enough to support continuous admission and deferral decisions. As a result, OR is better suited for medium-term planning and resource allocation than for real-time operational control in dynamic, noisy environments.
Reinforcement learning (RL) offers a powerful alternative by treating scheduling as a sequential decision problem with delayed rewards. Instead of pre-specifying optimal schedules, RL agents learn policies through interaction, adapting to non-stationary demand. Prior studies have shown RL can reduce delays in outpatient scheduling, ED throughput, and ICU triage relative to clinician-derived or heuristic baselines [7,11,13,14]. These works highlight the potential of learning-based control for hospital operations. However, key limitations remain:
(1) Many RL implementations optimize throughput or average delay without incorporating fairness constraints, inadvertently privileging some patient groups over others. This is problematic in ED/ICU contexts, where systematic delays for Triage-1 or Triage-2 patients directly translate to clinical harm.
(2) Most RL controllers are reactive: they rely only on realized states (e.g., current occupancy and wait times) and rarely integrate forecasts of near-term arrivals or volatility into their state space. This leaves them ill-prepared for surges.
(3) Explainability is seldom embedded. Policies often function as black boxes, making them difficult to audit or justify to clinicians. Unconstrained optimization can disadvantage high-severity or slower-arriving cohorts, undermining trust in AI recommendations.
Recent studies emphasize that embedding fairness directly in the reward—penalizing cross-triage disparities and monitoring a fairness index during training and evaluation—is a principled way to align RL with ethical and clinical priorities [9,10,23,24,25,26,27].
Beyond these broad observations, only a limited body of work explicitly combines fairness with RL in healthcare. Two main paradigms emerge in the literature: (1) reward-based fairness shaping and (2) constrained optimization via constrained Markov Decision Processes (CMDPs). Foundational work on fairness metrics, such as Jain’s fairness index [34] and subsequent AI fairness surveys by Rao and Wang [35], provides the theoretical basis for reward-based equity measures, but its application to clinical scheduling remains sparse. Representative of reward-based shaping, Zhang et al. [7] incorporate fairness penalties into the reward to reduce waiting-time disparities in queueing systems, though their model does not integrate forecasting nor provide explainability. In operational healthcare contexts, Raschke and Mann [36] and Maass [37] discuss equity and ethical resource allocation, but without embedding fairness directly into RL training. In contrast, Suresh et al. [38] adopt a CMDP-based formulation for ICU allocation, enforcing fairness through Lagrangian constraints. A complementary constrained-RL formulation was introduced by Suresh et al. [39], who demonstrated how Lagrangian-based policy updates can enforce fairness constraints in clinical resource-allocation settings. While CMDPs offer strong theoretical guarantees, they introduce training instability, require delicate constraint tuning, and lack mechanisms for real-time interpretability. The present study follows the reward-based fairness shaping paradigm but extends it substantially by integrating arrival forecasts (mean and volatility) into the RL state for anticipatory control and by embedding SHAP-based global and local explanations for clinical auditability, capabilities absent in prior fairness-aware RL approaches.
Deep RL architectures present design trade-offs. DQNs are particularly well-suited for discrete assignment actions such as “assign to ED/ward/ICU,” “wait-short,” “wait-long,” or “divert/transfer.” They support auditability via Q-values, experience replay, and target networks, providing stability in high-dimensional but structured problems. Actor–critic and Proximal Policy Optimization (PPO) methods, by contrast, are advantageous for continuous controls such as staffing intensity or resource scaling. Multi-agent RL has also been explored in hospital flow, coordinating interdependent units to reduce cross-ward congestion, though it introduces added complexity and stability concerns [17,18,19,20,21,22]. Existing hospital-flow simulations report substantial reductions in delays and improved unit matching with DQNs, while multi-agent methods achieve coordination gains at higher computational cost. Given the discrete and high-stakes nature of patient-to-bed assignments, Double-DQN architectures are particularly attractive, offering stable learning and transparent updates. Continuous staffing adjustments can later be layered with actor–critic methods, making hybrid approaches promising for future extensions.
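The Double-DQN update that motivates this choice can be sketched in a few lines: the online network selects the greedy next action while the target network evaluates it, which curbs the overestimation bias of vanilla DQN. This is a generic sketch of the standard update, not the study's training code:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN bootstrap targets for a batch of transitions.
    q_online_next / q_target_next: (batch, n_actions) Q-value arrays
    for the next states from the online and target networks."""
    # Online network picks the action; target network scores it.
    best_actions = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) receive no bootstrap term.
    return rewards + gamma * (1.0 - dones) * evaluated
```

The decoupling of selection and evaluation is what provides the stability cited above, while the discrete Q-values per action remain directly inspectable for audits.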
Explainability is increasingly required for AI systems in healthcare, especially when recommendations affect patient safety. Clinicians and administrators must understand why a recommendation is made before trusting or acting upon it. Model-agnostic explanation tools, particularly SHAP, provide both global rankings of feature importance and local, per-decision attributions [12,28]. In the hospital scheduling context, SHAP can highlight whether a recommendation to assign or defer a patient was driven by triage severity, current waiting time, ICU pressure, or forecasted demand. This supports human-in-the-loop governance: clinicians can retrospectively review patterns, spot-check real-time decisions, and apply overrides when warranted, all without significantly degrading model performance. Embedding explainability directly in the scheduling system thus bridges efficiency gains from RL with the accountability requirements of medical governance.
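For intuition, the special case of SHAP for a linear scoring model can be computed exactly: with independent features, the attribution is phi_i = w_i * (x_i - E[x_i]). The feature names and weights below are hypothetical, chosen only to mirror the scheduling inputs discussed above:

```python
import numpy as np

def linear_shap(weights, x, background_mean):
    """Exact SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return np.asarray(weights) * (np.asarray(x) - np.asarray(background_mean))

# Hypothetical scoring weights for an "assign ICU bed" action.
features = ["triage_urgency", "wait_min", "icu_free", "forecast_sigma"]
w  = np.array([2.0, 0.05, 1.5, -0.8])
x  = np.array([1.0, 120.0, 0.0, 2.5])   # a T1 patient, long wait, no ICU bed
mu = np.array([3.0, 45.0, 2.0, 1.0])    # background (average) case

phi = linear_shap(w, x, mu)
# Attributions sum to f(x) - f(mu): the deviation of this decision's score
# from the baseline, split additively across features.
```

The same additivity property underlies the global rankings and local force plots referenced above; deep Q-networks require approximate SHAP estimators rather than this closed form.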
Forecasting constitutes another crucial dimension. Time-series forecasting is a mature discipline with demonstrated utility in hospital operations. Classical models such as ARIMA and SARIMA, as well as state-space approaches, can accurately predict ED arrivals, ICU demand, and length-of-stay when diurnal or weekly patterns dominate. More recent ML/DL models, including gradient-boosted trees, LSTMs, and Transformers, extend predictive power to more complex signals [19,20,21,22,23,24,25,27,29,30,31,32,33]. Despite this progress, forecasts are rarely coupled directly with online scheduling. They are commonly used for medium-term staffing and bed capacity planning but seldom injected into the real-time state observed by a controller. This gap limits operational readiness: controllers that respond only to realized congestion cannot proactively adjust for impending surges or rising volatility.
Beyond healthcare, hybrid systems integrating forecasting and scheduling are widespread and successful. Airlines routinely use demand predictions to guide crew and aircraft rotations; supply chains integrate order arrival forecasts into fulfillment and routing decisions. In both domains, hybridization significantly improves efficiency relative to siloed designs [40,41]. Despite their success, similar hybrid architectures are rare in healthcare. By incorporating forecast means and volatilities into the online state, controllers can transition from reactive to anticipatory scheduling, buffering against demand shocks before they materialize. To our knowledge, such integration remains rare in hospital-flow RL research, representing a novel contribution of this study.
Positioned against this background, our work integrates four elements that are seldom combined in hospital scheduling:
(1) a Double-DQN tailored to discrete, auditable patient-to-bed actions;
(2) fairness embedded directly in the reward via cross-triage waiting-time penalties and a monitored Fairness Index;
(3) short-horizon arrival mean and volatility features injected into the online state to enable proactive buffering;
(4) SHAP-based per-decision attributions to support clinical governance.
This alignment maps directly to clinical KPIs (waiting time, diversions/transfers, ICU match accuracy, fairness, and overtime) and underpins the empirical gains.
As summarized in Table 1, existing approaches offer partial progress: heuristics provide transparency, OR supports tactical planning, RL enhances efficiency, and forecasting improves preparedness, but few combine fairness, proactive forecasting, and explainability within a single real-time controller. This motivates our framework, which unifies these elements and evaluates them under benchmark-aligned synthetic data.
Beyond individual methodological strands, several operational bed-assignment systems closely relate to the present study. Traditional optimization and metaheuristic approaches, such as the tabu search model of Demeester et al. [42], address overload and transfer minimization but operate in a static, single-day planning horizon with no mechanism for online adaptation or fairness control. More recent hybrid frameworks, such as Schäfer et al. [43], combine machine-learning-based patient–unit compatibility predictions with integer programming for daily bed allocation. While these methods improve efficiency, fairness is evaluated only post hoc and the models lack sequential, state-dependent decision capability.
To provide a stronger empirical comparison, two additional baselines were implemented in this study: (i) a hybrid tabu-search scheduler adapted from Demeester et al. [42], and (ii) a machine-learning-guided optimization model inspired by Schäfer et al. [43]. Both were evaluated under identical synthetic demand conditions. Results indicate that the proposed DDQN–Fair controller consistently outperforms these methods in mean waiting time (7–12% reduction) and fairness index (+0.08–0.14), demonstrating its ability to jointly improve efficiency and equity in dynamic hospital-flow environments.
Taken together, the reviewed approaches illustrate a progression from transparency to adaptability, yet the field still lacks integrated solutions that balance efficiency, equity, and accountability simultaneously. While heuristics and OR methods offer ease and tactical planning, they struggle in dynamic contexts; RL shows promise but often overlooks fairness and explainability; forecasting is mature but underutilized in online controllers. Crucially, most evaluations stop short of stress-testing under realistic surge conditions, triage imbalances, or workforce volatility, scenarios that define everyday hospital operations. By embedding fairness directly in the reward, injecting forecasts into the decision state, and instrumenting explainability at every action, our work addresses these gaps and demonstrates how a unified RL-based scheduler can transition hospital operations from reactive to proactive, equitable, and auditable control.
4. Results and Discussion
This study evaluates the proposed fairness-aware, forecast-informed DQN against a transparent FCFS baseline on a benchmark-aligned synthetic cohort (60,000 visits; realistic ward/ICU ratios; 450 staff; one-year horizon). Metrics include Average Waiting Time, Rejection Rate, ICU Match Accuracy, Fairness Index (FI), Staff Overtime, and a synthetic Patient Satisfaction proxy. Unless otherwise noted, results aggregate five random seeds with identical episode budgets and report 95% confidence intervals (CIs), hypothesis tests, and effect sizes. Where appropriate, findings are contextualised with clinical and operational literature [1,5,7,12,16,29,30,40].
For clarity, the principal evaluation metrics are defined as follows:
Fairness Index (FI): a normalized measure computed from σ², the variance of average waiting times across triage groups (T1–T5). Higher FI indicates better equity (lower cross-triage variance).
Tail Risk (CVaR-95): the conditional average of waiting times in the worst 5% tail, i.e., the mean wait among the longest 5% of waits.
Percent Change: the relative improvement of DQN over FCFS, computed as 100 × (FCFS − DQN)/FCFS.
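Under the definitions above, the three metrics can be sketched as follows; the FI normalization shown is one common variance-based form (FI = 1/(1 + σ²)) and may differ from the paper's exact implementation:

```python
import numpy as np

def fairness_index(triage_waits):
    """FI = 1 / (1 + variance of per-triage mean waits); 1.0 = perfect equity.
    (One common variance-based form; the exact normalization may differ.)"""
    means = [np.mean(w) for w in triage_waits]
    return 1.0 / (1.0 + np.var(means))

def cvar_95(waits):
    """Conditional average of the worst 5% of waiting times."""
    w = np.sort(np.asarray(waits, dtype=float))
    tail = w[int(np.ceil(0.95 * len(w))):]
    return tail.mean() if len(tail) else w[-1]

def pct_change(baseline, treated):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (baseline - treated) / baseline
```

For example, halving the FCFS mean wait of 215.3 min to roughly 107.7 min corresponds to a percent change of about 50%.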
5. Discussion
The results demonstrate that reinforcement learning can substantially improve hospital scheduling compared to rule-based baselines, but the broader significance lies in how these improvements align with clinical priorities and existing literature. The halving of waiting times and the large reductions in diversions and transfers indicate that learned policies can relieve chronic congestion, a finding consistent with reports of overcrowding as a persistent challenge in Emergency Departments and ICUs [1,2,3]. By explicitly integrating fairness into the reward, the model not only reduces overall delays but also narrows disparities across triage categories, addressing an ethical concern highlighted in recent fairness-aware healthcare scheduling studies [9,10].
The improvements in ICU match accuracy, while modest in percentage terms, are clinically meaningful because they reduce the risk of both over- and under-triage. This aligns with prior reinforcement learning work in critical care, such as the AI-Clinician for sepsis treatment [6], but extends it by embedding fairness constraints and real-time forecasts into the control policy. The reduction in staff overtime further underscores the potential to mitigate workforce burnout, a key concern emphasized by the World Health Organization’s workforce 2030 strategy [4].
Forecast-informed state features provided a distinct performance boost, especially in reducing diversions and improving fairness. This validates the hypothesis that short-horizon signals enable proactive rather than reactive control, a principle well established in supply chain and airline scheduling [40,41], but less commonly applied in healthcare. By injecting SARIMAX mean and GARCH volatility estimates directly into the decision state, the agent demonstrated anticipatory behavior, buffering against surges before congestion materialized. This design choice addresses long-standing calls for tighter integration between forecasting and operational control in hospital management [20,24].
Fairness-weight sensitivity analysis confirmed that ethical priorities can be tuned without sacrificing efficiency. At a moderate fairness-weight setting, the system achieved a balanced trade-off, preserving large efficiency gains while significantly improving equity. This flexibility is important for policy adoption: hospitals may prioritize fairness differently depending on regional regulations or ethical guidelines. The robustness checks further confirm that gains are not brittle; the policy maintained superior performance under surges, triage-mix shifts, and across random seeds, suggesting real-world viability in volatile environments.
SHAP analysis provided transparency into both global and local decision factors. Triage severity, ICU availability, wait times, and forecasted volatility emerged as the most salient drivers of model outputs, aligning with clinical priorities. Local case attributions, such as ICU deferrals during scarcity or safe rejections when overtime thresholds were exceeded, offered interpretable justifications for individual actions. This dual perspective (global ranking and local force plots) strengthens auditability and clinician trust, aligning with established clinical explainability frameworks [12,28].
Validation confirmed that ethical safeguards were preserved. Group fairness tests (sex/age) showed no significant disparities, monotonicity checks ensured higher triage never received longer waits, and policy guards prevented unsafe deferrals of T1/T2 patients. These findings demonstrate that reinforcement learning can improve efficiency without compromising clinical safety or equity.
Comparison with prior work showed that the ∼52% reduction in waiting times and ∼72% reduction in rejections surpass typical reinforcement learning studies in hospital operations, which often report 22–30% improvements and seldom integrate fairness or overtime considerations [7,11,13,14,17,18,19,20,21,22]. Gains are attributed to the combined effect of fairness-aware reward shaping and forecast-augmented states. Cross-domain parallels with airlines and supply chains further validate the hybrid forecast + control approach [40,41].
Regarding the clinical and operational impact, translational estimates suggest that, for a mid-sized ED with ∼50,000 annual visits, reductions of this magnitude imply hundreds of thousands fewer prolonged waits or diversions each year. Improved ICU placement addresses critical safety risks, while reduced overtime enhances workforce sustainability. These improvements translate into higher patient satisfaction and potentially better hospital quality ratings. Collectively, the results demonstrate that efficiency, equity, and sustainability can be improved jointly, rather than being treated as trade-offs.
Overall, the proposed framework advances hospital scheduling by combining fairness, forecasting, and explainability in a unified reinforcement learning approach. This not only improves efficiency and safety but also provides clinicians and administrators with transparent, auditable decision support, making the system more aligned with the ethical and governance requirements of healthcare delivery.
While the present formulation relies on a discrete action space, assigning patients to ED, ward, or ICU beds, or issuing short/long waits or diversions, this design choice was made to maintain training stability and clinical interpretability. However, real-world hospital operations often require more subtle, continuous decisions, such as variable staffing intensity, partial or temporary bed reservations, or dynamic prioritization thresholds. Future work may explore hierarchical reinforcement learning, where high-level discrete actions are combined with low-level continuous controllers capable of modulating staffing or capacity in real time. Alternatively, hybrid architectures that integrate DDQN with actor–critic methods could support continuous control while retaining the transparency advantages of discrete decision-making. Investigating these hybrid action representations may yield additional reductions in waiting-time variance and further improvements in fairness across acuity levels.