Article

Fairness-Aware Intelligent Reinforcement (FAIR): An AI-Powered Hospital Scheduling Framework

Department of Computer Engineering and Computational Sciences, Canadian University Dubai, Dubai 117781, United Arab Emirates
*
Author to whom correspondence should be addressed.
Information 2025, 16(12), 1039; https://doi.org/10.3390/info16121039
Submission received: 10 October 2025 / Revised: 8 November 2025 / Accepted: 20 November 2025 / Published: 28 November 2025

Abstract

Hospitals must allocate beds and staff effectively under volatile arrival patterns, where scheduling errors can cause preventable harm. This study introduces a fairness-aware, forecast-informed reinforcement learning framework for hospital scheduling, explicitly integrating fairness constraints, short-term demand forecasts, and SHAP-based explainability. The state fuses patient and system context with short-horizon demand forecasts: mean arrivals ($\hat{\lambda}$) and volatility ($\hat{\sigma}^2$). The reward jointly optimizes efficiency, equity, and safety by penalizing waiting, diversions/transfers, ICU misuse, overtime, and cross-ward disparity. Using a benchmark-aligned synthetic cohort (60k visits over one year), the approach is compared against First-Come-First-Served (FCFS) and ablations without forecast features. The learned policy halves the mean waiting time (from 215.3 to 102.5 min), reduces diversions/transfers (from 27.6% to 7.8%), improves ICU match accuracy (from 93.4% to 95.1%), raises the fairness index by 45%, and cuts staff overtime by 56%. Adding forecast signals yields further gains over a forecast-naive DQN (9% shorter waits; 28% fewer diversions/transfers), with robustness under demand surges and triage-mix shifts. By unifying equity constraints, anticipatory context, and explanation, the method turns reactive queues into proactive, auditable control and is extensible to perioperative flow, disaster triage, and outpatient capacity management.

1. Introduction

Hospitals face the dual challenge of delivering timely, high-quality care while operating under constraints of finite bed availability, limited staffing, and volatile patient inflows. Among the most critical operational problems is scheduling—the real-time assignment of patients to resources such as emergency department (ED) beds, wards, intensive care units (ICUs), clinicians, and diagnostic services—where errors can directly translate into preventable harm.
Real-world clinical evidence highlights the scale of this challenge. In acute care environments like EDs and ICUs, inefficiencies can rapidly become dangerous: in the U.S., median ED boarding times (i.e., the time admitted patients wait in ED before transfer to inpatient beds) are around 2.0 h, with the upper spread (95th percentile) approaching 8 h during periods of high occupancy. When hospital occupancy exceeds 85%, boarding times routinely surpass the widely referenced 4-hour threshold and have been reported to reach approximately 6.6 h. Delays, misallocations, and rejections from ICU admission are also common: one study found that nearly 47% of ED requests for ICU admission were denied or delayed due to capacity constraints. These patterns are echoed internationally, with systematic reviews linking ED crowding to increased waiting times, elevated rates of patients leaving without being seen, higher morbidity and mortality, and poorer patient satisfaction. In many hospitals, approximately 10% of patients depart before clinical evaluation during peak load periods. ICUs, which typically represent only 10–15% of hospital bed capacity, are recurrent bottlenecks; constrained critical care resources often force EDs to “board” critically ill patients or delay ICU transfers, increasing clinical risk [1,2,3,4,5].
These pressures highlight the urgent need for scheduling methods that jointly optimize efficiency by reducing delays and diversions, equity by balancing waiting times across triage groups, and safety by ensuring accurate ICU placement and mitigating staff overload.
Historically, hospitals have relied on heuristics such as First-Come-First-Served (FCFS) or fixed triage queues, which are simple to defend but structurally myopic, ignoring demand variability, resource contention, and staff fatigue. During surges such as seasonal outbreaks, static queues fail to redistribute loads or anticipate peaks, often leading to excessive waiting, walkouts, and ICU congestion [1,2,3]. Recent data-driven approaches have also demonstrated the value of learning-based decision systems in hospital operations, including RL-guided treatment strategies in ICU care [6] and fairness-aware queueing control for clinical scheduling [7]. Classical optimization techniques, including mixed-integer and linear programming, have improved planned operations but often lack scalability in noisy real-time ED/ICU contexts [8,9]. These limitations underscore the need for adaptive, data-driven methods that can anticipate demand and dynamically rebalance resources.
Reinforcement learning (RL) offers a natural fit by framing scheduling as a sequential decision-making problem under uncertainty, optimizing long-run outcomes rather than immediate throughput.
It is important to note that all numerical performance metrics reported later in this paper, e.g., a baseline mean waiting time of 215.3 min under FCFS, are not clinical statistics from real hospitals but values generated by our benchmark-aligned synthetic dataset. These simulation baselines reflect the assumed demand intensity, capacity levels, and triage distributions encoded in the synthetic environment and should be interpreted as internally consistent reference points rather than real-world hospital measurements.
In this study, we propose a fairness-aware Deep Q-Network (DQN) tailored to hospital scheduling, where the state representation integrates patient attributes (triage, waiting time, expected length-of-stay), system attributes (bed availability, ICU pressure, staff workload), and forecast attributes capturing short-horizon mean arrivals ($\hat{\lambda}$) and volatility ($\hat{\sigma}^2$). Clinically interpretable actions include allocating ED, ward, or ICU beds, deferring admission for a short or long wait, or diverting/transferring patients. Earlier studies have begun exploring fairness-aware decision frameworks in ICU and clinical resource allocation, demonstrating how learning-based policies can incorporate ethical constraints into operational decisions [10,11]. Moreover, recent advances in model interpretability, particularly through unified SHAP-based attribution methods [12], have demonstrated the feasibility of generating transparent, clinically meaningful explanations for complex learning models. The reward function is multi-objective, balancing efficiency, fairness, and safety by penalizing waiting times, diversions, ICU misuse, overtime, and disparities across triage categories [13,14].
To ensure clinical trust, every scheduling decision is accompanied by SHAP-based explainability, attributing outcomes to inputs such as triage level, current wait, ICU availability, and forecast signals, thereby supporting both global model interpretability and per-decision auditability [12]. Broader advances in clinical AI and electronic health record modeling [15,16] and extensive reports on operational stressors in emergency care systems [17,18] further underscore the need for predictive tools that support real-time decision-making. Short-term forecasts of arrivals and volatility are injected into the state space to transform reactive queuing into proactive control, enabling the system to buffer against imminent demand surges using lightweight, transparent forecasting models [19,20,21,22,23,24,25]. Fairness is embedded directly in the reward as the variance of mean waiting times across triage categories, with a fairness index monitored during training and evaluation to prevent inequitable trade-offs [9,10,23,24,25,26,27].
To evaluate performance without compromising confidentiality, we constructed a synthetic, benchmark-aligned dataset simulating approximately 60,000 annual ED visits, with diurnal and seasonal cycles, triage-specific length-of-stay distributions, holiday surges, and staffing rosters of ∼450 personnel. ICU capacity was set at 10–15% of total beds, consistent with operational norms [1,2,3,4,5]. This dataset provides realism and reproducibility, enabling controlled comparison against FCFS and ablation models.
Empirical results show that the proposed policy halves the mean waiting time, reduces diversions and transfers, improves ICU matching, raises the fairness index, and lowers staff overtime, all with statistical significance. Moreover, forecast-informed state features yield measurable gains over forecast-naïve DQNs, particularly under surge conditions and shifts in triage mix.
Beyond empirical performance, the integration of fairness constraints, anticipatory context, and SHAP-based explainability provides a principled framework for transparent, auditable, and ethically defensible hospital scheduling. In doing so, this work contributes fourfold:
  • the design of a fairness-aware DQN optimizing multi-objective outcomes across delays, ICU accuracy, fairness, and overtime [9,10,23,24,25,26,27];
  • the injection of short-horizon forecast signals for proactive scheduling [19,20,21,22,23,24,25];
  • the embedding of explainability through SHAP to produce global and per-decision justifications [12,28];
  • reproducible evaluation on benchmark-aligned synthetic data [1,2,3,4,5,7,8,9,13,14].
Taken together, these contributions demonstrate that reinforcement learning can move hospital scheduling from static heuristics to adaptive, auditable control. The broader significance lies in its extensibility: the same principles apply to perioperative flow, disaster triage, vaccination surge planning, and outpatient capacity management—domains where demand volatility, equity, and explainability must be treated as first-class operational constraints [27,29,30,31,32,33].

2. Related Work

Research on hospital scheduling spans rule-based heuristics, operations research (OR), reinforcement learning (RL), forecasting for operational preparedness, and, more recently, fairness and explainability in clinical AI. This section reviews these strands and positions our contribution: a fairness-aware Deep Q-Network (DQN) that integrates short-horizon forecasts directly into the online decision state and exposes per-decision explanations, evaluated against a transparent FCFS baseline.
Hospitals have historically relied on simple rule-based approaches for admission and bed management. Among these, First-Come-First-Served (FCFS) and fixed triage queues remain the most widely deployed, particularly in Emergency Departments (EDs) and inpatient units, because they are transparent, easy to implement, and straightforward to defend in audits. However, these methods are structurally limited. FCFS does not differentiate by patient severity, sometimes prioritizing low-acuity patients over critically ill ones during surges, while static triage queues become brittle under load, especially when ICU capacity saturates. Such failures manifest as prolonged delays, boarding, and even systematic denial of critical care. Empirical evidence shows that these limitations are not confined to a single jurisdiction but are systemic, driven by globally high bed occupancy, chronic crowding, and workforce shortages [1,2,3,4,5]. Thus, transparency alone cannot guarantee safety or equity in stochastic, resource-constrained settings.
To move beyond heuristics, the operations research (OR) community has contributed formal optimization models to healthcare scheduling and capacity allocation. Techniques such as mixed-integer programming and linear programming have been used for staff rostering, surgical block scheduling, and bed assignment [8,9]. These frameworks excel in encoding complex constraints and yield efficiency gains in tactical and planned contexts. For instance, OR models can balance staff shifts, maximize utilization, and ensure coverage across units. Yet, such methods often assume static inputs and tractable problem sizes. In reality, ED and ICU arrivals are stochastic, non-stationary, and high-dimensional, with interactions between patient severity, length-of-stay, and bed turnover. Under these conditions, optimization approaches frequently become brittle: they must recompute plans in real-time and may not scale fast enough to support continuous admission and deferral decisions. As a result, OR is better suited for medium-term planning and resource allocation than for real-time operational control in dynamic, noisy environments.
Reinforcement learning (RL) offers a powerful alternative by treating scheduling as a sequential decision problem with delayed rewards. Instead of pre-specifying optimal schedules, RL agents learn policies through interaction, adapting to non-stationary demand. Prior studies have shown RL can reduce delays in outpatient scheduling, ED throughput, and ICU triage relative to clinician-derived or heuristic baselines [7,11,13,14]. These works highlight the potential of learning-based control for hospital operations. However, key limitations remain:
  • many RL implementations optimize throughput or average delay without incorporating fairness constraints, inadvertently privileging some patient groups over others. This is problematic in ED/ICU contexts, where systematic delays for Triage-1 or Triage-2 patients directly translate to clinical harm.
  • most RL controllers are reactive: they rely only on realized states (e.g., current occupancy and wait times) and rarely integrate forecasts of near-term arrivals or volatility into their state space. This leaves them ill-prepared for surges.
  • explainability is seldom embedded. Policies often function as black boxes, making them difficult to audit or justify to clinicians. Unconstrained optimization can disadvantage high-severity or slower-arriving cohorts, undermining trust in AI recommendations.
Recent studies emphasize that embedding fairness directly in the reward—penalizing cross-triage disparities and monitoring a fairness index during training and evaluation—is a principled way to align RL with ethical and clinical priorities [9,10,23,24,25,26,27].
Beyond these broad observations, only a limited body of work explicitly combines fairness with RL in healthcare. Two main paradigms emerge in the literature: (1) reward-based fairness shaping and (2) constrained optimization via constrained Markov Decision Processes (CMDPs). Foundational work on fairness metrics, such as Jain’s fairness index [34] and subsequent AI fairness surveys by Rao and Wang [35], provide the theoretical basis for reward-based equity measures, but their application to clinical scheduling remains sparse. Representative of reward-based shaping, Zhang et al. [7] incorporate fairness penalties into the reward to reduce waiting-time disparities in queueing systems, though their model does not integrate forecasting nor provide explainability. In operational healthcare contexts, Raschke and Mann [36] and Maass [37] discuss equity and ethical resource allocation, but without embedding fairness directly into RL training. In contrast, Suresh et al. [38] adopt a CMDP-based formulation for ICU allocation, enforcing fairness through Lagrangian constraints. A complementary constrained-RL formulation was introduced by Suresh et al. [39], who demonstrated how Lagrangian-based policy updates can enforce fairness constraints in clinical resource-allocation settings. While CMDPs offer strong theoretical guarantees, they introduce training instability, require delicate constraint tuning, and lack mechanisms for real-time interpretability. The present study follows the reward-based fairness shaping paradigm but extends it substantially by integrating arrival forecasts (mean and volatility) into the RL state for anticipatory control and by embedding SHAP-based global and local explanations for clinical auditability, capabilities absent in prior fairness-aware RL approaches.
Deep RL architectures present design trade-offs. DQNs are particularly well-suited for discrete assignment actions such as “assign to ED/ward/ICU,” “wait-short,” “wait-long,” or “divert/transfer.” They support auditability via Q-values, experience replay, and target networks, providing stability in high-dimensional but structured problems. Actor–critic and Proximal Policy Optimization (PPO) methods, by contrast, are advantageous for continuous controls such as staffing intensity or resource scaling. Multi-agent RL has also been explored in hospital flow, coordinating interdependent units to reduce cross-ward congestion, though it introduces added complexity and stability concerns [17,18,19,20,21,22]. Existing hospital-flow simulations report substantial reductions in delays and improved unit matching with DQNs, while multi-agent methods achieve coordination gains at higher computational cost. Given the discrete and high-stakes nature of patient-to-bed assignments, Double-DQN architectures are particularly attractive, offering stable learning and transparent updates. Continuous staffing adjustments can later be layered with actor–critic methods, making hybrid approaches promising for future extensions.
Explainability is increasingly required for AI systems in healthcare, especially when recommendations affect patient safety. Clinicians and administrators must understand why a recommendation is made before trusting or acting upon it. Model-agnostic explanation tools, particularly SHAP, provide both global rankings of feature importance and local, per-decision attributions [12,28]. In the hospital scheduling context, SHAP can highlight whether a recommendation to assign or defer a patient was driven by triage severity, current waiting time, ICU pressure, or forecasted demand. This supports human-in-the-loop governance: clinicians can retrospectively review patterns, spot-check real-time decisions, and apply overrides when warranted, all without significantly degrading model performance. Embedding explainability directly in the scheduling system thus bridges efficiency gains from RL with the accountability requirements of medical governance.
Forecasting constitutes another crucial dimension. Time-series forecasting is a mature discipline with demonstrated utility in hospital operations. Classical models such as ARIMA and SARIMA, as well as state-space approaches, can accurately predict ED arrivals, ICU demand, and length-of-stay when diurnal or weekly patterns dominate. More recent ML/DL models, including gradient-boosted trees, LSTMs, and Transformers, extend predictive power to more complex signals [19,20,21,22,23,24,25,27,29,30,31,32,33]. Despite this progress, forecasts are rarely coupled directly with online scheduling. They are commonly used for medium-term staffing and bed capacity planning but seldom injected into the real-time state observed by a controller. This gap limits operational readiness: controllers that respond only to realized congestion cannot proactively adjust for impending surges or rising volatility.
Beyond healthcare, hybrid systems integrating forecasting and scheduling are widespread and successful. Airlines routinely use demand predictions to guide crew and aircraft rotations; supply chains integrate order arrival forecasts into fulfillment and routing decisions. In both domains, hybridization significantly improves efficiency relative to siloed designs [40,41]. Despite their success, similar hybrid architectures are rare in healthcare. By incorporating forecast means and volatilities into the online state, controllers can transition from reactive to anticipatory scheduling, buffering against demand shocks before they materialize. To our knowledge, such integration remains rare in hospital-flow RL research, representing a novel contribution of this study.
Positioned against this background, our work integrates four elements that are seldom combined in hospital scheduling:
  • a Double-DQN tailored to discrete, auditable patient-to-bed actions;
  • fairness embedded directly in the reward via cross-triage waiting-time penalties and a monitored Fairness Index;
  • short-horizon arrival mean and volatility features injected into the online state to enable proactive buffering;
  • SHAP-based per-decision attributions to support clinical governance. This alignment maps directly to clinical KPIs (waiting time, diversions/transfers, ICU match accuracy, fairness, and overtime) and underpins the empirical gains.
As summarized in Table 1, existing approaches offer only partial progress: heuristics provide transparency, OR supports tactical planning, RL enhances efficiency, and forecasting improves preparedness, but few combine fairness, proactive forecasting, and explainability within a single real-time controller. This motivates our framework, which unifies these elements and evaluates them under benchmark-aligned synthetic data.
Beyond individual methodological strands, several operational bed-assignment systems closely relate to the present study. Traditional optimization and metaheuristic approaches, such as the tabu search model of Demeester et al. [42], address overload and transfer minimization but operate in a static, single-day planning horizon with no mechanism for online adaptation or fairness control. More recent hybrid frameworks, such as Schäfer et al. [43], combine machine-learning-based patient–unit compatibility predictions with integer programming for daily bed allocation. While these methods improve efficiency, fairness is evaluated only post hoc and the models lack sequential, state-dependent decision capability.
To provide a stronger empirical comparison, two additional baselines were implemented in this study: (i) a hybrid tabu-search scheduler adapted from Demeester et al. [42], and (ii) a machine-learning-guided optimization model inspired by Schäfer et al. [43]. Both were evaluated under identical synthetic demand conditions. Results indicate that the proposed DDQN–Fair controller consistently outperforms these methods in mean waiting time (7–12% reduction) and fairness index (+0.08–0.14), demonstrating its ability to jointly improve efficiency and equity in dynamic hospital-flow environments.
Taken together, the reviewed approaches illustrate a progression from transparency to adaptability, yet the field still lacks integrated solutions that balance efficiency, equity, and accountability simultaneously. While heuristics and OR methods offer ease and tactical planning, they struggle in dynamic contexts; RL shows promise but often overlooks fairness and explainability; forecasting is mature but underutilized in online controllers. Crucially, most evaluations stop short of stress-testing under realistic surge conditions, triage imbalances, or workforce volatility scenarios that define everyday hospital operations. By embedding fairness directly in the reward, injecting forecasts into the decision state, and instrumenting explainability at every action, our work addresses these gaps and demonstrates how a unified RL-based scheduler can transition hospital operations from reactive to proactive, equitable, and auditable control.

3. Methodology

This section presents the proposed fairness-aware, forecast-informed reinforcement learning (RL) framework for hospital scheduling. The methodology is organized into six components: (i) synthetic data construction, (ii) Markov Decision Process (MDP) formulation, (iii) DDQN architecture, (iv) fairness-aware reward, (v) forecast-signal integration, and (vi) evaluation and ablation protocols. Two core algorithms are provided in-line: the DDQN training loop (Algorithm 1) and the DDQN Inference (Online Scheduling) (Algorithm 2). Supporting algorithms remain in Appendix A for reproducibility.
Algorithm 1: DDQN Training (Fairness-Aware, Forecast-Informed)
Algorithm 2: DDQN Inference (Online Scheduling)

3.1. Synthetic Dataset and Preprocessing

We constructed a synthetic, benchmark-aligned dataset simulating approximately 60,000 annual ED visits. The dataset incorporated diurnal and seasonal cycles, triage-specific length-of-stay distributions, holiday surges, and staffing rosters of approximately 450 personnel. ICU capacity was fixed at 10–15% of total hospital beds; repeated deferrals for Triage-1/2 patients were disallowed, and escalation was triggered when ICU saturation persisted, consistent with operational benchmarks in the literature [1,2,3]. To validate realism, Kolmogorov–Smirnov and chi-square tests compared synthetic distributions with public benchmarks (e.g., national ED arrivals, ICU shares). Baseline statistics under FCFS (mean wait ≈ 215 min, rejection ≈ 28%) aligned with published crowding reports. All simulations and analyses were implemented in Python using standard scientific computing libraries, including Matplotlib for visualization, NumPy for numerical processing, and SciPy for statistical routines [44,45,46].
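The distributional validation described above can be sketched with SciPy's two-sample Kolmogorov–Smirnov test. The samples below are illustrative placeholders (gamma-distributed waits), not the actual CDC/AHA/AHRQ benchmark data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in samples: synthetic ED waiting times vs. a reference benchmark sample (minutes).
synthetic_waits = rng.gamma(shape=2.0, scale=100.0, size=5000)
benchmark_waits = rng.gamma(shape=2.0, scale=105.0, size=5000)

# Two-sample KS test: a large p-value gives no evidence of distributional mismatch.
ks_stat, p_value = stats.ks_2samp(synthetic_waits, benchmark_waits)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```

A chi-square test on binned triage shares (`scipy.stats.chisquare`) would complement this for categorical distributions.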
Each patient record contained:
  • Triage severity (T1–T3 categories),
  • Arrival timestamp (aligned to diurnal and weekly patterns),
  • Expected length-of-stay (LOS) distributions,
  • Routing outcome (ED discharge, ward transfer, ICU admission).
Figure 1 illustrates the triage severity distribution within the synthetic cohort, highlighting the skew towards moderate-acuity patients (T2), consistent with real-world hospital data. Further details on the synthetic data construction, calibration assumptions, and validation against CDC/AHA/AHRQ benchmarks are provided in Appendix B.
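A minimal sketch of the diurnal arrival process assumed for the synthetic cohort is a Poisson counting process with a 24 h sinusoidal intensity; the base rate and amplitude below are illustrative values chosen so that the annual total lands near 60,000 visits:

```python
import numpy as np

rng = np.random.default_rng(42)

def hourly_arrivals(hours=24 * 365, base_rate=6.8, amplitude=0.45):
    """Sample hourly ED arrival counts with a 24 h sinusoidal Poisson intensity.

    base_rate ~ 60,000 visits / 8760 h; amplitude controls the day/night swing.
    """
    t = np.arange(hours)
    lam = base_rate * (1.0 + amplitude * np.sin(2 * np.pi * (t % 24) / 24))
    return rng.poisson(lam)

arrivals = hourly_arrivals()
# Triage mix skewed toward moderate acuity (T2), echoing Figure 1 (proportions assumed).
triage = rng.choice([1, 2, 3], p=[0.15, 0.55, 0.30], size=arrivals.sum())
```

Seasonal cycles and holiday surges would be layered on as additional multiplicative terms in `lam`.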

3.2. Reinforcement Learning Framework

Reinforcement learning offers a natural paradigm for modelling hospital scheduling as a sequential decision-making problem under uncertainty. This subsection details the formulation of the proposed RL-based controller, including the Markov Decision Process (MDP), the DDQN architecture, the fairness-aware reward structure, the training configuration, and the core algorithms used in the framework.
  • Markov Decision Process (MDP) Formulation
Hospital scheduling is modelled as a discrete-time Markov Decision Process (MDP) defined by the tuple:
$\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma),$
where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $P(s' \mid s, a)$ the transition dynamics, $R(s, a)$ the reward function, and $\gamma \in [0, 1]$ the discount factor.
State Space: Each state $s_t$ captures the operational snapshot of the hospital at time $t$, including:
$s_t = \big(q_{\mathrm{ED}},\, q_{\mathrm{Ward}},\, q_{\mathrm{ICU}},\, a_{\mathrm{beds}},\, a_{\mathrm{staff}},\, \lambda_t^{\mathrm{forecast}},\, \tau_{\mathrm{triage}},\, d_{\mathrm{delay}},\, h_t\big).$
  • Triage level and weighted acuity score,
  • Current waiting time and expected length-of-stay,
  • Queue lengths in ED, ward, and ICU,
  • Bed availability percentages per unit,
  • Staff utilisation in the current shift,
  • Short-horizon arrival forecast $\hat{\lambda}_t$,
  • Forecasted demand volatility $\hat{\sigma}_t^2$,
  • Approximate transfer delay $d_{\mathrm{delay}}$,
  • Time-of-day indicator.
This high-dimensional representation captures temporal variations, congestion patterns, and anticipated demand spikes.
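The state described above can be packed into a flat feature vector for the Q-network. The field order and normalization constants below are illustrative assumptions, not the paper's exact encoding:

```python
import numpy as np

def build_state(q_ed, q_ward, q_icu, beds_free, staff_util,
                lam_hat, sigma2_hat, triage, wait_min, hour):
    """Assemble one operational snapshot as a normalized feature vector."""
    return np.array([
        q_ed / 50.0,                     # ED queue length (assumed scale)
        q_ward / 50.0,                   # ward queue length
        q_icu / 20.0,                    # ICU queue length
        beds_free,                       # fraction of beds available
        staff_util,                      # staff utilisation in current shift
        lam_hat / 20.0,                  # short-horizon mean arrival forecast
        sigma2_hat / 10.0,               # forecasted demand volatility
        triage / 3.0,                    # triage level (T1-T3)
        wait_min / 240.0,                # current waiting time (minutes)
        np.sin(2 * np.pi * hour / 24),   # smooth time-of-day encoding
    ], dtype=np.float32)

s = build_state(12, 5, 2, 0.35, 0.8, 9.4, 2.1, 2, 95, 14)
```

Normalizing each feature to a comparable range keeps the Q-network's inputs well conditioned.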
Action Space: At each decision epoch, the agent selects one of six discrete actions:
  • Assign patient to an Emergency Department bed.
  • Assign to a General Ward bed.
  • Assign to an Intensive Care Unit bed.
  • Postpone admission (short wait).
  • Postpone admission (long wait).
  • Divert patient externally.
Discrete actions preserve interpretability and training stability. Continuous decisions (e.g., staffing intensity, partial bed holds) are discussed as limitations and future extensions in Section 5.
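For concreteness, the discrete action set can be expressed as an integer enum, assuming the six-action head described for the Q-network in Section 3.3 (ordering is an illustrative assumption):

```python
from enum import IntEnum

class Action(IntEnum):
    """Discrete scheduling actions mapped to Q-network output indices."""
    ASSIGN_ED = 0    # assign patient to an ED bed
    ASSIGN_WARD = 1  # assign to a general ward bed
    ASSIGN_ICU = 2   # assign to an ICU bed
    WAIT_SHORT = 3   # postpone admission (short wait)
    WAIT_LONG = 4    # postpone admission (long wait)
    DIVERT = 5       # divert/transfer patient externally
```

An `IntEnum` keeps action indices auditable in logs while remaining directly usable as `argmax` outputs.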
  • Fairness-Aware Reward Design
To incorporate ethical and clinical priorities directly into decision-making, the reward integrates both efficiency and equity objectives. The final reward is:
$R_t = -\alpha_1 w_t + \alpha_2 b_t + \alpha_3 s_t - \alpha_4 (1 - \mathrm{FI}_t),$
where $w_t$ is the waiting time, $b_t$ the bed-utilisation proxy, $s_t \in [0, 1]$ the satisfaction score, and $\mathrm{FI}_t$ the Fairness Index defined below. All reward components are normalized to $[-1, 1]$ and clipped to $\pm 2$ for numerical stability.
To operationalize fairness in a clinically meaningful manner, we define a Fairness Index (FI) grounded in Jain’s index and normalized cross-group dispersion measures. It combines overall service quality with cross-triage equity:
$\mathrm{FI} = \tilde{\mu}_s \cdot \big(1 - \tilde{\sigma}_{\mathrm{triage}}(w)\big), \qquad 0 \le \mathrm{FI} \le 1.$
Here, $\tilde{\mu}_s$ is the mean normalized satisfaction across all patients,
$\tilde{\mu}_s = \frac{1}{N} \sum_{i=1}^{N} s_i,$
ensuring that fairness is not evaluated in isolation from global service quality. The term $\tilde{\sigma}_{\mathrm{triage}}(w)$ captures inequity in waiting times across triage levels. For each triage group $g$, we compute a within-group dispersion $\sigma_g(w)$ (standard deviation or normalized MAD) and aggregate these as:
$\sigma_{\mathrm{triage}}(w) = \frac{1}{|G|} \sum_{g \in G} \sigma_g(w).$
This quantity is then min–max normalized to yield $\tilde{\sigma}_{\mathrm{triage}}(w) \in [0, 1]$, where higher values indicate greater disparity in waiting times across acuity levels.
This formulation allows explicit control over efficiency–equity trade-offs via the fairness coefficient, as presented in Appendix A (Algorithms 1 and 2).
Relation to standard fairness metrics. The proposed FI aligns with several widely used fairness measures: (1) group parity, by explicitly penalizing disparities across clinically meaningful cohorts (triage categories); (2) max–min fairness, by reducing the spread of waiting times across groups; (3) Gini-like inequality, through the use of normalized dispersion; and (4) normalized variance, since $\tilde{\sigma}_{\mathrm{triage}}(w)$ is a bounded, variance-like measure. Unlike single-purpose metrics, the FI jointly captures quality of care (through $\tilde{\mu}_s$) and distributional fairness (through $\tilde{\sigma}_{\mathrm{triage}}(w)$), offering a clinically grounded and RL-stable formulation of equity.
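The Fairness Index and reward defined above can be sketched in a few lines of NumPy. The dispersion normalization scale and the $\alpha$ weights are illustrative assumptions, not the trained configuration:

```python
import numpy as np

def fairness_index(satisfaction, waits, triage):
    """FI = mean normalized satisfaction * (1 - normalized cross-triage dispersion)."""
    mu_s = float(np.mean(satisfaction))                        # satisfaction in [0, 1]
    groups = [waits[triage == g] for g in np.unique(triage)]
    disp = np.mean([np.std(w) for w in groups if len(w) > 1])  # mean within-group std
    disp_norm = min(disp / 120.0, 1.0)                         # bounded to [0, 1] (assumed scale)
    return mu_s * (1.0 - disp_norm)

def reward(wait_t, bed_util, satisfaction_t, fi,
           a1=0.4, a2=0.2, a3=0.2, a4=0.2):
    """R_t = -a1*w + a2*b + a3*s - a4*(1 - FI), clipped to +/-2 for stability."""
    r = -a1 * wait_t + a2 * bed_util + a3 * satisfaction_t - a4 * (1.0 - fi)
    return float(np.clip(r, -2.0, 2.0))

triage = np.array([1, 1, 2, 2, 3, 3])
waits = np.array([20.0, 30.0, 60.0, 80.0, 120.0, 150.0])   # minutes, normalized upstream
sat = np.array([0.9, 0.85, 0.7, 0.6, 0.5, 0.4])
fi = fairness_index(sat, waits, triage)
r = reward(wait_t=0.3, bed_util=0.8, satisfaction_t=0.65, fi=fi)
```

Raising `a4` tightens the equity penalty at some cost in raw throughput, which is exactly the trade-off the fairness coefficient controls.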

3.3. DDQN Architecture

The Double Deep Q-Network (DDQN) is composed of several interdependent components that together enable stable learning, fairness-aware decision-making, and proactive scheduling. These architectural elements, outlined below, represent the backbone of the reinforcement learning framework and each plays a distinct role in ensuring efficiency, equity, and robustness. As illustrated in Figure 2, the architecture consists of:
  • Q-network: two hidden layers (128 ReLU units each) and a linear head over six actions.
  • Replay buffer: 50,000 transitions; minibatch size 64.
  • Target synchronization: every 200 gradient steps (extended to 1000 in robustness runs).
  • Optimizer: Adam, learning rate $10^{-3}$ (sensitivity sweep down to $2.5 \times 10^{-4}$).
  • Exploration: $\epsilon$-greedy schedule annealed from 1.0 to 0.05 across ∼150,000 steps.
Figure 2. DDQN architecture for hospital scheduling. The state integrates patient, system, and forecast features.
The training loop follows the standard DDQN procedure with $\epsilon$-greedy exploration and periodic target network updates. Training consists of 1000 episodes of 240 steps (≈10 simulated days), across 5 random seeds with bootstrap 95% confidence intervals (Appendix A, Algorithm 1).
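The key step that distinguishes Double DQN from vanilla DQN is the decoupled target: the online network selects the next action, the target network evaluates it. A pure-NumPy sketch of that target computation (standing in for the actual network forward passes) is:

```python
import numpy as np

def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    a_star = np.argmax(q_online_next, axis=1)               # action chosen by online net
    q_eval = q_target_next[np.arange(len(a_star)), a_star]  # value assessed by target net
    return reward + gamma * q_eval * (1.0 - done)

rng = np.random.default_rng(0)
q_on = rng.normal(size=(64, 6))   # minibatch of 64 transitions, six actions
q_tg = rng.normal(size=(64, 6))
r = rng.normal(size=64)
d = np.zeros(64)                  # no terminal states in this toy batch
y = double_q_target(r, d, q_on, q_tg)
```

Using the target network only for evaluation curbs the overestimation bias that plain max-based targets introduce.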

3.4. Main Algorithms

To provide full procedural transparency, the core scheduling logic of the proposed framework is expressed explicitly through its two operational algorithms: the training algorithm and the online scheduling (inference) algorithm. Algorithm 1 describes the full Double Deep Q-Network (DDQN) training loop, including how fairness penalties are computed, how forecast-informed states are incorporated at each decision epoch, and how Double Q-targets stabilize learning. Algorithm 2 outlines the deployment-time controller, showing how the learned policy selects actions, applies clinical safety guards (such as prohibiting repeated T1/T2 deferrals and enforcing ICU escalation rules), and generates SHAP-based explanations for real-time auditability. These algorithms collectively formalize the reinforcement learning pipeline underlying the FAIR framework, clarifying both the learning dynamics and the real-time decision flow.

3.5. Forecast Signal Integration

To enable proactive rather than reactive control, short-horizon demand signals were integrated into the state space:
  • SARIMAX models were used to estimate the mean arrival forecast λ̂_{t+1};
  • GARCH models provided the volatility estimate σ̂²_{t+1}.
The augmented state s_t = [patient, system, λ̂, σ̂²] provided the DDQN with anticipatory context (Appendix A, Algorithm A2).
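In practice, the λ̂ and σ̂² signals would come from fitted SARIMAX and GARCH models (e.g., via the statsmodels and arch packages). The sketch below substitutes a sample-mean forecast and an EWMA variance (an assumption, not the paper's estimators) purely to show how the decision state is augmented:

```python
def ewma_volatility(arrivals, lam=0.94):
    """Exponentially weighted variance of recent arrival counts: a
    lightweight stand-in for the GARCH(1,1) volatility estimate."""
    mean = sum(arrivals) / len(arrivals)
    var = 0.0
    for x in arrivals:
        var = lam * var + (1 - lam) * (x - mean) ** 2
    return var

def augment_state(patient_feats, system_feats, recent_arrivals):
    """Build s_t = [patient, system, lambda_hat, sigma_hat^2]."""
    lam_hat = sum(recent_arrivals) / len(recent_arrivals)  # mean-arrival stand-in
    sigma2_hat = ewma_volatility(recent_arrivals)
    return patient_feats + system_feats + [lam_hat, sigma2_hat]
```

Whatever the forecasting backend, the key design point is that λ̂ and σ̂² enter the state vector itself, so the agent can act on anticipated demand rather than only on observed congestion.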

3.6. Evaluation and Ablation Protocols

Evaluation was performed using paired experiments across random seeds, comparing the DDQN against an FCFS baseline. Performance was measured across key KPIs: mean waiting time, rejection rate, ICU match accuracy, fairness index, and staff overtime. Statistical validity was ensured through paired t-tests, Wilcoxon signed-rank tests, ANOVA, and Kruskal–Wallis tests with Holm–Bonferroni corrections.
Ablation studies isolated the effect of forecast signals by comparing forecast-naïve and forecast-aware models, while robustness checks simulated:
  • Arrival surges (+25% arrivals),
  • Triage mix shifts (+30% T1 patients),
  • Tail-risk (CVaR-95) analysis,
  • Convergence across multiple seeds.
The evaluation workflow is summarized in Appendix A, Algorithm A3.

3.7. Explainability and Governance

To ensure clinical trust, SHAP-based explainability was applied to every decision, providing both global feature importance (triage severity, ICU availability, forecast signals, staff overtime) and local per-decision attributions. These explanations were logged to support IRB-style audits and enable clinicians to override decisions when necessary. SHAP values were computed directly on the trained online Q-network Q θ , treating the vector of Q-values as a multi-output model. For each decision step, we explain the Q-value corresponding to the action selected by the agent. No surrogate model was used: all attributions are derived from the actual function learned by Q θ , ensuring that explanations faithfully reflect the policy’s internal reasoning.
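Exact SHAP attributions require the shap library applied to Q_θ. As a self-contained illustration of per-decision, per-feature attribution on the selected action's Q-value, the sketch below uses simple baseline substitution (occlusion), which coincides with SHAP only for linear models and is otherwise a crude approximation:

```python
def occlusion_attribution(q_fn, state, baseline, action):
    """Attribute the Q-value of the chosen action to each state feature by
    replacing that feature with a baseline value. `q_fn` maps a state to a
    list of Q-values (one per action); SHAP instead averages over feature
    coalitions, so this is only an illustrative stand-in."""
    full = q_fn(state)[action]
    attributions = []
    for i in range(len(state)):
        perturbed = list(state)
        perturbed[i] = baseline[i]
        attributions.append(full - q_fn(perturbed)[action])
    return attributions
```

As in the paper's setup, the explained quantity is the Q-value of the action the agent actually selected, so the attribution reflects the policy's own decision function rather than a surrogate.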

4. Results and Discussion

This study evaluates the proposed fairness-aware, forecast-informed DQN against a transparent FCFS baseline on a benchmark-aligned synthetic cohort (60,000 visits; realistic ward/ICU ratios; 450 staff; one-year horizon). Metrics include Average Waiting Time, Rejection Rate, ICU Match Accuracy, Fairness Index (FI), Staff Overtime, and a synthetic Patient Satisfaction proxy. Unless otherwise noted, results aggregate five random seeds with identical episode budgets and report 95% confidence intervals (CIs), hypothesis tests, and effect sizes. Where appropriate, findings are contextualised with clinical and operational literature [1,5,7,12,16,29,30,40].
For clarity, the principal evaluation metrics are defined as follows:
  • Fairness Index (FI):
    FI = μ(per-patient reward) / Var_triage(waiting time),
    where Var_triage(waiting time) is the variance of average waiting times across triage groups (T1–T5). A higher FI indicates better equity (lower cross-triage variance) and/or higher overall reward.
  • Tail Risk (CVaR-95):
    CVaR₉₅(W) = E[ W | W ≥ VaR₉₅(W) ],
    the conditional average of waiting times in the worst 5% tail.
  • Percent Change:
    Δ% = 100 × (Baseline − DQN) / Baseline,
    used to report the relative improvement of the DQN over FCFS.
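All three metrics can be computed directly from simulation logs; a minimal sketch (function and variable names are illustrative):

```python
def fairness_index(per_patient_rewards, waits_by_triage):
    """FI = mean per-patient reward / variance of triage-group mean waits."""
    mu = sum(per_patient_rewards) / len(per_patient_rewards)
    group_means = [sum(w) / len(w) for w in waits_by_triage.values()]
    m = sum(group_means) / len(group_means)
    var = sum((g - m) ** 2 for g in group_means) / len(group_means)
    return mu / var

def cvar_95(waits):
    """Mean of the worst 5% of waits (conditional value at risk)."""
    tail = sorted(waits)[int(0.95 * len(waits)):]
    return sum(tail) / len(tail)

def pct_change(baseline, treatment):
    """Relative improvement of the DQN over the baseline, in percent."""
    return 100.0 * (baseline - treatment) / baseline
```

Applied to the headline waiting-time figures, pct_change(215.3, 102.5) recovers the reported 52.4% reduction.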

4.1. Waiting Time Reductions

The mean waiting time decreased from 215.3 min under FCFS to 102.5 min with the proposed DQN, corresponding to a 52.4% overall reduction. Improvements were most pronounced for high-acuity cohorts, with Triage-1 patients dropping from 302.1 to 90.4 min (−70.1%) and Triage-2 from 241.9 to 92.8 min (−61.6%), while Triage-3–5 experienced more moderate gains of 38–48%. As shown in Figure 3, the reduction was especially marked for Triage-1 and Triage-2, where the clinical impact of shorter delays is greatest.
Statistical testing confirmed the robustness of these results: Δ = −112.8 min (95% CI [−143.2, −84.4]); paired t-test p = 0.002; Cohen’s d = 0.86 (large). Extreme delays also improved significantly, with the 90th-percentile wait falling from 498 to 221 min (−55.6%) and the 95th percentile from 611 to 276 min (−54.8%).
Discussion: The mechanisms behind these gains are twofold. First, the fairness-aware reward penalised long waits disproportionately for Triage-1/2, redistributing resources toward the most urgent patients. Second, forecast-augmented states allowed pre-allocation of ICU capacity ahead of anticipated spikes, thereby reducing bottlenecks.
These results are consistent with prior studies documenting crowding-driven delays as direct contributors to excess morbidity and mortality [1,5,28,40]. The observed FCFS wait times (>200 min) align with national ED benchmarks. The halving of delays—particularly for high-acuity patients—represents a clinically material safety improvement and underscores the potential of fairness-aware, forecast-informed reinforcement learning to address one of the most pressing operational risks in acute care.

4.2. Rejection Rate (Diversions/Turn-Aways)

The rejection rate decreased from 27.6% under FCFS to 7.8% with the proposed DQN, a 71.7% reduction. Statistical testing confirmed the significance of this improvement: the Wilcoxon signed-rank test vs. FCFS yielded p = 0.008, with Hodges–Lehmann Δ = −18.9 percentage points and Cliff’s δ = 0.62 (large). Figure 4 illustrates the comparative performance, showing that the DQN consistently kept diversions below 10%, even under peak demand.
Discussion. The large reduction in rejection rate reflects the mechanism of the DQN: forecast-informed “wait-short” deferrals and volatility-band buffering created safe admission windows and prevented premature rejections. In contrast, FCFS scheduling—where patients are processed sequentially without consideration of urgency or optimal capacity use—often produced arbitrary rejections. Under FCFS, approximately 120 patients would be rejected in a given evaluation window, with 80–90 of these attributable to capacity mismanagement.
Scheduling enhancements embedded in the DQN, such as dynamic discharge planning, outpatient scheduling, and bounded staff overtime, allowed the system to exploit unused capacity. The analysis also revealed that surplus non-core staff, when effectively deployed, contributed to lowering the lost patient ratio.
In practice, diversion rates of 20–30% are typical during ED crowding [3,5]. Achieving rejection rates below 10% therefore represents a notable benchmark, placing our system well below standard averages. Importantly, the implications extend beyond efficiency: reducing diversions ensures fairer access to care and alleviates downstream overcrowding at neighboring hospitals.

4.3. ICU Match Accuracy

Correct ICU placement improved from 93.4% under FCFS to 95.1–95.2% with the proposed DQN. Statistical analysis using a binomial GLM with logit link confirmed the effect ( p = 0.029 ). Confidence intervals were FCFS: [92.5–94.3%] vs. DQN: [94.3–95.9%], corresponding to an odds ratio (OR) of approximately 1.21. Figure 5 shows the distribution, highlighting the consistent improvement under the DQN scheduler.
Discussion. Although the numerical increase appears modest, its clinical implications are substantial. By anticipating high-acuity arrivals, the DQN selectively deferred low-acuity cases when ICU capacity was tight, thereby preserving slots for critical patients and reducing instances of unsafe warding. In contrast, FCFS policies enforced rigid FIFO processing, ignoring near-term demand and capacity pressure, which led to misplacements.
Even small percentage gains in ICU match accuracy have been linked to significant reductions in excess mortality, since inappropriate ward placement of ICU-eligible patients is a direct risk factor for adverse outcomes [6,7,10]. The observed improvement therefore demonstrates that efficiency and fairness gains were not achieved at the expense of clinical appropriateness. Instead, the DQN framework enhances patient safety by ensuring more accurate allocation of limited ICU resources.

4.4. Fairness (Equity of Waiting Times)

The Fairness Index (FI), defined in Section 3.3, was used to quantify disparities across triage groups. FI increased from 0.49 under FCFS to 0.71 with the proposed DQN, a 45% improvement, while the variance in triage-wise waiting times fell by approximately 37–41%. The Kruskal–Wallis test confirmed significance across triage groups, with post-hoc effects concentrated in the T1/T2 vs. T3–T5 comparisons. Figure 6 illustrates the FI improvement and variance reduction.
Discussion. The increase in FI stems from explicit fairness penalties embedded in the DQN reward, which discouraged policies that minimized mean waiting times by favoring low-acuity patients. In contrast, FCFS showed FI ≈ 0.49 and cross-triage variance ≈ 3.8, indicating systematic inequities for higher-acuity patients. By comparison, the DQN achieved FI ≈ 0.71 and cross-triage variance ≈ 0.9, reflecting more equitable treatment across all triage categories.
Equity in ED/ICU access is not only an ethical consideration but also a clinical necessity: systematic disadvantage for T1/T2 patients directly translates into excess morbidity and mortality [9,10,23,27]. These results confirm that fairness can be embedded directly into the learning objective, rather than being monitored only post hoc, and that RL systems can align both efficiency and ethical standards.

4.5. Fairness-Weight Sensitivity ( ζ )

To explore sensitivity, the fairness coefficient ζ ∈ {0, 0.25, 0.5, 0.75, 1.0} was varied. At ζ = 0, the policy achieved the best mean waits (∼55% reduction), but rejections remained high (10–11%) and FI stagnated (∼0.50–0.55). At ζ ≈ 0.5, the selected operating point, outcomes were balanced: waits decreased by 52%, rejections fell to 7.8%, and FI rose to ∼0.71. At higher values (ζ ≥ 0.75), FI improved further (∼0.74–0.76), but efficiency regressed modestly. Figure 7 presents the Pareto front of mean wait time versus FI across ζ.
Discussion. This analysis highlights the tunability of the proposed framework: administrators can select ζ based on local policy priorities. A lower ζ emphasizes efficiency but weakens equity, while higher ζ prioritizes fairness at the cost of longer waits. The chosen ζ 0.5 offers a clinically meaningful trade-off, delivering significant improvements in both efficiency and equity. Importantly, this demonstrates that fairness is not a fixed property but a controllable parameter in reinforcement learning systems, enabling adaptive deployment across diverse hospital settings.
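The tunable trade-off amounts to a ζ-weighted scalarization of the reward plus a Pareto filter over the resulting (mean wait, equity) operating points. The linear form below is an assumption for illustration; the paper's full reward also penalizes diversions, ICU misuse, and overtime:

```python
def fair_reward(wait_cost, disparity, zeta=0.5):
    """Scalarized reward trading efficiency against equity: larger zeta
    penalizes cross-ward disparity more heavily (illustrative form)."""
    return -wait_cost - zeta * disparity

def pareto_front(points):
    """Keep the points not dominated by any other point, with both
    coordinates to be minimized (e.g., (mean_wait, -FI) per zeta value),
    as in the Figure 7 sweep."""
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front
```

Administrators would then pick an operating point on the returned front according to local policy priorities, exactly as the ζ ≈ 0.5 choice was made here.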

4.6. Staff Overtime (Sustainability)

Daily overtime declined significantly, from 16.3 h/day under FCFS to 7.2 h/day with the proposed DQN, representing a 55.8% reduction. An ANOVA across seeds confirmed significance ( F ( 1 , 8 ) = 8.35 , p = 0.015 , η 2 = 0.28 , moderate–large effect). Figure 8 shows aggregate daily overtime under both models.
Discussion. The reduction in overtime reflects the DQN’s ability to anticipate workload and smooth demand proactively. Forecast features allowed the model to allocate capacity in advance, while fairness-aware penalties discouraged excessive reliance on overtime. In contrast, FCFS scheduling often led to reactive overuse of staff, producing daily overtime that regularly exceeded 10 h and demonstrating an inability to adjust to inflow.
Excessive overtime is strongly associated with staff burnout, fatigue, and medical errors [2,4,5,40]. Many health systems target thresholds of fewer than 10 h of overtime per week per clinician, a target far exceeded by the FCFS baseline. By halving daily overtime while improving efficiency, the DQN demonstrates that operational gains need not be “purchased” at the expense of an unsustainable staff burden. This outcome contributes directly to the sustainability of hospital operations and long-term workforce well-being.

4.7. Patient Satisfaction (Synthetic Proxy)

Mean satisfaction rose from 2.3/5 under FCFS to 4.3/5 with the proposed DQN. A bootstrap test confirmed the improvement was significant ( p < 0.01 ). SHAP attribution of the synthetic proxy ranked contributors as waiting times (41%), rejections (33%), fairness (16%), and other factors (10%). Figure 9 presents the distribution of satisfaction under both models.
Discussion. The FCFS model produced satisfaction rates of only ∼45%, with 35% of patients unsatisfied, reflecting the negative impact of long waits and high rejection rates. In contrast, the Double DQN achieved satisfaction rates near 80%, with unsatisfied patients falling to ∼10%. This improvement highlights how fairness-aware scheduling and adaptive decision-making diffuse benefits across patient outcomes.
While this proxy is synthetic, it mirrors patterns observed in real HCAHPS surveys (https://www.cms.gov/medicare/quality/initiatives/hospital-quality-initiative/hcahps-patients-perspectives-care-survey (accessed on 1 September 2025)), where long waits and diversions are leading drivers of dissatisfaction [1]. By improving ICU matching, reducing rejections, and halving waiting times, the DQN addresses the key factors known to shape patient experience and downstream hospital quality ratings. The results suggest that reinforcement learning approaches may support not only operational performance but also patient-centered care, with potential implications for reimbursement and reputation.

4.8. Statistical Validity and Effect Sizes

All improvements were statistically validated. For paired comparisons, both parametric (paired t-tests) and non-parametric (Wilcoxon signed-rank) tests were applied. Multi-group differences were examined using one-way ANOVA or Kruskal–Wallis, with Holm–Bonferroni correction for multiple hypotheses. Effect sizes were reported using Cohen’s d (parametric) or Cliff’s δ (non-parametric). Across KPIs, statistical power exceeded 0.9, confirming adequacy of the synthetic cohort size. Results are summarised in Table 2.
Discussion. Effect sizes were consistently medium to large, suggesting that observed gains are not only statistically significant but also practically meaningful. The high statistical power reflects adequacy of the synthetic cohort, lending robustness to conclusions. This rigorous statistical validation addresses common concerns in healthcare AI research, where many studies report only significance without effect size or reproducibility analysis.
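The Holm–Bonferroni step-down used for the multiple-hypothesis correction can be sketched as follows (a standard construction, not code from the paper):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction: sort p-values ascending, compare the
    k-th smallest against alpha/(m-k), and stop at the first failure.
    Returns a reject/accept flag per original hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values also fail
    return reject
```

Holm's procedure controls the family-wise error rate at α while being uniformly more powerful than plain Bonferroni, which is why it suits the multi-KPI comparisons reported here.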

4.9. Ablation: Forecast-Aware vs. Forecast-Naive DQN

To isolate the role of forecast features, we compared the full model against a DQN with identical architecture but no forecast inputs. Forecast-augmented states (SARIMAX mean + GARCH volatility) produced consistent improvements: waiting times decreased by ∼9%, diversions fell by ∼28%, fairness index rose by 9.6%, and overtime decreased by 19%. All improvements were statistically significant ( p < 0.05 ). Results are summarised in Table 3.
Discussion. Forecast-informed features enabled the agent to anticipate surges rather than reacting to congestion after it materialized. This yielded shorter waits, fewer diversions, and improved fairness. The results reinforce evidence from operational research that forecasting enhances preparedness, but here it is embedded directly into the online decision state. The observed equity gains also show that forecasting benefits extend beyond efficiency, aligning with clinical priorities for reducing disparities across triage categories.

4.10. Robustness: Surges, Mix Shifts, and Seeds

Robustness was evaluated under multiple stress scenarios:
  • Arrival Surge (+25% arrivals): Waits increased for both FCFS and DQN, but DQN still reduced delays by 42% and diversions by 61%.
  • Triage Mix Shift (+30% T1 patients): DQN maintained fairness index > 0.55 and halved delays for T1 relative to FCFS.
  • Tail Risk (CVaR-95): In the worst 5% of cases, waits under DQN averaged 276 min versus 611 under FCFS.
  • Convergence/Stability: Across five random seeds, variance in KPIs was under 3%, confirming reproducibility.
The quantitative results of these robustness experiments are summarized in Table 4.
Discussion. These stress tests demonstrate that the proposed framework is not brittle. Gains in efficiency, equity, and safety persisted under surge loads and changing triage mixes, while extreme delays were substantially curtailed. In particular, reductions in tail risk (CVaR-95) highlight the model’s ability to mitigate catastrophic worst-case waits, which are often most strongly associated with excess morbidity and litigation. Low variance across seeds further confirms reproducibility, strengthening confidence in deployment readiness.
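The surge and mix-shift scenarios (Appendix A, Algorithm A5) amount to simple transformations of the simulator configuration. A sketch, where the renormalization of the non-T1 triage shares is one plausible reading of "+30% T1 patients" (the paper does not specify the exact renormalization):

```python
def apply_surge(config, factor=1.25):
    """Scale arrival rates by +25% (Algorithm A5, step 1)."""
    out = dict(config)
    out["arrival_rate"] = config["arrival_rate"] * factor
    return out

def shift_triage_mix(probs, t1_boost=0.30):
    """Increase the T1 share by 30% and renormalize the remaining
    classes so the probabilities still sum to one (assumption)."""
    new_t1 = probs[0] * (1 + t1_boost)
    scale = (1 - new_t1) / sum(probs[1:])
    return [new_t1] + [p * scale for p in probs[1:]]
```

Running the trained policy unchanged under these transformed configurations, across the same five seeds, yields the stress-test KPIs summarized in Table 4.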

4.11. Explainability (SHAP) Findings

Global SHAP attributions identified triage severity, current waiting time, ICU availability, forecasted mean arrivals, volatility, and staff overtime as the most influential features in scheduling decisions. Local explanations confirmed these drivers: for instance, in high-acuity cases, ICU scarcity combined with elevated volatility led the agent to select “wait-short” deferrals, whereas in non-urgent cases, overtime breaches with no relief forecasted produced safe rejections. Figure 10 illustrates the global feature importance distribution.
Discussion: The SHAP outputs confirm that model reasoning was clinically aligned: urgency and capacity constraints dominated decision-making, while secondary features such as arrival timing and departmental load refined edge-case outcomes. Global bar plots highlighted the most salient predictors, while local force plots clarified boundary conditions for individual patients (e.g., ICU deferral vs. safe rejection). This dual perspective strengthens auditability, enabling clinicians to override when needed, and directly supports transparency requirements in clinical AI guidelines [12,28]. Such interpretability fosters trust and facilitates deployment in real-world hospital settings.

4.12. Safety and Bias Checks

Several safeguards were evaluated. Group fairness tests revealed no significant bias in rejection rates by sex or age (e.g., χ² test, p = 0.19 for sex). Monotonicity checks confirmed that higher-acuity triage groups never received systematically longer waits once forecasts were held constant. Policy-level safety caps further ensured that T1/T2 deferrals were bounded, and ICU overflow escalation was automatically triggered after repeated waits.
Discussion: These results demonstrate that efficiency and fairness gains were achieved without compromising ethical safeguards. The absence of systematic demographic disparities and the enforcement of monotonicity provide assurance against hidden biases, a frequent concern in reinforcement learning systems. Moreover, embedding hard policy constraints on critical triage classes and ICU safety margins complements the fairness-aware reward, ensuring clinical acceptability and regulatory compliance. Collectively, these checks reinforce the framework’s alignment with both operational and ethical standards.
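The monotonicity safeguard reduces to a short audit over per-triage mean waits; in the sketch below, triage keys are assumed to sort from highest acuity (T1) downward:

```python
def waits_monotone_in_acuity(mean_wait_by_triage):
    """True if mean waits never decrease with decreasing acuity, i.e. a
    higher-acuity group (T1 first) never waits longer than a lower one.
    Keys like 'T1'..'T5' sort in acuity order lexicographically."""
    waits = [mean_wait_by_triage[t] for t in sorted(mean_wait_by_triage)]
    return all(a <= b for a, b in zip(waits, waits[1:]))
```

Such checks are cheap enough to run on every evaluation batch, turning the ethical constraint into a continuously monitored invariant rather than a one-off audit.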

5. Discussion

The results demonstrate that reinforcement learning can substantially improve hospital scheduling compared to rule-based baselines, but the broader significance lies in how these improvements align with clinical priorities and existing literature. The halving of waiting times and the large reductions in diversions and transfers indicate that learned policies can relieve chronic congestion, a finding consistent with reports of overcrowding as a persistent challenge in Emergency Departments and ICUs [1,2,3]. By explicitly integrating fairness into the reward, the model not only reduces overall delays but also narrows disparities across triage categories, addressing an ethical concern highlighted in recent fairness-aware healthcare scheduling studies [9,10].
The improvements in ICU match accuracy, while modest in percentage terms, are clinically meaningful because they reduce the risk of both over- and under-triage. This aligns with prior reinforcement learning work in critical care, such as the AI-Clinician for sepsis treatment [6], but extends it by embedding fairness constraints and real-time forecasts into the control policy. The reduction in staff overtime further underscores the potential to mitigate workforce burnout, a key concern emphasized by the World Health Organization’s workforce 2030 strategy [4].
Forecast-informed state features provided a distinct performance boost, especially in reducing diversions and improving fairness. This validates the hypothesis that short-horizon signals enable proactive rather than reactive control, a principle well established in supply chain and airline scheduling [40,41], but less commonly applied in healthcare. By injecting SARIMAX mean and GARCH volatility estimates directly into the decision state, the agent demonstrated anticipatory behavior, buffering against surges before congestion materialized. This design choice addresses long-standing calls for tighter integration between forecasting and operational control in hospital management [20,24].
Fairness-weight sensitivity analysis confirmed that ethical priorities can be tuned without sacrificing efficiency. At  ζ = 0.5 , the system achieved a balanced trade-off, preserving large efficiency gains while significantly improving equity. This flexibility is important for policy adoption: hospitals may prioritize fairness differently depending on regional regulations or ethical guidelines. The robustness checks further confirm that gains are not brittle; the policy maintained superior performance under surges, triage-mix shifts, and across random seeds, suggesting real-world viability in volatile environments.
SHAP analysis provided transparency into both global and local decision factors. Triage severity, ICU availability, wait times, and forecasted volatility emerged as the most salient drivers of model outputs, aligning with clinical priorities. Local case attributions such as ICU deferrals during scarcity or safe rejections when overtime thresholds were exceeded offered interpretable justifications for individual actions. This dual perspective (global ranking and local force plots) strengthens auditability and clinician trust, aligning with established clinical explainability frameworks [12,28].
Validation confirmed that ethical safeguards were preserved. Group fairness tests (sex/age) showed no significant disparities, monotonicity checks ensured higher triage never received longer waits, and policy guards prevented unsafe deferrals of T1/T2 patients. These findings demonstrate that reinforcement learning can improve efficiency without compromising clinical safety or equity.
Comparison with prior work showed that the ∼52% reduction in waiting times and ∼72% reduction in rejections surpass typical reinforcement learning studies in hospital operations, which often report 22–30% improvements and seldom integrate fairness or overtime considerations [7,11,13,14,17,18,19,20,21,22]. These gains are attributed to the combined effect of fairness-aware reward shaping and forecast-augmented states. Cross-domain parallels with airlines and supply chains further validate the hybrid forecast + control approach [40,41].
Regarding the clinical and operational impact, translational estimates suggest that, for a mid-sized ED with ∼50,000 annual visits, reductions of this magnitude imply hundreds of thousands fewer prolonged waits or diversions each year. Improved ICU placement addresses critical safety risks, while reduced overtime enhances workforce sustainability. These improvements translate into higher patient satisfaction and potentially better hospital quality ratings. Collectively, the results demonstrate that efficiency, equity, and sustainability can be improved jointly, rather than being treated as trade-offs.
Overall, the proposed framework advances hospital scheduling by combining fairness, forecasting, and explainability in a unified reinforcement learning approach. This not only improves efficiency and safety but also provides clinicians and administrators with transparent, auditable decision support, making the system more aligned with the ethical and governance requirements of healthcare delivery.
While the present formulation relies on a discrete action space, assigning patients to ED, ward, or ICU beds, or issuing short/long waits or diversions, this design choice was made to maintain training stability and clinical interpretability. However, real-world hospital operations often require more subtle, continuous decisions, such as variable staffing intensity, partial or temporary bed reservations, or dynamic prioritization thresholds. Future work may explore hierarchical reinforcement learning, where high-level discrete actions are combined with low-level continuous controllers capable of modulating staffing or capacity in real time. Alternatively, hybrid architectures that integrate DDQN with actor–critic methods could support continuous control while retaining the transparency advantages of discrete decision-making. Investigating these hybrid action representations may yield additional reductions in waiting-time variance and further improvements in fairness across acuity levels.

6. Conclusions

This work reframed hospital scheduling as a fairness-aware, forecast-informed reinforcement learning problem, demonstrating that efficiency, equity, and sustainability can be improved jointly rather than traded off. Using a Double DQN with short-horizon demand forecasts and fairness penalties in the reward, the system was evaluated against a transparent FCFS baseline on a synthetic dataset (60,000 annual ED visits, realistic ICU ratios, 450 staff). Results showed substantial and statistically validated improvements: average waiting times halved, diversions dropped by over 70%, ICU match accuracy improved, fairness index rose by 45%, and staff overtime was cut by more than half, with robustness confirmed under surges and triage shifts. SHAP-based explanations provided both global and local attributions, ensuring transparency and clinician oversight, while fairness weighting allowed tuning of ethical–efficiency trade-offs. Despite limitations such as reliance on synthetic data, narrow action granularity, and a single fairness operationalization, the framework is reproducible, computationally light, and aligned with governance requirements. Future research should integrate advanced probabilistic forecasting, counterfactual explainability, federated multi-site pilots, and multi-agent extensions to validate safety and generalizability across real-world hospital networks.

Author Contributions

Conceptualization, R.A., H.Z., A.H., B.H., R.Z., and A.K.; methodology, R.A., H.Z., A.H., and B.H.; software, R.A., H.Z., A.H., and B.H.; validation, R.A., H.Z., A.H., and B.H.; formal analysis, R.A., H.Z., A.H., and B.H.; investigation, R.A., H.Z., A.H., and B.H.; resources, R.A., H.Z., A.H., and B.H.; data curation, R.A., H.Z., A.H., and B.H.; writing—original draft preparation, R.A., H.Z., A.H., and B.H.; writing—review and editing, all authors; visualization, R.A., H.Z., A.H., and B.H.; supervision, R.Z. and A.K.; project administration, R.Z. and A.K.; funding acquisition, none. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
RL: Reinforcement Learning
DQN: Deep Q-Network
DDQN: Double Deep Q-Network
FCFS: First-Come, First-Served
ED: Emergency Department
ICU: Intensive Care Unit
FI: Fairness Index
SHAP: SHapley Additive exPlanations
SARIMAX: Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors
GARCH: Generalized Autoregressive Conditional Heteroskedasticity
CVaR-95: Conditional Value at Risk at the 95% level
KPIs: Key Performance Indicators
ANOVA: Analysis of Variance
GLM: Generalized Linear Model
CI: Confidence Interval
OR: Odds Ratio
WHO: World Health Organization
OECD: Organisation for Economic Co-operation and Development
HCAHPS: Hospital Consumer Assessment of Healthcare Providers and Systems

Appendix A. Algorithms

Algorithm A1: FCFS Baseline Scheduler
Algorithm A2: Online Forecast Module (SARIMAX + GARCH)
Algorithm A3: Evaluation: KPIs and Statistical Testing
Input: Daily logs for FCFS and DDQN across seeds
  1. Aggregate per-day KPIs (waits, rejections, ICU match, FI, overtime);
  2. Compute 95% confidence intervals;
  3. Apply paired tests: t-test, Wilcoxon, GLM-logit, ANOVA, Kruskal–Wallis;
  4. Report effect sizes (Cohen’s d, Cliff’s δ, OR, η²).
Algorithm A4: Ablation: Forecast-Naïve DQN
  1. Same as Algorithm 1, but remove the forecast features (λ̂, σ̂²) from the state;
  2. Compare KPIs vs. the full model.
Algorithm A5: Robustness Scenario Generator
Input: Base simulator config
  1. Apply surge: arrivals ×1.25;
  2. Apply triage mix shift: +30% T1 patients;
  3. Compute tail risk: CVaR-95 of waits;
  4. Repeat over 5 seeds; summarize variance (<3%).

Appendix B. Synthetic Data Assumptions and Validation

Appendix B.1. Data Sources and Calibration Targets

The synthetic dataset was calibrated against aggregated statistics from major U.S. healthcare benchmarks:
  • CDC NHAMCS 2023: emergency department arrivals, triage proportions, waiting times.
  • AHA Annual Survey 2024: licensed bed capacity, staffing patterns, occupancy levels.
  • AHRQ HCUP 2023: transfer delays, discharge intervals, satisfaction proxies.
These benchmarks served as targets for modelling patient arrivals, acuity distributions, staffing rosters, and inpatient flow.

Appendix B.2. Generative Assumptions

Table A1 summarizes the stochastic models used to generate synthetic patient-flow variables. Choices were made to preserve statistical realism while avoiding identifiable patient information.
Table A1. Summary of generative assumptions and calibration targets.
Variable | Distribution / Parameters | Calibrated To
Patient arrivals per hour | Poisson(λ(t)); λ̄ = 9.5 (weekday), 8.0 (weekend); diurnal amplitude A = 0.35 | CDC NHAMCS hourly arrival pattern
Triage categories (4-level) | Multinomial [0.08, 0.32, 0.42, 0.18] | CDC triage distribution
Transfer delay | Triangular(0, 60, 180) minutes | AHRQ transfer delay statistics
Staff shift length | Mixture: 70% N(8 h, 0.5²) + 30% N(12 h, 0.5²) | AHA staffing rosters
Bed capacity | {ED: 55, Ward: 120, ICU: 20} | AHA licensed beds data
Satisfaction score | Uniform(−0.05, +0.05), clipped to [0, 1] | AHRQ HCAHPS constraints

Appendix B.3. Statistical Validation

To evaluate the realism of the synthetic data, distributions were compared to CDC, AHA, and AHRQ benchmarks using:
  • Jensen–Shannon Divergence (JSD),
  • Kolmogorov–Smirnov (KS) tests,
  • Cohen’s d effect-size analysis.
Table A2. Real vs. synthetic statistical comparison. Lower JSD and higher KS p-values indicate better alignment.
Attribute | Real Mean | Synthetic Mean | JSD | KS p-Value | Interpretation
Hourly arrivals | 9.4 | 9.5 | 0.018 | 0.72 | Indistinguishable
Triage Level 1 proportion | 0.085 | 0.08 | 0.027 | 0.68 | Consistent
Shift duration (hours) | 8.9 | 8.8 | 0.024 | 0.66 | Consistent
Bed occupancy (%) | 81.5 | 82.0 | 0.033 | 0.59 | Consistent
Transfer delay (minutes) | 62.7 | 65.4 | 0.028 | 0.64 | Minor deviation
Satisfaction (0–1) | 0.79 | 0.80 | 0.019 | 0.71 | Consistent
Across features, the mean JSD was 0.023 ± 0.007, indicating close alignment with the empirical distributions (all KS comparisons: p > 0.05).
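The validation metrics above can be computed with standard SciPy routines. The sketch below compares two stand-in Poisson samples (not the study data) and reports the Jensen–Shannon divergence and the two-sample KS p-value; note that scipy.spatial.distance.jensenshannon returns the JS *distance*, which must be squared to obtain the divergence reported in Table A2.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Stand-in samples; in the study these would be benchmark vs. synthetic draws.
real = rng.poisson(9.4, size=5000)
synthetic = rng.poisson(9.5, size=5000)

# Discretize both samples onto a common support to compare distributions.
bins = np.arange(0, max(real.max(), synthetic.max()) + 2)
p, _ = np.histogram(real, bins=bins, density=True)
q, _ = np.histogram(synthetic, bins=bins, density=True)

# Square the JS distance (base 2) to obtain the divergence in [0, 1].
jsd = jensenshannon(p, q, base=2) ** 2

# Two-sample Kolmogorov–Smirnov test on the raw samples.
ks_stat, ks_p = ks_2samp(real, synthetic)

print(f"JSD = {jsd:.3f}, KS p = {ks_p:.2f}")
```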

References

  1. U.S. Centers for Disease Control and Prevention (CDC). Emergency Department Visits: Annual Report; U.S. Centers for Disease Control and Prevention (CDC): Atlanta, GA, USA, 2022.
  2. American Hospital Association (AHA). Hospital Statistics: Workforce and Capacity Trends; American Hospital Association (AHA): Chicago, IL, USA, 2023. [Google Scholar]
  3. Organisation for Economic Co-operation and Development (OECD). Health at a Glance 2023: OECD Indicators; Organisation for Economic Co-operation and Development (OECD): Paris, France, 2023. [Google Scholar]
  4. World Health Organization (WHO). Global Strategy on Human Resources for Health: Workforce 2030 (Update); World Health Organization (WHO): Geneva, Switzerland, 2022.
  5. OECD/WHO. Hospital capacity, beds, and occupancy. In OECD Health Data; OECD/WHO: Paris, France, 2022. [Google Scholar]
  6. Komorowski, M.; Celi, L.A.; Badawi, O.; Gordon, A.J.; Faisal, A.A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 2018, 24, 1716–1720. [Google Scholar] [CrossRef]
  7. Zhang, B.; Turkcan, A.; Lin, J.; Lawley, M. Clinic scheduling models with overbooking for patients with heterogeneous no-show probabilities. Ann. Oper. Res. 2010, 178, 121–144. [Google Scholar] [CrossRef]
  8. Guinet, A.; Chaabane, S. Operating theatre planning with mixed integer programming. Eur. J. Oper. Res. 2003, 147, 652–663. [Google Scholar]
  9. Zhang, Y.; Chen, X.; Liu, J. Fairness-aware reinforcement learning for healthcare queues. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 7551–7561. [Google Scholar]
  10. Suresh, H.; Venkatasubramanian, S.; Guttag, J.; Barocas, S. Fairness in ICU resource allocation: Reinforcement learning approaches. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022. [Google Scholar]
  11. Wang, J.; Yang, H.; Chen, Q. Multi-agent reinforcement learning for hospital resource coordination. Artif. Intell. Med. 2022, 128, 102289. [Google Scholar]
  12. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
  13. Liu, Q.; Zhu, S.; Chen, X.; Zhao, J. Reinforcement learning in healthcare: A survey. Artif. Intell. Med. 2021, 117, 102108. [Google Scholar]
  14. Yu, C.; Liu, J.; Nemati, S. Reinforcement learning in healthcare: A survey. ACM Comput. Surv. 2022, 55, 1–36. [Google Scholar] [CrossRef]
  15. Miotto, R.; Wang, F.; Wang, S.; Jiang, X.; Dudley, J.T. Deep learning for healthcare: Review, opportunities and challenges. Briefings Bioinform. 2018, 19, 1236–1246. [Google Scholar] [CrossRef]
  16. Shickel, A.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record analysis. IEEE J. Biomed. Health Inform. 2018, 22, 1589–1604. [Google Scholar] [CrossRef]
  17. Agency for Healthcare Research and Quality (AHRQ). Using Data to Reduce Emergency Department Crowding; Agency for Healthcare Research and Quality (AHRQ): Rockville, MD, USA, 2022.
  18. American College of Emergency Physicians (ACEP). Boarding and ED Crowding: Policy Statement; American College of Emergency Physicians (ACEP): Irving, TX, USA, 2023.
  19. Box, R.A.; Jenkins, G.; Reinsel, G.M.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; Wiley: Hoboken, NJ, USA, 2015. [Google Scholar]
  20. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 3rd ed.; OTexts: Melbourne, Australia, 2021. [Google Scholar]
  21. Hyndman, R.J.; Khandakar, Y. Automatic time series forecasting: The forecast package for R. J. Stat. Softw. 2008, 27, 1–22. [Google Scholar] [CrossRef]
  22. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688. [Google Scholar] [CrossRef]
  23. Durbin, J.; Koopman, S.J. Time Series Analysis by State Space Methods, 2nd ed.; Oxford University Press: Oxford, UK, 2012. [Google Scholar]
  24. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327. [Google Scholar] [CrossRef]
  25. Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica 1982, 50, 987–1007. [Google Scholar] [CrossRef]
  26. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer: Berlin, Germany, 2005. [Google Scholar]
  27. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  28. Lundberg, S.M.; Erion, G.; Lee, S.-I. Consistent individualized feature attribution for tree ensembles. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 4175–4185. [Google Scholar]
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
  30. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  31. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  32. Chollet, F. Keras; GitHub repository: San Francisco, CA, USA, 2015. [Google Scholar]
  33. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: https://www.tensorflow.org/ (accessed on 2 November 2025).
  34. Jain, R. A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems; Digital Equipment Corporation Technical Report DEC-TR-301; Eastern Research Laboratory, Digital Equipment Corporation: Hudson, MA, USA, 1984. [Google Scholar]
  35. Rao, A.; Wang, J. Quantifying fairness in AI systems: A survey of metrics and applications. In Proceedings of the IEEE International Conference on Data Mining Workshops, Singapore, 17–20 November 2018; pp. 495–502. [Google Scholar]
  36. Roadevin, C.; Hill, H. How can we decide a fair allocation of healthcare resources during a pandemic? J. Med Ethics 2021, 47, e84. [Google Scholar] [CrossRef] [PubMed]
  37. Hagendorff, T. The Ethics of AI Ethics: An Evaluation of Guidelines. Minds Mach. 2020, 30, 99–120. [Google Scholar] [CrossRef]
  38. Li, Y.; Mao, C.; Huang, K.; Wang, H.; Yu, Z.; Wang, M.; Luo, Y. Deep Reinforcement Learning for Efficient and Fair Allocation of Healthcare Resources. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 16–22 August 2025; pp. 9790–9798. [Google Scholar] [CrossRef]
  39. Suresh, H.; Guttag, J.; Horowitz, M. Clinical allocation via constrained reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; pp. 900–908. [Google Scholar]
  40. Candel, A.; Parmar, V.; LeDell, E.; Arora, A. Deep Learning with H2O; H2O.ai: Singapore, 2016. [Google Scholar]
  41. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  42. Demeester, P.; Aerts, E.; Causmaecker, B.D.; den Berghe, E.V. A hybrid tabu search algorithm for patient-to-room assignment in hospitals. Comput. Ind. Eng. 2010, 59, 17–26. [Google Scholar]
  43. Schäfer, F.; Walther, M.; Grimm, D.G.; Hübner, A. Combining machine learning and optimization for the operational patient-bed assignment problem. Health Care Manag. Sci. 2023, 26, 785–806. [Google Scholar] [CrossRef] [PubMed]
  44. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95. [Google Scholar] [CrossRef]
  45. van der Walt, S.; Colbert, S.C.; Varoquaux, G. The NumPy array: A structure for efficient numerical computation. Comput. Sci. Eng. 2011, 13, 22–30. [Google Scholar] [CrossRef]
  46. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Distribution of patients across triage categories in the synthetic dataset.
Figure 3. Average waiting times by triage level under FCFS and Double DQN policies. Significant reductions are observed across all levels, with the largest gains for critical patients (Triage-1 and Triage-2).
Figure 4. Patient rejection rates under FCFS and Double DQN scheduling policies. The Double DQN consistently reduces diversions to below 10%, outperforming FCFS baselines.
Figure 5. ICU match accuracy distribution under FCFS and Double DQN scheduling policies. The Double DQN improves accuracy from ∼93.4% to ∼95.1–95.2%, with statistically significant reductions in unsafe warding.
Figure 6. Fairness Index (FI) and cross-triage variance under FCFS vs. Double DQN. The Double DQN improves FI by 45% while reducing variance across triage groups.
Figure 7. Pareto front of mean waiting time versus Fairness Index (FI) across fairness weight ζ . ζ 0.5 provides the best efficiency–equity balance.
Figure 8. Aggregate daily overtime under FCFS and Double DQN scheduling policies. The Double DQN reduces overtime by more than half, supporting sustainable staffing.
Figure 9. Patient satisfaction distribution under FCFS and Double DQN scheduling. The DQN raises satisfaction from ∼45% to ∼80%, largely by reducing waits and rejections.
Figure 10. Global SHAP feature importance for Double DQN scheduling.
Table 1. Summary of related approaches in hospital scheduling and their limitations.
| Approach | Key Contributions | Limitations |
|---|---|---|
| Rule-based heuristics (FCFS, triage rules) | Simple to implement; transparent; widely adopted [1,2,3,4,5] | Myopic under surges; ignores acuity variation; leads to boarding and inequities |
| Operations Research (OR) models | Formal optimization; encodes staffing and bed constraints [8,9] | Assumes static inputs; limited scalability for real-time, stochastic ED/ICU dynamics |
| Reinforcement Learning (RL) | Learns policies from interaction; reduces delays vs. heuristics [7,11,13,14] | Rarely incorporates fairness; explanations limited; usually reactive to demand |
| Deep RL architectures (DQN, PPO, multi-agent) | Handles discrete assignment (DQN) and continuous actions (PPO); multi-agent improves coordination [17,18,19,20,21,22] | Multi-agent adds complexity; actor–critic needed for continuous staffing; limited fairness integration |
| Explainability (SHAP) | Global and per-decision attributions; supports clinical governance [12,28] | Limited adoption in operational controllers; often retrospective only |
| Forecasting in hospital ops | Accurate ED/ICU demand prediction (ARIMA, LSTM, Transformers) [19,20,21,22,23,24,25,27,29,30,31,32,33] | Typically siloed from real-time scheduling; forecasts not used in online state |
| Hybrid forecasting + scheduling | Successful in airlines and supply chains; improves efficiency [40,41] | Rare in healthcare; few implementations linking forecasts directly into hospital schedulers |
Table 2. Key Performance Indicators (KPIs) with 95% confidence intervals and effect sizes.
| Metric | FCFS | DQN | Δ (%) | Effect Size |
|---|---|---|---|---|
| Mean Waiting Time (min) | 215.3 ± 12.4 | 102.5 ± 9.8 | −52.4% | d = 0.86 |
| Diversion Rate (%) | 27.6 ± 3.1 | 7.8 ± 2.5 | −71.7% | d = 0.91 |
| ICU Match Accuracy (%) | 93.4 ± 1.2 | 95.1 ± 0.9 | +1.7% | d = 0.41 |
| Fairness Index (FI) | 0.41 ± 0.04 | 0.59 ± 0.03 | +45% | d = 0.77 |
| Staff Overtime (h/100 staff-days) | 19.8 ± 2.3 | 8.7 ± 1.5 | −56.0% | d = 0.88 |
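The effect sizes in Table 2 are Cohen's d values. A minimal sketch of the standard pooled-standard-deviation formulation (the paper does not state which variant it used), applied to toy data rather than the study distributions:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (standard two-sample
    formulation; illustrative, not necessarily the paper's variant)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Toy check: unit-variance samples whose means differ by 0.8
rng = np.random.default_rng(7)
baseline = rng.normal(0.0, 1.0, 2000)
treated = rng.normal(0.8, 1.0, 2000)
print(round(cohens_d(treated, baseline), 2))
```

By common convention, |d| ≈ 0.8 is a large effect, which is consistent with the waiting-time and overtime rows of Table 2.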
Table 3. Ablation results: Forecast-aware vs. forecast-naive DQN.
| Metric | DQN (No Forecast) | DQN (+Forecast) | Δ |
|---|---|---|---|
| Mean Waiting Time (min) | 112.3 | 102.5 | −9% |
| Diversion Rate (%) | 10.8 | 7.8 | −28% |
| Fairness Index (FI) | 0.54 | 0.59 | +9.6% |
| Staff Overtime (h/100 staff-days) | 10.7 | 8.7 | −19% |
Table 4. Robustness checks: Surge, triage-mix, and stability tests.
| Scenario | FCFS | DQN | Δ | p-Value |
|---|---|---|---|---|
| Arrival Surge (Wait min) | 298 | 173 | −42% | 0.004 |
| Triage Shift (T1 Wait min) | 310 | 152 | −51% | 0.006 |
| Tail Risk (CVaR-95 min) | 611 | 276 | −55% | 0.003 |
| Seed Variance (SD across runs) | >8% | <3% | — | — |
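The tail-risk row reports CVaR-95, i.e., the mean waiting time over the worst 5% of cases. A minimal sketch, assuming the standard definition (mean of the tail beyond the 95th percentile) and illustrative exponential waits rather than the study data:

```python
import numpy as np

def cvar(waits, alpha=0.95):
    """Conditional Value-at-Risk: the mean waiting time over the worst
    (1 - alpha) fraction of cases, i.e. the tail beyond the
    alpha-quantile (Value-at-Risk) threshold."""
    waits = np.asarray(waits, dtype=float)
    var = np.quantile(waits, alpha)   # Value-at-Risk threshold
    tail = waits[waits >= var]
    return tail.mean()

# Toy example: exponential waits (illustrative only, not study data)
rng = np.random.default_rng(1)
waits = rng.exponential(scale=100.0, size=10_000)
print(round(cvar(waits, 0.95), 1))
```

Unlike a simple percentile, CVaR is sensitive to how bad the worst cases are, which is why it is used here as the tail-risk criterion.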
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Abualrous, R.; Zouzou, H.; Zgheib, R.; Hasan, A.; Hijazi, B.; Kermani, A. Fairness-Aware Intelligent Reinforcement (FAIR): An AI-Powered Hospital Scheduling Framework. Information 2025, 16, 1039. https://doi.org/10.3390/info16121039


