Article

Development and Clinical Interpretation of an Explainable AI Model for Predicting Patient Pathways in the Emergency Department: A Retrospective Study

by Émilien Arnaud 1,2, Pedro Antonio Moreno-Sanchez 3, Mahmoud Elbattah 2,4, Christine Ammirati 1,5, Mark van Gils 3, Gilles Dequen 2 and Daniel Aiham Ghazali 1,6,*

1 Department of Emergency Medicine, Amiens Picardy University Hospital, 80000 Amiens, France
2 Laboratoire Modélisation, Information, Systèmes (MIS) UR4290, University of Picardie Jules Verne, 80000 Amiens, France
3 Faculty of Medicine and Health Technology, Tampere University, 60100 Seinäjoki, Finland
4 College of Arts, Technology and Environment, University of the West of England, Bristol BA1 9DZ, UK
5 Amiens Picardy University Hospital—SimuSanté, 80000 Amiens, France
6 INSERM UMR1137—Infection, Antimicrobials, Modelling, Evolution, University of Paris-Cité, 75018 Paris, France
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8449; https://doi.org/10.3390/app15158449
Submission received: 27 June 2025 / Revised: 15 July 2025 / Accepted: 21 July 2025 / Published: 30 July 2025

Abstract

Background: Overcrowded emergency departments (EDs) create significant challenges for patient management and hospital efficiency. In response, Amiens Picardy University Hospital (APUH) developed the “Prediction of the Patient Pathway in the Emergency Department” (3P-U) model to enhance patient flow management. Objectives: To develop and clinically validate an explainable artificial intelligence (XAI) model for hospital admission prediction, using structured triage data, and to demonstrate its real-world applicability in the ED setting. Methods: Our retrospective, single-center study involved 351,019 patients who consulted in APUH’s EDs between 2015 and 2018. Various models (including a cross-validated artificial neural network (ANN), a k-nearest neighbors (KNN) model, a logistic regression (LR) model, and a random forest (RF) model) were trained and assessed for performance with regard to the area under the receiver operating characteristic curve (AUROC). The best model was validated internally with a test set, and the F1 score was used to determine the best threshold for recall, precision, and accuracy. XAI techniques, such as Shapley additive explanations (SHAP) and partial dependence plots (PDPs), were employed, and the clinical explanations were evaluated by emergency physicians. Results: The ANN gave the best performance during the training stage, with an AUROC of 83.1% (SD: 0.2%) for the test set; it surpassed the RF (AUROC: 71.6%, SD: 0.1%), KNN (AUROC: 67.2%, SD: 0.2%), and LR (AUROC: 71.5%, SD: 0.2%) models. In an internal validation, the ANN’s AUROC was 83.2%. The best F1 score (0.67) determined that 0.35 was the optimal threshold; the corresponding recall, precision, and accuracy were 75.7%, 59.7%, and 75.3%, respectively. The SHAP and PDP XAI techniques (as assessed by emergency physicians) highlighted patient age, heart rate, and presentation with multiple injuries as the features that most strongly influenced admission from the ED to a hospital ward. These insights are being used in bed allocation and patient prioritization, directly improving ED operations. Conclusions: The 3P-U model demonstrates practical utility by reducing ED crowding and enhancing decision-making processes at APUH. Its transparency and physician validation foster trust, facilitating its adoption in clinical practice and offering a replicable framework for other hospitals to optimize patient flow.

1. Introduction

Overcrowding in emergency departments (EDs) has become an international issue of overriding importance. Several studies from different countries have provided indisputable evidence to show that overcrowded EDs can lead to a decrease in quality of care in general, and unintended harm to patients in particular [1,2,3]. In turn, these issues influence patient management; patients admitted to the wrong ward may experience higher care costs [4], a longer hospital stay [4], a greater risk of morbidity [5] and mortality [6,7,8,9], and a greater likelihood of leaving the ED without being examined [10]. Accordingly, a variety of solutions have been developed with the goal of limiting ED overcrowding, including pre-hospital dispatch before attending the ED (to encourage attendance at other healthcare facilities), improved bed coordination, effective triage with front-loading investigations, fast tracks, optimized transfer to the destination ward (even if the bed is not ready), and an increase in the number of available beds [11,12,13].
Furthermore, machine learning (ML) and artificial intelligence (AI) have become promising tools in medicine; by learning from large amounts of data, these tools can help clinicians to detect disease patterns or improve the patient care pathway, for example. In 2012, the results of two studies confirmed that the timely prediction of patient outcomes helps to reduce the length of stay in the ED [1,14]. Thus, the use of AI in the ED has emerged as an active topic of research for the prediction of patient outcomes (discharge or admission) using triage data [15], mortality prediction [16], prioritization [17], the prediction of adverse events (on the basis of symptoms) [18], and diagnosis prediction [19]. The literature data cover a variety of types of ML algorithms, e.g., logistic regression (LR), support vector machines, decision trees, ensemble methods, fuzzy logic, naïve Bayes classifiers, and deep learning models [15,16,17].
Nevertheless, the credibility of predictive models is still a concern for clinicians [20,21]. Greater clinical uptake will require more readily accessible and interpretable results [22]. Explainable artificial intelligence (XAI) has been developed to meet these requirements [23] and thus allow healthcare experts to confidently make use of data-driven decisions and provide more personalized, trustworthy treatments and diagnoses [24]. In the ED, an XAI model might help (i) emergency physicians to understand how data are considered internally in the model and (ii) managers to gain insight into the model’s logic and justify its deployment in the hospital. Furthermore, XAI might allow patients to understand how their data are used, which is in line with the terms of the European Union’s General Data Protection Regulation and the “right to an explanation” for people whose data are processed by an AI algorithm [25].
In the context of overcrowding in the ED, Amiens Picardy University Hospital (APUH, located in the city of Amiens in northern France, with 1.9 million inhabitants in the surrounding region) [26] has developed an AI model called “Prediction of the Patient Pathway in the Emergency Department” (3P-U, U for “Urgence” which is the French word for ED). This model has been designed to improve patient flow by predicting the final medical decision [27,28]. However, to ensure wider adoption by ward specialists, the 3P-U model was refined and made more explainable (especially for the outputs generated by the algorithms) while maintaining the level of classification performance for each specific medical issue. This balance between accuracy and explainability is especially important in the case of neural networks and deep learning models, due to the complexity of the internal layers and the “black box” behavior.

2. Problem Statement

The primary objective of the present study was to develop an explainable predictive model for the prediction of hospital admission using structured ED triage data. The secondary objectives were to (i) improve the interpretability of the 3P-U model’s predictions, (ii) obtain insights into the model’s inner logic (to make the prediction of hospital admission more understandable), and (iii) validate the explainability results against clinical evidence (i.e., the ED clinicians’ expertise).
To promote standardized reporting on prediction models in medicine, we drafted the present manuscript in compliance with the Transparent Reporting of Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklist (Supplementary Material S1.1) [29] and the “Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research” [30].

3. Background and Related Work

This section provides the necessary background for the study, beginning with an overview of Deep Learning (DL) in EDs, where AI models aid in triage, patient outcome prediction, and workflow optimization. The discussion then shifts to XAI in healthcare, highlighting its role in ensuring transparency, trust, and adoption of AI models in medical settings.

3.1. Applications of DL in ED

DL has emerged as a valuable tool in emergency medicine, with its application to ED challenges growing significantly over the past five years. DL techniques have demonstrated considerable potential in enhancing various aspects of ED care, including diagnosis, imaging interpretation, triage, and clinical decision-making. These systems can rapidly analyze multimodal data (e.g., vital signs, EHRs, imaging) to uncover complex patterns that complement clinician judgment.

3.2. Triage and Risk Stratification

Triage is critical for prioritizing patients by urgency, yet traditional nurse-led triage can be inconsistent due to its subjective nature. DL-based systems have been developed to assist triage nurses, standardizing acuity assessments and improving patient flow [31]. Studies report that AI-supported triage improves the identification of high-risk patients, resulting in improved patient outcomes [32]. For example, a DL model trained on triage vitals, symptoms, and notes outperformed conventional methods in predicting critical outcomes, enabling more reliable severity stratification [33].

3.3. Patient Admission Prediction

Predicting which ED patients will require hospital admission is another area significantly transformed by DL. Early admission prediction models enable proactive bed allocation and have the potential to mitigate ED crowding. For instance, one study demonstrated that a DL-based model achieved an area under the receiver operating characteristic (AUROC) curve of 0.87–0.88 for predicting admissions, outperforming traditional risk scores [33]. Furthermore, incorporating triage text notes into these models has enhanced their prediction accuracy [27]. Such advancements empower ED staff to anticipate patient disposition more effectively and streamline care delivery.

3.4. Diagnostics and Decision Support

DL is also augmenting ED diagnostics, with AI models assisting in the real-time interpretation of tests and imaging [31]. For example, DL-based image analysis can detect critical findings on ED radiographs (e.g., pneumothorax, fractures) with high sensitivity and AUROC > 0.8, comparable to expert performance [34]. Similarly, DL models using vital signs and lab results can detect life-threatening conditions such as sepsis, and even cardiac arrest risk, at an early stage (e.g., Zhang et al. [35] or Choi et al. [36]). By extending physicians’ diagnostic capabilities, these tools accelerate emergency decision-making.

3.5. Introduction to XAI in Healthcare

AI is transforming healthcare by enhancing diagnostic accuracy, optimizing treatment plans, and improving operational efficiency. However, the increasing complexity of DL models has raised concerns about their lack of interpretability, particularly in high-stakes medical applications [37]. XAI aims to address this challenge by making AI-driven decisions transparent, interpretable, and trustworthy, enabling clinicians to understand and validate model outputs [38].
One of the earliest discussions on AI interpretability in healthcare emphasized the need for trust and human oversight in decision-making [39]. More recent studies have reinforced this view, highlighting that black-box AI models can struggle with clinician adoption unless their reasoning aligns with medical knowledge [40]. In response, various XAI techniques have been developed to improve interpretability, including:
  • Feature Attribution Methods: Techniques like Shapley Additive Explanations (SHAP) [41] and Local Interpretable Model-agnostic Explanations (LIME) [38] assign importance scores to input features, showing how each factor contributes to a prediction.
  • Global vs. Local Explanations: XAI methods can be broadly categorized into global and local explanations [23]. Global explanations provide insights into the overall behavior of a model, showing how features generally influence predictions across the entire dataset. Methods such as Partial Dependence Plots (PDPs) illustrate the average impact of a feature, helping clinicians understand the general trends learned by the model. However, global explanations may not always be useful for individual patient decisions. In contrast, local explanations focus on interpreting a single prediction for a specific patient. Techniques like LIME aim to identify which features contributed most to a particular decision. For example, in an emergency setting, a local explanation might reveal that a patient’s elevated heart rate and abnormal oxygen saturation were the primary drivers behind an AI model predicting a high likelihood of hospital admission. These patient-specific insights enhance clinical decision-making by allowing physicians to verify whether the model’s reasoning aligns with their medical judgment.
  • Example-Based Explanations: These aim to improve the interpretability of AI models by presenting specific instances from the dataset that are similar to the case under consideration [42]. This approach allows clinicians to compare a current patient’s data with past cases, facilitating a more intuitive understanding of the model’s predictions. For instance, one study [43] developed an oral cancer screening system using case-based reasoning with DL. The system retrieves similar past cases to provide visual explanations that align with clinician reasoning. The DL model integrates medical knowledge, improving both accuracy (85%) and interpretability.
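To illustrate the feature-attribution idea behind SHAP, the exact Shapley values of a deliberately tiny, hypothetical two-feature model can be computed by enumerating coalitions. The feature names, coefficients, and background values below are invented for illustration; real SHAP implementations approximate this computation for high-dimensional models.

```python
import itertools
import math

# Hypothetical two-feature "model" scoring admission from age and heart rate;
# purely illustrative, not the paper's ANN.
def model(age, heart_rate):
    return 0.004 * age + 0.002 * heart_rate

# Background (average) values stand in for a feature "absent" from a coalition.
background = {"age": 45.0, "heart_rate": 80.0}
instance = {"age": 79.0, "heart_rate": 120.0}
features = list(background)

def coalition_value(present):
    """Evaluate the model with absent features replaced by background values."""
    x = {f: (instance[f] if f in present else background[f]) for f in features}
    return model(x["age"], x["heart_rate"])

shapley = {}
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    for r in range(len(others) + 1):
        for subset in itertools.combinations(others, r):
            # Classic Shapley weight for a coalition of this size.
            weight = (math.factorial(len(subset))
                      * math.factorial(len(features) - len(subset) - 1)
                      / math.factorial(len(features)))
            phi += weight * (coalition_value(set(subset) | {f})
                             - coalition_value(set(subset)))
    shapley[f] = phi

# Additivity (local accuracy): attributions sum to prediction minus baseline.
baseline = coalition_value(set())
prediction = coalition_value(set(features))
```

The additivity property computed at the end is exactly what SHAP's summary and waterfall plots rely on: per-feature contributions that sum to the gap between the model's average output and the individual prediction.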
In emergency medicine, where decisions must be made rapidly, XAI can play a crucial role in ensuring that predictive models align with clinical expectations. Recent studies (e.g., Arnaud et al. [44] and Moreno-Sanchez et al. [45]) demonstrate that integrating explainability improves physician trust and adoption, particularly when model insights are validated against expert knowledge. As AI continues to evolve in healthcare, XAI will remain a key factor in bridging the gap between machine intelligence and human expertise.

3.6. Related Work

XAI has emerged as a crucial development in healthcare, especially in emergency departments (EDs), where decision-making should be rapid, reliable, and transparent. Several studies have explored the application of XAI in predicting patient outcomes, optimizing resource management, and improving triage processes, all of which aim to enhance both the accuracy of ML models and the trust clinicians place in them.
A common theme across many studies is the integration of XAI techniques to address the “black box” nature of AI models. Okada et al. (2023) stressed the importance of post-model explainability methods, such as visualization and feature relevance, to mitigate trust issues among clinicians [46]. This foundational need for transparency is echoed in studies that focus on more specific applications, such as the prediction of reattendance in EDs. For example, Chmiel et al. (2021) used SHAP values to explain individual-level predictions, demonstrating how XAI can be used to inform post-discharge interventions and reduce reattendance rates [47].
Expanding on this, Petsis et al. (2022) and Peláez-Rodríguez et al. (2024) applied explainability techniques to forecast ED visits and manage resources [48,49]. Both studies used XGBoost and SHAP to uncover the factors influencing patient flow, with Petsis et al. highlighting variables such as weekday and patient volume, while Peláez-Rodríguez et al. focused on continuous training and multi-model regression approaches for both short- and long-term forecasting. The shared emphasis on feature importance and interpretability across these studies illustrates the growing consensus that transparency is key to improving operational efficiency in EDs.
Similarly, in their systematic review, Piliuk and Tomforde (2023) emphasized the need for generalized approaches to AI in emergency medicine, noting that most studies remain narrowly focused on specific conditions or patient groups [50]. Their review highlighted the limitations of existing models and the necessity for explainable systems to facilitate broader application. This aligns with Arnaud et al. (2023), who developed an explainable Natural Language Processing (NLP) model using transformers to predict ED admissions based on triage notes. Their use of Local Interpretable Model-agnostic Explanation (LIME) for interpretability highlights the importance of transparency in text-based models, which can complement the more numerical data-driven approaches employed in earlier studies [44].
Deep learning methods have also been applied in emergency care, with Lee et al. (2024) exploring NLP-based models for symptom detection during triage [51]. Their use of SHAP to explain the model’s outputs, combined with their fine-tuning of transformer models like Bidirectional Encoder Representations from Transformers (BERT), underscores the broader trend towards integrating XAI into deep learning frameworks to enhance both performance and trust.
The mixed-methods study by Laxar et al. (2023) further demonstrates the psychological dimension of XAI, revealing that even though explainability may not always directly influence decision outcomes, it significantly increases trust in AI systems [52]. This is a key factor when deploying AI in high-stakes environments like emergency medicine, where clinicians rely heavily on intuitive decision-making.
Finally, Moreno-Sánchez et al. (2024) extended the scope of explainability to resource allocation, a crucial challenge in ED management. Their ensemble tree classifiers provided both accurate admission predictions and resource allocation insights, once again demonstrating the value of explainable AI systems [45]. Even in studies focused on pandemic-specific contexts, such as Casiraghi et al. (2020), which developed XAI models for COVID-19 risk prediction, the emphasis remains on transparent decision-making, illustrating the versatility and necessity of explainable AI across various healthcare applications [53]. Beyond the healthcare sector, the value of SHAP-based interpretability has been demonstrated in other critical domains such as port logistic optimization [54].
In summary, the body of work around XAI in emergency medicine converges on the importance of explainability not only to improve model performance but to foster trust and ensure clinical applicability. From predicting patient reattendance and managing ED traffic to addressing broader operational challenges like resource allocation, explainability is emerging as a critical component in making AI both usable and trustworthy in the fast-paced, high-stakes environment of emergency care.

4. Material and Methods

4.1. Study Design, Data Source, and Participants

We conducted a retrospective, single-center study of all adult patients who consulted in the EDs at APUH between 2015 and 2018. The data were obtained directly from the hospital’s electronic health records (ResUrgences®, Berger Levrault, Boulogne-Billancourt, France). We excluded patients with no triage data and patients who had been transferred to an intensive care unit (ICU) immediately upon arrival.
In French EDs, triage is performed by a nurse according to the five-level French Emergency Nurses Classification in Hospital (FRENCH) triage scale [55] (Table 1). Each triage criterion is associated with a triage level; for example, the criterion “pulse ≥ 180 bpm” is associated with triage level 1, and the criterion “chest pain with a normal ECG” is associated with triage level 3. The nurse considers all triage criteria available and prioritizes the patient according to the most severe triage level met. The triage nurse can initiate simple tests (such as an ECG or a urine test) at the triage stage and can refer complex cases to a physician.
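The “most severe triage level met” rule described above can be sketched as a small function. The criterion-to-level mapping below is a hypothetical fragment for illustration; the full FRENCH scale is far more extensive (see Supplementary Material S1.5 and Taboulet et al. [55]).

```python
# Hypothetical fragment of a criterion-to-level mapping (level 1 = most
# severe); not the complete FRENCH triage scale.
CRITERIA_LEVELS = {
    "pulse >= 180 bpm": 1,
    "chest pain with a normal ECG": 3,
    "minor limb trauma": 5,
}

def triage_level(observed_criteria):
    """Return the most severe (lowest-numbered) level among the criteria met."""
    levels = [CRITERIA_LEVELS[c] for c in observed_criteria if c in CRITERIA_LEVELS]
    return min(levels) if levels else None

# A patient meeting both a level-3 and a level-5 criterion is triaged level 3.
```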
The ML models were implemented by an interdisciplinary team of physicians from APUH and AI researchers from the University of Picardy Jules Verne (UPJV, Amiens, France). Specifically, structured triage data were used to develop binary classifiers for predicting the outcome (i.e., admission to a hospital ward or discharge). The outcome was binary because at the triage stage, too little medical information has been collected to train a multiclass classifier with acceptable performance. The predictors and target outcomes are detailed in Supplementary Material S1.3.
As in typical ED settings, not all variables were systematically collected (i.e., some data were “not purposely collected”). Consultations with missing values were not excluded. We assumed that some vital signs were not purposely collected because they appeared to be of little clinical relevance and/or could not ethically be collected. We therefore decided to fill in missing data with a value of zero or a physiological value (Table 2). This decision was based on our clinical experience: an emergency physician would more readily understand predictions knowing that missing values had been replaced by the physiological value. In contrast, more complex methods (such as multiple imputation by chained equations) for in-context value imputation would be harder for the physician to understand. This missing data strategy (i.e., the replacement of missing values with fixed values) has been validated previously [56]. Accordingly, we decided to keep all the variables because “abnormal” data (i.e., either non-physiological or non-imputed) would nevertheless provide the model with information.
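A minimal sketch of this fixed-value imputation strategy, using hypothetical column names and default values (the paper's actual per-variable defaults are listed in its Table 2):

```python
import pandas as pd

# Hypothetical physiological (or zero) defaults per variable; illustrative only.
PHYSIOLOGICAL_DEFAULTS = {
    "heart_rate": 70.0,         # bpm
    "temperature": 37.0,        # degrees Celsius
    "oxygen_saturation": 98.0,  # %
    "capillary_glucose": 0.0,   # treated as "data not purposely collected"
}

triage = pd.DataFrame({
    "heart_rate": [120.0, None],
    "temperature": [None, 38.5],
    "oxygen_saturation": [92.0, None],
    "capillary_glucose": [None, None],
})

# Replace missing values with fixed defaults, keeping every column so that
# non-imputed "abnormal" values remain informative for the model.
imputed = triage.fillna(PHYSIOLOGICAL_DEFAULTS)
```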
Between 2015 and 2018, 351,019 patients attended the EDs at APUH. Of these, 48,053 (13.7%) were excluded because no triage data were recorded or because they were immediately transferred to an ICU. Thus, 302,966 patients were included in the study (Table 2). The final medical decision corresponded to 99,340 (32.8%) admissions and 203,626 (67.2%) discharges. The main characteristics are summarized in Table 2, and the full table (including all features, missing values, and imputation strategies) is given in Supplementary Material S1.3.
The French Emergency Nurses Classification in Hospital (FRENCH) triage scale is detailed in Supplementary Material S1.5 and in Taboulet et al. [55]. Data are expressed as the mean (SD) or n (%).

4.2. Machine Learning Algorithms and Performance Metrics

To validate the best model’s performance, 20% of the initial dataset was randomly reserved as a test set, stratified on the “admission status” variable. Using the remaining 80% of the initial dataset, a set of models (an artificial neural network (ANN), a k-nearest neighbors (KNN) model, a random forest (RF) model, and a logistic regression (LR) model) was trained as regressors (1 if admitted, 0 if discharged) and evaluated with five-fold cross-validation. An ANN is a widely used ML algorithm composed of input, hidden, and output layers connected to each other by weighted edges. Input feature vectors are processed sequentially by every layer via non-linear transformations in each neuron, and an output (e.g., a class label for classification or a value for regression) is generated by the final layer. In the KNN method, each object being classified is compared with its k nearest training examples via a distance function, where k is an integer; its label is then assigned by a majority vote [57]. RFs are a combination of tree predictors such that each tree depends on the value of a random vector sampled independently from the same distribution for all trees in the forest [58]. Predictions from all trees are aggregated, and the final class is assigned by majority vote. In LR, categorical and continuous predictors predict a binomial outcome as a special case of a generalized linear model [59]. The five-fold cross-validation method consisted of five random splits (four fifths for training and one fifth for validation), stratified on the outcome variable; this resulted in five models for each type of algorithm. The preprocessing consisted of value normalization for numerical variables and one-hot encoding for categorical variables. Given the limited medical data available at this stage in the care path, we used all the features available at triage. The ANN model consisted of a multilayer perceptron composed of four dense layers and three dropout layers.
This architecture was based on the structured data branch of the preliminary experiments [27]. The hyperparameters (the number of neurons in the first three dense layers and the dropout rate) were tuned using the random search strategy, with the cross entropy as the loss function. A rectified linear unit activation function was used in the hidden layers, and a sigmoid function was used in the output layer. The training process was completed over 10 epochs, using the Adam [60] optimizer with its default parameters. The KNN algorithm was trained with up to 10 neighbors and the kd_tree algorithm driven by the Manhattan metric. The RF algorithm was set up using 100 estimators, the Gini criterion, “min_samples_split” = 2, and use of the square root to find the best split. The LR algorithm was set up using the penalty at L2 and the limited-memory Broyden–Fletcher–Goldfarb–Shanno solver [61].
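The three scikit-learn baselines might be configured roughly as follows. This is a sketch on synthetic stand-in data, not the study's triage dataset, and the Keras ANN is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the triage feature matrix and binary admission target.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Baseline models configured as described above (KNN: up to 10 neighbors,
# kd_tree with the Manhattan metric; RF: 100 estimators, Gini criterion,
# min_samples_split=2, square root of the feature count per split;
# LR: L2 penalty with the L-BFGS solver).
models = {
    "knn": KNeighborsClassifier(n_neighbors=10, algorithm="kd_tree",
                                metric="manhattan"),
    "rf": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 min_samples_split=2, max_features="sqrt",
                                 random_state=0),
    "lr": LogisticRegression(penalty="l2", solver="lbfgs", max_iter=1000),
}
for name, clf in models.items():
    clf.fit(X, y)
```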
Each algorithm’s performance was assessed in terms of the mean and standard deviation (SD) of the AUROC over the five-fold cross-validation of the training set (80%). A Friedman test was used to compare the respective models’ AUROCs, with p < 0.05 as the threshold for statistical significance (Supplementary Material S1.2). Lastly, the performance metrics were evaluated once, for the best model of each architecture, using the test set (20%). The performance of each of the best models was rated in terms of recall, precision, and accuracy, by applying a threshold defined by the best F1 score. The models were trained and assessed using the Python libraries Scikit-Learn 0.21 [62] and Keras 2.11.0 [63].
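The stratified five-fold AUROC evaluation can be sketched with scikit-learn as follows (synthetic stand-in data; LR is used simply as a compact example model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data for illustration only.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

# Five stratified folds scored by AUROC, mirroring the training protocol above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
mean_auroc, sd_auroc = scores.mean(), scores.std()
```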

4.3. XAI Techniques

In general, post hoc XAI techniques can provide global explanations (i.e., how the trained model makes predictions overall) and/or local explanations (i.e., how the model arrives at a prediction for a single instance). These explanations help users understand the distribution of prediction outputs as a function of the input features [23]. The adoption of XAI methods in healthcare is becoming increasingly crucial; recent systematic reviews highlight the need for transparency and trust in clinical AI models, particularly through interpretable predictions [64,65].
In the present study, global and local explanations were explored with two different model-agnostic, post hoc XAI techniques: SHAP and the PDP. Firstly, SHAP is a feature importance technique that offers local explanations by calculating an additive measure (known as the Shapley value) with local accuracy and consistency for each feature of the single instance [66]. SHAP also provides global explanations by aggregating all the Shapley values concerning every single instance of the dataset using the “Kernel SHAP” implementation. Here, we used (i) the SHAP summary dot plot implementation [66] to quantify the contribution of each feature’s value to the global model’s prediction, and (ii) waterfall plots to describe the specific influence of the feature values on the local prediction of an instance. Secondly, PDPs provide visual explanations about the marginal effect on the predicted outcome of a given feature over the range of its observed values [67]. In this way, PDPs can visualize the trend in a specific feature’s influence on the global prediction’s probability. The SHAP and PDP analyses were implemented as part of a “local framework” with the SHAP Python library 0.35.0 [68] and PDPbox 0.21 [69], respectively. In this paper, the XAI procedure was applied to the best model assessed in the previous step.
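The computation behind a PDP (clamping one feature at each grid value and averaging the predicted probability over the dataset) can be sketched manually as follows. Synthetic data and a logistic regression stand in for the triage dataset and the ANN; the study itself used the PDPbox library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; feature 0 plays the role of a quantitative
# variable such as "Age".
X, y = make_classification(n_samples=300, n_features=5, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def partial_dependence_curve(model, X, feature, grid):
    """Average predicted positive-class probability with one feature clamped
    to each grid value: the marginal effect a PDP visualizes."""
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        curve.append(model.predict_proba(Xv)[:, 1].mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = partial_dependence_curve(clf, X, 0, grid)
```

Plotting `curve` against `grid` yields the trend a PDP shows, e.g., how the predicted admission probability changes across the observed range of a feature.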

4.4. Qualitative Clinical Assessment

The goal was to compare the XAI results with emergency physician decision-making. After the XAI techniques had been employed, the outcomes generated by the SHAP and PDP methods were evaluated for medical significance by two emergency physicians who had contributed to the model development. They assessed the global explanation and two instances: one randomly selected true positive case and one randomly selected true negative case. One physician had 7 years of experience, and the second had more than 10 years. They used their medical expertise to determine whether the models’ findings (i.e., feature importance scores, SHAP visualizations, and related figures) were in line with the underlying physiopathology and clinical practice. The physicians’ feedback played a vital role in validating the coherence and clinical relevance of the results obtained. Agreement between the two emergency physicians was evaluated using a concordance rate. In case of discordance between the two main experts, a third physician would be consulted for the qualitative assessment to determine the model’s validity.
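The concordance rate is simply the share of items on which the two raters agree; a minimal sketch with hypothetical ratings (the study's actual assessment items differ):

```python
# Hypothetical per-item judgments from the two physicians
# (True = "the explanation is clinically coherent").
rater_a = [True, True, False, True, True]
rater_b = [True, True, True, True, True]

# Concordance rate: proportion of items on which the raters agree.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
concordance_rate = agreements / len(rater_a)
```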

4.5. Ethics Approval

The study was authorized by a hospital committee with competency for research not requiring approval by an institutional review board (APUH, reference: PI2019_843_0066).

5. Results

5.1. Model Performance and Evaluation

Using the training set (242,372 patients; 80%), the mean (SD) AUROC was 83.0% (0.1%) with the ANN model, 71.7% (0.2%) with the RF model, 67.2% (0.2%) with the KNN model, and 71.5% (<0.1%) with the LR model (Supplementary Material S1.4). The difference between the models was statistically significant (p = 0.002).
Using the test set (60,593 patients; 20%) and the best ANN model (learning curve in the Supplementary Material S1.4), the AUROC was 83.2%. For the other models, the AUROCs were, respectively, 71.5%, 67.1%, and 61.8% for the LR, KNN, and RF algorithms (Table 3 and Figure 1). With the best threshold (0.35) determined by the best F1 score (0.67), the recall was 75.7%, the precision was 59.7%, and the accuracy was 75.3% (Supplementary Material S1.4).
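The selection of the operating threshold by maximizing the F1 score can be sketched as follows; the probabilities and outcomes below are simulated for illustration, not the study's data.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Simulated predicted admission probabilities and true outcomes.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_prob = np.clip(y_true * 0.4 + rng.normal(0.35, 0.2, 1000), 0, 1)

# Sweep candidate thresholds and keep the one maximizing the F1 score,
# mirroring how the 0.35 operating point was chosen above.
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_true, (y_prob >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]

# Recall, precision (and, if desired, accuracy) at the chosen threshold.
y_pred = (y_prob >= best_t).astype(int)
recall = recall_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
```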
The tuned numbers of neurons in the first three dense layers of the artificial neural network were 192, 128, and 48, respectively. The dropout rate for all three dropout layers was tuned to 0.2. The other parameters are described in the Methods section. Accordingly, the optimized ANN was used in the XAI procedure.

5.2. Explainability Results

Figure 2 summarizes the most relevant features detected by SHAP and shows the overall features and additive contributions to the probability of being admitted to a ward, using the optimized ANN only.
Amongst the features with the greatest contributions, ‘Age’, ‘Heart Rate’, ‘French Emergency Nurses Classification in Hospital (FRENCH) grade: 2’, and ‘Reason: multiple severe injuries’ increased the patient’s likelihood of hospital admission. Since ‘Age’ and ‘Heart Rate’ are quantitative variables, the plot shows how strongly these variables are involved in the algorithm’s decision: a young patient (blue) has a lower likelihood of admission than an older patient (red, with positive SHAP values on the right-hand side of the plot). Conversely, features like ‘Waiting mode: wheelchair’, ‘Reason: chest pain’, and ‘Reason: fainting’ made a considerable negative contribution to the prediction outcome. Even though the (binary) variables ‘Arrival mode: Personal’, ‘FRENCH grade: 5’, ‘FRENCH grade: 4’, and ‘Reason: limb trauma’ made a small negative contribution to the probability of admission when positive, negative values of these variables made a slight positive contribution (e.g., not arriving in the ED by one’s own means increased the likelihood of admission).
SHAP also provided local explainability for the prediction in an individual case by depicting the magnitude and direction of each feature’s contribution to the final score (red: increasing the likelihood of hospital admission; blue: decreasing the likelihood of hospital admission) and its relevance (i.e., the length of the bar in the chart). For instance, Figure 3a,b show the prediction of a true positive case and a true negative case, respectively. In both cases, the prediction started from a baseline Shapley value of 0.378, which corresponds to the model’s average predicted probability of hospital admission over the training data. In line with the general attributions in the SHAP summary plots, the patient’s age was one of the most relevant features for both true negatives and true positives. It was decisive (−0.2) for a true negative case (i.e., a discharged patient), as was ED attendance for ‘Reason: dyspnea < 32/min’ (−0.12). For a true positive (i.e., the prediction of hospital admission), ‘FRENCH 2’ had the highest individual SHAP contribution (+0.11), while ‘Age’ = 79 (+0.1) and ‘Reason: general physical deterioration’ (+0.12) also increased the probability of hospital admission.
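The additivity property underlying these local explanations (the baseline value plus the sum of per-feature contributions reproduces the model's output) can be demonstrated with exact Shapley values on a toy admission scorer. The model, feature values, and background ("average patient") below are invented for illustration; they are not the 3P-U model.

```python
import itertools
import math
import numpy as np

# Toy admission scorer over three triage features (illustrative only):
# age (years), heart rate (bpm), FRENCH grade (1 = most urgent).
def model(x):
    age, hr, grade = x
    z = 0.03 * (age - 50) + 0.02 * (hr - 80) - 0.8 * (grade - 3)
    return 1.0 / (1.0 + math.exp(-z))

background = np.array([45.0, 82.0, 3.5])  # assumed "average patient"
patient = np.array([79.0, 110.0, 2.0])    # hypothetical case

def f_subset(S):
    """Model output with features outside S fixed to background values."""
    x = background.copy()
    x[list(S)] = patient[list(S)]
    return model(x)

def exact_shap(n):
    """Exact Shapley values by enumerating all feature coalitions."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(others, k):
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                phi[i] += w * (f_subset(S + (i,)) - f_subset(S))
    return phi

phi = exact_shap(3)
baseline = f_subset(())       # analogous to the 0.378 baseline in the text
prediction = model(patient)   # baseline + phi.sum() equals this value
```

Libraries such as `shap` approximate these values efficiently for real models; the enumeration above is only feasible for a handful of features.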
For quantitative features in particular, PDPs quantify the contribution to the probability as a function of the feature’s magnitude. By drawing a PDP for every feature (to assess how the probability changes), we found that the features associated with the largest increase in the probability of hospital admission were ‘Age’ and ‘Heart Rate’ (Figure 4 and Figure 5); this confirmed the SHAP results. Other features (such as ‘Capillary blood ketone level’ and ‘Capillary blood glucose level’; Figures S1 and S2 in the Supplementary Material S1.6) also produced a large increase in the probability of hospital admission when extreme (non-imputed) values were considered: values of 59 for ‘Capillary blood ketone level’ and 108 for ‘Capillary blood glucose level’ corresponded to direct hospital admission. Consequently, these variables proved informative for the model even though most of their values were missing.
The increases in the probability of hospital admission identified with SHAP for other categorical features were confirmed by the PDP results: ‘Reason: patient with multiple, severe injuries’ (Figure S3), ‘FRENCH 2’ (Figure S4), and ‘Reason: general physical deterioration’ (Figure S5) increased the probability by 25%, 12.5%, and 9%, respectively. It is noteworthy that more features made a positive contribution (of around 15% or 20%) to hospital admission with PDP than with SHAP: ‘Reason’ with a capillary blood glucose level > 13 mmol/L and a capillary blood ketone level > 1.5 mmol/L (Figure S6), ‘Jaundice’ (Figure S7), ‘Ascites’ (Figure S8), ‘Anemia < 8 g/L’ (Figure S9), ‘Glasgow Coma Score (GCS) < 8’ (Figure S10), ‘GCS 9–12’ (Figure S11), ‘Major dyspnea’ (Figure S12), and ‘Hypoxemia < 85%’ (Figure S13). Furthermore, ‘Alcohol’ (Figure S14) and ‘Administration of oxygen’ (Figure S15) were associated with a substantial increase in the likelihood of hospital admission (26% and 30%, respectively).
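The PDP estimator itself is straightforward: for each grid value of the feature of interest, that feature is overwritten across the whole dataset and the model's predictions are averaged. A minimal sketch follows, using a toy scorer and synthetic data (standing in for the 3P-U ANN and its triage records), with column 0 playing the role of 'Age'.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in dataset: age (years) and heart rate (bpm).
X = np.column_stack([rng.uniform(18, 95, 300),   # 'Age'
                     rng.normal(80, 15, 300)])   # 'Heart Rate'

def model(X):
    """Toy admission scorer, increasing in both features (illustrative)."""
    z = 0.04 * (X[:, 0] - 50) + 0.02 * (X[:, 1] - 80)
    return 1.0 / (1.0 + np.exp(-z))

def partial_dependence(model, X, feature, grid):
    """Average prediction when `feature` is set to each grid value
    for every row of X (the standard PDP estimator)."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(model(X_mod).mean())
    return np.array(pd_values)

grid = np.linspace(18, 95, 20)
pdp_age = partial_dependence(model, X, feature=0, grid=grid)
```

Plotting `pdp_age` against `grid` yields the kind of curve shown in Figure 4; `sklearn.inspection.PartialDependenceDisplay` automates the same computation for scikit-learn estimators.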

5.3. Medical Interpretation of Explainability Results

The application of XAI techniques to the 3P-U model revealed key features that significantly influenced admission predictions in the ED, including heart rate, patient age, arrival with multiple severe injuries, and a FRENCH triage grade of 2. These XAI results are in line with well-established medical knowledge as follows: Firstly, the higher likelihood of admission for older patients can be attributed to their increased frailty, and short- or mid-term discharge tends to be more complex [70]. Secondly, patients with severe injuries often undergo at least 24 h of monitoring, even when the initial additional examinations (lab tests, imaging, etc.) give normal results. It has been reported that the first whole-body CT scan misses severe injuries in 10% of cases, with clinical relevance in 6.7% and no clinical relevance in 3.7% [71]. Thirdly, an abnormal heart rate is a common indicator of a severe medical condition, including cardiovascular disease [72], sepsis [73], and post-surgical bleeding [74].
Conversely, features like ‘Waiting standing or in a wheelchair’ and ‘Arrival in the ED by one’s own means’ accounted for a substantial proportion of discharge predictions. One can reasonably hypothesize that a patient who has waited for test results while standing or sitting is more likely to be discharged from the ED. In such cases, the condition may not have necessitated lying down on a stretcher—making outpatient care more probable, irrespective of the final diagnosis.
The PDP explainability analysis identified other features of clear relevance for predicting hospital admission, including high capillary blood levels of ketones, glucose, and alcohol, a FRENCH triage grade of 2, and jaundice. These results have compelling clinical explanations. Firstly, elevated capillary blood ketone and glucose levels indicate a high likelihood of a decompensated diabetic disorder, possibly accompanied by ketoacidosis [75]. This type of condition often necessitates hospital admission for appropriate management and care. Secondly, the FRENCH triage grade 3 corresponds to relatively urgent cases requiring treatment following a nurse evaluation, while grades 4 or 5 denote less urgent situations. As urgency increases, so does the likelihood of admission [55]; this explains the impact of this triage grade on the prediction of admission. Thirdly, jaundice indicates a hepatic or biliary disorder requiring hospital treatment. Given its medical significance, jaundice influences the decision to admit patients exhibiting this sign.
The results given by the SHAP and PDP XAI techniques reinforce the importance of these features and validate their use by emergency physicians to determine whether a patient should be admitted or discharged. There was no discordance between the two experts; the agreement rate was 100%. By shedding light on the clinical significance of these predictors, the XAI analysis bolsters the model’s credibility and its alignment with medical decision-making practices in the ED.

6. Discussion

The aim of this study was to build an acceptable model, provide explanations of this model, and assess the acceptability of these explanations to emergency physicians. The 3P-U predictive model demonstrated acceptable performance in predicting the patient’s pathway within the ED. The ANN outperformed the other algorithms with regard to the AUROC, whereas accuracy and precision were similar. The AUROC represents the global performance over all thresholds, whereas accuracy and precision are computed at a single threshold. As can be seen in Figure 1, the ROC curve of the ANN encloses a markedly larger area beyond the point at which the curves converge; in other words, the ANN performs better than the other algorithms over a wider range of thresholds.
The global explanations and the two local explanations offered by XAI were consistent with the analyses of the two emergency physicians involved in the study. Below, we examine key aspects of the model’s performance and assess its predictive accuracy and effectiveness.

6.1. Comparison with Previous Studies

There were significant differences between the ANN, RF, KNN, and LR models; we chose the best model (the ANN) as the reference. Based on triage data, the 3P-U model predicted the merged classes of admission or discharge with an accuracy of 75%, a recall of 76%, a precision of 60%, and an AUROC of 83%. Graham et al.’s model achieved similar performance in predicting patient admission (AUROC: 85%) [76]. However, the 3P-U model and Graham et al.’s model do not use exactly the same types of data. Models developed for specific patient populations (e.g., in neurology or cardiology) often achieve higher performance. For example, Klang et al. predicted ICU admission on the basis of unstructured data (AUROC: 92%) [77], whereas unstructured data were not used in our study.

6.2. Explainability Approaches and Alternatives

In this section, we discuss the need for explainability in the healthcare domain and the two broad approaches: white-box (i.e., transparent) and black-box (i.e., opaque) models. In AI, there is an inherent conflict between performance and explainability: the most complex models (e.g., ensemble trees and neural networks) usually perform best but are less interpretable. Many ML studies predicting hospital admissions have used decision trees or ensemble models, which offer good performance and some level of interpretability. ANN models have been less commonly used—probably due to the challenges of interpretation and the higher computational costs involved in their design and training [15,16,17]. Nevertheless, deep learning is a useful method when the data are unstructured (such as the text in clinical notes) or when the number of variables is high [15,16,17]. The accuracy–explainability trade-off is therefore domain-specific; in the field of healthcare, the end users (i.e., clinicians and, in some cases, patients) and data scientists must agree on the trade-off point.
Several XAI studies of the prediction of hospital admission have focused on the importance of the input features. For example, Zhu et al. applied SHAP to a prediction model of psychiatric ward readmissions in which multivariate sources of information (demographics, pharmacology, psychology, and vital signs) were combined [35]. Recent studies have applied deep learning and XAI techniques to predict COVID-19-related hospital admissions, resource needs, and clinical deterioration [63,64,65], given the importance of diagnosing and understanding subtle disease patterns. Hence, an interpretable deep learning model was developed in order to predict the ICU admission and death of patients with COVID-19 [78]. Furthermore, Estiri et al. leveraged iterative feature selection to improve the interpretability of their ML framework for predicting the patient-level risk of hospital admission, ICU admission, the need for mechanical ventilation, and death [79]. In addition, the timely decision-making capabilities that AI models offer in ICUs have been applied to children with pneumonia; SHAP was used to assess random-forest-based prediction models (AUROC: 98.7%) and thus obtain a feature importance list that was subsequently validated by physicians [80]. XAI-based feature relevance techniques are used to support AI models for predicting hospital admission in other fields of medicine, including cardiology [81], organ transplantation [82], spine surgery [83], acute gastrointestinal bleeding [84], and general readmissions [85].

6.3. Practical Implications

Recent studies highlight the use of ML models to refine emergency triage processes. For example, integrating NLP with machine learning for Emergency Severity Index (ESI) acuity assignment has shown improved accuracy over traditional approaches [86]. Additionally, benchmark platforms leveraging large-scale electronic health record (EHR) datasets have been developed to evaluate triage prediction models’ performance in diverse clinical scenarios [87].
The prediction of admission or discharge is important for patients, the ED, and the hospital. Firstly, knowledge of the probability of admission should allow the patient to inform family members accordingly. Secondly, medical supervisors seek to optimize the flow of patients in the ED. Using the triage data, the supervisors dispatch patients to different sectors as a function of the current workload and capacity and anticipate possible additional examinations (lab tests, biomedical imaging, etc.). Automatic, systematic knowledge of the probability of admission would enable organ specialists or specialist hospital units to be contacted earlier. Thirdly, knowing the number of beds needed in real time helps the hospital’s bed manager to adopt a targeted strategy and to free up beds in the required units only. All these aspects might help to reduce the length of stay and thus the mortality rate in the ED [8,9].
At present, the 3P-U model predicts the probability of hospital admission based on triage data collected shortly after a patient’s arrival. Patients are categorized as having a high probability of admission if the predicted probability exceeds 0.6, while others are classified as “not high probability of admission.” This information supports bed managers in prioritizing patients likely to require admission, enabling more informed resource allocation decisions in coordination with ward clinicians (Figure 6). However, adoption of the 3P-U model has been limited, partly due to systemic challenges such as the availability of hospital beds and some skepticism toward AI-based approaches. During the COVID-19 pandemic, the model’s predictions were used to estimate bed requirements during periods of increased patient flow, demonstrating its potential utility in resource planning [28].
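The operating logic described above (flagging patients whose predicted probability exceeds 0.6) can be sketched in a few lines. Summing the probabilities as an expected bed count is our illustrative aggregation for the bed-management use case, not a procedure described in the paper, and the probabilities below are hypothetical.

```python
import numpy as np

THRESHOLD = 0.6  # operating point reported for the deployed 3P-U model

def flag_and_forecast(probs):
    """Flag 'high probability of admission' patients and estimate the
    number of beds needed as the expected count of admissions.
    (The expected-count aggregation is illustrative only.)"""
    high = probs > THRESHOLD
    expected_beds = int(np.ceil(probs.sum()))
    return high, expected_beds

# Hypothetical admission probabilities for patients currently in the ED.
probs = np.array([0.92, 0.15, 0.71, 0.40, 0.08, 0.66])
high, beds = flag_and_forecast(probs)
# high → flags for the bed manager; beds → rough real-time bed demand.
```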
To address concerns about transparency and trust, XAI techniques such as SHAP and PDP have been integrated into the model. These methods enhance the interpretability of the ANN model’s predictions, providing clinicians with insights into how decisions are made while maintaining its predictive performance. In clinical practice, local explanations for each patient are more useful than the global explanations, which help physicians understand the model’s overall logic and behavior. Although the explainability features of the model had not been fully introduced to emergency physicians before this study, future efforts will focus on validating the model across individual specialties and involving clinicians in practical evaluations. These steps are expected to encourage greater confidence and adoption of the 3P-U model in clinical workflows. Because emergency medicine is a sensitive domain, local explanations should help to clarify the confidence in predictions given by the model. The final decision should remain the responsibility of the clinician, who considers additional information and ethical considerations [88].

6.4. Study Limitations

The present study has several limitations that should be acknowledged. First, while the explainability results were generated independently, the algorithm was developed by physicians, which may introduce bias reflecting their clinical reasoning. Although the validation aligns with findings reported in the literature, independent validation by other physicians would strengthen the robustness of the results. Additionally, future studies should explore the clinical impact of these findings, particularly by quantitatively assessing the contribution of each variable to decision-making from a medical perspective and by evaluating the perceived weight of individual patient features in clinicians’ judgments.
Second, the dataset used in this study was derived from a single center, which limits the generalizability of the findings to broader patient populations. The training data likely reflect the specific characteristics of the local population and admission policies. While the model’s successful application within this center suggests potential for replication, further multi-center studies are necessary to validate its robustness in varied healthcare settings and patient cohorts. We are currently exploring collaborations with hospitals in England to develop such a multi-center validation study. This raises additional questions: should the 3P-U model be adapted or retrained to align with local admission strategies? For instance, in some EDs, hand injuries may be admitted immediately for surgical intervention, while in others, they may be scheduled for next-day ambulatory care. These differences in local clinical practice could significantly affect model generalizability.
It is important to note that the model was trained on admission policies specific to a single emergency department, influenced by local practices, hospital structure, and staffing. Since SHAP explanations are model-dependent, the resulting explanations are inherently tied to this particular model and setting. Future work should aim to systematically compare SHAP attributions across multiple models and clinical centers to assess the stability of the explanations and to better distinguish between model-specific and context-specific variations.
Third, the binary nature of the outcome (i.e., admission or discharge) restricts the granularity of predictions, whereas predicting the specific ward required for each patient would provide more actionable insights. However, this is inherently more complex, as patients are sometimes admitted to inappropriate wards due to bed shortages (e.g., patients with decompensated diabetes admitted to nephrology rather than endocrinology). Additionally, inadequate documentation of ward assignment in the ED further complicates this task.
Fourth, the XAI methodology presented in this study represents an initial clinical application and has several limitations. Notably, we did not examine the potential impact of collinearity among input features on SHAP explanations, nor did we apply proxy or statistical evaluation techniques to assess the reliability of SHAP and PDP outputs. Following the recommendations of Salih et al. [89], future work could enhance the robustness of our explainability framework by incorporating quantitative evaluation methods—such as permutation, feature removal with retraining, and perturbation-based assessments—as well as formally addressing variable collinearity.
Although SHAP is widely used in healthcare AI, it has several limitations that require careful consideration. A key issue is the assumption of feature independence, which can lead to inaccurate attributions when input features are correlated, a frequent scenario in clinical datasets. Research has shown that even modest correlations can distort SHAP values and misrepresent the true importance of features [88,89]. Additionally, SHAP values are sensitive to the choice of background distribution and model architecture, making the explanations less stable and sometimes inconsistent across different implementations [90,91]. Moreover, some studies (e.g., Kumar et al. [89]) have noted that SHAP outputs do not always align with clinical intuition, particularly in cases involving feature redundancy or complex interactions. Given these concerns, we emphasize the importance of human oversight and recommend that SHAP explanations be interpreted alongside clinical expertise. Addressing these limitations contributes to a more transparent and responsible application of XAI in healthcare settings.
Lastly, this study did not account for potential bias arising from “data not purposely collected”, as the model’s performance remained within acceptable ranges. Future work could investigate how missing or uncollected data might impact predictions, particularly in scenarios requiring high precision for specific subgroups.

7. Conclusions

This study developed and evaluated the 3P-U model, an XAI framework for predicting hospital admissions using triage data in emergency departments (EDs). The integration of explainability techniques, such as SHAP and PDP, ensured that the model’s predictions were transparent and clinically interpretable. Validated by emergency physicians, the model identified critical predictors, including age, heart rate, injury severity, and the FRENCH triage grade, which align with established clinical reasoning. The 3P-U model demonstrated its utility in supporting resource allocation and improving patient flow during high-demand periods, such as the COVID-19 pandemic.
Despite being limited to a single-center dataset and focusing on binary outcomes, the 3P-U model offers a practical solution for addressing ED overcrowding. Its adaptability to other clinical settings will depend on further multi-center validation and refinement to include ward-specific predictions. By integrating AI-driven insights into routine workflows, the model demonstrates how XAI can enhance operational efficiency and support timely, data-driven decision-making in healthcare.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15158449/s1.

Author Contributions

Conceptualization, É.A., P.A.M.-S., M.E., G.D. and D.A.G.; Methodology, É.A., P.A.M.-S., M.E., G.D. and D.A.G.; Software, É.A., P.A.M.-S. and M.E.; Validation, É.A., P.A.M.-S. and M.E.; Formal analysis, É.A., P.A.M.-S. and M.E.; Investigation, É.A., P.A.M.-S., M.E. and D.A.G.; Resources, É.A. and P.A.M.-S.; Data curation, É.A.; Writing—original draft, É.A., P.A.M.-S. and M.E.; Writing—review & editing, C.A., M.v.G., G.D. and D.A.G.; Visualization, P.A.M.-S., M.E., C.A., M.v.G., G.D. and D.A.G.; Supervision, M.v.G., G.D. and D.A.G.; Project administration, G.D. and D.A.G.; Funding acquisition, D.A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Amiens-Picardy University Hospital (protocol code PI2019_843_0066, approved on 21 October 2019).

Informed Consent Statement

The requirement for patient consent was waived owing to the category of the study, although patients were informed about it.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are not publicly available for privacy reasons, but are available from the corresponding author on reasonable request.

Acknowledgments

We thank William Gacquer and Maxime Delangue (Cellule Infocentre, CHU Amiens-Picardie, Amiens, France) for helping us to gain access to well-structured data. We also thank David Fraser (Biotech Communication SARL, Ploudalmézeau, France) for copy-editing and editorial assistance.

Conflicts of Interest

The authors have no conflicts of interest to disclose.

References

  1. Hong, W.S.; Haimovich, A.D.; Taylor, R.A. Predicting Hospital Admission at Emergency Department Triage Using Machine Learning. PLoS ONE 2018, 13, e0201016. [Google Scholar] [CrossRef]
  2. Sartini, M.; Carbone, A.; Demartini, A.; Giribone, L.; Oliva, M.; Spagnolo, A.M.; Cremonesi, P.; Canale, F.; Cristina, M.L. Overcrowding in Emergency Department: Causes, Consequences, and Solutions-A Narrative Review. Healthcare 2022, 10, 1625. [Google Scholar] [CrossRef]
  3. Trzeciak, S.; Rivers, E.P. Emergency Department Overcrowding in the United States: An Emerging Threat to Patient Safety and Public Health. Emerg. Med. J. 2003, 20, 402–405. [Google Scholar] [CrossRef]
  4. Zhang, X.; Kim, J.; Patzer, R.E.; Pitts, S.R.; Patzer, A.; Schrager, J.D. Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks. Methods Inf. Med. 2017, 56, 377–389. [Google Scholar] [CrossRef]
  5. Kraaijvanger, N.; Rijpsma, D.; Roovers, L.; van Leeuwen, H.; Kaasjager, K.; van den Brand, L.; Horstink, L.; Edwards, M. Development and Validation of an Admission Prediction Tool for Emergency Departments in the Netherlands. Emerg. Med. J. 2018, 35, 464–470. [Google Scholar] [CrossRef] [PubMed]
  6. Richardson, D.B. Increase in Patient Mortality at 10 Days Associated with Emergency Department Overcrowding. Med. J. Aust. 2006, 184, 4. [Google Scholar] [CrossRef]
  7. Sprivulis, P.C.; Silva, J.-A.D.; Jacobs, I.G.; Frazer, A.R.L.; Jelinek, G.A. The Association between Hospital Overcrowding and Mortality among Patients Admitted via Western Australian Emergency Departments. Med. J. Aust. 2006, 184, 5. [Google Scholar] [CrossRef]
  8. Guttmann, A.; Schull, M.J.; Vermeulen, M.J.; Stukel, T.A. Association between Waiting Times and Short Term Mortality and Hospital Admission after Departure from Emergency Department: Population Based Cohort Study from Ontario, Canada. BMJ 2011, 342, d2983. [Google Scholar] [CrossRef] [PubMed]
  9. Jones, S.; Moulton, C.; Swift, S.; Molyneux, P.; Black, S.; Mason, N.; Oakley, R.; Mann, C. Association between Delays to Patient Admission from the Emergency Department and All-Cause 30-Day Mortality. Emerg. Med. J. 2022, 39, 168–173. [Google Scholar] [CrossRef]
  10. Weiss, S.J.; Ernst, A.A.; Derlet, R.; King, R.; Bair, A.; Nick, T.G. Relationship between the National ED Overcrowding Scale and the Number of Patients Who Leave without Being Seen in an Academic ED. Am. J. Emerg. Med. 2005, 23, 288–294. [Google Scholar] [CrossRef] [PubMed]
  11. Higginson, I.; Boyle, A. What Should We Do about Crowding in Emergency Departments? Br. J. Hosp. Med. 2018, 79, 500–503. [Google Scholar] [CrossRef] [PubMed]
  12. Yarmohammadian, M.; Rezaei, F.; Haghshenas, A.; Tavakoli, N. Overcrowding in Emergency Departments: A Review of Strategies to Decrease Future Challenges. J. Res. Med. Sci. 2017, 22, 23. [Google Scholar] [CrossRef] [PubMed]
  13. Hoot, N.R.; Aronsky, D. Systematic Review of Emergency Department Crowding: Causes, Effects, and Solutions. Ann. Emerg. Med. 2008, 52, 126–136.e1. [Google Scholar] [CrossRef]
  14. Peck, J.S.; Benneyan, J.C.; Nightingale, D.J.; Gaehde, S.A. Predicting Emergency Department Inpatient Admissions to Improve Same-Day Patient Flow: PREDICTING ED INPATIENT ADMISSIONS. Acad. Emerg. Med. 2012, 19, E1045–E1054. [Google Scholar] [CrossRef]
  15. Sánchez-Salmerón, R.; Gómez-Urquiza, J.L.; Albendín-García, L.; Correa-Rodríguez, M.; Martos-Cabrera, M.B.; Velando-Soriano, A.; Suleiman-Martos, N. Machine Learning Methods Applied to Triage in Emergency Services: A Systematic Review. Int. Emerg. Nurs. 2022, 60, 101109. [Google Scholar] [CrossRef]
  16. Naemi, A.; Schmidt, T.; Mansourvar, M.; Naghavi-Behzad, M.; Ebrahimi, A.; Wiil, U.K. Machine Learning Techniques for Mortality Prediction in Emergency Departments: A Systematic Review. BMJ Open 2021, 11, e052663. [Google Scholar] [CrossRef]
  17. Fernandes, M.; Vieira, S.M.; Leite, F.; Palos, C.; Finkelstein, S.; Sousa, J.M.C. Clinical Decision Support Systems for Triage in the Emergency Department Using Intelligent Systems: A Review. Artif. Intell. Med. 2020, 102, 101762. [Google Scholar] [CrossRef]
  18. Stewart, J.; Lu, J.; Goudie, A.; Bennamoun, M.; Sprivulis, P.; Sanfillipo, F.; Dwivedi, G. Applications of Machine Learning to Undifferentiated Chest Pain in the Emergency Department: A Systematic Review. PLoS ONE 2021, 16, e0252612. [Google Scholar] [CrossRef]
  19. Murray, N.M.; Unberath, M.; Hager, G.D.; Hui, F.K. Artificial Intelligence to Diagnose Ischemic Stroke and Identify Large Vessel Occlusions: A Systematic Review. J. Neurointerv. Surg. 2020, 12, 156–164. [Google Scholar] [CrossRef]
  20. Poon, A.I.F.; Sung, J.J.Y. Opening the Black Box of AI-Medicine. J. Gastroenterol. Hepatol. 2021, 36, 581–584. [Google Scholar] [CrossRef] [PubMed]
  21. Carvalho, T.P.; Soares, F.A.A.M.N.; Vita, R.; Francisco, R.d.P.; Basto, J.P.; Alcalá, S.G.S. A Systematic Literature Review of Machine Learning Methods Applied to Predictive Maintenance. Comput. Ind. Eng. 2019, 137, 106024. [Google Scholar] [CrossRef]
  22. Tjoa, E.; Guan, C. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4793–4813. [Google Scholar] [CrossRef]
  23. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  24. Stiglic, G.; Kocbek, P.; Fijacko, N.; Zitnik, M.; Verbert, K.; Cilar, L. Interpretability of Machine Learning-Based Prediction Models in Healthcare. WIREs Data Min. Knowl. Discov. 2020, 10, e1379. [Google Scholar] [CrossRef]
  25. Mourby, M.; Ó Cathaoir, K.; Collin, C.B. Transparency of Machine-Learning in Healthcare: The GDPR & European Health Law. Comput. Law Secur. Rev. 2021, 43, 105611. [Google Scholar] [CrossRef]
  26. Résultats Du Recensement de La Population—Picardie: Une Faible Croissance Démographique, Un Déficit Migratoire Qui s’aggrave.—Insee Picardie Analyses. Available online: https://www.insee.fr/fr/statistiques/1290329 (accessed on 29 October 2022).
  27. Arnaud, E.; Elbattah, M.; Gignon, M.; Dequen, G. Deep Learning to Predict Hospitalization at Triage: Integration of Structured Data and Unstructured Text. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; IEEE: Atlanta, GA, USA, 2020; pp. 4836–4841. [Google Scholar]
  28. Arnaud, E.; Elbattah, M.; Ammirati, C.; Dequen, G.; Ghazali, D.A. Use of Artificial Intelligence to Manage Patient Flow in Emergency Department during the COVID-19 Pandemic: A Prospective, Single-Center Study. Int. J. Environ. Res. Public Health 2022, 19, 9667. [Google Scholar] [CrossRef] [PubMed]
  29. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). Br. J. Surg. 2015, 102, 148–158. [Google Scholar] [CrossRef]
  30. Luo, W.; Phung, D.; Tran, T.; Gupta, S.; Rana, S.; Karmakar, C.; Shilton, A.; Yearwood, J.; Dimitrova, N.; Ho, T.B.; et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J. Med. Internet Res. 2016, 18, e323. [Google Scholar] [CrossRef] [PubMed]
  31. Tyler, S.; Olis, M.; Aust, N.; Patel, L.; Simon, L.; Triantafyllidis, C.; Patel, V.; Lee, D.W.; Ginsberg, B.; Ahmad, H.; et al. Use of Artificial Intelligence in Triage in Hospital Emergency Departments: A Scoping Review. Cureus 2024, 16, e59906. [Google Scholar] [CrossRef] [PubMed]
  32. Chenais, G.; Lagarde, E.; Gil-Jardiné, C. Artificial Intelligence in Emergency Medicine: Viewpoint of Current Applications and Foreseeable Opportunities and Challenges. J. Med. Internet Res. 2023, 25, e40031. [Google Scholar] [CrossRef]
  33. Yao, L.-H.; Leung, K.-C.; Tsai, C.-L.; Huang, C.-H.; Fu, L.-C. A Novel Deep Learning-Based System for Triage in the Emergency Department Using Electronic Medical Records: Retrospective Cohort Study. J. Med. Internet Res. 2021, 23, e27008. [Google Scholar] [CrossRef] [PubMed]
  34. López Alcolea, J.; Fernández Alfonso, A.; Cano Alonso, R.; Álvarez Vázquez, A.; Díaz Moreno, A.; García Castellanos, D.; Sanabria Greciano, L.; Hayoun, C.; Recio Rodríguez, M.; Andreu Vázquez, C.; et al. Diagnostic Performance of Artificial Intelligence in Chest Radiographs Referred from the Emergency Department. Diagnostics 2024, 14, 2592. [Google Scholar] [CrossRef] [PubMed]
  35. Zhu, T.; Jiang, J.; Hu, Y.; Zhang, W. Individualized Prediction of Psychiatric Readmissions for Patients with Major Depressive Disorder: A 10-Year Retrospective Cohort Study. Transl. Psychiatry 2022, 12, 170. [Google Scholar] [CrossRef]
  36. Choi, A.; Lee, K.; Hyun, H.; Kim, K.J.; Ahn, B.; Lee, K.H.; Hahn, S.; Choi, S.Y.; Kim, J.H. A Novel Deep Learning Algorithm for Real-Time Prediction of Clinical Deterioration in the Emergency Department for a Multimodal Clinical Decision Support System. Sci. Rep. 2024, 14, 30116. [Google Scholar] [CrossRef]
  37. Doshi-Velez, F.; Kim, B. Towards a Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  38. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar]
  39. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-Day Readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1721–1730. [Google Scholar]
  40. Cabitza, F.; Campagner, A.; Balsano, C. Bridging the “Last Mile” Gap between AI Implementation and Operation: “Data Awareness” That Matters. Ann. Transl. Med. 2020, 8, 501.
  41. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777.
  42. Caruana, R. Case-Based Explanation for Artificial Neural Nets. In Artificial Neural Networks in Medicine and Biology; Malmgren, H., Borga, M., Niklasson, L., Eds.; Springer: London, UK, 2000; pp. 303–308.
  43. Parola, M.; Galatolo, F.A.; La Mantia, G.; Cimino, M.G.C.A.; Campisi, G.; Di Fede, O. Towards Explainable Oral Cancer Recognition: Screening on Imperfect Images via Informed Deep Learning and Case-Based Reasoning. Comput. Med. Imaging Graph. 2024, 117, 102433.
  44. Arnaud, E.; Elbattah, M.; Moreno-Sánchez, P.A.; Dequen, G.; Ghazali, D.A. Explainable NLP Model for Predicting Patient Admissions at Emergency Department Using Triage Notes. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 4843–4847.
  45. Moreno-Sánchez, P.A.; Aalto, M.; van Gils, M. Prediction of Patient Flow in the Emergency Department Using Explainable Artificial Intelligence. Digit. Health 2024, 10, 20552076241264194.
  46. Okada, Y.; Ning, Y.; Ong, M.E.H. Explainable Artificial Intelligence in Emergency Medicine: An Overview. Clin. Exp. Emerg. Med. 2023, 10, 354–362.
  47. Chmiel, F.P.; Burns, D.K.; Azor, M.; Borca, F.; Boniface, M.J.; Zlatev, Z.D.; White, N.M.; Daniels, T.W.V.; Kiuber, M. Using Explainable Machine Learning to Identify Patients at Risk of Reattendance at Discharge from Emergency Departments. Sci. Rep. 2021, 11, 21513.
  48. Petsis, S.; Karamanou, A.; Kalampokis, E.; Tarabanis, K. Forecasting and Explaining Emergency Department Visits in a Public Hospital. J. Intell. Inf. Syst. 2022, 59, 479–500.
  49. Peláez-Rodríguez, C.; Torres-López, R.; Pérez-Aracil, J.; López-Laguna, N.; Sánchez-Rodríguez, S.; Salcedo-Sanz, S. An Explainable Machine Learning Approach for Hospital Emergency Department Visits Forecasting Using Continuous Training and Multi-Model Regression. Comput. Methods Programs Biomed. 2024, 245, 108033.
  50. Piliuk, K.; Tomforde, S. Artificial Intelligence in Emergency Medicine. A Systematic Literature Review. Int. J. Med. Inform. 2023, 180, 105274.
  51. Lee, S.; Lee, J.; Park, J.; Park, J.; Kim, D.; Lee, J.; Oh, J. Deep Learning-Based Natural Language Processing for Detecting Medical Symptoms and Histories in Emergency Patient Triage. Am. J. Emerg. Med. 2024, 77, 29–38.
  52. Laxar, D.; Eitenberger, M.; Maleczek, M.; Kaider, A.; Hammerle, F.P.; Kimberger, O. The Influence of Explainable vs. Non-Explainable Clinical Decision Support Systems on Rapid Triage Decisions: A Mixed Methods Study. BMC Med. 2023, 21, 359.
  53. Casiraghi, E.; Malchiodi, D.; Trucco, G.; Frasca, M.; Cappelletti, L.; Fontana, T.; Esposito, A.A.; Avola, E.; Jachetti, A.; Reese, J.; et al. Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments. IEEE Access 2020, 8, 196299–196325.
  54. Rao, A.R.; Wang, H.; Gupta, C. Predictive Analysis for Optimizing Port Operations. Appl. Sci. 2025, 15, 2877.
  55. Taboulet, P.; Moreira, V.; Haas, L.; Porcher, R.; Braganca, A.; Fontaine, J.-P.; Poncet, M.-C. Triage with the French Emergency Nurses Classification in Hospital Scale: Reliability and Validity. Eur. J. Emerg. Med. 2009, 16, 61–67.
  56. Arnaud, E.; Elbattah, M.; Ammirati, C.; Dequen, G.; Ghazali, D.A. Predictive Models in Emergency Medicine and Their Missing Data Strategies: A Systematic Review. npj Digit. Med. 2023, 6, 28.
  57. Al’Aref, S.J.; Anchouche, K.; Singh, G.; Slomka, P.J.; Kolli, K.K.; Kumar, A.; Pandey, M.; Maliakal, G.; van Rosendael, A.R.; Beecy, A.N.; et al. Clinical Applications of Machine Learning in Cardiovascular Disease and Its Relevance to Cardiac Imaging. Eur. Heart J. 2019, 40, 1975–1986.
  58. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  59. Kleinbaum, D.G.; Klein, M. Logistic Regression, Statistics for Biology and Health; Springer: New York, NY, USA, 2010; ISBN 978-1-4419-1741-6.
  60. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
  61. Byrd, R.H.; Peihuang, L.; Nocedal, J. A Limited-Memory Algorithm for Bound-Constrained Optimization. Available online: https://digital.library.unt.edu/ark:/67531/metadc666315/ (accessed on 1 November 2022).
  62. Pedregosa, F. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  63. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 12 November 2019).
  64. Caterson, J.; Lewin, A.; Williamson, E. The Application of Explainable Artificial Intelligence (XAI) in Electronic Health Record Research: A Scoping Review. Digit. Health 2024, 10, 20552076241272657.
  65. Chaddad, A.; Peng, J.; Xu, J.; Bouridane, A. Survey of Explainable AI Techniques in Healthcare. Sensors 2023, 23, 634.
  66. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.-W.; Newman, S.-F.; Kim, J.; et al. Explainable Machine-Learning Predictions for the Prevention of Hypoxaemia during Surgery. Nat. Biomed. Eng. 2018, 2, 749–760.
  67. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  68. SHAP Documentation. Available online: https://shap.readthedocs.io/en/latest/index.html (accessed on 14 June 2021).
  69. PDPbox Documentation. Available online: https://pdpbox.readthedocs.io/en/latest/ (accessed on 16 December 2022).
  70. Sirois, M.-J.; Griffith, L.; Perry, J.; Daoust, R.; Veillette, N.; Lee, J.; Pelletier, M.; Wilding, L.; Émond, M. Measuring Frailty Can Help Emergency Departments Identify Independent Seniors at Risk of Functional Decline After Minor Injuries. J. Gerontol. 2017, 72, 68–74.
  71. Geyer, L.L.; Körner, M.; Linsenmaier, U.; Huber-Wagner, S.; Kanz, K.-G.; Reiser, M.F.; Wirth, S. Incidence of Delayed and Missed Diagnoses in Whole-Body Multidetector CT in Patients with Multiple Injuries after Trauma. Acta Radiol. 2013, 54, 592–598.
  72. Bangalore, S.; Messerli, F.H.; Ou, F.-S.; Tamis-Holland, J.; Palazzo, A.; Roe, M.T.; Hong, M.K.; Peterson, E.D.; for the CRUSADE Investigators. The Association of Admission Heart Rate and In-Hospital Cardiovascular Events in Patients with Non-ST-Segment Elevation Acute Coronary Syndromes: Results from 135 164 Patients in the CRUSADE Quality Improvement Initiative. Eur. Heart J. 2010, 31, 552–560.
  73. Barnaby, D.; Ferrick, K.; Kaplan, D.T.; Shah, S.; Bijur, P.; Gallagher, E.J. Heart Rate Variability in Emergency Department Patients with Sepsis. Acad. Emerg. Med. 2002, 9, 661–670.
  74. Olaussen, A.; Blackburn, T.; Mitra, B.; Fitzgerald, M. Review Article: Shock Index for Prediction of Critical Bleeding Post-Trauma: A Systematic Review. Emerg. Med. Australas. 2014, 26, 223–228.
  75. Kitabchi, A.E.; Wall, B.M. Management of Diabetic Ketoacidosis. Am. Fam. Physician 1999, 60, 455–464.
  76. Graham, B.; Bond, R.; Quinn, M.; Mulvenna, M. Using Data Mining to Predict Hospital Admissions From the Emergency Department. IEEE Access 2018, 6, 10458–10469.
  77. Klang, E.; Kummer, B.R.; Dangayach, N.S.; Zhong, A.; Kia, M.A.; Timsina, P.; Cossentino, I.; Costa, A.B.; Levin, M.A.; Oermann, E.K. Predicting Adult Neuroscience Intensive Care Unit Admission from Emergency Department Triage Using a Retrospective, Tabular-Free Text Machine Learning Approach. Sci. Rep. 2021, 11, 1381.
  78. Nazir, A.; Ampadu, H.K. Interpretable Deep Learning for the Prediction of ICU Admission Likelihood and Mortality of COVID-19 Patients. PeerJ Comput. Sci. 2022, 8, e889.
  79. Estiri, H.; Strasser, Z.H.; Murphy, S.N. Individualized Prediction of COVID-19 Adverse Outcomes with MLHO. Sci. Rep. 2021, 11, 5322.
  80. Liu, Y.-C.; Cheng, H.-Y.; Chang, T.-H.; Ho, T.-W.; Liu, T.-C.; Yen, T.-Y.; Chou, C.-C.; Chang, L.-Y.; Lai, F. Evaluation of the Need for Intensive Care in Children With Pneumonia: Machine Learning Approach. JMIR Med. Inform. 2022, 10, e28934.
  81. Kucukseymen, S.; Arafati, A.; Al-Otaibi, T.; El-Rewaidy, H.; Fahmy, A.S.; Ngo, L.H.; Nezafat, R. Noncontrast Cardiac Magnetic Resonance Imaging Predictors of Heart Failure Hospitalization in Heart Failure With Preserved Ejection Fraction. J. Magn. Reson. Imaging 2022, 55, 1812–1825.
  82. Killian, M.O.; Payrovnaziri, S.N.; Gupta, D.; Desai, D.; He, Z. Machine Learning–Based Prediction of Health Outcomes in Pediatric Organ Transplantation Recipients. JAMIA Open 2021, 4, ooab008.
  83. Martini, M.L.; Neifert, S.N.; Gal, J.S.; Oermann, E.K.; Gilligan, J.T.; Caridi, J.M. Drivers of Prolonged Hospitalization Following Spine Surgery: A Game-Theory-Based Approach to Explaining Machine Learning Models. JBJS 2021, 103, 64–73.
  84. Deshmukh, F.; Merchant, S.S. Explainable Machine Learning Model for Predicting GI Bleed Mortality in the Intensive Care Unit. Off. J. Am. Coll. Gastroenterol. ACG 2020, 115, 1657–1668.
  85. Hilton, C.B.; Milinovich, A.; Felix, C.; Vakharia, N.; Crone, T.; Donovan, C.; Proctor, A.; Nazha, A. Personalized Predictions of Patient Outcomes during and after Hospitalization Using Artificial Intelligence. npj Digit. Med. 2020, 3, 51.
  86. Ivanov, O.; Wolf, L.; Brecher, D.; Lewis, E.; Masek, K.; Montgomery, K.; Andrieiev, Y.; McLaughlin, M.; Liu, S.; Dunne, R.; et al. Improving ED Emergency Severity Index Acuity Assignment Using Machine Learning and Clinical Natural Language Processing. J. Emerg. Nurs. 2021, 47, 265–278.e7.
  87. Xie, F.; Zhou, J.; Lee, J.W.; Tan, M.; Li, S.; Rajnthern, L.S.; Chee, M.L.; Chakraborty, B.; Wong, A.-K.I.; Dagan, A.; et al. Benchmarking Emergency Department Prediction Models with Machine Learning and Public Electronic Health Records. Sci. Data 2022, 9, 658.
  88. Aas, K.; Jullum, M.; Løland, A. Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values. Artif. Intell. 2021, 298, 103502.
  89. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-Value-Based Explanations as Feature Importance Measures. In Proceedings of the 37th International Conference on Machine Learning, PMLR, Virtual, 21 November 2020; pp. 5491–5500.
  90. Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304.
  91. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–9 February 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 180–186.
Figure 1. The receiver operating characteristic curves for the data from the test set. The gray dashed line represents the random predictor.
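The AUROC values reported for the curves in Figure 1 have a simple ranking interpretation that can be computed directly. The sketch below is illustrative only: the labels and scores are invented toy data, not study data, and the pairwise method shown is a naive (quadratic) equivalent of what scikit-learn's `roc_auc_score` computes.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = admitted, 0 = discharged
labels = [1, 0, 1, 0]
scores = [0.8, 0.7, 0.3, 0.2]  # hypothetical model probabilities
# 3 of the 4 positive/negative pairs are ranked correctly -> AUROC = 0.75
print(auroc(labels, scores))
```

An AUROC of 83.2% for the ANN therefore means that, for a random admitted/discharged pair, the model scores the admitted patient higher about 83% of the time.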
Figure 2. The SHAP summary dot plot of the hospital admission model’s overall explainability: for quantitative variables, the higher the SHAP value, the higher the probability of hospital admission, and vice versa. For binary features or one-hot encoded categorical features, the red and blue colors refer to values of 1 and 0, respectively. The French Emergency Nurses Classification in Hospital (FRENCH) triage scale is detailed in Supplementary Material S1.5 and in Taboulet et al. [55].
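The per-feature attributions that SHAP plots summarize can be illustrated with an exact Shapley computation on a toy model. This is a hedged sketch, not the study's method: the linear "admission score", its weights, and the patient values are invented for illustration (the study's ANN is explained with the shap library, which approximates these sums), and absent features are replaced by a baseline such as the training-set mean.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one prediction: average each
    feature's marginal contribution over all coalitions of the others,
    with absent features set to their baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical linear "admission score" over [age, heart rate, SpO2]
w = [0.02, 0.01, -0.03]
predict = lambda v: sum(wi * vi for wi, vi in zip(w, v))

x = [80.0, 110.0, 92.0]        # one toy patient
baseline = [51.0, 85.0, 99.0]  # illustrative population means
phi = shapley_values(predict, x, baseline)
# For a linear model, phi_i reduces to w_i * (x_i - baseline_i),
# and the attributions sum to predict(x) - predict(baseline)
```

Positive values of `phi` push the prediction toward admission (red in Figure 2), negative values toward discharge (blue).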
Figure 3. The individual explainability of true positive (a) and true negative (b) cases, using SHAP. Blue indicates a negative impact on the prediction, whereas red indicates a positive impact.
Figure 4. The PDP for age. The solid dark line represents the mean predicted outcome across age, with individual points indicating specific ages at which predictions were calculated. The shaded area around the curve represents the 95% confidence interval of the predictions. The horizontal dashed red line marks the baseline reference at zero, facilitating visual interpretation of deviations from baseline.
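A partial dependence curve like the one in Figure 4 is computed by forcing the feature of interest to each grid value in every record and averaging the model's predictions. The sketch below is a minimal illustration under invented assumptions (a toy linear scoring function and two toy records), not the study's PDPbox pipeline.

```python
def partial_dependence(predict, rows, feature_idx, grid):
    """1-D partial dependence: for each grid value v, set the chosen
    feature to v in every row, leave the other features at their
    observed values, and average the model's predictions."""
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_idx] = v
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Toy model over [age, heart rate]; weights are hypothetical
predict = lambda r: 0.05 * r[0] + 0.01 * r[1]
rows = [[50.0, 80.0], [70.0, 90.0]]
curve = partial_dependence(predict, rows, feature_idx=0, grid=[40.0, 60.0])
# -> [2.85, 3.85]: heart rate stays at its observed values, so the
#    curve isolates the average effect of age alone
```

Averaging over the other features is what lets the PDP show the marginal effect of age (or heart rate) on the predicted admission probability.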
Figure 5. The PDP for heart rate. The solid dark line represents the mean predicted outcome across heart rate, with individual points indicating specific heart rates at which predictions were calculated. The shaded area around the curve represents the 95% confidence interval of the predictions. The horizontal dashed red line marks the baseline reference at zero, facilitating visual interpretation of deviations from baseline.
Figure 6. The 3P-U dashboard, showing the predicted probability of admission (blue progress bar), which determines the displayed disposition status (“probably admitted”). All statuses are presented in the global view.
Table 1. The FRENCH triage scale [55].
Triage Level | Description | Action
1 | Immediately life-threatening | Immediate medical intervention
2 | Marked impairment of a vital organ or imminently life-threatening | Medical intervention within 20 min
3 | Functional impairment or organic lesions likely to deteriorate within 24 h, or a complex medical situation requiring several hospital resources | Medical intervention within 60 min
4 | Stable, noncomplex functional impairment or organic lesions but requiring urgent use of at least one hospital resource | Medical intervention within 120 min
5 | No functional impairment or organic lesion, requiring no hospital resource | Medical intervention within 240 min
* | Intense symptom or abnormal vital parameter requiring rapid corrective action | Specific action within 20 min
* is added to another triage level to flag the requirement for rapid corrective action.
Table 2. Principal characteristics of the study population.
Characteristic | Overall
Demographic characteristics
Number of patients | 302,966 (100%)
Age | 51 (22)
Sex
  Male | 156,621 (51.7%)
  Female | 146,345 (48.3%)
Clinical triage characteristics
Heart rate (/min) | 85 (17.8)
Systolic blood pressure (mmHg) | 136 (24)
Diastolic blood pressure (mmHg) | 77 (24)
Blood oxygen saturation (%) | 99 (2)
Body temperature (°C) | 36.6 (0.8)
Capillary blood glucose level (mmol/L) | 7.72 (4.78)
Capillary blood ketone level (mmol/L) | 1.13 (3.40)
Oxygen flow (L/min) | 0.65 (4.5)
Capillary blood hemoglobin level (g/dL) | 11.4 (3.1)
Expired breath alcohol level (g/L) | 1.87 (0.83)
Bladder volume (mL) | 366 (320)
Pain intensity | 3 (3)
FRENCH triage scale grade
  1 | 930 (0.1%)
  2 | 15,174 (5.1%)
  3 | 136,839 (45.9%)
  4 | 85,235 (28.7%)
  5 | 60,280 (20.2%)
Outcome
  Admission | 99,340 (32.8%)
  Discharge | 203,626 (67.2%)
Table 3. Performance of models on the test set.
Metric | ANN | LR | KNN | RF
AUROC | 83.2% | 71.5% | 67.1% | 71.8%
Accuracy | 77.5% | 77.2% | 73.0% | 77.6%
Precision | 68.9% | 69.4% | 60.6% | 70.4%
Recall | 57.2% | 54.7% | 50.0% | 54.8%