Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways

Harry, Kimberly D.; Samara, Mohammad Najeh

doi:10.3390/jcm15051956

Open AccessArticle

Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways

by

Kimberly D. Harry

^*

and

Mohammad Najeh Samara

School of Systems Science and Industrial Engineering, Thomas J. Watson College of Engineering and Applied Science, Binghamton University, Binghamton, NY 13902, USA

^*

Author to whom correspondence should be addressed.

J. Clin. Med. 2026, 15(5), 1956; https://doi.org/10.3390/jcm15051956

Submission received: 27 January 2026 / Revised: 27 February 2026 / Accepted: 1 March 2026 / Published: 4 March 2026

(This article belongs to the Section Machine Learning and Artificial Intelligence in Clinical Medicine)

Download

Browse Figures

Versions Notes

Abstract

Background/Objectives: Sepsis is a leading cause of morbidity and mortality due to its rapid progression and variability in care delivery. While existing predictive models estimate sepsis risk using clinical variables, they typically rely on static attributes and overlook temporal, behavioral, and process-related characteristics of care pathways. In particular, deviations from recommended protocols and process inefficiencies are rarely incorporated into early deterioration prediction. This study proposes a Conformance-Aware Predictive Process Monitoring (CAPPM) framework to enable early detection of sepsis deterioration using incomplete care pathways. Methods: The proposed framework integrates process mining with predictive modeling. Using the publicly available Sepsis Cases Event Log, we first discovered the reference care pathway and generated prefix-level representations of ongoing cases. Temporal and behavioral features were engineered alongside alignment-based and declarative conformance metrics to quantify pathway deviations. These features were used to train and evaluate multiple supervised learning models, including Adaptive Boosting and Gradient Boosting. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUROC). Results: Incorporating conformance and pathway-based features improved predictive performance compared to models relying solely on traditional attributes. Adaptive Boosting and Gradient Boosting achieved the strongest results, with AUROC values of 0.744 and 0.731, respectively, demonstrating enhanced early detection ability. Conclusions: The findings indicate that early deviations in care pathways and temporal progression patterns provide meaningful predictive signals for sepsis deterioration. Integrating process mining with machine learning offers a promising approach for time-critical clinical decision support and proactive intervention.

Keywords:

process mining; machine learning; healthcare; sepsis; early detection; care pathway

1. Introduction

Sepsis, a life-threatening organ dysfunction caused by a dysregulated host response to infection, remains one of the most critical challenges in modern healthcare [1]. According to the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) in 2021, there were approximately 166 million sepsis-related cases and 21.4 million sepsis-related deaths worldwide, representing nearly 31.5% of all global deaths [2]. Early identification and timely intervention are crucial to prevent sepsis condition deterioration. However, the inherent complexity of healthcare systems and numerous interrelated processes often lead to deviations from the optimal care pathways, resulting in inefficiencies, delayed interventions, and rapid patient deterioration.

While data-driven approaches using artificial intelligence (AI) have been widely employed to predict sepsis risk [3,4,5], they treat clinical data or event logs as static instances rather than a dynamic temporal sequence of events that occur over time. To address this challenge, Process Mining (PM) has emerged as a powerful discipline used to discover, monitor, analyze, and visualize real-world processes by extracting valuable insights from event logs and identifying bottlenecks [6,7]. However, traditional PM techniques are predominantly retrospective, analyzing completed workflows after execution, which in healthcare typically correspond to patient trajectories from admission to a terminal outcome such as discharge or death [8,9].

Recent advancements in Predictive Process Monitoring (PPM) attempt to bridge this gap by predicting the future of ongoing cases [8,10]. However, two critical limitations exist in the current literature. First, most PPM techniques rely on complete or near-complete traces, which are clinically unrealistic for sepsis, where decisions must be made in the first few hours. Second, existing models largely ignore conformance features, specifically, whether a patient’s current trajectory violates established medical protocols. This is a critical oversight, as non-conformance, for example, delayed antibiotics, is often a precursor to septic shock [11].

To address these gaps, this paper proposes a novel Conformance-Aware Predictive Process Monitoring (CAPPM) framework for early detection of sepsis deterioration using incomplete care pathways and combining AI/ML and PM techniques. The proposed framework integrates temporal, behavioral, and conformance-based features extracted from prefix-based traces to capture the progression of care and detect deviations from reference clinical models. By monitoring incomplete prefixes against a reference clinical model, we can detect deterioration risks that stem from workflow failures, not just physiological changes.

The key contributions of this paper are as follows:

A conformance-aware predictive monitoring framework that supports early detection of sepsis deterioration from incomplete care pathways.
A method for incorporating alignment-based conformance information, including prefix-level costs and trend indicators, into predictive modelling capability.
A pathway profiling approach that captures structural differences in patient trajectories and enhances early sepsis risk identification.
A unified feature representation that combines clinical, temporal, behavioral, and conformance signals to improve predictive accuracy.

The rest of the paper is organized as follows: Section 2 presents a review of existing works on early sepsis prediction and detection using AI/ML, as well as the integration of PM. The methodology adopted for this study is presented and discussed in Section 3. Section 4 presents and discusses the results of the analysis, and Section 5 provides the limitations of the study and presents recommendations for future research. Finally, Section 6 concludes the paper.

2. Literature Review

This section reviews the literature on early sepsis detection, covering the application of AI/ML models and the evolution of process mining in healthcare. The review focuses particularly on the recent integration of these techniques to achieve predictive process monitoring.

The literature review was conducted to identify recent advances in sepsis prediction, process mining in healthcare, and predictive process monitoring. Publications were retrieved using databases including PubMed, Scopus, IEEE Xplore, and Google Scholar. Searches were performed using combinations of keywords such as sepsis prediction, process mining in healthcare, predictive process monitoring, care pathway analysis, and machine learning for clinical deterioration. Priority was given to peer-reviewed journal and conference papers published primarily between 2015 and 2025, with earlier foundational works included where necessary. Studies were selected based on relevance to sepsis management, healthcare workflow analysis, or predictive process monitoring, with emphasis on works integrating process mining and predictive analytics.

2.1. AI/Machine Learning for Sepsis Detection

Sepsis is a life-threatening medical condition triggered by the body’s dysregulated immune response to infection, leading to tissue damage, organ dysfunction, and potentially death if not promptly treated [12]. It remains one of the leading causes of mortality and morbidity globally, particularly among critically ill and hospitalized patients. Due to its rapid progression and highly heterogeneous clinical presentation, early detection of sepsis is essential for improving patient outcomes. Recent advances in AI and ML have enabled significant progress in early prediction, diagnosis, and mortality risk assessment for sepsis [3,13,14]. Predication models based on deep learning, gradient boosting, random forests, and ensemble methods have demonstrated strong predictive performance using electronic health record (EHR) data, vital signs, laboratory results, and clinical notes.

However, despite their predictive accuracy, these models treat healthcare data as static and fail to consider the temporal, sequential, and process-oriented nature of clinical care. In real-world sepsis management, patient care unfolds through complex clinical workflows involving multiple healthcare professionals, diagnostic procedures, interventions, and time-sensitive decisions, elements that are not captured by conventional ML predictions. To address this limitation, process mining has emerged as a complementary approach that focuses on modeling and analyzing actual care pathways using event logs [15]. It enables the extraction and visualization of real-world healthcare workflows, including treatment deviations, bottlenecks, delays, and compliance with medical guidelines. Through process discovery, conformance checking, and enhancement, it provides actionable insights into operational efficiency, quality of care, and protocol adherence [16]. Several extensive reviews have been conducted exploring its applications as well as challenges in the healthcare system, providing valuable trends, insights, and directions for further research [17,18,19,20].

2.2. Process Mining and Predictive Integration

Exploring the applicability to sepsis-related clinical and operational contexts [21,22], studies conducted PM on hospital event logs of emergency department patients with sepsis to map real-world clinical workflows, identify bottlenecks, and uncover potential inefficiencies in sepsis management. These studies demonstrated that there are possible associations between discharge methods and patient readmissions, emphasizing the value of process-aware analysis for improving operational efficiency and clinical outcomes. Bakhshi et al. [23] applied Heuristics Miner (HM) and Inductive Miner (IM) to real-world sepsis patient trajectories and demonstrated that they struggle to model complex clinical pathways accurately. Similarly, authors [24] applied Alpha Miner, Heuristic Mining, and Direct Flow Graph (DFG) to a specific sepsis treatment dataset, where it was shown that PM can uncover deviations and bottlenecks in treatment workflows, offering valuable insights for improving care quality, reducing costs, and optimizing sepsis management processes.

Process Mining Project Methodology in Healthcare (PM²HC) was employed in [25] to analyze and compare sepsis treatment workflows across different patient subpopulations, such as severe and non-severe cases. Results obtained showed that distinct patient groups follow substantially different treatment trajectories and exhibit varying outcomes, emphasizing the importance of subgroup-specific workflow analysis to enable more tailored, efficient, and outcome-oriented sepsis management. Real-world trajectories of sepsis patients from emergency admission to discharge were analyzed in [26,27] using PM to discover workflow models, assess conformance with clinical guidelines, and visualize care flows. Results showed that PM can clarify patient flows, reveal deviations from recommended protocols, and uncover challenges in data quality and model interpretability, highlighting its value as an iterative, insightful tool for analyzing sepsis care processes.

These research works have demonstrated that PM can map/visualize patients’ workflow, uncover inefficiencies, reveal deviations, and remove bottlenecks, particularly considering the sepsis condition. However, it remains retrospective and descriptive, limiting its use in real-time clinical decision support. To address this limitation, recent works integrate ML with PM. For example, A previous study [28] applied the DFG process mining technique to the MIMIC-IV-ED dataset to extract emergency care pathways, pre-processed them into structured event-based features, and evaluated seven tree-based and boosting ML classification algorithms to predict clinical and operational outcomes. Results showed that RF and Extreme Gradient Boosting (XGBoost) consistently performed better than the other models across all metrics. Similarly, the authors in [29] combined PM with ML on the MOVER EHR dataset to model real-world surgical workflows, detect bottlenecks, and generate predictive insights related to operating room efficiency and postoperative outcomes. A multi-stage framework was implemented, including heuristic control-flow discovery, Petri net-based conformance checking, temporal performance analysis, unsupervised clustering, and RF-based classification, where it was seen that the process model that achieved perfect fitness of 1.0 and moderate precision of 0.46, while RF model achieved 90.33% accuracy but only 24.23% recall for delayed recovery cases, highlighting challenges in detecting high-risk patients. ML was also utilized in [30] to improve the clinical process in sepsis management. While the research work did not conduct simulations or analysis, it provided valuable insights into the benefits as well as the drawbacks of leveraging ML for process improvement.

From the review, it can be seen that while ML models provide good predictive accuracy, they largely ignore the temporal and process-oriented nature of sepsis care, and existing PM applications, although useful for analyzing workflows and detecting deviations, remain retrospective and lack predictive capabilities. Few studies have attempted to integrate PM with ML, and those that did focused mainly on surgical or operational settings, with limited attention to early sepsis deterioration and little use of conformance metrics as predictive features. This highlights a clear research gap, which this study addresses by proposing a Conformance-Aware Predictive Process Monitoring (CAPPM) framework that leverages incomplete care pathways to enable early detection of sepsis deterioration. To clarify conceptual differences between traditional PM, Predictive PM, and the proposed CAPPM framework, Table 1 summarizes their primary characteristics and intended use. Figure 1 below illustrates a graphic showing the SEPSIS care pathway predictive process integrating behavioral features, temporal features, clinical features, conformance-based features, and care pathway features (i.e., viz, sepsis type, and protocol/treatment stages).

3. Research Methodology

This section presents the methodology adopted for this research, comprising the dataset description, the data preprocessing steps, and the proposed CAPPM framework. The framework is designed to integrate PM capabilities with ML to enable early detection of sepsis deterioration.

All process mining analyses were conducted using PM4Py (Process Mining for Python) version 2.7.11 (Eindhoven University of Technology, Eindhoven, The Netherlands). Machine learning models were implemented in Python version 3.12.2 (Python Software Foundation, Wilmington, DE, USA) using scikit-learn version 1.4.2 (scikit-learn developers, online resource). SHAP analysis was performed using the SHAP library version 0.45.0 (online resource). All computations were executed in a standard Python environment.

3.1. Dataset Description

This study utilizes the publicly available Sepsis Cases Event Log originally published by Mannhardt [31], which contains real-world patient trajectories from a Dutch academic hospital. The event log records the sequence of activities from Emergency Room (ER) registration until discharge and covers approximately one year of hospital operations. Each trace represents the treatment pathway of a single patient, recorded in eXtensible Event Stream (XES) format.

Each event is at least annotated with a

a case identifier (case:concept:name),
an activity label (concept:name),
a timestamp (time:timestamp).

The main clinical activities include ER Registration, ER Triage, ER Sepsis Triage, various laboratory tests, including C-reactive protein (CRP), Leucocytes, and LacticAcid, administration of IV Liquid and IV Antibiotics, ward and intensive care admissions include Admission NC and Admission IC, possible returns to the ER, and several discharge or release outcomes. Table 2 presents descriptive statistics of activity occurrences across all cases. The event log also contains numeric clinical values for selected laboratory tests that are used as time-varying attributes in the predictive models. The event log provides a moderate number of patient cases and events, which is suitable for process discovery and predictive modelling while still reflecting the complexity of real sepsis care.

3.2. Data Preprocessing

The data processing pipeline focused on turning the raw sepsis event log into a structured, temporally consistent event log suitable for process discovery, conformance checking, and predictive modelling. It followed a stepwise methodology discussed as follows.

3.2.1. Import and Standardization

The original XES file is imported using the PM4Py toolkit, and the columns are renamed to the standard identifiers, timestamps are converted to a uniform time zone, and events are sorted by time within each case. Before training, missing numerical values were imputed using median imputation to ensure model compatibility. This generates a clean event log, mathematically represented as:

L = \{τ_{1}, τ_{2}, \dots, τ_{N}\},

(1)

where each trace

τ_{1} = (e_{i 1}, \dots, e_{i n_{i}})

is a time-ordered list of events.

3.2.2. Outcome Labelling

To support supervised learning, every case is labelled as either deteriorated or non-deteriorated, such that

D = {A d m i s s i o n I C, R e l e a s e D}

(2)

is the set of deterioration-related activities. A binary outcome

y_{i}

is defined for each trace

τ_{1}

as

y_{i} = \{\begin{array}{l} 1, i f ∋ j s u c h t h a t a_{i j} \in D, \\ 0, o t h e r w i s e, \end{array}

(3)

where

a_{i j}

denotes the activity of events

e_{i j}

. These events indicate transfer to an intensive care unit or discharge outcomes that are associated with more severe clinical status, including death. The same label is applied to all prefixes derived from the original case.

In this study, deterioration is operationally defined using IC admission and severe discharge outcomes recorded in the event log, as these events reliably indicate clinical worsening within the available dataset. Alternative definitions based on delayed IC admission, mortality, or shock-related indicators may further refine deterioration labeling. However, such variables are not consistently available in the current dataset.

3.3. CAPPM Framework

The architecture of the proposed CAPPM framework is presented in Figure 1. It consists of sequential modules designed to extract process knowledge and integrate it into a predictive workflow. The framework consists of four sequential stages: (1) Data Preprocessing and Prefix Generation, where raw event logs are cleaned and truncated into incomplete care pathways prior to any deterioration event; (2) Feature Engineering, which extracts behavioral, temporal, and clinical attributes from the incomplete traces; (3) Conformance Checking, which aligns the prefixes against a discovered reference Petri net to compute deviation metrics and pathway clusters; and (4) Predictive Modelling, where machine learning classifiers predict deterioration risk. Feature contributions are interpreted using SHapley Additive exPlanations (SHAP), a method that quantifies the impact of each feature on the model’s prediction.

3.3.1. Prefix Generation

The key objective of this study is to predict early sepsis deterioration from incomplete care pathways. Based on this objective, the goal is to simulate a real-time clinical environment where the final patient outcome is unknown and where early detection of sepsis deterioration is accomplished. The CAPPM framework utilized a prefix-based approach. Rather than analyzing completed traces, this module generates partial traces of fixed lengths

k \in {3, 5, 10}

, which correspond roughly to early, intermediate, and more advanced stages of the care trajectory. Each prefix keeps the original case identifier and is tagged by its length. This yields a prefix log that contains multiple partial views of the same patient at different stages of the care pathway.

To ensure that predictions simulate a realistic early-warning scenario and avoid information leakage, prefixes containing deterioration-defining events, such as Admission IC or severe discharge outcomes, were excluded from prediction evaluation. Only prefixes occurring strictly before the first deterioration-related event were retained for training and evaluation, ensuring that the model predicts deterioration risk prior to its occurrence. Train–test splitting was performed at the case level rather than at the prefix level. All prefixes derived from a given patient case were assigned exclusively to either the training or testing set. Mathematically, for each prefix observed at time

t

, the predictive task estimates the conditional probability:

P (Y = 1 ∣ e v e n t s o b s e r v e d u p t o t i m e t)

(4)

where

Y = 1

indicates that deterioration will occur at any future time

t^{'} > t

. This formulation ensures a strict temporal ordering between observed features and the predicted outcome. The predictive task does not rely on a fixed time horizon. Instead, for each prefix observed at time

t

, the model estimates the cumulative probability that deterioration will occur at any future time point after

t

. This formulation aligns with continuous clinical monitoring, where risk is dynamically updated as new events occur, rather than predicting deterioration within a predefined number of hours.

3.3.2. Feature Engineering

The CAPPM framework combines several groups of features at the prefix level so that they reflect the information available at prediction time. These include behavioral, temporal, clinical, and conformance-based and pathway features.

A. Behavioral Features

Behavioral features describe how many and which activities have occurred so far. Specifically, it captures the execution flow, including the number of unique activities, the occurrence of specific events. For each prefix, the following quantities are computed:

number of events $n$
number of distinct activities $| \{a_{i 1}, \dots, a_{i n}\} |$
revisit ratio $R$

R_{i}^{n} = 1 - \frac{{| \{a_{i 1}, \dots, a_{i n}\}}_{u n i q u e} |}{n}

(5)

Binary indicators capture whether key steps have occurred, such as Admission IC, any ward admission, or Return ER. These features summarize the structural progression of the care pathway up to the current point.

B. Temporal Features

Temporal features encode the timing of the prefix. These features quantify the speed of care, including the total elapsed time since registration, the mean time between events, and the time since the last recorded activity. For the event times,

t_{i 1}, \dots, t_{i k}

, the following are calculated:

elapsed hours ( $h$ ) since the first event

h_{i}^{k} = \frac{t_{i k} - t_{i 1}}{3600},

(6)

mean inter-event time in hours,
time since the last event, which can highlight periods of inactivity that may indicate waiting or delays.

C. Clinical Features

Clinical features are derived from the numeric laboratory attributes identified during preprocessing. These include the actual values of laboratory tests, for example, LacticAcid and Leucocytes, recorded within the prefix. The framework aggregates these by calculating the mean, minimum, maximum, and last-observed values.

Static case attributes, if available, are added unchanged to all prefixes belonging to that case. This creates a set of baseline features that approximate conventional ML inputs and serve as the comparative baseline in later experiments.

D. Conformance-Based and Pathway Features

The feature engineering module also integrates outputs from the conformance module and from pathway clustering. For pathway clustering, each case trajectory was represented using structural characteristics derived exclusively from events occurring prior to the first deterioration-defining event for positive cases. For non-deterioration cases, structural representations were derived from equivalent prefix-based truncations to maintain temporal consistency. Clustering was therefore performed without incorporating post-deterioration information. Feature vectors included trace length, activity presence indicators, and selected activity counts computed only from available prefix information. After standardization, K-means clustering with three clusters was applied in an unsupervised manner without using outcome labels. The resulting cluster assignment was then included as a categorical feature for all prefixes of the corresponding case.

The CAPPM framework is flexible with respect to additional clinical information sources. New diagnostic events, laboratory tests, or rapid pathogen identification technologies can be incorporated as additional activities or attributes within the event log. Corresponding behavioral, temporal, or clinical features can then be extracted without modifying the overall framework. As richer data become available, predictive performance and interpretability may improve through enhanced representation of patient care progression.

3.3.3. Conformance Checking Module

The conformance checking module is central to the CAPPM framework. It quantifies the degree to which each prefix conforms to a reference process model [32]. The module generates the core predictive features that distinguish standard variation from critical workflow failure by quantifying the discrepancy between the current activity and the expected activity. It operates through three distinct stages, which include the process discovery and reference model, prefix alignment calculation, and conformance metrics [32,33].

A. Process Discovery and Reference Model

To capture the normative sepsis pathway, the Inductive Miner algorithm was applied to the complete event log. The miner discovers a process tree that is then converted to a Petri net. The Petri net consists of places, transitions that correspond to activities, and flow relations. Initial and final markings define the start and end of a valid process instance [32]. This model serves as a reference for conformance checking. The Petri net provides a compact control-flow description that supports synchronous and asynchronous behavior, choice, parallelism, and loops; these control-flow constructs reflect typical clinical variation, such as parallel ordering of diagnostics and iterative monitoring steps [32].

B. Prefix Alignment Calculation

For every generated prefix, an optimal alignment is computed against the reference Petri net. The alignment algorithm finds a minimum-cost mapping between the observed activity sequence and the model transitions by allowing synchronous moves, which pair a matching activity with a model transition. The model-only moves are model transitions executed without a matching log event, while the log-only moves are the logged event without a matching model transition. Each alignment returns an overall cost and the counts of the three move types. Because alignment counts are typically heavy-tailed, raw counts were transformed before using them as learning inputs. Specifically, the natural log plus one transform was applied to all costs to reduce skew, improve numerical stability, and improve performance in tree and boosting models.

\log 1 p (x) = \log (1 + x)

(7)

C. Conformance Metrics

For each alignment, we derive the following conformance metrics for the prefix-level feature set:

Alignment cost: the total numeric cost returned by the alignment algorithm. Higher values indicate larger deviations from the reference model. Both the raw cost for inspection and $\log 1 p (c o s t)$ for modeling.
Synchronous moves: the number of steps the log and model agree on. Synchrony is indicative of conformity.
Model moves: model-only moves that indicate expected steps missing from the observed prefix.
Log moves: log-only moves that represent unexpected activities not permitted by the model.

In addition to these feature sets, trend features across prefixes were also computed for the same case to capture dynamic conformance behavior, and a set of declarative response-violation ratios was computed for selected activation-response pairs. For instance, ER registration to ER triage. For an activation

a

and a response

b

, the response violation ratio is the fraction of

a

-occurrences in the prefix that are not followed by any

b

later in the same prefix. This value ranges from 0 (no violations or no activations) to 1 (all activations violate the response).

3.3.4. Predictive Model

To achieve predictive monitoring for the early sepsis deterioration within the CAPPM framework, several supervised ML models were implemented and evaluated. Specifically, tree-based and ensemble models were considered as they are well-suited for structured, event-derived data and can effectively capture interactions across the diverse feature groups generated by the CAPPM pipeline. All models were trained using the prefix-level feature matrices generated from the process, conformance, declarative, and clinical components. The aim was to identify the model that provides the most reliable and accurate predictions of sepsis deterioration risk at different stages of the patient care management pathway.

A. Decision Tree (DT)

A decision tree is a supervised learning model that organizes decisions in a hierarchical, tree-like structure [28]. At each internal node, the model applies a rule that splits the data into smaller groups based on feature values. This splitting continues until the tree reaches its terminal nodes, where final class predictions are made. Decision trees are intuitive and easy to interpret, but their structure can become overly complex when trained on real-world data, which may lead to reduced generalization if not properly managed.

B. Random Forest (RF)

Random Forest extends the idea of a single decision tree by building many trees and combining their outputs [28]. Each tree is trained using a random selection of both samples and features, which introduces variability and reduces the chance of overfitting to the training data. The final prediction is obtained by aggregating the predictions of all trees in the ensemble. This approach improves stability, increases predictive accuracy, and makes Random Forest a strong choice for handling diverse and noisy datasets.

C. Gradient Boosting (GB)

Gradient Boosting is an ensemble learning technique that builds predictive models in a sequential manner [28]. Each tree in the sequence attempts to correct the errors of the previous one by focusing on instances that were previously misclassified. This additive approach allows the model to capture complex patterns in the data while maintaining control over overfitting through learning rate adjustments and shallow tree depth. Gradient Boosting is widely applied in predictive analytics due to its strong performance on structured datasets.

D. Extra Trees (ET)

The Extra Trees model, also known as Extremely Randomized Trees, is another ensemble of decision trees but introduces additional randomness during the split selection process [28]. Unlike Random Forest, which searches for the best split among a random subset of features, Extra Trees selects split thresholds at random. This increased randomness reduces variance and training time while still maintaining strong predictive performance. Extra Trees often performs well when the dataset contains many correlated or noisy features.

E. Adaptive Boosting (AdaBoost)

AdaBoost constructs an ensemble of weak learners where each subsequent learner focuses more heavily on samples that were previously misclassified [28]. By iteratively adjusting the weights of training instances, the algorithm encourages the ensemble to correct earlier mistakes. This results in a strong final classifier formed by a weighted combination of weak learners. AdaBoost is effective for handling imbalanced datasets and can improve performance even with simple base models such as shallow decision trees.

F. Model Interpretability Using SHAP

To improve the interpretability of the predictive models, SHAP analysis was applied to quantify how individual features contribute to deterioration predictions. SHAP assigns contribution values to each feature based on cooperative game theory principles, allowing identification of features that increase or decrease predicted risk. This analysis enables interpretation of model outputs in terms of clinical workflow characteristics and supports understanding of how pathway deviations and temporal factors influence predictions.

Although the CAPPM framework is demonstrated using a single hospital dataset, its core components are transferable across institutions. Prefix generation, conformance checking, and predictive modeling procedures can be applied to other healthcare environments. However, the reference process model and pathway structures must be derived from local event logs, as they depend on institution-specific workflows and documentation practices. Implementation in new settings, therefore, requires recalibration using local process data.

3.4. Evaluation Metrics

Several metrics were utilized to assess the effectiveness of the predictive models. These metrics were considered because they provide complementary views of model performance, especially in a clinical prediction setting where class imbalance and decision reliability are important.

A. Area Under the ROC Curve (AUROC)

The AUROC measures how well a model can distinguish between deterioration and non-deterioration cases across all possible classification thresholds. It is a widely used performance metric for binary classifiers and summarizes the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) across all thresholds [34,35]. A higher AUROC means the model is better at ranking deteriorating cases higher than non-deteriorating ones.

T P R = \frac{T r u e P o s i t i v e s}{T r u e P o s i t i v e s + F a l s e N e g a t i v e s}

(8)

F P R = \frac{F a l s e P o s i t i v e s}{F a l s e P o s i t i v e s + T r u e N e g a t i v e s}

(9)

The AUROC value ranges from 0.5 (no discrimination) to 1.0 (perfect discrimination).

B. Area Under the Precision-Recall Curve (AUPRC)

AUPRC summarizes the relationship between Precision and Recall across different thresholds. It is especially useful when the positive class (deterioration) is rare. AUPRC focuses specifically on how well the model identifies positive cases, making it more informative than AUROC when the dataset is imbalanced [36]. A higher AUPRC means the model maintains strong precision while still identifying most deterioration cases.

P r e c i s i o n = \frac{T r u e P o s i t i v e s}{T r u e P o s i t i v e s + F a l s e P o s i t i v e s}

(10)

R e c a l l = \frac{T r u e P o s i t i v e s}{T r u e P o s i t i v e s + F a l s e N e g a t i v e s}

(11)

C. Confusion Matrix

A confusion matrix summarizes how many predictions the model classified correctly or incorrectly [37]. It provides a direct, interpretable view of model errors and is useful for assessing clinical risk, for example, the cost of false negatives. It shows:

True Positives (TP): deterioration correctly predicted
False Positives (FP): non-deterioration predicted as deterioration
True Negatives (TN): non-deterioration correctly predicted
False Negatives (FN): deterioration missed

D. Brier Score and Calibration Assessment

The Brier score measures the mean squared difference between predicted probabilities and observed binary outcomes. For predicted probability

p_{i}

and true outcome

y_{i}

, the Brier score is expressed mathematically as follows:

Brier = \frac{1}{N} \sum_{i = 1}^{N} (p_{i} - y_{i})^{2}

(12)

Lower Brier scores indicate more accurate and better calibrated probability estimates. Unlike AUROC and AUPRC, which evaluate ranking performance, the Brier score assesses the overall accuracy of predicted risk magnitudes.

Calibration curves group predicted probabilities into bins and compare the mean predicted probability within each bin to the observed event frequency. A perfectly calibrated model would lie on the diagonal line, indicating agreement between predicted risk and actual outcome frequency.

Reliability diagrams provide a graphical representation of this relationship by plotting observed outcome proportions against predicted probabilities across risk strata. These analyses are particularly important in clinical applications, where predicted probabilities may inform decision thresholds and resource allocation.

4. Results and Discussion

This section presents and discusses the results obtained from the implementation and evaluation of the proposed CAPPM framework for the early prediction of sepsis deterioration using incomplete care pathways.

4.1. Sepsis Care Pathway Discovery

Figure 2 and Figure 3 present the process models discovered from the sepsis event log using the Inductive Miner. These models provide an overview of the typical clinical pathways observed within the dataset and form the basis for subsequent conformance analysis.

The process tree in Figure 2 offers a hierarchical view of the control-flow structure. It highlights the key sequencing of activities beginning with ER Registration and branching into parallel and alternative behavior. The tree reveals several diagnostic and treatment loops, such as repeated measurements of laboratory tests including CRP, Leucocytes, and LacticAcid. It also shows multiple release transitions (Release A–E), indicating variation in patient discharge destinations.

Figure 3 translates this process tree into an executable process representation. The net visualizes the flow of patients through different clinical stages, from registration and triage to laboratory evaluation, treatment, and eventual discharge or admission. The structure captures both routine and exceptional care pathways, including direct admissions, repeated diagnostic cycles, and return visits to the ER. Figure 4 further presents how patients move through the sepsis care process based on the event log.

Each row of colored blocks shows the activities in the order in which they occurred, and the percentages and frequencies on the left side of each row indicate how often that sequence appears in the dataset. It can be seen that almost all high-frequency variants begin with ER Registration, ER Triage, and ER Sepsis Triage, reflecting the standard entry flow for patients in the emergency department. After these initial steps, the pathways diverge. Some variants proceed with laboratory tests such as CRP, Leucocytes, or LacticAcid, while others include early administration of IV Liquid or IV Antibiotics. The ordering and presence of these steps differ across variants, illustrating variation in diagnostic and treatment decisions.

4.2. Comparison of Predictive Models

Figure 5 presents the ROC curves for all models trained using CAPPM features. All models show higher discriminative ability compared to the baseline models, with AUROC values ranging from 0.598 to 0.744, indicating that the CAPPM features effectively separate deterioration from non-deterioration cases. Among the models, the AdaBoost model achieved the highest AUROC of 0.744, closely followed by GB with 0.731 and RF with 0.680. The DT model, however, achieved the lowest AUROC of 0.598. This is because decision trees, unlike ensemble models, are single-split, greedy learners that partition the feature space along one boundary at a time, making them inherently limited in their ability to capture the complex, non-linear interactions between conformance trajectory features, pathway structure, and clinical measurements that characterize sepsis deterioration.

Given the class imbalance in deterioration prediction, with only 12.3% of cases representing deterioration outcomes, the Precision–Recall (PR) curves present a more informative assessment of model performance. Figure 6 shows the PR curves for all five classifiers trained on CAPPM features, with the dashed horizontal line indicating the no-skill baseline corresponding to the deterioration prevalence of 0.11 in the test set. All models substantially exceed this baseline across the full recall range, indicating that CAPPM features concentrate true positive cases more effectively than random classification, thus providing a meaningful discriminative signal under real-world class-imbalance conditions.

GB achieved the highest AUPRC of 0.379, followed closely by AdaBoost at 0.355 and RF at 0.272. Extra Trees achieved an AUPRC of 0.188, while the DT yielded the lowest AUPRC of 0.163, consistent with its reduced ability to capture complex nonlinear interactions. However, the absolute AUPRC values remain moderate. This reflects the inherent difficulty of detecting early deterioration using partial trajectory information, where strong clinical signals may not yet be fully expressed. The PR curves further show a progressive decline in precision as recall increases. Capturing a larger proportion of deterioration cases, therefore, requires accepting a higher rate of false positives. This trade-off is expected in imbalanced clinical prediction settings and highlights the importance of selecting an operating threshold aligned with clinical objectives.

The comparison between the baseline feature set and the full CAPPM feature set in Figure 7 shows consistent improvements in AUROC and AUPRC across all models, confirming that the addition of process conformance, pathway clustering, temporal trend, and DECLARE constraint features systematically enhances predictive performance beyond what static clinical and event-level attributes alone can achieve.

All CAPPM-based models outperform their baseline counterparts, which rely only on static and event-level attributes, indicating improved identification of deterioration cases under class imbalance. For example, GB improves from an AUROC of 0.710 in the baseline setting to 0.731 with CAPPM features, while its AUPRC increases from 0.289 to 0.379. AdaBoost shows a similar pattern, with AUROC rising from 0.689 to 0.744 and AUPRC from 0.294 to 0.355. Random Forest also demonstrates improvement in both metrics. This pattern shows that combining conformance, process, declare, and trend features enables the models to more reliably recognize early deterioration signals. Overall, the figure demonstrates that integrating PM concepts into predictive modelling enhances early sepsis detection performance.

Figure 8 presents the Brier scores for all classifiers, which measure the mean squared difference between predicted probabilities and observed outcomes, with lower values indicating better probabilistic accuracy and calibration. Four of the five models, incorporating CAPPM features, reduced the Brier score relative to the baseline models. DT improved from 0.207 to 0.148, RF from 0.124 to 0.094, GB from 0.095 to 0.088, and Extra Trees from 0.152 to 0.112. These reductions indicate that CAPPM features not only enhance discrimination but also improve the reliability of predicted probabilities. The most notable improvement is observed for the DT and Extra Trees models, suggesting that process and conformance-based features provide meaningful probabilistic refinement even for less complex learners.

GB with CAPPM features achieved the lowest Brier score overall at 0.088, indicating the most accurate probability estimates among the evaluated models. In contrast, AdaBoost exhibited a slight increase in Brier score when CAPPM features were added, rising from 0.152 to 0.162. Although AdaBoost achieved a high AUROC performance, this result suggests that its probability estimates are less well calibrated compared to GB.

Figure 9 further reinforces the calibration performance of the GB model, where it showed near-monotonic alignment with the diagonal reference line across most probability bins, indicating reasonable agreement between predicted risk and observed outcome frequency. Minor deviations were observed in the higher probability ranges, where predicted probabilities slightly underestimated the true event rate. This behavior is consistent with the limited number of cases assigned to high-risk bins in an imbalanced dataset, which increases variability in empirical estimates.

The reliability diagram for the CAPPM models presented in Figure 10 illustrates this relationship by comparing the mean predicted probability within each bin to the corresponding observed proportion of deterioration. The GB model demonstrates acceptable calibration across low- and mid-risk strata, with modest divergence in the upper probability bins. Importantly, no systematic overestimation of risk was observed across the full probability spectrum. Overall, the ensemble-based models, particularly GB and RF, provide not only improved discrimination but also stable and clinically interpretable probability estimates.

The confusion matrix presented in Figure 11 provides a detailed view of the CAPPM GB model’s classification performance at a threshold of 0.5. Among 715 true non-deterioration cases, the model correctly classified 700 and misclassified 15 as deterioration, indicating a low false-positive rate. For deterioration cases, 19 out of 88 were correctly identified, while 69 were misclassified as non-deterioration. This corresponds to a sensitivity of approximately 0.22 at the default threshold. Specificity remains high at approximately 0.98. In other words, the model prioritizes precision and specificity over sensitivity when operating at 0.5, thereby minimizing false alarms but failing to detect a substantial portion. The relatively low number of false positives demonstrates that the CAPPM feature set supports stable discrimination of non-deterioration trajectories. At the same time, the modest sensitivity indicates that a lower probability threshold may be required in practice if the objective is to maximize early detection. Overall, the result indicates that the CAPPM feature set enables the model to make accurate predictions while keeping false alarms minimal, which is important for supporting early sepsis clinical decision-making.

The temporal distance between evaluated prefixes and the first deterioration-defining event was quantified to assess the effective early warning window. Among deterioration cases, the median time between the final evaluated prefix and ICU admission or severe discharge was 3.56 h, with an interquartile range of 1.89 to 21.67 h. These results indicate that risk predictions were generated several hours prior to clinical escalation in a substantial proportion of cases. Notably, one quarter of deterioration cases had more than 21 h between prediction and ICU transfer, demonstrating that the framework captures deterioration risk well before terminal events occur.

4.3. Feature Importance and Interpretability

The SHAP analysis, presented in Figure 12, highlights the features that contribute most to the prediction of early sepsis deterioration. All SHAP analyses were computed using models trained strictly on prefix-level features, ensuring that feature contributions reflect information available at prediction time. The most influential feature is align_cost_slope, indicating that the rate of change in alignment cost over successive prefixes plays a central role in prediction. A larger slope reflects increasing deviation from the reference process model as the case unfolds. Its dominant contribution suggests that dynamic divergence from expected care pathways is strongly associated with deterioration risk. The second most important feature is pathway_cluster, which captures structural similarities among patient trajectories. Its high importance indicates that pathway-level grouping contains substantial predictive information. This suggests that certain structural care patterns are consistently associated with higher or lower risk profiles.

Temporal process characteristics also contribute meaningfully. mean_inter_event_hours and elapsed_hours appear among the top-ranked features. These variables reflect the pace and progression of clinical activity. Their prominence indicates that timing dynamics, not only event presence, influence predicted risk. Several laboratory-derived features follow in importance, including CRP_min, Leucocytes_min, and CRP_mean, along with physiological indicators such as SIRSCritTachypnea_max, LacticAcid_max, and Hypotensie_last. These features represent early inflammatory and hemodynamic markers. Although their individual contributions are smaller than the leading process-based features, they collectively form a substantial portion of the predictive signal. The distribution of feature contributions supports the CAPPM framework, in which temporal and conformance-based signals constitute primary indicators of early deterioration risk.

The detected process deviations are not only descriptive but may support actionable clinical insights. For example, increasing alignment costs or protocol violations may correspond to delays in treatment administration, missing diagnostic steps, or unexpected workflow transitions. Such deviations can serve as early warning indicators that prompt clinicians to reassess patient management, verify protocol adherence, or accelerate pending interventions. Therefore, CAPPM provides both predictive risk estimation and interpretable workflow signals that may support timely clinical decision-making.

Overall, the results obtained from the analysis indicate that incorporating conformance and pathway-based features into predictive models can serve as an early warning decision support tool in the emergency and acute care environment, where timely intervention is crucial. Unlike traditional risk scores that rely solely on physiological measurements and laboratory values [38], the CAPPM framework can detect early deviations from expected clinical workflows, such as delayed treatments, abnormal workflow sequences, or missing diagnostic steps, by explicitly modeling how care is delivered and how patient trajectories evolve over time before severe deterioration becomes evident. This creates an opportunity for earlier intervention, potentially reducing the likelihood of progression to septic shock, ICU admission, or mortality. Furthermore, with the integration of SHAP, interpretability has been improved as clinicians can relate elevated risk scores to concrete workflow issues rather than unclear model outputs. CAPPM can serve as a practical foundation for more reliable, transparent, and continuously improving sepsis care.

5. Limitations and Future Research Directions

5.1. Limitations of the Study

The methodology used in this study has several limitations that should be acknowledged. The Sepsis Event Log represents a single-center dataset reflecting Dutch clinical workflows. Therefore, the discovered reference model and pathway structures are institution-specific and may not generalize directly to other healthcare systems without recalibration. In addition, the dataset reflects workflows from a Dutch academic hospital collected several years ago, and clinical practices may differ across regions or evolve over time, which may further limit direct generalization. The size of the dataset limits the ability to evaluate certain models or investigate rare patterns of sepsis deterioration in detail. The study also focused on offline prediction, and the framework has not yet been tested in a real-time environment where data arrive continuously, and decisions must be made under time constraints. Another limitation is the inability to directly compare the proposed framework with established clinical risk scores such as Sequential Organ Failure Assessment (SOFA), Quick SOFA (qSOFA), or National Early Warning Score (NEWS). The event log used in this study primarily contains process and laboratory event information and does not consistently provide the physiological measurements required to compute these scores, including blood pressure, respiratory rate, Glasgow Coma Scale, platelet count, bilirubin, and creatinine. In addition, the dataset lacks consistent demographic and clinical background variables, such as comorbidities or immune status, which are important factors influencing deterioration risk, particularly among elderly, pediatric, or immunocompromised patients. Future work using richer electronic health record datasets may enable benchmarking of CAPPM predictions against standard clinical scoring systems and evaluation of performance across patient subpopulations, supporting development of subgroup-specific predictive models and pathway analyses. Specifically, in a richer dataset, subgroup-specific reference models could be discovered separately for elderly (>65), pediatric, or immunocompromised populations, enabling subgroup-calibrated conformance assessment. These limitations do not reduce the value of the findings but highlight areas where further validation is needed.

5.2. Future Research Directions

To address these limitations and strengthen the proposed framework, future research can focus on the following areas:

Validation across multiple hospitals: Evaluating the framework using event logs from different institutions would help determine how variations in clinical practice affect process behavior and predictive performance.
Development of real-time monitoring capabilities: The framework can be extended to operate with streaming data so that predictions and conformance insights update as new events occur during patient care.
Use of adaptive or personalized reference models: Future studies can explore models that adjust to changes in clinical practice over time or that generate patient-specific pathway baselines for improved conformance assessment.
Integration of federated learning with process mining: This direction would allow hospitals to collaborate on predictive modelling while keeping patient data local, which supports privacy-preserving analytics in healthcare systems [39].

6. Conclusions

This study proposed a CAPPM framework for early detection of sepsis deterioration using incomplete care pathways. The approach transformed raw event logs into prefix-based representations of patient trajectories, extracted behavioral, temporal, clinical, pathway, and conformance features, and used these features to train several predictive models. AdaBoost achieved the highest AUROC, while GB demonstrated superior calibration and overall probabilistic reliability, and the confusion matrix showed high specificity with low false positives. Although the results obtained show good performance, the study has several limitations that create opportunities for further work. One important direction identified is the integration of federated learning with process mining so that hospitals can develop predictive models and conformance structures without sharing raw patient data [39]. This could support multi-institutional collaboration while preserving privacy. Additional research can also explore richer temporal abstraction methods, patient-specific pathway modelling, and adaptive alignment techniques that update the reference model as clinical practice evolves.

Author Contributions

Conceptualization, K.D.H. and M.N.S.; Methodology, K.D.H. and M.N.S.; Software, K.D.H.; Validation, K.D.H.; Formal analysis, K.D.H.; Data curation, K.D.H.; Writing—original draft, K.D.H.; Writing—review & editing, K.D.H. and M.N.S.; Visualization, M.N.S.; Supervision, K.D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study analyzed a publicly available, fully anonymized secondary dataset (Sepsis Cases—Event Log) obtained from the 4TU.ResearchData repository. No new data were collected from human participants, and no identifiable patient information was accessed. Therefore, Institutional Review Board (IRB) approval was not required.

Informed Consent Statement

Informed consent was not required for this study, as no direct involvement of participants occurred and the data were obtained from an open-access, de-identified dataset.

Data Availability Statement

The data presented in this study are openly available in the 4TU.ResearchData repository as the Sepsis Cases—Event Log dataset at https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639 (accessed on 1 January 2026) [31].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guarino, M.; Perna, B.; Cesaro, A.E.; Maritati, M.; Spampinato, M.D.; Contini, C.; De Giorgio, R. 2023 Update on Sepsis and Septic Shock in Adult Patients: Management in the Emergency Department. J. Clin. Med. 2023, 12, 3188. [Google Scholar] [CrossRef]
Gray, A.P.; Chung, E.; Hsu, R.L.; Araki, D.T.; Gershberg Hayoon, A.; Davis Weaver, N.; Swetschinski, L.R.; Wool, E.E.; Han, C.; Mestrovic, T.; et al. Global, Regional, and National Sepsis Incidence and Mortality, 1990–2021: A Systematic Analysis. Lancet Glob. Health 2025, 13, e2013–e2026. [Google Scholar] [CrossRef]
Bignami, E.G.; Berdini, M.; Panizzi, M.; Domenichetti, T.; Bezzi, F.; Allai, S.; Damiano, T.; Bellini, V. Artificial Intelligence in Sepsis Management: An Overview for Clinicians. J. Clin. Med. 2025, 14, 286. [Google Scholar] [CrossRef] [PubMed]
Ocampo-Quintero, N.; Vidal-Cortés, P.; del Río Carbajo, L.; Fdez-Riverola, F.; Reboiro-Jato, M.; Glez-Peña, D. Enhancing Sepsis Management through Machine Learning Techniques: A Review. Med. Intensiv. Engl. Ed. 2022, 46, 140–156. [Google Scholar] [CrossRef]
Fleuren, L.M.; Klausch, T.L.T.; Zwager, C.L.; Schoonmade, L.J.; Guo, T.; Roggeveen, L.F.; Swart, E.L.; Girbes, A.R.J.; Thoral, P.; Ercole, A.; et al. Machine Learning for the Prediction of Sepsis: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy. Intensive Care Med. 2020, 46, 383–400. [Google Scholar] [CrossRef]
Mans, R.S.; Van Der Aalst, W.M.P.; Vanwersch, R.J.B.; Moleman, A.J. Process Mining in Healthcare: Data Challenges When Answering Frequently Posed Questions. In Process Support and Knowledge Representation in Health Care; Lenz, R., Miksch, S., Peleg, M., Reichert, M., Riaño, D., Ten Teije, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7738, pp. 140–153. ISBN 978-3-642-36437-2. [Google Scholar]
Van Der Aalst, W.; Adriansyah, A.; De Medeiros, A.K.A.; Arcieri, F.; Baier, T.; Blickle, T.; Bose, J.C.; Van Den Brand, P.; Brandtjen, R.; Buijs, J.; et al. Process Mining Manifesto. In Business Process Management Workshops; Daniel, F., Barkaoui, K., Dustdar, S., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2012; Volume 99, pp. 169–194. ISBN 978-3-642-28107-5. [Google Scholar]
Ceravolo, P.; Comuzzi, M.; De Weerdt, J.; Di Francescomarino, C.; Maggi, F.M. Predictive Process Monitoring: Concepts, Challenges, and Future Research Directions. Process Sci. 2024, 1, 2. [Google Scholar] [CrossRef]
Salehi, M.; Khayami, R.; Akbari, R.; Mirmozaffari, M. An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran. Bioengineering 2025, 12, 1288. [Google Scholar] [CrossRef]
Silva, E.; Marreiros, G. Predictive Process Mining a Systematic Literature Review. In Good Practices and New Perspectives in Information Systems and Technologies; Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A., Eds.; Lecture Notes in Networks and Systems; Springer Nature: Cham, Switzerland, 2024; Volume 987, pp. 357–378. ISBN 978-3-031-60220-7. [Google Scholar]
Chiotos, K.; Balamuth, F.; Fitzgerald, J.C. A Critical Assessment of Time-to-Antibiotics Recommendations in Pediatric Sepsis. J. Pediatr. Infect. Dis. Soc. 2024, 13, 608–615. [Google Scholar] [CrossRef] [PubMed]
Demir, O. Sepsis: An Overview of Current Therapies and Future Research. J. Clin. Pract. Res. 2025, 47, 99–110. [Google Scholar] [CrossRef]
Papareddy, P.; Lobo, T.J.; Holub, M.; Bouma, H.; Maca, J.; Strodthoff, N.; Herwald, H. Transforming Sepsis Management: AI-Driven Innovations in Early Detection and Tailored Therapies. Crit. Care 2025, 29, 366. [Google Scholar] [CrossRef]
Zubair, M.; Din, I.; Sarwar, N.; Elov, B.; Makhmudov, S.; Trabelsi, Z. Revolutionizing Sepsis Diagnosis Using Machine Learning and Deep Learning Models: A Systematic Literature Review. BMC Infect. Dis. 2025, 25, 1396. [Google Scholar] [CrossRef]
Samara, M.N.; Harry, K.D. Leveraging Kaizen with Process Mining in Healthcare Settings: A Conceptual Framework for Data-Driven Continuous Improvement. Healthcare 2025, 13, 941. [Google Scholar] [CrossRef] [PubMed]
Ghasemi, M.; Amyot, D. Process Mining in Healthcare: A Systematised Literature Review. Int. J. Electron. Healthc. 2016, 9, 60. [Google Scholar] [CrossRef]
Santos, A.; Leal, G.C.L.; Balancieri, R. Process Mining in Healthcare: A Tertiary Study. BMC Med. Inform. Decis. Mak. 2025, 25, 306. [Google Scholar] [CrossRef]
Munoz-Gama, J.; Martin, N.; Fernandez-Llatas, C.; Johnson, O.A.; Sepúlveda, M.; Helm, E.; Galvez-Yanjari, V.; Rojas, E.; Martinez-Millana, A.; Aloini, D.; et al. Process Mining for Healthcare: Characteristics and Challenges. J. Biomed. Inform. 2022, 127, 103994. [Google Scholar] [CrossRef]
De Roock, E.; Martin, N. Process Mining in Healthcare—An Updated Perspective on the State of the Art. J. Biomed. Inform. 2022, 127, 103995. [Google Scholar] [CrossRef]
Al-Badarneh, H.; Arif, A. A Systematic Review of Process Mining in Healthcare. In Proceedings of the 2025 International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 16–18 April 2025; pp. 485–492. [Google Scholar]
Hendricks, R. Process Mining of Incoming Patients with Sepsis. Online J. Public Health Inform. 2019, 11, e14. [Google Scholar] [CrossRef]
Quintano Neira, R.A.; Hompes, B.F.A.; De Vries, J.G.-J.; Mazza, B.F.; Simões De Almeida, S.L.; Stretton, E.; Buijs, J.C.A.M.; Hamacher, S. Analysis and Optimization of a Sepsis Clinical Pathway Using Process Mining. In Business Process Management Workshops; Di Francescomarino, C., Dijkman, R., Zdun, U., Eds.; Lecture Notes in Business Information Processing; Springer International Publishing: Cham, Switzerland, 2019; Volume 362, pp. 459–470. ISBN 978-3-030-37452-5. [Google Scholar]
Bakhshi, A.; Hassannayebi, E.; Sadeghi, A.H. Optimizing Sepsis Care through Heuristics Methods in Process Mining: A Trajectory Analysis. Healthc. Anal. 2023, 3, 100187. [Google Scholar] [CrossRef]
da Silva Júnior, E.R.; Santos, U.L.; Figueredo, M.B. Process Mining in Sepsis Treatment Management: A Step-by-Step Approach to Discovery. J. Bioeng. Technol. Health 2024, 6, 51–59. [Google Scholar] [CrossRef]
Rademaker, F.; Bemthuis, R.; Arachchige, J.; Bukhsh, F. Analyzing Sepsis Treatment Variations in Subpopulations with Process Mining. In Proceedings of the 26th International Conference on Enterprise Information Systems; SCITEPRESS—Science and Technology Publications: Angers, France, 2024; pp. 85–94. [Google Scholar]
Mannhardt, F.; Blinde, D. Analyzing the Trajectories of Patients with Sepsis Using Process Mining; Eindhoven University of Technology: Eindhoven, The Netherlands, 2017. [Google Scholar]
Mohammadi, M. Comparative Process Mining for Identifying the Critical Activities in Sepsis Trajectories. Healthc. Technol. Lett. 2025, 12, e70010. [Google Scholar] [CrossRef] [PubMed]
Madau, A.; Semeraro, G. Process Mining and Machine Learning for Predicting Clinical Outcomes in Emergency Care: A Study on the MIMICEL Dataset. In Proceedings of the 14th International Conference on Data Science, Technology and Applications; SCITEPRESS—Science and Technology Publications: Bilbao, Spain, 2025; pp. 791–799. [Google Scholar]
Celik, U.; Korkmaz, A.; Stoyanov, I. Integrating Process Mining and Machine Learning for Surgical Workflow Optimization: A Real-World Analysis Using the MOVER EHR Dataset. Appl. Sci. 2025, 15, 11014. [Google Scholar] [CrossRef]
Ferreira, L.D.; McCants, D.; Velamuri, S. Using machine learning for process improvement in sepsis management. J. Healthc. Qual. Res. 2023, 38, 304–311. [Google Scholar] [CrossRef]
Mannhardt, F. Sepsis Cases—Event Log 2016. Available online: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=s1HfgbcAAAAJ&citation_for_view=s1HfgbcAAAAJ:nb7KW1ujOQ8C (accessed on 1 January 2026).
De Leoni, M.; Van Der Aalst, W.M.P. Aligning Event Logs and Process Models for Multi-Perspective Conformance Checking: An Approach Based on Integer Linear Programming. In Business Process Management; Daniel, F., Wang, J., Weber, B., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8094, pp. 113–129. ISBN 978-3-642-40175-6. [Google Scholar]
Van Der Aalst, W. Conformance Checking. In Process Mining; Springer: Berlin/Heidelberg, Germany, 2016; pp. 243–274. ISBN 978-3-662-49850-7. [Google Scholar]
Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
Siddiqui, M.K.; Morales-Menendez, R.; Ahmad, S. Application of Receiver Operating Characteristics (ROC) on the Prediction of Obesity. Braz. Arch. Biol. Technol. 2020, 63, e20190736. [Google Scholar] [CrossRef]
Saito, T.; Rehmsmeier, M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef] [PubMed]
Sathyanarayanan, S. Confusion Matrix-Based Performance Evaluation Metrics. Afr. J. Biomed. Res. 2024, 27, 4023–4031. [Google Scholar] [CrossRef]
Alaa, A.M.; Yoon, J.; Hu, S.; Van der Schaar, M. Personalized Risk Scoring for Critical Care Prognosis Using Mixtures of Gaussian Processes. IEEE Trans. Biomed. Eng. 2017, 65, 207–218. [Google Scholar] [CrossRef]
Alozie, E.; Olagunju, H.I.; Faruk, N.; Garba, S. Technical Considerations of Federated Learning in Digital Healthcare Systems. In Federated Learning for Digital Healthcare Systems; Elsevier: Amsterdam, The Netherlands, 2024; pp. 237–282. ISBN 978-0-443-13897-3. [Google Scholar]

Figure 1. CAPPM Framework Architecture.

Figure 2. Inductive Miner Process Tree.

Figure 3. Petri Net model of the discovered sepsis care pathway. Circles represent places, squares represent transitions (activities), and directed arcs denote the flow relations between them.

Figure 4. Trace Explorer of Variants with Frequencies.

Figure 5. ROC Curves for CAPPM Models.

Figure 6. Precision-Recall Curves for CAPPM Models.

Figure 7. AUROC and AUPRC comparison for Baseline and CAPPM feature sets.

Figure 8. Brier Score comparison for Baseline and CAPPM model.

Figure 9. Calibration Curves for CAPPM Models.

Figure 10. Reliability Diagram for CAPPM Gradient Boosting.

Figure 11. Confusion Matrix for CAPPM GB Model.

Figure 12. Top 20 most important CAPPM features using SHAP.

Table 1. Comparison of Process Monitoring Approaches.

Aspect	PM	PPM	CAPPM
Primary Goal	Discover and analyze completed process executions	Predict future outcomes of ongoing process instances	Predict future outcomes while simultaneously evaluating deviations from expected clinical workflows
Data Used	Completed event logs	Partial traces (process prefixes) from ongoing cases	Partial traces enriched with conformance information derived from reference process models
Temporal Perspective	Retrospective analysis	Prospective prediction of ongoing cases	Prospective prediction incorporating evolving conformance behavior
Role of Conformance Checking	Used mainly for retrospective evaluation	Typically not included in prediction	Integrated as a core component to quantify workflow deviations during prediction
Clinical Utility	Identifies bottlenecks and workflow inefficiencies	Provides early prediction of clinical or operational outcomes	Supports early risk detection while revealing workflow deviations that may require clinical intervention
Output	Process models and descriptive workflow insights	Predicted future outcomes	Predicted risk outcomes accompanied by interpretable deviation indicators

Table 2. Statistics of Activity Occurrences in the Sepsis Event Log.

Activity	Count	Mean	Std	Min	25%	50%	75%	Max
ER Registration	1050.0	1.000000	0.000000	1.0	1.0	1.0	1.0	1.0
Leucocytes	1050.0	3.221905	4.461176	0.0	1.0	2.0	4.0	74.0
CRP	1050.0	3.106667	3.918626	0.0	1.0	2.0	4.0	69.0
LacticAcid	1050.0	1.396190	2.701760	0.0	1.0	1.0	1.0	51.0
ER Triage	1050.0	1.002857	0.053401	1.0	1.0	1.0	1.0	2.0
ER Sepsis Triage	1050.0	0.999048	0.030861	0.0	1.0	1.0	1.0	1.0
IV Liquid	1050.0	0.717143	0.450602	0.0	0.0	1.0	1.0	1.0
IV Antibiotics	1050.0	0.783810	0.411842	0.0	1.0	1.0	1.0	1.0
Admission NC	1050.0	1.125714	0.877320	0.0	1.0	1.0	2.0	5.0
Release A	1050.0	0.639048	0.480506	0.0	0.0	1.0	1.0	1.0
Return ER	1050.0	0.280000	0.449213	0.0	0.0	0.0	1.0	1.0
Admission IC	1050.0	0.111429	0.335340	0.0	0.0	0.0	0.0	2.0
Release B	1050.0	0.053333	0.224804	0.0	0.0	0.0	0.0	1.0
Release E	1050.0	0.005714	0.075413	0.0	0.0	0.0	0.0	1.0
Release C	1050.0	0.023810	0.152528	0.0	0.0	0.0	0.0	1.0
Release D	1050.0	0.022857	0.149519	0.0	0.0	0.0	0.0	1.0

Release A–E correspond to discharge categories defined in the original hospital information system within the Sepsis Event Log dataset. These categories represent different discharge destinations or outcomes recorded by the hospital, although detailed clinical definitions are not available in the public dataset.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Harry, K.D.; Samara, M.N. Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways. J. Clin. Med. 2026, 15, 1956. https://doi.org/10.3390/jcm15051956

AMA Style

Harry KD, Samara MN. Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways. Journal of Clinical Medicine. 2026; 15(5):1956. https://doi.org/10.3390/jcm15051956

Chicago/Turabian Style

Harry, Kimberly D., and Mohammad Najeh Samara. 2026. "Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways" Journal of Clinical Medicine 15, no. 5: 1956. https://doi.org/10.3390/jcm15051956

APA Style

Harry, K. D., & Samara, M. N. (2026). Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways. Journal of Clinical Medicine, 15(5), 1956. https://doi.org/10.3390/jcm15051956

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Conformance-Aware Predictive Process Monitoring for Early Detection of Sepsis Deterioration Using Incomplete Care Pathways

Abstract

1. Introduction

2. Literature Review

2.1. AI/Machine Learning for Sepsis Detection

2.2. Process Mining and Predictive Integration

3. Research Methodology

3.1. Dataset Description

3.2. Data Preprocessing

3.2.1. Import and Standardization

3.2.2. Outcome Labelling

3.3. CAPPM Framework

3.3.1. Prefix Generation

3.3.2. Feature Engineering

3.3.3. Conformance Checking Module

3.3.4. Predictive Model

3.4. Evaluation Metrics

4. Results and Discussion

4.1. Sepsis Care Pathway Discovery

4.2. Comparison of Predictive Models

4.3. Feature Importance and Interpretability

5. Limitations and Future Research Directions

5.1. Limitations of the Study

5.2. Future Research Directions

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI