An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran
Abstract
1. Introduction
- Phase 1: Process Discovery and Conformance Checking
- Phase 2: Retrospective Performance Analysis
- Phase 3: Toward Forward-Looking Analytics
- Path 3a: Predictive Monitoring of Ongoing Processes
- Path 3b: “What-If” Scenario Analysis via Simulation
- Predictive monitoring anticipates micro-level events,
- Simulation explores macro-level scenarios,
| Phase/Period | Representative Studies | Analysis Type/Objective | Methodological Focus | Predictive Monitoring | Scenario Simulation (What-If) | KPI Quantification | Application Domain | Key Limitation/Gap | Advancement Addressed by PIM |
|---|---|---|---|---|---|---|---|---|---|
| Phase 1: Process Discovery and Conformance (2012–2016) | Rebuge & Ferreira (2012) [12]; Fernández Llatas et al. (2015) [13]; Rojas et al. (2016) [14]; Savino et al. (2023) [11] | Descriptive mapping of real workflows and deviations | PM discovery/conformance with EHR and tracking data | ✘ | ✘ | Descriptive metrics (fitness) | Emergency and oncology care | Reveals deviations but does not measure performance impact | PIM quantifies effect of deviation elimination on key KPIs |
| Transition and Reviews (2022–2025) | Mirmozaffari et al. (2017) [3]; Santos et al. (2025) [15]; Aversano et al. (2025) [16] | Comprehensive reviews of PM in healthcare | Literature synthesis | ✘ | ✘ | Conceptual | General healthcare | Confirms methodological rigor but highlights absence of predictive integration | PIM provides operational bridge from diagnosis → prediction |
| Phase 2: Retrospective Performance Analysis (2018–2022) | Kurniati et al. (2018) [18]; Atighehchian et al. (2022) [19]; De Roock & Martin (2022) [1]; Rosa & Massaro (2024) [21]; Samara & Harry (2025) [22] | Quantitative measurement of bottlenecks, delays, and efficiency | KPI extraction and delay quantification | ✘ | ✘ | Delay and throughput metrics | Stroke, Iran health system, PMO | Retrospective only; no forecast of improvement effects | PIM adds statistical forecasting of aggregate gains |
| Phase 3a: Predictive Monitoring (2018–2025) | Teinemaa et al. (2018) [23]; Kratsch et al. (2021) [24]; Madau et al. (2025) [25]; Winter et al. (2024) [26]; Delgado et al. (2025) [27] | Prediction of ongoing-case outcomes | ML/DL-based predictive monitoring | ✔ | ✘ | Probabilistic outcomes | General health → mHealth and cross-organizational networks | Predicts single-case events, not system-wide improvement | PIM embeds predictive layer within full workflow context |
| Phase 3b: Scenario Simulation and Optimization (2015–2024) | Lamine et al. (2015) [28]; Jadrić et al. (2020) [29]; Di Cunzolo et al. (2023) [30]; Salas et al. (2024) [31]; Ronzani & Sulis (2024) [32] | “What-if” analysis via DES/optimization/digital twin | PM-based simulation coupling | ✘ | ✔ | Strategic KPI | Emergency/hospital logistics/scheduling | Simulation detached from causal PM layer | PIM integrates simulation natively with PM2 output |
| Integrated Framework: PM2–PIM (2025) | This Study | Proactive and Predictive | PM2 + Embedded PIM Layer | ✔ | ✔ | Cycle time ↓8%; Workload ↓6% | Oncology (chemotherapy) | - | Unified, process-native forecasting of operational impact; bridges discovery ↔ simulation ↔ prediction |
- Identification of High-Impact Deviations: Utilizing standard PM techniques, we identify workflow anomalies based on both their high frequency of occurrence and their significant contribution to overall process delay.
- Definition of Improvement Scenarios: Clear improvement scenarios are formally defined, where the identified high-impact deviations are either fully eliminated or mitigated to a specified, partial degree.
- Remodeling of Process Flows: The process models (both descriptive and normative) are mathematically or computationally remodeled to explicitly incorporate the structural changes defined by the improvement scenarios.
- Computation of Corresponding KPI Changes: Through simulation and comparison between the baseline and the remodeled scenarios, the resulting changes in critical KPIs (e.g., time reduction, workload distribution) are computed.
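Step 1 weighs two signals: how often a deviation occurs and how much delay it adds. The ranking rule is not spelled out here, so the sketch below assumes a simple one, expected delay per case = (share of cases showing the deviation) × (mean extra time in those cases). `DeviationStats`, `rank_deviations`, and every input value are hypothetical; the rework row only loosely echoes the 6% and 2.3 h figures reported in Section 3.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviationStats:
    name: str                   # hypothetical deviation label
    affected_cases: int         # cases in which the deviation occurs
    mean_added_delay_h: float   # mean extra cycle time (hours) in those cases


def rank_deviations(stats: list[DeviationStats], total_cases: int) -> list[tuple[str, float, float]]:
    """Rank deviations by expected delay per case (frequency x mean added delay).

    Returns (name, frequency, expected_delay_h_per_case), highest impact first.
    """
    rows = []
    for s in stats:
        frequency = s.affected_cases / total_cases
        rows.append((s.name, frequency, frequency * s.mean_added_delay_h))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical inputs for a 214-case log.
example = [
    DeviationStats("resequenced_events", affected_cases=10, mean_added_delay_h=0.5),
    DeviationStats("rework_loop", affected_cases=13, mean_added_delay_h=2.3),
    DeviationStats("skipped_approval", affected_cases=4, mean_added_delay_h=1.0),
]
for name, freq, delay in rank_deviations(example, total_cases=214):
    print(f"{name:<20} freq={freq:.1%} expected_delay={delay:.3f} h/case")
```

Multiplying the two terms lets a rare but slow deviation outrank a common but cheap one; a weighted sum or a two-threshold filter would be equally defensible alternatives.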
- Reduced Risk: Decision-makers can rigorously test the hypothesized benefits of process changes before committing resources to costly and complex real-world implementation.
- Evidence-Based Prioritization: Interventions are prioritized based not merely on visibility or perceived severity, but on mathematically derived, quantifiable impact metrics.
- Clear Numerical Outcomes: The framework delivers clear, objective numerical projections, providing robust support for justifying strategic decision-making to administrators and clinical governance bodies.
- Sensitivity Analyses: It enables nuanced analysis of partial-impact scenarios, allowing for testing of partial compliance goals rather than requiring absolute adherence to achieve measurable benefits.
2. Materials and Methods
2.1. Data Source
2.1.1. Data Source and Preparation
- Case ID: A unique identifier for each patient’s chemotherapy journey, allowing for the tracing of a complete process instance.
- Activity Name: A descriptive label indicating the specific step performed within the chemotherapy workflow (e.g., “Physician Consultation,” “Prescription Approval,” “Chemotherapy Preparation,” “Patient Infusion”).
- Timestamp: The precise date and time at which an activity commenced or completed. This temporal information is paramount for understanding the sequence of events, calculating durations, and identifying bottlenecks.
- Relevant Resource Information: Data pertaining to the resources involved in the activity, such as the type of healthcare professional (e.g., physician, nurse, and pharmacist) or the department responsible.
- HIS Structure and Extracted Tables
- Reception Table: stores fundamental patient admission metadata and serves as the anchor entity for connecting services via the foreign key ReceptionID.
- ReceptionService Table: contains one-to-many service records for each admission and details discrete procedural or paraclinical activities.
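Because each Reception row fans out to many ReceptionService rows, building the event log amounts to a one-to-many join on ReceptionID. A stdlib-only sketch follows; it assumes hypothetical column names (ServiceName, ServiceTime, ResourceRole) and uses the admission ID as the case ID, whereas the real HIS schema and the choice of case notion may differ.

```python
from datetime import datetime

# Hypothetical extracts; real HIS column names and case notion may differ.
reception = [
    {"ReceptionID": 1, "PatientID": "P001"},
    {"ReceptionID": 2, "PatientID": "P002"},
]
reception_service = [
    {"ReceptionID": 1, "ServiceName": "sandugh", "ServiceTime": "2023-05-01 08:25", "ResourceRole": "cashier"},
    {"ReceptionID": 1, "ServiceName": "paziresh", "ServiceTime": "2023-05-01 08:10", "ResourceRole": "clerk"},
    {"ReceptionID": 2, "ServiceName": "paziresh", "ServiceTime": "2023-05-02 09:00", "ResourceRole": "clerk"},
    {"ReceptionID": 99, "ServiceName": "tazrigh", "ServiceTime": "2023-05-02 10:00", "ResourceRole": "nurse"},
]


def build_event_log(reception, reception_service):
    """Inner-join service rows to admissions on ReceptionID and emit event-log rows.

    Each row carries case ID, activity, timestamp and resource. Service rows
    whose ReceptionID has no admission record (orphans) are dropped.
    """
    admissions = {r["ReceptionID"]: r for r in reception}
    events = []
    for s in reception_service:
        if s["ReceptionID"] not in admissions:
            continue  # orphan service row: no matching admission
        events.append({
            "case_id": s["ReceptionID"],
            "activity": s["ServiceName"],
            "timestamp": datetime.strptime(s["ServiceTime"], "%Y-%m-%d %H:%M"),
            "resource": s["ResourceRole"],
        })
    # Sort by case, then time, so each trace reads in execution order.
    events.sort(key=lambda e: (e["case_id"], e["timestamp"]))
    return events


for event in build_event_log(reception, reception_service):
    print(event)
```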
2.1.2. Data Quality Challenges and Resolutions
- Heterogeneous Service Codes: Service Name labels varied across subsystems; we resolved this via a harmonized dictionary developed in collaboration with hospital IT staff.
- Granularity of Timestamps: Certain entries recorded times only to the nearest minute; missing seconds were interpolated using a 5 min offset rule to maintain event ordering.
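Both resolutions can be written as small preprocessing passes. The sketch assumes the harmonized dictionary is a plain raw-label to canonical-label mapping, and reads the 5 min offset rule as follows: within a case, an event whose minute-truncated timestamp does not come after its predecessor's is moved to the predecessor's time plus 5 minutes, so the recording order survives any later sort by time. That reading, `LABEL_MAP`, and the helper names are assumptions, not the hospital's exact procedure.

```python
from datetime import datetime, timedelta

# Hypothetical excerpt of the harmonized dictionary (raw label -> canonical label).
LABEL_MAP = {
    "shimi-darmani": "Shimi Darmani",
    "Shimi Darmani (OPD)": "Shimi Darmani",
    "Tazrigh": "tazrigh",
}

OFFSET = timedelta(minutes=5)  # assumed reading of the "5 min offset rule"


def harmonize(label: str) -> str:
    """Return the canonical label; unknown labels pass through (whitespace stripped)."""
    label = label.strip()
    return LABEL_MAP.get(label, label)


def clean_trace(trace: list[tuple[str, datetime]]) -> list[tuple[str, datetime]]:
    """Harmonize labels and make timestamps strictly increasing within one case.

    `trace` must be in recording order. An event whose timestamp is not later
    than its predecessor's (e.g. both truncated to the same minute) is moved
    to predecessor + OFFSET.
    """
    cleaned: list[tuple[str, datetime]] = []
    for raw_label, ts in trace:
        if cleaned and ts <= cleaned[-1][1]:
            ts = cleaned[-1][1] + OFFSET
        cleaned.append((harmonize(raw_label), ts))
    return cleaned


t0 = datetime(2023, 5, 1, 9, 0)
for row in clean_trace([("shimi-darmani", t0), ("Tazrigh ", t0), ("parvande", t0 + timedelta(minutes=30))]):
    print(row)
```

Shifts can cascade: three events stamped 09:00, 09:00 and 09:03 become 09:00, 09:05 and 09:10, which slightly inflates the cycle time of such cases.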
2.1.3. Ethical Considerations and Reproducibility
2.1.4. Integration with Process Mining Tools
2.2. PM2 Methodology Steps
- Scope Definition: The primary objective was clearly defined: to assess the conformance of actual chemotherapy execution at an Iranian Radiotherapy and Oncology Center against a predefined normative process. This involved establishing the boundaries of the process under study and identifying the key activities and their expected sequence [40].
- Data Preparation: This crucial step involved the extraction of raw event logs from the HIS. Subsequently, the data underwent cleaning to remove irrelevant or erroneous entries, followed by transformation and mapping to the standardized event log format required by process mining tools. This ensured that the data accurately reflected the chemotherapy process [18].
- Modeling of the Normative Process: The normative process model, representing the ideal or standard way the chemotherapy workflow should be executed, was designed. This was achieved using the Business Process Model and Notation (BPMN) 2.0 standard [41], a widely recognized graphical language for specifying business processes. The WoPeD (Workflow Petri Net Designer) tool was employed for this purpose [42], allowing for the creation of a detailed and unambiguous visual representation of the ideal workflow.
- Event Log Conversion: To enable analysis in standard process mining platforms, the prepared event logs were converted into the eXtensible Event Stream (XES) format [43]. XES is an XML-based standard designed for storing event logs, ensuring interoperability between different process mining tools.
- Process Analysis within ProM: The XES-formatted event logs were imported into ProM, a powerful and widely used open-source process mining framework. Within ProM, several analytical techniques were applied:
  - Conformance Checking: This involved comparing the actual process executions (recorded in the event logs) against the normative BPMN model. Metrics were calculated to quantify the degree of adherence [44].
  - KPI Calculation: Key Performance Indicators (KPIs) were computed to objectively measure various aspects of process performance, including efficiency, effectiveness, and compliance [45].
  - Variant Analysis: The different paths or sequences of activities that occurred in reality were identified and analyzed. This helped in understanding the diversity of actual process executions and highlighting common deviations from the normative model [46].
- Visualization and Communication: Visual aids were employed to effectively communicate the findings of the analysis.
  - Sankey Diagrams: These diagrams were used to visualize the flow of cases through the different activities and variants of the process, clearly illustrating the dominant paths and the distribution of cases across different sequences [47].
  - Deviation Maps: These visualizations overlaid the identified deviations directly onto the normative BPMN model, making it easy to pinpoint where and how the actual process diverged from the intended one.
- Predictive Impact Modeling (PIM) Integration: Scenario simulations projected KPI improvements under two conditions:
  - Current Optimization: corrections targeting observed deviations (e.g., skipped approvals, loops).
  - Full Adherence: theoretical elimination of all deviations and rework. Quantitative outputs from this stage informed comparative tables (e.g., Table 2 in Results), linking process changes to projected gains in time savings and workload reduction.
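The event-log conversion step above can be done with any XML library. A minimal sketch using Python's standard `xml.etree.ElementTree`, writing the standard `concept:name`, `time:timestamp` and `org:resource` attributes from the XES Concept, Time and Organizational extensions. The input structure and `to_xes` are hypothetical; a full export would normally also declare `<extension>` and `<global>` elements in the log header, or rely on a dedicated library such as PM4Py.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

XES_NS = "http://www.xes-standard.org/"


def to_xes(traces: dict[str, list[tuple[str, datetime, str]]]) -> str:
    """Serialize {case_id: [(activity, timestamp, resource), ...]} into a minimal XES document."""
    log = ET.Element("log", {"xes.version": "1.0", "xmlns": XES_NS})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": str(case_id)})
        for activity, ts, resource in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            # XES dates are xs:dateTime values, i.e. ISO 8601.
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts.isoformat()})
            ET.SubElement(event, "string", {"key": "org:resource", "value": resource})
    return ET.tostring(log, encoding="unicode")


xes = to_xes({
    "1": [("paziresh", datetime(2023, 5, 1, 8, 10), "clerk"),
          ("sandugh", datetime(2023, 5, 1, 8, 25), "cashier")],
})
print(xes)
```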
Rationale for PM2 Adoption in Healthcare
- Discovery reconstructs the “as-is” process for transparency.
- Conformance quantifies deviations against clinical standards.
- Enhancement operationalizes evidence-based redesign to reduce bottlenecks.
- Prediction simulates scenario outcomes for proactive optimization.
2.3. Tools
2.3.1. WoPeD (Workflow Petri Net Designer)
- Construction of Petri Net–based workflows with soundness and reachability analysis;
- Compatibility with BPMN, PNML and EPML formats;
- Token-based simulation to validate logical sequencing and timing;
- Export interfaces to ProM for conformance and performance analysis.
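WoPeD's token game and replay-based conformance share the same mechanics: fire transitions on a marking and count tokens that are missing or left over. The sketch below implements classic token-based replay fitness, f = ½(1 − m/c) + ½(1 − r/p), where m, c, r and p count missing, consumed, remaining and produced tokens (Rozinat and van der Aalst), on a purely sequential workflow net built from the seven normative activities listed in Section 2.4. It is a simplification: events without a matching transition are skipped, nets with choices or parallelism need a full Petri net engine, and the Fitness of 0.97 reported in this study comes from alignment-based checking in ProM, not from this code.

```python
NORMATIVE = ["paziresh", "sandugh", "Shimi Darmani", "Parastar and Sabad Daru",
             "tazrigh", "parvande", "Etmam Darman"]


def token_replay_fitness(trace: list[str], model: list[str]) -> float:
    """Token-based replay fitness on a sequential workflow net.

    Place i feeds transition model[i], which produces into place i + 1.
    Place 0 is the source and place len(model) is the sink.
    """
    index = {activity: i for i, activity in enumerate(model)}
    marking = [0] * (len(model) + 1)
    marking[0] = 1                      # initial token in the source place
    produced, consumed, missing = 1, 0, 0

    for activity in trace:
        i = index.get(activity)
        if i is None:
            continue                    # simplification: unknown events are skipped
        if marking[i] == 0:
            missing += 1                # create the missing token so replay can go on
            marking[i] += 1
        marking[i] -= 1
        consumed += 1
        marking[i + 1] += 1
        produced += 1

    # Final consumption from the sink place.
    if marking[-1] == 0:
        missing += 1
    else:
        marking[-1] -= 1
    consumed += 1
    remaining = sum(marking)
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


print(token_replay_fitness(NORMATIVE, NORMATIVE))  # 1.0
skipped_injection = [a for a in NORMATIVE if a != "tazrigh"]
print(round(token_replay_fitness(skipped_injection, NORMATIVE), 3))  # 0.857
```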
2.3.2. ProM (Process Mining Framework)
- Process Discovery (Alpha, Heuristic and Inductive Miner algorithms);
- Alignment-based Conformance Checking computing Fitness and Precision;
- Performance Analysis to reveal cycle-time and workload variations;
- Export of annotated Petri Nets and KPI tables;
- Direct interoperability with WoPeD via EPML and XES interfaces.
- Import and parse event logs extracted from the Hospital Information System;
- Discover the actual process and align it with the WoPeD normative model;
- Identify deviations and compute quantitative metrics (Fitness = 0.97; Precision = 1.00; Cycle Time; Workload);
- Provide numeric outputs that fed the Predictive Impact Model (PIM) layer developed in Python for scenario simulation [48].
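Outside ProM, the variant analysis described in Section 2.2 reduces to grouping cases by their exact, time-ordered activity sequence. A minimal sketch using `collections.Counter`; `variant_table` and the example traces are hypothetical.

```python
from collections import Counter


def variant_table(traces: dict[str, list[str]]) -> list[tuple[tuple[str, ...], int, float]]:
    """Return (variant, case_count, share_of_cases), most frequent first.

    Each trace must already be sorted by timestamp.
    """
    counts = Counter(tuple(sequence) for sequence in traces.values())
    total = sum(counts.values())
    return [(variant, n, n / total) for variant, n in counts.most_common()]


# Hypothetical traces (case_id -> ordered activity labels).
traces = {
    "c1": ["paziresh", "sandugh", "Shimi Darmani", "tazrigh"],
    "c2": ["paziresh", "sandugh", "Shimi Darmani", "tazrigh"],
    "c3": ["paziresh", "sandugh", "tazrigh", "sandugh", "tazrigh"],  # billing loop
}
for variant, n, share in variant_table(traces):
    print(f"{n} case(s) ({share:.0%}): {' -> '.join(variant)}")
```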
2.4. Predictive Impact Modeling Parameters and Simulation Algorithm
- Baseline KPI Extraction:
  - Compute Cycle Time for each case as the difference between the earliest and latest timestamps (Timestamp field), expressed in days.
  - Compute Workload as the total number of activities per case.
  - Baseline mean and standard deviation are calculated across all cases (n = 214):
    - Cycle Time = 5.46 ± 1.12 days
    - Workload = 8.07 ± 1.65 activity-units
- Normative Process Target Identification:
  - The normative BPMN model contains exactly seven sequential activities: paziresh (Reception), sandugh (Billing), Shimi Darmani (Chemotherapy), Parastar and Sabad Daru (Nurse and Medication Basket), tazrigh (Injection), parvande (Record Handling), and Etmam Darman (End of Treatment).
  - A “Full Adherence” case must include all seven normative activities, with no extra steps, in canonical order.
  - Among the 214 observed cases, 27 met these criteria and defined the Normative Target KPIs:
    - Normative Cycle Time = 4.80 days
    - Normative Workload = 7.00 activity-units
- Scenario Definition:
  - Current Optimization: Remove or mitigate top-frequency deviations (partial adherence), yielding a mean predicted improvement of 8% in Cycle Time and 6% in Workload.
  - Full Adherence: Enforce the normative sequence for all cases (full deviation elimination), yielding a predicted improvement of 12% in Cycle Time and 9% in Workload.
- Simulation Procedure:
  - For each scenario, the baseline event log is computationally “remodeled” to reflect the structural change:
    - Deviant activities removed or resequenced
    - Delay durations reduced according to scenario assumptions
  - KPI recomputation is performed directly on the modified timestamp series without probabilistic sampling (deterministic re-timing based on empirical delay distributions).
  - To account for potential variability in delay estimates, sensitivity analysis perturbs all scenario-level delay reductions by ±10%. Resulting KPI changes remain within ±1% absolute of the nominal predictions, confirming model stability (see Section 3.3.5).
- Statistical Evaluation:
  - Scenario outputs were paired to their corresponding baseline case results. Paired-sample t-tests confirmed that KPI reductions were statistically significant (p < 0.001) for both scenarios and both KPIs.
  - Cohen’s d effect sizes were substantial for Cycle Time (0.71 for Current Optimization; 1.05 for Full Adherence) and moderate-to-substantial for Workload (0.66; 0.92) (see Section 3.3.5).
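The KPI definitions and the Full Adherence rule above are specific enough to code directly; the remodeling step is not, so the sketch below adopts one explicit, hypothetical re-timing rule (described in the `remodel` docstring) controlled by a single scenario parameter, `delay_reduction`. The ±10% sensitivity check perturbs that parameter. All function names are illustrative, and the sketch is not expected to reproduce the study's 8% and 12% figures, which depend on the authors' own scenario assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Normative sequence from Section 2.4 (canonical order).
NORMATIVE = ["paziresh", "sandugh", "Shimi Darmani", "Parastar and Sabad Daru",
             "tazrigh", "parvande", "Etmam Darman"]

Trace = list[tuple[str, datetime]]  # (activity, timestamp), sorted by time


def cycle_time_days(trace: Trace) -> float:
    """Difference between the latest and earliest timestamps, in days."""
    return (trace[-1][1] - trace[0][1]).total_seconds() / 86400


def workload(trace: Trace) -> int:
    """Total number of activities recorded for the case."""
    return len(trace)


def baseline_summary(traces: list[Trace]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of both KPIs across cases."""
    ct = [cycle_time_days(t) for t in traces]
    wl = [workload(t) for t in traces]
    return {"cycle_time": (mean(ct), stdev(ct)), "workload": (mean(wl), stdev(wl))}


def is_full_adherence(trace: Trace) -> bool:
    """All seven normative activities, no extra steps, in canonical order."""
    return [activity for activity, _ in trace] == NORMATIVE


def remodel(trace: Trace, delay_reduction: float) -> Trace:
    """Hypothetical deterministic re-timing rule (illustration only).

    Keeps the first occurrence of each normative activity and drops everything
    else (extra steps and loop repetitions). The first retained event is placed
    at the case start; each later retained event advances the clock by the gap
    that preceded it in the original trace, scaled by (1 - delay_reduction).
    Gaps preceding removed events are discarded. Assumes at least one
    normative activity is present.
    """
    seen: set[str] = set()
    kept: Trace = []
    clock = prev_ts = trace[0][1]
    for activity, ts in trace:
        gap, prev_ts = ts - prev_ts, ts
        if activity not in NORMATIVE or activity in seen:
            continue  # deviant event: dropped together with its preceding gap
        seen.add(activity)
        if kept:
            clock += gap * (1 - delay_reduction)
        kept.append((activity, clock))
    return kept


def scenario_reduction(traces: list[Trace], delay_reduction: float) -> tuple[float, float]:
    """Percentage reduction (BV - SV) / BV * 100 for mean cycle time and mean workload."""
    remodeled = [remodel(t, delay_reduction) for t in traces]
    ct_bv, ct_sv = mean(map(cycle_time_days, traces)), mean(map(cycle_time_days, remodeled))
    wl_bv, wl_sv = mean(map(workload, traces)), mean(map(workload, remodeled))
    return (ct_bv - ct_sv) / ct_bv * 100, (wl_bv - wl_sv) / wl_bv * 100


def sensitivity(traces: list[Trace], delay_reduction: float, spread: float = 0.10):
    """Re-run the scenario with the delay reduction perturbed by -spread, 0 and +spread."""
    return {f: scenario_reduction(traces, delay_reduction * (1 + f)) for f in (-spread, 0.0, spread)}
```

For example, `scenario_reduction(traces, 0.0)` isolates the effect of removing deviant steps alone, while larger `delay_reduction` values add the effect of faster hand-offs between the remaining steps.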
3. Results
3.1. Conformance Metrics
- Fitness: A Fitness score of 0.97 indicates a very high degree of compliance. This metric measures the extent to which the traces in the event log can be replayed by the model. A score close to 1.0 signifies that the model accurately describes the observed behavior, with minimal unexplainable behavior in the log.
- Precision: A Precision score of 1.00 demonstrates perfect precision. This metric assesses whether the model only allows behaviors that are present in the log. A score of 1.00 means that for every possible behavior allowed by the normative model, there is a corresponding observed behavior in the event log. This suggests that the normative model does not generate any “phantom” behaviors not seen in reality.
- Backwards Precision: A Backwards Precision of 0.99 indicates exceptionally high backward precision. This metric applies the precision check in the reverse direction, assessing the behavior the model allows when traces are read from their end (suffixes) rather than from their start. A score of 0.99 suggests that, in this direction too, the normative model permits almost no behavior that is absent from the log.
- Balanced Precision: Averaging Precision and Backwards Precision, Balanced Precision was calculated as 0.995, further reinforcing the strong alignment between the observed and modeled processes.
- F1-score: The F1-score, the harmonic mean of Precision and Recall (with Fitness used as Recall), was 0.985. This provides a balanced measure of the model’s accuracy in capturing the observed behavior.
- Generalization: A Generalization score of 0.92 indicates that the normative model generalizes well to unseen cases. This means that the model is not overly specific to the observed training data and is likely to be applicable to future chemotherapy cases.
- Simplicity: A Simplicity score of 0.88 suggests that the normative model is reasonably straightforward and avoids unnecessary complexity, making it understandable and manageable.
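The composite scores above follow directly from their components. The short check below recomputes them from the reported values: the F1-score as the harmonic mean of Precision and Fitness, and Balanced Precision as the mean of Precision and Backwards Precision (the pairing that reproduces 0.995; averaging Fitness and Precision would give 0.985). The helper names are illustrative.

```python
def f1_score(precision: float, fitness: float) -> float:
    """Harmonic mean of precision and fitness (fitness used as recall)."""
    return 2 * precision * fitness / (precision + fitness)


def balanced_precision(precision: float, backwards_precision: float) -> float:
    """Arithmetic mean of forward and backwards precision."""
    return (precision + backwards_precision) / 2


fitness, precision, backwards_precision = 0.97, 1.00, 0.99  # values reported above
print(f"F1 = {f1_score(precision, fitness):.3f}")  # F1 = 0.985
print(f"Balanced Precision = {balanced_precision(precision, backwards_precision):.3f}")  # Balanced Precision = 0.995
```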
3.2. Variants and Deviations
- 15 distinct process variants were identified from the event logs. This indicates that while the core process remains consistent, there are multiple acceptable or observed ways in which the chemotherapy workflow can be executed.
- Approximately 12% of the total cases showed deviations from the main normative path. These deviations were not random but clustered around specific types of alterations:
  - Skipped approval steps: Certain mandatory approval stages (e.g., physician re-approval after pharmacy preparation for a change in medication) were found to be bypassed in some cases.
  - Resequenced events: The order of certain non-critical activities was observed to be altered, potentially due to scheduling constraints or resource availability.
- A significant subset of these deviating cases, specifically 6% of all cases, involved cycles or rework. This implies that certain activities had to be repeated or re-executed. Analysis of these rework loops indicated an average delay of 2.3 h per case directly attributable to these cyclical activities.
- Red branches (Skipped Approvals): illustrate cases where critical validation steps were bypassed. For example, the event log shows some traces jumping directly from Chemotherapy _event to Record Handling _event, omitting physician re-approval after pharmacy preparation, a deviation linked to expedited but uncontrolled medication changes.
- Orange branches (Resequenced Events): capture scenarios in which the sequence of non-critical steps shifted. A common example observed in logs is Nurse and Medication Basket _event occurring before Record Handling _event, driven by staff reallocation or inventory timing constraints.
- Yellow branches (Loops/Rework): represent repeated activity cycles, often returning to an earlier stage. For instance, some traces record a return from Injection _event back to Billing _event due to billing corrections, contributing to the documented 2.3 h average delay in loop cases (6% of total).
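Loops such as the return from Injection to Billing can be flagged by checking whether any activity reappears within a trace. Attributing delay to a loop requires a rule; the sketch below assumes (this is not necessarily how the 2.3 h figure was obtained) that the gap leading into each repeated event counts as rework time. Function names and traces are hypothetical.

```python
from datetime import datetime, timedelta

Trace = list[tuple[str, datetime]]  # (activity, timestamp), sorted by time


def has_rework(trace: Trace) -> bool:
    """True if any activity occurs more than once in the case."""
    activities = [activity for activity, _ in trace]
    return len(set(activities)) < len(activities)


def rework_delay(trace: Trace) -> timedelta:
    """Sum of the gaps that lead into repeated (rework) events."""
    seen = {trace[0][0]}
    delay = timedelta(0)
    for (_, prev_ts), (activity, ts) in zip(trace, trace[1:]):
        if activity in seen:
            delay += ts - prev_ts  # time spent getting back to an earlier step
        seen.add(activity)
    return delay


def loop_summary(traces: dict[str, Trace]) -> tuple[float, float]:
    """Share of cases with rework, and mean rework delay in hours among those cases."""
    looped = [rework_delay(t) for t in traces.values() if has_rework(t)]
    share = len(looped) / len(traces)
    mean_h = (sum(looped, timedelta(0)) / len(looped)).total_seconds() / 3600 if looped else 0.0
    return share, mean_h


t0 = datetime(2023, 5, 1, 8, 0)
linear = [("sandugh", t0), ("tazrigh", t0 + timedelta(hours=1)), ("parvande", t0 + timedelta(hours=2))]
billing_loop = [("sandugh", t0), ("tazrigh", t0 + timedelta(hours=1)),
                ("sandugh", t0 + timedelta(hours=3)), ("tazrigh", t0 + timedelta(hours=4)),
                ("parvande", t0 + timedelta(hours=5))]
print(loop_summary({"c1": linear, "c2": billing_loop}))  # (0.5, 3.0)
```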
3.3. Impact Estimation
3.3.1. KPI Reduction Formula
$$\text{KPI Reduction (\%)} = \frac{BV - SV}{BV} \times 100$$
where:
- Baseline Value (BV) refers to the average KPI in the actual, current workflow execution.
- Scenario Value (SV) refers to the average KPI in the improved scenario (either Current Optimization or Predictive Model).
- A positive percentage indicates a reduction in time or workload; a negative value would indicate deterioration.
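As an arithmetic check of the KPI reduction, (BV − SV) / BV × 100, applied to the baseline and normative-target mean cycle times from Section 2.4: (5.46 − 4.80) / 5.46 × 100 ≈ 12.1%, in line with the 12% cycle-time projection for Full Adherence. The helper below is a minimal sketch; its name is hypothetical and it is not the authors' kpi_validation.py.

```python
def kpi_reduction(baseline_value: float, scenario_value: float) -> float:
    """Percentage reduction from BV to SV; positive = improvement, negative = deterioration."""
    if baseline_value == 0:
        raise ValueError("baseline value must be non-zero")
    return (baseline_value - scenario_value) / baseline_value * 100


# Baseline vs. normative-target mean cycle time (days), from Section 2.4.
print(f"{kpi_reduction(5.46, 4.80):.1f}%")  # 12.1%
```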
3.3.2. Statistical Significance Testing (Paired-Sample t-Test)
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where:
- $\bar{d}$ is the mean difference between paired baseline and scenario values.
- $s_d$ is the standard deviation of differences.
- $n$ is the number of paired observations.
- If p < 0.05, the observed reduction is considered statistically significant.
- If p ≥ 0.05, the reduction is not statistically significant.
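A stdlib-only sketch of the paired statistic t = d̄ / (s_d / √n), plus Cohen's d_z = d̄ / s_d as an effect size. d_z is only one of several paired effect-size conventions and the paper does not state which one it used; the p-value additionally needs the Student t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which is omitted here. The function name and the sample values are invented.

```python
import math
from statistics import mean, stdev


def paired_t(baseline: list[float], scenario: list[float]) -> tuple[float, float, int]:
    """Return (t, Cohen's d_z, n) for paired samples, with differences = baseline - scenario."""
    if len(baseline) != len(scenario) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - s for b, s in zip(baseline, scenario)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                  # sample SD, n - 1 denominator
    t = d_bar / (s_d / math.sqrt(n))    # t statistic with n - 1 degrees of freedom
    return t, d_bar / s_d, n


# Invented per-case cycle times (days): baseline vs. scenario.
base = [5.0, 6.2, 4.8, 5.9, 5.5]
scen = [4.7, 5.6, 4.6, 5.4, 5.1]
t, dz, n = paired_t(base, scen)
print(f"t = {t:.2f}, d_z = {dz:.2f}, df = {n - 1}")
```

Because t = d_z·√n, the same per-case effect yields a larger t as the number of paired cases grows; identical differences for every case (s_d = 0) would make the statistic undefined.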
3.3.3. Results for Current Optimization and Predictive Model
- Current Optimization Simulation: A simulation was run to estimate the potential gains if the identified inefficiencies were addressed immediately. This simulation suggested that addressing the most prominent bottlenecks and rework loops could lead to:
  - 8% reduction in patient processing time.
  - 6% reduction in staff workload.
- Predictive Modeling for Full Adherence: A more ambitious scenario was modeled, assuming complete adherence to the normative workflow. This predictive scenario, which represents the theoretical maximum improvement if all deviations were eliminated, projected even more substantial gains:
  - Potential 12% reduction in total processing time.
  - Potential 9% reduction in staff workload.
3.3.4. Interpretation
- Cycle Time:
  - Current Optimization simulations predicted an 8% reduction, statistically significant (p < 0.001).
  - Predictive Model simulations forecast a 12% reduction, also highly significant, indicating substantial potential for time savings if complete adherence to the normative workflow is achieved.
- Workload:
  - Current Optimization yielded a 6% reduction in staff effort units, significant at p < 0.001.
  - Predictive Model produced a 9% reduction, suggesting even greater efficiency through proactive deviation mitigation.
3.3.5. Sensitivity and Statistical Significance Analysis
(a) Sensitivity Analysis
(b) Statistical Significance Testing
3.3.6. Consequences and Future KPI Perspectives
- Inter-department latency (delay between physician and pharmacy timestamps): captures coordination efficiency;
- Queue turnover rate: links patient arrival patterns with resource utilization, enabling stochastic PIM calibration;
- Deviation recurrence probability: derived via Markov frequency modeling using long-range event sequences;
- Resource overlap index: computed from staff allocation logs to quantify concurrent workload tension;
- Patient stability index: integrating clinical, temporal, and logistic data from broader oncology datasets for risk-aware forecasting.
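Of these indicators, the deviation recurrence probability can already be estimated from the current log. One possible reading of "Markov frequency modeling" (an assumption) is a first-order transition matrix estimated from directly-follows counts, where the recurrence probability of a loop is the estimated probability of its back-edge, such as tazrigh → sandugh for the billing-correction loop. The function name and traces are invented.

```python
from collections import Counter, defaultdict


def transition_probabilities(traces: list[list[str]]) -> dict[str, dict[str, float]]:
    """First-order Markov estimate of P(next | current) from directly-follows counts."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


traces = [
    ["paziresh", "sandugh", "tazrigh", "parvande"],
    ["paziresh", "sandugh", "tazrigh", "sandugh", "tazrigh", "parvande"],
    ["paziresh", "sandugh", "tazrigh", "parvande"],
]
P = transition_probabilities(traces)
print(P["tazrigh"])  # {'parvande': 0.75, 'sandugh': 0.25}
```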
3.3.7. Reproducibility Note
- Reading input data in CSV format.
- Applying the KPI reduction formula for both Current Optimization and Predictive Model scenarios.
- Running paired-sample t-tests to assess statistical significance.
- Analysis scripts (kpi_validation.py) and
- An anonymized synthetic sample of the input CSV file together with the corresponding synthetic output tables
3.4. Performance Comparison and Statistical Validation
4. Discussion
- Current optimization scenario: Cycle Time reduced by 8.00% (t = 4.17, p = 0.00012) and Workload reduced by 6.00% (t = 3.84, p = 0.00025).
- Predictive full-adherence scenario: Cycle Time reduced by 12.00% and Workload by 9.00% (p < 0.001 for both), representing the theoretical maximum achievable if all major deviations are eliminated.
- Application of the full PM2 methodology in a real oncology setting, ensuring methodological rigor and reproducibility.
- Integration of a predictive modeling layer, enabling proactive scenario planning with low-cost simulation prior to change implementation.
- Focus on an Iranian oncology center, filling a regional research gap and adding locally relevant, actionable insights.
4.1. Root-Cause Interpretation and Strategic Predictive Insights
- Frequency: percentage occurrence of each deviation within the total case population.
- Impact: average proportional increase in cycle time caused by that deviation, as estimated from scenario-testing results.
- High (≥40): frequent, high-impact deviations demanding immediate corrective measures.
- Medium–High (25–39): moderate-impact issues with recurring presence, suitable for targeted remediation.
- Medium (10–24): procedural imperfections manageable through training and localized supervision.
- Low (<10): rare or minor events to be tracked through routine monitoring.
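Once each deviation has a priority score, the four bands above can be assigned mechanically. This section does not restate how Frequency and Impact are combined into that score, so the sketch takes the score as an input; the thresholds are those listed above, and `priority_band` and the example scores are hypothetical.

```python
def priority_band(score: float) -> str:
    """Map a deviation priority score to the bands defined above.

    Non-integer scores falling between the published integer ranges
    (e.g. 39.5) are assigned to the lower band.
    """
    if score >= 40:
        return "High"
    if score >= 25:
        return "Medium–High"
    if score >= 10:
        return "Medium"
    return "Low"


# Hypothetical scores for illustration.
for deviation, score in {"rework_loop": 42.0, "skipped_approval": 27.5, "resequenced_events": 12.0}.items():
    print(f"{deviation}: {priority_band(score)}")
```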
4.2. Predictive Impact Layer Versus AI-Based Prediction
4.3. Comparative Performance and Analytical Discussion
- Comparative perspective with recent process mining studies
- Analytical insights and managerial implications
- Reproducibility and evidence credibility
4.4. Visual Summary of Predictive Improvement Scenarios
- Initial compliance interventions yield the largest measurable benefits in both time and workload.
- Beyond the partial optimization level, further performance gains require deeper structural or technological redesigns rather than procedural enforcement alone.
5. Conclusions and Future Work
- Longitudinal and Multi-Center Validation: Testing PIM across larger oncology datasets and multiple hospitals to evaluate generalizability and inter-institutional comparability.
- Hybrid AI Integration: Combining adaptive learning of deviation patterns from ML engines with PIM’s interpretable forecasting structure to form a self-optimizing predictive loop.
- Real-Time Embedding: Integrating the PIM module into active process mining dashboards for continuous monitoring and rapid deviation alerts.
- Expanded KPI Spectrum: Incorporating outcome-linked indicators such as adherence rate, patient satisfaction, and complication index to connect operational efficiency with clinical results.
- Policy and Governance Use: Employing process simulations as evidence in clinical governance and strategic resource planning, ensuring that improvement actions are traceable and justifiable under data-driven protocols.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Roock, E.D.; Martin, N. Process mining in healthcare—An updated perspective on the state of the art. J. Biomed. Inform. 2022, 127, 103995. [Google Scholar] [CrossRef]
- Sharma, A.; Jasrotia, S.; Kumar, A. Effects of Chemotherapy on the Immune System: Implications for Cancer Treatment and Patient Outcomes. Naunyn-Schmiedeberg’s Arch. Pharmacol. 2023, 397, 2551–2566. [Google Scholar] [CrossRef]
- Mirmozaffari, M.; Zandieh, M.; Seyed Mojtaba, H. A cloud theory-based simulated annealing for discovering process model from event logs. In Proceedings of the 10th International Conference on Innovations in Science, Engineering, Computers and Technology (ISECT-2017), Dubai, United Arab Emirates, 17 October 2017; pp. 70–75. [Google Scholar]
- Ungvari, Z.; Fekete, M.; Buda, A.; Lehoczki, A.; Munkácsy, G.; Scaffidi, P.; Bonaldi, T.; Fekete, J.T.; Bianchini, G.; Varga, P.; et al. Quantifying the impact of treatment delays on breast cancer survival outcomes: A comprehensive meta-analysis. GeroScience 2025, 1, 1–5. [Google Scholar] [CrossRef] [PubMed]
- Gualandi, R.; Masella, C.; Viglione, D.; Tartaglini, D. Exploring the hospital patient journey: What does the patient experience? PLoS ONE 2019, 14, e0224899. [Google Scholar] [CrossRef]
- Aalst, W.V.D. Process Mining: Data Science in Action; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
- Salehi, M.; Khayami, R.; Mirmozaffari, M. An integrated fuzzy Delphi-DEMATEL and analytic network process for sustainable operations and evaluations in process mining. Int. J. Manag. Decis. Mak. 2025, 24, 583–613. [Google Scholar] [CrossRef]
- Guzzo, A.; Rullo, A.; Vocaturo, E. Process mining applications in the healthcare domain: A comprehensive review. WIREs 2021, 12, e1442. [Google Scholar] [CrossRef]
- Brzychczy, E.; Aleknonytė-Resch, M.; Janssen, D.; Koschmider, A. Process mining on sensor data: A review of related works. Knowl. Inf. Syst. 2025, 67, 4915–4948. [Google Scholar] [CrossRef]
- Cecconi, A.; Giacomo, G.D.; Di Ciccio, C.; Maria, F. Measuring the interestingness of temporal logic behavioral specifications in process mining. Inf. Syst. 2022, 102, 101920. [Google Scholar] [CrossRef]
- Savino, M.; Chiloiro, G.; Masciocchi, C.; Capocchiano, N.D.; Lenkowicz, J. A process mining approach for clinical guidelines compliance: Real-world application in rectal cancer. Front. Oncol. 2023, 13, 1090076. [Google Scholar] [CrossRef] [PubMed]
- Rebuge, Á.; Ferreira, D.R. Business process analysis in healthcare environments: A methodology based on process mining. Inf. Syst. 2012, 37, 99–116. [Google Scholar] [CrossRef]
- Llatas, F.; Lizondo, A.; Sánchez, M. Process mining methodology for health process tracking using real-time indoor location systems. Sensors 2015, 12, 29821–29840. [Google Scholar] [CrossRef]
- Rojas, E.; Munoz-Gama, J.; Sepúlveda, M. Process mining in healthcare: A literature review. Biomed. Inf. 2016, 2, 224–236. [Google Scholar] [CrossRef] [PubMed]
- Santos, A.; Lapasini Leal, G.C.; Balancieri, R. Process mining in healthcare: A tertiary study. BMC Med. Inform. Decis. Mak. 2025, 25, 306. [Google Scholar] [CrossRef]
- Aversano, L.; Iammarino, M.; Madau, A. Process mining applications in healthcare: A systematic literature review. PeerJ Comput. Sci. 2025, 11, e2613. [Google Scholar] [CrossRef]
- Mansur, S.S.; Kurniati, A.P.; Rojas, E. Process Mining to Improve Clinical Pathways in Breast Cancer Treatment Using the Indonesia Health Insurance Dataset. In Proceedings of the 12th International Conference of Information and Communication Technology (ICoICT), Bandung, Indonesia, 7–8 August 2024. [Google Scholar]
- Kurniati, A.; Hall, G.; Hogg, D.C. Process mining in oncology using the MIMIC-III dataset. J. Phys. Conf. Ser. 2018, 971, 012008. [Google Scholar] [CrossRef]
- Atighehchian, A.; Ajami, S.; Alidadi, T. Overcoming the Bottlenecks in the Health System by Using Process Mining Approach. Health Manag. Inf. Sci. J. (HMIS) 2022, 9, 123–124. [Google Scholar]
- Alinezhad, A.; Mirmozaffari, M. Malmquist Productivity Index Using Two-Stage DEA Model in Heart Hospitals. Iran. J. Optim. 2018, 10, 81–92. [Google Scholar]
- Rosa, A.; Massaro, A. Process Mining Organization (PMO) Based on Machine Learning Decision Making for Prevention of Chronic Diseases. Eng 2024, 5, 282–300. [Google Scholar] [CrossRef]
- Samara, M.N.; Harry, K.D. Leveraging Kaizen with Process Mining in Healthcare Settings: A Conceptual Framework for Data-Driven Continuous Improvement. Healthcare 2025, 13, 941. [Google Scholar] [CrossRef]
- Teinemaa, I.; Dumas, M.; La Rosa, M. Outcome-Oriented Predictive Process Monitoring: Review and Benchmark. Artif. Intell. 2018, 13, 1–57. [Google Scholar] [CrossRef]
- Kratsch, W.; Manderscheid, J.; Roeglinger, M. Machine Learning in Business Process Monitoring: A Comparison of Deep Learning and Classical Approaches Used for Outcome Prediction. Bus. Inf. Syst. Eng. 2021, 64, 261–276. [Google Scholar] [CrossRef]
- Madau, A.; Semeraro, G. Process Mining and Machine Learning for Predicting Clinical Outcomes in Emergency Care: A Study on the MIMICEL Dataset. In Proceedings of the 14th International Conference on Data Science, Technology and Applications, Special Session on Data-Driven Models for Digital Health Transformation, Bilbao, Spain, 10–12 June 2025; pp. 791–799. [Google Scholar]
- Winter, M.; Langguth, B.; Schlee, W.; Pryss, R. Process mining in mHealth data analysis. Npj Digit. Med. 2024, 7, 294. [Google Scholar] [CrossRef] [PubMed]
- Delgado, A.; Calegari, D.; Espino, C. Predictive process monitoring for collaborative business processes: Concepts and application. Discov. Anal. 2025, 3, 5. [Google Scholar] [CrossRef]
- Lamine, E.; Fontanili, F.; Mascolo, M.D. Improving the Management of an Emergency Call Service by Combining Process Mining and Discrete Event Simulation Approaches. In Proceedings of the 16th Working Conference on Virtual Enterprises (PROVE), Risks and Resilience of Collaborative Networks, Albi, France, 5–7 October 2015. [Google Scholar]
- Jadrić, M.; Pasalic, I.N.; Ćukušić, M. Process Mining Contributions to Discrete-event Simulation Modelling. Bus. Syst. Res. J. 2020, 11, 51–72. [Google Scholar] [CrossRef]
- Cunzolo, M.D.; Guastalla, A.; Aringhieri, R.; Sulis, E.; Amantea, I.A. Combining Process Mining and Optimization: A Scheduling Application in Healthcare. Bus. Process Manag. Workshops 2023, 460, 197–209. [Google Scholar]
- Salas, E.; Arias, M.; Aguirre, S.; Rojas, E. Combining Process Mining and Process Simulation in Healthcare: A Literature Review. IEEE Access 2024, 12, 1–19. [Google Scholar] [CrossRef]
- Ronzani, M.; Sulis, E. Improve Hospital Management Through Process Mining, Optimization, and Simulation: The CH4I-PM Project. KI Künstliche Intell. 2024, 39, 167–172. [Google Scholar] [CrossRef]
- Alharbi, A.; Bulpitt, A.; Johnson, O. Improving Pattern Detection in Healthcare Process Mining Using an Interval-Based Event Selection Method. Bus. Process Manag. Forum 2017, 297, 88–105. [Google Scholar]
- van Eck, M.L.; Lu, X.; Sander, J.J.L.; van der Aalst, M.P. PM2: A Process Mining Project Methodology. Adv. Inf. Syst. Eng. 2015, 9097, 297–313. [Google Scholar]
- Boer, T.R.D.; Arntzen, R.J.; Bekker, R.; Buurman, B.M. Process Mining on National Health Care Data for the Discovery of Patient Journeys of Older Adults. J. Am. Med. Dir. Assoc. 2025, 26, 105333. [Google Scholar] [CrossRef]
- Wicky, A.; Gatta, R.; Latifyan, S.; Micheli, R.D. Interactive process mining of cancer treatment sequences with melanoma real-world data. Frontiers 2023, 13, 1043683. [Google Scholar] [CrossRef]
- Heiskanen, M.A.; Aittokallio, T. Mining high-throughput screens for cancer drug targets—Lessons from yeast chemical-genomic profiling and synthetic lethality. WIREs 2012, 2, 263–272.
- Aalst, W.M.P. PM2: A Process Mining Project Methodology. In Process Mining Workbook; Springer: Eindhoven, The Netherlands, 2022.
- Santos, A.F.D.; Rocha Loures, E.D.F.; Portela Santos, E.A. Application of the PM2 methodology in the analysis of assembly processes. Procedia CIRP 2025, 132, 159–164.
- Dhingra, L.S.; Shen, M.; Mangla, A.; Khera, R. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am. J. Cardiol. 2023, 203, 136–148.
- Mirmozaffari, M.; Alinezhad, A. Ranking of Heart Hospitals Using cross-efficiency and two-stage DEA. In Proceedings of the 2017 7th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 26 October 2017; pp. 217–222.
- Freytag, T.; Sänger, M. WoPeD—An Educational Tool for Workflow Nets. In Proceedings of the BPM Demo Sessions 2014, Eindhoven, The Netherlands, 20 September 2014; Volume 2014.
- Wynn, M.T.; Aalst, W.V.D.; Verbeek, E.; Stefano, B.D. The IEEE XES Standard for Process Mining: Experiences, Adoption, and Revision. IEEE Comput. Intell. Mag. 2024, 19, 20–23.
- Van der Aalst, W.; Adriansyah, A.; Van Dongen, B. Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 182–192.
- Dumas, M.; La Rosa, M.; Mendling, J.; Reijers, H.A. Fundamentals of Business Process Management, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2018.
- DuLiyuan, G.; Zhou, H.M. Variance Analysis and Handling of Clinical Pathway: An Overview of the State of Knowledge. IEEE Access 2020, 99, 1.
- Riehmann, P.; Hanfler, M.; Froehlich, B. Interactive Sankey Diagrams. In Proceedings of the IEEE Symposium on Information Visualization, Minneapolis, MN, USA, 23–25 October 2005.
- McKinney, W. Python for Data Analysis, 3rd ed.; O’Reilly: Sebastopol, CA, USA, 2022.
- Benneyan, J. Statistical quality control methods in infection control and hospital epidemiology, part I: Introduction and basic theory. Infect. Control Hosp. Epidemiol. 1998, 19, 194–214.
- Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, M.; Tarantola, S. Global Sensitivity Analysis: The Primer; Wiley: Hoboken, NJ, USA, 2008.
- Ruxton, G.D. The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test. Behav. Ecol. 2006, 17, 688–690.
- Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Routledge: New York, NY, USA, 2013.
- Mirmozaffari, M.; Shadkam, E.; Khalili, S.M.; Yazdani, M. Developing a novel integrated generalised data envelopment analysis (DEA) to evaluate hospitals providing stroke care services. Bioengineering 2021, 8, 207.
- Mirmozaffari, M.; Yazdani, R.; Shadkam, E.; Khalili, S.M.; Tavassoli, L.S.; Boskabadi, A. A novel hybrid parametric and non-parametric optimisation model for average technical efficiency assessment in public hospitals during and post-COVID-19 pandemic. Bioengineering 2021, 9, 7.
- Peykani, P.; Memar-Masjed, E.; Arabjazi, N.; Mirmozaffari, M. Dynamic performance assessment of hospitals by applying credibility-based fuzzy window data envelopment analysis. Healthcare 2022, 10, 876.
- Mirmozaffari, M.; Yazdani, R.; Shadkam, E.; Khalili, S.M.; Mahjoob, M.; Boskabadi, A. An integrated artificial intelligence model for efficiency assessment in pharmaceutical companies during the COVID-19 pandemic. Sustain. Oper. Comput. 2022, 3, 156–167.
- Rahimi, A.; Hejazi, S.M.; Zandieh, M.; Mirmozaffari, M. A novel hybrid simulated annealing for no-wait open-shop surgical case scheduling problems. Appl. Syst. Innov. 2023, 6, 15.
- Hejazi, S.M.; Zandieh, M.; Mirmozaffari, M. A multi-objective medical process mining model using event log and causal matrix. Healthc. Anal. 2023, 3, 100188.
- Mirmozaffari, M.; Kamal, N. A data envelopment analysis model for optimizing transfer time of ischemic stroke patients under endovascular thrombectomy. Healthc. Anal. 2024, 6, 100364.
- Mirmozaffari, M.; Kamal, N. The application of data envelopment analysis to emergency departments and management of emergency conditions: A narrative review. Healthcare 2023, 11, 2541.







| Field Name | Type | Description |
|---|---|---|
| GlobalReceptionID | Numeric (18.0) | Unique patient record number |
| ReceptionID | Numeric (18.0) | Admission identifier |
| DocumentCode | NVarchar (50) | Medical file number |
| ParaclinicChildID | Numeric (18.0) | Paraclinic code |
| ReceptionDate | NVarchar (50) | Admission date |
| ReceptionTime | NVarchar (50) | Admission time |
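
Because the HIS stores the admission date and time as two separate NVarchar strings, each record needs a combined timestamp before it can serve as an event in a log. The minimal sketch below assumes ISO-style strings (for example `2023-05-14` and `08:30`), treats `GlobalReceptionID` as the case identifier, and reads activity labels from a hypothetical `Activity` column that does not appear in the schema above; the sample rows are invented. If the HIS stores dates in another format or calendar, `DATE_TIME_FORMAT` and the parsing step would need to change, and `ReceptionID` (the admission identifier) is an equally plausible case notion.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical HIS rows. Field names follow the schema table, except "Activity",
# which is an assumed column holding the activity label of each record.
HIS_ROWS = [
    {"GlobalReceptionID": 1001, "ReceptionDate": "2023-05-14", "ReceptionTime": "10:05", "Activity": "Chemotherapy _event"},
    {"GlobalReceptionID": 1001, "ReceptionDate": "2023-05-14", "ReceptionTime": "08:30", "Activity": "Admission event"},
    {"GlobalReceptionID": 1002, "ReceptionDate": "2023-05-14", "ReceptionTime": "09:10", "Activity": "Admission event"},
]

# Assumed string layout of the date and time fields.
DATE_TIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_timestamp(date_str: str, time_str: str) -> datetime:
    """Combine the separate NVarchar date and time fields into one datetime."""
    return datetime.strptime(f"{date_str.strip()} {time_str.strip()}", DATE_TIME_FORMAT)


def build_traces(rows):
    """Group records by case id and sort each trace chronologically."""
    traces = defaultdict(list)
    for row in rows:
        ts = parse_timestamp(row["ReceptionDate"], row["ReceptionTime"])
        traces[str(row["GlobalReceptionID"])].append((ts, row["Activity"]))
    return {case: sorted(events) for case, events in traces.items()}


if __name__ == "__main__":
    for case_id, events in build_traces(HIS_ROWS).items():
        print(case_id, [activity for _, activity in events])
```

Sorting inside each trace matters because HIS exports are not guaranteed to list a patient's records in execution order; in the sample, case `1001` is stored out of order and comes back with the admission first.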

| PM2 Phase | Main Tool(s) | Role in This Study |
|---|---|---|
| Data Extraction and Preparation | Python 3.13.0 (NumPy, Pandas) | Conversion of HIS records into IEEE-XES event logs, filtering and log structuring. |
| Normative Modeling | WoPeD (v 3.6) | Creation of BPMN/Petri-Net model defining the ideal chemotherapy workflow. |
| Process Discovery and Conformance Checking | ProM (v 6.12) | Detection of real execution variants, quantification of deviations, and KPI calculation. |
| Predictive Impact Modeling (PIM) | Python + ProM outputs | Quantitative simulation of Cycle Time and Workload reductions under “what-if” scenarios. |
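
The data-preparation row above converts HIS records into IEEE-XES event logs. As a rough illustration only, the following sketch serializes already-ordered traces with the Python standard library, using the XES `concept:name` and `time:timestamp` attribute keys. The trace content is hypothetical, and a complete XES file would normally also declare the Concept and Time extensions and global attributes, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical traces: case id -> chronologically ordered (timestamp, activity) pairs.
TRACES = {
    "1001": [
        (datetime(2023, 5, 14, 8, 30), "Admission event"),
        (datetime(2023, 5, 14, 10, 5), "Chemotherapy _event"),
    ],
}


def traces_to_xes(traces) -> ET.ElementTree:
    """Build a minimal XES tree: one <trace> per case, one <event> per record."""
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        # The trace name carries the case identifier.
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for ts, activity in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            # xs:dateTime-compatible ISO 8601 timestamp.
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts.isoformat()})
    return ET.ElementTree(log)


if __name__ == "__main__":
    tree = traces_to_xes(TRACES)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Writing the result to disk with `tree.write("log.xes", encoding="utf-8", xml_declaration=True)` would produce a file that process-mining tools can attempt to import; the file name here is arbitrary.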

| Rank | Activity | Count | Percentage |
|---|---|---|---|
| 1 | Admission event | 215 | 17.145% |
| 2 | Treatment Completion _event | 194 | 15.470% |
| 3 | Billing _event | 188 | 14.992% |
| 4 | Injection _event | 181 | 14.434% |
| 5 | Chemotherapy _event | 176 | 14.035% |
| 6 | Record Handling _event | 154 | 12.281% |
| 7 | Nurse and Medication Basket _event | 146 | 11.643% |
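
The percentages in the table are consistent with dividing each activity's count by the total of the seven listed counts (1,254 events), so the column sums to 100%. A short sketch that reproduces the column from the counts:

```python
# Activity counts as reported in the activity-frequency table.
ACTIVITY_COUNTS = {
    "Admission event": 215,
    "Treatment Completion _event": 194,
    "Billing _event": 188,
    "Injection _event": 181,
    "Chemotherapy _event": 176,
    "Record Handling _event": 154,
    "Nurse and Medication Basket _event": 146,
}


def relative_frequencies(counts):
    """Return (activity, share of all events in percent), highest share first."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("total events:", sum(ACTIVITY_COUNTS.values()))
    for rank, (name, pct) in enumerate(relative_frequencies(ACTIVITY_COUNTS), start=1):
        print(f"{rank} {name}: {pct:.3f}%")
```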

| KPI | Current Optimization Simulation | Predictive Model (Full Adherence) |
|---|---|---|
| Patient Processing Time Reduction | 8% | 12% |
| Staff Workload Reduction | 6% | 9% |
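
Both KPIs are reported as relative reductions against the baseline, that is, (baseline − scenario) / baseline × 100. The sketch below shows that calculation together with one common way to obtain a per-case cycle time from an event log: the span between a trace's first and last timestamps. The paper does not spell out its cycle-time definition, so this is an assumption, and the sample traces are hypothetical.

```python
from datetime import datetime
from statistics import mean


def cycle_time_hours(events):
    """Cycle time of one trace: hours between its earliest and latest timestamps."""
    timestamps = [ts for ts, _ in events]
    return (max(timestamps) - min(timestamps)).total_seconds() / 3600.0


def percent_reduction(baseline: float, scenario: float) -> float:
    """Relative reduction of a KPI under a scenario, in percent of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - scenario) / baseline


if __name__ == "__main__":
    # Hypothetical traces as (timestamp, activity) pairs, for illustration only.
    baseline_traces = [
        [(datetime(2023, 5, 14, 8, 0), "Admission event"), (datetime(2023, 5, 15, 8, 0), "Treatment Completion _event")],
        [(datetime(2023, 5, 16, 9, 0), "Admission event"), (datetime(2023, 5, 17, 11, 0), "Treatment Completion _event")],
    ]
    mean_baseline = mean(cycle_time_hours(t) for t in baseline_traces)
    print(f"mean baseline cycle time: {mean_baseline:.1f} h")
    print(f"reduction if the mean fell to 23.0 h: {percent_reduction(mean_baseline, 23.0):.1f}%")
```

As a consistency check, the workload means in the statistical-test table (100.0 at baseline, 94.0 under Current Optimization) give `percent_reduction(100.0, 94.0) == 6.0`, matching the 6% reported here.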

| Scenario | Parameter Change | Predicted Cycle Time Reduction (%) | Predicted Workload Reduction (%) |
|---|---|---|---|
| Current Optimization | Baseline | 8.0 | 6.0 |
| Current Optimization | Delay −10% | 7.3 | 5.6 |
| Current Optimization | Delay +10% | 8.6 | 6.3 |
| Full Adherence | Baseline | 12.0 | 9.0 |
| Full Adherence | Delay −10% | 11.2 | 8.5 |
| Full Adherence | Delay +10% | 12.7 | 9.4 |
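
The ±10% delay perturbation is a one-at-a-time sensitivity check. One compact way to summarize it, not used in the paper itself, is a central-difference elasticity: (R₊ − R₋) / R₀ divided by 2 × 0.10, where R is the predicted reduction. Applied to the table, all four elasticities fall between 0.5 and about 0.8, meaning that under this summary the predicted gains move less than proportionally with the assumed delay.

```python
# Predicted reductions (%) from the sensitivity table:
# (delay -10%, baseline, delay +10%) for each scenario/KPI pair.
SENSITIVITY = {
    ("Current Optimization", "Cycle Time"): (7.3, 8.0, 8.6),
    ("Current Optimization", "Workload"): (5.6, 6.0, 6.3),
    ("Full Adherence", "Cycle Time"): (11.2, 12.0, 12.7),
    ("Full Adherence", "Workload"): (8.5, 9.0, 9.4),
}
DELTA = 0.10  # relative perturbation applied to the delay parameter


def central_elasticity(low, base, high, delta=DELTA):
    """Relative change in the output per relative change in the input (central difference)."""
    return ((high - low) / base) / (2 * delta)


if __name__ == "__main__":
    for (scenario, kpi), (low, base, high) in SENSITIVITY.items():
        print(f"{scenario:20s} {kpi:10s} elasticity = {central_elasticity(low, base, high):.2f}")
```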

| KPI | Scenario | Mean (Baseline) | Mean (Scenario) | Test | t Value | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|---|
| Cycle Time (h) | Current Opt. | 24.00 ± 3.50 | 22.05 ± 3.45 | Paired t-test | 5.85 | <0.001 | 0.40 |
| Cycle Time (h) | Full Adherence | 24.00 ± 3.50 | 21.05 ± 3.40 | Paired t-test | 7.12 | <0.001 | 0.50 |
| Workload (%) | Current Opt. | 100.0 ± 4.5 | 94.0 ± 4.4 | Paired t-test | 6.48 | <0.001 | 0.44 |
| Workload (%) | Full Adherence | 100.0 ± 4.5 | 91.0 ± 4.3 | Paired t-test | 8.20 | <0.001 | 0.52 |
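
The table reports paired t-tests of per-case baseline versus scenario values. The sketch below computes the paired t statistic and one paired effect size, Cohen's d_z (mean difference divided by the standard deviation of the differences), from matched samples. The paper does not state which variant of Cohen's d it reports, and a pooled-standard-deviation d gives different values, so treat this as an illustration with made-up numbers. A p-value additionally requires the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(baseline, scenario):
    """Return the paired t statistic and Cohen's d_z for matched per-case values."""
    if len(baseline) != len(scenario) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least two cases")
    diffs = [b - s for b, s in zip(baseline, scenario)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    d_z = mean_diff / sd_diff
    t_stat = d_z * math.sqrt(len(diffs))  # same as mean_diff / (sd_diff / sqrt(n))
    return t_stat, d_z


if __name__ == "__main__":
    # Illustrative cycle times in hours for four cases; not the study's data.
    baseline = [25.0, 23.0, 27.0, 22.0]
    scenario = [24.0, 21.0, 24.0, 20.0]
    t_stat, d_z = paired_t_and_dz(baseline, scenario)
    print(f"t = {t_stat:.3f} (df = {len(baseline) - 1}), d_z = {d_z:.3f}")
```

The identity t = d_z × √n shows why large samples can yield very small p-values even for moderate effect sizes, which is why the table reports both.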

| Category | Representative Causes | Relative Frequency (%) | Average Impact on Cycle Time (%) | Management Mitigation Approach |
|---|---|---|---|---|
| Human Factors | Incomplete prescription data; skipped double-check steps; insufficient training on new digital system | 41 | +2.9 | Refresher training and automated approval alerts |
| Technical Factors | HIS pharmacy data mismatch; barcode scanning errors; unstable printer/API links | 33 | +1.7 | Integration patches and real-time I/O validation |
| Organizational Factors | Delayed physician approval; overlapping shift boundaries; resource unavailability (pharmacy queue) | 26 | +1.4 | Protocol enforcement and shift synchronization |

| Deviation Type | Root-Cause Category | Relative Frequency (%) | Average Impact on Cycle Time (%) | Impact × Frequency Score | Priority Level | Recommended Mitigation Approach |
|---|---|---|---|---|---|---|
| Skipped Approval Step | Human | 18 | +2.4 | 43.2 | High | Staff refresher training; automated approval alerts |
| Resequenced Treatment Record | Technical | 14 | +2.1 | 29.4 | Medium–High | HIS–Pharmacy system integration patch |
| Incomplete Prescription Data | Human | 9 | +1.9 | 17.1 | Medium | Mandatory data validation at entry |
| Queue Delay (Pharmacy) | Organizational | 7 | +1.6 | 11.2 | Medium | Shift synchronization; resource balancing |
| Barcode Scan Error | Technical | 6 | +1.4 | 8.4 | Low | Scanner/API real-time I/O monitoring |
| Missed Double-Check | Human | 5 | +1.2 | 6.0 | Low | Standardized automated double-check prompt |
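
The Impact × Frequency score is the product of a deviation's relative frequency and its average cycle-time impact (for example, 18 × 2.4 = 43.2). The sketch below recomputes the scores from the table and ranks the deviations. The thresholds behind the High/Medium/Low priority labels are not given, so it does not assign them.

```python
# (deviation type, root-cause category, relative frequency %, average cycle-time impact %)
DEVIATIONS = [
    ("Skipped Approval Step", "Human", 18, 2.4),
    ("Resequenced Treatment Record", "Technical", 14, 2.1),
    ("Incomplete Prescription Data", "Human", 9, 1.9),
    ("Queue Delay (Pharmacy)", "Organizational", 7, 1.6),
    ("Barcode Scan Error", "Technical", 6, 1.4),
    ("Missed Double-Check", "Human", 5, 1.2),
]


def prioritize(deviations):
    """Score each deviation as frequency x impact (1 decimal) and rank highest first."""
    scored = [
        (name, category, round(freq * impact, 1))
        for name, category, freq, impact in deviations
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, category, score in prioritize(DEVIATIONS):
        print(f"{score:5.1f}  {name} ({category})")
```

Rounding to one decimal is only for display parity with the table; without it, floating-point products such as 18 × 2.4 print as 43.199999999999996.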

| Criterion | PM2–PIM (Integrated Framework) | PM + Decision Support Systems (DSS) | Discrete-Event Simulation (DES) | Machine-Learning (ML) Predictors |
|---|---|---|---|---|
| Analytical Principle | Process-native, model-driven; embeds quantitative impact estimation inside the PM2 lifecycle. | Process-aware dashboards support managerial oversight. | Event-based reconstruction of process behavior through external simulators. | Data-driven statistical learning on historical traces. |
| Primary Objective | Forecast aggregate system efficiency gains from deviation removal. | Monitor key performance indicators for control and compliance. | Evaluate hypothetical scenarios for resource utilization or capacity planning. | Predict case-level timing, risk, or outcome. |
| Causal Traceability | High; direct deviation → impact → KPI change linkage. | Partial; qualitative mapping between events and metrics. | Moderate; interpretable through simulation assumptions. | Weak; correlation-driven, often black-box. |
| Quantification of Impact | Direct numerical delta on KPIs (Cycle Time ↓ 8%, Workload ↓ 6%). | Limited to threshold or rule-based alerts. | Strong but external to mining tools; requires parameter fitting. | Implicit; derived from model accuracy. |
| Data Dependency | Moderate (event-log-driven, no synthetic data). | Moderate (transactional + rule-based). | High (stochastic input distributions and service times). | Very high (large, labeled datasets). |
| Interpretability/Transparency | Very high, process-semantic and reproducible. | High, rule and metric oriented. | Medium, relies on simulation model logic. | Low, opaque to causal structure. |
| Implementation Complexity | Low—realized in WoPeD/ProM plus Python (PIM layer). | Moderate—requires DSS integration and upkeep. | High—specialized software and parameter tuning. | High—requires feature engineering and retraining. |
| Strengths | Quantitative, reproducible, explainable; bridges conformance and prediction. | Continuous monitoring; easy managerial interface. | Captures complex system dynamics; long-run projections. | High predictive accuracy; adaptive learning. |
| Limitations | Needs accurate conformance baselines; runs in batch rather than in real time. | Weak forecasting power; qualitative reasoning. | Disconnect from mined causal semantics; high modeling cost. | Poor interpretability; high data demand. |
| Best Use Domain | Strategic planning and policy prioritization in healthcare. | Operational dashboards and compliance checking. | Research or capacity simulation projects. | Early-warning or risk-prediction systems. |

| Criterion | Predictive Impact Modeling (PIM) | AI/Machine Learning Approach |
|---|---|---|
| Analytical Principle | Model-driven (built on PM2 process graph). | Data-driven correlation learning. |
| Primary Objective | Quantify aggregate gains from deviation elimination. | Predict individual case behavior or outcome. |
| Explainability/Traceability | Very high—linked to process structure. | Low—black-box models. |
| Data Volume Need | Moderate (clean event logs). | Very large, balanced datasets. |
| Implementation Complexity | Simple—in-house Python scripts. | High—requires ML pipeline, parameter tuning. |
| Expertise Dependence | Moderate—HIS/PM2 staff. | High—needs data-science team. |
| Operational Cost | Low—no external contracting. | High—training and model maintenance. |
| Interpretability for Managers | High. | Often limited. |
| Integration with PM2 Data | Native. | Indirect—needs data conversion. |
| Scalability and Portability | Easily replicable across units. | Requires framework-specific retraining. |
| Best Use Domain | Strategic planning and scenario forecasting. | Real-time alerts and operational prediction. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Salehi, M.; Khayami, R.; Akbari, R.; Mirmozaffari, M. An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran. Bioengineering 2025, 12, 1288. https://doi.org/10.3390/bioengineering12121288
Salehi M, Khayami R, Akbari R, Mirmozaffari M. An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran. Bioengineering. 2025; 12(12):1288. https://doi.org/10.3390/bioengineering12121288
Chicago/Turabian Style: Salehi, Mohammad, Raouf Khayami, Reza Akbari, and Mirpouya Mirmozaffari. 2025. "An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran" Bioengineering 12, no. 12: 1288. https://doi.org/10.3390/bioengineering12121288
APA Style: Salehi, M., Khayami, R., Akbari, R., & Mirmozaffari, M. (2025). An Integrated Predictive Impact–Enhanced Process Mining Framework for Strategic Oncology Workflow Optimization: Case Study in Iran. Bioengineering, 12(12), 1288. https://doi.org/10.3390/bioengineering12121288

