Checking Medical Process Conformance by Exploiting LLMs
Abstract
1. Introduction
- In particular, our work is articulated into three main contributions:
- We adopt a Large Language Model (LLM)-based approach to
- Extract a set of normative behaviors from a textual guideline;
- Formalize such an output into executable rules.
- We check the conformance of patient traces to the extracted rules and quantify it. To this end, we define the Trace Conformance Indicator (TCI), a metric measuring the percentage of log traces that satisfy the rules.
- We also exploit our conformance-checking approach as a means for assessing the quality of process model discovery algorithms [1]. To this end,
- We mine a process model from the event log by means of the algorithm we wish to evaluate, obtaining a graphical representation (a Petri Net).
- We check the conformance of the paths in the Petri Net to the extracted rules and quantify it. To this end, we define the Path Conformance Indicator (PCI), a metric measuring the percentage of model paths that satisfy the rules.
2. Related Work
2.1. Conformance Checking
2.2. LLMs in Process Mining
2.3. Final Considerations
3. LLM-Assisted Conformance Checking
3.1. Architecture
- The tool operates according to the following steps:
- The Rule Detection module takes in input of the clinical guideline in a textual format in natural language. It implements an algorithm that makes an HTTP-REST call to the ChatGPT-4o API using Python to query the ChatGPT-4o LLM, passing the guideline as an input. This enables the LLM to extract rules in natural language.
- The Rule Checking module shows the extracted rules to a medical expert, who is in charge of checking the validity of the rules on the basis of domain knowledge.
- The Rule Formalization module takes in input of the validated rules along with an example trace from the available event log, and automatically converts the rules from natural language into Python script code by using the ChatGPT o3-mini-high model (optimized for coding).
- The Trace Conformance Checking module takes in input of the event log and the formalized rules, and automatically checks the log conformance with respect to the rules, trace by trace, outputting the TCI, defined as the percentage of compliant traces for each rule.
- The Model Conformance Checking module takes in input of a process model and the formalized rules, and outputs the PCI, i.e., the percentage of paths compliant with each rule. In particular, the best model is the one where the PCI is the closest to the TCI calculated on traces in step 4.
3.2. Trace Conformance Checking
Algorithm 1 Function returns the percentage of traces that satisfy a rule |
|
3.3. Model Conformance Checking
- is a finite set of places.
- is a finite set of transitions, with .
- is a set of arcs (flow relation).
- is a weight function that assigns a positive integer weight to each arc.
- is the initial marking, a function that assigns a number of tokens to each place.
4. Experimental Results
- If patients have acute ischemic stroke and treatment can be started within 4.5 h of known onset, then they should be considered for thrombolysis with alteplase or tenecteplase;
- If patients have acute ischemic stroke and were last known to be well more than 4.5 h earlier, then they should be considered for thrombolysis with alteplase if treatment can be started between 4.5 and 9 h of known onset or within 9 h of the midpoint of sleep when they have woken with symptoms, and they have evidence from CT/MR perfusion (core-perfusion mismatch) or MRI (DWI-FLAIR mismatch) of the potential to salvage brain tissue;
- If patients with acute ischemic stroke are treated with thrombolysis, then they should be started on an antiplatelet agent after 24 h unless contraindicated, once significant hemorrhage has been excluded.
5. Discussion
IF treating patients with MS with natalizumab, THEN periodically evaluate the risk of Progressive Multifocal Leukoencephalopathy (PML) by measuring anti-JCV antibody titer, AND discuss the risk/benefit ratio of continuing therapy with the patient; after initiating therapy, consider switching to a 6-week interval dosing regimen to minimize PML risk.
6. Conclusions and Future Research Directions
- It adopts an LLM to extract normative behaviors from a textual guideline and to formalize it into computer-interpretable rules; while this task is relatively straightforward (and, in fact, could be easily adopted to different domains as well), to the best of our knowledge, our approach is the first of its kind in medicine.
- It evaluates the conformance of patient traces to the extracted rules and quantifies it through the TCI. The TCI allows us to assess the conformance of the actual behavior of a given hospital with respect to the guideline, in a quality assessment perspective, without requiring a time-consuming formalization of the guideline itself.
- It evaluates the conformance of a process model to the rules: it can therefore be adopted as a new dimension to compare different process models (discovered by different algorithms), focusing on their adherence to the prescribed normative behavior.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Algorithm A1 Function returns the percentage of paths in the model that satisfy a particular rule . |
|
References
- van der Aalst, W.M.P. Process Mining—Data Science in Action, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar] [CrossRef]
- Reichert, M.; Weber, B. Enabling Flexibility in Process-Aware Information Systems—Challenges, Methods, Technologies; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
- Desel, J.; Reisig, W.; Rozenberg, G. (Eds.) Lectures on Concurrency and Petri Nets, Advances in Petri Nets; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3098. [Google Scholar] [CrossRef]
- de Clercq, P.A.; Kaiser, K.; Hasman, A. Computer-interpretable Guideline Formalisms. In Computer-Based Medical Guidelines and Protocols: A Primer and Current Trends; ten Teije, A., Miksch, S., Lucas, P.J.F., Eds.; Studies in Health Technology and Informatics; IOS Press: Amsterdam, The Netherlands, 2008; Volume 139, pp. 22–43. [Google Scholar] [CrossRef]
- Buijs, J.C.A.M.; van Dongen, B.F.; van der Aalst, W.M.P. On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. In On the Move to Meaningful Internet Systems: OTM 2012, Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2012, Rome, Italy, 10–14 September 2012; Proceedings, Part I; Meersman, R., Panetto, H., Dillon, T.S., Rinderle-Ma, S., Dadam, P., Zhou, X., Pearson, S., Ferscha, A., Bergamaschi, S., Cruz, I.F., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7565, pp. 305–322. [Google Scholar] [CrossRef]
- Dunzer, S.; Stierle, M.; Matzner, M.; Baier, S. Conformance checking: A state-of-the-art literature review. In Proceedings of the 11th International Conference on Subject-Oriented Business Process Management, S-BPM ONE 2019, Seville, Spain, 26–28 June 2019; Betz, S., Ed.; ACM: New York, NY, USA, 2019; pp. 1–10. [Google Scholar] [CrossRef]
- Rozinat, A.; van der Aalst, W.M.P. Conformance checking of processes based on monitoring real behavior. Inf. Syst. 2008, 33, 64–95. [Google Scholar] [CrossRef]
- Leemans, S.J.J.; Fahland, D.; van der Aalst, W.M.P. Scalable process discovery and conformance checking. Softw. Syst. Model. 2018, 17, 599–631. [Google Scholar] [CrossRef] [PubMed]
- Adriansyah, A.; Munoz-Gama, J.; Carmona, J.; van Dongen, B.F.; van der Aalst, W.M.P. Alignment Based Precision Checking. In Proceedings of the Business Process Management Workshops—BPM 2012 International Workshops, Tallinn, Estonia, 3 September 2012; Rosa, M.L., Soffer, P., Eds.; Revised Papers; Lecture Notes in Business Information Processing. Springer: Berlin/Heidelberg, Germany, 2012; Volume 132, pp. 137–149. [Google Scholar] [CrossRef]
- Borrego, D.; Barba, I. Conformance checking and diagnosis for declarative business process models in data-aware scenarios. Expert Syst. Appl. 2014, 41, 5340–5352. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 3–5 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Long and Short Papers. Association for Computational Linguistics: Minneapolis, MN, USA; 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. OpenAI: GPT-4 technical report CoRR. arXiv 2024, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multim. Tools Appl. 2023, 82, 3713–3744. [Google Scholar] [CrossRef] [PubMed]
- Berti, A.; Kourani, H.; Hafke, H.; Li, C.Y.; Schuster, D. Evaluating Large Language Models in Process Mining: Capabilities, Benchmarks, and Evaluation Strategies. arXiv 2024, arXiv:2403.06749. [Google Scholar] [CrossRef]
- Berti, A.; Schuster, D.; van der Aalst, W.M.P. Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study. In Proceedings of the Business Process Management Workshops—BPM 2023 International Workshops, Utrecht, The Netherlands, 11–15 September 2023; Weerdt, J.D., Pufahl, L., Eds.; Revised Selected Papers; Lecture Notes in Business Information Processing. Springer: Berlin/Heidelberg, Germany, 2023; Volume 492, pp. 427–439. [Google Scholar] [CrossRef]
- Kourani, H.; Berti, A.; Hennrich, J.; Kratsch, W.; Weidlich, R.; Li, C.Y.; Arslan, A.; Schuster, D.; van der Aalst, W.M.P. Leveraging Large Language Models for Enhanced Process Model Comprehension. arXiv 2024, arXiv:2408.08892. [Google Scholar]
- Qafari, M.S.; van der Aalst, W. Fairness-Aware Process Mining. In Proceedings of the on the Move to Meaningful Internet Systems: OTM 2019 Conferences, Rhodes, Greece, 21–25 October 2019; Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C.A., Meersman, R., Eds.; Springer: Berlin/Heidelberg, Germany; Cham, Switzerland, 2019; pp. 182–192. [Google Scholar]
- Kourani, H.; Berti, A.; Schuster, D.; van der Aalst, W.M. ProMoAI: Process Modeling with Generative AI. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, Jeju, Republic of Korea, 3–9 August 2024. [Google Scholar]
- Grohs, M.; Abb, L.; Elsayed, N.; Rehse, J.R. Large Language Models can accomplish Business Process Management Tasks. arXiv 2023, arXiv:2307.09923. [Google Scholar] [CrossRef]
- Jessen, U.; Sroka, M.; Fahland, D. Chit-Chat or Deep Talk: Prompt Engineering for Process Mining. arXiv 2023, arXiv:2307.09909. [Google Scholar] [CrossRef]
- Berti, A.; van Zelst, S.; Schuster, D. PM4Py: A process mining library for Python. Softw. Impacts 2023, 17, 100556. [Google Scholar] [CrossRef]
- Weijters, A.J.; van Der Aalst, W.M.; De Medeiros, A.A. Process Mining with the HeuristicsMiner Algorithm; Technische Universiteit Eindhoven: Eindhoven, The Netherlands, 2006. [Google Scholar]
- Bottrighi, A.; Guazzone, M.; Leonardi, G.; Montani, S.; Striani, M.; Terenziani, P. Integrating ISA and Part-of Domain Knowledge into Process Model Discovery. Future Internet 2022, 14, 357. [Google Scholar] [CrossRef]
- Jensen, K.; Kristensen, L.M. Coloured Petri Nets—Modelling and Validation of Concurrent Systems; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
- Reisig, W.; Rozenberg, G. (Eds.) Lectures on Petri Nets I: Basic Models, Advances in Petri Nets; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1998; Volume 1491. [Google Scholar] [CrossRef]
- van der Aalst, W.M.P.; Stahl, C. Modeling Business Processes—A Petri Net-Oriented Approach; Cooperative Information Systems Series; MIT Press: Cambridge, MA, USA, 2011. [Google Scholar]
- Esparza, J.; Römer, S.; Vogler, W. An Improvement of McMillan’s Unfolding Algorithm. Form. Methods Syst. Des. 2002, 20, 285–310. [Google Scholar] [CrossRef]
- van der Aalst, W.M.P.; Rubin, V.; Verbeek, H.M.W.; van Dongen, B.F.; Kindler, E.; Günther, C.W. Process mining: A two-step approach to balance between underfitting and overfitting. Softw. Syst. Model. 2010, 9, 87–111. [Google Scholar] [CrossRef]
- National Clinical Guideline for Stroke for the UK and Ireland. London: Intercollegiate Stroke Working Party. Available online: https://www.strokeguideline.org (accessed on 8 April 2025).
- van der Aalst, W.M.P.; van Dongen, B.F. Discovering Workflow Performance Models from Timed Logs. In Proceedings of the Engineering and Deployment of Cooperative Information Systems, First International Conference, EDCIS 2002, Beijing, China, 17–20 September 2002; Proceedings. Han, Y., Tai, S., Wikarski, D., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2002; Volume 2480, pp. 45–63. [Google Scholar] [CrossRef]
- Johnson, R.A. Miller and Freund’s Probability and Statistics for Engineers, 8th ed.; Prentice Hall International: Hoboken, NJ, USA, 2011. [Google Scholar]
- Rojas, E.; Munoz-Gama, J.; Sepúlveda, M.; Capurro, D. Process mining in healthcare: A literature review. J. Biomed. Inform. 2016, 61, 224–236. [Google Scholar] [CrossRef] [PubMed]
- Yang, L.; Xu, S.; Sellergren, A.; Kohlberger, T.; Zhou, Y.; Ktena, I.; Kiraly, A.; Ahmed, F.; Hormozdiari, F.; Jaroensri, T.; et al. Advancing Multimodal Medical Capabilities of Gemini. arXiv 2024, arXiv:2405.03162. [Google Scholar] [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; tau Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 9459–9474. [Google Scholar]
- Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv 2025, arXiv:2402.07927. [Google Scholar]
- Zhou, Y.; Muresanu, A.I.; Han, Z.; Paster, K.; Pitis, S.; Chan, H.; Ba, J. Large Language Models Are Human-Level Prompt Engineers. arXiv 2023, arXiv:2211.01910. [Google Scholar] [CrossRef]
- Francia, R.; Leone, M.; Leonardi, G.; Montani, S.; Pennisi, M.; Striani, M.; D’Alfonso, S. AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data. arXiv 2025, arXiv:2508.02625. [Google Scholar]
- Gu, J.; Jiang, X.; Shi, Z.; Tan, H.; Zhai, X.; Xu, C.; Li, W.; Shen, Y.; Ma, S.; Liu, H.; et al. A Survey on LLM-as-a-Judge. arXiv 2025, arXiv:2411.15594. [Google Scholar]
Hospital Name | Total Traces | Total Activities | Min Trace Length | Max Trace Length | Mean Trace Length | St. Dev. Trace Length |
---|---|---|---|---|---|---|
H1 | 72 | 1059 | 11 | 21 | 16 | 2.19 |
H2 | 86 | 1359 | 11 | 23 | 17 | 2.84 |
H3 | 105 | 1569 | 11 | 23 | 17 | 2.33 |
H4 | 363 | 5597 | 11 | 23 | 17 | 2.31 |
Hospital | Rule 1 | Rule 2 | Rule 3 |
---|---|---|---|
H1 | 100.00% | 0.00% | 100.00% |
H2 | 81.25% | 33.33% | 100.00% |
H3 | 83.33% | 8.33% | 100.00% |
H4 | 79.69% | 5.26% | 98.44% |
Hospital | Rule 4 | Rule 5 |
---|---|---|
H1 | 93.94% | 40.00% |
H2 | 95.45% | 29.41% |
H3 | 96.43% | 16.67% |
H4 | 93.33% | 20.31% |
Hospital | vs. () Rule 1 | vs. () Rule 2 | vs. () Rule 3 |
---|---|---|---|
H1 | 100.00% vs. (100%) | 0.00% vs. (0.00%) | 100.00% vs. (100%) |
H2 | 80.81% vs. (81.25%) | 0.00% vs. (33.33%) | 99.23% vs. (100.00%) |
H3 | 100.00% vs. (83.33%) | 0.00% vs. (8.33%) | 100.00% vs. (100.00%) |
H4 | 9.42% vs. (79.69%) | 0.00% vs. (5.26%) | 99.00% vs. (98.44%) |
Hospital | vs. () Rule 1 | vs. () Rule 2 | vs. () Rule 3 |
---|---|---|---|
H1 | 0.00% vs. (100.00%) | 0.00% vs. (0.00%) | 100.00% vs. (100.00%) |
H2 | 42.15% vs. (81.25%) | 100% vs. (33.33%) | 100.00% vs. (100.00%) |
H3 | 68.12%vs. (83.33%) | 0.00% vs. (8.33%) | 97.99% vs. (100.00% ) |
H4 | 0.00% vs. (79.69%) | 0.00% vs. (5.26%) | 100.00% vs. (98.44%) |
Hospital | vs. () Rule 1 | vs. () Rule 2 | vs. () Rule 3 |
---|---|---|---|
H1 | 100% vs. (100%) | 0.00% vs. (0.00%) | 100.00% vs. (100.00%) |
H2 | 86.63% vs. (81.25%) | 100% vs. (33.33%) | 100.00% vs. (100.00%) |
H3 | 92.12% vs. (83.33%) | 0.00% vs. (8.33%) | 79.15% vs. (100.00%) |
H4 | 31.04% vs. (79.69%) | 0.00% vs. (5.26%) | 60.33% vs. (98.44%) |
Hospital | H1 | H2 | H3 | H4 | Average | |
---|---|---|---|---|---|---|
Alpha Miner | Replay Fitness | 0.601 | 0.567 | 0.556 | 0.416 | 0.535 |
Generalization | 0.357 | 0.453 | 0.482 | 0.551 | 0.461 | |
Simplicity | 0.313 | 0.225 | 0.234 | 0.387 | 0.290 | |
Precision | 1.000 | 0.977 | 1.000 | 0.250 | 0.807 | |
Heuristic Miner | Replay Fitness | 0.862 | 0.779 | 0.797 | 0.874 | 0.828 |
Generalization | 0.518 | 0.507 | 0.484 | 0.623 | 0.533 | |
Simplicity | 1.000 | 0.676 | 0.855 | 0.650 | 0.795 | |
Precision | 0.237 | 0.735 | 0.378 | 0.974 | 0.581 | |
SIM | Replay Fitness | 0.786 | 0.538 | 0.599 | 0.552 | 0.619 |
Generalization | 0.072 | 0.121 | 0.144 | 0.108 | 0.112 | |
Simplicity | 0.951 | 0.930 | 0.932 | 0.905 | 0.929 | |
Precision | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Leonardi, G.; Montani, S.; Striani, M. Checking Medical Process Conformance by Exploiting LLMs. Appl. Sci. 2025, 15, 10184. https://doi.org/10.3390/app151810184
Leonardi G, Montani S, Striani M. Checking Medical Process Conformance by Exploiting LLMs. Applied Sciences. 2025; 15(18):10184. https://doi.org/10.3390/app151810184
Chicago/Turabian StyleLeonardi, Giorgio, Stefania Montani, and Manuel Striani. 2025. "Checking Medical Process Conformance by Exploiting LLMs" Applied Sciences 15, no. 18: 10184. https://doi.org/10.3390/app151810184
APA StyleLeonardi, G., Montani, S., & Striani, M. (2025). Checking Medical Process Conformance by Exploiting LLMs. Applied Sciences, 15(18), 10184. https://doi.org/10.3390/app151810184