Following the seven steps of the proposed methodology, the case study was implemented using several Python tools and libraries. In particular, the PM4Py tool was used for the main PM analyses, such as event log descriptive statistics, process discovery, and conformance checking.
4.2. Data Gathering
After the GQM table is constructed, four real-life event log datasets are extracted from the public 4TU ResearchData repository (4TU ResearchData: https://data.4tu.nl/). Given the algorithmic variations among PPPM models, it is essential to ensure an equitable and methodologically rigorous comparison. Therefore, we utilize event logs with diverse structural configurations and conduct a comprehensive analysis across a wide range of privacy budget parameters, including epsilon and sigma. The Sepsis log provides a detailed representation of hospital procedures administered to sepsis patients and encompasses a substantial number of infrequent process traces [28]. The events recorded in the BPIC13 log are associated with the VINST system, which is designed for incident and problem management [29]. CoSeLog is an event log originating from an actual customer support setting, systematically documenting detailed sequences of interactions and activities to support research in process mining and behavioral analysis [30]. The Road Traffic Fines event log documents authentic administrative procedures concerning the management of traffic fines by an Italian municipality, functioning as a benchmark dataset for research in process mining, compliance analysis, and the optimization of public sector workflows [31].
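To make the log characterization concrete, the following sketch, assuming PM4Py's simplified interface and an illustrative file name, loads one of the downloaded logs and derives the basic structural statistics (case, event, and variant counts) from which properties such as trace uniqueness can be computed.

```python
import pm4py

# Illustrative path; point it at a log downloaded from 4TU ResearchData.
log = pm4py.read_xes("Sepsis Cases - Event Log.xes")

# In recent PM4Py versions, get_variants maps each trace variant to the
# number of cases exhibiting it.
variants = pm4py.get_variants(log)
n_cases = sum(variants.values())
n_events = len(log)  # recent read_xes returns a pandas DataFrame of events

# Trace uniqueness: share of distinct control-flow variants among all cases.
trace_uniqueness = 100.0 * len(variants) / n_cases
print(f"cases={n_cases}, events={n_events}, variants={len(variants)}, "
      f"uniqueness={trace_uniqueness:.1f}%")
```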
To gather privacy-preserving models, we used the respective publicly available Python-based code repositories in our own benchmark implementation. For the group-based privacy-preserving models, we selected AnonyPy (AnonyPy: https://github.com/glassonion1/anonypy) [32]. AnonyPy is a Python-based framework developed for the anonymization of datasets through group-based methodologies, incorporating privacy-preserving techniques such as k-anonymity, l-diversity, and t-closeness to safeguard sensitive information. It provides researchers with the capability to anonymize tabular data through the generalization or suppression of quasi-identifiers, while supporting customizable hierarchical structures and grouping methodologies. For DP-based privacy-preserving machine learning models, we employed IBM DiffPrivLib (IBM Differential Privacy Library: https://github.com/IBM/differential-privacy-library) [33]. Developed atop widely used scientific Python libraries such as scikit-learn (scikit-learn: machine learning in Python, https://scikit-learn.org) [34], DiffPrivLib provides differentially private implementations of fundamental data transformation techniques, statistical functions, and machine learning algorithms. For all the other, more advanced PPPM models, their own published code repositories were employed with the recommended configurations. While SaCoFa and Laplacian-DP were integrated within the PRIPEL framework, TraVaS operates independently within its own distinct framework. Minimal usage sketches of the two libraries are given below.
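As a minimal sketch of how the group-based models are invoked, the following example follows AnonyPy's documented Preserver interface; the toy table, column roles, and parameter values are illustrative only.

```python
import anonypy
import pandas as pd

# Toy tabular view of event data; columns and values are illustrative.
data = [
    ["Registration", 35, "Sepsis"],
    ["Registration", 41, "Sepsis"],
    ["Triage", 35, "Other"],
    ["Triage", 47, "Other"],
]
df = pd.DataFrame(data, columns=["activity", "age", "diagnosis"])
df["activity"] = df["activity"].astype("category")  # AnonyPy expects category/numeric QIs

feature_columns = ["activity", "age"]  # quasi-identifiers to generalize/suppress
sensitive_column = "diagnosis"

p = anonypy.Preserver(df, feature_columns, sensitive_column)
rows = p.anonymize_k_anonymity(k=2)  # analogous: anonymize_l_diversity(k=2, l=2),
print(pd.DataFrame(rows))            #            anonymize_t_closeness(k=2, p=0.2)
```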
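Similarly, a minimal DiffPrivLib sketch for a differentially private classifier is shown below; the synthetic data and the epsilon value are illustrative.

```python
import numpy as np
from diffprivlib.models import GaussianNB

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))  # illustrative features in [0, 1]
y = (X[:, 0] > 0.5).astype(int)           # illustrative binary labels

# Differentially private Naive Bayes; supplying `bounds` avoids the
# privacy-leak warning DiffPrivLib raises when it must infer them from data.
clf = GaussianNB(epsilon=1.0, bounds=(np.zeros(3), np.ones(3)))
clf.fit(X, y)
print(clf.predict(X[:5]))
```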
For the process mining analysis component, the PM4Py process mining tool (PM4Py: Process Mining for Python, https://github.com/process-intelligence-solutions/pm4py) was employed to derive process control-flow representations in the form of Petri nets using the Inductive Miner algorithm [35]. Subsequently, conformance checking is conducted based on the following three metrics [36] (a minimal PM4Py sketch is given after the list):
Fitness: A log trace is deemed to exhibit perfect conformity with the model if it can be precisely replayed within the model and corresponds fully to a complete trace of the model.
Precision: It concerns a model’s ability to accurately represent observed behavior while effectively preventing the inclusion of unobserved behavior, thereby mitigating the risk of underfitting.
Generalization: It concerns a model’s ability to incorporate previously unobserved behavior while mitigating the risk of overfitting.
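The following sketch outlines this discovery-and-evaluation pipeline in PM4Py; we use the token-based replay variants of fitness and precision here (alignment-based counterparts also exist), and the log path is illustrative.

```python
import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization

log = pm4py.read_xes("event_log.xes")  # illustrative path

# Control-flow discovery with the Inductive Miner (Petri net + markings).
net, im, fm = pm4py.discover_petri_net_inductive(log)

# Conformance-checking metrics.
fitness = pm4py.fitness_token_based_replay(log, net, im, fm)
precision = pm4py.precision_token_based_replay(log, net, im, fm)
gen = generalization.apply(log, net, im, fm)

print(fitness["log_fitness"], precision, gen)
```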
To assess the performance of the privacy-preserving machine learning algorithms, the accuracy metric from the scikit-learn Python library and wall-clock computational time were utilized, offering a combined evaluation of model utility and efficiency. The accuracy metric measures the fraction of correct predictions generated by the model on a labeled test dataset, serving as a fundamental criterion for assessing classification models under the constraints of differential privacy. For unsupervised learning tasks, specifically clustering, the inertia metric was employed to quantify the sum of squared distances between data points and their respective cluster centroids, serving as a measure of clustering cohesion: it quantifies cluster compactness and acts as an indirect measure of clustering quality, facilitating the assessment of how privacy-preserving transformations influence the structural integrity and coherence of the data. Furthermore, the runtime metric (in seconds), computed from execution time measurements, provides valuable insight into the computational overhead imposed by privacy mechanisms, which is essential for evaluating the practicality of deploying such models in real-world scenarios.
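A minimal sketch of this evaluation loop is given below, assuming DiffPrivLib's KMeans and synthetic data; inertia is computed directly from its definition as the sum of squared distances to the nearest centroid, and runtime is taken as wall-clock execution time.

```python
import time
import numpy as np
from diffprivlib.models import KMeans

X = np.random.default_rng(1).uniform(size=(500, 2))  # illustrative data

start = time.perf_counter()
km = KMeans(n_clusters=3, epsilon=1.0, bounds=(np.zeros(2), np.ones(2)))
km.fit(X)
runtime_s = time.perf_counter() - start

# Inertia: sum of squared distances of each point to its nearest centroid.
d2 = ((X[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
inertia = d2.min(axis=1).sum()
print(f"inertia={inertia:.3f}, runtime={runtime_s:.2f} s")

# For the supervised models, utility is measured analogously with
# sklearn.metrics.accuracy_score(y_test, clf.predict(X_test)).
```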
All experimental procedures were executed on a personal desktop computer equipped with an Intel Core i7-13700K CPU and 32 GB of DDR5 RAM. An execution timeout of 30 min was established for each anonymization task, defined as the application of an algorithm, configured with specific parameters, on an individual event log.
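One plausible way to enforce such a per-task timeout, sketched here with Python's multiprocessing module (the helper name is ours, not part of the benchmark code):

```python
import multiprocessing as mp

def run_with_timeout(task, args=(), timeout_s=30 * 60):
    """Run one anonymization task in a child process; abort after 30 minutes."""
    proc = mp.Process(target=task, args=args)
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():       # still running past the deadline
        proc.terminate()
        proc.join()
        return False          # timed out
    return True               # finished within the budget
```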
4.6. Results Evaluation
Our observations indicate that the implementation of privacy-preserving algorithms can yield a high degree of precision. However, elevated values for certain quality measures do not inherently indicate that the privacy-preserving algorithm maintains data utility, as its primary objective is to generate results that closely resemble the original data rather than to enhance the quality of the discovered models. At this stage, a specific instance of the GQM table is constructed based on the actual metric values, and the quantitative findings are systematically synthesized into responses addressing each question.
(Q1.1.1.) The findings reveal that as the value of k increases, a greater degree of generalization and suppression becomes necessary, leading to a reduction in the granularity of the trace data. Precision exhibits a more pronounced decline because the reduction in trace variety heightens the probability of the model accommodating unobserved behaviors, leading to potential overgeneralization. In addition, since k-anonymity considers contextual information, it does not scale well for event logs that include a higher number of instances.
(Q1.1.2.) The results show that precision diminishes as trace merging renders the discovered model more permissive, enabling the incorporation of behaviors that were not originally recorded in the log. However, for some event logs, generalization is adversely affected, as the suppression of low-frequency patterns diminishes the model's capacity to accommodate previously unseen yet plausible behaviors. In addition, since the l-diversity model considers contextual information, it does not scale well for event logs that include a higher number of instances.
(Q1.1.3.) Our findings suggest that as the parameter t decreases, privacy is enhanced; however, the richness of trace-level behavioral patterns deteriorates, thereby adversely impacting the quality and expressiveness of the model's results. Generalization further declines as the omission of lower-frequency, nuanced behaviors diminishes the model's capacity to extend its applicability to previously unseen yet valid behavioral patterns. Also, for this model, the time performance did not scale well with respect to the number of instances.
(Q1.2.1.) As validated by the PRETSA PPPM model quality results, precision decreases, especially as the k parameter increases. Thus, the more privacy is gained, the more utility decreases. To achieve higher k-values, the event log data are generalized more; as generalization increases, the data become less detailed, leading to a loss in precision. For event logs with the highest trace uniqueness percentages, PRETSA may be the right choice for the privacy-utility trade-off. However, it did not show efficient time performance for large event logs.
(Q1.2.2.) The TLKC PPPM model needed higher trace uniqueness to produce optimum results due to the model’s nature. Notably, despite the enforcement of stringent TLKC constraints, certain anonymized event logs produced process models characterized by high precision. This phenomenon can be ascribed to the limited behavioral diversity present in the original log, as well as to the targeted influence of TLKC on attribute-level generalization rather than on control-flow structures. In these cases, the prevailing trace variants are preserved, enabling the model to constrain behavior in a manner closely corresponding to the original log.
(Q2.1.1.) DP-Naive Bayes offers a viable approach to privacy-preserving classification: it is effective for moderate epsilon values but not ideal for strict privacy settings. On the other hand, owing to its high computational efficiency, this model is particularly well suited to the rapid and cost-effective deployment of anonymized models.
(Q2.1.2.) As seen from the results, DP-Decision Trees provide an effective trade-off among interpretability, privacy preservation, and computational efficiency. However, under stringent privacy constraints, performance declines as a result of imprecise split decisions introduced by noise. Training maintains a high level of computational efficiency, even when operating at lower epsilon values. The behavior of the model also exemplifies the bias-variance trade-off within the framework of differential privacy, necessitating careful parameter selection to optimize performance.
(Q2.1.3.) We visualized the inertia values with respect to the epsilon parameter in the DP-k-Means model results. A lower inertia value indicated that the event log instances were closer to their respective cluster centroids, i.e., tighter, more compact, and better-defined clusters, which is desirable. On the other hand, the results indicated that a higher degree of trace uniqueness within the event logs is associated with a more pronounced decline in the model's runtime performance.
(Q2.2.1.) Our LaplacianDP findings revealed that different hyperparameter configurations, in combination with the data characteristics, had different effects on algorithm behavior. Especially for the Road Traffic Fines event log data, the parameter configurations from the original publication had infeasible runtimes within the PRIPEL framework. Since a smaller epsilon parameter introduces more privacy, precision is reduced; this indicates that the discovered model began capturing noisy events, so overfitting occurred. Generalization also decreased in parallel. In addition, regarding runtime performance, increasing the epsilon parameter reduces the runtime in roughly inverse proportion. In the PRIPEL original publication [25], the authors considered only the Sepsis event log for their experiments.
(Q2.2.2.) In the SaCoFa model experiments, various hyperparameter configurations, in conjunction with the underlying data characteristics, exerted distinct influences on algorithmic behavior. Especially for the Road Traffic Fines event log data, the parameter configurations from the original publication had infeasible runtimes within the PRIPEL framework, so we discarded this log from our experiments. Thus, the instance count of the data is the main issue to consider, given this model's reliance on contextual information. We concluded that the PRIPEL framework is infeasible for event logs with a high count of events.
(Q2.2.3.) As validated by the TraVaS PPPM model quality results, the effectiveness of this model is considerably influenced by the privacy parameters. It may struggle with event logs that contain many infrequent trace variants. While TraVaS achieved efficient runtimes for event logs with high instance counts, its result quality is affected by high trace uniqueness.
To extend and support our fine-grained analysis results, we validated and discussed the result quality and time utility of each PPPM model under its respective category by using different statistical analysis techniques. To align with our subgoals, we applied statistical tests for each model within its category to gain a more comprehensive and comparative perspective.
To evaluate the practical equivalence of alternative models relative to a predefined baseline, we employed the Two One-Sided Tests (TOST) procedure, a statistical framework specifically designed to formally assess equivalence within a defined tolerance margin [37]. In contrast to conventional hypothesis testing, which seeks to identify statistically significant differences, the TOST procedure explicitly evaluates whether a model's performance is statistically equivalent to that of a baseline, within a user-specified range of practical relevance known as the equivalence margin (Δ). This consideration is particularly critical when the objective is not to establish superiority, but rather to demonstrate that a model performs comparably to a reference system within the bounds of operational constraints.
Since PPPM models are heterogeneous in nature, to determine the equivalence margins for each model category, we conducted a sensitivity analysis following a 70%-equivalence-count rule: after defining a grid of candidate margins, we computed the empirical equivalence count for each candidate and selected the margin satisfying our threshold. This rule balances two imperatives in privacy model comparison: utility retention and privacy prioritization. By stipulating that a substantial proportion (70%) of utility losses must fall within a predefined margin, we ensure that the privacy-preserving approach maintains acceptable performance in the majority of cases. Conversely, by selecting the minimal margin that satisfies this condition, we adopt a sufficiently permissive criterion that allows for the recognition of stronger privacy guarantees, even in cases where utility loss occasionally exceeds the specified threshold. A sketch of this margin-selection rule is given below.
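The following sketch implements the rule directly; the candidate grid and the loss values are illustrative.

```python
import numpy as np

def select_margin(utility_losses, candidate_margins, coverage=0.70):
    """Smallest candidate margin within which at least `coverage` of the
    observed utility losses fall (the 70%-equivalence-count rule)."""
    losses = np.abs(np.asarray(utility_losses, dtype=float))
    for margin in sorted(candidate_margins):
        if np.mean(losses <= margin) >= coverage:
            return margin
    return max(candidate_margins)  # fall back to the most permissive margin

# Illustrative: losses of a quality metric relative to its baseline.
print(select_margin([0.02, 0.05, 0.01, 0.12, 0.03],
                    candidate_margins=np.arange(0.01, 0.21, 0.01)))
```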
For each PPPM model, we conducted a one-sample TOST comparing the distribution of the model's result quality metrics to their fixed baseline metric scores. For models with missing metric results, we employed a single-imputation technique, replacing each missing value with the mean of its group, so that the statistical tests remained applicable.
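A sketch of this one-sample TOST, assuming the statsmodels implementation (DescrStatsW.ttost_mean) and illustrative scores, is shown below; the mean imputation mirrors the single-imputation step described above.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

def one_sample_tost(scores, baseline, margin):
    """Equivalence is concluded (p < 0.05) if the mean score lies within
    [baseline - margin, baseline + margin]."""
    scores = np.asarray(scores, dtype=float)
    scores = np.where(np.isnan(scores), np.nanmean(scores), scores)  # mean imputation
    pvalue, _, _ = DescrStatsW(scores).ttost_mean(low=baseline - margin,
                                                  upp=baseline + margin)
    return pvalue

# Illustrative: fitness scores of anonymized logs vs. a baseline of 0.98.
print(one_sample_tost([0.97, 0.99, 0.96, np.nan], baseline=0.98, margin=0.05))
```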
Table A1 presents the one-sample TOST results for the discovered process model quality metrics obtained by applying the group-based privacy models to our event logs. All group-based anonymization models showed statistically equivalent fitness because they allow sufficient behavior to replay the logs. For k-anonymity, only the precision metric for the Road Traffic Fines event log is not statistically equivalent, as it slightly increases on average. This reflects an overgeneralization trend for low-trace-uniqueness event logs: as rare behaviors are removed, the model becomes more deterministic. For l-diversity, not only is the precision for Road Traffic Fines not statistically equivalent, but the precision and generalization metrics for BPIC13 are also non-equivalent. While l-diversity showed a trend similar to k-anonymity for the BPIC13 event log, it had reduced precision and generalization metrics on average, thus overfitting rare, non-representative behaviors. This phenomenon was most prominent in logs with high trace variability, indicating that such logs require careful preprocessing or more robust PPPM approaches. For t-closeness, only the fitness values are statistically equivalent to their baselines across all event logs (despite decreasing slightly), with the one exception that the generalization metric for Road Traffic Fines is also equivalent. For all the event logs, we observed a pattern of low fitness, high precision, and low generalization. This indicates that t-closeness is overly restrictive: it accepts only a narrow slice of the behavior seen in the event log, rejecting many valid traces, as the model fails to abstract over variations. For PRETSA, all the metrics are statistically equivalent for the Sepsis event log, and for BPIC13 only the generalization metric is non-equivalent, by a negligible amount. Thus, it is the most promising model for event logs with high trace uniqueness. For CoSeLog, only fitness is equivalent, but a cumulatively increasing trend is observed for this event log, which has the lowest instance count. This situation may be attributed to the log's structured nature and the algorithm's ability to abstract meaningful control-flow patterns while anonymizing. For Road Traffic Fines, only the precision metric is not equivalent, as the model also included behavior that was not present in the log. This trade-off may be desirable in this domain, where future flexibility is important. For TLKC, only the precision metric had no equivalence across all the available event logs (no result for BPIC13 due to algorithmic behavior). However, TLKC struck a strong balance between specificity and generality for Sepsis: it fits the log, avoids unnecessary behavior, and generalizes well. Thus, this model is particularly well suited to this event log due to its algorithmic nature.
Table A2 shows the one-sample TOST results for the discovered process model quality metrics obtained by applying differential privacy models to the event log data by fitting learning algorithms. For the ML-based DP PPPM models, the result quality metrics have different interpretations. For the DP-Naïve Bayes and DP-Decision Trees supervised models, only the accuracy metric for BPIC13 had statistical equivalence within their respective margins. However, for the DP-k-Means unsupervised model, all the metric values were equivalent within the given margin, except for the BPIC13 event log. This suggests that, among the supervised models, DP-Naïve Bayes is preferable in domains with stronger privacy needs, owing to its lower margin and utility loss, especially for high-instance-count event logs such as Road Traffic Fines and BPIC13. On the other hand, DP-k-Means showed optimal clustering behavior compared to the baseline, since all the event logs are naturally grouped into a case structure.
Table A3 includes the one-sample TOST results for the discovered process model quality metrics obtained by applying the more advanced differential privacy models to the event log data. Among the advanced DP PPPM models, for LaplacianDP, BPIC13 had equivalence for all three metrics due to its moderate trace uniqueness and instance count characteristics. For the other event logs, only precision had no equivalence to its baseline. The fitness metrics showed a slightly decreasing trend, since noise injection by the DP mechanism changes activity sequences or inserts spurious events. We also observed a small but consistent increase in both precision and generalization. This trade-off aligns with the goals of PPPM, where some reduction in replayability is acceptable in exchange for better abstraction and protection of sensitive behaviors. For SaCoFa as well, only the precision values were not equivalent to their baselines across all event logs (for Road Traffic Fines, no results are available for either the LaplacianDP or SaCoFa algorithms). They had non-inferior precision performance according to their p-values. However, high trace uniqueness, as in the Sepsis event log data, causes a slight reduction in the generalization metric, suggesting that SaCoFa's algorithmic design may not introduce semantically filtered noise that smooths the trace-variant distribution in such cases. For TraVaS, BPIC13 had equivalence for all metrics. Sepsis was non-equivalent to its baseline for the generalization metric only; in other words, mean generalization dropped while mean precision increased slightly. This trade-off points to a form of overfitting, where the model overly emphasizes dominant behaviors at the cost of completeness and flexibility. CoSeLog had equivalence only for the fitness metric. Road Traffic Fines was non-equivalent for precision only; it had considerably reduced mean precision and showed non-superior performance according to its p-values. For this event log, we observed a simultaneous drop in fitness, precision, and generalization. Such outcomes may result from a poor match between the privacy algorithm and the log characteristics, excessive noise, or overly aggressive filtering that removes meaningful structure. These findings reveal that even though the TraVaS model had successful metric results for all event logs, its result quality is highly sensitive to event log data characteristics such as trace uniqueness. This highlights the importance of carefully selecting privacy techniques and preprocessing steps based on log complexity and domain needs.
To assess the comparative time utility performance of the PPPM models across the event logs under each respective category, we applied the non-parametric Friedman test, a robust statistical method well suited for comparing multiple algorithms across diverse datasets without requiring the assumption of normality in performance metrics [38]. This choice is particularly warranted given the limited sample size and the ordinal nature of the ranking data obtained from the assessment of anonymization effectiveness.
The Friedman statistic is derived from the average ranks assigned to each method across the set of event logs. These ranks provide a summary of the relative performance of each method across individual event logs, with lower ranks denoting superior performance. The test is designed to assess the null hypothesis that, on average, all models exhibit equivalent performance across the datasets.
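The computation is sketched below with SciPy; the runtime matrix is purely illustrative (rows are event logs, columns are models).

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Illustrative runtimes in seconds: rows = event logs, columns = PPPM models.
runtimes = np.array([
    [12.4,  48.1,  35.2],
    [ 8.9,  30.5,  27.7],
    [ 3.1,  11.8,   9.4],
    [95.0, 310.2, 280.9],
])

stat, p = friedmanchisquare(*runtimes.T)             # one argument per model
avg_ranks = rankdata(runtimes, axis=1).mean(axis=0)  # lower rank = faster
print(f"Friedman statistic={stat:.3f}, p={p:.4f}, average ranks={avg_ranks}")
```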
For the five group-based anonymization PPPM models across the event logs, the results of the Friedman test yielded a Friedman statistic of 8.800 with an associated p-value of 0.0663. This result suggests that there may be meaningful performance differences, albeit not statistically strong enough to declare definitive superiority using the strictest standards. However, closer inspection of the average ranks may still reveal trends or practical performance patterns among the models, even when the omnibus test is not statistically significant.
Table A4 shows the per-event-log and average ranks that reveal which group-based PPPM models' runtimes were consistently better or worse across the event logs. t-closeness ranks best for most event logs. This suggests robust performance, even though the TLKC model's data-dependent nature can be misleading for Road Traffic Fines due to averaging the absent times with a single-imputation method for statistical validity. t-closeness typically demonstrates greater computational efficiency by employing optimized clustering heuristics alongside a single distribution-distance metric (e.g., Earth Mover's Distance), thereby circumventing the combinatorial complexity inherent in repeated generalization or diversity evaluations. In contrast, k-anonymity exhibits poorer runtime performance, reflecting limited generalizability and heightened sensitivity to event log characteristics: identifying the minimal set of generalizations and suppressions required to ensure that every quasi-identifier tuple appears in at least k records constitutes an NP-hard combinatorial optimization problem, necessitating exhaustive or heuristic search strategies with repeated scans of the event log data.
For the three ML-based differential privacy PPPM models across the event logs, the Friedman test yielded a Friedman statistic of 0.000 with an associated p-value of 1.0000. Thus, there is also no sufficiently strong statistical evidence to declare definitive superiority using the strictest standards.
Table A5 shows the per-event-log and average ranks that reveal which ML-based DP PPPM models' runtimes were consistently better or worse across the event logs. All models achieved consistently equal runtime performance according to their average ranks, which indicates broader generalizability. However, among the supervised algorithms, DP-Decision Trees is not a good candidate for high-instance-count event logs such as Road Traffic Fines, since building a full tree involves many costly randomized queries instead of simple impurity computations.
For the three more advanced differential privacy PPPM models across the event logs, the Friedman test yielded a Friedman statistic of 6.500 with an associated p-value of 0.0388. This result indicates that statistically significant performance differences exist among the models. A post hoc Nemenyi test [38] was also conducted to explore pairwise differences among the models. The Nemenyi post hoc procedure systematically performs pairwise comparisons among models, applies the studentized range distribution to adjust for multiple testing, and identifies which specific pairs exhibit statistically significant differences.
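One way to obtain the pairwise p-value matrix, assuming the scikit-posthocs package (the implementation is not prescribed by the procedure itself), is sketched below on the same illustrative runtime matrix.

```python
import numpy as np
import scikit_posthocs as sp

# Illustrative runtimes: rows = event logs (blocks), columns = models.
runtimes = np.array([
    [12.4,  48.1,  35.2],
    [ 8.9,  30.5,  27.7],
    [ 3.1,  11.8,   9.4],
    [95.0, 310.2, 280.9],
])

# Nemenyi test over Friedman ranks; returns a symmetric p-value matrix.
pvals = sp.posthoc_nemenyi_friedman(runtimes)
print(pvals)
```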
Table A6 includes the per-event-log and average ranks that reveal which of the more advanced DP PPPM models' runtimes were consistently better or worse across the event logs. The TraVaS model's superior runtime robustness is also statistically validated by its first-place rank across all the event logs. This model would be a strong candidate for preferred adoption due to its consistent top performance. A Nemenyi post hoc analysis (at the α = 0.05 significance level) yielded the pairwise p-value matrix shown in Table A7. We found that TraVaS runs significantly faster than SaCoFa and LaplacianDP (all p < 0.05), whereas no significant difference was observed between SaCoFa and LaplacianDP (p = 0.759287).
TraVaS attains superior runtime efficiency by employing a direct differentially private partition-selection mechanism to release precise trace-variant frequencies in a single thresholding pass, thereby obviating the computationally intensive prefix-based variant generation and iterative noise-query procedures characteristic of other models.