CONDA-PM—A Systematic Review and Framework for Concept Drift Analysis in Process Mining
Abstract
:1. Introduction
2. Fundamentals
- Control-flow perspective, that is, order of activities. Mining this perspective allows characterising and exploring possible execution paths of a process.
- Organisational perspective, that is, organisational resources and roles involved in the occurrence of events. Mining this perspective enables the discovery of the social network and the roles participating in a business process.
- Time perspective, that is, the point in time events occur. Mining this perspective enables finding bottlenecks and monitoring key performance indicators (KPI) of a business process.
- Case perspective, that is, properties of process instances (e.g., values of data elements characterising a process instance). Mining this perspective enables understanding contextual information specific to the current process instance.
3. Concept Drift in Process Mining
3.1. Distinguishing Concept Drift from Other Concepts
3.1.1. Concept Drift and Noise
3.1.2. Concept Drift and Deviations
3.1.3. Concept Drift and Process Drift
3.1.4. Variability and Flexibility
3.2. Categories of Concept Drift Analysis
3.2.1. Concept Drift Types
3.2.2. Concept Drift Analysis Tasks
- Detecting the point of time when a drift took place. This task is about detecting a drift as well as the time period at which the drift occurred.
- Localising the part of the process where the drift happened and characterising the process perspective affected by the drift, for example, the control–flow, data, or resource perspectives. Moreover, this category includes identifying the pattern of the concept drift, that is, whether the drift is sudden, gradual, incremental, recurring, or multi-order.
- Discovering the evolution process. This task involves putting the detected drift into a larger context, in order to discover how the business process is changing. Discovering the change process can be jointly achieved by studying the complete set of detected drifts for this process, rather than analysing the drifts in isolation from each other.
3.2.3. Putting all the Pieces Together
- Where do changes occur (i.e., abstraction level of the change)?—This aspect focuses on whether a change occurs at the process type or the process instance level.
- What is being changed (i.e., subject of change)?—This aspect concerns the process perspectives affected by a change.
- How to characterise a change (i.e., properties of change)?—This aspect can be mapped onto the drift patterns and duration of a drift?
4. Research Methodology
4.1. Research Questions
- RQs investigating the inputs of an approach:
- RQ1: What are the inputs needed by a concept drift analysis approach?—This question targets identifying the artefacts used by an approach as starting point for concept drift analysis. This kind of input affects the choice of the techniques to be used at the presence of this input, the expected outcome, and the kind of change to be studied, that is, whether the approach is concerned with analysing behavioural or structural process change.
- RQs investigating the processing phase of an approach:
- RQ2: How can concept drift analysis be classified to gain a more profound understanding of the characteristics of the various approaches?—Trying to find some matching criteria between approaches may help in gaining understanding of the potential building blocks of a concept drift approach, and hence, enabling forming this understanding into a guiding framework. Finding and exploring differences between contemporary approaches may help in identifying specific application scenarios for these approaches.
- RQ3: Which process perspectives are subject of concept drift analysis processes?—This research question studies which perspectives of an event log are covered an existing approach, which is crucial to judge the expressiveness of the approach. For example, the control-flow perspective may be used to study both structural and behavioural changes of a business process, whereas studying changes concerning the time perspective can facilitate the task of predictive monitoring, enriching it with information beneficial for performance-related predictions.
- RQ4: Which granularity level of a business process is mainly addressed by existing concept drift analysis approaches?—Studying the granularity level of a change (i.e., whether the change is applied at the process type or process instance level) is crucial to derive insights on the prevalence and effects of the change. A drift on the process type affects the related process variants, and hence, a larger number of process instances than a drift on the instance level which is limited to one single variant of this business process. Identifying the level at which a change occurs is important in making a decision on how to handle the change and in calculating the costs of making corrective actions or propagating the change.
- RQs investigating the outputs and evaluation of an approach:
- RQ5: Do existing approaches provide usable implementations?—The maturity of any approach is reflected by the availability of a technical implementation, being able to demonstrate its applicability and sustainability.
- RQ6: How does an approach to concept drift analysis foster a business process proactivity to change?—Studying a concept drift in an online setting has different implications than studying it in an offline mode. Furthermore, the ability of an approach to precisely define both the moment and type of drift is crucial for guiding the decision making process. Yielding precise outcomes of a concept drift analysis approach affects related procedures, for example, predicting the outcome of a process instance when knowing that this process instance was subject to a drift during its execution. Having accurate information about a drift enables process practitioners not only to react to changes, but also to make decisions about future improvements of the business process. Process practitioners may also make decisions taking future drifts into account. This way, concept drift analysis outcomes enable business process proactivity to changes.
- RQ7: How to evaluate concept drift analysis approaches?—Studying the nature and types of event logs contemporarily used to evaluate existing approaches is crucial for any successful evaluation process. In particular, three factors are crucial—the artefacts used for the evaluation, the techniques used, and the questions asked. For example, using an artificial event log tailored towards a specific scenario affects the ability to generalise the approach under evaluation. Note that the maturity of an approach is highlighted through the evaluation phase.
4.2. Search Keywords
“Concept drift in process mining” OR “change mining” OR “concept drift in business process” OR “concept drift in event log” OR “change in business process” OR “deviation in process mining” OR “deviance mining in business process” OR “business process variants” OR “mining process variants”
4.3. Sources
- Science Direct - Elsevier (https://www.sciencedirect.com/)
- IEEE Xplore Digital Library (https://ieeexplore.ieee.org/Xplore/home.jsp)
- SpringerLink (https://link.springer.com/)
- ACM Digital Library (https://dl.acm.org/)
- Google Scholar (https://scholar.google.com/)
- Abstract. This criterion was the necessary for gaining an overview of the topic and contributions of a study.
- Title. This criterion was the determinant in the first round of selecting relevant studies to be included in the SLR.
- Year of publication. The more recent a study is, the more it builds upon knowledge gained from previous works. Furthermore, more recent publications constitute potential sources for identifying other studies by conducting backward reference searching. Finally, less recent studies have been expected to be analysed in more recent ones, that is, this kind of analysis was surveyed as well.
- Type. This criterion was one of the determinants of the maturity level of a study, for example, publications in conference proceedings are expected to provide more mature approaches than the ones published in workshop proceedings. Moreover, journal publications are considered being most mature and detailed.
- Publisher. The publishing institution has a weight on the expected quality of research done through a study. For example, publishers like Elsevier, IEEE Xplorer, and Springer are known for editing high impact factor journals and conference proceedings.
- Number of citations. This criterion was one of the indicators of the relevance on the considered approach. Sometimes, the number of citations was influenced by the year of publication. Furthermore, we came to the conclusion that the number of citations is influenced by the direction an approach is pursuing. This intuition was developed after deriving the directions of studies, which are introduced in Section 5. However, we did not consider the number of citations as a main criterion for including or excluding a study as it will be affected by the year of publications, especially for the studies published between 2017 and 2019.
4.4. Inclusion and Exclusion Criteria
- Inclusion criteria:
- The study uses techniques relevant for analyzing concept drift in the context of process mining.
- The study deals with a case study that is relevant to process mining and there is a possibility of concept drift in this case study.
- The study uses event log-related techniques for analyzing concept drift.
- The study deals with change analysis in business processes.
- Exclusion criteria:
- The study deals with a topic in process mining other than concept drift.
- The study deals with concept drift in a field other than process mining.
- The study is a technical report or a thesis.
- The study is not presented entirely in English language.
- The study is not publicly available or is not published in conference proceedings, journals or books.
4.5. Study Selection
4.6. Data Extraction
- Inputs. This aspect deals with the necessary inputs used by the approach proposed in the study. An approach may have one or multiple inputs of different types (cf. RQ1).
- Change form. This aspect indicates whether the approach tries to identify drifts in the form of behavioural or structural changes (cf. RQ6).
- Regarded perspective. This aspect indicates which process perspectives a study considers, that is, whether to identify drifts regarding process activities and their control flow, or other aspects like time or organisational resources. The more perspectives a study covers, the more comprehensive it is (cf. RQ3).
- Study focus. This aspect describes which concept drift analysis task the study is concerned with (cf. RQ6).
- Handling mode. This aspect indicates whether the tasks of concept drift analysis are accomplished online or offline by the approach proposed in the study (cf. RQ6).
- Drift patterns. This aspect represents the type of drift is addressed by the approach (cf. RQ6).
- Theme. This aspect shows whether the approach represents an attempt to analyse concept drift at the process type level indicating the (variability) of a process, or at the process instance level referring to the (flexibility) of this process (cf. RQ4).
- Used techniques. This point indicates the category of techniques used by the approach (cf. RQ2).
- Tool availability. This point indicates whether there is an implementation of the approach (cf. RQ5).
- Evaluation input. This point represents the types of inputs the study relied on when evaluating the proposed approach (cf. RQ7).
- Evaluation (What/How). This point shows how the approach was evaluated (cf. RQ7).
- Initial list based on previous knowledge. This corresponds to a list constructed by the first author of this paper and revised by one of the co-authors. This list is based on previous knowledge of the point and its possible values in the context of process mining.
- Predefined list. The values of this list are extracted from either an existing cited study, or common preliminary knowledge of concept drift and process mining.
- Frequency counts. This count is used when the value of the criterion is clearly stated in the study.
- Content analysis techniques. Techniques described in Reference [37] to infer and extract information from text are applied.
4.7. Highlights after Data Extraction
5. Concept Drift Analysis Approaches in Process Mining
- windowing to select the traces that shall be considered for drift analysis and for which a statistical analysis is conducted, or on
- clustering-based techniques to find groups of traces that share similar characteristics that can be generalised and used to detect drifts.
5.1. Approaches Relying on Model-to-Log Alignment
5.2. Graph-Based Analysis Approaches
5.3. Clustering-Based Approaches
- Transforming the event log into a relation matrix based on two relations, namely direct succession and weak order relation. Note that the weak order relation does not provide a reliable characterisation of any trace, as it counts all potential relations that may occur between two activities in a given trace, whether or not these relations already exist.
- For each relation, a frequency level of values [always/never/sometimes] is determined to localize candidate change points, that is, positions where a change in the relations between two activities is detected. The latter occur when the frequency level for a given relation is changed from one value to another one within a given interval. The length of this interval is restricted by a pre-decided threshold.
- Candidate change points are combined for the whole event log to gain an overall view of change points in the whole process recorded in the event log.
5.4. Approaches Based on Windowing and Statistical Analysis
6. The CONDA-PM Framework
6.1. Goals Design Phase
- G01: Defining the granularity level at which the approach operates. This task is concerned with specifying whether the approach is going to analyse concept drift at the process type level (variability) or process instance level (flexibility). To evaluate the maturity of an approach in this respect, we use a 3-value scale (1, 2, 3) to indicate whether the approach analyses drifts at process level or process instance level or both, respectively. We tend to give higher values to flexibility, as concept drifts are more captured at less granularity levels, that is, the process instance level in this situation. At the process level, changes tend to be more planned, while capturing changes at the process instance level allows reflecting unplanned changes.
- G02: Configuring the handling mode in which a process instance is analysed. This task requires specifying whether the approach is going to address concept drift in an online mode (e.g., when many running process instances are under execution), or drift analysis is going to be addressed in an offline mode (i.e., after all process instances are complete and recorded in the event log). To evaluate maturity of an approach in this respect, we use a 2-value scale (1, 2) to indicate whether the approach analyses drifts in offline or onlines mode, respectively. We tend to give higher values to analysing concept drift in an online mode, as the outcomes of conducting online analysis of concept drift might increase robustness, that is, not just responding to change, but also overseeing it and carrying on proactive decision making process. Furthermore, analysing drifts online fosters predictive monitoring and provides more insights into the current status of a running process instance.
6.2. Approach Coding Phase
6.2.1. Input Investigation
- AI1: Choosing artefacts to be used. This task is concerned with specifying the inputs that the approach shall process. To evaluate the maturity of an approach in this respect, we use a 2-value scale (1, 2) to indicate whether the approach uses the event log solely or along with the process model, respectively. We tend to give higher values to an approach which uses a process model along with an event log, because this may yield more comprehensive outcomes, enable studying different forms of change, complement event log chosen perspectives, and give a wider spectrum of techniques to be used.
- AI2: Analysing inputs to define regarded event log perspectives. This task is concerned with specifying for which event log perspective the approach shall analyse changes. It is also needed to be decided whether the approach shall focus on one perspective or shall consider multiple event log perspectives. To evaluate the maturity of an approach in this respect, we use a 3-value scale (1, 2, 3) to indicate whether the approach studies only drifts in the control flow perspective or two perspectives or more, respectively.
- AI3: Defining the concept drift type to be addressed. This task is concerned with deciding which concept drift category will be addressed (cf. Section 3.2.1). Deciding which category is highly affected by the variance in data available through an event log. To evaluate the maturity of an approach in this respect, we use a 3-value scale (0, 1, 2) to indicate whether the approach does not specify the type of the drift or studies only sudden drifts or studies two or more drift types, respectively.
6.2.2. Processing
- AP1: Concept drift task addressed. This task is concerned with deciding whether the approach shall detect, localise, or predict the evolution of a concept drift (i.e., how the business process shall be affected by a combination of drifts, not one drift solely), or enable a combination of these tasks (cf. Section 3.2.2). To evaluate the maturity of an approach in this respect, we use a 4-value scale:
- −
- Value of 0 indicates that the focus of an approach is not clear through the study.
- −
- Value of 1 indicates that the approach is only able to detect a drift or localise it.
- −
- Value of 2 indicates that the approach is concerned with both detection and localisation.
- −
- Value of 3 indicates that the approach handles all three tasks of concept drift analysis.
- AP2: Defining the change form to be studied. This task is concerned with defining the change form supported by the approach. The decision on this is influenced by the type of inputs as well as the process perspectives covered by the approach. To evaluate the maturity of an approach in this respect, we use a 2-value scale (1, 2) to indicate whether the approach either analyses structural or behavioural changes or the former studies both forms of change, respectively.In order to evaluate the maturity of an approach in tasks AI2, AI3, AP1, and AP2, we tend to give higher value to choices which may enable a more comprehensive approach. Note that in AP2, we tend to assign the same value (i.e., 1) to an approach when it analyses either forms of change. This equal assignment is due to the fact that a more comprehensive approach is the one that provides mechanisms to analyse both forms of change. Whereas, analysing any of the two forms of change does not have priority over the other one.
- AP3: Applying suitable techniques. This task is concerned with deciding which techniques shall be used to analyse concept drift. Note that the chosen technique depends on the outcomes of the former tasks.The techniques used in the 19 selected studies along with a discussion of their strengths and limitations are presented in Section 5. Furthermore, we list the used techniques as a criterion extracted from the selected studies and used to evaluate these studies in Table 9.
6.3. Implementation Phase
- I01: Approach implementation and tool availability. This task is concerned with coding an approach and implementing the proposed algorithm. To evaluate the maturity of an approach in this respect, we use a 2-value scale (0, 1) to indicate whether the approach is limited to the theoretical concept described by the respective study or there is an available implementation, respectively.
- I02: Choosing deployment platform. This task is concerned with deploying the implementation of an approach and making it available for further usage and evaluation by other researchers. To evaluate the maturity of an approach in this respect, we use a 3-value scale (0, 1, 2) to indicate if there is not an available implementation or there is a prototype or a package is implemented in either ProM or Apromore (https://apromore.org/), respectively. We tend to give higher values to approaches where an implementation is made available through a platform like, for example, ProM or Apromore.
6.4. Evaluation Phase
- E01: Choosing evaluation artefacts. This task is concerned with choosing to evaluate the applicability of an approach on event logs available from either real-life case studies, or artificially generated. To evaluate the maturity of an approach in this respect, we use a 4-value scale (0, 1, 2, 3) to indicate whether no evaluation input is used or the approach is evaluated using only artificial logs or real-life logs or both, respectively.
- E02: Evaluation criteria (i.e., what to evaluate). This task is concerned with defining which aspects of the approach with be evaluated. We list which aspects are evaluated as a criterion extracted (if available) from the 19 selected studies and used to evaluate them in Table 10.
- E03: Used evaluation techniques and metrics (i.e., how to evaluate). This task is concerned with defining how to apply and evaluate an approach, that is, choosing evaluation techniques and metrics. We list how the approaches in the 19 selected studies are evaluated (if available) as a criterion in Table 10.
7. CONDA-PM Application and Evaluation
8. Discussion
9. Related Work
10. Summary and Outlook
Author Contributions
Funding
Conflicts of Interest
Glossary
CONDA-PM | CONcept Drift Analysis in Process Mining Framework. A four-staged framework providing guidance on the fundamental components of a concept drift analysis approach in the context of process mining |
SLR | Systematic Literature Review. A survey of a topic conducted according to systematic steps and adopts a certain format |
concept drift | the situation in which the process is changing while being analysed |
normative process model | an ideal view on how business process activities shall be performed and used for monitoring and inspecting how running process instances are executed |
descriptive model | a process model which captures how process activities are carried out |
An event log | a record of the actual execution of a process in terms of cases |
process instance | notion of process executions at the conceptual view of a business process |
case | notion of process executions at the data view of a business process |
trace | notion of process executions at the logical view of a business process |
An event stream | a collection of events representing an incomplete running process instance |
Control-flow perspective | order of activities executed in a business process |
Organisational perspective | organisational resources and roles involved in the execution of events |
Time perspective | the point in time events occur |
Case perspective | properties of process instances as they are stored in an event log |
A process history | a list of process models discovered for a business process whenever a violation is detected |
A process variant | an execution path of a business process which results in different process instances which share some commonalities like sharing some same activities |
Behavioural profiles | a representation of relations between every pair of events or activities appearing in an event log |
An execution graph | a directed acyclic graph used to represent the execution of a process instance in terms of nodes (i.e., events), edges (i.e. relations between the nodes), and functions assigning each event to its activity class |
Procedural process models | a conceptual view of how activities should be carried out |
declarative process models | a formalisation of the undesired behaviour through defining a set of constraints. In these models, the order of activities execution is not rigid |
noise | rare and infrequent behaviour which may appear in an event log |
Deviations | additional behaviour observed in the event log, but not foreseen in the normative process |
deviance mining | a family of process mining techniques aimed at analyzing event logs in order to explain the reasons why a business process deviates from its normal or expected execution |
business process drift detection | a family of techniques to analyse event logs or event streams generated during the execution of a business process in order to detect points in time when the behaviour of recent executions of the process differs significantly from that of older cases |
change log | a log created and maintained by adaptive PAISs, and contains information about process changes at both type and instance level |
variability | the process of providing a customisable process model |
flexibility | regards the changing circumstances and variations at the process instance level |
sudden concept drift | a drift happening suddenly at a certain point during the execution of a process instance |
Incremental concept drift | a drift is introduced incrementally into the running process until the process reaches the desired version |
Gradual concept drift | a drift type when two versions of the same process co-exist and process instances following both versions are running, till a certain point when only the new process model is adopted |
Recurring concept drift | a drift caused by the process context or environment and takes place only over a defined period of time |
multi-order drift | a change taking place on a macro level involving two different process model versions of the same process and on a micro level representing a slight change in the same process model |
Concept drift analysis tasks | concept drift detection, concept drift localisation, and concept drift evolution analysis |
flexibility by change | a condition where a certain change is applied to selected traces at runtime |
variability by restriction | provides a process model with all allowed behavior and restricting models that may be configured from this model by adding more behavior |
variability by extension | provides a process model whose behaviour can be extended by deriving other process models based on the behaviour explained by it |
Vertical partitioning | the situation when traces in an event log are divided based on a selected criterion, e.g., case Id or start time of the first event in a trace and they are processed on several computing machines |
horizontal partitioning | the situation when traces in an event log are divided based on activities |
alignment matrix | a matrix used in log-to-model alignment to enable replaying logs on process variants to decide on the most frequent process variant or the most relevant one |
data-based partial ordering | two events are dependent if they access the same data attributes |
Time-based partial ordering | two events are dependent if they have different timestamps |
ERTC | a measure used for quantifying the extent of the drift change |
VIVACE | a framework which analyses how different process variability approaches support process variability through a business process lifecycle |
References
- van der Aalst, W.; Adriansyah, A.; de Medeiros, A.K.A.; Arcieri, F.; Baier, T.; Blickle, T.; Bose, J.C.; van den Brand, P.; Brandtjen, R.; Buijs, J.; et al. Process mining manifesto. In Business Process Management Workshops; Daniel, F., Barkaoui, K., Dustdar, S., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2012; Volume 99, pp. 169–194. [Google Scholar]
- van der Aalst, W.M.P. Process Mining: Data Science in Action, 2nd ed.; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2016; 467p. [Google Scholar]
- Stertz, F.; Rinderle-Ma, S. Process histories-detecting and representing concept drifts based on event streams. In On the Move to Meaningful Internet Systems; Panetto, H., Debruyne, C., Proper, H., Ardagna, C., Roman, D., Meersman, R., Eds.; LNCS sublibrary. SL 2, Programming and software engineering; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11229, pp. 318–335. [Google Scholar]
- Bolt, A.; der Aalst, W.M.; de Leoni, M. Finding process variants in event logs. In On the Move to Meaningful Internet Systems. OTM 2017 Conferences, Proceedings of the Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, 23–27 October 2017; Panetto, H., Debruyne, C., Gaaloul, W., Mike, P., Paschke, A., Ardagna, C., Meersman, R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume10573, pp. 45–52. [Google Scholar]
- Lu, X.; Fahland, D.; van den Biggelaar, F.J.H.M.; van der Aalst, W.M.P. Detecting deviating behaviors without models. In Business Process Management Workshops, Proceedings of the 13th International Workshops on Business Process Management Workshops (BPM 2015), Innsbruck, Austria, 31 August–3 September 2015; Reichert, M., Reijers, H., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2016; Volume 256, pp. 126–139. [Google Scholar]
- Armas-Cervantes, A.; García-Ba nuelos, L.; Dumas, M. Event structures as a foundation for process model differencing, part 1: Acyclic processes. In Web Services and Formal Methods, Proceedings of the 9th International Workshop, WS-FM 2012, Tallinn, Estonia, 6–7 September 2012; Lecture Notes in Computer Science; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J., Mattern, F., Mitchell, J., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7843, pp. 69–86. [Google Scholar]
- Pedro, H.; Richetti, P.; Bai ao, F.; Santoro, F.M. Declarative process mining: Reducing discovered models complexity by pre-processing event logs. In Business Process Management, Prceedings of the 12th International Conference, BPM 2014, Haifa, Israel, 7–11 September 2014; Sadiq, S., Soffer, P., Völzer, H., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8659, pp. 400–407. [Google Scholar]
- Yeshchenko, A.; di Ciccio, C.; Mendling, J.; Polyvyanyy, A. Comprehensive Process Drift Detection with Visual Analytics. In Conceptual Modeling, Proceedings of the 38th International Conference on Conceptual Modeling (ER 2019), Salvador, Bahia, Brazil, 4–7 November 2019; Laender, A.H.F., Pernici, B., Lim, E., de Oliveira, J.P.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 119–135. [Google Scholar]
- Maaradji, A.; Dumas, M.; Rosa, M.; Ostovar, A. Detecting sudden and gradual drifts in business processes from execution traces. IEEE Trans. Knowl. Data Eng. 2017, 29, 2140–2154. [Google Scholar] [CrossRef]
- Aysolmaz, B.; Schunselaar, D.M.M.; Reijers, H.; Yaldiz, A. Selecting a process variant modeling approach: Guidelines and application. Softw. Syst. Model. 2019, 18, 1155–1178. [Google Scholar]
- Li, C.; Reichert, M.; Wombacher, A. Mining business process variants: Challenges, scenarios, algorithms. Data Knowl. Eng. 2011, 70, 409–434. [Google Scholar] [CrossRef] [Green Version]
- Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 1–37. [Google Scholar] [CrossRef]
- Bose, R.J.; van der Aalst, W.M.P.; Žliobaitė, I.; Pechenizkiy, M. Dealing with concept drifts in process mining. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 154–171. [Google Scholar] [CrossRef] [PubMed]
- Weijters, A.J.M.M.; van der Aalst, W.M.P.; De Medeiros, A.K. Process Mining with the Heuristics Miner Algorithm; BETA Publicatie: Working Papers; Technische Universiteit Eindhoven: Eindhoven, The Netherlands, 2006; Volume 166. [Google Scholar]
- Fuzzy Miner in ProM: Tutorial. Available online: http://www.processmining.org/online/fuzzyminer (accessed on 16 June 2020).
- Nguyen, H.; Dumas, M.; La Rosa, M.; Maggi, F.; Suriadi, S. Business Process Deviance Mining: Review and Evaluation; CoRR: Leawood, KS, USA, 2016. [Google Scholar]
- van der Aalst, W.M.P.; Weijters, A.J.M.M. Process mining: A research agenda. Comput. Ind. 2004, 53, 231–244. [Google Scholar] [CrossRef]
- Gunther, C.W.; Rinderle-Ma, S.; Reichert, M.; van der Aalst, W.M.P.; Recker, J. Using process mining to learn from process changes in evolutionary systems. Int. J. Bus. Process Integr. Manag. 2008, 3, 61–78. [Google Scholar] [CrossRef]
- van der Aalst, W.M.P.; Gunther, C.W.; Recker, J.; Reichert, M. Using process mining to analyze and improve process flexibility. In Proceedings of the BPMDS’06: Seventh Workshop on Business Process Modeling, Development, and Support, Luxemburg, 5–9 June 2006; Regev, G., Soffer, P., Schmidt, R., Eds.; 2006; Volume 236, pp. 168–177. [Google Scholar]
- La Rosa, M.; van der Aalst, W.M.P.; Dumas, M.; Milani, F.P. Business process variability modeling: A survey. ACM Comput. Surv. 2017, 50, 1–45. [Google Scholar] [CrossRef] [Green Version]
- Martjushev, J.; Bose, R.P.J.C.; van der Aalst, W.M.P. Change point detection and dealing with gradual and multi-order dynamics in process mining. In Perspectives in Business Informatics Research, Proceedings of the 14th International Conference, BIR 2015, Tartu, Estonia, 26–28 August 2015; Matulevičius, R., Dumas, M., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2015; Volume 229, pp. 161–178. [Google Scholar]
- Kumar, M.; Thomas, L.; Basava, A. Capturing the sudden concept drift in process mining. In Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data, Brussels, Belgium, 22–23 June 2015; Volume 1371, pp. 132–143. [Google Scholar]
- Song, M.; Gunther, C.W.; van der Aalst, W.M.P. Trace clustering in process mining. In Business Process Management Workshops, Proceedings of the BPM 2008 International Workshops, Milano, Italy, 1–4 September 2008; Ardagna, D., Mecella, M., Yang, J., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2009; Volume 17, pp. 109–120. [Google Scholar]
- Fokkens, A.; Braake, S.; Maks, I.; Ceolin, D. On the Semantics of Concept Drift: Towards Formal Definitions of Concept Drift: Towards Formal Definitions of Concept and Semantic Change. 2016. Available online: http://ceur-ws.org/Vol-1799/Drift-a-LOD2016_paper_2.pdf (accessed on 2 July 2020).
- Kitchenham, B.A.; Charters, S. Guidelines for p Erforming Systematic Literature Reviews in Software Engineering. 2007. Available online: https://edisciplinas.usp.br/pluginfile.php/4108896/mod_resource/content/2/slrPCS5012_highlighted.pdf (accessed on 2 July 2020).
- Florian Stertz and Stefanie Rinderle-Ma. Detecting and identifying data drifts in process event streams based on process histories. In Information Systems Engineering in Responsible Information Systems, Proceedings of the CAiSE: International Conference on Advanced Information Systems Engineering, Rome, Italy, 3–7 June 2019; Cappiello, C., Ruiz, M., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2019; Volume 350, pp. 240–252. [Google Scholar]
- Nguyen, H.; Dumas, M.; Marcello, L.R.; Arthur, H.M. Multi-perspective comparison of business process variants based on event logs. In Conceptual Modeling, Proceedings of the 37th International Conference, ER 2018, Xi’an, China, 22–25 October 2018; Trujillo, J.C., Davis, K.C., Du, X., Li, Z., Ling, T.W., Li, G., Lee, M.L., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11157, pp. 449–459. [Google Scholar]
- Hompes, B.F.A.; Buijs, J.C.A.M.; van der Aalst, W.M.P.; Dixit, P.M.; Buurman, J. Detecting changes in process behavior using comparative case clustering. In Data-Driven Process Discovery and Analysis, Proceedings of the 5th IFIP WG 2.6 International Symposium, SIMPDA 2015, Vienna, Austria, 9–11 December 2015; Ceravolo, P., Rinderle-Ma, S., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2017; Volume 244, pp. 54–75. [Google Scholar]
- Zheng, C.; Wen, L.; Wang, J. Detecting process concept drifts from event logs. In Proceedings of the Move to Meaningful Internet Systems, OTM 2017 Conferences Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, 23–27 October 2017; Hervé, P., Debruyne, C., Gaaloul, W., Papazoglou, M., Paschke, A., Ardagna, C.A., Meersman, R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10573, pp. 524–542. [Google Scholar]
- Seeliger, A.; Nolle, T.; Max, M. Detecting concept drift in processes using graph metrics on process graphs. In Proceedings of the 9th Conference on Subject-oriented Business Process Management, Darmstadt, Germany, 30–31 March 2017; ACM Press: New York, NY, USA, 2017; pp. 1–10. [Google Scholar]
- Ostovar, A.; Maaradji, A.; La Rosa, M.; Arthur, H.M.; van Dongen, B.F.V. Detecting drift from event streams of unpredictable business processes. In Conceptual Modeling, Proceedings of the 35th International Conference, ER 2016, Gifu, Japan, 14–17 November 2016; Comyn-Wattiau, I., Tanaka, K., Song, I., Shuichiro, Y., Saeki, M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9974, pp. 330–346. [Google Scholar]
- Hompes, B.F.A.; Buijs, J.C.A.M.; van der Aalst, W.M.P.; Dixit, P.M. Discovering deviating cases and process variants using trace clustering. In Proceedings of the 27th Benelux Conference on Artificial Intelligence, Hasselt, Belgium, 5–6 November 2015. [Google Scholar]
- Buijs, J.C.A.M.; Reijers, H.A. Comparing business process variants using models and event logs. In Enterprise, Business-Process and Information Systems Modeling; van der Aalst, W.M.P., Mylopoulos, J., Rosemann, M., Shaw, M.J., Szyperski, C., Bider, I., Gaaloul, K., Krogstie, J., Nurcan, S., Proper, H.A., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; Volume 175, pp. 154–168. [Google Scholar]
- Van der Aalst, W.M.P. Distributed process discovery and conformance checking. In Fundamental Approaches to Software Engineering, Proceedings of the 15th International Conference, FASE 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, 24 March–1 April 2012; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J., Mitchell, J., Naor, M., Nierstrasz, O., Pandu, R.C., Steffen, B., Sudan, M., et al., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7212, pp. 1–25. [Google Scholar]
- Carmona, J.; Gavaldà, R. Online techniques for dealing with concept drift in process mining. In Advances in Intelligent Data Analysis XI, Proceedings of the 11th International Symposium, IDA 2012, Helsinki, Finland, 25–27 October 2012; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Steffen, B., Sudan, M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7619, pp. 90–102. [Google Scholar]
- Luengo, D.; Marcos, S. Applying clustering in process mining to find different versions of a business process that changes over time. In Business Process Management Workshops, Proceedings of the7th International Workshop on Business Process Intelligence, Clermont-Ferrand, France, 29 August 2011; Daniel, F., Barkaoui, K., Dustdar, S., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2012; Volume 99, pp. 153–158. [Google Scholar]
- Hsieh, H.; Shannon, S.E. Three approaches to qualitative content analysis. Qual. Health Res. 2005, 15, 1277–1288. [Google Scholar] [CrossRef] [PubMed]
- van Dongen, B.F.; Verbeek, H.M.W.; Weijters, A.J.M.M.; van der Aalst, W.M.P. The ProM Framework: A New Era in Process Mining Tool Support. In Applications and Theory of Petri Nets 2005. ICATPN 2005; Ciardo, G., Darondeau, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3536, pp. 444–454. [Google Scholar]
- van Dongen, B.F. (Boudewijn), BPI Challenge 2012. 4TU.Centre for Research Data. Dataset. 2012. Available online: https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f (accessed on 2 July 2020).
- Ward Steeman, BPI Challenge 2013. Ghent University. Dataset. 2013. Available online: https://doi.org/10.4121/uuid:a7ce5c55-03a7-4583-b855-98b86e1a2b07 (accessed on 2 July 2020).
- van Dongen, B.F. (Boudewijn), BPI Challenge 2014. 4TU.Centre for Research Data, Dataset. 2014. Available online: https://doi.org/10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35 (accessed on 2 July 2020).
- Van Dongen, B.F. (Boudewijn), BPI Challenge 2015. 4TU.Centre for Research Data, Dataset. 2015. Available online: https://doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1 (accessed on 2 July 2020).
- Dees, M.; van Dongen, B.F. (Boudewijn), BPI Challenge 2016. 4TU.Centre for Research Data, Dataset. 2016. Available online: https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab (accessed on 2 July 2020).
- van Dongen, B.F. (Boudewijn), BPI Challenge 2017. 4TU.Centre for Research Data, Dataset. 2017. Available online: https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b (accessed on 2 July 2020).
- van Dongen, B.F.; Borchert, F. BPI Challenge 2018, Eindhoven University of Technology, Dataset. 2018. Available online: https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972 (accessed on 2 July 2020).
- van Dongen, B.F. (Boudewijn), BPI Challenge 2019. 4TU.Centre for Research Data. Dataset. 2019. Available online: https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1 (accessed on 2 July 2020).
- Remco, M. Dijkman, Marlon Dumas, and Luciano García-Ba nuelos. Graph matching algorithms for business process model similarity search. In Business Process Management, Proceedings of the 7th International Conference, BPM 2009, Ulm, Germany, 8–10 September 2009; Dayal, U., Eder, J., Koehler, J., Reijers, H.A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5701, pp. 48–63. [Google Scholar]
- Lu, X.; Fahland, D.; van der Aalst, W.M.P. Conformance checking based on partially ordered event data. In Business Process Management Workshops, Proceedings of BPM 2014 International Workshops, Eindhoven, The Netherlands, 7–8 September 2014; Fournier, F., Mendling, J., Eds.; Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2015; Volume 202, pp. 75–88. [Google Scholar]
- Maaradji, A.; Dumas, M.; La Rosa, M.; Ostovar, A. Fast and accurate business process drift detection. In Business Process Management, Proceedings of the 13th International Conference, BPM 2015, Innsbruck, Austria, 31 August–3 September 2015; Motahari-Nezhad, H.R., Recker, J., Weidlich, M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9253, pp. 406–422. [Google Scholar]
- Dijkman, R.M.; van Dongen, B.F.V.; Marlon, D.; Luciano, G.; Kunze, M.; Leopold, H.; Mendling, J.; Uba, R.; Weidlich, M.; Weske, M.; et al. A short survey on process model similarity. Seminal Contributions to Information Systems Engineering- 25 years of CAiSE. In Seminal Contributions to Information Systems Engineering; Bubenko, J., Krogstie, J., Pastor, O., Pernici, B., Rolland, C., Arne, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 4275, pp. 421–427. [Google Scholar]
- Kaes, G.; Rinderle-Ma, S. Mining and querying process change information based on change trees. In Service-Oriented Computing, Proceedings of the 13th International Conference, ICSOC 2015, Goa, India, 16–19 November 2015; Barros, A., Grigori, D., Narendra, N.C., Dam, H.K., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9435, pp. 269–284. [Google Scholar]
- Kaes, G.; Rinderle-Ma, S. Improving the usability of process change trees based on change similarity measures. In Enterprise, Business-Process and Information Systems Modeling; Gulden, J., Reinhartz-Berger, I., Schmidt, R., Sérgio, G., Wided, G., Bera, P., Eds.; Springer: Cham, Switzerland, 2018; Volume 318, pp. 147–162. [Google Scholar]
- Ayora, C.; Torres, V.; Weber, B.; Reichert, M.; Pelechano, V. Vivace: A framework for the systematic evaluation of variability support in process-aware information systems. Inf. Softw. Technol. 2015, 57, 248–276. [Google Scholar] [CrossRef] [Green Version]
- Thaler, T.; Ternis, S.F.; Fettke, P.; Loos, P. A comparative analysis of process instance cluster techniques. In Smart Enterprise Engineering, Proceedings of the 12th International Conference on Wirtschaftsinformatik, Osnabrück, Germany, 4–6 March, 2015; Thomas, O., Teuteberg, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; pp. 423–437. [Google Scholar]
- Di Francescomarino, C.; Ghidini, C.; Maggi, F.M.; Milani, F. Predictive process monitoring methods: Which one suits me best? In Business Process Management, Proceedings of the 16th International Conference, BPM 2018, Sydney, NSW, Australia, 9–14 September 2018; Weske, M., Montali, M., Weber, I., Vom Brocke, J., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11080, pp. 462–479. [Google Scholar]
Case ID | Event ID | Properties | |||
---|---|---|---|---|---|
Timestamp | Activity | Resource | Cost | ||
1 | 3561 | 30 December 2018 10:02:15 | Enter Loan Application | Sara | 120 |
3562 | 31 December 2018 09:08:22 | Retrieve Applicant Data | Sue | 100 | |
3563 | 1 January 2019 12:20:00 | Compute Installments | Mike | 80 | |
3564 | 5 January 2019 10:10:28 | Notify Eligibility | Pete | 150 | |
3565 | 10 January 2019 08:30:10 | Approve Simple Application | Sue | 50 | |
.................... | |||||
7 | 3581 | 30 December 2018 14:02:26 | Enter Loan Application | Pete | 170 |
3582 | 8 January 2019 10:02:15 | Retrieve Applicant Data | Sara | 97 | |
3583 | 15 January 2019 15:08:09 | Compute Installments | Sue | 250 | |
3584 | 22 January 2019 19:20:10 | Notify Rejection | Pete | 60 | |
.................... | |||||
55 | 3600 | 30 February 2019 08:44:05 | Enter Loan Application | Mike | 54 |
3601 | 10 March 2019 10:15:25 | Retrieve Applicant Data | Sara | 47 | |
3602 | 15 March 2019 11:33:29 | Compute Installments | Pete | 210 | |
3603 | 22 March 2019 13:30:44 | Notify Eligibility | Sue | 64 | |
3604 | 30 March 2019 09:10:50 | Approve Complex Application | Mike | 120 |
a | b | c | d | e | |
---|---|---|---|---|---|
a | |||||
b | |||||
c | |||||
d | |||||
e |
Study Id | Authors | Year | Publication Type | Bibliography No. |
---|---|---|---|---|
S1 | Stertz & Rinderle-Ma | 2019 | Conference | [26] |
S2 | Yeshchenko et al. | 2019 | Conference | [8] |
S3 | Stertz & Rinderle-Ma | 2018 | Conference | [3] |
S4 | Nguyen et al. 2018 | 2018 | Conference | [27] |
S5 | Maaradji et al. | 2017 | Journal | [9] |
S6 | Hompes et al. | 2017 | Symposium | [28] |
S7 | Bolt 2017 | 2017 | Conference | [4] |
S8 | Zheng et al. | 2017 | Conference | [29] |
S9 | Seeliger et al. | 2017 | Conference | [30] |
S10 | Lu et al. | 2016 | Workshop | [5] |
S11 | Ostovar et al. | 2016 | Conference | [31] |
S12 | Martjushev et al. | 2015 | Conference | [21] |
S13 | Hompes et al. | 2015 | Conference | [32] |
S14 | Bose et al. | 2014 | Journal | [13] |
S15 | Buijs & Reijers | 2014 | Conference | [33] |
S16 | Van der Aalst | 2012 | Conference | [34] |
S17 | Carmona & Gavaldà | 2012 | Symposium | [35] |
S18 | Luengo & Sepùlveda | 2012 | Workshop | [36] |
S19 | Song et al. | 2009 | Workshop | [23] |
RQ | Criteria | Extracted Data | Analysis Type |
---|---|---|---|
General Information | Title | Free text | None |
Year | Date | None | |
Type | Free text | None | |
Publisher | Free text | None | |
Number of citations | Number | None | |
RQ1 | Inputs | Initial list based on previous knowledge | Frequency counts |
RQ5 | Change form | Initial list based on previous knowledge | Content analysis techniques |
RQ4 | Regarded perspectives | Predefined list based obtained from [2] | Frequency counts |
RQ5 | Study focus | Predefined list based on existing concept drift task | Content analysis techniques |
RQ5 | Handling mode | Predefined list obtained from [22] | Frequency counts |
RQ5 | Drift patterns | Predefined list based on existing concept drift types | Frequency counts |
RQ6 | Theme | Predefined list obtained from [20] | Content analysis techniques |
RQ2 | Used techniques | Free text | Content analysis techniques |
RQ3 | Tool availability | Free text | None |
RQ7 | Evaluation input | Numbers, free text | None |
RQ7 | Evaluation (What/How) | Free text | Content analysis techniques |
The CONDA-PM Framework | ||
---|---|---|
Goals Design | G01: Defining the granularity level at which the approach operates | |
G02: Configuring the handling mode in which a process instance is analysed | ||
Approach Coding | Input Investigation | AI1: Choosing artefacts to be used |
AI2: Analysing inputs to define regarded event log perspectives | ||
AI3: Defining concept drift type to be addressed | ||
Processing | AP1: Concept drift task addressed | |
AP2: Defining the change form to be studied | ||
AP3: Applying suitable techniques | ||
Implementation | I01: Approach implementation and tool availability | |
I02: Choosing deployment platform | ||
Evaluation | E01: Choosing evaluation artefacts | |
E02: Evaluation criteria (what to evaluate) | ||
E03: Used evaluation techniques and metrics (how to evaluate) |
Dimension | SLR Criteria | Maturity Evaluation Scale |
---|---|---|
G01 | Theme (Variability / Flexibility) | 3-value scale [1–3] |
G02 | Handling modes (Online/Offline) | 2-value scale [1–2] |
AI1 | Input (Event log/ Process model) | 2-value scale [1–2] |
AI2 | Regarded perspectives (Control-flow/Data/Resources/Time) | 3-value scale [1–3] |
AI3 | Drift patterns (Sudden/Gradual/Recurring/Incremental/ Multi-order) | 3-value scale [0–2] |
AP1 | Study focus (Detection/ Localisation/ Evolution) | 4-value scale [0–3] |
AP2 | Change form (Behavioural/ Structural) | 2-value scale [1–2] |
AP3 | Used techniques | Description & criticism (if available, in Section 5 & Table 9) |
I01 | Tool availability | 2-value scale [0–1] |
I02 | Tool availability (platform) | 3-value scale [0–2] |
E01 | Evaluation input (Artificial / Real-life) | 4-value scale [0–3] |
E02 | Evaluation (What) | Description & criticism (if available, in Section 5 & Table 10) |
E03 | Evaluation (How) | Description & criticism (if available, in Section 5 & Table 10) |
Publication | Theme | Handling Mode | ||
---|---|---|---|---|
Variability | Flexibility | Online | Offline | |
S1 | ✓ | ✓ | ||
S2 | ✓ | ✓ | ||
S3 | ✓ | ✓ | ||
S4 | ✓ | ✓ | ||
S5 | ✓ | ✓ | ||
S6 & S13 | ✓ | ✓ | ||
S7 | ✓ | ✓ | ||
S8 | ✓ | ✓ | ||
S9 | ✓ | ✓ | ||
S10 | ✓ | ✓ | ||
S11 | ✓ | ✓ | ||
S12 | ✓ | ✓ | ✓ | |
S14 | ✓ | ✓ | ||
S15 | ✓ | ✓ | ||
S16 | ✓ | ✓ | ||
S17 | ✓ | ✓ | ||
S18 | ✓ | ✓ | ||
S19 | ✓ | ✓ |
Publication | Input | Regarded Perspectives | Patterns of Change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Event log | Process Model | Control-Flow | Data | Resources | Time | Sudden | Gradual | Recurring | Incremental | Multi-Order | |
S1 | Event Streams | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
S2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
S3 | Event Streams | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
S4 | ✓ | ✓ | ✓ | ✓ | ✓ | (NA) | |||||
S5 | ✓ | ✓ | ✓ | ✓ | |||||||
S6 & S13 | ✓ | ✓ | ✓ | ✓ | (NA) | ||||||
S7 | ✓ | ✓ | ✓ | ✓ | ✓ | (NA) | |||||
S8 | ✓ | ✓ | ✓ | ||||||||
S9 | ✓ | ✓ | ✓ | (NA) | |||||||
S10 | ✓ | ✓ | (NA) | ||||||||
S11 | ✓ | ✓ | (NA) | ||||||||
S12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
S14 | ✓ | ✓ | ✓ | ✓ | |||||||
S15 | ✓ | ✓ | ✓ | (NA) | |||||||
S16 | ✓ | ✓ | ✓ | (NA) | |||||||
S17 | ✓ | ✓ | ✓ | ||||||||
S18 | ✓ | ✓ | ✓ | (NA) | |||||||
S19 | ✓ | ✓ | (NA) |
Publication | Study Focus | Change Form | Used Techniques | |||
---|---|---|---|---|---|---|
Detection | Localisation | Evolution | Behavioural | Structural | ||
S1 | ✓ | ✓ | ✓ | Sliding window, Statistical Analysis for outlier detection & Serialisation in YAML | ||
S2 | ✓ | ✓ | ✓ | ✓ | Sliding window, Hierarchical Clustering, Change point detection using PELT & Declarative constraints discovery using MINERFUL | |
S3 | ✓ | ✓ | Sliding window, Stream–Based Abstract Representation (SBAR), Hashtables, Inductive Miner | |||
S4 | (NA) | ✓ | Perspective graphs & finding statistical differences between them to result in differential perspective graphs | |||
S5 | ✓ | ✓ | ✓ | ✓ | Adaptive windowing, statistical testing (Chi–Square test) & oscillation filter | |
S6 & S13 | ✓ | ✓ | MCL & cosine similarity | |||
S7 | (NA) | ✓ | Recursive Partitioning by Conditional Inference (RPCI), points of interest & event augmentation | |||
S8 | ✓ | ✓ | DBSCAN clustering | |||
S9 | ✓ | ✓ | ✓ | Adaptive windowing, Heuristic mining & G-Tests | ||
S10 | ✓ | ✓ | ✓ | ✓ | Backtracking, heuristic function, a greedy algorithm, cost function & REGs | |
S11 | ✓ | ✓ | ✓ | Adaptive windowing, statistical testing (G–test of independence) | ||
S12 | ✓ | ✓ | ✓ | ✓ | Adaptation of ADWIN approach | |
S14 | ✓ | ✓ | ✓ | ✓ | Statistical hypotheses testing using Kolmogorov–Smirnov (KS) test, Mann–Whitney test, Hotelling T2 test & windowing | |
S15 | ✓ | ✓ | ✓ | Alignment matrix | ||
S16 | ✓ | ✓ | Vertical & horizontal partitioning | |||
S17 | ✓ | Proposed Ideas | ✓ | Abstract interpretation, Parikh vectors & Adaptive windowing | ||
S18 | ✓ | ✓ | Euclidean Distance & Agglomerative Hierarchical Clustering (AHC) | |||
S19 | ✓ | ✓ | Trace Clustering / divide & conquer (trace profiles) |
Publication | Tool Availability | Evaluation Input | Evaluation (What/How) | |
---|---|---|---|---|
Artificial | Real-life | |||
S1 | prototype | log from the manufacturing domain with 10 process instances & 40436 events | the ability of the approach to identify drifts with different thresholds for outlier detection | |
S2 | Python implementation of the algorithms | four synthetic logs | Italian IT help desk log & BPIC’11 log | what: the ability to identify different drift types & plotting drifts as charts & maps How: comparing the detection results to those in Reference [31] and computing F-scores |
S3 | a tool to synthesis process execution logs +a web service to transform static logs into event streams + a web service to run algorithms on events | Insurance process augmented by cases representing all types of drifts | The ability of the proposed algorithms to detect the 4 types of drifts | |
S4 | Multi Perspective Process Comparator MPC plugin in ProM | BPIC13 & BPIC15 –BPIC13 (cases of an IT incident handling process at Volvo Belgium–IT teams are organized into tech–wide functions, organization lines & countries | Compared to ProcessComparator plugin in ProM regarding: differences at the event level (intra–event & inter–event relations) –differences at the fragment level(intra–fragment & inter–fragment graphs)–time–wise differences compared to case–wise differences using a sliding window of 3 days. | |
S5 | ProDrift as a plugin in Apromore platform | loan application (72 event logs [four for each of 18 change patterns]–15 activities in the base model)+18 logs for gradual drifts | motor insurance claims(4509 cases–29108 events) –motor insurance claims of different insurance brand (2577 cases–17474 events) | Accuracy assessment using: F–score & mean delay–expert validation of results on real-life logs What: impact of window size on accuracy –impact of oscillation filter size –impact of adaptive window –accuracy per change pattern –time performance–comparison with results in Reference [13] |
S6 & S13 | Trace Clustering package in ProM | 164 variants–1000 cases–17 activities –6812 events | two logs (Dutch hospital–1143 cases–624 activities–150291 events)+ (municipality–1199 cases–398 activities–52217 events) | What: Comparison to Trace Alignment& ActiTrac How: mining the model for each technique using Flexible heuristics Miner (FHM)–fitness is calculated using cost–based log alignment, cluster entropy& split rate for each technique. |
S7 | Process variant finder plugin in Variant–Finder package in ProM | 10000 cases–12721 events–31 resources | Spanish telecommunications company–8296 cases –40965 events representing 5 activities | For the artificial log: performance variability is evaluated by choosing time as the dependent attribute& resources as the independent one For the real-life log: examining the ability to define variants & evaluating the impact of some of the detected partitions on the whole process |
S8 | Python implementation of the algorithms | 32 logs of loan application | –the impact of different thresholds on F-score –the effect of different DBSCAN parameters –comparison to the approach proposed in Reference [49] concerning: time, f-score, when high precision is required, and average detection error. | |
S9 | plugin in ProM | 72 logs from a base model with 15 activities–18 change patterns are applied | –Accuracy is measured using F-Score –Average delay between actual & detected drifts –comparison to the approaches proposed in Reference [13] &[49] concerning: F-score & average delay | |
S10 | ✓ | 1000 cases –6590 events | 2 logs –Maastricht University medical centre (2838 cases& 28163 events) +governmental municipality (1434 cases& 8577 events) | 5 different experiments: Accuracy score is calculated to indicate how accurately deviating events are detected, in comparison to 3 different approaches Experiments are concerned with: effect of different configurations–effect of using sequential orders instead of partial orders –effect of various deviation levels–performance & scalability–effect of using synthetic& real–life data |
S11 | ProDrift 2.0 as a plugin in Apromore platform | 90 logs of sizes 2500,5000,10000 cases using a model with 28 activities in CPN | BPIC 2011 log –Dutch academic hospital–1143 cases–150291 events | Accuracy assessment: F–score& mean delay. What: impact of oscillation filter size–impact of inter drift distance–comparison to previous method they proposed concerning different patterns detection & with different log variability–execution time |
S12 | ConceptDrift plugin in ProM | Insurance claim (sudden & multi–order changes: 6000 cases –15 activities –58783 events)–(gradual drifts: 2000 cases–15 activities–19346 events) | Dutch municipality –184 cases –38 activities–4391 events | What: measuring the ability to detect & localise changes with a lag window of different traces no. How: adaptive windowing using the Kolmogorov test over the data stream then classic data mining metrics are derived. -For the multi–order drifts & gradual drifts: different configurations using different population sizes, step sizes, p-value threshold & gap sizes –examples of linear & exponential gradual drifts |
S14 | Concept Drift plugin in ProM | insurance claim (6000 cases –15 activities –58783 events) | municipality (116 cases –25 activities–2335 events) | The ability to detect (using RC feature) & localize change points (using WC feature) –influence of population size –Time complexity for feature extraction & hypothesis test analysis |
S15 | Comparison framework package in ProM | CoSeLoG project –5 municipalities | What: Comparison visualization understandability How: Meetings with representatives of municipalities- comparison of replay fitness scores & visualization | |
S16 | (NA) | Illustrative examples | (NA) | |
S17 | (NA) | 13 models are used to derive logs with drifts | 1050 cases–35 activities | Concatenation of two logs with different distributions & measuring the approach’s ability to detect the change, with estimation of the number of points needed to sample for change detection. |
S18 | (NA) | 3 logs–each with 2000 cases–over 1 year | What: the accuracy at which each approach is able to classify different traces. How: 3 approaches (without time factor –with time=no. of elements in the feature set –with time is weighted against the no. of elements in the feature set) were applied on the 3 logs –an accuracy metric is calculated. | |
S19 | Trace Clustering plugin in ProM | AMC hospital in Amsterdam (619 cases–52 activities–3574 events) | Different combinations of distance measures & clustering techniques–the understandability of resulting models with the aid of domain experts |
CONDA-PM Criteria | SLR Criteria | Publication | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S1 | S2 | S3 | S4 | S5 | S6 & S13 | S7 | S8 | S9 | S10 | S11 | S12 | S14 | S15 | S16 | S17 | S18 | S19 | ||
Phase I: Goal Design | |||||||||||||||||||
G01 | Theme | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 3 | 2 | 1 | 2 | 2 | 1 | 2 |
G02 | Handling mode | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
Phase II: Approach Coding (Input investigation) | |||||||||||||||||||
AI1 | Input | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 |
AI2 | Regarded perspective | 2 | 1 | 1 | 3 | 1 | 3 | 3 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 |
AI3 | Patterns of change | 2 | 2 | 2 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 1 | 0 | 0 |
Phase II: Approach Coding (Processing) | |||||||||||||||||||
AP1 | Study focus | 2 | 3 | 1 | 0 | 3 | 1 | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 2 | 1 | 1 | 1 | 1 |
AP2 | Change form | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
AP3 | Used techniques | Described in Table 9 | |||||||||||||||||
Phase III: Implementation | |||||||||||||||||||
I01 | Tool availability | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
I02 | Tool availability (platform) | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 2 |
Phase V: Evaluation | |||||||||||||||||||
E01 | Evaluation input | 2 | 3 | 1 | 2 | 3 | 3 | 3 | 1 | 1 | 3 | 3 | 3 | 3 | 2 | 1 | 3 | 1 | 2 |
E02 | Evaluation (what) | Described in Table 10 | |||||||||||||||||
E03 | Evaluation (How) | Described in Table 10 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Elkhawaga, G.; Abuelkheir, M.; Barakat, S.I.; Riad, A.M.; Reichert, M. CONDA-PM—A Systematic Review and Framework for Concept Drift Analysis in Process Mining. Algorithms 2020, 13, 161. https://doi.org/10.3390/a13070161
Elkhawaga G, Abuelkheir M, Barakat SI, Riad AM, Reichert M. CONDA-PM—A Systematic Review and Framework for Concept Drift Analysis in Process Mining. Algorithms. 2020; 13(7):161. https://doi.org/10.3390/a13070161
Chicago/Turabian StyleElkhawaga, Ghada, Mervat Abuelkheir, Sherif I. Barakat, Alaa M. Riad, and Manfred Reichert. 2020. "CONDA-PM—A Systematic Review and Framework for Concept Drift Analysis in Process Mining" Algorithms 13, no. 7: 161. https://doi.org/10.3390/a13070161
APA StyleElkhawaga, G., Abuelkheir, M., Barakat, S. I., Riad, A. M., & Reichert, M. (2020). CONDA-PM—A Systematic Review and Framework for Concept Drift Analysis in Process Mining. Algorithms, 13(7), 161. https://doi.org/10.3390/a13070161