Article

Experimental Analyses of Temporal Activity-Sequencing Anomalies in Process Mining

by
Kwanghoon Pio Kim
Data and Process Engineering Research Laboratory, Division of Computer Science and Engineering, Kyonggi University, Suwon-si 16227, Gyeonggi-do, Republic of Korea
Appl. Sci. 2023, 13(5), 3143; https://doi.org/10.3390/app13053143
Submission received: 17 January 2023 / Revised: 20 February 2023 / Accepted: 27 February 2023 / Published: 28 February 2023

Abstract

In the research field of automated process discovery and analysis, the purity of event log datasets is of vital importance to discovering sound and exact process models. Various types of anomalies, however, lead to the discovery of inaccurate process models from process enactment event log datasets. One particular anomaly, and the core challenge of this paper, is the temporal activity-sequencing anomaly, which critically affects the overall quality of automated process discovery. This paper explores the event-log anomalies and noise produced by this special type of anomaly, which is inevitably formed in the event-log preprocessing phase of automated process discovery. More precisely, it implements an algorithmic approach that detects and filters out these anomalies and noise during automated process discovery. The author also carries out a series of experimental analyses by applying the implemented approach to five process event log datasets available in the 4TU Center for Research Data.

1. Introduction

In recent years, many data scientists have become interested in process-mining research and development [1,2,3], which discovers and rediscovers business process models and their process-related knowledge [4] from process enactment event-log datasets [5,6,7,8]. Generally, the core competitiveness of process-aware enterprises and organizations lies in automated process discovery [9,10] and knowledge-mining functionality [11,12] that properly addresses the various types of operational anomalies [13], such as modeling, operating, logging, and discovering anomalies, which produce inaccurate event logs, that is, noise [14,15]. It is therefore very important to control and properly address these types of noise in process event logs, because noise can easily degrade the overall quality of automated process discovery algorithms and systems.
This paper experimentally verifies the existence of noise in process enactment event log datasets spawned by temporal activity-sequencing anomalies, and validates the effectiveness of detecting and eliminating these anomalies throughout the procedural phases of the automated process discovery framework. In other words, it shows that a special type of noise is inevitably formed, without exception, when formalizing process-mining activities and algorithms; the author names this noise activity-sequencing noise. Accordingly, this paper proposes a detection-and-removal algorithm [16] for disposing of such noise in order to enhance the accuracy [17,18] of process-mining activities. Additionally, it implements the proposed algorithm and applies it to real process event log datasets through a series of experimental analyses.
The rest of this paper is organized as follows. The related works section surveys previous work on identifying noise and anomalies in process event logs and on addressing them in process-mining activities and algorithms. Next, the author describes why such noise is spawned by a specific type of anomaly and how it can be detected and removed through an algorithmic approach. The proposed anomaly detection and removal algorithm is then implemented and applied to several datasets, and the experimental results are summarized. Finally, the author summarizes the paper and suggests future research to be carried out by the authors’ research group.

2. Related Works

As mentioned in the previous section, discovering an exact process model is one of the major issues in the process-mining literature [4,19,20,21], and one of the main difficulties is addressing noise in process event logs [5,7]. Many researchers have paid attention to approaches for determining, filtering, and eliminating such noise. Since the beginning of process-mining research, various theoretical outcomes addressing noise and anomaly issues have emerged, as follows.
  • Van der Aalst and colleagues [3] introduced the first process model discovery algorithm, the α-algorithm, which assumes that there is no noise and that the workflow log contains "sufficient" information. A.J.M.M. Weijters et al. [22] introduced HeuristicsMiner, a mining algorithm that can address noise, in which a frequency-based metric indicates how truly a dependency relation exists between two events A and B. That is, the algorithm uses the metric to determine a dependency relation $a \Rightarrow_W b$ by considering only the order of events within the same case.
  • Lucantonio Ghionna et al. [20] introduced the TraceOutlierMining algorithm, which discovers a set of outliers. When mining execution patterns, the framework uses a clustering-based outlier-detection approach that groups the logs into clusters and finds the outliers in each cluster.
  • Raffaele Conforti et al. [23] introduced a technique to filter noise based on outlier detection. Their algorithm generates an abstract representation of an event log as an automaton capturing the direct-follows relations between event labels, prunes arcs with low relative frequency from the automaton, and then removes the events that do not fit the automaton (outliers).
  • Mohammadreza Fani Sani et al. [24] introduced a method to filter outliers using conditional behavioral probabilities between sequences of activities. Their filtering method takes an event log as input and returns a filtered event log based on a given threshold. It uses the conditional probability that an activity occurs after a given sequence of activities; if this probability is lower than the threshold, the activity is considered an outlier.
  • Weimin Li et al. [25] proposed an anti-noise process-mining algorithm. In the algorithm, they introduced distance-based clustering techniques based on the features of each trace: the distances between traces are calculated to build a minimum spanning tree over the clusters generated from the event logs, and the traces of clusters other than the largest are treated as noise.
  • In further work, Mohammadreza Fani Sani et al. [26] proposed a method to filter outlier behavior using sequence mining. The method consists of three main steps: (i) discovering sequential rules and patterns from the event log; (ii) identifying, for each trace, whether it contains outlier behavior based on the discovered rules and patterns; and (iii) removing traces with outlier behavior from the event log. With this approach, they were able to discover flow relations between activities over long distances.
  • In another study, Sebastiaan J. van Zelst et al. [16] proposed an event processor that filters out spurious events from a live event stream. The filter uses a subset of all behavior emitted onto the stream and is updated incrementally as new events arrive. To determine which events need to be removed, they used a multitude of existing stream-based approaches, e.g., sliding windows, reservoir sampling, or decay methods.
  • Kyoung-Sook Kim et al. [11] proposed a structured information control net discovery algorithm for process event logs and first introduced the basic concept of so-called deliberate noise, together with a detection approach. Hyun Ahn et al. [27] introduced the theoretical concept of a work transference network, along with a discovery algorithm and framework that can discover a work transference network from process event logs. They also verified and graphically visualized how the discovered work transference network is affected by noise when the discovery algorithm operates on process event logs containing noise.
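For comparison with the edge-occurrence counts used later in this paper, the frequency-based metric of HeuristicsMiner [22] can be sketched as follows. This is a minimal illustration, not the HeuristicsMiner implementation: the function names and the toy log are hypothetical, traces are plain strings of single-letter activities, and the metric is the published dependency measure $a \Rightarrow_W b = (|a >_W b| - |b >_W a|) / (|a >_W b| + |b >_W a| + 1)$, where $|a >_W b|$ counts how often b directly follows a within the same case.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows_counts(traces: Iterable[Sequence[str]]) -> Counter:
    """Count |a >_W b|: how often b immediately follows a within the same trace."""
    counts: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts: Counter, a: str, b: str) -> float:
    """HeuristicsMiner-style dependency measure a =>_W b, always in (-1, 1)."""
    ab = counts[(a, b)]  # a missing key reads as 0 in a Counter
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical toy log: B and C run in parallel between A and D.
log = ["ABCD", "ACBD", "ABCD", "ABCD"]
df = directly_follows_counts(log)
print(round(dependency(df, "A", "B"), 3))  # prints 0.75
print(round(dependency(df, "B", "C"), 3))  # prints 0.4
```

Because B and C interleave in the toy log, their dependency value stays well below that of the genuine edge from A to B, an intuition similar to comparing the occurrence counts of opposite edges in Section 3.3.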
In particular, this paper focuses on temporal anomaly detection and noise removal. One typical approach published in the literature is the temporal delay-anomaly detection approach proposed by Andreas Rogge-Solti et al. [28]. This approach builds on intuitive assumptions about Petri-net-based business processes, and its temporal delay-anomalies are derived from practical observations in order to detect temporal errors and outliers in event logs by using window-based measurement and error detection. Another typical approach related to this topic is the so-called temporal activity-sequencing anomaly detection approach proposed by the authors’ research group. In this paper, we carry out a series of experimental analyses on a group of process event log datasets by applying the temporal activity-sequencing anomaly detection algorithm and its implemented framework and system.

3. Temporal Activity-Sequencing Anomaly Detection Approach

Process event logs are recorded from the operations of process-aware information systems supporting business process models that were defined, planned, deployed, and enacted through the buildtime and runtime clients of a BPMS (business process management system). Theoretical process modeling methodologies can be used to represent those models. The Petri-net-based modeling approach [29] is one of the most widely used theoretical methodologies for specifying business process models. Another typical process modeling approach is the ICN (Information Control Net)-based modeling approach [30].

3.1. Activity-Sequencing Graph in the Process Mining Framework

In this paper, the emphasis is placed on the quality of the process-mining approach named the ρ-Algorithm, which is able to discover all combinations of the four types of process routing patterns associated with a deployed business process model from its enactment event log dataset. The procedural framework in Figure 1 conceptually defines a stepwise approach from XES-formatted event logs to an information control net process model. To that end, this paper defines a series of formal concepts such as event trace, temporal work-case, loop-case, temporal work-transference, and so forth. A single event trace is the temporally ordered execution log of a workflow instance, and its formalism is the temporal work-case model precisely introduced in [11]. Essentially, the logging and auditing component of the business process enactment engine records its activity execution events in a log repository, and those logged events are arranged as a temporal sequence of events. This execution sequence of a process instance forms a process instance event trace, from which a process instance activity trace can be extracted; its formal representation is specified by the temporal workcase model [31].
In particular, the temporal workcases inevitably embed a certain amount of hidden noise, stemming from temporal activity-sequencing anomalies, that may seriously affect the correctness and quality of process-mining algorithms in discovering correct process models. Consequently, an anomaly-treatment algorithm is needed that can detect and remove the noise caused by temporal activity-sequencing anomalies. The right-hand part of the conceptual framework illustrates the procedural algorithms for clearing temporal activity-sequencing anomalies and noise from the output of the discovery algorithm (the author’s research group has devised an ICN-based process discovery algorithm named the ρ-Algorithm [11]). The rediscovered activity-sequencing graph embeds these specific types of anomalies and noise, because the internal structure of the temporal workcases inevitably spawns them. The temporal activity-sequencing anomaly detection algorithm proposed in this paper clears all such noise from the rediscovered activity-sequencing graph and produces the finalized temporal activity-sequencing graph, from which the ρ-Algorithm eventually rediscovers the corresponding SICN-based business process model.
It is worth noting that the authors’ research group has verified the correctness of the anomaly-detection and noise-removal algorithm through the ρ-Algorithm [11], which has been implemented as a process-mining tool. This paper focuses on a series of experimental analyses whose goal is to validate the process-mining framework and system against the challenge of temporal activity-sequencing anomalies, as well as noise detection and removal in process event log datasets.
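The step from logged events to temporal workcases can be sketched as follows, assuming each logged event carries a case identifier, an activity name, and a timestamp. The `Event` record, its field names, and `temporal_workcases` are hypothetical illustrations, not the engine's actual log schema or the ρ-Algorithm's code. Events are grouped by case and sorted by timestamp, yielding one temporally ordered activity sequence per process instance.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record: case id, activity label, timestamp."""
    case_id: str
    activity: str
    timestamp: datetime


def temporal_workcases(events: list[Event]) -> dict[str, list[str]]:
    """Group events by case and order each group by timestamp."""
    by_case: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in by_case.items()
    }


t = datetime.fromisoformat
events = [
    Event("c1", "B", t("2023-01-01T09:05")),
    Event("c1", "A", t("2023-01-01T09:00")),
    Event("c2", "A", t("2023-01-01T10:00")),
    Event("c1", "C", t("2023-01-01T09:10")),
    Event("c2", "C", t("2023-01-01T10:03")),
]
print(temporal_workcases(events))  # {'c1': ['A', 'B', 'C'], 'c2': ['A', 'C']}
```

Sorting within each case, rather than trusting arrival order in the repository, is what makes the result a temporal ordering even when events are logged out of order.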

3.2. The Origin of Temporal Activity-Sequencing Anomalies

This paper uses the ICN-based modeling approach to formalize and deploy the theoretical concept of temporal activity-sequencing anomalies and their noise. In the ICN-based process modeling approach, there are four types of primitive routing (control-flow) patterns, as follows:
  • Sequential control-flow: One activity follows another in sequential order.
  • OR control-flow: An activity (or decision) has a disjunctive transition followed by two or more activities. Only one of the following activities is executed next.
  • AND control-flow: An activity has a conjunctive (or parallel) transition followed by two or more activities. All of the following activities are executed next.
  • LOOP control-flow: An activity has a loop transition. The activities inside the loop area are executed repeatedly.
Let us assume that the process-aware information system correctly records all business process instances and their workitem enactment event logs. Then, each business process instance produces an event trace representing the temporal ordering of its activities’ event logs. For the four primitive process routing patterns used in the ICN-based business process modeling methodology, the possible workcases instantiated from each pattern look like the following:
  • Sequential workcases: ABCD, ABCD, …
  • OR workcases: ABD, ACD, ACD, ABD, …
  • AND workcases: ABCD, ACBD, ABCD, …
  • LOOP workcases: ABCD, ABCBCD, …, $A(BC)^{i}D$
In particular, the workcases of the AND and LOOP process-routing patterns are involved in temporal activity-sequencing anomalies when each workcase is turned into a temporal activity-sequencing workcase model to be fed into the ρ-Algorithm [11]. In the AND workcases, each workcase exhibits one of the two mutual orderings of activity B’s and activity C’s event logs: BC in ABCD and CB in ACBD. However, these temporal orderings BC and CB are nonexistent edges in the original AND process-routing pattern. The author names these nonexistent edges AND-Anomalies. In principle, activities B and C both follow the AND split-gateway, which implies that they must be performed in parallel right after activity A. In the LOOP workcases, the iterating workcases contain repeated orderings of activity B’s and activity C’s event logs, such as BCBC and BCBCBC. However, the CB orderings in these LOOP workcases are also nonexistent edges in the original LOOP process-routing pattern. These situations result in innate anomalies in the temporal orderings of activities B and C in the AND process-routing pattern, as well as in the temporal orderings of activities C and B in the LOOP process-routing pattern.
Consequently, these nonexistent orderings associated with AND and LOOP workcases become the origin of temporal activity-sequencing anomalies and noise, and ultimately degrade the correctness and quality of process-mining algorithms and systems. Therefore, this paper aims at a cause-and-effect analysis of temporal activity-sequencing anomalies by performing a series of experimental analyses on a group of process-enactment event log datasets provided by 4TU.ResearchData, an international data repository for science, engineering, and design [5]. It is worth noting that the authors’ research group has successfully devised a temporal activity-sequencing anomaly detection and removal algorithm and applied it to a process-mining algorithm (the ρ-Algorithm [11]) and its implemented system.
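The origin of these anomalies can be reproduced on the example workcases above: collecting the directly-follows edges of the AND and LOOP workcases and subtracting the edges of the original pattern leaves exactly the nonexistent orderings. This is a sketch under an assumption: the model edge sets are written at the activity level with the split/join and loop gateways abstracted away, so the loop-back is routed through the gateway rather than being a direct C→B edge. The helper name is hypothetical.

```python
def observed_edges(workcases: list[str]) -> set[tuple[str, str]]:
    """Directly-follows edges (x, y) seen anywhere in the workcases."""
    return {(x, y) for wc in workcases for x, y in zip(wc, wc[1:])}


# Assumed activity-level edges of the original patterns (gateways abstracted away).
and_model = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
loop_model = {("A", "B"), ("B", "C"), ("C", "D")}

# Edges that appear in the workcases but not in the pattern are the anomalies.
and_anomalies = observed_edges(["ABCD", "ACBD", "ABCD"]) - and_model
loop_anomalies = observed_edges(["ABCD", "ABCBCD", "ABCBCBCD"]) - loop_model

print(sorted(and_anomalies))   # [('B', 'C'), ('C', 'B')]
print(sorted(loop_anomalies))  # [('C', 'B')]
```

The AND case yields a bidirectional pair, whereas the LOOP case yields a single backward edge next to a forward edge B→C that does exist in the model; this difference is one reason the two associations need to be told apart.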

3.3. Detection and Removal of the Temporal Activity-Sequencing Anomalies and Noises

This section explains the approach to detecting and removing temporal activity-sequencing anomalies on the process pattern graph model formed from all the temporal workcases of the event logs. For the event log dataset of an ICN-based business process model, the approach looks for pairs of directed edges between activities A and B on the temporal workcases that satisfy $(A \xrightarrow{w_1} B)$ and $(B \xrightarrow{w_2} A)$ with $w_1 \geq w_2$; such pairs are potentially involved in temporal activity-sequencing anomalies and create noise. Here, the value $w$ (as in $w_1$ and $w_2$) denotes the number of edge occurrences. The author also encountered some meaningful findings showing that the proposed algorithmic approach still has certain limitations, as follows:
  • Addressing the case of the threshold $T = 1$ with $(A \xrightarrow{w_1} B)$ and $(B \xrightarrow{w_2} A)$: theoretically, this case means that the number of directed edge occurrences from A to B is exactly the same as the number from B to A. Furthermore, it implies that activity B’s enactment is immediately followed only by activity A’s enactment and vice versa, which gives the equality $w_1 = w_2$. This case therefore lets us decide whether a directed edge is noise. However, the other threshold cases, $T \neq 1$, still do not support a clear decision.
At this point, it is also necessary to know how to distinguish AND process pattern associations from LOOP process pattern associations. Finally, the approach, from forming a process pattern graph model embedding the noise to disposing of that noise on the discovered model, consists of the following five steps across three layers:
  • Step 1: Discover a process pattern graph model by amalgamating all the temporal workcases of the event log dataset under the algorithm in [18], and name it Layer-1.
  • Step 2: From Layer-1, synthesize all the directed edges of adjacent activities, each labeled with the value w of its edge occurrences, and name the result Layer-2.
  • Step 3: In Layer-2, replace each value w by the ratio (w ÷ n) of edge occurrences to the total number of temporal workcases, where w is the number of occurrences of the edge between a pair of adjacent activities and n is the total number of temporal workcases (process instance event log traces) in the entire event log dataset.
  • Step 4: Combine each pair of opposite directed edges in Layer-2 by calculating the ratio (b ÷ a) of one directed edge’s occurrence ratio to the other’s for the corresponding pair of adjacent activities, and name the result Layer-3.
  • Step 5: Set the T-filter threshold, remove all directed edges whose Layer-3 ratio is higher than the T-value, update the corresponding values in Layer-1, and obtain the result.
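The five steps can be sketched in Python under a few stated assumptions. Layer-1 is approximated here by the multiset of directly-follows edges over all temporal workcases (the paper builds it with the algorithm in [18], which is not reproduced); each opposite pair is oriented so that the stronger edge is the denominator, giving Layer-3 ratios in (0, 1]; and both edges of a pair are dropped when the ratio strictly exceeds T. All function names are hypothetical.

```python
from collections import Counter


def layer1(workcases: list[str]) -> Counter:
    """Steps 1-2: weighted graph of adjacent activities, w = edge occurrences."""
    counts: Counter = Counter()
    for wc in workcases:
        counts.update(zip(wc, wc[1:]))
    return counts


def layer2(edges: Counter, n: int) -> dict[tuple[str, str], float]:
    """Step 3: replace w by w / n, n = number of temporal workcases."""
    return {edge: w / n for edge, w in edges.items()}


def layer3(ratios: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Step 4: for each bidirectional pair, ratio of the weaker to the stronger edge.

    Keys are oriented (A, B) with ratio(A, B) >= ratio(B, A), so values lie in (0, 1].
    """
    pairs = {}
    for (a, b), r_ab in ratios.items():
        r_ba = ratios.get((b, a))
        if r_ba is None or a == b:
            continue  # no opposite edge, or a self-loop
        if r_ab > r_ba or (r_ab == r_ba and a < b):  # keep one key per pair
            pairs[(a, b)] = r_ba / r_ab
    return pairs


def t_filter(workcases: list[str], threshold: float) -> Counter:
    """Step 5: drop both edges of every pair whose Layer-3 ratio exceeds T."""
    edges = layer1(workcases)
    pairs = layer3(layer2(edges, len(workcases)))
    for (a, b), ratio in pairs.items():
        if ratio > threshold:
            del edges[(a, b)]
            del edges[(b, a)]
    return edges


log = ["ABCD", "ACBD", "ABCD", "ACBD"]
print(sorted(t_filter(log, threshold=0.5)))
# [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
```

Dividing by n in Step 3 does not change the Layer-3 ratio, since n cancels; it only makes Layer-2 values comparable across datasets. With a strict comparison, a perfectly balanced pair (ratio 1) survives when T = 1.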
Algorithm 1 describes our algorithmic approach for detecting and removing temporal activity-sequencing anomalies and their noise. The algorithm uses the GetGraphFromLog function, which is an updated version of the removePhantomEdge function and the GetTWTNetwork function in the SICN (structured information control net)-based process-mining framework [11] and the human-centered workflow intelligence framework [27], respectively. Through this algorithmic approach, the authors’ research group was able to develop a much sounder process-mining framework and system. Note that the author does not describe the details of this algorithmic approach further, in order to focus on the experimental analyses that form this paper’s scope and main goal.
Algorithm 1 The temporal activity-sequencing anomaly detection and removal algorithm
Require: Process enactment event logs (ωε), Threshold T
Ensure: Non-noised temporal activity-sequencing graph
procedure Main
    Initialize Graph G = ∅, activityPairMap = ∅
    Open the process enactment event log dataset file (ωε)
    G = GetGraphFromLog((ωε)) (*)
    n = (ωε).GetNumberOfTraces
    for (node ∈ G) do
        new(tempActivityInfo)
        for (outGoingEdge ∈ node) do
            beginNode = outGoingEdge.GetStartNode()
            endNode = outGoingEdge.GetEndNode()
            w = outGoingEdge.GetValue()
            tempActivityInfo.add(beginNode, endNode, w)
        end for
        activityPairMap.add(tempActivityInfo)
    end for
    for (pair ∈ activityPairMap) do    ▷ pair.w1, pair.w2: occurrences of the two opposite edges
        ratioValue = pair.w2 / pair.w1
        if (ratioValue > Threshold-T) then
            G.RemoveEdge(pair.Start, pair.End)
            G.RemoveEdge(pair.End, pair.Start)
        end if
    end for
    Close the enactment log file (ωε)
    Return Graph G
end procedure

4. Experimental Analyses

This section carries out a series of experimental analyses to verify the correctness of the temporal activity-sequencing anomaly detection and removal algorithm, and to perform effectiveness analyses of the proposed anomaly concept, by applying the algorithmic approach to exemplary datasets released in the 4TU.Center for Research Data [5]. Through these experimental and feasibility analyses, the author aims to show that the proposed concept is theoretically and functionally correct and that such anomalies and noise actually exist.

4.1. Preparation of the Process Event Log Datasets

As a process instance is executed, the logging and auditing component of the process enactment engine records its workitem execution events in a log repository, and those logged events are arranged as the corresponding temporal activity sequence of event logs. The temporal activity sequence of each process instance is thereby concretized as a process instance event trace, whose conceptual and theoretical basis is the temporal workcase model. From the dataset of process instance event traces, each process instance’s workitem event trace and its temporal workcase are extracted according to the workcase model. In general, the event logs in process instance event traces are formatted in a tag-based language such as XWELL, BPAF, or XES. Recently, IEEE released a standard tag-based language, XES (Extensible Event Stream), whose aim is to provide designers of information systems with a unified and extensible methodology for capturing system behavior by using event logs and event streams. For the experimental analyses, the author prepared a group of process enactment event log datasets, as shown in Figure 2, all of which are available in the 4TU.Center for Research Data [5] and are well suited to the experimental analyses.
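A minimal sketch of reading XES with Python's standard `xml.etree.ElementTree`, returning one activity sequence per trace from the `concept:name` attribute of each event. It assumes events are already stored in temporal order inside each trace (sorting by `time:timestamp` is omitted), and it tolerates the optional XES namespace by comparing local tag names only. The function names are hypothetical, and this is not the parser used by the author's system.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_xes_traces(xes_text: str) -> list[list[str]]:
    """Return the concept:name of every event, grouped per trace."""
    root = ET.fromstring(xes_text)
    traces = []
    for trace in root:
        if local(trace.tag) != "trace":
            continue  # skip log-level extensions, globals, classifiers, attributes
        activities = []
        for event in trace:
            if local(event.tag) != "event":
                continue  # skip trace-level attributes such as the case name
            for attr in event:
                if local(attr.tag) == "string" and attr.get("key") == "concept:name":
                    activities.append(attr.get("value"))
                    break
        traces.append(activities)
    return traces


sample = """<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="A"/></event>
    <event><string key="concept:name" value="C"/></event>
    <event><string key="concept:name" value="B"/></event>
  </trace>
</log>"""
print(read_xes_traces(sample))  # [['A', 'C', 'B']]
```

For large logs, `ET.iterparse` would avoid building the whole tree in memory.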

4.2. Effectiveness Analyses of the Proposed Anomalies

To verify the effectiveness of detecting and removing temporal activity-sequencing anomalies and noise, the author performs an experiment on the downloaded event log dataset of the Large Bank Transaction Process Model, which contains 10,000 business process instance event traces in the business process enactment event history file “10000-all-synthetic-150MB.xes”. The provider of the dataset also supplies the corresponding Petri-net process model for the verification and validation of process-mining algorithms and systems. To simplify the effectiveness analysis, a single subprocess, the Sender Authentication Subprocess Model, which has a nested formation of disjunctive and conjunctive process pattern constructs, is chosen as the target dataset.
Figure 3 shows two screen captures of process pattern graph models: one is the process pattern graph model that embeds the temporal activity-sequencing noise and is discovered from all 10,000 temporally ordered activity-sequencing traces, and the other is the process pattern graph model after all the temporal activity-sequencing noise has been removed by applying the algorithm and implemented system proposed in this paper. The figure also shows two hand drawings of the process pattern graph models that project and tidy the before-and-after results for the partial event log dataset of the Sender Authentication Subprocess Model, so that the effectiveness analysis can be verified clearly. Right after building the weighted process pattern graph, it is necessary to eliminate the anomaly pairs of adjacent activities in the graph, which are artificially created when the temporal workcases and their workcase models are formed from the business process instance event traces. The leftmost hand drawing illustrates the weighted process pattern graph embedding the anomaly pairs of adjacent activities. In particular, pairs of adjacent activities among the child activities of the same parent activity in a parallel AND relationship are called anomaly pairs of adjacent activities; they are colored red in the figure, for instance, $(\alpha_{9}, \alpha_{10})$, $(\alpha_{10}, \alpha_{9})$, $(\alpha_{10}, \alpha_{11})$, $(\alpha_{11}, \alpha_{10})$, $(\alpha_{9}, \alpha_{11})$, and $(\alpha_{11}, \alpha_{9})$. The rightmost hand drawing shows the process pattern graph model of the chosen subprocess eventually mined by the implemented system, which is discovered from all 10,000 temporal workcases (business process instance event traces) in the enactment event history dataset of the Large Bank Transaction Process Model.
Ultimately, the effectiveness of the noise detection and removal algorithm should be evaluated by the qualitative degree of correctness of the discovered business process model transformed from the process pattern graph model after the noise is removed. The author carried out an experiment on the event log dataset using the corresponding process-mining algorithm and system that adopt the anomaly and noise detection and removal algorithm, namely, the ρ-Algorithm [11] and its implemented system. The final screen-captured result, with a hand drawing of the discovered SICN-based process model produced by the experiment, is shown in Figure 4. As the figure shows, the discovered SICN-based process model is perfectly correct, with a nested formation of AND, XOR, and LOOP constructs that satisfies the matched-pairing and proper-nesting properties. Therefore, the author concludes that the noise detection and removal algorithm and its system are effective in providing a higher degree of correctness in process-mining algorithms and systems.

4.3. Experimental Analyses of the Proposed Anomalies

In the previous section, the author verified the effectiveness of detecting and removing the proposed temporal activity-sequencing anomalies by applying a dataset that contains the enactment event logs of 10,000 process instances of the Large Bank Transaction Process Model. This section carries out five experiments. The first experiment is performed on the dataset named ETM-Configurtion1.xes. The next three experiments are conducted on the datasets released in the BPI Challenges 2012, 2017, and 2018. Finally, the dataset of the Large Bank Transaction Process Model used in the previous section is fully analyzed in the last experiment. Table 1 summarizes the numbers of events, traces, and associated activities in the process event log datasets selected for the experimental analyses.
Figure 5 summarizes the number of temporal activity-sequencing noises detected and removed from each dataset in the series of experiments. That is, in these consecutive analyses, the author carried out four more experiments, each applying the proposed anomaly detection and removal algorithm and its implemented system to one of the datasets listed in Table 1. More specifically, Figure 5 is a line chart with one line per experimental dataset, where the vertical axis gives the number of detected and removed (temporal activity-sequencing) noises and the horizontal axis gives the threshold value of the directed edge-occurrence ratio. Finally, for the Large Bank Transaction Process Model’s event log dataset, Figure 6 provides the process pattern graph models obtained after removing all the noise with the proposed anomaly detection and removal algorithm, each corresponding to one of the subprocess models that compose the main process model.
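How a curve like the one in Figure 5 can arise is sketched below by sweeping T over Layer-3 pair ratios and counting what the T-filter would remove. The ratios are made-up illustrative values, not the paper's measurements, and since the text does not state whether Figure 5 counts edges or pairs, this sketch assumes two removed directed edges per pair; the function name is hypothetical.

```python
def removed_edge_counts(pair_ratios: list[float], thresholds: list[float]) -> dict[float, int]:
    """For each threshold T, count directed edges removed (2 per pair with ratio > T)."""
    return {t: 2 * sum(1 for r in pair_ratios if r > t) for t in thresholds}


# Hypothetical Layer-3 ratios, one per bidirectional pair of adjacent activities.
ratios = [0.95, 0.80, 0.40, 0.10]
print(removed_edge_counts(ratios, [0.0, 0.25, 0.5, 0.75, 1.0]))
# {0.0: 8, 0.25: 6, 0.5: 4, 0.75: 4, 1.0: 0}
```

Because a larger T can only shrink the set of pairs above it, such a curve is non-increasing in T.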

5. Conclusions

This paper presents a series of experimental analyses focusing on the concept of the temporal activity-sequencing anomaly, which must be taken into consideration in developing and implementing process-mining algorithms and systems. It also devised an algorithmic approach for detecting and removing the noise created by these anomalies, and demonstrated its effectiveness in improving the qualitative degree of a process-mining algorithm and system by applying the typical SICN-based process-mining algorithm (the ρ-Algorithm) to an ideal dataset of process enactment event logs. Finally, the author carried out a series of experimental analyses with five different process event log datasets provided by the 4TU.Center for Research Data. In future work, the author will continue exploring other types of anomalies, which can arise during all phases of process-mining algorithms and business process enactment event logging, together with approaches to detect and remove them.

Funding

This research was supported by the KGU Research Foundation Program funded by the KYONGGI UNIVERSITY in the Republic of Korea (Grant No. 2022-003).

Acknowledgments

This research was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT, Ministry of Science and ICT), Republic of Korea (Grant No. NRF-2022R1A2C2093002). The author thanks their colleagues and institution for sponsoring this research. In particular, appreciation is extended to the student members of the Data & Process Engineering Research Laboratory at Kyonggi University and to Dinh-Lam Pham, a research professor at the Contents Convergence Software Research Institute, Kyonggi University.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Park, M.; Ahn, H.; Kim, K.P. Workflow-Supported Social Networks: Discovery, Analyses, and System. J. Netw. Comput. Appl. 2016, 75, 355–373.
  2. Van der Aalst, W.M.P.; Weijters, A.J.M.M. Process mining: A research agenda. Comput. Ind. 2004, 53, 231–244.
  3. Van der Aalst, W.; Weijters, T.; Maruster, L. Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 2004, 16, 1128–1142.
  4. Dumas, M.; Mendling, J.; Rosa, M.L.; Reijers, H.A. Fundamentals of Business Process Management; Springer: Berlin/Heidelberg, Germany, 2013.
  5. 4TU Center for Research Data. 2022. Available online: https://data.4tu.nl/info/en/ (accessed on 1 March 2021).
  6. Ahn, H.; Kim, K.P. Formal Approach to Workflow Application Fragmentations over Cloud Deployment Models. Comput. Mater. Contin. 2021, 67, 3071–3088.
  7. Wang, X.; Cao, Z.; Zhan, R.; Bai, M.; Ma, Q.; Li, G. Density-Based Outlier Detection in Multi-Dimensional Datasets. KSII Trans. Internet Inf. Syst. 2022, 16, 3815–3835.
  8. Preetha, K.; Babu, K.R.R.; Sangeetha, U.; Thomas, R.S.; Saigopika, R.; Walter, S.; Thomas, S. Price Forecasting on a Large Scale Data Set using Time Series and Neural Network Models. KSII Trans. Internet Inf. Syst. 2022, 16, 3923–3942.
  9. Van Der Aalst, W. Process mining: Overview and opportunities. ACM Trans. Manag. Inf. Syst. 2012, 3, 1–17.
  10. Van Der Aalst, W. Process Mining: Discovery, Conformance and Enhancement of Business Processes; Springer: Heidelberg, Germany, 2011.
  11. Kim, K.S.; Pham, D.L.; Kim, K.P. ρ-Algorithm: A SICN-Oriented Process Mining Framework. IEEE Access 2021, 9, 139852–139875.
  12. Weijters, A.; Ribeiro, J. Flexible heuristics miner (FHM). In Proceedings of the 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 11–15 April 2011; pp. 310–317.
  13. Nandi, A.; Mandal, A.; Atreja, S.; Dasgupta, G.B.; Bhattacharya, S. Anomaly detection using program control flow graph mining from execution logs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 215–224.
  14. Conforti, R.; La Rosa, M.; ter Hofstede, A.H. Filtering out infrequent behavior from business process event logs. IEEE Trans. Knowl. Data Eng. 2017, 29, 300–314.
  15. Mitsyuk, A.A.; Shugurov, I.S. On process model synthesis based on event logs with noise. Autom. Control. Comput. Sci. 2014, 21, 181–198.
  16. van Zelst, S.J.; Sani, M.F.; Ostovar, A.; Conforti, R.; La Rosa, M. Filtering spurious events from event streams of business processes. In Proceedings of the International Conference on Advanced Information Systems Engineering, Tallinn, Estonia, 11–15 June 2018; pp. 35–52.
  17. Kim, K.S.; Pham, D.L.; Park, Y.I.; Kim, K.P. Experimental verification and validation of the SICN-oriented process mining algorithm and system. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9793–9813.
  18. Pham, D.; Ahn, H.; Kim, K.P. A Temporal Work Transference Discovery Algorithm and Experimental Results on XES-Formatted Workflow Logs. In Proceedings of the 13th Asia Pacific International Conference on Information Science and Technology (APIC-IST 2018), Nha Trang, Vietnam, 24–27 June 2018; pp. 41–46.
  19. Mukred, M.; Yusof, Z.M.; Mokhtar, U.A.; Sadiq, A.S.; Hawash, B.; Ahmed, W.A. Improving the Decision-Making Process in the Higher Learning Institutions via Electronic Records Management System Adoption. KSII Trans. Internet Inf. Syst. 2021, 15, 90–113.
  20. Ghionna, L.; Greco, G.; Guzzo, A.; Pontieri, L. Outlier detection techniques for process mining applications. In Proceedings of the International Symposium on Methodologies for Intelligent Systems, Toronto, ON, Canada, 20–23 May 2008; pp. 150–159.
  21. Kim, K.; Lee, Y.K.; Ahn, H.; Kim, K.P. An experimental mining and analytics for discovering proportional process patterns from workflow enactment event logs. Wirel. Netw. 2022, 28, 1211–1218.
  22. Weijters, A.J.M.M.; van Der Aalst, W.M.P.; De Medeiros, A.K.A. Process Mining with the Heuristics Miner Algorithm; Technische Universiteit Eindhoven: Eindhoven, The Netherlands, 2006; Volume 166, pp. 1–34.
  23. Conforti, R.; Rosa, M.L.; ter Hofstede, A.H. Noise Filtering of Process Execution Logs based on Outliers Detection. 2015. QUT ePrints. Available online: https://eprints.qut.edu.au/82901/ (accessed on 15 November 2021).
  24. Sani, M.F.; van Zelst, S.J.; van der Aalst, W.M. Improving process discovery results by filtering outliers using conditional behavioural probabilities. In Proceedings of the International Conference on Business Process Management, Barcelona, Spain, 10–15 September 2017; pp. 216–229.
  25. Li, W.; Zhu, H.; Liu, W.; Chen, D.; Jiang, J.; Jin, Q. An Anti-Noise Process Mining Algorithm Based on Minimum Spanning Tree Clustering. IEEE Access 2018, 6, 48756–48764.
  26. Sani, M.F.; van Zelst, S.J.; van der Aalst, W.M. Applying Sequence Mining for Outlier Detection in Process Mining. In Proceedings of the OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, Valletta, Malta, 22–26 October 2018; pp. 98–116.
  27. Ahn, H.; Kim, K.P. Formal approach for discovering work transference networks from workflow logs. Inf. Sci. 2020, 515, 1–25.
  28. Rogge-Solti, A.; Kasneci, G. Temporal anomaly detection in business processes. In Proceedings of the International Conference on Business Process Management, Eindhoven, The Netherlands, 7–8 September 2014; pp. 234–249.
  29. Peterson, J.L. Petri nets. ACM Comput. Surv. (CSUR) 1977, 9, 223–252.
  30. Kim, K.; Ellis, C.A. An ICN-based Workflow Model and Its Advances. In Handbook of Research on BP Modeling; IGI Global, ISR: Hershey, PA, USA, 2009; Section II/Chapter VII.
  31. Jin, M.; Kim, K.; Kim, K.P. Discovering a Work Transference Network from a Workflow Model of Library Book Acquisition Procedures. In Proceedings of the 9th International Conference on Internet, Qingdao, China, 23–25 August 2017; pp. 221–225.
Figure 1. Activity-sequencing graph in the process-mining framework addressing temporal activity-sequencing anomalies.
Figure 2. The temporal workcase cube of business process enactment event logs from the 10 XES-formatted datasets.
Figure 3. The process pattern graph models before and after removing noises from the dataset of 1000-all-synthetic-150MB.xes.
Figure 4. Effectiveness of the noise detection and removal: The SICN-based process model discovered from the process pattern graph model.
Figure 5. The numbers of temporal activity-sequencing noises in the datasets of the experimental analyses.
Figure 6. A group of process pattern graph models after removing the noises, each corresponding to a subprocess in the Large Bank Transaction Process Model.
Table 1. The log-datasets used in the experimental analyses.
Log dataset | Traces | Activities | Events
BPI2012 | 13,087 | 24 | 262,200
BPI2017 | 31,509 | 26 | 1,202,267
BPI2018 | 43,809 | 41 | 2,514,266
Large Bank Transaction | 10,000 | 113 | 678,864
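As a quick check on the relative sizes of the datasets in Table 1, the mean trace length (events divided by traces) can be computed directly from the reported counts. The sketch below simply hard-codes the table; the derived averages are not reported in the paper itself.

```python
from typing import Dict, Tuple

# (traces, activities, events) as reported in Table 1.
TABLE_1: Dict[str, Tuple[int, int, int]] = {
    "BPI2012": (13_087, 24, 262_200),
    "BPI2017": (31_509, 26, 1_202_267),
    "BPI2018": (43_809, 41, 2_514_266),
    "Large Bank Transaction": (10_000, 113, 678_864),
}


def mean_events_per_trace(table: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Average number of logged events per trace for each dataset."""
    return {name: events / traces for name, (traces, _activities, events) in table.items()}


for name, mean in mean_events_per_trace(TABLE_1).items():
    print(f"{name}: {mean:.2f} events per trace")
```

This gives roughly 20, 38, 57 and 68 events per trace, so the Large Bank Transaction log has the fewest traces but the longest ones on average, as well as by far the largest number of distinct activities.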
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, K.P. Experimental Analyses of Temporal Activity-Sequencing Anomalies in Process Mining. Appl. Sci. 2023, 13, 3143. https://doi.org/10.3390/app13053143

AMA Style

Kim KP. Experimental Analyses of Temporal Activity-Sequencing Anomalies in Process Mining. Applied Sciences. 2023; 13(5):3143. https://doi.org/10.3390/app13053143

Chicago/Turabian Style

Kim, Kwanghoon Pio. 2023. "Experimental Analyses of Temporal Activity-Sequencing Anomalies in Process Mining" Applied Sciences 13, no. 5: 3143. https://doi.org/10.3390/app13053143

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
