1. Introduction
With the rapid development of the big data era, process mining has become increasingly important. The three types of process mining [1] are process discovery, conformance checking, and process enhancement. Process mining aims to gain insights into business processes by analyzing event data recorded in information systems. Many current process mining algorithms have difficulty handling real event data. Process discovery techniques are constantly being developed and refined, for example heuristic process discovery [2], inductive mining [3], fuzzy mining [4], and integer linear programming mining [5]. Most process discovery techniques assume that event logs faithfully describe the behavior of business processes and strive to reflect all recorded behavior in the business process model; unfortunately, real-life event logs often contain outliers [6], noise [7], and chaotic activities [8]. If the event logs are applied directly without preprocessing, the resulting business process model is complex and difficult to understand, its fitness and precision [9] are low, and it cannot accurately describe the real execution of the business process.
Data quality affects business process discovery in complex real-life environments. Many real-life event logs (e.g., healthcare logs) have data quality issues, some of which cannot be resolved by preprocessing or data cleansing techniques, leading to inaccurate results. Benevento, E. et al. [10] use a process mining (PM) approach called Interactive Process Discovery (IPD), which combines domain knowledge with available data. By putting “people in the loop”, this approach can overcome the limitations of noisy and incomplete event logs, thus improving business process modeling. Pegoraro, M. et al. [11] analyze the previously unexplored setting of uncertain event logs: logs that record quantified uncertainty together with the corresponding data. They define a classification of uncertain event logs and models and examine the challenges that uncertainty poses for process discovery and conformance checking. Chaotic activity detection helps improve business process quality and supports early prediction of various fraudulent activities [12], such as credit card theft [13], false insurance claims [14], and fraud in healthcare systems [15]. Detecting chaotic activity in data streams is extremely challenging because event logs are by nature highly variable [2]. Few studies address chaotic activities, and existing methods treat them as ordinary anomalies, so they cannot filter chaotic activities accurately. Detecting implied outliers by calculating the distances between points in a dataset has high computational complexity; Mishra, S. et al. [16] focus on major outlier detection algorithms based on local outlier factors. Böhmer, K. et al. [17] propose an anomaly detection method based on association rule mining that can handle process changes and flexible executions that might otherwise lead to false alarms. This helps in taking appropriate countermeasures against malignant anomalies and avoiding the unnecessary termination of benign process executions. van Zelst, S.J. et al. [18] propose an event processor capable of filtering spurious events out of an event stream, thus improving process mining results. Cai, S. et al. [19] propose an outlier detection method based on two-stage minimum weighted rare pattern mining, in which two deviation factors are defined to measure the degree of abnormality of each transaction in the weighted data stream during the outlier detection stage.
Process discovery typically analyzes frequent behaviors in event logs to gain an intuitive understanding of processes. However, the randomness and uncertainty of chaotic activities increase the difficulty of process mining and hinder the applicability of process mining techniques. Koschmider, A. et al. [20] investigate the extent to which anomaly detection techniques can advance the field of process mining. van Zelst, S.J. et al. [21] propose a generic event stream filter that relies on incrementally updated automata to filter out spurious events, aiming to detect and remove infrequent behaviors from the event stream. The data-aware heuristic mining algorithm [22] is a process discovery algorithm that uses data attributes to distinguish infrequent paths from random noise through classification techniques, which can effectively filter random noise. However, in real life there are valid but uncommon behaviors (e.g., aerospace escape systems) that help to improve business processes. Most existing research either ignores them or dismisses them as harmful chaotic behaviors. To distinguish valid infrequent sequences from chaotic activities, an algorithm based on maximum probability path analysis of strong migration relationships between activity distribution states and behaviors has been proposed in [23]. Infrequent logs are preprocessed using conditional probabilistic entropy to remove individual noisy activities that are very irregularly distributed in the traces. Valid sequences are then extracted from the logs based on the state transition information of the activities. This method not only preserves the key structure of the model but also improves its quality. Wang, L. et al. [24] further combine the control-flow and data-flow perspectives for effective infrequent behavior analysis: they use frequent patterns and interaction behavior profiles to find infrequent behaviors, and then analyze how strongly data-flow information influences infrequent behaviors through conditional dependence probability, thus proposing an effective infrequent behavior identification method based on data-aware frequent patterns.
Chaotic activity is independent of rarity: owing to its stochastic nature, it can occur frequently or infrequently in the event log. Koschmider, A. et al. [20] examine studies covering various aspects of (semi-)automated outlier/noise detection and propose a method for identifying anomalous traces in event logs based on clustered similarities between traces. When chaotic activity is present in most of the traces, the number of traces that the method can use for model discovery decreases, and ultimately an accurate process model cannot be found. Yi, G. et al. [25] propose a chaotic activity filtering method based on bidirectional causal dependencies. The method filters chaotic activities in the event log by analyzing the bidirectional causal dependencies between activities and using the accuracy between the model and the event log. Chaotic activities are executed arbitrarily in the process, affecting the quality of the discovered model. To address the difficulty of defining chaotic activity without prior knowledge, Lamghari, Z. et al. [26] propose a technique that uses unsupervised learning to identify chaotic activity without labeled training data.
Filtering techniques for infrequent behaviors and noise in event logs are well developed; however, existing filtering techniques for chaotic activities remain inadequate because of the random and disorderly nature of chaotic activities. The frequency-based activity filtering discussed in [8] cannot solve the problem caused by chaotic activity. To solve this problem, this paper proposes an entropy-based behavioral closeness algorithm for filtering chaotic activities. First, based on communication behavior profile theory combined with feature and module nets, high-frequency logs are used to construct a process model, which lays the modeling foundation for chaotic activity model queries. Second, the set of suspicious chaotic activities is identified through Laplace-smoothed entropy. Third, query models are constructed from the logs containing suspicious chaotic activities. Finally, based on the K-order successor relationship, the behavioral closeness of the query model and the business process model is analyzed, which can effectively filter the chaotic activities in the event log.
The rest of the paper is organized as follows: Section 2 presents the motivating case, and Section 3 presents the fundamentals. In Section 4, we build a business process model based on feature network and module network, which lays the foundation for the model queries used in chaotic activity filtering. Section 5 presents the entropy-based chaotic activity filtering method, Section 6 reports the experimental evaluation, and the final section concludes the paper.
  2. Motivation
Chaotic activities can occur randomly anywhere in a business process without constraints, obscuring the true direct succession relations in the event log. The flow relationships introduced by these disturbing events make the business process model redundant and unreadable, so that even a model mined by a well-developed process discovery technique cannot accurately describe the actual business process. As a result, it is difficult for organizational managers to understand the overall process from such a complicated model, and difficult for business process analysts to make correct decisions given its poor readability.
Figure 1 illustrates a comparison between a business process model containing chaotic activities and the model obtained after filtering them. The process containing chaotic activities is shown in Figure 1a, where the events are interleaved to form a spiderweb-like flow relationship, and the chaotic activity D acts as an input to multiple events, making business process execution ambiguous. In Figure 1b, the business process model after filtering chaotic activities is more concise, readable, and executable. Therefore, it is important to filter the chaotic activities in the event log in order to construct a reasonable business process model.
Little research has addressed the filtering of chaotic activity in event logs, and the difficulty lies in distinguishing low-frequency chaotic activity from effective infrequent behavior. Low-frequency chaotic activity often appears in real event logs. In the online shopping business shown in Figure 2, when a customer submits an order but does not pay, the seller sends an SMS reminding the customer to complete the payment or cancel the order; in other cases, the seller sends the customer information about discounts on the shop’s products for promotional purposes. Both situations occur relatively rarely in the online shopping business process. The former benefits the online shopping business process and is called effective infrequent activity, while the latter delivers spam messages that disturb customers’ lives and is called chaotic activity. In this paper, we propose an entropy-based behavioral closeness filtering method for chaotic activities that can effectively distinguish chaotic activities from effective infrequent behaviors.
  3. Basic Knowledge
Definition 1  (
K-order successor relation [
27])
. Let the network system be , where , 
. For the transition sequence 
          , the  K-order successor relation of  is defined as follows: The 
K-order successor of system 
 is defined as:
The maximum K-order successor relationship between transition pairs is defined as:.
Definition 2  (Minimum 
K-order Succession Relationship [
27])
. Let  be a sound business process Petri net, the minimum K-order succession relationship for any sequence of transition occurrences  is defined as follows: To better illustrate the minimum K-order successor relationship, an example is introduced below.
From 
Figure 3a of the business flowchart, it can be seen that transition 
 is a direct successor to transition 
, 
 through a place to either 
 or 
. So, the minimum 
K-order successor relation is 
 for the transition pair    
, and similarly the minimum 
K-order successor relation is 
 for the transition pair 
. The shortest path from transition 
 to transition 
 takes two places, so the minimum 
K-order successor relation for transition pair 
 is 
. The minimal 
K-order successor relationship 
 of the transitions to 
 is due to the fact that the transition 
 occurring in place 
 must occur at the same time, and the activity 
 occurring in place 
 gets marked at the same time. The execution path of 
 is 
, i.e., 
. The minimum 
K-order successor relationships between the other pairs of transitions are shown in 
Figure 3b.
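The worked example above counts the places on the shortest path between two transitions. A minimal sketch of that counting is given below, assuming the net is supplied as a map from each transition to the transitions reachable through exactly one place. The function name `min_k_order`, the map `post`, and the small net are hypothetical, and the sketch does not model the concurrency case discussed in the example.

```python
from collections import deque

def min_k_order(post, source, target):
    """Fewest places on any directed path from `source` to `target`.

    `post` maps a transition to the set of transitions that can be reached
    through exactly one place. Returns None if `target` is unreachable.
    """
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in post.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical net: t1 -> {t2, t3} -> t4, each arrow passing through one place.
post = {"t1": {"t2", "t3"}, "t2": {"t4"}, "t3": {"t4"}}
k_direct = min_k_order(post, "t1", "t2")  # direct successor
k_two = min_k_order(post, "t1", "t4")     # shortest path crosses two places
k_none = min_k_order(post, "t4", "t1")    # no path back in this net
```

Breadth-first search is used because it visits transitions in order of increasing path length, so the first time the target is reached the count is minimal.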
Definition 3  (Behavioral inclusion [
28])
. Let  and  be two net systems where  and ; if any sequence of transitions occurring  there exists a sequence of transitions occurring , such that , denoted as , then net  is said to be the behavioral inclusion of , denoted as .
 Definition 4  (Behavioral closeness [
28])
. Let  and  be two net systems, where  and ; the maximum K-order successor relation and the minimum K-order successor relation of net system  are , , and that of net system  are ,, respectively. The closeness of the transitions to  in the net system  is defined as follows: Net system  to  Closeness of .
Definition 5  (Direct Pre-set Rate, Direct Post-set Rate [
8])
. The Direct Post-set Rate, denoted , represents the ratio  of the direct post-sets where activity  is a direct post-set of activity  in the log, where the smooth parameter  is usually is the number of activities in the event log , and  represents the number of direct post-sets. 
           indicates the number of logs that contain the active  in the log . Similarly, the direct pre-set rate .
 Definition 6  (Entropy [
8])
. Introduces the probability distribution function , and the Laplacian smooth entropy of activity  is defined in log  by the direct pre-set rate and the direct post-set rate as follows:
 Definition 7  (Communication Behavior Profile). Let  be the event log and  be the corresponding communication successor relationship. The communication behavior profile is a triple , consisting of the following relations:
- (1)
- Strict communication relation    if and only if ; 
- (2)
- Inter-communication relation  if and only if ; 
- (3)
- Exclusive communication relationship  if and only if . 
 Definition 8  (Feature Net). Let  be the event log, and  be the feature. Let  be the communication behavior profile, and the feature net  satisfy the following conditions:
- (1)
- ; 
- (2)
- ; 
- (3)
- ; 
- (4)
where  is the set of places and  is the workflow network.
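The three relations of Definition 7 follow the usual pattern of a behavioral profile: strict order, interleaving, and exclusiveness. As a rough illustration only, the sketch below assumes the standard construction over a weak order relation (a occurs somewhere before b in at least one trace), which may differ from the communication successor relationship intended in Definition 7. All names and the sample log are hypothetical.

```python
from itertools import combinations

def weak_order(log):
    """Pairs (a, b) such that a occurs before b in at least one trace."""
    pairs = set()
    for trace in log:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                pairs.add((a, b))
    return pairs

def behavior_profile(log):
    """Classify every unordered pair of distinct activities."""
    order = weak_order(log)
    activities = sorted({a for trace in log for a in trace})
    profile = {}
    for a, b in combinations(activities, 2):
        fwd, bwd = (a, b) in order, (b, a) in order
        if fwd and bwd:
            profile[(a, b)] = "interleaving"
        elif fwd:
            profile[(a, b)] = "strict"          # a always before b
        elif bwd:
            profile[(a, b)] = "reverse strict"  # b always before a
        else:
            profile[(a, b)] = "exclusive"       # never in the same trace
    return profile

# Hypothetical log with a parallel pair (b, c) and an alternative branch d.
log = [["a", "b", "c", "e"], ["a", "c", "b", "e"], ["a", "d", "e"]]
profile = behavior_profile(log)
```

In this log, b and c are observed in both orders and are therefore interleaving, while b and d never share a trace and are exclusive.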
   4. Build Business Process Model Based on Feature Network and Module Network
In business process management, driven by the diversified needs of organizations, it has become a new trend to connect models of running systems through various behavioral interactions [29]. Such interactions are more prone to chaotic activities, which make the event log increasingly redundant and thus increase the difficulty of process mining. To address these problems, this section proposes an interactive process model mining method based on feature and module networks.
  4.1. Fusion of Feature Network and Module Network Online Shopping Model
Chaotic activity occurs in a disorderly and random manner, and its presence in the event log not only complicates the flow relationships of the business process but also masks the true successor relationships between events. For example, in the event log , the direct successor of activity  is . If chaotic activity  occurs randomly between  i.e., , then the direct successor of activity  is , and the direct successor of  is . The existence of the chaotic activity  hides the real direct successor relationship. To address this problem, this section proposes an interactive process model mining method based on a feature network and a module network. First, the internal order relationships among the features corresponding to activities in the logs are analyzed, and the initial module network is mined. Second, according to the definitions of interface transitions and feature nets, the logs are traversed to mine transitions, and places are added to them. Then, following the idea of net synthesis, the interacting modules are fused into a complete process model through the interface places, which lays the foundation of the query model for chaotic activities.
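The masking effect described at the start of this subsection can be reproduced on a toy log. The sketch below counts direct successor pairs before and after a chaotic activity is inserted at arbitrary positions; the activity labels `a`, `b`, `c`, `x` and both logs are hypothetical and are not taken from the online shopping data.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity b immediately follows activity a."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

clean_log = [["a", "b", "c"], ["a", "b", "c"]]
# The same traces with a chaotic activity x inserted at different positions.
noisy_log = [["a", "x", "b", "c"], ["a", "b", "x", "c"]]

clean_df = directly_follows(clean_log)
noisy_df = directly_follows(noisy_log)
```

In the clean log the pair (a, b) is observed in every trace. In the noisy log it is observed only once, and new pairs such as (a, x) and (x, c) appear, so the real direct successor relationship is weakened.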
This section takes the three modules of online shopping, namely the buyer (Y), the seller (Z), and the rebate system (X), as an example to show how the fusion of feature network and module network is used to construct a complete online shopping business process model. A total of 8542 online shopping event logs are selected from the system. We start with the activity names of each of the three modules.
Buyer (Y):  login,  select goods,  add to cart,  continue to purchase,  direct purchase,  Submit the order,  default delivery address,  new harvest address,  payment method,  Alipay,  Bank card,  ant flower,  payment success,  receipt of goods,  confirmation of receipt,  evaluation,  return application,  express return,  payment return.
Seller (Z):  Store Centre,  Sold Items,  Pending Payment,  Send Payment Reminder SMS,  Packing,  printing Courier,  shipping order,  logistics delivery,  return application approval,  Provide return address,  successful transaction,  receive payment,  receive goods,  send discount SMS,  send coupons 300 − 40.
Rebate system (X):  Register and log in,  paste purchase link,  rebate information,  Coupon Collection,  Purchase,  rebate failure,  rebate success,  rebate to the account.
The interaction log between the three modules of online shopping, buyer, seller, and rebate system is as follows:
The logs corresponding to each module are extracted from the source logs according to the module characteristics, and the interaction system is shown in Table 1.
The communication behavior profile is applied to build the interaction system between the modules; see Table 2.
The interactive information flow between the modules can be seen from the communication behavior relationship table (Figure 4), and the complete online shopping business flow chart is shown in Figure 5.
  4.2. Fusion of Feature Network and Module Network Mining Algorithm
Many business processes consist of multiple interacting modules that exchange information, so merging multiple modules into a reasonable business process model is a focus of process mining. Because most systems are produced by the interaction of different modules, the logs they generate mix events from multiple modules. In this section, a business process discovery algorithm based on the multi-module fusion of feature networks and module networks is proposed. The algorithm first classifies the features in the log into different modules, and then derives the intra-module and inter-module feature interactions in order to fuse them. The business process model mining algorithm based on feature and module nets is described below.
In Algorithm 1, line 1 classifies event logs according to module characteristics to get module logs. Lines 3–4 construct a behavioral profile table  
 for each module log, and construct a module net 
 based on the behavioral profile table, using the communication behavior profile to obtain a characteristic network 
 of interaction behaviors between the two modules. The information communication between modules in lines 5–9 is described below.
        
|  | Algorithm 1 Mining algorithms for Fusing Feature and Modular Nets | 
|  | Input: Event Logs: | 
|  | Output: Interactive business process model | 
| 1 | // according to module characteristics to get module logs. | 
| 2 | For  do | 
| 3 | //construct a behavioral profile table  . | 
| 4 | // construct a module net based on the behavioral profile table. | 
| 5 | if  then | 
| 6 |  | 
| 7 | Else  if | 
| 8 |  | 
| 9 | Else  if | 
| 10 |  | 
| 11 |  | 
| 12 |  | 
| 13 | END | 
| 14 | Return | 
| 15 | END | 
If 
 receives a message from 
, the communication flow between 
 and 
 is as follows:
If 
 transmits a message to 
, then the business flow between 
 and 
 is as follows:
If 
 receives messages from 
 and can receive and send messages to 
, then the business flow between 
, 
, 
 is as follows:
An open Petri net that satisfies the above flow relationships is a feature net .
In lines 10–13, the flow relationship of feature  in feature network  is added to the interactive module networks, and the business process model  of fusion feature network  and module network  is obtained. Finally, the interactive business process model  is output.
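Line 1 of Algorithm 1, which splits the event log into module logs, can be pictured as a projection of each trace onto the activities of one module. The sketch below is one possible reading under that assumption; the function name `split_by_module`, the mapping `module_of`, and the activity labels are hypothetical.

```python
def split_by_module(log, module_of):
    """Project every trace onto the activities owned by each module."""
    modules = sorted(set(module_of.values()))
    module_logs = {m: [] for m in modules}
    for trace in log:
        for m in modules:
            projected = [a for a in trace if module_of.get(a) == m]
            if projected:  # skip traces in which the module does not take part
                module_logs[m].append(projected)
    return module_logs

# Hypothetical labels; the prefixes mirror buyer (Y), seller (Z), rebate system (X).
module_of = {"y1": "Y", "y2": "Y", "z1": "Z", "x1": "X"}
log = [["y1", "z1", "y2"], ["x1", "y1", "y2"]]
module_logs = split_by_module(log, module_of)
```

Each module log keeps the original order of its own activities, which is what the per-module behavioral profile tables in lines 3–4 would then be computed from.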
  5. Chaotic Activity Filtering Method
The presence of chaotic activities may lead to confusing and less understandable mining results, and filtering out these activities can improve the accuracy and reliability of process mining. It can help people better understand and analyze the process, find potential problems, and prevent various fraud incidents.
  5.1. Entropy-Based Behavioral Closeness Filtering of Chaotic Activity Methods
For each activity in the low-frequency event log, the direct pre-set rate, the direct post-set rate, and the Laplace-smoothed entropy were calculated; the resulting entropy values of the activities are listed in Table 3.
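The calculation behind Table 3 can be sketched as follows. The exact formulas of Definitions 5 and 6 are not legible in this section, so the sketch assumes a common formulation: each rate is a smoothed relative frequency, the smoothing parameter defaults to one over the number of activities, artificial start and end symbols close each trace, and the entropy of an activity is the sum of the entropies of its successor and predecessor distributions. All names and the toy log are hypothetical.

```python
import math
from collections import Counter

START, END = "<start>", "<end>"  # artificial boundary symbols (assumption)

def smoothed_entropy(log, alpha=None):
    """Entropy of the smoothed direct post-set plus direct pre-set rates."""
    activities = sorted({a for trace in log for a in trace})
    # alpha must be positive so that no smoothed probability is zero.
    alpha = 1.0 / len(activities) if alpha is None else alpha
    follows, precedes, occurrences = Counter(), Counter(), Counter()
    for trace in log:
        padded = [START, *trace, END]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            occurrences[cur] += 1
            follows[(cur, nxt)] += 1
            precedes[(cur, prev)] += 1

    def entropy(counts, a, neighbours):
        denom = alpha * len(neighbours) + occurrences[a]
        total = 0.0
        for b in neighbours:
            p = (alpha + counts[(a, b)]) / denom
            total -= p * math.log2(p)
        return total

    return {
        a: entropy(follows, a, activities + [END])
        + entropy(precedes, a, activities + [START])
        for a in activities
    }

# Toy log: a, b, c follow a fixed order; x is placed at arbitrary positions.
log = [["a", "x", "b", "c"], ["a", "b", "x", "c"],
       ["x", "a", "b", "c"], ["a", "b", "c", "x"]]
entropies = smoothed_entropy(log)
```

In this toy log the neighbours of `x` are spread evenly over all activities, so its value is larger than that of `a`, whose neighbours are concentrated on a few activities. How these values are thresholded is a separate decision that the sketch leaves to the caller.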
From 
Table 3, it can be seen that entropy values of activities 
, 
, 
, 
 are lower than 0.3, and are put into the set of suspicious chaotic activities 
. The event log which contains the suspicious chaotic activity 
, 
 is 
. The query model 
 containing the suspicious chaotic activity 
, 
 is shown in 
Figure 6.
Event logs containing suspected chaotic activity 
, 
 are:
        
The query model 
 containing the suspicious chaotic activity 
, 
 is shown in 
Figure 7.
The closeness scores between the query model 
, 
, and the online shopping model 
 are calculated separately as follows.
        
From the closeness score, it can be clearly seen that 
, indicating that the query model 
, which is composed of event logs containing suspected chaotic activities 
, 
, has a high closeness with the online shopping business process model, which is in line with the business process 
, which is not a chaotic activity but an important part of the business process. When the customer selects the goods to be purchased and submits an order without payment, the seller sends a reminder SMS, the customer may cancel the order or may go to the payment to complete the transaction, which is part of the online shopping business process. The query model 
 consisting of the event log containing the suspicious chaotic activity 
, 
 has a low closeness to the online shopping business process model 
, and the activity 
, 
 has little relevance to the online shopping business process model, so 
, 
 is a chaotic activity and should be deleted from the business process. The event log after filtering chaos activity is as follows.
        
  5.2. Entropy-Based Behavioral Closeness Filtering Chaotic Activity Algorithm
Suspicious low-frequency chaotic activities, which are hidden in the event logs and destroy the direct successor relationships between activities, can be detected by calculating the entropy value with Laplace smoothing. The event logs containing suspicious chaotic activities are extracted and a query model is constructed based on the behavioral profile. The query model is then matched against the complete business process model to derive a closeness score that identifies chaotic activities. The detailed steps of the algorithm are as follows.
Lines 7–8 of Algorithm 2 compute the direct post-set rate and the direct pre-set rate for activity “a”. Line 9 calculates the Laplace smooth entropy value for each activity. Lines 10–16 perform further behavioral judgments on the set W of suspected chaotic activities: first, the behavior profile relationship table is constructed for each suspicious chaotic activity; then, all traces containing suspected chaotic activity are extracted to construct the query model. Lines 17–25 calculate the first-order successor relationship and the K-order successor relationship between the query model and the business process model and, finally, the closeness score. If the closeness score is less than 0.9, the suspected chaotic activity contained in the query model is judged to be chaotic.
Entropy behavioral closeness is a measure of complexity and uncertainty in behavioral sequences. It measures the randomness and diversity of a sequence of behaviors by calculating the entropy of the information in the sequence. The choice of an entropy value of 0.9 as the threshold is an empirical choice for determining whether the behavioral sequence has sufficient complexity and randomness.
When the entropy value of a behavioral sequence is greater than 0.9, it indicates that the sequence has a high level of complexity and diversity. Such a sequence may contain a variety of different behavioral patterns that do not satisfy the stability of the business process, whereas event logs that satisfy certain constraint rules have uniquely determined behavioral patterns.
However, the value 0.9 is not a fixed criterion and can be adapted to the specific situation in practical applications. A higher threshold can be chosen if a more rigorous screening of behavioral sequences of higher complexity is required; conversely, a lower threshold can be chosen if more behavioral sequences need to be included. In the online shopping case studied in this paper, repeated tuning showed that a threshold of 0.9 filters the chaotic activity in the business process completely.
        
| Algorithm 2 Entropy based Behavior Closeness Chaos Activity Filtering Algorithm | 
| Input: Event Logs: , Threshold: | 
| Output: Event logs after the chaos activity is filtered | 
| 1 | Directly Precedes Ratios: | 
| 2 | Directly Follows Ratios: | 
| 3 | Laplace smooth entropy: | 
| 4 | Business Process Model | 
| 5 | Suspicious chaotic activity set: | 
| 6 | For  do | 
| 7 | // Activity a directly follows ratios in . | 
| 8 | //Activity a directly precedes ratios in . | 
| 9 | //Calculate the Laplacian smooth entropy for each activity | 
| 10 | If    then | 
| 11 |  | 
| 12 | For  do | 
| 13 | //Extract all traces of suspicious chaotic actives | 
| 14 | Behavior Profile Relationship | 
| 15 | //Construct a query model of suspicious chaotic activity trace | 
| 16 | end | 
| 17 | //First order inheritance relation | 
| 18 | //k order inheritance relation | 
| 19 | //Calculate the closeness score between | 
| 20 | end | 
| 21 | Chaotic activity set: | 
| 22 | If     then | 
| 23 |  | 
| 24 |  | 
| 25 | end | 
| 26 | end | 
| 27 | return | 
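The control flow of Algorithm 2 can be summarized in a short sketch. It is not the authors' implementation: the closeness computation of Definition 4 is not reproduced and is instead passed in as a callable, the closeness is taken per activity rather than per query model, and the activity names, entropy values, and closeness scores in the example are hypothetical. The default thresholds 0.3 and 0.9 are the values quoted in Section 5.1 and in the description of Algorithm 2.

```python
def filter_chaotic(log, entropy, closeness,
                   entropy_threshold=0.3, closeness_threshold=0.9):
    """Two-stage filter: entropy selects suspects, closeness confirms them.

    `entropy` maps each activity to its Laplace-smoothed entropy value.
    `closeness` is a callable returning the closeness score between the
    query model built for a suspect and the business process model.
    """
    suspicious = {a for a, h in entropy.items() if h < entropy_threshold}
    chaotic = {a for a in suspicious if closeness(a) < closeness_threshold}
    filtered = [[a for a in trace if a not in chaotic] for trace in log]
    return suspicious, chaotic, filtered

# Hypothetical values loosely following the SMS example of Section 2.
log = [["order", "sms_reminder", "pay"], ["order", "sms_discount", "pay"]]
entropy = {"order": 0.8, "pay": 0.7, "sms_reminder": 0.2, "sms_discount": 0.1}
scores = {"sms_reminder": 0.95, "sms_discount": 0.4}

suspicious, chaotic, filtered = filter_chaotic(log, entropy, scores.get)
```

Both SMS activities are suspicious, but only the discount SMS falls below the closeness threshold, so the payment reminder is kept as an effective infrequent activity.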
  6. Experimental Evaluation
This section uses synthetic logs and real event logs, respectively, to evaluate the effectiveness and performance of the proposed method and to analyze the experimental results. The experiments were run on a machine with a 3.4 GHz Intel Core i7 processor and 16 GB of RAM. The evaluation results on synthetic event logs are given in Section 6.1, and the evaluation results on real-life logs are given in Section 6.2.
  6.1. Verification of Chaotic Activity Filtering Methods for Synthetic Event Logs
This paper uses an entropy-based behavioral closeness method for filtering chaotic activities, built on the fusion of feature network and module network. To verify the effectiveness of the method, PSLG (Petri simulation log generation) was first written in Java; it simulates the generation of event logs in .xes format for business process systems with multi-organizational interactions. The PSLG-1.0 software was used to generate 5367 synthetic logs in which the number of events is 2365 and the number of activities is 22; the minimum number of activities in each case is 12, the maximum is 18, and the average is 14.
In the synthetic event logs, chaotic activities are independent of the business process state and can occur randomly at any time and position. Three categories of chaotic activities are randomly inserted at arbitrary positions of the generated log: Random Chaotic Activity [U], Frequent Chaotic Activity [F], and Infrequent Chaotic Activity [I]. All labeled chaotic activities are then filtered using the direct filtering technique, the indirect filtering technique, most frequent first (MFF), least frequent first (LFF), and the entropy-based behavioral closeness filtering chaotic activity method (EBCFCA), and the effectiveness of each method is judged by counting the number of correct activities that it misjudges as chaotic. The lower the number of misjudgments, the better the filtering.
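The random insertion of labeled chaotic activities can be illustrated with a short sketch. It is not the PSLG tool: the function name `insert_chaotic`, the per-trace probability used to approximate frequent versus infrequent chaotic activity, and the fixed seed are assumptions made for illustration.

```python
import random

def insert_chaotic(log, label, per_trace_probability, seed=0):
    """Insert `label` at a uniformly random position of randomly chosen traces."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    noisy = []
    for trace in log:
        trace = list(trace)
        if rng.random() < per_trace_probability:
            # randint is inclusive, so the label may also be appended at the end.
            trace.insert(rng.randint(0, len(trace)), label)
        noisy.append(trace)
    return noisy

clean = [["a", "b", "c"]] * 5
frequent = insert_chaotic(clean, "F", per_trace_probability=1.0)
untouched = insert_chaotic(clean, "I", per_trace_probability=0.0)
restored = [[a for a in trace if a != "F"] for trace in frequent]
```

Because the inserted label is known, the number of real activities that a filter removes before all labeled ones are gone can be counted directly, which is the misjudgment count used in this evaluation.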
The 3D graph in Figure 8 compares the most commonly used current filtering techniques, namely the direct filtering technique (Direct), the indirect filtering technique (Indirect), the high-frequency priority filtering technique (MFF), and the low-frequency priority filtering technique (LFF), with the entropy-based behavioral closeness filtering technique for chaotic activity (EBCFCA) proposed in this paper. As the amount of chaotic activity increases, the number of activities misjudged as chaotic by the time all labeled chaotic activities have been filtered out also tends to increase. Take the most difficult-to-filter random chaotic activity (U) as an example. When 220 labeled chaotic activities have been filtered, Direct misjudges 68 activities, a misjudgment rate of 31%; the misjudgment rate is 42% for Indirect, 19.5% for EBCFCA, 35% for MFF, and 42.3% for LFF. As shown in Figure 8, the direct filtering method can better distinguish between actual activities and artificially inserted chaotic activities in the process. Infrequent, randomly positioned activities are the most difficult type of chaotic activity to filter out correctly, because their infrequency may lead to low entropy in the probability distribution of the activities around them. Frequency-based activity filtering techniques, whether they filter the most frequent or the least frequent activities first, cannot filter out the randomly positioned activities inserted into the log, even when only a small number are added. The entropy-based behavioral closeness filtering method proposed in this paper not only considers the behavioral profile relationships between activities but also analyzes the K-order succession relationships between them, which mitigates the weakening of behavioral relationships between real activities caused by chaotic activities. In addition, this paper considers where chaotic activity is most likely to occur during the transfer of information between process models. A query model is constructed from the suspicious chaotic activity traces, and its closeness to the business process model is calculated to detect chaotic activities, which greatly improves the accuracy of chaotic activity detection. As seen in Figure 8, the number of misjudged activities tends to increase as the number of chaotic activities increases, especially for randomly located activities [U].
  6.2. Detection of Chaotic Activity Filtering Method of Real Data
In this section, we apply the BPI Challenge 2019 (BPIC 2019) dataset, which contains 251,734 traces, 1,595,923 events, and 42 activities, to evaluate the chaotic activity filtering method. The dataset provides no a priori knowledge of which activities are chaotic. Therefore, as shown in Figure 9, we filter the dataset with each of the five methods, mine a process model from each filtered log, and evaluate the quality of the mined models to validate the effectiveness of the chaotic activity filtering methods.
Real-life event logs typically contain a wide variety of data quality issues, including incorrectly recorded events, events recorded in the wrong order, and unrecorded events. Existing filtering techniques in the field of process mining can be classified into four categories: (1) event filtering techniques; (2) process discovery techniques with built-in filtering mechanisms; (3) trace-based filtering techniques; and (4) activity filtering techniques. Currently, most techniques treat chaotic activity as noise, and many event log filtering techniques have been proposed to address noise, typically represented by IPF [
30], Fodina [
31], and Heuristic Mining [
32].
IPF is a frequency-based filtering technique that works by removing infrequent prefixes. It can filter chaotic activities that occur relatively infrequently. It is easy to see that a chaotic activity X affects the prefix closures found from the event log. Given a log consisting of the two traces “a, X, b” and “X, a, b,” the activity X causes the prefix closures of the two traces to be in inconsistent states, whereas without the activity X the two traces are identical. Thus, prefix closure-based filtering methods are less effective when chaotic activity is present, because frequent prefixes are randomly split into several infrequent prefixes. Heuristic mining handles noise by treating any low-frequency behavior as noise by default, and it retains the main behaviors recorded in the event log. It supports mining all common structures in the process model but cannot handle repeated tasks. The algorithm requires complete workflow logs; otherwise, the exported model may lose some of the relationships between activities. This form of mining algorithm is therefore only suitable for processes with simple structures and complete workflow logs. The Fodina algorithm converts event logs into task logs and uses contextual information to mine repeated activities. A dependency metric is used to construct a basic dependency graph between activities, ensuring that each task is reachable in the dependency graph, and long-distance dependencies are then mined in the dependency graph. This method filters chaotic activity and noisy data and adds the ability to discover repeated activities.
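The prefix example above can be checked with a short sketch. It only counts trace prefixes and does not implement IPF itself; the helper names `prefix_counts` and `drop_activity` are hypothetical.

```python
from collections import Counter


def prefix_counts(log):
    """Count how often each non-empty trace prefix occurs in the log."""
    counts = Counter()
    for trace in log:
        for end in range(1, len(trace) + 1):
            counts[tuple(trace[:end])] += 1
    return counts


def drop_activity(log, activity):
    """Return a copy of the log with all events of one activity removed."""
    return [[event for event in trace if event != activity] for trace in log]


# The two traces from the text, with the chaotic activity X.
log = [["a", "X", "b"], ["X", "a", "b"]]

with_x = prefix_counts(log)
without_x = prefix_counts(drop_activity(log, "X"))

print(dict(with_x))
print(dict(without_x))
```

With X present, every prefix occurs only once, so the two traces share no prefix; once X is removed, both traces contribute to the same prefixes "a" and "a, b". A prefix-frequency threshold therefore sees the behavior a, b as less frequent than it really is.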
The difference between chaotic activity and noise is that noise stems from logging errors, whereas chaotic activity events are logged correctly: chaotic activity occurs randomly, independently of the business process flow, and can be either frequent or infrequent. Excessive chaotic activity in a process hides the behavioral relationships in the process and hampers decision analysis in process management. Filtering chaotic activity also clearly differs from filtering infrequent behavior, so it requires specialized methods based on the characteristics of chaotic activity rather than noise filtering or low-frequency behavior filtering methods. IAFAE [
8] (indirect activity filtering approach based on entropy) is a greedy approach that iteratively filters chaotic activities from event logs. It takes into account the randomness of chaotic activities and the direct inheritance relationship between activities, and selects the chaotic activities to be filtered based on the Laplace entropy value. IAFAE is effective at filtering chaotic activity in a single business process, but as interaction between organizations becomes more and more frequent, mining chaotic activity in multi-organizational interaction processes is more complex. This paper therefore proposes entropy-based behavioral closeness filtering of chaotic activity in multi-organizational interaction business processes. The relationship between the module networks and feature networks of the organizations is considered, and then the entropy value and the behavioral closeness between suspicious chaotic activity and the business process are calculated based on the 
K-order inheritance relationship, which allows chaotic activity to be filtered more accurately.
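The idea of scoring activities by entropy can be sketched as follows. This is a simplified illustration, not the IAFAE or EBCFCA implementation: it uses plain Shannon entropy over the activities that directly follow each activity, omits the Laplace smoothing mentioned above, and ignores predecessors. The function name `successor_entropy` and the end marker `<END>` are assumptions.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(log, end="<END>"):
    """Shannon entropy (in bits) of the distribution of direct successors.

    Simplified sketch: no smoothing, successors only. A high value means the
    activity is followed by many different activities with similar frequency.
    """
    followers = defaultdict(Counter)
    for trace in log:
        # Pair every event with its successor; the last event is followed by `end`.
        for a, b in zip(trace, trace[1:] + [end]):
            followers[a][b] += 1

    entropy = {}
    for a, counter in followers.items():
        total = sum(counter.values())
        entropy[a] = -sum(
            (n / total) * math.log2(n / total) for n in counter.values()
        )
    return entropy


# Toy log: the real process is a -> b -> c; X appears at a random position.
log = [
    ["a", "X", "b", "c"],
    ["a", "b", "X", "c"],
    ["a", "b", "c", "X"],
    ["X", "a", "b", "c"],
]

entropy = successor_entropy(log)
most_chaotic = max(entropy, key=entropy.get)
print(most_chaotic, round(entropy[most_chaotic], 3))
```

In this toy log X is followed by a different activity in every trace, so its successor distribution is uniform and its entropy is the highest, whereas a, b, and c are each followed mostly by one activity.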
IPF, Fodina, Heuristic Mining, EBCFCA, and IAFAE are applied and compared in terms of fitness [
33], precision [
2], F1-score, and runtime to verify the accuracy of chaotic activity detection. Fitness, also known as recall, quantifies how much of the observed behavior in the event log fits the process model. Precision quantifies the extent to which the process model allows behaviors that are never observed in the event log; the less such behavior is allowed, the higher the precision. The discovered model should generalize the example behaviors seen in the event log. The F-score is the harmonic mean of fitness and precision.
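For reference, the harmonic mean used for the F-score can be written as a small helper. The input values below are illustrative only and are not results from this paper.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision (0.0 if both are zero)."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Illustrative values, not measurements from the experiments.
example = f_score(0.9, 0.6)
print(round(example, 2))
```

Because the harmonic mean is dominated by the smaller of the two values, a model with high fitness but low precision (or the reverse) still receives a low F-score.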
As can be seen in 
Figure 9, as more chaotic activities are included in the event logs, precision, fitness, and F-score show a decreasing trend for all event logs. The IPF algorithm filters business process activities directly based on frequency; this low-frequency-first approach ignores the following relationships between activities and may filter out some valid low-frequency behaviors, so the process model mined from its filtered event log has the lowest F-score. Heuristic mining and the Fodina algorithm not only calculate the direct-following relationship between activities, but also define the eventual-following relationship between activities in order to mine long-term dependencies. Therefore, the models they mine are more accurate than those obtained with the IPF chaotic activity filtering method.
Frequency-based and activity relationship-based filtering methods score lower than the entropy-based filtering techniques (EBCFCA, IAFAE) in terms of fitness, precision, and F-score. This suggests that the entropy-based filtering techniques benefit from taking into account not only the chaotic level of each activity but also the direct-following relationships between activities, as opposed to simply filtering out infrequent activities. Chaotic activities may still be associated with processes. Therefore, the entropy-based behavioral closeness filtering approach for chaotic activity (EBCFCA) proposed in this paper analyzes the communication behavioral profile relationships of the process model in detail using feature and module nets. The logs of the suspicious chaotic activity sets with high entropy values are used to construct the query model, and then the closeness score between the query model and the complete business process model is calculated to filter out the chaotic activities with low closeness scores. In summary, the method in this paper achieves a higher F-score than the entropy-based IAFAE chaotic activity filtering method and maintains a relatively reasonable runtime.
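The two-stage selection described above can be sketched as follows, under strong assumptions. The per-activity entropy and closeness scores are taken as given inputs (this sketch does not reproduce how the paper computes them), and the thresholds, scores, and function names are hypothetical.

```python
def select_chaotic(entropy, closeness, entropy_min, closeness_max):
    """Two-stage selection: high entropy first, then low closeness.

    `entropy` maps each activity to its entropy score; `closeness` maps each
    suspicious activity to its closeness score. Both thresholds are assumed.
    """
    suspicious = {a for a, h in entropy.items() if h >= entropy_min}
    return {a for a in suspicious if closeness[a] <= closeness_max}


def filter_log(log, chaotic):
    """Remove all events of the selected chaotic activities from the log."""
    return [[event for event in trace if event not in chaotic] for trace in log]


# Illustrative scores only: "r" stands for a rare but valid activity that has
# high entropy yet remains close to the process model.
entropy = {"a": 0.8, "b": 0.8, "X": 2.0, "r": 1.9}
closeness = {"X": 0.1, "r": 0.7}

chaotic = select_chaotic(entropy, closeness, entropy_min=1.5, closeness_max=0.3)
filtered = filter_log([["a", "X", "b"], ["a", "r", "b"]], chaotic)
print(chaotic, filtered)
```

The second stage is what keeps the rare but valid activity "r" in the log: entropy alone would flag it as suspicious, while its closeness score places it above the removal threshold.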
Figure 10a shows the business process model mined in ProM from the logs after chaotic activity has been filtered by the frequency-based method. The business process diagram is highly redundant (we zoomed in locally to make it clearer), which shows that the frequency-based filtering technique is unable to identify chaotic activity accurately. The method tries to represent the information in all logs in the model while treating behavior with a low frequency of occurrence as chaotic activity, so it may lose some valid infrequent behaviors (e.g., escape systems in aviation). These activities, although occurring less frequently, are indeed valuable behaviors for business processes. 
Figure 10b corresponds to a method that recognizes that filtering chaotic activities differs from filtering noise: by using the direct and final inheritance relationships between activities and using entropy to calculate the chaotic level of the activities, it can filter some of the chaotic activities, but it is less effective in filtering random chaotic activities. Thus, 
Figure 10b, the business process mined after applying the entropy-based chaotic activity filtering technique, is still somewhat chaotic but already shows the framework structure of the process. 
Figure 10c considers the 
K-order inheritance relationship of each activity at a finer granularity, as well as the closeness of the pattern of the suspected chaotic activity to the process model. The complexity and uncertainty of the behavioral sequence are measured by calculating the entropy value of each activity. Filtering out chaotic activities with high entropy values reduces the noise and redundant information in the mined process model and improves the reliability of the model. This helps to mine clearer and more ordered process models for better understanding and analysis of actual business processes.
   7. Conclusions
In this paper, we show that the existence of chaotic activities in event logs makes the quality of the process models obtained by process discovery unreliable. Entropy is introduced to calculate the degree of chaos and uncertainty of activities, and the activities with high entropy are collected as suspicious chaotic activities. A query model is then constructed from the logs that contain suspicious chaotic activities. The interactive business process model is built by fusing feature and module nets. A closeness score is computed from the K-order inheritance relationship between the query model and the interaction process model to detect chaotic activities, and the chaotic activities are filtered to ensure that the business process model has a high degree of fitness and precision. This method overcomes the shortcoming of existing frequency-based filtering methods, which are unable to filter chaotic activities, and is able to accurately distinguish between low-frequency chaotic activities and valid infrequent behaviors, which significantly improves the accuracy of the mined business process models.
This paper has mainly carried out theoretical and methodological research and is deficient in terms of application. Although case studies are also carried out in the paper, they are still very limited. The existence of chaotic activities in a business process may cover up the real behavioral relationships of the business process, resulting in redundancy of the business process and an inability to manage its security. Preliminary research on some industry data suggests that identifying chaotic activity has good application value in the coal mining industry’s big data analysis and risk early warning systems; in financial risk identification, prevention, and control; and in the insurance claims business process. In the future, the method proposed in this paper could be used to analyze the process logs of these and other large-scale industry datasets.
The degree of chaos of an activity is judged against empirical thresholds to decide whether the activity is chaotic or ordinary; even after repeated tuning, this may still lead to misjudgment. Moreover, chaotic activity recognition often requires the analysis of data flow information, change propagation often occurs when data flow and control flow are fused, and the impact of decision analysis on chaotic activity recognition is not considered in this paper. It is therefore necessary to further explore how to fuse control flow and data flow, how to combine this fusion with machine learning methods to discover the feature set of chaotic activity from event logs, and how to recognize chaotic activity through, for example, attention mechanisms in neural networks. 
In future work, we will analyze association rules and attention mechanisms in business processes in order to filter chaotic activities more effectively. With machine learning algorithms, chaotic activities in business processes can be modeled and predicted to provide more accurate analytical results and decision support. Methods and techniques for mining chaotic activities in business processes need continuous improvement and optimization, so that the accuracy and efficiency of chaotic activity detection can better meet actual needs.