Article

Domain Knowledge-Enhanced Process Mining for Anomaly Detection in Commercial Bank Business Processes

1
School of Management & Engineering, Nanjing University, Nanjing 210093, China
2
Postdoctoral Research Station, Suzhou International Development Group Co., Ltd., Suzhou 215008, China
*
Author to whom correspondence should be addressed.
Systems 2025, 13(7), 545; https://doi.org/10.3390/systems13070545
Submission received: 31 May 2025 / Revised: 30 June 2025 / Accepted: 3 July 2025 / Published: 4 July 2025
(This article belongs to the Special Issue Business Process Management Based on Big Data Analytics)

Abstract

Process anomaly detection in financial services systems is crucial for operational compliance and risk management. However, traditional process mining techniques frequently neglect the detection of significant low-frequency abnormalities due to their dependence on frequency and the inadequate incorporation of domain-specific knowledge. Therefore, we develop an enhanced process mining algorithm by incorporating a domain-specific follow-relationship matrix derived from standard operating procedures (SOPs). We empirically evaluated the effectiveness of the proposed algorithm based on real-world event logs from a corporate account-opening process conducted from January to December 2022 in a Chinese commercial bank. Additionally, we employed large language models (LLMs) for root cause analysis and process optimization recommendations. The empirical results demonstrate that the E-Heuristic Miner significantly outperforms traditional machine learning methods and process mining algorithms in process anomaly detection. Furthermore, the integration of LLMs provides promising capabilities in semantic reasoning and offers explainable optimization suggestions, enhancing decision-making support in complex financial scenarios. Our study significantly improves the precision of process anomaly detection in financial contexts by incorporating banking-specific domain knowledge into process mining algorithms. Meanwhile, it extends theoretical boundaries and the practical applicability of process mining in intelligent, semantic-aware financial service management.

1. Introduction

Operational management is a cornerstone of financial service systems. To ensure efficiency, security, and compliance in the delivery of financial services, operational management involves the rigorous planning and management of various operational activities, such as process design, performance analysis, productivity evaluation, risk prevention, and regulatory compliance [1,2]. With the rapid digitalization and automation of financial services [3], improving the operational quality and risk resilience of financial services has become an urgent issue for financial institutions [4]. As an important foundation of service operation management, business process management (BPM) has gradually received widespread attention from academia and the industry [5,6,7].
Commercial banks are essential providers of financial services. The standardization and efficiency of business processes significantly influence the overall quality of financial services and the compliance of risk management [8]. In 2005, the China Banking Regulatory Commission (CBRC) advocated the transition of banking organizations from a “department-driven” to a “process-driven” model. However, limited by traditional manual process-combing methods, most banks still depend largely on static, expert-driven standard operating procedures for process management. These methods have failed to adapt to the complex and dynamic business environment of contemporary banking. On the one hand, static flowcharts have difficulty representing the actual execution path of business processes. On the other hand, manual combing struggles to identify deviations between actual business processes and SOPs, which potentially hinders the recognition of operational risks. Additionally, the absence of automated and intelligent process analysis tools constrains the efficiency and precision of process management.
To address the limitations of traditional business process management, process mining technology has increasingly gained significant interest in the financial services sector as an essential tool for advancing financial digital transformation [9,10]. Process mining has integrated data mining and process modeling theory to rediscover actual business processes from event logs in information systems, thereby effectively detecting bottlenecks and deviations [11]. Although studies have validated the substantial theoretical insights of process mining [12,13,14,15], the majority predominantly emphasize innovations in algorithms and theoretical discussions [16,17,18], with limited empirical examinations of practical banking applications. Evidence from the BPI Challenge in banking domains suggests that Heuristic Miner algorithms utilize frequency thresholds to simplify process models [19,20,21], but these kinds of frequency-based methods may conceal low-frequency but potentially critical paths [22]. Furthermore, conventional process mining algorithms generally lack domain-specific knowledge because they do not account for established SOP rules during discovery. Normally, SOPs only appear during conformance checking. As a result, the rediscovered process models might omit important but infrequent SOPs, reducing their usefulness for compliance analysis. In sum, a significant gap remains regarding how to integrate banking-specific domain knowledge into process mining to improve the precision of process anomaly detection.
In this study, we address these gaps by developing an enhanced process mining algorithm that integrates domain-specific banking knowledge for process anomaly detection in commercial bank business processes. Specifically, we propose the E-Heuristic Miner algorithm, which incorporates a domain-specific follow-relationship matrix derived from the SOPs of banks into a traditional Heuristic Miner algorithm. This approach aims to preserve infrequent yet business-critical process paths, ensuring that low-frequency anomalies, such as procedural violations or rare fraud scenarios, are not overlooked by the mining process. At the same time, the algorithm highlights non-compliant paths; if an observed activity transition does not appear in the SOP-defined matrix, it is explicitly marked in the resulting process graph as a deviation node. Overall, the E-Heuristic Miner significantly improves anomaly detection precision and efficiency.
Additionally, we explore the potential applications of LLMs in semantic process analysis and intelligent process optimization to extend the research frontier of process management. As a form of intelligent post-processing, LLMs can produce natural language explanations of deviations and anomalies identified by the E-Heuristic Miner. Thus, non-technical stakeholders can understand the deviations without needing to interpret complex models, and they can use the LLM’s insights to inform remediation decisions or process improvements.
This research makes the following contributions:
  • We propose and empirically validate the E-Heuristic Miner algorithm, a novel method integrating banking domain knowledge into process mining, which significantly enhances the precision of anomaly detection in commercial banking processes.
  • We illustrate the practical utility and efficacy of the proposed method by conducting a thorough empirical analysis using real-world event logs from a commercial bank. This analysis provides a clear, data-driven path for process optimization.
  • We expand the theoretical and practical boundaries of process management research by exploring the application of LLMs in semantic process understanding and process analysis.
The rest of this study is structured as follows. Section 2 presents the theoretical background of process mining and process anomaly detection methods. Section 3 describes the research method. Section 4 presents the case study and findings. Section 5 concludes this study and outlines future research.

2. Literature Review

2.1. The Concepts and Techniques of Process Mining

Process mining utilizes event log data in information systems to improve operational processes [23,24]. It establishes a connection between process science and data science and integrates event log data with process models, thus facilitating process analysis, identifying process bottlenecks, and enabling process prediction or enhancement. Process mining comprises three major research modules: process discovery, conformance checking, and process enhancement [11,16].
Process discovery aims to generate and rediscover actual process models from event logs. Combined with the knowledge of graph theory and process modeling languages, process discovery has developed several algorithms, such as Alpha Miner, Heuristic Miner, Fuzzy Miner, Inductive Miner, and Split Miner. Table 1 summarizes the main algorithms in process discovery, including the proposed time, the process modeling representation, and the proposer.
Figure 1 constructs a research evolution diagram of process discovery algorithms and their corresponding graph representation. The vertical axis categorizes algorithms based on their underlying discovery principles, ranging from statistical models and frequency-based methods to block-structured and filter-based approaches. The horizontal axis denotes the corresponding graph representation, which refers to the formal structures or process modeling languages used by these algorithms to express the control-flow logic of business processes.
Graph representation and graph theory are critical for process mining, reflecting the trade-off between model expressiveness and interpretability. Finite State Machines (FSMs) [25] and Directed Acyclic Graphs (DAGs) [26] emphasize probabilistic transitions and basic activity sequences. Workflow nets (WF-nets) derived from Petri net theory [27,28] can offer more precise semantics for modeling concurrency and synchronization. Both causal nets (C-nets) [29,30] and Directly-Follows Graphs (DFGs) [32] represent local ordering or dependency relationships between process activities extracted from event logs. While DFGs capture the directly-follows patterns observed in the log, C-nets emphasize inferred causal structures. Process trees are a hierarchically structured representation [33,34], and BPMN can represent more complex and richer business process situations, such as parallel and conditional branches, loops, and exception handling [36].
Conformance checking can evaluate the quality of process models, which is crucial for diagnosing and analyzing discrepancies between event logs and process models. In conformance checking, we usually need to balance four quality dimensions: fitness, precision, generalization, and simplicity [37].
The research perspectives of process mining are divided into the process (control flow), time, organizational, and case perspectives. The process perspective focuses on the control flow and the execution order of business process activities. Process discovery and conformance checking both focus on the process perspective. Process enhancement utilizes additional information (e.g., activity frequency and time duration) and adds the time, organization, and case perspectives to refine existing process models. It typically integrates machine learning and data mining techniques, thus enhancing analysis and adapting more effectively to dynamic operational situations. Recent advancements, such as the integration of LLMs with process mining [38] and the advent of large process models, indicate the evolution of process management toward more extensive and complicated intelligent systems [39].

2.2. Applications of Process Mining in Financial Services Operations

Process mining in financial services has been used in several scenarios, including loan approval and insurance claims. Contemporary research predominantly emphasizes process discovery and process enhancement, whereas conformance checking receives comparatively little attention. While conformance checking is essential for assuring process compliance in financial contexts, the complexity of implementing conformance checking and the requirement for in-depth analysis of process models and execution data mean related studies face a high technical threshold. Table 2 summarizes the relevant literature on the application of process mining in financial services operations.
In particular, we sorted financial service operation scenarios investigated in each study, mainly involving insurance claim handling and loan approvals. We also summarized the specific process-related problems or operational purposes that each study aimed to address using process mining techniques in each financial service scenario. For instance, some studies focus on business process optimization to improve efficiency or streamline operations, while others emphasize risk monitoring, compliance validation, or process performance evaluation. Furthermore, we classified research modules and research perspectives on process mining involved in each study.
Moreover, process mining studies on financial service operations predominantly rely on the Heuristic Miner and Fuzzy Miner due to their strong adaptability to complex and noisy business processes in banking. Heuristic Miner can extract dominant execution paths by leveraging activity dependency measures and frequency thresholds. Thus, it is particularly suitable for structured, high-frequency processes. Conversely, Fuzzy Miner applies automated clustering and abstraction to simplify the visualization of large-scale event logs, which makes it more appropriate for exploratory analysis.

2.3. Anomaly Detection Methods in Business Processes

Process anomaly detection is a crucial tool for ensuring compliance in corporate operations. Especially for commercial banks, compliance is intrinsically linked to financial security and risk management.
Conformance checking in process mining can identify deviations by aligning event logs with process models [49]. However, its effectiveness heavily depends on the model’s quality [50]. Statistical modeling approaches, such as Markov models [51] and association rule-based models [52], can capture the probabilistic characteristics of processes. However, these methods usually face parameter sensitivity and threshold dependence problems [53]. Additionally, clustering methods based on similarity between process models can effectively identify anomalous trajectories that deviate from mainstream patterns [54]. Inspired by natural language processing, Word2Vec can capture semantic similarities between process activities by mapping business process text data into a continuous vector space, thus adding a semantic dimension to anomaly detection [55,56].

2.4. Summary of Research Gap and Practical Motivation

Although relevant studies have demonstrated the potential of process mining in finance, the depth of its integration with real banking situations remains limited. Due to the complexity of financial business processes and the widespread existence of unstructured processes, applying process mining technology in the financial sector faces many challenges. In addition, the industry’s ability to integrate technical insights with domain-specific knowledge remains underdeveloped. Current approaches often fail to incorporate bank-specific domain knowledge, thereby limiting their practical effectiveness in anomaly detection.
Given these critical gaps, this study aims to develop a novel, domain knowledge-integrated process mining algorithm specifically designed for anomaly detection in banking contexts. Given the strict compliance requirements embedded in banking standard operating procedures and the need to distinguish between normative and exceptional paths, this study chooses to enhance the Heuristic Miner algorithm. Its explicit modeling of activity dependencies enables the seamless integration of domain knowledge via a follow-relationship matrix, thereby improving the detection and retention of low-frequency yet critical deviations.

3. Methodology

Process Anomaly Algorithm: E-Heuristic Miner

Heuristic Miner was initially proposed by A.J.M.M. Weijters et al. in 2003 and subsequently refined in 2006 [28,30]. It can rediscover actual process models from event logs based on direct dependency relationships between activities. By introducing dependency measures and setting thresholds, Heuristic Miner can filter out edges with low frequency and insufficient importance to simplify the process model. Therefore, it has the potential to enhance the readability of process models, suppress noise, and emphasize the main process structures. It has been particularly effective in addressing complex real-world processes, such as those commonly found in banking operations, where irregular behaviors and low-frequency deviations often exist.
However, the traditional Heuristic Miner algorithm relies solely on activity frequency and dependency measures to filter out low-frequency paths. In the banking context, these low-frequency paths may signal critical operational risks or compliance violations. They may also represent critical anomalies, special business scenarios, design flaws within information systems, or irregular operational behaviors of employees. Because the algorithm filters these paths on frequency alone, anomaly identification is inadequate, and critical risk-related information contained in these infrequent paths may be missed. Additionally, the original algorithm does not leverage domain-specific knowledge embedded in SOPs, limiting its capacity to accurately detect deviations of high practical importance.
To overcome these shortcomings, this study proposes an enhanced Heuristic Miner algorithm, which we call E-Heuristic Miner. The primary innovation of E-Heuristic Miner lies in its integration of banking-specific domain knowledge into the traditional heuristic mining framework. The banking-specific domain knowledge is represented by the follow-up relationship matrix produced from SOPs.
Specifically, E-Heuristic Miner includes the following four steps:
1. Extract domain knowledge from a bank’s standard operating procedure and build a follow-up relationship matrix. Based on the explicit sequential relationship between activities in the bank’s SOP, we can develop a follow-up relationship matrix. It contains two levels of information: process activity and activity sequence relationship information. Each element in the matrix represents the direct follow-up relationship between two activity nodes. For example, the element $follow_{a,b}$ is 1 if activity $b$ directly follows activity $a$ and 0 if there is no direct follow-up relationship. Specifically, the formula is as follows:
$$
follow_{a,b} =
\begin{cases}
0, & a \not>_L b \\
1, & a >_L b
\end{cases}
$$
2. Calculate the dependency index between activities to construct a dependency/frequency table (D/F table) to count the frequency of dependency relationships between activities in event logs. The calculation method for dependency is as follows:
Let $L$ be an event log over $\zeta$, i.e., $L \subseteq \zeta^{*}$. Let $a, b \in \zeta$: $a >_L b$ if there is a trace $\sigma = t_1 t_2 t_3 \ldots t_n$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$ and $t_i = a$ and $t_{i+1} = b$. Here, $|a >_L b|$ is the number of times $a >_L b$ occurs in $L$:
$$
|a >_L b| = \sum_{\sigma \in L} L(\sigma) \times \left| \{\, 1 \le i < |\sigma| \;:\; \sigma(i) = a \wedge \sigma(i+1) = b \,\} \right|
$$
$|a \Rightarrow_L b|$ indicates the dependency value between $a$ and $b$. The traditional formula for dependency is as follows:
$$
|a \Rightarrow_L b| =
\begin{cases}
\dfrac{|a >_L b| - |b >_L a|}{|a >_L b| + |b >_L a| + 1} & \text{if } a \neq b \\[2ex]
\dfrac{|a >_L a|}{|a >_L a| + 1} & \text{if } a = b
\end{cases}
$$
The value of $|a \Rightarrow_L b|$ is always between −1 and 1. If it is close to 1, there is a strong positive dependency between $a$ and $b$, where $a$ is the cause of $b$. If it is close to −1, there is a strong negative dependency, where $b$ is the cause of $a$. If $a$ is the same as $b$, it means there is a loop and a strong reflexive relationship.
The importance of each path is re-evaluated in light of the domain knowledge recorded in the follow-up relationship matrix and the actual dependency of each path. The dependency calculation formula will retain and assign greater weights to the paths that deviate from the standard path and have low frequencies. The retention of the normal paths will be determined by the calculated dependency value and the established threshold. This innovation simplifies the process model to the greatest extent possible while also preserving the deviation path. The innovative dependency calculation is as follows:
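$$
|a \Rightarrow_L b| =
\begin{cases}
\rho \cdot \dfrac{|a >_L b| - |b >_L a|}{|a >_L b| + |b >_L a| + 1} + (1 - \rho) & \text{if } a \neq b \\[2ex]
\rho \cdot \dfrac{|a >_L a|}{|a >_L a| + 1} + (1 - \rho) & \text{if } a = b
\end{cases}
$$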
$$
color_{a_i \text{ or } a >_L b} =
\begin{cases}
\rho \cdot dotted + (1 - \rho) \cdot dotted \ \text{ and } \ label_{a >_L b} = \dfrac{|a >_L b| - |b >_L a|}{|a >_L b| + |b >_L a| + 1} \\[2ex]
\rho \cdot dotted + (1 - \rho) \cdot solid \ \text{ and } \ label_{a >_L b} = \dfrac{|a >_L b| - |b >_L a|}{|a >_L b| + |b >_L a| + 1}
\end{cases}
$$
$\rho$ is the value in the follow-up relationship matrix. In addition, $color_{a >_L b}$ represents the color of activities and paths in the process. If there is a deviation from the activity or path, it will be colored.
3. Convert the D/F table into a D/F graph and generate DFGs from the D/F graph. Note that this study utilized the PM4PY API tool to analyze business processes instead of relying on tools like Disco or ProM [57].
In summary, Algorithm A1 presents the implementation of E-Heuristic Miner (see Appendix A). We set the threshold θ equal to 0.5 in the algorithm, following commonly used defaults in the literature and existing tools such as the pm4py framework. By dynamically adjusting the dependency calculation of activity paths through this embedded matrix, E-Heuristic Miner significantly enhances its sensitivity to important but rare abnormalities and can identify deviations from expected business practices with greater precision. Thus, it not only maintains the interpretability and visualization advantages of the original Heuristic Miner but also substantially improves the accuracy and effectiveness of anomaly detection within complex banking process environments.
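The dependency calculation described above can be illustrated with a minimal Python sketch. It is not the authors' Algorithm A1: the function names, the toy log, the toy SOP pairs, and the choice to keep an edge when its value is greater than or equal to θ are all assumptions made for illustration. The log is given as a plain list of traces, so the multiplicity L(σ) is represented by repeating a trace.

```python
from collections import Counter


def directly_follows_counts(log):
    """Count |a >_L b| for every ordered pair of adjacent activities."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Traditional Heuristic Miner dependency value |a =>_L b|."""
    ab = counts[(a, b)]
    if a == b:
        return ab / (ab + 1)
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def enhanced_dependency(counts, sop_follows, a, b):
    """rho * traditional value + (1 - rho), where rho is the SOP matrix entry.

    rho = 1 when the SOP allows a -> b, otherwise rho = 0, so a transition
    outside the SOP always receives the maximum value 1.
    """
    rho = 1 if (a, b) in sop_follows else 0
    return rho * dependency(counts, a, b) + (1 - rho)


def discover_edges(log, sop_follows, theta=0.5):
    """Keep edges whose enhanced dependency reaches theta (assumed >=)."""
    counts = directly_follows_counts(log)
    edges = {}
    for (a, b), freq in list(counts.items()):
        value = enhanced_dependency(counts, sop_follows, a, b)
        if value >= theta:
            edges[(a, b)] = {
                "frequency": freq,
                "dependency": value,
                "deviation": (a, b) not in sop_follows,
            }
    return edges


# Hypothetical toy data: nine compliant cases and one case with a rework loop.
toy_log = [["A", "B", "C"]] * 9 + [["A", "B", "C", "B", "C"]]
toy_sop = {("A", "B"), ("B", "C")}
toy_counts = directly_follows_counts(toy_log)
toy_edges = discover_edges(toy_log, toy_sop)
```

In this toy log the transition C → B occurs once against eleven occurrences of B → C, so its traditional dependency value is −10/13 and a frequency-based filter would drop it. Because C → B is absent from the toy SOP, ρ = 0 and the enhanced value is 1, so the edge is retained and flagged as a deviation.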

4. Empirical Study and Results

4.1. Data Source and Business Analysis

Constrained by the availability of SOPs, data confidentiality, and institutional restrictions, we chose to analyze the corporate account-opening process from one commercial bank subject to standardized regulatory oversight and exhibiting clear process structures with known variations and compliance risks. Broader access to SOPs and event logs was not possible because such internal process documentation is typically classified and not publicly shared.
Corporate banking services refer to various financial services provided by banks to legal clients such as companies and institutions. The bank we selected has 15 activities in its corporate account-opening business processes, which comprehensively address two typical scenarios: successful and failed account opening. The event logs cover the entire year of 2022. The starting activity node is “Pre-application has been submitted for review”, while the end activity node is “Next-day review has been completed”. The corporate account-opening business process is effectively managed when the process executes to the “Completed and waiting for evaluation” status. Meanwhile, it has failed if the process executes to “Pre-review rejected (in-bank cancellation)” or “On-site review failed and order withdrawn (in-bank cancellation)”. The standard operating procedure of the corporate account-opening business process is shown in Figure 2.
We have addressed sensitive data in the event logs used in this study. The event logs consist of three components: “caseID”, “activity”, and “DateTime” (the execution time of the activity). A trace is used to represent each case, containing specified execution steps known as “activities”. Table 3 illustrates a portion of the corporate account-opening business event logs.
We utilized the process mining tools ProM and PM4PY API to conduct an overall analysis of the business. Table 4 shows the number of activities and their frequencies. The most frequently executed activity is “Scanned for review the next day” (62,199 executions), while the least frequently executed is “Customer has appointed” (1 execution). In addition, four activities do not exist in the SOP.
We conducted a systematic statistical analysis of event logs and generated the corresponding activity trajectory diagram to analyze the actual execution of the case bank’s account-opening business, as illustrated in Figure 3. The results indicate that the shortest case only involves three activities, while the longest case involves as many as 46 activities. Overall, the average number of activities for each case is about 12. In addition, we analyzed the completion time required for each case. According to Figure 4, the cycle for the majority of account-opening businesses is controlled within 10 days.
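The case-level statistics above (activities per case and cycle time) can be derived directly from the three log columns. The sketch below is a minimal illustration under assumptions: the rows, the timestamp format, and the function names are hypothetical, and cycle time is taken as the span between the first and last event of a case.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_traces(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Group (caseID, activity, DateTime) rows into time-ordered traces."""
    cases = defaultdict(list)
    for case_id, activity, stamp in rows:
        cases[case_id].append((datetime.strptime(stamp, fmt), activity))
    return {cid: sorted(events, key=lambda e: e[0]) for cid, events in cases.items()}


def case_statistics(traces):
    """Return the number of events and the cycle time in days for each case."""
    lengths = {cid: len(events) for cid, events in traces.items()}
    cycle_days = {
        cid: (events[-1][0] - events[0][0]).total_seconds() / 86400
        for cid, events in traces.items()
    }
    return lengths, cycle_days


# Hypothetical rows; activity names and timestamps are placeholders.
toy_rows = [
    ("c1", "A", "2022-01-03 09:00:00"),
    ("c1", "B", "2022-01-04 09:00:00"),
    ("c1", "C", "2022-01-05 09:00:00"),
    ("c2", "A", "2022-02-01 08:00:00"),
    ("c2", "B", "2022-02-01 10:00:00"),
    ("c2", "B", "2022-02-01 14:00:00"),
    ("c2", "C", "2022-02-01 18:00:00"),
    ("c2", "D", "2022-02-01 20:00:00"),
]
toy_traces = build_traces(toy_rows)
toy_lengths, toy_cycle_days = case_statistics(toy_traces)
toy_mean_length = mean(toy_lengths.values())
```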
Additionally, we identified potential bottlenecks in the account-opening business and performed a comprehensive analysis of the execution duration of each activity node in the process model based on completion time distribution. According to Table 5, the main bottlenecks involve “Customer has evaluated” (average time of about 7.57 days), “On-site review failed and order withdrawn” (average time of about 4.58 days), and “Counter business activated and accepted” (average time of about 2.45 days). It should be noted that “Scanned for review the next day” (average time of about 14.03 h) exhibits evident repeated operations, indicating a possible bottleneck in the business process. It is thus necessary to further analyze the business or management reasons to formulate targeted optimization measures.
Considering the data characteristics and potential bottlenecks, we applied the E-Heuristic Miner algorithm in the following section to detect process anomalies.

4.2. Process Anomaly Detection Based on E-Heuristic Miner

In this subsection, we demonstrate the effectiveness of E-Heuristic Miner and compare it with other machine learning methods.

4.2.1. The Effectiveness of E-Heuristic Miner

Using E-Heuristic Miner, we constructed a follow-up relationship matrix based on the SOP of the corporate account-opening business. Activities that appear in the event logs but are not included in the matrix may indicate anomalous behaviors. Likewise, transitions in the event logs that do not align with the follow-up relationship matrix indicate potential risk paths or deviation paths in the business process. Then, we performed process mining and deviation path identification on the case bank’s corporate account-opening business. The process model is shown in Figure 5.
Follow = start H E 01 H 2 H 1 H 5 H 8 H 9 H 12 H 14 H 13 H E 03 H 3 H E 02 H 4 H 24 H 11 end start H E 01 H 2 H 1 H 5 H 8 H 9 H 12 H 14 H 13 H E 03 H 3 H E 02 H 4 H 24 H 11 end 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In Figure 5, the colored activities denote those that appear in the event log but are not defined in the SOP, thus referred to as deviation activities. The red lines represent the direct follow-up relationships between activities that exist in the event log but are not defined in the SOP, referred to as deviation paths. The numbers above the lines linking the activities represent the frequency of executions, while the numbers below represent the dependency values calculated by the E-Heuristic Miner algorithm. The higher the dependency value, the higher the execution frequency of the path between two adjacent activity nodes. The paths connected to “start” and “end” do not indicate the dependency value.
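To make the notions of deviation activity and deviation path concrete, the following sketch flags both for a single trace against a follow-up relationship matrix. It is a hedged illustration only: the four-activity SOP, the labels, and the function names are hypothetical, and wrapping each trace with artificial "start" and "end" nodes is an assumption about how the matrix is applied.

```python
def matrix_to_pairs(labels, matrix):
    """Turn a 0/1 follow-up matrix into a set of allowed (from, to) pairs."""
    return {
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == 1
    }


def find_deviations(trace, labels, allowed):
    """Return activities unknown to the SOP and transitions it does not allow."""
    known = set(labels)
    path = ["start", *trace, "end"]
    deviation_activities = sorted({a for a in trace if a not in known})
    deviation_paths = [
        (a, b) for a, b in zip(path, path[1:]) if (a, b) not in allowed
    ]
    return deviation_activities, deviation_paths


# Hypothetical SOP: start -> A -> B -> C -> end.
toy_labels = ["start", "A", "B", "C", "end"]
toy_matrix = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
toy_allowed = matrix_to_pairs(toy_labels, toy_matrix)

compliant = find_deviations(["A", "B", "C"], toy_labels, toy_allowed)
skipped_step = find_deviations(["A", "C"], toy_labels, toy_allowed)
unknown_activity = find_deviations(["A", "X", "B", "C"], toy_labels, toy_allowed)
```

A case counts as compliant here only when both returned lists are empty, which mirrors the strict reading of SOP adherence used in the text.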
In addition, to verify the effectiveness of the E-Heuristic Miner algorithm, we utilized four indicators to assess the quality of process models and selected other traditional process discovery algorithms for comparative experiments. Fitness, precision, generalization, and simplicity are the four quality dimensions that must be balanced [37,58].
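  • Fitness requires that the discovered model allows for the behavior seen in the event log. Formally, the fitness of an event log, $L$, given a WF-net, $N$, is computed as follows:
    $$
    Fitness(L, N) = \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma) \times m_{N,\sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N,\sigma}}\right) + \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma) \times r_{N,\sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N,\sigma}}\right)
    $$
    where $m$ represents missing tokens; $c$ represents consumed tokens; $p$ represents produced tokens; and $r$ represents remaining tokens.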
  • Precision assesses whether the discovered model allows for behavior completely unrelated to what has been observed in the event log. Let $E$ represent the set of events in the event log, $L$. For any event $e \in E$, let $en_N(e)$ denote the set of activities enabled by the process model, $N$, following the context of event $e$. In other words, if we look at the state of the model just before event $e$ occurs, $en_N(e)$ is the set of all possible next activities that model $N$ allows from that state. $en_L(e)$ denotes the set of activities enabled in log $L$ after the context of event $e$. Since an event $e$ in the log has a particular prefix, $en_L(e)$ represents the activities that were actually observed next in the log whenever that same state was reached. Precision is calculated as
    $$
    Precision(L, N) = \frac{1}{|E|} \sum_{e \in E} \frac{|en_L(e)|}{|en_N(e)|}
    $$
    If precision is high, the discovered model does not allow for much more behavior than observed. Hence, $|en_N(e)| \approx |en_L(e)|$. If precision is low, the discovered model allows for much more behavior than observed. Hence, $|en_N(e)| \gg |en_L(e)|$.
  • Generalization measures the discovered model’s ability to generalize the example behavior seen in the event log. Generalization is calculated as
    $$
    Generalization(L, N) = 1 - \frac{\sum_{nodes} \left(\sqrt{\#executions}\right)^{-1}}{\#nodes}
    $$
    Here, “nodes” represent the set of nodes (activities) in the process model, $N$, and $\#nodes$ is the total number of nodes. For each node, $\#executions$ denotes the number of times that node was executed in the event log, $L$.
  • Simplicity addresses the structural complexity of the process model. Simplicity is calculated as
    $$
    Simplicity(L, N) = \frac{1}{1 + \max(mean_{degree} - k,\ 0)}
    $$
    where $mean_{degree}$ is calculated as the average number of incoming and outgoing arcs per node in the process model, and $k$ is a non-negative constant chosen as a baseline for comparison. Here, we set $k$ as 2, adhering to the PM4PY API.
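As a worked illustration of the four quality measures, the sketch below evaluates each formula on small made-up inputs. It does not perform token replay or state-space exploration; it assumes the per-trace token counts, the enabled-activity set sizes, the node execution counts, and the mean arc degree have already been obtained (for example from a process mining library). All function names and numbers are hypothetical, and the fitness function weights both ratios by trace multiplicity, which is an assumption about the intended formula.

```python
from math import sqrt


def token_replay_fitness(replays):
    """replays: (multiplicity, missing, consumed, remaining, produced) per trace variant."""
    missing = sum(n * m for n, m, c, r, p in replays)
    consumed = sum(n * c for n, m, c, r, p in replays)
    remaining = sum(n * r for n, m, c, r, p in replays)
    produced = sum(n * p for n, m, c, r, p in replays)
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


def precision(enabled_sizes):
    """enabled_sizes: one (|en_L(e)|, |en_N(e)|) pair per event in the log."""
    return sum(log_n / model_n for log_n, model_n in enabled_sizes) / len(enabled_sizes)


def generalization(executions):
    """executions: number of times each model node was executed in the log."""
    return 1 - sum(1 / sqrt(x) for x in executions) / len(executions)


def simplicity(mean_degree, k=2):
    """Arc-degree simplicity with baseline k."""
    return 1 / (1 + max(mean_degree - k, 0))


# Hypothetical inputs chosen so that the arithmetic is easy to follow.
toy_fitness = token_replay_fitness([(8, 0, 5, 0, 5), (2, 1, 5, 1, 5)])
toy_precision = precision([(1, 1), (1, 2), (2, 2), (1, 4)])
toy_generalization = generalization([4, 16, 100])
toy_simplicity = simplicity(2.5)
```

With these inputs, fitness is 0.5 × (1 − 2/50) + 0.5 × (1 − 2/50) = 0.96, precision is (1 + 0.5 + 1 + 0.25)/4 = 0.6875, generalization is 1 − (0.5 + 0.25 + 0.1)/3, and simplicity is 1/1.5. A mean degree at or below k yields a simplicity of 1.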
As illustrated in Table 6, the Alpha and Alpha+ algorithms failed to successfully mine a process model because they could not effectively handle structural problems such as short loops. The Inductive Miner algorithm showed outstanding fitness, generalization, and simplicity, but performed poorly in precision. The performance of the E-Heuristic Miner algorithm was relatively balanced in the four indicators, so it is more suitable for complex banking business process scenarios.
The E-Heuristic Miner results indicate that only 10,318 cases in the corporate account-opening activity of the case bank adhered strictly to the SOP, representing 40.33% of all cases. As illustrated in Table 7, the deviation ratio of cases is 59.67%, and these cases exhibited differing levels of deviance, mainly centering on deviation activities and deviation paths. The deviation activities emerged primarily from parts of the actual process that are invisible to managers. Current process design and process optimization mostly rely on expert experience, leading to the delayed incorporation of new activities into the SOP. This delay can easily create compliance risks in business processes. The emergence of deviation paths may also be affected by the subjective operations of bank employees.
After discussion with the case bank’s business experts, we attributed the deviation path “On-site review failed and order withdrawn → Branch is preparing” to personnel operation factors, and the rest were classified as customer factors, IT system problems, and management failure factors. The proportion of deviation cases and paths caused by personnel operation factors is shown in Figure 6.

4.2.2. Comparison of Deviation Detection Methods

In this section, we compare machine learning methods with the E-Heuristic Miner algorithm in terms of their ability to detect process anomalies in the context of commercial banking. The machine learning models are SVM + PCA, KNN, Word2Vec, and LSTM. SVM + PCA can distinguish between normal and abnormal processes by constructing the optimal separating hyperplane in a high-dimensional feature space and removing redundant information and noise [59]. KNN can classify anomalies based on the similarity distance between cases [60]. Word2Vec can capture the semantic similarity between activities and provides an anomaly identification method in the semantic dimension [61]. LSTM can effectively utilize historical process data and identify potential abnormal patterns owing to its ability to capture long sequence dependencies [62].
One-Class SVM: To characterize the behavior of process execution durations, we extracted four key statistical features from the execution time of each process instance: the minimum (time_diff_min), maximum (time_diff_max), mean (time_diff_mean), and standard deviation (time_diff_std) of the time differences. We trained a One-Class SVM model to identify anomalous process instances. To interpret the trained model and understand which features contribute most to anomaly detection, we employed permutation feature importance. Figure 7 indicates that time_diff_std and time_diff_max were particularly sensitive to anomalies characterized by substantial temporal fluctuations, while time_diff_min and time_diff_mean generally reflected efficiency issues but were less sensitive to identifying distinct anomalies. As a result, we identified 1280 anomalous cases out of 24,604 normal cases.
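As an illustration of the four duration features, the sketch below derives them for a single case. It assumes that the "time differences" are the gaps between consecutive events of a case, measured in seconds, and that the standard deviation is the population form; the source does not specify either choice, and the function name is hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev


def time_diff_features(timestamps):
    """Return min, max, mean, and std of gaps between consecutive events.

    Assumptions (not stated in the paper): gaps are in seconds, events are
    ordered by timestamp, and the population standard deviation is used.
    """
    ordered = sorted(timestamps)
    diffs = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not diffs:
        # A case with fewer than two events has no gaps at all.
        diffs = [0.0]
    return {
        "time_diff_min": min(diffs),
        "time_diff_max": max(diffs),
        "time_diff_mean": mean(diffs),
        "time_diff_std": pstdev(diffs),
    }


case = [
    datetime(2022, 1, 1, 9, 0),
    datetime(2022, 1, 1, 10, 0),
    datetime(2022, 1, 1, 12, 0),
]
print(time_diff_features(case))
```

The resulting feature dictionaries, one per case, would form the rows of the matrix passed to an outlier detector such as a One-Class SVM.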
PCA: In addition to the four statistical features used for the SVM, we added a fifth feature, the number of activities. Thus, we utilized five features for PCA: the minimum (time_diff_min), maximum (time_diff_max), mean (time_diff_mean), and standard deviation (time_diff_std) of the time differences, and the number of activities (activity_count). We first reduced the dimensionality to two principal components; then, we applied reconstruction error for anomaly detection. An anomaly threshold was defined at a 95% confidence interval. The results are shown in Figure 8. We finally identified 1280 anomalous cases out of 24,604 normal cases based on PCA.
KNN: We selected each case’s execution duration and encoded process variants. We hypothesized that processes of the same category would exhibit similar execution durations and structural patterns, whereas anomalous cases would deviate significantly in duration or structure. Using the elbow method, we determined that 10 clusters provided the optimal partitioning. As shown in Figure 9, the largest cluster contained 24,711 cases and was designated as normal, while the remaining nine clusters were labeled anomalous.
LSTM: We encoded and normalized activity sequences for each case as inputs to the LSTM. The autoencoder consists of an encoder–decoder pair: the encoder compresses each activity sequence into a latent representation, and the decoder reconstructs the original sequence from this compressed embedding. The model was trained to minimize the reconstruction error between the input sequence and its reconstruction. Specifically, the Mean Squared Error (MSE) was used as the loss function to reproduce normal cases (low error) while amplifying deviations for anomalous cases. For optimization, the Adam optimizer was employed with a learning rate of 0.01. The model was trained for 10 epochs. As we can see from Figure 10, over the course of training, the loss steadily decreased and eventually converged, indicating that the autoencoder successfully learned the regular patterns in the data.
After training, the LSTM autoencoder was used to reconstruct each case’s activity sequence, and a reconstruction error was computed for every case. An anomaly detection mechanism was then applied by comparing each case’s error to a threshold. We defined the anomaly threshold as the mean reconstruction error across all cases plus two standard deviations (i.e., threshold = μ + 2σ). Figure 11 illustrates a scatter plot of reconstruction errors for all process cases, with the anomaly threshold and identified anomalies highlighted.
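The thresholding step can be illustrated independently of the autoencoder. The sketch below assumes the per-case reconstruction errors are already available and uses the population standard deviation, which the source does not specify; the function name and the toy error values are hypothetical.

```python
from statistics import mean, pstdev


def flag_by_reconstruction_error(errors, n_sigma=2.0):
    """Flag cases whose error exceeds mean + n_sigma * std over all cases."""
    values = list(errors.values())
    threshold = mean(values) + n_sigma * pstdev(values)
    flagged = {case_id for case_id, err in errors.items() if err > threshold}
    return threshold, flagged


# Nine well-reconstructed cases and one poorly reconstructed case.
errors = {f"case_{i}": 0.1 for i in range(9)}
errors["case_9"] = 1.0
threshold, flagged = flag_by_reconstruction_error(errors)
print(threshold, flagged)
```

Because the mean and standard deviation are computed over all cases, including the anomalous ones, a few extreme errors raise the threshold; this is a property of the rule itself rather than of the sketch.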
Word2Vec: We further explored semantic embedding techniques by applying Word2Vec to measure semantic similarity between adjacent activities within process cases. By averaging the cosine similarities of all adjacent activity pairs, each case was assigned an average similarity score indicating the semantic coherence of its sequence. To identify anomalies, a threshold can be defined as the mean of these average similarity scores minus one standard deviation. In our analysis, this threshold was 0.7607, meaning any case whose average similarity fell below 0.7607 was flagged as anomalous. In Figure 12, a black dashed line denotes the anomaly threshold at 0.7607. Cases to the right of this line (blue bars) have similarity above the threshold and are considered normal, while cases to the left (orange bars) fall below 0.7607 and are classified as anomalies. Based on this threshold, 4406 out of 23,779 cases (≈18.5%) were identified as anomalous in the depicted dataset. The underlying logic is that low average similarity signals an irregular sequence of activities, thereby indicating potential process anomalies.
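The scoring rule described above can be sketched without a trained model. The example below assumes activity embeddings are already available as plain vectors; the two-dimensional vectors and activity labels are hypothetical placeholders rather than Word2Vec outputs, and the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def average_adjacent_similarity(trace, embeddings):
    """Mean cosine similarity over all adjacent activity pairs in a trace."""
    sims = [cosine(embeddings[a], embeddings[b]) for a, b in zip(trace, trace[1:])]
    # Assumption: a single-activity trace is treated as fully coherent.
    return mean(sims) if sims else 1.0


def flag_low_similarity(traces, embeddings):
    """Flag cases whose score falls below mean - 1 std of all scores."""
    scores = {cid: average_adjacent_similarity(t, embeddings) for cid, t in traces.items()}
    values = list(scores.values())
    threshold = mean(values) - pstdev(values)
    return threshold, {cid for cid, s in scores.items() if s < threshold}


# Hypothetical embeddings: A and C are orthogonal, B lies between them.
embeddings = {"A": (1.0, 0.0), "B": (0.6, 0.8), "C": (0.0, 1.0)}
traces = {"c1": ["A", "B"], "c2": ["B", "C"], "c3": ["A", "B", "C"], "c4": ["A", "C"]}
print(flag_low_similarity(traces, embeddings))
```

Only the case that jumps directly between the two dissimilar activities falls below the threshold, which mirrors the intuition that low average similarity signals an irregular sequence.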
We summarize the number of process anomalies identified by each algorithm in Table 8.
To quantitatively assess the effectiveness of each algorithm in detecting meaningful process anomalies, we defined three types of business-relevant anomalies based on domain knowledge according to the bank’s SOP, as follows:
(1) Cases where the special deviation path of “On-site review failed and order withdrawn → Branch is preparing” exists.
(2) Cases where the execution time of the activity exceeds 20 days. (Note: the average duration for completing the account-opening process in the case bank is 10 days.)
(3) Cases where the number of activities exceeds 24. (Note: the average number of executed activities for the case bank’s account-opening business is 12.)
Cases meeting any of these three criteria are classified as positive instances (anomalies); all remaining cases were considered normal. These SOP-based definitions served as the ground truth to evaluate the proportion of each algorithm’s identified anomalies that align with actual business exceptions. The results of the comparison are shown in Table 9.
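The three SOP-based rules and the resulting evaluation can be sketched as follows. This is a minimal illustration that assumes rule (2) refers to the total duration of a case in days and that the reported proportion corresponds to precision (flagged cases that are true business anomalies divided by all flagged cases); the function names are hypothetical.

```python
DEVIATION_PATH = ("On-site review failed and order withdrawn", "Branch is preparing")


def is_business_anomaly(activities, duration_days, max_days=20.0, max_activities=24):
    """Return True if a case matches any of the three SOP-based rules."""
    has_deviation_path = any(
        (a, b) == DEVIATION_PATH for a, b in zip(activities, activities[1:])
    )
    too_slow = duration_days > max_days           # rule (2), assumed per case
    too_long = len(activities) > max_activities   # rule (3)
    return has_deviation_path or too_slow or too_long


def precision(flagged_cases, ground_truth_anomalies):
    """Share of an algorithm's flagged cases that are ground-truth anomalies."""
    if not flagged_cases:
        return 0.0
    return len(flagged_cases & ground_truth_anomalies) / len(flagged_cases)


case = ["Customer has appointed", *DEVIATION_PATH]
print(is_business_anomaly(case, duration_days=3.0))
print(precision({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c9"}))
```

Applying `is_business_anomaly` to every case yields the ground-truth set against which each algorithm's flagged set is compared.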
Overall, traditional machine learning methods perform poorly in process anomaly detection. The One-Class SVM and PCA algorithms detect abnormal cases involving bottlenecks or an excessive number of activities well, but cannot effectively capture abnormal cases with deviation paths. While KNN can partially identify cases with bottlenecks, its overall detection accuracy remains low. Word2Vec and LSTM rely mainly on the sequential characteristics of activities. Although they can identify abnormal cases whose activity sequence diverges markedly from the standard sequence, they are clearly insufficient for detecting subtle local deviation paths (such as the specific path “On-site review failed and order withdrawn → Branch is preparing”). As a word-embedding model, Word2Vec essentially learns semantic relationships between activities and ignores the sequential, concurrency, and hierarchical information in the process; thus, it fails to reveal complex process dependencies. Meanwhile, LSTM fails to fully leverage the time attributes in the process. Its detected anomalies are primarily confined to cases involving a minimal number of activities (e.g., customer cancellations), an extensive number of loops, or specific activities (e.g., “Customer has appointed”), while its identification of general bottlenecks and local deviation paths is ineffective.
In comparison, the E-Heuristic Miner method can not only accurately identify deviation activities and deviation paths by integrating domain knowledge and clear business rules, but is also more sensitive to local anomalies with subtle differences. It significantly compensates for the shortcomings of traditional machine learning methods. Therefore, the E-Heuristic Miner method is more suitable for process anomaly detection in real banking scenarios.

4.3. LLM-Driven Process Anomaly Analysis and Process Optimization Analysis

With the rapid development of LLMs, their natural language understanding, pattern recognition, and induction abilities open up new opportunities for process analysis. Beyond traditional process mining methods, LLMs can explain the root causes of process anomalies and propose improvement plans. However, it is currently difficult to input multiple event logs directly into an LLM as a single prompt. Thus, we adopted a two-stage abstraction method inspired by Berti et al. [40]. First, we used the PM4Py API to abstract structured process variants or Petri nets from the event logs and recorded them in a textual format. Then, we transformed the abstracted process information and queries relevant to process anomaly analysis into prompts and submitted them to LLMs to obtain the output results. Examples of input contents are shown in Table 10 and Table 11. Table 12 illustrates the process optimization suggestions provided by GPT-4.5.
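The two-stage idea can be sketched in a library-independent way: first compress the event log into a short textual list of variants, then wrap that text and a question into a prompt. The example below is an illustration only; it does not reproduce the PM4Py abstraction functions or the prompts in Tables 10 and 11, and the function names, prompt wording, and activity labels are hypothetical.

```python
from collections import Counter


def abstract_variants(traces, top_k=5):
    """Stage 1: summarize an event log as its most frequent variants."""
    counts = Counter(tuple(trace) for trace in traces.values())
    lines = [
        f"{' -> '.join(variant)} (frequency = {freq})"
        for variant, freq in counts.most_common(top_k)
    ]
    return "\n".join(lines)


def build_prompt(abstraction, question):
    """Stage 2: combine the textual abstraction with an analysis question."""
    return (
        "The following process variants were extracted from an event log:\n"
        f"{abstraction}\n\n"
        f"Question: {question}"
    )


traces = {
    "c1": ["Submit", "Review", "Open account"],
    "c2": ["Submit", "Review", "Open account"],
    "c3": ["Submit", "Review", "Withdraw", "Review", "Open account"],
}
prompt = build_prompt(
    abstract_variants(traces),
    "Which variants look anomalous, and what are the likely root causes?",
)
print(prompt)
```

Limiting the abstraction to the top variants keeps the prompt short, at the cost of hiding rare variants, so low-frequency deviation paths would need to be added to the abstraction explicitly.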
LLMs can perform semantic reasoning and logical analysis based on process information (including process variants and Petri nets), as well as textual prompts. Despite their promising insights, LLM-generated recommendations still require validation by domain experts because of potential limitations such as domain-specific knowledge gaps and limited context understanding. In particular, current general-purpose LLMs lack specialized domain knowledge, which may result in recommendations that are linguistically plausible but misaligned with institutional policies. Additionally, LLMs may fail to fully account for implicit business logic or operational nuances that domain experts would naturally consider. This introduces a risk of over-reliance on surface-level textual associations rather than grounded process insights.
Therefore, we emphasize that LLM-generated explanations and suggestions should be treated as preliminary and supportive rather than authoritative. Their outputs should be subject to thorough expert validation prior to any operational decision-making. To strengthen the practical reliability of our findings, we conducted consultations with relevant banking professionals and business practitioners, who confirmed that the LLM-generated responses presented in this study are scientifically valid and aligned with actual business practices. Future research may address this limitation by fine-tuning LLMs on domain-specific corpora or integrating them with formal rule-based systems to ensure alignment with financial compliance standards.

4.4. Discussion

The E-Heuristic Miner algorithm’s flexibility in embedding domain-specific business rules ensures broad adaptability across diverse banking processes. Banks with varying SOPs and risk tolerance levels can readily tailor the algorithm to their specific operational contexts without compromising interpretability, thereby addressing the critical auditability and transparency requirements prevalent in the financial sector.
Although the proposed E-Heuristic Miner algorithm demonstrates promising performance in detecting process anomalies, its practical deployment within banking institutions involves several implementation challenges and regulatory considerations. On the implementation side, integrating process mining tools into existing banking IT infrastructures requires consistent and structured event logs accessible across multiple departments. In practice, achieving this consistency can be hindered by data fragmentation, legacy system incompatibilities, and stringent data privacy regulations that limit the availability and sharing of operational data. From a regulatory perspective, the proposed method aligns closely with contemporary regulatory expectations, which emphasize process-level transparency and rigorous internal control frameworks, as stipulated under Basel III and other domestic supervisory guidelines. By explicitly highlighting deviations from SOPs, our method strengthens a bank’s first line of defense (operational management controls) and facilitates monitoring responsibilities assigned to the second line (compliance oversight). This clear alignment enhances the practicality and acceptability of our approach within regulated financial institutions.
However, for anomaly detection results to be effectively operationalized, they must be interpretable for compliance teams and internal auditors who typically lack specialized technical knowledge. The rule-based structure of the proposed E-Heuristic Miner naturally supports intuitive visualizations of non-compliant paths, allowing compliance personnel to quickly identify, interpret, and investigate process deviations. This visualization capability directly aids internal audit functions by prioritizing critical compliance issues for immediate follow-up. Moreover, LLMs significantly enhance the interpretability and practical usability of anomaly detection outputs; thus, anomaly detection transitions from passive reporting to active compliance and operational support, driving meaningful process improvements.

5. Conclusions

This study proposes an enhanced process mining algorithm, E-Heuristic Miner, that integrates domain-specific knowledge derived from the standard operating procedures of commercial banks to facilitate process anomaly detection. By incorporating a follow-relationship matrix derived from banking SOPs into a process mining algorithm, the E-Heuristic Miner preserves critical low-frequency behavior that traditional frequency-based miners often overlook. It explicitly flags any observed activity transitions that violate SOP-defined rules as deviation paths, thereby significantly improving the precision of anomaly detection compared with existing discovery techniques. We validated the effectiveness of the E-Heuristic Miner through an empirical analysis of event logs from a commercial bank’s corporate account-opening business. The results demonstrated that the E-Heuristic Miner algorithm significantly outperforms traditional machine learning methods in detecting process deviations and process anomalies. The E-Heuristic Miner effectively incorporates semantic understanding and domain knowledge and reduces compliance risks, addressing a key limitation of earlier approaches.
The practical implication of E-Heuristic Miner is its improved interpretability and accuracy in detecting compliance violations and irregular process paths for banking institutions that would otherwise remain hidden. The identified process anomalies highlight specific procedural breakdowns and inefficiencies, guiding managers and auditors to address these issues proactively and optimize process flows for better operational compliance and efficiency. In this way, the E-Heuristic Miner is a valuable tool for proactive anomaly detection and continuous improvement in real-world, compliance-driven financial service scenarios.
Moreover, this study explores the potential of LLMs for post hoc analysis and explanations of process anomalies. By coupling E-Heuristic Miner’s outputs with LLM-generated natural language analysis, we transform raw anomaly detection results into meaningful narratives and actionable recommendations. This helps stakeholders readily understand the root causes of detected anomalies and receive suggestions for remediation or process enhancement without deep technical expertise.
Our work has several limitations. First, the event logs used in this study contained only basic information such as caseID, timestamps, and activities. Future research should enhance event logs by incorporating additional resource information to enable a deeper analysis of the relationship between control flows and organizational structures. Second, although combining the E-Heuristic Miner algorithm with domain knowledge enhances process anomaly detection, it still lacks the ability to model complex causal relationships and graph-based structures. Future research could integrate Graph Neural Networks (GNNs) or causal inference methods to enhance the precision and interpretability of process anomaly detection. Third, the privacy of domain knowledge restricts its extensive application within LLMs. Future research should explore the development of secure, bank-specific LLMs tailored for sensitive financial contexts.

Author Contributions

Conceptualization, Y.L., Z.N. and B.X.; methodology, Y.L. and Z.N.; software, Y.L. and Z.N.; validation, Y.L., Z.N. and B.X.; formal analysis, Y.L. and Z.N.; investigation, Y.L. and Z.N.; resources, Z.N.; data curation, Y.L. and Z.N.; writing—original draft preparation, Y.L. and Z.N.; writing—reviewing and editing, Y.L. and Z.N.; visualization, Y.L. and Z.N.; supervision, B.X.; project administration, Y.L., Z.N. and B.X.; funding acquisition, B.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (special project “Modeling and Optimization of Financial Service Based on Data and Behavior,” grant number 72342024; “Technology Finance: Theories and Empirical Evidence,” grant number 72495150; “Bounded Rationality of Financial Consumers, Misconduct of Financial Institutions and Suboptimal Financial Consumption Decisions,” grant number 72071102) and National Social Science Foundation of China (special project “Research on the Innovative Path of Social Credit System in the Modernization of Social Governance,” grant number 23&ZD175).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy reasons.

Conflicts of Interest

Zaiwen Ni is a postdoctoral research fellow jointly affiliated with the Postdoctoral Research Station of Nanjing University and the Postdoctoral Research Workstation of Suzhou International Development Group Co., Ltd. The authors declare that this collaboration does not involve any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BPM       Business Process Management
SOPs      Standard Operating Procedures
LLMs      Large Language Models
CBRC      China Banking and Insurance Regulatory Commission
FSM       Finite State Machines
DAG       Directed Acyclic Graphs
WF-nets   Workflow nets
C-nets    Causal nets
DFG       Directly-Follows Graph
BPMN      Business Process Modeling Notation
MSE       Mean Squared Error

Appendix A. Pseudocode of Algorithm A1

Algorithm A1. The E-Heuristic Miner Algorithm
Input:
 - The event log L ⊆ ζ* of target business process in the bank
 - The SOP Manual S of target bank
 - Dependency Threshold θ
Output:
 - Annotated Process Model with Deviation Paths (DFG)
//Step 1: Construct Follow-Up Relationship Matrix from SOP
function BuildFollowUpMatrix(S):
 ζ ← Extract unique activities from S
 Initialize n × n matrix ρ, where n = |ζ|
 for each explicit sequence (A → B) in S do
  if SOP defines “A directly followed by B” then
    ρ [A][B] ← 1
  else
    ρ [A][B] ← 0
  end if
 end for
 return ζ, ρ
//Step 2: Calculate Enhanced Dependency Index
function ComputeDependency(L, ρ ):
 Initialize D/F_Table as n × n matrix
 for each trace σ = ⟨t_1, t_2, t_3, …, t_n⟩ in L do
  for i = 1 to n − 1 do
    a ← t_i, b ← t_{i+1}
    |a >_L b| ← |a >_L b| + 1
  end for
 end for

 for each activity pair (a, b) ∈ ζ × ζ do
  //Traditional dependency measure
  dependency ← (|a >_L b| − |b >_L a|) / (|a >_L b| + |b >_L a| + 1)
  //Enhanced weighting with domain knowledge
  if ρ[a][b] == 0 then
   weight ← 1 + log(1 + |a >_L b|)
  else
   weight ← 1
  end if
   D/F_Table [a][b] ← dependency × weight
 end for
 return D/F_Table
//Step 3: Generate Annotated DFG
function GenerateDFG(D/F_Table, ρ ):
 Initialize directed graph G = (ζ, E)
 for each (a, b) where D/F_Table[a][b] > θ do
  Add edge a → b to E
  if ρ[a][b] == 0 then
   Set color(b) = red //Mark deviation nodes
  end if
 end for
 return G
ζ, ρ ← BuildFollowUpMatrix(S)
D/F_Table ← ComputeDependency(L, ρ )
DFG ← GenerateDFG(D/F_Table, ρ )
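
The pseudocode of Algorithm A1 can be turned into a short runnable sketch. The version below is an illustration under stated assumptions, not the authors' implementation: the SOP is given as lists of activity sequences, "log" is taken to be the natural logarithm, only observed activity pairs are scored (unobserved pairs cannot exceed a non-negative threshold), and deviations are marked on edges rather than by coloring nodes. All function names are hypothetical.

```python
import math
from collections import Counter


def build_follow_matrix(sop_sequences):
    """Step 1: pairs (a, b) for which the SOP defines 'a directly followed by b'."""
    return {(a, b) for seq in sop_sequences for a, b in zip(seq, seq[1:])}


def compute_dependency(traces, allowed):
    """Step 2: dependency measure, up-weighted for pairs outside the SOP."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))  # directly-follows counts |a >_L b|
    table = {}
    for a, b in list(counts):
        ab, ba = counts[(a, b)], counts[(b, a)]
        dependency = (ab - ba) / (ab + ba + 1)
        # Pairs not defined in the SOP (rho[a][b] == 0) receive a larger weight.
        weight = 1.0 if (a, b) in allowed else 1.0 + math.log(1 + ab)
        table[(a, b)] = dependency * weight
    return table


def generate_dfg(table, allowed, theta):
    """Step 3: keep edges above theta; the value marks a deviation path."""
    return {pair: pair not in allowed for pair, score in table.items() if score > theta}


sop = [["A", "B", "C"]]
log = [["A", "B", "C"]] * 9 + [["A", "B", "D", "C"]]
allowed = build_follow_matrix(sop)
table = compute_dependency(log, allowed)
print(generate_dfg(table, allowed, theta=0.6))
```

In this toy log, the single detour through D has an unweighted dependency of 0.5, which a threshold of 0.6 would discard; the additional weight raises it to about 0.85, so the low-frequency deviation path survives and is marked.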

References

  1. Gomber, P.; Kauffman, R.J.; Parker, C.; Weber, B.W. On the Fintech Revolution: Interpreting the Forces of Innovation, Disruption, and Transformation in Financial Services. J. Manag. Inf. Syst. 2017, 35, 220–265.
  2. Hatzakis, E.; Nair, S.K.; Pinedo, M. Operations in Financial Services—An Overview. Prod. Oper. Manag. 2010, 19, 633–664.
  3. Cheng, X.; Du, A.M.; Yan, C.; Goodell, J.W. Internal business process governance and external regulation: How does AI technology empower financial performance? Int. Rev. Financ. Anal. 2025, 99, 103927.
  4. Kim, Y.; Xu, Y. Operational Risk Management: Optimal Inspection Policy. Manag. Sci. 2024, 70, 18.
  5. Adams, N.; Augusto, A.; Davern, M.J.; La Rosa, M. Five guidelines to improve context-aware process selection: An Australian banking perspective. Bus. Process Manag. J. 2023, 31, 878–903.
  6. Oberle, L.J. How to build responsive service processes in German banks: The role of process documentation and the myth of automation. Bus. Process. Manag. J. 2023, 29, 578–596.
  7. Xu, Y.; Pinedo, M.; Xue, M. Operational Risk in Financial Services: A Review and New Research Opportunities. Prod. Oper. Manag. 2016, 26, 426–445.
  8. Bin-Qing, X.; Xin-Dan, L.I.; Yu-Qian, X.U.; Yuan-Qiao, C. Process, compliance and operational risk management. J. Manag. Sci. China 2017, 20, 117–123.
  9. De Weerdt, J.; Schupp, A.; Vanderloock, A.; Baesens, B. Process Mining for the multi-faceted analysis of business processes—A case study in a financial services organization. Comput. Ind. 2013, 64, 57–67.
  10. Taskesenlioglu, S.; Ozkan, N.F.; Erdogan, T.G. Identifying Possible Improvements of Software Development Life Cycle (SDLC) Process of a Bank by Using Process Mining. Int. J. Softw. Eng. Knowl. Eng. 2022, 32, 525–552.
  11. van der Aalst, W.M.P. Process Mining: Data Science in Action; Springer: Berlin/Heidelberg, Germany, 2016.
  12. Grisold, T.; Mendling, J.; Otto, M.; Brocke, J.V. Adoption, use and management of process mining in practice. Bus. Process. Manag. J. 2020, 27, 369–387.
  13. Novak, C.; Pfahlsberger, L.; Bala, S.; Revoredo, K.; Mendling, J. Enhancing decision-making of IT demand management with process mining. Bus. Process. Manag. J. 2023, 29, 230–259.
  14. Rott, J.; Böhm, M.; Krcmar, H. Laying the ground for future cross-organizational process mining research and application: A literature review. Bus. Process. Manag. J. 2024, 30, 144–206.
  15. Zerbino, P.; Stefanini, A.; Aloini, D. Process Science in Action: A Literature Review on Process Mining in Business Management. Technol. Forecast. Soc. Change 2021, 172, 121021.
  16. dos Santos Garcia, C.; Meincheim, A.; Junior, E.R.; Dallagassa, M.R.; Sato, D.M.; Carvalho, D.R.; Santos, E.A.; Scalabrin, E.E. Process mining techniques and applications-A systematic mapping study. Expert Syst. Appl. 2019, 133, 260–295.
  17. Gomes, A.F.C.; de Lacerda, A.C.W.G.; da Silva Fialho, J.R. Comparative Analysis of Process Mining Algorithms in Process Discover. In Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2021.
  18. Tiwari, A.; Turner, C.J.; Majeed, B. A review of business process mining: State-of-the-art and future trends. Bus. Process Manag. J. 2008, 14, 5–22.
  19. Rojos, F.; Lucchini, F.; González, J.; Espinoza, D.; Lee, J.; Fernández, J.P.S.; Arias, M. Process Mining: Research in Banking Operations; Eindhoven University of Technology: Eindhoven, The Netherlands, 2017.
  20. Carmona, R.; Cofré, R.; Naranjo, C.; Vásquez, O.; Lee, J.; Fernández, J.P.S.; Arias, M. Analysis of Loan Application Process Using Process Mining; Eindhoven University of Technology: Eindhoven, The Netherlands, 2017.
  21. Bautista, A.D.; Wangikar, L.; Akbar, S.M.K. Process Mining-Driven Optimization of a Consumer Loan Approvals Process. In Proceedings of the International Conference on Business Process Management, Tallinn, Estonia, 3–6 September 2012.
  22. van der Aalst, W.M.P. A practitioner’s guide to process mining: Limitations of the directly-follows graph. Procedia Comput. Sci. 2019, 164, 321–328.
  23. Huser, V. Process Mining: Discovery, Conformance and Enhancement of Business Processes. J. Biomed. Inform. 2012, 45, 1018–1019.
  24. van der Aalst, W.M.P.; Weijters, A.J.M.M. Process mining: A research agenda. Comput. Ind. 2004, 53, 231–244.
  25. Cook, J.E.; Wolf, A.L. Automating Process Discovery through Event-Data Analysis. In Proceedings of the 17th International Conference on Software Engineering, Seattle, WA, USA, 23–30 April 1995; p. 73.
  26. Agrawal, R.; Gunopulos, D.; Leymann, F. Mining Process Models from Workflow Logs. In Proceedings of the International Conference on Extending Database Technology, Valencia, Spain, 23–27 March 1998.
  27. van der Aalst, W.M.P.; Weijters, A.J.M.M.; Măruşter, L. Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 2004, 16, 1128–1142.
  28. Weijters, A.J.M.M.; van der Aalst, W.M.P. Rediscovering workflow models from event-based data using little thumb. Integr. Comput. Aided Eng. 2003, 10, 151–162.
  29. De Medeiros, A.A.; Weijters, A.J.; Van Der Aalst, W.M. Using Genetic Algorithms to Mine Process Models: Representation, Operators and Results; Eindhoven University of Technology: Eindhoven, The Netherlands, 2004.
  30. Weijters, A.J.; van Der Aalst, W.M.; De Medeiros, A.A. Process Mining with the Heuristics Miner Algorithm; Eindhoven University of Technology: Eindhoven, The Netherlands, 2006.
  31. Van Der Aalst, W.; Buijs, J.; Van Dongen, B. Towards Improving the Representational Bias of Process Mining. In Proceedings of the International Symposium on Data-Driven Process Discovery and Analysis, Campione d’Italia, Italy, 29 June–1 July 2011.
  32. Günther, C.W.; van der Aalst, W.M.P. Fuzzy Mining-Adaptive Process Simplification Based on Multi-perspective Metrics. In Proceedings of the International Conference on Business Process Management, Brisbane, Australia, 24–28 September 2007.
  33. Buijs, J.C.; Van Dongen, B.F.; van Der Aalst, W.M. On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. In Proceedings of the CoopIS, DOA-SVI, and ODBASE 2012—OTM Conferences, Rome, Italy, 10–14 September 2012.
  34. Leemans, S.J.J.; Fahland, D.; van der Aalst, W.M.P. Discovering Block-Structured Process Models from Event Logs-A Constructive Approach. In Proceedings of the International Conference on Applications and Theory of Petri Nets and Concurrency, Berlin, Germany, 24 June 2013.
  35. vanden Broucke, S.K.; De Weerdt, J. Fodina: A robust and flexible heuristic process discovery technique. Decis. Support Syst. 2017, 100, 109–118.
  36. Augusto, A.; Conforti, R.; Dumas, M.; La Rosa, M.; Polyvyanyy, A. Split miner: Automated discovery of accurate and simple business process models from event logs. Knowl. Inf. Syst. 2018, 59, 251–284.
  37. Buijs, J.C.; van Dongen, B.F.; van der Aalst, W.M. Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity. Int. J. Coop. Inf. Syst. 2014, 23, 1440001.
  38. Berti, A.; Schuster, D.; van der Aalst, W.M.P. Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study. arXiv 2023, arXiv:2307.02194.
  39. Kampik, T.; Warmuth, C.; Rebmann, A.; Agam, R.; Egger, L.N.; Gerber, A.; Hoffart, J.; Kolk, J.; Herzig, P.; Decker, G.; et al. Large Process Models: A Vision for Business Process Management in the Age of Generative AI. KI-Künstliche Intell. 2023.
  40. Ashoori, M.; Tarokh, M.J. Applying the Process Mining Project Methodology for Insurance Risks Reduction. Int. J. Res. 2014, 3, 57–69.
  41. Conforti, R.; De Leoni, M.; La Rosa, M.; Van Der Aalst, W.M.; Ter Hofstede, A.H. A recommendation system for predicting risks across multiple business process instances. Decis. Support Syst. 2015, 69, 1–19.
  42. Carvallo, A.; Henning, C.H.; Razmilić, D.; López, R.R.; Lee, J.; Fernández, J.P.S.; Arias, M. Applying Process Mining for Loan Approvals in a Banking Institution; Eindhoven University of Technology: Eindhoven, The Netherlands, 2017.
  43. Rodrigues, A.; Almeida, C.; Saraiva, D.; Moreira, F.; Spyrides, G.; Varela, G.; Krieger, G.; Peres, I.; Dantas, L.; Lana, M.; et al. Stairway to Value: Mining a Loan Application Process. 2017. Available online: https://ais.win.tue.nl/bpi/2017/bpi2017_winner_academic.pdf (accessed on 25 June 2025).
  44. Maaradji, A.; Dumas, M.; Rosa, M.L.; Ostovar, A. Detecting Sudden and Gradual Drifts in Business Processes from Execution Traces. IEEE Trans. Knowl. Data Eng. 2017, 29, 2140–2154.
  45. Moreira, C.; Haven, E.; Sozzo, S.; Wichert, A.M. Process mining with real world financial loan applications: Improving inference on incomplete event logs. PLoS ONE 2018, 13, e0207806.
  46. Yazici, I.; Engin, O. For Loan Processing a Fuzzy Process Mining. J. Adv. Res. Nat. Appl. Sci. 2023, 9, 511–530.
  47. Bahaweres, R.B.; Zakiyyah, H. Enhancing Loan Application Business Process Model with Multi-perspective Process Mining. In Proceedings of the 11th International Conference on Cyber and IT Service Management (CITSM) 2023, Makassar, Indonesia, 10–11 November 2023; pp. 1–6.
  48. Mongkolrob, S.; Intarasema, S.; Premchaiswadi, W. From Data to Decisions: Enhancing Loan Applications with Process Mining Techniques. In Proceedings of the 22nd International Conference on ICT and Knowledge Engineering (ICT&KE), Bangkok, Thailand, 20–22 November 2024; IEEE: New York, NY, USA, 2024; pp. 1–8.
  49. Dunzer, S.; Stierle, M.; Matzner, M.; Baier, S. Conformance checking: A state-of-the-art literature review. arXiv 2019, arXiv:2007.10903.
  50. Van der Aalst, W.; Adriansyah, A.; Van Dongen, B. Replaying history on process models for conformance checking and performance analysis. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 2012, 2, 182–192.
  51. Zhang, Z.; Hildebrant, R.; Asgarinejad, F.; Venkatasubramanian, N.; Ren, S. Improving Process Discovery Results by Filtering Out Outliers from Event Logs with Hidden Markov Models. In Proceedings of the IEEE 23rd Conference on Business Informatics (CBI), Online Conference, 1–3 September 2021; pp. 171–180. [Google Scholar]
  52. Böhmer, K.; Rinderle-Ma, S. Mining association rules for anomaly detection in dynamic process runtime behavior and explaining the root cause to users. Inf. Syst. 2020, 90, 101438. [Google Scholar] [CrossRef]
  53. Bezerra, F.; Wainer, J. A Dynamic Threshold Algorithm for Anomaly Detection in Logs of Process Aware Systems. J. Inf. Data Manag. 2012, 3, 316–331. [Google Scholar]
  54. Folino, F.; Greco, G.; Guzzo, A.; Pontieri, L. Mining usage scenarios in business processes: Outlier-aware discovery and run-time prediction. Data Knowl. Eng. 2011, 70, 1005–1029. [Google Scholar] [CrossRef]
  55. Wang, J.; Tang, Y.; He, S.; Zhao, C.; Sharma, P.K.; Alfarraj, O.; Tolba, A. LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things. Sensors 2020, 20, 2451. [Google Scholar] [CrossRef] [PubMed]
  56. Wang, J.; Zhao, C.; He, S.; Gu, Y.; Alfarraj, O.; Abugabah, A. LogUAD: Log Unsupervised Anomaly Detection Based on Word2Vec. Comput. Syst. Sci. Eng. 2022, 41, 1207–1222. [Google Scholar] [CrossRef]
  57. Berti, A.; van Zelst, S.; Schuster, D. PM4Py: A process mining library for Python. Softw. Impacts 2023, 17, 100556. [Google Scholar] [CrossRef]
  58. Munoz-Gama, J.; Carmona, J. Enhancing precision in Process Conformance: Stability, confidence and severity. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 11–15 April 2011; pp. 184–191. [Google Scholar]
  59. Wang, K.; Liu, X. An Anomaly Detection Method of Industrial Data Based on Stacking Integration. J. Artif. Intell. 2021, 3, 9. [Google Scholar] [CrossRef]
  60. Wang, J.; Zhou, Z.; Li, Z.; Du, S. A Novel Fault Detection Scheme Based on Mutual k-Nearest Neighbor Method: Application on the Industrial Processes with Outliers. Processes 2022, 10, 497. [Google Scholar] [CrossRef]
  61. Zhou, J.; Ye, Z.; Zhang, S.; Geng, Z.; Han, N.; Yang, T. Investigating response behavior through TF-IDF and Word2vec text analysis: A case study of PISA 2012 problem-solving process data. Heliyon 2024, 10, e35945. [Google Scholar] [CrossRef] [PubMed]
  62. Gomolka, Z.; Żesławska, E.; Olbrot, L. Using Hybrid LSTM Neural Networks to Detect Anomalies in the Fiber Tube Manufacturing Process. Appl. Sci. 2025, 15, 1383. [Google Scholar] [CrossRef]
Figure 1. Research evolution diagram of process discovery algorithms.
Figure 2. SOP of the corporate account-opening business process in the case bank.
Figure 3. Execution of corporate account-opening business activities.
Figure 4. Completion time distribution of corporate account-opening business activities.
Figure 5. Process model of corporate account-opening business process generated by the E-Heuristic Miner algorithm.
Figure 6. Proportion of cases (a) and paths (b) involved in personnel operation factors.
Figure 7. Permutation feature importance of the four execution time features in the one-class SVM model.
Figure 8. PCA results.
Figure 9. KNN results.
Figure 10. LSTM training.
Figure 11. Scatter plot of reconstruction errors for all process cases.
Figure 12. Word2Vec results.
Table 1. Representative process discovery algorithms.

| Algorithm | Graph Representation | Proposer |
| --- | --- | --- |
| RNN, KTAIL, and Markov Miners | FSM, Markov chain | [25] |
| Agrawal Miner | DAG | [26] |
| Alpha Miner | WF-net | [27] |
| Heuristic Miner 1.0 | WF-net | [28] |
| Genetic Miner 1.0 | C-net | [29] |
| Heuristic Miner 2.0 | C-net | [30] |
| Genetic Miner 2.0 | Process Tree | [31] |
| Fuzzy Miner | DFG | [32] |
| ETM | Process Tree | [33] |
| Inductive Miner | Process Tree | [34] |
| Fodina Miner | C-net | [35] |
| Split Miner | BPMN | [36] |
Table 2. Process mining in financial services operations.

| Reference | Scenarios | Purpose | Research Modules | Perspectives | Algorithms |
| --- | --- | --- | --- | --- | --- |
| [9] | Insurance claim process | Business process optimization | Process discovery; process enhancement | Control flow; organization; case | Heuristic Miner |
| [40] | Insurance claim process | Risk monitoring and risk identification | Process discovery; process enhancement | Control flow; case | Heuristic Miner |
| [41] | Bank loan process | Process risk assessment; process risk identification | Process discovery; process enhancement | Control flow; case | Not mentioned |
| [20] | Bank loan process | Business process optimization; process prediction | Process discovery; process enhancement | Control flow; organization; case | Inductive Miner |
| [42] | Bank loan process | Business process analysis; process analysis framework | Process discovery; process enhancement | Control flow; time; case | Heuristic Miner; Inductive Miner; Fuzzy Miner |
| [43] | Bank loan process | Business process analysis and optimization; process variants analysis; process performance evaluation | Process discovery; process enhancement | Control flow; case | Not mentioned |
| [44] | Bank loan process | Concept drift; business process analysis | Process discovery | Control flow | Not mentioned |
| [45] | Bank loan process | Business process optimization; process prediction | Process discovery | Control flow | Not mentioned |
| [46] | Bank loan process | Business process analysis | Process discovery; process enhancement | Control flow; case | Fuzzy Miner |
| [47] | Bank loan process | Business process modeling analysis | Process discovery; conformance checking; process enhancement | Control flow; time; organization; case | Inductive Miner |
| [48] | Bank loan process | Business process analysis; process improvement based on Generative AI | Process discovery; process enhancement | Control flow; time; case | Fuzzy Miner |
Table 3. Event logs for the corporate account-opening business process (partial).

| Case ID | Activity | Date Time |
| --- | --- | --- |
| B1xxxxxxxx448632 | Pre-application has been submitted for review | 2022-01-02 19:24:01 |
| | Under pre-review at the bank’s preliminary review post | 2022-01-04 08:55:23 |
| | Pre-review has been completed and pending processing by the branch. | 2022-01-04 08:59:24 |
| | The branch is preparing. | 2022-01-10 09:24:31 |
| | … | … |
| B1xxxxxxxx454247 | Pre-application has been submitted for review | 2022-01-03 19:41:48 |
| | Open electronic channel pre-filled form | 2022-01-03 19:41:48 |
| | Under pre-review at the bank’s preliminary review post | 2022-01-04 09:04:14 |
| | … | … |
| … | … | … |
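
An event log shaped like Table 3 can be reduced to directly-follows counts, which are the raw material for any follow-relationship analysis. The following is a minimal sketch that uses only the rows visible in Table 3; the function name `directly_follows_counts` and the tuple layout are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Rows copied from the visible part of Table 3: (case ID, activity, timestamp).
EVENTS = [
    ("B1xxxxxxxx448632", "Pre-application has been submitted for review", "2022-01-02 19:24:01"),
    ("B1xxxxxxxx448632", "Under pre-review at the bank’s preliminary review post", "2022-01-04 08:55:23"),
    ("B1xxxxxxxx448632", "Pre-review has been completed and pending processing by the branch.", "2022-01-04 08:59:24"),
    ("B1xxxxxxxx448632", "The branch is preparing.", "2022-01-10 09:24:31"),
    ("B1xxxxxxxx454247", "Pre-application has been submitted for review", "2022-01-03 19:41:48"),
    ("B1xxxxxxxx454247", "Open electronic channel pre-filled form", "2022-01-03 19:41:48"),
    ("B1xxxxxxxx454247", "Under pre-review at the bank’s preliminary review post", "2022-01-04 09:04:14"),
]


def directly_follows_counts(events):
    """Count how often activity a is immediately followed by activity b within a case."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in events:
        by_case[case_id].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), activity))

    counts = Counter()
    for trace in by_case.values():
        # sorted() is stable, so events with equal timestamps keep their log order.
        ordered = sorted(trace, key=lambda item: item[0])
        for (_, a), (_, b) in zip(ordered, ordered[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in directly_follows_counts(EVENTS).items():
        print(f"{a} -> {b}: {n}")
```

Sorting by timestamp alone matters here because the second case contains two events with the same timestamp; relying on a stable sort keeps them in the order in which they were logged.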
Table 4. Sorting of process activity execution frequency.

| Activity | Activity Number | Frequency | Proportion |
| --- | --- | --- | --- |
| Scanned for review the next day | HE01 | 62,199 | 20.00% |
| Under pre-review at the bank’s preliminary review post | H2 | 34,785 | 11.18% |
| Pre-application has been submitted for review | H1 | 33,911 | 10.90% |
| Pre-review has been completed and pending processing by the branch. | H5 | 23,359 | 7.51% |
| The branch is preparing. | H8 | 23,087 | 7.42% |
| Counter business activated and accepted | H9 | 22,600 | 7.27% |
| Completed and waiting for evaluation | H14 | 22,552 | 7.25% |
| Account opening completed | H12 | 22,552 | 7.25% |
| Items handed over, and the customer left the counter | H13 | 22,552 | 7.25% |
| Next-day review has been completed | HE03 | 21,277 | 6.84% |
| Pre-review waiting for customer to modify/supplement information | H3 | 8925 | 2.87% |
| Open electronic channel pre-filled form | H90 | 6994 | 2.25% |
| Next-day review returned and modified | HE02 | 2845 | 0.92% |
| Pre-review rejected | H4 | 1627 | 0.52% |
| Customer has withdrawn the order | H99 | 1221 | 0.39% |
| Customer has evaluated | H24 | 355 | 0.11% |
| On-site review failed and order withdrawn | H11 | 184 | 0.06% |
| Customer has appointed | H97 | 28 | 0.01% |
| Customer has obtained appointment number | H98 | 1 | 0.00% |
Table 5. Time consumed by corporate account-opening business activities.

| Activity | Average Completion Time (Activity) | Variance of Execution Completion Time (Activity) | Average Completion Time (Case) | Variance of Completion Time (Case) |
| --- | --- | --- | --- | --- |
| Customer has evaluated | 7.57 days | 16.29 days | 5.57 days | 8.22 days |
| On-site review failed and order withdrawn | 4.58 days | 10.25 days | | |
| Counter business activated and accepted | 2.45 days | 4.86 days | | |
| Next-day review has been completed | 1.47 days | 1.12 days | | |
| Next-day review returned and modified | 1.44 days | 2.31 days | | |
| Completed and waiting for evaluation | 23.69 h | 6.21 days | | |
| Branch is preparing | 23.57 h | 2.63 days | | |
| Scanned for review the next day | 14.03 h | 1.34 days | | |
| Customer has withdrawn the order | 10.2 h | 3.11 days | | |
| Under pre-review at the bank’s preliminary review post | 3.33 h | 21.05 h | | |
| Pre-application has been submitted for review | 2.86 h | 20.44 h | | |
| Customer has appointed | 1.26 h | 2.21 h | | |
| Pre-review has been completed and pending processing by the branch | 8.86 min | 41.98 min | | |
| Account opening completed | 8.50 min | 1.45 h | | |
| Pre-review rejected | 6.42 min | 11.51 min | | |
| Pre-review waiting for customer to modify/supplement information | 2.76 min | 41.41 min | | |
| Items handed over, and the customer left the counter | 1.46 min | 1.12 h | | |
| Open electronic channel pre-filled form | 1.4 min | 42 min | | |
| Customer has obtained appointment number | - | - | | |
Table 6. Results of process quality assessment.

| Indicators | Alpha Miner | Alpha+ Miner | E-Heuristic Miner | Inductive Miner |
| --- | --- | --- | --- | --- |
| Success/Failure | Failure | Failure | Success | Success |
| Fitness | - | - | 0.911 | 1.0 |
| Precision | - | - | 0.989 | 0.131 |
| Generalization | - | - | 0.845 | 0.940 |
| Simplicity | - | - | 0.495 | 0.522 |
Table 7. Analysis of deviation in the corporate account-opening business of the case bank.

| Deviation Ratio of Activities | Deviation Ratio of Paths | Deviation Ratio of Cases |
| --- | --- | --- |
| 21.05% | 9.00% | 59.67% |
Table 8. Process anomaly detection results.

| Algorithm | Number of Cases in Process Anomalies |
| --- | --- |
| E-Heuristic Miner | 15,266 |
| One-Class SVM | 1280 |
| PCA | 1280 |
| KNN | 873 |
| LSTM | 822 |
| Word2Vec | 4406 |
Table 9. Comparison of deviation detection methods.

| Algorithm | The Number and Accuracy of (1) Identified | The Number and Accuracy of (2) and (3) Identified |
| --- | --- | --- |
| E-Heuristic Miner | 11 (100%) | 782 (100%) |
| One-Class SVM | 2 (18.18%) | 708 (90.54%) |
| PCA | 0 (0.00%) | 686 (87.72%) |
| KNN | 0 (0.00%) | 95 (12.15%) |
| LSTM | 0 (0.00%) | 54 (6.91%) |
| Word2Vec | 0 (0.00%) | 17 (2.17%) |
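
The percentages in Table 9 are consistent with dividing each count by the totals found by the E-Heuristic Miner (11 for column (1), 782 for columns (2) and (3)). This is an inference from the numbers rather than a definition stated in the table, so the short check below is only a sketch under that assumption; the names `IDENTIFIED`, `accuracy_percent`, and `accuracy_table` are illustrative.

```python
# Counts copied from Table 9: (identified in (1), identified in (2) and (3)).
IDENTIFIED = {
    "E-Heuristic Miner": (11, 782),
    "One-Class SVM": (2, 708),
    "PCA": (0, 686),
    "KNN": (0, 95),
    "LSTM": (0, 54),
    "Word2Vec": (0, 17),
}

# Assumption: the E-Heuristic Miner row (100%) supplies the reference totals.
TOTAL_1, TOTAL_2_3 = IDENTIFIED["E-Heuristic Miner"]


def accuracy_percent(identified, total):
    """Share of the reference total that was identified, as a two-decimal percentage."""
    return f"{100.0 * identified / total:.2f}%"


def accuracy_table(identified=IDENTIFIED):
    """Return {algorithm: (accuracy for (1), accuracy for (2) and (3))}."""
    return {
        name: (accuracy_percent(first, TOTAL_1), accuracy_percent(rest, TOTAL_2_3))
        for name, (first, rest) in identified.items()
    }


if __name__ == "__main__":
    for name, (acc_1, acc_2_3) in accuracy_table().items():
        print(f"{name}: {acc_1} / {acc_2_3}")
```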
Table 10. Inputting the textual abstraction of the process variant into GPT 4.5.
Input Text
Pre-application has been submitted for review -> Under pre-review at the bank’s preliminary review post -> Pre-review has been completed and pending processing by the branch -> Branch is preparing -> Counter business activated and accepted -> Account opening completed -> Items handed over and the customer left the counter -> Completed and waiting for evaluation -> Scanned for review the next day -> Scanned for review the next day -> Scanned for review the next day (frequency = 3334 performance = 420,942.921)

Pre-application has been submitted for review -> Under pre-review at the bank’s preliminary review post -> Pre-review has been completed and pending processing by the branch -> Branch is preparing -> Counter business activated and accepted -> Account opening completed -> Items handed over and the customer left the counter -> Completed and waiting for evaluation -> Scanned for review the next day -> Scanned for review the next day -> Scanned for review the next day (frequency = 3172 performance = 454,041.715)

Pre-application has been submitted for review -> Under pre-review at the bank’s preliminary review post -> Pre-review has been completed and pending processing by the branch -> Branch is preparing -> Counter business activated and accepted -> Account opening completed -> Items handed over and the customer left the counter -> Completed and waiting for evaluation -> Scanned for review the next day -> Scanned for review the next day -> Scanned for review the next day (frequency = 1567 performance = 425,102.106)

……
What are the root causes of issues in the process? Please provide only process and data specific considerations.
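
The input in Table 10 lists each process variant as an arrow-separated activity sequence followed by its frequency and a performance value, and ends with a question. A minimal sketch of how such text could be assembled from per-case data follows. It assumes that "performance" is the mean case duration of a variant in seconds, which the table does not state; `format_variants`, `build_prompt`, and the toy cases are hypothetical and not taken from the study.

```python
from collections import defaultdict

# Question text copied from Table 10.
QUESTION = (
    "What are the root causes of issues in the process? "
    "Please provide only process and data specific considerations."
)


def format_variants(cases):
    """Render variants as 'A -> B (frequency = n performance = p)' lines.

    `cases` is a list of (activities, duration_seconds) pairs, one per case.
    Assumption: 'performance' is the mean case duration of a variant in seconds.
    """
    durations = defaultdict(list)
    for activities, duration in cases:
        durations[tuple(activities)].append(duration)

    # Most frequent variants first, mirroring the ordering in Table 10.
    ranked = sorted(durations.items(), key=lambda item: len(item[1]), reverse=True)

    lines = []
    for variant, values in ranked:
        mean = sum(values) / len(values)
        lines.append(
            f"{' -> '.join(variant)} (frequency = {len(values)} performance = {mean:,.3f})"
        )
    return lines


def build_prompt(cases):
    """Join the variant lines and the question, separated by blank lines."""
    return "\n\n".join(format_variants(cases) + [QUESTION])


if __name__ == "__main__":
    # Hypothetical toy cases, for illustration only.
    toy_cases = [
        (["A", "B"], 100.0),
        (["A", "B"], 200.0),
        (["A", "C"], 50.0),
    ]
    print(build_prompt(toy_cases))
```

Keeping the prompt assembly separate from any model call makes the exact text sent to the LLM easy to log and reproduce.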
Table 11. Inputting Petri net text into GPT 4.5.
Input Text
If I have a Petri net:
places: [ p_10, p_11, p_12, p_14, p_15, p_16, p_17, p_19, p_20, p_23, p_24, p_26, p_27, p_28, p_29, p_3, p_30, p_31, p_32, p_33, p_34, p_35, p_37, p_38, p_39, p_4, p_40, p_41, p_42, p_43, p_45, p_46, p_47, p_48, p_49, p_5, p_51, p_52, p_53, p_54, p_55, p_7, p_8, p_9, sink, source]
transitions: [ (0281b5d2-8984-475f-9fef-38bf648715db, “Customer has appointed”), (0803a375-70b2-4982-a1c7-2b69d3416a7f, “Branch is preparing”), ……)]
arcs: [ (0281b5d2-8984-475f-9fef-38bf648715db, “Customer has appointed”) -> p_35, (0803a375-70b2-4982-a1c7-2b69d3416a7f, “Branch is preparing”)->……]
initial marking: [“source:1”]
final marking: [“sink:1”]

Can you provide suggestions to improve the process model based on your domain knowledge?
Table 12. Process optimization analysis results from GPT 4.5.

| Bottlenecks | Analysis | Suggestions |
| --- | --- | --- |
| “Scanned and waiting for next day review” is repeated | “Scanned and waiting for next day review” occurs 2–4 times, indicating possible review delays or redundant steps that impede the prompt completion of the business process. | OCR plus real-time review can reduce repeated scanning. |
| “Pre-review pending customer modification/supplementary information” is stuck in a loop | “Pre-review pending customer modification/supplementary information” occurs repeatedly in multiple loops. | AI can help identify missing information and raise all issues at once. It can also help customers modify or submit information online. |
| The position of “Completed and waiting for evaluation” is not unified | In some cases, “Completed and waiting for evaluation” occurs before “Scanned and waiting for next day review”, and in others after it. This indicates a logical ambiguity or a parallelism problem between the evaluation process and the next-day review. | Unify the evaluation order and provide an online evaluation channel. |
| High customer order cancellation rate | For example, customers cancel the order 166 times after “Pre-review pending customer modification/supplementary information” and 191 times after “Pre-review has been completed and pending processing by the branch”. Customers also cancel the order after multiple loops of “Pre-review pending customer modification/supplementary information” -> “Pre-application has been submitted for review”. | (1) Identify and reduce non-business-value nodes such as “customer order cancellation”, “Pre-review rejected”, and “On-site review failed and order withdrawn”. These states may represent abnormal termination. If they account for an excessively high proportion of the process, the customer experience may be poor, or the bank’s pre-review criteria may be ambiguous or too strict. (2) Introduce automated review mechanisms to reduce inefficient manual review. |
| Frequent pre-review rejection | Pre-review rejections occur many times, indicating that customers may not be fully informed of the specific requirements during the initial review, or that solutions may not be provided in a timely manner. | Provide information about risks in advance and recommend alternatives. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Y.; Ni, Z.; Xiao, B. Domain Knowledge-Enhanced Process Mining for Anomaly Detection in Commercial Bank Business Processes. Systems 2025, 13, 545. https://doi.org/10.3390/systems13070545

