Real-world business processes are dynamic, and their event logs are generally unstructured and contain heterogeneous business classes. Process mining techniques derive useful knowledge from such logs, but translating the logs into simplified, logical segments is crucial. Complexity increases further for business processes with a large number of events and no outcome labels. Techniques such as trace clustering and event clustering tend to simplify complex business logs. However, business users generally cannot understand the resulting clusters, because these techniques do not consider the business aspects of the process when clustering the log. In this paper, we provide a multi-stage hierarchical framework for business-logic-driven clustering of highly variable process logs with an extremely large number of events. First, we introduce the term contrail processes to describe such complex real-world business processes, whose logs yield contrail-like process models. Second, we propose an algorithm, Novel Hierarchical Clustering (NoHiC), to discover business-logic-driven clusters from these contrail processes. For clustering, the raw event log is first decomposed into high-level business classes. Feature engineering is then performed exclusively on business-context features to support the discovery of meaningful business clusters. In our experiments we use a hybrid approach that combines a rule-based mining technique with a novel form of agglomerative hierarchical clustering. We present a case study of a CRM process at a renowned UK telecommunications firm. We verify the quality of the proposed framework through several measures, such as cluster segregation, classification accuracy, and fitness of the log. We also compare NoHiC with two trace clustering techniques using two real-world process logs.
Compared with the other techniques, the clusters discovered by NoHiC show improved fitness, and they also carry valuable information about the business context of the process log.
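The abstract does not specify NoHiC's internals, so the following is only a minimal sketch of the general idea under stated assumptions. It decomposes each case's events into high-level business classes, builds a business-context feature vector per case, and groups cases bottom-up. The event names, the `BUSINESS_CLASS` mapping, and the relative-frequency features are hypothetical. The clustering step is generic average-linkage agglomerative clustering, not the NoHiC algorithm itself, and the rule-based mining component is omitted.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical mapping from low-level event names to high-level business classes.
BUSINESS_CLASS = {
    "create_order": "Sales",
    "update_order": "Sales",
    "raise_fault": "Repair",
    "assign_engineer": "Repair",
    "send_bill": "Billing",
    "adjust_bill": "Billing",
}
CLASSES = sorted(set(BUSINESS_CLASS.values()))  # ["Billing", "Repair", "Sales"]


def case_features(events):
    """Hypothetical business-context feature: share of each business class in a trace."""
    counts = Counter(BUSINESS_CLASS[e] for e in events)
    total = sum(counts.values())
    return [counts[c] / total for c in CLASSES]


def average_linkage(c1, c2, feats):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(math.dist(feats[i], feats[j]) for i, j in pairs) / len(pairs)


def agglomerative(feats, n_clusters):
    """Generic bottom-up clustering: start from singletons, merge the closest pair."""
    clusters = [[case] for case in feats]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], feats),
        )
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Toy event log (hypothetical): case id -> ordered list of event names.
log = {
    "case1": ["create_order", "update_order", "send_bill"],
    "case2": ["create_order", "send_bill", "update_order"],
    "case3": ["raise_fault", "assign_engineer"],
    "case4": ["raise_fault", "assign_engineer", "adjust_bill"],
}
feats = {case: case_features(events) for case, events in log.items()}
print(agglomerative(feats, 2))  # [['case1', 'case2'], ['case3', 'case4']]
```

Because the features are computed over business classes rather than raw event names, cases with the same business mix (here, sales-with-billing versus repair-oriented cases) end up together even when their event orderings differ. This mirrors the abstract's motivation for clustering on business-context features.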
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.