1. Introduction
With the rapid development of technology, the size and availability of data are constantly increasing. Companies need to analyze their data intelligently to acquire competitive advantages in terms of efficiency and responsiveness [
1]. Business analytics is a decision support system that enables managers and analysts to make effective and timely decisions in business activities [
2]. Meanwhile, business process analytics is a specialized system that helps to improve business process execution by analyzing operational data in event logs, which contain activity events recorded by process-aware information systems such as enterprise resource planning (ERP) and supply chain management (SCM) systems [
3]. Although business analytics offers both data-oriented and process-oriented analysis tools, few studies have investigated how to integrate the two approaches in a concrete and effective manner [4,5,6].
Business analytics using regression, classification, and clustering usually does not consider the order of data generation because time information is rarely contained in general forms of data, e.g., tabular data [
7]. However, context information related to the business process often crucially affects target business values of interest, such as delivery time and service quality. Therefore, process analytics can provide the missing link between the data and the process aspects. Although many process mining methods have recently been developed to analyze process execution data, most traditional process mining techniques, such as process discovery and conformance checking, focus mainly on the structure of business processes rather than on process-related data. Hence, process analytics tools need to integrate the two sides of analytics (i.e., data and process) in a balanced way to enable effective and efficient process data analysis.
To support the integrated analysis of data and process, a concept of process cubes was proposed by van der Aalst [
8]. The process cube is inspired by online analytical processing (OLAP) and extends the concept to process repositories. A process cube is created from an event log and its related database, in which different dimensions such as event class, case type, and time are defined. Like an OLAP cube, a process cube allows the analyst to drill down or roll up data and zoom into slices of the data, as well as reorder the dimensions. Furthermore, just as an OLAP cube is used for analyzing and reporting operational performance, a process cube can be used in a similar way to evaluate operational performance.
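The cube operations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the PRANAS implementation: the event records, the dimension names (`case_type`, `year`, `quarter`), and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event log: every field name and value here is invented.
EVENTS = [
    {"case": "c1", "activity": "Order", "case_type": "large", "year": 2018, "quarter": 1},
    {"case": "c1", "activity": "Ship", "case_type": "large", "year": 2018, "quarter": 1},
    {"case": "c2", "activity": "Order", "case_type": "small", "year": 2018, "quarter": 2},
    {"case": "c2", "activity": "Ship", "case_type": "small", "year": 2019, "quarter": 1},
    {"case": "c3", "activity": "Order", "case_type": "large", "year": 2019, "quarter": 1},
]


def build_cube(events, dimensions):
    """Group events into cells keyed by their values on the given dimensions."""
    cube = defaultdict(list)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        cube[key].append(event)
    return dict(cube)


def slice_events(events, dimension, value):
    """Slice: fix a single value on one dimension."""
    return [e for e in events if e[dimension] == value]


def dice_events(events, allowed):
    """Dice: keep events whose value on each listed dimension is in the allowed set."""
    return [e for e in events if all(e[d] in vals for d, vals in allowed.items())]


def roll_up(cube, dimensions, keep):
    """Roll up: merge cells by dropping every dimension not listed in `keep`."""
    positions = [dimensions.index(d) for d in keep]
    coarse = defaultdict(list)
    for key, cell in cube.items():
        coarse[tuple(key[i] for i in positions)].extend(cell)
    return dict(coarse)


DIMS = ["case_type", "year", "quarter"]
fine = build_cube(EVENTS, DIMS)                # finest granularity
by_year = roll_up(fine, DIMS, ["year"])        # coarser view: one cell per year
large_only = slice_events(EVENTS, "case_type", "large")
diced = dice_events(EVENTS, {"case_type": {"large"}, "year": {2019}})
```

In this sketch, drilling down is simply the inverse of `roll_up`: the cube is rebuilt with more dimensions, so each coarse cell splits into finer ones. Each cell is itself a small event log, which is what allows process mining to be applied per cell.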
In this paper, we introduce a PRocess ANAlytics System, named PRANAS, which was developed to effectively and efficiently evaluate operational performance of supply chain operations using process cubes and process warehouses. The system provides an environment for both data-oriented analytics and process-oriented analytics. The system can store in the process warehouse the business performance data that are generated in supply chain execution, and the process cube can provide the aggregated process data through convenient OLAP functions such as slicing, dicing, rolling-up, and drilling-down. The process data extracted from the system can also be used as the inputs of data mining, as well as process mining, for the purpose of advanced analytics.
Few studies have addressed the design of process analytics systems that integrate two analytics aspects: data analytics and process analytics. This research offers guidance for balancing the benefits of the two kinds of analytics tools using process warehouses and cubes. The process warehouse and cube provide appropriately filtered fractions of data that can feed recent advanced techniques of both data analytics and process analytics for performance management of business process execution. In particular, a framework of collaborative performance measures, called collaborative Balanced Scorecard (cBSC), is presented to illustrate the use of PRANAS for supply chain management under the Supply Chain Operations Reference (SCOR) standard models.
The remainder of the paper is organized as follows. Work related to our approach is described in
Section 2. The framework of PRANAS is presented, along with the design of process cubes and warehouse, in
Section 3. Three exemplary analytics applications of the system are explained in
Section 4. Finally, we conclude the paper with future work in
Section 5.
2. Related Work
The integration between business and process analytics has been considered in different ways in a few studies [4,5,6,9,10]. A conceptual framework of a decision support system for business intelligence was presented by Görgülü and Pickl [
4], which combined classical data-centric approaches such as data mining with modern system engineering using the advanced concept of adaptive business intelligence. Marjanovic argued that, in service-oriented industries, operational business intelligence and business process management systems need to be integrated with case-handling [
5], which could be done with the framework proposed in this paper. Kim presented an architecture to functionally integrate data analysis functions and process discovery functions [
6]. Both systems coexist in what is called a process-aware enterprise organization, which is usually the case for companies that participate in supply chain systems. Beheshti et al. handled the business and process analytics integration problem based on a process graph, which refers to large hybrid collections of heterogeneous and partially unstructured process execution data [
9]. They presented the process OLAP (P-OLAP), which is a very similar concept to the process cube presented by van der Aalst [
8]. However, P-OLAP focuses on the scalability of operations on big process graphs based on MapReduce rather than on the actual integration of process and business analytics as presented in this paper. In addition, Silva et al. presented a decision-support oriented framework that integrates process analytics with other types of analytics for project selection [
10]. The integration is achieved by decomposing a complex process into manageable subprocesses, which, in turn, are handled with multiple analytic techniques such as text mining, clustering, and social network analysis. However, the approach is limited to recommendation analysis.
More specifically, a few studies have integrated business analytics directly with process mining [11,12]. An approach to support business and IT users in the task of measuring and monitoring the performance of business process execution was presented by Grigori et al. [
11]. They clearly described important concepts, such as process behavior and process state change, and even gave their own definition of process mining. However, when the paper was written, process mining techniques were not as mature as they are today. Our approach can benefit from many mature process mining techniques, such as process discovery algorithms, conformance checking, and decision point analysis, which give it better foundations and wider applicability. Mansmann et al. presented an interesting attempt to integrate OLAP concepts and capabilities into process analysis [
12]. Their approach focused on modeling a data warehouse and presented a formal solution for this problem by means of an extended snowflake schema that is quite complex. In our study, the schema becomes much simpler, since our approach leverages the advantages of process cubes which already integrate the OLAP concept with business process elements such as event class and case type.
The process cube, a relatively new concept, has been applied in domains such as education [13,14], resource allocation [15], and internal logistics [16]. Applications in the education domain were described in case studies based on video lectures from a Dutch university [
13,14]. The nature of the data makes process cubes suitable for analyzing video lecture-watching behavior depending on characteristics of the students (e.g., country, age, gender, school year), because the ability to slice and dice on different attributes of the process data yields meaningful results. Additionally, the authors implemented an automated process to generate periodic reports, which lecturers can use to monitor the students’ behaviors and correlate them with their grades. In our work, besides the process cube integration, we also provide a process warehouse scheme and the ability to combine process analytics and data analytics for a more powerful analysis.
Furthermore, a resource process cube was introduced by Arias et al. [
15]. The cube provided a flexible, extensible, and fine-grained way to extract historical information from event logs. Several resource-related dimensions were defined and used, such as a frequency dimension, a performance dimension, and a cost dimension. Moreover, one of their major contributions is the ability to consider resources at a generic subprocess level, rather than at an activity level, because of the functions offered by process cubes. However, this approach was specifically designed for resource allocation. Although it can be adapted to different resource-related company scenarios, our approach can also be applied to other domains such as decision point analysis, as well as to data analytics, such as classification and regression over the sliced data obtained from the process cube.
Lastly, Knoll et al. presented a multidimensional process mining approach combined with lean management principles and value stream mapping (VSM) for reducing waste in the internal logistics processes of a production plant of a German automotive manufacturer [
16]. Process mining was used in all stages of the approach, starting with the event definitions and storage as event logs. Then, process cubes were constructed and used for multidimensional process analysis using structure-based and time-based analysis. In contrast, the capabilities of PRANAS go beyond these dimensions and are limited only by the availability of event-, case-, and process-related data.
4. Applications to Process Analytics
In this section, we demonstrate three applications of PRANAS to process analytics. The example applications explore the different types of analytics used in supply chain analytics: descriptive, predictive, and prescriptive [
18]. In this research, a process warehouse and a process cube were implemented in Microsoft SQL Server Business Intelligence Development Studio (BIDS) (included as an additional package in Microsoft SQL Server 2008 R2, USA), as shown in
Figure 4, while process mining methods were implemented on the ProM framework (ProM 6.1, Eindhoven University of Technology, Eindhoven, The Netherlands; http://www.promtools.org/doku.php?id=prom61) [
19]. The implemented system can be used to analyze and report operational performance measures along multiple dimensions. In addition, process-oriented analysis can be performed by applying existing process mining techniques to the event traces and discovered process models stored in the process warehouse.
All the data in the examples were extracted from the process warehouse and the process cube designed in
Section 3.3 and
Section 3.4. The first application is process-oriented analytics. This example illustrates process mining techniques such as process discovery and performer recommendation. The second application is data-oriented analytics. The second example centers on data mining techniques such as classification and regression for on-time delivery prediction on a design change process. The final application is a hybrid scenario of process-oriented and data-oriented analytics, in which we discover a control-flow process model and the decision rules inside the process.
4.1. Process Discovery and Enhancement
The first application is process-oriented analytics. Process mining is a research discipline that combines business process modeling with data mining methods to derive useful insights. The basic input for process mining techniques is an event log, that is, a record of actual process execution. This input can be prepared through the process warehouse and the process cube: the desired attributes are selected from the process warehouse, and the result is organized into the process cube. The process cube can then support the three main types of process mining: process discovery, conformance checking, and process enhancement [
3]. Moreover, the cube can also be used for operational support through three main tasks: detect, predict, and recommend. For example, recommendation refers to guiding users in selecting the next appropriate activity for a running instance, as well as other kinds of recommendations, such as which performer should take the next activity in terms of time or cost.
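As a rough illustration of the kind of input preparation and discovery step described above, the following sketch derives a directly-follows graph from an event log, which is a common starting point for process discovery. It is not the discovery algorithm used in PRANAS or ProM; the log rows, activity names, and function names are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case id, activity, integer timestamp).
LOG = [
    ("c1", "Receive request", 1), ("c1", "Review design", 2), ("c1", "Approve", 3),
    ("c2", "Receive request", 1), ("c2", "Review design", 2),
    ("c2", "Rework", 3), ("c2", "Review design", 4), ("c2", "Approve", 5),
]


def traces(log):
    """Order each case's events by timestamp and return its activity sequence."""
    by_case = defaultdict(list)
    for case, activity, timestamp in log:
        by_case[case].append((timestamp, activity))
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


def directly_follows(log):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for trace in traces(log).values():
        counts.update(zip(trace, trace[1:]))
    return counts


dfg = directly_follows(LOG)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```

Applied to the events of a single cube cell, the same function would yield one graph per cell, so that models for different case types or periods can be compared.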
Using the process cube, we can effectively preprocess the data. In this example process cube, we discover a subprocess model as shown in
Figure 5a. After combining the discovered model with event data and metadata from the process cube, we then also apply a performer recommendation method, DTMiner, which stands for the decision tree miner [
20]. This approach constructs decision trees based on event logs and recommends the best performer according to specific measures such as completion time or cost. When DTMiner is applied to a cell of interest in the process cube, the result shown in
Figure 5b can be obtained. Given a running instance that has three completed activities, DTMiner uses historical event data to construct a decision tree and highlights in red the recommended performer for each possible path based on completion time or cost.
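The idea of recommending a performer from historical data can be illustrated with a much simpler stand-in than DTMiner. The sketch below does not build a decision tree; it only picks the performer with the lowest mean historical completion time for a given next activity. The history records, performer names, and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (activity, performer, completion time in hours).
HISTORY = [
    ("Review design", "Kim", 5.0), ("Review design", "Kim", 7.0),
    ("Review design", "Lee", 4.0), ("Review design", "Lee", 5.0),
    ("Approve", "Kim", 1.0), ("Approve", "Park", 2.0),
]


def recommend_performer(history, next_activity):
    """Return the performer with the lowest mean completion time, or None."""
    durations = defaultdict(list)
    for activity, performer, hours in history:
        if activity == next_activity:
            durations[performer].append(hours)
    if not durations:
        return None  # no historical evidence for this activity
    return min(durations, key=lambda performer: mean(durations[performer]))


print(recommend_performer(HISTORY, "Review design"))
```

A tree-based method such as DTMiner additionally conditions the recommendation on the path taken so far; this baseline ignores that context and is meant only to make the notion of a time-based recommendation concrete.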
4.2. Performance Prediction
Since the process warehouse and cubes in PRANAS include OLAP functions, data mining or machine learning techniques can be applied to the data schema. Therefore, the system can be used to analyze operational performance by choosing suitable analytics algorithms such as classification, regression, and clustering. Suppose a manager wants to analyze the last two years of data, as well as the collaboration processes related to all company sizes except small-sized companies. To filter the target data, two dicing operations are required, one on the time dimension and one on the class type dimension. After this, data analytics techniques such as decision tree and linear regression methods are applied. Two example scenarios of data-oriented analytics are presented below.
Classification for on-time delivery: Suppose a manager analyzes the process to find which product or project is expected to be frequently delayed. In this example, the classification models predict one of two categorical classes, ‘on-time delivery’ or ‘delayed delivery’, using the dimensions selected from the process cube.
Figure 6a shows the result of applying a decision tree classifier to on-time delivery. The first branch of the decision tree can be interpreted as follows: if ‘Schedule Hit Rate’ (the schedule compliance rate) is more than 95%, products are expected to be delivered on time with a probability of 98.06%.
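One way to read a single branch of such a tree is as the share of on-time cases on each side of a threshold. The sketch below computes that share for a split on the schedule hit rate; the records are invented toy data, so the resulting shares have no relation to the figures reported above, and the function name is an assumption.

```python
# Hypothetical records from one cube cell: (schedule hit rate in %, delivered on time?).
RECORDS = [
    (98.0, True), (97.5, True), (96.0, True), (99.0, False),
    (90.0, False), (85.0, True), (80.0, False), (70.0, False),
]


def on_time_share(labels):
    """Fraction of True labels, or None for an empty group."""
    return sum(labels) / len(labels) if labels else None


def split_on_threshold(records, threshold):
    """Return the on-time share for rate > threshold and for rate <= threshold."""
    above = [on_time for rate, on_time in records if rate > threshold]
    below = [on_time for rate, on_time in records if rate <= threshold]
    return on_time_share(above), on_time_share(below)


above_share, below_share = split_on_threshold(RECORDS, 95.0)
print(f"hit rate > 95%: {above_share:.2%} on time; otherwise: {below_share:.2%}")
```

A decision tree learner chooses the attribute and threshold automatically, typically by minimizing an impurity measure; here the threshold is fixed by hand to keep the reading of one branch explicit.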
Regression for design change rate: Assume a manager needs to predict how many design changes will occur for a particular situation in the process cube. The results of the linear regression for the design change rate are shown in
Figure 6b. For the design change rate, when ‘Priority’ is not 2 (which means ‘medium’), the linear regression equation is:
From the regression model, factors such as receiving cost, satisfaction, and email reply time appear to be the main influences on the design change rate.
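To make the fitting step concrete, the following sketch fits an ordinary least squares line with a single predictor. It does not reproduce the fitted model of the paper, which uses several factors; the data points, the choice of email reply time as the only predictor, and the function name are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: email reply time (hours) vs. design change rate.
reply_time = [1.0, 2.0, 3.0, 4.0]
change_rate = [0.1, 0.2, 0.3, 0.4]

intercept, slope = fit_line(reply_time, change_rate)
predicted = intercept + slope * 5.0  # prediction for a reply time of 5 hours
```

With several predictors, the same idea generalizes to solving the normal equations, which is what standard statistical packages do internally.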
4.3. Decision Point Analysis
In this subsection, we introduce a hybrid approach of process-oriented and data-oriented analytics. Decision mining, also known as decision point analysis, is an interesting approach that can potentially take full advantage of PRANAS. The authors of [21,22] have introduced similar techniques, in which control flow and data flow are separated. This characteristic particularly suits PRANAS, since it can clearly represent the event log from both the control-flow perspective and the data-flow perspective. Suppose decision points in the event log are provided, and at each decision point the process splits into different branches. These branches are assigned as a dimension of the process cube. For example, in
Figure 3, the events generated by companies of different sizes were selected for analysis. If a manager is interested in analyzing the branches of a specific decision point, the roll-up operation can be used to extract the corresponding data.
Figure 7a shows the discovered process model with decision points and data annotations obtained with the ProM plugin presented in [
22]. Moreover,
Figure 7b depicts a decision tree classifier for the decision point of company size, generated after rolling up the first two branches of the process. With the result of the decision tree classifier, the effects of specific attributes on the path taken by the running process can be analyzed, which can help managers predict performance measures such as completion time or costs.
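The data preparation behind decision point analysis can be sketched as follows: for each case, the case attributes are paired with the branch taken immediately after the decision activity, which yields a table that a classifier can learn from. This is a simplified illustration, not the ProM plugin; the cases, attribute names, activity names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical cases: case attributes plus the ordered activities of the trace.
CASES = [
    {"company_size": "large", "priority": 1,
     "trace": ["Request", "Assess", "Full review", "Close"]},
    {"company_size": "large", "priority": 2,
     "trace": ["Request", "Assess", "Full review", "Close"]},
    {"company_size": "medium", "priority": 2,
     "trace": ["Request", "Assess", "Quick check", "Close"]},
    {"company_size": "medium", "priority": 3,
     "trace": ["Request", "Assess", "Quick check", "Close"]},
]


def decision_rows(cases, decision_activity):
    """Pair each case's attributes with the activity right after the decision point."""
    rows = []
    for case in cases:
        trace = case["trace"]
        # Skip cases where the decision activity is absent or is the last event.
        if decision_activity in trace[:-1]:
            branch = trace[trace.index(decision_activity) + 1]
            attributes = {k: v for k, v in case.items() if k != "trace"}
            rows.append((attributes, branch))
    return rows


def branch_counts(rows, attribute):
    """Count the branches taken for each value of one attribute."""
    table = defaultdict(Counter)
    for attributes, branch in rows:
        table[attributes[attribute]][branch] += 1
    return {value: dict(counter) for value, counter in table.items()}


rows = decision_rows(CASES, "Assess")
by_size = branch_counts(rows, "company_size")
```

In this toy data, company size fully determines the branch, which is the kind of rule a decision tree trained on `rows` would be expected to surface.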
5. Conclusions and Future Work
The typical techniques of data-oriented analytics, such as regression, classification, and clustering, do not consider process-related aspects such as the order of business activities and the timestamps of generated data. Although a few process-oriented analytics approaches, such as process mining techniques, consider various case data as well as process-related data, concrete methods have not yet been clearly developed. Besides, existing process mining techniques often focus on only a single well-defined process rather than on process data that are intricately combined with data storage. However, business processes change over time for various reasons, such as changes in work structure and in the economic situation. Clearly, when business analysts analyze operational business performance, they should consider the execution of business processes along with the design of business data. For this reason, the concepts of process cube and warehouse were presented in this paper to illustrate how they can be used for both data-oriented analytics and process-oriented analytics.
In this paper, we designed a process analytics system called PRANAS that contains a process warehouse and a process cube. The analytics system was implemented for operational performance analytics in supply chain management under the SCOR standard model. As the performance measures for operational process analysis, the collaborative BSC (cBSC) was designed by extending the four perspectives of BSC in terms of collaboration among business partners. To illustrate the process analytics tools, three types of analytics examples were given: process-oriented analytics, data-oriented analytics, and hybrid analytics. In particular, operational performance was analyzed at a multidimensional level using the designed process cube.
Previous studies on the integration of data analytics and process analytics have been insufficient. The concept of the process warehouse and cube can help fill this gap. The proposed framework based on the process warehouse and cube is expected to be a helpful guide for designing business performance systems that are implemented using recent advanced techniques of data mining, as well as process mining. This is because the system was designed to support both data-oriented analytics and process-oriented analytics.
Although we showed a system implementation supporting the proposed process analytics system in this paper, many challenges still remain. The main purpose of operational analytics is to provide “near real-time” analytics in organizations, and therefore an automated procedure for generating insightful reports could be proposed. With this, the decision-making process inside the company would be systematically improved in terms of business process intelligence.