# Investigating Social Contextual Factors in Remaining-Time Predictive Process Monitoring—A Survival Analysis Approach


## Abstract


## 1. Introduction

- Case context: the properties or attributes of a case.
- Process context: similar cases that may be competing for the same resources.
- Social context: the way human resources collaborate in an organisation to work on the process of interest.
- External context: factors in the broader ecosystem that impact the process, e.g., weather, legislation, and location.

## 2. Materials and Methods

#### 2.1. Definitions

#### 2.1.1. Events, Traces and Event Logs

**Definition 1.** An event $e$ is a tuple $(\#attribute_1(e), \ldots, \#attribute_n(e))$. The elements of the tuple represent the attributes associated with the event. Although an event is minimally defined by the triplet $(\#case\_identifier(e), \#activity(e), \#completion\_time(e))$, it is common and desirable to have additional attributes such as $\#performer(e)$, indicating the performer associated with the event, and $\#trans(e)$, indicating the transaction type associated with the event, amongst others. For each of these attributes, there is a function which assigns the attribute to the event, e.g., $attr_{start\_time} \in \mathsf{\epsilon} \to T$ assigning a start time to the event, $attr_{completion\_time} \in \mathsf{\epsilon} \to T$ assigning a completion time to the event, $attr_{activity} \in \mathsf{\epsilon} \to A$ assigning an activity label to the event, and $attr_{performer} \in \mathsf{\epsilon} \nrightarrow P$, a partial function assigning a performer (or resource) to events. Note that $attr_{performer}$ is a partial function because some events may not be associated with any performer.
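As a minimal sketch, assuming a Python representation that is not part of the original method, Definition 1 can be mirrored by a record type in which the mandatory triplet is required and the remaining attributes are optional. `Event` and `attr_performer` are hypothetical names, and `None` stands in for "undefined" in the partial function $attr_{performer}$.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Illustrative event record; field names are hypothetical."""
    case_identifier: str                     # #case_identifier(e)
    activity: str                            # #activity(e)
    completion_time: datetime                # #completion_time(e)
    start_time: Optional[datetime] = None    # attr_start_time, if recorded
    performer: Optional[str] = None          # partial: may be undefined
    trans: Optional[str] = None              # transaction type, e.g. "complete"


def attr_performer(e: Event) -> Optional[str]:
    """Partial function: returns None where no performer is recorded."""
    return e.performer


e1 = Event("case-1", "Submit", datetime(2020, 5, 4, 9, 0), performer="alice")
e2 = Event("case-1", "Auto-check", datetime(2020, 5, 4, 9, 5))  # no performer
print(attr_performer(e1), attr_performer(e2))  # alice None
```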

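**Definition 2.** Given a trace $\sigma = \langle e_1, \ldots, e_n \rangle$ and a set of terminal activity labels $Z$, the final event $e_n$ is a valid terminal event if $\#activity\_label(e_n) \in Z$. This event indicates a ‘clean’ completion of the process instance. Otherwise, the process instance is still in flight or has been abandoned.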

**Definition 3.** A partial trace ($\sigma^{p}$) has a non-valid terminal event as its final event ($e_n$). It indicates an in-flight (pre-mortem) process instance.

A full trace ($\sigma^{f}$) ends with a valid terminal event ($e_n$). It details the journey through the value chain that the particular process instance followed and indicates a completed (post-mortem) process instance.

**Definition 4.**

**Definition 5.** An event log $L$ is a set of traces in which no event is shared between distinct traces, i.e., $\forall \sigma_1, \sigma_2 \in L: \forall e_1 \in \sigma_1\; \forall e_2 \in \sigma_2: e_1 \neq e_2 \lor \sigma_1 = \sigma_2$.
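The condition in Definition 5 can be checked mechanically. The sketch below assumes that every event carries a unique identifier (a hypothetical representation) and that a log is a list of traces, each a sequence of such identifiers; `satisfies_definition_5` is an illustrative name only.

```python
from typing import Hashable, Sequence


def satisfies_definition_5(log: Sequence[Sequence[Hashable]]) -> bool:
    """Return True if no event identifier occurs in two different traces."""
    owner = {}  # event id -> index of the trace it was first seen in
    for idx, trace in enumerate(log):
        for event_id in trace:
            first = owner.setdefault(event_id, idx)
            if first != idx:
                return False  # e1 == e2 although sigma1 != sigma2
    return True


print(satisfies_definition_5([["a1", "a2"], ["b1"]]))  # True
print(satisfies_definition_5([["a1"], ["a1", "b1"]]))  # False
```

Repeated identifiers inside a single trace are accepted here, matching the $\sigma_1 = \sigma_2$ clause of the formula.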

**Definition 6.** Let $\tau$ be a full trace ($\sigma^{f}$), let $\tau.e_n$ represent the completion time associated with its terminal event, $\#completion\_time(e_n)$, and let $t$ represent the prediction point. For $t < \tau.e_n$, the remaining time is $\tau_{rem} = \tau.e_n - t$. It indicates the remaining time to completion of the case (process instance). Note that predicting at or after the completion time (i.e., $t \ge \tau.e_n$) is pointless.

**Definition 7.** Let $\tau$ be a full trace ($\sigma^{f}$), let $\tau.e_1$ represent the start time associated with its start event, $\#start\_time(e_1)$, and let $t$ represent the prediction point. For $t > \tau.e_1$, the elapsed time is $\tau_{ela} = t - \tau.e_1$. It indicates the time elapsed from the start of the case (process instance) to the prediction point.

**Definition 8.** Let $\tau$ be a full trace ($\sigma^{f}$), let $\tau.e_1$ represent the start time associated with its start event, $\#start\_time(e_1)$, and let $\tau.e_n$ represent the completion time associated with its terminal event, $\#completion\_time(e_n)$. The trace cycle time is $\tau_{cyc} = \tau.e_n - \tau.e_1$. It indicates the time taken to complete the process instance from start to finish.
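Definitions 6 to 8 are tied together by the identity $\tau_{rem} = \tau_{cyc} - \tau_{ela}$, since $(\tau.e_n - \tau.e_1) - (t - \tau.e_1) = \tau.e_n - t$. That identity is what step 7 of Algorithm 1 relies on. A small sketch with Python's `datetime` (function names and dates are illustrative only):

```python
from datetime import datetime, timedelta


def elapsed_time(start: datetime, t: datetime) -> timedelta:
    """Definition 7: tau_ela = t - tau.e_1, defined for t > tau.e_1."""
    if not t > start:
        raise ValueError("prediction point must follow the start event")
    return t - start


def remaining_time(completion: datetime, t: datetime) -> timedelta:
    """Definition 6: tau_rem = tau.e_n - t, defined for t < tau.e_n."""
    if not t < completion:
        raise ValueError("predicting at or after completion is pointless")
    return completion - t


def cycle_time(start: datetime, completion: datetime) -> timedelta:
    """Definition 8: tau_cyc = tau.e_n - tau.e_1."""
    return completion - start


start = datetime(2020, 5, 1, 9, 0)        # #start_time(e_1)
completion = datetime(2020, 5, 11, 9, 0)  # #completion_time(e_n)
t = datetime(2020, 5, 4, 9, 0)            # prediction point

# The three measures are linked: tau_rem = tau_cyc - tau_ela
assert remaining_time(completion, t) == cycle_time(start, completion) - elapsed_time(start, t)
print(remaining_time(completion, t).days)  # 7
```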

#### 2.1.2. Survival Functions and Social Networks

**Definition 9.** Given a set of trace cycle times $\{\tau_{cyc.1}, \ldots, \tau_{cyc.n}\}$, a trace $\sigma_i \in L$ with cycle time $\tau_{i.cyc}$ and a random time $t_r$, the survival function is $S(t_r) = P(\tau_{i.cyc} > t_r)$. It gives the probability that the trace cycle time exceeds $t_r$, i.e., that the process instance has not yet completed by time $t_r$.
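Definition 9 describes a population quantity. One standard way to estimate $S(t)$ from a log in which some traces are censored is the non-parametric Kaplan-Meier estimator. This is shown only to illustrate the definition: the model used in this paper is the parametric Weibull model (Section 2.4), not Kaplan-Meier. The sketch assumes that censored traces are represented by their elapsed time so far.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) = P(cycle time > t).

    durations: cycle time of complete traces, or elapsed time of censored ones
    observed:  True if the trace reached a valid terminal event (uncensored)
    Returns (time, S(time)) pairs at each distinct completion time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        completions = 0
        removed = 0
        # Group all traces sharing the same time t
        while i < len(data) and data[i][0] == t:
            completions += int(data[i][1])
            removed += 1
            i += 1
        if completions:
            survival *= 1.0 - completions / at_risk
            curve.append((t, survival))
        at_risk -= removed  # completed and censored traces leave the risk set
    return curve


# Cycle times in days; the trace at 6.0 is censored (still in flight)
curve = kaplan_meier([2.0, 4.0, 6.0, 8.0], [True, True, False, True])
print(curve)
```

The censored trace never produces a drop in the curve, but it still counts in the risk set up to day 6. Incomplete traces therefore inform the estimate without a known cycle time.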

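**Definition 10.** There is a handover of work from performer $a$ to performer $b$ if, within a trace, there are two consecutive events ($e_i$ and $e_{i+1}$) such that $a$ completes $\#activity(e_i)$ while $b$ completes $\#activity(e_{i+1})$. Note that the incidence function permits a performer to hand over work to themself, i.e., to complete both $\#activity(e_i)$ and $\#activity(e_{i+1})$.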
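The handover relation in Definition 10 can be turned into a weighted edge list by scanning consecutive performer pairs in each trace. A minimal sketch follows. It assumes each trace is given as its sequence of performers, and it skips pairs where either event has no performer (Definition 1 makes $attr_{performer}$ partial). How such pairs are treated is our assumption, not something the text specifies. `handover_edges` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional


def handover_edges(traces: List[List[Optional[str]]]) -> Counter:
    """Count handovers (a, b) between consecutive events of each trace.

    Each trace is the sequence #performer(e_1), ..., #performer(e_n);
    None marks an event with no performer, and such pairs are skipped.
    """
    edges: Counter = Counter()
    for performers in traces:
        for a, b in zip(performers, performers[1:]):
            if a is not None and b is not None:
                edges[(a, b)] += 1  # a == b (self-handover) is allowed
    return edges


log = [["ann", "bob", "bob", "cat"], ["ann", None, "cat"]]
print(handover_edges(log))
```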

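**Definition 11.** Given a trace $\sigma_i \in L$, let $X = \{\#performer(e_1), \ldots, \#performer(e_n)\}$. This denotes the subset of performers who completed the activities in the trace.

Let $g_{u,v}$ represent the number of geodesics connecting vertices $u$ and $v$, and let $g_{u,v}(X)$ represent the number of geodesics between $u$ and $v$ passing through some vertex of $X$. The group betweenness centrality of $X$ is defined as follows:

$$C_B(X) = \sum_{u < v,\; u, v \notin X} \frac{g_{u,v}(X)}{g_{u,v}}$$

The group degree centrality of $X$ is the number of vertices outside $X$ that are adjacent to at least one member of $X$, i.e., $|N(X) \setminus X|$ with $N(X) = N(x_1) \cup N(x_2) \cup \dots \cup N(x_n)$, where $N(\cdot)$ denotes the neighbourhood of a vertex and $x_i$ denotes the members of the set $X$. The group eigenvector centrality is defined as follows: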
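The group degree and group betweenness measures can be computed with breadth-first search. This sketch makes several assumptions: the social network is treated as an undirected, unweighted graph given as adjacency sets, and the function names are hypothetical. To get $g_{u,v}(X)$, it counts geodesics in the full graph and subtracts those that survive when $X$ is removed. Closeness and eigenvector variants are not shown.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def group_degree(g: Graph, group: Set[Hashable]) -> int:
    """Number of vertices outside X adjacent to at least one member of X."""
    neighbourhood = set().union(*(g[x] for x in group))
    return len(neighbourhood - group)


def _bfs_counts(g: Graph, source, banned: Set[Hashable]):
    """Distances and numbers of shortest paths from source, avoiding banned."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w in banned:
                continue
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count


def group_betweenness(g: Graph, group: Set[Hashable]) -> float:
    """Sum over unordered pairs u, v outside X of g_uv(X) / g_uv."""
    outside = [v for v in g if v not in group]
    total = 0.0
    for i, u in enumerate(outside):
        dist, count = _bfs_counts(g, u, banned=set())
        dist_x, count_x = _bfs_counts(g, u, banned=group)
        for v in outside[i + 1:]:
            if v not in dist:
                continue  # no geodesic between u and v
            # Geodesics avoiding X exist only if removing X keeps the distance
            avoiding = count_x[v] if dist_x.get(v) == dist[v] else 0
            total += (count[v] - avoiding) / count[v]
    return total


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(group_degree(path, {"b"}), group_betweenness(path, {"b"}))  # 2 1.0
```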

#### 2.2. Overview

#### 2.3. Pre-Processing

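For each trace, we check whether the activity label of its final event $e_n$ (see Definition 2) is present within the set of terminal activity labels. If it is not, the trace is considered censored; otherwise, it is considered uncensored (complete).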
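This censoring rule reduces to a single membership test on the last activity label. In the sketch below, the activity labels and the terminal set are hypothetical placeholders, not labels from the BPIC logs.

```python
from typing import List, Set


def is_censored(activities: List[str], terminal_labels: Set[str]) -> bool:
    """A trace is censored unless its final activity is a terminal label."""
    return activities[-1] not in terminal_labels


terminal = {"Approved", "Rejected"}  # hypothetical terminal labels
print(is_censored(["Start", "Approved"], terminal))  # False
print(is_censored(["Start", "Review"], terminal))    # True
```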

#### 2.4. Predictive Monitoring

Each trace is encoded using the set of performers $(\#performer(e_1), \ldots, \#performer(e_n))$ associated with that trace as well as the start and end event activity labels. While we adopt this approach to explore the impact of social contextual factors on process cycle time, we acknowledge that other encoding approaches, such as index-based encoding, which is “lossless” [9], could equally be adopted. Our approach is in effect a combination of aggregation and last-state encoding [14], where the aggregation function computes the group degree, betweenness, closeness, and eigenvector centrality for each trace based on the set of performers who executed the events in that trace. This approach enables us to treat the performers who execute the activities in a trace as a team and builds on the approach in “the team effectiveness literature where researchers have used several internal team composition variables to predict performance” [21]. We use the parametric Weibull model to build the survival model. Although it requires certain assumptions about the distribution of the process cycle time to be satisfied, this model offers a distinctive benefit in that it is “simultaneously both proportional and accelerated so that both relative event rates and relative extension in” process cycle “time can be estimated” [23].
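The “simultaneously proportional and accelerated” property can be seen directly from the Weibull survival function. The derivation below uses our own symbols, which do not appear in the paper: shape $k$, baseline scale $\lambda_0$ and a coefficient vector $\beta$ acting on the encoded trace covariates $x$, such as the group centralities. It assumes a log-linear scale parameterisation:

$$
\begin{aligned}
S(t \mid x) &= \exp\!\left[-\left(\frac{t}{\lambda(x)}\right)^{k}\right], \qquad \lambda(x) = \lambda_0\, e^{\beta^\top x} \\
h(t \mid x) &= -\frac{d}{dt}\ln S(t \mid x) = \frac{k}{\lambda(x)}\left(\frac{t}{\lambda(x)}\right)^{k-1} = \underbrace{\frac{k}{\lambda_0}\left(\frac{t}{\lambda_0}\right)^{k-1}}_{h_0(t)} e^{-k\,\beta^\top x} \\
t_q(x) &= \lambda(x)\,\bigl(-\ln(1-q)\bigr)^{1/k} \quad \text{where } S(t_q \mid x) = 1 - q
\end{aligned}
$$

The covariates stretch time by $e^{\beta^\top x}$, which is the accelerated-failure-time reading. They also multiply the baseline hazard $h_0(t)$ by the constant $e^{-k\beta^\top x}$, which is the proportional-hazards reading. The last line shows one way an estimation quantile $q$ could be turned into a predicted cycle time; for the median ($q = 0.5$) it gives $\lambda(x)(\ln 2)^{1/k}$.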

| Algorithm 1 Survival Algorithm. | |
|---|---|
| Input: | An event log $L$ over some trace universe $\sigma$ with the associated feature elapsed time $\tau_{ela}$, cycle time $\tau_{cyc}$, a target measure remaining time $\tau_{rem}$, a set of terminal activity labels ($T$), an estimation quantile $q$ and a survival analysis (SURV) method |
| Output: | A survival analysis predictive model (SA-PM) for $L$ |
| 1 | Associate a binary variable $\#censored(\sigma)$ with each trace $\sigma \in L$ using $\#activity(e_n)$ and $T$ (see Definition 3) |
| 2 | Encode each trace using a suitable encoding function |
| 3 | Induce a survival function SA-PM from $L$ using method SURV with $\{\#censored(\sigma_i), \#cycle\_time(\sigma_i), \ldots, \#attribute_n(\sigma_i)\}$ as input values |
| 4 | Let $\sigma_1, \ldots, \sigma_n$ denote the traces |
| 5 | For each $\sigma_i$ do |
| 6 | Estimate the cycle time $\tau_{i.cyc\_pred}$ of the trace from SA-PM using $q$ |
| 7 | Estimate the remaining time of the trace: $\tau_{i.rem\_pred} = \tau_{i.cyc\_pred} - \tau_{i.ela}$ |
| 8 | End |
| 9 | Return $\{\tau_{rem\_pred1}, \ldots, \tau_{rem\_predn}\}$ |
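Steps 5 to 9 of Algorithm 1 can be sketched for the case where SURV is a Weibull accelerated-failure-time model whose parameters have already been estimated. Step 3 is not shown. The parameter values, the two-feature encoding and the function names are hypothetical. We also assume that $q$ is a quantile of the cycle-time distribution, so that $S(t_q) = 1 - q$. As in step 7, no floor at zero is applied.

```python
import math
from typing import List, Sequence


def weibull_quantile(x: Sequence[float], beta: Sequence[float],
                     log_scale0: float, shape: float, q: float) -> float:
    """q-quantile of a Weibull AFT cycle-time distribution (assumed form).

    scale(x) = exp(log_scale0 + beta . x);  t_q = scale * (-ln(1 - q))**(1/shape)
    """
    scale = math.exp(log_scale0 + sum(b * xi for b, xi in zip(beta, x)))
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)


def predict_remaining(features: List[Sequence[float]], elapsed: List[float],
                      beta: Sequence[float], log_scale0: float,
                      shape: float, q: float = 0.5) -> List[float]:
    """Steps 5-9 of Algorithm 1 for an already-fitted Weibull model."""
    predictions = []
    for x, ela in zip(features, elapsed):                         # step 5
        cyc_pred = weibull_quantile(x, beta, log_scale0, shape, q)  # step 6
        predictions.append(cyc_pred - ela)                         # step 7
    return predictions                                             # step 9


# Hypothetical fitted parameters; each feature vector holds two group
# centralities of the performers in a trace (illustrative values only)
beta, log_scale0, shape = [0.8, -0.5], math.log(10.0), 1.0
features = [[0.0, 0.0], [0.5, 0.2]]
elapsed = [2.0, 3.0]  # days since the start of each case
preds = predict_remaining(features, elapsed, beta, log_scale0, shape)
print(preds)
```

With all covariates at zero and shape 1, the median cycle time is $10 \ln 2 \approx 6.93$ days, so the first trace gets about 4.93 days of remaining time.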

#### 2.5. Evaluation

**RQ1:** What is the relationship between social contextual factors and process completion time?

**RQ2:** How does the survival analysis predictive process monitoring approach compare with existing approaches?

#### 2.5.1. Datasets

#### 2.5.2. Experimental Setup

## 3. Results

#### 3.1. Experimental Results

Discriminative approaches learn the conditional distribution $P(Y \mid X)$ directly from the training data, where $X$ denotes the encoded trace features and $Y = \{\tau_{rem\_pred1}, \tau_{rem\_pred2}, \ldots, \tau_{rem\_predn}\}$ represents the prediction target. The resulting probability distribution is used to make predictions for the test set. However, when a significant proportion of the traces in the training data is incomplete, this approach cannot exploit them, because their target ($Y$), i.e., the remaining time of the trace, is unknown. This is why such traces are typically removed from the training set. Generative approaches, such as the proposed survival analysis approach, instead estimate a joint distribution $P(X, Y)$, which is then used to derive the conditional probability $P(Y \mid X)$. Such an approach can generate synthetic values of $X$ by sampling from the joint distribution. As a result, it performs better when an event log contains a significant proportion of incomplete traces.
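A concrete mechanism that lets a survival model keep incomplete traces in training is the censored likelihood. A complete trace contributes its density $f(\tau_{cyc})$. A censored trace contributes $S(\tau_{ela})$, the probability that its cycle time exceeds the time observed so far. The sketch below shows this standard survival-analysis construction for a Weibull model without covariates. It is an illustration under those assumptions, not necessarily the estimation routine used in the experiments, and the function name is hypothetical.

```python
import math
from typing import Sequence


def weibull_censored_loglik(durations: Sequence[float], observed: Sequence[bool],
                            scale: float, shape: float) -> float:
    """Log-likelihood of a Weibull model with right-censored traces.

    Complete traces contribute log f(t); censored (in-flight) traces
    contribute log S(t), where t is the time observed so far.
    """
    total = 0.0
    for t, complete in zip(durations, observed):
        log_s = -((t / scale) ** shape)              # log S(t)
        if complete:
            log_h = math.log(shape / scale) + (shape - 1) * math.log(t / scale)
            total += log_h + log_s                   # log f = log h + log S
        else:
            total += log_s
    return total


# With shape = 1 (exponential), the MLE of the scale is the total observed
# time divided by the number of complete traces: (2 + 4 + 6) / 2 = 6.
data, flags = [2.0, 4.0, 6.0], [True, True, False]
best = max((weibull_censored_loglik(data, flags, s, 1.0), s)
           for s in [4.0, 5.0, 6.0, 7.0, 8.0])
print(best[1])  # 6.0
```

Dropping the censored trace would give an estimate of $(2 + 4)/2 = 3$. This shows how discarding incomplete traces biases cycle times downwards.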

… $\times 10^{-9}$; for BPIC 18, $p < 2.2 \times 10^{-16}$). We subsequently ran pairwise comparisons using the Wilcoxon rank-sum test to determine which proportions differ significantly from the baseline (i.e., the log with 100% complete traces).
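For readers who want to reproduce this kind of pairwise comparison, the rank-sum statistic with its large-sample normal approximation can be written as follows. We assume that the two samples are per-trace prediction errors, one for a given proportion of complete traces and one for the baseline; this pairing is an assumption. The sketch uses average ranks for ties and applies no tie or continuity correction. It is not necessarily the software used for the reported p-values.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, large-sample normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    Returns (z statistic, p-value).
    """
    pooled = sorted((v, 0 if i < len(x) else 1)
                    for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(z, 3), round(p, 4))
```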

#### 3.2. Threats to Validity

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Van der Aalst, W.M. Process Mining: Data Science in Action, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2016.
- Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. Predictive business process monitoring with LSTM neural networks. In Proceedings of the International Conference on Advanced Information Systems Engineering, Essen, Germany, 12–16 June 2017; pp. 477–492.
- Aslan, A. Combining Process Mining and Queueing Theory for the ICT Ticket Resolution Process at LUMC. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2017.
- Rogge-Solti, A.; Weske, M. Prediction of remaining service execution time using stochastic petri nets with arbitrary firing delays. In International Conference on Service-Oriented Computing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 389–403.
- Akritas, M.G. Non-parametric survival analysis. Stat. Sci. **2004**, 19, 615–623.
- Somers, M.J. Modelling employee withdrawal behaviour over time: A study of turnover using survival analysis. J. Occup. Organ. Psychol. **1996**, 69, 315–326.
- Larivière, B.; Van den Poel, D. Investigating the role of product features in preventing customer churn, by using survival analysis and choice modeling: The case of financial services. Expert Syst. Appl. **2004**, 27, 277–285.
- Dirick, L.; Claeskens, G.; Baesens, B. Time to default in credit scoring using survival analysis: A benchmark study. J. Oper. Res. Soc. **2017**, 68, 652–665.
- Verenich, I.; Nguyen, H.; La Rosa, M.; Dumas, M. White-box prediction of process performance indicators via flow analysis. In Proceedings of the 2017 International Conference on Software and System Process, ACM, Paris, France, 5–7 July 2017; pp. 85–94.
- Folino, F.; Guarascio, M.; Pontieri, L. Discovering Context-Aware Models for Predicting Business Process Performances. In On the Move to Meaningful Internet Systems, Proceedings of OTM 2012. OTM, Rome, Italy, 10–14 September 2012; Meersman, R., Panetto, H., Dillon, T., Rinderle-Ma, S., Dadam, P., Zhou, X., Pearson, S., Ferscha, A., Bergamaschi, S., Cruz, I.F., Eds.; Springer: Berlin/Heidelberg, Germany, 2012.
- Senderovich, A.; Di Francescomarino, C.; Ghidini, C.; Jorbina, K.; Maggi, F.M. ‘Intra and Inter-case Features in Predictive Process Monitoring: A Tale of Two Dimensions’. In Lecture Notes in Computer Science, Proceedings of Business Process Management. BPM 2017, Barcelona, Spain, 10–15 September 2017; Carmona, J., Engels, G., Kumar, A., Eds.; Springer: Cham, Switzerland, 2017; Volume 10445.
- Rozinat, A.; Wynn, M.T.; van der Aalst, W.M.; ter Hofstede, A.H.; Fidge, C.J. Workflow simulation for operational decision support. Data Knowl. Eng. **2009**, 68, 834–850.
- Veldhoen, J. The Applicability of Short-term Simulation of Business Processes for the Support of Operational Decisions. Master’s Thesis, Technische Universiteit Eindhoven, Eindhoven, The Netherlands, March 2011. Available online: http://alexandria.tue.nl/extra2/afstversl/tm/Veldhoen%202011.pdf (accessed on 22 April 2020).
- Verenich, I.; Dumas, M.; La Rosa, M.; Maggi, F.M.; Teinemaa, I. Survey and Cross-Benchmark Comparison of Remaining Time Prediction Methods in Business Process Monitoring. Available online: https://arxiv.org/abs/1805.02896 (accessed on 11 May 2018).
- Breuker, D.; Matzner, M.; Delfmann, P.; Becker, J. Comprehensible Predictive Models for Business Processes. MIS Q. **2016**, 40, 1009–1034.
- Evermann, J.; Rehse, J.R.; Fettke, P. Predicting process behaviour using deep learning. Decis. Support Syst. **2017**, 100, 129–140.
- Pasquadibisceglie, V.; Appice, A.; Castellano, G.; Malerba, D. Using Convolutional Neural Networks for Predictive Process Analytics. In Proceedings of the 2019 International Conference on Process Mining (ICPM), Aachen, Germany, 24–26 June 2019; pp. 129–136.
- Van Der Aalst, W.M.; Reijers, H.A.; Song, M. Discovering social networks from event logs. Comput. Supported Coop. Work (CSCW) **2005**, 14, 549–593.
- Song, M.; Van der Aalst, W.M. Towards comprehensive support for organisational mining. Decis. Support Syst. **2008**, 46, 300–317.
- Nakatumba, J.; van der Aalst, W.M. Analysing resource behavior using process mining. In Proceedings of the International Conference on Business Process Management, Vienna, Austria, 1–6 September 2019; pp. 69–80.
- Everett, M.G.; Borgatti, S.P. The centrality of groups and classes. J. Math. Sociol. **1999**, 23, 181–201.
- Zhang, J.; Thomas, L.C. Comparisons of linear regression and survival analysis using single and mixture distributions approaches in modelling LGD. Int. J. Forecast. **2012**, 28, 204–215.
- Carroll, K.J. On the use and utility of the Weibull model in the analysis of survival data. Control. Clin. Trials **2003**, 24, 682–701.
- van Dongen, B.F. BPI Challenge 2012. 4TU.Centre for Research Data. Dataset. Available online: https://data.4tu.nl/articles/BPI_Challenge_2012/12689204 (accessed on 4 May 2020).
- van Dongen, B.F. BPI Challenge 2014. 4TU.Centre for Research Data. Dataset. Available online: https://data.4tu.nl/collections/BPI_Challenge_2014/5065469 (accessed on 4 May 2020).
- van Dongen, B.F. BPI Challenge 2015 Municipality 3. Eindhoven University of Technology. Dataset. Available online: https://data.4tu.nl/articles/dataset/BPI_Challenge_2015_Municipality_3/12718370 (accessed on 4 May 2020).
- van Dongen, B.F. BPI Challenge 2017. Eindhoven University of Technology. Dataset. Available online: https://data.4tu.nl/articles/BPI_Challenge_2017/12696884 (accessed on 4 May 2020).
- van Dongen, B.F. BPI Challenge 2018. Eindhoven University of Technology. Dataset. Available online: https://data.4tu.nl/articles/BPI_Challenge_2018/12688355 (accessed on 4 May 2020).
- van Dongen, B.F. BPI Challenge 2020. 4TU.Centre for Research Data. Dataset. 2020. Available online: https://data.4tu.nl/collections/BPI_Challenge_2020/5065541 (accessed on 26 May 2020).
- Bevacqua, A.; Carnuccio, M.; Folino, F.; Guarascio, M.; Pontieri, L. A Data-Driven Prediction Framework for Analysing and Monitoring Business Process Performances. In Lecture Notes in Business Information Processing, Proceedings of Enterprise Information Systems. ICEIS 2013, Angers, France, 4–7 July 2013; Hammoudi, S., Cordeiro, J., Maciaszek, L., Filipe, J., Eds.; Springer: Cham, Switzerland, 2014; Volume 190.
- Cesario, E.; Folino, F.; Guarascio, M.; Pontieri, L. A Cloud-Based Prediction Framework for Analyzing Business Process Performances; Springer: Cham, Switzerland, 2016.
- Ancona, D.G. Outward bound: Strategies for team survival in an organization. Acad. Manag. J. **1990**, 33, 334–365.
- Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. **2006**, 7, 1–30.
- Linden, A.; Yarnold, P.R. Modeling time-to-event (survival) data using classification tree analysis. J. Eval. Clin. Pract. **2017**, 23, 1299–1308.

**Figure 1.** Contextual Factors and Relationship (Adapted from [1]).

**Figure 3.** (**a**) BPIC 14 Group Betweenness Centrality Graph; (**b**) BPIC 14 Group Closeness Centrality Graph; (**c**) BPIC 14 Group Degree Centrality Graph; (**d**) BPIC 14 Group Eigenvector Centrality Graph.

| | BPIC 18 | BPIC 17 | BPIC 15(3) | BPIC 14 | BPIC 12 |
|---|---|---|---|---|---|
| Number of events | 267,830 | 233,928 | 59,681 | 277,577 | 262,200 |
| Number of cases | 3285 | 9453 | 1409 | 13,985 | 13,087 |
| Number of traces | 3277 | 5211 | 1350 | 13,942 | 4366 |
| Number of distinct activities | 141 | 26 | 277 | 39 | 24 |
| Mean trace length | 81.53 | 24.75 | 42.36 | 19.85 | 20.04 |
| Mean throughput time (days) | 580.63 | 24.11 | 62.23 | 12.93 | 8.62 |
| Throughput time SD (days) | 580.62 | 14.893 | 97.64 | 27.94 | 12.13 |
| Domain | Public Admin | Financial services | Public Admin | Financial services | Financial services |

| | GB | GC | GE | GD |
|---|---|---|---|---|
| BPIC 18 | −0.058 | −0.014 | 0.248 | 0.107 |
| BPIC 17 | 0.063 | 0.176 | 0.063 | 0.093 |
| BPIC 15(3) | 0.289 | 0.414 | −0.208 | −0.045 |
| BPIC 14 | 0.078 | 0.119 | −0.183 | −0.171 |
| BPIC 12 | 0.877 | 0.848 | −0.475 | −0.003 |

| | Survival | MLP | GBM | Cloud-Based | Context-Aware |
|---|---|---|---|---|---|
| BPIC 18 | 166.27 ± 46.6 | 187.37 ± 190.62 | 76.09 ± 84.18 | 217.27 ± 162.30 | 212.34 ± 124.37 |
| BPIC 17 | 11.158 ± 2.03 | 12.11 ± 12.67 | 10.83 ± 9.54 | 12.18 ± 11.53 | 12.77 ± 11.23 |
| BPIC 15(3) | 23.91 ± 6.12 | 27.88 ± 40.91 | 29.07 ± 33.63 | 42.26 ± 52.27 | 57.12 ± 59.31 |
| BPIC 14 | 19.19 ± 11.6 | 20.78 ± 41.39 | 23.79 ± 36.37 | 26.03 ± 27.99 | 25.53 ± 31.84 |
| BPIC 12 | 5.83 ± 1.95 | 8.24 ± 8.72 | 5.59 ± 5.27 | 9.55 ± 8.49 | 9.86 ± 9.12 |

| | Survival | MLP | GBM | Cloud-Based |
|---|---|---|---|---|
| MLP | 0.04173 | | | |
| GBM | 0.75592 | 0.07598 | | |
| Cloud-based | 0.00042 | 0.04173 | 0.00082 | |
| Context-aware | 0.00082 | 0.07598 | 0.00159 | 0.75592 |

| % of Complete Traces in Event Log | 0% | 20% | 40% | 60% | 80% |
|---|---|---|---|---|---|
| BPIC 12 | $1.2 \times 10^{-7}$ | 0.0003 | 0.1158 | 0.0737 | 0.0909 |
| BPIC 18 | $< 2 \times 10^{-16}$ | $< 2 \times 10^{-16}$ | $< 2 \times 10^{-16}$ | $2.5 \times 10^{-16}$ | $6.0 \times 10^{-12}$ |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ogunbiyi, N.; Basukoski, A.; Chaussalet, T.
Investigating Social Contextual Factors in Remaining-Time Predictive Process Monitoring—A Survival Analysis Approach. *Algorithms* **2020**, *13*, 267.
https://doi.org/10.3390/a13110267
