A Deep Learning Approach for Predictive Healthcare Process Monitoring

Abstract: In this paper, we propose a deep learning-based approach to predict the next event in hospital organizational process models following the guidance of predictive process mining. This method provides value for the planning and allocating of resources since each trace linked to a case shows the consecutive execution of events in a healthcare process. The predictive model is based on a long short-term memory (LSTM) neural network that achieves high accuracy in the training and testing stages. In addition, a framework to implement the LSTM neural network is proposed, comprising stages from the preprocessing of the raw data to selecting the best LSTM model. The effectiveness of the prediction method is evaluated through four real-life event logs that contain historical information on the execution of the processes of patient transfer orders between hospitals, sepsis care cases, billing of medical services, and patient care management. In the test stage, the LSTM model reached values of 0.98, 0.91, 0.85, and 0.81 in the accuracy metric, and in the evaluation of the prediction of the next event using the 10-fold cross-validation technique, values of 0.94, 0.88, 0.84, and 0.81 were obtained for the four previously mentioned event logs. In addition, the performance of the LSTM prediction model was evaluated with the precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve (AUC) metrics, obtaining high scores very close to 1. The experimental results suggest that the proposed method achieves acceptable measures in predicting the next event regardless of whether an input event or a set of input events is used.


Introduction
Healthcare services continuously adopt technological advancements to overcome growing challenges regarding quality of care, cost reduction, efficiency, sustainability, and transforming data into knowledge. In the medical field, significant attention has been devoted to the methods and effectiveness of e-healthcare to improve the quality of care and service recipients' well-being (particularly patients) [1]. The technological field tends to generate solutions for e-healthcare and its impacts on the medical and healthcare system [2]. Therefore, the main goal set forth by both governments and organizations is to increase the effectiveness, accuracy, and service quality of healthcare for citizens (users) [3] through the development, management, and monitoring of schemes based on information technologies (IT).
In this context, patient information exchange between healthcare service providers is essential in hospital information systems (HIS). Healthcare information exchange (HIE)-based approaches allow patients' medical information to be shared electronically, improving the speed, quality, safety, and cost of patient care [4,5]. These approaches can be incorporated into electronic health record (EHR) systems. In these systems, it is recommended to use data management standards, such as Health Level Seven (HL7) [6], Clinical Document Architecture (CDA) [7], and Fast Healthcare Interoperability Resources (FHIR) [8], both in clinical document design and in the exchange of clinical data between healthcare service providers. Examples of such exchanges are the direct exchange of information (documents), laboratory orders and results, patient referrals, and discharge summaries.
The enhancement of business processes is a growing concern and a challenge to advancing efficiency and service quality in health institutions worldwide [9]. For this reason, business process management (BPM) approaches have been applied in different areas of healthcare institutions to reduce patient waiting times, reduce service costs, maximize patient care, balance resource utilization, improve the quality of service, and minimize risk (i.e., reduce patient morbidity and mortality) [10,11]. A business process in the healthcare domain (healthcare process) is understood as the means institutions use to deliver a service to their users (patients) [12]. It is defined by a set of elements, such as the sequence of activities and events, decision points, and interactions between participants (people, organizations, software systems, equipment), and its execution leads to one or several outcomes (business goals) [12] in order to deliver healthcare to patients.
Healthcare processes can be classified into clinical and organizational processes [10,13]. The clinical processes are related to the patients and implemented through a previously defined diagnostic-therapeutic cycle [13]. Organizational processes are administrative processes that can follow behaviors that support medical treatment processes to coordinate medical treatment between users, providers, and organizational units [14]. Patient transfer orders between hospitals, medical patient management, and exam requests are examples of organizational processes. Healthcare processes are usually considered patient-centered, complex, constantly developing and updating, adaptable, and multidisciplinary [15,16]. Consequently, extracting knowledge and acquiring insight into the dynamics of these processes can be expensive and complex tasks. The management of healthcare processes is very complex due to their variability, dynamism, fast-changing ad hoc nature, and increasingly multidisciplinary scope [17].
Executing healthcare processes through a process-oriented HIS registers a large number of events. From these events, knowledge can be extracted to verify and validate the flow performed in the execution of the healthcare process [11,14]. An event log contains a history of the actions and activities that occurred during the execution of a business process. This log can be analyzed using an approach based on process mining techniques [18]. Process mining can discover, verify, and improve business processes by analyzing the event logs generated by process-oriented information systems [19]. Therefore, institutions can discover what behaviors are followed in the execution of a process, verify that business rules are applied, and identify possible bottlenecks in the business process flow [20]. Likewise, they can determine the distribution of resources, the time consumed to execute an activity, and aspects related to the exchange of business documents.
Process mining through predictive methods exploits events contained in the traces to predict events of the present and future, allowing us to obtain a global and comprehensive point of view of the behavior of a process [21,22]. Predictive monitoring approaches incorporated into business process management systems can generate intelligent systems capable of predicting the behavior of processes in operation [23]. In the health domain, the prediction is probably more important than the explanation due to the high cost that a delay in diagnosis and treatment can cause [24]. The capability to predict the healthcare process's future behavior is a significant challenge in process management and process mining. Namely, monitoring healthcare process instances and predicting their behavior can enable medical supervisors or hospital administrators to act proactively in anticipation of an event [25].
The predictive analysis applied to health data is emerging as a tool to support more proactive and preventive treatment options [24] and improve healthcare processes. Compared to other machine learning methods, deep learning techniques have shown their ability to improve the discriminative function of the selected inference model. Well-known approaches based on deep learning architectures have been proposed to address complex problems, for example, remote sensing segmentation of dense buildings after an earthquake in urban areas using an improved Swin transformer [26] or a model to evaluate the structural health of long-span bridges, establishing a relationship between the vertical deflection of the beam and the cable tension in cable-stayed bridges [27]. Recurrent neural networks (RNNs) are a deep learning technique considered one of the most efficient time series prediction methods. An RNN allows the use of a sequence of input data with cyclic connections between blocks, where neurons are interconnected in the same hidden layer. Then, a training function is repeatedly applied to the hidden states [28]. Long short-term memory (LSTM) is based on an RNN with the capability to solve memory and forgetting problems by adding multi-threshold gates. This neural network has been successfully applied to many sequence and time series prediction problems. Hence, a model based on LSTM to predict the next event from a set of cases or traces can be considered an essential strategy in the supervised process mining environment.
Motivated by these requirements, we propose a deep learning-based approach to predict the next event in a healthcare process. The predictive model is based on the LSTM neural network cell structure. A framework is proposed to manage the preprocessing and categorization of the neural network input data and the design, training, and selection of the LSTM network model. The LSTM model can predict the next event from an input event or a set of input events. The validation performed on the LSTM model shows that it can predict the next event of a new process model instance, with the highest precision metric of 0.98, confirming the feasibility and usefulness of this approach. Our experiment addresses the challenges of data analytics in healthcare and public health. The performance of the LSTM predictive model was evaluated using four data sets (event logs) from different business processes running within the daily operations of a hospital. The model reaches values of 0.95, 0.90, 0.85, and 0.82 in the F1-score metric for the event logs of the patient transfer between hospitals, sepsis care cases, billing of medical services, and health care management of patients, respectively. In evaluating the prediction model's performance through the area under the receiver operating characteristic (ROC) curve (AUC), average values between 0.953 and 1.00 for micro-average ROC-AUC and between 0.899 and 0.958 for macro-average ROC-AUC are obtained. The experimental results based on key metrics suggest that the proposed method achieves the highest measures on a healthcare dataset and can be applied when models have high complexity in their behavior and extension.

Materials and Methods
The framework for predicting the next event in a healthcare process model comprises data pre-processing, binarization, and model training phases, as shown in Figure 1. Most state-of-the-art investigations of prediction problems using neural networks do not describe an implementation method and usually perform a manual implementation. For this reason, a framework is required to support the development of a technological solution based on an LSTM neural network with the ability to predict future events.

Data Pre-Processing
The pre-processing of an event log consists of analyzing the raw data to identify its attributes, extracting the traces with their events, and applying separation criteria to the selected data. This phase is composed of the following stages:

Data Extraction
The data extraction stage consists of diverse tasks that are executed sequentially. First, the attributes contained in an event log are identified. The eXtensible Event Stream (XES) standard has emerged as the principal storage format for event logs [29]. This standard defines a structure to manage and manipulate logs containing traces and events and their attributes. The XES standard format is based on the XML language, formed by a hierarchical structure. The root node corresponds to the event log, and each child (intermediate node) corresponds to the traces contained in an event log. Each intermediate node can have several children representing each event within a trace. Each event node has several children, representing the attributes belonging to the events, and each attribute has a name and a value.
The first task's output allows the selection of the required attributes to predict the next event in a healthcare process model. In the Results section experiments, the selected attribute in each event contained in the event log (XES file) was workflowmodelelement:name. This attribute stores as a value the name of the activity executed in the healthcare process and recorded in the event log. Next, a normalization task is applied to the data contained in the dataset's attribute. Afterward, the corrupt or inaccurate instances are detected and removed from the event log.
Then, a trace identification task is applied: searching, recognizing, selecting, and recovering the traces with their events and attributes. This task is performed by sweeping the event log's hierarchical structure content. The attribute value of each of the events contained in a trace is collected when the condition attribute_event = b is fulfilled, where b corresponds to the attribute selected in the previous task (in this case, the workflowmodelelement:name attribute), as shown in line 4 of Algorithm 1. Each recovered trace is stored in a text file (see lines 11-12 of Algorithm 1), respecting the order in which the trace is extracted.

Algorithm 1 (excerpt)
    ...
    e ← remove_punctuation(attribute_event)
    end for
10: add event_list to trace_list
11: end for
12: write trace_list to file
13: return text file
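The trace identification task can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample log, and the default attribute key `concept:name` (the usual XES activity-name key) are assumptions, and writing the result to a text file is omitted.

```python
import string
import xml.etree.ElementTree as ET

# Translation table that deletes ASCII punctuation from an event name.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def extract_traces(xes_text, key="concept:name"):
    """Return one list of cleaned event names per trace, in file order.

    `key` is the attribute selected for prediction (an assumed default).
    """
    root = ET.fromstring(xes_text)
    trace_list = []
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue
        event_list = []
        for event in trace:
            if local_name(event.tag) != "event":
                continue  # skips trace-level attributes such as the case id
            for attribute in event:
                # Keep only the attribute whose key matches the selection.
                if attribute.get("key") == key:
                    value = attribute.get("value", "")
                    event_list.append(value.translate(_STRIP_PUNCT).strip())
        trace_list.append(event_list)
    return trace_list


# Hypothetical two-trace log used only to exercise the function.
SAMPLE_XES = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="START"/></event>
    <event><string key="concept:name" value="GPTO."/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="START"/></event>
  </trace>
</log>"""

traces = extract_traces(SAMPLE_XES)
```

The nested loops mirror the log, trace, event, attribute hierarchy of an XES file, so the order of traces and events in the file is preserved in the output.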

Segmentation
The segmentation stage provides a basic structure to the traces that can later be used by downstream phases, enabling the prediction of the next event. This stage consists of dividing traces into segments, each segment being topically coherent, and cutoff points indicate a change in an event. A cutoff point separates each event included in a trace. Then, a vocabulary with unique acronyms is created from the dataset generated in the previous stage, transforming these selected acronyms into single integers. Subsequently, the events are transformed into single integers using the previously created vocabulary as a reference. Each event in the trace is transformed into a single integer, constructing the trace as a sequence of integers.
Next, a list of valid n-tuples is constructed for each sequence of integers. Given the ordered sequence of integers (events) of a trace, the number of valid tuples is the number of possible ordered n-tuples that satisfy the following rules: (1) the ordering of the events cannot change, (2) the position of an event cannot be modified, and (3) the first event and the last event of a trace cannot be grouped. For example, given the trace t1 = {1, 3, 5, 9, 2}, the possible tuples are {1, 3}, {3, 5}, {5, 9}, {9, 2}, {1 3, 5}, {3 5, 9}, {5 9, 2}, {1 3 5, 9}, {3 5 9, 2}, and {1 3 5 9, 2}. Then, the generated tuple list must be divided into two sub-datasets, which will be used in the neural network training and testing phases. This task requires defining the percentage of the tuple list allocated to the training sub-dataset; the remaining records are automatically allocated to the test sub-dataset. The instances of the training sub-dataset are automatically selected by a random method.
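The segmentation steps can be sketched as follows. This is an illustration under stated assumptions rather than the framework's code: the tuple rule is read as "every contiguous window of at least two events", which reproduces the ten tuples listed for t1; integer 0 is kept free for padding; and all function names and the seed are hypothetical.

```python
import random


def build_vocabulary(traces):
    """Map each unique event name to an integer, starting at 1 (0 = padding)."""
    vocab = {}
    for trace in traces:
        for event in trace:
            if event not in vocab:
                vocab[event] = len(vocab) + 1
    return vocab


def contiguous_tuples(trace, min_len=2):
    """List every contiguous window of the trace with at least min_len events.

    The last element of each window is the event to predict; the elements
    before it are the input events.
    """
    tuples = []
    for length in range(min_len, len(trace) + 1):
        for start in range(len(trace) - length + 1):
            tuples.append(tuple(trace[start:start + length]))
    return tuples


def random_split(items, train_fraction=0.8, seed=0):
    """Randomly assign a fraction of the tuples to training, the rest to test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


vocab = build_vocabulary([["START", "GPTO", "START"]])
t1 = [1, 3, 5, 9, 2]
tuples = contiguous_tuples(t1)
train, test = random_split(tuples, train_fraction=0.8)
```

For a trace of five events the sketch yields 4 + 3 + 2 + 1 = 10 tuples, in the same order as the example in the text.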
The next actions are performed in parallel for the two sub-datasets. (1) The input (X) and output (y) activity lists are created for each sub-dataset. (2) The last integer is extracted from each tuple and used to populate the output activity list (y). (3) The antecedent event or set of antecedent events to the last event of each tuple is extracted and added to the input activity list (X). Continuing with the previous example, for the tuple {1, 3}, the first element {1} would be stored in the input activity list (X), and the second element {3} would be recorded in the output activity list (y), and so on for all tuples collected from each event log trace. Figure 2 graphically shows an example of the segmentation task implemented in the framework. (4) The list of input activities (X) is converted into a two-dimensional matrix constructed from the total number of integer sequences and the maximum length of the sequences; that is, the matrix of input activities (X) has a dimension of m × n, where m = total number of tuples and n = maximum length of the sequence. When the length of a tuple is less than 3, a padding method inserts zero values before the integers in the sequence so that the length of the tuple equals 3. (5) Finally, the list of output activities (y) becomes a matrix of dimension m × 1, where m = total number of tuples and n = 1 because there is a single output activity (class) per tuple.
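Steps (1) to (5) can be sketched in plain Python. This is a minimal illustration, not the framework's code; `to_inputs_and_targets` is a hypothetical helper, and taking the padded width from the longest input sequence is an assumption.

```python
def to_inputs_and_targets(tuples, max_len=None):
    """Split tuples into a left-padded input matrix X and a target column y.

    The last integer of each tuple is the target; the preceding integers are
    the input sequence, padded with leading zeros up to max_len.
    """
    prefixes = [list(t[:-1]) for t in tuples]
    targets = [t[-1] for t in tuples]
    if max_len is None:
        max_len = max(len(p) for p in prefixes)
    X = []
    for prefix in prefixes:
        if len(prefix) > max_len:
            raise ValueError("input sequence longer than max_len")
        # Zeros go before the integers so the latest events stay right-aligned.
        X.append([0] * (max_len - len(prefix)) + prefix)
    y = [[target] for target in targets]  # m x 1 column of class labels
    return X, y


example = [(1, 3), (1, 3, 5), (1, 3, 5, 9)]
X, y = to_inputs_and_targets(example)
```

Left padding keeps the most recent events in the last positions of every row, which is where a recurrent layer reads them just before producing its output.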

Binarization
The binarization phase assigns a binary value to each integer in the output activities (y) vector. This phase is performed for the two output activity lists (y) generated previously, corresponding to the training and test sub-datasets. The binarization reduces the complexity of the task, which is also advisable for detecting connected components, i.e., event sequences or next-event prediction. This task is performed by applying a one-hot encoding scheme, which specifies that the quantity of classes equals the dictionary size. The one-hot encoding technique helps convert categorical features into binary format, that is, categorical integer features to a binary variable (a binary variable is added for each unique integer). Figure 3 extends the example of Figure 2 to explain the binarization phase. Figure 3 shows that a binary representation is created from the unique integers. A vector of the presence/absence of integers illustrates this representation. The number of columns will equal the number of classes in the event log, and the number of rows will equal the number of instances used in the training stage. Hence, a binary representation consists of a sequence of 0/1 vectors, possibly long. The presence of an integer is marked with a value of 1 in the class column corresponding to the value of the integer. All the other columns of the vector will be marked with a value of 0. The above procedure is performed for each instance in the output vector.
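A minimal sketch of the one-hot encoding step is given below. The helper name and the convention that columns follow the sorted list of unique integers are assumptions made for illustration; a library encoder may order or size the columns differently.

```python
def one_hot(labels, classes=None):
    """Encode integer labels as 0/1 rows with one column per class.

    If `classes` is not given, the sorted unique labels define the columns.
    """
    if classes is None:
        classes = sorted(set(labels))
    column_of = {cls: idx for idx, cls in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[column_of[label]] = 1  # mark the presence of this class
        rows.append(row)
    return rows


y = [3, 5, 9, 3]
Y = one_hot(y)
```

Each row contains exactly one 1, so the encoded matrix has as many rows as instances and as many columns as classes.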

Model Training
The model training phase obtains a trained model based on an LSTM network that can predict the next event of a healthcare process model. Network design, network training, model selection, and model inference stages form this phase.

Network Design
In the neural network design, the input layer is created using the word embedding technique. The hidden layer containing the LSTM units is then generated and interconnected through the input, output, and forget gates. Finally, only one neuron is available as a unique output value corresponding to the prediction in the output layer. Figure 4 shows the design of the neural network, where $x_t$ is the input vector and $h_t$ is the output of the memory cell at time t. Moreover, $C_t$ is the value of the memory cell. At time t, $i_t$ are the values of the input gate, $f_t$ are the values of the forget gate, and $o_t$ are the values of the output gate. Finally, $\tilde{C}_t$ are the values of the candidate state of the memory cell at time t.
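For reference, the gate quantities named above are related in the standard LSTM formulation as follows. The weight matrices $W$, biases $b$, the logistic sigmoid $\sigma$, and the element-wise product $\odot$ are notation assumed here; the source does not state the exact variant used.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Here $C_t$ denotes the memory cell value and $\tilde{C}_t$ its candidate state: the forget gate scales what is kept from $C_{t-1}$, and the input gate scales how much of the candidate is written.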

Network Training
The LSTM uses hyperbolic tangent (tanh) and sigmoid activation functions in different flows for the input and output gates, and a logistic sigmoid activation function for the forget gate, as shown in Figure 4. The two-dimensional matrix of the input activities (X) and the vector of the output activities (y), represented by a one-hot encoder and generated in the framework's previous stages, are used. We performed hyper-parameter optimization using a grid search algorithm to determine the optimal parameters for the LSTM networks. This algorithm is an exhaustive search that evaluates every combination of the hyper-parameter values. The hyper-parameters introduced to the search algorithm were the following: "batches = [32, 64]", "dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]", "epochs = [50, 100]", "units = [10, 50, 100]", "optimizers = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']", "activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid']". The performance of the grid search algorithm is measured using 3-fold cross-validation on the evaluation set. The setting that achieved the highest accuracy in the validation process after evaluating all possible combinations of hyper-parameters is the following: epochs = 100, optimizer = Adam, batch size = 32, LSTM units = 50, and activation = softmax. The cross-entropy loss was predefined in the neural network parameters due to the data types used in the input vectors.
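To give a sense of the size of this search, the following sketch enumerates the grid with the Python standard library. The dictionary keys and helper names are hypothetical, and no model is trained here; a real search would fit and score one network per configuration and fold.

```python
from itertools import product

# Hyper-parameter values as listed in the text; key names are illustrative.
GRID = {
    "batch_size": [32, 64],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "epochs": [50, 100],
    "units": [10, 50, 100],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta",
                  "Adam", "Adamax", "Nadam"],
    "activation": ["softmax", "softplus", "softsign",
                   "relu", "tanh", "sigmoid"],
}


def iter_configs(grid):
    """Yield every combination of hyper-parameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def select_best(configs, score_fn):
    """Return the configuration with the highest score.

    `score_fn` is a caller-supplied function, e.g. mean cross-validated
    accuracy of a model trained with that configuration.
    """
    return max(configs, key=score_fn)


configs = list(iter_configs(GRID))
n_configs = len(configs)   # 2 * 6 * 2 * 3 * 7 * 6 combinations
n_fits = n_configs * 3     # one training run per fold with 3-fold CV
```

With the listed values the grid contains 3024 configurations, so 3-fold cross-validation requires 9072 training runs, which is why exhaustive search becomes costly as more values are added.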

Model Selection
The LSTM network model is selected based on the accuracy measure and the loss function achieved in the training stage. The model that achieves the highest accuracy and the lowest loss in the training phase is used as the LSTM network model to predict the next event. If an acceptable metric is not obtained during network training, we must return to the previous stage and adjust the training parameters.

Inference Model
The LSTM inference or predictive model is the model that has acquired knowledge or learned adequately in the last phase of the framework. Given an input event or a sequence of input events, the LSTM inference model predicts the next event for a new instance of a healthcare process, as explained in the following section.

Results
The business processes that derived the event logs were generated in different hospitals or, in the case of the healthcare collaboration process, simulated the relationship established in the clinic for the care of a sick patient. The four healthcare processes used in our approach coexist in the operation of a hospital, starting with patient care through the healthcare collaboration process, from which the sepsis process or the patient transfer process to a second or third-level hospital can be derived. In other cases, the patient transfer process can occur after sepsis. For its part, the hospital billing process can be performed after each previous process. An event log typically contains attributes related to the identifier and name of the executed process, the name of the activities executed, resources responsible for the execution of the activity (human, system, or equipment), date and time of the execution of the activity, and the events that triggered the activity. A case contains a sequence of traces representing the behavior identified in an instance of a process.
In implementing the neural network model, a training sub-dataset with 80% of the original dataset (event log) observations was used to train our model (in each of the experiments described below). These instances were selected automatically on a random basis. The remaining 20% of the original dataset instances were used to validate the effectiveness of the inference model. Furthermore, the experimentation was implemented on the Jupyter Notebook tool, using the Python Keras Neural Network library with the TensorFlow backend [30].

Experiment 1
The event log contains historical information (from 2012 to 2018) about patient transfer orders and medical referral management processes. The event log basis is a project for collaborating healthcare services between a primary care provider (PCP) and a specialist care provider (SCP) located in northern Mexico, presented in [31]. The procedure of a patient referral assumes negotiating a patient transfer order and managing the medical referral. The social work department of each hospital operates the patient transfer order process. A social worker executes a patient transfer upon request from a specialist doctor. The social work department coordinates all activities related to patient transfer, including equipment, care facilities, personnel, and transportation. The patients are referred from a PCP to an SCP when requiring medical care for cancer, heart, vascular, and neurological diseases. Both hospitals have defined a set of inter-organizational business processes at design time executed by the HIS of each hospital in a coordinated way while respecting the autonomy of the institutions involved in the collaboration.
The dataset was extracted from the test environment of the PCP information system by information technology personnel authorized by the hospital. This environment contains the real behavior of the healthcare processes executed between hospitals with n instances of all possible behaviors defined in each healthcare process. The data was extracted automatically, assigning a Case-ID for each instance of the logged process, generating an XML file with the required structure in the XES schema. It is essential to mention that this data set does not include sensitive data about the patient's identity, disease, or diagnosis, the doctors' or specialists' identification, or the hospital's administrative staff.
In this experimentation, the event log exclusively contains instances generated by the patient transfer order process's execution through the HIS of the PCP (see Table 1), composed of 2500 traces, 25 unique activities, and 76,948 events. The event log contains traces from a patient transfer order process (including medical referral management), with attributes of Case-ID, activity name, resource, and timestamp.
Table 2 shows an excerpt of the results obtained. The "input activity" column represents the activity or set of activities introduced at the input gate of the LSTM network. The "target event" column contains the events that can be predicted according to the behavior pattern identified in the event log traces. A target event value is an event with the highest prediction probability, estimated according to the weight of the event tag value. The prediction of the event generated by the LSTM method is presented in the "output event" column. In instances 1 and 4, the LSTM model has correctly predicted the next event according to the expected event in the process model; e.g., when the new input activity is START, the output event is GPTO (see Table 2). In instances 2 and 3, the neural network has not correctly predicted the next event in the process model. In instance 3, the target event has two prediction possibilities (APPR | RPPR), but the model did not predict the next event correctly. In the "Target Event" column of Table 2, instances 3 and 5 have two expected events (APPR | RPPR, CPR | IRLC) because these activities are potentially found in the output paths at a decision point or path division gateway of the process flow. Also, in instances 5-7 (see Table 2), a sequence of two input activities is shown in the "Input Activity" column; i.e., the LSTM network receives a sequence of input activities to predict the next event. Additionally, in instances 8-10, three input activities are entered into the LSTM model, which correctly predicts the next event in all cases.

Table 3 shows the accuracy and precision metrics achieved by the LSTM neural network model in the validation stage. The precision measure indicates the proportion of positive predictions that are correct. The LSTM model can predict the next event with 0.98 precision. This model achieves the highest precision, confirming the ability to predict the next event in the patient referral healthcare process. Similarly, the accuracy measure achieved by the LSTM model is highly acceptable (0.94), supporting the precision measure obtained. The value obtained by the inference model for the recall metric was 0.94 (see Table 3). The recall metric indicates how much the model captures the behavior present in the event log. The recall is close to the precision value, which is desirable when training a model. This means the LSTM model detects most of the positive examples, providing higher reliability. Similarly, high reliability is observed when calculating the F1-score, with a value of 0.95 (see Table 3). Furthermore, our experiment was evaluated using 10-fold cross-validation, achieving an accuracy of 94.62% (±0.09%), which demonstrates that the model can predict the next event correctly, regardless of the partitioning of the dataset used for training and validation (see Table 3). This confirms that the selected LSTM model parameters do not operate only on a particular dataset partition; on the contrary, they function correctly for data not seen within a dataset partition.

Moreover, we report the receiver operating characteristic (ROC) curves computed from the output probabilities provided by the LSTM inference model. The area under the ROC curve (AUC) represents the degree of class separability that the model achieves. This measure quantifies the ability of a model to distinguish between classes. The higher the AUC value is, the greater the model's ability to distinguish one class from another. This metric is generally used with binary classifiers; to be used with multi-class classifiers, it is necessary to binarize the output. This condition is satisfied by using one-hot encoding. A ROC curve can be plotted for each class, taking a one-vs-all approach for each class. The scenario used in our experimentation consists of 25 classes.
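The partitioning behind k-fold cross-validation can be sketched as follows. This is a generic illustration, not the authors' evaluation code: `k_fold_indices` is a hypothetical helper, and shuffling the instances before splitting is left out for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, validation_indices) pairs.

    Every sample appears in exactly one validation fold; fold sizes differ
    by at most one when n_samples is not a multiple of k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, validation))
        start = stop
    return folds


folds = k_fold_indices(n_samples=25, k=10)
fold_sizes = [len(validation) for _, validation in folds]
```

The model is trained once per fold on the training indices and scored on the held-out fold; the reported accuracy is the mean over the k scores, with their spread as the ± term.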
ROC-AUC quantifies the continuous relation between true and false positives, since the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), as defined in Equations (1) and (2). TPR is also termed sensitivity or recall. A good classification model should have an AUC value close to 1.

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (1)$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (2)$$

where TP represents the number of instances of the positive set identified as positive, TN is the number of instances of the negative set identified as negative, FP refers to the number of instances of the negative set identified as positive, and FN represents the number of instances of the positive set identified as negative.
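As an illustration of how these counts are obtained for a multi-class model, the sketch below treats one class as positive and all others as negative (one-vs-all). The function names and the toy labels are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN when `positive` is the class of interest."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def tpr_fpr(tp, fp, tn, fn):
    """True positive rate and false positive rate; 0.0 if undefined."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels for three classes, used only to exercise the functions.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
counts_class_1 = one_vs_rest_counts(y_true, y_pred, positive=1)
rates_class_1 = tpr_fpr(*counts_class_1)
```

A ROC curve for a class is traced by repeating this computation while the decision threshold on that class's predicted probability is varied.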
Figure 5 shows an analysis of the area under the ROC curves using the TPR on the Y-axis (Equation (1)) and the FPR on the X-axis (Equation (2)). Figure 5 also presents the micro-average and the macro-average ROC-AUC values. The former aggregates the contributions of all classes before computing a single average. The latter computes the metric independently for each class and then averages over the classes (treating each class equally). Figure 5 displays the five worst ROC curves according to the AUC values obtained with the one-vs-all approach (illustrated by continuous lines). The ROC-AUC values for these five classes are class_2 = 0.9831, class_9 = 0.9411, class_13 = 0.9960, class_14 = 0.9960, and class_19 = 0.9870. Most of the remaining classes of the experiment obtained a ROC-AUC value between 0.9998 and 1.000. In addition, a magenta dotted line represents the micro-average ROC-AUC, and a blue dotted line represents the macro-average ROC-AUC, calculated for all the classes (Figure 5). The micro-average and macro-average ROC-AUC of the LSTM method are 1.000 and 0.958, respectively.
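The difference between the two averages can be made concrete with a small sketch. The AUC is computed here through its rank interpretation (the probability that a positive instance scores higher than a negative one, counting ties as one half), which is a choice made for this example rather than the authors' procedure; the labels and scores are toy values.

```python
def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_micro_auc(Y, P):
    """Y: one-hot targets, P: predicted probabilities (rows = instances)."""
    n_classes = len(Y[0])
    # Macro: one AUC per class (one-vs-all), then an unweighted mean.
    per_class = [auc([row[c] for row in Y], [row[c] for row in P])
                 for c in range(n_classes)]
    macro = sum(per_class) / n_classes
    # Micro: pool every (label, score) cell into one binary problem.
    micro = auc([v for row in Y for v in row], [v for row in P for v in row])
    return per_class, macro, micro


Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
P = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6], [0.4, 0.5, 0.1]]
per_class, macro, micro = macro_micro_auc(Y, P)
```

Because the micro-average pools all cells, frequent classes dominate it, whereas the macro-average gives a rarely occurring class the same weight as a frequent one.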
In addition, an equal error rate (EER) was checked for all of the given classes to evaluate the results. EER is defined as the error rate at the point on the ROC curve where the FPR (Equation (2)) is equal to the false negative rate (FNR), which is defined by Equation (3).

$$\mathrm{FNR} = \frac{FN}{FN + TP} \qquad (3)$$

EER gives a good overview of a classifier's strength in deep learning approaches as it provides a comparable and reproducible compromise between acceptance and rejection rates. EER can therefore serve as a quantitative measure of classifier quality. An EER equal to 0.00% corresponds to the inference model's error-free work, meaning correct classification at that point on the ROC curve. In this case, the value furthest from 0 within the experiment is an EER of 11.78% achieved in class_9, consistent with this class obtaining the lowest value in the ROC-AUC metric. The other four worst EER values reached are 3.38%, 1.65%, 1.43%, and 2.65% for the classes class_2, class_13, class_14, and class_19, respectively. The EER value for the remaining 20 classes is between 0.00% and 0.66%. The results listed show that the LSTM method attains a significantly low EER in most classes.
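One simple way to estimate the EER for a single class is to sweep the decision threshold and keep the point where FPR and FNR are closest. The sketch below follows that idea; it is an assumption about the procedure (the source does not describe how the EER was computed), and the labels and scores are toy values.

```python
def equal_error_rate(labels, scores):
    """Estimate the EER for binary labels (1 = positive) and real scores.

    An instance is predicted positive when its score >= threshold. The EER
    is taken as the mean of FPR and FNR at the threshold where they are
    closest.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    best_gap, best_eer = None, None
    # float("inf") adds the operating point where nothing is accepted.
    for threshold in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
        fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
        fpr = fp / n_neg
        fnr = fn / n_pos
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
eer = equal_error_rate(labels, scores)
```

In this toy case one negative is scored above one positive, so at the balancing threshold one of three negatives is accepted and one of three positives is rejected, giving an EER of one third.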
Finally, the LSTM was trained with an optimization procedure that requires a loss function to calculate the model error, which summarizes the model's performance in a single indicator. In the training stage, the LSTM model obtains an average loss of 0.1160 and an accuracy of 0.9465 (Figure 6). In the validation stage, the inference LSTM model reaches an average loss of 0.1172 and a prediction accuracy of 0.9466 (Figure 7).
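To illustrate how an average loss and an accuracy value of this kind are obtained, the sketch below assumes categorical cross-entropy over one-hot targets, a common choice for next-event classification. This passage does not name the loss function, so this choice, the function names, and the example vectors are assumptions.

```python
import math
from typing import Sequence


def categorical_cross_entropy(
    targets: Sequence[Sequence[int]],
    predictions: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Average cross-entropy between one-hot targets and predicted probabilities."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and aligned")
    total = 0.0
    for target, probs in zip(targets, predictions):
        # Only the probability assigned to the true class contributes.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(targets)


def accuracy(
    targets: Sequence[Sequence[int]], predictions: Sequence[Sequence[float]]
) -> float:
    """Fraction of samples whose most probable class is the true class."""
    hits = 0
    for target, probs in zip(targets, predictions):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if target[predicted] == 1:
            hits += 1
    return hits / len(targets)
```

A low loss with a high accuracy, as reported for both stages, indicates that the model is correct and also assigns a high probability to the correct next event.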

Experiment 2
A real-life event log containing sepsis case events from a hospital is used in this experiment. Sepsis is a life-threatening condition typically caused by a bacterial infection primarily affecting the stomach, lungs, kidneys, or bladder. The events were recorded by an enterprise resource planning (ERP) system of a hospital in the Netherlands and provided by the Eindhoven University of Technology [32]. The event log contains 1050 cases with a total of 15,214 events that were logged for 16 unique activities. The data were captured from 7 November 2013 to 5 June 2015. A case in the event log represents the patient's trajectory in their medical care, recording each activity related to their care during their hospital stay.
The LSTM model predicts the next event within the identified behavior in the sepsis care business process with a precision of 0.92 and an accuracy of 0.91 (see Table 4). Furthermore, the model captures the behavior contained in the event log with a recall of 0.90. This event log exhibits highly complex behavior and has been used in many experiments applying different data mining and machine learning algorithms. Our inference model correctly identifies the behavior with an F1-score of 0.90 (the harmonic mean of precision and recall), which is close to the maximum value of 1 for this metric and confirms its ability to predict the next event accurately. In the evaluation with the 10-fold cross-validation technique, a very slight decrease in the accuracy metric is observed. However, the inference model maintains high performance, which indicates that changing the instances in the folds does not drastically affect its predictions and that the model's performance is stable. Figure 8 shows the ROC curves, where class 12 obtains the worst ROC-AUC value of 0.779. The rest of the classes obtained better AUC values: classes 2, 10, 14, 15, and 16 reached values between 0.802 and 0.842, and classes 3, 9, and 13 obtained values of 0.895, 0.863, and 0.889, respectively. Classes 6 and 8 reached AUC values of 0.937 and 0.928, and the classes with AUC values closest to 1 in the experiment are 4, 5, 7, and 11, with 0.978, 0.987, 0.965, and 0.987, respectively. In addition, the dotted lines represent the micro-average and the macro-average ROC-AUC, with values of 0.953 and 0.899, respectively.
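The relationship between precision, recall, and F1-score can be sketched in pure Python as follows. The function names and the macro-averaging step are assumptions, since this passage does not state how the per-class values were combined into the figures of Table 4.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def per_class_prf(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Tuple[float, float, float]]:
    """Precision, recall, and F1-score for every class (one-vs-all counts)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, predicted in zip(y_true, y_pred):
        if truth == predicted:
            tp[truth] += 1
        else:
            fp[predicted] += 1  # predicted class receives a false positive
            fn[truth] += 1      # true class receives a false negative
    report = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report


def macro_average(
    report: Dict[Hashable, Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Unweighted mean of the per-class precision, recall, and F1 values."""
    n = len(report)
    return tuple(sum(values[i] for values in report.values()) / n for i in range(3))
```

Note that an averaged F1-score is generally not the harmonic mean of the averaged precision and recall, so the three summary values in a table need not satisfy the harmonic-mean identity exactly.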

Experiment 3
The Hospital Billing event log is derived from the financial modules of an ERP system of a Dutch regional hospital [33]. It contains events related to billing for the medical services provided by the hospital from 2012 to 2016. The activities carried out to bill the medical service are recorded in each trace. All traces are anonymous and do not contain identification or user values for privacy reasons. The event log comprises 100,000 traces, 451,359 events, and 18 unique activities. The event log comprises the concept:name, lifecycle:transition, and time:timestamp attributes.
The inference model obtains acceptable values in all metrics using the hospital billing event log to predict the next event to be executed in the business process. In the precision and accuracy metrics, a value of 0.85 is obtained. In the validation using the 10-fold cross-validation technique, a value of 0.84 is reached (see Table 5), which suggests that the model is stable regardless of which portion of the data set is used to train or test the inference model. Very high scores are obtained in evaluating the performance of the inference model through the ROC curve. In the micro-average ROC curve, 0.994 is obtained. This curve aggregates the contributions of all classes to calculate the average metric, which is representative of multiclass ROC curve analysis. Figure 9 clearly shows class 17 with the lowest AUC score (0.737), represented by the yellow line. At the other extreme are classes 0, 2, 3, 4, 10, and 12, with the highest AUC values of 0.990, 0.996, 0.991, 0.994, 0.999, and 0.993, respectively. Classes 6, 7, 13, and 16 obtain AUC values of 0.872, 0.899, 0.844, and 0.816, respectively. The remaining classes obtain AUC scores between 0.906 and 0.984. We conclude that, for the inference model with the hospital billing event log, there is a probability of more than 90% that the model can distinguish between the positive and negative classes in 13 classes, and a probability of more than 81% in 4 further classes.
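The 10-fold cross-validation procedure referred to above can be outlined with the following pure-Python sketch. The helper names, the fixed random seed, and the `train_and_score` callback are hypothetical; whether the original folds were built over traces or over individual prefixes is not stated here, so the sketch simply partitions sample indices.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and split them into k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding keeps the fold sizes within one sample of each other.
    return [indices[i::k] for i in range(k)]


def cross_validate(
    n_samples: int,
    train_and_score: Callable[[Sequence[int], Sequence[int]], float],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the mean and variance of the score over the k held-out folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / k
    return mean, variance
```

In this sketch, `train_and_score` would fit a fresh model on the training indices and return its accuracy on the held-out fold. A small variance across folds is what supports the stability claim.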

Experiment 4
The healthcare collaboration event log is artificial, derived from a scenario of collaborative business processes presented in [34]. The business process involves the public (messaging) and private activities of the patient, gynecologist, laboratory, and hospital participants. The business process begins when the patient requests medical attention from the gynecologist, reporting the disease. The gynecologist examines the patient and may determine that a blood test, a prescription, or an order for hospitalization is required, together with any additional activities that follow from the gynecologist's decision, for the patient to recover her health. The event log contains message interactions between the gynecologist and the patient, between the gynecologist and both the laboratory and the hospital, and between the hospital and the patient. The event log includes 199 traces and 21 activities.
The inference model learns adequately from the 159 instances used in the training stage, obtaining good results in predicting the next event, considering that it is an event log with few traces. A value of 0.81 is obtained in the accuracy metric, and values of 0.83 and 0.82 in precision and recall, respectively, which yields an F1-score of 0.82 (see Table 6). In evaluating the prediction using 10-fold cross-validation, a value of 0.81 is reached; observing a low variance is very important given the limited amount of data in the event log. This result indicates that the estimate of the model's performance is not very sensitive to how the data are partitioned, which is desirable. In Figure 10, all ROC curves incline towards the upper left corner of the graph, which indicates near-perfect classification, with a TPR very close to 100% and an FPR close to 0%. The AUC score in classes 0, 6, 7, 8, and 10 obtains the maximum value, corresponding to the ideal point in the graph with an FPR = 0 and a TPR = 1, confirming an excellent performance of the inference model in these classes (see Figure 10).

Discussion
We introduced the application of deep learning-based event prediction in healthcare processes. Our approach is not based on process models with an explicit or simple flow of activities. It can be applied when the models have high complexity in their behavior and extension. For experiment 1, no comparison with state-of-the-art methods in the healthcare domain is available: to our knowledge, approaches to predicting patient transfer management events are not found in the literature, including recent extensive research. The values obtained in the metrics are encouraging for practical implementations, demonstrating the feasibility and usefulness of this method. LSTM neural networks have been employed in diverse fields of healthcare, e.g., predicting healthcare trajectories from medical records [35], analyzing longitudinal patient records [36], predicting patient spending on medications [37], and predicting an initial diagnosis of heart failure [38]. Regarding experiment 2, the authors of [39] proposed a method based on LSTM using the sepsis event log to predict the next activity, obtaining a value of 0.60 in the F1-score metric. The work in [40] compares a convolutional neural network (CNN) and an LSTM for predicting the next event, reaching a score of 0.57 in the ROC-AUC and 0.84 in the accuracy metrics. Similarly, the authors of [41] evaluated models based on LSTM and a deep neural network (DNN), reaching 0.66 and 0.57 in the same event log. Our model exceeds the values mentioned in the sepsis event log (experiment 2), reaching 0.91, 0.90, 0.953, and 0.899 in accuracy, F1-score, micro-average ROC-AUC, and macro-average ROC-AUC, respectively.
Concerning experiment 3, the authors of [42] reported a value of 0.78 in the accuracy metric for predicting the next activity in a business process, obtained through an LSTM model using the hospital billing event log. Similarly, the authors of [43] present an approach based on a self-attention mechanism, called process transformer, to predict the next activity, which obtains 85.83% and 0.82 for the accuracy and F1-score metrics, respectively. Our LSTM model achieves comparable or higher values in the accuracy and F1-score metrics than the proposals mentioned above. Furthermore, the AUC score achieved in each class demonstrates the stability of the prediction model's performance. Moreover, the average of 0.89 obtained in the 10-fold cross-validation technique indicates that the model correctly identifies most cases, that is, the next activity within the business process, regardless of the partition of the tested data set.
The results achieved by the LSTM network model support the hypothesis of implementing deep learning approaches to predict future events within healthcare processes. According to Evermann et al. [21], three factors can (positively or negatively) affect prediction performance. First, parameters initialized randomly in the training phase can lead to different values. Second, traces contained in an event log are in an arbitrary order. Third, the training data selection may influence the prediction, so the results obtained with one testing dataset may not generalize to others. To limit the influence of these factors, we verified each experiment by 10-fold cross-validation, achieving accuracies of 94%, 88%, 84%, and 81%. The possibility of generalizing to similar datasets is evaluated by comparing the prediction performance on an independent validation sample with the prediction performance on the training sample. Given the complexity that characterizes the clinical and organizational processes in the healthcare domain, it is essential to consider the implementation of process mining approaches when performing data analysis that supports decision-making within a hospital. Furthermore, predicting future events that may occur within the execution of a healthcare process has significant advantages, such as reducing patient care costs, patient waiting time for transfer between hospitals, queue time in patient care, and the time to order a blood test or to hospitalize a patient.
Exploiting event logs to predict the next event is essential for planning activities and resource assignments, such as preparing a machine (e.g., for a computed tomography or magnetic resonance imaging scan) or having a resource ready for timely patient care. Therefore, deep learning for predicting events in healthcare processes through process mining approaches is feasible.
In summary, the following advantages and limitations can be described. The framework can be replicated in any organization that uses event logs in XES format and even in data sets that contain a trace/events format. With a trained LSTM model, the processing time required to predict the next event is minimal. The inference model can predict one or more future events correctly. The LSTM model can remember dependencies when using long data streams. On the other hand, the required processing time depends on the number of events distributed in the traces contained in an event log. The LSTM network training must be rerun when new instances are added to the dataset. The inference model only operates on the attribute selected in the data extraction stage.
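As an illustration of the trace/events input format mentioned above, the following pure-Python sketch turns traces of activity names into (prefix, next event) training pairs with one-hot encoded activities. The function names and the example activity labels are hypothetical, and the sketch is a simplified reading of the segmentation and binarization stages rather than the original implementation.

```python
from typing import Dict, List, Sequence, Tuple

OneHot = List[int]


def build_vocabulary(traces: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every distinct activity name to a fixed integer index."""
    activities = sorted({activity for trace in traces for activity in trace})
    return {activity: index for index, activity in enumerate(activities)}


def one_hot(index: int, size: int) -> OneHot:
    """Binary vector with a single 1 at the position of the activity."""
    vector = [0] * size
    vector[index] = 1
    return vector


def prefix_next_pairs(
    traces: Sequence[Sequence[str]], vocabulary: Dict[str, int]
) -> List[Tuple[List[OneHot], OneHot]]:
    """For each trace, pair every prefix with the event that follows it."""
    size = len(vocabulary)
    samples = []
    for trace in traces:
        for cut in range(1, len(trace)):
            prefix = [one_hot(vocabulary[a], size) for a in trace[:cut]]
            target = one_hot(vocabulary[trace[cut]], size)
            samples.append((prefix, target))
    return samples


if __name__ == "__main__":
    # Hypothetical activity names, not taken from any of the four event logs.
    example_traces = [["register", "triage", "discharge"]]
    vocab = build_vocabulary(example_traces)
    for prefix, target in prefix_next_pairs(example_traces, vocab):
        print(prefix, "->", target)
```

Because each sample depends only on the single selected attribute (here, the activity name), the sketch also reflects the limitation noted above: extending the input with resources or timestamps would require a wider encoding.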

Conclusions
We presented a deep learning-based LSTM neural network approach for predicting future events or activities in healthcare processes. Before implementing the selected LSTM model, we pre-processed and binarized the events. Subsequently, we designed a neural network model and trained the network to choose the inference model that achieves the highest performance.
The inference model achieves high performance on accuracy measures, with values of 0.98, 0.91, 0.85, and 0.81 for the patient transfer orders, sepsis, hospital billing, and healthcare collaboration event logs, respectively, based on the validation dataset. The accuracy metric is widely used to measure the performance of inference models that predict the next activity in a business process. However, in data sets that present imbalances in their classes, it is recommended to use the F1-score metric, in which values of 0.95, 0.90, 0.85, and 0.82 were reached for the same event logs. Therefore, the predictive method achieves high values in the evaluation metrics, which confirms its capability to predict the next event in the healthcare process.
In future work, we aim to extend the input vectors of the LSTM model with additional attributes of the event log (resources or participants) to predict other event types. We also aim to address some of the challenges identified within healthcare processes, such as estimating the time of occurrence of the next event or predicting the time required to complete a case.

Figure 2. Operation schema of the segmentation task.

Figure 3. One-hot vector representation of the output activities.

Figure 4. Architecture of a single LSTM cell.

Figure 5. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) for the worst-performing classes in the validation stage of our experiment.

Figure 6. Loss function in the training and validation phases by epoch.

Figure 7. Accuracy in the training and validation phases by epoch.

Figure 8. ROC curves using all classes of the sepsis event log.

Figure 9. ROC curves using all classes of the hospital billing event log.

Figure 10. ROC curves using all classes of the healthcare collaboration event log.

Table 1. Example of the patient transfer order event log.

Table 2. Extraction of the LSTM prediction results.

Table 3. Key performance measures achieved by the inference method at the validation stage using the patient transfer event log.

Table 4. Key performance measures achieved by the inference method at the validation stage using the sepsis event log.

Table 5. Key performance measures achieved by the inference method at the validation stage using the hospital billing event log.

Table 6. Key performance measures achieved by the inference method at the validation stage using the healthcare collaboration event log.