Enabling Timely Medical Intervention by Exploring Health-Related Multivariate Time Series with a Hybrid Attentive Model

Modern healthcare practice, especially in intensive care units, produces vast amounts of multivariate time series of health-related data, e.g., multi-lead electrocardiogram (ECG), pulse waveform, blood pressure waveform and so on. As a result, timely and accurate prediction of medical interventions (e.g., intravenous injection) becomes possible by exploring such semantic-rich time series. Existing works mainly focused on onset prediction at the granularity of hours, which is not suitable for medication intervention in emergency medicine. This research proposes a Multi-Variable Hybrid Attentive Model (MVHA) to predict the impending need for medical intervention by jointly mining multiple time series. Specifically, a two-level attention mechanism is designed to capture the patterns of fluctuations and trends of different time series. This work applied MVHA to the prediction of the impending intravenous injection needs of critical patients in intensive care units. Experiments on the MIMIC Waveform Database demonstrated that the proposed model achieves a prediction accuracy of 0.8475 and an ROC-AUC of 0.8318, significantly outperforming baseline models.


Introduction
Intensive care units (ICUs) play a pivotal role in caring for the most severely ill hospitalized patients [1], where clinicians must anticipate patient care needs from a set of fast-paced physiological signals and then provide aggressive life-saving treatments or interventions [2]. To provide clinicians with supporting evidence for timely and accurate medical interventions, an effective approach is to analyze time series that contain representative information about health status, e.g., physiological, respiratory and neurological function [3][4][5][6][7][8]. In other words, early event prediction plays an important role in ICUs, as it ensures that hospital staff are prepared for interventions [9].
To provide high-level supportive analytics, numerous predictive models and computer-aided diagnostic solutions have been proposed [10]. For example, different medical scoring systems (e.g., SOFA, SAPS, APACHE [11]) have been developed to provide computer-assisted decision support. Usually, these scoring systems are based on a set of routine physiological measurements combined with logistic regression techniques. However, these scoring systems are not able to discover the rich semantics of the vital physiological time series and are not well calibrated in their predictions [12].
Although medical scoring systems are still widely used for evaluating various clinical probabilities in ICUs [13][14][15], machine learning approaches have recently been attracting more and more attention in the literature. In addition to predictive models based on logistic regression, more sophisticated approaches (e.g., random forests and clustering techniques) have been employed to improve the predictive performance of early detection. An abnormal period of a signal typically exhibits two kinds of abnormal fluctuations, i.e., abnormal speeding up and abnormal slowing down. Among the physiological signals recorded in ICUs, ECG is one of the most important vital signs [35]. By analyzing ECG time series, researchers can not only derive the respiratory rate, heart rate and heart rate variability, but also reduce false alarms in ICUs [36][37][38]. Thus, the ECG provides a good window into the patient's physiological status. Recently, a number of models have been developed for end-to-end ECG diagnosis and have shown superior performance [39][40][41][42]. However, these models were fed directly with raw ECG waveforms, without exploring the fine-grained temporal fluctuations or trends that are key to ECG-based medical diagnoses [31], especially for the treatment of acute heart attacks, acute coronary syndromes, and other life-threatening conditions in ICUs [43][44][45]. Moreover, other physiological signals can be important supplements to ECG-based analysis. As a result, to explore the temporal nature of multiple physiological signals, this study builds a hybrid model combining a convolutional neural network and a recurrent neural network, aiming to take full advantage of the CNN's ability to extract local features and the LSTM's capability to mine long-range dependencies of the time series.
Specifically, this work mainly considers the following signals: arterial blood pressure (ABP), peripheral arterial oxygen saturation (SpO2), heart rate (HR), pulse, and respiration rate (RESP). Then, to further improve interpretability, the model incorporates a fluctuation attention mechanism for the CNN and a multi-channel trend attention mechanism for the LSTM. Based on attentive modeling of the hidden characteristics of the multivariate signals, the model can identify the inputs that have the most significant influence on its output.
To sum up, the contributions of this paper are three-fold. First, to characterize the abnormal patterns of physiological variables more accurately, this work proposes a novel hybrid neural architecture combining a CNN and an LSTM. In particular, the CNN aims to find compact latent features in each wave component, and the LSTM is utilized to learn long-range dependencies of the time series to model the overall variation patterns.
Second, to enhance the interpretability of the proposed model, this work designs two attention mechanisms, including a fluctuation attention mechanism for CNN and a multichannel trend attention mechanism for LSTM. Moreover, this work performs attention fusion across fluctuations and trends of different time series to characterize variation patterns according to their importance.
Third, this study achieves state-of-the-art results in the ahead-of-time prediction of emergency rescue medication needs in the ICU, which can help ensure that hospital staff are prepared for interventions as early as possible.
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes the proposed approach in detail. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.

Related Work
This section will briefly review the related work, which can be grouped into three categories.

ICU Scoring Models
A medical scoring model gives an assessment of a patient's health status in the form of a score [46], which reflects the clinical severity of the patient's condition. Forecasting such scores can help caregivers be aware of patients at risk and take appropriate actions in advance to prevent these patients from deteriorating [47]. For instance, the sequential organ failure assessment (SOFA) score, which is based on six different sub-scores, is useful in predicting the clinical outcomes of critically ill patients [48]. The logistic organ dysfunction system (LODS) uses logistic regression techniques to determine severity levels, providing an objective tool for identifying the organ dysfunction level (from 1 to 3) for six different organ systems [14].
Specifically, there are two widely used ICU scoring models at present. The Simplified Acute Physiology Score (SAPS) model calculates the severity of disease for patients admitted to intensive care units, by using 12 routine physiological measurements of the past 24 h [49]. The Acute Physiology And Chronic Health Evaluation (APACHE) model is used to calculate the probability of death independent of diagnosis, based on markers for the extent of the abnormality of 12 common physiological and laboratory values [50].
In general, the outputs of these models are ordinal, i.e., a higher score corresponds to a higher severity. However, all of them are based on fixed time intervals, considering neither the evolving clinical information nor non-linearly constructed latent features [10,30].

ICU Interventions
Intensive care unit interventions refer to medical treatments given to seriously or critically ill patients who are at risk of potential or established organ failure [51]. Existing studies mainly relate to emergency airway care, respiratory failure and so on [52].
Mechanical ventilation (i.e., assisted respiration) is one of the most common interventions implemented in intensive care medicine [53]. For instance, a number of studies have been conducted to determine the factors that could help predict the need for mechanical ventilation and weaning [54][55][56]. Vasopressor administration is another common intervention in a medical intensive care unit [57]. For example, Wu et al. [58] used a switching-state autoregressive model to predict the need for a vasopressor. Similarly, to make intervention models more applicable, unsupervised switching-state autoregressive models [9] have been developed by combining waveform recordings with demographic information, aiming to simultaneously provide in-hospital early detection for five different clinical interventions.
Nevertheless, existing works mainly focus on improving the prediction performance for actionable interventions several hours ahead of onset, and none of them have explored the prediction problem of immediate intravenous injections, which is a core focus of our work.

Deep Learning on ICU Data
Intensive care treatment is highly challenging due to the continuous generation of large amounts of heterogeneous health-related data. Thereby, more and more attention is being paid to deep learning based data processing and assistive decision-making, aiming to improve the accuracy of clinical identification and prediction [24,29]. For example, Rajpurkar et al. [59] developed a multi-layer CNN model to detect arrhythmias from ECG time series. Similarly, a deep learning based model was built to classify 12 rhythm classes [60], achieving state-of-the-art performance.
However, these studies mainly explored the time series of a single vital sign, and could not provide a comprehensive characterization of the patient's status in clinical environments (especially in ICUs). A better choice is to fuse multiple simultaneously collected time series with deep models. Recently, a set of models has been proposed to combine vital physiological time series with demographic information (including age, gender, lab test results and so on) to provide clinical predictions [30,61]. Similarly, Lipton et al. [62] showed promising results using multivariate time series of clinical measurements for learning and prediction.
Nevertheless, the timeliness and interpretability of existing models are still not good enough for the prediction of impending medication intervention needs in ICUs. Therefore, a more effective model is needed, which should be able to provide timely and interpretable predictions, by exploring the fine-grained temporal trends and fluctuations of multivariate time series.

Methodology
This section describes the proposed multi-variable hybrid CNN-LSTM model, which is mainly composed of a multivariate input processing layer, a hybrid attentive model layer and a predictive output layer.

Overview of MVHA
This subsection first briefly describes the framework of MVHA and introduces the notations used in this article. We denote the multivariate physiological signals as S = [G, L], where G represents the high-frequency waveforms (such as ECG) and L represents the numerical waveforms (such as HR). Aligned with the i-th intravenous intervention, we denote the multi-channel high-frequency waveforms G at time step t as g^(cg)_{i,t} ∈ R^ng, 0 ≤ t ≤ T, cg = 1, 2, ..., CG, with CG = |G_i|, where ng denotes the length of g^(cg)_{i,t}. Similarly, the numerical signals L at time step t are defined as l^(cl)_{i,t} ∈ R^nl, 0 ≤ t ≤ T, cl = 1, 2, ..., CL, with CL = |L_i|, where nl denotes the length of l^(cl)_{i,t}. In particular, T represents the number of time steps used for the prediction of a medical intervention, g^(cg)_i is the continuously monitored high-frequency waveform of channel cg, and l^(cl)_i denotes the numerical sign sampled by channel cl. The notations used are summarized in Table 1.

Notation: Description
S, s, s_k: multivariate physiological signals (G and L); one of g^(cg) or l^(cl); the k-th segment in s
G, g^(cg), g_k: high-frequency waveforms; the cg-th channel in G; the k-th segment in g
L, l^(cl), l_k: numerical waveforms; the cl-th channel in L; the k-th segment in l
P ∈ R^(U×J), p^(j) ∈ R^U: the convolutional features; the j-th column in P
O ∈ R^(U×M), o ∈ R^U, o_f_k: output of the CNN layer; the sum of p^(j); output of the fluctuant level attention
α, α_k: weights of the fluctuant level attention; the k-th value in α
β, β_k: weights of the trend level attention; the k-th value in β
H ∈ R^(J×M), h_k: output of the Bi-LSTM layer; the k-th column in H
Z ∈ R^(J×CH), z: combination of H; the sum of h_k
X ∈ R^(I×CH), x^(k): output of the fully connected layer; the k-th column of X
A_fl, A_tr: feature weights of the fluctuant level attention; feature weights of the trend level attention
d, tr^(kt), ρ(s_k): output of the trend level attention; difference between consecutive segments; max, mean or min of s_k
y_i: prediction result of the i-th segment

Given a time step t and an observation window W for the i-th intervention, this work takes the observed multivariate time series S(t − W, t] (including both G(t − W, t] and L(t − W, t]) as input, aiming to predict the output value of the variable y_i. With a pre-defined step size, S(t − W, t] is first split into M equal-length segments s_k, 1 ≤ k ≤ M (e.g., with a step length of 1 min, a high-frequency waveform segment g_k sampled at 125 Hz contains 7500 samples and a numerical waveform segment l_k sampled at 1 Hz contains 60 values). Next, a CNN is applied to these segments to obtain the convolutional output o_k and the fluctuant level attended features o_f_k, followed by a Bi-LSTM that transforms o_f_k into the sequentially embedded representations H and Z; a fully connected layer then converts Z into X. After that, this work uses a weighted average to integrate X = [x^(1), ..., x^(CH)] (CH = |G| + |L|) across all channels to obtain the trend level attention output d, which is concatenated with tr^(kt) (1 ≤ kt ≤ M − 1) and used for prediction. Here, tr^(kt) = |ρ(s_{k+1}) − ρ(s_k)| represents the difference between the consecutive segments s_k and s_{k+1}, where ρ(s_k) calculates a statistic of segment s_k (i.e., max, mean or min). Specifically, to improve the model's accuracy and interpretability, this study designs a two-level attention mechanism (i.e., a fluctuant level attention and a trend level attention, denoted as α and β, respectively). Figure 2 depicts the framework of the proposed model.

For a multivariate time series, to exploit the local dependency patterns among different channels, this study adopts convolutional neural networks to encode the time series and map them to the latent space. Formally, G and L of S(t − W, t] are first split into a sequence of equal-length segments s_1, ..., s_M. Next, 1-D convolution is applied to each obtained segment to extract the features P = conv(s), where s stands for g^(cg)_k or l^(cl)_k, P ∈ R^(U×J), U is the number of filters, and J is the length of the segment after convolution (a hyperparameter of the CNN [29,34]). Then the columns p^(j) are summed along the J axis to obtain o = Σ_{j=1}^{J} p^(j). The output over the M segments is finally fixed at O ∈ R^(U×M), in which the first dimension corresponds to the number of filters and the second to the number of segments. Therefore, the output of the CNN layer is defined as O = [o_1, ..., o_M], where p^(j) ∈ R^U, o ∈ R^U, and O ∈ R^(U×M).

Fluctuant Level Attentive Layer. To extract fluctuant level patterns, this study proposes a fluctuant-specific weight vector α (with a size of 1 × M) to aggregate the physiological feature maps. Thus, the model obtains a better fluctuant level interpretation o_f_k = α_k o_k, where α_k represents the weight of the k-th fluctuant level features.
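The segment encoding and fluctuant re-weighting described above can be sketched in PyTorch as follows; the layer sizes and the placeholder attention weights are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SegmentCNN(nn.Module):
    """1-D CNN that maps each segment s_k to a feature vector o_k."""
    def __init__(self, num_filters=16, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(1, num_filters, kernel_size)

    def forward(self, s):                 # s: (M, 1, segment_len)
        p = torch.relu(self.conv(s))      # P: (M, U, J)
        return p.sum(dim=2)               # o_k = sum_j p^(j): (M, U)

M, seg_len = 30, 60                       # e.g., 30 one-minute segments at 1 Hz
segments = torch.randn(M, 1, seg_len)
cnn = SegmentCNN()
O = cnn(segments)                         # O: (M, U)

# Placeholder fluctuant attention weights; the real alpha comes from the
# two-step attention network described in the Hybrid Attention section.
alpha = torch.softmax(torch.randn(M), dim=0)
O_f = alpha.unsqueeze(1) * O              # o_f_k = alpha_k * o_k
```

The attended features O_f are what the Bi-LSTM consumes in the next step.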
Then, to sequentially represent the historical information of the physiological time series, we adopt an LSTM to characterize the long-term temporal dependencies. Specifically, the LSTM units include a set of gates that control when information should be maintained in the memory cell, when it should be forgotten and when it should be output. For a given time series X_t = {x_{1,t}, x_{2,t}, ..., x_{k,t}} at time t, the encoder layer employs the input gate ig_t, the output gate og_t and the forget gate fg_t to jointly control the cell state c_t and the output h_t as follows:

ig_t = σ(W_ig x_t + U_ig h_{t−1} + b_ig)
fg_t = σ(W_fg x_t + U_fg h_{t−1} + b_fg)
og_t = σ(W_og x_t + U_og h_{t−1} + b_og)
c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c)
c_t = fg_t ⊙ c_{t−1} + ig_t ⊙ c̃_t
h_t = og_t ⊙ tanh(c_t)

where the groups of tensors W, U and b are the weight matrices and bias parameters to be learned during training, x_t is the current input, h_{t−1} corresponds to the previous state, and c_t is the cell state vector at the current time step. Owing to these gates, the LSTM can overcome the vanishing gradient problem and capture the long-term dependencies of time series. Specifically, this model uses a standard configuration of the bidirectional LSTM network, due to its ability to capture temporal dependencies in both directions. The output of the LSTM is denoted as h_k = biLSTM(o_f_1, o_f_2, ..., o_f_k). Finally, by concatenating the forward and backward outputs, we obtain the sequential encoding features H ∈ R^(J×M).

Trend Level Attentive Layer. The trend level attentive layer is designed to obtain a more comprehensive view of the multivariate signals by fusing attentions across all the channels. First, a fully connected transformation is performed on the LSTM feature map, X = tanh(W_z^T Z + b_z), where z = Σ_{k=1}^{M} h_k, Z ∈ R^(J×CH), W_z ∈ R^(J×I), b_z ∈ R^I, and X ∈ R^(I×CH). Then, considering that different signal channels play different roles and have various importance, this model introduces a trend-specific weight vector β (with a size of 1 × CH) to fuse the trend level attentions as d = Σ_{k=1}^{CH} β^(k) x^(k).
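The Bi-LSTM encoding and trend-weighted channel fusion above can be sketched as follows; the sizes and the placeholder β are illustrative assumptions (the real β comes from the trend attention network):

```python
import torch
import torch.nn as nn

U, M, CH, I = 16, 30, 5, 8          # filters, segments, channels, FC output size

# Bidirectional LSTM whose concatenated forward/backward output matches U.
lstm = nn.LSTM(input_size=U, hidden_size=U // 2,
               bidirectional=True, batch_first=True)
fc = nn.Linear(U, I)                # fully connected transform Z -> X

o_f = torch.randn(CH, M, U)         # attended CNN features, one row per channel
h, _ = lstm(o_f)                    # H: (CH, M, U), fwd+bwd concatenated
z = h.sum(dim=1)                    # z = sum_k h_k, per channel: (CH, U)
x = torch.tanh(fc(z))               # X: (CH, I)

beta = torch.softmax(torch.randn(CH), dim=0)   # placeholder trend weights
d = (beta.unsqueeze(1) * x).sum(dim=0)         # d = sum_k beta^(k) x^(k): (I,)
```

The fused state d then feeds the prediction layer together with the trend features tr^(kt).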
Finally, given the encoded state d and the time-varying variables tr^(ch)_{kt}, the model predicts a categorical output y_i based on multivariate regression as follows: ŷ_i = softmax(W^h_{y_i} d + W^{tr}_{y_i} tr^(ch)_{kt} + b_{y_i}).
Specifically, the model adopts the cross-entropy loss function L = −(1/N) Σ_{i=1}^{N} [y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)], where N denotes the number of instances in a mini-batch, and y_i and ŷ_i represent the true label and the predicted label of the i-th instance, respectively.
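The output layer and loss can be sketched as follows; dimensions and layer names are illustrative assumptions, and PyTorch's nn.CrossEntropyLoss folds the softmax and the cross-entropy above into one call on raw logits:

```python
import torch
import torch.nn as nn

I, T_tr, num_classes = 8, 29, 2                # illustrative sizes
W_d = nn.Linear(I, num_classes, bias=False)    # weight for the encoded state d
W_t = nn.Linear(T_tr, num_classes)             # weight for tr features (+ bias b_y)

d = torch.randn(4, I)                          # a mini-batch of encoded states
tr = torch.randn(4, T_tr)                      # matching trend features
logits = W_d(d) + W_t(tr)                      # W^h d + W^tr tr + b_y
y_hat = torch.softmax(logits, dim=1)           # predicted class probabilities

loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 1, 0]))
```

Note that nn.CrossEntropyLoss expects the raw logits, not y_hat, since it applies log-softmax internally.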

Hybrid Attention Mechanisms
The above section has described the framework of the proposed model. To further explain the design principles of the model, this subsection presents the details of the proposed hybrid attention mechanisms.
In order to better characterize the fluctuation and trend changes, this study introduces two attention mechanisms into the proposed model, i.e., a fluctuant attention and a trend attention. To obtain the fluctuant attention vector α and the trend attention vector β, a two-step neural network is designed. Specifically, the first fully connected layer is used to calculate the scores for computing the weights, and the second fully connected layer computes the weights via a Softmax activation.
Fluctuant Attention Mechanism. To characterize fluctuations with the attention weights α, the model first computes the standard deviation of each obtained segment s, yielding the fluctuant level knowledge feature vector A_fl = SD(S), where SD(·) calculates the standard deviation of each segment s of the time series S. Afterwards, the model concatenates the knowledge features with the output of the CNN layer to obtain the attention weights α = softmax(V_fl^T tanh(W_fl^T [O; A_fl] ⊕ b_fl)), where W_fl ∈ R^((U+E_fl)×D_fl) is the weight matrix of the first layer, V_fl ∈ R^(D_fl×1) is the weight vector of the second layer, b_fl ∈ R^(D_fl) is the bias vector, ⊕ denotes an addition with broadcasting, A_fl ∈ R^(E_fl×M), and α ∈ R^M. We further present the fluctuant attention in more detail in Algorithm 2. Figure 3 shows the structure of the fluctuant attention.

Trend Attention Mechanism. Intuitively, signals with significant changes are likely to contain more important information and should be given more attention. However, as different channels of a multivariate time series usually have different amplitudes, this study first adopts min-max scaling to normalize the time series, based on which the model extracts the trend level knowledge feature trc_ch of each normalized channel. The model thereby obtains the trend level knowledge feature vector A_tr = [trc_1, ..., trc_CH], and then calculates the attention weight β = softmax(V_tr^T tanh(W_tr^T [O'; A_tr] ⊕ b_tr)), where O' ∈ R^((U×M)×CH) stacks the flattened CNN features of each channel, W_tr ∈ R^((U×M+E_tr)×D_tr) and V_tr ∈ R^(D_tr×1) are the weight matrix and vector of the first and second layers, respectively, b_tr ∈ R^(D_tr) is the bias vector, ⊕ represents an addition with broadcasting, A_tr ∈ R^(E_tr×CH), and β ∈ R^CH. We further present the proposed trend attention in more detail in Algorithm 3. Figure 4 shows the structure of the trend attention.
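Both attention types share the same two-step scoring network. A sketch for the fluctuant case follows, where the sizes U, E, M, D and the random stand-in features are illustrative assumptions:

```python
import torch
import torch.nn as nn

U, E, M, D = 16, 1, 30, 8          # CNN filters, knowledge dim, segments, hidden
W = nn.Linear(U + E, D)            # first layer: scores the concatenated features
V = nn.Linear(D, 1)                # second layer: one scalar per segment

O = torch.randn(M, U)              # CNN output, one row per segment (stand-in)
A_fl = torch.randn(M, E)           # knowledge features, e.g., per-segment SD

scores = V(torch.tanh(W(torch.cat([O, A_fl], dim=1))))   # (M, 1)
alpha = torch.softmax(scores.squeeze(1), dim=0)          # alpha in R^M, sums to 1
```

The trend attention reuses the same structure, with per-channel flattened features and trend knowledge features in place of O and A_fl, producing β ∈ R^CH.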

Experiments
This section first describes the dataset and baseline models used in this work, and then presents the experimental results.

Dataset
To evaluate the performance of the proposed model, this research uses the MIMIC-III (Medical Information Mart for Intensive Care) Waveform Database Matched Subset [63]. MIMIC is a publicly available benchmark dataset which contains over 58,000 hospital admissions from approximately 38,600 adults, whose physiological signals were recorded continuously in ICUs. These records include thousands of waveform recordings (such as one or more channels of ECG signals) and time series of vital signs (such as heart and respiration rates). This research chose 18 frequently used rescue intravenous drugs in the coronary care unit (CCU) [64], a specialized department of the ICU, and obtained 19,608 experimental records. These medications include sodium nitroprusside, nitroglycerin, dopamine, dobutamine, norepinephrine, milrinone, amiodarone, lidocaine, epinephrine, adenosine, alteplase, esmolol, diltiazem, phenylephrine, hydralazine, nesiritide, procainamide, and isoproterenol.
In the experiment, this work aimed to predict whether an intravenous injection of the mentioned drugs is needed. Specifically, this research formulates the prediction task as a binary classification problem, i.e., whether the patient needs an injection within a certain time period. Normally, the medical staff performing emergency treatment in ICUs inject a variety of drugs into patients within a relatively short time period. Therefore, this work groups together all drugs that were injected within 2 min before or after a certain time point. For example, as shown in Figure 5, the subject was given a group of injections, including three doses of norepinephrine and one dose of lorazepam. Accordingly, this work identified 18,792 groups of intravenous injections. For each injection event, 30 min of time series were extracted from the dataset by taking the event as the endpoint. With the constraint that there should be only one group of intravenous injections in the extracted time series, a total of 14,465 groups were obtained. The experiment took the first half of each time series as a negative sample and the second half as a positive sample. Specifically, the obtained time series consisted of five vital signs, i.e., heart rate (hr), pulse, respiration rate (resp), peripheral capillary oxygen saturation (SpO2) and ECG. Missing values were imputed using piecewise cubic spline interpolation in the experiment.
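The windowing and labeling scheme above can be sketched as follows; the function names are illustrative, and simple linear interpolation stands in here for the paper's piecewise cubic spline imputation:

```python
def make_samples(signal, event_idx, fs=1, window_min=30):
    """Split the 30-min window ending at the injection event into a
    (negative, positive) pair: first half negative, second half positive."""
    half = (window_min * 60 * fs) // 2
    window = signal[event_idx - 2 * half:event_idx]
    return window[:half], window[half:]

def impute(series):
    """Fill None gaps by linear interpolation between known neighbors
    (the paper uses a piecewise cubic spline instead)."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    out = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        i0, v0 = max((p for p in known if p[0] < i), key=lambda p: p[0])
        i1, v1 = min((p for p in known if p[0] > i), key=lambda p: p[0])
        out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    return out
```

For a 1 Hz vital sign, each half-window contains 900 samples (15 min), matching the first-half-negative, second-half-positive labeling described above.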

Experimental Setup and Baseline Models
Training and Implementation Details. For the training of the CNNs, various numbers of convolutional layers (ranging from 1 to 5) and filters (ranging from 8 to 64) were tried, with the stride hyperparameter set to 1 or 2. Similar to existing studies [29,65,66], this study uses batch normalization, rectified linear unit (ReLU) activation and max pooling between convolutional layers to prevent overfitting. Specifically, the model utilizes a 3-layer CNN for the high-frequency time series (i.e., ECG) with the filter size ranging from 10 down to 3, and a 2-layer CNN for the other time series with the filter size varying from 5 down to 2.
Furthermore, this work explores Bi-LSTMs with one to eight layers and with the number of hidden units ranging from 8 to 64. Meanwhile, different configurations were tested, including different mini-batch sizes (16, 32 and 128) and different optimizers (stochastic gradient descent, Adagrad and Adam). Specifically, the model uses a 3-layer Bi-LSTM with the number of hidden units set to 16. The model's initial weights are set randomly, and the learnable parameters are updated in each iteration with the Adam optimizer, with a learning rate of 0.002. The dropout rate is set to 0.5 in the fully connected prediction layer. The model is trained with a mini-batch size of 128 samples, and the dataset is randomly divided into three subsets, i.e., a training set (70%), a validation set (10%) and a test set (20%). In our experiments, all models are implemented with PyTorch 1.1.0, and the machine used is equipped with an Intel Xeon E5-2640, 256 GB RAM, 8 Nvidia Titan-X GPUs and CUDA 8.0. The workflow of the proposed hybrid CNN-LSTM model is shown in Figure 6.
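The reported training configuration (Adam, learning rate 0.002, mini-batch 128, dropout 0.5 before the output layer) can be sketched as a minimal loop; the model here is a small stand-in, not the full MVHA architecture:

```python
import torch
import torch.nn as nn

# Stand-in classifier with dropout 0.5 in the fully connected prediction layer.
model = nn.Sequential(nn.Linear(40, 16), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
criterion = nn.CrossEntropyLoss()

X = torch.randn(128, 40)            # one mini-batch of 128 samples
y = torch.randint(0, 2, (128,))     # binary labels

for _ in range(3):                  # a few illustrative optimization steps
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```

In practice each epoch iterates over the 70% training split, with the 10% validation split used for model selection.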

Experimental Results
The experiment measures the models' performance based on accuracy (ACC), area under the ROC curve (ROC-AUC) and F1 score. Table 2 reports the performance of each model on the prediction task. The results show that the proposed MVHA model outperforms all other models. Meanwhile, all attention-based models perform better than their counterparts without attention, which supports the premise that attention mechanisms help distinguish between samples more clearly. For a better view of the results, a boxplot of the accuracy is shown in Figure 7. CNN (ECG) achieves a relatively satisfactory classification result, for which two main reasons are speculated: first, the samples came from the CCU, which treats patients with severe cardiac diseases, and these acute diseases influence the ECG directly; second, dense high-frequency signals contain enough information for certain tasks, and the designed CNN can utilize these multidimensional inputs efficiently. However, its performance was inferior to that of other models (such as CNN-FAttn), suggesting that ECG needs to be integrated with other time series for prediction tasks. CNN-FAttn uses all the time series together with the fluctuant level attention mechanism to improve performance. In particular, CNN-FAttn surpasses CNN (ECG) by up to 1.5% in ACC, which indicates that representations from wider signal sources help improve performance.
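For reference, the two headline metrics can be computed from scratch as follows; this is a minimal illustration (in practice, libraries such as scikit-learn provide these), using the rank-statistic definition of ROC-AUC:

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive instance is
    scored higher than a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With perfectly separated scores this AUC equals 1.0, and a constant score yields 0.5, matching the usual chance baseline.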
The remaining five kinds of models incorporate both a CNN and an LSTM. CNN-LSTM gives relatively poorer results compared with the other models. A possible explanation is that a suitably short span of waveform before the injection point already provides sufficient contextual information, while an overly long span may dilute the information mined from the preceding time series; shorter waveforms may be explored in further studies. Adding the multi-channel trend level attention, CLSTM-TAttn scores higher than CNN-LSTM, but does not beat CNN-FAttn and CLSTM-FAttn, possibly indicating that, compared with the trend variation over a short time, violent fluctuations of the signals are more significant for the impending need for an intravenous injection. Furthermore, whichever type of attention is used, it improves the classification performance. Lastly, the proposed MVHA model, which incorporates changes from both fluctuations and trends, reaches the best prediction performance. That is, mining the fluctuation patterns and overall variation trends retains more useful information for the classification.
To validate the interpretability of the proposed attentive model, Figure 8 presents the predicted risk level for an intravenous injection of an unseen patient. Accordingly, this study finds that the patient is predicted to have a higher-than-average risk of intervention during the 11th-13th min (cells highlighted in yellow and orange). Apparently, a time slice receives higher attention if it is closer to the time point of an intravenous injection or if it contains significant fluctuations, which demonstrates the effectiveness of the proposed fluctuation level attention mechanism. In addition, for the trend level attention (as shown in Figure 9), we find that the ECG channel receives the highest attention weight, the other three channels (i.e., HR, Pulse and Resp) attract slightly lower attention, and the SpO2 channel receives the lowest attention. This indicates that, on the one hand, ECG provides the most important evidence for the prediction of intravenous injections. On the other hand, while high-frequency time series contain abundant information, it is still necessary to take other vital signs into account to enable timely and accurate medical interventions.

Conclusions and Future Work
This paper proposed a hybrid deep model to enable timely medical intervention by exploring health-related multivariate time series. Specifically, CNNs were utilized to mine local features, and an LSTM was used to capture time-dependent features. Furthermore, to improve the interpretability of the prediction results, a two-level attention mechanism (i.e., a fluctuant level attention and a trend level attention) was developed to focus on key time slices and key channels. MVHA is finally configured with 3-layer CNNs for the high-frequency time series and 2-layer CNNs for the numerical waveforms, plus a 3-layer Bi-LSTM; the total number of learnable parameters in the model is 3392. Experiments on the MIMIC dataset showed that the proposed model significantly outperformed baseline models. In the future, we plan to extend the proposed model by taking into account multi-modality data, such as medical texts and medical images; another possible direction is to study other kinds of medical interventions. Meanwhile, sparse neural networks, obtained via network pruning, could be adopted in a future model to reduce the computational load.
Further, in this work, a hybrid attentive neural network exploiting multi-channel waveforms was used to predict whether an intravenous injection is needed. On the other hand, related studies (such as Chen et al. [67]) have also demonstrated that a rule-based system in the ICU can execute decisions much faster, given proper training for tagging critical events. However, in the context of this work, rule-based systems have the following limitations: first, when complex and high-density data are involved in one decision, it is hard for humans to institute detailed and complete rules; second, successful rule-based systems depend on domain expertise, which is not fully known at design time. While deep learning is more beneficial for analyzing the data and looking for correlations, rule-based systems are relatively simple and their output is easy for a human to debug. Meanwhile, because data from a rule engine can help increase the performance of a deep learning algorithm [68], in future work, neural networks and rule-based systems will be considered in tandem, which could be more beneficial to the framework than replacing the rules entirely.

Conflicts of Interest:
The authors declare no conflict of interest.