Prediction of Blood Pressure after Induction of Anesthesia Using Deep Learning: A Feasibility Study

Anesthesia induction is frequently associated with blood pressure fluctuations such as hypotension and hypertension. If blood pressure could be predicted precisely a few minutes ahead, anesthesiologists could adjust anesthetic management proactively, before patients develop hemodynamic problems. The objective of this study is to develop a real-time model for predicting 3-min-ahead blood pressure from the start of anesthesia induction to surgical incision. We used only vital signs and anesthesia-related data obtained during the anesthesia-induction phase and designed a bidirectional recurrent neural network followed by fully connected layers. We conducted experiments on data collected from 102 patients and obtained mean absolute errors between 8.2 mmHg and 11.1 mmHg, with standard deviations between 8.7 mmHg and 12.7 mmHg. The average elapsed time for predicting a batch of 100 unseen samples was about 26.56 milliseconds. We believe that this study shows the feasibility of real-time prediction of future blood pressure, and we expect the performance to improve as more data are collected and better model structures are found.


Introduction
General anesthesia for surgery can be divided into three phases: induction, maintenance, and emergence. In the induction phase in particular, blood pressure changes rapidly and can range from hypotension to hypertension. These changes are usually caused not only by the administration of intravenous anesthetic agents (propofol and remifentanil), volatile anesthetic agents, and neuromuscular blocking agents, but also by the airway manipulation required to intubate the patient's trachea for mechanical ventilation. It has been reported that hypotension, even a short period of mean arterial pressure below 55 mmHg, is associated with acute kidney injury and myocardial injury [1]. On the other hand, hypertension, if left untreated, increases the postoperative risks of bleeding, cerebrovascular events, and myocardial infarction [2]. If blood pressure can be accurately predicted, anesthesiologists may proactively search for possible causes and prevent severe hemodynamic changes. This may enable early intervention such as adjustment of anesthetic agents, fluids, and vasoactive drugs, so that patients are spared the harmful consequences of hypotension or hypertension. However, accurate intraoperative blood pressure prediction is nearly impossible because of the complex mechanisms that cause blood pressure changes, and even approximate predictions require considerable experience and knowledge of the factors that affect blood pressure fluctuations during surgery.
The volume of anesthesia data has increased with the use of electronic medical records. It is difficult for anesthesiologists to use these data to judge a patient's hemodynamic status during an operation, so a tool that supports clinical decision making based on these data would be helpful. Machine learning is one such tool, as it is known to be effective at learning arbitrary patterns in data. Few studies have addressed hypotension prediction using machine learning models. One study applied various machine learning models to information already available in electronic health records to predict hypotension within 10 min of anesthesia induction [3]. Note that it does not predict exactly when the hypotension event will occur, only whether it will occur within 10 min; it is therefore not applicable to a real-time service, because the event may occur as early as 1-2 s later. Another study predicted the probability of developing hypotension 15 min before its actual occurrence by applying a machine learning model to the invasive arterial blood pressure waveform [4]. Although it predicts a potential hypotension event before it occurs, it is not suitable for real-time prediction because it assumes that hypotension does not occur again within 20 min. Since we cannot know in advance when future events will occur, a method built on such an assumption cannot be used for real-time prediction. As far as we know, most previous studies addressed a classification problem (i.e., predicting a potential event), and no study has addressed the real-time regression problem (i.e., predicting the actual blood pressure).
Machine learning models have been widely adopted for the regression problem. Support vector regression (SVR) [5], which is based on the same principles of the support vector machine (SVM) [6], has been used to predict various real-numbers (e.g., stock price, demand/supply of pulpwood) [7,8]. Random forest (RF) [9] is one of ensemble learning methods. It is known to keep low bias of decision trees and avoid overfitting by controlled variance. The RF can be used for either classification or regression [10]. These traditional machine learning models have shown quite successful results, but they suffer from a common limitation; they strongly depend on a hand-crafted feature set that requires much effort of experts.
Deep learning is one solution to this limitation, as it automatically extracts arbitrary patterns (i.e., features) underlying the observed data. Deep learning is rooted in the artificial neural network (ANN) [11], and it can be used for regression by adopting an appropriate loss function (e.g., mean squared error). In theory, deep learning can model arbitrary non-linear patterns by stacking many layers. Because stacking too many layers was found to degrade results (e.g., lower accuracy, higher error) due to vanishing gradients [12], several approaches have been proposed for building deeper networks effectively: the rectified linear unit (ReLU) [13], residual connections [14], shortcut connections [15], the Inception module [16], and pretraining [17,18]. Thanks to these studies, the convolutional neural network (CNN), which mainly consists of convolutional layers and pooling layers, has been widely used for detection and recognition problems [19][20][21]. The convolutional layer extracts latent local features, and the pooling layer picks the most meaningful feature among the extracted local features; the CNN thus captures local patterns effectively and makes a decision by summarizing the most meaningful ones. The recurrent neural network (RNN) [22], on the other hand, allows a layer to have a recursive connection to itself, so that it captures sequential patterns effectively by memorizing previous inputs. This property has led to the wide use of the RNN for machine translation [23,24] and real-time prediction for sequences [25,26].
In this paper, we aim at real-time prediction of blood pressure between the induction of anesthesia and the beginning of operation. This is basically a problem of real-time regression for blood pressure.
Please note that we do not predict the current blood pressure, but the blood pressure of future (e.g., three minutes later). We adopt the RNN to capture arbitrary features from the sequential vital signs, and it makes prediction based on the features. As far as we know, this is the first study of applying the RNN to the real-time prediction of future blood pressure. We believe that this might be helpful for preventing some patients from falling into a critical condition.
This paper is structured as follows. Section 2 describes the characteristics of the target data (e.g., vital signs) and how we preprocess them. It also provides details of our proposed approach as well as the definitions of the input and output. Section 3 demonstrates the performance of our approach through experimental results, and Section 4 interprets some sample results and discusses additional experiments with different settings. Finally, Section 5 summarizes and concludes this paper.

Materials and Methods
This paper addresses a new problem: predicting future blood pressure in real time. We follow a data science (DS) methodology from problem to approach. As mentioned above, real-time prediction of blood pressure will help to prevent patients from falling into a critical condition. In practice, we follow the Cross Industry Standard Process for Data Mining (CRISP-DM), an iterative process consisting of business understanding, data understanding, data preparation, modeling, evaluation, and deployment. We collect and examine the vital-sign data and preprocess them to train our proposed model, as shown in Figure 1. The model is designed to incorporate the underlying sequential patterns of the vital signs and is evaluated by the absolute errors averaged over 10-fold cross-validation. At run time, future blood pressure is predicted from the vital signs of the previous few minutes (e.g., 3 min). As this paper is a feasibility study, the model is not ready for deployment; any deployment must be done carefully because life-and-death situations are involved. We will keep collecting more data and improving the model toward deployment.
This retrospective study was approved by the institutional review board of Soonchunhyang University Bucheon Hospital (approval No. 2019-08-016). We collected data from three operating rooms of Soonchunhyang University Bucheon Hospital, where the operations were performed between 29 October 2018 and 18 January 2019. The data were obtained from various devices using Vital Recorder: Bx50 (patient monitor), Solar 8000M (patient monitor), Datex-Ohmeda (anesthesia machine), Primus (anesthesia machine), BIS (brain monitor), and Orchestra (infusion pump), which results in a K-dimensional real-valued vector. As the vector has a few missing values, we employ two strategies: (1) replacing a missing value with the mean of the surrounding values, and (2) replacing a missing value with the last previously observed value. We apply the first strategy to the vital signs obtained from the BIS (e.g., signal quality index (SQI)) and the second strategy to the other values. In this paper, the vector dimension K is 27, and the details of the vital signs are described in Table 1.
Each dimension of the vector has its own sampling rate; for example, BIS/SQI and BIS/BIS are collected every second, whereas TV and MV are collected every six seconds. To address this issue, we treat all dimensions as having a common sampling interval of three seconds. For example, the blood pressure values (i.e., mean blood pressure (MBP), systolic blood pressure (SBP), and diastolic blood pressure (DBP)) are obtained every 1-3 min (mostly every minute), so these values are assumed to stay fixed until new values arrive. That is, if the MBP value is sampled every minute, then the MBP value stays the same for 20 consecutive timesteps.
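The two imputation strategies and the three-second grid can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function names are hypothetical, "surrounding values" is assumed to mean the nearest observed value on each side, and holding the latest value on the grid is assumed from the description of the blood pressure channels.

```python
from typing import List, Optional, Sequence


def fill_with_neighbor_mean(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Strategy (1): replace a missing value with the mean of its nearest observed neighbors."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, len(values)) if values[j] is not None), None)
        neighbors = [x for x in (prev, nxt) if x is not None]
        if neighbors:  # leave the gap if nothing was observed on either side
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


def fill_with_last_observed(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Strategy (2): replace a missing value with the last previously observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def hold_on_grid(times_s: Sequence[int], values: Sequence[float],
                 end_s: int, step_s: int = 3) -> List[Optional[float]]:
    """Place an irregularly sampled signal on a fixed grid by holding the latest value."""
    grid, idx, last = [], 0, None
    for t in range(0, end_s, step_s):
        while idx < len(times_s) and times_s[idx] <= t:
            last = values[idx]
            idx += 1
        grid.append(last)
    return grid


# MBP measured once per minute (illustrative values) becomes 20 identical
# entries per minute on the three-second grid.
mbp_grid = hold_on_grid([0, 60, 120], [80.0, 85.0, 90.0], end_s=180)
```

With a one-minute measurement interval, each measured value occupies 20 consecutive grid positions, which matches the "fixed for 20 timesteps" behavior described in the text.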
For the r-th operation, we collect K-dimensional vectors for T_r seconds, where 1 ≤ r ≤ R and R denotes the number of operations. As we assume that all vital signs are sampled every three seconds, the full data set forms an R × K × (T_r/3) tensor (i.e., R sequences of K × (T_r/3) matrices). Please note that T_r differs between operations because different operations have different durations. For our collected data, the number of patients (i.e., the number of operations) R is 102. The statistics of the collected data are summarized in Table 2. Figure 2 depicts a sample sequence of the collected data. Please note that the three blood pressure values are fixed for 20 timesteps (i.e., one minute) while the other values change.
We transform the sequence of K × (T_r/3) matrices into a shape suitable for real-time sequential prediction of future blood pressure as follows. First, for each timestep t, where 1 ≤ t ≤ T_r/3, we define the sequence of vital signs excluding blood pressures over the previous W timesteps as an input i_t; in other words, the input i_t is a (K − 3) × W matrix covering the timesteps [t − W + 1, t]. We also add the timestep t to i_t, so that i_t finally becomes a (K − 2) × W matrix. Second, we define the normalized blood pressure at the t-th timestep as a supplementary input si_t. The blood pressure is normalized by dividing it by 250; for example, a value of 125 becomes 125/250. We take only the latest observed blood pressure, rather than the blood pressure values for all W timesteps, because the inconsistent sampling rate of the blood pressure (e.g., every minute) may harm the results of the RNN. Third, we define the blood pressure value at timestep t + G as the output o_t. It is important that the output o_t is not the blood pressure at timestep t, but the future blood pressure at timestep t + G.
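The three steps above can be sketched in NumPy as follows. This is an illustrative reading of the transformation, not the authors' code: `make_triples` is a hypothetical name, the added timestep is assumed to be stored as one extra row holding the index of each column, and W = 60 and G = 20 are example values only.

```python
import numpy as np


def make_triples(signals, bp, W=60, G=20, bp_scale=250.0):
    """Build (i_t, si_t, o_t) triples for one operation.

    signals: array of shape (K - 3, N) with the non-blood-pressure vital signs
    bp:      array of shape (N,) with one blood pressure channel (e.g., MBP)
    """
    _, N = signals.shape
    inputs, supplementary, outputs = [], [], []
    # t is the 1-based timestep used in the text; valid t runs from W to N - G.
    for t in range(W, N - G + 1):
        window = signals[:, t - W:t]                      # timesteps [t - W + 1, t]
        steps = np.arange(t - W + 1, t + 1, dtype=float)  # assumed timestep row
        inputs.append(np.vstack([window, steps[None, :]]))  # shape (K - 2, W)
        supplementary.append(bp[t - 1] / bp_scale)        # normalized current BP
        outputs.append(bp[t + G - 1])                     # BP at timestep t + G
    return np.stack(inputs), np.array(supplementary), np.array(outputs)


# Synthetic example: K = 27, so 24 non-BP channels; N = 200 timesteps.
rng = np.random.default_rng(0)
signals = rng.normal(size=(24, 200))
bp = np.arange(1, 201, dtype=float)   # bp at 1-based timestep t equals t
i_all, si_all, o_all = make_triples(signals, bp)
```

For this synthetic case the number of triples is N − W − G + 1 = 121, and the first triple (t = 60) pairs the normalized value 60/250 with the target at timestep 80. The sketch does not handle sequences shorter than W + G.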
Through the steps above, for each timestep t we generate a triple consisting of the input i_t, the supplementary input si_t, and the output o_t. For example, assume that t = 100, W = 60, and G = 20. The input i_100 is then a (K − 2) × 60 matrix, and si_100 is a real number, the normalized blood pressure at the 100th timestep. The output o_100 is the blood pressure value at the 120th timestep. In other words, the model predicts the blood pressure one minute later (i.e., after 20 timesteps) given the vital signs observed over the last three minutes (i.e., 60 timesteps). As we generate the triple (i_t, si_t, o_t) for every timestep, the total number of triples for the r-th operation is T_r/3 − W − G + 1. We apply this transformation to the three blood pressures (i.e., MBP, DBP, and SBP) independently and obtain a set of triples for each of them. The transformation process is summarized in Figure 3. In short, the input consists of the vital signs i_t for W timesteps and the current blood pressure si_t, while the output is the future blood pressure o_t after G timesteps.
We observe that different operations exhibit different sequential patterns (e.g., different patterns of heart-rate change). To incorporate this diversity of sequential patterns, we design an RNN model followed by fully connected layers, as shown in Figure 4. Given the input i_t for the r-th operation, the W vital-sign vectors are fed sequentially into the RNN. Please note that our RNN has a bidirectional and hierarchical structure. The bidirectional RNN consists of a forward RNN and a backward RNN, which capture forward patterns and backward patterns, respectively. Sequential patterns may exist in both the forward and the backward direction, so we use the bidirectional RNN to incorporate both. Meanwhile, both the forward and the backward RNN are hierarchical, as each has two stacked layers. The first RNN layer may capture sequential correlations between different vital signs (e.g., the propofol rate and the heart rate), and the second RNN layer captures high-level sequential correlations between the correlations found at the first layer. Thanks to the bidirectional and hierarchical structure, the RNN memorizes high-level sequential patterns in both directions.
The forward RNN yields an R_2-dimensional summary vector h_F, and the backward RNN likewise gives an R_2-dimensional summary vector h_B. These two summary vectors are concatenated and the supplementary input si_t is appended, resulting in a (2 × R_2 + 1)-dimensional vector. The concatenated vector is passed to the fully connected layers, which are expected to find correlations between h_F and h_B. For example, if the RNN layers capture patterns such as 'increasing trend of heart rate' and 'fluctuating ETCO2', the fully connected layers may learn how positively or negatively these patterns are correlated. Finally, given the F_2-dimensional vector generated by the second fully connected layer, the output layer predicts the future blood pressure.
For the cell of the RNN layers, we adopt the Gated Recurrent Unit (GRU) [27], one of the most widely used RNN cells. The most important aspect of an RNN cell is that it remembers previously observed information. Although an RNN cell is theoretically capable of preserving all previous information, in practice it loses long-term information. The GRU addresses this issue with two types of gates (i.e., an update gate and a reset gate). These gates help to preserve important long-term information while discarding unnecessary information. Thanks to the GRU cells, the bidirectional RNN layers give two vectors (i.e., h_F and h_B) that capture important sequential patterns in both directions. For the two fully connected layers, we adopt the rectified linear unit (ReLU) [13] as the activation function, as it is known to mitigate the vanishing gradient problem. For the output layer, we use the mean squared error (MSE), which is widely used for regression, as the loss function.
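To make the data flow concrete, the following NumPy sketch runs one forward pass through a two-layer forward GRU stack, a two-layer backward GRU stack, two ReLU layers, and a linear output. It is an untrained illustration under several assumptions: the class and function names are hypothetical, the parameters are random with zero biases, and the GRU update uses the common convention h' = (1 − z) ⊙ h + z ⊙ h̃, which may differ from the implementation used in the study. The layer sizes are the ones reported in the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """One GRU cell with random, untrained parameters (illustration only)."""

    def __init__(self, n_in, n_hidden, rng):
        def mat(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))

        self.n_hidden = n_hidden
        self.Wz, self.Uz, self.bz = mat(n_hidden, n_in), mat(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wr, self.Ur, self.br = mat(n_hidden, n_in), mat(n_hidden, n_hidden), np.zeros(n_hidden)
        self.Wh, self.Uh, self.bh = mat(n_hidden, n_in), mat(n_hidden, n_hidden), np.zeros(n_hidden)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)   # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)   # reset gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)
        return (1.0 - z) * h + z * h_cand                  # blend old state and candidate


def run_stack(cells, seq):
    """Run stacked GRU layers over seq of shape (W, n_in); return the top layer's last state."""
    for cell in cells:
        h = np.zeros(cell.n_hidden)
        states = []
        for x in seq:
            h = cell.step(x, h)
            states.append(h)
        seq = np.array(states)   # the next layer reads this layer's state sequence
    return seq[-1]


def predict(i_t, si_t, forward, backward, fc_layers, output_layer):
    """i_t: (K - 2, W) input matrix; si_t: normalized current blood pressure."""
    seq = i_t.T                                   # one (K - 2)-vector per timestep
    h_f = run_stack(forward, seq)                 # forward summary h_F
    h_b = run_stack(backward, seq[::-1])          # backward summary h_B
    v = np.concatenate([h_f, h_b, [si_t]])        # (2 * R_2 + 1)-dimensional vector
    for weights, bias in fc_layers:
        v = np.maximum(0.0, weights @ v + bias)   # fully connected layer with ReLU
    w_out, b_out = output_layer
    return float(w_out @ v + b_out)


rng = np.random.default_rng(0)
K, W, R1, R2, F1, F2 = 27, 60, 15, 15, 100, 50
forward = [GRUCell(K - 2, R1, rng), GRUCell(R1, R2, rng)]
backward = [GRUCell(K - 2, R1, rng), GRUCell(R1, R2, rng)]
fc_layers = [(rng.normal(0, 0.1, (F1, 2 * R2 + 1)), np.zeros(F1)),
             (rng.normal(0, 0.1, (F2, F1)), np.zeros(F2))]
output_layer = (rng.normal(0, 0.1, F2), 0.0)
i_t = rng.normal(size=(K - 2, W))
y_hat = predict(i_t, 0.5, forward, backward, fc_layers, output_layer)
```

Because the weights are random, `y_hat` carries no clinical meaning; the sketch only shows how the two gates act inside a cell and how the summary vectors and si_t are combined before the fully connected layers.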

Results
We set W = 60 and G = 60, which implies that we predict the blood pressure of three minutes later, given the observed vital signs for latest three minutes. Please note that we use the vital data only obtained between the induction of anesthesia and the beginning of operation; we do not employ any other information (e.g., age, sex, base blood pressures, ASA). The total number of transformed data is 26,887. Each dimension of the transformed data is normalized except for the timestep value. The normalization, of course, is done with only training data. We take 10-fold cross-validation and compute mean absolute error (MAE). We conduct three independent experiments: SBP prediction, MBP prediction, and DBP prediction. All experiments are performed using a computer with eight Central Processing Units (CPU) of i7-7700 3.6 GHz and two NVIDIA GeForce 1080 Ti. The proposed model is implemented with Python3 language with Google TensorFlow packages.
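The evaluation protocol (normalization fitted on training data only, 10-fold cross-validation, MAE) can be sketched as below. The sketch makes assumptions that the text does not specify: z-score normalization, a random shuffle before splitting, and synthetic data. The per-fold "prediction" is a placeholder constant, not the proposed model.

```python
import numpy as np


def kfold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)


def fit_scaler(x_train):
    """Compute per-feature mean and standard deviation from training data only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0          # avoid division by zero for constant features
    return mean, std


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


rng = np.random.default_rng(0)
X = rng.normal(50.0, 10.0, size=(200, 4))   # synthetic features
y = rng.normal(90.0, 15.0, size=200)        # synthetic blood pressure targets
folds = kfold_indices(len(X), k=10)

fold_mae = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    mean, std = fit_scaler(X[train_idx])
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std      # test fold reuses training statistics
    # Placeholder predictor: the training-set mean target (not the paper's model).
    y_pred = np.full(len(test_idx), y[train_idx].mean())
    fold_mae.append(mean_absolute_error(y[test_idx], y_pred))

cv_mae = float(np.mean(fold_mae))
```

Fitting the scaler inside each fold keeps statistics of the held-out samples out of training, which is the point of normalizing with training data only.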
The training recipe and parameter settings are as follows. The dimensions of the RNN layers, R_1 and R_2, are both 15, and the dimensions of the fully connected (FC) layers, F_1 and F_2, are 100 and 50, respectively. We applied dropout [28] with a keep probability of 0.1 to the RNN layers, and DeCov [29] with a weight of 0.1 to the FC layers. Both dropout and DeCov are known to have a regularization effect, which helps to prevent overfitting. In terms of parameter initialization, the weight matrices of the FC layers are initialized using He initialization [30], and their biases are initialized to zero. The weight matrices of the RNN layers are initialized using Xavier initialization [31], and their initial bias value is one. We use the Adam optimizer [32] with an initial learning rate of 0.001 to train the model parameters, and the number of epochs is 60. In the training phase, the model computes a predicted blood pressure by feed-forward propagation; the RNN layers generate two vectors given an input, and the fully connected layers take the concatenation of the two vectors as an input and generate an output. The model computes a cost (error) by comparing the predicted blood pressure with the true blood pressure, and all weights and bias values are updated via the back-propagation algorithm. For each epoch, the feed-forward and back-propagation steps are conducted over all data with a mini-batch as the unit. In this paper, we set the mini-batch size to 100.
Table 3 summarizes the mean and standard deviation of the absolute errors obtained from the three predictions. A small mean and standard deviation indicate that the model predicts the blood pressure accurately. Among the three predictions, the DBP prediction is the most accurate while the SBP prediction exhibits the worst results. Figures 5-7 depict histograms of errors, where the horizontal axis represents error bins; for example, a bin [1-2) represents the range 1 ≤ e < 2, where e indicates an error.
The three figures seem to have a form of Gaussian distribution, and they generally follow the trend of the true blood pressures. For instance, in Figure 6, the peak of distribution is located around the interval [0-1), which implies that the predicted MBP values are mostly correct compared to the true MBP values. However, the shapes of three figures are a bit left skewed, so the overall mean is between 8.2 mmHg and 11.1 mmHg while the standard deviation is between 8.7 mmHg and 12.7 mmHg. Figure 8 shows Bland-Altman diagrams of the three blood pressures. The diagrams imply that errors tend to grow when the average of a predicted blood pressure and a true blood pressure is high. This can be interpreted that it is hard to correctly predict the true blood pressure when the average is abnormally high because such cases were barely seen in the data.
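For readers unfamiliar with Bland-Altman diagrams, the quantities plotted can be computed as in the sketch below. The function name and the sample values are hypothetical; the 1.96 × SD limits of agreement are the conventional choice and are assumed here, since the text does not state which limits Figure 8 uses.

```python
import numpy as np


def bland_altman(predicted, true):
    """Return per-sample averages and differences, the bias, and the limits of agreement."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    average = (predicted + true) / 2.0      # horizontal axis of the diagram
    difference = predicted - true           # vertical axis of the diagram
    bias = float(difference.mean())
    sd = float(difference.std(ddof=1))
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return average, difference, bias, limits


# Illustrative values only (mmHg); not taken from the study's data.
average, difference, bias, limits = bland_altman([100.0, 110.0, 120.0],
                                                 [98.0, 112.0, 119.0])
```

A trend in `difference` as `average` increases corresponds to the observation that errors tend to grow at high blood pressures.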

Discussion
We investigated whether an RNN could predict future blood pressure (e.g., 3 min ahead) during the anesthesia-induction period. We found that our model could predict 3-min-ahead blood pressure with an absolute error of around 10 mmHg for each of SBP, DBP, and MBP. Although this error is currently too large for clinicians to use our model as a decision support tool for hemodynamic management during anesthesia, we suggest that it is feasible for an RNN to predict future blood pressure using only features obtained over relatively short periods from anesthesia monitors, the ventilator, and the drug infusion pump.
We examine plots of the predicted and true blood pressure. To do so, we trained the model with 90% of the shuffled data and used the remaining data for examination. Figure 9 shows three plots of SBP prediction, where the two upper examples are relatively well predicted cases and the bottom example is a poorly predicted case. Please note that the model gives its first prediction at the 120th timestep because it observes three minutes of sequential data (i.e., 60 timesteps) and predicts three minutes ahead (i.e., 60 timesteps). Because the SBP is sampled every minute, the plot of the true SBP looks like a staircase. Generally speaking, the three plots in Figure 9 show that the model predicts the trend of future blood pressure well; it captures when the SBP will rise, stay level, or fall. Interestingly, as shown in the second plot of Figure 9, the predicted SBP fluctuates along with the true SBP even though it is a prediction for three minutes later. On the other hand, in the bottom plot, the predicted SBP follows the trend of the true SBP but there is a steady gap between them. We believe that this gap will be reduced if we collect more data covering more varied patterns of blood pressure.
Among the hemodynamic changes occurring during surgery, hypotension is known to be frequent and has been reported to cause adverse outcomes after surgery [1]. The definition of intraoperative hypotension varies among investigators, ranging from an MBP of 55 mmHg to 65 mmHg. In [33], it was revealed that MBP less than 60 mmHg for 11 to 20 min and MBP less than 55 mmHg for more than 10 min are associated with acute kidney injury. The mean absolute error of the MBP predicted by our proposed model was 9 mmHg, which may not be helpful to clinicians in some critical situations. For example, if the actual MBP is 58 mmHg, then the MBP predicted by our model may range from 49 mmHg to 67 mmHg. Such variation in the predicted MBP might lead to two opposite courses of management.
If the predicted MBP is 49 mmHg, one will explore possible causes of hypotension, whereas if the predicted MBP is 67 mmHg, one will simply keep observing the blood pressure and take no action. Of course, there are also cases in which the predicted MBP is helpful. Assume that the actual MBP is around 75 mmHg, so that the predicted MBP may range from 66 mmHg to 84 mmHg. This range is generally not harmful to most surgical patients. The Association for the Advancement of Medical Instrumentation (AAMI) established standards for the validation of automatic arterial pressure monitoring. A device is considered acceptable if the error (e.g., mean absolute error) is not greater than 5 mmHg and the standard deviation of the errors is not greater than 8 mmHg for SAP and DAP [34]. In this regard, the mean absolute errors of our model for SBP and MBP were 11 mmHg and 9 mmHg, respectively, which do not meet the AAMI standards. However, there is no consensus on the clinically acceptable accuracy of blood pressure prediction, because the AAMI standards are intended for the clinical validation of new automated blood pressure devices.
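The arithmetic behind these examples can be written out as a short sketch. The function names are hypothetical; the thresholds (5 mmHg and 8 mmHg) and the numbers used below are the ones quoted in this paper, and treating the mean absolute error as a symmetric band around the actual value follows the paper's own simplification.

```python
def predicted_range(actual_mmhg: float, mae_mmhg: float) -> tuple:
    """Band of plausible predictions: actual value plus or minus the mean absolute error."""
    return actual_mmhg - mae_mmhg, actual_mmhg + mae_mmhg


def meets_aami(mean_error_mmhg: float, sd_mmhg: float,
               max_mean: float = 5.0, max_sd: float = 8.0) -> bool:
    """Check the two AAMI criteria as quoted in the text."""
    return abs(mean_error_mmhg) <= max_mean and sd_mmhg <= max_sd


borderline = predicted_range(58, 9)    # actual MBP near the hypotension threshold
comfortable = predicted_range(75, 9)   # actual MBP well above the threshold

# Even the smallest reported error statistics (MAE 8.2 mmHg, SD 8.7 mmHg)
# exceed both thresholds.
best_case_ok = meets_aami(8.2, 8.7)
```

The first band straddles the 55 to 65 mmHg range used to define hypotension, which is why the same actual value can lead to opposite management decisions, while the second band stays above it.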
One may argue that there might be better parameter settings or a better model structure. The training recipe and parameter settings used in this paper were obtained via grid search. We varied the number of RNN layers and fully connected layers, and tried various dimensions. Part of the grid-search result is shown in Table 4, where the relative change of MAE is computed against the MAE of 11.056 from Table 3; the relative change is (current MAE − 11.056)/11.056 × 100, so a greater value means a worse result. The bidirectional RNNs generally appear to work better than the unidirectional RNNs. The FC dimensions represent F_1 and F_2; [100, 50] means that F_1 and F_2 are 100 and 50, respectively, and [100] means that a single FC layer with F_1 = 100 is used. Using two FC layers appears to be much better than using a single FC layer, and the regularization methods (i.e., dropout and DeCov) help to prevent the model from overfitting.
This study aims at real-time prediction of blood pressure, so one may ask 'Does this model really work in real time?', because our model has a fairly complex structure (i.e., a composite of RNN and fully connected layers). We found that the average elapsed time for predicting a batch of 100 unseen samples is about 26.56 milliseconds. As our model must give a prediction every three seconds, it is clearly fast enough for real-time prediction.
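Both calculations in this part of the discussion are simple enough to spell out. In the sketch below, the function name is hypothetical and the MAE of 12.0 is an invented example value, while 11.056, 26.56 ms, the batch size of 100, and the three-second interval come from the text.

```python
def relative_change_percent(mae: float, reference_mae: float = 11.056) -> float:
    """Relative change of MAE in percent; positive values mean a worse result."""
    return (mae - reference_mae) / reference_mae * 100.0


# A hypothetical configuration with an MAE of 12.0 mmHg.
example_change = relative_change_percent(12.0)

# Real-time budget: one prediction is needed every three seconds.
batch_latency_ms = 26.56
batch_size = 100
sampling_interval_ms = 3000.0
per_sample_ms = batch_latency_ms / batch_size
budget_used = batch_latency_ms / sampling_interval_ms
```

Even if a whole batch of 100 samples had to be processed at every timestep, the measured latency would use less than 1% of the three-second interval.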
Although our model shows potential as a real-time predictor of future blood pressure, there is room for improvement, especially regarding the error. For the SBP prediction, the mean absolute error of 11.056 mmHg indicates that much work remains. The main reason is that our data come from only 102 operations, which is not enough to cover the diverse patterns of operations. Thus, this study should be regarded as a first step that shows the feasibility of real-time prediction of future blood pressure. We believe that our model will improve further as we keep collecting more data. Another, minor limitation of our work is that the model gives its first result only after some timesteps (e.g., 120 timesteps), which could be addressed by collecting the vital signs before the induction of anesthesia.

Conclusions
In this study, we prepared and preprocessed the vital signs and designed a recurrent neural network for real-time prediction of future blood pressure. Using histograms of absolute errors, we demonstrated that the model has the potential to predict future blood pressure, but we also observed its limitations (e.g., the mean and standard deviation of the absolute errors). Using plots of predicted blood pressure, we showed that the model can foresee the trend of blood pressure. We also showed that our model works in real time by measuring the average elapsed time for prediction. This study is not a final step and the model is not ready for deployment, but it shows the feasibility of an RNN-based model for predicting future blood pressure. We believe that this study will help to reduce emergency situations by warning the medical team before they happen. For example, if our model reports that a future SBP will be low (e.g., 60 mmHg), then the medical team may administer a vasopressor to prevent potential hypotension. To improve the performance of the proposed model, we will keep collecting more data and searching for better model structures. Furthermore, we will investigate other useful devices (or sensors) as well as the combination of clinical values (e.g., EMR) with the vital signs. We will also extend our study to develop a real-time system for intraoperative prediction of future blood pressure.