1. Introduction
Bleeding is a common surgical complication. Acute blood loss is a life-threatening physiological condition, and the resulting hemorrhagic shock is one of the most common causes of mortality for surgical patients [1, 2]. Beyond the assessment of fluid responsiveness, accurate estimation of blood loss volume (BLV) is critical to appropriately resuscitate surgical patients with fluids and, if necessary, blood. Monitoring and estimating BLV could also be used as a surgical quality indicator to improve the care of surgical patients and reduce patient morbidity, mortality, and health care costs [3, 4, 5]. Across the entire clinical patient pathway, potential benefits of estimating surgical BLV include, but are not limited to, surgical quality review, optimization of anesthesia, transfusion therapy, and surgical management of actively bleeding and coagulopathic patients [4].
BLV estimation requires a better solution. Three main types of BLV estimation methods are currently used in clinics: visual estimation, weighing compresses and gauzes, and BLV computation formulas using hematocrit (HCT) values. Among these, visual estimation remains the most frequently used: physicians estimate BLV visually based on prior experience. However, this method has proven inaccurate and unreliable, often biased toward overestimating modest blood loss and underestimating large blood loss. This is especially true of inexperienced surgeons [6, 7]. Although some improvements to visual estimation, including referring to a nomogram, have been proposed, estimation bias remains a concern in clinical implementation [8]. The second most commonly used method, weighing compresses and gauzes, is widely used in surgical operating theaters and delivery rooms. This method performs better than visual estimation but is slow, cumbersome, and difficult to apply during large and rapid blood loss. The third method is computational formulas using HCT values, based on the principle that HCT decreases proportionally with the reduction in blood volume, so that BLV can be derived by comparing the current HCT value with its baseline. Based on this principle, several BLV estimation formulas have been proposed that incorporate the compartmental equilibration corrections necessary to improve estimation performance [9, 10, 11].
The need for reliable BLV estimation methods is rising, and the use of continuously monitored vital signs has attracted attention. Invasive arterial blood pressure (ABP) is the gold standard for hemorrhage identification in clinics. However, owing to compensatory responses, blood pressure (BP) decreases significantly only after a large blood loss has already occurred, especially in young and healthy subjects [12]. Surgical or traumatic hemorrhage often occurs acutely. Because of this poor sensitivity, BP, and the ABP signal in particular, may not yield an accurate estimate of BLV. Other vital signs, including heart rate and blood oxygen saturation, also have low sensitivity and specificity in BLV estimation.
The noninvasive photoplethysmography (PPG) signal is another potential surrogate for blood volume variations because of its ability to detect local intravascular volume changes. Previous studies have reported that PPG-amplitude-derived features, such as the pleth variability index and amplitude modulations, have the potential for measuring spontaneous blood loss [13, 14, 15]. A prior study used the PPG waveform to detect hemorrhage but not to estimate BLV [16]. One obstacle to using the PPG waveform for tracking hemodynamic changes is the variability that exists among subjects. Simply correcting for moving baselines or amplitudes of the PPG signal does not fully address subject-to-subject variability [17]. Current early hemorrhage detection studies are based on machine learning approaches, in which normalizing features to their baseline values can mitigate large subject-to-subject variability [16]. Pinsky et al. reported the effectiveness of normalizing vital-sign-derived features to their baseline values [18]. It is thus hypothesized that normalized features derived from vital signs could be potential predictors for tracking hemorrhage and BLV.
In this study, we use an experimental design that simulates surgical hemorrhage in pigs. Two critical vital signs, ABP and PPG, were continuously collected and their waveform features were extracted. Two machine learning methods, the least absolute shrinkage and selection operator (LASSO) regressor and the random forest (RF) regressor, were applied to the extracted features to train the BLV estimation models. Three different kinds of models are presented: BLV estimation using solely PPG, solely ABP, and the combination of PPG and ABP. Preliminary results suggest that our proposed models could estimate surgical BLV well in anesthetized pigs and that normalized PPG features are superior to ABP features in accurately estimating BLV in the early stage of hemorrhage, whereas normalized ABP features could enhance model performance as BLV increases.
2. Methods
2.1. Research Subjects and Hemorrhagic Surgery Protocol
Forty Yorkshire pigs (31.82 ± 3.52 kg) were used. Animals were anesthetized, intubated and ventilated on volume control mode, and instrumented. The PPG signal was monitored in the tail (Masimo, Irvine, CA, USA), and the invasive ABP was monitored (LiDCOplus™, LiDCO Ltd., London, UK) in the femoral artery. After surgical preparation, subjects were observed for 30 min without any further manipulation to establish a baseline. Animals were then bled using a roller pump (Masterflex L/S easy-load II pump, Cole-Parmer; Vernon Hills, IL, USA) at a fixed rate of 20 mL/min until their mean arterial pressure (MAP) reached 30 mmHg. When subjects were in hemorrhagic shock, fluid resuscitation was performed immediately by femoral vein infusion.
Figure 1 depicts a schematic representation of the hemorrhagic shock protocol. Further details of these experiments have been previously presented [19, 20]. All experiments were approved by the Institutional Animal Care and Use Committee of the University of Pittsburgh (Protocol Numbers 13061614 and 130923382).
The PPG and ABP signals were collected from the beginning of the baseline period at a sampling rate of 250 Hz. The reference BLV was calculated from the time elapsed since the start of the bleed and the pump bleeding rate, according to data in the surgical notes. The initial blood volume of each subject was estimated from its weight. HCT data were measured from blood draws at the baseline, start of bleeding, start of resuscitation, and every 30 min otherwise. Only data accrued prior to resuscitation were used for the current analysis.
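The reference BLV computation described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the weight-based blood volume factor (65 mL/kg) is an assumed placeholder, since the text states only that initial blood volume was estimated from weight.

```python
PUMP_RATE_ML_PER_MIN = 20.0  # fixed bleeding rate stated in the protocol
ASSUMED_BLOOD_VOLUME_ML_PER_KG = 65.0  # assumption for illustration, not from the source


def reference_blv_ml(seconds_since_bleed_start: float,
                     rate_ml_per_min: float = PUMP_RATE_ML_PER_MIN) -> float:
    """Blood withdrawn so far at a fixed pump rate; zero before the bleed starts."""
    return max(0.0, seconds_since_bleed_start) / 60.0 * rate_ml_per_min


def estimated_initial_blood_volume_ml(
        weight_kg: float,
        ml_per_kg: float = ASSUMED_BLOOD_VOLUME_ML_PER_KG) -> float:
    """Weight-based estimate of the initial blood volume."""
    return weight_kg * ml_per_kg


def blood_loss_fraction(blv_ml: float, weight_kg: float) -> float:
    """BLV expressed as a fraction of the estimated initial blood volume."""
    return blv_ml / estimated_initial_blood_volume_ml(weight_kg)
```

For example, ten minutes into the bleed, `reference_blv_ml(600)` gives 200 mL at the stated pump rate.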
2.2. Feature Derivation and Normalization
PPG and ABP signals were first reviewed visually by clinical experts to judge signal quality and to distinguish physiological changes in the vital signs from noise and artifacts. Signal quality was acceptable for most of the data, so no signal preprocessing was applied; data containing artifacts were dropped. Continuous signals were divided into overlapping data frames of 1 min duration, updated every 30 s, and features were computed over these 1 min windows. We selected this window length because a 1 min segment of vital signs is long enough to capture significant physiological changes and provides adequate waveform for feature extraction.
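The framing scheme above (1 min windows updated every 30 s on a 250 Hz signal) can be sketched as follows. The function name and the choice to drop a trailing partial window are assumptions made for illustration.

```python
from typing import List, Tuple


def frame_bounds(n_samples: int, fs_hz: int = 250,
                 window_s: int = 60, step_s: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping analysis windows.

    Only complete windows are returned; a trailing partial window is dropped
    (an assumption, the source does not say how the signal tail was handled).
    """
    window = window_s * fs_hz  # 15,000 samples per 1 min window at 250 Hz
    step = step_s * fs_hz      # 7,500 samples between window starts
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, step)]


# Example: a 150 s recording yields windows starting at 0, 30, 60, and 90 s.
bounds = frame_bounds(250 * 150)
```

Each pair can be used to slice the signal directly, for example `signal[start:end]`, before feature extraction.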
Graphical features, including amplitude and time-domain features extracted from the PPG and ABP waveforms and their first derivatives, have been widely used for tracking hemodynamics [21, 22]. Our previous studies also used features derived from vital signs together with machine learning techniques to detect hemorrhage, and reported the ability of PPG-waveform-derived features, heart rate, and BP-derived features to reflect physiological changes in subjects suffering from blood loss [18, 23, 24]. Similar features were extracted as the feature metrics in this study. A detailed explanation of these feature metrics and their graphical diagrams is presented in Appendix A. For the PPG signal, 18 features were derived. For the ABP signal, 6 features were derived using the ABP-waveform-derived systolic blood pressure (SBP) and diastolic blood pressure (DBP). Six pulse rate (PR)-derived features were also included. The final feature vector of the ABP waveform included 12 features (6 from the ABP signal and 6 from the PR), and the feature vector of the PPG waveform included 24 features (18 from the PPG waveform and 6 from the PR). A feature vector computed over a single 1 min window constituted an observation.
Two different feature-rescaling approaches were used to normalize features to the subjects’ corresponding baselines. Given $N$ total subjects, baseline data for the $j$th subject are designated $B_j$. The data in the bleeding period are $H_j$. Feature vectors extracted from $B_j$ and $H_j$ are $F_{B_j}$ and $F_{H_j}$, respectively. $F_j$ is the set of all feature vectors of subject $j$, and $f_j^i$ is the $i$th feature vector in $F_j$.
The first approach is called person-specific feature normalization. For the $i$th feature vector of subject $j$, we first compute each feature’s mean value over that subject’s baseline as the center value and then normalize the subject’s features by dividing each feature by its center value, as shown in Equation (1), where the mean and the division are taken element-wise:

$$\hat{f}_j^i = \frac{f_j^i}{\operatorname{mean}\left(F_{B_j}\right)} \tag{1}$$

$\hat{f}_j^i$ is the subject-normalized feature vector of $f_j^i$.
The second feature normalization approach is called group-specific feature normalization. For the $i$th feature vector of subject $j$, the center value in this approach is the mean of the feature values over the baselines of all subjects, and the normalized feature vector $\tilde{f}_j^i$ is computed as in Equation (2):

$$\tilde{f}_j^i = \frac{f_j^i}{\operatorname{mean}\left(\bigcup_{k=1}^{N} F_{B_k}\right)} \tag{2}$$
With both approaches, features are expressed as a ratio to a baseline value, which mitigates between-subject differences. The advantage of group-specific normalization lies in its applicability even in the absence of a personal baseline.
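The two normalization schemes described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: all function names are hypothetical, feature vectors are plain lists of floats, and the group center is computed by pooling every baseline feature vector from all subjects (the text could also be read as a mean of per-subject means).

```python
from statistics import fmean
from typing import Dict, List

Rows = List[List[float]]  # one feature vector per 1 min window


def column_means(rows: Rows) -> List[float]:
    """Per-feature mean across a set of feature vectors."""
    return [fmean(column) for column in zip(*rows)]


def normalize(rows: Rows, center: List[float]) -> Rows:
    """Divide every feature by its center value (element-wise ratio to baseline)."""
    return [[value / c for value, c in zip(row, center)] for row in rows]


def person_specific(baseline_rows: Rows, rows: Rows) -> Rows:
    """Center on the subject's own baseline mean."""
    return normalize(rows, column_means(baseline_rows))


def group_center(baselines_by_subject: Dict[str, Rows]) -> List[float]:
    """Center on the pooled baseline of a group of subjects (assumed pooling)."""
    pooled = [row for rows in baselines_by_subject.values() for row in rows]
    return column_means(pooled)


# Toy data: two subjects, two features each.
baselines = {
    "pig_a": [[2.0, 10.0], [4.0, 30.0]],
    "pig_b": [[1.0, 20.0], [5.0, 20.0]],
}
person_norm = person_specific(baselines["pig_a"], [[6.0, 10.0]])
group_norm = normalize([[3.0, 40.0]], group_center(baselines))
```

A normalized value of 1.0 means the feature equals its baseline center; values above or below 1.0 express the relative change during bleeding.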
2.3. Model Development
We derived models for BLV estimation using a supervised machine learning framework. Two different machine learning methods, an RF regressor and a LASSO regressor, were trained in this study. RF is an ensemble method that combines multiple decision trees to reduce overfitting, and LASSO is a linear method that performs both regression and feature selection through L1 regularization. Data used for model training were observations from both baseline data and bleeding data. The label of each observation was 0 for baseline feature vectors, or the reference BLV at the corresponding time for feature vectors computed from bleeding data. We then used the trained models to estimate the BLV: the input of a trained model was an observation, and the output was the estimated BLV at that time. To avoid model overfitting, we conducted our experiments using a leave-one-animal-out cross-validation (CV) protocol, in which observations from one subject were designated as the test set and observations from the other thirty-nine subjects formed the training set. Our models were developed using the scikit-learn toolbox in the Python 3.6 environment [25]. For the LASSO model, the alpha parameter was the hyperparameter chosen for optimization. For the RF model, we optimized the hyperparameters n_estimators and max_depth by grid search.
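A leave-one-animal-out protocol with the two regressors might be set up in scikit-learn as sketched below. This is not the study's code: the data are synthetic, the hyperparameter grids are hypothetical, and tuning inside each training fold with an inner `GroupKFold` is an assumed design choice, since the text does not describe how tuning was nested within the CV loop.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, GroupKFold, LeaveOneGroupOut

# Synthetic stand-in data: 6 animals, 30 observations each, 4 normalized features.
rng = np.random.default_rng(0)
n_animals, n_obs, n_features = 6, 30, 4
X = rng.normal(1.0, 0.2, size=(n_animals * n_obs, n_features))
groups = np.repeat(np.arange(n_animals), n_obs)  # animal ID per observation
y = 500.0 * (1.2 - X[:, 0]) + rng.normal(0.0, 10.0, size=len(X))  # fake BLV in mL


def loao_predictions(estimator, param_grid, X, y, groups):
    """Predict each animal's BLV with a model trained on all other animals."""
    predictions = np.empty_like(y)
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        # Hyperparameters are tuned on the training animals only (assumption).
        search = GridSearchCV(estimator, param_grid, cv=GroupKFold(n_splits=3),
                              scoring="neg_mean_absolute_error")
        search.fit(X[train], y[train], groups=groups[train])
        predictions[test] = search.predict(X[test])
    return predictions


lasso_pred = loao_predictions(Lasso(max_iter=10000),
                              {"alpha": [0.01, 0.1, 1.0]}, X, y, groups)
rf_pred = loao_predictions(RandomForestRegressor(random_state=0),
                           {"n_estimators": [20, 50], "max_depth": [3, 5]},
                           X, y, groups)
```

Grouping the splits by animal keeps all observations from the held-out subject out of both training and tuning, which is the point of the leave-one-animal-out design.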
We finally proposed six different models on different feature sets: (1) a model using PPG features and person-specific feature normalization (PPG Personal Model); (2) a model using PPG features and group-specific feature normalization (PPG Group Model); (3) a model using ABP features and person-specific feature normalization (ABP Personal Model); (4) a model using ABP features and group-specific feature normalization (ABP Group Model); (5) a model using the combination of ABP and PPG features and person-specific feature normalization (ABP&PPG Personal Model); (6) a model using the combination of ABP and PPG features and group-specific feature normalization (ABP&PPG Group Model). Twelve features were used in the ABP models, 24 features in the PPG models, and 30 features in the ABP&PPG models. Note that, for the personal models, features were normalized using the person-specific method before model training and testing. For the group models, in each CV epoch, features in the training set were first normalized using the group-specific method, and the center values obtained from the training set were then applied to normalize the features in the test set. The mean and standard deviation (SD) of the difference between actual and predicted BLV, computed over the observations from the hold-out subject at all available time points, were used as the model’s evaluation metric.
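The evaluation metric described above (mean ± SD of the estimation error) can be computed as in the following sketch. The sign convention (predicted minus actual) and the use of the sample SD are assumptions; the function name is hypothetical.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def bias_and_sd(actual_ml: Sequence[float],
                predicted_ml: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of the estimation error (predicted minus actual)."""
    errors = [p - a for a, p in zip(actual_ml, predicted_ml)]
    return fmean(errors), stdev(errors)


# Toy example: errors of +10, -10, +20, -20 mL give zero bias but nonzero spread.
bias, sd = bias_and_sd([0.0, 100.0, 200.0, 300.0], [10.0, 90.0, 220.0, 280.0])
```

Reporting both numbers separates systematic over- or underestimation (the bias) from the spread of individual errors (the SD); a model can have near-zero bias and still be imprecise.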
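We also compared our model predictions to three traditional HCT-based blood loss estimation formulas, proposed by Ward et al., Bourke et al., and Gross et al., at the time points when HCT values were available. These formulas are given by the following equations [9, 10, 11]:

$$\mathrm{BLV}_{\mathrm{Ward}} = \mathrm{EBV} \times \ln\left(\frac{\mathrm{HCT}_0}{\mathrm{HCT}_t}\right) \tag{3}$$

$$\mathrm{BLV}_{\mathrm{Bourke}} = \mathrm{EBV} \times \left(\mathrm{HCT}_0 - \mathrm{HCT}_t\right) \times \left(3 - \mathrm{HCT}_{\mathrm{mean}}\right) \tag{4}$$

$$\mathrm{BLV}_{\mathrm{Gross}} = \mathrm{EBV} \times \frac{\mathrm{HCT}_0 - \mathrm{HCT}_t}{\mathrm{HCT}_{\mathrm{mean}}} \tag{5}$$

where $\mathrm{EBV}$ is the subject’s estimated initial blood volume, $\mathrm{HCT}_0$ is the HCT at baseline, $\mathrm{HCT}_t$ is the HCT in the bleeding stage, and $\mathrm{HCT}_{\mathrm{mean}}$ is the mean HCT between the baseline and the bleeding stage.
Finally, the features’ explanatory contributions to the BLV estimation models were investigated using the Gini-impurity index for the RF model and the magnitude of the feature coefficients in the prediction equation for the LASSO model [26]. Because coefficients of the LASSO model may be negative, we computed the mean absolute value of each feature’s importance across CV folds and ranked the features accordingly.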
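For orientation, the three HCT-based formulas referenced above are sketched here in their commonly cited forms. Whether these match the exact expressions used in the study is an assumption, as are the function names and the treatment of HCT as a fraction (for example 0.30 rather than 30%).

```python
import math


def ward_blv(ebv_ml: float, hct0: float, hct_t: float) -> float:
    """Logarithmic form commonly attributed to Ward et al."""
    return ebv_ml * math.log(hct0 / hct_t)


def bourke_blv(ebv_ml: float, hct0: float, hct_t: float) -> float:
    """Form commonly attributed to Bourke et al.; HCT given as a fraction."""
    hct_mean = (hct0 + hct_t) / 2.0
    return ebv_ml * (hct0 - hct_t) * (3.0 - hct_mean)


def gross_blv(ebv_ml: float, hct0: float, hct_t: float) -> float:
    """Linear form commonly attributed to Gross et al."""
    hct_mean = (hct0 + hct_t) / 2.0
    return ebv_ml * (hct0 - hct_t) / hct_mean


# Illustrative values only: 2000 mL estimated blood volume, HCT falling 0.30 -> 0.24.
ebv, h0, ht = 2000.0, 0.30, 0.24
estimates = {
    "ward": ward_blv(ebv, h0, ht),
    "bourke": bourke_blv(ebv, h0, ht),
    "gross": gross_blv(ebv, h0, ht),
}
```

All three return zero when the current HCT equals the baseline, and they depend on a recent blood draw, so they cannot be updated continuously the way waveform-based models can.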
4. Discussion
We proposed novel models toward accurate BLV estimation for surgical hemorrhage using continuously monitored vital signs and a supervised machine learning framework. Models built using only features from noninvasive PPG waveforms showed on average minimal bias and good performance (11.9 ± 156.2 mL estimation error), yet tended to significantly underestimate blood loss exceeding 30% of the initial blood volume. This performance was comparable to models using only the invasive ABP waveform (6.5 ± 161.5 mL). However, if the ABP and PPG signals were combined, model performance was significantly improved (7.0 ± 139.4 mL estimation error), although this improvement is likely not clinically significant. We also found that PPG models perform as well as invasive ABP models in the early stage of bleeding, but adding ABP improves performance with large bleeds.
This represents, to our knowledge, the first attempt at predicting quantitative blood loss from noninvasive measurements. The PPG waveform has previously been used to evaluate fluid responsiveness, where a pleth variability index of 15% is associated with fluid responsiveness in subjects receiving positive-pressure ventilation in sinus rhythm [15, 28]. Most patients are not monitored invasively using an indwelling arterial catheter, and exploring the conditions under which data from the PPG waveform might provide useful information continues to be warranted. Without some form of signal normalization, the PPG waveform could not predict blood loss. Because of the baseline variability of PPG waveforms across subjects, the mean values of PPG-derived features varied widely among subjects (data not shown). Group-based normalization improves the situation substantially and opens the possibility of deploying such algorithms in situations where a person-specific baseline is not available. However, group-based normalization was also associated with significant bias, as expected, since any given subject may be quite distinctive. Person-based normalization improves the situation further in that bias is significantly reduced. However, person-specific normalization must be interpreted with the caveat that the baseline is assumed to represent a period of stability. We have not specifically explored situations where this assumption is relaxed and the baseline simply represents a group of observations from some time in the past. In such a setting, a blood loss model could possibly identify ongoing blood loss from an unknown initial state. This remains to be explored.
Unlike noninvasive PPG, invasive ABP remains the clinical gold standard for monitoring hemorrhage. In our study, both ABP group-specific models and ABP person-specific models achieved a reasonable bias and estimation error in estimating BLV. We also found that person-specific normalization improved the performance of ABP models by less than 20% compared to group-specific normalization. We note that, in our experimental setting, absolute values of ABP were used to define shock (MAP of 30 mmHg) rather than tissue-perfusion-based measurements. Thus, there could be some bias favoring ABP models. We also note that we used the continuous invasive ABP waveform from an arterial line rather than a noninvasive, continuously monitored BP, such as can be obtained with commercially available, yet rarely used, technology. It remains unknown whether our findings regarding ABP models would translate to noninvasive BP monitoring.
An interesting finding of our study is that PPG models performed well in the early stage of bleeding. We believe this may be related to the autonomic compensatory response, which attempts to preserve tissue perfusion and blood pressure early in the bleeding as cardiac output and intravascular volume decrease. Pulse rate was indeed a significant predictor of BLV, as expected. Blood flow and blood volume, however, decrease in proportion to BLV, so we expect an impact on the PPG waveforms. As mentioned below, it is possible that more exhaustive featurization of the arterial waveform could also capture the early physiologic response.
Analysis of the leading predictive features may help provide construct validity to the presented models. There are significant differences in the high-ranking features between the LASSO models and the RF models. For the LASSO models, the most predictive features are PPG-amplitude-related features. It is known that changes in PPG amplitude correlate with changes in the circulatory blood volume, and the area under the PPG waveform positively correlates with the stroke volume [29]. It is interesting to note that the pleth variability index, documented as a good predictor of fluid responsiveness in subjects on positive-pressure ventilation, is only highly ranked in the RF model. The geometric features of the PPG signals are also generally not ranked highly in their ability to predict BLV. Thus, the LASSO model could discriminate well between different hemorrhagic situations. In the RF-derived model, nonlinear BP-related features, such as DBP, SBP, and PP, are more important. Although their variation during the hemorrhage procedure is neither linear nor large, the RF regressor could still select them because they are informative.
Our proposed models achieve an acceptable estimation error for quantitative BLV estimation. However, our estimation results also show a large variation, indicating that model performance varied widely among different subjects and different blood loss classes. The results presented in Figure 4 also demonstrate a larger bias between estimated BLV and reference BLV when the subjects are in the 15–30% blood loss class. We believe this large variation of estimation error is reasonable. Different subjects may respond differently to hemorrhage. For example, the variation in vital signs is small in more robust subjects, and our trained models cannot accurately quantify such subtle physiological changes. Other factors, such as artifacts induced by blood draws to collect blood samples and saturation of the changes in vital signs during massive bleeding, may also affect the estimation results and contribute to this large variation.
Our study subjects underwent multiple bleeds until the development of hemorrhagic shock in our experimental protocol. Between bleeds, each subject was allowed a long resting period to stabilize its vital signs. During these resting periods, the autonomic compensatory response is active: blood volume is refilled from the capillaries to the aorta and peripheral arteries, and BP and blood volume recover. It is reasonable to assume that autonomic compensatory responses could influence the estimation error and cause a larger bias between estimated BLV and reference BLV in certain blood loss classes.
The main limitation of our study lies in our tightly controlled experimental design, where signals were obtained from anesthetized animals subjected to a strict bleeding protocol. Whether this translates to uncontrolled human settings remains to be explored, as the physiological responses of pigs and humans may be significantly different. Our models slightly overestimate BLV in the early stage of hemorrhage but significantly underestimate BLV with large blood loss. Presumably, such larger blood losses would be clinically apparent. We did not use as extensive a featurization of the ABP signal as we did for PPG, thus potentially biasing the comparison in favor of PPG. The requirement for baseline normalization limits the potential generalizability of the models to situations where such a baseline is available, such as quantification of surgical or peripartum bleeding. Indeed, peripartum hemorrhage could represent an important use case for models based on noninvasive signals since these patients are rarely instrumented and the majority of hemorrhages occur in low-risk patients. The reasonable performance of models using a population-derived baseline introduces the possibility that such models could be deployed to other use cases where a baseline is not readily available, such as traumatic hemorrhage. Finally, models were developed on observations prior to interventions, such as vasopressors or fluid resuscitation. Model predictions should be evaluated in observations that extend to the resuscitation phase of our experimental design. Another limitation is that only 40 subjects were involved in our study, leaving little data for separate model development and model validation. The results we presented are the optimized values of each CV epoch, which leads to an overestimation of our models’ performance. The machine learning models we used, LASSO and RF, are basic and simple, and their performance relies heavily on the feature metrics we extracted.
Advanced machine learning approaches, such as deep learning networks, could be considered in future work for their ability to perform feature extraction automatically. Further work is currently in progress to address these limitations, devise a more sophisticated machine learning pipeline that includes feature engineering and model development, and potentially extend the ability to quantify blood loss to a wider range of clinical situations beyond controlled surgical hemorrhage.