Prediction of Preoperative Blood Preparation for Orthopedic Surgery Patients: A Supervised Learning Approach
Department of Laboratory Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City 600, Taiwan
Department of Obstetrics and Gynecology, Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Taipei 231, Taiwan
School of Medicine, Tzu Chi University, Hualien 970, Taiwan
Department of Information Management and Institute of Healthcare Information Management, National Chung Cheng University, Chiayi County 621, Taiwan
Center for Innovative Research on Aging Society, National Chung Cheng University, Chiayi 621, Taiwan
Department of Psychiatry, Chiayi Branch, Taichung Veterans General Hospital, Chiayi 600, Taiwan
School of Medicine, National Yang-Ming University, Taipei 112, Taiwan
These authors contributed equally to this work.
Received: 9 July 2018 / Accepted: 31 August 2018 / Published: 5 September 2018
Abstract: Blood transfusion is a common and often necessary medical procedure during surgery. However, most physicians rely on their personal clinical experience to determine whether a patient requires a transfusion. This generally involves considering the risk of blood loss during surgery, and the preparation of blood is thus regularly requested before surgery. However, unused blood is a particularly severe problem, especially in orthopedic procedures; it not only increases medical resource wastage but also places a burden on medical personnel. This study collected the records of 1396 patients who underwent orthopedic surgery in a regional teaching hospital. Data mining techniques, namely support vector machine, C4.5 decision tree, classification and regression tree, and logistic regression (LGR), were employed to predict whether patients undergoing orthopedic surgery required an intraoperative blood transfusion. The LGR classifier, which was constructed using the CfsSubsetEval module and GeneticSearch method, exhibited the best predictive performance (area under the curve: 78.7%). This study investigated major variables involved in blood transfusions to provide a clear reference for evaluating the necessity of preparing blood for surgical procedures. Data mining techniques can be used to reduce unnecessary blood preparation, thereby reducing the workload of medical staff and minimizing the wastage of medical resources.
Keywords: blood transfusion prediction; data mining; supervised learning techniques; orthopedic surgery; feature selection
1. Introduction
Intraoperative blood transfusions are commonly performed when loss of blood flow, blood components, or oxygen occurs due to excessive bleeding. Without such measures, patients are at an increased risk of heart failure and death. Thus, predicting whether a blood transfusion is necessary for surgical patients is an issue of great concern in the field of medicine. Clinically, blood transfusions are often necessary in cardiovascular, orthopedic (e.g., hip and knee joint replacement), gynecological (e.g., radical hysterectomy), and urological (e.g., radical prostatectomy) surgical procedures [1
]. However, blood transfusion may induce some risks, such as transmission of bacterial or viral infections. Therefore, the World Health Organization (WHO) urges member states to utilize transfusion alternatives and develop individualized Patient Blood Management (PBM) programs to reduce transfusion needs. The three pillars of PBM are as follows: (i) detection and treatment of preoperative anemia; (ii) reduction in perioperative RBC loss; and (iii) harnessing and optimizing the patient-specific physiological reserve of anemia (including restrictive hemoglobin transfusion triggers) [2
To prevent excessive blood loss during surgery, preoperative blood preparation is often necessary in certain departments and disciplines. Previous studies have identified that more than 60–70% of prepared blood products go unused [3
]. A further investigation of this trend has revealed that most physicians determine whether a transfusion is necessary on the basis of their clinical experience, which results in a waste of related medical resources. Furthermore, it can impose a burden on medical staff. For example, two nursing personnel must perform joint assessments to facilitate accurate blood preparation. In addition, unnecessary requests for blood can put pressure on blood bank personnel to ensure the accurate management and storage of blood. Accordingly, enhancing the prediction accuracy of whether an intraoperative blood transfusion is necessary may effectively resolve current problems in clinical practice. The Maximum Surgical Blood Order Schedule (MSBOS), a guideline specifying the amount of blood to be cross-matched for specific elective operations, has been used to rationalize the number of units of blood routinely cross-matched for elective surgical procedures and has concomitantly reduced the unnecessary use of blood [4
]. However, some primary concerns regarding the MSBOS are that the recommendations are often outdated, based on opinion, do not include recently developed surgical procedures, and are not based on institution-specific blood utilization data.
Previous studies on intraoperative blood transfusion [8
] have focused primarily on variables related to patients’ physiological characteristics (e.g., weight, age, and sex) [13
] and medication history (e.g., whether the patient is receiving nonsteroidal anti-inflammatory drugs or blood thinners) [10
], the type and duration of the surgical procedure [14
], the use of tranexamic acid [16
], and whether patients have a history of cardiopulmonary disease [18
]. A review of the literature revealed that most studies on related topics have investigated patients in the United States and Europe, but no study has addressed patients in Asian countries. In addition, these studies have mainly employed statistical analyses, whereas supervised learning algorithms have rarely been employed in their analysis of related variables. Moreover, no viable prediction model has been proposed regarding the clinical application of blood transfusions.
According to the electronic medical records (EMRs) collected for this study, orthopedic surgery requires the highest volume of blood and the most frequent transfusions, and incurs the highest cost from unused blood products, suggesting that the prediction accuracy in orthopedic departments requires significant improvement. Therefore, to prevent unnecessary blood preparation, the aim of this study is to develop a prediction model to determine whether an orthopedic surgery patient requires preoperative blood preparation for an intraoperative or postoperative blood transfusion. To make the prediction model applicable in clinical practice, we only consider the set of independent variables (IVs) that can be retrieved before the onset of surgery. Supervised learning algorithms were employed to analyze the influence of each variable, and a prediction model was devised to provide a reference to facilitate clinical decision-making by related medical personnel in preoperative blood preparation.
2. Materials and Methods
All the medical histories of inpatients were collected from the electronic medical record system at a regional teaching hospital in Southern Taiwan. Inpatients who underwent orthopedic surgery from July 2011 to December 2013 were included. Because intraoperative or postoperative blood preparations usually occur under emergency conditions and cannot be predicted, patients with an intraoperative or a postoperative blood preparation were not considered in the study. Moreover, patients younger than 20 years of age were also excluded. In addition to the factors suggested in previous studies, chronic disease history and data from liver and renal function tests, namely glutamic oxaloacetic transaminase (GOT), glutamic pyruvic transaminase (GPT), blood urea nitrogen (BUN), and creatinine levels, were included as the study variables. These test results were included because abnormal liver function can affect coagulation and thus increase the need for an emergent intraoperative blood transfusion. Moreover, patients with poor renal function or who require dialysis often have lower hemoglobin (HB) levels and are thus at a greater risk of requiring an intraoperative blood transfusion. The Chia-Yi Christian Hospital Institutional Review Board approved the study protocol (CYCH-IRB No. 103018). Written consent from the study participants was deemed unnecessary because the dataset comprises only anonymized secondary data for research purposes, and the Chia-Yi Christian Hospital Institutional Review Board issued a formal written waiver of the need for consent.
2.1. Data Source
With the approval of the institutional review board, the medical records of all patients whose orthopedic physicians requested preoperative blood preparation between July 2011 and December 2013 were retrieved from a blood bank database. According to the 2013 manual report of the case hospital, 671 orthopedic surgeries were performed and 2052 blood units were prepared for these patients. Of these, only 848 blood units (41.3%) were used, for 316 patients (47.1%). From the records, details on the blood product specifications, product quantity, the serial number of blood preparation services, physician identification numbers, and medical history numbers were linked to inpatient identification numbers to obtain the corresponding diagnostic, operation, and procedure codes. Finally, the aforementioned data were used to access relevant EMRs and retrieve information on inpatient physical assessments, preoperative blood and biochemistry evaluations, preoperative anesthesia consultation evaluations, medical and medication histories, and smoking or alcohol history.
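The utilization figures quoted from the 2013 report can be checked with a few lines of arithmetic. The snippet below is only a worked example; the variable names are chosen for illustration and do not come from the study.

```python
# Figures quoted in the text from the hospital's 2013 manual report.
surgeries = 671            # orthopedic surgeries
units_prepared = 2052      # blood units prepared for these patients
units_used = 848           # blood units actually used
patients_transfused = 316  # patients who received the used units

unit_utilization = units_used / units_prepared
patient_rate = patients_transfused / surgeries
unused_units = units_prepared - units_used

print(f"units used:          {unit_utilization:.1%}")
print(f"patients transfused: {patient_rate:.1%}")
print(f"units left unused:   {unused_units}")
```

Both percentages agree with the values reported in the text, and the difference between prepared and used units gives the number of units that went unused in that year.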
2.2. Variable Definition and Selection
On the basis of a review of the literature and consultations with the relevant specialists, the dependent variable of this study was the decision to perform a blood transfusion within 48 h of surgery. A total of 35 IVs (Table 1
) retrieved before the onset of surgery were considered in our study [1
The study employed the data mining software WEKA 3.6.11 and used the Correlation-based Feature Subset Selection evaluator (CfsSubsetEval) with three different search methods to generate three datasets (referred to as Datasets A, B, and C). The CfsSubsetEval module evaluates both the prediction power of each IV and the degree of redundancy between any two IVs; subsets of features that are highly correlated with the dependent variable but not strongly correlated with one another are preferred. Dataset A was processed using the CfsSubsetEval module with the GreedyStepwise method to retrieve the following 4 IVs: OP, BMI, HB, and PLT. Dataset B was generated using the CfsSubsetEval module with the RankSearch and GainRatioAttributeEval methods to acquire the following 13 IVs: SURGEON, OP, AGE, BMI, DBP, HB, PLT, INR, GOT, BUN, NA, OP_DAYS, and LIVER. Finally, the CfsSubsetEval module and GeneticSearch method were used to construct Dataset C, which contained the following 12 IVs: SURGEON, OP, AGE, GENDER, BMI, DBP, HB, PLT, INR, GOT, OP_DAYS, and KIDNEY.
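The idea behind correlation-based subset evaluation can be illustrated with a minimal sketch. It scores a subset with the usual CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation, and grows the subset greedily. This is not WEKA's implementation: the sketch substitutes absolute Pearson correlation as the correlation measure, and the feature values, the `HB_COPY` and `NOISE` columns, and all function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, features, target):
    """CFS merit: rewards class correlation, penalizes redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward(features, target, eps=1e-12):
    """Add the feature that most improves the merit until none does."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        cand, cand_merit = None, best
        for f in sorted(remaining):
            m = cfs_merit(selected + [f], features, target)
            if m > cand_merit + eps:
                cand, cand_merit = f, m
        if cand is None:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best

# Hypothetical toy data: 1 = transfused, 0 = not transfused.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "HB": [9.5, 10.2, 11.0, 11.8, 12.9, 13.5, 14.1, 15.0],
    "HB_COPY": [9.5, 10.2, 11.0, 11.8, 12.9, 13.5, 14.1, 15.0],  # redundant
    "NOISE": [1, 2, 1, 2, 1, 2, 1, 2],                           # uninformative
}
selected, merit = greedy_forward(features, target)
print(selected, round(merit, 3))
```

On this toy data the redundant copy and the uninformative column do not raise the merit, so neither is added after `HB`. That is the behaviour the paragraph above describes: predictive features are preferred, redundant ones are not.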
2.3. Investigated Classification Techniques
The data mining techniques selected in this study were support vector machines (SVMs), classification and regression trees (CARTs), C4.5 algorithm, and logistic regression (LGR) [27
]. SVM, a supervised learning method for classification, is currently one of the most effective methods for high-dimensional data (Cortes and Vapnik, 1995). Grounded in structural risk minimization, an SVM maps the input variables into a high-dimensional feature space and then seeks a separating hyperplane with the largest margin to divide the data into two or more categories. A new instance is then mapped into one of the subspaces delimited by the set of hyperplanes, and the class associated with that subspace is assigned to the new instance.
Decision tree (DT) is a classification technique commonly used in data mining. In the tree, each internal node represents a single IV. Each branch represents one or more possible values of the selected IV, and each leaf-node represents a class label. During tree construction, the DT-based algorithms recursively select an IV to reduce the impurity of the instance group. When the stopping criteria are satisfied, a class label will be assigned to a leaf-node. Both CART and C4.5 are DT-based algorithms, but they have two major differences during the tree-growing phase: (1) the impurity measure of CART is the Gini index, while in C4.5, it is the gain ratio; (2) CART method builds binary trees, while C4.5 builds a multiway tree.
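The two impurity measures named above can be compared on a single candidate split. The following is a minimal sketch using the standard definitions of the Gini index, entropy, and gain ratio; the labels and the split are hypothetical, and the function names are chosen for illustration.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index of a group of class labels (0 for an empty group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits (0 for an empty group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_decrease(parent, children):
    """Impurity reduction of the kind CART uses to rate a split."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)

def gain_ratio(parent, children):
    """Information gain divided by split information, as in C4.5."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical split of 10 cases (Y = transfused, N = not transfused).
parent = ["Y"] * 5 + ["N"] * 5
left = ["Y"] * 4 + ["N"] * 1
right = ["Y"] * 1 + ["N"] * 4

print(round(gini_decrease(parent, [left, right]), 3))
print(round(gain_ratio(parent, [left, right]), 3))
```

Dividing by the split information is what lets the gain ratio penalize splits with many small branches, which matters for the multiway trees that C4.5 builds.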
Regression analysis is a statistical learning method; it analyzes the collected data to develop a mathematical model for predicting the output variable. It aims at determining the correlation strength between the input and output variables. LGR is a regression model for a categorical dependent variable; it is nonlinear in the sense that the predicted probability is a logistic function of a linear combination of the input variables. LGR can be used to predict the probability of an event by fitting the data objects to a logistic function; that is, it allows input variables with any value to be put into the logistic function to obtain a probability value between 0 and 1.
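How the logistic function turns an unbounded linear score into a probability can be shown in a few lines. The intercept and coefficient below are hypothetical values chosen only to make the arithmetic easy to follow; they are not estimates from this study.

```python
import math

def logistic(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(intercept, coefs, x):
    """Probability of the event given a linear score intercept + sum(coef * value)."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return logistic(z)

# Hypothetical model: lower HB raises the predicted transfusion probability.
intercept = 6.0
coefs = {"HB": -0.5}

for hb in (9.0, 12.0, 15.0):
    p = predict_proba(intercept, coefs, {"HB": hb})
    print(f"HB = {hb:4.1f} g/dL -> P(transfusion) = {p:.3f}")
```

In a fitted model the coefficients would be estimated from the training data. The sketch only illustrates that any input value yields an output strictly between 0 and 1.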
2.4. Experimental Setup and Performance Measure
Parameter values were automatically adjusted using WEKA. The classification performance of the SVM, C4.5, CART, and LGR models was compared to select the optimal model. Ten-fold cross-validation was applied to all experimental evaluations. Specifically, each dataset was partitioned into ten complementary subsets; in each round, nine subsets were used for model training and the remaining subset was used for model testing. The validation was repeated 10 times so that each subset served once as the test set, and the average results are reported in our study.
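The partitioning scheme described above can be sketched as follows. This is a plain, unstratified split written for illustration; a toolkit may additionally balance the class proportions across folds, and the function name and seed are assumptions.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# 1396 is the number of records analyzed in this study.
splits = list(k_fold_indices(1396, k=10, seed=0))
for round_no, (train, test) in enumerate(splits, start=1):
    print(round_no, len(train), len(test))
```

Each record appears in exactly one test fold, so averaging the ten test results uses every record once for evaluation.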
To evaluate the efficacy of these prediction models, the accuracy, sensitivity, and specificity were evaluated using a confusion matrix (Table 2). These metrics were determined using the following formulas: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and accuracy = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. In addition, the area under the receiver operating characteristic curve (AUC) was included as an indicator of the model performance, with larger AUCs indicating higher accuracy.
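A minimal sketch of these performance measures is given below. Sensitivity, specificity, and accuracy follow the standard confusion-matrix definitions, and the AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, scores, and the 0.5 cut-off are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="Y"):
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def auc(y_true, scores, positive="Y"):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of transfusion.
y_true = ["Y", "Y", "Y", "N", "N", "N"]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = ["Y" if s >= 0.5 else "N" for s in scores]

counts = confusion_counts(y_true, y_pred)
result = metrics(*counts)
print(counts, result, round(auc(y_true, scores), 3))
```

Unlike the other three measures, the AUC does not depend on the chosen cut-off, which is why it can rank classifiers differently from accuracy.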
3. Results
From the blood bank databases and medical records of the study hospital, 1698 blood preparation records were obtained for patients who underwent an orthopedic surgery. Filtering and preprocessing the data to remove records with missing values or data errors yielded 1396 records for subsequent analyses, which included 661 clinical cases with an intraoperative blood transfusion as well as 735 cases without an intraoperative blood transfusion.
In Dataset A, the four significant IVs were first tested on the SVM, C4.5, CART, and LGR classifiers. The results in Table 3
reveal that the accuracy of the LGR classifier was the highest (71.80%), followed by the CART classifier (71.10%), and the C4.5 classifier had the lowest accuracy (70.30%). The AUC results revealed that the LGR classifier had the highest AUC (77.40%), while the SVM classifier had the lowest (70.30%).
In Dataset B, the 13 significant IVs were processed using the same four classifiers. The results in Table 4
show that the prediction accuracy of the CART classifier was the highest (71.80%), followed by LGR (71.70%), and the SVM classifier was the lowest (71.10%). The AUC results showed that the optimal performance was exhibited by the LGR classifier (78.30%), while the least favorable performance was by the SVM classifier (70.80%).
Finally, in Dataset C, the 12 significant IVs were analyzed using the same four classifiers. The analysis results in Table 5
show that the highest accuracy was obtained by the CART classifier (73.10%), followed by the LGR and C4.5 classifiers (72.20% each), and the lowest by the SVM classifier (71.10%). The AUC results showed that the highest AUC was obtained by the LGR classifier (78.70%) and the lowest by the SVM classifier (70.70%).
The results showed that the LGR classifier yielded the highest AUC for all three datasets, although the CART classifier had slightly higher accuracy in Datasets B and C; on the basis of AUC, the LGR classifier was regarded as the optimal classifier. In addition, the variables in Dataset C yielded the best model performance (AUC: 78.70%), indicating that the combination of the CfsSubsetEval module and the GeneticSearch method can generate higher prediction performance. However, the prediction model generated from Dataset C requires all 12 IVs. If data for all 12 variables cannot be obtained in clinical practice, Dataset A can be used as an alternative because it still produces a satisfactory AUC (77.40%) with only four IVs (OP, BMI, HB, and PLT).
To further explore the effects of the IVs on the dependent variable, the attribute selection module in WEKA was adopted to analyze the significant variables. The gain ratio (GainRatioAttributeEval) was applied to rank the variables according to their gain ratio. The experimental results in Figure 1
reveal that variables such as HB, BMI, PLT, OP_DAYS, AGE, INR, OP, DBP, and the variables related to liver and renal function have a larger influence on intraoperative blood transfusion.
First, HB was found to have the greatest effect on preoperative blood preparation, which corresponded with results from other literature [32
]. The relationship between decreasing HB levels and the need for an intraoperative blood transfusion was further explored by employing HB as the sole variable to conduct a univariate analysis using the C4.5 classifier. The retrieved classifications were HB ≤ 12.1 g/dL: Y (656.0/172.0), HB > 12.1 g/dL: N (831.0/268.0) (Y indicates the status of intraoperative blood transfusion). The results of accuracy, sensitivity, specificity, and AUC were 0.677, 0.77, 0.573, and 0.664, respectively.
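The single-variable rule reported above amounts to a one-node decision tree. The sketch below applies the 12.1 g/dL split point from the text to a few hypothetical HB values; the function and constant names are chosen for illustration.

```python
HB_THRESHOLD_G_DL = 12.1  # split point reported for the univariate C4.5 tree

def predict_transfusion(hb_g_dl):
    """Return 'Y' when the rule predicts a transfusion, otherwise 'N'."""
    return "Y" if hb_g_dl <= HB_THRESHOLD_G_DL else "N"

# Hypothetical preoperative HB values in g/dL.
for hb in (9.8, 12.1, 12.2, 14.5):
    print(hb, predict_transfusion(hb))
```

A rule of this kind is easy to apply at the bedside, but the performance figures reported for it show that it misclassifies a sizeable share of cases when used alone.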
BMI had the second-largest effect after HB and was found to be more influential than the other primary factors considered by specialists (e.g., PLT, OP, AGE, and LIVER). Similar to previous findings [14
], the evaluation of BMI indicated that patients with a high BMI may require treatment with the same volume of blood loss. Furthermore, a lower PLT (i.e., PLT ≤ 80,000) indicates a high amount of blood loss. This is because poor blood coagulation may be caused by cirrhosis or severe sepsis, which was supported by a previous study reporting that blood preparation may be suitable with a PLT of <100,000 [15
Notably, surgical waiting time (OP_DAYS) was also found to have a marked influence mainly on patients with open wounds because the need for an intraoperative blood transfusion is higher for patients who experience continuous blood loss and have longer surgical waiting times [10
Next, our study identified AGE as a vital factor for determining whether a blood transfusion should be performed, which is consistent with the findings of many previous studies [3
]. Although some previous studies have considered that INR is irrelevant to the need for a blood transfusion, the results of the present study showed that INR is relevant to the need for a blood transfusion.
Finally, regarding the type of OP, specialists consider that the decision to prepare blood is highly likely to be affected by events that occur during surgery. For instance, most limb operations do not require blood preparation because tourniquets are used, whereas the need for blood preparation increases for operations with a high risk of bleeding (e.g., spinal, thigh bone, knee, and hip replacement operations).
The identification of the predictors for blood transfusions in surgical patients has long been a topic of concern in medicine; statistical models are commonly utilized in the literature. More recently, data mining and machine learning techniques have proven to possess an excellent ability to construct prediction models in the medical domain. This study sought to develop a reliable prediction model by using SVM, DT, and LGR supervised learning algorithms to improve current clinical decision-making procedures for blood management. The results may provide a clinical reference for evaluating preoperative blood preparation and may serve as a reminder regarding high-risk patients when preparing blood for a transfusion before surgery, thereby enhancing overall service quality and safety. In addition, our study contributes to the literature on the novel use of data mining techniques in transfusion medicine.
Three concerns were identified as research limitations. First, the gold standard for blood transfusion in surgical patients is the doctor’s clinical judgement, and over-transfusion might be a problem in such a scenario. It may be worth repeating the study after a well-trained team has reviewed whether each transfusion was appropriate. Second, we primarily investigated the need for a blood transfusion within 48 h of surgery and did not consider that physicians may have already performed a blood transfusion for the patient prior to that threshold. Third, the recorded history of anticoagulant use does not cover prescriptions from other hospitals or purchases from other sources.
The present study confirmed that data mining techniques possess satisfactory accuracy for predicting whether a blood transfusion will be necessary in orthopedic procedures. Future studies should consider predicting the required blood volume or using other variables such as surgery duration and blood loss volume. Further applications in other medical departments are expected to generate satisfactory outcomes and improve the safety of surgery.