Machine Learning Prediction of Iron Deficiency Anemia in Chinese Premenopausal Women 12 Months after Sleeve Gastrectomy

Premenopausal women, who account for more than half of patients undergoing bariatric surgery, are at higher risk of developing postoperative iron deficiency anemia (IDA) than postmenopausal women and men. We aimed to establish a machine learning model to evaluate the risk of new-onset IDA in premenopausal women 12 months after sleeve gastrectomy (SG). Premenopausal women with complete clinical records who underwent SG were enrolled in this retrospective study. The main outcome, new-onset IDA after surgery, was defined according to the age- and gender-specific World Health Organization criteria. A linear support vector machine model was developed to predict the risk of IDA after SG using the five most important features identified during feature selection. Four hundred and seven subjects with a median age of 31.0 (interquartile range [IQR] 26.0–36.0) years and a median follow-up of 12 (IQR 7–13) months were analyzed. They were divided into a training set of 285 and a validation set of 122 individuals. The selected features were preoperative ferritin, age, hemoglobin, creatinine, and fasting C-peptide. The model showed moderate discrimination in both sets (area under the curve 0.858 and 0.799, respectively; p < 0.001). The calibration curves indicated acceptable agreement between observed and predicted results in both sets, and decision curve analysis showed a substantial net clinical benefit of the model in both sets. Our machine learning model accurately predicted new-onset IDA in Chinese premenopausal women with obesity 12 months after SG. External validation is required before the model is used in clinical practice.


Introduction
During the last several decades, obesity has reached epidemic proportions in both developing and developed countries [1]. Although lifestyle modification may seem the ideal way to lose weight, bariatric surgery (BS) results in greater and more sustained improvements in weight loss, obesity-associated complications, all-cause mortality, and quality of life than non-surgical treatment options [2]. BS encompasses different techniques with different effects on energy metabolism. Currently, sleeve gastrectomy (SG) is one of the most frequently performed procedures in clinical practice.
Iron deficiency (ID) and iron deficiency anemia (IDA) have been observed at higher rates in patients with obesity than in the general population. SG has been shown to exacerbate IDA, with a prevalence as high as 18% [3], through a variety of mechanisms, including malabsorption due to reduced gastric volume and decreased gastric acid secretion [4], as well as impaired dietary tolerance of red meat [5]. Because of menstrual blood loss, premenopausal women are at higher risk of developing ID and subsequently IDA after BS [6]. Our previous study reported that premenopausal women develop IDA more often after Roux-en-Y gastric bypass (RYGB) than postmenopausal women and men [7]. In addition, according to data from the International Federation for the Surgery of Obesity (IFSO) global registry for 2015–2018, 77.1% of all patients who underwent BS were women [8]. In general, 70% of these female patients were premenopausal. Therefore, premenopausal women accounted for more than half of the patients undergoing BS (0.771 × 0.70 ≈ 54%). Furthermore, Knight et al. found that IDA after BS was associated with a higher likelihood of hospitalization, a higher risk of BS complications, and greater healthcare costs [9].
Given the high incidence of IDA in premenopausal women after BS and the burden of postoperative IDA, there is a growing need for tools that predict IDA after SG in premenopausal female patients. Such tools could improve clinical decisions about necessary postoperative nutritional interventions. However, to our knowledge, no predictive models are available in clinical practice. With recent advances in machine learning (ML), predictive models built with ML algorithms have proven feasible for evaluating the prognosis of various diseases, including stroke and myocardial infarction [10][11][12]. Aron-Wisnewsky et al. developed the advanced-DiaRem score for predicting diabetes remission after BS with the ML algorithm support vector machine (SVM), and it showed improved performance [13]. Thus, ML algorithms have been proposed as an alternative approach to developing predictive models.
The objective of this study was to establish an ML model predicting new-onset IDA from the baseline clinicopathologic data of premenopausal female patients with obesity who underwent SG.

Study Design and Participants
A retrospective study was conducted on premenopausal women with obesity who underwent SG between 2015 and 2021 at a referral center. The inclusion criteria were as follows: body mass index (BMI) ≥27.5 kg/m²; age 18 to 50 years; and complete preoperative and 1-year follow-up information. The exclusion criteria were as follows: vegetarian diet; other bariatric procedures; anemia at baseline; renal failure at baseline; incomplete preoperative information or loss to follow-up; heavy menstrual bleeding after surgery, defined as a total blood loss per menstrual cycle that regularly exceeds 80 mL [14]; and postoperative bleeding within 30 days after surgery, defined as a drop in hemoglobin of >30 g/L and/or blood loss confirmed at intervention that required treatment [15]. Medical history, age, height, weight, BMI, blood pressure (BP), and current medications were recorded at baseline and after surgery. Glucose, C-peptide, glycated hemoglobin (HbA1c), and lipid profiles were measured preoperatively and 1 year postoperatively. Blood samples were collected in the fasting state. Complete blood counts were performed on a fully automated hematology analyzer XN-350 (Sysmex, Kobe, Japan). White blood cell (WBC) count was measured by flow cytometry. Hemoglobin (Hb) was measured by the cyanide-free sodium lauryl sulfate method. Plasma glucose was measured by the glucose oxidase method. Serum insulin and C-peptide levels were quantified by radioimmunoassay. HbA1c was measured by high-performance liquid chromatography on a VARIANT II Hemoglobin A1c analyzer (Bio-Rad Laboratories, Hercules, CA, USA).
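The eligibility rules above amount to one screening predicate: every inclusion criterion must hold and no exclusion criterion may apply. The Python sketch below is illustrative only; the `Candidate` fields are hypothetical names rather than the study's data dictionary, and "regularly exceeds 80 mL" is simplified to a single recorded per-cycle value.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative, not the study's.
    bmi: float                        # kg/m^2
    age: int                          # years
    complete_records: bool            # complete preoperative and 1-year follow-up data
    vegetarian: bool
    other_bariatric_surgery: bool
    baseline_anemia: bool
    baseline_renal_failure: bool
    menstrual_loss_ml: float          # usual blood loss per cycle after surgery (mL)
    hb_drop_30d: float                # Hb drop within 30 days of surgery (g/L)
    bleeding_needing_treatment: bool  # blood loss confirmed at intervention requiring treatment


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria stated in the text."""
    included = c.bmi >= 27.5 and 18 <= c.age <= 50 and c.complete_records
    heavy_menstrual_bleeding = c.menstrual_loss_ml > 80
    postoperative_bleeding = c.hb_drop_30d > 30 or c.bleeding_needing_treatment
    excluded = (
        c.vegetarian
        or c.other_bariatric_surgery
        or c.baseline_anemia
        or c.baseline_renal_failure
        or heavy_menstrual_bleeding
        or postoperative_bleeding
    )
    return included and not excluded


# Usage: a hypothetical candidate who meets every criterion
ok = Candidate(bmi=38.2, age=31, complete_records=True, vegetarian=False,
               other_bariatric_surgery=False, baseline_anemia=False,
               baseline_renal_failure=False, menstrual_loss_ml=40.0,
               hb_drop_30d=8.0, bleeding_needing_treatment=False)
print(is_eligible(ok))                                      # True
print(is_eligible(replace(ok, menstrual_loss_ml=95.0)))     # False
```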
The levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), blood urea nitrogen (BUN), creatinine (Cr), blood uric acid (BUA), triglycerides (TG), total cholesterol, high-density lipoprotein cholesterol (HDL-c), and low-density lipoprotein cholesterol (LDL-c) were determined by standard enzymatic methods on a biochemical analyzer (7600-120; Hitachi, Tokyo, Japan). Serum ferritin and iron were measured by electrochemiluminescence immunoassay on a Modular E170 analyzer (Roche Diagnostics, Basel, Switzerland). Serum vitamin B12 and folic acid levels were measured by radioimmunoassay (MP Biomedicals, Irvine, CA, USA).
Eligible patients were randomly divided into training and validation sets in a 7:3 ratio using the function "createDataPartition" in the R package "caret". The training set was used to build the ML predictive model and the validation set to evaluate its performance.
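A stratified 7:3 split draws the training quota separately within each outcome class, so both sets keep roughly the same IDA prevalence (caret's createDataPartition samples within the levels of a factor outcome). The standard-library Python sketch below is an analogue, not caret's implementation; the function name, the seed, and the per-class rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=2022):
    """Split indices into training/validation sets, sampling within each outcome class.

    Rounding each class quota with Python's round() is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train = []
    for indices in by_class.values():
        k = round(train_fraction * len(indices))   # class-wise training quota
        train.extend(rng.sample(indices, k))
    chosen = set(train)
    valid = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), valid


# Usage: 44 IDA (1) and 363 non-IDA (0) labels, as in the analyzed cohort
labels = [1] * 44 + [0] * 363
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))   # 285 122
```

With these label counts the sketch happens to give the same set sizes (285 and 122) as reported in the Results.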
The Ethics Committee of our institution approved the study in accordance with the guidelines of the Declaration of Helsinki (World Medical Association). Informed consent was obtained from all participants included in the study.

Surgical Techniques
The surgical technique used in this study was laparoscopic SG, as described previously [17]. All surgeries were performed by the same surgical team at the referral center with patients in the supine position. The gastric tube was created over a 37-Fr bougie using green and blue staple cartridges. Gastric transection started 5 cm proximal to the pylorus and proceeded toward the angle of His. The staple line was then reinforced with a running absorbable suture.

Data Pre-Processing and Feature Selection
The class imbalance between the two outcomes in the training set was mitigated by the synthetic minority oversampling technique (SMOTE), which creates synthetic minority-class samples. SMOTE was performed with the "SMOTE" function in the R package "DMwR". Subsequently, the baseline clinicopathologic data were used to build a random forest model, and the features were ranked by importance on the basis of the mean decrease in the Gini index. The top five features were used for model training.
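SMOTE, as originally described, creates each synthetic case by taking a minority-class sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the segment between them. The Python sketch below illustrates that idea for numeric features only; it is not the DMwR implementation (which also handles nominal features and undersamples the majority class), it assumes features are already on comparable scales, and the example values are hypothetical.

```python
import math
import random


def smote(minority, n_per_sample, k=5, seed=0):
    """Create n_per_sample synthetic points for each minority-class sample.

    Numeric features only: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(n_per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()                  # uniform in [0, 1)
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Usage: hypothetical, already scaled (ferritin, age) pairs for four minority cases
cases = [[0.10, 0.30], [0.20, 0.35], [0.15, 0.50], [0.05, 0.40]]
new_points = smote(cases, n_per_sample=5, k=3)
print(len(new_points))   # 20
```

Because every synthetic point is an interpolation between two real minority cases, oversampling adds no values outside the range already observed in that class.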

Statistical Analysis
The sample size was estimated with the R package "pmsampsize" for predictive model sample size calculation. Clinical characteristics are presented as mean ± standard deviation (SD) for normally distributed continuous variables and as median (interquartile range [IQR]) for non-normally distributed ones; binary variables are presented as frequencies and percentages. Shapiro-Wilk tests and histograms were used to check whether continuous variables were normally distributed. Independent t-tests, chi-square tests, and Mann-Whitney U tests were used to compare baseline characteristics between the training and validation sets and between the normal and new-onset IDA groups in the training set. Paired t-tests, Wilcoxon signed-rank tests, and McNemar tests were used to compare baseline and 1-year characteristics within the normal and new-onset IDA groups in the training set.
For model training, we used a linear SVM, a classification algorithm that achieves acceptable accuracy with low computational cost and small sample sizes. To detect overfitting and adjust the model if necessary, we performed tenfold cross-validation. The importance of the variables included in the model was calculated with the varImp function of the R package "caret". Discrimination was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). The DeLong method was used to test whether each AUC differed from 0.5 and to generate p values. Calibration was assessed with bootstrapped calibration curves to evaluate the agreement between predicted and observed probabilities. Decision curve analysis (DCA) was performed to evaluate the net benefit of the model. Finally, we developed an application based on the model with the R package "shiny". All statistical analyses were performed using IBM SPSS Statistics 25.0 (IBM Corp., Armonk, NY, USA) and R 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria). The ML predictive model was constructed with the "caret" package, and DCA was conducted with the "rmda" package. A two-sided p value < 0.05 was considered statistically significant.
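The AUC equals the probability that a randomly chosen IDA case receives a higher predicted risk than a randomly chosen non-IDA case, and DeLong's method estimates its variance from per-patient "placement" values. The standard-library Python sketch below shows that estimator and the resulting z-test against AUC = 0.5. The text does not say which R implementation was used, and the risk scores in the usage lines are hypothetical. A Wald-type 95% CI follows as AUC ± 1.96 × SE.

```python
import math


def _psi(pos, neg):
    """Mann-Whitney kernel: 1 if the case outranks the control, 0.5 for a tie, else 0."""
    if pos > neg:
        return 1.0
    if pos == neg:
        return 0.5
    return 0.0


def auc_delong(pos_scores, neg_scores):
    """Return (AUC, DeLong standard error, two-sided p value for H0: AUC = 0.5)."""
    m, n = len(pos_scores), len(neg_scores)
    # Placement values: each case vs all controls, each control vs all cases
    v10 = [sum(_psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(_psi(x, y) for x in pos_scores) / m for y in neg_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    if se == 0.0:
        raise ValueError("zero DeLong variance (e.g. perfect separation); z-test undefined")
    z = (auc - 0.5) / se
    p = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return auc, se, p


# Usage with hypothetical predicted risks
cases = [0.9, 0.8, 0.7, 0.4]            # patients who developed IDA
controls = [0.6, 0.3, 0.2, 0.1, 0.5]    # patients who did not
auc, se, p = auc_delong(cases, controls)
print(round(auc, 3), round(se, 3))      # 0.9 0.117
```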

Clinical Characteristics of Study Subjects
In accordance with the inclusion and exclusion criteria above, 407 eligible patients seen at both baseline and the last follow-up were included in the analysis (Figure 1; Table 1). Of these, 44 (10.8%) developed new-onset IDA and 363 (89.2%) did not. The eligible patients were divided into a training set of 285 and a validation set of 122 individuals. According to the calculation with the "pmsampsize" package, the minimum training-set size for a predictive model with 5 parameters was 199 when the anticipated R² of the model was set to 0.2. The training set therefore met the minimum sample size requirement.

The SVM Model Construction and Evaluation
After SMOTE was applied to the training set with the parameter "perc.over" set to 500 and "perc.under" set to 120, a balanced dataset was created consisting of 156 (50%) samples with a normal outcome and 156 (50%) samples with new-onset IDA after SG. The feature importance for predicting IDA with the random forest algorithm after SMOTE is shown in Figure 2. In the feature selection step, preoperative ferritin, age, Hb, Cr, and fasting C-peptide (FCP) were the top five features and were included in the SVM model. Their importance in the SVM model is shown in Table 3; preoperative ferritin was the most important contributor. In the training set, the AUC was 0.858 (95% CI 0.784–0.931, p < 0.001; Figure 3A). The calibration curve in the training set was close to the ideal diagonal line, with a mean absolute error of 0.01 (Figure 4A), indicating acceptable agreement between observed and predicted results and good calibration. The DCA curve indicated that the model added net benefit compared with both the treat-all and treat-none strategies (Figure 5A). In the validation set, the AUC was 0.799 (95% CI 0.689–0.910, p < 0.001; Figure 3B). The calibration curve in the validation set was also close to the ideal diagonal line, with a mean absolute error of 0.03 (Figure 4B), indicating fair agreement between observed and predicted results and good calibration. The DCA curve also showed a net benefit of the model in the validation set (Figure 5B). Finally, an application based on the model was developed with the "shiny" package.
The application calculates the probability of postoperative new-onset IDA once the parameters are entered in the panel on the left (Figure 6).
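The reported post-SMOTE class counts can be checked against how DMwR's SMOTE is parameterized: perc.over/100 synthetic cases are generated per minority case, and the number of majority cases sampled is perc.under/100 times the number of synthetic cases. The small Python function below encodes that arithmetic as we understand it (integer truncation, perc.over ≥ 100 assumed). The 26 training-set IDA cases it uses are inferred from 156 = 26 × 6 and are not reported in the text.

```python
def dmwr_smote_counts(n_minority, perc_over, perc_under):
    """Class sizes after DMwR-style SMOTE (sketch; assumes perc_over >= 100)."""
    n_synthetic = (perc_over // 100) * n_minority          # new minority cases
    n_minority_after = n_minority + n_synthetic            # originals are retained
    n_majority_after = (perc_under * n_synthetic) // 100   # majority cases sampled
    return n_minority_after, n_majority_after


# Inferred, not reported: 156 = 26 * (1 + 500 // 100) implies 26 IDA cases in training
print(dmwr_smote_counts(26, perc_over=500, perc_under=120))   # (156, 156)
```

Under these settings the minority class grows sixfold and the majority class is subsampled to the same size, which is why the resulting dataset is exactly 50:50.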

Discussion
To the best of our knowledge, this study is the first to establish an ML model that accurately predicts new-onset IDA after SG in premenopausal female patients with obesity. IDA is a common nutritional problem and complication after BS, especially in premenopausal women [7]. Gowanlock et al. reported IDA in 16% of patients after BS in their cohort of 388 subjects [20]. With regard to surgical procedure, Kwon et al. found no significant differences in the risk of postoperative anemia or ID between gastric bypass and SG [21]. According to Nie et al., the pooled prevalence of anemia increased to 12% at 12 months after SG, and ferritin deficiency was strongly correlated with anemia [22]. In our cohort, the incidence of new-onset IDA in premenopausal patients after SG was 10.8%, similar to that in previous studies.
For the sample size calculation of the training set, the parameter "prevalence" in the function "pmsampsize" was set to 0.11, approximately the prevalence of new-onset IDA in our cohort, and the parameter "parameters" was set to 5, the expected number of features in our model. According to the function, the maximum Cox-Snell R² for an outcome proportion of 0.11 is 0.5. Riley et al., who described sample size calculation for developing clinical prediction models in detail [23], suggested that the anticipated R² can be set to 50% of the maximum R² when the training data include direct measures of the clinical process involved. In this study, direct measures of IDA such as ferritin and Hb were included in the training set, so the anticipated R² could be set to 0.25 (50% of 0.5). With this anticipated R² and the parameters above, the minimum training-set size was 172, smaller than the actual size. In fact, we set the anticipated R² to 0.2 (40% of 0.5), which is more conservative than the literature suggests. Moreover, with an even more conservative anticipated R² of 0.15 (30% of 0.5), the minimum training-set size was 275, again smaller than the actual size. In summary, the training set was large enough for model construction.
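The three sample-size criteria of Riley et al. for a binary outcome can be re-derived in a few lines: (1) an expected uniform shrinkage factor of at least 0.9, (2) optimism of at most 0.05 in Nagelkerke R², and (3) estimation of the overall outcome proportion within ±0.05. The Python function below is our re-derivation, not pmsampsize's source code; its default settings are assumed to match the package defaults, and under them it reproduces the figures quoted above.

```python
import math


def riley_binary_sample_size(r2_cs, parameters, prevalence,
                             shrinkage=0.9, delta=0.05, margin=0.05):
    """Minimum development sample size for a binary-outcome prediction model.

    Returns (maximum Cox-Snell R^2, required n) using Riley et al.'s criteria 1-3.
    """
    # Maximum Cox-Snell R^2 attainable at this outcome prevalence
    ln_l_null = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    max_r2 = 1 - math.exp(2 * ln_l_null)
    # Criterion 1: expected uniform shrinkage factor of at least `shrinkage`
    n1 = parameters / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: optimism in Nagelkerke R^2 of at most `delta`
    s_vh = r2_cs / (r2_cs + delta * max_r2)
    n2 = parameters / ((s_vh - 1) * math.log(1 - r2_cs / s_vh))
    # Criterion 3: overall outcome proportion estimated within +/- `margin`
    n3 = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)
    return max_r2, math.ceil(max(n1, n2, n3))


for r2 in (0.25, 0.20, 0.15):
    max_r2, n = riley_binary_sample_size(r2, parameters=5, prevalence=0.11)
    print(r2, round(max_r2, 2), n)
# 0.25 0.5 172
# 0.2 0.5 199
# 0.15 0.5 275
```

With 10 parameters at R² = 0.2 the same function gives 398, in line with the statement in the next paragraph that a 10-feature model would need more than 300 patients. At R² = 0.25 the binding constraint is criterion 2; at 0.2 and 0.15 it is criterion 1.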
When developing the model, we considered computational complexity and ease of prediction. Given the relatively short consultation time per patient in our outpatient clinic and the short hospital stay, a simple model may be more appropriate than a complex one for helping clinicians and the BS team evaluate the risk of new-onset IDA. In addition, because of the limited sample size, our training set might not support a complex model: according to the "pmsampsize" package, the minimum training-set size for a 10-feature model exceeded 300, larger than the actual size. Based on these considerations, we performed feature selection using the decrease in the Gini index calculated by the random forest.
The present study suggests that preoperative serum ferritin, Hb, age, FCP, and Cr levels are related to new-onset IDA in premenopausal patients with obesity. First, in clinical practice, ferritin is predominantly used as a serum marker of total body iron stores and plays a critical role in the diagnosis and management of both iron deficiency and iron overload. In a retrospective study of 2116 subjects who underwent gastric bypass, McCracken et al. found that preoperative low ferritin (<13 ng/mL for females and <30 ng/mL for males) was significantly associated with postoperative severe anemia in both univariate and multivariate analyses [24]. Preoperative low ferritin was therefore included in the scoring algorithm they developed to predict postoperative severe anemia. In another study of factors associated with IDA after BS, including SG, gastric bypass, and duodenal switch, Gowanlock et al. showed that a low baseline ferritin level was associated with an increased risk of IDA over a mean follow-up of 31 months. A baseline ferritin level below 30 µg/L was associated with a higher risk of IDA, whereas a level of 156 µg/L or greater carried minimal risk even after 6 years of follow-up [20]. In our study, with a median follow-up of 12 months, the new-onset IDA group in the training set also had significantly lower baseline serum ferritin than the normal group, even though median ferritin levels were within the normal range in both groups.
Second, preoperative Hb itself can predict postoperative anemia or IDA. Lee et al. investigated factors affecting the development of anemia after BS, including gastric bypass, gastric banding, and SG, in a retrospective cohort of 442 subjects and found that an optimal preoperative Hb cutoff of 156 g/L predicted anemia 2 years after BS in patients with morbid obesity [25]. In the study by Gowanlock et al. cited above, lower preoperative Hb (<12 g/dL in females) was correlated with an increased risk of postoperative IDA [20]. Moreover, a recent study of 121 subjects by Ben-Porat et al. showed that a lower preoperative Hb level was independently associated with anemia during pregnancy 2 years after SG [26]. The inclusion of preoperative Hb as a predictor in our model is therefore consistent with previous findings on the association between preoperative Hb and anemia or IDA after BS. In addition, the median baseline Hb in the new-onset IDA group of our training set was 131.0 g/L, comparable to the preoperative Hb of patients who developed postoperative anemia in the SG cohort of Ben-Porat et al. Therefore, measures to prevent IDA may be warranted in premenopausal patients with an Hb level of around 130 g/L before SG.
Age is another factor included in our predictive model, and its association with anemia or IDA after BS has been investigated previously. In a large cohort study of 306,298 patients exploring factors associated with the risk of anemia after BS, including gastric bypass, SG, and gastric banding, Bailly et al. identified younger age (<52 years) as a factor for the occurrence of anemia [27]. In an East Asian cohort of 4373 subjects, Wang et al. found that the incidence of post-BS anemia increased among young (20–29 years) and middle-aged (30–64 years) patients [28]. The study by Gowanlock et al., which focused on predictors of IDA after BS in a cohort with a mean age of 46 years, also reported that younger age was associated with an increased risk of IDA [20]. In our study, the median age of the new-onset IDA group in the training set was 33.5 years. This is relatively young compared with previous studies and falls within the age groups previously associated with the risk of anemia or IDA after BS. Healthcare providers may therefore need to take measures to prevent IDA in young female patients after SG.
Preoperative FCP and Cr were the other two factors included in our model. Relatively low FCP may indicate impaired pancreatic beta-cell function. In a cross-sectional study of 1300 participants, Chung et al. reported that lower FCP was associated with more severe anemia in patients with type 2 diabetes [29]. Dysregulated iron metabolism, which may be caused by insulin resistance in patients with obesity or metabolic syndrome through various mechanisms [30], might deteriorate further as beta-cell function declines. Preoperative FCP may therefore be a factor associated with IDA in premenopausal patients after SG. As for preoperative Cr, relatively low serum Cr may reflect a decreased skeletal muscle proportion or lower red meat intake, because Cr is a marker of protein metabolism in subjects with normal renal function. A recent study by Ikeda-Taniguchi et al. found that malnourished patients with skeletal muscle loss showed functional iron deficiency, such as impaired iron-binding and iron-utilization capacity [31]. Preoperative Cr may therefore be a predictor of IDA after SG in premenopausal patients.
With regard to net benefit, the DCA curves showed that, compared with treating all patients empirically (the grey line), treating patients according to the SVM model's prediction (the black line) yielded greater benefit in both sets. Moreover, the intervention or prevention of IDA mainly involves nutritional arrangements and iron supplementation, which are effective, inexpensive, convenient, and have few adverse effects. The probability (risk) threshold for taking these measures may therefore be low, possibly <0.2, meaning that patients might opt for intervention within this range of predicted new-onset IDA probability. Within this range, treating patients according to the model's prediction performed better than treating all patients empirically, suggesting that, with the assistance of the model, clinicians could dissuade patients from excessive concern about postoperative IDA and avoid postoperative overtreatment for IDA prevention.
This study had several limitations. First, it was a single-center retrospective study, which may introduce selection bias; generalizing the results to the entire bariatric population requires external validation in larger multicenter cohorts. Second, the follow-up period was relatively short, so the model's ability to predict long-term IDA risk after SG requires further study. Third, preoperative dietary information was not collected with a quantitative questionnaire, so the association between preoperative dietary structure and postoperative IDA risk could not be evaluated. Fourth, body composition, especially skeletal muscle, was not assessed by magnetic resonance imaging or dual-energy X-ray absorptiometry, so it is unclear whether there was preoperative skeletal muscle loss in our cohort.
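The threshold argument in the decision-curve paragraph above rests on the standard DCA net-benefit definition: at threshold probability p_t, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), with the treat-none strategy fixed at 0. The Python sketch below computes it for a model and for the treat-all strategy; the ten patients are hypothetical, and this is not the rmda code.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating patients whose predicted risk is at least `threshold`."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone; treating no one always has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = new-onset IDA) and model risks for ten patients
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risk = [0.60, 0.30, 0.25, 0.10, 0.08, 0.07, 0.05, 0.04, 0.03, 0.02]
for pt in (0.05, 0.10, 0.20):
    print(pt, round(net_benefit(y, risk, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

For these hypothetical data, at p_t = 0.2 the model's net benefit is 0.175 while treating everyone gives about 0, illustrating how a model can outperform the treat-all strategy at low thresholds.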
This study also had strengths. First, it focused exclusively on premenopausal women, the population with the highest postoperative IDA incidence. Second, it included only patients who underwent SG, which is the most commonly recommended bariatric procedure according to the guidelines. We thus avoided the bias of studies that evaluated mixed populations undergoing multiple types of BS, which have varying metabolic effects on IDA. Third, rather than merely listing risk factors for postoperative IDA, we developed these factors into a predictive model and a practical tool for clinical use. Our ML model accurately predicted the probability of IDA about 1 year after SG. Guided by the model, healthcare providers could discuss the necessary postoperative arrangements with at-risk patients in advance, and essential healthcare resources could be allocated to these patients. The multidisciplinary BS team could plan postoperative nutritional management and iron supplementation for early intervention in patients at risk of IDA after SG. Furthermore, the surgical team could take measures to mitigate a patient's predicted postoperative IDA risk and then re-evaluate the patient with the model, ensuring that the patient undergoes surgery at a lower risk of postoperative IDA.
In conclusion, we devised what is, to our knowledge, the first ML predictive model for new-onset IDA in premenopausal female patients with obesity after SG. Consisting of preoperative ferritin, age, Hb, Cr, and FCP, the model accurately predicted IDA and may provide a reference for preventive interventions. It showed acceptable discrimination, calibration, and net benefit in predicting new-onset IDA after SG. With a preoperative evaluation of the risk of new-onset IDA, the multidisciplinary BS team could plan postoperative nutritional management and iron supplementation in advance for early intervention in patients at risk of IDA after SG. In addition, the surgical team could operate after taking measures to mitigate a patient's predicted postoperative IDA risk. Further validation in other ethnic groups is warranted.