Prediction of HELLP Syndrome Severity Using Machine Learning Algorithms—Results from a Retrospective Study

(1) Background: HELLP (hemolysis, elevated liver enzymes, and low platelets) syndrome is a rare and life-threatening complication of preeclampsia. The aim of this study was to evaluate and compare the performance of four machine learning-based models in predicting HELLP syndrome and its subtypes according to the Mississippi classification; (2) Methods: This retrospective case-control study evaluated the pregnancies of women who attended a tertiary maternity hospital in Romania between January 2007 and December 2021. The patients' clinical and paraclinical characteristics were included in four machine learning-based models: decision tree (DT), naïve Bayes (NB), k-nearest neighbors (KNN), and random forest (RF), and their predictive performance was assessed; (3) Results: HELLP syndrome was best predicted by the RF (accuracy: 89.4%) and NB (accuracy: 86.9%) models, while the DT (accuracy: 91%) and KNN (accuracy: 87.1%) models performed best in predicting class 1 HELLP syndrome. The predictive performance of these models was modest for class 2 and 3 HELLP syndrome, with accuracies ranging from 65.2% to 83.8%; (4) Conclusions: Machine learning-based models could be useful tools for predicting HELLP syndrome and its most severe form, class 1.


Introduction
HELLP (hemolysis, elevated liver enzymes, and low platelets) syndrome is a rare complication of preeclampsia (PE), associated with high fetal and maternal mortality and morbidity rates [1]. The prevalence of HELLP syndrome ranges from 0.5% to 0.9% [2]. The majority of cases (70%) occur during the third trimester of pregnancy or within 48 h after birth, but a few can manifest as early as the second trimester, after 20 weeks of gestation [3]. The perinatal death rate is approximately 37%, while the maternal mortality rate ranges from 0% to 24% [4,5].
Maternal complications of HELLP syndrome include postpartum hemorrhage, placental abruption, heart failure, pulmonary edema, acute kidney injury, disseminated intravascular coagulation (DIC), acute respiratory distress syndrome (ARDS), stroke, and hepatic injury, while neonatal complications include preterm birth, low birth weight, birth asphyxia, neonatal intensive care unit (NICU) admission, and neonatal resuscitation [6,7]. The occurrence of these complications depends on the severity of HELLP syndrome.
The Mississippi triple-class system grades the severity of HELLP syndrome according to the platelet count (PLT), serum aspartate aminotransferase (AST) or alanine aminotransferase (ALT) levels, and lactate dehydrogenase (LDH) levels [4]. Class 1 HELLP syndrome is the most severe form, defined as PLT ≤ 50,000/mm³, AST or ALT ≥ 70 IU/L, and LDH ≥ 600 IU/L.

[…] (No. 151/13.02.2022). Informed consent was obtained from all participants included in the study. All methods were carried out in accordance with relevant guidelines and regulations.

Study Population
We recruited participants at the time of admission to our tertiary center. The inclusion criteria were as follows: pregnant patients with singleton pregnancies who developed HELLP syndrome, maternal age ≥ 18 years, and reliable first-trimester pregnancy dating. Exclusion criteria were multiple pregnancy, fetal chromosomal or structural abnormalities, intrauterine infection, incomplete medical records, incorrect or absent first-trimester sonographic pregnancy dating, and inability to provide informed consent.

Data Collection
Maternal characteristics and previous medical history were evaluated by a physician, and maternal risk factors for preeclampsia and HELLP syndrome were recorded in the database. The following parameters were evaluated: demographic data, age, parity, body mass index (BMI), smoking status during pregnancy, use of assisted reproductive technology (ART), obstetrical comorbidities, personal history of PE, diabetes, pre-existing chronic hypertension, renal disease, hepatic disease, thrombophilia, systemic lupus erythematosus (SLE), antiphospholipid syndrome (APS), serum LDH and AST levels, platelet count (PLT), highest systolic (SBP) and diastolic (DBP) blood pressure values, symptoms, HELLP syndrome complications (eclampsia, abruptio placentae, pulmonary edema, hepatorenal syndrome, hepatic insufficiency, sepsis), and number of maternal deaths.
The AST to platelet ratio index (APRI) was calculated using the following formula:

$$\mathrm{APRI} = \frac{\mathrm{AST}\ (\mathrm{IU/L}) \,/\, \mathrm{AST}_{\mathrm{ULN}}\ (\mathrm{IU/L})}{\mathrm{PLT}\ (10^{9}/\mathrm{L})} \times 100$$

where $\mathrm{AST}_{\mathrm{ULN}}$ is the upper limit of normal for AST. Blood pressure was measured according to the Fetal Medicine Foundation (FMF) guidelines [19] using a calibrated device (Omron M3 COMFORT; Omron Corp, Kyoto, Japan). The following pregnancy outcomes were recorded: type of birth, presentation, gestational age at birth, newborn sex, birthweight, length, Apgar scores at 1 and 5 min, NICU admission, and neonatal death.
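To make the APRI formula concrete, a minimal Python sketch follows. The function name `apri` and the example values (AST 210 IU/L, an assumed AST upper limit of normal of 35 IU/L, and a platelet count of 40 × 10^9/L) are hypothetical; the laboratory-specific ULN used in the study is not reported.

```python
def apri(ast_iu_l: float, ast_uln_iu_l: float, platelets_1e9_l: float) -> float:
    """AST to platelet ratio index: [(AST / AST ULN) / platelet count (10^9/L)] x 100."""
    if ast_uln_iu_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("AST ULN and platelet count must be positive")
    return (ast_iu_l / ast_uln_iu_l) / platelets_1e9_l * 100


# Hypothetical patient; the ULN of 35 IU/L is an assumption, not taken from the study.
print(round(apri(210, 35, 40), 2))  # 15.0
```

Because 1 mm³ equals 1 µL, a platelet count of 50,000/mm³ corresponds to 50 × 10^9/L, which is the unit the formula expects.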

Study Groups
A total of 161 patients were included in the analysis and divided into two groups: those who developed HELLP syndrome (81 patients, group 1) and those who did not (80 patients, group 2). Patients with HELLP syndrome were further divided into the following subgroups according to the Mississippi classification: subgroup 1 (Class 1, n = 21), subgroup 2 (Class 2, n = 35), and subgroup 3 (Class 3, n = 25).

First Step: Statistical Analysis
In the first stage of the statistical analysis, categorical variables were compared using chi-squared and Fisher's exact tests and are presented as frequencies with corresponding percentages; continuous variables were compared using t-tests and are presented as means and standard deviations (SD).
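The categorical comparisons described above can be illustrated with a dependency-free Python sketch for a 2 × 2 table, for example a risk factor present or absent in the HELLP and control groups. The function names and counts are hypothetical; the study itself used STATA, and libraries such as SciPy offer equivalent, more general routines. Fisher's exact test is usually preferred over the chi-squared test when expected cell counts are small.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    All row and column totals must be positive. Returns (statistic, p_value), 1 df.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df the statistic is a squared standard normal: P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x, given fixed margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical table: rows = factor present / absent, columns = HELLP / no HELLP
stat, p_chi2 = chi2_2x2(10, 0, 0, 10)
p_fisher = fisher_exact_2x2(10, 0, 0, 10)
print(stat)  # 20.0
```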
One-way ANOVA with the Bonferroni post hoc test was used to determine whether the subgroups differed significantly in their paraclinical characteristics (AST, LDH, PLT, APRI, SBP, DBP), and boxplots were used to represent these differences graphically. The statistical analyses were performed using STATA SE (version 14, 2015, StataCorp LLC, College Station, TX, USA).
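The ANOVA step can be sketched in a few lines of Python. The helper names and input values below are hypothetical and this is not the STATA procedure used in the study. The sketch computes only the F statistic; its p-value would come from the F distribution with (k − 1, n − k) degrees of freedom, for example via `scipy.stats.f.sf`, which is not shown. The Bonferroni helper applies to the pairwise subgroup comparisons (three pairs for three subgroups).

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Bonferroni correction: multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical values for three subgroups (not study data)
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, df_b, df_w)  # 27.0 2 6
```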

Machine Learning Analysis: Data Inclusion and Types of Algorithms
In the second stage of the analysis, we evaluated the predictive performance of four machine learning-based models: decision tree (DT), naïve Bayes (NB), k-nearest neighbors (KNN), and random forest (RF). The clinical data included in the evaluation were demographic data, age, parity, BMI, smoking status during pregnancy, use of ART, obstetrical comorbidities, and personal medical history (PE, diabetes, pre-existing chronic hypertension, renal disease, hepatic disease, thrombophilia, SLE, APS). The paraclinical data included the APRI score, SBP, and DBP.
Diagnostics 2023, 13, 287
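Of the four algorithms listed above, KNN is the simplest to show directly. The Python sketch below is illustrative only: the function name, the choice of k = 3, Euclidean distance, and the standardized (APRI, SBP) values are all assumptions, since the KNN settings used in SPSS Modeler are not reported.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training rows (Euclidean)."""
    # Sort training rows by distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns the label with the most votes among the k neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical standardized (APRI, SBP) values; 1 = HELLP, 0 = no HELLP.
X = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.7), (1.1, 0.9), (0.8, 1.2), (1.0, 1.0)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (0.9, 1.1)))  # 1
```

Because KNN relies on distances, features measured on very different scales (for example APRI and blood pressure in mmHg) should be standardized first; otherwise the largest-scale feature dominates the vote.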

Feature Selection and Data Processing
As a first data-processing step, the parameters were standardized using the Automated Data Preparation (ADP) node from the Field Ops palette. No feature selection was performed, because we wanted to assess the overall predictive performance of a combination of clinical and paraclinical parameters that clinicians determine routinely.
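The exact ADP settings are not reported, so the following Python sketch assumes z-score rescaling (mean 0, SD 1), a common form of standardization. The function names and the blood pressure rows are hypothetical.

```python
from statistics import mean, stdev


def fit_standardizer(rows):
    """Estimate the per-column mean and sample standard deviation (needs >= 2 rows)."""
    return [(mean(col), stdev(col)) for col in zip(*rows)]


def standardize(rows, params):
    """Apply z = (value - mean) / SD column-wise; constant columns map to 0.0."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, params)]
        for row in rows
    ]


# Hypothetical (SBP, DBP) rows in mmHg
train = [[140, 90], [160, 100], [180, 110]]
params = fit_standardizer(train)
print(standardize(train, params))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Splitting fitting from transformation lets the means and SDs be estimated on the training data only and then reused for held-out data, which avoids leaking information from the evaluation set into preprocessing.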
Data were then split into testing (70%) and training (30%) subsets. To protect against overfitting, all models underwent 5-fold cross-validation. After cross-validation of both the testing and training data, we calculated the models' predictive performance based on the training results. True positive rates (TPR), false negative rates (FNR), false discovery rates (FDR), accuracies, areas under the receiver operating characteristic curve (AUROC), precision, and F1 scores were calculated and compared for HELLP syndrome overall and for the class 1, 2, and 3 subgroups. The models were compared using ROC analysis, and the results were plotted. The models were constructed and analyzed using IBM SPSS Modeler (version 1.0.0.399, IBM Corporation, Armonk, NY, USA).
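The cross-validation and evaluation metrics named above can be sketched as follows. This is a hypothetical illustration, not the SPSS Modeler procedure: `k_fold_indices` deals shuffled row indices into five folds, and `binary_metrics` computes the confusion-matrix metrics for one target class, which can be applied one-vs-rest to classes 1, 2, and 3.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for one target class (use one-vs-rest for several classes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    tpr = tp / (tp + fn) if tp + fn else 0.0        # sensitivity (recall)
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    return {
        "TPR": tpr,
        "FNR": fn / (tp + fn) if tp + fn else 0.0,
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
        "precision": precision,
        "accuracy": (tp + tn) / len(pairs),
        "F1": 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0,
    }


# Each fold serves once as the validation set while the other four are used for fitting.
folds = k_fold_indices(161, k=5)
print([len(f) for f in folds])  # [33, 32, 32, 32, 32]
```

Note that FNR = 1 − TPR and FDR = 1 − precision whenever the corresponding denominators are non-zero.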

Clinical and Paraclinical Characteristics of the Patients Included in the Main Groups
A total of 161 pregnant patients were evaluated in our retrospective study. Their clinical and paraclinical characteristics are presented in Table 1 for the two groups: patients who developed HELLP syndrome (81 patients, group 1) and those who did not (80 patients, group 2). The first group had significantly more patients with a personal history of chronic hypertension (p = 0.024), SLE/APS (p = 0.004), thrombophilia (p = 0.007), and preeclampsia in previous pregnancies (p = 0.004). Moreover, obesity (p < 0.001), nulliparity (p = 0.021), and the use of ART (p = 0.007) were significantly more frequent in the first group than in the second group. Among the evaluated paraclinical characteristics, the APRI score, SBP, and DBP were significantly higher in the HELLP group (p < 0.001).

Pregnancy Outcomes of the Patients Included in the Main Groups
Pregnancy outcomes for the main groups are presented in Table 2. Pregnancies affected by HELLP syndrome were significantly associated with complications such as preterm birth (p < 0.001), intrauterine growth restriction (p < 0.001), and oligoamnios (p = 0.01). Patients in the first group had a significantly higher cesarean delivery rate (n = 77, 95.06%; p < 0.001), and their newborns had significantly lower birthweight, length, and Apgar scores at 1 and 5 min (p < 0.001). Moreover, significantly more newborns of mothers with HELLP syndrome were admitted to the NICU for specific treatment (p = 0.015), and three newborn deaths were recorded. Overall, three maternal deaths (3.7%) were recorded in the HELLP syndrome group, whereas only one neonatal death was recorded in the control group. Complications recorded in the HELLP group included hepatic insufficiency, pulmonary edema, and sepsis (n = 1, 1.25% each) and hepatorenal syndrome (n = 3, 3.7%).

Subgroup Comparisons
We further compared the paraclinical characteristics and symptoms of the following subgroups: subgroup 1 (Class 1, n = 21), subgroup 2 (Class 2, n = 35), and subgroup 3 (Class 3, n = 25) (Tables 3 and 4). Serum LDH and AST values were significantly higher, and platelet counts significantly lower, in the first subgroup than in the second and third subgroups (p < 0.001). Moreover, we observed an ascending trend in APRI, SBP, and DBP values from the third subgroup to the first, although the latter did not reach statistical significance (p = 0.21). The subgroup comparisons are shown graphically in Figures 1-4. Although all the analyzed symptoms were more prevalent in the second subgroup, no statistically significant differences were found between subgroups.

The Predictive Performance of Machine Learning-Based Models for the HELLP Syndrome and Its Subtypes
In the second stage of the analysis, we incorporated the patients' clinical and paraclinical characteristics into four machine learning-based models and calculated their predictive performance (Table 5). DT achieved the highest accuracy when predicting class 1 HELLP syndrome (91%), with a TPR of 94.9%. Although the RF algorithm had the highest TPR for class 1 HELLP prediction (88.6%), its best performance was achieved when predicting HELLP syndrome overall, with an accuracy of 89.4%. A similar pattern was observed for the NB model, which achieved an accuracy of 86.9% for HELLP syndrome overall and a TPR of 88.3% for class 3 HELLP syndrome. Finally, the KNN model performed best when predicting class 1 HELLP syndrome, with an accuracy of 87.1% and an AUROC of 0.81.

ROC Analysis
ROC curves comparing the machine learning algorithms for each evaluated group are shown in Figures 5-8, and the results of the ROC analysis are given in Table 6. This analysis confirmed statistically significant differences (p < 0.05) between the AUROC values of the ML models across the evaluated groups.
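As an illustration of what the AUROC measures, the Python sketch below computes it without tracing the curve, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), which equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The function name and scores are hypothetical. Testing whether two AUROCs differ significantly requires an additional procedure (DeLong's method is a common choice), which is not shown; the source does not state which test was used.

```python
def auroc(y_true, scores, positive=1):
    """AUROC as P(score of a random positive > score of a random negative), ties counting 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive-negative pair (O(n_pos * n_neg), fine for small cohorts).
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four patients (1 = HELLP)
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```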

Discussion
This is the first retrospective study in the literature to train four machine learning-based models (DT, NB, KNN, and RF) for the prediction of HELLP syndrome and its severity (three classes) in a cohort of pregnant patients with singleton pregnancies, using clinical and paraclinical characteristics.
RF is an ensemble learning method built on the bagging algorithm [20]. It grows many decision trees, each on a random subset of the data, and combines the outputs of all the trees. In doing so, it mitigates the overfitting of individual decision trees, lowering variance and increasing accuracy. NB, on the other hand, is suitable for multi-class prediction problems, especially with small datasets, and has a much lower computational cost than RF [21]. DT models resemble NB in handling small datasets at low cost, but are prone to overfitting without a proper data standardization process [22]. Finally, one of the biggest advantages of the KNN model is that it can be used for both classification and regression problems, although it does not perform well on imbalanced data [23].
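The bagging idea behind RF can be sketched in Python under simplifying assumptions: each "tree" here is a one-split decision stump rather than a full tree, features are sampled once per stump rather than at every split, and all names and data are hypothetical. It is a teaching illustration of bootstrap sampling plus majority voting, not the RF implementation in SPSS Modeler.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Return a one-split classifier: the (feature, threshold) with the fewest training errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= thr]
            right = [label for row, label in zip(X, y) if row[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    _, feat, thr, left_label, right_label = best
    return lambda row: left_label if row[feat] <= thr else right_label


def fit_bagged_stumps(X, y, n_estimators=25, max_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    models = []
    for _ in range(n_estimators):
        rows = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        feats = rng.sample(range(p), max_features)
        models.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return models


def predict_vote(models, row):
    """Aggregate the ensemble by majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Hypothetical, well-separated two-feature data (0 = no HELLP, 1 = HELLP)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_bagged_stumps(X, y)
print(predict_vote(forest, [0, 0]), predict_vote(forest, [6, 6]))
```

Averaging many learners that were each fitted to a different bootstrap sample is what reduces variance: the idiosyncrasies of any single sample are outvoted by the rest of the ensemble.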
In a recent review, Uddin et al. investigated the predictive performance of various machine learning approaches and showed that among papers using only clinical and demographic data, DT had the highest accuracy, whereas among those using research data other than the 'clinical and demographic' type, support vector machines (SVM) and RF most often showed superior accuracy [24]. Another retrospective study, which evaluated two ML models for predicting adverse outcomes, including HELLP syndrome, in patients with suspected preeclampsia, reported similar positive predictive values for the gradient-boosted tree model (88 ± 6%) and the random forest classifier (88 ± 6%) [16].
Our models used both clinical and paraclinical data and showed superior predictive performance for the most severe form of HELLP syndrome (class 1), which is an advantage for physicians who monitor the clinical progression of this disorder. The clinical characteristics can be easily obtained from the patient's history and medical records, while the paraclinical characteristics used, including the APRI score, can be rapidly determined in local hospitals, potentially allowing adverse pregnancy outcomes to be anticipated.
Pregnancies affected by HELLP syndrome had significantly more adverse outcomes, such as preterm birth (n = 60, 74.07%), intrauterine growth restriction (n = 33, 40.74%), oligoamnios (n = 13, 16.05%), lower newborn Apgar scores at 1 min (6.44 ± 2.31) and 5 min (6.67 ± 2.65), and more NICU admissions (n = 13, 16.05%). A recent retrospective study by Li et al., which analyzed the similarities and differences in clinical features and pregnancy outcomes across various forms of PE and HELLP syndrome, reported similar results and an overall incidence of adverse maternal outcomes of 61.4% [25].
Our study has several limitations, including the small cohort and the limited number of predictors; at the same time, the trained models have the advantage of being easier for physicians to implement. All the chosen machine learning-based models can handle small samples [26]. Moreover, these algorithms have shown superior predictive performance on datasets composed mainly of categorical predictors compared with other models such as gradient boosting [27], artificial neural networks [28], support vector machines, extreme gradient boosting, multilayer perceptron [29], or linear discriminant analysis [30].
We hypothesize that the models' accuracy could be improved by adding specific sonographic and serum markers for preeclampsia prediction, since the pathophysiology of preeclampsia and HELLP syndrome is closely related [31][32][33][34][35]. Nevertheless, this is the first study in the literature to evaluate the predictive performance of four machine learning-based models for the prediction of HELLP syndrome and its subtypes in a cohort of pregnant patients from a tertiary center over a 14-year timeframe.
Further studies on larger cohorts could evaluate the predictive performance of these ML-based models in different settings and populations. The results could aid clinicians in risk-stratifying pregnant patients and in weighing the risks and benefits of prompt delivery versus conservative management.

Conclusions
This is the first retrospective study in the literature to train four machine learning-based models on clinical and paraclinical characteristics for the prediction of HELLP syndrome and its severity in a cohort of pregnant patients with singleton pregnancies.
Pregnancies affected by HELLP syndrome were associated with significantly more adverse pregnancy outcomes such as preterm birth, intrauterine growth restriction, oligoamnios, low Apgar scores at 1 and 5 min, and more NICU admissions.
The results from our study indicated that HELLP syndrome was best predicted by RF and NB models, while DT and KNN models had the highest performance when used to predict class 1 HELLP syndrome.
Further studies on larger cohorts of patients could demonstrate an improvement in the predictive performance of these models by adding traditional markers for preeclampsia.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to local policies.

Conflicts of Interest:
The authors declare no conflict of interest.