Predictive Modeling for the Diagnosis of Gestational Diabetes Mellitus Using Epidemiological Data in the United Arab Emirates

Gestational diabetes mellitus (GDM) is a common condition with repercussions for both the mother and her child. Machine learning (ML) modeling techniques have been proposed to predict the risk of several medical outcomes. A systematic evaluation of the capacity of maternal factors to predict GDM in the UAE is warranted. Data on a total of 3858 women who gave birth and had information on their GDM status in a birth cohort were used to fit the GDM risk prediction model. The information used for the predictive modeling came from self-reported epidemiological data collected at early gestation. Three different ML models, random forest (RF), gradient boosting model (GBM), and extreme gradient boosting (XGBoost), were used to predict GDM. Furthermore, to provide a local interpretation of each feature in GDM diagnosis, features were studied using Shapley additive explanations (SHAP). The results show that XGBoost, which achieved an AUC of 0.77, performed better than RF and GBM. Individual feature importance using SHAP values and the XGBoost model shows that previous GDM diagnosis, maternal age, body mass index, and gravidity play a vital role in GDM diagnosis. ML models using self-reported epidemiological data are useful and feasible in prediction models for GDM diagnosis amongst pregnant women. Such data should be periodically collected at early pregnancy for health professionals to intervene at earlier stages to prevent adverse outcomes in pregnancy and delivery. The XGBoost algorithm was the optimal model for identifying the features that predict GDM diagnosis.


Introduction
Gestational diabetes mellitus (GDM) is a common medical condition during pregnancy and is characterized as any degree of glucose intolerance with onset or first recognition during pregnancy [1]. This definition is applicable regardless of whether insulin or diet modifications are used for treating GDM or whether the condition continues after pregnancy [2].
GDM increases the risk of maternal trauma, preeclampsia and eclampsia, premature rupture of membranes, preterm delivery, and delivery by caesarean section [3-5]. In the newborns, there is increased risk of macrosomia, shoulder dystocia, neonatal intensive care unit admission, and perinatal death [3,6-8]. Moreover, mothers with GDM have increased risk of type 2 diabetes and cardiovascular diseases later in life [9,10], while their children have an increased risk for obesity, impaired glucose tolerance, metabolic syndromes, and cardiovascular risk profiles during adolescence and early adulthood [11-13]. The current evidence indicates that early detection and management of GDM improves outcomes for both mothers and their children [14].
Previous research has shown that prediction modeling can successfully relate maternal factors to a future GDM diagnosis [15]. One model, developed by applying a machine learning (ML) algorithm to first-trimester data extracted from health records in order to predict the risk of GDM at 24-28 weeks of gestation, achieved an AUC of 0.86 and an accuracy of 62.2%. This algorithm included maternal factors such as age, parity, BMI, and education, together with hematological and biochemical test results [16]. It is uncertain whether these models are applicable to the local population in the UAE, as no such study has been done here or in the region. Furthermore, there is a lack of evidence on whether relatively feasible and easy-to-collect self-reported epidemiological data would be predictive when used in these models in the local community. Therefore, the purpose of the present study was to develop a simple model incorporating maternal self-reported data and triage results to predict the risk of GDM amongst pregnant women in the UAE.

Materials and Methods
This analysis is based on pregnant women from the Emirati population who participated in a prospective cohort study in Al Ain, Abu Dhabi, UAE. Upon recruitment, women completed a baseline questionnaire and were followed up during pregnancy via medical records in the hospitals. The overall study has been described in detail elsewhere [17]. The study was approved by the United Arab Emirates University Human Research Ethics Committee (ERH-2017-5512), the Al Ain Hospital Research Ethics Committee (AAHEC-03-17-058), and the Tawam Hospital Research Ethics Committee (IRR-494). Informed written consent was obtained from each participant prior to data collection.
Data for the current analysis were extracted from the questionnaire administered during the first point of contact with the participants recruited between May 2017 and February 2021. The questionnaire contains questions on the demographics, psychosocial factors, previous pregnancies, and behaviors during the participant's current pregnancy.
GDM was diagnosed using the mandatory testing and diagnosis standards used in all healthcare facilities in the emirate of Abu Dhabi. Specifically, between weeks 24 and 28 of gestation, pregnant women are required to complete a standardized oral glucose tolerance test (OGTT, fasting and 2 h post-glucose load) for GDM. Diagnosis of GDM was confirmed if fasting plasma glucose was ≥ 5.1 mmol/L, one-hour plasma glucose was ≥ 10.0 mmol/L, or two-hour plasma glucose was ≥ 8.5 mmol/L [18]. Women diagnosed with GDM were categorized into the GDM group and all other women were included in the comparison group. Women with previous type 1 or 2 diabetes mellitus were excluded from this analysis.
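The diagnostic rule above can be written as a small function. This is only a sketch: the thresholds are the ones quoted in the text, while the function name `has_gdm` and the choice to skip missing measurements are assumptions made for illustration.

```python
from typing import Optional

# Diagnostic thresholds in mmol/L, as quoted in the text.
FASTING_MIN = 5.1
ONE_HOUR_MIN = 10.0
TWO_HOUR_MIN = 8.5


def has_gdm(fasting: Optional[float] = None,
            one_hour: Optional[float] = None,
            two_hour: Optional[float] = None) -> bool:
    """Return True if any available OGTT value meets its threshold."""
    checks = [
        (fasting, FASTING_MIN),
        (one_hour, ONE_HOUR_MIN),
        (two_hour, TWO_HOUR_MIN),
    ]
    # Assumption: a measurement that was not taken (None) is skipped.
    return any(value is not None and value >= limit
               for value, limit in checks)
```

A woman with a fasting value of 4.8 mmol/L and a two-hour value of 8.5 mmol/L would be placed in the GDM group, because a single criterion is sufficient.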
Features selected included maternal age, number of previous pregnancies (gravidity), previous GDM diagnosis, planned pregnancy status, infertility treatment, consanguinity, education, employment, and physical activity during and before the current pregnancy. Feature selection focused on data collected in the questionnaire, in order to investigate the feasibility of prediction via self-reported data; hence, features collected from medical records, such as most anthropometry and biomarker data, were not included. From the medical records, only information about the women's GDM status and their body mass index (BMI) was used for prediction. The features used in this study have previously been shown to predict GDM diagnosis in pregnancy.
Descriptive statistics were performed to show and compare the distribution of characteristics of the study population by GDM status. Continuous variables are presented as means and standard deviations, while categorical variables are presented as counts and percentages. Student's t-tests were used to determine differences between group means for continuous variables (e.g., maternal age) and Pearson chi-square tests were used for categorical variables (e.g., maternal education). Statistical analyses were performed using Stata 15.1 (Stata Corp, College Station, TX, USA). A p-value less than or equal to 0.05 defined statistical significance.
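As an illustration of the categorical comparison, the Pearson chi-square test for a 2 x 2 table can be sketched in plain Python. The analysis itself was run in Stata; the function name and the counts in the usage note are hypothetical and are not taken from the study data. The p-value uses the closed form that holds for one degree of freedom.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    `table` is [[a, b], [c, d]], e.g. rows = GDM / no GDM and
    columns = employed / not employed.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

For the hypothetical table `[[30, 70], [20, 80]]` the expected counts are 25, 75, 25, and 75, so the statistic is 1 + 1/3 + 1 + 1/3 = 8/3.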
The proposed methodology can be explained using Algorithm 1.

Algorithm 1 GDM diagnosis using ML model.
Input: X, the total dataset of patients P.
Output: diagnosis of GDM for a patient p_i ∈ P, where p_i = 0 is normal and p_i = 1 is GDM.
1. X_imputed ← X: predict the missing values in X using MissForest.
2. X_train, X_test ← X_imputed: divide X_imputed into X_train and X_test.
3. Train the ML model f(·) on X_train.
4. Use the trained model f(·) to predict patient p_i in X_test.
5. Return p_i.
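The control flow of Algorithm 1 can be sketched in plain Python. This is not the study's implementation: column-median imputation stands in for MissForest, a one-feature threshold rule stands in for the tree ensembles, and all function names are hypothetical.

```python
import random
from statistics import median


def impute_median(rows):
    """Step 1 stand-in: fill None with the column median (not MissForest)."""
    n_cols = len(rows[0])
    medians = [median(r[c] for r in rows if r[c] is not None)
               for c in range(n_cols)]
    return [[medians[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def split(rows, labels, test_fraction=0.3, seed=0):
    """Step 2: random split into training and testing sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def fit_stump(X, y, feature=0):
    """Step 3 stand-in for f(.): threshold midway between the class means.

    Assumes larger feature values indicate higher risk.
    """
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        # Only one class seen in training: always predict that class.
        constant = 1 if pos else 0
        return lambda x: constant
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[feature] >= threshold else 0


def run_pipeline(X, y, seed=0):
    X_imputed = impute_median(X)                                   # step 1
    (X_train, y_train), (X_test, y_test) = split(X_imputed, y,
                                                 seed=seed)        # step 2
    model = fit_stump(X_train, y_train)                            # step 3
    predictions = [model(x) for x in X_test]                       # step 4
    return predictions, y_test                                     # step 5
```

Swapping `impute_median` and `fit_stump` for a MissForest-style imputer and a tree ensemble would leave `run_pipeline` unchanged, which is the point of the five-step structure.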
To build a GDM prediction model, the performance of three ML-based models, random forest (RF), gradient boosting model (GBM), and extreme gradient boosting (XGBoost), was evaluated in this analysis. RF is an ML model that consists of multiple decision trees (DTs), where each tree makes its own prediction. The predictions of the DTs are then combined by averaging or majority vote to obtain the overall output prediction. If there are multiple DTs, each with its own prediction for the outcome, then according to the RF algorithm the final prediction is GDM when
\[ \mathrm{Yes}_{GDM} > \mathrm{No}_{GDM}, \tag{1} \]
where Yes_GDM and No_GDM are the numbers of DTs predicting the presence and absence of GDM, respectively.
GBM is an ensemble ML classifier based on the idea of boosting, that is, the idea that weak learners can be converted into strong learners in an iterative way [19,20]. In GBM, gradients of the loss function are used to minimize the loss for the weak learners (DTs). However, GBM suffers from overfitting if the iterative process is not properly regularized [20].
XGBoost [19] is a scalable ML model based on a gradient boosting framework that builds low-depth DTs iteratively to minimize a loss function [21,22]. The training process adds DTs iteratively to predict the errors of the previous DTs before all the DTs are ensembled [23]. An XGBoost model containing n DTs is represented as
\[ y_{pred} = \sum_{i=1}^{n} f_i(x), \tag{2} \]
where x is the input sample, f_i is the i-th DT, and y_pred is the value predicted by the model. In XGBoost, training is performed in an additive manner with the aim of optimizing the objective function. The objective function O for m samples at the t-th iteration is represented as
\[ O^{(t)} = \sum_{j=1}^{m} l\left(y_{actual,j},\, y_{pred,j}^{(t)}\right) + \sum_{i=1}^{t} \Omega(f_i). \tag{3} \]
In Equation (3), y_actual,j is the observed outcome for sample j, l(·) is the loss function, and Ω(f_i) is the regularization function, which can be represented as
\[ \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}, \tag{4} \]
where T is the number of leaves in the tree, w is the vector of leaf weights, and γ and λ are regularization parameters.
The Shapley additive explanations (SHAP) method [24] is based on cooperative game theory [25], in which a group of players cooperate towards the outcome of a game and obtain a certain gain. Some players may contribute more than the other team members, resulting in their payoff being larger than that of the others. SHAP values provide a way to distribute the gain based on each player's contribution. Consider a game, G, consisting of N players. Let C be a coalition of players and v(C) be the cost obtained from the coalition. Then, for each individual player i and the cost function v, the SHAP value φ_i(v) can be obtained using
\[ \phi_i(v) = \frac{1}{N!} \sum_{\pi} \left[ v\big(C(\pi, i) \cup \{i\}\big) - v\big(C(\pi, i)\big) \right], \tag{5} \]
where π ranges over the set of permutations of the N players and C(π, i) is the set of players in the coalition that precedes player i in π.
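The permutation definition of the Shapley value can be checked by brute force on a very small game. The sketch below enumerates every ordering of the players and averages each player's marginal contribution. The payoff numbers in `toy_value` are hypothetical and are not study results. This enumeration is only practical for a handful of players; TreeExplainer uses a far more efficient tree-specific algorithm.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values: average marginal contribution over orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()  # players that precede the current one
        for p in order:
            with_p = coalition | {p}
            totals[p] += v(with_p) - v(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff: additive terms plus one age-BMI interaction."""
    total = 0.0
    if "previous_gdm" in coalition:
        total += 0.30
    if "age" in coalition:
        total += 0.10
    if "bmi" in coalition:
        total += 0.06
    if "age" in coalition and "bmi" in coalition:
        total += 0.04  # interaction shared equally by age and BMI
    return total


phi = shapley_values(["previous_gdm", "age", "bmi"], toy_value)
```

In this toy game the interaction term of 0.04 is split equally between age and BMI, so the values are 0.30, 0.12, and 0.08, and they sum to the payoff of the full coalition (0.50).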
The higher the φ_i(v), the larger the payoff of the individual player. In the GDM model, each feature is analogous to a player: if the SHAP value is higher for a specific feature, that feature contributes more to the diagnosis of GDM.
The missing data were imputed using missForest [26,27]. The dataset with the features was divided randomly into 70-30 training-testing sets and each experiment was repeated five times. SHAP values were calculated using the Python implementation of TreeExplainer. To evaluate the performance of each ML model for the proposed GDM diagnosis, we first obtained the ROC curve for each ML model, followed by each feature's importance using SHAP values. Then, we described the positive and the negative impact of each feature on the GDM diagnosis using SHAP values (see Results).
The feature contribution impact plot was plotted with a positive impact represented by a red bar and a negative impact represented by a blue bar. The global feature importance of the ML model across all samples was obtained using a summary graph. Each sample in a summary graph is represented by a dot whose position on the x-axis is determined by its SHAP value, which shows the contribution of that feature to the GDM diagnosis. When multiple samples lie on the same point, they create a density. Finally, dependence plots were obtained for each feature. In each dependence plot, the x-axis represents the feature, its interaction with another dependent feature is represented on the right-hand y-axis, and the SHAP values are represented on the left-hand y-axis (see Results).
All the experiments were conducted using Python 3.8 on an Intel(R) Core i9-9900 CPU @ 3.10 GHz with 8 GB RAM.
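The repeated 70-30 evaluation can be sketched as follows. The area under the ROC curve is computed through its rank interpretation: the probability that a randomly chosen GDM case receives a higher score than a randomly chosen non-case, with ties counted as one half. The function names are hypothetical, and stratifying the split by class is an assumption added here so that every test set contains both classes; the text only states that the split was random.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def stratified_split(y, test_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately."""
    test_idx = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        test_idx += idx[:round(len(idx) * test_fraction)]
    test = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in test]
    return train_idx, sorted(test)


def repeated_holdout_auc(X, y, fit, n_repeats=5, test_fraction=0.3, seed=0):
    """Fit on 70%, score the held-out 30%, and repeat n_repeats times.

    `fit(X_train, y_train)` must return a function mapping a row to a score.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_split(y, test_fraction, rng)
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        results.append(auc([y[i] for i in test_idx],
                           [score(X[i]) for i in test_idx]))
    return results
```

Averaging the returned list gives a single summary figure per model, which is one way the three models could be compared on equal footing.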

Results
The baseline characteristics of the 3858 women who had information on their OGTT are presented in Table 1. Women who were diagnosed with GDM were older (32.8 vs. 29.9, p < 0.001), more parous (3.4 vs. 2.7, p < 0.001), more likely to be employed (37.5% vs. 31.3%, p = 0.006), and more likely to have self-reported previous GDM (56.0% vs. 15.6%, p < 0.001).
The ROC plot in Figure 1 shows that the XGBoost model achieved the best auROC of 0.770, compared to 0.764 and 0.624 achieved by RF and GBM, respectively. Therefore, the XGBoost-based GDM model was used for further analysis using the SHAP values.
The impact of each feature on GDM diagnosis is represented in Figure 2. The figure shows that, based on higher SHAP values, previous GDM diagnosis is the most important factor for GDM diagnosis in the current pregnancy compared to any other feature. Maternal age and BMI are the next most important factors in GDM diagnosis. Features such as self-reported physical activity before and after pregnancy, employment, infertility treatment, and planned pregnancy had very little impact on GDM diagnosis. Again, previous GDM diagnosis was identified as the most important feature, and maternal age and BMI were the second most predictive features contributing positively towards GDM diagnosis using the SHAP values. On the other hand, self-reported physical activity before and after pregnancy and higher education have a negative impact in the proposed GDM model.
The summary plot in Figure 3 shows that a history of previous GDM leads to higher SHAP values; thus, it has the most significant influence on GDM diagnosis. Similarly, higher values of maternal age and BMI result in higher SHAP values. Higher education and performing physical activity before and after pregnancy decrease the risk of GDM diagnosis. Employment is also shown to have a negative impact on GDM diagnosis. Other features such as infertility treatment, planned pregnancy, and consanguinity have only a slight positive impact on GDM diagnosis.
Figure 4 represents the dependence plots for the features using the SHAP values for each sample. The plot shows that previous GDM diagnosis has a clear interaction with gravidity, as shown in Figure 4a for women who have been previously diagnosed with GDM and those who have a gravidity of more than four. In this situation, the SHAP values increase and thus the women are at a higher risk of developing GDM. For age (Figure 4b), the SHAP value increases when the woman is older than 35 years. Regardless of previous GDM diagnosis, increased age is a risk factor for developing GDM. Conversely, an age of less than 30 years has a negative impact in the GDM model. BMI (Figure 4c) shows an interaction with age, that is, a pregnant woman whose BMI is more than 30 kg/m² (considered obese) has an increased risk of GDM, especially if she is older. In younger pregnant women, a BMI of less than 25 kg/m² (considered of acceptable weight) is negatively associated with GDM diagnosis. The plot also shows that gravidity increases the risk of developing GDM. Women with lower education (Figure 4e) are at a higher risk of developing GDM, especially if they are older. Primiparity has an interaction with education, as shown in Figure 4f, which shows that women who attended higher education and who are not primiparous are at a lower risk of developing GDM. Physical activity before pregnancy (Figure 4g) shows an interaction with previous GDM. Women who do not perform physical activity have SHAP values greater than 0, indicating a risk of GDM diagnosis. Figure 4i shows that unplanned pregnancy in older women is a risk factor for GDM diagnosis. Employment and education have an interaction, which shows that women who are unemployed and have lower education may be at a higher risk of GDM diagnosis (Figure 4j). The dependence plot for infertility treatment in Figure 4k shows that younger women who had infertility treatment are at a higher risk of GDM diagnosis.

Discussion
In this analysis of self-reported epidemiological data collected during early pregnancy, three ML prediction models were developed: XGBoost, RF, and GBM. Of these, XGBoost was the most predictive (auROC = 0.77), performing better than both GBM and RF. The models showed that previous GDM diagnosis, maternal age, body mass index, and gravidity play a vital role in future GDM diagnosis.
The high predictivity of the XGBoost model in this analysis is consistent with the findings of a recent study [28], which found that the XGBoost model had a higher AUC than the logistic model (0.742 vs. 0.663, p = 0.001). XGBoost is an optimized gradient tree boosting system, built as an ensemble of multiple decision trees, which also controls overfitting [29]. XGBoost can create diverse and accurate DTs, which may be the reason for its better performance [28,30]. Moreover, it handles non-linear relationships in the data and is robust to outliers. However, the black-box nature of DT ensemble algorithms [31] makes it challenging to provide a local interpretation of how each feature leads to GDM diagnosis. Therefore, to provide this local interpretation, we used SHAP with the XGBoost algorithm. Individual feature importance using SHAP values shows that previous GDM diagnosis is the most important factor for GDM diagnosis, followed by the age of the pregnant woman. XGBoost and RF, which are the main parts of the proposed GDM model, can easily be generalized to similar populations. The GBM model is sensitive to noise, requires expensive parameter tuning, and may suffer from overfitting, which may explain why GBM performed poorly compared to RF and XGBoost. The problem of overfitting can be addressed by using the optimized objective function obtained from boosted trees in the XGBoost model.
The features included in this analysis are easily collected from pregnant women at antenatal care visits. As such, prediction of future GDM diagnosis using self-reported epidemiological data at early pregnancy is highly feasible. This is crucial for many reasons. The easy identification of women at risk of GDM is an important step towards better antenatal care and interventions during pregnancy and even before conception [32]. Nutritional management should remain the focus, as it has shown the best prognosis for neonatal and maternal outcomes in women with GDM [33]. The early recognition of such women using simple predictors makes management easier for both the women and the caregivers. Furthermore, practical interventions can be set up to ensure that women with GDM are proactive in their approach to dealing with the diagnosis. Many applications exist in the areas of diabetes-related dietary and physical activity management, and a customized application can be explored based on the factors most strongly associated with the diagnosis in each population.
A systematic review [11] reported that the risk of recurrent GDM in a subsequent pregnancy was as high as 30-84% in women with prior GDM, and that the variations in the GDM recurrence rate were dependent on the presence of other risk factors. The combination of high BMI with high abdominal circumference and elevated fasting glucose was associated with a 13-fold increased risk of GDM as compared to women who did not have this combination of symptoms [34]. According to Liu et al. [35], BMI and maternal age were two of the most used features for GDM prediction. Our findings showed that, regardless of previous GDM history, the risk of GDM increases after the age of 35, with women under the age of 30 having a lower risk. Advanced maternal age is an independent risk factor for GDM [36]. Physical activity during pregnancy has also been shown to predict GDM in other populations. In the same meta-analysis mentioned above, physical activity both during and prior to pregnancy was associated with lower odds of GDM in pregnancy. The identification of women with sedentary behavior or poor activity via simple data such as those of this study allows for early intervention and better follow-up. The predictive models also showed interactions between socio-economic factors such as education and employment in decreasing the probability of future GDM diagnosis. The clear interaction between older age and lower education, and its association with GDM diagnosis, identifies a clearly defined population who should be targeted for intervention. The same applies to women with previous infertility treatments who are inactive or sedentary. Such interactive models allow defined populations to be targeted so that, with appropriate interventions, higher-risk populations may avoid an eventual GDM diagnosis.
One of the strengths of this study is its focus on self-reported epidemiological data in predicting GDM diagnosis among pregnant women. Other studies show that prediction models work well with data that are more invasive to collect, such as those from complete blood counts and anthropometry [37]. The focus of this study, however, was to give importance to features that can be easily collected from pregnant women at early gestation. This allows for close monitoring as early as possible. Women can also be told of their risk of being diagnosed and given appropriate interventions, as mentioned earlier. There are some limitations to this study. Firstly, self-reported data suffer from their own set of biases. From the ML perspective, ML models usually require a sufficient amount of training data for better prediction. Although there is a reasonable amount of data for the GDM model, only a limited number of risk factors were used in this analysis. Other authors have used a large number of features and a sufficient amount of training data. For instance, Artzi et al., Qiu et al., and Wu et al. used 2355, 50, and 50 features, respectively [16,38,39]. Another major issue is class imbalance, as GDM cases were fewer than non-cases, which biases the algorithm. Therefore, to balance the classes, data balancing algorithms such as SMOTE and GANs will be used in the future. Furthermore, the prediction of the proposed XGBoost model can be further improved by incorporating more robust optimization techniques such as Bayesian particle swarm optimization.
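To make the class-balancing idea concrete, the core interpolation step of SMOTE can be sketched in plain Python. This is an illustrative sketch of the general technique, not an implementation planned or used in this study; the function name, the default of three neighbours, and the example feature values are assumptions.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (Euclidean)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical GDM cases described by [age, BMI]
gdm_cases = [[30.0, 28.0], [35.0, 31.0], [38.0, 33.5], [41.0, 29.0]]
extra_cases = smote(gdm_cases, n_new=5)
```

Because every synthetic point is a convex combination of two real cases, the new samples stay within the range of the observed minority data. Categorical features, which make up much of the questionnaire data, would need a variant of this step.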

Conclusions
ML models using self-reported epidemiological data are useful and feasible in prediction models for GDM diagnosis amongst pregnant women. Such data should be periodically collected at early pregnancy for health professionals to intervene at earlier stages to prevent adverse outcomes in pregnancy and delivery. The XGBoost algorithm was the optimal model for identifying the features that predict GDM diagnosis. SHAP values using XGBoost further identify the interactions of some variables in determining GDM diagnosis.

Informed Consent Statement:
Informed written consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study can be made available on request from the Mutaba'ah study. Approval from research ethics committee may be required.