Abstract
The increasing prevalence of gestational diabetes mellitus (GDM) is contributing to the rising global burden of type 2 diabetes (T2D) and the intergenerational cycle of chronic metabolic disorders. Primary lifestyle interventions to manage GDM, including second trimester dietary and exercise guidance, have met with limited success due to late implementation, poor adherence and generic guidelines. In this study, we aimed to build a preconception-based GDM predictor to enable early intervention. We also assessed the associations of top predictors with GDM and adverse birth outcomes. Our evolutionary algorithm-based automated machine learning (AutoML) model was implemented with data from 222 Asian multi-ethnic women in a preconception cohort study, the Singapore Preconception Study of Long-Term Maternal and Child Outcomes (S-PRESTO). A stacked ensemble model with a gradient boosting classifier and linear support vector machine classifier (stochastic gradient descent training) was derived using genetic programming, achieving an excellent AUC of 0.93 based on four features (glycated hemoglobin A1c (HbA1c), mean arterial blood pressure, fasting insulin, triglycerides/HDL ratio). A multivariate logistic regression model showed that each 1 mmol/mol increase in preconception HbA1c was associated with increased risks of GDM (p = 0.001, odds ratio (95% CI) 1.34 (1.13–1.60)) and preterm birth (p = 0.011, odds ratio (95% CI) 1.63 (1.12–2.38)). Optimal control of preconception HbA1c may aid in preventing GDM and reducing the incidence of preterm birth. Our trained predictor has been deployed as a web application that can be easily employed in GDM intervention programs prior to conception.
1. Introduction
The prevalence of gestational diabetes mellitus (GDM) is increasing globally, affecting one in five pregnancies in some populations [1]. GDM is a condition in which a woman without previous diabetes develops glucose intolerance during pregnancy [2]. This condition increases the risk of developing GDM-related complications such as hypertensive disorders of pregnancy, fetal macrosomia, caesarean section, shoulder dystocia and birth injuries [3]. Poorly controlled GDM also increases risks of premature birth, perinatal mortality and neonatal morbidity. GDM has long-term implications as women with a history of GDM have a 10-fold higher risk of developing type 2 diabetes (T2D) as well as higher risk of developing cardiovascular adversities compared to those with a normoglycemic pregnancy [4,5]. Offspring of mothers with GDM are also at an increased risk of having cardiometabolic adversities, resulting in a transgenerational cycle of diabetes and cardiovascular diseases [6].
Healthcare systems across the world use either the high risk selective screening approach or universal screening of GDM in pregnant women. The American Diabetes Association (ADA) endorses the use of either a one-step approach (IADPSG diagnostic criteria, fasting two-hour, three-point 75 g oral glucose tolerance test (OGTT)) or an older two-step approach (non-fasting one-hour 50 g glucose challenge test (GCT), followed by diagnostic fasting three-hour 100 g OGTT on a subset of women exceeding the glucose threshold value of GCT) at 24–28 weeks’ gestation [7]. The UK NICE recommends high risk selective screening for women with known GDM risk factors, such as obesity (body mass index (BMI) ≥30 kg/m2), family history of diabetes, history of GDM, previous delivery of a macrosomic baby (≥4.5 kg) and being in an ethnic group with a high prevalence of diabetes (South Asian, Black Caribbean or Middle Eastern) [8]. In the latest UK NICE 2015 guidelines, women with a history of GDM are offered an OGTT at their booking appointment [8]. Women with other risk factors are offered an OGTT at 24–28 weeks’ gestation. The International Diabetes Federation (IDF) GDM Model of Care [9] recommends that all pregnant women are screened at first visit by a fasting glucose, HbA1c or random glucose sample to rule out pre-existing diabetes. In those with normal early screening, an OGTT is performed at 24–28 weeks’ and 32 weeks’ gestation (for high risk women) to assess the risk of GDM.
Pre-existing abnormalities in maternal metabolism are important factors in the pathophysiology of metabolic diseases and fetal programming. GDM intervention typically focuses on counseling, dietary modification and increasing physical activity. The daily self-monitoring of blood glucose is aimed at normalizing blood glucose levels and reducing the complications of GDM. Primary lifestyle interventions to manage GDM, such as diet and exercise in the second trimester, provide limited benefits for the mother and child due to late implementation, poor adherence and generic guidelines [10]. Preconception presents an important opportunity to break the intergenerational cycle of chronic metabolic disorders. The Lancet series on preconception maternal health in 2018 highlighted preconception as a critical period for shaping pregnancy outcomes and subsequent maternal and child health [11,12,13].
In recent years, some machine learning models have been developed for population-based GDM risk stratification. However, the current state-of-the-art models are only applicable during pregnancy, which can be too late for effective intervention. Artzi et al. trained a LightGBM gradient boosting classifier with Israel’s Electronic Health Records (EHR) data for onset of GDM (area under the receiver operating characteristic curve (AUC) of 0.80 was achieved with nine questionnaire features) [14]. In another study, Wu et al. trained a logistic regression classifier with China’s EHR data for early GDM prediction (AUC of 0.77 was achieved with seven clinical features) [15]. To date, there have been no studies applying machine learning for GDM risk assessment in a preconception population. We therefore propose a paradigm shift in GDM management strategy, from risk assessment during pregnancy to risk assessment at preconception.
In this study, we developed a machine learning model for early prediction of GDM during preconception among women in Singapore. Taking a longitudinal approach, we also assessed the associations of the strongest predictors with GDM and adverse birth outcomes (preterm birth, low birthweight at term and large for gestational age infant). Our machine learning models were implemented using data from the prospective Singapore Preconception Study of Long-Term Maternal and Child Outcomes (S-PRESTO) cohort study.
2. Materials and Methods
2.1. Study Design
S-PRESTO (ClinicalTrials.gov NCT03531658) is a prospective, preconception cohort study of multi-ethnic groups (Chinese, Malay, Indian or any combination of these three ethnicities) [16]. Women planning for pregnancies were recruited from the KK Women’s and Children’s Hospital (KKH) and community between February 2015 and October 2017. There were 1032 unique participants for preconception; 475 conceived singleton pregnancies within a year of enrollment into the study, and 373 remained in the study and had a livebirth. The mother–child dyads have been followed for seven years, with longitudinal phenotypic data collected across multiple health domains.
Maternal glucose tolerance status was assessed longitudinally using a 75 g 2 h oral glucose tolerance test (OGTT) at preconception, mid-gestation (median 28.1 weeks, interquartile range 27.3–28.7 weeks) and 3 months postpartum, alongside glycated hemoglobin A1c (HbA1c) at the same timepoints. The International Association of Diabetes and Pregnancy Study Groups (IADPSG)/World Health Organization (WHO) 2013 criteria (fasting plasma glucose 5.1–6.9 mmol/L, 1 h plasma glucose ≥ 10.0 mmol/L or 2 h plasma glucose 8.5–11.0 mmol/L) were used to diagnose GDM [17]. The WHO 2006 criteria (fasting plasma glucose ≥ 7.0 mmol/L or 2 h plasma glucose ≥ 11.1 mmol/L) were used to diagnose impaired fasting glucose (IFG), impaired glucose tolerance (IGT) and type 2 diabetes (T2D) [18]. An HbA1c of ≥6.5% was used as the cut-off point for diagnosing diabetes based on WHO recommendations [19].
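The glucose thresholds above can be expressed as a small decision rule. The following is a minimal sketch rather than the study's own code: the function name, the three output labels and the assumption that a single exceeded threshold is sufficient for a GDM label are illustrative choices, and the diabetes branch simply reuses the fasting and 2 h cut-offs quoted above.

```python
from typing import Optional


def classify_antenatal_ogtt(fasting: float,
                            one_hour: Optional[float],
                            two_hour: float) -> str:
    """Label a 75 g antenatal OGTT; all glucose values are in mmol/L."""
    # Values at or above the diabetes cut-offs lie outside the GDM range.
    if fasting >= 7.0 or two_hour >= 11.1:
        return "diabetes"
    # Assumption: meeting any one threshold is enough for a GDM label.
    if fasting >= 5.1:
        return "GDM"
    if one_hour is not None and one_hour >= 10.0:
        return "GDM"
    if two_hour >= 8.5:
        return "GDM"
    return "normal"


if __name__ == "__main__":
    # Hypothetical readings, not study data.
    print(classify_antenatal_ogtt(4.6, 9.1, 8.7))
```

The 1 h value is optional in this sketch so that a missing reading does not raise an error; how missing readings should be handled in practice is a separate decision.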
Participants diagnosed with T2D based on preconception and 3 months postpartum OGTT or HbA1c readings were excluded from model training. GDM analysis was restricted to mothers whose gestation at the time of antenatal OGTT was 24+1–28+6 weeks (gestational age is given as weeks+days). Participants of mixed ethnicity or unclassifiable GDM status due to missing glucose readings were removed from the final analysis set.
Our models were built using 222 preconception women who had complete data on demographics, medical/obstetric history, physical measures, blood-derived markers, lifestyle factors and antenatal OGTT (Figure 1). The prevalence of GDM was 13.1% in our training dataset. Participant characteristics are presented in Table 1.
Figure 1.
Sample selection flowchart. Sample selection flowchart of 222 preconception women who had complete data on demographics, medical/obstetric history, physical measures, blood-derived markers, lifestyle factors and antenatal OGTT for machine learning models.
Table 1.
Participant characteristics at preconception baseline. Participant characteristics table on demographics, medical/obstetric history, physical measures, blood-derived markers, lifestyle factors, metabolic indices, prediabetes status, antenatal OGTT and adverse birth outcomes. Continuous variables are presented as group mean value and standard deviation. Categorical variables are presented as count and percentage.
Information on demographics (age, ethnicity) and medical/obstetric history (family history of diabetes mellitus, history of GDM, parity and medical history of high blood pressure) was derived from preconception questionnaires. Lifestyle factors on self-reported smoking and alcohol consumption were also collected at preconception.
The physical measures at preconception were included for feature selection modeling. Weight was measured to the nearest 0.1 kg (SECA 803) and height to the nearest 0.1 cm (SECA 213). BMI was derived using weight divided by height squared (kg/m2). Waist circumference was measured to the nearest 0.1 cm (SECA 203). Additionally, mid-upper arm circumference was measured to the nearest 0.1 cm, midway between acromion process and olecranon process (SECA 212). Systolic and diastolic blood pressure were measured using the Microlife BP 3AS1-2 blood pressure device. Mean arterial blood pressure was further derived by doubling the diastolic blood pressure and adding to the systolic blood pressure, with the composite sum divided by 3.
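The two derived physical measures described above, BMI and mean arterial blood pressure, can be sketched as follows. The function names are illustrative assumptions, and height is converted from centimetres because that is the unit in which it was recorded.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2: weight divided by height squared."""
    height_m = height_cm / 100.0  # height was measured in cm
    return weight_kg / (height_m ** 2)


def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """MAP in mm Hg: (2 x diastolic + systolic) / 3, as described in the text."""
    return (2.0 * diastolic + systolic) / 3.0


if __name__ == "__main__":
    # Hypothetical participant, not study data.
    print(round(body_mass_index(60.0, 160.0), 1))        # 23.4
    print(round(mean_arterial_pressure(120.0, 80.0), 1))  # 93.3
```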
Sodium fluoride/potassium oxalate tubes were used to collect blood samples for plasma glucose measurement. Potassium EDTA tubes were used to collect whole blood samples for HbA1c measurement. All samples were kept at 4 °C, immediately sent to the hospital laboratory, centrifuged within 30 min and analyzed within 1 h from the time of earliest blood draw. Fasting plasma glucose, 30 min plasma glucose, 1 h plasma glucose (antenatal OGTT only), 90 min plasma glucose (antenatal OGTT only), 2 h plasma glucose and HbA1c concentrations were measured using the ARCHITECT c8000 Clinical Chemistry Analyzer (Abbott Laboratories), which is a National Glycohemoglobin Standardisation Program (NGSP) certified method for HbA1c testing. The preconception HbA1c marker was included for feature selection modeling.
Longitudinally obtained plasma samples were analyzed for fasting insulin, triglycerides (TGs), high density lipoprotein (HDL) cholesterol and gamma-glutamyl transferase at the National University Hospital (NUH) clinical laboratory (accredited by the College of American Pathologists [20]). Maternal venous blood was collected into silicone coated tubes, and serum was obtained by centrifugation at 1600× g for 10 min at 4 °C. The serum was stored at −80 °C until sample batch analysis. Insulin was measured using a sandwich immunoassay (Beckman DxI 800 analyzer, Beckman Coulter, Fullerton, CA, USA). Using a Beckman AU5800 analyzer, TG and gamma-glutamyl transferase were measured by colorimetric assays and HDL cholesterol by an enzymatic assay. These blood markers were subsequently used for the derivation of metabolic indices and machine learning modeling.
The homeostasis model assessment of insulin resistance (HOMA-IR) index was calculated based on the formula [21]:
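In its commonly used form, assumed here to be the one given in [21], with fasting insulin in µU/mL and fasting plasma glucose in mmol/L, the index can be written as:

```latex
\mathrm{HOMA\text{-}IR} =
  \frac{\text{fasting insulin}\ (\mu\mathrm{U/mL}) \times \text{fasting plasma glucose}\ (\mathrm{mmol/L})}{22.5}
```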
In addition, the TG/HDL cholesterol ratio was calculated based on the fasting lipid concentrations to assess insulin resistance [22].
Fatty liver index as a surrogate marker of non-alcoholic fatty liver disease (NAFLD) was calculated with 4 variables (triglycerides (TGs), BMI, gamma-glutamyl transferase (GGT) and waist circumference (WC)) [23]:
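In the form in which the index is usually reported, assumed here to correspond to [23], the fatty liver index (FLI) is a logistic function of the four variables scaled to 0–100. The units are an assumption carried over from that usual form: TG in mg/dL, GGT in U/L, BMI in kg/m2 and WC in cm, so TG values reported in mmol/L would need to be converted first.

```latex
\mathrm{FLI} = \frac{e^{z}}{1 + e^{z}} \times 100,
\qquad
z = 0.953\,\ln(\mathrm{TG}) + 0.139\,\mathrm{BMI} + 0.718\,\ln(\mathrm{GGT}) + 0.053\,\mathrm{WC} - 15.745
```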
Metabolic syndrome status was defined when three or more of the following criteria were fulfilled: waist circumference > 80 cm, triglycerides ≥ 1.7 mmol/L, HDL cholesterol ≤ 1.3 mmol/L, blood pressure ≥ 130/85 mm Hg, fasting plasma glucose ≥ 6.1 mmol/L [24].
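The definition above is a count of fulfilled criteria, which can be sketched as below. The function and parameter names are illustrative, and reading "blood pressure ≥ 130/85 mm Hg" as systolic ≥ 130 or diastolic ≥ 85 is an assumption, not something stated in the text.

```python
def has_metabolic_syndrome(waist_cm: float,
                           tg_mmol_l: float,
                           hdl_mmol_l: float,
                           systolic: float,
                           diastolic: float,
                           fpg_mmol_l: float) -> bool:
    """Return True when three or more of the five criteria are fulfilled."""
    criteria = [
        waist_cm > 80.0,
        tg_mmol_l >= 1.7,
        hdl_mmol_l <= 1.3,
        # Assumption: either component reaching its cut-off counts.
        systolic >= 130.0 or diastolic >= 85.0,
        fpg_mmol_l >= 6.1,
    ]
    return sum(criteria) >= 3


if __name__ == "__main__":
    # Hypothetical participant, not study data.
    print(has_metabolic_syndrome(85.0, 1.8, 1.2, 120.0, 80.0, 5.0))
```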
Age, ethnicity, family history of diabetes mellitus, history of GDM, parity, height, BMI, mid-upper arm circumference, mean arterial blood pressure, HbA1c, fasting insulin, self-reported smoking, self-reported alcohol consumption, TG/HDL ratio, fatty liver index and metabolic syndrome variables were included for feature selection modeling.
2.2. Machine Learning Methodology and Statistical Analyses
Our methodological novelty lies in combining coalitional game theory concepts with evolutionary algorithm-based automated machine learning (AutoML). Automating the process of machine learning enables the best possible model to be built for our supervised machine learning problem. The optimal machine learning pipelines were automatically generated using genetic programming (GP), a type of evolutionary algorithm [25,26]. An introduction to GP is provided in the Supplementary Materials. In brief, GP solves machine learning tasks based on random mutation, crossover, fitness functions and generations to arrive at optimal solutions (models and hyperparameters).
The Shapley additive explanations (SHAP) framework [27] was combined with the evolutionary algorithm-based Tree-Based Pipeline Optimization Tool (TPOT) [28] to discover novel features and select optimal supervised machine learning models. We explored the interaction effects of multiple predictors using the SHAP framework methodology. In game theory, the Shapley value is the average expected marginal contribution of one player across all possible permutations of players (average effects of team member composition and team size). The Shapley value helps to determine a payoff for all the game players when each player might have contributed more or less than the others when working in coalition. In machine learning, game players are the features, and collective payout is the model prediction. The SHAP framework provides local explanations based on exact Shapley values to understand the global model structure. For every possible feature ordering, features are introduced one at a time into a conditional expectation function of the model’s output, and changes in expectation are attributed to the introduced feature, averaged over all possible feature orderings in a fair manner. SHAP values represent a change in log odds ratio. Lundberg and Lee have proposed SHAP as the only additive feature attribution method that satisfies two important properties of game theory—additivity (local accuracy) and monotonicity (consistency) [27]. The integrated game theoretical approach with automated machine learning further advances biomedical data science for data-driven precision care.
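The averaging over feature orderings described above can be illustrated with a brute-force computation on a toy game. This is a minimal sketch of exact Shapley values, not the SHAP library's algorithm: the payoff function `toy_value`, its weights and its interaction bonus are hypothetical, whereas SHAP uses conditional expectations of the model output as the payoff.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for player in order:
            extended = coalition | {player}
            totals[player] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical payoff: additive weights plus one pairwise interaction."""
    weights = {"HbA1c": 0.5, "MAP": 0.2, "insulin": 0.1}
    payoff = sum(weights[p] for p in coalition)
    if {"HbA1c", "insulin"} <= coalition:
        payoff += 0.2  # interaction bonus, shared equally by the pair
    return payoff


if __name__ == "__main__":
    phi = shapley_values(["HbA1c", "MAP", "insulin"], toy_value)
    print({name: round(v, 3) for name, v in phi.items()})
```

In this toy game the attributions sum to the payoff of the full coalition, which is the additivity (local accuracy) property mentioned above. Enumerating all orderings is only feasible for a handful of features.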
The AutoML models were built using Anaconda’s distribution of the Python v3.7.9 programming language in a JupyterLab computational environment. Community-developed Python packages were used for modular programming: Pandas v0.25.3, Numpy v1.19.2, Matplotlib v3.3.2, Scikit-learn v0.23.2, TPOT v0.11.7 and Shap v0.37.0. We trained the AutoML models on a Linux server with an Intel Xeon Gold 6138 CPU. In the TPOT classifier, the search for optimal machine learning pipelines was run over 100 generations with 100 individuals retained in the genetic programming population of every generation. We used 5-fold stratified cross validation to preserve the same proportion of GDM cases in each fold, and model performances were evaluated using the area under the receiver operating characteristic curve (AUC).
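The evaluation scheme described above rests on two ideas: folds that preserve the case proportion, and AUC as the probability that a randomly chosen case is scored above a randomly chosen non-case. The dependency-free sketch below illustrates both. It is not the TPOT or scikit-learn implementation; the function names are illustrative, the fold assignment is deterministic (no shuffling), and the label vector of 29 cases among 222 is an assumed example chosen to give a prevalence of about 13.1%.

```python
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    # Ties count as half a correctly ranked pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 29 + [0] * 193  # assumed example, not the study data
    for fold in stratified_folds(labels, k=5):
        cases = sum(labels[i] for i in fold)
        print(len(fold), cases)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable at this sample size but not for large datasets.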
The AutoML feature selection model based on preconception feature variables was trained with GDM as the outcome; top predictors with SHAP value magnitudes greater than zero were included in the GDM prediction models. Sensitivity analyses were performed to explore the prediction effects of fasting glucose, systolic blood pressure and HOMA-IR index in the proposed AutoML model. We also assessed the associations between the strongest predictors and GDM outcome/adverse birth outcomes (preterm birth, low birthweight at term and large for gestational age infants). Preterm birth was defined as livebirth before 37 weeks of pregnancy [29]. Low birthweight at term was defined as birthweight less than 2500 g in term births (37–42 weeks of pregnancy) [29]. The sex-specific birthweight for gestational age percentile was derived by making reference to Growing Up in Singapore Towards Healthy Outcomes (GUSTO) healthy newborn weight percentile [30], which was based on the generic reference for birthweight percentiles created by Mikolajczyk et al. [31]. Large for gestational age infants have a birthweight of more than 90th percentile. Additional sensitivity analyses were performed by excluding preconception women with prediabetes (IFG and IGT). All association analyses were performed using Stata/MP 17.0 software (StataCorp LP, College Station, TX, USA).
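The three adverse birth outcome definitions above can be sketched as a single function. The names are illustrative assumptions, and the sex-specific birthweight percentile is taken as an input because the reference percentile lookup itself is not reproduced here.

```python
from typing import Dict


def classify_birth_outcomes(gestational_age_weeks: float,
                            birthweight_g: float,
                            birthweight_percentile: float) -> Dict[str, bool]:
    """Flag preterm birth, low birthweight at term and LGA for one livebirth."""
    is_term = 37.0 <= gestational_age_weeks <= 42.0
    return {
        "preterm_birth": gestational_age_weeks < 37.0,
        "low_birthweight_at_term": is_term and birthweight_g < 2500.0,
        # The percentile is assumed to come from the sex-specific reference.
        "large_for_gestational_age": birthweight_percentile > 90.0,
    }


if __name__ == "__main__":
    # Hypothetical infant, not study data.
    print(classify_birth_outcomes(38.5, 2400.0, 4.0))
```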
3. Results
3.1. Top Predictors from AutoML Feature Selection Model
Figure 2 presents the SHAP global importance plot of the AutoML feature selection model. A stacked ensemble model with a random forest classifier and linear support vector machine classifier (stochastic gradient descent training) was the best machine learning pipeline evaluated by TPOT (AUC: 0.89). The top preconception feature variables impacting the model outputs were HbA1c, fatty liver index, mean arterial blood pressure, fasting insulin, TG/HDL ratio, height, age, mid-upper arm circumference, BMI, parity, alcohol consumption, family history of diabetes mellitus and Chinese ethnicity.
Figure 2.
SHAP Global Importance Plot. Global importance of individual features and their correlation with GDM/non-GDM outcomes estimated using the Shapley values computed from coalitional game theory. SHAP values represent a change in log odds ratio. A SHAP value of zero means that the feature does not contribute to the prediction.
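Because the attributions are additive on the log-odds scale, a prediction can be read as a baseline plus the sum of the per-feature SHAP values, passed through the logistic function. The sketch below illustrates that reading only. The SHAP values are hypothetical, and using the log odds of the 13.1% training prevalence as the baseline is an assumption made for illustration; in practice the baseline is the expected model output.

```python
import math
from typing import Dict


def predicted_probability(base_log_odds: float,
                          shap_values: Dict[str, float]) -> float:
    """Add per-feature log-odds contributions to the baseline, then apply the logistic function."""
    log_odds = base_log_odds + sum(shap_values.values())
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    baseline = math.log(0.131 / (1.0 - 0.131))  # assumed baseline
    # Hypothetical contributions for one woman, not model output.
    contributions = {"HbA1c": 0.9, "MAP": 0.3, "fasting insulin": -0.1, "TG/HDL": 0.0}
    print(round(predicted_probability(baseline, contributions), 3))
```

A contribution of zero, as for the TG/HDL entry in this example, leaves the prediction unchanged.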
Pre-pregnancy BMI demonstrated small predictive effects relative to preconception HbA1c. Chinese women also had a higher risk of GDM when compared with Indian and Malay women. The latter observation could be attributed to the high proportion of Chinese ethnic participants in the analysis set (79.3%). A history of GDM was a redundant feature in the AutoML feature selection model possibly due to the low frequency of participants with a history of documented GDM (2.7%). Metabolic syndrome status preconception did not contribute to GDM prediction.
3.2. Preconception Predictive Risk Model
The preconception predictive risk model for GDM was sequentially constructed using top predictors with SHAP value magnitudes greater than zero (Table 2). Preconception HbA1c alone was able to predict GDM outcome with high discrimination (AUC: 0.81). A model with nine features obtainable non-invasively (mean arterial blood pressure, height, age, mid-upper arm circumference, BMI, parity, alcohol consumption, family history of diabetes, Chinese ethnicity) was also able to predict GDM outcome with good performance (AUC: 0.81). The optimal machine learning pipeline comprises five features (HbA1c, fatty liver index, mean arterial blood pressure, fasting insulin, TG/HDL ratio). The extra trees classifier was the best machine learning pipeline evaluated by TPOT (AUC: 0.93). In the sensitivity analysis (see Supplementary Table S1), we observed that model performance was still maintained by dropping the fatty liver index as a feature variable. Based on the remaining four features, a stacked ensemble model with a gradient boosting classifier and linear support vector machine classifier (stochastic gradient descent training) was the best machine learning pipeline evaluated by TPOT (AUC: 0.93). The four-feature model comprising HbA1c, mean arterial blood pressure, fasting insulin and TG/HDL ratio is our proposed solution for a preconception-based GDM predictor. The exported AutoML pipeline for the best predictive model is provided in the Supplementary Materials.
Table 2.
Construction of preconception predictive risk model. The preconception predictive risk model for GDM was sequentially constructed using top predictors with SHAP value magnitudes greater than zero in the AutoML feature selection model. The optimal machine learning pipeline for each model and area under the receiver operating characteristic curve (AUC) performance metric are reported.
The proposed AutoML model was also robust when replacing HbA1c with fasting glucose (AUC: 0.87), replacing mean arterial blood pressure with systolic blood pressure (AUC: 0.91) and replacing fasting insulin with HOMA-IR index (AUC: 0.91) (Supplementary Table S1). HbA1c had the greatest impact on model performance changes (ΔAUC = −0.06), followed by mean arterial blood pressure (ΔAUC = −0.02) and fasting insulin (ΔAUC = −0.02). Given these observations, maternal insulin resistance around conception can be postulated as an important determinant in the pathophysiology of metabolic diseases and fetal programming.
3.3. Associations of Top Predictors and GDM Outcome
Table 3 presents the associations of the strongest predictors identified from the AutoML feature selection model for GDM. Each 1 mmol/mol increase in preconception HbA1c was positively associated with GDM, independent of maternal ethnicity, age, parity, family history of diabetes mellitus and pre-pregnancy BMI (p = 0.001, OR (95% CI) 1.34 (1.13–1.60)).
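An odds ratio reported per 1 mmol/mol can be rescaled to a larger difference in HbA1c by working on the log-odds scale. The sketch below shows the arithmetic only. It assumes the log odds of GDM are linear in HbA1c across the range considered, which is an extrapolation rather than a result of the study, and the function name is illustrative.

```python
import math


def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a difference of `units`, given the per-unit odds ratio."""
    # The logistic regression coefficient is the natural log of the per-unit OR.
    coefficient = math.log(or_per_unit)
    return math.exp(coefficient * units)


if __name__ == "__main__":
    # Reported OR of 1.34 per 1 mmol/mol; a 5 mmol/mol difference is a hypothetical example.
    print(round(scale_odds_ratio(1.34, 5.0), 2))  # 4.32
```

The same rescaling can be applied to each confidence limit, under the same linearity assumption.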
Table 3.
Associations of top predictors and GDM outcome. Associations of top predictors identified from AutoML feature selection model and GDM outcome. Statistical tests were conducted two-sided with a significance level of 5%. All confidence intervals (CIs) are presented two-sided with a confidence level of 95%. The odds ratios (ORs) with 95% CI are presented. A resultant p-value of less than 0.05 is considered statistically significant.
3.4. Associations of Top Predictors and Adverse Birth Outcomes (Preterm Birth, Low Birthweight at Term and Large for Gestational Age Infant)
Similarly, Table 4 presents the associations of top GDM predictors with adverse birth outcomes (preterm birth, low birthweight at term and large for gestational age infant). Each 1 mmol/mol increase in preconception HbA1c was positively associated with preterm birth outcome, independent of maternal ethnicity, age, parity, family history of diabetes mellitus, pre-pregnancy BMI, GDM diagnosis, total gestational weight gain and child sex (p = 0.011, OR: 1.63 (1.12–2.38)). However, preconception HbA1c was not associated with low birthweight at term (OR: 1.13 (0.86–1.49)) or large for gestational age infant (OR: 1.06 (0.92–1.21)). We additionally found that pre-pregnancy BMI was positively associated with large for gestational age infant (p < 0.001, OR: 1.20 (1.10–1.31)).
Table 4.
Associations of top predictors and adverse birth outcomes (preterm birth, low birthweight at term and large for gestational age infant). Associations of top predictors identified from AutoML feature selection model and adverse birth outcomes (preterm birth, low birthweight at term and large for gestational age infant). Statistical tests were conducted two-sided with a significance level of 5%. All confidence intervals (CIs) are presented two-sided with a confidence level of 95%. The odds ratios (ORs) with 95% CI are presented. A resultant p-value of less than 0.05 is considered statistically significant.
After excluding women with prediabetes, the associations between preconception HbA1c and a GDM outcome (p = 0.003, OR: 1.32 (1.10–1.59)) and with a preterm birth outcome (p = 0.010, OR: 1.75 (1.14–2.67)) were not materially changed.
4. Discussion
Primary Findings
We built an effective preconception-based GDM predictor by integrating game theory concepts with evolutionary algorithm-based AutoML. Our proposed AutoML model was derived using genetic programming and achieved an excellent AUC of 0.93 with four features (HbA1c, mean arterial blood pressure, fasting insulin, TG/HDL ratio). A stacked ensemble model with the gradient boosting classifier and linear support vector machine classifier (stochastic gradient descent training) was the best machine learning pipeline evaluated by TPOT. The preconception predictive risk model can be leveraged as a risk stratification tool during preconception care to identify Asian women at high risk of developing GDM, enabling early intervention. Where blood-derived markers are unavailable, our non-invasive model trained with nine features (mean arterial blood pressure, height, age, mid-upper arm circumference, BMI, parity, alcohol consumption, family history of diabetes, Chinese ethnicity) provides an alternative for clinical implementation (AUC: 0.81).
Population-based research on preconception HbA1c and its association with GDM and adverse birth outcomes remains limited. In our study, HbA1c was the top predictive feature discovered from AutoML feature selection modeling. The physiological variation in HbA1c can be attributed to increased red cell turnover during pregnancy with new erythrocytes exposed to a lower time-averaged glucose concentration [32] and decreasing insulin sensitivity with increasing gestation [33].
In the fully adjusted logistic regression model (adjusted for maternal ethnicity, age, parity, family history of diabetes mellitus and pre-pregnancy BMI), preconception HbA1c was associated with increased risks of GDM. Preconception HbA1c alone had a high predictive performance in the AutoML model (AUC: 0.81). Similarly in the sensitivity analyses, the predictive performance of the AutoML model was stronger with preconception HbA1c (AUC: 0.93) than preconception fasting glucose (AUC: 0.87), implying that early GDM risk stratification can be significantly improved with the inclusion of preconception HbA1c over preconception fasting glucose. Moreover, HbA1c offers greater clinical convenience than fasting glucose as there is no fasting requirement, less biological variation and greater pre-analytical stability [34]. As HbA1c is a measure of how glucose has interacted with erythrocytes up to a three-month period [35], our findings suggest that women who develop GDM may have impaired glucose homeostasis prior to pregnancy itself.
The clinical usefulness of preconception HbA1c can be extended to adverse pregnancy outcomes. In a Swedish study by Ludvigsson et al. [36], women with periconceptional HbA1c levels within recommended target levels (HbA1c < 6.5%) were at increased risk of preterm delivery. The risk of early preterm birth increased with increasing HbA1c levels in normal pregnancies and among women with type 1 diabetes [36]. Our study provides further evidence that preconception HbA1c is an independent risk factor for preterm birth. In the fully adjusted logistic regression model (adjusted for maternal ethnicity, age, parity, family history of diabetes mellitus, pre-pregnancy BMI, GDM diagnosis, total gestational weight gain and child sex), preconception HbA1c was associated with increased risks of preterm birth. Associations between preconception HbA1c and GDM and preterm birth were not materially changed after excluding women with prediabetes, indicating that preconception HbA1c is a reliable marker in predicting GDM/preterm birth even within normal HbA1c range.
Blood pressure changes between preconception and pregnancy are underexplored in the literature. In our study, mean arterial blood pressure was another critical component of the AutoML model. Although mean arterial blood pressure at preconception was not associated with GDM outcome, the linkage between preconception blood pressure and physiological changes associated with pregnancy complications warrants further investigation.
The TG/HDL ratio is a surrogate marker for insulin resistance and was one of the top five features in the AutoML feature selection model. GDM is a condition of increased insulin resistance, and this shifts the balance of lipid processing as reflected by the TG/HDL ratio [37,38]. The four features in AutoML modeling for GDM prediction (HbA1c, mean arterial blood pressure, fasting insulin and TG/HDL ratio) discovered through genetic programming are suggestive of transient insulin resistance at preconception and reflect the women’s pre-existing metabolic physiology. This physiology clearly has a bearing on the women’s ability to mount an appropriate metabolic adaptation to pregnancy in response to signals from the conceptus to ensure a successful pregnancy. Dysfunctional metabolic adaptation can thus lead to gestational diabetes and preterm birth.
5. Limitations
This study has some limitations due to scarcity of longitudinal data. Our AutoML model was trained on a limited S-PRESTO cohort of 222 preconception women. However, the prospective S-PRESTO data capture complex clinical pathways during pregnancy initiation and are less prone to differential measurement errors. Sub-cohort analyses by individual ethnic groups could be performed with larger clinical datasets such as electronic health records. No replication cohort was available, and our proposed model should be evaluated in confirmatory studies. The four features in AutoML modeling for GDM prediction need to be evaluated in an early pregnancy cohort for generalizability.
Comparison with Prior Work
The implementation of our GDM risk prediction algorithm during preconception care would enable early engagement of women for preventive intervention, compared to existing pregnancy-based GDM risk prediction algorithms [14,15] developed for antenatal care. In another recent study by Wu et al. [39], an early pregnancy prediction model for GDM was developed based on genetic variants (four genetic susceptible single nucleotide polymorphisms (SNPs)) and six basic clinical features (AUC: 0.73). The latter model requires more advanced laboratory testing for SNPs, which may not be routinely available in all standard clinical laboratories. Xiong et al. [40] developed high performance machine learning models with the linear support vector machine classifier and LightGBM gradient boosting classifier using 10–19 weeks’ gestation data (AUC: 0.91–0.98), which may be too late for effective GDM interventions. With four basic clinical features measured at preconception and high prediction performance of AUC: 0.93, our stacked ensemble model with the gradient boosting classifier and linear support vector machine classifier (stochastic gradient descent training) offers a simpler solution for early GDM prediction.
6. Conclusions
Leveraging AI and evolutionary algorithms, we devised a population-based predictive care solution to assess the risk of developing GDM among Asian women at preconception. Optimal control of preconception HbA1c has the potential to lower the risk of GDM and reduce the incidence of preterm birth. Our trained classifier has been deployed in a web application for GDM prevention programs and intervention with early-stage nutritional and lifestyle changes during preconception care. The GDM predictor can also be combined with a digital health intervention such as a smartphone application.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph19116792/s1, File S1: Introduction to Genetic Programming; Table S1: Sensitivity analysis of preconception predictive risk model.
Author Contributions
M.K. contributed to research study design, machine learning modeling, statistical analyses, interpretation of results and writing of the manuscript. L.T.A., H.P. and M.N. contributed to clinical data curation. K.T. contributed to the acquisition and curation of biochemistry data and critical reading of the manuscript. S.L.L. contributed to collection of phenotypic data in the S-PRESTO cohort and critical reading of the manuscript. K.H.T., J.K.Y.C., K.M.G., S.-y.C. and Y.S.C. contributed to S-PRESTO cohort study design, data collection and critical reading of the manuscript. J.G.E. contributed to interpretation of results, writing of the manuscript and S-PRESTO cohort data collection. M.F. contributed to supervision of the study, interpretation of results and writing of the manuscript. N.K. contributed to supervision of the study, interpretation of results, writing of the manuscript and S-PRESTO cohort study data collection. M.F. and N.K. accept full responsibility for the work, had access to the data and controlled the decision to publish. All authors have read and agreed to the published version of the manuscript.
Funding
The S-PRESTO cohort study is supported by the National Research Foundation (NRF) under the Open Fund-Large Collaborative Grant No. OF-LCG; MOH-000504 administered by the Singapore Ministry of Health’s National Medical Research Council (NMRC) and the Agency for Science, Technology and Research (A*STAR). This research is supported by NMRC’s Open Fund—Large Collaborative Grant, titled ‘Metabolic Health in Asian Women and their Children’ (award no. OFLCG19may-0033). K.M.G. is supported by the UK Medical Research Council (MC_UU_12011/4), the National Institute for Health Research (NIHR Senior Investigator (NF-SI-0515-10042) and NIHR Southampton Biomedical Research Centre (IS-BRC-1215-20004)) and the British Heart Foundation (RG/15/17/3174). Additional funds for data analysis were supported by the Strategic Positioning Fund and IAFpp funds (H17/01/a0/005) available to N.K. through A*STAR (award number SPF 002/2013).
Institutional Review Board Statement
The study was conducted according to the guidelines of the Declaration of Helsinki and reviewed by SingHealth Centralised Institutional Review Board for ethical approval (CIRB Ref: 2014/692/D, 19 September 2014).
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study for use of human biological material and data for future research.
Data Availability Statement
The data supporting the findings of this research are available from the S-PRESTO executive committee upon reasonable request. The code generated to reproduce this research is available on GitHub: https://github.com/mukkeshkumar/S-PRESTO_Gestational-Diabetes-Mellitus. The AutoML model has been deployed as a web application and can be accessed through the following URL: https://www.mornin-feng.com/all-projects-and-demos#gdm3.
Acknowledgments
We thank the S-PRESTO study team for their help in acquiring the research data and their crucial work with the participants.
Conflicts of Interest
N.K., K.M.G., S.-y.C. and Y.S.C. are part of an academic consortium that has received research funding from Abbott Nutrition, Nestec, BenevolentAI Bio Ltd. and Danone. M.F. was partially supported by the National Research Foundation Singapore under its AI Singapore Programme (award number: AISG-GC-2019-001-2A). The other authors declare no conflict of interest.
References
- International Diabetes Federation. IDF Diabetes Atlas, 9th ed.; International Diabetes Federation: Brussels, Belgium, 2019.
- Metzger, B.E.; Coustan, D.R. Summary and recommendations of the Fourth International Workshop-Conference on Gestational Diabetes Mellitus. Diabetes Care 1998, 21, B161–B167.
- American Diabetes Association. Gestational Diabetes Mellitus. Diabetes Care 2003, 26, s103–s105.
- Vounzoulaki, E.; Khunti, K.; Abner, S.C.; Tan, B.K.; Davies, M.J.; Gillies, C.L. Progression to type 2 diabetes in women with a known history of gestational diabetes: Systematic review and meta-analysis. BMJ 2020, 369, m1361.
- Kramer, C.K.; Campbell, S.; Retnakaran, R. Gestational diabetes and the risk of cardiovascular disease in women: A systematic review and meta-analysis. Diabetologia 2019, 62, 905–914.
- Chu, A.H.Y.; Godfrey, K.M. Gestational Diabetes Mellitus and Developmental Programming. Ann. Nutr. Metab. 2020, 76, 4–15.
- American Diabetes Association. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes—2020. Diabetes Care 2020, 43, S14–S31.
- National Institute for Health and Care Excellence. Diabetes in Pregnancy: Management from Preconception to the Postnatal Period; National Institute for Health and Care Excellence: London, UK, 2015.
- International Diabetes Federation. IDF GDM Model of Care; International Diabetes Federation: Brussels, Belgium, 2015.
- Moholdt, T.; Hawley, J.A. Maternal Lifestyle Interventions: Targeting Preconception Health. Trends Endocrinol. Metab. 2020, 31, 561–569.
- Stephenson, J.; Heslehurst, N.; Hall, J.; Schoenaker, D.A.J.M.; Hutchinson, J.; Cade, J.E.; Poston, L.; Barrett, G.; Crozier, S.R.; Barker, M.; et al. Before the beginning: Nutrition and lifestyle in the preconception period and its importance for future health. Lancet 2018, 391, 1830–1841.
- Fleming, T.P.; Watkins, A.J.; Velazquez, M.A.; Mathers, J.C.; Prentice, A.M.; Stephenson, J.; Barker, M.; Saffery, R.; Yajnik, C.S.; Eckert, J.J.; et al. Origins of lifetime health around the time of conception: Causes and consequences. Lancet 2018, 391, 1842–1852.
- Barker, M.; Dombrowski, S.U.; Colbourn, T.; Fall, C.H.D.; Kriznik, N.M.; Lawrence, W.T.; Norris, S.A.; Ngaiza, G.; Patel, D.; Skordis-Worrall, J.; et al. Intervention strategies to improve nutrition and health behaviours before conception. Lancet 2018, 391, 1853–1864.
- Artzi, N.S.; Shilo, S.; Hadar, E.; Rossman, H.; Barbash-Hazan, S.; Ben-Haroush, A.; Balicer, R.D.; Feldman, B.; Wiznitzer, A.; Segal, E. Prediction of gestational diabetes based on nationwide electronic health records. Nat. Med. 2020, 26, 71–76.
- Wu, Y.-T.; Zhang, C.-J.; Mol, B.W.; Kawai, A.; Li, C.; Chen, L.; Wang, Y.; Sheng, J.-Z.; Fan, J.-X.; Shi, Y.; et al. Early Prediction of Gestational Diabetes Mellitus in the Chinese Population via Advanced Machine Learning. J. Clin. Endocrinol. Metab. 2020, 106, e1191–e1205.
- Loo, E.X.L.; Soh, S.E.; Loy, S.L.; Ng, S.; Tint, M.T.; Chan, S.Y.; Huang, J.Y.; Yap, F.; Tan, K.H.; Chern, B.S.M.; et al. Cohort profile: Singapore Preconception Study of Long-Term Maternal and Child Outcomes (S-PRESTO). Eur. J. Epidemiol. 2021, 36, 129–142.
- World Health Organization. Diagnostic Criteria and Classification of Hyperglycaemia First Detected in Pregnancy; World Health Organization: Geneva, Switzerland, 2013.
- World Health Organization; International Diabetes Federation (IDF). Definition and Diagnosis of Diabetes Mellitus and Intermediate Hyperglycaemia; World Health Organization: Geneva, Switzerland; International Diabetes Federation: Brussels, Belgium, 2006.
- World Health Organization. Use of Glycated Haemoglobin (HbA1c) in the Diagnosis of Diabetes Mellitus; World Health Organization: Geneva, Switzerland, 2011.
- Ding, C.; Chan, Z.; Chooi, Y.C.; Choo, J.; Sadananthan, S.A.; Michael, N.; Velan, S.S.; Leow, M.K.-S.; Magkos, F. Association between Serum Vitamin D Metabolites and Metabolic Function in Healthy Asian Adults. Nutrients 2020, 12, 3706.
- Matthews, D.R.; Hosker, J.P.; Rudenski, A.S.; Naylor, B.A.; Treacher, D.F.; Turner, R.C. Homeostasis model assessment: Insulin resistance and β-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia 1985, 28, 412–419.
- Reaven, G.; Strom, T.K.; Fox, B. Syndrome X, The Silent Killer: The New Heart Disease Risk; Simon and Schuster: New York, NY, USA, 2001.
- Bedogni, G.; Bellentani, S.; Miglioli, L.; Masutti, F.; Passalacqua, M.; Castiglione, A.; Tiribelli, C. The Fatty Liver Index: A simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 2006, 6, 33.
- Health Promotion Board. Metabolic Syndrome. Available online: https://www.hpb.gov.sg/article/metabolic-syndrome (accessed on 7 March 2022).
- Eiben, A.E.; Smith, J.E. Introduction to Evolutionary Computing; Springer: Berlin/Heidelberg, Germany, 2003; Volume 53.
- Banzhaf, W.; Nordin, P.; Keller, R.E.; Francone, F.D. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1998.
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
- Le, T.T.; Fu, W.; Moore, J.H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 2019, 36, 250–256.
- WHO. Recommended definitions, terminology and format for statistical tables related to the perinatal period and use of a new certificate for cause of perinatal deaths. Modifications recommended by FIGO as amended October 14, 1976. Acta Obstet. Gynecol. Scand. 1977, 56, 247–253.
- Soh, S.-E.; Tint, M.T.; Gluckman, P.D.; Godfrey, K.M.; Rifkin-Graboi, A.; Chan, Y.H.; Stünkel, W.; Holbrook, J.D.; Kwek, K.; Chong, Y.-S.; et al. Cohort Profile: Growing Up in Singapore Towards healthy Outcomes (GUSTO) birth cohort study. Int. J. Epidemiol. 2013, 43, 1401–1409.
- Mikolajczyk, R.T.; Zhang, J.; Betran, A.P.; Souza, J.P.; Mori, R.; Gülmezoglu, A.M.; Merialdi, M. A global reference for fetal-weight and birthweight percentiles. Lancet 2011, 377, 1855–1861.
- Lurie, S.; Mamet, Y. Red blood cell survival and kinetics during pregnancy. Eur. J. Obstet. Gynecol. Reprod. Biol. 2000, 93, 185–192.
- Catalano, P.M.; Huston, L.; Amini, S.B.; Kalhan, S. Longitudinal changes in glucose metabolism during pregnancy in obese women with normal glucose tolerance and gestational diabetes mellitus. Am. J. Obstet. Gynecol. 1999, 180, 903–916.
- Bonora, E.; Tuomilehto, J. The Pros and Cons of Diagnosing Diabetes With A1C. Diabetes Care 2011, 34, S184–S190.
- van’t Riet, E.; Alssema, M.; Rijkelijkhuizen, J.M.; Kostense, P.J.; Nijpels, G.; Dekker, J.M. Relationship between A1C and glucose levels in the general Dutch population: The new Hoorn study. Diabetes Care 2010, 33, 61–66.
- Ludvigsson, J.F.; Neovius, M.; Söderling, J.; Gudbjörnsdottir, S.; Svensson, A.M.; Franzén, S.; Stephansson, O.; Pasternak, B. Maternal Glycemic Control in Type 1 Diabetes and the Risk for Preterm Birth: A Population-Based Cohort Study. Ann. Intern. Med. 2019, 170, 691–701.
- An-Na, C.; Man-Li, Y.; Jeng-Hsiu, H.; Pesus, C.; Shin-Kuo, S.; Heung-Tat, N. Alterations of serum lipid levels and their biological relevances during and after pregnancy. Life Sci. 1995, 56, 2367–2375.
- Toescu, V.; Nuttall, S.L.; Martin, U.; Nightingale, P.; Kendall, M.J.; Brydon, P.; Dunne, F. Changes in plasma lipids and markers of oxidative stress in normal pregnancy and pregnancies complicated by diabetes. Clin. Sci. 2004, 106, 93–98.
- Wu, Q.; Chen, Y.; Zhou, M.; Liu, M.; Zhang, L.; Liang, Z.; Chen, D. An early prediction model for gestational diabetes mellitus based on genetic variants and clinical characteristics in China. Diabetol. Metab. Syndr. 2022, 14, 15.
- Xiong, Y.; Lin, L.; Chen, Y.; Salerno, S.; Li, Y.; Zeng, X.; Li, H. Prediction of gestational diabetes mellitus in the first 19 weeks of pregnancy using machine learning techniques. J. Matern. Neonatal Med. 2022, 35, 2457–2463.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).