Diagnostics
  • Article
  • Open Access

7 January 2026

Predictive Accuracy of Ultrasound Biometry and Maternal Factors in Identifying Large-for-Gestational-Age Neonates at 30–34 Weeks

1 Department of Obstetrics and Gynaecology, School of Medicine, Faculty of Health Sciences, University of Ioannina, 45332 Ioannina, Greece
2 Third Department of Obstetrics and Gynaecology, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
3 Third Department of Obstetrics and Gynecology, University Hospital “ATTIKON”, Medical School, National and Kapodistrian University of Athens, 17237 Athens, Greece
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advances in Ultrasound Diagnosis in Maternal Fetal Medicine Practice

Abstract

Background/Objectives: To construct and compare multivariable prediction models for the early prediction of large-for-gestational-age (LGA) neonates using ultrasound biometry and maternal characteristics. Methods: This retrospective cohort study analyzed data from singleton pregnancies that underwent routine ultrasound examinations at 30+0–34+0 weeks of gestation. Ultrasound parameters included fetal abdominal circumference (AC), head circumference (HC), femur length (FL), HC-to-AC ratio, mean uterine artery pulsatility index (mUtA-PI), and presence of polyhydramnios. LGA neonates were defined as those with a birthweight > 90th percentile. Logistic regression was used to evaluate associations between ultrasound markers and LGA after adjusting for the following maternal and pregnancy-related covariates: maternal age, body mass index, parity, gestational diabetes mellitus (GDM), pre-existing diabetes, previous cesarean section (PCS), assisted reproductive technology (ART) use, smoking, hypothyroidism, and chronic hypertension. Associations were expressed as adjusted odds ratios (aORs) with 95% confidence intervals (CIs). Three prognostic models were developed using the following predictors: (i) biometric and Doppler ultrasound measurements, namely AC, HC-to-AC ratio, FL, mUtA-PI, and polyhydramnios (Model 1); (ii) a combination of these ultrasound measurements and clinical–maternal data (Model 2); and (iii) the estimated fetal weight (EFW) alone (Model 3). Results: In total, 3808 singleton pregnancies were included in the analyses. In the multivariable analysis, AC (aOR 1.07, 95% CI [1.06, 1.08]), HC-to-AC ratio (aOR 1.01, 95% CI [1.006, 1.01]), and FL (aOR 1.01, 95% CI [1.009, 1.01]) were associated with an increased risk of LGA, and polyhydramnios showed a positive but non-significant association (aOR 4.97, 95% CI [0.7, 58.8]), while a higher mUtA-PI was associated with a reduced risk (aOR 0.98, 95% CI [0.98, 0.99]).
GDM, pre-existing diabetes, elevated pre-pregnancy BMI, and multiparity were significantly more common in the LGA group, as was the absence of uterine artery notching, and mUtA-PI also differed significantly between the groups. Models 1 and 2 showed similar performance (AUCs: 84.7% and 85.3%, respectively) and outperformed Model 3 (AUC: 77.5%). Bootstrap and temporal validation indicated minimal overfitting and stable model performance, while decision curve analysis supported potential clinical utility. Conclusions: Models using biometric and Doppler ultrasound at 30–34 weeks demonstrated good discriminative ability for predicting LGA neonates, with an AUC of 84.7%. Adding maternal characteristics did not significantly improve performance, whereas the biometric model performed better than EFW alone. Sensitivity at the conventional 0.5 probability threshold was low but increased substantially when lower cut-offs were applied, so the operating point can be adapted to different screening needs for early risk stratification. Although decision curve analysis suggested potential clinical utility, external validation and prospective assessment in clinical settings are still needed to confirm generalizability and to determine optimal decision thresholds for clinical application.

1. Introduction

Antenatal identification of large-for-gestational-age (LGA) neonates is important for optimizing prenatal care, planning delivery, and mitigating potential complications associated with fetal overgrowth [1]. LGA, defined as birthweight > 90th percentile for gestational age, is associated with an increased risk of adverse perinatal outcomes, including shoulder dystocia, birth trauma, and neonatal intensive care unit admission [1,2,3]. Moreover, LGA infants face an increased risk of adverse metabolic outcomes later in life, such as obesity, hypertension, and type 2 diabetes [4,5]. When LGA is detected, underlying maternal and fetal conditions should be excluded, including gestational diabetes mellitus (GDM) and congenital overgrowth syndromes such as Beckwith–Wiedemann, Pallister–Killian, Sotos, Perlman, and Simpson–Golabi–Behmel syndromes [6].
One strategy to mitigate these risks is early induction of labor. A large randomized controlled trial demonstrated that induction at 37+0–39+0 weeks of gestation reduced the incidence of severe shoulder dystocia by 68%, suggesting a potential clinical benefit [7]. Despite these findings, national and international guidelines do not currently recommend routine early induction of labor for suspected LGA unless additional risk factors, such as maternal diabetes, are present [8,9]. Furthermore, a recent meta-analysis investigating the timing of induction in suspected macrosomia showed that gestational age at induction plays a decisive role in adverse perinatal outcomes, further highlighting the need for effective strategies to identify and appropriately manage these pregnancies [10].
There is no consensus on the routine sonographic evaluation of low-risk pregnancies in the third trimester [11]. Accumulating data suggest that third-trimester ultrasonography may offer several advantages, including the identification of small-for-gestational-age (SGA) and LGA fetuses [12], assessment of placental position and fetal presentation, and detection of late-onset anomalies [13]. Another point of dispute is the timing of a routine third-trimester ultrasound. Traditionally, this was offered at 30–34 weeks of gestation. However, it has recently been argued that in low-risk pregnancies, a late third-trimester scan at 35–37 weeks may be more useful in identifying growth abnormalities and may allow more effective management decisions [14]. The 30–34-week window was chosen in this study to investigate whether earlier prediction could allow proactive planning. Nonetheless, this earlier window makes the prediction of LGA neonates more challenging, as most excessive fetal growth occurs closer to term. Fetal growth is traditionally evaluated using ultrasound measurements, but this approach is often inaccurate in predicting LGA, particularly when used in isolation and early in the third trimester [15]. Standard third-trimester estimated fetal weight assessments often lack the necessary sensitivity and specificity, resulting in both false-positive and false-negative diagnoses that may lead to unnecessary interventions or missed opportunities for appropriate management [15]. To overcome this limitation, various multivariable prediction models incorporating additional ultrasound parameters or maternal factors have been developed to improve on the sensitivity of ultrasound alone [1].
Given that maternal factors, such as pre-pregnancy body mass index (BMI), GDM, and parity, significantly influence fetal growth trajectories, it is reasonable to expect that integrating these variables into predictive models alongside ultrasound biometry would enhance predictive accuracy [16,17,18]. However, a recent systematic review of multivariable prediction models for detecting LGA concluded that most models had similar or lower area under the curve (AUC) values than ultrasound alone, and that none was yet ready for clinical implementation [1].
Therefore, the primary objective of this study was to evaluate whether a model using fetal biometric measurements [abdominal circumference (AC), femur length (FL), and head circumference (HC)-to-AC ratio], uterine artery Doppler parameters [mean uterine artery pulsatility index (mUtA-PI)], and the presence of polyhydramnios has superior predictive performance for LGA neonates compared with estimated fetal weight (EFW) alone, based on a single examination at 30–34 weeks of gestation. We also aimed to determine whether the addition of maternal clinical factors could improve this model’s accuracy. The overall aim was to build a more accurate prediction model that helps clinicians focus fetal monitoring on the pregnancies at greatest risk.

2. Materials and Methods

2.1. Study Details and Population Characteristics

We conducted a retrospective cohort study using data from singleton pregnancies that underwent routine ultrasound examinations between 30+0 and 34+0 weeks of gestation at the 3rd Department of Obstetrics and Gynecology, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Greece. All eligible women who attended the clinic during the data collection period, from February 2013 to February 2024, were consecutively included. Data were obtained from dedicated medical records, encompassing demographic information, medical history, and ultrasound measurements. In line with the national guidelines and the local clinical protocol, all participants were offered routine ultrasound examinations at 11+0–13+6 weeks, 20+0–23+6 weeks, and 30+0–34+0 weeks. Although later third-trimester scans may improve predictive accuracy for LGA detection, the 30–34-week window was selected to explore earlier identification, allowing for potential intervention planning. All examinations were performed by fetal medicine specialists accredited by the Fetal Medicine Foundation, UK, using the Voluson E8 ultrasound system (GE HealthCare, Chicago, IL, USA) with the GE RAB4-8-D transducer (GE HealthCare, Chicago, IL, USA). The ultrasound parameters assessed included HC, AC, FL, HC-to-AC ratio, mUtA-PI, and the presence of polyhydramnios, defined as a deepest pool of amniotic fluid > 8 cm. In addition, uterine artery notching was recorded and categorized as absent, bilateral, left-sided, or right-sided. Fetal growth was assessed using the Hadlock growth curves as reference standards, with no adjustment for fetal sex in the calculation of centiles. Furthermore, all women underwent routine screening for GDM between 26+0 and 27+6 weeks of gestation, using criteria based on the results of the HAPO study [19].
According to this protocol, GDM was defined as a fasting plasma glucose of 5.1 mmol/L or more, a 1 h value of 10.0 mmol/L or more, or a 2 h value of 8.5 mmol/L or more after a 75 g oral glucose load. Maternal factors, such as previous cesarean section (PCS), GDM diagnosed according to HAPO study criteria [19], conception via assisted reproductive technology (ART), parity, pre-pregnancy BMI, maternal age, smoking during pregnancy, presence of hypothyroidism, pre-existing diabetes (type I or II), and chronic hypertension, were also noted. Of note, parity was analyzed as a dichotomous variable, while BMI was treated as a continuous variable.
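The two binary predictors defined above (GDM from the 75 g OGTT and polyhydramnios from the deepest pool) reduce to simple threshold rules. The sketch below is illustrative only: the class and function names are hypothetical, and it encodes nothing beyond the cut-offs stated in the text, namely that any single OGTT value at or above its cut-off is diagnostic and that a deepest pool must be strictly greater than 8 cm.

```python
from dataclasses import dataclass

# Cut-offs stated in the text (mmol/L, 75 g OGTT); one value at or above its cut-off suffices.
GDM_FASTING = 5.1
GDM_ONE_HOUR = 10.0
GDM_TWO_HOUR = 8.5
# Polyhydramnios: deepest pool of amniotic fluid strictly greater than 8 cm.
POLYHYDRAMNIOS_DVP_CM = 8.0


@dataclass(frozen=True)
class OGTT:
    """Plasma glucose values (mmol/L) from a 75 g oral glucose tolerance test (hypothetical type)."""
    fasting: float
    one_hour: float
    two_hour: float


def has_gdm(result: OGTT) -> bool:
    """True if any single OGTT value reaches its diagnostic threshold."""
    return (
        result.fasting >= GDM_FASTING
        or result.one_hour >= GDM_ONE_HOUR
        or result.two_hour >= GDM_TWO_HOUR
    )


def has_polyhydramnios(deepest_pool_cm: float) -> bool:
    """True if the deepest pool exceeds 8 cm (strict inequality)."""
    return deepest_pool_cm > POLYHYDRAMNIOS_DVP_CM


print(has_gdm(OGTT(fasting=4.8, one_hour=9.2, two_hour=8.5)))  # True: 2 h value reaches 8.5
print(has_gdm(OGTT(fasting=5.0, one_hour=9.9, two_hour=8.4)))  # False: all below cut-offs
print(has_polyhydramnios(8.0))  # False: exactly 8 cm does not qualify
```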
Eligibility criteria were as follows: (i) maternal age ≥ 18 years, (ii) completed scans at 11+0–13+6 weeks and 20+0–23+6 weeks, (iii) a singleton pregnancy with no detectable congenital anomalies, (iv) complete records for key ultrasound measurements (AC, HC, FL, and mUtA-PI) and relevant maternal factors, and (v) delivery after 33+1 weeks with known perinatal outcomes.
LGA was defined as birth weight > 90th percentile for gestational age based on the Hadlock growth charts, which the recent literature suggests perform comparably to the INTERGROWTH charts [20].
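The eligibility criteria and the LGA definition above can be expressed as a filtering and labelling step. This is a minimal pandas sketch under stated assumptions: all column names are hypothetical, the column `bw_p90_ref_g` is assumed to already hold the reference 90th-centile birthweight for each neonate's gestational age at delivery (the Hadlock lookup itself is not reproduced), and "after 33+1 weeks" is read as strictly later.

```python
import pandas as pd

# Hypothetical column names; the reference 90th-centile column is assumed precomputed.
KEY_ULTRASOUND = ["ac", "hc", "fl", "muta_pi"]
MIN_DELIVERY_GA_DAYS = 33 * 7 + 1  # 33+1 weeks; "after" is read here as strictly later


def apply_eligibility(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows meeting criteria (i)-(v) as listed in the text."""
    mask = (
        (df["maternal_age"] >= 18)                          # (i)
        & df["had_11_13_scan"] & df["had_20_23_scan"]       # (ii)
        & ~df["congenital_anomaly"]                         # (iii) singletons assumed upstream
        & df[KEY_ULTRASOUND].notna().all(axis=1)            # (iv) key ultrasound fields present
        & (df["ga_delivery_days"] > MIN_DELIVERY_GA_DAYS)   # (v) delivery after 33+1 weeks
        & df["birthweight_g"].notna()                       # (v) known outcome
    )
    return df.loc[mask].copy()


def label_lga(df: pd.DataFrame, p90_col: str = "bw_p90_ref_g") -> pd.DataFrame:
    """LGA = birthweight strictly above the reference 90th centile for gestational age."""
    out = df.copy()
    out["lga"] = (out["birthweight_g"] > out[p90_col]).astype(int)
    return out


raw = pd.DataFrame({
    "maternal_age": [30, 17, 25, 33],
    "had_11_13_scan": [True, True, True, True],
    "had_20_23_scan": [True, True, True, True],
    "congenital_anomaly": [False, False, False, False],
    "ac": [28.0, 27.1, 27.9, 27.4],
    "hc": [29.2, 28.8, 29.0, 28.9],
    "fl": [6.1, 6.0, 6.2, 6.0],
    "muta_pi": [0.80, 0.75, None, 0.70],
    "ga_delivery_days": [273, 270, 275, 280],
    "birthweight_g": [4100, 3300, 3600, 3500],
    "bw_p90_ref_g": [3900, 3850, 3950, 4000],
})
cohort = label_lga(apply_eligibility(raw))
print(cohort[["maternal_age", "lga"]])  # two rows remain (ages 30 and 33) with lga 1 and 0
```

The second row fails criterion (i) and the third fails criterion (iv), so only two pregnancies reach the labelling step.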
This study was part of a larger prospective cohort conducted in accordance with the principles of the Declaration of Helsinki [21]. Ethical approval was granted by the Bioethics Committee of the Aristotle University of Thessaloniki, Greece (approval No. 6.231/29 July 2020) before the start of the study. All participants provided written informed consent prior to enrollment, and no financial or other incentives were offered for participation.

2.2. Statistical Analysis

The HC, AC, FL, HC-to-AC, and mUtA-PI centiles were calculated using the mathematical models reported by Snijders et al. [22], Gómez et al. [23], and Sotiriadis et al. [24], respectively. To describe the characteristics of the study population, the Shapiro–Wilk test was applied to assess the normality of continuous variables, and the F-test was used to check the equality of variances. Depending on the data distribution, hypothesis testing was performed using the t-test for normally distributed data and the Wilcoxon rank-sum (Mann–Whitney U) test for non-normally distributed data. Fisher’s exact test was used for binary variables, particularly in cases with small sample sizes. Three logistic regression models were constructed:
Biometric Ultrasound Measurements Model (Model 1): Included AC, FL, HC-to-AC ratio, mUtA-PI, and presence/absence of polyhydramnios as predictors.
Biometric and Clinical Model (Model 2): Included the same variables as Model 1 plus maternal factors (maternal age, BMI, parity, GDM, pre-existing diabetes, PCS, ART use, smoking, hypothyroidism, and chronic hypertension).
EFW Model (Model 3): Used EFW as the only predictor, estimated according to the Hadlock 4 (HC, AC, and FL) formula [25]. EFW was analyzed separately to compare the predictive value of the individual biometric inputs plus Doppler measures with that of the composite EFW formula.
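The three predictor sets listed above can be sketched as logistic regressions on a synthetic cohort. Everything here is an assumption for illustration: column names are hypothetical, the data are simulated, scikit-learn stands in for the R workflow used in the study, and a very large `C` is used so the default L2 penalty is negligible. The EFW helper uses the widely published HC–AC–FL coefficients of Hadlock et al. (1985); whether these match the variant labelled "Hadlock 4" on the study's ultrasound system is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names mirroring the predictor sets of Models 1-3.
MODEL_1 = ["ac_centile", "fl_centile", "hc_ac_centile", "muta_pi_centile", "polyhydramnios"]
MATERNAL = ["maternal_age", "bmi", "multiparous", "gdm", "pregestational_dm",
            "previous_cs", "art", "smoking", "hypothyroidism", "chronic_htn"]
MODEL_2 = MODEL_1 + MATERNAL
MODEL_3 = ["efw_g"]


def hadlock_efw_g(hc_cm, ac_cm, fl_cm):
    """Hadlock et al. (1985) HC-AC-FL equation; inputs in cm, output in grams."""
    log10_w = (1.326 - 0.00326 * ac_cm * fl_cm + 0.0107 * hc_cm
               + 0.0438 * ac_cm + 0.158 * fl_cm)
    return 10 ** log10_w


def fit_model(df, predictors, outcome="lga"):
    """Logistic regression; a very large C makes the L2 penalty negligible."""
    model = make_pipeline(StandardScaler(), LogisticRegression(C=1e6, max_iter=1000))
    return model.fit(df[predictors], df[outcome])


# Synthetic cohort, for illustration only (no relation to the study data).
rng = np.random.default_rng(42)
n = 3000
df = pd.DataFrame({
    "hc_cm": rng.normal(29.0, 1.2, n),
    "ac_cm": rng.normal(27.5, 1.6, n),
    "fl_cm": rng.normal(6.0, 0.3, n),
})
df["ac_centile"] = df["ac_cm"].rank(pct=True) * 100
df["fl_centile"] = df["fl_cm"].rank(pct=True) * 100
df["hc_ac_centile"] = (df["hc_cm"] / df["ac_cm"]).rank(pct=True) * 100
df["muta_pi_centile"] = rng.uniform(0, 100, n)
df["polyhydramnios"] = (rng.random(n) < 0.01).astype(int)
for col in MATERNAL:
    if col in ("maternal_age", "bmi"):
        df[col] = rng.normal(size=n)                    # standardized continuous covariates
    else:
        df[col] = (rng.random(n) < 0.15).astype(int)    # binary covariates
df["efw_g"] = hadlock_efw_g(df["hc_cm"], df["ac_cm"], df["fl_cm"])
logit = -4.5 + 0.05 * df["ac_centile"] - 0.01 * df["muta_pi_centile"] + 0.8 * df["gdm"]
df["lga"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

aucs = {name: roc_auc_score(df["lga"], fit_model(df, cols).predict_proba(df[cols])[:, 1])
        for name, cols in [("Model 1", MODEL_1), ("Model 2", MODEL_2), ("Model 3", MODEL_3)]}
print(aucs)  # apparent (in-sample) AUCs on the synthetic data
```

For example, `hadlock_efw_g(29.0, 27.5, 6.0)` returns roughly 1782 g, a plausible weight around 32 weeks.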
The study population was stratified into two groups: pregnancies complicated by GDM or pre-existing diabetes, and those without these conditions. The same three logistic regression models were then developed separately within each group. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to assess predictive performance. To further evaluate the models’ discriminative ability, receiver operating characteristic (ROC) curves were plotted, and the AUC was calculated along with its 95% confidence interval (CI). A p-value < 0.05 was considered statistically significant. All analyses were conducted using R software (version 4.2.1).
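The threshold-based metrics and the AUC with its 95% CI named above can be computed as follows. This is a sketch with two assumptions: a pregnancy is called positive when its predicted risk is at or above the threshold, and the CI uses a percentile bootstrap, because the text does not state which CI method was used (DeLong's method is a common alternative). The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def threshold_metrics(y_true, p_hat, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV when p_hat >= threshold is called positive."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(p_hat) >= threshold
    tp = int(np.sum(pred & y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    tn = int(np.sum(~pred & ~y))

    def ratio(a, b):
        return a / b if b else float("nan")

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}


def auc_with_bootstrap_ci(y_true, p_hat, n_boot=500, seed=0):
    """AUC with a percentile bootstrap 95% CI (one of several possible CI methods)."""
    y = np.asarray(y_true)
    p = np.asarray(p_hat)
    rng = np.random.default_rng(seed)
    boots = []
    while len(boots) < n_boot:
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():  # skip resamples containing only one class
            continue
        boots.append(roc_auc_score(y[idx], p[idx]))
    low, high = np.percentile(boots, [2.5, 97.5])
    return roc_auc_score(y, p), low, high


y = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
p = np.array([0.10, 0.20, 0.35, 0.40, 0.80, 0.45, 0.60, 0.55, 0.30, 0.05])
metrics = threshold_metrics(y, p, threshold=0.5)
auc, low, high = auc_with_bootstrap_ci(y, p)
print(metrics)  # sensitivity 0.5, specificity ~0.833, PPV ~0.667, NPV ~0.714
print(round(auc, 3), round(low, 3), round(high, 3))
```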
Missing data were assessed for all predictors included in the analysis, and the extent of missingness is presented in Supplementary Table S1. Overall, the dataset was highly complete. Only pre-pregnancy BMI showed missing values (89/3808, 2.34%), while all other variables had 0% missingness. Because the proportion of missing data was small and limited to a single predictor, complete-case analysis was performed. This approach is consistent with methodological recommendations indicating that complete-case analysis is unlikely to introduce meaningful bias when the proportion of missingness is approximately < 5% [26]. Furthermore, TRIPOD does not prescribe a specific threshold at which imputation must be applied but requires transparent reporting of missing data and the chosen handling strategy, which is fulfilled here [27]. Given the minimal level of missingness, additional imputation was not considered necessary.
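The missingness check and complete-case step described above are sketched below on a synthetic frame that reproduces only the reported counts (89 of 3808 BMI values missing). The 5% guard in `complete_case` is a hypothetical parameter reflecting the rule of thumb cited in the text, not a formal requirement of TRIPOD.

```python
import numpy as np
import pandas as pd


def missingness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column."""
    n_missing = df.isna().sum()
    return pd.DataFrame({"n_missing": n_missing, "pct_missing": 100 * n_missing / len(df)})


def complete_case(df: pd.DataFrame, predictors, max_pct_missing=5.0) -> pd.DataFrame:
    """Drop rows with any missing predictor; refuse if missingness exceeds the guard."""
    worst = missingness_report(df[predictors])["pct_missing"].max()
    if worst >= max_pct_missing:
        raise ValueError(f"{worst:.2f}% missing; consider multiple imputation instead")
    return df.dropna(subset=predictors)


# Synthetic frame matching the reported counts: 89 of 3808 BMI values missing.
rng = np.random.default_rng(1)
data = pd.DataFrame({"bmi": rng.normal(25, 4, 3808), "ac_centile": rng.uniform(0, 100, 3808)})
data.loc[rng.choice(3808, size=89, replace=False), "bmi"] = np.nan
report = missingness_report(data)
analysed = complete_case(data, ["bmi", "ac_centile"])
print(report.round(2))  # bmi: 89 missing (2.34%); ac_centile: 0
print(len(analysed))    # 3719
```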
Given the retrospective design, no formal a priori sample size calculation was performed. Sample adequacy was evaluated according to the number of outcome events per predictor, which exceeded the commonly recommended minimum of 10 events per predictor for reliable model development [28].
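The events-per-predictor check can be reproduced from the reported counts. This sketch assumes one regression parameter per predictor (parity is dichotomous, BMI continuous, and the remaining covariates binary or linear) and uses the 431 LGA events of the full cohort; the complete-case event count for Model 2 would be slightly lower and is not reported.

```python
# Outcome events reported for the full cohort.
LGA_EVENTS = 431

# One regression parameter per predictor, as each is binary or entered linearly.
PARAMETERS = {
    "Model 1": 5,   # AC, FL, HC-to-AC, mUtA-PI, polyhydramnios
    "Model 2": 15,  # Model 1 plus 10 maternal factors
    "Model 3": 1,   # EFW
}

epv = {model: LGA_EVENTS / k for model, k in PARAMETERS.items()}
for model, value in epv.items():
    print(f"{model}: {value:.1f} events per parameter")
# Model 1: 86.2, Model 2: 28.7, Model 3: 431.0; all above the rule-of-thumb minimum of 10
```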
Internal validation was performed using bootstrap resampling with optimism correction. For each of the three prediction models, 500 bootstrap samples were drawn with replacement from the original dataset (n = 3808; 431 LGA cases, 3377 non-LGA cases). In each bootstrap iteration, the model was fitted on the bootstrap sample and performance was evaluated on both the bootstrap sample (apparent performance) and the original dataset (test performance). Optimism was calculated as the difference between apparent and test performance for each metric. The mean optimism across all 500 bootstrap iterations was then subtracted from the apparent performance on the original dataset to obtain optimism-corrected estimates. Discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and overall predictive accuracy was measured with the Brier score. Model calibration was examined through the calibration intercept and slope. In addition, sensitivity, specificity, PPV, and NPV were calculated using a probability threshold of 0.5. All bootstrap procedures and statistical analyses were carried out in R version 4.3 (R Foundation for Statistical Computing, Vienna, Austria).
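The optimism-correction loop described above maps onto a few lines of code. This sketch follows the stated steps for AUC and Brier score, using scikit-learn in place of the R implementation; the synthetic data, the helper names, and the reduced `n_boot` in the usage example are assumptions made to keep it short and fast.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score


def _performance(model, X, y):
    """Return [AUC, Brier score] of a fitted model on (X, y)."""
    p = model.predict_proba(X)[:, 1]
    return np.array([roc_auc_score(y, p), brier_score_loss(y, p)])


def optimism_corrected(estimator, X, y, n_boot=500, seed=0):
    """Bootstrap optimism correction: corrected = apparent - mean(bootstrap - test)."""
    rng = np.random.default_rng(seed)
    apparent = _performance(clone(estimator).fit(X, y), X, y)
    optimism = []
    while len(optimism) < n_boot:
        idx = rng.integers(0, len(y), len(y))  # resample with replacement
        if y[idx].min() == y[idx].max():
            continue                           # both classes are needed to fit
        boot_model = clone(estimator).fit(X[idx], y[idx])
        boot_perf = _performance(boot_model, X[idx], y[idx])  # apparent, bootstrap sample
        test_perf = _performance(boot_model, X, y)            # test, original data
        optimism.append(boot_perf - test_perf)
    mean_optimism = np.mean(optimism, axis=0)
    # For the Brier score (lower is better) optimism is usually negative,
    # so the correction raises it, as it should.
    return {"apparent": apparent, "optimism": mean_optimism,
            "corrected": apparent - mean_optimism}


# Synthetic data for illustration; n_boot is reduced here only to keep the example fast.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))
true_logit = -2.0 + X @ np.array([1.0, 0.5, 0.3, 0.0, 0.0])
y = (rng.random(1000) < 1 / (1 + np.exp(-true_logit))).astype(int)
result = optimism_corrected(LogisticRegression(C=1e6, max_iter=1000), X, y, n_boot=50)
print("AUC   apparent %.3f  corrected %.3f" % (result["apparent"][0], result["corrected"][0]))
print("Brier apparent %.3f  corrected %.3f" % (result["apparent"][1], result["corrected"][1]))
```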
Temporal validation was performed using chronological stratification of the dataset. The cohort was divided into an earlier subset used for model development (n = 2599, including 291 LGA cases) and a later subset used for testing (n = 1118, including 136 LGA cases). Prediction models were developed using the earlier subset and subsequently evaluated in the later subset. Model discrimination was assessed using the AUC, and differences in the AUC between the two temporal subsets were calculated to quantify changes in performance over time.
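A chronological split of this kind can be sketched as below. The 70:30 development fraction is an assumption chosen to mirror the reported proportions (2599 of 3717, about 0.70); the study itself may have used a calendar cut-point. Column names, function names, and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def temporal_split(df, date_col, dev_fraction=0.7):
    """Earliest dev_fraction of examinations form the development set; the rest form the test set."""
    ordered = df.sort_values(date_col, kind="mergesort")  # stable sort keeps ties in order
    cut = int(round(len(ordered) * dev_fraction))
    return ordered.iloc[:cut], ordered.iloc[cut:]


def temporal_validation(df, predictors, date_col="exam_date", outcome="lga"):
    """Fit on the earlier subset, evaluate on the later one, and report the AUC change."""
    dev, test = temporal_split(df, date_col)
    model = LogisticRegression(C=1e6, max_iter=1000).fit(dev[predictors], dev[outcome])
    auc_dev = roc_auc_score(dev[outcome], model.predict_proba(dev[predictors])[:, 1])
    auc_test = roc_auc_score(test[outcome], model.predict_proba(test[predictors])[:, 1])
    return {"n_dev": len(dev), "n_test": len(test), "auc_dev": auc_dev,
            "auc_test": auc_test, "delta_auc": auc_test - auc_dev}


# Synthetic cohort spanning roughly eleven years of examinations.
rng = np.random.default_rng(5)
n = 3000
frame = pd.DataFrame({
    "exam_date": pd.Timestamp("2013-02-01") + pd.to_timedelta(rng.integers(0, 4000, n), unit="D"),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
risk = 1 / (1 + np.exp(-(-2.0 + 1.2 * frame["x1"] + 0.5 * frame["x2"])))
frame["lga"] = (rng.random(n) < risk).astype(int)
result = temporal_validation(frame, ["x1", "x2"])
print(result)
```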
Decision curve analysis was performed to evaluate the potential clinical utility of the prediction models across a range of threshold probabilities. Net benefit was calculated for Models 1, 2, and 3 and compared with the default strategies of treating all pregnancies as high risk and treating none. Net benefit weighs true positives against false positives, with false positives weighted by the odds of the threshold probability, allowing assessment of whether decisions based on the models would yield a greater net benefit than the default strategies.
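Using the standard decision-curve definition, the net benefit at threshold probability $p_t$ is $TP/n - (FP/n) \cdot p_t/(1 - p_t)$, the treat-all strategy gives $\text{prevalence} - (1 - \text{prevalence}) \cdot p_t/(1 - p_t)$, and treat-none is always 0. The sketch below assumes that a predicted risk at or above the threshold counts as "treat"; the toy arrays and function names are hypothetical.

```python
import numpy as np


def net_benefit(y_true, p_hat, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above threshold."""
    y = np.asarray(y_true).astype(bool)
    treat = np.asarray(p_hat) >= threshold
    n = len(y)
    tp = np.sum(treat & y)
    fp = np.sum(treat & ~y)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every pregnancy as high risk."""
    prevalence = np.mean(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
p = np.array([0.10, 0.20, 0.35, 0.40, 0.80, 0.45, 0.60, 0.55, 0.30, 0.05])
for pt in (0.2, 0.5):
    print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
# 0.2: model 0.3, treat-all 0.25; 0.5: model 0.1, treat-all -0.2; treat-none is always 0
```

Plotting these values over a grid of thresholds produces the decision curves.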
To assess potential circularity between fetal biometric predictors and the definition of LGA, sensitivity analysis was performed using an alternative representation of fetal biometry. In the main analysis, biometric variables were expressed as gestational age-adjusted centiles. In the sensitivity analysis, centile-based predictors were replaced with raw biometric measurements and gestational age was included explicitly as a covariate. Model discrimination was compared between approaches using the AUC.
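The mechanics of this sensitivity analysis, namely fitting one model on a gestational-age-adjusted centile and another on the raw measurement plus gestational age and then comparing AUCs, are sketched below on synthetic data. The linear AC growth curve, its SD, and all names are hypothetical. Because both representations encode the same information by construction, the example only shows how the comparison is run and says nothing about circularity in the study data.

```python
import math

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 3000
ga_weeks = rng.uniform(30, 34, n)
expected_ac = 10.0 + 0.55 * ga_weeks       # hypothetical linear growth curve (cm)
ac_cm = expected_ac + rng.normal(0, 1.2, n)
z = (ac_cm - expected_ac) / 1.2            # GA-adjusted z-score
normal_cdf = np.vectorize(lambda v: 0.5 * (1 + math.erf(v / math.sqrt(2))))
ac_centile = 100 * normal_cdf(z)           # GA-adjusted centile
lga = (rng.random(n) < 1 / (1 + np.exp(-(-2.5 + 1.2 * z)))).astype(int)


def apparent_auc(X, y):
    """In-sample AUC of a (near-unpenalized) logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression(C=1e6, max_iter=1000))
    model.fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])


auc_centile = apparent_auc(ac_centile.reshape(-1, 1), lga)
auc_raw_ga = apparent_auc(np.column_stack([ac_cm, ga_weeks]), lga)
print(round(auc_centile, 3), round(auc_raw_ga, 3), round(auc_raw_ga - auc_centile, 3))
```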

3. Results

The study population consisted of 3863 women undergoing routine ultrasound examination at 30+0–34+0 weeks of gestation with singleton pregnancies and structurally normal fetuses. Of these, 55 women were excluded due to incomplete records of ultrasound parameters. Thus, 3808 women were eligible for the data analysis (Figure 1).
Figure 1. Flowchart of the investigated population selection process.
Of the recruited women, 431 (11.32%) delivered LGA neonates, while 722 (18.96%) delivered SGA neonates. The general characteristics of the population are presented in Table 1. In the unadjusted analysis, women who delivered an LGA neonate had a significantly higher pre-pregnancy BMI and a higher prevalence of PCS, GDM, multiparity, and pre-existing diabetes mellitus. Fetal biometric centiles (AC, HC, and FL) were also significantly higher in the LGA group, whereas the HC-to-AC ratio was lower. Additionally, the absence of a uterine artery notch and the presence of polyhydramnios were significantly more common in the LGA group (Table 1).
Table 1. Maternal characteristics and measured variables between the LGA and non-LGA groups.
In the adjusted analysis, AC centile (aOR 1.06, 95% CI [1.06, 1.07]), HC/AC centile (aOR 1.01, 95% CI [1.006, 1.01]), and FL centile (aOR 1.01, 95% CI [1.009, 1.01]) remained significantly associated with LGA. The mUtA-PI centile was associated with a reduced likelihood of LGA (aOR 0.98, 95% CI [0.98, 0.99]). Polyhydramnios showed a positive but non-significant association with LGA (aOR 4.97, 95% CI [0.7, 58.8], p-value = 0.14) (Table 2).
Table 2. Multivariable-adjusted ORs for ultrasound and Doppler variables.
The prognostic performances of Models 1 and 2 did not differ significantly. For Model 1, sensitivity was 23.4%, specificity was 98.2%, and the AUC was 84.7%. For Model 2, sensitivity was 23.4%, specificity was 98.2%, and the AUC was 85.3%. In comparison, the EFW model (Model 3) achieved inferior performance, with a sensitivity of 7.4%, specificity of 99.2%, and an AUC of 77.5% (Table 3). The post hoc power of the study was high for all models and the systematic error of the ROC curves approached zero, indicating sufficient sample size and minimal bias across all possible thresholds.
Table 3. Predictive performance comparison of the prediction models at the conventional 0.5 threshold.
Performance metrics were also calculated at multiple decision thresholds for all three models (Supplementary Tables S2–S4). At the conventional 0.5 threshold, the sensitivities for Model 1, Model 2, and Model 3 were 23.4%, 23.4%, and 7.4%, respectively. When the probability threshold was lowered, sensitivity increased. At a threshold of 0.3, for example, Model 1 reached a sensitivity of 42.1% with a specificity of 93.3%, Model 2 reached 43.5% with a specificity of 93.1%, and Model 3 reached 25.2% with a specificity of 95.6%. At a lower threshold of 0.2, the sensitivity for Model 1 increased to 58.7% with a specificity of 87.2%, for Model 2 it reached 62.5% with a specificity of 87.7%, and for Model 3 it reached 45.4% with a specificity of 88%. Calibration plots demonstrated close agreement between predicted and observed risks, consistent with calibration slopes near one and indicating good model calibration (Supplementary Figure S1).
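The calibration intercept and slope referred to above can be estimated by logistic recalibration on the logit of the predicted risks. In this sketch the slope is the coefficient on that logit and the intercept (calibration-in-the-large) is estimated with the logit as a fixed offset. The Newton-Raphson helper, the function names, and the simulated predictions are hypothetical; the "overconfident" predictions exaggerate every log-odds by a factor of two, so their expected slope is 0.5.

```python
import numpy as np


def _logit(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))


def _fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson, with an optional fixed offset."""
    beta = np.zeros(X.shape[1])
    off = np.zeros(len(y)) if offset is None else offset
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-(off + X @ beta)))
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def calibration_intercept_slope(y_true, p_hat):
    """Calibration-in-the-large (slope fixed at 1) and calibration slope."""
    y = np.asarray(y_true, dtype=float)
    lp = _logit(np.asarray(p_hat, dtype=float))
    intercept = _fit_logistic(np.ones((len(y), 1)), y, offset=lp)[0]
    slope = _fit_logistic(np.column_stack([np.ones(len(y)), lp]), y)[1]
    return intercept, slope


rng = np.random.default_rng(11)
true_lp = rng.normal(-2.0, 1.0, 50_000)
outcome = (rng.random(true_lp.size) < 1 / (1 + np.exp(-true_lp))).astype(int)
well_calibrated = 1 / (1 + np.exp(-true_lp))
overconfident = 1 / (1 + np.exp(-2 * true_lp))  # risks pushed too far from the mean
print(calibration_intercept_slope(outcome, well_calibrated))  # approximately (0, 1)
print(calibration_intercept_slope(outcome, overconfident))    # slope approximately 0.5
```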
Internal validation using bootstrap resampling with optimism correction indicated minimal overfitting across all three models. Performance estimates did not differ significantly between the apparent and optimism-corrected results, suggesting good internal validity. The optimism-corrected Brier scores (0.08) were consistent across the models, indicating good overall predictive accuracy. Calibration slopes remained close to one after correction, suggesting stable calibration. These findings indicate that the models maintained their performance after correction for optimism (Table 4).
Table 4. Internal validation of the prediction models.
Temporal validation demonstrated preserved discriminative performance across all three models. In the temporally later cohort, test AUC values were comparable to or slightly higher than those observed in the earlier cohort. Model 1 achieved a test AUC of 86.7% compared with a training AUC of 83.8%, Model 2 achieved a test AUC of 87.0% compared with 84.4%, and Model 3 achieved a test AUC of 80.7% compared with 76.3%. The observed differences in the AUC were small, indicating stable model discrimination over time (Table 5).
Table 5. Temporal validation of the prediction models.
Decision curve analysis demonstrated that Models 1 and 2 provided greater net benefits than the treat-all and treat-none strategies across a broad range of low-to-moderate threshold probabilities. Model 2 consistently achieved the highest net benefit, while Model 3 showed a lower net benefit across the thresholds. At higher threshold probabilities, the net benefit of all models gradually decreased, indicating that limiting intervention to only those with very-high predicted risk offers little additional advantage. Overall, these findings suggest that the proposed models have potential clinical usefulness within certain decision thresholds (Supplementary Figure S2).
The sensitivity analysis demonstrated that model performance was robust to the representation of fetal biometric predictors. The main model, based on centiles, achieved an AUC of 0.847, while the sensitivity model using raw biometric measurements with explicit adjustment for gestational age achieved an AUC of 0.855. The absolute difference in discrimination between the two approaches was minimal. The preservation of discrimination after centiles were replaced argues against circularity and suggests that the model captures genuine associations related to fetal growth rather than mathematical coupling with the outcome definition (Table 6).
Table 6. Circularity sensitivity analysis.
A total of 726 pregnancies (19%) were complicated by GDM or pre-existing DM. The performance of the logistic regression models did not differ significantly between the two groups (Supplementary Tables S5 and S6). The ROC curves of the predictive models are presented in Figure 2.
Figure 2. ROC curves comparison of logistic regression prediction models between our subgroups. (A) Overall population; (B) GDM or pre-existing DM population; (C) non-GDM, non-pre-existing DM population.

4. Discussion

4.1. Primary Findings

The study’s main findings were that (i) GDM, pre-existing diabetes, and multiparity were significantly more common, and pre-pregnancy BMI significantly higher, in the LGA group; (ii) sonographic parameters (AC, HC-to-AC ratio, FL, and mUtA-PI) were significant predictors of an LGA neonate; (iii) the biometric ultrasound measurements model (Model 1) performed similarly to the clinical and biometric model (Model 2) and outperformed the EFW model (Model 3); (iv) all models allowed the balance between sensitivity and specificity to be adjusted by selecting different probability thresholds to suit specific clinical objectives; (v) bootstrap internal validation demonstrated minimal overfitting, with evaluation metrics remaining stable after optimism correction, while temporal validation showed preserved performance across time periods, supporting the robustness of the models; and (vi) decision curve analysis indicated that Models 1 and 2 provided greater net benefit than the treat-all or treat-none strategies across clinically relevant risk thresholds, supporting their potential clinical usefulness.

4.2. Interpretation of the Results

Pre-pregnancy BMI, GDM, pre-existing diabetes, and excessive gestational weight gain are recognized risk factors for delivering LGA neonates [16,17,29,30,31]. The underlying pathophysiology involves fetal hyperinsulinemia; insulin acts as an anabolic hormone that promotes accelerated fetal growth and ultimately leads to fetal macrosomia [32]. In addition to metabolic factors, multiparity has consistently been identified as a major risk factor for macrosomia, with studies showing increased odds of LGA with higher parity (aOR 1.31) [33,34,35]. Although a history of PCS has not been extensively studied, existing evidence, including a study by Rosen et al., suggests that PCS is more common among mothers of LGA infants [32]. In our study, women who delivered LGA infants had a higher BMI and were significantly more likely to have GDM or pre-existing diabetes. Both multiparity and PCS were also more frequent in the LGA group. Because maternal overweight and obesity are modifiable, structured lifestyle and weight-management strategies, particularly among multiparous women, may help optimize maternal and neonatal health outcomes.
Ultrasound-based fetal biometry is widely used to predict LGA infants, although its diagnostic accuracy is modest; a 2020 meta-analysis found that sonographically suspected macrosomia had a sensitivity of 53.2% for predicting LGA, a finding consistent across multiple studies [2,36]. Ultrasound is more effective in predicting LGA infants at 36 weeks than at 32 weeks, as its accuracy increases closer to delivery, and fetal measurements in the early third trimester have limited prognostic value [15]. Although a lower HC-to-AC ratio was proposed as a means of increasing sensitivity, it failed to improve predictive accuracy compared with EFW and AC alone in a retrospective study [37]. In our study, we examined a variety of ultrasound measurements to predict the birth of LGA neonates. The strongest individual predictor was AC, with each additional centile associated with 7% higher odds of the neonate being LGA at birth.
The association between mUtA-PI and SGA neonates has been studied extensively, whereas its relationship with LGA outcomes has received less attention [38]. In the study by Ip et al., which examined ultrasonographic parameters at 11+0–13+6 weeks of gestation, mUtA-PI values were slightly lower in pregnancies that subsequently resulted in LGA neonates, although the difference was not statistically significant [39]. mUtA-PI proved to be a significant predictor in our study, with each centile increase associated with a 2% reduction in the odds of LGA. While mUtA-PI is traditionally associated with placental insufficiency and SGA, these findings suggest a potential role in predicting LGA, which warrants further research.
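Because each centile enters the models linearly on the log-odds scale, the per-centile aORs quoted in the two preceding paragraphs compound multiplicatively over larger differences. The worked example below is an illustration only: it uses the rounded values (1.07 for AC, 0.98 for mUtA-PI), so the results are approximate, and it assumes the linear log-odds relationship holds across the 10-centile span considered.

$$\log\frac{\pi}{1-\pi} = \beta_0 + \beta x + \cdots \;\Rightarrow\; \mathrm{OR}(\Delta x) = e^{\beta\,\Delta x} = \left(e^{\beta}\right)^{\Delta x} = \mathrm{aOR}^{\Delta x}$$

$$\text{AC: } 1.07^{10} \approx 1.97, \qquad \text{mUtA-PI: } 0.98^{10} \approx 0.82$$

With other predictors held fixed, a fetus 10 centiles higher on AC would therefore have roughly twice the odds of being LGA, and a 10-centile higher mUtA-PI would correspond to about 18% lower odds.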
The biometric model, incorporating ultrasound and Doppler measurements, achieved strong predictive ability. In an attempt to improve accuracy, we integrated maternal clinical factors into the model. Notably, the combined clinical and biometric model did not outperform the biometric model, suggesting that the inclusion of clinical variables added limited incremental value in our logistic regression models. This finding contrasts with previous studies. Weschenfelder et al. assessed a clinical model and reported improved performance following the addition of ultrasound parameters [40]. Similarly, Erkamp et al. found that the combination of clinical variables with ultrasound and Doppler data improved predictive accuracy compared to clinical data alone [41]. These divergent results highlight the complexity of predicting perinatal outcomes and suggest that the contribution of clinical and ultrasonographic variables may vary depending on the study population, timing of assessment, and model structure.
The biometric model demonstrated superior predictive performance compared with the EFW model, with AUCs of 84.7% and 77.5%, respectively. This result may be attributed to the model’s use of each fetal biometric measurement individually rather than EFW alone, as well as to the addition of Doppler measurements and amniotic fluid assessment. In the literature, Pilalis et al. reported a similar comparison, showing that incorporating Doppler and clinical variables into an EFW model significantly improved predictive accuracy [42]. These results point to the limitations of generalized predictive formulas and suggest that population-specific modeling may offer superior performance. However, the direct comparison of EFW with its individual components plus Doppler may not be methodologically neutral and could disadvantage the EFW model. Future work should test models that integrate EFW with Doppler and maternal variables. Moreover, given the heterogeneity in maternal and fetal characteristics across populations, clinical centers may benefit from developing models based on individual measurements tailored to their specific populations, thereby optimizing predictive ability.

4.3. Strengths and Limitations

This study has several notable strengths. First, we comprehensively assessed multiple fetal biometric measurements individually, including HC, AC, FL, and HC-to-AC ratio, which proved superior to EFW in predicting LGA neonates. Notably, the study also incorporated mUtA-PI and the presence of polyhydramnios, parameters not typically assessed in similar studies, providing additional information on uteroplacental hemodynamics and amniotic fluid volume. Moreover, this population was routinely screened in the first trimester, all pregnancies were appropriately dated, and all scans were performed by trained physicians. Also, all women underwent routine GDM screening at 26+0–27+6 weeks. Another major advantage is the internal validation of the models using bootstrap resampling, with evaluation metrics remaining stable before and after optimism correction, and preserved performance on temporal validation, supporting reliable and well-calibrated predictions within this population. An additional strength is the use of decision curve analysis, which demonstrated that Models 1 and 2 provide greater net benefit than the treat-all or treat-none strategies across clinically relevant risk thresholds, supporting their potential clinical usefulness. Finally, the development of three distinct multivariable prediction models, together with subgroup analyses for pregnancies with and without GDM or pre-existing diabetes, provides valuable information on model performance across different clinical contexts.
Several limitations should also be acknowledged. First, the retrospective, single-center design may limit the generalizability of the findings to more diverse populations. External validation in independent cohorts is therefore essential and is planned as the next step. Decision curve analysis suggested potential clinical usefulness, but the models have not yet been evaluated in clinical practice. Prospective validation is required to determine whether their use improves maternal and neonatal outcomes and whether they can be integrated into routine third-trimester care. Second, the assessment window of 30–34 weeks was chosen to match the standard timing of the third-trimester ultrasound scan in Greece. This enhances clinical applicability but precludes evaluation of data obtained after 34 weeks, when substantial fetal growth occurs. Third, at the conventional 0.5 probability threshold, the models yield high specificity at the cost of low sensitivity. This is not an inherent weakness of the models. It reflects a specific operating point that prioritizes minimizing false-positive results. The models' clinical value lies in their flexibility. Lowering the decision threshold to 0.1 increases sensitivity to approximately 80%, at the cost of lower specificity, so the tool can be adapted to different clinical objectives. Finally, the LGA and non-LGA groups were imbalanced in size, and data on a previous LGA delivery, a known risk factor, were unavailable.
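The threshold trade-off described above follows from how a probability threshold turns predicted risks into positive or negative calls. Lowering the threshold can only add positive calls. Sensitivity can therefore rise, while specificity can only stay the same or fall. The sketch below computes sensitivity and specificity at a chosen threshold, together with the standard decision-curve net benefit, NB = TP/n - (FP/n) x p_t/(1 - p_t). NB is the quantity compared against the treat-all and treat-none (NB = 0) strategies. It assumes per-pregnancy predicted probabilities and observed LGA labels are available. The function names and toy data are hypothetical and do not reproduce the study's results.

```python
def operating_point(y, probs, threshold):
    """Sensitivity, specificity and decision-curve net benefit when every
    pregnancy with predicted probability >= threshold is flagged as LGA."""
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    tp = sum(1 for p, t in zip(probs, y) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(probs, y) if p >= threshold and t == 0)
    sensitivity = tp / n_pos
    specificity = (n_neg - fp) / n_neg
    # The threshold odds p_t / (1 - p_t) weight false positives against true positives.
    net_benefit = tp / n - (fp / n) * threshold / (1 - threshold)
    return sensitivity, specificity, net_benefit


def treat_all_net_benefit(y, threshold):
    # Reference strategy that flags every pregnancy; "treat none" always has net benefit 0.
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Illustrative toy data only (1 = LGA); not the study cohort.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.6, 0.3, 0.12, 0.4, 0.15, 0.08, 0.05, 0.05, 0.03, 0.02]
for pt in (0.5, 0.1):
    sens, spec, nb = operating_point(y, probs, pt)
    print(f"threshold {pt}: sens {sens:.2f}, spec {spec:.2f}, "
          f"NB {nb:.3f} vs treat-all {treat_all_net_benefit(y, pt):.3f}")
```

With this toy data, the 0.5 threshold flags one of the three LGA cases (sensitivity 0.33, specificity 1.00). The 0.1 threshold flags all three (sensitivity 1.00), but specificity falls to 0.71. At 0.1, the model's net benefit (about 0.278) exceeds that of treat-all (about 0.222) and treat-none (0). This is the kind of comparison a decision curve summarizes across thresholds.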

5. Conclusions

In summary, this study showed that a prediction model based on individual fetal biometric and Doppler measurements at 30–34 weeks can predict LGA neonates at birth with good accuracy. It outperformed a model based solely on EFW and performed similarly to a model that additionally includes maternal factors. The model demonstrated good discriminative ability (AUC up to 84.8%) and considerable clinical flexibility. By adjusting the decision threshold, sensitivity can be increased from approximately 23.4% to over 58% while specificity remains high (>87%). This adaptability supports its potential as a practical tool for early LGA risk stratification. Internal bootstrap validation, temporal validation, and decision curve analysis indicated robust performance and potential clinical utility. External validation in independent populations is the essential next step before clinical implementation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics16020187/s1, Figure S1: Calibration plots of prediction models; Figure S2: Decision curve analysis; Table S1: Missing data table; Table S2: Predictive performance of prediction model 1 calculated for various probability thresholds; Table S3: Predictive performance of prediction model 2 calculated for various probability thresholds; Table S4: Predictive performance of prediction model 3 calculated for various probability thresholds; Table S5: Predictive performance comparison of the prediction models in the population with GDM or pre-existing diabetes; Table S6: Predictive performance comparison of the prediction models in the population without GDM or pre-existing diabetes.

Author Contributions

Conceptualization, I.T. and T.D.; Data Curation, T.D.; Formal Analysis, V.B., A.T., and D.S.; Investigation, V.B., A.T., A.S., C.C., and A.M.; Methodology, V.B., A.T., A.S., S.S., I.T., and T.D.; Resources, S.S. and T.D.; Software, A.T. and D.S.; Supervision, I.T. and T.D.; Visualization, A.T., A.S., and S.S.; Writing—Original Draft, V.B., A.T., A.S., and A.P.; Writing—Review and Editing, A.S., S.S., C.C., A.A., A.M., I.T., and T.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was part of a prospective cohort approved by the Bioethics Committee of the Aristotle University of Thessaloniki, Greece (No. 6.231/29 July 2020), and was conducted in accordance with the Declaration of Helsinki.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ewington, L.; Black, N.; Leeson, C.; Al Wattar, B.H.; Quenby, S. Multivariable Prediction Models for Fetal Macrosomia and Large for Gestational Age: A Systematic Review. BJOG Int. J. Obstet. Gynaecol. 2024, 131, 1591–1602.
  2. Blue, N.R.; Yordan, J.M.P.; Holbrook, B.D.; Nirgudkar, P.A.; Mozurkewich, E.L. Abdominal Circumference Alone versus Estimated Fetal Weight after 24 Weeks to Predict Small or Large for Gestational Age at Birth: A Meta-Analysis. Am. J. Perinatol. 2017, 34, 1115–1124.
  3. Zhao, Y.; Li, D.-Z. Born Large for Gestational Age: Not Just Bigger. Am. J. Obstet. Gynecol. 2023, 228, 366–367.
  4. Zhang, Y.; Liu, P.; Zhou, W.; Hu, J.; Cui, L.; Chen, Z.-J. Association of Large for Gestational Age with Cardiovascular Metabolic Risks: A Systematic Review and Meta-Analysis. Obesity 2023, 31, 1255–1269.
  5. Viswanathan, S.; McNelis, K.; Makker, K.; Calhoun, D.; Woo, J.G.; Balagopal, B. Childhood Obesity and Adverse Cardiometabolic Risk in Large for Gestational Age Infants and Potential Early Preventive Strategies: A Narrative Review. Pediatr. Res. 2022, 92, 653–661.
  6. Vora, N.; Bianchi, D.W. Genetic Considerations in the Prenatal Diagnosis of Overgrowth Syndromes. Prenat. Diagn. 2009, 29, 923–929.
  7. Boulvain, M.; Senat, M.-V.; Perrotin, F.; Winer, N.; Beucher, G.; Subtil, D.; Bretelle, F.; Azria, E.; Hejaiej, D.; Vendittelli, F.; et al. Induction of Labour versus Expectant Management for Large-for-Date Fetuses: A Randomised Controlled Trial. Lancet 2015, 385, 2600–2605.
  8. Tsakiridis, I.; Mamopoulos, A.; Athanasiadis, A.; Dagklis, T. Induction of Labor: An Overview of Guidelines. Obstet. Gynecol. Surv. 2020, 75, 61–72.
  9. Giouleka, S.; Tsakiridis, I.; Ralli, E.; Mamopoulos, A.; Kalogiannidis, I.; Athanasiadis, A.; Dagklis, T. Diagnosis and Management of Macrosomia and Shoulder Dystocia: A Comprehensive Review of Major Guidelines. Obstet. Gynecol. Surv. 2024, 79, 233–241.
  10. Badr, D.A.; Carlin, A.; Kadji, C.; Kang, X.; Cannie, M.M.; Jani, J.C. Timing of Induction of Labor in Suspected Macrosomia: Retrospective Cohort Study, Systematic Review and Meta-Analysis. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2024, 64, 443–452.
  11. Khalil, A.; Sotiriadis, A.; D’Antonio, F.; Da Silva Costa, F.; Odibo, A.; Prefumo, F.; Papageorghiou, A.T.; Salomon, L.J. ISUOG Practice Guidelines: Performance of Third-Trimester Obstetric Ultrasound Scan. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2024, 63, 131–147.
  12. Dagklis, T.; Papastefanou, I.; Tsakiridis, I.; Sotiriadis, A.; Makrydimas, G.; Athanasiadis, A. Validation of Fetal Medicine Foundation Competing-Risks Model for Small-for-Gestational-Age Neonate in Early Third Trimester. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2024, 63, 466–471.
  13. Fitiri, M.; Papavasileiou, D.; Mesaric, V.; Syngelaki, A.; Akolekar, R.; Nicolaides, K.H. Routine 36-Week Scan: Diagnosis and Outcome of Abnormal Fetal Presentation. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2025, 65, 154–162.
  14. Ficara, A.; Syngelaki, A.; Hammami, A.; Akolekar, R.; Nicolaides, K.H. Value of Routine Ultrasound Examination at 35-37 Weeks’ Gestation in Diagnosis of Fetal Abnormalities. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2020, 55, 75–80.
  15. Khan, N.; Ciobanu, A.; Karampitsakos, T.; Akolekar, R.; Nicolaides, K.H. Prediction of Large-for-Gestational-Age Neonate by Routine Third-Trimester Ultrasound. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2019, 54, 326–333.
  16. Santos, S.; Voerman, E.; Amiano, P.; Barros, H.; Beilin, L.J.; Bergström, A.; Charles, M.-A.; Chatzi, L.; Chevrier, C.; Chrousos, G.P.; et al. Impact of Maternal Body Mass Index and Gestational Weight Gain on Pregnancy Complications: An Individual Participant Data Meta-Analysis of European, North American and Australian Cohorts. BJOG Int. J. Obstet. Gynaecol. 2019, 126, 984–995.
  17. Malaza, N.; Masete, M.; Adam, S.; Dias, S.; Nyawo, T.; Pheiffer, C. A Systematic Review to Compare Adverse Pregnancy Outcomes in Women with Pregestational Diabetes and Gestational Diabetes. Int. J. Environ. Res. Public Health 2022, 19, 10846.
  18. Frick, A.P.; Syngelaki, A.; Zheng, M.; Poon, L.C.; Nicolaides, K.H. Prediction of Large-for-Gestational-Age Neonates: Screening by Maternal Factors and Biomarkers in the Three Trimesters of Pregnancy. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2016, 47, 332–339.
  19. HAPO Study Cooperative Research Group; Metzger, B.E.; Lowe, L.P.; Dyer, A.R.; Trimble, E.R.; Chaovarindr, U.; Coustan, D.R.; Hadden, D.R.; McCance, D.R.; Hod, M.; et al. Hyperglycemia and Adverse Pregnancy Outcomes. N. Engl. J. Med. 2008, 358, 1991–2002.
  20. Saini, R.; Bachani, S.; Suri, J.; Gupta, M.; Gupta, A.; Sharma, P.; Debata, P. Comparison of Hadlock and INTERGROWTH-21st Growth Charts for Estimating Fetal Weight in the Third Trimester via Ultrasound. Cureus 2025, 17, e81333.
  21. Wade, D.T. Ethics, Audit, and Research: All Shades of Grey. BMJ 2005, 330, 468–471.
  22. Snijders, R.J.; Nicolaides, K.H. Fetal Biometry at 14–40 Weeks’ Gestation. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 1994, 4, 34–48.
  23. Gómez, O.; Figueras, F.; Fernández, S.; Bennasar, M.; Martínez, J.M.; Puerto, B.; Gratacós, E. Reference Ranges for Uterine Artery Mean Pulsatility Index at 11–41 Weeks of Gestation. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2008, 32, 128–132.
  24. Sotiriadis, A.; Figueras, F.; Eleftheriades, M.; Papaioannou, G.K.; Chorozoglou, G.; Dinas, K.; Papantoniou, N. First-Trimester and Combined First- and Second-Trimester Prediction of Small-for-Gestational Age and Late Fetal Growth Restriction. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2019, 53, 55–61.
  25. Westerway, S.C. Estimating Fetal Weight for Best Clinical Outcome. Australas. J. Ultrasound Med. 2012, 15, 13–17.
  26. Jakobsen, J.C.; Gluud, C.; Wetterslev, J.; Winkel, P. When and How Should Multiple Imputation Be Used for Handling Missing Data in Randomised Clinical Trials—A Practical Guide with Flowcharts. BMC Med. Res. Methodol. 2017, 17, 162.
  27. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. BMJ 2015, 350, g7594.
  28. Pavlou, M.; Ambler, G.; Seaman, S.R.; Guttmann, O.; Elliott, P.; King, M.; Omar, R.Z. How to Develop a More Accurate Risk Prediction Model When There Are Few Events. BMJ 2015, 351, h3868.
  29. Yin, B.; Hu, L.; Wu, K.; Sun, Y.; Meng, X.; Zheng, W.; Zhu, B. Maternal Gestational Weight Gain and Adverse Pregnancy Outcomes in Non-Diabetic Women. J. Obstet. Gynaecol. J. Inst. Obstet. Gynaecol. 2023, 43, 2255010.
  30. Goldstein, R.F.; Abell, S.K.; Ranasinha, S.; Misso, M.; Boyle, J.A.; Black, M.H.; Li, N.; Hu, G.; Corrado, F.; Rode, L.; et al. Association of Gestational Weight Gain With Maternal and Infant Outcomes: A Systematic Review and Meta-Analysis. JAMA 2017, 317, 2207–2225.
  31. Yang, W.; Liu, J.; Li, J.; Liu, J.; Liu, H.; Wang, Y.; Leng, J.; Wang, S.; Chen, H.; Chan, J.C.N.; et al. Interactive Effects of Prepregnancy Overweight and Gestational Diabetes on Macrosomia and Large for Gestational Age: A Population-Based Prospective Cohort in Tianjin, China. Diabetes Res. Clin. Pract. 2019, 154, 82–89.
  32. Rosen, H.; Shmueli, A.; Ashwal, E.; Hiersch, L.; Yogev, Y.; Aviram, A. Delivery Outcomes of Large-for-Gestational-Age Newborns Stratified by the Presence or Absence of Gestational Diabetes Mellitus. Int. J. Gynaecol. Obstet. Off. Organ Int. Fed. Gynaecol. Obstet. 2018, 141, 120–125.
  33. Lwin, M.W.; Timby, E.; Ivarsson, A.; Eurenius, E.; Vaezghasemi, M.; Silfverdal, S.-A.; Lindkvist, M. Abnormal Birth Weights for Gestational Age in Relation to Maternal Characteristics in Sweden: A Five Year Cross-Sectional Study. BMC Public Health 2023, 23, 976.
  34. American College of Obstetricians and Gynecologists. Practice Bulletin No. 173: Fetal Macrosomia. Obstet. Gynecol. 2016, 128, e195–e209.
  35. Lei, F.; Zhang, L.; Shen, Y.; Zhao, Y.; Kang, Y.; Qu, P.; Mi, B.; Dang, S.; Yan, H. Association between Parity and Macrosomia in Shaanxi Province of Northwest China. Ital. J. Pediatr. 2020, 46, 24.
  36. Moraitis, A.A.; Shreeve, N.; Sovio, U.; Brocklehurst, P.; Heazell, A.E.P.; Thornton, J.G.; Robson, S.C.; Papageorghiou, A.; Smith, G.C. Universal Third-Trimester Ultrasonic Screening Using Fetal Macrosomia in the Prediction of Adverse Perinatal Outcome: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy. PLoS Med. 2020, 17, e1003190.
  37. Rathcke, S.L.; Sinding, M.M.; Christensen, T.T.; Uldbjerg, N.; Christiansen, O.B.; Kornblad, J.; Søndergaard, K.H.; Krogh, S.; Sørensen, A.N.W. Prediction of Large-for-Gestational-Age at Birth Using Fetal Biometry in Type 1 and Type 2 Diabetes: A Retrospective Cohort Study. Int. J. Gynaecol. Obstet. Off. Organ Int. Fed. Gynaecol. Obstet. 2024, 167, 695–704.
  38. Zhi, R.; Tao, X.; Li, Q.; Yu, M.; Li, H. Association between Transabdominal Uterine Artery Doppler and Small-for-Gestational-Age: A Systematic Review and Meta-Analysis. BMC Pregnancy Childbirth 2023, 23, 659.
  39. Ip, P.N.P.; Nguyen-Hoang, L.; Chaemsaithong, P.; Guo, J.; Wang, X.; Sahota, D.S.; Chung, J.P.W.; Poon, L.C.Y. Ultrasonographic Placental Parameters at 11–13+6 Weeks’ Gestation in the Prediction of Complications in Pregnancy after Assisted Reproductive Technology. Taiwan. J. Obstet. Gynecol. 2024, 63, 341–349.
  40. Weschenfelder, F.; Baum, N.; Lehmann, T.; Schleußner, E.; Groten, T. The Relevance of Fetal Abdominal Subcutaneous Tissue Recording in Predicting Perinatal Outcome of GDM Pregnancies: A Retrospective Study. J. Clin. Med. 2020, 9, 3375.
  41. Erkamp, J.S.; Voerman, E.; Steegers, E.A.P.; Mulders, A.G.M.G.J.; Reiss, I.K.M.; Duijts, L.; Jaddoe, V.W.V.; Gaillard, R. Second and Third Trimester Fetal Ultrasound Population Screening for Risks of Preterm Birth and Small-Size and Large-Size for Gestational Age at Birth: A Population-Based Prospective Cohort Study. BMC Med. 2020, 18, 63.
  42. Pilalis, A.; Souka, A.P.; Papastefanou, I.; Michalitsi, V.; Panagopoulos, P.; Chrelias, C.; Kassanos, D. Third Trimester Ultrasound for the Prediction of the Large for Gestational Age Fetus in Low-Risk Population and Evaluation of Contingency Strategies. Prenat. Diagn. 2012, 32, 846–853.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.