Predictors of Newborn’s Weight for Height: A Machine Learning Study Using Nationwide Multicenter Ultrasound Data

There has been no machine learning study with a rich collection of clinical, sonographic markers to compare the performance measures for a variety of newborns’ weight-for-height indicators. This study therefore compared these performance measures based on machine learning, ultrasonographic data and maternal/delivery information. The source of data for this study was a multi-center retrospective study with 2949 mother–newborn pairs. The mean-squared-error-over-variance measures of five machine learning approaches were compared for newborn’s weight, newborn’s weight/height, newborn’s weight/height² and newborn’s weight/height³. Random forest variable importance, the influence of a variable over average node impurity, was used to identify major predictors of these newborns’ weight-for-height indicators among ultrasonographic data and maternal/delivery information. Regarding ultrasonographic fetal biometry, newborn’s weight, newborn’s weight/height and newborn’s weight/height² were better indicators with smaller mean-squared-error-over-variance measures than newborn’s weight/height³. Based on random forest variable importance, the top six predictors of newborn’s weight were the same as those of newborn’s weight/height and those of newborn’s weight/height²: gestational age at delivery time, the first estimated fetal weight and abdominal circumference in week 36 or later, maternal weight and body mass index at delivery time, and the first biparietal diameter in week 36 or later. These six predictors also ranked within the top seven for large-for-gestational-age and the top eight for small-for-gestational-age. In conclusion, newborn’s weight, newborn’s weight/height and newborn’s weight/height² are more suitable for ultrasonographic fetal biometry with smaller mean-squared-error-over-variance measures than newborn’s weight/height³. Machine learning with ultrasonographic data would be an effective noninvasive approach for predicting newborn’s weight, weight/height and weight/height².


Introduction
Newborn underweight and childhood obesity are significant contributors to the global disease burden. One in every seven newborns in the world was underweight in 2015, and these babies are more likely to die in the first 28 days of life than other babies [1]. Similarly, 40 million children aged five or younger in the world were overweight or obese in 2016 [2], and this is likely to cause various diseases later in life, such as asthma, cardiovascular disorders, depression, diabetes, dyslipidemia and hypertension [3][4][5][6][7][8]. In this context, the World Health Organization champions a global goal "No Increase in Childhood Overweight by 2025" [9].
In this vein, existing literature has attempted to examine newborn's weight and its significant predictor variables among ultrasonographic data and maternal/delivery information [10][11][12][13]. These studies adopted linear regression, and hence they could not analyze (1) which predictor variables are more important for predicting newborn's weight, or (2) what time is the best for taking ultrasonographic data. To overcome these limitations, a more recent study employed machine learning and made predictions for newborn's body mass index from ultrasonographic data and maternal/delivery information [14]. The findings of this study agreed with those of existing literature stating that newborn's weight/height² would be a good alternative measure of newborn's adiposity to newborn's weight [15][16][17].
However, an optimal index for classifying underweight and overweight in children under 2 years of age has not been established yet, while conventional studies still ignore newborn's weight/height and weight/height³ (Ponderal Index). Here, the Ponderal Index is designed to reflect the three-dimensional (volume) information (height³) [18]. To the best of our knowledge, there has been no machine learning study with a rich collection of clinical, sonographic markers to compare the performance measures for a variety of newborns' weight-for-height indicators. In this context, this study compared the performance measures for a variety of newborns' weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. This study includes four weight-for-height indicators, that is, newborn's weight, weight/height, weight/height² and weight/height³. In addition, this study features 64 clinical, sonographic markers and 2949 mother-baby pairs. The ultimate goal of this study is to test the following null and alternative hypotheses: Null Hypothesis: Newborn's weight, newborn's weight/height, newborn's weight/height² and newborn's weight/height³ are equally suitable for ultrasonographic fetal biometry. Alternative Hypothesis: Newborn's weight, newborn's weight/height, newborn's weight/height² and newborn's weight/height³ are not equally suitable for ultrasonographic fetal biometry.

Participants
The source of data for this multi-center retrospective study was the same as in [14], the medical records of 2949 mother-baby pairs (see [14] for a more detailed description). The study period was September 2019 to March 2021 and the participating institutions were 48 general hospitals. This study was approved by the institutional review boards of the forty-eight participating hospitals, including Korea University Anam Hospital (2019AN0433). Informed consent was waived by the institutional review boards. No administrative permissions or licenses were acquired by the authors to access the data used in this study. Then, data collection, analysis and interpretation followed.

Variables
The dependent variables were newborn's weight, weight/height and weight/height³. Newborn's weight and height were recorded at the time of birth. The following 64 independent variables were considered: (1) maternal data including age (years), children alive, height, pre-gestational weight, weight at delivery time, pre-gestational body mass index, body mass index at delivery time, term births, preterm births, abortions; (2) gestational age, ultrasound measures (see their notations in Table S1 (Supplementary Materials)); and (3) delivery/newborn data such as gestational age at delivery (weeks/days), Apgar scores at 1 and 5 min after delivery, caesarean delivery methods (no vs. yes), newborn's sex-female (no vs. yes), neonatal intensive care unit hospitalization (no vs. yes). All participating institutions adopted Hadlock's formula [19] for the estimation of EFW (estimated fetal weight), except one participating institution that employed Shinozuka's formula [20]. These formulas use the same parameters and show similar performance in predicting newborn's weight [21].
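The four weight-for-height indicators compared in this study can be illustrated with a minimal Python sketch. This is only an illustration, not the authors' code: the `Newborn` class, the function name and the example values are hypothetical, and weight in kilograms and height in meters are assumed units.

```python
from dataclasses import dataclass


@dataclass
class Newborn:
    """Hypothetical container for birth measurements (assumed units)."""
    weight_kg: float  # birth weight in kilograms
    height_m: float   # birth height (length) in meters


def weight_for_height_indicators(baby: Newborn) -> dict:
    """Return weight and weight/height^p for p = 1, 2, 3."""
    if baby.height_m <= 0:
        raise ValueError("height must be positive")
    w, h = baby.weight_kg, baby.height_m
    return {
        "weight": w,                    # kg
        "weight/height": w / h,         # kg/m
        "weight/height^2": w / h ** 2,  # kg/m^2 (body mass index)
        "weight/height^3": w / h ** 3,  # kg/m^3 (Ponderal Index)
    }


# Hypothetical newborn, not a record from the study data.
example = weight_for_height_indicators(Newborn(weight_kg=3.2, height_m=0.5))
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Because each indicator divides by a different power of height, the four quantities have different units, which is why a unit-free performance measure is needed to compare models across them.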

Analysis
Five machine learning approaches were adopted for the prediction of newborn's weight, weight/height and weight/height³: linear regression, random forest and artificial neural networks with one, two and three hidden layers [14,22]. Data on 2949 mother-baby pairs were split into training and validation sets with a 75:25 ratio (2212 vs. 737 mother-baby pairs). The mean squared error (MSE), the average of the squares of errors among the 737 mother-baby pairs in the validation set, was employed as a performance measure. Because the unit of the MSE is the squared unit of the dependent variable, the MSE is not appropriate for comparing model performance across dependent variables with different units. For this reason, the MSE divided by the variance of the dependent variable (MSE over variance), which is unit-free, was introduced for this comparison. Finally, random forest variable importance, the influence of a variable over average node impurity, was introduced to identify the most important predictor variables of newborn's weight, weight/height and weight/height³ among ultrasonographic data and maternal/delivery information. RStudio was used for the analysis in March 2021. It needs to be noted that the results for newborn's weight/height² were adopted from [14] and were compared with those for newborn's weight, weight/height and weight/height³ in this study.
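The 75:25 split and the MSE-over-variance measure described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' R code: the function names, the random seed and the toy values are hypothetical, and the use of the population variance of the validation targets is an assumption, since the text does not state which variance estimator was used.

```python
import random
from statistics import fmean, pvariance


def train_validation_split(n, train_fraction=0.75, seed=0):
    """Shuffle indices 0..n-1 and split them into training and validation lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n)
    return indices[:n_train], indices[n_train:]


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mse_over_variance(y_true, y_pred):
    """MSE divided by the variance of the dependent variable (unit-free).

    Assumption: population variance of the validation targets.
    """
    return mse(y_true, y_pred) / pvariance(y_true)


# 2949 pairs split 75:25, as in the study (2212 vs. 737).
train_idx, valid_idx = train_validation_split(2949)
print(len(train_idx), len(valid_idx))

# Toy values (hypothetical, not study data): weights and predictions in kg.
weights_kg = [2.9, 3.1, 3.4, 3.6]
predicted_kg = [3.0, 3.0, 3.5, 3.4]

# The same values in grams: the MSE changes by a factor of 1000^2,
# while the MSE-over-variance ratio stays the same.
weights_g = [1000 * w for w in weights_kg]
predicted_g = [1000 * p for p in predicted_kg]

ratio_kg = mse_over_variance(weights_kg, predicted_kg)
ratio_g = mse_over_variance(weights_g, predicted_g)
print(ratio_kg, ratio_g)
```

The toy comparison shows why the ratio, rather than the raw MSE, is suitable for comparing indicators measured in kg, kg/m, kg/m² and kg/m³: rescaling the dependent variable rescales the MSE and the variance by the same factor.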

Results
Descriptive statistics in this study are given in Table 1. The respective median (Q2) values of newborn's weight, weight/height, weight/height³, GA36AC1 (the first abdominal circumference in week 36 or later), GA36EFW1 (the first estimated fetal weight in week 36 or later) and gestational age at delivery time were 3.17 kg, 6.36 kg/m, 25.68 kg/m³, 322 mm, 2866 g and 38 weeks. The respective median values of GA21AC1 (the first abdominal circumference during weeks 21-35) and maternal body mass index at delivery time were 214.70 mm and 26.04 kg/m². The proportion of neonatal intensive care unit hospitalization was 12% (354/2949). The MSEs of the five machine learning models for newborn's weight, weight/height, weight/height² and weight/height³ are presented in Table 2. The data were split, and the analysis was performed three times; then, the average MSE was obtained for each of the five statistical methods. Linear regression and the random forest were better models with smaller MSEs than the artificial neural networks for predicting newborn's weight-for-height indicators. More importantly, newborn's weight, newborn's weight/height and newborn's weight/height² were better indicators with smaller MSE-over-variance measures than newborn's weight/height³.
Based on random forest variable importance, the top six predictor variables of newborn's weight were the same as those of newborn's weight/height and newborn's weight/height²: gestational age at delivery time, the first EFW and AC in week 36 or later, maternal weight and body mass index at delivery time, and the first BPD (biparietal diameter) in week 36 or later (see Tables 3-5, Table S2(1-3) (Supplementary Materials) and Figure S1(1-3) (Supplementary Materials) in this study, Table 3 and Figure 1 in [14]). Eight among the top ten predictor variables of newborn's weight/height³ were identical to those of newborn's weight, weight/height and weight/height². However, the importance ranking of the first EFW in week 36 or later was lower for newborn's weight/height³ than for the other three indicators, and vice versa for the first AC during weeks 21-35. Indeed, the results of linear regression are informative regarding the effects of important predictor variables on newborn's weight or weight/height. For example, newborn's weight will increase by 170 g if gestational age at delivery time increases by 1 week. Newborn's weight/height will increase by 0.05 g/m if the first EFW in week 36 or later increases by 1 g.
Finally, the random forest variable importance of predictors for large-for-gestational-age (LGA) and small-for-gestational-age (SGA) is presented in Figures 1 and 2, respectively. The top six predictor variables of newborn's weight, weight/height and weight/height² also ranked within the top seven for LGA and the top eight for SGA: gestational age at delivery time, the first EFW and AC in week 36 or later, maternal weight and body mass index at delivery time, and the first BPD in week 36 or later. Moreover, the importance rankings of the top three predictors for newborn's weight, weight/height and weight/height² were within the top four for LGA and the top three for SGA as well: gestational age at delivery time, and the first EFW and AC in week 36 or later.
The results of this study support the alternative hypothesis: newborn's weight, newborn's weight/height, newborn's weight/height² and newborn's weight/height³ are not equally suitable for ultrasonographic fetal biometry. It was found in this study that newborn's weight, newborn's weight/height and newborn's weight/height² are more suitable for ultrasonographic fetal biometry than newborn's weight/height³.
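The rankings above rest on random forest variable importance, which this study describes as the influence of a variable over average node impurity. As a rough illustration of the building block behind such a measure, the Python sketch below computes the decrease in node impurity produced by a single split in a regression tree. It is not the authors' implementation: the function names and toy values are hypothetical, and the sum of squared deviations is assumed as the impurity. In an impurity-based importance, decreases of this kind are accumulated over all splits that use a variable and averaged over the trees of the forest.

```python
from statistics import fmean


def sum_squared_deviation(values):
    """Assumed node impurity: sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def impurity_decrease(x, y, threshold):
    """Decrease in impurity when a node is split on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return (sum_squared_deviation(y)
            - sum_squared_deviation(left)
            - sum_squared_deviation(right))


# Toy values (hypothetical, not study data).
gestational_age_weeks = [36, 37, 38, 39, 40, 41]
unrelated_flag = [1, 0, 1, 0, 1, 0]  # a variable unrelated to weight
weight_kg = [2.6, 2.8, 3.1, 3.3, 3.5, 3.6]

decrease_ga = impurity_decrease(gestational_age_weeks, weight_kg, threshold=38)
decrease_flag = impurity_decrease(unrelated_flag, weight_kg, threshold=0)
print(decrease_ga, decrease_flag)
```

In this toy case the split on gestational age removes far more impurity than the split on the unrelated flag, which is the sense in which a variable with large accumulated decreases ranks as more important.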

Principal Findings
Newborn's weight/height and newborn's weight/height² are more suitable for ultrasonographic fetal biometry with smaller MSE-over-variance measures than newborn's weight/height³. The top six predictor variables of newborn weight were the same as those of newborn weight/height and those of newborn weight/height²: gestational age at delivery time, the first EFW and AC in week 36 or later, maternal weight and body mass index at delivery time, and the first BPD in week 36 or later. These six predictors also ranked within the top seven for large-for-gestational-age and the top eight for small-for-gestational-age.

Clinical and Research Implications
The findings of this study are consistent with those of the previous study [14]: week 36 or later is the best time to take ultrasonographic data, and AC and EFW are the most important predictor variables of newborn's weight/height² together with gestational age at delivery and maternal body mass index at delivery. However, the previous study ignored newborn's weight/height, which can be another good alternative measure of newborn's adiposity to newborn's weight. As a matter of fact, there is no consensus on the best weight-for-height indicator for newborns and children under the age of 2, in part because babies born earlier are more heterogeneous in terms of weight for height than babies born later [18]. Newborn thinness is considered to be a risk factor for adult chronic disease, but it is not clear which of a newborn's weight-for-height indicators (e.g., weight/height, weight/height², weight/height³) are the best indicators for adult chronic disease [18]. Given that newborn thinness is known to be a risk factor for adult chronic disease, it would be worthwhile to shift our attention to newborn weight-for-height indicators and their prenatal predictors. This will help to develop a new research tradition covering health conditions across different life periods, i.e., prenatal, newborns, children and adults. In this context, this study compared the performance measures for a variety of newborn weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. To the best of our knowledge, there has been no study on this topic in this direction. The findings of this study suggest that machine learning with ultrasonographic data would be an effective noninvasive approach for predicting a newborn's weight, weight/height and weight/height².
Specifically, the results of this study bring the following clinical implication for the prognosis of adiposity for newborns and children under the age of 2 (with no current consensus on their best weight-for-height indicators): clinicians are advised to use a newborn's weight, weight/height or weight/height² as an indicator of a newborn's adiposity when they employ ultrasonographic fetal biometry.

Strengths and Limitations
To the best of our knowledge, there has been no machine learning study with a rich collection of clinical, sonographic markers to compare the performance measures for a variety of newborns' weight-for-height indicators. In this context, this study compared the performance measures for a variety of newborns' weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. This study included four weight-for-height indicators, that is, newborn's weight, weight/height, weight/height² and weight/height³. In addition, this study featured 64 clinical, sonographic markers and 2949 mother-baby pairs. However, this study had some limitations. Firstly, this study did not include possible mediating effects. Secondly, this study did not consider socioeconomic determinants, disease information (diabetes, gastroesophageal reflux disease, hypertension, periodontitis), medication history (benzodiazepine, calcium channel blocker, nitrate, progesterone, proton pump inhibitor, sleeping pills, antidepressant) and obstetric information (in vitro fertilization, myoma uteri, prior cone). These factors have been reported to influence delivery outcome [23][24][25] and it would be a useful extension to consider these new variables. Thirdly, additional examination of symptomatic vs. asymptomatic and single vs. multiple gestation is expected to provide more insights and implications on this important topic.

Conclusions
This is the first study to compare the performance measures for a variety of newborns' weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. Newborn's weight, newborn's weight/height and newborn's weight/height² are more suitable for ultrasonographic fetal biometry with smaller MSE-over-variance measures than newborn's weight/height³. Machine learning with ultrasonographic data would be an effective noninvasive approach for predicting newborn's weight, weight/height and weight/height².
Author Contributions: K.H.A., K.-S.L. and S.N. contributed to conception, design, data analysis, manuscript writing and manuscript review. All authors contributed to conception, design and manuscript review. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: This study was approved by institutional review boards of forty-eight hospitals such as Korea University Anam Hospital (2019AN0433; approval date: 20 December 2020) participating in the study. Informed consent was waived by the institutional review boards. No administrative permissions or licenses were acquired by the authors to access the data used in this study.
Informed Consent Statement: Informed consent was waived by the institutional review boards, given that data were deidentified.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.