Predicting Childhood Obesity Using Machine Learning: Practical Considerations

Abstract: Previous studies demonstrate the feasibility of predicting obesity using various machine learning techniques; however, these studies do not address the limitations of these methods in real-life settings where available data for children may vary. We investigated the medical history required for machine learning models to accurately predict body mass index (BMI) during early childhood. Within a longitudinal dataset of children ages 0–4 years, we developed predictive models based on long short-term memory (LSTM), a recurrent neural network architecture, using historical EHR data from 2 to 8 clinical encounters to estimate child BMI. We developed separate, sex-stratified models using 80% of the data for training and 20% for external validation. We evaluated model performance using K-fold cross-validation, mean average error (MAE), and Pearson's correlation coefficient (R²). Two history encounters and a 4-month prediction yielded a high prediction error and low correlation between predicted and actual BMI (MAE of 1.60 for girls and 1.49 for boys). Model performance improved with additional history encounters; improvement was not significant beyond five history encounters. The combined model outperformed the sex-stratified models, with MAE = 0.98 (SD 0.03) and R² = 0.72. Our models show that five history encounters are sufficient to predict BMI prior to age 4 for both boys and girls. Moreover, starting from an initial dataset with more than 269 exposure variables, we were able to identify a limited set of 24 variables that can facilitate BMI prediction in early childhood. Nine of these final variables are collected once, and the remaining 15 need to be updated during each visit.


Introduction
While previously uncommon in young children, obesity is now a worldwide epidemic affecting over 40 million children under the age of 5 [1,2]. Obesity in childhood is associated with both adverse outcomes like hyperlipidemia, diabetes and hypertension [3][4][5][6], as well as with higher morbidity and mortality in adulthood [7]. The underlying causes of obesity are modifiable risk factors throughout the life course; these risk factors represent major causes of health inequalities [8]. Thus, the prevention of obesity is considered a national and global health priority [9].
Unhealthy weight gain during early childhood significantly increases the risk for obesity later in life [10,11], so the ability to identify children at a young age who carry the greatest risk for obesity could significantly improve prevention efforts [12]. Several important and potentially modifiable indicators of obesity have been identified during this timeframe, including rapid infant weight gain, poor infant sleep quality, birth weight, and maternal characteristics (e.g., current and pre-pregnancy weight, depression) [13,14]. Despite this, there has been relatively limited research into predictive modeling of childhood obesity risk, leaving many unanswered questions about how and when to intervene.
Existing research to evaluate obesity risk has predominantly employed logistic regression techniques, with limited success. The constraints of traditional regression approaches (e.g., restricting analyses to a relatively small number of predictors and assumptions of independence and linearity) have prompted others to examine non-linear interactions via machine learning [14][15][16]. Machine learning is increasingly recognized as useful for preventive care [17] because of its ability to characterize, adapt, learn, predict and analyze clinical data. However, one of the main challenges in employing machine learning in the clinical domain is that electronic health record (EHR) data are often incomplete and irregularly sampled (e.g., lacking regular time intervals between patient visits). In addition, height and weight, which are necessary to calculate BMI, are collected during pediatric visits in the first 2 years of life [18], but not routinely as pediatric appointments are often missed [19]. These issues hinder the performance of predictive models using EHR data. Recent techniques in deep learning and artificial neural networks address these issues and have the potential to predict health outcomes more accurately by using EHR data.
In this study, we used a longitudinal, EHR-derived dataset of children to investigate the medical history needed for a recurrent machine learning model to accurately predict BMI prior to age 4 years. Our secondary aim was to understand whether BMI prediction varies considerably between boys and girls, which would require separate BMI prediction models for each sex.
Previous studies have used machine learning techniques to develop obesity prediction models or to determine key determinants of obesity for designing intervention tools [14,20]. However, as discussed by Siddiqui et al. [20], very few of these studies analyze sex-specific prediction models, use large-scale datasets, or examine geographic/neighborhood exposure variables (e.g., access to food and opportunities for physical activity) [21][22][23][24] that might be associated with childhood obesity [25][26][27].
Existing models of childhood obesity risk also tend to focus on predictive variables that are routinely collected in clinical practice [28], and therefore tend to include only biological predictors and postnatal factors like infant sex and birthweight [29]. It has been suggested that one of the reasons for the intractability of childhood obesity is the failure to take into account the complexity and interconnectedness of contributing factors across the life course, ranging from the social, built, and economic environments to behavior, physiology, and epigenetics [30]. A number of childhood obesity risk factors that operate during the first 1000 days of life have been identified [13] and have special significance for obesity risk prediction. For instance, programming effects occurring during pregnancy increase children's obesity risk. Adding this information could lead to improvements in a model's ability to identify children at risk for obesity in early life, but EHR data typically contain information on maternal prenatal risk factors separately from risk factors during infancy and from measures of height and weight across childhood. The models presented in this study leverage data from a population-based, longitudinal database that combines data from multiple stages of the life course and thus add a valuable contribution to our understanding of obesity risk in early life.
Finally, the lack of effective interventions to reduce the risk for obesity in early life [31,32] suggests that efforts must be made to identify very young children with a high risk of developing obesity that could be specifically targeted for intervention. The methodology in the present paper employs long short-term memory (LSTM) [33] models to predict children's BMI prior to age 4 using different lengths of history data, determined by the number of previous clinical encounters. LSTM is a recurrent neural network model that learns from an ordered sequence of events, in this case, prior clinical encounters of the patient. While several machine learning techniques could have been used, an LSTM model was selected because the history encounter constitutes a time series. In particular, the variables height and weight that are used to calculate BMI as well as the age of the child vary from one encounter to the next. LSTM models are particularly well suited for time-series applications and continue to outperform other architectures in various fields. For example, in Wang et al.'s analysis [34], LSTM outperformed RF, SVM, Naive Bayes, and Feed forward neural networks when predicting patient-reported outcomes using history responses from cancer patients. In other applications [35], LSTM models were used to predict post-operative risk for patients suffering from obesity and risk for complications after bariatric surgery.

Data Source
Data were extracted from the Obesity Prediction in Early Life (OPEL) database, a unique longitudinal, epidemiologic data repository that combines birth certificate, contextual-level, and health outcome data for 19,857 children born in Marion County, Indiana. We constructed the OPEL database by linking three independent data sources:
1. The Child Health Improvement through Computer Automation (CHICA) system, a computer-based pediatric primary care clinical decision support system that operated in eight pediatric primary care practices in Indianapolis between 2004 and 2019 [36]. The CHICA system includes data for over 47,000 patients on factors such as measured height and weight, demographics (e.g., child sex, age, race/ethnicity, Medicaid insurance status), and social determinants of health (e.g., parent health literacy, food and housing insecurity, parental depression, and infant feeding practices);
2. The IN Standard Certificate of Live Birth (i.e., 'birth certificate'), which consists of 235 variables covering parental sociodemographic information as well as information on prenatal care, labor/delivery, and neonatal conditions and procedures. Birth certificate data were made available from the Marion County Public Health Department (MCPHD); and
3. The Social Assets and Vulnerabilities Indicators (SAVI) Project, which collects, geocodes, organizes, and presents integrated data on communities in the 11-county Indianapolis metropolitan statistical area drawn from more than 30 federal, state, and local providers. All are linked to the lowest available geographic level [37]. SAVI is the nation's largest community information system, with more than 10,000 time-series variables from 1980 to the present, including welfare, education, health, public safety, housing, demographics, locations of health facilities, health and human services, community facilities, and associated service areas.
Institutional Review Board approval to construct the OPEL database was obtained from the Indiana University School of Medicine. All data analyses for this study occurred on a restricted-access server provisioned specifically for research purposes.

Data Preprocessing
From the OPEL database, we identified 73,957 clinical encounters from 6614 children ages 0 to 4 years. Within this limited dataset, we performed data preprocessing to remove erroneous records, impute missing values, and encode variables into normalized features for use in our predictive model. For example, encounters where height decreased more than 2 inches from the previous encounter or with implausible recorded BMIs were categorized as input error. We also established valid ranges for the mother's gestational weight gain and the child's birth weight. Variables that were one-hot encoded (e.g., race of the mother or father) were converted to multi-class nominal variables. Finally, we deleted duplicative variables, administrative variables not directly relevant to the aims of our analysis, and variables without enough data to be useful.
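The error-screening rule described above can be illustrated with a minimal sketch. It assumes one child's encounters sorted by age; the `Encounter` type, the function name, and the BMI plausibility bounds are hypothetical, since the text states only the 2-inch height rule and does not report the BMI limits that were used.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 2-inch rule comes from the text; the BMI bounds are illustrative
# assumptions, not values reported in the study.
MAX_HEIGHT_DROP_IN = 2.0
BMI_PLAUSIBLE_RANGE = (8.0, 40.0)


@dataclass
class Encounter:
    age_weeks: float
    height_in: float
    bmi: Optional[float]  # None when no BMI was recorded


def flag_input_errors(encounters: List[Encounter]) -> List[bool]:
    """Return one flag per encounter; True marks a suspected input error.

    Encounters are assumed to belong to one child and to be sorted by age.
    """
    low, high = BMI_PLAUSIBLE_RANGE
    flags = []
    previous_height = None
    for enc in encounters:
        # Height fell by more than the allowed margin since the last trusted value.
        height_drop = (
            previous_height is not None
            and previous_height - enc.height_in > MAX_HEIGHT_DROP_IN
        )
        bad_bmi = enc.bmi is not None and not (low <= enc.bmi <= high)
        flags.append(height_drop or bad_bmi)
        # Only trusted heights become the reference for the next comparison.
        if not height_drop:
            previous_height = enc.height_in
    return flags
```

Flagged encounters would then be removed or passed to imputation, depending on which field is affected.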
This preprocessing yielded a list of 269 variables derived from the OPEL database that we initially considered for modeling (Appendix A). From this list, we performed feature reduction guided by existing peer-reviewed literature on early life obesity risk (e.g., [13]), expert opinion (ERC), and the results of a LASSO regression. Feature reduction also took into account noisy and sparsely populated variables.
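To illustrate how a LASSO regression can support feature reduction, the sketch below fits an L1-penalized linear model by coordinate descent and shows which coefficients are driven to zero. It is not the study's implementation: the function names, the penalty value, and the assumption that features and outcome are already centered (so no intercept is fitted) are ours.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: List[List[float]], y: List[float], alpha: float, n_iter: int = 200
) -> List[float]:
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent.

    Assumes centered columns in X and a centered y (no intercept term).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


def selected_features(names: List[str], weights: List[float]) -> List[str]:
    """Keep the names whose LASSO coefficient is non-zero."""
    return [name for name, weight in zip(names, weights) if weight != 0.0]
```

In the study, the LASSO results were only one of three inputs to feature reduction, alongside the literature and expert opinion, so a non-zero coefficient would be a prompt for review rather than an automatic inclusion rule.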

Model Development
Our outcome of interest was BMI as defined by the Centers for Disease Control and Prevention (CDC) guidelines [38]. We imputed missing and invalid BMIs using linear interpolation and height and weight data from previous encounters.
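As a worked illustration of the outcome and the imputation step, the sketch below computes BMI as weight in kilograms divided by height in meters squared and fills missing values by linear interpolation over age. The function names are hypothetical. The choice to interpolate between the nearest known neighbors, and to leave leading or trailing gaps unfilled, is an assumption rather than a documented detail of the study.

```python
from typing import List, Optional


def bmi_kg_m2(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def interpolate_missing(
    ages: List[float], values: List[Optional[float]]
) -> List[Optional[float]]:
    """Fill None entries by linear interpolation over age.

    Ages are assumed to be strictly increasing. Gaps before the first or
    after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = ages[right] - ages[left]
        for i in range(left + 1, right):
            fraction = (ages[i] - ages[left]) / span
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled
```

For example, a child weighing 10 kg at a height of 80 cm has a BMI of 10 / 0.8² = 15.625.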
After preprocessing, we randomly selected an equal number of boy and girl patients, then split the dataset by patient such that 80% of our data was used for model training and 20% was used for model testing while maintaining an equal split according to patient sex. We normalized all input variables to values between −1 and 1. In the initial dataset, the girl class was the minority class.
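A patient-level split of this kind can be sketched as follows. The sketch assumes that balancing is done by downsampling each sex to the size of the smaller group and that min-max bounds are taken from the training data; neither detail is specified in the text, and all names are illustrative.

```python
import random
from typing import Dict, List, Tuple


def split_by_patient(
    patient_sex: Dict[str, str], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split patient IDs into train and test sets, balanced by sex.

    Splitting by patient keeps all encounters of one child in the same set.
    """
    rng = random.Random(seed)
    by_sex: Dict[str, List[str]] = {}
    for patient_id, sex in patient_sex.items():
        by_sex.setdefault(sex, []).append(patient_id)
    # Assumption: downsample every group to the size of the smallest one.
    group_size = min(len(ids) for ids in by_sex.values())
    train: List[str] = []
    test: List[str] = []
    for sex in sorted(by_sex):
        ids = sorted(by_sex[sex])
        rng.shuffle(ids)
        ids = ids[:group_size]
        cut = int(round(train_fraction * group_size))
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


def scale_to_unit_range(values: List[float], lo: float, hi: float) -> List[float]:
    """Min-max scale values to [-1, 1] given bounds lo and hi."""
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

Splitting on patient identifiers rather than on encounters prevents encounters from the same child from appearing in both the training and testing sets.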
We then developed separate long short-term memory (LSTM) [33] models to predict BMI using different lengths of history data, determined by the number of previous clinical encounters. We defined history data as either 2, 3, 5, or 8 prior encounters, and modeled our predictions of patient BMI at each encounter immediately following the set of history encounters. We modeled predictive variables as both fixed (e.g., maternal and paternal race, infant birthweight, mother's age at birth) and varying (e.g., patient's age, visit type, sleep quality) between encounters.
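The construction of training samples from history encounters can be sketched as below. The sketch assumes a sliding window over each child's ordered encounters, so that every run of `n_history` consecutive encounters is paired with the BMI at the encounter that follows. The function name and data layout are hypothetical; whether the study used overlapping windows is not stated.

```python
from typing import List, Tuple

# One encounter is a (feature vector, BMI) pair; the layout is illustrative.
EncounterRecord = Tuple[List[float], float]


def make_history_windows(
    encounters: List[EncounterRecord], n_history: int
) -> List[Tuple[List[List[float]], float]]:
    """Pair each run of n_history encounters with the BMI at the next one.

    Encounters are assumed to belong to one child and to be ordered in time.
    Children with n_history or fewer encounters yield no samples.
    """
    samples = []
    for end in range(n_history, len(encounters)):
        history = [features for features, _ in encounters[end - n_history:end]]
        target_bmi = encounters[end][1]
        samples.append((history, target_bmi))
    return samples
```

With this layout, fixed variables (e.g., infant birthweight) are simply repeated in every feature vector of a window, while varying variables (e.g., age) change from one encounter to the next.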
The model architecture consisted of an LSTM layer followed by a single feed-forward linear layer. The number of hidden nodes in the LSTM layer was set to half the number of input features. The Adam optimizer was used to update the weights in the model. Each model was trained using an input-output sequence with a varying number of history encounters. For example, when using five history encounters, the model was trained to predict BMI at the sixth encounter.
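For reference, the architecture can be written out using the commonly used LSTM formulation with a forget gate. The text does not give the equations, so the notation is an assumption: $x_t \in \mathbb{R}^{d}$ is the feature vector of history encounter $t$, $h_t, c_t \in \mathbb{R}^{d/2}$ are the hidden and cell states, $T$ is the number of history encounters, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right) \\
\widehat{\mathrm{BMI}}_{T+1} &= w^{\top} h_T + b
\end{aligned}
```

The last line corresponds to the single linear layer, which maps the hidden state after the final history encounter to the predicted BMI at the next encounter. With the 24 input features reported in this study, the hidden state would have 12 units.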
Based on prior research demonstrating different obesity determinants for boys and girls [39], we developed three models: one for boys, one for girls, and a combined model for both. K-fold cross-validation [40] with k = 5 was used to evaluate each model and to estimate variabilities induced by the data selection. The accuracy of the models was measured using MAE and Pearson's correlation coefficient (R²). We report the standard deviation of these metrics from the K-fold cross-validation.
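The evaluation procedure can be sketched as follows. The sketch computes MAE as the mean of the absolute differences between actual and predicted BMI, which is the usual reading of the abbreviation, along with Pearson's r and contiguous 5-fold index splits. Whether the reported R² is the square of Pearson's r, and whether folds were formed by patient or by sample, are not stated in the text; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs over contiguous folds.

    Shuffle the samples beforehand if their order is not already random.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds
```

Training one model per fold and computing both metrics on each validation fold yields five values per metric, from which the reported mean and standard deviation can be derived.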

Results
The feature reduction process resulted in a set of 24 exposure variables: 15 were derived from the CHICA dataset, 7 from the birth certificate, 1 from CHICA/birth certificate, and 1 from SAVI (Table 1).
Table 2 and Figure 1 show the distribution of the patients in the training and testing cohorts. As designed, there were approximately the same number of boys and girls included in both training and testing cohorts. There were no clinically meaningful differences across the cohorts in terms of mean BMI and age at the clinical encounter. The mean age at the encounter, defined as the average age across all encounters, was approximately 68 weeks (17 months), with no difference between the training and testing cohorts. There were also no significant differences between the cohorts with respect to the average number of encounters during the study period, although the average number of encounters for boys showed a higher standard deviation than for girls.
Data in Table 2 were used to develop the three types of models discussed above. The boy BMI model used a total of 2694 patients during training and was tested on 657 patients. Similarly, the girl model was trained on 2614 patients and tested on 649 patients. The combined model was trained using both training cohorts (i.e., 5308 boy and girl patients) and was tested on the combined testing cohorts (i.e., 1306 boy and girl patients).
Table 3 and Figure 2 show the results of the LSTM models. Models with five or eight history encounters were determined to more accurately predict the patient's BMI than models using two or three history encounters. These models fit the observed data well, as shown by the mean average error and correlation between actual BMI and predicted BMI. Models were not trained with more than eight encounters due to concerns of reduced data quantity.
Mean average error and correlation estimates were less optimal when using two or three history encounters, with the highest mean average error (1.49 for boys and 1.60 for girls) and the lowest correlation between actual and predicted BMI observed using two history encounters (R² = 0.55 in the boy only model and R² = 0.49 in the girl only model). Moreover, the K-fold standard deviation was low for both the mean average error and the R² in models with five and eight history encounters, indicating that these models were not susceptible to the selection of the training data and were more likely to generalize to new data. We observed higher K-fold standard deviations in models with two or three history encounters, suggesting less optimal performance in predicting BMI.
Table 3 note: Each entry is the mean value across all folds of the 5-fold cross-validation. MAE, mean average error; SD, standard deviation.
The above-mentioned advantages of the five and eight history encounter models were achieved despite having longer prediction horizons compared to the two or three history encounters models. For instance, the five history encounters boy model had an average prediction horizon of more than 20 weeks. That is, the model predicted BMI, on average, 20 weeks into the future. Conversely, the two history encounters model had an average prediction horizon of less than 18 weeks.
We did not observe significant model differences between boys and girls. The combined model showed optimal performance with the lowest mean average error (0.98, SD = 0.03) and the highest correlation (R² = 0.72), likely owing to the greater number of patients included.
Within the entire cohort, the mean age at which children reached five clinical encounters was 10.1 months with a standard deviation of 6.5 months.

Discussion
The purpose of this study was to understand the importance of historical health data in developing machine learning models to identify pediatric patients with increased risk of future overweight and obesity. Our LSTM models suggest that clinical data from at least five clinical encounters are needed to accurately predict child BMI prior to age four years with prediction horizons approximately 20 weeks in the future. In contrast to prior research [39], our combined model performed better than the models separated by sex, negating the need to develop and employ separate models for boys and girls.
Although previous studies have successfully applied machine learning to predict childhood obesity [14], few have investigated the application of these models in clinical care [28]. Our model could be employed in a pediatric clinical setting to dynamically track and predict children's BMI progression, facilitating obesity prevention through anticipatory guidance during each wellness visit. The results also suggest that having height and weight data from at least five clinical encounters may be necessary to accurately predict future BMI values. Encouragingly, the majority of patients in our sample achieved this threshold within the first 17 months of life, with 10 months being the average age at which children reached five clinical encounters. This suggests that employing our model to identify children at risk for suboptimal weight outcomes is feasible in very early childhood.
The input variables used by our model are consistent with previous findings in the literature [13]. For instance, characteristics of children's sleep such as duration, timing, and quality have been associated with obesity [41,42]. In this study, we conducted an ablation test on the two sleep quality variables (i.e., frequency of nighttime waking and parental perception of sleep quality) for the combined boys and girls model with five history encounters. Removing these variables resulted in a higher mean average error (1.03 vs. 0.98) with a larger standard deviation (0.07 vs. 0.03). The BMI correlation also dropped from 0.72 to 0.70, underscoring the importance of early sleep quality for the prediction of children's obesity risk.
Pediatricians are well-positioned to provide parents with information regarding obesity risk in early life, but many consensus guidelines recommend obesity screening in the pediatric setting only after 2 years of age when the "tipping point" of obesity onset may have already passed [43]. Further, meta-analyses indicate that BMI surveillance and counseling have only marginal effects on reducing children's BMI [44]. There is evidence that unhealthy weight gain in very early childhood tracks into later childhood, adolescence, and adulthood [10,11], which suggests that new approaches to help providers and parents address this problem are needed. Our screener, administered in the clinic setting, could help identify very young children at risk of unhealthy weight gain, enabling preventive counseling focused on healthy feeding, activity, and family lifestyle behaviors. Even though our findings show statistical support for postponing BMI prediction until it is possible to obtain information from five clinical encounters, the proposed models still facilitate early identification and intervention as existing guidelines recommend at least this many pediatric visits by six months of age [18]. The prediction horizon of 20 weeks and the frequency of encounters during children's first year of life mean that there are numerous opportunities for providers to monitor growth, identify weight issues, and take appropriate action.
Consistent with prior research [45], the performance of our models diminished as the temporal distance between the acquisition of the exposure variables and the time of BMI prediction in the future increased. While requiring only two history encounters is attractive in practice because it enables the use of the model for a wider population, the high mean average error of the resulting predictive models makes their utility to predict obesity risk limited. The model's improvement when using five history encounters suggests that more clinical data are needed before one can correctly predict future BMI. However, further research is needed to evaluate the reproducibility and generalizability of our models before they can be applied in clinical practice for similar and related populations. Future work may wish to investigate the relative importance of the variables in our model using an external validation dataset and by conducting ablation experiments as performed in the present study for the subjective sleep quality variables.
Machine learning has been widely applied in the field of obesity research, both for the prediction of future weight outcomes and for identifying targets for intervention. Several previous studies proposed classifiers for obesity in both adults and young children. For instance, Thamrin et al. [46] used linear regression and various machine learning approaches (Bayesian networks and CART models) to classify adults 18 and older as having or not having obesity based on survey data on indicators such as age, parental obesity, and activity level. Here, we predict children's future BMI rather than classify risks for obesity. We posit that the transparency of our proposed approach can better support intervention. Another earlier study by Dugan et al. [47] used longitudinal data from CHICA to compare different machine learning techniques (decision trees, random forest, and Bayesian networks) using 167 features from the first 2 years of life. They found that decision trees provided the best accuracy when predicting obesity between ages 2 and 10 years. Our study expands on this work by using historical data to predict children at risk for obesity. Other research focused on machine learning and obesity prediction has provided thresholds for obesity rather than BMI [48][49][50], which may not be as applicable for patients at younger ages. The models proposed in the present paper estimate exact BMI values and are dynamic. They predict future BMI based on the nearest history and can therefore be used for children of varying ages. Moreover, the proposed models leverage routinely collected EHR data, which is a practical approach compared with previous models that, for example, predict obesity using more costly and less accessible genetic data [48,51]. Importantly, the limited number of features we identify makes our model practical for use in other settings.
Although the relatively narrow set of variables we identify are not all typically included in the EHR, they could be easily collected using existing screeners [28]. This data collection approach was successfully used in previous studies to obtain child birthweight and weight change between birth and 6, 9, and 12 months [52]; and to obtain data on paternal weight, maternal smoking, and breastfeeding [53].
Our study is subject to some limitations. First, it is possible that our results may be confounded by child age. While the distribution of the data (Table 2) shows that the average age at encounter is approximately 68 weeks for all cohorts, patients with five or eight encounters may be older than those with two or three encounters. Their BMI may be more stable and easier to predict. This potential for confounding is the subject of a current investigation. In addition, the EHR data within the OPEL database are derived from a predominantly low-income, urban population in Indianapolis, IN. Additional work in other populations is needed to externally validate our findings, as children's growth patterns may vary by socioeconomic factors [54]. Finally, we were unable to examine other variables that are potentially impactful to children's early weight gain, like physical activity, as they were not included in the OPEL database. Future research may wish to incorporate such measures for a better understanding of children's weight trajectories.

Conclusions
The present study shows that five history encounters and a limited number of exposure variables are sufficient to predict BMI for both boys and girls in very early childhood. These findings can inform efforts to identify infants at risk of developing overweight and obesity. We envision using the proposed model in a pediatric clinic to dynamically track the progression of children's BMI four months into the future during each wellness visit. Our findings have implications for future work aimed at early identification and intervention of obesity, as well as for other chronic diseases that begin in early life.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy laws.

Acknowledgments:
The authors wish to thank Sami Gharbi for his contribution to the data acquisition and interpretation.

Conflicts of Interest:
The authors have no conflict of interest to disclose.

Appendix A
Complete list of starting features before LASSO reduction by data source.