Article

Using Machine Learning to Identify Predictors of Heterogeneous Intervention Effects in Childhood Obesity Prevention

by Elizabeth Mannion 1, Kristine Bihrmann 1, Nanna Julie Olsen 2, Berit Lilienthal Heitmann 2,3 and Christian Ritz 1,*
1 National Institute of Public Health, University of Southern Denmark, 1455 Copenhagen, Denmark
2 Research Unit for Dietary Studies, The Parker Institute, Bispebjerg and Frederiksberg Hospital, 2000 Frederiksberg, Denmark
3 Department of Public Health, Section for General Practice, University of Copenhagen, 1172 Copenhagen, Denmark
* Author to whom correspondence should be addressed.
Data 2025, 10(12), 196; https://doi.org/10.3390/data10120196 (registering DOI)
Submission received: 24 October 2025 / Revised: 18 November 2025 / Accepted: 19 November 2025 / Published: 1 December 2025

Abstract

Obesity prevention interventions in children often produce small or null effects. However, ignoring heterogeneous responses may widen pre-existing inequalities. This secondary analysis explored baseline predictors of differential effects on BMI z-score, Fat mass (%), stress, and sleep outcomes in obesity-susceptible, healthy-weight children (n = 543). A modified LASSO regression was applied to baseline characteristics, including physical activity and socio-demographics. Few predictors were retained. For BMI z-score, weekly chores and parental divorce were the strongest predictors: children who did chores had a slightly larger increase in BMI z-score in the intervention group compared with controls (MD = 0.15, 95% CI: −0.03, 0.33), while children with divorced parents showed a smaller increase (MD = −0.19, 95% CI: −0.69, 0.31). These results align with evidence that low-intensity activity has limited impact on obesity outcomes and that children with compounded vulnerability may respond differently to tailored interventions. Even when overall effects are small, machine learning approaches can identify potential predictors of heterogeneous intervention effects, supporting the design of future targeted interventions aimed at reducing inequalities.

1. Introduction

Childhood obesity remains a major global public health challenge, driven by a complex interplay of biological and lifestyle factors. Although prevalence has stabilised in some countries, inequalities persist and are widening, while in others prevalence continues to rise despite decades of prevention efforts [1]. Several biological predispositions increase susceptibility to obesity, but their impact depends on environmental and lifestyle exposures [2]. Some factors, such as dietary behaviors, physical activity, sleep, stress, and maternal behavior, are modifiable, while others, such as socio-economic status, are more challenging to change but strongly associated with obesity risk [3,4]. Childhood obesity has immediate psychosocial consequences, and estimates suggest that children with obesity have a 62–88% probability of still having obesity at age 35, when risks of type 2 diabetes, cardiovascular disease, disability, and premature mortality are increased. While some risks are higher when childhood obesity persists into adulthood, research suggests that childhood obesity, even when not carried into adulthood, still affects adult life, including disability and early retirement, underscoring the urgency of effective early-life prevention [5,6,7,8,9].
In response, a broad range of obesity prevention interventions have emerged over the past two decades, typically targeting modifiable risk factors such as diet and physical activity, with combined approaches often showing greater effectiveness on body outcomes [10]. However, overall effects of universal interventions, those delivered to entire populations rather than tailored subgroups, are often small or null [10]. When effects do occur, they tend to reflect reductions in body weight among children already with overweight or obesity, while little is known about factors that influence the prevention of obesity in healthy-weight children [11]. Meanwhile inequalities in childhood obesity and its prevention are prevalent, with children from lower socioeconomic backgrounds at disproportionate risk and evidence suggesting they have fewer resources to benefit from one-size-fits-all approaches [3,12]. In addition, other baseline characteristics have been found to modify intervention effects; for example, higher baseline physical activity has been associated with greater intervention effect [13]. Identifying drivers of heterogeneous intervention effects is therefore essential for improving both the effectiveness and equity of childhood obesity prevention interventions.
Secondary analyses of existing trial data using interpretable, data-driven machine learning methods provide a cost-effective and efficient approach to explore heterogeneous intervention effects, particularly in studies where no overall intervention effect was found but differential effects may have been masked [14,15]. Such methods align with recent strategies in nutrition science for subgroup-specific analyses [16] and complement outcome-based frameworks in personalized nutrition research [17], which advocate tailoring interventions according to meaningful outcome variation. Unlike conventional subgroup analyses, machine learning methods can simultaneously evaluate multiple correlated baseline factors to identify heterogeneity in intervention effects while reducing overfitting through regularization [15,18]. This approach may identify which baseline characteristics in healthy-weight children predict heterogeneous intervention effects, addressing a critical knowledge gap highlighted by the Danish Council on Obesity [11]. Among these methods, the least absolute shrinkage and selection operator (LASSO) is particularly suited to randomized trial data [19]. By identifying baseline characteristics that drive heterogeneity, interpretable machine learning can enhance understanding of intervention mechanisms and inform the design of more tailored and equitable obesity prevention strategies.
The aim of the current study was to conduct a secondary analysis of data from an obesity prevention intervention, using LASSO regression to explore whether baseline characteristics predicted heterogeneous intervention effects on BMI z-score, fat mass, and secondary sleep and stress related outcomes.

2. Materials and Methods

2.1. Study Design and Population

The present study is a secondary analysis based on data from the Healthy Start Study, a randomized controlled trial conducted in the Greater Copenhagen area between 2009 and 2011, targeting obesity-susceptible, healthy-weight children. The original Healthy Start study aimed to prevent excessive weight gain and maintain healthy weight [20,21].
Parents of eligible children aged 2–6 years (born between 1 January 2004 and 31 December 2007) were contacted and informed that their child was considered “susceptible” to obesity based on one or more criteria identified from the Danish Medical Birth Registry: high birth weight (>4 kg), maternal pre-pregnancy overweight (BMI > 28 kg/m²), or, in one municipality, maternal low educational level (<10 years of education) [21]. The original study randomized participants to three groups: an intervention group (40%), which received the obesity prevention program; a control group (40%), which did not receive the intervention but was followed alongside the intervention group; and a shadow control group (20%), which served as an observational comparison group, received neither the intervention nor the control protocol, and had no baseline covariate data collected [20,21]. Children with overweight or obesity at baseline (BMI > 25 kg/m²) were excluded from the original study, ensuring that only healthy-weight children were included at baseline. Written informed consent for the use of collected data in research was obtained from all parents of children in both the intervention and control groups [21].
For this secondary analysis, only participants in the intervention and control groups were included; those in the shadow control group were excluded because baseline covariate data were missing. Each LASSO model included only participants with complete data for all predictors, outcomes, and confounders in that model, after excluding variables with >30% missing data.
This study was conducted in accordance with the Declaration of Helsinki. The Scientific Ethical Committee of the Capital Region of Denmark determined that the original Healthy Start Study did not require approval from the Danish Bioethics Committee (journal number H-A-2007-0019). The original study received approval from the Danish Data Protection Agency (journal number: 2015-41-3937). The Healthy Start study is registered at ClinicalTrials.gov (ID: NCT01583335).

2.2. Intervention

The intervention in the original Healthy Start study consisted of family consultations tailored to the needs of parents using the stages of change model, focusing on improving children’s diet, physical activity, stress, and sleep. These consultations were delivered through motivational interviewing to children and their families, with the child as the primary target of behavioral change [22]. Parents and siblings participated to support the child’s behavior changes. During each consultation, families collaboratively selected tools they were motivated to use, guided by the stages of change model [23]. Sessions were held at community-provided locations (e.g., schools) to minimize transportation time for participants. The intervention period lasted approximately 1.3 years. Detailed descriptions of the multiple aspects of the intervention procedures have been published previously [24,25,26]. In addition to the individual consultations, families in the intervention group were invited to bi-monthly group cooking classes and monthly physical activity sessions. Families also had access to recipe ideas and suggestions for active play via the Healthy Start website [27]. The control group attended only the first and the follow-up consultations; they were not seen by the health consultant between the baseline and follow-up examinations.
The first consultation lasted approximately 1 h, followed by a 1.5-h follow-up. Subsequent sessions between baseline and follow-up were approximately 30 min each, with a maximum gap of 4 months between sessions, ensuring at least three sessions during the intervention period.

2.3. Outcomes

The outcome variables in the present study were BMI z-score, fat mass percentage, sleep duration (hours), sleep latency (minutes), child psychological stress (SDQ), and pro-social behavior (SDQ-PSB) at follow-up. All outcome variables were collected and calculated by researchers in the original Healthy Start study at the follow-up consultation (an average of 1.3 years after the baseline consultation).

2.3.1. BMI Z-Score

BMI z-scores were generated in the original study, applying national Danish reference values [28].

2.3.2. Fat Mass Percentage

Fat mass was calculated in the original study by subtracting fat-free mass from body weight, and the percentage of fat mass was determined by dividing fat mass by body weight [21]. The original study estimated fat-free mass using an equation developed by Goran, based on bioelectrical impedance measurements (resistance at 50 kHz) to account for lean tissue [29].
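As a small worked illustration of the arithmetic described above (not the original study's code), fat mass percentage follows directly from body weight and estimated fat-free mass. The function name and the example values are hypothetical, and the Goran equation used to estimate fat-free mass is not reproduced here.

```python
def fat_mass_percent(weight_kg: float, fat_free_mass_kg: float) -> float:
    """Fat mass (%) = (body weight - fat-free mass) / body weight * 100."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if not 0 <= fat_free_mass_kg <= weight_kg:
        raise ValueError("fat_free_mass_kg must lie between 0 and weight_kg")
    fat_mass_kg = weight_kg - fat_free_mass_kg
    return 100.0 * fat_mass_kg / weight_kg


# Hypothetical child: 18.0 kg body weight, 14.4 kg estimated fat-free mass.
example_fm_percent = fat_mass_percent(18.0, 14.4)  # roughly 20%
```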

2.3.3. Stress

Child psychological stress was assessed using the Danish single-sided version of the Strengths and Difficulties Questionnaire (SDQ), answered by parents in the original study [30,31]. The composite stress score was calculated using the validated SDQ Total Difficulties score. Scores range from 0 to 40, with higher scores indicating greater difficulties.
The Prosocial Behavior (SDQ-PSB) score, which ranges from 0 to 10 points, was treated separately from the SDQ Total Difficulties score, as the absence of prosocial behaviors is conceptually different from the presence of psychological difficulties [32]. A higher SDQ-PSB score reflects better pro-social behavior.

2.3.4. Sleep

Sleep duration and sleep latency were calculated from self-reported sleep diaries, with parents recording their child’s sleep times over six days to estimate average nighttime sleep duration (hours). Average sleep onset latency (minutes) was also reported, based on the approximate time between when the child was put to bed and when they fell asleep.

2.4. Measurements

Anthropometric measurements were taken at both baseline and follow-up by the Healthy Start researchers during face-to-face consultations with the children in both the intervention and control groups. These measurements included waist and hip circumference, skinfold thickness, height, and weight, which were used to calculate fat mass indicators and BMI at both baseline and follow-up. Detailed descriptions of the measurements taken as part of the original Healthy Start Study have previously been published [20,21].
Socio-economic indicators and pre- and post-natal characteristics were obtained in the original study via registry linkage from the Danish Medical Birth Register and the Danish Health Visitors’ Child Health Database. Specifically, baseline covariates in the present study included parental BMI (kg/m²), parental education (highest completed level), and duration of exclusive breastfeeding (months). In addition, self-reported data from parent questionnaires were used to assess other baseline covariates, including children’s weekly engagement in physical activities, with estimates of hours spent on specific activities. Most of these questionnaire variables were recorded in a binary yes/no format, while a few frequency-based variables (e.g., hours per week) were available but largely incomplete due to missing data. The questionnaires also provided information on average daily food consumption across various food categories and macronutrients, which the current study standardized to a fixed gram amount depending on the indicator (e.g., saturated fat per 5 g, protein per 10 g, and milk per 100 g).

2.5. Statistical Analysis

Descriptive statistics for baseline characteristics are presented as means and standard deviations for continuous variables and as counts with percentages for categorical variables, stratified by intervention group.
To identify predictors of heterogeneous intervention effects, a modified least absolute shrinkage and selection operator (LASSO) regression was employed [33]. Figure 1 summarises the modified LASSO model. Predictor variables with more than 30% missing data were excluded, and models were fitted on complete cases for the remaining variables. Predictors were standardized to have mean 0 and standard deviation 1 prior to model fitting. Interaction terms between each baseline predictor and the intervention indicator were generated by multiplying each predictor by a centered intervention variable coded as +0.5 for the intervention group and −0.5 for the control group. This coding centers the intervention variable at zero, thereby effectively removing the main effect of the intervention from the penalized terms and allowing the model to directly estimate interaction effects. These interaction terms (X × Z) were included as penalized predictors in the LASSO model. Age, sex, baseline value of the outcome, and average daily energy intake (MJ) (for BMI z-score and Fat Mass outcomes) were included as unpenalized covariates to adjust for confounding [34]. Only participants with complete data for the outcome, baseline characteristics, and unpenalized covariates were included in each model.
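The preprocessing described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' R code: the function name and toy data are hypothetical, and the use of the sample standard deviation (n − 1 denominator) for standardization is an assumption, since the text does not state which was used.

```python
import numpy as np


def build_interaction_design(X, treated):
    """Return standardized predictors, centered treatment code, and X*Z terms.

    X       : (n, p) array of baseline predictors (complete cases only).
    treated : length-n array, 1 = intervention group, 0 = control group.
    """
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated)
    # Standardize each predictor to mean 0 and standard deviation 1
    # (assumption: sample SD with an n - 1 denominator).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Centered intervention code: +0.5 for intervention, -0.5 for control.
    z = np.where(treated == 1, 0.5, -0.5)
    # One interaction column per predictor: X_j * Z.
    interactions = X_std * z[:, None]
    return X_std, z, interactions


# Hypothetical toy data: 8 children, 3 baseline predictors, balanced arms.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(8, 3))
treated_toy = np.array([1, 0, 1, 0, 1, 0, 1, 0])
X_std, z, XZ = build_interaction_design(X_toy, treated_toy)
```

With balanced arms the centered code averages to zero, which is what lets the interaction columns carry the effect-modification signal separately from the overall intervention effect.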
Modified LASSO models were fit using the cv.glmnet() function from the glmnet package in R, with 10-fold cross-validation and a fixed random seed to select the optimal regularization parameter (λmin) (Figure 1) [35]. Variables with non-zero LASSO coefficients were considered potential predictors of intervention effect heterogeneity. This LASSO-based identification of potential effect modifiers is exploratory and intended to generate hypotheses rather than provide confirmatory evidence. The sign of each coefficient indicates directionality: positive values correspond to larger outcomes in the intervention group relative to control, and negative values correspond to lower outcomes. Coefficients were normalized to allow comparison of relative effect sizes across predictors.
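To make the idea of penalized interaction terms alongside unpenalized covariates concrete, the sketch below implements a plain coordinate-descent LASSO with per-column penalty factors in NumPy. It is an illustration only: the analysis in the paper used cv.glmnet() in R with 10-fold cross-validation, whereas this sketch fixes λ by hand, performs no cross-validation, and is not a reimplementation of glmnet. The function names, the toy data, and the choice to include only covariates and interaction columns in the design are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_with_penalty_factors(A, y, lam, penalty_factor, n_iter=500):
    """Minimize (1/(2n)) * ||y - b0 - A b||^2 + lam * sum_j pf_j * |b_j|.

    Columns with penalty factor 0 are never shrunk (unpenalized covariates).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A.shape
    beta = np.zeros(p)
    intercept = y.mean()
    col_ss = (A ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every fitted term except column j.
            resid = y - intercept - A @ beta + A[:, j] * beta[j]
            rho = A[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * penalty_factor[j]) / col_ss[j]
        intercept = (y - A @ beta).mean()
    return intercept, beta


# Hypothetical toy data: 2 unpenalized covariates, 5 penalized X*Z columns,
# with a true interaction effect only in the first X*Z column.
rng = np.random.default_rng(1)
n = 200
covars = rng.normal(size=(n, 2))
z = np.where(rng.random(n) < 0.5, 0.5, -0.5)
xz = rng.normal(size=(n, 5)) * z[:, None]
y = 0.5 * covars[:, 0] + 2.0 * xz[:, 0] + rng.normal(scale=0.5, size=n)

A = np.column_stack([covars, xz])
pf = np.array([0, 0, 1, 1, 1, 1, 1], dtype=float)

_, beta_small = lasso_with_penalty_factors(A, y, lam=0.05, penalty_factor=pf)
_, beta_big = lasso_with_penalty_factors(A, y, lam=10.0, penalty_factor=pf)
```

With a very large λ every penalized interaction coefficient is driven exactly to zero while the unpenalized covariates remain in the model; with a small λ the column carrying the true interaction keeps a non-zero coefficient, which is the sense in which non-zero coefficients flag candidate effect modifiers.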
To quantify the direction and magnitude of effects for LASSO-selected baseline characteristics, linear regression models including the intervention-by-characteristic interaction term were fitted, adjusting for the same covariates. Marginal means were estimated using the emmeans package in R, and mean differences between intervention and control groups were calculated within each level of the characteristic [36]. Differences of mean differences (DoMD) with 95% confidence intervals were then derived. Mean difference and DoMD estimates are presented without p-values, consistent with guidance for post hoc subgroup analyses, emphasizing effect magnitude and direction rather than formal hypothesis testing [37]. All subgroup analyses are exploratory and should not be interpreted as confirmatory.
All statistical analyses were carried out using R version 4.4.1 [38]. A synthetic dataset replicating the size and structure of the original data is provided as Supplementary Material, along with the R code for the analysis.
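The relationship between subgroup mean differences and the interaction term can be illustrated with a small simulation. The sketch below is a simplified, unadjusted Python example on hypothetical data: it omits the covariate adjustment, the emmeans-based marginal means, and the confidence intervals used in the paper. It shows only that, for a binary characteristic, the DoMD equals the coefficient of the intervention-by-characteristic interaction in a linear model.

```python
import numpy as np

# Hypothetical simulated trial with one binary baseline characteristic.
rng = np.random.default_rng(2)
n = 400
treated = rng.integers(0, 2, size=n)  # 1 = intervention, 0 = control
group = rng.integers(0, 2, size=n)    # 1 = has characteristic, 0 = does not
y = (0.1 * treated + 0.2 * group - 0.3 * treated * group
     + rng.normal(scale=0.5, size=n))


def mean_difference(y, treated, mask):
    """Intervention minus control mean outcome within a subgroup."""
    return y[mask & (treated == 1)].mean() - y[mask & (treated == 0)].mean()


md_with = mean_difference(y, treated, group == 1)
md_without = mean_difference(y, treated, group == 0)
domd = md_with - md_without  # difference of mean differences

# Linear model: y ~ 1 + treated + group + treated:group
design = np.column_stack([np.ones(n), treated, group, treated * group])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
interaction_coef = coef[3]  # equals domd in this unadjusted, saturated model
```

Once covariates are added, as in the paper, the model is no longer saturated, so adjusted estimates will generally differ from raw cell-mean differences.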

3. Results

3.1. Characteristics of the Study Population

A total of 543 children were included in this study; 271 children were in the intervention group and 272 children were in the control group (Table 1). The mean overall age at baseline was 4.01 years and the mean overall follow-up time was 1.3 years.

3.2. Baseline Characteristics as Predictors of Heterogeneous Intervention Effects

After excluding variables with >30% missing data, 28 baseline variables were eligible for inclusion in the LASSO models (Table S1, Supplementary Material). Sample sizes varied across outcomes due to missing data, ranging from 126 to 203 participants per model, representing 56% to 61% retention from the eligible baseline sample (Table S2, Supplementary Material). Subgroup analyses presented here are exploratory and hypothesis-generating, and results should be interpreted with caution.
For the secondary outcomes, Fat Mass (%), Total SDQ score, Pro-social SDQ score, and Sleep Latency (minutes), no predictors were retained by the LASSO model, indicating that no baseline variables provided strong signals to predict heterogeneous intervention effects. For Total Sleep Hours, parental divorce was the only variable selected. Subgroup mean differences, differences of mean differences, and LASSO coefficients for these outcomes are presented in Table S3 (Supplementary Material).
For the outcome BMI z-score, several predictors were selected (λ = 0.02). The strongest were chores as a weekly activity, parental divorce, and playing hide-and-seek as a weekly activity, based on the magnitude of the LASSO coefficients (Table 2). Figure 2 displays the normalized coefficients, illustrating their relative influence on intervention effect heterogeneity for BMI z-score.
Mean differences (MD) and difference of mean differences (DoMD) analyses quantified intervention effect sizes on BMI z-score for the baseline characteristics identified as most important by the LASSO regression; most did not reach statistical significance. Children who engaged in weekly chores showed a slightly greater increase in BMI z-score in the intervention group compared with controls (MD = 0.15, 95% CI: −0.03, 0.33), whereas children who did not do chores had a smaller increase (MD = 0.02, 95% CI: −0.14, 0.17); the subgroup difference was not significant (DoMD = 0.14, 95% CI: −0.10, 0.38). Children with divorced parents showed a smaller increase in BMI z-score in the intervention group compared with controls (MD = −0.19, 95% CI: −0.69, 0.31), a pattern not observed among children with non-divorced parents (MD = 0.09, 95% CI: −0.02, 0.20); this subgroup difference was also not statistically significant (DoMD = −0.27, 95% CI: −0.79, 0.24). There was no difference in mean BMI z-score between the intervention and control groups in children who played hide-and-seek as a weekly activity (MD = 0.01, 95% CI: −0.12, 0.14). In contrast, children who did not play hide-and-seek weekly had a larger increase in BMI z-score in the intervention group compared to the control group (MD = 0.33, 95% CI: 0.09, 0.57). The subgroup difference was statistically significant (DoMD = −0.32, 95% CI: −0.59, −0.05).
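As a quick arithmetic check, each reported DoMD can be compared with the difference between the two subgroup MDs given in the text. The short Python sketch below uses only the point estimates quoted above; the variable names are hypothetical. Any small discrepancies would be consistent with the subgroup MDs having been rounded to two decimals before subtraction.

```python
# Point estimates as reported in the text for BMI z-score.
reported = {
    "weekly chores": {"md_yes": 0.15, "md_no": 0.02, "domd": 0.14},
    "parental divorce": {"md_yes": -0.19, "md_no": 0.09, "domd": -0.27},
    "weekly hide-and-seek": {"md_yes": 0.01, "md_no": 0.33, "domd": -0.32},
}

# Recompute DoMD = MD(with characteristic) - MD(without characteristic).
recomputed = {
    name: round(v["md_yes"] - v["md_no"], 2) for name, v in reported.items()
}

# Absolute gap between the recomputed and the reported DoMD.
discrepancy = {
    name: round(abs(recomputed[name] - v["domd"]), 2)
    for name, v in reported.items()
}
```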

4. Discussion

The purpose of this secondary analysis was to use machine learning to explore baseline characteristics that might predict heterogeneous intervention effects on BMI z-score, fat mass (%), and secondary sleep and stress outcomes in obesity-susceptible, healthy-weight children enrolled in the Healthy Start Study. Using LASSO regression, a small number of predictors were identified, including indicators of social factors (parental divorce) and physical activity behaviors (e.g., weekly chores or hide-and-seek). It is important to note that these analyses are exploratory and hypothesis-generating rather than confirmatory, given the post hoc nature and limited sample sizes. For the outcomes Fat Mass (%), Total SDQ score, Pro-social SDQ score, and Sleep Latency (minutes), no baseline characteristics were retained by the model. Mean differences between intervention groups by levels of these baseline characteristics were clinically small, and confidence intervals predominantly included zero.
The strongest relative predictor of differential intervention effects for BMI z-score was children engaging in chores as a weekly activity. Interestingly, the direction of this effect, though not significant, suggested that children engaging in chores had a slightly greater increase in BMI z-score in the intervention group compared with controls. Though initially counterintuitive given the relationship between physical activity and weight-related outcomes, this pattern may reflect the low-intensity and incidental nature of chores, which may not substitute for moderate-to-vigorous physical activity (MVPA). Prior evidence has consistently linked MVPA, rather than light or incidental activity, to favorable obesity-related outcomes in children, with some authors even suggesting that low-intensity exercise may be predictive of overweight [39,40]. In the current study, chores and other low-intensity activities, such as walking or board games, may have displaced forms of exercise that are more beneficial for BMI z-score outcomes, offering one explanation for the observed directionality.
The second strongest predictor of differing intervention effect for BMI z-score was parental divorce. Children of divorced parents in the intervention group had a slightly smaller increase in BMI z-score compared to controls when marginal mean estimates were compared, though confidence intervals crossed zero and BMI z-scores increased from baseline to follow-up in both groups. Prior research, such as the Greenlight trial, has found that subgroups with increased vulnerability, often associated with lower socio-economic status, may gain an enhanced benefit when interventions are tailored to their needs [41].
In the current study, parental divorce may have served as a marker of compounded vulnerability within an already susceptible cohort, producing the small directional effect differences observed. The Healthy Start intervention targeted children deemed obesity susceptible, defined in some cases using socio-economic measures such as maternal education, so those at the extreme end of vulnerability, potentially indicated by parental divorce, may have benefited most. However, the subgroup of children with divorced parents was small, which may have affected the stability of the LASSO estimates and exaggerated the signal of predicted differing treatment effects. Confirmatory trials are needed to test this hypothesis by recruiting sufficient numbers of children with specific family structures to detect differences and compare across dynamics.
Overall, these findings align with prior literature showing that childhood obesity prevention interventions produce small or null effects in children who already have a healthy weight, with larger effects typically seen in those with overweight or obesity at baseline [11,42,43]. Nevertheless, the LASSO regression in the current study provided a data-driven, interpretable method to identify the strongest predictors of differential intervention effects, even when effect sizes were small. Insights from this methodology can be used efficiently and cost-effectively to inform future studies and hypothesis generation, illustrating the value of machine learning for secondary analyses in studies with inherent design constraints.

4.1. Study Strengths and Limitations

The main strength of the current analysis lies in its methodological contribution. By applying LASSO regression to explore heterogeneous intervention effects, this study demonstrates the value of machine learning approaches for hypothesis generation in childhood obesity prevention trials, where more creative and data-driven solutions have been called for [11]. Machine learning methods, including penalized regression, have recently been advocated as valuable tools for uncovering heterogeneity of treatment effects in randomized controlled trials [44]. Our approach aligns with recent PATH (Predictive Approaches to Treatment Heterogeneity) and HTE (Heterogeneity of Treatment Effect) guidance [15], using regularization to reduce overfitting, avoiding p-values for subgroup effects, and emphasizing effect sizes for robust, interpretable findings. Our results show that even when overall intervention effects are small, interpretable methods like LASSO can identify baseline characteristics that may warrant investigation in future research, offering an efficient and cost-effective approach for secondary analyses.
Nonetheless, several limitations should be noted. The original Healthy Start study was not designed or powered to detect subgroup differences, leading to wide confidence intervals and reduced precision of the estimates in the current analysis. Some LASSO-selected predictors, such as parental divorce, represented small subgroups, which may have affected the stability of estimates and exaggerated signals [45]. Future replication studies could strengthen these findings by including formal model stability assessments (e.g., bootstrapping). The analysis focused on group-level heterogeneity; applying similar methods to estimate individual or conditional average treatment effects (ITE/CATE) would advance the field of precision medicine and nutrition, though this was beyond the scope and data availability of the current study. Potential bias may have arisen through the use of convenience sampling, as the sample for each model was restricted to participants with complete data for the relevant outcome and baseline characteristics, and missingness could not be confirmed to be random. For example, poor diet, low physical activity and parental education are known determinants of engagement in obesity interventions [46]. This may have introduced selection bias and limits the generalisability of the results to the study population [47]. Finally, the dataset was collected between 2009 and 2011, and contextual changes, such as increased screen time, shifts in dietary environments, physical activity opportunities, and updated public health recommendations, may affect the relevance of findings to contemporary child populations [48,49,50].

4.2. Future Research

Future research should build on this work using prospectively powered trials that incorporate pre-specified subgroup analyses and integrate machine learning techniques to identify children who may benefit disproportionately from obesity prevention interventions. Such approaches could support the design of tailored or intensified interventions that address disparities in childhood obesity outcomes, ultimately ensuring that prevention efforts are both effective and equitable.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/data10120196/s1, Table S1: Baseline characteristics included in LASSO models (≤30% missing data). Table S2: Available and included cases for each outcome model. Table S3. LASSO coefficients, Subgroup mean differences and differences of mean differences for the outcomes Fat Mass (%), Total SDQ score, Pro-social SDQ score, Total Sleep Hours, and Sleep Latency (minutes). File S1. Synthetic Dataset (Healthy_start_synth.rds): A synthetic dataset generated with similar structure to the original dataset (which cannot be shared due to restrictions), provided for the purpose of enabling end-to-end execution of the analysis code. File S2. R Code for Synthetic Data Generation (Healthy_start_synth_create.R): Script containing the full code used to generate the synthetic dataset (File S1). File S3. Example Analysis Code (Healthy_start_example_LASSO.R): R code demonstrating how to reproduce the analyses from the manuscript using the synthetic dataset, including LASSO regression and the difference-of-differences analysis. File S4. Data Dictionary (Healthy_Start_data_dictionary.xlsx): A data dictionary describing all variables in the synthetic dataset, including variable names, labels, units, and coding. File S5. Session Information (sessionInfo.txt): R session information (packages, versions, and system details) used when running the example analysis code.

Author Contributions

Conceptualization, C.R.; methodology, C.R. and K.B.; formal analysis, E.M. and C.R.; data curation, B.L.H. and N.J.O.; writing—original draft preparation, E.M.; writing—review and editing, C.R., K.B., B.L.H. and N.J.O.; supervision, C.R. and K.B.; project administration, C.R.; funding acquisition, C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Novo Nordisk Foundation, grant number NNF22SA0080451.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. The Scientific Ethical Committee of the Capital Region of Denmark determined that the original Healthy Start project did not require approval from the Danish Bioethics Committee (journal number H-A-2007-0019). The original study received approval from the Danish Data Protection Agency (journal number: 2015-41-3937).

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The original dataset contains identifiable information and is available only onsite at The Parker Institute (Copenhagen University Hospital). A fully synthetic dataset reproducing the structure and missingness of the original data, together with all analysis code, is provided in the Supplementary Materials. Requests for access to the original dataset should be directed to Berit Heitmann (berit.lilienthal.heitmann@regionh.dk).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study.

Abbreviations

The following abbreviations are used in this manuscript:
LASSO      Least Absolute Shrinkage and Selection Operator
SES        Socio-economic status
BMI        Body Mass Index
SDQ        Strengths and Difficulties Questionnaire
SDQ-PSB    Strengths and Difficulties Questionnaire - Prosocial Behaviour
PATH       Predictive Approaches to Treatment Heterogeneity
HTE        Heterogeneity of Treatment Effect

References

  1. World Health Organization. World Health Statistics 2022: Monitoring Health for the SDGs, Sustainable Development Goals, 1st ed.; World Health Organization: Geneva, Switzerland, 2022; ISBN 978-92-4-005114-0. [Google Scholar]
  2. Mahmoud, R.; Kimonis, V.; Butler, M.G. Genetics of Obesity in Humans: A Clinical Review. Int. J. Mol. Sci. 2022, 23, 11005. [Google Scholar] [CrossRef]
  3. Robertson, A.; Lobstein, T.; Knai, C. Obesity and Socio-Economic Groups in Europe: Evidence Review and Implications for Action; European Commission: Brussels, Belgium, 2007. [Google Scholar]
  4. Obesity and Overweight. Available online: https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight (accessed on 7 October 2025).
  5. Pont, S.J.; Puhl, R.; Cook, S.R.; Slusser, W.; Section on Obesity; The Obesity Society. Stigma Experienced by Children and Adolescents with Obesity. Pediatrics 2017, 140, e20173034. [Google Scholar] [CrossRef]
  6. Reiband, H.K.; Klemmensen, R.T.; Rosthøj, S.; Sørensen, T.I.A.; Heitmann, B.L. Body Weight in Childhood, Adolescence, and Young Adulthood in Relation to Later Risk of Disabilities and Early Retirement among Danish Female Nurses. Int. J. Obes. 2024, 48, 859–866. [Google Scholar] [CrossRef]
  7. Bjerregaard, L.G.; Jensen, B.W.; Ängquist, L.; Osler, M.; Sørensen, T.I.A.; Baker, J.L. Change in Overweight from Childhood to Early Adulthood and Risk of Type 2 Diabetes. N. Engl. J. Med. 2018, 378, 1302–1312. [Google Scholar] [CrossRef]
  8. Ward, Z.J.; Long, M.W.; Resch, S.C.; Giles, C.M.; Cradock, A.L.; Gortmaker, S.L. Simulation of Growth Trajectories of Childhood Obesity into Adulthood. N. Engl. J. Med. 2017, 377, 2145–2153. [Google Scholar] [CrossRef] [PubMed]
  9. Guo, S.S.; Wu, W.; Chumlea, W.C.; Roche, A.F. Predicting Overweight and Obesity in Adulthood from Body Mass Index Values in Childhood and Adolescence123. Am. J. Clin. Nutr. 2002, 76, 653–658. [Google Scholar] [CrossRef] [PubMed]
  10. Spiga, F.; Davies, A.L.; Tomlinson, E.; Moore, T.H.; Dawson, S.; Breheny, K.; Savović, J.; Gao, Y.; Phillips, S.M.; Hillier-Brown, F.; et al. Interventions to Prevent Obesity in Children Aged 5 to 11 Years Old. Cochrane Database Syst. Rev. 2024, 3, CD009729. [Google Scholar]
  11. Olsen, N.J.; Østergaard, J.N.; Bjerregaard, L.G.; Høy, T.V.; Kierkegaard, L.; Michaelsen, K.F.; Sørensen, T.I.A.; Grønbæk, M.K.; Bruun, J.M.; Heitmann, B.L. A Literature Review of Evidence for Primary Prevention of Overweight and Obesity in Healthy Weight Children and Adolescents: A Report Produced by a Working Group of the Danish Council on Health and Disease Prevention. Obes. Rev. 2024, 25, e13641. [Google Scholar] [CrossRef] [PubMed]
  12. Rolke, L.; White, M.J. Improving the Effectiveness and Equity of Child Obesity Interventions. Pediatrics 2024, 153, e2023064453. [Google Scholar] [CrossRef]
  13. Mannion, E.; Bihrmann, K.; Plachta-Danielzik, S.; Müller, M.J.; Bosy-Westphal, A.; Ritz, C. Exploring the Effect of an Obesity-Prevention Intervention on Various Child Subgroups: A Post Hoc Subgroup Analysis of the Kiel Obesity Prevention Study. Nutrients 2024, 16, 3220. [Google Scholar] [CrossRef]
  14. Inoue, K.; Adomi, M.; Efthimiou, O.; Komura, T.; Omae, K.; Onishi, A.; Tsutsumi, Y.; Fujii, T.; Kondo, N.; Furukawa, T.A. Machine Learning Approaches to Evaluate Heterogeneous Treatment Effects in Randomized Controlled Trials: A Scoping Review. J. Clin. Epidemiol. 2024, 176, 111538. [Google Scholar] [CrossRef] [PubMed]
  15. Kent, D.M.; Paulus, J.K.; van Klaveren, D.; D’Agostino, R.; Goodman, S.; Hayward, R.; Ioannidis, J.P.A.; Patrick-Lake, B.; Morton, S.; Pencina, M.; et al. The Predictive Approaches to Treatment Effect Heterogeneity (PATH) Statement. Ann. Intern. Med. 2020, 172, 35–45. [Google Scholar] [CrossRef]
  16. Lambert, J.; Xavier, T.; Berger, K.; Peairs, A. A Strategy for Exploring Subgroup-Specific Effects in Nutrition Science. J. Nutr. Sci. 2022, 11, e106. [Google Scholar] [CrossRef]
  17. Ferrario, P.G.; Watzl, B.; Møller, G.; Ritz, C. What Is the Promise of Personalised Nutrition? J. Nutr. Sci. 2021, 10, e23. [Google Scholar] [CrossRef]
  18. Padula, W.V.; Kreif, N.; Vanness, D.J.; Adamson, B.; Rueda, J.-D.; Felizzi, F.; Jonsson, P.; IJzerman, M.J.; Butte, A.; Crown, W. Machine Learning Methods in Health Economics and Outcomes Research-The PALISADE Checklist: A Good Practices Report of an ISPOR Task Force. Value Health 2022, 25, 1063–1080. [Google Scholar] [CrossRef] [PubMed]
  19. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  20. Olsen, N.J.; Buch-Andersen, T.; Händel, M.N.; Østergaard, L.M.; Pedersen, J.; Seeger, C.; Stougaard, M.; Trærup, M.; Livemore, K.; Mortensen, E.L.; et al. The Healthy Start Project: A Randomized, Controlled Intervention to Prevent Overweight among Normal Weight, Preschool Children at High Risk of Future Overweight. BMC Public Health 2012, 12, 590. [Google Scholar] [CrossRef]
  21. Olsen, N.J.; Ängquist, L.; Frederiksen, P.; Lykke Mortensen, E.; Heitmann, B.L. Primary Prevention of Fat and Weight Gain among Obesity Susceptible Healthy Weight Preschool Children. Main Results from the “Healthy Start” Randomized Controlled Intervention. Pediatr. Obes. 2021, 16, e12736. [Google Scholar] [CrossRef]
  22. Miller, W.R. Motivational Interviewing: Research, Practice, and Puzzles. Addict. Behav. 1996, 21, 835–842. [Google Scholar] [CrossRef] [PubMed]
  23. Prochaska, J.O.; DiClemente, C.C. Stages and Processes of Self-Change of Smoking: Toward an Integrative Model of Change. J. Consult. Clin. Psychol. 1983, 51, 390–395. [Google Scholar] [CrossRef]
  24. Olsen, N.J.; Larsen, S.C.; Rohde, J.F.; Stougaard, M.; Händel, M.N.; Specht, I.O.; Heitmann, B.L. Effects of the Healthy Start Randomized Intervention on Psychological Stress and Sleep Habits among Obesity-Susceptible Healthy Weight Children and Their Parents. PLoS ONE 2022, 17, e0264514. [Google Scholar] [CrossRef]
  25. Rohde, J.F.; Larsen, S.C.; Ängquist, L.; Olsen, N.J.; Stougaard, M.; Mortensen, E.L.; Heitmann, B.L. Effects of the Healthy Start Randomized Intervention on Dietary Intake among Obesity-Prone Normal-Weight Children. Public Health Nutr. 2017, 20, 2988–2997. [Google Scholar] [CrossRef]
  26. Händel, M.N.; Larsen, S.C.; Rohde, J.F.; Stougaard, M.; Olsen, N.J.; Heitmann, B.L. Effects of the Healthy Start Randomized Intervention Trial on Physical Activity among Normal Weight Preschool Children Predisposed to Overweight and Obesity. PLoS ONE 2017, 12, e0185266. [Google Scholar] [CrossRef]
  27. Sund Start. Available online: https://www.sundstart.nu/ (accessed on 15 March 2025).
  28. Nysom, K.; Mølgaard, C.; Hutchings, B.; Fleischer Michaelsen, K. Body Mass Index of 0 to 45-y-Old Danes: Reference Values and Comparison with Published European Reference Values. Int. J. Obes. 2001, 25, 177–184. [Google Scholar] [CrossRef] [PubMed]
  29. Goran, M.; Driscoll, P.; Johnson, R.; Nagy, T.; Hunter, G. Cross-Calibration of Body-Composition Techniques against Dual-Energy X-Ray Absorptiometry in Young Children. Am. J. Clin. Nutr. 1996, 63, 299–305. [Google Scholar] [CrossRef]
  30. Danish Single Sided SDQ. Available online: https://www.sdqinfo.org/py/sdqinfo/b3.py?language=Danish (accessed on 1 April 2025).
  31. Goodman, R. The Strengths and Difficulties Questionnaire: A Research Note. J. Child Psychol. Psychiatry 1997, 38, 581–586. [Google Scholar] [CrossRef]
  32. Goodman, A.; Lamping, D.L.; Ploubidis, G.B. When to Use Broader Internalising and Externalising Subscales Instead of the Hypothesised Five Subscales on the Strengths and Difficulties Questionnaire (SDQ): Data from British Parents, Teachers and Children. J. Abnorm. Child. Psychol. 2010, 38, 1179–1191. [Google Scholar] [CrossRef]
  33. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009; ISBN 978-0-387-84857-0. [Google Scholar]
  34. Willett, W.C.; Howe, G.R.; Kushi, L.H. Adjustment for Total Energy Intake in Epidemiologic Studies. Am. J. Clin. Nutr. 1997, 65, 1220S–1228S; discussion 1229S–1231S. [Google Scholar] [CrossRef] [PubMed]
  35. Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef] [PubMed]
  36. Lenth, R.V. Emmeans: Estimated Marginal Means, Aka Least-Squares Means; R Package Version 1.11.0. Available online: https://CRAN.R-project.org/package=emmeans (accessed on 2 December 2024).
  37. Alosh, M.; Huque, M.F.; Bretz, F.; D’Agostino, R.B. Tutorial on Statistical Considerations on Subgroup Analysis in Confirmatory Clinical Trials. Stat. Med. 2017, 36, 1334–1360. [Google Scholar] [CrossRef]
  38. R: The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 2 December 2024).
  39. Sénéchal, M.; Hebert, J.J.; Fairchild, T.J.; Møller, N.C.; Klakk, H.; Wedderkopp, N. Vigorous Physical Activity Is Important in Maintaining a Favourable Health Trajectory in Active Children: The CHAMPS Study-DK. Sci. Rep. 2021, 11, 19211. [Google Scholar] [CrossRef] [PubMed]
  40. Wyszyńska, J.; Ring-Dimitriou, S.; Thivel, D.; Weghuber, D.; Hadjipanayis, A.; Grossman, Z.; Ross-Russell, R.; Dereń, K.; Mazur, A. Physical Activity in the Prevention of Childhood Obesity: The Position of the European Childhood Obesity Group and the European Academy of Pediatrics. Front. Pediatr. 2020, 8, 535705. [Google Scholar] [CrossRef] [PubMed]
  41. Heerman, W.J.; Yin, H.S.; Schildcrout, J.S.; Bian, A.; Rothman, R.L.; Flower, K.B.; Delamater, A.M.; Sanders, L.; Wood, C.; Perrin, E.M. The Effect of an Obesity Prevention Intervention Among Specific Subpopulations: A Heterogeneity of Treatment Effect Analysis of the Greenlight Trial. Child. Obes. 2024, 20, 572–580. [Google Scholar] [CrossRef]
  42. Lissner, L.; De Bourdeaudhuij, I.; Konstabel, K.; Mårild, S.; Mehlig, K.; Molnár, D.; Moreno, L.A.; Pigeot, I.; Siani, A.; Tornaritis, M.; et al. Differential Outcome of the IDEFICS Intervention in Overweight versus Non-Overweight Children: Did We Achieve ‘Primary’ or ‘Secondary’ Prevention? Obes. Rev. 2015, 16, 119–126. [Google Scholar] [CrossRef]
  43. Derwig, M.; Tiberg, I.; Björk, J.; Welander Tärneberg, A.; Hallström, I.K. A Child-centered Health Dialogue for the Prevention of Obesity in Child Health Services in Sweden—A Randomized Controlled Trial Including an Economic Evaluation. Obes. Sci. Pract. 2021, 8, 77–90. [Google Scholar] [CrossRef]
  44. Zhang, Y.; Kreif, N.; Gc, V.S.; Manca, A. Machine Learning Methods to Estimate Individualized Treatment Effects for Use in Health Technology Assessment. Med. Decis. Mak. 2024, 44, 756–769. [Google Scholar] [CrossRef]
  45. Riley, R.D.; Snell, K.I.E.; Martin, G.P.; Whittle, R.; Archer, L.; Sperrin, M.; Collins, G.S. Penalization and Shrinkage Methods Produced Unreliable Clinical Prediction Models Especially When Sample Size Was Small. J. Clin. Epidemiol. 2021, 132, 88–96. [Google Scholar] [CrossRef]
  46. Rodriguez, A.; Korzeniowska, K.; Szarejko, K.; Borowski, H.; Brzeziński, M.; Myśliwiec, M.; Czupryniak, L.; Berggren, P.-O.; Radziwiłł, M.; Soszyński, P. Getting Them through the Door: Social and Behavioral Determinants of Uptake and Engagement in an Obesity Intervention. Obes. Res. Clin. Pract. 2023, 17, 86–90. [Google Scholar] [CrossRef]
  47. Stratton, S.J. Population Research: Convenience Sampling Strategies. Prehospital Disaster Med. 2021, 36, 373–374. [Google Scholar] [CrossRef]
  48. Twenge, J.M.; Campbell, W.K. Associations between Screen Time and Lower Psychological Well-Being among Children and Adolescents: Evidence from a Population-Based Study. Prev. Med. Rep. 2018, 12, 271–283. [Google Scholar] [CrossRef] [PubMed]
  49. Monteiro, C.A.; Cannon, G.; Levy, R.B.; Moubarac, J.-C.; Louzada, M.L.; Rauber, F.; Khandpur, N.; Cediel, G.; Neri, D.; Martinez-Steele, E.; et al. Ultra-Processed Foods: What They Are and How to Identify Them. Public Health Nutr. 2019, 22, 936–941. [Google Scholar] [CrossRef] [PubMed]
  50. World Health Organization. Report of the Commission on Ending Childhood Obesity; World Health Organization: Geneva, Switzerland, 2016; ISBN 978-92-4-151006-6. [Google Scholar]
Figure 1. Flowchart summarising the modified LASSO model. Baseline predictors (X) were multiplied by the centered intervention variable (Z, coded +0.5/−0.5) to form interaction terms (X × Z). These interaction terms were penalised within a 10-fold cross-validated LASSO, while sex, age, baseline outcome, and mean energy intake were included as unpenalised covariates. The model structure is shown in the central box. Non-zero coefficients at λmin represent baseline predictors signaling a modified intervention effect on the outcome.
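The construction described in the Figure 1 caption can be illustrated with a minimal Python sketch. It only shows how the interaction terms and the penalised/unpenalised split might be set up; it is not the authors' R code (File S3), and every variable name and value below is hypothetical rather than taken from the Healthy Start data.

```python
# Illustrative sketch only: names and values are hypothetical,
# not taken from the Healthy Start dataset or the authors' R code.

def center_treatment(group):
    """Code the intervention indicator Z as +0.5 (intervention) or -0.5 (control)."""
    return 0.5 if group == "intervention" else -0.5


def interaction_terms(predictors, z):
    """Multiply each baseline predictor X by Z to form the X x Z terms."""
    return {f"{name}_x_Z": value * z for name, value in predictors.items()}


def design_row(child):
    """Return (features, penalty_factors) for one child.

    Penalty factor 1 marks a term subject to the LASSO penalty (X x Z terms);
    penalty factor 0 marks an unpenalised covariate.
    """
    z = center_treatment(child["group"])
    unpenalised = {k: child[k] for k in ("sex", "age", "baseline_outcome", "energy_mj")}
    penalised = interaction_terms(child["predictors"], z)
    features = {**unpenalised, **penalised}
    penalty = {**{k: 0 for k in unpenalised}, **{k: 1 for k in penalised}}
    return features, penalty


# One hypothetical child in the control group.
child = {
    "group": "control",
    "sex": 1,
    "age": 4.0,
    "baseline_outcome": 0.15,
    "energy_mj": 4.8,
    "predictors": {"chores_weekly": 1, "parents_divorced": 0},
}
features, penalty = design_row(child)
print(features)
print(penalty)
```

A vector of 0/1 penalty factors like this is the kind of input that a penalised-regression routine supporting per-coefficient penalties would take, so that only the interaction terms are shrunk towards zero.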
Figure 2. Predictors ranked by relative importance (%) (LASSO, λmin = 0.02). Note: Subgroup analyses are exploratory and hypothesis-generating, not confirmatory.
Table 1. Baseline Characteristics of children included in the analysis.
| Baseline Characteristic | Control Group: n | Control Group: value ¹ | Intervention Group: n | Intervention Group: value ¹ |
|---|---|---|---|---|
| Age (years) | 272 | 4.01 ± 1.07 | 271 | 4.01 ± 1.08 |
| Sex | | | | |
| Male | 106 | 39% | 121 | 44% |
| Female | 166 | 61% | 150 | 55% |
| Follow up time (years) | 203 | 1.29 ± 0.21 | 161 | 1.31 ± 0.28 |
| BMI Z-Score | 272 | 0.15 ± 0.74 | 271 | 0.06 ± 0.80 |
| Fat Mass (%) | 176 | 21.31 ± 8.90 | 199 | 22.26 ± 10.12 |
| Fat Mass (kg) | 176 | 3.74 ± 1.59 | 199 | 3.78 ± 1.72 |
| Fat-Free Mass (kg) | 176 | 14.05 ± 3.17 | 199 | 13.62 ± 3.38 |
| Waist circumference (cm) | 257 | 52.10 ± 3.01 | 253 | 51.76 ± 3.22 |
| Waist/Hip ratio | 255 | 0.93 ± 0.05 | 252 | 0.93 ± 0.05 |
| Sum of four skin folds (mm) | 242 | 24.82 ± 5.23 | 236 | 24.61 ± 5.44 |
| Average daily energy intake (MJ) | 272 | 4.83 ± 1.02 | 271 | 4.70 ± 1.00 |
| Chores a weekly activity | | | | |
| Yes | 85 | 39% | 89 | 37% |
| No | 131 | 60% | 146 | 62% |
| Hide and Seek a weekly activity | | | | |
| Yes | 188 | 77% | 185 | 77% |
| No | 55 | 23% | 55 | 23% |
| Parents divorced | | | | |
| Yes | 19 | 7% | 17 | 6% |
| No | 238 | 92% | 238 | 93% |

¹ Mean ± standard deviation provided for continuous baseline characteristics; percentage provided for categorical baseline characteristics. BMI, Body Mass Index.
Table 2. Predictors selected by LASSO for heterogeneous intervention effects on BMI z-score, with subgroup mean differences and difference of mean differences.
| Baseline Covariate | LASSO Coefficient ¹ (λmin = 0.02) | Yes: N | Yes Subgroup * (Intervention − Control) | No: N | No Subgroup * (Intervention − Control) | Difference of Mean Differences |
|---|---|---|---|---|---|---|
| Chores a weekly activity | 0.25 | 117 | 0.15 (−0.03, 0.33) | 152 | 0.02 (−0.14, 0.17) | 0.13 (−0.10, 0.38) |
| Parents Divorced | −0.21 | 13 | −0.19 (−0.69, 0.31) | 221 | 0.09 (−0.08, 0.25) | −0.28 (−0.79, 0.24) |
| Hide and Seek a weekly activity | −0.19 | 250 | 0.01 (−0.12, 0.14) | 74 | 0.33 (0.09, 0.57) | −0.32 (−0.59, −0.05) |
| Trampoline a weekly activity | 0.08 | 99 | 0.20 (−0.01, 0.40) | 205 | 0.03 (−0.11, 0.17) | 0.17 (−0.08, 0.41) |
| Walking a weekly activity | 0.06 | 202 | 0.15 (0.01, 0.30) | 117 | −0.02 (−0.20, 0.16) | 0.17 (−0.06, 0.40) |
| Boardgames a weekly activity | 0.06 | 121 | 0.20 (0.02, 0.40) | 188 | −0.02 (−0.17, 0.13) | 0.22 (−0.01, 0.45) |
| Football a weekly activity | 0.05 | 64 | 0.28 (0.03, 0.52) | 233 | 0.04 (−0.09, 0.18) | 0.24 (−0.04, 0.51) |
| Computer Games a weekly activity | −0.01 | 148 | 0.00 (−0.16, 0.16) | 165 | 0.13 (−0.03, 0.29) | −0.13 (−0.36, 0.10) |

¹ Positive LASSO coefficients correspond to larger outcomes in the intervention group relative to control, and negative coefficients correspond to lower outcomes.
* Mean Difference and 95% confidence intervals. Model included the unpenalized covariates: Age, Sex, Baseline BMI z-score and average daily energy intake (MJ).
Note: Subgroup analyses are exploratory and emphasise effect sizes over formal hypothesis testing.
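As a reading aid for Table 2, the point estimate in the last column is the "Yes" subgroup mean difference minus the "No" subgroup mean difference. The short Python sketch below recomputes that arithmetic from the rounded values printed in the table. It covers point estimates only: the confidence intervals come from the fitted model and cannot be obtained by subtracting interval bounds. The shortened covariate labels and the function name are illustrative choices, not the authors' code.

```python
# Point estimates copied from Table 2:
# (Yes subgroup MD, No subgroup MD, reported difference of mean differences).
TABLE_2 = {
    "Chores": (0.15, 0.02, 0.13),
    "Parents divorced": (-0.19, 0.09, -0.28),
    "Hide and Seek": (0.01, 0.33, -0.32),
    "Trampoline": (0.20, 0.03, 0.17),
    "Walking": (0.15, -0.02, 0.17),
    "Boardgames": (0.20, -0.02, 0.22),
    "Football": (0.28, 0.04, 0.24),
    "Computer Games": (0.00, 0.13, -0.13),
}


def difference_of_mean_differences(md_yes, md_no):
    """Subtract the 'No' subgroup effect from the 'Yes' subgroup effect."""
    return round(md_yes - md_no, 2)


computed = {
    name: difference_of_mean_differences(md_yes, md_no)
    for name, (md_yes, md_no, _) in TABLE_2.items()
}

for name, (_, _, reported) in TABLE_2.items():
    print(f"{name}: computed {computed[name]:+.2f}, reported {reported:+.2f}")
```

For example, for chores the calculation is 0.15 − 0.02 = 0.13, and for parental divorce it is −0.19 − 0.09 = −0.28.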
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

