Article

Prediction of Children’s Subjective Well-Being from Physical Activity and Sports Participation Using Machine Learning Techniques: Evidence from a Multinational Study

by Josivaldo de Souza-Lima 1,2, Gerson Ferrari 3, Rodrigo Yáñez-Sepúlveda 1, Frano Giakoni-Ramírez 1, Catalina Muñoz-Strale 1, Javiera Alarcon-Aguilar 1, Maribel Parra-Saldias 4, Daniel Duclos-Bastias 5, Andrés Godoy-Cumillaf 6,*, Eugenio Merellano-Navarro 7, José Bruneau-Chávez 8 and Pedro Valdivia-Moral 2

1 Facultad de Educación y Ciencias Sociales, Instituto del Deporte y Bienestar, Universidad Andres Bello, Las Condes, Santiago 7550000, Chile
2 Facultad de Ciencias de la Educación, Universidad de Granada, 18071 Granada, Spain
3 Escuela de Ciencias de la Actividad Física, el Deporte y la Salud, Universidad de Santiago de Chile (USACH), Santiago 7500618, Chile
4 Departamento de Educación Física, Deporte y Recreación, Universidad de Atacama, Copiapó 1530000, Chile
5 GEO Research Group, Escuela de Educación Física, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
6 Grupo de Investigación en Educación Física, Salud y Calidad de Vida (EFISAL), Facultad de Educación, Universidad Autónoma de Chile, Temuco 4780000, Chile
7 Department of Physical Activity Sciences, Faculty of Education Sciences, Universidad Católica del Maule, Talca 3530000, Chile
8 Departamento de Educación Física, Deportes y Recreación, Universidad de la Frontera, Temuco 4811230, Chile
* Author to whom correspondence should be addressed.
Children 2025, 12(8), 1083; https://doi.org/10.3390/children12081083
Submission received: 29 July 2025 / Revised: 13 August 2025 / Accepted: 15 August 2025 / Published: 18 August 2025
(This article belongs to the Special Issue Lifestyle and Children's Health Development)

Highlights

What are the main findings?
  • Machine learning models, particularly XGBoost and LightGBM, predict children’s subjective well-being with up to 50% explained variance, surpassing traditional regression.
  • Sports participation, including exercise frequency, emerges as a key predictor, with linear benefits observed across diverse global samples.
What is the implication of the main finding?
  • These results support the development of targeted sports programs to enhance child well-being, leveraging advanced predictive tools.
  • The findings advocate for integrating physical literacy into educational policies to address global inactivity trends in youth.

Abstract

Background/Objectives: Traditional models such as ordinary least squares (OLS) struggle to capture non-linear relationships in children’s subjective well-being (SWB), which is associated with physical activity. This study evaluated machine learning (ML) for predicting SWB, focusing on sports participation, and explored theoretical prediction limits using a global dataset. It addresses a gap in understanding complex patterns across diverse cultural contexts. Methods: We analyzed 128,184 records from the ISCWeB survey (ages 6–14, 35 countries), with self-reported data on sports frequency, emotional states, and family support. To assess cross-country generalizability, we used GroupKFold cross-validation (grouped by country) and leave-one-country-out (LOCO) validation, which yielded a mean R2 = 0.45 ± 0.05 and indicated that performance is robust beyond country-specific patterns. We used SHAP for interpretability and bootstrapping for error estimation. No pre-registration was required for this secondary analysis. Results: XGBoost and LightGBM outperformed OLS, achieving R2 up to 0.504 in restricted datasets (sensitivity analysis excluding affective leakage: R2 = 0.35), with sports-related variables (e.g., exercise frequency) associated positively with SWB predictions (SHAP values: +0.15–0.25; incremental ΔR2 = 0.06 over a demographics/family/school base model). Using test–retest reliability from the literature (r = 0.74), the estimated irreducible RMSE was 0.941; XGBoost achieved RMSE = 1.323, approaching the predictability bound with 68.1% of explainable variance captured (after noise adjustment). Partial dependence plots showed a linear association with exercise frequency without satiation and a slight decline with age. Conclusions: ML improves SWB prediction in children, highlighting associations with sports participation, and approaches predictable variance bounds. These findings suggest potential for data-driven tools to identify patterns, such as through physical literacy pathways, informing physical activity interventions. However, longitudinal studies are needed to explore causality and address cultural biases in self-reports.

1. Introduction

Subjective well-being (SWB) in children is a multifaceted construct encompassing emotional, cognitive, and social dimensions that critically influence development, academic performance, and long-term health outcomes [1]. In the context of sports sciences, physical activity (general movement) and sports participation (organized activities) have been consistently linked to enhanced SWB, fostering resilience, social connections, and positive self-perception [2,3]. Physical literacy, defined as the motivation, confidence, physical competence, knowledge, and understanding to value and engage in physical activity throughout the lifespan [4], plays a central role in this relationship. This connection can be further understood through frameworks such as the Self-Determination Theory, which emphasizes autonomy, competence, and relatedness as key drivers of sustained engagement in physical activity, and the Competence Motivation Model, which highlights the role of mastery experiences and perceived competence in fostering positive self-perception and well-being in children. Incorporating these perspectives provides a psychological and developmental foundation for the observed links between sports participation, physical literacy, and SWB. Recent advancements in machine learning (ML) offer promising tools for predicting SWB by identifying intricate patterns in socio-emotional, familial, and behavioral data [5,6]. For instance, tree-based ensembles like XGBoost have demonstrated superior performance in adolescent SWB prediction compared to single-scale measures [5]. Moreover, integrating physical literacy into ML frameworks can elucidate how sports-related factors contribute to well-being [7].
The research gap lies in understanding non-linear effects of physical activity on child SWB, such as potential thresholds or diminishing returns in exercise frequency, which traditional models like OLS and LASSO cannot capture because they assume linearity [8]. For instance, while linear models may overlook interactions between sports participation and family support, tree-based boosting performs well on tabular data because it detects such patterns without requiring them to be specified in advance. Such models can potentially achieve a higher R2 (~0.50) than the 0.20–0.40 typical of linear models in prior SWB studies [5,7].
This study builds on prior work by applying ML to a comprehensive dataset of over 120,000 children, emphasizing sports and exercise variables. We compare ML models against OLS, incorporate SHAP for interpretability, and estimate theoretical prediction limits inspired by recent analyses of well-being variance bounds [8]. By focusing on physical activity’s role, this research aligns with sports science priorities, such as promoting active lifestyles to mitigate childhood inactivity and associated mental health risks [3,9]. We pose three research questions, adapted from Oparina et al. (2025) [8], to the context of children’s SWB:
  • RQ1: Do ML algorithms predict children’s SWB substantially better than conventional linear models, and what is the upper limit on our ability to predict SWB based on survey data?
  • RQ2: Are the variables that ML algorithms identify as important in the prediction of children’s SWB aligned with those commonly emphasized in the literature, particularly sports-related factors?
  • RQ3: Can ML help to resolve debates about the specific shape of the relationships between children’s SWB and key variables, such as age and frequency of physical activity?

2. Methodology

2.1. Data Source and Preparation

For clarity, “sports participation” refers to engagement in organized or informal sports activities, “exercise frequency” denotes the self-reported number of days per week the child engages in moderate-to-vigorous activity, and “physical activity” encompasses all bodily movements that increase energy expenditure, including but not limited to sports and exercise [10]. The dataset comprised 128,184 records from children aged 6–14 years across 35 countries, all participating in various organized or informal physical activities, derived from the ISCWeB survey on child well-being (publicly available upon registration and agreement to data use terms at https://isciweb.org/the-data/access-our-dataset/ (accessed on 12 March 2025)). Country participation depended on local research teams and funding availability, so the sample should not be interpreted as nationally representative for all countries. Consequently, external generalization to non-participating regions or under-represented contexts should be made with caution.
The dataset originally contained 176 variables; we retained 123 relevant features after handling missing values (e.g., mean/mode imputation for variables with <20% missingness) and removing redundancies. Variables spanned demographics (e.g., age and gender), family dynamics (e.g., “parentslisten”), school environment (e.g., “satisfiedlifeasstudent”), material resources (e.g., “haveequipsportshobbies”), and time use (e.g., “frequencysportsexercise”, coded 0–5 for never to daily). SWB was operationalized as a composite score from items like “satisfiedlifeaswhole” (0–10 scale), averaging responses to “enjoylife”, “lifegoingwell”, “havegoodlife”, “thingslifeexcellent”, “likemylife”, and “happywithmylife”. This composite demonstrated good internal consistency (Cronbach’s α = 0.85 across countries) [11,12].
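As an illustration of how such a composite and its internal consistency can be computed, the following is a minimal sketch using only the Python standard library. The item names come from the text; the four toy records, the helper names (`composite_swb`, `cronbach_alpha`), and the choice to average only the items a child answered are assumptions made for this example and are not taken from the study's pipeline.

```python
from statistics import mean, variance

# Item names as listed in the text; the responses below are invented toy data.
ITEMS = ["enjoylife", "lifegoingwell", "havegoodlife",
         "thingslifeexcellent", "likemylife", "happywithmylife"]

def composite_swb(row):
    """Average the available 0-10 item responses for one child (assumed rule)."""
    values = [row[k] for k in ITEMS if row.get(k) is not None]
    return mean(values) if values else None

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(ITEMS)
    item_vars = [variance([r[i] for r in rows]) for i in ITEMS]
    total_var = variance([sum(r[i] for i in ITEMS) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical complete records for four children.
toy_rows = [
    dict(zip(ITEMS, [9, 8, 9, 7, 9, 10])),
    dict(zip(ITEMS, [6, 5, 6, 4, 6, 7])),
    dict(zip(ITEMS, [3, 4, 2, 3, 4, 3])),
    dict(zip(ITEMS, [8, 7, 8, 8, 7, 9])),
]

scores = [composite_swb(r) for r in toy_rows]
alpha = cronbach_alpha(toy_rows)
print(scores, round(alpha, 3))
```

The toy items move closely together, so the resulting alpha is much higher than the 0.85 reported for the real data; the sketch only shows the mechanics.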
To avoid target leakage, we conducted sensitivity analyses excluding conceptually overlapping predictors (e.g., “feelinghappy”, “feelingsad”, “feelingcalm”, “feelingstressed”, “feelingfullofenergy”, “feelingbored”). This yielded reduced but robust performance (e.g., XGBoost R2 = 0.35, RMSE = 1.52 in restricted set). Previous studies within the ISCWeB framework have supported the unidimensionality and cross-cultural applicability of this composite score, with confirmatory factor analyses indicating adequate fit indices and invariance across multiple countries and age groups [12]. These findings reinforce the validity of the dependent variable for large-scale, cross-national comparisons.
We divided the data into restricted (core socio-emotional variables, n = 64,092 post-filter) and expanded (including all domains, n = 128,184) subsets for comparative analysis, following Oparina et al. (2025) [8]. Stratification ensured representation across all age groups (6–14 years), with balanced sampling centered on ages 8, 10, and 12 while retaining ages 6–7 and 13–14.

2.2. Models and Evaluation

Four ML models were evaluated:
Random Forest (RF): An ensemble of decision trees for robust prediction (Breiman, 2001) [13]. Hyperparameters: n_estimators = 100, max_depth = 10.
XGBoost: Gradient boosting with regularization for handling non-linearities [14]. Hyperparameters: learning_rate = 0.1, max_depth = 6, n_estimators = 200.
LightGBM: Efficient gradient boosting optimized for large datasets [15]. Hyperparameters: learning_rate = 0.1, max_depth = 6, num_leaves = 31.
Keras Neural Network (NN): A multi-layer perceptron with ReLU activation and dropout for overfitting prevention [16]. Hyperparameters: layers = 3 (hidden units: 128, 64, 32), dropout = 0.2, epochs = 50, batch_size = 32.
Incremental ΔR2 from sports variables (e.g., “frequencysportsexercise”, “haveequipsportshobbies”) over a base model (demographics, family, school) was 0.06 in the restricted set, highlighting their unique contribution.
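The incremental ΔR2 described above is the difference in R2 between a model that includes the sports variables and a base model that omits them, evaluated on the same held-out children. The sketch below shows only that arithmetic; the two prediction vectors are invented numbers standing in for the outputs of two fitted models, and the function names are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def incremental_r2(y_true, pred_base, pred_full):
    """Gain in R2 when moving from the base model to the full model."""
    return r2_score(y_true, pred_full) - r2_score(y_true, pred_base)

# Invented held-out outcomes and predictions (not study data).
y_true = [2, 4, 6, 8, 10]
pred_base = [4, 4, 6, 8, 8]   # e.g., demographics + family + school only
pred_full = [3, 4, 6, 8, 9]   # e.g., base predictors + sports variables

delta = incremental_r2(y_true, pred_base, pred_full)
print(round(delta, 3))
```

Both sets of predictions should come from the same test split; otherwise the difference mixes model improvement with sampling variation.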
Hyperparameters were tuned via grid search with 5-fold cross-validation (e.g., XGBoost: learning rate = 0.1, max_depth = 6). Models were trained on 80% of data and tested on 20%, with cross-validation scores reported across subgroups (age, gender, country; mean R2 = 0.48 ± 0.03). External validation on a holdout subset (20% from unselected countries) confirmed generalizability (R2 = 0.45).
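Validation that holds out whole countries, as in the grouped and leave-one-country-out checks mentioned in the abstract, differs from a random 80/20 split in that no country contributes rows to both the training and test folds. A dependency-free sketch of the leave-one-country-out variant follows; the country codes are placeholders, and in scikit-learn the same idea is available through `LeaveOneGroupOut` and `GroupKFold` in `sklearn.model_selection`.

```python
def leave_one_country_out(countries):
    """Yield (held_out, train_idx, test_idx); each test fold is one whole country."""
    for held_out in sorted(set(countries)):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical country label for each record (placeholder codes).
countries = ["CL", "CL", "ES", "ES", "ES", "DE"]

folds = list(leave_one_country_out(countries))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

A model would be refitted on `train_idx` for every fold and scored on `test_idx`, and the per-country scores then summarized as a mean and standard deviation.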
For baseline comparison, OLS regression was applied in full and reduced forms (excluding health-related variables to assess predictive loss, as shown in Oparina et al., 2025 [8]). We also included LASSO for regularization in the expanded set.
Metrics included Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R2. RMSE quantifies average prediction error in the same units as the SWB scale, facilitating interpretability. R2 reflects the proportion of variance in SWB explained by the model; higher values indicate better predictive fit. Interpretability was enhanced using SHAP to quantify feature contributions [17]. Permutation importances (PIs) were calculated to assess variable reliance, averaging R2 drops over 10 permutations.
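To make the metrics and the permutation procedure concrete, here is a small standard-library sketch. The toy scoring rule, the feature matrix, and the function names are assumptions for illustration; the only detail taken from the text is that importance is the R2 drop averaged over 10 shuffles of one column.

```python
import math
import random

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, X, y, column, n_repeats=10, seed=0):
    """Mean drop in R2 after shuffling one column, averaged over n_repeats shuffles."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - r2_score(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats

# Hypothetical fitted model: uses column 0 (exercise frequency) and ignores column 1.
def toy_predict(row):
    return 5.0 + 0.5 * row[0]

X = [[0, 3], [1, 1], [2, 4], [3, 0], [4, 2], [5, 5]]
y = [toy_predict(row) for row in X]

imp_used = permutation_importance(toy_predict, X, y, column=0)
imp_ignored = permutation_importance(toy_predict, X, y, column=1)
print(imp_used, imp_ignored)
```

Shuffling a column the model ignores leaves the predictions unchanged, so its importance is zero, whereas shuffling a column the model relies on lowers R2.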

2.3. Theoretical Prediction Limit

Following Oparina et al. (2025) [8] and using test–retest reliability from child SWB literature (r = 0.74; Huebner, 1991), we estimated irreducible RMSE = 0.941, bounding predictable variance at 74% [18].
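The arithmetic behind the bound can be sketched as follows. If test–retest reliability r is read as the share of outcome variance that is stable, then a model's share of explainable variance is R2 / r; with the paper's values (R2 = 0.504, r = 0.74) this gives the 68.1% quoted in the abstract and discussion. The noise-floor formula below, SD × sqrt(1 − r), is a classical test theory assumption made for this sketch; the text does not state the outcome standard deviation or the exact formula used, so `sd_outcome` is a placeholder and no attempt is made to reproduce the reported 0.941.

```python
import math

def explainable_share(r2, reliability):
    """Fraction of the reliable (explainable) variance captured by the model."""
    return r2 / reliability

def irreducible_rmse(sd_outcome, reliability):
    """Noise floor, assuming error variance = (1 - reliability) * total variance."""
    return sd_outcome * math.sqrt(1.0 - reliability)

share = explainable_share(r2=0.504, reliability=0.74)  # values reported in the paper
floor = irreducible_rmse(sd_outcome=2.0, reliability=0.74)  # sd_outcome is hypothetical
print(f"{share:.1%}", round(floor, 3))
```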
Analyses were conducted in Python 3.12 using scikit-learn, XGBoost, LightGBM, and Keras libraries.

3. Results

3.1. Model Performance

XGBoost and LightGBM outperformed the other models. XGBoost achieved MAE = 0.742, RMSE = 1.323, and R2 = 0.504 in the restricted set and MAE = 0.758, RMSE = 1.356, and R2 = 0.478 in the expanded set. LightGBM performed similarly (MAE = 0.745, RMSE = 1.328, and R2 = 0.502 in the restricted set), followed by RF (MAE = 0.802, RMSE = 1.412, and R2 = 0.500) and NN (MAE = 0.768, RMSE = 1.378, and R2 = 0.498).
Table 1 and Figure 1 compare model performance against OLS in the restricted and expanded sets. ΔR2 denotes the improvement over the full OLS model. The restricted dataset includes a no-leakage adjustment (excluding conceptually overlapping affective variables like “feelinghappy”). The 95% CI was obtained via bootstrap (1000 resamples). Incremental ΔR2 from sports variables was 0.06 in the restricted set (no leakage), as detailed in Table 1.
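A percentile bootstrap of the kind referred to above can be sketched in a few lines: test-set rows are resampled with replacement, the metric is recomputed on each resample, and the 2.5th and 97.5th percentiles form the interval. The outcome and prediction vectors below are synthetic, RMSE is used as the example statistic, and the function names are hypothetical; the only detail taken from the text is the use of 1000 resamples.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def bootstrap_ci(y_true, y_pred, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a paired (y_true, y_pred) statistic."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        stats.append(stat([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Synthetic held-out outcomes and predictions (not study data).
errors = [0.5, -1.0, 1.5, -0.5]
y_true = [float(i % 11) for i in range(40)]
y_pred = [y + errors[i % 4] for i, y in enumerate(y_true)]

point = rmse(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, rmse)
print(round(point, 3), round(lo, 3), round(hi, 3))
```

Any paired metric, such as R2 or a ΔR2 between two models, can be passed as `stat` in the same way, provided each resample keeps a child's outcome and predictions together.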

3.2. Feature Importance and SHAP Analysis

Top predictors included “feelinghappy” (SHAP contribution: +0.35), “satisfiedlifeasstudent” (+0.28), “satisfiedthingshave” (+0.22), and “parentslisten” (+0.20). Sports-related variables like “frequencysportsexercise” (+0.18 relative to top) and “haveequipsportshobbies” (+0.15) ranked highly, contributing positively to SWB predictions (SHAP values: +0.15–0.25 for high exercise frequency) (Figure 2).
The beeswarm plot (Figure 3) illustrates global importance and direction; physical activity shows positive contributions (red points for high values).
Waterfall plots for individual cases (Figure 4) demonstrated how access to sports equipment amplified SWB in active children.
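The additive logic behind these SHAP contributions can be illustrated with exact Shapley values on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings, moving from a baseline child to the child being explained. It is not the tree-specific algorithm used by the SHAP library; the scoring rule, the coefficients, and the single-baseline setup are invented for this example, and only the variable names come from the text.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = x[name]      # switch this feature to the child's value
            new = predict(current)
            totals[name] += new - prev   # marginal contribution in this ordering
            prev = new
    return {name: total / len(orderings) for name, total in totals.items()}

# Hypothetical scoring rule with an exercise x equipment interaction.
def toy_model(f):
    return (5.0 + 0.3 * f["frequencysportsexercise"] + 0.8 * f["parentslisten"]
            + 0.2 * f["frequencysportsexercise"] * f["haveequipsportshobbies"])

child = {"frequencysportsexercise": 5, "parentslisten": 3, "haveequipsportshobbies": 1}
baseline = {"frequencysportsexercise": 2, "parentslisten": 2, "haveequipsportshobbies": 0}

phi = shapley_values(toy_model, child, baseline)
print(phi)
```

The contributions sum to the difference between the child's prediction and the baseline prediction, which is the property a waterfall plot visualizes; the interaction term is split between exercise frequency and equipment access.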

3.3. Theoretical Limit and Functional Forms

The bootstrapped RMSE minimum was 2.606. The XGBoost RMSE was 1.323 (ratio: 1.97), indicating the model achieves nearly double the precision relative to irreducible noise, aligning with bounds where predictable variance is ~50% of the total (Oparina et al., 2025 [8]).
Partial dependence plots (Figure 5) reveal functional forms: SWB increases linearly with “frequencysportsexercise” (no satiation), while age shows a slight decline from 6 to 14, without a U-shape (consistent with the figure and child-specific patterns, unlike adult U-shapes shown in work by Oparina et al., 2025 [8]).
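A partial dependence curve is obtained by setting one feature to each grid value for every child in turn, keeping the other features as observed, and averaging the model's predictions. The sketch below uses an invented linear scoring rule, so the resulting curve is linear by construction; it illustrates the computation only and makes no claim about the shape found in the study. The function and variable names other than “frequencysportsexercise” are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value for all rows."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical fitted model (coefficients invented for illustration).
def toy_model(f):
    return 6.0 + 0.25 * f["frequencysportsexercise"] - 0.05 * (f["age"] - 10)

rows = [
    {"frequencysportsexercise": 1, "age": 8},
    {"frequencysportsexercise": 3, "age": 10},
    {"frequencysportsexercise": 5, "age": 12},
    {"frequencysportsexercise": 0, "age": 14},
]

grid = list(range(6))  # exercise frequency coded 0-5, never to daily
curve = partial_dependence(toy_model, rows, "frequencysportsexercise", grid)
steps = [b - a for a, b in zip(curve, curve[1:])]
print(curve, steps)
```

Equal step sizes along the grid correspond to a linear relationship without satiation; a flattening curve would show up as steps that shrink toward zero at higher frequencies.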

4. Discussion

The results demonstrate the superior predictive power of ML models, particularly gradient boosting algorithms like XGBoost and LightGBM, in forecasting children’s SWB compared to traditional OLS regression. Addressing RQ1, ML algorithms achieved R2 values up to 0.504, outperforming OLS by 515% in restricted datasets, which aligns with recent findings on ML’s ability to capture non-linear interactions in well-being data, including the role of social–emotional skills [6,8]. With test–retest reliability r = 0.74, our models capture 68.1% of explainable variance, approaching the bound (RMSE = 0.941), consistent with adult studies but adapted here to child contexts [8].
For RQ2, to address potential leakage from overlapping affective predictors, sensitivity models excluding them confirmed the linear association of exercise with SWB, though with tempered effect sizes (SHAP +0.12 vs. +0.18). The feature importance analysis via SHAP and permutation methods revealed alignment with established literature. Emotional states (“feelinghappy”) and social supports (“parentslisten”) were top predictors, echoing Diener et al. (2018) [1]. Notably, sports-related factors like “frequencysportsexercise” and “haveequipsportshobbies” ranked highly, with SHAP contributions of 0.15–0.25. These associations are consistent with evidence for potential pathways via physical literacy and social bonds [2,9], though causality requires longitudinal confirmation. This underscores physical literacy’s role, as higher engagement in sports fosters motivation and competence linked to well-being [7,19].
Regarding RQ3, partial dependence plots clarified relationship shapes: exercise frequency showed a linear positive association with SWB without satiation points, suggesting no diminishing returns in children, in contrast to potential overtraining risks in adults [20]. Age showed a slight linear decline, lacking the U-shape seen in adults [8], which may reflect developmental stages like increasing school pressures [11].
These findings have practical implications. Education and health stakeholders may (i) prioritize daily opportunities for moderate-to-vigorous activity during school time, (ii) expand access to basic sports equipment and safe play spaces, and (iii) use risk-stratified, data-informed tools to identify children who might benefit most from targeted programs [3,19]. These applications should be implemented as pilots with ongoing monitoring, given the cross-sectional, observational nature of our data [6].
This study has several limitations that should be considered when interpreting the findings.
  • Data limitations: Some variables had up to 20% missing data, which required imputation and may have introduced bias. The absence of longitudinal tracking prevents the assessment of temporal changes or causal pathways.
  • Measurement issues: Given the cross-national nature of the ISCWeB dataset, potential cultural and contextual biases in self-reported measures, particularly emotional states and physical activity frequency, may influence results. Although subgroup analyses were conducted to explore regional differences, full measurement invariance testing across countries and cultures was not performed and is recommended for future research. Self-reported SWB in young children (ages 6–7) is especially susceptible to response bias and social desirability effects, which may lead to overestimation of both SWB and physical activity levels.
  • Design constraints: The cross-sectional design restricts interpretation to associations rather than causal inferences.
  • Model-specific considerations: The competitive but not superior performance of the neural network may be related to the tabular nature of the data, which generally favors tree-based algorithms over deep learning approaches [21].
  • Generalizability: Findings may not extend to older adolescents undergoing pubertal transitions (beyond age 14) or to populations not represented in the ISCWeB sample.
Future research should incorporate longitudinal data, accelerometry for objective activity measures [22], and advanced ML, like deep learning on multimodal inputs, to refine predictions. Extending SHAP to policy simulations could further guide interventions.

5. Conclusions

This study demonstrates ML’s efficacy in predicting child SWB, highlighting physical activity’s pivotal role. By achieving performance near theoretical bounds (capturing 68.1% of explainable variance), our approach offers actionable insights for sports science practitioners and policymakers to foster active, well-adjusted youth through enhanced physical literacy programs and school sports policies. Extending to longitudinal models with SHAP could further refine interventions, ultimately promoting global child health.

Author Contributions

Conceptualization, J.d.S.-L. and G.F.; methodology, J.d.S.-L.; software, J.d.S.-L.; validation, J.d.S.-L., R.Y.-S. and F.G.-R.; formal analysis, J.d.S.-L.; investigation, M.P.-S., D.D.-B. and A.G.-C.; resources, J.A.-A.; data curation, C.M.-S.; writing original draft preparation, J.d.S.-L.; writing review and editing, R.Y.-S., P.V.-M. and G.F.; visualization, C.M.-S., E.M.-N. and J.B.-C.; supervision, P.V.-M.; project administration, R.Y.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the use of fully anonymized, publicly available secondary data from the Children’s Worlds survey (ISCWeB). The original data collection was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Boards or Ethics Committees in each participating country, including Chile, where ethical approval was obtained by the national research team prior to data collection.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the original Children’s Worlds (ISCWeB) survey, including assent from children and consent from parents or legal guardians, in accordance with ethical procedures established in each participating country. No new consent was required for this secondary analysis, as the dataset is fully anonymized and publicly available.

Data Availability Statement

The data that support the findings of this study are publicly available from the Children’s Worlds project website. The dataset from the third wave (ISCWeB 2017–2019) can be accessed at https://isciweb.org/the-data/access-our-dataset/ (accessed on 12 March 2025). The data are publicly available without restrictions for academic purposes after registration.

Acknowledgments

The authors would like to thank the Children’s Worlds research coordination team and the Jacobs Foundation for making the ISCWeB dataset openly accessible for academic use. We also acknowledge the administrative and academic support provided by the participating institutions in Chile and Spain.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
GB: Gradient Boosting
ISCWeB: International Survey of Children’s Well-Being
MAE: Mean Absolute Error
ML: Machine Learning
NN: Neural Network
OLS: Ordinary Least Squares
PIs: Permutation Importances
R2: Coefficient of Determination
RF: Random Forest
RMSE: Root Mean Squared Error
RQ: Research Question
SHAP: SHapley Additive exPlanations
SWB: Subjective Well-Being
WHO: World Health Organization

References

  1. Diener, E.; Oishi, S.; Tay, L. Advances in subjective well-being research. Nat. Hum. Behav. 2018, 2, 253–260.
  2. Eime, R.M.; Young, J.A.; Harvey, J.T.; Charity, M.J.; Payne, W.R. A systematic review of the psychological and social benefits of participation in sport for children and adolescents: Informing development of a conceptual model of health through sport. Int. J. Behav. Nutr. Phys. Act. 2013, 10, 135.
  3. Bull, F.C.; Al-Ansari, S.S.; Biddle, S.; Borodulin, K.; Buman, M.P.; Cardon, G.; Carty, C.; Chaput, J.-P.; Chastin, S.; Chou, R. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br. J. Sports Med. 2020, 54, 1451–1462.
  4. Whitehead, M. Physical Literacy Across the World; Routledge: London, UK, 2019.
  5. Zhang, N.; Liu, C.; Chen, Z.; An, L.; Ren, D.; Yuan, F.; Yuan, R.; Ji, L.; Bi, Y.; Guo, Z. Prediction of adolescent subjective well-being: A machine learning approach. Gen. Psychiatry 2019, 32, e100096.
  6. Meng, H.; He, S.; Guo, J.; Wang, H.; Tang, X. Applying machine learning to understand the role of social–emotional skills on subjective well-being and physical health. Appl. Psychol. Health Well-Being 2025, 17, e12624.
  7. Britton, Ú.; Onibonoje, O.; Belton, S.; Behan, S.; Peers, C.; Issartel, J.; Roantree, M. Moving well-being well: Using machine learning to explore the relationship between physical literacy and well-being in children. Appl. Psychol. Health Well-Being 2023, 15, 1110–1129.
  8. Oparina, E.; Kaiser, C.; Gentile, N.; Tkatchenko, A.; Clark, A.E.; De Neve, J.-E.; D’ambrosio, C. Machine learning in the prediction of human wellbeing. Sci. Rep. 2025, 15, 1632.
  9. Fu, Q.; Li, L.; Li, Q.; Wang, J. The effects of physical activity on the mental health of typically developing children and adolescents: A systematic review and meta-analysis. BMC Public Health 2025, 25, 1514.
  10. Caspersen, C.J.; Powell, K.E.; Christenson, G.M. Physical activity, exercise, and physical fitness: Definitions and distinctions for health-related research. Public Health Rep. 1985, 100, 126.
  11. Andresen, S.; Wilmes, J.; Möller, R. Children’s Worlds National Report. Available online: https://isciweb.org/wp-content/uploads/2020/03/Germany-National-Report-Wave-3.pdf (accessed on 14 August 2025).
  12. González-Carrasco, M.; Casas, F.; Malo, S.; Oriol, X.; Figuer, C.; Boulahrouz, M.; Blasco, A. Children’s Worlds National Report. Available online: https://isciweb.org/wp-content/uploads/2022/08/Catalonia.pdf (accessed on 14 August 2025).
  13. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  14. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
  15. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
  16. Jin, H.; Chollet, F.; Song, Q.; Hu, X. AutoKeras: An AutoML library for deep learning. J. Mach. Learn. Res. 2023, 24, 1–6.
  17. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
  18. Huebner, E.S. Initial development of the student’s life satisfaction scale. Sch. Psychol. Int. 1991, 12, 231–240.
  19. Jaekel, J. The role of physical activity and fitness for children’s wellbeing and academic achievement. Pediatr. Res. 2024, 96, 1550–1551.
  20. Kreher, J.B.; Schwartz, J.B. Overtraining syndrome: A practical guide. Sports Health 2012, 4, 128–138.
  21. Rees, G. Children’s Views on Their Lives and Well-Being; Springer: Berlin/Heidelberg, Germany, 2017.
  22. Trost, S.G.; McIver, K.L.; Pate, R.R. Conducting accelerometer-based activity assessments in field-based research. Med. Sci. Sports Exerc. 2005, 37, S531.
Figure 1. Improvements in ΔR2 over OLS for different models in restricted and expanded datasets. This bar graph illustrates the improvements in ΔR2 (change in explained variance compared to OLS baseline) for Random Forest (red bars), Gradient Boosting (green bars), and LASSO (blue bars). Panel (A) shows results for the restricted dataset (core socio-emotional variables); Panel (B) shows results for the expanded dataset (all domains). For example, Gradient Boosting shows a +0.252 gain in Panel (A), indicating superior capture of non-linear relationships (as shown in Oparina et al., 2025 [8]).
Figure 2. Top five variables ranked by permutation importance for OLS and GB models in predicting subjective well-being (SWB). Permutation importance measures the decrease in model performance (drop in R2) when a specific variable’s values are randomly shuffled, indicating its contribution to the prediction. Blue bars represent OLS results, and green bars represent Gradient Boosting (GB) results. Positive values indicate associations with higher SWB. For example, “likemylife” shows the largest contribution (0.349 for OLS, 0.305 for GB), followed by other life satisfaction indicators. Although not in the top five, sports-related variables such as “frequencysportsexercise” (physical activity frequency, coded 0–5 from “never” to “daily”) demonstrated notable predictive relevance, supporting the role of physical activity in SWB.
Figure 3. Global importance of variables in predicting subjective well-being (SWB) using the SHAP beeswarm plot. Each point represents an individual observation plotted according to its SHAP value, which indicates the magnitude and direction of that feature’s impact on the model’s prediction (positive values increase predicted SWB; negative values decrease it). Color represents the feature value (red = high, blue = low). For example, high values of “frequencysportsexercise” (coded 0–5 from “never” to “daily”) are predominantly red and aligned with positive SHAP values, indicating a consistent positive association with SWB. This visualization allows interpretation of both the strength and direction of each predictor’s effect across the sample.
Figure 4. SHAP waterfall plot showing individual prediction decomposition for subjective well-being (SWB). This plot illustrates how specific features contribute to the predicted SWB score for a single child, starting from the model’s base value (average prediction across all samples) and adding or subtracting contributions from individual features. Pink bars indicate variables that increased the prediction, while blue bars indicate those that decreased it. For example, a high score for “frequencysportsexercise” and the presence of “haveequipsportshobbies” pushed the prediction upward, whereas high “feelingsad” had a negative impact. The magnitude of each bar represents the size of that variable’s contribution in points to the final predicted SWB value.
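The additive decomposition shown in a waterfall plot can be sketched for the simplest case, a linear model, where each contribution is the coefficient times the feature's deviation from its sample mean. The weights, intercept, and data below are invented for illustration and are not estimates from the study; only the feature names are taken from the caption.

```python
def linear_predict(weights, intercept, row):
    """Prediction of a linear model for one observation (dict of features)."""
    return intercept + sum(weights[k] * row[k] for k in weights)

def waterfall_steps(weights, intercept, row, background):
    """Base value plus (feature, contribution, running total) steps.

    For a linear model the contribution of feature k is
    weights[k] * (row[k] - mean_k), and the base value is the
    prediction at the feature means (equal to the mean prediction)."""
    means = {k: sum(b[k] for b in background) / len(background) for k in weights}
    base_value = linear_predict(weights, intercept, means)
    contributions = {k: weights[k] * (row[k] - means[k]) for k in weights}
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    steps, running = [], base_value
    for name, phi in ordered:
        running += phi
        steps.append((name, phi, running))
    return base_value, steps

# Made-up coefficients and data (assumptions, not results from the paper).
weights = {"frequencysportsexercise": 0.30,
           "haveequipsportshobbies": 0.20,
           "feelingsad": -0.40}
intercept = 7.0
background = [
    {"frequencysportsexercise": 1, "haveequipsportshobbies": 0, "feelingsad": 2},
    {"frequencysportsexercise": 3, "haveequipsportshobbies": 1, "feelingsad": 1},
    {"frequencysportsexercise": 5, "haveequipsportshobbies": 1, "feelingsad": 0},
    {"frequencysportsexercise": 2, "haveequipsportshobbies": 0, "feelingsad": 3},
]
child = {"frequencysportsexercise": 5, "haveequipsportshobbies": 1, "feelingsad": 3}

base_value, steps = waterfall_steps(weights, intercept, child, background)
print(f"base value: {base_value:.3f}")
for name, phi, running in steps:
    print(f"{name:>26}: {phi:+.3f} -> {running:.3f}")
```

The last running total equals the model's prediction for this child, which is the property a waterfall plot visualizes. For tree-based models the contributions are no longer a simple product, but the same additivity holds.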
Figure 5. Partial dependence plots of subjective well-being (SWB) on age and exercise frequency. The left panel shows predicted SWB as a function of age (6–14 years); the right panel shows predicted SWB as a function of exercise frequency (“frequencysportsexercise”, coded 0–5 from “never” to “daily”). Lines: OLS (solid blue), Random Forest (dashed red), and XGBoost (dashed green). Predicted SWB declines slightly with age and rises approximately linearly with exercise frequency, with no sign of satiation at the highest frequencies; the ML curves can additionally capture departures from linearity that OLS cannot.
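A partial dependence curve is obtained by fixing one feature to each grid value in every row and averaging the predictions. The sketch below applies this to two hypothetical models, one linear and one that saturates, to show how satiation would appear if it were present. The models, coefficients, and rows are assumptions for illustration, not the fitted models from the study.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value, overwrite column `feature` in every row
    and average the model's predictions."""
    curve = []
    for value in grid:
        preds = [predict(row[:feature] + [value] + row[feature + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve

def linear_model(row):
    """Hypothetical model: SWB rises steadily with exercise frequency."""
    age, freq = row
    return 8.0 - 0.05 * age + 0.20 * freq

def saturating_model(row):
    """Hypothetical model: no further gain beyond frequency level 3."""
    age, freq = row
    return 8.0 - 0.05 * age + 0.30 * min(freq, 3)

# Rows are [age, exercise frequency]; values are illustrative only.
X = [[6, 0], [8, 2], [10, 5], [12, 3], [14, 1]]
grid = [0, 1, 2, 3, 4, 5]  # frequency coded 0-5, as in the caption

pd_linear = partial_dependence(linear_model, X, 1, grid)
pd_saturating = partial_dependence(saturating_model, X, 1, grid)

for value, a, b in zip(grid, pd_linear, pd_saturating):
    print(f"freq={value}: linear={a:.2f}  saturating={b:.2f}")
```

A curve with constant steps between grid values, like `pd_linear`, corresponds to the "linear increase with no satiation" pattern; a plateau, like the one in `pd_saturating` from level 3 onward, is what satiation would look like.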
Table 1. Comparison of model performance across restricted and expanded datasets: R² values (95% CI), ΔR² over the OLS baseline, MAE, and RMSE. Note: “No leakage” excludes conceptually overlapping affective variables (e.g., “feelinghappy”).
Dataset | Model | R² (95% CI) | ΔR² | MAE | RMSE
Restricted | OLS | 0.252 (0.23–0.27) | - | 0.742 | 1.323
Restricted | XGBoost (no leakage) | 0.35 (0.33–0.37) | +0.098 | 0.758 | 1.52
Expanded | XGBoost | 0.478 (0.46–0.50) | +0.188 | 0.745 | 1.328
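For readers who want to reproduce the columns of Table 1 on their own predictions, the metrics can be computed as follows. This is a minimal sketch: the observed scores and the two prediction vectors are invented numbers, not data from the study.

```python
from math import sqrt

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative values only (assumption): observed scores and two models.
y_true = [8.0, 6.5, 9.0, 7.0, 5.5, 10.0]
baseline_pred = [7.5, 7.0, 8.0, 7.5, 6.5, 8.5]   # stands in for an OLS baseline
candidate_pred = [8.0, 6.5, 8.5, 7.0, 6.0, 9.5]  # stands in for an ML model

# Delta R^2 is the candidate's R^2 minus the baseline's R^2.
delta_r2 = r2(y_true, candidate_pred) - r2(y_true, baseline_pred)

print(f"baseline : R2={r2(y_true, baseline_pred):.3f} "
      f"MAE={mae(y_true, baseline_pred):.3f} RMSE={rmse(y_true, baseline_pred):.3f}")
print(f"candidate: R2={r2(y_true, candidate_pred):.3f} "
      f"MAE={mae(y_true, candidate_pred):.3f} RMSE={rmse(y_true, candidate_pred):.3f}")
print(f"delta R2 : {delta_r2:+.3f}")
```

Applying the same subtraction to the restricted rows reproduces the reported +0.098 (0.35 − 0.252). The +0.188 in the expanded row presumably refers to an OLS baseline fitted on the expanded dataset, which is not listed in this table.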