Physical Health Portrait and Intervention Strategy of College Students Based on Multivariate Cluster Analysis and Machine Learning
Abstract
1. Introduction
2. Model Construction and Algorithm Enhancement
2.1. Univariate and Multivariate Analysis
2.2. Improved K-Means Algorithm
2.3. Validation and Evaluation
2.3.1. Logistic Regression Model
2.3.2. Decision Tree Model
2.3.3. Random Forest Model
2.3.4. XGBoost Model
3. Experimental Design
3.1. Dataset Description
3.2. Variable Analysis
3.2.1. Univariate Analysis
3.2.2. Multivariate Analysis
3.3. Cluster Analysis
3.3.1. Selection of the Optimal K Value
1. K-means Clustering and Determination of the Optimal K Value
2. Application of the Elbow Method
3. Introduction and Validation of the Calinski–Harabasz Index
4. Cluster Analysis for Male and Female Groups
5. Dimensionality Reduction and Accuracy of Results
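The outline names the tools used to choose K (the elbow method and the Calinski–Harabasz index) and mentions dimensionality reduction, but this excerpt does not describe what the "improved" K-means changes. The sketch below therefore uses standard scikit-learn `KMeans` on standardized, PCA-reduced data and, for each candidate K, records the within-cluster sum of squares (`inertia_`, the quantity plotted for the elbow method) and `calinski_harabasz_score` (higher is better). The synthetic three-group data, the helper name `scan_k`, and the choice of two principal components are assumptions for illustration, not the authors' settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import StandardScaler


def scan_k(X, k_values, n_components=2, random_state=0):
    """Scan candidate K values on standardized, PCA-reduced data (hypothetical helper).

    Returns two dicts keyed by K: the within-cluster sum of squares used by
    the elbow method, and the Calinski-Harabasz index (higher is better).
    """
    X_std = StandardScaler().fit_transform(X)  # put all item scores on one scale
    X_red = PCA(n_components=n_components, random_state=random_state).fit_transform(X_std)
    inertia, ch = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_red)
        inertia[k] = km.inertia_
        ch[k] = calinski_harabasz_score(X_red, km.labels_)
    return inertia, ch


# Synthetic stand-in for seven test-item scores with three separated groups (assumption).
centers = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [10, 10, 10, 10, 10, 10, 10],
    [10, 0, 10, 0, 10, 0, 10],
], dtype=float)
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=0)

inertia, ch = scan_k(X, range(2, 7))
best_k = max(ch, key=ch.get)  # K with the largest Calinski-Harabasz index
for k in inertia:
    print(f"K={k}: inertia={inertia[k]:.1f}, CH={ch[k]:.1f}")
print("Best K by Calinski-Harabasz:", best_k)
```

The two criteria complement each other: inertia always falls as K grows, so the elbow has to be read by eye, while the Calinski–Harabasz index gives a single maximum to pick.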
3.3.2. Analysis of Clustering Results
1. Male Group Classification
2. Female Group Classification
3. Conclusions
4. Validation of Clustering Results
4.1. Algorithm Validation
4.2. Intervention Strategy Customization
4.2.1. Male Group
4.2.2. Female Group
5. Conclusions
6. Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Test Subjects | Individual Indicators | Weight (%) |
|---|---|---|
| 2020 Students of Yunnan Agricultural University/Southwest Forestry University | Body Mass Index (BMI) | 15 |
| | Vital Capacity | 15 |
| | 50 m Sprint | 20 |
| | Sit-and-Reach | 10 |
| | Standing Long Jump | 10 |
| | Pull-ups (Male)/1 min Sit-ups (Female) | 10 |
| | 1000 m Run (Male)/800 m Run (Female) | 20 |
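The weights above sum to 100%, so an overall score can be sketched as a weighted sum of the seven item scores. As a check, the sketch applies the weights to the first (2020) and third (2021) records of the sample score table that follows, assuming the column headed "Weight" holds the BMI item score; under that reading both listed totals (85.4 and 79.1) are reproduced. Some other sample rows do not reproduce their listed totals with this simple formula, possibly because of adjustments not described in this excerpt. The helper name `total_score` and the dictionary keys are hypothetical.

```python
import math

# Item weights in percent, copied from the indicator table (they sum to 100).
WEIGHTS = {
    "bmi": 15,
    "vital_capacity": 15,
    "sprint_50m": 20,
    "sit_and_reach": 10,
    "standing_long_jump": 10,
    "pullups_or_situps": 10,  # pull-ups (male) / 1 min sit-ups (female)
    "endurance_run": 20,      # 1000 m run (male) / 800 m run (female)
}
assert sum(WEIGHTS.values()) == 100


def total_score(items):
    """Weighted sum of 0-100 item scores (hypothetical helper)."""
    missing = WEIGHTS.keys() - items.keys()
    if missing:
        raise KeyError(f"missing item scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * items[name] for name in WEIGHTS) / 100


# First and third sample records; "bmi" is read from the column headed "Weight" (assumption).
record_2020 = {"bmi": 100, "vital_capacity": 80, "sprint_50m": 90, "standing_long_jump": 90,
               "sit_and_reach": 90, "endurance_run": 78, "pullups_or_situps": 68}
record_2021 = {"bmi": 100, "vital_capacity": 78, "sprint_50m": 74, "standing_long_jump": 78,
               "sit_and_reach": 90, "endurance_run": 72, "pullups_or_situps": 64}

print(round(total_score(record_2020), 2))  # listed total in the sample table: 85.4
print(round(total_score(record_2021), 2))  # listed total in the sample table: 79.1
```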
Weight | BMI | Vital Capacity Score | 50 m Running Score | Standing Long Jump Score | Sit-and-Reach Score | 1000 m Running Score | Pull-up Score | Total Score | Year |
---|---|---|---|---|---|---|---|---|---|
100 | 23.80 | 80 | 90 | 90 | 90 | 78 | 68 | 85.4 | 2020 |
60 | 21.08 | 90 | 100 | 100 | 76 | 90 | 100 | 94.1 | 2020 |
100 | 19.72 | 78 | 74 | 78 | 90 | 72 | 64 | 79.1 | 2021 |
80 | 23.63 | 100 | 74 | 78 | 100 | 66 | 10 | 76.8 | 2021 |
100 | 19.84 | 85 | 85 | 100 | 70 | 10 | 76 | 77.95 | 2022 |
80 | 20.24 | 62 | 76 | 74 | 80 | 50 | 50 | 69.9 | 2022 |
100 | 22.49 | 76 | 70 | 95 | 78 | 64 | 60 | 77 | 2023 |
80 | 22.04 | 78 | 60 | 100 | 68 | 10 | 50 | 68 | 2023 |
… | … | … | … | … | … | … | … | … | … |
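The outline refers to univariate and multivariate analysis (Sections 2.1 and 3.2) without detail here. As one common form of each, and not necessarily the authors' procedure, the sketch computes per-item means and standard deviations and a Pearson correlation matrix over the eight records visible in the sample table (the rows elided as "…" are not available). Eight rows are far too few for inference; the numbers only illustrate the computation.

```python
import numpy as np

# Item scores of the eight sample records shown above.
# Column order follows the table: Weight, vital (lung) capacity, 50 m run,
# standing long jump, sit-and-reach, 1000 m run, pull-ups.
columns = ["Weight", "Vital capacity", "50 m run", "Long jump",
           "Sit-and-reach", "1000 m run", "Pull-ups"]
scores = np.array([
    [100, 80, 90, 90, 90, 78, 68],
    [60, 90, 100, 100, 76, 90, 100],
    [100, 78, 74, 78, 90, 72, 64],
    [80, 100, 74, 78, 100, 66, 10],
    [100, 85, 85, 100, 70, 10, 76],
    [80, 62, 76, 74, 80, 50, 50],
    [100, 76, 70, 95, 78, 64, 60],
    [80, 78, 60, 100, 68, 10, 50],
], dtype=float)

# Univariate summary: mean and sample standard deviation of each item.
means = scores.mean(axis=0)
stds = scores.std(axis=0, ddof=1)
for name, m, s in zip(columns, means, stds):
    print(f"{name:>15}: mean={m:6.2f}, sd={s:6.2f}")

# Multivariate view: Pearson correlations between items (columns are variables).
corr = np.corrcoef(scores, rowvar=False)

# Report the pair of distinct items with the largest absolute correlation.
off_diag = np.abs(corr - np.eye(len(columns)))
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"Strongest linear association: {columns[i]} vs {columns[j]} (r = {corr[i, j]:.2f})")
```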
Cluster | Cluster 1 | Cluster 2 | Cluster 3 |
---|---|---|---|
Number of Students | 4342 | 1930 | 4384 |
BMI Score | 97.34 | 95.82 | 94.42 |
Vital Capacity Score | 73.51 | 74.30 | 74.38 |
50 m Running Score | 79.19 | 70.30 | 74.00 |
Standing Long Jump Score | 75.30 | 78.40 | 61.00 |
Sit-and-Reach Score | 75.36 | 71.82 | 70.01 |
1000 m Running Score | 70.73 | 12.73 | 59.14 |
Pull-up Score | 68.75 | 53.71 | 7.99 |
Health Results | 77.17 | 65.30 | 63.00 |
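Ranking item means is one simple way to see where each cluster falls short, which is the kind of information a group-specific intervention plan (Section 4.2) would build on. The sketch below ranks each male cluster's test items by mean score, copied from the table above, and returns the two lowest. Treating raw means on different items as directly comparable is an assumption, and the helper name `weakest_items` is hypothetical.

```python
# Mean item scores per male cluster, copied from the table above.
ITEMS = ["BMI", "Vital capacity", "50 m run", "Standing long jump",
         "Sit-and-reach", "1000 m run", "Pull-ups"]
MALE_CLUSTERS = {
    "Cluster 1": [97.34, 73.51, 79.19, 75.30, 75.36, 70.73, 68.75],
    "Cluster 2": [95.82, 74.30, 70.30, 78.40, 71.82, 12.73, 53.71],
    "Cluster 3": [94.42, 74.38, 74.00, 61.00, 70.01, 59.14, 7.99],
}


def weakest_items(profile, n=2):
    """Return the n lowest-scoring test items of one cluster profile (hypothetical helper)."""
    ranked = sorted(zip(ITEMS, profile), key=lambda pair: pair[1])
    return [item for item, _ in ranked[:n]]


priorities = {name: weakest_items(scores) for name, scores in MALE_CLUSTERS.items()}
for name, items in priorities.items():
    print(name, "->", items)
```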
Cluster | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
---|---|---|---|---|
Number of Students | 4459 | 1771 | 4379 | 899 |
BMI Score | 97.76 | 97.55 | 98.95 | 95.06 |
Vital Capacity Score | 70.14 | 71.87 | 79.30 | 73.45 |
50 m Running Score | 67.15 | 68.87 | 74.61 | 55.97 |
Standing Long Jump Score | 67.50 | 74.51 | 83.70 | 62.93 |
Sit-and-Reach Score | 71.96 | 74.60 | 82.99 | 70.20 |
800 m Running Score | 67.87 | 67.51 | 74.73 | 20.75 |
Sit-up Score | 67.27 | 17.65 | 71.08 | 53.91 |
Health Results | 70.81 | 67.51 | 80.76 | 61.75 |
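Because the table gives both cluster sizes and cluster means, the cohort-wide mean of each item can be recovered as a size-weighted average of the cluster means, assuming each tabulated mean is a plain average over that cluster's members. The sketch below uses this to list, for each female cluster, the items on which it scores below the cohort mean. The helper names are hypothetical.

```python
ITEMS = ["BMI", "Vital capacity", "50 m run", "Standing long jump",
         "Sit-and-reach", "800 m run", "Sit-ups"]
# (cluster size, mean item scores) per female cluster, copied from the table above.
FEMALE_CLUSTERS = {
    "Cluster 1": (4459, [97.76, 70.14, 67.15, 67.50, 71.96, 67.87, 67.27]),
    "Cluster 2": (1771, [97.55, 71.87, 68.87, 74.51, 74.60, 67.51, 17.65]),
    "Cluster 3": (4379, [98.95, 79.30, 74.61, 83.70, 82.99, 74.73, 71.08]),
    "Cluster 4": (899, [95.06, 73.45, 55.97, 62.93, 70.20, 20.75, 53.91]),
}


def cohort_means(clusters):
    """Size-weighted mean of each item across clusters (hypothetical helper)."""
    total = sum(size for size, _ in clusters.values())
    return [
        sum(size * scores[i] for size, scores in clusters.values()) / total
        for i in range(len(ITEMS))
    ]


def below_average(clusters):
    """Items on which each cluster's mean falls below the cohort mean."""
    overall = cohort_means(clusters)
    return {
        name: [item for item, score, mean in zip(ITEMS, scores, overall) if score < mean]
        for name, (_, scores) in clusters.items()
    }


overall = cohort_means(FEMALE_CLUSTERS)
gaps = below_average(FEMALE_CLUSTERS)
for item, mean in zip(ITEMS, overall):
    print(f"{item:>18}: cohort mean {mean:.2f}")
for name, items in gaps.items():
    print(name, "below cohort mean on:", items)
```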
Metric | Gradient Boosting | Random Forest | Decision Tree | Logistic Regression |
---|---|---|---|---|
Accuracy | 0.985929 | 0.979362 | 0.970919 | 0.996717 |
Precision | 0.985974 | 0.979385 | 0.970958 | 0.996720 |
Recall | 0.985929 | 0.979362 | 0.970919 | 0.996717 |
F1 Score | 0.985889 | 0.979315 | 0.970880 | 0.996717 |
Metric | Gradient Boosting | Random Forest | Decision Tree | Logistic Regression |
---|---|---|---|---|
Accuracy | 0.970895 | 0.963510 | 0.929626 | 0.994787 |
Precision | 0.970908 | 0.963591 | 0.929680 | 0.994784 |
Recall | 0.970895 | 0.963510 | 0.929626 | 0.994787 |
F1 Score | 0.970883 | 0.963462 | 0.929634 | 0.994784 |
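Section 4.1 validates the clustering by training classifiers to reproduce the cluster labels, and the two tables above report accuracy, precision, recall, and F1 for four models. The sketch below shows one way to produce such a table with scikit-learn on synthetic data; the data, the 70/30 split, and all hyperparameters are assumptions. It uses scikit-learn's `GradientBoostingClassifier` to match the table header; the XGBoost model named in Section 2.3.4 comes from the separate `xgboost` package. Support-weighted averaging is also an assumption, but it is consistent with the tables: weighted recall sums each class's true positives and divides by the total number of test samples, so it always equals accuracy, which would explain why the Accuracy and Recall rows coincide.

```python
import math

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic seven-item data with three groups; K-means labels act as the targets.
centers = np.array([[0] * 7, [10] * 7, [10, 0, 10, 0, 10, 0, 10]], dtype=float)
X, _ = make_blobs(n_samples=900, centers=centers, cluster_std=1.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=0
)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        # Support-weighted averages over the cluster labels.
        "Precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "Recall": recall_score(y_test, y_pred, average="weighted"),
        "F1 Score": f1_score(y_test, y_pred, average="weighted"),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 6) for k, v in metrics.items()})
    # Weighted recall reduces to accuracy, up to floating-point rounding.
    assert math.isclose(metrics["Recall"], metrics["Accuracy"], rel_tol=1e-9)
```

High scores in this kind of check show that the cluster labels are learnable from the item scores; they do not by themselves show that the chosen K is the right one.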
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, R.; Dong, R.; Lu, N.; Yu, L.; Chen, C.; Che, Y.; Zhang, J.; Yang, J. Physical Health Portrait and Intervention Strategy of College Students Based on Multivariate Cluster Analysis and Machine Learning. Appl. Sci. 2025, 15, 4940. https://doi.org/10.3390/app15094940
Chicago/Turabian Style: Guo, Rong, Rou Dong, Ni Lu, Lin Yu, Chaoxian Chen, Yonglin Che, Jiajin Zhang, and Jianke Yang. 2025. "Physical Health Portrait and Intervention Strategy of College Students Based on Multivariate Cluster Analysis and Machine Learning" Applied Sciences 15, no. 9: 4940. https://doi.org/10.3390/app15094940