Using Boosted Machine Learning to Predict Suicidal Ideation by Socioeconomic Status among Adolescents
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Population
2.2. Measures
2.2.1. Demographic Variables
2.2.2. Suicidal Ideation
2.2.3. Mental Health-Related Variables
2.2.4. Health-Related Behavior
2.3. Data Processing and Machine Learning
3. Results
3.1. General Characteristics of the Suicidal Ideation
3.2. XGBoost Models by Socioeconomic Status and Prediction of Suicidal Ideation
3.3. Decision Tree of Suicidal Ideation by XGBoost
3.4. Decision Tree of Suicidal Ideation by XGBoost
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest



| Variable | Total, N | Total, % | No Suicidal Ideation, N | No Suicidal Ideation, % | Suicidal Ideation, N | Suicidal Ideation, % | p Value |
|---|---|---|---|---|---|---|---|
| Gender | | | | | | | <0.001 |
| Male | 28,353 | 51.6 | 26,099 | 53.3 | 2254 | 37.7 | |
| Female | 26,595 | 48.4 | 22,870 | 46.7 | 3725 | 62.3 | |
| Sadness or hopelessness over 2 weeks | | | | | | | <0.001 |
| No | 41,108 | 74.8 | 39,468 | 80.6 | 1640 | 27.4 | |
| Yes | 13,840 | 25.2 | 9501 | 19.4 | 4339 | 72.6 | |
| Perceived stress level in daily life | | | | | | | <0.001 |
| Extremely | 4603 | 8.4 | 2785 | 5.7 | 1818 | 30.4 | |
| Stressful | 14,059 | 25.6 | 11,423 | 23.3 | 2636 | 44.1 | |
| Moderately | 24,379 | 44.4 | 23,055 | 47.1 | 1324 | 22.1 | |
| Minimally | 9889 | 18.0 | 9734 | 19.9 | 155 | 2.6 | |
| Not at all | 2018 | 3.7 | 1972 | 4.0 | 46 | 0.8 | |
| Feeling of happiness | | | | | | | <0.001 |
| Very happy | 15,111 | 27.5 | 14,666 | 29.9 | 445 | 7.4 | |
| A little happy | 20,064 | 36.5 | 18,785 | 38.4 | 1279 | 21.4 | |
| Normal | 14,960 | 27.2 | 12,880 | 26.3 | 2080 | 34.8 | |
| A little unhappy | 4070 | 7.4 | 2377 | 4.9 | 1693 | 28.3 | |
| Very unhappy | 743 | 1.4 | 261 | 0.5 | 482 | 8.1 | |
| Violence victimization | | | | | | | <0.001 |
| No | 54,229 | 98.7 | 48,530 | 99.1 | 5699 | 95.3 | |
| Yes | 719 | 1.3 | 439 | 0.9 | 280 | 4.7 | |
| GAD-7 score | 3.91 ± 4.37 | | 3.30 ± 3.78 | | 8.84 ± 5.60 | | <0.001 |
| Subjective health status | | | | | | | <0.001 |
| Very good | 15,150 | 27.6 | 14,244 | 29.1 | 906 | 15.2 | |
| Good | 23,294 | 42.4 | 21,151 | 43.2 | 2143 | 35.8 | |
| Fair | 12,342 | 22.5 | 10,543 | 21.5 | 1799 | 30.1 | |
| Poor | 3891 | 7.1 | 2876 | 5.9 | 1015 | 17.0 | |
| Very poor | 271 | 0.5 | 155 | 0.3 | 116 | 1.9 | |
| Alcohol consumption (month) | | | | | | | <0.001 |
| None | 49,056 | 89.3 | 44,247 | 90.4 | 4809 | 80.4 | |
| 2 days | 3495 | 6.4 | 2863 | 5.8 | 632 | 10.6 | |
| 3~4 days | 1059 | 1.9 | 849 | 1.7 | 210 | 3.5 | |
| 6 days or more | 1338 | 2.4 | 1010 | 2.1 | 328 | 5.5 | |
| Smoking (month) | | | | | | | <0.001 |
| Non-smoker | 52,478 | 95.5 | 47,046 | 96.1 | 5432 | 90.9 | |
| 1~9 days | 1168 | 2.1 | 898 | 1.8 | 270 | 4.5 | |
| 10 days or more | 1302 | 2.4 | 1025 | 2.1 | 277 | 4.6 | |
| Sexual experience | | | | | | | <0.001 |
| No | 52,461 | 95.5 | 47,050 | 96.1 | 5411 | 90.5 | |
| Yes | 2487 | 4.5 | 1919 | 3.9 | 568 | 9.5 | |
| Drug abuse | | | | | | | <0.001 |
| No | 54,543 | 99.3 | 48,738 | 99.5 | 5805 | 97.1 | |
| Yes | 405 | 0.7 | 231 | 0.5 | 174 | 2.9 | |
| Academic performance | | | | | | | <0.001 |
| High | 6736 | 12.3 | 6081 | 12.4 | 655 | 11.0 | |
| Medium high | 13,410 | 24.4 | 12,123 | 24.8 | 1287 | 21.5 | |
| Medium | 16,585 | 30.2 | 15,034 | 30.7 | 1551 | 25.9 | |
| Medium low | 12,684 | 23.1 | 11,150 | 22.8 | 1534 | 25.7 | |
| Low | 5533 | 10.1 | 4581 | 9.4 | 952 | 15.9 | |
| Socioeconomic status | | | | | | | <0.001 |
| High | 6039 | 11.0 | 5518 | 11.3 | 521 | 8.7 | |
| Medium | 47,634 | 86.7 | 42,486 | 86.8 | 5148 | 86.1 | |
| Low | 1275 | 2.3 | 965 | 2.0 | 310 | 5.2 | |

| Machine Learning Method | Data | SES Group | Sensitivity (%) | Specificity (%) | Accuracy (%) | Positive Predictive Value (%) | Negative Predictive Value (%) | F1 Score | AUC |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost model | Test data | High SES | 57.9 | 81.6 | 79.4 | 24.3 | 95.0 | 0.343 | 0.773 |
| XGBoost model | Test data | Middle SES | 76.5 | 78.0 | 77.8 | 29.9 | 96.4 | 0.430 | 0.846 |
| XGBoost model | Test data | Low SES | 60.7 | 80.4 | 74.9 | 54.5 | 84.1 | 0.575 | 0.781 |
| XGBoost model | Training data | High SES | 69.8 | 80.4 | 79.4 | 26.1 | 96.4 | 0.380 | 0.835 |
| XGBoost model | Training data | Middle SES | 77.3 | 78.3 | 78.2 | 30.6 | 96.5 | 0.439 | 0.857 |
| XGBoost model | Training data | Low SES | 74.9 | 80.0 | 78.8 | 54.4 | 90.9 | 0.630 | 0.871 |
| Random Forest | Test data | High SES | 35.6 | 87.7 | 83.0 | 22.1 | 93.3 | 0.273 | 0.767 |
| Random Forest | Test data | Middle SES | 52.0 | 84.9 | 81.4 | 29.3 | 93.6 | 0.375 | 0.794 |
| Random Forest | Test data | Low SES | 56.3 | 80.8 | 74.6 | 49.5 | 84.6 | 0.526 | 0.762 |

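
As a consistency check on the table above, the F1 score is the harmonic mean of positive predictive value (precision) and sensitivity (recall), so it can be recomputed from the reported percentages. The following is a minimal sketch for the XGBoost test rows. The function name `f1_from_ppv_sensitivity` is illustrative and does not come from the paper, and differences in the third decimal are expected because the inputs are rounded to one decimal place.

```python
def f1_from_ppv_sensitivity(ppv_pct: float, sensitivity_pct: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity); inputs in percent."""
    p = ppv_pct / 100.0
    r = sensitivity_pct / 100.0
    if p + r == 0:
        return 0.0  # F1 is conventionally 0 when both precision and recall are 0
    return 2 * p * r / (p + r)


# (sensitivity %, PPV %, reported F1) copied from the XGBoost test rows above
xgb_test = {
    "High SES": (57.9, 24.3, 0.343),
    "Middle SES": (76.5, 29.9, 0.430),
    "Low SES": (60.7, 54.5, 0.575),
}

for group, (sens, ppv, reported) in xgb_test.items():
    f1 = f1_from_ppv_sensitivity(ppv, sens)
    print(f"{group}: recomputed F1 = {f1:.3f}, reported = {reported:.3f}")
```

The recomputed values agree with the reported F1 scores to within rounding, which also makes clear why F1 stays low in the high and middle SES groups: sensitivity is moderate to high, but PPV is below 30%.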
| Variable | Total, N | Total, % | High SES, N | High SES, % | Medium SES, N | Medium SES, % | Low SES, N | Low SES, % | p Value |
|---|---|---|---|---|---|---|---|---|---|
| Suicidal ideation | 5979 | 10.9 | 521 | 8.6 | 5148 | 10.8 | 310 | 24.3 | |
| Gender | | | | | | | | | <0.001 |
| Male | 28,353 | 51.6 | 3536 | 58.6 | 24,095 | 50.6 | 722 | 56.6 | |
| Female | 26,595 | 48.4 | 2503 | 41.4 | 23,539 | 49.4 | 553 | 43.4 | |
| Sadness | | | | | | | | | <0.001 |
| No | 41,108 | 74.8 | 4689 | 77.6 | 35,701 | 74.9 | 718 | 56.3 | |
| Yes | 13,840 | 25.2 | 1350 | 22.4 | 11,933 | 25.1 | 557 | 43.7 | |
| Perceived stress | | | | | | | | | <0.001 |
| Extremely | 4603 | 8.4 | 515 | 8.3 | 3804 | 8.0 | 284 | 22.3 | |
| Stressful | 14,059 | 25.6 | 1191 | 19.7 | 12,470 | 26.2 | 398 | 31.2 | |
| Moderately | 24,379 | 44.4 | 2429 | 40.2 | 21,512 | 45.2 | 438 | 34.4 | |
| Minimally | 9889 | 18.0 | 1364 | 22.6 | 8404 | 17.6 | 121 | 9.5 | |
| Not at all | 2018 | 3.7 | 540 | 8.9 | 1444 | 3.0 | 34 | 2.7 | |
| Feeling of happiness | | | | | | | | | <0.001 |
| Very happy | 15,111 | 27.5 | 2816 | 46.6 | 12,081 | 25.4 | 214 | 16.8 | |
| A little happy | 20,064 | 36.5 | 1785 | 29.6 | 17,975 | 37.7 | 304 | 23.8 | |
| Normal | 14,960 | 27.2 | 1101 | 18.2 | 13,426 | 28.2 | 433 | 34.0 | |
| A little unhappy | 4070 | 7.4 | 263 | 4.4 | 3577 | 7.5 | 230 | 18.0 | |
| Very unhappy | 743 | 1.4 | 74 | 1.2 | 575 | 1.2 | 94 | 7.4 | |
| Violent victimization | | | | | | | | | <0.001 |
| No | 54,229 | 98.7 | 5903 | 97.7 | 47,103 | 98.9 | 1223 | 95.9 | |
| Yes | 719 | 1.3 | 136 | 2.3 | 531 | 1.1 | 52 | 4.1 | |
| GAD-7 score | 3.91 ± 4.37 | | 3.12 ± 4.30 | | 3.95 ± 4.31 | | 6.02 ± 5.86 | | <0.001 |
| Subjective health status | | | | | | | | | <0.001 |
| Very good | 15,150 | 27.6 | 2711 | 44.9 | 12,130 | 25.5 | 309 | 24.2 | |
| Good | 23,294 | 42.4 | 2205 | 36.5 | 20,705 | 43.5 | 384 | 30.1 | |
| Fair | 12,342 | 22.5 | 848 | 14.0 | 11,125 | 23.4 | 369 | 28.9 | |
| Poor | 3891 | 7.1 | 234 | 3.9 | 3478 | 7.3 | 179 | 14.0 | |
| Very poor | 271 | 0.5 | 41 | 0.7 | 196 | 0.4 | 34 | 2.7 | |
| Alcohol consumption (month) | | | | | | | | | <0.001 |
| No drinker | 49,056 | 89.3 | 5431 | 89.9 | 42,597 | 89.4 | 1028 | 80.6 | |
| 2 days | 3495 | 6.4 | 307 | 5.1 | 3076 | 6.5 | 112 | 8.8 | |
| 3~4 days | 1059 | 1.9 | 100 | 1.7 | 913 | 1.9 | 46 | 3.6 | |
| 6 days or more | 1338 | 2.4 | 201 | 3.3 | 1048 | 2.2 | 89 | 7.0 | |
| Smoking (month) | | | | | | | | | <0.001 |
| Non-smoker | 52,478 | 95.5 | 5737 | 95.0 | 45,605 | 95.7 | 1135 | 89.0 | |
| 1~9 days | 1168 | 2.1 | 123 | 2.0 | 992 | 2.1 | 53 | 4.2 | |
| 10 days or more | 1302 | 2.4 | 179 | 3.0 | 1036 | 2.2 | 87 | 6.8 | |
| Sexual experience | | | | | | | | | <0.001 |
| No | 52,461 | 95.5 | 5668 | 93.9 | 45,659 | 95.9 | 1134 | 88.9 | |
| Yes | 2487 | 4.5 | 371 | 6.1 | 1975 | 4.1 | 141 | 11.1 | |
| Drug abuse | | | | | | | | | <0.001 |
| No | 54,543 | 99.3 | 5978 | 99.0 | 47,320 | 99.3 | 1245 | 97.6 | |
| Yes | 405 | 0.7 | 61 | 1.0 | 314 | 0.7 | 30 | 2.4 | |
| Academic performance | | | | | | | | | <0.001 |
| High | 6736 | 12.3 | 1907 | 31.6 | 4739 | 9.9 | 90 | 7.1 | |
| Medium high | 13,410 | 24.4 | 1608 | 26.6 | 11,664 | 24.5 | 138 | 10.8 | |
| Medium | 16,585 | 30.2 | 1247 | 20.6 | 15,129 | 31.8 | 209 | 16.4 | |
| Medium low | 12,684 | 23.1 | 816 | 13.5 | 11,502 | 24.1 | 366 | 28.7 | |
| Low | 5533 | 10.1 | 461 | 7.6 | 4600 | 9.7 | 472 | 37.0 | |

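
The suicidal ideation percentages in the table above are prevalences within each socioeconomic group, which differ from each group's share of all adolescents reporting suicidal ideation (the "Yes" column of the earlier characteristics table). The short sketch below recomputes both quantities from the counts in the table. Group sizes are taken as male N plus female N, and the variable and key names are illustrative rather than taken from the paper.

```python
# Counts copied from the table above: (male N, female N, suicidal ideation N) per SES group
groups = {
    "High": (3536, 2503, 521),
    "Medium": (24095, 23539, 5148),
    "Low": (722, 553, 310),
}

# Total number of adolescents reporting suicidal ideation across all groups
total_ideation = sum(ideation for _, _, ideation in groups.values())


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, matching the table's precision."""
    return round(100.0 * numerator / denominator, 1)


summary = {}
for name, (male, female, ideation) in groups.items():
    group_n = male + female  # group size derived from the gender rows
    summary[name] = {
        "prevalence_within_group": pct(ideation, group_n),
        "share_of_all_ideation": pct(ideation, total_ideation),
    }

for name, row in summary.items():
    print(name, row)
```

Read this way, the low SES group accounts for only a small share of all adolescents with suicidal ideation, yet it has by far the highest within-group prevalence.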
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, H.; Lee, K. Using Boosted Machine Learning to Predict Suicidal Ideation by Socioeconomic Status among Adolescents. J. Pers. Med. 2022, 12, 1357. https://doi.org/10.3390/jpm12091357