Prediction of Voice Therapy Outcomes Using Machine Learning Approaches and SHAP Analysis: A K-VRQOL-Based Analysis
Abstract
1. Introduction
- First, it presents a holistic analytical framework that elucidates the complex relationship between voice and quality of life beyond fragmented or univariate approaches.
- Second, it demonstrates the use of machine learning and explainable AI to capture nonlinear interactions and assess the relative importance of predictors in an interpretable way.
- Third, the findings provide a scientifically grounded foundation for the early identification of at-risk individuals, the development of preventive strategies, and the implementation of effective interventions aimed at improving voice-related quality of life.
2. Related Works
3. Materials and Methods
3.1. Database
3.2. K-VRQOL Questionnaires
3.3. Statistical Analysis
3.4. Machine Learning Models
3.5. Shapley Additive Explanations (SHAP)
4. Results
4.1. Descriptive Analysis
4.2. Multiple Regression Analysis
4.3. Performance Evaluation of Machine Learning Models
4.4. Interpretability Analysis of Machine Learning Model Predictions
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Number of samples | 63 female and 35 male voices |
Average age | 51 |
Types of voice disorders (numbers) | vocal fold polyp (31), vocal paresis (3), vocal nodule (19), mutational dysphonia (1), thyroid nodule (1), hoarseness (4), muscle tension dysphonia (7), sulcus vocalis (2), presbyphonia (8), dysphonia (10), vocal cord cyst (9), leukoplakia (2), thyroid cancer (2), vallecular cyst (1), vocal mass (2), and laryngopharyngeal reflux (1). |
Variables | gender, smoking status, alcohol status, voice user status, coffee status, comorbidity, surgery status, voice training status, number of training sessions, and acoustic parameters measured before and after voice therapy: F0 (Hz), jitter (%), shimmer (%), NHR (dB), SFF (Hz), and MPT (s). |
K-VRQOL domain | Items | Response scale |
---|---|---|
Emotional | 4, 5, 8, 10 | 1 to 5 |
Physical | 1, 2, 3, 6, 7, 9 | 1 to 5 |
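The domain assignment above can be turned into scores with a short sketch. It assumes the standard V-RQOL scoring rule of Hogikyan and Sethuraman, in which raw item sums are rescaled so that 100 is the best and 0 the worst score; this section does not spell out the rule used for the K-VRQOL, and the function names and sample responses below are hypothetical.

```python
from typing import Dict, Sequence

# Item-to-domain assignment taken from the questionnaire table.
EMOTIONAL_ITEMS = (4, 5, 8, 10)
PHYSICAL_ITEMS = (1, 2, 3, 6, 7, 9)
ALL_ITEMS = tuple(range(1, 11))


def standard_score(ratings: Sequence[int]) -> float:
    """Rescale a raw sum of 1-5 ratings to 0-100 (assumed V-RQOL rule)."""
    n_items = len(ratings)
    raw = sum(ratings)
    lowest, highest = n_items * 1, n_items * 5
    return 100.0 * (highest - raw) / (highest - lowest)


def score_kvrqol(responses: Dict[int, int]) -> Dict[str, float]:
    """Return total, physical, and emotional scores for one respondent."""
    if sorted(responses) != list(ALL_ITEMS):
        raise ValueError("responses must cover items 1 to 10 exactly once")
    if any(not 1 <= rating <= 5 for rating in responses.values()):
        raise ValueError("each rating must be between 1 and 5")
    return {
        "total": standard_score([responses[i] for i in ALL_ITEMS]),
        "physical": standard_score([responses[i] for i in PHYSICAL_ITEMS]),
        "emotional": standard_score([responses[i] for i in EMOTIONAL_ITEMS]),
    }


# Hypothetical respondent, not taken from the study data.
example = score_kvrqol({1: 2, 2: 3, 3: 1, 4: 2, 5: 1, 6: 4, 7: 2, 8: 1, 9: 3, 10: 2})
print(example)  # total 72.5, physical 62.5, emotional 87.5
```

Under this rule a therapy outcome such as a "scores_diff" variable would be the difference between two such scores measured before and after treatment; the direction of subtraction is not stated here and is left open.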
Dependent Variable | Independent Variable | B | Std. Error | Beta (β) | t | p-Value | VIF |
---|---|---|---|---|---|---|---|
Total scores_diff 1 | (constant) | −17.000 | 11.519 | | −1.476 | 0.144 | |
| gender | 16.214 | 5.202 | 0.347 | 3.117 | 0.003 * | 1.638 |
| smoking status | 9.397 | 7.292 | 0.142 | 1.288 | 0.201 | 1.613 |
| alcohol status | 0.491 | 4.691 | 0.011 | 0.105 | 0.917 | 1.411 |
| coffee status | 4.975 | 5.022 | 0.104 | 0.991 | 0.325 | 1.462 |
| voice user status | −2.877 | 4.478 | −0.064 | −0.642 | 0.522 | 1.319 |
| comorbidity | 10.743 | 4.716 | 0.223 | 2.278 | 0.025 * | 1.268 |
| surgery status | −6.746 | 4.632 | −0.168 | −1.456 | 0.149 | 1.758 |
| voice training status | 6.952 | 7.914 | 0.146 | 0.878 | 0.382 | 3.631 |
| number of voice trainings | −0.368 | 0.991 | −0.057 | −0.371 | 0.711 | 3.136 |
| F0_diff 1 | −0.075 | 0.062 | −0.148 | −1.215 | 0.228 | 1.966 |
| Jitter_diff 1 | −0.079 | 0.095 | −0.077 | −0.833 | 0.407 | 1.125 |
| Shimmer_diff 1 | −0.246 | 0.629 | −0.051 | −0.392 | 0.696 | 2.212 |
| NHR_diff 1 | −27.929 | 23.575 | −0.161 | −1.185 | 0.240 | 2.440 |
| SFF_diff 1 | −0.251 | 0.111 | −0.260 | −2.268 | 0.026 * | 1.740 |
| MPT_diff 1 | 0.436 | 0.534 | 0.081 | 0.817 | 0.416 | 1.312 |
Physical scores_diff 1 | (constant) | −30.154 | 11.094 | | −2.718 | 0.008 | |
| gender | 19.186 | 5.010 | 0.370 | 3.830 | <0.001 * | 1.638 |
| smoking status | 3.508 | 7.025 | 0.048 | 0.499 | 0.619 | 1.613 |
| alcohol status | 3.613 | 4.518 | 0.072 | 0.800 | 0.426 | 1.411 |
| coffee status | 9.011 | 4.836 | 0.170 | 1.863 | 0.066 | 1.462 |
| voice user status | 0.672 | 4.313 | 0.014 | 0.156 | 0.877 | 1.319 |
| comorbidity | 14.147 | 4.542 | 0.265 | 3.115 | 0.003 * | 1.268 |
| surgery status | −8.583 | 4.461 | −0.193 | −1.924 | 0.058 | 1.758 |
| voice training status | 8.368 | 7.622 | 0.158 | 1.098 | 0.275 | 3.631 |
| number of voice trainings | −0.139 | 0.955 | −0.019 | −0.146 | 0.884 | 3.136 |
| F0_diff 1 | −0.086 | 0.059 | −0.153 | −1.445 | 0.152 | 1.966 |
| Jitter_diff 1 | −0.245 | 0.092 | −0.214 | −2.668 | 0.009 * | 1.125 |
| Shimmer_diff 1 | −0.703 | 0.605 | −0.130 | −1.161 | 0.249 | 2.212 |
| NHR_diff 1 | −24.291 | 22.705 | −0.126 | −1.070 | 0.288 | 2.440 |
| SFF_diff 1 | −0.314 | 0.107 | −0.293 | −2.940 | 0.004 * | 1.740 |
| MPT_diff 1 | 1.069 | 0.514 | 0.180 | 2.081 | 0.041 * | 1.312 |
Emotional scores_diff 1 | (constant) | −11.069 | 13.413 | | −0.825 | 0.412 | |
| gender | 11.669 | 6.057 | 0.230 | 1.927 | 0.058 | 1.638 |
| smoking status | 17.063 | 8.493 | 0.238 | 2.009 | 0.048 * | 1.613 |
| alcohol status | 5.462 | 5.462 | 0.111 | 1.000 | 0.320 | 1.411 |
| coffee status | 3.055 | 5.847 | 0.059 | 0.522 | 0.603 | 1.462 |
| voice user status | −2.193 | 5.215 | −0.045 | −0.420 | 0.675 | 1.319 |
| comorbidity | 8.075 | 5.491 | 0.155 | 1.470 | 0.145 | 1.268 |
| surgery status | −7.600 | 5.394 | −0.175 | −1.409 | 0.163 | 1.758 |
| voice training status | 6.966 | 9.215 | 0.135 | 0.756 | 0.452 | 3.631 |
| number of voice trainings | 0.416 | 1.154 | 0.060 | 0.360 | 0.719 | 3.136 |
| F0_diff 1 | −0.049 | 0.072 | −0.090 | −0.684 | 0.496 | 1.966 |
| Jitter_diff 1 | −0.084 | 0.111 | −0.075 | −0.755 | 0.452 | 1.125 |
| Shimmer_diff 1 | 0.373 | 0.732 | 0.071 | 0.510 | 0.611 | 2.212 |
| NHR_diff 1 | −29.250 | 27.451 | −0.156 | −1.066 | 0.290 | 2.440 |
| SFF_diff 1 | −0.260 | 0.129 | −0.248 | −2.014 | 0.047 * | 1.740 |
| MPT_diff 1 | −0.307 | 0.621 | −0.053 | −0.494 | 0.622 | 1.312 |
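As a reading aid for the regression table, the following sketch recomputes two quantities from the reported columns: the t statistic, which in ordinary least squares is the coefficient divided by its standard error, and the share of variance each predictor shares with the other predictors, which follows from the definition VIF = 1 / (1 − R²). The four rows are copied from the total-score block of the table; the function names are hypothetical, and small deviations from the reported t values are expected because B and the standard error are rounded to three decimals.

```python
# (predictor, B, std_error, reported_t, VIF) copied from the total-score rows.
ROWS = [
    ("gender", 16.214, 5.202, 3.117, 1.638),
    ("comorbidity", 10.743, 4.716, 2.278, 1.268),
    ("voice training status", 6.952, 7.914, 0.878, 3.631),
    ("SFF_diff", -0.251, 0.111, -2.268, 1.740),
]


def t_statistic(coefficient: float, std_error: float) -> float:
    """t value of a regression coefficient: estimate over its standard error."""
    return coefficient / std_error


def implied_r_squared(vif: float) -> float:
    """R^2 of a predictor regressed on the other predictors, from VIF = 1/(1 - R^2)."""
    return 1.0 - 1.0 / vif


for name, b, se, reported_t, vif in ROWS:
    print(
        f"{name:<22} t = {t_statistic(b, se):6.3f} "
        f"(reported {reported_t:6.3f}), "
        f"R^2 with other predictors = {implied_r_squared(vif):.3f}"
    )
```

Read this way, the largest VIF in the table (3.631 for voice training status) corresponds to roughly 72% of that predictor's variance being shared with the remaining predictors.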
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate (RMSE) | p Value | Mean Absolute Error (MAE) |
---|---|---|---|---|---|---|
Total domain a | 0.615 b | 0.378 | 0.264 | 19.2824 | <0.001 b | 14.14 |
Physical domain c | 0.730 b | 0.533 | 0.447 | 18.57045 | <0.001 b | 13.62 |
Emotional domain d | 0.533 b | 0.284 | 0.153 | 22.45214 | 0.014 b | 16.47 |
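The adjusted R Square column can be related to the R Square column with the usual correction for the number of predictors. The sketch below assumes n = 98 observations (63 female plus 35 male voices in the database table) and p = 15 predictors (the rows listed under each dependent variable in the regression table); both counts are inferred from the tables rather than stated alongside this summary, and the function name is hypothetical.

```python
N_SAMPLES = 98      # assumption: 63 female + 35 male voices
N_PREDICTORS = 15   # assumption: predictors listed per dependent variable

# (model, reported R^2, reported adjusted R^2) copied from the summary table.
SUMMARY = [
    ("Total domain", 0.378, 0.264),
    ("Physical domain", 0.533, 0.447),
    ("Emotional domain", 0.284, 0.153),
]


def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


for model, r2, reported in SUMMARY:
    recomputed = adjusted_r_squared(r2, N_SAMPLES, N_PREDICTORS)
    print(f"{model:<17} recomputed = {recomputed:.3f}, reported = {reported:.3f}")
```

With these counts the recomputed values agree with the reported column to within rounding, which suggests that all 98 samples and all 15 predictors entered each model.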
random forest (RF) | n_estimators = 60, learning_rate = 0.4, max_depth = 1 |
gradient boosting (GB) | n_estimators = 800, min_samples_leaf = 3, max_features = 'sqrt' |
light gradient boosting machine (LightGBM) | objective = 'regression', n_estimators = 1000, learning_rate = 0.25, max_depth = 3, num_leaves = 5, subsample = 0.8, colsample_bytree = 0.6, reg_alpha = 0.5, reg_lambda = 1.0 |
extreme gradient boosting (XGBoost) | objective = 'reg:squarederror', n_estimators = 1000, max_depth = 2, learning_rate = 0.01, reg_alpha = 0.7, reg_lambda = 1 |
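A minimal sketch of how the first two rows of the hyperparameter table might be instantiated, assuming scikit-learn's `RandomForestRegressor` and `GradientBoostingRegressor` (the table does not name the library). The helper `split_params` and the fixed `random_state` are hypothetical additions; the helper keeps only the settings each estimator actually exposes and records the rest instead of failing. LightGBM and XGBoost are left out because they require separate packages.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Settings copied from the hyperparameter table; estimator classes are an assumption.
LISTED = {
    "RF": (RandomForestRegressor,
           {"n_estimators": 60, "learning_rate": 0.4, "max_depth": 1}),
    "GB": (GradientBoostingRegressor,
           {"n_estimators": 800, "min_samples_leaf": 3, "max_features": "sqrt"}),
}


def split_params(estimator_cls, params):
    """Separate listed settings into those the estimator accepts and those it does not."""
    accepted_names = estimator_cls().get_params().keys()
    accepted = {k: v for k, v in params.items() if k in accepted_names}
    rejected = sorted(k for k in params if k not in accepted_names)
    return accepted, rejected


models = {}
unsupported = {}
for name, (estimator_cls, params) in LISTED.items():
    accepted, rejected = split_params(estimator_cls, params)
    # random_state is fixed only to make the sketch reproducible.
    models[name] = estimator_cls(**accepted, random_state=0)
    unsupported[name] = rejected

print(unsupported)
```

In scikit-learn, `learning_rate` is a parameter of `GradientBoostingRegressor` but not of `RandomForestRegressor`, so under this assumption the helper reports it as unsupported for the RF row. The listed settings are therefore worth checking against the original configuration before reuse.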
Methods | Total Domain | Physical Domain | Emotional Domain |
---|---|---|---|
GB | 20.91% | 30.50% | 6.81% |
RF | 21.82% | 24.11% | 17.33% |
LightGBM | 32.54% | 30.20% | 15.45% |
XGBoost | 17.50% | 27.83% | 11.74% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, J.H.; Jung, A.R.; Lee, J.-N.; Lee, J.-Y. Prediction of Voice Therapy Outcomes Using Machine Learning Approaches and SHAP Analysis: A K-VRQOL-Based Analysis. Appl. Sci. 2025, 15, 7045. https://doi.org/10.3390/app15137045