Explainable Machine Learning Models for Predicting FEV1 in Non-Smoking Taiwanese Men Aged 45–55 Years
Abstract
1. Introduction
2. Materials and Methods
2.1. Patient Selection
- Men aged 45–55 years;
- Never smoked;
- Not taking medications known to affect pulmonary function;
- Not taking medications for metabolic syndrome;
- No major systemic or chronic diseases.
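The inclusion criteria above amount to a row-level filter over the screening records. A minimal sketch follows, assuming one record per participant; the `Participant` class, its field names, and the inclusive reading of the 45–55 age range are hypothetical and not taken from the study's data dictionary.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; the real dataset's columns are not given here.
    sex: str                          # "M" or "F"
    age: int                          # age in years at examination
    ever_smoked: bool
    on_pulmonary_affecting_meds: bool
    on_metabolic_syndrome_meds: bool
    has_major_chronic_disease: bool


def is_eligible(p: Participant) -> bool:
    """Apply the five inclusion criteria listed in Section 2.1."""
    return (
        p.sex == "M"
        and 45 <= p.age <= 55         # assumes both bounds are inclusive
        and not p.ever_smoked
        and not p.on_pulmonary_affecting_meds
        and not p.on_metabolic_syndrome_meds
        and not p.has_major_chronic_disease
    )


def select_cohort(participants):
    """Return only the participants who satisfy every criterion."""
    return [p for p in participants if is_eligible(p)]
```

Keeping each criterion as a separate boolean makes it straightforward to report how many participants each exclusion removes.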
2.2. Measurements of Anthropometry and Biochemistry
2.3. Measurement of FEV1
2.4. Traditional Statistical Analysis
2.5. Description of the Study Dataset
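- Demographic: body weight (BW), chest circumference (CC), systolic blood pressure (SBP), diastolic blood pressure (DBP), and education level.
- Biochemical: leukocyte count, hemoglobin, platelet count, total bilirubin, total protein, albumin, aspartate aminotransferase (AST), alanine aminotransferase (ALT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), creatinine, uric acid, triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), thyroid-stimulating hormone (TSH), and C-reactive protein (CRP).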
- Lifestyle: drinking habits and sport area.
2.6. Data Preprocessing, Model Validation and Sensitivity Analyses
2.7. Proposed Machine Learning Scheme
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest






| Variable | Mean ± SD or n (%) |
|---|---|
| Body weight (kg) | 70.8 ± 10.2 |
| Chest circumference (cm) | 93.6 ± 6.4 |
| Leukocyte count (×10³/µL) | 6.0 ± 1.6 |
| Hemoglobin (g/dL) | 15.2 ± 1.1 |
| Platelet count (×10³/µL) | 228.9 ± 50.9 |
| Total bilirubin (mg/dL) | 1.0 ± 0.4 |
| Total protein (g/dL) | 7.5 ± 0.4 |
| Albumin (g/dL) | 4.5 ± 0.2 |
| Aspartate aminotransferase (IU/L) | 26.7 ± 16.9 |
| Alanine aminotransferase (IU/L) | 35.3 ± 29.9 |
| Gamma-glutamyltransferase (IU/L) | 35.3 ± 46.1 |
| Lactate dehydrogenase (IU/L) | 207.1 ± 79.7 |
| Creatinine (mg/dL) | 1.1 ± 0.3 |
| Uric acid (mg/dL) | 6.7 ± 1.4 |
| Triglyceride (mg/dL) | 145.4 ± 112.4 |
| High density lipoprotein cholesterol (mg/dL) | 50.7 ± 12.3 |
| Low density lipoprotein cholesterol (mg/dL) | 126.2 ± 32.8 |
| Thyroid stimulating hormone (µIU/mL) | 1.6 ± 1.6 |
| C-reactive protein (mg/dL) | 0.2 ± 0.5 |
| Systolic blood pressure (mmHg) | 124.3 ± 16.9 |
| Diastolic blood pressure (mmHg) | 79.4 ± 11.4 |
| Sport (hours/week) | 9.0 ± 9.1 |
| Education | |
| Illiterate | 72 (0.3%) |
| Primary school | 1797 (7.7%) |
| Junior high school | 1713 (7.3%) |
| Senior high school | 4851 (20.8%) |
| College | 5131 (22.0%) |
| University | 5819 (25.0%) |
| Higher than master’s degree | 3937 (16.9%) |
| Drinking | |
| Non-drinker | 17,906 (78.3%) |
| Drinker | 4977 (21.7%) |
| Metrics | Description | Calculation |
|---|---|---|
| SMAPE | Symmetric Mean Absolute Percentage Error | |
| RAE | Relative Absolute Error | |
| RRSE | Root Relative Squared Error | |
| RMSE | Root Mean Squared Error | |
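The Calculation column of the metrics table is empty in this copy, so the formulas below are an assumption: they follow commonly used definitions of the four metrics, not necessarily the exact variants the authors applied (SMAPE in particular has several forms). The function names are illustrative.

```python
import math


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in the 0..2 form.

    Assumes no pair has actual == predicted == 0 (FEV1 values are positive).
    """
    terms = [2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return sum(terms) / len(terms)


def rae(actual, predicted):
    """Relative absolute error: total |error| relative to a mean-only predictor."""
    mean_a = sum(actual) / len(actual)
    num = sum(abs(a - p) for a, p in zip(actual, predicted))
    den = sum(abs(a - mean_a) for a in actual)
    return num / den


def rrse(actual, predicted):
    """Root relative squared error: squared error relative to a mean-only predictor."""
    mean_a = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(num / den)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (litres for FEV1)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Under these definitions, RAE and RRSE below 1 indicate a model that beats predicting the sample mean for everyone, while RMSE stays in the units of the outcome.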
| BW | CC | WBC | Hb | PLT | TB | TP | Albumin | AST | ALT | γ-GT | LDH | Cr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.156 *** | 0.091 *** | −0.161 *** | −0.005 | −0.089 *** | 0.146 *** | −0.096 *** | −0.073 *** | −0.052 *** | −0.048 *** | −0.059 *** | −0.337 *** | −0.015 * |

| UA | TG | HDL-C | LDL-C | TSH | CRP | SBP | DBP | Sport | Education | Drinking |
|---|---|---|---|---|---|---|---|---|---|---|
| −0.095 *** | −0.086 *** | 0.136 *** | −0.023 *** | −0.018 *** | −0.070 *** | −0.150 *** | −0.042 *** | 0.035 *** | 0.298 *** | −0.057 *** |
| Model | SMAPE | RAE | RRSE |
|---|---|---|---|
| LR | 0.1460 | 0.1401 | 0.8762 |
| RF | 0.1446 | 0.1381 | 0.8662 |
| SGB | 0.1429 | 0.1370 | 0.8582 |
| XGBoost | 0.1426 | 0.1369 | 0.8576 |
| Model | RMSE (Mean ± SD) | 95% CI (Lower) | 95% CI (Upper) |
|---|---|---|---|
| MLR | 0.5344 ± 0.0082 | 0.5252 | 0.5437 |
| RF | 0.5321 ± 0.0077 | 0.5233 | 0.5408 |
| SGB | 0.5258 ± 0.0073 | 0.5176 | 0.5341 |
| XGBoost | 0.5252 ± 0.0074 | 0.5168 | 0.5335 |

| Comparison | p-value |
|---|---|
| RF vs. MLR | 0.0039 |
| SGB vs. MLR | 0.0019 |
| XGBoost vs. MLR | 0.0019 |
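The comparison rows report p-values for each machine learning model against MLR, but this copy does not say which test produced them. As one illustration only, a paired comparison of per-fold RMSE values can be run with an exact two-sided Wilcoxon signed-rank test. The sketch below assumes that choice; the function names are hypothetical, and the number of folds and their values are not taken from the study.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_exact_p(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples a, b.

    Enumerates all 2**n sign patterns, so it is only practical for small n
    (for example, a handful of cross-validation folds).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    half = sum(ranks) / 2                              # null mean of W+
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    deviation = abs(observed - half)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w_plus - half) >= deviation - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)
```

With n paired folds, the smallest p-value this test can return is 2 / 2**n, which is worth keeping in mind when interpreting very small reported values from few resamples.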
| Variables | LR | RF | SGB | XGBoost | MORI |
|---|---|---|---|---|---|
| Body weight (kg) | 55.2 | 42.1 | 41.2 | 30.7 | 38.0 |
| Chest circumference (cm) | 0.0 | 16.0 | 5.1 | 3.8 | 8.3 |
| Leukocyte count (×10³/µL) | 44.6 | 34.3 | 18.5 | 16.3 | 23.0 |
| Hemoglobin (g/dL) | 10.0 | 17.4 | 1.7 | 1.8 | 7.0 |
| Platelet count (×10³/µL) | 8.3 | 22.2 | 2.5 | 2.4 | 9.0 |
| Total bilirubin (mg/dL) | 45.4 | 32.3 | 18.9 | 14.0 | 21.7 |
| Total protein (g/dL) | 2.4 | 11.8 | 0.4 | 0.0 | 4.1 |
| Albumin (g/dL) | 1.1 | 7.6 | 0.1 | 0.0 | 2.6 |
| Aspartate aminotransferase (IU/L) | 18.7 | 13.3 | 0.0 | 0.7 | 4.7 |
| Alanine aminotransferase (IU/L) | 18.9 | 16.5 | 0.1 | 0.9 | 5.8 |
| Gamma-glutamyltransferase (IU/L) | 21.8 | 22.0 | 9.5 | 7.5 | 13.0 |
| Lactate dehydrogenase (IU/L) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Creatinine (mg/dL) | 8.7 | 19.7 | 8.5 | 8.4 | 12.2 |
| Uric acid (mg/dL) | 19.8 | 19.0 | 3.5 | 2.9 | 8.5 |
| Triglyceride (mg/dL) | 25.6 | 23.0 | 3.2 | 1.9 | 9.4 |
| High density lipoprotein cholesterol (mg/dL) | 52.5 | 23.5 | 12.1 | 10.5 | 15.4 |
| Low density lipoprotein cholesterol (mg/dL) | 29.9 | 17.3 | 2.6 | 0.9 | 6.9 |
| Thyroid stimulating hormone (µIU/mL) | 5.2 | 22.4 | 1.0 | 2.2 | 8.5 |
| C-reactive protein (mg/dL) | 11.8 | 5.1 | 7.7 | 7.2 | 6.7 |
| Systolic blood pressure (mmHg) | 59.6 | 26.0 | 15.7 | 11.7 | 17.8 |
| Diastolic blood pressure (mmHg) | 37.5 | 16.9 | 3.9 | 2.6 | 7.8 |
| Sport | 13.8 | 20.5 | 14.4 | 21.7 | 18.9 |
| Education | 73.0 | 29.6 | 37.8 | 36.1 | 34.5 |
| Drinking | 11.0 | 0.0 | 1.2 | 0.6 | 0.6 |
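The MORI column is not defined in this copy. Its values are numerically consistent with the arithmetic mean of the RF, SGB, and XGBoost importance scores, with LR excluded; that reading is an inference from the table, not a stated definition. The sketch below checks it on a few rows copied from the table and ranks them; the function names are illustrative.

```python
# (LR, RF, SGB, XGBoost, reported MORI) for selected rows of the importance table.
IMPORTANCE = {
    "Body weight": (55.2, 42.1, 41.2, 30.7, 38.0),
    "Leukocyte count": (44.6, 34.3, 18.5, 16.3, 23.0),
    "Total bilirubin": (45.4, 32.3, 18.9, 14.0, 21.7),
    "Lactate dehydrogenase": (100.0, 100.0, 100.0, 100.0, 100.0),
    "Sport": (13.8, 20.5, 14.4, 21.7, 18.9),
    "Education": (73.0, 29.6, 37.8, 36.1, 34.5),
}


def mean_ml_importance(row):
    """Average the three machine learning columns; LR is assumed to be excluded."""
    _lr, rf, sgb, xgb, _reported = row
    return (rf + sgb + xgb) / 3


def rank_by_mean_importance(table):
    """Return variable names sorted from most to least important."""
    return sorted(table, key=lambda name: mean_ml_importance(table[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_by_mean_importance(IMPORTANCE):
        row = IMPORTANCE[name]
        print(f"{name}: computed {mean_ml_importance(row):.1f}, reported {row[4]:.1f}")
```

On these rows the computed means agree with the reported column to one decimal place, with lactate dehydrogenase, body weight, and education ranked highest.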
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chang, C.-Y.; Pei, D.; Kuo, Y.-L.; Lee, L.-N.; Wu, C.-Z.; Chu, T.-W.; Shen, H.-S.; Huang, C.-Y.; Liang, Y.-J. Explainable Machine Learning Models for Predicting FEV1 in Non-Smoking Taiwanese Men Aged 45–55 Years. Diagnostics 2025, 15, 3152. https://doi.org/10.3390/diagnostics15243152

