Predictive and Explainable Machine Learning Models for Endocrine, Nutritional, and Metabolic Mortality in Italy Using Geolocalized Pollution Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation Description
2.2. Multicollinearity Analysis
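This copy of the article does not say how multicollinearity was assessed. One common screening step, shown here only as an assumption, is to flag pairs of features whose absolute Pearson correlation exceeds a threshold. The feature names, values, and the 0.9 threshold below are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlated_pairs(
    features: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return every feature pair whose |r| is at or above the threshold."""
    names = list(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Hypothetical toy columns; "pm25" is an exact linear function of "pm10".
features = {
    "pm10": [10.0, 20.0, 30.0, 40.0],
    "pm25": [6.0, 11.0, 16.0, 21.0],
    "income": [3.0, 1.0, 4.0, 2.0],
}
pairs = correlated_pairs(features)
for first, second, r in pairs:
    print(f"{first} vs {second}: r = {r:.3f}")
```

One feature from each flagged pair would typically be dropped or merged before model fitting, so that importance is not split arbitrarily between near-duplicates.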
2.3. Leave-One-Group-Out Cross-Validation
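In leave-one-group-out cross-validation, each fold holds out every sample belonging to one group and trains on all remaining groups, so that no group appears on both sides of a split. The sketch below shows only the splitting logic in plain Python. The group labels are hypothetical, and how groups were defined in the study is not stated in this copy.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one whole group per fold."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    # Groups are held out in order of first appearance.
    for held_out, test in by_group.items():
        train = sorted(
            i for label, idxs in by_group.items() if label != held_out for i in idxs
        )
        yield train, list(test)


# Hypothetical toy labels: six samples in three groups.
groups = ["A", "A", "B", "B", "C", "C"]
folds = list(leave_one_group_out(groups))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The number of folds equals the number of distinct groups, and a model's score is usually summarized across folds.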
2.4. Machine Learning Models
- Coefficient of determination:
- Mean absolute error:
- Root mean squared error:
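For reference, the three metrics listed above have the following standard definitions, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of samples. The notation is an assumption and may differ from the authors' own.

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]
\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\]
```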
2.5. Explanation of the Features
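SHAP values are based on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical three-feature toy model, replacing absent features with a baseline value. It illustrates the underlying definition only; it is not the SHAP library's algorithm, and the model and inputs are assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)

    def value(present: FrozenSet[int]) -> float:
        # Features outside the coalition take their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the two interacting features share their joint contribution equally. Enumeration costs grow exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.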
3. Results
3.1. Regression Performance
3.2. Interpretation of Models Through SHAP Values
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Health Organization. Noncommunicable Diseases Progress Monitor 2018; World Health Organization: Geneva, Switzerland, 2018.
- The Italian National Institute of Statistics. Italian National Health Statistics Report 2021; ISTAT: Rome, Italy, 2021. Available online: https://www.istat.it (accessed on 1 November 2023).
- Michels, A.W.; Redondo, M.J.; Atkinson, M.A. The pathogenesis, natural history, and treatment of type 1 diabetes: Time (thankfully) does not stand still. Lancet Diabetes Endocrinol. 2022, 10, 90–92.
- Kim, S.H.; Lee, S.Y.; Kim, C.W.; Suh, Y.J.; Hong, S.; Ahn, S.H.; Seo, D.H.; Nam, M.S.; Chon, S.; Woo, J.T.; et al. Impact of socioeconomic status on health behaviors, metabolic control, and chronic complications in type 2 diabetes mellitus. Diabetes Metab. J. 2018, 42, 380.
- Myers, S.S.; Smith, M.R.; Guth, S.; Golden, C.D.; Vaitla, B.; Mueller, N.D.; Dangour, A.D.; Huybers, P. Climate change and global food systems: Potential impacts on food security and undernutrition. Annu. Rev. Public Health 2017, 38, 259–277.
- Yen, F.S.; Wei, J.C.C.; Chiu, L.T.; Hsu, C.C.; Hwu, C.M. Diabetes, hypertension, and cardiovascular disease development. J. Transl. Med. 2022, 20, 9.
- Bommer, C.; Heesemann, E.; Sagalova, V.; Manne-Goehler, J.; Atun, R.; Bärnighausen, T.; Vollmer, S. The global economic burden of diabetes in adults aged 20–79 years: A cost-of-illness study. Lancet Diabetes Endocrinol. 2017, 5, 423–430.
- Pagano, E.; De Rosa, M.; Rossi, E.; Cinconze, E.; Marchesini, G.; Miccoli, R.; Vaccaro, O.; Bonora, E.; Bruno, G. The relative burden of diabetes complications on healthcare costs: The population-based CINECA-SID ARNO Diabetes Observatory. Nutr. Metab. Cardiovasc. Dis. 2016, 26, 944–950.
- Di Cesare, M.; Khang, Y.H.; Asaria, P.; Blakely, T.; Cowan, M.J.; Farzadfar, F.; Guerrero, R.; Ikeda, N.; Kyobutungi, C.; Msyamboza, K.P.; et al. Inequalities in non-communicable diseases and effective responses. Lancet 2013, 381, 585–597.
- Pope, C.A.; Dockery, D.W. Health effects of fine particulate air pollution: Lines that connect. J. Air Waste Manag. Assoc. 2006, 56, 709–742.
- Thurston, G.D.; Kipen, H.; Annesi-Maesano, I.; Balmes, J.; Brook, R.D.; Cromar, K.; De Matteis, S.; Forastiere, F.; Forsberg, B.; Frampton, M.W.; et al. A joint ERS/ATS policy statement: What constitutes an adverse health effect of air pollution? An analytical framework. Eur. Respir. J. 2017, 49, 1600419.
- Monaco, A.; Lacalamita, A.; Amoroso, N.; D’Orta, A.; Del Buono, A.; di Tuoro, F.; Tangaro, S.; Galeandro, A.I.; Bellotti, R. Random forests highlight the combined effect of environmental heavy metals exposure and genetic damages for cardiovascular diseases. Appl. Sci. 2021, 11, 8405.
- Braveman, P.A.; Egerter, S.; Williams, D.R. The social determinants of health: Coming of age. Annu. Rev. Public Health 2011, 32, 381–398.
- Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358.
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Romano, D.; Novielli, P.; Diacono, D.; Cilli, R.; Pantaleo, E.; Amoroso, N.; Bellantuono, L.; Monaco, A.; Bellotti, R.; Tangaro, S. Insights from Explainable Artificial Intelligence of Pollution and Socioeconomic Influences for Respiratory Cancer Mortality in Italy. J. Pers. Med. 2024, 14, 430.
- Tangaro, S.; Amoroso, N.; Brescia, M.; Cavuoti, S.; Chincarini, A.; Errico, R.; Inglese, P.; Longo, G.; Maglietta, R.; Tateo, A.; et al. Feature selection based on machine learning in MRIs for hippocampal segmentation. Comput. Math. Methods Med. 2015, 2015, 814104.
- Novielli, P.; Romano, D.; Magarelli, M.; Diacono, D.; Monaco, A.; Amoroso, N.; Vacca, M.; De Angelis, M.; Bellotti, R.; Tangaro, S. Personalized identification of autism-related bacteria in the gut microbiome using explainable artificial intelligence. iScience 2024, 27, 110709.
- Romano, D.; Novielli, P.; Cilli, R.; Amoroso, N.; Monaco, A.; Bellotti, R.; Tangaro, S. Air pollution and mortality for cancer of the respiratory system in Italy: An explainable artificial intelligence approach. Front. Public Health 2024, 12, 1344865.
- Thunis, P.; Degraeuwe, B.; Pisoni, E.; Meleux, F.; Clappier, A. Analyzing the efficiency of short-term air quality plans in European cities, using the CHIMERE air quality model. Air Qual. Atmos. Health 2017, 10, 235–248.
- Hass, H.; Ebel, A.; Feldmann, H.; Jakobs, H.; Memmesheimer, M. Evaluation studies with a regional chemical transport model (EURAD) using air quality data from the EMEP monitoring network. Atmos. Environ. Part A Gen. Top. 1993, 27, 867–887.
- Duarte, E.D.S.F.; Franke, P.; Lange, A.C.; Friese, E.; da Silva Lopes, F.J.; da Silva, J.J.; dos Reis, J.S.; Landulfo, E.; e Silva, C.M.S.; Elbern, H.; et al. Evaluation of atmospheric aerosols in the metropolitan area of São Paulo simulated by the regional EURAD-IM model on high-resolution. Atmos. Pollut. Res. 2021, 12, 451–469.
- Hinestroza-Ramirez, J.E.; Lopez-Restrepo, S.; Yarce Botero, A.; Segers, A.; Rendon-Perez, A.M.; Isaza-Cadavid, S.; Heemink, A.; Quintero, O.L. Improving Air Pollution Modelling in Complex Terrain with a Coupled WRF–LOTOS–EUROS Approach: A Case Study in Aburrá Valley, Colombia. Atmosphere 2023, 14, 738.
- Persson, C.; Langner, J.; Robertson, L. Air pollution assessment studies for Sweden based on the MATCH model and air pollution measurements. In Air Pollution Modeling and Its Application XI; Springer: Boston, MA, USA, 1996; pp. 127–134.
- Joly, M.; Josse, B.; Plu, M.; Arteta, J.; Guth, J.; Meleux, F. High-Resolution Air Quality Forecasts with MOCAGE Chemistry Transport Model. In Air Pollution Modeling and its Application XXIV; Springer: Cham, Switzerland, 2016; pp. 563–565.
- Van Loon, M.; Vautard, R.; Schaap, M.; Bergström, R.; Bessagnet, B.; Brandt, J.; Builtjes, P.; Christensen, J.; Cuvelier, C.; Graff, A.; et al. Evaluation of long-term ozone simulations from seven regional air quality models and their ensemble. Atmos. Environ. 2007, 41, 2083–2097.
- Neary, L.; Kaminski, J.W.; Lupu, A.; McConnell, J.C. Developments and results from a global multiscale air quality model (GEM-AQ). In Air Pollution Modeling and Its Application XVII; Springer: Boston, MA, USA, 2007; pp. 403–410.
- Mayr, A.; Binder, H.; Gefeller, O.; Schmid, M. The evolution of boosting algorithms. Methods Inf. Med. 2014, 53, 419–427.
- Parmar, A.; Katariya, R.; Patel, V. A review on random forest: An ensemble classifier. In Proceedings of the International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018, Coimbatore, India, 7–8 August 2018; Springer: Cham, Switzerland, 2019; pp. 758–763.
- Azmi, S.S.; Baliga, S. An overview of boosting decision tree algorithms utilizing AdaBoost and XGBoost boosting strategies. Int. Res. J. Eng. Technol. 2020, 7, 6867–6870.
- Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502.
- Webb, S.M.; Crespo, I.; Santos, A.; Resmini, E.; Aulinas, A.; Valassi, E. Management of endocrine disease: Quality of life tools for the management of pituitary disease. Eur. J. Endocrinol. 2017, 177, R13–R26.
- Sonino, N.; Fava, G.A.; Fallo, F.; Boscaro, M. Psychological distress and quality of life in endocrine disease. Psychother. Psychosom. 1990, 54, 140–144.
- Klinker, C.D.; Aaby, A.; Ringgaard, L.W.; Hjort, A.V.; Hawkins, M.; Maindal, H.T. Health literacy is associated with health behaviors in students from vocational education and training schools: A Danish population-based survey. Int. J. Environ. Res. Public Health 2020, 17, 671.
- Zajacova, A.; Lawrence, E.M. The relationship between education and health: Reducing disparities through a contextual approach. Annu. Rev. Public Health 2018, 39, 273–289.
- Cui, Y.; Mo, Z.; Ji, P.; Zhong, J.; Li, Z.; Li, D.; Qin, L.; Liao, Q.; He, Z.; Guo, W.; et al. Benzene exposure leads to lipodystrophy and alters endocrine activity in vivo and in vitro. Front. Endocrinol. 2022, 13, 937281.
- Li, Y.; Cheng, Y.; Cui, G.; Peng, C.; Xu, Y.; Wang, Y.; Liu, Y.; Liu, J.; Li, C.; Wu, Z.; et al. Association between high temperature and mortality in metropolitan areas of four cities in various climatic zones in China: A time-series study. Environ. Health 2014, 13, 65.
- Saleem, A.; Awan, T.; Akhtar, M.F. A comprehensive review on endocrine toxicity of gaseous components and particulate matter in smog. Front. Endocrinol. 2024, 15, 1294205.
- Darbre, P.D. Overview of air pollution and endocrine disorders. Int. J. Gen. Med. 2018, 11, 191–207.
- Gea, M.; Fea, E.; Racca, L.; Gilli, G.; Gardois, P.; Schilirò, T. Atmospheric endocrine disruptors: A systematic review on oestrogenic and androgenic activity of particulate matter. Chemosphere 2024, 349, 140887.
- Ning, J.; Akhter, T.; Sarfraz, M.; Afridi, H.I.; Albasher, G.; Unar, A. The importance of monitoring endocrine-disrupting chemicals and essential elements in biological samples of fertilizer industry workers. Environ. Res. 2023, 231, 116173.
- Klasing, K.C. Nutritional diseases. In Diseases of Poultry; John Wiley & Sons: Hoboken, NJ, USA, 2013; pp. 1203–1232.
- Kerr, N.R.; Booth, F.W. Contributions of physical inactivity and sedentary behavior to metabolic and endocrine diseases. Trends Endocrinol. Metab. 2022, 33, 817–827.
Model | R² | MAE | RMSE |
---|---|---|---|
Gradient Boosting | 0.55 | 0.17 | 0.05 |
Random Forest | 0.48 | 0.19 | 0.05 |
XGBoost | 0.06 | 0.24 | 0.08 |
Linear regression | −0.05 | 0.23 | 0.11 |
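The first value column of the table contains a negative entry for linear regression, which only the coefficient of determination can produce: it falls below zero when a model's squared error exceeds that of always predicting the mean of the observed values. A minimal pure-Python sketch of the three metrics, using hypothetical toy values unrelated to the study data, illustrates this behavior.

```python
from math import sqrt
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_good = [1.5, 2.0, 2.5, 4.5]   # close to the observations
y_bad = [4.0, 3.0, 2.0, 1.0]    # reversed trend, worse than the mean

print("good:", round(r2(y_true, y_good), 3), round(mae(y_true, y_good), 3),
      round(rmse(y_true, y_good), 3))
print("bad: ", round(r2(y_true, y_bad), 3))
```

A score near zero or below, as for the last two rows of the table, therefore indicates that the model explains little or none of the variance in the held-out data.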
© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Romano, D.; Magarelli, M.; Novielli, P.; Diacono, D.; Di Bitonto, P.; Amoroso, N.; Monaco, A.; Bellotti, R.; Tangaro, S. Predictive and Explainable Machine Learning Models for Endocrine, Nutritional, and Metabolic Mortality in Italy Using Geolocalized Pollution Data. Appl. Syst. Innov. 2025, 8, 48. https://doi.org/10.3390/asi8020048