An Interpretable Machine Learning Framework for Analyzing the Interaction Between Cardiorespiratory Diseases and Meteo-Pollutant Sensor Data
Abstract
1. Introduction
State of the Art
2. Materials and Methods
2.1. Methodology
2.2. Dataset Collection and Preparation
2.3. Machine Learning Models
- Random Forest (RF): A supervised learning algorithm based on an ensemble of decision trees, where each tree is trained on a random subsample of the dataset. This strategy introduces a high degree of diversity among trees, reducing variance and improving generalization. In regression, the final prediction is calculated as the mean of the individual tree predictions [33]. Random Forest is known for its robustness to noise, ability to handle nonlinearly correlated features, and lack of assumptions about data distribution. Additionally, it shows relatively stable behavior against overfitting, especially with a large number of trees [34].
- Extreme Gradient Boosting (XGBoost): One of the most advanced and optimized implementations of gradient boosting. The algorithm builds decision trees sequentially, where each new tree is trained to minimize the residual errors of the previous ones through gradient descent on a differentiable loss function [35]. XGBoost integrates regularization techniques (L1 and L2), efficient memory management, parallel training, and early pruning to prevent overgrowth of trees. It is particularly effective on structured and high-dimensional datasets but requires careful hyperparameter tuning for optimal performance. Its popularity also stems from the computational speed and high predictive accuracy [20].
- Light Gradient Boosting Machine (LightGBM): A boosting algorithm that uses decision trees as base learners and is designed to be highly efficient in computation and memory. LightGBM builds trees leaf-wise (instead of level-wise), selecting the leaf with the greatest potential loss reduction at each iteration. This approach yields deeper and more accurate models but requires careful regularization to avoid overfitting. LightGBM supports missing value handling and customized loss functions, making it especially suitable for large and heterogeneous datasets [36].
- Explainable Boosting Machine (EBM): An interpretable (glassbox) model built on Generalized Additive Models (GAMs) extended with pairwise interaction terms, balancing model transparency with high predictive performance. Each variable contributes to the final prediction through its own explicit additive function, so its effect can be inspected directly, making the model particularly suited for sensitive contexts such as healthcare [37].
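The sequential residual-fitting idea behind the boosting models described above can be illustrated with a minimal sketch. It assumes a squared-error loss, a single input feature, and one-split trees (stumps); it is plain gradient boosting written for illustration, not the regularized objective of XGBoost or the leaf-wise growth of LightGBM. The function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to the current residuals."""
    base = mean(ys)                      # initial constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict

def train_mse(predict, xs, ys):
    """Mean squared error of a fitted predictor on the training points."""
    return mean([(y - predict(x)) ** 2 for x, y in zip(xs, ys)])

# Hypothetical toy data: a step-like relationship with small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline_mse = mean([(y - mean(ys)) ** 2 for y in ys])
model_5 = boost(xs, ys, n_rounds=5)
model_50 = boost(xs, ys, n_rounds=50)
```

On this toy data the training error decreases as rounds are added, which is also why boosting models need regularization and tuning to avoid overfitting.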
- Coefficient of Determination (R2): This metric measures the proportion of variance in the target variable that is explained by the model. A value close to 1 indicates excellent predictive ability. It is defined in Equation (1).
- Mean Absolute Error (MAE): This metric represents the average absolute difference between predicted and actual values. It is useful for interpreting the model’s error in the same unit as the target variable. It is defined in Equation (2).
- Mean Squared Error (MSE): This metric calculates the average of the squared errors. It penalizes larger errors more than the MAE and is therefore more sensitive to outliers. It is defined in Equation (3).
- Root Mean Squared Error (RMSE): This metric is the square root of the MSE and expresses the error magnitude in the same unit as the target. It helps quantify how much predictions typically deviate from actual values. It is defined in Equation (4).
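Equations (1)–(4) are not reproduced in this copy of the text. The standard definitions of the four metrics are given below, under the assumption that they match the ones the authors refer to. The notation is also an assumption: y_i is the observed value, ŷ_i the predicted value, ȳ the mean of the observed values, and n the number of samples.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{1} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{2} \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{3} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} = \sqrt{\mathrm{MSE}} \tag{4}
\end{align}
```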
2.4. Global and Local Interpretability (XAI Methods)
- SHapley Additive exPlanations (SHAP)
- Local Interpretable Model-Agnostic Explanations (LIME)
2.5. Study Area
2.6. Environmental Monitoring Sensors
2.7. ARPA Puglia Monitoring Network
2.8. Data Sources, Code Development, and Ethical Considerations
- Clinical data: Provided by the Policlinico of Bari, consisting of anonymized records of emergency department visits.
- Environmental data: Obtained from ARPA Puglia [46], which operates regional meteorological and air quality monitoring stations.
- Code development: The models were developed and tested using the Python 3.11.11 programming language. The main libraries used include:
  - pandas and numpy for data handling and manipulation;
  - scikit-learn (modules: model_selection, metrics, preprocessing, ensemble) for:
    - data splitting (e.g., train_test_split);
    - performance evaluation through cross-validation (cross_val_score);
    - computation of metrics (e.g., MAE, RMSE);
    - preprocessing (e.g., normalization via MinMaxScaler or StandardScaler);
    - model training with RandomForestRegressor;
  - xgboost (XGBRegressor) and lightgbm (LGBMRegressor) for advanced boosting techniques;
  - interpret (glassbox module) for the Explainable Boosting Machine (EBM);
  - Optuna for automated Bayesian hyperparameter optimization;
  - joblib for saving and loading models and experimental outputs;
  - shap and lime for global and local model interpretability;
  - matplotlib and seaborn for visualization and exploratory analysis.
- Ethical considerations:
  - Only fully anonymized and aggregated data were used.
  - No personal identifiers are present, and no patient can be traced.
  - No human or animal experiments were performed.
  - As such, ethical approval was not required according to applicable regulations.
- Generative Artificial Intelligence (GenAI) tools were used solely to assist with partial language editing.
3. Results
3.1. Application to the Case Study
3.1.1. Meteorological Data
3.1.2. Emergency Room (ER) Data
3.1.3. Application of Machine Learning Algorithms
3.1.4. Global and Local Interpretability (SHAP and LIME Methods)
- CO has a positive effect on the target: High concentrations of CO (in red) are associated with positive SHAP values, suggesting an increased likelihood of CRD cases.
- P_atm and Tavg show an inverse impact: Low values (in blue) are associated with positive SHAP values, implying that low pressure and low temperature conditions contribute to an increase in cases.
- RH displays a pattern similar to that of carbon monoxide, with higher values linked to a positive impact on the target.
- For CO (Figure 5a), high values (approximately greater than 0.75–0.80 mg/m3) are associated with positive SHAP values and a strong red coloration, confirming the role of this variable as a risk factor when concentrations are elevated.
- For P_atm (Figure 5b) and Tavg (Figure 5c), the relationship is inverse, as already noted from the global beeswarm plot. Lower values tend to be associated with higher SHAP values, indicating that lower pressure and lower temperature favor an increase in CRD cases (critical thresholds appear to be around 1010 hPa for P_atm and approximately 20 °C for Tavg).
- For RH (Figure 5d), the same trend observed for CO emerges again, with high humidity levels (around 70% or more) showing a positive effect on the target.
- Significant increases in CRD cases were observed when CO levels exceeded 0.84 mg/m3.
- CRD cases increased when P_atm values were ≤1006.81 hPa.
- An increase in CRD numbers is mainly associated with average temperatures (Tavg) between 12.28 °C and 17.19 °C, which represent the interval with the highest incidence. However, even temperatures ≤ 12.28 °C show a positive association, though with a less significant contribution.
- Relative humidity (RH) values between 70.33% and 75.79% contribute more substantially to the rise in CRD cases, whereas levels > 75.79%, although still linked to a positive effect, have a lesser impact.
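The sign convention used in the SHAP readings above (a positive SHAP value means the feature pushes the prediction above a reference level) can be illustrated with a brute-force Shapley computation. This is a minimal sketch under stated assumptions: the toy model and its coefficients are invented for illustration and are not the fitted XGBoost model, the reference point uses rounded dataset averages, and the exact enumeration over feature orderings stands in for the approximations used by the shap library.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all feature orderings.

    Features not yet 'switched on' in an ordering keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]          # switch this feature to its actual value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_model(f):
    """Hypothetical response: rises with CO, falls with temperature, CO effect amplified in humid air."""
    return 10.0 + 4.0 * f["CO"] - 0.3 * (f["Tavg"] - 17.0) + 2.0 * f["CO"] * (f["RH"] > 75.0)

# Reference point: rounded averages (CO in mg/m3, Tavg in °C, RH in %).
baseline = {"CO": 0.7, "Tavg": 17.8, "RH": 70.3}
# Hypothetical day with high CO, low temperature, and high humidity.
instance = {"CO": 1.2, "Tavg": 10.0, "RH": 80.0}

phi = shapley_values(toy_model, instance, baseline)
```

By construction, the contributions in `phi` sum to the difference between the prediction for the instance and the prediction at the reference point, which is the additivity property that makes SHAP values comparable across features.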
4. Discussion
- CO > 0.84 mg/m3;
- P_atm ≤ 1006.81 hPa;
- Tavg ≤ 17.19 °C;
- RH > 70.33%.
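The thresholds listed above can be applied to daily records to flag potentially critical days. The following is a minimal sketch: the record structure, the `min_conditions` parameter, and the sample values are hypothetical, and whether the conditions must hold simultaneously is an assumption left configurable rather than something the text specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DailyRecord:
    date: str      # day label (hypothetical format)
    co: float      # CO concentration, mg/m3
    p_atm: float   # atmospheric pressure, hPa
    tavg: float    # average temperature, °C
    rh: float      # relative humidity, %

# Critical conditions as listed in the text.
CRITICAL_CONDITIONS = {
    "CO": lambda r: r.co > 0.84,
    "P_atm": lambda r: r.p_atm <= 1006.81,
    "Tavg": lambda r: r.tavg <= 17.19,
    "RH": lambda r: r.rh > 70.33,
}

def exceeded(record):
    """Return the names of the critical conditions met by one daily record."""
    return [name for name, check in CRITICAL_CONDITIONS.items() if check(record)]

def critical_dates(records, min_conditions=4):
    """Return the dates on which at least `min_conditions` critical conditions are met."""
    return [r.date for r in records if len(exceeded(r)) >= min_conditions]

# Hypothetical sample records, not taken from the study dataset.
records = [
    DailyRecord("D1", co=1.10, p_atm=1004.2, tavg=9.5, rh=78.0),
    DailyRecord("D2", co=0.40, p_atm=1015.0, tavg=27.3, rh=62.0),
    DailyRecord("D3", co=0.90, p_atm=1012.0, tavg=15.0, rh=69.0),
]
```

Calling `critical_dates(records)` keeps only days on which all four conditions hold, while a lower `min_conditions` gives a more permissive screening.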
5. Conclusions
- Meteorological data (mean temperature, atmospheric pressure, relative humidity, etc.);
- Environmental data (pollutants such as CO, NO2, and PM10);
- Health data on daily emergency room admissions for cardiorespiratory conditions in the Bari area.
- CO concentrations exceed 0.84 mg/m3;
- P_atm is less than or equal to 1006.81 hPa;
- Tavg is less than or equal to 17.19 °C;
- RH exceeds 70.33%.
6. Patents
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
ARIMA | AutoRegressive Integrated Moving Average |
ARPA | Regional Environmental Protection Agency |
CRDs | Cardiorespiratory Diseases |
CWSM | Complex Wireless Sensors Model |
CO | Carbon monoxide |
CVD | Cardiovascular Disease |
DEW | Dew Point Temperature |
DCTs | Dates corresponding to Critical Thresholds |
DNN | Deep Neural Network |
DT | Decision Tree |
ED | Emergency Department |
EBM | Explainable Boosting Machine |
ER | Emergency Room |
GAMs | Generalized Additive Models |
GBDT | Gradient Boosting Decision Trees |
KNN | K-Nearest Neighbors |
LASSO | Least Absolute Shrinkage and Selection Operator |
LightGBM | Light Gradient Boosting Machine |
LIME | Local Interpretable Model-Agnostic Explanations |
LR | Linear Regression |
LSTM | Long Short-Term Memory |
MAE | Mean Absolute Error |
MAPE | Mean Absolute Percentage Error |
ML | Machine Learning |
MLP | Multi-Layer Perceptron |
MLR | Multivariable Linear Regression |
MSE | Mean Squared Error |
NO2 | Nitrogen Dioxide |
NSTEMI | Non-ST-Elevation Myocardial Infarction |
O3 | Ozone |
P_atm | Atmospheric Pressure |
PM10 | Particulate Matter ≤ 10 μm |
PM2.5 | Particulate Matter ≤ 2.5 μm |
R2 | Coefficient of determination |
RF | Random Forest |
RH | Relative Humidity |
RMSE | Root Mean Squared Error |
RRQA | Regional Air Quality Network |
SHAP | SHapley Additive exPlanations |
SVR | Support Vector Regression |
Tavg | Average Temperature |
TPEs | Tree-structured Parzen Estimators |
UA | Unstable Angina |
UAV | Unmanned Aerial Vehicle |
UV | Ultraviolet Radiation |
XAI | Explainable Artificial Intelligence |
XGBoost | Extreme Gradient Boosting |
References
1. Sohail, H.; Kollanus, V.; Tiittanen, P.; Mikkonen, S.; Lipponen, A.H.; Zhang, S.; Breitner, S.; Schneider, A.; Lanki, T. Low temperature, cold spells, and cardiorespiratory hospital admissions in Helsinki, Finland. Air Qual. Atmos. Health 2023, 16, 213–220.
2. Kotecki, P.; Więckowska, B.; Stawińska-Witoszyńska, B. The Impact of Meteorological Parameters and Seasonal Changes on Reporting Patients with Selected Cardiovascular Diseases to Hospital Emergency Departments: A Pilot Study. Int. J. Environ. Res. Public Health 2023, 20, 4838.
3. Li, J.; Liang, L.; Lyu, B.; Cai, Y.S.; Zuo, Y.; Su, J.; Tong, Z. Double trouble: The interaction of PM2.5 and O3 on respiratory hospital admissions. Environ. Pollut. 2023, 338, 122665.
4. Gasparrini, A.; Guo, Y.; Hashizume, M.; Lavigne, E.; Zanobetti, A.; Schwartz, J.; Tobias, A.; Tong, S.; Rocklöv, J.; Forsberg, B.; et al. Mortality risk attributable to high and low ambient temperature: A multicountry observational study. Lancet 2015, 386, 369–375.
5. Mebrahtu, T.F.; Santorelli, G.; Yang, T.C.; Wright, J.; Tate, J.; McEachan, R.C. The effects of exposure to NO2, PM2.5 and PM10 on health service attendances with respiratory illnesses: A time-series analysis. Environ. Pollut. 2023, 333, 122123.
6. He, X.; Zhai, S.; Liu, X.; Liang, L.; Song, G.; Song, H.; Kong, Y. Interactive short-term effects of meteorological factors and air pollution on hospital admissions for cardiovascular diseases. Environ. Sci. Pollut. Res. 2022, 29, 68103–68117.
7. Monteiro Martins, L.; Coz, E.; Maucort-Boulch, D.; Hacid, M.S. Machine learning with environmental predictors to forecast hospital visits and admissions: A systematic review. Environ. Syst. Res. 2025, 14, 12.
8. Watson, G.L. Machine learning in environmental exposure assessment. In Machine Learning in Chemical Safety and Health; John Wiley & Sons: Hoboken, NJ, USA, 2022; pp. 251–265.
9. Usmani, R.S.A.; Pillai, T.R.; Hashem, I.A.T.; Marjani, M.; Shaharudin, R.; Latif, M.T. Air pollution and cardiorespiratory hospitalization, predictive modeling, and analysis using artificial intelligence techniques. Environ. Sci. Pollut. Res. 2021, 28, 56759–56771.
10. Lu, J.; Bu, P.; Xia, X.; Lu, N.; Yao, L.; Jiang, H. Feasibility of machine learning methods for predicting hospital emergency room visits for respiratory diseases. Environ. Sci. Pollut. Res. 2021, 28, 29701–29709.
11. Ravindra, K.; Bahadur, S.S.; Katoch, V.; Bhardwaj, S.; Kaur-Sidhu, M.; Gupta, M.; Mor, S. Application of machine learning approaches to predict the impact of ambient air pollution on outpatient visits for acute respiratory infections. Sci. Total Environ. 2023, 858, 159509.
12. Kurucz, V.C.; Schenk, J.; Veelo, D.P.; Geerts, B.F.; Vlaar, A.P.J.; Van Der Ster, B.J.P. Prediction of emergency department presentations for acute coronary syndrome using a machine learning approach. Sci. Rep. 2024, 14, 23125.
13. Hu, Z.; Qiu, H.; Su, Z.; Shen, M.; Chen, Z. A Stacking ensemble model to predict daily number of hospital admissions for cardiovascular diseases. IEEE Access 2020, 8, 138719–138729.
14. Münzel, T.; Hahad, O.; Sørensen, M.; Lelieveld, J.; Duerr, G.D.; Nieuwenhuijsen, M.; Daiber, A. Environmental risk factors and cardiovascular diseases: A comprehensive expert review. Cardiovasc. Res. 2021, 118, 2880–2902.
15. Rodríguez, M.; Montalvo, G.; Morales, J.; Jiménez-Martín, M.; Aparicio, M.; Piñeiro, C. Environmental monitoring and disease prediction. In Advancements and Technologies in Pig and Poultry Bacterial Disease Control; Foster, N., Kyriazakis, I., Barrow, P., Eds.; Academic Press: Cambridge, MA, USA, 2021; pp. 145–169.
16. Wang, J.; Kou, L.; Kwan, M.-P.; Shakespeare, R.M.; Lee, K.; Park, Y.M. An Integrated Individual Environmental Exposure Assessment System for Real-Time Mobile Sensing in Environmental Health Studies. Sensors 2021, 21, 4039.
17. Asha, P.; Natrayan, L.; Geetha, B.T.; Beulah, J.R.; Sumathy, R.; Varalakshmi, G.; Neelakandan, S. IoT enabled environmental toxicology for air pollution monitoring using AI techniques. Environ. Res. 2021, 205, 112574.
18. Tousi, A.; Luján, M. Comparative Analysis of Machine Learning Models for Performance Prediction of the SPEC Benchmarks. IEEE Access 2022, 10, 11994–12011.
19. Dowlatabadi, Y.; Abadi, S.; Sarkhosh, M.; Mohammadi, M.; Moezzi, S.M.M. Assessing the impact of meteorological factors and air pollution on respiratory disease mortality rates: A random forest model analysis (2017–2021). Sci. Rep. 2024, 14, 24535.
20. Zhou, L.; Zhu, Q.; Chen, Q.; Wang, P.; Huang, H. Predicting hospital outpatient volume using XGBoost: A machine learning approach. Sci. Rep. 2025, 15, 17028.
21. Liao, H.; Zhang, X.; Zhao, C.; Chen, Y.; Zeng, X.; Li, H. LightGBM: An efficient and accurate method for predicting pregnancy diseases. J. Obstet. Gynaecol. 2022, 42, 620–629.
22. Sarica, A.; Quattrone, A.; Quattrone, A. Explainable boosting machine for predicting Alzheimer’s disease from MRI hippocampal subfields. In International Conference on Brain Informatics; Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N., Eds.; Springer: Cham, Switzerland, 2021; p. 12960.
23. Körner, A.; Sailer, B.; Sari-Yavuz, S.; Haeberle, H.A.; Mirakaj, V.; Bernard, A.; Rosenberger, P.; Koeppen, M. Explainable Boosting Machine approach identifies risk factors for acute renal failure. Intensiv. Care Med. Exp. 2024, 12, 55.
24. Srinivas, P.; Katarya, R. hyOPTXg: OPTUNA hyper-parameter optimization framework for predicting cardiovascular disease using XGBoost. Biomed. Signal Process. Control 2021, 73, 103456.
25. Dhanka, S.; Maini, S. A hybridization of XGBoost machine learning model by Optuna hyperparameter tuning suite for cardiovascular disease classification with significant effect of outliers and heterogeneous training datasets. Int. J. Cardiol. 2024, 420, 132757.
26. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
27. Arias-Duart, A.; Parés, F.; Garcia-Gasulla, D.; Giménez-Ábalos, V. Focus! Rating XAI Methods and Finding Biases. In Proceedings of the 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Padua, Italy, 18–23 July 2022; pp. 1–8.
28. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
29. Shaikhina, T.; Bhatt, U.; Zhang, R.; Georgatzis, K.; Xiang, A.; Weller, A. Effects of uncertainty on the quality of feature importance explanations. In AAAI Workshop on Explainable Agency in Artificial Intelligence; AAAI Press: Washington, DC, USA, 2021; Available online: https://umangsbhatt.github.io/reports/AAAI_XAI_QB.pdf (accessed on 1 June 2025).
30. Tarabanis, C.; Kalampokis, E.; Khalil, M.; Alviar, C.L.; Chinitz, L.A.; Jankelson, L. Explainable SHAP-XGBoost models for in-hospital mortality after myocardial infarction. Cardiovasc. Digit. Health J. 2023, 4, 126–132.
31. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; PMLR: Cambridge, MA, USA, 2020; Volume 119, pp. 5491–5500. Available online: https://proceedings.mlr.press/v119/kumar20e.html (accessed on 5 August 2025).
32. Salih, A.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. arXiv 2023, arXiv:2305.02012.
33. Medeiros, N.; Fogliatto, F.S.; Rocha, M.; Tortorella, G. Predicting the length-of-stay of pediatric patients using machine learning algorithms. Int. J. Prod. Res. 2023, 63, 483–496.
34. Ding, W.; Qie, X. Prediction of air pollutant concentrations via Random Forest regressor coupled with uncertainty analysis—A case study in Ningxia. Atmosphere 2022, 13, 960.
35. Gao, L.; Feng, J.; Gao, Y.; Luo, L.; Jiang, H.; Yang, Q.; Lu, J.; Guo, L. XGBoost-based model for predicting PICC occlusion risk in cancer patients: Insights from SHAP analysis. Alex. Eng. J. 2025, 123, 436–447.
36. Talebi, H.; Bardsiri, A.K.; Bardsiri, V.K. Developing a hybrid machine learning model for employee turnover prediction: Integrating LightGBM and genetic algorithms. J. Open Innov. Technol. Mark. Complex. 2025, 11, 100557.
37. Colantonio, L.; Equeter, L.; Dehombreux, P.; Ducobu, F. Explainable AI for tool condition monitoring using Explainable Boosting Machine. Procedia CIRP 2025, 133, 138–143.
38. Nti, I.K.; Nyarko-Boateng, O.; Aning, J. Performance of Machine Learning Algorithms with Different K Values in K-fold Cross-Validation. Int. J. Inf. Technol. Comput. Sci. 2021, 13, 61–71.
39. Amministrazioni Comunali. (n.d.). Metropolitan area of Bari. Available online: https://www.amministrazionicomunali.it/citta-metropolitana-di-bari (accessed on 1 June 2025). (In Italian)
40. ISPRA. Italian National Institute for Environmental Protection and Research. Available online: https://www.isprambiente.gov.it/it (accessed on 1 June 2025). (In Italian)
41. Guan, R.; Yu, J.; Li, M.; Yan, J.; Liu, Z. Preparation of electrochemical sensor assisted unmanned aerial vehicles system for SO2, O3, NO2, CO and PM2.5/PM10 detection in air. Int. J. Electrochem. Sci. 2021, 16, 211021.
42. Lambey, V.; Prasad, A.D. Measurement of PM10, PM2.5, NO2, and SO2 Using Sensors. In Applied Geography and Geoinformatics for Sustainable Development; Boonpook, W., Lin, Z., Meksangsouy, P., Wetchayont, P., Eds.; Springer: Cham, Switzerland, 2023.
43. Petruci, J.F.S.; Barreto, D.N.; Dias, M.A.; Felix, E.P.; Cardoso, A.A. Analytical methods applied for ozone gas detection: A review. TrAC Trends Anal. Chem. 2022, 149, 116552.
44. Zeb, B.; Ditta, A.; Alam, K.; Sorooshian, A.; Din, B.U.; Iqbal, R.; Rahman, M.H.U.; Raza, A.; Alwahibi, M.S.; Elshikh, M.S. Wintertime investigation of PM10 concentrations, sources, and relationship with different meteorological parameters. Sci. Rep. 2024, 14, 154.
45. Afzal, J.; Zhou, Y.; Afzal, U.; Aslam, M. A complex wireless sensors model (CWSM) for real-time monitoring of dam temperature. Heliyon 2023, 9, e13371.
46. ARPA Puglia. Agenzia Regionale per la Protezione Ambientale–Rete di Monitoraggio Ambientale e Meteorologico. Available online: https://www.arpa.puglia.it (accessed on 1 June 2025). (In Italian)
47. Khatun, N. Applications of Normality Test in Statistical Analysis. Open J. Stat. 2021, 11, 113.
48. Zhao, S.; Yang, Z.; Musa, S.S.; Ran, J.; Chong, M.K.C.; Javanbakht, M.; He, D.; Wang, M.H. Attach importance of the bootstrap t test against Student’s t test in clinical epidemiology: A demonstrative comparison using COVID-19 as an example. Epidemiol. Infect. 2021, 149, e107.
49. Santos, M.R.; Guedes, A.; Sanchez-Gendriz, I. SHapley Additive exPlanations (SHAP) for efficient feature selection in rolling bearing fault diagnosis. Mach. Learn. Knowl. Extr. 2024, 6, 316–341.
Title | Authors | Machine Learning Models Used | Objective | Key Findings |
---|---|---|---|---|
Application of machine learning approaches to predict the impact of ambient air pollution on outpatient visits for acute respiratory infections | Ravindra, Katoch, Mor et al. (2023) [11] | RF, KNN, LR, LASSO, DT, SVR, XGBoost, DNN | Predict outpatient visits for respiratory infections using air pollution data | Air pollutants showed significant predictive power; Random Forest yielded the best performance |
Prediction of emergency department presentations for acute coronary syndrome using a machine learning approach | Kurucz, Schenk, Veelo et al. (2024) [12] | MLR, RF | Forecast ED visits for acute coronary syndrome using meteorological variables | Seasonal and daily patterns identified; strong predictive accuracy for UA and NSTEMI (R2 = 0.80) |
A Stacking Ensemble Model to Predict Daily Number of Hospital Admissions for Cardiovascular Diseases | Hu, Qiu, Su et al. (2020) [13] | LR, SVR, XGBoost, RF, GBDT, Stacking | Predict daily hospitalizations for CVD using an ensemble of base models | The stacking model outperformed all base models in terms of MAE, RMSE, MAPE, and R2 |
Interactive short-term effects of meteorological factors and air pollution on hospital admissions for cardiovascular diseases | He, Zhai, Liu et al. (2022) [6] | Generalized Additive Models (GAMs) with interactions | Examine the interactive short-term effects of meteorological factors and pollutants on CVD | Synergistic effects found between cold temperatures and PM pollution, particularly among elderly patients |
Feasibility of machine learning methods for predicting hospital emergency room visits for respiratory diseases | Lu, Bu, Xia, Lu, Yao, Jiang (2021) [10] | ARIMA, MLP, LSTM | Forecast ER visits for respiratory conditions based on PM2.5 exposure in Beijing | LSTM and MLP showed strong predictive performance; PM2.5 was a critical risk factor |
Environmental risk factors and cardiovascular diseases: a comprehensive expert review | Münzel, Hahad, Lelieveld et al. (2021) [14] | Narrative Review | Analyze physical and psychosocial environmental factors influencing cardiovascular disease | Identified pathophysiological mechanisms; proposed mitigation strategies for air, noise, and light pollution |
Statistic | Tavg | DEW | P_atm | RH | CO | O3 | PM10 | NO2 | CRDs |
---|---|---|---|---|---|---|---|---|---|
max | 31.7 | 23.7 | 1031.3 | 90.7 | 2.8 | 134.9 | 74.8 | 112.8 | 37.8 |
min | 2.8 | −3.6 | 987.8 | 39.8 | 0.1 | 40.8 | 7.4 | 19.0 | 4.9 |
avg | 17.8 | 12.1 | 1011.8 | 70.3 | 0.7 | 82.8 | 22.7 | 53.0 | 16.8 |
std | 6.2 | 5.3 | 6.9 | 7.6 | 0.4 | 18.4 | 7.2 | 14.8 | 6.4 |
99th | 29.6 | 21.4 | 1027.2 | 86.1 | 1.7 | 122.4 | 44.0 | 93.8 | 33.6 |
95th | 27.6 | 20.2 | 1022.9 | 82.1 | 1.4 | 111.4 | 36.1 | 79.3 | 29.1 |
90th | 26.4 | 19.1 | 1020.1 | 80.1 | 1.2 | 106.3 | 32.2 | 72.9 | 25.9 |
75th | 23.5 | 16.7 | 1016.2 | 75.8 | 0.8 | 97.3 | 26.5 | 62.0 | 20.9 |
50th | 17.2 | 12.1 | 1012.5 | 70.4 | 0.6 | 82.9 | 21.4 | 51.4 | 16.4 |
25th | 12.3 | 7.7 | 1006.6 | 65.2 | 0.4 | 67.9 | 17.6 | 43.0 | 11.5 |
Optuna Optimization

Model | Hyperparameter | Range | Adopted Value |
---|---|---|---|
XGBoost | learning_rate | 0.01–0.3 | 0.05 |
XGBoost | max_depth | 3–10 | 8 |
XGBoost | n_estimators | 100–1000 | 645 |
XGBoost | subsample | 0.2–1.0 | 0.43 |
Random Forest | n_estimators | 100–1000 | 995 |
Random Forest | max_depth | 3–30 | 20 |
Random Forest | min_samples_split | 2–10 | 4 |
LightGBM | n_estimators | 100–1000 | 717 |
LightGBM | max_depth | −1–30 | −1 |
LightGBM | learning_rate | 0.01–0.3 | 0.07 |
LightGBM | num_leaves | 20–300 | 128 |
LightGBM | subsample | 0.5–1.0 | 0.91 |
LightGBM | colsample_bytree | 0.5–1.0 | 0.9 |
EBM | max_bins | 64–512 | 429 |
EBM | max_interaction_bins | 32–256 | 208 |
EBM | interactions | 0–10 | 10 |
EBM | learning_rate | 0.01–0.3 | 0.1 |
EBM | min_samples_leaf | 2–50 | 46 |
EBM | max_leaves | 2–64 | 8 |
Random Forest | XGBoost | LightGBM | EBM |
---|---|---|---|
R2 (−) = 0.881 | R2 (−) = 0.901 | R2 (−) = 0.896 | R2 (−) = 0.813 |
MAE (case/day) = 0.051 | MAE (case/day) = 0.047 | MAE (case/day) = 0.048 | MAE (case/day) = 0.066 |
MSE (−) = 0.005 | MSE (−) = 0.004 | MSE (−) = 0.004 | MSE (−) = 0.007 |
RMSE (−) = 0.068 | RMSE (−) = 0.062 | RMSE (−) = 0.063 | RMSE (−) = 0.085 |
Model 1 | Model 2 | p-Value | Significant |
---|---|---|---|
XGBoost | Random Forest | <0.001 | Yes |
XGBoost | LightGBM | <0.001 | Yes |
XGBoost | EBM | <0.001 | Yes |
Article Title | ML Models Used | Target | R2 (Test) | MAE (Test) (Cases/Day) |
---|---|---|---|---|
Feasibility of machine learning methods for predicting hospital emergency room visits for respiratory diseases (Lu, Bu, Xia et al., 2021) [10] | ARIMA, MLP, LSTM | Hospital visits for respiratory diseases | ARIMA: 0.70 MLP: 0.80 LSTM: 0.78 | ARIMA: 99 MLP: 49 LSTM: 33 |
Application of machine learning approaches to predict the impact of ambient air pollution on outpatient visits for acute respiratory infections (Ravindra, Katoch, Mor et al., 2023) [11] | RF, KNN, LR, LASSO, DT, SVR, XGBoost, DNN | Outpatient visits for acute respiratory infections (ARIs) | RF: 0.61 (ARI) 0.87 (total patients) | n.a. |
Prediction of emergency department presentations for acute coronary syndrome using a machine learning approach (Kurucz, Schenk, Veelo et al., 2024) [12] | MLR, RF | Emergency visits for acute coronary syndrome (ACS) | MLR: 0.66 (overall) 0.80 (unstable angina) | 7.8 (overall) 5.3 (unstable angina) |
A Stacking Ensemble Model to Predict Daily Number of Hospital Admissions for Cardiovascular Diseases (Hu, Qiu, Su et al., 2020) [13] | LR, SVR, XGBoost, RF, GBDT, Stacking | Hospital admissions for cardiovascular diseases | Stacking: 0.90 | Stacking: 20.69 |
[Current Study] Machine Learning interpretability to analyze the interaction between cardiorespiratory diseases and meteo-pollutant sensor data | XGBoost (selected among RF, LightGBM, and EBM) | Hospital admissions for cardiorespiratory diseases | XGBoost: 0.90 | XGBoost: 0.05 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Telesca, V.; Rondinone, M. An Interpretable Machine Learning Framework for Analyzing the Interaction Between Cardiorespiratory Diseases and Meteo-Pollutant Sensor Data. Sensors 2025, 25, 4864. https://doi.org/10.3390/s25154864