Proceeding Paper

Insights into Air Quality Index (AQI) Variability with Explainable Machine Learning Techniques †

by Claudio Andenna 1 and Roberta Valentina Gagliardi 2,*
1 Istituto Nazionale per l’Assicurazione contro gli Infortuni sul Lavoro (INAIL-DIT), 00143 Rome, Italy
2 Istituto Superiore di Sanità, 00161 Rome, Italy
* Author to whom correspondence should be addressed.
† Presented at the 7th International Electronic Conference on Atmospheric Sciences (ECAS-7), 4–6 June 2025; Available online: https://sciforum.net/event/ECAS2025.
Environ. Earth Sci. Proc. 2025, 34(1), 1; https://doi.org/10.3390/eesp2025034001
Published: 5 August 2025

Abstract

In this study, a combined approach joining the machine learning model Extreme Gradient Boosting (XGBoost) with Shapley Additive Explanation (SHAP) is adopted to simulate the temporal pattern of the air quality index (AQI) and subsequently explore the key factors affecting AQI variability. Based on the analysis of air pollutants and meteorological data acquired from two air quality monitoring stations in Rome (Italy), over the 2018–2022 period, the results demonstrate the effectiveness of the proposed methodological approach in elucidating the role of the main factors driving AQI evolution, and their interaction effects.

1. Introduction

The air quality index (AQI) is a dimensionless indicator used to provide real-time air quality and health information [1,2]. Although member states in the European Union are subject to the same air quality legislation [3], a common AQI has not been adopted, and each country applies its own [4]. In Italy, a decentralized network of 21 regional environmental protection agencies (ARPAs) monitors and manages air quality, with the national-level agency (National System for the Environmental Protection, SNPA) coordinating these efforts. While the calculation of the AQI may vary slightly between Italian regions, the basic principle remains the same: the AQI is built from criteria pollutants, whose short-term average concentrations are compared to the short-term limit values set by the current legislation [5]. The pollutant with the highest value relative to its limit value determines the AQI value. Finally, the AQI values are grouped into five classes with a uniform interval width of 50, indicating good, acceptable, moderate, poor, and very poor levels of air quality.
From a health perspective, measuring the AQI based on the dominant pollutant may not accurately reflect our current understanding of the adverse health effects of ambient air pollution, especially in circumstances involving the combined effect of multiple pollutants [6] or low-level exposure [7]. Therefore, a deeper characterization of the factors affecting AQI variability would be beneficial to provide more reliable data and information for air pollution management and more effective health protection strategies.
Stochastic models such as ARIMA, SARIMA, ARIMAX, and SARIMAX have traditionally been used for air pollutant time series modeling [8]. More recently, comparative studies of time series forecasting techniques have shown the superiority of machine learning models [9,10].
Therefore, a combined approach integrating a tree-based machine learning (ML) algorithm (XGBoost) [11] with a game theory-based explanation model (SHAP) [12] has been implemented in this study. The objective is to simulate the temporal pattern of AQI and then explore the influence of various driving factors and their interactions on AQI variability.
Despite their remarkable predictive capabilities, ML models are often regarded as “black boxes”: it is difficult to obtain from the algorithm a suitable explanation of how it arrived at an answer [13,14]. This lack of interpretability and explainability is a major challenge when the rationale behind a model’s output must be explained. To overcome this issue, explainable artificial intelligence (XAI) strategies have been developed to understand how an ML model makes its predictions [15] and to assess the impact of individual factors and their interrelationships on the model output.
The proposed methodological framework has been applied to a dataset comprising hourly data for air pollutant concentrations and meteorological parameters acquired, over the 2018–2022 period, in Rome (central Italy). The results obtained from this study demonstrate the efficacy of both XGBoost and SHAP algorithms in predicting and explaining AQI variability. The innovative character of the proposed work consists of building a machine learning model that simulates AQI as a function of parameters that are not present in its calculation formula. Moreover, through the SHAP analysis, the work carried out also proposes an examination of the complex interconnections between air pollutants and meteorological factors and AQI.

2. Materials and Methods

2.1. Study Area

The area of interest in this study is Rome (Lat. 41.91 N, Lon. 12.48 E), the capital of Italy, located about 27 km from the coast of the Tyrrhenian Sea. The urban area covers approximately 1300 km2 and includes highly urbanized zones alternating with parks and green areas. The territory is mainly flat, with an altitude between 13 and 140 m above sea level, while the rural areas surrounding the city have a more complex hilly orography. According to the Köppen–Geiger climate classification [16], Rome belongs to the Mediterranean climate class (Csa), i.e., a temperate climate with dry and hot summers [17]. In the absence of significant industrial plants, the air quality is strongly influenced by local emissions (vehicular traffic and domestic heating) and by advective phenomena [18]. A network of 18 air quality monitoring stations and 4 weather stations is managed by the Agency for Environmental Protection of the Lazio Region (ARPAL); acquired data are made publicly available after a validation process. According to the reports on environmental data drafted by ARPAL [19], PM10, NO2 and O3 exceeded target values in the urban area over the 2013–2022 period.

2.2. Observational Dataset

To build the working dataset, hourly concentrations of five air pollutants, namely ozone (O3), nitrogen dioxide (NO2), nitrogen oxide (NO), and particulate matter (PM10 and PM2.5), as well as three meteorological parameters, namely temperature (T), relative humidity (RH) and wind speed (ws), were downloaded from the official ARPAL website (https://www.arpalazio.net/main/aria/sci/basedati/chimici/chimici.php, accessed on 7 March 2025; https://www.arpalazio.it/rete-micro-meteorologica, accessed on 10 March 2025). The data were acquired from two air quality monitoring stations selected according to their location: Villa Ada (VA), in downtown Rome on the edge of a large green area, and Castel di Guido (CG), in a protected natural area in the surrounding countryside. They are classified as urban and rural background stations, respectively, and are therefore representative of pollution conditions in the urban and rural environment. For the meteorological parameters, the weather station nearest to each site was selected, i.e., Boncompagni and Castel di Guido, respectively. Figure 1a,b show a map of the studied area and the AQI temporal pattern at the CG and VA sites.
The daily AQI has been calculated according to the equation:
$$ \mathrm{AQI} = \max_i I_i, \qquad I_i = 100\,\frac{C_i}{S_i} $$
where i indicates the pollutant; Ci is the hourly concentration for NO2, the 8 h moving average for O3, and the daily average for PM10; and Si is the corresponding threshold level for the protection of human health established by the legislation.
As this is a daily index, the reference pollutants are those for which parameters with a short-term average are established by current legislation [5] and whose effect could pose a potential risk to the population. The reference values are as follows: the hourly maximum average for NO2 is 200 µg/m3, the 8 h average for O3 is 120 µg/m3, and the daily average for PM10 is 50 µg/m3. Overall, a dataset consisting of 3652 daily observations, covering the 2018–2022 period, was set up for the development of the XGBoost model.
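As an illustration of the calculation described above, the following minimal Python sketch computes the sub-indices, the resulting AQI and its class for one day. It is not the authors' R code. The function names, the example concentrations, the treatment of class boundaries (upper bound included) and the open-ended last class are assumptions made for the sketch; only the three limit values and the five class labels come from the text.

```python
import math

# Short-term limit values (µg/m3) quoted in the text.
LIMITS = {"NO2": 200.0, "O3": 120.0, "PM10": 50.0}

# Five classes of width 50; boundary handling is an assumption of this sketch.
CLASSES = ["good", "acceptable", "moderate", "poor", "very poor"]


def sub_indices(conc):
    """Return I_i = 100 * C_i / S_i for each reference pollutant."""
    return {p: 100.0 * conc[p] / LIMITS[p] for p in LIMITS}


def daily_aqi(conc):
    """Return (AQI, dominant pollutant) from daily aggregated concentrations."""
    idx = sub_indices(conc)
    dominant = max(idx, key=idx.get)
    return idx[dominant], dominant


def aqi_class(aqi):
    """Map an AQI value to a class: (0, 50] good, (50, 100] acceptable, ..."""
    k = math.ceil(aqi / 50.0) - 1
    return CLASSES[min(max(k, 0), len(CLASSES) - 1)]


# Invented daily aggregates (µg/m3): maximum hourly NO2, maximum 8 h moving
# average O3, and daily mean PM10.
day = {"NO2": 80.0, "O3": 108.0, "PM10": 30.0}
aqi, dominant = daily_aqi(day)
print(aqi, dominant, aqi_class(aqi))
```

In this invented example the O3 sub-index is the largest of the three, so O3 is the dominant pollutant and sets the AQI for the day.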

2.3. Methodological Approach

Data loading, processing, statistical analysis, and modeling were accomplished in the R software environment and its associated packages (R version 4.3.1).

2.3.1. XGBoost

The XGBoost algorithm is a supervised machine learning technique based on an ensemble of decision trees, designed to solve classification and regression problems. It has been chosen to predict the time-varying AQI pattern due to its scalability, speed, performance, and interpretability compared to other algorithms [20].
Taking the daily AQI as the model output (or target), the model can be expressed in terms of 8 features (or explanatory variables), as follows:
$$ \mathrm{AQI} \sim \mathrm{xgboost}(\mathrm{O_3},\ \mathrm{NO_2},\ \mathrm{NO},\ \mathrm{PM_{10}},\ \mathrm{PM_{2.5}},\ T,\ \mathrm{RH},\ \mathrm{ws}) $$
where xgboost is the function implementing the boosted regression tree technique in the R software environment (xgboost R package 1.7.8.1), and the chosen features are available at all the sites considered. To build the model, the observational dataset has been randomly partitioned into a training dataset for the model development (75% of the observations) and a test dataset used to check the model performance (25% of the observations). The hyperparameter tuning, as defined at https://xgboost.readthedocs.io/en/latest/parameter.html (accessed on 7 March 2025), was conducted by means of Bayesian optimization (ParBayesianOptimization R package 1.2.6). The skill of the XGBoost models has been evaluated by comparing predicted and observed AQI values using the following statistical indicators for the training and test datasets: the coefficient of determination (R2), the mean bias error (MBE), and the root mean square error (RMSE).
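The partitioning and the three skill indicators can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R workflow. The sign convention for MBE (predicted minus observed), the definition of R2 as one minus the ratio of residual to total sum of squares, the random seed and the example values are all assumptions, since the text does not specify them.

```python
import math
import random


def train_test_split(rows, train_frac=0.75, seed=0):
    """Randomly partition rows into training and test subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mbe(obs, pred):
    """Mean bias error, here taken as mean(predicted - observed)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, here 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# 3652 row indices stand in for the daily observations of the dataset.
rows = list(range(3652))
train, test = train_test_split(rows)
print(len(train), len(test))

# Invented observed and predicted AQI values, for illustration only.
obs = [40.0, 60.0, 80.0, 100.0]
pred = [42.0, 58.0, 83.0, 97.0]
print(mbe(obs, pred), rmse(obs, pred), r2(obs, pred))
```

A bias near zero together with a non-zero RMSE, as in the invented example, indicates errors that cancel on average rather than a systematic over- or under-prediction.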

2.3.2. SHAP

The SHAP method is a technique for explaining the output of any machine learning model. Based on coalitional game theory, this approach enables an understanding of how the model reaches its predictions by assessing each feature’s contribution to the model output and by addressing complex feature interactions [21]. For theoretical insights, readers can refer to [12]. In this study, once the XGBoost models had been fitted, SHAP (shapviz R package 0.9.7) was applied to their outcomes to elucidate the extent and the nature (positive or negative) of the contribution of each feature to the AQI predictions. The SHAP explanations are supported by several SHAP visualization tools enabling a more intuitive comprehension of the model output.
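To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a small hand-written function by enumerating all feature coalitions. It is a teaching example under stated assumptions, not the procedure used in this study: the toy model, its coefficients and the input values are invented, and "absent" features are simply replaced by a fixed baseline value, whereas SHAP implementations for tree models rely on dedicated algorithms and expectations over the data.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition keep their observed value; the others
        # are replaced by the baseline value.
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_model(z):
    # Hypothetical response with an O3-T interaction; NOT the fitted XGBoost model.
    return 0.5 * z["O3"] + 0.8 * z["PM10"] - 2.0 * z["ws"] + 0.01 * z["O3"] * z["T"]


x = {"O3": 100.0, "PM10": 30.0, "T": 30.0, "ws": 1.0}
baseline = {"O3": 60.0, "PM10": 25.0, "T": 15.0, "ws": 3.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi.values()), toy_model(x) - toy_model(baseline))
```

The two numbers on the last line agree: the contributions add up to the difference between the prediction and the baseline prediction, which is the additivity property that makes SHAP values readable as signed contributions to a single prediction.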

3. Results and Discussion

3.1. Explorative AQI Analysis

An explorative AQI data analysis was carried out before the XGBoost model was applied. The results show (i) a mostly acceptable level of AQI at both examined sites, i.e., an AQI smaller than or equal to 100 (Figure 1b); (ii) that the pollutants with the largest individual AQI are, in order, O3, PM10 and NO2; (iii) a seasonal shift in AQI, which is higher during spring and summer, indicating poorer air quality in these seasons; and (iv) a similar distribution of AQI values at the two sites. Notably, during the study period there was only one day when the AQI exceeded 150 at the CG site.

3.2. XGBoost Model Performances

The best hyperparameters of the XGBoost model obtained by Bayesian optimization are reported in Table 1.
As can be seen from Table 2 and the scatter plots in Figure 2a,b, a good degree of concordance is found between the observed and predicted AQI values over the 2018–2022 period.
Overall, the performances of the models are satisfactory for daily predictions and consistent with other studies [22].

3.3. SHAP Analysis

For each site, Figure 3 summarizes, in descending order, the relative importance as well as the positive or negative contribution of each feature to the target prediction. The number on the left is the global importance of the features based on the mean absolute SHAP value. O3, PM10, PM2.5, T, and ws turn out to be the most important features. The order of the most essential features is slightly different between the two sites; however, the overall contribution of O3 and PM10 to AQI is predominant at each site. Moreover, the AQI is more sensitive to air pollutants than meteorological factors, which is consistent with the fact that the AQI is calculated according to the concentrations of three air pollutants.
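The global importance used for this ranking, the mean absolute SHAP value per feature, can be sketched as follows. The SHAP values in the example are invented for illustration and are not taken from Figure 3; the function name is hypothetical.

```python
def global_importance(shap_rows):
    """Rank features by mean absolute SHAP value, largest first."""
    features = list(shap_rows[0])
    n = len(shap_rows)
    importance = {f: sum(abs(row[f]) for row in shap_rows) / n for f in features}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Invented per-observation SHAP values (one dict per day).
rows = [
    {"O3": 12.0, "PM10": -4.0, "T": 3.0, "ws": -1.0},
    {"O3": -8.0, "PM10": 6.0, "T": -1.0, "ws": 2.0},
    {"O3": 10.0, "PM10": 5.0, "T": 2.0, "ws": -6.0},
]
for feature, value in global_importance(rows):
    print(feature, value)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.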
It is worth emphasizing the role of PM2.5, which is not directly included in the AQI calculation but has been integrated into the model as a predictor due to its known strong interconnection with O3 [23]. NO2 and NO provide a negligible contribution, although, according to [19], NO2 still represents a criticality in the Rome area together with O3. Among the meteorological features considered, ws and T are the most important, while the contribution of RH is much smaller.
At both sites, elevated concentrations of O3 and particulate matter (PM10 and PM2.5) are found to have a positive SHAP value, thereby contributing to a higher prediction of AQI values. As expected, this finding indicates a correlation between elevated concentrations of pollutants and high AQI values.
The role of meteorological parameters in the AQI is more intricate than that of the pollutants. Elevated T values are found to have a positive SHAP value, indicating a positive correlation between AQI and T. This can be attributed to the fact that temperature enhances the formation of O3 [24], which is the primary contributor to the AQI. Conversely, higher ws values have a negative SHAP value, thus contributing to a lower prediction of AQI values, which translates into better air quality and is likely attributable to pollutant dispersion [25].
A seasonal analysis confirmed O3 as the most significant feature at both locations and in each season except in winter when the contribution of PM10 becomes more significant. Moreover, it identifies a relevant role of T in summer and winter at the CG and VA sites, as well as a relevant role of ws in spring, summer, and fall at the VA site.
Insights into the shape of the relationship between the feature values and their impact on the prediction can be obtained with the SHAP dependence plots. Figure 4a illustrates the relationship between O3, the most significant feature for the VA site, and its SHAP values. The analysis of this relationship enables the identification of the turning points, i.e., points where the feature contribution to the AQI prediction undergoes an inversion in behavior, from positive to negative or vice versa, with respect to the base value. Figure 4a shows that O3 has a positive SHAP value above 50 µg/m3.
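One simple way to locate such turning points numerically is to look for sign changes of the SHAP value along the sorted feature axis and interpolate linearly between the two bracketing points. The sketch below does this on synthetic values that were chosen so that the crossing falls at 50; they are not read off Figure 4a. With real dependence plots, which show vertical spread, the SHAP values would first need to be smoothed or averaged within bins (an assumption about practice, not a step described in the text).

```python
def turning_points(feature_values, shap_values):
    """Feature values at which the SHAP contribution crosses or touches zero."""
    pairs = sorted(zip(feature_values, shap_values))
    crossings = []
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 == 0.0:
            crossings.append(x0)
        elif y0 * y1 < 0.0:
            # Linear interpolation between the two bracketing points.
            crossings.append(x0 + (0.0 - y0) * (x1 - x0) / (y1 - y0))
    return crossings


# Synthetic O3 concentrations (µg/m3) and SHAP values, for illustration only.
o3 = [20.0, 40.0, 60.0, 80.0, 100.0]
shap_o3 = [-6.0, -2.0, 2.0, 6.0, 10.0]
print(turning_points(o3, shap_o3))
```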
The vertical spread of SHAP values at a fixed value of the feature, which can be seen in Figure 4a, is a sign of interaction effects with all the other features in the model. To explain the effect of the interactions, the dependence plot showing the SHAP values of O3 at the VA site has been split into two parts, as shown in Figure 4b,c. The first represents the impact of O3 on the AQI after all interaction effects have been removed; the second represents the interaction between O3 and PM10 which is the strongest interaction determined by SHAP theory. Following the red arrow, we can see that below the turning point, increasing the PM10 values at a fixed O3 level increases the SHAP interaction values, which in turn increases the AQI and reduces air quality.
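The split into a main effect and interaction effects can be written compactly. In the usual formulation of SHAP interaction values for tree ensembles, and using symbols that are assumptions of this note (φ_i for the SHAP value of feature i, φ_{i,i} for its main effect, φ_{i,j} for the interaction effect between features i and j attributed to feature i, and f for the model), the decomposition reads:

```latex
\phi_i = \phi_{i,i} + \sum_{j \neq i} \phi_{i,j},
\qquad
\sum_{i} \sum_{j} \phi_{i,j} = f(x) - \mathbb{E}\left[ f(X) \right]
```

Read this way, Figure 4b corresponds to the main-effect term for O3 and Figure 4c to the single interaction term pairing O3 with PM10; the remaining interaction terms account for the rest of the vertical spread in Figure 4a.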
Finally, a short-term and site-specific analysis has been carried out to reveal some peculiarities in the variability of AQI during a selected pollution event, identified on 24 April 2019. Figure 5 illustrates the time trend of the distinctive features contributing to an AQI value higher than 150, showing that on the peak day all features push the prediction above the base value. Moreover, PM10 has a dominant role that is not apparent in the seasonal analysis. According to the analysis of the available metadata, this event can be attributed to a massive dust storm pushing Saharan dust over the Mediterranean Sea in late April 2019, which has been well documented at national and European levels [26].

4. Conclusions

The XGBoost model accurately predicts the AQI variability, showing a coefficient of determination (R2) of 0.88 for the testing datasets. The remaining discrepancies between the predicted and observed AQI levels may be due to the lack of additional predictors, such as the height of the boundary layer or the contribution of other pollutants, such as volatile organic compounds (VOCs). The SHAP analysis makes it possible to identify the main determinants of AQI variability, both over the study period and in relation to a specific event, as well as the effect of the main interactions between features. It is also worth noting that our analysis suggests a relevant role of factors that are sensitive to climate change, such as O3 and T, in determining the variability of the AQI.
It is important to specify that caution should be exercised when deriving insights from SHAP analyses. SHAP only describes what the model is doing within the context of the data on which it has been trained. However, even if it does not necessarily reveal the true relationship between variables and outcomes in the real world, it can help generate hypotheses that unveil implicit knowledge or even new knowledge.
Overall, the results obtained highlight the effectiveness of an integrated approach combining ML and SHAP algorithms in elucidating the main factors driving AQI evolution. This knowledge is expected to be useful for optimizing air pollution prevention and mitigation measures for the protection of public health in the examined area.

Author Contributions

Conceptualization, C.A. and R.V.G.; methodology, C.A. and R.V.G.; software, C.A. and R.V.G.; validation, C.A. and R.V.G.; formal analysis, C.A. and R.V.G.; investigation, C.A. and R.V.G.; resources, C.A. and R.V.G.; data curation, C.A. and R.V.G.; writing—original draft preparation, C.A. and R.V.G.; writing—review and editing, C.A. and R.V.G.; visualization, C.A. and R.V.G.; supervision, C.A. and R.V.G.; project administration, C.A. and R.V.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this work have been downloaded from https://www.arpalazio.net/main/aria/sci/basedati/chimici/chimici.php and https://www.arpalazio.it/rete-micro-meteorologica accessed on 7 March 2025 and 10 March 2025, respectively.

Acknowledgments

The authors are grateful to the Environmental Protection Agency of Lazio Region for providing the open data used in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Plaia, A.; Ruggieri, M. Air Quality Indices: A Review. Rev. Env. Sci. Biotechnol. 2011, 10, 165–179.
  2. Dimitriou, K.; Paschalidou, A.K.; Kassomenos, P.A. Assessing Air Quality with Regards to Its Effect on Human Health in the European Union through Air Quality Indices. Ecol. Indic. 2013, 27, 108–115.
  3. European Union. EU Directive (EU) 2024/2881 of the European Parliament and of the Council of 23 October 2024 on Ambient Air Quality and Cleaner Air for Europe (Recast); European Union: Luxembourg, 2024.
  4. Karavas, Z.; Karayannis, V.; Moustakas, K. Comparative Study of Air Quality Indices in the European Union towards Adopting a Common Air Quality Index. Energy Environ. 2021, 32, 959–980.
  5. Gazzetta Ufficiale. Legislative Decree 155-13-8-2010 Attuazione Della Direttiva 2008/50/CE Relativa Alla Qualità Dell’aria Ambiente e per Un’aria Più Pulita in Europa; Gazzetta Ufficiale n. 216 del 15-09-2010, Suppl. Ordinario 217; Gazzetta Ufficiale: Rome, Italy, 2010.
  6. Tan, X.; Han, L.; Zhang, X.; Zhou, W.; Li, W.; Qian, Y. A Review of Current Air Quality Indexes and Improvements under the Multi-Contaminant Air Pollution Exposure. J. Environ. Manag. 2021, 279, 111681.
  7. Adebayo-Ojo, T.C.; Wichmann, J.; Arowosegbe, O.O.; Probst-Hensch, N.; Schindler, C.; Künzli, N. A New Global Air Quality Health Index Based on the WHO Air Quality Guideline Values with Application in Cape Town. Int. J. Public Health 2023, 68, 1606349.
  8. Sun, X.; Tian, Z. A Novel Air Quality Index Prediction Model Based on Variational Mode Decomposition and SARIMA-GA-TCN. Process Saf. Environ. Prot. 2024, 184, 961–992.
  9. Das, R.; Middya, A.I.; Roy, S. High Granular and Short Term Time Series Forecasting of PM 2.5 Air Pollutant - A Comparative Review. Artif. Intell. Rev. 2022, 55, 1253–1287.
  10. Zhou, K.; Wang, W.Y.; Hu, T.; Wu, C.H. Comparison of Time Series Forecasting Based on Statistical ARIMA Model and LSTM with Attention Mechanism. J. Phys. Conf. Ser. 2020, 1631, 012141.
  11. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  12. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
  13. Peng, Z.; Zhang, B.; Wang, D.; Niu, X.; Sun, J.; Xu, H.; Cao, J.; Shen, Z. Application of Machine Learning in Atmospheric Pollution Research: A State-of-Art Review. Sci. Total Environ. 2024, 910, 168588.
  14. Chakraborty, S.; Misra, B.; Dey, N. Explainable Artificial Intelligence (XAI) for Air Quality Assessment. Des. Stud. Intell. Eng. 2024, 383, 333–341.
  15. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
  16. Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and Future Köppen-Geiger Climate Classification Maps at 1-Km Resolution. Sci. Data 2018, 5, 180214.
  17. Di Bernardino, A.; Iannarelli, A.M.; Casadio, S.; Pisacane, G.; Siani, A.M. Spatial-Temporal Assessment of Air Quality in Rome (Italy) Based on Anemological Clustering. Atmos. Pollut. Res. 2023, 14, 101670.
  18. Gobbi, G.P.; Angelini, F.; Barnaba, F.; Costabile, F.; Baldasano, J.M.; Basart, S.; Sozzi, R.; Bolignano, A. Changes in Particulate Matter Physical Properties during Saharan Advections over Rome (Italy): A Four-Year Study, 2001–2004. Atmos. Chem. Phys. 2013, 13, 7395–7404.
  19. ARPA Lazio. Qualità Dell’aria Nella Regione Lazio. Analisi Delle Serie Storiche Dei Principali Inquinanti 2013–2022; ARPA Lazio: Rieti, Italy, 2023; Available online: https://www.arpalazio.it/web/guest/pubblicazioni (accessed on 7 March 2025).
  20. Ma, J.; Ding, Y.; Cheng, J.C.P.; Jiang, F.; Tan, Y.; Gan, V.J.L.; Wan, Z. Identification of High Impact Factors of Air Quality on a National Scale Using Big Data and Machine Learning Techniques. J. Clean. Prod. 2020, 244, 118955.
  21. Ponce-Bobadilla, A.V.; Schmitt, V.; Maier, C.S.; Mensing, S.; Stodtmann, S. Practical Guide to SHAP Analysis: Explaining Supervised Machine Learning Model Predictions in Drug Development. Clin. Transl. Sci. 2024, 17, e70056.
  22. Liu, H.; Li, Q.; Yu, D.; Gu, Y. Air Quality Index and Air Pollutant Concentration Prediction Based on Machine Learning Algorithms. Appl. Sci. 2019, 9, 4069.
  23. He, J.; Wang, T.; Li, H.; Zhou, Y.; Liu, Y.; Xu, A. Synergistic Toxicity of Fine Particulate Matter and Ozone and Their Underlying Mechanisms. Toxics 2025, 13, 236.
  24. Pusede, S.E.; Steiner, A.L.; Cohen, R.C. Temperature and Recent Trends in the Chemistry of Continental Surface Ozone. Chem. Rev. 2015, 115, 3898–3918.
  25. Xie, J.; Sun, T.; Liu, C.; Li, L.; Xu, X.; Miao, S.; Lin, L.; Chen, Y.; Fan, S. Quantitative Evaluation of Impacts of the Steadiness and Duration of Urban Surface Wind Patterns on Air Quality. Sci. Total Environ. 2022, 850, 157957.
  26. Peshev, Z.; Deleva, A.; Vulkova, L.; Dreischuh, T. Large-Scale Saharan Dust Episode in April 2019: Study of Desert Aerosol Loads over Sofia, Bulgaria, Using Remote Sensing, In Situ, and Modeling Resources. Atmosphere 2022, 13, 981.
Figure 1. (a) Location of the study area, the measurement sites (red markers), and the weather stations (blue markers); (b) AQI temporal pattern at CG and VA.
Figure 2. Scatter plot of AQI predicted vs. observed values on the test dataset for (a) CG and (b) VA sites. R2 values and the equation of the best fit line are also shown.
Figure 3. Summary plots of the SHAP values for O3. (a) CG site and (b) VA site. The values on the left side represent the absolute mean of the SHAP values.
Figure 4. Dependence plots of the SHAP values for (a) O3, (b) O3’s main effect, and (c) the O3—PM10 interaction effect.
Figure 5. Stacked area time series of the individual SHAP values. The values in brackets are each feature’s SHAP contribution over the month of April.
Table 1. Best XGBoost model hyperparameters.

| Eta | Max_Depth | Min_Child_Weight | Subsample | Colsample_Bytree | Gamma |
|-----|-----------|------------------|-----------|------------------|-------|
| 0.2 | 8 | 8 | 0.5 | 0.9 | 1.73 |
Table 2. AQI XGBoost model performance on training and testing data during the period 2018–2022.

| Statistical Indicator | Castel di Guido | Villa Ada |
|-----------------------|-----------------|-----------|
| R2 | 0.88 | 0.88 |
| MBE | −0.13 | 0.30 |
| RMSE | 5.96 | 6.93 |