Article

Exploring the Influencing Factors of Surface Ozone Variability by Explainable Machine Learning: A Case Study in the Basilicata Region (Southern Italy)

by Roberta Valentina Gagliardi 1,* and Claudio Andenna 2
1 Istituto Superiore di Sanità, 00161 Rome, Italy
2 Istituto Nazionale per l’Assicurazione contro gli Infortuni sul Lavoro (INAIL-DIT), 00143 Rome, Italy
* Author to whom correspondence should be addressed.
Atmosphere 2025, 16(5), 491; https://doi.org/10.3390/atmos16050491
Submission received: 18 March 2025 / Revised: 16 April 2025 / Accepted: 21 April 2025 / Published: 24 April 2025
(This article belongs to the Section Air Quality)

Abstract

Exposure to high concentrations of surface ozone (O3), a major air pollutant and greenhouse gas, constitutes a significant public health concern, especially considering the potential adverse impact of climate change on future O3 values. The implementation of increasingly effective methods to assess the factors determining the formation and variability of O3 is, therefore, of great significance. In this study, a methodological approach combining both supervised and unsupervised machine learning algorithms (MLAs) with the Shapley additive explanations (SHAP) method was used to understand the key factors behind O3 variability and to explore the nonlinear relationships linking O3 to these factors. The SHAP analysis carried out at different event scales indicated (i) the dominant role of the meteorological variables, mainly relative humidity, wind speed, and temperature, in driving O3 variability throughout the study period; (ii) an increase in the contribution of temperature, nitrogen oxides, and carbon monoxide to high O3 concentrations during a selected pollution event; and (iii) the predominant effect of wind speed and relative humidity in shaping the O3 daily patterns clustered using the k-means technique. The results obtained are expected to be useful for the definition of effective measures to prevent and/or mitigate the health damage associated with ozone exposure.

1. Introduction

Exposure to surface ozone (O3), which is a secondary air pollutant, can cause adverse health impacts, primarily affecting the human respiratory and cardiovascular systems [1,2,3]. As a strong oxidant, ozone is also harmful to vegetation [4] and materials [5]. No less important is the two-way interaction between tropospheric ozone and climate change. Firstly, tropospheric ozone affects climate change, to which it contributes by acting both as the third-most-important greenhouse gas and as an indirect controller of other greenhouse gases’ lifetimes [6,7]. Conversely, climate change can impact O3 levels in several ways, including altering meteorological factors, modifying stratosphere–troposphere exchange, and enhancing natural emissions of O3 precursors [8,9,10].
At the European level, despite the reduction of the air pollutant emissions recorded in the last decades [11], monitored data indicate that O3 levels frequently exceed the target value threshold for the protection of human health as established by the European Union (EU) Air Quality Directive [12], i.e., maximum daily 8 h mean ozone concentrations (MDA8) of 120 μg/m3, as well as the short-term guideline level established by the World Health Organization Air Quality Guidelines (WHO AQG) [13], i.e., MDA8 of 100 μg/m3.
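The MDA8 metric referred to above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: only 8 h windows lying entirely within one calendar day are considered, a window is accepted when at least 6 of its 8 hourly values are valid (an assumed completeness rule), and the hourly values are hypothetical. The two thresholds are the ones quoted in the text.

```python
def mda8(hourly):
    """Maximum daily 8-hour mean from 24 hourly values (None marks a missing hour).

    Simplified: windows are restricted to a single calendar day, and a window
    needs at least 6 valid hours (assumed completeness rule).
    """
    best = None
    for start in range(len(hourly) - 7):
        window = [v for v in hourly[start:start + 8] if v is not None]
        if len(window) < 6:
            continue
        mean = sum(window) / len(window)
        if best is None or mean > best:
            best = mean
    return best

EU_TARGET = 120.0  # µg/m3, EU target value quoted in the text
WHO_AQG = 100.0    # µg/m3, WHO short-term guideline level quoted in the text

# Hypothetical day: low morning values, an 8 h afternoon plateau, lower evening values
day = [60.0] * 10 + [130.0] * 8 + [70.0] * 6
value = mda8(day)
exceeds_eu = value > EU_TARGET
exceeds_who = value > WHO_AQG
```

A day is counted as an exceedance when its MDA8 is above the relevant threshold, which is how the EU target value and the WHO guideline level are compared in the text.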
The European Environment Agency (EEA) [14] estimates that, in 2022, 19% of the European urban population was exposed to O3 concentrations above the EU target value, while a significantly higher percentage, about 94%, was exposed to O3 levels exceeding the stricter 2021 WHO short-term guideline value for health protection. In Italy, in the same year, the percentage of urban population exposed to O3 concentrations above the EU target value was 58% [15], with exceedances recorded in greater numbers in the northern part of the country and, to a lesser extent, in the rest of Italy [16]. Considering the large percentage of the population at risk, exposure to high O3 concentrations constitutes a significant public health concern, especially in the scope of the potential adverse impact of climate change on future O3 values [17]. Therefore, it is of great importance to implement increasingly effective methods for assessing the effect of the main factors determining the formation and variability of O3.
At ground level, the primary factors influencing O3 formation are the precursor emissions, mainly nitrogen oxides (NOx ≡ NO + NO2), volatile organic compounds (VOC), carbon monoxide (CO), and methane (CH4), together with climatic factors such as temperature, humidity, precipitation, and wind speed, which affect the photochemical and transport processes regulating O3 production and destruction [18]. The complex nonlinear relationships between O3 and its many determinants make it a difficult task to accurately estimate O3 variability even considering the influence of climate change on the variables that modulate O3 levels [19,20,21].
The growing use of machine learning algorithms (MLAs) in air quality modeling over the last decade [22,23,24] is due precisely to their ability to explore large and complex datasets, make predictions, and model nonlinear relationships. MLAs offer superior accuracy and computational efficiency compared to traditional deterministic and statistical regression approaches [25,26], being less dependent on up-to-date emission inventories while successfully capturing the nonlinear relationships and interactions between variables [27]. Moreover, new tools in the field of explainable artificial intelligence (XAI) [28] now make it possible to explain the output of a machine learning model by estimating the contributions of input features to the model’s predictions, thus increasing its transparency and reliability. This capability is crucial in the context of air quality assessments [29], where a comprehensive understanding of the factors determining air pollution levels is an essential scientific basis for targeted pollution control measures. Furthermore, explainable MLAs can help researchers generate hypotheses concerning the atmospheric processes that are responsible for the formation and variability of air pollution [30,31].
In this study, an approach combining MLAs with the Shapley additive explanations (SHAP) method [32,33] was adopted to quantify the influence of several meteorological and chemical factors on O3 variability [34,35,36]. The ability of explainable machine learning models to suggest hypotheses underlying O3 variability was also exploited. To this end, both supervised and unsupervised MLAs (XGBoost and k-means, respectively) were used: the former was selected for the regression task of simulating the O3 concentrations [37]; the latter was employed to cluster monitoring sites according to the observed O3 daily pattern [38,39]. The SHAP algorithm, a tool based on coalitional game theory, was applied to assess how the input features affect the final model predictions [32]. The application of SHAP analysis enabled the visualization of feature contributions to the model output, thereby facilitating the identification of functional relationships. The ML models were developed on pollutant and meteorological observations collected over the 2018–2022 period at ten air quality monitoring sites of the Basilicata region (Southern Italy), which lies in the center of the Mediterranean area. SHAP was applied to the model results considering both the entire sampling period analyzed in this study and a suitably selected pollution event. Moreover, since O3 and its precursors exhibit a specific daily pattern closely related to atmospheric physical and chemical processes, the O3 variability was also analyzed from the perspective of its daily cycle [39].
It is worth noting that the Basilicata region is an area sensitive to O3 pollution [16], despite generally low levels of O3 precursors (NO, NO2, CO), i.e., concentration values below the current limit values set by the EU, the US Environmental Protection Agency (EPA) standards, and the 2005 WHO Air Quality Guidelines [40,41]. Consequently, the Basilicata region offers a valuable case study for investigating the factors influencing the O3 variability in an environment characterized by low concentrations of O3 precursors, a scenario that has been documented in only a few studies to date [42,43]. In this context, identifying the factors that most influence the dynamic variation of O3 predictions, both in the long and short term, provides fundamental knowledge for optimizing O3 prevention and mitigation measures for the protection of public health in the examined area.

2. Data and Methods

2.1. Study Area

The Basilicata region (Southern Italy) covers an area of about 10,000 km2 and has approximately 530,000 inhabitants, resulting in a low population density (https://demo.istat.it/app/). The region’s topography is characterized by a high degree of complexity, with significant variations in both physiographic features and morphological diversity, which can be observed over a relatively short distance. The climate of the region can be defined as continental, exhibiting Mediterranean characteristics only in the coastal areas [44]. Its geographical position in the center of the Mediterranean basin renders the Basilicata region potentially vulnerable both to the transport of mineral dust from the Sahara Desert [45,46] and to intense photochemical O3 events [47,48]. Domestic heating and vehicular traffic are the main sources of air pollution within the region; industrial activities are concentrated in a few areas where international companies operate within the automotive and agri-food sectors. The region’s most significant peculiarity is the presence within inhabited areas of two onshore hydrocarbon reservoirs, one of which is the largest in Western Europe, and two oil pre-treatment plants. This makes Basilicata the region that contributes more than any other to the national production of hydrocarbons extracted on land (accounting for over 80% of the national production) (https://unmig.mase.gov.it/wp-content/uploads/dati/produzione/produzione-2023.pdf, accessed on 15 January 2025).
A network of 15 monitoring stations, operated by the Agency for Environmental Protection of the Basilicata Region (ARPAB), evaluates air quality; the measured data are made publicly available after a validation process. According to the annual report on environmental data drafted by the ARPAB [49], O3 exceeded the 120 µg/m3 (MDA8) target value at all air quality monitoring sites during the 2020–2022 period. The other regulated pollutants considered in this work comply with the EU legislation limit values.

2.2. Observational Dataset

To build the working dataset, hourly data on concentrations of four air pollutants, namely, O3, nitrogen dioxide (NO2), nitrogen monoxide (NO), and carbon monoxide (CO), as well as on six meteorological parameters, including temperature (T), atmospheric pressure (p), relative humidity (RH), wind direction (wd), wind speed (ws), and precipitation (prec), were downloaded from the official website of ARPAB (https://www.arpab.it/temi-ambientali/aria/qualita-dellaria/monitoraggio-della-qualita-dellaria/qualita-dellaria/dataset, accessed on 1 July 2024). The data refer to ten monitoring sites, selected on the condition that the time series of every variable considered contain at least 75% valid data in each year. Overall, a dataset consisting of 438,240 hourly observations, spanning the 2018–2022 period, was set up for the development of ML models.
Figure 1 shows the location of the study area and the measurement sites; Table 1 summarizes information on the monitoring sites, and Table S1 in the Supporting Information (SI) provides a statistical summary of air pollutants and meteorological parameters for each station over the 2018–2022 period.
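The size of the dataset and the 75% validity criterion can be checked with a short Python sketch (Python is used here for illustration only; the study itself worked in R). The function names and the per-year counts of valid hours are hypothetical; the period, the number of sites, and the threshold come from the text.

```python
from datetime import date

def expected_hours(first_year, last_year):
    """Number of hourly time steps between 1 January of first_year and 31 December of last_year."""
    days = (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days
    return days * 24

def meets_validity(valid_count, year, threshold=0.75):
    """True if a variable has at least `threshold` of its hourly values valid in `year`."""
    return valid_count >= threshold * expected_hours(year, year)

N_SITES = 10
total_observations = expected_hours(2018, 2022) * N_SITES  # 438,240, as stated in the text

# Hypothetical counts of valid hourly values for one variable at one site
valid_per_year = {2018: 8100, 2019: 7900, 2020: 8500, 2021: 7000, 2022: 8300}
site_selected = all(meets_validity(n, y) for y, n in valid_per_year.items())
```

The product of hourly time steps and sites reproduces the 438,240 observations reported above, which suggests that the count refers to one record per site and hour.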

2.3. Machine Learning Models

A single-station-level modeling approach was adopted in order to consider site-specific topography, weather, and atmospheric conditions.
All data loading, processing, statistical analysis, and modeling were accomplished in the R software environment and its associated packages (R version 4.3.1, https://www.r-project.org).

2.3.1. XGBoost

The eXtreme Gradient Boosting (XGBoost) algorithm [50] is a supervised machine learning technique based on the ensemble of decision trees, designed to solve classification and regression problems. It was chosen to predict the time-varying ozone concentrations due to its scalability, speed, performance, and interpretability compared to other algorithms [36].
Assuming hourly O3 concentrations as the model output (or target), the model can be expressed in terms of nine explanatory variables (or features), as follows:
$$\mathrm{O_3} \sim \mathrm{xgboost}(\mathrm{no2}, \mathrm{no}, \mathrm{co}, T, RH, ws, wd, p, prec)$$
where xgboost is the function implementing the boosted regression tree technique in the R software environment (xgboost R package 1.7.8.1), and the chosen features are available in all the sites considered.
To build the model, the observational dataset was randomly partitioned into a training dataset for the model development (75% of the observations) and a test dataset used to check the model performance (25% of the observations). Hyperparameter tuning, as defined in https://xgboost.readthedocs.io/en/latest/parameter.html (accessed on 22 July 2024), was conducted by means of Bayesian optimization (ParBayesianOptimization R package 1.2.6). The optimal combination of hyperparameters for each site is listed in Table S2 in the SI.
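The random 75/25 partition described above can be sketched as follows. This is a generic Python illustration, not the authors' R code; the function name, the fixed seed, and the placeholder rows are assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=42):
    """Randomly partition rows into a training part and a test part."""
    rng = random.Random(seed)      # fixed seed (assumed) so the split is reproducible
    shuffled = rows[:]             # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(1000))           # placeholder for hourly observations at one site
train, test = train_test_split(rows)
```

A purely random split of hourly data places neighboring hours in both subsets; whether that matters depends on how the model is meant to be used, and the text does not discuss alternatives.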
The XGBoost models’ skill was evaluated by comparing predicted and observed O3 concentration values using a number of statistical indicators for the training and test datasets. The coefficient of determination (R2), the mean bias error (MBE, μg/m3), and the root mean square error (RMSE, μg/m3) were utilized in this study (Section S1 in the SI). It is acknowledged that high accuracy (R2 close to 1) and minimal errors (MBE and RMSE close to 0) are the desired outcomes for an optimal prediction model.
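The three indicators can be written out in a few lines of Python. The exact definitions used in the study are given in Section S1 of the SI, which is not reproduced here, so the formulas below are common textbook forms and should be read as assumptions: R2 as one minus the ratio of residual to total sum of squares, and MBE as the mean of predicted minus observed values. The sample values are hypothetical.

```python
import math

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mbe(obs, pred):
    """Mean bias error, mean of (predicted - observed); the sign convention is assumed."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Hypothetical observed and predicted O3 concentrations (µg/m3)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
scores = {"R2": r2(observed, predicted),
          "MBE": mbe(observed, predicted),
          "RMSE": rmse(observed, predicted)}
```

With this sign convention, a negative MBE indicates that the model underestimates the observations on average.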

2.3.2. SHAP

The Shapley additive explanations (SHAP) method [32,33] is a technique that allows the explanation of the output of any machine learning model. Based on the coalitional game theory [51], this approach enables an understanding of how the model reaches its predictions by assessing each feature’s contribution and addressing complex feature interactions. Moreover, SHAP facilitates the generation of several plots, thereby enabling a more intuitive comprehension of the model output [52].
In this study, based on the outcomes of the XGBoost models, SHAP was applied to elucidate the magnitude and the sign (positive or negative) of the contribution of each feature to the O3 predictions (shapviz R package, 0.9.7). The SHAP explanations were supported by several SHAP visualization tools, including the global feature importance plot, the summary plot, and the dependence plot. The equations of SHAP values can be found in Section S2 of the SI.
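The game-theoretic idea behind SHAP can be made concrete with a brute-force Python sketch that computes exact Shapley values for a toy three-feature model. It is not the tree-based algorithm applied to the XGBoost models in the study: the toy model, its coefficients, and the single background point used to represent "absent" features are all hypothetical simplifications.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values of model(x); features outside a coalition take background values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical O3 response with a temperature-wind interaction (illustration only)."""
    temperature, humidity, wind = z
    return 70.0 + 1.5 * temperature - 0.4 * humidity + 0.05 * temperature * wind

x = [28.0, 40.0, 3.0]             # warm, dry, breezy hour (hypothetical)
background = [15.0, 70.0, 1.5]    # reference conditions (hypothetical)
phi = shapley_values(toy_model, x, background)
base_value = toy_model(background)
```

The contributions are additive: the base value plus the sum of the Shapley values equals the model prediction for `x`. The interaction term is shared between temperature and wind, while the purely additive humidity term is attributed entirely to humidity.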

2.3.3. K-Means

The k-means clustering algorithm [53] has been extensively utilized in air pollution studies for the identification of interesting patterns and possible insights into the underlying structure of data, because of its simplicity, ease of implementation, and efficiency [54]. In this study, the k-means algorithm (stats R package, 4.4.3) was applied to cluster the daily O3 patterns at various monitoring sites. To this end, the mean daily pattern of the O3 concentrations and of all features included in the model was first determined by averaging, for each hour of the day, the values of each variable over the study period. The results were collated into a new dataset comprising 240 rows (24 h for each of the 10 sites), to which the k-means technique was applied. The optimal number of clusters was computed using the elbow method, ensuring that clusters consisting of a single site were avoided [38].
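The clustering step can be sketched in plain Python. Several assumptions apply: each site is represented by a 24-value mean daily O3 profile, the six site profiles are synthetic, the initial centers are chosen by a deterministic farthest-point rule, and the code stands in for, rather than reproduces, the R `kmeans` call used in the study. The within-cluster sum of squares (WSS) for increasing k is the quantity inspected by the elbow method.

```python
import math

def profile(base, amplitude):
    """Synthetic 24 h O3 profile peaking at 15:00 (hypothetical shape)."""
    return [base + 0.5 * amplitude * math.cos(2 * math.pi * (h - 15) / 24) for h in range(24)]

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with farthest-point initialization; returns labels and WSS."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centers[j])   # keep the old center of an empty cluster
        if updated == centers:
            break
        centers = updated
    wss = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss

# Hypothetical sites given as (mean level, daily amplitude) in µg/m3
sites = [(60, 42), (62, 40), (70, 22), (72, 24), (90, 13), (92, 12)]
points = [profile(b, a) for b, a in sites]

wss_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}   # curve for the elbow method
labels, _ = kmeans(points, 3)
cluster_sizes = [labels.count(j) for j in range(3)]
```

The elbow is the value of k beyond which the WSS stops decreasing sharply; checking `cluster_sizes` afterwards is one way to apply the additional rule that no cluster should contain a single site.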
An XGBoost model to predict the daily pattern of O3 was fitted using the same method described in Section 2.3.1 (training/test split of 75/25% and Bayesian optimization of the hyperparameters; see results in Table S3 of the SI). This allowed the application of the SHAP method to the model results to investigate the contribution of the features to each cluster.

3. Results and Discussion

To avoid redundancies, the text presents figures describing the results obtained for three sites—FE, ME, and VP—while results for all other sites are reported in the SI.

3.1. XGBoost Model Performance

As shown in Figure 2 (and Figure S1 in the SI), a good degree of concordance was found between the observed and predicted O3 concentration values over the 2018–2022 period. The values of the statistical indicators employed to assess the XGBoost models’ performance are listed in Table S4. Across all sites, R2, MBE, and RMSE evaluated on the test dataset varied in the ranges of 0.59 to 0.86, −1.3 to −0.87 µg/m3, and 10 to 12 µg/m3, respectively. The best performance in predicting O3 concentrations was obtained at the FE site, and the poorest at the GR site. Overall, the performance of the models was satisfactory for hourly predictions and consistent with other studies [55,56].

3.2. Global Importance of Main Influencing Factors

Based on the prediction results from the XGBoost models, the SHAP algorithm was applied to quantify the impact of each feature on O3 variability.
The global importance of the features was ranked based on the mean absolute SHAP value, as expressed by Equation (S2.3) in the SI. This quantifies the magnitude of each feature’s contribution to the O3 prediction, with higher mean absolute SHAP values corresponding to more influential features. The overall contribution of the meteorological features, as shown in Figure 3a (and in Figure S2a in the SI), was found to be dominant at all sites, ranging from 72% (ME) to 81% (LM), over the entire period. RH, T, ws, and wd generally turned out to be the most significant meteorological features. It is noteworthy that RH was confirmed as the most significant predictive variable across all sites, accounting for a percentage of the total contribution of meteorological parameters ranging from 36.5% (VP) to 54.4% (PZB). This finding confirms the conclusions of a previous study conducted on data from the MdB site [57], as well as those of other studies carried out in the Mediterranean region [58,59]. Among the atmospheric pollutants considered, NO2 and NO were the most important features, while CO provided a negligible contribution. The analysis revealed that approximately 75% of the variability in O3 was attributable to RH, T, ws, NO2, and NO. However, the LM site represented an exception to this pattern, with RH, wd, T, NO2, and p emerging as the top five features.
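The ranking by mean absolute SHAP value and the share attributed to meteorological features can be sketched as follows. Equation (S2.3) of the SI is not reproduced here, so the normalization of the importances to a percentage share is an assumption, and the SHAP values are hypothetical.

```python
METEOROLOGICAL = {"T", "RH", "ws", "wd", "p", "prec"}

def global_importance(shap_rows):
    """Mean absolute SHAP value per feature over all explained observations."""
    n = len(shap_rows)
    return {f: sum(abs(row[f]) for row in shap_rows) / n for f in shap_rows[0]}

# Hypothetical SHAP values (µg/m3) for two hourly predictions
shap_rows = [
    {"RH": -8.0, "T": 5.0, "ws": 3.0, "no2": -2.0},
    {"RH": 6.0, "T": -3.0, "ws": -1.0, "no2": 2.0},
]

importance = global_importance(shap_rows)
ranking = sorted(importance, key=importance.get, reverse=True)

# Share of the total importance carried by meteorological features (assumed normalization)
meteo_share = (sum(v for f, v in importance.items() if f in METEOROLOGICAL)
               / sum(importance.values()))
```

Because absolute values are averaged, a feature whose positive and negative contributions cancel out on average can still rank highly.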
Figure 3b (and Figure S2b in the SI) summarizes, in descending order, the relative importance of each feature and the actual relationships with the predicted outcome. This highlights the magnitude as well as the positive or negative contribution of each feature to the target prediction. In all sites examined, elevated RH values exhibited negative SHAP values, leading to reduced O3 predictions, while the opposite was observed for low RH values. Conversely, elevated T and ws values exhibited positive SHAP values, contributing positively to O3 predictions, while, in contrast, low T and ws values exerted a negative influence. Furthermore, high NO2 and NO values exhibited negative SHAP values, consequently contributing negatively to the O3 predictions. However, the NO2 level in the ME site showed an opposite trend.

3.2.1. Meteorological Factors

Insights into the relationship between feature values and their impact on the prediction can be obtained with the SHAP dependence plots. In this study, the relationships between the top five most significant features for each site and their SHAP values were analyzed, as illustrated in Figure 4 (and Figure S3 in the SI). The analysis of these relationships enabled the identification of one or more turning points where the feature contribution to the O3 prediction undergoes an inversion in behavior—from positive to negative, or vice versa.
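One way to locate such a turning point numerically is to average the SHAP values within bins of the feature and look for a sign change between adjacent bins. The Python sketch below follows this idea; the text does not state how the turning points were actually obtained (they may have been read from the dependence plots), so the binning procedure, the bin width, and the synthetic data are assumptions.

```python
def turning_points(feature_values, shap_values, bin_width):
    """Feature values at which the bin-averaged SHAP value changes sign."""
    bins = {}
    for x, s in zip(feature_values, shap_values):
        bins.setdefault(int(x // bin_width), []).append(s)
    # Bin centers paired with the mean SHAP value of each bin
    curve = [((b + 0.5) * bin_width, sum(v) / len(v)) for b, v in sorted(bins.items())]
    crossings = []
    for (x0, s0), (x1, s1) in zip(curve, curve[1:]):
        if s0 * s1 < 0:  # strict sign change; bins averaging exactly zero are ignored
            crossings.append(x0 + (x1 - x0) * s0 / (s0 - s1))  # linear interpolation
    return crossings

# Synthetic dependence: SHAP decreases linearly with RH and crosses zero at RH = 72%
rh = [35.0, 45.0, 55.0, 65.0, 75.0, 85.0, 95.0]
shap_rh = [0.2 * (72.0 - v) for v in rh]
points = turning_points(rh, shap_rh, bin_width=10.0)
```

Real dependence plots are noisy, so the result depends on the chosen bin width and should be checked against the plot itself.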
A nonlinear negative relationship between SHAP values and RH was observed in all sites. The lowest turning point, at approximately RH = 60%, was observed at the VP site, while the highest, at RH~85%, was recorded at the MdB site. For the remaining sites, the turning point was around 75%. The SHAP values were found to be positive below this threshold and negative above, indicating that RH values higher than the turning point lowered O3 prediction. Several reasons can be given to explain the negative correlation between O3 and RH. A list of different physical–chemical processes in which RH favors O3 decomposition has been described in the extant scientific literature [60,61]. High humidity can also be associated with cloud formation and atmospheric instability: the former slows ozone production due to a reduction in solar insolation, while the latter increases ozone dispersion [62,63,64,65]. Furthermore, as previously suggested by [57], in certain locations, the role of RH in the removal of O3 by dry deposition cannot be disregarded, due to an increase in the O3 uptake following the plants’ stomatal opening when RH increases [66].
A nonlinear positive association between SHAP values and T, as well as a turning point between 15 °C and 20 °C, was observed in all sites. Therefore, below this point, T contributed negatively to O3 predictions, while above, it contributed positively. The relationship between O3 and T followed the expected trend, with T influencing the rate of chemical reactions that promote O3 production [67]. Furthermore, elevated temperatures are frequently associated with increased solar radiation and VOC emissions from biogenic sources, as well as decreased water vapor, all of which collectively contribute to increased O3 concentrations [68].
A highly nonlinear relationship between SHAP values and ws was recorded at all sites, consisting of a negative association for very low ws values and a positive association for ws values above approximately 2 m/s, followed by a plateau for higher ws values. However, at the ME site, a decline in SHAP values was observed after the plateau phase, indicating that ws values between 2 m/s and 5 m/s were the most favorable for high O3 concentrations at this site.
The local geographical conditions and the distribution of ozone and precursors regulate the ozone-wind interaction, making it difficult to schematize and generalize the relationship between ws and O3. Moreover, the secondary nature of O3 makes this interaction not subject to a univocal interpretation. In our study, the presence of very low ws values was concomitant with negative SHAP values, indicating an unfavorable environment for O3 concentration. This suggests that O3 does not have a local origin and that transport from other areas is negligible due to the very slow wind speed. Conversely, values of ws in the range of 2–4 m/s may be essential for the accumulation of precursors (VOCs and NOx) that are responsible for O3 formation [59,69,70]. At higher values of ws, O3 can be transported from other sites or dispersed, as observed at the ME site [71].
The SHAP dependence plot showed a nonlinear relationship between wd and O3 concentrations. The impact of wd on O3 concentrations represents the effect of the main pathways of transport, which are site-specific. As illustrated by the polar plot in Figure S4 in the SI, each site exhibited different directions of the highest O3 concentration, the only common feature being the absence of O3 at very low ws values. Figure S5 in the SI shows how ws and wd vary for each site.
In conclusion, the most favorable meteorological conditions for the increase in O3 concentrations at the sites considered and, presumably, across the Basilicata region, are those with RH < 75%, T > 20 °C, and ws > 2 m/s.

3.2.2. Chemical Factors

Among the chemical features analyzed, NO2 exhibited the strongest impact on O3 predictions. With the exception of the ME site, SHAP values decreased with increasing NO2 values, indicating a negative nonlinear relationship. Positive SHAP values were generally observed for NO2 concentrations below 5 µg/m3, while negative SHAP values were noted for NO2 concentrations exceeding this value.
Conversely, at the ME site, an opposite trend was observed: at low NO2 values (<3 µg/m3), SHAP values were negative, while for higher NO2 values, SHAP values spread around zero and contributed both positively and negatively to O3 concentrations. For NO2 values greater than 20 µg/m3, the contribution was predominantly positive (see Figure S6 in the SI).
For NO values lower than approximately 2 µg/m3, the SHAP values generally spread around zero or were positive, while for higher NO values, they were negative.
The negative correlation between NO2 and O3 likely arises from the fact that NO2 serves as a precursor to the formation of O3. The correlation between NO and O3, on the other hand, could be attributed to the titration effect, which promotes the decrease of O3 [72]. However, due to the complex nonlinear nature of the relationship between O3 and its precursors, further data and investigation are necessary for a correct interpretation of the observed trend.
To summarize, with the exception of the ME site, NO2 < 5 µg/m3 and NO < 2 µg/m3 seem to be the conditions most conducive to O3 accumulation.
Table 2 summarizes all feature values favorable to an increase in O3 concentration.

3.3. Contributing Factors to a Selected High-Pollution Event

Short-term and site-specific analyses can reveal some peculiarities in the variability of O3 [73], the knowledge of which is crucial, especially in reference to pollution episodes that raise concern for public health and the environment. Therefore, the potentialities of the methodological approach illustrated in the previous sections were exploited to identify the main drivers of a selected O3 pollution event spanning a few days.
Among the examined sites, ME was the monitoring station that recorded the highest number of exceedances of the O3 target value of 120 µg/m3 (MDA8) in the 2020–2022 period [49]. Consequently, a ten-day period from 10 June 2021 to 20 June 2021, during which exceedances of the target value were recorded, was selected as the case study for this site.
As illustrated in Figure 5, the time series of observed and predicted O3 concentration values over this period showed similar patterns, confirming the efficacy of the XGBoost model in replicating O3 variability. The entire period under consideration was characterized by features with predominantly positive SHAP values, which resulted in predictions exceeding 73.4 µg/m3, the base value as defined by Equation (S2.1) in the SI. T ranked first among the features followed by RH, confirming the dominant role of these two meteorological parameters. In contrast to the overall period, NO had a positive impact on O3 predictions. It is interesting to note the pivotal role played by CO in increasing O3 concentrations, although in the global model, CO was found to have a limited impact on the prediction. The positive contributions of both NO and CO suggested a possible contribution of combustion processes to the high O3 concentrations [61].
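The link between positive SHAP values and predictions above the base value follows from the additivity of SHAP: each prediction equals the base value plus the sum of the feature contributions. A minimal Python illustration is given below; the base value of 73.4 µg/m3 is the one reported in the text, whereas the individual SHAP values are hypothetical and are not taken from the study.

```python
BASE_VALUE = 73.4  # µg/m3, base value reported in the text

# Hypothetical SHAP values (µg/m3) for one hour of the event
shap_values = {"T": 18.0, "RH": 12.5, "co": 6.0, "no": 4.1,
               "ws": 3.0, "no2": -1.2, "wd": -0.8}

def explain(base, contributions):
    """Return the additive prediction and the features ranked by absolute contribution."""
    prediction = base + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked

prediction, ranked = explain(BASE_VALUE, shap_values)
above_base = prediction > BASE_VALUE
```

In this made-up hour, the positive contributions outweigh the negative ones, so the prediction lies well above the base value, which is the situation described for the selected event.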
Moreover, as shown in Figure 6, in accordance with the outcomes of the preceding Section 3.2.1 and Section 3.2.2, the positive contribution of T and RH was identified for T values exceeding the turning point of 20 °C and RH values falling below the turning point of 75%. The range of values for ws varied between 1.5 and 4.5 m/s, indicating a favorable environment for O3 increase.
Similarly, NO exerts a positive impact at concentrations below 2 µg/m3, while NO2 shows a dual effect, exhibiting both negative and positive contributions at levels exceeding 3 µg/m3. Finally, for CO, no turning point has been identified since it provides a negligible contribution to the overall model.
Based on the obtained results, it can be hypothesized that both photochemical processes, mainly driven by T and RH, and combustion processes, driven by CO and NO, may have been the causative factors for the elevated O3 predictions at the ME (Melfi) site during the specified period.

3.4. Contributing Factors to O3 Daily Pattern

Due to the strong dependence of O3 daily patterns on both chemistry and meteorological processes, the clustering of daily fluctuations in O3 could reveal interesting and unique findings across the examined sites [67,74]. The elbow method was used to determine the optimal number of clusters while taking care to avoid a single-site cluster. As shown in Figure 7, three distinct clusters were identified. They mainly differ in the daily amplitudes, defined as the difference between the maximum and minimum O3 values, which are 42 µg/m3, 22 µg/m3, and 13 µg/m3 for Cluster 1, Cluster 2, and Cluster 3, respectively.
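The daily amplitude defined above can be computed from hourly records in two steps: averaging by hour of day to obtain the mean daily profile, then taking the difference between its maximum and minimum. The Python sketch below uses hypothetical records and assumes that every hour of the day is represented at least once.

```python
from collections import defaultdict

def mean_daily_profile(records):
    """Average O3 by hour of day; records are (hour_of_day, o3) pairs covering all 24 hours."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, o3 in records:
        sums[hour] += o3
        counts[hour] += 1
    return [sums[h] / counts[h] for h in range(24)]

def daily_amplitude(profile):
    """Difference between the maximum and minimum of the mean daily profile."""
    return max(profile) - min(profile)

# Two hypothetical days of hourly O3 values (µg/m3)
records = [(h, 50.0 + h) for h in range(24)] + [(h, 60.0 + h) for h in range(24)]
profile = mean_daily_profile(records)
amplitude = daily_amplitude(profile)
```

A small amplitude combined with a high minimum corresponds to the pattern reported for Cluster 3, where O3 stays elevated during the night.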
Cluster 3 showed a less common daily pattern, characterized by the smallest daily amplitude and an O3 level never lower than 80 µg/m3 both day and night. It is notable that this cluster includes the two sites that are located at the greatest altitude among those under consideration. It is acknowledged that daily O3 amplitudes diminish with increasing site elevation [75]. Since it is recognized that high nocturnal ozone concentrations have adverse effects on human health and biological growth [76,77,78], the O3 variability was analyzed from the perspective of the third cluster.
Using the dataset described in Section 2.3.3, an XGBoost model was built and applied to predict the daily pattern of O3 concentration values for each cluster (R2 = 0.967 on the test data, Table S5 in the SI). Then, based on the model outputs, SHAP values were calculated to determine the main drivers characterizing the three clusters. The summary plots obtained (see Figure 8) show the ranking of the most important features. The main aspect characterizing the third cluster was the consistently positive contribution of the SHAP values of the most important features; in the other two clusters, instead, both negative and positive contributions to the prediction outcome were registered.
The results obtained were consistent with those described in Section 3.2.1 and Section 3.2.2: as illustrated in Figure 9, the feature values in Cluster 3 fell within the range indicated as favorable for the O3 increase, i.e., ws > 2 m/s, RH < 75%, NO < 2 µg/m3, NO2 < 5 µg/m3. Moreover, in contrast to the other two clusters, these conditions were also met in the interval from 00:00 to 06:00, during which Cluster 3 exhibited higher O3 concentration values than the other two clusters.
Nor can the important positive impact of pressure on the O3 daily pattern be ignored in Cluster 3, in contrast to the poor significance of this feature in the global model built on the 2018–2022 period.
The predominant role played by ws in Cluster 3 has led to the hypothesis that transport significantly contributes to the high nocturnal O3 values, in conjunction with RH values too low to have a negative effect on the ozone concentrations.

4. Conclusions

In this study, a methodological approach combining both supervised and unsupervised MLAs with the SHAP method was adopted, with the aim of analyzing the main meteorological and chemical factors affecting O3 variability in the Basilicata region and exploring potential physical and chemical mechanisms behind this variability.
The findings, based on the entire 2018–2022 period and all sites, revealed that meteorological conditions, dominated by RH, ws, and T, were responsible for at least 75% of the O3 variability. NO and NO2 contributed only marginally to this variability.
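A percentage split of this kind can be obtained by summing the mean absolute SHAP values of the features in each group and normalizing by the total. The sketch below shows this arithmetic under that assumption; the numerical values, the feature list, and the function name are hypothetical and are not the results of this study.

```python
def group_contributions(mean_abs_shap, groups):
    """Percentage share of each feature group in the total mean |SHAP|."""
    total = sum(mean_abs_shap.values())
    if total <= 0:
        raise ValueError("mean |SHAP| values must sum to a positive number")
    shares = {group: 0.0 for group in set(groups.values())}
    for feature, value in mean_abs_shap.items():
        shares[groups[feature]] += 100.0 * value / total
    return shares


# Hypothetical mean |SHAP| values in µg/m3 (illustrative only).
mean_abs_shap = {
    "RH": 9.0, "ws": 6.0, "T": 5.0, "P": 1.0,
    "NO2": 2.0, "NO": 1.0, "CO": 1.0,
}
groups = {
    "RH": "meteo", "ws": "meteo", "T": "meteo", "P": "meteo",
    "NO2": "chemicals", "NO": "chemicals", "CO": "chemicals",
}

shares = group_contributions(mean_abs_shap, groups)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.1f}%")
```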
Furthermore, for each feature, a turning point was determined on an average basis across all sites. Specifically, RH < 75%, T > 20 °C, and ws > 2 m/s were identified as the most favorable meteorological conditions for O3 increase. With respect to the chemical factors, the most conducive conditions for the O3 accumulation were identified as NO2 < 5 µg/m3 and NO < 2 µg/m3. However, the ME site exhibited a different NO2 trend, with values above 3 µg/m3 being conducive to O3 formation.
The analysis of a pollution event revealed that CO and NO contributed to O3 predictions, in addition to T and RH. This suggests that combustion processes may be a contributing factor to the elevated O3 concentrations. Conversely, the predominant role of ws in shaping the daily O3 pattern at higher-altitude sites may point to transport as a possible cause of elevated nighttime O3 levels, a hypothesis that is also supported by the low RH values.
However, caution should be exercised when deriving insights from SHAP analyses. The fundamental purpose of the SHAP approach is to improve the understanding of how the model works; it does not inherently reveal the true relationship between variables and outcomes in the real world. Therefore, the integration of MLAs with other independent methodologies, such as process-based models or ad hoc experimental activities, should be considered to achieve a more structured knowledge of the real physical and chemical phenomena underlying O3 variability.
The study used the predictor variables available at each site. The remaining discrepancies between predicted and observed O3 levels may be due to missing predictors, such as the height of the boundary layer or the contribution of anthropogenic and biogenic VOC species.
Despite these limitations, the proposed approach has proven to be an effective tool for diagnosing the main drivers of O3 pollution in an area characterized by relatively low levels of O3 precursors and variable meteorological conditions, using only publicly available monitoring data. This knowledge is expected to be useful for the design of science-based strategies that maximize the co-benefits of O3 reduction for air quality, public health, and climate change.
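The discrepancies between predicted and observed values are summarized in this work by R2, MBE, and RMSE. The sketch below shows one common way to compute these three metrics; it assumes that R2 is the coefficient of determination and that MBE is defined as the mean of predicted minus observed values, conventions that may differ from those adopted in the study. The sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return (R2, MBE, RMSE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mbe = sum(errors) / n                        # mean bias error
    rmse = sqrt(sum(e * e for e in errors) / n)  # root mean square error
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return r2, mbe, rmse


# Hypothetical hourly O3 values in µg/m3 (illustrative only).
observed = [80.0, 90.0, 100.0, 110.0]
predicted = [82.0, 88.0, 103.0, 107.0]

r2, mbe, rmse = regression_metrics(observed, predicted)
print(f"R2 = {r2:.3f}, MBE = {mbe:.2f} µg/m3, RMSE = {rmse:.2f} µg/m3")
```

A near-zero MBE together with a non-zero RMSE, as in this toy case, indicates errors that cancel on average rather than an unbiased fit at every hour.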

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos16050491/s1, Figure S1. Scatter plot of predicted vs. observed O3 values on the test dataset for (a) CM, (b) GR, (c) LM, (d) MdB, (e) PZB, (f) SNM, and (g) VZI. R2, MBE, and RMSE values and the equation of the best-fit line are also shown; Figure S2. (a) Percentage contribution of meteorological variables (meteo) and atmospheric pollutants (chemicals) to O3. (b) Summary plots of the SHAP values for O3. The values on the left side represent the absolute mean of the SHAP values; Figure S3. Main effects of the top five features on O3 (µg/m3) for (a) CM, (b) GR, (c) LM, (d) MdB, (e) PZB, (f) SNM, and (g) VZI; Figure S4. O3 concentrations varying with wind speed and wind direction for (a) FE, (b) LM, (c) PZB, and (d) VP; Figure S5. Wind roses for each site; Figure S6. Main effect of NO2 on O3 (µg/m3); Table S1. Statistical summary of air pollutants and meteorological parameters, based on hourly data over the 2018–2022 period; Table S2. Optimal hyperparameter values for O3 XGBoost models; Table S3. Optimal hyperparameter values for the XGBoost model on the O3 daily pattern dataset; Table S4. O3 XGBoost model performance on training and test datasets, 2018–2022 period; Table S5. O3 XGBoost model performance on training and test datasets for daily patterns.

Author Contributions

All individuals listed as authors have contributed equally to the present work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Acknowledgments

The authors are grateful to the Environmental Protection Agency of the Basilicata Region for providing the data used in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Soares, A.R.; Silva, C. Review of Ground-Level Ozone Impact in Respiratory Health Deterioration for the Past Two Decades. Atmosphere 2022, 13, 434.
  2. Donzelli, G.; Suarez-Varela, M.M. Tropospheric Ozone: A Critical Review of the Literature on Emissions, Exposure, and Health Effects. Atmosphere 2024, 15, 779.
  3. Gao, P.; Wu, Y.; He, L.; Wang, L.; Fu, Y.; Chen, J.; Zhang, F.; Krafft, T.; Martens, P. Adverse Short-Term Effects of Ozone on Cardiovascular Mortalities Modified by Season and Temperature: A Time-Series Study. Front. Public Health 2023, 11, 1182337.
  4. Grulke, N.E.; Heath, R.L. Ozone Effects on Plants in Natural Ecosystems. Plant Biol. 2020, 22, 12–37.
  5. Friedrichová, R.; Karl, J.; Růžička, M.; Vyskočilová, A.; Martinec, M.; Veselý, M.; Škubník, J.; Svobodová Pavlíčková, V.; Rimpelová, S.; Kuchař, M. Determination of Ozone Concentration and Its Effect on Degradation of Materials. Trans. VŠB–Tech. Univ. Ostrav. Saf. Eng. Ser. 2022, 17, 14–25.
  6. Pope, R.J.; Rap, A.; Pimlott, M.A.; Barret, B.; Le Flochmoen, E.; Kerridge, B.J.; Siddans, R.; Latter, B.G.; Ventress, L.J.; Boynard, A.; et al. Quantifying the Tropospheric Ozone Radiative Effect and Its Temporal Evolution in the Satellite Era. Atmos. Chem. Phys. 2024, 24, 3613–3626.
  7. Dewan, S.; Lakhani, A. Tropospheric Ozone and Its Natural Precursors Impacted by Climatic Changes in Emission and Dynamics. Front. Environ. Sci. 2022, 10, 1007942.
  8. Fu, T.M.; Tian, H. Climate Change Penalty to Ozone Air Quality: Review of Current Understandings and Knowledge Gaps. Curr. Pollut. Rep. 2019, 5, 159–171.
  9. Hertig, E.; Jahn, S.; Kaspar-Ott, I. Future Local Ground-Level Ozone in the European Area From Statistical Downscaling Projections Considering Climate and Emission Changes. Earths Future 2023, 11, e2022EF003317.
  10. Rezaei, R.; Güllü, G.; Ünal, A. Assessing the Impact of Climate Change on Summertime Tropospheric Ozone in the Eastern Mediterranean: Insights from Meteorological and Air Quality Modeling. Atmos. Environ. 2025, 344, 121036.
  11. Chen, Z.Y.; Petetin, H.; Méndez Turrubiates, R.F.; Achebak, H.; Pérez García-Pando, C.; Ballester, J. Population Exposure to Multiple Air Pollutants and Its Compound Episodes in Europe. Nat. Commun. 2024, 15, 2094.
  12. European Union. EU Directive (EU) 2024/2881 of the European Parliament and of the Council of 23 October 2024 on Ambient Air Quality and Cleaner Air for Europe (Recast). Off. J. Eur. Union. 2024. Available online: https://eur-lex.europa.eu/eli/dir/2024/2881/oj/eng (accessed on 1 October 2024).
  13. WHO. WHO Global Air Quality Guidelines. Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide; World Health Organization: Geneva, Switzerland, 2021.
  14. EEA Europe’s Air Quality Status 2024. Available online: https://www.eea.europa.eu/publications/europes-air-quality-status-2024 (accessed on 19 November 2024).
  15. EEA Italy—Air Pollution Country Fact Sheet 2024. Available online: https://www.eea.europa.eu/en/topics/in-depth/air-pollution/air-pollution-country-fact-sheets-2024/italy-air-pollution-country-fact-sheet-2024 (accessed on 20 January 2025).
  16. SNPA. La Qualità Dell’aria in Italia Edizione 2023; Report Ambientali SNPA; SNPA: Rome, Italy, 2024; Volume 40, ISBN 9788844812072.
  17. Lyu, X.; Li, K.; Guo, H.; Morawska, L.; Zhou, B.; Zeren, Y.; Jiang, F.; Chen, C.; Goldstein, A.H.; Xu, X.; et al. A Synergistic Ozone-Climate Control to Address Emerging Ozone Pollution Challenges. One Earth 2023, 6, 964–977.
  18. Jacob, D.J.; Winner, D.A. Effect of Climate Change on Air Quality. Atmos. Environ. 2009, 43, 51–63.
  19. Mayer, M.; Schreier, S.F.; Spangl, W.; Staehle, C.; Trimmel, H.; Rieder, H.E. An Analysis of 30 Years of Surface Ozone Concentrations in Austria: Temporal Evolution, Changes in Precursor Emissions and Chemical Regimes, Temperature Dependence, and Lessons for the Future. Environ. Sci. Atmos. 2022, 2, 601–615.
  20. Li, J.; Wang, Y.; Qu, H. Dependence of Summertime Surface Ozone on NOx and VOC Emissions Over the United States: Peak Time and Value. Geophys. Res. Lett. 2019, 46, 3540–3550.
  21. Koplitz, S.; Simon, H.; Henderson, B.; Liljegren, J.; Tonnesen, G.; Whitehill, A.; Wells, B. Changes in Ozone Chemical Sensitivity in the United States from 2007 to 2016. ACS Environ. Au 2022, 2, 206–222.
  22. Peng, Z.; Zhang, B.; Wang, D.; Niu, X.; Sun, J.; Xu, H.; Cao, J.; Shen, Z. Application of Machine Learning in Atmospheric Pollution Research: A State-of-Art Review. Sci. Total Environ. 2024, 910, 168588.
  23. Masih, A. Machine Learning Algorithms in Air Quality Modeling. Glob. J. Environ. Sci. Manag. 2019, 5, 515–534.
  24. Méndez, M.; Merayo, M.G.; Núñez, M. Machine Learning Algorithms to Forecast Air Quality: A Survey. Artif. Intell. Rev. 2023, 56, 10031–10066.
  25. Zheng, L.; Lin, R.; Wang, X.; Chen, W. The Development and Application of Machine Learning in Atmospheric Environment Studies. Remote Sens. 2021, 13, 4839.
  26. Rybarczyk, Y.; Zalakeviciute, R. Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review. Appl. Sci. 2018, 8, 2570.
  27. Li, W.; Ma, D.; Fu, J.; Qi, Y.; Shi, H.; Ni, T. A Quantitative Exploration of the Interactions and Synergistic Driving Mechanisms between Factors Affecting Regional Air Quality Based on Deep Learning. Atmos. Environ. 2023, 314, 120077.
  28. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
  29. Chakraborty, S.; Misra, B.; Dey, N. Explainable Artificial Intelligence (XAI) for Air Quality Assessment. In Frontiers in Artificial Intelligence and Applications; IOS Press BV: Amsterdam, The Netherlands, 2024; Volume 383, pp. 333–341.
  30. Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable Machine Learning for Scientific Insights and Discoveries. IEEE Access 2020, 8, 42200–42216.
  31. Oviedo, F.; Ferres, J.L.; Buonassisi, T.; Butler, K.T. Interpretable and Explainable Machine Learning for Materials Science and Chemistry. Acc. Mater. Res. 2022, 3, 597–607.
  32. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  33. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
  34. Chen, Y.W.; Medya, S.; Chen, Y.C. Investigating Variable Importance in Ground-Level Ozone Formation with Supervised Learning. Atmos. Environ. 2022, 282, 119148.
  35. Yao, T.; Lu, S.; Wang, Y.; Li, X.; Ye, H.; Duan, Y.; Fu, Q.; Li, J. Revealing the Drivers of Surface Ozone Pollution by Explainable Machine Learning and Satellite Observations in Hangzhou Bay, China. J. Clean. Prod. 2024, 440, 140938.
  36. Zhang, L.; Wang, L.; Ji, D.; Xia, Z.; Nan, P.; Zhang, J.; Li, K.; Qi, B.; Du, R.; Sun, Y.; et al. Explainable Ensemble Machine Learning Revealing the Effect of Meteorology and Sources on Ozone Formation in Megacity Hangzhou, China. Sci. Total Environ. 2024, 922, 171295.
  37. Ma, J.; Ding, Y.; Cheng, J.C.P.; Jiang, F.; Tan, Y.; Gan, V.J.L.; Wan, Z. Identification of High Impact Factors of Air Quality on a National Scale Using Big Data and Machine Learning Techniques. J. Clean. Prod. 2020, 244, 118955.
  38. Bernier, C.; Wang, Y.; Estes, M.; Lei, R.; Jia, B.; Wang, S.C.; Sun, J. Clustering Surface Ozone Diurnal Cycles to Understand the Impact of Circulation Patterns in Houston, TX. J. Geophys. Res. Atmos. 2019, 124, 13457–13474.
  39. Chen, Z.; Liu, R.; Wu, S.; Xu, J.; Wu, Y.; Qi, S. Diurnal Variation Characteristics and Meteorological Causes of Autumn Ozone in the Pearl River Delta, China. Sci. Total Environ. 2024, 908, 168469.
  40. Stafoggia, M.; Oftedal, B.; Chen, J.; Rodopoulou, S.; Renzi, M.; Atkinson, R.W.; Bauwelinck, M.; Klompmaker, J.O.; Mehta, A.; Vienneau, D.; et al. Long-Term Exposure to Low Ambient Air Pollution Concentrations and Mortality among 28 Million People: Results from Seven Large European Cohorts within the ELAPSE Project. Lancet Planet Health 2022, 6, e9–e18.
  41. Strak, M.; Weinmayr, G.; Rodopoulou, S.; Chen, J.; De Hoogh, K.; Andersen, Z.J.; Atkinson, R.; Bauwelinck, M.; Bekkevold, T.; Bellander, T.; et al. Long Term Exposure to Low Level Air Pollution and Mortality in Eight European Cohorts within the ELAPSE Project: Pooled Analysis. BMJ 2021, 374, n1904.
  42. Liu, T.; Hong, Y.; Li, M.; Xu, L.; Chen, J.; Bian, Y.; Yang, C.; Dan, Y.; Zhang, Y.; Xue, L.; et al. Atmospheric Oxidation Capacity and Ozone Pollution Mechanism in a Coastal City of Southeastern China: Analysis of a Typical Photochemical Episode by an Observation-Based Model. Atmos. Chem. Phys. 2022, 22, 2173–2190.
  43. D’Amico, F.; Gullì, D.; Lo Feudo, T.; Ammoscato, I.; Avolio, E.; De Pino, M.; Cristofanelli, P.; Busetto, M.; Malacaria, L.; Parise, D.; et al. Cyclic and Multi-Year Characterization of Surface Ozone at the WMO/GAW Coastal Station of Lamezia Terme (Calabria, Southern Italy): Implications for Local Environment, Cultural Heritage, and Human Health. Environments 2024, 11, 227.
  44. Coluzzi, R.; D’emilio, M.; Imbrenda, V.; Giorgio, G.A.; Lanfredi, M.; Macchiato, M.; Ragosta, M.; Simoniello, T.; Telesca, V. Investigating Climate Variability and Long-Term Vegetation Activity across Heterogeneous Basilicata Agroecosystems. Geomat. Nat. Hazards Risk 2019, 10, 168–180.
  45. Blanco, A.; De Tomasi, F.; Filippo, E.; Manno, D.; Perrone, M.R.; Serra, A.; Tafuro, A.M.; Tepore, A. Characterization of African Dust over Southern Italy. Atmos. Chem. Phys. 2003, 3, 2147–2159.
  46. Wang, Q.; Gu, J.; Wang, X. The Impact of Sahara Dust on Air Quality and Public Health in European Countries. Atmos. Environ. 2020, 241, 117771.
  47. Sicard, P.; De Marco, A.; Troussier, F.; Renou, C.; Vas, N.; Paoletti, E. Decrease in Surface Ozone Concentrations at Mediterranean Remote Sites and Increase in the Cities. Atmos. Environ. 2013, 79, 705–715.
  48. Cristofanelli, P.; Bonasoni, P. Background Ozone in the Southern Europe and Mediterranean Area: Influence of the Transport Processes. Environ. Pollut. 2009, 157, 1399–1406.
  49. ARPA Basilicata. Raccolta Annuale dei Dati Ambientali, Anno 2022; ARPA Basilicata: Basilicata, Italy, 2023; Volume 1.
  50. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery, New York, NY, USA, 13–17 August 2016; pp. 785–794.
  51. Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable; Leanpub: Victoria, BC, Canada, 2021.
  52. Ponce-Bobadilla, A.V.; Schmitt, V.; Maier, C.S.; Mensing, S.; Stodtmann, S. Practical Guide to SHAP Analysis: Explaining Supervised Machine Learning Model Predictions in Drug Development. Clin. Transl. Sci. 2024, 17, e70056.
  53. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means Clustering Algorithm. J. R. Stat. Soc. Ser. C Appl. Stat. 1979, 28, 100–108.
  54. Govender, P.; Sivakumar, V. Application of K-Means and Hierarchical Clustering Techniques for Analysis of Air Pollution: A Review (1980–2019). Atmos. Pollut. Res. 2020, 11, 40–56.
  55. Kang, Y.; Choi, H.; Im, J.; Park, S.; Shin, M.; Song, C.K.; Kim, S. Estimation of Surface-Level NO2 and O3 Concentrations Using TROPOMI Data and Machine Learning over East Asia. Environ. Pollut. 2021, 288, 117711.
  56. Luo, Z.; Lu, P.; Chen, Z.; Liu, R. Ozone Concentration Estimation and Meteorological Impact Quantification in the Beijing-Tianjin-Hebei Region Based on Machine Learning Models. Earth Space Sci. 2024, 11, e2023EA003346.
  57. Gagliardi, R.V.; Andenna, C. A Machine Learning Approach to Investigate the Surface Ozone Behavior. Atmosphere 2020, 11, 1173.
  58. Ordóñez, C.; Garrido-Perez, J.M.; García-Herrera, R. Early Spring Near-Surface Ozone in Europe during the COVID-19 Shutdown: Meteorological Effects Outweigh Emission Changes. Sci. Total Environ. 2020, 747, 141322.
  59. Ni, J.; Jin, J.; Wang, Y.; Li, B.; Wu, Q.; Chen, Y.; Du, S.; Li, Y.; He, C. Surface Ozone in Global Cities: A Synthesis of Basic Features, Exposure Risk, and Leading Meteorological Driving Factors. Geogr. Sustain. 2024, 5, 64–76.
  60. Li, M.; Yu, S.; Chen, X.; Li, Z.; Zhang, Y.; Wang, L.; Liu, W.; Li, P.; Lichtfouse, E.; Rosenfeld, D.; et al. Large Scale Control of Surface Ozone by Relative Humidity Observed during Warm Seasons in China. Environ. Chem. Lett. 2021, 19, 3981–3989.
  61. Laban, T.L.; Van Zyl, P.G.; Beukes, J.P.; Mikkonen, S.; Santana, L.; Josipovic, M.; Vakkari, V.; Thompson, A.M.; Kulmala, M.; Laakso, L. Statistical Analysis of Factors Driving Surface Ozone Variability over Continental South Africa. J. Integr. Environ. Sci. 2020, 17, 1–28.
  62. Camalier, L.; Cox, W.; Dolwick, P. The Effects of Meteorology on Ozone in Urban Areas and Their Use in Assessing Ozone Trends. Atmos. Environ. 2007, 41, 7127–7137.
  63. Cheng, N.; Jing, D.; Gu, Z.; Cai, X.; Shi, Z.; Li, S.; Chen, L.; Li, W.; Wang, Q. Observation-Based Ozone Formation Rules by Gradient Boosting Decision Trees Model in Typical Chemical Industrial Parks. Atmosphere 2024, 15, 600.
  64. Borhani, F.; Shafiepour Motlagh, M.; Stohl, A.; Rashidi, Y.; Ehsani, A.H. Tropospheric Ozone in Tehran, Iran, during the Last 20 Years. Environ. Geochem. Health 2022, 44, 3615–3637.
  65. Han, H.; Liu, J.; Shu, L.; Wang, T.; Yuan, H. Local and Synoptic Meteorological Influences on Daily Variability in Summertime Surface Ozone in Eastern China. Atmos. Chem. Phys. 2020, 20, 203–222.
  66. Kavassalis, S.C.; Murphy, J.G. Understanding Ozone-Meteorology Correlations: A Role for Dry Deposition. Geophys. Res. Lett. 2017, 44, 2922–2931.
  67. Liao, Z.; Pan, Y.; Ma, P.; Jia, X.; Cheng, Z.; Wang, Q.; Dou, Y.; Zhao, X.; Zhang, J.; Quan, J. Meteorological and Chemical Controls on Surface Ozone Diurnal Variability in Beijing: A Clustering-Based Perspective. Atmos. Environ. 2023, 295, 119566.
  68. Cheng, Y.; Huang, X.F.; Peng, Y.; Tang, M.X.; Zhu, B.; Xia, S.Y.; He, L.Y. A Novel Machine Learning Method for Evaluating the Impact of Emission Sources on Ozone Formation. Environ. Pollut. 2023, 316, 120685.
  69. Zhang, C.; Xie, Y.; Shao, M.; Wang, Q. Application of Machine Learning to Analyze Ozone Sensitivity to Influencing Factors: A Case Study in Nanjing, China. Sci. Total Environ. 2024, 929, 172544.
  70. Nguyen, D.H.; Lin, C.; Vu, C.T.; Cheruiyot, N.K.; Nguyen, M.K.; Le, T.H.; Lukkhasorn, W.; Vo, T.D.H.; Bui, X.T. Tropospheric Ozone and NOx: A Review of Worldwide Variation and Meteorological Influences. Environ. Technol. Innov. 2022, 28, 102809.
  71. Bi, Z.; Ye, Z.; He, C.; Li, Y. Analysis of the Meteorological Factors Affecting the Short-Term Increase in O3 Concentrations in Nine Global Cities during COVID-19. Atmos. Pollut. Res. 2022, 13, 101523.
  72. Lee, H.J.; Chang, L.S.; Jaffe, D.A.; Bak, J.; Liu, X.; Abad, G.G.; Jo, H.Y.; Jo, Y.J.; Lee, J.B.; Kim, C.H. Ozone Continues to Increase in East Asia despite Decreasing NO2: Causes and Abatements. Remote Sens. 2021, 13, 2177.
  73. Chen, J.; Shen, H.; Li, T.; Peng, X.; Cheng, H.; Ma, C. Temporal and Spatial Features of the Correlation between PM2.5 and O3 Concentrations in China. Int. J. Environ. Res. Public Health 2019, 16, 4824.
  74. Wang, J.; Dong, J.; Guo, J.; Cai, P.; Li, R.; Zhang, X.; Xu, Q.; Song, X. Understanding Temporal Patterns and Determinants of Ground-Level Ozone. Atmosphere 2023, 14, 604.
  75. Agathokleous, S.; Saitanis, C.J.; Savvides, C.; Sicard, P.; Agathokleous, E.; De Marco, A. Spatiotemporal Variations of Ozone Exposure and Its Risks to Vegetation and Human Health in Cyprus: An Analysis across a Gradient of Altitudes. J. For. Res. 2023, 34, 579–594.
  76. An, C.; Li, H.; Ji, Y.; Chu, W.; Yan, X.; Chai, F. A Review on Nocturnal Surface Ozone Enhancement: Characterization, Formation Causes, and Atmospheric Chemical Effects. Sci. Total Environ. 2024, 921, 170731.
  77. Wang, K.; Xie, F.; Sulaymon, I.D.; Gong, K.; Li, N.; Li, J.; Hu, J. Understanding the Nocturnal Ozone Increase in Nanjing, China: Insights from Observations and Numerical Simulations. Sci. Total Environ. 2023, 859, 160211.
  78. He, G.; He, C.; Wang, H.; Lu, X.; Pei, C.; Qiu, X.; Liu, C.; Wang, Y.; Liu, N.; Zhang, J.; et al. Nighttime Ozone in the Lower Boundary Layer: Insights from 3-Year Tower-Based Measurements in South China and Regional Air Quality Modeling. Atmos. Chem. Phys. 2023, 23, 13107–13124.
Figure 1. Location of the study area and the measurement sites. The different colors of the site names indicate the three clusters of the O3 daily pattern, as described in Section 3.4. Potenza and Matera are the two major urban areas. The two oil pre-treatment plants are also shown.
Figure 2. Scatter plot of predicted vs. observed O3 values on the test dataset for (a) FE, (b) ME, and (c) VP. R2, MBE, and RMSE values and the equation of the best-fit line are also shown.
Figure 3. (a) Percentage contribution of meteorological variables (meteo) and atmospheric pollutants (chemicals) to O3. (b) Summary plots of the SHAP values for O3. The values on the left side represent the absolute mean of the SHAP values.
Figure 4. Main effects of the top five features on O3 (µg/m3) for (a) FE, (b) ME, and (c) VP.
Figure 5. Time series of the observed and predicted O3 concentrations (black and red lines) and the stacked area time series of the individual SHAP values for nine variables (colored areas). On the right side are the importance ranking of the features and their absolute mean of the SHAP values. The blue and black dotted lines represent, respectively, the base value, 73.4 µg/m3, and the target value, 120 µg/m3.
Figure 6. Time series of the main features in the period of interest: (a) T, (b) RH, (c) CO, (d) NO, (e) NO2, (f) ws. The grey areas indicate the feature values favorable to O3 formation. In the case of CO, the grey area is not reported because no evident turning point was observed.
Figure 7. Clustered O3 daily pattern. The black line represents the final clustering centers.
Figure 8. Summary plots of the SHAP values for each cluster of O3 daily pattern. (a) FE, LM, MdB, SNM, VZI, (b) CM, ME, PZB, (c) GR, VP.
Figure 9. (a) ws, (b) RH, (c) NO, and (d) NO2 feature values daily pattern.
Table 1. List of selected air quality monitoring stations and corresponding identifiers (ID).
| Site | ID | Station Type, Area | Longitude E | Latitude N | Altitude (m a.s.l.) |
|---|---|---|---|---|---|
| Costa Molina | CM | Industrial, rural | 15°57′17.2″ | 40°18′56.2″ | 690 |
| Ferrandina | FE | Industrial, rural | 16°29′46.4″ | 40°29′09.0″ | 63 |
| Grumento | GR | Industrial, suburban | 15°53′29.1″ | 40°17′18.2″ | 735 |
| La Martella | LM | Industrial, suburban | 16°32′49.7″ | 40°41′11.9″ | 245 |
| Masseria De Blasis | MdB | Industrial, rural | 15°52′02.5″ | 40°19′27.2″ | 603 |
| Melfi | ME | Industrial, suburban | 15°38′23.9″ | 40°59′02.8″ | 561 |
| Potenza SL Branca | PZB | Industrial, suburban | 15°52′22.4″ | 40°38′38.0″ | 720 |
| San Nicola Di Melfi | SNM | Industrial, rural | 15°43′21.9″ | 41°04′01.4″ | 187 |
| Viggiano Paese | VP | Industrial, rural | 15°54′02.5″ | 40°20′05.5″ | 820 |
| Viggiano ZI | VZI | Industrial, rural | 15°54′16.4″ | 40°18′50.6″ | 604 |
Table 2. Summary of feature values favorable to O3 concentration increase. (*) Melfi site, (**) feature not in the first top five.
| Site | RH [%] | T [°C] | ws [m/s] | NO2 [µg/m3] | NO [µg/m3] |
|---|---|---|---|---|---|
| Most favorable feature conditions to O3 accumulation | <75 | >20 | >2; 2–5 * | <5; >3 * | <2 |
| CM | <75 | >17 | >2 | <4 | <2 |
| FE | <80 | ** | >1 | <10 | <4 |
| GR | <73 | >15 | >2 | <5 | <2 |
| LM | <77 | >20 | ** | <8 | ** |
| MdB | <85 | >20 | >2 | <5 | ** |
| ME | <72 | >17 | 2–5 | >3 ** | <2 |
| PZB | <75 | >20 | >1 | <8 | ** |
| SNM | <73 | >17 | >2 | <13 | <5 |
| VP | <60 | >17 | >2 | <4 | ** |
| VZI | <78 | >15 | >1 | <6 | <4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gagliardi, R.V.; Andenna, C. Exploring the Influencing Factors of Surface Ozone Variability by Explainable Machine Learning: A Case Study in the Basilicata Region (Southern Italy). Atmosphere 2025, 16, 491. https://doi.org/10.3390/atmos16050491
