A Statistical Procedure for Analyzing the Behavior of Air Pollutants during Temperature Extreme Events: The Case Study of Emilia ‐ Romagna Region (Northern Italy)

Featured Application: The characterization of the correlation structure among air pollutants dur ‐ ing heat waves may be useful for decision makers that have the responsibility of taking addi ‐ tional measures to prevent health effects on the population when extreme temperatures are fore ‐ casted. Abstract: Meteorological conditions play a crucial role in air pollution by affecting both directly and indirectly the emissions, transport, formation, and deposition of air pollutants. Extreme weather events can strongly affect surface air quality. Understanding relations between air pollutant con ‐ centrations and extreme weather events is a fundamental step toward improving the knowledge of how excessive heat impacts on air quality. In this work, we developed a statistical procedure for investigating the variations in the correlation structure of four air pollutants (NO x , O 3 , PM 10 , PM 2,5 ) during extreme temperature events measured in monitoring sites located of Emilia Romagna re ‐ gion, Northern Italy, in summer (June–August) from 2015 to 2017. For the selected stations, Hot Days (HDs) and Heat Waves (HWs) were identified with respect to historical series of maximum temperature measured for a 30 ‐ year period (1971–2000). This method, based on multivariate tech ‐ niques, allowed us to highlight the variations in air quality of study area due to the occurrence of HWs. The examined data, including PM concentrations, show higher values, whereas NO x and O 3 concentrations seem to be not influenced by HWs. This operative procedure can be easily exported in other geographical areas for studying effects of climate change on a local scale.


Introduction
In the last years, the scientific research linked to human health and wellbeing of populations in urban and industrialized areas mainly focused on two issues: climate change and air quality [1][2][3][4][5][6][7][8][9][10]. Climate change affects human wellbeing through many related events: increased frequency and intensity of heat waves and cold waves, changing precipitation intensity, and increased devastating weather events, such as hurricanes, tornadoes, floods and droughts, cold-related mortality, and heat-related mortality [11][12][13][14][15][16][17]. Observed climate trends show that temperatures are increasing, particularly extreme temperatures, with heat waves (HWs) becoming more frequent, more intense, and longer lasting [3,[17][18][19][20]. In literature, many studies are focused on the effects of HWs and air pollution on human health, particularly on the increase in mortality and morbidity, whilst few studies investigate the link between HWs and air pollution [2,[21][22][23][24].
Climate change is intrinsically connected to air pollution because the main driver of climate change is fossil fuel combustion, which is also a major contributor to air pollution. In fact, combustion processes emit both greenhouse gases (CO2, CH4, N2O) and air pollutants (PM, SO2, NOx, CO) [25][26][27][28]. Meteorological conditions seem to influence atmospheric concentrations of ozone, particulate matter, and nitrogen oxides. These air pollutants are a major threat to human health; particularly, they are dangerous for older people, people with heart disease, and children. Inadequate pollutant dispersion due to topographic and meteorological conditions, such as low wind intensity, cause the stagnation of the chemical air mass, associated with peak pollution episodes [29][30][31].
We focused our investigation on Po plain, an area in which many studies have shown a strong influence of dispersion conditions for all particle matter ranges, a distinct anthropogenic periodicity (seasonal and diurnal) for many pollutants, and a strong dependence on atmospheric conditions. Our study analyses data collected during the summers of the years 2015-2017 in the Emilia Romagna Region (Italy). We focus on this region because of its severe urban air pollution due to the high population and industrial manufacturing density and to the fact of being in a valley where two surrounding mountain chains favor the stagnation of pollutants [29][30][31]. Extreme temperature events (Hot Days and Heat Waves) in selected 14 regional stations were identified with respect to a 30-year period . The objective of this study was to develop an operative procedure based on multivariate techniques for investigating the relationship between temperature and air quality during Hot Days and Heat Waves and for evaluating the impact of temperature on air pollutant concentrations.

Study Area
The study was carried out in Emilia Romagna (Figure 1), a region of Northern Italy that largely includes the Po Valley, delimited by Tusco-Emilian Apennines to the south, with mountains reaching altitude of 2000 m a.s.l. A climate gradient, from the Mediterranean warm-temperate climate to the cold-temperate climate of the Apennines, is present in the region. In the Po plain, summers are hot, winters are cold (typical monthly mean temperatures ranging from 1 °C to 26 °C), and autumns rainy. This region is characterized by high humidity levels (typical monthly mean relative humidity ranging from 60% to 84%) and low wind intensities (typical annual mean wind intensities of about 2 m/s). The Emilia Romagna region is densely populated and suffers a strong anthropic pressure due to urban areas, heavy industrialization, and intensive breeding and agriculture.

Data
In this work, 14 sampling sites (SS) were selected. The sampling points are located either in rural areas or in background urban or suburban areas. Moreover, the stations are representative of the four areas in which the territory of the region is classified, the agglomeration area of Bologna city and the three different geographical zones: west plain area, east plain area, and Apennine mountains and hill area ( Figure 1). Each sampling site consists of an air-quality monitoring station and a closer meteorological station managed by the Regional Agency for Prevention, Environment, and Energy of Emilia-Romagna (www.arpae.it accessed on 1 July 2021). The sampling sites are listed in Table 1 For each sampling site, we used historical series of maximum temperatures for the reference 30-year period (1971-2000); we selected this reference period on the base of the suggestion given by WMO (World Meteorological Organization) to update CLINO every ten years [32,33]. The entire database was checked to guarantee the goodness of fit and to avoid the presence of anomalies; to this aim, ad-hoc routines were implemented in R environment for automatically detecting missing data. In all the examined data, the percentage of data missing is less than 5%; in the other cases, the data are considered not available.

Data Analysis Procedure
Using historical series of maximum temperature and following the methodology proposed in [3], Hot Days (HD) and Heat Waves (HW) for each sampling site and for each summer were determined. For each station, we defined the degree of severity of these extreme events, comparing the 2015-2017 data with the respective historical time series from 1971-2000. The current daily value of temperature was compared with its mean historical value, including standard deviation (σ). We define Hot Days of the first degree if the current value is higher than historical values ±σ; Hot Days of the second degree with , if the current value is higher than historical values ± 2σ; and Hot Days of the third degree with , if the current value is higher than historical values ± 3σ. Furthermore, to measure the persistence of these events, the occurrence of Heat Waves is defined with the following criteria: occurrence of at least six consecutive days classified as = occurrence of Heat Wave of the first degree ; occurrence of at least six consecutive days classified as = occurrence of Heat Wave of the second degree ; and occurrence of at least six consecutive days classified as = occurrence of Heat Wave of the third degree . In a second step, multivariate statistical procedures were applied [34,35]. For each sampling site, all data related to summer 2015-2017 were organized in data matrices at daily scale. Cluster Analysis (CA) and Principal Component Analysis (PCA) were applied to highlight the underlying correlation structure between pollutants concentrations and meteorological parameters [36]. CA is a classification technique used for determining homogeneous subgroups of sampling sites. For each data matrix, we calculated the association matrix using the Euclidean distance, and we applied the clustering algorithm at complete linkage. For results validation, each cluster has to be characterized by means of endogenous indices (centroids are the mean values of descriptors measured in SS included in the cluster) and by means of exogenous indices (mean values of variables external to the clustering procedure). The correspondence among endogenous and exogenous indices allowed us to explain the clustering structure. If the data matrix is correctly constructed, the results must not depend on the selected parameters PCA is the most common factorial technique for reducing the space dimensionality and for highlighting the underlying correlation structure among measured variables. For determining the principal components (PCs), starting from data matrices, we calculated the eigenvalues and the corresponding eigenvectors of correlation matrix (association matrix based on Pearson coefficient). The eigenvectors represent the mutually orthogonal linear combinations of the original descriptors, and each of them can be considered a new independent variable. In order to investigate the nature of the new variables, or PC, we analyzed the percentage loadings matrix in which each element represents the percentage weight of the original descriptor in the PC.
For highlighting differences in the correlation structures among pollutants and meteorological parameters, we used Hot Days and Heat Waves identification procedure for selecting specific subsets of sampling days. We introduced two external variables with double modalities: (HD/noHD) and (HW/noHW). On the base of value assumed by bimodal variables, the original data matrices, including all the summer days (in which SS indicates the sampling site, y the year, m the number of sampling days 92, and n the number of descriptors), may be divided in the following submatrices: 1 HD-Submatrix and noHD-Submatrix , for which ∪ with . Basically, we divided the original matrix into two submatrices one containing only HD days and the other containing only non-HD days; and 2 HW-Submatrix and noHW-Submatrix , for which ∪ with . Additionally, in this case, we divided the original matrix in two parts: the first that contains only the days that fall within a defined interval HW and another that contains only the days that do not belong to intervals of HW.
This procedure may be repeated also introducing the degree of severity for HD and HW, reducing the number of sampling days included in the submatrices, and increasing the level of risk. For all these different submatrices, PCA is applied in order to characterize the different correlation structures.

HD and HW Characterization
Hot Days and Heat Waves of the three levels of severity identified for the 14 study sites are shown in Tables 2 and 3. We highlight that the three threshold series used to identify extreme events were obtained using historical temperature series proper for each site station; in this way, the extreme event occurrences were identified with a higher degree of accuracy because linked to strictly local information.
From Table 2, we note that 2015 and 2017 present higher events of , , and respect to 2016; Table 3  Comparing these results with those obtained in an area of Southern Italy (Matera, Basilicata) during summer 2015-2017, we note that, using the same method, the number of HDs and HWs highlighted in the second case are greater. Particularly, we note the marked presence of second-and third-level HWs, which instead are rare in Emilia-Romagna. This is probably due to the difference between the two examined regions: in Emilia-Romagna, the anthropogenic pressure of the last 40 years has remained unchanged, while in Matera, there has been a significant variation in land use [3,11].

Cluster Analysis on 2015-2017 Data
In Table 4, mean values of all the six descriptors are shown. If we focus on pollution data, we note that the recorded values, the maximum values included, do not exceed the alarm threshold stated by Italian law, which incorporates the European indications [37]. If we compare the values measured in the selected sampling sites with the data shown in the literature for similar studies, we note that they are low, both those measured by Mavrakys et al. [24] and those measured by Kalisa et al. [2] To put in evidence homogeneous subgroups of sampling sites, in our statistical procedure, we applied a clustering algorithm. We repeated the clustering procedure twice, the first time taking into account all pollutants and meteorological parameters available and the second time eliminating RHmin because, as shown in Table 4, it presents a great deal of data missing. We obtained similar results so, for brevity, here, we show only the results of the procedure applied eliminating RHmin.

Combined Analysis of HD/HW Occurrence and Pollutants Concentrations
In the groups SS2, SS3, and SS5, highlighted by cluster procedure, the frequency of Hot Days is higher than in the other SSs. In 2015, 53% of summer days are classified as , while the other SSs show an average of 43% of , while in 2017, these two percentages are 50% and 40%, respectively (Table 2).
Regarding Heat Waves' occurrence (Table 3), the difference is more marked. In the first group of SSs, in 2015, an average of four for the sampling site is recorded; on the contrary, in the other group of SSs, this mean frequency is 2.5. In 2017, the frequencies were 4.3 and 3.4, respectively. Moreover, also the Heat Waves intensity (or length of HWs) was significantly different: in 2015, for the first group of SSs, 73% of sampling days classified as , which includes Heat Waves; on the contrary, in the other group of SSs, this percentage is 53%; in 2017, analogous behavior was observed, and the two percentage were 62% and 55%, respectively In the following, we present and discuss the results for the sampling site SS3 Badia (Parma, West Plain Area); this choice is based on the results previously discussed. SS3 shows a certain number of HW and HD that is statistically significant, allowing a detailed discussion of the correlation structure.
Using the HD and HW identification procedure [3], we introduce, as explained in Section 2.3, two external variables with double modalities (HD/noHD) and (HW/noHW) determining the submatrices: and and 6 .
in which n is the number of DS (n = 6); is the number of Hot Days; and is the number of Hot Days included in HW (reported in Table 5 as N number of sampling days). In Table 5, for each year and for each pollutant, we report the mean value calculated in summer and the percentage difference of mean values calculated only on sampling days classified as HD or HW. Levels of gaseous pollutants NOx and O3 show limited variations; on the contrary, the levels of particulate (PM10 and PM2.5) show a significant increase in the HD and in the HW sampling day subsets. The concentration percentage increase is in the range 15-31%. , the first three factors explain more than 80% of data variance (Table 6), and the first factor is characterized by PM concentrations, whereas for submatrices (Table 7), all the examined pollutants and meteorological parameters have a similar weight (about 20%) in the first factor, confirming the dominant role of particulate during Hot Days independent from relative humidity, which often plays an isolated role in the correlation structure. These results are comparable with what is described in Athene, where a worsening of the air-quality index is described in correspondence with HDs and HWs [19]. In our case, the pollutants that most increase their concentration in correspondence with heat waves are PM10 and PM2,5.
Our method allowed us to study the behavior of pollutants during extreme temperature events even if the concentrations of pollutants were not particularly high, while generally in the literature, we find comparisons between extreme events of concentration of pollutants and extreme events of temperature [23]. Moreover, our statistical procedure allowed us to make hypothesis also without a large database and on a local scale.

Conclusions
In this study, we present a statistical procedure investigating the behavior of air pollutants concentrations during Hot Day and Heat Waves. Our procedure was based on the following steps: definition of HD and HW from the comparison with the 30-year reference periods of each sampling site; identification of homogenous subgroups of sampling sites by means of Cluster analysis; and characterization of the correlation patterns among air pollutants during Hot Days and Heat Waves by means of Principal Component Analysis.
For the examined test case, by analyzing extreme events of maximum temperature through the identification of the number of occurrences of the Hot Days and their persistence (Heat Waves), it is possible to state that: summer 2016 presented a lower frequency of extreme event occurrence than summers 2015 and 2017; during the investigated and in examined area, no Heat Wave of the third level was identified. The last step of our procedure shows that there is a different behavior of pollutants during Heat Waves; particularly, Particulate Matter concentrations are higher independent from humidity, whereas NO2 and O3 concentrations seem to be not influenced by Heat Waves.
We want to emphasize that our procedure is easily applicable also in different geographical areas and with different pollutants. This method works well even if the pollutants' concentrations are low and if there are no large differences in the level of pollutants between Hot Days and no Hot Days.
The discussed procedure can be used in other contexts at local scale; it is useful for understanding the influence of extreme events on air pollutant concentrations and may be used by decision makers that have the responsibility of taking additional measures to prevent health effects on the population.

Conflicts of Interest:
The authors declare no conflict of interest.