Statistical Analysis of PM 10 Concentration in the Monterrey Metropolitan Area, Mexico (2010–2018)

: Air-quality monitoring and analysis are initial parts of a comprehensive strategy to prevent air pollution in cities. In such a context, statistical tools play an important role in determining the time-series trends, locating areas with high pollutant concentrations, and building predictive models. In this work, we analyzed the spatio-temporal behavior of the pollutant PM 10 in the Monterrey Metropolitan Area (MMA), Mexico during the period 2010–2018 by applying statistical analysis to the time series of seven environmental stations. First, we used experimental variograms and scientiﬁc visualization to determine the general trends and variability in time. Then, fractal exponents (the Hurst rescaled range and Higuchi algorithm) were used to analyze the long-term dependence of the time series and characterize the study area by correlating that dependence with the geographical parameters of each environmental station. The results suggest a linear decrease in PM 10 concentration, which showed an annual cyclicity. The autumn-winter period was the most polluted and the spring-summer period was the least. Furthermore, it was found that the highest average concentrations are located in the western and high-altitude zones of the MMA, and that average concentration is related in a quadratic way to the Hurst and Higuchi exponents, which in turn are related to some geographic parameters. Therefore, in addition to the results for the MMA, the present paper shows three practical statistical methods for analyzing the spatio-temporal behavior of air quality.


Introduction
Since ancient times, the behavior of the different variables that interact in the atmosphere has drawn the attention of human beings, as these variables set the guidelines for the conditions of the environment in which daily activities are carried out. The study of pollutant behavior has become an important and relevant topic in a society that aspires to improve quality of life.
Air pollution is defined as the presence of one or more chemical substances in the air at high concentrations that can damage human beings, animals, vegetation, and even materials [1][2][3][4]. Some of the pollutants studied and measured are ozone (O 3 ), sulfur dioxide (SO 2 ), nitrogen dioxide (NO 2 ), carbon monoxide (CO), lead (Pb), and particulate matter (PM). The Norma Oficial Mexicana NOM-025-SSA1-2014 [5] describes PM as a mix of substances in a solid or liquid state that remains suspended in the atmosphere. Each particle can present a wide variety of features. Therefore, to better study it, this material is classified into two groups according to its aerodynamic behavior-PM 2.5 and PM 10 , which are less than 2.5 µm and 10 µm, respectively [4,[6][7][8].
One way to examine the impact of pollutants on the environment is through their time series. Different methodologies have been used to study this, such as that of Arreola Contreras and González [9], in which a spectral analysis is applied to time series, with which they found that wind is correlated with the tracking of PM 10 levels. Likewise, Lee et al. [10] indicate that it is possible to use Box counting as a tool to distinguish spatio-temporal variations of PM 10 .
As a temporal population variance estimator and structural analysis model, a variogram describes the relation of paired observations separated by a distance h. Specifically, this method consists of estimating the expected value of the squared difference between neighboring random variables separated by different values of h throughout the time series, providing fundamental support and allowing the representation of such a relation quantitatively, as well as determining important aspects of the series such as the distance at which the maximum variability occurs, whether there is cyclicity or not, and whether to perform interpolation or kriging [11,12]. Several authors have studied the temporal variation of PM 10 , correlating its variability with other variables such as wind direction, temperature, and diurnal cycle, among others [13][14][15][16][17][18]. In turn, Serna [19] [21], measure the fractality of time series, which is directly related to the persistency or long-term memory of time series. Therefore, the higher the value in the exponent, the higher the persistence. Several analyses relating the dependence of long-term memory on different pollutants have been reported in the literature [22][23][24][25][26]. Hrs was used as the primary tool in those reports, which found a longterm correlation for PM 10 results. Additionally, Nikolopoulos et al. [23] used the HFD algorithm to study PM 10 time series, indicating fractality and long-term memory in the behavior of the records from the studied stations. However, none of these exponents have been calculated to compare them with geographic parameters, such as latitude, longitude, and altitude, despite the fact that those relations have been found for other meteorological variables [27][28][29][30].
Regarding standards and other studies undertaken in Mexico, air pollution has been considered a serious and complex issue in this country since the 1970s, which has resulted in the design new techniques to study this phenomenon [31][32][33]. Among these researches, Audelo-Vucovich et al. [34] stand out, having used the Hurst exponent and other tools to determine the persistence of PM 10 concentrations and their correlation with environmental pre-contingency, observing that the probability of reaching a pre-contingency event increases when Hrs > 0.3.
Moving deeper into the study area, different strategies have been established in the Monterrey Metropolitan Area (MMA), given the city's characteristics and accelerated urban growth. In the mid-1990s, PM 10 and O 3 pollutants frequently registered values that exceeded the established level limit of the air quality standards [35]. In this context, the government of the State of Nuevo León, in the Environmental State Plan 1995-2020 (Plan Estatal de Medio Ambiente 1995-2020), defined guidelines and developed the Administration of Air Quality Program 1997-2000 (Programa de Administración de la Calidad del Aire del AMM 1997-2000) [36]. As a result, the region uses an environmental monitoring system called the Sistema Integral de Monitoreo Ambiental (SIMA), which keeps track of air quality based on the criteria pollutants (CO, SO 2 , NO x , O 3 , PM 10 , and PM 2.5 ). Recently, air pollution was analyzed between 2006 and 2008, comparing PM 10 levels at different time scales (seasons and days) and employing the Bootstrap method to obtain samples studied under a variance analysis [37]. It was found that PM 10 values were high and exceeded the 75 µg/m 3 per day limit determined by NOM-025-SSA1-2014 in 2014 [5], and it was concluded that these levels may have severe effects on health. The southern MMA had higher PM 10 levels during the studied period. Winter was the most polluted season and summer the least. The most polluted days were Thursday and Friday, while Sunday was the least. The hours with the highest PM 10 levels were from 8:00 to 10:00 a.m., while the hours during the night had the lowest levels. Moreover, according to SIMA records, the state government has issued continuous pollution alerts in recent years, meaning that there were more than 200 days per year of poor-quality air reported by at least one environmental station in the MMA [38]. In addition, a connection has been reported between chronic exposure to particle pollution and lead in blood, as well as uncommon common diseases [39] such as cleft lip and palate [40]. Similar studies outside the region have found a clear correlation between pollutants and obesity [41][42][43].
In this work, we carried out a spatio-temporal analysis of PM 10 pollution in the MMA for the period 2010-2018 with two main objectives, the first being to provide a current and detailed quantification of temporary and geographic pollutant behavior to help to visualize the trend and distribution of pollution in the MMA, focusing on the periodicity of the time series and the detection of the most polluted zones inside the study area. For this purpose, we analyzed time series, variograms, and averaged data. The second objective was to determine the feasibility of using fractal coefficients, such as Hrs and HFD, to provide outstanding information for PM 10 time series analysis, taking into account the relations reported between those coefficients, geographic parameters, and other accumulated meteorological variables. The structure of the paper is as follows. The methodology used to construct the experimental variograms and calculate Hrs and HFD is shown in Section 2; results and their interpretation are discussed in Sections 3 and 4, respectively. Finally, conclusions are shown in Section 5.

Data Acquisition and General Observations
Data were obtained from the monthly reports published by the government of Nuevo León throughout SIMA [44] meaning that the data were validated and published as open access. SIMA used a gravimetric method based on the Reference Method for the Determination of Suspended Particulate Matter in the Atmosphere (High-Volume Method) [45], which is also the official method established by the federal code regulations of the USA [46]. The minimum period interval used in the SIMA reports is one month. We used this minimum time length to obtain the maximum amount data for our analysis for the following reasons. First, despite the fact that nowadays there are 14 working environmental stations, we collected only the information of the seven stations with more records, since the other seven stations have significant missing data or have started to work only recently. The location of the selected environmental stations and the map of the MMA are illustrated in Figure 1. The time series used are from January 2010 to December 2018. This period was chosen because additional changes regarding the distribution of data reports were made starting in June 2019. Therefore, no treatment was performed on the database; all the information analyzed is raw data. Each time series comprises about 108 data points, corresponding to 9 years of information.
In Sections 2.2-2.4, we used the experimental semi-variogram and Hrs and HFD exponents to analyze these series. Additionally, we compared those data with some air quality standards to estimate how high the pollution level is in relation to health risk. The national standard that we used is NOM-025-SSA1-2014 [5], while the international standard considered is that proposed by the WHO for the year 2021 [47]. The classification intervals for air quality of both standards are shown in Table 1.  Table 1. Labels used to define the air quality and health risk according to [5,47,48]. The last column corresponds to the corresponding PM 10 intervals.

Experimental Semi-Variogram
Let a time series of PM 10 concentration be {X t , t ≥ 0}. The semi-variogram γ(h) of such a series is the half variance of the difference between pairs of observations separated by temporal distances h, such that [49,50]: We used the discretized version of the estimator shown in Equation (1) to complete the semi-variograms of our PM 10 series: where n(h) refers to the number of temporal distances with a length h (in months) for the time series X t .

Hurst Exponent
The Hurst rescaled range (Hrs) was measured for each time series X. The procedure consists of dividing X into d subseries [51,52]. Then, the respective rescaled range is obtained by: where E m is the mean of the subseries X m , and Y j = i ∑ j=1 (X jm − E m ) represents the cumulative series of deviations considering values until the j-th subseries.
Next, the range shown in Equation (3) is rescaled and averaged, such that: where S m is the standard deviation of the series X m . Finally, Hrs is the exponent H that satisfies the power law: where c > 0 is a dimensionless constant. Practically, Hrs is calculated by applying the logarithm function to both sides of Equation (5), obtaining The transformation of Equation (6) allows obtaining the coefficient H by a linear fitting of the points, using the Least Squares Method (LSM). A key point in determining Hrs is the number of subseries m in which the temporal series is divided. In this work, we take m = 12 because of the annual periodicity of the data.

Higuchi Fractal Dimension (HFD)
The Higuchi fractal dimension (HFD) was calculated according to Higuchi [21]. Similar to Hrs, HFD divides the total time series into windows; however, its methodology is different since it does not calculate the range for the accumulative series, but for each subseries in a particular way. Namely, for a time series X with N data points [53], expresses the fractal length of the subseries X m k , where k = 1, . . . , k max indicates the k-th time, m is the interval time of that subseries, and k max is the total number of subseries used. Therefore, the mean of all the L m (k)'s subseries (Equation (7)) defines the fractal length L(k) of the wide series, so that: Similar to Hurst's procedure, in practice, L(k) can be characterized by the slope F that fits the straight line equation: which is obtained applying the corresponding logarithmic transformation from Equation (8).
In Equation (9), b > 0 is a constant, and F is a dimensionless constant. The fitting is performed by the LSM. We used a maximum number k max = 8 of subseries in this work, considering the cyclical analysis for Hurst's procedure and a good resolution in HFD values. Figure 2 shows the time series of the monthly-averaged PM 10 values for the environmental monitoring stations in the study area. In each graph, the time series corresponding to the parenthetical remarks (a) to (f), is compared with the time series corresponding to the Northwest2 station, because this environmental station differs from the others in some aspects. Namely, it recorded the highest PM 10 levels during the study period despite the fact that the municipality is a sub-urban zone outside the MMA; see Figure 1. The majority of the stone quarries are located in this zone [54], there is a difference of 200 m between that municipality and the rest of the MMA, and the main wind direction is east-west, driving the particles of all the metropolitan area to this zone [9]. The semiannual averages are marked with a discontinuous horizontal line of the same color as the graph in the corresponding period. Background colors indicate the division of the years in six-month periods. Finally, the established thresholds are marked to classify air quality by Mexican norms NOM-025-SSA1-1993 and NOM-025-SSA1-2014 [5,48] by horizontal red lines; see Table 1 for the meaning of labels of air quality. The GL limit showed discontinuity in 2015, since there was a decrease in the classification criteria of the NOM-025-SSA1-2014, which took effect that year. The horizontal dashed line is the limit for good air quality established by the World Health Organization (WHO) [47]. Table 2 displays the results of the annual averages obtained from the time series and the Hrs and HFD exponents obtained from the procedures described in Sections 2.3 and 2.4, respectively. In addition, the geographic information corresponding to each station is presented as well. The pollution variance is calculated by the semi-variogram formula (2) and shown in Figure 3 for each station. In each graph, the variogram of the Northwest2 station is compared to those of the other stations. In addition, a linear fit indicating the general tendency (smoothing the oscillating phenomenon) of each station is displayed.  Table 1. Vertical panels divide the window time into six-month periods to illustrate the periodicity of the series. Short-horizontal lines with a length of one year indicate the PM 10 mean of that year for that time series.

Time Series and Semi-Variograms
The results shown in the time series in Figure 2 indicate a general decrease in the last ten years. The variability analysis shown in Figure 3 allows us to quantify this tendency for each station.
Specifically, the Northwest2 station showed a more significant decrease than the others with a slope of 4.04 in contrast to slopes smaller than 1 for the other stations (Southeast has 1.82); see Figure 3. However, despite that decrease, the aforementioned station showed one of the highest pollution levels per six-month period, as is observable in the horizontal bars of the series shown in Figure 2. In turn, the stations located to the north and the west zones of the MMA showed a greater amount of pollution than the stations in the east. As previously reported [9], these observations indicate the accumulation of pollution in the western region caused by the predominant airflow in the east-to-west direction. This is essential knowledge for MMA inhabitants and authorities to identify or map the most dangerous zones inside the metropolis.
The time series of all the environmental stations exhibited an annual quasi-periodicity, which is clearly shown in most of the charts in Figure 3, except for the northern and southeastern stations. This phenomenon had already been suggested by González-Santiago et al. [37], who reported greater PM 10 concentrations in winter and minor concentrations in summer during 2006-2008. We found that this periodic tendency remained for 2010-2018. Indeed, aided by the division of background colors in the charts of Figure 2, greater PM 10 concentrations can be observed in the autumn-winter period than in the spring-summer period for all of the MMA. This suggests that the quasi-periodic behavior of the time series is a natural phenomenon of the region that could be related to other environmental factors that have been reported with the same cyclical characteristics for the study area, such as the seasonal rains of the region [55].
The decrease of PM 10 concentrations over time could also be a natural phenomenon or it could be caused by the actions of the corresponding government, since the federal government establishes the air quality reference limits while the state government is in charge of establishing measures to fulfill these limits. As is observable in the charts of Figure 2, although there are some stations in the interval of PH, and in particular cases, with historical records reaching VV, the values tended to decrease as previously mentioned, and managed to balance within the AM-GL ranges in the case of the North, Center, and Southeast stations. When analyzing the different periods of the seasons of the year, it can be observed that the highest peaks of all time series are concentrated in the period from autumn-winter 2010 to the beginning of spring-summer 2011. Authorities have associated those high PM 10 concentrations to three atmospheric factors [56]. The first is the lack abundance of rainfall from October 2010, which brought droughts. The second one is the low relative humidity index due to a significant increase in temperature. The third is related to the speed and direction of winds, since strong winds with speeds of 20 to 40 km/h and an average flow in the east-west direction in the MMA caused all the particulate matter that was part of the ground to enter atmospheric suspension once again and consequently increase the PM 10 indices.
Additionally, it can also be observed that time series of the Northwest, Northwest2, and Southwest stations oscillate around the AM limit with their PM 10 mean value in the PH interval; see Table 1. The other stations oscillate between lower values compared to the previous ones, from GL to AM.

Fractal Exponents, PM 10 Mean, and Geographic Parameters
According to the obtained results, it is possible to classify the seven time series as a correlation or long-term memory due to the Hurst and Higuchi exponents, with Hrs > 0.5 and HFD > 1.5 values for all the stations.
We found that the geographic parameters are as much related to the fractal exponents that were used as to the average PM 10 concentration. In detail, the average value of PM 10 follows has a quadratic behavior related to increase in altitude; i.e., the greater the height, the greater the PM 10 concentration; see Figure 4. This relation has a fitting of R 2 = 0.79, whereas Hrs and HFD do not have a clear relation, as they obtained low R 2 values in their quadratic fitting: 0.23 and 0.01, respectively.
The fittings of Hrs, HFD, and the PM 10 mean for the longitude of the stations have a behavior similar to that of altitude. As seen in Figure 5, the curve of Hurst, as well as that of Higuchi, has a concave-upwards-quadratic behavior, with bad fitting R 2 < 0.5, whereas the curve of PM 10 mean is concave downwards and presents very good fitting (R 2 = 0.87); in fact, the pollution concentration grows in the stations located mainly in the western zone.  This last observation and the PM 10 -altitude relation were expected due to the predominant airflow and the suspension of such particles in the air, as commented on in Section 4.1. In response to these clear tendencies, it can be concluded that the population of the west zone of the MMA is more susceptible to suffering diseases (e.g., cleft lip and palate [40]) caused by PM 10 than the people living in the central and eastern zones. This encourages the development of initiatives in which the government can take preventive action, either segmented or clustered within the MMA, focusing on its western zone.
On the other hand, Figure 6 shows that latitude is mainly related to the values obtained from the HFD in a quadratic way (R 2 = 0.86), whereas Hrs fits with R 2 = 0.42; the quadratic curves fitted by both methods reach their minimum value in latitude between 25.70 • and 25.75 • . Moreover, the PM 10 mean does not have a clear relation with latitude (R 2 = 0.15). Even though the series are persistent according to HFD, we cannot clearly explain the good fit between the long-term correlation measured by HFD and the latitude. In addition, there is good quadratic fitting for the Higuchi-PM 10 mean (R 2 = 0.77) and Hurst-PM 10 mean (R 2 = 0.72) (see Figure 7), which we think should be studied in more detail by increasing the number of environmental monitoring stations in future work, since this could help to numerically characterize the behavior of PM 10 through the persistence of its time series and fractal exponents.

Conclusions
Seven time series of the air pollutant PM 10 taken from environmental monitoring stations located in the Monterrey Metropolitan Area (MMA), Mexico, were analyzed, with the following results obtained: • The annual cyclical behavior reported in previous studies was the same for the period studied, from 2010 to 2018, with greater pollution levels during the autumn-winter periods and lower levels during spring-summer periods; • The PM 10 concentration has been decreasing, which is promising for the people of the MMA in the near future. Nevertheless, half of the selected stations exceeded the AM limit during their most polluted period. Therefore, studying strategies to deal with pollution can help the authorities make decisions that protect the environment and the quality of life of MMA inhabitants; • The highest values of average pollution are located in the western zone and the zone with the highest altitude in the MMA, which is probably a consequence of the main direction of wind flow, as well as geographical features. Consequently, we suggest a personalized alert system for the interior of the MMA so that the state government can issue policies and warn people from different areas of the metropolis according to the level of contamination presented by each zone; • The persistence of the series, measured with Hrs and HFD, is related in a quadratic way to the pollution mean. We did not find a correlation between these coefficients and altitude, but we found a strong correlation between HFD and latitude, which is a significant result since latitude was not related to PM 10 mean level; • The use of the semi-variograms and fractal exponents Hrs and HFD provided practical results in the time series analysis, which suggests that these techniques should continue to explored and that they should be complemented with other meteorological parameters for the corresponding authorities to take into account in their decision making.