Developing a Hierarchical Model for the Spatial Analysis of PM10 Pollution Extremes in the Mexico City Metropolitan Area

We implemented a spatial model for analysing PM10 maxima across the Mexico City metropolitan area during the period 1995–2016. We assumed that these maxima follow a non-identical generalized extreme value (GEV) distribution and modeled the trend by introducing multivariate smoothing spline functions into the probability GEV distribution. A flexible, three-stage hierarchical Bayesian approach was developed to analyse the distribution of the PM10 maxima in space and time. We evaluated the statistical model’s performance by using a simulation study. The results showed strong evidence of a positive correlation between the PM10 maxima and the longitude and latitude. The relationship between time and the PM10 maxima was negative, indicating a decreasing trend over time. Finally, a high risk of PM10 maxima presenting levels above 1000 μg/m3 (return period: 25 yr) was observed in the northwestern region of the study area.


Introduction
Air pollution in urban areas has become a major problem [1]. Increases in industry and urban traffic due to economic and population growth have led to an increase in gas and particulate emission that contribute to air pollution [2]. Among air pollutants, fine inhalable particles, known as particulate matter (PM) are associated with respiratory illnesses such as bronchitis, emphysema, asthma and other chronic obstructive pulmonary diseases [3]. PM can be classified by size; particles of 10 µm or less in diameter are called PM 10 . They can consist of diverse solid and liquid particles, which may contain chemical constituents such as nitrates, sulfates and organic carbon [4]. PM originates from factories, internal combustion engines, heating systems, volcanoes, and deserts, and can include dust particles formed through the mechanical breakdown of larger particles during agricultural and mining processes [5]. In the Mexico City metropolitan area, a study conducted by the U.S. Department of Energy (DOE) and Mexico's Petróleos Mexicanos (PEMEX) showed that approximately 50% of the PM 10 was composed of dust from roadways, construction, and bare land [6].
Previous studies on the damage caused to human health by breathable particulate matter have revealed an association between high concentrations of PM 10 and mortality due to respiratory diseases [1,7]. To reduce exposure and minimize the adverse effects of these particulates on public health, several studies have been conducted with a focus on understanding the causes and factors related to the origin and flow of these particles [4,8]. Most of these studies have relied on continuous measurements of PM 10 to predict future concentrations based on various models, such as multiple regression models, neural networks [9] and generalized linear models [5]. However, these short-term forecasting methodologies were developed for use in locations with limited infrastructure, and where obtaining continuous measurements of PM 10 concentrations is difficult. In other, more densely populated regions, such as the Mexico City metropolitan area, there are systems that measure these concentrations and send information in real time to a central location for the immediate activation or deactivation of alerts or emergency procedures. In these cases, it is more convenient to study air pollutants through the use of robust methodologies such as the theory of extreme values.
Extreme value theory is used in many fields, such as environmental studies, engineering and finance, to monitor the quantitative risk of future extreme events [10,11]. In the environmental sciences, it is used to model extremes of temperature, rainfall, wind and pollution, among other phenomena. The asymptotic distribution of maxima is known as the generalized extreme value distribution, which is described by three parameters corresponding to location, scale and shape [12]. These parameters are estimated using the maximum likelihood method [13]. However, this method is not robust with a small sample size, so many other estimation methods have also been proposed, such as the method of moments, the use of L-moments and the use of weighted probability moments [14][15][16].
Recently, new methodologies have been proposed for the study of extreme values, mainly for application to hydroclimatological and environmental data, all of them based on the generalized extreme value distribution. For example: Gaetan and Grigoletto [17] proposed the use of Markov random fields approximated based on smoothing kernel parameters for modeling the parameters of the GEV distribution. Reich et al. [18] studied heat waves using a Bayesian hierarchical model with the generalized Pareto distribution (GPD). Cooley and Sain [19] studied maximum rainfall events by assigning a normal prior to the parameters of the GPD. Sang and Gelfand [11] studied the extreme values of spatial stochastic processes and modeled the observed trend as a function of covariates.
In a real scenario, it is common that conditions change and that the assumptions of stationarity that are required in a traditional analysis of extreme values not met; this is because there are often trends of extreme values [20][21][22]. Recent studies have introduced covariate functions for describing extreme value distributions to model either the location parameter, the scale parameter or both. Regarding the location parameter, Weissman [23] used a sine function, Tawn [24] and Scarf [25] proposed a linear function, Rosen and Cohen [26] and Pauli and Coles [27] used splines, and Bocci et al. [28] used a geoadditive model. In the case of the scale parameter, because this parameter is assumed to be positive, it is more common to model its logarithm; therefore, Yee and Stephenson [29] used additive models, El Adlouni et al. [10] and Rodriguez et al. [30] used linear functions, and Cannon [31] proposed the use of neural networks.
In this article, we present a spatio-temporal analysis of extreme PM 10 values in the Mexico City metropolitan area. The objective of this study was to analyse PM 10 data collected over time and in different spatial locations to gain insight into the distribution of PM 10 maxima and to quantify the risk of future extreme PM 10 pollution events, even in non-monitored regions.

Study Area
The Mexico City metropolitan area is one of the world's largest urban agglomerations, with approximately 25.4 million inhabitants spread over 7866 km 2 at an average elevation of 2240 m above mean sea level. It is surrounded by mountains to the east and west, creating a basin with low points to the the north which leads to air pollution problems because of limited ventilation [6]. Two synoptic wind regimes prevail throughout the year: an anticyclonic westerly wind from November to April and a moist trade wind associated with the rainy season from May to October. Despite weak prevailing synoptic winds, the thermally induced local wind converges toward the city, which tends to restrict the ventilation of polluted air [32]. The study area and the locations of the primary sampling sites are shown in Figure 1.

A Nonstationary GEV Model
Let Y 1 , ..., Y n be an independent and identically distributed set of random variables with distribution function F X (x) and let M n = max (Y 1 , ..., Y n ). Let F M n be the distribution function of M n , because F M n = F X (x) n we have that M n is a degenerate distribution when n → ∞. The extreme value theory considers that the only nondegenerate limiting distribution G n = (M n − a n )/b n (if such a sequences of constants {b n > 0} and {a n } exist) as n → ∞ is the generalized extreme value (GEV) distribution [12]: , −∞ ≤ y ≤ +∞ when κ = 0 (Gumbel) and µ + σ/κ ≤ y ≤ +∞ when κ > 0 (Fréchet). Here, µ ∈ R, σ > 0 and κ ∈ R are the location, scale and shape parameters, respectively; see [33].
In the nonstationary case, the parameters are expressed as a function of a vector of covariates [34]. To ensure a positive value for the scale parameter, log σ(x t ) is used instead of σ(x t ) in the estimation process.

Proposed Approach
For the non-stationary case, consider assigning a linear predictor to the location and scale parameters. The κ parameter is usually assumed to be constant [29]. Therefore, we propose using the following conditions for the above parameters: where X is a scaled and centered n × p 1 matrix of covariates that includes the intercept; β i , i = 1, 2, is a vector of length p 1 ; u i , i = 1, 2, is a vector of length p 2 ; Z is an n × p 2 matrix such that . . , n, j = 1, . . . , p 2 ; x i is the vector of covariates for the i-th observation, scaled and centered; and k j the j-th centroid obtained using the method of average linkage hierarchical clustering [28,35].

Maximum Likelihood Estimation
For a sample of n observations: y = (y 1 , ..., y n ), the maximum likelihood estimator for the non-stationary GEV can be determined by maximizing the likelihood function where n is the number of observations. The function of κ t is usually assumed constant [29], so the log-likelihood is: is a vector of p × 1 parameters, the linear predictors can be written as: For this study, we assigned a penalization (P) to the vector of parameters, such that: where I p 1 and I p 2 are identity matrices of order p 1 and p 1 respectively, σ 2 β and σ 2 u are values that control the degree of regularization of the model. Therefore the penalized log-likelihood of the model is: rewrite the log-likelihood as: In order to compare two models, let M 1 with θ 1 a parametric vector against another model M 0 with θ 0 a subset vector such as M 1 ⊂ M 0 , a simple way to compare two models is to use the deviance statistic defined by [34] where l * n (M) is the maximized log likelihood function of model M. Values of D greater than χ 2 k are considered significant, where k is the difference between the dimensions of M 1 and M 0 , thus model M 1 explains better data variation than model M 0 .
Penalized maximum likelihood estimators are used in this work for two reasons, the first is to use these estimators to perform a procedure of variables selection through the deviance, and second, we will use these estimators as initial values for the Bayesian hierarchical model to reduce the number of samples necessary to reach the stationary distribution of the MCMC algorithm.

Bayesian Implementation
Under the assumption that π (y t |θ t ) is the GEV distribution with parameters of θ t = (µ t , σ t , κ), a Bayesian formulation for the model of extreme values is as follows: where ω = (β 1 , β 2 , u 1 , u 2 ) is such that the set of equations given in Equation (1) is satisfied. This hierarchical model consists of three levels: a data level, denoted by π (y t |θ t ); a process level, denoted by π (θ t |ω); and a prior level, denoted by π (ω). Alternatively, the model given in Equation (3) with the conditions given in Equation (1) can be reformulated as a function of the parameter space of and ω * * = {σ 1 , σ 2 }; therefore, we can write the posterior distribution as follows: where π (y t |ω * ) is the GEV density under the conditions on the parameters given in Equation (1). The prior distribution for ω * is such that The prior distribution for the hyperparameters ω * * is given by To sample the a posteriori distribution, we use a MCMC method with an acceptance probability given by where π (θ) is the prior distribution for the parameters, π (x|θ) is the likelihood, and Q is the proposal function.

Simulation Study
In this section, we examine the performance of the hierarchical GEV model defined in Equation (4) using simulated data. We simulated 500 extremes from a GEV model with two covariates, x 1 and x 2 , corresponding to the longitude and latitude, respectively. The x 1 values were generated with equally spaced data in the range of 0 to 4π, and the values of the covariate x 2 were randomly selected from the interval [0, 4], with For our model, we ran 70,000 iterations to obtain samples of the a posteriori density, after a burn-in period of 60,000 iterations, and applied thinning in every fifth iteration. The hierarchical model given in Equation (4) was fitted by setting p 2 = 10 in equation set Equation (1). The estimate of the shape parameter was −0.14. The true functions of µ and σ as functions of the covariates x 1 and x 2 are shown in Figure 1a,c, respectively. The function corresponding to µ that was chosen for the simulation is a function that cannot be separated based only on the main effects of the covariates, so it cannot be adjusted as in most traditional models for extreme values. The function for sigma is a flat function in the space covariate.
The true surface and local spatial patterns of the location parameter ( Figure 2a) that were used to simulate the extreme values were recovered reasonably well by the smoothing function ( Figure 2b). Similarly, a comparison of Figure 2c,d reveals that the smoothing function for the scale parameter of the extreme values based on the model given in Equation (1) recovered the true flat function for σ given in Equation (5). To evaluate the performance of our model, we reserved a set of 1000 locations to serve as the testing set and then calculated the correlation with respect to their estimated values. Specifically, for the location parameter, we obtained a correlation of 0.99 between the predicted values and the testing data. Typically, when the trend found in the maxima can be modeled using nonlinear functions of a set of covariates, the interpretations of the individual coefficients of the smoothing function lose importance, and they are often meaningless. Therefore, the main information about the trend is provided by the smoothing function constructed from the complete set of estimated parameters.

Data Analysis
We constructed a GEV model of the PM 10 maxima in the Mexico City metropolitan area, using multivariate smoothing functions of spatio-temporal covariates, latitude (s1), longitude (s2) and time (t), grouped into X (1) and meteorological covariates, wind speed (ws) and relative humidity (rh), grouped into X (2) , to fit the trends in the non stationary GEV model. We defined two models to estimate the GEV parameters, the GEV0 model which included the spatio-temporal covariates and the GEV1 model that included the spatio-temporal covariates as well as the meteorological covariates.
According to the proposed approach in Equation (1), we define the joint contribution δ θ t (X) of the set of covarites X corresponding to the θ t parameter of the GEV distribution, as follows The GEV0 model is a baseline model that incorporates the spatio-temporal covariates X (1) using a multivariate spline structure that implicitly includes the interaction between the covariables of the group, as follows The GEV1 model is an extension of the GEV0 model that incorporates in an additive way the meteorological set of covariates X (2) with the structure given by Equation (6).

Results and Discussion
A descriptive summary of the data is shown at Table 1. Four examples of quarterly block maxima of PM 10 levels are presented in Figure 3. The U.S. and Mexican standard for PM 10 pollution levels is 150 µg/m 3 . An analysis of Table 1 shows that in the area of Villa de las Flores, the permissible level was exceeded by more than three quarters of all measured extreme values. At three of the monitoring stations, CUA = Cuautitlán, NET = Netzahualcoyotl and XAL = Xalostoc, all observations exceeded the permitted standard level, this is because these locations have a high population density and more concentrated industry. The peak PM 10 concentrations exceeded 1000 µg/m 3 at CES = Cerro de la Estrella, MER = Merced, SAG = San Agustín, VIF = Villa de las Flores and XAL = Xalostoc; at SAG = San Agustín station, the level recorded was 10 times higher than the recommended safe limit.  Figure 4 shows boxplots of the PM 10 maxima at each of the 31 monitoring stations considered in the study. The sites with lower PM 10 maxima, less than 151 µg/m 3 on average, are located in MGH = Miguel Hidalgo, MPA = Milpa Alta, AJM = Ajusco Medio and CUA = Cuajimalpa. Relatively well-preserved areas can still be found at these sites, which also have the lowest industrial activity and urban growth in the study area. By contrast, we also found sites with more than 723 µg/m 3 PM 10 on average, such as NET = Netzahualcoyotl, XAL = Xalostoc, CES = Cerro de la Estrella and MER = Merced, which are characterized by high industrial activity and heavily traveled paved and unpaved roads with heavy vehicular traffic [6]. In order to determine the significance of the meteorological variables in the non-stationary GEV model, we adjusted the GEV0 and GEV1 models by using the method of maximum likelihood (ML) and penalized maximum likelihood (PML) and we compared these two models through the deviance statistics. The results presented in Table 2 show that the model GEV1 is not statistically better than the baseline model GEV0, therefore the set of meteorological covariates wind speed (ws) and relative humidity (rh) did not present evidence at the 95% level to be included in the non-stationary model GEV for the modeling of PM 10 maxima. Table 2 shows the effect of penalization of the parameters in the model, in the case of maximum likelihood method none regularization was performed, therefore PL (b (1) , b (2) , κ) is considerably greater than the case of the penalized maximum likelihood. The maximum likelihood estimators have desirable statistical properties, however penalized estimators are preferred because they control the overfitting and reduce the instability of the estimates. In this work penalized estimators are used as initial values for the Bayesian hierarchical model, in order to reduce the number of chains necessary to reach the stationary distribution of the sampling MCMC algorithm.

% Method
Model n (y | µ t , σ t , κ) PL (b (1) , b (2) , κ) Deviance p-value Once the set of covariates involved in the model is determined, a hierarchical Bayesian model was fitted to analyse the PM 10 maxima in the Mexico City metropolitan area. At the first level, we modeled the PM 10 maxima using the generalized extreme value distribution; at the second level, we used multivariate smoothing spline functions to model nonstationary spatio-temporal extremes; and finally, at the third level, we assumed a priori distribution functions for the parameters of the model. Unfortunately, the posterior density in Equation (4) is not analytically tractable, so we implemented our model via the random walk Metropolis-Hastings algorithm to draw samples of the unknown parameters.
We performed the analysis using the statistical software R (version 3.3.1, R Foundation for Statistical Computing, Vienna, Austria). We fitted our Bayesian hierarchical model with the conditions expressed in Equation (6), setting p 2 = 10. We ran 70,000 MCMC samples after a burn-in period of 60,000 iterations, with thinning every fifth iteration. A look at the log-likelihood of the chain revealed convergence toward the stationary distribution of the MCMC algorithm.
Evidently, the extreme values of PM 10 vary over time and from area to area because of local conditions, such as the topographic setting or the local wind. This fact justifies the modeling of the location parameter with respect to time and space. We verified this behavior by means of the estimates presented in Table 3, which show that the magnitudes of the studied maxima tend to decrease over time. Similarly, the boxplots in Figure 4 show that the distribution of the maxima is not the same in all monitored locations; therefore, a suitable model for the PM 10 maxima must include temporal and spatial variations and their possible interactions. One of the strengths of our study is that the proposed model implicitly includes interactions between covariates, whereas most models consider only the main effects through additive or linear functions. The estimates for the parameters of model Equation (1) corresponding to the coefficients of the main effects of the space-time covariates in the location and scale smoothing functions as well as the shape parameter are shown in Table 3. In this case, the vector of parameters β 1 corresponds to µ, and the vector β 2 corresponds to log σ t . We noticed that the effect β (1)t of time on the location parameter is negative and that the effects β (1)s1 and β (1)s2 of the longitude and latitude are both positive. Because of the smoothing term in Equation (6) and the standardization of the covariates used in the model, it is not possible to establish a direct relation between the parameters estimated in Table 1 and the parameters of the GEV distribution in model Equation (6), except through the terms β (1)0 and β (2)0 , which can be interpreted as the values of the location and scale parameters, respectively, of the GEV distribution in the middle of the studied period and in the center of the geographic area considered in the study. The spatial smoothing for the location and scale functions is shown in Figure 5. The estimation function for the location parameter in the latitude-longitude coordinate system shows that the PM 10 maxima tend to increase in the northwesterly direction. The scale parameter also increases in the northwesterly direction, but unlike for the location parameter, a slight decrease is observed in the region close to Iztacalco. The estimate of the shape parameter is 0.1715, and its 95% credible interval is (0.1694, 0.1737). We have developed and validated a model of extreme values using a robust and solidly-based theory, an important application of the results of our study is through a risk map or return level map. The return level Z p is the threshold at which an extreme value is exceeded with probability p, which is expected to occur once every 1/p years [34]. Figure 6 shows the return levels for a return period of 25 years. Accordingly, the trend of increased risk in the northwest of the study area is preserved, with the greatest levels of risk in areas close to VIF = Villa Flores, SAG = San Agustín and ACO = Acolman and a lower level of risk in the area surrounding Pedregal. Previous studies have analysed extremes of air pollutants using the generalized extreme value distribution [30,36]. However, the majority of these investigations have used information from a single site and consequently have ignored the aspect of spatial variability, which may result in underestimation of the GEV parameters. Several approaches to the spatio-temporal modeling of extreme values [19,28] have been implemented for environmental extreme data analysis. Therefore, on the basis of these studies, we have proposed a model that offers structural advantages for the modeling of extreme values.
Similar to the findings reported by studies in other countries [3], the extreme PM 10 concentrations studied here exhibited spatial variations within the study area.
These results were consistent with the environmental characteristics of each monitored region: the locations with the highest industrial and population densities showed higher concentrations of PM 10 , and consequently, in these locations, the GEV model yielded high estimates of the location and scale parameters and the 25-year return map showed greater risks of extreme future events.
Most PM 10 modeling studies on short-term dynamics have found that weather covariables such as wind speed or temperature are significant in the model [37]. This is not always the case in studies with a longer time horizon, where the conclusions for the long term are more robust. One of the reasons is due to the temporality of the data, daily maxima present a different association with respect to the meteorological covariates than the maxima obtained in a longer time window, such as quarterly maxima.
One of the findings of our study is the negative contribution of the effect of time on the trend of PM 10 maxima; an explanation of this phenomenon is the continuous implementation of new emission control policies and the continuous revision of environmental norms in the Mexico City, which have recently become increasingly strict, mainly by regulating the traffic and restricting the automotive vehicles that can be used i.e., vehicle's model year. Additionally, the Mexico City government have implemented a website for monitoring air quality (http://www.aire.cdmx.gob.mx/default.php), thus environmental contingency alerts are activated to mitigate the adverse effect of air pollutants on public health.

Conclusions
Recent years have seen a growing interest in the monitoring of PM 10 air pollution because of the multiple health problems it causes in densely populated areas. In the Mexico City metropolitan area, PM 10 air pollution often originates from dust from roadways and organic black carbon formed during combustion processes. This has led to is an ongoing public health problem over the past several years affecting a large section of the population. Recently, due to population growth and the increase in the number of combustion vehicles and industries, the PM 10 air pollution problem has worsened. Therefore, it is extremely important to study the spatio-temporal trend of the PM 10 maxima and provide this information to the management authorities so that prevention standards and policies can be reviewed and adapted to prevent extreme cases of PM 10 air pollution.
Because wind speed (ws) and relative humidity (rh) are statistically significant covariates in most models for the short-term forecast for PM 10 concentration, we consider including these covariates in the non-stationary GEV model. However by using the deviance statistic and the chi-square test, we found that at a credible level of 95%, the model with these covariates was not statistically better than the model that did not include these variables. Therefore, wind speed and relative humidity were not statistically significant in the non-stationary GEV model with quarterly block maxima of PM 10 levels. The return levels for a return period of 25 years revealed a clear spatial trend of increased levels of PM 10 in the northwest study area and a decreasing trend in the extreme values over time.
In this study, we implemented a methodology to analyse the nonstationary extreme data and to perform a spatio-temporal study of the maxima of breathable particulate matter pollution (PM 10 ) levels in the Mexico City metropolitan area. These PM 10 levels usually vary in space and time, and can potentially include significant spatio-temporal interactions. Therefore, the commonly used models can underperform the GEV estimates and consequently result in inaccuracies. We achieved this using an analysis of non-stationary extreme values, in which we used a combination of existing traditional statistical techniques. We used the maximum likelihood method to perform a variable selection step, the penalized maximum likelihood estimators to obtain initial values and reduce the convergence time of the MCMC algorithm and fitted the estimators using a Bayesian approach to eliminate potential invalid parameter values. The combination of these statistical techniques gives support and solidity to our results.
We proposed a modification to the Bayesian estimation methods used in the previous studies related to the analysis of extreme values applied to environmental areas. In our model, the covariates are included in the generalized extreme value distribution through multivariate spline functions and, therefore, interactions between the covariates are also considered in the model. We observed that this approach can be easily extended to the modeling of extreme events and the generation of risk maps for air pollution, rainfall, heat waves, wind speed, etc. The results of the simulations conducted led us to conclude that the methodology is adequate and reliable for this type of study. Additionally, as a first attempt, time has been treated as a covariate. Extensions of this work should consider a more general model for spatio-temporal analysis, a specific analysis of the prior distributions of the parameters, and a method for determining the correct number of nodes in the multivariate spline functions.