A Bayesian Hierarchical Spatial Copula Model: An Application to Extreme Temperatures in Extremadura (Spain)

: A Bayesian hierarchical framework with a Gaussian copula and a generalized extreme value (GEV) marginal distribution is proposed for the description of spatial dependencies in data. This spatial copula model was applied to extreme summer temperatures over the Extremadura Region, in the southwest of Spain, during the period 1980–2015, and compared with the spatial noncopula model. The Bayesian hierarchical model was implemented with a Monte Carlo Markov Chain (MCMC) method that allows the distribution of the model’s parameters to be estimated. The results show the GEV distribution’s shape parameter to take constant negative values, the location parameter to be altitude dependent, and the scale parameter values to be concentrated around the same value throughout the region. Further, the spatial copula model chosen presents lower deviance information criterion (DIC) values when spatial distributions are assumed for the GEV distribution’s location and scale parameters than when the scale parameter is taken to be constant over the region. annual maximum observed temperatures at a set of


Introduction
Extreme events tend to occur naturally as topics of importance in several sciences -climatology, hydrology, engineering, etc.-but also in finance (financial crisis studies) and in the insurance industry. Extreme Value Theory (EVT) is a widely used statistical tool with which to address their study. It is used in several particular scientific fields to model and predict extreme events of precipitation [1][2][3][4][5], temperature [6][7][8], solar climatology [9,10], and financial crises [11,12].
In climatology, given the spatial nature of extreme events, it is useful to apply a spatial theory as this will improve the accuracy when estimating parameter distributions by sharing information from similar sites [13,14]. Several theories are used to address the problem of spatial extremes, some examples are max-stable processes [15], Bayesian hierarchical models [16][17][18], and copula theory [19][20][21][22].
Copula theory is increasingly being used in multivariate extreme value models in climatology [23], since Sklar's theorem [24] allows the construction of joint distributions using the desired univariate marginal distributions. Of the different copula families, there stand out the Archimedean copulas, extreme value copulas, and elliptical copulas (in particular, the Gaussian and the Student-t copulas). These copulas constitute a convenient tool for modeling dependence in a non-Gaussian and high-dimensional context [25]. Indeed, copula theory is based on describing the dependence structure regardless of any marginal distributions [26], which makes copulas an ideal candidate for modeling dependence.
Wikle et al. [27] proposed a general Bayesian hierarchical framework in which to describe the spatial variability of the distribution of some environmental variable. Their model comprises various layers. For example, in the first layer, it is assumed that the data follow a distribution with unknown parameters, while in a second layer, the variability of these parameters is modeled spatially by regression techniques. This kind of model has been used in many extreme rainfall studies [28][29][30][31] and also in temperature studies [32,33]. In most of these works however, it is assumed that the data are spatially independent given the values of the parameters of their distribution.
As pointed out by Cooley et al. [14], the spatial dependence among the parameters is related to what the authors called spatial climate dependence since the climate of a location or region is described by the statistical distribution of the variable of interest-in our case, the temperature. For readers interested in a climate parameter such as the return period or the return level, the hierarchical model used in the previously cited papers is enough to estimate those parameters. However, if someone is interested in, for example, transferring information from a gauged site to an ungauged site, the spatial dependence among the data must be taken into account in what Renard [20] called spatial weather dependence, because both dependencies are involved in the transferring information process. The spatial climate dependence allows the transfer of information from one place to another. However, the correlation between observatories decreases the content of the information available in nearby observatories [20]. Therefore, there must be an increase in uncertainty in the information transferred to the ungauged site when the correlation between observatories is taken into account. This key point of considering the spatial dependence among the data can be accomplished by means of a copula model.
In this sense, the main goal of the present work is to address spatial dependence by incorporating a copula into this model-specifically, a Gaussian copula (GC). An application of the proposed model is presented with the maximum temperatures recorded in the Extremadura Region, in southwestern Spain, for the period 1980-2015.
This communication is structured as follows. Section 2 describes the hierarchical model with copula, and Section 3 describes the proposed posterior distribution model and its use when inferring the GEV distribution parameters. The selected data are briefly described in Section 4, and Section 5 presents the results of applying the model to extreme temperatures in the Extremadura Region (Spain). Finally, some conclusions are drawn in Section 6.

Statistical Model
A Bayesian approach was chosen to address the spatial extremes problem because of its flexibility, the possibility of adding further elements or layers, and its adaptability to situations in which complex variations in the parameters of the extremes values distribution appear. In addition, one of the disadvantages shown in [18] of assuming conditional independence among observed data is resolved by introducing a Gaussian copula to control the existence of dependence, with this fact being the main difference with that model.
The proposed theoretical Bayesian hierarchical framework can be factored into the three stages shown in Figure 1. The first stage (Section 2.1) allows one to model the observations' joint distribution with a Gaussian copula and at-site marginal GEV distributions. The second stage (Section 2.2) models the GEV parameter variances with a latent process by means of a conditional probability. In the third stage (Section 2.3), prior distributions are given for the model's parameters. Thus, the posterior distribution of the parameters is calculated using Bayes' theorem (see Section 3). In the following, the model's stages will be described in more detail. Stage 1 (data level) P (data | process, parameters) Stage 2 (process level) P (process | parameters) Stage 3 (prior distribution) P (parameters) According to the theorems of Gnedenko [26] and Fisher and Tippett [20], asymptotically 88 Y s has a GEV distribution We shall denote by Y st the block maximum of the variable of interest, with s ∈ S = {s 1 , . . . , s M } and t ∈ T = {t 1 , . . . , t N }. For simplicity and consistency with the application, we shall refer to s as the site and t as the time. Further, for each s and t, Y st follows a specific distribution whose parameters vary spatially.

Data Level
According to the theorems of Gnedenko [34] and Fisher and Tippett [35], asymptotically, Y s has a GEV distribution with cumulative distribution function (cdf) where 1 + ξ s ((y − µ s )/σ s ) ≥ 0 and in which µ, σ, and ξ are the location, scale, and shape parameters, respectively. The location parameter explains the mean values of the extreme value distribution, the scale one is referred to the variability, and the shape parameter determines the rate of decay of the upper tail of the distribution. Shape parameter values below zero indicate that the distribution has an upper bound showing that the maximum values are not getting large, and values above or equal to zero indicate that the distribution has no upper limit showing that maximum values are getting infinitely large [36].
For any set of M sites, an M-dimensional Gaussian copula is assumed for the multivariate distribution of the data, with pairwise correlation matrix C and marginal distributions for which the probability density function (pdf) is shown in Appendix A. Geostatistical models use correlation matrices that capture the positive relationship between different stations through their distance. In this case, the Gaussian copula collects this information in the correlation matrix C which is positively defined. Then, it is assumed that the correlation between data from two sites depends on the distance between them. In particular, the elements of the correlation matrix are defined as where c 0 and c 1 are unknown parameters, x(s) is the geographical position (longitude, latitude) of the site s ∈ S, and x(s i ) − x s j are the distances between sites s i and s j , for i, j = 1, . . . , M. Moreover, independence is assumed between observations The multivariate distribution of the data thus comprises the following components: at-site distribution (1), Gaussian copula to model the spatial dependence (3) and correlation matrix (4).

Process Level
The second stage of the hierarchical model consists of describing the variance of the at-site GEV parameters (location-µ, scale-σ, and shape-ξ) through a Gaussian spatial process whose mean depends on covariates that describe the site's characteristics. A spatial regression model as described by Garcia et al. [18] is used: where s = (s 1 , . . . , s M ) denotes a site vector and p µ , p σ , and p ξ denote the number of regression parameters in each case. To simplify use of the notation, the parameters µ, σ, or ξ will be represented by k, so that X(s) represents p k spatial covariates (geographic coordinates), α k is a set of p k regression parameters (including the intercept), W k (s) represents a spatial model that captures the dependencies between different sites, and k (s) is the noise not included in the spatial model.
In general, we shall denote the proposed models by BHGCM-p µ p σ p ξ , and the noncopula models described by Garcia et al. [18] by BHM-p µ p σ p ξ .
With regard to Equations (5), for p k = 1, we shall assume that no covariate associated with the site characteristics is involved, i.e., and for p k = 2, the only covariate is h s , the altitude of site s, i.e., As a particular case, for p k = 0, we shall assume that the parameter k is constant, and hence, the spatial model does not intervene, i.e., W k (s) = 0.
In addition, a position-independent Gaussian model, i.e., N 0, τ 2 k , is adopted for the pure noise effect k (s). The spatial term W k (s) was considered to be a random variable with an M-dimensional normal distribution N (0, Σ k ), where the covariance matrix Σ k follows the exponential model where β k0 (the sill) and β k1 (the range) are unknown parameters. The random variables on the left-hand side of Equation (5) are assumed to have an M-dimensional normal distribution:

Prior Distribution
The Bayesian framework requires the prior distributions of the parameters included in the model to be specified. The prior distribution for the spatial regression parameters α k was a p k -dimensional normal distribution with hyperparameters chosen such that the distribution was either non-or only weakly informative. Inverse gamma distributions were taken for the sill β k0 and the variance τ 2 k parameters, and a gamma distribution for the range β k1 . For the Gaussian copula parameters, c 0 and c 1 , uniform prior distributions were assumed-U (0, 1) and U (0, 1000), respectively. A normal distribution with mean 0 was taken for the shape parameter. The parameters were assumed to be mutually independent.

Estimation
We shall apply two proposed models to the extreme temperature data (see Section 5). In the first, denoted BHGCM-200, the scale and shape parameters are constant and the location parameter is modeled as in Equation (5). In the second, denoted BHGCM-210, the shape parameter is constant and the location and scale parameters are modeled as in Equation (5), with p µ = 2 and p σ = 1, respectively. The model that best fits the data will be compared with the equivalent noncopula version.
As mentioned above, Bayes' theorem allows one to calculate the posterior distribution of a proposed model as being proportional to the product of the probabilities described in Figure 1. To simulate the posterior distribution of each of the proposed models, a Markov Chain Monte Carlo (MCMC) method was applied, in particular, using a Gibbs sampler with embedded Metropolis-Hastings steps [37] and-as appropriate given the characteristics of these methods-the Gelman-Rubin diagonal convergence test [38]. For this last test, the CODA package [39] of the R language was used.
Four parallel chains with sizes of 30,000 values were constructed starting at different points. For each chain, 10,000 values were used as burn-in, leaving 20,000 values less 10 taken for thinning. The last 2000 values of each chain were combined to form a single chain of 8000 values with which to construct the posterior distribution.
The code used to carry out the simulations was written in FORTRAN, closely following the procedure set out. Maps were prepared using the R package ggplot2 [40], with the geographical coordinates provided by Spain's National Centre for Geographic Information (Centro Nacional de Información Geográfica, CNIG) [41].

Posterior Distribution
For each GEV parameter that changes in the model, the hierarchical framework described in Section 2 estimates the following unknowns: Gaussian copula parameters (c 0 and c 1 ), regression parameters (α k ), sill parameter (β k0 ), range parameter (β k1 ), and variance parameter (τ 2 k ). For the BHGCM-200 model, the posterior distribution is and for the BHGCM-210 model, it is Assuming independence between the observations (Y s 1 t , . . . , Y s M t ) and Y s 1 t , . . . , Y s M t (t, t ∈ T with t = t ), the likelihood function of the observations is given by This shows the details of Equations (A3)-(A5) in Appendix B.

Assessment of the Models' Goodness-Of-Fit
The deviance information criterion (DIC) described by Spiegelhalter et al. [42] was used to choose the model that best fits the observed data. The best model has the lowest DIC value. The parameter values needed to calculate the DIC were those determined through the MCMC procedure. The criterion is defined as (c) p θ =D θ − D θ is a parameter that controls the complexity of the model (effective number of parameters), where D θ is the deviance of the posterior meanθ of the parameter of interest.
In particular, the goodness-of-fit was used to compare the proposed copula models (BHGCM-p µ p σ p ξ ), and the resulting model with the lowest DIC value will be contrasted with the equivalent noncopula model described by García et al. [18].

Inference
The MCMC method gave the posterior distribution of the GEV distribution's parameters at the gauged sites, s. A set of replicates of size nsim was generated from the posterior distribution for the parameter vector c . This sample was then used to infer the GEV distribution's parameters at an ungauged sites by applying the following algorithm (Algorithm 1):

Algorithm 1 Ungauged Site
Do for l = 1 . . . nsim: 1: Using the well-known formula for conditional Gaussian distributions, generate W k (s) with a normal distribution of mean µ cond and standard deviation σ cond given by where W In addition, the proposed theoretical model provides observations at ungauged sites with a spatial dependence on other sites that is controlled by a Gaussian copula. Algorithm 2 is the scheme used to simulate these observations. This algorithm provides a sample of the posterior predictive distribution (PPD) of observations defined by Gelman [43] as p(ỹ|Y) = p(ỹ|θ) · L(θ|Y) dθ, where θ is the parameter vector.

Algorithm 2 Observations
Do for l = 1 . . . nsim: 1: Calculate the GEV distribution's parameters for an ungauged sites with Algorithm 1, taking into account the GEV parameters of the gauged sites s obtained with the MCMC generated sample.

Data
The data used in this study are annual maximum observed temperatures at a set of meteorological observatories distributed over the Extremadura Region (Spain), from 1980 to 2015. These time series were provided by Spain's State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). Figure 2 shows the location of this region within Spain and the spatial distribution of the observatories considered. In particular, there are M = 28 meteorological observatories, each providing a time series of N = 36 extreme temperatures. 35 Long.

Figure 2. Location of the Extremadura Region within Spain (left). Topographic map of Extremadura together with the locations of the meteorological observatories used in this study (right).
For a site s ∈ S, temperature and altitude are correlated, as higher altitudes mean lower temperatures, i.e., temperature decreases with increasing altitude. This reason, together with the fact that Extremadura is not a large region, led us to take the altitude of the sites as being the only covariate in the regression model (5). The altitude was standardized as follows:h where h obs s is the altitude above mean sea level of site s ∈ S, and max s h s and min s h s are the maximum and minimum altitudes of all the sites, respectively.

Results
The Bayesian spatial copula model was applied to the described data set (see Section 4). In particular, the BHGCM-200 and BHGCM-210 models were compared using the DIC as a measure of the goodness-of-fit (see Section 3.2). The parameters of the model that best fits the data were compared with the equivalent noncopula model (BHM).

Evaluation of the Models
As noted above, the DIC was employed to compare the two candidate spatial copula models. The results are presented in Table 1. One observes that the copula model in which a spatial model intervenes in the location and scale parameters' regressions, i.e., BHGCM-210, has a lower DIC value than the model in which the scale parameter is constant (BHGCM-200). Therefore, the spatial copula model chosen is BHGCM-210. Table 2 gives the results of applying the goodness-of-fit to the BHM-210 model. Recall that the parameter p θ indicates the complexity of the model, and, as can be observed in Table 2, it is greater in the BHGCM model, which was to be expected given that the proposed models are intrinsically more complex. However, this increase in complexity over the models proposed by García et al. [18] is acceptable since the variance in the models proposed in the present work is less than in the BHM cases.

Parameter Estimates
In this subsection, the regression parameters of the BHM-210 model and the proposed BHGCM-210 model are compared. Figure 3 shows the estimate of the posterior distribution density function from the selected BHGCM-210 model (red line) versus that estimated by the BHM-210 model (blue line) for the different regression coefficients α. The regression coefficient α σ1 (lower left panel) of the scale parameter, σ, has a mean value of 0.78 ºC (SD: 0.51 ºC) for the BHGCM model. Moreover, this model provides an estimate of the density function that is less concentrated than that given by the BHM model. With regard to the regression coefficients for the location parameter, µ (top row), the two models give qualitatively similar posterior density function estimates. In particular, the coefficient α µ1 (top left panel) has a mean of 41.12 • C (SD: 0.84 • C), while the coefficient α µ2 (top right panel) is clearly negative with a mean of −2.97 • C km −1 (SD: 1.15 • C km −1 ). These negative values are consistent with the fact that temperature decreases with altitude.
Another interesting result is the covariance function of the GEV parameters given by Equation (8). Table 3 lists the medians and (2.5%, 97.5%) quantiles of the sill (β k0 ) and range (β k1 ) coefficients for the location and scale parameters in models BHM-210 and BHGCM-210. The values of these coefficients are of similar orders of magnitude for the two models. Since the range is a measure of the strength of spatial dependence for the location and scale parameters, one observes that, in both models, this dependence is weaker for the location parameter than for the scale parameter.

Validation of the Models
The posterior predictive distribution (PPD) provides temperature values for the measured observatories that can be compared to the observed temperatures. The error (E) and the absolute error (AE) were used to validate the BHGCM-210 model. These errors were calculated using the values obtained from the PPD. They define as whereŶ st is the predictive temperature, Y st is the observed temperature, s ∈ S = {s 1 , . . . , s M }, and t ∈ T = {t 1 , . . . , t N }. Figure 4 shows the density function of E (left panel) and AE (right panel) for the BHM-210 model (black color) and BHGCM-210 model (red color). The error's density functions are symmetric around their mean values of 0.23 and 0.06 for the BHGCM-210 and BHM-210 models, respectively. Note that the mean values represent the bias of the models. The precision, measured as the absolute error, is slightly better in the copula model than in the noncopula model. This is related with the increase in uncertainty, previously mentioned in the Introduction.

Inference
The chosen model (BHGCM-210) estimates the location, scale, and shape parameters of the at-site GEV distribution. These parameters are predicted at the ungauged sites with Algorithm 1. Figure 6 shows the estimated posterior distribution density function for the shape parameter. This parameter takes clearly negative values, with a symmetric and homogeneous distribution around the mean value −0.38 and a standard deviation of 0.03, indicating that there is an upper bound on how high extreme temperatures can be. This leads to a quick decrease of the rate of decay of the extreme temperature distribution, so it does not increase infinitely. Moreover, it is relevant to highlight the low value of the standard deviation obtained when compared with that obtained for the shape parameter estimated in each location individually. Using the maximum-likelihood fit, the standard deviation of the shape parameter varies from 0.09 to 0.12-meaning more than three times that obtained with the spatial model. Therefore, pooling nearby data leads to a decrease in the statistical uncertainty of the parameters and implies an important advantage of using spatial models.  Figure 7 shows the spatial posterior distributions of the means and standard deviations of the location and scale parameters. The spatial posterior distribution of the location parameter shows its dependence on altitude, with mean values between 39.29 • C and 41.12 • C and standard deviations between 1.31 • C and 1.41 • C. The areas with low values of the location parameter correspond to those of higher altitude, since the temperature is highly determined by the orography (see Figure 2). The spatial posterior distribution of the scale parameter shows no spatial dependence, taking very similar values over the entire region (between 2.96 • C and 3.24 • C), and the standard deviations are very concentrated throughout the region. 37

1.
Bayesian hierarchical models, BHM, proposed by García et al. [18] present the problem of assuming spatial independence between observations at different sites. The present work has addressed this problem by introducing a copula.

2.
A Gaussian copula is assumed as a joint distribution with at-site GEV marginal distributions. In this way, the spatial dependence of observations from different sites is represented by a correlation matrix. In addition, spatial regression models of the GEV parameters are proposed.

3.
Two BHGCM models are proposed: BHGCM-200 takes a spatial regression model for µ while the parameters σ and ξ are constant; BHGCM-210 takes spatial models for µ and σ, while the parameter ξ is constant.

4.
The BHGCM-210 model has a better DIC goodness-of-fit value than the BHGCM-200 model and the noncopula BHM-210 model.

5.
For the GEV distribution's location parameter, the BHGCM-210 and BHM-210 models give qualitatively similar estimates of the regression parameter posterior distributions. 6.
For the GEV distribution's scale parameter, the BHGCM-210 model gives a distribution with greater variance than that given by the BHM-210 model. 7.
In the BHGCM-210 model, the GEV shape parameter takes negative values, and its posterior distribution is symmetrical and highly concentrated around −0.38. Therefore, the extreme temperature distribution is not expected to increase too much. 8.
The BHGCM-210 model gives a spatial posterior distribution for the location parameter that is strongly dependent on altitude, unlike the scale parameter. The location parameter's mean values in the region lie between 39.29 • C and 41.12 • C. 9.
In the BHGCM-210 model, the scale parameter's spatial posterior distribution is very concentrated, taking very similar values throughout the region. Acknowledgments: Thanks are due to the Spanish State Meteorological Agency for providing the daily temperature time series used in this study.

Conflicts of Interest:
The authors declare that there were no conflicts of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in writing the manuscript, or in the decision to publish the results
In copula theory, the most important theorem is Sklar's Theorem, because it permits establishing a relation between multivariate distributions and copulas.