Facing Missing Observations in Data—A New Approach for Estimating Strength of Earthquakes on the Pacific Coast of Southern Mexico Using Random Censoring

We introduced a novel spatial model based on the distribution of generalized extreme values (GEV) to analyze the maximum intensity levels of earthquakes with incomplete data (randomly censored) on the Pacific coast of southern Mexico using a random censorship approach. Spatiotemporal trends were modeled through a non-stationary GEV model. We used a multivariate smoothing function as a linear predictor of GEV parameters to approximate nonlinear trends. The model was fitted using a flexible semi-parametric Bayesian approach and the parameters are estimated via Markov chain Monte-Carlo (MCMC). Through a rigorous simulation study, we showed the robustness of both the model and the estimation method used. Maps of the location parameter on the spatial plane for different periods of time show the existence of local variations in the extreme values of seismicity in the study area. The results indicate strong evidence of an increase in the magnitude of earthquakes over time. A spatial map of risk with maximum intensity of earthquakes in a period of 25 years was elaborated.


Introduction
Earthquakes are among the natural disasters that have caused the greatest harm to humanity. They are one of the main types of natural disasters that can occur without warning, resulting in devastating effects and even thousands of deaths in a matter of seconds. Nowadays, considerable efforts have been made to study and investigate the possible causes and mechanisms that trigger an earthquake, but no method has been able to predict the occurrence of an earthquake. Although the short-term forecast of an earthquake is currently an unresolved problem, we can still use the distribution of the maximum to study the extreme values and thus calculate the long-term risks of intense earthquakes. The occurrence of an earthquake is a problem involving multiple geophysical processes, such as the movement of magma, the rotation of the Earth, the resistance of materials in the subduction zones and many other causes and factors. Different models have been proposed to study the dynamics of the movements produced during an earthquake using statistical physical approaches [1]. Some research indicates that there are some precursor phenomena such as changes in seismic activity, electromagnetic signals, variations in the ionosphere and chemical emissions [2][3][4][5]. of return, particularly in the analysis of extreme flood values [19]. Extreme value of earthquake intensity data can also be studied by means of Type III censorship because the threshold used in the censorship mechanism is a random variable. In this sense, the main objective of this study was to analyze earthquake intensity data with the extreme value model corrected by Type III censorship or random censorship. We justify the random censorship in our analysis by the following fact. It is known that an earthquake occurred above a random threshold in certain blocks used to determine the block maxima; however, their current values are unknown and are marked as censored values.

Study Area
The seismicity and tectonics of southern Mexico are characterized by the subduction of the Cocos and Rivera plates beneath the North America plate [28]. The Cocos plate has been subducting underneath the North American plate at a constant and relatively shallow angle of 12 • -15 • at a rate of about 6 cm/year [29], affecting the seismicity of the states of Michoacan, Guerrero and Oaxaca. Similarly, the Cocos and North American plates are joined to the Caribbean plate in a trench-shaped area forming the zone of subduction of the Cocos plate beneath that of the Caribbean, increasing the seismicity in the states of Oaxaca and Chiapas. The subduction of the Rivera plate beneath the Jalisco block is relatively low, which affects the states of Colima and Jalisco. Figure 1 shows the study area as well as the spatial distribution of the maxima of earthquake intensities between 16 September 1992 and 20 February 2018.

The GEV Distribution
Classical results on order statistics on the maximum of a random sample Y 1 , ..., Y n show that the distribution of M n = max (Y 1 , ..., Y n ) degenerates its mass of probabilities into a point as the sample size increases and consequently the exact distribution of the Mn is a degenerate function when the sample size is large enough. In such cases, the distribution Mn can be stabilized using a sequence of constants {b n > 0} and {a n }, through G n = (M n − a n )/b n . Transcendent statistical results show that the asymptotic distribution of G n is a non-degenerate, limit distribution known as the generalized extreme value (GEV) distribution [30]: [31]).

Estimation of the Parameters of the GEV under Random Censoring
Consider a set of n independently and identically distributed random pairs (M t , δ t ). Specifically, in the extreme values framework, we obtain the maximum values M t , in blocks of information in which a censored value U t can exist. We assume that the variables M t and U t are independent.
Let G t (y t |µ t , σ t , κ) be the probability distribution function of M t , g (y t |µ t , σ t , κ) is its probability density function and denote G * (y t |µ t , σ t , κ)= 1 − G (y t |µ t , σ t , κ) as its survival function. Similarly, let F (u) and f (u) be the survival and density functions of U t , respectively. The joint density function of Y t may be obtained from the joint density function of M t and U t , thus we can construct the likelihood function from a random sample of maxima as follows [16]: Arranging the terms, we finally obtain: Assuming that the censoring is non-informative, i.e., F (u) and f (u) do not involve the parameter µ t , σ t , κ , then the likelihood on the data is: We analyze the extremes on a spatial region using functions adjusted to the parameters of the GEV distribution. These functions model the trends in the maxima and relate the parameters with covariates through functions. In this research, we consider using the following linear predictors: . . , n, j = 1, . . . , p 2 and k j the jth centroid obtained using the method of average linkage hierarchical clustering. We construct the design matrix C joining the columns of X and Z and rewrite the linear predictors as µ t = Cb 1 ; log σ t = σ ; κ t = κ.

Bayesian Implementation
The logarithmic scale in which the intensities of the earthquakes are measured directly affects the tails of the distribution of extreme values and increases the difficulty of correctly estimating the parameters. Bayesian inference helps solve this problem. Assigning an appropriate a priori distribution, we can include previous information about the parameters. We incorporate a priori information about the parameters θ t = (µ t , σ t , κ) of the GEV distribution, using the following hierarchical Bayesian model: where π (y t |θ t ) is the GEV distribution, π (θ t |ω) is the a priori distribution of the parameters and π (ω) is the a priori distribution of the hyperparameters. Applying the conditions in the parameters given in Equation (1), we reformulate the parameter and hyperparameter sets using ω * = (β 1 , β 2 , u 1 , u 2 , κ) and ω * * = {σ 1 , σ 2 }, respectively. Therefore, the posterior distribution can be written as: where π (y t |ω * ) is the GEV density under the conditions on the parameters given in Equation (1). We regularize the model using normal distributions with zero mean for the coefficients of the linear predictor, and a uniform a priori distribution with mean −0.75 for the shape parameter. Therefore, the prior distribution set for ω * are β 1 ∼ N 0, 10 4 I , β 2 ∼ N 0, 10 4 I , u 1 |σ 1 ∼ N(0, σ 2 1 I P 2 + D d D d ), u 2 |σ 2 ∼ N 0, σ 2 2 I P 2 and κ ∼ Uni f orm(−0.9, −0.59). The prior distribution for the hyperparameters ω * * is given by σ 2 1 ∼ Hal f − Cauchy(25) and σ 2 2 ∼ Hal f − Cauchy (25). We have based the parameters estimation on the average of 300,000 samples drawn by a MCMC random walk algorithm. To sample the a posteriori distribution, we generate candidates from a normal density function, and accept with probability: where π (θ) is the prior distribution for the parameters, π (x|θ) is the likelihood and Q is the proposal density. Parameters obtained with the maximum a posteriori (MAP) approach are also provided. The estimators of the map method are similar to those obtained by the maximum likelihood method, which, in the case of the a priori normal distribution, bring about the regularization of the model and the shrinkage of the coefficients of the linear predictor.

Simulation Study
We conducted a simulation study to verify the model efficiency, the numerical convergence and the limitations in the estimation procedure. We simulated a set of 800 extremes artificially censored from a GEV model with a parametric setting given by Equation (5) and 40 percent random censorship.
To determine how the result is affected by the number of nodes in Equation (1), we adjusted two models, the first setting p 2 = 20 and the second with p 2 = 80 in Equation (1). The estimates of the shape parameter were −0.59 and −0.69, respectively. The estimates of the σ parameter were 1.38 and 1.05, respectively. The functions used to generate the trend of the maxima and defined by the scale parameter of the GEV distribution, are shown in Figure 2a. Because the function corresponding to µ involves the product of the exponentials to the square of the covariates x 1 and x 2 , it cannot be represented as the sum of additive functions of these two covariates. For simplicity, functions for the scale and shape parameter are constant. The results of the simulation showed that the proposed model correctly estimates the actual function used for the trend (Figure 2a). Although the magnitude of the estimates differs from the original, the difference decreases as the number of knots increases, from p = 20 to p = 80 (Figure 2b,c). Similarly, a comparison of estimators of κ for both p = 20 and p = 80 reveals that estimates of the shape parameter was more accurate in the p = 20 case, with an error less than 0.01, compared with the estimate obtained with p = 80, for which the deviation was around 0.1. In contrast, the estimators of the scale parameter are slightly skewed to the left due to the effect of the a priori distributions on the parameters.
An independent testing set of 1000 observations was simulated to validate the fitted model. The correlation between the observed and the estimates was 0.86 and 0.94 with p = 20 and p = 80, respectively. The parameters estimated by the spline function can barely be interpreted individually because they constitute a large set of parameters, which have the main objective of approximating the nonlinear characteristics of an arbitrary function. However, because the variables have been standardized, it is possible to evaluate the effect of the change of each coefficient on the change in the parameters of the GEV distribution.

Data Collection
The data corresponded to 845 observations of monthly block maxima of earthquake intensities in the seismological scale of moments, between 16 September 1992 and 20 February 2018, obtained at 97 fixed monitoring stations of the Mexican Seismic Alert System (Figure 1). The SASMEX system is a network formed by the Sistema de Alerta Sísmica de la Ciudad de México (SAS) and the Sistema de Alerta Sísmica de Oaxaca (SASO). The SAS was designed and developed by Centro de Instrumentación y Registro Sísmico (CIRES) in 1988. A few years later, as a result of the damage caused to the state of Oaxaca by the earthquake that occurred on 15 June 1999, the state government gave CIRES the responsibility of building the SASO system. In 2012, the Oaxaca and Mexico City authorities agreed to integrate the functions of SASO and SAS into one system known as SASMEX [32].

Data Analysis
The extreme values were obtained from space-time blocks using the block maxima approach [33]. In the spatial plane, a total of 22 rectangular space blocks were generated, and the area of each block was 36,925 km 2 . In the temporal plane, the width of the time interval was one month. Thus, the rationale for censoring data in our study is as follows. Some blocks exhibited missing information (47% of the total number of blocks). Such missing observations could have values, in some cases, higher than the maximum value observed in the block (see Figure 3). However, in other cases, missing values could be lower than the minimum found, thus such observations should not be censored (unfortunately, this information is unknown), thus the maximum found value was "artificially" censored using a random threshold. Thus, the maximum value was eliminated, and the threshold estimated was assigned for the block and the observation was then flagged as "censored observation". Furthermore, to avoid loss of information for analysis, the difference between the threshold estimated and the maximum observed was minimal. Therefore, both censored and uncensored observations data were included to estimate likelihood following the method in Section 2.2.2. We constructed a GEV model for the Maxima of earthquakes on the Pacific coast of southern Mexico, using multivariate smoothing functions of spatiotemporal covariates, latitude (s1), longitude (s2) and time (t), to fit the trends in the non stationary GEV model. To regularize the model and avoid abrupt changes between the coefficients, we assigned a N(0, σ 2 1 I P 2 + D d D d ) a priori distribution to the coefficients u 1 s in the model in Equation (1), where D d (with d = 1) is a matrix such that D d u 1 = ∆ d u 1 constructs the vector of dth differences of u 1 .

Results and Discussion
A descriptive summary of the data by each spatial block is shown in Table 1. In this table, we can see that the magnitude of the maxima for each block is varied and heterogeneous. In the area of Block 4, whose central point is in the parallel located at 15 • N and in the meridian located at 94.5 • W, the maximum intensity recorded was M w 8.2 regarding to the magnitude of the moment, in contrast to the maximum intensity measured in Block 22, where the maxima was M w 4.6. In six of the twenty-two blocks, the maximum magnitude exceeded M w 7.0. An important characteristic that we observed in all the blocks is that there is a marked difference between the third and fourth quartiles, which indicates that a distribution of heavy tails should be appropriate for the study of this type of data. Clearly, tectonic and physical conditions in southern Mexico vary from region to region, causing seismic patterns that are reflected in extreme events of earthquakes in surrounding areas. Furthermore, these trends may also vary from one time period to another. We verified this behavior again in the statistics of means and quantiles presented in Table 1, which show that the magnitudes of the maxima studied tend to vary from one place to another. This fact justifies the modeling of the location parameter with respect to time and space.
The covariates used to estimate spatiotemporal trends were latitude, longitude and time. A hierarchical Bayesian model was fitted to analyze the maxima of earthquakes on the Pacific coast of southern Mexico. At the first level, we modeled the intensities of maxima earthquakes using the generalized extreme value distribution with random censorship; at the second level, we used multivariate smoothing spline functions to model nonstationary spatiotemporal extremes; and, at the third level, we assumed a priori distribution functions for the parameters of the model. Table 2 shows the estimates of the fitted model. In other words, this formulation has the advantage of analyzing maxima through hierarchical models, or stages. The first stage represents the previous information; the second stage represents the trends in the maxima; and the third stage represents the distribution of the data. We constructed the joint probabilities distribution of the parameters and the data, also known as posterior density, using the conditional distributions, mentioned above as stages, formally represented in Equation (2). The procedure used in this work can be easily replicated by: (1) constructing the likelihood function given by Section 2.2.2; and (2) running the Metropolis-Hastings algorithm with the acceptance probabilities given by Equation (4). We fitted our Bayesian hierarchical model with the conditions expressed in Equation (1), setting p 2 = 80. The statistical analysis was performed using the R 3.3.1 software. The samples were generated using a Metropolis-Hastings algorithm, with a Gaussian random walk. We generated 300,000 samples using a burn-in period of 50,000. A look at the logarithm of the a posteriori likelihood indicates that the algorithm successfully converged. The spatial smoothing for the years 1995, 2000, 2005, 2010, 2015 and 2018 for the location functions is shown in Figure 4a-f, respectively. Through this, we observed that the model captured the spatiotemporal patterns present in the data. Similarly, we observed variations of the location parameter in the spatial plane. Typically, these spatiotemporal variations could not be modeled using only linear functions with a few parameters, as they could not catch the complex nonlinear variations present in the seismicity phenomenon. However, it is still possible to use several types of parameterized linear functions, such as the model in Equation (1), where the interaction effects of the original covariates are included through auxiliary covariates.  Several results were obtained by analyzing the spatial smoothing set for the location parameter over time, as shown in Figure 4a-f. Firstly, we could observe that each of the functions estimated through time showed several peaks or maximums, which correspond to the places where the most intense earthquakes occur. Secondly, it can be observed that the estimated functions for the years 1995, 2000 and 2005 have similar characteristics. However, in the year 2010, the patterns began to change, until later, for the years 2015 and 2018, we observed different patterns compared to the first few years. Furthermore, the intensities of earthquakes increase over time, which shows that the risk of extreme events increases with time.
The flexibility of the proposed model allows studying the temporal variations in each geographical location on the study area. Figure 5  The probability of large earthquakes on the coasts of Oaxaca and Guerrero is greater compared to other regions, due to the proximity they have with boundary plates. However, the Puebla and Morelos earthquake hit the central zone of the country, far from the limits between tectonic plates. Recent research suggests that these earthquakes should not be unexpected. An entropy change of seismicity under time reversal, considered as a non-seismic precursor of earthquakes, was identified several months before the occurrence of the major earthquake [34,35]. In this study, we developed a model of extreme values with censoring data for the earthquakes on the Pacific coast of southern Mexico using a robust and solidly-based theory. An important application of the results of our study is through a risk map or return level map. The return level Z p is the threshold at which an extreme value is exceeded with probability p, which is expected to occur once every 1/p years [36]. Figure 6 shows the strength of earthquakes for a return period of 25 years built using a spatial model. The stability of the results shown in Figure 6 is strongly influenced by the size of the sample. Therefore, adding or eliminating a set of extreme values, the results obtained should change. The initial value of the algorithm also modifies the convergence time of the MCMC algorithm, causing in the worst case, the non-convergence of the algorithm. Isolines in the map ( Figure 6) can be used to discover the spatial trend of strength of earthquakes. In fact, earthquakes greater than M w 8 regarding to the magnitude of the moment in a return period of 25 years can be expected in the Southeast part of the State of Chiapas ( Figure 6). An interesting issue that explains the spatial trend of strength of earthquake is that the boundaries of the Cocos, North America and Caribbean plates are converging slightly to the south from the region that presents the highest estimated values. Further, in the boundary between Veracruz and Oaxaca States can be identified a region moderately vulnerable to the presence of earthquakes with values of strength of earthquakes higher than 7.5. Finally, a decrease of the estimated intensity of earthquakes can be detected in regions further away from the southern coast of Mexico. This research confirmed the latent risk of earthquakes in the region using empirical data gathered from a network of sensors spatially-distributed and administered by the SASMEX system. These findings should encourage using stronger specifications when building civil infrastructure in order to decrease the effect of earthquakes. Similar to the findings reported by studies in other countries, the extreme events of earthquakes studied here exhibited spatial variations within the study area [37]. These spatial variations have also been found in the results of various types of studies on seismogenic zoning [38]. The analysis of the extreme values of earthquakes using the extreme distribution of the value in the non-stationary approach, in addition to allowing the calculation of the probabilities of extreme events, helps to solve the problem of seismogenic zoning in a limited area, by establishing suitably chosen thresholds in the isolines of the estimated smoothed function of the location parameter.

Conclusions
In this research work, we studied the extreme non-stationary values of earthquakes with censored data using a Bayesian approach. One of the benefits of this model is that the spline function used to calculate the parameters of the GEV distribution allows for the adjustment of a wide variety of nonlinear functions, which are common in this type of data. The parameters are estimated via Markov chain Monte-Carlo (MCMC), which has the advantage of sampling analytically intractable posterior distributions and simultaneously estimating unknown hyperparameters. It also allows the elimination of potential values of invalid parameters. Additionally, estimates were obtained by adjusting to the non-stationary GEV model, in which we corrected the likelihood function in order to consider the effect of random censorship to obtain more reliable results. Our model was validated through a rigorous simulation study that gives support and validity to our results. An important challenge is extending this work to the case when using different a priori distributions for the GEV parameters. Our findings indicate the existence of perfectly delimited areas with greater risks of occurrence of strong earthquakes. These results should help administrative authorities to improve prevention policies and standards in the case of an extreme event.