Objective Bayesian Prediction of Future Record Statistics Based on the Exponentiated Gumbel Distribution: Comparison with Time-Series Prediction

The interest in the study of record statistics has been increasing in recent years in the context of predicting stock markets and addressing global warming and climate change problems such as cyclones and floods. However, because record values are only rarely observed, their probability distribution may be skewed or asymmetric. In this case, the Bayesian approach with a reasonable choice of the prior distribution can be a good alternative. This paper presents an objective Bayesian method for predicting future record values when the observed record values follow a two-parameter exponentiated Gumbel distribution with scale and shape parameters. For objective Bayesian analysis, objective priors such as the Jeffreys and reference priors are first derived from the Fisher information matrix for the scale and shape parameters, and the resulting posterior distributions are then analyzed to examine their properness and validity. In addition, under the derived objective priors, a simple algorithm based on a pivotal quantity is proposed for predicting future record values. To validate the proposed approach, it was applied to a real dataset. For a closer examination, and to demonstrate the superiority of the proposed predictive method, it was compared to time-series models, namely the autoregressive integrated moving average model and the dynamic linear model, in an analysis of real data observed from an infinite time series of independent sample values.


Introduction
The occurrence of extreme events such as extreme temperatures, excess flood peaks, and rapid increases in pollutant concentrations has steadily increased over the past decade. However, the volume of data generated by such events is relatively small compared to the big data generated by normal events occurring daily. In this case, providing accurate predictions for extreme events can be challenging due to the insufficient number of past records. Bayesian methods may be more suitable for modeling data with small sample sizes and skewness or lack of symmetry, as in the case of record values, provided that a reasonable prior distribution is chosen, because Bayesian methods do not rely on asymptotic theory in the way that frequentist methods do.
The concept of record values was introduced by Chandler [1] and can be described as follows. Let {X_1, X_2, . . .} be a sequence of independent and identically distributed random variables. Then, for every i < j, X_j is called the upper (lower) record value if X_j > (<) X_i, which indicates that X_j is higher (lower) than all previous observations. That is, the upper (lower) record values are the members of the series that are larger (smaller) than all preceding members. The indices at which upper record values occur are given by the record times {U(k), k ≥ 1}, where U(k) = min{j | j > U(k − 1), X_j > X_U(k−1)}, k > 1, with U(1) = 1. The record times for the lower record values are {L(k), k ≥ 1}, where L(k) = min{j | j > L(k − 1), X_j < X_L(k−1)}, k > 1, with L(1) = 1. Therefore, the sequences of the upper and lower record values extracted from the original sequence {X_1, X_2, . . .} are denoted as {X_U(k), k = 1, 2, . . .} and {X_L(k), k = 1, 2, . . .}, respectively.
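As a concrete illustration of these definitions, the record values and their record times can be extracted from a finite observed sequence with a single pass (a minimal Python sketch; the function names are illustrative, not from the text):

```python
def lower_records(seq):
    """Return the lower record values and their record times L(k) (1-based)."""
    values, times = [], []
    for t, x in enumerate(seq, start=1):
        if not values or x < values[-1]:  # smaller than all preceding members
            values.append(x)
            times.append(t)
    return values, times

def upper_records(seq):
    """Return the upper record values and their record times U(k) (1-based)."""
    values, times = [], []
    for t, x in enumerate(seq, start=1):
        if not values or x > values[-1]:  # larger than all preceding members
            values.append(x)
            times.append(t)
    return values, times
```

For example, lower_records([5, 7, 3, 4, 2]) yields the record values [5, 3, 2] with record times [1, 3, 5]; by convention, the first observation is always a record.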
Since such record values arise in many real-world situations related to climate, economics, sports, and life testing, relevant studies have been conducted in various fields. Coles and Tawn [2] analyzed a daily rainfall series for modeling the extremes of the rainfall process in the context of record values. Madi and Raqab [3] analyzed average temperatures in Neuenburg, Switzerland, using a Bayesian predictive method for record values from the Pareto distribution. Wergen et al. [4] analyzed both the probability of occurrence and the probability density function (PDF) of record-breaking values for temperatures in Europe and the United States. Seo and Kim [5] proposed an objective Bayesian inference method for record values from the Gumbel distribution, which was applied to the concentration analysis of sulfur dioxide. Seo and Kang [6] proposed an estimation method using record values from an exponentiated half-logistic distribution from both Bayesian and non-Bayesian perspectives. The authors demonstrated the efficiency of their method by comparing it to existing estimation methods through rainfall data analysis.
This paper proposes a predictive method based on an objective Bayesian approach, which saves the effort of finding an exact prior distribution when sufficient prior information is unavailable, for record values from the exponentiated Gumbel distribution (EGD) with cumulative distribution function (CDF)

F(x; σ, λ) = [exp(−e^{−x/σ})]^λ = exp(−λ e^{−x/σ}), −∞ < x < ∞, (1)

where σ > 0 and λ > 0 denote the scale and shape parameters, respectively. The EGD is a generalized version of the Gumbel distribution (GD), which is the most widely applied statistical distribution in the extreme value analysis of extreme events such as global warming, floods, heavy rainfall, and high wind speeds. The EGD can be considered as simply the λth power of the CDF of the GD with the scale parameter σ. Therefore, the EGD can lead to improved performance and applicability of models built over a variety of complex datasets compared to the GD. Note that it is possible to apply time-series techniques if no data are lost while acquiring record values, since the lower record time L(k) is the serial number of the record value in an infinite time series. For comparison with the proposed objective Bayesian predictive method, two types of time-series models are considered in this study: the autoregressive integrated moving average (ARIMA) model introduced by Box et al. [7] and the dynamic linear model (DLM) developed by West and Harrison [8]. The rest of the paper is organized as follows. Section 2 presents objective priors for the unknown parameters of the EGD in the context of record values, along with the corresponding posterior analysis and predictive method. Section 3 provides a brief description of the ARIMA model and DLM that are employed in this study as benchmarks to validate the proposed objective Bayesian method. Section 4 presents the results of applying both the time-series and objective Bayesian models to real data. Section 5 concludes this paper.

Bayesian Prediction
The aim of this study is to predict future lower record values based on an objective Bayesian approach. To accomplish this, an objective Bayesian approach that does not require determining hyperparameters is presented first. The following subsection introduces objective priors based on the Fisher information (FI) matrix for unknown parameters of the EGD with the CDF (1).

Objective Prior
Let X_L(1), . . . , X_L(k) be the lower record values from the CDF (1). Then, the corresponding likelihood function and its natural logarithm can be expressed as

L(λ, σ) = (λ/σ)^k exp(−(1/σ) Σ_{i=1}^k x_L(i)) exp(−λ e^{−x_L(k)/σ})

and

log L(λ, σ) = k log λ − k log σ − (1/σ) Σ_{i=1}^k x_L(i) − λ e^{−x_L(k)/σ}, (2)

respectively. For computational convenience, let θ = 1/σ. Then, based on the log-likelihood (2), the FI matrix for (λ, θ) can be defined as follows.
Proof. In the FI matrix

I(λ, θ) = ( I_11  I_12 ; I_21  I_22 ),

the element I_11 can be easily computed, while the other elements can be expressed as expectations involving the record value X_L(k). The proof is then completed given the marginal density function of X_L(i) defined in Ahsanullah [9] as

f_{X_L(i)}(x) = [−log F(x)]^{i−1} f(x) / (i − 1)!, i = 1, 2, . . . .

The objective priors such as the Jeffreys and reference priors based on the FI matrix (3) are defined according to the following theorem.
Proof. According to the definition of the Jeffreys prior (Jeffreys [10]), it follows that

π_J(λ, θ) ∝ |I(λ, θ)|^{1/2},

where |I(λ, θ)| denotes the determinant of the FI matrix (3). This completes the proof.
In the following, the reference priors for each parameter of interest are derived from the algorithm provided in Berger and Bernardo [11].
If λ is the parameter of interest, the reference prior for (λ, θ) is π_R1(λ, θ), and, if θ is the parameter of interest, the reference prior for (θ, λ) is π_R2(θ, λ). Proof. When λ is the parameter of interest, the conditional prior distribution of θ given λ is first defined based on the FI matrix (3). Then, by choosing a sequence of compact sets, the marginal reference prior for λ is obtained based on the FI matrix (3) and (4), which leads to the reference prior π_R1(λ, θ) for any fixed point λ_0. When θ is the parameter of interest, the same argument is applied: the conditional prior of λ given θ and the marginal reference prior for θ are derived in turn, from which the reference prior π_R2(θ, λ) follows for any fixed point θ_0. This completes the proof.
Note that, since all the derived objective priors are improper, the corresponding posterior distributions must be proved to be proper. Since the Jeffreys prior π_J(λ, θ) and the reference prior π_R2(θ, λ) have the same form, the notation π_JR(λ, θ) is used from now on.

Posterior Analysis
Let x_L = {x_L(1), . . . , x_L(k)} be the observed lower record values. Then, the objective prior π_JR(λ, θ) results in the following marginal posterior density functions of λ and θ:

π_JR(λ | x_L) ∝ λ^{k−1} ∫_0^∞ θ^{k−1} exp(−θ Σ_{i=1}^k x_L(i) − λ e^{−θ x_L(k)}) dθ (5)

and

π_JR(θ | x_L) = (Σ_{i=1}^k x_L(i) − k x_L(k))^k / Γ(k) · θ^{k−1} exp(−θ (Σ_{i=1}^k x_L(i) − k x_L(k))), (6)

respectively. Note that the marginal posterior distribution of θ is a gamma distribution with the parameters k and Σ_{i=1}^k x_L(i) − k x_L(k). Then, the Bayes estimators under the squared error loss function from the marginal posterior density functions (5) and (6) can be expressed as

λ̂_JR = E(λ | x_L), computed numerically from (5), and θ̂_JR = k / (Σ_{i=1}^k x_L(i) − k x_L(k)),

respectively. In terms of σ, the corresponding marginal posterior density function can be expressed as

π_JR(σ | x_L) = (Σ_{i=1}^k x_L(i) − k x_L(k))^k / Γ(k) · σ^{−(k+1)} exp(−(Σ_{i=1}^k x_L(i) − k x_L(k)) / σ).

Since this is the PDF of an inverse gamma distribution with the parameters k and Σ_{i=1}^k x_L(i) − k x_L(k), the Bayes estimator of σ under the squared error loss function can be expressed as

σ̂_JR = (Σ_{i=1}^k x_L(i) − k x_L(k)) / (k − 1).

Theorem 3. From the frequentist perspective, the estimator σ̂_JR is an unbiased estimator of σ.
Proof. According to Lemma 2 provided in Wang and Ye [12], independent and identically distributed uniform(0, 1) random variables can be constructed from the record values, from which independent random variables having χ² distributions with 2j (j = 1, . . . , k) degrees of freedom are obtained. Then, the estimator σ̂_JR has a gamma distribution with the parameters k − 1 and (k − 1)/σ for any k > 1 because

θ (Σ_{i=1}^k X_L(i) − k X_L(k))

has a gamma distribution with the parameters k − 1 and 1, so that E(σ̂_JR) = (k − 1)/((k − 1)/σ) = σ. This completes the proof.
The highest posterior density (HPD) credible intervals (CrIs) for λ and σ can be constructed by generating MCMC samples from the marginal posterior density functions (5) and (6), respectively. However, since the marginal posterior density function (5) is not a well-known probability distribution, sampling from it directly is difficult. Instead, sampling can be performed indirectly through the relationship π_JR(λ, θ | x_L) = π_JR(λ | θ, x_L) π_JR(θ | x_L), because the conditional posterior distribution

π_JR(λ | θ, x_L) ∝ λ^{k−1} e^{−λ e^{−θ x_L(k)}}

is a gamma distribution with the parameters k and e^{−θ x_L(k)}. To achieve this, θ is first generated from its marginal posterior distribution, λ is then generated from its conditional posterior distribution given the generated value of θ, and finally σ = 1/θ is computed. Then, the 100(1 − α)% equal-tailed (ET) and HPD CrIs can be constructed for 0 < α < 1 using the method proposed in Chen and Shao [13]. Under the prior π_R1(λ, θ), the resulting posterior π_R1(λ, θ | x_L) is proper because Q_2(λ) > 1. However, since it cannot be expressed in a known closed form, MCMC samples for λ and θ can be generated using the Metropolis-Hastings algorithm; for efficient mixing, the proposal variance-covariance structure is updated adaptively. For the corresponding Bayes estimators under the squared error loss function, the notations λ̂_R1 and σ̂_R1 are used.

Prediction
Let X_L(s) (s = k + 1, k + 2, . . .) be a future lower record value. Since the sequence {X_L(i), i = 1, 2, . . .} is a Markov chain, the conditional density function of X_L(s) given X_L = x_L is the same as that of X_L(s) given X_L(k) = x_L(k). That is, it follows from Ahsanullah [9] that

f(y | x_L(k)) = [H(y) − H(x_L(k))]^{s−k−1} / Γ(s − k) · f(y) / F(x_L(k)), y < x_L(k), (7)

where H(·) = −log F(·). Then, for the EGD with the CDF (1), (7) becomes

f(y | x_L(k)) = [λ (e^{−θ y} − e^{−θ x_L(k)})]^{s−k−1} / Γ(s − k) · λ θ e^{−θ y} exp(−λ (e^{−θ y} − e^{−θ x_L(k)})), y < x_L(k), (8)

and the corresponding Bayesian predictive density function can be expressed as

f(y | x_L) = ∫∫ f(y | x_L(k)) π(λ, θ | x_L) dλ dθ,

where π(λ, θ | x_L) is a general joint posterior distribution for (λ, θ). Note that this requires very complex and tedious computations; in fact, there is no guarantee that it can be expressed in a closed form. A much simpler approach is to use the pivotal quantity obtained by a transformation of the random variable.
Let H = λ (e^{−θ Y} − e^{−θ x_L(k)}) in the conditional density function (8). Then, H has a gamma distribution with the parameters s − k and 1, with the PDF

g(h) = h^{s−k−1} e^{−h} / Γ(s − k), h > 0,

because y < x_L(k) maps onto h > 0 and the Jacobian of the transformation is

∂h/∂y = −λ θ e^{−θ y}.

This leads to the following algorithm for generating the MCMC samples y_i (i = 1, . . . , N).
Step 1a. Generate h_i from the gamma distribution with the parameters s − k and 1.
Step 1b. Generate (λ_i, θ_i) from the posterior distribution of (λ, θ), as described in the posterior analysis above.
Step 2. Compute y_i = −(1/θ_i) log(e^{−θ_i x_L(k)} + h_i/λ_i), obtained by solving h_i = λ_i (e^{−θ_i y_i} − e^{−θ_i x_L(k)}) for y_i.
Then, the corresponding 100(1 − α)% predictive interval (PI) for 0 < α < 1 can be constructed using the method proposed in [13], as in the case of λ and θ. For the purpose of clarity, X_L(s)^JR | x_L(k) and X_L(s)^R1 | x_L(k) are used to denote future lower record values under the priors π_JR(λ, θ) and π_R1(λ, θ), respectively.
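The pivotal-quantity scheme, drawing h from a gamma distribution and inverting H for y given posterior draws of (λ, θ), can be sketched as follows (a self-contained illustration; the posterior draws used here are hypothetical placeholders, not fitted values):

```python
import numpy as np

def predict_lower_record(x_last, lam, theta, s_minus_k, rng):
    """Sample future lower records X_L(s) given X_L(k) = x_last.

    lam, theta: arrays of posterior draws of (lambda, theta);
    the pivot H = lambda*(exp(-theta*y) - exp(-theta*x_last)) ~ Gamma(s - k, 1).
    """
    lam = np.asarray(lam, dtype=float)
    theta = np.asarray(theta, dtype=float)
    h = rng.gamma(shape=s_minus_k, scale=1.0, size=lam.shape)
    # Invert the pivot for y; h > 0 guarantees y < x_last.
    return -np.log(np.exp(-theta * x_last) + h / lam) / theta

rng = np.random.default_rng(7)
# Hypothetical posterior draws, for illustration only.
lam = rng.gamma(5.0, 1.0, size=5000) + 0.5
theta = rng.gamma(5.0, 0.1, size=5000) + 0.1
y = predict_lower_record(3.0, lam, theta, s_minus_k=1, rng=rng)
pi_95 = np.quantile(y, [0.025, 0.975])  # equal-tailed 95% predictive interval
```

Every sampled y lies below the last observed record, as the transformation requires.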

Time Series Approach
Provided that record values are observed from a time series of uncorrelated random variables sampled from a continuous probability distribution, the proposed Bayesian method presented in the previous section can be compared to the ARIMA and DLM time-series techniques described below.
The ARIMA model is the most widely used approach to time-series forecasting. Conventionally, it is defined using three components (p, d, q), where
• p denotes the order of the autoregressive (AR) term,
• d denotes the order of differencing required to make the time series stationary, and
• q denotes the order of the moving average (MA) term.
Here, the autoregressive (AR) process assumes that the current value of the series y_t can be expressed as a function of the p past values y_{t−1}, y_{t−2}, . . . , y_{t−p} in the form

y_t − µ = φ_1 (y_{t−1} − µ) + φ_2 (y_{t−2} − µ) + · · · + φ_p (y_{t−p} − µ) + ε_t,

where µ is the mean of the process, φ_1, φ_2, . . . , φ_p are constants (φ_p ≠ 0), and ε_t is a weak white noise series with a mean of zero and a variance of σ²_ε. The MA process uses past forecast errors,

y_t = µ + ε_t + ϑ_1 ε_{t−1} + · · · + ϑ_q ε_{t−q},

that is, a weighted average of the past values of the white noise process ε_t. Then, the time series Y_t is an ARIMA(p, d, q) process if Δ^d Y_t is an ARMA(p, q) process obtained by combining the AR and MA terms, where Δ^d is the dth-order differencing operator. For non-stationary data, one usually fits an ARMA model after differencing the data until stationarity is achieved.
The second time-series approach considered in this study for comparison with the proposed method is the DLM. Let Y_t be an m-dimensional vector observed at time t, and let δ_t be a generally unobservable p-dimensional state vector of the system at time t. Then, the DLM can be defined as

Y_t = F_t δ_t + v_t, v_t ∼ N_m(0, V_t),
δ_t = G_t δ_{t−1} + w_t, w_t ∼ N_p(0, W_t),

for each time t ≥ 1, together with a prior distribution for the p-dimensional state vector at time t = 0, δ_0 ∼ N_p(m_0, C_0), where F_t and G_t are known matrices of dimensions m × p and p × p, respectively, and V_t and W_t are variance matrices. Furthermore, it is assumed that the error sequences v_t and w_t are independent of each other and of δ_0. Note that the lower record values from a univariate time series exhibit a strong decreasing trend through time. Therefore, a DLM with

Y_t = µ_t + v_t,
µ_t = µ_{t−1} + β_{t−1} + w_{t,1},
β_t = β_{t−1} + w_{t,2}, (9)

is considered, namely, the linear growth model (LGM), where µ_t and β_t denote the local level and local growth rate at time t, respectively, and v_t, w_{t,1}, and w_{t,2} denote uncorrelated errors.
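As an illustration of how the LGM (9) produces one-step-ahead forecasts, a minimal Kalman filter sketch is given below. The variances V and W here are assumed placeholder values, not fitted ones; in practice, the filtering recursion is carried out by software such as the R package dlm.

```python
import numpy as np

# Linear growth model: Y_t = mu_t + v_t; mu_t = mu_{t-1} + beta_{t-1} + w_{t,1};
#                      beta_t = beta_{t-1} + w_{t,2}.
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # state transition matrix
F = np.array([[1.0, 0.0]])      # observation matrix
V = 0.5                         # observation variance (assumed)
W = np.diag([0.1, 0.01])        # state noise covariance (assumed)

def kalman_filter(y, m0, C0):
    """Return one-step-ahead forecasts and the final filtered state mean."""
    m = np.asarray(m0, dtype=float)
    C = np.asarray(C0, dtype=float)
    forecasts = []
    for obs in y:
        a = G @ m                         # prior state mean at time t
        R = G @ C @ G.T + W               # prior state covariance
        f = (F @ a).item()                # one-step-ahead forecast of Y_t
        Q = (F @ R @ F.T).item() + V      # forecast variance
        K = (R @ F.T).ravel() / Q         # Kalman gain
        m = a + K * (obs - f)             # filtered state mean
        C = R - np.outer(K, F @ R)        # filtered state covariance
        forecasts.append(f)
    return np.array(forecasts), m
```

On a noiseless linear series initialized at the true level and slope, the one-step forecasts reproduce the series exactly, which illustrates why the LGM suits a steadily decreasing record sequence.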

Sulfur Dioxide Data
This section demonstrates the superiority of the proposed Bayesian method by comparing it to the ARIMA and DLM methods.
The three methods are applied to time-series data representing sulfur dioxide emissions in the United States (U.S.) from 1970 to 2017 (in 1000 tons), as measured by the U.S. Environmental Protection Agency. Due to the implementation of the Acid Rain Program created under Title IV of the 1990 Clean Air Act, sulfur dioxide emissions have decreased significantly over the last decades through a cap-and-trade program for fossil-fuel power plants. The observed volume of sulfur dioxide emissions and its descriptive statistics are presented in Figure 1 and Table 1, respectively. Note that each data point was divided by 1000 for computational convenience; given that the data continued to decrease during the observation period, as shown in Figure 1, they were all used, with no data lost while acquiring the lower record values.

To conduct the goodness-of-fit test for the observed sulfur dioxide data, replicated data are first considered. If the estimated model is adequate, the replicated data should look similar to the observed data. The replicated data are generated from the Bayesian predictive density function, and the corresponding statistical criteria are also computed. These results are reported in Figure 2. It can be noticed from Figure 2 that the estimated models fit the observed sulfur dioxide lower record values very well, and that the estimated models under the priors π_JR(λ, θ) and π_R1(λ, θ) provide almost the same results for the considered statistical criteria. Table 2 reports the estimation results under the derived priors and the corresponding maximum likelihood counterparts for comparison.
For the maximum likelihood procedure, the maximum likelihood estimators (MLEs) λ̂ and σ̂ are obtained by maximizing the log-likelihood function (2), while the approximate 100(1 − α)% confidence intervals (CIs) are calculated based on the MLEs as

λ̂ ± Z_{α/2} √Var(λ̂) and σ̂ ± Z_{α/2} √Var(σ̂),

where Z_α denotes the upper α percentile point of the standard normal distribution, and Var(λ̂) and Var(σ̂) are obtained from the diagonal elements of the asymptotic variance-covariance matrix of the MLEs, computed by inverting the FI matrix (3). For the shape parameter λ, the Bayes estimate λ̂_R1 has a slightly lower value than the other estimates λ̂ and λ̂_JR, which have almost the same values. However, the 95% HPD CrI under the prior π_R1(λ, θ) has the shortest length. For the scale parameter σ, all estimates vary slightly, while the approximate 95% CI based on σ̂ is shorter than the other CrIs.

For prediction, the last lower record value is assumed to be unknown. As mentioned earlier, since no data are lost while acquiring the observed lower record values, the time-series analysis outlined in Section 3 can be conducted at the same time. Table 3 reports the prediction results for the next lower record value X_L(20) | x_L(19). The R package dlm (Petris [14]) was used to estimate the parameters and state vector in the LGM (9) with

δ_0 ∼ N_2((33.361, −2.471)', diag(0.593, 0.190)).

In addition, for the ARIMA model, ARIMA(0, 2, 1) was chosen as the best model in terms of the corrected Akaike information criterion (AICc): it has the smallest AICc value when q = 1, with the MA coefficient estimated as −0.705, after differencing the data twice. The forecast accuracy is evaluated in terms of the mean absolute deviation (MAD), the mean square error (MSE), and the mean absolute percentage error (MAPE), defined respectively as

MAD = (1/n) Σ_{t=1}^n |e_t|, MSE = (1/n) Σ_{t=1}^n e_t², and MAPE = (100/n) Σ_{t=1}^n |e_t / x_t|,

where e_t = x_t − x̂_t and x̂_t is a point forecast of x_t. In this example, n = k and x_t = x_L(t).
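The three accuracy measures above can be computed directly (a small sketch; the helper name is illustrative):

```python
def forecast_accuracy(actual, predicted):
    """Return (MAD, MSE, MAPE) for one-step forecasts; MAPE is in percent."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mad, mse, mape
```

For example, forecast_accuracy([10, 20], [8, 25]) returns (3.5, 14.5, 22.5).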
These results are reported in Table 4, which indicates that there is little difference in the predictive performance of the two time-series models. Table 3 shows that the proposed Bayesian PI has a shorter length than those obtained for the considered ARIMA model and DLM, especially under the prior π_JR(λ, θ). That is, the predictive result under the prior π_JR(λ, θ) shows the best performance in terms of uncertainty. Using the best-performing Bayesian predictive model in terms of uncertainty, the Bayesian predictive density functions for the three future record values are estimated as kernel density functions based on their MCMC samples. The results are plotted in Figure 3, which shows that both estimated Bayesian predictive models under the priors π_JR(λ, θ) and π_R1(λ, θ) have a greater variance as the future record time L(s) increases.

Figure 3. Estimated kernel density for X_L(s) | x_L(19) under the priors π_JR(λ, θ) and π_R1(λ, θ).

Conclusions
This paper defined the Jeffreys and reference priors for unknown parameters of the EGD based on record values and proposed a Bayesian method for predicting future record values. The method makes it very easy to generate MCMC samples for prediction. To validate the proposed method, it was compared to two time-series approaches, namely, the ARIMA model and DLM, using a sulfur dioxide emissions dataset. The results of the comparison demonstrated that the proposed method outperforms the time-series approaches in terms of uncertainty.
While there was no clear difference among the proposed objective priors in the goodness-of-fit results for the observed data, forecasting under the prior π_JR(λ, θ) performed better than forecasting under the prior π_R1(λ, θ); both derived prior distributions were shown to be valid.