Article

Objective Bayesian Prediction of Future Record Statistics Based on the Exponentiated Gumbel Distribution: Comparison with Time-Series Prediction

1 Department of Statistics, Kyungpook National University, Daegu 41566, Korea
2 Division of Convergence Education, Halla University, Wonju 26404, Korea
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(9), 1443; https://doi.org/10.3390/sym12091443
Submission received: 13 July 2020 / Revised: 28 July 2020 / Accepted: 2 August 2020 / Published: 1 September 2020

Abstract

The interest in record statistics has been increasing in recent years in the context of predicting stock markets and addressing global warming and climate change problems such as cyclones and floods. However, because record values are only rarely observed, their probability distribution may be skewed or asymmetric. In this case, the Bayesian approach, with a reasonable choice of prior distribution, can be a good alternative. This paper presents an objective Bayesian method for predicting future record values when the observed record values follow a two-parameter exponentiated Gumbel distribution with scale and shape parameters. For objective Bayesian analysis, objective priors such as the Jeffreys and reference priors are first derived from the Fisher information matrix for the scale and shape parameters, and the resulting posterior distributions are then analyzed to examine their properness and validity. In addition, under the derived objective priors, a simple algorithm based on a pivotal quantity is proposed for predicting future record values. To validate the proposed approach, it was applied to a real dataset. To further demonstrate the superiority of the proposed predictive method, it was compared with time-series models, namely the autoregressive integrated moving average model and the dynamic linear model, in an analysis of real data observed from an infinite time series of independent sample values.

Graphical Abstract

1. Introduction

The occurrence of extreme events such as extreme temperatures, excess flood peaks, and rapid increases in pollutant concentrations has steadily increased over the past decade. However, the volume of data generated by such events is small compared to the big data generated by normal events occurring daily. In this case, providing accurate predictions for extreme events can be challenging because of the insufficient number of past records. Bayesian methods may be more suitable for modeling small samples with skewness or lack of symmetry, as in the case of record values, provided that a reasonable prior distribution is chosen, because Bayesian methods do not rely on asymptotic theory in the way that frequentist methods do.
The concept of record values was introduced by Chandler [1] and can be described as follows. Let $\{X_1, X_2, \ldots\}$ be a sequence of independent and identically distributed random variables. Then, for every $i < j$, $X_j$ is called the upper (lower) record value if $X_j > (<)\, X_i$, which indicates that $X_j$ is higher (lower) than all previous observations. That is, the upper (lower) record values are the members of the series that are larger (smaller) than all preceding members. The indices at which upper record values occur are given by the record times $\{U(k), k \geq 1\}$, where $U(k) = \min\{j \mid j > U(k-1),\, X_j > X_{U(k-1)}\}$, $k > 1$, with $U(1) = 1$. The record times for the lower record values are $\{L(k), k \geq 1\}$, where $L(k) = \min\{j \mid j > L(k-1),\, X_j < X_{L(k-1)}\}$, $k > 1$, with $L(1) = 1$. Therefore, the sequences of upper and lower record values extracted from the original sequence $\{X_1, X_2, \ldots\}$ are denoted by $\{X_{U(k)}, k = 1, 2, \ldots\}$ and $\{X_{L(k)}, k = 1, 2, \ldots\}$, respectively.
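Extracting the lower record values and record times from an observed sequence follows directly from these definitions; the following Python sketch (function name ours) illustrates the idea:

```python
def lower_records(seq):
    """Return the lower record values and their record times (1-indexed)
    from a sequence, following the definition of L(k)."""
    values, times = [], []
    for t, x in enumerate(seq, start=1):
        # x is a lower record if it falls below every previous observation,
        # i.e., below the current running minimum (the last record value)
        if not values or x < values[-1]:
            values.append(x)
            times.append(t)
    return values, times
```

For example, `lower_records([5.0, 7.0, 3.0, 4.0, 2.0])` returns `([5.0, 3.0, 2.0], [1, 3, 5])`: the first observation is always a record, and each subsequent record is the first observation to fall below all of its predecessors.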
Since such record values arise in many real-world situations related to climate, economics, sports, and life testing, relevant studies have been conducted in various fields. Coles and Tawn [2] analyzed a daily rainfall series to model the extremes of the rainfall process in the context of record values. Madi and Raqab [3] analyzed average temperatures in Neuenburg, Switzerland, using a Bayesian predictive method for record values from the Pareto distribution. Wergen et al. [4] analyzed both the probability of occurrence and the probability density function (PDF) of record-breaking temperatures in Europe and the United States. Seo and Kim [5] proposed an objective Bayesian inference method for record values from the Gumbel distribution, which was applied to an analysis of sulfur dioxide concentrations. Seo and Kang [6] proposed an estimation method using record values from an exponentiated half-logistic distribution from Bayesian and non-Bayesian perspectives; the authors demonstrated the efficiency of their method by comparing it to existing estimation methods through a rainfall data analysis.
This paper proposes a predictive method based on an objective Bayesian approach, which avoids the effort of finding an exact prior distribution when no sufficient prior information is available, for record values from the exponentiated Gumbel distribution (EGD) with cumulative distribution function (CDF)
$$F(x) = \left(e^{-e^{-x/\sigma}}\right)^{\lambda}, \quad -\infty < x < \infty, \; \lambda, \sigma > 0, \qquad (1)$$
where $\sigma$ and $\lambda$ denote the scale and shape parameters, respectively. The EGD is a generalized version of the Gumbel distribution (GD), which is the most widely applied statistical distribution in the extreme value analysis of extreme events such as global warming, floods, heavy rainfall, and high wind speeds. The EGD is simply the $\lambda$th power of the CDF of the GD with scale parameter $\sigma$; therefore, it can improve the performance and applicability of models built over a variety of complex datasets compared to the GD. Note that time-series techniques can also be applied if no data are lost while acquiring the record values, since the lower record time $L(k)$ is the serial number of the record value in an infinite time series. For comparison with the proposed objective Bayesian predictive method, two time-series models are considered in this study: the autoregressive integrated moving average (ARIMA) model introduced by Box et al. [7] and the dynamic linear model (DLM) developed by West and Harrison [8].
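Because the CDF (1) can be inverted in closed form, EGD variates are easy to generate by inversion; a minimal sketch (the helper functions below are ours, not from the paper):

```python
import math
import random

def egd_cdf(x, sigma, lam):
    """CDF (1): the lambda-th power of the Gumbel CDF with scale sigma."""
    return math.exp(-lam * math.exp(-x / sigma))

def egd_quantile(u, sigma, lam):
    """Inverse CDF: solve u = exp(-lam * exp(-x/sigma)) for x."""
    return sigma * (math.log(lam) - math.log(-math.log(u)))

def egd_sample(sigma, lam, rng=random):
    """Draw one EGD variate by the inversion method."""
    return egd_quantile(rng.random(), sigma, lam)
```

With $\lambda = 1$, these functions reduce to the ordinary Gumbel distribution, consistent with the EGD being its $\lambda$th-power generalization.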
The rest of the paper is organized as follows. Section 2 presents objective priors for the unknown parameters of the EGD based on record values, along with the corresponding posterior analysis and predictive method. Section 3 briefly describes the ARIMA model and DLM, which are employed as benchmarks to validate the proposed objective Bayesian method. Section 4 presents the results of applying both the time-series and objective Bayesian models to real data. Section 5 concludes the paper.

2. Bayesian Prediction

The aim of this study is to predict future lower record values based on an objective Bayesian approach that does not require determining hyperparameters. The following subsection introduces objective priors based on the Fisher information (FI) matrix for the unknown parameters of the EGD with the CDF (1).

2.1. Objective Prior

Let $X_{L(1)}, \ldots, X_{L(k)}$ be the lower record values from the CDF (1). Then, the corresponding likelihood function and its natural logarithm can be expressed as
$$L(\lambda, \sigma) = f\!\left(x_{L(k)}\right) \prod_{i=1}^{k-1} \frac{f\!\left(x_{L(i)}\right)}{F\!\left(x_{L(i)}\right)} = \left(\frac{\lambda}{\sigma}\right)^{k} e^{-\lambda e^{-x_{L(k)}/\sigma}}\, e^{-\sum_{i=1}^{k} x_{L(i)}/\sigma}$$
and
$$\log L(\lambda, \sigma) = k \log \lambda - k \log \sigma - \lambda e^{-x_{L(k)}/\sigma} - \frac{1}{\sigma} \sum_{i=1}^{k} x_{L(i)}, \qquad (2)$$
respectively. For computational convenience, let θ = 1 / σ . Then, based on the log-likelihood (2), the FI matrix for ( λ , θ ) can be defined as follows.
Proposition 1.
The FI matrix for $(\lambda, \theta)$ is of the form
$$I(\lambda, \theta) = \begin{pmatrix} k/\lambda^{2} & -k Q_1(\lambda)/(\lambda\theta) \\ -k Q_1(\lambda)/(\lambda\theta) & k Q_2(\lambda)/\theta^{2} \end{pmatrix}, \qquad (3)$$
where
$$Q_1(\lambda) = \log \lambda - \psi(k+1), \qquad Q_2(\lambda) = Q_1(\lambda)^{2} + \psi_1(k+1) + 1,$$
and $\psi(\cdot)$ and $\psi_1(\cdot)$ are the digamma and trigamma functions, respectively.
Proof. 
In the FI matrix
$$I(\lambda, \theta) = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$
the element $I_{11}$ can be computed easily, while the other elements can be expressed as
$$I_{12} = -E\!\left[\frac{\partial^{2}}{\partial\lambda\,\partial\theta} \log L(\lambda, \theta)\right] = -E\!\left[X_{L(k)}\, e^{-\theta X_{L(k)}}\right] = I_{21}$$
and
$$I_{22} = -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}} \log L(\lambda, \theta)\right] = \frac{k}{\theta^{2}} + \lambda\, E\!\left[X_{L(k)}^{2}\, e^{-\theta X_{L(k)}}\right].$$
Then, the proof is completed by using the marginal density function of $X_{L(i)}$ given in Ahsanullah [9],
$$f_{X_{L(i)}}(x) = \frac{1}{\Gamma(i)} \left[-\log F(x)\right]^{i-1} f(x),$$
and the substitution $Y = \lambda e^{-\theta X_{L(k)}}$.  □
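As a numerical aid, the log-likelihood (2) used throughout this section can be evaluated directly; a minimal Python sketch (function name ours):

```python
import math

def egd_record_loglik(records, lam, sigma):
    """Log-likelihood (2) of lower record values x_L(1), ..., x_L(k) from the EGD:
    k*log(lam) - k*log(sigma) - lam*exp(-x_L(k)/sigma) - sum(x_L(i))/sigma."""
    k = len(records)
    return (k * math.log(lam) - k * math.log(sigma)
            - lam * math.exp(-records[-1] / sigma)
            - sum(records) / sigma)
```

Maximizing this function numerically over $(\lambda, \sigma)$ yields the maximum likelihood estimates used later for comparison.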
The objective priors such as the Jeffreys and reference priors based on the FI matrix (3) are defined according to the following theorem.
Theorem 1.
The Jeffreys prior for $(\lambda, \theta)$ is
$$\pi^{J}(\lambda, \theta) \propto \frac{1}{\lambda\theta}.$$
Proof. 
According to the definition of the Jeffreys prior (Jeffreys [10]), it follows that
$$\pi^{J}(\lambda, \theta) \propto \sqrt{|I|} = \frac{k}{\lambda\theta}\sqrt{\psi_1(k+1) + 1},$$
where $|I|$ denotes the determinant of the FI matrix (3). This completes the proof.  □
In the following, the reference priors for each parameter of interest are derived from the algorithm provided in Berger and Bernardo [11].
Theorem 2.
If $\lambda$ is the parameter of interest, the reference prior for $(\lambda, \theta)$ is
$$\pi^{R1}(\lambda, \theta) \propto \frac{1}{\theta\lambda\sqrt{Q_2(\lambda)}},$$
and, if $\theta$ is the parameter of interest, the reference prior for $(\theta, \lambda)$ is
$$\pi^{R2}(\theta, \lambda) \propto \frac{1}{\lambda\theta}.$$
Proof. 
When $\lambda$ is the parameter of interest, the conditional prior distribution of $\theta$ given $\lambda$ can be defined based on the FI matrix (3) as
$$\pi(\theta \mid \lambda) \propto \sqrt{I_{22}} = \frac{1}{\theta}\sqrt{k\, Q_2(\lambda)}. \qquad (4)$$
Then, by choosing a sequence of compact sets $\Omega_i = (d_{1i}, d_{2i}) \times (d_{3i}, d_{4i})$ for $(\lambda, \theta)$ such that $d_{1i}, d_{3i} \to 0$ and $d_{2i}, d_{4i} \to \infty$ as $i \to \infty$, it follows that
$$k_{1i}^{-1}(\lambda) = \int_{d_{3i}}^{d_{4i}} \pi(\theta \mid \lambda)\, d\theta = \sqrt{k\, Q_2(\lambda)}\left[\log(d_{4i}) - \log(d_{3i})\right]$$
and
$$p_i(\theta \mid \lambda) = k_{1i}(\lambda)\, \pi(\theta \mid \lambda) = \frac{1}{\theta\left[\log(d_{4i}) - \log(d_{3i})\right]}.$$
In addition, the marginal reference prior for $\lambda$ can be defined based on the FI matrix (3) and (4) as
$$\pi_i(\lambda) = \exp\left\{\frac{1}{2} \int_{d_{3i}}^{d_{4i}} p_i(\theta \mid \lambda) \log\!\left(\frac{|I|}{I_{22}}\right) d\theta\right\} \propto \frac{1}{\lambda\sqrt{Q_2(\lambda)}},$$
which leads to the following reference prior:
$$\pi^{R1}(\lambda, \theta) = \lim_{i \to \infty} \frac{k_{1i}(\lambda)\, \pi_i(\lambda)}{k_{1i}(\lambda_0)\, \pi_i(\lambda_0)}\, \pi(\theta \mid \lambda) \propto \frac{1}{\theta\lambda\sqrt{Q_2(\lambda)}}$$
for any fixed point $\lambda_0$. When $\theta$ is the parameter of interest, the same argument applies.
Let
π ( λ | θ ) = I 11 = k λ .
Then,
k 2 i 1 ( θ ) = d 1 i d 2 i π ( λ | θ ) d λ = k log ( d 2 i ) log ( d 1 i )
and
p i ( λ | θ ) = k 2 i ( θ ) π ( λ | θ ) = 1 λ log ( d 2 i ) log ( d 1 i ) .
In addition, the marginal reference prior for θ can be expressed as
π i ( θ ) = exp 1 2 d 1 i d 2 i p i ( λ | θ ) log | I | I 11 d λ 1 θ ,
from which the reference prior can be expressed as
π R 2 ( θ , λ ) = lim i k 2 i ( θ ) π i ( θ ) k 2 i ( θ 0 ) π i ( θ 0 ) π ( λ | θ ) 1 λ θ
for any fixed point θ 0 . This completes the proof.  □
Note that, since all the derived objective priors are improper, the corresponding posterior distribution should be proved to be proper. Since the Jeffreys prior π J ( λ , θ ) and reference prior π R 2 ( θ , λ ) have the same form, the notation π J R ( λ , θ ) is used from now on.

2.2. Posterior Analysis

Let $\boldsymbol{x}_L = \{x_{L(1)}, \ldots, x_{L(k)}\}$ be the observed lower record values. Then, the objective prior $\pi^{JR}(\lambda, \theta)$ yields the following marginal posterior density functions of $\lambda$ and $\theta$:
$$\pi^{JR}(\lambda \mid \boldsymbol{x}_L) = \frac{\int_{\theta} L(\lambda, \theta)\, \pi^{JR}(\lambda, \theta)\, d\theta}{\int_{\theta}\int_{\lambda} L(\lambda, \theta)\, \pi^{JR}(\lambda, \theta)\, d\lambda\, d\theta} = \sum_{j=0}^{\infty} \frac{(-1)^{j}}{\left(\sum_{i=1}^{k} x_{L(i)} + j\, x_{L(k)}\right)^{k}} \frac{\lambda^{k+j-1}}{j!} \cdot \frac{\left(\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}\right)^{k}}{\Gamma(k)} \qquad (5)$$
and
$$\pi^{JR}(\theta \mid \boldsymbol{x}_L) = \frac{\int_{\lambda} L(\lambda, \theta)\, \pi^{JR}(\lambda, \theta)\, d\lambda}{\int_{\theta}\int_{\lambda} L(\lambda, \theta)\, \pi^{JR}(\lambda, \theta)\, d\lambda\, d\theta} = \frac{\left(\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}\right)^{k}}{\Gamma(k)}\, \theta^{k-1} e^{-\theta \left(\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}\right)}, \qquad (6)$$
respectively. Note that the marginal posterior distribution of $\theta$ is a gamma distribution with the parameters $k$ and $\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}$. Then, the Bayes estimators under the squared error loss function based on the marginal posterior density functions (5) and (6) can be expressed as
$$\hat{\lambda}_{JR} = \int_{\lambda} \lambda\, \pi^{JR}(\lambda \mid \boldsymbol{x}_L)\, d\lambda = k \left(1 - \frac{x_{L(k)}}{\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}}\right)^{-k}$$
and
$$\hat{\theta}_{JR} = \frac{k}{\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}},$$
respectively.
In terms of $\sigma$, the corresponding marginal posterior density function can be expressed as
$$\pi^{JR}(\sigma \mid \boldsymbol{x}_L) = \frac{\left(\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}\right)^{k}}{\Gamma(k)}\, \sigma^{-(k+1)} e^{-\frac{1}{\sigma}\left(\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}\right)}.$$
Since this is the PDF of an inverse gamma distribution with the parameters $k$ and $\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}$, the Bayes estimator of $\sigma$ under the squared error loss function can be expressed as
$$\hat{\sigma}_{JR} = \frac{\sum_{i=1}^{k} x_{L(i)} - k\, x_{L(k)}}{k-1}, \quad k > 1.$$
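All three point estimates above are available in closed form; a compact sketch under the prior $\pi^{JR}$ (function name ours, assuming $k > 1$ and a positive denominator $S = \sum_i x_{L(i)} - k\, x_{L(k)}$):

```python
def objective_bayes_estimates(records):
    """Closed-form Bayes estimates of (lambda, sigma, theta) under pi_JR,
    with S = sum(x_L(i)) - k * x_L(k)."""
    k = len(records)
    s = sum(records) - k * records[-1]
    theta_hat = k / s
    sigma_hat = s / (k - 1)                       # requires k > 1
    lam_hat = k * (1 - records[-1] / s) ** (-k)   # posterior mean of lambda
    return lam_hat, sigma_hat, theta_hat
```

For the toy records (3, 2, 1), for instance, $S = 3$, giving $\hat{\theta}_{JR} = 1$, $\hat{\sigma}_{JR} = 1.5$, and $\hat{\lambda}_{JR} = 3(2/3)^{-3} = 10.125$.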
Theorem 3.
From the frequentist perspective, the estimator $\hat{\sigma}_{JR}$ is an unbiased estimator of σ.
Proof. 
According to Lemma 2 of Wang and Ye [12], independent and identically distributed random variables from the uniform distribution on $(0, 1)$ are given by
$$U_j = \left(\frac{T_j}{T_{j+1}}\right)^{j} = e^{\,j\left(X_{L(j+1)} - X_{L(j)}\right)/\sigma}, \quad j = 1, \ldots, k-1,$$
where
$$T_j = 2 \sum_{i=1}^{j} \left[\log F\!\left(x_{L(i-1)}\right) - \log F\!\left(x_{L(i)}\right)\right] = 2\lambda e^{-X_{L(j)}/\sigma}, \quad j = 1, \ldots, k, \quad \log F\!\left(x_{L(0)}\right) \equiv 0,$$
are random variables having $\chi^{2}$ distributions with $2j$ $(j = 1, \ldots, k)$ degrees of freedom. Then, the estimator $\hat{\sigma}_{JR}$ has a gamma distribution with the parameters $k-1$ and $(k-1)/\sigma$ for any $k > 1$ because
$$W(\sigma) = -\sum_{j=1}^{k-1} \log U_j = \frac{\sum_{i=1}^{k} X_{L(i)} - k\, X_{L(k)}}{\sigma}$$
has a gamma distribution with the parameters $k-1$ and 1. This completes the proof.  □
The highest posterior density (HPD) credible intervals (CrIs) for λ and σ can be constructed by generating Markov chain Monte Carlo (MCMC) samples from the marginal posterior density functions (5) and (6), respectively. However, since the marginal posterior density function (5) is not a well-known probability distribution, sampling from it directly is difficult. Instead, sampling can be performed indirectly through the relationship $\pi^{JR}(\lambda, \theta \mid \boldsymbol{x}_L) = \pi^{JR}(\lambda \mid \theta, \boldsymbol{x}_L)\, \pi^{JR}(\theta \mid \boldsymbol{x}_L)$, because the conditional posterior distribution $\pi^{JR}(\lambda \mid \theta, \boldsymbol{x}_L) \propto \lambda^{k-1} e^{-\lambda e^{-\theta x_{L(k)}}}$ is a gamma distribution with the parameters $k$ and $e^{-\theta x_{L(k)}}$. To achieve this, θ is first generated from its marginal posterior distribution, λ is then generated from its conditional posterior distribution given the generated value of θ, and finally $\sigma = 1/\theta$ is computed. The $100(1-\alpha)\%$ equal-tails (ETs) and HPD CrIs can then be constructed for $0 < \alpha < 1$ using the method proposed by Chen and Shao [13].
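The two-step composition just described is easy to implement with the standard library; a sketch (function name and seed are ours):

```python
import math
import random

def posterior_samples_jr(records, n, seed=1):
    """Draw (lambda, sigma) pairs from the pi_JR posterior by composition:
    theta ~ Gamma(k, rate S) from (6), then
    lambda | theta ~ Gamma(k, rate exp(-theta * x_L(k)))."""
    rng = random.Random(seed)
    k = len(records)
    s = sum(records) - k * records[-1]
    samples = []
    for _ in range(n):
        theta = rng.gammavariate(k, 1.0 / s)                      # shape k, scale 1/S
        lam = rng.gammavariate(k, math.exp(theta * records[-1]))  # scale = 1/rate
        samples.append((lam, 1.0 / theta))                        # sigma = 1/theta
    return samples
```

ET intervals then follow by taking empirical percentiles of the sampled components, and HPD intervals by the Chen–Shao method.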
Under the prior $\pi^{R1}(\lambda, \theta)$, the resulting posterior $\pi^{R1}(\lambda, \theta \mid \boldsymbol{x}_L)$ is proper because $Q_2(\lambda) > 1$. However, since this posterior cannot be expressed in a known closed form, MCMC samples for λ and θ can be generated using the Metropolis–Hastings algorithm. For efficient mixing, the proposal variance–covariance structure is updated adaptively. The corresponding Bayes estimators under the squared error loss function are denoted by $\hat{\lambda}_{R1}$ and $\hat{\sigma}_{R1}$.

2.3. Prediction

Let $X_{L(s)}$ $(s = k+1, k+2, \ldots)$ denote future lower record values. Since the sequence $\{X_{L(i)}, i = 1, 2, \ldots\}$ is a Markov chain, the conditional density function of $X_{L(s)}$ given $\boldsymbol{X}_L = \boldsymbol{x}_L$ is the same as that of $X_{L(s)}$ given $X_{L(k)} = x_{L(k)}$. That is, it follows from Ahsanullah [9] that
$$f_{X_{L(s)} \mid \boldsymbol{x}_L}(y) = f_{X_{L(s)} \mid x_{L(k)}}(y) = \frac{\left[\log F\!\left(x_{L(k)}\right) - \log F(y)\right]^{s-k-1} f(y)}{\Gamma(s-k)\, F\!\left(x_{L(k)}\right)}, \quad y < x_{L(k)}. \qquad (7)$$
Then, for the EGD with the CDF (1), (7) becomes
$$f_{X_{L(s)} \mid \boldsymbol{x}_L}(y) = \frac{\left[\lambda\left(e^{-\theta y} - e^{-\theta x_{L(k)}}\right)\right]^{s-k-1}}{\Gamma(s-k)}\, \lambda\theta\, e^{-\theta y}\, e^{-\lambda\left(e^{-\theta y} - e^{-\theta x_{L(k)}}\right)}, \quad -\infty < y < x_{L(k)} < \infty, \qquad (8)$$
and the corresponding Bayesian predictive density function can be expressed as
$$f^{*}_{X_{L(s)} \mid \boldsymbol{x}_L}(y) = \int_{\theta}\int_{\lambda} f_{X_{L(s)} \mid \boldsymbol{x}_L}(y; \lambda, \theta)\, \pi(\lambda, \theta \mid \boldsymbol{x}_L)\, d\lambda\, d\theta, \quad -\infty < y < x_{L(k)} < \infty,$$
where π ( λ , θ | x L ) is a general joint posterior distribution for ( λ , θ ) .
Note that evaluating this predictive density requires very complex and tedious computations, and there is no guarantee that it can be expressed in a closed form. A much simpler approach is to use a pivotal quantity obtained by transforming a random variable.
Let $H = \lambda\left(e^{-\theta y} - e^{-\theta x_{L(k)}}\right)$ in the conditional density function (8). Then, $H$ has a gamma distribution with the parameters $s-k$ and 1, with the PDF
$$f_H(h) = \frac{1}{\Gamma(s-k)}\, h^{s-k-1} e^{-h}, \quad h > 0,$$
because $y < x_{L(k)}$ maps onto $h > 0$ and the Jacobian of the transformation is
$$|J| = \left|\frac{\partial y}{\partial h}\right| = \left[\lambda\theta\left(\frac{h}{\lambda} + e^{-\theta x_{L(k)}}\right)\right]^{-1},$$
which leads to the following algorithm for generating the MCMC samples y i ( i = 1 , , N ) .
Step 1a.
Generate $h_i$ from the gamma distribution with the parameters $s - k$ and 1.
Step 1b.
Generate $\lambda_i$ and $\theta_i$ from the joint posterior distribution $\pi(\lambda, \theta \mid \boldsymbol{x}_L)$.
Step 2.
Compute
$$y_i = -\frac{1}{\theta_i} \log\!\left(\frac{h_i}{\lambda_i} + e^{-\theta_i x_{L(k)}}\right).$$
Step 3.
Repeat steps 1 and 2, N times.
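Under $\pi^{JR}$, Step 1b reduces to the gamma composition from Section 2.2, so the whole algorithm is only a few lines; a sketch (function name and seed are ours):

```python
import math
import random

def predict_future_record(records, s, n=5000, seed=7):
    """MCMC samples of the future lower record X_L(s) given x_L(k) under pi_JR,
    following Steps 1-3: H ~ Gamma(s - k, 1), then (lambda, theta) from the posterior."""
    rng = random.Random(seed)
    k = len(records)
    S = sum(records) - k * records[-1]
    draws = []
    for _ in range(n):
        h = rng.gammavariate(s - k, 1.0)                            # Step 1a
        theta = rng.gammavariate(k, 1.0 / S)                        # Step 1b
        lam = rng.gammavariate(k, math.exp(theta * records[-1]))
        y = -(1.0 / theta) * math.log(h / lam + math.exp(-theta * records[-1]))  # Step 2
        draws.append(y)
    return draws
```

Sorting the draws and taking the $\alpha/2$ and $1-\alpha/2$ empirical quantiles gives the ET predictive interval; by construction, every draw lies below the last observed record $x_{L(k)}$.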
Then, the corresponding $100(1-\alpha)\%$ predictive interval (PI) for $0 < \alpha < 1$ can be constructed using the method proposed in [13], as in the case of λ and θ. For clarity, $X_{L(s)}^{JR} \mid x_{L(k)}$ and $X_{L(s)}^{R1} \mid x_{L(k)}$ denote future lower record values under the priors $\pi^{JR}(\lambda, \theta)$ and $\pi^{R1}(\lambda, \theta)$, respectively.

3. Time Series Approach

Provided that record values are observed from a time series of uncorrelated random variables sampled from a continuous probability distribution, the proposed Bayesian method presented in the previous section can be compared to the ARIMA and DLM time-series techniques described below.
The ARIMA model is the most widely used approach to time-series forecasting. Conventionally, it is defined using three components (p, d, q), where
  • p denotes the order of the autoregressive (AR) term
  • d denotes the number of differencing operations required to make the time series stationary
  • q denotes the order of the moving average (MA) term.
Here, the autoregressive (AR) process assumes that the current value of the series $Y_t$ can be expressed as a function of the $p$ past values $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$ in the form
$$Y_t = \beta_0 + \sum_{j=1}^{p} \phi_j Y_{t-j} + \varepsilon_t$$
for $t \geq 1$, where $\beta_0 = \left(1 - \sum_{i=1}^{p} \phi_i\right)\mu$, μ is the mean of the process, $\phi_1, \phi_2, \ldots, \phi_p$ are constants $(\phi_p \neq 0)$, and $\varepsilon_t$ is a weak white noise series with a mean of zero and a variance of $\sigma_\varepsilon^2$. The MA process uses past forecast errors,
$$Y_t = \mu + \sum_{j=1}^{q} \varphi_j \varepsilon_{t-j} + \varepsilon_t,$$
that is, a weighted average of the past values of the white noise process $\varepsilon_t$. Then, the time series $Y_t$ is an ARIMA$(p, d, q)$ process if $\Delta^d Y_t$ is ARMA$(p, q)$, obtained by combining the AR and MA terms, where $\Delta^d$ is the $d$th-order differencing operator. For non-stationary data, one usually fits an ARMA model after differencing the data until stationarity is achieved.
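In practice the full ARIMA model would be fit with a standard package (e.g., statsmodels in Python or the arima routines in R); the differencing step itself, which turns a polynomial trend into a stationary series, is simple enough to sketch directly (function name ours):

```python
def difference(series, d=1):
    """Apply the d-th order differencing operator Delta^d used in ARIMA(p, d, q)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out
```

A quadratic trend such as 1, 4, 9, 16, 25 becomes linear after one difference (3, 5, 7, 9) and constant after two (2, 2, 2), which illustrates why a model with $d = 2$ suits a smoothly decreasing series.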
The second time-series approach considered in this study is the DLM. Let $Y_t$ be an $m$-dimensional vector observed at time $t$, and let $\delta_t$ be a generally unobservable $p$-dimensional state vector of the system at time $t$. Then, the DLM can be defined as
$$Y_t = F_t \delta_t + v_t, \quad v_t \sim N_m(0, V_t),$$
$$\delta_t = G_t \delta_{t-1} + w_t, \quad w_t \sim N_p(0, W_t)$$
for each time $t \geq 1$, together with a prior distribution $\delta_0 \sim N_p(m_0, C_0)$ for the state vector at time $t = 0$, where $F_t$ and $G_t$ are known matrices of dimensions $m \times p$ and $p \times p$, respectively, and $V_t$ and $W_t$ are variance matrices. Furthermore, the error sequences $v_t$ and $w_t$ are assumed to be independent of each other and of $\delta_0$.
Note that the lower record values from a univariate time series exhibit a strong decreasing trend over time. Therefore, a DLM with
$$V_t = V, \quad F_t = F = (1 \;\; 0), \quad \delta_t = \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix}, \quad G_t = G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad W_t = W = \begin{pmatrix} \sigma_\mu^2 & 0 \\ 0 & \sigma_\beta^2 \end{pmatrix}$$
is considered, namely, the linear growth model (LGM)
$$Y_t = \mu_t + v_t, \quad v_t \sim N(0, V),$$
$$\mu_t = \mu_{t-1} + \beta_{t-1} + w_{t,1}, \quad w_{t,1} \sim N(0, \sigma_\mu^2),$$
$$\beta_t = \beta_{t-1} + w_{t,2}, \quad w_{t,2} \sim N(0, \sigma_\beta^2), \qquad (9)$$
where $\mu_t$ and $\beta_t$ denote the local level and local growth rate at time $t$, respectively, and $v_t$, $w_{t,1}$, and $w_{t,2}$ are uncorrelated errors.
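Filtering in the LGM is a special case of the Kalman filter with the $F$, $G$, and $W$ given above; a self-contained sketch for scalar observations (our own implementation, not the dlm package):

```python
def lgm_filter(ys, V, s2_mu, s2_beta, m0=(0.0, 0.0), C0=((1e6, 0.0), (0.0, 1e6))):
    """Kalman filter for the linear growth model (9) with state (mu_t, beta_t),
    F = (1, 0) and G = [[1, 1], [0, 1]]. Returns the filtered levels mu_t."""
    m, C = list(m0), [list(row) for row in C0]
    levels = []
    for y in ys:
        # prediction step: a = G m, R = G C G' + W
        a = [m[0] + m[1], m[1]]
        R = [[C[0][0] + C[0][1] + C[1][0] + C[1][1] + s2_mu, C[0][1] + C[1][1]],
             [C[1][0] + C[1][1], C[1][1] + s2_beta]]
        # update step: forecast f = a[0], forecast variance Q = R[0][0] + V
        Q = R[0][0] + V
        K = [R[0][0] / Q, R[1][0] / Q]           # Kalman gain
        e = y - a[0]                             # one-step forecast error
        m = [a[0] + K[0] * e, a[1] + K[1] * e]
        C = [[R[0][0] - K[0] * R[0][0], R[0][1] - K[0] * R[0][1]],
             [R[1][0] - K[1] * R[0][0], R[1][1] - K[1] * R[0][1]]]
        levels.append(m[0])
    return levels
```

The diffuse default prior `C0` mirrors the usual practice of letting the data dominate the initial state; with a small observation variance the filtered level tracks the observations closely.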

4. Sulfur Dioxide Data

This section demonstrates the superiority of the proposed Bayesian method by comparing it to the ARIMA and DLM methods.
The three methods are applied to time-series data representing sulfur dioxide emissions in the United States (U.S.) from 1970 to 2017 (in 1000 tons), as measured by the U.S. Environmental Protection Agency. Owing to the implementation of the Acid Rain Program created under Title IV of the 1990 Clean Air Act, sulfur dioxide emissions have decreased significantly over the last decades through a cap-and-trade program for fossil-fuel power plants. The observed volume of sulfur dioxide emissions and its descriptive statistics are presented in Figure 1 and Table 1, respectively. Note that each data point was divided by 1000 for computational convenience; because the data decreased continuously over the observation period, as shown in Figure 1, no data were lost while acquiring the lower record values, and all observations were used.
To conduct a goodness-of-fit test for the observed sulfur dioxide data, replicated data are first considered. If the estimated model is adequate, the replicated data should look similar to the observed data. The replicated data are generated from the Bayesian predictive density function
$$f^{*}_{X_{L(i)}}(x) = \int_{\theta}\int_{\lambda} f_{X_{L(i)}}(x; \lambda, \theta)\, \pi(\lambda, \theta \mid \boldsymbol{x}_L)\, d\lambda\, d\theta$$
and denoted by $X_{L(i)}^{rep}$ for $i = 1, \ldots, k$. Under each prior distribution, the correlation coefficient $r$ between the mean $E\!\left[X_{L(i)}^{rep}\right]$ and the observed lower record values can then be computed. For further examination, the weighted mean squared error (WMSE),
$$\mathrm{WMSE} = \frac{1}{k} \sum_{i=1}^{k} \frac{\left(x_{L(i)} - E\!\left[X_{L(i)} \mid \lambda, \sigma\right]\right)^{2}}{\mathrm{Var}\!\left(X_{L(i)} \mid \lambda, \sigma\right)},$$
is also computed. These results are reported in Figure 2.
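Since $-\log F(X_{L(i)}) = \lambda e^{-X_{L(i)}/\sigma}$ has a gamma distribution with the parameters $i$ and 1, the conditional moments in the WMSE can be approximated by Monte Carlo; a sketch (our own code, with an arbitrary seed and sample size):

```python
import math
import random

def wmse(records, lam, sigma, n=20000, seed=3):
    """Monte Carlo WMSE using the representation
    X_L(i) = sigma * (log(lam) - log(Y)), with Y ~ Gamma(i, 1)."""
    rng = random.Random(seed)
    k = len(records)
    total = 0.0
    for i, x in enumerate(records, start=1):
        draws = [sigma * (math.log(lam) - math.log(rng.gammavariate(i, 1.0)))
                 for _ in range(n)]
        mean = sum(draws) / n
        var = sum((d - mean) ** 2 for d in draws) / (n - 1)
        total += (x - mean) ** 2 / var
    return total / k
```

Averaging this quantity over posterior draws of $(\lambda, \sigma)$ gives the model-checking statistic reported alongside $r$.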
Figure 2 shows that the estimated models fit the observed sulfur dioxide lower record values very well, and that the estimated models under the priors $\pi^{JR}(\lambda, \theta)$ and $\pi^{R1}(\lambda, \theta)$ provide almost the same results for the considered statistical criteria.
Table 2 reports the estimation results under the derived priors, together with the corresponding maximum likelihood counterparts for comparison. For the maximum likelihood procedure, the maximum likelihood estimators (MLEs) $\hat{\lambda}$ and $\hat{\sigma}$ are obtained by maximizing the log-likelihood function (2), while the approximate $100(1-\alpha)\%$ confidence intervals (CIs) are calculated based on the MLEs as
$$\hat{\lambda} \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}\!\left(\hat{\lambda}\right)} \quad \text{and} \quad \hat{\sigma} \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}\!\left(\hat{\theta}\right) / \hat{\theta}^{4}},$$
where $z_{\alpha}$ denotes the upper α percentile of the standard normal distribution, and $\widehat{\mathrm{Var}}(\hat{\lambda})$ and $\widehat{\mathrm{Var}}(\hat{\theta})$ are the diagonal elements of the asymptotic variance–covariance matrix of the MLEs, obtained by inverting the Fisher information matrix (3); the interval for σ follows from the delta method with $\sigma = 1/\theta$. For the shape parameter λ, the Bayes estimate $\hat{\lambda}_{R1}$ is slightly lower than the other estimates $\hat{\lambda}$ and $\hat{\lambda}_{JR}$, which are almost identical. However, the 95% HPD CrI under the prior $\pi^{R1}(\lambda, \theta)$ has the shortest length. For the scale parameter σ, all estimates vary slightly, while the approximate 95% CI based on $\hat{\sigma}$ is shorter than the other CrIs.
For prediction, the last lower record value is assumed to be known. As mentioned earlier, since no data are lost while acquiring the observed lower record values, the time-series analysis outlined in Section 3 can be conducted on the same data. Table 3 reports the prediction results for the next lower record value $X_{L(20)} \mid x_{L(19)}$. The R package dlm (Petris [14]) was used to estimate the parameters and state vector in the LGM (9), with
$$\delta_0 \sim N_2\!\left(\begin{pmatrix} 33.361 \\ -2.471 \end{pmatrix}, \begin{pmatrix} 0.593 & 0 \\ 0 & 0.190 \end{pmatrix}\right).$$
In addition, for the ARIMA model, ARIMA(0, 2, 1) was chosen as the best model in terms of the corrected Akaike information criterion (AICc): it has the smallest AICc value, with $q = 1$ and $\varphi = 0.705$ after differencing the data twice. The forecast accuracy is evaluated in terms of the mean absolute deviation (MAD), the mean squared error (MSE), and the mean absolute percentage error (MAPE), defined respectively as
$$\mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} |e_t|, \quad \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^{2}, \quad \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|e_t|}{x_t},$$
where $e_t = x_t - \hat{x}_t$ and $\hat{x}_t$ is a point forecast of $x_t$. In this example, $n = k$ and $x_t = x_{L(t)}$. These results are reported in Table 4, which indicates that there is little difference in the predictive performance of the two time-series models. Table 3 shows that the proposed Bayesian PIs are shorter than those obtained for the ARIMA model and DLM, especially under the prior $\pi^{JR}(\lambda, \theta)$. That is, the predictive result under the prior $\pi^{JR}(\lambda, \theta)$ shows the best performance in terms of uncertainty. The Bayesian predictive density functions for the three future record values are estimated as kernel density functions based on their MCMC samples. The results are plotted in Figure 3, which shows that the estimated Bayesian predictive densities under both priors $\pi^{JR}(\lambda, \theta)$ and $\pi^{R1}(\lambda, \theta)$ have greater variance as the future record time $L(s)$ increases.
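The three accuracy measures are simple averages of the one-step errors; a sketch (helper function ours):

```python
def forecast_accuracy(actual, predicted):
    """MAD, MSE, and MAPE for point forecasts, with e_t = x_t - xhat_t."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n
    return mad, mse, mape
```

Note that MAPE as defined here assumes strictly positive actual values, which holds for the sulfur dioxide records.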

5. Conclusions

This paper defined the Jeffreys and reference priors for unknown parameters of the EGD based on record values and proposed a Bayesian method for predicting future record values. The method makes it very easy to generate MCMC samples for prediction. To validate the proposed method, it was compared to two time-series approaches, namely, the ARIMA model and DLM, using a sulfur dioxide emissions dataset. The results of the comparison demonstrated that the proposed method outperforms the time-series approaches in terms of uncertainty.
While there was no clear difference among the proposed objective priors in the goodness-of-fit results for the observed data, forecasting under the prior $\pi^{JR}(\lambda, \theta)$ performed better than under the prior $\pi^{R1}(\lambda, \theta)$; both derived priors were shown to be valid.

Author Contributions

Y.K. and J.I.S. came up with the idea and developed the method presented in this paper. Both authors performed data analysis and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. 2019R1I1A3A01062838).

Acknowledgments

We are grateful to the Editor-in-Chief, the Associate Editor, and the anonymous referees for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chandler, K.N. The distribution and frequency of record values. J. R. Stat. Soc. Ser. B 1952, 14, 220–228. [Google Scholar] [CrossRef]
  2. Coles, S.G.; Tawn, J.A. A Bayesian analysis of extreme rainfall data. J. R. Stat. Soc. Ser. C 1996, 45, 463–478. [Google Scholar] [CrossRef]
  3. Madi, M.T.; Raqab, M.Z. Bayesian prediction of temperature records using the Pareto model. Environmetrics 2004, 15, 701–710. [Google Scholar] [CrossRef]
  4. Wergen, G.; Hense, A.; Krug, J. Record occurrence and record values in daily and monthly temperatures. Clim. Dyn. 2013, 42, 1275–1289. [Google Scholar] [CrossRef] [Green Version]
  5. Seo, J.I.; Kim, Y. Statistical inference on Gumbel distribution using record values. J. Korean Stat. Soc. 2016, 45, 342–357. [Google Scholar] [CrossRef]
  6. Seo, J.I.; Kang, S.B. More efficient approaches to the exponentiated half-logistic distribution based on record values. SpringerPlus 2016, 5, 1433–1451. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 3rd ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 1994. [Google Scholar]
  8. West, M.; Harrison, J. Bayesian Forecasting and Dynamic Models, 2nd ed.; Springer: New York, NY, USA, 1997. [Google Scholar]
  9. Ahsanullah, M. Record Statistics; Nova Science Publishers, Inc.: New York, NY, USA, 1995. [Google Scholar]
  10. Jeffreys, H. Theory of Probability, 3rd ed.; Clarendon Press: Oxford, UK, 1961. [Google Scholar]
  11. Berger, J.O.; Bernardo, J.M. Estimating a product of means: Bayesian analysis with reference priors. J. Am. Stat. Assoc. 1989, 84, 200–207. [Google Scholar] [CrossRef]
  12. Wang, B.X.; Ye, Z.S. Inference on the Weibull distribution based on record values. Comput. Stat. Data Anal. 2015, 83, 26–36. [Google Scholar] [CrossRef]
  13. Chen, M.H.; Shao, Q.M. Monte Carlo estimation of Bayesian credible and HPD intervals. J. Comput. Graph. Stat. 1999, 8, 69–92. [Google Scholar]
  14. Petris, G. dlm: Bayesian and Likelihood Analysis of Dynamic Linear Models. R Package Version 1.1-1. 2010. Available online: http://CRAN.R-project.org/package=dlm (accessed on 13 June 2020).
Figure 1. Observed sulfur dioxide lower record values.
Figure 2. 95 % regions for X L ( i ) r e p and scatter plots between E X L ( i ) r e p and the observed lower record values.
Figure 3. Estimated kernel density for X L ( s ) x L ( 19 ) under the priors π J R ( λ , θ ) and π R 1 ( λ , θ ) .
Table 1. Descriptive statistics for the observed sulfur dioxide lower record values.
| Minimum | Maximum | Mean | Median | Standard Deviation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| 2815 | 31,218 | 13.181 | 11.011 | 9.065 | 0.544 | −1.147 |
Table 2. Results for the observed sulfur dioxide lower record values.
|  | λ̂ | λ̂_JR | λ̂_R1 | σ̂ | σ̂_JR | σ̂_R1 |
|---|---|---|---|---|---|---|
| Estimate | 26.240 | 26.289 | 24.409 | 10.366 | 10.911 | 11.611 |
| 95% ETs | (14.411, 38.070) | (15.785, 39.653) | (14.874, 36.755) | (5.930, 14.801) | (6.991, 16.989) | (7.450, 18.242) |
| 95% HPD | - | (15.082, 38.560) | (13.784, 35.388) | - | (6.493, 15.992) | (6.859, 17.039) |
Table 3. Prediction results for the next lower record value.
|  | Mean | Median | 95% PI ETs | 95% PI HPD | 80% PI ETs | 80% PI HPD |
|---|---|---|---|---|---|---|
| X_L(20)^JR ∣ x_L(19) | 2.359 | 2.560 | (0.634, 2.945) | (1.115, 2.960) | (1.566, 2.899) | (2.009, 2.960) |
| X_L(20)^R1 ∣ x_L(19) | 2.275 | 2.507 | (0.339, 2.945) | (0.905, 2.960) | (1.384, 2.894) | (1.870, 2.960) |
| X_L(20)^ARIMA | 2.139 | - | (0.082, 4.195) | - | (0.794, 3.483) | - |
| X_L(20)^DLM | 2.357 | - | (0.535, 4.179) | - | (1.165, 3.548) | - |
Table 4. Forecast accuracy for the ARIMA model and DLM.
|  | MAD | MSE | MAPE |
|---|---|---|---|
| ARIMA(0,2,1) | 0.649 | 0.927 | 0.062 |
| DLM | 0.664 | 0.817 | 0.062 |

Kim, Y.; Seo, J.I. Objective Bayesian Prediction of Future Record Statistics Based on the Exponentiated Gumbel Distribution: Comparison with Time-Series Prediction. Symmetry 2020, 12, 1443. https://doi.org/10.3390/sym12091443
