Abstract
In recent years, the area of Arctic sea ice has been decreasing rapidly due to global warming, and because Arctic sea ice has a great impact on climate change, worldwide interest in it is growing. In fact, the area of sea ice reached a record low in September 2012, the lowest since satellite observations began in late 1979. In addition, in early 2018, the glacier on the northern coast of Greenland began to collapse. When record values of sea ice area are of interest, modeling the relationships among these values and predicting future record values are important problems, because record values, which are observations larger or smaller than all preceding observations, are closely related to each other. The relationship between the record values, called a dependence structure, can be modeled based on the pivotal quantity and canonical and drawable vine copulas. In addition, future record values can be predicted concisely based on the pivotal quantity. This article proposes an approach to model the dependence structure between record values based on the canonical and drawable vines. To do this, the unknown parameters of a probability distribution must be estimated first, and a pivotal-based method is provided for that purpose. In the pivotal-based estimation, a new algorithm to deal with a nuisance parameter is proposed. This method not only reduces computational complexity when constructing exact confidence intervals of functions of the unknown parameters, but is also very useful for obtaining the replicated data needed to model the dependence structure based on the canonical and drawable vines. In addition, prediction methods for future record values are proposed based on the pivotal quantity and compared with a time series forecasting method in a real data analysis.
The validity of the proposed methods was examined through Monte Carlo simulations and analysis for Arctic sea ice data.
1. Introduction
Extreme weather and air pollution have received steadily increasing attention over the past decade. Examples include extreme temperatures, the exceedances of flood peaks, and pollutant concentrations deviating considerably from expected average levels. In such cases, predicting observations more extreme than the current extreme values is an important issue. The topic of record values was introduced by Chandler [1], and Balakrishnan et al. [2] established some recurrence relationships for single and double moments of lower record values from the Gumbel distribution. Coles and Tawn [3] analyzed a daily rainfall series for modeling the extremes of a rainfall process in the context of the record values. Wang et al. [4] provided approaches to constructing exact confidence intervals (CIs) for unknown parameters in the family of proportional reversed hazard distributions based on lower record values. Seo and Kim [5] provided classical and Bayesian approaches to inference for the Gumbel distribution based on lower record values. Seo and Kim [6] presented an objective Bayesian analysis method for the two-parameter Rayleigh distribution based on record values. Seo and Kim [7] proposed an entropy inference method based on an objective Bayesian approach for when observed record values have a two-parameter logistic distribution.
There are two types of record values. If an observation is greater than all the preceding observations, it is called an upper record. On the other hand, if an observation is smaller than all the preceding observations, it is called a lower record. There are a few situations where lower record values are of special interest. For example, Arctic sea ice greatly affects climate change, and the reduction of Arctic sea ice is a very serious issue. In this case, only a sea ice extent smaller than all previous ones is of interest and recorded, which poses the problem of predicting the next record sea ice extent. The lower record value is described as follows. Let $\{X_1, X_2, \ldots\}$ be a sequence of independent and identically distributed (iid) random variables with a probability density function (PDF) $f(x)$ and a cumulative distribution function (CDF) $F(x)$. Then, $X_i$ is a lower record value if $X_i < X_j$ for every $j < i$. The indexes for which lower record values occur are given by the record times $\{L(k), k \ge 1\}$, where $L(k) = \min\{j \mid j > L(k-1), X_j < X_{L(k-1)}\}$, $k > 1$, with $L(1) = 1$. Therefore, a sequence of lower record values is denoted by $\{X_{L(k)}, k = 1, 2, \ldots\}$ from the original sequence $\{X_1, X_2, \ldots\}$. These record values are strongly related to each other. Imani and Braga-Neto [8] proposed an efficient finite-horizon feedback controller similar to an optimal linear quadratic Gaussian estimator for partially-observed Boolean dynamical systems as a general class of nonlinear state-space model. Imani et al. [9] proposed an optimal Bayesian filter approach to the problem of recursive estimation in partially-observed Boolean dynamical systems. To establish the relationship between the record values, a copula approach based on the canonical (C) and drawable (D) vines is proposed in this article.
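The definition of lower record values and record times can be illustrated with a minimal Python sketch. The function name `lower_records` and the example numbers are hypothetical and are not taken from the sea ice data analyzed later.

```python
def lower_records(xs):
    """Return (record_times, record_values) of the lower records in xs.

    Record times are 1-indexed, so the first observation is always a
    record (L(1) = 1), following the usual convention.
    """
    times, values = [], []
    for j, x in enumerate(xs, start=1):
        # A new lower record must be strictly smaller than the current record.
        if not values or x < values[-1]:
            times.append(j)
            values.append(x)
    return times, values


# Made-up sequence, for illustration only.
extents = [7.2, 7.5, 6.9, 7.0, 6.1, 6.4, 6.1, 5.8]
print(lower_records(extents))
```

Only the observations that set a new minimum are kept, which is why a long original sequence typically yields only a handful of record values.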
Copulas have recently received much attention as a modeling tool for describing the dependence structure of multivariate data. The notion of a copula function can be found in Sklar [10]. One advantage of copula models is that they can build a variety of dependence structures on top of existing parametric or non-parametric models of the marginal distributions. The copulas can be described as follows. Let $F$ be the $d$-dimensional distribution function of the random vector $\mathbf{X} = (X_1, \ldots, X_d)$ with marginal distributions $F_1, \ldots, F_d$. Then there exists a copula $C$ such that for all $\mathbf{x} = (x_1, \ldots, x_d)$,
$$F(x_1, \ldots, x_d) = C\left(F_1(x_1), \ldots, F_d(x_d)\right),$$
where the copula $C$ is unique if $F_1, \ldots, F_d$ are continuous by Sklar’s theorem (Sklar [10], 1959). In this case, the copula $C$ can be interpreted as the distribution function of a $d$-dimensional random variable on $[0,1]^d$ with uniform marginal distributions. Tsung et al. [11] conducted a comprehensive literature review of statistical transfer learning methods focusing on statistical models and statistical methodologies, including a Gaussian copula. Rocher et al. [12] proposed a generative copula-based method that can accurately estimate the likelihood that a particular person will be correctly re-identified, even in a very incomplete dataset. Vine copulas were introduced by Joe [13] to overcome limitations of standard multivariate copulas in higher dimensions, where standard multivariate copulas lack the flexibility to model the dependence accurately.
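As a small illustration of Sklar's theorem, the sketch below glues two margins together with a copula and checks that the margins are recovered. The Clayton copula and the exponential margins are arbitrary choices made for this example, and none of the function names come from the article.

```python
import math


def clayton(u, v, theta=2.0):
    """Clayton copula C(u, v); chosen only as an example of a bivariate copula."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def exp_cdf(x, rate):
    """CDF of an exponential distribution (illustrative margin)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def joint_cdf(x1, x2):
    """Joint CDF built as F(x1, x2) = C(F1(x1), F2(x2))."""
    return clayton(exp_cdf(x1, 1.0), exp_cdf(x2, 0.5))


# Letting one argument go to infinity recovers the other margin.
print(joint_cdf(1.0, 1e9), exp_cdf(1.0, 1.0))
```

The same construction works with any continuous margins, which is the flexibility the text attributes to copula models.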
Aas et al. [14] described statistical inference techniques for the C- and D-vine copulas. Berg and Aas [15] and Fischer et al. [16] demonstrated the advantages of the D-vine copula approach over alternative copulas in constructing higher-dimensional dependence structures.
To the best of our knowledge, modeling the dependence structure between record values numerically has been little explored. In this paper, we propose an approach with which to model the relationship between the record values based on C- and D-vine copulas and to predict future record values.
The remainder of the article is organized as follows. Section 2 introduces C- and D-vine copulas and provides pivotal-based approaches to estimate the model parameters and to predict future record values by proposing a new algorithm to deal with a nuisance parameter. Section 3 presents simulation studies to validate the proposed approaches. Section 4 applies the methods to Arctic sea ice data. Section 5 gives concluding remarks and some discussion.
2. Methods
Let $f(x;\theta)$ and $F(x;\theta)$ be the marginal density and distribution functions of X, respectively, where $\theta$ is an unknown parameter. Figure 1 gives a schematic diagram of our method.
Figure 1.
Schematic diagram.
The C- and D-vine copulas are first described in the following subsection.
2.1. C- and D-Vine Copulas
A vine is a flexible graphical model that decomposes a multivariate probability distribution into bivariate copulas, where each pair-copula can be chosen independently from the others [14]. This article uses C- and D-vine copulas to model the relationship between record values.
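The C-vine decomposition of a $d$-dimensional density $f(x_1, \ldots, x_d)$ with marginal densities $f_k$ is given by
$$f(x_1, \ldots, x_d) = \prod_{k=1}^{d} f_k(x_k) \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{j,j+i|1,\ldots,j-1}\left(F(x_j|x_1,\ldots,x_{j-1}),\, F(x_{j+i}|x_1,\ldots,x_{j-1})\right).$$
Then, we can specify the pairs of the d-dimensional C-vine copula model in the following order:
$$(1,2),\ (1,3),\ \ldots,\ (1,d),\ (2,3|1),\ \ldots,\ (2,d|1),\ \ldots,\ (d-1,d|1,\ldots,d-2),$$
which has vectors of length $d(d-1)/2$, where d is the number of variables.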
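The D-vine decomposition is given by
$$f(x_1, \ldots, x_d) = \prod_{k=1}^{d} f_k(x_k) \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{i,i+j|i+1,\ldots,i+j-1}\left(F(x_i|x_{i+1},\ldots,x_{i+j-1}),\, F(x_{i+j}|x_{i+1},\ldots,x_{i+j-1})\right).$$
Similarly, the pairs of the d-dimensional D-vine copula model are specified in the following order:
$$(1,2),\ (2,3),\ \ldots,\ (d-1,d),\ (1,3|2),\ (2,4|3),\ \ldots,\ (1,d|2,\ldots,d-1).$$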
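The pair orderings conventionally used for C- and D-vines (tree by tree, as in pair-copula constructions such as Aas et al. [14]) can be enumerated with a short sketch. The function names and the tuple layout `(first, second, conditioning_set)` are choices made here for illustration.

```python
def cvine_pairs(d):
    """Pairs (a, b, conditioning set) of a d-dimensional C-vine, tree by tree."""
    pairs = []
    for j in range(1, d):               # tree j has root variable j
        cond = tuple(range(1, j))       # conditioning set 1, ..., j-1
        for i in range(j + 1, d + 1):
            pairs.append((j, i, cond))
    return pairs


def dvine_pairs(d):
    """Pairs (a, b, conditioning set) of a d-dimensional D-vine, tree by tree."""
    pairs = []
    for j in range(1, d):               # in tree j the paired indices are j apart
        for i in range(1, d - j + 1):
            cond = tuple(range(i + 1, i + j))   # variables strictly between
            pairs.append((i, i + j, cond))
    return pairs


for a, b, cond in cvine_pairs(4):
    print("C-vine:", a, b, "|", cond)
for a, b, cond in dvine_pairs(4):
    print("D-vine:", a, b, "|", cond)
```

Both functions return d(d-1)/2 pairs, so a vine on five record values, for instance, requires ten pair-copulas.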
To measure the dependence of each pair-copula, we consider tree 1, which can be employed to obtain Kendall’s $\tau$ (Nelsen [17], 2006) given by
$$\tau = 4\int_0^1\!\!\int_0^1 C(u,v)\, dC(u,v) - 1,$$
where $C(u,v)$ is a bivariate copula function for $(u,v) \in [0,1]^2$.
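A quick way to see what Kendall's τ measures is to simulate from a copula with a known τ and compare it with the sample version. The sketch below uses a Clayton copula with θ = 2, for which τ = θ/(θ + 2) = 0.5. The Clayton family and all function names are assumptions of this example, not choices made in the article.

```python
import random


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by inverting the conditional CDF."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()   # uniform on (0, 1]
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        out.append((u, v))
    return out


def kendall_tau(pairs):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    s = 0
    for a in range(n):
        ua, va = pairs[a]
        for b in range(a + 1, n):
            ub, vb = pairs[b]
            prod = (ua - ub) * (va - vb)
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


rng = random.Random(0)
print(kendall_tau(sample_clayton(500, 2.0, rng)))  # should be near 0.5
```

The quadratic-time estimator is adequate for the handful of record values considered here; larger samples would call for an O(n log n) implementation.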
In a C- and D-vine, consider the exponentiated Gumbel distribution (EGD) with the CDF
where and are the scale and shape parameters. Then, is the value of the marginal distribution of with
where (Ahsanullah [18] and Arnold et al. [19]). That is, for pairs of data points , the corresponding couples can be computed from the marginal distribution (2). In addition, the marginal density function of is given by
Note that it is necessary to estimate and for computing the values of and .
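For reference, the marginal CDF of the i-th lower record value from any continuous CDF F has the standard form F(x) Σ_{j<i} [−log F(x)]^j / j! (see Ahsanullah [18] and Arnold et al. [19]). The sketch below evaluates it for a pluggable base CDF. The Gumbel-type CDF exp(−λ e^{−x/σ}) used as the default is only an assumed stand-in for illustration and should not be read as the article's CDF (1).

```python
import math


def gumbel_type_cdf(x, sigma=1.0, lam=1.0):
    """ASSUMED illustrative base CDF: F(x) = exp(-lam * exp(-x / sigma))."""
    t = -x / sigma
    if t > 700.0:          # exp would overflow; F(x) is numerically zero here
        return 0.0
    return math.exp(-lam * math.exp(t))


def record_cdf(x, i, base_cdf=gumbel_type_cdf):
    """P(X_{L(i)} <= x) = F(x) * sum_{j=0}^{i-1} (-log F(x))**j / j!."""
    p = base_cdf(x)
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    h = -math.log(p)       # cumulative reversed hazard at x
    return p * sum(h ** j / math.factorial(j) for j in range(i))


# Probability integral transform of made-up record values.
records = [2.0, 1.5, 1.1, 0.6, 0.2]
print([round(record_cdf(x, i), 4) for i, x in enumerate(records, start=1)])
```

Applying `record_cdf` to each record with its own index gives the values on the unit interval that a pair-copula fit would take as input.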
2.2. Pivotal-Based Approach
Here we present a pivotal-based method to estimate the parameters of the CDF (1). First, a lemma is introduced to deal with nuisance parameters in order to establish the relationship between record values.
Lemma 1.
Let be the lower record values from the CDF (1). Then,
- (a) has a distribution with degrees of freedom;
- (b) has an F distribution with and 2 degrees of freedom;
- (c) has a distribution with degrees of freedom.
Proof.
Let be the lower record values from the CDF (1). Then, we have
that have a standard exponential distribution, and that leads to the following spacings
which is independent and identically distributed as the standard exponential distribution with mean 1. From the spacings, the pivotal quantities
are derived, which are independent random variables such that there is an distribution with . From (3), the pivotal quantity (a) is easily proved. In addition, let . Then, the pivotal quantity (b) is proved as
because and are independent random variables, given the fact that are independent and identically distributed random variables, as mentioned earlier; both have a distribution with and 2 degrees of freedom, respectively. Furthermore, we can derive the following pivotal quantities by using Lemma 2 of Wang et al. [4] in (3)
which are independent and identically distributed as the uniform distribution on the interval . Then, the pivotal quantity (c) is proved as
☐
From Lemma 1(c), the unique solution is given by
where follows a distribution with degrees of freedom. Then, for any , an exact CI for based on is given by
where is the th smallest of . Note that the exact CI is the equal-tail CI because it splits the probability equally, putting in each tail of the distribution. For any , an exact CI with the shortest-length based on is given by
where is chosen so that
Similarly, the unique solution from in Lemma 1 is given by
where follows an F distribution with and 2 degrees of freedom. Then, with the same argument, the exact equal-tailed and shortest CIs for based on can be constructed. In Section 3, Monte Carlo simulations show that provides a more efficient CI than in terms of average lengths (ALs) of the CIs, as in the case of Seo and Kim [5].
For , we have that
by putting in Lemma 1. In addition, let be the unique solution of for , where . Then, by substituting for in (4), the following generalized quantity is given by
The existing literature (Wang et al. [4] and Wang et al. [20]) supposed that W has a distribution with degrees of freedom, and then obtained the percentiles of generating W and independently from the distribution with and degrees of freedom, respectively, although W and are not independent. As an alternative, the following algorithm is proposed to obtain the percentiles of .
- Step 1. Generate from a distribution with two degrees of freedom.
- Step 2. Compute for .
- Step 3. Compute and solve the equation for to obtain .
- Step 4. Compute .
- Step 5. Repeat times.
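The idea behind these steps, generating every pivotal quantity from one common set of exponential spacings so that their dependence is preserved, can be sketched as follows. This is only an illustration under a loudly stated assumption: the base CDF is taken to be F(x) = exp(−λ e^{−x/σ}), which may differ from the article's CDF (1), and the names `pivotal_draws`, `w`, `sigma`, and `lam` are hypothetical. Under that assumption, Z_i = λ exp(−X_{L(i)}/σ) are partial sums of iid standard exponentials, W = 2 Σ i·log(Z_{i+1}/Z_i) equals 2 Σ (x_i − x_k)/σ, and Z_k determines λ once σ is known.

```python
import math
import random


def pivotal_draws(records, n_draws, rng):
    """Joint draws of (sigma, lam) from pivotal quantities sharing one set of spacings.

    ASSUMPTION: records are lower records from F(x) = exp(-lam * exp(-x / sigma)).
    """
    k = len(records)
    x_k = records[-1]
    s = sum(x - x_k for x in records[:-1])    # positive for decreasing records
    draws = []
    for _ in range(n_draws):
        # Steps 1-2: chi-square(2) variates are 2 * Exp(1); Z_i are their partial sums / 2.
        z, total = [], 0.0
        for _ in range(k):
            total += rng.expovariate(1.0)
            z.append(total)
        # Step 3: W = 2 * sum_{i=1}^{k-1} i * log(Z_{i+1} / Z_i); solve W = 2 s / sigma.
        w = 2.0 * sum(i * math.log(z[i] / z[i - 1]) for i in range(1, k))
        sigma = 2.0 * s / w
        # Step 4: Z_k = lam * exp(-x_k / sigma), using the SAME spacings as W.
        lam = z[-1] * math.exp(x_k / sigma)
        draws.append((sigma, lam))
    return draws


rng = random.Random(1)
sample = pivotal_draws([2.0, 1.5, 1.1, 0.6, 0.2], 5, rng)   # made-up records
print(sample)
```

Because `w` and `z[-1]` come from the same simulated spacings, no independence between the two pivotal quantities has to be assumed, which is the point of the algorithm.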
From the algorithm, the equal-tailed and shortest CIs for based on are given by
and
respectively, where is chosen so that
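Given Monte Carlo draws of a parameter, the equal-tailed and shortest intervals can be read off the sorted draws. The sketch below is a generic illustration. The function names and the index conventions for the empirical percentiles are choices made here and may differ in detail from the article's definitions.

```python
import math


def equal_tailed_interval(draws, alpha=0.05):
    """Interval between the alpha/2 and 1 - alpha/2 empirical percentiles."""
    xs = sorted(draws)
    n = len(xs)
    lo = xs[max(0, math.ceil(n * alpha / 2) - 1)]
    hi = xs[min(n - 1, math.ceil(n * (1 - alpha / 2)) - 1)]
    return lo, hi


def shortest_interval(draws, alpha=0.05):
    """Narrowest window of sorted draws containing a fraction 1 - alpha of them."""
    xs = sorted(draws)
    n = len(xs)
    m = math.ceil(n * (1 - alpha))          # number of draws the window must hold
    best = min(range(n - m + 1), key=lambda j: xs[j + m - 1] - xs[j])
    return xs[best], xs[best + m - 1]


# Right-skewed toy draws: the shortest interval shifts toward the dense left side.
toy = [i ** 2 for i in range(1, 101)]
print(equal_tailed_interval(toy, 0.1), shortest_interval(toy, 0.1))
```

For skewed draws the shortest interval is never longer than the equal-tailed one, which is consistent with the comparison of interval lengths reported in the simulation study.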
2.3. Prediction
Let be the future lower record values. Then, for any , the conditional quantile is given by
where is the conditional distribution of given . However, the quantile cannot be obtained explicitly because it has no closed form. Instead, we propose a pivotal approach based on the following lemma.
Lemma 2.
Let in the conditional density function of given be defined by Ahsanullah [18] as
Then, has a gamma distribution with the parameters .
Proof.
Let in (5). Then, the Jacobian of the transformation is
and the density function of is
which is the probability density function of a gamma distribution with parameters . ☐
Following the same argument as in Section 2.2, an algorithm for obtaining the Markov-chain Monte-Carlo (MCMC) samples based on the pivotal quantity is provided as follows.
- Step 1. Generate from Gam.
- Step 2. Compute
- Step 3. Repeat steps 1 and 2, N times.
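The prediction steps can be sketched in the same spirit. For lower records from any continuous F, the quantity log F(x_k) − log F(Y) for a future record Y that lies m record times ahead has a gamma distribution with shape m and unit scale, so Y is obtained by inverting that relation. The sketch again ASSUMES the illustrative CDF F(x) = exp(−λ e^{−x/σ}), which may differ from the article's CDF (1), and treats σ and λ as given. The function and argument names are hypothetical.

```python
import math
import random


def predict_future_record(x_k, steps_ahead, sigma, lam, n_draws, rng):
    """Monte Carlo draws of the lower record `steps_ahead` record times after x_k.

    ASSUMPTION: F(x) = exp(-lam * exp(-x / sigma)), so that
    lam * (exp(-y / sigma) - exp(-x_k / sigma)) ~ Gamma(steps_ahead, 1).
    """
    base = math.exp(-x_k / sigma)
    preds = []
    for _ in range(n_draws):
        q = rng.gammavariate(steps_ahead, 1.0)            # Step 1: gamma pivot
        preds.append(-sigma * math.log(q / lam + base))   # Step 2: invert for y
    return preds                                          # Step 3: repeated n_draws times


rng = random.Random(2)
draws = predict_future_record(x_k=0.2, steps_ahead=1, sigma=1.0, lam=1.0, n_draws=5, rng=rng)
print(draws)   # every draw lies below the last observed record 0.2
```

In practice the fixed `sigma` and `lam` would be replaced by draws from the parameter algorithm so that estimation uncertainty carries over into the predictive distribution.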
3. Simulation Study
A simulation study was performed to examine the validity of the proposed pivotal-based approach in terms of the coverage probabilities (CPs) and ALs of the proposed CIs. The lower record values with sizes were generated from the standard EGD with and . To construct the exact CIs described in Section 2.2, MCMC samples were generated, and the corresponding CPs and ALs were computed over 10,000 simulations. The results are reported in Table 1 along with those for the classical inference (see Appendix A) for comparison. Table 1 shows that the CIs using MCMC samples give nearly the same results as those using the classical method, and all considered CIs match their corresponding nominal levels well; however, the CIs based on have shorter lengths than those based on . In addition, all ALs decrease as the number of record values k increases. As expected, the shortest-length CIs are shorter than the equal-tailed CIs.
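A coverage experiment of this kind can be sketched with the standard library alone. The sketch ASSUMES the illustrative CDF F(x) = exp(−λ e^{−x/σ}) (not necessarily the article's CDF (1)), uses the chi-square pivot 2 Σ (x_i − x_k)/σ with 2(k − 1) degrees of freedom that holds under that assumption, approximates chi-square percentiles by Monte Carlo instead of an exact quantile function, and runs far fewer replications than the 10,000 reported. All names are hypothetical.

```python
import math
import random


def simulate_records(k, sigma, lam, rng):
    """k lower records from the ASSUMED CDF F(x) = exp(-lam * exp(-x / sigma))."""
    z, recs = 0.0, []
    for _ in range(k):
        z += rng.expovariate(1.0)              # -log F(X_L(i)) ~ Gamma(i, 1)
        recs.append(-sigma * math.log(z / lam))
    return recs


def chi2_percentiles(half_df, probs, n, rng):
    """Monte Carlo percentiles of chi-square(2 * half_df) = 2 * Gamma(half_df, 1)."""
    draws = sorted(2.0 * rng.gammavariate(half_df, 1.0) for _ in range(n))
    return [draws[min(n - 1, int(p * n))] for p in probs]


def coverage_sigma(k, sigma, lam, n_sims, rng, alpha=0.05):
    """Estimated CP and AL of the equal-tailed CI for sigma."""
    q_lo, q_hi = chi2_percentiles(k - 1, [alpha / 2, 1 - alpha / 2], 20000, rng)
    hits, total_len = 0, 0.0
    for _ in range(n_sims):
        recs = simulate_records(k, sigma, lam, rng)
        s = sum(x - recs[-1] for x in recs[:-1])
        lower, upper = 2.0 * s / q_hi, 2.0 * s / q_lo
        hits += lower <= sigma <= upper
        total_len += upper - lower
    return hits / n_sims, total_len / n_sims


rng = random.Random(3)
print(coverage_sigma(k=8, sigma=1.0, lam=1.0, n_sims=200, rng=rng))
```

Increasing `k` in this sketch shortens the average interval length while the coverage stays near the nominal level, the same qualitative pattern the text describes.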
Table 1.
Coverage probabilities (CPs) (average lengths (ALs)) of CIs for and .
4. Application: Arctic Sea Ice
Sea ice maintains the Earth’s average temperature by reflecting solar energy and keeping the polar regions cool. Currently, the Arctic is warming faster than any other region on Earth. The warming of the Arctic Circle leads to a decrease in sea ice, which in turn causes further warming of the Arctic Circle. In addition, it causes global weather changes such as summer heat waves, winter cold waves, and heavy snow. These climate changes are leading to disturbances of ecosystems formed around Arctic sea ice and changes in habitats. For this reason, the importance of sea ice prediction systems to cope with climate change is increasing. The National Aeronautics and Space Administration (NASA) reported that the area covered by Arctic sea ice has decreased by about ten percent in the last 30 years (Figure 2).
Figure 2.
Sea ice extent in October 1979 (left) and October 2018 (right).
This section analyzes the smallest annual Arctic sea ice extent (see Table 2) from October 1978 to October 2018 extracted from the National Snow & Ice Data Center (NSIDC).
Table 2.
Observed record values from Arctic sea ice data
To measure the goodness of fit of the EGD, replicated data for each observed lower record value were generated from its marginal density function with and . All results were obtained by generating MCMC samples. In addition, based on the results of the previous simulation study, only from was considered in this data analysis.
The confidence region for the replicated data was plotted in Figure 3. It was found that the confidence regions decreased as the record value of the smallest annual Arctic sea ice decreased. The correlation coefficient between the observed and expected lower record values indicates a strong association.
Figure 3.
95% Confidence region of the replicated data. The solid line represents the mean of the replicated data and r is the correlation coefficient of the mean and observed lower record values.
To examine the relationship between observed record values, Figure 4 plots the first trees of the C- and D-vines with the best copula function in terms of the Akaike information criterion (AIC) and the corresponding Kendall’s $\tau$.
Figure 4.
(a) First tree of C-vine for the observed record values; (b) first tree of D-vine for the observed record values. The labels are the best pair-copula families and corresponding Kendall’s $\tau$ values. For example, N, t, BB1, and SBB1 represent the Gaussian, Student t, Clayton-Gumbel, and survival Clayton-Gumbel copulas, respectively.
Note that the AIC is defined as $-2\log L + 2k$, where L is the likelihood function and k is the number of estimated parameters of the model. Therefore, the smaller the AIC, the better. The entire result for the relationships between observed lower record values is reported in Figure 5. It is shown that the observed lower record values have a positive dependence on each other. In addition, the Kendall’s $\tau$ values increase as the interval between the lower record times decreases. That indicates that and such that for have the strongest dependency in terms of Kendall’s $\tau$. It is worth noting that the strength of dependency between and such that for becomes stronger as the lower record times increase.
Figure 5.
Circular plot for Kendall’s $\tau$ between two paired record values.
The exact CIs for and are reported in Table 3, which shows a similar pattern to the simulation results.
Table 3.
CIs for and .
For prediction, the last lower record value was assumed to be unknown, and a time series analysis was conducted, in which it was expected that differences of the observed lower record values could yield a stationary time series because the observed lower record values had a decreasing pattern. In fact, the ARIMA (0, 1, 0) model was chosen as the best model in terms of the AIC from an ARIMA (p, d, q) model, where p is the autoregressive (AR) model order, d is the difference order, and q is the moving average (MA) model order. Table 4 and Figure 6 present the results for future record values of the least annual Arctic sea ice. Table 4 shows that there is little difference in measures of center such as the mean and median for the predictions of the future lower record values based on the pivotal quantity .
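An ARIMA (0, 1, 0) model is a random walk on the original scale, so its forecast can be written down directly. The sketch below is a simplified illustration with made-up numbers: whether the fitted model in the analysis included a drift term is not stated, so `drift` is left as an option, and the prediction interval ignores the uncertainty in the estimated drift. The function name is hypothetical.

```python
import math
import statistics


def random_walk_forecast(series, horizon=1, drift=True, z=1.96):
    """Point forecast and approximate 95% PI from an ARIMA(0, 1, 0) model.

    With drift, the forecast is last value + horizon * mean difference;
    without drift, it is simply the last value.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    if drift:
        mu = statistics.mean(diffs)
        sd = statistics.stdev(diffs)                     # spread around the drift
    else:
        mu = 0.0
        sd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    point = series[-1] + horizon * mu
    half = z * sd * math.sqrt(horizon)                   # variance grows linearly in horizon
    return point, (point - half, point + half)


toy_records = [5.0, 4.2, 3.9, 3.1]    # made-up decreasing record values
print(random_walk_forecast(toy_records, horizon=1, drift=True))
print(random_walk_forecast(toy_records, horizon=1, drift=False))
```

Unlike the pivotal approach, this forecast does not know that the next lower record must fall below the last one, so its interval can extend above the current record.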
Table 4.
Prediction results.
Figure 6.
Estimated kernel density functions for .
For the last lower record value, the ARIMA (0, 1, 0) model provides a closer predictive value than the mean of to the actual value of 1.29, while the PI from the ARIMA (0, 1, 0) model has a longer length than that for based on the pivotal quantity . Finally, Figure 6 shows that as the future record time increases, the variance of the predicted future record value from the conditional density function increases.
5. Conclusions
This article proposed a copula approach with which to model the dependence structure between record values from the EGD and to predict future lower record values using a pivotal-based method. In the pivotal-based method, a new algorithm for dealing with a nuisance parameter has been proposed; it is not only computationally convenient for constructing exact CIs with the shortest lengths, but also provides very satisfactory results in terms of the CPs and ALs, compared with the classical method. In the approach based on the C- and D-vine copulas, we chose the best copula model in terms of the AIC among 40 pair-copula families, and it showed intuitive and reasonable results in the analysis based on real data. An interesting point is that the strength of the dependency between and such that for becomes stronger as the lower record times increase in the real data analysis. The proposed method is applicable to record values of other real data that have a probability distribution if the CDF of the probability distribution has a closed form, such as an extreme value distribution. The prediction results of this paper indicate that we should be alert to the decrease in Arctic sea ice extent. In future studies, we envision extending this work to predict the size and decreasing rate of Arctic sea ice extent in real time.
Author Contributions
J.I.S. conceived and designed the research; J.I.S. and J.L. analyzed the data and interpreted the results; J.I.S., J.L., J.J.S., and Y.K. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Education) (number 2019R1I1A3A01062838).
Acknowledgments
We are grateful to the editor-in-chief, associate editor, and anonymous referees for their helpful comments.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof
Proof.
For any , we have
Then, the interval length in (A1) is given by
and the corresponding expected interval is given by
because
where is the digamma function. The equal-tailed CI based on is obtained by setting and in (A1) because has the F distribution with and 2 degrees of freedom. To find and that minimize the length such that
where is the PDF of the F distribution with and 2 degrees of freedom, we have
and
so that
where
It follows that the minimum occurs at
That is, we choose and that satisfy the conditions (A2) and (A3) to construct the shortest CI for based on .
Similarly, for any , the equal-tailed CI based on is obtained by setting and in
and the shortest CI can be constructed by choosing and that satisfy
and
☐
References
- Chandler, K.N. The distribution and frequency of record values. J. R. Stat. Soc. Ser. B 1952, 14, 220–228.
- Balakrishnan, N.; Ahsanullah, M.; Chan, P.S. Relations for single and product moments of record values from Gumbel distribution. Stat. Probab. Lett. 1992, 15, 223–227.
- Coles, S.G.; Tawn, J.A. A Bayesian analysis of extreme rainfall data. J. R. Stat. Soc. Ser. C 1996, 45, 463–478.
- Wang, B.X.; Yu, K.; Coolen, F.P. Interval estimation for proportional reversed hazard family based on lower record values. Stat. Probab. Lett. 2015, 98, 115–122.
- Seo, J.I.; Kim, Y. Statistical inference on Gumbel distribution using record values. J. Korean Stat. Soc. 2016, 45, 342–357.
- Seo, J.I.; Kim, Y. Objective Bayesian analysis based on upper record values from two-parameter Rayleigh distribution with partial information. J. Appl. Stat. 2017, 44, 2222–2237.
- Seo, J.I.; Kim, Y. Objective Bayesian entropy inference for two-parameter logistic distribution using upper record values. Entropy 2017, 19, 208.
- Imani, M.; Braga-Neto, U.M. Finite-horizon LQR controller for partially-observed Boolean dynamical systems. Automatica 2018, 95, 172–179.
- Imani, M.; Dougherty, E.R.; Braga-Neto, U. Boolean Kalman filter and smoother under model uncertainty. Automatica 2020, 111, 108609.
- Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231.
- Tsung, F.; Zhang, K.; Cheng, L.; Song, Z. Statistical transfer learning: A review and some extensions to statistical process control. Qual. Eng. 2018, 30, 115–128.
- Rocher, L.; Hendrickx, J.M.; De Montjoye, Y.A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 2019, 10, 1–9.
- Joe, H. Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters. Lect. Notes Monogr. Ser. 1996, 28, 120–141.
- Aas, K.; Czado, C.; Frigessi, A.; Bakken, H. Pair-copula constructions of multiple dependence. Insur. Math. Econ. 2009, 44, 182–198.
- Berg, D.; Aas, K. Models for construction of multivariate dependence: A comparison study. Eur. J. Financ. 2009, 15, 639–659.
- Fischer, M.; Köck, C.; Schlüter, S.; Weigert, F. An empirical analysis of multivariate copula models. Quant. Financ. 2009, 9, 839–854.
- Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006.
- Ahsanullah, M. Record Statistics; Nova Science Publishers, Inc.: New York, NY, USA, 1995.
- Arnold, B.C.; Balakrishnan, N.; Nagaraja, H.N. Records; Wiley: New York, NY, USA, 1998.
- Wang, B.X.; Yu, K.; Jones, M.C. Inference under progressively Type II right-censored sampling for certain lifetime distributions. Technometrics 2010, 52, 453–460.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).