1. Introduction
The Fréchet, Weibull, and Gumbel extreme value distributions [1,2] are genuine probabilistic models for extreme event data, as they correspond to the asymptotic distributions of the extreme order statistics of independent and identically distributed random variables. The generalized extreme value (GEV) distribution, presented by [3], unifies the three extreme value distributions. For this reason, the GEV distribution is widely used to model extreme events across various fields, including insurance, finance, and hydrology. The theory and applications of the GEV distribution are thoroughly discussed in the books [4,5,6,7,8,9], among others.
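A continuous random variable X has a GEV distribution, $X \sim F_G(\cdot;\xi,\mu,\sigma)$, if its cumulative distribution function (CDF) and probability density function (PDF) are given, respectively, by
$$F_G(x;\xi,\mu,\sigma)=\begin{cases}\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \xi\neq 0,\\[4pt] \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, & \xi=0,\end{cases}\qquad (1)$$
and
$$f_G(x;\xi,\mu,\sigma)=\begin{cases}\frac{1}{\sigma}\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi-1}\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \xi\neq 0,\\[4pt] \frac{1}{\sigma}\exp\left\{-\left(\frac{x-\mu}{\sigma}\right)-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, & \xi=0,\end{cases}\qquad (2)$$
with shape parameter $\xi\in\mathbb{R}$, scale parameter $\sigma>0$, and location parameter $\mu\in\mathbb{R}$.
The support of the GEV distribution depends on the values of the parameters. It is the set $\{x\in\mathbb{R}: 1+\xi(x-\mu)/\sigma>0\}$ for $\xi\neq 0$ and $\mathbb{R}$ for $\xi=0$.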
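As a minimal sketch of the GEV definition, the following Python functions evaluate the CDF and PDF in the standard (ξ, μ, σ) parametrization, including the parameter-dependent support. The function names `gev_cdf` and `gev_pdf` are illustrative choices and do not come from any package.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV CDF with shape xi, location mu and scale sigma > 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case: the support is the whole real line.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below it when xi > 0, above it when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_pdf(x, xi, mu=0.0, sigma=1.0):
    """GEV PDF; it is zero outside the support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z - math.exp(-z)) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) * math.exp(-t ** (-1.0 / xi)) / sigma
```

For ξ > 0 the support is bounded below at μ − σ/ξ, and for ξ < 0 it is bounded above at the same point, which is what the two early returns encode.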
The parameter ξ determines the weight of the tail of the distribution. The GEV distribution accommodates heavy-tailed and light-tailed distributions and is characterized by its unimodal shape. Some unimodal generalizations of the GEV distribution are the transmuted GEV distribution [10,11], the dual gamma generalized extreme value distribution, the exponentiated generalized extreme value distribution [11], and the blended generalized extreme value distribution [12].
In various applications, extreme climate data, such as wind speed, humidity, and temperature, exhibit heterogeneous (bimodal) densities with rare events and heavy tails. A very promising model for extreme heterogeneous data is the bimodal GEV distribution, as defined in [13].
Following [13], a random variable X has a bimodal GEV (BGEV) distribution if its cumulative distribution function is given by
with
,
,
,
, and the transformation
is defined by
is invertible and differentiable. The support of the probability density function associated with the distribution (3) depends on the shape parameter. When
, the support is
, when
, the interval is
, and when
, the interval is
, where
.
The disadvantage of the model (3) is that all four of its parameters are shape parameters; it has no location or scale parameters. In other words, σ is not a scale parameter and μ is not a location parameter, as they are in the GEV distribution. Furthermore, the local minimum of the PDF is always located at zero. This limits its applicability, as real bimodal data can have a local minimum at any point of the real line.
The chief goal of this paper is to examine several properties of the new BGEV distribution, which was redefined by [14] to include a location parameter, and to illustrate its applicability. Specifically, this paper complements the work of [14] in three directions. First, it proves the identifiability of the new BGEV distribution, which is crucial for the practical application of this model. Second, it presents expressions for the moments, the moment-generating function, and the differential entropy of the new BGEV model. Third, it presents a real data application of the BGEV distribution in a scenario where a bimodal model for extreme data is needed.
The remainder of this paper is structured as follows. Section 2 presents the main results of this work. We begin in Section 2.1 with the definition of the main functions related to the new BGEV model. Next, in Section 2.2, we provide a graphical illustration of the new BGEV model. Then, in Section 2.3, we show the main properties of the new BGEV distribution. Section 3 contains an application of the new BGEV model to climate data. Finally, Section 4 closes the paper with some concluding remarks.
2. The New BGEV Distribution
Initially, in Section 2.1, we show how the model (3) was redefined by [14], presenting the cumulative distribution function, probability density function, survival function, hazard (failure rate) function, and quantile function. The versatility of the BGEV distribution is illustrated through the graphical representations in Section 2.2. The main results of this work are the properties of the new BGEV distribution, which are given in Section 2.3.
2.1. The Redefined BGEV Distribution
Definition 1. A random variable X has a bimodal GEV distribution with location parameter μ if its CDF is given by
where
The inverse and derivative functions of T
are, respectively, given by
and
For simplicity, we write , where the notation with parameters is used only to show the role of the parameters and .
The expressions (7) and (8) allow us to obtain the PDF of X, given by
whose support is
In the BGEV model, ξ, σ, and δ are shape parameters, while μ is a location parameter. It is important to note that in the GEV distribution (1), σ is a scale parameter; however, in (5), σ is not a scale parameter, because the CDF in (5) does not satisfy the condition
since
.
On the other hand, the parameter μ in (5) is a location parameter. To prove this, it suffices to observe that
and
Thus, .
The model (5) is a generalization of the GEV distribution, because when δ = 0 the BGEV distribution reduces to the GEV distribution. That is,
.
From the expressions in (5) and (9), it is simple to obtain the survival and hazard functions. These functions are useful in reliability analysis and for calculating risk measures in other areas. The survival and hazard functions are given by the expressions
and
respectively. The support of the survival and hazard functions is the set (10).
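The survival and hazard functions follow the usual relations S(x) = 1 − F(x) and h(x) = f(x)/S(x) for any continuous distribution. The sketch below implements these relations for arbitrary CDF and PDF callables. Because the BGEV formulas are not reproduced in this text, the standard Gumbel distribution is used purely as a stand-in, and all function names are illustrative.

```python
import math

def survival(cdf, x):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - cdf(x)

def hazard(pdf, cdf, x):
    """Hazard function h(x) = f(x) / S(x); infinite where S(x) = 0."""
    s = 1.0 - cdf(x)
    return pdf(x) / s if s > 0.0 else float("inf")

# Stand-in distribution (an assumption for illustration): standard Gumbel.
def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

def gumbel_pdf(x):
    return math.exp(-x - math.exp(-x))

s0 = survival(gumbel_cdf, 0.0)            # 1 - exp(-1)
h0 = hazard(gumbel_pdf, gumbel_cdf, 0.0)  # exp(-1) / (1 - exp(-1))
```

Passing the distribution as callables keeps the two relations independent of the particular model, so the same helpers could be reused with BGEV implementations of F and f.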
An important property of the new BGEV model is that its quantile function has a simple closed-form expression. This feature is extremely useful for simulation procedures and the calculation of risk measures in various applied fields.
From (1), (5), and (7), the quantile function of the BGEV model is given by
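To illustrate how a closed-form quantile function supports simulation, here is a hedged sketch of inverse-transform sampling. The displayed formulas are not reproduced in this text, so the code assumes that the transformation has the form T(x) = σ(x − μ)|x − μ|^δ with σ > 0 and δ > −1, and that the BGEV CDF is the GEV(ξ, 0, 1) CDF evaluated at T(x). Both are assumptions, and every function name is hypothetical.

```python
import math
import random

def gev01_quantile(p, xi):
    """Quantile of the GEV(xi, 0, 1) distribution for 0 < p < 1."""
    if xi == 0.0:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-xi) - 1.0) / xi

def t_inverse(y, mu, sigma, delta):
    """Inverse of the ASSUMED T(x) = sigma * (x - mu) * |x - mu|**delta."""
    return mu + math.copysign((abs(y) / sigma) ** (1.0 / (delta + 1.0)), y)

def bgev_quantile(p, xi, mu, sigma, delta):
    """Assumed BGEV quantile: T^{-1} applied to the GEV(xi, 0, 1) quantile."""
    return t_inverse(gev01_quantile(p, xi), mu, sigma, delta)

def bgev_cdf(x, xi, mu, sigma, delta):
    """Assumed BGEV CDF: GEV(xi, 0, 1) CDF evaluated at T(x)."""
    d = x - mu
    y = 0.0 if d == 0.0 else sigma * d * abs(d) ** delta
    if xi == 0.0:
        return math.exp(-math.exp(-y))
    t = 1.0 + xi * y
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def bgev_sample(n, xi, mu, sigma, delta, seed=None):
    """Inverse-transform sampling: apply the quantile to uniform draws."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # skip u == 0, where the quantile is undefined
            out.append(bgev_quantile(u, xi, mu, sigma, delta))
    return out
```

Under these assumptions, setting δ = 0 makes T linear, so the quantile reduces to μ plus the GEV(ξ, 0, 1) quantile divided by σ.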
2.2. Graphical Illustrations of the New BGEV Distribution
The versatility of the new PDF, defined in (9), is illustrated in Figures 1–4. Depending on the combination of parameters, the PDF can be unimodal or bimodal, symmetric or asymmetric, and have a heavy or light tail. To better understand the role of each of the four parameters in the PDF, we consider four scenarios. In each scenario, we fix three parameters and vary the fourth to see its effect on the curves. Each of Figures 1–4 shows the graphs of the PDF (f), CDF (F), survival function (S), and hazard function (h) of the BGEV distribution.
In Figures 1–3, the graphs of the four functions change shape as the values of ξ, σ, and δ vary. This illustrates our comment above that the parameters ξ, σ, and δ are shape parameters.
Figure 1 shows the effect of the parameter δ on the curves. When δ = 0, the PDF is unimodal, whereas it is bimodal for δ > 0. Furthermore, the larger the value of δ, the further apart and higher the modes are and the heavier the tails. The effect of the parameter ξ on the curves is shown in Figure 2. As ξ increases, the density tails become heavier and the asymmetry becomes more evident. In Figure 3, one can see that the parameter σ also modifies the shape of the PDF. The parameter σ is not a scale parameter, since the PDF remains fixed at the local minimum. This agrees with the argument above that σ is not a scale parameter. In Figure 4, the PDF is only shifted as μ varies. This also confirms that μ is a location parameter. Regarding the hazard function h, depending on the combination of model parameters, it can be increasing, decreasing, unimodal, N-shaped, or M-shaped. In other words, the BGEV distribution is quite flexible for modeling survival and reliability data.
As illustrated above, the number of modes of the distribution is governed by the parameter δ. In practical applications, extreme data from heterogeneous populations can be appropriately modeled using the BGEV distribution with δ > 0, reflecting its ability to capture bimodal patterns. For instance, [13] fitted this distribution to maximum wind speed and maximum temperature data, obtaining δ values of
and
, respectively, which clearly indicate the presence of bimodality in the extremes. This characteristic is consistent with the seasonal behavior of the studied region, which exhibits two well-defined climatic seasons (wet and dry), each associated with distinct regimes of extreme event occurrences, resulting in two distinct modes throughout the year.
2.3. Properties
In statistics, identifiability is an important property that a family of distributions must satisfy for reliable inference. A family of distributions is identifiable if different parameter values produce different probability distributions. In other words, each distribution in the family corresponds to a unique parameter value. The following shows that the family of BGEV distributions in (5) is identifiable. In addition, we derive other properties, including formulas for the moments, the moment generating function, and the differential entropy.
2.3.1. Identifiability
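Let $\mathcal{F}=\{F(\cdot;\theta):\theta\in\Theta\}$ be a family of CDFs. This class is identifiable if and only if, for any $\theta_1,\theta_2\in\Theta$, the equality $F(x;\theta_1)=F(x;\theta_2)$ for all $x$ implies $\theta_1=\theta_2$.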
Proposition 1. The family of BGEV distributions with known parameter δ is identifiable.
Proof. The authors of [15] demonstrated the identifiability of finite mixtures of GEV distributions and, in particular, that the family of a single GEV component is identifiable. That is, the family
is identifiable. Thus, it remains to prove that, for any
equality
implies
. From (5), we have that (11) is equivalent to
Since the function
is identifiable, it follows from (
12) that the equality necessarily implies
and
. □
2.3.2. Moments and Moment Generating Function
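To calculate the moments of X, first consider the gamma function, the upper incomplete gamma function, and the lower incomplete gamma function, defined according to [16], respectively, by
$$\Gamma(a)=\int_0^{\infty} t^{a-1}e^{-t}\,dt,\qquad (13)$$
$$\Gamma(a,x)=\int_x^{\infty} t^{a-1}e^{-t}\,dt,\qquad (14)$$
and
$$\gamma(a,x)=\int_0^{x} t^{a-1}e^{-t}\,dt,\qquad (15)$$
where $a>0$ and $x\geq 0$.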
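As a small numerical companion to these three functions, the sketch below evaluates the lower incomplete gamma function with its standard power series and obtains the upper one from the identity Γ(a) = γ(a, x) + Γ(a, x). It is only an illustration for a > 0 and moderate x, not a substitute for a numerical library, and the function names are illustrative.

```python
import math

def lower_incomplete_gamma(a, x, tol=1e-15, max_terms=10_000):
    """gamma(a, x) = int_0^x t^(a-1) e^(-t) dt for a > 0 and x >= 0.

    Uses the series x^a e^(-x) * sum_{n>=0} x^n / (a (a+1) ... (a+n)).
    """
    if x == 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= x / (a + n)  # ratio between consecutive series terms
        total += term
    return x ** a * math.exp(-x) * total

def upper_incomplete_gamma(a, x):
    """Gamma(a, x) via Gamma(a) - gamma(a, x); loses accuracy for large x."""
    return math.gamma(a) - lower_incomplete_gamma(a, x)
```

For a = 1 the lower function reduces to 1 − e^{−x}, which gives a quick sanity check of the series.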
Proposition 2. Let with , then the k-th integer moment of X is given by when , whenever and when , whenever .
Proof. By definition
where
is defined in (
2),
T as in (
6), and
given in (
8). By substituting
into (
17), the moments are expressed as follows:
As
, the binomial theorem is used, so (18) can be rewritten as the integral
where
and $\mathbb{1}_A$ is the indicator function of the set A, that is, $\mathbb{1}_A(x)=1$ if $x\in A$ and $\mathbb{1}_A(x)=0$ otherwise. Now, we need to analyze the cases where
and
.
Case
: By replacing (
2) in (
19) and changing the variable
for
, it follows that
Applying the binomial theorem to (20), we obtain
In the same way, the
k-th moment of
Y truncated in the negative part is
The lower incomplete gamma function (15) and the upper incomplete gamma function (14) are used to represent the integrals in (21) and (22), respectively. Consequently, the proof of (16) follows by substituting these expressions into Equation (19).
Case : The same procedure as in the case where is repeated, respecting the support of . □
Remark 1. From Proposition 2, we have that for , is finite for . That is, the two shape parameters ξ and δ influence the weight of the tail of the new distribution. Consequently, the tail index of the new BGEV distribution is . That is, the right tail of the BGEV distribution can be heavier than the tail of the GEV distribution.
In the following corollary from Proposition 2, we obtain a known result.
Corollary 1. Let where . The k-th moment of X is given by Proof. From (
19), when
, we obtain
where
and
. The proof ends with the use of the expressions (
21) and (
22) and the fact that
. □
The mean of a random variable exists when . It is given in the following corollary.
Corollary 2. Let with . Then, for and
For , an expression for the moment generating function is also obtained. It is given in the following proposition.
Proposition 3. Let with . The moment generating function of X is given by where .
Proof. Using the substitution
in (24) and the fact that
we have that
The new substitution
allows us to rewrite (25) as
Finally, the series representation of the exponential function is used. Thus, (26) can be rewritten as
□
The following result is a particular case of (24). It coincides with the moment-generating function of the Gumbel distribution [6].
Corollary 3. Let . The moment generating function of X is given by
Proof. When
, the expression (23) reduces to
where
. □
The mean of always exists. It is given in the following corollary.
Corollary 4. Let . The expectation of X is given by where .
Proof. The proof follows from the derivative of (23) at
. □
2.3.3. Entropy
The differential entropy of the BGEV distribution is given in the following proposition.
Proposition 4. Let with , then the entropy of X is given by where γ is the Euler–Mascheroni constant and .
Proof. Case
. From (9), Equation (29) becomes
With the substitution
in (30), it follows that
where
is as in (1). Due to the fact that
and
, the proof of (28) is complete.
Case
. Again, from (9), Equation (29) becomes
With the substitution
, Equation (31) becomes
where
. Since
and
, Equation (32) proves (28). □
3. Application
In this section, to demonstrate the applicability of the bimodal GEV model with PDF (9), we use data on the minimum humidity and wind gust speed in Goiânia, the second most populous city in the Central-West region of Brazil, surpassed only by Brasília, the capital of Brazil. The city is an important economic hub in the region and is considered a strategic center for areas such as industry, medicine, fashion, and agriculture. The climate in Goiânia is tropical, with two well-defined seasons: rainy (from October to April) and dry (from May to September). In the dry season, relative humidity reaches critical levels and can be close to
, which constitutes a state of emergency.
The data used here correspond to the period from 1 January 2011 to 31 December 2022 and come from the automatic weather station A002 in Goiânia. The data are recorded hourly and are available from the National Institute of Meteorology at https://portal.inmet.gov.br/ (accessed on 15 January 2023). The relative humidity (HUM) is expressed as a percentage, and the wind speed (WS) is measured in meters per second (m/s).
Table 1 shows the descriptive statistics of HUM. In the period corresponding to the data used here, the minimum humidity recorded was 21.83%.
Since the original HUM and WS data exhibit temporal dependence and the BGEV model assumes independent and identically distributed (i.i.d.) data, we first applied the block minima technique [17] to obtain a subsample of the minimum values of HUM. Table 2 presents a summary of the p-values obtained for different block sizes N, based on the Ljung–Box test [18] applied to the subsample of block minima. From N = 60 onwards, the p-values exceed the significance level, indicating that for block sizes greater than or equal to 60, the null hypothesis of independence among observations is not rejected. Therefore, N = 60 days (1440 h) is adopted as the smallest block size that ensures serial independence within the subsample of minima.
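The block minima step and the statistic behind the Ljung–Box test can be sketched as follows. This is an illustrative pure-Python version, not the code used in the study: the function names are hypothetical, the block size is given in number of observations (for hourly data, a block of N days has 24·N observations), and only the Q statistic is computed. Under the null hypothesis of no autocorrelation, Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags.

```python
def block_minima(values, block_size):
    """Minimum of each consecutive block; an incomplete last block is dropped."""
    n_blocks = len(values) // block_size
    return [min(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1}^{h} rho_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    acc = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        acc += rho_k ** 2 / (n - k)
    return n * (n + 2) * acc

# Toy series with block size 2 (the study uses hourly data and much larger blocks).
minima = block_minima([3, 1, 2, 5, 4, 6, 0], 2)   # [1, 2, 4]
q_stat = ljung_box_q([1.0, -1.0, 1.0, -1.0], 1)   # 4.5
```

Repeating the Q computation over a grid of block sizes mirrors the procedure summarized in Table 2, where the smallest block size with no evidence of autocorrelation is retained.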
The left panels of Figure 5 and Figure 6 display the histograms of the HUM and WS variables, respectively. The right panels of these figures present the histograms of the subsamples composed of the minimum values of HUM and WS. The histograms of the subsamples exhibit bimodal behavior, which suggests the BGEV distribution as an appropriate model for these data. Given the extreme nature of the observations, both sets of minima were fitted using the GEV and BGEV distributions to assess how well each distribution captures the observed empirical characteristics.
To estimate the parameters of the GEV and BGEV distributions, we use the maximum likelihood method as implemented in the evd [19] and bgev [14] packages for the R Project for Statistical Computing [20]. The maximum likelihood estimation algorithm of the bgev package is described in Appendix A.
Table 3 shows the estimates and standard errors for the minimum subsamples of the HUM and WS.
The parameter δ in the BGEV distribution is associated with the presence of bimodality, as previously discussed in Section 2.2. From Table 3, we observe that the estimates of δ for HUM and WS are 0.54 and 0.36, respectively. Since both estimates are positive, they indicate the presence of an inherent bimodal structure in the data.
To compare the fit of the BGEV and GEV distributions to the minima of HUM and WS, we used the Akaike Information Criterion (AIC) [21]. The AIC values obtained for the HUM variable were 611.2 (BGEV) and 2550.6 (GEV), while for WS they were 834.2 (BGEV) and 3452.2 (GEV). Since lower AIC values indicate a better trade-off between fit and complexity, these results indicate that the BGEV distribution provides a substantially better fit than the GEV distribution for the minimum values of relative humidity and wind gust speed observed in Goiânia.
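For reference, the AIC of a fitted model is 2k − 2ℓ, where k is the number of estimated parameters and ℓ is the maximized log-likelihood. The short sketch below encodes this formula and computes the AIC differences implied by the values reported above. The function name `aic` is an illustrative choice, and the log-likelihoods themselves are not given in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2 k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values as reported in the text for the block minima of HUM and WS.
reported = {
    "HUM": {"BGEV": 611.2, "GEV": 2550.6},
    "WS": {"BGEV": 834.2, "GEV": 3452.2},
}

# Positive differences favor the BGEV model.
delta_aic = {name: vals["GEV"] - vals["BGEV"] for name, vals in reported.items()}
```

The differences are about 1939.4 for HUM and 2618.0 for WS, far larger than the penalty of 2 incurred by the one extra parameter of the BGEV model.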
The Shannon entropy of a continuous random variable
, denoted by
and defined in Equation (
29), quantifies the average uncertainty associated with the distribution
F. In general, the higher the entropy, the greater the uncertainty of
F in representing the observations of
X. In the context of extreme value modeling, let
be the entropy of
and let
be the entropy of
. When both the GEV and BGEV distributions are used to model extreme data, Equation (
28) shows that, for
, the following inequality holds:
Applying this condition to the parameter estimates obtained to model the minimum HUM and WS, as presented in
Table 3, we find
for HUM and WS, respectively. This result implies that the entropy of the BGEV distribution is lower than that of the GEV, indicating that the BGEV model provides a more efficient fit to the data, in the sense of lower uncertainty associated with the variability of the observations.
The better performance of the BGEV distribution is further illustrated in the right panels of Figure 5 and Figure 6, which present the histograms together with the fitted GEV and BGEV densities for the minimum relative humidity and wind gust speed data, respectively.
4. Conclusions
The GEV distribution is a crucial tool for modeling extreme data. However, this distribution is not well suited to datasets that exhibit bimodal behavior. In this work, we examine a recent extension of the GEV distribution that accommodates bimodal data, known as the BGEV distribution, which was introduced by [13] and later redefined by [14].
In short, the main contributions of this paper are as follows. First, it presents a detailed explanation of the redefinition of the BGEV model. Second, the versatility of the BGEV distribution is illustrated through graphical representations of its PDF, which can be unimodal or bimodal, symmetric or asymmetric, and heavy-tailed or light-tailed. Third, it provides proofs of key properties of the new BGEV distribution, including identifiability, moments, the moment-generating function, and differential entropy. Fourth, it illustrates the usefulness of the new BGEV distribution through an application to climate data. Overall, the BGEV distribution is more effective than the GEV distribution when bimodality is inherent in the data.
A natural extension of the present work is the development of a regression model based on the BGEV distribution. The authors of this paper are currently developing a new class of regression model based on a median reparameterization of the redefined BGEV distribution discussed here. This work is in progress and the results will be reported elsewhere. Another promising extension of this work consists in developing time series models with innovations following the BGEV distribution.
A relevant limitation of the BGEV distribution is that its support depends on the parameters. This dependence makes the likelihood function harder to optimize. In the bgev package, specific strategies have been implemented to handle these restrictions and ensure the numerical convergence of the algorithm, as described in Step 3 (error treatment of input parameters) of the algorithm in Appendix A. In addition, the estimates of ξ and δ exert a strong influence on the behavior of the tails. With small sample sizes, these estimates tend to be unstable and difficult to obtain.