
An Asymptotic Test for Bimodality Using the Kullback–Leibler Divergence

by
Javier E. Contreras-Reyes
Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4030000, Chile
Symmetry 2020, 12(6), 1013; https://doi.org/10.3390/sym12061013
Submission received: 16 May 2020 / Revised: 1 June 2020 / Accepted: 9 June 2020 / Published: 16 June 2020
(This article belongs to the Special Issue Symmetric and Asymmetric Bimodal Distributions with Applications)

Abstract

Detecting bimodality of a frequency distribution is of considerable interest in several fields. Classical inferential methods for detecting bimodality focused on the third and fourth moments through the kurtosis measure. Nonparametric asymptotic tests (DIPtest) that compare the empirical distribution function with a unimodal one are also available. The latter point motivates this paper, which considers a parametric approach based on the bimodal skew-symmetric normal distribution. This general class captures bimodality, asymmetry, and excess kurtosis in data sets. The Kullback–Leibler divergence is used to obtain the test statistic. Comparisons with DIPtest, simulations, and a study of sea surface temperature data illustrate the usefulness of the proposed methodology.

1. Introduction

The bimodality of a frequency distribution is crucial in several fields. For example, Wyszomirski [1] analyzed bimodality-generating mechanisms for age-size plant population data sets. Ashman et al. [2] discussed the presence of bimodality in globular cluster metallicity distributions, velocity distributions of galaxies in clusters, and burst durations of gamma-ray sources. Hosenfeld et al. [3] detected bimodality in samples of elementary schoolchildren's reasoning performance. Bao et al. [4] applied the minimum relative entropy method for bimodal distributions to remanufacturing system data. Freeman and Dale [5] assessed bimodality to detect the presence of dual cognitive processes, and Shalek et al. [6] found bimodal variation in immune cells using ribonucleic acid (RNA) fluorescence.
Several measures of a frequency distribution's bimodality exist in the literature. Classical inferential methods for detecting bimodality focused on the third and fourth moments through the kurtosis measure [1]. Darlington [7] and Hildebrand [8] claimed that kurtosis is more a measure of unimodality versus bimodality than a measure of peakedness versus flatness. Hartigan and Hartigan [9] considered an asymptotic test that compares the empirical distribution function with a unimodal one. This paper is motivated by the latter, but considers a parametric approach. Specifically, we consider a generalized class of distributions that allows bimodal behavior in the empirical distribution. This class is called the bimodal skew-symmetric normal (BSSN) distribution [10], and includes the particular case of the bimodal normal distribution of [11]. Thus, with the BSSN distribution it is possible to capture asymmetry and platykurtic/leptokurtic behavior (negative/positive excess kurtosis) in data sets, in addition to bimodality. Moreover, entropic measures are useful for obtaining the test statistic if some regularity conditions of the probability distribution function are satisfied [12]. We consider the Shannon entropy [13], the Kullback–Leibler divergence [14], and the BSSN maximum likelihood estimators to provide an asymptotic test for bimodality.
This paper is organized as follows: some properties and inferential aspects of the BSSN distribution are presented in Section 2. In Section 3, we compute and describe information-theoretic measures related to the BSSN distribution, and we then develop a hypothesis test for the significance of the bimodality parameter together with a simulation study (Section 4). In Section 5, real sea surface temperature data collected off northern Chile illustrate the usefulness of the developed methodology. Section 6 concludes the paper with a discussion.

2. Bimodal Skew-Symmetric Normal Distribution

Definition 1.
Let X be a continuous random variable defined on ℝ. We say that X is bimodal skew-symmetric normal (BSSN) distributed, denoted X ~ BSSN(μ, σ², β, δ) [10], if its probability density function (pdf) is given by

f(x) = c[(x − β)² + δ] φ(x; μ, σ²),  x ∈ ℝ,  (1)

where μ, β ∈ ℝ are location parameters, σ > 0 and δ ≥ 0 denote the scale and bimodality parameters, respectively, φ(·; μ, σ²) is the normal pdf with location μ and scale σ, and c = [λ² + σ² + δ]⁻¹, with λ = β − μ.

The mean and variance of X are given by

E[X] = μ − 2cλσ²,  (2)

Var[X] = c²σ²(3σ⁴ + 4δσ² + [λ² + δ]²),  (3)

respectively. Equation (1) shows that X → N(μ, σ²) as δ → ∞ or |β| → ∞. The pdf of Equation (1) can also be expressed in a standardized form, as presented next.
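Definition 1 is straightforward to check numerically. The sketch below, in Python with SciPy (an assumed stand-in for the R tools used later in the paper; the parameter values are illustrative), codes the pdf of Equation (1) and verifies by quadrature that it integrates to one and reproduces the moments of Equations (2) and (3).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Eq. (1): f(x) = c[(x - beta)^2 + delta] * phi(x; mu, sigma^2)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, mu, np.sqrt(sigma2))

# illustrative parameter values
mu, sigma2, beta, delta = 1.0, 5.0, -0.5, 2.0
lam = beta - mu
c = 1.0 / (lam**2 + sigma2 + delta)

total, _ = quad(bssn_pdf, -np.inf, np.inf, args=(mu, sigma2, beta, delta))
mean_num, _ = quad(lambda x: x * bssn_pdf(x, mu, sigma2, beta, delta),
                   -np.inf, np.inf)
mean_closed = mu - 2 * c * lam * sigma2                       # Eq. (2)
var_closed = c**2 * sigma2 * (3 * sigma2**2 + 4 * delta * sigma2
                              + (lam**2 + delta)**2)          # Eq. (3)
```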
Definition 2.
A random variable Y has a BSSN distribution with location parameters μ, β ∈ ℝ, positive scale parameter σ, and non-negative bimodality parameter δ if its pdf is

f(y) = (cσ²y² − 2cλσy + 1 − cσ²) φ(y),  y ∈ ℝ,

where y = (x − μ)/σ, φ(·) is the standardized normal pdf with location 0 and scale 1, and c and λ are defined as in Equation (1).
Figure 1 portrays various plots of the BSSN pdf, accommodating various shapes in terms of skewness, kurtosis, and bimodality. We observe that bimodality appears for the smallest values of δ; see also Proposition 2.4 in [10]. In addition, the μ and β parameters allow accommodating skewness and kurtosis.
For a random sample X = (X₁, …, Xₙ) with pdf given in Equation (1), the log-likelihood function can be written as

ℓ(θ; X) = n log c − (n/2) log(2πσ²) + Σ_{m=1}^{n} log[(X_m − β)² + δ] − (1/2) Σ_{m=1}^{n} Y_m²,  (4)

where Y_m = (X_m − μ)/σ, m = 1, …, n, and θ = (μ, σ², β, δ). Therefore, the MLE θ̂ is obtained by maximizing the function (4). The Fisher information matrix (FIM), related to the maximum likelihood equations and second derivatives with respect to θ, is
I(θ) =
| I_μμ  I_μσ  I_μβ  I_μδ |
| I_σμ  I_σσ  I_σβ  I_σδ |
| I_βμ  I_βσ  I_ββ  I_βδ |
| I_δμ  I_δσ  I_δβ  I_δδ |,  (5)

where its elements, I_{θᵢθⱼ} = −∂²ℓ(θ; X)/∂θᵢ∂θⱼ, θ = (μ, σ², β, δ), with the subscript σ denoting differentiation with respect to σ², are

I_μμ = 2nc − n(2cλ)² + n/σ²,
I_σσ = −nc² − n/(2σ⁴) + (1/σ⁶) Σ_{i=1}^{n} (Xᵢ − μ)²,
I_ββ = I_μμ − n/σ² − 2 Σ_{i=1}^{n} [δ − (Xᵢ − β)²]/[δ + (Xᵢ − β)²]²,
I_δδ = −nc² + Σ_{i=1}^{n} 1/[δ + (Xᵢ − β)²]²,
I_μσ = 2nλc² + (n/σ⁴)(X̄ − μ) = I_σμ,
I_μβ = n/σ² − I_μμ = I_βμ,
I_μδ = 2nλc² = I_δμ,
I_σβ = −2nλc² = I_βσ,
I_σδ = −nc² = I_δσ,
I_βδ = −I_μδ − 2 Σ_{i=1}^{n} (Xᵢ − β)/[δ + (Xᵢ − β)²]² = I_δβ,

where X̄ = (1/n) Σ_{i=1}^{n} Xᵢ. It can be seen that the FIM of Equation (5) is regular for all δ ≥ 0.

3. Information Measures

In the next subsections, we present the main results of information measures for BSSN distribution.

3.1. Shannon Entropy

The concept of entropy quantifies the uncertainty of information. Of all the entropies presented in the literature, we focus on the Shannon entropy (SE) [13]. The SE of a random variable Z with pdf f(z) is defined as the expected value

H(Z) = E[−log f(Z)] = −∫_ℝ f(z) log f(z) dz,  (6)

where E[g(Z)] denotes the expected value of a function g(z) under the distribution of Z. In this case, the SE is the expected value of the function g(z) = −log f(z), which satisfies g(1) = 0 and g(0) = ∞. We use this notation for all expected values in this paper.
Proposition 1.
Let X ~ BSSN(μ, σ², β, δ) with pdf defined in Equation (1). The SE of X is given by

H(X) = (1/2) log(2πσ²/c²) + ((cσ)²/2)[3σ² + 4δ + 4λ² + ((λ² + δ)/σ)²] − E[log{(X − β)² + δ}],  (7)

with λ = β − μ.
Proof. 
From Equation (1), we have

log f(x) = log c + log[(x − β)² + δ] − (1/2) log(2πσ²) − (1/(2σ²))(x − μ)².  (8)

Then, from the definition of SE given in Equation (6), we have

H(X) = −∫_ℝ f(x) log c dx + ∫_ℝ (1/2) log(2πσ²) f(x) dx − ∫_ℝ log[(x − β)² + δ] f(x) dx + ∫_ℝ (1/(2σ²))(x − μ)² f(x) dx
= −log c + (1/2) log(2πσ²) + (1/(2σ²)) E[(X − μ)²] − E[log{(X − β)² + δ}].

Given that E[(X − μ)²] = Var[X − μ] + E[X − μ]² = Var[X] + (E[X] − μ)², the result for H(X) follows from Equations (2) and (3) and some basic algebra. □
For any δ, the expected values in Equation (7) are not directly computable. However, the integrals can be evaluated numerically using the integrate function of the R software [15], based on the QUADPACK routines [16]. Several cases of the SE given in Equation (7) are illustrated in the left panel of Figure 2 for δ = 0.1 to 20. The SE is positive and reaches its maximum value for the largest values of β and 0 < δ < 5 (where more bimodality exists). As highlighted in Section 2, the SE of a BSSN random variable tends to the SE of a normal one,

H(X) → (1/2) log(2πσ²e) = H(X_N),

as δ → ∞ or |β| → ∞, where X_N ~ N(μ, σ²) [17]. Therefore, for the highest values of δ, the SE decreases and converges to the normal SE, H(X_N) = 2.224, with σ² = 5. It can be shown that H(X) ≈ H(X_N) for δ ≥ 500.
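The quadrature evaluation described above can be reproduced in Python as well; the following sketch (an assumed equivalent of the integrate/QUADPACK call, with illustrative parameters) computes the SE of Equation (6) numerically and checks the approach to the normal entropy H(X_N) for large δ.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, mu, np.sqrt(sigma2))

def bssn_entropy(mu, sigma2, beta, delta):
    """Shannon entropy H(X) = -E[log f(X)] of Eq. (6), by quadrature."""
    def integrand(x):
        fx = bssn_pdf(x, mu, sigma2, beta, delta)
        return -fx * np.log(fx) if fx > 0 else 0.0
    return quad(integrand, -np.inf, np.inf)[0]

sigma2 = 5.0
h_normal = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # H(X_N), about 2.224
h_bimodal = bssn_entropy(0.0, sigma2, 1.0, 0.5)      # small delta: bimodal case
h_limit = bssn_entropy(0.0, sigma2, 1.0, 500.0)      # large delta: near-normal case
```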
From the expected value given in Equation (8), we can consider the second-order polynomial p(x) = x² − 2xβ + β² + δ. This polynomial has discriminant Δ = −4δ. Given that δ ≥ 0, we have two cases for the possible roots, u₁ and u₂, of p(x) = (x − u₁)(x − u₂):
(i)
δ = 0 ⇒ Δ = 0: u₁ = u₂ = β (real and equal roots). Thus, f(x) = c*(x − β)² φ(x; μ, σ²), with c* = [λ² + σ²]⁻¹. However, in this case X does not present bimodality, and p(x) > 0 for all x ∈ ℝ∖{β}.
(ii)
δ > 0 ⇒ Δ < 0: u₁ = β + i√δ and u₂ = β − i√δ, i = √(−1) (complex conjugate roots). However, x is defined on the real line ℝ.
Considering cases (i) and (ii), the SE exists and is finite, since p(x) > 0 for all x ∈ ℝ∖{β} when δ = 0, and for all x ∈ ℝ when δ > 0. These cases are illustrated in the right panel of Figure 2. Red dots correspond to roots without real part (β = 0), and the other dots correspond to β ≠ 0. Given that Δ < 0, several dots are related to √δ, arranged in circle-like shapes.

3.2. Kullback-Leibler Divergence

Another measure related to the SE is the Kullback–Leibler (KL) divergence [14]. It measures the degree of divergence between the distributions of two random variables, Z₁ and Z₂, with pdfs f(z) and g(z), respectively. The KL divergence of the pdf of Z₁ from the pdf of Z₂ is defined by

K(Z₁, Z₂) = E[log(f(Z₁)/g(Z₁))] = ∫_ℝ f(z) log(f(z)/g(z)) dz,  (9)

where, as indicated in the notation, the expectation is taken with respect to the pdf f. We note that K(Z_j, Z_j) = 0, j = 1, 2, but in general K(Z_j, Z_k) ≠ K(Z_k, Z_j), j, k = 1, 2, j ≠ k, unless Z_j =_d Z_k; i.e., the KL divergence is not symmetric. An important property of the KL divergence is that it is non-negative: K(Z_j, Z_k) ≥ 0 for all Z₁, Z₂. Given that the KL divergence does not satisfy the triangle inequality, it must be interpreted as a pseudo-distance measure [17].
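These properties are easy to verify numerically. The Python sketch below (illustrative parameter values; logarithms of the densities are used for numerical stability in the tails) approximates Equation (9) by quadrature and checks non-negativity and asymmetry.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def log_bssn_pdf(x, mu, sigma2, beta, delta):
    """log of the BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return np.log(c * ((x - beta)**2 + delta)) + norm.logpdf(x, mu, np.sqrt(sigma2))

def kl_divergence(p1, p2):
    """K(Z1, Z2) of Eq. (9), by quadrature; p = (mu, sigma2, beta, delta)."""
    def integrand(x):
        lf, lg = log_bssn_pdf(x, *p1), log_bssn_pdf(x, *p2)
        return np.exp(lf) * (lf - lg)
    return quad(integrand, -np.inf, np.inf)[0]

p1 = (0.0, 5.0, 1.0, 2.0)   # illustrative BSSN parameters for Z1
p2 = (0.5, 5.0, 0.0, 10.0)  # illustrative BSSN parameters for Z2
k12, k21 = kl_divergence(p1, p2), kl_divergence(p2, p1)
```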
Proposition 2.
Let Z_j ~ BSSN(μ_j, σ_j², β_j, δ_j), j = 1, 2, both with pdf defined in Equation (1). The KL divergence between Z₁ and Z₂ is given by

K(Z₁, Z₂) = log(c₁σ₂/(c₂σ₁)) + (1/2)(1/σ₂² − 1/σ₁²) c₁²σ₁²(3σ₁⁴ + 4δ₁σ₁² + [λ₁² + δ₁]²) + (1/(2σ₂²))(μ₁ − μ₂ − 2c₁λ₁σ₁²)² − 2c₁²λ₁²σ₁² + E[log{((Z₁ − β₁)² + δ₁)/((Z₁ − β₂)² + δ₂)}],  (10)

with c_j = [λ_j² + σ_j² + δ_j]⁻¹ and λ_j = β_j − μ_j, j = 1, 2.
Proof. 
Assuming that Z₁ and Z₂ have pdfs f and g, respectively, from Equation (1) we get

log g(x) = log c₂ + log[(x − β₂)² + δ₂] − (1/2) log(2πσ₂²) − (1/(2σ₂²))(x − μ₂)².  (11)

Then, from the definition of the KL divergence given in Equation (9), we get

K(Z₁, Z₂) = −∫_ℝ f(z₁) log g(z₁) dz₁ − H(Z₁)
= −∫_ℝ log[(z₁ − β₂)² + δ₂] f(z₁) dz₁ + ∫_ℝ (1/(2σ₂²))(z₁ − μ₂)² f(z₁) dz₁ − ∫_ℝ f(z₁) log c₂ dz₁ + ∫_ℝ (1/2) log(2πσ₂²) f(z₁) dz₁ − H(Z₁)
= (1/2) log(2πσ₂²/c₂²) + (1/(2σ₂²)) E[(Z₁ − μ₂)²] − E[log{(Z₁ − β₂)² + δ₂}] − H(Z₁)
= (1/2) log(2πσ₂²/c₂²) + (1/(2σ₂²))[Var[Z₁] + (E[Z₁] − μ₂)²] − (1/2) log(2πσ₁²/c₁²) − (1/(2σ₁²))[Var[Z₁] + (E[Z₁] − μ₁)²] + E[log{(Z₁ − β₁)² + δ₁}] − E[log{(Z₁ − β₂)² + δ₂}].

Given that E[(Z₁ − μ₂)²] = Var[Z₁ − μ₂] + E[Z₁ − μ₂]² = Var[Z₁] + (E[Z₁] − μ₂)², the result follows from Equations (2) and (3), Proposition 1, and some basic algebra. □
For any δ_j, j = 1, 2, the expected values in Equation (10) are not directly computable. However, the integrals can be evaluated numerically using the integrate function based on the QUADPACK routines [16]. Moreover, we consider two second-order polynomials, p_j(x) = x² − 2xβ_j + β_j² + δ_j, with discriminants Δ_j = −4δ_j, j = 1, 2, respectively. Given that δ_j ≥ 0, we get four cases for the possible roots, u_{j,k}, of p_j(x) = (x − u_{j,1})(x − u_{j,2}), j, k = 1, 2:
(i)
δ_j = 0 ⇒ Δ_j = 0, j = 1, 2: u_{1,1} = u_{1,2} = β₁ and u_{2,1} = u_{2,2} = β₂ (real and equal roots). Thus, f(x) = c₁(x − β₁)² φ(x; μ₁, σ₁²), with c₁ = [λ₁² + σ₁²]⁻¹, and g(x) = c₂(x − β₂)² φ(x; μ₂, σ₂²), with c₂ = [λ₂² + σ₂²]⁻¹. However, neither density presents bimodality. Thus, p₁(x) > 0 for all x ∈ ℝ∖{β₁}, and p₂(x) > 0 for all x ∈ ℝ∖{β₂}.
(ii)
δ_j > 0 ⇒ Δ_j < 0, j = 1, 2: u_{j,1} = β_j + i√δ_j and u_{j,2} = β_j − i√δ_j (complex conjugate roots). However, z₁ is defined on the real line ℝ.
(iii)
δ₁ = 0 ⇒ Δ₁ = 0 and δ₂ > 0 ⇒ Δ₂ < 0: u_{1,1} = u_{1,2} = β₁, u_{2,1} = β₂ + i√δ₂, and u_{2,2} = β₂ − i√δ₂ (complex conjugate roots for p₂). Thus, f(x) = c₁(x − β₁)² φ(x; μ₁, σ₁²), with c₁ = [λ₁² + σ₁²]⁻¹. However, f(x) does not present bimodality, and z₁ is defined on the real line ℝ. So, p₁(x) > 0 for all x ∈ ℝ∖{β₁}.
(iv)
δ₁ > 0 ⇒ Δ₁ < 0 and δ₂ = 0 ⇒ Δ₂ = 0: u_{2,1} = u_{2,2} = β₂, u_{1,1} = β₁ + i√δ₁, and u_{1,2} = β₁ − i√δ₁ (complex conjugate roots for p₁). Hence, g(x) = c₂(x − β₂)² φ(x; μ₂, σ₂²), with c₂ = [λ₂² + σ₂²]⁻¹. However, g(x) does not present bimodality, and z₁ is defined on the real line ℝ. Therefore, p₂(x) > 0 for all x ∈ ℝ∖{β₂}.
All of these cases are analogous to those illustrated in the right panel of Figure 2.
Corollary 1.
Let Z ~ BSSN(μ, σ², β, δ) and Z₀ ~ BSSN(μ, σ², β, δ₀), both with pdf defined in Equation (1). The KL divergence between Z and Z₀ is given by

K(Z, Z₀) = log[(λ² + σ² + δ₀)/(λ² + σ² + δ)] + E[log{((Z − β)² + δ)/((Z − β)² + δ₀)}],  (12)

with λ = β − μ.
Proof. 
The result is straightforward from Proposition 2 (by replacing μ = μ 1 = μ 2 , σ = σ 1 = σ 2 , β = β 1 = β 2 , δ = δ 1 and δ 2 = δ 0 ) and some basic algebra. □
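Corollary 1 can be sanity-checked against a brute-force quadrature of Equation (9). In the Python sketch below (illustrative parameter values), the closed form of Equation (12), whose remaining expectation is itself evaluated numerically, agrees with the direct KL integral.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, mu, np.sqrt(sigma2))

def kl_corollary1(mu, sigma2, beta, delta, delta0):
    """K(Z, Z0) via Eq. (12); the E[log ...] term is evaluated by quadrature."""
    lam = beta - mu
    term1 = np.log((lam**2 + sigma2 + delta0) / (lam**2 + sigma2 + delta))
    expect = quad(lambda x: bssn_pdf(x, mu, sigma2, beta, delta)
                  * np.log(((x - beta)**2 + delta) / ((x - beta)**2 + delta0)),
                  -np.inf, np.inf)[0]
    return term1 + expect

def kl_direct(mu, sigma2, beta, delta, delta0):
    """Brute-force quadrature of Eq. (9) for the same pair (Z, Z0)."""
    def integrand(x):
        f = bssn_pdf(x, mu, sigma2, beta, delta)
        g = bssn_pdf(x, mu, sigma2, beta, delta0)
        return f * np.log(f / g) if f > 0 else 0.0
    return quad(integrand, -np.inf, np.inf)[0]

k_closed = kl_corollary1(0.0, 5.0, 1.0, 2.0, 10.0)
k_direct = kl_direct(0.0, 5.0, 1.0, 2.0, 10.0)
```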
As highlighted in Section 2, the KL divergence between two BSSN random variables tends to the KL divergence between two normal ones [17],

K(Z₁, Z₂) → (1/2)[log(σ₂²/σ₁²) + σ₁²/σ₂² + (μ₁ − μ₂)²/σ₂² − 1] = K(X₁, X₂),

as δ_j → ∞ or |β_j| → ∞, where X_j ~ N(μ_j, σ_j²), j = 1, 2.
Figure 3 (left) illustrates the numerical behavior of the KL divergence between two BSSN distributions for different δ₁ and δ₂ parameters. Specifically, we can observe the behavior of the KL divergence given in Proposition 2: for δ₁ ≈ δ₂, the KL divergence tends to zero but is always non-negative, whereas for δ₁ and δ₂ far apart it attains its highest values. The right panel illustrates the cases δ₁ = {0.5, …, 100} and δ₂ = {0, 2, 5, 10}, where the KL divergence converges to 1.269 when δ₂ = 0 (see Equation (12)) as δ₁ → ∞, and increases for δ₁ between 0 and 100. For δ₁, δ₂ > 0, the KL divergence decreases because more similarity exists between the bimodality parameters.

3.3. Jeffreys Divergence

As the KL divergence is not symmetric, the Jeffreys (J) divergence [18] is considered as a symmetrized version of the KL divergence, defined as

J(Z₁, Z₂) = K(Z₁, Z₂) + K(Z₂, Z₁) = J(Z₂, Z₁).  (13)

The J divergence does not satisfy the triangle inequality of a distance, so it is also a pseudo-distance measure.
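A quick numerical check of the symmetrization (Python sketch, illustrative parameter values) confirms that J is symmetric and positive for distinct distributions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def log_bssn_pdf(x, mu, sigma2, beta, delta):
    """log of the BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return np.log(c * ((x - beta)**2 + delta)) + norm.logpdf(x, mu, np.sqrt(sigma2))

def kl(p1, p2):
    """K(Z1, Z2) of Eq. (9), by quadrature."""
    return quad(lambda x: np.exp(log_bssn_pdf(x, *p1))
                * (log_bssn_pdf(x, *p1) - log_bssn_pdf(x, *p2)),
                -np.inf, np.inf)[0]

def jeffreys(p1, p2):
    """J(Z1, Z2) = K(Z1, Z2) + K(Z2, Z1), Eq. (13)."""
    return kl(p1, p2) + kl(p2, p1)

p1, p2 = (0.0, 5.0, 1.0, 2.0), (0.5, 5.0, 0.0, 10.0)
j12, j21 = jeffreys(p1, p2), jeffreys(p2, p1)
```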
Corollary 2.
Let Z_j ~ BSSN(μ_j, σ_j², β_j, δ_j), j = 1, 2, both with pdf defined in Equation (1). The J divergence between Z₁ and Z₂ is given by

J(Z₁, Z₂) = (1/2)(1/σ₂² − 1/σ₁²)[c₁²σ₁²(3σ₁⁴ + 4δ₁σ₁² + [λ₁² + δ₁]²) − c₂²σ₂²(3σ₂⁴ + 4δ₂σ₂² + [λ₂² + δ₂]²)] + (1/(2σ₂²))(μ₁ − μ₂ − 2c₁λ₁σ₁²)² + (1/(2σ₁²))(μ₂ − μ₁ − 2c₂λ₂σ₂²)² − 2c₁²λ₁²σ₁² − 2c₂²λ₂²σ₂² + E[log{((Z₁ − β₁)² + δ₁)/((Z₁ − β₂)² + δ₂)}] + E[log{((Z₂ − β₂)² + δ₂)/((Z₂ − β₁)² + δ₁)}],  (14)

with c_j = [λ_j² + σ_j² + δ_j]⁻¹ and λ_j = β_j − μ_j, j = 1, 2.
Proof. 
The result is straightforward from the definition given in Equation (13), Proposition 2, and some basic algebra. □
As mentioned in Section 2, the J divergence between two BSSN random variables tends to the J divergence between two normal ones [17],

J(Z₁, Z₂) → (1/2)[σ₁²/σ₂² + σ₂²/σ₁² + (μ₁ − μ₂)²(1/σ₂² + 1/σ₁²) − 2] = J(X₁, X₂),

as δ_j → ∞ or |β_j| → ∞, where X_j ~ N(μ_j, σ_j²), j = 1, 2.

4. Bimodality Test

First, an analytical tool is necessary to determine the set of δ values for which bimodality exists. Following Proposition 2.5 of [10], the steps presented next determine these values for given μ, β, and σ² parameters.

4.1. Bimodality

Let f⁽ᵏ⁾(x) = ∂ᵏf(x)/∂xᵏ be the kth derivative of f(x) with respect to x, k = 1, 2. We have

f⁽¹⁾(x) = 2c{(x − β) − (1/(2σ²))(x − μ)[(x − β)² + δ]} φ(x; μ, σ²),

f⁽²⁾(x) = 2c{1 + (1/(2σ⁴))(x − μ)²[(x − β)² + δ] − (1/(2σ²))[(x − β)² + 4(x − μ)(x − β) + δ]} φ(x; μ, σ²).
Thus, the pdf of Equation (1) is bimodal if there exists a δ₀ ≥ 0 such that δ < δ₀, in the following cases:
(i)
if μ = β, then δ₀ = 2σ²;
(ii)
if μ ≠ β, then f⁽¹⁾(x) = 0 implies finding the three roots v₁, v₂, and v₃ (v₁ < v₂ < v₃) of the third-degree polynomial r(x) = a₃x³ + a₂x² + a₁x + a₀ = 0, with

a₃ = 1/(2σ²),
a₂ = −(1/(2σ²))(2β + μ),
a₁ = (1/(2σ²))(β² + 2βμ + δ − 2σ²),
a₀ = −(1/(2σ²))(β²μ + δμ − 2βσ²).
For given μ, σ², and β parameters, μ ≠ β, the polynomial r(x) can be solved for v₂ in terms of δ, and the inequality f⁽²⁾(v₂) > 0 can be used to determine δ₀. This implies that

δ < [2σ⁴ + (v₂ − μ)²(v₂ − β)² − σ²(v₂ − β)(5v₂ − 4μ − β)] / [σ² − (v₂ − μ)²] = δ₀.  (15)

Therefore, since δ < δ₀, the upper bound given in Equation (15) can be used for detecting bimodality if δ₀ > 0 for a given root v₂ of r(x), v₁ < v₂ < v₃, and given μ, σ², and β parameters.
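Case (i) above is easy to verify numerically: for μ = β, the density is bimodal exactly when δ < δ₀ = 2σ². The Python sketch below counts the local maxima of the pdf on a fine grid, a numerical stand-in for solving f⁽¹⁾(x) = 0 (grid width and resolution are ad hoc choices).

```python
import numpy as np
from scipy.stats import norm

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, mu, np.sqrt(sigma2))

def count_modes(mu, sigma2, beta, delta, n_grid=20001):
    """Count local maxima of the BSSN pdf on a fine grid."""
    s = np.sqrt(sigma2)
    x = np.linspace(min(mu, beta) - 8 * s, max(mu, beta) + 8 * s, n_grid)
    f = bssn_pdf(x, mu, sigma2, beta, delta)
    d = np.diff(f)
    # a mode is a sign change of the discrete derivative from + to -
    return int(np.sum((d[:-1] > 0) & (d[1:] < 0)))

# case (i): mu = beta = 0, sigma2 = 5, so delta0 = 2 * sigma2 = 10
modes_bimodal = count_modes(0.0, 5.0, 0.0, 2.0)    # delta < delta0
modes_unimodal = count_modes(0.0, 5.0, 0.0, 20.0)  # delta > delta0
```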

4.2. Asymptotic Test

The results given in [12] can be applied, for example, to construct a bimodality test from the KL divergence presented in Corollary 1 between a regular BSSN distribution and a BSSN distribution without bimodality. Specifically, consider a random sample X₁, …, Xₙ from X ~ BSSN(μ, σ², β, δ) and the null (H₀) and alternative (H₁) hypotheses

H₀: δ ≤ δ₀ versus H₁: δ > δ₀,  (16)

where the null and alternative hypotheses refer to bimodality and unimodality, respectively. Thus, the BSSN random variable X becomes a BSSN(μ, σ², β, δ₀) random variable for a specific value δ₀ under H₀. The value δ₀ can be selected using, for example, the criteria explained in cases (i) and (ii) of Section 4.1.
Proposition 3.
Let θ̂ = (μ̂, σ̂², β̂, δ̂) be the MLE of θ = (μ, σ², β, δ) as in Section 2, and θ̂₀ = (μ̂, σ̂², β̂, δ₀). Then, under H₀, we have

S_K(θ̂, θ̂₀) = 2n K̂(Z, Z₀) →_d χ₁², as n → ∞,  (17)

where χ₁² denotes the chi-square distribution with 1 degree of freedom, and K̂(Z, Z₀) is the MLE of K(Z, Z₀) defined in Equation (12) of Corollary 1.
Proof. 
The result is straightforward from ([12], p. 375). □
Under the specifications of Proposition 3, the statistic S_K(θ̂, θ̂₀) depends only on δ̂ and n. As stated in Section 2 and Section 3, unimodality is typically obtained from the BSSN class for δ ≥ δ₀. Given that the FIM is regular and the regularity conditions (i), (ii), and (iii) stated in ([12], p. 375) are satisfied, it is possible to test bimodality via the hypothesis testing of Proposition 3. Let

C = {X₁, …, Xₙ | S_K(θ̂, θ̂₀) ≥ χ²_{1−α}}, 0 < α < 1,

be the critical region related to (16), where P(χ₁² ≤ χ²_{1−α}) = 1 − α. Hence, from Proposition 3, evidence exists to accept the null hypothesis of bimodality given in Equation (16) at level α if

P(χ₁² < S_K(θ̂, θ̂₀)) > 1 − α, or equivalently, P(χ₁² > S_K(θ̂, θ̂₀)) ≤ α.  (18)
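In practice, once K̂(Z, Z₀) is computed by plugging the MLEs into Equation (12), the decision reduces to comparing S_K = 2nK̂(Z, Z₀) with a χ₁² quantile. A minimal Python sketch of the rule in Equation (18) follows (the values of n and K̂ are illustrative, not taken from the paper).

```python
from scipy.stats import chi2

def bimodality_decision(n, k_hat, alpha=0.05):
    """Decision rule of Eq. (18): S_K = 2*n*K_hat is compared with chi2(1).

    Returns (S_K, p_value, accept_H0), where accept_H0 = True means the
    bimodality hypothesis H0 is accepted at level alpha."""
    s_k = 2 * n * k_hat
    p_value = chi2.sf(s_k, df=1)            # P(chi2_1 > S_K)
    return s_k, p_value, bool(p_value <= alpha)

# illustrative call: n = 100 observations with K_hat = 0.025 gives S_K = 5.0
s_k, p_value, accept_h0 = bimodality_decision(n=100, k_hat=0.025)
```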
The observed power of the asymptotic bimodality test can be obtained from Equations (17) and (18) for different sample sizes and values of the bimodality parameter. These results were obtained from 1000 simulations at a nominal level of 5%. In each simulation, the estimation of the BSSN model's parameters was carried out by maximizing the likelihood function of Equation (4) over the parameter space of θ, for random samples of size n = 25, 50, 100, and 200. To estimate the parameters and get their standard errors, first the random sample is obtained using the rBSSN function of the gamlssbssn package [19]. Second, the log-likelihood function is computed using the pdf of Equation (1) implemented in the same package. Third, the log-likelihood function is optimized using the mle function included in the stats4 package of the R software [15]. To avoid local maxima, the optimization routine was run using specific starting values for each random sample.
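The estimation scheme can also be sketched outside R (the paper uses rBSSN and mle from the gamlssbssn and stats4 packages). Below is an assumed Python equivalent: a sample is drawn by numerical inversion of the BSSN cdf, and the log-likelihood of Equation (4) is maximized (by minimizing its negative) with scipy.optimize.minimize; the true parameters, sample size, seed, and starting values are all illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Eq. (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, mu, np.sqrt(sigma2))

# draw a pseudo-random BSSN sample by numerical inversion of the cdf
rng = np.random.default_rng(1)
true_theta = (0.0, 5.0, 1.0, 2.0)                  # mu, sigma2, beta, delta
x_grid = np.linspace(-15.0, 15.0, 40001)
cdf = np.cumsum(bssn_pdf(x_grid, *true_theta))
cdf /= cdf[-1]
sample = np.interp(rng.uniform(size=500), cdf, x_grid)

def neg_loglik(theta, x):
    """Negative log-likelihood of Eq. (4); sigma2 and delta are log-parameterized
    so the optimizer respects sigma2 > 0 and delta > 0."""
    mu, log_s2, beta, log_d = theta
    s2, d = np.exp(log_s2), np.exp(log_d)
    lam = beta - mu
    c = 1.0 / (lam**2 + s2 + d)
    return -(len(x) * np.log(c)
             + np.sum(np.log((x - beta)**2 + d))
             + np.sum(norm.logpdf(x, mu, np.sqrt(s2))))

start = np.array([np.mean(sample), np.log(np.var(sample)),
                  np.mean(sample), 0.0])            # crude starting values
fit = minimize(neg_loglik, start, args=(sample,), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
mu_hat, s2_hat = fit.x[0], np.exp(fit.x[1])
beta_hat, d_hat = fit.x[2], np.exp(fit.x[3])
```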
Table 1 shows that the proposed test is highly conservative, since the observed rate of incorrect rejections of the bimodality hypothesis (H₀) is always lower than the nominal level; i.e., for δ ≪ δ₀ and δ ≈ δ₀, the observed power tends to increase and decrease, respectively. The proposed test is also more powerful in large samples (n ≥ 100) and for δ > 0.5. As expected, the power of the test increases with the sample size, given that the statistic S_K(θ̂, θ̂₀) depends on n even though K̂(Z, Z₀) is small (Figure 3).

5. Application to Sea Surface Temperature Data

A real application in this section illustrates the performance of the asymptotic bimodality test. Specifically, we considered the sea surface temperature (SST) data sets presented in [20], which were recorded from 2012 to 2014 by scientific observers of the northern Chilean longline fleet (industrial and artisanal; 21°31′–36°39′ S and 71°08′–85°52′ W). Contreras-Reyes et al. [20] proposed the Skew-Reflected-Gompertz (SRG) model, based on two-piece distributions [21], as suitable for interpreting annual bimodal and asymmetric SST data. The SRG distribution produces two-piece asymmetric and bimodal behavior from the Gompertz (GZ) density.
To estimate the parameters and get their standard errors, the log-likelihood function and its optimization were carried out as in Section 4.2. However, to avoid local maxima, the optimization routine was run using specific starting values obtained by visual inspection of the histograms, which are widely scattered in the parameter space. To evaluate goodness of fit, the Kolmogorov–Smirnov (K–S), Anderson–Darling (A–D), and Cramér–von Mises (C–V) tests were considered for all models. These are commonly used to analyze the goodness of fit of a particular distribution (see, e.g., [20,21]). The tests are implemented in the goftest package [22] of the R software, and all of them used the cumulative distribution function pBSSN of the gamlssbssn package [19]. The proposed asymptotic bimodality test is compared with the nonparametric approach-based asymptotic test (DIPtest), implemented in the diptest package [23].
Considering the smallest Akaike (AIC) and Schwarz (BIC) information criterion values, we observe in Table 2 that the BSSN model performs better than the SRG model (and the other competitors; see the AIC and BIC values reported in Table 1 of [20]). In addition, considering the K–S, A–D, and C–V tests at a 95% confidence level, the BSSN fits perform well for all years (p-values higher than 0.05 indicate an appropriate goodness of fit). Figure 4 illustrates this performance, where more than one mode is present in the histograms. The most pronounced bimodality emerged in 2014.
The parameters estimated from the BSSN model, presented in Table 2, are used to compute the SE and KL divergence for the SST of each year and to perform the asymptotic test of Section 4.2. The determination of δ₀ was conducted using the procedure explained in Section 4.1. The results of these analyses appear in Table 3, where K̂(Z, Z₀) represents the KL divergence under the null hypothesis. The Shannon entropies illustrate that most SST information comes from 2013. In addition, the asymptotic test presented in Table 3 is analogous for all years: the null hypothesis H₀ of bimodality is accepted at the 95% confidence level according to Equation (18). This acceptance is reinforced by the large sample sizes and by the DIPtest results, where rejection (p-value < 0.05) implies at least bimodality.

6. Conclusions

We have presented a methodology to compute the Shannon entropy and the Kullback–Leibler and Jeffreys divergences for the family of bimodal skew-symmetric normal distributions. Given the regularity conditions satisfied by the BSSN distribution, specifically the regularity of its Fisher information matrix, an asymptotic test for bimodality was developed. A statistical application to South Pacific sea surface temperature was given, illustrating that the asymptotic test detected strong evidence of bimodality in the samples of three years. This approach can be applied to real models and used for data analysis in various systems, such as Arctic sea temperature [24] and biological [25] data.
The main result is that information measures and asymptotic tests can be employed for bimodal distributions (if the regularity conditions are satisfied [12]) and are flexible enough for complex data. Compared with the DIPtest [9,23] and kurtosis measures [7,8], the proposed asymptotic test for bimodality presents the following novelties: (i) it is built under a parametric approach (a known distribution); (ii) it is based on information measures; and (iii) it considers the regularity conditions of the BSSN distribution. In addition, the computation of information quantifiers of BSSN distributions is a more adequate tool compared with the information quantifiers obtained for finite mixtures of flexible distributions, where the Shannon entropy is only approximated by bounds [26].
Finally, we encourage researchers to consider the proposed methodology for further investigations with other bimodal distributions, such as the bimodal normal distribution [11], the extension proposed in Equation (19) of [10], or the generalized bimodal skew-normal distribution proposed by [27].

Author Contributions

J.E.C.-R. wrote the paper and contributed reagents/analysis/materials tools; conceived, designed and performed the experiments and analyzed the data. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

I am grateful to Daniel Devia Cortés (IFOP) for providing access to the data used in this work. Author’s research was fully supported by Grant FONDECYT (Chile) No. 11190116. The author thanks the editor and two anonymous referees for their helpful comments and suggestions. All R codes and data used in this paper are available by request to the corresponding author.

Conflicts of Interest

The author declares that there is no conflict of interest in the publication of this paper.

References

  1. Wyszomirski, T. Detecting and displaying size bimodality: Kurtosis, skewness and bimodalizable distributions. J. Theor. Biol. 1992, 158, 109–128. [Google Scholar] [CrossRef]
  2. Ashman, K.M.; Bird, C.M.; Zepf, S.E. Detecting bimodality in astronomical datasets. Astr. J. 1994, 108, 2348–2361. [Google Scholar] [CrossRef]
  3. Hosenfeld, B.; Van Der Maas, H.L.; Van den Boom, D.C. Detecting bimodality in the analogical reasoning performance of elementary schoolchildren. Int. J. Behav. Dev. 1997, 20, 529–547. [Google Scholar] [CrossRef] [Green Version]
  4. Bao, X.; Tang, O.; Ji, J. Applying the minimum relative entropy method for bimodal distribution in a remanufacturing system. Int. J. Prod. Econ. 2008, 113, 969–979. [Google Scholar] [CrossRef]
  5. Freeman, J.B.; Dale, R. Assessing bimodality to detect the presence of a dual cognitive process. Behav. Res. Methods 2013, 45, 83–97. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Shalek, A.K.; Satija, R.; Adiconis, X.; Gertner, R.S.; Gaublomme, J.T.; Raychowdhury, R.; Schwartz, S.; Yosef, N.; Malboeuf, C.; Lu, D.; et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 2013, 498, 236–240. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Darlington, R.B. Is kurtosis really “peakedness?”. Am. Stat. 1970, 24, 19–22. [Google Scholar]
  8. Hildebrand, D.K. Kurtosis measures bimodality? Am. Stat. 1971, 25, 42–43. [Google Scholar]
  9. Hartigan, J.A.; Hartigan, P.M. The dip test of unimodality. Ann. Stat. 1985, 13, 70–84. [Google Scholar] [CrossRef]
  10. Hassan, M.Y.; El-Bassiouni, M.Y. Bimodal skew-symmetric normal distribution. Commun. Stat. Theory Methods 2016, 45, 1527–1541. [Google Scholar] [CrossRef]
  11. Hassan, M.Y.; Hijazi, R. A bimodal exponential power distribution. Pak. J. Stat. 2010, 26, 379–396. [Google Scholar]
12. Salicrú, M.; Menéndez, M.L.; Pardo, L.; Morales, D. On the applications of divergence type measures in testing statistical hypotheses. J. Multivar. Anal. 1994, 51, 372–391.
13. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2006.
14. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
15. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. Available online: http://www.R-project.org (accessed on 14 April 2020).
16. Piessens, R.; de Doncker-Kapenga, E.; Überhuber, C.; Kahaner, D. Quadpack: A Subroutine Package for Automatic Integration; Springer: Berlin, Germany, 1983.
17. Contreras-Reyes, J.E. Asymptotic form of the Kullback–Leibler divergence for multivariate asymmetric heavy-tailed distributions. Phys. A 2014, 395, 200–208.
18. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. A 1946, 186, 453–461.
19. Hossain, A.; Rigby, R.; Stasinopoulos, M. R Package gamlssbssn: Bimodal Skew Symmetric Normal Distribution (Version 0.1.0). 2017. Available online: https://cran.r-project.org/web/packages/gamlssbssn/index.html (accessed on 14 April 2020).
20. Contreras-Reyes, J.E.; Maleki, M.; Cortés, D.D. Skew-Reflected-Gompertz information quantifiers with application to sea surface temperature records. Mathematics 2019, 7, 403.
21. Hoseinzadeh, A.; Maleki, M.; Khodadadi, Z.; Contreras-Reyes, J.E. The Skew-Reflected-Gompertz distribution for analyzing symmetric and asymmetric data. J. Comput. Appl. Math. 2019, 349, 132–141.
22. Faraway, J.; Marsaglia, G.; Marsaglia, J.; Baddeley, A. R Package goftest: Classical Goodness-of-Fit Tests for Univariate Distributions (Version 1.2-2). 2019. Available online: https://cran.r-project.org/web/packages/goftest/index.html (accessed on 14 April 2020).
23. Maechler, M. R Package diptest: Hartigan's Dip Test Statistic for Unimodality - Corrected (Version 0.75-7). 2016. Available online: https://cran.r-project.org/web/packages/diptest/index.html (accessed on 1 May 2020).
24. Lorentzen, T. Statistical analysis of temperature data sampled at Station-M in the Norwegian Sea. J. Mar. Syst. 2014, 130, 31–45.
25. Contreras-Reyes, J.E.; Canales, T.M.; Rojas, P.M. Influence of climate variability on anchovy reproductive timing off northern Chile. J. Mar. Syst. 2016, 164, 67–75.
26. Contreras-Reyes, J.E.; Cortés, D.D. Bounds on Rényi and Shannon entropies for finite mixtures of multivariate skew-normal distributions: Application to swordfish (Xiphias gladius Linnaeus). Entropy 2016, 18, 382.
27. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul. 2018, 88, 156–181.
Figure 1. Various shapes of the pdfs of X ∼ BSSN(μ, σ², β, δ), with σ² = 5 and δ = 0, 2, 5, 10 (black, red, blue and violet lines, respectively), for the parameters (a) μ = 1, β = 1; (b) μ = 1, β = 0; (c) μ = 1, β = 0.5; and (d) μ = 1, β = 0.5.
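The shapes in Figure 1 can be reproduced numerically. The following is a minimal sketch, assuming the quadratic-times-normal form f(y) ∝ (δ + (y − β)²) exp(−(y − μ)²/(2σ²)) often associated with this family (see [19] for the exact BSSN parameterization); the normalizing constant below follows from E[(Y − β)²] = σ² + (μ − β)² under N(μ, σ²):

```python
import math

def bssn_pdf(y, mu, sigma2, beta, delta):
    # Assumed BSSN form: a quadratic factor times a normal kernel,
    # normalized so the density integrates to one.
    norm_const = delta + sigma2 + (mu - beta) ** 2
    kernel = math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
    return (delta + (y - beta) ** 2) * kernel / norm_const

# Crude Riemann-sum check that the density integrates to ~1
# (mu = beta = 1, sigma2 = 5, delta = 2, close to the Figure 1 settings)
h = 0.001
total = sum(bssn_pdf(-30.0 + i * h, 1.0, 5.0, 1.0, 2.0) * h for i in range(60000))
```

Under this assumed form, when β = μ the density is bimodal exactly when δ < 2σ², which is consistent with the transition visible across δ = 0, 2, 5, 10 in Figure 1; this is an illustrative sketch rather than the gamlssbssn implementation.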
Figure 2. (Left) Shannon entropy for X ∼ BSSN(μ, σ², β, δ) using several combinations of μ, β and δ = 0.1, 0.2, …, 20. (Right) Inverse roots of p(x) = (x − u1)(x − u2) in the unit circle for the same values of β and δ used in the left panel, where the inverse roots 1/u1 and 1/u2 are plotted by their real (x-axis) and imaginary (y-axis) parts, respectively.
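The entropy surface summarized in Figure 2 (left) comes from evaluating H(Z) = −∫ f(z) log f(z) dz numerically. A minimal quadrature sketch follows; a centered normal density is used as the integrand because its entropy has the closed form ½ log(2πeσ²) and so provides a sanity check (the paper itself integrates the BSSN density):

```python
import math

def shannon_entropy(pdf, lo, hi, n=200000):
    # Midpoint Riemann approximation of H = -integral of f(y) * log f(y) dy.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fy = pdf(lo + (i + 0.5) * h)
        if fy > 0.0:
            total -= fy * math.log(fy) * h
    return total

sigma2 = 5.0
normal_pdf = lambda y: math.exp(-y * y / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
H = shannon_entropy(normal_pdf, -40.0, 40.0)
# Closed form for comparison: 0.5 * log(2 * pi * e * sigma2)
```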
Figure 3. (Left) KL divergence between Z1 ∼ BSSN(1, 5, 1, δ1) and Z2 ∼ BSSN(1, 5, 1, δ2), for δ = 0.1, 0.2, …, 20. (Right) KL divergence between Z1 and Z2, with Zj ∼ BSSN(μ, σ², β, δj), j = 1, 2, for δ1 = 0.5, …, 100, δ2 = 0, 2, 5, 10, and the same parameters μ, σ² and β as in Figures 1 and 2.
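The KL curves in Figure 3 likewise require numerical integration (the paper cites the QUADPACK routines [16] for this purpose). As a hedged sketch, normal densities stand in here for the BSSN ones, so the result can be validated against the closed form K = (μ1 − μ2)²/(2σ²) for two normals with common variance:

```python
import math

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def kl_divergence(p, q, lo, hi, n=200000):
    # Midpoint Riemann approximation of K(p, q) = integral of p(y) * log(p(y)/q(y)) dy.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        py, qy = p(y), q(y)
        if py > 0.0 and qy > 0.0:
            total += py * math.log(py / qy) * h
    return total

# Two N(+1, 5) and N(-1, 5) densities: closed-form KL is (1 - (-1))^2 / (2 * 5) = 0.4
k = kl_divergence(lambda y: normal_pdf(y, 1.0, 5.0),
                  lambda y: normal_pdf(y, -1.0, 5.0), -50.0, 50.0)
```

The same routine applies to the BSSN case by substituting the fitted densities for p and q.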
Figure 4. Histograms of SST data by year and their respective MLE fits of the BSSN model (solid line).
Table 1. Observed power (in %) of the proposed bimodality test using the MLE of the BSSN model, from 1000 simulations at nominal level 5%, with location parameters μ = 1 and β = 0 (see Figure 1b), various values of the bimodality parameters δ and δ0, and sample size n.
                              δ0
  n     δ      0.5      1      2      3      5      7     10
 25   0.5    25.40  17.63  19.78  34.31  59.97  75.69  86.49
      2      44.27  30.19  23.01  21.39  33.61  48.22  65.56
      5      75.19  56.85  34.51  26.77  25.05  31.91  38.73
      7      84.04  70.12  40.21  32.34  23.00  26.96  34.07
 50   0.5    18.14  16.64  47.95  72.14  93.58  97.52  99.23
      2      62.58  35.97  23.89  33.05  59.77  77.14  87.29
      5      94.42  81.45  51.11  31.96  25.90  36.40  49.09
      7      97.68  87.83  64.03  47.04  26.01  26.72  36.61
100   0.5    19.70  29.65  81.74  95.53  99.75  99.87 100.00
      2      79.76  48.59  24.22  39.92  77.82  92.45  97.79
      5      99.90  96.20  69.70  43.40  26.03  36.60  59.50
      7      99.90  99.20  88.00  66.27  29.20  26.03  41.00
200   0.5    21.37  53.33  95.04 100.00 100.00 100.00 100.00
      2      95.80  70.10  24.92  51.30  93.20  99.30 100.00
      5     100.00 100.00  93.20  60.10  23.30  45.50  80.10
      7     100.00 100.00  99.40  88.90  38.80  24.20  45.60
Table 2. Parameter estimates and their respective standard deviations (S.D.) for SST by year based on the BSSN model. For each fit, the log-likelihood ℓ(θ) with θ = (μ, σ², β, δ), the Akaike (AIC) and Schwarz (BIC) information criteria, and the goodness-of-fit tests (Kolmogorov–Smirnov (K–S), Anderson–Darling (A–D), and Cramér–von Mises (C–V)) are also reported, with respective p-values in parentheses.
Year         Param.  Estim.   (S.D.)    ℓ(θ)      AIC     BIC    K–S     A–D     C–V
2012         μ       19.007   0.078   −1396.1   2800.3  2818.9  0.042   1.760   0.233
(n = 774)    σ²       1.434   0.020                             (0.13)  (0.13)  (0.21)
             β       19.670   0.151
             δ        1.746   0.384
2013         μ       18.187   0.068    −683.71  1375.4  1391.5  0.035   0.636   0.074
(n = 414)    σ²       0.886   0.044                             (0.68)  (0.61)  (0.73)
             β       18.328   0.127
             δ        1.026   0.310
2014         μ       17.628   0.040    −643.62  1295.2  1311.6  0.043   0.518   0.070
(n = 439)    σ²       0.550   0.054                             (0.41)  (0.73)  (0.75)
             β       17.682   0.058
             δ        0.306   0.079
Table 3. BSSN Shannon entropy H(Z), KL divergence K̂(Z, Z0), and the statistic of Equation (17) with respective p-values, reported for the SST data of each year. All reported H(Z), δ̂, and K̂(Z, Z0) estimates use the estimated parameters and sample sizes n reported in Table 2.
Method      Quantifier        2012     2013     2014
Proposed    H(Z)             1.613    1.634    1.455
            δ̂                1.746    1.026    0.306
            δ0               9.273    2.534    2.579
            K̂(Z, Z0)         0.003    0.025    0.138
            Statistic      556.657   21.425  121.087
            p-value          0.971    0.999    1.000
DIPtest     Statistic        0.023    0.029    0.038
            p-value         <0.01     0.016   <0.01
