Article

Confidence Interval Estimation for the Ratio of the Percentiles of Two Delta-Lognormal Distributions with Application to Rainfall Data

by Warisa Thangjai 1,†, Sa-Aat Niwitpong 2,*,†, Suparat Niwitpong 2,† and Narudee Smithpreecha 3,†
1 Department of Statistics, Faculty of Science, Ramkhamhaeng University, Bangkok 10240, Thailand
2 Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
3 Department of Mathematics and Statistics, Faculty of Science and Technology, Rajamangala University of Technology Phra Nakhon, Bangkok 10800, Thailand
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Symmetry 2023, 15(4), 794; https://doi.org/10.3390/sym15040794
Submission received: 24 February 2023 / Revised: 14 March 2023 / Accepted: 22 March 2023 / Published: 24 March 2023

Abstract

The log-normal distribution, a skewed (asymmetric) distribution, describes random variables that take positive real values; the logarithms of such variables are normally distributed (symmetric). Right-skewed data suited to the log-normal distribution are frequently observed in environmental studies, biology, and medicine. Problems arise, however, when the data contain zero observations alongside log-normally distributed positive values. In that case the number of zero observations follows a binomial distribution, and the delta-lognormal distribution is often used to analyze the data. In statistics, a percentile gives the relative standing of a data point within a distribution, with reference to the observations at or below it. In this study, confidence intervals for the ratio of the percentiles of two delta-lognormal distributions are constructed using two fiducial generalized confidence interval approaches (one based on the fiducial quantity and one on the optimal generalized fiducial quantity), the Bayesian approach, and the parametric bootstrap method. The approaches were assessed by Monte Carlo simulation in RStudio in terms of coverage probability and average length; the Bayesian approach performed well, providing adequate coverage probabilities along with the shortest average lengths in all of the scenarios tested. Daily rainfall data contain both zero and positive values and can usually be fitted to the delta-lognormal distribution. The approaches are therefore applied to rainfall data to illustrate their efficacy with real data, comparing rainfall dispersion in two populations.

1. Introduction

The quantile, decile, and percentile can be used to investigate the central tendency and spread of a distribution. The difference between the third quartile and the first quartile is the interquartile range, which can be used as a measure of the spread of a distribution. The percentile has been utilized in many applications, such as in insurance and environmental science. For example, it has been used to compare the same-strength properties in the wood industry [1,2], to set premiums in the insurance industry [3], to compute rainfall amounts [4], and to estimate rainfall dispersion [5]. The ratio of the same-strength properties to compare two sources of lumber in terms of dimensions, grade, or species is commonly used as a comparison measure in the wood industry [1], for which the US lumber standards use the fifth percentile. Moreover, using the ratio of the fifth percentiles of two strength distributions is more meaningful than using the ratio of their means [2]. In addition, the ratio of percentiles is scale-free and easily interpretable. Thailand is an agricultural country, for which water is essential. Since the amount of water available depends on rainfall, predicting the latter is of interest, which is a particularly difficult task. Motivated by the application of the ratio of percentiles used in the lumber industry, we applied this approach in this study to compare two rainfall dispersion populations.
Environmental data from meteorology and climatology studies are often positive and right-skewed. Such data have been modeled with the normal, log-normal, gamma, and exponential distributions, of which the log-normal distribution is commonly used for right-skewed data. The delta-lognormal distribution can be used to investigate populations comprising a combination of zero and positive values [6]. The number of zero observations has a binomial distribution with probability $\delta' = 1 - \delta$, whereas the positive observations, which occur with probability $\delta$, follow a log-normal distribution. The log-normal distribution (an asymmetric distribution) describes random variables that take positive real values, and the logarithms of such variables are normally distributed (a symmetric distribution). The delta-lognormal distribution has been utilized in many fields, including medical and environmental science. For example, it has been used to study the diagnostic test charge data for older adults with depression [7] and to estimate rainfall dispersion [5].
Several researchers have studied interval estimation for the parameters of the delta-lognormal distribution, such as the mean, variance, coefficient of variation, and percentile. For instance, Hasan and Krishnamoorthy [8] estimated the confidence interval for the mean and the percentile of a delta-lognormal distribution. Moreover, Thangjai et al. [5] estimated the confidence interval for the common percentile of several delta-lognormal distributions.
Interval estimation for the ratio of the percentiles of two delta-lognormal distributions is of practical and theoretical importance. To the best of our knowledge, it has not previously been studied, and the confidence interval for this ratio is therefore of interest. The fiducial generalized confidence interval (FGCI), Bayesian (BS), and parametric bootstrap (PB) approaches have been widely used to estimate confidence intervals for a parameter of interest [5]. All three are based on simulated data: the FGCI approach uses simulation based on the fiducial generalized pivotal quantity (FGPQ), the BS approach uses simulation based on the prior distribution, and the PB approach uses simulation based on the sampling distribution. This paper focuses on developing an approach for estimating the ratio of the percentiles of two delta-lognormal distributions. Herein, confidence intervals for this ratio are provided using four methods: two FGCI approaches (based on the fiducial quantity and on the optimal generalized fiducial quantity), the BS approach, and the PB approach. The confidence intervals were investigated via a Monte Carlo simulation study and then used to compare the percentiles of rainfall dispersion datasets from two regions in Thailand.
The rest of the paper is organized as follows. Several methods for constructing confidence intervals for the ratio of the percentiles of two delta-lognormal distributions are described in Section 2. The simulation studies are presented in Section 3. An application illustrating the proposed approaches is given in Section 4. Some conclusions are given in Section 5.

2. Methods

For $i = 1, 2$, let $n_{i(0)}$ be the number of true zero observations and let $n_{i(1)}$ be the number of positive observations, so that the sample size is $n_i = n_{i(0)} + n_{i(1)}$. Let $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ be a non-negative random sample of size $n_i$ from the delta-lognormal distribution with parameters mean $\mu_i$, variance $\sigma_i^2$, and probability of obtaining a positive observation $\delta_i$. Moreover, let $\delta'_i = 1 - \delta_i$ be the probability of a zero observation. The density function of the delta-lognormal distribution is
$$f(x_{ij}; \mu_i, \sigma_i^2, \delta_i) = (1 - \delta_i) I_0[x_{ij}] + \delta_i \frac{1}{x_{ij} \sigma_i \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{\ln(x_{ij}) - \mu_i}{\sigma_i} \right)^2 \right\} I_{(0,\infty)}[x_{ij}], \tag{1}$$
where $I_0[x_{ij}]$ is an indicator function equal to 1 when $x_{ij} = 0$ and 0 otherwise, and $I_{(0,\infty)}[x_{ij}]$ is equal to 0 when $x_{ij} = 0$ and 1 when $x_{ij} > 0$.
The distribution function of the delta-lognormal distribution is
$$G(x_{ij}; \mu_i, \sigma_i^2, \delta_i) = \begin{cases} \delta'_i; & x_{ij} = 0 \\ \delta'_i + \delta_i F(x_{ij}; \mu_i, \sigma_i^2); & x_{ij} > 0, \end{cases} \tag{2}$$
where $F(x_{ij}; \mu_i, \sigma_i^2)$ is the log-normal cumulative distribution function and $j = 1, 2, \ldots, n_i$.
Let $Y_i = \ln(X_i)$ be normally distributed with mean $\mu_i$ and variance $\sigma_i^2$. Let $\bar{Y}_i$ and $S_i$ be the estimators of the mean and standard deviation, respectively. Suppose that $\bar{y}_i$ and $s_i$ are the observed values of $\bar{Y}_i$ and $S_i$, respectively. Moreover, let $\bar{Y}_{i(1)}$ and $S_{i(1)}$ be the estimators of the mean and standard deviation based on the log-transformed positive observations, respectively. Suppose that $\bar{y}_{i(1)}$ and $s_{i(1)}$ are the observed values of $\bar{Y}_{i(1)}$ and $S_{i(1)}$, respectively.
Let $q_{p_i}$ be the $p_i$-th quantile of the delta-lognormal distribution. From the distribution function, $G(q_{p_i}; \mu_i, \sigma_i^2, \delta_i) = p_i$. Therefore, $q_{p_i}$ is
$$q_{p_i} = \begin{cases} 0; & p_i < \delta'_i \\ \exp\left\{ \mu_i + \Phi^{-1}\left( \frac{p_i - \delta'_i}{1 - \delta'_i} \right) \sigma_i \right\}; & p_i > \delta'_i, \end{cases} \tag{3}$$
where $\Phi$ is the standard normal distribution function.
Suppose that $\lambda_{p_i} = \mu_i + \Phi^{-1}\left( \frac{p_i - \delta'_i}{1 - \delta'_i} \right) \sigma_i$. The estimator of $q_{p_i}$ is
$$\hat{q}_{p_i} = \exp(\hat{\lambda}_{p_i}) = \exp\left\{ \bar{Y}_{i(1)} + \Phi^{-1}\left( \frac{p_i - \delta'_i}{1 - \delta'_i} \right) S_{i(1)} \right\}. \tag{4}$$
Let $\theta$ be the ratio of two percentiles of delta-lognormal distributions. The estimator of $\theta$ is
$$\hat{\theta} = \frac{\hat{q}_{p_1}}{\hat{q}_{p_2}} = \frac{\exp(\hat{\lambda}_{p_1})}{\exp(\hat{\lambda}_{p_2})} = \frac{\exp\left\{ \bar{Y}_{1(1)} + \Phi^{-1}\left( \frac{p_1 - \delta'_1}{1 - \delta'_1} \right) S_{1(1)} \right\}}{\exp\left\{ \bar{Y}_{2(1)} + \Phi^{-1}\left( \frac{p_2 - \delta'_2}{1 - \delta'_2} \right) S_{2(1)} \right\}}. \tag{5}$$
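As a minimal sketch of the distribution function, the quantile function, and the plug-in ratio defined above, the following Python code uses only the standard library. The function names and the argument `delta_zero` (the probability of a zero observation) are illustrative assumptions, and the parameter values in the usage lines are made up for demonstration rather than taken from the rainfall data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def dln_cdf(x, mu, sigma, delta_zero):
    """Delta-lognormal distribution function; delta_zero is P(X = 0)."""
    if x < 0:
        return 0.0
    if x == 0:
        return delta_zero
    # Mass at zero plus the log-normal part weighted by P(X > 0).
    return delta_zero + (1.0 - delta_zero) * NormalDist(mu, sigma).cdf(math.log(x))

def dln_quantile(p, mu, sigma, delta_zero):
    """p-th quantile; it is 0 whenever p does not exceed the zero probability."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p <= delta_zero:
        return 0.0
    z = STD_NORMAL.inv_cdf((p - delta_zero) / (1.0 - delta_zero))
    return math.exp(mu + z * sigma)

def percentile_ratio(p1, mu1, sigma1, dz1, p2, mu2, sigma2, dz2):
    """Plug-in ratio of two percentiles (undefined if the second one is 0)."""
    return dln_quantile(p1, mu1, sigma1, dz1) / dln_quantile(p2, mu2, sigma2, dz2)

# Illustrative values only: check that the quantile inverts the CDF.
q90 = dln_quantile(0.9, mu=1.0, sigma=math.sqrt(0.5), delta_zero=0.3)
print(q90, dln_cdf(q90, 1.0, math.sqrt(0.5), 0.3))
```

The ratio is only meaningful when both percentile levels exceed the corresponding zero probabilities, since otherwise one of the percentiles is exactly 0.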

2.1. Fiducial Generalized Confidence Interval Approach

We proposed the FGCI approach based on the fiducial quantity and the FGCI approach based on the optimal generalized fiducial quantity. First, the FGCI approach based on the fiducial quantity uses the FGPQ of $\hat{\theta}$. The FGPQ of $\hat{\theta}$ is based on the FGPQs of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, $\delta'_1$, $\delta'_2$, $\lambda_{p_1}$, and $\lambda_{p_2}$.
Let $Z_1$ and $Z_2$ be standard normal random variables. Additionally, let $U_{1(1)}$ and $U_{2(1)}$ be chi-squared random variables with $n_{1(1)} - 1$ and $n_{2(1)} - 1$ degrees of freedom, respectively. The FGPQs for $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are given by
$$R_{\mu_1} = \bar{y}_{1(1)} - \frac{Z_1}{\sqrt{U_{1(1)}}} \sqrt{\frac{(n_{1(1)} - 1) s_{1(1)}^2}{n_{1(1)}}} \tag{6}$$
$$R_{\mu_2} = \bar{y}_{2(1)} - \frac{Z_2}{\sqrt{U_{2(1)}}} \sqrt{\frac{(n_{2(1)} - 1) s_{2(1)}^2}{n_{2(1)}}} \tag{7}$$
$$R_{\sigma_1^2} = \frac{(n_{1(1)} - 1) s_{1(1)}^2}{U_{1(1)}} \tag{8}$$
and
$$R_{\sigma_2^2} = \frac{(n_{2(1)} - 1) s_{2(1)}^2}{U_{2(1)}}. \tag{9}$$
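The pivotal quantities for the mean and variance can be simulated directly. The sketch below draws one pair $(R_{\mu}, R_{\sigma^2})$ for a single sample from its log-scale summary statistics; the function name and argument names are assumptions made for illustration, and the chi-squared variate is generated as a gamma variate with shape $(n_{(1)} - 1)/2$ and scale 2.

```python
import math
import random

def fgpq_mu_sigma2(ybar, s2, n_pos, rng):
    """One draw of (R_mu, R_sigma2) from the log-scale mean ybar, variance s2,
    and number of positive observations n_pos."""
    z = rng.gauss(0.0, 1.0)
    # Chi-squared with n_pos - 1 degrees of freedom, as Gamma(shape=df/2, scale=2).
    u = rng.gammavariate((n_pos - 1) / 2.0, 2.0)
    r_sigma2 = (n_pos - 1) * s2 / u
    r_mu = ybar - (z / math.sqrt(u)) * math.sqrt((n_pos - 1) * s2 / n_pos)
    return r_mu, r_sigma2

# Usage with the northern-region summary statistics reported in Section 4.
rng = random.Random(2023)  # arbitrary seed, chosen only for reproducibility
draws = [fgpq_mu_sigma2(0.56, 2.28, 23, rng) for _ in range(5)]
print(draws)
```

Both quantities share the same chi-squared variate within a draw, which is what ties $R_{\mu}$ to $R_{\sigma^2}$.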
According to Thangjai et al. [5], let $B_{n_{1(0)} + 0.5,\, n_{1(1)} + 0.5}$ be the beta random variable with shape parameters $n_{1(0)} + 0.5$ and $n_{1(1)} + 0.5$. Additionally, let $B_{n_{2(0)} + 0.5,\, n_{2(1)} + 0.5}$ be the beta random variable with shape parameters $n_{2(0)} + 0.5$ and $n_{2(1)} + 0.5$. Let $V_1$ and $V_2$ be standard uniform random variables. Let $p_1$ and $p_2$ be the percentiles. Let $H(p_1; n_{1(0)} + 0.5, n_{1(1)} + 0.5)$ and $H(p_2; n_{2(0)} + 0.5, n_{2(1)} + 0.5)$ be the beta distribution functions. Moreover, let $H^{-1}(V_1 H(p_1; n_{1(0)} + 0.5, n_{1(1)} + 0.5);\ n_{1(0)} + 0.5, n_{1(1)} + 0.5)$ and $H^{-1}(V_2 H(p_2; n_{2(0)} + 0.5, n_{2(1)} + 0.5);\ n_{2(0)} + 0.5, n_{2(1)} + 0.5)$ be the quantile functions of the beta distributions. The FGPQs for $\delta'_1$ and $\delta'_2$ are given by
$$R_{\delta'_1} = H^{-1}\left( V_1 H(p_1; n_{1(0)} + 0.5, n_{1(1)} + 0.5);\ n_{1(0)} + 0.5, n_{1(1)} + 0.5 \right) \tag{10}$$
and
$$R_{\delta'_2} = H^{-1}\left( V_2 H(p_2; n_{2(0)} + 0.5, n_{2(1)} + 0.5);\ n_{2(0)} + 0.5, n_{2(1)} + 0.5 \right). \tag{11}$$
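Applying the beta quantile function to a uniform variate scaled by $H(p; a, b)$ produces a beta variate restricted to the interval $(0, p)$. The Python standard library has no beta quantile function, so the sketch below obtains a draw with the same distribution by rejection sampling instead; this substitution, the function name, and the `max_tries` guard are assumptions of the example, not part of the original method.

```python
import random

def truncated_beta_draw(p, a, b, rng, max_tries=100000):
    """Draw from Beta(a, b) conditioned on falling below p.

    Equivalent in distribution to H^{-1}(V * H(p; a, b); a, b) with V uniform,
    but implemented by rejection sampling rather than by inverting H.
    """
    for _ in range(max_tries):
        x = rng.betavariate(a, b)
        if x < p:
            return x
    raise RuntimeError("no draw below p; H(p; a, b) is probably very small")

# Usage with the northern-region counts from Section 4 (6 zeros, 23 positives)
# and an illustrative percentile level p = 0.5.
rng = random.Random(1)
print(truncated_beta_draw(0.5, 6 + 0.5, 23 + 0.5, rng))
```

Rejection is cheap when $p$ lies well above the bulk of the beta distribution and becomes slow when $H(p; a, b)$ is small, in which case an explicit quantile function would be preferable.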
The FGPQs for μ 1 , σ 1 2 , and  δ 1 are used to compute the FGPQ for λ p 1 . Additionally, the FGPQs for μ 2 , σ 2 2 , and  δ 2 are used to calculate the FGPQ for λ p 2 . The FGPQs for λ p 1 and λ p 2 are given by
R λ p 1 = R μ 1 + R σ 1 2 n 1 ( 1 ) Z 1 + Φ 1 p 1 R δ 1 1 R δ 1 n 1 ( 1 ) U 1 ( 1 )
and
R λ p 2 = R μ 2 + R σ 2 2 n 2 ( 1 ) Z 2 + Φ 1 p 2 R δ 2 1 R δ 2 n 2 ( 1 ) U 2 ( 1 ) ,
where $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution.
The FGPQ for $\theta$ is
$$R_{\theta} = \frac{\exp(R_{\lambda_{p_1}})}{\exp(R_{\lambda_{p_2}})}, \tag{14}$$
where $R_{\lambda_{p_1}}$ and $R_{\lambda_{p_2}}$ are defined in Equation (12) and Equation (13), respectively.
Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for the ratio of two percentiles based on the FGCI approach using the fiducial quantity is
$$CI_{FGCI1} = [L_{FGCI1}, U_{FGCI1}] = [R_{\theta}(\alpha/2), R_{\theta}(1 - \alpha/2)], \tag{15}$$
where $R_{\theta}(\alpha/2)$ and $R_{\theta}(1 - \alpha/2)$ are the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of $R_{\theta}$, respectively.
Algorithm 1: Confidence interval based on FGCI approach using fiducial quantity
Step 1: Calculate the values of $R_{\mu_1}$, $R_{\mu_2}$, $R_{\sigma_1^2}$, and $R_{\sigma_2^2}$ as given in Equations (6)–(9).
Step 2: Calculate the values of $R_{\delta'_1}$, $R_{\delta'_2}$, $R_{\lambda_{p_1}}$, and $R_{\lambda_{p_2}}$ as given in Equations (10)–(13).
Step 3: Calculate the value of $R_{\theta}$ as given in Equation (14).
Step 4: Repeat steps 1–3 $q$ times.
Step 5: Calculate the values of $L_{FGCI1}$ and $U_{FGCI1}$ as given in Equation (15).
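The last step of the algorithm reduces to taking empirical percentiles of the $q$ simulated values of $R_{\theta}$. A minimal sketch follows; the function name is hypothetical, and the rule used for picking order statistics (the smallest order statistic whose rank reaches the target fraction) is one common convention among several, so small differences from other software are to be expected.

```python
import math

def percentile_interval(draws, alpha=0.05):
    """Equal-tailed interval from simulated values of a pivotal quantity."""
    xs = sorted(draws)
    q = len(xs)
    if q == 0:
        raise ValueError("at least one draw is required")
    # Ranks are 1-based, list indices 0-based, hence the "- 1".
    lo_idx = max(0, math.ceil(q * alpha / 2) - 1)
    hi_idx = min(q - 1, math.ceil(q * (1 - alpha / 2)) - 1)
    return xs[lo_idx], xs[hi_idx]

# Usage: any list of simulated R_theta values can be passed in.
lower, upper = percentile_interval([0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 3.5], alpha=0.2)
print(lower, upper)
```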
Second, the concept of the FGCI approach based on the optimal generalized fiducial quantity is similar to the concept of the FGCI approach based on the fiducial quantity. The FGCI approach based on the optimal generalized fiducial quantity uses the FGPQ of $\hat{\theta}$, which is given by
$$R_{\theta.m} = \frac{\exp(R_{\lambda_{p.m1}})}{\exp(R_{\lambda_{p.m2}})}, \tag{16}$$
where
R λ p . m 1 = R μ 1 + R σ 1 2 n 1 ( 1 ) Z 1 + Φ 1 p 1 R δ m 1 1 R δ m 1 n 1 ( 1 ) U 1 ( 1 ) ,
R λ p . m 2 = R μ 2 + R σ 2 2 n 2 ( 1 ) Z 2 + Φ 1 p 2 R δ m 2 1 R δ m 2 n 2 ( 1 ) U 2 ( 1 ) ,
R δ m 1 = H 1 ( V 1 H ( p 1 ; n 1 ( 0 ) , n 1 ( 1 ) + 1 ) ; n 1 ( 0 ) + 1 , n 1 ( 1 ) ) 2 ,
and
R δ m 2 = H 1 ( V 2 H ( p 2 ; n 2 ( 0 ) , n 2 ( 1 ) + 1 ) ; n 2 ( 0 ) + 1 , n 2 ( 1 ) ) 2 .
Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for the ratio of two percentiles based on the FGCI approach using the optimal generalized fiducial quantity is
$$CI_{FGCI2} = [L_{FGCI2}, U_{FGCI2}] = [R_{\theta.m}(\alpha/2), R_{\theta.m}(1 - \alpha/2)], \tag{17}$$
where $R_{\theta.m}(\alpha/2)$ and $R_{\theta.m}(1 - \alpha/2)$ are the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of $R_{\theta.m}$, respectively.
Algorithm 2: Confidence interval based on FGCI approach using optimal generalized fiducial quantity
Step 1: Calculate the values of $R_{\mu_1}$, $R_{\mu_2}$, $R_{\sigma_1^2}$, and $R_{\sigma_2^2}$ as given in Equations (6)–(9).
Step 2: Calculate the values of $R_{\lambda_{p.m1}}$, $R_{\lambda_{p.m2}}$, $R_{\delta'_{m1}}$, $R_{\delta'_{m2}}$, and $R_{\theta.m}$ as given in Equation (16).
Step 3: Repeat steps 1–2 $q$ times.
Step 4: Calculate the values of $L_{FGCI2}$ and $U_{FGCI2}$ as given in Equation (17).

2.2. Bayesian Approach

The prior distribution is based on the experimenter’s belief and is updated with the sample information. The posterior distribution is used to update the prior distribution with Bayes’ rule. The Bayesian approach is based on the likelihood function and the prior distributions. The Jeffreys independence priors are
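$$p(\mu_1, \sigma_1^2) \propto \frac{1}{\sigma_1^2} \tag{18}$$
and
$$p(\mu_2, \sigma_2^2) \propto \frac{1}{\sigma_2^2}. \tag{19}$$
The posterior distributions for $\sigma_1^2$, $\sigma_2^2$, $\mu_1$, and $\mu_2$ are
$$\sigma_1^2 \mid y_1 \sim IG\left( \frac{n_{1(1)} - 1}{2}, \frac{(n_{1(1)} - 1) s_{1(1)}^2}{2} \right) \tag{20}$$
$$\sigma_2^2 \mid y_2 \sim IG\left( \frac{n_{2(1)} - 1}{2}, \frac{(n_{2(1)} - 1) s_{2(1)}^2}{2} \right) \tag{21}$$
$$\mu_1 \mid \sigma_1^2, y_1 \sim N\left( \bar{y}_{1(1)}, \frac{\sigma_1^2}{n_{1(1)}} \right) \tag{22}$$
and
$$\mu_2 \mid \sigma_2^2, y_2 \sim N\left( \bar{y}_{2(1)}, \frac{\sigma_2^2}{n_{2(1)}} \right). \tag{23}$$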
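Sampling from these posteriors is straightforward: draw the variance from the inverse gamma distribution, then draw the mean from the normal distribution conditional on that variance. The sketch below assumes the shape and scale parametrization of the inverse gamma, under which a draw equals the scale divided by a unit-scale gamma variate; the function and argument names are illustrative.

```python
import math
import random

def posterior_draw(ybar, s2, n_pos, rng):
    """One joint posterior draw (mu, sigma2) under the Jeffreys independence prior."""
    shape = (n_pos - 1) / 2.0
    scale = (n_pos - 1) * s2 / 2.0
    # If G ~ Gamma(shape, 1), then scale / G ~ InverseGamma(shape, scale).
    sigma2 = scale / rng.gammavariate(shape, 1.0)
    mu = rng.gauss(ybar, math.sqrt(sigma2 / n_pos))
    return mu, sigma2

# Usage with the northeastern-region summary statistics reported in Section 4.
rng = random.Random(42)  # arbitrary seed
print([posterior_draw(1.10, 3.26, 18, rng) for _ in range(3)])
```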
Let $Q_{\delta'_1}$ and $Q_{\delta'_2}$ be the probability distributions. The probability distributions are defined by
$$Q_{\delta'_1} = H^{-1}\left( V_1 H(p_1; n_{1(0)} + 0.5, n_{1(1)} + 0.5);\ n_{1(0)} + 0.5, n_{1(1)} + 0.5 \right) \tag{24}$$
and
$$Q_{\delta'_2} = H^{-1}\left( V_2 H(p_2; n_{2(0)} + 0.5, n_{2(1)} + 0.5);\ n_{2(0)} + 0.5, n_{2(1)} + 0.5 \right). \tag{25}$$
The posterior distributions of λ p 1 and λ p 2 are
λ B S . p 1 = μ 1 + σ 1 2 n 1 ( 1 ) Z 1 + Φ 1 p 1 Q δ 1 1 Q δ 1 n 1 ( 1 ) U 1 ( 1 )
and
λ B S . p 2 = μ 2 + σ 2 2 n 2 ( 1 ) Z 2 + Φ 1 p 2 Q δ 2 1 Q δ 2 n 2 ( 1 ) U 2 ( 1 ) ,
where $\sigma_1^2$, $\sigma_2^2$, $\mu_1$, $\mu_2$, $Q_{\delta'_1}$, and $Q_{\delta'_2}$ are defined in Equations (20)–(25).
Therefore, the posterior distribution for $\theta$ is
$$\theta_{BS} = \frac{\exp(\lambda_{BS.p_1})}{\exp(\lambda_{BS.p_2})}, \tag{28}$$
where $\lambda_{BS.p_1}$ and $\lambda_{BS.p_2}$ are defined in Equation (26) and Equation (27), respectively.
Therefore, the $100(1 - \alpha)\%$ two-sided credible interval for the ratio of two percentiles based on the BS approach is
$$CI_{BS} = [L_{BS}, U_{BS}], \tag{29}$$
where $L_{BS}$ and $U_{BS}$ are the lower limit and the upper limit of the shortest $100(1 - \alpha)\%$ highest posterior density interval of $\theta_{BS}$, respectively.
Algorithm 3: Credible interval based on BS approach
Step 1: Generate the values of $\sigma_1^2 \mid y_1$, $\sigma_2^2 \mid y_2$, $\mu_1 \mid \sigma_1^2, y_1$, and $\mu_2 \mid \sigma_2^2, y_2$ as given in Equations (20)–(23).
Step 2: Calculate the values of $Q_{\delta'_1}$, $Q_{\delta'_2}$, $\lambda_{BS.p_1}$, and $\lambda_{BS.p_2}$ as given in Equations (24)–(27).
Step 3: Calculate the value of $\theta_{BS}$ as given in Equation (28).
Step 4: Repeat steps 1–3 $q$ times.
Step 5: Calculate the values of $L_{BS}$ and $U_{BS}$ as given in Equation (29).
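The final step, finding the shortest highest posterior density interval, can be approximated from the simulated values of $\theta_{BS}$ by scanning all windows of consecutive order statistics that contain the required share of the draws and keeping the narrowest one. The sketch below is one simple way to do this and assumes a unimodal posterior; the function name is hypothetical, and the log-normal draws in the usage lines are only a stand-in for a right-skewed posterior sample.

```python
import math
import random

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing at least (1 - alpha) of the sorted draws."""
    xs = sorted(draws)
    q = len(xs)
    if q == 0:
        raise ValueError("at least one draw is required")
    k = max(1, math.ceil((1 - alpha) * q))  # draws the interval must cover
    # Slide a window of k consecutive order statistics and keep the narrowest.
    best = min(range(q - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Usage: stand-in for a right-skewed posterior sample.
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
print(hpd_interval(sample, alpha=0.05))
```

For a right-skewed sample such as a ratio of percentiles, this interval sits closer to zero and is no longer than the equal-tailed interval with the same coverage, which is consistent with the shorter lengths reported for the BS approach.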

2.3. Parametric Bootstrap Approach

For $i = 1, 2$, let $X_i^* = (X_{i1}^*, X_{i2}^*, \ldots, X_{in_i}^*)$ be the sample drawn with replacement from $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$. Let $x_i^* = (x_{i1}^*, x_{i2}^*, \ldots, x_{in_i}^*)$ be the observed values of $X_i^*$. Moreover, let $Y_i^* = \ln(X_i^*)$ be normally distributed with mean $\mu_i^*$ and variance $\sigma_i^{2*}$. Let $\bar{Y}_i^*$ and $S_i^{2*}$ be the estimators of the mean and variance, respectively. Let $\bar{Y}_{i(1)}^*$ and $S_{i(1)}^{2*}$ be the estimators of the mean and variance based on the log-transformed positive observations, respectively. Let $\bar{y}_i^*$, $\bar{y}_{i(1)}^*$, $s_i^{2*}$, and $s_{i(1)}^{2*}$ be the observed values of $\bar{Y}_i^*$, $\bar{Y}_{i(1)}^*$, $S_i^{2*}$, and $S_{i(1)}^{2*}$, respectively.
Let $W_1$ and $W_2$ be standard uniform random variables. Let $Q_{\delta'_1}^*$ and $Q_{\delta'_2}^*$ be the probability distributions, which are given by
$$Q_{\delta'_1}^* = H^{-1}\left( W_1 H(p_1; n_{1(0)} + 0.5, n_{1(1)} + 0.5);\ n_{1(0)} + 0.5, n_{1(1)} + 0.5 \right) \tag{30}$$
and
$$Q_{\delta'_2}^* = H^{-1}\left( W_2 H(p_2; n_{2(0)} + 0.5, n_{2(1)} + 0.5);\ n_{2(0)} + 0.5, n_{2(1)} + 0.5 \right). \tag{31}$$
The estimators of $\lambda_{p_1}$ and $\lambda_{p_2}$ are
$$\hat{\lambda}_{p_1}^* = \bar{Y}_{1(1)}^* + \Phi^{-1}\left( \frac{p_1 - Q_{\delta'_1}^*}{1 - Q_{\delta'_1}^*} \right) S_{1(1)}^* \tag{32}$$
and
$$\hat{\lambda}_{p_2}^* = \bar{Y}_{2(1)}^* + \Phi^{-1}\left( \frac{p_2 - Q_{\delta'_2}^*}{1 - Q_{\delta'_2}^*} \right) S_{2(1)}^*. \tag{33}$$
Let $\theta^*$ be the ratio of two percentiles of delta-lognormal distributions. The estimator of $\theta^*$ is
$$\hat{\theta}^* = \frac{\exp(\hat{\lambda}_{p_1}^*)}{\exp(\hat{\lambda}_{p_2}^*)}, \tag{34}$$
where $\hat{\lambda}_{p_1}^*$ and $\hat{\lambda}_{p_2}^*$ are defined in Equation (32) and Equation (33), respectively.
Let $\bar{\hat{\theta}}^*$ and $sd(\hat{\theta}^*)$ be the mean and standard deviation of $\hat{\theta}^*$, respectively. The lower and upper bounds for $\theta^*$ are defined by
$$L_{PB} = \bar{\hat{\theta}}^* - z_{1 - \alpha/2}\, sd(\hat{\theta}^*) \tag{35}$$
and
$$U_{PB} = \bar{\hat{\theta}}^* + z_{1 - \alpha/2}\, sd(\hat{\theta}^*), \tag{36}$$
where $z_{1 - \alpha/2}$ is the $100(1 - \alpha/2)$-th percentile of the standard normal distribution.
Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for the ratio of two percentiles based on the PB approach is
$$CI_{PB} = [L_{PB}, U_{PB}], \tag{37}$$
where $L_{PB}$ and $U_{PB}$ are defined in Equation (35) and Equation (36), respectively.
Algorithm 4: Confidence interval based on PB approach
Step 1: Generate the value of $x_1^* = (x_{11}^*, x_{12}^*, \ldots, x_{1n_1}^*)$ with replacement from $x_1 = (x_{11}, x_{12}, \ldots, x_{1n_1})$ and generate the value of $x_2^* = (x_{21}^*, x_{22}^*, \ldots, x_{2n_2}^*)$ with replacement from $x_2 = (x_{21}, x_{22}, \ldots, x_{2n_2})$.
Step 2: Calculate the values of $y_1^* = \ln(x_1^*)$, $y_2^* = \ln(x_2^*)$, $\bar{y}_1^*$, $\bar{y}_2^*$, $\bar{y}_{1(1)}^*$, $\bar{y}_{2(1)}^*$, $s_1^{2*}$, $s_2^{2*}$, $s_{1(1)}^{2*}$, and $s_{2(1)}^{2*}$.
Step 3: Calculate the values of $Q_{\delta'_1}^*$, $Q_{\delta'_2}^*$, $\hat{\lambda}_{p_1}^*$, and $\hat{\lambda}_{p_2}^*$ as given in Equations (30)–(33).
Step 4: Calculate the value of $\hat{\theta}^*$ as given in Equation (34).
Step 5: Repeat steps 1–4 $q$ times.
Step 6: Calculate the values of $L_{PB}$ and $U_{PB}$ as given in Equations (35) and (36).
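Two pieces of this algorithm are easy to illustrate in isolation: the resampling with replacement in Steps 1 and 2, and the normal-approximation bounds in Step 6. The sketch below covers only those pieces; the function names are hypothetical, the sample standard deviation (divisor $q - 1$) is an assumption about how $sd(\hat{\theta}^*)$ is computed, and the sample values are made up.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def bootstrap_log_summary(x, rng):
    """Resample x with replacement; return (ybar, s, n_zero, n_pos) on the log
    scale of the positive values, or None if fewer than two are positive."""
    resample = rng.choices(x, k=len(x))
    logs = [math.log(v) for v in resample if v > 0]
    if len(logs) < 2:
        return None
    return mean(logs), stdev(logs), len(resample) - len(logs), len(logs)

def pb_interval(theta_star, alpha=0.05):
    """Mean of the bootstrap estimates plus/minus z times their standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = mean(theta_star)
    spread = stdev(theta_star)
    return center - z * spread, center + z * spread

# Usage with made-up data: a sample containing zeros, and a short list of
# bootstrap estimates of the ratio.
rng = random.Random(9)
print(bootstrap_log_summary([0.0, 0.0, 1.2, 3.4, 0.7, 5.1, 2.2], rng))
print(pb_interval([0.4, 0.9, 1.3, 0.6, 2.8, 1.1]))
```

Because the bounds are symmetric about the bootstrap mean, the lower limit can fall below zero even though a ratio of percentiles cannot, as happens for $CI_{PB}$ in the rainfall application.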

3. Results

Monte Carlo simulation in RStudio was used to investigate the efficacy of the proposed approaches in terms of the coverage probability (the proportion of times the true parameter of interest falls within the confidence interval) and the average length of the interval. The nominal confidence level was set to 95%. The best-performing confidence interval for each scenario was the one with a coverage probability greater than or equal to the nominal confidence level of 0.95 and the shortest average length. To generate the data, the sample sizes were set as $(n_1, n_2)$ = (30,30), (50,50), (30,50), (100,100), or (50,100); the population means were fixed as $(\mu_1, \mu_2)$ = (1.00,1.00); the population variances were set as $(\sigma_1^2, \sigma_2^2)$ = (0.50,0.50), (0.50,1.00), or (1.00,1.00); and the probabilities of zero observations were set as $(\delta'_1, \delta'_2)$ = (0.3,0.3), (0.3,0.5), or (0.5,0.5). For each simulation, 3000 runs were made together with 1500 replications.
Algorithm 5: Coverage probability and average length of the confidence intervals
For given $n_1$, $n_2$, $\mu_1$, $\mu_2$, $\delta_1$, $\delta_2$, $\sigma_1^2$, $\sigma_2^2$, and $\theta$:
Step 1: Generate the values of $x_1 = (x_{11}, x_{12}, \ldots, x_{1n_1})$ and $x_2 = (x_{21}, x_{22}, \ldots, x_{2n_2})$ from the delta-lognormal distributions.
Step 2: Calculate the values of $y_1 = \ln(x_1)$, $y_2 = \ln(x_2)$, $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_{1(1)}$, $\bar{y}_{2(1)}$, $s_1^2$, $s_2^2$, $s_{1(1)}^2$, and $s_{2(1)}^2$.
Step 3: Construct $CI_{FGCI1(h)} = [L_{FGCI1(h)}, U_{FGCI1(h)}]$ using Algorithm 1.
Step 4: Construct $CI_{FGCI2(h)} = [L_{FGCI2(h)}, U_{FGCI2(h)}]$ using Algorithm 2.
Step 5: Construct $CI_{BS(h)} = [L_{BS(h)}, U_{BS(h)}]$ using Algorithm 3.
Step 6: Construct $CI_{PB(h)} = [L_{PB(h)}, U_{PB(h)}]$ using Algorithm 4.
Step 7: If $L_{(h)} \le \theta \le U_{(h)}$, set $p_{(h)} = 1$; otherwise set $p_{(h)} = 0$.
Step 8: Calculate $U_{(h)} - L_{(h)}$.
Step 9: Repeat steps 1–8 a large number of times (say, $M$ times) and calculate the coverage probability and average length.
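The data-generation step and the two summary measures can be sketched as follows. The function names are assumptions of this example, `delta_zero` denotes the probability of a zero observation, and the interval construction itself (Steps 3 to 6) is left out, so the usage lines feed in a few made-up intervals.

```python
import math
import random

def simulate_dln(n, mu, sigma2, delta_zero, rng):
    """Generate n observations: 0 with probability delta_zero, else log-normal."""
    sigma = math.sqrt(sigma2)
    return [0.0 if rng.random() < delta_zero else rng.lognormvariate(mu, sigma)
            for _ in range(n)]

def coverage_and_length(intervals, theta):
    """Coverage probability and average length over M simulated intervals."""
    m = len(intervals)
    hits = sum(1 for lo, hi in intervals if lo <= theta <= hi)
    avg_len = sum(hi - lo for lo, hi in intervals) / m
    return hits / m, avg_len

# Usage: one simulated sample for a scenario from the simulation design, and
# the summary of three made-up intervals for a true ratio of 1.
rng = random.Random(123)
x1 = simulate_dln(30, mu=1.0, sigma2=0.5, delta_zero=0.3, rng=rng)
print(sum(1 for v in x1 if v == 0.0), "zeros out of", len(x1))
print(coverage_and_length([(0.0, 2.0), (1.5, 3.0), (0.5, 1.5)], theta=1.0))
```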
The coverage probabilities and average lengths of each confidence interval are presented in Table 1 and shown in Figure 1, Figure 2 and Figure 3. It can be seen that the coverage probabilities of all of the methods were greater than the nominal confidence level of 0.95, while the BS approach performed better than the others by providing the shortest average lengths for all of the scenarios tested.

4. Empirical Application of the Methods to Rainfall Data from Two Regions in Thailand

The confidence intervals for the ratio of percentiles discussed in the previous section were subsequently applied to estimate the confidence interval for the ratio of percentiles for two rainfall datasets. The calculations were conducted using RStudio.
Of the six geographical regions in Thailand, the northern and northeastern regions are the most agrarian, and since water is essential for agriculture, estimating the amount of rainfall is paramount. Rainfall data for 1 September 2021 from the northern and northeastern regions of Thailand, obtained from the Thai Meteorological Department, were previously reported by Thangjai et al. [5]. The rainfall data of both regions contain zero and positive values. According to Thangjai et al. [5], the log-normal distribution gave the minimum Akaike information criterion (AIC) value for the positive rainfall data in both regions. Therefore, the rainfall data of the northern and northeastern regions follow delta-lognormal distributions. The summary statistics for the rainfall data from the northern region are $n_1 = 29$, $n_{1(1)} = 23$, $n_{1(0)} = 6$, $\bar{y}_{1(1)} = 0.56$, and $s_{1(1)}^2 = 2.28$, while those for the northeastern region are $n_2 = 28$, $n_{2(1)} = 18$, $n_{2(0)} = 10$, $\bar{y}_{2(1)} = 1.10$, and $s_{2(1)}^2 = 3.26$. The 95% confidence intervals for the ratio of the percentiles of the two populations based on the four approaches were $CI_{FGCI1}$ = [0.0418, 4.6082] with an interval length of 4.5664, $CI_{FGCI2}$ = [0.0270, 3.3244] with an interval length of 3.2974, $CI_{BS}$ = [0.0026, 3.1862] with an interval length of 3.1836, and $CI_{PB}$ = [−0.8686, 2.3793] with an interval length of 3.2479. The results show that the BS credible interval provided the shortest length. The trace plot of the BS estimate is shown in Figure 4. Therefore, the empirical results are in accordance with the simulation study results.

5. Discussion and Conclusions

The percentile is used to describe the dispersion of a probability distribution, and the ratio of percentiles is used to compare the dispersion of two populations. Data comprising positively right-skewed values together with zero observations are often encountered in many fields. The positive right-skewed values conform to the log-normal distribution, whereas the number of zero observations conforms to the binomial distribution, so that data of this kind correspond to the delta-lognormal distribution. Therefore, the ratio of percentiles of delta-lognormal data plays an important part in statistics, and confidence interval estimation is recommended for estimating it. Herein, we present four approaches to estimate the confidence interval for the ratio of the percentiles of two delta-lognormal distributions: the FGCI approach using the fiducial quantity, the FGCI approach using the optimal generalized fiducial quantity, the BS approach, and the PB approach. The main advantage of the four approaches is that they can be used to estimate confidence intervals for complex parameters, whereas the main disadvantage is that a simulation study is required to determine their values for a particular scenario. Comparatively, the FGCI approach requires numerical simulation based on the fiducial generalized pivotal quantity, the BS approach requires simulation with the prior distribution, and the PB approach is based on the sampling distribution. The performance results of the proposed confidence intervals were compared to obtain the most precise interval estimator.
The simulation study results indicate that while the coverage probabilities of all of the approaches were suitable, the average lengths of the BS approach were the shortest for all of the scenarios tested. Thus, the BS approach is the best for constructing an estimate of the confidence interval for the ratio of the percentiles of two delta-lognormal distributions, although the other methods are suitable alternatives. This conclusion is similar to those presented elsewhere [5]. The BS approach can be used to construct the credible interval for complex parameters. Moreover, the BS approach can be easily extended to infer the percentiles of other distributions. In future research, statistical inference using the percentiles of other distributions will be considered. For the delta-lognormal distribution, the proposed approach can be applied to compare rainfall dispersion in two other areas. Moreover, it can be used in many applications, such as insurance and PM2.5 dispersion.

Author Contributions

Conceptualization, S.-A.N. and W.T.; methodology, S.-A.N. and W.T.; software, W.T.; validation, S.-A.N., S.N. and N.S.; formal analysis, S.-A.N. and W.T.; investigation, S.N. and N.S.; resources, W.T.; data curation, W.T.; writing—original draft preparation, W.T.; writing—review and editing, S.-A.N. and W.T.; visualization, W.T.; supervision, S.-A.N.; project administration, S.-A.N. and S.N.; funding acquisition, S.-A.N. and S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by King Mongkut’s University of Technology North Bangkok. Grant No: KMUTNB-66-KNOW-01.

Data Availability Statement

Rainfall data from the northern and northeastern regions of Thailand obtained from the Thai Meteorological Department were previously reported by Thangjai et al. [5].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, L.F.; Johnson, R.A. Confidence regions for the ratio of percentiles. Stat. Probab. Lett. 2006, 76, 384–392.
  2. Huang, L.F. Approximated non parametric confidence regions for the ratio of two percentiles. Commun. Stat.-Theory Methods 2017, 46, 4004–4015.
  3. Chakraborti, S.; Li, J. Confidence interval estimation of a normal percentile. Am. Stat. 2007, 61, 331–336.
  4. Shrestha, S.; Fang, X.; Zech, W.C. What should be the 95th percentile rainfall event depths? J. Irrig. Drain. Eng. 2014, 140, 06013002.
  5. Thangjai, W.; Niwitpong, S.A.; Niwitpong, S. Estimation of common percentile of rainfall datasets in Thailand using delta-lognormal distributions. PeerJ 2022, 10, 1–39.
  6. Aitchison, J. On the distribution of a positive random variable having a discrete probability and mass at the origin. J. Am. Stat. Assoc. 1955, 50, 901–908.
  7. Zhou, X.H.; Tu, W. Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics 2000, 56, 1118–1125.
  8. Hasan, M.S.; Krishnamoorthy, K. Confidence intervals for the mean and a percentile based on zero-inflated lognormal data. J. Stat. Comput. Simul. 2018, 88, 1499–1514.
Figure 1. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to sample sizes.
Figure 2. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to probabilities of non-zero values.
Figure 3. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to variance.
Figure 4. Trace plot of the BS estimate for the ratio of the percentile of the northern region and the northeastern region.
Table 1. The coverage probabilities (CPs) and average lengths (ALs) of 95% two-sided confidence intervals for the ratio of the percentiles of two delta-lognormal distributions.
Each cell gives the coverage probability with the average length in parentheses, CP (AL).
| $(n_1, n_2)$ | $(\mu_1, \mu_2)$ | $(\delta'_1, \delta'_2)$ | $(\sigma_1^2, \sigma_2^2)$ | $CI_{FGCI1}$ | $CI_{FGCI2}$ | $CI_{BS}$ | $CI_{PB}$ |
| (30,30) | (1.0,1.0) | (0.3,0.3) | (0.5,0.5) | 0.9950 (2.5460) | 0.9950 (2.6894) | 0.9930 (2.2299) | 0.9910 (2.3273) |
| | | | (0.5,1.0) | 0.9943 (2.0245) | 0.9953 (2.0671) | 0.9920 (1.7386) | 0.9920 (1.9352) |
| | | | (1.0,1.0) | 0.9927 (4.5295) | 0.9947 (4.8314) | 0.9907 (3.6130) | 0.9887 (4.3502) |
| | | (0.3,0.5) | (0.5,0.5) | 0.9923 (2.8673) | 0.9907 (2.8758) | 0.9860 (2.5092) | 0.9807 (2.6268) |
| | | | (0.5,1.0) | 0.9937 (2.4029) | 0.9910 (2.2886) | 0.9860 (2.0468) | 0.9860 (2.3110) |
| | | | (1.0,1.0) | 0.9950 (5.2764) | 0.9943 (5.3124) | 0.9913 (4.1935) | 0.9870 (5.0920) |
| | | (0.5,0.5) | (0.5,0.5) | 0.9960 (2.9642) | 0.9953 (3.2731) | 0.9930 (2.5253) | 0.9907 (2.6688) |
| | | | (0.5,1.0) | 0.9967 (2.4995) | 0.9957 (2.6089) | 0.9933 (2.0791) | 0.9923 (2.3692) |
| | | | (1.0,1.0) | 0.9937 (5.6224) | 0.9943 (6.4502) | 0.9880 (4.3045) | 0.9860 (5.5863) |
| (50,50) | (1.0,1.0) | (0.3,0.3) | (0.5,0.5) | 0.9930 (1.6773) | 0.9943 (1.7432) | 0.9907 (1.5578) | 0.9883 (1.5834) |
| | | | (0.5,1.0) | 0.9940 (1.3285) | 0.9953 (1.3390) | 0.9940 (1.2167) | 0.9953 (1.2822) |
| | | | (1.0,1.0) | 0.9927 (2.6586) | 0.9917 (2.7784) | 0.9900 (2.3229) | 0.9893 (2.5554) |
| | | (0.3,0.5) | (0.5,0.5) | 0.9943 (1.9112) | 0.9913 (1.8864) | 0.9857 (1.7737) | 0.9843 (1.8024) |
| | | | (0.5,1.0) | 0.9927 (1.6139) | 0.9883 (1.5126) | 0.9877 (1.4709) | 0.9873 (1.5735) |
| | | | (1.0,1.0) | 0.9923 (3.1755) | 0.9903 (3.0862) | 0.9857 (2.7680) | 0.9850 (3.0512) |
| | | (0.5,0.5) | (0.5,0.5) | 0.9943 (1.8877) | 0.9937 (2.0385) | 0.9937 (1.7256) | 0.9910 (1.7624) |
| | | | (0.5,1.0) | 0.9947 (1.5758) | 0.9950 (1.6069) | 0.9920 (1.4184) | 0.9910 (1.5152) |
| | | | (1.0,1.0) | 0.9933 (3.1608) | 0.9917 (3.4810) | 0.9903 (2.6977) | 0.9877 (2.9972) |
| (30,50) | (1.0,1.0) | (0.3,0.3) | (0.5,0.5) | 0.9953 (2.3111) | 0.9943 (2.4502) | 0.9913 (2.0192) | 0.9847 (2.0532) |
| | | | (0.5,1.0) | 0.9957 (1.7212) | 0.9963 (1.7609) | 0.9923 (1.4965) | 0.9900 (1.5918) |
| | | | (1.0,1.0) | 0.9917 (4.0489) | 0.9913 (4.3732) | 0.9887 (3.2471) | 0.9820 (3.7715) |
| | | (0.3,0.5) | (0.5,0.5) | 0.9920 (2.5849) | 0.9907 (2.5940) | 0.9860 (2.2673) | 0.9763 (2.3035) |
| | | | (0.5,1.0) | 0.9937 (2.1018) | 0.9890 (1.9964) | 0.9867 (1.8207) | 0.9843 (1.9397) |
| | | | (1.0,1.0) | 0.9947 (4.7258) | 0.9933 (4.6993) | 0.9880 (3.7844) | 0.9837 (4.4954) |
| | | (0.5,0.5) | (0.5,0.5) | 0.9963 (2.7961) | 0.9963 (3.1043) | 0.9963 (2.3796) | 0.9930 (2.4386) |
| | | | (0.5,1.0) | 0.9960 (2.1357) | 0.9957 (2.2267) | 0.9910 (1.8044) | 0.9907 (1.9514) |
| | | | (1.0,1.0) | 0.9950 (5.2021) | 0.9950 (5.9882) | 0.9963 (3.9804) | 0.9930 (4.8729) |
(100,100)(1.0,1.0)(0.3,0.3)(0.5,0.5)0.99600.99530.99270.9910
(1.0621)(1.0968)(1.0194)(1.0323)
(0.5,1.0)0.99430.99370.99170.9917
(0.8466)(0.8481)(0.8073)(0.8323)
(1.0,1.0)0.99230.99300.99070.9887
(1.5956)(1.6469)(1.4855)(1.5559)
(0.3,0.5)(0.5,0.5)0.99400.98870.99070.9907
(1.2290)(1.2023)(1.1794)(1.1939)
(0.5,1.0)0.99600.98270.98870.9910
(1.0472)(0.9689)(0.9952)(1.0338)
(1.0,1.0)0.99400.98800.98900.9867
(1.9285)(1.8404)(1.7952)(1.8889)
(0.5,0.5)(0.5,0.5)0.99330.99330.99170.9903
(1.1822)(1.2580)(1.1272)(1.1401)
(0.5,1.0)0.99070.99070.98970.9907
(0.9819)(0.9850)(0.9294)(0.9624)
(1.0,1.0)0.99570.99470.99370.9933
(1.7969)(1.9312)(1.6575)(1.7492)
(50,100)(1.0,1.0)(0.3,0.3)(0.5,0.5)0.99430.99330.99170.9877
(1.4553)(1.5199)(1.3477)(1.3457)
(0.5,1.0)0.99130.99170.99130.9890
(1.0928)(1.1063)(1.0094)(1.0336)
(1.0,1.0)0.99230.99270.98870.9837
(2.3210)(2.4394)(2.0415)(2.1534)
(0.3,0.5)(0.5,0.5)0.99330.98930.98900.9830
(1.6587)(1.6297)(1.5360)(1.5401)
(0.5,0.5)0.99330.98230.98700.9803
(1.3195)(1.2287)(1.2155)(1.2523)
(0.5,1.0)0.99330.98930.98830.9820
(2.7516)(2.6548)(2.4220)(2.5710)
(0.5,0.5)(0.5,0.5)0.99300.99300.99100.9873
(1.6779)(1.8074)(1.5302)(1.5300)
(0.5,1.0)0.99300.99070.98930.9870
(1.3099)(1.3329)(1.1930)(1.2194)
(1.0,1.0)0.99270.99170.99070.9860
(2.7363)(2.9991)(2.3541)(2.5189)
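The two summary measures reported in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' simulation code (the study used R), and the helper name `coverage_and_length` is hypothetical. The toy intervals are known-variance normal intervals for a mean, used purely as a stand-in for the FGCI, Bayesian, and parametric bootstrap intervals for the percentile ratio.

```python
import numpy as np


def coverage_and_length(lower, upper, true_value):
    """Estimate the coverage probability (CP) and average length (AL).

    lower, upper: interval limits, one pair per Monte Carlo replication.
    true_value:   the parameter value the intervals are meant to cover.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= true_value) & (true_value <= upper)
    cp = covered.mean()          # share of intervals containing the truth
    al = (upper - lower).mean()  # mean interval width
    return cp, al


# Toy illustration (assumed setup, not from the paper): 95% z-intervals
# for a normal mean with known standard deviation.
rng = np.random.default_rng(0)
n_reps, n, mu, sigma = 2000, 30, 1.0, 1.0
samples = rng.normal(mu, sigma, size=(n_reps, n))
means = samples.mean(axis=1)
half_width = 1.96 * sigma / np.sqrt(n)
cp, al = coverage_and_length(means - half_width, means + half_width, mu)
print(f"CP = {cp:.4f}, AL = {al:.4f}")
```

In a comparison such as Table 1, a method is typically preferred when its CP is at or above the nominal level (0.95 here) and its AL is the smallest among the methods meeting that criterion.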
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.