Abstract
The log-normal distribution, a right-skewed (asymmetric) distribution, is used to describe positive-valued random variables whose logarithms are normally distributed (symmetric). Right-skewed data suited to the log-normal distribution are frequently observed in environmental studies, biology, and medicine. Problems arise, however, when the data contain zero observations alongside log-normally distributed positive values; such data are commonly analyzed with the delta-lognormal distribution, in which the number of zero observations follows a binomial distribution. In statistics, a percentile gives the relative standing of a data point within a distribution in terms of the observations at or below it. In this study, confidence intervals for the ratio of the percentiles of two delta-lognormal distributions are constructed using two fiducial generalized confidence interval approaches (one based on the fiducial quantity and one based on the optimal generalized fiducial quantity), the Bayesian approach, and the parametric bootstrap method. Monte Carlo simulations carried out in RStudio were used to assess the coverage probability and the average length; the Bayesian approach performed quite well, providing adequate coverage probabilities along with the shortest average lengths in all of the scenarios tested. Daily rainfall data contain both zero and positive values and can usually be fitted to the delta-lognormal distribution. The approaches are therefore applied to rainfall data to illustrate their efficacy with real data and to compare rainfall dispersion in two populations.
1. Introduction
Quartiles, deciles, and percentiles can be used to investigate the central tendency and spread of a distribution. For example, the interquartile range, which is the difference between the third quartile and the first quartile, can be used as a measure of spread. Percentiles have been utilized in many applications, including insurance and environmental science: to compare strength properties in the wood industry [1,2], to set premiums in the insurance industry [3], to compute rainfall amounts [4], and to estimate rainfall dispersion [5]. In the wood industry, the ratio of the same strength property is commonly used to compare two sources of lumber that differ in dimensions, grade, or species [1], and the US lumber standards use the fifth percentile for this purpose. Moreover, the ratio of the fifth percentiles of two strength distributions is more meaningful than the ratio of their means [2]. In addition, the ratio of percentiles is scale-free and easily interpretable. Thailand is an agricultural country for which water is essential. Since the amount of water available depends on rainfall, predicting rainfall is of interest, although it is a particularly difficult task. Motivated by the use of the ratio of percentiles in the lumber industry, we applied this approach in this study to compare rainfall dispersion in two populations.
Environmental data, such as those from meteorology and climatology studies, often consist of positive or right-skewed values. Such data may follow the normal, log-normal, gamma, or exponential distribution, although the log-normal distribution is the one commonly used in the analysis of right-skewed data. The delta-lognormal distribution can be used to investigate populations comprising a combination of zero and positive values [6]. The number of zero observations has a binomial distribution governed by the probability of a zero observation, whereas the positive observations, which occur with the complementary probability, follow a log-normal distribution. The log-normal distribution (an asymmetric distribution) is used to describe random variables comprising positive real values, and it is well known that the logarithms of such random variables are normally distributed (a symmetric distribution). The delta-lognormal distribution has been utilized in many fields, including medical and environmental science. For example, it has been used to study the diagnostic test charge data for older adults with depression [7] and to estimate rainfall dispersion [5].
Several researchers have studied interval estimation for the parameters of the delta-lognormal distribution, such as the mean, variance, coefficient of variation, and percentile. For instance, Hasan and Krishnamoorthy [8] estimated the confidence interval for the mean and the percentile of a delta-lognormal distribution. Moreover, Thangjai et al. [5] estimated the confidence interval for the common percentile of several delta-lognormal distributions.
Interval estimation for the ratio of the percentiles of two delta-lognormal distributions is of practical and theoretical importance but, to the best of our knowledge, has not previously been studied. The fiducial generalized confidence interval (FGCI), Bayesian (BS), and parametric bootstrap (PB) approaches have been widely used to estimate confidence intervals for a parameter of interest [5]. All three are based on simulated data: the FGCI approach uses simulation based on the fiducial generalized pivotal quantity (FGPQ), the BS approach uses simulation based on the prior distribution, and the PB approach uses simulation based on the sampling distribution. Herein, confidence intervals for the ratio of the percentiles of two delta-lognormal distributions are constructed using four methods: two FGCI approaches (one based on the fiducial quantity and one based on the optimal generalized fiducial quantity), the BS approach, and the PB approach. The confidence interval estimates were investigated via a Monte Carlo simulation study and then used to compare the percentiles of rainfall dispersion datasets from two regions in Thailand.
The rest of the paper is organized as follows. Section 2 presents several methods for constructing confidence intervals for the ratio of the percentiles of two delta-lognormal distributions. The simulation studies are presented in Section 3. An application illustrating the proposed approaches is given in Section 4. Some conclusions are given in Section 5.
2. Methods
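For $i = 1, 2$, let $n_{i(0)}$ be the number of true zero observations and let $n_{i(1)}$ be the number of positive observations, so that $n_i = n_{i(0)} + n_{i(1)}$ is the sample size. Let $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ be a non-negative random sample of size $n_i$ from the delta-lognormal distribution with parameters mean $\mu_i$, variance $\sigma_i^2$, and probability of obtaining a positive observation $\delta_{i(1)}$. Moreover, let $\delta_{i(0)} = 1 - \delta_{i(1)}$ be the probability of a zero observation. The density function of the delta-lognormal distribution is
$$f(x_{ij}; \mu_i, \sigma_i^2, \delta_{i(0)}) = \delta_{i(0)} I_0[x_{ij}] + (1 - \delta_{i(0)}) \frac{1}{x_{ij} \sigma_i \sqrt{2\pi}} \exp\left(-\frac{(\ln x_{ij} - \mu_i)^2}{2\sigma_i^2}\right) I_{(0,\infty)}[x_{ij}], \tag{1}$$
where $I_0[x_{ij}]$ is an indicator function equal to 1 when $x_{ij} = 0$ and 0 otherwise, and $I_{(0,\infty)}[x_{ij}]$ is equal to 0 when $x_{ij} = 0$ and 1 when $x_{ij} > 0$.
The distribution function of the delta-lognormal distribution is
$$F(x_{ij}; \mu_i, \sigma_i^2, \delta_{i(0)}) = \begin{cases} \delta_{i(0)}, & x_{ij} = 0, \\ \delta_{i(0)} + \delta_{i(1)} G(x_{ij}; \mu_i, \sigma_i^2), & x_{ij} > 0, \end{cases} \tag{2}$$
where $G(x_{ij}; \mu_i, \sigma_i^2)$ is the log-normal cumulative distribution function and $\delta_{i(1)} = 1 - \delta_{i(0)}$.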
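The distribution function above can be illustrated with a minimal Python sketch. The function names, the parameter name `delta0` (the probability of a zero observation), and the example parameter values are illustrative assumptions rather than part of the original study, which was carried out in RStudio.

```python
import math
import random
from statistics import NormalDist


def delta_lognormal_cdf(x, mu, sigma, delta0):
    """Distribution function of a delta-lognormal variable.

    delta0 is the probability of a zero observation (assumed name);
    mu and sigma are the mean and standard deviation on the log scale.
    """
    if x < 0:
        return 0.0
    if x == 0:
        return delta0
    # Log-normal CDF: G(x) = Phi((ln x - mu) / sigma)
    g = NormalDist(mu, sigma).cdf(math.log(x))
    return delta0 + (1.0 - delta0) * g


def sample_delta_lognormal(n, mu, sigma, delta0, rng):
    """Draw n values: zero with probability delta0, otherwise log-normal."""
    return [0.0 if rng.random() < delta0 else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


# Illustrative values only: log-scale mean 1, log-scale s.d. 1, 30% zeros
rng = random.Random(1)
data = sample_delta_lognormal(2000, mu=1.0, sigma=1.0, delta0=0.3, rng=rng)
zero_fraction = sum(1 for v in data if v == 0.0) / len(data)
```

With a large sample, `zero_fraction` should lie close to `delta0`, which gives a quick check that the point mass at zero is simulated as intended.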
Let be the normal distribution with parameters mean and variance . Let and be the estimators of mean and standard deviation, respectively. Suppose that and are the observed values of and , respectively. Moreover, let and be the estimators of mean and standard deviation based on the log-transformed positive observations, respectively. Suppose that and are the observed values of and , respectively.
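Let $q_{ip}$ be the $p$-th quantile of the delta-lognormal distribution with log-scale mean $\mu_i$, log-scale standard deviation $\sigma_i$, and probability of a zero observation $\delta_{i(0)}$. From the distribution function, $F(q_{ip}) = p$, that is, $\delta_{i(0)} + (1 - \delta_{i(0)}) \Phi\left((\ln q_{ip} - \mu_i)/\sigma_i\right) = p$. Therefore, for $p > \delta_{i(0)}$, the quantile is
$$q_{ip} = \exp\left(\mu_i + \sigma_i \Phi^{-1}\left(\frac{p - \delta_{i(0)}}{1 - \delta_{i(0)}}\right)\right), \tag{3}$$
where $\Phi$ is the standard normal distribution function.
Suppose that $\hat{\delta}_{i(0)} = n_{i(0)}/n_i$, where $n_{i(0)}$ is the number of zero observations among the $n_i$ observations, and let $\hat{\mu}_i$ and $\hat{\sigma}_i$ be the estimates of the mean and standard deviation based on the log-transformed positive observations. The estimator of $q_{ip}$ is
$$\hat{q}_{ip} = \exp\left(\hat{\mu}_i + \hat{\sigma}_i \Phi^{-1}\left(\frac{p - \hat{\delta}_{i(0)}}{1 - \hat{\delta}_{i(0)}}\right)\right). \tag{4}$$
Let $\theta = q_{1p}/q_{2p}$ be the ratio of two percentiles of delta-lognormal distributions. The estimator of $\theta$ is
$$\hat{\theta} = \frac{\hat{q}_{1p}}{\hat{q}_{2p}}. \tag{5}$$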
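As a hedged illustration of the percentile and ratio estimators, the sketch below obtains the percentile by solving the distribution function for the quantile. It assumes that the log-scale standard deviation is estimated with the usual sample standard deviation (divisor n − 1) and that the percentile is zero when p does not exceed the proportion of zeros; the function names and the toy dataset are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def delta_lognormal_quantile(p, mu, sigma, delta0):
    """p-th quantile from F(q) = delta0 + (1 - delta0) * Phi((ln q - mu) / sigma)."""
    if p <= delta0:
        # All of the probability mass up to p sits at zero (assumed convention)
        return 0.0
    z = NormalDist().inv_cdf((p - delta0) / (1.0 - delta0))
    return math.exp(mu + sigma * z)


def estimate_quantile(sample, p):
    """Plug-in estimate using the log-transformed positive observations."""
    logs = [math.log(v) for v in sample if v > 0]
    delta0_hat = (len(sample) - len(logs)) / len(sample)
    return delta_lognormal_quantile(p, mean(logs), stdev(logs), delta0_hat)


def estimate_ratio(sample1, sample2, p):
    """Plug-in estimate of the ratio of the p-th percentiles of two samples."""
    return estimate_quantile(sample1, p) / estimate_quantile(sample2, p)


# Toy data (hypothetical): two zeros and three positive values
demo = [0.0, 0.0, math.exp(1.0), math.exp(2.0), math.exp(3.0)]
demo_quantile = estimate_quantile(demo, 0.7)
```

Because the ratio divides one percentile estimate by another, the percentile level should be chosen above the zero proportion of the second sample so that the denominator is positive.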
2.1. Fiducial Generalized Confidence Interval Approach
We propose two FGCI approaches: one based on the fiducial quantity and one based on the optimal generalized fiducial quantity. First, the FGCI approach based on the fiducial quantity uses the FGPQ of the ratio of percentiles, which is built from the FGPQs of the means, variances, probability parameters, and percentiles of the two distributions.
Let and be the standard normal distributions. Additionally, let and be the chi-squared distributions with and degrees of freedom, respectively. The FGPQs for , , , and are given by
and
According to Thangjai et al. [5], let be the beta random variable with shape parameters and . Additionally, let be the beta random variable with shape parameters and . Let and be the standard uniform distributions. Let and be the percentiles. Let and be the beta distribution functions. Moreover, let and be the quantile functions of the beta distributions. The FGPQs for and are given by
and
The FGPQs for , , and are used to compute the FGPQ for . Additionally, the FGPQs for , , and are used to calculate the FGPQ for . The FGPQs for and are given by
and
where is the quantile function.
The FGPQ for is
where and are defined in Equation (12) and Equation (13), respectively.
Therefore, the two-sided confidence interval for the ratio of two percentiles based on the FGCI approach using fiducial quantity is
where and are the -th and -th percentiles of , respectively.
Algorithm 1: Confidence interval based on the FGCI approach using the fiducial quantity
Step 1: Calculate the FGPQs for the two means and the two variances as given in Equations (6)–(9).
Step 2: Calculate the FGPQs for the two probability parameters and the two percentiles as given in Equations (10)–(13).
Step 3: Calculate the FGPQ for the ratio of percentiles as given in Equation (14).
Step 4: Repeat steps 1–3 q times.
Step 5: Calculate the lower and upper confidence limits as given in Equation (15).
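The simulation loop of Algorithm 1 can be sketched in Python as follows. This is only a sketch under stated assumptions, not the authors' implementation: it uses the standard generalized pivotal quantities for a normal mean and variance, and for the zero probability it assumes an equal-weight mixture of two beta distributions, which may differ from Equations (10) and (11). The function names, the summary-statistic tuple `(n0, n1, ybar, s2)`, and the example values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def quantile(p, mu, sigma, delta0):
    """p-th delta-lognormal quantile; zero when p does not exceed delta0."""
    if p <= delta0:
        return 0.0
    return math.exp(mu + sigma * NormalDist().inv_cdf((p - delta0) / (1 - delta0)))


def pivot_draw(n0, n1, ybar, s2, p, rng):
    """One pivotal draw of the p-th percentile for a single sample.

    n0, n1: numbers of zero and positive observations (n0 must be > 0);
    ybar, s2: mean and variance of the log-transformed positive values.
    """
    chi2 = rng.gammavariate((n1 - 1) / 2.0, 2.0)   # chi-squared with n1 - 1 d.f.
    r_var = (n1 - 1) * s2 / chi2                   # pivot for the variance
    r_mu = ybar - rng.gauss(0.0, 1.0) * math.sqrt(r_var / n1)  # pivot for the mean
    # ASSUMPTION: equal-weight beta mixture as the pivot for the zero probability
    if rng.random() < 0.5:
        r_delta0 = rng.betavariate(n0, n1 + 1)
    else:
        r_delta0 = rng.betavariate(n0 + 1, n1)
    return quantile(p, r_mu, math.sqrt(r_var), r_delta0)


def fgci_ratio(stats1, stats2, p, q=2000, level=0.95, seed=0):
    """Percentile interval from q pivotal draws of the ratio of percentiles."""
    rng = random.Random(seed)
    draws = []
    for _ in range(q):
        numerator = pivot_draw(*stats1, p, rng)
        denominator = pivot_draw(*stats2, p, rng)
        if denominator > 0:          # draws with a zero denominator are skipped
            draws.append(numerator / denominator)
    draws.sort()
    alpha = 1.0 - level
    lower = draws[math.floor(alpha / 2 * (len(draws) - 1))]
    upper = draws[math.ceil((1 - alpha / 2) * (len(draws) - 1))]
    return lower, upper


# Hypothetical summary statistics: (n0, n1, ybar, s2) for each sample
lower, upper = fgci_ratio((10, 40, 1.0, 0.5), (10, 40, 1.0, 0.5), p=0.75)
```

Skipping draws with a zero denominator is a simplification chosen for this sketch; it only matters when the percentile level is close to the zero probability of the second sample.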
Second, the concept of the FGCI approach based on the optimal generalized fiducial quantity is similar to the concept of the FGCI approach based on the fiducial quantity. The FGCI approach based on optimal generalized fiducial quantity uses the FGPQ of , which is given by
where
and
Therefore, the two-sided confidence interval for the ratio of two percentiles based on the FGCI approach using the optimal generalized fiducial quantity is
where and are the -th and -th percentiles of , respectively.
Algorithm 2: Confidence interval based on the FGCI approach using the optimal generalized fiducial quantity
Step 1: Calculate the FGPQs for the two means and the two variances as given in Equations (6)–(9).
Step 2: Calculate the quantities given in Equation (16), including the FGPQ for the ratio of percentiles.
Step 3: Repeat steps 1–2 q times.
Step 4: Calculate the lower and upper confidence limits as given in Equation (17).
2.2. Bayesian Approach
The prior distribution reflects the experimenter’s belief before the data are observed. Bayes’ rule combines the prior distribution with the likelihood function of the sample to give the posterior distribution, on which inference is based. The Jeffreys independence priors are
and
The posterior distributions for , , , and are
and
Let and be the probability distributions. The probability distributions are defined by
and
The posterior distributions of and are
and
where , , , , , and are defined in Equations (20)–(25).
Therefore, the posterior distribution for is
where and are defined in Equation (26) and Equation (27), respectively.
Therefore, the two-sided credible interval for the ratio of two percentiles based on the BS approach is
where and are the lower limit and the upper limit of the shortest highest posterior density interval of , respectively.
Algorithm 3: Credible interval based on the BS approach
Step 1: Generate values from the posterior distributions given in Equations (20)–(23).
Step 2: Calculate the quantities given in Equations (24)–(27).
Step 3: Calculate the posterior value of the ratio of percentiles as given in Equation (28).
Step 4: Repeat steps 1–3 q times.
Step 5: Calculate the lower and upper limits of the credible interval as given in Equation (29).
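The final step of Algorithm 3 takes the shortest highest posterior density interval of the simulated posterior values. A minimal sketch of that step is given below, assuming the posterior draws are already available as a list of numbers; the function name and the toy draws are hypothetical, and the posterior sampling itself is not reproduced here.

```python
import math


def shortest_hpd_interval(draws, level=0.95):
    """Shortest interval containing at least `level` of the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    # Number of draws the interval must contain (small tolerance guards
    # against floating-point error in level * n)
    k = max(1, math.ceil(level * n - 1e-9))
    # Slide a window of k consecutive order statistics and keep the narrowest
    best_start = min(range(n - k + 1),
                     key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + k - 1]


# Toy draws (hypothetical): the narrowest window holding half of them
toy_draws = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
toy_interval = shortest_hpd_interval(toy_draws, level=0.5)
```

For a right-skewed posterior, such as that of a ratio, the interval found this way is typically shorter than the equal-tailed interval at the same level.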
2.3. Parametric Bootstrap Approach
For , let be the sample with replacement from . Let be the observed values of . Moreover, let be the normal distribution with parameters mean and variance . Let and be the estimators of mean and variance, respectively. Let and be the estimators of mean and variance based on the log-transformed positive observations, respectively. Let , , , and be observed values of , , , and , respectively.
Let and be the standard uniform distributions. Let and be the probability distributions, which are given by
and
The estimators of and are
and
Let be the ratio of two percentiles of delta-lognormal distributions. The estimator of is
where and are defined in Equation (32) and Equation (33), respectively.
Let and be the mean and standard deviation of , respectively. The lower and upper bounds for are defined by
and
where is the -th percentile of the standard normal distribution.
Therefore, the two-sided confidence interval for the ratio of two percentiles based on the PB approach is
where and are defined in Equation (35) and Equation (36), respectively.
Algorithm 4: Confidence interval based on the PB approach
Step 1: Generate a bootstrap sample with replacement from each of the two observed samples.
Step 2: Calculate the required summary statistics of the two bootstrap samples.
Step 3: Calculate the quantities given in Equations (30)–(33).
Step 4: Calculate the bootstrap estimate of the ratio of percentiles as given in Equation (34).
Step 5: Repeat steps 1–4 q times.
Step 6: Calculate the lower and upper bounds as given in Equations (35) and (36).
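The resampling loop and the normal-approximation bounds of Algorithm 4 can be sketched as follows. The sketch resamples the observed data with replacement, as the text describes, and forms the bounds as the bootstrap mean plus or minus a standard normal percentile times the bootstrap standard deviation. The plug-in percentile estimator, the function names, and the two small datasets are assumptions made for illustration and are not the rainfall data.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def percentile_estimate(sample, p):
    """Plug-in delta-lognormal percentile (assumed form of the estimator)."""
    logs = [math.log(v) for v in sample if v > 0]
    delta0 = (len(sample) - len(logs)) / len(sample)
    if len(logs) < 2 or p <= delta0:
        return 0.0
    z = NormalDist().inv_cdf((p - delta0) / (1 - delta0))
    return math.exp(mean(logs) + stdev(logs) * z)


def bootstrap_ratio_interval(x1, x2, p, q=1000, level=0.95, seed=0):
    """Normal-approximation interval from q bootstrap ratio estimates."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(q):
        b1 = rng.choices(x1, k=len(x1))   # resample with replacement
        b2 = rng.choices(x2, k=len(x2))
        denominator = percentile_estimate(b2, p)
        if denominator > 0:               # skip degenerate resamples
            ratios.append(percentile_estimate(b1, p) / denominator)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre, spread = mean(ratios), stdev(ratios)
    return centre - z * spread, centre + z * spread


# Hypothetical samples containing zeros and positive values
x1 = [0, 0, 0, 1.2, 2.5, 0.8, 3.1, 4.7, 1.9, 2.2, 6.3, 0.5, 1.1, 2.8, 3.9]
x2 = [0, 0, 0, 0, 2.1, 5.5, 1.4, 7.2, 3.3, 0.9, 4.8, 2.6, 9.1, 1.7, 3.0]
pb_lower, pb_upper = bootstrap_ratio_interval(x1, x2, p=0.75)
```

Because this interval is symmetric about the bootstrap mean, its lower bound can fall below zero even though a ratio of percentiles is non-negative.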
3. Results
Monte Carlo simulation in RStudio was used to investigate the efficacy of the proposed approaches in terms of the coverage probability (the proportion of simulation runs in which the true parameter of interest falls within the confidence interval) and the average length of the interval. The nominal confidence level was set as 95%. For each scenario, the best-performing confidence interval was the one that provided a coverage probability greater than or equal to the nominal confidence level of 0.95 together with the shortest average length. To generate the data, the sample sizes were set as (30,30), (50,50), (30,50), (100,100), or (50,100); the population means were fixed as (1.00,1.00); the population variances were set as (0.50,0.50), (0.50,1.00), or (1.00,1.00); and the probabilities of zero observations were set as (0.3,0.3), (0.3,0.5), or (0.5,0.5). For each scenario, 3000 simulation runs were made, each with 1500 replications.
Algorithm 5: Coverage probability and average length of the confidence intervals
For given sample sizes and parameter values:
Step 1: Generate two samples from the delta-lognormal distributions.
Step 2: Calculate the required summary statistics of the two samples.
Step 3: Construct the confidence interval using Algorithm 1.
Step 4: Construct the confidence interval using Algorithm 2.
Step 5: Construct the credible interval using Algorithm 3.
Step 6: Construct the confidence interval using Algorithm 4.
Step 7: If the true ratio of percentiles lies within the interval, record 1; otherwise, record 0.
Step 8: Calculate the length of the interval.
Step 9: Repeat steps 1–8 a large number of times (say, M times) and calculate the coverage probability and average length.
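The bookkeeping in Algorithm 5 can be sketched with a small Python harness. The interval method is passed in as a function, so any of the four approaches could be plugged in; the fixed placeholder interval used in the demonstration is deliberately artificial and only shows how the coverage probability and average length are accumulated. All names are hypothetical, and the parameter values in `make_data` mirror one simulation scenario (sample sizes of 30, log-scale mean 1, variance 0.5, zero probability 0.3).

```python
import math
import random


def make_data(rng, n=30, mu=1.0, sigma=math.sqrt(0.5), delta0=0.3):
    """Generate two delta-lognormal samples with identical parameters."""
    def one_sample():
        return [0.0 if rng.random() < delta0 else rng.lognormvariate(mu, sigma)
                for _ in range(n)]
    return one_sample(), one_sample()


def coverage_and_length(make_data, make_interval, true_value, runs, seed=0):
    """Monte Carlo estimates of coverage probability and average length."""
    rng = random.Random(seed)
    hits, total_length = 0, 0.0
    for _ in range(runs):
        x1, x2 = make_data(rng)
        lower, upper = make_interval(x1, x2)
        if lower <= true_value <= upper:   # interval covers the true ratio
            hits += 1
        total_length += upper - lower
    return hits / runs, total_length / runs


# Placeholder interval method (NOT one of the four approaches): a fixed interval.
# With identical parameters in both populations the true ratio equals 1.
cp, al = coverage_and_length(make_data, lambda x1, x2: (0.5, 1.5),
                             true_value=1.0, runs=20)
```

In practice, `make_interval` would wrap one of the interval constructions from Section 2 and `runs` would be set to the number of simulation runs.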
The coverage probabilities and average lengths of each confidence interval are presented in Table 1 and shown in Figures 1–3. The coverage probabilities of all of the methods were greater than the nominal confidence level of 0.95, while the BS approach outperformed the others by providing the shortest average lengths in all of the scenarios tested.
Table 1. The coverage probabilities (CPs) and average lengths (ALs) of 95% two-sided confidence intervals for the ratio of the percentiles of two delta-lognormal distributions.
Figure 1. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to sample sizes.
Figure 2. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to probabilities of non-zero values.
Figure 3. Comparison of the CPs and the ALs of the confidence intervals for the ratio of percentiles according to variance.
4. Empirical Application of the Methods to Rainfall Data from Two Regions in Thailand
The four approaches described in Section 2 were applied to two rainfall datasets to estimate the confidence interval for the ratio of percentiles. The calculations were conducted using RStudio.
Of the six geographical regions in Thailand, the northern and northeastern regions are the most agrarian, and since water is essential for agriculture, estimating the amount of rainfall is paramount. Rainfall data for 1 September 2021 from the northern and northeastern regions of Thailand, obtained from the Thai Meteorological Department, were previously reported by Thangjai et al. [5]. The data for both regions contain zero and positive values. According to Thangjai et al. [5], the log-normal distribution gave the minimum Akaike information criterion (AIC) value for the positive rainfall data of both regions. Therefore, the rainfall data of the northern and northeastern regions follow delta-lognormal distributions. The summary statistics for the rainfall data from the northern region are 29, 23, 6, 0.56, and 2.28, while those for the northeastern region are 28, 18, 10, 1.10, and 3.26. The 95% confidence intervals for the ratio of the percentiles of the two populations based on the four approaches were, respectively, [0.0418, 4.6082] with an interval length of 4.5664, [0.0270, 3.3244] with an interval length of 3.2974, [0.0026, 3.1862] with an interval length of 3.1836, and [−0.8686, 2.3793] with an interval length of 3.2479. The BS credible interval provided the shortest length. The trace plot of the BS estimate is shown in Figure 4. The empirical results are therefore in accordance with the simulation study results.
Figure 4. Trace plot of the BS estimate for the ratio of the percentiles of the northern and northeastern regions.
5. Discussion and Conclusions
Percentiles are used to describe the dispersion of a probability distribution, and the ratio of percentiles is used to compare the dispersion of two populations. Data comprising right-skewed positive values together with zero observations are often encountered in many fields. The positive values conform to the log-normal distribution, whereas the number of zero observations conforms to the binomial distribution, so such data correspond to the delta-lognormal distribution. The ratio of the percentiles of delta-lognormal data therefore plays an important part in statistics, and confidence interval estimation is recommended for it. Herein, we present four approaches for estimating the confidence interval for the ratio of the percentiles of two delta-lognormal distributions: the FGCI approach using the fiducial quantity, the FGCI approach using the optimal generalized fiducial quantity, the BS approach, and the PB approach. The main advantage of these approaches is that they can be used to estimate confidence intervals for complex parameters, whereas the main disadvantage is that a simulation study is required to determine their values for a particular scenario. The FGCI approach requires numerical simulation based on the fiducial generalized pivotal quantity, the BS approach requires simulation with the prior distribution, and the PB approach is based on the sampling distribution. The performance of the proposed confidence intervals was compared to identify the most precise interval estimator.
The simulation study results indicate that, while the coverage probabilities of all of the approaches were suitable, the average lengths of the BS approach were the shortest for all of the scenarios tested. Thus, the BS approach is the best for constructing the confidence interval for the ratio of the percentiles of two delta-lognormal distributions, although the other methods are suitable alternatives. This conclusion is similar to those presented elsewhere [5]. The BS approach can be used to construct credible intervals for complex parameters and can be easily extended to inference on the percentiles of other distributions, which will be considered in future research. For the delta-lognormal distribution, the proposed approach can be applied to compare rainfall dispersion in two other areas. Moreover, it can be used in many other applications, such as insurance and PM2.5 dispersion.
Author Contributions
Conceptualization, S.-A.N. and W.T.; methodology, S.-A.N. and W.T.; software, W.T.; validation, S.-A.N., S.N. and N.S.; formal analysis, S.-A.N. and W.T.; investigation, S.N. and N.S.; resources, W.T.; data curation, W.T.; writing—original draft preparation, W.T.; writing—review and editing, S.-A.N. and W.T.; visualization, W.T.; supervision, S.-A.N.; project administration, S.-A.N. and S.N.; funding acquisition, S.-A.N. and S.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by King Mongkut’s University of Technology North Bangkok. Grant No: KMUTNB-66-KNOW-01.
Data Availability Statement
Rainfall data from the northern and northeastern regions of Thailand obtained from the Thai Meteorological Department were previously reported by Thangjai et al. [5].
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Huang, L.F.; Johnson, R.A. Confidence regions for the ratio of percentiles. Stat. Probab. Lett. 2006, 76, 384–392.
2. Huang, L.F. Approximated non parametric confidence regions for the ratio of two percentiles. Commun. Stat.-Theory Methods 2017, 46, 4004–4015.
3. Chakraborti, S.; Li, J. Confidence interval estimation of a normal percentile. Am. Stat. 2007, 61, 331–336.
4. Shrestha, S.; Fang, X.; Zech, W.C. What should be the 95th percentile rainfall event depths? J. Irrig. Drain. Eng. 2014, 140, 06013002.
5. Thangjai, W.; Niwitpong, S.A.; Niwitpong, S. Estimation of common percentile of rainfall datasets in Thailand using delta-lognormal distributions. PeerJ 2022, 10, 1–39.
6. Aitchison, J. On the distribution of a positive random variable having a discrete probability and mass at the origin. J. Am. Stat. Assoc. 1955, 50, 901–908.
7. Zhou, X.H.; Tu, W. Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics 2000, 56, 1118–1125.
8. Hasan, M.S.; Krishnamoorthy, K. Confidence intervals for the mean and a percentile based on zero-inflated lognormal data. J. Stat. Comput. Simul. 2018, 88, 1499–1514.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).