1. Introduction
When random variables take only non-negative values, their distribution often fits neither the normal nor any other symmetric distribution, so asymmetric distributions must be considered instead. Of these, the Birnbaum–Saunders (BS) distribution has received considerable attention because of its useful properties and close relationship to the normal distribution. The BS distribution was originally developed to model failure due to the development and growth of cracks in a material subjected to repeated stress cycles [1]. It has two positive parameters: $\alpha$, the shape parameter, and $\beta$, which is both the scale parameter and the median of the BS distribution. Suppose that random variable $X$ follows a BS distribution with parameters $\alpha$ and $\beta$ (denoted as $X \sim BS(\alpha, \beta)$); then its cumulative distribution function (cdf) is given by
$$F(x; \alpha, \beta) = \Phi\left[\frac{1}{\alpha}\left(\sqrt{\frac{x}{\beta}} - \sqrt{\frac{\beta}{x}}\right)\right], \quad x > 0, \ \alpha > 0, \ \beta > 0,$$
where $\Phi(\cdot)$ is the standard normal cdf. Subsequently, the probability density function (pdf) of the BS distribution can be written as
$$f(x; \alpha, \beta) = \frac{1}{2\alpha\beta\sqrt{2\pi}}\left[\left(\frac{\beta}{x}\right)^{1/2} + \left(\frac{\beta}{x}\right)^{3/2}\right]\exp\left[-\frac{1}{2\alpha^{2}}\left(\frac{x}{\beta} + \frac{\beta}{x} - 2\right)\right].$$
The BS distribution has several interesting properties. For example, for $X \sim BS(\alpha, \beta)$, (i) $\frac{1}{\alpha}\left(\sqrt{X/\beta} - \sqrt{\beta/X}\right) \sim N(0, 1)$, where $N(0, 1)$ denotes a standard normal distribution with mean 0 and variance 1; (ii) $cX \sim BS(\alpha, c\beta)$ for any $c > 0$; and (iii) $X^{-1} \sim BS(\alpha, \beta^{-1})$. From (i), the expected value and variance of $X$ are given by $E(X) = \beta\left(1 + \frac{1}{2}\alpha^{2}\right)$ and $Var(X) = (\alpha\beta)^{2}\left(1 + \frac{5}{4}\alpha^{2}\right)$, respectively. The BS distribution has been applied in many areas, such as reliability testing, environmental studies, engineering, finance, and medical sciences. For example, Birnbaum and Saunders [2] fitted it to datasets of the fatigue life of 6061-T6 aluminum coupons. Since then, the BS distribution has been used to model the active repair time of an airborne communication transceiver [3], wind speed [4], water quality [5], and the hardness of a commercially available polymeric bone cement when evaluating the effect of nanoparticles at different loading levels [6]. Durham and Padgett [7] applied the BS distribution to fatigue data for carbon fibers subjected to increasing stress. Other applications of the BS distribution can be found in [8,9,10,11,12].
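The relationship between the BS and standard normal distributions can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it uses the standard BS cdf and pdf and generates BS variates by inverting the normal transformation $Z = \frac{1}{\alpha}(\sqrt{X/\beta} - \sqrt{\beta/X})$; the function names, the seed, and the parameter values are hypothetical choices made here and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bs_cdf(x, alpha, beta):
    """BS cdf: Phi((sqrt(x/beta) - sqrt(beta/x)) / alpha) for x > 0."""
    if x <= 0:
        return 0.0
    xi = (math.sqrt(x / beta) - math.sqrt(beta / x)) / alpha
    return _STD_NORMAL.cdf(xi)


def bs_pdf(x, alpha, beta):
    """BS pdf, i.e. the derivative of bs_cdf with respect to x."""
    if x <= 0:
        return 0.0
    r = beta / x
    kernel = math.exp(-(x / beta + r - 2.0) / (2.0 * alpha ** 2))
    norm_const = 2.0 * alpha * beta * math.sqrt(2.0 * math.pi)
    return (math.sqrt(r) + r ** 1.5) * kernel / norm_const


def bs_sample(n, alpha, beta, rng):
    """Draw n BS variates by solving the normal transformation for X."""
    draws = []
    for _ in range(n):
        half = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (half + math.sqrt(half ** 2 + 1.0)) ** 2)
    return draws


if __name__ == "__main__":
    rng = random.Random(2019)   # arbitrary seed (assumption)
    alpha, beta = 0.5, 2.0      # illustrative parameter values (assumption)
    xs = bs_sample(20000, alpha, beta, rng)
    print("cdf at beta:", bs_cdf(beta, alpha, beta))
    print("sample median:", median(xs), "beta:", beta)
    print("sample mean:", sum(xs) / len(xs),
          "theoretical mean:", beta * (1 + alpha ** 2 / 2))
```

Evaluating the cdf at $x = \beta$ gives $\Phi(0) = 0.5$, which is why $\beta$ is the median as well as the scale parameter.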
The coefficient of variation (CV) is the standard deviation divided by the mean; it is a unit-free measure of dispersion, with a high CV indicating a high level of dispersion around the mean. Since it is free from units of measurement, it is often used instead of the standard deviation to compare the variability within and between populations. The CV has been used in many fields (e.g., economics, medicine, biology, and engineering). In addition, the difference between the CVs of two independent populations has been used to compare their dispersions. In statistical inference, confidence intervals are widely used to assess the population CV and the difference between the CVs of two independent populations, since an interval provides more information about the unknown population parameter of interest than a point estimator. A confidence interval at the nominal level of $100(1-\gamma)\%$ is an estimated range of values constructed so that, under repeated sampling, it includes the unknown population parameter of interest $100(1-\gamma)\%$ of the time. Moreover, it can be used to perform hypothesis testing. Several researchers have focused on constructing confidence intervals for the CV and the difference between the CVs of populations with various distributions. For instance, Tian [13] applied the generalized confidence interval (GCI) approach to construct a confidence interval for the common CV of several normal distributions. Mahmoudvand and Hassani [14] used an approximately unbiased estimator of the population CV and its variance to construct confidence intervals for the CV of a normal distribution. Banik and Kibria [15] compared several methods for constructing confidence intervals for the population CV of symmetric and positively skewed distributions. Sangnawakij and Niwitpong [16] constructed confidence intervals for the single CV and the difference between the CVs of two-parameter exponential distributions by using the method of variance estimates recovery (MOVER), GCI, and the asymptotic confidence interval. Thangjai and Niwitpong [17] proposed confidence intervals for the common CV of several normal populations by using the adjusted GCI and computational approaches. Yosboonruang et al. [18] proposed Bayesian credible intervals for the difference between the CVs of delta-lognormal distributions. Recently, La-ongkaew et al. [19] constructed confidence intervals for the difference between the CVs of Weibull distributions by using GCI, Bayesian methods, MOVER based on Hendricks and Robey’s confidence interval, a bootstrap method with standard errors, and a percentile bootstrap method.
Over the years, many other researchers have developed methods to construct confidence intervals for functions of the BS distribution parameters, such as the mean, $\alpha$, $\beta$, quantiles, and reliability. For example, Engelhardt et al. [20] presented hypothesis testing and confidence intervals for the parameters of BS distributions based on the maximum likelihood estimate. Wu and Wong [21] improved interval estimation for a BS distribution by applying a high-order likelihood asymptotic procedure. Ng et al. [22] proposed point and interval estimations for the parameters of a BS distribution based on type-II censored samples. Xu and Tang [23] used reference priors for the unknown parameters of a BS distribution to obtain Bayesian estimators based on Lindley’s method and the Gibbs sampling procedure. Subsequently, Wang [24] examined confidence intervals for $\alpha$, the mean, quantiles, and the reliability function of BS distributions based on the concept of GCI. Meanwhile, Niu et al. [25] proposed two tests, based on the exact generalized $p$-value approach and the delta method, to compare the mean, quantile, and reliability functions of several BS distributions. Li and Xu [26] considered two fiducial methods to estimate the parameters of BS distributions, based on the inverse of the structural equation and Hannig’s method. Recently, Guo et al. [27] presented confidence intervals and hypothesis testing for the common mean of several BS distributions. Despite the diverse theoretical and methodological developments for constructing confidence intervals for functions of the parameters of BS distributions, there have not yet been any studies on the single CV and the difference between the CVs of BS distributions. To fill this gap, we propose confidence intervals for these two scenarios by applying the concepts of GCI, the bootstrap confidence interval (BCI), the Bayesian credible interval (BayCI), and the highest posterior density (HPD) interval. Moreover, we applied the proposed methods to datasets of PM2.5 (particulate matter ≤ 2.5 μm) concentration in Chiang Mai, Thailand, collected in March and April 2019, to illustrate their efficacy.
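For concreteness, the two target quantities can be written in closed form. The short derivation below is a sketch that combines the definition of the CV with the standard first two moments of the BS distribution; the symbols $\tau$ (for the CV) and $\delta$ (for the difference between two CVs) are labels introduced here for illustration and are not necessarily the notation used in the rest of the paper.

```latex
\begin{align*}
E(X) &= \beta\left(1 + \tfrac{1}{2}\alpha^{2}\right), \qquad
\mathrm{Var}(X) = (\alpha\beta)^{2}\left(1 + \tfrac{5}{4}\alpha^{2}\right), \\
\tau &= \frac{\sqrt{\mathrm{Var}(X)}}{E(X)}
      = \frac{\alpha\beta\sqrt{1 + \tfrac{5}{4}\alpha^{2}}}
             {\beta\left(1 + \tfrac{1}{2}\alpha^{2}\right)}
      = \frac{\alpha\sqrt{4 + 5\alpha^{2}}}{2 + \alpha^{2}}, \\
\delta &= \tau_{1} - \tau_{2}
      = \frac{\alpha_{1}\sqrt{4 + 5\alpha_{1}^{2}}}{2 + \alpha_{1}^{2}}
      - \frac{\alpha_{2}\sqrt{4 + 5\alpha_{2}^{2}}}{2 + \alpha_{2}^{2}}.
\end{align*}
```

The scale parameter $\beta$ cancels, so the CV of a BS distribution depends only on the shape parameter. For example, $\alpha = 0.5$ gives $\tau = 0.5\sqrt{5.25}/2.25 \approx 0.509$.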
The rest of this paper is organized as follows. The methodologies for constructing confidence intervals for the CV and the difference between the CVs of BS distributions are described in Section 2 and Section 3, respectively. A simulation study and its results are presented in Section 4. In Section 5, real datasets are used to illustrate the efficacy of the proposed confidence intervals. Finally, conclusions are provided in Section 6.
4. Simulation Studies
A Monte Carlo simulation study was conducted by using R statistical software to evaluate the performances of the four methods used for constructing confidence intervals for the CV and the difference between the CVs of BS distributions under various combinations of parameters. We evaluated the performances of GCI, BCI, BayCI, and the HPD interval by measuring their coverage probabilities and average lengths based on 5000 independently generated replications, with 5000 pivotal quantities for GCI,
for BCI, and
for BayCI and the HPD interval. Note that we set hyperparameters
and
for BayCI and the HPD interval [34]. For the nominal confidence level of 0.95, the best-performing method is the one whose coverage probability is close to or greater than 0.95 and whose average length is the shortest. Since $\beta$ is the scale parameter and the median of the BS distribution, $\beta = 1$ was used without loss of generality in this simulation study.
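The two performance measures can be made concrete with a small Monte Carlo harness. The Python sketch below is not the simulation code used in this study (which was written in R): it estimates the coverage probability and average length of a single interval method, and, as a stand-in, it uses a plain percentile bootstrap built on the modified moment estimator of the shape parameter rather than the BCI defined in this paper. All function names, the seed, and the numbers of replications and resamples are hypothetical and deliberately small.

```python
import math
import random


def bs_sample(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates from standard normal draws."""
    out = []
    for _ in range(n):
        half = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (half + math.sqrt(half * half + 1.0)) ** 2)
    return out


def bs_cv(alpha):
    """Population CV of BS(alpha, beta); it does not depend on beta."""
    return alpha * math.sqrt(4.0 + 5.0 * alpha ** 2) / (2.0 + alpha ** 2)


def cv_estimate(xs):
    """Plug-in CV using the modified moment estimator of alpha."""
    n = len(xs)
    s = sum(xs) / n                     # arithmetic mean
    r = n / sum(1.0 / x for x in xs)    # harmonic mean
    alpha_hat = math.sqrt(max(2.0 * (math.sqrt(s / r) - 1.0), 0.0))
    return bs_cv(alpha_hat)


def percentile_bootstrap_ci(xs, rng, n_boot=200, level=0.95):
    """Plain percentile bootstrap interval (a stand-in, not the paper's BCI)."""
    n = len(xs)
    stats = sorted(cv_estimate([xs[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n_boot - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]


def evaluate(ci_method, n, alpha, reps, rng, beta=1.0):
    """Return (coverage probability, average length) over reps samples."""
    target = bs_cv(alpha)
    hits, total_len = 0, 0.0
    for _ in range(reps):
        lo, hi = ci_method(bs_sample(n, alpha, beta, rng), rng)
        hits += lo <= target <= hi      # True counts as 1
        total_len += hi - lo
    return hits / reps, total_len / reps


if __name__ == "__main__":
    rng = random.Random(4)  # arbitrary seed (assumption)
    cp, al = evaluate(percentile_bootstrap_ci, n=30, alpha=0.5, reps=200, rng=rng)
    print(f"coverage probability = {cp:.3f}, average length = {al:.3f}")
```

Any other interval method can be compared in the same harness by passing a function with the same `(sample, rng)` signature to `evaluate`.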
For the CV of a BS distribution, we used sample sizes $n$ = 10, 20, 30, 50, and 100 and shape parameters $\alpha$ = 0.1, 0.25, 0.5, 0.75, 1, and 2. The simulation results reported in Table 1 show that the coverage probabilities of GCI, BayCI, and the HPD interval were greater than or close to 0.95, even for a small sample size and/or a high value of $\alpha$. Conversely, BCI had the shortest average lengths, but its coverage probabilities were the lowest and fell below 0.95, improving only as $n$ increased. Among the other three methods, the HPD interval had the shortest average length in all cases. In addition, the average lengths of the four methods decreased and became similar as the sample size increased.
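The shorter average length of the HPD interval relative to BayCI is expected from how the two intervals are built from the same posterior draws. The Python sketch below illustrates this under stated assumptions: the equal-tailed interval takes fixed tail quantiles, whereas the HPD interval is approximated by the shortest window of sorted draws that contains about 95% of them, which is valid for a unimodal posterior. The function names are hypothetical, and the right-skewed gamma draws merely stand in for posterior draws of a CV; they are not the posterior used in this paper.

```python
import math
import random


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed credible interval from order statistics of the draws."""
    s = sorted(draws)
    m = len(s)
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (m - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (m - 1)))
    return s[lo_idx], s[hi_idx]


def hpd_interval(draws, level=0.95):
    """Shortest interval containing about `level` of the draws.

    This approximates the HPD interval when the posterior is unimodal.
    """
    s = sorted(draws)
    m = len(s)
    k = min(m - 1, int(math.ceil(level * m)))   # index span of each window
    best = min(range(m - k), key=lambda i: s[i + k] - s[i])
    return s[best], s[best + k]


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed (assumption)
    # Stand-in for posterior draws of a CV: a right-skewed gamma sample.
    draws = [rng.gammavariate(2.0, 0.2) for _ in range(1000)]
    et = equal_tailed_interval(draws)
    hp = hpd_interval(draws)
    print("equal-tailed:", et, "length:", et[1] - et[0])
    print("HPD:         ", hp, "length:", hp[1] - hp[0])
```

For a skewed posterior, the shortest window shifts toward the mode and drops part of the long tail, so its length is typically smaller than that of the equal-tailed interval; the two coincide for a symmetric unimodal posterior.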
For the difference between the CVs of BS distributions, we used sample sizes $(n_1, n_2)$ = (10,10), (20,20), (30,30), (50,50), (100,100), (10,20), (30,20), (30,50), and (100,50) and shape parameters $(\alpha_1, \alpha_2)$ = (0.25,0.25), (0.25,0.50), (0.25,1.00), (0.25,2.00), (0.50,0.50), (0.50,1.00), (0.50,2.00), (1.00,1.00), and (2.00,2.00). The simulation results for equal and unequal sample sizes are summarized in Table 2 and Table 3, respectively. The coverage probabilities of GCI, BayCI, and the HPD interval were greater than or close to 0.95 in all cases, irrespective of whether $n_1 = n_2$ (Table 2) or $n_1 \neq n_2$ (Table 3). For both equal and unequal sample sizes, the coverage probabilities of BCI were the lowest, whereas its average lengths were similar to those of the other methods or the shortest in all cases. Meanwhile, the average lengths of the HPD interval were shorter than those of GCI and BayCI under all circumstances. Moreover, the average lengths of the four methods decreased and became close to each other as the sample sizes $(n_1, n_2)$ increased.
5. An Empirical Application
The BS distribution has been successfully used to analyze air pollution concentration data, as its properties are similar to those of the lognormal distribution. For example, Leiva et al. [37] applied the BS distribution to examine sulfur dioxide concentration data and, later on, PM10 (particulate matter ≤ 10 μm) concentration data in Santiago, Chile [38]. In the second study, the authors proposed a criterion based on an attribute control chart for assessing environmental risk.
In Chiang Mai, Thailand, air pollution from agricultural burning and forest fires from February to May has become a serious problem. We used PM2.5 concentration datasets from Chiang Mai collected in March and April 2019 to illustrate the efficacy of the confidence intervals for the CV and the difference between the CVs of BS distributions derived using GCI, BCI, BayCI, and the HPD interval. The average daily PM2.5 concentrations were measured at 9:00 AM in the Chang Phueak area of Chiang Mai, Thailand. The datasets were obtained from the Pollution Control Department [39]. Since the data comprise positive values, candidate models include the lognormal, exponential, gamma, Weibull, and BS distributions. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to compare the fit of these distributions to the data. The results in Table 4 and Table 5 show that the AIC and BIC of the BS distribution were the smallest, indicating that it fits these datasets best among the candidates. To construct the BayCI and the HPD interval for the CV and the difference between the CVs of BS distributions using real data, we applied
and
for both sampling areas. The sample mean, sample variance, and CV of the data are 96.0645, 2896.9290, and 0.5603 for the March dataset and 78.3000, 887.6655, and 0.3805 for the April dataset, respectively. Consequently, the difference between the CVs is 0.1798. The 95% confidence intervals for the CV and the difference between the CVs of the BS distributions based on GCI, BCI, BayCI, and the HPD interval are reported in Table 6.
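The reported sample CVs follow directly from the summary statistics, since each is the square root of the sample variance divided by the sample mean. The short Python check below uses only the means and variances quoted above (the raw PM2.5 series is not reproduced here); the variable names are illustrative.

```python
import math

# Summary statistics as reported in the text: (sample mean, sample variance).
reported = {
    "March": (96.0645, 2896.9290),
    "April": (78.3000, 887.6655),
}

# Sample CV = sample standard deviation / sample mean.
cv = {month: math.sqrt(var) / mean for month, (mean, var) in reported.items()}

for month, value in cv.items():
    print(f"{month}: CV = {value:.4f}")
print(f"difference between CVs = {cv['March'] - cv['April']:.4f}")
```

To four decimal places, this reproduces the values 0.5603, 0.3805, and 0.1798 given in the text.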
From the results, it can be seen that BCI produced the shortest intervals, while the HPD intervals were shorter than those of GCI and BayCI, which agrees with the results of the simulation study. However, BCI attained the lowest coverage probabilities in the simulation study, so it is not recommended for constructing confidence intervals for the CV and the difference between the CVs of the BS distributions for these two datasets. Meanwhile, the coverage probabilities of the HPD interval for the CV and the difference between the CVs of BS distributions were greater than or close to 0.95 in the simulation study. Thus, under these circumstances, the HPD interval is the best-performing interval for the CV and the difference between the CVs of the BS distributions of these two datasets.