1. Introduction
The Rayleigh distribution is an asymmetric continuous probability model that is particularly effective for describing data with a pronounced right skew. In various fields, it is common to analyze data comprising values greater than zero, such as mortality rates, the number of deaths caused by diseases, or fatalities resulting from accidents. Such data are characteristically asymmetric because of a long right tail: extreme values may be observed, but they occur infrequently. The Rayleigh distribution was first introduced by Rayleigh [1] and is extensively applied in fields including medical research, quality control, and reliability engineering. For example, Arikan et al. [2] explored wind power, optimal turbine selection, and associated costs in Turkey using the Rayleigh distribution. Similarly, Almongy et al. [3] analyzed real COVID-19 mortality rate data from Italy, Mexico, and the Netherlands across different time periods. Kilai et al. [4] used the Rayleigh distribution to study COVID-19 mortality rates in Italy and Canada. Kilany et al. [5] applied the power Rayleigh distribution to describe hydrologic data and the total COVID-19 deaths in Egypt. Finally, Seal et al. [6] analyzed the strength of carbon fibers, measured in gigapascals, and applied the findings to predict the lifetime of T8 fluorescent lamps.
In statistical inference, estimating the unknown parameters of a distribution is crucial for fields that rely on accurate forecasting. Numerous studies have applied statistical inference in areas including public health, reliability, and degradation modeling. Xu et al. [7] proposed a novel class of tail-weighted multivariate degradation models and demonstrated their application using two real-world degradation datasets. Xu et al. [8] explored adaptive sampling and ensemble techniques for estimating the remaining useful life of machinery, which contributes to operational efficiency and cost-effective maintenance. Li et al. [9] introduced a Bayesian hierarchical model designed for public health applications, validated with data from mouse brain research and studies on human pancreatic ductal adenocarcinoma. Ma et al. [10] developed a new algorithm for statistical inference aimed at estimating parameters in regression models, with an application to chemical sensor data. The two unknown parameters of the two-parameter Rayleigh distribution are the location and scale parameters, which describe the characteristics of the distribution. Various methods are used to estimate unknown parameters, the most popular being the maximum likelihood method and the Bayesian method. Dey et al. [11] considered several estimation methods, including maximum likelihood, the method of moments, L-moments, percentile-based estimates, the least squares method, and Bayesian estimation, where the Bayesian procedure used a gamma prior for the scale parameter and a uniform prior for the location parameter. Ardianti and Sutarman [12] estimated the parameter of the one-parameter Rayleigh distribution using maximum likelihood and Bayesian estimation under a non-informative prior. El-Faheem et al. [13] proposed a reliability function and estimated the parameters of the two-parameter Rayleigh distribution using maximum likelihood, the first and second modified methods of moments, and Bayesian estimation. Krishnamoorthy et al. [14] developed various confidence intervals for the mean based on the method of pivotal quantities. Seal et al. [6] proposed Bayesian estimation for the Rayleigh distribution under a distance-type loss function. For distributions characterized by an excess of zeros, the concept of zero-inflated modeling was first introduced by Lambert [15]. Subsequently, this concept has been extended to various distributions, leading to the development of zero-inflated versions of well-known models. In particular, the Rayleigh distribution has been modified to incorporate zero inflation. Fuxiang et al. [16] proposed a zero-inflated Rayleigh distribution with a single parameter, introducing its probability distribution function and methods for estimating parameters. Since then, numerous studies have developed confidence intervals for zero-inflated models based on various underlying distributions. For example, Maneerat et al. [17] suggested approaches for constructing confidence intervals for the mean of a lognormal distribution with some zero values, known as the delta-lognormal distribution, using the method of variance estimates recovery and the generalized confidence interval. The approaches were demonstrated on two real datasets: the abundance of red cod in the waters of New Zealand and airborne chlorine in the United States.
Only a few studies have explored methods for inferring parameters of the two-parameter Rayleigh distribution. For example, Seo et al. [18] constructed an exact confidence interval for a two-parameter Rayleigh distribution, while Teawpongpan and Sangnawakij [19] proposed confidence intervals for the mean of the one-parameter Rayleigh distribution based on the large-sample method. This research is the first to introduce a methodology for constructing confidence intervals for the single mean of the zero-inflated two-parameter Rayleigh distribution. It builds upon the work of Kilai et al. [4], who applied the Rayleigh distribution to model the mortality rate of COVID-19. Our work differs from previous studies in that we develop a zero-inflated version of the distribution to improve the accuracy and effectiveness of mean estimation in practical applications. Elshahhat [20] proposed a new joint Type-I hybrid censoring scheme for comparing the lifetimes of products from two production lines based on the one-parameter Rayleigh distribution. Maximum likelihood and Bayesian methods were employed to analyze the censored data, and the performance was evaluated through simulations and real engineering data. Ahmad et al. [21] introduced a generalized Rayleigh distribution that provides enhanced flexibility for modeling lifetime data. The study explored several statistical properties, applied maximum likelihood estimation (MLE) for parameter inference, and demonstrated the model’s effectiveness through real data applications. Shen et al. [22] proposed a new extended generalized Rayleigh distribution with heavy-tailed characteristics, applied to Reddit advertising and breast cancer datasets. Parameters were estimated using maximum likelihood, and the model’s performance was compared with other Rayleigh-type distributions.
This paper focuses on constructing credible and confidence intervals for the single mean of a zero-inflated two-parameter Rayleigh distribution. Six methods are used for interval estimation: the percentile bootstrap method, the generalized confidence interval, the standard confidence interval based on maximum likelihood estimation, the approximate normal method based on the delta method, the Bayesian credible interval, and the Bayesian highest posterior density interval. In the Bayesian approach for estimating the single mean, the scale parameter is assigned a gamma prior, while the location parameter is assigned a uniform prior. Coverage probability and expected length were used to evaluate the effectiveness of the proposed credible and confidence intervals, both obtained through Monte Carlo simulations conducted using R version 4.4.1 and OpenBUGS version 3.2.3. In Section 4, total death data collected by the WHO for Singapore in October 2022 are used to estimate confidence intervals for the mean.
The remainder of this paper is organized as follows.
Section 2 introduces the zero-inflated two-parameter Rayleigh distribution and details the proposed interval estimation methods.
Section 3 presents the results of the simulation study.
Section 4 illustrates the application of the proposed methods to real data, and
Section 5 concludes the paper with a summary and recommendations for future applications.
2. Materials and Methods
Let
represent a set of observations drawn independently from a zero-inflated two-parameter Rayleigh distribution, denoted by
. The probability density function of this distribution is defined as follows:
where
, which is greater than zero, represents the scale parameter of the Rayleigh distribution, and
is the location parameter. The probability of zeros is given by the parameter
. The maximum likelihood method can be used to estimate the parameters
,
, and
. However, closed-form solutions are not available for
and
. Dey et al. [
11] and Krishnamoorthy et al. [
14] proposed calculating the MLEs of
and
using numerical approaches. The maximum likelihood estimator for
is obtained by solving the following equation:
, while the MLE of
is given by
. The estimator for
is given by
.
is the number of zeros in the sample of size
n, and
. The distribution expectation is defined by
The two-parameter Rayleigh distribution is employed in this study for its greater flexibility in modeling data with fast-aging behavior.
Figure 1 shows how the distribution’s shape varies with different parameter values.
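To make the model and the numerical MLE concrete, the following Python sketch simulates zero-inflated two-parameter Rayleigh data and fits it by a one-dimensional profile-likelihood search. It is an illustration under stated assumptions, not the authors' R code: it assumes the location-scale form with location a and scale b (positive-part mean a + b·sqrt(π/2)), which may differ from the parameterization in the paper's equations, and all function names are hypothetical.

```python
import math
import random

def sample_zi_rayleigh(n, a, b, delta, rng):
    """Draw n values: 0 with probability delta, else a + Rayleigh(b)."""
    out = []
    for _ in range(n):
        if rng.random() < delta:
            out.append(0.0)
        else:
            u = rng.random()
            # Inverse CDF of a Rayleigh(b) variate, shifted by the location a.
            out.append(a + b * math.sqrt(-2.0 * math.log(1.0 - u)))
    return out

def zi_rayleigh_mean(a, b, delta):
    """Mean of the mixture under the assumed (a, b) parameterization."""
    return (1.0 - delta) * (a + b * math.sqrt(math.pi / 2.0))

def profile_loglik(a, pos):
    """Log-likelihood of the positive part with b set to its maximizer."""
    m = len(pos)
    s = sum((x - a) ** 2 for x in pos)
    return sum(math.log(x - a) for x in pos) - m * math.log(s / (2.0 * m)) - m

def fit_zi_rayleigh(data, grid=200, iters=60):
    """Numerical MLE (a_hat, b_hat, delta_hat) via a 1-D search over a."""
    pos = [x for x in data if x > 0]
    m, n = len(pos), len(data)
    lo, hi = min(pos) - (max(pos) - min(pos)), min(pos) - 1e-9
    # Coarse grid to bracket the maximum, then golden-section refinement.
    pts = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    k = max(range(grid + 1), key=lambda i: profile_loglik(pts[i], pos))
    left, right = pts[max(k - 1, 0)], pts[min(k + 1, grid)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = right - g * (right - left), left + g * (right - left)
        if profile_loglik(c, pos) > profile_loglik(d, pos):
            right = d
        else:
            left = c
    a_hat = (left + right) / 2.0
    b_hat = math.sqrt(sum((x - a_hat) ** 2 for x in pos) / (2.0 * m))
    return a_hat, b_hat, (n - m) / n  # delta_hat is the proportion of zeros
```

A plug-in estimate of the mean then follows by passing the three fitted values to `zi_rayleigh_mean`. The search interval for the location is a heuristic choice and would need widening for data where the likelihood peaks far below the smallest positive observation.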
2.1. The Percentile Bootstrap Confidence Interval
The concept was first introduced by Efron and Tibshirani [23]; the bootstrap confidence interval is based on the percentiles of the estimates computed from the bootstrap resamples. The mean can be estimated using Equation (2), and a two-sided percentile bootstrap confidence interval for the mean of the zero-inflated two-parameter Rayleigh distribution can be constructed as shown in Equation (3):
where
represent the
percentile of
. The confidence interval for the single mean of zero-inflated two-parameter Rayleigh distribution based on percentile bootstrap is constructed according to Algorithm 1.
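Algorithm 1 The percentile bootstrap method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Resample with replacement from the original data, using the same sample size.
3. Compute the estimates of the three parameters using maximum likelihood.
4. Compute the estimate of the mean from Equation (2).
5. Perform steps 2–4 for B iterations, where B represents the number of bootstrap resamples.
6. Arrange the B bootstrap estimates of the mean in ascending order.
7. Compute the 95% percentile bootstrap interval, as given by Equation (3).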
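The resampling loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the estimator is passed in as a function, and the sample mean is used here as a simple stand-in for the maximum likelihood plug-in estimate of Equation (2). The function names are hypothetical, and the data are the Singapore observations listed in Section 4.

```python
import random

def percentile_bootstrap_ci(data, estimator, B=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(B):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = 1.0 - level
    lower = stats[int((alpha / 2.0) * B)]
    upper = stats[min(B - 1, int((1.0 - alpha / 2.0) * B))]
    return lower, upper

def sample_mean(xs):
    """Stand-in estimator; the paper uses the MLE-based mean instead."""
    return sum(xs) / len(xs)

deaths = [2, 1, 2, 2, 0, 1, 0, 4, 0, 3, 2, 2, 3, 0, 2, 3, 2, 5, 3, 5, 0,
          1, 2, 1, 3, 2, 2, 0, 4, 2, 4]
lower, upper = percentile_bootstrap_ci(deaths, sample_mean)
print(lower, upper)
```

Swapping `sample_mean` for a function that refits the model on each resample and returns the plug-in mean would follow the algorithm more closely, at a higher computational cost.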
2.2. The Generalized Confidence Interval (GCI)
The generalized confidence interval (GCI) was first proposed by Weerahandi [24] and is grounded in the concept of the generalized pivotal quantity (GPQ). A pivotal quantity and a GPQ differ fundamentally: a pivotal quantity is a function of the sample and the parameter of interest only, whereas a GPQ may also involve nuisance parameters. However, the observed value of the GPQ does not depend on these nuisance parameters, and its distribution is free of unknown parameters, which makes it suitable for inference even in complex models.
Krishnamoorthy et al. [
14] proposed pivotal quantities specifically for the unknown parameters of the two-parameter Rayleigh distribution. The parameters are represented as
a =
and
.
Let
and
be observed values of
and
, respectively. The pivotal quantities for the parameters are given by
and
where
and
are the maximum likelihood estimates from the two-parameter Rayleigh distribution with
a = 0 and
b = 1, based on a sample of size n. The pivotal quantities for
a and
b follow the distribution of
and
Thus, the pivotal quantity for
is defined as
and
. Furthermore, Wu and Hsieh [25] constructed a pivotal quantity based on the variance-stabilizing transformation (VST), which is
where
. Hence, considering the three pivotal quantities
,
, and
, which are independent of the parameter values, GPQ for the mean of the zero-inflated Rayleigh distribution is given by
Therefore, GCI for the single mean of the zero-inflated Rayleigh distribution is given by Equation (
10):
The confidence intervals for the single mean of the zero-inflated two-parameter Rayleigh distribution, based on the GCI method, are constructed according to Algorithm 2.
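Algorithm 2 The GCI method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Compute the maximum likelihood estimates of the location and scale parameters.
3. Generate a sample of size n from the two-parameter Rayleigh distribution with a = 0 and b = 1.
4. Compute the maximum likelihood estimates of the location and scale parameters from this sample.
5. Compute the pivotal quantities for the three parameters.
6. Compute the GPQ for the mean.
7. Perform steps 3–6 repeatedly for q times, where q represents the number of generalized computations.
8. Arrange the q values of the GPQ in ascending order.
9. Evaluate the 95% GCI, as given by Equation (10).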
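As a hedged illustration of how the three quantities in Algorithm 2 combine, the sketch below writes out one standard construction for a location-scale family. It assumes location a, scale b (so that the mean of the positive part is a + b√(π/2)), and δ for the probability of zeros. Subscript 0 marks estimates from the observed data, starred quantities are MLEs from a sample simulated with a = 0 and b = 1, and Z is a standard normal variate. The arcsine form of the VST-based quantity is our assumption about the construction attributed to Wu and Hsieh [25]; the paper's numbered equations may be parameterized differently.

```latex
\begin{align*}
G_b &= \frac{\hat{b}_0}{\hat{b}^{*}}, &
G_a &= \hat{a}_0 - \frac{\hat{a}^{*}}{\hat{b}^{*}}\,\hat{b}_0, \\
G_\delta &= \sin^{2}\!\left(\arcsin\sqrt{\hat{\delta}_0} - \frac{Z}{2\sqrt{n}}\right), &
G_\theta &= \left(1 - G_\delta\right)\left(G_a + G_b\sqrt{\pi/2}\right).
\end{align*}
```

Under these assumptions, the 95% GCI would be the 2.5th and 97.5th percentiles of the q simulated values of the GPQ for the mean.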
2.3. Standard Confidence Interval Based on Maximum Likelihood
The standard confidence interval is a basic method for constructing confidence intervals for parameters. The method for the parameters of the two-parameter Rayleigh distribution was first proposed by Dey et al. [11].
Let
represent observation drawn independently with sample size
n. The approximate confidence interval of
, and
are given as follows:
and
where
, and
are the maximum likelihood estimators from random samples, and the standard confidence interval for the single mean, based on the maximum likelihood estimators, is constructed using Equation (
14):
The standard confidence interval for the mean of the zero-inflated Rayleigh distribution based on the maximum likelihood method is constructed according to Algorithm 3.
Algorithm 3 The standard confidence interval method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Estimate the three parameters using maximum likelihood.
3. Compute the confidence intervals for the parameters from Equations (11)–(13).
4. Compute the estimate of the mean from Equation (2).
5. Compute the standard confidence interval for the single mean using the maximum likelihood method from Equation (14).
2.4. Approximation Normal for Confidence Interval Based on Delta Method
The delta method is a normal-approximation technique that derives the limiting normal distribution of a smooth function of an asymptotically normal estimator. This common approach is summarized as follows:
Suppose that
is a differentiable scalar function with parameters of interest. For the estimator
, the objective is to identify the asymptotic distribution of
G by constructing a stochastic representation. The delta method is applied to confirm the asymptotic normality of
G as the sample size tends to infinity, enabling the calculation of the asymptotic mean and variance of
. In Equation (
15), the function
g is expanded into Taylor series at
, the true values corresponding to
.
The basic statistics are assumed here as
and
, where
are the MLE of
respectively. Then,
, where the function
. The partial derivatives are as follows:
Therefore, the expression for is derived using Taylor series expansion of the function around the true parameter values. This approximation linearizes the function with respect to the parameter estimates , and given by
.
The approximate mean and variance for can be derived by considering the expectation and variance given by
,
,
,
.
Thus, as
,
is approximately normally distributed with mean
and variance
. The estimations of the variances
and
for the two-parameter Rayleigh distribution were presented by Dey et al. [11], and the results are given as follows:
and
. We use plug-in estimators for
and
as follows:
The estimate of
is as follows:
where
represents a positive observation from the two-parameter Rayleigh distribution, and
denotes the number of observations with values greater than zero.
Therefore, the approximate normal confidence interval for the single mean based on the delta method is given by Equation (18):
where
, with
, and
being the maximum likelihood estimators. The variance of
denoted by
is estimated using the delta method and computed according to Equation (
16). The confidence interval for the mean of the zero-inflated two-parameter Rayleigh distribution, based on approximate normality from the delta method, is constructed according to Algorithm 4.
Algorithm 4 The approximation normal method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Estimate the three parameters using maximum likelihood.
3. Compute the variance of the estimated mean based on the delta method from Equation (17).
4. Evaluate the confidence interval for the mean by utilizing the delta method with the approximate normal distribution, as shown in Equation (18).
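The first-order expansion behind Algorithm 4 can be written out as a short worked derivation. This is a sketch under assumptions rather than a restatement of the paper's Equations (15)–(18): it uses the location-scale form with location a, scale b, and zero probability δ, writes c = √(π/2), and treats the estimator of δ as uncorrelated with the estimators of a and b.

```latex
\begin{align*}
\theta &= g(\delta, a, b) = (1-\delta)\,(a + c\,b), \qquad c = \sqrt{\pi/2}, \\
\frac{\partial g}{\partial \delta} &= -(a + c\,b), \qquad
\frac{\partial g}{\partial a} = 1-\delta, \qquad
\frac{\partial g}{\partial b} = (1-\delta)\,c, \\
\operatorname{Var}(\hat{\theta}) &\approx (a + c\,b)^{2}\,\frac{\delta(1-\delta)}{n}
 + (1-\delta)^{2}\left[\operatorname{Var}(\hat{a}) + c^{2}\operatorname{Var}(\hat{b})
 + 2c\operatorname{Cov}(\hat{a},\hat{b})\right], \\
\text{CI} &= \hat{\theta} \pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\theta})}.
\end{align*}
```

In practice the unknown parameters on the right-hand side are replaced by their maximum likelihood estimates, which is the plug-in step described in the text.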
2.5. Bayesian Credible Intervals
The Bayesian method, or Bayesian parameter estimation, is an approach to estimating parameters by combining observed data with a prior distribution. This results in the posterior probability density function based on the given dataset obtained from random sampling. The posterior density function is given by
where
represents the likelihood function based on observation. The prior distribution for the scale parameter is represented by
,
denotes the prior distribution for the location parameter, and
is the posterior distribution function. Since the posterior distributions of the Rayleigh distribution cannot be computed analytically, we use Gibbs sampling to estimate the posterior distribution and approximate the unknown parameters.
In this research, we propose two methods for Bayesian credible intervals: the Bayesian MCMC interval and the Bayesian highest posterior density (HPD) interval.
2.5.1. Bayesian MCMC Interval
Suppose that the observations are drawn independently from a zero-inflated two-parameter Rayleigh distribution. Markov chain Monte Carlo simulation was employed to sample from the posterior distribution of the parameters of interest. A gamma distribution was used as the prior for the scale parameter, a uniform prior was chosen for the location parameter, and a beta distribution was used as the prior for the probability of zeros. Informative priors were employed to incorporate prior information into the analysis and to improve the accuracy of parameter estimation by reducing the variability of the posterior mean. This approach also enables reliable parameter inference even when the observed data are limited. The R2OpenBUGS package in R was used to compute the Bayesian estimates.
Therefore, the single mean is calculated using Equation (2), and the two-sided credible interval for the single mean of the zero-inflated Rayleigh distribution based on Bayesian MCMC is given by Equation (20):
The credible intervals for the single mean of the zero-inflated two-parameter Rayleigh distribution based on Bayesian MCMC are constructed according to Algorithm 5.
Algorithm 5 The Bayesian MCMC method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Set the prior distributions of the three parameters using trial hyperparameters.
3. Generate the posterior distributions of the three parameters using Gibbs sampling in R and OpenBUGS.
4. Compute the mean from the sampled parameter values.
5. Repeat steps 3–4 for T times, where T denotes the total number of Gibbs sampling replications.
6. Burn in C samples, where C is the number of values eliminated in the MCMC.
7. Arrange the remaining values of the mean in ascending order.
8. Compute the 95% Bayesian MCMC interval, as given by Equation (20).
2.5.2. The Bayesian Highest Posterior Density (HPD) Interval
The Bayesian HPD interval is the interval in which every point has a higher posterior density than any point outside it. It is also the shortest interval with the stated posterior probability [26]. This paper calculates the Bayesian HPD interval using the HDInterval package in R. The single mean is calculated using Equation (2), and the two-sided credible intervals for the single mean of the zero-inflated two-parameter Rayleigh distribution, based on the Bayesian HPD interval, are given by Equation (21):
The credible intervals for the single mean of zero-inflated two-parameter Rayleigh distribution based on Bayesian HPD are constructed according to Algorithm 6.
Algorithm 6 The Bayesian HPD method
1. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
2. Set the prior distributions of the three parameters using trial hyperparameters.
3. Generate the posterior distributions of the three parameters using Gibbs sampling in R and OpenBUGS.
4. Compute the mean from the sampled parameter values.
5. Repeat steps 3–4 for T times, where T denotes the total number of Gibbs sampling replications.
6. Burn in C samples, where C is the number of values eliminated in the MCMC.
7. Compute the highest posterior density using the HDInterval package in R.
8. Arrange the resulting values in ascending order.
9. Compute the 95% Bayesian HPD interval, as given by Equation (21).
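The difference between the equal-tailed MCMC interval of Algorithm 5 and the HPD interval of Algorithm 6 shows up once posterior draws are available. The Python sketch below computes both from a list of draws. It is an illustration only: the function names are hypothetical, the draws come from an exponential distribution chosen as a right-skewed stand-in for a real Gibbs sampler output, and the HPD rule used is the common "shortest window of sorted draws" approximation, not a reproduction of the HDInterval package internals.

```python
import random

def equal_tailed_interval(draws, level=0.95):
    """Interval between the lower and upper tail percentiles of the draws."""
    s = sorted(draws)
    k = len(s)
    tail = (1.0 - level) / 2.0
    return s[int(tail * k)], s[min(k - 1, int((1.0 - tail) * k))]

def hpd_interval(draws, level=0.95):
    """Shortest window of sorted draws that contains `level` of them."""
    s = sorted(draws)
    k = len(s)
    width = max(1, int(round(level * k)))  # number of draws inside the window
    best = min(range(k - width + 1), key=lambda i: s[i + width - 1] - s[i])
    return s[best], s[best + width - 1]

rng = random.Random(42)
draws = [rng.expovariate(1.0) for _ in range(5000)]  # skewed stand-in posterior
et = equal_tailed_interval(draws)
hpd = hpd_interval(draws)
print("equal-tailed length:", et[1] - et[0])
print("HPD length:", hpd[1] - hpd[0])
```

For a right-skewed set of draws the HPD window shifts toward the mode and is no longer than the equal-tailed interval, which is consistent with the shorter expected lengths reported for the HPD method.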
3. Results
This research compares the efficiency of the intervals by evaluating coverage probability (CP) and expected length (EL) using Monte Carlo simulations conducted with R version 4.4.1. The main objective is to develop and present methods for constructing confidence intervals for the mean of the zero-inflated two-parameter Rayleigh distribution. The study examines the performance of the intervals obtained from the percentile bootstrap, the generalized confidence interval, the standard confidence interval based on maximum likelihood estimation, the approximate normal confidence interval using the delta method, Bayesian Markov chain Monte Carlo, and the Bayesian highest posterior density. Sample sizes of n = 20, 50, 70, and 100 were considered. The data were generated from a zero-inflated two-parameter Rayleigh distribution with scale parameters = 0.2 and 0.5, location parameters = 0.2 and 0.5, and the probability of zeros = 0.2, 0.4, and 0.6. We used M = 1000 Monte Carlo replications. Within each replication, the bootstrap method used B = 1000 resamples, the GCI used q = 2500 pivotal quantities, and the Bayesian methods used T = 20,000 Gibbs sampling realizations with a burn-in of C = 5000. The best method was the one whose CP was greater than or equal to the nominal confidence level of 0.95 and whose EL was the shortest.
The CP and EL values of the intervals for the single mean of the zero-inflated two-parameter Rayleigh distribution are evaluated according to Algorithm 7.
Algorithm 7 CP and EL of the confidence intervals for single mean of the zero-inflated two-parameter Rayleigh distribution
1. Set the number of iterations and the parameters of the distribution.
2. Generate a random sample from the zero-inflated two-parameter Rayleigh distribution.
3. Use Algorithm 1 to compute the 95% percentile bootstrap confidence interval for the single mean.
4. Use Algorithm 2 to compute the 95% generalized confidence interval for the single mean.
5. Use Algorithm 3 to compute the 95% standard confidence interval using maximum likelihood for the single mean.
6. Use Algorithm 4 to compute the 95% approximate normal confidence interval for the single mean.
7. Use Algorithm 5 to compute the 95% Bayesian MCMC credible interval for the single mean.
8. Use Algorithm 6 to compute the 95% Bayesian HPD credible interval for the single mean.
9. Set P = 1 if the true mean lies between the lower and upper limits of the interval, and set P = 0 if this condition is not satisfied.
10. Find the length of the interval, that is, the upper limit minus the lower limit.
11. Perform steps 2–10 for M times.
12. Calculate the average of P to assess the coverage probability.
13. Find the average value of the interval length to assess the expected length.
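The bookkeeping in Algorithm 7 can be sketched as a short Monte Carlo loop in Python. The code below is illustrative and makes several assumptions: it uses the location-scale form with location a and scale b, all function names are hypothetical, and a plain normal-approximation interval for the sample mean stands in for the six interval methods, each of which would be plugged in as `interval_fn`.

```python
import math
import random

def sample_zi_rayleigh(n, a, b, delta, rng):
    """Draw n values: 0 with probability delta, else a + Rayleigh(b)."""
    return [0.0 if rng.random() < delta
            else a + b * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            for _ in range(n)]

def normal_mean_interval(data, z=1.959964):
    """Stand-in interval: sample mean +/- z * standard error."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

def coverage_and_length(interval_fn, n, a, b, delta, M=1000, seed=0):
    """Return (CP, EL) of interval_fn over M simulated samples."""
    rng = random.Random(seed)
    true_mean = (1.0 - delta) * (a + b * math.sqrt(math.pi / 2.0))
    covered, total_len = 0, 0.0
    for _ in range(M):
        data = sample_zi_rayleigh(n, a, b, delta, rng)
        lower, upper = interval_fn(data)
        covered += 1 if lower <= true_mean <= upper else 0  # indicator P
        total_len += upper - lower                          # interval length
    return covered / M, total_len / M

cp, el = coverage_and_length(normal_mean_interval, n=50, a=0.5, b=0.5, delta=0.2)
print("CP:", cp, "EL:", el)
```

Keeping the interval method as a function argument means the same loop can score every method on identical simulated samples, which makes the CP and EL comparisons across methods fair.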
The performance comparison of confidence intervals for the zero-inflated two-parameter Rayleigh distribution, based on the simulated CP and EL values in Tables 1 and 2 and Figures 2–13, focuses on the confidence intervals of the single mean, with scale parameters
and location parameter
. The standard confidence interval based on maximum likelihood provides coverage probabilities lower than the nominal confidence level and exhibits a larger expected length compared to other methods, especially for small sample sizes. The approximate normal method using the delta method yields higher CP values than the standard method but also results in greater expected lengths. The expected length generally decreases as the sample size increases across most methods. The simulation results across various scenarios indicate that the Bayesian method outperforms the others in estimating the single mean when evaluated based on CP and EL. Additionally, the Bayesian HPD interval offers the shortest credible interval length.
4. Application
The total number of deaths from COVID-19 plays a crucial role in understanding the overall impact of the pandemic on public health. This information enables governments and health authorities to assess the severity of the outbreak and evaluate the effectiveness of the implemented control measures. Moreover, it is valuable for planning and allocating medical resources appropriately based on the situation. Additionally, it helps raise public awareness about the importance of disease prevention and encourages greater compliance with recommendations from government agencies and health organizations. COVID-19 total death data for Singapore in October 2022 were compiled by the World Health Organization (WHO) and are accessible at https://covid19.who.int/WHO-COVID-19-globaldata.csv (last accessed on 28 June 2023). The n = 31 observations were as follows: 2, 1, 2, 2, 0, 1, 0, 4, 0, 3, 2, 2, 3, 0, 2, 3, 2, 5, 3, 5, 0, 1, 2, 1, 3, 2, 2, 0, 4, 2, 4. The zero observations in the dataset were modeled using a binomial distribution, while the positive data were analyzed using the model with the minimum Akaike information criterion (AIC) and minimum Bayesian information criterion (BIC), as reported in
Table 3 and
Table 4. The parameter estimates for the dataset are statistically summarized as follows:
, and
. The confidence interval for the single mean of total COVID-19 deaths in Singapore is reported in
Table 5.
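The basic summaries of the dataset can be checked directly from the listed values. The short Python sketch below counts the zeros, computes the proportion of zeros (the estimator of the zero probability described in Section 2), and reports the sample mean. It does not reproduce the paper's fitted location and scale estimates or the intervals in Table 5, and the variable names are illustrative.

```python
deaths = [2, 1, 2, 2, 0, 1, 0, 4, 0, 3, 2, 2, 3, 0, 2, 3, 2, 5, 3, 5, 0,
          1, 2, 1, 3, 2, 2, 0, 4, 2, 4]

n = len(deaths)                              # sample size
n_zero = sum(1 for x in deaths if x == 0)    # number of zero observations
delta_hat = n_zero / n                       # proportion of zeros
positives = [x for x in deaths if x > 0]     # data for the Rayleigh part
sample_mean = sum(deaths) / n

print(n, n_zero, round(delta_hat, 4), round(sample_mean, 4))
# prints: 31 6 0.1935 2.0323
```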
The results are shown in Table 5. The standard confidence interval based on maximum likelihood and the approximate normal method derived from the delta method are not recommended for constructing the confidence interval for total deaths from COVID-19 in Singapore. The simulation results indicate that when the sample size n is small, the standard confidence interval often has a coverage probability below the nominal confidence level. Moreover, as the value of
increases, the mean estimator fails to capture the true mean of the target distribution, and the resulting confidence interval tends to be wider than those obtained from other methods. Similarly, for the approximate normal (AN) method, although the confidence interval tends to cover the parameter and becomes narrower as n increases, it still produces wider intervals than other methods when n is small. Therefore, we recommend the percentile bootstrap method, the GCI method, the Bayesian MCMC method, and the Bayesian HPD method, as these approaches provide intervals that adequately cover the mean parameter. Among them, the Bayesian HPD method is the most effective, as it produces the shortest interval for the mean.
5. Discussion
Previous studies on constructing confidence intervals and Bayesian estimation have often shown that Bayesian methods yield the most effective results. In this research, Bayesian approaches were employed to construct confidence intervals for the single mean of the zero-inflated Rayleigh distribution. Specifically, the scale parameter was modeled using a gamma prior, while the location parameter was modeled using a uniform prior. The results indicated that Bayesian credible intervals performed well in nearly all scenarios, consistently achieving coverage probabilities close to the specified confidence level. This consistent and stable performance suggests that the Markov chains have likely converged satisfactorily. Furthermore, the Bayesian highest posterior density (HPD) intervals demonstrated superior performance compared to the Bayesian MCMC intervals in several cases. For example, when the parameters were , , and , the HPD method outperformed the MCMC method for sample sizes of 20, 70, and 100. Similarly, for , , and , the HPD method showed better performance for sample sizes of 20, 50, and 100. When the parameters were , , and , the HPD method again outperformed the MCMC method for sample sizes of 20, 50, and 70. Moreover, in several other scenarios not explicitly mentioned, the HPD method consistently yielded better results. In particular, when the parameters were , , and , or , , and , the HPD method outperformed the MCMC method across all sample sizes.
6. Conclusions
This paper presents confidence intervals for the single mean of a zero-inflated two-parameter Rayleigh distribution with unknown location and scale parameters. Six methods for estimating the single mean were considered: the percentile bootstrap, the generalized confidence interval (GCI), the standard confidence interval based on maximum likelihood, the approximate normal method using the delta method, and the Bayesian MCMC and Bayesian HPD methods. The performance of these methods was evaluated based on coverage probability and expected length through a simulation study. The results revealed that the percentile bootstrap, GCI, Bayesian MCMC, and Bayesian HPD methods produced coverage probabilities that exceeded the specified confidence level. However, the intervals constructed using the standard maximum likelihood and delta method approaches proved unsuitable for the sample sizes under consideration, as they resulted in excessively wide expected lengths even when they covered the parameter of interest. In certain scenarios, the Bayesian highest posterior density (HPD) method outperformed the other approaches, delivering coverage probabilities (CPs) at or above the nominal level while also producing shorter expected lengths (ELs). Nonetheless, in instances where coverage was inadequate, the Bayesian MCMC method exhibited the second-best performance.
In the application, we analyzed the total number of COVID-19-related deaths in Singapore during October 2022. Based on the simulation results, the Bayesian credible interval methods outperformed the other approaches, with the Bayesian HPD method demonstrating the best performance in constructing the interval for the single mean of the zero-inflated two-parameter Rayleigh distribution. Therefore, the Bayesian HPD method is recommended for constructing the credible interval for the mean of the total number of COVID-19 deaths in Singapore during this period.