1. Introduction
Statistics is a branch of mathematics that deals with collecting, organizing, analyzing, interpreting, and presenting data. It is broadly classified into descriptive statistics and inferential statistics, which together provide a framework for understanding data and making informed conclusions about a population based on sample observations. Descriptive statistics focuses on summarizing and presenting the key features of a dataset in a clear and structured way, using numerical measures and graphical displays. Among descriptive measures, the mean (or average) is a fundamental statistic that represents the central tendency of quantitative data and is widely used due to its mathematical simplicity and practical usefulness. Inferential statistics, on the other hand, extends data analysis by allowing conclusions about a population to be drawn from sample data using probability theory. Key inferential tools include point estimation, confidence intervals, and hypothesis testing, with the population mean being a central parameter of interest. These methods account for sampling variability and provide a basis for reliable statistical inference beyond descriptive summaries.
The Rayleigh distribution has been widely used to model nonnegative data across a broad range of disciplines, including acoustics, medical research, quality control, communications engineering, reliability analysis, and aerospace applications. Owing to its practical relevance and mathematical simplicity, numerous extensions of the Rayleigh distribution have been introduced in the literature to improve its flexibility and modeling capability for complex real-world data. Notable contributions in this area include the works of Bashir and Rasul [1], Abdulhakim [2], Krishnamoorthy [3], and Almongy et al. [4], among others.
Data that contain a substantial proportion of zero values along with positively skewed nonzero observations commonly arise in areas such as reliability engineering, environmental science, biomedical research, and actuarial studies. In these settings, conventional continuous distributions are often inadequate because they fail to accommodate the excess zeros resulting from structural or process-related factors. Consequently, zero-inflated models have become an essential modeling framework as they explicitly incorporate a point mass at zero together with a continuous distribution to describe the positive outcomes.
The zero-inflated Rayleigh (ZIR) distribution offers a flexible and effective framework for modeling nonnegative data characterized by an excess number of zero observations. In this distribution, positive outcomes are modeled using a Rayleigh distribution, while a mixing parameter controls the probability of structural zeros. This dual-component structure enables the ZIR distribution to capture both the frequency of zero occurrences and the distributional behavior of positive measurements. Owing to its parsimonious formulation and interpretable parameters, the ZIR distribution is well suited for applications in lifetime and reliability analysis, signal amplitude modeling, and environmental studies, where zero values commonly occur alongside positively skewed data. A notable advantage of the ZIR distribution is that its mean reflects contributions from both the zero-inflation mechanism and the positive component, thereby summarizing the overall level of the underlying process. In comparative analyses involving multiple populations or experimental conditions, differences between the means of ZIR distributions provide a natural and informative measure for assessing group effects. Such comparisons allow researchers to simultaneously evaluate changes in the likelihood of zero outcomes and the expected magnitude of positive observations, offering a more comprehensive assessment than approaches based solely on the positive component. Consequently, statistical inference for differences between ZIR means is of considerable practical importance in many applied settings. Several studies have investigated properties and applications of the ZIR distribution, including those by Fuxiang et al. [5] and Kijsason et al. [6].
In studies that compare multiple populations or treatment groups, the primary focus is often on differences between group means rather than on individual mean estimates. When multiple groups are examined simultaneously, constructing separate pairwise confidence intervals can result in an inflated family-wise error rate. Simultaneous confidence intervals (SCIs) for all pairwise differences of means offer a coherent approach to joint inference, providing reliable conclusions while appropriately controlling for multiple comparisons. Although inferential methods for zero-inflated models have gained increasing attention, the development of SCIs for all differences of means within the framework of ZIR distributions remains relatively unexplored. This difficulty stems from the involvement of multiple parameters—specifically, the zero-inflation proportion and the Rayleigh scale parameter—which together define the mean and complicate its sampling behavior. Consequently, standard normal approximation techniques may yield unsatisfactory performance, particularly in settings with small to moderate sample sizes or substantial zero inflation.
Motivated by these issues, this study proposes the construction of SCIs for all pairwise differences of means of ZIR distributions using a range of inferential techniques, where simultaneous refers to the concurrent construction of confidence intervals for all pairwise contrasts within a unified inferential framework. Specifically, SCIs are developed based on the generalized confidence interval (GCI) method, the parametric bootstrap (PB) method, the method of variance estimates recovery (MOVER), and the delta-method normal approximation. In addition, a Bayesian framework is considered, with simultaneous inference conducted using highest posterior density (HPD) credible intervals for all pairwise contrasts. The performance of the proposed frequentist and Bayesian methods is systematically evaluated through extensive Monte Carlo simulations, with emphasis on the marginal coverage probability (CP) of individual pairwise intervals and the average interval length (AL) for various parameter settings and sample sizes. An application to road accident fatality counts is presented to demonstrate the practical effectiveness and applicability of the proposed methods.
2. Methods
The Rayleigh distribution is adopted for modeling the positive values of X because the observed nonzero data are continuous, nonnegative, and right-skewed, rendering discrete distributions inappropriate. As a member of the Weibull family, the Rayleigh distribution offers analytical tractability and computational efficiency, which are beneficial for statistical inference and simulation-based construction of simultaneous confidence intervals. Therefore, the zero-inflated Rayleigh model constitutes a flexible and theoretically sound framework for data exhibiting both excess zeros and continuous positive measurements.
Let $X$ be a random variable following a ZIR distribution with parameters $p$ and $\sigma$, where $p \in [0, 1)$ denotes the probability of an additional point mass at zero and $\sigma > 0$ is the scale parameter of the Rayleigh component. The probability density function (PDF) of $X$ is given by

$$f(x; p, \sigma) = \begin{cases} p, & x = 0, \\ (1 - p)\,\dfrac{x}{\sigma^2}\exp\!\left(-\dfrac{x^2}{2\sigma^2}\right), & x > 0. \end{cases}$$

This distribution represents a mixture of a degenerate distribution at zero with probability $p$ and a Rayleigh distribution with mixing weight $1 - p$. The corresponding cumulative distribution function (CDF) is

$$F(x; p, \sigma) = p + (1 - p)\left[1 - \exp\!\left(-\dfrac{x^2}{2\sigma^2}\right)\right], \quad x \ge 0.$$

The mean of $X$ is

$$E(X) = (1 - p)\,\sigma\sqrt{\dfrac{\pi}{2}},$$

where $\pi$ denotes the mathematical constant.
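As a quick numerical check of the mean formula above, the following sketch simulates ZIR draws (zero with probability $p$, otherwise a Rayleigh variate generated by inverse-CDF sampling) and compares the sample mean against $(1 - p)\,\sigma\sqrt{\pi/2}$. The paper's computations are in R; this illustrative sketch uses Python, and the function names are ours, not the authors'.

```python
import math
import random

def rzir(n, p, sigma, rng):
    """Draw n observations from a zero-inflated Rayleigh (ZIR) model:
    zero with probability p, otherwise Rayleigh(sigma)."""
    out = []
    for _ in range(n):
        if rng.random() < p:
            out.append(0.0)
        else:
            u = 1.0 - rng.random()  # u in (0, 1] avoids log(0)
            # Rayleigh via inverse CDF: x = sigma * sqrt(-2 ln u)
            out.append(sigma * math.sqrt(-2.0 * math.log(u)))
    return out

def zir_mean(p, sigma):
    """Population mean of the ZIR model: E(X) = (1 - p) * sigma * sqrt(pi/2)."""
    return (1.0 - p) * sigma * math.sqrt(math.pi / 2.0)

rng = random.Random(1)
x = rzir(100_000, 0.3, 2.0, rng)
# The sample mean should be close to zir_mean(0.3, 2.0), about 1.75
```

With 100,000 draws the Monte Carlo error of the sample mean is on the order of 0.005, so agreement to two decimals is expected.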
In this study, inference for the mean of the ZIR distribution across multiple groups is considered. Let $\mu_i$ be the population mean of the ZIR distribution in the $i$-th group, for $i = 1, \ldots, k$. The mean $\mu_i$ is defined as

$$\mu_i = (1 - p_i)\,\sigma_i\sqrt{\dfrac{\pi}{2}},$$

where $p_i$ and $\sigma_i$ are the zero-inflation probability and the Rayleigh scale parameter for group $i$, respectively.

A plug-in estimator of $\mu_i$ is given by

$$\hat{\mu}_i = (1 - \hat{p}_i)\,\hat{\sigma}_i\sqrt{\dfrac{\pi}{2}},$$

where $\hat{p}_i$ and $\hat{\sigma}_i$ are the corresponding MLEs. The asymptotic variance of $\hat{\mu}_i$ is denoted by $\operatorname{Var}(\hat{\mu}_i)$.

To compare group means, consider the vector of mean parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_k)^\top$. For any two groups $i$ and $l$, the difference in means is defined as $\delta_{il} = \mu_i - \mu_l$, with estimator $\hat{\delta}_{il} = \hat{\mu}_i - \hat{\mu}_l$. As shown in Appendix A, assuming independence between groups, the asymptotic variance of $\hat{\delta}_{il}$ is

$$\operatorname{Var}(\hat{\delta}_{il}) = \operatorname{Var}(\hat{\mu}_i) + \operatorname{Var}(\hat{\mu}_l).$$
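The plug-in estimation above can be sketched in a few lines. For a ZIR sample, $\hat{p} = n_0/n$ and the Rayleigh MLE from the positive part is $\hat{\sigma} = \sqrt{\sum x_j^2 / (2 n_1)}$; the pairwise difference is then the difference of the plug-in means. This is a Python illustration with hypothetical data; the paper's own implementation is in R.

```python
import math

def zir_mle(x):
    """Plug-in ML estimates (p_hat, sigma_hat, mu_hat) for a ZIR sample.
    p_hat = n0/n; sigma_hat is the Rayleigh MLE from the positive part."""
    n = len(x)
    pos = [v for v in x if v > 0]
    n1 = len(pos)
    p_hat = (n - n1) / n
    sigma_hat = math.sqrt(sum(v * v for v in pos) / (2 * n1)) if n1 > 0 else 0.0
    mu_hat = (1.0 - p_hat) * sigma_hat * math.sqrt(math.pi / 2.0)
    return p_hat, sigma_hat, mu_hat

# Pairwise difference of plug-in means for two independent (toy) groups
x1 = [0.0, 1.2, 0.8, 0.0, 2.1, 1.5]
x2 = [0.0, 0.0, 0.9, 0.7]
_, _, mu1 = zir_mle(x1)
_, _, mu2 = zir_mle(x2)
delta_hat = mu1 - mu2
```

The same `zir_mle` helper underlies all of the interval methods discussed in the following subsections.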
2.1. Generalized Confidence Interval Method
The GCI method, introduced by Weerahandi, provides an effective inferential framework for complex models in which the exact sampling distributions of estimators are analytically intractable. Unlike classical confidence intervals, the GCI method is constructed using generalized pivotal quantities (GPQs), whose distributions are free of unknown parameters. This method is particularly advantageous for mixture models and zero-inflated distributions, such as the ZIR distribution, where standard asymptotic approximations may perform poorly, especially in small or moderate samples.
Let $\mu_i = (1 - p_i)\sigma_i\sqrt{\pi/2}$ be the mean of the ZIR distribution for the $i$-th group, where $i = 1, \ldots, k$, and let $\hat{\mu}_i$ be its estimator. Since $\sum_{j} X_{ij}^2 / \sigma_i^2 \sim \chi^2_{2 n_{i1}}$ for the $n_{i1}$ positive observations, a GPQ for $\sigma_i$ is $R_{\sigma_i} = \sqrt{s_i / V_i}$, where $s_i$ is the observed value of $\sum_j x_{ij}^2$ and $V_i \sim \chi^2_{2 n_{i1}}$. The GPQ for $\mu_i$ is

$$R_{\mu_i} = (1 - \hat{p}_i)\,R_{\sigma_i}\sqrt{\dfrac{\pi}{2}}.$$

Similarly, let $\delta_{il} = \mu_i - \mu_l$ be the difference between the means of groups $i$ and $l$. The corresponding GPQ is defined as

$$R_{\delta_{il}} = R_{\mu_i} - R_{\mu_l}.$$

It should be noted that the proposed GPQs are constructed using plug-in estimators for the zero-inflation probability and therefore constitute approximate GPQs. As a result, the corresponding GCI-based intervals are approximate and may exhibit deviations from nominal coverage, particularly in settings with high zero inflation or small sample sizes.

The distribution of $R_{\delta_{il}}$, conditional on the observed data, is free of unknown parameters and can be approximated via Monte Carlo simulation. Consequently, a $100(1 - \alpha)\%$ two-sided confidence interval for $\delta_{il}$ based on the GCI method is given by

$$\left[R_{\delta_{il}}(\alpha/2),\; R_{\delta_{il}}(1 - \alpha/2)\right],$$

where $R_{\delta_{il}}(\alpha/2)$ and $R_{\delta_{il}}(1 - \alpha/2)$ denote the $(\alpha/2)$-th and $(1 - \alpha/2)$-th quantiles of the simulated GPQ distribution, respectively. The detailed steps are presented in Algorithm 1.
| Algorithm 1 GCI. |
Input: estimates $\hat{p}_i$, $\hat{\sigma}_i$ and sample sizes $n_i$, $i = 1, \ldots, k$; number of GPQ draws $m$; significance level $\alpha$.
1. For $g = 1, \ldots, m$:
   (a) For $i = 1, \ldots, k$: if $n_{i1} > 0$, draw $V_i \sim \chi^2_{2 n_{i1}}$ and set $R_{\sigma_i} = \sqrt{s_i / V_i}$; else set $R_{\sigma_i} = 0$. Set $R_{\mu_i} = (1 - \hat{p}_i) R_{\sigma_i} \sqrt{\pi/2}$.
   (b) For all $i < l$: set $R^{(g)}_{\delta_{il}} = R_{\mu_i} - R_{\mu_l}$.
2. For all $i < l$: set $L_{il}$ and $U_{il}$ to the $\alpha/2$ and $1 - \alpha/2$ empirical quantiles of $R^{(1)}_{\delta_{il}}, \ldots, R^{(m)}_{\delta_{il}}$.
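A minimal Python sketch of Algorithm 1 follows, under the assumption stated above that the Rayleigh pivot is $\sum_j x_{ij}^2 / \sigma_i^2 \sim \chi^2_{2 n_{i1}}$ and the zero-inflation probability enters through its plug-in estimate. Function and variable names are illustrative, not from the paper.

```python
import math
import numpy as np

def gci_pairwise(samples, m=5000, alpha=0.05, seed=0):
    """Approximate GCI simultaneous intervals for all pairwise ZIR mean
    differences. GPQ sketch: R_sigma = sqrt(sum(x^2) / V), V ~ chi2(2*n1),
    with the zero-inflation probability handled by its plug-in estimate."""
    rng = np.random.default_rng(seed)
    k = len(samples)
    R_mu = np.zeros((m, k))
    for i, x in enumerate(samples):
        x = np.asarray(x, float)
        pos = x[x > 0]
        n, n1 = len(x), len(pos)
        p_hat = (n - n1) / n
        s = np.sum(pos ** 2)
        if n1 > 0:
            V = rng.chisquare(2 * n1, size=m)   # pivot draws
            R_sigma = np.sqrt(s / V)
        else:
            R_sigma = np.zeros(m)
        R_mu[:, i] = (1.0 - p_hat) * R_sigma * math.sqrt(math.pi / 2.0)
    cis = {}
    for i in range(k):
        for l in range(i + 1, k):
            d = R_mu[:, i] - R_mu[:, l]
            cis[(i, l)] = (np.quantile(d, alpha / 2),
                           np.quantile(d, 1 - alpha / 2))
    return cis
```

Each interval is simply the $\alpha/2$ and $1 - \alpha/2$ empirical quantile of the $m$ GPQ draws for that pair.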
2.2. Parametric Bootstrap Method
Bootstrapping is a widely used resampling technique for assessing the sampling distribution of estimators and constructing confidence intervals when analytic results are difficult or unavailable. Among bootstrap methods, the parametric bootstrap is particularly effective when the underlying data-generating mechanism can be reasonably modeled by a parametric family.
In the parametric bootstrap, it is assumed that the observed data arise from a distribution indexed by an unknown parameter vector . The parameter is first estimated from the original sample, yielding . Bootstrap samples are then generated from the fitted parametric model , rather than directly resampling from the empirical distribution. This procedure allows the bootstrap samples to preserve structural features implied by the assumed model, such as skewness, tail behavior, or zero inflation.
Let denote a statistic of interest computed from the original data . For each bootstrap replication, a synthetic sample is generated from , and the corresponding bootstrap statistic is calculated. Repeating this process a large number of times yields an empirical approximation to the sampling distribution of T, which can be used to estimate bias, variance, and confidence intervals.
Parametric bootstrap confidence intervals are commonly constructed using the percentile method, where the interval endpoints are obtained from the empirical quantiles of the bootstrap distribution. Compared with the nonparametric bootstrap, the parametric bootstrap often exhibits improved efficiency and smoother sampling distributions when the assumed model is correctly specified. However, its performance depends critically on the validity of the parametric assumption; model misspecification may lead to biased inference.
Due to its flexibility and computational simplicity, the parametric bootstrap has been widely applied in complex inference problems, including small-sample settings, censored or zero-inflated data, and situations involving nonlinear estimators or functions of multiple parameters.
To generate a bootstrap sample, let $X^*_{i1}, \ldots, X^*_{in_i}$ be i.i.d. observations from a ZIR distribution with parameters $\hat{p}_i$ and $\hat{\sigma}_i$. Define $n^*_{i0}$ as the number of zero observations and $n^*_{i1} = n_i - n^*_{i0}$ as the number of positive observations in the bootstrap sample. The bootstrap estimators of $p_i$ and $\sigma_i$ are

$$\hat{p}^*_i = \dfrac{n^*_{i0}}{n_i} \quad \text{and} \quad \hat{\sigma}^*_i = \sqrt{\dfrac{1}{2 n^*_{i1}} \sum_{j=1}^{n_i} (X^*_{ij})^2}.$$

The bootstrap estimate of the mean for group $i$ is

$$\hat{\mu}^*_i = (1 - \hat{p}^*_i)\,\hat{\sigma}^*_i\sqrt{\dfrac{\pi}{2}}.$$

In the parametric bootstrap framework, the fitted ZIR model with parameter estimates $\hat{p}_i$ and $\hat{\sigma}_i$ is employed as a plug-in approximation to the unknown true distribution, from which bootstrap samples are subsequently generated. The bootstrap parameter of the mean is defined as the corresponding population mean evaluated at these fitted parameters, $\hat{\mu}_i = (1 - \hat{p}_i)\hat{\sigma}_i\sqrt{\pi/2}$, and is treated as fixed conditional on the observed data, thereby serving as the true parameter within the bootstrap resampling scheme. This convention is standard in parametric bootstrap inference and ensures coherence between the original estimation problem and its bootstrap counterpart.

The bootstrap difference of the means is

$$\hat{\delta}^*_{il} = \hat{\mu}^*_i - \hat{\mu}^*_l.$$

Therefore, a $100(1 - \alpha)\%$ two-sided confidence interval for $\delta_{il}$ based on the parametric bootstrap method is given by

$$\left[\hat{\delta}^*_{il}(\alpha/2),\; \hat{\delta}^*_{il}(1 - \alpha/2)\right],$$

where $\hat{\delta}^*_{il}(\alpha/2)$ and $\hat{\delta}^*_{il}(1 - \alpha/2)$ denote the $(\alpha/2)$-th and $(1 - \alpha/2)$-th quantiles of the bootstrap replications, respectively. Algorithm 2 presents the detailed procedure.
| Algorithm 2 Parametric Bootstrap Confidence Interval. |
Input: estimates $\hat{p}_i$, $\hat{\sigma}_i$, $i = 1, \ldots, k$; number of bootstrap samples $m$; significance level $\alpha$.
1. For $g = 1, \ldots, m$:
   (a) For $i = 1, \ldots, k$: generate a bootstrap sample of size $n_i$ from the fitted ZIR$(\hat{p}_i, \hat{\sigma}_i)$ model; compute $\hat{p}^*_i$ and $\hat{\sigma}^*_i$; set $\hat{\mu}^*_i = (1 - \hat{p}^*_i)\hat{\sigma}^*_i\sqrt{\pi/2}$.
   (b) For all $i < l$: set $\hat{\delta}^{*(g)}_{il} = \hat{\mu}^*_i - \hat{\mu}^*_l$.
2. For all $i < l$: set $L_{il}$ and $U_{il}$ to the $\alpha/2$ and $1 - \alpha/2$ empirical quantiles of $\hat{\delta}^{*(1)}_{il}, \ldots, \hat{\delta}^{*(m)}_{il}$.
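The percentile parametric bootstrap of Algorithm 2 can be sketched as follows: fit the ZIR model once, draw synthetic samples from the fitted model, refit on each, and take empirical quantiles of the pairwise differences. This is a Python illustration; names and defaults are ours.

```python
import math
import numpy as np

def pb_pairwise(samples, m=1000, alpha=0.05, seed=0):
    """Percentile parametric-bootstrap SCIs for all pairwise ZIR mean
    differences: refit (p*, sigma*) on each synthetic sample drawn from
    the fitted ZIR model, then take quantiles of mu*_i - mu*_l."""
    rng = np.random.default_rng(seed)

    def fit(x):
        x = np.asarray(x, float)
        pos = x[x > 0]
        n, n1 = len(x), len(pos)
        p = (n - n1) / n
        sig = math.sqrt(np.sum(pos ** 2) / (2 * n1)) if n1 > 0 else 0.0
        return p, sig, n

    def mean(p, sig):
        return (1.0 - p) * sig * math.sqrt(math.pi / 2.0)

    fits = [fit(x) for x in samples]
    k = len(samples)
    boot_mu = np.zeros((m, k))
    for i, (p, sig, n) in enumerate(fits):
        for g in range(m):
            zeros = rng.random(n) < p                     # structural zeros
            xb = np.where(zeros, 0.0, rng.rayleigh(sig, n))
            pb, sb, _ = fit(xb)                           # refit on bootstrap sample
            boot_mu[g, i] = mean(pb, sb)
    cis = {}
    for i in range(k):
        for l in range(i + 1, k):
            d = boot_mu[:, i] - boot_mu[:, l]
            cis[(i, l)] = (np.quantile(d, alpha / 2),
                           np.quantile(d, 1 - alpha / 2))
    return cis
```

When the ZIR model is correctly specified, these intervals inherit the efficiency advantages of the parametric bootstrap noted above.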
2.3. Method of Variance Estimates Recovery
The MOVER is a general methodology for constructing confidence intervals for functions of parameters when direct variance estimation or joint distributional assumptions are impractical. The core principle of MOVER is that the uncertainty of a target function can be recovered from marginal confidence intervals of the individual parameters and combined in a least favorable configuration to ensure nominal coverage.
The MOVER was extended by Donner and Zou [7], who demonstrated that valid closed-form confidence intervals for nonlinear functions of scale parameters, such as the normal standard deviation, can be obtained using endpoint-based constructions. Their work establishes a unifying theoretical framework in which confidence intervals for both linear and nonlinear functions can be derived directly from marginal confidence limits.
Consider a ZIR distribution with parameters $(p_i, \sigma_i)$, where $p_i$ denotes the probability of a structural zero and $\sigma_i$ is the Rayleigh scale parameter. The mean of the distribution for group $i$ is $\mu_i = (1 - p_i)\sigma_i\sqrt{\pi/2}$.

Let $[l_{p_i}, u_{p_i}]$ be the marginal $100(1 - \alpha)\%$ two-sided confidence interval for $p_i$. The lower confidence limit $l_{p_i}$ and upper confidence limit $u_{p_i}$ are defined as

$$l_{p_i} = B\!\left(\alpha/2;\; n_{i0} + \tfrac{1}{2},\; n_{i1} + \tfrac{1}{2}\right) \quad \text{and} \quad u_{p_i} = B\!\left(1 - \alpha/2;\; n_{i0} + \tfrac{1}{2},\; n_{i1} + \tfrac{1}{2}\right),$$

where $B(q; a, b)$ denotes the $q$-th quantile of the Beta$(a, b)$ distribution.

Let $[l_{\sigma_i}, u_{\sigma_i}]$ be the marginal $100(1 - \alpha)\%$ two-sided confidence interval for $\sigma_i$. The lower confidence limit $l_{\sigma_i}$ and upper confidence limit $u_{\sigma_i}$ are defined as

$$l_{\sigma_i} = \hat{\sigma}_i - z_{\alpha/2}\sqrt{\dfrac{\hat{\sigma}_i^2}{4 n_{i1}}} \quad \text{and} \quad u_{\sigma_i} = \hat{\sigma}_i + z_{\alpha/2}\sqrt{\dfrac{\hat{\sigma}_i^2}{4 n_{i1}}},$$

where $z_{\alpha/2}$ is the upper $(\alpha/2)$-quantile of the standard normal distribution.
Following the endpoint-based MOVER principle for the product $\mu_i = \sqrt{\pi/2}\,(1 - p_i)\,\sigma_i$, the confidence interval $[L_{\mu_i}, U_{\mu_i}]$ for $\mu_i$ combines the marginal limits for $1 - p_i$ (namely $1 - u_{p_i}$ and $1 - l_{p_i}$) with those for $\sigma_i$:

$$L_{\mu_i} = \sqrt{\dfrac{\pi}{2}}\left[(1 - \hat{p}_i)\hat{\sigma}_i - \sqrt{(1 - \hat{p}_i)^2(\hat{\sigma}_i - l_{\sigma_i})^2 + (u_{p_i} - \hat{p}_i)^2\hat{\sigma}_i^2 - (u_{p_i} - \hat{p}_i)^2(\hat{\sigma}_i - l_{\sigma_i})^2}\right]$$

and

$$U_{\mu_i} = \sqrt{\dfrac{\pi}{2}}\left[(1 - \hat{p}_i)\hat{\sigma}_i + \sqrt{(1 - \hat{p}_i)^2(u_{\sigma_i} - \hat{\sigma}_i)^2 + (\hat{p}_i - l_{p_i})^2\hat{\sigma}_i^2 - (\hat{p}_i - l_{p_i})^2(u_{\sigma_i} - \hat{\sigma}_i)^2}\right].$$

Under the MOVER variance recovery principle, the variance of $\hat{\mu}_i$ at the lower and upper confidence limits can be approximated by

$$\widehat{\operatorname{Var}}_L(\hat{\mu}_i) = \dfrac{(\hat{\mu}_i - L_{\mu_i})^2}{z_{\alpha/2}^2} \quad \text{and} \quad \widehat{\operatorname{Var}}_U(\hat{\mu}_i) = \dfrac{(U_{\mu_i} - \hat{\mu}_i)^2}{z_{\alpha/2}^2}.$$

These recovered variances implicitly account for the combined uncertainty arising from both the zero-inflation parameter $p_i$ and the scale parameter $\sigma_i$, without requiring their joint sampling distribution.
For $k$ independent groups $i$ and $l$, define the difference $\delta_{il} = \mu_i - \mu_l$. Given the marginal MOVER confidence intervals $[L_{\mu_i}, U_{\mu_i}]$ and $[L_{\mu_l}, U_{\mu_l}]$, the endpoint-based MOVER confidence interval for $\delta_{il}$ is

$$L_{il} = \hat{\delta}_{il} - \sqrt{(\hat{\mu}_i - L_{\mu_i})^2 + (U_{\mu_l} - \hat{\mu}_l)^2}$$

and

$$U_{il} = \hat{\delta}_{il} + \sqrt{(U_{\mu_i} - \hat{\mu}_i)^2 + (\hat{\mu}_l - L_{\mu_l})^2}.$$

From the variance recovery perspective, the recovered variance of $\hat{\delta}_{il}$ at the lower and upper limits is given by

$$\widehat{\operatorname{Var}}_L(\hat{\delta}_{il}) = \dfrac{(\hat{\mu}_i - L_{\mu_i})^2 + (U_{\mu_l} - \hat{\mu}_l)^2}{z_{\alpha/2}^2} \quad \text{and} \quad \widehat{\operatorname{Var}}_U(\hat{\delta}_{il}) = \dfrac{(U_{\mu_i} - \hat{\mu}_i)^2 + (\hat{\mu}_l - L_{\mu_l})^2}{z_{\alpha/2}^2},$$

assuming independence between the two samples ($i$ and $l$). This leads to the closed-form endpoint-based MOVER confidence interval. Therefore, as shown in Appendix B, the $100(1 - \alpha)\%$ two-sided confidence interval for $\delta_{il}$ based on the MOVER method is given by $[L_{il}, U_{il}]$. The complete procedure is described in Algorithm 3.
The complete procedure is described in Algorithm 3.
| Algorithm 3 MOVER Confidence Interval. |
Input: estimates $\hat{p}_i$ and $\hat{\sigma}_i$, $i = 1, \ldots, k$; significance level $\alpha$.
1. For $i = 1, \ldots, k$: obtain the marginal confidence interval $[l_{p_i}, u_{p_i}]$ for $p_i$ and $[l_{\sigma_i}, u_{\sigma_i}]$ for $\sigma_i$; set $L_{\mu_i}$ and $U_{\mu_i}$ by the endpoint-based product construction.
2. For all $i < l$: set $L_{il} = \hat{\delta}_{il} - \sqrt{(\hat{\mu}_i - L_{\mu_i})^2 + (U_{\mu_l} - \hat{\mu}_l)^2}$ and $U_{il} = \hat{\delta}_{il} + \sqrt{(U_{\mu_i} - \hat{\mu}_i)^2 + (\hat{\mu}_l - L_{\mu_l})^2}$.
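The endpoint-combination step of the MOVER difference interval is easy to sketch. For simplicity, the marginal intervals for each group mean below are illustrative normal-approximation intervals (the full endpoint-based product construction of Algorithm 3 would replace them); the Donner–Zou difference formula itself is exactly as given above.

```python
import math

def zir_fit(x):
    """ML estimates and a delta-method variance for the ZIR mean."""
    n = len(x)
    pos = [v for v in x if v > 0]
    n1 = len(pos)
    p = (n - n1) / n
    sig = math.sqrt(sum(v * v for v in pos) / (2 * n1)) if n1 > 0 else 0.0
    mu = (1.0 - p) * sig * math.sqrt(math.pi / 2.0)
    var = ((math.pi / 2.0) * sig * sig
           * (p * (1 - p) / n + (1 - p) ** 2 / (4 * n1))) if n1 > 0 else 0.0
    return mu, var

def mover_diff(x_i, x_l, z=1.959964):
    """Endpoint-based MOVER interval for mu_i - mu_l (Donner-Zou form),
    using illustrative normal-approximation marginal intervals."""
    mu_i, v_i = zir_fit(x_i)
    mu_l, v_l = zir_fit(x_l)
    Li, Ui = mu_i - z * math.sqrt(v_i), mu_i + z * math.sqrt(v_i)
    Ll, Ul = mu_l - z * math.sqrt(v_l), mu_l + z * math.sqrt(v_l)
    d = mu_i - mu_l
    lower = d - math.sqrt((mu_i - Li) ** 2 + (Ul - mu_l) ** 2)
    upper = d + math.sqrt((Ui - mu_i) ** 2 + (mu_l - Ll) ** 2)
    return lower, upper
```

With asymmetric marginal intervals (as in the Beta-quantile limits for $p_i$), the resulting MOVER interval is itself asymmetric around $\hat{\delta}_{il}$.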
2.4. Delta-Method Normal Approximation
The delta method is a widely used asymptotic technique for approximating the sampling distribution of a function of an estimator. It is particularly useful when the parameter of interest is a nonlinear function of one or more parameters whose estimators are asymptotically normal. The delta-method normal approach relies on a first-order Taylor series expansion to obtain a normal approximation for the transformed estimator.
The delta-method normal approximation is used to construct confidence intervals for pairwise differences of parameters. The method relies on asymptotic normality and variance approximation obtained via the delta method.
Let $\hat{\mu}_i = (1 - \hat{p}_i)\hat{\sigma}_i\sqrt{\pi/2}$ be the estimator of $\mu_i$ for group $i$. Using the delta method, the asymptotic variance of $\hat{\mu}_i$ is approximated by

$$\widehat{\operatorname{Var}}(\hat{\mu}_i) \approx \dfrac{\pi}{2}\,\hat{\sigma}_i^2\left[\dfrac{\hat{p}_i(1 - \hat{p}_i)}{n_i} + \dfrac{(1 - \hat{p}_i)^2}{4 n_{i1}}\right].$$

Assuming independence between estimators from different groups, the variance of the difference $\hat{\delta}_{il} = \hat{\mu}_i - \hat{\mu}_l$ is approximated by

$$\widehat{\operatorname{Var}}(\hat{\delta}_{il}) \approx \widehat{\operatorname{Var}}(\hat{\mu}_i) + \widehat{\operatorname{Var}}(\hat{\mu}_l).$$

Therefore, as shown in Appendix C, the $100(1 - \alpha)\%$ two-sided confidence interval for $\delta_{il}$ based on the delta-method normal approximation is given by

$$\hat{\delta}_{il} \pm z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\mu}_i) + \widehat{\operatorname{Var}}(\hat{\mu}_l)}.$$

Algorithm 4 presents the detailed procedure.
Algorithm 4 presents the detailed procedure.
| Algorithm 4 Delta-Method Normal Approximation Confidence Interval. |
Input: estimates $\hat{p}_i$ and $\hat{\sigma}_i$, $i = 1, \ldots, k$; significance level $\alpha$.
1. For $i = 1, \ldots, k$: if $n_{i1} > 0$, set $\widehat{\operatorname{Var}}(\hat{\mu}_i) = \frac{\pi}{2}\hat{\sigma}_i^2\left[\hat{p}_i(1 - \hat{p}_i)/n_i + (1 - \hat{p}_i)^2/(4 n_{i1})\right]$; else set $\widehat{\operatorname{Var}}(\hat{\mu}_i) = 0$.
2. For all $i < l$: set $\hat{\delta}_{il} = \hat{\mu}_i - \hat{\mu}_l$, $L_{il} = \hat{\delta}_{il} - z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\mu}_i) + \widehat{\operatorname{Var}}(\hat{\mu}_l)}$, and $U_{il} = \hat{\delta}_{il} + z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{\mu}_i) + \widehat{\operatorname{Var}}(\hat{\mu}_l)}$.
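The delta-method interval of Algorithm 4 reduces to a few lines of code. The variance formula below is the delta-method approximation given above, with $\operatorname{Var}(\hat{p}) = p(1-p)/n$ and $\operatorname{Var}(\hat{\sigma}) \approx \sigma^2/(4 n_1)$; the sketch is in Python with illustrative names.

```python
import math

def delta_ci(x_i, x_l, z=1.959964):
    """Delta-method normal-approximation CI for mu_i - mu_l, using
    Var(mu_hat) ~= (pi/2) sigma^2 [ p(1-p)/n + (1-p)^2/(4 n1) ]."""
    def fit(x):
        n = len(x)
        pos = [v for v in x if v > 0]
        n1 = len(pos)
        p = (n - n1) / n
        sig = math.sqrt(sum(v * v for v in pos) / (2 * n1)) if n1 > 0 else 0.0
        mu = (1.0 - p) * sig * math.sqrt(math.pi / 2.0)
        var = ((math.pi / 2.0) * sig * sig
               * (p * (1.0 - p) / n + (1.0 - p) ** 2 / (4 * n1))) if n1 > 0 else 0.0
        return mu, var

    mu_i, v_i = fit(x_i)
    mu_l, v_l = fit(x_l)
    d = mu_i - mu_l
    hw = z * math.sqrt(v_i + v_l)  # independence between groups
    return d - hw, d + hw
```

The interval is symmetric about $\hat{\delta}_{il}$, in contrast to the GCI, bootstrap, and MOVER intervals, which may be asymmetric.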
2.5. Highest Posterior Density Method
The Bayesian method is a probabilistic framework for statistical inference and decision-making. In contrast to frequentist methodology, which assumes model parameters are fixed but unknown constants, the Bayesian paradigm treats parameters as random variables and quantifies uncertainty through probability distributions.
Bayesian inference is founded on Bayes’ theorem, which provides a systematic way to revise prior assumptions about unknown parameters in light of observed data. Prior beliefs are expressed through a prior distribution, while the information contained in the data is captured by the likelihood function. The combination of these elements results in the posterior distribution, representing updated knowledge about the parameters after the data have been observed.
A key strength of the Bayesian framework is its ability to formally incorporate prior information or expert knowledge into the analysis. This feature is especially advantageous when dealing with small samples or complex statistical models. Moreover, Bayesian methods yield a comprehensive probabilistic characterization of uncertainty, enabling direct probability statements about parameters and derived quantities, such as percentiles or reliability indices.
Bayesian inference also supports interval estimation via credible intervals, including both equal-tailed intervals and HPD credible intervals, which are often easier to interpret than classical confidence intervals. Furthermore, modern computational tools, particularly Markov Chain Monte Carlo (MCMC) algorithms, have greatly expanded the applicability of Bayesian methods to problems where closed-form solutions are not available.
In summary, the Bayesian method provides a flexible and robust alternative to traditional statistical techniques and has been widely applied in areas such as reliability engineering, survival analysis, and numerous other scientific disciplines.
For $i = 1, \ldots, k$, let $X_i = (X_{i1}, \ldots, X_{in_i})$ be independent samples from $k$ populations. For each population, the model is characterized by a mixing probability $p_i$ and a scale parameter $\sigma_i$. The parameters are assigned independent prior distributions given by

$$p_i \sim \operatorname{Beta}(a_1, a_2) \quad \text{and} \quad \sigma_i^2 \sim \operatorname{IG}(b_1, b_2),$$

where $\operatorname{IG}$ denotes the inverse-gamma distribution.

Let $n_i$ be the sample size in group $i$, $n_{i0}$ the number of zero observations, and $n_{i1}$ the number of positive observations. Conditional on the observed data, the posterior distribution of $p_i$ is

$$p_i \mid X_i \sim \operatorname{Beta}(a_1 + n_{i0},\; a_2 + n_{i1}).$$

When $n_{i1} > 0$, the posterior distribution of $\sigma_i^2$ is

$$\sigma_i^2 \mid X_i \sim \operatorname{IG}\!\left(b_1 + n_{i1},\; b_2 + \dfrac{1}{2}\sum_{j=1}^{n_i} x_{ij}^2\right),$$

whereas if $n_{i1} = 0$, the posterior of $\sigma_i^2$ coincides with its prior.

Posterior samples $(p_i^{(g)}, \sigma_i^{(g)})$, $g = 1, \ldots, m$, are generated by Monte Carlo simulation. For each draw, the parameter of interest is defined as

$$\mu_i^{(g)} = (1 - p_i^{(g)})\,\sigma_i^{(g)}\sqrt{\dfrac{\pi}{2}}.$$

For any pair of groups $(i, l)$, posterior samples of the difference are obtained as

$$\delta_{il}^{(g)} = \mu_i^{(g)} - \mu_l^{(g)}.$$

Therefore, the $100(1 - \alpha)\%$ two-sided credible interval for $\delta_{il}$ based on the HPD method is given by $[L_{il}, U_{il}]$, where $L_{il}$ and $U_{il}$ are obtained using the HPD interval function in R software (version 2024.12.0 + 467). The complete procedure is described in Algorithm 5.
| Algorithm 5 HPD Credible Interval. |
Input: priors for $p_i$ and $\sigma_i^2$; number of posterior draws $m$; credibility level $1 - \alpha$.
1. For $g = 1, \ldots, m$:
   (a) For $i = 1, \ldots, k$: draw $p_i^{(g)}$ and $\sigma_i^{(g)}$ from their posteriors; set $\mu_i^{(g)} = (1 - p_i^{(g)})\sigma_i^{(g)}\sqrt{\pi/2}$.
   (b) For all $i < l$: set $\delta_{il}^{(g)} = \mu_i^{(g)} - \mu_l^{(g)}$.
2. For all $i < l$: compute $L_{il}$ and $U_{il}$ as the HPD interval of $\delta_{il}^{(1)}, \ldots, \delta_{il}^{(m)}$.
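Because the Beta and inverse-gamma posteriors are available in closed form, Algorithm 5 needs only independent Monte Carlo draws, no MCMC. The Python sketch below uses illustrative hyperparameters (they are our assumptions, not the paper's), samples the inverse-gamma as the reciprocal of a gamma variate, and computes the HPD interval as the shortest window containing the target posterior mass.

```python
import math
import numpy as np

def hpd(draws, cred=0.95):
    """Shortest (highest posterior density) interval from MC draws."""
    s = np.sort(np.asarray(draws))
    m = len(s)
    w = int(math.ceil(cred * m))               # window size in draws
    widths = s[w - 1:] - s[:m - w + 1]
    j = int(np.argmin(widths))                 # shortest window wins
    return s[j], s[j + w - 1]

def zir_posterior_mu(x, m=4000, a=(0.5, 0.5), b=(1e-3, 1e-3), rng=None):
    """Posterior draws of the ZIR mean under Beta(a1, a2) prior on p and
    IG(b1, b2) prior on sigma^2 (hyperparameters here are illustrative)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x, float)
    pos = x[x > 0]
    n, n1 = len(x), len(pos)
    p = rng.beta(a[0] + (n - n1), a[1] + n1, size=m)
    # sigma^2 | data ~ IG(b1 + n1, b2 + sum(x^2)/2); sample as rate / Gamma
    shape = b[0] + n1
    rate = b[1] + np.sum(pos ** 2) / 2.0
    sig2 = rate / rng.gamma(shape, 1.0, size=m)
    return (1.0 - p) * np.sqrt(sig2) * math.sqrt(math.pi / 2.0)

# HPD interval for a pairwise difference of ZIR means (toy data)
rng = np.random.default_rng(7)
mu1 = zir_posterior_mu([0.0, 1.2, 0.8, 2.1, 1.5], rng=rng)
mu2 = zir_posterior_mu([0.0, 0.0, 0.9, 0.7], rng=rng)
lo, hi = hpd(mu1 - mu2)
```

The paper obtains the same interval with the HPD interval function in R; the sorted-window computation above is an equivalent sample-based definition.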
3. Simulation Studies
A detailed simulation study was conducted to evaluate the performance of the proposed SCIs. The investigation focused on two main performance metrics: coverage probability (CP) and average interval length (AL). All computational experiments were performed in R software, with each scenario replicated a sufficiently large number of times to ensure stable and reliable results. For each interval estimation method, the empirical CP was calculated as the proportion of simulated intervals that contained the true parameter value, while the AL was defined as the mean width of the intervals across replications. These measures formed the basis for comparing the efficiency and robustness of the SCIs with various parameter settings and sample sizes.
To determine the most suitable interval method, preference was given to methods achieving an empirical CP at or above the nominal confidence level of 0.95. Among the methods satisfying this criterion, the one yielding the smallest AL was regarded as the most efficient. This dual evaluation framework ensures that both the accuracy and precision of the interval estimators are appropriately considered.
The SCIs were constructed using five different methods: the GCI method, the PB method, the MOVER method, the delta-method normal approximation, and the HPD method. The simulation study examined three scenarios corresponding to $k = 3$, 6, and 10 groups. The sample sizes were denoted by $n_i$; the probabilities of zero inflation by $p_i$; and the Rayleigh scale parameters by $\sigma_i$. Repeated notation indicates that the same values of $n$, $p$, and $\sigma$, respectively, are used across the $k$ groups. For every sample generated, an additional 1000 simulations were carried out following Algorithms 1, 2 and 5. For each set of parameter values, 1000 random samples were generated using Algorithm 6.
| Algorithm 6 CP and AL. |
Input: true pairwise differences $\delta_{il}$; coverage indicators and interval lengths from each replication, for each method.
For each method: compute the coverage probability as the proportion of replications in which $[L_{il}, U_{il}]$ contains $\delta_{il}$, and the average interval length as the mean of $U_{il} - L_{il}$ across replications.
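The CP/AL evaluation of Algorithm 6 can be sketched as a generic harness: repeatedly generate data at known parameter values, build the interval, and record whether it covers the true difference and how wide it is. The interval method plugged in below is the delta-method CI from Section 2.4; all names and settings are illustrative.

```python
import math
import numpy as np

def cp_al(ci_method, true_delta, gen, reps=300):
    """Empirical coverage probability (CP) and average length (AL) of an
    interval method, in the spirit of Algorithm 6. `ci_method` maps two
    samples to (lower, upper); `gen` draws a fresh pair of samples."""
    cover = 0
    lengths = []
    for _ in range(reps):
        x1, x2 = gen()
        lo, hi = ci_method(x1, x2)
        cover += (lo <= true_delta <= hi)
        lengths.append(hi - lo)
    return cover / reps, float(np.mean(lengths))

def rzir(n, p, sigma, rng):
    z = rng.random(n) < p
    return np.where(z, 0.0, rng.rayleigh(sigma, n))

def delta_ci(x1, x2, z=1.959964):
    """Delta-method CI for the difference of ZIR means (Section 2.4)."""
    def fit(x):
        pos = x[x > 0]
        n, n1 = len(x), len(pos)
        p = (n - n1) / n
        sig = math.sqrt(np.sum(pos ** 2) / (2 * n1)) if n1 else 0.0
        mu = (1 - p) * sig * math.sqrt(math.pi / 2)
        var = ((math.pi / 2) * sig ** 2
               * (p * (1 - p) / n + (1 - p) ** 2 / (4 * n1))) if n1 else 0.0
        return mu, var
    m1, v1 = fit(x1)
    m2, v2 = fit(x2)
    hw = z * math.sqrt(v1 + v2)
    return m1 - m2 - hw, m1 - m2 + hw

rng = np.random.default_rng(3)
c = math.sqrt(math.pi / 2)
true_delta = (1 - 0.2) * 1.0 * c - (1 - 0.4) * 1.0 * c
cp, al = cp_al(delta_ci, true_delta,
               lambda: (rzir(100, 0.2, 1.0, rng), rzir(100, 0.4, 1.0, rng)),
               reps=300)
# cp should sit near the nominal 0.95 for this moderate-sample setting
```

The full study applies the same harness to all five methods and all pairwise contrasts, with 1000 replications per scenario.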
From Figure 1, Figure 2 and Figure 3, the results for $k = 3$ indicate that the GCI method consistently produced the lowest CPs, with values typically falling below the nominal 0.95 level. This under-coverage became more pronounced when the third group exhibited higher zero-inflation probabilities or when the scale parameter showed greater heterogeneity. Although the GCI yielded relatively narrow ALs, these intervals were often too short to achieve nominal coverage.
In contrast, both the PB method and delta-method normal approximation attained CPs close to the targeted 0.95 across nearly all parameter settings. Their CP performance improved as sample sizes increased while still maintaining moderate ALs. Between the two, the PB method generally produced slightly shorter intervals for a given CP, particularly in balanced sample-size scenarios.
The MOVER method consistently achieved CPs equal to or near 1.0000 in every configuration, indicating substantial over-coverage. However, this gain in CP came at the expense of much wider intervals—often more than double the ALs of the PB method and delta-method normal approximation, especially with greater scale disparities. Although the MOVER method is the most conservative method, its excessive interval lengths make it less efficient for practical applications.
The Bayesian method performed similarly to the PB method and delta-method normal approximation, with CPs typically ranging from 0.94 to 0.96 depending on the scenario. Its ALs were also comparable to those of the PB method and delta-method normal approximation and consistently far shorter than those of the MOVER method. Moreover, the HPD method maintained stable CP and AL behavior across variations in sample size and zero-inflation probability, demonstrating robustness.
Increasing sample sizes led to improved CPs for all approaches except the MOVER method (which had already attained 1.0000) and generally reduced ALs. This trend was especially noticeable for the PB method, delta-method normal approximation, and HPD method, whose CPs approached the nominal level more closely as n increased to 100 or 200.
Overall, the PB method, delta-method normal approximation, and HPD method provided the most favorable balance between coverage accuracy and interval length. The MOVER method offered the highest CPs but at the cost of substantially inflated ALs, whereas the GCI method persistently under-covered across most settings. These patterns identify the PB method, delta-method normal approximation, and HPD method as the most efficient SCI procedures for the three-sample ZIR scenarios.
Figure 4, Figure 5 and Figure 6 summarize the empirical CPs and ALs of the 95% two-sided SCIs for all pairwise mean differences in various six-sample configurations. The overall trends observed in the three-sample setting remain consistent and become even more pronounced when extended to six samples.
Across every parameter setup, the GCI method produced the lowest CPs, typically ranging from 0.83 to 0.91. Its under-coverage worsened in scenarios where the latter groups had higher zero-inflation probabilities or where scale heterogeneity was substantial. Although the GCI method generated the shortest intervals among all methods, this gain in precision came at the cost of inadequate coverage, demonstrating that its intervals were too narrow to maintain the nominal 0.95 confidence level.
The MOVER interval again displayed extreme conservatism, yielding CPs essentially equal to 1.0000 in every scenario. However, this was accompanied by substantially inflated ALs, often two to three times larger than those of the PB, delta-method normal approximation, and HPD intervals. The effect was particularly evident when the scale parameters increased from 0.25 to 0.75, producing noticeably wider MOVER intervals. Although the MOVER interval ensures near-certain coverage, its inefficiency renders it the least practical option.
Sample size played a major role in shaping CP and AL across all methods. As sample sizes increased from 30 to 200, CPs improved and ALs consistently decreased. The PB, delta-method normal approximation, and HPD method showed the greatest gains, exhibiting highly stable CPs close to the nominal level and substantially shorter intervals with larger samples. In contrast, the GCI method continued to under-cover even with larger n, whereas the MOVER method persistently over-covered regardless of sample size.
Overall, the six-sample results reinforce the conclusions drawn from smaller-sample analyses. The PB, delta-method normal approximation, and HPD methods provide the most effective balance between achieving nominal coverage and maintaining reasonably short intervals. The GCI remains overly liberal, while the MOVER interval remains excessively conservative. Thus, the PB, delta-method normal approximation, and HPD procedures stand out as the most dependable and efficient SCI approaches for six-sample ZIR scenarios.
Figure 7, Figure 8 and Figure 9 report the empirical CPs and ALs of the 95% SCIs for all pairwise mean differences when $k = 10$. The general trends observed in the three- and six-sample situations persist and become even more pronounced as the dimensionality increases.
Across every configuration, the GCI procedure again produced CPs well below the nominal level, with values roughly between 0.83 and 0.90. Its under-coverage intensified when the last five groups exhibited higher zero-inflation probabilities or when the scale parameters were larger. Although the GCI continued to yield the shortest intervals, these reduced ALs did not offset the considerable loss in coverage, reaffirming its overly liberal nature.
The PB, delta-method normal approximation, and HPD methods exhibited strong and consistent performance across nearly all settings. Their CPs remained close to the desired 0.95 level, with only minor decreases under higher zero-inflation conditions. These three methods also showed clear gains from increased sample size, with improvements in both CP and AL as the smallest group size rose from 30 to 200. Regarding interval width, the PB, delta-method normal approximation, and HPD intervals stayed within a moderate range, substantially shorter than the MOVER intervals yet appropriately wider than the GCI intervals, making them practical and efficient options.
As in the previous figures, the MOVER method produced CPs essentially equal to 1.0000 across all scenarios, underscoring its conservativeness. However, this came with extremely large interval lengths, often two to three times greater than those obtained from the PB, delta-method normal approximation, and HPD methods. The inefficiency of the MOVER interval became even more pronounced when the scale parameters increased, resulting in notably wider intervals.
Sample size exerted a strong and consistent influence. As sample sizes increased, all methods produced shorter intervals, with the PB, delta-method normal approximation, and HPD methods showing the most substantial improvements in precision and stability. The GCI continued to under-cover even in the largest samples, while the MOVER method persistently over-covered across all settings.
In summary, the findings for the ten-sample case reinforce the conclusions from the three- and six-sample analyses. The PB, delta-method normal approximation, and HPD methods provide the best compromise between nominal coverage and reasonable interval widths. The GCI remains overly liberal and unreliable, whereas the MOVER interval, although ensuring high CP, is excessively conservative and inefficient. Therefore, in high-dimensional ZIR scenarios, the PB, delta-method normal approximation, and HPD methods stand out as the most dependable and effective procedures for constructing SCIs.
4. Empirical Application
Global road traffic accidents represent a critical public health crisis, serving as a major contributor to premature death and injury. Beyond the immediate human suffering, these incidents impose substantial economic burdens through rising healthcare expenditures, reduced workforce productivity, and the long-term costs associated with disability rehabilitation. The causes of road traffic accidents are multifaceted, arising from a complex interplay between risky driving behaviors—such as speeding and alcohol impairment—and external factors including inadequate infrastructure and weak law enforcement. Mitigating this challenge requires an integrated strategy that combines legislative reforms, public education initiatives, and infrastructure improvements. Consequently, robust statistical modeling is essential for developing evidence-based interventions aimed at reducing both the frequency and severity of traffic crashes.
This study examines fatality data obtained from the Road Accident Victims Protection Company Limited (Thai RSC) (https://www.thairsc.com/, accessed on 25 December 2025) for the period from 1 January to 24 December 2025, as presented in Table 1. The analysis encompasses 36 districts across Chachoengsao, Uttaradit, and Chaiyaphum provinces. As shown in Figure 10, Figure 11 and Figure 12, preliminary analysis indicates that the distributions of non-zero fatalities in these regions are markedly right-skewed. As shown in Table 2, model selection based on the Akaike Information Criterion (AIC) shows that the Rayleigh distribution provides the best fit for the non-zero observations. However, to accommodate the substantial proportion of zero-fatality reports, the ZIR model is identified as the most appropriate framework for modeling the complete dataset.
Table 3 reports the descriptive statistics of fatality counts for Chachoengsao, Uttaradit, and Chaiyaphum provinces. The estimated pairwise mean differences are 6.3476 (Chachoengsao–Uttaradit), 7.0852 (Chachoengsao–Chaiyaphum), and 0.7376 (Uttaradit–Chaiyaphum). As shown in Table 4, the 95% SCIs produced by all considered methods contain the corresponding estimated mean differences. The MOVER approach yields the widest intervals, in agreement with the simulation results presented in the previous section. In contrast, the GCI, PB, and delta-method intervals are shorter than the HPD credible interval.
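The pairwise quantities above are differences of ZIR means, where the mean of a ZIR variable with zero-inflation probability δ and Rayleigh scale σ is (1 − δ)σ√(π/2), i.e., the Rayleigh mean discounted by the probability of a structural zero. A plug-in estimate can be sketched as follows; the helper name `zir_mean` and the toy samples are hypothetical, not the actual district data.

```python
import math
import numpy as np

def zir_mean(sample):
    """Plug-in ZIR mean estimate: (1 - delta_hat) * sigma_hat * sqrt(pi / 2).

    delta_hat is the observed proportion of zeros; sigma_hat is the
    Rayleigh scale MLE computed from the positive observations.
    """
    s = np.asarray(sample, dtype=float)
    delta_hat = np.mean(s == 0)
    pos = s[s > 0]
    sigma_hat = math.sqrt(np.sum(pos**2) / (2 * len(pos)))
    return (1 - delta_hat) * sigma_hat * math.sqrt(math.pi / 2)

# toy samples standing in for two provinces
a = [0.0, 0.0, 3.1, 5.4, 12.0, 7.7]
b = [0.0, 1.2, 2.5, 0.0, 3.3, 4.1]
print(zir_mean(a) - zir_mean(b))  # estimated pairwise mean difference
```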
It is important to note that, unlike the simulation study based on 1000 replications, this analysis corresponds to a single empirical dataset. The simulation results indicate that the CPs of the GCI, PB, and delta intervals fall below the nominal 0.95 level, rendering them unsuitable for constructing reliable 95% SCIs for mean differences in this setting. Conversely, the HPD interval consistently attains CPs exceeding the nominal level. Therefore, the HPD interval is recommended for constructing 95% SCIs for pairwise mean differences in fatalities among the three provinces.
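For reference, an HPD interval can be extracted from posterior draws as the shortest interval containing the required credible mass (valid for unimodal posteriors). The sketch below uses a stand-in normal posterior for a single mean difference; the function name `hpd_interval` and the simulated draws are illustrative assumptions and do not reproduce the paper's Bayesian model.

```python
import numpy as np

def hpd_interval(draws, cred=0.95):
    """Shortest interval containing a `cred` fraction of the draws."""
    d = np.sort(np.asarray(draws, dtype=float))
    n = len(d)
    m = int(np.ceil(cred * n))              # points inside the interval
    widths = d[m - 1:] - d[: n - m + 1]     # width of every candidate window
    j = int(np.argmin(widths))              # shortest window wins
    return d[j], d[j + m - 1]

# stand-in posterior sample for one pairwise mean difference
rng = np.random.default_rng(7)
post = rng.normal(loc=0.74, scale=0.2, size=10_000)
lo, hi = hpd_interval(post)
print(lo, hi)
```

For symmetric posteriors the HPD and equal-tailed intervals nearly coincide; for skewed posteriors the HPD interval is the shorter of the two.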
It should be noted that the empirical analysis serves to demonstrate the practical implementation of the proposed simultaneous interval estimation procedures, while detailed model diagnostics and causal interpretation are beyond the scope of the present study.
5. Discussion
Overall, the simulation study reveals clear and systematic performance differences among the competing SCI methods across all scenarios. The GCI method consistently under-covered, with empirical coverage probabilities falling well below the nominal 0.95 level, particularly with higher zero inflation, greater scale heterogeneity, and increasing numbers of groups. Although the GCI produced the shortest intervals, these were generally too narrow to support reliable inference. In contrast, the MOVER method was highly conservative, achieving coverage probabilities close to 1.0000 in all settings but at the cost of substantially inflated interval widths, which severely limited its practical efficiency.
By comparison, the PB, delta-method normal approximation, and HPD methods provided the most favorable trade-off between coverage accuracy and interval length. Their CPs remained close to the nominal level across a broad range of sample sizes, zero-inflation levels, and dimensions while yielding intervals that were moderate in length and markedly shorter than those of the MOVER method. Moreover, their performance improved with increasing sample size, leading to greater stability and precision. These results identify the PB, delta-method normal approximation, and HPD methods as the most reliable and efficient approaches for constructing SCIs for pairwise mean differences in ZIR distributions.
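To make the PB idea concrete, a percentile-type parametric bootstrap for a single pairwise difference of ZIR means can be sketched as follows. This is a simplified illustration under stated assumptions (plug-in ZIR mean (1 − δ)σ√(π/2), percentile endpoints, no multiplicity adjustment); the paper's PB procedure for simultaneous intervals may differ in detail, and the name `pb_ci` and the toy samples are hypothetical.

```python
import math
import numpy as np

def pb_ci(x, y, B=2000, alpha=0.05, seed=0):
    """Percentile parametric-bootstrap interval for a difference of ZIR means."""
    rng = np.random.default_rng(seed)

    def fit(s):
        s = np.asarray(s, dtype=float)
        delta = np.mean(s == 0)                           # zero-inflation estimate
        pos = s[s > 0]
        sigma = math.sqrt(np.sum(pos**2) / (2 * len(pos))) if len(pos) else 0.0
        return delta, sigma, len(s)

    def draw(delta, sigma, n):
        s = rng.rayleigh(scale=max(sigma, 1e-12), size=n)
        s[rng.random(n) < delta] = 0.0                    # inject structural zeros
        return s

    def zmean(delta, sigma):
        return (1 - delta) * sigma * math.sqrt(math.pi / 2)

    px, py = fit(x), fit(y)
    diffs = np.empty(B)
    for b in range(B):
        dx, sx, _ = fit(draw(*px))
        dy, sy, _ = fit(draw(*py))
        diffs[b] = zmean(dx, sx) - zmean(dy, sy)
    return tuple(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))

# toy data for two groups with excess zeros
x = [0, 0, 0, 2.1, 3.4, 1.8, 5.2, 4.4, 2.9, 3.7]
y = [0, 1.1, 0.9, 2.0, 1.5, 0, 1.2, 1.8, 0, 1.4]
lo, hi = pb_ci(x, y, B=500, seed=1)
print(lo, hi)
```

Extending this to simultaneous coverage over all pairwise differences would require an additional adjustment (e.g., calibrating over the maximum studentized difference), which is part of what the compared methods differ on.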
6. Conclusions
In this work, SCIs for all pairwise differences of means in ZIR distributions were investigated in a variety of sampling and distributional settings. Several competing methods were examined, including the GCI, PB, delta-method normal approximation, MOVER, and HPD methods. Extensive simulation studies were conducted to evaluate their performance in terms of empirical CP and AL across different sample sizes, probabilities of zero inflation, scale parameters, and numbers of groups.
The results indicate substantial differences among the methods. The GCI method consistently failed to achieve nominal coverage, particularly in scenarios with higher zero inflation and increased scale heterogeneity, despite producing relatively short intervals. In contrast, the MOVER method was overly conservative, yielding near-unit coverage probabilities at the expense of excessively wide intervals. Among the methods considered, the PB, delta-method normal approximation, and HPD methods demonstrated the most favorable performance, providing CPs close to the nominal level with moderate interval lengths and improved stability as sample sizes increased, even in higher-dimensional settings.
Overall, the PB, delta-method normal approximation, and HPD methods are recommended for constructing SCIs for pairwise mean differences in ZIR distributions due to their reliability and efficiency. These findings contribute to the growing literature on inference for zero-inflated models and offer practical guidance for applied researchers working with skewed data containing excess zeros. Future research may extend the proposed framework to other zero-inflated distributions, assess robustness under model misspecification conditions, and explore computational enhancements for large-scale applications.