Multistage Estimation of the Rayleigh Distribution Variance

Abstract: In this paper, we discuss the multistage sequential estimation of the variance of the Rayleigh distribution using the three-stage procedure presented by Hall (Ann. Stat. 9(6):1229–1238, 1981). Since the variance of the Rayleigh distribution is a linear function of the square of the distribution's scale parameter, it suffices to estimate the square of the scale parameter. We tackle two estimation problems under a unified optimal stopping rule: first, minimum risk point estimation under a squared-error loss function plus linear sampling cost, and second, fixed-width confidence interval estimation. Such an estimation cannot be performed using classical fixed-sample-size procedures, because no fixed sample size simultaneously solves both estimation problems. We derive the asymptotic results needed to obtain the three-stage regret and the three-stage fixed-width confidence interval for the desired parameter. The procedure attains asymptotic second-order efficiency and asymptotic consistency. A series of Monte Carlo simulations was conducted to study the procedure's performance as the optimal sample size increases. The simulation results agree with the asymptotic results.


Introduction
The Rayleigh distribution was introduced by Rayleigh [1] in 1880, originally in the context of a problem in acoustics and optics. For the history of the distribution, see Johnson et al. [2]. The distribution is extensively used in communication theory to describe the hourly median and instantaneous peak power of received radio signals. It also plays a crucial role in survival analysis, reliability analysis, the physical sciences, engineering, medical imaging, applied statistics, and clinical studies. For more details on its applications, see Polovko [3], Gross and Clark [4], Lee and Wang [5], Rosen et al. [6], and Siddiqui [7]; for more information about the distribution and inference on its parameters, see Siddiqui [7], Hirano [8], Dyer and Whisenand [9], Howlader and Hossain [10], and Johnson et al. [2].
Many forms of the Rayleigh distribution have been proposed to provide flexibility in modeling data. Vodă [11,12] proposed a generalized form of the Rayleigh distribution and discussed its statistical and inferential properties. The probability density function of the generalized form with scale parameter $\sigma > 0$ and shape parameter $k \ge 0$ is given by: $f(x; \sigma, k) = \dfrac{2}{(2\sigma^{2})^{k+1}\,\Gamma(k+1)}\, x^{2k+1} \exp\left(-\dfrac{x^{2}}{2\sigma^{2}}\right)$, $x > 0$, where $\Gamma$ is the gamma function. At $k = 0$ we obtain the standard Rayleigh distribution. The probability density function of the standard Rayleigh distribution with scale parameter $\sigma$ is: $f(x; \sigma) = \dfrac{x}{\sigma^{2}} \exp\left(-\dfrac{x^{2}}{2\sigma^{2}}\right)$, $x > 0$. Now, let $X_1, X_2, \ldots$ be independent and identically distributed random variables following a standard Rayleigh distribution with unknown scale parameter $\sigma$. It can be shown from Johnson et al. [2] that the population mean and population variance of the distribution are, respectively, $\sigma\sqrt{\pi/2}$ and $(4-\pi)\sigma^{2}/2$.
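As a quick numerical check of the moment formulas, the following sketch integrates the standard Rayleigh density with the midpoint rule. It assumes the usual parameterization $f(x;\sigma) = (x/\sigma^{2})\exp(-x^{2}/(2\sigma^{2}))$; the function names, the integration range, and the value $\sigma = 1.5$ are illustrative choices and are not taken from the paper.

```python
import math

def rayleigh_pdf(x: float, sigma: float) -> float:
    """Standard Rayleigh density (assumed parameterization)."""
    return (x / sigma**2) * math.exp(-x**2 / (2.0 * sigma**2))

def raw_moment(order: int, sigma: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of E[X**order] on (0, 12*sigma)."""
    upper = 12.0 * sigma            # the tail beyond 12*sigma is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**order * rayleigh_pdf(x, sigma)
    return total * h

sigma = 1.5                          # illustrative value
mass = raw_moment(0, sigma)          # should be close to 1
mean = raw_moment(1, sigma)          # compare with sigma*sqrt(pi/2)
second = raw_moment(2, sigma)        # compare with 2*sigma**2
variance = second - mean**2          # compare with (4 - pi)*sigma**2/2

print(mass, mean, sigma * math.sqrt(math.pi / 2.0))
print(variance, (4.0 - math.pi) * sigma**2 / 2.0)
```

Both the variance and the second moment come out proportional to $\sigma^{2}$, which is why estimating $\sigma^{2}$ is enough.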
Recently, Yousef et al. [13] discussed multistage estimation of the scale parameter of the Rayleigh distribution using Hall's [14] three-stage procedure. They tackled two estimation problems, point and confidence interval estimation, under a unified optimal stopping rule. They obtained the three-stage regret, and they studied the coverage probability through Monte Carlo simulation. They proved that the procedure attains asymptotic second-order efficiency and asymptotic consistency in the sense of Chow and Robbins [15] and Ghosh and Mukhopadhyay [16]. Tahir [17] proposed a purely sequential procedure for point estimation of the square of the scale parameter of the Rayleigh distribution, using a weighted squared-error loss function plus the cost of sampling. He found a second-order asymptotic expansion for the incurred regret and proved that the asymptotic regret is negative for a range of parameter values.
In this paper, the aim is to estimate the population variance $(4-\pi)\sigma^{2}/2$ or the population second moment $2\sigma^{2}$ of the Rayleigh distribution by estimating the square of its scale parameter. We do so because both the variance and the second moment are linear functions of $\sigma^{2}$. We use Hall's [14] procedure to carry out the study. Sequential estimation is needed because no fixed-sample-size procedure solves the problem analytically. For more details, see Mukhopadhyay and de Silva [18] (chapters 6-13 and 16) and Ghosh et al. [19] (Theorem 3.7.1). Since our focus is on the sequential estimation of the square of the scale parameter, we use the following transformation to ease the calculations in the subsequent sections. Let $Y = X^{2}/2$ and $\theta = \sigma^{2}$. Then the Jacobian of the transformation yields: $f(y; \theta) = \dfrac{1}{\theta}\, e^{-y/\theta}$, $y > 0$, $\theta > 0$, (1) which is the probability density function of the exponential distribution with mean $\theta$. It is well known that $2Y/\theta$ is distributed according to the chi-squared distribution with two degrees of freedom, $\chi^{2}_{2}$. Let $Y_1, Y_2, \ldots$ be a sequence of independent and identically distributed random variables following the exponential distribution in (1); then the $r$th raw moment is given by: $E(Y^{r}) = \Gamma(r+1)\,\theta^{r}$. Hence, for any $i$ and $r = 1, 2, 3, \ldots$, we have: $E(Y_i^{r}) = r!\,\theta^{r}$. Moreover, let $\bar{Y}_n = n^{-1}\sum_{i=1}^{n} Y_i$ be the sample average of a random sample of size $n \ge 2$; then: $E(\bar{Y}_n - \theta) = 0$, $E(\bar{Y}_n - \theta)^{2} = \theta^{2}/n$, $E(\bar{Y}_n - \theta)^{3} = 2\theta^{3}/n^{2}$, and $E(\bar{Y}_n - \theta)^{4} = 3(n+2)\theta^{4}/n^{3}$.
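The transformation to exponential data can be checked by simulation. The sketch below assumes the transformation $Y = X^{2}/2$ with $\theta = \sigma^{2}$ and draws Rayleigh variates by inverse-transform sampling, $X = \sigma\sqrt{-2\ln U}$. The helper name, the seed, and the sample size are illustrative assumptions.

```python
import math
import random

def sample_rayleigh(rng: random.Random, sigma: float) -> float:
    """Inverse-transform draw from the standard Rayleigh distribution."""
    u = 1.0 - rng.random()           # uniform on (0, 1], avoids log(0)
    return sigma * math.sqrt(-2.0 * math.log(u))

rng = random.Random(12345)           # fixed seed, illustrative
sigma = 2.0
theta = sigma**2                     # exponential mean after the transformation
n = 200_000

ys = [sample_rayleigh(rng, sigma) ** 2 / 2.0 for _ in range(n)]

mean_y = sum(ys) / n                             # compare with theta
second_moment_y = sum(y * y for y in ys) / n     # compare with 2*theta**2
frac_below_theta = sum(y <= theta for y in ys) / n   # compare with 1 - exp(-1)

print(mean_y, theta)
print(second_moment_y, 2.0 * theta**2)
print(frac_below_theta, 1.0 - math.exp(-1.0))
```

Under these assumptions the sample mean of the transformed data estimates $\theta = \sigma^{2}$ directly, and the empirical moments should be close to $r!\,\theta^{r}$ up to Monte Carlo error.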

Minimum Risk Point Estimation for the Parameter
It is common practice in optimal decision structures to assume that the cost incurred in estimating $\theta$ by the corresponding sample measure $\bar{Y}_n$ takes the form of a squared-error loss function with linear sampling cost. For more details, see DeGroot [20], Chow and Yu [21], and Martinsek [22].
$L_n(A) = A(\bar{Y}_n - \theta)^{2} + cn$. (2) The first term in (2) is the cost of estimation and the second term is the cost of sampling. The constant $c > 0$ is the cost per sampled unit, and $A > 0$ is the cost per unit of estimation error. Details regarding the interpretation of $A$ are given in the following sections. The risk associated with (2) is: $R_n(A) = E\{L_n(A)\} = A\theta^{2}/n + cn$. (3)
Treating $n$ as a continuous variable, we differentiate (3) with respect to $n$ and equate the result to zero to obtain the optimal sample size: $n^{*} = \sqrt{A/c}\,\theta$. (4) Note that (4) is the optimal fixed sample size that would minimize the risk had $\theta$ been known.
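The minimization can be illustrated numerically. This is a minimal sketch, assuming the loss $A(\bar{Y}_n - \theta)^{2} + cn$ for exponential data with mean $\theta$, so that the risk is $A\theta^{2}/n + cn$. The values of $A$, $c$, and $\theta$ are illustrative only.

```python
import math

def risk(n: float, A: float, c: float, theta: float) -> float:
    """Risk of the sample mean under squared-error loss plus linear cost."""
    return A * theta**2 / n + c * n

def optimal_n(A: float, c: float, theta: float) -> float:
    """Minimizer of the risk when n is treated as continuous."""
    return math.sqrt(A / c) * theta

A, c, theta = 400.0, 0.01, 5.0       # illustrative values
n_star = optimal_n(A, c, theta)
risk_at_optimum = risk(n_star, A, c, theta)

# The optimal risk should equal 2*c*n_star, and no integer n should do better.
best_on_grid = min(risk(n, A, c, theta) for n in range(1, 5001))

print(n_star, risk_at_optimum, 2.0 * c * n_star, best_on_grid)
```

At the optimum the cost of estimation and the cost of sampling are equal, each being $c\,n^{*}$, which gives the optimal risk $2cn^{*}$.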

Fixed-Width Confidence Interval Estimation for the Parameter
Assume further that a fixed-width confidence interval for $\theta$ of width $2d$ ($d > 0$) is required, whose coverage probability is at least the nominal value $100(1-\alpha)\%$. We use the central limit theorem and the normal approximation to the distribution of the sample average to propose the interval $I_n = (\bar{Y}_n - d,\ \bar{Y}_n + d)$ for $\theta$. For large $n$, the central limit theorem states that the quantity $Q = \sqrt{n}(\bar{Y}_n - \theta)/\theta$ is approximately distributed as a standard normal random variable. Therefore, $P(\theta \in I_n) = P(|\bar{Y}_n - \theta| \le d) \approx 2\Phi(d\sqrt{n}/\theta) - 1 \ge 1 - \alpha$, where $\Phi$ is the standard normal distribution function. Moreover, $a$ is the upper $\alpha/2$ cutoff point of the standard normal distribution. It follows that the optimal sample size required to satisfy the above objectives takes the form: $n^{*} = a^{2}\theta^{2}/d^{2}$. (5)
Since $\theta$ is unknown, $n^{*}$ is also unknown. Dantzig [23] showed that no fixed-sample-size procedure can achieve the above objectives uniformly for all $\theta > 0$; they can be achieved only sequentially.
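The relation between the half-width, the cutoff point, and the optimal sample size can be sketched as follows, assuming $n^{*} = (a\theta/d)^{2}$ for exponential data with mean $\theta$. The function names are hypothetical. The values $\theta = 5$, $\alpha = 0.05$, and $n^{*} = 500$ are used only as an illustration.

```python
import math
from statistics import NormalDist

def upper_cutoff(alpha: float) -> float:
    """Upper alpha/2 cutoff point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def optimal_n_interval(theta: float, d: float, alpha: float) -> float:
    """Sample size giving approximate coverage 1 - alpha for half-width d."""
    return (upper_cutoff(alpha) * theta / d) ** 2

def half_width_for(n_star: float, theta: float, alpha: float) -> float:
    """Half-width d that corresponds to a given optimal sample size."""
    return upper_cutoff(alpha) * theta / math.sqrt(n_star)

alpha, theta = 0.05, 5.0
a = upper_cutoff(alpha)                        # roughly 1.96
d = half_width_for(500, theta, alpha)          # half-width for n* = 500
n_back = optimal_n_interval(theta, d, alpha)   # recovers 500

print(a, d, n_back)
```

Since $n^{*}$ depends on the unknown $\theta$, it cannot be computed in practice; the code only shows how $d$ and $n^{*}$ determine each other for a hypothetical known $\theta$.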
In the following, we combine both the point estimation and the confidence interval estimation in one decision framework to make maximum use of the available sample to achieve several objectives in performing inference.

A Unified Decision Framework for Point and Interval Estimation
To determine the optimal sample size required to achieve both types of estimation, we equate Equations (4) and (5) to obtain $\sqrt{A/c} = a^{2}\theta/d^{2}$, which results in $n^{*} = (a/d)^{2}\theta^{2}$ and $A = (n^{*}/\theta^{2})(c\,n^{*})$. This is contrary to the literature, where $A$ is treated as a known constant; in fact, $A$ is only partially known. The term $n^{*}/\theta^{2}$ is the Fisher information about $\theta$ in a sample of size $n^{*}$, and $c\,n^{*}$ represents the optimal cost of sampling, which depends on the unknown $n^{*}$. Therefore, $A$ represents the cost of estimation measured relative to the optimal cost of sampling. Clearly, $A \to \infty$ as $d \to 0$. However, we continue to write the optimal sample size in the general form $n^{*} = \lambda g(\theta)$, with $\lambda = (a/d)^{2}$ and $g(\theta) = \theta^{2}$, to define the three-stage sampling procedure in the following section. The function $g(\cdot) > 0$ is real-valued, continuously differentiable, and bounded, together with its derivatives, in a neighborhood of the parameter $\theta$.

Three-Stage Sequential Procedure for Inference
Hall [14] introduced the three-stage sampling procedure. The objective was to obtain a fixed-width confidence interval for the mean of a normal distribution when the variance is finite but unknown. It was designed to overcome several technical problems in both the one-by-one purely sequential schemes introduced by Anscombe [24], Robbins [25], and Chow and Robbins [15] and the two-stage bulk sampling introduced by Stein [26,27] and Cox [28]. The procedure showed asymptotic second-order efficiency and asymptotic consistency in the sense of Chow and Robbins [15]. Mukhopadhyay [29] developed a unified framework for the three-stage procedure and laid out the theory associated with asymptotic second-order properties. As suggested by the name, the procedure is carried out in three consecutive sampling phases: the pilot-study phase, the main-study phase, and the fine-tuning phase.
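The pilot-study phase: We start the process by observing a random pilot sample of size $m \ge 2$, say $Y_1, Y_2, \ldots, Y_m$, to initiate the sampling procedure and calculate the sample estimate $\bar{Y}_m$ of $\theta$.
The main-study phase: In this phase, only a fraction $\gamma \in (0, 1)$ of the optimal sample size $n^{*}$ is estimated, to avoid early stopping and the possibility of oversampling. Let $[x]$ denote the integer part of $x$. The stopping rule for this phase is: $N_1 = \max\{m,\ [\gamma (a/d)^{2}\bar{Y}_m^{2}] + 1\}$. (6) If $N_1 = m$, no further observations are taken in this phase; otherwise, we observe $N_1 - m$ additional observations, say $Y_{m+1}, \ldots, Y_{N_1}$, and compute $\bar{Y}_{N_1}$.
The fine-tuning phase: We define the fine-tuning stopping rule as: $N = \max\{N_1,\ [(a/d)^{2}\bar{Y}_{N_1}^{2}] + 1\}$. (7) If $N = N_1$, sampling is terminated; otherwise, we continue by taking a further sample of size $N - N_1$, say $Y_{N_1+1}, Y_{N_1+2}, \ldots, Y_N$, and then terminate the sampling course. Hence, we propose the point estimate $\bar{Y}_N$ and the confidence interval $I_N = (\bar{Y}_N - d,\ \bar{Y}_N + d)$ for the unknown parameter $\theta$. As a result, the three-stage point estimate of the variance of the Rayleigh distribution is $(4-\pi)\bar{Y}_N/2$.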
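The three phases can be sketched in code. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the stopping rules $N_1 = \max\{m, [\gamma(a/d)^{2}\bar{Y}_m^{2}] + 1\}$ and $N = \max\{N_1, [(a/d)^{2}\bar{Y}_{N_1}^{2}] + 1\}$ for exponential data with mean $\theta$. The function name `three_stage`, the seed, and the default values are hypothetical.

```python
import math
import random
from typing import Tuple

def three_stage(rng: random.Random, theta: float, d: float, a: float = 1.96,
                gamma: float = 0.5, m: int = 10) -> Tuple[int, int, float]:
    """One run of the three-stage rule; returns (N1, N, final sample mean)."""
    lam = (a / d) ** 2                        # assumed form n* = lam * theta**2

    def draw() -> float:
        return rng.expovariate(1.0 / theta)   # expovariate takes the rate 1/mean

    # Pilot-study phase: m observations.
    total = sum(draw() for _ in range(m))
    # Main-study phase: estimate only a fraction gamma of n*.
    n1 = max(m, math.floor(gamma * lam * (total / m) ** 2) + 1)
    total += sum(draw() for _ in range(n1 - m))
    # Fine-tuning phase: re-estimate n* from all N1 observations.
    n = max(n1, math.floor(lam * (total / n1) ** 2) + 1)
    total += sum(draw() for _ in range(n - n1))
    return n1, n, total / n

rng = random.Random(2020)                     # fixed seed, illustrative
theta_true = 5.0                              # known only to the simulation
d = 1.96 * theta_true / math.sqrt(100)        # half-width giving n* close to 100
n1, n, theta_hat = three_stage(rng, theta_true, d)

interval = (theta_hat - d, theta_hat + d)              # fixed-width interval
variance_hat = (4.0 - math.pi) / 2.0 * theta_hat       # Rayleigh variance estimate
print(n1, n, round(theta_hat, 3), interval, round(variance_hat, 3))
```

The running total avoids storing the sample, and sampling happens at most three times, which is the practical appeal of the three-stage scheme over one-by-one sequential sampling.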
The asymptotic characteristics of each phase are given in the following section.
The following asymptotic results were developed under the general regularity assumptions set forward by Hall [14] to develop a theory for the three-stage sampling procedure, which states:

0.
We have used the assumption that • and its derivatives are bounded and the fact that 0.  The following Theorem 2 provides second-order asymptotic expansion of a real-valued, continuously differentiable and bounded function 0 of . (6) and (7) and as → 0, we have: * 2 ′ .

Theorem 2. Under assumption (A), for the three-stage stopping rule
Proof. The proof is prompt if we consider the second-order Taylor expansion of around and make use of and of Theorem 1, and the assumption that the real-valued continuously differentiable function 0 and its derivatives are bounded. The proof of (iii) is complete. □ In the following section, we present the asymptotic theory for the stopping variable .

Asymptotic Characteristics for the Fine-Tuning Phase
Theorem 3. Under assumption (A), for the stopping variable , and as → 0, we have: It can be shown that as → 0, φ and are asymptotically uncorrelated.
Hence, E 1 . By using Taylor expansion for and utilizing Theorem 2, we have: * * ′ .
By using part (i), we obtain the proof. The proof is complete. □ The remaining part follows directly from the preceding parts of Theorem 3. □ The first part of Theorem 3 shows that $\lim_{d \to 0} E(N/n^{*}) = 1$ (first-order asymptotic efficiency) and that $E(N - n^{*})$ remains bounded, as $d \to 0$, by a finite number that is unrelated to $n^{*}$. Such a property is called second-order asymptotic efficiency in the sense of Chow and Robbins [15]. Part (iii) shows that the variance of $N$ increases as $n^{*}$ increases. Theorem 4 below provides a second-order asymptotic expansion of the moments of a real-valued, continuously differentiable, and bounded function $h(\cdot) > 0$ of $N$.

Theorem 4.
Let assumption (A) hold, and let ℎ 0 be a real-valued continuously differentiable function in a neighborhood around * such that lim Sup |ℎ′′ | |ℎ′′′ * )|. Then as → ∞, Proof. The proof is a direct substitution of and of Theorem 3 in the Taylor series expansion of the function ℎ . We omit any further details for brevity. The proof is complete. □ Lemma 1. As → 0, is an asymptotically standard normal distribution.
Proof. According to Anscombe [31], the central limit theorem → 0 , * * has an asymptotically standard normal distribution. By computing the moment generating function of , and using Theorem 4 we get the result. The proof is complete. □

Theorem 5.
Under assumption (A), for the stopping variable , and as → 0, we have:
Consider the second-order expansion of in the Taylor series around * ; we have: * * , where random variable is between and * . The assumption that 0 and its derivatives are bounded can be used to prove that ∑ 0. * * . *

.
Consider that the first-order Taylor expansion of ) gives ∑ ′ * . Again, we condition on the generated by , , … , . We get: Arguments similar to those used above and the fact that * yield statement of ( of Theorem 5. The proof is complete. □

The remaining part of Theorem 5 follows directly from the preceding parts of Theorem 5; we omit details. The proof is complete. □

Part (i) of Theorem 5 shows that $\bar{Y}_N$ is an asymptotically unbiased estimator of $\theta$, whereas its variance decreases as $n^{*}$ increases.

The Asymptotic Regret
The regret associated with the quadratic loss function with linear sampling cost given by (2) is $\omega(d) = E\{L_N(A)\} - R_{n^{*}}(A)$, where $E\{L_N(A)\} = A\,E(\bar{Y}_N - \theta)^{2} + c\,E(N)$ is the risk associated with the three-stage sampling estimation procedure, and $R_{n^{*}}(A)$ is the optimal risk had the parameter $\theta$ been known. The first term is expanded using Theorem 5 and the second using Theorem 3, while the optimal risk is $R_{n^{*}}(A) = 2cn^{*}$. The asymptotic regret is the difference between the expanded three-stage risk and the optimal risk, and the resulting expansion shows that negative regret is expected. The issue of negative regret was also addressed by Martinsek [22], Yousef [32], and Hamdy [33]. This phenomenon deserves an in-depth investigation in future work.

Three-Stage Asymptotic Coverage Probability for the Parameter
The three-stage coverage probability is defined as: $P(\theta \in I_N) = \sum_{n=m}^{\infty} P(|\bar{Y}_n - \theta| \le d,\ N = n)$. However, the event $\{|\bar{Y}_n - \theta| \le d\}$ and the event $\{N = n\}$ are dependent for $n = m, m+1, \ldots$. Therefore, we cannot obtain a closed-form mathematical expression for the coverage probability like those of Hall [14], Hamdy et al. [34], and Hamdy [35]. Instead, we conducted a Monte Carlo simulation using Microsoft Developer Studio software to study the performance of the three-stage fixed-width confidence interval for $\theta$ as the optimal sample size varies from small to moderate to large.

Simulation Study
We conducted a Monte Carlo simulation [36] to study the performance of the fixed-width confidence interval for the parameter $\theta$. A series of 50,000 replications was generated from the exponential distribution with mean $\theta = 5$ using Microsoft Developer Studio software with the IMSL (International Mathematical and Statistical Library). The optimal sample sizes were chosen as recommended by Hall [14]: $n^{*} = 24, 43, 61, 76, 96, 125, 171, 246$, and $500$. We took the design factor $\gamma = 0.5$ and the pilot sample size $m = 10$. For brevity, we consider only $\alpha = 5\%$, which gives $a = 1.96$.
Let $\bar{N}$ be the simulated estimate of the optimal sample size $n^{*}$ with standard error $S(\bar{N})$. Let $\hat{\theta}$ be the simulated estimate of the square of the scale parameter of the Rayleigh distribution with standard error $S(\hat{\theta})$. Further, $\hat{V}$ is the simulated estimate of the variance of the Rayleigh distribution, and $1 - \hat{\alpha}$ is the simulated estimate of the coverage probability. Table 1 below shows the simulation results as the optimal sample size increases. The simulation results agree with our findings, and the coverage probability improves as the optimal sample size increases. It is evident from the simulation that the procedure provides coverage probabilities that are less than the prescribed nominal value, that is, $P(\theta \in I_N) < 1 - \alpha$. At the same time, as $d \to 0$, $P(\theta \in I_N) \to 1 - \alpha$. Collectively, all estimates improve as the optimal sample size increases. For the simulation methodology, see Yousef [37].
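A small-scale version of such a study can be sketched as follows. This is not the IMSL code used for Table 1. It assumes the three-stage rules $N_1 = \max\{m, [\gamma(a/d)^{2}\bar{Y}_m^{2}] + 1\}$ and $N = \max\{N_1, [(a/d)^{2}\bar{Y}_{N_1}^{2}] + 1\}$, uses Python's `random` module, and runs only 2,000 replications per design point, so its numbers will not reproduce the reported table. All function names and seeds are hypothetical.

```python
import math
import random
from typing import Dict, Tuple

def run_once(rng: random.Random, theta: float, d: float, a: float,
             gamma: float, m: int) -> Tuple[int, float]:
    """One three-stage run; returns (final sample size N, final sample mean)."""
    lam = (a / d) ** 2
    total = sum(rng.expovariate(1.0 / theta) for _ in range(m))
    n1 = max(m, math.floor(gamma * lam * (total / m) ** 2) + 1)
    total += sum(rng.expovariate(1.0 / theta) for _ in range(n1 - m))
    n = max(n1, math.floor(lam * (total / n1) ** 2) + 1)
    total += sum(rng.expovariate(1.0 / theta) for _ in range(n - n1))
    return n, total / n

def simulate(n_star: int, reps: int, seed: int, theta: float = 5.0,
             a: float = 1.96, gamma: float = 0.5,
             m: int = 10) -> Tuple[float, float, float]:
    """Average N, average estimate of theta, and empirical coverage."""
    rng = random.Random(seed)
    d = a * theta / math.sqrt(n_star)     # half-width implied by n*
    sum_n, sum_est, covered = 0, 0.0, 0
    for _ in range(reps):
        n, est = run_once(rng, theta, d, a, gamma, m)
        sum_n += n
        sum_est += est
        covered += abs(est - theta) <= d  # does the interval contain theta?
    return sum_n / reps, sum_est / reps, covered / reps

results: Dict[int, Tuple[float, float, float]] = {
    n_star: simulate(n_star, reps=2000, seed=n_star) for n_star in (24, 96, 246)
}
for n_star, (mean_n, mean_est, coverage) in results.items():
    var_est = (4.0 - math.pi) / 2.0 * mean_est   # Rayleigh variance estimate
    print(n_star, round(mean_n, 2), round(mean_est, 3),
          round(var_est, 3), round(coverage, 3))
```

Each row reports the average final sample size, the average estimate of $\theta$, the implied variance estimate, and the empirical coverage, which can be compared against $n^{*}$, $\theta = 5$, and the nominal 95%.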

Conclusions
We have proposed a three-stage sequential procedure for estimating the variance of the Rayleigh distribution by estimating the square of the distribution's scale parameter. We proposed a unified decision framework for point and interval estimation and derived the asymptotic results needed to obtain the asymptotic regret. The procedure attained negative regret, which indicates that the three-stage procedure provides better estimates than the classical fixed-sample-size procedures. The Monte Carlo simulation agreed with our findings and revealed that the procedure provides coverage probabilities that are always less than the desired nominal value.