Three-Stage Estimation of the Mean and Variance of the Normal Distribution with Application to an Inverse Coefficient of Variation with Computer Simulation

Abstract: This paper considers two main problems sequentially. First, we estimate both the mean and the variance of the normal distribution under a unified decision framework using Hall's three-stage procedure. We consider a minimum risk point estimation problem for the variance under a squared-error loss function with linear sampling cost, and we construct a confidence interval for the mean with a preassigned width and coverage probability. Second, as an application, we develop Fortran codes that tackle both the point estimation and confidence interval problems for the inverse coefficient of variation using a Monte Carlo simulation. The simulation results show negative regret in the estimation of the inverse coefficient of variation, which indicates that the three-stage procedure provides better estimation than the optimal.


Introduction
Let {X_i, i ≥ 1} be a sequence of independent and identically distributed (IID) random variables from a normal distribution with mean µ ∈ ℝ and variance σ² ∈ ℝ⁺, where both µ and σ² are unknown. Assume further that a random sample of size n(≥ 2) from the normal distribution becomes available; then we propose to estimate µ and σ² by the corresponding sample measures X̄_n and S²_n, respectively. It has been common practice, over the last decades, to treat each problem separately, considering one decision framework for each inference problem of the mean or the variance.
The objective of this paper is to combine the inference for both problems under one decision framework in order to make maximal use of the available sample information and handle these problems simultaneously. Given predefined α, 0 < α < 1, and d(> 0), where (1 − α) is the confidence coefficient and 2d is the fixed width of the interval, we want to construct a fixed-width (= 2d) confidence interval for the mean µ whose confidence coefficient is at least the nominal value 100(1 − α)%, while at the same time using the same available data to estimate the population variance σ² under a squared-error loss function with linear sampling cost. Hence, we combine both optimal sample sizes in one decision rule to propose the three-stage sampling decision framework.
Therefore, the optimal sample size required to construct a fixed-width confidence interval for µ whose coverage probability is at least the nominal value 100(1 − α)% must satisfy

n ≥ (a/d)² σ² = n*, say, (1)

where a is the upper α/2 critical point of the standard normal distribution N(0, 1). For more details about Equation (1), see Mukhopadhyay and de Silva ([1]; chapter 6, p. 97).
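As a quick illustration of Equation (1), the sketch below computes the smallest sample size satisfying n ≥ (a/d)²σ². All numeric values (d, σ²) are hypothetical; a = 1.96 corresponds to α = 0.05.

```python
import math

# Hypothetical inputs: a is the upper alpha/2 point of N(0,1) for alpha = 0.05,
# d is the half-width of the interval, sigma2 the (in practice unknown) variance.
a = 1.96
d = 0.5
sigma2 = 4.0

# Smallest integer n satisfying n >= (a/d)^2 * sigma^2 = n* in Equation (1).
n_star = (a / d) ** 2 * sigma2
n_required = math.ceil(n_star)
print(n_required)  # -> 62
```

In practice σ² is unknown, which is exactly why a multistage procedure is needed; this snippet only shows the target quantity the procedure estimates.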

Minimum Risk Estimation
In the literature on sequential point estimation problems, one may consider several types of loss functions, such as the squared-error loss function, the absolute-error loss function, the linex loss function, and others. The most commonly used is the squared-error loss function, due to its simplicity in mathematical computations; see, for example, DeGroot [2]. Therefore, we write the loss incurred in estimating σ² by the corresponding sample measure S²_n as

L_n(A) = A(S²_n − σ²)² + cn, (2)

where A > 0 is a known constant and c is the known cost per unit sample observation. We will elaborate on the determination of A below. Now, the risk corresponding to Equation (2) is

R_n(A) = E(L_n(A)) = 2Aσ⁴/(n − 1) + cn ≈ 2Aσ⁴/n + cn. (3)

Thus, the value of n that minimizes the risk in Equation (3) is

n* = σ² √(2A/c); (4)

moreover, the associated minimum risk is

R_{n*}(A) = 2cn*. (5)

The value n* in Equation (4) is called the optimal sample size required to generate a point estimate for σ² under Equation (2), while Equation (5) is the minimum risk obtained if σ² were known.
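To make the minimization in Equations (3)-(5) concrete, the following sketch numerically checks that n* = σ²√(2A/c) minimizes the large-n risk 2Aσ⁴/n + cn and that the minimum equals 2cn*. The values of A, c, and σ² are hypothetical.

```python
import math

A, c, sigma2 = 3.0, 0.01, 2.0  # hypothetical constants

def risk(n):
    # Large-n form of Equation (3): R_n(A) ~ 2*A*sigma^4/n + c*n
    return 2.0 * A * sigma2**2 / n + c * n

n_star = sigma2 * math.sqrt(2.0 * A / c)  # Equation (4)

# Risk at n* equals the minimum risk 2*c*n* of Equation (5) ...
assert abs(risk(n_star) - 2.0 * c * n_star) < 1e-9
# ... and is lower than the risk at neighboring sample sizes.
assert risk(n_star) < risk(n_star - 1.0)
assert risk(n_star) < risk(n_star + 1.0)
```

The balance at the optimum (sampling cost cn* equals the estimation-error term 2Aσ⁴/n*) is what makes the minimum risk exactly 2cn*.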

A Unified One Decision Framework
If we want to combine both the confidence interval estimation and the point estimation in one decision framework, we have to take the constant A = (1/2)(a⁴/d⁴)c to perform both confidence and point estimation in one decision rule. Careful investigation of the constant A = (n*/2σ⁴)(cn*) provides the statistical interpretation: cn* is the cost of optimal sampling, while n*/2σ⁴ represents the optimal information; in other words, it is the amount of information required to explore a unit of variance in order to achieve minimum risk. Thus A is the cost of perfect information, contrary to what has been said in the literature, namely that it is the cost of estimation.
Therefore, we proceed to use the following optimal sample size to perform the required inference:

n* = (a/d)² σ² = ξσ², where ξ = a²/d². (6)

Since σ² in Equation (6) is unknown, no fixed-sample-size procedure can estimate the mean µ independently of σ²; see Dantzig [3]. Therefore, we resort to a triple sampling sequential procedure to achieve the previously stated goals. Henceforth, we continue to use the asymptotic sample size defined in Equation (6) to propose the following triple sampling procedure to estimate the unknown population mean µ and the unknown population variance σ² via estimation of n*.
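The sketch below checks, for hypothetical a, d, and c, that the choice A = (1/2)(a⁴/d⁴)c makes the point estimation optimum of Equation (4) coincide with the confidence interval optimum of Equation (1), so a single n* as in Equation (6) serves both problems.

```python
import math

a, d, c = 1.96, 0.5, 0.1   # hypothetical design constants
sigma2 = 4.0               # hypothetical variance

A = 0.5 * (a**4 / d**4) * c           # the unified choice of A
n_ci = (a / d) ** 2 * sigma2          # optimum for the fixed-width interval, Equation (1)
n_pt = sigma2 * math.sqrt(2 * A / c)  # optimum for point estimation, Equation (4)

assert abs(n_ci - n_pt) < 1e-8        # both criteria give the same n*
```

Algebraically, √(2A/c) = a²/d² under this choice of A, so the agreement is exact and holds for any σ².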

Three-Stage Estimation of the Mean and Variance
In his seminal work, Hall [4] introduced the idea of sampling in three stages to tackle several problems in sequential estimation. He combined the asymptotic characteristics of the one-by-one purely sequential sampling procedures of Anscombe [5], Robbins [6], and Chow and Robbins [7] with the operational savings made possible by the group sampling of Stein [8] and Cox [9].
From 1965 until the early 1980s, research in sequential estimation was mainly devoted to two types of sequential sampling procedures: the two-stage procedure, which achieves operational savings, and the one-by-one purely sequential procedure, which achieves asymptotic efficiency. The objective was to use these methods under non-normal distributions. For brevity, see Mukhopadhyay [10], Mukhopadhyay and Hilton [11], Mukhopadhyay and Darmanto [12], Mukhopadhyay and Hamdy [13], Ghosh and Mukhopadhyay [14], Mukhopadhyay and Ekwo [15], Sinha and Mukhopadhyay [16], Zacks [17], and Khan [18]. For a complete list of references, see Ghosh, Mukhopadhyay, and Sen [19].
In the early 1980s, Hall [4,20] considered the normal distribution with an unknown finite mean and an unknown finite variance. His objective was to construct a confidence interval for the mean with a pre-assigned fixed-width and coverage probability. We will describe Hall's three-stage procedure in Section 2.1.
Since the publication of Hall's paper, research in multistage sampling has extended Hall's results in several directions. Some have utilized the triple sampling technique to generate inference for other distributions; others have tried to improve the quality of inference, such as by protecting the inference against type II error probability, studying the operating characteristic curve, and/or discussing the sensitivity of triple sampling when the underlying distribution departs from normality. For more details see Mukhopadhyay [21][22][23], Mukhopadhyay et al. [24], Mukhopadhyay and Mauromoustakos [25], Hamdy and Palotta [26], Hamdy et al. [27], Hamdy [28], Hamdy et al. [29], Lohr [30], Mukhopadhyay and Padmanabhan [31], Takada [32], Hamdy et al. [33], Hamdy [34], Al-Mahmeed and Hamdy [35], Al-Mahmeed et al. [36], Costanzo et al. [37], Yousef et al. [38], Yousef [39], Hamdy et al. [40], and Yousef [41]. Liu [42] used Hall's results to tackle hypothesis-testing problems for the mean of the normal distribution, while Son et al. [43] used the three-stage procedure to tackle the problem of testing hypotheses concerning shifts in the normal population mean with controlled type II error probability.

Three-Stage Sampling Procedure
As the name suggests, inference in triple sampling is performed in three consecutive stages: the pilot phase, the main study phase, and the fine-tuning phase.
The Pilot Phase: In the pilot study phase, we draw a random sample of size m(≥ 2) from the population, say (X₁, . . . , X_m), to initiate the sample measures X̄_m for the population mean µ and S_m for the population standard deviation σ.

The Main Study Phase: In the main study phase, we estimate only a portion γ ∈ (0, 1) of n* to avoid possible oversampling. In the literature, γ is known as the design factor. Let [x] denote the largest integer ≤ x and let ξ be as defined before; then

N₁ = max{m, [γξS²_m] + 1}. (7)

If m ≥ N₁, then we stop at this stage; otherwise, we continue to sample an extra sample of size N₁ − m, say X_{m+1}, X_{m+2}, . . . , X_{N₁}, and then update the sample measures to X̄_{N₁} and S_{N₁} for the population's unknown parameters µ and σ, respectively. Hence, we proceed to define the fine-tuning phase.
The Fine-Tuning Phase: In the fine-tuning phase, the decision to stop or continue sampling is taken according to the following stopping rule:

N = max{N₁, [ξS²_{N₁}] + 1}. (8)

If N₁ ≥ N, then sampling is terminated at this stage; otherwise, we continue to sample an additional sample of size N − N₁, say X_{N₁+1}, X_{N₁+2}, . . . , X_N. Hence, we augment the previously collected N₁ observations with the new N − N₁ to update the sample estimates to X̄_N and S_N for the unknown parameters µ and σ. Upon terminating the sampling process, we propose to estimate the unknown population mean µ by the corresponding triple sampling confidence interval I_N = (X̄_N − d, X̄_N + d) and the unknown population variance σ² by the corresponding triple sampling point estimate S²_N.

The asymptotic results in this paper are developed under Assumption (A) set forward by Hall [20] to develop a theory for the triple sampling procedure. For all n ≥ 2, consider Helmert's transformation of the original normal random variables X₁, . . . , X_n, which allows S²_n to be written as an average of IID random variables for all n ≥ 2. Letting Z_i = (X_i − µ)/σ for i = 1, 2, . . . , n, it follows (Robbins [6]) that S²_n and V̄_n = (n − 1)⁻¹ Σ_{i=2}^{n} V_i are identically distributed for all n ≥ 2, where the V_i are the IID random variables produced by Helmert's transformation. We continue to use the representation V̄_n instead of S²_n for all n ≥ 2 to develop the asymptotic theory for both the main study phase and the fine-tuning phase.
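A minimal sketch of one run of the three-stage procedure under the stopping rules in Equations (7) and (8). The paper's implementation is in Fortran; this Python version and all parameter values (µ, σ, d, m, γ) are illustrative assumptions only.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance S_n^2."""
    xbar = sum(xs) / len(xs)
    return sum((v - xbar) ** 2 for v in xs) / (len(xs) - 1)

def three_stage(mu, sigma, d, a=1.96, m=10, gamma=0.5, rng=None):
    rng = rng or random.Random()
    xi = (a / d) ** 2                       # so that n* = xi * sigma^2, Equation (6)

    # Pilot phase: m observations.
    xs = [rng.gauss(mu, sigma) for _ in range(m)]

    # Main study phase: Equation (7), estimate only a portion gamma of n*.
    n1 = max(m, int(gamma * xi * sample_variance(xs)) + 1)
    xs += [rng.gauss(mu, sigma) for _ in range(n1 - m)]

    # Fine-tuning phase: Equation (8).
    n = max(n1, int(xi * sample_variance(xs)) + 1)
    xs += [rng.gauss(mu, sigma) for _ in range(n - n1)]

    xbar = sum(xs) / len(xs)
    return n, xbar, sample_variance(xs)     # final N, and estimates of mu, sigma^2

N, mean_hat, var_hat = three_stage(mu=10.0, sigma=5.0, d=1.0, rng=random.Random(0))
print(N, mean_hat, var_hat)                 # N should be near n* = 1.96**2 * 25 ≈ 96
```

The interval estimate of µ is then (mean_hat − d, mean_hat + d), and var_hat is the triple sampling point estimate of σ².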

The Asymptotic Characteristics of the Main Study Phase
Under Assumption (A), as ξ → ∞, P(N₁ = [γξS²_m] + 1) → 1, and from Theorem 1 of Yousef et al. [38], as ξ → ∞, we have the following.

Theorem 1. Under Assumption (A) and using Equation (7), we can show, for any real k, as ξ → ∞, a second-order asymptotic expansion of E(N₁^k) around (γn*)^k.

Proof. After further simplifications similar to those given in Hamdy [28], we apply the first two terms of the infinite binomial series and take the expectation; we then expand (N₁ − 1)⁻¹ around γn* and take the expectation,
where ρ is a random variable between N₁ and γn*. It is not hard to show that the remainder term is asymptotically negligible; we have omitted the proof for brevity.
It follows that the stated expansion holds. Likewise, we recall the second term and expand (N₁ − 1)⁻² around γn*. Substituting Equations (11) and (12) into Equation (10), we get the result. The proof is complete. As particular cases of Theorem 1, for k = 1/2, 1, 2, and 3 we have, as ξ → ∞, the corresponding expansions, while from Equation (13) and the results of (ii) and (iii) we obtain the required moments. The following Theorem 2 gives the second-order asymptotic expansion of the moments of a real-valued continuously differentiable function of S²_{N₁}.

Theorem 2.
Under Assumption (A), let g(> 0) be a real-valued continuously differentiable function in a neighborhood of σ² such that sup_{n>m} |g″(n)| = O(|g″(n*)|).

Proof. A Taylor expansion of g(S²_{N₁}) around σ² provides the required representation, where η is a random variable between S²_{N₁} and σ². Now, taking the expectation throughout, the result follows from Equation (13), parts (ii) and (iv), Equation (14), and the assumption that g″(·) is a bounded function. The proof is complete.

Corollary 1.
Under Assumption (A), let g(> 0) be a real-valued continuously differentiable function in a neighborhood of σ such that sup_{n>m} |g″(n)| = O(|g″(n*)|).

Proof. First, by using a Taylor series expansion of the function g(·) around σ and taking the expectation throughout, the result follows from Equation (13), parts (i), (ii), and (iii), and the fact that g″(·) is bounded. The proof is complete.
As a special case of Corollary 1, take g(t) = t⁻¹ and g(t) = t⁻², and we obtain the corresponding expansions. This completes our first assertion regarding the asymptotic characteristics of the main study phase. In the following section, we find the asymptotic characteristics of the final random sample size.

The Asymptotic Characteristics of the Fine-Tuning Phase
The asymptotic characteristics of the random variable N are given in the following theorem.

Theorem 3. Under Assumption (A) and using Equation (8), let h(> 0) be a real-valued continuously differentiable function in a neighborhood of n* such that sup_{n>m} |h″(n)| = O(|h″(n*)|).

Proof. We write N = [ξS²_{N₁}] + 1, except possibly on a set φ = (N₁ < m) ∪ (ξV̄_{N₁} < γξV̄_m + 1) of measure zero. Therefore, for real r, we have the corresponding moment expansion, provided that the rth moment exists, where [x] is as defined before. From Hall [4], as ξ → ∞, the quantity β_{N₁} is asymptotically uniformly distributed. The expansions for r = 1, r = 2, and r = 3 follow in turn. We now turn to the proof of Theorem 3.

First, write h(N) in a Taylor series expansion as

h(N) = h(n*) + (N − n*)h′(n*) + (1/2)(N − n*)²h″(n*) + (1/6)(N − n*)³h‴(ν),

where ν is a random variable between N and n*. By using Equations (16)-(18) we obtain the expansion; the term (1/6)E((N − n*)³h‴(ν)) is negligible since h and its derivatives are bounded. The proof is complete.
Theorem 4. Let N be defined as in Equation (8) and assume Assumption (A) holds. Then the asymptotic characteristics of the fine-tuning phase, as ξ → ∞, are as follows (see Yousef et al. [38]): for any real k, as ξ → ∞, E(N^k) admits a second-order asymptotic expansion.

Consequently, this yields
Consider the first three terms in the expansion and the remainder term R(ξ), where E(R(ξ)) = o(ξ⁻¹). Let us evaluate the second term, σ^{2k}kE(·), where ν is a random variable lying between N and n*. Furthermore, we use the fact that N ≈ ξV̄_{N₁}. Thus the stated expansion follows. The first term in Equation (21), n*⁻¹σ^{2k}kE(·), follows from the first equation in [44].

Three-Stage Coverage Probability of the Mean
Since X̄_N and the events {N = n}, n = m, m + 1, m + 2, . . ., are independent (N is a function of S²_{N₁}, and X̄_N and S²_{N₁} are independent for all n = m, m + 1, m + 2, . . . for the normal distribution), it follows that the coverage probability admits a second-order expansion as ξ → ∞. The quantity (2γ)⁻¹(5 − γ + a²) is known as the cost of ignorance, that is, the cost of not knowing σ² (see Simons [45] for details).

The Asymptotic Regret Incurred in Estimating σ 2
Theorem 5. The risk associated with Equation (2), as m → ∞, admits the stated second-order expansion; moreover, the asymptotic regret follows.

Proof. Recall the squared-error loss function given in Equation (2) and take the expectation throughout. By using Equation (16) and Theorem 4 with k = 1, we obtain the risk, while the asymptotic regret of the triple sampling point estimation of σ² under Equation (2) follows. The proof is complete.
Clearly, for zero cost, we obtain zero regret, while for a nonzero cost we obtain negative regret for all 0 < γ < 1. This means that the triple sampling procedure provides better estimates than the optimal (see Martinsek [46]).

Simulation Results
Since the results are asymptotic, it is worthwhile to record the performance of the estimates under moderate sample sizes. Microsoft Developer Studio software was used to run FORTRAN codes implementing Equations (7) and (8). A series of 50,000 replications were generated from a normal distribution with different values of µ and σ². The optimal sample sizes were chosen to represent small, medium, and large sample sizes, n* = 24, 43, 61, 76, 96, 125, 171, 246, and 500, with γ = 0.5 as recommended by Hall [4,20]. For brevity, we report the case m = 10.

The Mean and the Variance of the Normal Distribution
We estimate the optimal final sample size and its standard error, the mean and its standard error, the coverage probability of the mean, the variance and its standard error, and the asymptotic regret of using the sample variance instead of the population variance. For constructing a fixed-width confidence interval for the mean, we take α = 0.05, so a = 1.96. In each table, we report N̄ as an estimate of n*, S(N̄) as the standard error of N̄, and µ̂ as an estimate of µ with standard error S(µ̂). The estimated coverage probability is 1 − α̂, while the estimated asymptotic regret is ω̂.
The simulation process is performed as follows: fix γ, α, and n* as in Equation (6). First: for the ith sample generated from the normal distribution, take a pilot sample of size m, that is, (X_{1,i}, X_{2,i}, . . . , X_{m,i}).
Second: compute the sample mean X̄_i and the sample variance S²_i. Third: apply Equations (7) and (8) to determine the stopping sample size at this iteration, whether in the first stage or the second stage, say N*_i.

The inverse coefficient of variation is the ratio of the population mean to the population standard deviation, that is, θ = µ/σ, θ ∈ ℝ (no singularity point exists over the entire real line). Assuming a random sample of size n(≥ 2) from the normal distribution is available, we propose to estimate θ by θ̂_n = X̄_n/S_n. It is a dimensionless quantity that makes comparisons across populations with different units of measurement meaningful. In practice, the inverse coefficient of variation equals the signal-to-noise ratio, which measures how much a signal has been corrupted by noise (see McGibney and Smith [47]).
Fourth: record the resultant sample size, the sample mean, the sample standard deviation, and the estimated inverse coefficient of variation (N*_i, X̄*_i, S*_i, θ̂_i) for i = 1, 2, . . . , k, where k = 50,000. Hence, for each experimental combination, we have four vectors of size k, whose averages are, respectively, the estimated mean sample size, the estimated mean of the population mean, the estimated mean of the sample variance, and the estimated mean of the inverse coefficient of variation across replicates, with the corresponding standard errors.

Tables 1 and 2 below show the performance of the estimates under m = 10 and γ = 0.5. Regarding the final random sample size N, we notice that as n* increases, N̄ is always less than n* (early stopping) with increasing standard error, and N̄/n* ≈ 1. As n* increases, µ̂ ≈ µ and σ̂ ≈ σ with decreasing standard errors. Regarding the coverage probability, the three-stage procedure under the rules in Equations (7) and (8) provides coverage probabilities that are always less than the desired nominal value, attaining it only asymptotically. Regarding the estimated asymptotic regret ω̂, we obtain negative regret, which agrees with the result of Theorem 5.
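The four steps above can be condensed into a small Monte Carlo sketch (k is reduced from 50,000 for speed; all parameter values are hypothetical, and the paper's actual implementation is in Fortran):

```python
import random

def sample_variance(xs):
    xbar = sum(xs) / len(xs)
    return sum((v - xbar) ** 2 for v in xs) / (len(xs) - 1)

def simulate(mu=10.0, sigma=5.0, d=1.0, a=1.96, m=10, gamma=0.5, k=2000, seed=1):
    rng = random.Random(seed)
    xi = (a / d) ** 2
    total_n, covered = 0, 0
    for _ in range(k):
        xs = [rng.gauss(mu, sigma) for _ in range(m)]           # pilot sample
        n1 = max(m, int(gamma * xi * sample_variance(xs)) + 1)  # Equation (7)
        xs += [rng.gauss(mu, sigma) for _ in range(n1 - m)]
        n = max(n1, int(xi * sample_variance(xs)) + 1)          # Equation (8)
        xs += [rng.gauss(mu, sigma) for _ in range(n - n1)]
        total_n += n
        covered += abs(sum(xs) / len(xs) - mu) <= d             # mu in (Xbar-d, Xbar+d)?
    return total_n / k, covered / k

n_bar, coverage = simulate()
print(n_bar, coverage)  # compare n_bar with n* = (a/d)^2 * sigma^2 = 96.04
```

Consistent with the tables, the average final sample size tends to fall slightly below n* (early stopping), and the empirical coverage sits just under the nominal 95%.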

The Inverse Coefficient of Variation
As an application, we use the three-stage estimation of both the mean and the variance to estimate the inverse coefficient of variation θ and its standard error S(θ̂), the coverage probability of θ, and the asymptotic regret. To estimate θ, we perform the previous steps, in addition to computing the simulated regret ω̂(A) under the squared-error loss function with linear sampling cost. Table 3 below shows the performance of the procedure for estimating θ. As n* increases, θ̂/θ ≈ 1 with decreasing standard errors. Regarding the coverage probability of θ, we notice that P(|θ̂_N − θ| ≤ d) ≥ 0.95 for all θ ∈ ℝ. This means that the procedure attains exact consistency. Regarding the asymptotic regret, we notice that as n* increases, the regret decreases with negative values. This means that the three-stage procedure does better than the optimal. Table 3. Three-stage estimation of the inverse coefficient of variation under a unified stopping rule: µ = 10, σ = 5, θ = 2; µ = 5, σ = 10, θ = 0.5.
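The point estimate θ̂_N = X̄_N/S_N is computed directly from the final sample; a minimal sketch with hypothetical parameters (µ = 10, σ = 5, so θ = 2, and a fixed stand-in for the random final sample size N):

```python
import math
import random

rng = random.Random(2)
mu, sigma = 10.0, 5.0                # true theta = mu / sigma = 2
n = 200                              # stand-in for the final three-stage size N

xs = [rng.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n
s = math.sqrt(sum((v - xbar) ** 2 for v in xs) / (n - 1))

theta_hat = xbar / s                 # inverse coefficient of variation estimate
print(theta_hat)                     # should be near theta = 2
```

In the full procedure, n here is replaced by the random stopping size N from Equation (8), so θ̂ inherits the three-stage asymptotics of X̄_N and S_N.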

Conclusions
We used a three-stage procedure to tackle the point estimation problem for the variance while estimating the mean by a confidence interval with preassigned width and coverage probability. We used one unified stopping rule for this estimation and applied the results to develop both point and interval estimation for the inverse coefficient of variation. Monte Carlo simulations were performed to investigate the performance of all estimators. We conclude that the estimation of the inverse coefficient of variation through the mean and variance yields better results, with negative regret. For an application in engineering reliability, see Ghosh, Mukhopadhyay, and Sen ([19]; chapter 1, p. 11). For applications to real-world problems, see Mukhopadhyay, Datta, and Chattopadhyay [48].

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.