On the Subrange and Its Application to the R-Chart

The conventional sample range is widely used for the construction of an R-chart. In an R-chart, the sample range estimates the standard deviation, especially in the case of a small sample size. It is well known that the performance of the sample range degrades in the case of a large sample size. In this paper, we investigate the sample subrange as an alternative to the range. This subrange includes the range as a special case. We show that using the subrange can improve the estimation of the standard deviation, especially in the case of a large sample size. Note that the original sample range is biased. Thus, a correction factor is used to make it unbiased. Likewise, the original subrange is also biased. In this paper, we provide the correction factor for the subrange. To compare the sample subranges with different trims to the conventional sample range or the sample standard deviation, we provide the theoretical relative efficiency and its values, which can be used to select the best trim of the subrange in the sense of maximizing the relative efficiency. As a practical guideline, we also provide a simple formula for the best trim amount, which is obtained by the least-squares method. It is worth noting that the breakdown point of the conventional sample range is always zero, while that of the sample subrange increases in proportion to the trim amount. As an application of the proposed method, we illustrate how to incorporate it into the construction of the R-chart.


Introduction
The control chart is a widely used and powerful graphical tool in quality control for measuring, monitoring, and controlling a process over time. Control charts are usually used in pairs. For example, an X-chart monitors the average of the manufacturing process while an R-chart monitors the variation of the process [1]. Generally, there are two phases in constructing control charts [2]. In Phase-I, the goal is to obtain reliable control limits from the process data. Then, in Phase-II, the process is monitored by comparing the statistical properties of future observations to the control limits obtained in Phase-I [3,4]. The performance of the control charts constructed in Phase-I determines the performance of the results in Phase-II. Thus, the data quality in Phase-I plays an important role in statistical process control (SPC). However, for X-R charts, the sample mean and range are susceptible to outliers, a situation also called data contamination. Thus, the conventional control charts may be invalidated in the case of data contamination. To solve this problem, we use a robust estimator to construct control charts in Phase-I.
Robust statistics can provide good performance in the presence of outliers and departures from the model assumption. In robust design, when the collected data are contaminated, robust estimators are employed to reduce or even avoid the influence of outliers on the results [5][6][7][8][9]. In statistical process control, Park et al. [10] proposed the use of robust scale estimators (e.g., the median absolute deviation (MAD) [11] and the Shamos estimator [12]). Lehmann [40] noted that the smaller the variance of an estimator's sampling distribution, the more "efficient" that estimator is. The relative efficiency (RE) value is used to choose the optimal trim amount of the subrange by comparing the variances of the estimators. Firstly, we give the distribution of the sample subrange and provide the unbiasing factors, which depend on the sample size and the number of trims, to make the estimator unbiased. Next, we provide a criterion based on the RE value to choose the optimal number of trims in the sample. Lastly, we consider the breakdown point of the subrange. The inter-quartile range (IQR) is widely used to estimate the standard deviation when the sample has an outlier or extreme values. The proposed subrange includes the IQR as a special case. The subrange is also robust to outliers. The contribution of this article is as follows.
We give the distribution of the sample subrange and provide the unbiasing factor for the subrange of a sample through a Monte Carlo simulation. We assume that the data are normally distributed in order to calculate the unbiasing factor and the distribution of the subrange, because the commonly used control charts rely on the independence and normality assumptions [1,12,41,42].
The RE values are calculated using two different baseline estimators, and the two sets of results agree on which subrange performs best. Comparing the breakdown points of the range and the subrange, we conclude that the subrange has a positive breakdown point of k/n, and that the asymptotic breakdown point of the subrange can be increased to around 1/2 in theory.
In previous studies, researchers paid little attention to the sample range due to its limitations, such as its narrow application and susceptibility to outliers. In this paper, we improve the range by extending its application and improving its breakdown point. Then we use the subrange to construct a control chart for monitoring changes in the process when the data are contaminated. Through a Monte Carlo simulation, we investigate the properties of the subrange. We provide unbiasing factors and relative efficiency values for sample sizes of 50 or less.
In this paper, the sample subrange is proposed as an alternative to the sample range as an estimate of the standard deviation. A correction factor for the subrange is provided in order to obtain unbiasedness. An application of the proposed method is illustrated by incorporating it into the construction of the R-chart. In Section 2, calculations are carried out for a random sample from a standard normal distribution, and then from a non-standard normal distribution. In Section 3, the central limit theorem is used to construct control charts by adopting the subrange. Some discussions about the relative efficiency value and the breakdown point of the subrange are provided in Section 4. Finally, concluding remarks are given in Section 5.

The Distribution of the Subrange
Let $X_1, X_2, \ldots, X_n$ be a random sample with continuous cdf $F(x)$ and pdf $f(x)$. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of the random sample. Then the joint pdf of $X_{(i)}$ and $X_{(j)}$ for $1 \le i < j \le n$ is given by
$$f_{X_{(i)},X_{(j)}}(u,v) = \frac{n!}{(i-1)!\,(j-1-i)!\,(n-j)!}\, f(u)\, f(v)\, [F(u)]^{i-1}\, [F(v)-F(u)]^{j-1-i}\, [1-F(v)]^{n-j}, \quad -\infty < u < v < \infty.$$
For more details on the above result, refer to Casella and Berger [28], and Hogg et al. [30]. We consider the symmetrically trimmed subrange. That is, we exclude the $k$ smallest and $k$ largest values. Thus, the symmetrically trimmed subrange is defined as
$$R_{[k]} = X_{(n-k)} - X_{(k+1)},$$
where $k = 0, 1, 2, \ldots, \lfloor n/2 \rfloor - 1$ and $n \ge 2$. Here $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. Note that $R_{[k]}$ becomes the regular range when $k = 0$.
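As a minimal illustration of the definition above, the following Python sketch computes the symmetrically trimmed subrange of a sample. The function name `subrange` and the example data are assumptions made here for illustration; they are not taken from the authors' R program.

```python
def subrange(sample, k=0):
    """Symmetrically trimmed subrange R[k] = X(n-k) - X(k+1).

    Illustrative helper (name and interface are assumptions).
    k = 0 gives the ordinary range.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need a sample of size n >= 2")
    k_max = n // 2 - 1  # floor(n/2) - 1, the largest admissible trim
    if not 0 <= k <= k_max:
        raise ValueError(f"k must satisfy 0 <= k <= {k_max}")
    xs = sorted(sample)
    # With 0-based indexing, X(k+1) is xs[k] and X(n-k) is xs[n-k-1].
    return xs[n - k - 1] - xs[k]


# Hypothetical data with one gross outlier (25.0).
data = [9.8, 10.4, 10.1, 9.9, 10.0, 10.3, 25.0, 10.2]
print(subrange(data, 0))  # ordinary range, dominated by the outlier
print(subrange(data, 1))  # one observation trimmed from each end
```

With a single outlier in the hypothetical data, the range is driven by the extreme value, whereas the subrange with k = 1 discards it.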
We assume that a random sample is from a normal distribution with mean $\mu$ and standard deviation $\sigma$. It is well known that the range $R$ is a biased estimator of $\sigma$. By dividing $R$ by the unbiasing factor $d_2$, we can easily make it unbiased. The values of $d_2$ are provided in the quality control literature; for more details, refer to Shewhart [32] and Oakland [33].
In this paper, we provide the unbiasing factor which makes $R_{[k]}$ unbiased for $\sigma$. We denote this factor by $d_2(n, k)$. It should be noted that this factor depends on the sample size $n$ and the number of trims $k$ in the sample.
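For notational convenience, we denote the joint pdf of $X_{(k+1)}$ and $X_{(n-k)}$ by $f(u, v)$. Then we have
$$f(u,v) = \frac{n!}{(k!)^2\,(n-2k-2)!}\, f(u)\, f(v)\, [F(u)]^{k}\, [F(v)-F(u)]^{n-2k-2}\, [1-F(v)]^{k}, \quad u < v. \tag{1}$$
Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from a standard normal distribution with pdf $\phi(z)$ and cdf $\Phi(z)$. For notational convenience, we denote $W_1 = Z_{(k+1)}$ and $W_2 = Z_{(n-k)}$. Using (1), we have the joint pdf of $W_1$ and $W_2$, which we denote by $h_k(w_1, w_2)$:
$$h_k(w_1,w_2) = \frac{n!}{(k!)^2\,(n-2k-2)!}\, \phi(w_1)\, \phi(w_2)\, [\Phi(w_1)]^{k}\, [\Phi(w_2)-\Phi(w_1)]^{n-2k-2}\, [1-\Phi(w_2)]^{k}, \quad w_1 < w_2.$$
The goal is to derive the distribution of the subrange, where the $Z_i$ are from the standard normal distribution, $N(0, 1)$. Next, we consider the new random variables given by $Y_1 = W_1$ and $Y_2 = W_2 - W_1$. Notice that the random variable $Y_2$ is the subrange. The inverse transformations are easily obtained as $w_1 = y_1$ and $w_2 = y_1 + y_2$. Then, using the bivariate transformation, the joint pdf of $Y_1$ and $Y_2$, denoted by $g_k(y_1, y_2)$, is given by
$$g_k(y_1,y_2) = h_k(y_1,\, y_1+y_2)\,|J|,$$
where $-\infty < y_1 < \infty$, $y_2 > 0$ and $J$ is the determinant of the Jacobian matrix given by
$$J = \begin{vmatrix} \partial w_1/\partial y_1 & \partial w_1/\partial y_2 \\ \partial w_2/\partial y_1 & \partial w_2/\partial y_2 \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = 1.$$
For more details, see Casella and Berger [28]. Then the pdf of $Y_2 = R_{[k]} = Z_{(n-k)} - Z_{(k+1)}$ is just the marginal pdf of $Y_2$, which is given by
$$g_k(y_2) = \int_{-\infty}^{\infty} g_k(y_1,y_2)\,dy_1 = \frac{n!}{(k!)^2\,(n-2k-2)!} \int_{-\infty}^{\infty} \phi(y_1)\,\phi(y_1+y_2)\,[\Phi(y_1)]^{k}\,[\Phi(y_1+y_2)-\Phi(y_1)]^{n-2k-2}\,[1-\Phi(y_1+y_2)]^{k}\,dy_1, \quad y_2 > 0, \tag{2}$$
where $k = 0, 1, 2, \ldots, \lfloor n/2 \rfloor - 1$ and $n \ge 2$ again.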

The Unbiasing Factors for the Subrange
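Using the pdf of the subrange in (2), we can obtain the $l$-th moment of the subrange by calculating the expectation as follows:
$$E\big[R_{[k]}^{\,l}\big] = \int_0^{\infty} y_2^{\,l}\, g_k(y_2)\, dy_2.$$
Let $X_i$ be a sample from the normal distribution $N(\mu, \sigma^2)$. Since $\sigma$ is a scale parameter, we have $X_i = \sigma Z_i + \mu$, where $Z_i$ is a sample from the standard normal distribution $N(0, 1)$. Hence the subrange of the $X_i$ is $\sigma$ times the subrange of the $Z_i$, so that $E[R_{[k]}] = d_2(n, k)\,\sigma$, where $d_2(n, k)$ is the expected subrange of a standard normal sample of size $n$.
Thus, we can obtain the unbiased scale estimator of $\sigma$ using the subrange, which is given by
$$\hat{\sigma}_{[k]} = \frac{R_{[k]}}{d_2(n,k)}. \tag{4}$$
Note that the unbiasing factor $d_2$ in the quality control literature is obtained by
$$d_2 = d_2(n, 0).$$
Similarly, writing $d_3(n, k)$ for the standard deviation of the subrange of a standard normal sample of size $n$, we have
$$\mathrm{Var}\big(R_{[k]}\big) = d_3(n,k)^2\,\sigma^2. \tag{5}$$
Then, we have $\mathrm{SD}(R_{[k]}) = d_3(n, k)\,\sigma$. Also, the factor $d_3$ in the quality control literature is given by
$$d_3 = d_3(n, 0).$$
In Tables A1 and A2, we provide the values of $d_2(n, k)$ and $d_3(n, k)$, respectively. For brevity, we provide the values for $n = 2, 3, \ldots, 50$ and $k = 0, 1, \ldots, 9$ in the tables. The R language program is provided in the online supplement, available at https://github.com/jin-yuyu/subrange.git (accessed on 1 December 2021).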
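The factors can also be approximated directly from the density of the standard normal subrange. The Python sketch below is one possible approach, not the authors' R program: it integrates the joint density of $(Y_1, Y_2)$ over a grid with the midpoint rule. The function name `subrange_factors`, the integration limits, and the grid size are assumptions chosen for illustration.

```python
import math


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def subrange_factors(n, k, grid=400):
    """Approximate (d2(n, k), d3(n, k)) by midpoint-rule integration.

    Sketch under assumptions: y1 is integrated over [-8, 8] and the
    subrange y2 over (0, 16], which covers essentially all of the
    probability mass for moderate n.
    """
    m = n - 2 * k - 2  # exponent of [Phi(w2) - Phi(w1)]
    if k < 0 or m < 0:
        raise ValueError("need 0 <= k <= floor(n/2) - 1")
    const = math.factorial(n) / (math.factorial(k) ** 2 * math.factorial(m))
    lo, hi, y_max = -8.0, 8.0, 16.0
    h1 = (hi - lo) / grid
    h2 = y_max / grid
    m1 = 0.0  # first moment E[R]
    m2 = 0.0  # second moment E[R^2]
    for j in range(grid):
        y2 = (j + 0.5) * h2
        dens = 0.0  # marginal density of the subrange at y2
        for i in range(grid):
            y1 = lo + (i + 0.5) * h1
            a, b = Phi(y1), Phi(y1 + y2)
            dens += phi(y1) * phi(y1 + y2) * a**k * (b - a) ** m * (1.0 - b) ** k
        dens *= const * h1
        m1 += y2 * dens * h2
        m2 += y2 * y2 * dens * h2
    return m1, math.sqrt(m2 - m1 * m1)


d2, d3 = subrange_factors(5, 0)
print(round(d2, 3), round(d3, 3))
```

For k = 0 the output can be checked against the $d_2$ and $d_3$ constants tabulated in the quality control literature; for k > 0 the values should be compared with Tables A1 and A2.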

The Relative Efficiency of the Subrange
We propose the R-chart based on the subrange. Given this, a natural question is which subrange should be selected. This is essentially the same as asking how to choose $k$. We suggest choosing $k$ based on the RE value. In the statistics literature, the RE is defined as
$$\mathrm{RE}\big(\hat{\theta}_1 \mid \hat{\theta}_0\big) = \frac{\mathrm{Var}(\hat{\theta}_0)}{\mathrm{Var}(\hat{\theta}_1)},$$
where $\hat{\theta}_0$ is often a reference or baseline estimator. For more details, see Serfling [39] and Lehmann [40]. We consider the RE of the unbiased scale estimator $\hat{\sigma}_{[k]}$ using $\hat{\sigma}_{[0]}$ as a baseline estimator, which is given by
$$\mathrm{RE}\big(\hat{\sigma}_{[k]} \mid \hat{\sigma}_{[0]}\big) = \frac{\mathrm{Var}(\hat{\sigma}_{[0]})}{\mathrm{Var}(\hat{\sigma}_{[k]})} = \frac{d_3(n,0)^2 / d_2(n,0)^2}{d_3(n,k)^2 / d_2(n,k)^2}. \tag{7}$$
We calculated the values of $\mathrm{RE}(\hat{\sigma}_{[k]} \mid \hat{\sigma}_{[0]})$ in Table A3 for $n = 2, 3, \ldots, 50$ and $k = 0, 1, \ldots, 9$. It is easily seen that the choice of $k = 0$ gives the best performance for $n = 2, 3, \ldots, 17$, the choice of $k = 1$ does so for $n = 18, 19, \ldots, 31$, and so on. In Table 1, we also summarize the best choice of $k$, which provides the maximum RE value. We also fit a simple regression line for the best $k$ as a function of $n$ using the least-squares method. Thus, when the sample size $n$ is very large, the value of $k$ can be approximately selected from the fitted line. For example, when $n = 500$, we have $k \approx 34$.
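The RE of the subrange estimator relative to the range estimator can also be approximated by simulation. Because each estimator is the subrange divided by its own mean under the standard normal distribution, the RE reduces to a ratio of squared coefficients of variation. The Python sketch below illustrates this idea; the function name `mc_relative_efficiency`, the number of replications, and the seed are assumptions, and the result is a Monte Carlo estimate rather than the tabulated value.

```python
import random
import statistics


def mc_relative_efficiency(n, k, reps=4000, seed=1):
    """Monte Carlo estimate of RE(sigma_hat[k] | sigma_hat[0]).

    Illustrative sketch: Var(R[k] / d2(n, k)) is proportional to the
    squared coefficient of variation of R[k], so d2 and d3 do not have
    to be known in advance.
    """
    rng = random.Random(seed)
    r0, rk = [], []
    for _ in range(reps):
        z = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        r0.append(z[n - 1] - z[0])      # range, k = 0
        rk.append(z[n - k - 1] - z[k])  # subrange with k trims per side
    cv2_range = statistics.variance(r0) / statistics.fmean(r0) ** 2
    cv2_sub = statistics.variance(rk) / statistics.fmean(rk) ** 2
    return cv2_range / cv2_sub


print(mc_relative_efficiency(50, 3))  # above 1: trimming helps for large n
print(mc_relative_efficiency(5, 1))   # below 1: the range is better for small n
```

An estimate above one indicates that the subrange with that trim is more efficient than the range for the given sample size.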
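Using the unbiased estimator $S_n / c_4(n)$ of $\sigma$ as a baseline, we also calculated the RE of the unbiased scale estimator $\hat{\sigma}_{[k]}$. Here, $c_4(n)$ is given by
$$c_4(n) = \sqrt{\frac{2}{n-1}}\; \frac{\Gamma(n/2)}{\Gamma\big((n-1)/2\big)}.$$
It should be noted that $E[S_n^2] = \sigma^2$ and $E[S_n] = c_4(n)\,\sigma$. Thus, we have $\mathrm{Var}(S_n / c_4(n)) = \{1/c_4(n)^2 - 1\}\,\sigma^2$ and
$$\mathrm{RE}\big(\hat{\sigma}_{[k]} \mid S_n / c_4(n)\big) = \frac{\mathrm{Var}(S_n / c_4(n))}{\mathrm{Var}(\hat{\sigma}_{[k]})} = \frac{1/c_4(n)^2 - 1}{d_3(n,k)^2 / d_2(n,k)^2}.$$
We also calculated the values of $\mathrm{RE}(\hat{\sigma}_{[k]} \mid S_n / c_4(n))$ in Table A4 for $n = 2, 3, \ldots, 50$ and $k = 0, 1, \ldots, 9$. It should be noted that $\mathrm{RE}(\hat{\sigma}_{[k]} \mid S_n / c_4(n)) = 100\%$ when $n = 2$. We investigated this specific case in more detail as follows. When $n = 2$, $k$ should be zero. Thus, in this case, it is immediate from (2) that
$$g_0(y_2) = \int_{-\infty}^{\infty} 2\,\phi(y_1)\,\phi(y_1+y_2)\,dy_1 = \frac{e^{-y_2^2/4}}{\sqrt{\pi}} \int_{-\infty}^{\infty} g^*(y_1)\,dy_1, \qquad g^*(y_1) = \frac{1}{\sqrt{\pi}} \exp\!\big\{-(y_1 + y_2/2)^2\big\}.$$
Since $g^*(y_1)$ is the pdf of the normal distribution with mean $-y_2/2$ and standard deviation $1/\sqrt{2}$, we have
$$g_0(y_2) = \frac{1}{\sqrt{\pi}}\, e^{-y_2^2/4}, \quad y_2 > 0,$$
which results in
$$E\big[R_{[0]}\big] = d_2(2,0) = \frac{2}{\sqrt{\pi}} \quad \text{and} \quad E\big[R_{[0]}^2\big] = 2.$$
Then we have
$$\mathrm{Var}\big(\hat{\sigma}_{[0]}\big) = \frac{d_3(2,0)^2}{d_2(2,0)^2}\,\sigma^2 = \frac{2 - 4/\pi}{4/\pi}\,\sigma^2 = \Big(\frac{\pi}{2} - 1\Big)\sigma^2. \tag{8}$$
On the other hand, since $c_4(2) = \sqrt{2/\pi}$, we have
$$\mathrm{Var}\big(S_2 / c_4(2)\big) = \Big\{\frac{1}{c_4(2)^2} - 1\Big\}\sigma^2 = \Big(\frac{\pi}{2} - 1\Big)\sigma^2. \tag{9}$$
It is immediate from (8) and (9) that
$$\mathrm{Var}\big(\hat{\sigma}_{[0]}\big) = \mathrm{Var}\big(S_2 / c_4(2)\big).$$
Thus, as aforementioned, we have $\mathrm{RE}(\hat{\sigma}_{[k]} \mid S_n / c_4(n)) = 100\%$ when $n = 2$ (also $k = 0$ in this case, as mentioned earlier).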
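The n = 2 case can be checked numerically. The Python sketch below evaluates $c_4(n)$ from its gamma-function form and compares the two variance ratios. It assumes the closed forms $d_2(2, 0) = 2/\sqrt{\pi}$ and $d_3(2, 0)^2 = 2 - 4/\pi$, which follow from the half-normal density of the range of two standard normal observations; the function name `c4` is chosen here for illustration.

```python
import math


def c4(n):
    """Unbiasing factor of the sample standard deviation: E[S_n] = c4(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


# Var(S_2 / c4(2)) / sigma^2
var_sd = 1.0 / c4(2) ** 2 - 1.0

# Var(R / d2) / sigma^2 for n = 2, using the assumed closed forms above
d2 = 2.0 / math.sqrt(math.pi)
d3_squared = 2.0 - 4.0 / math.pi
var_range = d3_squared / d2**2

print(var_sd, var_range, var_sd / var_range)
```

Both variance ratios equal $\pi/2 - 1$ up to floating-point rounding, so their ratio, the RE, is one.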

The Breakdown Points
Another way of choosing $k$ is to consider the finite-sample breakdown point, denoted by $\epsilon_n$. This is the maximum proportion of arbitrarily extreme observations that an estimator can tolerate while still producing a reasonable value. For more details, see [6,10]. The finite-sample breakdown point is generally a function of the sample size $n$. The proposed estimator, based on the subrange, has additional merit because of its positive breakdown point. It is clear that $\hat{\sigma}_{[k]}$ has a positive breakdown point of $k/n$ for $k > 0$, whereas $\hat{\sigma}_{[0]}$ has a breakdown point of zero. Thus, we can choose $k$ based on the value $k/n$.
The maximum attainable value of the asymptotic breakdown point is obtained as follows. Since the maximum value of $k$ is $\lfloor n/2 \rfloor - 1$, the maximum finite-sample breakdown point available is given by
$$\epsilon_n = \frac{\lfloor n/2 \rfloor - 1}{n}.$$
Since $\lfloor n/2 \rfloor$ can be expressed as $\lfloor n/2 \rfloor = n/2 - \delta$ with $0 \le \delta < 1$, we have
$$\epsilon_n = \frac{n/2 - \delta - 1}{n} = \frac{1}{2} - \frac{1 + \delta}{n}.$$
Thus, taking the limit of $\epsilon_n$ as $n \to \infty$, we have the asymptotic breakdown point
$$\lim_{n \to \infty} \epsilon_n = \frac{1}{2},$$
which is the maximum attainable asymptotic breakdown point. It is likely that the RE value tends to decrease as the finite-sample breakdown point increases. Thus, when we choose $k$, we need to consider both the RE value and the finite-sample breakdown point.
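The convergence of the maximum finite-sample breakdown point to 1/2 can be seen numerically with a short Python sketch. The function names `breakdown_point` and `max_breakdown_point` are assumptions introduced for illustration.

```python
def breakdown_point(n, k):
    """Finite-sample breakdown point k/n of the subrange estimator with k trims."""
    if not 0 <= k <= n // 2 - 1:
        raise ValueError("need 0 <= k <= floor(n/2) - 1")
    return k / n


def max_breakdown_point(n):
    """Largest attainable value, reached at k = floor(n/2) - 1."""
    return breakdown_point(n, n // 2 - 1)


for n in (4, 10, 50, 1000):
    print(n, max_breakdown_point(n))
```

The printed values increase toward 1/2 as n grows, in line with the limit above, while `breakdown_point(n, 0)` is zero for the ordinary range.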

The Construction of Control Charts
In this section, we describe how to incorporate the proposed methods into the construction of control charts. We use the proposed subrange as an alternative to the range and the standard deviation when there are outliers in the Phase-I data. We present the X-chart first, and then the R-chart.
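For the case of the X-chart, it is immediate from the central limit theorem that we have
$$\frac{\bar{X} - \mu}{\mathrm{SE}(\bar{X})} \overset{\text{approx.}}{\sim} N(0, 1),$$
where $\bar{X}$ is the sample mean from a sample of size $n$, and $\mathrm{SE}(\bar{X}) = \sigma/\sqrt{n}$ is its standard error. Solving $-3 \le (\bar{X} - \mu)/(\sigma/\sqrt{n}) \le 3$ for $\bar{X}$, we can construct the CL ± 3 · SE control limits.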
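Then we have $\mathrm{UCL} = \mu + 3\sigma/\sqrt{n}$, $\mathrm{CL} = \mu$, and $\mathrm{LCL} = \mu - 3\sigma/\sqrt{n}$. Since $\mu$ and $\sigma$ are unknown in practice, we need to estimate them. Suppose that there are $m$ samples of size $n$. The most widely used estimator of $\mu$ is $\bar{\bar{X}}$, where
$$\bar{\bar{X}} = \frac{1}{m} \sum_{i=1}^{m} \bar{X}_i$$
and $\bar{X}_i$ is the mean of the $i$th sample. The scale $\sigma$ can be estimated by using the subrange. Let $R_{[k],i}$ be the subrange with $k$ trims from the $i$th sample. Then, as shown in (4), $R_{[k],i}/d_2(n, k)$ is an unbiased scale estimator of $\sigma$. Thus, it is quite reasonable to use $\bar{R}_{[k]}/d_2(n, k)$ for the scale estimator, where
$$\bar{R}_{[k]} = \frac{1}{m} \sum_{i=1}^{m} R_{[k],i}.$$
With the estimators of $\mu$ and $\sigma$, we have the following control limits:
$$\mathrm{UCL} = \bar{\bar{X}} + \frac{3\,\bar{R}_{[k]}}{d_2(n,k)\sqrt{n}}, \qquad \mathrm{CL} = \bar{\bar{X}}, \qquad \mathrm{LCL} = \bar{\bar{X}} - \frac{3\,\bar{R}_{[k]}}{d_2(n,k)\sqrt{n}}.$$
It should be noted that when $k = 0$, we have $\bar{R}_{[k]} = \bar{R}$ and $d_2(n, k) = d_2$. The above control limits then become
$$\mathrm{UCL} = \bar{\bar{X}} + \frac{3\,\bar{R}}{d_2\sqrt{n}}, \qquad \mathrm{CL} = \bar{\bar{X}}, \qquad \mathrm{LCL} = \bar{\bar{X}} - \frac{3\,\bar{R}}{d_2\sqrt{n}},$$
which is the traditional X-chart provided in the quality control literature.
For the case of the R-chart, we briefly review the conventional R-chart and propose a new $R_{[l]}$-chart. Solving
$$-3 \le \frac{R - E(R)}{\mathrm{SE}(R)} \le 3$$
for $R$, we can construct the CL ± 3 · SE control limits given by $E(R) \pm 3\,\mathrm{SE}(R)$. Then we have $\mathrm{UCL} = d_2\sigma + 3d_3\sigma$, $\mathrm{CL} = d_2\sigma$ and $\mathrm{LCL} = d_2\sigma - 3d_3\sigma$. Analogous to the above construction, we can calculate the $R_{[l]}$-chart with the limits given by
$$\mathrm{UCL} = d_2(n,l)\,\sigma + 3\,d_3(n,l)\,\sigma, \qquad \mathrm{CL} = d_2(n,l)\,\sigma, \qquad \mathrm{LCL} = d_2(n,l)\,\sigma - 3\,d_3(n,l)\,\sigma.$$
In practice, the scale $\sigma$ is unknown. Thus, we need to estimate $\sigma$ using $\bar{R}_{[k]}/d_2(n, k)$, as above. We then have
$$\mathrm{UCL} = \big\{d_2(n,l) + 3\,d_3(n,l)\big\}\frac{\bar{R}_{[k]}}{d_2(n,k)}, \qquad \mathrm{CL} = d_2(n,l)\,\frac{\bar{R}_{[k]}}{d_2(n,k)}, \qquad \mathrm{LCL} = \big\{d_2(n,l) - 3\,d_3(n,l)\big\}\frac{\bar{R}_{[k]}}{d_2(n,k)}.$$
When $l = 0$ and $k = 0$, we have the following limits, which are essentially the same as those of the conventional R-chart:
$$\mathrm{UCL} = \Big(1 + \frac{3\,d_3}{d_2}\Big)\bar{R}, \qquad \mathrm{CL} = \bar{R}, \qquad \mathrm{LCL} = \Big(1 - \frac{3\,d_3}{d_2}\Big)\bar{R}.$$
Also, if we select $k = l$, we can simplify the above $R_{[l]}$-chart to
$$\mathrm{UCL} = \Big\{1 + \frac{3\,d_3(n,k)}{d_2(n,k)}\Big\}\bar{R}_{[k]}, \qquad \mathrm{CL} = \bar{R}_{[k]}, \qquad \mathrm{LCL} = \Big\{1 - \frac{3\,d_3(n,k)}{d_2(n,k)}\Big\}\bar{R}_{[k]}.$$
Then a natural question is how to choose $k$ for the $R_{[l]}$-chart. The following proposition shows how to select $k$ for the best performance.
Proposition 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with scale $\sigma$. We then have
$$\mathrm{Var}\!\left(\frac{R_{[k]}}{d_2(n,k)}\right) \ge \mathrm{Var}\!\left(\frac{R_{[k^*]}}{d_2(n,k^*)}\right),$$
where $k^*$ maximizes the RE value in (7), for $k = 0, 1, \ldots, \lfloor n/2 \rfloor - 1$.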

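Proof of Proposition 1. It is immediate from (5) that we can obtain
$$\mathrm{Var}\!\left(\frac{R_{[k]}}{d_2(n,k)}\right) = \frac{d_3(n,k)^2}{d_2(n,k)^2}\,\sigma^2.$$
Using the above, $\mathrm{Var}\big(R_{[k]}/d_2(n, k)\big) / \mathrm{Var}\big(R_{[k^*]}/d_2(n, k^*)\big)$ can be formulated as
$$\frac{\mathrm{Var}\big(R_{[k]}/d_2(n,k)\big)}{\mathrm{Var}\big(R_{[k^*]}/d_2(n,k^*)\big)} = \frac{d_3(n,k)^2 / d_2(n,k)^2}{d_3(n,k^*)^2 / d_2(n,k^*)^2} = \frac{\mathrm{RE}\big(\hat{\sigma}_{[k^*]} \mid \hat{\sigma}_{[0]}\big)}{\mathrm{RE}\big(\hat{\sigma}_{[k]} \mid \hat{\sigma}_{[0]}\big)}.$$
It is easily seen from (7) that $k^*$ provides the maximum RE value, so that we have
$$\frac{\mathrm{RE}\big(\hat{\sigma}_{[k^*]} \mid \hat{\sigma}_{[0]}\big)}{\mathrm{RE}\big(\hat{\sigma}_{[k]} \mid \hat{\sigma}_{[0]}\big)} \ge 1.$$
Thus, we get $\mathrm{Var}\big(R_{[k]}/d_2(n, k)\big) \ge \mathrm{Var}\big(R_{[k^*]}/d_2(n, k^*)\big)$, which completes the proof.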
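The control limits of this section, for the case k = l, can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than the authors' R program: the function names `subrange` and `chart_limits` are hypothetical, the factors d2(n, k) and d3(n, k) are supplied by the caller (for example from Tables A1 and A2), the Phase-I data are made up, and the lower R limit is returned without truncation at zero.

```python
import math
from statistics import fmean


def subrange(sample, k):
    """Symmetrically trimmed subrange R[k] = X(n-k) - X(k+1)."""
    xs = sorted(sample)
    n = len(xs)
    if not 0 <= k <= n // 2 - 1:
        raise ValueError("need 0 <= k <= floor(n/2) - 1")
    return xs[n - k - 1] - xs[k]


def chart_limits(samples, k, d2_nk, d3_nk):
    """Return ((LCL, CL, UCL) for the X-chart, (LCL, CL, UCL) for the R[k]-chart).

    d2_nk and d3_nk are the factors d2(n, k) and d3(n, k); they are
    inputs here, not computed by this sketch.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    grand_mean = fmean(fmean(s) for s in samples)
    r_bar = fmean(subrange(s, k) for s in samples)
    sigma_hat = r_bar / d2_nk  # unbiased scale estimate
    half_width = 3.0 * sigma_hat / math.sqrt(n)
    x_chart = (grand_mean - half_width, grand_mean, grand_mean + half_width)
    # Raw limits; the lower limit may be negative because it is not truncated.
    r_chart = (
        (d2_nk - 3.0 * d3_nk) * sigma_hat,
        r_bar,
        (d2_nk + 3.0 * d3_nk) * sigma_hat,
    )
    return x_chart, r_chart


# Hypothetical Phase-I data: m = 2 samples of size n = 5.
samples = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
# k = 0 with the standard constants d2 = 2.326 and d3 = 0.864 for n = 5.
x_limits, r_limits = chart_limits(samples, 0, 2.326, 0.864)
print(x_limits)
print(r_limits)
```

With k = 0 this reduces to the conventional X and R charts; for k > 0 the caller would pass the corresponding subrange factors.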

Discussion
As mentioned above, the choice of the trim k* is based on the relative efficiency under different sample sizes. From Table 1, we can see that when the sample size is 18 ≤ n ≤ 31, the trim k* = 1 is the best choice. This means that, with the trim k* set to 1, the subrange is robust to a single outlier in the data. Given this, the subrange performs better than the traditional range. However, we can also see that when the sample size is 18 ≤ n ≤ 31, the largest breakdown point is 1/18 ≈ 5.6%. This breakdown point of the subrange is much smaller than that of the MAD. The proposed subrange is very useful for constructing a control chart when the data are contaminated in Phase-I. Although the breakdown point of the subrange is not as high as desired, the subrange is still robust under data contamination.

Conclusions
In this paper, we proposed a method of estimating the scale parameter using the subrange. Using this, we can construct the X and R charts, which are widely used in the manufacturing process. By using the proposed control charts based on the subrange, we can gain statistical efficiency along with a robustness property.
We also provided a method of choosing the trim amount of the subrange, in the sense of gaining more efficiency compared to the conventional range. The R program used in the paper is available online at https://github.com/jin-yuyu/subrange.git (accessed on 1 December 2021). The simulation results are shown in Appendix A. Tables A1 and A2 give the unbiasing factors for sample sizes of 50 or less. Tables A3 and A4 give the relative efficiency values for sample sizes of 50 or less.
Note that the subrange has a positive breakdown point, while the breakdown point of the conventional range is always zero. The subrange proposed in this paper is robust. It can also be used when the sample size is larger (i.e., greater than 25).
However, for the X-chart, the centerline is the sample mean, which is also sensitive to outliers. For future research, we suggest using a robust location estimator (e.g., the median or the Hodges-Lehmann estimator [43]) to construct the X-chart. As discussed in the last section, selecting suitable criteria for improving the breakdown point may be an interesting topic.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.