Abstract
In this paper, we propose control limits for monitoring the mean of a process variable based on first and second order Cornish–Fisher expansions, which incorporate its skewness and both its skewness and kurtosis measures, respectively. These limits are shown to have better in-control error performance than other limits that were similarly derived from this expansion with smoothing functions, both when these measures are assumed to be known and when they are estimated from sample data. The range of measure specifications over which the underlying Cornish–Fisher function is monotonic is derived. Operating characteristic curves for select cases demonstrate the associated out-of-control error performance. The Cornish–Fisher limits are applied to a real-life dataset in developing a control chart for monitoring the mean lifetime of car brake pads, wherein they are compared to other limit approximations.
1. Introduction
The classic Shewhart control chart on the mean of a process specifies control limits as the in-control mean plus/minus a multiplier of the standard deviation of the distribution of the sample mean. The skewness and excess kurtosis are omitted from this specification, and are assumed to vanish by the central limit theorem when determining the type-1 and type-2 error rates of these limits. In the case where these measures deviate greatly from zero in the population distribution, however, convergence to the Gaussian distribution may not be reasonably assumed for small subgroup sizes. Indeed, there are many examples of real-world processes in healthcare, tool wear, and human performance that follow Gamma, lognormal, and other non-Gaussian distributions with a high degree of skewness and kurtosis, for which Shewhart limits may deviate substantially from their assumed error rates [1,2,3].
One method to handle such cases is to use a moment-based expansion of the sampling distribution of the mean that includes estimates of the population skewness, or of both skewness and kurtosis. In particular, a Cornish–Fisher (CF) expansion may be used to approximate the corresponding quantile function; it is inclusive of the skewness when truncated at the first order and of both skewness and kurtosis when truncated at the second order. Building on this, reference [4] utilized a first order CF expansion (CF-1) in specifying control limits that incorporated the skewness measure, referred to as skewness correction (SC) limits, and reference [5] utilized a second order CF expansion (CF-2) in specifying limits that incorporated a kurtosis, but not skewness, measure, referred to as kurtosis correction (KC) limits. While these showed improvement over Shewhart and other heuristic limits in select cases, they are specified with a smoothing function that increasingly negates the contribution of the estimated skewness and kurtosis measures to the approximated limits as these measures increase in absolute value. This smoothing function gets around the inherent bounding of the skewness and kurtosis in the CF expansion, outside of which the approximated quantile function may not be monotonically increasing. These bounds are referred to as the domain of validity [6], and consist of a one-dimensional range of skewness values for a CF-1 expansion and a two-dimensional range of skewness and kurtosis values for a CF-2 expansion. However, it was observed that the domains of validity of both the CF-1 and CF-2 expansions are wide enough to capture most practical applications of statistical process control.
Indeed, a study by [7] found that the vast majority of the real-world data sets studied had skewness and kurtosis values in the population distribution that would fall within the domain of validity of the CF-1 and CF-2 expansions when considering common subgroup sizes. Furthermore, the CF expansions continue to exist outside the domain of validity, albeit with likely diminished accuracy.
Other approaches to this problem include control charts that are based on other parametric approximations or on a non-parametric method. Weighted standard deviation (WSD) [8] and improved exponentially weighted moving average (IEWMA) [9] are two common parametric-based charts. The WSD method modifies Shewhart limits by using separate standard deviations for the LCL and the UCL, where these are weighted by the probability that the sample will be less than or greater than the mean, respectively. It was demonstrated in [4] that the SC method based on Cornish–Fisher approximations outperformed the WSD method for every skewed distribution considered, for both known and estimated skewness. The IEWMA method selects a smoothing function and weights a priori to determine the effects of skewness on the control limits. Studies by [10,11] have shown that the IEWMA method overestimates the average run length (ARL) significantly when the mean and variance are estimated from data, and can be highly sensitive to the assigned smoothing function and weights. Non-parametric control charts monitor a process median through order statistics or ranks of collected data, rather than directly monitoring the process mean. A sampling of such charts includes [12], and a review of these may be found in [13]. Non-parametric charts by definition require no distributional assumption, yet likewise do not incorporate information about the skewness and kurtosis measures into the specification of the control limits. An example non-parametric chart that produces a Shewhart type control chart is the mean-rank chart proposed by [14]. The mean-rank chart utilizes the well-known Kruskal–Wallis nonparametric test [15] and, by the central limit theorem, converges to control limits set by the standard normal distribution as the sample size increases. Two other, closely related non-parametric Shewhart type control charts are the sign [16] and the signed-rank [17] control charts.
Although all three non-parametric methods perform well for small shifts in the process variable, their accuracy comes at the cost of larger sample sizes.
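As a rough illustration of the WSD idea described above, the sketch below uses the form in which these limits are commonly stated for the chart of [8]: the standard error is scaled by the square root of 2P for the UCL and of 2(1 - P) for the LCL, where P is the probability that an observation falls at or below the process mean. The function name `wsd_limits` and its parameters are assumptions made for this example, not notation from the source.

```python
import math


def wsd_limits(mu, sigma, n, p_below_mean, k=3.0):
    """Sketch of weighted-standard-deviation style limits for a subgroup mean.

    mu, sigma    : in-control process mean and standard deviation
    n            : subgroup size
    p_below_mean : P(X <= mu) for the process variable (assumed known here)
    k            : sigma multiplier, 3 by convention
    """
    if not 0.0 < p_below_mean < 1.0:
        raise ValueError("p_below_mean must lie strictly between 0 and 1")
    se = sigma / math.sqrt(n)  # standard error of the subgroup mean
    ucl = mu + k * se * math.sqrt(2.0 * p_below_mean)
    lcl = mu - k * se * math.sqrt(2.0 * (1.0 - p_below_mean))
    return lcl, ucl
```

For a symmetric distribution P equals 0.5 and the limits reduce to the Shewhart limits; for a right-skewed distribution P exceeds 0.5, which places the UCL farther from the mean than the LCL.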
In this paper, we present control limits for an $\bar{X}$ chart that are derived directly from the CF-1 and CF-2 expansions without smoothing functions, thereby fully incorporating estimates of the skewness, and of both the skewness and excess kurtosis, respectively, into the approximated control limits. These are referred to as CF-1 and CF-2 limits in this paper, and are an alternative to the SC and KC limits that are also based on a Cornish–Fisher expansion. These limits will be numerically tested across a range of subgroup sizes and asymmetrical population distributions, wherein they will be compared in their ability to meet a target in-control average run length (ARL). This testing will cover both the case in which the skewness and kurtosis measures are assumed to be known and the case in which they are estimated from data. Additionally, the out-of-control performance of the CF-2 chart will be assessed for the cases considered. Following this, an $\bar{X}$ chart with the aforementioned limit approximations will be applied to a real-world data set for comparison purposes.
2. CF-1 Limits Inclusive of Skewness
It was shown by [18] that for a subgroup of size n, the random variable at the pth quantile, , can be estimated by inverting an Edgeworth expansion through the jth order, yielding a standard normal quantile function, , as the argument to polynomial functions of higher order cumulants [19]. When truncating at the first order, the CF-1 expansion given by
where
is inclusive of the skewness () measure. Letting , as is common for process monitoring, it follows that
whereupon substitution into Equation (1) yields the following CF-1 control limits
As a matter of comparison, the SC limits of [4] are given by
Note that the contribution of the skewness measure to the SC control limits vanishes as the skewness increases in absolute value, so that these limits tend to the Gaussian/Shewhart control limits.
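For orientation, the first order expansion can be sketched in its standard textbook form. The notation here is assumed rather than taken from the source: $\mu$, $\sigma$, and $\gamma$ denote the population mean, standard deviation, and skewness, $z_p$ the standard normal quantile, and the customary three-sigma quantiles $z_p = \pm 3$ are used for the limits.

```latex
\begin{align*}
% First order Cornish--Fisher quantile of the subgroup mean
\bar{x}_p &\approx \mu_{\bar{X}} + \sigma_{\bar{X}}
  \left[ z_p + \frac{\gamma_{\bar{X}}}{6}\left(z_p^2 - 1\right) \right],
& \mu_{\bar{X}} &= \mu, \quad
  \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}, \quad
  \gamma_{\bar{X}} = \frac{\gamma}{\sqrt{n}} \\
% At z_p = \pm 3 the factor z_p^2 - 1 equals 8, so the skewness term is (4/3)\gamma_{\bar{X}}
\mathrm{UCL} &\approx \mu + \frac{\sigma}{\sqrt{n}}
  \left( 3 + \frac{4\gamma}{3\sqrt{n}} \right),
& \mathrm{LCL} &\approx \mu + \frac{\sigma}{\sqrt{n}}
  \left( -3 + \frac{4\gamma}{3\sqrt{n}} \right)
\end{align*}
```

Under these assumptions the skewness shifts both limits in the same direction by the same amount, in contrast to the symmetric Shewhart limits.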
The domain of validity for the CF-1 expansion of Equation (1) is that where the first derivative is non-negative, given by
from whence
Substituting Equation (2) yields the following domain of validity
for the skewness measure encompassed within the control limits.
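A minimal sketch of how such limits and a monotonicity check might be computed is given below, assuming the standard first order expansion with three-sigma quantiles. The function names are hypothetical, and the check is one possible reading of the domain-of-validity condition: because the derivative of the expansion is linear in z, it is non-negative over the interval from -z to z exactly when it is non-negative at both endpoints.

```python
import math


def cf1_limits(mu, sigma, skew, n, z=3.0):
    """First order Cornish-Fisher style limits for the subgroup mean (sketch).

    skew is the population skewness; the skewness of the mean is skew / sqrt(n).
    """
    se = sigma / math.sqrt(n)
    skew_mean = skew / math.sqrt(n)
    shift = skew_mean * (z * z - 1.0) / 6.0  # same shift for both limits
    return mu + se * (-z + shift), mu + se * (z + shift)


def cf1_is_monotone(skew, n, z=3.0):
    """True if d/dz [z + g (z^2 - 1) / 6] = 1 + g z / 3 >= 0 on [-z, z]."""
    skew_mean = skew / math.sqrt(n)
    return abs(skew_mean) * z / 3.0 <= 1.0
```

With zero skewness the function returns the Shewhart limits, which gives a quick sanity check on the implementation.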
3. CF-2 Limits Inclusive of Skewness and Kurtosis
When truncating at the second order, the CF-2 expansion given by
where
is inclusive of both the skewness () and excess kurtosis () measures. Letting , the CF-2 limits follow from Equation (6) as
As a matter of comparison, the KC limits of [5] are given by
Note that in the KC limits of Equation (8), skewness is omitted and the effect of kurtosis tends to one as the kurtosis increases in absolute value.
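The second order limits can be sketched in the same way. The code below assumes the standard second order Cornish–Fisher polynomial and three-sigma quantiles; `cf2_quantile` and `cf2_limits` are hypothetical names, and the scaling of the population skewness by the square root of n and of the excess kurtosis by n follows from the cumulants of a mean of independent observations.

```python
import math


def cf2_quantile(z, skew, exkurt):
    """Standard second order Cornish-Fisher adjustment of a normal quantile z."""
    return (
        z
        + skew * (z**2 - 1.0) / 6.0
        + exkurt * (z**3 - 3.0 * z) / 24.0
        - skew**2 * (2.0 * z**3 - 5.0 * z) / 36.0
    )


def cf2_limits(mu, sigma, skew, exkurt, n, z=3.0):
    """Second order limits for the subgroup mean from population measures (sketch)."""
    se = sigma / math.sqrt(n)
    skew_mean = skew / math.sqrt(n)  # skewness of the subgroup mean
    exkurt_mean = exkurt / n         # excess kurtosis of the subgroup mean
    lcl = mu + se * cf2_quantile(-z, skew_mean, exkurt_mean)
    ucl = mu + se * cf2_quantile(z, skew_mean, exkurt_mean)
    return lcl, ucl
```

As an example of use, a population with unit mean and variance, skewness 2, and excess kurtosis 6 (the measures of a unit exponential) with subgroups of size 4 gives limits of roughly 0.146 and 3.188.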
The domain of validity for the CF-2 expansion of (6) is found as
which will hold if the corresponding discriminant is negative. Hence, it follows that
for all quantiles p, from whence the two-dimensional region of validity for the skewness and kurtosis measures of the distribution of the mean is depicted in Figure 1, with maximal values given by and .
Figure 1.
Domain of validity for skewness and kurtosis values for the second order Cornish–Fisher expansion.
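The discriminant condition can be illustrated with a short check. This is a sketch for the standard second order polynomial only, requiring monotonicity over all z; the source's own condition is not reproduced here and may be less restrictive, for example if monotonicity is required only over the quantiles of interest. The function name `cf2_in_domain` is an assumption, and the arguments are the skewness and excess kurtosis of the distribution of the mean.

```python
def cf2_in_domain(skew, exkurt):
    """True if the standard second order CF polynomial is non-decreasing for all z.

    The derivative with respect to z is the quadratic a z^2 + b z + c with
      a = exkurt/8 - skew^2/6,  b = skew/3,  c = 1 - exkurt/8 + 5 skew^2/36.
    """
    a = exkurt / 8.0 - skew**2 / 6.0
    b = skew / 3.0
    c = 1.0 - exkurt / 8.0 + 5.0 * skew**2 / 36.0
    if a == 0.0 and b == 0.0:
        return c >= 0.0  # derivative is constant
    # A quadratic is non-negative everywhere only if it opens upward
    # and its discriminant is not positive.
    return a > 0.0 and b * b - 4.0 * a * c <= 0.0
```

Under this condition, zero skewness admits excess kurtosis values between 0 and 8, and no excess kurtosis value is admissible once the absolute skewness exceeds about 2.49.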
4. Numerical Analysis
The accuracy of the Cornish–Fisher limits was tested in terms of their ability to meet the specified type-1 error rate $\alpha$, which is expressed for convenience as the in-control average run length (ARL) given by
$$\mathrm{ARL} = \frac{1}{\alpha}.$$
Since the population distribution is assumed to be asymmetric, these will be measured separately for the approximated upper and lower limits, for which the specified type-1 error rate above the UCL (ARL↑) and below the LCL (ARL↓) will be set to the commonly assumed value of 0.00135, and hence the exact ARL is given by
$$\mathrm{ARL}_{\uparrow} = \mathrm{ARL}_{\downarrow} = \frac{1}{0.00135} \approx 740.74. \tag{11}$$
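The target value can be checked numerically. The short sketch below, with the hypothetical helper `in_control_arl`, converts a one-sided signal probability into an average run length and compares the assumed rate of 0.00135 with the Gaussian tail probability beyond three standard errors.

```python
from statistics import NormalDist


def in_control_arl(tail_prob):
    """Average run length for a given per-subgroup one-sided signal probability."""
    if not 0.0 < tail_prob <= 1.0:
        raise ValueError("tail_prob must lie in (0, 1]")
    return 1.0 / tail_prob


# One-sided Gaussian tail probability beyond three standard errors.
gaussian_tail = 1.0 - NormalDist().cdf(3.0)

# Target in-control ARL for each limit at the conventional rate.
target_arl = in_control_arl(0.00135)
```

The run length is the reciprocal of the signal probability because, with independent subgroups, the number of subgroups until the first signal is geometrically distributed.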
The process variable in this numerical study is assumed to follow a Gamma probability distribution, yet it should be noted that the control limits are specified through the moments of the random variable and not through its distribution. The Gamma distribution was selected for this study due to its ability both to model a broad range of asymmetric shapes and to yield a closed form expression for the distribution of the subgroup mean, which is itself Gamma distributed, from whence exact ARL values for the approximated limits may be obtained. It follows that the accuracy of the approximated limits may be compared to the exact values of Equation (11) when the population mean, variance, skewness, and kurtosis measures are either assumed to be known or estimated from sampled data, thus evaluating both theoretical and practical accuracy.
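To illustrate how exact ARL values might be obtained for a given pair of limits, the sketch below assumes a Gamma process variable whose shape parameter times the subgroup size is a positive integer, so that the distribution of the subgroup mean has a finite-sum (Erlang) tail probability and only the standard library is needed. The names `erlang_sf` and `exact_arls` are hypothetical.

```python
import math


def erlang_sf(x, shape, scale):
    """P(X > x) for a Gamma variable with positive integer shape."""
    if x <= 0.0:
        return 1.0
    t = x / scale
    term = 1.0   # t^0 / 0!
    total = 1.0
    for j in range(1, shape):
        term *= t / j          # builds t^j / j! incrementally
        total += term
    return math.exp(-t) * total


def exact_arls(lcl, ucl, shape, scale, n):
    """One-sided ARLs (below LCL, above UCL) for the mean of n Gamma variables.

    The mean of n independent Gamma(shape, scale) variables is
    Gamma(n * shape, scale / n); n * shape must be a positive integer here.
    """
    k = shape * n
    if k < 1 or k != int(k):
        raise ValueError("shape * n must be a positive integer in this sketch")
    k, theta = int(k), scale / n
    p_below = 1.0 - erlang_sf(lcl, k, theta)
    p_above = erlang_sf(ucl, k, theta)
    arl_down = math.inf if p_below <= 0.0 else 1.0 / p_below
    arl_up = math.inf if p_above <= 0.0 else 1.0 / p_above
    return arl_down, arl_up
```

Non-integer shapes would require a regularized incomplete gamma function from a numerical library, which is omitted here to keep the example self-contained.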
4.1. Known Skewness and Kurtosis Measures
The theoretical accuracy of the CF-1 and CF-2 limits of Equations (3) and (7) for subgroup sizes of was compared to exact, Gaussian/Shewhart, and the SC and KC limits of Equations (4) and (8) for a Gamma distributed process variable with and , as depicted in Figure 2.
Figure 2.
Gamma probability distributions of varied shape and skew used in the performance analysis, representing the range of many control chart applications.
As shown in Figure 2, the Gamma distribution can assume a wide array of the skewness and kurtosis values present in datasets. Note that a Gamma distributed process variable with unit shape parameter is an exponential distribution, and the distributions of the mean of both Chi-square and Weibull distributions are special cases of the Gamma distribution. In addition to the control limits generated from a Cornish–Fisher expansion, those generated by the weighted standard deviation (WSD) method were likewise considered in this analysis.
The known summary measures of a distribution are given by
Note that the skewness and kurtosis measures in the equations of (12) all fall within the domain of validity for both the CF-1 and CF-2 approximations as per Equations (5) and (10) for the specifications of n, , and considered in this study. The corresponding control limits and in-control Average Run Lengths are given in Table 1.
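For reference, the standard summary measures of a Gamma distribution and of the mean of n independent observations from it are sketched below. The symbols $k$ (shape) and $\theta$ (scale) are assumed for this illustration and need not match the source's notation; $\gamma$ denotes skewness and $\kappa$ excess kurtosis.

```latex
\begin{align*}
% Population measures of a Gamma(k, \theta) variable
\mu &= k\theta, &
\sigma^2 &= k\theta^2, &
\gamma &= \frac{2}{\sqrt{k}}, &
\kappa &= \frac{6}{k}, \\
% Measures of the subgroup mean, which is Gamma(nk, \theta/n)
\mu_{\bar{X}} &= k\theta, &
\sigma^2_{\bar{X}} &= \frac{k\theta^2}{n}, &
\gamma_{\bar{X}} &= \frac{2}{\sqrt{nk}}, &
\kappa_{\bar{X}} &= \frac{6}{nk}.
\end{align*}
```

Both the skewness and the excess kurtosis depend only on the shape parameter and the subgroup size, not on the scale parameter.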
Table 1.
Upper and lower control limits and in-control average run lengths for an -chart on a distributed process with subgroups of size n and desired type-1 error of .
In comparing performance, nearly all approximated limits outperform Gaussian/Shewhart limits, yet it is further evident that the CF-1 and CF-2 approximations outperform all other methods across specifications of the shape parameter and subgroup sizes n. Of particular interest, they outperform the SC and KC limits, which were derived from the CF-1 and CF-2 expansions with the addition of a smoothing function. Specific to the SC limits, the CF-1 limits, which also incorporate only skewness, have similar performance for cases with limited asymmetry in the population distribution or large subgroup sizes, yet largely outperform the SC limits at higher degrees of asymmetry and small subgroup sizes. The CF-2 limits, which additionally incorporate kurtosis, outperform the SC and CF-1 limits in all instances. The KC limits have generally poor performance due to their assumption that the skewness is equal to zero.
From Table 1, it is observed that the in-control ARL performance of the lower control limit (LCL) for all approximate limits demonstrates a high degree of inaccuracy, e.g., from a distribution. Since the skewness is positive, it is noted that the derivative of the distribution of the mean in the lower tail increases greatly for small subgroup sizes and, hence, any deviation from the exact specification of the LCL will produce a large deviation from the target type-1 error. Specific to the CF-1 and CF-2 limits, the magnitudes of the associated skewness and kurtosis push towards the boundaries of the domain of validity of the Cornish–Fisher expansion, causing further inaccuracy, yet it is noted that these limits continue to be more accurate than the other limits.
The type-2 error performance of the approximations was assessed by varying the values of the shape and scale parameters and generating the corresponding operating characteristic (OC) curves in Figure 3 and Figure 4. The OC curves display only the CF-2 approximation, since the CF-1 curves are very similar and would be difficult to distinguish graphically. The similarity between the CF-1 and CF-2 approximations is demonstrated in a specific case presented in Table 2.
Figure 3.
Operating characteristic (OC) curves of an chart with subgroups of size n for a population distribution with shifted shape parameter , where is the in-control case.
Figure 4.
Operating characteristic (OC) curves of an chart with subgroups of size n for a population distribution with shifted scale parameter , where is the in-control case.
Table 2.
In- and out-of-control ARLs for a distributed process with subgroups of size .
Note that when shifting the scale parameter, both the mean and variance of vary, yet the skewness and kurtosis remain constant, whereas when shifting the shape parameter, all measures of vary.
Based on Figure 3 and Figure 4, the Gaussian approximation is not accurate in detecting shifts for either the LCL or the UCL. The CF-1 and CF-2 approximations predict out-of-control performance for all shifts in the shape and scale parameters almost identically to the control limits of the exact distribution. Consider the specific case of a distribution with subgroups of size , whose in- and out-of-control ARL performance is given in Table 2. In all cases given in Table 2, the CF-1 and CF-2 approximations achieve marked improvement over the normal approximation for type-2 errors. Of note, the Gaussian approximation generally displays loss of power as the shift increases in both directions. The Cornish–Fisher approximations show only a minimal loss of power, on the order of a few percent relative error when compared with the exact ARL. There are also only very minor differences between the CF-1 and CF-2 approximations, except at the in-control ARL, where the CF-2 approximation is more accurate. This reflects the previous results for the type-1 errors given in Table 1.
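The quantities behind an OC curve can be sketched using their standard definitions. The notation is assumed here: $F_1$ denotes the distribution function of the subgroup mean under the shifted (out-of-control) parameters, and $\beta$ the probability that a subgroup mean falls within the limits.

```latex
\begin{align*}
% Type-2 error: probability of no signal under the shifted distribution
\beta &= P\left(\mathrm{LCL} \le \bar{X} \le \mathrm{UCL}\right)
       = F_1(\mathrm{UCL}) - F_1(\mathrm{LCL}), \\
% Out-of-control ARL, two-sided and for each limit separately
\mathrm{ARL}_1 &= \frac{1}{1 - \beta}, \qquad
\mathrm{ARL}_{\uparrow} = \frac{1}{1 - F_1(\mathrm{UCL})}, \qquad
\mathrm{ARL}_{\downarrow} = \frac{1}{F_1(\mathrm{LCL})}.
\end{align*}
```

Plotting $\beta$ against the shifted parameter value for fixed limits traces out one OC curve.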
4.2. Estimated Skewness and Kurtosis Measures
As the skewness and kurtosis measures of a distribution are generally not known, the practical accuracy of the CF-1 and CF-2 limits is assessed in the same instances considered previously, with the measures of the distribution of the mean estimated from sampled data points as
Towards this end, a simulation study was conducted in which the ARL of Shewhart/Gaussian, SC, CF-1, and CF-2 limits was evaluated for each of replications within each combination of the shape parameter , subgroup sizes , and pre-sampled subgroups 10,000, whereupon the median (Med) and interquartile range (IQR) of these are reported for the LCL in Table 3 and UCL in Table 4.
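One way the measures of the distribution of the mean might be estimated from pre-sampled subgroups is sketched below. The estimators are simple pooled moment estimators chosen for illustration, an assumption that may differ from the estimators of Equation (13), and the function name `estimate_mean_measures` is hypothetical.

```python
import math


def estimate_mean_measures(subgroups):
    """Estimate measures of the subgroup-mean distribution from m subgroups of size n.

    Uses pooled central moments (divisor N) of all m * n observations, then
    scales them to the distribution of the mean of n observations.
    """
    n = len(subgroups[0])
    if n == 0 or any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must share the same non-zero size")
    data = [x for g in subgroups for x in g]
    total = len(data)
    mean = sum(data) / total
    m2 = sum((x - mean) ** 2 for x in data) / total
    m3 = sum((x - mean) ** 3 for x in data) / total
    m4 = sum((x - mean) ** 4 for x in data) / total
    if m2 == 0.0:
        raise ValueError("data have zero variance")
    skew = m3 / m2**1.5          # population skewness estimate
    exkurt = m4 / m2**2 - 3.0    # population excess kurtosis estimate
    return {
        "mean": mean,
        "sd_mean": math.sqrt(m2 / n),
        "skew_mean": skew / math.sqrt(n),
        "exkurt_mean": exkurt / n,
    }
```

Higher moments are estimated poorly from few observations, which is consistent with the wider spread of ARL values reported for small numbers of pre-sampled subgroups.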
Table 3.
In-control ARLs for the LCL of an $\bar{X}$-chart on a Gamma distributed process.
Table 4.
In-control ARLs for the UCL of an $\bar{X}$-chart on a Gamma distributed process.
It is expected that for all approximate limits, the median ARL will converge to the exact value of 740.74 for both the LCL and UCL as the shape parameter and subgroup size n are increased both simultaneously and independently, which was observed. Further, it is expected for all approximate limits, the interquartile range (IQR) of the ARL values would narrow as the number of pre-sampled subgroups m is increased, and that, to a lesser extent, the median ARL will converge, both of which were likewise observed.
Between the approximations, the IQR of the CF-1 and CF-2 limits is narrower than that of the SC limits in most cases, which points to the improved consistency of these approximations regardless of the quantity of pre-sampled subgroups. In a similar vein, but to a lesser extent, the IQR of the CF-2 limits was narrower than that of the CF-1 limits across nearly all cases. This observation, coupled with the median of the CF-2 approximation demonstrating the highest degree of accuracy nearly uniformly across all cases, is notable despite inherent inaccuracies in estimating the kurtosis when the shape parameter, n, and m are small. Focusing on the SC and CF-1 limits, in which only the skewness is considered, note that the SC limits quickly lose both accuracy and consistency when the shape parameter, n, and m are small. In such cases, the skewness measure, as it relates to the distribution of the mean, is quite large, whereby the effect of skewness on the approximated limits is negated by the multiplicative smoothing factor and these limits tend towards the Gaussian/Shewhart limits with poor ARL performance.
5. Control Charting Application
A dataset reporting the lifetimes of 98 randomly sampled car brake pads is given in [20]. Assuming these were collected in subgroups of size , the summary measures of the distribution of the sample mean were calculated according to Equation (13) and are given by , , , and . An $\bar{X}$-chart with the sample data plotted against Gaussian/Shewhart, SC, KC, CF-1, and CF-2 limits is given in Figure 5.
Figure 5.
Control Chart limits for car brake pad lifetime data (, ).
Upon inspection of this control chart, the Gaussian/Shewhart limits do not account for either the skewness or kurtosis in the data and are, hence, very wide. The KC limits capture the effect of kurtosis and are hence somewhat narrower, yet are symmetrical about the mean without adjustment for skewness. The LCL of the SC, CF-1, and CF-2 limits is progressively larger in magnitude and, hence, the SC limits do not flag the mean of the 12th subgroup as being out-of-control, whereas the CF-1 limits have this point on the boundary and the CF-2 limits signal an out-of-control condition. Hence, the sensitivity of the control chart is increased as more complete information about the measures of the distribution of the sample mean is included in the control limit calculation.
6. Discussion
This research contributes to the study of control charts for the mean of a process in which the skewness and kurtosis of the population distribution are non-negligible. Cornish–Fisher approximations provide a closed form UCL and LCL for small sample sizes, inclusive of skewness for the CF-1 expansion and inclusive of skewness and kurtosis for the CF-2 expansion. The approximation is distribution independent and was shown to be fairly robust to errors in estimating these measures from data. These limits outperform the similarly derived SC and KC limits, which incorporate a smoothing function, in ARL performance, and do not require extraneous parameter specifications, as the WSD and IEWMA methods do. In summary, the CF-1 and CF-2 limits represent an improvement over previous research because they are closed form, inclusive of skewness and kurtosis, distribution independent, and accurate at small sample sizes. While there are some limitations to the use of these limits based on the domain of validity constraints on the skewness and kurtosis measures, it is noted that this domain is quite wide when considering the distribution of the mean under sub-grouping, and that the control limits do not cease to exist outside this domain of validity, but rather lose accuracy.
We propose future research into the domain of validity on the skewness and kurtosis values of a second order truncated Cornish–Fisher expansion and into methods to overcome this limitation, including the use of rearrangement as studied in [21]. This could involve exploring additional functions to incorporate into the approximations of the control limits to ensure stability of the series expansion. Another possibility is investigating higher orders of truncation at the 3rd and 4th order, which would change the domain of validity by including higher order terms in the approximation. Finally, applications beyond quality control could be investigated in which approximating the distribution of sums or averages, i.e., a convolution, is necessary.
7. Conclusions
This paper presents control limits for monitoring the mean of a process based on first and second order Cornish–Fisher expansions that are inclusive of the skewness and kurtosis measures of a population distribution. The theoretical accuracy of these limits in the case in which the skewness and kurtosis measures are known was shown to be a major improvement over traditional Shewhart limits based on the central limit theorem, and over other limit approximation methods that are inclusive of these measures in their specification. A simulation study showed that Cornish–Fisher limits are both accurate and consistent in practice when these measures are estimated from even a limited amount of pre-sampled data. An $\bar{X}$ control chart was derived using an array of limit approximations for a real-world dataset, showing a comparative practical application of this methodology.
Author Contributions
Study conceptualization: P.B., and T.M.; Research: P.B., and T.M.; Technical review and manuscript revision: T.M.; Supervision: T.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Nomenclature
| | Skewness |
| | Excess Kurtosis |
| n | Subgroup Size |
| m | Number of Subgroups |
| | Type I Error Rate, Producer’s Risk |
| | Standard Normal Probability Density Function |
| | Standard Normal Cumulative Distribution Function |
| | Discriminant |
| | Scale Parameter of Gamma distribution |
| | Shape Parameter of Gamma distribution |
References
- Zain, Z.; Yahaya, S.S.S.; Ahmad, N.; Atta, A.M.A. New X-bar Control Chart Using Skewness Correction Method for skewed Distributions with Application in Healthcare. Syst. Rev. Pharm. 2020, 11, 120–131.
- Dreves, M.; Huang, G.; Peng, Z.; Polyzotis, N.; Rosen, E.; Suganthan, G.C.P. From Data to Models and Back. In Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning, Portland, OR, USA, 14 June 2020; pp. 1–4.
- Mehta, R. Self-Assessment Data May Be Significantly Skewed During Global Health Crises. Int. J. Aviat. Res. 2020, 12.
- Chan, L.K.; Cui, H.J. Skewness Correction $\bar{X}$ and R Charts for Skewed Distributions. Nav. Res. Logist. 2003, 50, 555–573.
- Tadikamalla, P.; Popescu, D. Kurtosis Correction Method for $\bar{X}$ and R Control Charts for Long-Tailed Symmetrical Distributions. Nav. Res. Logist. 2007, 54, 371–384.
- Maillard, D. A User’s Guide to the Cornish Fisher Expansion; SSRN:1997178 2018; pp. 1–14. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1997178 (accessed on 12 April 2022).
- Blanca, M.J.; Arnau, J.; López-Montiel, D.; Bono, R.; Bendayan, R. Skewness and Kurtosis in Real Data Samples. Methodology 2013, 9, 78–84.
- Bai, D.S.; Choi, S.I. $\bar{X}$ and R control charts for skewed populations. J. Qual. Technol. 1995, 27, 120–131.
- Shu, L.; Wei, J.; Wu, J. A one-sided EWMA control chart for monitoring process means. Commun. Stat. Comput. 2007, 36, 901–920.
- Jones, A. The statistical design of EWMA control charts with estimated parameters. J. Qual. Technol. 2002, 34, 277–288.
- Jensen, A.; Jones-Farmer, A.; Champ, C.; Woodall, W. Effects of parameter estimation on control chart properties: A literature review. J. Qual. Technol. 2006, 38, 349–364.
- Chakraborti, S. Nonparametric (distribution-free) quality control charts. Encycl. Stat. Sci. 2011, 1–27.
- Chakraborti, S.; Graham, M. Nonparametric (distribution-free) control charts: An updated overview and some results. Qual. Eng. 2019, 31, 1–22.
- Jones-Farmer, L.A.; Jordan, V.; Champ, C.W. Distribution-free Phase I control charts for subgroup location. J. Qual. Technol. 2009, 41, 304–316.
- Gibbons, J.D.; Chakraborti, S. Nonparametric Statistical Inference, 5th ed.; Taylor and Francis: Abingdon, UK, 2010.
- Amin, R.W.; Reynolds, M.R.; Bakir, S.T. Nonparametric quality control charts based on the sign statistic. Commun. Stat. Theory Methods 1995, 24, 1597–1623.
- Bakir, S.T. A distribution-free Shewhart quality control chart based on signed-ranks. Qual. Eng. 2004, 16, 613–623.
- Cornish, E.A.; Fisher, R.A. Moments and Cumulants in the Specifications of Distributions. Rev. Int. Stat. Inst. 1937, 4, 1–14.
- Johnson, N.; Kotz, S. Continuous Univariate Distributions; Wiley: New York, NY, USA, 1970; p. 34.
- Omar, M.; Arafat, S.; Hossain, M.; Riaz, M. Inverse Maxwell Distribution and Statistical Process Control: An Efficient Approach for Monitoring Positively Skewed Process. Symmetry 2021, 13, 189.
- Chernozhukov, V.; Fernandez-Val, I.; Galichon, A. Improving point and interval estimators of monotone functions by rearrangement. Biometrika 2009, 96, 559.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).