1. Introduction
The quality of a process can often be characterized by multiple correlated variables. For example, when monitoring a patient’s systolic and diastolic blood pressure, it is more appropriate to apply one multivariate control chart than two univariate charts. Hotelling [1] introduced the first multivariate control chart, and many researchers have since improved on it. For example, Lowry et al. [2] proposed a multivariate exponentially weighted moving average (MEWMA) chart, and Crosier [3] developed a multivariate cumulative sum (MCUSUM) chart. Other researchers introduced multivariate control charts that monitor the variability of a process. For example, Alt [4] proposed a generalized variance chart (GVC), which uses the determinant of the estimated covariance matrix of grouped observations as the monitoring statistic. For details on multivariate dispersion charts for grouped observations, see the review articles by Yeh et al. [5] and Bersimis et al. [6].
Recently, Ajadi and Zwetsloot [7] compared the performance of a multivariate EWMA chart proposed by Huwang et al. [8] for monitoring the process variability of individual observations with that of various charts based on grouped observations. The authors concluded that monitoring methods based on individual observations are the quickest to detect sustained shifts in the process variability. Reynolds Jr and Stoumbos [9,10] also advised monitoring with individual observations in the univariate setting.
Ajadi et al. [11] gave an overview of the existing multivariate parametric and nonparametric control charts for monitoring the process dispersion of individual observations. Next, we briefly introduce some noteworthy examples. Yeh et al. [12] proposed the MaxMEWMV chart. This chart is developed by computing the difference between the estimated covariance matrix derived from the EWMA statistic and the identity matrix. Then, the norms of both the diagonal elements and the upper triangular off-diagonal elements of this difference are computed. Finally, a single statistic is derived from the two distance measures to detect changes in the process variability. Huwang et al. [8] proposed the multivariate exponentially weighted mean squared deviation (MEWMS) and the multivariate exponentially weighted moving variance (MEWMV) charts; these charts are based on the trace of the estimated covariance matrix obtained from the EWMA statistic. According to the authors, the MEWMS chart is designed to monitor only changes in the process variability, while the MEWMV chart is designed to simultaneously monitor shifts in both the process mean vector and the covariance matrix. Hawkins and Maboudou-Tchao [13] proposed the multivariate exponentially weighted moving covariance matrix (MEWMC) chart. The authors applied Alt’s likelihood ratio statistic to compare the estimated EWMA covariance matrix with the identity matrix. The multivariate control charts discussed above are parametric charts. Li et al. [14] employed a multivariate spatial sign test and an EWMA statistic to develop a nonparametric control chart for monitoring the shape parameters of the underlying data. The authors named the chart the multivariate nonparametric shape EWMA (MNSE) chart.
Except for the nonparametric control charts, the multivariate charts for monitoring the process variability of individual observations assume that the process data are normally distributed. Stoumbos and Sullivan [15] investigated the robustness of the MEWMA control chart for monitoring the process mean vector and showed that it is very robust to non-normality, even for highly skewed and heavy-tailed multivariate distributions. Testik et al. [16] extended the work of Borror et al. [17] to examine the robustness of multivariate EWMA control charts to non-normal data. Zwetsloot and Ajadi [18] showed that, among three compared univariate EWMA dispersion charts, the chart based on the logarithm of the sample variance is the most robust to the normality assumption. The authors recommended this chart for use in practice, since it can be difficult to distinguish between normally distributed data and data that deviate slightly from normality. We have not found any research that investigates the robustness of multivariate dispersion charts for individual observations.
Ajadi et al. [11] identified several research directions, one of which is the investigation of the robustness of multivariate dispersion charts for individual observations. This is one of the motivations for the current study. The objective of this article is to introduce a method for detecting changes in the process dispersion using individual observations that is robust to slight deviations from the normality assumption. The chart is designed using the logarithm of the diagonal elements of the sample covariance matrix estimated using the EWMA statistic; this transformation makes the chart robust to non-normality. In Section 2, we briefly discuss the MaxMEWMV, the MEWMS and a nonparametric control chart. In Section 3, the proposed control chart is discussed in detail. Next, in Section 4, we compare the performance of the proposed chart with that of the existing control charts discussed in Section 2. An example is used in Section 5 to support the comparison. Finally, we provide conclusions and recommendations in Section 6.
2. Multivariate EWMA Dispersion Control Charts for Individual Observations
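Assume that the data follow a multivariate normal distribution, i.e., $\mathbf{X}_i \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the process mean vector and covariance matrix of the $p$ correlated quality characteristics are denoted by $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, respectively. Next, $\mathbf{X}_i$ is standardized by transforming it to $\mathbf{Y}_i$ as

$$\mathbf{Y}_i = \boldsymbol{\Sigma}_0^{-1/2}(\mathbf{X}_i - \boldsymbol{\mu}_0).$$

Thus, $\mathbf{Y}_i$ follows a multivariate normal distribution $N_p(\boldsymbol{\mu}_Y, \boldsymbol{\Sigma}_Y)$, where $\boldsymbol{\mu}_Y = \boldsymbol{\Sigma}_0^{-1/2}(\boldsymbol{\mu} - \boldsymbol{\mu}_0)$ and $\boldsymbol{\Sigma}_Y = \boldsymbol{\Sigma}_0^{-1/2}\boldsymbol{\Sigma}\boldsymbol{\Sigma}_0^{-1/2}$. Note that $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ are the in-control values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, respectively. When the process is in control, $\mathbf{Y}_i \sim N_p(\mathbf{0}, \mathbf{I}_p)$, where $\mathbf{I}_p$ is a $p \times p$ identity matrix. To compute $\mathbf{Y}_i$, we need estimates of $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$. These are usually obtained in a Phase I analysis (see Section 3.2 for details).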
Next, we discuss two popular parametric control charts (MaxMEWMV and MEWMS) and one nonparametric control chart (MNSE) for monitoring the covariance matrix of individual observations.
2.1. MaxMEWMV Chart
The first multivariate EWMA control chart for monitoring the process variability of individual observations was proposed by Yeh et al. [12]. This chart is referred to as the MaxMEWMV chart. As suggested by Yeh et al. [12], we first transform the data as
where
is a smoothing constant and
. The monitoring statistic for the MaxMEWMV chart is defined as
where
. Note that
is the
of the diagonal elements of
and
is the
of the upper triangular off-diagonal elements of
 . Here,
is used to detect shifts in the process variance and
to detect changes in the correlation structure. The expectation and variance of
and
 are derived analytically by Yeh et al. [12]. A signal is given when
exceeds an upper control limit (UCL).
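The MaxMEWMV construction described above can be illustrated with a minimal sketch. It is not the implementation of Yeh et al. [12] and rests on stated assumptions: the EWMA covariance recursion $S_i = \omega Y_iY_i^\top + (1-\omega)S_{i-1}$ started at $S_0 = I_p$, Euclidean norms of the diagonal and strict upper-triangular parts of $S_i - I_p$, and, in place of the analytical moments derived by Yeh et al., steady-state in-control moments estimated by simulation. All function names are hypothetical.

```python
import numpy as np

def ewma_cov_path(Y, omega):
    """Yield S_i = omega * y_i y_i^T + (1 - omega) * S_{i-1}, with S_0 = I_p (assumed)."""
    S = np.eye(Y.shape[1])
    for y in Y:
        S = omega * np.outer(y, y) + (1.0 - omega) * S
        yield S

def diag_offdiag_norms(S):
    """Euclidean norms of the diagonal and the strict upper triangle of C = S - I_p."""
    p = S.shape[0]
    C = S - np.eye(p)
    d = np.linalg.norm(np.diag(C))
    o = np.linalg.norm(C[np.triu_indices(p, k=1)])
    return d, o

def in_control_moments(p, omega, n_obs=5000, burn_in=200, seed=0):
    """Steady-state mean and SD of (d_i, o_i) under N_p(0, I_p), estimated by simulation.
    A stand-in for the analytical moments of Yeh et al.; requires p >= 2."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n_obs, p))
    vals = np.array([diag_offdiag_norms(S) for S in ewma_cov_path(Y, omega)])
    vals = vals[burn_in:]  # drop the start-up period
    return vals.mean(axis=0), vals.std(axis=0, ddof=1)

def max_mewmv_stats(Y, omega, mean, sd):
    """Hypothetical MaxMEWMV-style statistic: the larger of the two standardized norms."""
    stats = []
    for S in ewma_cov_path(Y, omega):
        z = (np.array(diag_offdiag_norms(S)) - mean) / sd
        stats.append(z.max())
    return np.array(stats)
```

Each statistic would then be compared with a UCL calibrated by simulation for the chosen $p$ and $\omega$.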
2.2. MEWMS Chart
Later, in 2007, Huwang et al. [8] proposed two multivariate EWMA dispersion charts. One of them, the MEWMS chart, is designed to detect only changes in the process variability. The MEWMS control chart uses the trace of the estimated covariance matrix obtained from the EWMA statistic (
) as the monitoring statistic.
where
is the
diagonal element of
. The monitoring statistic,
, is compared with the upper and lower control limits (
and
);
where
and
L is the control charting constant. The chart signals when
or
.
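A trace-based chart of this kind can be sketched as follows, assuming the recursion $S_i = \omega Y_iY_i^\top + (1-\omega)S_{i-1}$ with $S_0 = I_p$ (the initialization used by Huwang et al. [8] may differ). Under this assumption and $Y_i \sim N_p(\mathbf{0}, \mathbf{I}_p)$, $E[\mathrm{tr}(S_i)] = p$ and, because $\mathrm{tr}(Y_iY_i^\top) = \|Y_i\|^2 \sim \chi^2_p$ has variance $2p$,

$$\mathrm{Var}[\mathrm{tr}(S_i)] = \omega^2\sum_{k=1}^{i}(1-\omega)^{2(i-k)}\,2p = \frac{2p\,\omega}{2-\omega}\left[1-(1-\omega)^{2i}\right].$$

The limits $p \pm L\sqrt{\mathrm{Var}[\mathrm{tr}(S_i)]}$ below are derived for this sketch only and are not necessarily the exact limits of the original chart.

```python
import numpy as np

def mewms_chart(Y, omega, L):
    """Trace-of-EWMA-covariance chart with time-varying two-sided limits (sketch).

    Assumes S_0 = I_p, so that in control E[tr(S_i)] = p and
    Var[tr(S_i)] = 2*p*omega/(2-omega) * (1 - (1-omega)**(2*i)).
    Returns the statistics, LCLs, UCLs and signal flags.
    """
    n, p = Y.shape
    S = np.eye(p)
    stats, lcl, ucl = np.empty(n), np.empty(n), np.empty(n)
    for i, y in enumerate(Y, start=1):
        S = omega * np.outer(y, y) + (1.0 - omega) * S
        stats[i - 1] = np.trace(S)
        sd = np.sqrt(2.0 * p * omega / (2.0 - omega) * (1.0 - (1.0 - omega) ** (2 * i)))
        lcl[i - 1], ucl[i - 1] = p - L * sd, p + L * sd
    signal = (stats > ucl) | (stats < lcl)
    return stats, lcl, ucl, signal
```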
2.3. MNSE Chart
Li et al. [14] incorporated a multivariate spatial sign test into an exponentially weighted moving average (EWMA) scheme to develop a distribution-free control chart for monitoring shape parameters. The authors named it the multivariate nonparametric shape EWMA (MNSE) chart. They showed that the MNSE chart is robust to non-normal distributions of the data, in the sense that its in-control performance is not affected. The mean vector (
) and covariance matrix (
 ) estimators of the MNSE chart can be obtained simultaneously through Equations (3) and (4).
where
m is the size of the historical dataset. The algorithm to compute
and
 estimates from Equations (3) and (4) is available in the supplementary material of Zou and Tsung [19]. Next, the Phase II observations, i.e.,
can be standardized and transformed by the multivariate spatial sign, i.e.,
, where
is the spatial sign vector. Then, the EWMA statistic can be computed as
where
. Finally, the monitoring statistic,
 in Equation (5), is compared with an upper control limit (
).
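The spatial-sign step can be illustrated with a minimal sketch. It uses the standard spatial sign function $U(\mathbf{x}) = \mathbf{x}/\|\mathbf{x}\|$ (with $U(\mathbf{0}) = \mathbf{0}$) and an EWMA of the outer products $U_iU_i^\top$ started at $\mathbf{I}_p/p$, which equals $E[UU^\top]$ for spherically symmetric standardized data. It does not reproduce the estimators of Equations (3) and (4) or the exact statistic of Equation (5); `shape_distance` is a hypothetical stand-in summary.

```python
import numpy as np

def spatial_sign(x):
    """Spatial sign U(x) = x / ||x||, with U(0) = 0."""
    r = np.linalg.norm(x)
    return x / r if r > 0 else np.zeros_like(x)

def ewma_sign_outer(Y, lam):
    """Yield W_i = (1 - lam) * W_{i-1} + lam * U_i U_i^T, started at I_p / p."""
    p = Y.shape[1]
    W = np.eye(p) / p
    for y in Y:
        u = spatial_sign(y)
        W = (1.0 - lam) * W + lam * np.outer(u, u)
        yield W

def shape_distance(W):
    """Hypothetical summary: squared Frobenius distance of W from I_p / p."""
    p = W.shape[0]
    return float(np.sum((W - np.eye(p) / p) ** 2))
```

Because $U(c\,\mathbf{x}) = U(\mathbf{x})$ for any $c > 0$, multiplying all variances by a common factor leaves the spatial signs unchanged, which is consistent with the MNSE chart's lack of response to overall variance shifts.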
3. The Proposed Robust EWMA Control Chart
In this section, we introduce our new multivariate dispersion chart for individual observations. The proposed chart is based on the logarithm of the diagonal elements of the covariance matrix estimated using the EWMA statistic and is referred to as REWMV, where R stands for robust. We focus on robustness to deviations from the normality assumption that are difficult to identify. We believe this is a common situation in practice: data may look approximately normally distributed, yet it can be impossible to verify whether they have slightly heavy tails or a slightly skewed distribution.
3.1. Guidelines for the Implementation of the REWMV Chart
The REWMV control chart incorporates the logarithm of the estimated covariance matrix in the multivariate EWMA statistic. There are three reasons why the logarithm makes the chart more robust. Firstly, as mentioned by Box et al. [20] (Chapter 5.4), the logarithm of the sample variance is more nearly normally distributed than the sample variance itself, because the logarithm transforms right-skewed data into approximately normally distributed data. Secondly, Crowder and Hamilton [21] showed that the logarithmic transformation changes a variance-shift model into a location-shift model. This is beneficial because most charts are designed to detect mean shifts quickly. Finally, the variance of the logarithm of the sample variance is independent of the process variance $\sigma^2$ and depends only on the sample size.
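The first and third points can be checked numerically. The sketch below is an illustration only (it is not part of the proposed chart): it draws normal samples of size $n$, computes the sample variance $S^2$ and $\ln S^2$, and compares them. Since $\ln S^2 = \ln\sigma^2 + \ln\chi^2_{n-1} - \ln(n-1)$, the variance of $\ln S^2$ equals the trigamma function $\psi_1((n-1)/2)$ whatever the value of $\sigma^2$.

```python
import numpy as np
from scipy.special import polygamma
from scipy.stats import skew

def sample_variances(n, sigma, reps=100_000, seed=0):
    """Sample variances S^2 of `reps` normal samples of size n with SD sigma."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((reps, n))
    return np.var(sigma * z, axis=1, ddof=1)

n = 5
s2_a = sample_variances(n, sigma=1.0)
s2_b = sample_variances(n, sigma=3.0)  # same seed: the same draws, scaled by 3

# Var(ln S^2) does not depend on sigma, whereas Var(S^2) grows with sigma**4.
print(np.var(np.log(s2_a)), np.var(np.log(s2_b)))
print(np.var(s2_a), np.var(s2_b))
# Exact value of Var(ln S^2): trigamma((n - 1) / 2)
print(polygamma(1, (n - 1) / 2))
# The logarithm is much less skewed than S^2 itself.
print(skew(s2_a), skew(np.log(s2_a)))
```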
This idea has been used quite extensively in univariate monitoring. Crowder and Hamilton [21] suggested applying an EWMA scheme to the logarithm of the sample variance for monitoring increases in the process variability. Shu and Jiang [22] also noted that the logarithmic transformation makes the control limits of a two-sided EWMA control chart nearly symmetric. Zwetsloot and Ajadi [18] compared the performance of three univariate EWMA dispersion charts based on estimated parameters under normally and non-normally distributed data and showed that the EWMA control chart based on the logarithm of the sample variance is more robust to non-normality than the other charts.
First, we integrate the logarithm of the diagonal elements of the estimated covariance matrix into the multivariate EWMA statistic as
where
, is the smoothing parameter,
and
is a vector of the diagonal elements of the matrix A.
Our objective in this study is to detect both increases and decreases in the process variability of multivariate individual observations. Next, we discuss the procedures for upper- and lower-sided monitoring with the proposed REWMV control chart.
For upper-sided monitoring, the monitoring statistic is reset to a reflection boundary whenever it falls below the boundary; for lower-sided monitoring, it is reset whenever it exceeds the boundary. The reflection boundary
is the expected value of
, and it is obtained as
under the assumption that
.
We apply the statistics in Equations (7) and (8) to detect increases and decreases in the process variability, respectively. The
 and
 in these equations denote the element-wise maximum and minimum of two vectors, respectively.
There are many ways to construct summary statistics for monitoring changes in the covariance parameters of multivariate observations. We tried various alternatives, such as computing the EWMA statistic before applying the logarithm to the estimated covariance matrix, and using the Mahalanobis distance (a standardized distance measure that incorporates the variance of the data) of the estimated EWMA statistic as the summary statistic. However, the present method performed best in detecting the ranges and types of shifts considered.
Finally, and are compared against the upper and lower control limits ( and ) to detect upward and downward shifts in the process dispersion, respectively.
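A minimal sketch of the reflected upper and lower statistics follows. Because Equation (6) and the reflection boundary are not reproduced in this text, it rests on explicit assumptions. It applies the logarithm element-wise to $\mathrm{diag}(Y_iY_i^\top)$, i.e., to $y_{ij}^2$, before EWMA smoothing. It takes the reflection boundary to be $E[\ln\chi^2_1] = \psi(1/2) + \ln 2 \approx -1.2704$, the expected value of $\ln y_{ij}^2$ when $Y_i \sim N_p(\mathbf{0}, \mathbf{I}_p)$. It also summarizes each reflected vector by its largest (upper) or smallest (lower) element, a hypothetical choice. Control limits would still have to be calibrated by simulation.

```python
import numpy as np
from scipy.special import digamma

# Assumed reflection boundary: E[ln chi^2_1] = digamma(1/2) + ln 2 (about -1.2704)
BOUNDARY = digamma(0.5) + np.log(2.0)

def rewmv_sketch(Y, omega):
    """Reflected upper/lower EWMA of ln diag(Y_i Y_i^T) (hypothetical sketch).

    Returns per-time scalar summaries: the largest element of the upper
    statistic and the smallest element of the lower statistic.
    """
    n, p = Y.shape
    z_up = np.full(p, BOUNDARY)
    z_lo = np.full(p, BOUNDARY)
    up, lo = np.empty(n), np.empty(n)
    tiny = np.finfo(float).tiny  # guards against log(0)
    for i, y in enumerate(Y):
        x = np.log(np.maximum(y ** 2, tiny))  # ln of the diagonal of y y^T
        # Upper statistic is reset to the boundary if it falls below it.
        z_up = np.maximum(omega * x + (1 - omega) * z_up, BOUNDARY)
        # Lower statistic is reset to the boundary if it rises above it.
        z_lo = np.minimum(omega * x + (1 - omega) * z_lo, BOUNDARY)
        up[i], lo[i] = z_up.max(), z_lo.min()
    return up, lo
```

Applying the logarithm before smoothing turns a variance inflation by a factor $c$ into an additive shift of $\ln c$ in every component, which is the location-shift property mentioned above.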
3.2. Phase I Estimation
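To implement the proposed chart and the other charts, we need estimates of $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$, which we obtain in a Phase I analysis. Many covariance matrix estimators have been employed in the literature for Phase I estimation. The sample covariance matrix of the pooled observations is the most commonly used estimator, and it is defined as

$$\hat{\boldsymbol{\Sigma}}_0 = \frac{1}{m-1}\sum_{i=1}^{m}(\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^\top,$$

where $\bar{\mathbf{X}} = \frac{1}{m}\sum_{i=1}^{m}\mathbf{X}_i$ is the estimate of $\boldsymbol{\mu}_0$, and the observations $\mathbf{X}_i$ are observed at times $i = 1, 2, \ldots, m$. Note that $m$ is the Phase I sample size. For each Phase II observation of the MEWMS, MaxMEWMV and REWMV charts, we standardize the data based on the Phase I estimates as

$$\mathbf{Y}_i = \hat{\boldsymbol{\Sigma}}_0^{-1/2}(\mathbf{X}_i - \bar{\mathbf{X}}).$$

Note that the Phase I estimators for the MNSE chart are obtained through Equations (3) and (4).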
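Phase I estimation with the sample mean and sample covariance matrix, followed by Phase II standardization $\mathbf{Y}_i = \hat{\boldsymbol{\Sigma}}_0^{-1/2}(\mathbf{X}_i - \bar{\mathbf{X}})$, can be sketched as follows. This is a minimal illustration that assumes the symmetric inverse square root is computed by eigendecomposition; the helper names and the example covariance matrix are hypothetical.

```python
import numpy as np

def phase1_estimates(X):
    """Sample mean vector and sample covariance matrix (divisor m - 1) of Phase I data."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), np.cov(X, rowvar=False, ddof=1)

def inv_sqrtm(S):
    """Symmetric inverse square root S^{-1/2} of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def standardize(X, mu_hat, sigma_hat):
    """Row-wise Y_i = sigma_hat^{-1/2} (X_i - mu_hat); valid because S^{-1/2} is symmetric."""
    return (np.asarray(X, dtype=float) - mu_hat) @ inv_sqrtm(sigma_hat)

rng = np.random.default_rng(0)
sigma0 = np.array([[1.0, 0.5], [0.5, 2.0]])  # hypothetical in-control covariance
phase1 = rng.multivariate_normal([0.0, 0.0], sigma0, size=100)  # m = 100
mu_hat, sigma_hat = phase1_estimates(phase1)
phase2 = rng.multivariate_normal([0.0, 0.0], sigma0, size=10)
Y = standardize(phase2, mu_hat, sigma_hat)
```

Standardizing the Phase I data themselves with these estimates gives a sample mean of exactly zero and a sample covariance of exactly $\mathbf{I}_p$ (up to rounding); Phase II data only approximately satisfy this, because the estimates carry sampling error.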
4. Performance Comparison
In this section, we compare the performance of the proposed REWMV control chart with that of the existing competing charts (the MaxMEWMV, MEWMS, and MNSE charts). We investigate the robustness of the selected charts to data that are slightly non-normal. We believe this is an important aspect of robustness because, as we show in Section 4.3, it can be very difficult to distinguish normally distributed data from slightly non-normally distributed data. Hence, a monitoring method should perform consistently in both cases; however, as we show in Section 4.5, the existing monitoring methods produce many additional false alarms when such a slight deviation occurs. But first, we discuss the selected performance measures (Section 4.1) and the set-up of our simulation experiments (Section 4.2). Code for obtaining the simulation results is available on github.com/ajadi1982.
4.1. Performance Measures and Control Limits
The performance of the charts is defined in terms of the average of the conditional
values (
). The conditional
is defined as the average run length (
) conditional on a specific Phase I estimate, i.e.,
. We evaluate the conditional
 for each of the charts with 100 Monte Carlo simulations of the run length, given the Phase I estimates. Then, we calculate the mean of 10,000 conditional ARL values, each based on a different simulated Phase I sample, to obtain the
. We adjust the control limits for different Phase I sample sizes such that the
for
,
and
. The adjusted control limits for each of the charts at specific values of
p,
and
m are provided in Table 1. They are obtained under the assumption that the data are normally distributed.
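The nested Monte Carlo scheme described above, with conditional ARLs averaged over simulated Phase I samples, can be sketched generically. The callables `draw_phase1` and `run_length` are hypothetical placeholders for a chart's Phase I estimation and Phase II run-length simulation. The toy example plugs in a univariate Shewhart-type chart with an estimated standard deviation purely to illustrate the mechanics; its truncation at `max_rl` mirrors the maximum run length used for the decreasing-shift scenarios in Section 4.5.

```python
import numpy as np
from scipy.stats import norm

def aarl(draw_phase1, run_length, n_phase1=10_000, n_runs=100, seed=0):
    """Average of conditional ARLs over simulated Phase I samples."""
    rng = np.random.default_rng(seed)
    carl = np.empty(n_phase1)
    for k in range(n_phase1):
        est = draw_phase1(rng)  # one Phase I estimate
        # Conditional ARL: mean run length given this particular estimate
        carl[k] = np.mean([run_length(est, rng) for _ in range(n_runs)])
    return carl.mean(), carl

# Toy (hypothetical) chart: signal when |X| > k * sigma_hat, with X ~ N(0, 1) in control.
def toy_phase1(rng, m=50):
    return np.std(rng.standard_normal(m), ddof=1)

def toy_run_length(sigma_hat, rng, k=3.0, max_rl=10_000):
    p_signal = 2.0 * norm.sf(k * sigma_hat)  # per-observation false-alarm probability
    # Geometric run length; truncation at max_rl can only understate the ARL.
    return min(int(rng.geometric(p_signal)), max_rl)

mean_arl, carls = aarl(toy_phase1, toy_run_length, n_phase1=500, n_runs=100)
```

To calibrate limits as described above, one would repeat this for different limit constants (for example, by bisection on `k`) until the in-control AARL reaches the target value.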
4.2. Description of Experiments
Our in-control model is
0 ⋯
and
. For the out-of-control model, we set a general form
and
for overall and sparse shifts, respectively, as
and
We consider the following four scenarios for overall or sparse shifts in the process:
Case 1: Overall increase shifts in the variance-covariance matrix
We increase all the diagonal elements of the covariance matrix (i.e.,
 ) and/or the correlation coefficients, as given in Equation (11).
Case 2: Sparse increase shifts in the variance-covariance matrix
We consider increasing shifts in only the first diagonal element of the covariance matrix, where
 and
 as given in Equation (12).
Case 3: Overall decrease shifts in the variance-covariance matrix
We decrease all the diagonal elements of the covariance matrix (i.e.,
) where
and
 as given in Equation (11).
Case 4: Sparse decrease shifts in the variance-covariance matrix
We consider decreasing shifts in only the first diagonal element of the covariance matrix, where
 and
 as given in Equation (12).
In addition, we investigate the robustness of the proposed chart and its counterparts. We use the multivariate t distribution to model high kurtosis (kurtosis measures the heaviness of the tails of a distribution), , where is the number of degrees of freedom. We use degrees of freedom. The in-control process mean vector and covariance matrix of are and , respectively. Since the focus of this study is monitoring changes in the process variability, the process mean vector remains constant in the out-of-control model, while the covariance matrix changes to or .
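Multivariate t data with a prescribed covariance matrix can be generated with the standard normal/chi-square mixture representation. This is a minimal sketch; whether the study matches the covariance matrix (as done here) or the scale matrix of the t distribution to the in-control model is an assumption, since the relevant symbols are not reproduced in this text. The example covariance matrix is hypothetical.

```python
import numpy as np

def rmvt(n, mean, cov, df, rng):
    """Draw n multivariate t vectors whose covariance matrix equals `cov` (df > 2).

    X = mean + Z * sqrt(df / W), with Z ~ N(0, R) and W ~ chi^2_df, has covariance
    df / (df - 2) * R, so the scale matrix is set to R = (df - 2) / df * cov.
    """
    if df <= 2:
        raise ValueError("the covariance is finite only for df > 2")
    mean = np.asarray(mean, dtype=float)
    R = (df - 2.0) / df * np.asarray(cov, dtype=float)
    z = rng.multivariate_normal(np.zeros(mean.size), R, size=n)
    w = rng.chisquare(df, size=n)
    return mean + z * np.sqrt(df / w)[:, None]

rng = np.random.default_rng(0)
cov0 = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical in-control covariance
X = rmvt(200_000, [0.0, 0.0], cov0, df=10, rng=rng)
```

Matching the covariance rather than the scale matrix keeps the in-control covariance identical across the normal and t models, so differences in chart behaviour reflect tail weight rather than a change in scale.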
In addition, we employ the multivariate gamma distribution to model high skewness,
, where
and
are the shape and scale parameters of the distribution respectively. Throughout this article, we use
 . Note that for gamma-distributed data, the in-control covariance matrix,
, changes to
(since
) in the out-of-control model. We employed
 . Details of the multivariate t and gamma distributions can be found in the appendix of Stoumbos and Sullivan [15].
4.3. How Non-Normal Are the Data?
In this section, we test the normality of Phase I data simulated from different models using the Henze and Zirkler multivariate normality test. We run 10,000 simulations of the test for different values of m and compute the percentage of tests that do not reject normality. Table 2 shows the results. We require Phase I samples of
,
and
for
t-distributed data with 10, 15 and 30 degrees of freedom, respectively, to be confident that the data are non-normal. Moreover, we need
,
and
for
,
and
 , respectively, to be certain that the skewed data fail the multivariate normality test.
Hence, for small and moderate Phase I sample sizes, it is very challenging and sometimes impossible to distinguish between normally and slightly non-normally distributed data. There is therefore a need for control charts that are robust to deviations from the normality assumption. We recommend the use of robust charts when the Phase I sample size is less than 500, since testing cannot distinguish between normally and non-normally distributed data with high accuracy.
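For readers who want to reproduce a study of this kind, the Henze and Zirkler statistic and its usual lognormal approximation for the p-value can be sketched from scratch as below. This is an illustrative implementation, not the code used for Table 2; it uses the maximum-likelihood covariance (divisor m) and should be cross-checked against an established implementation before use. The helper `non_rejection_rate` is hypothetical and mirrors the percentage of non-rejections reported above.

```python
import numpy as np
from scipy.stats import lognorm

def henze_zirkler(X):
    """Henze-Zirkler multivariate normality test; returns (statistic, p-value)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    M = Xc @ S_inv @ Xc.T
    Di = np.diag(M)                            # squared Mahalanobis distances to the mean
    Dij = Di[:, None] + Di[None, :] - 2.0 * M  # pairwise squared Mahalanobis distances
    b2 = 0.5 * ((2 * p + 1) * n / 4.0) ** (2.0 / (p + 4))  # smoothing parameter beta^2
    hz = (np.exp(-0.5 * b2 * Dij).sum() / n
          - 2.0 * (1 + b2) ** (-p / 2) * np.exp(-b2 / (2 * (1 + b2)) * Di).sum()
          + n * (1 + 2 * b2) ** (-p / 2))
    # Lognormal approximation to the null distribution of the statistic
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)
    mu = 1 - a ** (-p / 2) * (1 + p * b2 / a + p * (p + 2) * b2**2 / (2 * a**2))
    var = (2 * (1 + 4 * b2) ** (-p / 2)
           + 2 * a ** (-p) * (1 + 2 * p * b2**2 / a**2 + 3 * p * (p + 2) * b2**4 / (4 * a**4))
           - 4 * w ** (-p / 2) * (1 + 3 * p * b2**2 / (2 * w) + p * (p + 2) * b2**4 / (2 * w**2)))
    log_mu = np.log(np.sqrt(mu**4 / (var + mu**2)))
    log_sd = np.sqrt(np.log((var + mu**2) / mu**2))
    return hz, lognorm.sf(hz, log_sd, scale=np.exp(log_mu))

def non_rejection_rate(sample, m, reps=1000, alpha=0.05, seed=0):
    """Share of `reps` simulated samples of size m for which normality is not rejected."""
    rng = np.random.default_rng(seed)
    return float(np.mean([henze_zirkler(sample(rng, m))[1] > alpha for _ in range(reps)]))
```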
4.4. Analysis of Phase I Estimators on Different Distributions
In this section, we analyze how the Phase I sample size affects the covariance matrix estimators under different distributions (normal, gamma, t) when the process is in control. The variability of the trace of the covariance matrix estimates for different values of m is presented in Table 3. We observe that the variability decreases as m increases. The variability is smallest under the multivariate normal distribution when estimating
. The non-parametric covariance matrix estimator (
) has similar variability irrespective of the distribution.
4.5. Performance of the Dispersion Charts
We simulated the
 values for each of the four competing control charts under the different scenarios discussed in Section 4.2. The results are displayed in Tables 4, 5, 6 and 7, where gray highlighted values correspond to in-control performance and bold face indicates the best-performing chart, i.e., the lowest
value. The
 values in italics indicate the charts that are most robust to non-normality in terms of in-control performance. For Tables 6 and 7, which show results for the scenarios with decreasing shifts, we set the maximum run length to 10,000 in the simulations for the MEWMS and MaxMEWMV charts, because these charts are not designed to detect decreasing shifts and therefore have extremely long run lengths. This maximum keeps the simulation time reasonable. We use an asterisk to indicate any
 -value that is affected by this truncation and is therefore an underestimate of the actual
. In addition, we represent
values greater than 1000 by
.
4.5.1. Case 1: Overall Increase Shifts in the Variance-Covariance Matrix
For case 1, the results for different numbers of correlated quality characteristics (
, 5, 25) under multivariate normal, gamma and t distributions when
 are reported in Table 4. For small and moderate increases in the process variance, the MEWMS chart performs best among the competing charts. For example, when
and
, the MEWMS chart has the lowest
value of 42, while the
 of the MaxMEWMV, the MNSE and the REWMV control charts are 63, 203 and 55, respectively. In addition, the parametric charts (REWMV, MEWMS, and MaxMEWMV) perform almost equally well for large increase shifts (
 ) in the process, especially for large values of p.
When the process variances are unchanged (i.e., ) but the correlation increases (), the MaxMEWMV chart shows the best performance for while the MNSE control chart outperforms the other compared charts for and . When , the MaxMEWMV control chart performs best for simultaneous shifts in both the process variances and the correlation. However, the MEWMS control chart has the best performance among the compared charts for and .
We also observe that the values of the REWMV and MNSE control charts under the normal, gamma and t distributions are almost equal for each of the process shifts we consider, i.e., they are robust to deviations from the normality assumption. However, the disadvantage of the MNSE chart is that it does not respond to overall increase shifts in the process variance: all its values are approximately 200. The MEWMS and MaxMEWMV control charts are seriously affected by slight deviations from the normality assumption. For instance, the values of the MEWMS and MaxMEWMV control charts are 119 and 120, respectively, with gamma-distributed data. The MEWMS and MaxMEWMV charts still have the lowest values under the non-normal distributions, but note that their in-control performance shows the highest false alarm rate under the t and gamma distributions.
4.5.2. Case 2: Sparse Increase Shifts in the Variance-Covariance Matrix
Next, we consider a scenario where only one variable increases. Without loss of generality, we select the first diagonal element of the covariance matrix to change, where
and
. Here, we simulated for
 and 2 under the multivariate normal, gamma and t distributions; the results are reported in Table 5.
The MEWMS control chart performs best, while the MNSE chart performs worst, when only the process variances shift. For instance, when at under normally distributed data, the for the MEWMS and MaxMEWMV charts are 134 and 142, respectively, while those of the MNSE and REWMV control charts are 191 and 150, respectively.
In addition, the MaxMEWMV chart has the best performance for when the process variances are unchanged but the correlation increases (), while the MNSE control chart outperforms the other compared charts for and . The MaxMEWMV chart performs best when and for simultaneous shifts in the process variance and correlation, while the MNSE chart outperforms the other charts for .
Overall, the out-of-control performance of the REWMV control chart is the most robust to non-normality for a sparse increase shift in the process variance among the compared parametric charts.
4.5.3. Case 3: Overall Decrease Shifts in the Variance-Covariance Matrix
We simulated for the overall decrease shifts (
0.2, 0.3, 0.4, 0.6, and 0.8) in the covariance matrix when
or
(case 3). The simulation results for this scenario when
for
 under the multivariate normal, gamma and t distributions are presented in Table 6. The MEWMS and MaxMEWMV control charts are inefficient at detecting decreasing shifts in the variance, as their
values in most cases are greater than 1000. We notice that the REWMV chart has the best performance in detecting overall decrease shifts in the process dispersion.
The MNSE control chart does not react to downward shifts in the process. For example, when at and , the value of the MNSE chart is 201 and those of the MEWMS and MaxMEWMV charts are greater than 1000, whereas the proposed REWMV control chart has the lowest value, 12. In addition, with t- and gamma-distributed data, the out-of-control performance of the REWMV chart is robust to non-normality.
4.5.4. Case 4: Sparse Decrease Shifts in the Variance-Covariance Matrix
Consider case 4, where only the first variance decreases; the results are displayed in Table 7. The proposed REWMV control chart outperforms the other charts, irrespective of the value of p, when sparse decrease shifts are introduced under the multivariate normal, gamma and t distributions and
. The MEWMS and the MaxMEWMV charts show the worst performance among the compared charts. For instance, when
at
, the
 values of the MEWMS and MaxMEWMV charts are 714 and 319, respectively, under the normal distribution, while the REWMV control chart has the lowest
 value of 76. The MNSE chart performs best when there are simultaneous shifts in both the process variances and the correlations.
Again, the MNSE control chart does not detect any change, as its values are approximately 200 under the multivariate t and gamma distributions. Among the compared parametric charts, the out-of-control performance of the REWMV chart is the most robust to departures from the normality assumption. For instance, the AARL values of the REWMV chart under the normal, gamma, and t distributions are 56, 57, and 49, respectively, for , and while those of the MaxMEWMV chart are 319, 117, and 155, respectively. However, the excessive false alarm rates of the MEWMS and MaxMEWMV control charts under non-normality also affect their out-of-control performance, such that their are lower than those of the MNSE and REWMV control charts when .
4.6. Further Investigation on Robustness
In this section, we further investigate the robustness of the in-control performance of the charts to non-normality; the results are displayed in Figure 1. We used t distributions with degrees of freedom ranging from 15 to 50 to model moderate to high kurtosis for the four multivariate EWMA dispersion charts considered. The proposed chart has the most robust performance among the three parametric charts because its
 is closest to 200. Moreover, its performance is very similar to that of the MNSE chart.
In addition, for all the values of we considered for the gamma-distributed data, the proposed REWMV and MNSE charts have approximately . In general, the MaxMEWMV and MEWMS charts show the worst performance in terms of robustness to non-normally distributed data.
Overall, Tables 4, 5, 6 and 7 and Figure 1 show that the REWMV and MNSE control charts consistently have the in-control performance that is most robust to non-normality. However, the MNSE chart performs very poorly for overall shifts in the process variance. The MEWMS and MaxMEWMV control charts have the best performance for increasing shifts in the process variances, but only if the data are perfectly normally distributed. The proposed REWMV control chart shows the best performance for overall decreasing shifts in the process variances and performance comparable to that of the MEWMS and MaxMEWMV charts for increases. In addition, it remains robust when the data deviate slightly from normality.
6. Conclusions and Recommendations
In this paper, we introduced the REWMV control chart, which is effective in detecting changes in the process dispersion when monitoring multivariate processes based on individual observations. The chart is robust to slight deviations from the normality assumption: it maintains its expected in-control performance and detects shifts reasonably quickly, and, unlike most existing methods, it does not suffer from excessive false alarms. The chart applies the logarithm to the diagonal elements of the estimated covariance matrix. We compared the performance of the REWMV control chart with that of some existing multivariate EWMA dispersion charts based on estimated parameters under normally and non-normally distributed data.
The REWMV chart outperforms the other competing charts when there is an overall decrease shift in the covariance matrix and shows comparable performance for increase shifts. In general, the MEWMS and MaxMEWMV control charts perform poorly for decreases in the process variances, while they perform best among the compared charts for increasing shifts in the covariance matrix. Moreover, the nonparametric MNSE chart performs poorly for overall increases or decreases in the process dispersion.
Next, we investigated the robustness of the REWMV control chart and the other multivariate EWMA dispersion charts for individual observations. We showed that the in-control performance of the MaxMEWMV and MEWMS control charts suffers from an excessive rate of false alarms under the gamma and t distributions. Both the MNSE and REWMV control charts are robust to non-normality in terms of in-control and out-of-control performance, with the MNSE chart being almost perfectly distribution-free.
In this article, we used only constant control limits for the proposed REWMV chart, with two separate charts for detecting upward and downward shifts in the process dispersion. For future work, we recommend investigating probability limits to enable simultaneous monitoring of both increasing and decreasing shifts in the covariance matrix. In addition, this article is restricted to the process mean vector and covariance matrix; shifts in other shape parameters could also be investigated with the REWMV chart.
The focus of this paper is on monitoring process variability with individual observations; however, the proposed chart can also be extended to grouped observations by incorporating the sample covariance matrix of each group into the EWMA statistic of Equation (6). Moreover, this study could be extended by investigating the robustness of some Phase I covariance matrix estimators to outliers with the proposed REWMV control chart. In addition, combining two multivariate control charts to monitor the process mean vector and covariance matrix simultaneously sometimes loses the ability to provide detailed diagnostic information when a process is out of control. Hence, the performance of the proposed REWMV chart in such out-of-control scenarios could be investigated.