Cumulative Sum Chart Modeled under the Presence of Outliers

Abstract: Cumulative sum control charts based on estimated control limits are extensively used in practice. Such control limits are subject to Phase I estimation error, which can change the location and/or width of the control limits and degrade the performance of the control chart. In this study, we introduce the non-parametric Tukey outlier detection model into the design structure of a two-sided cumulative sum (CUSUM) chart with estimated parameters for process monitoring. Using Monte Carlo simulations, we studied the effect of estimation on the performance of the CUSUM chart in terms of the average run length and the standard deviation of the run length. We found that the new design structure is more stable in the presence of outliers and requires fewer Phase I observations to stabilize the run-length performance. Finally, a numerical example and a practical application of the proposed scheme are demonstrated using a dataset from healthcare surveillance, where the received signal strength of individuals' movement is the variable of interest. The implementation of the classical CUSUM chart shows that the detection of a shift in the Phase II received signal strength data is masked or delayed if there are outliers in the Phase I data. In contrast, the proposed chart screens out the Phase I outliers and gives a timely signal in Phase II.


Introduction
The cumulative sum (CUSUM) control chart is an effective monitoring tool widely used in industrial and medical processes for quality improvement [1]. The scheme was introduced by [2] as an alternative to the traditional Shewhart control chart. The CUSUM chart statistic accumulates past and current information on the process, which makes it more sensitive to small and moderate shifts than the traditional Shewhart control chart. Designing a CUSUM control chart requires setting up the control limit, for which the in-control parameters are often assumed to be known. However, this assumption is not realistic, and hence the CUSUM chart is implemented in two phases. In Phase I, random observations are collected from a stable process and used to estimate the unknown parameters. In Phase II, the estimates from the earlier observations are used to construct the CUSUM chart that monitors and detects changes in the process [3].
The ability of a CUSUM chart to handle changes in the process effectively in Phase II largely depends on the accuracy of the parameters estimated in Phase I. Furthermore, higher chances

The rest of the article is organized as follows. In the next section, we give an overview of the two-sided CUSUM chart with estimated parameters, followed by the performance measures in terms of the run length (RL) properties. Section 3 presents the practitioner-to-practitioner variation in the performance of the CUSUM chart. The section also discusses the effect of estimation error on the CUSUM control limits. In Section 4, we give the design structure of the CUSUM chart in the presence of outliers and analyze the effect of extreme values on its in-control performance. The introduction of the Tukey outlier detection model into the CUSUM chart is presented in Section 5. An application example that illustrates the practical use of the scheme is given in Section 6. Finally, we provide some concluding remarks in Section 7.

Overview of CUSUM Charts with Estimated Parameters
Let $X_{i1}, X_{i2}, X_{i3}, \ldots, X_{in}$, for $i = 1, 2, 3, \ldots$, be independent random observations of size $n$ from a normal process with a known in-control mean $\mu_0$ and standard deviation $\sigma_0$. The upper- and lower-sided CUSUM chart statistics, $\mathrm{CUSUM}^+_i$ and $\mathrm{CUSUM}^-_i$, for monitoring upward and downward changes in the process location parameter are given, respectively, by Equation (1), where $\max[a, b]$ and $\min[a, b]$ are the maximum and minimum of $a$ and $b$, respectively. The statistic $\bar{X}_i = (1/n)\sum_{j=1}^{n} X_{ij}$ is the mean of the $i$th sample, and $k$ is the reference value. The initial values, $\mathrm{CUSUM}^+_0$ and $\mathrm{CUSUM}^-_0$, are usually set equal to zero. The chart gives an out-of-control signal when either $\mathrm{CUSUM}^+_i$ or $\mathrm{CUSUM}^-_i$ exceeds the predetermined control limit, $h$. The value of $h$ is usually chosen to satisfy the desired in-control RL property.
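The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standardized form of the statistic, in which each subgroup mean is first converted to $\sqrt{n}(\bar{X}_i - \mu_0)/\sigma_0$, and it treats the lower-sided statistic as signaling when it falls below $-h$. The function name `two_sided_cusum` is hypothetical.

```python
from math import sqrt


def two_sided_cusum(sample_means, mu0, sigma0, n, k, h):
    """Run a two-sided CUSUM on subgroup means (standardized form, an assumption).

    Returns (signal_index, path), where signal_index is the 1-based index of the
    first out-of-control signal (or None) and path holds the (C+, C-) pairs.
    """
    c_plus, c_minus = 0.0, 0.0  # initial values set to zero
    path = []
    for i, xbar in enumerate(sample_means, start=1):
        z = sqrt(n) * (xbar - mu0) / sigma0      # standardized subgroup mean
        c_plus = max(0.0, c_plus + z - k)        # accumulates upward deviations
        c_minus = min(0.0, c_minus + z + k)      # accumulates downward deviations
        path.append((c_plus, c_minus))
        if c_plus > h or c_minus < -h:           # either side crosses its limit
            return i, path
    return None, path


if __name__ == "__main__":
    # Five in-control means followed by a sustained upward shift of one unit.
    means = [0.0] * 5 + [1.0] * 10
    print(two_sided_cusum(means, mu0=0.0, sigma0=1.0, n=1, k=0.5, h=4.0)[0])
```

Because both statistics are reset towards zero whenever the accumulated deviation is smaller than $k$, isolated small fluctuations do not build up, while a sustained shift does.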
However, if the process parameters are unknown, then $\mu_0$ and $\sigma_0$ are replaced by their corresponding Phase I estimates. Let $X_{ij}$, $i = 1, 2, 3, \ldots, m$ and $j = 1, 2, 3, \ldots, n$, denote $m$ random samples, each of size $n$, of Phase I observations from a stable process. Then the unbiased estimator of $\mu_0$ is the overall sample mean, given by

$$\hat{\mu}_0 = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}, \qquad (2)$$

and, for the estimator of $\sigma_0$ when the subgroup size $n > 1$, we used the pooled standard deviation, recommended by researchers such as Chen [22], Mahmoud, Henderson [23] and Nazir, Abbas [24]:

$$S_{\text{pooled}} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} S_i^2}. \qquad (3)$$

Here, $S_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_{ij} - \bar{X}_i\right)^2$ is the variance of the $i$th Phase I sample. The unbiased estimator is defined by

$$\hat{\sigma}_0 = \frac{S_{\text{pooled}}}{c_4\left(m(n-1)+1\right)}, \qquad (4)$$

where the constant $c_4(w) = \sqrt{2/(w-1)}\,\Gamma(w/2)/\Gamma[(w-1)/2]$ is the bias correction constant that depends on $m$ and $n$. Thus, the corresponding two-sided CUSUM chart statistics based on the estimated parameters are defined as in Equation (5), that is, Equation (1) with $\mu_0$ and $\sigma_0$ replaced by $\hat{\mu}_0$ and $\hat{\sigma}_0$.

The statistical performance of a CUSUM chart is often evaluated in terms of its RL distribution [25]. For a two-sided CUSUM chart with initial value $\mathrm{CUSUM}_0 = z$, where $z \in (-h, h)$, the probability mass function [26] is given by $pr(rl \mid z) = P(RL = rl \mid \mathrm{CUSUM}_0 = z)$. For the single case $rl = 1$, it is given by Equation (6), where $v = \hat{\sigma}_0/\sigma_0$, $\lambda = \sigma/\sigma_0$, $\delta = \sqrt{n}(\mu - \mu_0)/\sigma_0$, $u = \sqrt{mn}(\hat{\mu}_0 - \mu_0)/\sigma_0$ and $\Phi(\cdot)$ denotes the standard normal distribution function. For the case $rl > 1$, it is given by Equation (7), where $\phi(\cdot)$ is the standard normal density function.

The most commonly used RL property for evaluating the performance of a control chart is the average run length (ARL), which represents the average number of samples plotted on a control chart before the chart issues a signal. The ARL measures how quickly a control chart responds to changes in a process. If Equation (7) is denoted by $g(u, v)$, for simplicity, then the ARL can be defined by the integral equation in Equation (8) [26,27], where $f(v)$ is the scaled chi ($\chi$) distribution with $m(n-1)$ degrees of freedom, obtained from $c\chi/\sqrt{m(n-1)}$, and $c$ is a scale factor. The standard deviation of the run length (SDRL) is sometimes used as a supplementary measure. The SDRL is the standard deviation of the number of samples until the chart gives an out-of-control signal, that is,

$$\mathrm{SDRL} = \sqrt{E(RL^2) - \mathrm{ARL}^2}, \qquad (9)$$

where $E(RL^2)$ is the second moment of the run length.

For an in-control process, denote the ARL by $\mathrm{ARL}_0$, which in practice should be sufficiently large to avoid unnecessary false signals. Furthermore, denote the out-of-control ARL by $\mathrm{ARL}_1$, which should be small enough to enable early detection of changes in a process. The above RL properties of a two-sided CUSUM chart may be obtained by evaluating $f(v)$, but unfortunately, it cannot be computed exactly. Hence the need for approximation using Gaussian quadrature, a Markov chain approximation or Monte Carlo simulation. Given the advances in computing software, we followed the simulation approach, as recommended by several authors in the control chart literature.
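The Phase I estimators of this section (the overall sample mean, and the pooled standard deviation corrected by $c_4$) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes equal subgroup sizes and applies the bias correction with argument $m(n-1)+1$, which is the value that makes the pooled standard deviation unbiased under normality. The function names `c4` and `phase1_estimates` are hypothetical.

```python
from math import exp, lgamma, sqrt
from statistics import mean, variance


def c4(w):
    """Bias correction constant c4(w) = sqrt(2/(w-1)) * Gamma(w/2) / Gamma((w-1)/2)."""
    # lgamma keeps the ratio of gamma functions numerically stable for large w.
    return sqrt(2.0 / (w - 1)) * exp(lgamma(w / 2.0) - lgamma((w - 1) / 2.0))


def phase1_estimates(samples):
    """Estimate (mu0, sigma0) from m Phase I subgroups of equal size n > 1."""
    m = len(samples)
    n = len(samples[0])
    # With equal subgroup sizes the mean of subgroup means is the overall mean.
    mu_hat = mean(mean(s) for s in samples)
    # Pooled standard deviation: root of the average subgroup variance.
    s_pooled = sqrt(mean(variance(s) for s in samples))
    # Assumption: correction uses m(n-1)+1, i.e. pooled degrees of freedom plus one.
    sigma_hat = s_pooled / c4(m * (n - 1) + 1)
    return mu_hat, sigma_hat


if __name__ == "__main__":
    print(phase1_estimates([[1, 2, 3], [2, 3, 4]]))
```

For large $m$ the correction constant approaches one, so the correction matters mainly when only a few Phase I subgroups are available.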

Variability in the CUSUM Chart Performance
For the location control chart, the process is assumed to be initially stable with an in-control mean $\mu_0$ and standard deviation $\sigma_0$. After a certain point in time, the mean changes from the target value $\mu_0$ to an out-of-control value $\mu_1 = \mu_0 + \delta\sigma_0$, thus requiring quick detection of such changes. Without loss of generality, we assumed that the in-control process is normally distributed. To study the so-called practitioner-to-practitioner variation in the performance of the CUSUM chart, 100,000 seeded iterations, each with samples of size $n = 5$, were generated from the standard normal distribution $N(0, 1)$. We then set up the charts with $k = 0.25, 0.50, 0.75$ and $1.00$, using the corresponding control limits $h = 6.8516, 4.1713, 2.9332$ and $1.0894$, which give an in-control $\mathrm{ARL}_0$ of 200. We used a simulation approach, based on an algorithm developed in R, to compute the distributional properties of the CUSUM chart in terms of the ARL and SDRL for different shift values $\delta$ when the control chart parameters $\mu_0$ and $\sigma_0$ are known; the results are presented in Table 2. These results are in agreement with the theoretical values of a classical two-sided CUSUM chart [2].

The unknown in-control process parameters, on the other hand, were estimated from $m = 10, 50, 100, 500$ and $1000$ in-control Phase I samples, each of subgroup size $n = 5$. Substituting the unknown parameters with their corresponding estimates, the Phase II two-sided CUSUM control charts were developed. For each fixed value of $k = 0.25, 0.50, 0.75$ and $1.00$, the control limit $h$ was determined through simulations to obtain the desired in-control $\mathrm{ARL}_0$ of 200. Here, all the observations are from $N(\hat{\mu}_0, \hat{\sigma}_0^2)$. The ARL and SDRL values were computed using 100,000 simulation iterations. To see clearly the effect of each estimated process parameter on the performance of a CUSUM chart, we considered the cases when the sample mean, the sample standard deviation or both were estimated. The results obtained are given in Tables 3-5.
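The known-parameter simulation described in this section can be sketched as below. This is a small Python illustration and not the R algorithm used in the study: it simulates the standardized subgroup mean directly as $N(\delta, 1)$ (so `delta` is the shift of the standardized statistic, an assumption), uses far fewer replications than the 100,000 reported, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def run_length(rng, k, h, delta=0.0, max_steps=100_000):
    """Number of samples until a two-sided CUSUM signals (capped at max_steps)."""
    c_plus = c_minus = 0.0
    for t in range(1, max_steps + 1):
        z = rng.gauss(delta, 1.0)            # standardized subgroup mean
        c_plus = max(0.0, c_plus + z - k)
        c_minus = min(0.0, c_minus + z + k)
        if c_plus > h or c_minus < -h:
            return t
    return max_steps


def simulate_arl_sdrl(k, h, delta=0.0, reps=2000, seed=1):
    """Monte Carlo estimates of the ARL and SDRL from `reps` simulated run lengths."""
    rng = random.Random(seed)                # seeded for reproducibility
    rls = [run_length(rng, k, h, delta) for _ in range(reps)]
    return mean(rls), stdev(rls)


if __name__ == "__main__":
    print("in-control:", simulate_arl_sdrl(k=0.5, h=4.1713))
    print("shifted:   ", simulate_arl_sdrl(k=0.5, h=4.1713, delta=1.0))
```

With $k = 0.50$ and $h = 4.1713$, the in-control estimate should land near the nominal value of 200, up to Monte Carlo error that shrinks as `reps` grows.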

Effect of Estimation on the Two-Sided CUSUM Chart Performance
The results in Tables 2 and 3 show that, for a fixed $\mathrm{ARL}_0 = 200$, a small number of Phase I samples produced out-of-control ARL and SDRL values (cf. Table 3) that were higher than the known-parameter values in Table 2. This indicates that using a small number of Phase I samples to estimate the process mean had direct consequences for the performance of a two-sided CUSUM chart. It follows from Table 4 that the out-of-control ARL was relatively smaller than desired. Hence, estimating the standard deviation from Phase I samples had less impact on the ARL performance of the CUSUM chart. However, the very large values of the accompanying SDRLs when $m$ was small mean that a large number of Phase I samples is required. This was also the case when both parameters were estimated (cf. Table 5). In all three cases, Tables 3-5, the ARL and SDRL values moved closer to the desired values in Table 2 as the number of Phase I observations, $m$, increased. Furthermore, parameter estimation had a more adverse impact on the performance of a two-sided CUSUM chart based on a smaller reference value $k$, that is, one designed for quick detection of very small changes in the process mean.

Effect of Estimation on Two-Sided CUSUM Control Limits
To study the effect of estimation error on the two-sided CUSUM control limits, we used a sample size of $n = 5$ and set the in-control $\mathrm{ARL}_0$ to 200. For each value of $k = 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75$ and $2.00$, the corresponding value of the control limit $h$ was computed based on 100,000 iterations. Table 6 presents the two-sided CUSUM control limits for values of $m$ ranging from 10 to 1000. Once again, using a small number of Phase I observations $m$ to estimate the unknown in-control chart parameters gives a control limit that is higher or lower than the desired value when the mean or the standard deviation is estimated, respectively. As with the ARL performance, and as the percentage error curves displayed in Figure 1 show, quite a large number of Phase I samples was required to achieve the desired control limit. The problem, however, is the availability of such an amount of Phase I data in practical applications. Hence the need to design a more robust scheme that can minimize the practitioner-to-practitioner variation, particularly when extreme values or outliers are involved.

The Outliers and CUSUM Chart with Estimated Parameters
The effect of estimation errors on the performance of a CUSUM chart may be further aggravated if there are extreme values in the Phase I samples. Both the in-control and the out-of-control ARL and SDRL values will then differ from those of the theoretical CUSUM charts. In this section, we evaluated the effects of outliers on the performance of a two-sided CUSUM control chart with estimated parameters. Using a simulation approach, outliers were generated from a mixture distribution in which $(1 - \alpha)100\%$ of the observations were regular observations from $N(\mu, \sigma^2)$ and the remaining $\alpha 100\%$ came from a multiple of $\chi^2_{(n)}$ with $n$ degrees of freedom [28]. That is, each observation was generated from the mixture distribution in Equation (10), where $\alpha$ is the probability of having a multiple of $\chi^2_{(1)}$ added and $w \geq 1$ is the outlier model multiplier. A value of $\alpha = 0$ indicates that no outliers are present in the sampled data. Without loss of generality, we set $\mu = 0$ and $\sigma^2 = 1$. The value of $w$ was set equal to 1, 2 or 3, corresponding to small, medium and large outliers, respectively.
The mean and the variance of the mixture distribution in Equation (10) are derived in Equations (11) and (12), respectively. We set up a CUSUM chart using the same design parameters, $n$, $k$, $h$ and $m$, as in Section 3. The in-control ARL and SDRL values for the two-sided CUSUM chart based on this model with $\alpha = 0.00, 0.01, 0.02, 0.03$ and $0.04$ are presented in Tables 7-9. To save space, we restricted the study to the in-control cases, having seen the behavioral pattern of the out-of-control cases in Tables 3-5.

From Tables 7-9, it was observed that estimating $\mu$, $\sigma$ or both in the presence of outliers ($\alpha > 0$) to set up a CUSUM chart had a significant effect on the ARL and SDRL performance of the chart, particularly when the number of Phase I samples, $m$, was small. The in-control ARLs were approximately equal to the limiting value of 200 when $\alpha = 0$. As expected, the RL values were directly proportional to $k$ and $\alpha$. That is, the in-control ARL and SDRL deteriorated, with an increasing false alarm rate, as $k$, $\alpha$ or both design parameters increased. In fact, the deterioration became more alarming as the outlier multiplier increased ($w > 1$). Furthermore, as the number of Phase I samples increased, the ARL approached its theoretical value, and much faster than its corresponding SDRL (cf. Table 7). However, this was not the case for Tables 8 and 9 when $m \geq 500$. In general, increasing the amount of Phase I data will reduce the occurrence of false alarms and bring the RL closer to its theoretical value. Unfortunately, this may not be feasible in practice. Thus, we suggest a design structure based on the robust Tukey outlier detection model.
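A contaminated Phase I sample of the kind described in this section can be generated as in the sketch below. The exact form of Equation (10) is not reproduced here, so the sketch makes an explicit assumption: with probability $\alpha$ a multiple $w$ of a $\chi^2_{(1)}$ variate is added to a regular normal observation, and otherwise the observation is left as a regular normal draw. The function name `contaminated_sample` is hypothetical.

```python
import random


def contaminated_sample(rng, size, alpha, w, mu=0.0, sigma=1.0):
    """Draw `size` observations from an assumed outlier mixture model.

    Assumption: each observation is N(mu, sigma^2), and with probability alpha
    an extra term w * chi-square(1) is added to it.
    """
    out = []
    for _ in range(size):
        x = rng.gauss(mu, sigma)
        if rng.random() < alpha:
            # A chi-square(1) variate is the square of a standard normal draw.
            x += w * rng.gauss(0.0, 1.0) ** 2
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(2020)
    # 50 Phase I observations (m = 10 subgroups of size n = 5) with large outliers.
    data = contaminated_sample(rng, size=50, alpha=0.04, w=3)
    print(max(data), min(data))
```

Under this assumed model the contamination is one-sided: since a $\chi^2_{(1)}$ variate has mean one, the expected value of an observation shifts upward from $\mu$ to $\mu + \alpha w$, which is why both the estimated mean and the estimated spread are inflated.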

Performance of the Tukey CUSUM Control Chart
In this section, we studied the performance of the proposed Tukey-model-based CUSUM control chart with estimated parameters. Let $X_1, X_2, \ldots, X_n$ denote the Phase I sample and $\tilde{X}$ be the sample median. Then an observation $X_k$ from $X_1, X_2, \ldots, X_n$ is declared an outlier if $|X_k - \tilde{X}| > p \times \mathrm{IQR}$, where $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range, and $Q_1$ and $Q_3$ are the first and third quartiles of $X_1, X_2, \ldots, X_n$, corresponding to the 25th and 75th percentiles, respectively. The constant $p$ is the confidence factor, commonly chosen between 1.5 and 3.0. The confidence factor of Tukey's detector is selected so that it is not too small, which would lead to unnecessary screening of observations that are not outliers, and not too large, which would leave the detector unable to detect any outliers. For this reason, $p$ was chosen to be 2.2 for the current study (for more details on Tukey's outlier detector, see Tukey [28]).
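The screening rule just described can be sketched as follows. It is a minimal illustration that uses only the Python standard library; the quartile convention of `statistics.quantiles` (its default "exclusive" method) is an assumption, since the text does not say which quartile definition was used, and the function name `tukey_screen` is hypothetical.

```python
from statistics import median, quantiles


def tukey_screen(data, p=2.2):
    """Split data into (kept, removed) using |x - median| > p * IQR as the outlier rule."""
    med = median(data)
    q1, _, q3 = quantiles(data, n=4)   # first quartile, median, third quartile
    limit = p * (q3 - q1)              # detection limit p * IQR
    kept = [x for x in data if abs(x - med) <= limit]
    removed = [x for x in data if abs(x - med) > limit]
    return kept, removed


if __name__ == "__main__":
    phase1 = [1.4, 1.5, 1.6, 1.5, 1.45, 1.55, 1.5, 1.52, 1.48, 5.0]
    kept, removed = tukey_screen(phase1)
    print(len(kept), removed)
```

Because both the center (median) and the spread (IQR) are themselves resistant to extreme values, a few gross outliers cannot widen the detection limits enough to hide themselves.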
Once an outlier is detected in the Phase I sample using Tukey's model, it is screened out, and the remaining data points are used to estimate the mean and variance of the process. After the suspected outliers are screened, the distribution of the remaining Phase I data points changes from a mixture distribution to a truncated mixture distribution. Here, the truncation limits are set to $\mathrm{LDL} = \tilde{X} - 2.2 \times \mathrm{IQR}$ and $\mathrm{UDL} = \tilde{X} + 2.2 \times \mathrm{IQR}$, where LDL and UDL are the lower and upper detection limits, respectively. Finally, the truncated mean and variance of the Phase I data points are defined, respectively, by Equations (13) and (14), where $g(x) = f(x)$ for $\mathrm{LDL} < x < \mathrm{UDL}$ and $g(x) = 0$ otherwise, and $F_X(\cdot)$ is the cumulative distribution function of $X$.

The truncated mean and variance in Equations (13) and (14) were evaluated for different values of $\alpha$ and $w$ and are given in Table 10. Table 10 clearly indicates that mixing $\alpha 100\%$ outliers into the distribution disturbs the mean and variance, especially for larger values of $w$. In contrast, when the distribution is truncated (i.e., Tukey's outlier detector is applied), this disturbance in the mean and variance is negligible. In view of this, the estimates of the process mean and variance obtained from the truncated distribution (i.e., after screening the data using Tukey's model) are only minimally affected by outliers introduced in the Phase I samples.
Using the same design structure and parameters as in Sections 3 and 4, we computed the in-control ARL and SDRL values for the two-sided CUSUM control chart based on the Tukey outlier detection model with estimated parameters. Three cases were considered: when the mean, the standard deviation or both were estimated. To assess the performance of the proposed charts, we present in Figures 2-4 a graphical display of the in-control ARL values with $m = 10, 100, 500$ and $1000$ when the magnitude of the outlier multiplier $w$ is small ($w = 1$), medium ($w = 2$) and large ($w = 3$). We present only the case when both the mean and the standard deviation were estimated, as the other two cases led to similar conclusions. For a quick comparison, Figures 2-4 also show the in-control ARL values in the presence of outliers without screening. With the two charts side by side, we outline our findings under the following headings.

Performance Comparison with Respect to m
We saw earlier that the number of Phase I samples, $m$, had a significant effect on the performance of a CUSUM chart. Figures 2-4 show a vast difference in the reported $\mathrm{ARL}_0$ between the non-screened data and the data screened with the robust Tukey outlier detection model before constructing the CUSUM chart, particularly when $m$ was small. For example, in Figure 3, if $m = 10$, $k = 1.0$ and $\alpha > 0.04$, the $\mathrm{ARL}_0$ values for the non-screened data were in five figures, while those for the corresponding Tukey-screened data were relatively close to the target value. Even an increase in the number of Phase I observations without screening did not appear to have a significant impact on the chart's performance as the outlier multiplier increased. The Tukey-screened counterpart, however, moved closer to the limiting value $\mathrm{ARL}_0 = 200$ as $m$ increased.
In other words, using the Tukey outlier detector in the construction of a CUSUM chart would maintain the performance of the chart, even with only a handful of Phase I data.

Performance Comparison with Respect to α
If $\alpha = 0$, the in-control ARL values of the CUSUM charts were approximately equal to the theoretical value of 200, reflecting the absence of outliers in the Phase I data. However, as the magnitude of $\alpha$ increased, the ARL values for the non-screened data grew out of proportion, particularly when $m$ was small and $k > 0.5$. For example, if $m = 10$, $k = 1.0$ and $\alpha = 0.05$ in Figure 2, the $\mathrm{ARL}_0$ for the non-screened observations was 770, as against 280 when the Tukey outlier detection model was applied. Even with large values of $k$ and $\alpha$, the values for the Tukey-screened data moved closer to the nominal value as $m$ increased. The same conclusion could not be drawn for the non-screened data, as the in-control ARL values remained high when $\alpha$ was relatively large (cf. Figures 2-4). This means that Tukey's model would not only keep the $\mathrm{ARL}_0$ on target but also maintain the performance of the CUSUM control chart. In general, we observed that the effect of $\alpha$ was minimal when $k$ was small.

Performance Comparison with Respect to w
The larger the magnitude of the outlier multiplier $w$, the worse the in-control ARL value of a two-sided CUSUM chart. If the outliers in the Phase I data were not screened, the $\mathrm{ARL}_0$ became so large as $w$ increased that the capability of the CUSUM chart in process monitoring was seriously affected, unlike the Tukey-based chart, which tended to maintain the $\mathrm{ARL}_0$ at the target value. For example, if $m = 10$, $k = 0.25$ and $\alpha = 0.05$, the in-control ARL values for the non-screened data were 302, 634 and 1025 when $w = 1, 2$ and $3$, respectively, compared to ARL values of 217, 222 and 225 for the Phase I data screened by Tukey's model. Thus, the Tukey CUSUM chart could withstand the impact of the outlier multiplier $w$ relatively well compared to the chart based on non-screened data.

Illustrative Example
To illustrate the application of Tukey's outlier detector with the CUSUM control chart, we used a dataset from [3]. The variable of interest was the flow width measurement (in microns) for the hard-bake process. The data consisted of twenty-five in-control Phase I samples and twenty out-of-control Phase II samples, in which the average width had increased due to one or more assignable causes. The process mean and standard deviation were estimated from the Phase I samples (cf. Equations (2)-(4)) and were found to be 1.5056 and 0.14, respectively. These estimates were used to set up a CUSUM control chart for the Phase II samples.
The scatter plot in Figure 5a clearly shows that the observations shifted in location in Phase II. This is confirmed by the CUSUM chart plotted in Figure 5b, which indicates several out-of-control signals in Phase II. These findings provide evidence that the hard-bake process had a positive shift from subgroup number fifteen onwards.
Mathematics 2020, 8, x FOR PEER REVIEW
Now, using the data perturbation technique (cf. Kargupta, Datta [29] and Liu, Kargupta [30]), we introduced random outliers in different subgroups. The process mean and standard deviation were then estimated and found to be 1.554 and 0.21544, respectively. Based on these estimates, we constructed the limits, which were then used to monitor the location of the Phase II samples. In Figure 5c, the scatter plot depicts a slight upward change in Phase II, and the control chart presented in Figure 5d shows that the out-of-control signal in Phase II was delayed (to subgroup number twenty) due to the small number of outliers present in Phase I. This happened because the limits widened due to the variation in the Phase I estimates of the process mean and standard deviation.
Finally, using the above-mentioned contaminated Phase I data, we estimated the limit of Tukey's outlier detector, which was found to be $p \times \mathrm{IQR} = 0.44726$. Any value whose absolute deviation from the median (i.e., $|X - 1.5171|$) is greater than 0.44726 is therefore an outlier and needs to be screened from the data. Using the outlier detector, six observations were screened from the Phase I data. The process mean and standard deviation were then estimated and found to be 1.5123 and 0.152, respectively. These new estimates were similar to the estimates from the original data. The scatter plot of the data is given in Figure 5e, which also shows an upward trend in Phase II. The control chart presented in Figure 5f revealed that there was no change in the limits, and the chart detected an increase in the process mean at subgroup number sixteen.
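The screening limit reported in this example can be unpacked with a short calculation. The sketch below assumes that the confidence factor $p = 2.2$ chosen in Section 5 was also used for this dataset; the detection limits are then the reported median plus or minus the reported limit.

```latex
\begin{align*}
\mathrm{IQR} &= \frac{0.44726}{2.2} = 0.2033,\\
\mathrm{LDL} &= 1.5171 - 0.44726 = 1.06984,\\
\mathrm{UDL} &= 1.5171 + 0.44726 = 1.96436.
\end{align*}
```

Under this assumption, any contaminated Phase I value outside the interval $(1.06984, 1.96436)$ would be screened before the mean and standard deviation are re-estimated.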

Practical Application
In recent years, activity recognition (AR) has become an emerging research topic due to the advancement of electronic devices. AR is commonly used in pattern recognition, ubiquitous computing, human behavior modeling and human-machine interaction. In health care studies, different electronic devices are commonly used to recognize everyday life activities. In eldercare centers, such devices help to provide assistance and care to the elders and to ensure their safety and successful aging. Commonly, wearable devices and cameras are used to monitor everyday life activities, but these approaches suffer from several disadvantages such as intrusiveness, time-consuming processing and low resolution. Therefore, to overcome these challenges in real-time activity recognition, Hong, Kang [31] used an alternative method, the activity recognition system based on multisensor data fusion (AReM). For a more detailed introduction to AReM, see [32]. In the AReM system, information is gathered from an inertial sensor embedded in a smartphone and from a wireless sensor system placed between the user and the environment. In the wireless sensor network, the movement of an individual is measured through the received signal strength (RSS) between the user and the environment. For the AR dataset, [31] designed a competition in which three IRIS motes were placed on the chest and on the right and left ankles of an actor (cf. Figure 6).
From this wireless sensor network, data were recorded on the actor's activities, such as bending, cycling, standing, sitting, lying and walking. For the first task of heterogeneous AReM, they considered activities such as cycling and standing. For the application, our aim was to detect a change in the pattern of RSS generated through the heterogeneous AReM setup. The AR time series dataset contained 480 observations in total, with one observation obtained every 250 milliseconds. The average RSS for the three IRIS motes (i.e., rss12, rss13 and rss23) was available in 15 different sequences of each activity. In our application, we considered the first sequence of the rss13 IRIS mote (chest-left ankle). The average RSS of cycling was taken as the in-control Phase I sample, and the average RSS of standing was taken as the out-of-control Phase II sample. To assess the normality of the Phase I data set, we plotted a probability plot with a 95% confidence interval (cf. Figure 7) and also applied the Anderson-Darling test (AD = 0.517 and p-value = 0.189), which gave no evidence against the normality of the Phase I data set.
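As a rough illustration of the normality check mentioned above, the Anderson-Darling statistic can be computed with the standard library alone. The sketch below is an assumption-laden example, not the software used in the study: the function names are hypothetical, the sample is synthetic, and the p-value comes from a widely quoted piecewise approximation (usually attributed to D'Agostino and Stephens) for the case where the mean and standard deviation are estimated from the data.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def anderson_darling(sample):
    """Return (A2, A2_star): the AD statistic and its small-sample adjustment."""
    n = len(sample)
    xs = sorted(sample)
    mu, sigma = mean(xs), stdev(xs)
    std_norm = NormalDist()
    cdf = [std_norm.cdf((x - mu) / sigma) for x in xs]
    # A2 = -n - (1/n) * sum_{i=1..n} (2i-1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_star

def ad_pvalue(a_star):
    """Approximate p-value for the adjusted statistic (assumed approximation)."""
    if a_star >= 0.6:
        return math.exp(1.2937 - 5.709 * a_star + 0.0186 * a_star ** 2)
    if a_star > 0.34:
        return math.exp(0.9177 - 4.279 * a_star - 1.38 * a_star ** 2)
    if a_star > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * a_star - 59.938 * a_star ** 2)
    return 1.0 - math.exp(-13.436 + 101.14 * a_star - 223.73 * a_star ** 2)

# Synthetic stand-in for Phase I data (NOT the RSS measurements).
rng = random.Random(7)
demo = [rng.gauss(17.0, 3.5) for _ in range(50)]
a2, a2_star = anderson_darling(demo)
p_demo = ad_pvalue(a2_star)
```

Feeding the reported statistic 0.517 into `ad_pvalue` gives a value of about 0.19, which is in line with the p-value quoted in the text; whether the reported statistic was the raw or the adjusted one is not stated, so this agreement should be read as indicative only.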
The RSS values of the chest-left ankle mote belonging to the cycling activity were grouped into Phase I subgroups, and only the first 50 subgroups were used for plotting. Moreover, the first 25 subgroups based on the RSS values of the chest-left ankle mote belonging to the standing activity were used as Phase II samples. The dataset of 75 subgroups is reported in Table 11. The process mean and standard deviation were estimated from the Phase I samples and found to be 16.9734 and 3.4764, respectively. These estimates were then used to construct the CUSUM chart for the Phase II samples. Figure 8a,b presents the scatter plot of the original data and the control chart output, respectively. It is evident from Figure 8a that there was a downward shift in the Phase II samples, a point equally supported by the corresponding CUSUM chart, which gave an out-of-control signal right from the start of the plot in Figure 8b. To assess the effect of the outliers, we followed the same procedure as described in Section 6. First, we contaminated the Phase I data and used the resulting estimates, μ0 = 18.9924 and σ0 = 7.4018, to set up a CUSUM control chart for the Phase II data (cf. Figure 8c,d). Second, we used the Tukey outlier detector to screen the Phase I samples, recomputed the control chart parameters and used the resulting estimates, μ0 = 17.0593 and σ0 = 3.5543, to construct the CUSUM chart for the Phase II samples (cf. Figure 8e,f).
The introduction of outliers in the Phase I samples (Figure 8c) gave rise to wider control limits, which in turn delayed the out-of-control signal in the Phase II control chart (cf. Figure 8d). However, applying the outlier detector to the contaminated Phase I data screened out about ten data points (cf. Figure 8e). Consequently, the corresponding CUSUM chart in Figure 8f shows a behavioral pattern similar to that of the original data in Figure 8b.
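The masking effect described above, where an inflated Phase I standard deviation delays the Phase II signal, can be reproduced with a small sketch. It uses the standard textbook two-sided tabular CUSUM on standardized subgroup means, which is not necessarily the exact statistic used in this study. The reference value `k = 0.5`, the decision interval `h = 4.0`, the subgroup size `n = 5` and the idealized Phase II data are all assumptions made for illustration.

```python
import math

def cusum_first_signal(subgroup_means, mu0, sigma0, n, k=0.5, h=4.0):
    """Return the 1-based index of the first out-of-control signal, or None.

    Two-sided tabular CUSUM on z = (xbar - mu0) / (sigma0 / sqrt(n)):
      C+ = max(0, C+ + z - k),  C- = max(0, C- - z - k),  signal if either > h.
    k and h are illustrative defaults, not the paper's design constants.
    """
    se = sigma0 / math.sqrt(n)
    c_plus = c_minus = 0.0
    for i, xbar in enumerate(subgroup_means, start=1):
        z = (xbar - mu0) / se
        c_plus = max(0.0, c_plus + z - k)
        c_minus = max(0.0, c_minus - z - k)
        if c_plus > h or c_minus > h:
            return i
    return None

# Idealized Phase II: every subgroup mean sits 2 units below the target.
phase2_means = [15.0] * 40

# Same in-control mean, but sigma0 doubled to mimic contaminated Phase I data.
signal_clean = cusum_first_signal(phase2_means, mu0=17.0, sigma0=3.5, n=5)
signal_inflated = cusum_first_signal(phase2_means, mu0=17.0, sigma0=7.0, n=5)
```

With these made-up numbers, the chart built from the clean estimate signals at subgroup 6, whereas the chart built from the inflated estimate does not signal until subgroup 29. The same downward shift is therefore detected much later when outliers have widened the limits.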

Conclusions
In this article, we evaluated the in-control performance of a two-sided CUSUM control chart whose parameters were estimated in the presence of outliers, with and without screening by the robust Tukey detection model. Using a Monte Carlo simulation approach, the ARL and SDRL were computed for different numbers of Phase I samples.
The results show that a large amount of Phase I data was required to minimize the practitioner-to-practitioner variability. In the presence of outliers, an even larger amount of Phase I data was needed, which might not be realistic in practical applications. The results further revealed that using the Tukey outlier detector in the construction of a two-sided CUSUM control chart required fewer Phase I observations to stabilize the chart's performance. Therefore, it is reasonable to use Tukey's model in the design structure of a CUSUM chart with estimated parameters for efficient process monitoring, particularly when the observations are prone to outliers. The proposal is simple to design and easy to use, as demonstrated by the illustrative and application examples of the new Tukey CUSUM control chart. The scope of this study might be extended to other control chart designs, such as the Shewhart and exponentially weighted moving average charts.
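The Monte Carlo evaluation summarized above can be sketched as follows. This is a simplified illustration rather than the simulation design of the study: the function name, the pooled-variance estimator of the standard deviation, the constants `k = 0.5` and `h = 4.0`, the number of replications and the run-length cap are all assumptions, and no outlier contamination or screening step is included.

```python
import math
import random
import statistics

def simulate_run_lengths(m, n, k=0.5, h=4.0, reps=200, max_rl=5000, seed=0):
    """Estimate the in-control ARL and SDRL of a two-sided CUSUM chart whose
    mean and standard deviation are estimated from m Phase I subgroups of size n.
    Runs that reach max_rl without a signal are truncated at max_rl."""
    rng = random.Random(seed)
    run_lengths = []
    for _ in range(reps):
        # Phase I: in-control N(0, 1) data, parameters estimated from scratch.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
        group_means = [sum(g) / n for g in groups]
        mu_hat = sum(group_means) / m
        pooled_var = sum(
            sum((x - gm) ** 2 for x in g) / (n - 1)
            for g, gm in zip(groups, group_means)
        ) / m
        se = math.sqrt(pooled_var) / math.sqrt(n)

        # Phase II: in-control subgroup means monitored until a (false) signal.
        c_plus = c_minus = 0.0
        rl = 0
        while rl < max_rl:
            rl += 1
            xbar = rng.gauss(0.0, 1.0 / math.sqrt(n))
            z = (xbar - mu_hat) / se
            c_plus = max(0.0, c_plus + z - k)
            c_minus = max(0.0, c_minus - z - k)
            if c_plus > h or c_minus > h:
                break
        run_lengths.append(rl)
    return statistics.fmean(run_lengths), statistics.stdev(run_lengths)

arl, sdrl = simulate_run_lengths(m=50, n=5)
```

Repeating the call for several values of `m` gives a rough picture of how quickly the in-control ARL and SDRL settle as the amount of Phase I data grows. Adding a contamination step and a screening step before the estimation would extend the sketch towards the comparison reported in this study.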

Figure 1. Control limits for the two-sided CUSUM chart when the in-control mean and standard deviation are either known or estimated (n = 5, ARL₀ = 200).

Figure 2. In-control ARL values for the two-sided CUSUM control chart in the presence of an outlier, with and without screening, when the parameters are estimated (w = 1, n = 5, ARL₀ = 200).

Figure 3. In-control ARL values for the two-sided CUSUM control chart in the presence of an outlier, with and without screening, when the parameters are estimated (w = 2, n = 5, ARL₀ = 200).

Figure 4. In-control ARL values for the two-sided CUSUM control chart in the presence of an outlier, with and without screening, when the parameters are estimated (w = 3, n = 5, ARL₀ = 200).
For example, if  = 10,  = 0.25 and  = 0.05, the in-control ARL values for the non-screened data were 302, 634 and 1025 when w = 1, 2 and 3, respectively, compared with ARL values of 217, 222 and 225 for the Phase I data screened by Tukey's model. Thus, the Tukey CUSUM chart could better withstand the impact of the outlier multiplier w than the chart based on non-screened data.

Figure 5. Scatter plots and the CUSUM control chart outputs for the dataset on the width of the hard-brake process.

Figure 7. Probability plot of the received signal strength (RSS) values of the rss13 mote belonging to the cycling activity.

Figure 8. Scatter plots and CUSUM control chart outputs for the dataset on the received signal strength process.

Table 1. A synthesis table for the past and current research on a two-sided cumulative sum (CUSUM) chart.

Table 2. Run length (RL) properties for the two-sided CUSUM control chart when the in-control mean and standard deviation are known (n = 5, ARL₀ = 200).

Table 3. RL properties for the two-sided CUSUM control chart when the in-control standard deviation is known and the mean is estimated (n = 5, ARL₀ = 200).

Table 4. RL properties for the two-sided CUSUM control chart when the in-control mean is known and the standard deviation is estimated (n = 5, ARL₀ = 200).

Table 5. RL properties for the two-sided CUSUM control chart when the in-control mean and standard deviation are estimated (n = 5, ARL₀ = 200).

Table 6. Control limits for the two-sided CUSUM chart when the in-control mean and standard deviation are either known or estimated (n = 5, ARL₀ = 200).

Table 7. In-control average run length (ARL) and standard deviation of the run length (SDRL) values for the two-sided CUSUM control chart in the presence of an outlier when the in-control standard deviation is known and the mean is estimated (n = 5, ARL₀ = 200).

Table 8. In-control ARL and SDRL values for the two-sided CUSUM control chart in the presence of an outlier when the in-control mean is known and the standard deviation is estimated (n = 5, ARL₀ = 200).

Table 9. In-control ARL and SDRL values for the two-sided CUSUM control chart in the presence of an outlier when the in-control mean and standard deviation are estimated (n = 5, ARL₀ = 200).

Table 11. Phase I subgroups and Phase II subgroups of RSS values for the chest-left ankle mote.