A New EWMA Control Chart for Monitoring Multinomial Proportions

Abstract: Control charts have been widely used for monitoring process quality in manufacturing and play an important role in signaling promptly when process quality changes. Many control charts in the literature assume that the in-control distribution of the univariate or multivariate process data is continuous. This research develops two exponentially weighted moving average (EWMA) proportion control charts to monitor a process with multinomial proportions under large and small sample sizes, respectively. For a large sample size, the charting statistic depends on the well-known Pearson's chi-square statistic, and the control limit of the EWMA proportion chart is determined by an asymptotic chi-square distribution. For a small sample size, we derive the exact mean and variance of the Pearson's chi-square statistic, from which the exact EWMA proportion chart is determined. The proportion chart can also be applied to monitor a distribution-free continuous multivariate process as long as each categorical proportion associated with the specification limits of each quality variable is known or estimated. Lastly, we use simulation studies and a real-data analysis to evaluate the detection performance of the proposed EWMA proportion chart.


Introduction
Process control plays a critical role in fostering sustainable practices within industries. It enables secure and efficient operation of processes and energy systems. Sustainability encompasses the integration of economic, social, and environmental systems, necessitating a well-rounded approach to resource management [1][2][3]. From the standpoint of process control, several factors contribute to sustainable practices, including the minimization of raw material costs, reduction of product and material scrap/waste expenses, optimization of capital costs, enhancement of process and energy efficiency, mitigation of carbon and water footprints, and maximization of eco-efficiency and process safety. Therefore, process control plays a pivotal role in offering sustainability solutions for developing and implementing efficient technology (refer to Daoutidis et al. [4]). In other words, the practice of sustainability introduces new operational challenges in the development of process control methods. So far, few papers have discussed developing or utilizing control charts to offer sustainability solutions. For example, Anderson et al. [5] applied multivariate control charts to monitor ecological and environmental measurement indices; Morrison [6] used control charts to interpret and monitor environmental data; Gove et al. [7] adopted control charts to monitor water supply in south-west Western Australia; Oliveira da Silva et al. [8] constructed control charts to support the stability and reliability of water quality; Shafqat et al. [9] proposed a triple EWMA mean control chart to monitor and compare the air and greenhouse gas emissions of various countries and to identify the critical countries.
Control charts serve as effective tools in process control, aiming to enhance the quality and yield of products/parts while reducing scrap/waste of raw materials, minimizing carbon and water footprints, and increasing profits/eco-efficiency and energy efficiency of products.

Investigation of the Property of Pearson's Chi-Square Statistic for Correlated Quality Variables following a Multinomial Distribution
We first denote $X = (X_1, X_2, \ldots, X_m)$ as the count vector of m categories in n independent trials, where $X_i$ is the count of the i-th category, $i = 1, 2, \ldots, m$. Let $p_0 = (p_{0,1}, p_{0,2}, \ldots, p_{0,m})$ be the in-control proportion vector associated with $X$, where $p_{0,i}$, $i = 1, \ldots, m$, is the in-control proportion of the i-th category, and $\sum_{i=1}^{m} p_{0,i} = 1$. Then X follows a multinomial distribution with probability mass function

$$p(X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m) = \frac{n!}{x_1!\, x_2! \cdots x_m!} \prod_{i=1}^{m} p_{0,i}^{x_i},$$

where $\sum_{i=1}^{m} x_i = n$, and $x_i$ is the realized value of $X_i$ for $i = 1, \ldots, m$.
To test whether the in-control proportion vector $p_0$ has changed, a natural idea is to adopt the Pearson's chi-square statistic. The in-control Pearson's chi-square statistic is

$$\chi^2 = \sum_{i=1}^{m} \frac{(X_i - e_{0,i})^2}{e_{0,i}}, \tag{1}$$

where $e_{0,i} = np_{0,i}$ is the in-control expected count of the i-th category.
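To make the statistic concrete, the following minimal Python sketch computes the Pearson's chi-square statistic for one sample of category counts. The function name `pearson_chi_square` and the example counts are illustrative assumptions, not taken from the paper, which reports using R.

```python
def pearson_chi_square(counts, p0):
    """Pearson's chi-square statistic for one multinomial sample.

    counts: observed counts x_1..x_m (their sum is the sample size n)
    p0:     in-control proportions p_{0,1}..p_{0,m}
    """
    if len(counts) != len(p0):
        raise ValueError("counts and p0 must have the same length")
    n = sum(counts)
    stat = 0.0
    for x_i, p_i in zip(counts, p0):
        e_i = n * p_i                     # in-control expected count e_{0,i}
        stat += (x_i - e_i) ** 2 / e_i
    return stat


# Illustrative (made-up) sample: all ten items fall in the first of four
# equally likely categories, the most extreme departure from p0.
print(pearson_chi_square([10, 0, 0, 0], [0.25, 0.25, 0.25, 0.25]))
```

When every observation falls in a single category of an equal-proportion vector, the statistic attains n(m − 1), which is a quick sanity check on an implementation.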
We now study the in-control distribution of the Pearson's chi-square statistic and derive its exact mean and variance for various sample sizes and in-control proportion vectors. When n is large enough, the Pearson's chi-square statistic $\chi^2$ asymptotically follows a chi-square distribution with $m - 1$ degrees of freedom (df); that is, $\chi^2 \sim \chi^2(m - 1)$. This is a well-known asymptotic result. When n is small, the Pearson's chi-square statistic does not follow the $\chi^2(m - 1)$ distribution, so it would be preferable to know its distribution for a small sample size. Although its exact distribution is not available in a tractable form, we can derive its exact mean and variance as follows.
First, it is easy to derive the in-control mean of the Pearson's chi-square statistic $\chi^2$ given the in-control proportions. Since $E(X_i - np_{0,i})^2 = np_{0,i}(1 - p_{0,i})$,

$$E(\chi^2) = \sum_{i=1}^{m} \frac{np_{0,i}(1 - p_{0,i})}{np_{0,i}} = \sum_{i=1}^{m} (1 - p_{0,i}) = m - 1. \tag{2}$$

To the best of our knowledge, the variance of the Pearson's chi-square statistic has not been derived. We derive the in-control exact variance of the Pearson's chi-square statistic $\chi^2$ as

$$\mathrm{Var}(\chi^2) = \sum_{i=1}^{m} \frac{1}{np_{0,i}} - \frac{m^2 + 2m - 2}{n} + 2(m - 1). \tag{3}$$

Appendix A presents the derivation. From (3), we find that, for given m and $p_0$, the variance changes with the sample size n; that is, it is not fixed across n.
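The closed forms for the mean and variance can be checked numerically for small n by enumerating every possible count vector. The sketch below is one way to do this in Python; it assumes the variance expression 2(m − 1) + (Σ 1/p_{0,i} − m² − 2m + 2)/n, which is the final line of the Appendix A derivation, and the function names and the example vector (0.1, 0.2, 0.3, 0.4) are illustrative choices rather than the paper's scenarios.

```python
from math import factorial, prod


def exact_mean_var(n, p0):
    """Closed-form in-control mean and variance of Pearson's chi-square."""
    m = len(p0)
    mean = m - 1
    var = 2 * (m - 1) + (sum(1 / p for p in p0) - (m * m + 2 * m - 2)) / n
    return mean, var


def compositions(n, m):
    """Yield every count vector of length m whose entries sum to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest


def brute_force_mean_var(n, p0):
    """Mean and variance of chi-square by summing over the multinomial pmf."""
    ex = ex2 = 0.0
    for x in compositions(n, len(p0)):
        coef = factorial(n)
        for x_i in x:
            coef //= factorial(x_i)       # multinomial coefficient
        prob = coef * prod(p ** x_i for p, x_i in zip(p0, x))
        chi2 = sum((x_i - n * p) ** 2 / (n * p) for x_i, p in zip(x, p0))
        ex += prob * chi2
        ex2 += prob * chi2 ** 2
    return ex, ex2 - ex ** 2


p0 = [0.1, 0.2, 0.3, 0.4]                 # illustrative vector, not from the paper
print(exact_mean_var(5, p0))
print(brute_force_mean_var(5, p0))
```

The enumeration grows quickly with n and m, so it is only practical as a check for small samples; the closed form is what one would use in practice.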
To investigate how the mean and variance change under different n and in-control proportion vectors, without loss of generality, we consider two scenarios of in-control proportion vectors. In practice, the proportions may or may not all be equal, which is why we consider two scenarios, (1) and (2), each with four proportions for four categories.
Table 1 shows the calculated exact means and variances under different n and the two scenarios of in-control proportion vectors. We find the following results in Table 1:
(i) Under scenario (1), the exact means are all fixed at 3 whether n is small or large. However, the exact variance increases as n increases and reaches 5.999 when n is equal to 6000.
(ii) Under scenario (2), the exact means are all fixed at 3 whether n is small or large. However, the exact variance decreases as n increases and reaches 6.0 when n is equal to 6000.
(iii) Whether the exact variance increases or decreases depends heavily on the in-control proportion vector: its behavior for increasing n differs between scenarios (1) and (2).
The above results give clear evidence that the variance of the Pearson's chi-square statistic is not fixed for a small sample size. However, the variance converges to $2(m - 1)$ when the sample size is large enough. From Table 1, we can construct the exact EWMA-proportion control chart whether n is small or large.
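As a worked check of the variance formula, the following calculation assumes a hypothetical equal-proportion vector $p_0 = (1/4, 1/4, 1/4, 1/4)$ with $m = 4$. This vector is an illustrative assumption and is not necessarily the paper's scenario (1). It uses the closed form $\mathrm{Var}(\chi^2) = 2(m-1) + \frac{1}{n}\left(\sum_i 1/p_{0,i} - m^2 - 2m + 2\right)$, the final expression of the Appendix A derivation.

```latex
\begin{align*}
\sum_{i=1}^{4} \frac{1}{p_{0,i}} &= 16, \qquad m^2 + 2m - 2 = 22, \\
\operatorname{Var}(\chi^2) &= 2(4 - 1) + \frac{16 - 22}{n} = 6 - \frac{6}{n}, \\
n = 5 &: \; 6 - 1.2 = 4.8, \qquad n = 6000 : \; 6 - 0.001 = 5.999 .
\end{align*}
```

More generally, the sign of $\sum_i 1/p_{0,i} - (m^2 + 2m - 2)$ decides whether the variance rises or falls toward $2(m - 1)$ as n grows, which is consistent with the opposite behaviors reported for the two scenarios.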

A Pearson's Chi-Square (χ 2 ) Statistic-Based EWMA Chart for Monitoring the Multinomial Proportions
In statistical process control, the sample size is usually small. When n is not large enough, the distribution of the Pearson's chi-square statistic does not follow the well-known $\chi^2(m - 1)$ distribution. The variances of the Pearson's chi-square statistic for various n in Section 2 illustrate this. Hence, it is not appropriate to adopt the $\chi^2(m - 1)$ distribution to construct the EWMA-$\chi^2$ control chart for monitoring the multinomial-proportion process. Misusing the EWMA-$\chi^2$ control chart results in worse out-of-control detection performance.
In Section 2, we derived the exact mean and variance of the Pearson's chi-square statistic for any sample size, even though its exact distribution is not available. Based on (2) and (3), we may construct the exact EWMA-proportion control chart to monitor changes in the proportion vector of the multinomial quality variables for a small sample size. When the sample size n is large enough, the in-control Pearson's chi-square statistic is approximately distributed as $\chi^2(m - 1)$. Thus, the monitoring statistic is independent of the original multinomial distribution and the sample size n, and we construct the asymptotic EWMA-proportion control chart. The detection performance of the two proposed EWMA-proportion control charts is then compared.

The Exact Multinomial-Proportion Control Chart
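With the derived exact mean and variance of the in-control Pearson's chi-square statistic, we may construct an exact EWMA-proportion control chart with the upper control limit (UCL), center line (CL), and lower control limit (LCL) given in (5) for various sample sizes. In other words, the control limit of the EWMA-proportion control chart depends on the value of n given the m categories. Here, we let the LCL be zero, since an out-of-control proportion vector leads to an increase in the value of the Pearson's chi-square statistic.
We let the monitoring statistic $\mathrm{EWMA}_{\chi^2_t}$ of the EWMA chart at time t be the weighted average of the Pearson's chi-square statistic $\chi^2_t$ at time t and the previous monitoring statistic:

$$\mathrm{EWMA}_{\chi^2_t} = \lambda \chi^2_t + (1 - \lambda)\, \mathrm{EWMA}_{\chi^2_{t-1}}, \tag{4}$$

where $\mathrm{EWMA}_{\chi^2_0} = E(\chi^2) = m - 1$ and $\lambda \in (0, 1)$ is a smoothing parameter. The in-control mean and variance of the monitoring statistic $\mathrm{EWMA}_{\chi^2_t}$ at time t are

$$E(\mathrm{EWMA}_{\chi^2_t}) = m - 1, \qquad \mathrm{Var}(\mathrm{EWMA}_{\chi^2_t}) = \frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right] \mathrm{Var}(\chi^2),$$

where $\mathrm{Var}(\chi^2)$ is given by (3). The control limits of the exact EWMA-proportion control chart are consequently

$$\mathrm{UCL}_t = (m - 1) + L_n \sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right] \mathrm{Var}(\chi^2)}, \qquad \mathrm{CL} = m - 1, \qquad \mathrm{LCL} = 0, \tag{5}$$

where the coefficient $L_n$ should be chosen to satisfy the specified $\mathrm{ARL}_0$.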
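The recursion and the time-varying upper limit can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual EWMA form EWMA_t = λχ²_t + (1 − λ)EWMA_{t−1} started at m − 1 and the standard time-varying variance factor λ/(2 − λ)·[1 − (1 − λ)^{2t}]; the function name `ewma_chart` and the input values are hypothetical.

```python
import math


def ewma_chart(chi2_values, m, var_chi2, lam, L):
    """Return (t, EWMA_t, UCL_t, signal) for a sequence of chi-square values.

    var_chi2 is the in-control variance of the chi-square statistic:
    the exact variance for the exact chart, or 2(m - 1) for the asymptotic one.
    """
    ewma = m - 1.0                        # starting value E(chi^2) = m - 1
    rows = []
    for t, chi2 in enumerate(chi2_values, start=1):
        ewma = lam * chi2 + (1 - lam) * ewma
        var_t = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)) * var_chi2
        ucl = (m - 1) + L * math.sqrt(var_t)
        rows.append((t, ewma, ucl, ewma > ucl))
    return rows


# Made-up chi-square values for m = 4 categories; L = 2.416 and
# lam = 0.05 are used only as example inputs.
for row in ewma_chart([2.1, 4.8, 9.5, 12.0], m=4, var_chi2=6.0, lam=0.05, L=2.416):
    print(row)
```

Because the variance factor grows with t, the limit is narrow at start-up and widens toward its steady-state value, which is why the chart's UCL is indexed by time.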
To determine $L_n$ satisfying a specified $\mathrm{ARL}_0$, we use the Monte Carlo method, following Yang et al. [23]. The Monte Carlo procedure is implemented in the R programming language (see Appendix B, Algorithm A1).
Based on the Monte Carlo procedure, Table 2 lists the resulting $L_n$ of the exact EWMA-proportion control charts with specified $\mathrm{ARL}_0 = 370.4$ for various combinations of n and $\lambda$ under the aforementioned two scenarios of in-control proportion vectors. We find that $L_n$ increases slowly as n increases and reaches 2.416 under scenario (1) or 2.417 under scenario (2) when n is equal to 6000.
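A Monte Carlo estimate of the average run length for a given limit coefficient can be sketched as follows. This is a Python illustration written for this text, not the authors' R code from Algorithm A1; the function names, the default number of replications, and the example inputs are assumptions, and it uses the exact variance 2(m − 1) + (Σ 1/p_{0,i} − m² − 2m + 2)/n together with the standard time-varying EWMA limit.

```python
import math
import random


def exact_var_chi2(n, p0):
    """Exact in-control variance of Pearson's chi-square statistic."""
    m = len(p0)
    return 2 * (m - 1) + (sum(1 / p for p in p0) - (m * m + 2 * m - 2)) / n


def run_length(n, p_true, p0, lam, L, rng, max_t=100_000):
    """Simulate one run: time of the first signal (capped at max_t)."""
    m = len(p0)
    var_chi2 = exact_var_chi2(n, p0)
    ewma = m - 1.0
    for t in range(1, max_t + 1):
        counts = [0] * m
        for c in rng.choices(range(m), weights=p_true, k=n):
            counts[c] += 1                # one multinomial sample of size n
        chi2 = sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, p0))
        ewma = lam * chi2 + (1 - lam) * ewma
        sd_t = math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)) * var_chi2)
        if ewma > (m - 1) + L * sd_t:
            return t
    return max_t


def estimate_arl(n, p_true, p0, lam, L, reps=1000, seed=1):
    """Monte Carlo estimate of the ARL and SDRL (use p_true = p0 for ARL_0)."""
    rng = random.Random(seed)
    runs = [run_length(n, p_true, p0, lam, L, rng) for _ in range(reps)]
    arl = sum(runs) / reps
    sdrl = math.sqrt(sum((r - arl) ** 2 for r in runs) / (reps - 1))
    return arl, sdrl


p0 = [0.25, 0.25, 0.25, 0.25]             # illustrative in-control vector
print(estimate_arl(10, p0, p0, lam=0.2, L=1.0, reps=200))
```

To calibrate $L_n$, one would repeat this estimate over candidate values of L and keep the one whose estimated in-control ARL matches the target; a realistic run needs far more replications than the small value used here for speed.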

The Asymptotic Multinomial-Proportion Control Chart
When n is large enough, the Pearson's chi-square statistic $\chi^2$ asymptotically follows a chi-square distribution with df $m - 1$ for an in-control process; that is, $\chi^2 \sim \chi^2(m - 1)$ with mean $m - 1$ and variance $2(m - 1)$. Thus, the monitoring statistic is independent of the original multinomial distribution and the sample size n.
Based on the in-control asymptotical chi-square distribution, we may establish an EWMA multinomial-proportion control chart to monitor whether the proportion vector changes or not.
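We let the monitoring statistic $\mathrm{EWMA}_{\chi^2_t}$ of the EWMA chart at time t be

$$\mathrm{EWMA}_{\chi^2_t} = \lambda \chi^2_t + (1 - \lambda)\, \mathrm{EWMA}_{\chi^2_{t-1}}, \tag{6}$$

where $\mathrm{EWMA}_{\chi^2_0} = E(\chi^2) = m - 1$, and $\lambda \in (0, 1)$ is a smoothing parameter. The mean and variance of the monitoring statistic $\mathrm{EWMA}_{\chi^2_t}$ are $m - 1$ and $\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right] 2(m - 1)$, respectively. The mean and variance of the monitoring statistic $\mathrm{EWMA}_{\chi^2_t}$ are therefore independent of n.
Hence, the dynamic control limits of the EWMA-$\chi^2$ control chart are constructed as

$$\mathrm{UCL}_t = (m - 1) + L \sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right] 2(m - 1)}, \qquad \mathrm{CL} = m - 1, \qquad \mathrm{LCL} = 0, \tag{7}$$

where L is the coefficient of the UCL and should be chosen to achieve a specified $\mathrm{ARL}_0$.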
To determine L satisfying a specified $\mathrm{ARL}_0$, we refer to the Markov chain method in Lucas and Saccucci [24] or Chandrasekaran et al. [25]. We describe the $\mathrm{ARL}_0$ calculation procedure as follows.
Step 1. For a given L, at time t, the region $(0, \mathrm{UCL}_t]$ is partitioned into k (e.g., k = 101) subsets or states.
Step 2. Denote the transition probability matrix with transition probabilities $p^{t}_{i,j}$, from

is a column vector of ones, and the initial state probability is
To obtain the coefficient of the UCL, L, of the asymptotic control chart, we next adopt the bisection algorithm. The calculation procedure is described as follows.
Step 4. Repeat Steps 2 and 3 until $|\mathrm{ARL}_0(L_{middle}) - \mathrm{ARL}_0| \le \varepsilon$. Then $L = L_{middle}$.
Based on the Markov chain method and bisection algorithm described above, the calculated coefficient L of the UCL with specified $\mathrm{ARL}_0 = 370.4$ is 2.416 under both scenarios (1) and (2). This is expected, since L is a fixed value that does not depend on the sample size n.
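The bisection search can be sketched generically in Python. The sketch assumes only that the in-control ARL increases with L and that the starting bracket contains the target; the function `bisect_limit_coefficient` and the toy ARL curve are hypothetical stand-ins, since a real application would pass in the Markov chain (or Monte Carlo) ARL calculation instead.

```python
def bisect_limit_coefficient(arl_fn, target_arl, lo, hi, eps=0.5, max_iter=100):
    """Find L with |arl_fn(L) - target_arl| <= eps by bisection.

    arl_fn must be increasing in L, with arl_fn(lo) < target_arl < arl_fn(hi).
    """
    if not (arl_fn(lo) < target_arl < arl_fn(hi)):
        raise ValueError("the bracket [lo, hi] must contain the target ARL")
    for _ in range(max_iter):
        middle = (lo + hi) / 2
        arl_middle = arl_fn(middle)
        if abs(arl_middle - target_arl) <= eps:
            return middle
        if arl_middle < target_arl:
            lo = middle                   # limit too tight: search upper half
        else:
            hi = middle                   # limit too wide: search lower half
    return (lo + hi) / 2


def toy_arl(L):
    """Toy increasing curve standing in for a real ARL_0 calculation."""
    return 10.0 ** L


L = bisect_limit_coefficient(toy_arl, target_arl=370.4, lo=1.0, hi=4.0)
print(L, toy_arl(L))
```

The tolerance `eps` plays the role of ε in Step 4; when the ARL is itself estimated with simulation noise, a tolerance much smaller than that noise would not be meaningful.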

Comparison of the Exact and Asymptotic Multinomial-Proportion Control Charts
The resulting L and $L_n$ of the asymptotic and exact EWMA-proportion control charts for the two scenarios show that $L_n$ converges to L (= 2.416) when n is large enough (n ≥ 6000). However, when n is not large enough, the estimated $L_n$ and L differ noticeably. This is evidence that it is incorrect to adopt the asymptotic EWMA-proportion control chart to monitor the multinomial proportion vector when n is small or not large enough. Hence, the exact EWMA-proportion control chart is recommended in that case.

Detection Performance Measurement of the Proposed Exact and Asymptotic EWMA-Proportion Control Charts
Without loss of generality, to measure the out-of-control detection performance of the proposed exact and asymptotic EWMA-proportion charts, we consider the following two scenarios with six out-of-control proportion vectors for setting n = 2(1)20, 50, 100(100), $\lambda = 0.05$, and $\mathrm{ARL}_0 = 370$.

Detection Performance of the Proposed Exact EWMA-Proportion Chart
Applying the calculated control limit coefficient, $L_n$, of the proposed exact chart to scenarios (1) and (2) with the six out-of-control proportion vectors and the given sample sizes, we can calculate the out-of-control average run length ($\mathrm{ARL}_1$). The Monte Carlo procedure is again applied to calculate $\mathrm{ARL}_1$ using the R programming language; see Appendix C (Algorithm A2). A smaller $\mathrm{ARL}_1$ indicates better detection performance of a control chart, and $\mathrm{ARL}_1$ is a widely used detection performance index in statistical process control.
Tables 3 and 4 report the calculated $\mathrm{ARL}_1$ (first row) and SDRL (standard deviation of run length; second row) of the proposed exact chart for various n under scenarios (1) and (2), respectively. We find the following results in Tables 3 and 4: (i) for detecting any out-of-control proportion vector, $\mathrm{ARL}_1$ decreases when n increases; (ii) the larger the difference between $p_0$ and $p_i$, the smaller $\mathrm{ARL}_1$ is under each n. These results are reasonable.

Detection Performance of the Asymptotic EWMA-Proportion Chart
Applying the calculated control limit coefficient, L, of the asymptotic chart to scenarios (1) and (2) with the six out-of-control proportion vectors, we can calculate $\mathrm{ARL}_1$.
We find the following results in Tables 5 and 6:
(i) Most $\mathrm{ARL}_0$ values are far from the specified 370.4 for small n. In Table 5, many $\mathrm{ARL}_0$ values are larger than the specified 370.4 for n < 400, and some $\mathrm{ARL}_1$ values are larger than the specified 370.4 for very small n. However, in Table 6, all $\mathrm{ARL}_0$ values are smaller than the specified 370.4 for n < 6000. These results indicate that the proposed asymptotic control chart is not in-control robust: it becomes ARL biased, and its detection performance is worse for small n.
(ii) When n is large (n ≥ 400 for scenario (1) or n = 6000 for scenario (2)), the calculated $\mathrm{ARL}_0$ is close to the specified $\mathrm{ARL}_0$, and $\mathrm{ARL}_1$ decreases when n increases for detecting any out-of-control proportion vector.
(iii) The larger the difference between $p_0$ and $p_i$, $i = 1, 2, \ldots, 6$, the smaller $\mathrm{ARL}_1$ is under each n.
process; otherwise, the detection performance of the asymptotic control chart would be worse and result in an incorrect process adjustment. Comparing Tables 3-6, we find that the two charts have almost the same in-control and out-of-control performance for n ≥ 6000. However, the exact EWMA-proportion chart gives correct results where the asymptotic control chart does not, especially for small n. Hence, the proposed exact EWMA-proportion chart is recommended whether the sample size is small or not.

Monitoring Under-Specification Proportions of a Continuous Multivariate Process Using the Proposed EWMA-Proportion Chart and Its Application
The proposed exact EWMA-proportion chart can be applied to monitor not only the proportion vector of a multinomial process but also the proportion vector of multiple categories in a continuous multivariate process that is distribution-free or has an unknown distribution.
In this section, we provide an example to describe how to apply our proposed exact chart to monitor the proportion vector of four categories in such a continuous bivariate process. We adopt a semiconductor manufacturing data set that can be found in a data repository maintained by the University of California, Irvine (McCann and Johnston [26]). The data set spans from July 2008 to October 2008 and contains 591 continuous quality variables. Each variable has 1567 observations, including 1463 in-control observations and 104 out-of-control observations.
To demonstrate the detection performance of the proposed exact chart, we select 2 of the 591 continuous correlated quality variables, $X = (X_3, X_{12})^T$. Based on the respective specifications of $X_3$ and $X_{12}$, each observation can be classified into one of four categories: (1) $X_3$ and $X_{12}$ are both under specification, (2) $X_3$ is under specification, but $X_{12}$ is not, (3) $X_3$ and $X_{12}$ are both out of specification, and (4) $X_3$ is out of specification, but $X_{12}$ is under specification. By classifying the 1463 in-control population observations, we obtain the proportion vector of the four categories as $p_0 = (0.4, 0.08, 0.07, 0.45)$. For the 104 out-of-control population observations, the proportion vector of the four categories is $p_1 = (0.00, 0.00, 0.2167, 0.7833)$. For the illustration, we take the first 100 in-control observations and the first 60 out-of-control observations, respectively. With a sample size of five, there are 20 in-control samples and 12 out-of-control samples. To monitor the process proportion vector, we construct the exact control chart using the aforementioned method.
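The classification step can be sketched in Python as below. The specification limits, the function names, and the five toy observations are hypothetical placeholders, since the actual limits for the two variables are not given here; only the four-category rule and the sample size of five follow the text.

```python
def classify(x3, x12, spec3, spec12):
    """Map one bivariate observation to a category index 0..3.

    spec3 and spec12 are (lower, upper) specification limits; the values
    used below are placeholders, not the limits of the real data set.
    """
    ok3 = spec3[0] <= x3 <= spec3[1]
    ok12 = spec12[0] <= x12 <= spec12[1]
    if ok3 and ok12:
        return 0                          # category (1): both under specification
    if ok3:
        return 1                          # category (2): only X12 out
    if not ok12:
        return 2                          # category (3): both out
    return 3                              # category (4): only X3 out


def sample_counts(observations, spec3, spec12, sample_size=5):
    """Split observations into consecutive samples and count each category."""
    samples = []
    for start in range(0, len(observations) - sample_size + 1, sample_size):
        counts = [0, 0, 0, 0]
        for x3, x12 in observations[start:start + sample_size]:
            counts[classify(x3, x12, spec3, spec12)] += 1
        samples.append(counts)
    return samples


# Toy observations with placeholder limits (0, 1) for both variables.
toy = [(0.5, 0.5), (0.5, 2.0), (2.0, 2.0), (2.0, 0.5), (0.5, 0.5)]
print(sample_counts(toy, spec3=(0.0, 1.0), spec12=(0.0, 1.0)))
```

Each resulting count vector is one multinomial sample, so it can be fed directly into the chi-square statistic with the in-control vector $p_0$ estimated from the in-control observations.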
From (5), we know that the control limit of the proposed exact control chart varies with the sampling time. Hence, for each sampling time t, we list $\mathrm{UCL}_t$, the number of observations in each category ($n_{ij}$), the in-control statistic value ($\chi^2_t$), and the charting statistic value ($\mathrm{EWMA}_{\chi^2_t}$) for the 20 in-control subgroups. The results are given in Table 7. We then plot the in-control $\mathrm{EWMA}_{\chi^2_t}$ values on the constructed exact control chart; see Figure 1. All $\mathrm{EWMA}_{\chi^2_t}$ values fall within $\mathrm{UCL}_t$, demonstrating that the first 20 samples all come from the population with the in-control proportion vector. Furthermore, we calculate $n_{ij}$, the out-of-control statistic value ($\chi^2_t$), and the charting statistic value ($\mathrm{EWMA}_{\chi^2_t}$) using the 12 out-of-control subgroups. The results appear in Table 8. We display the out-of-control $\mathrm{EWMA}_{\chi^2_t}$ values on the constructed exact control chart in Figure 2. The first $\mathrm{EWMA}_{\chi^2_t}$ value falls outside $\mathrm{UCL}_t$, and ten of the twelve $\mathrm{EWMA}_{\chi^2_t}$ values create signals. This demonstrates that the proposed exact control chart performs well in detecting the out-of-control proportion vector.

Conclusions
This paper develops the exact and asymptotic EWMA-proportion control charts to monitor the multinomial-proportions process. Based on the derived in-control exact mean and variance of the chi-square statistic, we calculate the control limits of the exact EWMA-proportion control chart for various small and large sample sizes using the Monte Carlo method. Based on the asymptotic chi-square distribution with df $m - 1$, we calculate the control limits of the asymptotic EWMA-proportion control chart for a large enough sample size using the Markov chain method.
From numerical analyses, we find that control limits (5) and (7), with the same preset in-control ARL, are nearly the same and give nearly the same out-of-control detection ability when the sample size is large enough, e.g., n ≥ 6000 under scenarios (1) and (2). For a small or moderate sample size, the exact EWMA-proportion control chart is in-control robust, but the in-control ARL of the asymptotic control chart can be larger or smaller than the preset $\mathrm{ARL}_0 = 370.4$. Misusing the asymptotic control chart results in worse out-of-control detection performance. Thus, we strongly suggest adopting the proposed exact control chart to monitor a multinomial-proportions process. Moreover, the proposed exact EWMA-proportion chart can be adopted to monitor the change in proportions of categories of a continuous multivariate process that is distribution-free or has an unknown distribution. A numerical example using semiconductor manufacturing data illustrates the application of the proposed exact EWMA-proportion chart, and it shows good detection performance.
In this study, we have developed a novel, efficient, and exact EWMA-proportion control chart specifically designed for monitoring a multinomial-proportion process. Unlike existing literature, which focuses on control charts for multinomial proportions with large or infinite sample sizes, our proposed method is tailored for small and medium sample sizes. Our exact EWMA-proportion control chart offers significant potential for providing sustainable solutions across various industries. We recommend applying this method not only for monitoring multinomial proportions in a multinomial process but also for continuous multivariate processes that are distribution-free or have an unknown distribution. By utilizing the proposed exact EWMA-proportion control chart, organizations can effectively monitor and control their processes, enabling them to identify and address deviations or shifts in the multinomial proportions. This approach holds promise for enhancing quality assurance, process optimization, and overall operational performance in diverse industrial settings.

Appendix A
(a) $E(X_i - np_{0,i})^4 = np_{0,i}(1 - p_{0,i})(1 + 3p_{0,i}^2 - 3p_{0,i}) + 3n^2 p_{0,i}^2 (1 - p_{0,i})^2 - 3np_{0,i}^2 (1 - p_{0,i})^2$.
Proof: suppose that $X_{i1}, X_{i2}, \ldots, X_{in}$ are i.i.d. Bernoulli($p_{0,i}$) and then
Thus, we have
For i = j, we get
Hence, we have

$$\mathrm{Var}(\chi^2) = \sum_{i=1}^{m} \frac{1}{np_{0,i}} - \frac{m^2 + 2m - 2}{n} + 2(m - 1).$$