Abstract
Among attribute processes, the number of nonconformities, which is commonly modeled by a Poisson distribution, is one of the most important quality attributes. Because products typically have several such attributes, the multivariate Poisson process plays an important role in industry. An out-of-control multivariate Poisson process can be detected by an alarm on a multivariate control chart. However, pinpointing the specific quality attributes that caused the process shift is difficult. Unlike the majority of studies, which examine shifts in multivariate normal processes, this study focuses on the causes of process shifts in multivariate Poisson processes. This paper first presents a statistical method for detecting outliers in a multivariate Poisson distribution. A progressive testing algorithm is then developed to identify the variables responsible for a failure within a multivariate Poisson process. According to simulation results, the proposed approach can effectively determine the sources of a process fault within a multivariate Poisson process.
MSC:
62P30; 62J15
1. Introduction
Pinpointing the causes of process failures is crucial to improving a process. Multivariate control charts are widely used to detect process faults. However, a signal on such a chart only indicates that a fault has disrupted the underlying process; it does not show which variable is responsible. Identifying the main cause of a signal on a multivariate control chart has therefore become increasingly important and has driven a rapid growth of related research.
Most previous investigations have sought to determine the variables that induce changes in the mean or variance of a multivariate normal process. Several studies have suggested soft computing approaches for detecting changes in process variance or mean [1,2,3,4,5,6,7,8,9,10,11]. Other studies have applied decomposition statistics to identify changes in variance or mean in a multivariate normal process [12,13,14,15,16,17,18,19,20]. However, the majority of these studies assumed that the process can be measured numerically and conforms to a multivariate normal distribution. Many processes cannot be quantified numerically. Instead, multiple correlated attributes must be assessed simultaneously to determine quality characteristics [21,22,23,24,25,26,27,28,29,30,31]. When a process has multiple attributes and the numbers of nonconformities follow a multivariate Poisson distribution, as is often the case in practice [23,30,31], recognizing the cause of a signal is crucial. Therefore, in contrast to the earlier work on shifts in multivariate normal processes, the aim of this research is to devise an effective method for recognizing the sources of a process failure in a multivariate Poisson process.
The next section introduces the proposed method for recognizing the sources of nonconformity shifts in a multivariate Poisson process. Section 3 presents numerical results that show the efficiency of the proposed procedure. Section 4 provides a demonstrative example of the method. The last section concludes this study.
2. Main Results
Let the k characteristics of the ith observation be represented by
which follows a multivariate Poisson distribution , where , , and . To monitor a multivariate Poisson process, numerous multivariate control charts have been suggested in the literature. To give an example, Chiu and Kuo [31] proposed the following control limits:
where
and
In practice, when the parameters of the distribution are unknown, unbiased estimators can be used in their place. When a multivariate control chart of this kind signals an out-of-control condition, one of the difficulties is determining which variable is responsible. The following subsection first develops an approach for detecting outliers in the multivariate Poisson distribution based on the maximum adjusted residual. A straightforward approach is then proposed to recognize the causes of a process failure in a multivariate Poisson process.
2.1. A Method for Identifying Outliers in the Multivariate Poisson Distribution
Consider a random sample of size n taken from the above-described multivariate Poisson distribution. The sample means are therefore
Note that the multivariate Poisson distribution of implies that has a Poisson distribution . There are many situations where it is necessary to test the hypothesis.
where denotes a predetermined mean vector. Accordingly, the adjusted residual can be defined as follows:
Under the null hypothesis, it is simple to show that will converge in distribution to , which conforms to a multivariate normal distribution having mean and covariance matrix , where
Discovering outliers in data is essential across a broad range of applications, and the definition of an outlier varies with the application field. According to Suri, Murty, and Athithan [32], an outlier in a data set is typically defined as an object that deviates from the known or normal behavior, or that assumes values significantly different from the expected ones. Following this definition, this study defines outliers as data points that do not conform to the given “typical” distribution. Consider as the difference between and . If every is zero, there are no outliers. However, if certain differences are positive or negative, the corresponding variables can be defined as positive or negative outliers, respectively. To our knowledge, no published work offers a statistical test for outlier detection in multivariate Poisson-distributed data under this definition. To find the positive outliers of a multivariate Poisson distribution, an effective approach is to use the largest value among the adjusted residuals
At significance level , the test against this one-sided alternative rejects the null hypothesis if
where c meets the condition of
By utilizing Boole’s inequality, we can obtain an easy-to-use approximation of c, which is
Accordingly,
where represents the th upper-tail percentile of the standard normal distribution. Consequently, is an approximate upper bound for c. Because this bound is easy to compute and typically yields a conservative test, we suggest using it as the critical value of the test above.
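As a rough illustration of the test described above, the following sketch computes adjusted residuals and compares the largest one with the Bonferroni-type critical value $z_{\alpha/k}$, the upper $\alpha/k$ percentile of the standard normal distribution. The residual form $\sqrt{n}(\bar{x}_j - \lambda_{0j})/\sqrt{\lambda_{0j}}$ is an assumption based on the fact that a Poisson variance equals its mean. The function names, the toy data, and the hypothesized means are hypothetical and are not taken from this study.

```python
from math import sqrt
from statistics import NormalDist, mean


def adjusted_residuals(sample, lambda0):
    """Standardized distance of each sample mean from its hypothesized mean.

    Assumed form: sqrt(n) * (xbar_j - lambda0_j) / sqrt(lambda0_j).
    """
    n = len(sample)
    xbar = [mean(col) for col in zip(*sample)]
    return [sqrt(n) * (xb - l0) / sqrt(l0) for xb, l0 in zip(xbar, lambda0)]


def bonferroni_critical_value(alpha, k):
    """Upper alpha/k percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / k)


def max_residual_test(sample, lambda0, alpha=0.05):
    """Return (reject, index of largest residual, statistic, critical value)."""
    r = adjusted_residuals(sample, lambda0)
    j = max(range(len(r)), key=r.__getitem__)
    c = bonferroni_critical_value(alpha, len(r))
    return r[j] > c, j, r[j], c


# Hypothetical toy data: n = 4 observations of k = 2 counts.
toy = [(4, 1), (6, 2), (5, 0), (5, 1)]
print(max_residual_test(toy, lambda0=(2.0, 1.0)))
```

Only the upper tail is examined because the test targets positive outliers. Each step down in dimension k lowers the critical value, since the tail probability $\alpha/k$ assigned to each variable grows.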
We conduct a Monte Carlo simulation to assess the accuracy of this approximation: we estimate under the null hypothesis and compare it with the nominal level. We consider two values of k: 2 and 3. Additionally, we consider five cases of , namely, (1,…,1), (5,…,5), (10,…,10), (30,…,30), and (50,…,50). Sample sizes of 5, 10, 30, and 50 are simulated to evaluate the effect of sample size. Due to the non-positive definiteness of the covariance matrix for , the present study considers six values of : 0, 0.1, 0.3, 0.5, 0.7, and 0.9. Krummenauer’s [33] algorithm is used to simulate 10,000 samples from each of the given multivariate Poisson populations to calculate . To estimate , we determine the percentage of the 10,000 simulated values that exceed . Accordingly, better performance is associated with smaller deviations of from the nominal levels.
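The simulation design above can be sketched in a few lines for the bivariate case. This is a minimal illustration under stated assumptions, not the study's program: it uses a common-shock construction (each count is the sum of a shared Poisson term and its own Poisson term) in place of Krummenauer's algorithm, Knuth's multiplication method for Poisson draws, equal means for both variables, the assumed residual form $\sqrt{n}(\bar{x}_j - \lambda)/\sqrt{\lambda}$, and far fewer replications than 10,000. All function names are hypothetical.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def poisson(rng, lam):
    """Knuth's multiplication sampler; adequate for small lam."""
    if lam <= 0.0:
        return 0
    limit, prod, count = exp(-lam), rng.random(), 0
    while prod > limit:
        prod *= rng.random()
        count += 1
    return count


def bivariate_poisson_sample(rng, lam, rho, n):
    """n draws with common mean lam and correlation rho (common-shock model)."""
    shared, own = rho * lam, (1.0 - rho) * lam
    out = []
    for _ in range(n):
        y0 = poisson(rng, shared)
        out.append((y0 + poisson(rng, own), y0 + poisson(rng, own)))
    return out


def estimated_type1_error(lam, rho, n, alpha, reps=2000, seed=1):
    """Share of in-control samples whose largest residual exceeds z_{alpha/k}."""
    rng = random.Random(seed)
    k = 2
    crit = NormalDist().inv_cdf(1.0 - alpha / k)
    hits = 0
    for _ in range(reps):
        sample = bivariate_poisson_sample(rng, lam, rho, n)
        means = [sum(col) / n for col in zip(*sample)]
        stat = max(sqrt(n) * (m - lam) / sqrt(lam) for m in means)
        hits += stat > crit
    return hits / reps


est = estimated_type1_error(lam=5.0, rho=0.3, n=30, alpha=0.05)
print(f"estimated level {est:.4f}, deviation {est - 0.05:+.4f}")
```

The printed deviation corresponds to one point of the kind summarized in the study's comparison with the nominal level; the exact value depends on the seed and the number of replications.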
For each value of , Figure 1 plots the deviations of from the nominal levels against the values and (k, n). Figure 1 shows that around half of the absolute deviations will fall under 0.0053 and 70% of them will be below 0.0095 for nominal levels of 0.05. At a nominal level 0.01, roughly 70% and 90% of absolute deviations are below 0.0027 and 0.0057. Additionally, when the nominal level is 0.001, 70% of the absolute deviations will be less than 0.0007 and 90% will be less than 0.0017. Consequently, in most cases, the approximation may be adequate.
Figure 1.
The relationship between and deviation, with each and (k, n) shown separately.
2.2. A Method for Recognizing the Causes of a Process Failure in a Multivariate Poisson Process
Through the use of the method introduced earlier and the test procedure outlined below, one can pinpoint the main contributors to the out-of-control signals for a multivariate Poisson process:
- (I) Commence at i = 1.
- (II) Assign the value of to . (The error spending approach and the Bonferroni method are used here to keep the type I error close to its nominal value; see [34].)
- (III) At the significance level, test the hypothesis with the statistic .
- (IV) If the hypothesis is rejected in Step (III), eliminate the variable with the largest adjusted residual. For convenience, assume that the kth variable is the one removed. Update k = k − 1 and i = i + 1, and return to Step (II).
- (V) If the hypothesis in Step (III) cannot be rejected, terminate and conclude that the remaining variables are not responsible for the nonconformity shifts.
The steps are repeated until the hypothesis can no longer be rejected for the remaining subset of variables. The variables removed along the way are deemed contributors to the process shift, whereas the variables in this remaining subset are not considered to be sources.
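Steps (I) to (V) can be sketched as a short step-down loop. This is an illustrative sketch, not the authors' implementation: the residual form $\sqrt{n}(\bar{x}_j - \lambda_{0j})/\sqrt{\lambda_{0j}}$ is assumed, and the level spent at step i is passed in as a function because the exact expression in Step (II) is not reproduced here. The default, α/(i + 1), is an assumption; at α = 0.05 it gives critical values of about 2.39, 2.39, and 2.24 for k = 3, 2, and 1, which agrees with the values quoted in the demonstrative case of Section 4. The variable names and numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def identify_contributors(xbar, lambda0, n, alpha=0.05,
                          level=lambda alpha, i: alpha / (i + 1)):
    """Step-down search for variables whose means have shifted upward.

    xbar, lambda0: dicts mapping a variable name to its sample mean and
    in-control mean. `level(alpha, i)` gives the level spent at step i;
    the default is an assumption, not a formula quoted from the text.
    Returns (flagged names in removal order, per-step log).
    """
    resid = {v: sqrt(n) * (xbar[v] - lambda0[v]) / sqrt(lambda0[v])
             for v in xbar}
    remaining, flagged, log, i = list(resid), [], [], 1       # Step (I)
    while remaining:
        k = len(remaining)
        alpha_i = level(alpha, i)                             # Step (II)
        crit = NormalDist().inv_cdf(1.0 - alpha_i / k)
        worst = max(remaining, key=resid.get)                 # Step (III)
        log.append((i, worst, resid[worst], crit))
        if resid[worst] <= crit:                              # Step (V)
            break
        flagged.append(worst)                                 # Step (IV)
        remaining.remove(worst)
        i += 1
    return flagged, log


# Hypothetical sample means and in-control means for three variables.
flagged, log = identify_contributors(
    xbar={"A": 6.0, "B": 12.5, "C": 4.2},
    lambda0={"A": 3.0, "B": 10.0, "C": 4.0},
    n=10,
)
print(flagged)
for step in log:
    print(step)
```

Because the residuals are computed once from the full in-control mean vector, removing a variable changes only the dimension k and the level spent at that step, not the remaining residuals.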
To summarize, the method mentioned above can be briefly outlined as follows: Once a data set is collected, first use the in-control process mean to calculate the standardized residual for each quality characteristic. Then, rank these standardized residuals from largest to smallest. Next, compare them sequentially with the corresponding critical values, from largest to smallest, to determine whether the quality characteristic is out-of-control. It is worth noting that the proposed method is based on the fundamental assumption that the quality characteristics follow a multivariate Poisson distribution. If the quality characteristics do not conform to this distribution, the method will not be able to identify the sources accurately. Additionally, the proposed method assumes that when the process is out-of-control, the multivariate control chart can correctly detect it. Under these circumstances, the method can effectively determine the contributors. However, if the multivariate control chart fails to detect the out-of-control condition, the method will not be applicable.
As commonly understood, the multivariate Poisson distribution has broad applications in various fields. In addition, many research issues are essentially statistical problems focused on detecting outliers. Therefore, the approach introduced in this study can be applied not only for detecting sources of out-of-control processes in a multivariate Poisson process but can also offer practical potential for addressing problems involving the detection of outliers in multivariate Poisson distribution data in various fields.
3. Numerical Simulations
We perform simulations to showcase the effectiveness of the approach introduced earlier. Examining every possible data structure is infeasible. Therefore, this study considers two values of k: 2 and 3. FORTRAN V was used for coding all simulation programs. When the value of k is two, we use the telecommunication data set discussed in Chiu and Kuo [31] and assume the data have a bivariate Poisson distribution with in a process under control. In the case where k equals three, we apply the hepatitis C data set examined in Pascual and Akhundjanov [30] and assume
and that the correlation matrix is
Additionally, three simulation cases with out-of-control situations were considered. These cases take into account the following mean vectors:
Figure 2.
The relationship between and the ARR under case I for various sample sizes with k = 2 and k = 3.
Figure 3.
The relationship between and the ARRs under different cases for various sample sizes with k = 3.
Figure 2 illustrates how the ARR changes with the shift value for various sample sizes. As either the sample size or the shift value increases, the ARR approaches 1, as expected. Additionally, a comparison of the three simulated cases in Figure 3 suggests that a larger number of out-of-control quality variables requires a larger sample size to maintain the same ARR. These results indicate that the proposed approach can accurately identify the sources of a process failure in a multivariate Poisson process when the sample size is sufficient. Similar results were observed in our comprehensive simulation studies.
4. A Demonstrative Case
A practical example discussed in Pascual and Akhundjanov [30] concerning hepatitis C disease is analyzed to demonstrate the proposed method. Pascual and Akhundjanov [30] used monthly data on hepatitis C notifications from three Australian states and assumed that the counts of hepatitis C incidents follow a trivariate Poisson distribution. The three states are New South Wales, Victoria, and South Australia. For clarity, let N, V, and S denote these three states, respectively. Pascual and Akhundjanov [30] tracked disease activity with a multivariate control chart to quickly detect any upward trends in cases. Based on Pascual and Akhundjanov [30], the in-control mean vector and correlation matrix are estimated as shown in (14) and (15). To demonstrate the proposed method, we consider the following out-of-control mean vector
which is one of the out-of-control simulation scenarios examined by Pascual and Akhundjanov [30]. It is evident that N and V are the states contributing to nonconformity shifts. In the event that the multivariate control chart triggers an out-of-control signal, the proposed method can now be used to determine the quality variables causing the nonconformity shifts. For ease of explanation, an illustrative data set is generated by assuming n = 10 and using the distribution with the specified defined in (16) and defined in (15). The simulated data set consists of the following ten observations: (3, 16, 5), (4, 17, 4), (3, 18, 6), (6, 18, 5), (11, 19, 10), (5, 23, 6), (2, 17, 5), (3, 12, 5), (6, 17, 3), and (7, 17, 8). To assess the appropriateness of the simulated data, the one-sample Kolmogorov–Smirnov test is applied to determine if the three variables conform to the Poisson distribution. The results are given in Table 1.
Table 1.
The results of the one-sample Kolmogorov–Smirnov test for checking whether the variables conform to a Poisson distribution.
As shown in Table 1, the p-values for all three variables are above 0.05. These results give no evidence against modeling the marginal distributions of the simulated disease counts with Poisson distributions. In addition, the Fisher z-transformation test statistic
is employed to test whether the correlation coefficients between each pair of variables in the simulated data conform to Equation (15). The results are shown in Table 2.
Table 2.
The results of the Fisher z-transformation test for testing whether the correlation coefficients between each pair of variables follow Equation (15).
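For readers who want to reproduce a check of this kind, the sketch below implements a standard Fisher z-transformation test of a correlation coefficient against a hypothesized value. It assumes the usual statistic $(\operatorname{artanh}(r) - \operatorname{artanh}(\rho_0))\sqrt{n-3}$ with a two-sided normal p-value; whether the study used a one- or two-sided version is not stated here. The data and the value of `rho0` are hypothetical placeholders, not the entries of Equation (15).

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z_test(x, y, rho0):
    """Two-sided test of H0: corr(x, y) = rho0 via Fisher's z-transformation.

    Returns (sample correlation, z statistic, p-value). Requires n > 3.
    """
    n = len(x)
    r = pearson_r(x, y)
    z = (atanh(r) - atanh(rho0)) * sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r, z, p


# Hypothetical paired counts and a placeholder hypothesized correlation.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r, z, p = fisher_z_test(x, y, rho0=0.5)
print(f"r = {r:.3f}, z = {z:.3f}, p = {p:.3f}")
```

With small samples such as n = 10, the test has limited power, so a p-value above 0.05 indicates only that the data do not contradict the hypothesized correlation.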
According to Table 2, all p-values exceed 0.05, suggesting that the correlation coefficients are consistent with Equation (15). Based on these results, the simulated disease counts can be regarded as consistent with a multivariate Poisson distribution. Furthermore, the sample means and adjusted residuals are calculated and shown in Table 3.
Table 3.
The summary statistics of the simulated disease counts.
With a significance level of 0.05 and the application of the method described earlier, it is possible to determine the contributors to nonconformity shifts. Table 4 provides a summary of this analysis.
Table 4.
Demonstration of the proposed test procedure (significance level 0.05).
According to Table 4, the test statistic exceeds 2.39 in the first iteration, so we reject the null hypothesis. Because N has the largest test value, we conclude that it contributes to the nonconformity shifts. We remove this variable, update i to i + 1, and update k to k − 1. Similarly, in the second iteration, the test statistic still exceeds 2.39, and V has the largest test value. As a result, the null hypothesis is rejected and state V is considered a contributor. After removing this variable, we again update i to i + 1 and k to k − 1. In the third iteration, the test statistic falls to 1.30, which is below 2.24, so the null hypothesis cannot be rejected and the testing procedure ends. The analysis indicates that the remaining state S is not responsible for the nonconformity shift. Table 4 thus illustrates how the proposed method can effectively and easily pinpoint the contributors to nonconformity shifts.
5. Conclusions
In the process industry, it is crucial to identify the contributors to an out-of-control process as quickly and accurately as possible. This study differs from most previous work by identifying the contributors to nonconformity shifts in multivariate Poisson processes rather than to shifts in multivariate normal processes. This study proposes a new test approach to identify outliers in a multivariate Poisson distribution. Additionally, this study presents an iterative testing method for recognizing the sources of a process failure in a multivariate Poisson process. According to our numerical results, the proposed method is a straightforward and effective approach for recognizing the sources of a process failure in a multivariate Poisson process. Multivariate Poisson distributions are used widely in the social and natural sciences, so the introduced test procedure may apply to various other disciplines.
When more than one hypothesis test is conducted at once, it is referred to as multiple testing. The probability of encountering one or more false positives while performing multiple tests is referred to as the family-wise error rate. To keep the overall family-wise error rate close to the desired significance level and to reduce the risk of false positives, this study employs the Bonferroni method and the error spending approach. However, the Bonferroni correction is not the only method for handling multiple tests. Alternative techniques, including the Benjamini–Hochberg method [35], the Holm–Bonferroni method [36], and the Šidák method [37], are also viable options to consider. More studies are necessary to ascertain which method is the best.
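For reference, the decision rules of the corrections named above can be written side by side for m simultaneous tests at overall level α. The symbols m, α, $H_{(j)}$, and the ordered p-values $p_{(1)} \le \dots \le p_{(m)}$ are notation introduced here for illustration only. The Benjamini–Hochberg procedure is listed for comparison, although it controls the false discovery rate rather than the family-wise error rate.

```latex
\begin{align*}
\text{Bonferroni:} \quad & \text{reject } H_j \text{ if } p_j \le \frac{\alpha}{m}, \\
\text{\v{S}id\'ak:} \quad & \text{reject } H_j \text{ if } p_j \le 1 - (1 - \alpha)^{1/m}, \\
\text{Holm--Bonferroni:} \quad & \text{for } j = 1, \dots, m, \text{ reject } H_{(j)} \text{ while } p_{(j)} \le \frac{\alpha}{m - j + 1}, \text{ stopping at the first failure}, \\
\text{Benjamini--Hochberg:} \quad & \text{reject } H_{(1)}, \dots, H_{(j^*)}, \quad j^* = \max\Bigl\{ j : p_{(j)} \le \frac{j}{m}\,\alpha \Bigr\}.
\end{align*}
```

The Šidák threshold is slightly larger than the Bonferroni threshold and is exact when the tests are independent, while the Holm–Bonferroni rule is never less powerful than the plain Bonferroni rule.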
A technique that approximates the critical value of the introduced test was created with the use of Boole’s inequality. In light of our numerical findings, the approximation works effectively under the typical nominal level. Although the approximation may be satisfactory for numerous applications, using a sharper inequality could enhance its performance. Further investigation of this possibility is needed. In addition, as various other types of multivariate processes exist, further research is needed to determine whether other multivariate processes can be tackled using the same approach.
Author Contributions
Conceptualization, C.-D.H. and R.-H.S.; methodology, C.-D.H. and R.-H.S.; software, C.-D.H. and R.-H.S.; validation, R.-H.S.; formal analysis, C.-D.H.; investigation, C.-D.H. and R.-H.S.; resources, C.-D.H. and R.-H.S.; data curation, C.-D.H. and R.-H.S.; writing—original draft preparation, C.-D.H. and R.-H.S.; writing—review and editing, C.-D.H. and R.-H.S.; project administration, C.-D.H. and R.-H.S.; visualization, C.-D.H. and R.-H.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work is partially supported by the National Science and Technology Council, Taiwan, under grant number MOST 112-2118-M-030-002 (C.-D.H.).
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Low, C.; Hsu, C.M.; Yu, F.J. Analysis of variations in a multivariate process using neural networks. Int. J. Adv. Manuf. Technol. 2003, 22, 911–921.
2. Chen, L.H.; Wang, T.Y. Artificial neural networks to classify mean shifts from multivariate chart signals. Comput. Ind. Eng. 2004, 47, 195–205.
3. Hwarng, H.B.; Wang, Y. Shift detection and source identification in multivariate autocorrelated process. Int. J. Prod. Res. 2010, 48, 835–859.
4. Shao, Y.E.; Huang, H.Y.; Chen, Y.J. Determining the sources of variance shifts in a multivariate process using flexible discriminant analysis. ICIC Express Lett. 2010, 4, 1573–1578.
5. Shao, Y.E.; Lu, C.J.; Wang, Y.C. A hybrid ICA-SVM approach for determining the quality variables at fault in a multivariate process. Math. Probl. Eng. 2012, 2012, 284910.
6. Shao, Y.E.; Hou, C.D. Hybrid artificial neural networks modeling for faults identification of a stochastic multivariate process. Abstr. Appl. Anal. 2013, 2013, 386757.
7. Shao, Y.E.; Hou, C.D. Fault identification in industrial processes using an integrated approach of neural network and analysis of variance. Math. Probl. Eng. 2013, 2013, 516760.
8. Shao, Y.E. Recognition of process disturbances for an SPC/EPC stochastic system using support vector machine and artificial neural network approaches. Abstr. Appl. Anal. 2014, 2014, 519705.
9. Shao, Y.E. Using a computational intelligence hybrid approach to recognize the faults of variance shifts for a manufacturing process. J. Ind. Intell. Inf. 2016, 4, 131–135.
10. Shao, Y.E.; Lin, S.C. Using a time delay neural network approach to diagnose the out-of-control signals for a multivariate normal process with variance shifts. Mathematics 2019, 7, 959.
11. Sabahno, H.; Niaki, S.T.A. New Machine-Learning Control Charts for Simultaneous Monitoring of Multivariate Normal Process Parameters with Detection and Identification. Mathematics 2023, 11, 3566.
12. Runger, G.C.; Alt, F.B.; Montgomery, D.C. Contributors to a multivariate statistical process control signal. Commun. Stat.-Theory Methods 1996, 25, 2203–2213.
13. Mason, R.L.; Tracy, N.D.; Young, J.C. A practical approach for interpreting multivariate T² control chart signals. J. Qual. Technol. 1997, 29, 396–406.
14. Maravelakis, P.E.; Bersimis, S.; Panaretos, J.; Psarakis, S. Identifying the out of control variable in a multivariate control chart. Commun. Stat.-Theory Methods 2002, 31, 2391–2408.
15. Vives-Mestres, M.; Daunis-i-Estadella, J.; Martín-Fernández, J.A. Signal interpretation in Hotelling’s T² control chart for compositional data. IIE Trans. 2016, 48, 661–672.
16. Kim, J.; Jeong, M.K.; Elsayed, E.A.; Al-Khalifa, K.N.; Hamouda, A.M.S. An adaptive step-down procedure for fault variable identification. Int. J. Prod. Res. 2016, 54, 3187–3200.
17. Pina-Monarrez, M. Generalization of the Hotelling’s T² decomposition method to the R-chart. Int. J. Ind. Eng.-Theory Appl. Pract. 2018, 25, 200–214.
18. Güler, Z.O.; Bakır, M.A. Detection and identification of mean shift using independent component analysis in multivariate processes. J. Stat. Comput. Simul. 2021, 92, 1920–1940.
19. Haq, A.; Khoo, M.B.C. An adaptive multivariate EWMA mean chart with variable sample sizes and/or variable sampling intervals. Qual. Reliab. Eng. Int. 2022, 38, 3322–3341.
20. Jing, H.; Li, J.; Bai, K. Directional monitoring and diagnosis for covariance matrices. J. Appl. Stat. 2022, 49, 1449–1464.
21. Lu, X.S.; Xie, M.; Goh, T.N.; Lai, C.D. Control chart for multivariate attribute processes. Int. J. Prod. Res. 1998, 36, 3477–3489.
22. Taleb, H. Control charts applications for multivariate attribute processes. Comput. Ind. Eng. 2009, 56, 399–410.
23. Topalidou, E.; Psarakis, S. Review of multinomial and multiattribute quality control charts. Qual. Reliab. Eng. Int. 2009, 25, 773–804.
24. Chiu, J.E.; Kuo, T.I. Control charts for fraction nonconforming in a bivariate binomial process. J. Appl. Stat. 2010, 37, 1717–1728.
25. Yang, S.F.; Yeh, J.T. Using cause selecting control charts to monitor dependent process stages with attributes data. Expert Syst. Appl. 2011, 38, 667–672.
26. Li, J.; Tsung, F.; Zou, C. Directional control schemes for multivariate categorical processes. J. Qual. Technol. 2012, 44, 136–154.
27. Niaki, S.T.A.; Jahani, P. The economic design of multivariate binomial EWMA VSSI control charts. J. Appl. Stat. 2013, 40, 1301–1318.
28. Li, J.; Tsung, F.; Zou, C. Multivariate binomial/multinomial control chart. IIE Trans. 2014, 46, 526–542.
29. Niaki, S.T.A.; Khedmati, M. Step change-point estimation of multivariate binomial processes. Int. J. Qual. Reliab. Manag. 2014, 31, 566–587.
30. Pascual, F.G.; Akhundjanov, S.B. Copula-based control charts for monitoring multivariate Poisson processes with application to hepatitis C counts. J. Qual. Technol. 2020, 52, 128–144.
31. Chiu, J.E.; Kuo, T.I. Attribute control chart for multivariate Poisson distribution. Commun. Stat.-Theory Methods 2008, 37, 146–158.
32. Suri, N.N.R.R.; Murty, M.N.; Athithan, G. Outlier Detection: Techniques and Applications. A Data Mining Perspective; Springer Nature: Cham, Switzerland, 2019.
33. Krummenauer, F. Efficient simulation of multivariate binomial and Poisson distributions. Biom. J. 1998, 40, 823–832.
34. Hou, C.D.; Chiang, J.; Tai, J.J. Identifying chromosomal fragile sites from a hierarchical-clustering point of view. Biometrics 2001, 57, 435–440.
35. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple hypothesis testing. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1995, 57, 289–300.
36. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
37. Sidák, Z.K. Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 1967, 62, 626–633.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).