Abstract
Manufacturing for a multitude of continuous processing applications in the era of automation and ‘Industry 4.0’ is focused on rapid throughput while producing products of acceptable quality that meet customer specifications. Monitoring the stability or statistical control of key process parameters using data acquired from online sensors is fundamental to successful automation in manufacturing applications. This study addresses the significant problem of positive autocorrelation in data collected from online sensors, which may impair assessment of statistical control. Sensor data collected at short time intervals typically have significant autocorrelation, so traditional statistical process control (SPC) techniques, which assume independent observations, cannot be deployed as they stand. There is an extensive literature on techniques for SPC in the presence of positive autocorrelation. This paper contributes to this area of study by investigating the performance of ‘Copula’ based control charts, assessed by the average run length (ARL), when successive observations are correlated and follow the AR(1) model. The conditional distribution of the current observation given the previous observation is used in deriving the control chart limits for three different categories of Copulas: Gaussian, Clayton, and Farlie-Gumbel-Morgenstern Copulas. Preliminary results suggest that the overall performance of the Clayton Copula and the Farlie-Gumbel-Morgenstern Copula is better than that of the other Archimedean Copulas considered. Of the two, the Clayton Copula is the more robust with respect to changes in the process standard deviation as the correlation coefficient increases.
1. Introduction
The origins of time series modeling go back to the 1930s, in the context of the ARMA (Autoregressive Moving Average) models developed by Herman Wold [1] for stationary time series. Later, these models were expanded to include ARIMA (Autoregressive Integrated Moving Average) models in order to handle non-stationary time series. In the 1970s, Box-Jenkins ARIMA models became very popular, as these models can handle seasonal and non-seasonal patterns [2]. In addition, smoothing techniques were used in time series [3,4]. These models use iterative numerical procedures to estimate their parameters, and they are still popular and heavily used in forecasting. Beyond forecasting, time series models can be used in quality control, especially in the construction of the popular EWMA (Exponentially Weighted Moving Average) control charts, under the assumption that the averages follow the AR(1) (first-order autoregressive) model [5,6,7,8,9,10,11].
The use of Copulas in modeling resulted from the pioneering work of Sklar [12]. Substantial research was done in the 1980s and 1990s on Copula modeling and its applications in subjects such as economics, finance, actuarial science, and engineering (see Nelsen [13]). Copula models use the dependence structure and the direction of the association for modeling. There are several Copula models, which differ in their properties. Copula models fall under one of two major categories: Archimedean Copulas and non-Archimedean Copulas [14,15]. There are Copulas for discrete-type variables and for continuous-type variables, as well as variants such as Vine Copulas and Hierarchical Copulas [16,17,18,19,20,21]. These variants are quite useful for dimension reduction in high-dimensional problems. We expand upon the work of Hryniewicz [22] by considering other models of Copula-based dependence; that is, the approximate models proposed in this study rely on Pearson’s rho, with practical considerations.
Autocorrelation is an inherent problem with time series data collected at short time intervals. Given the advent of automation and high-speed throughput in the modern era of manufacturing (Industry 4.0 [23]), autocorrelation is a significant problem and hampers traditional SPC techniques. Autocorrelation, if unaccounted for in statistical process control (SPC) applications, can result in false out-of-control signals and may induce over-adjustment of the process and unwanted variation, e.g., ‘Deming’s funnel experiment’ [36]. There is an extensive literature on SPC adjusted for autocorrelation (see [24,25,26,27,28,29,30]). In this study, we use Copula models to construct control charts for a stationary process under the assumption that the process follows the AR(1) model. We evaluate the performance of these models on the basis of the Average Run Length (ARL).
2. Materials and Methods
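Let us assume that the observations $X_t$ form an AR(1) model:
$$X_t = c + \phi X_{t-1} + \varepsilon_t, \qquad |\phi| < 1. \tag{1}$$
Note that the error terms, $\varepsilon_t$, are assumed to be independent, with a mean = 0 and a constant variance = $\sigma_\varepsilon^2$, but are not necessarily distributed as a normal distribution. From (1), it follows that
$$E(X_t) = \mu = \frac{c}{1-\phi}, \qquad \mathrm{Var}(X_t) = \sigma^2 = \frac{\sigma_\varepsilon^2}{1-\phi^2}.$$
Note that the observations $X_t$ and $X_{t-1}$ follow the same distribution, although these observations are correlated. Let $\rho$ be the first-order autocorrelation coefficient; then
$$\rho = \frac{\mathrm{Cov}(X_t, X_{t-1})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t-1})}} = \frac{\phi\,\sigma^2}{\sigma^2} = \phi.$$
As we can see, the first-order autocorrelation is equal to $\phi$. Our interest is in finding a suitable Copula model for approximating the joint distribution among these correlated time series observations.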
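The moment results for the AR(1) model can be traced step by step. The short derivation below is a sketch under the stated assumptions (stationarity, and errors independent of past observations); the symbols $c$, $\phi$, $\mu$, $\sigma^2$, and $\sigma_\varepsilon^2$ for the constant, the coefficient, the process mean, the process variance, and the error variance are notation assumed here for illustration.

```latex
\begin{aligned}
E(X_t) &= c + \phi\,E(X_{t-1}) + E(\varepsilon_t)
  \;\Rightarrow\; \mu = c + \phi\mu
  \;\Rightarrow\; \mu = \frac{c}{1-\phi},\\[4pt]
\mathrm{Var}(X_t) &= \phi^2\,\mathrm{Var}(X_{t-1}) + \mathrm{Var}(\varepsilon_t)
  \;\Rightarrow\; \sigma^2 = \phi^2\sigma^2 + \sigma_\varepsilon^2
  \;\Rightarrow\; \sigma^2 = \frac{\sigma_\varepsilon^2}{1-\phi^2},\\[4pt]
\mathrm{Cov}(X_t, X_{t-1}) &= \phi\,\mathrm{Var}(X_{t-1}) + \mathrm{Cov}(\varepsilon_t, X_{t-1})
  = \phi\,\sigma^2
  \;\Rightarrow\; \rho = \frac{\phi\,\sigma^2}{\sigma^2} = \phi .
\end{aligned}
```

Each line uses stationarity to equate the moments of $X_t$ and $X_{t-1}$, which is why the condition $|\phi| < 1$ is needed for the variance to be finite and positive.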
2.1. Copula Construction
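Here, we investigate the Copula models, such as the Clayton, Gumbel, Frank, Farlie-Gumbel-Morgenstern (FGM), and Gaussian Copulas. Let us first discuss the construction of Copulas in a general set-up. In order to construct the Copulas, we need to define some notation:
$$F(x) = P(X_{t-1} \le x), \qquad G(y) = P(X_t \le y), \qquad u = F(x_{t-1}), \qquad v = G(x_t).$$
However, since $X_{t-1}$ and $X_t$ follow the same distribution (due to AR(1) being a stationary process), F = G. Then, the cumulative joint probability distribution of $X_{t-1}$ and $X_t$ can be written as follows:
$$P(X_{t-1} \le x_{t-1},\; X_t \le x_t) = C(u, v) = C\big(F(x_{t-1}),\, F(x_t)\big).$$
According to Copula theory, the conditional cumulative probability distribution of $X_t$ given $X_{t-1}$ is given by the following equation:
$$P(X_t \le x_t \mid X_{t-1} = x_{t-1}) = \frac{\partial C(u, v)}{\partial u}.$$
Note, for the AR(1) model, the conditional distribution of $X_t$ given $X_{t-1}$ is the same as the conditional distribution of the error term $\varepsilon_t$ given $X_{t-1}$, with the mean shifted by a constant. However, $\varepsilon_t$ and $X_{t-1}$ are independent of each other. Therefore, the conditional distribution of $X_t$ given $X_{t-1}$ is the same as the unconditional distribution of $\varepsilon_t$ with the mean shifted by the same constant.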
2.1.1. Clayton Copula
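The derivation of the conditional distribution for the Clayton Copula, in its standard one-parameter form with dependence parameter $\theta > 0$ and with $u = F(x_{t-1})$ and $v = F(x_t)$, is as follows:
$$C(u, v) = \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}, \qquad \frac{\partial C(u, v)}{\partial u} = u^{-(\theta+1)}\left(u^{-\theta} + v^{-\theta} - 1\right)^{-(\theta+1)/\theta}.$$
2.1.2. Gumbel Copula
The derivation of the conditional distribution for the Gumbel Copula, in its standard form with $\theta \ge 1$, is as follows:
$$C(u, v) = \exp\left\{-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right\}, \qquad \frac{\partial C(u, v)}{\partial u} = C(u, v)\,\frac{(-\ln u)^{\theta-1}}{u}\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta - 1}.$$
2.1.3. Farlie-Gumbel-Morgenstern (FGM) Copula
The derivation of the conditional distribution for the FGM Copula, in its standard form with $|\theta| \le 1$, is as follows:
$$C(u, v) = uv\left[1 + \theta(1-u)(1-v)\right], \qquad \frac{\partial C(u, v)}{\partial u} = v\left[1 + \theta(1-v)(1-2u)\right].$$
2.1.4. Frank Copula
The derivation of the conditional distribution for the Frank Copula, in its standard form with $\theta \ne 0$, is as follows:
$$C(u, v) = -\frac{1}{\theta}\ln\left[1 + \frac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right], \qquad \frac{\partial C(u, v)}{\partial u} = \frac{e^{-\theta u}\,(e^{-\theta v}-1)}{(e^{-\theta}-1) + (e^{-\theta u}-1)(e^{-\theta v}-1)}.$$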
Remark 1.
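As can be seen, $\partial C(u, v)/\partial u$ is the cumulative conditional distribution of $X_t$ given $X_{t-1}$. This is the same as the cumulative conditional distribution of the error term $\varepsilon_t$ given $X_{t-1}$, apart from the shift in the mean. However, $\varepsilon_t$ is independent of $X_{t-1}$. Hence, $\partial C(u, v)/\partial u$ is the cumulative distribution of the error term regardless of which Copula model is used.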
2.1.5. Gaussian Copula
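The derivation of the conditional distribution for the Gaussian Copula is as follows:
$$C(u, v) = \Phi_2\big(\Phi^{-1}(u),\, \Phi^{-1}(v);\, \rho\big),$$
where $\Phi$ and $\Phi_2(\cdot, \cdot\,; \rho)$ represent the standard normal cdf and the standard bivariate normal cdf with correlation $\rho$, respectively. Moreover, $\Phi^{-1}$ represents the functional inverse of $\Phi$.
The conditional cumulative distribution is as follows:
$$\frac{\partial C(u, v)}{\partial u} = \Phi\!\left(\frac{\Phi^{-1}(v) - \rho\,\Phi^{-1}(u)}{\sqrt{1-\rho^2}}\right).$$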
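The conditional distributions of the Copula families discussed in this section can be checked numerically. The following is a minimal sketch, assuming the standard textbook parameterizations of the Clayton, FGM, Frank, Gumbel, and Gaussian Copulas; the function names, the parameter name `theta`, and the test point are illustrative assumptions and not the authors' implementation. Each closed-form partial derivative is compared with a central finite difference of the Copula itself.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v), assumed form with theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """dC/du for the Clayton copula."""
    base = u ** -theta + v ** -theta - 1.0
    return u ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def fgm_cdf(u, v, theta):
    """FGM copula C(u, v), assumed form with |theta| <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_h(u, v, theta):
    """dC/du for the FGM copula."""
    return v * (1.0 + theta * (1.0 - v) * (1.0 - 2.0 * u))


def frank_cdf(u, v, theta):
    """Frank copula C(u, v), assumed form with theta != 0."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta


def frank_h(u, v, theta):
    """dC/du for the Frank copula."""
    eu, ev = math.expm1(-theta * u), math.expm1(-theta * v)
    return math.exp(-theta * u) * ev / (math.expm1(-theta) + eu * ev)


def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v), assumed form with theta >= 1."""
    a, b = -math.log(u), -math.log(v)
    return math.exp(-((a ** theta + b ** theta) ** (1.0 / theta)))


def gumbel_h(u, v, theta):
    """dC/du for the Gumbel copula."""
    a, b = -math.log(u), -math.log(v)
    s = a ** theta + b ** theta
    return gumbel_cdf(u, v, theta) * s ** (1.0 / theta - 1.0) * a ** (theta - 1.0) / u


def gaussian_h(u, v, rho):
    """dC/du for the Gaussian copula with correlation rho."""
    z_u, z_v = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((z_v - rho * z_u) / math.sqrt(1.0 - rho * rho))


def finite_difference_h(cdf, u, v, theta, step=1e-6):
    """Central-difference approximation of dC/du, used only as a check."""
    return (cdf(u + step, v, theta) - cdf(u - step, v, theta)) / (2.0 * step)


if __name__ == "__main__":
    u, v = 0.3, 0.7  # arbitrary interior test point
    families = [
        ("Clayton", clayton_cdf, clayton_h, 2.0),
        ("FGM", fgm_cdf, fgm_h, 0.5),
        ("Frank", frank_cdf, frank_h, 2.0),
        ("Gumbel", gumbel_cdf, gumbel_h, 1.5),
    ]
    for name, cdf, h, theta in families:
        print(name, h(u, v, theta), finite_difference_h(cdf, u, v, theta))
    print("Gaussian", gaussian_h(u, v, rho=0.5))
```

The Gaussian case is not differenced here because the bivariate normal cdf is not available in the standard library; with `rho = 0` its conditional distribution reduces to `v`, which gives a simple sanity check.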
Now, the question is which Copula should be used. For example, for data that exhibit a right tail, such as exponential data (e.g., survival analysis for the lifetime of machinery), the Clayton model is better [31]. For data that exhibit a left tail, such as extreme-value data (e.g., stress due to heavy wind or pollution), the Gumbel model is better [32]. For data that have both tails, the t-Copula is better [33]. If there are no tails, then the Farlie-Gumbel-Morgenstern model is better [34].
Suppose that, at a particular location, we are monitoring the moisture of biomaterials and that the measurements are correlated. We assume the AR(1) model is appropriate for these data due to the slow decay in the autocorrelations between $X_t$ and $X_{t+k}$ as the lag $k$ increases. The reason is that for the AR(1) model, the autocorrelation between $X_t$ and $X_{t+k}$ is $\phi^k$, where $\phi$ is the AR(1) model coefficient and $|\phi| < 1$. Next, we do a simulation study to find out which Archimedean Copula approximates the conditional distribution better. For the simulation we assume that $X_t$ follows a univariate normal distribution, with mean $\mu$ and variance $\sigma^2$. We use Theorems A1 and A2 in Appendix A to derive the results presented in the next section.
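The geometric decay of the AR(1) autocorrelations can be illustrated by simulation. The sketch below assumes normal errors and uses illustrative parameter values (`c = 2.0`, `phi = 0.8`, `sigma_eps = 1.0`) that are not taken from the study; the function names are hypothetical. It compares the sample autocorrelation at several lags with the theoretical value $\phi^k$.

```python
import random


def simulate_ar1(n, c, phi, sigma_eps, seed=0, burn_in=500):
    """Simulate n observations from X_t = c + phi * X_{t-1} + eps_t.

    Errors are drawn as independent N(0, sigma_eps^2); the first
    `burn_in` values are discarded so the start value has little effect.
    """
    rng = random.Random(seed)
    x = c / (1.0 - phi)  # start at the stationary mean
    series = []
    for i in range(n + burn_in):
        x = c + phi * x + rng.gauss(0.0, sigma_eps)
        if i >= burn_in:
            series.append(x)
    return series


def sample_autocorr(xs, lag):
    """Sample autocorrelation of xs at the given positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    cross = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cross / total_ss


if __name__ == "__main__":
    phi = 0.8
    series = simulate_ar1(50_000, c=2.0, phi=phi, sigma_eps=1.0, seed=1)
    print("sample mean:", sum(series) / len(series), "theory:", 2.0 / (1.0 - phi))
    for k in range(1, 6):
        print(k, round(sample_autocorr(series, k), 3), round(phi ** k, 3))
```

With a coefficient close to 1 the theoretical autocorrelation decays slowly with the lag, which is the pattern used above to motivate the AR(1) model.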
3. Comparison of Copulas to Approximate the Conditional Distribution
Here, we present the results comparing the Copulas based on the conditional probabilities with the actual conditional probability (see Table 1). It can be seen from Table 1 that the overall performance of the Clayton Copula and the Farlie-Gumbel-Morgenstern (FGM) Copula is better compared to the other Archimedean Copulas considered here. Therefore, we use the Clayton Copula and the FGM Copula for approximating the conditional distribution of the current observation $X_t$ given the previous observation $X_{t-1}$.
Table 1. Comparing the Copulas based on the conditional probabilities with the actual conditional probability.
In order to compute the conditional cumulative probability distribution under the different Copula models, we use the following values for the parameters of the AR(1) model. Note that $c$ represents the constant and $\phi$ is the coefficient in the AR(1) model. The observations are simulated according to the AR(1) model only (no other experimental design is used).
4. Applications in Quality Control
4.1. Construction of the Control Charts
Quality control methods started in the 1920s and became widely popular in the 1940s due to the efforts of Walter Shewhart [35], Edwards Deming [36], Joseph Juran [37], and Armand Feigenbaum [38]. As a result, the American Society for Quality Control (ASQC) was formed in 1946. The concept of control charts was later extended to cover multivariate situations as well [39]. In most control charts, the assumptions are that the observations are independent and that the distribution is normal (or multivariate normal in multivariate situations). EWMA charts were introduced for situations where the observations were correlated [6,7]. Here we introduce a Copula-based approach to construct the control chart when successive observations are correlated and follow the AR(1) model. The conditional distribution of the current observation $X_t$ given the previous observation $X_{t-1}$ is used in deriving the control chart limits.
4.2. EWMA Chart as a Special Case
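Let $X_t$ be independent observations following a stationary process, as defined by $X_t = \mu + \varepsilon_t$, where $\varepsilon_t$ are independent normally distributed error terms. Let us define $Z_t = \lambda X_t + (1-\lambda) Z_{t-1}$, where $0 < \lambda \le 1$. Note that
$$Z_t = \lambda\mu + (1-\lambda) Z_{t-1} + \lambda\varepsilon_t,$$
which has the form of the AR(1) model with constant $\lambda\mu$, coefficient $1-\lambda$, and error term $\lambda\varepsilon_t$.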
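The link between the EWMA statistic and the AR(1) model can be checked numerically. The sketch below assumes the standard EWMA recursion $Z_t = \lambda X_t + (1-\lambda) Z_{t-1}$ applied to independent normal observations with mean $\mu$; the symbols $\lambda$ and $\mu$, the function names, and the numerical values are illustrative assumptions. It computes the EWMA directly and again as an AR(1) recursion with constant $\lambda\mu$, coefficient $1-\lambda$, and errors $\lambda\varepsilon_t$, and reports the largest difference between the two paths.

```python
import random


def ewma(observations, lam, z0):
    """Standard EWMA recursion: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z, path = z0, []
    for x in observations:
        z = lam * x + (1.0 - lam) * z
        path.append(z)
    return path


def ar1_from_errors(errors, c, phi, z0):
    """AR(1) recursion driven by a given error sequence: z_t = c + phi * z_{t-1} + e_t."""
    z, path = z0, []
    for e in errors:
        z = c + phi * z + e
        path.append(z)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    mu, sigma, lam = 10.0, 2.0, 0.2  # illustrative values only
    eps = [rng.gauss(0.0, sigma) for _ in range(1000)]
    xs = [mu + e for e in eps]  # independent observations around mu

    z_ewma = ewma(xs, lam, z0=mu)
    z_ar1 = ar1_from_errors([lam * e for e in eps], c=lam * mu, phi=1.0 - lam, z0=mu)

    # The two paths agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(z_ewma, z_ar1)))
```

A small smoothing constant therefore corresponds to an AR(1) coefficient close to 1, that is, to a strongly autocorrelated charting statistic.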
4.3. Approximation Based on the Clayton Copula
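As noted earlier,
$$\frac{\partial C(u, v)}{\partial u} = F_\varepsilon(x_t - c - \phi x_{t-1}),$$
where $F_\varepsilon$ is the cumulative probability distribution function of the error $\varepsilon_t$, and $c$ and $\phi$ are the constant and the coefficient of the AR(1) model. Note that this is the conditional distribution of $X_t$ given $X_{t-1}$. We are interested in setting up the quality control limits for $X_t$ such that $P(\mathrm{LCL} \le X_t \le \mathrm{UCL} \mid X_{t-1} = x_{t-1}) = 1-\alpha$, where $1-\alpha$ is the confidence level for the control chart. So, for the conditional distribution of $X_t$ given $X_{t-1}$ based on the Clayton Copula model with dependence parameter $\theta > 0$, the $100(1-\alpha)\%$ confidence control chart can be found by solving the following inequality:
$$\frac{\alpha}{2} \le u^{-(\theta+1)}\left(u^{-\theta} + v^{-\theta} - 1\right)^{-(\theta+1)/\theta} \le 1 - \frac{\alpha}{2}.$$
Here, we are making the assumption that the unconditional distribution of $X_t$ is normally distributed with mean $\mu$ and variance $\sigma^2$.
Let
$$u = \Phi\!\left(\frac{x_{t-1} - \mu}{\sigma}\right), \qquad v = \Phi\!\left(\frac{x_t - \mu}{\sigma}\right),$$
where $\Phi$ is the standard normal cdf.
Then, at the upper control limit $x_t = \mathrm{UCL}$, with $v = v_U$,
$$u^{-(\theta+1)}\left(u^{-\theta} + v_U^{-\theta} - 1\right)^{-(\theta+1)/\theta} = 1 - \frac{\alpha}{2}.$$
So,
$$v_U = \left[1 + u^{-\theta}\left(\left(1 - \frac{\alpha}{2}\right)^{-\theta/(\theta+1)} - 1\right)\right]^{-1/\theta}.$$
Note that $v_U = \Phi\big((\mathrm{UCL} - \mu)/\sigma\big)$. This means that
$$\mathrm{UCL} = \mu + \sigma\,\Phi^{-1}(v_U).$$
Similarly, at the lower control limit $x_t = \mathrm{LCL}$, with $v = v_L$,
$$u^{-(\theta+1)}\left(u^{-\theta} + v_L^{-\theta} - 1\right)^{-(\theta+1)/\theta} = \frac{\alpha}{2}.$$
So,
$$v_L = \left[1 + u^{-\theta}\left(\left(\frac{\alpha}{2}\right)^{-\theta/(\theta+1)} - 1\right)\right]^{-1/\theta}.$$
This means that
$$\mathrm{LCL} = \mu + \sigma\,\Phi^{-1}(v_L).$$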
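The limit calculation for the Clayton Copula can be sketched in a few lines. This is a minimal illustration, assuming the standard Clayton form $C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta > 0$, normal marginals, and an equal split of $\alpha$ between the two tails; the function names, the parameter `theta`, and the default `alpha = 0.0027` are assumptions made here and are not taken from the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def clayton_h(u, v, theta):
    """Conditional cdf dC/du of the (assumed) Clayton copula."""
    base = u ** -theta + v ** -theta - 1.0
    return u ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def clayton_h_inverse(u, p, theta):
    """Solve clayton_h(u, v, theta) = p for v, with 0 < u < 1 and 0 < p < 1."""
    return (1.0 + u ** -theta * (p ** (-theta / (theta + 1.0)) - 1.0)) ** (-1.0 / theta)


def clayton_limits(x_prev, mu, sigma, theta, alpha=0.0027):
    """Conditional (LCL, UCL) for X_t given X_{t-1} = x_prev.

    Assumes the unconditional distribution of X_t is N(mu, sigma^2).
    """
    u = STD_NORMAL.cdf((x_prev - mu) / sigma)
    v_low = clayton_h_inverse(u, alpha / 2.0, theta)
    v_high = clayton_h_inverse(u, 1.0 - alpha / 2.0, theta)
    lcl = mu + sigma * STD_NORMAL.inv_cdf(v_low)
    ucl = mu + sigma * STD_NORMAL.inv_cdf(v_high)
    return lcl, ucl


if __name__ == "__main__":
    # Illustrative values only: process mean 10, sd 2, theta = 2.
    for x_prev in (8.0, 10.0, 12.0):
        print(x_prev, clayton_limits(x_prev, mu=10.0, sigma=2.0, theta=2.0))
```

Because the limits depend on the previous observation through `u`, they move with the process rather than staying fixed as in a Shewhart chart, and they are generally not symmetric about the conditional center.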
4.4. Approximation Based on the Farlie-Gumbel-Morgenstern Copula
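Again, as noted earlier,
$$\frac{\partial C(u, v)}{\partial u} = F_\varepsilon(x_t - c - \phi x_{t-1}),$$
where $F_\varepsilon$ is the cumulative probability distribution function of the error $\varepsilon_t$, and $c$ and $\phi$ are the constant and the coefficient of the AR(1) model. Note that this is the conditional distribution of $X_t$ given $X_{t-1}$. We are interested in setting up the quality control limits for $X_t$ such that $P(\mathrm{LCL} \le X_t \le \mathrm{UCL} \mid X_{t-1} = x_{t-1}) = 1-\alpha$, where $1-\alpha$ is the confidence level for the control chart. So, for the conditional distribution of $X_t$ given $X_{t-1}$, based on the FGM Copula model with dependence parameter $\theta$ ($|\theta| \le 1$), the $100(1-\alpha)\%$ confidence control chart can be found by solving the following inequality:
$$\frac{\alpha}{2} \le v\left[1 + \theta(1-v)(1-2u)\right] \le 1 - \frac{\alpha}{2}.$$
Here, we are making the assumption that the unconditional distribution of $X_t$ is normally distributed with mean $\mu$ and variance $\sigma^2$.
Let
$$u = \Phi\!\left(\frac{x_{t-1} - \mu}{\sigma}\right), \qquad v = \Phi\!\left(\frac{x_t - \mu}{\sigma}\right),$$
where $\Phi$ is the standard normal cdf.
Let
$$a = \theta(1 - 2u).$$
Then, at the upper control limit, with $v = v_U$,
$$a\,v_U^2 - (1 + a)\,v_U + \left(1 - \frac{\alpha}{2}\right) = 0, \qquad v_U = \frac{(1 + a) - \sqrt{(1 + a)^2 - 4a\left(1 - \frac{\alpha}{2}\right)}}{2a} \quad (a \ne 0),$$
with $v_U = 1 - \alpha/2$ when $a = 0$.
Note that $v_U = \Phi\big((\mathrm{UCL} - \mu)/\sigma\big)$. This means that
$$\mathrm{UCL} = \mu + \sigma\,\Phi^{-1}(v_U).$$
Similarly, at the lower control limit, with $v = v_L$,
$$a\,v_L^2 - (1 + a)\,v_L + \frac{\alpha}{2} = 0, \qquad v_L = \frac{(1 + a) - \sqrt{(1 + a)^2 - 4a\,\frac{\alpha}{2}}}{2a} \quad (a \ne 0),$$
with $v_L = \alpha/2$ when $a = 0$.
This means that
$$\mathrm{LCL} = \mu + \sigma\,\Phi^{-1}(v_L).$$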
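The corresponding calculation for the FGM Copula reduces to solving a quadratic in $v$. The sketch below assumes the standard FGM form $C(u,v) = uv[1 + \theta(1-u)(1-v)]$ with $|\theta| \le 1$, normal marginals, and an equal split of $\alpha$ between the tails; the function names and default values are illustrative assumptions. The root is written in a rationalized form so that the case $\theta(1-2u) = 0$ needs no special handling.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def fgm_h(u, v, theta):
    """Conditional cdf dC/du of the (assumed) FGM copula."""
    return v * (1.0 + theta * (1.0 - v) * (1.0 - 2.0 * u))


def fgm_h_inverse(u, p, theta):
    """Solve fgm_h(u, v, theta) = p for v in [0, 1].

    With a = theta * (1 - 2u), the equation is a*v^2 - (1 + a)*v + p = 0.
    The root in [0, 1] is ((1 + a) - sqrt(disc)) / (2a), rewritten here as
    2p / ((1 + a) + sqrt(disc)) so that a = 0 is handled without division by zero.
    """
    a = theta * (1.0 - 2.0 * u)
    disc = (1.0 + a) ** 2 - 4.0 * a * p
    return 2.0 * p / ((1.0 + a) + math.sqrt(disc))


def fgm_limits(x_prev, mu, sigma, theta, alpha=0.0027):
    """Conditional (LCL, UCL) for X_t given X_{t-1} = x_prev, assuming N(mu, sigma^2) marginals."""
    u = STD_NORMAL.cdf((x_prev - mu) / sigma)
    v_low = fgm_h_inverse(u, alpha / 2.0, theta)
    v_high = fgm_h_inverse(u, 1.0 - alpha / 2.0, theta)
    lcl = mu + sigma * STD_NORMAL.inv_cdf(v_low)
    ucl = mu + sigma * STD_NORMAL.inv_cdf(v_high)
    return lcl, ucl


if __name__ == "__main__":
    # Illustrative values only: process mean 5, sd 1, theta = 0.8.
    for x_prev in (4.0, 5.0, 6.0):
        print(x_prev, fgm_limits(x_prev, mu=5.0, sigma=1.0, theta=0.8))
```

When the previous observation sits exactly on the process mean, `u = 0.5` and the conditional limits coincide with the usual unconditional normal limits.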
Next, we investigate the use of the Gaussian Copula in the same context to approximate the conditional distribution of $X_t$ given $X_{t-1}$.
5. Numerical Results
In this section, we perform a simulation study in order to compare the performance of the Gaussian Copula against the Clayton Copula and the FGM Copula on the basis of the average run length under the assumption that the null hypothesis is true, i.e., that the process is in control. This study involved 10,000 simulation trials. Note that $\rho$ represents the first-order autocorrelation between any two consecutive observations, and $\sigma_\varepsilon$ is the error standard deviation according to the AR(1) model. Moreover, we focus on positive correlations, as the Clayton Copula is valid only for positive associations.
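The general shape of such a run-length simulation can be sketched as follows. This is a hypothetical illustration and not the simulation code used in the study: it generates in-control AR(1) data with normal errors and applies conditional limits supplied by a user-defined function. As a baseline, the example uses the exact conditional normal limits of the AR(1) model; a Copula-based limit function taking the previous observation and returning a pair (LCL, UCL) could be substituted. All function names and parameter values are assumptions.

```python
import random
from statistics import NormalDist


def make_exact_limits(c, phi, sigma_eps, alpha):
    """Return a function x_prev -> (LCL, UCL) using the exact AR(1) conditional normal law."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    def limits(x_prev):
        center = c + phi * x_prev
        return center - z * sigma_eps, center + z * sigma_eps

    return limits


def run_length(limits, c, phi, sigma_eps, rng, max_steps=100_000):
    """Number of observations until the first out-of-control signal (capped at max_steps)."""
    sigma = sigma_eps / (1.0 - phi * phi) ** 0.5
    x_prev = rng.gauss(c / (1.0 - phi), sigma)  # start from the stationary distribution
    for step in range(1, max_steps + 1):
        x = c + phi * x_prev + rng.gauss(0.0, sigma_eps)
        lcl, ucl = limits(x_prev)
        if x < lcl or x > ucl:
            return step
        x_prev = x
    return max_steps  # censored run: no signal within max_steps


def average_run_length(limits, c, phi, sigma_eps, n_trials=10_000, seed=0):
    """Monte Carlo estimate of the in-control ARL over n_trials independent runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        total += run_length(limits, c, phi, sigma_eps, rng)
    return total / n_trials


if __name__ == "__main__":
    c, phi, sigma_eps, alpha = 0.0, 0.5, 1.0, 0.05  # illustrative values only
    limits = make_exact_limits(c, phi, sigma_eps, alpha)
    print(average_run_length(limits, c, phi, sigma_eps, n_trials=2000, seed=1))
```

A large `alpha` is used in the example only to keep the run short; for this baseline chart each observation signals independently with probability `alpha`, so the estimate should be close to `1 / alpha`.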
5.1. Gaussian Copula
As we can see from the results presented in Table 2, the ARL under the null hypothesis decreases as the correlation coefficient increases. In addition, note that the ARL is fairly robust with respect to changes in the process standard deviation.
Table 2. Average run length by autocorrelation ($\rho$) and error standard deviation ($\sigma_\varepsilon$) for the Gaussian Copula.
5.2. Clayton Copula
Again, as we can see from the results in Table 3, the ARL under the null hypothesis decreases as the correlation coefficient increases. In addition, note that the ARL is fairly robust with respect to changes in the process standard deviation.
Table 3. Average run length by autocorrelation ($\rho$) and error standard deviation ($\sigma_\varepsilon$) for the Clayton Copula.
5.3. Farlie-Gumbel-Morgenstern Copula
The results presented in Table 4 show that the ARL under the null hypothesis is not as robust as that of the Clayton Copula at a particular value of the correlation coefficient. The values did not decrease when the correlation coefficient increased. Moreover, in the case of this Copula, the correlation coefficient is restricted to a narrow range, because the FGM Copula can represent only weak dependence.
Table 4. Average run length by autocorrelation ($\rho$) and error standard deviation ($\sigma_\varepsilon$) for the Farlie-Gumbel-Morgenstern Copula.
6. Conclusions
The numerical results indicate that the Clayton Copula behaves more or less like the Gaussian Copula at very low correlation levels between any two consecutive time-dependent measurements. For example, in situations in which the measurements are taken at large time intervals, it is reasonable to assume a low level of correlation between these measurements. However, the use of the Clayton Copula is not advisable to construct quality control limits when the correlation level is reasonably high. In any event, the use of other types of Copulas to construct the quality control limits for an AR(1) process should be avoided. For the same reason, we decided not to consider power calculations in this paper.
This study of the use of Copulas to construct the control chart for an AR(1) process can be easily extended to the EWMA chart, as it is also based on an AR(1) process. The authors propose a future study of the use of the Clayton Copula to construct the EWMA quality control chart.
Author Contributions
Conceptualization, T.M.Y. and A.N.; methodology, T.M.Y. and A.N.; software, H.N.; validation, T.M.Y., A.N. and H.N.; formal analysis, A.N.; investigation, T.M.Y. and A.N.; resources, T.M.Y. and A.N.; data curation, A.N. and H.N.; writing (original draft preparation), T.M.Y. and A.N.; writing (review and editing), T.M.Y. and A.N.; project administration, T.M.Y.; funding acquisition, T.M.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the USDA National Institute of Food and Agriculture, Hatch project 1012359.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable; the data were generated by simulation.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Theorem A1.
At any given time, if the observations $X_1, X_2, \ldots, X_n$ follow a multivariate normal distribution with the mean
and variance-covariance matrix given by
Remark A1.
For the infinite dimensional case, letandas
Then the observations follow the model as given below.
Proof.
Note that at any time
This means that
This in turn means that
□
According to the model
Note that the expected values on either side of the above equation match due to .
Next, let us verify the equality of the variance.
Again, note that
But,
where
In addition, where and .
Note that due to the model, ,
Term-by-term comparison yields and .
Due to the stationarity of the series, , and this means
Again, due to the stationarity, .
Note that
As noted earlier,
Now let us use the identity,
Theorem A2.
Let the stationary time series $\ldots, X_{t-1}, X_t, X_{t+1}, \ldots$ jointly follow a multivariate normal distribution while satisfying the properties of the AR(1) model. Then, this time series forms a Markov Chain.
Proof.
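Let us define the vector
$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right), \qquad \mathbf{X}_1 = X_t, \qquad \mathbf{X}_2 = (X_{t-1}, X_{t-2}, \ldots, X_{t-k})'.$$
Note that vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are the partitions. Here every observation has mean $\mu$ and variance $\sigma^2$, and $\mathrm{Cov}(X_t, X_{t-j}) = \sigma^2\rho^{j}$, where $\rho$ is the first-order autocorrelation.
It is well-known that $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ follows a multivariate normal distribution with mean
$$\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$$
and variance
$$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
Since $\Sigma_{12} = \sigma^2(\rho, \rho^2, \ldots, \rho^k)$ is $\rho$ times the first row of $\Sigma_{22}$, we have $\Sigma_{12}\Sigma_{22}^{-1} = (\rho, 0, \ldots, 0)$. By using the above results, we can easily verify that
$$E(X_t \mid X_{t-1} = x_{t-1}, \ldots, X_{t-k} = x_{t-k}) = \mu + \rho(x_{t-1} - \mu)$$
and
$$\mathrm{Var}(X_t \mid X_{t-1} = x_{t-1}, \ldots, X_{t-k} = x_{t-k}) = \sigma^2(1 - \rho^2).$$
Furthermore, the conditional distribution is a normal distribution, so
$$f(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-k}) = f(x_t \mid x_{t-1}).$$
This means that the time series forms a Markov Chain. □
Note that the conditional distribution of $X_t$ given $X_{t-1} = x_{t-1}$ is given by
$$X_t \mid X_{t-1} = x_{t-1} \;\sim\; N\big(\mu + \rho(x_{t-1} - \mu),\; \sigma^2(1 - \rho^2)\big).$$
As seen from above, the conditional distribution is normally distributed with mean = $\mu + \rho(x_{t-1} - \mu)$ and variance = $\sigma^2(1 - \rho^2)$.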
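The key step in the proof, that only the most recent observation receives a nonzero weight in the conditional mean, can be verified numerically without inverting a matrix. The sketch below assumes the AR(1) covariance structure $\mathrm{Cov}(X_i, X_j) = \sigma^2\rho^{|i-j|}$ and checks that the weight vector $(\rho, 0, \ldots, 0)$ satisfies $w\,\Sigma_{22} = \Sigma_{12}$, which is equivalent to $w = \Sigma_{12}\Sigma_{22}^{-1}$ when $|\rho| < 1$. The function names and numerical values are illustrative assumptions.

```python
def ar1_cov(i, j, sigma2, rho):
    """Covariance between observations i and j under the AR(1) structure."""
    return sigma2 * rho ** abs(i - j)


def check_markov_weights(k, sigma2, rho):
    """Check w = (rho, 0, ..., 0) against Sigma_12 and Sigma_22 for k past observations.

    Index 0 stands for X_t and indices 1..k for X_{t-1}, ..., X_{t-k}.
    Returns the largest absolute error in w * Sigma_22 = Sigma_12 and the
    implied conditional variance Sigma_11 - w * Sigma_21.
    """
    sigma_12 = [ar1_cov(0, j, sigma2, rho) for j in range(1, k + 1)]
    sigma_22 = [[ar1_cov(i, j, sigma2, rho) for j in range(1, k + 1)]
                for i in range(1, k + 1)]
    weights = [rho] + [0.0] * (k - 1)

    product = [sum(weights[i] * sigma_22[i][j] for i in range(k)) for j in range(k)]
    max_err = max(abs(p - s) for p, s in zip(product, sigma_12))
    cond_var = sigma2 - sum(w * s for w, s in zip(weights, sigma_12))
    return max_err, cond_var


if __name__ == "__main__":
    max_err, cond_var = check_markov_weights(k=5, sigma2=4.0, rho=0.6)
    print(max_err, cond_var, 4.0 * (1.0 - 0.6 ** 2))
```

Because the weights on all older observations are zero for any `k`, conditioning on a longer history does not change the conditional distribution, which is the Markov property stated in Theorem A2.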
Next, our interest is in studying the Average Run Length (ARL).
Deriving a Theoretical Expression for the ARL
Let Run Length
Let
Then,
Note that when (or the center line).
Similarly, when deviates from the center line.
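As a point of reference only, consider the simplified case in which the probability $p$ of a signal is the same at every sampling point and successive signal events are independent. The symbols $N$ (run length) and $p$ are assumed here for illustration and need not match the notation of the derivation above, where the signal probability may depend on the previous observation. In this simplified case the run length is geometric:

```latex
P(N = n) = (1-p)^{\,n-1}\,p, \quad n = 1, 2, \ldots, \qquad
\mathrm{ARL} = E(N) = \sum_{n=1}^{\infty} n\,(1-p)^{\,n-1}\,p = \frac{1}{p}.
```

For example, $p = 0.0027$ gives an ARL of about 370, the familiar in-control value for three-sigma limits with independent normal observations.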
References
1. Wold, H. A Study in the Analysis of Stationary Time Series, 2nd ed.; Almqvist and Wiksell Book Co.: Stockholm, Sweden, 1954.
2. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970.
3. Gardner, E.S. Exponential smoothing: The state of the art. J. Forecast. 1985, 4, 1–28.
4. Gardner, E.S.; McKenzie, E. Forecasting trends in time series. Manage. Sci. 1985, 31, 1237–1246.
5. Roberts, S.W. Properties of control chart zone tests. Bell Syst. Tech. J. 1958, 37, 83–114.
6. Crowder, S.V. A simple method for studying run-length distributions of exponentially weighted moving average charts. Technometrics 1987, 29, 401–407.
7. Crowder, S.V. Design of exponentially weighted moving average schemes. J. Qual. Technol. 1989, 21, 155–162.
8. Saccucci, M.S.; Lucas, J.M. Average run lengths for exponentially weighted moving average control schemes using the Markov Chain approach. J. Qual. Technol. 1990, 22, 154–162.
9. Alwan, L.C.; Roberts, H.V. Time-series modeling for statistical process control. J. Bus. Econ. Stat. 1988, 6, 87–95.
10. Montgomery, D.C.; Mastrangelo, C.M. Some statistical process control methods for autocorrelated data. J. Qual. Technol. 1991, 23, 179–193.
11. Schmid, W. On EWMA Charts for Time Series. In Frontiers in Statistical Quality Control; Lenz, H.J., Wilrich, P.T., Eds.; Physica-Verlag: Heidelberg, Germany, 1997; pp. 115–137.
12. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
13. Nelsen, R. An Introduction to Copulas, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
14. Genest, C.; MacKay, J. Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données. Canad. J. Statist. 1986, 14, 145.
15. Genest, C.; MacKay, J. The joy of copulas: Bivariate distributions with uniform marginals. Amer. Statist. 1986, 40, 280–285.
16. Torre, E.; Marelli, S.; Embrechts, P.; Sudret, B. A general framework for data-driven uncertainty quantification under complex dependencies using Vine Copulas. Probab. Eng. Mech. 2018, 55, 1–16.
17. Tang, X.S.; Li, D.; Zhou, Q.; Phoon, K.K.; Zhang, L.M. Impact of Copulas for modeling bivariate distributions on system reliability. Struct. Saf. 2013, 44, 80–90.
18. Zhang, M.; Bedford, T. Vine copula approximation: A generic method for coping with conditional dependence. Stat. Comput. 2018, 28, 219–237.
19. Aas, K.; Czado, C.; Frigessi, A.; Bakken, H. Pair-copula constructions of multiple dependence. Insur. Math. Econ. 2009, 44, 182–198.
20. Kurowicka, D. Dependence Modeling: Vine Copula Handbook; World Scientific: Singapore, 2011.
21. So, M.K.; Yeung, C.Y. Vine-copula garch model with dynamic conditional dependence. Comput. Stat. Data Anal. 2014, 76, 655–671.
22. Hryniewicz, O. On the Robustness of the Shewhart Control Chart to Different Types of Dependencies in Data. In Frontiers in Statistical Quality Control; Lenz, H.J., Schmid, W., Wilrich, P.T., Eds.; Physica: Heidelberg, Germany, 2012; Volume 10.
23. Liao, Y.; Deschamps, F.; De Freitas Rocha Loures, E.; Pierin Ramos, L.F. Past, present and future of Industry 4.0 - A systematic literature review and research agenda proposal. Int. J. Prod. Res. 2017, 55, 3609–3629.
24. Reynolds, M.R.; Arnold, J.C.; Baik, J.W. Variable sampling interval X charts in the presence of correlation. J. Qual. Technol. 1996, 28, 12–30.
25. Gilbert, K.C.; Kirby, K.; Hild, C.R. Charting autocorrelated data: Guidelines for practitioners. Qual. Eng. 1997, 9, 367–382.
26. Lu, C.W.; Reynolds, M.R. EWMA control charts for monitoring the mean of autocorrelated processes. J. Qual. Technol. 1999, 31, 166–188.
27. Lin, Y.-C. The variable parameters control charts for monitoring autocorrelated processes. Comm. Stat-Simul C 2009, 38, 729–749.
28. Xi, M.; Zhang, L.; Hu, J.; Palazoglu, A. A model-free approach to reduce the effect of autocorrelation on statistical process control charts. J. Chemom. 2018, 32, 12.
29. Zhang, N.F. A statistical control chart for stationary process data. Technometrics 1998, 40, 24–38.
30. Woodall, W.H.; Faltin, F. Autocorrelated data and SPC. ASQC Stat. Div. Newsl. 1993, 13, 18–21.
31. Al-babtain, A.; Elbatal, I.; Yousof, H.M. A new flexible three-parameter model: Properties, Clayton Copula, and modeling real data. Symmetry 2020, 12, 440.
32. Simiu, E.; Heckert, N.A.; Filliben, J.J.; Johnson, S.K. Extreme wind load estimates based on the Gumbel distribution of dynamic pressures: An assessment. Struct. Saf. 2001, 23, 221–229.
33. Cossin, D.; Schellhorn, H.; Song, N.; Satjaporn, T. A theoretical argument why the t-Copula explains credit risk contagion better than the Gaussian Copula. Adv. Decis. Sci. 2010, 29, 546547.
34. McCool, J.I. Testing for dependency of failure times in life testing. Technometrics 2006, 48, 41–48.
35. Shewhart, W.A. Economic Control of Quality of Manufactured Product; D. Van Nostrand Company: New York, NY, USA, 1931.
36. Deming, W.E. Out of the Crisis; Massachusetts Institute of Technology, Center for Advanced Engineering Study: Cambridge, MA, USA, 1986.
37. Juran, J.M. Quality-Control Handbook; McGraw-Hill: New York, NY, USA, 1951.
38. Feigenbaum, A.V. Total Quality Control; McGraw-Hill: New York, NY, USA, 1991.
39. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).