On the Use of Copula for Quality Control Based on an AR(1) Model †

Abstract: Manufacturing for a multitude of continuous processing applications in the era of automation and 'Industry 4.0' is focused on rapid throughput while producing products of acceptable quality that meet customer specifications. Monitoring the stability or statistical control of key process parameters using data acquired from online sensors is fundamental to successful automation in manufacturing applications. This study addresses the significant problem of positive autocorrelation in data collected from online sensors, which may impair assessment of statistical control. Sensor data collected at short time intervals typically have significant autocorrelation, and traditional statistical process control (SPC) techniques cannot be deployed. There is a plethora of literature on techniques for SPC in the presence of positive autocorrelation. This paper contributes to this area of study by investigating the performance of Copula-based control charts, assessed by the average run length (ARL), when subsequent observations are correlated and follow the AR(1) model. The conditional distribution of $y_t$ given $y_{t-1}$ is used in deriving the control chart limits for three different categories of Copulas: Gaussian, Clayton, and Farlie-Gumbel-Morgenstern Copulas. Preliminary results suggest that the overall performance of the Clayton Copula and Farlie-Gumbel-Morgenstern Copula is better compared to other Archimedean Copulas. The Clayton Copula is more robust with respect to changes in the process standard deviation as the correlation coefficient increases.


Introduction
The origins of time series modeling go back to the 1930s, when Herman Wold [1] developed the ARMA (Auto Regressive Moving Average) models for stationary time series. Later, these models were expanded to include ARIMA (Auto Regressive Integrated Moving Average) models in order to handle non-stationary time series. In the 1970s, Box-Jenkins ARIMA models became very popular, as these models can handle seasonal and non-seasonal patterns [2]. In addition, smoothing techniques were used in time series [3,4]. These models use numerically iterative procedures to estimate the parameters involved in the time series models. These models are still popular and heavily used in forecasting. In addition to forecasting, these time series models can be used in quality control, especially in the construction of the popular EWMA (Exponentially Weighted Moving Average) control charts, under the assumption that the averages follow the AR(1) (or First Order Auto Regressive) model [5][6][7][8][9][10][11].
The use of Copulas in modeling resulted from the pioneering work of Sklar [12]. Substantial research was done in the 1980s and 1990s in the context of Copula modeling and applications in subjects such as economics, finance, actuarial science, and engineering (see Nelsen [13]). The Copula models use the dependence structure and the direction of the association for modeling. There are several Copula models, which differ in their properties. The Copula models fall under one of two major categories: Archimedean Copulas and non-Archimedean Copulas [14,15]. In addition, there are Copulas for discrete-type variables and continuous-type variables, as well as different variants of the Copulas, such as Vine Copulas and Hierarchical Copulas [16][17][18][19][20][21]. These variants are quite useful for dimension reduction in high-dimensional problems. We expand upon the work of Hryniewicz [22] by considering other models of copula-based dependence; that is, the approximate models proposed in this study rely on Pearson's rho, with practical considerations.
Autocorrelation is an inherent problem with time series data collected at short time intervals. Given the advent of automation and high-speed throughput in the modern era of manufacturing (Industry 4.0 [23]), autocorrelation is a significant problem and hampers traditional SPC techniques. Autocorrelation, if unaccounted for in statistical process control (SPC) applications, can result in false signals of out-of-control and may induce overadjustment of the process and unwanted variation, e.g., 'Deming's funnel experiment' [20]. There is a plethora of literature on SPC adjusted for autocorrelation (see [24][25][26][27][28][29][30]). In this study, we use Copula models to construct control charts for a stationary process under the assumption that the process follows the AR(1) series. We evaluate the performance of these models on the basis of Average Run Length (ARL).

Materials and Methods
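Let us assume that the observations $y_1, y_2, \ldots, y_t, \ldots$ form an AR(1) model:

$$y_t = \delta + \phi\, y_{t-1} + \varepsilon_t \qquad (1)$$

Note that the error terms, $\varepsilon_t$, are assumed to be independent, with mean 0 and constant variance $\sigma_\varepsilon^2$, but are not necessarily normally distributed. From (1), it follows that

$$\mathrm{Cov}(y_t, y_{t-1}) = \mathrm{Cov}(\delta, y_{t-1}) + \phi\,\mathrm{Cov}(y_{t-1}, y_{t-1}) + \mathrm{Cov}(\varepsilon_t, y_{t-1}) \qquad (2)$$

Note that the observations $y_t$ and $y_{t-1}$ follow the same distribution, although these observations are correlated. Let $\rho$ be the first-order autocorrelation coefficient, so that $\mathrm{Cov}(y_t, y_{t-1}) = \rho\,\mathrm{Var}(y_{t-1})$. The first and third terms on the right-hand side of (2) vanish, because $\delta$ is a constant and $\varepsilon_t$ is independent of $y_{t-1}$, which leaves $\rho\,\mathrm{Var}(y_{t-1}) = \phi\,\mathrm{Var}(y_{t-1})$. As we can see, the first-order autocorrelation $\rho$ is equal to $\phi$. Our interest is in finding a suitable Copula model for approximating the joint distribution among these correlated time series observations.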
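The identity $\rho = \phi$ can be checked numerically. The following is a minimal sketch, not the simulation code of this study: it generates the AR(1) recursion $y_t = \delta + \phi y_{t-1} + \varepsilon_t$ with deliberately non-normal (uniform) errors and compares the sample lag-1 autocorrelation with $\phi$. The function names, the parameter values, and the burn-in length are illustrative assumptions.

```python
import random


def simulate_ar1(delta, phi, n, error_sampler, burn_in=500):
    """Simulate y_t = delta + phi * y_{t-1} + eps_t and drop a burn-in period."""
    y = delta / (1.0 - phi)  # start at the stationary mean
    series = []
    for t in range(n + burn_in):
        y = delta + phi * y + error_sampler()
        if t >= burn_in:
            series.append(y)
    return series


def lag1_autocorrelation(series):
    """Sample first-order autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((value - mean) ** 2 for value in series)
    return num / den


rng = random.Random(12345)  # fixed seed so the sketch is repeatable

# Centered uniform errors: mean 0, constant variance, but not normal.
def uniform_error():
    return rng.uniform(-1.0, 1.0)


delta, phi = 2.0, 0.6  # illustrative values only
y = simulate_ar1(delta, phi, n=50_000, error_sampler=uniform_error)
rho_hat = lag1_autocorrelation(y)
mean_hat = sum(y) / len(y)

print(f"sample lag-1 autocorrelation: {rho_hat:.3f} (phi = {phi})")
print(f"sample mean: {mean_hat:.3f} (delta / (1 - phi) = {delta / (1 - phi):.3f})")
```

The sample autocorrelation should settle close to $\phi$ whatever the error distribution, as long as the errors are independent with mean zero and constant variance.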

Copula Construction
Here, we investigate Copula models such as the Clayton, Gumbel, Frank, Farlie-Gumbel-Morgenstern, and Gaussian Copulas. Let us first discuss the construction of Copulas in a general set-up. In order to construct the Copulas, we need to define some notation. Let $F$ and $G$ denote the cumulative distribution functions of $y_t$ and $y_{t-1}$, respectively, and let $u = F(y_t)$ and $v = G(y_{t-1})$. However, since $y_t$ and $y_{t-1}$ follow the same distribution (due to AR(1) being a stationary process), $F = G$. Then, the cumulative joint probability distribution of $y_t$ and $y_{t-1}$ can be written as follows:

$$H(y_t, y_{t-1}) = C\left(F(y_t), F(y_{t-1})\right) = C(u, v)$$

According to Copula theory, the conditional cumulative probability distribution of $y_t$ given $y_{t-1}$ is given by the following equation:

$$P(Y_t \le y_t \mid Y_{t-1} = y_{t-1}) = \frac{\partial C(u, v)}{\partial v}$$

Note, for the AR(1) model, the conditional distribution of $y_t$ given $y_{t-1}$ is the same as the conditional distribution of $\varepsilon_t$ given $y_{t-1}$, with the mean shifted by a constant. However, $\varepsilon_t$ and $y_{t-1}$ are independent of each other. Therefore, the conditional distribution of $y_t$ given $y_{t-1}$ is the same as the unconditional distribution of $\varepsilon_t$, with the mean shifted by the same constant.

Clayton Copula
The derivation of the conditional distribution for the Clayton Copula is as follows:
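For reference, a minimal sketch assuming the standard one-parameter textbook form of the Clayton Copula, $C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta > 0$; the parameterization used in this study may differ. Differentiating with respect to $v$ gives $\partial C/\partial v = v^{-\theta-1}(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta - 1}$. The Python code below, with hypothetical function names and arbitrary evaluation points, checks this closed form against a finite-difference derivative.

```python
def clayton_cdf(u, v, theta):
    """Clayton Copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_conditional(u, v, theta):
    """dC/dv, i.e. P(U <= u | V = v) under the Clayton Copula."""
    base = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def numerical_dC_dv(cdf, u, v, theta, h=1e-6):
    """Central finite-difference approximation of dC/dv, used only as a check."""
    return (cdf(u, v + h, theta) - cdf(u, v - h, theta)) / (2.0 * h)


u, v, theta = 0.7, 0.4, 2.0  # arbitrary evaluation point
analytic = clayton_conditional(u, v, theta)
numeric = numerical_dC_dv(clayton_cdf, u, v, theta)
print(f"analytic = {analytic:.6f}, finite difference = {numeric:.6f}")
```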

Gumbel Copula
The derivation of the conditional distribution for the Gumbel Copula is as follows:
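As a hedged illustration, the sketch below assumes the standard textbook form of the Gumbel Copula, $C(u,v) = \exp\{-[(-\ln u)^{\theta} + (-\ln v)^{\theta}]^{1/\theta}\}$ with $\theta \ge 1$, which may not match the parameterization used in this study. With $A = (-\ln u)^{\theta} + (-\ln v)^{\theta}$, the chain rule gives $\partial C/\partial v = C(u,v)\,A^{1/\theta - 1}(-\ln v)^{\theta - 1}/v$. The function names and evaluation points are assumptions made for the example.

```python
import math


def gumbel_cdf(u, v, theta):
    """Gumbel Copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))."""
    a = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(a ** (1.0 / theta)))


def gumbel_conditional(u, v, theta):
    """dC/dv, i.e. P(U <= u | V = v) under the Gumbel Copula."""
    a = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return (gumbel_cdf(u, v, theta) * a ** (1.0 / theta - 1.0)
            * (-math.log(v)) ** (theta - 1.0) / v)


u, v, theta = 0.7, 0.4, 1.5  # arbitrary evaluation point
h = 1e-6
numeric = (gumbel_cdf(u, v + h, theta) - gumbel_cdf(u, v - h, theta)) / (2.0 * h)
analytic = gumbel_conditional(u, v, theta)
print(f"analytic = {analytic:.6f}, finite difference = {numeric:.6f}")
```

At $\theta = 1$ this form reduces to the independence Copula $C(u,v) = uv$, so the conditional distribution collapses to $u$.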

Farlie-Gumbel-Morgenstern (FGM) Copula
The derivation of the conditional distribution for the FGM Copula is as follows:
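A minimal sketch, assuming the standard textbook form $C(u,v) = uv\,[1 + \theta(1-u)(1-v)]$ with $-1 \le \theta \le 1$, for which $\partial C/\partial v = u\,[1 + \theta(1-u)(1-2v)]$. The parameterization used in this study may differ, and the function names below are hypothetical. The code compares the closed form with a finite-difference derivative over a small grid.

```python
def fgm_cdf(u, v, theta):
    """FGM Copula C(u, v) = u * v * (1 + theta * (1 - u) * (1 - v)), |theta| <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_conditional(u, v, theta):
    """dC/dv, i.e. P(U <= u | V = v) under the FGM Copula."""
    return u * (1.0 + theta * (1.0 - u) * (1.0 - 2.0 * v))


theta = 0.9  # arbitrary admissible value
grid = [i / 10 for i in range(1, 10)]
h = 1e-6

# Largest gap between the closed form and a central finite difference on the grid.
max_gap = max(
    abs(fgm_conditional(u, v, theta)
        - (fgm_cdf(u, v + h, theta) - fgm_cdf(u, v - h, theta)) / (2.0 * h))
    for u in grid
    for v in grid
)
print(f"largest discrepancy on the grid: {max_gap:.2e}")
```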

In each case, $\partial C(u,v)/\partial v$ is the cumulative conditional distribution of $y_t$ given $y_{t-1}$. This is the same as the cumulative conditional distribution of $\varepsilon_t$ given $y_{t-1}$. However, $\varepsilon_t$ is independent of $y_{t-1}$. Hence, $\partial C(u,v)/\partial v$ is the cumulative distribution of the error term regardless of which Copula model is used.

Gaussian Copula
The derivation of the conditional distribution for the Gaussian Copula is as follows:

$$C(u, v) = \Phi_2\left(\Phi^{-1}(u), \Phi^{-1}(v); \rho\right)$$

where $\Phi$ and $\Phi_2$ represent the standard normal cdf and the standard bivariate normal cdf (with correlation $\rho$), respectively. Moreover, $\Phi^{-1}$ represents the functional inverse of $\Phi$.
The conditional cumulative distribution is as follows:

$$\frac{\partial C(u, v)}{\partial v} = \Phi\left(\frac{\Phi^{-1}(u) - \rho\,\Phi^{-1}(v)}{\sqrt{1 - \rho^2}}\right)$$

Now, the question is which Copula should be used. For example, for data that exhibit a right tail, such as exponential data (survival analysis for the lifetime of machinery), the Clayton model is better [31]. For data that exhibit a left tail, such as extreme data (stress due to heavy wind or pollution), the Gumbel model is better [32]. For data that have both tails, the t-Copula is better [33]. If there are no tails, then the Farlie-Gumbel-Morgenstern model is better [34].
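For the Gaussian Copula with normal marginals, the conditional distribution can be compared with the exact AR(1) result. The sketch below assumes the standard closed form $\Phi\big((\Phi^{-1}(u) - \rho\,\Phi^{-1}(v))/\sqrt{1-\rho^2}\big)$, normal errors, and $\sigma$ taken as the stationary standard deviation of $y_t$. It checks that the Copula-based probability matches $P(Y_t \le y_t \mid Y_{t-1} = y_{t-1})$ computed directly from $y_t = \delta + \phi y_{t-1} + \varepsilon_t$. All numerical values and names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_conditional(u, v, rho):
    """P(U <= u | V = v) under a Gaussian Copula with correlation rho."""
    z_u = STD_NORMAL.inv_cdf(u)
    z_v = STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((z_u - rho * z_v) / math.sqrt(1.0 - rho ** 2))


# Illustrative AR(1) set-up with normal errors (values are assumptions).
delta, phi, sigma = 1.0, 0.5, 2.0               # sigma: stationary sd of y_t
mu = delta / (1.0 - phi)                        # stationary mean
sigma_eps = sigma * math.sqrt(1.0 - phi ** 2)   # implied error sd

y_prev, y_now = 3.0, 2.5
marginal = NormalDist(mu, sigma)
u, v = marginal.cdf(y_now), marginal.cdf(y_prev)

copula_prob = gaussian_conditional(u, v, rho=phi)
exact_prob = NormalDist(delta + phi * y_prev, sigma_eps).cdf(y_now)
print(f"Copula-based: {copula_prob:.6f}, exact AR(1): {exact_prob:.6f}")
```

The agreement is exact here only because both the marginals and the errors are normal; for the other Copulas the conditional probability is an approximation.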
Suppose that, at a particular location, we are monitoring the moisture of biomaterials and that the measurements are correlated. We assume the AR(1) model is appropriate for these data due to the slow decay in the autocorrelations between $y_t$ and $y_{t+k}$ as $k \to \infty$. The reason is that, for the AR(1) model, the autocorrelation between $y_t$ and $y_{t+k}$ is $\rho_k = \phi^k$, where $\phi$ is the AR(1) model coefficient and $|\phi| < 1$. Next, we do a simulation study to find out which Archimedean Copula approximates the conditional distribution better. For the simulation, we assume that $y_t$ follows a univariate normal distribution with mean $\mu = \delta/(1-\phi)$ and variance $\sigma^2$. We use Theorems A1 and A2 in Appendix A to derive the results presented in the next section.

Comparison of Copulas to Approximate the Conditional Distribution
Here, we present the results comparing the Copulas based on the conditional probabilities with the actual conditional probability P(Y t ≤ y t | Y t−1 = y t−1 ) (see Table 1). It can be seen from Table 1 that the overall performance of the Clayton Copula and Farlie-Gumbel-Morgenstern (FGM) Copula is better compared to the other Archimedean Copulas considered here. So, we use Clayton Copula and FGM Copula for approximating the conditional distribution of y t given y t−1 . In order to compute the conditional cumulative probability distribution under different copula models, we use the following values for the parameters of the AR(1) model. Note that δ represents the constant and ρ(=ϕ) is the coefficient in the AR(1) model. The observations are simulated according to the AR(1) model only (no other experimental design is used).
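The kind of comparison described above can be sketched as follows. This is not the code behind Table 1, and the parameter mappings are assumptions made for illustration: the FGM parameter is set to $\theta = 3\rho$, and the Clayton parameter is obtained through Kendall's tau, $\tau = (2/\pi)\arcsin\rho$ and $\theta = 2\tau/(1-\tau)$. The study itself may calibrate the Copulas differently. The actual conditional probability is computed from the normal AR(1) model.

```python
import math
from statistics import NormalDist


def clayton_conditional(u, v, theta):
    """P(U <= u | V = v) under the Clayton Copula (textbook form, theta > 0)."""
    base = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def fgm_conditional(u, v, theta):
    """P(U <= u | V = v) under the FGM Copula (textbook form, |theta| <= 1)."""
    return u * (1.0 + theta * (1.0 - u) * (1.0 - 2.0 * v))


# Illustrative AR(1) parameters (assumptions, not the values used in Table 1).
delta, rho, sigma = 1.0, 0.3, 1.0
mu = delta / (1.0 - rho)
sigma_eps = sigma * math.sqrt(1.0 - rho ** 2)
marginal = NormalDist(mu, sigma)

# Assumed parameter mappings, see the lead-in.
tau = 2.0 / math.pi * math.asin(rho)
theta_clayton = 2.0 * tau / (1.0 - tau)
theta_fgm = 3.0 * rho

rows = []
for y_prev in (mu - sigma, mu, mu + sigma):
    for y_now in (mu - sigma, mu, mu + sigma):
        u, v = marginal.cdf(y_now), marginal.cdf(y_prev)
        actual = NormalDist(delta + rho * y_prev, sigma_eps).cdf(y_now)
        rows.append((y_prev, y_now, actual,
                     clayton_conditional(u, v, theta_clayton),
                     fgm_conditional(u, v, theta_fgm)))

print("y_prev   y_now   actual  Clayton  FGM")
for y_prev, y_now, actual, clayton, fgm in rows:
    print(f"{y_prev:6.3f}  {y_now:6.3f}  {actual:6.4f}  {clayton:6.4f}  {fgm:6.4f}")
```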

Construction of the Control Charts
Quality control methods started in the 1920s and became widely popular in the 1940s due to the efforts of Walter Shewhart [35], W. Edwards Deming [36], Joseph Juran [37], and Armand Feigenbaum [38]. As a result, the American Society for Quality Control (ASQC) was formed in 1946. The concept of control charts was later extended to cover multivariate situations as well [39]. In most control charts, the assumptions are that the observations are independent and that the distribution is normal (or multivariate normal in multivariate situations). EWMA charts were introduced for situations where the observations are correlated [6,7]. Here we introduce a Copula-based approach to construct the control chart when the subsequent observations are correlated and follow the AR(1) model. The conditional distribution of $y_t$ given $y_{t-1}$ is used in deriving the control chart limits.

EWMA Chart as a Special Case
Let $x_0, x_1, \ldots$ be independent observations following a stationary process, as defined by $x_i = \mu + E_i$, where the $E_i$ are independent, normally distributed error terms. Let us define
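As a hedged illustration of why the EWMA statistic is a special case, the sketch below assumes the usual EWMA recursion $z_i = \lambda x_i + (1-\lambda) z_{i-1}$. The symbols $z_i$ and $\lambda$ are introduced here for the example and may differ from the notation of this study. Substituting $x_i = \mu + E_i$ gives $z_i = \lambda\mu + (1-\lambda) z_{i-1} + \lambda E_i$, which has the AR(1) form with coefficient $1-\lambda$. The code checks that the lag-1 autocorrelation of the simulated EWMA series is close to $1-\lambda$.

```python
import random


def ewma(observations, lam, start):
    """EWMA recursion z_i = lam * x_i + (1 - lam) * z_{i-1}, with z_0 = start."""
    z = start
    smoothed = []
    for x in observations:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


def lag1_autocorrelation(series):
    """Sample first-order autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((value - mean) ** 2 for value in series)
    return num / den


rng = random.Random(7)  # fixed seed so the sketch is repeatable
mu, lam = 10.0, 0.2     # illustrative values only
x = [mu + rng.gauss(0.0, 1.0) for _ in range(50_000)]  # independent observations
z = ewma(x, lam, start=mu)

rho_hat = lag1_autocorrelation(z)
print(f"lag-1 autocorrelation of the EWMA series: {rho_hat:.3f} (1 - lambda = {1 - lam})")
```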

Approximation Based on the Clayton Copula
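As noted earlier, $\partial C(u,v)/\partial v = K(\varepsilon_t)$, where $K(x)$ is the cumulative probability distribution function of the error $\varepsilon_t$. Note that this is the conditional distribution of $y_t$ given $y_{t-1}$. We are interested in setting up the quality control limits for $y_t$ such that $\gamma/2 \le K(\varepsilon_t) \le 1 - \gamma/2$, where $100(1-\gamma)\%$ is the confidence level for the control chart. So, for the conditional distribution of $y_t$ given $y_{t-1}$ based on the Clayton Copula model, the $100(1-\gamma)\%$ confidence control chart can be found by solving the following inequality:

$$\frac{\gamma}{2} \le \frac{\partial C(u, v)}{\partial v} \le 1 - \frac{\gamma}{2}$$

where $C$ is the Clayton Copula. Here, we are making the assumption that the unconditional distribution of $y_{t-1}$ is normally distributed with mean $\mu = \delta/(1-\phi)$ and variance $\sigma^2$. Then, at the upper control limit (UCL), the upper bound $1 - \gamma/2$ is attained. Note that $u_{ucl} = P(y_t \le \mathrm{UCL})$. This means that $\mathrm{UCL} = \mu + \sigma\Phi^{-1}(u_{ucl})$. Similarly, at the lower control limit (LCL), the lower bound $\gamma/2$ is attained. This means that $\mathrm{LCL} = \mu - \sigma\Phi^{-1}(u_{lcl})$.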
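A minimal sketch of how such limits could be computed, assuming the textbook Clayton form with parameter $\theta > 0$, whose conditional distribution can be inverted in closed form: $u = \left[(p^{-\theta/(1+\theta)} - 1)\,v^{-\theta} + 1\right]^{-1/\theta}$ solves $\partial C(u,v)/\partial v = p$. In this sketch both limits use the convention $u = P(y_t \le \text{limit})$, so each limit is $\mu + \sigma\Phi^{-1}(u)$. That convention, the function names, and the numerical values are assumptions of the example rather than the procedure of this study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def clayton_conditional(u, v, theta):
    """P(U <= u | V = v) under the Clayton Copula (textbook form, theta > 0)."""
    base = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def clayton_conditional_inverse(p, v, theta):
    """Solve clayton_conditional(u, v, theta) = p for u (closed form)."""
    return ((p ** (-theta / (1.0 + theta)) - 1.0) * v ** -theta + 1.0) ** (-1.0 / theta)


def clayton_control_limits(y_prev, mu, sigma, theta, gamma):
    """Conditional (LCL, UCL) for y_t given y_{t-1} = y_prev, normal marginals."""
    v = NormalDist(mu, sigma).cdf(y_prev)
    u_lcl = clayton_conditional_inverse(gamma / 2.0, v, theta)
    u_ucl = clayton_conditional_inverse(1.0 - gamma / 2.0, v, theta)
    # Map the probabilities back to the measurement scale.
    lcl = mu + sigma * STD_NORMAL.inv_cdf(u_lcl)
    ucl = mu + sigma * STD_NORMAL.inv_cdf(u_ucl)
    return lcl, ucl


# Illustrative values only.
lcl, ucl = clayton_control_limits(y_prev=5.5, mu=5.0, sigma=1.0, theta=1.0, gamma=0.0027)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

Because the Clayton Copula is asymmetric, the two limits are generally not equidistant from $\mu$, and they move with the previous observation.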

Approximation Based on the Farlie-Gumbel-Morgenstern Copula
Again, as noted earlier, $\partial C(u,v)/\partial v = K(\varepsilon_t)$, where $K(x)$ is the cumulative probability distribution function of the error $\varepsilon_t$. Note that this is the conditional distribution of $y_t$ given $y_{t-1}$. We are interested in setting up the quality control limits for $y_t$ such that $\gamma/2 \le K(\varepsilon_t) \le 1 - \gamma/2$, where $100(1-\gamma)\%$ is the confidence level for the control chart. So, for the conditional distribution of $y_t$ given $y_{t-1}$ based on the FGM Copula model, the $100(1-\gamma)\%$ confidence control chart can be found by solving the following inequality:

$$\frac{\gamma}{2} \le \frac{\partial C(u, v)}{\partial v} \le 1 - \frac{\gamma}{2}$$

where $C$ is the FGM Copula. Here, we are making the assumption that the unconditional distribution of $y_{t-1}$ is normally distributed with mean $\mu = \delta/(1-\phi)$ and variance $\sigma^2$.
Then, at the upper control limit (UCL), the upper bound $1 - \gamma/2$ is attained. Note that $u_{ucl} = P(y_t \le \mathrm{UCL})$. This means that $\mathrm{UCL} = \mu + \sigma\Phi^{-1}(u_{ucl})$. Similarly, at the lower control limit (LCL), the lower bound $\gamma/2$ is attained, and the LCL follows in the same way. Next, we investigate the use of the Gaussian Copula in the same context to approximate the conditional distribution of $y_t$ given $y_{t-1}$.
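For the FGM-based limits described above, a minimal sketch assuming the textbook FGM form, for which $\partial C(u,v)/\partial v = u\,[1 + a(1-u)]$ with $a = \theta(1-2v)$. Setting this equal to $p$ gives a quadratic in $u$ whose admissible root can be written as $u = 2p / \left(1 + a + \sqrt{(1+a)^2 - 4ap}\right)$. The mapping $\theta = 3\rho$, the convention $u = P(y_t \le \text{limit})$ for both limits, and all names and values are assumptions of the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fgm_conditional(u, v, theta):
    """P(U <= u | V = v) under the FGM Copula (textbook form, |theta| <= 1)."""
    return u * (1.0 + theta * (1.0 - u) * (1.0 - 2.0 * v))


def fgm_conditional_inverse(p, v, theta):
    """Solve fgm_conditional(u, v, theta) = p for u via the quadratic root."""
    a = theta * (1.0 - 2.0 * v)
    disc = (1.0 + a) ** 2 - 4.0 * a * p
    # Rationalized form of the root; it reduces to u = p when a = 0.
    return 2.0 * p / ((1.0 + a) + math.sqrt(disc))


def fgm_control_limits(y_prev, mu, sigma, rho, gamma):
    """Conditional (LCL, UCL) for y_t given y_{t-1} = y_prev, normal marginals."""
    theta = 3.0 * rho  # assumed mapping, requires |rho| <= 1/3
    v = NormalDist(mu, sigma).cdf(y_prev)
    u_lcl = fgm_conditional_inverse(gamma / 2.0, v, theta)
    u_ucl = fgm_conditional_inverse(1.0 - gamma / 2.0, v, theta)
    return (mu + sigma * STD_NORMAL.inv_cdf(u_lcl),
            mu + sigma * STD_NORMAL.inv_cdf(u_ucl))


# Illustrative values only.
delta, rho, sigma = 1.0, 0.3, 1.0
mu = delta / (1.0 - rho)
lcl, ucl = fgm_control_limits(y_prev=mu + 0.5, mu=mu, sigma=sigma, rho=rho, gamma=0.0027)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```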

Numerical Results
In this section, we perform a simulation study in order to compare the performance of the Gaussian Copula against the Clayton Copula and the FGM Copula on the basis of the average run length ($\mathrm{ARL}_0$), under the assumption that the null hypothesis $H_0$ is true. This study involved 10,000 simulation trials. Note that $\rho$ represents the first-order autocorrelation between any two consecutive observations, and $\sigma$ is the error standard deviation according to the AR(1) model. Moreover, we focus on positive correlations, as the Clayton Copula is valid only for positive associations.
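The general shape of such a run-length simulation can be sketched as follows. This is not the code behind Tables 2 to 4. It is a baseline under stated assumptions: normal errors, $\sigma$ taken as the stationary standard deviation of $y_t$, and conditional limits from the Gaussian Copula with normal marginals, which reduce to $\mu + \rho(y_{t-1} - \mu) \pm z_{1-\gamma/2}\,\sigma\sqrt{1-\rho^2}$. With these exact limits each in-control observation signals with probability $\gamma$, so the estimate should be near $1/\gamma$. Limits from another Copula could be swapped into the hypothetical `gaussian_limits` function.

```python
import math
import random
from statistics import NormalDist


def gaussian_limits(y_prev, mu, sigma, rho, gamma):
    """Conditional (LCL, UCL) from the Gaussian Copula with normal marginals."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    center = mu + rho * (y_prev - mu)
    half_width = z * sigma * math.sqrt(1.0 - rho ** 2)
    return center - half_width, center + half_width


def run_length(rng, delta, phi, sigma, gamma, max_steps=100_000):
    """Number of in-control observations until the first out-of-limit signal."""
    mu = delta / (1.0 - phi)
    sigma_eps = sigma * math.sqrt(1.0 - phi ** 2)
    y_prev = rng.gauss(mu, sigma)  # first observation from the stationary law
    for step in range(1, max_steps + 1):
        y_now = delta + phi * y_prev + rng.gauss(0.0, sigma_eps)
        lcl, ucl = gaussian_limits(y_prev, mu, sigma, phi, gamma)
        if not lcl <= y_now <= ucl:
            return step
        y_prev = y_now
    return max_steps  # censored run, only reached if no signal occurs


def estimate_arl0(trials, seed=2022, **process):
    """Monte Carlo estimate of ARL_0 as the mean run length over the trials."""
    rng = random.Random(seed)
    return sum(run_length(rng, **process) for _ in range(trials)) / trials


# Illustrative values only; gamma is large to keep the runs short.
arl0 = estimate_arl0(trials=2000, delta=1.0, phi=0.5, sigma=1.0, gamma=0.05)
print(f"estimated ARL_0 = {arl0:.1f} (1 / gamma = {1 / 0.05:.0f})")
```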

Gaussian Copula
As we can see from the results presented in Table 2, the ARL under the null hypothesis ($\mathrm{ARL}_0$) decreases as the correlation coefficient ($\rho$) increases. In addition, note that $\mathrm{ARL}_0$ is fairly robust with respect to changes in the process standard deviation ($\sigma$).

Clayton Copula
Again, as we can see from the results in Table 3, the ARL under the null hypothesis ($\mathrm{ARL}_0$) decreases as the correlation coefficient ($\rho$) increases. In addition, note that $\mathrm{ARL}_0$ is fairly robust with respect to changes in the process standard deviation ($\sigma$).

Farlie-Gumbel-Morgenstern Copula
The results presented in Table 4 show that the ARL under the null hypothesis ($\mathrm{ARL}_0$) is not as robust when compared to the Clayton Copula at a particular value of the correlation coefficient $\rho$. The $\mathrm{ARL}_0$ values did not decrease when the correlation coefficient ($\rho$) increased. Moreover, in the case of this Copula, the correlation coefficient has the restriction that $-\frac{1}{3} \le \rho \le \frac{1}{3}$.

Conclusions
The numerical results indicate that the Clayton Copula behaves more or less like the Gaussian Copula at very low correlation levels between any two consecutive time dependent measurements. For example, in situations in which the measurements are taken at large time intervals, it is reasonable to assume a low level of correlation between these measurements. However, the use of the Clayton Copula is not advisable to construct quality control limits when the correlation level is reasonably high. In any event, the use of other types of Copulas to construct the quality control limits for an AR(1) process should be avoided. For the same reason, we decided not to consider power calculations in this paper.
This study about the use of Copulas to construct the control chart for an AR(1) process can be easily extended to the EWMA chart as it is also based on an AR(1) process. The authors propose future study of the use of the Clayton Copula to construct the EWMA quality control chart.
and variance-covariance matrix given by
For the infinite dimensional case, let $\mu_k \to \mu$ and $\Sigma_k \to \Sigma$ as $k \to \infty$.
This means that 1 . This in turn means that
According to the AR(1) model,

$$y_t = \delta + \phi_1 y_{t-1} + \varepsilon_t \iff y_t - \phi_1 y_{t-1} = \delta + \varepsilon_t$$

Note that the expected values on either side of the above equation match due to $\delta = (1 - \phi_1)\mu$.

Now let us use the identity,
$\mathrm{Var}(y_t) = E(\mathrm{Var}(y_t \mid y_{t-1})) + \mathrm{Var}(E(y_t \mid y_{t-1})) = \sigma_{11} -$
Proof. Let us define the vector
Note that vectors $X$ and $Y$ are the partitions.