On the Use of Copula for Quality Control Based on an AR(1) Model

Young, Timothy M.; Nanthakumar, Ampalavanar; Nanthakumar, Hari

doi:10.3390/math9182211


On the Use of Copula for Quality Control Based on an AR(1) Model^†

by Timothy M. Young ¹,*, Ampalavanar Nanthakumar ² and Hari Nanthakumar ³

¹ Center for Renewable Carbon, The University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4563, USA

² Department of Mathematics, State University of New York at Oswego, Oswego, NY 13126, USA

³ Department of Economics and Mathematics, Columbia University, New York, NY 10027, USA

* Author to whom correspondence should be addressed.

† This paper is an extended version of a technical abstract for a presentation made at the 15th Annual International Conference on Statistics: Teaching, Theory & Applications. Athens Institute for Education and Research (ATINER), Athens, Greece, 28–30 June 2021.

Mathematics 2021, 9(18), 2211; https://doi.org/10.3390/math9182211

Submission received: 8 August 2021 / Revised: 4 September 2021 / Accepted: 7 September 2021 / Published: 9 September 2021


Abstract: Manufacturing for a multitude of continuous processing applications in the era of automation and ‘Industry 4.0’ is focused on rapid throughput while producing products of acceptable quality that meet customer specifications. Monitoring the stability or statistical control of key process parameters using data acquired from online sensors is fundamental to successful automation in manufacturing applications. This study addresses the significant problem of positive autocorrelation in data collected from online sensors, which may impair assessment of statistical control. Sensor data collected at short time intervals typically have significant autocorrelation, and traditional statistical process control (SPC) techniques cannot be deployed. There is a plethora of literature on techniques for SPC in the presence of positive autocorrelation. This paper contributes to this area of study by investigating the performance of ‘Copula’ based control charts by assessing the average run length (ARL) when the subsequent observations are correlated and follow the AR(1) model. The conditional distribution of y_{t} given y_{t - 1} is used in deriving the control chart limits for three different categories of Copulas: Gaussian, Clayton, and Farlie-Gumbel-Morgenstern Copulas. Preliminary results suggest that the overall performance of the Clayton Copula and Farlie-Gumbel-Morgenstern Copula is better compared to other Archimedean Copulas. The Clayton Copula is the more robust with respect to changes in the process standard deviation as the correlation coefficient increases.

Keywords: copula; model; autocorrelation; statistical process control

1. Introduction

Time series modeling goes back to the 1930s, when the ARMA (Auto Regressive Moving Average) models were developed by Herman Wold [1] for stationary time series. Later, these models were expanded to include ARIMA (Auto Regressive Integrated Moving Average) models in order to handle non-stationary time series. In the 1970s, Box-Jenkins ARIMA models became very popular, as these models can handle seasonal and non-seasonal patterns [2]. In addition, smoothing techniques were used in time series [3,4]. These models use iterative numerical procedures to estimate the parameters involved in the time series models. They are still popular and heavily used in forecasting. In addition to forecasting, these time series models can be used in quality control, especially in the construction of the popular EWMA (Exponentially Weighted Moving Average) control charts, under the assumption that the averages follow the AR(1) (or First Order Auto Regressive) model [5,6,7,8,9,10,11].

The use of Copulas in modeling resulted from the pioneering work of Sklar [12]. Substantial research was done in the 1980s and 1990s on Copula modeling and its applications in subjects such as economics, finance, actuarial science, and engineering (see Nelsen [13]). Copula models use the dependence structure and the direction of the association for modeling. There are several Copula models, which differ in their properties. They fall under one of two major categories: Archimedean Copulas and non-Archimedean Copulas [14,15]. There are Copulas for discrete-type variables and for continuous-type variables, as well as variants such as Vine Copulas and Hierarchical Copulas [16,17,18,19,20,21]. These variants are quite useful for dimension reduction in high-dimensional problems. We expand upon the work of Hryniewicz [22] by considering other models of Copula-based dependence; the approximate models proposed in this study rely on Pearson’s rho, with practical considerations.

Autocorrelation is an inherent problem with time series data collected at short time intervals. Given the advent of automation and high-speed throughput in the modern era of manufacturing (Industry 4.0 [23]), autocorrelation is a significant problem and hampers traditional SPC techniques. Autocorrelation, if unaccounted for in statistical process control (SPC) applications, can result in false signals of out-of-control and may induce over-adjustment of the process and unwanted variation, e.g., ‘Deming’s funnel experiment’ [20]. There is a plethora of literature on SPC adjusted for autocorrelation (see [24,25,26,27,28,29,30]). In this study, we use Copula models to construct control charts for a stationary process under the assumption that the process follows the AR(1) series. We evaluate the performance of these models on the basis of Average Run Length (ARL).

2. Materials and Methods

Let us assume that the observations y_{1}, y_{2}, \dots, y_{t}, \dots form an AR(1) model:

y_{t} = δ + φ y_{t - 1} + ε_{t}

(1)

Note that the error terms ε_{t} are assumed to be independent, with mean 0 and constant variance σ_{ε}^{2}, but are not necessarily normally distributed. Taking the covariance of both sides of (1) with y_{t - 1}, it follows that

\mathrm{Cov} (y_{t}, y_{t - 1}) = \mathrm{Cov} (δ, y_{t - 1}) + φ \mathrm{Cov} (y_{t - 1}, y_{t - 1}) + \mathrm{Cov} (ε_{t}, y_{t - 1})

(2)

Note that the observations y_{t} and y_{t - 1} follow the same distribution, although these observations are correlated. Let ρ be the first-order autocorrelation coefficient and \mathrm{Var} (y_{t}) = \mathrm{Var} (y_{t - 1}) = σ^{2}. Then (2) gives

ρ σ^{2} = 0 + φ σ^{2} + 0 = φ σ^{2} \Rightarrow ρ = φ

(3)

As we can see, the first-order autocorrelation ρ is equal to φ. Our interest is in finding a suitable Copula model for approximating the joint distribution among these correlated time series observations.

2.1. Copula Construction

Here, we investigate Copula models such as the Clayton, Gumbel, Frank, Farlie-Gumbel-Morgenstern, and Gaussian Copulas. Let us first discuss the construction of Copulas in a general set-up. In order to construct the Copulas, we need to define some notation:

u = F (x) = P (y_{t} \leq x), \quad v = G (y) = P (y_{t - 1} \leq y)

(4)

However, since y_{t} and y_{t - 1} follow the same distribution (due to AR(1) being a stationary process), F = G. Then, the joint cumulative probability distribution of y_{t} and y_{t - 1} can be written as follows:

P (y_{t} \leq x, y_{t - 1} \leq y) = C (F (x), F (y)) = C (u, v)

(5)

According to Copula theory, the conditional cumulative probability distribution of y_{t} given y_{t - 1} is given by the following equation:

F_{1 | 2} (y_{t} | y_{t - 1}) = \frac{\partial C (u, v)}{\partial v}

(6)

Note, for the AR(1) model, the conditional distribution of y_{t} given y_{t - 1} is the same as the conditional distribution of ε_{t} given y_{t - 1}, with the mean shifted by a constant. However, ε_{t} and y_{t - 1} are independent of each other. Therefore, the conditional distribution of y_{t} given y_{t - 1} is the same as the unconditional distribution of ε_{t}, with the mean shifted by the same constant.

2.1.1. Clayton Copula

The derivation of the conditional distribution for the Clayton Copula is as follows:

C (u, v) = {(u^{- α} + v^{- α} - 1)}^{- \frac{1}{α}}

\begin{aligned}
\frac{\partial C (u, v)}{\partial v} & = - \frac{1}{α} {(u^{- α} + v^{- α} - 1)}^{- (\frac{1}{α} + 1)} (- α) v^{- (α + 1)} \\
& = \frac{{(u^{- α} + v^{- α} - 1)}^{- (\frac{1}{α} + 1)}}{v^{1 + α}} \\
& = \frac{{({(u^{- α} + v^{- α} - 1)}^{- \frac{1}{α}})}^{1 + α}}{v^{1 + α}} \\
& = {(\frac{C (u, v)}{v})}^{1 + α}
\end{aligned}

(7)

2.1.2. Gumbel Copula

The derivation of the conditional distribution for the Gumbel Copula is as follows:

C (u, v) = e^{- {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α}}}

\begin{aligned}
\frac{\partial C (u, v)}{\partial v} & = e^{- {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α}}} (- \frac{1}{α}) {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α} - 1} α {(- \ln v)}^{α - 1} (- \frac{1}{v}) \\
& = e^{- {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α}}} {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α} - 1} {(- \ln v)}^{α - 1} (\frac{1}{v}) \\
& = \frac{C (u, v)}{v} {\{{(- \ln u)}^{α} + {(- \ln v)}^{α}\}}^{\frac{1}{α} - 1} {(- \ln v)}^{α - 1}
\end{aligned}

(8)

2.1.3. Farlie-Gumbel-Morgenstern (FGM) Copula

The derivation of the conditional distribution for the FGM Copula is as follows:

C (u, v) = u v \{1 + α (1 - u) (1 - v)\}

\begin{aligned}
\frac{\partial C (u, v)}{\partial v} & = u v (α (1 - u) (- 1)) + u \{1 + α (1 - u) (1 - v)\} \\
& = u \{1 + α (1 - u) (1 - v) - α (1 - u) v\} \\
& = \frac{C (u, v)}{v} - α u v (1 - u)
\end{aligned}

(9)

2.1.4. Frank Copula

The derivation of the conditional distribution for the Frank Copula is as follows:

C (u, v) = \frac{1}{α} \ln \{1 + \frac{(e^{α u} - 1) (e^{α v} - 1)}{(e^{α} - 1)}\}

(10)

\frac{\partial C (u, v)}{\partial v} = \frac{e^{α v} (e^{α u} - 1)}{(e^{α} - 1) (1 + \frac{(e^{α u} - 1) (e^{α v} - 1)}{(e^{α} - 1)})}
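The closed-form derivatives in (7) to (10) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the test point (u, v) = (0.3, 0.6) and the parameter values are illustrative assumptions. It compares each closed-form expression for ∂C/∂v with a central finite difference of the corresponding Copula.

```python
import math

# Copula functions C(u, v) as written in the text (a stands for alpha).
def clayton_c(u, v, a):
    return (u**-a + v**-a - 1.0) ** (-1.0 / a)

def gumbel_c(u, v, a):
    s = (-math.log(u)) ** a + (-math.log(v)) ** a
    return math.exp(-s ** (1.0 / a))

def fgm_c(u, v, a):
    return u * v * (1.0 + a * (1.0 - u) * (1.0 - v))

def frank_c(u, v, a):
    r = (math.exp(a * u) - 1.0) * (math.exp(a * v) - 1.0) / (math.exp(a) - 1.0)
    return math.log(1.0 + r) / a

# Closed-form partial derivatives dC/dv from equations (7) to (10).
def clayton_h(u, v, a):
    return (clayton_c(u, v, a) / v) ** (1.0 + a)

def gumbel_h(u, v, a):
    s = (-math.log(u)) ** a + (-math.log(v)) ** a
    return gumbel_c(u, v, a) / v * s ** (1.0 / a - 1.0) * (-math.log(v)) ** (a - 1.0)

def fgm_h(u, v, a):
    return fgm_c(u, v, a) / v - a * u * v * (1.0 - u)

def frank_h(u, v, a):
    r = (math.exp(a * u) - 1.0) * (math.exp(a * v) - 1.0) / (math.exp(a) - 1.0)
    return math.exp(a * v) * (math.exp(a * u) - 1.0) / ((math.exp(a) - 1.0) * (1.0 + r))

def numeric_dv(c, u, v, a, eps=1e-6):
    """Central finite-difference approximation of dC/dv."""
    return (c(u, v + eps, a) - c(u, v - eps, a)) / (2.0 * eps)

# Illustrative parameter values (assumed, not taken from the paper).
cases = {
    "Clayton": (clayton_c, clayton_h, 2.0),
    "Gumbel": (gumbel_c, gumbel_h, 2.0),
    "FGM": (fgm_c, fgm_h, 0.5),
    "Frank": (frank_c, frank_h, 2.0),
}
for name, (c, h, a) in cases.items():
    print(name, round(h(0.3, 0.6, a), 6), round(numeric_dv(c, 0.3, 0.6, a), 6))
```

Agreement between the two columns at a handful of points is only a sanity check of the algebra, not a proof.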

Remark 1. As can be seen, \frac{\partial C (u, v)}{\partial v} is the cumulative conditional distribution of y_{t} given y_{t - 1}. This is the same as the cumulative conditional distribution of ε_{t} given y_{t - 1}. However, ε_{t} is independent of y_{t - 1}. Hence, \frac{\partial C (u, v)}{\partial v} is the cumulative distribution of the error term regardless of which Copula model is used.

2.1.5. Gaussian Copula

The derivation of the conditional distribution for the Gaussian Copula is as follows:

F (x, y) = C (u, v) = Φ_{2} (Φ^{- 1} (u), Φ^{- 1} (v))

(11)

where Φ and Φ_{2} represent the standard normal cdf and the standard bivariate normal cdf, respectively, and Φ^{- 1} represents the functional inverse of Φ.

The conditional cumulative distribution is as follows:

H (x | y) = \frac{\partial C (u, v)}{\partial v} \sim N (μ_{x} + ρ \frac{σ_{x}}{σ_{y}} (y - μ_{y}), σ_{x}^{2} (1 - ρ^{2}))

(12)

Now, the question is which copula should be used. For example, for data that exhibit a right tail, such as the exponential data (survival analysis for lifetime of machinery), the Clayton model is better [31]. For data that exhibit a left tail, such as the extreme data (stress due to heavy wind or pollution), the Gumbel model is better [32]. For data that have both tails, t-Copula is better [33]. If there are no tails, then the Farlie-Gumbel-Morgenstern model is better [34].

Suppose that for a particular location we are monitoring the moisture of biomaterials and that the measurements are correlated. We assume the AR(1) model is appropriate for these data because of the slow decay of the autocorrelations between y_{t} and y_{t + k} as k \to \infty. The reason is that for the AR(1) model, the autocorrelation between y_{t} and y_{t + k} is ρ_{k} = φ^{k}, where φ is the AR(1) model coefficient and |φ| < 1. Next, we do a simulation study to find out which Archimedean Copula better approximates the conditional distribution. For the simulation we assume that y_{t} follows a univariate normal distribution with mean μ = \frac{δ}{1 - φ} and variance σ^{2}. We use Theorems A1 and A2 in Appendix A to derive the results presented in the next section.
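The relations ρ_{k} = φ^{k} and μ = δ / (1 - φ) can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: normal errors, illustrative values δ = 1, φ = 0.6, σ_{ε} = 1, and hypothetical helper names `simulate_ar1` and `sample_autocorr`. It uses the standard stationary variance σ^{2} = σ_{ε}^{2} / (1 - φ^{2}) to draw the first observation.

```python
import numpy as np

def simulate_ar1(n, delta, phi, sigma_eps, rng):
    """Simulate y_t = delta + phi * y_{t-1} + eps_t with normal errors."""
    mu = delta / (1.0 - phi)                       # stationary mean
    sigma_y = sigma_eps / np.sqrt(1.0 - phi**2)    # stationary std. deviation
    eps = rng.normal(0.0, sigma_eps, size=n)
    y = np.empty(n)
    y[0] = rng.normal(mu, sigma_y)                 # start in the stationary law
    for t in range(1, n):
        y[t] = delta + phi * y[t - 1] + eps[t]
    return y

def sample_autocorr(y, k):
    """Sample autocorrelation of y at lag k."""
    yc = y - y.mean()
    return float(np.dot(yc[:-k], yc[k:]) / np.dot(yc, yc))

rng = np.random.default_rng(0)
phi = 0.6
y = simulate_ar1(200_000, delta=1.0, phi=phi, sigma_eps=1.0, rng=rng)

print("sample mean:", round(float(y.mean()), 3), "theory:", 1.0 / (1.0 - phi))
for k in (1, 2, 3):
    print("lag", k, "sample:", round(sample_autocorr(y, k), 3), "theory:", round(phi**k, 3))
```

With a long series the sample autocorrelations should sit close to φ, φ^{2} and φ^{3}, which is the geometric decay described above.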

3. Comparison of Copulas to Approximate the Conditional Distribution

Here, we present the results comparing the Copulas based on the conditional probabilities with the actual conditional probability P (Y_{t} \leq y_{t} | Y_{t - 1} = y_{t - 1}) (see Table 1). It can be seen from Table 1 that the overall performance of the Clayton Copula and Farlie-Gumbel-Morgenstern (FGM) Copula is better compared to the other Archimedean Copulas considered here. So, we use the Clayton Copula and the FGM Copula for approximating the conditional distribution of y_{t} given y_{t - 1}.

In order to compute the conditional cumulative probability distribution under different Copula models, we use the following values for the parameters of the AR(1) model. Note that δ represents the constant and ρ (= φ) is the coefficient in the AR(1) model. The observations are simulated according to the AR(1) model only (no other experimental design is used).

4. Applications in Quality Control

4.1. Construction of the Control Charts

Quality control methods started in the 1920s and became widely popular in the 1940s due to the efforts of Walter Shewhart [35], Edwards Deming [36], Joseph Juran [37], and Armand Feigenbaum [38]. As a result, the American Society for Quality Control (ASQC) was formed in 1946. The concept of control charts was later extended to cover multivariate situations as well [39]. In most control charts, the assumptions are that the observations are independent and that the distribution is normal (or multivariate normal in multivariate situations). EWMA charts were introduced for situations where the observations are correlated [6,7]. Here we introduce a Copula-based approach to construct the control chart when the subsequent observations are correlated and follow the AR(1) model. The conditional distribution of y_{t} given y_{t - 1} is used in deriving the control chart limits.

4.2. EWMA Chart as a Special Case

Let x_{0}, x_{1}, \dots be independent observations following a stationary process, as defined by x_{i} = μ + E_{i}, where E_{i} are independent normally distributed error terms. Let us define y_{i} = (1 - λ) x_{i} + λ y_{i - 1}, where y_{0} = x_{0}. Note that

(1 - λ) x_{i} = (1 - λ) μ + (1 - λ) E_{i} = δ + ε_{i}, where δ = (1 - λ) μ and ε_{i} = (1 - λ) E_{i} \Rightarrow y_{i} = δ + λ y_{i - 1} + ε_{i}

(13)

Thus, the EWMA statistic y_{i} follows an AR(1) model of the form (1) with coefficient φ = λ.
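The identity in (13) can be confirmed numerically. The sketch below is illustrative only: μ = 10, λ = 0.7 and unit-variance normal errors are assumed values, and the weighting follows the convention of the text, in which λ multiplies the previous value y_{i - 1}. It runs the EWMA recursion and the AR(1) recursion with δ = (1 - λ) μ and ε_{i} = (1 - λ) E_{i} on the same errors and checks that the two sequences coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, lam, n = 10.0, 0.7, 1000          # illustrative values (assumed)
E = rng.normal(0.0, 1.0, size=n)      # independent normal errors E_i
x = mu + E                            # independent observations x_i

# EWMA recursion as defined in the text: y_i = (1 - lam) x_i + lam y_{i-1}
y_ewma = np.empty(n)
y_ewma[0] = x[0]
for i in range(1, n):
    y_ewma[i] = (1.0 - lam) * x[i] + lam * y_ewma[i - 1]

# AR(1) form of equation (13): y_i = delta + lam y_{i-1} + eps_i
delta = (1.0 - lam) * mu
eps = (1.0 - lam) * E
y_ar1 = np.empty(n)
y_ar1[0] = x[0]
for i in range(1, n):
    y_ar1[i] = delta + lam * y_ar1[i - 1] + eps[i]

same = bool(np.allclose(y_ewma, y_ar1))
print("EWMA and AR(1) recursions agree:", same)
```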

4.3. Approximation Based on the Clayton Copula

As noted earlier,

\frac{\partial C (u, v)}{\partial v} = \frac{{(u^{- α} + v^{- α} - 1)}^{\frac{- (1 + α)}{α}}}{v^{1 + α}} = K (ε_{t})

(14)

where K (x) is the cumulative probability distribution function of the error ε_{t}. Note that this is the conditional distribution of y_{t} given y_{t - 1}. We are interested in setting up the quality control limits for y_{t} such that \frac{γ}{2} \leq K (ε_{t}) \leq 1 - \frac{γ}{2}, where 100 (1 - γ) % is the confidence level for the control chart. So, for the conditional distribution of y_{t} given y_{t - 1} based on the Clayton Copula model, the 100 (1 - γ) % confidence control chart can be found by solving the following inequality:

\frac{γ}{2} \leq \frac{{(u^{- α} + v^{- α} - 1)}^{\frac{- (1 + α)}{α}}}{v^{1 + α}} \leq 1 - \frac{γ}{2}

(15)

Here, we are making the assumption that the unconditional distribution of y_{t - 1} is normal with mean μ = \frac{δ}{1 - φ} and variance σ^{2}. Let

v = G (x_{1}) = P (y_{t - 1} \leq x_{1}) = Φ (\frac{x_{1} - μ}{σ})

Then, at the upper control limit (UCL),

u^{- α} = 1 + {(G (y_{t - 1}))}^{- α} {(1 - \frac{γ}{2})}^{\frac{- α}{1 + α}} - {(G (y_{t - 1}))}^{- α}

So,

u_{ucl} = {\{1 + {(G (y_{t - 1}))}^{- α} {(1 - \frac{γ}{2})}^{\frac{- α}{1 + α}} - {(G (y_{t - 1}))}^{- α}\}}^{\frac{- 1}{α}}

(16)

Note that u_{ucl} = P (y_{t} \leq UCL). This means that

UCL = μ + σ Φ^{- 1} (u_{ucl})

(17)

Similarly, at the lower control limit (LCL),

u^{- α} = 1 + {(G (y_{t - 1}))}^{- α} {(\frac{γ}{2})}^{\frac{- α}{1 + α}} - {(G (y_{t - 1}))}^{- α}

So,

u_{lcl} = {\{1 + {(G (y_{t - 1}))}^{- α} {(\frac{γ}{2})}^{\frac{- α}{1 + α}} - {(G (y_{t - 1}))}^{- α}\}}^{\frac{- 1}{α}}

(18)

Since u_{lcl} = P (y_{t} \leq LCL) = Φ (\frac{LCL - μ}{σ}), this means that

LCL = μ + σ Φ^{- 1} (u_{lcl})

(19)

Here Φ^{- 1} (u_{lcl}) is negative whenever u_{lcl} < \frac{1}{2}, so the LCL then lies below μ.
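The steps leading to (16) to (19) can be sketched in code. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the Copula parameter α is treated as a given input (the text does not state how it is chosen), and the numerical values are arbitrary. Both limits are obtained from the relation u = P (y_{t} \leq limit) = Φ ((limit - μ) / σ), that is, limit = μ + σ Φ^{- 1} (u).

```python
from scipy.stats import norm

def clayton_h(u, v, alpha):
    """Conditional cdf dC/dv of the Clayton Copula, equation (14)."""
    return (u**-alpha + v**-alpha - 1.0) ** (-(1.0 + alpha) / alpha) / v ** (1.0 + alpha)

def clayton_u_at(p, v, alpha):
    """Solve clayton_h(u, v, alpha) = p for u, as in equations (16) and (18)."""
    return (1.0 + v**-alpha * (p ** (-alpha / (1.0 + alpha)) - 1.0)) ** (-1.0 / alpha)

def clayton_limits(y_prev, mu, sigma, alpha, gamma):
    """Control limits for y_t given the previous observation y_prev."""
    v = norm.cdf((y_prev - mu) / sigma)            # v = G(y_{t-1})
    u_ucl = clayton_u_at(1.0 - gamma / 2.0, v, alpha)
    u_lcl = clayton_u_at(gamma / 2.0, v, alpha)
    ucl = mu + sigma * norm.ppf(u_ucl)             # u_ucl = P(y_t <= UCL)
    lcl = mu + sigma * norm.ppf(u_lcl)             # u_lcl = P(y_t <= LCL)
    return float(lcl), float(ucl)

# Illustrative values (assumed): mu = 10, sigma = 2, alpha = 1.5, gamma = 0.0027.
lcl, ucl = clayton_limits(y_prev=11.0, mu=10.0, sigma=2.0, alpha=1.5, gamma=0.0027)
print("LCL =", round(lcl, 3), "UCL =", round(ucl, 3))
```

Because the limits depend on y_{t - 1} through v, they move from one observation to the next instead of being fixed horizontal lines.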

4.4. Approximation Based on the Farlie-Gumbel-Morgenstern Copula

Again, as noted earlier,

\frac{\partial C (u, v)}{\partial v} = \frac{u v \{1 + α (1 - u) (1 - v)\}}{v} - α u v (1 - u) = K (ε_{t})

(20)

where K (x) is the cumulative probability distribution function of the error ε_{t}. Note that this is the conditional distribution of y_{t} given y_{t - 1}. We are interested in setting up the quality control limits for y_{t} such that \frac{γ}{2} \leq K (ε_{t}) \leq 1 - \frac{γ}{2}, where 100 (1 - γ) % is the confidence level for the control chart. So, for the conditional distribution of y_{t} given y_{t - 1} based on the FGM Copula model, the 100 (1 - γ) % confidence control chart can be found by solving the following inequality:

\frac{γ}{2} \leq \frac{u v \{1 + α (1 - u) (1 - v)\}}{v} - α u v (1 - u) \leq 1 - \frac{γ}{2}

(21)

Here, we are making the assumption that the unconditional distribution of y_{t - 1} is normal with mean μ = \frac{δ}{1 - φ} and variance σ^{2}. Let

v = G (x_{1}) = P (y_{t - 1} \leq x_{1}) = Φ (\frac{x_{1} - μ}{σ})

and let

u = P (y_{t} \leq x_{2})

Then, at the upper control limit (UCL),

u_{ucl} = \frac{2 α v - (1 + α) \pm \sqrt{{(1 + α)}^{2} - 4 α (1 + α) v + 4 α^{2} v^{2} + 4 (2 α v - α) (1 - \frac{γ}{2})}}{2 α (2 v - 1)}

(22)

Note that u_{ucl} = P (y_{t} \leq UCL). This means that

UCL = μ + σ Φ^{- 1} (u_{ucl})

(23)

Similarly, at the lower control limit (LCL),

u_{lcl} = \frac{2 α v - (1 + α) \pm \sqrt{{(1 + α)}^{2} - 4 α (1 + α) v + 4 α^{2} v^{2} + 4 (2 α v - α) (\frac{γ}{2})}}{2 α (2 v - 1)}

(24)

In (22) and (24), the root that lies in [0, 1] is taken. Since u_{lcl} = P (y_{t} \leq LCL) = Φ (\frac{LCL - μ}{σ}), this means that

LCL = μ + σ Φ^{- 1} (u_{lcl})

(25)
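The quadratic behind (22) and (24) can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical and α is treated as a given input. Writing a = α (1 - 2 v), the condition ∂C/∂v = p becomes a u^{2} - (1 + a) u + p = 0, and the root in [0, 1] is u = \{(1 + a) - \sqrt{{(1 + a)}^{2} - 4 a p}\} / (2 a), which corresponds to the "+" sign in (22) and (24) because the denominator there equals - 2 a. The case v = 1/2 (a = 0) is handled separately, since the quadratic then degenerates to u = p.

```python
import math
from scipy.stats import norm

def fgm_h(u, v, alpha):
    """Conditional cdf dC/dv of the FGM Copula, equation (20), simplified."""
    return u * (1.0 + alpha * (1.0 - u) * (1.0 - 2.0 * v))

def fgm_u_at(p, v, alpha):
    """Solve fgm_h(u, v, alpha) = p for the root u in [0, 1]."""
    a = alpha * (1.0 - 2.0 * v)
    if abs(a) < 1e-12:                 # v = 1/2 or alpha = 0: equation is u = p
        return p
    disc = (1.0 + a) ** 2 - 4.0 * a * p
    return ((1.0 + a) - math.sqrt(disc)) / (2.0 * a)

def fgm_limits(y_prev, mu, sigma, alpha, gamma):
    """Control limits for y_t given the previous observation y_prev."""
    v = norm.cdf((y_prev - mu) / sigma)            # v = G(y_{t-1})
    u_ucl = fgm_u_at(1.0 - gamma / 2.0, v, alpha)
    u_lcl = fgm_u_at(gamma / 2.0, v, alpha)
    return float(mu + sigma * norm.ppf(u_lcl)), float(mu + sigma * norm.ppf(u_ucl))

# Illustrative values (assumed): mu = 10, sigma = 2, alpha = 0.5, gamma = 0.0027.
lcl, ucl = fgm_limits(y_prev=11.0, mu=10.0, sigma=2.0, alpha=0.5, gamma=0.0027)
print("LCL =", round(lcl, 3), "UCL =", round(ucl, 3))
```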

Next, we investigate the use of the Gaussian Copula in the same context to approximate the conditional distribution of y_{t} given y_{t - 1}.

5. Numerical Results

In this section, we perform a simulation study in order to compare the performance of the Gaussian Copula against the Clayton Copula and the FGM Copula on the basis of the average run length (ARL_{0}) under the assumption that the null hypothesis H_{0} is true. This study involved 10,000 simulation trials. Note that ρ represents the first-order autocorrelation between any two consecutive observations, and σ is the error standard deviation according to the AR(1) model. Moreover, we focus on positive correlations, as the Clayton Copula is valid only for positive associations.
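A Monte Carlo estimate of ARL_{0} of the kind described above can be sketched as follows. This is not the authors' simulation code: the function names, the parameter values, γ = 0.05 and the 2000 trials (rather than 10,000) are assumptions chosen to keep the example fast. The limits used here come from the exact conditional normal distribution of y_{t} given y_{t - 1} (see Appendix A), for which the run length is geometric and ARL_{0} should be close to 1/γ; any other function mapping y_{t - 1} to a pair of limits could be passed in instead.

```python
import math
import numpy as np
from scipy.stats import norm

def make_gaussian_limits(mu, sigma_y, rho, gamma):
    """Limits from the conditional normal law of y_t given y_{t-1}."""
    z = norm.ppf(1.0 - gamma / 2.0)
    s = sigma_y * math.sqrt(1.0 - rho**2)      # conditional standard deviation
    def limits(y_prev):
        m = mu + rho * (y_prev - mu)           # conditional mean
        return m - z * s, m + z * s
    return limits

def simulate_arl0(limits, delta, phi, sigma_eps, n_trials, rng, max_len=100_000):
    """Average number of observations until the first point outside the limits."""
    mu = delta / (1.0 - phi)
    sigma_y = sigma_eps / math.sqrt(1.0 - phi**2)
    total = 0
    for _ in range(n_trials):
        y_prev = rng.normal(mu, sigma_y)       # start from the stationary law
        for t in range(1, max_len + 1):
            y = delta + phi * y_prev + rng.normal(0.0, sigma_eps)
            lcl, ucl = limits(y_prev)
            if y < lcl or y > ucl:             # out-of-control signal
                break
            y_prev = y
        total += t
    return total / n_trials

# Illustrative in-control settings (assumed values).
delta, phi, sigma_eps, gamma = 1.0, 0.5, 1.0, 0.05
mu = delta / (1.0 - phi)
sigma_y = sigma_eps / math.sqrt(1.0 - phi**2)
rng = np.random.default_rng(42)
limits = make_gaussian_limits(mu, sigma_y, phi, gamma)
arl0 = simulate_arl0(limits, delta, phi, sigma_eps, n_trials=2000, rng=rng)
print(f"simulated ARL0 = {arl0:.1f}, 1/gamma = {1.0 / gamma:.1f}")
```

Passing Copula-based limits in place of `limits` would show how far an approximate conditional distribution moves ARL_{0} away from the nominal 1/γ.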

5.1. Gaussian Copula

As we can see from the results presented in Table 2, the ARL under the null hypothesis (ARL_{0}) decreases as the correlation coefficient (ρ) increases. In addition, note that ARL_{0} is fairly robust with respect to changes in the process standard deviation (σ).

5.2. Clayton Copula

Again, as we can see from the results in Table 3, the ARL under the null hypothesis (ARL_{0}) decreases as the correlation coefficient (ρ) increases. In addition, note that ARL_{0} is fairly robust with respect to changes in the process standard deviation (σ).

5.3. Farlie-Gumbel-Morgenstern Copula

The results presented in Table 4 show that the ARL under the null hypothesis (ARL_{0}) is not as robust when compared to the Clayton Copula at a particular value of the correlation coefficient ρ. The ARL_{0} values did not decrease when the correlation coefficient (ρ) increased. Moreover, in the case of this Copula, the correlation coefficient has the restriction that - \frac{1}{3} \leq ρ \leq \frac{1}{3}.

6. Conclusions

The numerical results indicate that the Clayton Copula behaves more or less like the Gaussian Copula at very low correlation levels between any two consecutive time dependent measurements. For example, in situations in which the measurements are taken at large time intervals, it is reasonable to assume a low level of correlation between these measurements. However, the use of the Clayton Copula is not advisable to construct quality control limits when the correlation level is reasonably high. In any event, the use of other types of Copulas to construct the quality control limits for an AR(1) process should be avoided. For the same reason, we decided not to consider power calculations in this paper.

This study about the use of Copulas to construct the control chart for an AR(1) process can be easily extended to the EWMA chart as it is also based on an AR(1) process. The authors propose future study of the use of the Clayton Copula to construct the EWMA quality control chart.

Author Contributions

Conceptualization, T.M.Y. and A.N.; methodology, T.M.Y. and A.N.; software, H.N.; validation, T.M.Y., A.N. and H.N.; formal analysis, A.N.; investigation, T.M.Y. and A.N.; resources, T.M.Y. and A.N.; data curation, A.N. and H.N.; writing—original draft preparation, T.M.Y. and A.N.; writing—review and editing, T.M.Y. and A.N.; writing—review and editing, project administration, T.M.Y.; funding acquisition, T.M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the USDA National Institute of Food and Agriculture, Hatch project 1012359.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable; given data are derived by simulation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Theorem A1. At any given time t, if the observations y_{1}, y_{2}, \dots, y_{t - k} follow a multivariate normal distribution with the mean

\underline{μ_{k}} = {(\begin{matrix} μ \\ μ \\ \vdots \\ μ \end{matrix})}_{k \times 1}

and variance-covariance matrix, whose (i, j) entry is ρ^{|i - j|} σ^{2}, given by

{\underline{Σ}}_{k} = {(\begin{matrix} σ^{2} & ρ σ^{2} & ρ^{2} σ^{2} & \cdots & ρ^{k - 1} σ^{2} \\ ρ σ^{2} & σ^{2} & ρ σ^{2} & \cdots & ρ^{k - 2} σ^{2} \\ ρ^{2} σ^{2} & ρ σ^{2} & σ^{2} & \cdots & ρ^{k - 3} σ^{2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ ρ^{k - 1} σ^{2} & ρ^{k - 2} σ^{2} & ρ^{k - 3} σ^{2} & \cdots & σ^{2} \end{matrix})}_{k \times k}

Remark A1. For the infinite dimensional case, let \underline{μ_{k}} \to \underline{μ} and \underline{Σ_{k}} \to \underline{Σ} as k \to \infty. Then the observations y_{1}, y_{2}, y_{3}, \dots, y_{t}, \dots follow the AR(1) model as given below.

y_{t} = δ + φ_{1} y_{t - 1} + ε_{t} \quad \forall t \in ℤ^{+}, where μ = \frac{δ}{1 - φ_{1}} and ρ = φ_{1}.

Proof. Note that at any time t,

(\begin{matrix} y_{t} \\ y_{t - 1} \end{matrix}) \sim N ((\begin{matrix} μ \\ μ \end{matrix}), (\begin{matrix} σ^{2} & ρ σ^{2} \\ ρ σ^{2} & σ^{2} \end{matrix})) .

This means that

\begin{aligned}
(\begin{matrix} 1 & - φ_{1} \end{matrix}) (\begin{matrix} y_{t} \\ y_{t - 1} \end{matrix}) & \sim N ((\begin{matrix} 1 & - φ_{1} \end{matrix}) (\begin{matrix} μ \\ μ \end{matrix}), (\begin{matrix} 1 & - φ_{1} \end{matrix}) (\begin{matrix} σ^{2} & ρ σ^{2} \\ ρ σ^{2} & σ^{2} \end{matrix}) (\begin{matrix} 1 \\ - φ_{1} \end{matrix})) \\
& \sim N ((1 - φ_{1}) μ, σ^{2} (1 - φ_{1}^{2})) .
\end{aligned}

This in turn means that

y_{t} - φ_{1} y_{t - 1} \sim N ((1 - φ_{1}) μ, σ^{2} (1 - φ_{1}^{2})) .

□

According to the AR(1) model,

y_{t} = δ + φ_{1} y_{t - 1} + ε_{t} \Leftrightarrow y_{t} - φ_{1} y_{t - 1} = δ + ε_{t}

Note that the expected values on either side of the above equation match due to δ = (1 - φ_{1}) μ.

Next, let us verify the equality of the variance. Again, note that

\mathrm{Var} (ε_{t} | y_{t - 1}) = \mathrm{Var} (y_{t} | y_{t - 1})

But

\mathrm{Var} (y_{t} | y_{t - 1}) = σ_{11} - \frac{σ_{12}^{2}}{σ_{22}} \Rightarrow E (\mathrm{Var} (y_{t} | y_{t - 1})) = σ_{11} - \frac{σ_{12}^{2}}{σ_{22}}

where σ_{11} = \mathrm{Var} (y_{t}), σ_{22} = \mathrm{Var} (y_{t - 1}), and σ_{12} = \mathrm{Cov} (y_{t}, y_{t - 1}). In addition,

E (y_{t} | y_{t - 1}) = μ_{t} + \frac{σ_{12}}{σ_{22}} (y_{t - 1} - μ_{t - 1})

where μ_{t} = E (y_{t}) and μ_{t - 1} = E (y_{t - 1}).

Note that due to the AR(1) model, y_{t} = δ + φ_{1} y_{t - 1} + ε_{t},

\begin{aligned}
E (y_{t} | y_{t - 1}) & = δ + φ_{1} y_{t - 1} + E (ε_{t} | y_{t - 1}) \\
& = δ + φ_{1} y_{t - 1} + 0 \\
& = δ + φ_{1} y_{t - 1}
\end{aligned}

Term-by-term comparison yields φ_{1} = \frac{σ_{12}}{σ_{22}} and δ = μ_{t} - \frac{σ_{12}}{σ_{22}} μ_{t - 1}.

Due to the stationarity of the series, μ_{t} = μ_{t - 1} = μ, and this means δ = μ (1 - φ_{1}). Again, due to the stationarity, σ_{11} = σ_{22} = σ^{2}.

Note that

φ_{1} = \frac{σ_{12}}{σ_{22}} = \frac{ρ \sqrt{σ_{11}} \sqrt{σ_{22}}}{σ_{22}} = ρ \frac{\sqrt{σ_{11}}}{\sqrt{σ_{22}}} = ρ

As noted earlier,

E (y_{t} | y_{t - 1}) = δ + φ_{1} y_{t - 1} and \mathrm{Var} (y_{t} | y_{t - 1}) = σ_{11} - \frac{σ_{12}^{2}}{σ_{22}} .

Now let us use the identity

\begin{aligned}
\mathrm{Var} (y_{t}) & = E (\mathrm{Var} (y_{t} | y_{t - 1})) + \mathrm{Var} (E (y_{t} | y_{t - 1})) \\
& = σ_{11} - \frac{σ_{12}^{2}}{σ_{22}} + \frac{σ_{12}^{2}}{σ_{22}^{2}} σ_{22} \\
& = σ_{11} \\
& = σ^{2}
\end{aligned}

Theorem A2. Let the stationary time series y_{1}, y_{2}, y_{3}, \dots, y_{t}, \dots jointly follow a multivariate normal distribution while satisfying the properties of the AR(1) model. Then, this time series forms a Markov Chain.

Proof. Let us define the vector

[\begin{matrix} y_{t} \\ y_{t - 1} \\ \vdots \end{matrix}] = [\begin{matrix} \underline{X} \\ \underline{Y} \end{matrix}]

Note that the vectors \underline{X} and \underline{Y} are the partitions. It is well-known that \underline{X} given \underline{Y} = \underline{y} follows a multivariate normal distribution with mean

E (\underline{X} | \underline{Y} = \underline{y}) = \underline{μ_{X}} + \underline{Σ_{X Y}} \underline{Σ_{Y Y}^{- 1}} (\underline{y} - \underline{μ_{Y}})

and variance

\mathrm{Var} (\underline{X} | \underline{Y} = \underline{y}) = \underline{Σ_{X X}} - \underline{Σ_{X Y}} \underline{Σ_{Y Y}^{- 1}} \underline{Σ_{Y X}} .

By using the above results, we can easily verify that

\begin{aligned}
E (y_{t} | y_{t - 1}) & = E (y_{t} | y_{t - 1}, y_{t - 2}) = \dots = E (y_{t} | y_{t - 1}, y_{t - 2}, \dots, y_{1}) \\
& = μ + ρ (y_{t - 1} - μ)
\end{aligned}

and

\begin{aligned}
\mathrm{Var} (y_{t} | y_{t - 1}) & = \mathrm{Var} (y_{t} | y_{t - 1}, y_{t - 2}) = \dots = \mathrm{Var} (y_{t} | y_{t - 1}, y_{t - 2}, \dots, y_{1}) \\
& = σ^{2} (1 - ρ^{2})
\end{aligned}

Furthermore, the conditional distribution is a normal distribution, so

P (y_{t} \leq y | y_{t - 1}) = P (y_{t} \leq y | y_{t - 1}, y_{t - 2}) = \dots = P (y_{t} \leq y | y_{t - 1}, y_{t - 2}, \dots, y_{1}) .

This means that the time series y_{1}, y_{2}, \dots, y_{t}, \dots forms a Markov Chain. □

Note that the conditional distribution of y_{t} given y_{t - 1} is given by

P (y_{t} \leq y | y_{t - 1} = x^{*}) = Φ (\frac{y - (μ + ρ (x^{*} - μ))}{σ \sqrt{1 - ρ^{2}}})

As seen from above, the conditional distribution is normal with mean μ + ρ (x^{*} - μ) = ρ x^{*} + (1 - ρ) μ and variance σ^{2} (1 - ρ^{2}).
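The conditional probability above can be evaluated directly, and it can also be written in Gaussian Copula form. The sketch below is illustrative, with hypothetical function names and arbitrary numerical values: it computes P (y_{t} \leq y | y_{t - 1} = x^{*}) from the conditional normal distribution and compares it with the standard expression for the Gaussian Copula derivative, ∂C/∂v = Φ ((Φ^{- 1} (u) - ρ Φ^{- 1} (v)) / \sqrt{1 - ρ^{2}}), evaluated at u = Φ ((y - μ) / σ) and v = Φ ((x^{*} - μ) / σ).

```python
import math
from scipy.stats import norm

def cond_cdf_direct(y, x_star, mu, sigma, rho):
    """P(y_t <= y | y_{t-1} = x_star) from the conditional normal law."""
    m = mu + rho * (x_star - mu)               # conditional mean
    s = sigma * math.sqrt(1.0 - rho**2)        # conditional standard deviation
    return float(norm.cdf((y - m) / s))

def cond_cdf_gaussian_copula(y, x_star, mu, sigma, rho):
    """The same probability written as dC/dv of the Gaussian Copula."""
    u = norm.cdf((y - mu) / sigma)
    v = norm.cdf((x_star - mu) / sigma)
    z = (norm.ppf(u) - rho * norm.ppf(v)) / math.sqrt(1.0 - rho**2)
    return float(norm.cdf(z))

# Illustrative values (assumed): mu = 10, sigma = 2, rho = 0.6.
p_direct = cond_cdf_direct(11.5, 9.0, mu=10.0, sigma=2.0, rho=0.6)
p_copula = cond_cdf_gaussian_copula(11.5, 9.0, mu=10.0, sigma=2.0, rho=0.6)
print(round(p_direct, 6), round(p_copula, 6))
```

The two functions agree because standardizing y and x^{*} and then applying the Copula formula reproduces the conditional mean and variance stated above.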

Next, our interest is in studying the Average Run Length (ARL).

Deriving a Theoretical Expression for ARL_{0}

Let X = Run Length. Then

\begin{aligned}
P (X = x) & = P (lcl \leq y_{1} \leq ucl, lcl \leq y_{2} \leq ucl, \dots, lcl \leq y_{x - 1} \leq ucl, y_{x} \notin (lcl, ucl)) \\
& = P (lcl \leq y_{1} \leq ucl) P (lcl \leq y_{2} \leq ucl | y_{1}) P (lcl \leq y_{3} \leq ucl | y_{2}, y_{1}) \\
& \quad \cdot P (lcl \leq y_{4} \leq ucl | y_{3}, y_{2}, y_{1}) \cdots P (lcl \leq y_{x - 1} \leq ucl | y_{1}, y_{2}, \dots, y_{x - 2}) \\
& \quad \cdot P (y_{x} \notin (lcl, ucl) | y_{1}, y_{2}, \dots, y_{x - 1})
\end{aligned}

Let

p_{1} (x^{*}) = Φ (\frac{ucl - (μ + ρ (x^{*} - μ))}{σ \sqrt{1 - ρ^{2}}})

p_{2} (x^{*}) = Φ (\frac{lcl - (μ + ρ (x^{*} - μ))}{σ \sqrt{1 - ρ^{2}}})

Then,

\begin{aligned}
P (X = x) & = \frac{1}{\sqrt{2 π} σ} \{Φ (\frac{ucl - μ}{σ}) - Φ (\frac{lcl - μ}{σ})\} \\
& \quad \cdot \int_{- \infty}^{\infty} {(p_{1} (x^{*}) - p_{2} (x^{*}))}^{x - 2} \{1 - p_{1} (x^{*}) + p_{2} (x^{*})\} e^{- 0.5 {(\frac{x^{*} - μ}{σ})}^{2}} d x^{*}
\end{aligned}

Note that ARL_{0} = \sum_{x = 1}^{\infty} x P (X = x) when μ = \frac{lcl + ucl}{2} (the center line).

Similarly, ARL_{1} = \sum_{x = 1}^{\infty} x P (X = x) when μ deviates from the center line.

References

Wold, H. A Study in the Analysis of Stationary Time Series, 2nd ed.; Almqvist and Wiksell Book Co.: Stockholm, Sweden, 1954. [Google Scholar]
Box, G.E.P.; Jenkins, G.M. Time Series Analysis Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970. [Google Scholar]
Gardner, E.S. Exponential smoothing: The state of the art. J. Forecast. 1985, 4, 1–28. [Google Scholar] [CrossRef]
Gardner, E.S.; McKenzie, E. Forecasting trends in time series. Manage. Sci. 1985, 31, 1237–1246. [Google Scholar] [CrossRef]
Roberts, S.D. Properties of control chart zone tests. Bell Sys. Tech. J. 1959, 37, 83–114. [Google Scholar] [CrossRef]
Crowder, S.V. A simple method for studying run length distribution of exponentially weighted moving average control. Technometrics 1959, 29, 401–407. [Google Scholar]
Crowder, S.V. Design of exponentially weighted moving average schemes. Technometrics 1989, 21, 155–162. [Google Scholar] [CrossRef]
Saccucci, M.S.; Lucas, J.M. Average run lengths for exponentially weighted moving average control schemes using the Markov Chain approach. J. Qual. Technol. 1990, 22, 154–162. [Google Scholar] [CrossRef]
Alwan, L.C.; Roberts, H.V. Time-series modeling for statistical process control. J. Bus Econ. Stat. 1988, 6, 87–95. [Google Scholar]
Montgomery, D.C.; Mastrangelo, C.M. Some statistical process control methods for autocorrelated data. J. Qual. Technol. 1991, 23, 179–193. [Google Scholar] [CrossRef]
Schmid, W. On EWMA Charts for Time Series. In Frontiers in Statistical Quality Control; Lenz, H.J., Wilrich, P.T., Eds.; Physica-Verlag: Heidelberg, Germany, 1997; pp. 115–137. [Google Scholar]
Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris. 1959, 8, 229–231. [Google Scholar]
Nelsen, R. An Introduction to Copulas, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Genest, C.; MacKay, J. Copules archim’ediennes et familles de lois bidimensionnelles dont les marges sont donn´ees. Canad. J. Statist. 1986, 14, 145. [Google Scholar] [CrossRef]
Genest, C.; MacKay, J. The joy of copulas: Bivariate distributions with uniform marginals. Amer. Statist. 1986, 40, 280–285.
Torre, E.; Marelli, S.; Embrechts, P.; Sudret, B. A general framework for data-driven uncertainty quantification under complex dependencies using Vine Copulas. Probab. Eng. Mech. 2018, 55, 1–16.
Tang, X.S.; Li, D.; Zhou, Q.; Phoon, K.K.; Zhang, L.M. Impact of Copulas for modeling bivariate distributions on system reliability. Struct. Saf. 2013, 44, 80–90.
Zhang, M.; Bedford, T. Vine copula approximation: A generic method for coping with conditional dependence. Stat. Comput. 2018, 28, 219–237.
Aas, K.; Czado, C.; Frigessi, A.; Bakken, H. Pair-copula constructions of multiple dependence. Insur. Math. Econ. 2009, 44, 182–198.
Kurowicka, D. Dependence Modeling: Vine Copula Handbook; World Scientific: Singapore, 2011.
So, M.K.; Yeung, C.Y. Vine-copula GARCH model with dynamic conditional dependence. Comput. Stat. Data Anal. 2014, 76, 655–671.
Hryniewicz, O. On the Robustness of the Shewhart Control Chart to Different Types of Dependencies in Data. In Frontiers in Statistical Quality Control; Lenz, H.J., Schmid, W., Wilrich, P.T., Eds.; Physica: Heidelberg, Germany, 2012; Volume 10.
Liao, Y.; Deschamps, F.; De Freitas Rocha Loures, E.; Pierin Ramos, L.F. Past, present and future of Industry 4.0—A systematic literature review and research agenda proposal. Int. J. Prod. Res. 2017, 55, 3609–3629.
Reynolds, M.R.; Arnold, J.C.; Baik, J.W. Variable sampling interval X charts in the presence of correlation. J. Qual. Technol. 1996, 28, 12–30.
Gilbert, K.C.; Kirby, K.; Hild, C.R. Charting autocorrelated data: Guidelines for practitioners. Qual. Eng. 1997, 9, 367–382.
Lu, C.W.; Reynolds, M.R. EWMA control charts for monitoring the mean of autocorrelated processes. J. Qual. Technol. 1999, 31, 166–188.
Lin, Y.-C. The variable parameters control charts for monitoring autocorrelated processes. Comm. Stat-Simul C 2009, 38, 729–749.
Xi, M.; Zhang, L.; Hu, J.; Palazoglu, A. A model-free approach to reduce the effect of autocorrelation on statistical process control charts. J. Chemom. 2018, 32, 12.
Zhang, N.F. A statistical control chart for stationary process data. Technometrics 1998, 40, 24–38.
Woodall, W.H.; Faltin, F. Autocorrelated data and SPC. ASQC Stat. Div. Newsl. 1993, 13, 1821.
Al-babtain, A.; Elbatal, I.; Yousof, H.M. A new flexible three-parameter model: Properties, Clayton Copula, and modeling real data. Symmetry 2020, 12, 440.
Simiu, E.; Heckert, N.A.; Filliben, J.J.; Johnson, S.K. Extreme wind load estimates based on the Gumbel distribution of dynamic pressures: An assessment. Struct. Saf. 2001, 23, 221–229.
Cossin, D.; Schellhorn, H.; Song, N.; Satjaporn, T. A theoretical argument why the t-Copula explains credit risk contagion better than the Gaussian Copula. Adv. Decs. Sci. 2010, 29, 546547.
McCool, J.I. Testing for dependency of failure times in life testing. Technometrics 2012, 48, 41–48.
Shewhart, W.A. Economic Control of Quality of Manufactured Product; D. Van Nostrand Company: New York, NY, USA, 1931.
Deming, W.E. Out of the Crisis; Massachusetts Institute of Technology, Center for Advanced Engineering Study: Cambridge, MA, USA, 1986.
Juran, J.M. Quality-Control Handbook; McGraw-Hill: New York, NY, USA, 1951.
Feigenbaum, A.V. Total Quality Control; McGraw-Hill: New York, NY, USA, 1991.
Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441.

Table 1. Comparing the Copulas based on the conditional probabilities with the actual conditional probability.

For $27^{-} \leq y_{t-1} \leq 27^{+}$, $y_{t} = 26.5$, $\sigma = 0.2$, $\rho = 0.6$, $\delta = 10.5$
Clayton	FGM	Frank	Gumbel	Actual
0.521	0.502	0.857	0.00114	0.500
0.516	0.506	0.855	0.00112	0.559
0.455	0.423	0.831	0.00083	0.483
0.449	0.417	0.829	0.00029	0.563
0.507	0.488	0.854	0	0.635
For $26^{-} \leq y_{t-1} \leq 26^{+}$, $y_{t} = 26.5$, $\sigma = 0.2$, $\rho = 0.6$, $\delta = 10.5$
Clayton	FGM	Frank	Gumbel	Actual
0.975	0.934	0.821	0.992	0.993
1.0	0.943	0.859	0.999	1.0
0.922	0.921	0.786	0.978	0.978
1.0	0.939	0.796	0.997	1.0
0.958	0.915	0.807	0.986	0.989
For $35^{-} \leq y_{t-1} \leq 35^{+}$, $y_{t} = 35.5$, $\sigma = 0.2$, $\rho = 0.7$, $\delta = 10.5$
Clayton	FGM	Frank	Gumbel	Actual
0.988	0.936	0.957	0.998	0.997
0.989	0.940	0.963	0.998	0.998
0.989	0.941	0.965	0.998	0.998
1.0	0.931	0.952	1.0	1.0
0.990	0.954	0.973	0.998	0.998

Table 2. Average run length by ρ and σ for the Clayton Copula.

	ρ
σ	0	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
0.1	326.5	313.7	270.1	210.2	143.0	91.5	50.9	28.0	13.1	5.04
0.2	323.6	325.5	232.9	196.7	132.6	90.2	50.3	28.1	13.3	5.08
0.3	326.7	305.9	228.1	186.8	124.8	87.1	50.2	28.6	12.9	5.06
0.4	270.4	300.3	217.1	182.5	123.9	91.9	50.5	28.2	13.1	5.06
0.5	344.7	292.6	242.7	185.9	126.2	92.1	48.6	28.2	13.1	5.09
0.6	280.0	278.4	278.3	180.2	101.4	78.5	50.7	28.4	13.1	5.14
0.7	324.3	281.3	247.0	173.0	122.7	87.9	48.1	29.1	13.1	5.06
0.8	329.1	274.7	241.2	174.3	125.7	82.1	48.5	29.0	12.8	5.11
0.9	350.9	272.2	259.6	158.4	104.8	78.7	48.9	28.3	12.9	5.10
1.0	379.2	273.4	249.0	154.3	104.8	70.8	51.4	28.4	13.2	5.08

Table 3. Average run length by ρ and σ for the Gaussian Copula.

	ρ
σ	0	0.1	0.2	0.3	0.4	0.5	0.6	0.7
0.1	347.8	300.9	219.1	120.0	56.0	25.5	11.5	5.74
0.2	322.6	342.9	219.9	119.3	59.8	25.5	11.3	5.46
0.3	295	295.3	227.1	119.5	56.6	25.2	11.4	5.49
0.4	317.3	309.1	181.5	112.0	52.9	25.4	11.5	5.45
0.5	302.1	309.7	219.3	113.8	57.4	25.2	11.2	5.45
0.6	311	302.6	237.3	119.7	59.1	25.6	11.8	5.53
0.7	324.3	281.3	247.0	113.0	59.9	27.1	11.8	5.28
0.8	329.1	274.7	241.2	106.4	56.2	26.5	11.2	5.41
0.9	350.9	272.2	259.6	112.4	54.7	29.0	11.3	5.60
1.0	322.3	273.4	249.0	109.9	52.6	27.4	11.2	5.40

Table 4. Average run length by ρ and σ for the Farlie-Gumbel-Morgenstern Copula.

	ρ
σ	0	0.1	0.2	0.3
0.1	329.6	474.4	440.4	575.2
0.2	341.4	431.0	559.8	483.3
0.3	312.1	390.6	442.0	610.0
0.4	334.3	456.2	323.1	585.7
0.5	344.8	443.8	550.2	580.0
0.6	323.1	459.0	559.8	606.4
0.7	338.8	391.3	511.7	374.9
0.8	333.2	432.5	558.5	436.4
0.9	317.5	450.3	454.3	375.7
1.0	337.9	439.7	475.1	411.6
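
As a rough illustration of how an average run length of the kind tabulated above can be estimated by simulation, the following sketch simulates a Gaussian AR(1) process and counts the observations until the first one falls outside fixed limits. It is not the copula-based chart studied in this paper: the parameterisation $y_t = \delta + \rho y_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$, the limits $\mu \pm 3\sigma$ around the stationary mean $\mu = \delta/(1-\rho)$, the stationary starting value, and the names `run_length` and `estimate_arl` are all assumptions made for this example.

```python
import math
import random


def run_length(rho, sigma, delta=10.5, width=3.0, rng=random, max_steps=1_000_000):
    """Count observations until the first one falls outside mu +/- width * sigma.

    Assumed model (illustrative only): y_t = delta + rho * y_{t-1} + e_t,
    with e_t ~ N(0, sigma^2) and |rho| < 1.
    """
    mu = delta / (1.0 - rho)                            # stationary mean
    stationary_sd = sigma / math.sqrt(1.0 - rho ** 2)   # stationary standard deviation
    y = rng.gauss(mu, stationary_sd)                    # start in the stationary distribution
    for t in range(1, max_steps + 1):
        y = delta + rho * y + rng.gauss(0.0, sigma)
        if abs(y - mu) > width * sigma:                 # out-of-limits signal
            return t
    return max_steps                                    # censored run (should be rare)


def estimate_arl(rho, sigma, n_runs=2000, seed=0):
    """Monte Carlo estimate of the average run length over n_runs simulated runs."""
    rng = random.Random(seed)                           # seeded for reproducibility
    total = sum(run_length(rho, sigma, rng=rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, round(estimate_arl(rho, sigma=0.2), 1))
```

Under these assumptions the case ρ = 0 reduces to independent observations with a signal probability of about 0.0027 per observation, so the estimate should be close to 1/0.0027 ≈ 370. Because the limits scale with σ, the estimate does not depend on σ, and it falls as ρ grows because the stationary variance σ²/(1 − ρ²) widens relative to the fixed limits.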

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
