Kernel-Based Multivariate Nonparametric CUSUM Multi-Chart for Detection of Abrupt Changes

Lei Qiao; Bing Wang

doi:10.3390/math12101473

School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(10), 1473; https://doi.org/10.3390/math12101473

This article belongs to the Special Issue Statistical Monitoring and AI Models

Abstract

In many cases, it is difficult to obtain precise distributional information on multivariate sequences. Therefore, there is a need to propose nonparametric methods for monitoring multivariate sequences. This article discusses the multivariate change detection problem and utilizes the kernel function as the statistic to construct the nonparametric Multivariate Cumulative Sum multi-chart, under the assumption that there is prior information about the abrupt changes. Through theoretical and numerical analysis, we show that the proposed control chart is more effective compared to other existing control charts. The good monitoring effect of this method demonstrates a strong potential for application.

Keywords: distribution-free; kernel-based; multivariate CUSUM multi-chart

MSC: 62L10

1. Introduction

Statistical Process Control (SPC) is an important problem in numerous applications. The development of science and the increase in data storage have led to an increasing need for Multivariate Statistical Process Control (MSPC) [1,2]. The purpose of change detection is to raise an alarm as soon as possible when a change occurs. In order to evaluate the detectability of the control chart, we usually use two kinds of average run lengths ($ARL$s): one is the in-control (IC) $ARL$, denoted by $ARL_{0}$, and the other is the out-of-control (OC) $ARL$, denoted by $ARL_{1}$. With the same $ARL_{0}$, a control chart with a smaller $ARL_{1}$ is better and can detect changes more effectively.

Many researchers have worked in this area and proposed various methods for monitoring multivariate data. Multivariate Statistical Process Control techniques were established by Hotelling in his pioneering paper in 1947 [3], which monitored the mean vector of multiple quality variables following a multivariate normal distribution. Subsequently, the field has attracted many experts, and many different multivariate control charts have been proposed. Multivariate control charts can be classified into two categories according to the presence or absence of an assumption on the data distribution. The first category, with a distribution assumption, is parametric. It includes the Multivariate Exponentially Weighted Moving Average control chart (MEWMA [4,5]), the Multivariate Cumulative Sum control chart (MCUSUM [6,7]), and other control charts, such as [8,9,10], which use different strategies to construct control charts for multivariate data. These charts typically require the observed data to follow a parametric (usually normal) distribution. However, in most cases, we lack the distributional information, or we do not have enough samples to estimate the exact distribution. The second category is nonparametric, without a distribution assumption, i.e., distribution-free control charts. Unfortunately, unlike many univariate nonparametric charts, it is challenging to design a distribution-free MSPC scheme based on conventional construction [11]. Many multivariate nonparametric control charts have been proposed. Refs. [12,13] discussed multivariate process control in the context of multivariate data. Ref. [14] proposed general and multiplicative nonparametric ratio models for data envelopment analysis problems with interval data. Ref. [15] provided a variable sampling interval (VSI) Shewhart $\bar{X}$ chart. Ref. [2] proposed a multivariate sign EWMA chart without a distribution assumption, which requires some in-control observations. Other nonparametric multivariate control charts, such as [16,17,18], also use different methods to detect changes without parametric assumptions.

In this paper, we develop a nonparametric or distribution-free control chart for monitoring multivariate data. Our approach is different from the existing nonparametric SPC methods in the sense that we apply the kernel function to construct the CUSUM multi-chart, that is, we construct a CUSUM-type control chart that does not require the distributional information of the multivariate data. Meanwhile, the method proposed in this paper solves three challenges in multivariate change detection:

The first challenge is that the traditional CUSUM control chart needs both pre-change and post-change information [7]. However, in online detection, we usually do not know the post-change information; we only have some pre-change observations. To address this challenge, we construct the nonparametric multivariate CUSUM multi-chart. The idea of the multi-chart approach originates from [19], which shows that the multi-chart approach can detect different types of changes.
The second challenge is how to capture important features when constructing a control chart for multivariate data. When the dimension is relatively large, such as $p = 30$, statistics like Hotelling’s $T^{2}$ may contain errors and trigger false alarms in online change detection [3]. To overcome this challenge, we use the kernel function to capture information from the multivariate data.
The last challenge is the limited amount of historical pre-change observations. Nonparametric control charts require historical pre-change observations, but if the number needed is excessively large, the loss outweighs the gain, and in many application scenarios such a sample is not achievable [20]. The method proposed in this paper does not need a very large number of historical pre-change observations, which makes it promising for application.

From the above statements, we can conclude that the kernel-based multivariate nonparametric CUSUM multi-chart proposed in this paper can overcome the aforementioned three challenges. We also present the theoretical results and simulation results of the control chart proposed in this paper.

The remainder of this paper is organized as follows: In Section 2, we first present a brief review of the reproducing kernel Hilbert space (RKHS) and then introduce the kernel-based multivariate nonparametric CUSUM multi-chart proposed in this paper, along with some theoretical properties. Section 3 uses simulated data to show that the proposed kernel-based multivariate nonparametric CUSUM multi-chart performs better than other existing control charts. The conclusion and recommendations are given in Section 4. The proofs of the theoretical results are listed in Appendix A.

2. Kernel-Based Multivariate Nonparametric CUSUM Multi-Chart

In this section, we propose the kernel-based CUSUM multi-chart for the multivariate observation sequences. To illustrate this model in detail, we first briefly review the reproducing kernel Hilbert space (RKHS). This involves providing the definition of the kernel function and the statistics used for change detection. With the kernel function, we define the nonparametric CUSUM multi-chart and obtain the theoretical properties to ensure the performance of this model.

It is known that for a traditional univariate CUSUM control chart, we need to know the pre-change and post-change distributions to define the statistics used in the control chart. However, the post-change information is usually unknown in reality, as is the distribution information. So, in this section, the kernel-based CUSUM multi-chart can solve the above problem when both the post-change information and the distribution information are unknown.

2.1. Examples of Kernels

In this section, we construct the CUSUM multi-chart based on the kernel function. Here, we give some examples of kernels that are widely used, as follows:

When $X = \mathbb{R}^{d}$, $k^{lin}(x, y) = \langle x, y \rangle_{\mathbb{R}^{d}}$ defines the linear kernel. When $d = 1$, the KCP (kernel change-point) then coincides with the algorithm proposed in [21].
When $X = \mathbb{R}^{d}$, $k_{h}^{G}(x, y) = \exp[-\|x - y\|^{2} / (2h^{2})]$ defines the Gaussian kernel with bandwidth $h > 0$, which is used in the experiments in Section 3.
When $X = \mathbb{R}^{d}$, $k_{h}^{L}(x, y) = \exp[-\|x - y\| / h]$ defines the Laplace kernel with bandwidth $h > 0$, which is used in the experiments in Section 3.
When $X = \mathbb{R}^{d}$, $k_{h}^{e}(x, y) = \exp(\langle x, y \rangle_{\mathbb{R}^{d}} / h)$ defines the exponential kernel with bandwidth $h > 0$. Note that, unlike Gaussian and Laplace kernels, the exponential kernel is not translation-invariant.
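
The four kernels listed above can be written down directly. The following is a minimal sketch in plain Python, not the authors' code; the function names are hypothetical, vectors are represented as lists of floats, and the norm in the Laplace kernel is assumed to be the Euclidean norm.

```python
import math


def _dot(x, y):
    # Euclidean inner product <x, y>.
    return sum(a * b for a, b in zip(x, y))


def _sq_dist(x, y):
    # Squared Euclidean distance ||x - y||^2.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    return _dot(x, y)


def gaussian_kernel(x, y, h=1.0):
    # exp(-||x - y||^2 / (2 h^2)), bandwidth h > 0.
    return math.exp(-_sq_dist(x, y) / (2.0 * h * h))


def laplace_kernel(x, y, h=1.0):
    # exp(-||x - y|| / h); Euclidean norm is an assumption here.
    return math.exp(-math.sqrt(_sq_dist(x, y)) / h)


def exponential_kernel(x, y, h=1.0):
    # exp(<x, y> / h); depends on x and y themselves, not only on x - y.
    return math.exp(_dot(x, y) / h)
```

Shifting both arguments by the same vector leaves the Gaussian and Laplace values unchanged but generally changes the exponential kernel, which is the translation-invariance remark made in the last list item.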

Classical kernels can be found in [22]. Because of the advantageous properties of the Gaussian kernel and the Laplace kernel, we use these two kernels in the simulations and compare the results with other online detection methods. Kernel change-point methods can be used in both offline and online monitoring problems. Here, we consider online change detection. So, we have pre-change information through historical pre-change observations, but the exact post-change information is usually unknown.

2.2. Kernel Change-Point Online Detection Method

Let $X_{-m+1}, X_{-m+2}, \dots, X_{0}, X_{1}, X_{2}, X_{3}, \dots, X_{N}$ be the multivariate observation sequence, where $X_{i} \in \mathbb{R}^{p}$, $-m+1 \leq i \leq N$, are all $p$-dimensional vectors. We assume that $X_{-m+1}$, $X_{-m+2}$, …, $X_{0}$ are historical pre-change observations, and the change-point is defined as $\tau$ ($1 \leq \tau \leq N$). The observation sequence changes from an in-control to an out-of-control state after the change-point $\tau$. From the $m$ historical pre-change observations, we can estimate the pre-change mean vector, denoted as $\hat{\mu}_{0} \in \mathbb{R}^{p}$. While the post-change mean vector, denoted by $\mu_{1} \in \mathbb{R}^{p}$, usually is not known in online change detection, we can obtain the possible boundary of the post-change mean vector, denoted by $D$, based on empirical knowledge or prior information. With the possible boundary $D$, we can define the $c$ reference post-change mean vectors $\mu_{1}^{k}$, $1 \leq k \leq c$, to construct multiple control charts to detect different types of changes.

With the kernel function, estimated pre-change mean vector $\hat{\mu}_{0}$, and possible post-change mean vectors $\mu_{1}^{k}$, where $1 \leq k \leq c$, we can construct the kernel-based CUSUM multi-chart. The kernel-based CUSUM multi-chart is defined as follows: For fixed $c \geq 2$,

$$
\begin{aligned}
T_{MC} &= \min_{1 \leq k \leq c} T_{k}, \\
T_{k} &= \min \left\{ 1 \leq n \leq N : \max_{1 \leq j \leq n} \sum_{i=j}^{n} \ln \frac{k(X_{i}, \mu_{1}^{k})}{k(X_{i}, \hat{\mu}_{0})} > d_{n}^{k} \right\},
\end{aligned}
\tag{1}
$$

where $d_{n}^{k} \in \mathbb{R}$ is called the control limit. If an abrupt change occurs, we warn as soon as the statistic exceeds the control limit. So, it is important to study the statistical properties of $T_{MC}$. For $p$-dimensional observations, possible changes may occur in each dimension. Suppose that in each dimension we give $l$ pre-specified known reference values, then $c = l^{p}$. However, in reality, the changes usually only occur in several dimensions, and in the simulation, we consider the changes only occur in selected $[p/5]$ dimensions.
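
The stopping rule in (1) can be evaluated online, because the inner maximum satisfies the usual CUSUM recursion $W_{n}^{k} = \max(W_{n-1}^{k}, 0) + \ln[k(X_{n}, \mu_{1}^{k}) / k(X_{n}, \hat{\mu}_{0})]$ with $W_{0}^{k} = 0$. The sketch below is illustrative only: the function names are hypothetical, the Gaussian kernel is used through its logarithm, and a constant control limit per chart replaces the dynamic limit $d_{n}^{k}$ (a simplifying assumption).

```python
def log_gaussian_kernel(x, y, h=1.0):
    # ln k_h^G(x, y) = -||x - y||^2 / (2 h^2); working in logs avoids underflow.
    return -sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * h * h)


def multichart_stopping_time(xs, mu0_hat, refs, limits,
                             log_kernel=log_gaussian_kernel):
    """Return (T_MC, k): first alarm time and index of the alarming chart.

    xs      : observations X_1, ..., X_N (each a list of floats)
    mu0_hat : estimated pre-change mean vector
    refs    : the c reference post-change mean vectors mu_1^k
    limits  : c constant control limits (assumption: d_n^k does not vary with n)
    Returns (None, None) if no chart alarms within the sequence.
    """
    stats = [0.0] * len(refs)  # W_0^k = 0 for every chart
    for n, x in enumerate(xs, start=1):
        base = log_kernel(x, mu0_hat)
        for k, mu_k in enumerate(refs):
            increment = log_kernel(x, mu_k) - base
            # max over j of the partial sums from j to n, in recursive form
            stats[k] = max(stats[k], 0.0) + increment
        for k, value in enumerate(stats):
            if value > limits[k]:
                return n, k  # T_MC = min_k T_k is the first such n
    return None, None
```

Running all $c$ charts in one pass and stopping at the first exceedance gives the same time as computing each $T_{k}$ separately and taking the minimum.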

The traditional probability likelihood ratio function is replaced by the kernel ratio function in (1). Under a nonparametric assumption, the CUSUM-type control chart cannot use the probability likelihood ratio function. Therefore, we aim to find an alternative function to replace the probability function. The reasons why we consider the kernel function are listed as follows:

The kernel function contains a lot of information, especially in high-dimensional cases, and is commonly used in classification problems [23]. Change detection can be viewed as a very special “classification” problem, in which we need to classify the data as either in-control or out-of-control. This suggests applying the kernel function in change detection as well.
The kernel function is already widely used in the literature on online and offline change detection problems, as demonstrated in [24,25], which makes it natural to apply it in a CUSUM-type control chart.

For the two reasons mentioned above, we apply the kernel function to replace the probability function and construct the CUSUM multi-chart to solve the problem that the shift is unknown before detection.

Similarly, for the kernel-based CUSUM multi-chart defined in this section, in order to evaluate the detection power of any kernel-based test $T$ with $ARL_{0}(T) = E_{0}(T) = \gamma$, we also propose an index called $KCPI$ (Kernel Control Chart Performance Index), as in [19], which is as follows:

$$
KCPI(T) = \exp\left\{ -\int_{D} w(\mu_{1}) \frac{ARL_{1}(T) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \right\},
\tag{2}
$$

where $ARL_{1}(T) = E_{1}(T)$. The $ARL$ is defined as the average value of the run length $T$. Since the state is divided into in-control and out-of-control states, there are two $ARL$s. $E_{0}$ means the mathematical expectation of the in-control state, and the in-control $ARL$ is denoted by $ARL_{0} = E_{0}(T)$. Similarly, $E_{1}$ stands for the mathematical expectation of the out-of-control state; the out-of-control $ARL$ is denoted by $ARL_{1} = E_{1}(T)$. $ARL_{1}^{*}$ is the lower bound and $w(\mu_{1})$ is the weight of the unknown post-change mean vector $\mu_{1}$, that is, we do not know the exact value of $\mu_{1}$, but from engineering knowledge, we know the possible boundary of $\mu_{1}$, denoted by $D$, and the weight function, denoted by $w(\mu_{1})$. If we have no additional information about the unknown $\mu_{1}$, the weight function is $w(\mu_{1}) = 1 / \int_{D} d\mu_{1}$.

Since the purpose of change detection is to provide an alert as soon as possible in the case of an abrupt change, a smaller value of $ARL_{1}$ represents an earlier alarm. For comparison purposes, we usually need a benchmark, i.e., a common $ARL_{0}$ for different control charts. So, in general, in order to compare the performance of several control charts, we ensure they have a common $ARL_{0}$ and then compare the $ARL_{1}$ of the control charts. The smaller the value of $ARL_{1}$, the better the performance of the control chart, which is the essence of the definition of the $KCPI$. If $KCPI(T_{MC})$ is larger, the performance of the kernel-based CUSUM multi-chart is better. So, finding the optimal design of the kernel-based CUSUM multi-chart involves finding the maximum value of $KCPI(T_{MC})$. Under the multivariate observation cases, if finding the maximum value of $KCPI(T_{MC})$ is hard, we can choose stochastic reference values of post-change $\mu_{1}^{k}$, $1 \leq k \leq c$, as long as the corresponding $KCPI$ is large.
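
In practice, the integral in (2) has to be approximated on a finite set of candidate post-change means. A minimal sketch of such an approximation follows; the function name `kcpi` and the discretization are assumptions made for illustration, and any $ARL$ values passed to it would come from a separate simulation or from the asymptotic bound.

```python
import math


def kcpi(arl1, arl1_star, weights=None):
    """Discrete approximation of KCPI(T) over a grid of post-change means.

    arl1      : ARL_1(T) evaluated at each grid point mu_1
    arl1_star : lower bound ARL_1^* at the same grid points (all > 0)
    weights   : w(mu_1) at the grid points, summing to 1; uniform if None,
                which mirrors the no-prior-information case
    """
    n = len(arl1)
    if weights is None:
        weights = [1.0 / n] * n
    if any(s <= 0 for s in arl1_star):
        raise ValueError("ARL_1^* must be positive")
    # Weighted relative excess of ARL_1(T) over the lower bound.
    excess = sum(w * (a - s) / s for w, a, s in zip(weights, arl1, arl1_star))
    return math.exp(-excess)
```

A chart that attains the lower bound everywhere has an index of 1, and the index decreases toward 0 as $ARL_{1}(T)$ grows relative to $ARL_{1}^{*}$.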

2.3. Theoretical Properties of Kernel-Based CUSUM Multi-Chart

In this section, we discuss the theoretical properties of the proposed kernel-based CUSUM multi-chart to ensure the performance of change detection.

Similarly, as in Theorem 3 of [26,27], for a large $\gamma$, we can obtain the lower bound $ARL_{1}^{*} = [E_{1}(T)]^{*}$ with $ARL_{0}(T) = E_{0}(T) = \gamma$, as follows:

$$
[E_{1}(T)]^{*} \sim \frac{\ln \gamma}{KI(\mu_{1}, \hat{\mu}_{0})},
\tag{3}
$$

where

$$
KI(\mu_{1}, \hat{\mu}_{0}) = E_{1}\left[ \ln \frac{k(X, \mu_{1})}{k(X, \hat{\mu}_{0})} \right].
\tag{4}
$$

For a large $N, \gamma$, in order to obtain the asymptotic $ARL_{1}$ of the kernel-based CUSUM multi-chart, we first divide the region $D$ into several disjoint subsets by using the reference post-change $\mu_{1}^{k}$, $1 \leq k \leq c$, according to the kernel-based Kullback–Leibler information distance in (4).

Let $D_{k} = \{\mu_{1} \in D : KI(\mu_{1}, \mu_{1}^{k}) \leq \min_{l \neq k} KI(\mu_{1}, \mu_{1}^{l})\}$ for $1 \leq k \leq c$. Thus, we have a disjoint division of the region $D$, with $D_{k}$, $1 \leq k \leq c$, and then,

$$
KCPI(T_{MC}) = \exp\left\{ -\sum_{k=1}^{c} \int_{D_{k}} w(\mu_{1}) \frac{ARL_{1}(T_{MC}) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \right\}.
\tag{5}
$$

The definition of the kernel-based inner Kullback–Leibler information distance is as follows:

$$
\begin{aligned}
KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0}) &= E_{1}\left[ \ln \frac{k(X, \mu_{1}^{k})}{k(X, \hat{\mu}_{0})} \right] \\
&= E_{1}\left[ \ln \frac{k(X, \mu_{1})}{k(X, \hat{\mu}_{0})} \right] - E_{1}\left[ \ln \frac{k(X, \mu_{1})}{k(X, \mu_{1}^{k})} \right] \\
&= KI(\mu_{1}, \hat{\mu}_{0}) - KI(\mu_{1}, \mu_{1}^{k}).
\end{aligned}
\tag{6}
$$

The last equation holds with the definition of $KI(\mu_{1}, \hat{\mu}_{0})$ and $KI(\mu_{1}, \mu_{1}^{k})$ in (4). Here, we assume that the reference $\mu_{1}^{k}$, $1 \leq k \leq c$, satisfies the following conditions:

(1): $KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0}) > 0$ if $\mu_{1} \in D_{k}$ for $1 \leq k \leq c$.
(2): $KI(\mu_{1}, \hat{\mu}_{0}) \geq 0$ for $\mu_{1} \in D$.

The first condition means that if $\mu_{1} \in D_{k}$, then $KI(\mu_{1}, \hat{\mu}_{0}) > KI(\mu_{1}, \mu_{1}^{k})$, which is consistent with intuition. We can interpret this as the “distance” from $\mu_{1}$ to $\mu_{1}^{k}$ being smaller than the distance from $\mu_{1}$ to $\hat{\mu}_{0}$. The reference post-change $\mu_{1}^{k}$ should be designed to be closer to the real but unknown post-change $\mu_{1}$. The second condition ensures that the kernel-based Kullback–Leibler information distance is non-negative, which imposes constraints on the selection of different kernel functions. So, in the simulation, we choose the Gaussian kernel and the Laplace kernel; both of these kernel functions satisfy the above two conditions. In order to give a clear form of (5), we set

$$
d_{lim}^{k} = \lim_{n \to \infty} d_{n}^{k}.
$$
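
As a worked check, conditions (1) and (2) can be made explicit for the Gaussian kernel. The sketch below relies on assumptions that are not stated in the text: under $E_{1}$, $X$ has mean $\mu_{1}$ and a finite covariance matrix $\Sigma$, and $\hat{\mu}_{0}$ is treated as fixed given the historical data.

```latex
% Assumptions (illustrative): E_1[X] = \mu_1, Cov_1(X) = \Sigma finite, \hat{\mu}_0 fixed.
% For any fixed vector a:  E_1 \|X - a\|^2 = \mathrm{tr}(\Sigma) + \|\mu_1 - a\|^2.
\begin{aligned}
\ln \frac{k_h^G(X, \mu_1^k)}{k_h^G(X, \hat{\mu}_0)}
  &= \frac{\|X - \hat{\mu}_0\|^2 - \|X - \mu_1^k\|^2}{2h^2}, \\
KI(\mu_1, \mu_1^k, \hat{\mu}_0)
  &= \frac{\|\mu_1 - \hat{\mu}_0\|^2 - \|\mu_1 - \mu_1^k\|^2}{2h^2}, \\
KI(\mu_1, \hat{\mu}_0)
  &= \frac{\|\mu_1 - \hat{\mu}_0\|^2}{2h^2} \;\geq\; 0 .
\end{aligned}
```

Under these assumptions, condition (2) holds automatically for the Gaussian kernel, and condition (1) reduces to requiring that $\mu_{1}$ be closer in Euclidean distance to the reference $\mu_{1}^{k}$ than to $\hat{\mu}_{0}$; the $\mathrm{tr}(\Sigma)$ terms cancel in both expressions.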

By using a similar method as in [28], we can prove the following theorems. Theorem 1 shows the asymptotic results regarding $ARL_{1}$ for a large $N, \gamma$.

Theorem 1.

Let $T_{k}$, $1 \leq k \leq c$, have a common $ARL_{0} = E_{0}(T_{k}) = \gamma$. If $\mu_{1} \in D_{k}$, then, a.s.,

$$
\begin{aligned}
&\frac{T_{k}}{T_{l}} \leq \frac{\max\left(0, KI(\mu_{1}, \mu_{1}^{l}, \hat{\mu}_{0})\right)}{KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0})}, \\
&T_{k} < T_{l}, \\
&\frac{T_{MC}}{d_{lim}^{k}} = \frac{T_{k}}{d_{lim}^{k}} + o(1) \to \frac{1}{KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0})}
\end{aligned}
\tag{7}
$$

for $\tau = 1$, $k \neq l$, and

$$
E_{1}\left[ \frac{T_{MC}}{d_{lim}^{k}} \right] = E_{1}\left[ \frac{T_{k}}{d_{lim}^{k}} \right] + o(1) = \frac{1}{KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0})} (1 + o(1))
\tag{8}
$$

for large $N, \gamma$.

By introducing the kernel-based Kullback–Leibler information distance in (4) and the kernel-based inner Kullback–Leibler information distance in (6), we can obtain the statistical properties of $T_{MC}$. For the kernel-based CUSUM multi-chart, in order to compare their $ARL_{1}$s, there are two common $ARL_{0}$s, $\gamma^{\prime}$ and $\gamma$, respectively, for $T_{k}^{\prime}$ and $T_{MC}^{\prime}$. The dynamic control limits $d_{n}^{\prime k}$, $1 \leq k \leq c$, are for $\gamma^{\prime}$, and $d_{n}^{k}$, $1 \leq k \leq c$, are for $\gamma$, and the inequality $d_{lim}^{\prime k} > d_{lim}^{k}$ holds. Writing $T_{k}^{\prime} = T(d_{n}^{\prime k})$ for short, then

$$
E_{0}[T_{k}^{\prime}] = \gamma^{\prime} > E_{0}[T_{k}] = \gamma = E_{0}(T_{MC}^{\prime})
\tag{9}
$$

for $1 \leq k \leq c$, where $T_{MC}^{\prime} = \min_{1 \leq k \leq c} \{T_{k}^{\prime}\}$.

So, from Theorem 1, we have

$$
ARL_{1}(T_{MC}^{\prime}) \to \frac{d_{lim}^{\prime k}}{KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0})} \quad a.s.
$$

And, from (3),

$$
ARL_{1}^{*} \to \frac{\ln \gamma}{KI(\mu_{1}, \hat{\mu}_{0})} \quad a.s.
$$

Then, $KCPI(T_{MC})$ can have a simple form for a large $N$ and $\gamma$:

$$
\begin{aligned}
KCPI(T_{MC}) &= KCPI(\mu_{1}^{k}, 1 \leq k \leq c) = \exp\left\{ -\int_{D} w(\mu_{1}) \frac{ARL_{1}(T) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \right\} \\
&= \exp\left\{ 1 - \int_{D} w(\mu_{1}) \frac{ARL_{1}(T)}{ARL_{1}^{*}} \, d\mu_{1} \right\} \\
&= \exp\left\{ 1 - \sum_{k} \int_{D_{k}} w(\mu_{1}) \frac{KI(\mu_{1}, \hat{\mu}_{0})}{KI(\mu_{1}, \hat{\mu}_{0}) - KI(\mu_{1}, \mu_{1}^{k})} \, d\mu_{1} \right\}.
\end{aligned}
\tag{10}
$$

Next, we give a theorem about the relationship between the kernel-based CUSUM multi-chart and its constituent charts. Theorem 2 shows that the kernel-based CUSUM multi-chart performs better than a single kernel-based CUSUM chart in detecting an unknown abrupt change, which justifies the use of the multi-chart.

Theorem 2.

$p_{l}$, $1 \leq l \leq c$, are positive numbers and satisfy

$$
\sum_{1 \leq l \leq c} p_{l} = 1.
$$

If condition (7) holds for $T_{MC}^{\prime}$ and for $\mu_{1} \in D_{k}$, there is $\mu_{1}^{l}$, $k \neq l$, such that $KI(\mu_{1}, \mu_{1}^{l}, \hat{\mu}_{0}) < KI(\mu_{1}, \mu_{1}^{k}, \hat{\mu}_{0})$, and then for every $\mu_{1} \in D$,

$$
\sum_{1 \leq l \leq c} p_{l} \, ARL_{1}(T_{l}) > ARL_{1}(T_{MC}^{\prime})
$$

for large $N, \gamma$.

The theorems above illustrate the beneficial properties of our proposed kernel-based CUSUM multi-chart.

3. Comparison and Analysis of Simulation Results

In this section, we present simulation results to demonstrate the performance of the proposed kernel-based CUSUM multi-chart. We choose the commonly used multivariate normal distribution, multivariate t distribution, and multivariate Gamma distribution as the simulation examples. For comparison, we use different control charts to compute the out-of-control average run length $ARL_{1}$, under the condition that the in-control average run length $ARL_{0}$ is the same. For detection problems, we expect to detect changes promptly once they occur. Hence, with the same in-control average run length ($ARL_{0}$), the smaller the value of the out-of-control average run length ($ARL_{1}$), the better the performance of the detection.

In the simulation, in order to evaluate the performance of change detection for different shifts, the criterion is as follows:

$$
\begin{aligned}
KCPI(T) &= \exp\left\{ -\int_{D} w(\mu_{1}) \frac{ARL_{1}(T) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \right\} \\
&\propto -\int_{D} w(\mu_{1}) \frac{ARL_{1}(T) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \\
&\propto -\int_{D} \frac{ARL_{1}(T) - ARL_{1}^{*}}{ARL_{1}^{*}} \, d\mu_{1} \quad \text{(if no prior information)} \\
&\propto -\int_{D} ARL_{1}(T) \, d\mu_{1}.
\end{aligned}
$$

We choose a number of possible values of $\mu_{1}$ as a representation in the numerical simulations and compute the out-of-control $ARL_{1}$. In this view, we can also conclude that the performance of the control charts depends on the value of the out-of-control average run length ($ARL_{1}$); the smaller the value, the better the performance.
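
An $ARL$ of this kind is typically estimated by Monte Carlo simulation. The following is a minimal sketch for a single Gaussian kernel-based CUSUM chart, not the simulation code used for the tables: the function names, the constant control limit, the choice of independent unit-variance normal components, and the change occurring from the first observation ($\tau = 1$) are all assumptions made for illustration.

```python
import random


def run_length(mean, ref, mu0_hat, limit, h=1.0, max_n=10_000, rng=random):
    """Simulate one run length of a Gaussian-kernel CUSUM chart.

    Observations are drawn with independent N(mean_j, 1) components
    (assumption). Runs that never alarm are censored at max_n.
    """
    stat = 0.0
    for n in range(1, max_n + 1):
        x = [rng.gauss(m, 1.0) for m in mean]
        # ln k(x, ref) - ln k(x, mu0_hat) for the Gaussian kernel
        incr = (sum((a - b) ** 2 for a, b in zip(x, mu0_hat))
                - sum((a - b) ** 2 for a, b in zip(x, ref))) / (2.0 * h * h)
        stat = max(stat, 0.0) + incr
        if stat > limit:
            return n
    return max_n


def estimate_arl(mean, ref, mu0_hat, limit, reps=500, seed=0):
    # Average of simulated run lengths; the seed makes the estimate reproducible.
    rng = random.Random(seed)
    total = sum(run_length(mean, ref, mu0_hat, limit, rng=rng)
                for _ in range(reps))
    return total / reps
```

Calling `estimate_arl` with `mean` equal to the in-control mean gives an estimate of $ARL_{0}$, and with `mean` equal to a shifted vector an estimate of $ARL_{1}$; in a full study the control limit would first be tuned so that the estimated $ARL_{0}$ matches the target value.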

Remark 1.

In this section, we compare the performance with different control charts, and the performance is measured by the $ARL$. It is well known that the bandwidth has an impact on the kernel function estimation. The optimal bandwidth $h$ in regression problems has been studied in [29,30]. However, in the monitoring problem discussed in this paper, the stopping rule is that if an abrupt change occurs, we issue a warning whenever the statistic exceeds the control limit. In the case of the same $ARL_{0}$, the control limit was also influenced by $h$, so if an abrupt change occurs, the bandwidth $h$ in Gaussian and Laplace kernel functions has a joint effect on the statistic and the control limit, and their effects theoretically cancel each other out.

In this section, we compare the Gaussian kernel-based CUSUM multi-chart, Laplace kernel-based CUSUM multi-chart, and other existing control charts. The performance (i.e., detectability) is measured by the $ARL$. Since the out-of-control state is unknown before change detection, we choose several representative out-of-control states in the simulation.

3.1. The Case for Multivariate Normal Distribution

In this section, we present simulation results to demonstrate the performance of the Gaussian kernel-based CUSUM multi-chart (KMulti-Gaussian), the Laplace kernel-based CUSUM multi-chart (KMulti-Laplace), the distribution-free multivariate goodness-of-fit chart (DFMGoF), the self-starting EWMAC (SSEWMAC) chart [31], the chart based on the change-point and generalized likelihood ratio test (ChangePt for short [32]), and the RTC chart [16]. In the simulation, we compare the Gaussian and Laplace kernel-based CUSUM multi-charts with DFMGoF, RTC, ChangePt, and SSEWMAC. We compare the values of $ARL_{1}$ in the case that $ARL_{0} \approx 200$ and $m = 100$. The bandwidth in the Laplace and Gaussian kernels is set to $h = 1$.

Without loss of generality, the mean vector $\mu_{0}$ is set to $0$, and the covariance matrix $\Sigma_{0} = I$ is chosen to be diagonal. We consider $p = 10, 30$, representing the low-dimensional and high-dimensional cases, respectively. We compare the out-of-control performance of the six competing charts.

Similar to other MSPC studies, it is impossible to enumerate all the change patterns to allow a full-scale study of the charts’ performance. Following similar studies in the literature [2,20,31,32], we consider scenarios as examples: shifts in the process mean vector in the first $[p/5]$ components of size $\delta$, i.e., $\mu_{1} = \mu_{0} + \delta e$ with $e = (1, \dots, 1, 0, \dots, 0)^{T}$. We compare their performance in detecting mean shifts of magnitude $\delta = 0.25, 0.5, 1, 2, 4$, respectively.

The upper part of Table 1 represents the value of the $ARL_{1}$ for different control charts for $p = 10$ with the same value of $ARL_{0} \approx 200$. As we have elaborated before, the smaller the value of $ARL_{1}$, the better the detection performance, given the same $ARL_{0}$. Thus, from Table 1, we can conclude that KMulti-Gaussian outperforms the other five charts when $p = 10$. The lower part of Table 1 represents the value of the $ARL_{1}$ for different control charts for $p = 30$ with the same value of $ARL_{0} \approx 200$; the conclusion is entirely similar.

Table 1. The comparison of the $ARL_{1}$ in detecting location shifts when $m = 100$ for a multivariate normal distribution.

In our simulation setting, we observed that the charts generally have a smaller $ARL_{1}$ when $p = 30$ than that when $p = 10$ given the same $\delta$. This is because the detectability is largely determined by the Mahalanobis distance of the shifted mean vector from the IC mean vector, $\Delta = (\mu_{1} - \mu_{0})^{T} \Sigma_{0}^{-1} (\mu_{1} - \mu_{0})$ (see [31]). Given $\Sigma_{0}$ and the change pattern in our study, we have $\Delta_{p = 30} > \Delta_{p = 10}$ given the same $\delta$. This can partially explain the better performance when $p = 30$.
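
The inequality $\Delta_{p = 30} > \Delta_{p = 10}$ follows from simple arithmetic: with $\Sigma_{0} = I$ and a shift of size $\delta$ in the first $[p/5]$ components, $\Delta = \delta^{2} [p/5]$, which is $2\delta^{2}$ for $p = 10$ and $6\delta^{2}$ for $p = 30$. A small sketch of this calculation follows; the function names are hypothetical.

```python
def shift_direction(p):
    # e = (1, ..., 1, 0, ..., 0)^T with ones in the first [p/5] components.
    m = p // 5
    return [1.0] * m + [0.0] * (p - m)


def mahalanobis_sq_identity(delta, p):
    # Delta = (mu_1 - mu_0)^T Sigma_0^{-1} (mu_1 - mu_0) with Sigma_0 = I
    # and mu_1 - mu_0 = delta * e.
    return sum((delta * e_j) ** 2 for e_j in shift_direction(p))


for p in (10, 30):
    print(p, [mahalanobis_sq_identity(d, p) for d in (0.25, 0.5, 1, 2, 4)])
```

For every $\delta$ in the study, the value for $p = 30$ is three times the value for $p = 10$.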

Unlike the other methods, our proposed method uses a priori information about the change and extracts it through kernel functions, so these simulation results are not unexpected.

3.2. The Case for Multivariate t Distribution

In this section, we consider the multivariate t distribution. We use $t_{p, \epsilon}$, which represents the multivariate t distribution with $\epsilon$ degrees of freedom. Let pre-change observations belong to the multivariate t distribution with 5 degrees of freedom, denoted by $t_{p, 5}$. We also consider the scenario as follows: shifts in the process mean vector in the first $[p/5]$ components of size $\delta$, i.e., $\mu_{1} = \mu_{0} + \delta e$ with $e = (1, \dots, 1, 0, \dots, 0)^{T}$. We compare their performance in detecting mean shifts of magnitude $\delta = 0.25, 0.5, 1, 2, 4$, respectively. Table 2 also compares our proposed methods with other existing methods.

Table 2. The comparison of the $ARL_{1}$ in detecting location shifts when $m = 100$ for a multivariate t distribution.

The upper part of Table 2 represents the value of the $ARL_{1}$ for different control charts for $p = 10$ with the same value of $ARL_{0} \approx 200$, and the lower part of Table 2 represents the value of the $ARL_{1}$ for different control charts for $p = 30$ with the same value of $ARL_{0} \approx 200$. As we have elaborated before, the smaller the value of the $ARL_{1}$, the better the detection performance, given the same $ARL_{0}$. Thus, from Table 2, we can conclude that KMulti-Laplace outperforms the other five charts when $p = 10$ and $p = 30$.

3.3. The Case for Multivariate Gamma Distribution

In this section, we consider the multivariate Gamma distribution. We use $Gamma_{p, \epsilon}$, which represents the multivariate Gamma distribution with the shape parameter $\epsilon$ and scale parameter of one. Let the pre-change observations belong to the multivariate Gamma distribution with the shape parameter of three and scale parameter of one, denoted by $Gamma_{p, 3}$. We also consider the scenario as follows: shifts in the process mean vector in the first $[p/5]$ components of size $\delta$, i.e., $\mu_{1} = \mu_{0} + \delta e$ with $e = (1, \dots, 1, 0, \dots, 0)^{T}$. We compare their performance in detecting mean shifts of magnitude $\delta = 0.25, 0.5, 1, 2, 4$, respectively.

The upper part of Table 3 represents the value of the $ARL_{1}$ for different control charts for $p = 10$ with the same value of $ARL_{0} \approx 200$, and the lower part of Table 3 represents the value of the $ARL_{1}$ for different control charts for $p = 30$ with the same value of $ARL_{0} \approx 200$. As we have elaborated before, the smaller the value of the $ARL_{1}$, the better the detection performance, given the same $ARL_{0}$.

Table 3. The comparison of the $ARL_{1}$ in detecting location shifts when $m = 100$ for a multivariate Gamma distribution.

From Table 3, we can see that DFMGoF works better in some specific situations ($p = 30$ and $\delta = 0.25, 0.5$), but over all the shift cases combined, KMulti-Gaussian is still better. We assume that all shift cases are equally likely. For example, the $ARL_{1}$ of DFMGoF is 14.38 less than the $ARL_{1}$ of KMulti-Gaussian when $\delta = 0.25$ and $p = 30$, but we set a weight of 0.25/4 when measuring performance in aggregate. Combining all the out-of-control states considered, KMulti-Gaussian and KMulti-Laplace are better than the other control charts.

From the three tables of the simulation results, we can conclude that the performance of the kernel-based multivariate nonparametric CUSUM multi-chart (KMulti-Gaussian and KMulti-Laplace) is better than the other four existing control charts, as the $ARL_{1}$ is smaller when the $ARL_{0}$ is the same.

Remark 2.

In this section, we compared the performance of different multivariate control charts. The performance of the methods proposed in this paper may not always be the best because the $KCPI$ in (5) is an integrated form for all possible post-change mean vectors $\mu_{1}$. Combining all the out-of-control states considered, the value of the $ARL_{1}$ for the method proposed in this paper is better compared with other existing control charts.

Remark 3.

The average run length (ARL) is a very important measurement in change detection. The in-control average run length, denoted by $ARL_{0}$, is the average run length when the observations are pre-change observations. The out-of-control average run length, denoted by $ARL_{1}$, is the average run length when the observations are post-change observations. The purpose of change detection is to raise an alarm as quickly as possible if an abrupt change occurs. Hence, the smaller the value of the $ARL_{1}$, the better the performance of detection when the $ARL_{0}$ is the same.

4. Concluding Remarks

4.1. The Novelty of the Kernel-Based Multivariate CUSUM Multi-Chart

Traditional CUSUM multi-chart [19,28] is designed for one-dimensional data with a known distribution. Generalizing to multivariate data is not as straightforward. On the one hand, it requires some computation. On the other hand, we would like to design a multivariate CUSUM multi-chart that does not depend on the distribution. This paper proposes a kernel-based multivariate nonparametric CUSUM multi-chart to solve the following three challenges:

In reality, we usually lack post-change information in online detection.
Effectively capturing important features is essential when the dimension is moderately large, such as $p = 30$. Statistics such as Hotelling’s $T^{2}$ may raise false alarms in online detection.
In some cases, the amount of historical pre-change observations is not large.

4.2. Future Research Perspectives

We made an initial attempt to construct a control chart using the kernel function, and the resulting chart was found to have good theoretical properties and simulation performance. We defined the kernel function-based CUSUM multi-chart and the $KCPI$, from which the theoretical properties follow. In the simulation, we compared our proposed method with other existing methods and showed its advantages in terms of the average run length, a crucial index.

At the same time, the method has some shortcomings. We assumed that prior knowledge about the multivariate data is available. As the number of dimensions increases, acquiring such prior knowledge becomes more and more difficult. For example, an actual production process involves many steps. We can monitor the sensor data from these steps as multivariate data, but acquiring more precise prior knowledge may require a substantial investment of time and money.

Further research can examine the correlation structure within multivariate data and the relationships among the variables when constructing control charts. This is a valuable research area to consider in the future. We can also construct parallel control charts using different kernel functions to monitor changes. In the simulation results above, the Gaussian kernel function and the Laplace kernel function each have their own advantages and disadvantages. Which kernel function works better depends on the data and on how they change: for some data one kernel function works well, and for other data another does. Subsequent research will therefore combine different kernel functions to construct parallel control charts.
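The idea of parallel charts can be sketched as follows. The kernel parameterizations (a bandwidth `sigma` with the Euclidean distance) are the common textbook forms and are assumed here, since they may differ from those used in Section 2.1. The combination rule, an alarm at the first time any chart exceeds its own control limit, is one hypothetical way to run charts in parallel and is not a method established in this paper.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); assumed parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel exp(-||x - y|| / sigma); assumed parameterization."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-dist / sigma)


def first_alarm(stat_paths, limits):
    """Earliest 1-based time at which any chart exceeds its own control limit.

    stat_paths holds one sequence of charting statistics per chart, and limits
    holds the matching control limits. Returns None if no chart signals.
    """
    alarms = []
    for path, limit in zip(stat_paths, limits):
        for n, value in enumerate(path, start=1):
            if value > limit:
                alarms.append(n)
                break
    return min(alarms) if alarms else None


# Toy statistics for two charts (made-up numbers, not simulation output).
paths = [[0.1, 0.5, 2.0], [0.2, 1.5, 0.3]]
print(first_alarm(paths, limits=[1.0, 1.0]))
print(gaussian_kernel((0.0, 0.0), (3.0, 4.0)), laplace_kernel((0.0, 0.0), (3.0, 4.0)))
```

Because the combined scheme signals whenever either chart does, its control limits would need to be recalibrated so that the overall $ARL_{0}$ matches that of the charts being compared.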

Author Contributions

Conceptualization, L.Q. and B.W.; methodology, L.Q. and B.W.; software, L.Q.; validation, B.W.; resources, L.Q. and B.W.; data curation, L.Q. and B.W.; writing—original draft preparation, L.Q.; writing—review and editing, B.W.; project administration, L.Q. and B.W.; funding acquisition, L.Q. and B.W. All authors have read and agreed to the published version of this manuscript.

Funding

L.Q. was supported by and the Shanghai Higher Education Young Teachers Cultivation Funding Program (Grant No. AT-33-021424-001006). B.W. was supported by the Shanghai Higher Education Young Teachers Cultivation Funding Program (Grant No. AT-33-021424-001006).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CUSUM	Cumulative Sum
SPC	Statistical Process Control
MSPC	Multivariate Statistical Process Control
IC	in-control
OC	out-of-control
ARL	average run length
$ARL_{0}$	in-control average run length
$ARL_{1}$	out-of-control average run length
MEWMA	Multivariate Exponentially Weighted Moving Average
MCUSUM	Multivariate Cumulative Sum
VSI	variable sampling interval
RKHS	reproducing kernel Hilbert space
KCP	kernel change-point
KCPI	Kernel Control Chart Performance Index
KI	Kullback–Leibler information distance
KMulti-Gaussian	Gaussian kernel-based CUSUM multi-chart
KMulti-Laplace	Laplace kernel-based CUSUM multi-chart
DFMGoF	distribution-free multivariate goodness-of-fit chart
RTC	Real-Time Contrasts
ChangePt	change-point and generalized likelihood ratio test
SSEWMAC	self-starting EWMA chart

Appendix A

Proof of Theorem 1.

Based on the proofs provided in [19,33], we can easily prove Theorem 1. The only difference is that the measure function is a kernel function. □

Proof of Theorem 2.

Based on the proofs provided in [19,33], we can easily prove Theorem 2. The only difference is that the measure function is a kernel function. □

References

Woodall, W.H.; Montgomery, D.C. Some Current Directions in the Theory and Application of Statistical Process Monitoring. J. Qual. Technol. 2014, 46, 78–94.
Zou, C.; Tsung, F. A Multivariate Sign EWMA Control Chart. Technometrics 2011, 53, 84–97.
Hotelling, H. Multivariate quality control—Illustrated by the air testing of sample bombsights. In Techniques of Statistical Analysis; Eisenhart, C., Hastay, M.W., Wallis, W.A., Eds.; McGraw-Hill: New York, NY, USA, 1947; pp. 111–184.
Lowry, C.A.; Woodall, W.H.; Champ, C.W.; Rigdon, S.E. Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics 1992, 34, 46–53.
Runger, G.C.; Prabhu, S.S. A Markov Chain Model for the Multivariate Exponentially Weighted Moving Averages Control Chart. J. Am. Stat. Assoc. 1996, 91, 1701–1706.
Crosier, R.B. Multivariate Generalizations of Cumulative Sum Quality-Control Schemes. Technometrics 1988, 30, 243–251.
Woodall, W.H.; Ncube, M.M. Multivariate CUSUM Quality Control Procedures. Technometrics 1985, 27, 285–292.
Hawkins, D.M. Multivariate Quality Control Based on Regression-Adjusted Variables. Technometrics 1991, 33, 61–75.
Wang, K.; Jiang, W. High-Dimensional Process Monitoring and Fault Isolation via Variable Selection. J. Qual. Technol. 2009, 41, 247–258.
Zou, C.; Qiu, P. Multivariate Statistical Process Control Using LASSO. J. Am. Stat. Assoc. 2009, 40, 1586–1596.
Zhang, C.; Chen, N.; Zou, C. Robust Multivariate Control Chart Based on Goodness-of-fit Test. J. Qual. Technol. 2016, 48, 139–161.
Alt, F. Multivariate quality control: State of the art. In ASQC Quality Congress Transactions; American Society for Quality Control: Milwaukee, WI, USA, 1982; pp. 886–893.
Alt, F.; Smith, N. Multivariate process control. Handb. Stat. 1988, 7, 333–351.
Emrouznejad, A.; Rostamy-Malkhalifeh, M.; Hatami-Marbini, A.; Tavana, M. General and multiplicative non-parametric corporate performance models with interval ratio data. Appl. Math. Model. 2012, 36, 5506–5514.
Reynolds, M.R., Jr.; Amin, R.W.; Arnold, J.C.; Nachlas, J.A. X charts with variable sampling intervals. Technometrics 1988, 30, 181–192.
Deng, H.; Runger, G.; Tuv, E. System Monitoring with Real-Time Contrasts. J. Qual. Technol. 2012, 44, 9–27.
Hwang, W.; Runger, G.; Tuv, E. Multivariate Statistical Process Control with Artificial Contrasts. IIE Trans. 2007, 39, 659–669.
Holland, M.D.; Hawkins, D. A Control Chart Based on A Nonparametric Multivariate Change-Point Model. J. Qual. Technol. 2014, 46, 63–77.
Han, D.; Tsung, F. Detection and diagnosis of unknown abrupt changes using CUSUM multi-chart schemes. Seq. Anal. 2007, 26, 225–249.
Zou, C.; Wang, Z.; Tsung, F. A Spatial Rank-Based Multivariate EWMA Control Chart. Nav. Res. Logist. (NRL) 2012, 59, 91–110.
Lebarbier, E. Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Process. 2005, 85, 717–736.
Schölkopf, B.; Smola, A. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001.
Evers, L.; Messow, C.-M. Sparse kernel methods for high-dimensional survival data. Bioinformatics 2008, 24, 1632–1638.
Arlot, S.; Celisse, A.; Harchaoui, Z. A kernel multiple change-point algorithm via model selection. J. Mach. Learn. Res. 2019, 162, 1–56.
Li, S.; Xie, Y.; Dai, H.; Song, L. Scan B-statistic for kernel change-point detection. Seq. Anal. 2019, 38, 503–544.
Lorden, G. Procedures for reacting to a change in distribution. Ann. Math. Statist. 1971, 42, 520–527.
Siegmund, D. Sequential Analysis: Tests and Confidence Intervals; Springer: New York, NY, USA, 1985.
Han, D.; Tsung, F.; Hu, X.; Wang, K. CUSUM and EWMA multi-charts for detecting a range of mean shifts. Stat. Sin. 2007, 17, 1139–1164.
Amini, M.; Roozbeh, M. Optimal partial ridge estimation in restricted semiparametric regression models. J. Multivar. Anal. 2015, 136, 26–40.
Roozbeh, M. Optimal QR-based estimation in partially linear regression models with correlated errors using GCV criterion. Comput. Stat. Data Anal. 2018, 117, 45–61.
Maboudou-Tchao, E.M.; Hawkins, D.M. Self-Starting Multivariate Control Charts for Location and Scale. J. Qual. Technol. 2011, 43, 113–126.
Zamba, K.; Hawkins, D.M. A Multivariate Change-Point Model for Change in Mean Vector and/or Covariance Structure. J. Qual. Technol. 2009, 41, 285–303.
Qiao, L.; Han, D. CUSUM multi-chart for detecting unknown abrupt changes under finite measure space for network observation sequences. Statistics 2021, 55, 489–513.

Table 1. The comparison of the $ARL_{1}$ in detecting location shifts when $m = 100$ for a multivariate normal distribution.

p	$δ$	KMulti-Gaussian	KMulti-Laplace	DFMGoF	RTC	ChangePt	SSEWMAC
10	0.25	42.63	45.24	93.9	132	186	122
	0.5	15.77	16.97	36.1	51.8	134	70.3
	1	5.32	5.67	12.4	12.1	36.4	11.6
	2	1.84	1.85	5.83	6.23	12.4	4.04
	4	1.01	1.01	3.93	4.72	4.81	1.59
30	0.25	23.22	25.92	83.1	100	186	87.8
	0.5	7.71	8.33	20.6	24.7	148	22.9
	1	2.46	2.36	8.4	7.62	52.9	9.09
	2	1.03	1.03	4.08	4.82	23.9	4.6
	4	1.01	1.01	2.74	3.9	12.2	2.37

Table 2. The comparison of the $ARL_{1}$ in detecting location shifts when $m_{0} = 100$ for a multivariate t distribution.

p	$δ$	KMulti-Gaussian	KMulti-Laplace	DFMGoF	RTC	ChangePt	SSEWMAC
10	0.25	60.75	51.60	140	151.6	188	162
	0.5	25.54	20.75	58.5	68.3	155	88.9
	1	9.04	7.40	16.4	15.1	59.9	21.4
	2	3.58	2.89	7.37	6.74	19.9	8.1
	4	1.45	2.01	4.42	5.21	8.21	4.11
30	0.25	63.83	43.05	127	136	184	140
	0.5	18.81	11.89	37.7	45.6	154	55.3
	1	6.70	4.14	11.5	9.38	64.4	14.2
	2	2.76	1.55	5.46	5.77	31.5	6.43
	4	1.01	1.01	3.31	4.58	15.8	3.6

Table 3. The comparison of the $ARL_{1}$ in detecting location shifts when $m_{0} = 100$ for a multivariate Gamma distribution.

p	$δ$	KMulti-Gaussian	KMulti-Laplace	DFMGoF	RTC	ChangePt	SSEWMAC
10	0.25	72.73	73.94	88.3	147	188	137
	0.5	35.76	35.03	26.8	59.3	152	67.2
	1	14.41	14.33	13.2	10.5	52.9	17.4
	2	5.69	5.62	7.27	6.43	17.2	7.39
	4	2.30	2.57	4.57	5.05	6.87	3.4
30	0.25	57.78	64.62	43.4	126	183	109
	0.5	19.84	24.56	17.2	28.4	159	35.7
	1	7.26	7.04	9.2	7.43	62.5	11.9
	2	2.62	2.35	5.16	5.25	27.8	5.72
	4	1.14	1.09	3.3	4.13	14.7	2.85

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
