1. Introduction
This work is motivated by the empirical analysis and process control of a monthly drug crime series, which contains excess zeros (over 46%) and shows clear serial dependence (see
Section 5 for more details). To address this problem, we select an appropriate integer-valued model and then develop control charts based on it. Among integer-valued models, first-order integer-valued autoregressive (INAR(1)) models play a very important role and have been widely studied in the literature. Serial dependence among count data arises extensively in practice; typical examples are infectious disease counts, defect counts and unemployment counts. Such data are important indicators in epidemiological studies, quality control and economic analysis, and process monitoring is essential for detecting shifts in them.
The first INAR(1) model, proposed by Al-Osh and Alzaid [1], has the following form:
$$X_t = \alpha \circ X_{t-1} + \varepsilon_t, \quad t \geq 1,$$
where the binomial thinning operator "$\circ$" is defined by Steutel and Van Harn [2] as $\alpha \circ X = \sum_{i=1}^{X} B_i$, $\alpha \in (0,1)$, $\{B_i\}$ is a sequence of independent and identically distributed (i.i.d.) random variables with Bernoulli($\alpha$) distribution; and $\{\varepsilon_t\}$ is a sequence of i.i.d. random variables, independent of all $B_i$. The INAR(1) model is currently applied to various kinds of real-world problems because of its good interpretability. As one example, let $X_t$ represent the number of patients with an infectious disease in a community at time $t$, $\varepsilon_t$ the number of new patients entering the community at time $t$, and suppose each patient at time $t-1$ survives to time $t$ with survival probability $\alpha$. As for the crime data, $\alpha \circ X_{t-1}$ can be considered as the number of re-offendings provoked by $X_{t-1}$ with probability $\alpha$
. Depending on the nature of this kind of observed data, the INAR(1) models have been modified and generalized with respect to their orders (Ristić and Nastić [
3], Nastić, Laketa and Ristić [
4]), dimensions (Pedeli and Karlis [
5], Khan, Cekim and Ozel [
6]), marginal distributions (Alzaid and Al-Osh [
7], Alzaid and Al-Osh [
8], Jazi, Jones and Lai [
9], Ristić, Nastić and Bakouch [
10], Barreto-Souza [
11]), thinning operators (Ristić, Bakouch and Nastić[
12], Liu and Zhu [
13]), and mixed models (Ristić and Nastić [
3], Li, Wang and Zhang [
14], Orozco, Sales, Fernández and Pinho [
15]). For more literature, we refer to review papers (Weiß [
16], Scotto, Weiß and Gouveia [
17]). Unlike models based on a fixed survival rate $\alpha$, Zheng, Basawa and Datta [18] proposed the random coefficient INAR(1) model, supposing that the parameter $\alpha$ may be affected by various environmental factors and could vary randomly over time. Some generalizations of random coefficient INAR models can also be found in Kang and Lee [19] and Zhang, Wang and Zhu [20]. In particular, considering both a random survival probability and the zero-inflation phenomenon, Bakouch, Mohammadpour and Shirozhan [21] proposed a zero-inflated geometric INAR(1) time series with random coefficient (the ZIGINAR(1) process for short). The ZIGINAR(1) model has a simple structure and good properties, and it turns out to be the best fit for the real data studied here.
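The binomial-thinning recursion of the basic INAR(1) model described above can be illustrated with a short simulation. This is only a minimal sketch, not the model fitted in this paper: it assumes Poisson innovations (the classical Poisson INAR(1) case), and the function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """Return alpha ∘ x: each of the x individuals survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate n values of X_t = alpha ∘ X_{t-1} + eps_t, assuming eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Poisson(lam / (1 - alpha)) is the stationary marginal of the Poisson INAR(1) model.
    x = poisson(lam / (1.0 - alpha), rng)
    path = []
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + poisson(lam, rng)
        path.append(x)
    return path


# Illustrative (made-up) parameter values.
path = simulate_inar1(alpha=0.4, lam=1.0, n=200)
print(path[:10], sum(path) / len(path))
```

Replacing the fixed `alpha` by a value drawn at random at each time step would give a random coefficient variant in the spirit of the models cited above, although the exact specification depends on the model in question.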
Because serial dependence strongly influences the performance of a control chart, traditional control charts that assume independent observations are not appropriate in many cases. Therefore, the monitoring of INAR(1) models has received much attention. Related research includes, but is not limited to, control charts for the well-developed Poisson INAR(1) models (Weiß [
22], Weiß and Testik [
23], Weiß and Testik [
24], Yontay, Weiß, Testik and Bayindir [
25]), for zero-inflated or zero-deflated INAR(1) models (Rakitzis, Weiß and Castagliola [
26], Li, Wang and Sun [
27], Fernandes, Bourguignon and Ho [
28]), for the mixed INAR(1) model (Sales, Pinho, Vivacqua and Ho [
29]), etc. However, to the best of our knowledge, methods for monitoring the zero-inflated INAR(1) model with random coefficient have not been studied in the literature so far, and this is exactly what we explore. As cumulative sum (CUSUM) control charts are known to be sensitive in detecting small shifts, we study the performance of the CUSUM chart for monitoring the ZIGINAR(1) process. We investigate practical guidelines for the statistical design and methods for evaluating the chart performance. Besides monitoring mean shifts of the ZIGINAR(1) model, we also monitor correlation shifts in the model. Meanwhile, we compare the performance of the CUSUM chart with that of the conventional Shewhart chart.
The rest of the article is organized as follows. The ZIGINAR(1) process and some of its properties are introduced in Section 2. In Section 3, we present the monitoring procedure to detect mean and correlation shifts of the process. Extensive computation results are discussed in Section 4. In Section 5, the applicability of the process monitoring is investigated using the monthly number of drug crimes in Pittsburgh. Finally, conclusions and possible directions for future research are given in Section 6.
2. The ZIGINAR(1) Process
A randomized binomial thinning operation in Bakouch, Mohammadpour and Shirozhan [
21] is defined by
where
,
is a binary random variable independent of discrete random variable
X,
.
Based on the definition of the randomized binomial thinning operation, the ZIGINAR
model
presented by Bakouch, Mohammadpour and Shirozhan [
21], is given by
where the marginal distribution is a zero-inflated Geometric distribution (denoted as ZIG(
)),
,
and
.
is independent of the past of the solution
and the binary sequence
, parameters are also constrained by the condition
.
The ZIGINAR
(1) process is quite suitable for modelling some real-life phenomena in which counted events may survive or vanish with the random survival probability
. Such series are studied in
Section 5 with an example of the counts of the drug crimes, where the re-offending rate may be affected by public security situation and financial situation. The mean, variance, and first-order autocorrelation function of the process are, respectively,
Clearly, the process is overdispersed, i.e., its variance is greater than its mean.
Figure 1 shows some sample paths of simulated ZIGINAR
(1) processes for
;
;
and
. As we can see, the model has larger process mean with larger
, and larger percentage of zeros with larger
p.
Following Theorem 2.1 in Bakouch, Mohammadpour and Shirozhan [
21], the ZIGINAR
model has a unique, strictly stationary solution given by
Furthermore, the probability mass function of
is
where
for
and 0 else. It can be deduced that the innovation series
is a mixture of three random variables, a degenerate distribution at 0, Geometric(
) and Geometric(
) distributions with three different mixing proportions. The transition probabilities of the process are given by
Some other important probabilistic properties of the process, like spectral density, multi-step conditional mean and variance, extreme order statistics, distributional properties of length of run of zeros, have also been discussed in Bakouch, Mohammadpour and Shirozhan [
21]. Furthermore, the unknown parameters of the model could be estimated by conditional least squares or maximum likelihood methods.
3. Monitoring Procedure
In this section, we present a CUSUM chart for monitoring the ZIGINAR(1) process. As this process is used to fit the number of crimes, an increase in the process mean usually indicates a deteriorating public security environment, and an increase in the process correlation usually indicates more re-offending. Thus, our purpose is to detect upward shifts in both the mean and the correlation of the ZIGINAR(1) process. According to the model properties, the process mean is affected by the parameters and p, the correlation is affected by the parameters and . Let and ( and ) denote the in-control (out-of-control) parameters of the processes, and () be the corresponding in-control (out-of-control) process mean, standard deviation and first-order correlation.
CUSUM charts, which are commonly used in statistical process control, were first proposed by Page [
30]. The essential assumption underlying the design of CUSUM charts is that the process observations are independent (Montgomery [
31], Alencar, Ho and Albarracin [
32], Bourguignon, Medeiros, Fernandes and Ho [
33]). Violation of this assumption seriously affects the monitoring performance of the charts (Harris and Ross [
34], Triantafyllopoulos and Bersimis [
35], Albarracin, Alencar and Ho [
36]). Some authors have studied the performance of CUSUM charts for some integer-valued models (Weiß and Testik [
23], Weiß and Testik [
24], Yontay, Weiß, Testik and Bayindir [
25], Rakitzis, Weiß and Castagliola [
26], Li, Wang and Sun [
27], Lee and Kim [
37], Lee, Kim and Kim [
38]).
Scheme (The ZIGINAR(1) CUSUM chart). Let $\{X_t\}$ be a stationary ZIGINAR(1) process. The CUSUM statistic is defined as
$$C_t = \max(0, X_t - k + C_{t-1}), \quad t = 1, 2, \ldots,$$
where k is a positive integer constant representing the reference value ($k \geq \mu_0$). The chart signals an out-of-control state when $C_t$ falls outside the control limit h, that is, $C_t > h$. The initial value of the CUSUM statistic is set equal to the integer constant $c_0$, i.e., $C_0 = c_0$. The performance of this chart is evaluated by the average run length (ARL), which is defined as the average number of points plotted on the chart until the first out-of-control signal is triggered. As $(X_t, C_t)$ of the ZIGINAR(1) process is a bivariate Markov chain, the Markov chain approach proposed by Brook and Evans [
39] is adapted to evaluate the exact ARLs. Though this method has been described in detail in the relevant literature by Weiß [
22], Weiß and Testik [
23] and Weiß and Testik [
24], we briefly introduce this method here for completeness. The reachable control region (
) of
is given by
Obviously
has a finite number of elements and could be ordered in a certain manner. The transition probability matrix of
is
,
The initial probabilities are
The conditional probability that the run length of
equals
r is defined by
where
. Let the vector
denote the
k-th factorial moments that
where
and
. Then
The ARL is obtained as
For brevity, we do not repeat the proofs; see Weiß [
22], Weiß and Testik [
23] and Weiß and Testik [
24] for more details. It is expected that an efficient chart possesses a large in-control ARL (denoted as ARL
) and a small out-of-control ARL. Along with the ARL, we also assess the performance of the charts through the standard deviation of run length (SDRL) suggested by Weiß [
22]. The SDRL of the ZIGINAR
(1) CUSUM chart could again be computed efficiently by applying the Markov chain method. The second order factorial moments
can be determined recursively from the relation
. Then the SDRL is
To implement the proposed monitoring scheme, the chart design pair
needs to be chosen in advance. Generally, a fixed ARL
value is set as the target value, and
is set accordingly. Some guidelines for these choices are given in the next section.
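The Markov chain evaluation of the run length described in this section can be sketched numerically. The sketch below assumes the standard Brook and Evans form: with $Q$ denoting the transition matrix restricted to the in-control states, the first and second factorial moments solve $(I - Q)\mu_{(1)} = \mathbf{1}$ and $(I - Q)\mu_{(2)} = 2Q\mu_{(1)}$, and the SDRL is $\sqrt{\mu_{(2)} + \mu_{(1)} - \mu_{(1)}^2}$. The matrix used here is a made-up toy example, not the ZIGINAR(1) transition matrix, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def run_length_moments(q):
    """Return (ARL, SDRL) for each starting state, given the in-control transition matrix q."""
    n = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] for i in range(n)]
    mu1 = solve(i_minus_q, [1.0] * n)                       # first factorial moment = ARL
    rhs = [2.0 * sum(q[i][j] * mu1[j] for j in range(n)) for i in range(n)]
    mu2 = solve(i_minus_q, rhs)                             # second factorial moment
    sdrl = [math.sqrt(mu2[i] + mu1[i] - mu1[i] ** 2) for i in range(n)]
    return mu1, sdrl


# Toy example: two in-control states, each leaving the control region with probability 0.2.
arl, sdrl = run_length_moments([[0.5, 0.3], [0.2, 0.6]])
print(arl, sdrl)
```

The function returns moments per starting state; an overall ARL would then be obtained by weighting these values with the initial probabilities of the chain.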
4. Computation Results
In this section, we evaluate the performance of the ZIGINAR(1) CUSUM chart based on extensive numerical experiments, presuming that the model parameters are known. In practice, the in-control parameters need to be estimated from the data, as shown in the next section. We search for possible chart designs (integer
pairs) in order to adjust the ARL
close to the target value. Here the target ARL
value is set to be 370, which is commonly used in the statistical process monitoring domain. Meanwhile, the values of ARL and SDRL are calculated accurately by the Markov chain method, and we only show the results with two decimal places for simplicity. We first compute ARL
and SDRL
of the CUSUM chart for different in-control process parameters and initial values in
Table 1. The process parameters are:
;
;
;
. Furthermore, initial values are
. These chosen parameters could cover a broad range of different scenarios. Based on the results in
Table 1, three important conclusions can be drawn. First, when
takes a smaller value, the difference between ARL
and SDRL
is small. When
takes a larger value, there may be situations where SDRL
is significantly greater than ARL
(for example, ARL
, SDRL
under
). Thus, we assume that
in the following studies to obtain better robustness. Second, as the differences between the ARL and SDRL values are small when
, we only use ARL as the measure in the following computations to save space. Last, the parameter
has a great influence on the selection of the chart design
; a larger
calls for a larger pair
.
Due to its simplicity, the conventional Shewhart chart is very popular in monitoring the process shifts. The upper limit for the Shewhart chart is denoted as
. For observations, when the value of the process
exceeds the threshold value
(
), a fault is declared.
Figure 2 and
Figure 3 investigate the CUSUM method preliminarily by comparing it with the Shewhart method. In both of these figures, we assume that the in-control parameters are
and
, which are selected based on the real drug crime data in
Section 5. According to these parameters, the CUSUM chart designs can be determined, respectively, as
(corresponding ARL
);
(ARL
);
(ARL
);
(ARL
);
(ARL
). Furthermore, the Shewhart chart limit
(ARL
=381.31) can be used. It should be noted that two types of changes are considered in
Figure 2, which both lead to the upward mean shifts. The first type of changes occurs only in the parameter
, with other parameters invariant, the results are listed in
Figure 2a. Similarly, the second type of changes occurs only in the parameter
p, with other parameters invariant, the results are in
Figure 2b. From
Figure 2, we can conclude that the CUSUM chart with the design
outperforms the other CUSUM charts under most shifts, while the Shewhart chart performs worst among them. For the upward correlation shift scenarios, ARL values under two types of the parameter changes are displayed in
Figure 3. The first one considers changes only in the parameter
, and the second one considers changes only in the parameter
. For each scenario, the ARL of the Shewhart chart increases with the first-order correlation
, and the CUSUM chart performs better. Overall, the conventional Shewhart chart is insensitive to upward mean shifts caused by changes in the parameter
p and fails to detect shifts in the correlation, whereas the proposed CUSUM chart overcomes these limitations and outperforms the Shewhart chart under various parameter shifts. From the figures, we can also conclude that the smaller the value of
k, the more sensitive the CUSUM chart is. As the constraint
is required to make the chart reasonable, it is natural to recommend
(the smallest integer no less than
), then we aim to select the value of
h such that ARL
is close to 370. The computations for the CUSUM chart are now extended to general cases with the following experimental designs.
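The two decision rules compared in this section can be made concrete with a short sketch. It assumes the usual upper one-sided CUSUM recursion $C_t = \max(0, X_t - k + C_{t-1})$ with a signal when $C_t > h$, and a Shewhart rule that signals when an observation exceeds the upper limit. The in-control mean, the limits and the count series are made-up values for illustration only, and the function names are hypothetical.

```python
import math


def cusum_signals(xs, k, h, c0=0):
    """Return the 1-based times at which C_t = max(0, X_t - k + C_{t-1}) exceeds h."""
    c, signals = c0, []
    for t, x in enumerate(xs, start=1):
        c = max(0, x - k + c)
        if c > h:
            signals.append(t)  # the statistic is not reset after a signal
    return signals


def shewhart_signals(xs, ucl):
    """Return the 1-based times at which a single observation exceeds the upper limit."""
    return [t for t, x in enumerate(xs, start=1) if x > ucl]


mu0 = 1.7                                # hypothetical in-control mean
k = math.ceil(mu0)                       # smallest integer no less than mu0
counts = [0, 3, 0, 4, 5, 3, 4, 0, 6, 5]  # made-up count series

print(cusum_signals(counts, k=k, h=8))   # [9, 10]
print(shewhart_signals(counts, ucl=9))   # []
```

In this toy series no single count is extreme, so the Shewhart rule stays silent, while the accumulated excess over the reference value eventually pushes the CUSUM statistic above its limit.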
In
Table 2,
Table 3 and
Table 4, we focus on situations in which the process mean shifts upward while the correlation remains the same. Each in-control parameter has three levels:
;
;
;
. We consider the case when the changes only occur in
, this is the most common case. The out-of-control process mean is
, the shift size
considers potential values in set
. The usual relative deviation (in %) in ARL is defined as $\mathrm{dev} = 100 \times (\mathrm{ARL} - \mathrm{ARL}_0)/\mathrm{ARL}_0$ (Weiß and Testik [
24]). From
Table 2,
Table 3,
Table 4, we can conclude that the ZIGINAR
(1) CUSUM chart performs quite well in detecting upward mean shifts for all scenarios. For the small shift size of
, the CUSUM chart is efficient with the minimum
drop of ARL and the maximum
drop of ARL. Take larger
for another illustration (
), the drop of ARL is at least
, and up to
at most. It can also be obtained that
has to be at least 6 to get an immediate signal with an out-of-control ARL close to 1. In addition, extensive computation results show that the in-control parameters (
) have little effect on the good performance of the CUSUM chart in detecting mean shifts.
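The relative deviation used throughout these tables, 100 × (ARL − ARL0)/ARL0, where ARL0 is the in-control ARL, is simple to compute. The following minimal sketch uses made-up ARL values purely to show the sign convention: a negative value is a drop relative to the in-control ARL.

```python
def relative_deviation(arl, arl0):
    """Relative deviation (in %) of an out-of-control ARL from the in-control ARL."""
    return 100.0 * (arl - arl0) / arl0


# Hypothetical values: an out-of-control ARL of 185 against an in-control ARL of 370.
print(relative_deviation(185.0, 370.0))  # -50.0, i.e., a 50% drop
```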
The computation study in
Table 5 and
Table 6 concerns upward shifts in the process correlation. Two levels are considered for each in-control parameter:
;
;
;
. Two types of out-of-control pattern are considered here for comprehensive investigation, the first type is that the upward changes only exist in
(shown in
Table 5), and the second type is that the downward changes only exist in
(shown in
Table 6). In
Table 5, the shifts in the magnitude
are from the set
. The results imply that the performance of CUSUM chart fluctuates greatly in detecting correlation shifts caused only by
. To be specific, when
is 0.3, dev
ranges from
to
. Another finding based on the design of experiments in
Table 5 is that both a smaller
and a smaller
could slightly improve detection efficiency, while
and
could not. In
Table 6, the shifts magnitude
are from the set
. From
Table 6, we can see the CUSUM chart is more efficient in detecting the correlation shifts caused by
. As the absolute value of
gets bigger, the relative decrease in ARL gradually increases. When
, dev
ranges from
to
. Meanwhile, we can further conclude that a smaller
and a larger
often lead to better chart performance, and
have little influence. Based on all the analysis above in this paragraph, we can further conclude that a smaller
and a larger value of initial correlation
are helpful for detecting correlation shifts. Furthermore, we cannot get an immediate signal when only correlation shifts occur.
5. Analyses of Drug Crime Count Time Series
In this section, we present a case study of crime count data in Pittsburgh. The data set contains multiple crime types, such as arson, drink-driving, robbery and so on. Monitoring of crime data is needed not only for early warning of organised crime, but also for assessing the social security environment. The crime data can be downloaded from the Forecasting Principles site (
http://www.forecastingprinciples.com, accessed on 20 March 2021) or obtained by emailing the corresponding author. The subset we analyse is a monthly series of drug use counts collected from the 56th police car beat, which contains 144 observations from January 1990 to December 2001. There are 67 zeros in this series (a proportion of 46.53%), making zero the most frequent value. The sample mean, variance and first-order autocorrelation of the data are 1.7153, 6.4289 and 0.3886, respectively, which show strong overdispersion and autocorrelation. The sample path and the histogram of the series are shown in
Figure 4. The histograms of estimated ZIG distribution, estimated Geometric distribution and estimated Poisson distribution are also given in
Figure 4b, which indicate that the ZIG marginal is the most appropriate to describe the data. The sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF) in
Figure 5 reveal that the series most likely comes from an AR-type process of order 3. Since our intention is to illustrate the implementation of the proposed control chart, we nevertheless employ the first-order INAR models that are widely studied and applied in the literature. The consideration of more complex models is left for future study.
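The descriptive statistics reported above (proportion of zeros, sample mean, variance and first-order autocorrelation) can be computed with a few lines of code. This is a minimal sketch on a made-up toy series, since the drug crime series itself is not reproduced here. It assumes the unbiased variance estimator (divisor n − 1) and the usual sample autocorrelation; the conventions used for the reported figures are not stated in the text.

```python
def summarize_counts(xs):
    """Return zero proportion, mean, variance and lag-1 autocorrelation of a count series."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    acov1 = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    return {
        "zero_share": sum(1 for x in xs if x == 0) / n,
        "mean": mean,
        "variance": ss / (n - 1),  # assumption: unbiased estimator
        "acf1": acov1 / ss,        # usual sample autocorrelation at lag 1
    }


# Made-up toy series, for illustration only.
print(summarize_counts([0, 2, 0, 4]))

# The zero proportion quoted in the text: 67 zeros among 144 observations.
print(round(100 * 67 / 144, 2))  # 46.53
```

A variance well above the mean, as for the values reported in the text (6.4289 against 1.7153), is the overdispersion that motivates a zero-inflated geometric marginal rather than a Poisson one.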
In addition to the ZIGINAR(1) model, several competing models are also fitted to the time series, such as Poisson INAR(1) (Al-Osh and Alzaid [
1]), GINAR(1) (Alzaid and Al-Osh [
7]), ZINAR(1) (Jazi, Jones and Lai [
9]), ZMGINAR(1) (Barreto-Souza [
11]), NGINAR(1) (Ristić, Bakouch and Nastić [
12]), ZIMINAR(1) (Li, Wang and Zhang [
14]). The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are used to compare these models. Numerical results in
Table 7 show that the ZIGINAR
(1) process has the best overall performance, compared with its competitors. Therefore, we assume that the drug crime data is from the ZIGINAR
(1) model and the estimated parameters in
Table 7 are used in the process control procedures. Based on the computation results in
Section 4, the CUSUM chart with designs
(corresponding ARL
) is the best choice, which is shown in
Figure 6a. For comparison, we also present CUSUM control charts with designs
(ARL
) in
Figure 6b, designs
(ARL
) in
Figure 6c, and the Shewhart chart with control limit
(ARL
) in
Figure 6d. We observe that all the CUSUM charts give out-of-control signals, while there are no out-of-control points in the Shewhart chart. Because the Shewhart chart has been shown to be less effective than the CUSUM chart, the drug crime series appears to be out of control, with an upward shift in the mean or in the correlation, and further investigation is needed to explain it. The three CUSUM designs also display different detection efficiencies. The CUSUM chart with
first signals at
followed by continuous alarms as
t increases. The CUSUM chart with
first signals at
, then falls back below the control limit for a period of time, and signals again at
. The CUSUM chart with
signals at
. The analysis above confirms again that the CUSUM chart design
is the most effective in practice.
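The model comparison earlier in this section relies on AIC and BIC. As a reminder of how these criteria trade off fit against complexity, the sketch below uses their standard definitions, AIC = 2m − 2 log L and BIC = m log n − 2 log L, where m is the number of parameters and n the number of observations. The log-likelihood values and parameter counts are hypothetical placeholders, not the fitted values from Table 7.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion: 2m - 2 log L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: m log n - 2 log L."""
    return n_params * math.log(n_obs) - 2 * loglik


# Hypothetical log-likelihoods for two candidate models fitted to n = 144 observations.
candidates = {"model_a": (-250.0, 4), "model_b": (-255.0, 2)}
for name, (loglik, m) in candidates.items():
    print(name, aic(loglik, m), round(bic(loglik, m, 144), 2))
```

Smaller values indicate the preferred model; BIC penalizes additional parameters more heavily than AIC once n exceeds about 7, so the two criteria need not agree.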