A Conway–Maxwell–Poisson-Binomial AR(1) Model for Bounded Time Series Data

Binomial autoregressive models are frequently used for modeling bounded time series counts. However, they are not well developed for more complex bounded time series counts of the occurrence of n exchangeable and dependent units, which are becoming increasingly common in practice. To fill this gap, this paper first constructs an exchangeable Conway–Maxwell–Poisson-binomial (CMPB) thinning operator and then establishes the Conway–Maxwell–Poisson-binomial AR (CMPBAR) model. We establish its stationarity and ergodicity, discuss the conditional maximum likelihood (CML) estimate of the model’s parameters, and establish the asymptotic normality of the CML estimator. In a simulation study, the boxplots illustrate that the CML estimator is consistent and the qqplots show the asymptotic normality of the CML estimator. In the real data example, our model takes a smaller AIC and BIC than its main competitors.


Introduction
Bounded time series of counts are commonly observed in real-world applications. Its (binomial) index of dispersion (as a function of n, µ and σ 2 ) is defined by BID(X) = nσ 2 / µ(n − µ) , where n is the predetermined upper limit of the range, E(X) = µ and Var(X) = σ 2 . If its BID(X) < 1, then it is under-dispersed, if its BID(X) = 1, then it is equi-dispersed, while if its BID(X) > 1, then it is over-dispersed (or the extra-binomial variation).
A popular tool to establish a binomial autoregressive model (BAR) is the binomial thinning operator "•" [1], which is introduced by where X is a non-negative integer-valued random variable, {W i , i = 1, 2, · · · , n} is an i.i.d. Bernoulli random variable sequence with P(W i = 1) = 1 − P(W i = 0) = α and independent of X. McKenzie [2] used the binomial thinning operator given in (1) to establish the binomial AR(1) model, which is a popular model for bounded time series and defined as follows where n ∈ N is the predetermined upper limit of the range; X 0 follows the binomial distribution with P(X 0 = k) = ( n k )π k (1 − π) n−k ; α = β + ρ and β = (1 − ρ)π with ρ ∈ (max {−π/(1 − π), −(1 − π)/π}, 1) and π ∈ (0, 1); the counting series at time t are independent of the random variables X s , ∀s < t; and all the counting series in "α•" and "β•" are mutually independent sequences of independent Bernoulli random variables with parameters α and β, respectively. The binomial AR(1) process given in (2) is now well understood and it is an ergodic Markov chain with a stationary distribution Bin(n, π) with π = β/(1 − ρ) and ρ = α − β. Hence, its BID(X t ) = 1, i.e., the BAR model given in (2), applies to equi-dispersed time series with finite range; see [3][4][5][6][7] for more discussion about the BAR(1) model. Weiß and Pollett [8] extended the binomial AR(1) model as the density-dependent BAR(1) model (denoted as the DDBAR(1) model), whose thinning probabilities vary over time by assuming α t = α(X t−1 /n) and β t = β(X t−1 /n). In particular, for given n, if α t = (1 − ρ)(a + bX t−1 /n) and β t = (1 − ρ)(a + bX t−1 /n) + ρ, the DDBAR(1) model allows to analyze bounded integer-valued time series with under-dispersion, equi-dispersion and over-dispersion, see Section 4 in [8] for more details. To model extra-binomial variation for time series counts, Weiß and Kim [9] proposed the beta-binomial AR (BBAR) model based on the beta-binomial thinning operator "α• φ ", which is introduced by where X is a non-negative integer-valued random variable, {B i , i = 1, 2, · · · , n} is an i.i.d. Bernoulli random variable sequence with P( As discussed in Weiß [10], the BAR(1) model, DDBAR(1) model, and BBAR(1) model can be interpreted as a system with n mutually independent units and each unit being either in state "1" or state "0". Assume X t is the number of units being in state "1" at time t. Then α ) is the number of units, which moved from state "0" to state "1" at time t with revival probability β (random revival probability β t or β φ ). It is worth mentioning that all of BAR(1), DDBAR(1), and BBAR(1) models are aimed at a system with n independent units, but not a system with n dependent units, i.e., the counting series in "•" is independent and identically distributed, but not dependent. To solve this dilemma, Kang et al. [11] proposed a generalized binomial AR (GBAR) model based on the generalized binomial thinning operator "α• θ ", which is proposed by Ristić et al. [12] and given as follows are two independent random sequences of iid random variables with Bernoulli(α) and Bernoulli(θ) distributions, Z is a Bernoulli(α) random variable and is responsible for the cross-dependence, ∀i, j = 1, 2, ..., X, {W i }, {V j } and Z are mutually independent and each of them is independent of X.
Unfortunately, the GBAR model [11] can not use to analyze under-dispersed or equidispersed bounded data. To fill this gap, we are inspired by the Conway-Maxwell-Poissonbinomial (CMPB) distribution [13] and construct the Conway-Maxwell-Poisson-binomial thinning operator, whose counting series is exchangeablility. Furthermore, we propose a new Conway-Maxwell-Poisson-binomial autoregressive (CMPBAR) model, which not only allows us to analyze bounded data with over-dispersion but also allows us to model bounded data with equi-dispersion or under-dispersion. The second contribution of this paper is that we discuss the CML estimation of the parameters involved in the new model, and establish the asymptotic normality of the CML estimator. To illustrate that the new model is more flexible and superior, we apply the new model on the weekly rainy days at Hamburg-Neuwiedenthal in Germany.
The paper is organized as follows. Section 2 first gives a brief review of the Conway-Maxwell-Poisson-binomial distribution, then gives the definition of the exchangeable Conway-Maxwell-Poisson-binomial thinning operator and that of the Conway-Maxwell-Poisson-binomial AR model. The conditional maximum likelihood estimation and its asymptotic properties are established in Section 3. Section 4 gives a simulation study and Section 5 gives real data to show the better performance of the new model. Conclusions are made in Section 6.

Conway-Maxwell-Poisson-Binomial Distribution
For readability, we first give a brief review of the CMPB distribution introduced by Shmueli et al. [13].
A random variable X taking values in {0, 1, 2, . . . , n} is said to follow the Conway-Maxwell-Poisson-binomial distribution with parameters (α, ν), if the probability mass function (pmf) of X takes the form P(X = x|α, ν, n) = ( n x ) , ν ∈ R and n ∈ N is the predetermined upper limit of the range.
To further explore the dynamic change of the BID with α varying from {0.1, 0.2, · · · , 0.9} for given n = 7 and ν = −0.5, 0, 0.5, 1, 1.5, or 2, we present the plots of the BID in Figure 2. From Figure 2, we obtain the following observations. First, if ν < 1, the BID is no less than 1. To be precise, its BID is increasing to maximum when α is varying from 0 to 0.5, and then decreasing to 1 when α is varying from 0.5 to 1. Second, if ν = 1, its BID = 1, for all α ∈ (0, 1). Third, if ν > 1, its BID is no more than 1. Precisely, its BID is decreasing to the minimum when α is varying from 0 to 0.5, and then increasing to 1 when α is varying from 0.5 to 1. To sum up, the Conway-Maxwell-Poisson-binomial distribution allows under-dispersion, equi-dispersion, and over-dispersion for bounded time series data. (3), the pmf of the CMPB (n, α, ν) is expressed as that of the power series distribution and if ν = 0, P(X = x|θ, ν,

Conway-Maxwell-Poisson-Binomial Thinning Operator
By Shmueli et al. [13], the CMPB distribution is a distribution on the sum of n dependent Bernoulli components without specifying anything else about the joint distribution of those components. Precisely, if X ∼ CMPB(n, α, ν), there exists a Bernoulli variable

Definition 1. Let θ = α/(1 − α). Then the exchangeable Conway-Maxwell-Poisson-binomial thinning operator is introduced by
where X is a non-negative random variable, {Z i , i = 1, 2, · · · , X} is an exchangeable Bernoulli variable sequence with its pmf taking the form (5) and independent of X.
To generate the random number of "α ν X", we first let X = n, then α ν X|(X = n) ∼ CMPB(n, α, ν). Therefore, , then the pmf of α ν n takes the form (3). Third, we let θ = λ ν , λ > 0. By (3), the pmf of the α ν n can be rewritten as by which an algorithm is used to generate a random number of α ν X with X = n can be expressed as follows.
Remark 2. By Kadane [16], the counting series {Z i } in Definition 1 is a dependent Bernoulli variable sequence with exchangeability of order 2. To account for the concept of exchangeability, we assume π is a permutation of (z 1 , z 2 , · · · , z n ). Then P z 1 ,··· ,z n = P π( 1 ,··· ,z n ) . By the definition of exchangeability in Section 6 in Kadane [16], ∑ n i=1 Z i is n-exchangeable. Kadane [16] stated that "de Finetti's Theorem shows that sums of exchangeable random variables are mixtures of Binomial random variables. Because the marginal distribution of each component is Bernoulli, interest centers on the joint distribution of pairs of such variables". By Theorem 4 in Kadane [16], n-exchangeability applies to every permutation of length n, it implies that n is exchangeable for each n < n. Hence, {Z i } is exchangeable with order 2 because every pair has the same distribution as every other pair, i.e., every pair of {Z 1 , Z 2 , · · · , Z n } has the same distribution as every other pair and for any pair (Z i , Z j ), ∀i, j = 1, 2, · · · , n, and i = j, P( [16] for more discussion.

Binomial Autoregressive Model with the CMPB Operator
Now, we define the BAR(1) model with the CMPB operator by W i are the CMPB thinning operators given in Definition 1, their counting series {Z i } and {W i } are the exchangeable Bernoulli variable sequence with their pmfs taking the form (5), {Z i } is independent of {W j }, ∀i = 1, 2, . . . , X t−1 , j = 1, 2, . . . , (n − X t−1 ), and all the thinnings at time t are independent of {X s , s < t}, n ∈ N, ν ∈ R.
For simplicity, we denote the new model as the CMPBAR(1) model. By (8), {X t } N is a Markov chain and its one-step transition probability takes the form where (8), then {X t } is ergodicity and strictly stationarity.
By Definition 1 and (8), for given X t−1 , {X t } given in (8) consists of two independent parts α ν X t−1 and β ν (n − and the conditional binomial index of dispersion (CBID) is Unfortunally, because of the complexity of S 1 (θ 1 , ν) and S 2 (θ 2 , ν), we can not obtain the marginal distribution of {X t } and its the autocorrelation structure, including the E(X t ), Var(X t ), and BID. To resolve this dilemma, for given n = 10, we create some plots of the BID (in Figure 3 From Figure 3, we have the following observations. First, if ν < 1, the BID of the CMPBAR(1) model is greater than 1, i.e., the CMPBAR(1) model allows us to analyze bounded integer-valued time series with overdispersion. Second, if ν > 1, the BID of the CMPBAR(1) model is less than 1, i.e., the CMPBAR(1) model allows us to analyze bounded integer-valued time series with underdispersion. Third, if ν = 1, the CMPBAR(1) model becomes to the BAR(1) given in (2) and its BID is equal to 1, i.e., equi-dispersed bounded integer-valued time series is allowed.

Parameter Estimation
In this section, we use the conditional maximum likelihood method to estimate the parameters (denoted as η = (θ 1 , θ 2 , ν) ) involving in the CMPBAR(1) model. Let {X 0 , X 1 , . . . , X T } be a realization of {X t }, and generate by the CMPBAR(1) process based on Algorithm 1, where T ∈ N represents the size of sample.
By using (9), the conditional log-likelihood function can be written as: where Then the CML estimateη cml is obtained by minimizing (10).

Simulation
In this section, we conduct a simulation study to illustrate the large sample property of the CMPBAR(1) model.
In the simulation, we fix n = 10, let sample size T = 100, 300, 500, and use the optim function in R to optimize (η) in (10). To check the finite sample performance, we use the following parameter combinations of (θ 1 , θ 2 , ν) as  2 , whereφ i is the estimator of ϕ in the ith replication and m = 10, 000. Summaries of the simulation results are given in Tables 1-3.
To illustrate the consistency and the asymptotic normality of the CML estimators, we present the boxplots of the CML estimates for (A1), (B1), and (C1) in Figures 4, 5, and 6, and their qqplots with T = 500 in Figure 7, 8, and 9, respectively. Others are similar and we omit them.          These studies indicate that the CML method seems to perform reasonably well. First, Tables 1-3 show that the standard deviation of the CML estimator is decreasing with the sample size increase and the mean of the CML estimator is closer to the true parameter value in general cases. Second, Figures 4-6 account for the location and dispersion of the estimates, all of which indicate the consistency of the estimators. Third, Figures 7-9 indicate the asymptotic normality of the CML estimator.

Real Data Example
In this section, we consider the number of weekly rainy days for the period from 1 January 2005 to 31 December 2010 at Hamburg-Neuwiedenthal in Germany, where a week is defined as being from Saturday to Friday and n = 7. The data were collected from the German Weather Service (http://www.dwd.de/, accessed on 12 December 2018). The sample path and the ACF and PACF plots of the observations are given in Figures 10 and 11, respectively.  Figure 11. ACF and PACF plots of the weekly rainy days. (1) shows that the ACF exhibits significant value for lag 1, and (2) presents that the PACF indicates an AR(1)-like autocorrelation structure.
By computation, the sample mean and variance are 3.8371 and 3.6753, and the BID of the data is 1.2371, which implies the data exhibits extra-binomial variation. Hence, we use the CMPBAR(1) model, BAR(1) model [2], BBAR(1) model [9], and GBAR(1) model [11] to fit data by the CML method. We compare the estimated standard error (SE), −log-likelihood (−log-lik), Akaike's information criterion (AIC) and Bayesian information criterion (BIC), which are summarized in Table 4, including the fitted results of the CML estimate.
From Table 4, the CMPBAR(1) model takes the smallest values of the −log-lik, AIC, and BIC. Hence, the CMPBAR(1) model might be more appropriate for the weekly rainy days.
To illustrate the adequacy of the CMPBAR(1) model, we consider the fitted Pearson residual analysis of the CMPBAR(1) model. By computation, the mean and variance of the fitted Pearson residual are 0.0760 and 1.0500, respectively. The residual analysis in Figure 12 shows that this model performs rather well.  In addition, to further check the adequacy of the CMPBAR(1) model, we present the probability integral transform (PIT) (if the fitted model is adequate, its PIT histogram looks like that of a uniform distribution, see [10] for more discussion) in Figure 13 based on the fitted CMPBAR(1) model.
As can be seen in Figure 13, the PIT histogram of the CMPBAR(1) model is close to uniformity, i.e., the PIT histogram confirms that the fitted CMPBAR(1) model works reasonably well for the weekly rainy days.

Concluding Remarks
This paper considers a new CMPB thinning operator and proposes a new CMPBAR(1) model, which provides an available method to model bounded data with under-dispersion, equi-dispersion, and over-dispersion. We discuss some properties of the new model, the estimate of the parameters, and its large-sample properties. Simulations are conducted to examine the finite sample performance of estimators. A real data example is provided to illustrate the applicability of the CMPBAR(1) model.
There are several directions in which we plan to take this work forward. First, the random coefficient CMPBAR(1) model can be introduced by where α t = α(X t−1 /n) and β t = β(X t−1 /n), "• ν " is the CMPB thinning operator and the counting series in "α t • ν ", and that in "β t • ν " is independent and all of the counting series at time t is independent of {X s , s < t}; see Weiß and Pollett [8] for the random coefficient BAR(1) model. Second, a correlated sign-thinning operator can be established by where sign(x) = 1 if x ≥ 0 and sign(x)=−1 if x < 0, {Z i , i = 1, 2, · · · , X} is an exchangeable Bernoulli variable sequence with its pmf taking the form (5). Based on the correlated sign thinning operator, one can construct a Z-valued autoregressive model to analyze data with a range Z and under-dispersed, equi-dispersed, and over-dispersed. Third, a class of Conway-Maxwell-Poisson-binomial generalized autoregressive conditional heteroskedasticity models can be considered by Z t |F t−1 ∼ CMPB(n, α t , ν), α t = g η (Z t−1 /n, α t−1 ), where η is the parameter vector involving in the model (see Ristić et al. [20] and Chen et al. [18] for ARCH-type models, Lee and Lee [21] and Chen et al. [22] for GARCH-type models for bounded data). In addition, a semi-parameter version can be considered by Z t |F t−1 ∼ CMPB(n, α t , ν), α t = g η (Z t−1 /n, α t−1 ) + f γ (X t ), where η is the parameter vector involved in the model, {X t } is the covariate process imposed in the observe process {Z t }, and γ is the parameter vector involving in f (·).