Integer-Valued Split-BREAK Process with a General Family of Innovations and Application to Accident Count Data Modeling

Vladica S. Stojanović; Hassan S. Bakouch; Zorica Gajtanović; Fatimah E. Almuhayfith; Kristijan Kuk

doi:10.3390/axioms13010040

¹ Department of Informatics & Computer Sciences, University of Criminal Investigation and Police Studies, 11060 Belgrade, Serbia

² Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia

³ Department of Mathematics, Faculty of Science, Tanta University, Tanta 31111, Egypt

⁴ Teacher Education Faculty, University of Kosovska Mitrovica, 38218 Leposavić, Serbia

Axioms 2024, 13(1), 40; https://doi.org/10.3390/axioms13010040

This article belongs to the Special Issue Stochastic and Statistical Analysis in Natural Sciences

Abstract

This paper presents a novel count time-series model, named integer-valued Split-BREAK process of the first order, abbr. INSB(1) model. This process is examined in terms of its basic stochastic properties, such as stationarity, mean, variance and correlation structure. In addition, the marginal distribution, over-dispersion and zero-inflation properties of the INSB(1) process are also examined. To estimate the unknown parameters of the INSB(1) process, an estimation procedure based on probability generating functions (PGFs) is proposed. For the obtained estimators, their asymptotic properties, as well as the appropriate simulation study, are examined. Finally, the INSB(1) process is applied in the dynamic analysis of some real-world series, namely, the numbers of serious traffic accidents in Serbia and forest fires in Greece.

Keywords:

time series; power-series family of distributions; noise indicator; parameter estimation; simulation; forecasting

MSC:

62M10; 60G10; 62M20

1. Introduction

Non-negative integer-valued (NNIV) time series are the subject of numerous research studies on the modeling and analysis of count time series (see, among the more recently published, e.g., [1,2,3,4,5,6]). In the class of NNIV series, some of the most frequently used models are the so-called integer-valued autoregressive (INAR) processes (see, among the more recently published, e.g., [7,8,9,10,11,12,13,14]). Our main motivation in this study is to introduce an integer-valued process based on the autoregressive principle that has a more general form than ordinary INAR-based time-series models. The proposed generalization can be viewed from the following two aspects:

The first motive is the formation of the NNIV stochastic model based on a principle similar to that of the so-called Split-BREAK process, introduced by Stojanović et al. [15,16] and later also discussed by Jovanović et al. [17] and Ljajko et al. [18]. The Split-BREAK process is proposed there as a stochastic model intended for the description and analysis of time series with accentuated and persistent fluctuations, whereby different forms of continuous stochastic distributions (such as Gaussian, Laplace and Cauchy) are used as its innovations. Therefore, the model proposed here is based on similar ideas, i.e., it describes pronounced fluctuations in count time-series dynamics.

Another motive is the use, as innovation series, of NNIV distributions known as power-series (PS) distributions. These distributions represent a very general class of NNIV stochastic distributions, to which many well-known distributions belong as special cases (see, for instance, Stojanović et al. [19,20,21]).

In this way, a first-order integer-valued Split-BREAK process (abbr. INSB(1) process) is proposed here. It is worth pointing out that this model, similar to the Split-BREAK process with continuous-type distributions, can be seen as a generalization of some well-known count time-series models, such as INAR-based models. Also, as will be explained further, it can be applied to the examination of more pronounced fluctuations in some real-world time series. The definition and the basic features of the INSB(1) process are described in Section 2. Then, some of the more important stochastic features of this process are presented in Section 3. Section 4 is devoted to a recent estimation technique called the probability-generating function (PGF) estimation method. For the estimators thus obtained, asymptotic properties and efficiency, under some regularity conditions, are also analyzed. Section 5 presents Monte Carlo simulations of the PGF estimators for some specific innovations, such as the Poisson and geometric distributions, which can be seen as special cases of PS-distributed innovation series. In both cases, the asymptotic properties of the obtained estimates are examined. The application of the INSB(1) process in modeling the dynamics and empirical distribution of some real-world time series, namely, the numbers of traffic accidents and forest fires, is described in Section 6. There, the INSB(1) model is also compared with the ordinary INAR(1) model, and it is shown that the proposed model has the same or even better efficiency and forecasting accuracy. Finally, Section 7 provides some concluding remarks.

2. Definition and Structure of the INSB(1) Process

As in Stojanović et al. [19,20,21], we first introduce the independent and identically distributed (i.i.d.) time series with a power-series (PS) distribution.

Definition 1.

The i.i.d. integer-valued time series

(ε_{t})

,

t \in Z

, is PS-distributed if its probability mass function (PMF) is as follows:

p_{ε} (x; a) : = P {ε_{t} = x} = \frac{m (x) a^{x}}{f (a)}, x \in S,

(1)

where

S \subseteq Z^{+} = \{0, 1, 2, \dots\}

is the discrete set of values of the series

(ε_{t})

;

m (x) \geq 0

is the mass function;

a > 0

is the (unknown) parameter; and

f (a) : = \sum_{x \in S} m (x) a^{x}

is the increasing function, which converges on some interval

a \in (0, R)

.

As is shown in Table 1, for certain choices of functions

m (x)

and

f (a)

, Equation (1) gives some of the most well-known types of discrete distributions. Notice that the condition

0 \in S

holds for them, which will henceforth be assumed to be satisfied, in order to examine the zero-inflation property of our model. Furthermore, using some simple calculations (see, e.g., [19]) for the mean and variance of the random variables (RVs)

(ε_{t})

, one obtains

\begin{aligned} μ_{ε} & : = E [ε_{t}] = \frac{1}{f (a)} \sum_{x \in S} m (x) x a^{x} = a \frac{f^{'} (a)}{f (a)} = a g^{'} (a), \\ σ_{ε}^{2} & : = Var [ε_{t}] = \frac{1}{f (a)} \sum_{x \in S} m (x) x^{2} a^{x} - μ_{ε}^{2} = a^{2} \frac{f^{″} (a)}{f (a)} + a \frac{f^{'} (a)}{f (a)} - \left(a \frac{f^{'} (a)}{f (a)}\right)^{2} \\ & = μ_{ε} + a^{2} g^{″} (a), \end{aligned}

(2)

where

g (a) = ln f (a)

.
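As a quick numerical illustration of Equations (2), the following minimal sketch computes the mean and variance directly from the PMF in Equation (1) and compares them with the closed forms a g'(a) and μ_ε + a² g''(a). The Poisson and geometric choices of m(x) and f(a) are standard special cases of the PS family; the function name `ps_moments`, the truncation points and the parameter values are assumptions made only for this example.

```python
import math

def ps_moments(m, a, support):
    """Mean and variance of a PS distribution, computed directly from the PMF in Equation (1)."""
    weights = [m(x) * a**x for x in support]
    f_a = sum(weights)                      # f(a) = sum_x m(x) a^x (truncated support)
    mean = sum(x * w for x, w in zip(support, weights)) / f_a
    second = sum(x * x * w for x, w in zip(support, weights)) / f_a
    return mean, second - mean**2

# Poisson: m(x) = 1/x!, f(a) = exp(a), g(a) = a, so g'(a) = 1 and g''(a) = 0.
a_pois = 2.0
pois_mean, pois_var = ps_moments(lambda x: 1.0 / math.factorial(x), a_pois, range(100))

# Geometric: m(x) = 1, f(a) = 1/(1 - a) on (0, 1), g(a) = -ln(1 - a).
a_geom = 0.4
geom_mean, geom_var = ps_moments(lambda x: 1.0, a_geom, range(400))
g1 = 1.0 / (1.0 - a_geom)            # g'(a)
g2 = 1.0 / (1.0 - a_geom) ** 2       # g''(a)
geom_mean_formula = a_geom * g1                          # a g'(a)
geom_var_formula = geom_mean_formula + a_geom**2 * g2    # mu + a^2 g''(a)
```

In the Poisson case g''(a) = 0, so the variance equals the mean, while in the geometric case g''(a) > 0 and the variance exceeds the mean.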

Table 1. Some specific PS distributions, along with their over-dispersion indices and PGFs.

Furthermore, if we define the over-dispersion index

D_{ε} (a) : = σ_{ε}^{2} - μ_{ε} = a^{2} g^{″} (a)

, then inequality

D_{ε} (a) > 0

holds if and only if

g^{″} (a) > 0

,

\forall a \in (0, R)

. Thus, the series

(ε_{t})

is over-dispersed if and only if

g (a)

is a convex function on

(0, R)

. Finally, the PGF of the first order of PS-distributed RVs

(ε_{t})

can be obtained as follows:

G_{ε} (u; a) : = E [u^{ε_{t}}] = \frac{1}{f (a)} \sum_{x \in S} m (x) {(a u)}^{x} = \frac{f (a u)}{f (a)}, u \in [- 1, 1] .

Clearly, the above sum converges when

u \in (0, R / a)

and allows for the simple calculation of the PGFs of some specific PS distributions, as also presented in Table 1. Below is a definition of the INSB(1) process, as well as some basic notes about it.

Definition 2.

Let

(Ω, F, P)

be the probability space, expanded by some filtration

F = (F_{t})

, where

t \in Z

is the set of time indices. The INSB(1) process is represented by the following time series, defined on an expanded basis

(Ω, F, P, F)

:

(i)

(ε_{t})

is the i.i.d. time series with PS distribution, given by Equation (1).

(ii)

(X_{t})

is a series of martingale means given by the recurrence relation:

X_{t} = α \circ (X_{t - 1} + q_{t - 1} ε_{t - 1}),

(3)

where

α \in (0, 1)

is an (unknown) parameter;

α \circ X : = \sum_{j = 1}^{X} B_{j} (α)

is the binomial thinning operator, where

B_{j} (α)

are mutually independent (and also independent of X) Bernoulli RVs, with

P {B_{j} = 1} = 1 - P {B_{j} = 0} = α

; and

q_{t} = I (ε_{t - 1} \geq c) = \begin{cases} 1, & ε_{t - 1} \geq c, \\ 0, & ε_{t - 1} < c \end{cases}

(4)

is the noise indicator with critical value

c > 0

.

(iii)

(Y_{t})

is a basic INSB series given by the additive decomposition

Y_{t} = X_{t} + ε_{t} .

(5)

Note that in a practical interpretation, the filtration

(F_{t})

is a set of “information” about some (real-world) time series at time t. Thus, PS-distributed RVs

(ε_{t})

are

F_{t}

-adaptive, for each

t \in Z

, and constitute the deviation (noise) component of the INSB(1) process. On the contrary, series

(X_{t})

is

F_{t - 1}

-adaptive and represents the predictive and stability component of the INSB(1) process. Finally, the parameter

c > 0

is the critical value of the reaction, which indicates the importance of the earlier realizations of

(ε_{t})

for the inclusion of its current values in Equation (3). In other words, when

q_{t - 1} = 0

, the martingale mean

X_{t}

does not exceed its previous value

X_{t - 1}

, and the basic INSB series

(Y_{t})

, given by Equation (5), is then realized with "low" fluctuation. Otherwise, the case

q_{t - 1} = 1

indicates a pronounced fluctuation in the series

(Y_{t})

. In this way, the critical value

c > 0

determines not only the fluctuation intensity of the basic series

(Y_{t})

but also the stochastic structure of the INSB(1) process, which represents a certain generalization of some well-known integer time-series models. For instance, according to Equation (4), larger values of c imply that

q_{t} \to 0

. Thus, Equation (3) becomes

X_{t} = α \circ X_{t - 1}

, that is,

X_{t} \overset{a s}{=} 0

, where “as” means “almost surely”. According to Equation (5), the series

(Y_{t})

is then reduced to the innovation series

(ε_{t})

. On the other hand, when

c \to 0

, it follows that

q_{t} \overset{a s}{=} 1

, and the structure of the series

(X_{t})

, as well as

(Y_{t})

, is then the same as with first-order integer-valued autoregressive (abbr. INAR(1)) models.

Figure 1 shows realizations of all the INSB series mentioned above, where the PS-distributed innovations are Poisson with parameter

a = 2

. It is easy to see that the series

(X_{t})

of martingale means has the most pronounced zero inflation, which will be formally confirmed further on.

Figure 1. (a) Realizations of the INSB(1) time series. (b) Empirical frequency distributions of the INSB(1) time series (parameters are

α = 0.5, a = 2, c = 1

).
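Realizations of this kind can be generated directly from Equations (3)-(5). The following is a minimal sketch using only the Python standard library, with Poisson innovations and the parameter values of Figure 1. The function names, the zero starting value, the burn-in length and the Poisson sampler (Knuth's multiplication method) are assumptions of this illustration and are not taken from the paper.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(rng, alpha, n):
    """Binomial thinning: alpha o n is a sum of n independent Bernoulli(alpha) variables."""
    return sum(1 for _ in range(n) if rng.random() < alpha)

def simulate_insb1(T, alpha, a, c, seed=0, burn_in=200):
    """Simulate (X_t, Y_t, eps_t) from Equations (3)-(5) with Poisson(a) innovations."""
    rng = random.Random(seed)
    x_prev = 0                                   # arbitrary start, forgotten during burn-in
    eps_prev2, eps_prev = poisson(rng, a), poisson(rng, a)
    xs, ys, es = [], [], []
    for t in range(T + burn_in):
        q_prev = 1 if eps_prev2 >= c else 0      # q_{t-1} = I(eps_{t-2} >= c), Equation (4)
        x = thin(rng, alpha, x_prev + q_prev * eps_prev)   # Equation (3)
        eps = poisson(rng, a)
        if t >= burn_in:
            xs.append(x)
            ys.append(x + eps)                   # Equation (5): Y_t = X_t + eps_t
            es.append(eps)
        x_prev, eps_prev2, eps_prev = x, eps_prev, eps
    return xs, ys, es

xs, ys, es = simulate_insb1(T=20000, alpha=0.5, a=2.0, c=1)
```

The three returned lists correspond to the series (X_t), (Y_t) and (ε_t); their empirical frequencies can be tabulated to compare the zero proportions of the three series.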

3. Main Properties of the INSB(1) Process

Here, some stochastic features of the INSB(1) process are examined. The following statement gives the integer-valued moving average representation of the infinite order (abbr. INMA

(\infty)

representation) of the corresponding series of this process.

Theorem 1.

Suppose that PS-innovations

(ε_{t})

, given by Equation (1), have finite first moment

μ_{ε} = μ_{ε} (a)

, which is uniformly bounded on

a \in (0, R)

. Then, the INSB(1) series

(X_{t})

, defined by Equation (3), has an INMA

(\infty)

representation:

X_{t} \overset{d}{=} \sum_{k = 1}^{\infty} α^{k} \circ ξ_{t - k},

(6)

where

ξ_{t} = q_{t} ε_{t}

,

t \in Z

. Similarly, series

(Y_{t})

, defined by Equation (5), has a representation:

Y_{t} \overset{d}{=} ε_{t} + \sum_{k = 1}^{\infty} α^{k} \circ ξ_{t - k},

(7)

where both sums above converge almost surely.

Proof.

Using a similar procedure to that in Stojanović et al. [19] (see Theorem 2.1), it can be proved that RVs

(ξ_{t})

are mutually uncorrelated with the mean and variance, respectively,

\begin{aligned} μ_{ξ} & : = E [ξ_{t}] = E [q_{t}] E [ε_{t}] = μ_{q} μ_{ε} (a) = μ_{q} a g^{'} (a), \\ σ_{ξ}^{2} & : = Var [ξ_{t}] = E [ξ_{t}^{2}] - μ_{ξ}^{2} = μ_{q} \frac{a f^{'} (a) + a^{2} f^{″} (a)}{f (a)} - μ_{q}^{2} \left(a \frac{f^{'} (a)}{f (a)}\right)^{2} \\ & = μ_{q} a \frac{f^{'} (a)}{f (a)} + μ_{q} a^{2} \frac{f^{″} (a) f (a) - μ_{q} (f^{'} (a))^{2}}{(f (a))^{2}} \\ & = μ_{ξ} + μ_{q} a^{2} [g^{″} (a) + F_{ε} (c) (g^{'} (a))^{2}], \end{aligned}

(8)

where

μ_{q} : = E [q_{t}] = P {ε_{t} \geq c} = 1 - F_{ε} (c)

and

F_{ε} (c) : = P {ε_{t} < c}

is the cumulative distribution function (CDF) of RVs

(ε_{t})

. Further, according to assumptions of the theorem, there is an

L > 0

such that

\sum_{k = 1}^{\infty} P {ε_{t} \geq k} = \sum_{k = 1}^{\infty} k P \{ε_{t} = k\} = μ_{ε} (a) \leq L < + \infty, \forall a \in (0, R) .

Based on that and independence of RVs

ε_{t}

and

q_{t}

, for each

t \in Z

, it follows that

\begin{aligned} \sum_{k = 1}^{\infty} P \{ξ_{t} \geq k\} & = \sum_{k = 1}^{\infty} P \{ε_{t} \geq k\} P \{q_{t} = 1\} = P \{ε_{t} \geq c\} \sum_{k = 1}^{\infty} k P \{ε_{t} = k\} \\ & = μ_{q} μ_{ε} (a) \leq μ_{q} L < + \infty, \end{aligned}

where the sum above converges uniformly on

a \in (0, R)

. In addition, as

1 / k

,

k = 1, 2, \dots

is a monotone and bounded sequence, the Abelian criterion for convergence of infinite sums implies that

\sum_{k = 1}^{\infty} \frac{1}{k} P {ξ_{t} \geq k} = μ_{q} \sum_{k = 1}^{\infty} \frac{1}{k} P {ε_{t} \geq k} < + \infty .

(9)

Moreover, the above convergence is uniform on

a \in (0, R)

, and using Theorem 2.1 in Alzaid and Al-Osh [22], it follows that inequality (9) is sufficient for the equality

G_{X} (u; θ) = \prod_{k = 1}^{\infty} G_{ξ} (1 + α^{k} (u - 1); θ_{ξ}),

(10)

where

G_{ξ} (u; θ_{ξ}) : = E [u^{ξ_{t}}] = 1 + μ_{q} (\frac{f (a u)}{f (a)} - 1)

(11)

is the PGF of RVs

(ξ_{t})

,

G_{X} (u; θ) : = E [u^{X_{t}}]

is the PGF of RVs

(X_{t})

, while

θ_{ξ} = {(a, μ_{q})}^{'}

and

θ : = {(a, α, μ_{q})}^{'}

are vectors of (unknown) parameters. Furthermore, the above product absolutely converges for every

u \in [- 1, 1]

, so using the bijective correspondence between PGFs and PMFs of arbitrary RVs, it follows that Equation (10) is equivalent to the INMA

(\infty)

representation in Equation (6).

To prove the almost sure convergence in Equation (6), note that using Equation (3), for any

k = 1, 2, \dots

, it holds that

X_{t} \overset{d}{=} α^{k} \circ X_{t - k} + \sum_{j = 1}^{k} α^{j} \circ ξ_{t - j} .

(12)

Now, let us consider a random event:

\begin{aligned} A & : = \{\lim_{k \to \infty} \sum_{j = 1}^{k} α^{j} \circ ξ_{t - j} = X_{t}\} = \{\lim_{k \to \infty} α^{k} \circ X_{t - k} = 0\} = \bigcap_{δ > 0} \bigcup_{n = 1}^{\infty} \bigcap_{k = n}^{\infty} \{0 \leq α^{k} \circ X_{t - k} < δ\} \\ & = \bigcup_{n = 1}^{\infty} \bigcap_{k = n}^{\infty} \{α^{k} \circ X_{t - k} = 0\} = \bigcup_{n = 1}^{\infty} A_{n}, \end{aligned}

where

A_{n} : = \bigcap_{k = n}^{\infty} \{α^{k} \circ X_{t - k} = 0\} .

According to Equation (12), for any

m, n = 1, 2, \dots

, one obtains

α^{n} \circ X_{t - n} \overset{d}{=} α^{m + n} \circ X_{t - m - n} + \sum_{j = n + 1}^{m + n} α^{j} \circ ξ_{t - j} .

From here, using the probability continuity property and the definition of the thinning operator, for events

A_{n}

, we obtain

\begin{aligned} P (A_{n}) & = P (\lim_{m \to \infty} \bigcap_{k = n}^{m + n} \{α^{k} \circ X_{t - k} = 0\}) \\ & = \lim_{m \to \infty} (P \{α^{m + n} \circ X_{t - m - n} = 0\} \times \prod_{j = n + 1}^{m + n} P \{α^{j} \circ ξ_{t - j} = 0\}) \\ & = \lim_{m \to \infty} (\sum_{k = 0}^{\infty} (1 - α^{m + n})^{k} P \{X_{t - m - n} = k\}) \times \lim_{m \to \infty} \prod_{j = n + 1}^{m + n} (\sum_{k = 0}^{\infty} (1 - α^{j})^{k} P \{ξ_{t - j} = k\}) \\ & = \lim_{m \to \infty} G_{X} (1 - α^{m + n}; θ) \times \lim_{m \to \infty} \prod_{j = n + 1}^{m + n} G_{ξ} (1 - α^{j}; θ_{ξ}) \\ & = G_{X} (1; θ) \times \prod_{j = n + 1}^{\infty} G_{ξ} (1 - α^{j}; θ_{ξ}) \\ & = \prod_{j = n + 1}^{\infty} G_{ξ} (1 - α^{j}; θ_{ξ}) . \end{aligned}

By applying again the property of continuity of probability and convergence of the product in Equation (10), it follows that

P (A) = lim_{n \to \infty} P (A_{n}) = lim_{n \to \infty} \prod_{j = n + 1}^{\infty} G_{ξ} (1 - α^{j}; θ_{ξ}) = 1,

that is, the sum in Equation (6) converges almost surely. Similarly, using the definition of the series

(Y_{t})

, given by Equation (5), the almost sure convergence in Equation (7) is proved. □

Remark 1

(PGFs of the INSB series). By applying a procedure similar to the previous theorem, as well as some general facts about PGFs of non-negative stationary integer-valued time series (see, e.g., [20]), explicit expressions for the PGFs of the INSB(1) process can be obtained. Namely, using Equations (5), (10) and (11), the first-order PGFs of the series

(X_{t})

and

(Y_{t})

are

\begin{aligned} G_{X} (u; θ) & = \prod_{k = 1}^{\infty} [1 + μ_{q} (\frac{f ((1 + α^{k} (u - 1)) a)}{f (a)} - 1)], \\ G_{Y} (u; θ) & = G_{ε} (u; θ) G_{X} (u; θ) = \frac{f (a u)}{f (a)} \prod_{k = 1}^{\infty} [1 + μ_{q} (\frac{f ((1 + α^{k} (u - 1)) a)}{f (a)} - 1)] . \end{aligned}

(13)
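The infinite products in Equations (13) are easy to evaluate numerically after truncation, since the factors approach 1 geometrically fast. The sketch below does this for Poisson innovations, where f(au)/f(a) = exp(a(u - 1)), and checks two basic PGF facts: G_X(1) = 1, and the derivative at u = 1 equals the mean α μ_q a g'(a)/(1 - α). The truncation at 200 terms, the finite-difference step and the parameter values are assumptions of this example.

```python
import math

def pgf_eps(u, a):
    """Poisson case of f(au)/f(a): exp(a(u - 1))."""
    return math.exp(a * (u - 1.0))

def pgf_x(u, a, alpha, mu_q, terms=200):
    """Truncated version of the infinite product for G_X in Equations (13)."""
    prod = 1.0
    for k in range(1, terms + 1):
        prod *= 1.0 + mu_q * (pgf_eps(1.0 + alpha**k * (u - 1.0), a) - 1.0)
    return prod

def pgf_y(u, a, alpha, mu_q, terms=200):
    """G_Y = G_eps * G_X, the second line of Equations (13)."""
    return pgf_eps(u, a) * pgf_x(u, a, alpha, mu_q, terms)

a, alpha = 2.0, 0.5
mu_q = 1.0 - math.exp(-a)            # P{eps >= c} for Poisson(a), assuming c = 1
h = 1e-5                             # step of the central finite difference
mean_x_numeric = (pgf_x(1 + h, a, alpha, mu_q) - pgf_x(1 - h, a, alpha, mu_q)) / (2 * h)
mean_x_formula = alpha * mu_q * a / (1 - alpha)   # alpha * mu_q * a * g'(a) / (1 - alpha), g'(a) = 1
```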

Furthermore, suppose that

u = {(u_{1}, \dots, u_{r})}^{'} \in R^{r}

,

r \geq 2

, and

X_{t}^{(r)} : = {(X_{t}, \dots, X_{t + r - 1})}^{'}

,

Y_{t}^{(r)} : = {(Y_{t}, \dots, Y_{t + r - 1})}^{'}

,

t \in Z

, are the overlapping blocks of series

(X_{t})

and

(Y_{t})

, respectively. The replacement

k = 1, \dots, r - 1

in Equation (12) and some calculations give the r-dimensional PGFs of random vectors

X_{t}^{(r)}

and

Y_{t}^{(r)}

as follows:

\begin{aligned} G_{X}^{(r)} (u; θ) & : = E [u_{1}^{X_{t}} \cdots u_{r}^{X_{t + r - 1}}] \\ & = G_{X} (\prod_{k = 0}^{r - 1} (1 + α^{k} (u_{k + 1} - 1)); θ) \times \prod_{ℓ = 2}^{r} [1 + μ_{q} (\frac{f (\prod_{k = 0}^{r - ℓ} (1 + α^{k + 1} (u_{k + ℓ} - 1)) a)}{f (a)} - 1)] \end{aligned}

and

\begin{aligned} G_{Y}^{(r)} (u; θ) & : = E [u_{1}^{Y_{t}} \cdots u_{r}^{Y_{t + r - 1}}] \\ & = G_{X} (\prod_{k = 0}^{r - 1} (1 + α^{k} (u_{k + 1} - 1)); θ) \times \prod_{ℓ = 1}^{r} [\frac{(1 - μ_{q}) f (a u_{ℓ}) + μ_{q} f (\prod_{k = 0}^{r - ℓ} (1 + α^{k} (u_{k + ℓ} - 1)) a)}{f (a)}] . \end{aligned}

(14)

Also, it is worth noting that by using realizations of the (only) observable series

(Y_{t})

, that is, the second-order PGF

G_{Y}^{(2)} (u; θ)

, the parameter estimators of the INSB(1) process will be obtained (see Section 4, below).

Based on the previous theorem, the following statement concerns the mean, variance and correlation structure of INSB series.

Theorem 2.

Assume that the conditions of Theorem 1 hold, as well as that series

(X_{t})

and

(Y_{t})

, given by Equations (3) and (5), respectively, have second-order finite moments. Then, both of these series are strictly stationary and ergodic. The mean and variance for the series

(X_{t})

are, respectively,

\begin{matrix} μ_{X} & : = \frac{α μ_{q}}{1 - α} a g^{'} (a), \\ σ_{X}^{2} & : = μ_{X} + \frac{α^{2} a^{2} μ_{q}}{1 - α^{2}} [g^{″} (a) + F_{ε} (c) {(g^{'} (a))}^{2}], \end{matrix}

(15)

and the autocorrelation function (ACF) is

ρ_{X} (k) = α^{k}

,

k = 0, 1, \dots

Similarly, for the series

(Y_{t})

, it is

\begin{matrix} μ_{Y} & : = \frac{(1 - α F_{ε} (c))}{1 - α} a g^{'} (a), \\ σ_{Y}^{2} & : = μ_{Y} + \frac{a^{2}}{1 - α^{2}} [g^{″} (a) (1 - α^{2} F_{ε} (c)) + α^{2} F_{ε} (c) (1 - F_{ε} (c)) {(g^{'} (a))}^{2}], \end{matrix}

(16)

and its ACF is

ρ_{Y} (k) = \begin{cases} 1, & k = 0, \\ \frac{α^{k} (σ_{X}^{2} + μ_{q} σ_{ε}^{2})}{σ_{X}^{2} + σ_{ε}^{2}}, & k = 1, 2, \dots \end{cases}

Additionally, both sums in their INMA(∞) representations, given by Equations (6) and (7), converge in the mean-square sense.

Proof.

Using Equation (6) and the well-known features of the binomial thinning operator (see, e.g., Kella & Löpker [23]), for the mean of the series

(X_{t})

, one obtains

μ_{X} = E [X_{t}] = \sum_{j = 1}^{\infty} E [α^{j} \circ ξ_{t - j}] = μ_{ξ} \sum_{j = 1}^{\infty} α^{j} = \frac{α μ_{ξ}}{1 - α} .

Analogously, the variance of this series is obtained as follows:

\begin{aligned} σ_{X}^{2} & = Var [X_{t}] = \sum_{j = 1}^{\infty} Var [α^{j} \circ ξ_{t - j}] = \sum_{j = 1}^{\infty} (α^{2 j} Var [ξ_{t - j}] + α^{j} (1 - α^{j}) E [ξ_{t - j}]) \\ & = \frac{α^{2} σ_{ξ}^{2}}{1 - α^{2}} + (\frac{α}{1 - α} - \frac{α^{2}}{1 - α^{2}}) μ_{ξ} = \frac{α μ_{ξ}}{1 - α^{2}} + \frac{α^{2} σ_{ξ}^{2}}{1 - α^{2}} . \end{aligned}

From here, by substituting Equations (8) for the mean value (

μ_{ξ}

) and variance (

σ_{ξ}^{2}

) of the series

(ξ_{t})

, and after some computations, Equations (15) are obtained. Finally, according to Equation (12), the ACF of the series

(X_{t})

is obtained from the equalities

ρ_{X} (k) : = Corr [X_{t}, X_{t + k}] = Corr [X_{t}, α^{k} \circ X_{t}] + \sum_{j = 1}^{k} Corr [X_{t}, α^{j} \circ ξ_{t + k - j}] = α^{k}, \quad k = 0, 1, \dots

Note that previously proven facts imply that RVs

(X_{t})

have finite moments up to the second order. Then, using again Equation (12) and features of the binomial thinning, one obtains

E {[X_{t} - \sum_{j = 1}^{k} α^{j} \circ ξ_{t - j}]}^{2} = E {[α^{k} \circ X_{t - k}]}^{2} = α^{2 k} E [X_{t - k}^{2}] + α^{k} (1 - α^{k}) E [X_{t - k}] ⟶ 0, k \to \infty .

Thus, the mean-square convergence of the sum in Equation (6) holds. Finally, the statement for the series

(Y_{t})

can be proved in a completely analogous way. □
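To make the formulas of Theorem 2 concrete, the following minimal sketch evaluates Equations (15) and (16) and the ACF of (Y_t) for Poisson innovations, where g'(a) = 1 and g''(a) = 0. The function name `insb1_poisson_moments`, the dictionary layout and the parameter values are assumptions of this example; the formulas themselves are those stated in the theorem.

```python
import math

def insb1_poisson_moments(a, alpha, c):
    """Evaluate Equations (15)-(16) and the ACF of Theorem 2 for Poisson(a) innovations."""
    # F_eps(c) = P{eps < c}; for Poisson, g'(a) = 1 and g''(a) = 0.
    F = sum(math.exp(-a) * a**x / math.factorial(x) for x in range(int(math.ceil(c))))
    mu_q, g1, g2 = 1.0 - F, 1.0, 0.0
    mu_eps, var_eps = a * g1, a * g1 + a**2 * g2          # Equations (2)
    mu_x = alpha * mu_q * a * g1 / (1 - alpha)
    var_x = mu_x + alpha**2 * a**2 * mu_q / (1 - alpha**2) * (g2 + F * g1**2)
    mu_y = (1 - alpha * F) * a * g1 / (1 - alpha)
    var_y = mu_y + a**2 / (1 - alpha**2) * (
        g2 * (1 - alpha**2 * F) + alpha**2 * F * (1 - F) * g1**2)

    def acf_y(k):
        return 1.0 if k == 0 else alpha**k * (var_x + mu_q * var_eps) / (var_x + var_eps)

    return {"mu_eps": mu_eps, "var_eps": var_eps, "mu_x": mu_x, "var_x": var_x,
            "mu_y": mu_y, "var_y": var_y, "acf_y": acf_y}

m = insb1_poisson_moments(a=2.0, alpha=0.5, c=1)
```

Since Y_t = X_t + ε_t, the returned values satisfy μ_Y = μ_X + μ_ε and σ_Y² = σ_X² + σ_ε², which gives a simple consistency check between Equations (15) and (16).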

According to the previous theorem, in the non-trivial case

0 < μ_{q} < 1

, the inequalities

ρ_{Y} (k) < ρ_{X} (k)

,

k = 1, 2, \dots

obviously hold. Therefore, the basic INSB series

(Y_{t})

has a weaker correlation than

(X_{t})

, i.e., compared with the correlation of INAR-based models. Also, by applying the previous results, the over-dispersion conditions for INSB series can be described as follows.

Remark 2

(Over-dispersion conditions). Recall that the over-dispersion of the PS series

(ε_{t})

depends on the function

g (a) = ln f (a)

. Namely, Equations (2) imply that

D_{ε} (a) : = σ_{ε}^{2} - μ_{ε} > 0

if and only if

g^{″} (a) > 0

,

\forall a \in (0, R) .

On the other hand, the over-dispersion conditions of the series

(ξ_{t})

are weaker, because Equations (8) imply that

D_{ξ} (a) : = σ_{ξ}^{2} - μ_{ξ} > 0

holds if and only if

g^{″} (a) \geq 0 \quad \text{or} \quad F_{ε} (c) > - \frac{g^{″} (a)}{(g^{'} (a))^{2}} .

Thereafter, using Theorem 2, that is, Equations (15), it is obtained that

D_{X} (a) : = σ_{X}^{2} - μ_{X} = \frac{α^{2} D_{ξ} (a)}{1 - α^{2}},

so the over-dispersion conditions are the same for both series

(X_{t})

and

(ξ_{t})

. Finally, Equation (16) implies that

D_{Y} (a) : = σ_{Y}^{2} - μ_{Y} > 0

if and only if

g^{″} (a) \geq 0 \quad \text{or} \quad \frac{α^{2} F_{ε} (c) (1 - F_{ε} (c))}{1 - α^{2} F_{ε} (c)} > - \frac{g^{″} (a)}{(g^{'} (a))^{2}} .

According to the inequality

α^{2} x (1 - x) / (1 - α^{2} x) < x

,

\forall α, x \in (0, 1)

, it follows that

F_{ε} (c) > \frac{α^{2} F_{ε} (c) (1 - F_{ε} (c))}{1 - α^{2} F_{ε} (c)} .

Thus, over-dispersion of the series

(Y_{t})

implies over-dispersion for

(X_{t})

, that is,

(X_{t})

has weaker over-dispersion conditions than

(Y_{t})

.
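The conditions of Remark 2 can be illustrated with under-dispersed innovations. The sketch below assumes the binomial member of the PS family, written as m(x) = C(n, x) and f(a) = (1 + a)^n, so that g'(a) = n/(1 + a), g''(a) = -n/(1 + a)² and the threshold -g''(a)/(g'(a))² equals 1/n. This parametrization, the function name and the parameter values are assumptions of this example and are not taken from Table 1.

```python
import math

def binomial_ps_dispersion(n, a, alpha, c):
    """Over-dispersion indices D_eps, D_xi, D_X, D_Y when f(a) = (1 + a)^n (binomial innovations)."""
    f_a = (1.0 + a) ** n
    # F_eps(c) = P{eps < c}, summed over the support points below c.
    F = sum(math.comb(n, x) * a**x for x in range(min(int(math.ceil(c)), n + 1))) / f_a
    mu_q = 1.0 - F
    g1 = n / (1.0 + a)               # g'(a) for g(a) = n ln(1 + a)
    g2 = -n / (1.0 + a) ** 2         # g''(a) < 0: the innovations are under-dispersed
    d_eps = a**2 * g2
    d_xi = mu_q * a**2 * (g2 + F * g1**2)                   # from Equations (8)
    d_x = alpha**2 * d_xi / (1.0 - alpha**2)
    d_y = a**2 / (1.0 - alpha**2) * (
        g2 * (1.0 - alpha**2 * F) + alpha**2 * F * (1.0 - F) * g1**2)   # from Equations (16)
    return F, d_eps, d_xi, d_x, d_y

F, d_eps, d_xi, d_x, d_y = binomial_ps_dispersion(n=10, a=0.25, alpha=0.8, c=2)
```

For these values F_ε(c) exceeds 1/n, so the series (ξ_t), (X_t) and (Y_t) are over-dispersed even though the innovations themselves are not.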

The Markov properties of the INSB(1) series, their marginal distributions and the properties of zero inflation are discussed below.

Theorem 3.

Let

(ε_{t})

be the PS-distributed series, with the PMF given by Equation (1). The martingale mean series

(X_{t})

, defined by Equation (3), is a homogeneous Markov process with one-step transition probabilities:

\begin{aligned} p_{i, j}^{(X)} : = P \{X_{t} = j | X_{t - 1} = i\} & = F_{ε} (c) \binom{i}{j} α^{j} (1 - α)^{i - j} I (i \geq j) \\ & \quad + (1 - F_{ε} (c)) \sum_{k = 0}^{\infty} \binom{i + k}{j} α^{j} (1 - α)^{i + k - j} p_{ε} (k; a) I (i + k \geq j) . \end{aligned}

(17)

Similarly, series

(Y_{t})

, given by Equation (5), is a Markov process with transition probabilities:

p_{i, j}^{(Y)} : = P \{Y_{t} = j | Y_{t - 1} = i\} = \sum_{k = 0}^{m} \binom{i}{k} α^{k} (1 - α)^{i - k} p_{η} (j - k; θ),

(18)

where

m = min {i, j}

and

p_{η} (x; θ) : = (1 - F_{ε} (c)) p_{ε} (x; a) + F_{ε} (c) \sum_{k = 0}^{\infty} \sum_{ℓ = k}^{\infty} (\binom{ℓ}{k}) α^{k} {(1 - α)}^{ℓ - k} p_{ε} (x + k; a) p_{ε} (ℓ; a)

(19)

is the PMF of the series

η_{t} : = ε_{t} - α \circ ε_{t - 1} (1 - q_{t - 1})

,

t \in Z

.

Proof.

Using Equations (3) and (4), as well as the definition of conditional probabilities and binomial thinning, for the conditional distribution of

X_{t}

on a given

X_{t - 1}

, one obtains

\begin{aligned} p_{i, j}^{(X)} & = P \{α \circ (X_{t - 1} + ξ_{t - 1}) = j | X_{t - 1} = i\} \\ & = P \{α \circ X_{t - 1} = j | X_{t - 1} = i\} P \{q_{t - 1} = 0\} + P \{α \circ (X_{t - 1} + ε_{t - 1}) = j | X_{t - 1} = i\} P \{q_{t - 1} = 1\} \\ & = F_{ε} (c) p_{i} (j; α) I (i \geq j) + (1 - F_{ε} (c)) \sum_{k = 0}^{\infty} P \{α \circ (X_{t - 1} + k) = j | X_{t - 1} = i\} p_{ε} (k; a) \\ & = F_{ε} (c) p_{i} (j; α) I (i \geq j) + (1 - F_{ε} (c)) \sum_{k = 0}^{\infty} p_{i + k} (j; α) p_{ε} (k; a) I (i + k \geq j), \end{aligned}

where

p_{i} (j; α) : = (\binom{i}{j}) α^{j} {(1 - α)}^{i - j}, j = 0, 1, \dots, i

is the PMF of binomial distribution

B (i; α)

. From here, Equation (17) is directly obtained.

On the other hand, notice that for the series

(Y_{t})

, it is valid that

\begin{aligned} Y_{t} = X_{t} + ε_{t} & = α \circ (X_{t - 1} + q_{t - 1} ε_{t - 1}) + ε_{t} = α \circ (Y_{t - 1} - ε_{t - 1} + q_{t - 1} ε_{t - 1}) + ε_{t} \\ & \overset{d}{=} α \circ Y_{t - 1} + η_{t}, \end{aligned}

(20)

where using conditional probabilities, the PMF of the series

(η_{t})

is obtained as follows:

\begin{aligned} p_{η} (x; θ) & = P \{ε_{t} = x\} P \{q_{t - 1} = 1\} + P \{ε_{t} - α \circ ε_{t - 1} = x\} P \{q_{t - 1} = 0\} \\ & = (1 - F_{ε} (c)) p_{ε} (x; a) + F_{ε} (c) \sum_{k = 0}^{\infty} P \{ε_{t} = x + k\} P \{α \circ ε_{t - 1} = k\} \\ & = (1 - F_{ε} (c)) p_{ε} (x; a) + F_{ε} (c) \sum_{k = 0}^{\infty} p_{ε} (x + k; a) \sum_{ℓ = 0}^{\infty} p_{ℓ} (k; α) p_{ε} (ℓ; a) I (ℓ \geq k) . \end{aligned}

It can easily be seen that the last obtained equality is equivalent to Equation (19). According to this, for the conditional distribution of

Y_{t}

on a given

Y_{t - 1}

, one obtains

\begin{aligned} p_{i, j}^{(Y)} & = P \{α \circ Y_{t - 1} + η_{t} = j | Y_{t - 1} = i\} = \sum_{k = 0}^{m} P \{α \circ Y_{t - 1} = k | Y_{t - 1} = i\} p_{η} (j - k; θ) \\ & = \sum_{k = 0}^{m} p_{i} (k; α) p_{η} (j - k; θ), \end{aligned}

which proves Equation (18), as well as the theorem completely. □
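The transition probabilities of the martingale means in Equation (17) can be evaluated numerically once the infinite sum is truncated. The following minimal sketch does this for Poisson innovations and checks that a row of the transition matrix sums to one. The helper names, the truncation point `k_max = 80` and the parameter values are assumptions of this example.

```python
import math

def poisson_pmf(k, a):
    """PMF of the Poisson(a) innovation, p_eps(k; a)."""
    return math.exp(-a) * a**k / math.factorial(k)

def binom_pmf(j, n, alpha):
    """PMF of the binomial B(n; alpha); zero outside 0 <= j <= n."""
    if j < 0 or j > n:
        return 0.0
    return math.comb(n, j) * alpha**j * (1.0 - alpha) ** (n - j)

def transition_x(i, j, a, alpha, c, k_max=80):
    """One-step probability P{X_t = j | X_{t-1} = i} from Equation (17), sum truncated at k_max."""
    F = sum(poisson_pmf(x, a) for x in range(int(math.ceil(c))))   # F_eps(c) = P{eps < c}
    stay = F * binom_pmf(j, i, alpha)                               # case q_{t-1} = 0
    jump = sum(binom_pmf(j, i + k, alpha) * poisson_pmf(k, a) for k in range(k_max + 1))
    return stay + (1.0 - F) * jump                                  # case q_{t-1} = 1 added

# Row i = 3 of the (truncated) transition matrix; j ranges over all reachable states.
row = [transition_x(3, j, a=2.0, alpha=0.5, c=1) for j in range(3 + 80 + 1)]
```

The indicator functions in Equation (17) are handled by `binom_pmf`, which returns zero whenever j exceeds the number of trials.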

Remark 3

(Distributional properties). Note that the first summand in Equation (17) exists if and only if martingale means

(X_{t})

pass from the state

X_{t - 1} = i

to the non-increasing state

X_{t} = j \leq i

. In addition, the transition probabilities given by Equations (17) and (18) give the marginal PMFs of the INSB series

(X_{t})

and

(Y_{t})

:

\begin{matrix} p_{X} (x; θ) : = P \{X_{t} = x\} & = \sum_{k = 0}^{\infty} P \{X_{t} = x | X_{t - 1} = k\} P \{X_{t - 1} = k\} = \sum_{k = 0}^{\infty} p_{k, x}^{(X)} p_{X} (k; θ), \\ p_{Y} (y; θ) : = P \{Y_{t} = y\} & = \sum_{k = 0}^{\infty} P \{Y_{t} = y | Y_{t - 1} = k\} P \{Y_{t - 1} = k\} = \sum_{k = 0}^{\infty} p_{k, y}^{(Y)} p_{Y} (k; θ) . \end{matrix}

Finally, using the PGFs of the INSB series

(X_{t})

and

(Y_{t})

, given by Equations (13), the PMFs of these two series can also be expressed as follows:

p_{X} (k; θ) = \frac{1}{k!} \frac{\partial^{k} G_{X} (u; θ)}{\partial u^{k}} |_{u = 0}, \quad p_{Y} (k; θ) = \frac{1}{k!} \frac{\partial^{k} G_{Y} (u; θ)}{\partial u^{k}} |_{u = 0} .

(21)

In a similar way to INAR processes with zero inflation (see, e.g., Li et al. [24,25]), the distribution of zero lengths of INSB(1) processes is examined below. For this purpose, recall that we have assumed that the condition

0 \in S

is satisfied and consider the distribution of zero “runs”, i.e., the number of consecutive zeros occurring between two non-zero values. The following statement gives the expected lengths of zero runs in the INSB(1) model.

Theorem 4.

The expected lengths of zero runs for the series

(X_{t})

and

(Y_{t})

are, respectively,

\begin{matrix} L_{0}^{(X)} & = \frac{f (a) - μ_{q} [f (a) - f (a (1 - α))]}{μ_{q} [f (a) - f (a (1 - α))]}, \\ L_{0}^{(Y)} & = \frac{p_{η} (0)}{1 - p_{η} (0)}, \end{matrix}

(22)

where

μ_{q} : = 1 - F_{ε} (c)

and

p_{η} (0)

is the proportion of zeros in the RVs

(η_{t})

.

Proof.

Using Equation (17), as well as Equation (1) for the PS distributions family, the probabilities of transitions from zero to zero and from zero to non-zero values of the series

(X_{t})

are obtained as follows:

\begin{aligned} p_{0}^{(X)} & : = P \{X_{t} = 0 | X_{t - 1} = 0\} = F_{ε} (c) + (1 - F_{ε} (c)) \sum_{k = 0}^{\infty} (1 - α)^{k} p_{ε} (k; a) \\ & = 1 + μ_{q} [\frac{1}{f (a)} \sum_{k = 0}^{\infty} m (k) (a (1 - α))^{k} - 1] \\ & = 1 + μ_{q} [\frac{f (a (1 - α))}{f (a)} - 1], \\ 1 - p_{0}^{(X)} & = μ_{q} [1 - \frac{f (a (1 - α))}{f (a)}] . \end{aligned}

It can easily be seen (see, e.g., Stojanović et al. [21]) that zero lengths of the series

(X_{t})

have a geometric distribution with parameter

1 - p_{0}^{(X)}

. Therefore, the expected length of a zero run is

L_{0}^{(X)} = \frac{p_{0}^{(X)}}{1 - p_{0}^{(X)}},

and from here the first equality in Equations (22) immediately follows. Similarly, according to Equation (18) and using the same procedure as the previous one, the second equality in (22) is easily obtained. □
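As a small worked example of the first line of Equations (22), the sketch below evaluates the expected zero-run length of (X_t) for Poisson innovations, f(a) = exp(a), and cross-checks it against the ratio p_0 / (1 - p_0) used in the proof. The function name and the parameter values (those of Figure 1) are assumptions of this example.

```python
import math

def zero_run_length_x(f, a, alpha, mu_q):
    """Expected zero-run length of (X_t) from the first line of Equations (22)."""
    gap = mu_q * (f(a) - f(a * (1.0 - alpha)))
    return (f(a) - gap) / gap

a, alpha = 2.0, 0.5
mu_q = 1.0 - math.exp(-a)                       # Poisson innovations with c = 1 (assumed)
L0 = zero_run_length_x(math.exp, a, alpha, mu_q)

# Same quantity via the zero-to-zero transition probability p_0 used in the proof.
p0 = 1.0 + mu_q * (math.exp(-a * alpha) - 1.0)
L0_from_p0 = p0 / (1.0 - p0)
```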

Remark 4

(Zero-inflation properties). Based on the previous results, the proportions of zeros in the INSB series can be easily computed. Indeed, using the definition of series

(ε_{t})

and

(ξ_{t})

, given by Definition 1 and Theorem 1, one obtains

\begin{matrix} p_{ε} (0; a) & : = P \{ε_{t} = 0\} = \frac{f (0)}{f (a)}, \\ p_{ξ} (0; θ_{ξ}) & : = P \{q_{t} ε_{t} = 0\} = P \{q_{t} = 0\} + P \{q_{t} = 1\} P \{ε_{t} = 0\} = 1 + μ_{q} [p_{ε} (0; a) - 1] . \end{matrix}

From here, we obtain

p_{ξ} (0; θ_{ξ}) - p_{ε} (0; a) = (1 - μ_{q}) [1 - p_{ε} (0; a)] > 0,

i.e., RVs

(ξ_{t})

have a more pronounced zero inflation than

(ε_{t})

. On the other hand, using the PGFs of the series

(X_{t})

and

(Y_{t})

, that is, Equations (13) and (21), the zero proportions of these two series are

\begin{matrix} p_{X} (0; θ) & : = P \{X_{t} = 0\} = G_{X} (0; θ) = \prod_{k = 1}^{\infty} [1 + μ_{q} (\frac{f (a (1 - α^{k}))}{f (a)} - 1)], \\ p_{Y} (0; θ) & : = P \{Y_{t} = 0\} = G_{Y} (0; θ) = \frac{f (0)}{f (a)} \prod_{k = 1}^{\infty} [1 + μ_{q} (\frac{f (a (1 - α^{k}))}{f (a)} - 1)] . \end{matrix}

Note that these proportions are products of convex combinations of values

f (a (1 - α^{k})) / f (a)

,

k = 0, 1, 2, \dots

, and constant 1. Since the function

f (a)

is monotonically increasing, it follows that

\frac{f (0)}{f (a)} < \frac{f (a (1 - α^{k}))}{f (a)} < 1 + μ_{q} (\frac{f (a (1 - α^{k}))}{f (a)} - 1) < 1, k = 1, 2, \dots

and this implies

p_{Y} (0; θ) < p_{ε} (0; a) < p_{X} (0; θ) .

Thus, the basic INSB series

(Y_{t})

has a less pronounced zero inflation compared with the other mentioned series. Finally, martingale means

(X_{t})

have the most pronounced zero inflation and can be an adequate stochastic model for fitting real-world series with pronounced zero values.
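The zero proportions in Remark 4 can be computed numerically by truncating the infinite product. The following minimal sketch does so for Poisson innovations with the parameter values of Figure 1 and compares the resulting proportions for these particular values only. The function name and the truncation at 200 factors are assumptions of this example.

```python
import math

def zero_proportions(f, a, alpha, mu_q, terms=200):
    """Zero proportions of eps_t, xi_t, X_t and Y_t, following Remark 4 (product truncated)."""
    p_eps = f(0.0) / f(a)
    p_xi = 1.0 + mu_q * (p_eps - 1.0)
    p_x = 1.0
    for k in range(1, terms + 1):
        p_x *= 1.0 + mu_q * (f(a * (1.0 - alpha**k)) / f(a) - 1.0)
    return p_eps, p_xi, p_x, p_eps * p_x        # p_Y(0) = p_eps(0) * p_X(0)

a, alpha = 2.0, 0.5
mu_q = 1.0 - math.exp(-a)     # Poisson innovations, c = 1 as in Figure 1
p_eps, p_xi, p_x, p_y = zero_proportions(math.exp, a, alpha, mu_q)
```

For these values the computed proportions follow the ordering p_Y(0) < p_ε(0) < p_X(0) stated above.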

4. Parameter Estimation Procedure

Due to the specific structure of the INSB(1) process, its parameter estimation procedure is more complex compared with most known count time-series models. For instance, according to Equation (20), it follows that the (only) observable series

(Y_{t})

has an INAR(1) structure, but with 1-dependent innovations

η_{t} : = ε_{t} - α \circ ε_{t - 1} (1 - q_{t - 1})

. Therefore, the conditional mean of this series is

E [Y_{t + 1} | Y_{t}] = α Y_{t} + μ_{ε} - α \circ ε_{t} (1 - q_{t}),

and it depends on realizations of (unobservable) noise indicator

q_{t} = q_{t} (c)

. Thus, some of the widely used estimation methods, e.g., the conditional least squares (CLS) method [26], as well as the conditional maximum likelihood (CML) method [27], cannot be used here. Moreover, according to Theorem 2 and Equations (15) and (16), it is clear that even some moment-based estimators, e.g., the well-known Yule–Walker (YW) estimators [28], cannot be obtained in a simple way (see Section 6).

In order to obtain efficient parameter estimators of the INSB(1) process, an estimation technique called the probability-generating function (PGF) method is examined here. It is worth noting that some general results of PGF estimation theory were given in Esquivel [29]. Thereafter, some more specific PGF estimation procedures were recently examined in Stojanović et al. [19,20,21], as well as in Cadena et al. [30]. Also, it can be emphasized that the main idea of the PGF method is close to the empirical characteristic function (ECF) estimation method introduced by Yu [31]. Similar to the ECF method, the goal of the PGF method is to minimize the “distance” between the theoretical PGF of the series

(Y_{t})

, given by Equation (14), and its appropriate empirical PGF (of order

r \in N)

:

{\tilde{G}}_{T}^{(r)} (u) : = \frac{1}{T - r + 1} \sum_{t = 1}^{T - r + 1} u_{1}^{Y_{t}} \dots u_{r}^{Y_{t + r - 1}},

(23)

where

u = {(u_{1}, \dots, u_{r})}^{'} \in R^{r}

and

{Y_{1}, Y_{2}, \dots, Y_{T}}

is a finite realization of the basic INSB(1) series

(Y_{t})

. Since the stationarity and ergodicity of the series

(Y_{t})

was proved in Theorem 2, it follows that

E [{\tilde{G}}_{T}^{(r)} (u)] = G_{Y}^{(r)} (u; θ_{0}),

(24)

where

θ_{0} \in (0, R) \times {(0, 1)}^{2}

is the true value of unknown parameters

θ = {(a, α, μ_{q})}^{'}

. Hence,

{\tilde{G}}_{T}^{(r)} (u)

is an unbiased estimator of

G_{Y}^{(r)} (u; θ_{0})

. In addition, the PGF

G_{Y}^{(r)} (u; θ)

is well defined on the set

{[- 1, 1]}^{r}

, so the objective function can be given as follows:

Q_{T}^{(r)} (θ) : = \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) {(G_{Y}^{(r)} (u; θ) - {\tilde{G}}_{T}^{(r)} (u))}^{2} d u,

(25)

where

d u : = d u_{1} \dots d u_{r}

and

w : R^{r} \to R^{+}

is a weight function, integrable on

{[- 1, 1]}^{r}

. Then, PGF estimators are usually obtained through minimization of the objective function given by Equation (25), with respect to the parameters

θ

. More precisely, PGF estimates are solutions to the equation

{\hat{θ}}_{T} = \arg min_{θ \in Θ} Q_{T}^{(r)} (θ),

(26)

where

Θ = (0, R) \times {(0, 1)}^{2}

is the parameter space of the regular and stationary INSB(1) process. To solve Equation (26), numerical integration procedures can be used, as described in Section 5. Similar to Stojanović et al. [20], the following statement examines the strict consistency and asymptotic normality (AN) of PGF estimators of the INSB(1) process under certain regularity conditions.
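The empirical PGF of order r in Equation (23) is simply an average of products over sliding windows of length r. A minimal Python sketch is given below; the function name `empirical_pgf` and the toy sample are illustrative assumptions.

```python
def empirical_pgf(y, u):
    """Empirical PGF of order r = len(u), following Equation (23).

    y : sequence of non-negative integer counts Y_1, ..., Y_T
    u : sequence (u_1, ..., u_r) of real arguments
    """
    r = len(u)
    n_windows = len(y) - r + 1          # T - r + 1 overlapping windows
    if r == 0 or n_windows <= 0:
        raise ValueError("need 1 <= r <= len(y)")
    total = 0.0
    for t in range(n_windows):
        term = 1.0
        for j in range(r):
            term *= u[j] ** y[t + j]    # u_{j+1} raised to Y_{t+j}
        total += term
    return total / n_windows


if __name__ == "__main__":
    sample = [0, 1, 2, 1]               # toy data, for illustration only
    print(empirical_pgf(sample, (0.5, 0.5)))
```

Note that Python evaluates `0 ** 0` as 1, which agrees with the usual PGF convention at u = 0.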

Theorem 5.

Let

θ_{0}

be the true value of the parameter θ and

{\hat{θ}}_{T}

,

T = 1, 2, \dots

, be solutions to Equation (26). Additionally, assume that the following regularity conditions hold:

$(A_{1})$: $θ_{0} \in Θ$ and ${\hat{θ}}_{T} \in Θ$ , for T large enough.
$(A_{2})$: The function

$Q_{0}^{(r)} (θ) : = \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) {(G_{Y}^{(r)} (u; θ) - G_{Y}^{(r)} (u; θ_{0}))}^{2} d u$

has a unique minimum $Q_{0}^{(r)} (θ_{0}) = 0$ at the point $θ = θ_{0}$ .
$(A_{3})$: $\frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ} \frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ^{'}}$ is a non-zero matrix uniformly bounded by some positive and w-integrable function $W : R^{r} \to R^{+}$ .
$(A_{4})$: $\frac{\partial^{2} Q_{T}^{(r)} (θ_{0})}{\partial θ \partial θ^{'}}$ is a regular matrix.

Then,

{\hat{θ}}_{T}

is a strictly consistent and AN estimator of the parameter θ.

Proof.

To prove the statement of theorem, we use a procedure based on some general results related to PGF estimators, described in Stojanović et al. [20]. First, the consistency of the estimator

{\hat{θ}}_{T}

should be checked. As the INSB(1) series

(Y_{t})

is ergodic, Equation (24) and the strong law of large numbers (SLLN) give

{\tilde{G}}_{T}^{(r)} (u) \overset{as}{⟶} G_{Y}^{(r)} (u; θ_{0}), T \to \infty .

(27)

Furthermore, by assumption

(A_{1})

, the set

\bar{Θ} = [0, R] \times {[0, 1]}^{2}

is compact, with

θ_{0}

belonging to its interior. Therefore, the continuous functions

G_{Y}^{(r)} (u; θ)

and

{\tilde{G}}_{T}^{(r)} (u)

are bounded on compacts

{[- 1, 1]}^{r} \times \bar{Θ}

and

{[- 1, 1]}^{r}

, respectively, so that for some

M_{1}, M_{2} > 0

the following holds:

max_{(u; θ) \in {[- 1, 1]}^{r} \times \bar{Θ}} | G_{Y}^{(r)} (u; θ) | \leq M_{1} < + \infty, max_{u \in {[- 1, 1]}^{r}} | {\tilde{G}}_{T}^{(r)} (u) | \leq M_{2} < + \infty .

According to this and similarly as in Stojanović et al. [20], it follows that

\begin{matrix} | Q_{T}^{(r)} (θ) - Q_{0}^{(r)} (θ) | & \leq (3 M_{1} + M_{2}) \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) | G_{Y}^{(r)} (u; θ_{0}) - {\tilde{G}}_{T}^{(r)} (u) | d u, \end{matrix}

which, along with Equation (27), implies that

sup_{θ \in \bar{Θ}} | Q_{T}^{(r)} (θ) - Q_{0}^{(r)} (θ) | \overset{as}{⟶} 0, T \to + \infty .

Thus,

Q_{T}^{(r)} (θ)

converges almost surely and uniformly to

Q_{0}^{(r)} (θ)

. According to this, as well as assumption

(A_{2})

and Theorem 2.1 in Newey and McFadden [32], one obtains

{\hat{θ}}_{T}^{(r)} - θ_{0} \overset{as}{⟶} 0, T \to + \infty,

that is,

{\hat{θ}}_{T}

is a strictly consistent estimator of the parameter

θ

.

In order to prove the property of AN for

{\hat{θ}}_{T}

, note that the partial derivatives up to the first two orders of the function

Q_{T}^{(r)} (θ)

are continuous functions. Therefore, they can then be differentiated under the integral sign, as follows:

\frac{\partial Q_{T}^{(r)} (θ)}{\partial θ} = 2 \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) [G_{Y}^{(r)} (u; θ) - {\tilde{G}}_{T}^{(r)} (u)] \frac{\partial G_{Y}^{(r)} (u; θ)}{\partial θ} d u,

(28)

\frac{\partial^{2} Q_{T}^{(r)} (θ)}{\partial θ \partial θ^{'}} = 2 \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) \{\frac{\partial G_{Y}^{(r)} (u; θ)}{\partial θ} \frac{\partial G_{Y}^{(r)} (u; θ)}{\partial θ^{'}} + [G_{Y}^{(r)} (u; θ) - {\tilde{G}}_{T}^{(r)} (u)] \frac{\partial^{2} G_{Y}^{(r)} (u; θ)}{\partial θ \partial θ^{'}}\} d u .

(29)

From here, taking the mean values at

θ = θ_{0}

, one obtains

E [\frac{\partial Q_{T}^{(r)} (θ_{0})}{\partial θ}] = 0_{r \times 1}, E [\frac{\partial^{2} Q_{T}^{(r)} (θ_{0})}{\partial θ \partial θ^{'}}] = 2 W,

(30)

where

W = \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) \frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ} \frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ^{'}} d u .

According to assumption

(A_{3})

, for the function

W : R^{r} \to R^{+}

, it is valid that

0 < ∥\frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ} \frac{\partial G_{Y}^{(r)} (u; θ_{0})}{\partial θ^{'}}∥ \leq W (u), \forall u \in {[- 1, 1]}^{r},

where

∥\cdot∥

is some matrix norm on

R^{r} \times R^{r}

. Therefore, it follows that

0 < ∥W∥ \leq \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) W (u) d u < + \infty,

and using Equation (30) and the SLLN, one obtains

(\frac{\partial Q_{T}^{(r)} (θ_{0})}{\partial θ}, \frac{\partial^{2} Q_{T}^{(r)} (θ_{0})}{\partial θ \partial θ^{'}}) \overset{a s}{⟶} (0, 2 W), T \to + \infty .

(31)

Further, according to Equations (23) and (28), the gradient of function

Q_{T}^{(r)} (θ)

is as follows:

\frac{\partial Q_{T}^{(r)} (θ)}{\partial θ} = \frac{2}{T - r + 1} \sum_{t = 1}^{T - r + 1} C_{t}^{(r)} (θ),

(32)

where

C_{t}^{(r)} (θ) : = \int_{- 1}^{1} \dots \int_{- 1}^{1} w (u) [G_{Y}^{(r)} (u; θ) - u_{1}^{Y_{t}} \dots u_{r}^{Y_{t + r - 1}}] \frac{\partial G_{Y}^{(r)} (u; θ)}{\partial θ} d u,

and

E [C_{t}^{(r)} (θ_{0})] = 0

holds. Using some general facts on ECF theory (see, e.g., [30,31]), absolute summability of covariance

γ_{Y} (k) : = Cov (Y_{t}, Y_{t + k})

,

k = 0, \pm 1, \dots

of the series

(Y_{t})

is sufficient for a non-zero finite value:

\begin{matrix} V^{2} & : = lim_{T \to \infty} Var [\frac{1}{\sqrt{T - r + 1}} \sum_{t = 1}^{T - r + 1} C_{t}^{(r)} (θ_{0})] = lim_{T \to \infty} \frac{1}{T - r + 1} E {[\sum_{t = 1}^{T - r + 1} C_{t}^{(r)} (θ_{0})]}^{2} \\ & = lim_{T \to \infty} \frac{1}{T - r + 1} \sum_{t = 1}^{T - r + 1} \sum_{s = 1}^{T - r + 1} Cov [C_{t}^{(r)} (θ_{0}), C_{s}^{(r)} (θ_{0})] . \end{matrix}

(33)

According to Theorem 2, for each

α \in (0, 1)

, it is valid that

\sum_{k = - \infty}^{+ \infty} γ_{Y} (k) = σ_{Y}^{2} (2 \sum_{k = 1}^{+ \infty} \frac{α^{k} (σ_{X}^{2} + μ_{q} σ_{ε}^{2})}{σ_{X}^{2} + σ_{ε}^{2}} + 1) = \frac{2 α (σ_{X}^{2} + μ_{q} σ_{ε}^{2})}{1 - α} + σ_{Y}^{2} < + \infty,

so the central limit theorem for stationary processes [33], as well as Equations (32) and (33), gives

\sqrt{T - r + 1} \frac{\partial Q_{T}^{(r)} (θ_{0})}{\partial θ} \overset{d}{⟶} N (0_{r \times 1}, 4 V^{2}), T \to + \infty,

(34)

where “d” means the convergence in distribution.

On the other hand, according to the Taylor expansion of the function

\partial Q_{T}^{(r)} (θ) / \partial θ

at

θ = θ_{0}

, it follows that

\frac{\partial Q_{T}^{(r)} (θ)}{\partial θ} = \frac{\partial Q_{T}^{(r)} (θ_{0})}{\partial θ} + \frac{\partial^{2} Q_{T}^{(r)} (θ_{0})}{\partial θ \partial θ^{'}} (θ - θ_{0}) + o (θ - θ_{0}) .

From here, using assumption

(A_{4})

and the equality

\partial Q_{T}^{(r)} ({\hat{θ}}_{T}) / \partial θ = 0

, one obtains

{\hat{θ}}_{T} - θ_{0} = - {[\frac{\partial^{2} Q_{T}^{(r)} (θ_{0})}{\partial θ \partial θ^{'}}]}^{- 1} \frac{\partial Q_{T}^{(r)} (θ_{0})}{\partial θ} + o ({\hat{θ}}_{T} - θ_{0}) .

(35)

Finally, Equations (31), (34) and (35) give

\sqrt{T - r + 1} ({\hat{θ}}_{T}^{(r)} - θ_{0}) \overset{d}{⟶} N (0_{r \times 1}, W^{- 1} V^{2} W^{- 1}), T \to + \infty,

and the theorem is fully proved. □

Remark 5.

Using similar considerations as in ECF estimation theory (see, e.g., Yu [31]), the PGF estimators of the (true) parameter

θ = θ_{0}

can be calculated using the realization of a two-dimensional random vector

Y_{t}^{(2)} : = {(Y_{t}, Y_{t + 1})}^{'}

. In that case, the objective function

Q_{T}^{(2)}

is given as a double integral that can be approximately calculated by applying some well-known cubature formulas (see the next section, Section 5). To this end, the two-dimensional PGF of the basic INSB(1) series

(Y_{t})

should be determined. By replacing

r = 2

in Equation (14) and using Equation (13), this PGF can be obtained in the following way:

\begin{matrix} G_{Y}^{(2)} (u_{1}, u_{2}; θ) & = \prod_{k = 1}^{\infty} [1 + μ_{q} (\frac{f (a (1 + α^{k} (u - 1)))}{f (a)} - 1)] \\ \times \frac{[(1 - μ_{q}) f (a u_{1}) + μ_{q} f (a u)] f (a u_{2})}{{(f (a))}^{2}}, \end{matrix}

(36)

where

u = u_{1} (1 + α (u_{2} - 1)) .

Figure 2 shows, as an illustration, the theoretical and empirical PGFs of the series

(Y_{t})

with geometrically distributed PS innovations.

Figure 2. Two-dimensional (a) theoretical PGF; (b) empirical PGF of the series

(Y_{t})

. Innovations are PS-distributed RVs with geometric distribution and parameters

a = α = μ_{q} = 0.5

(c = 1)

.

5. Numerical Simulations

In this part, numerical simulations of the PGF estimation procedure for the unknown parameters

θ = {(a, α, μ_{q})}^{'}

of the INSB(1) process were performed. To that aim, as mentioned earlier, different PS-distributed innovations

(ε_{t})

can be considered. For some practical reasons, related to the application of the INAR(1) process that will be further explained, two different PS-distributions of the innovations

(ε_{t})

were examined, one with a Poisson distribution and the other with a geometric distribution. Using Monte Carlo simulations, samples of length

T = 1000

were generated in both cases, based on 500 independent realizations of the PS series

(ε_{t})

. According to that, by applying Equations (3) and (5), respectively, the series of martingale means

(X_{t})

and the basic INSB series

(Y_{t})

were then generated. In that way, the realizations

{Y_{1}, \dots, Y_{T}}

of the INSB-based series

(Y_{t})

were obtained, on which the PGF method could be applied.

The estimators of parameters

θ = {(a, α, μ_{q})}^{'}

were calculated by minimizing the double integral

Q_{T}^{(2)} (θ) = \int_{- 1}^{1} \int_{- 1}^{1} w (u_{1}, u_{2}) {(G_{Y}^{(2)} (u_{1}, u_{2}; θ) - {\tilde{G}}_{T}^{(2)} (u_{1}, u_{2}))}^{2} d u_{1} d u_{2},

(37)

where

w : {[- 1, 1]}^{2} \to R^{+}

is the weight function, and

G_{Y}^{(2)} (u_{1}, u_{2}; θ)

is the two-dimensional PGF of the INSB(1) series

(Y_{t})

. For some specific PS-distributed innovations, this PGF can be obtained after some computation, by using Equation (36). In that way, for PS innovations with Poisson distribution, one obtains

\begin{matrix} G_{Y}^{(2)} (u_{1}, u_{2}; θ) & = \prod_{k = 1}^{\infty} [1 + μ_{q} (exp (α^{k} (u - 1) a) - 1)] exp ((u_{1} + u_{2} - 2) a) \\ \times [1 + μ_{q} (exp (α u_{1} (u_{2} - 1) a) - 1)], \end{matrix}

while for PS innovations with geometric distribution, it follows that

\begin{matrix} G_{Y}^{(2)} (u_{1}, u_{2}; θ) & = \prod_{k = 1}^{\infty} [1 + \frac{μ_{q} α^{k} (u - 1) a}{1 - (1 + α^{k} (u - 1)) a}] \cdot \frac{{(1 - a)}^{2} (1 - a (μ_{q} u_{1} + (1 - μ_{q}) u))}{(1 - a u_{1}) (1 - a u_{2}) (1 - a u)}, \end{matrix}

where

u = u_{1} (1 + α (u_{2} - 1)) .

It is worth mentioning that these PGFs are not available in closed form, but they can be easily approximated, with arbitrary precision, by truncating the infinite product after finitely many factors.
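To make the truncation concrete, the sketch below evaluates the general two-dimensional PGF of Equation (36) with the product cut off after `n_terms` factors, and compares it with the two explicit expressions given above for Poisson and geometric innovations. All function names, the default truncation length and the evaluation point are illustrative assumptions.

```python
import math


def insb_pgf2(u1, u2, a, alpha, mu_q, f, n_terms=100):
    """Equation (36) for a generic PS function f, with a truncated product."""
    u = u1 * (1.0 + alpha * (u2 - 1.0))
    fa = f(a)
    prod = 1.0
    for k in range(1, n_terms + 1):
        prod *= 1.0 + mu_q * (f(a * (1.0 + alpha ** k * (u - 1.0))) / fa - 1.0)
    tail = ((1.0 - mu_q) * f(a * u1) + mu_q * f(a * u)) * f(a * u2) / fa ** 2
    return prod * tail


def poisson_pgf2(u1, u2, a, alpha, mu_q, n_terms=100):
    """Explicit Poisson form of the two-dimensional PGF (truncated product)."""
    u = u1 * (1.0 + alpha * (u2 - 1.0))
    prod = 1.0
    for k in range(1, n_terms + 1):
        prod *= 1.0 + mu_q * (math.exp(alpha ** k * (u - 1.0) * a) - 1.0)
    prod *= math.exp((u1 + u2 - 2.0) * a)
    return prod * (1.0 + mu_q * (math.exp(alpha * u1 * (u2 - 1.0) * a) - 1.0))


def geometric_pgf2(u1, u2, a, alpha, mu_q, n_terms=100):
    """Explicit geometric form of the two-dimensional PGF (truncated product)."""
    u = u1 * (1.0 + alpha * (u2 - 1.0))
    prod = 1.0
    for k in range(1, n_terms + 1):
        step = alpha ** k * (u - 1.0)
        prod *= 1.0 + mu_q * step * a / (1.0 - (1.0 + step) * a)
    num = (1.0 - a) ** 2 * (1.0 - a * (mu_q * u1 + (1.0 - mu_q) * u))
    den = (1.0 - a * u1) * (1.0 - a * u2) * (1.0 - a * u)
    return prod * num / den


if __name__ == "__main__":
    theta = dict(a=0.5, alpha=0.5, mu_q=0.5)   # illustrative parameter values
    print(insb_pgf2(0.3, -0.4, f=math.exp, **theta),
          poisson_pgf2(0.3, -0.4, **theta))
    print(insb_pgf2(0.3, -0.4, f=lambda x: 1.0 / (1.0 - x), **theta),
          geometric_pgf2(0.3, -0.4, **theta))
```

Since the k-th factor differs from 1 by a quantity of order α^k, the truncation error decays geometrically in the number of retained factors.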

After that, the integral in Equation (37) is approximately calculated using the cubature formula:

I (h; w) : = \int_{- 1}^{1} \int_{- 1}^{1} w (u_{1}, u_{2}) h (u_{1}, u_{2}) d u_{1} d u_{2} \approx \sum_{j = 1}^{N} w_{j} h (u_{1 j}, u_{2 j}),

where

(u_{1 j}, u_{2 j})

are the cubature nodes and

w_{j}

are the weight coefficients. In this case, the Gaussian cubature formulas with

N = 36

nodes were used, where two-dimensional weights are based on the Gegenbauer orthogonal polynomials, i.e.,

w_{k} (u_{1}, u_{2}) = {((1 - u_{1}^{2}) (1 - u_{2}^{2}))}^{(k - 1) / 2},

where

k = 0, 1, 2

. Obviously, these cubatures reduce to the well-known Gauss–Chebyshev cubatures of the first type (when

k = 0

), Gauss–Chebyshev cubatures of the second type (when

k = 2

) and Gauss–Legendre cubatures (when

k = 1

). These cubatures were calculated with the software package “Orthogonal polynomials” in the Wolfram Mathematica language, authored by Cvetković and Milovanović [34]. Thereafter, minimization of the objective function given by Equation (37) was performed using the box-constrained optimization procedure “nlminb” [35] in the statistical programming language R. At the same time, realizations of a uniform distribution

U (0, 1)

were taken as the initial values of the parameters.
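The cubature step can be sketched in plain Python as follows. This is not the Mathematica package used in the study: it is a minimal illustration that assumes the N = 36 nodes form a 6 × 6 tensor product of one-dimensional Gauss rules, with the weight function absorbed into the cubature coefficients. The function names are hypothetical.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule (weight 1)."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # initial guess
        for _ in range(100):                             # Newton iterations
            p0, p1 = 1.0, x
            for k in range(2, n + 1):                    # Legendre recurrence
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)       # derivative of P_n
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def gauss_chebyshev1(n):
    """Gauss-Chebyshev rule of the first kind, weight (1 - u^2)^(-1/2)."""
    nodes = [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]
    return nodes, [math.pi / n] * n


def gauss_chebyshev2(n):
    """Gauss-Chebyshev rule of the second kind, weight (1 - u^2)^(1/2)."""
    angles = [i * math.pi / (n + 1) for i in range(1, n + 1)]
    nodes = [math.cos(t) for t in angles]
    weights = [math.pi / (n + 1) * math.sin(t) ** 2 for t in angles]
    return nodes, weights


def cubature(h, rule, n=6):
    """Tensor-product approximation of I(h; w) with n * n nodes."""
    x, w = rule(n)
    return sum(w[i] * w[j] * h(x[i], x[j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    h = lambda u1, u2: (u1 * u2) ** 2
    for rule in (gauss_chebyshev1, gauss_legendre, gauss_chebyshev2):
        print(rule.__name__, cubature(h, rule))
```

In an estimation setting, `h` would be the squared difference between the theoretical and the empirical two-dimensional PGFs from Equation (37), evaluated at the cubature nodes.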

Table 2 presents the summary statistics of the thus obtained PGF estimates, that is, their minimums (Min.), mean values (Mean), maximums (Max.), as well as the mean squared estimation errors (MSEEs), for all proposed weights and both innovation distributions considered. Additionally, the values of the objective functions

{\hat{S}}_{T}

are given, as reference errors in the estimation. From the results presented, it is seen that the PGF procedure provides efficient parameter estimates, with similar properties, for both innovation series

(ε_{t})

. Notice that slightly smaller estimation errors are observed with weight

w_{2} (u_{1}, u_{2})

. This is expected, because this weight emphasizes points around the coordinate origin. On the contrary, somewhat larger errors in the estimation are observed with weight

w_{0} (u_{1}, u_{2})

, which emphasizes the ends of the interval

[- 1, 1]

. Finally, when

k = 1

, Gauss–Legendre orthogonal polynomials with weight

w_{1} (u_{1}, u_{2}) \equiv 1

are obtained. Their accuracy is also satisfactory, and due to their simplicity, similarly to what is reported in Stojanović et al. [19,21], they will be used in some practical applications of the PGF method.

Table 2. The summary statistics, estimation errors and AN testing of parameter estimates of the INSB(1) process (true parameters are

a = α = 0.5

,

c = 1

).

In addition, Table 2 contains the AN test results, obtained with the Anderson–Darling normality test. The test statistic, denoted as AD, as well as the corresponding p-values, were calculated using the procedure from the software package “nortest” [36], within the statistical programming language R (version 4.3.2). According to the values thus obtained, it is easy to see that the property of AN is verified for most PGF estimates of parameters

θ = {(a, α, μ_{q})}^{'}

, at the significance level of

0.01 < p < 0.05

. At the same time, it is worth emphasizing that estimates of critical value c can be simply obtained from the estimates of parameter

μ_{q}

and the equality

c = inf_{x = 0, 1, \dots} \{F_{ε} (x) \geq 1 - μ_{q}\} .

(38)

6. Application of the Model

Some practical applications of the INSB process in real-world data modeling are discussed here, taking two sets of actual data. The first series, named Series A, represents the number of traffic accidents with a fatal outcome in the Republic of Serbia, collected according to the official statistics of the Office for Information Technologies and eGovernment [37], in the period from 1 January 2015 to 31 December 2021. The second one (Series B) represents the number of forest fires in Evros, one of the largest regional units of the Republic of Greece, with data taken from the official website of the Hellenic Fire Service [38], from 1 January 2019 to 31 December 2021. In that way, count time series of lengths

T_{1} = 2541

and

T_{2} = 1096

, respectively, were obtained, and their dynamics are shown in Figure 3.

Figure 3. Daily dynamics of the number of traffic accidents with fatalities in the Republic of Serbia (Series A) and the number of wild fires in the Evros region of the Republic of Greece (Series B).

Summary statistics of both series are shown in Table 3, where their specific descriptive characteristics can already be observed. For instance, with Series A, a slight over-dispersion is noticeable. Using the previous theoretical results, it appears that its dynamics can be modeled by an INSB(1) process with Poisson-distributed innovations

(ε_{t})

. Namely, in this case,

g (a) = ln f (a) = a

holds, and according to Equation (15) in Theorem 2, the over-dispersion index is

D_{Y} (a) : = σ_{Y}^{2} - μ_{Y} = α^{2} a^{2} μ_{q} F_{ε} (c) / (1 - α^{2}) > 0 .

On the other hand, when Series B is observed, it is noticeable that the over-dispersion is much more pronounced, as are the properties of zero inflation. Thus, it appears that geometrically distributed PS innovations can be used to model its dynamics.

Table 3. The summary statistics, stationarity testing and the correlation structure of real-world data.

Further, the augmented Dickey–Fuller (ADF) test was conducted, with the alternative hypothesis that the observed series are stationary, which was confirmed in both cases. Also, the estimated values of the autocorrelation functions (ACFs) of both series are decreasing, as is shown in Figure 4. In addition, it is noticeable that the decrease in both ACFs, especially for Series A, is slower than in the case of the regular INAR(1) process. In accordance with previous theoretical results, primarily Theorem 2, this suggests the possibility of modeling the dynamics of both series with the INSB(1) procedure.

Figure 4. Autocorrelation functions (ACFs) of the considered time-series data.

On the other hand, the weak correlation in Series A raises the possibility that the members of this time series are in fact uncorrelated. Therefore, the following null hypotheses are tested here:

\begin{matrix} H_{0} : ρ_{x} (k) = 0 \text{ and } ρ_{| x |} (k) = 0, & \text{at individual lag } k \geq 1, \\ H_{0} : ρ_{x} (k) = 0 \text{ and } ρ_{| x |} (k) = 0, & \text{for cumulative } k = 1, \dots, m, \end{matrix}

where

ρ_{x} (k)

and

ρ_{| x |} (k)

are the kth-order correlations of the series

(x_{t})

and

(| x_{t} - E (x_{t}) |)

, respectively. Using the results given by Dalla et al. [39], the above hypotheses are tested using the robust statistics

J_{x; | x |} (k) = \frac{n^{2}}{n - k} ({\hat{ρ}}_{x}^{2} (k) + {\hat{ρ}}_{| x |}^{2} (k)), C_{x; | x |} (m) = \sum_{k = 1}^{m} J_{x; | x |} (k),

where

{\hat{ρ}}_{x} (k)

and

{\hat{ρ}}_{| x |} (k)

are estimated kth correlations of the above series, respectively. Alternatively, the absence of correlation in the series

(x_{t})

and

{(x_{t} - E (x_{t}))}^{2}

is also tested, via the hypotheses

\begin{matrix} H_{0} : ρ_{x} (k) = 0 \text{ and } ρ_{x^{2}} (k) = 0, & \text{at individual lag } k \geq 1, \\ H_{0} : ρ_{x} (k) = 0 \text{ and } ρ_{x^{2}} (k) = 0, & \text{for cumulative } k = 1, \dots, m, \end{matrix}

where

ρ_{x} (k)

and

ρ_{x^{2}} (k)

are the appropriate kth-order correlations. Like previously, the statistics used for testing are

J_{x; x^{2}} (k) = \frac{n^{2}}{n - k} ({\hat{ρ}}_{x}^{2} (k) + {\hat{ρ}}_{x^{2}}^{2} (k)), C_{x; x^{2}} (m) = \sum_{k = 1}^{m} J_{x; x^{2}} (k) .

For all the above statistics, the following convergences hold:

J_{x; | x |} (k), J_{x; x^{2}} (k) \overset{d}{⟶} χ_{2}^{2}, C_{x; | x |} (m), C_{x; x^{2}} (m) \overset{d}{⟶} χ_{2 m}^{2},

where

χ_{2}^{2}

and

χ_{2 m}^{2}

are the RVs with chi-square distributions. Figure 5 reports the test results of both observed series, at the significance level

α = 5 %

(shown with dashed lines), obtained using the function “iid.test()” within the R package “testcorr”.
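For readers who want to see the arithmetic behind these tests, the following Python sketch computes the individual statistic J_{x;|x|}(k), the cumulative statistic as the sum of J_{x;|x|}(k) over k = 1, ..., m, and their chi-square p-values. It is not the “testcorr” implementation. It assumes that the statistics combine squared sample correlations (which is what the chi-square limits with 2 and 2m degrees of freedom require), and the function names and the toy series are hypothetical.

```python
import math


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        raise ValueError("series has zero variance")
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / denom


def chi2_sf_even(x, df):
    """Survival function of the chi-square law with an even number df of
    degrees of freedom (closed form: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def j_statistic(x, k):
    """J_{x;|x|}(k) built from the series and its absolute deviations."""
    n = len(x)
    mean = sum(x) / n
    abs_dev = [abs(v - mean) for v in x]
    return n * n / (n - k) * (autocorr(x, k) ** 2 + autocorr(abs_dev, k) ** 2)


def iid_tests(x, m):
    """Individual (lag 1..m) and cumulative tests with their p-values."""
    js = [j_statistic(x, k) for k in range(1, m + 1)]
    individual = [(j, chi2_sf_even(j, 2)) for j in js]
    c = sum(js)
    return individual, (c, chi2_sf_even(c, 2 * m))


if __name__ == "__main__":
    toy = ([0] * 4 + [1] * 4 + [5] * 4) * 10   # strongly dependent toy series
    individual, cumulative = iid_tests(toy, m=3)
    print(individual)
    print(cumulative)
```

The variant based on squared deviations is obtained by replacing `abs(v - mean)` with `(v - mean) ** 2`.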

Figure 5. Tests for i.i.d. properties of the considered time-series data: the testing of individual data ((left) diagrams); the cumulative testing of data series ((right) diagrams).

In the case of individual testing, it is obvious that the null hypothesis is rejected up to a certain correlation lag. Specifically, with Series A, the i.i.d. property appears for the first time when

k = 74

, where the appropriate p-values of the statistics

J_{x; | x |} (k)

and

J_{x; x^{2}} (k)

are equal to 0.144 and 0.114, respectively. Similarly, with Series B, the first i.i.d. case is when

k = 30

. On the other hand, following the cumulative tests, it is clear that the null hypothesis of non-correlation of the observed series does not hold at any level. Thus, the assumption about their possible modeling using the INSB(1) process can be justified.

To compare INSB(1) modeling with INAR(1) process modeling, both of these stochastic models were formed based on the observed data. First, by applying equality

Y_{t} = α \circ Y_{t - 1} + ε_{t},

where

t = 1, 2, \dots, T,

the INAR(1) model was constructed, whose parameters

(a, α)

can be estimated using some well-known procedures. Here, the CLS estimation method was applied, based on the minimization of the function

\begin{matrix} S_{T} (θ) & : = \sum_{t = 2}^{T} {(Y_{t} - E [Y_{t} | Y_{t - 1}])}^{2} = \sum_{t = 2}^{T} {(Y_{t} - α Y_{t - 1} - μ_{ε})}^{2}, \end{matrix}

(39)

and the CLS-estimates

\tilde{α}

,

{\tilde{μ}}_{ε}

, as in ordinary regression, are easily computed by differentiating Equation (39) with respect to the parameters and setting the derivatives to zero. Notice that in the case of the PS distributions family, by using the estimator

{\tilde{μ}}_{ε}

, estimates of the parameter a can also be easily determined (cf. Table 1). Hence, for the Poisson and geometric distributions, the corresponding estimates are

\tilde{a} = {\tilde{μ}}_{ε}

and

\tilde{a} = {\tilde{μ}}_{ε} / (1 + {\tilde{μ}}_{ε})

, respectively. By applying some basic results of CLS theory [40], the asymptotic properties of the CLS estimators obtained in this way can be proven. The results of this estimation procedure are given in the upper part of Table 4.
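The CLS step for the INAR(1) model amounts to a simple linear regression of Y_t on Y_{t-1}. The sketch below solves the normal equations obtained from Equation (39) and then maps the estimated innovation mean to the parameter a for the Poisson and geometric cases. The function names and the toy input are illustrative assumptions.

```python
def cls_inar1(y):
    """CLS estimates (alpha, mu_eps) minimizing
    sum_{t=2}^{T} (Y_t - alpha * Y_{t-1} - mu_eps)^2."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    prev, curr = y[:-1], y[1:]
    n = len(curr)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0.0:
        raise ValueError("lagged series has zero variance")
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    alpha = sxy / sxx                        # slope of the regression
    mu_eps = mean_curr - alpha * mean_prev   # intercept = innovation mean
    return alpha, mu_eps


def a_from_mean(mu_eps, family):
    """Map the innovation mean to the PS parameter a."""
    if family == "poisson":
        return mu_eps                        # mean of Poisson(a) is a
    if family == "geometric":
        return mu_eps / (1.0 + mu_eps)       # mean a / (1 - a) solved for a
    raise ValueError("unknown family")


if __name__ == "__main__":
    counts = [2, 1, 0, 3, 2, 2, 1, 0, 0, 1, 4, 2]   # toy data only
    alpha_hat, mu_hat = cls_inar1(counts)
    print(alpha_hat, mu_hat, a_from_mean(mu_hat, "geometric"))
```

These values can then serve as starting points for a subsequent numerical optimization.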

Table 4. Estimated parameters of the INAR(1) and INSB(1) processes, their respective estimation errors and predictive test statistics.

Taking as initial values the estimates obtained from the previous CLS procedure, the PGF method is then applied. PGF estimates of parameters

θ = {(a, α, μ_{q})}^{'}

are calculated using the previously described estimation procedure, that is, by minimizing the double integral given by Equation (37). In doing so, Gauss–Legendre orthogonal polynomials with weight

w_{1} (u_{1}, u_{2}) \equiv 1

are used, and the estimated values for both series, along with the corresponding estimated standard errors, are also presented in Table 4. Additionally, parameter estimates of critical value c can be computed using Equation (38), and for Series A and B, these values are

c = 0

and

c = 1

, respectively. Accordingly, this means that both series can be equally modeled by the regular INAR(1) or INSB(1) model, although the INSB(1) model appears to have slightly better fitting capabilities.

To confirm this fact, the efficiency of fitting is analyzed for both stochastic models and both of the mentioned estimation procedures. For this purpose, using previously obtained estimated parameter values, 500 Monte Carlo simulations of INAR(1) and INSB(1) time series are generated, while the efficiency of the fit to the real-world data used here is checked using the MSEE statistics and Akaike’s information criterion (AIC). Their average values, as well as the values of objective function

Q_{T}^{(2)} (θ)

, are presented in the middle part of Table 4. It is worth noting that fitting statistics, as well as the estimated parameter values, are close for both series, i.e., for both estimation methods and both stochastic models. However, it is noticeable that the fitting errors are slightly smaller when the INSB(1) model is applied. This is somewhat more pronounced in the case of Series B, for which it can be concluded that the INSB(1) process represents a more suitable stochastic fitting model. Some of the aforementioned facts can also be observed in the upper plots of Figure 6, where the empirical and fitted frequencies of both time series (as well as both stochastic models) are shown.

Finally, the forecast accuracy of both models is examined. To this end, the time interval from 1 January 2023 to 30 April 2023 was taken as the forecast horizon (

h = 120

). The testing procedure was carried out using a one-tailed Diebold–Mariano test of predictive accuracy [41]. More precisely, the null hypothesis was that the INAR(1) and INSB(1) models have the same predictive accuracy, while the alternative hypothesis was that the INSB(1) model has better accuracy. The test statistic, labeled DM, as well as the corresponding p-values, were calculated within the package “forecast” [42] in the statistical programming language R and are presented in the lower part of Table 4. Based on this, it can be seen that the INSB(1) model has better forecast accuracy, which is in accordance with the previous results obtained. As an illustration, the lower diagrams in Figure 6 show the frequency distribution of both models, based on forecast data.
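The logic of the one-tailed comparison can be sketched in a few lines of Python. This is a simplified illustration and not the “forecast” package routine: it assumes squared-error loss, one-step-ahead forecasts with no autocovariance correction, and a standard normal reference distribution. The function name and the toy forecast errors are hypothetical.

```python
import math


def dm_test_one_sided(errors_1, errors_2):
    """Simplified Diebold-Mariano statistic under squared-error loss.

    H0: both models have equal predictive accuracy.
    H1: model 2 is more accurate than model 1 (its loss is smaller).
    Returns (DM statistic, one-sided p-value from the normal law).
    """
    if len(errors_1) != len(errors_2) or len(errors_1) < 2:
        raise ValueError("need two error series of equal length >= 2")
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    dm = mean_d / math.sqrt(var_d / n)
    p_value = 0.5 * math.erfc(dm / math.sqrt(2.0))   # P{Z > dm}
    return dm, p_value


if __name__ == "__main__":
    e_model_1 = [2.0, 3.0] * 10     # toy forecast errors of the first model
    e_model_2 = [1.0, 1.0] * 10     # toy forecast errors of the second model
    print(dm_test_one_sided(e_model_1, e_model_2))
```

A large positive statistic with a small p-value favors the second model, which corresponds to the alternative hypothesis stated above when the first model is INAR(1) and the second is INSB(1).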

Figure 6. Plots above: Frequency distributions of the observed data fitted by the INAR(1) and INSB(1) processes. Plots below: Prediction features of INAR(1) and INSB(1) processes.

7. Conclusions

A novel count time-series model, named INSB(1) process, is presented here. Using the general form of PS-distributed innovations, as well as the noise indicator series, this stochastic model can be viewed as a generalization of some related models, primarily the INAR(1) processes. The key properties of the proposed model, as well as the procedure for estimating its parameters based on probability-generating functions (PGF method), were discussed in detail. Through Monte Carlo simulations, the consistency of PGF estimators, as well as a practical application of the INSB(1) process in real-world data fitting, was examined. In order to verify the effectiveness of the proposed model, it was applied in fitting distributions and for forecasting two accident-based real-world time series: the number of serious traffic accidents and the number of forest fires. At the same time, the INSB(1) model was also compared with the regular INAR(1) model. Based on the obtained results, that is, the fitting errors and DM-statistics presented in the previous section, the proposed model has the same or even better fitting possibilities with the observed data. According to the above, the better characteristics of the INSB model can be observed, especially in the prediction of future values of the considered time series.

Note that, as members of the PS distributions family, two special cases of the INSB(1) model, with Poisson-distributed and geometrically distributed innovations, were considered. For both, the fitting procedures were examined in various aspects, as well as prediction accuracy. The obtained results indicate the appropriateness of the proposed model, which at the same time represents a motivation for further research. This can be pursued, for instance, by defining higher-order INSB processes or, similar to the General Split-BREAK (GSB) model, by using some other discrete-type distribution for the innovation series. Finally, it should be mentioned that there are integer-valued distributions that do not belong to the family of PS distributions, which may be a limitation of the proposed model.

Author Contributions

Conceptualization, V.S.S., H.S.B. and Z.G.; methodology, V.S.S. and Z.G.; software, V.S.S., H.S.B. and K.K.; validation, H.S.B., Z.G. and F.E.A.; formal analysis, V.S.S., H.S.B. and F.E.A.; data curation, V.S.S., H.S.B. and K.K.; writing—original draft preparation, V.S.S., H.S.B. and Z.G.; writing—review and editing, Z.G., F.E.A. and K.K.; visualization, V.S.S. and K.K.; supervision, V.S.S., H.S.B. and F.E.A.; project administration, Z.G. and F.E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The official data used in the manuscript can be found on the following websites: https://data.gov.rs/sr/datasets/podatsi-o-saobratshajnim-nezgodama-po-politsijskim-upravama-i-opshtinama/ (accessed on 13 November 2023) and https://www.fireservice.gr/en_US/anoichta-dedomena (accessed on 13 November 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Graziadei, H.; Lijoi, A.; Lopes, H.F.; Marques F., P.C.; Prünster, I. Prior Sensitivity Analysis in a Semi-Parametric Integer-Valued Time Series Model. Entropy 2020, 22, 69.
Alqawba, M.; Fernando, D.; Diawara, N. A Class of Copula-Based Bivariate Poisson Time Series Models with Applications. Computation 2021, 9, 108.
Stapper, M. Count Data Time Series Modelling in Julia—The CountTimeSeries.jl Package and Applications. Entropy 2021, 23, 666.
Chesneau, C.; Bakouch, H.S.; Tomy, L.; Veena, G. A New Discrete Distribution on Integers: Analytical and Applied Study on Stock Exchange and Flood Data. J. Stat. Manag. Syst. 2022, 25, 1899–1917.
Khoo, W.C.; Ong, S.H.; Atanu, B. Coherent Forecasting for a Mixed Integer-Valued Time Series Model. Mathematics 2022, 10, 2961.
Fatemeh, G.; Hassan, B.; Kadir, K. A Pliant Model to Count Data: Nabla Poisson–Lindley Distribution with a Practical Data Example. Bull. Iran. Math. Soc. 2023, 49, 32.
Mohammadpour, M.; Bakouch, H.S.; Shirozhan, M. Poisson–Lindley INAR(1) Model with Applications. Braz. J. Probab. Stat. 2018, 32, 262–280.
Lívio, T.; Bourguignon, M.; Nascimento, F. INAR(1) Processes with Inflated-parameter Generalized Power Series Innovations. J. Time Ser. Econom. 2020, 12, 20190033.
Bermúdez, L.; Karlis, D. Multivariate INAR(1) Regression Models Based on the Sarmanov Distribution. Mathematics 2021, 9, 505.
Li, Q.; Chen, H.; Liu, X. A New Bivariate Random Coefficient INAR(1) Model with Applications. Symmetry 2022, 14, 39.
Maya, R.; Chesneau, C.; Krishna, A.; Irshad, M.R. Poisson Extended Exponential Distribution with Associated INAR(1) Process and Applications. Stats 2022, 5, 755–772.
Mohammadi, Z.; Sajjadnia, Z.; Bakouch, H.S.; Sharafi, M. Zero-and-One Inflated Poisson–Lindley INAR(1) Process for Modelling Count Time Series with Extra Zeros and Ones. J. Stat. Comput. Simulat. 2022, 92, 2018–2040.
Zeng, X.; Kakizawa, Y. Bias-Correction of Some Estimators in the INAR (1) Process. Stat. Probab. Lett. 2022, 187, 109503.
Yu, K.; Tao, T. An Observation-Driven Random Parameter INAR(1) Model Based on the Poisson Thinning Operator. Entropy 2023, 25, 859.
Stojanović, V.; Popović, B.Č.; Popović, P. Model of General Split-BREAK Process. REVSTAT Stat. J. 2015, 13, 145–168.
Stojanović, V.S.; Bakouch, H.S.; Ljajko, E.; Božović, I. Laplacian Split-BREAK Process with Application in Dynamic Analysis of the World Oil and Gas Market. Axioms 2023, 12, 622.
Jovanović, M.; Stojanović, V.; Kuk, K.; Popović, B.; Čisar, P. Asymptotic Properties and Application of GSB Process: A Case Study of the COVID-19 Dynamics in Serbia. Mathematics 2022, 10, 3849.
Ljajko, E.; Stojanović, V.S.; Tošić, M.; Božović, I. Cauchy Split-Break Process: Asymptotic Properties and Application in Securities Market Analysis. U.P.B. Sci. Bull. Ser. A Appl. Math. Phys. 2023, 85, 139–154.
Stojanović, V.; Randjelović, D.; Kuk, K. Noise-Indicator Non-negative Integer-Valued Autoregressive Time Series of the First Order. Braz. J. Probab. Stat. 2018, 32, 147–171.
Stojanović, V.; Ljajko, E.; Tošić, M. Parameters Estimation in Non-Negative Integer-Valued Time Series: Approach Based on Probability Generating Functions. Axioms 2023, 12, 112.
Stojanović, V.S.; Bakouch, H.S.; Ljajko, E.; Qarmalah, N. Zero-and-One Integer-Valued AR(1) Time Series with Power Series Innovations and Probability Generating Function Estimation Approach. Mathematics 2023, 11, 1772.
Alzaid, A.A.; Al-Osh, M.A. An Integer-Valued p^th-order Autoregressive Structure (INAR(p)) Process. J. Appl. Probab. 1990, 27, 314–324.
Kella, O.; Löpker, A. On Binomial Thinning and Mixing. Indag. Math. 2023, 34, 1121–1145.
Li, C.; Wang, D.; Zhang, H. First-Order Mixed Integer-Valued Autoregressive Processes with Zero-Inflated Generalized Power Series Innovations. J. Korean Stat. Soc. 2015, 44, 232–246.
Li, C.; Cui, S.; Wang, D. Monitoring the Zero-Inflated Time Series Model of Counts with Random Coefficient. Entropy 2021, 23, 372.
Azrak, R.; Mélard, G. Asymptotic Properties of Conditional Least-Squares Estimators for Array Time Series. Stat. Inference Stoch. Processes 2021, 24, 525–547.
Cui, Y.; Zheng, Q. Conditional Maximum Likelihood Estimation for a Class of Observation-Driven Time Series Models for Count Data. Stat. Probab. Lett. 2017, 123, 193–201.
Martin, V.L.; Tremayne, A.R.; Jung, R.C. Efficient Method of Moments Estimators for Integer Time Series Models. J. Time Ser. Anal. 2014, 35, 491–516.
Esquìvel, M.L. Some Applications of Probability Generating Function Based Methods to Statistical Estimation. Discuss. Math. 2009, 29, 131–153.
Cadena, M.; Mohammad Masjed-Jamei, M.; Omey, E.; Vesilo, R. New Bivariate Probability Models Based on Panjer-Type Relations. Bull. Cl. des Sci. Mathématiques et Nat. Sci. Mathématiques 2003, in press.
Yu, J. Empirical Characteristic Function Estimation and Its Applications. Econom. Rev. 2004, 23, 93–123.
Newey, W.K.; McFadden, D. Large Sample Estimation and Hypothesis Testing. In Handbook of Econometrics; Elsevier: Amsterdam, The Netherlands, 1994; Volume 4.
Billingsley, P. Probability and Measure; John Wiley & Sons: New York, NY, USA, 1995.
Cvetković, A.S.; Milovanović, G.V. The Mathematica Package “Orthogonal Polynomials”. Facta Univ. Ser. Math. Inform. 2004, 19, 17–36.
Gay, D.M. Usage Summary for Selected Optimization Routines. In Computing Science, Technical Report; AT&T Bell Laboratories, Murray Hill: New York, NY, USA, 1990; Volume 153.
Gross, L. Tests for Normality. R Package Version 1.0-2. Available online: http://CRAN.R-project.org/package=nortest (accessed on 3 September 2023).
The Office for Information Technologies and Egovernment, Open Data Portal. Available online: https://data.gov.rs/sr/datasets/podatsi-o-saobratshajnim-nezgodama-po-politsijskim-upravama-i-opshtinama/ (accessed on 3 September 2023).
Fire Brigade of Greece, Open Data/Datasets. Available online: https://www.fireservice.gr/en_US/anoichta-dedomena (accessed on 3 September 2023).
Dalla, V.; Giraitis, L.; Phillips, P.C.B. Robust Tests for White Noise and Cross-Correlation. Cowles Foundation. Discussion Paper No. 2194. 2020. Available online: https://cowles.yale.edu/sites/default/files/files/pub/d21/d2194-r.pdf (accessed on 18 December 2023).
Tjøstheim, D. Estimation in Non-linear Time Series Models. Stoch. Process. Appl. 1986, 21, 251–273.
Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263.
Hyndman, R. Forecasting Functions for Time Series and Linear Models. R Package Version 7.1. 2016. Available online: http://CRAN.R-project.org/package=forecast (accessed on 3 September 2023).

Figure 1. (a) Realizations of the INSB(1) time series. (b) Empirical frequency distributions of the INSB(1) time series (parameters are $α = 0.5$, $a = 2$, $c = 1$).

Figure 2. Two-dimensional (a) theoretical PGFs; (b) empirical PGF of the series $(Y_{t})$. Innovations are PS-distributed RVs with geometric distribution and parameters $a = α = μ_{q} = 0.5$ $(c = 1)$.

Figure 3. Daily dynamics of the number of traffic accidents with fatalities in the Republic of Serbia (Series A) and the number of wildfires in the Evros region of the Republic of Greece (Series B).

Figure 4. Autocorrelation functions (ACFs) of the considered time-series data.

Figure 5. Tests for i.i.d. properties of the considered time-series data: testing of individual data (left diagrams); cumulative testing of the data series (right diagrams).

Figure 6. Plots above: Frequency distributions of the observed data fitted by the INAR(1) and INSB(1) processes. Plots below: Prediction features of INAR(1) and INSB(1) processes.

Table 1. Some specific PS distributions, along with their over-dispersion indices and PGFs.

Distributions	$S$	$m (x)$	$(0, R)$	$f (a)$	$μ_{ε}$	$D_{ε} (a)$	$G_{ε} (u; a)$	$R / a$
1. Bernoulli	$\{0, 1\}$	1	$(0, \infty)$	$1 + a$	$\frac{a}{1 + a}$	$- \frac{a^{2}}{{(1 + a)}^{2}}$	$\frac{1 + a u}{1 + a}$	∞
2. Binomial	$\{0, \dots, n\}$	$\binom{n}{x}$	$(0, \infty)$	${(1 + a)}^{n}$	$\frac{n a}{1 + a}$	$- \frac{n a^{2}}{{(1 + a)}^{2}}$	${(\frac{1 + a u}{1 + a})}^{n}$	∞
3. Poisson	$\{0, \dots, \infty\}$	$\frac{1}{x!}$	$(0, \infty)$	$\exp (a)$	$a$	0	$\exp (a (u - 1))$	∞
4. Geometric	$\{0, \dots, \infty\}$	1	$(0, 1)$	$\frac{1}{1 - a}$	$\frac{a}{1 - a}$	$\frac{a^{2}}{{(1 - a)}^{2}}$	$\frac{1 - a}{1 - a u}$	$1 / a$
5. Negative binomial	$\{0, \dots, \infty\}$	$\frac{Γ (x + n)}{x! Γ (n)}$	$(0, 1)$	$\frac{1}{{(1 - a)}^{n}}$	$\frac{n a}{1 - a}$	$\frac{n a^{2}}{{(1 - a)}^{2}}$	${(\frac{1 - a}{1 - a u})}^{n}$	$1 / a$
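
As a quick numerical cross-check of the power-series (PS) form used in Table 1, the following minimal sketch builds the probabilities $p(x) = m(x) a^{x} / f(a)$ on a truncated support and computes the mean $μ_{ε}$ together with the difference $D_{ε}(a) = \mathrm{Var} - μ_{ε}$. The function name `ps_moments`, the truncation point `x_max` and the chosen values of $a$ and $n$ are illustrative assumptions, not part of the original study.

```python
import math


def ps_moments(m, f, a, x_max=400):
    """Mean and D = Var - Mean of the PS law p(x) = m(x) * a**x / f(a).

    The infinite support is truncated at x_max (an assumption of this
    sketch); the neglected tail is negligible for the values of a used here.
    """
    xs = range(x_max + 1)
    pmf = [m(x) * a**x / f(a) for x in xs]
    mean = sum(x * p for x, p in zip(xs, pmf))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, pmf))
    return mean, var - mean


n = 3  # illustrative negative binomial parameter


def nb_weight(x):
    # m(x) = Gamma(x + n) / (x! * Gamma(n)), computed on the log scale
    return math.exp(math.lgamma(x + n) - math.lgamma(x + 1) - math.lgamma(n))


# Poisson: m(x) = 1/x!, f(a) = exp(a); closed forms: mean = a, D = 0
poisson = ps_moments(lambda x: math.exp(-math.lgamma(x + 1)), math.exp, a=0.5)

# Geometric: m(x) = 1, f(a) = 1/(1 - a); closed forms: mean = a/(1 - a), D = a^2/(1 - a)^2
geometric = ps_moments(lambda x: 1.0, lambda a: 1 / (1 - a), a=0.5)

# Negative binomial: f(a) = (1 - a)^(-n); closed forms: mean = n a/(1 - a), D = n a^2/(1 - a)^2
negbin = ps_moments(nb_weight, lambda a: (1 - a) ** (-n), a=0.4)

for name, (mean, disp) in [("Poisson", poisson), ("Geometric", geometric),
                           ("Negative binomial", negbin)]:
    print(f"{name}: mean = {mean:.4f}, D = {disp:.4f}")
```

A positive value of $D_{ε}(a)$ indicates over-dispersion (geometric, negative binomial), zero indicates equi-dispersion (Poisson), and a negative value indicates under-dispersion (Bernoulli, binomial).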

Table 2. The summary statistics, estimation errors and AN testing of parameter estimates of INSB(1) process (true parameters are $a = α = 0.5$, $c = 1$).

Sample		Poisson $(μ_{q} \approx 0.3935)$			$S_{T}^{(2)}$	Geometric $(μ_{q} = 0.5)$			$S_{T}^{(2)}$
Sample		$\hat{a}$	$\hat{α}$	${\hat{μ}}_{q}$	$S_{T}^{(2)}$	$\hat{a}$	$\hat{α}$	${\hat{μ}}_{q}$	$S_{T}^{(2)}$
$w_{0} (u_{1}, u_{2})$	Min.	0.3904	0.4358	0.3677	$3.14 \times 10^{- 4}$	0.3898	0.4397	0.3950	$3.45 \times 10^{- 4}$
	Mean	0.5105	0.4989	0.3897	$9.69 \times 10^{- 3}$	0.5082	0.5019	0.4997	$3.62 \times 10^{- 3}$
	Max.	0.7619	0.5499	0.4361	$3.05 \times 10^{- 2}$	0.7142	0.5760	0.6197	$1.33 \times 10^{- 2}$
	MSEE	$4.54 \times 10^{- 3}$	$2.77 \times 10^{- 3}$	$1.09 \times 10^{- 4}$	–	$1.99 \times 10^{- 3}$	$2.27 \times 10^{- 4}$	$1.18 \times 10^{- 4}$	–
	$A D$	0.5576	0.3708	0.8099	–	0.7178	0.2457	0.5065	–
	(p-value)	(0.1303)	(0.3892)	(0.0337 *)	–	(0.0593)	(0.7524)	(0.1967)	–
$w_{1} (u_{1}, u_{2})$	Min.	0.3836	0.4523	0.3559	$1.59 \times 10^{- 4}$	0.3904	0.4255	0.4056	$1.55 \times 10^{- 4}$
	Mean	0.5091	0.5002	0.3905	$1.41 \times 10^{- 2}$	0.5091	0.5018	0.5016	$1.15 \times 10^{- 3}$
	Max.	0.7421	0.5493	0.4234	$4.18 \times 10^{- 2}$	0.6832	0.5694	0.5913	$8.01 \times 10^{- 3}$
	MSEE	$2.27 \times 10^{- 3}$	$7.65 \times 10^{- 4}$	$1.02 \times 10^{- 4}$	–	$2.06 \times 10^{- 3}$	$2.06 \times 10^{- 4}$	$1.17 \times 10^{- 4}$	–
	$A D$	0.2437	0.2861	0.3188	–	0.5911	0.2410	0.5262	–
	(p-value)	(0.7589)	(0.6178)	(0.5301)	–	(0.1191)	(0.7613)	(0.1714)	–
$w_{2} (u_{1}, u_{2})$	Min.	0.3951	0.4765	0.3807	$1.12 \times 10^{- 4}$	0.4065	0.4175	0.4070	$2.38 \times 10^{- 4}$
	Mean	0.5068	0.5012	0.3914	$8.98 \times 10^{- 3}$	0.5070	0.4987	0.5018	$1.12 \times 10^{- 3}$
	Max.	0.7581	0.5471	0.4284	$2.88 \times 10^{- 2}$	0.6486	0.5607	0.6179	$6.52 \times 10^{- 3}$
	MSEE	$2.57 \times 10^{- 3}$	$6.17 \times 10^{- 4}$	$3.08 \times 10^{- 4}$	–	$1.98 \times 10^{- 3}$	$2.09 \times 10^{- 4}$	$9.23 \times 10^{- 5}$	–
	$A D$	0.2694	0.3384	0.5972	–	0.2123	0.9120	0.3959	–
	(p-value)	(0.6497)	(0.4734)	(0.1182)	–	(0.8513)	(0.0195 *)	(0.3645)	–

* $0.01 < p < 0.05$.

Table 3. The summary statistics, stationarity testing and the correlation structure of real-world data.

Statistics	Series A	Series B
Minimum	0	0
Maximum	7	18
Mode	0	0
Median	1	0
Mean	1.261	1.116
St. deviation	1.256	1.763
Variance	1.577	3.109
Skewness	1.062	2.931
Kurtosis	4.099	14.44
ADF-test	−7.705	−5.667
(p-value)	(<0.01)	(<0.01)
ACF(1)	0.150	0.535
ACF(2)	0.119	0.427
⋯	⋯	⋯
ACF(10)	0.115	0.255
⋯	⋯	⋯
ACF(40)	0.083	0.007
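
The over-dispersion of both series can be read directly from the mean and variance rows of Table 3. The short sketch below recomputes the variance-to-mean ratio and the difference between variance and mean from the reported values; the dictionary layout and key names are assumptions made only for this illustration.

```python
# Mean and variance as reported in Table 3
series = {
    "A": {"mean": 1.261, "variance": 1.577},
    "B": {"mean": 1.116, "variance": 3.109},
}

for name, s in series.items():
    # Variance-to-mean ratio: values above 1 indicate over-dispersion
    s["ratio"] = s["variance"] / s["mean"]
    # Difference Var - Mean: positive values indicate over-dispersion
    s["excess"] = s["variance"] - s["mean"]
    print(f"Series {name}: Var/Mean = {s['ratio']:.3f}, "
          f"Var - Mean = {s['excess']:.3f}")
```

Both series have a variance larger than the mean, with Series B showing the more pronounced over-dispersion.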

Table 4. Estimated parameters of the INAR(1) and INSB(1) processes, their respective estimation errors and predictive test statistics.

Parameters/Statistics (Stand. Errors)	Series A		Series B
Parameters/Statistics (Stand. Errors)	INAR(1)	INSB(1)	INAR(1)	INSB(1)
a	1.0713 ( $5.24 \times 10^{- 2}$ )	1.0911 ( $4.07 \times 10^{- 2}$ )	0.3422 ( $7.79 \times 10^{- 3}$ )	0.3624 ( $7.47 \times 10^{- 3}$ )
$α$	0.1500 ( $2.94 \times 10^{- 2}$ )	0.1697 ( $1.96 \times 10^{- 2}$ )	0.5348 ( $4.26 \times 10^{- 3}$ )	0.4837 ( $4.06 \times 10^{- 3}$ )
$μ_{q}$	–	0.6290 ( $1.96 \times 10^{- 2}$ )	–	0.4503 ( $4.06 \times 10^{- 3}$ )
$Q_{T}^{(2)}$	$1.39 \times 10^{- 2}$	$4.63 \times 10^{- 3}$	$7.74 \times 10^{- 3}$	$3.04 \times 10^{- 3}$
MSEE	$1.89 \times 10^{- 2}$	$1.82 \times 10^{- 2}$	$2.48 \times 10^{- 2}$	$1.99 \times 10^{- 2}$
AIC	−661.95	−698.43	−531.39	−553.12
$D M$	2.6741 **		2.4683 **
(p-value)	( $3.77 \times 10^{- 3}$ )		( $6.86 \times 10^{- 3}$ )

** $p < 0.01$.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
