A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series

Ghodake, Aishwarya; Awale, Manik; Bakouch, Hassan S.; Alomair, Gadir; Daghestani, Amira F.

doi:10.3390/math13152334

Open AccessArticle

A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series

by

Aishwarya Ghodake

^1,*

,

Manik Awale

¹

,

Hassan S. Bakouch

²

,

Gadir Alomair

^3,*

and

Amira F. Daghestani

⁴

¹

Department of Statistics, Savitribai Phule Pune University, Pune 411007, India

²

Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia

³

Department of Quantitative Methods, School Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia

⁴

Department of Mathematics, College of Science and Humanities, Imam Abdulrahman Bin Faisal University, Jubail 35811, Saudi Arabia

^*

Authors to whom correspondence should be addressed.

Mathematics 2025, 13(15), 2334; https://doi.org/10.3390/math13152334

Submission received: 6 May 2025 / Revised: 10 July 2025 / Accepted: 18 July 2025 / Published: 22 July 2025

(This article belongs to the Special Issue Applied Statistics in Real-World Problems)

Download

Browse Figures

Versions Notes

Abstract

In this paper, the authors introduce the transmuted geometric integer-valued autoregressive model with periodicity, designed specifically to analyze epidemiological and public health time series data. The model uses a transmuted geometric distribution as a marginal distribution of the process. It also captures varying tail behaviors seen in disease case counts and health data. Key statistical properties of the process, such as conditional mean, conditional variance, etc., are derived, along with estimation techniques like conditional least squares and conditional maximum likelihood. The ability to provide k-step-ahead forecasts makes this approach valuable for identifying disease trends and planning interventions. Monte Carlo simulation studies confirm the accuracy and reliability of the estimation methods. The effectiveness of the proposed model is analyzed using three real-world public health datasets: weekly reported cases of Legionnaires’ disease, syphilis, and dengue fever.

Keywords:

autoregression; binomial thinning; coherent forecasting; count time series; seasonality; simulation

MSC:

62M10

1. Introduction

Statistical time series models are widely used to understand and forecast disease incidence, supporting early detection of unusual trends and aiding in public health planning. Their ability to predict future counts is especially useful in surveillance systems.

Count time series models are essential in epidemiology, particularly for monitoring disease incidence, hospital admissions, etc. These models are well-suited for non-negative integer values that appear frequently in public health and related fields. The integer-valued autoregressive (INAR) model is commonly used to describe such data. Steutel and van Harn [1] introduced the binomial thinning operator, which forms the basis of the classical INAR(1) model proposed by McKenzie [2] and Al-Osh and Alzaid [3]. Extensions such as autoregressive moving average models with negative binomial and geometric marginals were also developed by McKenzie [4], along with binomial ARMA models by Al-Osh and Alzaid [5]. Applications in medicine were discussed by Cardinal et al. [6].

Over time, several variants of the INAR model have been proposed. Freeland and McCabe [7,8] and Park and Kim [9] focused on low-count and diagnostic issues. Periodic structures were incorporated by Monteiro et al. [10] and later by Bourguignon et al. [11]. Kim and Park [12], Maiti and Biswas [13,14], and Maiti et al. [15,16] studied coherent forecasting for zero-inflated and overdispersed data. Spatial and seasonal modeling in infectious disease data were explored by Held and Paul [17], while compound Poisson INAR models were discussed by Schweer and Weiss [18].

Additional developments include models with threshold-based random coefficient INAR models by Li et al. [19] and seasonal extensions like those by Tian et al. [20] and Awale et al. [21]. Almuhayfith et al. [22] applied these ideas to influenza data.

Other studies have proposed models using negative binomial [23], geometric [24], shifted geometric [25], model with deflation or inflation of zeros [26]. Manaa and Souakri [27] proposed a zero-inflated Poisson INAR(1) model with periodic structure to handle seasonality and excess zeros in count data. The seasonal count time series model which can take care of negative dependence is developed by Kong and Lund [28]. Coherent forecasting for the overdispersed novel geometric AR(1) (NoGeAR(1)) model is discussed in [29]. Models handling underdispersion and equidispersion are covered in Bourguignon and Weiß [30], and a simple INAR(1) model with zero-distorted generalized geometric innovations was proposed by Kang et al. [31]. A comprehensive overview of modeling approaches for count time series is provided in Davis et al. [32], including models with fixed marginal distributions, state-space methods, multivariate extensions, and Bayesian techniques.

Although many distributional forms have been explored, traditional count models may still fall short in capturing asymmetry or heavy tails. The quadratic rank transmutation (QRT) technique [33] enhances flexibility by modifying standard distributions. The transmuted geometric (TGD) distribution introduced by Chakraborty and Bhati [34] adds a transmutation parameter

α

to adjust skewness and covers both underdispersion and overdispersion. This paper proposes a seasonal transmuted geometric INAR model that builds on these ideas to accurately model seasonal variation in count data time series.

The Transmuted Geometric Distribution

Let X be a non-negative random variable having transmuted geometric distribution (TGD) [34] with parameters

α

and q, then its probability mass function (pmf) is given as

P (X = x) = (1 - α) q^{x} (1 - q) + α (1 - q^{2}) q^{2 x}, x = 0, 1, 2 \dots, 0 < q < 1, - 1 \leq α \leq 1 .

(1)

This model is a finite mixture of two geometric distributions: The first component follows a geometric distribution with parameter q (

G D (q)

) and mixing proportion

(1 - α)

. The second component follows a geometric distribution with the parameter

q^{2}

(

G D (q^{2})

) and the mixing proportion

α

.

Also, we can see this distribution as

P (X = x) = \frac{w (X) \times G D (q)}{E [w (X)]},

which represents a weighted geometric distribution with a weight function given by

w (X) = (1 - α) + α (1 + q) q^{x} .

The expectation of

w (X)

is computed as

E [w (X)] = \sum_{x = 0}^{\infty} [(1 - α) + α (1 + q) q^{x}] (1 - q) q^{x} = 1 .

Note that in (1), when

α = 0

, the transmuted geometric distribution reduces exactly to the ordinary geometric distribution. The extra parameter

α

effectively mixes the geometric distribution with a second weighted component, so by adjusting

α

we can widen or narrow the tail of the distribution to capture under/overdispersion in the data. It can model a variety of count data patterns by changing

α

.

The probability generating function (pgf) of X is given by

Φ_{X} (γ) = \frac{(1 - q) (1 - α q (- 1 + γ) - q^{2} γ)}{(1 - q γ) (1 - q^{2} γ)}, | q γ | < 1 .

Using the above pgf, we can derive the mean and variance of X as follows:

Φ_{X}^{'} (1) = \frac{q (1 - α + q)}{(1 - q^{2})}, Φ_{X}^{″} (1) = \frac{2 q^{2} [{(1 + q)}^{2} - α (1 + 2 q)]}{{(1 - q^{2})}^{2}} .

Here,

Φ_{X}^{'} (1)

and

Φ_{X}^{″} (1)

are the first and second derivatives of the pgf of X evaluated at

γ = 1

.

The variance of X is given as

Var (X) = Φ_{X}^{″} (1) + Φ_{X}^{'} (1) - {(Φ_{X}^{'} (1))}^{2} .

Using this, the mean and variance of X are

μ_{x} = E (X) = \frac{q (1 - α + q)}{(1 - q^{2})} and σ_{x}^{2} = V a r (X) = \frac{q [- α^{2} q + {(1 + q)}^{2} - α (1 + q^{2})]}{{(1 - q^{2})}^{2}},

respectively.

The variances of geometric and transmuted geometric distributions are given as

V a r (X_{G e o m}) = \frac{q}{{(1 - q)}^{2}} and V a r (X_{T G D}) = \frac{q [- α^{2} q + {(1 + q)}^{2} - α (1 + q^{2})]}{{(1 - q^{2})}^{2}} .

(2)

Skewness (Sk) and kurtosis (Kurt) of the geometric and transmuted geometric distributions are given as

Sk (X_{G e o m}) = \frac{1 + q}{\sqrt{q}}, and Kurt (X_{G e o m}) = 7 + \frac{1}{q} + q

(3)

Sk (X_{T G D}) = \frac{q (2 α^{3} q^{2} - {(1 + q)}^{4} + 3 α^{2} (q + q^{3}) + α (1 + 4 q^{2} + q^{4}))}{{[q {(1 + q)}^{2} - α q (1 + q^{2}) - α^{2} q^{2}]}^{\frac{3}{2}}} .

(4)

Kurt (X_{T G D}) = - 3 - \frac{12 q {(1 + q)}^{2}}{{(α - 1 + q [q (α - 1) + α^{2} - 2)]}^{2}} - \frac{(1 + q (4 + q)) (1 + q (8 + q))}{q (α - 1 + q [q (α - 1) + α^{2} - 2])} .

(5)

In Figure 1, we compare the variance, skewness, and kurtosis of the transmuted geometric distribution (red) with the baseline geometric distribution (blue) for different values of

α

(

- 0.99

,

0.50

,

0.99

). The TGD exhibits a higher variance than the geometric distribution for positive

α

(

0.50

,

0.99

). For negative

α

(

- 0.99

), it aligns closely with the geometric distribution. Skewness decreases as q increases, indicating a reduced asymmetry. Skewness remains higher for the TGD at positive

α

, especially for small q, and remains close to the geometric distribution for negative

α

. Kurtosis also decreases as q increases, with the TGD showing a higher peakedness for positive

α

and a closer resemblance to the geometric distribution for negative

α

. These results show that

α

controls the spread (variability) and skewness (asymmetry).

We examine the quantiles of the geometric distribution

G D (q)

and the transmuted geometric distribution

T G D (q, α)

with

q = 0.80

:

For $α = - 0.98$ , the $99.99$ th quantile is 41 for $G D$ and 44 for $T G D$ , indicating a heavier tail for the transmuted geometric distribution.
For $α = 0.98$ , the $99.99$ th quantile is 41 for $G D$ and 24 for $T G D$ , showing a lighter tail for the transmuted geometric distribution.

Thus, the transmutation parameter

α

controls the tail behavior of the distribution, with negative

α

leading to heavier tails and positive

α

to lighter tails.

Figure 2 presents simulated data for a transmuted geometric distribution (TGD) and its geometric counterpart (

α = 0

), using a fixed value of

q = 0.6

. The figure includes plots for three different values of the transmutation parameter

α

: −0.99, 0.50, and 0.99. Each line highlights the distinct behavior of the geometric and transmuted geometric distributions under varying

α

. The plots clearly capture how the parameter

α

influences the shape and spread of the pmf. For negative values of

α

(plots (a) and (b)), the distribution becomes more dispersed, with higher probabilities for larger values of x. In contrast, positive values of

α

(plots (c) and (d)) result in a more concentrated distribution, with a sharp peak near smaller values of x. The neutral case (

α = 0

, plot (e)) represents the geometric distribution (baseline). This shows that

α

controls the concentration and dispersion of probabilities in the distribution.

The paper is organized as follows. Section 2 introduces the TGDINAR(1) model and discusses its basic properties. The estimation of parameters is presented in Section 3. The analysis of real-world datasets is given in Section 4. Section 5 concludes the paper.

2. Construction and Properties of the Process

In this section, we introduce a transmuted geometric INAR process with the period ‘s’ to handle stationary non-negative seasonal integer-valued time series. This model is based on the binomial thinning operator ‘∗’, which is defined as

λ * X = \sum_{i = 1}^{X} W_{i},

where

W_{i}

’s are i.i.d. Bernoulli random variables with parameter

λ

. The pgf of W is

Φ_{W} (γ) = (1 - λ + λ γ), | γ | \leq 1 .

The random variables

{W_{i}}

are mutually independent of X. The transmuted geometric INAR model with period ‘s’ is defined as follows.

Definition 1.

A discrete-time non-negative integer-valued stochastic process

{X_{t}}

is said to be a transmuted geometric INAR process with period s (TGDINAR(1)s) based on a binomial thinning operator if the following conditions are satisfied:

X_{t} = λ * X_{t - s} + ϵ_{t}, 0 \leq λ < 1; t \geq s .

(6)

1.: ${X_{t}}$ is a sequence of non-negative integer-valued random variables, each marginally following a transmuted geometric distribution with parameters $(q, α)$ .
Remark: When $s = 1$ and $α = 0$ , this model reduces to the first-order geometric INAR model (GINAR(1)) [4].
2.: $s \in N$ denotes the seasonal period (with $s \geq 1$ ), where $N$ is the set of positive integers.
3.: ${ϵ_{t}}$ is a sequence of innovation random variables (i.i.d.) with non-negative integer values, independent of past values of ${X_{t}}$ and the thinning variables.
4.: The thinning variables involved in $λ * X_{t - s}$ are mutually independent and are also independent of the random variables $X_{t - s}, X_{t - 2 s}, \dots$ as well as the innovation sequence ${ϵ_{t}}$ .

Using the pgf of

X_{t}

and W, the pgf of

λ * X_{t - s}

is given as

Φ_{λ * X_{t - s}} (γ) = Φ_{X_{t - s}} (Φ_{W} (γ)) = \frac{(1 - q) [1 - α q λ (γ - 1) - q^{2} (1 - λ + λ γ)]}{[1 - q (1 - λ + λ γ)] [1 - q^{2} (1 - λ + λ γ)]} .

(7)

In addition, the pgf of the random variable

ϵ_{t}

is given as

\begin{matrix} Φ_{ϵ_{t}} (γ) & = \frac{Φ_{X_{t}} (γ)}{Φ_{λ * X_{t - s}} (γ)} \\ = \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ (1 - γ))) [1 - q^{2} (1 - λ (1 - γ))]}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ (γ - 1) - q^{2} (1 - λ (1 - γ))]}, | q γ | < 1 . \end{matrix}

The pgf of the innovation term

ϵ_{t}

can also be written as

\begin{matrix} Φ_{ϵ_{t}} (γ) & = λ + (1 - λ) [\frac{(1 - q) (1 - α) (1 + q (1 - λ))}{(1 - q γ) (1 + q (1 - λ) - α λ)} + \frac{α (1 - q^{2}) ((1 - λ) q - λ)}{(1 - γ q^{2}) ((1 - λ) q - α λ)} \\ + \frac{(1 - α) α λ (1 - q^{2})}{(1 + q (1 - λ) - α λ) ((1 - λ) q - α λ) (1 - q^{2} + q (q + α) (1 - γ) λ)}] . \end{matrix}

(8)

The innovation term is distributed as a degenerated 0,

geometric (q), geometric (q^{2}), geometric (\frac{λ q (α + q)}{q^{2} - λ q (α + q) - 1})

with probabilities

λ, \frac{(1 - α) (1 - λ) ((1 - λ) q + 1)}{1 + α λ + (1 - λ) q},

\frac{(1 - α) α (1 - λ)}{1 + α λ + (1 - λ) q}, \frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)}

, respectively.

Hence, the pmf of the innovations

{ϵ_{t}}

is given as follows:

\begin{matrix} P (ϵ_{t} = ϵ) & = λ I_{{0}} + \frac{(1 - q) q^{ϵ} [(1 - α) (1 - λ) ((1 - λ) q + 1)]}{1 - α λ + (1 - λ) q} \\ + \frac{(1 - α) α (1 - λ) λ}{(- α λ + (1 - λ) q + 1) ((1 - λ) q - α λ)} (1 - \frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)}) \times \\ {(\frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)})}^{ϵ} + \frac{(1 - q^{2}) q^{2 ϵ} (α (1 - λ) ((1 - λ) q - λ))}{(1 - λ) q - α λ} . \end{matrix}

(9)

Alternatively, the pmf of

ϵ_{t}

can be written as

P (ϵ_{t} = ϵ) = \{\begin{matrix} \frac{(α q + 1) (1 - (1 - λ) q) (1 - (1 - λ) q^{2})}{1 - (1 - λ) q^{2} + α λ q}, & ϵ = 0, \\ \frac{(1 - q) q^{ϵ} [(1 - α) (1 - λ) ((1 - λ) q + 1)]}{1 - α λ + (1 - λ) q} + \frac{(1 - α) α (1 - λ) λ}{(1 - α λ + (1 - λ) q) ((1 - λ) q - α λ)} \\ \times (1 - \frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)}) {(\frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)})}^{ϵ} \\ + \frac{(1 - q^{2}) q^{2 ϵ} (α (1 - λ) ((1 - λ) q - λ))}{(1 - λ) q - α λ}, & ϵ = 1, 2, 3, \dots \end{matrix}

(10)

Thus, the mean and variance of the innovation sequence

{ϵ_{t}}

, respectively, are given as

\begin{matrix} μ_{ϵ} = E (ϵ_{t}) = \frac{q (1 - λ) (1 - α + q)}{(1 - q^{2})}, \end{matrix}

and

\begin{matrix} σ_{ϵ}^{2} = V (ϵ_{t}) = \frac{q (λ - 1)}{{(1 - q^{2})}^{2}} [α + q α^{2} (1 + λ) + q^{2} α (1 + 2 λ) - {(1 + q)}^{2} (1 + λ q)] . \end{matrix}

Theorem 1.

The TGDINAR(1) process has a unique stationary solution given by

\begin{matrix} X_{t} \overset{d}{=} \sum_{j = 0}^{\infty} λ^{(j)} * ϵ_{t - j s} . \end{matrix}

The symbol

\overset{d}{=}

stands for equality in distribution, when

j \geq 1

,

λ^{(j)} ≜ \underset{\underset{j times}{⏟}}{λ * λ * \dots * λ}

, and

λ^{(0)} * ε_{t} ≜ ε_{t}

. For all

t \in N_{0}

, the infinite sum is interpreted as the limit in probability of its corresponding finite sums.

The proof of this theorem is deferred to the Appendix A.1.

Theorem 2.

One-step-ahead conditional distribution of the process

{X_{t}}

defined in (6) is given by

\begin{matrix} P (X_{t} = y ∣ X_{t - s} = x) = \\ \{\begin{matrix} \frac{{(1 - λ)}^{x} (- 1 - α q) A C}{R}, i f x \geq 0, y = 0, \\ \frac{{(1 - λ)}^{x}}{R} [- λ q (1 + α q) (A q + C) - A C (1 + α q) (q + q^{2}) + A C (α q + q^{2}) \\ - A C (1 + α q) x λ_{1} + \frac{A C (1 + α q) (α λ q + λ q^{2})}{R}], i f x \geq 0, y = 1, \\ \frac{{(1 - λ)}^{x} q^{2} A C}{R} [(α + q) (1 + q) - (1 + α q) (1 + q + q^{2})] + \frac{{(1 - λ)}^{x} q}{R} \\ [α + q + (- 1 - α q) (1 + q)] \times [λ q (A q + C) + A C x λ_{1} - \frac{A C (α λ q + λ q^{2})}{R}] \\ + \frac{{(1 - λ)}^{x} (- 1 - α q)}{R} {λ^{2} q^{3} + λ q (A q + C) \times (\frac{- (α λ q + λ q^{2})}{R} + λ_{1} x) \\ + A C [\frac{{(α λ q + λ q^{2})}^{2}}{R^{2}} - \frac{λ_{1} (α λ q + λ q^{2}) x}{R} + λ_{1}^{2} (\binom{x}{2})]}, i f x \geq 0, y = 2, \\ \frac{{(1 - λ)}^{x} q^{y} A C}{R} [(α + q) \sum_{j = 0}^{y - 1} q^{j} + (- 1 - α q) \sum_{j = 0}^{y} q^{j}] + \frac{{(1 - λ)}^{x} q^{y - 1}}{R} [(α + q) \sum_{j = 0}^{y - 2} q^{j} \\ - (1 + α q) \sum_{j = 0}^{y - 1} q^{j}] \times [λ q (A q + C) + A C (\frac{- (α λ q + λ q^{2})}{R} + λ_{1} x)] + \sum_{i = 2}^{y - 1} \frac{{(1 - λ)}^{x} q^{y - i}}{R} \times \\ ((α + q) \sum_{j = 0}^{y - i - 1} q^{j} + (- 1 - α q) \sum_{j = 0}^{y - i} q^{j}) \times [λ^{2} q^{3} \sum_{j = 0}^{i - 2} {(- 1)}^{i - j - 2} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{i - j - 2} (\binom{x}{j}) \\ + (λ q^{2} A + λ q C) \times \sum_{j = 0}^{i - 1} {(- 1)}^{i - j - 1} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{i - j - 1} (\binom{x}{j}) \\ + A C \sum_{j = 0}^{i} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{i - j} (\binom{x}{j}) {(- 1)}^{i - j}] - \frac{{(1 - λ)}^{x} (1 + α q)}{R} \times \\ {λ^{2} q^{3} \sum_{j = 0}^{y - 2} {(- 1)}^{y - j - 2} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{y - j - 2} (\binom{x}{j}) + (λ q^{2} A + λ q C) \times \\ \sum_{j = 0}^{y - 1} {(- 1)}^{y - j - 1} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{y - j - 1} (\binom{x}{j}) + A C \sum_{j = 0}^{y} {(- 1)}^{y - j} λ_{1}^{j} {(\frac{α λ q + λ q^{2}}{R})}^{y - j} (\binom{x}{j})}, \\ i f x \geq 0, y \geq 3, \end{matrix} \end{matrix}

where

A = - 1 + q - λ q

,

C = - 1 + q^{2} - λ q^{2}

,

R = - 1 - α λ q - (λ - 1) q^{2}

,

λ_{1} = \frac{λ}{1 - λ}

.

One-step-ahead conditional mean and variance of the process

{X_{t}}

are given as

E (X_{t} | X_{t - s}) = λ X_{t - s} + \frac{q (1 - λ) (1 - α + q)}{(1 - q^{2})},

and

V a r (X_{t} | X_{t - s}) = λ (1 - λ) X_{t - s} + \frac{q (λ - 1)}{{(1 - q^{2})}^{2}} [α + q α^{2} (1 + λ) + q^{2} α (1 + 2 λ) - {(1 + q)}^{2} (1 + λ q)] .

Using recursion in Equation (6), we can write

X_{t + k} = λ^{(m)} * X_{t - r} + \sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{t - j s}

(11)

where

m = ⌈\frac{k}{s}⌉

and

r = m s - k

, where

⌈ \cdot ⌉

denotes the ceiling function (i.e., the smallest integer greater than or equal to ‘.’.

The proof of this theorem is deferred to the Appendix A.2.

Theorem 3.

Let

{X_{t}}

be a

T G D I N A R {(1)}_{s}

process defined as in (6). Then, the k-step-ahead conditional pmf of

X_{t + k}

given

X_{t - r}

is given by

\begin{matrix} P (X_{t + k} = y ∣ X_{t - r} = x) = \\ \{\begin{matrix} \frac{{(1 - λ^{m})}^{x} (- 1 - α q) A C}{R}, if x \geq 0, y = 0, \\ \frac{{(1 - λ^{m})}^{x}}{R} [- λ^{m} q (1 + α q) (A q + C) - A C (1 + α q) (q + q^{2}) + A C (α q + q^{2}) \\ - A C (1 + α q) x λ_{1} + \frac{A C (1 + α q) (α λ^{m} q + λ^{m} q^{2})}{R}], if x \geq 0, y = 1, \\ \frac{{(1 - λ^{m})}^{x} q^{2} A C}{R} [(α + q) (1 + q) - (1 + α q) (1 + q + q^{2})] + \frac{{(1 - λ^{m})}^{x} q}{R} [α + q + (- 1 - α q) (1 + q)] \times \\ [λ^{m} q (A q + C) + A C x λ_{1} - \frac{A C λ^{m} q (α + q)}{R}] - \frac{{(1 - λ^{m})}^{x} (1 + α q)}{R} {λ^{2 m} q^{3} + λ^{m} q (A q + C) \times \\ (\frac{- λ^{m} q (α + q)}{R} + λ_{1} x) + A C [\frac{λ^{2 m} q^{2} {(α + q)}^{2}}{R^{2}} - \frac{λ_{1} λ^{m} q (α + q) x}{R} + λ_{1}^{2} (\binom{x}{2})]}, if x \geq 0, y = 2, \\ \frac{{(1 - λ^{m})}^{x} q^{y} A C}{R} [(α + q) \sum_{j = 0}^{y - 1} q^{j} - (1 + α q) \sum_{j = 0}^{y} q^{j}] + \frac{{(1 - λ^{m})}^{x} q^{y - 1}}{R} [(α + q) \sum_{j = 0}^{y - 2} q^{j} - (1 + α q) \sum_{j = 0}^{y - 1} q^{j}] \times \\ [λ^{m} q (A q + C) + A C (\frac{- λ^{m} q (α + q)}{R} + λ_{1} x)] + \sum_{i = 2}^{y - 1} \frac{{(1 - λ^{m})}^{x} q^{y - i}}{R} ((α + q) \sum_{j = 0}^{y - i - 1} q^{j} - (1 + α q) \sum_{j = 0}^{y - i} q^{j}) \\ \times [λ^{2 m} q^{3} \sum_{j = 0}^{i - 2} {(- 1)}^{i - j - 2} λ_{1}^{j} {(\frac{λ^{m} q (α + q)}{R})}^{i - j - 2} (\binom{x}{j}) + λ^{m} q (q A + C) \times \sum_{j = 0}^{i - 1} {(- 1)}^{i - j - 1} λ_{1}^{j} \times \\ {(\frac{λ^{m} q (α + q)}{R})}^{i - j - 1} (\binom{x}{j}) + A C \sum_{j = 0}^{i} λ_{1}^{j} {(\frac{λ^{m} q (α + q)}{R})}^{i - j} (\binom{x}{j}) {(- 1)}^{i - j}] - \frac{{(1 - λ^{m})}^{x} (1 + α q)}{R} \times \\ {λ^{2 m} q^{3} \sum_{j = 0}^{y - 2} {(- 1)}^{y - j - 2} λ_{1}^{j} {(\frac{λ^{m} q (α + q)}{R})}^{y - j - 2} (\binom{x}{j}) + λ^{m} q (q A + C) \times \\ \sum_{j = 0}^{y - 1} {(- 1)}^{y - j - 1} λ_{1}^{j} {(\frac{λ^{m} q (α + q)}{R})}^{y - j - 1} (\binom{x}{j}) + A C \sum_{j = 0}^{y} {(- 1)}^{y - j} λ_{1}^{j} {(\frac{λ^{m} q (α + q)}{R})}^{y - j} (\binom{x}{j})}, \\ i f x \geq 0, y \geq 3, \end{matrix} \end{matrix}

where

A = - 1 + q - λ^{m} q

,

C = - 1 + q^{2} - λ^{m} q^{2}

,

R = - 1 - α λ^{m} q - (λ^{m} - 1) q^{2}

,

λ_{1} = \frac{λ^{m}}{(1 - λ^{m})}

.

The proof of this theorem is deferred to the Appendix A.3.

Corollary 1.

Suppose

X_{t + k}

and

X_{t - r}

satisfy the process given in (6), and

k \in N

. Then we have the following results:

1.: The conditional expectation of $X_{t + k}$ given $X_{t - r}$ is

$E (X_{t + k} | X_{t - r}) = λ^{m} X_{t - r} + \frac{q (1 - λ^{m}) (1 - α + q)}{(1 - q^{2})} .$

(12)

when k $\to \infty,$ implies $E (X_{t + k} | X_{t - r})$ $\to μ_{x},$ which is the unconditional mean.
2.: The conditional variance of $X_{t + k}$ given $X_{t - r}$ is

$\begin{matrix} V a r (X_{t + k} | X_{t - r}) & = \frac{(1 - λ^{m})}{{(1 - q^{2})}^{2}} [(1 - α) q - (α - 1) (1 + 2 λ^{m}) q^{3} + λ^{m} X_{t - r} \\ + λ^{m} q^{4} (1 + X_{t - r}) - q^{2} (- 2 + α^{2} (1 + λ^{m}) + λ^{m} (2 X_{t - r} - 1))] . \end{matrix}$

(13)

when k $\to \infty,$ implies $V a r (X_{t + k} | X_{t - r})$ $\to σ_{x}^{2},$ which is the unconditional variance.
3.: The autocovariance function of the process is given by

$\begin{matrix} Cov (X_{t}, X_{t - k s}) & = \frac{λ^{k} q [- α^{2} q + {(1 + q)}^{2} - α (1 + q^{2})]}{{(1 - q^{2})}^{2}} \\ = λ^{k} σ_{x}^{2}, k = 0, 1, 2, \dots \end{matrix}$

(14)

Proof.

The proof of this corollary is deferred to the Appendix B. □

3. Estimation of Parameters

In this section, two methods for estimating the parameters of the TGDINAR(1)s process with seasonal period s are discussed: the conditional least squares (CLS) method and the conditional maximum likelihood (CML) method.

3.1. Conditional Least Squares (CLS) Estimation

To obtain the conditional least squares (CLS) estimators of the parameters

θ = {(α, q, λ)}^{'}

, we differentiate the objective function

Q_{n} (θ)

with respect to each parameter (

α

, q, and

λ

) and equate the resulting expressions to zero. The objective is to determine the values of

α

, q, and

λ

that minimize

Q_{n} (θ)

. The objective function is given by

Q_{n} (θ) = \sum_{t = s + 1}^{n} {[X_{t} - E (X_{t} | X_{t - s})]}^{2} = \sum_{t = s + 1}^{n} {(X_{t} - λ X_{t - s} - \frac{q (1 - λ) (1 - α + q)}{1 - q^{2}})}^{2} .

(15)

The CLS estimators are obtained by solving the system of equations

\frac{\partial Q_{n} (θ)}{\partial λ} = 0, \frac{\partial Q_{n} (θ)}{\partial q} = 0, \frac{\partial Q_{n} (θ)}{\partial α} = 0 .

The estimator

{\hat{λ}}_{C L S}

is derived by differentiating

Q_{n} (θ)

with respect to

λ

and solving for

λ

. The closed-form solution for

{\hat{λ}}_{C L S}

is

{\hat{λ}}_{C L S} = \frac{(n - s) \sum_{t = s + 1}^{n} X_{t} X_{t - s} - \sum_{t = s + 1}^{n} X_{t} \sum_{t = s + 1}^{n} X_{t - s}}{(n - s) \sum_{t = s + 1}^{n} X_{t - s}^{2} - {(\sum_{t = s + 1}^{n} X_{t - s})}^{2}} .

(16)

Substituting

{\hat{λ}}_{C L S}

into

Q_{n} (θ)

yields the function

Q_{n} (α, q)

, which depends only on

α

and q. Since closed-form solutions for

{\hat{α}}_{C L S}

and

{\hat{q}}_{C L S}

are not available, numerical optimization techniques are used to obtain the estimates.

3.2. Conditional Maximum-Likelihood (CML) Estimation

In this subsection, we obtain the estimates using the conditional maximum-likelihood (CML) method. The CML estimator of

θ = {(α, q, λ)}^{'}

is the value

θ_{C M L}

that maximizes the likelihood function. Since the likelihood is not tractable, the maximum-likelihood estimate of

θ

is computed using numerical methods. The log-likelihood function is defined as

log L (x_{1}, \dots, x_{n}; α, q, λ) = \sum_{t = s + 1}^{n} log P (X_{t} = x_{t} ∣ X_{t - s} = x_{t - s}),

(17)

where

P (X_{t} = x_{t} ∣ X_{t - s} = x_{t - s})

is given by Theorem 2. The CML estimates of the parameters are obtained by maximizing the log-likelihood function defined in Equation (17).

3.3. Simulation Study

A simulation study has been carried out to compare the performance of the two estimation procedures using Monte Carlo experiments. We simulated 1000 data series from the TGDINAR(1)s process, each of sizes 50, 100, 300, 500, 1000, and 1500, for various combinations of parameter values and with a seasonal period of

s = 12

. The

T G D (q, α)

distribution is unimodal with a non-zero mode when

- 1 < α < - {(q (2 + q))}^{- 1}

, provided that

q > 0.414

[34]. We selected the following parameter combinations:

θ = (- 0.70, 0.75, 0.50)

,

θ = (- 0.85, 0.50, 0.10)

,

θ = (- 0.80, 0.85, 0.75)

and

θ = (0.15, 0.30, 0.25)

.

The estimation results, including mean squared errors (MSEs), are presented in Table 1. Estimates are obtained using the two methods, namely CLS and CML. We have computed the mean of the estimates over 1000 simulations and their MSEs. It can be seen that the CML has the lowest MSE compared to the CLS method. We plotted the box plots for three parameters using the estimates obtained from the CML and CLS for 1000 simulations. From the plots given in Figure 3, Figure 4, Figure 5 and Figure 6, we can say that as the sample size increases, the estimated parameter values converge to the actual value. Also, we have plotted the Q-Q plots based on 1000 simulations for standardized estimated values of the parameters for two estimation methods and three-parameter combinations. Due to space constraints, only one Q-Q plot is presented (Figure 7). The alignment of the sample quantiles with the theoretical quantiles along the diagonal line shows that the normality assumption is satisfied for a large sample size. The Q-Q plots confirm normality for all estimation methods and parameter combinations.

4. Real-Life Data Analysis

4.1. Exploratory Analysis of Legionnaires’ Data in Bavaria (2001–2023)

Legionnaires’ disease is a type of pneumonia caused by inhaling water droplets contaminated with Legionella bacteria. These bacteria flourish in warm water environments such as hot tubs, air conditioning systems, and large plumbing systems. The incubation period typically ranges from 2 to 10 days but can extend up to 20 days. In regions like Bavaria, Germany, the incidence increases from late spring to early fall due to warmer temperatures and greater use of water-based cooling systems.

We begin by examining the Legionnaires’ case data in Bavaria over the period 2001–2023, extracted from the Robert Koch Institute (https://survstat.rki.de (accessed on 12 August 2023)). This dataset consists of weekly counts of reported Legionnaires’ cases per year. Figure 8 shows the total number of cases of Legionnaires’ disease each year. The number of cases has been increasing over time, especially after 2017, with the highest in 2023. Figure 9 shows the weekly average of cases of Legionnaires’ disease in all years. The number of cases stays low from around week 10 to week 30, then starts to increase. The highest average number of cases occurs near weeks 48 to 50, which means that more cases tend to occur in the late part of the year. Figure 10 shows a heat map of Legionnaires’ cases by week and year. The bright yellow and orange areas, especially in recent years like 2018 and 2022, show that more cases usually occur near the end of the year. It also shows that, while the number of cases varies from year to year, the peak tends to occur around the same time each year.

These exploratory insights justify the use of a flexible and seasonally structured count model like the proposed TGDINAR(1)s, which is specifically designed to handle overdispersion, tail behavior, and periodicity in count time series.

Legionnaires’ Disease

We analyzed the weekly number of cases of Legionnaires’ disease in Bavaria, Germany, from 2001 to 2003, available at the Robert Koch Institute (https://survstat.rki.de (accessed on 12 August 2023)). The data consists of 159 weekly observations.

The bar graph in Figure 11 shows that the data set is positively skewed. The average value of the data is

1.4277

, and its spread is

2.0311

, which is relatively high compared to the average. This indicates that the data are more spread out than usual. The time series plot in Figure 12 shows repeating peaks every 3 weeks, visually confirming the seasonal cycle, while the PACF plot supports this with a strong spike at lag 3, statistically validating the seasonal period

s = 3

. Together, they indicate a consistent 3-week pattern in cases of Legionnaires’ disease. Figure 13 presents the marginal calibration plot, showing the difference between the fitted cumulative distribution functions (CDFs) and the empirical CDF of the observed data. Among the competing models, the curve corresponding to the transmuted geometric distribution remains closest to the zero line throughout the support, indicating a minimal deviation from the empirical CDF. This suggests that the TGD has a better fit to the data compared to alternative models.

To compare the performance of various models, we used the Akaike information criterion (AIC), Bayesian information criterion (BIC), and root mean square error (RMSE). Both AIC and BIC evaluate how well the model fits the data while accounting for the model complexity. Smaller values of these two criteria suggest a model with greater accuracy. RMSE measures prediction accuracy, with smaller values reflecting better predictive performance. Taking into account the AIC, BIC, and RMSE together in Table 2, the objective is to identify the model that best balances accuracy, complexity, and predictive performance. These criteria suggest that the transmuted geometric distribution and TGDINAR(1)s model are the better choices.

To predict future model values, we split the dataset into two parts; the first 149 observations are used to build the model, while the remaining ten observations are used for forecasting. We checked the independence of Pearson’s residual using the ACF plot in Figure 14. The highest posterior probability (HPP) intervals were constructed as the shortest intervals that contain 90% of the forecast distribution, based on the conditional pmf in Theorem 3. Table 3 shows the predicted mean, median, mode, and HPP intervals for all ten-step-ahead forecasts.

Comparative Models:

PoINAR(1)s: The INAR(1) process with Poisson marginal distributions, based on binomial thinning ( $ϕ$ ) [11].
GINAR(1)s: The INAR(1) process with a geometric marginal distribution and seasonal period s, based on binomial thinning ( $ϕ$ ) [21].
NGINAR(1)s: The INAR(1) process with new geometric marginal distributions, based on negative binomial thinning ( $α$ ), with seasonal period s [20].
ZMGINAR(1)s: The INAR(1) process with zero-modified geometric marginal distributions, based on negative binomial thinning ( $α$ ) [35].
TGDINAR(1)s: The proposed model.

4.2. Exploratory Analysis of Syphilis Cases (2007–2010)

Syphilis is a sexually transmitted infection (STI) caused by the bacterium Treponema pallidum. It can be transmitted through sexual contact, blood transfusion, or from mother to child during pregnancy. If not treated, it can cause serious complications, including neurological and cardiovascular damage, and adverse birth outcomes, such as stillbirth and neonatal death.

The syphilis data from 2007 to 2010 show clear trends and patterns. Figure 15 presents the average weekly cases in all years, with fluctuations throughout the year and peaks around weeks 6, 17, and 41. This suggests possible seasonal patterns or repeated increases in cases during certain times. The heatmap of weekly counts per year (Figure 16) highlights specific weeks, especially in 2008 and 2009, with higher case counts (in yellow and red), which supports the observed peaks and indicates consistent periods of higher incidence. Figure 17 shows the total number of cases per year, with a steady increase from 2007 to a peak in 2009, followed by a decrease in 2010. This pattern may reflect an outbreak or improved case detection in 2009, followed by better control or changes in reporting. Overall, 2009 had the highest number of syphilis cases during this period.

Syphilis Patient Data

The dataset records weekly syphilis cases in Massachusetts, USA, from 2007 to 2010, available in R (https://cran.r-project.org/web/packages/ZIM/ZIM.pdf (accessed on 3 March 2024)) [36]. It has 197 observations, with a mean of 2.7259 and a variation of 4.4449, indicating overdispersion.

Figure 18 presents a bar plot with a positively skewed distribution, while Figure 19 presents the time series plot along with the sample ACF and PACF for the syphilis dataset. The PACF plot exhibits a prominent spike at lag 6, providing statistical support for a seasonal period of 6 weeks. These findings suggest a consistent 6-week cyclical pattern in the incidence of syphilis cases.

Model performance was assessed using the AIC, BIC, and RMSE. Table 4 shows that the proposed TGDINAR(1)s model achieves consistently better results than the others. Figure 20 is a marginal calibration plot in which the TGD curve remains closer to zero, suggesting a better fit than the other distributions. To predict future values, we used 199 observations for model building and the last 10 for forecasting. Table 5 shows the predicted mean, median, mode, and HPP intervals for all ten-step-ahead forecasts. The independence of Pearson’s residuals was assessed using the ACF plot in Figure 21.

4.3. Exploratory Analysis of Dengue Cases (2001–2023)

Dengue is a mosquito-borne viral infection common in tropical and subtropical regions. In places like Bavaria, Germany, while not endemic, dengue cases are reported due to travel-related exposures, with incidence peaking during warmer months when mosquito activity is higher.

We analyze weekly dengue case data in Bavaria from 2001 to 2023, obtained from the Robert Koch Institute (https://survstat.rki.de (accessed on 24 April 2024)). Figure 22 shows a growing trend in annual dengue cases, with peaks in 2018 and 2023. Figure 23 presents the average weekly number of cases, which tends to increase during the mid-year and peaks between weeks 35 and 45. The heatmap in Figure 24 further confirms this seasonal pattern, highlighting increased activity in late summer and early fall, particularly in recent years.

Dengue Fever Cases

We analyzed the weekly number of cases of dengue disease in Berlin, Germany, from 2010 to 2015. The data consists of a total of 298 observations.

The bar plot in Figure 25 shows that the data is positively skewed. The average value of the data is

1.0772

, and its spread is

1.4587

, which is relatively high compared to the average. This indicates that the data are more spread out than usual. The PACF plot in Figure 26 indicates that the periodicity of this model is

s = 1

. This figure contains the time series plot and its sample ACF and PACF plot. Figure 27 is a marginal calibration plot suggesting that the transmuted geometric distribution provides a better fit to the data compared to the other distributions.

To compare the performance of various models for such data, we used the AIC, BIC, and RMSE. Obtaining the AIC, BIC, and RMSE measures presented in Table 6, we aim to identify the model that best balances accuracy, complexity, and predictive performance. These criteria suggest that the transmuted geometric distribution and TGDINAR(1)s model are the best choices.

We split the dataset into two parts to predict future model values: the first 288 observations are used to build the model, while the remaining 10 observations are used for forecasting. We checked the independence of Pearson’s residual using the ACF plot in Figure 28. We calculated the 90% highest predictive probability interval (HPP) instead of standard prediction intervals. Table 7 shows the predicted mean, median, mode, and HPP intervals for all ten-step-ahead forecasts.

5. Concluding Remarks

This paper deals with modeling overdispersed count time series with long or short tails using a transmuted geometric marginal distribution. Coherent forecasts are obtained based on the conditional distribution. A detailed Monte Carlo simulation study is carried out to check the performance of the estimation methods. The proposed approach is illustrated using both simulated data and data from practical applications. Model adequacy is assessed using Pearson residuals. The results show that the TGDINAR(1)s model provides a better fit compared to the others overdispersed count time series models. A key limitation of the proposed model is its focus on univariate count time series, restricting its applicability to more complex real-world scenarios. Future work will explore extending the model to capture structural breaks and non-linear patterns in the data. In addition, future extensions could consider a bivariate TGDINAR model with a seasonal structure to capture interdependencies between two related count processes, or address non-stationary cases to model time-varying behavior.

Author Contributions

Conceptualization, M.A., H.S.B. and A.G.; methodology, M.A., A.G. and H.S.B.; software, M.A. and A.G.; validation, M.A., A.G., H.S.B. and G.A.; formal analysis, M.A., A.G. and H.S.B.; investigation, M.A., A.G., H.S.B. and G.A.; resources, M.A., A.G., H.S.B., G.A. and A.F.D.; data curation, M.A., A.G., H.S.B. and G.A.; writing—original draft preparation, M.A., A.G. and H.S.B.; writing—review and editing, M.A., A.G., H.S.B., G.A. and A.F.D.; visualization, M.A., A.G., H.S.B., G.A. and A.F.D.; supervision, M.A. and H.S.B.; project administration, M.A., A.G., H.S.B., G.A. and A.F.D.; funding acquisition, G.A. and A.F.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by King Faisal University, Saudi Arabia [GRANT KFU252539].

Data Availability Statement

The data that support the findings of this article are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [GRANT KFU252539].

Manik Awale and Aishwarya Ghodake gratefully acknowledge the research grant [EEQ/2021/000341] from the ANRF (DST-SERB), Government of India, for supporting their research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Theorem Proofs

Appendix A.1. Proof of Theorem 1

Proof.

Using recursion in (6) and using the property of binomial thinning operator, for k

\in N_{0}

, we can write

X_{t + k} = λ^{(m)} * X_{t - r} + \sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{t - j s}, where m = ⌈\frac{k}{s}⌉ and r = m s - k .

(A1)

Using the above equation, we adjust the terms and take its expectation after squaring it on both sides,

\begin{matrix} E {[X_{t} - \sum_{j = 1}^{m - 1} λ^{(j)} * ϵ_{t - j s} - ϵ_{t}]}^{2} = E {[λ^{(m)} * X_{t - r}]}^{2} . \end{matrix}

Conditioning on

X_{t - r}

, the distribution of the entity

λ * X_{t - r} | X_{t - r}

becomes binomial

(X_{t - r}, λ)

, the distribution of

λ^{(m)} * X_{t - r} | X_{t - r}

is

b i n o m i a l (X_{t - r}, λ^{m})

.

The pgf of the random variable

λ^{(m)} * X_{t - r} | X_{t - r}

is given as follows:

Φ_{λ^{(m)} * X_{t - r} | X_{t - r}} (s) = {(1 - λ^{m} + λ^{m} s)}^{X_{t - r}} .

(A2)

The pgf of the random variable

λ^{(2)} * X_{t - r} | X_{t - r}

is given as follows:

\begin{matrix} Φ_{λ * λ * X_{t - r} | X_{t - r}} (s) = {(1 - λ^{2} + λ^{2} s)}^{X_{t - r}} . \end{matrix}

As pgf uniquely determines the pmf, we can find the moments as follows:

\begin{matrix} E (λ^{(2)} * X_{t - r} | X_{t - r}) = λ^{2} X_{t - r} and V a r (λ^{(2)} * X_{t - r} | X_{t - r}) = λ^{2} (1 - λ^{2}) X_{t - r} . \end{matrix}

The pgf of the random variable

λ^{(3)} * X_{t - r} | X_{t - r}

is given as follows:

\begin{matrix} Φ_{λ * λ * λ * X_{t - r} | X_{t - r}} (s) = {(1 - λ^{3} + λ^{3} s)}^{X_{t - r}} . \end{matrix}

(A3)

The moments are as follows:

\begin{matrix} E (λ^{(3)} * X_{t - r} | X_{t - r}) = λ^{3} X_{t - r} and V a r (λ^{(3)} * X_{t - r} | X_{t - r}) = λ^{3} (1 - λ^{3}) X_{t - r} . \end{matrix}

Using the pgf, moments can be easily derived, and we can find the second moment from the (A3) as follows:

\begin{matrix} E [E {(λ^{(m)} * X_{t - r})}^{2} ∣ X_{t - r}] \\ = E [Var (λ^{(m)} * X_{t - r} ∣ X_{t - r}) + {(E (λ^{(m)} * X_{t - r} ∣ X_{t - r}))}^{2}] \\ = E [λ^{m} (1 - λ^{m}) X_{t - r} + λ^{2 m} X_{t - r}^{2}] \\ = λ^{m} (1 - λ^{m}) E (X_{t - r}) + λ^{2 m} E (X_{t - r}^{2}) \\ = λ^{m} (1 - λ^{m}) μ_{x} + λ^{2 m} (σ_{x}^{2} + μ_{x}^{2}) \\ \to 0 as k \to \infty (m \to \infty) . \end{matrix}

where

μ_{x}

and

σ_{x}^{2}

are the mean and variance of the stationary distribution. It proves that

{X_{t}}

converges to

\sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{t - j s}

in distribution. □

Appendix A.2. Proof of Theorem 2

Proof.

Let

X_{t} = λ * X_{t - s} + ϵ_{t}, t \geq s .

The marginal distribution of

X_{t}

is the transmuted geometric

(q, α)

. The pgf of X is given by

Φ_{X} (γ) = \frac{(1 - q) (1 - α q (γ - 1) - q^{2} γ)}{(1 - q γ) (1 - q^{2} γ)}, | q γ | < 1 .

Hence, the pgf of

ϵ

is given by

\begin{matrix} Φ_{ϵ} (γ) = \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ (1 - γ))) [1 - q^{2} (1 - λ (1 - γ))]}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ (γ - 1) - q^{2} (1 - λ (1 - γ))]}, | q γ | < 1 . \end{matrix}

(A4)

The pmf of

ϵ_{t}

is given in (10).

The pgf of W is given as

Φ_{W} (γ) = (1 - λ + λ γ), | γ | \leq 1 .

Now, the pgf of

λ * X_{t - s} | X_{t - s}

can be given as

Φ_{λ * X_{t - s} | X_{t - s}} (γ) = {(1 - λ + λ γ)}^{X_{t - s}} .

Therefore, a one-step-ahead conditional pgf can be obtained as

\begin{matrix} Φ_{X_{t} | X_{t - s}} (γ) \\ = {(1 - λ + λ γ)}^{X_{t - s}} \times Φ_{ϵ} (γ) \\ = {(1 - λ + λ γ)}^{X_{t - s}} \times \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ (1 - γ))) (1 - q^{2} (1 - λ (1 - γ)))}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ (γ - 1) - q^{2} (1 - λ (1 - γ))]} . \end{matrix}

The conditional distribution of

X_{t}

given

X_{t - s}

can be obtained using the above pgf by taking its differentiation.

Some of the one-step conditional probabilities (for any

x > 0

) are as follows:

$P (X_{t} = 0 | X_{t - s} = 0)$

$\begin{matrix} = \frac{(α q + 1) (1 - (1 - λ) q) (1 - (1 - λ) q^{2})}{1 - (1 - λ) q^{2} + α λ q} . \end{matrix}$
$P (X_{t} = 1 ∣ X_{t - s} = 0)$

$\begin{matrix} = \frac{(1 - q) q^{x} [(1 - α) (1 - λ) ((1 - λ) q + 1)]}{1 - α λ + (1 - λ) q} + \frac{(1 - α) α (1 - λ) λ}{(1 - α λ + (1 - λ) q) ((1 - λ) q - α λ)} \\ \times (1 - \frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)}) {(\frac{λ q (α + q)}{1 - q^{2} + λ q (α + q)})}^{x} \\ + \frac{(1 - q^{2}) q^{2 x} (α (1 - λ) ((1 - λ) q - λ))}{(1 - λ) q - α λ} . \end{matrix}$
$P (X_{t} = 0 ∣ X_{t - s} = 1)$

$\begin{matrix} = \frac{(1 - λ) (α q + 1) (1 - (1 - λ) q) (1 - (1 - λ) q^{2})}{1 - (1 - λ) q^{2} + α λ q} . \end{matrix}$

By continuing this process, we can calculate all the probabilities of the one-step-ahead conditional distribution. □

Appendix A.3. Proof of Theorem 3

Proof.

To obtain the k-step-ahead conditional pmf, we proceed as follows.

X_{t + k} = λ * λ * \dots λ * X_{t} + {λ * \dots λ * ϵ_{t} + \dots + λ * ϵ_{t + k - 1} + ϵ_{t + k}},

(A5)

which can be written as

X_{t + k} = λ^{(m)} * X_{t} + \sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{t + k - j s}, where m = ⌈\frac{k}{s}⌉, and r = m s - k .

(A6)

Note that,

λ^{(m)}

indicates the operation of binomial thinning m times.

We know that

λ * X

is a binomial random variable for known value of

X = x

; hence,

P (λ * X = w | X = x) = (\binom{w}{x}) λ^{w} {(1 - λ)}^{(w - x)} .

Φ_{X_{t + k} | X_{t - r}} (γ) = Φ_{λ^{(m)} * X_{t - r} | X_{t - r}} (γ) \times Φ_{\sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{(t + k - j s)}} (γ)

As the pgf of

Φ_{λ * X_{t - r} | X_{t - r}} (γ)

with binomial thinning is given as

\begin{matrix} Φ_{λ * X_{t - r} | X_{t - r}} (γ) = {(1 - λ + λ γ)}^{X_{t - r}}, \end{matrix}

\begin{matrix} Φ_{λ * λ * X_{t - r} | X_{t - r}} (γ) = {(1 - λ^{2} + λ^{2} γ)}^{X_{t - r}}, \end{matrix}

\begin{matrix} Φ_{λ * λ * λ * X_{t - r} | X_{t - r}} (γ) = {(1 - λ^{3} + λ^{3} γ)}^{X_{t - r}} . \end{matrix}

Hence, using mathematical induction we can write the k-step-ahead pgf as follows:

Φ_{λ^{(m)} * X_{t - r} | X_{t - r}} (γ) = {(1 - λ^{m} + λ^{m} γ)}^{X_{t - r}} .

(A7)

As pgf uniquely determines the pmf, by differentiating pgf in (A7) for

γ

, the pmf of

λ^{(m)} * X_{t - r} | X_{t - r}

can be derived using the relation

P (λ^{(m)} * x = a | x) = \frac{Φ_{λ^{(m)} * x | x}^{(a)} (0)}{a!} .

(A8)

where

Φ_{λ^{(m)} * x | x}^{(a)} (0)

is the

a^{t h}

derivative of

Φ_{λ^{(m)} * x | x}^{(a)} (γ)

at

γ

= 0.

One-step-ahead innovation distribution is given by

\begin{matrix} Φ_{λ * ϵ_{t}} (γ) & = Φ_{ϵ_{t}} (Φ_{W} (γ)) \\ = Φ_{ϵ_{t}} (1 - λ + λ γ), \end{matrix}

Φ_{λ * ϵ_{t}} (γ) \times Φ_{ϵ_{t}} (γ) = \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ^{2} (1 - γ))) (1 - q^{2} (1 - λ^{2} (1 - γ)))}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ^{2} (γ - 1) - q^{2} (1 - λ^{2} (1 - γ))]} .

(A9)

By using a similar procedure, we can write,

\begin{matrix} Φ_{\sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{t + k - j s}} (γ) = & \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ^{m} (1 - γ))) (1 - q^{2} (1 - λ^{m} (1 - γ)))}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ^{m} (γ - 1) - q^{2} (1 - λ^{m} (1 - γ))]} . \end{matrix}

(A10)

By using (A7) and (A10), we can get the equation for the k-step-ahead conditional pgf of the process

{X_{t}}

as

\begin{matrix} Φ_{X_{t + k} | X_{t - r}} (γ) = Φ_{λ^{(m)} * X_{t - r} | X_{t - r}} (γ) \times Φ_{\sum_{j = 0}^{m - 1} λ^{(j)} * ϵ_{(t + k - j s)}} (γ) \end{matrix}

= {(1 - λ^{m} + λ^{m} γ)}^{X_{t - r}} \frac{(1 - α q (γ - 1) - q^{2} γ) (1 - q (1 - λ^{m} (1 - γ))) (1 - q^{2} (1 - λ^{m} (1 - γ)))}{(1 - q γ) (1 - q^{2} γ) [1 - α q λ^{m} (γ - 1) - q^{2} (1 - λ^{m} (1 - γ))]} .

(A11)

where

m = ⌈\frac{k}{s}⌉, r = m s - k .

As pgf uniquely determines the pmf, by differentiating the above pgf with respect to

γ

, we get the probabilities as follows:

$P (X_{t + k} = 0 | X_{t - r} = 0)$

$\begin{matrix} = \frac{(α q + 1) (1 - (1 - λ^{m}) q) (1 - (1 - λ^{m}) q^{2})}{1 - (1 - λ^{m}) q^{2} + α λ^{m} q} . \end{matrix}$
$P (X_{t + k} = 1 | X_{t - r} = 0)$

$\begin{matrix} = \frac{(1 - q) q^{x} [(1 - α) (1 - λ^{m}) ((1 - λ^{m}) q + 1)]}{1 - α λ^{m} + (1 - λ^{m}) q} \\ + \frac{(1 - α) α (1 - λ^{m}) λ^{m}}{(1 - α λ^{m} + (1 - λ^{m}) q) ((1 - λ^{m}) q - α λ^{m})} \\ \times (1 - \frac{λ^{m} q (α + q)}{1 - q^{2} + λ^{m} q (α + q)}) {(\frac{λ^{m} q (α + q)}{1 - q^{2} + λ^{m} q (α + q)})}^{x} \\ + \frac{(1 - q^{2}) q^{2 x} (α (1 - λ^{m}) ((1 - λ^{m}) q - λ^{m}))}{(1 - λ^{m}) q - α λ^{m}} . \end{matrix}$
$P (X_{t + k} = 0 | X_{t - r} = 1)$

$\begin{matrix} = \frac{(1 - λ^{m}) (α q + 1) (1 - (1 - λ^{m}) q) (1 - (1 - λ^{m}) q^{2})}{1 - (1 - λ^{m}) q^{2} + α λ^{m} q} . \end{matrix}$

By continuing this process, we can calculate all the probabilities of the k-step-ahead conditional distribution. □

Appendix B. Corollary Proofs

Appendix B.1. Proof of Corollary 1, (1)

Proof.

As

m = ⌈\frac{k}{s}⌉, and r = m s - k

, we can derive that

\begin{matrix} E (X_{t + k} | X_{t - r}) & = E (λ * X_{t + k - s} + ϵ_{t + k} | X_{t - r}) \\ = λ E (X_{t + k - s} | X_{t - r}) + μ_{ϵ} \\ = λ^{2} E (X_{t + k - 2 s} | X_{t - r}) + μ_{ϵ} (1 + λ) \\ \dots \\ = λ^{m} E (X_{t + k - m s} | X_{t - r}) + μ_{ϵ} (1 + λ + \dots + λ^{m - 1}) \\ = λ^{m} E (X_{t - r} | X_{t - r}) + μ_{ϵ} (1 + λ + \dots + λ^{m - 1}) \\ = λ^{m} X_{t - r} + (1 - λ^{m}) μ_{x} . \end{matrix}

As

lim_{m \to \infty} E (X_{t + k} | X_{t - r}) = μ_{x}

. □

Appendix B.2. Proof of Corollary 1, (2)

Proof.

After differentiating the k-step-ahead conditional pgf as given in Equation (A11), we get the first two moments and hence the k-step-ahead conditional variance is as follows:

E (X_{t + k} | X_{t - r}) = Φ_{X_{t + k} | X_{t - r}}^{'} (1) = λ^{m} X_{t - r} + \frac{q (1 - λ^{m}) (1 - α + q)}{(1 - q^{2})},

(A12)

Φ_{X_{t + k} | X_{t - r}}^{″} (1)

\begin{matrix} = \frac{1}{{(1 - q^{2})}^{2}} [2 (α - 1) λ^{m} (- 1 + λ^{m}) q X_{t - r} + λ^{2 m} (- 1 + X_{t - r}) X_{t - r} \\ - 2 (- 1 + α) (- 1 + λ^{m}) q^{3} (- 2 + λ^{m} X_{t - r}) + 2 q^{2} (1 + α^{2} λ^{m} (- 1 + λ^{m}) - α {(1 - λ^{m})}^{2} \\ + λ^{m} (- 1 + X_{t - r}) - λ^{2 m} {(X_{t - r})}^{2}) + q^{4} (2 - 2 λ^{m} (1 + X_{t - r}) + λ^{2 m} X_{t - r} (1 + X_{t - r}))] . \end{matrix}

Hence, the variance is

\begin{matrix} V a r (X_{t + k} | X_{t - r}) & = Φ_{X_{t + k} | X_{t - r}}^{″} (1) + Φ_{X_{t + k} | X_{t - r}}^{'} (1) - {(Φ_{X_{t + k} | X_{t - r}}^{'} (1))}^{2} \\ = \frac{(1 - λ^{m})}{{(1 - q^{2})}^{2}} [(1 - α) q - (α - 1) (1 + 2 λ^{m}) q^{3} + λ^{m} X_{t - r} \\ + λ^{m} q^{4} (1 + X_{t - r}) - q^{2} (- 2 + α^{2} (1 + λ^{m}) + λ^{m} (2 X_{t - r} - 1))] . \end{matrix}

(A13)

□

Appendix B.3. Proof of Corollary 1, (3)

Proof.

Consider

h = k s + r

,

r = 0, 1, 2, \dots, s - 1

. When

r = 0

, the lag h is a multiple of s.

In general,

X_{t} = λ^{(k)} * X_{t - k s} + \sum_{j = 0}^{k - 1} λ^{(j)} * ϵ_{t - j s} .

Now,

Cov (X_{t}, X_{t - k s}) = γ (h) = E (X_{t} X_{t - k s}) - E (X_{t}) E (X_{t - k s}) .

Hence,

\begin{matrix} γ (h) & = E (X_{t} X_{t - k s}) - E (X_{t}) E (X_{t - k s}) \\ = E (X_{t - k s} \cdot E (X_{t} | X_{t - k s})) - E (E (X_{t} | X_{t - k s})) \cdot E (X_{t - k s}) \\ = E (X_{t - k s} \cdot [λ^{k} X_{t - k s} + (1 - λ^{k}) μ_{x}]) - E (λ^{k} X_{t - k s} + (1 - λ^{k}) μ_{x}) \cdot E (X_{t - k s}) \\ = λ^{k} E {(X_{t - k s})}^{2} + (1 - λ^{k}) μ_{x} E (X_{t - k s}) - (λ^{k} μ_{x} + (1 - λ^{k}) μ_{x}) μ_{x} \\ = λ^{k} (σ_{x}^{2} + μ_{x}^{2}) + (1 - λ^{k}) μ_{x}^{2} - μ_{x}^{2} λ^{k} - μ_{x}^{2} (1 - λ^{k}) \\ = λ^{k} σ_{x}^{2} \\ γ (h) & = λ^{k} γ (0) \end{matrix}

As the process is stationary,

γ (0) = Var (X_{t})

. Hence, we can write

ρ (h) = λ^{k}

. When

h = k s

,

ρ (h) = λ^{h / s} .

Furthermore, the ACF is zero when h is not a multiple of s. This is because the observation at time t depends only on the observations at time points

t - s, t - 2 s, \dots

, and is independent of all other past observations.

□

References

Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab. 1979, 7, 893–899. [Google Scholar] [CrossRef]
McKenzie, E. Some simple models for discrete variate time series 1. J. Am. Water Resour. Assoc. 1985, 21, 645–650. [Google Scholar] [CrossRef]
Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive (INAR (1)) process. J. Time Ser. Anal. 1987, 8, 261–275. [Google Scholar] [CrossRef]
McKenzie, E. Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Adv. Appl. Probab. 1986, 18, 679–705. [Google Scholar] [CrossRef]
Al-Osh, M.; Alzaid, A. Binomial autoregressive moving average models. Stoch. Model. 1991, 7, 261–282. [Google Scholar] [CrossRef]
Cardinal, M.; Roy, R.; Lambert, J. On the application of integer-valued time series models for the analysis of disease incidence. Stat. Med. 1999, 18, 2025–2039. [Google Scholar] [CrossRef]
Freeland, R.K.; McCabe, B.P. Analysis of low count time series data by Poisson autoregression. J. Time Ser. Anal. 2004, 25, 701–722. [Google Scholar] [CrossRef]
Freeland, R.K.; McCabe, B.P. Forecasting discrete valued low count time series. Int. J. Forecast. 2004, 20, 427–434. [Google Scholar] [CrossRef]
Park, Y.; Kim, H.Y. Diagnostic checks for integer-valued autoregressive models using expected residuals. Stat. Pap. 2012, 53, 951–970. [Google Scholar] [CrossRef]
Monteiro, M.; Scotto, M.G.; Pereira, I. Integer-valued autoregressive processes with periodic structure. J. Stat. Plan. Inference 2010, 140, 1529–1541. [Google Scholar] [CrossRef]
Bourguignon, M.; Vasconcellos, K.L.P.; Reisen, V.A.; Ispány, M. A Poisson INAR (1) process with a seasonal structure. J. Stat. Comput. Simul. 2016, 86, 373–387. [Google Scholar] [CrossRef]
Kim, H.Y.; Park, Y.S. Coherent forecasting in binomial AR(p) model. Commun. Stat. Appl. Methods 2010, 17, 27–37. [Google Scholar] [CrossRef]
Maiti, R.; Biswas, A. Coherent forecasting for over-dispersed time series of count data. Braz. J. Probab. Stat. 2015, 29, 747–766. [Google Scholar] [CrossRef]
Maiti, R.; Biswas, A. Coherent forecasting for stationary time series of discrete data. AStA Adv. Stat. Anal. 2015, 99, 337–365. [Google Scholar] [CrossRef]
Maiti, R.; Biswas, A.; Das, S. Time series of zero-inflated counts and their coherent forecasting. J. Forecast. 2015, 34, 694–707. [Google Scholar] [CrossRef]
Maiti, R.; Biswas, A.; Das, S. Coherent forecasting for count time series using Box–Jenkins’s AR(p) model. Stat. Neerl. 2016, 70, 123–145. [Google Scholar] [CrossRef]
Held, L.; Paul, M. Modeling seasonality in space-time infectious disease surveillance data. Biom. J. 2012, 54, 824–843. [Google Scholar] [CrossRef]
Schweer, S.; Weiss, C.H. Compound Poisson INAR (1) processes: Stochastic properties and testing for overdispersion. Comput. Stat. Data Anal. 2014, 77, 267–284. [Google Scholar] [CrossRef]
Li, H.; Yang, K.; Zhao, S.; Wang, D. First-order random coefficients integer-valued threshold autoregressive processes. AStA Adv. Stat. Anal. 2018, 102, 305–331. [Google Scholar] [CrossRef]
Tian, S.; Wang, D.; Cui, S. A seasonal geometric INAR process based on negative binomial thinning operator. Stat. Pap. 2020, 61, 2561–2581. [Google Scholar] [CrossRef]
Awale, M.; Kashikar, A.; Ramanathan, T. Modeling seasonal epidemic data using integer autoregressive model based on binomial thinning. Model Assist. Stat. Appl. 2020, 15, 1–17. [Google Scholar] [CrossRef]
Almuhayfith, F.E.; Okereke, E.W.; Awale, M.; Bakouch, H.S.; Alqifari, H.N. Some developments on seasonal INAR processes with application to influenza data. Sci. Rep. 2023, 13, 22037. [Google Scholar] [CrossRef]
Olakorede, N.M.; Olanrewaju, S.O. Integer-valued first order autoregressive (INAR (1)) model with negative binomial (NB) innovation for the forecasting of time series count data. Int. J. Stat. Probab. 2025, 12, 1–23. [Google Scholar] [CrossRef]
Aghababaei Jazi, M.; Jones, G.; Lai, C.D. Integer valued AR (1) with geometric innovations. J. Iran. Stat. Soc. 2022, 11, 173–190. [Google Scholar]
Nastić, A.S. On shifted geometric INAR (1) models based on geometric counting series. Commun. Stat.-Theory Methods 2012, 41, 4285–4301. [Google Scholar] [CrossRef]
Bourguignon, M.; Borges, P.; Fajardo Molinares, F. A new geometric INAR (1) process based on counting series with deflation or inflation of zeros. J. Stat. Comput. Simul. 2018, 88, 3338–3348. [Google Scholar] [CrossRef]
Manaa, A.; Souakri, R. Zero-inflated Poisson INAR (1) model with periodic structure. Commun. Stat.-Theory Methods 2025, 54, 1250–1270. [Google Scholar] [CrossRef]
Kong, J.; Lund, R. Seasonal count time series. J. Time Ser. Anal. 2023, 44, 93–124. [Google Scholar] [CrossRef]
Andrews, D.K.; Balakrishna, N. Coherent forecasting of NoGeAR (1) model. J. Indian Soc. Probab. Stat. 2024, 26, 87–111. [Google Scholar] [CrossRef]
Bourguignon, M.; Weiß, C.H. An INAR (1) process for modeling count time series with equidispersion, underdispersion and overdispersion. Test 2017, 26, 847–868. [Google Scholar] [CrossRef]
Kang, Y.; Sheng, D.; Lu, F. A simple INAR (1) model for analyzing count time series with multiple features. Commun. Stat.-Theory Methods 2025, 54, 457–475. [Google Scholar] [CrossRef]
Davis, R.A.; Fokianos, K.; Holan, S.H.; Joe, H.; Livsey, J.; Lund, R.; Pipiras, V.; Ravishanker, N. Count time series: A methodological review. J. Am. Stat. Assoc. 2021, 116, 1533–1547. [Google Scholar] [CrossRef]
Shaw, W.T.; Buckley, I.R. The alchemy of probability distributions: Beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map. arXiv 2009, arXiv:0901.0434. [Google Scholar]
Chakraborty, S.; Bhati, D. Transmuted geometric distribution with applications in modeling and regression analysis of count data. SORT-Stat. Oper. Res. Trans. 2016, 40, 153–176. [Google Scholar]
Barreto-Souza, W. Zero-modified geometric INAR (1) process for modelling count time series with deflation or inflation of zeros. J. Time Ser. Anal. 2015, 36, 839–852. [Google Scholar] [CrossRef]
Yang, M.; Zamba, G.; Cavanaugh, J. ZIM: Zero-Inflated Models (ZIM) for Count Time Series with Excess Zeros, R package version 1.1.0; The R Project for Statistical Computing: Vienna, Austria, 2018. Available online: https://github.com/biostatstudio/ZIM (accessed on 5 May 2025).

Figure 1. Variance, skewness, and kurtosis plot for geometric distribution (blue line) and transmuted geometric distribution (red line) for various values of

α

.

Figure 1. Variance, skewness, and kurtosis plot for geometric distribution (blue line) and transmuted geometric distribution (red line) for various values of

α

.

Figure 2. TGD model with a fixed value of q = 0.6 and various values of

α

. (a)

α = - 0.95

, (b)

α = - 0.25

, (c)

α = 0.95

, (d)

α = 0.25

, (e)

α = 0

.

Figure 2. TGD model with a fixed value of q = 0.6 and various values of

α

. (a)

α = - 0.95

, (b)

α = - 0.25

, (c)

α = 0.95

, (d)

α = 0.25

, (e)

α = 0

.

Figure 3. Comparison of estimation methods using boxplots for

α = - 0.70

,

q = 0.75

, and

λ = 0.50

.

Figure 3. Comparison of estimation methods using boxplots for

α = - 0.70

,

q = 0.75

, and

λ = 0.50

.

Figure 4. Comparison of estimation methods using boxplots for

α = - 0.85

,

q = 0.5

, and

λ = 0.1

.

Figure 4. Comparison of estimation methods using boxplots for

α = - 0.85

,

q = 0.5

, and

λ = 0.1

.

Figure 5. Comparison of estimation methods using boxplots for

α = - 0.80

,

q = 0.85

, and

λ = 0.75

.

Figure 5. Comparison of estimation methods using boxplots for

α = - 0.80

,

q = 0.85

, and

λ = 0.75

.

Figure 6. Comparison of estimation methods using boxplots for

α = 0.15

,

q = 0.30

, and

λ = 0.25

.

Figure 6. Comparison of estimation methods using boxplots for

α = 0.15

,

q = 0.30

, and

λ = 0.25

.

Figure 7. QQ plots using CML for

α = - 0.80

,

q = 0.85

, and

λ = 0.75

.

Figure 7. QQ plots using CML for

α = - 0.80

,

q = 0.85

, and

λ = 0.75

.

Figure 8. Total Legionnaires’ cases per year from 2001 to 2023.

Figure 9. Average Legionnaires’ cases by week from 2001 to 2023.

Figure 10. Heatmap of Legionnaires’ cases by week and year from 2001 to 2023.

Figure 11. Frequency plot of counts for Legionnaires disease data.

Figure 12. Sample path, sample ACF, and sample PACF plots for Legionnaires’ disease data.

Figure 13. Marginal calibration plot of Legionnaires’ disease data.

Figure 14. Sample ACF of Pearson residuals for Legionnaires’ disease data.

Figure 15. Average syphilis cases by week from 2007 to 2010.

Figure 16. Heatmap of syphilis cases by week and year from 2007 to 2010.

Figure 17. Total syphilis cases per year from 2007 to 2010.

Figure 18. Frequency plot of counts for syphilis cases.

Figure 19. Sample path, sample ACF, and sample PACF plots for syphilis cases.

Figure 20. Marginal calibration plot of syphilis cases.

Figure 21. Sample ACF of Pearson residuals for syphilis cases.

Figure 22. Total Dengue cases per year from 2001 to 2023.

Figure 23. Average dengue cases by week from 2001 to 2023.

Figure 24. Heatmap of dengue cases by week and year from 2001 to 2023.

Figure 25. Frequency plot of counts for dengue disease data.

Figure 26. Sample path, sample ACF, and sample PACF plots for dengue disease data.

Figure 27. Marginal calibration plot of dengue disease data.

Figure 28. Sample ACF of Pearson residuals for dengue disease data.

Table 1. Parameter estimates and their mean squared errors.

n	${\hat{α}}_{CLS}$	${\hat{q}}_{CLS}$	${\hat{λ}}_{CLS}$	${\hat{α}}_{CML}$	${\hat{q}}_{CML}$	${\hat{λ}}_{CML}$
	$α = - 0.7$ , $q = 0.75$ , $λ = 0.5$
50	$- 0.648$	$0.742$	$0.471$	$- 0.638$	$0.748$	$0.488$
	( $0.024$ )	( $0.012$ )	( $0.003$ )	( $0.009$ )	( $0.002$ )	( $0.003$ )
100	$- 0.654$	$0.750$	$0.476$	$- 0.682$	$0.748$	$0.493$
	( $0.002$ )	( $0.000$ )	( $0.017$ )	( $0.090$ )	( $0.004$ )	( $0.000$ )
300	$- 0.665$	$0.754$	$0.495$	$- 0.696$	$0.749$	$0.498$
	( $0.001$ )	( $0.000$ )	( $0.002$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )
500	$- 0.665$	$0.751$	$0.495$	$- 0.700$	$0.749$	$0.499$
	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.001$ )	( $0.000$ )	( $0.000$ )
1000	$- 0.671$	$0.752$	$0.498$	$- 0.703$	$0.750$	$0.499$
	( $0.000$ )	( $0.000$ )	( $0.001$ )	( $0.001$ )	( $0.000$ )	( $0.000$ )
1500	$- 0.673$	$0.752$	$0.498$	$- 0.702$	$0.750$	$0.499$
	( $0.001$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )
	$α = - 0.85$ , $q = 0.50$ , $λ = 0.10$
50	$- 0.820$	$0.501$	$0.070$	$- 0.748$	$0.507$	$0.116$
	( $0.009$ )	( $0.005$ )	( $0.028$ )	( $0.023$ )	( $0.003$ )	( $0.012$ )
100	$- 0.822$	$0.501$	$0.086$	$- 0.806$	$0.504$	$0.010$
	( $0.010$ )	( $0.002$ )	( $0.002$ )	( $0.167$ )	( $0.004$ )	( $0.000$ )
300	$- 0.828$	$0.502$	$0.096$	$- 0.842$	$0.500$	$0.099$
	( $0.000$ )	( $0.001$ )	( $0.000$ )	( $0.010$ )	( $0.000$ )	( $0.004$ )
500	$- 0.830$	$0.502$	$0.097$	$- 0.849$	$0.500$	$0.100$
	( $0.001$ )	( $0.000$ )	( $0.001$ )	( $0.004$ )	( $0.001$ )	( $0.001$ )
1000	$- 0.832$	$0.502$	$0.099$	$- 0.853$	$0.500$	$0.099$
	( $0.003$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )
1500	$- 0.830$	$0.502$	$0.099$	$- 0.847$	$0.500$	$0.100$
	( $0.000$ )	( $0.000$ )	( $0.001$ )	( $0.005$ )	( $0.000$ )	( $0.000$ )
	$α = - 0.8$ , $q = 0.85$ , $λ = 0.75$
50	$- 0.750$	$0.824$	$0.728$	$- 0.666$	$0.846$	$0.744$
	( $0.000$ )	( $0.000$ )	( $0.078$ )	( $0.003$ )	( $0.002$ )	( $0.000$ )
100	$- 0.771$	$0.845$	$0.725$	$- 0.729$	$0.846$	$0.746$
	( $0.000$ )	( $0.001$ )	( $0.000$ )	( $0.014$ )	( $0.000$ )	( $0.000$ )
300	$- 0.781$	$0.849$	$0.737$	$- 0.780$	$0.851$	$0.750$
	( $0.001$ )	( $0.001$ )	( $0.002$ )	( $0.001$ )	( $0.000$ )	( $0.000$ )
500	$- 0.785$	$0.850$	$0.743$	$- 0.788$	$0.849$	$0.749$
	( $0.000$ )	( $0.000$ )	( $0.002$ )	( $0.002$ )	( $0.000$ )	( $0.000$ )
1000	$- 0.788$	$0.850$	$0.747$	$- 0.793$	$0.850$	$0.749$
	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.001$ )	( $0.000$ )	( $0.000$ )
1500	$- 0.790$	$0.850$	$0.748$	$- 0.798$	$0.850$	$0.750$
	( $0.001$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )	( $0.000$ )
	$α = 0.15$ , $q = 0.30$ , $λ = 0.25$
50	$0.174$	$0.293$	$0.161$	$- 0.140$	$0.266$	$0.239$
	( $0.000$ )	( $0.000$ )	( $0.005$ )	( $1.322$ )	( $0.046$ )	( $0.049$ )
100	$0.172$	$0.299$	$0.199$	$0.044$	$0.296$	$0.241$
	( $0.000$ )	( $0.000$ )	( $0.025$ )	( $1.215$ )	( $0.026$ )	( $0.026$ )
300	$0.171$	$0.302$	$0.228$	$0.138$	$0.303$	$0.246$
	( $0.001$ )	( $0.000$ )	( $0.005$ )	( $0.008$ )	( $0.002$ )	( $0.005$ )
500	$0.170$	$0.304$	$0.239$	$0.158$	$0.308$	$0.248$
	( $0.002$ )	( $0.000$ )	( $0.002$ )	( $0.859$ )	( $0.013$ )	( $0.001$ )
1000	$0.171$	$0.303$	$0.243$	$0.145$	$0.302$	$0.248$
	( $0.001$ )	( $0.000$ )	( $0.001$ )	( $0.034$ )	( $0.002$ )	( $0.000$ )
1500	$0.170$	$0.303$	$0.248$	$0.163$	$0.304$	$0.250$
	( $0.001$ )	( $0.000$ )	( $0.000$ )	( $0.017$ )	( $0.001$ )	( $0.001$ )

Table 2. AIC and BIC for the seasonal INAR(1) models for Legionnaires’ disease data.

Model	Estimated Values	AIC	BIC	RMSE
PoINAR(1)s	$\hat{ϕ} = 0.1324$ , $\hat{λ} = 1.2283$	505.0183	511.1561	1.3887
GINAR(1)s	$\hat{ϕ} = 0.0987$ , $\hat{θ} = 0.5752$	514.1081	520.2459	1.3939
NGINAR(1)s	$\hat{α} = 0.1648$ , $\hat{μ} = 1.3424$	513.2207	519.3585	1.3876
ZMGINAR(1)s	$\hat{α} = 0.3750$ , $\hat{π} = - 0.3158$ , $\hat{μ} = 1.1636$	506.1553	515.3620	1.4088
TGDINAR(1)s	$\hat{α} = 0.9999$ , $\hat{q} = 0.4565$ , $\hat{λ} = 0.1184$	498.3401	507.5468	1.3902

Table 3. Point forecast for Legionnaires’ disease data with

γ

= 0.90.

Table 3. Point forecast for Legionnaires’ disease data with

γ

= 0.90.

k	$X_{t + k}$	Mean	Mode	Median	HPP
1	1	1.72	1	1	[0, 4)
2	3	1.96	1	2	[0, 4)
3	1	1.60	1	1	[0, 4)
4	1	1.41	1	1	[0, 3)
5	1	1.44	1	1	[0, 3)
6	4	1.41	1	1	[0, 3)
7	2	1.42	1	1	[0, 3)
8	2	1.42	1	1	[0, 3)
9	3	1.42	1	1	[0, 3)
10	0	1.42	1	1	[0, 3)

Table 4. AIC and BIC for the seasonal INAR(1) models for syphilis cases.

Model	Estimated Values	AIC	BIC	RMSE
PoINAR(1)s	$\hat{ϕ} = 0.1444$ , $\hat{λ} = 2.3552$	820.7281	827.2945	2.0338
GINAR(1)s	$\hat{ϕ} = 0.1342$ , $\hat{θ} = 0.7196$	833.4564	840.0229	2.0413
NGINAR(1)s	$\hat{α} = 0.3297$ , $\hat{μ} = 2.4377$	828.0160	834.5824	2.0591
ZMGINAR(1)s	$\hat{α} = 0.3750$ , $\hat{π} = - 0.1254$ , $\hat{μ} = 2.2853$	825.2454	835.0950	2.0664
TGDINAR(1)s	$\hat{α} = - 0.8055$ , $\hat{q} = 0.6526$ , $\hat{λ} = 0.0643$	806.4264	816.2760	2.0332

Table 5. Point forecast for syphilis cases with

γ

= 0.90.

Table 5. Point forecast for syphilis cases with

γ

= 0.90.

k	$X_{t + k}$	Mean	Mode	Median	HPP
1	2	2.74	1	2	[0, 6)
2	1	2.74	1	2	[0, 6)
3	0	2.68	1	2	[0, 6)
4	0	2.61	1	2	[0, 6)
5	2	2.74	1	2	[0, 6)
6	0	2.61	1	2	[0, 6)
7	0	2.79	1	2	[0, 6)
8	0	2.79	1	2	[0, 6)
9	0	2.79	1	2	[0, 6)
10	1	2.78	1	2	[0, 6)

Table 6. AIC and BIC for the seasonal INAR(1) models for dengue disease data.

Model	Estimated Values	AIC	BIC	RMSE
PoINAR(1)s	$\hat{ϕ} = 0.1363$ , $\hat{λ} = 0.9348$	850.0882	857.4824	1.1896
GINAR(1)s	$\hat{ϕ} = 0.1133$ , $\hat{θ} = 0.5109$	855.7436	863.1378	1.1909
NGINAR(1)s	$\hat{α} = 0.2114$ , $\hat{μ} = 1.0409$	852.6276	860.0218	1.1916
ZMGINAR(1)s	$\hat{α} = 0.3750$ , $\hat{π} = - 0.2332$ , $\hat{μ} = 0.9602$	849.2594	860.3507	1.2194
TGDINAR(1)s	$\hat{α} = - 0.8059$ , $\hat{q} = 0.4077$ , $\hat{λ} = 0.1496$	840.0040	851.0953	1.1893

Table 7. Point forecast for dengue disease data with

γ

= 0.90.

Table 7. Point forecast for dengue disease data with

γ

= 0.90.

k	$X_{t + k}$	Mean	Mode	Median	HPP
1	1	1.22	1	1	[0, 3)
2	2	1.10	0	1	[0, 3)
3	0	1.09	0	1	[0, 3)
4	1	1.08	0	1	[0, 3)
5	0	1.08	0	1	[0, 3)
6	0	1.08	0	1	[0, 3)
7	0	1.08	0	1	[0, 3)
8	0	1.08	0	1	[0, 3)
9	1	1.08	0	1	[0, 3)
10	3	1.08	0	1	[0, 3)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ghodake, A.; Awale, M.; Bakouch, H.S.; Alomair, G.; Daghestani, A.F. A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series. Mathematics 2025, 13, 2334. https://doi.org/10.3390/math13152334

AMA Style

Ghodake A, Awale M, Bakouch HS, Alomair G, Daghestani AF. A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series. Mathematics. 2025; 13(15):2334. https://doi.org/10.3390/math13152334

Chicago/Turabian Style

Ghodake, Aishwarya, Manik Awale, Hassan S. Bakouch, Gadir Alomair, and Amira F. Daghestani. 2025. "A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series" Mathematics 13, no. 15: 2334. https://doi.org/10.3390/math13152334

APA Style

Ghodake, A., Awale, M., Bakouch, H. S., Alomair, G., & Daghestani, A. F. (2025). A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series. Mathematics, 13(15), 2334. https://doi.org/10.3390/math13152334

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Seasonal Transmuted Geometric INAR Process: Modeling and Applications in Count Time Series

Abstract

1. Introduction

The Transmuted Geometric Distribution

2. Construction and Properties of the Process

3. Estimation of Parameters

3.1. Conditional Least Squares (CLS) Estimation

3.2. Conditional Maximum-Likelihood (CML) Estimation

3.3. Simulation Study

4. Real-Life Data Analysis

4.1. Exploratory Analysis of Legionnaires’ Data in Bavaria (2001–2023)

Legionnaires’ Disease

4.2. Exploratory Analysis of Syphilis Cases (2007–2010)

Syphilis Patient Data

4.3. Exploratory Analysis of Dengue Cases (2001–2023)

Dengue Fever Cases

5. Concluding Remarks

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Theorem Proofs

Appendix A.1. Proof of Theorem 1

Appendix A.2. Proof of Theorem 2

Appendix A.3. Proof of Theorem 3

Appendix B. Corollary Proofs

Appendix B.1. Proof of Corollary 1, (1)

Appendix B.2. Proof of Corollary 1, (2)

Appendix B.3. Proof of Corollary 1, (3)

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI