1. Introduction
Time series analysis, a cornerstone of statistics and data science, enables researchers and analysts to uncover hidden patterns, make predictions, and extract meaningful insights from sequential data. While widely adopted in numerous domains, most conventional models primarily focus on continuous-valued time series. However, many real-world scenarios require careful modeling of integer-valued series, including those with both positive and negative integers. Addressing this challenge, this paper introduces a novel integer-valued time series model that integrates copula theory and the bivariate Skellam distribution, adeptly accommodating positive and negative integers.
Copulas are widely recognized for their flexibility in modeling complex dependence structures among random variables. In this study, we extend the applicability of copulas to the domain of integer-valued time series, where maintaining integer constraints without compromising dependence structures is critical. A core focus of our research is to ensure the integrity of integer-valued data when applying copula theory. Although copulas effectively capture intricate dependencies, integer-valued time series present unique challenges that require innovative solutions. We tackle this by introducing a specialized construction that integrates copulas while preserving the discrete integer nature of the data.
The proposed methodology offers several advantages, including its ability to model a wide range of dependencies—positive and negative associations, non-linear relationships, and complex temporal dependence—and to adapt to various real-world applications. To demonstrate the practicality and effectiveness of our model, we conduct extensive empirical analyses on simulated and real datasets. These analyses confirm that our model performs well compared to traditional continuous-valued time series approaches. Our approach has implications across multiple fields such as finance, epidemiology, and environmental science, where accurately capturing the dynamics of integer-valued series, including both positive and negative integers, is critical.
The foundation of our framework is built on two key characteristics. First, the data consist of integer-valued time series counts, positive or negative. For example, Karlis et al. (2024) modeled score differences in handball, highlighting limitations in using the Poisson distribution due to violations of the assumption of equal mean and variance. They proposed the Conway–Maxwell–Poisson distribution as suitable, yet emphasized the necessity of integer-valued distributions when analyzing score differences, making the Skellam distribution an ideal choice. They further utilized Frank and Gumbel copulas to capture correlation and tail dependence, respectively. Similarly, Iacopini and Santagiustina (2021) applied this model within a Bayesian framework to social media count data, jointly analyzing dynamics- and amplification-driven jumps between the US and the UK while managing stationarity issues through state transitions. Furthermore, Wang and Wang (2018) addressed non-stationary multivariate count data, acknowledging challenges in specifying precise dependency structures. Heinen and Rengifo (2007) introduced a multivariate autoregressive conditional double Poisson model using copulas. Fokianos et al. (2020) advocated Poisson marginals without resorting to continuous approximations, implicitly suggesting extensions to negative count series. Bulla et al. (2015) developed the bivariate Skellam distribution and proposed parameter estimation via maximum likelihood (ML), approximated using Gaussian distributions, with a practical application to soccer goal differences. As highlighted in the review by Karlis and Khan (2023) on integer data modeling, copula-based approaches emerge as promising methods. Another application by Barreto-Souza (2015) analyzed integer-valued time series data for Saudi stock market close–open price differences in 2012. Although an autoregressive model was utilized there, our copula-based bivariate approach explicitly models both marginal and joint temporal dependencies.
In the following sections, we explore the methodological details while rigorously maintaining integer constraints, present empirical results, and illustrate practical applications. The remainder of the manuscript is structured as follows. Section 2 reviews the copula-based Skellam model and outlines the estimation procedure and diagnostic tools. Section 3 presents simulation studies. Section 4 provides a real-data application to stock tick differences. Finally, Section 5 concludes with a summary and discussion of possible extensions.
2. The Copula-Based Skellam Model
In this section, we revisit the Skellam distribution (Skellam, 1951) and its regression formulation, then discuss the copula construction used to build a Markovian dependence structure for integer-valued time series.
2.1. Skellam Distribution and Regression
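Suppose that $Y_t$ denotes a random positive or negative integer value at time $t$. We say that $Y_t$ follows a Skellam distribution with rate parameters $\lambda_1$ and $\lambda_2$ if the probability mass function (pmf) is

$$P(Y_t = y) = e^{-(\lambda_1 + \lambda_2)} \left(\frac{\lambda_1}{\lambda_2}\right)^{y/2} I_{|y|}\!\left(2\sqrt{\lambda_1 \lambda_2}\right)$$

for $y \in \mathbb{Z}$, $\lambda_1, \lambda_2 > 0$, where $I_r(x)$ is the modified Bessel function of the first kind and order $r$ (Abramowitz & Stegun, 1968),

$$I_r(x) = \sum_{k=0}^{\infty} \frac{1}{k!\,\Gamma(k + r + 1)} \left(\frac{x}{2}\right)^{2k + r},$$

and $r$ can be real or complex. Simplifications are obtained when $r$ is an integer; see Bowman (2012).
We write $Y_t \sim \text{Skellam}(\lambda_1, \lambda_2)$. A simple check shows that

$$E(Y_t) = \lambda_1 - \lambda_2, \qquad \operatorname{Var}(Y_t) = \lambda_1 + \lambda_2,$$

with moment generating function

$$M_{Y_t}(s) = \exp\left\{-(\lambda_1 + \lambda_2) + \lambda_1 e^{s} + \lambda_2 e^{-s}\right\}.$$

The cumulative distribution function (cdf) of $Y_t$ is

$$F(y) = P(Y_t \le y) = \sum_{j=-\infty}^{y} P(Y_t = j).$$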
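The Skellam pmf, moments, and cdf described above can be checked numerically. The following is a minimal sketch in standard-library Python, not the authors' R implementation: the function names (`bessel_i`, `skellam_pmf`, `skellam_cdf`), the truncation rule for the lower tail, and the rates 3.0 and 1.5 are all assumptions made for illustration, and the power-series Bessel evaluation is only intended for moderate rates.

```python
import math

def bessel_i(r, x, max_terms=500):
    """I_r(x) for integer r >= 0 and x > 0, summed from its power series."""
    half = x / 2.0
    # k = 0 term, (x/2)^r / r!, computed on the log scale to avoid overflow
    term = math.exp(r * math.log(half) - math.lgamma(r + 1))
    total = term
    for k in range(1, max_terms):
        term *= half * half / (k * (k + r))  # ratio of consecutive series terms
        total += term
        if term < 1e-17 * total:
            break
    return total

def skellam_pmf(y, lam1, lam2):
    """P(Y = y) for Y ~ Skellam(lam1, lam2) with lam1, lam2 > 0."""
    scale = math.exp(-(lam1 + lam2)) * (lam1 / lam2) ** (y / 2.0)
    return scale * bessel_i(abs(y), 2.0 * math.sqrt(lam1 * lam2))

def skellam_cdf(y, lam1, lam2):
    """P(Y <= y), summing the pmf upward from a point far in the lower tail."""
    mean, sd = lam1 - lam2, math.sqrt(lam1 + lam2)
    lo = math.floor(mean - 12.0 * sd) - 12  # assumed truncation point
    return min(1.0, sum(skellam_pmf(j, lam1, lam2) for j in range(lo, y + 1)))

# Illustrative rates (not taken from the paper)
lam1, lam2 = 3.0, 1.5
support = range(-60, 61)
probs = [skellam_pmf(y, lam1, lam2) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))
```

With these rates the printed total mass, mean, and variance should be close to 1, $\lambda_1 - \lambda_2 = 1.5$, and $\lambda_1 + \lambda_2 = 4.5$, in line with the moment formulas.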
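Following generalized linear models (GLMs) (Nelder & Wedderburn, 1972), covariates can be included via a regression specification. A Skellam regression model fits two GLMs for $\lambda_{1t}$ and $\lambda_{2t}$ with log link:

$$\log \lambda_{jt} = \mathbf{x}_t^{\top} \boldsymbol{\beta}_j,$$

where $\mathbf{x}_t$ is a vector of $k$ covariates associated with regression parameter vector $\boldsymbol{\beta}_j$ for $j = 1, 2$, and $t = 1, \ldots, n$.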
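As a small worked illustration of the log-link specification, the sketch below maps one covariate vector to the two Skellam rates and the implied conditional mean and variance. The function name `skellam_rates`, the covariate values, and the coefficients are hypothetical and chosen only for illustration.

```python
import math

def skellam_rates(x_t, beta1, beta2):
    """Return (lambda_1t, lambda_2t) under log links: log(lambda_jt) = x_t' beta_j."""
    eta1 = sum(x * b for x, b in zip(x_t, beta1))
    eta2 = sum(x * b for x, b in zip(x_t, beta2))
    return math.exp(eta1), math.exp(eta2)

# Hypothetical design: an intercept plus one covariate
x_t = [1.0, 0.5]
beta1 = [0.2, 0.4]    # made-up coefficients for lambda_1t
beta2 = [0.1, -0.3]   # made-up coefficients for lambda_2t

lam1_t, lam2_t = skellam_rates(x_t, beta1, beta2)
cond_mean = lam1_t - lam2_t   # E(Y_t | x_t)
cond_var = lam1_t + lam2_t    # Var(Y_t | x_t)
print(lam1_t, lam2_t, cond_mean, cond_var)
```

The exponential inverse link keeps both rates positive for any real coefficients, which is why the log link is the natural choice here.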
2.2. Copula-Based Bivariate Skellam Model
We now introduce the copula-based Skellam model, combining copula theory with Skellam marginals to construct a multivariate distribution suitable for serially dependent counts. There are many applicable copula families; see Joe (2014) for an extensive review. Different bivariate copulas capture different types of dependence.
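For a first-order Markov specification, we consider consecutive Skellam random variables $(Y_{t-1}, Y_t)$. Let $F_t(\cdot)$ denote the Skellam cdf at time $t$, and $C(\cdot, \cdot; \delta)$ a bivariate copula with dependence parameter $\delta$. The joint pmf is obtained by the standard inclusion–exclusion formula for discrete margins:

$$P(Y_{t-1} = y_{t-1}, Y_t = y_t) = C(u_{t-1}, u_t; \delta) - C(u_{t-1}^{-}, u_t; \delta) - C(u_{t-1}, u_t^{-}; \delta) + C(u_{t-1}^{-}, u_t^{-}; \delta), \tag{4}$$

where $u_s = F_s(y_s)$ and $u_s^{-} = F_s(y_s - 1)$ for $s \in \{t-1, t\}$.
For a Gaussian copula, we have

$$C(u_1, u_2; \delta) = \Phi_{\delta}\!\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\right) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \phi_{\delta}(z_1, z_2)\, dz_2\, dz_1, \tag{6}$$

where $\Phi^{-1}$ is the standard normal quantile function, $\Phi_{\delta}$ is the bivariate standard normal cdf with correlation $\delta$, and $\phi_{\delta}$ is the corresponding pdf. Substituting (6) into (4) yields the joint pmf of $(Y_{t-1}, Y_t)$ when serial dependence is captured by a Gaussian copula. Although (6) has no closed form, accurate numerical approximations are straightforward in the bivariate case.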
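To make the inclusion–exclusion construction concrete, here is a hedged standard-library Python sketch of the joint pmf of two consecutive Skellam observations under a Gaussian copula. It is not the authors' code. As an assumption of this sketch, the bivariate normal cdf is evaluated with the single-integral representation $\Phi_\rho(h,k) = \Phi(h)\Phi(k) + \frac{1}{2\pi}\int_0^{\arcsin\rho} \exp\{-(h^2 + k^2 - 2hk\sin a)/(2\cos^2 a)\}\,da$ and composite Simpson's rule; all function names and the clamping constant are illustrative, and time-constant rates are assumed.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def bessel_i(r, x, max_terms=500):
    """I_r(x) for integer r >= 0 and x > 0 via its power series."""
    half = x / 2.0
    term = math.exp(r * math.log(half) - math.lgamma(r + 1))
    total = term
    for k in range(1, max_terms):
        term *= half * half / (k * (k + r))
        total += term
        if term < 1e-17 * total:
            break
    return total

def skellam_pmf(y, lam1, lam2):
    scale = math.exp(-(lam1 + lam2)) * (lam1 / lam2) ** (y / 2.0)
    return scale * bessel_i(abs(y), 2.0 * math.sqrt(lam1 * lam2))

def skellam_cdf(y, lam1, lam2):
    lo = math.floor(lam1 - lam2 - 12.0 * math.sqrt(lam1 + lam2)) - 12
    return min(1.0, sum(skellam_pmf(j, lam1, lam2) for j in range(lo, y + 1)))

def bvn_cdf(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) for standard normals with correlation |rho| < 1."""
    base = STD.cdf(h) * STD.cdf(k)
    upper = math.asin(rho)
    if upper == 0.0:
        return base
    def g(a):
        return math.exp(-(h * h + k * k - 2.0 * h * k * math.sin(a))
                        / (2.0 * math.cos(a) ** 2))
    step = upper / n  # n must be even for Simpson's rule
    s = g(0.0) + g(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * step)
    return base + s * step / 3.0 / (2.0 * math.pi)

def gaussian_copula(u, v, rho):
    """C(u, v; rho); arguments are kept strictly below 1 before inversion."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    zu = STD.inv_cdf(min(u, 1.0 - 1e-12))
    zv = STD.inv_cdf(min(v, 1.0 - 1e-12))
    return bvn_cdf(zu, zv, rho)

def joint_pmf(y_prev, y_curr, lam1, lam2, rho):
    """P(Y_{t-1} = y_prev, Y_t = y_curr) by inclusion-exclusion on the copula."""
    u_hi, u_lo = skellam_cdf(y_prev, lam1, lam2), skellam_cdf(y_prev - 1, lam1, lam2)
    v_hi, v_lo = skellam_cdf(y_curr, lam1, lam2), skellam_cdf(y_curr - 1, lam1, lam2)
    p = (gaussian_copula(u_hi, v_hi, rho) - gaussian_copula(u_lo, v_hi, rho)
         - gaussian_copula(u_hi, v_lo, rho) + gaussian_copula(u_lo, v_lo, rho))
    return max(p, 0.0)  # guard against tiny negative rounding errors

# Illustrative values only
print(joint_pmf(1, 0, 2.0, 1.0, 0.5))
```

Two sanity checks follow directly from the construction: with `rho = 0` the joint pmf reduces to the product of the two marginal pmfs, and summing the joint pmf over `y_curr` recovers the marginal pmf of `y_prev`.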
2.3. Estimation Method
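Let $\{Y_t\}_{t=1}^{n}$ be a first-order Markov count series with marginal pmf $f(y_t; \boldsymbol{\theta})$ and bivariate cdf

$$F_{12}(y_{t-1}, y_t; \boldsymbol{\eta}) = C\!\left(F(y_{t-1}; \boldsymbol{\theta}), F(y_t; \boldsymbol{\theta}); \delta\right),$$

where $\boldsymbol{\theta}$ and $\delta$ denote marginal and copula parameters, respectively. The joint pmf factorizes as

$$f(y_1, \ldots, y_n; \boldsymbol{\eta}) = f(y_1; \boldsymbol{\theta}) \prod_{t=2}^{n} f(y_t \mid y_{t-1}; \boldsymbol{\eta}),$$

where

$$f(y_t \mid y_{t-1}; \boldsymbol{\eta}) = \frac{f_{12}(y_{t-1}, y_t; \boldsymbol{\eta})}{f(y_{t-1}; \boldsymbol{\theta})}$$

and $f_{12}$ is the joint pmf in (4). Hence, the log-likelihood is

$$\ell(\boldsymbol{\eta}) = \log f(y_1; \boldsymbol{\theta}) + \sum_{t=2}^{n} \left[\log f_{12}(y_{t-1}, y_t; \boldsymbol{\eta}) - \log f(y_{t-1}; \boldsymbol{\theta})\right], \tag{8}$$

where $\boldsymbol{\eta} = (\boldsymbol{\theta}^{\top}, \delta)^{\top}$. The parameter vector is estimated by

$$\hat{\boldsymbol{\eta}} = \arg\max_{\boldsymbol{\eta}} \ell(\boldsymbol{\eta}).$$

Numerical maximization yields $\hat{\boldsymbol{\eta}}$ and the observed Hessian matrix of the negative log-likelihood, which is the observed Fisher information; the square roots of the diagonal elements of its inverse give standard errors for the ML estimates. Asymptotic properties of such estimates can be found in Alqawba and Diawara (2021). R code for the negative log-likelihood corresponding to (8) is provided in Appendix A.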
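The structure of the log-likelihood (one marginal term plus a sum of conditional terms) can be sketched independently of the particular marginal and copula. The Python function below is an illustrative stand-in, not the `nll_skellam()` listing of Appendix A: `markov_nll`, its arguments, and the three-point toy pmf are assumptions introduced here. The marginal pmf and the pairwise joint pmf are passed in as callables, so a Skellam pmf and a copula-based joint pmf could be supplied in their place.

```python
import math

def markov_nll(y, marginal_pmf, joint_pmf):
    """Negative log-likelihood of a first-order Markov chain:
    -[ log f(y_1) + sum_{t>=2} ( log f12(y_{t-1}, y_t) - log f(y_{t-1}) ) ]."""
    tiny = 1e-300  # floor that keeps log() finite if a probability underflows
    ll = math.log(max(marginal_pmf(y[0]), tiny))
    for prev, curr in zip(y, y[1:]):
        ll += math.log(max(joint_pmf(prev, curr), tiny))
        ll -= math.log(max(marginal_pmf(prev), tiny))
    return -ll

# Toy check: a hypothetical three-point pmf with the independence copula,
# for which f12(a, b) = f(a) * f(b) and the NLL is just -sum(log f(y_t)).
toy = {-1: 0.25, 0: 0.5, 1: 0.25}

def f(k):
    return toy.get(k, 0.0)

def f12(a, b):
    return f(a) * f(b)

y = [0, 1, 0, -1, 0]
nll = markov_nll(y, f, f12)
print(nll)
```

In practice this function would be handed to a general-purpose numerical optimizer that also returns a Hessian at the optimum; passing the pmfs as callables keeps the likelihood code unchanged when a different copula family is tried.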
2.4. Randomized Quantile Residuals
The primary aim of this section is to define residuals that are exactly i.i.d. under the fitted model, so that their empirical distribution, autocorrelation, and other diagnostics provide a direct gauge of model adequacy: any systematic departure from standard normal behaviour signals lack of fit.
Classical linear models with independent Gaussian errors possess well-known normal residuals, but those residuals lose their approximate normality when the response is discrete or otherwise non-Gaussian. To restore the desired $N(0,1)$ property, Dunn and Smyth (1996a) introduced randomized quantile residuals (RQRs). We adopt that idea here.
Under the assumptions of our model, the residual at time $t$ is defined as

$$r_t = \Phi^{-1}(u_t),$$

where $\Phi^{-1}$ is the standard normal quantile function (the inverse of the standard normal cdf), $u_t = a_t + v_t (b_t - a_t)$, $a_t = F(y_t - 1 \mid y_{t-1})$, $b_t = F(y_t \mid y_{t-1})$, and the random variables $v_t \overset{\text{iid}}{\sim} U(0,1)$ make $u_t$ continuous and mutually independent. By the probability integral transform, $r_t \sim N(0,1)$.
The conditional cdf required above is

$$F(y_t \mid y_{t-1}) = \frac{C\!\left(F(y_{t-1}), F(y_t); \delta\right) - C\!\left(F(y_{t-1} - 1), F(y_t); \delta\right)}{f(y_{t-1})},$$

with $f(y_{t-1}) = F(y_{t-1}) - F(y_{t-1} - 1)$ and $C(\cdot, \cdot; \delta)$ denoting the copula that links consecutive observations; see Joe (2014) for derivations. Similar residuals appear elsewhere under different names, for example “pseudo-residuals” in MacDonald and Zucchini (1997), and they form the basis for residual cumulative sum (CUSUM) charts in Alqawba et al. (2022). CUSUM charts suit count data because they accumulate small, persistent deviations in noisy discrete processes, enabling early detection of shifts in the underlying event rate. The theoretical properties of CUSUM rely on independence and stationarity, which are violated in dependent count time series; model-based generalized residuals restore these assumptions and allow valid change detection (Page, 1954; Dunn & Smyth, 1996a; Rémillard, 2017; Alqawba et al., 2022).
Because the $r_t$'s are standard normal under the fitted model, any departure from $N(0,1)$, whether in marginal distribution, serial dependence, or higher-order structure, serves as clear evidence that the model fails to capture some aspect of the data. The R function for computing RQRs is given in Appendix A.
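The randomization step behind these residuals can be sketched as follows. This is an illustrative Python outline rather than the `rqr_skellam()` listing of Appendix A: the function name, the `cond_cdf(k, y_prev)` callable interface, the clamping constant, and the five-point toy distribution are assumptions of the sketch. For brevity the toy conditional cdf ignores the previous observation; in the copula model it would be the conditional cdf given $y_{t-1}$.

```python
import random
from statistics import NormalDist

def randomized_quantile_residuals(y, cond_cdf, seed=0):
    """r_t = Phi^{-1}(u_t), with u_t drawn uniformly between
    F(y_t - 1 | y_{t-1}) and F(y_t | y_{t-1}).
    cond_cdf(k, y_prev) returns P(Y_t <= k | Y_{t-1} = y_prev);
    y_prev is None for the first observation (marginal cdf)."""
    rng = random.Random(seed)
    std = NormalDist()
    residuals = []
    prev = None
    for y_t in y:
        a = cond_cdf(y_t - 1, prev)
        b = cond_cdf(y_t, prev)
        u = a + rng.random() * (b - a)           # jitter within the cdf jump
        u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep inv_cdf well defined
        residuals.append(std.inv_cdf(u))
        prev = y_t
    return residuals

# Toy model (hypothetical): i.i.d. draws from a symmetric five-point pmf
toy_pmf = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

def toy_cdf(k, y_prev):
    return sum(p for j, p in toy_pmf.items() if j <= k)

sim = random.Random(1).choices(list(toy_pmf), weights=list(toy_pmf.values()), k=2000)
res = randomized_quantile_residuals(sim, toy_cdf)
m = sum(res) / len(res)
v = sum((r - m) ** 2 for r in res) / (len(res) - 1)
print(round(m, 3), round(v, 3))
```

When the data really come from the model used to compute the residuals, as in this toy check, the sample mean and variance of the residuals should be near 0 and 1. Fixing the seed makes the randomized residuals reproducible across runs.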
3. Simulation Studies
In this simulation study, each run generated a synthetic integer-valued time series under a Markov Skellam model with Gaussian copula, employing known “true” parameters $\lambda_1$, $\lambda_2$, and $\delta$. These parameters define the marginal Skellam distributions (through $\lambda_1$ and $\lambda_2$) and the serial dependence structure (through the copula parameter $\delta$). For each scenario, we varied both the sample size $n$ (e.g., 100, 200, 500) and the sign and magnitude of $\delta$ (e.g., a positive or a negative value), then repeated the simulation 500 times, fitting the model each time via maximum likelihood estimation. We subsequently took the mean of the 500 estimates for each parameter to assess bias, and computed the Mean Absolute Deviation of Estimates (MADE) to evaluate overall estimation accuracy; for a given parameter $\theta$,

$$\text{MADE}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \left|\hat{\theta}^{(r)} - \theta_0\right|,$$

where $\hat{\theta}^{(r)}$ is the estimate of $\theta$ from the $r$th simulation out of $R$ runs, and $\theta_0$ is the true parameter value.
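The two summary measures used in the simulation study, the mean of the estimates (for bias) and the MADE, amount to the following computation. The function names and the four replicate estimates below are made up for illustration and are not results from the paper.

```python
def bias(estimates, true_value):
    """Mean of the replicate estimates minus the true parameter value."""
    return sum(estimates) / len(estimates) - true_value

def made(estimates, true_value):
    """Mean absolute deviation of the replicate estimates from the truth."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

# Hypothetical replicate estimates of a parameter whose true value is 2.0
estimates = [1.5, 2.5, 2.0, 2.0]
print(bias(estimates, 2.0), made(estimates, 2.0))
```

The example shows why both measures are reported: errors of opposite sign cancel in the bias (zero here) but not in the MADE (0.25 here).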
From the table of results, one observes that for each scenario the mean estimated values of $\lambda_1$, $\lambda_2$, and $\delta$ lie close to their corresponding true parameters, especially for moderate to larger $n$. As $n$ increases to 200 or 500, the estimates become more precise, generally converging closer to the true values with smaller MADE. This improvement is consistent across both positively and negatively correlated scenarios ($\delta > 0$ vs. $\delta < 0$), indicating that the maximum likelihood procedure adapts well regardless of whether the underlying dependence is positive or negative.
For smaller sample sizes (e.g., $n = 100$), one can detect mild bias in some cases, where the mean estimated values for $\lambda_1$, $\lambda_2$, or $\delta$ differ from the true parameter by a moderate margin. However, even in these small-$n$ scenarios, the MADE remains within a tolerable range, suggesting that the estimators do not systematically diverge from the target. As the sample size grows to 200 and then 500, this bias diminishes, and the MADE values drop substantially, reflecting the consistency of the maximum likelihood estimator.
Overall, these simulations demonstrate that the Markov Skellam model with Gaussian copula is well-estimated by maximum likelihood, with robust performance across various combinations of marginal parameters and dependence levels. The MADE metrics confirm that the absolute deviations from truth tend to be moderate at lower sample sizes and shrink as n increases. This pattern aligns with standard large-sample asymptotic theory, supporting the reliability of the proposed model and estimation procedure for integer-valued time series data with Skellam marginals and Gaussian-copula-based serial dependence.
Figure 1 shows that, even with negative serial dependence ($\delta < 0$) and the smallest sample size ($n = 100$), the box-plots cluster tightly around the true values of $\lambda_1$, $\lambda_2$, and $\delta$, confirming good finite-sample performance. As sample size increases to $n = 200$ and $n = 500$, the interquartile ranges shrink and the medians coincide almost exactly with the reference lines, indicating greater precision. Estimator accuracy improves monotonically with sample size across both positive and negative autocorrelation settings, underscoring the robustness of the estimation method.
Figure 2 presents QQ-plots for each estimated parameter across different parameter sets and sample sizes, illustrating that the estimates adhere closely to normality, particularly as the sample size increases. This supports inference procedures that rely on asymptotic normality of the estimates, such as Wald tests and Wald-type confidence intervals.
4. Applications
We now apply the proposed model to a real data set. The example originates in Barreto-Souza (2015), who analysed integer-valued time series data for the Saudi stock market based on close and open price differences in 2012. Although a first-order integer-valued autoregressive (INAR(1)) skew process model was considered there, the copula-based bivariate construction presented here has several advantages, as it simultaneously models the temporal dependence structure between successive observations and the joint dependency structures implied by the copula.
The underlying assumption is that there is dependence in the time series. While Barreto-Souza (2015) estimates the difference between means under a skew discrete Laplace distribution and a thinning operator, our method provides separate estimates for the Poisson means underlying the Skellam distribution. The skew Laplace distribution can be thought of as a difference between two independent geometric random variables, each describing the number of trials until first success in a sequence of Bernoulli trials with fixed success probability. Although this is a valid construction, the Poisson model instead captures the number of events at a constant average rate, which is arguably more natural for tick-level price changes.
Table 1 confirms that the STC series is centred close to zero, moderately overdispersed (
), and roughly symmetric. These features align well with the Skellam distribution, which naturally accommodates both positive and negative counts.
Figure 3 reinforces this interpretation: the top panel resembles a symmetric Skellam-like shape, while the ACF in the bottom panel shows a single significant spike at lag 1, suggesting first-order serial dependence. Such dependence can be captured through a Gaussian-copula structure, with Skellam marginals for the integer-valued observations.
Hence, as argued by Sellers (2012) and Weiß (2020), the Skellam distribution is particularly suitable for such data. Following the simulation design in Table 2, we estimated the model under four different copula functions: Gaussian, Clayton, Joe, and Gumbel. The resulting parameter estimates are reported in Table 3. The four copulas yield similar Poisson mean estimates.
Under the Gumbel copula, the estimated Poisson means are
and
with standard errors
and
, respectively. Gumbel is an example of an extreme-value copula that captures upper-tail dependence. The dependence parameter (
) under Gaussian copula is close to the thinning parameter
reported in
Barreto-Souza (
2015), but with smaller standard error. The Gumbel copula is useful when variables tend to show tail dependence or extreme co-movements. With dependence parameter
, a first-order autoregressive structure is adequate, and has smallest AIC, with value
.
The skew integer autoregressive model of Barreto-Souza (2015) did not show evidence of a trend, and its relatively large standard errors for the individual parameters suggested the need for a more efficient model. In contrast, the Poisson-based Skellam specification proposed here yields smaller standard errors. Another reason for this improvement may be that the probabilistic structure of bounded integer-valued processes remains an open problem; see Scotto et al. (2015) for a discussion.
To compare the proposed model against established models for Z-valued time series, we draw on the illustrative empirical study reported by Karlis and Khan (2023). Using the Saudi Telecom (STC) tick-difference series, they fit a set of widely used first-order Z-INAR-type alternatives, including TINAR and PDINAR (Skellam-based constructions), as well as discrete Laplace and skew-Laplace variants (e.g., DLINAR and skew-Laplace SINARZ), and evaluated model adequacy primarily through marginal fit. Their Table 2 reports an “MSE” defined as the sum of squared differences between the observed relative-frequency vector of
and the corresponding relative-frequency vector obtained from parametric bootstrap simulations (1000 replications) under each fitted model. Following the same diagnostic definition, we compute an analogous pmf-based discrepancy for the fitted copula models. The copula specifications yield
values around
–
(
Table 3), improving upon the Skellam-based
Z-INAR benchmarks (TINAR:
; PDINAR:
; Skellam-based SINARZ:
) and performing comparably to PSINARAsym (
). However, the discrete Laplace and skew-Laplace baselines in
Karlis and Khan (
2023) remain markedly lower (DLINAR:
; skew-Laplace SINARZ:
), consistent with their conclusion that heavier-tailed and/or skewed integer marginals can more closely reproduce the observed frequency mass. Collectively, this comparison situates the copula-based approach as a competitive alternative to standard Skellam-based
Z-INAR formulations.
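The discrepancy measure discussed above compares observed relative frequencies with model-implied probabilities. A simplified Python sketch is given below. Note the assumption: the version described in the text obtains the model frequencies from parametric bootstrap simulations, whereas this sketch substitutes the exact model pmf to stay short. The function name and the toy data are hypothetical.

```python
from collections import Counter

def pmf_discrepancy(y, model_pmf):
    """Sum over the observed range of (relative frequency - model probability)^2."""
    n = len(y)
    counts = Counter(y)
    support = range(min(y), max(y) + 1)
    return sum((counts.get(k, 0) / n - model_pmf(k)) ** 2 for k in support)

# Toy data and two hypothetical candidate pmfs on {-1, 0, 1}
y = [0, 0, 1, -1]

def matching_pmf(k):
    return {-1: 0.25, 0: 0.5, 1: 0.25}.get(k, 0.0)

def uniform_pmf(k):
    return 1.0 / 3.0 if k in (-1, 0, 1) else 0.0

print(pmf_discrepancy(y, matching_pmf), pmf_discrepancy(y, uniform_pmf))
```

The first candidate reproduces the observed frequencies exactly and scores zero; the uniform candidate scores 1/24. Because the measure looks only at the marginal frequencies, it says nothing about how well serial dependence is captured.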
Figure 4, Figure 5 and Figure 6 show that the randomized quantile residuals from all four copula models follow the $N(0,1)$ QQ line closely, supporting the assumption of normality and, by implication, the adequacy of the marginal Skellam–copula specification. Because the residuals are essentially Gaussian, standard CUSUM charts are appropriate. The CUSUM trajectories remain inside the
bounds except for a small excursion around observations 100–200, indicating only transient departures and no sustained process shift. Together with the single significant ACF spike at lag 1, these diagnostics confirm that a first-order dependence structure, captured through the copula, suffices for the STC series.
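For readers who want to reproduce this kind of check, a residual-based CUSUM can be sketched as below. This is a generic two-sided tabular CUSUM in the spirit of Page (1954), applied to residuals assumed to be $N(0,1)$ under an adequate model. It is not necessarily the chart used for the figures, and the reference value `k = 0.5` and decision limit `h = 5.0` are conventional illustrative choices, not values taken from this study.

```python
def two_sided_cusum(residuals, k=0.5, h=5.0):
    """Tabular CUSUM on residuals assumed N(0, 1) under the fitted model.
    k is the reference value, h the decision limit (both illustrative)."""
    hi = lo = 0.0
    upper, lower, alarms = [], [], []
    for t, r in enumerate(residuals):
        hi = max(0.0, hi + r - k)   # accumulates evidence of an upward shift
        lo = min(0.0, lo + r + k)   # accumulates evidence of a downward shift
        upper.append(hi)
        lower.append(lo)
        if hi > h or lo < -h:
            alarms.append(t)
    return upper, lower, alarms

# Artificial residuals: in control for 10 steps, then a sustained upward shift
residuals = [0.0] * 10 + [2.0] * 10
upper, lower, alarms = two_sided_cusum(residuals)
print(alarms[0])
```

In this artificial series the upper statistic grows by 1.5 per step once the shift begins and first crosses the limit at index 13, four observations after the change. A brief excursion that returns inside the limits, as opposed to a sustained run of alarms, points to a transient departure rather than a lasting shift.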
5. Summary
Count data play a crucial role in many areas and provide valuable insights. For integer-valued count data, reliable inferences lead to accurate characterization of the underlying process. This article has presented an original integer-valued time series model via copula theory, accounting for both positive and negative integers, and offers a versatile solution for modeling such series. The proposed class of models enhances the accuracy of modeling, forecasting, and decision-making in settings where integer-valued data with mixed signs are observed.
There are several possible extensions. The Poisson components underlying the Skellam marginals could be replaced by more flexible distributions such as the negative binomial or Conway–Maxwell–Poisson. Such models could be computationally demanding, but copulas offer a convenient framework for building not only univariate but also multivariate or high-dimensional integer-valued time series models. A comparison with the Exponentially Weighted Moving Average time series model is another interesting project. Although the first-order Markov process was presented for stable solutions with minimal complexity, higher-order Markov processes could be considered when delay effects are suspected.
Author Contributions
Conceptualization, M.A., N.D. and M.M.S.; methodology, M.A., N.D. and M.M.S.; software, M.A., N.D. and M.M.S.; validation, M.A., N.D. and M.M.S.; formal analysis, M.A., N.D. and M.M.S.; investigation, M.A., N.D. and M.M.S.; resources, M.A., N.D. and M.M.S.; data curation, M.A., N.D. and M.M.S.; writing—original draft preparation, M.A., N.D. and M.M.S.; writing—review and editing, M.A., N.D. and M.M.S.; visualization, M.A., N.D. and M.M.S.; supervision, M.A., N.D. and M.M.S.; project administration, M.A., N.D. and M.M.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data and R code used in this study are available from the corresponding author upon reasonable request.
Acknowledgments
The authors thank the anonymous editor and the two referees for their constructive comments and suggestions, which improved the manuscript significantly.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| ACF | Autocorrelation function |
| CUSUM | Cumulative sum control chart |
| GLM | Generalized linear model |
| RQR | Randomized quantile residual |
Appendix A. R Code
Appendix A.1. Negative Log-Likelihood
Listing A1. Function nll_skellam().
Appendix A.2. Randomized Quantile Residuals
Listing A2. Function rqr_skellam().
References
- Abramowitz, M., & Stegun, I. A. (1968). Handbook of mathematical functions with formulas, graphs, and mathematical tables (Vol. 55). US Government Printing Office.
- Alqawba, M., & Diawara, N. (2021). Copula-based Markov zero-inflated count time series models with application. Journal of Applied Statistics, 48(5), 786–803.
- Alqawba, M., Kim, J.-M., & Radwan, T. (2022). Residual-based cumulative sum charts to monitor time series of counts via copula-based Markov models. Applied Stochastic Models in Business and Industry, 38(6), 1039–1048.
- Barreto-Souza, W. (2015). Zero-modified geometric INAR(1) process for modelling count time series with deflation or inflation of zeros. Journal of Time Series Analysis, 36(6), 839–852.
- Bowman, F. (2012). Introduction to Bessel functions. Courier Corporation.
- Bulla, J., Chesneau, C., & Kachour, M. (2015). On the bivariate Skellam distribution. Communications in Statistics - Theory and Methods, 44(21), 4552–4567.
- Dunn, P. K., & Smyth, G. K. (1996a). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3), 236–244.
- Fokianos, K., Støve, B., Tjøstheim, D., & Doukhan, P. (2020). Multivariate count autoregression. Bernoulli, 26(1), 471–499.
- Heinen, A., & Rengifo, E. (2007). Multivariate autoregressive modeling of time series count data using copulas. Journal of Empirical Finance, 14(4), 564–583.
- Iacopini, M., & Santagiustina, C. R. (2021). Filtering the intensity of public concern from social media count data with jumps. Journal of the Royal Statistical Society Series A: Statistics in Society, 184(4), 1283–1302.
- Joe, H. (2014). Dependence modeling with copulas. Chapman and Hall/CRC.
- Karlis, D., & Khan, M. N. (2023). Models for integer data. Annual Review of Statistics and Its Application, 10, 297–323.
- Karlis, D., Michels, R., & Otting, M. (2024). Modelling handball outcomes using univariate and bivariate approaches. arXiv, arXiv:2404.04213.
- MacDonald, I. L., & Zucchini, W. (1997). Hidden Markov and other models for discrete-valued time series (Vol. 110). CRC Press.
- Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society. Series A (General), 135(3), 370–384. Available online: http://www.jstor.org/stable/2344614 (accessed on 29 November 2025).
- Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1–2), 100–115.
- Rémillard, B. (2017). Goodness-of-fit tests for copulas of multivariate time series. Springer.
- Scotto, M. G., Weiß, C. H., & Gouveia, S. (2015). Thinning-based models in the analysis of integer-valued time series: A review. Statistical Modelling, 15(6), 590–618.
- Sellers, K. (2012). A distribution describing differences in count data containing common dispersion levels. Advances and Applications in Statistical Sciences, 7(3), 35–46.
- Skellam, J. G. (1951). Random dispersal in theoretical populations. Biometrika, 38(1/2), 196–218.
- Wang, F., & Wang, H. (2018). Modelling non-stationary multivariate time series of counts via common factors. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4), 769–791.
- Weiß, C. (2020). Stationary count time series models. Wiley Interdisciplinary Reviews: Computational Statistics, 13, e1502.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.