# Count Data Time Series Modelling in Julia—The CountTimeSeries.jl Package and Applications

## Abstract


## 1. Introduction

## 2. The CountTimeSeries Package

#### 2.1. INGARCH Framework

#### 2.2. INARMA Framework

#### 2.3. Package Structure

The type `CountModel` covers every model described in the previous section. Models in the INGARCH and INARMA frameworks are collected in the types `INGARCH` and `INARMA`, respectively. Subtypes of these two are, finally, `INGARCHModel`, `INARCHModel`, and `IIDModel`, as well as `INARMAModel`, `INARModel`, and `INMAModel`. Defining such a type tree makes it possible to implement methods for specific groups of models.
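The plain-Julia sketch below shows how a type tree of this kind lets one method serve a whole group of models. The names `CountModelSketch`, `INGARCHSketch`, `INARMASketch`, `INARCHSketch`, `INARSketch`, `framework`, and `describe` are hypothetical. They mirror the structure described above but are not the package's definitions.

```julia
# Hypothetical mirror of the type tree (not the package's source code).
abstract type CountModelSketch end
abstract type INGARCHSketch <: CountModelSketch end
abstract type INARMASketch <: CountModelSketch end

struct INARCHSketch <: INGARCHSketch
    p::Int  # number of past observations
end

struct INARSketch <: INARMASketch
    p::Int  # number of thinned past observations
end

# One method covers every model in the INGARCH branch...
framework(::INGARCHSketch) = "INGARCH"
# ...and one covers every model in the INARMA branch.
framework(::INARMASketch) = "INARMA"

# A method on the root type applies to all models (each sketch type has a field p).
describe(m::CountModelSketch) = string(framework(m), " model of order ", m.p)

describe(INARSketch(2))  # "INARMA model of order 2"
```

Julia's multiple dispatch selects the most specific applicable method. A method written once for an abstract type is therefore inherited by every concrete model below it.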

The function `Model()` is used to specify a model. The user provides the model framework (INGARCH or INARMA), the distribution, the link function, and the model orders $p$ and $q$. Optional regressors can be added together with an indicator of whether each is treated as internal or external, and zero inflation can be switched on or off. The default setting is a simple Poisson IID model. For example, a zero-inflated Negative Binomial INARCH(1) model is specified by `Model(distr = "NegativeBinomial", pastObs = 1, zi = true)`.

Parameter values can be given either as a vector or as an object of type `parameter`, with entries for $\beta_0$, $\alpha$, $\beta$, $\eta$, $\varphi$, and $\omega$. The implementation makes no notational distinction between parameters for internal and external regressors. Both ways of providing parameter values are useful. The optimization of the likelihood, for example, works with the parameter vector, whereas estimation results are more convenient to handle as the `parameter` type.

Time series are simulated with the `simulate()` function. Calling `simulate(T, model, parameter)` produces a time series of length `T`.
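The simulation mechanism can be illustrated without the package. The sketch below simulates a Poisson INARCH(1) process with identity link, $\lambda_t = \beta_0 + \alpha_1 y_{t-1}$. Several details are assumptions of this sketch and are not taken from `simulate()`: the helper names `rand_poisson` and `simulate_inarch1`, the identity link, and starting the recursion at the stationary mean $\beta_0/(1-\alpha_1)$.

```julia
using Random

# Draw from Poisson(λ) with Knuth's multiplication method (adequate for moderate λ).
function rand_poisson(rng::AbstractRNG, λ::Real)
    L = exp(-λ)
    k = 0
    p = rand(rng)
    while p > L
        k += 1
        p *= rand(rng)
    end
    return k
end

# Simulate a Poisson INARCH(1) path with identity link:
#   λ_t = β0 + α1 * y_{t-1},  y_t | past ~ Poisson(λ_t).
function simulate_inarch1(rng::AbstractRNG, T::Integer, β0::Real, α1::Real)
    @assert β0 > 0 && 0 <= α1 < 1 "parameters violate the stability constraints"
    y = zeros(Int, T)
    λ = β0 / (1 - α1)          # start the recursion at the stationary mean
    for t in 1:T
        y[t] = rand_poisson(rng, λ)
        λ = β0 + α1 * y[t]     # conditional mean for the next time point
    end
    return y
end

y = simulate_inarch1(MersenneTwister(2021), 500, 3.0, 0.95)
```

Under this parametrization the stationary mean is $\beta_0/(1-\alpha_1)$, which is 60 for $\beta_0 = 3$ and $\alpha_1 = 0.95$.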

Estimation settings are specified with the `MLESettings()` function. The optimization routine can be `"NelderMead"`, `"BFGS"`, or `"LBFGS"`. To conduct inference in terms of confidence intervals, the argument `ci` needs to be set to `true`. Standard errors are then computed from the numerical Hessian matrix.
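Standard errors from a numerical Hessian can be sketched in three steps:

1. Approximate the Hessian of the negative log-likelihood at the estimate by central finite differences.
2. Invert it.
3. Take the square roots of the diagonal.

The function names and the step size `h` are illustrative assumptions, and the package's own differentiation scheme is not described here. For a Poisson IID model the result can be checked against the closed form $\sqrt{\bar{y}/n}$.

```julia
using LinearAlgebra

# Central finite-difference approximation of the Hessian of f at θ.
function numerical_hessian(f, θ::Vector{Float64}; h::Float64 = 1e-4)
    k = length(θ)
    H = zeros(k, k)
    for i in 1:k, j in 1:k
        ei = zeros(k); ei[i] = h
        ej = zeros(k); ej[j] = h
        H[i, j] = (f(θ + ei + ej) - f(θ + ei - ej) -
                   f(θ - ei + ej) + f(θ - ei - ej)) / (4h^2)
    end
    return H
end

# Standard errors: square roots of the diagonal of the inverse observed
# information, i.e. of the inverse Hessian of the negative log-likelihood.
standard_errors(negloglik, θhat) = sqrt.(diag(inv(numerical_hessian(negloglik, θhat))))

# Usage: Poisson IID model, log(y!) dropped since it does not depend on θ.
y = [2, 3, 1, 4, 0, 5, 3, 2]
negloglik(θ) = -sum(y .* log(θ[1]) .- θ[1])
se = standard_errors(negloglik, [sum(y) / length(y)])   # ≈ sqrt(2.5 / 8)
```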

Models are estimated with the `fit()` function, which takes the time series, the model, and, optionally, the settings as input. The likelihood is maximized subject to constraints on the parameters: positivity of the conditional means at all times, valid thinning probabilities, and stability of the process.

The function `parametercheck` checks whether the parameters are valid whenever the log-likelihood function is called. If invalid parameters are passed, the log-likelihood function simply returns negative infinity. This approach usually works well, provided the starting values for the optimization are valid and not too close to the boundary of the valid parameter region.
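The idea behind `parametercheck` can be sketched for a Poisson INARCH(1) model with identity link. Parameters outside the valid region make the log-likelihood return `-Inf`, which any maximizer treats as the worst possible value. The constraints used here ($\beta_0 > 0$, $0 \le \alpha_1 < 1$) are assumptions for this simple model, and the names `valid_inarch1`, `loglik_inarch1`, and `logfactorial` are hypothetical. This is not the package's implementation.

```julia
# log(n!) by summation, avoiding extra dependencies.
function logfactorial(n::Integer)
    s = 0.0
    for k in 2:n
        s += log(k)
    end
    return s
end

# Hypothetical validity check: positive intercept and stability (0 <= α1 < 1).
valid_inarch1(β0, α1) = β0 > 0 && 0 <= α1 < 1

# Conditional log-likelihood given the first observation. Invalid parameters
# yield -Inf, so a maximizer never prefers such a point.
function loglik_inarch1(y::AbstractVector{<:Integer}, θ::AbstractVector{<:Real})
    β0, α1 = θ
    valid_inarch1(β0, α1) || return -Inf
    ll = 0.0
    for t in 2:length(y)
        λ = β0 + α1 * y[t-1]
        ll += y[t] * log(λ) - λ - logfactorial(y[t])
    end
    return ll
end
```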

The `fit()` function returns an object containing the estimates, standard errors, the log-likelihood at the estimates, and more. This result object can be passed to the information criteria functions `AIC()`, `BIC()`, and `HQIC()`. It can also be passed to the function `pit()`, which plots the non-randomized probability integral transform (PIT) histogram (see Czado et al. [14]).
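The sketch below uses the usual definitions of the three criteria. Each is written in terms of the maximized log-likelihood $\ell$, the number of estimated parameters $k$, and the number of observations $n$. Whether the package uses exactly these forms, for example which $n$ enters BIC and HQIC, is an assumption here.

```julia
# Standard information criteria; smaller values indicate a preferred model.
aic(ll, k, n)  = -2ll + 2k                  # Akaike
bic(ll, k, n)  = -2ll + k * log(n)          # Bayesian (Schwarz)
hqic(ll, k, n) = -2ll + 2k * log(log(n))    # Hannan-Quinn

aic(-100.0, 2, 100)   # 204.0
```

`aic` ignores `n` but keeps the same signature as the other two, so all three can be applied uniformly.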

Predictions are computed with the `predict()` function. For models of the INGARCH framework, two options are available: predictions can be either deterministic or simulation-based. In the deterministic approach, conditional means serve as predictions. Any observation in the definition of the conditional mean that is not yet observed is replaced by its own prediction. In the simulation-based approach, the time series is continued many times with random realizations of the process, called chains. Prediction intervals are then conveniently obtained as quantiles of the chains.

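A minimal workflow simulates, fits, and predicts a Poisson INARCH(1) model:

```julia
using CountTimeSeries

model = Model(pastObs = 1)                # Poisson INARCH(1)
y = simulate(500, model, [3, 0.95])[1]    # T = 500; [1] selects the simulated series
result = fit(y, model)                    # maximum likelihood estimation
pred = predict(result, 100, 10000)        # 100 steps ahead, 10,000 chains
```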
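The two prediction approaches described above can be contrasted on a Poisson INARCH(1) model with identity link. In the deterministic version, each future count is replaced by its prediction. In the simulation-based version, the series is continued along many random chains, and a 95% interval is read off the empirical quantiles. This is a plain-Julia sketch with hypothetical names (`predict_mean_inarch1`, `predict_sim_inarch1`, `rand_poisson`), and it does not reproduce `predict()`.

```julia
using Random, Statistics

# Knuth's Poisson sampler (adequate for moderate λ).
function rand_poisson(rng::AbstractRNG, λ::Real)
    L, k, p = exp(-λ), 0, rand(rng)
    while p > L
        k += 1
        p *= rand(rng)
    end
    return k
end

# Deterministic h-step prediction: unobserved counts are replaced by predictions.
function predict_mean_inarch1(ylast, β0, α1, h)
    pred = zeros(h)
    prev = float(ylast)
    for k in 1:h
        pred[k] = β0 + α1 * prev
        prev = pred[k]
    end
    return pred
end

# Simulation-based prediction: continue the series along `nchains` chains and
# return the pointwise mean and the 2.5% / 97.5% quantiles of the chains.
function predict_sim_inarch1(rng, ylast, β0, α1, h, nchains)
    paths = zeros(Int, nchains, h)
    for c in 1:nchains
        prev = ylast
        for k in 1:h
            prev = rand_poisson(rng, β0 + α1 * prev)
            paths[c, k] = prev
        end
    end
    means = vec(mean(paths; dims = 1))
    lower = [quantile(view(paths, :, k), 0.025) for k in 1:h]
    upper = [quantile(view(paths, :, k), 0.975) for k in 1:h]
    return means, lower, upper
end
```

With an identity link the conditional mean is linear, so the average over many chains approaches the deterministic prediction. The chains additionally supply interval bounds.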

## 3. Application: COVID-19

#### 3.1. Model

#### 3.2. Implementation

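Given the regressor matrix `X`, the Poisson model is created by running

```julia
modelPois = Model(model = "INGARCH",
                  pastObs = [1, 7],            # lags 1 and 7 of the observations
                  pastMean = 1,                # one lagged conditional mean
                  distr = "Poisson",
                  link = "Log",
                  external = fill(false, 5),   # all five regressors internal
                  X = X)
```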

The Negative Binomial model is specified analogously with `distr = "NegativeBinomial"`. To fit the models to the time series `y`, the Nelder–Mead optimization routine is chosen and no inference is conducted. The Poisson model is then fitted by running

```julia
# inits: starting values for the optimization
settingPois = MLESettings(y, modelPois, inits, optimizer = "NelderMead",
                          ci = false)
resultsPois = fit(y, modelPois, settingPois)
```

Quasi-Poisson results are obtained from the Poisson fit by calling `QPois(resultsPois)`, which returns an updated results object.
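One common way to obtain quasi-Poisson inference from a Poisson fit has three parts:

1. Keep the Poisson point estimates.
2. Estimate a dispersion factor from Pearson residuals.
3. Scale the standard errors by the square root of that factor.

Whether `QPois` uses exactly this estimator is an assumption of the sketch below, and the function names are hypothetical.

```julia
# Pearson-type dispersion estimate for a quasi-Poisson fit:
#   φ̂ = 1/(n - k) * Σ (y_t - λ̂_t)^2 / λ̂_t,
# with fitted conditional means λ̂_t and k estimated parameters.
function pearson_dispersion(y, λhat, k)
    n = length(y)
    return sum((y .- λhat) .^ 2 ./ λhat) / (n - k)
end

# Quasi-Poisson keeps the Poisson estimates and inflates their standard errors.
quasi_se(se_pois, φ) = se_pois .* sqrt(φ)
```

A dispersion estimate above 1 signals overdispersion relative to the Poisson model and widens the confidence intervals accordingly.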

Predictions are again computed with `predict`. Each of the three results needs a matrix of new regressor values, which is saved as `xNew`. The Julia code for a prediction with 10,000 chains is then

```julia
predict(resultsPois, 7, 10000, xNew)   # 7 days ahead, 10,000 chains
```

#### 3.3. Results

## 4. Application: Animal Health in New Zealand

#### 4.1. Model and Implementation

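A zero-inflated Poisson INAR(1) model is specified by

```julia
Model(model = "INARMA",
      pastObs = 1,
      zi = true)
```

The Negative Binomial counterpart additionally sets `distr = "NegativeBinomial"`. For a model order $p=2$, `pastObs = 1` is changed to `pastObs = 1:2`.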
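The INARMA framework rests on binomial thinning. A zero-inflated Poisson INAR(1) process $y_t = \alpha \circ y_{t-1} + \varepsilon_t$ can be simulated as sketched below, where $\alpha \circ y$ counts how many of $y$ units survive, each independently with probability $\alpha$. The following are assumptions of this sketch: the innovation parametrization (a structural zero with probability $\omega$, otherwise Poisson($\lambda$)), the start value $y_0 = 0$, and all function names.

```julia
using Random

# Knuth's Poisson sampler (adequate for moderate λ).
function rand_poisson(rng::AbstractRNG, λ::Real)
    L, k, p = exp(-λ), 0, rand(rng)
    while p > L
        k += 1
        p *= rand(rng)
    end
    return k
end

# Binomial thinning α ∘ y: each of the y counts survives with probability α.
thin(rng::AbstractRNG, α::Real, y::Integer) = count(_ -> rand(rng) < α, 1:y)

# Zero-inflated Poisson innovation (assumed parametrization).
rand_zip(rng::AbstractRNG, λ::Real, ω::Real) = rand(rng) < ω ? 0 : rand_poisson(rng, λ)

# INAR(1) with ZIP innovations, started at y_0 = 0.
function simulate_zip_inar1(rng::AbstractRNG, T::Integer, λ::Real, α::Real, ω::Real)
    y = zeros(Int, T)
    prev = 0
    for t in 1:T
        prev = thin(rng, α, prev) + rand_zip(rng, λ, ω)
        y[t] = prev
    end
    return y
end
```

Under this parametrization the stationary mean is $(1-\omega)\lambda/(1-\alpha)$.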

Given starting values `inits` of type `parameter`, the function `MLESettings` is again used to specify estimation settings:

```julia
settings = MLESettings.(fill(dat.Anorexia, 8), models,
                        optimizer = "NelderMead", ci = true)
```

`inits`. The output is then a vector of estimation settings. In the same manner, the results for both time series can be computed at once by

```julia
results = Array{INARMAResults, 2}(undef, (8, 2))
results[:, 1] = fit.(fill(dat.Anorexia, 8), models, settings)   # anorexia series
results[:, 2] = fit.(fill(dat.Lesions, 8), models, settings)    # lesions series

AIC.(results, 2)
```

#### 4.2. Results

## 5. Application: Corporate Insolvencies in Rhineland-Palatinate

#### 5.1. Models and Implementation

Information criteria are computed by calling `AIC(results)`, `BIC(results)`, or `HQIC(results)`. An additional argument `dropfirst` can be passed to these functions to exclude the likelihood contributions of the first observations. This is handy when comparing, for example, an INARCH(1) and an INARCH(2) model. Likelihood-based estimation of these models conditions on the first observation and on the first two observations, respectively, so their log-likelihoods are summed over different numbers of observations. For rather short time series, this difference might be crucial.
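The effect of `dropfirst` can be sketched with a conditional Poisson INARCH($p$) log-likelihood that is summed only from observation `dropfirst + 1` onward. Evaluating an INARCH(1) and an INARCH(2) with the same `dropfirst = 2` places both on the same set of observations. The function below is a hypothetical illustration with an assumed identity link, not the package's likelihood.

```julia
# log(n!) by summation, avoiding extra dependencies.
function logfactorial(n::Integer)
    s = 0.0
    for k in 2:n
        s += log(k)
    end
    return s
end

# Conditional Poisson INARCH(p) log-likelihood (identity link), summed over
# t = dropfirst+1, ..., T. Requires dropfirst >= p so that all lags exist.
function loglik_inarch(y, β0, α::Vector{Float64}; dropfirst::Int = length(α))
    p = length(α)
    @assert dropfirst >= p "need at least p initial observations to condition on"
    ll = 0.0
    for t in (dropfirst + 1):length(y)
        λ = β0 + sum(α[i] * y[t - i] for i in 1:p)
        ll += y[t] * log(λ) - λ - logfactorial(y[t])
    end
    return ll
end
```

With the default `dropfirst = p`, the INARCH(1) likelihood contains one more term than the INARCH(2) likelihood. Setting `dropfirst = 2` for both removes that imbalance.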

Running `pit(results, nbins = 10, level = 0.95)` produces a PIT histogram, which should be approximately uniform if the model choice is correct. The argument `nbins` sets the number of bins. The argument `level` can be used to test for uniformity at the $(1-\mathtt{level})$ significance level. If `level` is supplied, limit lines are drawn in the histogram. If at least one bin exceeds these lines, the null hypothesis of a uniform distribution is rejected.
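The non-randomized PIT of Czado et al. [14] can be sketched for Poisson predictive distributions. Let $y_t$ be an observation with predictive cdf $P_t$. Then $F_t(u)$ is $0$ for $u \le P_t(y_t - 1)$ and $1$ for $u \ge P_t(y_t)$, and it equals $(u - P_t(y_t-1))/(P_t(y_t) - P_t(y_t-1))$ in between. The bar for bin $j$ of $J$ is $\bar F(j/J) - \bar F((j-1)/J)$, where $\bar F$ averages the $F_t$. The fitted means `λ` are assumed given, and the function names are hypothetical. The limit lines drawn by `pit` are not reproduced.

```julia
# Poisson cdf P(Y <= k) by recursive summation of the pmf; 0 for k < 0.
function poisson_cdf(k::Integer, λ::Real)
    k < 0 && return 0.0
    term = exp(-λ)
    total = term
    for j in 1:k
        term *= λ / j
        total += term
    end
    return min(total, 1.0)
end

# Non-randomized PIT function F_t(u) for observation y with Poisson mean λ.
function pit_value(u, y, λ)
    lo = poisson_cdf(y - 1, λ)
    hi = poisson_cdf(y, λ)
    u <= lo && return 0.0
    u >= hi && return 1.0
    return (u - lo) / (hi - lo)
end

# Relative bar heights of the PIT histogram with `nbins` equal-width bins.
function pit_histogram(y, λ; nbins::Int = 10)
    Fbar(u) = sum(pit_value(u, y[t], λ[t]) for t in eachindex(y)) / length(y)
    return [Fbar(j / nbins) - Fbar((j - 1) / nbins) for j in 1:nbins]
end
```

Under a correctly specified model each bar is close to $1/J$.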

#### 5.2. Results

## 6. Simulation Study: Finite Sample ML—Estimation

#### 6.1. Study and Implementation

A time series is simulated by `simulate(T, model, truepar)` for a given `model` and parameters `truepar`. Then, with estimation settings collected in the object `setting`, parameters are estimated by

```julia
fit(y, model, setting, initiate = initiate, printResults = false)
```

The argument `initiate` specifies how the recursion ${\lambda}_{t}$ is started. Setting the argument `printResults` to `false` suppresses printing the results to the console.
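The influence of the start-up rule can be sketched for a Poisson INGARCH(1,1) with identity link, $\lambda_t = \beta_0 + \alpha_1 y_{t-1} + \beta_1 \lambda_{t-1}$. The two rules below (stationary mean or first observation) are hypothetical stand-ins. The options actually accepted by `initiate` are not listed in this section.

```julia
# Filter the conditional means of a Poisson INGARCH(1,1) with identity link,
# using one of two hypothetical start-up rules for λ_1.
function ingarch11_means(y, β0, α1, β1; start::Symbol = :marginal)
    λ = zeros(length(y))
    if start === :marginal
        λ[1] = β0 / (1 - α1 - β1)     # stationary mean of the process
    elseif start === :first
        λ[1] = y[1]                    # first observation
    else
        throw(ArgumentError("unknown start rule: $start"))
    end
    for t in 2:length(y)
        λ[t] = β0 + α1 * y[t-1] + β1 * λ[t-1]
    end
    return λ
end
```

For the same data, the gap between the two filtered paths shrinks by a factor $\beta_1$ per time step. The choice therefore matters most for short series such as $T = 50$.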

#### 6.2. Results

## 7. Discussion and Outlook

## Supplementary Materials

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Table A1.** Estimation results without zero inflation. Significance highlighting: 5% level (light green), 1% level (medium green), 0.1% level (dark green), and not significant at the 5% level (gray).

| Series | Model | $\widehat{\beta}_0$ | $\widehat{\alpha}_1$ | $\widehat{\alpha}_2$ | $\widehat{\varphi}$ | AIC |
|---|---|---|---|---|---|---|
| Anorexia | Pois-INAR(1) | 0.511 | 0.385 | - | - | 225.05 |
| | | (0.087) | (0.073) | - | - | |
| | Pois-INAR(2) | 0.450 | 0.353 | 0.098 | - | 224.70 |
| | | (0.088) | (0.078) | (0.072) | - | |
| | NB-INAR(1) | 0.579 | 0.304 | - | 0.194 | 181.64 |
| | | (0.173) | (0.078) | - | (0.011) | |
| | NB-INAR(2) | 0.545 | 0.220 | 0.118 | 0.139 | 180.37 |
| | | (0.188) | (0.096) | (0.078) | (0.059) | |
| Lesions | Pois-INAR(1) | 1.172 | 0.173 | - | - | 294.77 |
| | | (0.146) | (0.068) | - | - | |
| | Pois-INAR(2) | 0.976 | 0.145 | 0.132 | - | 292.17 |
| | | (0.153) | (0.068) | (0.067) | - | |
| | NB-INAR(1) | 1.236 | 0.128 | - | 0.839 | 268.59 |
| | | (0.217) | (0.076) | - | (0.006) | |
| | NB-INAR(2) | 1.017 | 0.084 | 0.164 | 0.608 | 265.85 |
| | | (0.214) | (0.076) | (0.075) | (0.248) | |

| District | Mean | Var | Model | Trend |
|---|---|---|---|---|
| Ahrweiler | 3.11 | 3.39 | Pois IID | No |
| Altenkirchen | 3.03 | 5.34 | NB-INGARCH(1, 1) | Yes |
| Alzey-Worms | 2.76 | 4.61 | NB-INGARCH(2, 1) | Yes |
| Bad Dürkheim | 1.88 | 2.57 | NB-INARCH(2) | Yes |
| Bad Kreuznach | 4.94 | 5.90 | Pois IID | Partly |
| Bernkastel-Wittlich | 3.25 | 4.58 | Pois IID | Yes |
| Birkenfeld | 2.29 | 3.22 | NB-INARCH(1) | Yes |
| Cochem-Zell | 1.29 | 1.72 | NB IID | Partly |
| Donnersbergkreis | 1.38 | 1.86 | P-INGARCH(1, 1) | Partly |
| Eifelkr. Bitburg-Prüm | 2.03 | 3.28 | NB IID | Partly |
| Frankenthal, kfr. S. | 0.96 | 1.29 | P-INGARCH(1, 1) | Partly |
| Germersheim | 1.71 | 1.93 | Pois IID | No |
| Kaiserslautern, kfr. S. | 2.95 | 4.59 | P-INGARCH(2, 1) | Yes |
| Kaiserslautern | 2.73 | 6.03 | NB-INGARCH(2, 1) | Yes |
| Koblenz | 3.49 | 4.79 | P-INGARCH(1, 1) | Partly |
| Kusel | 1.27 | 1.53 | P-INARCH(2) | No |
| Landau i.d.P | 0.83 | 0.91 | Pois IID | No |
| Ludwigshafen | 3.06 | 4.73 | NB IID | Yes |
| Mainz | 4.90 | 10.62 | NB IID | Yes |
| Mainz-Bingen | 4.55 | 8.21 | P-INGARCH(2, 1) | Partly |
| Mayen-Koblenz | 5.24 | 6.67 | NB-INGARCH(1, 1) | Partly |
| Neustadt a.d.W. | 1.06 | 1.23 | P-INGARCH(1, 1) | Yes |
| Neuwied | 6.73 | 11.84 | P-INGARCH(2, 1) | Yes |
| Pirmasens | 0.94 | 0.91 | Pois IID | No |
| Rhein-Hunsrück-Kreis | 2.56 | 3.07 | Pois IID | Partly |
| Rhein-Lahn-Kreis | 2.88 | 3.83 | Pois IID | Partly |
| Rhein-Pfalz-Kreis | 2.40 | 2.82 | Pois IID | Yes |
| Speyer | 1.01 | 1.04 | Pois IID | Partly |
| Südliche Weinstraße | 1.76 | 1.98 | Pois IID | Yes |
| Südwestpfalz | 1.61 | 1.87 | Pois IID | Yes |
| Trier, kfr. S. | 1.87 | 1.85 | Pois IID | Yes |
| Trier-Saarburg | 1.46 | 1.90 | NB IID | No |
| Vulkaneifel | 1.41 | 1.70 | P-INGARCH(1, 1) | Yes |
| Westerwald | 5.30 | 8.66 | NB-INGARCH(1, 1) | Yes |
| Worms | 2.94 | 8.88 | NB-INGARCH(1, 1) | Yes |
| Zweibrücken | 0.88 | 1.13 | P-INARCH(1) | No |

**Figure A3.** Mean relative bias of ${\widehat{\alpha}}_{1}$ for T = 50 (solid), T = 200 (dashed), and T = 1000 (dotted).

**Figure A4.** Mean relative bias of ${\widehat{\beta}}_{1}$ for T = 50 (solid), T = 200 (dashed), and T = 1000 (dotted).

## References

- Alzaid, A.; Al-Osh, M. First-Order Integer-Valued Autoregressive (INAR(1)) Process: Distributional and Regression Properties. Stat. Neerl. **1988**, 41, 53–60.
- Ferland, R.; Latour, A.; Oraichi, D. Integer-Valued GARCH Process. J. Time Ser. Anal. **2006**, 27, 923–942.
- Bezanson, J.; Karpinski, S.; Shah, V.B.; Edelman, A. Julia: A Fast Dynamic Language for Technical Computing. arXiv **2012**, arXiv:1209.5145.
- Liboschik, T.; Fried, R.; Fokianos, K.; Probst, P. tscount: Analysis of Count Time Series, R Package Version 1.4.1; Available online: https://cran.r-project.org/web/packages/tscount/index.html (accessed on 16 March 2021).
- Weiß, C.H.; Feld, M.H.J.M.; Mamode Khan, N.; Sunecher, Y. INARMA Modeling of Count Time Series. Stats **2019**, 2, 284–320.
- Harte, D. HiddenMarkov: Hidden Markov Models; R Package Version 1.8-11; Statistics Research Associates: Wellington, New Zealand, 2017.
- Himmelmann, L. HMM: HMM—Hidden Markov Models, R Package Version 1.0; Available online: https://cran.r-project.org/web/packages/HMM/index.html (accessed on 16 March 2021).
- Jackman, S. pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory; R Package Version 1.5.5. Available online: https://github.com/atahk/pscl/ (accessed on 16 March 2021).
- Zeileis, A.; Kleiber, C.; Jackman, S. Regression Models for Count Data in R. J. Stat. Softw. **2008**, 27, 1–25.
- Mouchet, M. HMMBase—A Lightweight and Efficient Hidden Markov Model Abstraction. 2020. Available online: https://github.com/maxmouchet/HMMBase.jl (accessed on 16 March 2021).
- Weiß, C.H.; Feld, M. On the performance of information criteria for model identification of count time Series. Stud. Nonlinear Dyn. Econom. **2019**, 24.
- Liboschik, T.; Kerschke, P.; Fokianos, K.; Fried, R. Modelling interventions in INGARCH processes. Int. J. Comput. Math. **2016**, 93, 640–657.
- Aghababaei Jazi, M.; Jones, G.; Lai, C.D. First-order integer valued AR processes with zero inflated poisson innovations. J. Time Ser. Anal. **2012**, 33, 954–963.
- Czado, C.; Gneiting, T.; Held, L. Predictive Model Assessment for Count Data. Biometrics **2009**, 65, 1254–1261.
- RKI. Robert-Koch-Institut: SurvStat@RKI 2.0. 2021. Available online: https://survstat.rki.de/ (accessed on 16 March 2021).
- NPGEO. RKI COVID19. 2021. Available online: https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0 (accessed on 16 March 2021).
- World Health Organization. Transmission of SARS-CoV-2: Implications for Infection Prevention Precautions. 2020. Available online: https://www.who.int/news-room/commentaries/detail/transmission-of-sars-cov-2-implications-for-infection-prevention-precautions (accessed on 7 May 2021).
- Christou, V.; Fokianos, K. Quasi-Likelihood Inference for Negative Binomial Time Series Models. J. Time Ser. Anal. **2014**, 35, 55–78.
- Mohammadpour, M.; Bakouch, H.; Shirozhan, M. Poisson-Lindley INAR(1) model with applications. Braz. J. Probab. Stat. **2018**, 32, 262–280.
- Schweer, S.; Weiß, C.H. Compound Poisson INAR(1) processes: Stochastic properties and testing for overdispersion. Comput. Stat. Data Anal. **2014**, 77, 267–284.
- Röhl, K.H.; Vogt, G. Unternehmensinsolvenzen in Deutschland. 2019. Available online: https://www.iwkoeln.de/studien/iw-trends/beitrag/klaus-heiner-roehl-unternehmensinsolvenzen-in-deutschland-trendwende-voraus-449151.html (accessed on 16 March 2021).
- Li, Q.; Chen, H.; Zhu, F. Robust Estimation for Poisson Integer-Valued GARCH Models Using a New Hybrid Loss. J. Syst. Sci. Complex. **2021**.
- Xiong, L.; Zhu, F. Minimum Density Power Divergence Estimator for Negative Binomial Integer-Valued GARCH Models. Commun. Math. Stat. **2021**.
- Weiß, C.H. Stationary count time series models. WIREs Comput. Stat. **2021**, 13.
- Möller, T.; Weiß, C.; Kim, H.Y.; Sirchenko, A. Modeling Zero Inflation in Count Data Time Series with Bounded Support. Methodol. Comput. Appl. Probab. **2018**, 20.
- Quoreshi, A.M.M.S. Bivariate Time Series Modeling of Financial Count Data. Commun. Stat. Theory Methods **2006**, 35, 1343–1358.
- Eurostat. GISCO: Geographische Informationen und Karten. 2021. Available online: https://ec.europa.eu/eurostat/de/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts (accessed on 16 March 2021).

**Figure 9.** Mean bias of ${\widehat{\alpha}}_{1}$ for T = 50 (solid), T = 200 (dashed), and T = 1000 (dotted).

**Figure 10.** Mean bias of ${\widehat{\beta}}_{1}$ for T = 50 (solid), T = 200 (dashed), and T = 1000 (dotted).

**Table 1.** Predictions for Limburg–Weilburg: root mean squared prediction error (RMSPE), median absolute prediction error (MedAPE), and percentage of observations inside the 95% prediction interval (Inside PI).

| Criterion | Model | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Horizon 5 | Horizon 6 | Horizon 7 |
|---|---|---|---|---|---|---|---|---|
| RMSPE | Poisson | 29.53 | 30.18 | 30.11 | 30.14 | 28.81 | 28.07 | 27.86 |
| | Quasi-Poisson | 29.53 | 30.07 | 30.04 | 29.98 | 28.59 | 27.89 | 27.71 |
| | Negative Binomial | 30.95 | 31.14 | 30.89 | 30.86 | 30.38 | 29.68 | 30.09 |
| MedAPE | Poisson | 10.14 | 10.39 | 10.19 | 10.27 | 10.38 | 10.20 | 10.03 |
| | Quasi-Poisson | 10.07 | 10.22 | 10.26 | 10.12 | 10.28 | 9.96 | 9.69 |
| | Negative Binomial | 12.55 | 12.43 | 12.19 | 12.05 | 11.71 | 11.97 | 11.75 |
| Inside PI (%) | Poisson | 46.5 | 46.5 | 46.5 | 48.5 | 50.5 | 49.5 | 49.5 |
| | Quasi-Poisson | 97.0 | 97.0 | 97.0 | 97.0 | 97.0 | 97.0 | 97.0 |
| | Negative Binomial | 98.0 | 98.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 |

| Parameter | (Quasi-)Poisson: Estimate | Std. Err. | Conf. Interval | Negative Binomial: Estimate | Std. Err. | Conf. Interval |
|---|---|---|---|---|---|---|
| ${\beta}_{0}$ | −0.159 | 0.049 | (−0.254, −0.063) | −0.540 | 0.107 | (−0.749, −0.330) |
| ${\alpha}_{1}$ | 0.041 | 0.018 | (0.006, 0.077) | 0.115 | 0.063 | (−0.010, 0.239) |
| ${\alpha}_{7}$ | 0.122 | 0.022 | (0.079, 0.165) | 0.088 | 0.080 | (−0.068, 0.245) |
| ${\beta}_{1}$ | 0.053 | 0.030 | (−0.006, 0.112) | 0.005 | 0.100 | (−0.190, 0.200) |
| ${\zeta}_{1}$ | −0.032 | 0.015 | (−0.061, −0.003) | −0.034 | 0.057 | (−0.146, 0.078) |
| ${\zeta}_{2}$ | 0.221 | 0.027 | (0.168, 0.274) | 0.252 | 0.076 | (0.103, 0.401) |
| ${\zeta}_{3}$ | 0.314 | 0.029 | (0.257, 0.370) | 0.356 | 0.078 | (0.203, 0.509) |
| ${\zeta}_{4}$ | 0.149 | 0.024 | (0.102, 0.196) | 0.194 | 0.084 | (0.030, 0.359) |
| ${\zeta}_{5}$ | 0.265 | 0.024 | (0.218, 0.312) | 0.317 | 0.074 | (0.172, 0.463) |
| $\varphi$ | 1.405 | - | - | 1.516 | 0.164 | (1.195, 1.837) |

**Table 3.** Estimation results with zero inflation. Significance highlighting: 5% level (light green), 1% level (medium green), 0.1% level (dark green), and not significant at the 5% level (gray).

| Series | Model | $\widehat{\beta}_0$ | $\widehat{\alpha}_1$ | $\widehat{\alpha}_2$ | $\widehat{\varphi}$ | $\widehat{\omega}$ | AIC |
|---|---|---|---|---|---|---|---|
| Anorexia | Pois-INAR(1) | 2.215 | 0.338 | - | - | 0.669 | 183.00 |
| | | (0.428) | (0.074) | - | - | (0.069) | |
| | Pois-INAR(2) | 2.692 | 0.249 | 0.118 | - | 0.752 | 179.85 |
| | | (0.536) | (0.090) | (0.070) | - | (0.061) | |
| | NB-INAR(1) | 1.642 | 0.310 | - | 1.278 | 0.424 | 182.25 |
| | | (0.831) | (0.077) | - | (1.766) | (0.283) | |
| | NB-INAR(2) | 2.363 | 0.225 | 0.120 | 2.668 | 0.630 | 179.94 |
| | | (0.794) | (0.093) | (0.073) | (3.561) | (0.178) | |
| Lesions | Pois-INAR(1) | 2.042 | 0.175 | - | - | 0.372 | 276.26 |
| | | (0.278) | (0.071) | - | - | (0.077) | |
| | Pois-INAR(2) | 2.055 | 0.110 | 0.176 | - | 0.464 | 271.32 |
| | | (0.333) | (0.073) | (0.073) | - | (0.089) | |
| | NB-INAR(1) | 1.344 | 0.130 | - | 1.018 | 0.047 | 270.60 |
| | | (0.599) | (0.077) | - | (1.081) | (0.242) | |
| | NB-INAR(2) | 1.252 | 0.084 | 0.166 | 0.920 | 0.103 | 267.76 |
| | | (0.754) | (0.076) | (0.075) | (1.218) | (0.313) | |


© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Stapper, M. Count Data Time Series Modelling in Julia—The CountTimeSeries.jl Package and Applications. *Entropy* **2021**, *23*, 666.
https://doi.org/10.3390/e23060666
