A Copula-Based Model for Analyzing Bivariate Offense Data

Dimuthu Fernando; Wimarsha Jayanetti

doi:10.3390/stats8040111

Department of Statistics, Grand Valley State University, Allendale, MI 49401, USA

* Author to whom correspondence should be addressed.

Stats 2025, 8(4), 111; https://doi.org/10.3390/stats8040111

This article belongs to the Section Time Series Analysis

Abstract

We developed a class of bivariate integer-valued time series models using copula theory. Each count time series is modeled as a Markov chain, with serial dependence characterized through copula-based transition probabilities for Poisson and Negative Binomial marginals. Cross-sectional dependence is modeled via a bivariate Gaussian copula, allowing for both positive and negative correlations and providing a flexible dependence structure. Model parameters are estimated using likelihood-based inference, where the bivariate Gaussian copula integral is evaluated through standard randomized Monte Carlo methods. The proposed approach is illustrated through an application to offense data from New South Wales, Australia, demonstrating its effectiveness in capturing complex dependence patterns.

Keywords:

count time series; copula; bivariate models

1. Introduction

Multivariate count time series frequently arise in modern statistical analysis and often exhibit dependence both within and between series. In many applications, time-series counts are observed as bivariate vectors that display serial dependence within each series, as well as cross-correlation between the two. Building on the framework of the integer-valued autoregressive moving average model (INARMA), Quoreshi [1] proposed the bivariate integer-valued moving average model (BINMA), which accommodates both positive and negative correlations between counts. He also extended the BINMA model to a multivariate setting. Wang et al. [2] proposed a bivariate zero-inflated Poisson model to analyze occupational injuries. Heinen and Rengifo [3] proposed a multivariate autoregressive conditional doubly Poisson model capable of handling over-dispersion, serial dependence, and cross-correlation. The cross-correlation between the time series was modeled using a multivariate Gaussian copula, and the parameters were estimated through a two-stage procedure. The work of Karlis and Pedeli [4] presented a bivariate integer-valued autoregressive process of order 1 (BINAR (1)) in which the cross-correlation is modeled by the use of copula to accommodate both positive and negative correlations in Poisson and Negative Binomial counts. Also, they illustrated the use of Frank and Gaussian copulas to specify the joint distribution of the innovations. Marginal time series are modeled using Poisson and Negative Binomial INAR(1) models. Ravishanker et al. [5] applied state space models for multivariate count time series and used them to analyze a market dataset. One major advantage of copula-based methods is that they can separate the modeling of marginal and temporal dependence. Bradshaw and Blei [6] constructed a generative model of underreported campus sexual assault data that allows the estimation of the true incidence and reporting rates. 
Additionally, they used the Hamiltonian Monte Carlo (HMC) sampling scheme for posterior inference regarding reporting rates and assault incidence in each school and applied this method to analyze campus sexual assault data. Cui and Zhu [7] proposed a new bivariate Poisson INGARCH model, which allows for positive or negative cross-correlation between time series. Ahamad et al. [8] proposed a bivariate count data regression model to capture the dependence between multiple crash outcomes that traditional independent count models fail to represent. In this approach, appropriate marginal distributions (such as Poisson or Negative Binomial) are specified for each crash count type, and a copula function is then used to link the two margins to model their joint dependence. Jeng et al. [9] proposed a copula-based time series model to forecast COVID-19 cases and trends based on wastewater SARS-CoV-2 viral load and clinical variables. The model was developed in two stages. In the first stage, time-series methods were used to examine and characterize the marginal distributions of both the dependent and independent variables. In the second stage, copula-based marginal regression analysis was applied to model and predict the COVID-19 case trends.

Traditional bivariate Negative Binomial regression models typically assume a specific joint distribution for the dependent variables or their associated error terms. A common approach is to impose a bivariate gamma distribution on the error terms. However, such assumptions may inadequately capture the underlying dependence structure between the error terms. Moreover, as noted by Xu and Hardin [10], there is no explicit joint distribution for the error terms in this setting. This limitation motivates more flexible approaches, and a copula-based bivariate Negative Binomial model provides a natural extension. In our previous work (Alqawba et al. [11]), we developed a class of copula-based models to analyze bivariate count data, where the marginal distributions were specified as Poisson and zero-inflated Poisson (ZIP). Count time series data, however, are often over-dispersed. In this manuscript, we extend that framework by incorporating Negative Binomial marginals, which better accommodate over-dispersion. The copula-based approach enables us to model each marginal distribution appropriately while flexibly capturing the dependence between the series through the selected copula family. This flexibility allows the model to accommodate both positive and negative dependence, providing a more general and adaptable framework for analyzing bivariate count time series data.

The remainder of the paper is organized as follows. Section 2 provides a concise overview of Poisson and Negative Binomial regression models, along with a summary of copula theory. It then introduces the proposed class of copula-based bivariate models for analyzing two dependent time series, where each series is modeled through a copula-based Markov chain and jointly linked using a bivariate copula family. Section 3 outlines the parameter estimation procedure via Maximum Likelihood Estimation (MLE), presents the results of simulation studies, and applies the proposed methodology to a real dataset. Finally, Section 4 concludes the paper.

2. Materials and Methods

2.1. The Poisson Distribution

The Poisson distribution is a common choice for modeling count data. In our proposed copula-based bivariate model, we use it as a marginal distribution. Let $y_{t}$ represent a random count observed at time t. The probability mass function (pmf) of the Poisson distribution is given by

$$f (y_{t}) = \frac{e^{- λ} λ^{y_{t}}}{y_{t}!}, \quad y_{t} = 0, 1, 2, \dots,$$

where $λ > 0$ is the intensity parameter with $E (y_{t}) = λ$ and $V (y_{t}) = λ$.

2.2. The Negative Binomial Distribution

With the introduction of an additional parameter, $κ$, the Negative Binomial distribution is able to account for over-dispersion relative to the Poisson distribution. Its pmf is

$$f (y_{t}) = \frac{Γ (κ + y_{t})}{Γ (κ) \, y_{t}!} {\left(\frac{κ}{κ + λ}\right)}^{κ} {\left(\frac{λ}{κ + λ}\right)}^{y_{t}}, \quad y_{t} = 0, 1, 2, \dots,$$

where $λ > 0$ and $κ > 0$ are the intensity and dispersion parameters, respectively, with $E (y_{t}) = λ$ and $V (y_{t}) = λ + \frac{λ^{2}}{κ}$.
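As a quick numerical check of the two marginal distributions, the following minimal sketch evaluates both pmfs on the log scale and recovers their moments by direct summation. The function names (`poisson_pmf`, `negbin_pmf`, `summarize`) and the values λ = 6, κ = 3 are illustrative assumptions, not taken from the paper.

```python
import math

def poisson_pmf(y, lam):
    """Poisson pmf, evaluated on the log scale for numerical stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def negbin_pmf(y, lam, kappa):
    """Negative Binomial pmf with mean lam and dispersion kappa (Section 2.2 form)."""
    log_p = (math.lgamma(kappa + y) - math.lgamma(kappa) - math.lgamma(y + 1)
             + kappa * math.log(kappa / (kappa + lam))
             + y * math.log(lam / (kappa + lam)))
    return math.exp(log_p)

def summarize(pmf, upper=500):
    """Total mass, mean and variance from summing the pmf over 0, ..., upper - 1."""
    probs = [pmf(y) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

lam, kappa = 6.0, 3.0  # illustrative values, not taken from the paper
nb_total, nb_mean, nb_var = summarize(lambda y: negbin_pmf(y, lam, kappa))
po_total, po_mean, po_var = summarize(lambda y: poisson_pmf(y, lam))
print(nb_mean, nb_var)  # variance should be close to lam + lam**2 / kappa
print(po_mean, po_var)  # variance should be close to lam
```

With the same mean, the Negative Binomial variance exceeds the Poisson variance by $λ^{2} / κ$, which is the over-dispersion the additional parameter is meant to absorb.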

2.3. Copulas

As a multivariate cumulative distribution function (cdf) with uniform $U (0, 1)$ margins, a copula captures the dependence structure between variables. Following Nelsen [12], an n-dimensional copula is a function $C : {[0, 1]}^{n} \to [0, 1]$ with the following three properties:

$C (1, \dots, 1, u_{t}, 1, \dots, 1) = u_{t}$ for all $t = 1, 2, \dots, n$ and $u_{t} \in [0, 1] .$
$C (u_{1}, u_{2}, \dots, u_{n}) = 0$ if $u_{t} = 0$ for at least one $t \in \{1, 2, \dots, n\} .$
For any $u_{t 1}, u_{t 2} \in [0, 1]$ with $u_{t 1} \leq u_{t 2}$ , for $t = 1, 2, \dots, n,$

$\sum_{j_{1} = 1}^{2} \sum_{j_{2} = 1}^{2} \dots \sum_{j_{n} = 1}^{2} {(- 1)}^{j_{1} + j_{2} + \dots + j_{n}} C (u_{1 j_{1}}, u_{2 j_{2}}, \dots, u_{n j_{n}}) \geq 0 .$
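The three properties can be checked numerically for a closed-form family. The sketch below does so on a grid for the standard Frank copula in the bivariate case; the helper names (`frank_copula`, `rectangle_volume`, `check_copula`), the grid size, and the parameter values ±5 are assumptions made for illustration only.

```python
import math

def frank_copula(u, v, delta):
    """Standard Frank copula C(u, v; delta), delta != 0."""
    num = math.expm1(-delta * u) * math.expm1(-delta * v)
    return -math.log1p(num / math.expm1(-delta)) / delta

def rectangle_volume(copula, u_lo, u_hi, v_lo, v_hi):
    """Signed sum of property 3 for n = 2; non-negative for a valid copula."""
    return (copula(u_hi, v_hi) - copula(u_lo, v_hi)
            - copula(u_hi, v_lo) + copula(u_lo, v_lo))

def check_copula(copula, grid_size=20, tol=1e-12):
    """Check uniform margins, groundedness and 2-increasingness on a grid."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    margins_ok = all(abs(copula(u, 1.0) - u) < tol and abs(copula(1.0, u) - u) < tol
                     for u in grid)
    grounded_ok = all(abs(copula(u, 0.0)) < tol and abs(copula(0.0, u)) < tol
                      for u in grid)
    increasing_ok = all(
        rectangle_volume(copula, grid[i], grid[i + 1], grid[j], grid[j + 1]) >= -tol
        for i in range(grid_size) for j in range(grid_size))
    return margins_ok, grounded_ok, increasing_ok

# Positive and negative dependence parameters (illustrative values).
results = {d: check_copula(lambda u, v, d=d: frank_copula(u, v, d)) for d in (5.0, -5.0)}
print(results)
```

A grid check of this kind is only a sanity test, not a proof, but it catches sign and parameterization mistakes before a family is used inside a likelihood.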

Let $Y_{1}, \dots, Y_{n}$ be random variables with marginal cdfs $F_{1}, \dots, F_{n}$ and joint cdf F. Then, by Sklar's theorem:

There exists an n-dimensional copula C such that for all $y_{1}, \dots, y_{n} \in \mathbb{R}$ , $F (y_{1}, y_{2}, \dots, y_{n}) = C (F_{1} (y_{1}), F_{2} (y_{2}), \dots, F_{n} (y_{n})) .$
If $Y_{1}, \dots, Y_{n}$ are continuous, then the copula C is unique. Otherwise, C is uniquely determined on the n-dimensional rectangle $\mathrm{Range} (F_{1}) \times \mathrm{Range} (F_{2}) \times \dots \times \mathrm{Range} (F_{n})$ .

When all the margins are integer valued, the multivariate probability mass function can be obtained as

$$f (y_{1}, y_{2}, \dots, y_{n}) = P (Y_{1} = y_{1}, Y_{2} = y_{2}, \dots, Y_{n} = y_{n}) = \sum_{j_{1} = 1}^{2} \sum_{j_{2} = 1}^{2} \dots \sum_{j_{n} = 1}^{2} {(- 1)}^{j_{1} + j_{2} + \dots + j_{n}} C (u_{1 j_{1}}, u_{2 j_{2}}, \dots, u_{n j_{n}}), \qquad (1)$$

where $u_{t 1} = F_{t} (y_{t}^{-})$ and $u_{t 2} = F_{t} (y_{t})$, so that $u_{t 1} \leq u_{t 2}$. Here $F_{t} (y_{t}^{-})$ is the left-hand limit of $F_{t}$ at $y_{t}$, which is equal to $F_{t} (y_{t} - 1)$ for integer-valued margins. In the bivariate case,

$$\begin{aligned} Pr (Y_{1} = y_{1}, Y_{2} = y_{2}) &= C (F_{1} (y_{1}), F_{2} (y_{2}); θ) - C (F_{1} (y_{1}^{-}), F_{2} (y_{2}); θ) \\ &\quad - C (F_{1} (y_{1}), F_{2} (y_{2}^{-}); θ) + C (F_{1} (y_{1}^{-}), F_{2} (y_{2}^{-}); θ) . \end{aligned}$$

Here, $θ$ denotes the dependence parameter of the copula function C, which can be selected from a variety of copula families. Table 1 lists several commonly used families; more details can be found in Joe [13]. Bivariate copulas such as the Gaussian, Frank, Plackett, and t copulas can model both positive and negative dependence, while the Gumbel and Clayton copulas, as parameterized in Table 1, capture only positive dependence. In this study, we focus mainly on the Gaussian copula, as it can accommodate both positive and negative dependence; however, different families of copula may be used depending on the context and nature of the data.
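The bivariate rectangle formula can be made concrete with a small sketch that builds the joint pmf of two Poisson counts linked by a Gaussian copula. Everything here is an illustrative assumption: the helper names, the parameter values (λ₁ = 4, λ₂ = 6, dependence ±0.5), and in particular the evaluation of the bivariate normal cdf by Simpson quadrature of the identity $Φ_{2} (h, k; r) = Φ (h) Φ (k) + \int_{0}^{r} φ_{2} (h, k; s) \, d s$, which is used only to keep the example free of external packages.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pois_cdf(y, lam):
    """Poisson cdf F(y); returns 0 for y < 0."""
    total = sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
                for j in range(y + 1))
    return min(1.0, total)

def bvn_cdf(h, k, rho, steps=60):
    """Standard bivariate normal cdf: Phi(h) Phi(k) plus the integral of the
    bivariate density over the correlation, by composite Simpson (steps even)."""
    def phi2(r):
        s = 1.0 - r * r
        quad = h * h - 2.0 * r * h * k + k * k
        return math.exp(-quad / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))
    w = rho / steps
    acc = phi2(0.0) + phi2(rho)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * phi2(i * w)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + acc * w / 3.0

def gaussian_copula(u, v, rho):
    """Gaussian copula with explicit handling of the boundary of the unit square."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    return bvn_cdf(STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v), rho)

def joint_pmf(y1, y2, lam1, lam2, rho):
    """Four-term rectangle formula for Pr(Y1 = y1, Y2 = y2)."""
    a_lo, a_hi = pois_cdf(y1 - 1, lam1), pois_cdf(y1, lam1)
    b_lo, b_hi = pois_cdf(y2 - 1, lam2), pois_cdf(y2, lam2)
    return (gaussian_copula(a_hi, b_hi, rho) - gaussian_copula(a_lo, b_hi, rho)
            - gaussian_copula(a_hi, b_lo, rho) + gaussian_copula(a_lo, b_lo, rho))

def summarize(lam1, lam2, rho, upper=25):
    """Total mass and covariance of the joint pmf on a truncated grid."""
    total = cov = 0.0
    for y1 in range(upper):
        for y2 in range(upper):
            p = joint_pmf(y1, y2, lam1, lam2, rho)
            total += p
            cov += (y1 - lam1) * (y2 - lam2) * p
    return total, cov

total_pos, cov_pos = summarize(4.0, 6.0, 0.5)
total_neg, cov_neg = summarize(4.0, 6.0, -0.5)
print(total_pos, cov_pos)
print(total_neg, cov_neg)
```

The sign of the covariance follows the sign of the copula parameter, which is the property that motivates the choice of the Gaussian family for the cross-sectional dependence.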

Table 1. Bivariate copula functions.

2.4. Copula Based Model for Count Time Series Data

This section focuses on the development of a class of bivariate count time series models. The joint distribution of successive observations is formulated using copula functions, allowing for flexible modeling of both serial dependence and cross-correlation structures. Specifically, cross-dependence between the two series is captured via an additional copula function. The models are constructed under a first-order stationary Markov framework with marginal distributions specified as either Poisson or Negative Binomial. For first-order Markov models, bivariate copula functions such as the bivariate Gaussian copula are chosen to construct the joint distribution between two consecutive observations.

2.5. Copula Based Bivariate Model

A bivariate integer-valued time series model is constructed via copula theory. Suppose that we observe a series of 2-dimensional vectors, ${\{Y_{t}\}}_{t = 1}^{n}$, where $Y_{t} = {(Y_{1 t}, Y_{2 t})}^{'}$ for $t = 1, 2, \dots, n$. Assume that each series, ${\{Y_{1 t}\}}_{t = 1}^{n}$ and ${\{Y_{2 t}\}}_{t = 1}^{n}$, follows a first-order Markov process based on a copula (see Alqawba and Diawara [14] for an example). Then, the mean vector $μ_{t}$ and the lag-one covariance matrix $Γ (t, t - 1)$ are defined as follows:

$$μ_{t} = E (Y_{t}) = \begin{bmatrix} E (Y_{1 t}) \\ E (Y_{2 t}) \end{bmatrix},$$

and

$$Γ (t, t - 1) = COV (Y_{t}, Y_{t - 1}) = \begin{bmatrix} COV (Y_{1 t}, Y_{1, t - 1}) & COV (Y_{1 t}, Y_{2, t - 1}) \\ COV (Y_{2 t}, Y_{1, t - 1}) & COV (Y_{2 t}, Y_{2, t - 1}) \end{bmatrix} .$$

Since the conditional dependence is defined through a first-order Markov process, the covariance matrix $Γ (t, t - 1)$ is defined for $t = 2, \dots, n .$

The diagonal elements of the covariance matrix correspond to the autocovariance within each time series, while the off-diagonal elements capture the cross-covariance between the two series. Given the presence of serial dependence and cross-correlation, the joint probability distribution of $Y_{1 t}$ and $Y_{2 t}$ conditional on $Y_{1, t - 1}$ and $Y_{2, t - 1}$, for $t = 1, \dots, n$, is expressed as:

$$f (y_{1 t}, y_{2 t} | y_{1, t - 1}, y_{2, t - 1}) = \int_{V^{- 1} (F_{1, t}^{-})}^{V^{- 1} (F_{1, t}^{+})} \int_{V^{- 1} (F_{2, t}^{-})}^{V^{- 1} (F_{2, t}^{+})} V_{2} (z_{1}, z_{2}, R) \, d z_{2} \, d z_{1}, \qquad (2)$$

where $V^{- 1}$ denotes the inverse cdf of the standard normal distribution and $V_{2} (\cdot, R)$ is the probability density function of the bivariate normal distribution with correlation matrix R. The matrix R captures the cross-sectional dependence and is defined as:

$$R = \begin{bmatrix} 1 & ρ \\ ρ & 1 \end{bmatrix},$$

where $ρ$ is the dependence parameter of the Gaussian copula function, which describes the cross-sectional dependence between the two count time series. Also, $F_{i, t}^{+} = F (y_{i t} | y_{i, t - 1})$ and $F_{i, t}^{-} = F (y_{i t} - 1 | y_{i, t - 1})$, for $i = 1, 2$, where

$$F (y_{i t} | y_{i, t - 1}) = \frac{F_{12} (y_{i t}, y_{i, t - 1}) - F_{12} (y_{i t}, y_{i, t - 1} - 1)}{f_{t - 1} (y_{i, t - 1}; θ)}$$

is the conditional cdf of $Y_{i t}$ given $Y_{i, t - 1}$, for $i = 1, 2$, and

$$F_{12} (y_{i t}, y_{i, t - 1}) = C (F_{t} (y_{i t}), F_{t - 1} (y_{i, t - 1}); δ) .$$

Here, $C (\cdot; δ)$ denotes a bivariate copula function with dependence parameter $δ$, which characterizes the serial dependence within a single time series. The vector of marginal parameters, denoted by $θ$, reduces to a scalar in the Poisson case, i.e., $θ = λ$. The proposed model is applicable to the analysis of bivariate count time series data with marginal distributions that may follow any discrete distribution, offering flexibility beyond traditional parametric assumptions.
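The conditional cdf above defines the one-step transition probabilities of each univariate chain. The following minimal sketch computes them for a Poisson marginal. To keep the example in closed form it uses the Frank copula for the serial dependence rather than the Gaussian copula emphasized in the paper; the function names and the values λ = 4, δ = 3 are likewise assumptions for illustration.

```python
import math

def pois_pmf(y, lam):
    """Poisson pmf; zero for negative arguments."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1)) if y >= 0 else 0.0

def pois_cdf(y, lam):
    """Poisson cdf; zero for negative arguments."""
    return min(1.0, sum(pois_pmf(j, lam) for j in range(y + 1)))

def frank_copula(u, v, delta):
    """Standard Frank copula, used here in place of the Gaussian copula."""
    num = math.expm1(-delta * u) * math.expm1(-delta * v)
    return -math.log1p(num / math.expm1(-delta)) / delta

def cond_cdf(y, y_prev, lam, delta):
    """F(y_t | y_{t-1}) = [C(F(y), F(y_prev)) - C(F(y), F(y_prev - 1))] / f(y_prev)."""
    f_y = pois_cdf(y, lam)
    num = (frank_copula(f_y, pois_cdf(y_prev, lam), delta)
           - frank_copula(f_y, pois_cdf(y_prev - 1, lam), delta))
    return num / pois_pmf(y_prev, lam)

def transition_pmf(y, y_prev, lam, delta):
    """Pr(Y_t = y | Y_{t-1} = y_prev), i.e. F^+ minus F^-."""
    return cond_cdf(y, y_prev, lam, delta) - cond_cdf(y - 1, y_prev, lam, delta)

def cond_mean(y_prev, lam, delta, upper=60):
    """Conditional expectation E(Y_t | Y_{t-1} = y_prev) on a truncated support."""
    return sum(y * transition_pmf(y, y_prev, lam, delta) for y in range(upper))

lam, delta = 4.0, 3.0  # illustrative values
row_sum = sum(transition_pmf(y, 5, lam, delta) for y in range(60))
low, high = cond_mean(1, lam, delta), cond_mean(8, lam, delta)
print(row_sum, low, high)
```

Each row of the transition matrix sums to one, and with positive serial dependence the conditional mean increases with the previous count. The same conditional expectation is the natural one-step predictor for a fitted chain.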

2.6. Inference

Parameter estimation is performed by maximizing the likelihood function, with the log-likelihood constructed using copula theory. Since this function has no closed-form expression, its maximization cannot be performed using standard methods [15]. The maximization technique used is presented next.

Using the conditional density function shown in Equation (2) for $t = 1$, the joint distribution of $Y_{11}$ and $Y_{21}$ is given by

$$f (y_{11}, y_{21}) = \int_{V^{- 1} (F_{1, 1}^{-})}^{V^{- 1} (F_{1, 1}^{+})} \int_{V^{- 1} (F_{2, 1}^{-})}^{V^{- 1} (F_{2, 1}^{+})} V_{2} (z_{1}, z_{2}, R) \, d z_{2} \, d z_{1}, \qquad (3)$$

and for $t = 2, \dots, n$, the conditional bivariate distribution of $Y_{1 t} = y_{1 t}$ and $Y_{2 t} = y_{2 t}$ given $Y_{1, t - 1} = y_{1, t - 1}$ and $Y_{2, t - 1} = y_{2, t - 1}$ is given by

$$f (y_{1 t}, y_{2 t} | y_{1, t - 1}, y_{2, t - 1}) = \int_{V^{- 1} (F_{1, t}^{-})}^{V^{- 1} (F_{1, t}^{+})} \int_{V^{- 1} (F_{2, t}^{-})}^{V^{- 1} (F_{2, t}^{+})} V_{2} (z_{1}, z_{2}, R) \, d z_{2} \, d z_{1} . \qquad (4)$$

Hence, combining Equations (3) and (4), the likelihood function is given by

$$L (ϑ; y) = f (y_{11}, y_{21}) \prod_{t = 2}^{n} f (y_{1 t}, y_{2 t} | y_{1, t - 1}, y_{2, t - 1}), \qquad (5)$$

where $ϑ = {(θ^{'}, δ_{1}, δ_{2}, ρ)}^{'}$. Here, $θ$ is the vector of marginal parameters, and $δ_{1}$ and $δ_{2}$ are the serial dependence parameters of the first and second count series, respectively. The cross-sectional dependence between the two time series is captured by $ρ$. Taking the logarithm of the function in Equation (5), we obtain the log-likelihood function:

$$\log L (ϑ; y) = l (ϑ; y) = \log f (y_{11}, y_{21}) + \sum_{t = 2}^{n} \log f (y_{1 t}, y_{2 t} | y_{1, t - 1}, y_{2, t - 1}) . \qquad (6)$$
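To illustrate how Equation (6) is assembled, the sketch below evaluates the log-likelihood for a short pair of series with Poisson marginals and Gaussian copulas for both the serial and the cross-sectional dependence. It is a sketch under stated assumptions, not the authors' implementation: the function names and the toy data are hypothetical, the term for $t = 1$ is computed from the marginal cdfs, and the bivariate normal probabilities are obtained by a simple Simpson quadrature rather than by the Monte Carlo routine used in the paper.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def pois_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1)) if y >= 0 else 0.0

def pois_cdf(y, lam):
    return min(1.0, sum(pois_pmf(j, lam) for j in range(y + 1)))

def bvn_cdf(h, k, rho, steps=40):
    """Standard bivariate normal cdf via Simpson integration over the correlation."""
    def phi2(r):
        s = 1.0 - r * r
        quad = h * h - 2.0 * r * h * k + k * k
        return math.exp(-quad / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))
    w = rho / steps
    acc = phi2(0.0) + phi2(rho)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * phi2(i * w)
    return STD.cdf(h) * STD.cdf(k) + acc * w / 3.0

def gauss_copula(u, v, rho):
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    return bvn_cdf(STD.inv_cdf(u), STD.inv_cdf(v), rho)

def cond_cdf(y, y_prev, lam, delta):
    """Conditional cdf of Y_t given Y_{t-1} under a Gaussian serial copula."""
    f_y = pois_cdf(y, lam)
    num = (gauss_copula(f_y, pois_cdf(y_prev, lam), delta)
           - gauss_copula(f_y, pois_cdf(y_prev - 1, lam), delta))
    return min(1.0, max(0.0, num / pois_pmf(y_prev, lam)))

def rect_prob(lo1, hi1, lo2, hi2, rho):
    """Mass of (lo1, hi1] x (lo2, hi2] under the Gaussian copula, i.e. the
    double integral of Equation (2) written as four cdf evaluations."""
    p = (gauss_copula(hi1, hi2, rho) - gauss_copula(lo1, hi2, rho)
         - gauss_copula(hi1, lo2, rho) + gauss_copula(lo1, lo2, rho))
    return max(p, 1e-300)  # guard against log(0) from rounding

def log_likelihood(y1, y2, lam1, lam2, delta1, delta2, rho):
    # t = 1: marginal cdfs are used for the limits (an assumption of this sketch).
    ll = math.log(rect_prob(pois_cdf(y1[0] - 1, lam1), pois_cdf(y1[0], lam1),
                            pois_cdf(y2[0] - 1, lam2), pois_cdf(y2[0], lam2), rho))
    # t = 2, ..., n: conditional cdfs F^- and F^+ give the limits.
    for t in range(1, len(y1)):
        lo1 = cond_cdf(y1[t] - 1, y1[t - 1], lam1, delta1)
        hi1 = cond_cdf(y1[t], y1[t - 1], lam1, delta1)
        lo2 = cond_cdf(y2[t] - 1, y2[t - 1], lam2, delta2)
        hi2 = cond_cdf(y2[t], y2[t - 1], lam2, delta2)
        ll += math.log(rect_prob(lo1, hi1, lo2, hi2, rho))
    return ll

y1 = [3, 5, 4, 6, 2, 4, 5, 3]  # toy data, not from the paper
y2 = [6, 4, 7, 5, 8, 6, 5, 7]
ll = log_likelihood(y1, y2, 4.0, 6.0, 0.5, 0.4, -0.3)
print(ll)
```

When the cross-sectional parameter is set to zero, each rectangle probability factorizes and the value reduces to the sum of the two univariate chain log-likelihoods, which is a convenient correctness check before passing the function to a numerical optimizer.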

Maximizing the log-likelihood function in Equation (6) yields the ML estimates for the proposed model class. For the likelihood-based estimation of copula models, convergence issues may arise due to the high dimensionality of the parameter space and the nonlinearity of the likelihood surface. In our setting, the log-likelihood involves a bivariate normal integral, given in (2), which has no closed-form solution. To compute this integral, we employ the standard randomized importance sampling method of Genz and Bretz [16], which is effective for dimensions below ten. This procedure is implemented in the mvtnorm package by Hothorn et al. [17], available on CRAN, which includes the function pmvnorm for computing multivariate normal probabilities. Then, the parameter estimates, $\hat{ϑ}$, can be obtained as

$$\hat{ϑ} = \underset{ϑ}{\arg \max} \, l (ϑ; y) .$$

The numerical maximization also returns a numerically calculated Hessian matrix, from which the observed Fisher information matrix (FIM) is obtained. The square roots of the diagonal elements of the inverse FIM yield the standard errors of the ML estimates of $ϑ$. In the next section, we evaluate the effectiveness of the proposed class of models through a comprehensive simulation study.
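The step from a numerical Hessian to a standard error can be illustrated on a one-parameter toy problem. The sketch below uses an i.i.d. Poisson log-likelihood as a stand-in for Equation (6), because its MLE and standard error are available in closed form for comparison; the data, the step size, and the function names are assumptions made for illustration.

```python
import math

def poisson_loglik(lam, ys):
    """Log-likelihood of i.i.d. Poisson counts (a stand-in for the copula model)."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in ys)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

ys = [3, 5, 4, 6, 2, 4, 5, 3, 7, 4]  # toy data, not from the paper
lam_hat = sum(ys) / len(ys)          # closed-form Poisson MLE

# Observed information = minus the Hessian of the log-likelihood at the MLE.
hessian = second_derivative(lambda lam: poisson_loglik(lam, ys), lam_hat)
se_numeric = math.sqrt(1.0 / -hessian)
se_exact = math.sqrt(lam_hat / len(ys))
print(lam_hat, se_numeric, se_exact)
```

In the multi-parameter case the scalar second derivative is replaced by the Hessian matrix, and the standard errors are the square roots of the diagonal of its negated inverse.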

3. Results

3.1. Simulation Studies

A comprehensive simulation study was conducted to assess the proposed estimation method and verify the asymptotic properties of the parameter estimates. In our simulation procedure, we first generate a correlated Gaussian time series and then transform it into uniform random variables using the cdf of the standard normal distribution. These uniforms are then mapped through the inverse cdf of the chosen count distribution, producing dependent integer-valued time series that preserve both the serial and cross-dependence structures. We first consider bivariate Poisson count time series data. For each univariate series, a first-order stationary copula-based Markov model was specified, where a copula family defines the joint distribution of consecutive observations. The two series were then coupled at each time point using a bivariate copula function. Here, $λ_{1}$ and $λ_{2}$ denote the means of the two marginal distributions; $δ_{1}$ and $δ_{2}$ measure serial dependence within each series; and $ρ$ measures the cross-correlation between the series. A Gaussian copula was chosen as the candidate copula family, with true parameters $λ_{1} = 4$, $λ_{2} = 6$, $δ_{1} = 0.5$, $δ_{2} = 0.4$, and $ρ = 0.5$. Assuming stationarity, the parameters of the marginal distributions, $θ$, are held constant over time. Simulations were performed for sample sizes of n = 50, 100, 300 and 1000, each replicated 500 times. For the five parameters, standard error (SE), mean square error (MSE), and mean absolute error (MAE) were computed, with results summarized in Table 2. The MSE and MAE are defined as follows.

$$MSE = \frac{1}{m} \sum_{i = 1}^{m} {(θ - \hat{θ}_{i})}^{2}, \quad MAE = \frac{1}{m} \sum_{i = 1}^{m} | θ - \hat{θ}_{i} |,$$

where $θ$ is the true value of the parameter, $\hat{θ}_{i}$ is its estimate in the i-th replication, and m is the number of replications. We conducted a second simulation, again using the Gaussian copula as the candidate copula family, with true parameters $λ_{1} = 3$, $λ_{2} = 5$, $δ_{1} = 0.6$, $δ_{2} = 0.4$, and $ρ = - 0.5$. In this simulation scenario, the two time series are assumed to have negative cross-correlation. The corresponding results are presented in Table 3.
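The data-generating steps described above (correlated Gaussian series, transformation to uniforms, inverse count cdf) can be sketched as follows. This is one possible reading, not the authors' simulation code: the latent series are taken to be Gaussian AR(1) processes with cross-correlated innovations, and the latent coefficients `a1`, `a2` and `r` are hypothetical parameters on the Gaussian scale that should not be assumed to equal $δ_{1}$, $δ_{2}$ and $ρ$ exactly.

```python
import math
import random
from statistics import NormalDist

def pois_quantile(u, lam, y_max=10_000):
    """Smallest y with F(y) >= u for the Poisson(lam) cdf F (inverse cdf)."""
    y, p = 0, math.exp(-lam)
    cdf = p
    while cdf < u and y < y_max:
        y += 1
        p *= lam / y
        cdf += p
    return y

def simulate_counts(n, lam1, lam2, a1, a2, r, seed=0, burn_in=100):
    """Latent Gaussian AR(1) pair -> uniforms -> Poisson counts."""
    rng = random.Random(seed)
    std = NormalDist()
    z1 = z2 = 0.0
    y1, y2 = [], []
    for t in range(n + burn_in):
        e1 = rng.gauss(0.0, 1.0)
        e2 = r * e1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        # Scaling keeps each latent series at unit variance in the stationary regime.
        z1 = a1 * z1 + math.sqrt(1.0 - a1 * a1) * e1
        z2 = a2 * z2 + math.sqrt(1.0 - a2 * a2) * e2
        if t >= burn_in:
            y1.append(pois_quantile(std.cdf(z1), lam1))
            y2.append(pois_quantile(std.cdf(z2), lam2))
    return y1, y2

def sample_corr(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

y1, y2 = simulate_counts(1000, 4.0, 6.0, a1=0.5, a2=0.4, r=-0.5, seed=1)
print(sum(y1) / len(y1), sum(y2) / len(y2), sample_corr(y1, y2))
```

Because the transformation is monotone, the sign of the latent cross-correlation carries over to the counts, so a negative `r` produces negatively cross-correlated count series as in the second simulation setting.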

Table 2. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals.

Table 3. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals under negative cross-correlation.

Table 2 and Table 3 demonstrate that the parameter estimates converge to the true values, with standard errors decreasing as the sample size increases. Further, we observe that both the MSE and MAE for the parameter estimates decrease as the sample size increases. This pattern is expected, since larger sample sizes provide more information about the underlying dependence structure, leading to more accurate and stable parameter estimates. In practical terms, as we increase the sample size, the parameter estimates produced by the MLE approach become more precise, resulting in smaller bias and reduced variability. Figure 1 and Figure 2 present the Q–Q plots for the parameter estimates obtained using Poisson marginals. The Q–Q plots show that the empirical distribution of the parameter estimates aligns closely with the $45^{\circ}$ reference line, indicating that the sampling distribution of the estimates is approximately normal. This supports the asymptotic normality of the maximum likelihood estimates and suggests stable inference as sample size increases.
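The way MSE and MAE shrink with the sample size can be reproduced on a deliberately simple stand-in problem. The sketch below estimates the mean of i.i.d. Poisson counts by the sample mean, rather than fitting the copula model, and compares the two error measures at two sample sizes; the function names, the number of replications, and the sample sizes are illustrative assumptions.

```python
import math
import random

def mse_mae(true_value, estimates):
    """MSE and MAE of replicated estimates around the true parameter value."""
    m = len(estimates)
    mse = sum((true_value - e) ** 2 for e in estimates) / m
    mae = sum(abs(true_value - e) for e in estimates) / m
    return mse, mae

def poisson_draw(rng, lam):
    """One Poisson(lam) draw by inversion of the cdf."""
    u, y, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while cdf < u and y < 1000:
        y += 1
        p *= lam / y
        cdf += p
    return y

def replicate_mle(n, lam, m, seed):
    """m replicated sample-mean estimates of lam, each from n i.i.d. counts."""
    rng = random.Random(seed)
    return [sum(poisson_draw(rng, lam) for _ in range(n)) / n for _ in range(m)]

lam_true = 4.0
mse_small, mae_small = mse_mae(lam_true, replicate_mle(50, lam_true, m=100, seed=1))
mse_large, mae_large = mse_mae(lam_true, replicate_mle(500, lam_true, m=100, seed=2))
print(mse_small, mae_small)
print(mse_large, mae_large)
```

For this toy estimator the MSE is approximately $λ / n$, so a tenfold increase in the sample size reduces it by roughly a factor of ten.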

Figure 1. Q–Q Plots of ML estimates for n = 1000 under positive cross-correlation with Poisson marginals.

Figure 2. Q–Q Plots of ML estimates for n = 1000 under negative cross-correlation with Poisson marginals.

With the introduction of one additional parameter, the Negative Binomial (NB) distribution is able to account for over-dispersion relative to the Poisson distribution. We performed simulations using the Gaussian copula for the univariate and joint distributions with NB marginals. Table 4 and Table 5 demonstrate that the parameter estimates converge to the true values, with standard errors decreasing as the sample size increases. These results indicate that the proposed model performs well with Negative Binomial marginals for both positive and negative cross-correlations.

Table 4. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals.

Table 5. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals under negative cross-correlation.

Figure 3 and Figure 4 present the Q–Q plots for the parameter estimates obtained using Negative Binomial marginals. The Q–Q plots show that the empirical distribution of the parameter estimates aligns closely with the $45^{\circ}$ reference line, indicating that the sampling distribution of the estimates is approximately normal.

Figure 3. Q–Q plots of the ML estimates for n = 1000 with positive cross-correlation with Negative Binomial marginals.

Figure 4. Q–Q plots of the ML estimates for n = 1000 with negative cross-correlation with Negative Binomial marginals.

3.2. Real-Data Application

In this application, we fit a copula-based bivariate model to analyze offense data from New South Wales, Australia. The dataset, obtained from the NSW Bureau of Crime Statistics and Research, is categorized by local government area, offense category, and month. The analyzed series consists of 228 paired monthly observations spanning from August 1995 to July 2014. This dataset has also been used in previous research (Yang et al. [18]). We selected sexual offense counts from Northern Beaches and Waverley city for analysis. The sexual offense counts are the sum of two subcategories:

Sexual assault, and
Sexual touching, sexual act and other sexual offenses.

The empirical means for the two count time series are 17.452 (Northern Beaches) and 6.316 (Waverley), respectively. The monthly counts of sexual offenses for the two areas are shown in Figure 5. Compared to the Waverley count time series, the Northern Beaches count series exhibits higher counts and greater variation over time. Figure 6 presents bar plots of the count distributions and the sample autocorrelation functions (ACFs) for the two count series. The ACFs reveal clear serial dependence in both series. This observation motivates a detailed examination of serial and cross-series dependencies using the proposed copula-based bivariate model.

Figure 5. Sexual offense counts for Waverley and Northern Beaches.

Figure 6. Bar plot and ACF for counts of offenses for Waverley (Top) and Northern Beaches (Bottom).

We do not observe any noticeable trends or seasonality in either count time series, supporting the stationarity assumption. Furthermore, the bar plots indicate that there is no evidence of zero-inflation in either series. We analyze these data using a copula-based bivariate model with Poisson and Negative Binomial marginals. The parameter estimates from our proposed model are compared with those from alternative bivariate count time series models, and model performance is evaluated using the AIC criterion. The parameter estimates for BINAR(1) and the other competing models presented in Table 6 are taken from Yang et al. [18].

Table 6. Parameter estimates of the sexual offense counts under different models.

We compare our model with the following bivariate integer-valued time series models using the AIC criterion.

The Poisson BINAR(1) model (Pedeli and Karlis (2011) [19]).
BINDSETINAR(2,1) model (Monteiro et al. (2012) [20]).
The first order bivariate threshold integer-valued autoregressive process (BTINAR(1)) (Yang et al. [18]).

We have used several candidate copula families to model the joint distribution. Their corresponding AIC values are reported in Table 7 to evaluate comparative model fit and identify the most suitable copula family. By comparing the observed AIC values, the models with Negative Binomial marginals outperformed those with Poisson marginals. Among these, the Gaussian copula with Negative Binomial marginals has the lowest AIC.
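The AIC comparison can be sketched in a few lines, using the usual definition AIC = 2k − 2 log L with k free parameters. The log-likelihood values below are hypothetical placeholders, not the values behind Table 7, and the parameter counts (five for Poisson marginals, seven for Negative Binomial marginals, counting one dispersion parameter per series) are an assumption of this example.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) pairs for illustration only.
candidates = {
    "Gaussian copula, Poisson marginals": (-1400.0, 5),
    "Gaussian copula, Negative Binomial marginals": (-1352.0, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: AIC = {value:.2f}")
print("selected:", best)
```

The penalty term 2k means that the extra dispersion parameters are only rewarded when they raise the log-likelihood by more than one unit each.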

Table 7. AIC values of bivariate models fitted with different copula families.

Table 8 presents the parameter estimates for the copula-based model, where a Gaussian copula is used to construct the joint distribution with Poisson and Negative Binomial marginals. In this setting, parameters with subscript “1” correspond to the offense counts recorded in Northern Beaches, and parameters with subscript “2” correspond to the counts in Waverley city.

Table 8. Parameter estimates for the bivariate Poisson and Negative Binomial models.

A comparison of the results shows that our proposed model with Negative Binomial marginals yields a lower AIC value (2719.38), indicating improved model fit. The parameter estimates for $λ_{1}$ and $λ_{2}$ exhibit smaller standard errors compared to those obtained from all other fitted models, which emphasizes the robustness of our proposed method. Since the model with Negative Binomial marginals can accommodate the over-dispersion present in the data, we can expect better performance compared to other fitted models.

Figure 7 and Figure 8 present the predicted values, which correspond to the conditional expectations of $Y_{t}$ given $Y_{t - 1}$ under the bivariate Negative Binomial model for $t = 2, \dots, n$. Overall, the proposed model performs well, with some reduced accuracy for larger count values.

Figure 7. Predicted values of the offense counts for Waverley city based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.

Figure 8. Predicted values of the offense counts for the Northern Beaches based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.

4. Discussion

In this paper, we introduced a class of bivariate integer-valued time series models based on copula theory. Serial dependence was captured through copula-based transition probabilities within a Markov chain framework, using both Poisson and Negative Binomial marginals. The cross-sectional dependence was modeled with a bivariate Gaussian copula. The performance of the likelihood-based estimation procedure was evaluated through simulation studies. Importance sampling was used to efficiently evaluate the bivariate normal integral. The models were also applied to a real-world count dataset using both marginal distributions. Model comparisons based on the AIC criterion indicated that the proposed approach with Negative Binomial marginals achieved the best fit. Both the simulation results and the empirical analysis confirmed the effectiveness of the proposed methodology.

As a future extension, we plan to construct a multivariate framework for higher-dimensional data using vine copulas.

Author Contributions

Methodology, D.F. and W.J.; software, D.F. and W.J.; formal analysis, D.F. and W.J.; investigation, D.F.; writing—original draft, D.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the Editor and Reviewers whose comments have significantly improved the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Quoreshi, A.S. Bivariate time series modeling of financial count data. Commun. Stat.-Theory Methods 2006, 35, 1343–1358.
2. Wang, K.; Lee, A.H.; Yau, K.K.; Carrivick, P.J. A bivariate zero-inflated Poisson regression model to analyze occupational injuries. Accid. Anal. Prev. 2003, 35, 625–629.
3. Heinen, A.; Rengifo, E. Multivariate autoregressive modeling of time series count data using copulas. J. Empir. Financ. 2007, 14, 564–583.
4. Karlis, D.; Pedeli, X. Flexible bivariate INAR(1) processes using copulas. Commun. Stat.-Theory Methods 2013, 42, 723–740.
5. Ravishanker, N.; Venkatesan, R.; Hu, S. Dynamic models for time series of counts with a marketing application. In Handbook of Discrete-Valued Time Series; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016.
6. Bradshaw, C.; Blei, D.M. A Bayesian model of underreporting for sexual assault on college campuses. Ann. Appl. Stat. 2024, 18, 3146–3164.
7. Cui, Y.; Zhu, F. A new bivariate integer-valued GARCH model allowing for negative cross-correlation. Test 2018, 27, 428–452.
8. Ahamad, N.; Gayah, V.V.; Donnell, E.T. Copula-based bivariate count data regression models for simultaneous estimation of crash counts based on severity and number of vehicles. Accid. Anal. Prev. 2023, 181, 106928.
9. Jeng, A.H.; Singh, R.; Diawara, N.; Curtis, K.; Gonzalez, R.; Welch, N.; Jackson, C.; Jurgens, D.; Adikari, S. Application of wastewater-based surveillance and copula time-series model for COVID-19 forecast. Sci. Total Environ. 2023, 885, 163655.
10. Xu, X.; Hardin, J.W. Regression models for bivariate count outcomes. Stata J. 2016, 16, 301–315.
11. Alqawba, M.; Fernando, D.; Diawara, N. A class of copula-based bivariate Poisson time series models with applications. Computation 2021, 9, 108.
12. Nelsen, R.B. An Introduction to Copulas; Springer: Cham, Switzerland, 2007.
13. Joe, H. Dependence Modeling with Copulas; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.
14. Alqawba, M.; Diawara, N. Copula-based Markov zero-inflated count time series models with application. J. Appl. Stat. 2021, 48, 786–803.
15. Panagiotelis, A.; Czado, C.; Joe, H. Pair copula constructions for multivariate discrete data. J. Am. Stat. Assoc. 2012, 107, 1063–1072.
16. Genz, A.; Bretz, F. Computation of Multivariate Normal and t Probabilities; Springer Science & Business Media: Cham, Switzerland, 2009; Volume 195.
17. Hothorn, T.; Bretz, F.; Genz, A. On multivariate t and Gauss probabilities in R. Sigma 2001, 1000, 3.
18. Yang, K.; Zhao, Y.; Li, H.; Wang, D. On bivariate threshold Poisson integer-valued autoregressive processes. Metrika 2023, 86, 931–963.
19. Pedeli, X.; Karlis, D. A bivariate INAR(1) process with application. Stat. Model. 2011, 11, 325–349.
20. Monteiro, M.; Scotto, M.G.; Pereira, I. Integer-valued self-exciting threshold autoregressive processes. Commun. Stat.-Theory Methods 2012, 41, 2717–2737.

Figure 1. Q–Q plots of the ML estimates for n = 1000 under positive cross-correlation with Poisson marginals.

Figure 2. Q–Q plots of the ML estimates for n = 1000 under negative cross-correlation with Poisson marginals.

Figure 3. Q–Q plots of the ML estimates for n = 1000 under positive cross-correlation with Negative Binomial marginals.

Figure 4. Q–Q plots of the ML estimates for n = 1000 under negative cross-correlation with Negative Binomial marginals.

Figure 5. Sexual offense counts for Waverley and Northern Beaches.

Figure 6. Bar plot and ACF of the offense counts for Waverley (top) and Northern Beaches (bottom).

Figure 7. Predicted values of the offense counts for Waverley based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.

Figure 8. Predicted values of the offense counts for Northern Beaches based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.

Table 1. Bivariate copula functions.

Copula	Copula Function
Gaussian	$C (u_{1}, u_{2}; δ) = Φ_{δ} (Φ^{- 1} (u_{1}), Φ^{- 1} (u_{2})), δ \in [- 1, 1]$
Frank	$C (u_{1}, u_{2}; δ) = - \frac{1}{δ} \log [1 + \frac{(e^{- δ u_{1}} - 1) (e^{- δ u_{2}} - 1)}{e^{- δ} - 1}], δ \in \mathbb{R} \setminus \{0\}$
Gumbel	$C (u_{1}, u_{2}; δ) = \exp [- {({(- \log (u_{1}))}^{δ} + {(- \log (u_{2}))}^{δ})}^{1 / δ}], δ \geq 1$
Clayton	$C (u_{1}, u_{2}; δ) = {(u_{1}^{- δ} + u_{2}^{- δ} - 1)}^{- 1 / δ}, δ > 0$
Plackett	$C (u_{1}, u_{2}; δ) = \frac{[1 + (δ - 1) (u_{1} + u_{2})] - \sqrt{{[1 + (δ - 1) (u_{1} + u_{2})]}^{2} - 4 u_{1} u_{2} δ (δ - 1)}}{2 (δ - 1)}, δ \geq 0$
Bivariate t	$C (u_{1}, u_{2}; δ) = τ_{δ} (τ^{- 1} (u_{1}), τ^{- 1} (u_{2})), δ \in [- 1, 1]$
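
The closed-form entries of Table 1 can be evaluated directly. The following is a minimal Python sketch of the Frank, Gumbel, Clayton, and Plackett copulas, not the authors' implementation. The function names are illustrative, and treating δ = 1 in the Plackett copula as the independence limit is an assumption added here because the formula as written divides by δ − 1. The Gaussian and bivariate t copulas are omitted because they require a bivariate distribution function rather than a closed form.

```python
import math


def frank(u1, u2, delta):
    """Frank copula; delta may be any nonzero real number."""
    num = math.expm1(-delta * u1) * math.expm1(-delta * u2)
    return -math.log1p(num / math.expm1(-delta)) / delta


def gumbel(u1, u2, delta):
    """Gumbel copula; delta >= 1 and 0 < u1, u2 <= 1."""
    s = (-math.log(u1)) ** delta + (-math.log(u2)) ** delta
    return math.exp(-(s ** (1.0 / delta)))


def clayton(u1, u2, delta):
    """Clayton copula; delta > 0 and 0 < u1, u2 <= 1."""
    return (u1 ** -delta + u2 ** -delta - 1.0) ** (-1.0 / delta)


def plackett(u1, u2, delta):
    """Plackett copula; delta > 0 (delta = 1 handled as independence)."""
    if abs(delta - 1.0) < 1e-12:
        return u1 * u2  # assumed limiting case, see lead-in
    a = 1.0 + (delta - 1.0) * (u1 + u2)
    disc = a * a - 4.0 * u1 * u2 * delta * (delta - 1.0)
    return (a - math.sqrt(disc)) / (2.0 * (delta - 1.0))


if __name__ == "__main__":
    # Every copula must satisfy C(u, 1) = u; print the values as a quick check.
    for name, cop in [("Frank", frank), ("Gumbel", gumbel),
                      ("Clayton", clayton), ("Plackett", plackett)]:
        print(name, cop(0.3, 1.0, 2.0), cop(0.3, 0.6, 2.0))
```

A useful sanity check for any implementation is the boundary condition C(u, 1) = u together with the Fréchet bounds max(u1 + u2 − 1, 0) ≤ C(u1, u2) ≤ min(u1, u2).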

Table 2. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals under positive cross-correlation.

Sample Size	Parameter	Estimate	SE	MSE	MAE
50	$λ_{1} (4)$	4.099	0.572	0.321	0.478
	$λ_{2} (6)$	6.096	0.594	0.247	0.411
	$δ_{1} (0.5)$	0.418	0.103	0.115	0.119
	$δ_{2} (0.4)$	0.333	0.117	0.017	0.112
	$ρ_{1} (0.5)$	0.476	0.101	0.013	0.092
100	$λ_{1} (4)$	4.107	0.395	0.166	0.323
	$λ_{2} (6)$	6.100	0.434	0.198	0.348
	$δ_{1} (0.5)$	0.429	0.067	0.009	0.079
	$δ_{2} (0.4)$	0.350	0.074	0.008	0.072
	$ρ_{1} (0.5)$	0.458	0.071	0.007	0.066
300	$λ_{1} (4)$	4.099	0.213	0.055	0.188
	$λ_{2} (6)$	6.079	0.231	0.059	0.194
	$δ_{1} (0.5)$	0.436	0.039	0.006	0.065
	$δ_{2} (0.4)$	0.352	0.040	0.004	0.052
	$ρ_{1} (0.5)$	0.455	0.039	0.004	0.051
1000	$λ_{1} (4)$	4.102	0.122	0.024	0.129
	$λ_{2} (6)$	6.081	0.127	0.019	0.118
	$δ_{1} (0.5)$	0.437	0.019	0.004	0.065
	$δ_{2} (0.4)$	0.357	0.021	0.002	0.044
	$ρ_{1} (0.5)$	0.455	0.022	0.002	0.047
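
The summary columns in the simulation tables can be reproduced from a set of replicate estimates. The sketch below is illustrative only and rests on assumed definitions that this section does not spell out: Estimate is taken as the mean of the replicate estimates, SE as their sample standard deviation, MSE as the mean squared deviation from the true value, and MAE as the mean absolute deviation from the true value. The function name `mc_summary` and the toy input are hypothetical.

```python
import statistics


def mc_summary(estimates, true_value):
    """Summarize Monte Carlo replicate estimates of a single parameter."""
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    return {
        "estimate": statistics.fmean(estimates),   # mean over replicates
        "se": statistics.stdev(estimates),         # sample standard deviation
        "mse": sum(err * err for err in errors) / n,
        "mae": sum(abs(err) for err in errors) / n,
    }


if __name__ == "__main__":
    # Toy replicate estimates of a parameter whose true value is 4.
    print(mc_summary([3.8, 4.0, 4.2, 4.4], true_value=4.0))
```

Under these definitions the MSE equals the squared bias plus the variance of the estimates (with divisor n), which gives a quick consistency check between the Estimate, SE, and MSE columns.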

Table 3. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals under negative cross-correlation.

Sample Size	Parameter	Estimate	SE	MSE	MAE
50	$λ_{1} (3)$	3.139	0.613	0.394	0.494
	$λ_{2} (5)$	4.996	0.508	0.258	0.409
	$δ_{1} (0.6)$	0.488	0.092	0.021	0.119
	$δ_{2} (0.4)$	0.333	0.122	0.019	0.109
	$ρ (- 0.5)$	−0.449	0.109	0.015	0.096
100	$λ_{1} (3)$	3.144	0.423	0.199	0.356
	$λ_{2} (5)$	5.003	0.379	0.143	0.306
	$δ_{1} (0.6)$	0.501	0.063	0.014	0.100
	$δ_{2} (0.4)$	0.340	0.081	0.010	0.081
	$ρ (- 0.5)$	−0.453	0.075	0.008	0.069
300	$λ_{1} (3)$	3.141	0.229	0.072	0.217
	$λ_{2} (5)$	4.983	0.201	0.041	0.165
	$δ_{1} (0.6)$	0.509	0.035	0.009	0.091
	$δ_{2} (0.4)$	0.345	0.040	0.004	0.058
	$ρ (- 0.5)$	−0.446	0.038	0.004	0.057
1000	$λ_{1} (3)$	3.139	0.126	0.039	0.159
	$λ_{2} (5)$	4.974	0.104	0.011	0.092
	$δ_{1} (0.6)$	0.512	0.019	0.008	0.087
	$δ_{2} (0.4)$	0.351	0.022	0.003	0.049
	$ρ (- 0.5)$	−0.443	0.021	0.003	0.048

Table 4. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals under positive cross-correlation.

Sample Size	Parameter	Estimate	SE	MSE	MAE
50	$λ_{1} (4)$	4.050	0.724	0.525	0.578
	$κ_{1} (1.5)$	1.554	0.603	0.367	0.428
	$λ_{2} (6)$	6.057	0.852	0.728	0.669
	$κ_{2} (2.5)$	2.469	0.759	0.576	0.591
	$δ_{1} (0.3)$	0.265	0.127	0.017	0.105
	$δ_{2} (0.3)$	0.268	0.121	0.016	0.097
	$ρ (0.5)$	0.505	0.111	0.012	0.085
100	$λ_{1} (4)$	4.098	0.506	0.265	0.405
	$κ_{1} (1.5)$	1.421	0.361	0.131	0.276
	$λ_{2} (6)$	6.108	0.612	0.385	0.480
	$κ_{2} (2.5)$	2.315	0.557	0.331	0.426
	$δ_{1} (0.3)$	0.278	0.087	0.008	0.071
	$δ_{2} (0.3)$	0.280	0.087	0.008	0.071
	$ρ (0.5)$	0.499	0.079	0.006	0.063
300	$λ_{1} (4)$	4.085	0.291	0.092	0.241
	$κ_{1} (1.5)$	1.451	0.173	0.044	0.175
	$λ_{2} (6)$	6.118	0.375	0.155	0.311
	$κ_{2} (2.5)$	2.342	0.292	0.133	0.308
	$δ_{1} (0.3)$	0.292	0.051	0.003	0.041
	$δ_{2} (0.3)$	0.286	0.049	0.003	0.041
	$ρ (0.5)$	0.496	0.045	0.002	0.036
1000	$λ_{1} (4)$	4.101	0.175	0.041	0.162
	$κ_{1} (1.5)$	1.362	0.093	0.027	0.145
	$λ_{2} (6)$	6.088	0.192	0.044	0.164
	$κ_{2} (2.5)$	2.253	0.148	0.083	0.252
	$δ_{1} (0.3)$	0.294	0.027	0.001	0.022
	$δ_{2} (0.3)$	0.292	0.029	0.001	0.024
	$ρ (0.5)$	0.497	0.024	0.001	0.019

Table 5. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals under negative cross-correlation.

Sample Size	Parameter	Estimate	SE	MSE	MAE
50	$λ_{1} (4)$	4.052	0.731	0.535	0.589
	$κ_{1} (1.5)$	1.593	0.564	0.326	0.417
	$λ_{2} (6)$	6.109	0.915	0.848	0.716
	$κ_{2} (2.5)$	2.586	0.712	0.514	0.535
	$δ_{1} (0.3)$	0.263	0.126	0.017	0.105
	$δ_{2} (0.3)$	0.260	0.125	0.017	0.105
	$ρ (- 0.5)$	−0.490	0.112	0.012	0.087
100	$λ_{1} (4)$	4.100	0.507	0.266	0.405
	$κ_{1} (1.5)$	1.457	0.351	0.125	0.277
	$λ_{2} (6)$	6.091	0.652	0.432	0.527
	$κ_{2} (2.5)$	2.361	0.563	0.336	0.464
	$δ_{1} (0.3)$	0.278	0.089	0.008	0.073
	$δ_{2} (0.3)$	0.281	0.088	0.008	0.071
	$ρ (- 0.5)$	−0.495	0.081	0.007	0.065
300	$λ_{1} (4)$	4.095	0.299	0.098	0.248
	$κ_{1} (1.5)$	1.372	0.174	0.046	0.181
	$λ_{2} (6)$	6.128	0.361	0.146	0.301
	$κ_{2} (2.5)$	2.259	0.288	0.140	0.316
	$δ_{1} (0.3)$	0.291	0.051	0.003	0.041
	$δ_{2} (0.3)$	0.289	0.052	0.003	0.042
	$ρ (- 0.5)$	−0.497	0.047	0.002	0.037
1000	$λ_{1} (4)$	4.108	0.201	0.052	0.181
	$κ_{1} (1.5)$	1.419	0.136	0.025	0.131
	$λ_{2} (6)$	6.092	0.218	0.056	0.189
	$κ_{2} (2.5)$	2.402	0.236	0.065	0.024
	$δ_{1} (0.3)$	0.295	0.029	0.001	0.023
	$δ_{2} (0.3)$	0.289	0.032	0.001	0.027
	$ρ (- 0.5)$	−0.487	0.027	0.001	0.024

Table 6. Parameter estimates of the sexual offense counts under different models.

Model	Parameter	Estimate	SE	AIC
BINAR(1)	$α_{1}$	0.276	0.018	3031.74
	$α_{2}$	0.099	0.028
	$λ_{1}$	12.630	0.381
	$λ_{2}$	5.676	0.199
	$ϕ$	0.405	0.235
BINDSETINAR(2,1)	$α_{1, 1}$	0.000	0.041	2968.84
	$α_{1, 2}$	0.285	0.023
	$λ_{1}$	15.712	0.617
	$α_{2, 1}$	0.004	0.068
	$α_{2, 2}$	0.115	0.033
	$λ_{2}$	5.973	0.344
$\mathrm{BTINAR}_{\mathrm{I}}(1)$	$α_{1, 1}$	0.036	0.038	2955.53
	$α_{1, 2}$	0.044	0.030
	$α_{2, 1}$	0.317	0.021
	$α_{2, 2}$	0.293	0.047
	$λ_{1}$	15.329	0.596
	$λ_{2}$	5.761	0.207
	$ϕ$	0.477	0.275
$\mathrm{BTINAR}_{\mathrm{II}}(1)$	$α_{1, 1}$	0.200	0.020	3010.99
	$α_{1, 2}$	0.019	0.098
	$α_{2, 1}$	0.337	0.018
	$α_{2, 2}$	0.094	0.040
	$λ_{1}$	12.822	0.376
	$λ_{2}$	5.862	0.393
	$ϕ$	0.389	0.245
$\mathrm{BTINAR}_{\mathrm{III}}(1)$	$α_{1, 1}$	0.093	0.035	2960.57
	$α_{1, 2}$	0.072	0.036
	$α_{2, 1}$	0.331	0.021
	$α_{2, 2}$	0.301	0.036
	$λ_{1}$	14.544	0.565
	$λ_{2}$	5.717	0.218
	$ϕ$	0.557	0.276

Table 7. AIC values of bivariate models fitted with different copula families.

	Marginal Distribution
Copula	Negative Binomial	Poisson
Gaussian	2719.38	3214.91
Frank	2747.75	3119.16
Clayton	2728.31	3168.33
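
Model comparison by AIC can be sketched as follows. The snippet uses the standard definition AIC = 2k − 2 log L, where k is the number of estimated parameters and log L is the maximized log-likelihood, and then selects the smallest value among the AIC entries of Table 7. The helper name `aic` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def aic(loglik, k):
    """Akaike information criterion for a fitted model with k parameters."""
    return 2.0 * k - 2.0 * loglik


# AIC values copied from Table 7, keyed by (copula, marginal distribution).
aic_table = {
    ("Gaussian", "Negative Binomial"): 2719.38,
    ("Gaussian", "Poisson"): 3214.91,
    ("Frank", "Negative Binomial"): 2747.75,
    ("Frank", "Poisson"): 3119.16,
    ("Clayton", "Negative Binomial"): 2728.31,
    ("Clayton", "Poisson"): 3168.33,
}

# The preferred specification is the one with the smallest AIC.
best = min(aic_table, key=aic_table.get)

if __name__ == "__main__":
    print(best, aic_table[best])
```

Within each copula family, the Negative Binomial marginals give a lower AIC than the Poisson marginals, and the overall minimum is attained by the Gaussian copula with Negative Binomial marginals.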

Table 8. Parameter estimates for the bivariate Poisson and Negative Binomial models.

	Poisson		Negative Binomial
Parameter	Estimate	SE	Estimate	SE
$λ_{1}$	20.929	0.003	17.571	0.052
$κ_{1}$			4.406	0.538
$λ_{2}$	7.397	0.001	6.142	0.047
$κ_{2}$			5.591	1.011
$δ_{1}$	0.341	0.025	0.377	0.129
$δ_{2}$	0.157	0.001	0.207	0.071
$ρ$	0.129	0.032	0.061	0.093
AIC	3214.91		2719.38


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
