Article

A Copula-Based Model for Analyzing Bivariate Offense Data

by Dimuthu Fernando * and Wimarsha Jayanetti
Department of Statistics, Grand Valley State University, Allendale, MI 49401, USA
* Author to whom correspondence should be addressed.
Stats 2025, 8(4), 111; https://doi.org/10.3390/stats8040111
Submission received: 24 September 2025 / Revised: 12 November 2025 / Accepted: 16 November 2025 / Published: 19 November 2025
(This article belongs to the Section Time Series Analysis)

Abstract

We developed a class of bivariate integer-valued time series models using copula theory. Each count time series is modeled as a Markov chain, with serial dependence characterized through copula-based transition probabilities for Poisson and Negative Binomial marginals. Cross-sectional dependence is modeled via a bivariate Gaussian copula, allowing for both positive and negative correlations and providing a flexible dependence structure. Model parameters are estimated using likelihood-based inference, where the bivariate Gaussian copula integral is evaluated through standard randomized Monte Carlo methods. The proposed approach is illustrated through an application to offense data from New South Wales, Australia, demonstrating its effectiveness in capturing complex dependence patterns.

1. Introduction

Multivariate count time series frequently arise in modern statistical analysis and often exhibit dependence both within and between series. In many applications, time-series counts are observed as bivariate vectors that display serial dependence within each series, as well as cross-correlation between the two. Building on the framework of the integer-valued autoregressive moving average model (INARMA), Quoreshi [1] proposed the bivariate integer-valued moving average model (BINMA), which accommodates both positive and negative correlations between counts, and extended it to a multivariate setting. Wang et al. [2] proposed a bivariate zero-inflated Poisson model to analyze occupational injuries. Heinen and Rengifo [3] proposed a multivariate autoregressive conditional double Poisson model capable of handling over-dispersion, serial dependence, and cross-correlation. The cross-correlation between the time series was modeled using a multivariate Gaussian copula, and the parameters were estimated through a two-stage procedure. Karlis and Pedeli [4] presented a bivariate integer-valued autoregressive process of order 1 (BINAR(1)) in which the cross-correlation is modeled through a copula to accommodate both positive and negative correlations in Poisson and Negative Binomial counts. They also illustrated the use of Frank and Gaussian copulas to specify the joint distribution of the innovations, with the marginal time series modeled using Poisson and Negative Binomial INAR(1) models. Ravishanker et al. [5] applied state space models for multivariate count time series and used them to analyze a market dataset. One major advantage of copula-based methods is that they can separate the modeling of marginal and temporal dependence.
Bradshaw and Blei [6] constructed a generative model of underreported campus sexual assault data that allows the estimation of the true incidence and reporting rates. They used the Hamiltonian Monte Carlo (HMC) sampling scheme for posterior inference regarding reporting rates and assault incidence in each school and applied this method to analyze campus sexual assault data. Cui and Zhu [7] proposed a new bivariate Poisson INGARCH model, which allows for positive or negative cross-correlation between time series. Ahamad et al. [8] proposed a bivariate count data regression model to capture the dependence between multiple crash outcomes that traditional independent count models fail to represent. In this approach, appropriate marginal distributions (such as Poisson or Negative Binomial) are specified for each crash count type, and a copula function is then used to link the two margins to model their joint dependence. Jeng et al. [9] proposed a copula-based time series model to forecast COVID-19 cases and trends based on wastewater SARS-CoV-2 viral load and clinical variables. The model was developed in two stages. In the first stage, time-series methods were used to examine and characterize the marginal distributions of both the dependent and independent variables. In the second stage, copula-based marginal regression analysis was applied to model and predict the COVID-19 case trends.
Traditional bivariate Negative Binomial regression models typically assume a specific joint distribution for the dependent variables or their associated error terms. A common approach is to impose a bivariate gamma distribution on the error terms when combined in the bivariate framework. However, such assumptions may inadequately capture the underlying dependence structure between the error terms. Moreover, as noted by Xu and Hardin [10], there is no explicit joint distribution for the error terms in this setting. This limitation motivates the need for more flexible approaches. In particular, a copula-based bivariate Negative Binomial model provides a natural extension. In our previous work (Alqawba et al. [11]), we developed a class of copula-based models to analyze bivariate count data, where the marginal distributions were specified as Poisson and zero-inflated Poisson (ZIP). However, when analyzing count time series data, over-dispersion is often present. In this manuscript, we extend that framework by incorporating Negative Binomial marginals, which better accommodate over-dispersion. The copula-based approach enables us to model each marginal distribution appropriately while flexibly capturing the dependence between the series through the selected copula family. This flexibility allows the model to accommodate both positive and negative dependence, providing a more general and adaptable framework for analyzing bivariate count time series data.
The remainder of the paper is organized as follows. Section 2 provides a concise overview of Poisson and Negative Binomial regression models, along with a summary of copula theory. It then introduces the proposed class of copula-based bivariate models for analyzing two dependent time series, where each series is modeled through a copula-based Markov chain and jointly linked using a bivariate copula family. Section 3 outlines the parameter estimation procedure via Maximum Likelihood Estimation (MLE), presents the results of simulation studies, and applies the proposed methodology to a real dataset. Finally, Section 4 concludes the paper.

2. Materials and Methods

2.1. The Poisson Distribution

The Poisson distribution is a common choice for modeling count data. In our proposed copula-based bivariate model, we use it as a marginal distribution. Let $y_t$ represent a random count observed at time $t$. The probability mass function (pmf) of the Poisson distribution is given by:
$$f(y_t) = \frac{e^{-\lambda}\lambda^{y_t}}{y_t!},$$
where $\lambda > 0$ is the intensity parameter with $E(y_t) = \lambda$ and $V(y_t) = \lambda$.

2.2. The Negative Binomial Distribution

With the introduction of an additional parameter ($\kappa$), the Negative Binomial distribution is able to account for over-dispersion, which the Poisson distribution cannot. Its pmf is given by:
$$f(y_t) = \frac{\Gamma(\kappa + y_t)}{\Gamma(\kappa)\, y_t!} \left(\frac{\kappa}{\kappa + \lambda}\right)^{\kappa} \left(\frac{\lambda}{\kappa + \lambda}\right)^{y_t} \quad \text{for } y_t = 0, 1, 2, \ldots,$$
where $\lambda$ and $\kappa$ are the intensity and dispersion parameters, with $E(y_t) = \lambda$ and $V(y_t) = \lambda + \lambda^2/\kappa$.
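As a quick numerical check of this parameterization, the sketch below maps $(\lambda, \kappa)$ to SciPy's `nbinom(n, p)` form with $n = \kappa$ and $p = \kappa/(\kappa + \lambda)$, then verifies the stated mean and variance. The function name `nb_pmf` and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats

def nb_pmf(y, lam, kappa):
    """Negative Binomial pmf with mean lam and dispersion kappa.

    SciPy's nbinom(n, p) matches the pmf above when
    n = kappa and p = kappa / (kappa + lam).
    """
    return stats.nbinom.pmf(y, kappa, kappa / (kappa + lam))

lam, kappa = 4.0, 2.0      # illustrative values only
y = np.arange(0, 400)      # support truncated far into the tail
p = nb_pmf(y, lam, kappa)

mean = np.sum(y * p)
var = np.sum((y - mean) ** 2 * p)
print(mean, var)           # compare with lam and lam + lam**2 / kappa
```

With these values the variance is three times the mean, whereas a Poisson margin with the same mean would force the two to be equal.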

2.3. Copulas

A copula is a multivariate cumulative distribution function (cdf) with uniform $U(0, 1)$ margins that captures the dependence structure between variables. Following Nelsen [12], an $n$-dimensional copula is a function $C : [0, 1]^n \to [0, 1]$ with the following three properties:
  • $C(1, \ldots, u_t, \ldots, 1) = u_t$, for $t = 1, 2, \ldots, n$ and $u_t \in [0, 1]$.
  • $C(u_1, u_2, \ldots, u_n) = 0$ if at least one $u_t = 0$ for $t = 1, 2, \ldots, n$.
  • For any $u_{t1}, u_{t2} \in [0, 1]$ with $u_{t1} \leq u_{t2}$, for $t = 1, 2, \ldots, n$,
    $$\sum_{j_1=1}^{2} \sum_{j_2=1}^{2} \cdots \sum_{j_n=1}^{2} (-1)^{j_1 + j_2 + \cdots + j_n}\, C(u_{1 j_1}, u_{2 j_2}, \ldots, u_{n j_n}) \geq 0.$$
Let $Y_1, \ldots, Y_n$ be random variables with marginal cdfs $F_1, \ldots, F_n$ and joint cdf $F$. Then
  • there exists an $n$-dimensional copula $C$ such that for all $y_1, \ldots, y_n \in \mathbb{R}$,
    $$F(y_1, y_2, \ldots, y_n) = C(F_1(y_1), F_2(y_2), \ldots, F_n(y_n)).$$
  • If $Y_1, \ldots, Y_n$ are continuous, then the copula $C$ is unique. Otherwise, $C$ is uniquely determined on the $n$-dimensional rectangle $\mathrm{Range}(F_1) \times \mathrm{Range}(F_2) \times \cdots \times \mathrm{Range}(F_n)$.
When all the margins are integer valued, the multivariate probability mass function can be obtained as
$$\begin{aligned} f(y_1, y_2, \ldots, y_n) &= P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) \\ &= \sum_{j_1=1}^{2} \sum_{j_2=1}^{2} \cdots \sum_{j_n=1}^{2} (-1)^{j_1 + j_2 + \cdots + j_n}\, C(u_{1 j_1}, u_{2 j_2}, \ldots, u_{n j_n}), \end{aligned}$$
where $u_{t1} = F_t(y_t^{-})$ and $u_{t2} = F_t(y_t)$. Here $F_t(y_t^{-})$ is the left-hand limit of $F_t$ at $y_t$, which is equal to $F_t(y_t - 1)$. In the bivariate case,
$$\Pr(Y_1 = y_1, Y_2 = y_2) = C(F_1(y_1), F_2(y_2); \theta) - C(F_1(y_1 - 1), F_2(y_2); \theta) - C(F_1(y_1), F_2(y_2 - 1); \theta) + C(F_1(y_1 - 1), F_2(y_2 - 1); \theta).$$
Here, $\theta$ denotes the dependence parameter of the copula function, and a variety of copula families, denoted by $C$, are available for selection. Table 1 lists several commonly used copula families. More details on these families can be found in Joe [13]. Bivariate copulas such as the Gaussian, Frank, Plackett, and t can model both positive and negative dependence, while the Gumbel and Clayton copulas, over the parameter ranges given in Table 1, capture only positive dependence. In this study, we focus mainly on the Gaussian copula, as it can accommodate both positive and negative dependence; however, different copula families may be used depending on the context and nature of the data.
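The bivariate rectangle formula can be illustrated numerically. The following is a minimal sketch, assuming Poisson margins and a Gaussian copula evaluated with SciPy's bivariate normal cdf; the helper names `gaussian_copula` and `joint_pmf` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

def gaussian_copula(u1, u2, rho):
    """Bivariate Gaussian copula C(u1, u2; rho) with explicit edge cases."""
    if u1 <= 0.0 or u2 <= 0.0:      # grounded: C = 0 if any argument is 0
        return 0.0
    if u1 >= 1.0:                   # uniform margins: C(1, u2) = u2
        return float(min(u2, 1.0))
    if u2 >= 1.0:                   # uniform margins: C(u1, 1) = u1
        return float(u1)
    z = stats.norm.ppf([u1, u2])
    cov = [[1.0, rho], [rho, 1.0]]
    return float(stats.multivariate_normal.cdf(z, mean=[0.0, 0.0], cov=cov))

def joint_pmf(y1, y2, lam1, lam2, rho):
    """P(Y1 = y1, Y2 = y2) from the four-term rectangle formula."""
    F1 = lambda k: float(stats.poisson.cdf(k, lam1))   # equals 0 at k = -1
    F2 = lambda k: float(stats.poisson.cdf(k, lam2))
    return (gaussian_copula(F1(y1), F2(y2), rho)
            - gaussian_copula(F1(y1 - 1), F2(y2), rho)
            - gaussian_copula(F1(y1), F2(y2 - 1), rho)
            + gaussian_copula(F1(y1 - 1), F2(y2 - 1), rho))

lam1, lam2, rho = 1.0, 1.5, -0.5    # illustrative values only
grid = range(12)
total = sum(joint_pmf(a, b, lam1, lam2, rho) for a in grid for b in grid)
print(total)                        # should be close to 1
```

Summing the joint pmf over one coordinate recovers the corresponding Poisson margin, which is a convenient sanity check when swapping in another copula family from Table 1.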

2.4. Copula Based Model for Count Time Series Data

This section focuses on the development of a class of bivariate count time series models. The joint distribution of successive observations is formulated using copula functions, allowing for flexible modeling of both serial dependence and cross-correlation structures. Specifically, cross-dependence between the two series is captured via an additional copula function. The models are constructed under a first-order stationary Markov framework with marginal distributions specified as either Poisson or Negative Binomial. For first-order Markov models, bivariate copula functions such as the bivariate Gaussian copula are chosen to construct the joint distribution between two consecutive observations.

2.5. Copula Based Bivariate Model

A bivariate integer-valued time series model is constructed via copula theory. Suppose that we observe a series of two-dimensional vectors, $\{\mathbf{Y}_t\}_{t=1}^{n}$, where $\mathbf{Y}_t = (Y_{1t}, Y_{2t})$ for $t = 1, 2, \ldots, n$. Assume that each series $\{Y_{1t}\}_{t=1}^{n}$ and $\{Y_{2t}\}_{t=1}^{n}$ follows a first-order Markov process based on a copula (see Alqawba and Diawara [14] for an example). Then, the mean vector $\boldsymbol{\mu}_t$ and the covariance matrix, say $\Gamma(t, t-1)$, are defined as follows.
$$\boldsymbol{\mu}_t = E(\mathbf{Y}_t) = \begin{pmatrix} E(Y_{1t}) \\ E(Y_{2t}) \end{pmatrix},$$
and
$$\Gamma(t, t-1) = \mathrm{COV}(\mathbf{Y}_t, \mathbf{Y}_{t-1}) = \begin{pmatrix} \mathrm{COV}(Y_{1t}, Y_{1,t-1}) & \mathrm{COV}(Y_{1t}, Y_{2,t-1}) \\ \mathrm{COV}(Y_{2t}, Y_{1,t-1}) & \mathrm{COV}(Y_{2t}, Y_{2,t-1}) \end{pmatrix}.$$
Since the conditional dependence is defined through a first-order Markov process, the covariance matrix $\Gamma(t, t-1)$ is defined for $t = 2, \ldots, n$. The diagonal elements of the covariance matrix correspond to the autocovariance within each time series, while the off-diagonal elements capture the cross-covariance between the two series. Given the presence of serial dependence and cross-correlation, the joint probability distribution of $Y_{1t}$ and $Y_{2t}$ conditional on $Y_{1,t-1}$ and $Y_{2,t-1}$, for $t = 1, \ldots, n$, is expressed as:
$$f(y_{1t}, y_{2t} \mid y_{1,t-1}, y_{2,t-1}) = \int_{V^{-1}(F_{1,t}^{-})}^{V^{-1}(F_{1,t}^{+})} \int_{V^{-1}(F_{2,t}^{-})}^{V^{-1}(F_{2,t}^{+})} V_2(z_1, z_2, R)\, dz_2\, dz_1, \tag{2}$$
where $V^{-1}$ denotes the inverse cdf of the standard normal distribution and $V_2(\cdot, R)$ is the probability density function of the bivariate normal distribution. The matrix $R$ is the correlation matrix of the joint distribution, which captures the cross-sectional dependence, and is defined as:
$$R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$$
where $\rho$ is the dependence parameter of the Gaussian copula, which describes the cross-sectional dependence between the two count time series. Also, $F_{i,t}^{+} = F(y_{it} \mid y_{i,t-1})$ and $F_{i,t}^{-} = F(y_{it} - 1 \mid y_{i,t-1})$, for $i = 1, 2$, where:
$$F(y_{it} \mid y_{i,t-1}) = \frac{F_{12}(y_{it}, y_{i,t-1}) - F_{12}(y_{it}, y_{i,t-1} - 1)}{f_{t-1}(y_{i,t-1}; \theta)},$$
is the conditional cdf of $Y_{it}$ given $Y_{i,t-1}$, for $i = 1, 2$, and
$$F_{12}(y_{it}, y_{i,t-1}) = C(F_t(y_{it}), F_{t-1}(y_{i,t-1}); \delta).$$
Here, $C(\cdot; \delta)$ denotes a bivariate copula function with dependence parameter $\delta$, which characterizes the serial dependence within a single time series. The vector of marginal parameters, denoted by $\theta$, reduces to a scalar in the Poisson case, i.e., $\theta = \lambda$. The proposed model is applicable to the analysis of bivariate count time series data with marginal distributions that may follow any discrete distribution, offering flexibility beyond traditional parametric assumptions.
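To make the serial-dependence construction concrete, the sketch below evaluates the conditional cdf $F(y_{it} \mid y_{i,t-1})$ for a single series, assuming a Poisson margin and a Gaussian copula for $C(\cdot; \delta)$. The function names, the truncation point `y_max`, and the parameter values are hypothetical; the conditional mean is included because it is the kind of one-step prediction such a transition law yields.

```python
import numpy as np
from scipy import stats

def gaussian_copula(u1, u2, delta):
    """Bivariate Gaussian copula with grounded and uniform-margin edges."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    if u1 >= 1.0:
        return float(min(u2, 1.0))
    if u2 >= 1.0:
        return float(u1)
    z = stats.norm.ppf([u1, u2])
    cov = [[1.0, delta], [delta, 1.0]]
    return float(stats.multivariate_normal.cdf(z, mean=[0.0, 0.0], cov=cov))

def conditional_cdf(y, y_prev, lam, delta):
    """F(y | y_prev): difference of joint cdfs divided by the marginal pmf."""
    F = lambda k: float(stats.poisson.cdf(k, lam))
    num = (gaussian_copula(F(y), F(y_prev), delta)
           - gaussian_copula(F(y), F(y_prev - 1), delta))
    return num / float(stats.poisson.pmf(y_prev, lam))

def transition_pmf(y, y_prev, lam, delta):
    """P(Y_t = y | Y_{t-1} = y_prev) as a difference of conditional cdfs."""
    return (conditional_cdf(y, y_prev, lam, delta)
            - conditional_cdf(y - 1, y_prev, lam, delta))

def conditional_mean(y_prev, lam, delta, y_max=40):
    """E(Y_t | Y_{t-1} = y_prev), truncating the support at y_max."""
    return sum(y * transition_pmf(y, y_prev, lam, delta)
               for y in range(y_max + 1))

lam, delta = 2.0, 0.5               # illustrative values only
print(conditional_mean(0, lam, delta), conditional_mean(5, lam, delta))
```

With $\delta = 0$ the transition pmf collapses to the Poisson margin, and with $\delta > 0$ a larger previous count shifts the conditional mean upward.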

2.6. Inference

Parameter estimation is performed by maximizing the likelihood function, with the log-likelihood constructed using copula theory. Since this function has no closed-form expression, its maximization cannot be performed using standard methods [15]. The maximization technique used is presented next.
Using the conditional density function in Equation (2) for $t = 1$, the joint distribution of $Y_{11}$ and $Y_{21}$ is given by
$$f(y_{11}, y_{21}) = \int_{V^{-1}(F_{1,1}^{-})}^{V^{-1}(F_{1,1}^{+})} \int_{V^{-1}(F_{2,1}^{-})}^{V^{-1}(F_{2,1}^{+})} V_2(z_1, z_2, R)\, dz_2\, dz_1, \tag{3}$$
and for $t = 2, \ldots, n$, the conditional bivariate distribution of $Y_{1t} = y_{1t}$ and $Y_{2t} = y_{2t}$ given $Y_{1,t-1} = y_{1,t-1}$ and $Y_{2,t-1} = y_{2,t-1}$ is given by
$$f(y_{1t}, y_{2t} \mid y_{1,t-1}, y_{2,t-1}) = \int_{V^{-1}(F_{1,t}^{-})}^{V^{-1}(F_{1,t}^{+})} \int_{V^{-1}(F_{2,t}^{-})}^{V^{-1}(F_{2,t}^{+})} V_2(z_1, z_2, R)\, dz_2\, dz_1. \tag{4}$$
Hence, combining Equations (3) and (4), the likelihood function is given by
$$L(\vartheta; y) = f(y_{11}, y_{21}) \prod_{t=2}^{n} f(y_{1t}, y_{2t} \mid y_{1,t-1}, y_{2,t-1}), \tag{5}$$
where $\vartheta = (\theta, \delta_1, \delta_2, \rho)$. Here, $\theta$ is the vector of marginal parameters, and $\delta_1$ and $\delta_2$ are the serial dependence parameters of the first and second count series, respectively. The bivariate dependence between the two time series is captured by $\rho$. Therefore, taking the log of the function in Equation (5), we can construct the log-likelihood function as follows:
$$\log L(\vartheta; y) = l(\vartheta; y) = \log f(y_{11}, y_{21}) + \sum_{t=2}^{n} \log f(y_{1t}, y_{2t} \mid y_{1,t-1}, y_{2,t-1}). \tag{6}$$
Maximizing the log-likelihood function in Equation (6) yields the ML estimates for the proposed model class. For the likelihood-based estimation of copula models, convergence issues may arise due to the high dimensionality of the parameter space and the nonlinearity of the likelihood surface. In our setting, the log-likelihood involves a bivariate normal integral, given in (2), which has no closed-form solution. To compute this integral, we employ the standard randomized importance sampling method of Genz and Bretz [16], which is effective for dimensions below ten. This procedure is implemented in the mvtnorm package by Hothorn et al. [17], available on CRAN. The package includes the function pmvnorm for computing multivariate normal probabilities. Then, the parameter estimates, i.e., $\hat{\vartheta}$, can be obtained as
$$\hat{\vartheta} = \arg\max_{\vartheta}\, l(\vartheta; y).$$
The maximization routine also produces a numerically calculated Hessian matrix, which provides the observed Fisher information matrix (FIM). The square roots of the diagonal elements of the inverse FIM yield the standard errors of the ML estimates of $\vartheta$. In the next section, we evaluate the effectiveness of the proposed class of models through a comprehensive simulation study.
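The step from a numerical Hessian to standard errors can be sketched on a deliberately simple stand-in problem. The example below does not fit the copula model; it assumes two independent Poisson samples (toy data), maximizes their log-likelihood numerically, and inverts a finite-difference Hessian of the negative log-likelihood. The helper `numerical_hessian` and all values are hypothetical, and Python with SciPy is used here rather than R.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 500
y1 = rng.poisson(4.0, size=n)       # toy data, not the offense series
y2 = rng.poisson(6.0, size=n)

def negloglik(theta):
    """Negative log-likelihood of two independent Poisson samples."""
    lam1, lam2 = theta
    return -(stats.poisson.logpmf(y1, lam1).sum()
             + stats.poisson.logpmf(y2, lam2).sum())

def numerical_hessian(f, x, h=1e-3):
    """Central-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

fit = optimize.minimize(negloglik, x0=[1.0, 1.0], method="L-BFGS-B",
                        bounds=[(1e-6, None), (1e-6, None)])
H = numerical_hessian(negloglik, fit.x)      # observed information at the MLE
se = np.sqrt(np.diag(np.linalg.inv(H)))      # standard errors
print(fit.x, se)
```

For this toy likelihood the standard errors have the closed form $\sqrt{\hat{\lambda}_i / n}$, which makes it easy to confirm that the finite-difference Hessian behaves as expected before applying the same recipe to a likelihood without a closed form.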

3. Results

3.1. Simulation Studies

A comprehensive simulation study was conducted to assess the proposed estimation method and verify the asymptotic properties of the parameter estimates. In our simulation procedure, we first generate a correlated Gaussian time series and then transform it into uniform random variables using the cdf of the standard normal distribution. These uniforms are then mapped through the inverse cdf of the chosen count distribution, producing dependent integer-valued time series that preserve both the serial and cross-dependence structures. We first consider bivariate Poisson count time series data. For each univariate series, a first-order stationary copula-based Markov model was specified, where a copula family defines the joint distribution of consecutive observations. The two series were then coupled at each time point using a bivariate copula function. Here, $\lambda_1$ and $\lambda_2$ denote the means of the two marginal distributions; $\delta_1$ and $\delta_2$ measure serial dependence within each series; and $\rho$ measures the cross-correlation between the series. A Gaussian copula was chosen as the candidate copula family, with true parameters $\lambda_1 = 4$, $\lambda_2 = 6$, $\delta_1 = 0.5$, $\delta_2 = 0.4$, and $\rho = 0.5$. Assuming stationarity, the parameters of the marginal distributions, $\theta$, are held constant over time. Simulations were performed for sample sizes of $n = 50$, 100, 300, and 1000, each replicated 500 times. For the five parameters, standard error (SE), mean square error (MSE), and mean absolute error (MAE) were computed, with results summarized in Table 2. The MSE and MAE are defined as follows.
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (\theta_i - \hat{\theta}_i)^2, \qquad \mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| \theta_i - \hat{\theta}_i \right|,$$
where $\hat{\theta}_i$ is the estimated value of the parameter in the $i$-th replication and $m$ is the number of replications. We considered a second simulation setting using the Gaussian copula as the candidate copula family with true parameters ($\lambda_1 = 3$, $\lambda_2 = 5$, $\delta_1 = 0.6$, $\delta_2 = 0.4$, $\rho = -0.5$). In this simulation scenario, the two time series are assumed to have negative cross-correlation. The corresponding results are presented in Table 3.
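The MSE and MAE definitions above can be computed per parameter as in the following short sketch. The function name `mse_mae` and the four replication estimates are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def mse_mae(estimates, true_value):
    """Return (MSE, MAE) of replication estimates around the true value."""
    err = true_value - np.asarray(estimates, dtype=float)
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

est = [0.45, 0.55, 0.50, 0.40]      # hypothetical estimates of rho = 0.5
mse, mae = mse_mae(est, 0.5)
print(mse, mae)
```

Here the errors are 0.05, −0.05, 0, and 0.10, so the squared errors average to 0.00375 and the absolute errors average to 0.05.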
Table 2 and Table 3 demonstrate that the parameter estimates converge to the true values, with standard errors decreasing as the sample size increases. Further, we observe that both the MSE and MAE for the parameter estimates decrease as the sample size increases. This pattern is expected, since larger sample sizes provide more information about the underlying dependence structure, leading to more accurate and stable parameter estimates. In practical terms, as we increase the sample size, the parameter estimates produced by the MLE approach become more precise, resulting in smaller bias and reduced variability. Figure 1 and Figure 2 present the Q–Q plots for the parameter estimates obtained using Poisson marginals. The Q–Q plots show that the empirical distribution of the parameter estimates aligns closely with the 45° reference line, indicating that the sampling distribution of the estimates is approximately normal. This supports the asymptotic normality of the maximum likelihood estimates and suggests stable inference as sample size increases.
With the introduction of one additional parameter, the Negative Binomial (NB) distribution is able to account for over-dispersion, which the Poisson distribution cannot. We performed simulations using the Gaussian copula for the univariate and joint distributions with NB marginals. Table 4 and Table 5 demonstrate that the parameter estimates converge to the true values, with standard errors decreasing as the sample size increases. These results indicate that our proposed model performs well with Negative Binomial marginals for both negative and positive cross-correlations.
Figure 3 and Figure 4 present the Q–Q plots for the parameter estimates obtained using Negative Binomial marginals. The Q–Q plots show that the empirical distribution of the parameter estimates aligns closely with the 45° reference line, indicating that the sampling distribution of the estimates is approximately normal.
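The generation recipe described at the start of this subsection (correlated Gaussian series, then uniforms, then counts) can be sketched as follows. This is only one possible reading: it assumes a latent bivariate Gaussian AR(1) process with correlated innovations, which is an assumption of this sketch and not necessarily the authors' exact scheme. The function `simulate_counts` and the innovation correlation `r` are hypothetical, and `r` is not identical to the model's $\rho$.

```python
import numpy as np
from scipy import stats

def simulate_counts(n, lam, delta, r, seed=0, burn=200):
    """Simulate two dependent Poisson count series from a latent Gaussian AR(1)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    delta = np.asarray(delta, dtype=float)
    chol = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
    # Rows of eps are bivariate normal innovations with correlation r.
    eps = rng.standard_normal((burn + n, 2)) @ chol.T
    z = np.zeros(2)
    latent = np.empty((burn + n, 2))
    for t in range(burn + n):
        # AR(1) update scaled so each latent margin settles at N(0, 1).
        z = delta * z + np.sqrt(1.0 - delta ** 2) * eps[t]
        latent[t] = z
    u = stats.norm.cdf(latent[burn:])              # Gaussian -> uniform
    return stats.poisson.ppf(u, lam).astype(int)   # uniform -> Poisson counts

y = simulate_counts(300, lam=[4.0, 6.0], delta=[0.5, 0.4], r=0.5, seed=1)
print(y.mean(axis=0), np.corrcoef(y[:, 0], y[:, 1])[0, 1])
```

Replacing `stats.poisson.ppf` with the quantile function of another count distribution changes the margins while leaving the latent dependence untouched, which is the separation of margins and dependence that the copula approach relies on.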

3.2. Real-Data Application

In this application, we fit a copula-based bivariate model to analyze offense data from New South Wales, Australia. The dataset, obtained from the NSW Bureau of Crime Statistics and Research, is categorized by local government area, offense category, and month. The analyzed series consists of 228 paired monthly observations spanning from August 1995 to July 2014. This dataset has also been used in previous research (Yang et al. [18]). We selected sexual offense counts from Northern Beaches and Waverley city for analysis. The sexual offense counts are the sum of two subcategories:
  • Sexual assault, and
  • Sexual touching, sexual act and other sexual offenses.
The empirical means for the two count time series are 17.452 (Northern Beaches) and 6.316 (Waverley), respectively. The monthly counts of sexual offenses for the two areas are shown in Figure 5. Compared to the Waverley count time series, the Northern Beaches count series exhibits higher counts and greater variation over time. Figure 6 presents bar plots of the count distributions and the sample autocorrelation functions (ACFs) for the two count series. The ACFs reveal clear serial dependence in both series. This observation motivates a detailed examination of serial and cross-series dependencies using the proposed copula-based bivariate model.
We do not observe any noticeable trends or seasonality in either count time series, supporting the stationarity assumption. Furthermore, the bar plots indicate that there is no evidence of zero-inflation in either series. We analyze these data using a copula-based bivariate model with Poisson and Negative Binomial marginals. The parameter estimates from our proposed model are compared with those from alternative bivariate count time series models, and model performance is evaluated using the AIC criterion. Yang et al. [18] report the parameter estimates for BINAR(1) and all the other models presented in Table 6.
We compare our model with the following bivariate integer-valued time series models using the AIC criterion.
  • The Poisson BINAR(1) model (Pedeli and Karlis (2011) [19]).
  • BINDSETINAR(2,1) model (Monteiro et al. (2012) [20]).
  • The first order bivariate threshold integer-valued autoregressive process (BTINAR(1)) (Yang et al. [18]).
We have used several candidate copula families to model the joint distribution. Their corresponding AIC values are reported in Table 7 to evaluate comparative model fit and identify the most suitable copula family. By comparing the observed AIC values, the models with Negative Binomial marginals outperformed those with Poisson marginals. Among these, the Gaussian copula with Negative Binomial marginals has the lowest AIC.
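For readers unfamiliar with the criterion, AIC is $2k - 2\,l(\hat{\vartheta}; y)$, where $k$ is the number of estimated parameters, and smaller values indicate a better trade-off between fit and complexity. The sketch below ranks two candidate fits; the log-likelihood values and parameter counts are hypothetical and are not the values behind Table 7.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * maximized log-likelihood."""
    return 2 * n_params - 2 * loglik

# Hypothetical (log-likelihood, parameter count) pairs, for illustration only.
candidates = {"Poisson margins": (-1400.0, 5), "NB margins": (-1360.0, 7)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this made-up comparison the model with more parameters still wins, because its gain in log-likelihood outweighs the penalty of two extra parameters.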
Table 8 presents the parameter estimates for the copula-based model, where a Gaussian copula is used to construct the joint distribution with Poisson and Negative Binomial marginals. In this setting, parameters with subscript “1” correspond to the offense counts recorded in Northern Beaches, and parameters with subscript “2” correspond to the counts in Waverley city.
A comparison of the results shows that our proposed model with Negative Binomial marginals yields a lower AIC value (2719.38), indicating improved model fit. The parameter estimates for λ 1 and λ 2 exhibit smaller standard errors compared to those obtained from all other fitted models, which emphasizes the robustness of our proposed method. Since the model with Negative Binomial marginals can accommodate the over-dispersion present in the data, we can expect better performance compared to other fitted models.
Figure 7 and Figure 8 present the predicted values, which correspond to the conditional expectations of $\mathbf{Y}_t$ given $\mathbf{Y}_{t-1}$ under the bivariate negative binomial model for $t = 2, \ldots, n$. Overall, the proposed model performs well, with some reduced accuracy for larger count values.

4. Discussion

In this paper, we introduced a class of bivariate integer-valued time series models based on copula theory. Serial dependence was captured through copula-based transition probabilities within a Markov chain framework, using both Poisson and Negative Binomial marginals. The cross-sectional dependence was modeled with a bivariate Gaussian copula. The performance of the likelihood-based estimation procedure was evaluated through simulation studies. Importance sampling was used to efficiently evaluate the bivariate normal integral. The models were also applied to a real-world count dataset using both marginal distributions. Model comparisons based on the AIC criterion indicated that the proposed approach with Negative Binomial marginals achieved the best fit. Both the simulation results and the empirical analysis confirmed the effectiveness of the proposed methodology.
As a future extension, we plan to construct a multivariate framework for higher-dimensional data using vine copulas.

Author Contributions

Methodology, D.F. and W.J.; software, D.F. and W.J.; formal analysis, D.F. and W.J.; investigation, D.F.; writing—original draft, D.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the Editor and Reviewers whose comments have significantly improved the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Quoreshi, A.S. Bivariate time series modeling of financial count data. Commun. Stat.-Theory Methods 2006, 35, 1343–1358.
  2. Wang, K.; Lee, A.H.; Yau, K.K.; Carrivick, P.J. A bivariate zero-inflated Poisson regression model to analyze occupational injuries. Accid. Anal. Prev. 2003, 35, 625–629.
  3. Heinen, A.; Rengifo, E. Multivariate autoregressive modeling of time series count data using copulas. J. Empir. Financ. 2007, 14, 564–583.
  4. Karlis, D.; Pedeli, X. Flexible bivariate INAR (1) processes using copulas. Commun. Stat.-Theory Methods 2013, 42, 723–740.
  5. Ravishanker, N.; Venkatesan, R.; Hu, S. Dynamic models for time series of counts with a marketing application. In Handbook of Discrete-Valued Time Series; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016.
  6. Bradshaw, C.; Blei, D.M. A Bayesian model of underreporting for sexual assault on college campuses. Ann. Stat. 2024, 18, 3146–3164.
  7. Cui, Y.; Zhu, F. A new bivariate integer-valued GARCH model allowing for negative cross-correlation. Test 2018, 27, 428–452.
  8. Ahamad, N.; Gayah, V.V.; Donnell, E.T. Copula-based bivariate count data regression models for simultaneous estimation of crash counts based on severity and number of vehicles. J. Acc. Anal. Prev. 2023, 181, 106928.
  9. Jeng, A.H.; Singh, R.; Diawara, N.; Curtis, K.; Gonzalez, R.; Welch, N.; Jackson, C.; Jurgens, D.; Adikari, S. Application of wastewater-based surveillance and copula time-series model for COVID-19 forecast. Sci. Total Environ. 2023, 885, 163655.
  10. Xu, X.; Hardin, J.W. Regression models for bivariate count outcomes. Stata J. 2016, 16, 301–315.
  11. Alqawba, M.; Fernando, D.; Diawara, N. A class of copula-based bivariate poisson time series models with applications. Computation 2021, 9, 108.
  12. Nelsen, R.B. An Introduction to Copulas; Springer: Cham, Switzerland, 2007.
  13. Joe, H. Dependence Modeling with Copulas; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.
  14. Alqawba, M.; Diawara, N. Copula-based Markov zero-inflated count time series models with application. J. Appl. Stat. 2021, 48, 786–803.
  15. Panagiotelis, A.; Czado, C.; Joe, H. Pair copula constructions for multivariate discrete data. J. Am. Stat. Assoc. 2012, 107, 1063–1072.
  16. Genz, A.; Bretz, F. Computation of Multivariate Normal and t Probabilities; Springer Science & Business Media: Cham, Switzerland, 2009; Volume 195.
  17. Hothorn, T.; Bretz, F.; Genz, A. On multivariate t and Gauss probabilities in R. Sigma 2001, 1000, 3.
  18. Yang, K.; Zhao, Y.; Li, H.; Wang, D. On bivariate threshold Poisson integer-valued autoregressive processes. J. Metrika. 2023, 86, 931–963.
  19. Pedeli, X.; Karlis, D. A bivariate INAR (1) process with application. J. Stat. Model. 2011, 11, 325–349.
  20. Monteiro, M.; Scotto, M.G.; Pereira, I. Integer-valued self-exciting threshold autoregressive processes. Commun. Stat.-Theory 2012, 41, 2717–2737.
Figure 1. Q–Q Plots of ML estimates for n = 1000 under positive cross-correlation with Poisson marginals.
Figure 2. Q–Q Plots of ML estimates for n = 1000 under negative cross-correlation with Poisson marginals.
Figure 3. Q–Q plots of the ML estimates for n = 1000 with positive cross-correlation with Negative Binomial marginals.
Figure 4. Q–Q plots of the ML estimates for n = 1000 with negative cross-correlation with Negative Binomial marginals.
Figure 5. Sexual offense counts for Waverley and Northern Beaches.
Figure 6. Bar plot and ACF for counts of offenses for Waverley (top) and Northern Beaches (bottom).
Figure 7. Predicted values of the offense counts for the Waverley City based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.
Figure 8. Predicted values of the offense counts for the Northern Beaches based on the bivariate Negative Binomial model. Dots represent the observed counts, and the lines represent the predicted values.
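Table 1. Bivariate copula functions.
| Copula | Copula Function |
| --- | --- |
| Gaussian | $C(u_1, u_2; \delta) = \Phi_{\delta}\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\right), \quad \delta \in [-1, 1]$ |
| Frank | $C(u_1, u_2; \delta) = -\frac{1}{\delta} \log\left(1 + \frac{(e^{-\delta u_1} - 1)(e^{-\delta u_2} - 1)}{e^{-\delta} - 1}\right), \quad \delta \in \mathbb{R} \setminus \{0\}$ |
| Gumbel | $C(u_1, u_2; \delta) = \exp\left(-\left[(-\log u_1)^{\delta} + (-\log u_2)^{\delta}\right]^{1/\delta}\right), \quad \delta \geq 1$ |
| Clayton | $C(u_1, u_2; \delta) = \left(u_1^{-\delta} + u_2^{-\delta} - 1\right)^{-1/\delta}, \quad \delta > 0$ |
| Plackett | $C(u_1, u_2; \delta) = \frac{\left[1 + (\delta - 1)(u_1 + u_2)\right] - \sqrt{\left[1 + (\delta - 1)(u_1 + u_2)\right]^2 - 4 u_1 u_2 \delta (\delta - 1)}}{2(\delta - 1)}, \quad \delta \geq 0$ |
| Bivariate t | $C(u_1, u_2; \delta) = \tau_{\delta}\left(\tau^{-1}(u_1), \tau^{-1}(u_2)\right), \quad \delta \in [-1, 1]$ |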
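The closed-form copulas in Table 1 can be evaluated directly. The following is a minimal sketch, not the authors' implementation; the function names are hypothetical and only the Python standard library is used. The Gaussian and bivariate t copulas are omitted because they require a bivariate normal or t distribution function, which has no closed form. The Plackett function treats δ = 1 as the independence limit, which is an assumption added here so that the formula stays defined.

```python
import math


def frank(u1, u2, delta):
    """Frank copula, delta != 0."""
    num = math.expm1(-delta * u1) * math.expm1(-delta * u2)
    return -math.log1p(num / math.expm1(-delta)) / delta


def gumbel(u1, u2, delta):
    """Gumbel copula, delta >= 1, for u1, u2 in (0, 1]."""
    s = (-math.log(u1)) ** delta + (-math.log(u2)) ** delta
    return math.exp(-s ** (1.0 / delta))


def clayton(u1, u2, delta):
    """Clayton copula, delta > 0, for u1, u2 in (0, 1]."""
    return (u1 ** -delta + u2 ** -delta - 1.0) ** (-1.0 / delta)


def plackett(u1, u2, delta):
    """Plackett copula, delta > 0; delta = 1 handled as independence (assumption)."""
    if abs(delta - 1.0) < 1e-12:
        return u1 * u2
    a = 1.0 + (delta - 1.0) * (u1 + u2)
    disc = a * a - 4.0 * u1 * u2 * delta * (delta - 1.0)
    return (a - math.sqrt(disc)) / (2.0 * (delta - 1.0))


if __name__ == "__main__":
    for name, cop in [("Frank", frank), ("Gumbel", gumbel),
                      ("Clayton", clayton), ("Plackett", plackett)]:
        print(name, round(cop(0.3, 0.7, 2.0), 4))
```

A quick sanity check for any copula is the boundary condition C(u, 1) = u and the Fréchet bounds max(u1 + u2 − 1, 0) ≤ C(u1, u2) ≤ min(u1, u2).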
Table 2. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals.
| Sample Size | Parameter (true value) | Estimate | SE | MSE | MAE |
| --- | --- | --- | --- | --- | --- |
| 50 | $\lambda_1$ (4) | 4.099 | 0.572 | 0.321 | 0.478 |
| | $\lambda_2$ (6) | 6.096 | 0.594 | 0.247 | 0.411 |
| | $\delta_1$ (0.5) | 0.418 | 0.103 | 0.115 | 0.119 |
| | $\delta_2$ (0.4) | 0.333 | 0.117 | 0.017 | 0.112 |
| | $\rho_1$ (0.5) | 0.476 | 0.101 | 0.013 | 0.092 |
| 100 | $\lambda_1$ (4) | 4.107 | 0.395 | 0.166 | 0.323 |
| | $\lambda_2$ (6) | 6.100 | 0.434 | 0.198 | 0.348 |
| | $\delta_1$ (0.5) | 0.429 | 0.067 | 0.009 | 0.079 |
| | $\delta_2$ (0.4) | 0.350 | 0.074 | 0.008 | 0.072 |
| | $\rho_1$ (0.5) | 0.458 | 0.071 | 0.007 | 0.066 |
| 300 | $\lambda_1$ (4) | 4.099 | 0.213 | 0.055 | 0.188 |
| | $\lambda_2$ (6) | 6.079 | 0.231 | 0.059 | 0.194 |
| | $\delta_1$ (0.5) | 0.436 | 0.039 | 0.006 | 0.065 |
| | $\delta_2$ (0.4) | 0.352 | 0.040 | 0.004 | 0.052 |
| | $\rho_1$ (0.5) | 0.455 | 0.039 | 0.004 | 0.051 |
| 1000 | $\lambda_1$ (4) | 4.102 | 0.122 | 0.024 | 0.129 |
| | $\lambda_2$ (6) | 6.081 | 0.127 | 0.019 | 0.118 |
| | $\delta_1$ (0.5) | 0.437 | 0.019 | 0.004 | 0.065 |
| | $\delta_2$ (0.4) | 0.357 | 0.021 | 0.002 | 0.044 |
| | $\rho_1$ (0.5) | 0.455 | 0.022 | 0.002 | 0.047 |
Table 3. Parameter estimates using Gaussian copula for univariate and joint distributions with Poisson marginals under negative cross-correlation.
| Sample Size | Parameter (true value) | Estimate | SE | MSE | MAE |
| --- | --- | --- | --- | --- | --- |
| 50 | $\lambda_1$ (3) | 3.139 | 0.613 | 0.394 | 0.494 |
| | $\lambda_2$ (5) | 4.996 | 0.508 | 0.258 | 0.409 |
| | $\delta_1$ (0.6) | 0.488 | 0.092 | 0.021 | 0.119 |
| | $\delta_2$ (0.4) | 0.333 | 0.122 | 0.019 | 0.109 |
| | $\rho$ (−0.5) | −0.449 | 0.109 | 0.015 | 0.096 |
| 100 | $\lambda_1$ (3) | 3.144 | 0.423 | 0.199 | 0.356 |
| | $\lambda_2$ (5) | 5.003 | 0.379 | 0.143 | 0.306 |
| | $\delta_1$ (0.6) | 0.501 | 0.063 | 0.014 | 0.100 |
| | $\delta_2$ (0.4) | 0.340 | 0.081 | 0.010 | 0.081 |
| | $\rho$ (−0.5) | −0.453 | 0.075 | 0.008 | 0.069 |
| 300 | $\lambda_1$ (3) | 3.141 | 0.229 | 0.072 | 0.217 |
| | $\lambda_2$ (5) | 4.983 | 0.201 | 0.041 | 0.165 |
| | $\delta_1$ (0.6) | 0.509 | 0.035 | 0.009 | 0.091 |
| | $\delta_2$ (0.4) | 0.345 | 0.040 | 0.004 | 0.058 |
| | $\rho$ (−0.5) | −0.446 | 0.038 | 0.004 | 0.057 |
| 1000 | $\lambda_1$ (3) | 3.139 | 0.126 | 0.039 | 0.159 |
| | $\lambda_2$ (5) | 4.974 | 0.104 | 0.011 | 0.092 |
| | $\delta_1$ (0.6) | 0.512 | 0.019 | 0.008 | 0.087 |
| | $\delta_2$ (0.4) | 0.351 | 0.022 | 0.003 | 0.049 |
| | $\rho$ (−0.5) | −0.443 | 0.021 | 0.003 | 0.048 |
Table 4. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals.
| Sample Size | Parameter (true value) | Estimate | SE | MSE | MAE |
| --- | --- | --- | --- | --- | --- |
| 50 | $\lambda_1$ (4) | 4.050 | 0.724 | 0.525 | 0.578 |
| | $\kappa_1$ (1.5) | 1.554 | 0.603 | 0.367 | 0.428 |
| | $\lambda_2$ (6) | 6.057 | 0.852 | 0.728 | 0.669 |
| | $\kappa_2$ (2.5) | 2.469 | 0.759 | 0.576 | 0.591 |
| | $\delta_1$ (0.3) | 0.265 | 0.127 | 0.017 | 0.105 |
| | $\delta_2$ (0.3) | 0.268 | 0.121 | 0.016 | 0.097 |
| | $\rho$ (0.5) | 0.505 | 0.111 | 0.012 | 0.085 |
| 100 | $\lambda_1$ (4) | 4.098 | 0.506 | 0.265 | 0.405 |
| | $\kappa_1$ (1.5) | 1.421 | 0.361 | 0.131 | 0.276 |
| | $\lambda_2$ (6) | 6.108 | 0.612 | 0.385 | 0.480 |
| | $\kappa_2$ (2.5) | 2.315 | 0.557 | 0.331 | 0.426 |
| | $\delta_1$ (0.3) | 0.278 | 0.087 | 0.008 | 0.071 |
| | $\delta_2$ (0.3) | 0.280 | 0.087 | 0.008 | 0.071 |
| | $\rho$ (0.5) | 0.499 | 0.079 | 0.006 | 0.063 |
| 300 | $\lambda_1$ (4) | 4.085 | 0.291 | 0.092 | 0.241 |
| | $\kappa_1$ (1.5) | 1.451 | 0.173 | 0.044 | 0.175 |
| | $\lambda_2$ (6) | 6.118 | 0.375 | 0.155 | 0.311 |
| | $\kappa_2$ (2.5) | 2.342 | 0.292 | 0.133 | 0.308 |
| | $\delta_1$ (0.3) | 0.292 | 0.051 | 0.003 | 0.041 |
| | $\delta_2$ (0.3) | 0.286 | 0.049 | 0.003 | 0.041 |
| | $\rho$ (0.5) | 0.496 | 0.045 | 0.002 | 0.036 |
| 1000 | $\lambda_1$ (4) | 4.101 | 0.175 | 0.041 | 0.162 |
| | $\kappa_1$ (1.5) | 1.362 | 0.093 | 0.027 | 0.145 |
| | $\lambda_2$ (6) | 6.088 | 0.192 | 0.044 | 0.164 |
| | $\kappa_2$ (2.5) | 2.253 | 0.148 | 0.083 | 0.252 |
| | $\delta_1$ (0.3) | 0.294 | 0.027 | 0.001 | 0.022 |
| | $\delta_2$ (0.3) | 0.292 | 0.029 | 0.001 | 0.024 |
| | $\rho$ (0.5) | 0.497 | 0.024 | 0.001 | 0.019 |
Table 5. Parameter estimates using Gaussian copula for univariate and joint distributions with Negative Binomial marginals under negative cross-correlation.
| Sample Size | Parameter (true value) | Estimate | SE | MSE | MAE |
| --- | --- | --- | --- | --- | --- |
| 50 | $\lambda_1$ (4) | 4.052 | 0.731 | 0.535 | 0.589 |
| | $\kappa_1$ (1.5) | 1.593 | 0.564 | 0.326 | 0.417 |
| | $\lambda_2$ (6) | 6.109 | 0.915 | 0.848 | 0.716 |
| | $\kappa_2$ (2.5) | 2.586 | 0.712 | 0.514 | 0.535 |
| | $\delta_1$ (0.3) | 0.263 | 0.126 | 0.017 | 0.105 |
| | $\delta_2$ (0.3) | 0.260 | 0.125 | 0.017 | 0.105 |
| | $\rho$ (−0.5) | −0.490 | 0.112 | 0.012 | 0.087 |
| 100 | $\lambda_1$ (4) | 4.100 | 0.507 | 0.266 | 0.405 |
| | $\kappa_1$ (1.5) | 1.457 | 0.351 | 0.125 | 0.277 |
| | $\lambda_2$ (6) | 6.091 | 0.652 | 0.432 | 0.527 |
| | $\kappa_2$ (2.5) | 2.361 | 0.563 | 0.336 | 0.464 |
| | $\delta_1$ (0.3) | 0.278 | 0.089 | 0.008 | 0.073 |
| | $\delta_2$ (0.3) | 0.281 | 0.088 | 0.008 | 0.071 |
| | $\rho$ (−0.5) | −0.495 | 0.081 | 0.007 | 0.065 |
| 300 | $\lambda_1$ (4) | 4.095 | 0.299 | 0.098 | 0.248 |
| | $\kappa_1$ (1.5) | 1.372 | 0.174 | 0.046 | 0.181 |
| | $\lambda_2$ (6) | 6.128 | 0.361 | 0.146 | 0.301 |
| | $\kappa_2$ (2.5) | 2.259 | 0.288 | 0.140 | 0.316 |
| | $\delta_1$ (0.3) | 0.291 | 0.051 | 0.003 | 0.041 |
| | $\delta_2$ (0.3) | 0.289 | 0.052 | 0.003 | 0.042 |
| | $\rho$ (−0.5) | −0.497 | 0.047 | 0.002 | 0.037 |
| 1000 | $\lambda_1$ (4) | 4.108 | 0.201 | 0.052 | 0.181 |
| | $\kappa_1$ (1.5) | 1.419 | 0.136 | 0.025 | 0.131 |
| | $\lambda_2$ (6) | 6.092 | 0.218 | 0.056 | 0.189 |
| | $\kappa_2$ (2.5) | 2.402 | 0.236 | 0.065 | 0.024 |
| | $\delta_1$ (0.3) | 0.295 | 0.029 | 0.001 | 0.023 |
| | $\delta_2$ (0.3) | 0.289 | 0.032 | 0.001 | 0.027 |
| | $\rho$ (−0.5) | −0.487 | 0.027 | 0.001 | 0.024 |
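Table 6. Parameter estimates of the sexual offense counts under different models.
| Model | Parameter | Estimate | SE | AIC |
| --- | --- | --- | --- | --- |
| BINAR(1) | $\alpha_1$ | 0.276 | 0.018 | 3031.74 |
| | $\alpha_2$ | 0.099 | 0.028 | |
| | $\lambda_1$ | 12.630 | 0.381 | |
| | $\lambda_2$ | 5.676 | 0.199 | |
| | $\phi$ | 0.405 | 0.235 | |
| BINDSETINAR(2,1) | $\alpha_{1,1}$ | 0.000 | 0.041 | 2968.84 |
| | $\alpha_{1,2}$ | 0.285 | 0.023 | |
| | $\lambda_1$ | 15.712 | 0.617 | |
| | $\alpha_{2,1}$ | 0.004 | 0.068 | |
| | $\alpha_{2,2}$ | 0.115 | 0.033 | |
| | $\lambda_2$ | 5.973 | 0.344 | |
| $\mathrm{BTINAR}_{I}(1)$ | $\alpha_{1,1}$ | 0.036 | 0.038 | 2955.53 |
| | $\alpha_{1,2}$ | 0.044 | 0.030 | |
| | $\alpha_{2,1}$ | 0.317 | 0.021 | |
| | $\alpha_{2,2}$ | 0.293 | 0.047 | |
| | $\lambda_1$ | 15.329 | 0.596 | |
| | $\lambda_2$ | 5.761 | 0.207 | |
| | $\phi$ | 0.477 | 0.275 | |
| $\mathrm{BTINAR}_{II}(1)$ | $\alpha_{1,1}$ | 0.200 | 0.020 | 3010.99 |
| | $\alpha_{1,2}$ | 0.019 | 0.098 | |
| | $\alpha_{2,1}$ | 0.337 | 0.018 | |
| | $\alpha_{2,2}$ | 0.094 | 0.040 | |
| | $\lambda_1$ | 12.822 | 0.376 | |
| | $\lambda_2$ | 5.862 | 0.393 | |
| | $\phi$ | 0.389 | 0.245 | |
| $\mathrm{BTINAR}_{III}(1)$ | $\alpha_{1,1}$ | 0.093 | 0.035 | 2960.57 |
| | $\alpha_{1,2}$ | 0.072 | 0.036 | |
| | $\alpha_{2,1}$ | 0.331 | 0.021 | |
| | $\alpha_{2,2}$ | 0.301 | 0.036 | |
| | $\lambda_1$ | 14.544 | 0.565 | |
| | $\lambda_2$ | 5.717 | 0.218 | |
| | $\phi$ | 0.557 | 0.276 | |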
Table 7. AIC values of bivariate models fitted with different copula families.
| Copula | Negative Binomial marginals | Poisson marginals |
| --- | --- | --- |
| Gaussian | 2719.38 | 3214.91 |
| Frank | 2747.75 | 3119.16 |
| Clayton | 2728.31 | 3168.33 |
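As a small illustration of how Table 7 is read, the sketch below picks the copula and marginal combination with the smallest AIC. The values are copied from Table 7; the dictionary layout and variable names are hypothetical and only serve the example.

```python
# AIC values copied from Table 7, keyed by (copula, marginal distribution).
aic = {
    ("Gaussian", "Negative Binomial"): 2719.38,
    ("Frank", "Negative Binomial"): 2747.75,
    ("Clayton", "Negative Binomial"): 2728.31,
    ("Gaussian", "Poisson"): 3214.91,
    ("Frank", "Poisson"): 3119.16,
    ("Clayton", "Poisson"): 3168.33,
}

# Smaller AIC indicates a better trade-off between fit and complexity.
best_overall = min(aic, key=aic.get)

# Best copula within each marginal family.
best_by_marginal = {}
for (copula, marginal), value in aic.items():
    current = best_by_marginal.get(marginal)
    if current is None or value < aic[(current, marginal)]:
        best_by_marginal[marginal] = copula

if __name__ == "__main__":
    print(best_overall, aic[best_overall])
    print(best_by_marginal)
```

Under this criterion the Gaussian copula with Negative Binomial marginals has the lowest AIC overall, while within the Poisson family the Frank copula has the lowest AIC.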
Table 8. Parameter estimates for the bivariate Poisson and Negative Binomial models.
| Parameter | Poisson Estimate | Poisson SE | Negative Binomial Estimate | Negative Binomial SE |
| --- | --- | --- | --- | --- |
| $\lambda_1$ | 20.929 | 0.003 | 17.571 | 0.052 |
| $\kappa_1$ | | | 4.406 | 0.538 |
| $\lambda_2$ | 7.397 | 0.001 | 6.142 | 0.047 |
| $\kappa_2$ | | | 5.591 | 1.011 |
| $\delta_1$ | 0.341 | 0.025 | 0.377 | 0.129 |
| $\delta_2$ | 0.157 | 0.001 | 0.207 | 0.071 |
| $\rho$ | 0.129 | 0.032 | 0.061 | 0.093 |
| AIC | 3214.91 | | 2719.38 | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.