Two important aspects related to the practical implementation of the 2D-GSB process are examined here. First, numerical Monte Carlo simulations of the basic series of the 2D-GSB model are carried out, for which the previously described estimators are calculated, and their efficiency is analyzed. Then, based on real data, the application of the 2D-GSB process is presented in the analysis of the dynamics and empirical distributions of the total number of different forms of criminal offenses in the Republic of Serbia.
5.1. Numerical Simulations of the 2D-GSB Estimates
This section describes the estimation of the parameters of the 2D-GSB model, based on
independent Monte Carlo replications of the basic 2D-GSB series. In the first step, a series of innovations
is generated as independent and identically distributed vectors with a two-dimensional normal distribution
, and thereafter, the indicator series
, defined as in Equation (4), is easily determined. According to this, the basic 2D-GSB series
is constructed, with the mean vector
as well as the bivariate increments
, which are used for estimation of the unknown parameters
The numerical simulations are designed to examine the finite-sample behavior of the proposed estimators and to verify the theoretical results derived in
Section 4. In particular, the Monte Carlo study focuses on the accuracy, stability, and asymptotic properties of the estimators under repeated sampling from the 2D-GSB process. In doing so, for the basic series
the mean vector
is taken, and for the threshold parameter the value
is chosen, as well as for the covariance matrix:
It is worth noting that then, according to the previous considerations (see Remark 3), the parameter
, defined by Equation (4), represents the survivor function of the mixture
distribution, which does not have a closed form. Thus, it is estimated here by an additional Monte Carlo experiment with
independent realizations. Note that using such extensive simulations, the estimated value of
is obtained with high accuracy, so it can be used as a reference value. In this way, as the true values of the parameters, the vector
is obtained.
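The auxiliary Monte Carlo experiment used to obtain the reference value of the exceedance probability can be sketched as follows. This is only an illustrative Python/NumPy sketch and not the authors' R code. It assumes that the regime indicator is activated when the squared Euclidean norm of a zero-mean Gaussian innovation exceeds the threshold. The function name `survivor_mc`, the covariance matrix, and the threshold used in the example are hypothetical.

```python
import numpy as np

def survivor_mc(cov, c, n_draws=1_000_000, seed=0):
    """Monte Carlo estimate of P(||eps||^2 > c) for eps ~ N(0, cov).

    Assumption: the indicator is triggered by the squared Euclidean norm
    of the innovation; the exact rule is given by Equation (4) of the paper.
    """
    rng = np.random.default_rng(seed)
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)
    sq_norm = np.sum(eps**2, axis=1)
    p_hat = float(np.mean(sq_norm > c))
    # Binomial standard error of the estimated probability
    se = float(np.sqrt(p_hat * (1.0 - p_hat) / n_draws))
    return p_hat, se

if __name__ == "__main__":
    # Hypothetical covariance matrix and threshold, for illustration only
    cov = np.array([[1.0, 0.5], [0.5, 2.0]])
    print(survivor_mc(cov, c=1.0))
```

With an identity covariance matrix the squared norm follows a chi-square distribution with two degrees of freedom, so the estimate can be checked against the closed form exp(-c/2). The standard error returned by the function indicates how many draws are needed for the estimate to serve as a reference value.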
The estimates of the vector
are calculated using estimators
and
, defined by Equations (45) and (46), respectively. To this end, realizations of the series
of length
are observed, and descriptive statistics (Min, Mean, Max), along with the appropriate estimation errors, i.e., bias, standard deviation (StDev) and root mean-squared error (RMSE), of the estimates thus obtained are shown in
Table 1. As mentioned above, due to the non-stationarity of the series
, estimates of the mean vector
have an unbounded asymptotic variance, so there is a large range of their observed values. Nevertheless, it is obvious that the estimator
is more efficient than
, because its error statistics are significantly smaller. Furthermore, in order to investigate the asymptotic properties of the estimates thus obtained, they were also tested in relation to the AN property, and the results of these tests are also presented in
Table 1. For this purpose, the following three statistical tests of normality were used:
- Shapiro–Wilk normality test (SW);
- Anderson–Darling normality test (AD);
- Jarque–Bera normality test (JB).
Test statistics, as well as their corresponding p-values (listed in parentheses above), were calculated using procedures from the R-4.5.2 package “nortest” [
33]. It is evident that both estimators
and
have the AN property, even though they are obtained from the realization of a non-stationary series
. It should be noted that this is closely related to Theorems 4 and 5, which, among other things, describe the AN properties of scaled processes based on the observed GSB series
.
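For clarity, the error statistics reported for the replicated estimates can be computed as in the following minimal Python sketch. The function name `error_stats` and the sample values are hypothetical. The standard deviation is taken here with divisor n, which is an assumption, so that the squared RMSE decomposes exactly into squared bias plus variance.

```python
import math

def error_stats(estimates, true_value):
    """Descriptive and error statistics of replicated estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    # Standard deviation with divisor n (assumption, see lead-in)
    stdev = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return {
        "min": min(estimates),
        "mean": mean,
        "max": max(estimates),
        "bias": bias,
        "stdev": stdev,
        "rmse": rmse,
    }

if __name__ == "__main__":
    # Hypothetical replicated estimates of a parameter whose true value is 1.0
    print(error_stats([0.9, 1.1, 1.2, 1.0], true_value=1.0))
```

Under this convention RMSE² = bias² + StDev², which makes it easy to see whether the error of an estimator is dominated by systematic bias or by sampling variability.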
Further, the MoM estimates of the true parameter
are obtained directly from Equations (26)–(28), while the ECF estimates are calculated by minimizing the integral given by Equation (36). To this end, similarly to Milovanović [
34], the well-known Gauss–Hermite cubature is used, with the weight function
and 81 cubature nodes, with the entire procedure implemented using the R-4.5.2 package “statmod” [
35]. Thereafter, taking the previously obtained MoM estimates as initial values, the objective function given by Equation (36) is minimized using the constrained optimization procedure “L-BFGS-B” [
36], also implemented in the statistical programming language “R”. Finally, in order to examine the efficiency, as well as other previously mentioned asymptotic properties of the estimates thus obtained, different series lengths
are considered. Their basic descriptive statistics, along with the test statistics and p-values of the aforementioned normality tests, are also calculated in the statistical software “R” and presented in the following
Table 2,
Table 3 and
Table 4.
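The cubature step of the ECF procedure can be illustrated by the following Python/NumPy sketch, which is not the authors' R implementation. It assumes that the 81 nodes form a 9 × 9 tensor-product Gauss–Hermite grid with the weight function exp(−‖u‖²). As a stand-in for the mixture characteristic function of the 2D-GSB increments, the characteristic function of a single zero-mean bivariate Gaussian is used. All function names are hypothetical.

```python
import numpy as np

def gh_grid_2d(n_per_axis=9):
    """Tensor-product Gauss-Hermite nodes and weights for exp(-|u|^2)."""
    x, w = np.polynomial.hermite.hermgauss(n_per_axis)
    u1, u2 = np.meshgrid(x, x, indexing="ij")
    nodes = np.column_stack([u1.ravel(), u2.ravel()])
    weights = np.outer(w, w).ravel()
    return nodes, weights

def ecf(nodes, data):
    """Empirical characteristic function of the rows of data at the nodes."""
    return np.exp(1j * nodes @ data.T).mean(axis=1)

def gaussian_cf(nodes, cov):
    """Stand-in model CF: zero-mean bivariate Gaussian with covariance cov."""
    quad = np.einsum("mi,ij,mj->m", nodes, cov, nodes)
    return np.exp(-0.5 * quad)

def ecf_objective(data, cov, nodes, weights):
    """Cubature approximation of the weighted squared ECF distance."""
    diff = ecf(nodes, data) - gaussian_cf(nodes, cov)
    return float(np.sum(weights * np.abs(diff) ** 2))
```

In an actual estimation, `gaussian_cf` would be replaced by the model characteristic function from the paper, and `ecf_objective` would be passed to a box-constrained optimizer started from the MoM estimates.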
The results thus reported indicate that both estimation procedures perform satisfactorily even for moderate sample sizes. The empirical means of all estimated parameters are very close to their true values, while the corresponding biases remain small and mainly decrease as the sample size increases. A mild non-monotonic behavior of the finite-sample accuracy can be observed for the variance parameter , which is typical for mixture-type models. Additionally, within the 2D-GSB framework, variance parameters enter both the dispersion structure and the regime-selection probability , which depends implicitly on the matrix . Overall, as expected, the dispersion of the estimates measured through standard deviations and mean squared errors generally decreases as the sample size increases. For larger samples, including the longest series considered (), the estimates exhibit small bias and moderate variability across all parameters, supporting their suitability for empirical applications based on longer time series.
Both estimation approaches thus perform satisfactorily across different sample sizes, but they differ in computational cost and efficiency. The MoM procedure is computationally straightforward and particularly suitable for quick preliminary estimation or large datasets, due to its closed-form structure. In contrast, the ECF approach involves a higher computational cost but exhibits stronger asymptotic efficiency properties. The results reported in
Table 2,
Table 3 and
Table 4 suggest that the ECF estimator tends to achieve slightly lower dispersion and mean squared error in moderate samples, whereas MoM remains stable and practically convenient. This trade-off highlights the complementary roles of the two procedures in applied implementation. Also, note that in practical implementation, the threshold parameter
can be determined either via its theoretical one-to-one relationship with
or through a quantile-based calibration of the innovation norm. This ensures a transparent and data-driven specification of regime activation.
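The quantile-based calibration mentioned above can be sketched as follows. This is a minimal Python/NumPy illustration which assumes that the threshold is applied to the squared Euclidean norm of the innovations (or of a residual proxy for them). The function name `calibrate_threshold` and the target probability are hypothetical.

```python
import numpy as np

def calibrate_threshold(innovations, p_exceed):
    """Threshold chosen as the (1 - p_exceed) empirical quantile of ||eps||^2.

    Assumption: regime activation is defined through the squared norm.
    """
    sq_norm = np.sum(np.asarray(innovations, dtype=float) ** 2, axis=1)
    return float(np.quantile(sq_norm, 1.0 - p_exceed))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal((10_000, 2))  # hypothetical innovations
    c = calibrate_threshold(eps, p_exceed=0.13)
    share = np.mean(np.sum(eps**2, axis=1) > c)
    print(c, share)
```

By construction, the share of observations whose squared norm exceeds the calibrated threshold is approximately equal to the target probability.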
Finally, note that although the model is derived under Gaussian innovation assumptions, the mixture-based structure provides a degree of robustness to moderate deviations from normality, as reflected in the empirical application. As an illustration,
Figure 6 and
Figure 7 display the Q–Q plots of the empirical distributions of the estimated parameters against the corresponding Gaussian quantiles for
. The plots provide graphical support for the asymptotic normality of both estimation procedures and indicate a slightly improved finite-sample behavior of the ECF estimators. In this way, these graphical representations are consistent with the theoretical asymptotic results and with the variance and RMSE comparisons shown in
Table 2,
Table 3 and
Table 4. In general, the Monte Carlo results confirm the above-mentioned theoretical properties of the 2D-GSB estimators and support their applicability in practical multivariate time series analysis.
5.2. Application: A Case Study of Crime Dynamics
Having established the finite-sample properties of the proposed estimators through Monte Carlo simulations, we now consider the empirical application of the 2D-GSB framework. To illustrate the practical performance of the proposed model, we apply it to real-world multivariate time series representing the total number of specific criminal offenses committed on the territory of the Republic of Serbia. The data were obtained from official records of the Ministry of Internal Affairs of the Republic of Serbia, which are monitored daily, covering the period from 1 January 2015 to 31 December 2024, which resulted in a time series length of . It should be noted that each of the observed series is obtained and classified according to the official Criminal Code of the Republic of Serbia (code KD_xxx), where each bivariate series contains data on related criminal activities as its components. In this way, two bivariate series are observed, designated as Series A and Series B, whose components are the following:
A1: Petty theft (code KD_203).
A2: Aggravated theft and robbery (code KD_204).
B1: Counterfeiting money, securities, counterfeiting and misuse of payment cards (codes KD_241-244).
B2: Document falsification and other special cases of document falsification (codes KD_355-357).
The dynamics of both bivariate time series are illustrated in
Figure 8, where the pronounced fluctuations, i.e., sudden “jumps” in the number of committed criminal acts, are clearly visible. At the same time, the intercorrelation between the components of both bivariate series is noticeable even at first glance. Therefore, use of a synchronized threshold mechanism is particularly suitable in this context, as external shocks (e.g., policy changes, economic disturbances, or enforcement actions) may simultaneously affect related crime categories. The common regime indicator thus provides a natural interpretation of coordinated spikes and structural shifts observed in the data. Note that although the observed series represent daily crime counts, the proposed 2D-GSB model is not intended to directly model the count-valued observation space, but rather to capture the underlying common dynamics of pronounced fluctuations and synchronized regime changes.
In this context, the descriptive statistics reported in
Table 5 reveal a pronounced overdispersion of both series (especially Series A), as well as extremely heavy-tailed behavior, reflected in very high kurtosis and skewness in Series B. In addition, the average value of document forgeries (component B2) is approximately 7.3 per day, but the range varies from as few as 0 to as many as 304 such crimes per day. Along with the significant cross-dependence between their components, these features motivate the use of a latent regime-based framework rather than standard count-based models. Also, in contrast to univariate approaches, the two-dimensional GSB framework enables the joint modeling of related crime categories while explicitly accounting for cross-dependence in their extreme dynamics.
Since the original series represent crime counts and exhibit pronounced heteroscedasticity and skewness, a logarithmic transformation (“log-volume”) is applied prior to modeling. This transformation not only stabilizes the variance and reduces asymmetry but also facilitates a closer approximation of the increment process by a Gaussian mixture distribution, as assumed in the 2D-GSB framework. Consequently, the transformed increment series is more consistent with the underlying distributional structure of the model. For these reasons, as basic bivariate series
and
, the realizations of the so-called log-volumes, i.e., logarithmic values of series A and B, are observed as follows:
As stated in [37,38], the main goal of these transformations is to obtain more evenly distributed values of both series, while, since the logarithmic function is increasing, the pronounced fluctuations are preserved. Additionally, note that, unlike the series
, which represents the usual log-transformation, the series
is a so-called shifted log-transformation, as a consequence of the equality
. In this way, from inequalities
and
, it follows that both series of log-volumes are non-negative
.
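The two transformations and their inverses can be illustrated by the following short Python sketch. It assumes that the shifted log-transformation has the form log(1 + count), which is inferred from the fact that the minimum of Series B is zero. The exact form is given by Equation (47). The function names are hypothetical.

```python
import math

def log_volume(counts):
    """Usual log-transformation; requires counts >= 1, so values are >= 0."""
    return [math.log(c) for c in counts]

def shifted_log_volume(counts):
    """Shifted log-transformation log(1 + count); requires counts >= 0."""
    return [math.log1p(c) for c in counts]

def inverse_log_volume(values):
    """Inverse of log_volume."""
    return [math.exp(v) for v in values]

def inverse_shifted_log_volume(values):
    """Inverse of shifted_log_volume."""
    return [math.expm1(v) for v in values]

if __name__ == "__main__":
    # Hypothetical daily counts, including a zero-count day for Series B
    print(log_volume([1, 25, 140]))
    print(shifted_log_volume([0, 7, 304]))
```

The shift ensures that days with zero recorded offenses are mapped to zero instead of an undefined logarithm, so both log-volume series remain non-negative.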
Further, using the log-volumes as a basic bivariate series, the location parameter
for both series is estimated, following the procedure described in
Section 4.3. In more detail, the
-estimates are obtained according to Equations (45) and (46), which correspond, respectively, to the sample and weighted mean values of the bivariate series
. Using Equations (11) and (12), the increment series
and
are then constructed. Based on these series, the remaining parameters collected in the vector
are estimated, including the probability of exceeding the threshold and the elements of the covariance matrix. To this end, the procedures presented in
Section 4.1 and
Section 4.2 are applied, namely the method of moments (MoM) and empirical characteristic functions (ECF) method, thus ensuring consistency with the theoretical framework developed previously.
The resulting estimates reported in
Table 6 demonstrate stability and interpretability, thereby enabling further analysis of different crime categories. In particular, the series-specific
-estimates reflect systematic differences in the average growth rates of the corresponding crime categories. At the same time, the estimated
-parameters suggest substantial variability and cross-dependence in the increment dynamics, justifying the use of a multivariate threshold-based model. Note that the magnitudes of the estimated parameters remain stable and interpretable across estimation methods, supporting the adequacy of the proposed inference framework. The modest differences between the MoM and ECF estimates remain within a comparable range and do not change the overall structural interpretation.
From an interpretative perspective, the estimated parameters provide additional insight into the structural dynamics of the analyzed crime categories. For Series A (petty theft and aggravated theft/robbery), the estimated probability suggests that coordinated shock-activated episodes occur in roughly 13% of observations, indicating recurrent but not dominant structural fluctuations affecting both offense types simultaneously. For Series B (counterfeiting-related offenses and document falsification), the estimated probabilities range between 0.13 and 0.16, implying a comparable frequency of coordinated disturbances. However, the substantially higher estimated threshold (approximately 1.3–1.5, compared to 0.23–0.26 in Series A) indicates that more pronounced innovation magnitudes are required to trigger regime activation in financial and document-related crimes. This suggests that while synchronized shifts occur with similar frequency across both crime groups, the intensity of shocks necessary to activate such shifts differs, reflecting potentially distinct structural sensitivity patterns within the two categories.
To further assess the empirical adequacy of the proposed model, we compare the 2D-GSB specification with a standard first-order vector autoregression (VAR(1)) benchmark estimated on the stationary increment series
and
.
Table 7 reports the corresponding log-likelihood (LogLik), Akaike and Bayesian information criteria (AIC and BIC), together with the joint root mean square error (RMSE) for both models and both series. The joint RMSE is defined as the square root of the arithmetic mean of the component-wise mean squared deviations between empirical and fitted densities, thereby providing a single aggregate measure of distributional fit. As can be seen, the 2D-GSB model achieves higher log-likelihood values and lower information criteria and discrepancy measures than the standard VAR(1) specification. It is also worth noting that the proposed 2D-GSB framework involves only four parameters, compared to seven in the VAR model. These results indicate a more parsimonious yet substantially improved distributional fit of the increment process, particularly in the case of Series A.
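The comparison measures described above can be computed as in the following minimal Python sketch. The information criteria use their standard definitions, and the joint RMSE follows the verbal definition given in the text. It is assumed that the empirical and fitted densities of each component are evaluated on a common grid. The function names and the numerical values in the example are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

def joint_rmse(empirical, fitted):
    """Square root of the mean of component-wise mean squared deviations.

    empirical, fitted: one list of density values per component,
    evaluated on a common grid (assumption).
    """
    mses = [
        sum((e - f) ** 2 for e, f in zip(emp, fit)) / len(emp)
        for emp, fit in zip(empirical, fitted)
    ]
    return math.sqrt(sum(mses) / len(mses))

if __name__ == "__main__":
    # Hypothetical log-likelihood, parameter count, and sample size
    print(aic(-100.0, 4), bic(-100.0, 4, 100))
```

Since both criteria penalize the number of parameters, a model with fewer parameters and a higher log-likelihood is preferred by both AIC and BIC.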
In addition,
Figure 9 presents the empirical marginal distributions of the bivariate increments together with the Gaussian fit implied by the VAR(1) specification and the Gaussian mixture of the increments implied by the 2D-GSB model introduced in
Section 3. The VAR(1) estimation is carried out using the R-4.5.2 package “vars” [
39], while the 2D-GSB parameters are obtained via the MoM procedure for Series A and the ECF method for Series B, as described previously. As illustrated in the figure, the mixture representation underlying the increments of the 2D-GSB model provides a closer alignment with the empirical distributions, particularly in capturing increased dispersion and heavier-tail behavior. In contrast, the single-Gaussian structure of the VAR(1) model tends to underestimate the probability of extreme observations in most cases. Overall, the visual agreement between the empirical histograms and the fitted mixture densities further supports the adequacy of the 2D-GSB framework for modeling synchronized regime dynamics in the observed data.
Further, fitting of the empirical distributions of the underlying Series A and B is carried out. Due to the distinct transformations in Equation (47), the implied distributions of these series differ. While Series A follows a mixture of bivariate log-normal distributions, Series B is characterized by a mixture of shifted bivariate log-normal distributions. In both cases, the Jacobian of the inverse transformation plays a crucial role, ensuring a proper mapping from the latent Gaussian mixture to the observable crime counts. Thus, the fitted distributions of the original crime series are obtained by an explicit change-of-variables procedure based on the transformations defined in Equation (47).
Let
denote either of the latent processes given by Equation (47). From the theoretical results in
Section 3,
follows a discrete mixture of bivariate Gaussian distributions
where
is the PDF of the bivariate Gaussian distribution. Thus, for Series A, the inverse transformation
yields the density
where
Similarly, for Series B, using the inverse map
, the resulting density is given by
where
Thus, the proposed framework induces a mixture of bivariate log-normal distributions for Series A and a mixture of shifted bivariate log-normal distributions for Series B, providing a link between the latent 2D-GSB dynamics and empirical distributions of observed crime counts. Nevertheless, it is worth noting that due to the non-stationarity of the mentioned series, which also depend on time
, it is necessary to apply some numerical procedures to calculate their PDFs. For this purpose, the R-4.5.2 package “distr” [
40] is used, and the results of the applied procedure are shown in
Figure 10.
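The change-of-variables step can be illustrated by the following Python sketch, which evaluates the density of the original counts from a latent Gaussian mixture at a single fixed time point. Several simplifying assumptions are made: the mixture components share a common mean, the time dependence of the latent distribution is ignored, and the shift for Series B equals one, so that the inverse maps are exp(x) and exp(x) − 1. The function names, weights, and covariance matrices are hypothetical and do not reproduce the mixture derived in Section 3.

```python
import math

def bvn_pdf(x1, x2, mean, cov):
    """PDF of a bivariate Gaussian with given mean and 2x2 covariance."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x1 - mean[0], x2 - mean[1]
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mixture_pdf(x1, x2, weights, mean, covs):
    """Discrete mixture of bivariate Gaussians with a common mean."""
    return sum(w * bvn_pdf(x1, x2, mean, c) for w, c in zip(weights, covs))

def count_density(y1, y2, weights, mean, covs, shift=0.0):
    """Density of the counts implied by x = log(y + shift).

    shift=0.0 corresponds to Series A and shift=1.0 to Series B (assumption).
    The factor 1 / ((y1 + shift) * (y2 + shift)) is the Jacobian term.
    """
    z1, z2 = y1 + shift, y2 + shift
    jacobian = 1.0 / (z1 * z2)
    return mixture_pdf(math.log(z1), math.log(z2), weights, mean, covs) * jacobian

if __name__ == "__main__":
    # Hypothetical two-regime mixture: low- and high-variance components
    weights = [0.87, 0.13]
    covs = [((0.2, 0.05), (0.05, 0.3)), ((0.6, 0.15), (0.15, 0.9))]
    print(count_density(20.0, 8.0, weights, (3.0, 2.0), covs))
    print(count_density(0.0, 3.0, weights, (1.0, 1.5), covs, shift=1.0))
```

The Jacobian factor is what produces the right-skewed, log-normal type shape of the fitted densities on the original count scale, and the shift allows the density to be evaluated at zero counts.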
As illustrated in
Figure 10, the empirical distributions of the original crime counts are shown together with the fitted theoretical densities obtained via the inverse-log mixture representations implied by the 2D-GSB model. For Series A (theft-related offenses), the fitted log-normal mixtures capture both the central mass and the pronounced right tails of the distributions, indicating that the proposed model adequately reflects the observed variability and intermittency. Similarly, for Series B (counterfeiting-related offenses), the shifted log-normal mixtures provide a satisfactory approximation of the highly skewed empirical distributions, particularly in the lower-count region and the gradual tail decay. Overall, the agreement between empirical histograms and fitted densities confirms that the mixture structure derived from the latent 2D-GSB dynamics translates effectively to the level of observed criminal activity. In particular, extreme crime counts are naturally explained as realizations generated under the high-variance regime, without the need for additional ad hoc distributional assumptions.