Multivariate INAR(1) Regression Models Based on the Sarmanov Distribution

: A multivariate INAR(1) regression model based on the Sarmanov distribution is proposed for modelling claim counts from an automobile insurance contract with different types of coverage. The correlation between claims from different coverage types is considered jointly with the serial correlation between the observations of the same policyholder observed over time. Several models based on the multivariate Sarmanov distribution are analyzed. The new models offer some advantages since they have all the advantages of the MINAR(1) regression model but allow for a more ﬂexible dependence structure by using the Sarmanov distribution. Driven by a real panel data set, these models are considered and ﬁtted to the data to discuss their goodness of ﬁt and computational efﬁciency.


Introduction
In many areas, such as the actuarial field used in the application section of this paper, little attention has been paid to the possibility of including several dependence assumptions in the regression models to fit the data at hand. Specifically, we focus on two sources of dependence in panel count data: first, time dependence, or the serial dependence between observations of the same individual over time; second, cross dependence, or the dependence between types of observations of the same individual. A natural way to address this issue is through a multivariate time series model for count data. For a broad review see [1]. The family of multivariate integer-valued autoregressive models (MINAR) is a useful model that allows for certain flexibility without being complex.
These models can be applied to the ratemaking problem of pricing an automobile insurance contract with two types of coverage, considering the serial correlation between the observations of the same policyholder over time and the correlation between claims from different coverage types, as shown in [2], where a bivariate integer-valued autoregressive process of order 1, BINAR (1), is fitted to the data using a bivariate Poisson distribution to allow for cross correlation.
In this paper, we expand on the previous paper in two ways. First, we extend from a bivariate to a multivariate setting using a multivariate integer-valued autoregressive process of order 1, MINAR (1). Second, we make use of multivariate discrete distributions defined using the Sarmanov family to address the cross correlation. Such distributions can have interesting advantages with respect to other approaches that are available for the multivariate discrete case. For example, the multivariate Poisson distribution see, e.g., [3] suffers from the need to evaluate multiple sums, which can be slow and error-prone. At the same time, this model is restricted to positive correlation between variables. A different approach based on copulas see, e.g., [4] suffers from the fact that the joint probability mass function (pmf) needs evaluation of multiple terms since it cannot be written in a simple way. This can also create numerical problems. Therefore, our approach using the Sarmanov family provides models that are less computationally intensive but can still have a reasonable range of correlation structure.
In sum, we propose a MINAR(1) regression models based on the Sarmanov family of distributions to fit panel count data. We model the time correlation via a MINAR(1) model while we use multivariate discrete distributions based on the Sarmanov derivation to model the cross correlation. The model allows for the inclusion of covariates. In Section 2, we define the background material needed to fully describe the model and its estimation in Section 3. In the application Section 4, models are applied to a Spanish insurance claim counts database. Finally, some remarks can be found in the concluding Section 5.

Multivariate Integer-Valued Autoregressive Models
In the univariate setting, integer-valued autoregressive models have been suggested as discrete counterparts of the standard Gaussian autoregressive process. The integervalued autoregressive model of order 1 (INAR(1)) [5,6] is defined as follows.
where Y 0 represents an initial value of the process and α ∈ [0, 1). The sequence { t } is usually referred to as the innovation process and consists of uncorrelated non-negative integer-valued random variables with mean µ and finite variance σ 2 . At each time point t, t is independent of Y t and of α • Y t−1 .
The above definition tries to mimic the classical AR(1) models and is based on the notion of binomial thinning. The binomial thinning operator "•" is defined in [7]: where {Z j } are independent identically distributed Bernoulli random variables with P(Z j = 1) = 1 − P(Z j = 0) = α. The binomial thinning introduces serial dependence through conditioning on Y t−1 while preserves the integer nature of the process. Parametric models can be constructed by an appropriate choice of distribution of the innovations. The autocorrelation at lag h is given by ρ(h) = α h , for any non-negative integer h. The implication is that similarly to a Gaussian autoregressive process, ρ(h) decays exponentially with lag h and is strictly positive.
Extending the above approach to a multivariate setting, it is assumed that A is a m × m matrix with entries α ij with 0 ≤ α ij ≤ 1 for i, j = 1, . . . , m, and Y is a random vector with values in N m . A • Y is an m-dimensional random vector with i-th component where the discrete series in all α ij • Y j , i, j = 1, . . . , m are assumed to be independent. Useful properties of this matricial operator can be found in [8,9].
Based on (3), we can define a multivariate integer-valued autoregressive process of order 1 (MINAR(1)) as in [10]: where {R t } t∈Z is a sequence of non-negative integer-valued random vectorswith mean µ R and variance Σ R , independent of A • Y t−1 . Then, the ith element of the MINAR(1) process is where α ij • Y j,t−1 are assumed to be mutually independent binomial thinning operations as defined in (2). The non-negative and integer-valued random process {Y t } t∈Z is the only strictly stationary solution of (4) if the largest eigenvalue of A is less than 1 and E||R t || < ∞. Basic thinning operation properties can be used to show that the mean vector µ and covariance matrix γ(h) at lag h of the process Y t are given by see [11]: Please note that γ(0) is the stationary variable of the variance matrix of the process. Please note that an important ingredient of the model is the choice of distribution for the innovations, i.e., the distribution of R t . A thorough discussion of models and distributions for multivariate counts can be found in [1]. Here we will use a different approach: a multivariate discrete distribution defined using the Sarmanov family of distribution. A detailed discussion about this is provided in Section 2.2. Applications of the MINAR model are described in the recent paper by [12].
The method of conditional maximum likelihood can be considered for the estimation of the MINAR(1) process [10]. Let θ = (vec(A) , µ R , vec(Σ R ) ) be the unknown parameter vector. The maximum likelihood estimator (MLE) of θ is defined asθ = argmax θ (θ) where is the maximum log-likelihood function. The conditional probability functions involved in (6) are convolutions of m sums of binomials and a distribution g(k 1 , . . . , k m ) = P(R 1t = k 1 , . . . , R mt = k m ), (8) corresponding to the joint distribution of the innovations {R t }. Hence, f (y t |y t−1 , θ) can be expressed as the multiple sum where m i = min(y it , y i,t−1 ), i = 1, . . . , m. Numerical techniques can be used for the maximization of (6). However, the numerical difficulty of the maximum likelihood approach, increase intensely with a dimensional increase [10] and the assumption of a cross-correlated innovation process. To avoid such complications, Ref. [11] consider a simplified MINAR(1) model where a unique source of dependence between the univariate series is assumed. In particular, they assume {R t } follow jointly a discrete multivariate distribution while A is a m × m diagonal matrix with independent elements α i = [A] ii , i = 1, . . . , m. The assumption for A substantially reduces the correlation structure. Now, each univariate series {Y it } at time t is a function of its own predecessors at time t − 1, but not of the predecessors of the rest of the series, i.e., Ref. [11] suggested a pairwise likelihood approach for the estimation of this reduced model which transform the multivariate estimation problem to a set of bivariate problems. Finally, in the above model, additional covariate information can be applied by assuming some functional relationships for the mean of the innovation terms. The model is no longer stationary.

Sarmanov Family
The Sarmanov family was introduced in [13]. Ref. [14] studied some general methods for the construction of families considering different types of marginal distributions. Here we present the case for discrete distributions. The Sarmanov family has the well-known Farlie-Gumbel-Morgenstern (FGM) copula as a special case. Consequently, it is strongly connected with copula-based models. For the discrete case, this family has the additional advantage that the joint pmf can be written as a single expression, while in copula-based models we can have serious numerical issues for calculating the pmf.
Assume that P 1 (x 1 ) and P 2 (x 2 ) are two pmf with supports defined on A 1 ⊆ R and Then a joint pmf defined by where the factor ωq 1 (x 1 )q 2 (x 2 ) measures the departure of two variables X 1 , X 2 from independence and ω is a real number that satisfies the condition In the case where ω = 0, the variables X 1 , X 2 are independent. Depending on choices for the functions q(x), we can derive different cases. For example, a well-known case is when |q i (x i )| < 1 [14] and satisfying the condition Considering the case of uniform [0, 1] marginals F 1 (x 1 ) = u and F 2 (x 2 ) = v, what is known as the FGM family of copulas arises, defined as: The overall constraint on ω is given by: We make use of another case, referenced in [14,15]. We use is the value of the Laplace transform of the marginal distribution evaluated at s = 1, i.e., where P(·) is the marginal distribution. This has led to several bivariate discrete distributions. The pmf has the form where L i (·) is the Laplace transform for the i-th marginal, i = 1, 2. For example, consider the Poisson marginal with mean λ, then L(k) = exp(−λ(1 − exp(−k))). This bivariate distribution has been studied in [15]. The joint pmf is given by x, y = 0, 1 . . ., λ 1 , λ 2 > 0 and ω is a dependence parameter with a suitable range of values.
. The correlation is given by while in the general case where u i = −L(1) − L(1)µ i , and µ i , σ 2 i are the mean and the variance of the marginal models. For a general function g(·) we have that and thus, we can get the results in all cases. Please note that we assume that the expectations exist.
In a similar manner, we can define the negative binomial case. The bivariate negative binomial is defined as where µ j (j = 1, 2) is the mean parameter for each case while the marginal variances are and hence φ j is the overdispersion parameter for each dimension. In addition, we use as which is the L(1), i.e., the Laplace transform of the negative binomial distribution evaluated at t = 1. This type of Sarmanov model has been applied extensively and many models have been described. The negative binomial case has been examined in [16]. For other distri-butions, the generalized Poisson is described in [17,18]. In addition, Ref. [19] examined a double Poisson case. Other such constructions refer to truncated Poisson [20,21] for a bivariate exponentiated exponential geometric case, Ref. [22] for zero-inflated power series, Ref. [23] for zero-inflated generalized Poisson, and [24] discussed a bivariate Poisson inverse Gaussian model. A bivariate Poisson-Lindley model can be found in [25,26].
In a context similar to what we attempt here, the bivariate Poisson in (15) has been used for bivariate integer-valued time series in [27,28] for a bivariate INGARCH type and in [29] for a BINAR model. In addition, Ref. [30] used a bivariate Poisson-Lindley as an innovation distribution for a BINAR model. Finally, actuarial usage of such a family is used in [31,32].
Other choices for g(x) are given in [24] and include q(x) = x α − E(X α ) or q(x) = P(x) − E(P(x)), but in this case some restrictions apply. In addition, the FGM family is a special case of the Sarmanov family but a restricted interval is needed to work well, and in our case the positive integers are not a restricted interval. For further generalization, see the work of [24].
In this model, dependencies between all parameter combinations are calculated but the construction has a high computational cost. Please note that parameter ω ij measures the dependence between X i and X j while parameter ω ijk measures the dependence between the triplet X i , X j , X k in a similar fashion to a 3-way interaction. The above general form of the model assumes up to d-way dependence between all variables. This is perhaps overkill because of the added complexity and the difficulty in interpreting the parameters, which may be unnecessary for real applications. As expected, the calculation of correlation is complicated.
To improve on this, we use a limited version based on the construction defined above. In our model we consider dependence only between two variables taking all combinations up to 2-way, namely It is easy to prove that this structure has properties of multivariate Sarmanov as proposed by [33]. Under this construction q i (x i ), i = 1, . . . , d are bounded functions.
Using the extension of the model in [15], we can create a trivariate Poisson distribution using again the Laplace transform. Now the joint pmf is given by where x, y, z = 0, 1 . . ., λ 1 , λ 2 , λ 3 > 0 and ω ij 's are the dependence parameters with a suitable range of values. Again, c = 1 − exp(−1). The above can be extended to other distributions such as a negative binomial by considering the appropriate functions. See the bivariate case given in (17). Therefore, we get for the trivariate negative binomial that where c j are defined in (18). The correlation coefficients are given by: where v i , v j , defined similar to (16). Such multivariate models have been described in [19,34]. In addition, an interesting discussion can be found in [32,35] where different models for multivariate counts were proposed.

The Model
Consider that we have n individuals. Each individual is observed at a certain number of time points T i . We observe for individual i at time point t the vector Y it , i = 1, . . . , n, and t = 1, . . . , T i . Without loss of generality, we can assume a different length of observations for each individual, say T i .
We define vector Y it as Y it = (Y 1it , . . . , Y mit ) , i.e., we observe for each time point t and the i-th individual m different variables (j = 1, . . . , m). In our application, m = 3 and each vector at each time point refers to the number of claims for the three types of claim at this time point and individual.
To capture the time correlation, we model the temporal correlation via a MINAR(1) model assuming that where now matrix A is diagonal to create a more parsimonious representation. Please note that we can assume that this matrix may depend on the characteristics of the i-th individual or has some temporal structure, i.e., using A it instead. This is not the case for our application. In other terms, we assume the same time correlation structure for all individuals and all time points. This assumption may be relaxed if necessary. R it is a vector of size m with the innovations that drive the model. We assume that R it follows some trivariate (m−variate, m = 3) discrete distribution as defined by a trivariate Poisson-Sarmanov distribution (21) or a trivariate negative binomial Sarmanov (22), discussed in Section 2. For our application, the marginal distributions are Poisson or negative binomial, respectively, with means λ j , j = 1, 2, 3. We consider the case of a trivariate negative binomial defined in (22) to account for overdispersion in the marginal distributions where an additional overdispersion parameter φ j for j = 1, 2, 3 is needed to account for the overdispersion of each variable. Recall that if the overdispersion parameter tends to ∞, the negative binomial tends to a Poisson distribution. In some sense, a negative binomial is more general than the Poisson, as it includes the Poisson as a special case.
In addition, the mean of each of the variables is associated with some covariate information that can be time related (i.e., changes over time) by the usual log link, i.e., where X jit is a vector of coefficients associated with the j-th variable, for individual i at time point t. In this application, we assume the same covariates for all the variables and hence the models, dropping the first subscript, simplify to From the above definition, the cross correlation of the models comes from the multivariate joint distribution used for the innovations. The time correlation is captured by the diagonal matrix A, each parameter of which is associated with the autocorrelation of each of the variables. Notably, the parameters ω 12 , ω 13 and ω 23 measure the cross correlation between the three variables.
The parameters that need to be estimated are, the diagonal elements of A, say α 1 , α 2 , α 3 that measure the autocorrelations for the three variables, the vector of regression coefficients β j associated with the three variables and the parameters ω 12 , ω 13 and ω 23 . Please note that we again assume that they are the same for all individuals and we do not assume any covariate information for them.

ML Estimation
Based on the above derivation the likelihood for the i-th individual L i that contains the information at all T i time points will be Assuming that we have m = 3, i.e., 3 variables, Θ is the totality of parameters to be estimated, namely Θ = (α 1 , α 2 , α 3 , β 1 , β 2 , β 3 , ω 12 , ω 13 , ω 23 ).
Having obtained the individual likelihood L i , one can see that the individual loglikelihood is simply and hence the log-likelihood to be minimized is i.e., the sum all the individual log-likelihoods. Obviously, this is computationally demanding, and we may think of ways to simplify it.

Data
The data used in this section belongs to an automobile portfolio from an insurance company operating in Spain. The same data have been used previously in [2,[36][37][38][39]. For this application, only cars categorized as being for private use and policies with full coverage were considered, resulting in a database of 14,386 policyholders. For each policy, the total number of policyholder claims, related to three types of coverage, were reported within a yearly period. First, we counted as N 1 type those third-party liability coverage claims. Second, we counted as N 2 type both the comprehensive coverage claims (damage to the policyholder's vehicle caused by any unknown party, including theft, flood or fire) and the collision coverage claims (damage resulting from a collision when the policyholder is at fault). Finally, we counted as N 3 type those claims related to a set of basic guarantees that include emergency roadside assistance or legal and medical assistance.
We use seven years of data for each policyholder, so all T i = 7. For each individual, we have seven observations made at successive time points for the three types of claim considered here (i.e., third-party liability, comprehensive and collision, and all other guarantees) and for a set of covariates, some of which vary across time. Table 1 describes these covariates.  Table 2 presents the fitted models. We used trivariate Poisson and trivariate negative binomial distributions for the innovations. For each family, we fitted the models that assume: (1) full independence, neither time correlation nor cross correlation (ω 12 = ω 13 = ω 23 = α 1 = α 2 = α 3 = 0); (2) only time series correlation (ω 12 = ω 13 = ω 23 = 0); (3) only cross correlation (α 1 = α 2 = α 3 = 0); and (4) full structure, with both time and cross correlation.

Results
All models were fitted using the optim function in R. Initial parameters were selected from the previous model by assuming the value 0 for the extra parameters. Please note that the full independence model actually fits 3 separate GLM (Poisson or negative binomial regression). We also report the number of parameters and the AIC defined as 2L(M) + 2d M where L(M) is the maximized log-likelihood for the model M, and d M is the number of parameters. Standard errors were obtained using the Hessian matrix from the optim function.
A comparison of the model, based on the log-likelihoods and formal LRT statistics, shows that both terms are needed: cross correlation as captured by the trivariate distribution and time correlation as captured by the MINAR(1) model. The improvement in the trivariate model is much larger. The negative binomial model is improving further since it can capture the observed overdispersion and the excessive zero counts. Overall, the results support the time series model with trivariate negative binomial judged by the best value of AIC. Other information criteria select the same model. The results for the selected model are shown in Table 3. The regression coefficients for the three variables can be seen, as well as the autocorrelation parameters and the overdispersion parameters of the three variables. For N 2 , autocorrelation is very small and non-significant. In addition, no variables were found to be significant for N 2 . This was expected since we have observed mostly 0 and 1 values. Autocorrelations for N 1 and N 3 are also small, but significant and necessary to fit the data at hand. For variable N 1 , ZON and AGE are significant, while for N 3 , ZON, LOY, AGE and POW are significant. To examine the performance of the model, we used the following approach. For all observations, we kept out the last time point, i.e., t = 7. Then, we fitted the model up to time point t − 1 and calculated the expected frequencies for all triplets for the time point t = 7 based on the conditional distribution P(N 1t = n 1 , N 2t = n 2 , N 3t = n 3 |N 1,t−1 = x, N 2,t−1 = y, N 3,t−1 = z).
Next, summing up all the individuals, we obtain what we expect to see for that sample of data for time point t = 7, given what we have seen up to point t = 6, i.e., we derive the expected frequencies. To examine the 3-variate structure of the data, Table 4 presents the observed and fitted numbers for the largest observed frequencies. The fitted numbers are close to the observed ones. A χ 2 goodness of fit test gives a value of 14.382 with 11 degrees of freedom leading to a p-value of 0.2125, which implies a satisfactory fit for the model.
The marginal cases for the three variables can be seen in Figure 1. We have plotted for each of the three variables the observed and expected frequencies. Quite a good fit can be seen. Please note that for N 2 , the observed values at t = 7 had values of only 0 and 1. Overall, we believe that the model captures the underlying structure and describes the data at hand satisfactorily. Table 4. Observed and expected frequencies for the most frequent triplets n 1 , n 2 , n 3 . The fit is satisfactory. We obtain χ 2 = 14.382 and p-value = 0.2125.

Conclusions
In this paper, we developed a model to account for time and cross dependence for a case of longitudinal multivariate count data. The time series part was captured by a MINAR model with a diagonal structure. This was sufficient to account for the time dependence in our case, but more complicated structure can be considered, such as a non-diagonal matrix A or higher order models. In addition, families like multivariate INGARCH models see [40] could be considered.
The dependence between the count variables was captured by Sarmanov-type multivariate distributions. This provides a flexible way to define multivariate distributions reducing computational needs, such as those that copula-based models have. In our case, we considered Poisson and negative binomial marginal distributions. Other choices like a zero-inflated model are also plausible, with small changes in the model development.