1. Introduction
Poisson distribution is one of the most widely utilized univariate and multivariate regression models for count data analysis. The univariate regression model is comprised of one response variable, while the multivariate regression model is made up of two or more response variables that are correlated. There are several multivariate count data models based on Poisson distribution, namely multivariate Poisson regression model, multivariate Poisson-gamma mixture model, multivariate Poisson log-normal model, and multivariate generalized Poisson regression model [
1,
2].
Poisson distribution is characterized by its restrictive assumption on equidispersion, a condition where the sample variance is equal to the mean. However, when the sample mean is less (greater) than the variance, it is called overdispersion (underdispersion). The multivariate Poisson regression model assumes equidispersion and allows for nonnegative correlations, while the multivariate Poisson-gamma mixture model handles overdispersion and nonnegative correlations. Furthermore, the multivariate Poisson-log-normal model allows overdispersion and a more flexible correlation structure. The multivariate generalized Poisson regression (MGPR) model is the most flexible amongst all because it allows any type of dispersion and correlation.
The MGPR model developed by Famoye [
2] is a regression model based on multivariate generalized Poisson distribution, which was obtained by the same procedure in constructing bivariate generalized Poisson distribution, which is a product of generalized Poisson marginal with a multiplicative factor [
3,
4]. To date, bivariate and multivariate generalized Poisson regression models have been applied to real data with units of analysis such as elderly, women, patients, soccer teams, or households [
2,
5,
6]. These models are global regression models that assume the relationships between variables are spatially constant.
When the units of analysis used are locations, spatial effects need to be taken into account in modeling the relationships between the explanatory variables and the response variables. Two possible issues arise when the sample data contain spatial elements, namely spatial dependency and spatial heterogeneity [
7]. Out of these two, spatial dependency or spatial autocorrelation is best known and most acknowledged. Spatial dependency occurs when the intensity of an event at one location influences the intensity at surrounding locations. Meanwhile, spatial heterogeneity describes the distribution of an event and the distribution of a relationship across a landscape. This means that what matters in a socio-economic relationship may not be homogenously distributed across space, such as infant mortality rates or unemployment rates at different locations. Both of these issues tend to arise when data are collected by location units such as provinces, districts/cities, sub-districts, villages, etc.
One statistical technique commonly used to address spatial effects is geographically weighted regression (GWR) model [
8]. The GWR model allows the relationships between response variables and explanatory variables vary over space so the regression coefficients systematically vary across space. With this method, separate regression models are estimated for each location unit. All observations, i.e., location units, are included in each regression but with different weights. The farther a spatial unit from the location for which the regression is fitted, the smaller its data are weighted. This is in line with Tobler’s First Law, which stated that everything is related to everything else, but near things are more related than distant things [
9].
The GWR models for univariate response applied in previous studies were based on normal distribution [
10,
11,
12,
13]. Another study used Poisson distribution to develop a geographically weighted Poisson regression model and semi-parametric geographically weighted Poisson regression [
14]. Meanwhile, Rodrigues et al. [
15] and Chen et al. [
16] used the GWR model based on logistic distribution and negative binomial distribution, respectively. The development of the GWR model for multivariate responses based on several different distributions has been carried out, such as based on normal distribution [
17], Poisson distribution [
18], Weibull distribution [
19], Student-
t distribution [
20], negative binomial distribution [
21], and logistic distribution [
22].
Furthermore, two GWR models for count data which use multivariate responses have been developed, based on the Poisson distribution and negative binomial distribution. The first model is called geographically weighted multivariate Poisson regression (GWMPR) model [
18], while the second model is called geographically weighted negative binomial regression (GWMNBR) model [
21]. The GWMPR model assumes equidispersion, whereas the GWMNBR model only handles cases of overdispersion. In this paper, we propose an alternative multivariate regression model that can handle any type of dispersion by taking into account spatial effects in measuring the relationships between the response variables and the explanatory variables. The proposed model is called a geographically weighted multivariate generalized Poisson regression (GWMGPR) model.
The following discussion in this paper is divided into several main topics. In
Section 2, we define the GWMGPR model followed by discussions on the parameter estimation using the maximum likelihood estimation (MLE) method in
Section 3. We present simultaneous hypothesis testing in
Section 4 to test the significance of all parameters together. In this section, we define the distribution of the test statistic and the critical region for the corresponding hypothesis by applying the maximum likelihood ratio test. We give conclusions in
Section 5.
2. GWMGPR Model
The GWMGPR model is an extension of existing model of MGPR developed by Famoye [
2] as follows:
where
with
and
, where
is the response variable
,
is the dispersion parameter of
, and
shows the correlation between
and
. When
, the MGPR model is reduced to multivariate Poisson regression model. Meanwhile, if
or
, then the MGPR model experiences underdispersion or overdispersion, respectively. The variable
and
are negatively or positively correlated when
or
, respectively.
We use the usual log-linear relationship between the marginal
and the covariates
as follows:
where
is an exposure measure for observation
i of the response variable
l,
is a vector of the explanatory variables, and
is a parameter vector of the MGPR model for the response variable
l,
Equation (
2) produces one regression model with the same regression coefficients for all
n observations (locations), which we called the global model. Meanwhile, the local regression model produces
n regression models; therefore, each location has its own regression coefficients. This is the underlying idea of the GWMGPR model, which is to build a model that takes into account spatial variations in relationships that allows the regression coefficients to vary across geographical locations. In this case, the location-specific regression coefficients are functions of longitude and latitude coordinates denoted by
, a vector of two-dimensional coordinates of location
i. This means that the response variables
are predicted by explanatory variables
, each of which has a regression coefficient depending on the location where the data were observed.
Let
n random samples
,
from the random variables
∼
be given, where
is a two-dimensional coordinate vector of location
i; then, the GWMGPR model based on MGPR in (
2) is written as
where
is the exposure for location
i of the response variable
l with
3. Parameter Estimation
As stated earlier, we use the MLE method to obtain the estimators of the GWMGPR model. The MLE method is widely used to find estimators from observed data. This method requires that the estimators, say
, maximizes the log-likelihood function by solving the gradient function to zero. For a more detailed discussion about MLE, including its principles and properties, see Pawitan [
23].
Supposing
n random samples
are taken independently from random variables
, then the likelihood function of location
i that corresponds to (
1) is
where
and
is stated in (
3). Hence, the parameter vector of location
i of the corresponding GWMGPR model is written as
Let
be the log-likelihood function of (
4) such that
where
and
is defined in (
5).
To estimate the parameters for location
i, we can approximate Equation (
2) by Equation (
3). The regression model is built on using a subset or the entire data points by giving a weight based on the distance of the data points to location
i. We give more weight to data points that are closer to location
i than other data points that are farther away. Thus, the weights vary according to the proximity of the data points to location
i. This means that we need to compute a weighting matrix for each location where the weights represent the proximity of each data point to location
i with closer points having more weight in the estimation of the parameters for location
i.
The above explanation means that the local parameters of
are estimated by a weighting function,
. The weights
,
at each location
are defined by a function of distance
between the center of location
i and other locations. Several weighting functions (kernels) are presented in Fotheringham et al. [
8]. The kernels generally define weights such that locations close to the regression point will make more contributions in estimating the parameters than locations farther away. The most common kernels are the Gaussian function and bi-square function accompany with fixed or adaptive bandwidth.
Fixed Gaussian weighting function is defined as
and fixed bi-square weighting function is
where
is Euclidean distance between location
i and location
j
and
b is the bandwidth that controls the degree of distance-decay. The bandwidth is a measure of the distance-decay in the weighting function and indicates the extent to which the local model should be smoothed [
8]. When
b is large (small), it gives relative higher (lower) weight for more remote locations. This means that a fixed weighting functions may lead to a problem of undersmoothing in areas with only small observations and oversmoothing in areas with a high density of observations.
An alternative to the fixed weighting functions is adaptive kernels or spatially varying kernels, which give smaller bandwidth in areas with more data points and greater bandwidth in areas with less data points. Two types of adaptive kernel are adaptive Gaussian and adaptive bi-square. Adaptive Gaussian weighting function is defined as
and adaptive bi-square weighting function is stated as
where
is the bandwidth of location
i.
Therefore, the implementation of GWMGPR requires a choice of specific function: a bisquare or a Gaussian function and the choice of bandwidth. The results of GWMGPR are more sensitive to the bandwidth of the particular weighting function chosen rather than to the selection of the weighting function. This means that we need to determine optimum bandwidth so there are sufficient data points within the kernel to get a reliable estimation in terms of bias and standard error but not too large of bandwidth so that there are too many data points such that the heterogeneity is washed away or not too high of bandwidth that the kernel only includes one data point for each location, then the system breaks down. The greater the bandwidth, the lower standard error of the coefficient regression but the higher bias, and vice versa.
There are several techniques used to obtain optimal bandwidth in terms of optimum trade-off between bias and standard error, such as the generalized cross-validation (GCV), which is used to determine the best combination of the bias and standard error. The optimum bandwidth is indicated by the minimum value of GCV, which is defined as follows [
8]:
where
is the prediction of
for bandwidth
b, and
is the number of effective parameters in the GWMGPR model.
Once the weights
are calculated, the parameter estimation is determined. To obtain the estimator of location
i, we need information of location
j by applying a weighting function
to the log-likelihood function. When the optimal weighting function is obtained, the log-likelihood function of location
i in (
8) is written as a weighted log-likelihood function as follows:
where
and
The MLE method is applied to the parameter estimation; therefore, the weighted log-likelihood function in (
10) is maximized over the vector parameter
in (
7). The first partial derivatives of
with respect to
are shown as follows:
with
is stated in (
11), where
and
with
and
are stated in (
12) and (
13), respectively.
From Equations (
15) to (
17), the total number of parameters of GWMGPR model, notated by
, is
. The first partial derivatives of
with respect to each parameter in (
15) through (
17) are not closed-form and the MLE does not give analytical solutions. A numerical optimization, such as the Newton–Raphson algorithm, as the simplest and the most common optimization method, is needed when there is no analytical solution available to obtain the maximum likelihood estimators. The Newton–Raphson algorithm is
where
is the vector of parameter of location
i at the
iteration,
is the inverse of Hessian matrix of location
i at the
iteration, and
is the gradient vector of location
i at the
iteration. For a more detailed discussion about numerical optimization methods, see Cameron and Trivedi [
24].
4. Simultaneous Hypothesis Testing
Simultaneous hypothesis testing aims to test more than one parameter, for example to test all the parameters
,
together. This is used to determine the significance of the regression parameters in the model jointly. The hypothesis for simultaneous testing of parameter
is stated as follows:
If we cannot reject
, then it means that none of the regression parameter values are significant. In case of rejection, it means that one or more of
’s have a significant effect in the model.
The critical region of hypothesis testing in (
18) is obtained using the likelihood ratio test technique by calculating the ratio between the maximum value of the likelihood function under
,
, with the maximum value of the likelihood function under the population,
. The maximum value of
is the maximum value in (
4). Meanwhile, the calculation of
is described below.
The first step to obtain the maximum
in determining the critical region for the hypothesis in (
18) is to define the parameter space under
,
,
. Let
be the vector of parameters under
and the GWMGPR model of location
i under
be
The likelihood function in space
is
Let
, then the log-likelihood function under
is written as
where
and
By carrying out the same procedure as described previously, we need the information of location
j to obtain the estimator of location
i. The contribution of each location depends on the weighting function
, such that the weighted log-likelihood function corresponding to (
21) is written as
where
and
is defined as
The first partial derivatives of
with respect to each parameter in the parameter vector
are
where
is defined in (
23) with
and
Equations (
24) through (
26) are not closed-form, so a numerical iteration is needed to solve the equations. The total number of parameters in the parameter space
, notated by
, is
.
Using a significance level
, the critical region of the likelihood ratio test for the GWMGPR model with regard to the hypothesis in (
18) is
where
is a constant that depends on
with
,
and
are stated in (
4) and (
20), respectively. Equation (
27) can be written as
The distribution of the test statistic
in (
28) is written in the following Theorem 1.
Theorem 1. Supposing the maximum likelihood estimator under and the population regarding to hypothesis testing stated in (18) arerespectively, then the test statistic in (28) follows a chi-square distribution with degrees of freedom. Proof of Theorem 1. Suppose the maximum likelihood estimator under the population related to the hypothesis stated in (
18) is partitioned as follows:
where
and
The maximum likelihood estimators and the known parameters in
are partitioned as
where
and
with the true parameter is partitioned as
Therefore, the hypothesis in (
18) can be written as
The test statistic
stated in (
28) can be described as follows:
Let
, using Taylor’s second order expansion around
, we get
where
and
Since
, then
Meanwhile, when the function
is represented via second order approximation around
, the following is obtained:
or it can be written as
Following (
30) and (
31), the test statistic in (
29) can be written as
Let the partition of Fisher information matrix
and its inverse
be
and
Based on the nature of the conditional distribution, given as
,
and
, we get
A simpler form of (
33) using a simple manipulation of the partitioned matrix of the Fisher information is
Since
and
then
The statistic
in (
32) can then be written as
With regard to the normality for MLE under the regularity conditions (see Cameron and Trivedi [
25]), the distribution of the partitioned matrix is
such that
□
Regarding Equation (
27), the critical region of the test statistic
is
Therefore,
in (
18) is rejected when
.