Parameter Estimation and Hypothesis Testing of Geographically Weighted Multivariate Generalized Poisson Regression

We introduce a new multivariate regression model based on the generalized Poisson distribution, which we called geographically-weighted multivariate generalized Poisson regression (GWMGPR) model, and we present a maximum likelihood step-by-step procedure to obtain parameters for it. We use the maximum likelihood ratio test to examine the significance of the regression parameters and to define their critical region.


Introduction
Poisson distribution is one of the most widely utilized univariate and multivariate regression models for count data analysis. The univariate regression model is comprised of one response variable, while the multivariate regression model is made up of two or more response variables that are correlated. There are several multivariate count data models based on Poisson distribution, namely multivariate Poisson regression model, multivariate Poisson-gamma mixture model, multivariate Poisson log-normal model, and multivariate generalized Poisson regression model [1,2].
Poisson distribution is characterized by its restrictive assumption on equidispersion, a condition where the sample variance is equal to the mean. However, when the sample mean is less (greater) than the variance, it is called overdispersion (underdispersion). The multivariate Poisson regression model assumes equidispersion and allows for nonnegative correlations, while the multivariate Poisson-gamma mixture model handles overdispersion and nonnegative correlations. Furthermore, the multivariate Poisson-log-normal model allows overdispersion and a more flexible correlation structure. The multivariate generalized Poisson regression (MGPR) model is the most flexible amongst all because it allows any type of dispersion and correlation.
The MGPR model developed by Famoye [2] is a regression model based on multivariate generalized Poisson distribution, which was obtained by the same procedure in constructing bivariate generalized Poisson distribution, which is a product of generalized Poisson marginal with a multiplicative factor [3,4]. To date, bivariate and multivariate generalized Poisson regression models have been applied to real data with units of analysis such as elderly, women, patients, soccer teams, or households [2,5,6]. These models are global regression models that assume the relationships between variables are spatially constant.
When the units of analysis used are locations, spatial effects need to be taken into account in modeling the relationships between the explanatory variables and the response variables. Two possible issues arise when the sample data contain spatial elements, namely spatial dependency and spatial heterogeneity [7]. Out of these two, spatial dependency or spatial autocorrelation is best known and most acknowledged. Spatial dependency occurs when the intensity of an event at one location influences the intensity at surrounding locations. Meanwhile, spatial heterogeneity describes the distribution of an event and the distribution of a relationship across a landscape. This means that what matters in a socio-economic relationship may not be homogenously distributed across space, such as infant mortality rates or unemployment rates at different locations. Both of these issues tend to arise when data are collected by location units such as provinces, districts/cities, sub-districts, villages, etc.
One statistical technique commonly used to address spatial effects is geographically weighted regression (GWR) model [8]. The GWR model allows the relationships between response variables and explanatory variables vary over space so the regression coefficients systematically vary across space. With this method, separate regression models are estimated for each location unit. All observations, i.e., location units, are included in each regression but with different weights. The farther a spatial unit from the location for which the regression is fitted, the smaller its data are weighted. This is in line with Tobler's First Law, which stated that everything is related to everything else, but near things are more related than distant things [9].
The GWR models for univariate response applied in previous studies were based on normal distribution [10][11][12][13]. Another study used Poisson distribution to develop a geographically weighted Poisson regression model and semi-parametric geographically weighted Poisson regression [14]. Meanwhile, Rodrigues et al. [15] and Chen et al. [16] used the GWR model based on logistic distribution and negative binomial distribution, respectively. The development of the GWR model for multivariate responses based on several different distributions has been carried out, such as based on normal distribution [17], Poisson distribution [18], Weibull distribution [19], Student-t distribution [20], negative binomial distribution [21], and logistic distribution [22]. Furthermore, two GWR models for count data which use multivariate responses have been developed, based on the Poisson distribution and negative binomial distribution. The first model is called geographically weighted multivariate Poisson regression (GWMPR) model [18], while the second model is called geographically weighted negative binomial regression (GWMNBR) model [21]. The GWMPR model assumes equidispersion, whereas the GWMNBR model only handles cases of overdispersion. In this paper, we propose an alternative multivariate regression model that can handle any type of dispersion by taking into account spatial effects in measuring the relationships between the response variables and the explanatory variables. The proposed model is called a geographically weighted multivariate generalized Poisson regression (GWMGPR) model.
The following discussion in this paper is divided into several main topics. In Section 2, we define the GWMGPR model followed by discussions on the parameter estimation using the maximum likelihood estimation (MLE) method in Section 3. We present simultaneous hypothesis testing in Section 4 to test the significance of all parameters together. In this section, we define the distribution of the test statistic and the critical region for the corresponding hypothesis by applying the maximum likelihood ratio test. We give conclusions in Section 5.
We use the usual log-linear relationship between the marginal Y il and the covariates x i as follows: where q il is an exposure measure for observation i of the response variable l, x i is a vector of the explanatory variables, and β l is a parameter vector of the MGPR model for the response variable l, , l = 1, 2, . . . , g.
Equation (2) produces one regression model with the same regression coefficients for all n observations (locations), which we called the global model. Meanwhile, the local regression model produces n regression models; therefore, each location has its own regression coefficients. This is the underlying idea of the GWMGPR model, which is to build a model that takes into account spatial variations in relationships that allows the regression coefficients to vary across geographical locations. In this case, the location-specific regression coefficients are functions of longitude and latitude coordinates denoted by u i = (u 1i , u 2i ), a vector of two-dimensional coordinates of location i. This means that the response variables Y 1i , Y 2i , . . . , Y il are predicted by explanatory variables x i , each of which has a regression coefficient depending on the location where the data were observed.
Let n random samples (x i , y i ), i = 1, 2, . . . , n from the random variables y i ∼MGP (µ (u i ) , ϕ (u i ) , γ (u i )) be given, where u i is a two-dimensional coordinate vector of location i; then, the GWMGPR model based on MGPR in (2) is written as where q il is the exposure for location i of the response variable l with , l = 1, 2, . . . , g; i = 1, 2, . . . , n.

Parameter Estimation
As stated earlier, we use the MLE method to obtain the estimators of the GWMGPR model. The MLE method is widely used to find estimators from observed data. This method requires that the estimators, sayθ, maximizes the log-likelihood function by solving the gradient function to zero. For a more detailed discussion about MLE, including its principles and properties, see Pawitan [23].
Supposing n random samples (x i , y i ) , i = 1, 2, . . . , n are taken independently from random variables y i ∼ MGP (µ (u i ) , ϕ (u i ) , γ (u i )), then the likelihood function of location i that corresponds to (1) is where and µ il (u i ) is stated in (3). Hence, the parameter vector of location i of the corresponding GWMGPR model is written as Let Q 1 be the log-likelihood function of (4) such that where and z il is defined in (5).
To estimate the parameters for location i, we can approximate Equation (2) by Equation (3). The regression model is built on using a subset or the entire data points by giving a weight based on the distance of the data points to location i. We give more weight to data points that are closer to location i than other data points that are farther away. Thus, the weights vary according to the proximity of the data points to location i. This means that we need to compute a weighting matrix for each location where the weights represent the proximity of each data point to location i with closer points having more weight in the estimation of the parameters for location i.
The above explanation means that the local parameters ofθ are estimated by a weighting function, w ij . The weights w ij , i, j = 1, 2, . . . , n at each location u i are defined by a function of distance d ij between the center of location i and other locations. Several weighting functions (kernels) are presented in Fotheringham et al. [8]. The kernels generally define weights such that locations close to the regression point will make more contributions in estimating the parameters than locations farther away. The most common kernels are the Gaussian function and bi-square function accompany with fixed or adaptive bandwidth.
Fixed Gaussian weighting function is defined as and fixed bi-square weighting function is where d ij is Euclidean distance between location i and location j and b is the bandwidth that controls the degree of distance-decay. The bandwidth is a measure of the distance-decay in the weighting function and indicates the extent to which the local model should be smoothed [8]. When b is large (small), it gives relative higher (lower) weight for more remote locations. This means that a fixed weighting functions may lead to a problem of undersmoothing in areas with only small observations and oversmoothing in areas with a high density of observations. An alternative to the fixed weighting functions is adaptive kernels or spatially varying kernels, which give smaller bandwidth in areas with more data points and greater bandwidth in areas with less data points. Two types of adaptive kernel are adaptive Gaussian and adaptive bi-square. Adaptive Gaussian weighting function is defined as and adaptive bi-square weighting function is stated as where b i is the bandwidth of location i. Therefore, the implementation of GWMGPR requires a choice of specific function: a bisquare or a Gaussian function and the choice of bandwidth. The results of GWMGPR are more sensitive to the bandwidth of the particular weighting function chosen rather than to the selection of the weighting function. This means that we need to determine optimum bandwidth so there are sufficient data points within the kernel to get a reliable estimation in terms of bias and standard error but not too large of bandwidth so that there are too many data points such that the heterogeneity is washed away or not too high of bandwidth that the kernel only includes one data point for each location, then the system breaks down. The greater the bandwidth, the lower standard error of the coefficient regression but the higher bias, and vice versa.
There are several techniques used to obtain optimal bandwidth in terms of optimum trade-off between bias and standard error, such as the generalized cross-validation (GCV), which is used to determine the best combination of the bias and standard error. The optimum bandwidth is indicated by the minimum value of GCV, which is defined as follows [8]: whereŷ i (b) is the prediction of y i for bandwidth b, and v 1 is the number of effective parameters in the GWMGPR model. Once the weights w ij are calculated, the parameter estimation is determined. To obtain the estimator of location i, we need information of location j by applying a weighting function w ij to the log-likelihood function. When the optimal weighting function is obtained, the log-likelihood function of location i in (8) is written as a weighted log-likelihood function as follows: where and The MLE method is applied to the parameter estimation; therefore, the weighted log-likelihood function in (10) is maximized over the vector parameter θ (u i ) in (7). The first partial derivatives of Q 1 with respect to θ (u i ) are shown as follows: with Q 2 is stated in (11), where γ jlm e −y jl − z jl e −y jm − z jm = e −y jl − z jl e −y jm − z jm , and ∂z jl with z jl and r l are stated in (12) and (13), respectively.
From Equations (15) to (17), the total number of parameters of GWMGPR model, notated by a 1 , is (g + 2p + 3) gn/2. The first partial derivatives of Q 1 with respect to each parameter in (15) through (17) are not closed-form and the MLE does not give analytical solutions. A numerical optimization, such as the Newton-Raphson algorithm, as the simplest and the most common optimization method, is needed when there is no analytical solution available to obtain the maximum likelihood estimators. The Newton-Raphson algorithm iŝ

Simultaneous Hypothesis Testing
Simultaneous hypothesis testing aims to test more than one parameter, for example to test all the parameters β lk (u i ), (l = 1, 2, . . . g; i = 1, 2, . . . , n) together. This is used to determine the significance of the regression parameters in the model jointly. The hypothesis for simultaneous testing of parameter β lk (u i ) is stated as follows: .. = β l p (u i ) = 0; (i = 1, 2, . . . , n; l, m = 1, 2, . . . , g) , If we cannot reject H 0 , then it means that none of the regression parameter values are significant. In case of rejection, it means that one or more of β lk (u i )'s have a significant effect in the model. The critical region of hypothesis testing in (18) is obtained using the likelihood ratio test technique by calculating the ratio between the maximum value of the likelihood function under H 0 , L (ω), with the maximum value of the likelihood function under the population, L (Ω). The maximum value of L (Ω) is the maximum value in (4). Meanwhile, the calculation of L (ω) is described below.
The first step to obtain the maximum L (ω) in determining the critical region for the hypothesis in (18) is to define the parameter space under H 0 , ω = {β ωl0 (u i ) , ϕ ωl (u i ) , γ ωlm (u i )}, (l, m = 1, 2, . . . , g; l < m). Let θ ω2 (u i ) be the vector of parameters under H 0 , and the GWMGPR model of location i under H 0 be The likelihood function in space ω is Let Q 1ω = ln L (ω), then the log-likelihood function under H 0 is written as where By carrying out the same procedure as described previously, we need the information of location j to obtain the estimator of location i. The contribution of each location depends on the weighting function w ij , such that the weighted log-likelihood function corresponding to (21) is written as where and µ ωjl (u i ) is defined as µ ωjl (u i ) = q jl exp (β ωl0 (u i )) .
The first partial derivatives of Q 1ω with respect to each parameter in the parameter vector where Q 2ω is defined in (23) with Equations (24) through (26) are not closed-form, so a numerical iteration is needed to solve the equations. The total number of parameters in the parameter space ω, notated by a 2 , is (g + 3) gn/2 . Using a significance level α ∈ (0, 1), the critical region of the likelihood ratio test for the GWMGPR model with regard to the hypothesis in (18) is where k α is a constant that depends on α with 0 ≤ k α ≤ 1, L (Ω) and L (ω) are stated in (4) and (20), respectively. Equation (27) can be written as The distribution of the test statistic G 2 in (28) is written in the following Theorem 1.
respectively, then the test statistic G 2 in (28) follows a chi-square distribution with gpn degrees of freedom.
Proof of Theorem 1. Suppose the maximum likelihood estimator under the population related to the hypothesis stated in (18) with the true parameter is partitioned as Therefore, the hypothesis in (18) can be written as Since θ ω − θ ω = 0,θ ω2 − θ 2 and θ ω2 − θ 2 =θ 2 − θ 2 + I −1 22 I 21 θ 1 − θ ω1 , then GWMGPR model using real data to validate the theoretical results in this paper that are being prepared for our next manuscript.
Author Contributions: Conceptualization, S.M.B., P.P., and S.S.; methodology, S.M.B., P.P., S.S. and S.P.R.; writing-original draft preparation, S.M.B.; writing-review and editing, S.M.B. and P.P. All authors have read and agreed to the published version of the manuscript.