Multivariate Gamma Regression: Parameter Estimation, Hypothesis Testing, and Its Application

: Gamma distribution is a general type of statistical distribution that can be applied in various ﬁelds, mainly when the distribution of data is not symmetrical. When predictor variables also a ﬀ ect positive outcome, then gamma regression plays a role. In many cases, the predictor variables give e ﬀ ect to several responses simultaneously. In this article, we develop a multivariate gamma regression (MGR), which is one type of non-linear regression with response variables that follow a multivariate gamma (MG) distribution. This work also provides the parameter estimation procedure, test statistics, and hypothesis testing for the signiﬁcance of the parameter, partially and simultaneously. The parameter estimators are obtained using the maximum likelihood estimation (MLE) that is optimized by numerical iteration using the Berndt–Hall–Hall–Hausman (BHHH) algorithm. The simultaneous test for the model’s signiﬁcance is derived using the maximum likelihood ratio test (MLRT), whereas the partial test uses the Wald test. The proposed MGR model is applied to model the three dimensions of the human development index (HDI) with ﬁve predictor variables. The unit of observation is regency / municipality in Java, Indonesia, in 2018. The empirical results show that modeling using multiple predictors makes more sense compared to the model when it only employs a single predictor.


Introduction
Gamma distribution is one family of continuous probability distributions and generalizations of exponential distributions [1]. Nagar, Correa, and Gupta [2] mentioned that the gamma distribution function was first introduced by Swiss mathematician Leonhard Euler (1729). Because this function is considered important, many researchers have studied and developed it. Bhattacharya [3], among others, conducted a study on testing the homogeneity of the parameters (shape and scale) of the gamma distribution. Chen and Kotz [4] conducted a study on the probability density function (pdf) of gamma distribution with three parameters (shape, scale, and location). Many researchers also study and develop bivariate gamma distribution; among others are Schickedanz and Krause [5], who conducted a study on testing scale parameters from two gamma-distributed data using the generalized likelihood ratio (GLR). Nadarajah [6] studied the types of bivariate gamma distribution. Next, Nadarajah and Gupta [7] developed two new bivariate gamma distributions based on gamma and beta random variables. In addition, Mathai and Moschopoulos [8] discussed joint densities, product moments, conditional densities, and conditional moments that were developed from two bivariate gamma distributions.

Multivariate Gamma Regression Model
Suppose y l is the response variables data (y l1 , y l2 , . . . , y lk ) that follows multivariate gamma distribution and x l is the corresponding predictor variables (x l1 , x l2 , . . . , x ls ), with sample size as n observations (l = 1, 2, . . . , n). In this section, we discuss the construction of the MGR model, its parameter estimation, and hypothesis testing. A short explanation about univariate gamma regression is introduced to make a smooth transition into the MGR model.
The pdf for the lth observation is formulated in Equation (5) which will be used to compose the likelihood function in Equation (6).
with α i > 0, γ > 0, λ i ∈ R, λ 1 < y l1 < ∞, λ 2 < y l2 < ∞, λ k < y lk < ∞, Later, we discuss parameter estimation on MGR using MLE. The likelihood function constructed from Equation (5) is: with values α 1 , α 2 , α k , and α k * based on Equation (5). The log-likelihood function from Equation (6) is: By substituting the values of α 1 , α 2 , α k , and α k * according to Equation (5), the log-likelihood function is: In this article, the log value is based on e or natural logarithm. The first derivatives of the log-likelihood function for each parameter are as follows.

•
Step 1. Determine the initial value forθ k ∈ R satisfies the constraints in Equation (5), andβ are obtained from the estimate of univariate gamma regression. The Hessian H(θ) in BHHH is approximated as the negative of the sum of the outer products of the gradients of individual observations. The gradient vector g(θ) is a vector with each element consisting of the first derivative of the log-likelihood function for each of the estimated parameters.

•
Step 2. Determine the tolerance limit so that the BHHH iteration process stops. In this study, the tolerance value used is ε = 10 −8 .

•
Step 3. Start the BHHH iteration using the following formula.

Proposition 1.
If Ω is a set of parameters under the population, ω is a set of parameters under the null hypothesis, and the hypothesis being used is the simultaneous test of MGR model, then the test statistic is A Corollary of Proposition 1: The hypothesis being used in the simultaneous test of the MGR model in Section 2 can be stated in the following form: the null hypothesis is β * = 0 (sk×1) and the alternative It is noted thatθ Ω andθ ω are estimators that maximize the likelihood and the log-likelihood functions under the population and under the null hypothesis. The principle of the MLE method is to maximize the likelihood functions [17]. The following are test statistics for the hypothesis being used in the simultaneous test of the MGR model in Section 2.
where Λ 0 is a constant value between 0 < Λ 0 ≤ 1. L(ω) and L Ω in Equation (16) are: Based on Equation (17), is difficult to simplify. To simplify the calculation, the test statistics in Equation (16) are expressed in a form equivalent to: Symmetry 2020, 12, 813 7 of 17 The application of natural logarithms in Equation (18) obtains the following test statistics.
Proposition 2. Based on Proposition 1, the distribution of test statistics G 2 is Chi-square with sk degrees of freedom, which can be written as follows.

A Corollary of Proposition 2:
Ifθ Ω is an estimator that maximizes the likelihood and the log-likelihood functions under the population,θ ω is an estimator that maximizes the likelihood and the log-likelihood functions under the null hypothesis, based on Equation (19), so: LogL(θ ω ) function can be approached by Taylor's second-degree expansion aroundθ Ω as follows.
Equation (25) can be simplified by outlining the quadratic form of θ Ω − θ ω T I θ Ω θ Ω − θ ω , so we obtained: From Equation (26), this can be obtained: Based on Equation (28), the quadratic form given by Equation (26) distributed Chi-square with sk degrees of freedom is: with z = I 11 − 1 2β * d → N(0, I sk ), n → ∞. sk is a vector dimension β * or the difference between the number of parameter sets under the population with the number of parameter sets under the null hypothesis, symbolized by n(Ω) − n(ω).
Based on Proposition 2 and Proposition 3, the decision to reject the null hypothesis is made if is the number of parameters under the population, and n(ω) is the number of parameters under the null hypothesis.

Data and Method
The parameter estimation and hypothesis testing on MGR were done based on the following steps. The MGR model was specified based on the pdf in Equation (5) for n observations, l = 1, 2, . . . , n, to construct the likelihood and the log-likelihood functions. The first derivative of the log-likelihood function for each parameter was computed, then equalized to zero. If the solutions were closed-form, then the parameter estimators were obtained. Otherwise, numerical optimization was needed. As shown in the previous section, the solution for parameter optimization was not closed-form, such that the BHHH algorithm was employed in this work.
The overall test for MGR's significance was done using the maximum likelihood ratio test (MLRT). The test statistic was formulated in Equation (19). Meanwhile, the partial test for individual parameter significance in MGR was done using the Wald test [18]. Its test statistics are provided in Equation (31). The proposed MGR model, along with its parameter estimation and hypothesis testing, was applied on real data as an application of this study.
This study used secondary data obtained from Statistics Indonesia. The data used were three response variables, i.e., the life expectancy index, education index, and expenditure index, with six predictor variables: percentage of households that have a private toilet, net enrollment rate of schooling, population density, percentage of poor people, and unemployment rate. The data were observed for 119 regencies/municipalities in Java, Indonesia, in the year 2018.

Application on Human Development Dimensions Data
First, testing the gamma distribution was done using the Kolmogorov-Smirnov (KS) test. The null hypothesis is the data that follows the gamma distribution against the alternative hypothesis that data does not follow the gamma distribution. The test statistic value of the KS test for each response variable is presented in Table 1. In this paper, the goodness of fit is done univariately as the test for multivariate gamma distribution is not available yet. The test for that is another extensive work that is not covered in this paper. Once each response follows gamma distribution, we assume the multiresponses data follow a multivariate gamma distribution. This assumption is the limitation of this work, such that the proposed model can be applied to real data without delay. Each response variable has D n < D (0.05) and p-value > α. The test concludes not to reject the null hypothesis, meaning that the data of life expectancy index (Y 1 ), the education index (Y 2 ), and the expenditure index (Y 3 ) follow the gamma distribution. Therefore, as our research limitation, as mentioned previously, the three response variables are assumed to follow MG distribution.
To support our assumption, we calculated the correlation between the pair of the response variables to show there are dependencies among responses. The correlation coefficients for each pair are as follows: (i) Y 1 and Y 2 is 0.398 with p-value close to zero, (ii) Y 1 and Y 3 is 0.324 with p-value close to zero, (iii) Y 2 and Y 3 is 0.818 with p-value close to zero. The correlation coefficient between education index (Y 2 ) and expenditure index (Y 3 ) is stronger than the other pairs. To find out whether there is dependency among the response variables, one can use Bartlett's test of sphericity so that the data are feasible for multivariate analysis. This test has statistic value χ 2 = 148.735 and p-value = 2.22 × 10 −16 . The χ 2 > χ 2 3;0,05 (or 7.815) and p-value < α, and alpha is 0.05. The decision is to reject the null hypothesis (Pearson correlation matrix not equal to an identity matrix), which means the correlation between the response variables is significant in the multivariate sense. Therefore, the data analysis needs to be done in a multivariate way using the MGR model.
We also tested the multicollinearity among the predictor variables. The variance inflation factor (VIF) value for each predictor variable is 1.358 (for X 1 ), 1.350 (X 2 ), 1.560 (X 3 ), 1.849 (X 4 ), and 1.211 (X 5 ). The VIF value for each of the predictor variables is less than ten which shows there is no multicollinearity among the predictor variables.
In Table 2, the mean values for response variables Y 1 , Y 2 , and Y 3 are 0.806, 0.632, and 0.735. Although Y 1 , Y 2 , and Y 3 have mean values that do not differ greatly, they are not necessarily of the same quality; it depends on the size of the spread of the data. One measure of data distribution that can be used is the coefficient of variation (CoV). The CoV for Y 1 , Y 2 , and Y 3 are 5.200, 12.210, and 9.140, respectively. The CoV for education index (Y 2 ) is the highest among others, which means that the variable is more heterogeneous. The CoV for predictor variables X 1 , X 2 , X 3 , X 4 , and X 5 are, respectively, 12.100, 16.750, 136.250, 43.750, and 45.530. The CoV for population density (X 3 ) is the highest among other predictor variables as its range is also the biggest one. The dependency between response and predictor variables can be shown visually by the matrix plot, as exhibited in Figure 1. The correlation between X 3 and X 4 (−0.585) is stronger than the correlation between X 4 and the other predictor variables, even stronger than other pairs. The correlation between X 1 and X 5 (−0.017) is weakest compared to the correlation of other couples. There are indications that the relationship is non-linear between X 3 with the response variable and the other predictor variables.
For the correlation between response and predictor variables, log(Y 1 ) has the strongest correlation with X 1 (0.434) compared to other predictors. The log(Y 2 ) and X 3 have the strongest correlation (0.705) compared with other predictors, while the correlation of log(Y 3 ) and X 3 is the strongest one (0.744). This value shows that log expenditure index and population density has the strongest relationship among other pairs. The dependency between response and predictor variables can be shown visually by the matrix plot, as exhibited in Figure 1. The correlation between X3 and X4 (−0.585) is stronger than the correlation between X4 and the other predictor variables, even stronger than other pairs. The correlation between X1 and X5 (−0.017) is weakest compared to the correlation of other couples. There are indications that the relationship is non-linear between X3 with the response variable and the other predictor variables. For the correlation between response and predictor variables, log(Y1) has the strongest correlation with X1 (0.434) compared to other predictors. The log(Y2) and X3 have the strongest correlation (0.705) compared with other predictors, while the correlation of log(Y3) and X3 is the strongest one (0.744). This value shows that log expenditure index and population density has the strongest relationship among other pairs. To find out which predictor variables significantly predicted response variables, we employed the MGR model. Table 3 presents the ML estimates of the MGR model with a single predictor and their corresponding standard errors, z score, and p-value. Every single predictor does not affect any response variables. Only the intercepts when the MGR model employs X 3 as a single predictor are significant. Table 3. Parameter estimation of multivariate gamma regression (MGR) model with a single predictor.

Parameter
Estimate Standard Error z p-Value  The MGR model with a single predictor (for example, the X 5 ) for the life expectancy index, education index, and expenditure index is obtained as follows. Table 3, it is shown that all predictor variables are not significant. For comparison, we also did MGR modeling with multiple predictors. Table 4 presents the ML estimates of the MGR model with multiple predictors along with their corresponding standard errors, z score, and p-value.  The estimate of the scale parameter is 0.649423, with its standard error 0.000028. The estimate of λ 1 , the location parameter for Y 1 , is 0.670845 (standard error 0.006884); meanwhile, the estimate for λ 2 is −0.309362, with standard error 0.006507, and for λ 3 is 0.000468 (standard error 0.006530). The significant parameters are the scale parameter γ, the location parameter for Y 1 and Y 2 , respectively, and λ 1 and λ 2 , as their p-values are less than α = 10%. The estimate of each parameter corresponding to each predictor is summarized in Table 4. Therefore, the MGR model for the life expectancy index, education index, and expenditure index is obtained as follows.

As summarized in
The Akaike information criterion (AIC) value is −63.903, and the corrected Akaike information criterion (AICc) value is −53.361. To know the average squared difference between the estimated and the actual values, one can use the mean square error (MSE). The MSEs for the life expectancy index, education index, and expenditure index are 0.001, 0.002, and 0.003, respectively. As the MSE is an unbiased estimator of variance, the MSE value is expected to be not much different from the variance of each response variable, i.e., 0.002 (expectancy index), 0.006 (education index), and 0.004 (expenditure index).
We can perform the simultaneous test for the model's significance using Wilk's likelihood ratio statistics derived based on the MLRT. The test statistic value is 46.682, and the value of the Chi-square table with 15 degrees of freedom and α = 10% is 22.307. The test statistical value is larger than the value of the Chi-square table; therefore, the decision is to reject the null hypothesis. It means that the five predictor variables have a significant effect on the response variables simultaneously. To find out the predictor variables that partially affect the response variable, one can use test statistics in Equation (31). From Table 4, it can be seen that the significant predictor variable that influences the life expectancy index is the unemployment rate (X 5 ); meanwhile the education index and expenditure index are significantly affected by the percentage of poor people (X 4 ) and unemployment rate (X 5 ).
Based on the results of MGR modeling with a single predictor (Table 3) and multiple predictors (Table 4), it can be determined the differences in the coefficient signs only happen for X 5 in response to Y 2 and Y 3 , as shown in Table 5. We can also find the supports of this evidence from the matrix plot in Figure 1, that individually, X 5 has a negative relationship with Y 1 , while it has positive dependencies with Y 2 and Y 3 . On the other side, the X 4 has a stronger negative individual relationship with all responses. Therefore, when X 4 and X 5 are used as predictors together in the MGR model, the sign of X 5 changes as there is a significant correlation (−0.391 with p-value < 0.05) between X 4 and X 5 , where X 4 affects the response of Y 2 and Y 3 is stronger than X 5 . Table 5. The difference in coefficient signs and significance of the parameters in the MGR model.

Multiple Predictors
Single Predictor .
Recall the VIF value for X 5 is 1.211, which is small. This value means that there is a weak relationship between X 5 and (X 1 , X 2 , X 3 , and X 4 ). However, there is a significant correlation between X 4 alone and X 5 . In the MGR with multiple predictors, the positive sign for X 5 will not change if X 4 also has a positive sign for responses X 2 and X 3 . Unfortunately, that is not the case. The correlation between X 4 and X 2 has a different sign compared with the correlation between X 5 and X 2 . The sign of X 4 and X 5 in MGR with multiple predictors can change depending on its correlation with the response variable. The same explanation pertains to response X 3 .
Life expectancy index (Y 1 ) has a negative association with the percentage of poor people (X 4 ), even though it is not significant for regency/municipality in Java. This finding means that an increase in life expectancy index is not affected by the percentage of poor people in Java. The education index (Y 2 ) and expenditure index (Y 3 ) have a significant negative dependency on the percentage of poor people in Java.
The predictions resulting from the MGR model are expected to be close to the actual values. The closer those two values, the narrower the spread, as displayed in Figure 2. It can be seen that fitting values for Y 2 and Y 3 are better than those of Y 1 . This result is also supported by significant predictors, as reported in Table 4. The life expectancy index has one significant predictor, while the other two responses have two significant predictors that increase their coefficients of determination.
percentage of poor people in Java.
The predictions resulting from the MGR model are expected to be close to the actual values. The closer those two values, the narrower the spread, as displayed in Figure 2. It can be seen that fitting values for Y2 and Y3 are better than those of Y1. This result is also supported by significant predictors, as reported in Table 4. The life expectancy index has one significant predictor, while the other two responses have two significant predictors that increase their coefficients of determination.

Conclusions
The proposed MGR model has been developed along with its parameter estimation and hypothesis testing. The solution of parameter estimation using MLE is not closed-form such that it is optimized numerically using the BHHH algorithm. The MLRT and Wald tests are employed for testing the model's significance and the individual parameter, respectively. The proposed MGR model is applied to model the three dimensions of the human development index (HDI) with five predictor variables. The empirical results show that modeling using multiple predictors makes more sense compared to the model when it only employs a single predictor. When multiple predictors are used in the MGR model, there is a possibility that the sign of a particular parameter changes

Conclusions
The proposed MGR model has been developed along with its parameter estimation and hypothesis testing. The solution of parameter estimation using MLE is not closed-form such that it is optimized numerically using the BHHH algorithm. The MLRT and Wald tests are employed for testing the model's significance and the individual parameter, respectively. The proposed MGR model is applied to model the three dimensions of the human development index (HDI) with five predictor variables. The empirical results show that modeling using multiple predictors makes more sense compared to the model when it only employs a single predictor. When multiple predictors are used in the MGR model, there is a possibility that the sign of a particular parameter changes compared to when it is employed alone. This is a common problem that arises in modeling caused by collinearity among predictors. This issue can be overcome in future work.

Appendix A
The first derivatives of the log-likelihood function for each parameter under the null hypothesis.