Estimation and Hypothesis Testing for the Parameters of Multivariate Zero Inflated Generalized Poisson Regression Model

: We propose a multivariate regression model called Multivariate Zero Inﬂated Generalized Poisson Regression (MZIGPR) type II. This model further develops the Bivariate Zero Inﬂated Generalized Poisson Regression (BZIGPR) type II. This study aims to develop parameter estimation, test statistics, and hypothesis testing, both simultaneously and partially, for signiﬁcant parameters of the MZIGPR model. The steps of the EM algorithm for obtaining the parameter estimator are also described in this article. We use Berndt–Hall–Hall–Hausman (BHHH) numerical iteration to optimize the EM algorithm. Simultaneous testing is carried out using the maximum likelihood ratio test (MLRT) and the Wald test to partially assess the hypothesis. The proposed MZIGPR model is then used to model the three response variables: the number of maternal childbirth deaths, the number of postpartum maternal deaths, and the number of stillbirths with four predictors. The units of observation are the sub-districts of the Pekalongan Residency, Indonesia. The indicate overdispersion in the data on the number of maternal childbirth deaths and stillbirths, and underdispersion in the data on the number of postpartum maternal deaths. The empirical studies show that the three response variables are signiﬁcantly affected by all the predictor variables.


Introduction
The Poisson regression model is commonly used to analyze data in which the response variable follows a Poisson distribution. Even though the Poisson regression model has been widely applied in various disciplines, it involves an assumption that, in some cases, is difficult to realize-namely, equidispersion. Equidispersion is the condition wherein sample variance is equal to the mean. When the sample variance is greater (less) than the sample mean, it is called overdispersion (underdispersion). The use of Poisson regression on over-or underdispersed data tends to make the standard error and test statistics derived from the model inaccurate, resulting in invalid conclusions [1,2].
As alternatives, several distribution models have been developed, such as negative binomial distribution (NBD) and log-normal Poisson distribution, which can be used to resolve overdispersion. Alternative models for count data that can overcome both overand underdispersion include double Poisson distribution, gamma count distribution, and generalized Poisson distribution (GPD). The first two distribution models are weak in their probability function, which is complex, and the variance and mean have no explicit form [3]. Research on GPD has developed it into generalized Poisson regression (GPR) known as the GP-1 regression model (or the classic GPR model [4]), GP-2 regression (or the restricted GPR model [5]), the bivariate GPR (BGPR) model [6], and the multivariate GPR (MGPR) model [7].
Another problem with Poisson regression is the quantity of response data with a value of zero (excess zero). Lambert [8] developed zero inflated Poisson distribution (ZIPD) to Symmetry 2021, 13, 1876 2 of 11 resolve this problem. The ZIP model assumes that the population, or observations, consist of two groups. A single observation is included in a group that is always zero (zero states) with a probability of p, and is included in a group with a zero value and a positive value in the count data generated by Poisson distribution (Poisson state) with a probability of 1-p. The development of the ZIP model has led to the emergence of several studies on bivariate ZIP (BZIP) and multivariate ZIP (MZIP) models [9][10][11].
The ZIPR model is the proper way to handle excess zero in the Poisson model, but is not suitable for controlling overdispersion. A study by Ridout et al. in 1998 showed that the results of ZIP parameter estimation are biased when there is an overdispersion of excess zero data. In such cases, the Zero Inflated Negative Binomial (ZINB) model performs better than the ZIP model with smaller AIC and BIC values [12]. However, the iteration technique used for parameter estimation in the ZINB model often fails to converge. Therefore, Famoye and Singh (2006) introduced the Zero Inflated Generalized Poisson (ZIGP) model as an alternative to the ZINB model [13]. Czado et al. [14] defined ZIGP distribution as an analog of ZIP distribution with additional zero inflation parameters. The ZIGP regression model was formed based on ZIGP distribution. Thus, the ZIGP regression model can model count data with over/underdispersion and excess zero. The ZIGP model for univariate response has been applied in many aspects, such as domestic violence, entrepreneurship, biology, health, and zoology [1,[13][14][15][16]. Zhang and Huang (2015) introduced the bivariate ZIGP (BZIGP), which is an extension of the univariate ZIGP. Zhang and Huang proposed type I and type II BZIGP distribution with a flexible correlation structure that can be used both when the correlation is positive or negative, and when there is over or underdispersion. The difference between the two marginal distributions of BZIGP gives the zero inflation parameters. BZIGP type I has the same zero inflation parameter on both marginals, whereas the two marginals of BZIGP type II have different zero inflation parameters [17].
BZIGP type I distribution was later developed into MZIGP type I distribution by Huang et al. in 2017. MZIGP type I was applied to data with two responses: the number of days children were inactive in the previous four weeks due to sickness, and the number of days children were bedridden in the previous four weeks due to sickness [18]. MZIGP type I distribution only considers zero and non-zero response pairs. This study proposes a multivariate regression model that can be used on count data with excess zeros and several types of dispersion, and allows for several combinations of response pairs. The proposed model is called a multivariate Zero Inflated Generalized Poisson Regression (MZIGPR) type II (henceforth MZIGPR (II) ).
Based on the previously mentioned background, the aims of this study are: (i) to construct an MZIGPR (II) model, (ii) to obtain a parameters estimator, (iii) to test the significance of the model as well as the significance of the individual parameter, and (iv) to apply the MZIGPR (II) model to real data. The case study used here includes factors that influence the number of maternal childbirth deaths (Y 1 ), the number of postpartum maternal deaths (Y 2 ), and the number of stillbirths (Y 3 ). The units of observation are 91 sub-districts in Pekalongan Residency, Indonesia. The predictor variables include the percentage of childbirths assisted by health workers, the percentage of TT2+ vaccination in pregnant women, the percentage of obstetric complications that were handled, and the ratio of midwives in the population (per 10,000).
The decline in maternal mortality and neonatal mortality has become a global issue because it arises in all countries worldwide, and not only in developing countries. The first and second targets of the third goal of the Sustainable Development Goals (SDGs) are to reduce the maternal mortality ratio to less than 70 deaths per 100,000 live births, and to reduce neonatal deaths to at least 12 deaths per 1000 live births, by 2030 [19]. Despite showing a decline, Indonesia's maternal and neonatal mortality rates are still relatively high, at 305 maternal deaths per 100,000 live births and 15 neonatal deaths per 1000 live births [20,21]. Pekalongan Residency has more maternal deaths and stillbirths than other residencies in Central Java Province, Indonesia-24.0% and 20.79% [22]. To achieve the SDGs by 2030, greater efforts are required.
The discussion in this study is divided into several sections. In Section 2, we discuss the MZIGPR (II) model. Section 3 discusses the estimation of the parameters of the MZIGPR (II) model using the EM algorithm method, and the testing of the hypothesis of the MZIGPR (II) model using maximum likelihood ratio test (MLRT). The proposed approach is used to model data on the numbers of maternal childbirth deaths, postpartum maternal deaths, and stillbirths. The discussion and conclusions can be found in the last two sections.

Multivariate Zero Inflated Generalized Poisson Regression Type II (MZIGPR (II) ) Model
A k-dimensional multivariate ZIGP (MZIGP (II) ) has been constructed via a mixture of multivariate GP, univariate GP, and a distribution degenerate at the original point (0, 0, . . . ,0). Suppose Y 1 is a case of maternal childbirth death and Y 2 is a case of stillbirth. The sub-districts can then be categorized as follows, based on death cases: (i) subdistricts that have no cases of childbirth maternal deaths and stillbirths (Y 1 = 0, Y 2 = 0), (ii) sub-districts with cases of childbirth maternal deaths, but no cases of stillbirths (Y 1 > 0, Y 2 = 0), (iii) sub-districts with no cases of childbirth maternal deaths, but stillbirths (Y 1 = 0, Y 2 > 0), and (iv) sub-districts with cases of childbirth maternal deaths and stillbirths (Y 1 > 0, Y 2 > 0). To obtain a k-dimensional ZIGP multivariate model, we use a mixture of a degenerate on the zero points (0, 0, . . . , 0), k distributions with univariate GP for one type of death and k − 1 zeros (for example, (GP(λ 1 , ϕ 1 ), 0, . . . , 0), . . . , (0, . . . , 0, GP(λ k , ϕ k ))), k(k−1) 2 = C k 2 distributions with bivariate GP for two types of death and k − 2 zeros, . . . , k distributions each with (k − 1)-dimensional GP for k − 1 types of death and 1 zero, and finally, a k-dimensional GP for all k types of death. Since the amount of mixture formed depends on the number of responses used, the following discussion will focus on a model with three responses.
This MZIGP (II) distribution is used mainly for situations in which the observed events are mostly zero. Therefore, if the observed events are mostly non-zero, then the k-dimensional Y > 0 will represent all mixtures of l-dimensional Y > 0 and ( . Thus, the pmf of MZIGPR (II) can be written as follows: (1) (which can be seen in detail in Appendix A), with functions µ ki and p ki , satisfies where ϕ is the dispersion parameter, γ is the zero inflation parameter, x i is the i-th row of the covariate matrix X, β k is the q-dimensional regression parameter column vector, and γ k is the q-dimensional zero inflation parameter column vector. The MZIGPR (II) model reduces to the MGPR model when the zero inflation parameter γ = 0, and it reduces to the MZIPR model when the dispersion parameter ϕ = 0.

Parameter Estimation of MZIGPR (II) Model
The number of elements of I 0 , I 1 , I 2 , I 3 , I 4 is n 0 , n 1 , n 2 , n 3 and n 4 , where n 4 = n − n 0 − n 1 − n 2 − n 3 . There is no limit to the sample size used, regardless of whether the sample sizes are the same or different. In this study, we used different sample sizes of n 0 , n 1 , n 2 , n 3 and n 4 .
Suppose θ = β 1 , β 2 , β 3 , γ 1 , γ 2 , γ 3 , ϕ 1 , ϕ 2 , ϕ 3 , η 12 , η 13 , η 23 ; then, the likelihood function of the observation data can be written as follows: Parameter estimation is carried out using the EM algorithm. We employ a, b, c, and d as unobserved variables or latent variables. The latent variable a divides n 0 into } is a function that will be maximized using the EM algorithm. This likelihood function is proportional to The ln-likelihood function L(θ|Y com ) = (θ|Y com ) is stated as follows: , in which µ ki and p ki refer to Equation (2). The solution of the first derivative of (θ|Y com ) concerning the parameters γ and θ does not have an explicit form. Therefore The M-step in the EM algorithm is the step to updateθ whereθ (c) is the vector estimator at the c-th iteration and g θ is the gradient vector.

Hypothesis Testing of the MZIGPR (II) Model
Hypothesis testing of the MZIGPR (II) model was conducted via both simultaneous and partial parameter testing. Simultaneous parameter testing is used to determine the significance of parameters simultaneously in the model. The hypothesis for the simultaneous testing of β and γ is as follows: H 1 : at least one β kr = 0 and γ kr = 0; k = 1, 2, . . . , m; r = 1, 2, . . . , q The statistic test is G 2 = −2 ln L(ω) L(Ω) = 2 ln L Ω − ln(L(ω)) . Reject the null hypothesis if G 2 > χ 2 α,v with significance level α, and v is the difference n(Ω) − n(ω). A partial test is carried out to determine which parameters have a significant effect on the model. The hypothesis for the partial testing of β and γ is a. Partial testing of β The statistic test is Z g =γ kr se(γ kr ) . Reject the null hypothesis if Z g > Z α/2 .

Application of Data
The proposed MZIGPR (II) model is applied to model the numbers of maternal childbirth deaths, postpartum maternal deaths, and stillbirths. The data used in this study are secondary data collected by the Central Java Provincial Health Office in 2017. There are seven regencies and municipalities with 91 sub-districts in the Karesidenan Pekalongan, Central Java.
This study employs three response variables (Y) and four predictor variables (X). The three response variables are the number of maternal childbirth deaths, represented as Y 1 , the number of postpartum maternal deaths, Y 2, and the number of stillbirths, Y 3 . The four predictor variables are the percentage of childbirths assisted by medical personnel in a sub-district (X 1 ), the percentage of TT2 + immunization in pregnant women in a sub-district (X 2 ), the percentage of obstetric complications handled in a sub-district (X 3 ), and the ratio of midwives in the population (per 10,000) in a sub-district (X 4 ). Table 1 shows that the mean values of the response variables Y 1 , Y 2 , and Y 3 are 0.18, 0.67, and 2.76, respectively. As regards the data spread, measured by the coefficient of variation (CoV), the number of maternal childbirth deaths (Y 1 ) showed the largest CoV, 300.88. This indicates that the variable Y 1 is more heterogeneous than Y 2 and Y 3 . The heterogeneity of Y 1 is shown in Figure 1. Even though the histograms of Y 1 , Y 2, and Y 3 show asymmetrical curves, Y 1 s curve looks more skewed to the right compared to the other two response variables.
In the assumed MZIGP (II) distribution, the response data with a value of zero consist of two states, namely, the zero state and the generalized Poisson (GP) state. Thus, the response data Y 1 , Y 2, and Y 3, which are zero (Y ki = 0; k = 1, 2, 3; i = 1, . . . , n), also consist of two states. The first Y ki = 0 (zero states) means no maternal childbirth or postpartum deaths and no stillbirths in a sub-district. The second Y ki = 0 (GP state) means that there are maternal childbirth and postpartum deaths, and stillbirths, in a sub-district. However, during the last year when the data were collected, there were no maternal childbirth or postpartum deaths, or stillbirths. The zero value in the GP state is assumed to derive from a certain distribution; in this case, the MGP distribution.   Table 2. Hypothesis testing simultaneously on these parameters determines whether each predictor affects the response variable differently. Simultaneous testing gives the result that the statistic 2 89092.513 G = is greater than   MZIGPR (II) modeling including all predictor variables yields parameter estimates, as shown in Table 2. Hypothesis testing simultaneously on these parameters determines whether each predictor affects the response variable differently. Simultaneous testing gives the result that the statistic G 2 = 89092.513 is greater than χ 2 0.05,24 = 36.415, so H 0 is rejected. The decision to reject H 0 indicates that at least one predictor variable affects the number of maternal deaths during childbirth and postpartum, as well as stillbirths. Partial tests were performed to determine which variables had a significant influence on the model. Table 2 shows that all predictors significantly affect the three responses at a significance level of 0.05.

Discussion
There are two types of MZIGPR (II) models-the log model and the logit model. The log model states that the probability Y i of the GP state is affected by significant variables. In contrast, the logit model states that the probability Y i of the zero states is affected by significant variables. The GP state yields zero observations and positive integers generated by the GP distribution, whereas the zero states produce zero observations only.
The MZIGPR (II) model for the number of maternal deaths during childbirth or postpartum, and the number of stillbirths, based on the data of Table 2, is constructed as follows: a. Regression model of maternal childbirth deaths: b. Regression model for postpartum maternal deaths: c. Regression model for stillbirths: The estimated dispersion parameters for ϕ 1 , ϕ 2 , and ϕ 3 are 1.091, −0.148 and 0.516, respectively, which means that the dispersion is not zero. The values of ϕ 1 and ϕ 3 indicate the existence of overdispersion in the data on the number of maternal childbirth deaths and stillbirths. In contrast, the values of ϕ 2 indicate underdispersion in the data on the number of postpartum maternal deaths.
The log and logit models in Equations (6)- (8) show that the variables that significantly affect the GP state also affect the zero state. Based on the regression coefficient of the MZIGPR model, there is a pattern of relationships between several predictor variables and responses that contradict the existing theory. For example, the log model shows that the variable coefficient X 1 (the percentage of childbirths assisted by medical personnel) is positive. This means that every 1% increase in the number of childbirths assisted by medical personnel will increase the average number of maternal childbirth deaths (Y 1 ) by 1.06 people and the average number of stillbirths (Y 3 ) by 1.03 people, but decrease the average number of postpartum maternal deaths (Y 2 ) by 0.98 people when the other predictor variables were held constant. Thus, X 1 shows an inappropriate relationship pattern with Y 1 and Y 3 .
The coefficient of variable X 2 (the percentage of TT2+ vaccination in pregnant women) is negative. If all other variables are constant, then every 1% increase in pregnant women vaccinated with TT2+ will reduce the average number of maternal childbirth deaths (Y 1 ) by 0.98 people and the average number of postpartum maternal deaths (Y 2 ) by 0.97 people. Still, the number of stillbirths (Y 3 ) increased by 1.01 people. In this case, X 2 has an inappropriate relationship with Y 3 . The remaining predictor variables are interpreted in the same way as in the logit model for X 1 and X 2 .
In the logit model, the X 1 variable has a negative value. This means that every one percent increase in the number of childbirths assisted by medical personnel will reduce the average number of maternal survives during childbirth (Y 1 ) by 0.99 people, the average number of maternal survives during postpartum (Y 2 ) by 0.88 people, and the average number of live births (Y 3 ) by 0.92 people, when the other predictor variables are held constant. In the logit model, the variable X 1 has a negative value. This means that for every one percent increase in medically assisted births, the average number of mothers who survive childbirth (Y 1 ) decreases by 0.99 people. A one percent increase in X 1 will also reduce the mean number of mothers who survive after birth (Y 2 ) and the number of live births (Y 3 ) by 0.88 and 0.92 people, respectively, if the other predictor variables are constant.
The relationship pattern between X 1 and the three responses is inappropriate. The remaining variables in the logit model are interpreted in the same way as the X 1 variable in the logit model.

Conclusions
In this study, we developed an MZIGPR (II) model, along with its parameter estimation and hypothesis testing. The parameter estimation is performed using the EM algorithm, followed by the BHHH iteration. The maximum likelihood ration test (MLRT) and Wald tests have been used to tests the significance of the model and the individual parameters, respectively. The proposed MZIGPR (II) model is applied to model the number of maternal childbirth deaths, the number of postpartum maternal deaths, and the number of stillbirths.
MZIGPR (II) can be used in cases of both overdispersion and underdispersion. We found underdispersion in the data on the number of postpartum maternal deaths and overdispersion in the data on the number of maternal childbirth deaths and stillbirths. The empirical results show that all four predictors affect the three response variables.
The main limitation of this study is that the data used are under-representational, because they were only collected at public health centers (puskesmas). Several variables thought to affect the numbers of maternal deaths during childbirth and postpartum, as well as stillbirths, were not used due to limited data availability at the sub-district level. In further research, the predictor variables used in this study can be replaced or added to other more relevant variables, according to the existing theory.
The MZIGPR (II) model's global parameters can be used in all observation locations. Differences in regional characteristics and geographical conditions pertaining in each observation location make this global model less accurate. MZIGPR (II) modeling with spatial aspects can be undertaken in further research.

Conflicts of Interest:
The authors declare no conflict of interest.