3.3. Hypothesis Testing of MPIGR Model
Hypothesis testing for the MPIG regression (MPIGR) model was carried out with the Maximum Likelihood Ratio Test (MLRT) method, both simultaneously and partially. The simultaneous test determines whether the regression parameters in the model are jointly significant. The null hypothesis is βj1 = βj2 = … = βjk = … = βjp = 0 and τ = 0, and the alternative hypothesis is that at least one βjk ≠ 0 and τ ≠ 0, where j = 1, 2, …, m and k = 1, 2, …, p.
Let Ω be the set of parameters under the population and let ω be the set of parameters under the null hypothesis. L(Ω̂) is the likelihood of the full model, which includes all of the predictor variables, and L(ω̂) is the likelihood of the reduced (null) model without predictor variables. The likelihood function for each model is as follows:
and
The test statistic for the simultaneous test of the MPIGR model is formulated as follows.
The log-likelihood function in Equation (22) is maximized by taking its first-order partial derivatives with respect to each of its parameters, and the results are as follows.
Generally, we get:
Hence, the statistic G² for the MPIGR model is obtained by substituting Equations (9) and (22) into Equation (23), and the result is as follows:
The details of the statistic G², based on the likelihood ratio test method, are described in Appendix A. The statistic G² asymptotically follows the Chi-square distribution, so that, at significance level α, the null hypothesis is rejected when the value of G² falls into the rejection region, i.e., when G² exceeds the corresponding Chi-square critical value.
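For orientation, the general form of a likelihood ratio statistic and its decision rule can be written as below. This is the textbook form, not a reproduction of the MPIGR-specific expression in Appendix A. The symbols L(Ω̂) and L(ω̂) denote the maximized likelihoods under the population and under the null hypothesis, and the degrees of freedom v (the difference in the number of free parameters between the two models) is a label assumed here for illustration.

```latex
G^{2} \;=\; -2 \ln \left( \frac{L(\hat{\omega})}{L(\hat{\Omega})} \right)
      \;=\; 2 \left[ \ln L(\hat{\Omega}) - \ln L(\hat{\omega}) \right],
\qquad
\text{reject } H_{0} \text{ if } G^{2} > \chi^{2}_{(\alpha,\, v)} .
```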
3.4. Application
The MPIGR model in this study is applied to model the numbers of infant, child, and maternal deaths. The data were collected from the Health Profile of Java, Indonesia, in 2017. Java Island has six provinces with 119 cities or municipalities. Because of data limitations, Banten Province was not included in this study, so only 111 cities or municipalities were used.
The study uses three correlated response variables: the number of infant deaths (Y1), the number of under-five child deaths (Y2), and the number of maternal deaths (Y3). There are eight predictor variables: the percentage of pregnant women with antenatal care visits (X1), the percentage of pregnant women who received Fe3 tablets (X2), the percentage of complete neonatal visits (X3), the percentage of Low Birth Weight (LBW) (X4), the percentage of healthy houses (X5), the percentage of active integrated service posts (X6), the percentage of infants who received vitamin A (X7), and the percentage of births assisted by health workers (X8). Several of the selected predictor variables, which are quite important for modelling the response variables, are not available for Banten Province. Banten Province was therefore excluded from the study.
Every region in Java Island has different characteristics, so an exposure variable is needed to make the cities and municipalities comparable. The exposure variable used in this study is the number of live births in each city or municipality in Java.
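A common way to use an exposure variable in count regression is to add its logarithm to the linear predictor as an offset, so that the expected count scales with the number of live births. The minimal sketch below assumes a log link and this offset form; the function name, coefficients, and input values are hypothetical and are not taken from the fitted MPIGR model.

```python
import math

def expected_count(exposure, x, beta):
    """Expected count under a log link with log(exposure) as an offset.

    Assumed form: mu = exposure * exp(x' beta).
    """
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return exposure * math.exp(linear_predictor)

# Hypothetical city: 10,000 live births, an intercept and one predictor.
mu = expected_count(exposure=10_000, x=[1.0, 0.5], beta=[-4.6, -0.2])
print(round(mu, 1))
```

With this form, two regions with the same predictor values but different numbers of live births get expected counts in proportion to their exposures.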
Based on Table 1, the means of the three response variables are 118.08, 20.41, and 16.4. Because the means of Y1, Y2, and Y3 differ greatly, a scale-free measure is needed to compare the spread of the data. The coefficient of variation (CoV) can be used for this purpose. The CoVs of Y1, Y2, and Y3 are 63.4, 425.6, and 89.1, respectively. The number of child deaths (Y2) has the highest CoV, which means that Y2 is more heterogeneous than the other two variables. This is also supported by the histograms of Y1, Y2, and Y3 in Figure 1, which show that the distribution of Y2 is more skewed to the right than those of Y1 and Y3.
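The coefficient of variation divides the standard deviation by the mean, which is why it allows responses with very different means to be compared. The sketch below expresses it as a percentage, which is an assumption about how the reported values are scaled; the two samples are hypothetical and are not the study data.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical counts, chosen only to contrast an even and a skewed sample.
even_counts = [18, 20, 22, 19, 21]
skewed_counts = [2, 3, 1, 4, 90]

print(coefficient_of_variation(even_counts))
print(coefficient_of_variation(skewed_counts))
```

A single large value, as in the second sample, inflates the CoV sharply, which matches the pattern of a long right tail in a histogram.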
Table 2 displays the characteristics of each predictor that is presumed to influence the numbers of infant, child, and maternal deaths. The characteristics of each predictor are described by the mean and standard deviation for each province in Java Island. As health indicators, these predictors are expected to meet their targets in order to improve the quality of health in Indonesia.
Predictors other than LBW are expected to have high percentages, because these predictors are thought to reduce the numbers of infant, child, and maternal deaths. In contrast, LBW is expected to have a low percentage, because LBW is believed to increase the number of infant deaths.
Based on Table 2, almost all of the provinces in Java reach an average of 80–90% for some predictors, except LBW. Still, some provinces have low percentages; Yogyakarta, for example, has an average percentage of complete neonatal visits of 77.32%, which should be increased. Furthermore, for the percentage of active integrated service posts, all provinces except Jakarta have averages of 60–78%, which is quite low. The active integrated service post is a form of Community-Based Health Efforts that helps infants, children, and mothers obtain health services. Aside from active integrated service posts, the percentage of healthy homes has not reached 80% in any province in Java other than Central Java Province. As a suggestion to the government, the role of the community should be strengthened in order to encourage people to take part in running active integrated service posts and to raise the percentage of healthy homes.
The characteristics of each predictor provide a description of, and a presumption about, how the predictors affect the numbers of infant, child, and maternal deaths in Java. Further analysis is required to obtain more accurate results. The MPIGR model is used to determine which predictors significantly affect the numbers of infant, child, and maternal deaths in Java. Before applying the MPIGR model, it is necessary to test the overdispersion assumption. Overdispersion occurs when the variance is higher than the mean. It is indicated when the ratio of the deviance to its degrees of freedom is greater than one and the ratio of the Pearson Chi-square value to its degrees of freedom is greater than one.
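The two overdispersion ratios can be illustrated with a deliberately simplified case. The sketch below assumes an intercept-only Poisson fit, where the fitted mean is the sample mean and the degrees of freedom are n − 1; the study itself would use a regression with predictors. The function name and the counts are hypothetical.

```python
import math

def dispersion_ratios(y):
    """Deviance/df and Pearson chi-square/df for an intercept-only Poisson fit."""
    n = len(y)
    mu = sum(y) / n   # maximum likelihood estimate of the Poisson mean
    df = n - 1        # one parameter (the intercept) is estimated
    pearson = sum((v - mu) ** 2 / mu for v in y)
    # The term y * log(y / mu) is taken as 0 when y == 0.
    deviance = 2.0 * sum(
        (v * math.log(v / mu) if v > 0 else 0.0) - (v - mu) for v in y
    )
    return deviance / df, pearson / df

counts = [3, 0, 12, 1, 25, 2, 7, 40]  # hypothetical counts, not the study data
dev_ratio, pearson_ratio = dispersion_ratios(counts)
print(dev_ratio > 1 and pearson_ratio > 1)
```

Ratios well above one, as for these counts, suggest that a plain Poisson model understates the variance and that a mixed Poisson model is worth considering.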
Table 3 shows that all of the response variables suffer from overdispersion, because the values of deviance/df are higher than one. Therefore, the MPIGR model should be used to model the data. The relationship between each pair of response variables was measured by Pearson's product–moment correlation. The Pearson correlation coefficient between Y1 and Y2 is 0.543 (p-value = 6.97 × 10⁻¹⁰), and that between Y1 and Y3 is 0.587 (p-value = 1.29 × 10⁻¹¹). In contrast, the Pearson correlation coefficient between Y2 and Y3 is 0.130 (p-value = 0.172). Even though one pair of response variables is not significantly correlated, we need to check whether the response variables are dependent in a multivariate sense. Therefore, we tested the correlation matrix using Bartlett's test. The result shows that the p-value (7.20 × 10⁻²⁰) is less than α (0.05). The decision is to reject the null hypothesis, so the Pearson correlation matrix is not equal to an identity matrix. Thus, the response variables can be used in a multivariate analysis with the MPIGR model.
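The multivariate dependency check can be sketched with Bartlett's test of sphericity in its usual form, where the statistic is −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom. That the study used exactly this form is an assumption. The sketch uses the three reported pairwise correlations and n = 111; the helper functions are hypothetical names written with the standard library only.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def chi2_sf(x, df):
    """Upper-tail probability of a Chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                 for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

def bartlett_sphericity(corr, n):
    """Return the test statistic, degrees of freedom, and p-value."""
    p = len(corr)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return stat, df, chi2_sf(stat, df)

# Correlation matrix built from the reported pairwise coefficients.
R = [[1.0, 0.543, 0.587],
     [0.543, 1.0, 0.130],
     [0.587, 0.130, 1.0]]
stat, df, p_value = bartlett_sphericity(R, n=111)
print(stat, df, p_value)
```

Because the correlations are rounded to three decimals, the output should be read as an approximation; it gives a p-value far below α = 0.05, which agrees with the decision to reject the null hypothesis.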
The simultaneous test gives G² = 39.86 × 10⁸, which is higher than the Chi-square critical value; hence, the decision is to reject the null hypothesis. This means that at least one predictor variable significantly influences the numbers of infant, child, and maternal deaths.
Partial hypothesis testing is carried out to determine which predictor variables significantly influence the numbers of infant, child, and maternal deaths in Java. Table 4 shows the estimation results for the MPIGR model parameters. The estimate of the dispersion parameter (τ) is 0.493, with p < 0.001. Based on the empirical results summarized in Table 4, all of the predictor variables have a significant effect on the three responses. The MPIGR model for these three responses and eight predictors can be written in the following equations:
To determine whether the model fits the data well, we use the Mean Squared Error (MSE), which measures the average squared difference between the estimated and actual values. The Root Mean Squared Error (RMSE) serves as an estimate of the standard deviation of each response; the standard deviations of the observed responses are reported in Table 1. The MSE and RMSE for the response variables are tabulated in Table 5. The RMSE values are close to the standard deviation of each response. These empirical results also suggest that the predicted responses are relatively close to the observed values.
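The two fit measures can be computed per response as in the short sketch below. The function names and the observed and predicted values are hypothetical and are used only to show the arithmetic.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the MSE, on the same scale as the response."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical observed and predicted counts for one response.
observed = [12, 30, 7, 45, 18]
fitted = [15, 26, 9, 40, 20]
print(mse(observed, fitted), rmse(observed, fitted))
```

Because the RMSE is on the scale of the counts themselves, it can be compared directly with the standard deviation of the corresponding response.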
In addition to the RMSE values in Table 5, scatter plots of the observed and predicted values of each response are shown to illustrate how well the MPIGR model predicts the observed data. To support this result, Figure 2 shows how spread out the residuals are. Based on Figure 2, the fitted values for Y1 and Y3 are better than those for Y2. These empirical results can, of course, be improved. The findings are a major concern for future research and are possibly related to the spatial dependencies among the responses, which are discussed in the coming section.