On the Bayesian Mixture of Generalized Linear Models with Gamma-Distributed Responses

: This paper proposes enhanced studies on a model consisting of a ﬁnite mixture framework of generalized linear models (GLMs) with gamma-distributed responses estimated using the Bayesian approach coupled with the Markov Chain Monte Carlo (MCMC) method. The log-link function, which relates the mean and linear predictors of the model, is implemented to ensure non-negative values of the predicted gamma-distributed responses. The simulation-based inferential processes related to the Bayesian-MCMC method is carried out using the Gibbs sampler algorithm. The performance of proposed model is conducted through two real data applications on the gross domestic product per capita at purchasing power parity and the annual household income per capita. Graphical posterior predictive checks are carried out to verify the adequacy of the ﬁtted model for the observed data. The predictive accuracy of this model is compared with other Bayesian models using the widely applicable information criterion (WAIC). We ﬁnd that the Bayesian mixture of GLMs with gamma-distributed responses performs properly when the appropriate prior distributions are applied and has better predictive accuracy than the Bayesian mixture of linear regression model and the Bayesian gamma regression model.


Introduction
The finite mixture model, as a class of model-based clustering methods, provides a flexible model for capturing heterogeneous data in a data-driven manner. The finite mixture of statistical distribution, commonly called the finite mixture distribution, is the most basic type of finite mixture model. Due to its flexibility for modeling heterogeneous data, the finite mixture model with its various developments has been widely studied in the research. The finite mixture model of the exponential family of distributions is one of the models included in that study. Wiper et al. (2001) determined a mixture of gamma distribution for estimating some quantities for a M/G/1 queue. Lopera et al. (2011) introduced a Bayesian analysis for the mixture of normal-exponential distributions including joint modeling of the mean and variance. Garrido and Cepeda (2012) proposed the mixtures of normal and gamma distributions belonging to the biparametric exponential family. The simulated studies and applications of heteroscedastic Weibull-Normal mixture models with the Bayesian approach were discussed in Garrido and Cuervo (2014).
Further advanced models have been established by including generalized linear models (GLMs) in each mixture component. The finite mixture of GLMs allows the construction of different parameters in each mixture component, hence, it can accommodate different heterogeneous cases that occur in the data. Wedel and Kamakura (2000) defined this model as the Generalized Mixture Regression Models (GLIMMIX), whereas Grün and Leisch (2008) called it the Finite Mixtures of Generalized Linear Regression Models. Inferential methods, through frequentist or Bayesian approaches, can be implemented to estimate the finite mixture of GLMs. The Bayesian framework, which performs an estimating procedure concerning the posterior distribution of parameters, has some advantages when used to determine the parameter of the finite mixture model. One of the advantages of the Bayesian approach is that when regularity conditions are not fulfilled, i.e., when sample size and mixture component proportions are small, the Bayesian parameter estimation still delivers a valid inference and does not need asymptotic normality (Frühwirth-Schnatter 2006). Since the joint posterior distribution incorporates all possible sample observations into the maximum number of mixture components, its determining process through the Bayesian framework is most difficult to solve analytically. A numerical approach, the well-known Markov Chain Monte Carlo (MCMC) method, needs to be implemented to deal with that problem. The Gibbs sampler algorithm carried out by augmenting the data is one of the algorithms in MCMC that is commonly applied to generate iteratively random samples from the posterior distribution of parameters. The iterative computing procedure conducted within the MCMC makes a structured simulation-based inference when the Bayesian framework is coupled with the MCMC.
Many papers studied the finite mixture of GLMs estimated by the Bayesian-MCMC approach. Lenk and DeSarbo (2000) introduced the finite mixture of GLMs with random effects and implemented the Gibbs sampler algorithm to fit the regression coefficients formed as a finite mixture of normal distributions. Hurn et al. (2003) developed a mixture of the Poisson and the logistic regression models, which were estimated through the Gibbs sampler and the Metropolis-Hastings algorithms. The application of a finite mixture of negative binomial regression for modeling heterogeneity in accident data was analyzed by Park and Lord (2009). Meanwhile, the mixture of Bernoulli regression was proposed by Iriawan et al. (2018) to determine the admission conditions of the Bidikmisi scholarship for students wanting to enroll in universities.
The purpose of this study is to examine more extensively a finite mixture of GLMs with gamma-distributed responses with a known number of mixture components estimated by using the Bayesian-MCMC approach. GLMs with gamma-distributed responses are commonly applied to model non-negative, continuous, and positive-skewed data, which are characteristics of economic distribution data. The study focuses on the application side rather than the theoretical side to find out the advantages of the proposed model. We consider appropriate descriptions of how Bayesian-MCMC is used to estimate the proposed model; then, we demonstrate the particular performances of the proposed model using two real applications with distinct response distributions. In the first real case, i.e., modeling the gross domestic product at purchasing power parity per capita, we study the responses that have a gamma distribution. In contrast, in the second real case, i.e., modeling the household income clustered in different sub-populations, we examine the responses that do not fit the gamma distribution but satisfy the essential characteristics required for developing the proposed model. Furthermore, specific extreme responses are detected in each mixture component in this second real case. Those two case studies are expected to strengthen the proposed model's application. The performance of the proposed model in prediction accuracy is compared with the Bayesian mixture of linear regression and the Bayesian gamma regression model through the WAIC. The Bayesian-MCMC computations are performed using the BUGS language through MultiBUGS, a software that supports the parallel computation of MCMC chains (Goudie et al. 2020). The convergence diagnostic of MCMC chains is verified through a CODA package which was developed by Plummer et al. (2006). This paper is arranged in the following steps: in Section 2, the flowchart diagram of the research methodology is described. In Section 3, the Bayesian framework for inferencing the finite mixture of GLMs with gamma-distributed responses is provided with the required computation schemes for the MCMC method. In addition to describe the WAIC, we also present the Gelman-Rubin method, which is considered to be a suitable method for assessing the convergent condition of MCMC chains. The real data applications are Econometrics 2022, 10, 32 3 of 28 presented in Section 4; Section 5 provides some discussions; and the main conclusions are given in Section 6.

Materials and Methods
The workflow of the research methodology that is applied in this paper is presented in Figure 1.
Econometrics 2022, 10, x FOR PEER REVIEW 3 of 31 inferencing the finite mixture of GLMs with gamma-distributed responses is provided with the required computation schemes for the MCMC method. In addition to describe the WAIC, we also present the Gelman-Rubin method, which is considered to be a suitable method for assessing the convergent condition of MCMC chains. The real data applications are presented in Section 4; Section 5 provides some discussions; and the main conclusions are given in Section 6.

Materials and Methods
The workflow of the research methodology that is applied in this paper is presented in Figure 1.

Bayesian Approach for the Finite Mixture of GLMs
In a generalized linear model, a framework is constructed through a relationship between random samples i y as a response and a vector of predictors x as a j-th predictor on the i-th observation, = 1, 2,..., jp , and = 1,2,..., in . The responses are considered to be generated from a family of exponential distributions. The structure of the GLMs is formulated as (1).
where  i is a linear predictor; (.)

Bayesian Approach for the Finite Mixture of GLMs
In a generalized linear model, a framework is constructed through a relationship between random samples y i as a response and a vector of predictors x i = (1, x i1 , . . . , x ip ) t with x ij as a j-th predictor on the i-th observation, j = 1, 2, . . . , p, and i = 1, 2, . . . , n. The responses are considered to be generated from a family of exponential distributions. The structure of the GLMs is formulated as (1).
where η i is a linear predictor; g(.) is a link function; µ i is an expected value of the response y i ; and β is a vector of unknown coefficients; β = (β 0 , β 1 , . . . , β p ) t . The vector β can represent the unknown parameters that describe the relationship between predictor variables and the response. Grün and Leisch (2008) developed a finite mixture of a generalized linear regression model by inserting the generalized linear model (1) into mixture components. A finite mixture of GLMs with K known mixture components is defined by (2): where ϑ denotes a vector of all parameters included in the model; f k (y i |x i , θ k ) represents the density function which is assumed to be a family of exponential distributions having a particular vector of parameter θ k ; and w k is a mixing parameter that satisfies ∑ K k=1 w k = 1 and w k > 0, ∀k, k = 1, 2, . . . , K. Thus, the mean on the k-th mixture component is provided by (3): where β k = (β 0k , β 1k , . . . β pk ) t is a vector of unknown coefficients β on the k-th mixture component. The extension of the model can be taken based on the mixing parameter w k , the specific density function f k (y i |x i , θ k ), and the link function g(.). The mixing parameter can be formed as a function that depends on a set of predictors recognized as concomitant variables (Grün and Leisch 2008): It is commonly considered that all mixture components have the same link functions and the same statistical distribution family. In this paper, we limit our study based on model (2), where the mixing parameter is not a function of the predictors.
The finite mixture of GLMs with gamma-distributed response is defined by supposing the response Y be a random variable distributed as a gamma distribution. It has a probability density function (pdf) specified as follows: where Γ(.) denotes the gamma function; α is a shape parameter; λ is an inverse scale parameter; α, λ > 0; and I(.) is the indicator function (Corrales and Cepeda-Cuervo 2019). The mean and variance of Y are given by E(Y) = µ = α/λ and Var(Y) = α/λ 2 . As proposed by Wiper et al. (2001), who considered that λ = α/µ, Equation (4) can be reparameterized as a function of shape α and parameter µ; then, the pdf for the gamma distribution can be rewritten as follows: Let y 1 , y 2 , . . . , y n be positive independent random samples that follow a gamma distribution as defined by mean µ i and shape parameter α: with two appropriated link functions: g(µ i ) = 1/µ i , as the canonical link or the inverse-link function and g(µ i ) = log(µ i ) as the log-link function. Model (2) can be reformed by the following equation: where ϑ = (α, β 1 , β 2 , . . . , β K , w) t ; and α = (α 1 , α 2 , . . . , α K ) t is a vector of shape parameters for the pdf of gamma distribution on k-th mixture component G k (y i |α k , α k /µ ik ), β k = (β 0k , β 1k , . . . , β pk ) t , k = 1, 2, . . . , K for µ ik defined in (3), and w = (w 1 , w 2 , . . . , w K ) t is a vector of mixing parameters. When the inverse-link function and the log-link function are implemented, the mean µ ik is reformulated, respectively, as (7) and (8): Econometrics 2022, 10, 32 5 of 28 for the inverse-link function and for the log-link function. β jk and x ijk are the coefficients of regression and the predictor of the k-th mixture component, respectively, where i = 1, 2, . . . , n, and j = 1, 2, . . . , p.
In spite of the fact that the inverse-link function is a canonical link function for gamma distribution, its implementation can produce a negative value for the predictive mean. Therefore, it does not assure a positive value of the mean as required by a response that has a gamma distribution. Unlike the inverse-link function, the use of the log-link function ensures the positivity of the predictive mean (Myers et al. 2012). Based on our implementation studies, we find computationally that the inverse-link function can not fulfill required results related to an appropriate positive mean. Hence, this paper focuses only on the finite mixture of GLMs with gamma-distributed responses and the log-link function, which is formulated as follows: where µ ik(log) is defined by Equation (8). We assume that the number of mixture components K is known and the same predictors are implemented in each mixture component. It can have a different number of observations. In the next section, we shorten the term of model (9) as a finite mixture of the log-link gamma GLMs.

Bayesian Framework for Inference
Concerning the Bayesian approach for estimating unknown parameters of model (9), unknown parameters are regarded as random variables that should be estimated through the posterior distribution of parameters. The joint posterior distribution for ϑ, π(ϑ|y, x), can be formed mathematically in proportional form: where p(ϑ) is the joint prior distribution of ϑ, and p(y|ϑ, x) is the likelihood, which is defined by thus, the joint posterior distribution with a log-link function is given by where µ ik(log) is defined by Equation (8).
In the finite mixture modeling, not all observations are allocated in only one mixture component. It needs a random variable called "a latent random variable" to act as an indicator for the allocation of the observation in the mixture components. This latent random variable denoted as z i is supposed to represent missing or incomplete data. It causes the computation of the likelihood function to be performed based on an incomplete data scheme. Diebolt and Robert (1994) proposed the Bayesian-MCMC approach to estimate the unknown parameters of a finite mixture model. Considering the vector of a latent random variable z i = (z i1 , z i2 , . . . , z iK ) t , it should be used to indicate in which mixture component the observation y i belongs. The value z ik ∈ {0, 1} and ∑ K k=1 z ik = 1 with z ik = 1 if the observation y i is drawn from the k-th mixture component, but otherwise z ik = 0. Hence, the complete-data likelihood function for the finite mixture of the log-link gamma GLMs can be expanded from (11) to be rewritten as: By using (14), the joint posterior distribution (13) can be reformulated as: where z = (z 1 , z 2 , . . . , z n ) and n k = ∑ n i=1 z ik represents the number of observations included in the k-th mixture component.

Bayesian-MCMC Approach
The process of MCMC computation needs a suitable choice for the prior distribution: a subjective probability distribution containing the experimenter's subjective belief relating to the true value of parameters that apparently occur. We propose some possibilities to choose reasonable priors for α, β 1 , β 2 , . . . , β K , and w. Wesner et al. (2020) who performed a Bayesian GLMs with gamma-distributed responses and a log-link function implemented the gamma distribution as a prior distribution for the shape parameter α. We adopt that prior in each mixture component; which can be stated as for k = 1, 2, . . . , K. The parameters of gamma distribution, υ and ν, can be chosen informatively regarding the distributional pattern of data. However, the selection of parameters in the gamma distribution as the prior distribution of shape parameters needs to be done attentively because it can affect the inference process of the posterior parameters. Concerning the coefficients β k , we take the prior distribution which is suitable for generalized linear modeling. Gelman et al. (2008) suggested a weakly informative prior for the coefficient β in the GLMs. That prior can provide a stable condition on the model selection process through a posterior predictive (Gelman et al. 2017). Lemoine (2019) proposed several weakly informative priors based on a normal distribution with zero mean as β 0k , β jk ∼ N(0, σ 2 ) Econometrics 2022, 10, 32 7 of 28 where j = 1, 2, . . . , p, and k = 1, 2, . . . , K. While the mixing parameter w that belongs to a simplex, i.e., (w 1 , . . . , w K ): ∑ K k=1 w k = 1, w k > 0, ∀k, k = 1, 2, . . . , K has the Dirichlet distribution, Dir(e 1 , e 2 , . . . , e K ), as a conjugate prior distribution. The Dirichlet distribution is a widely used distribution for modeling compositional data described as a measurement of proportions. The prior parameters e 1 , e 2 , . . . , e K are supposed to be the same in each mixture (i.e., e k = e 0 ), after which the conjugate prior for w becomes an equation: According to the explanations above, we apply the gamma, normal, and Dirichlet distributions as recommended by prior distributions for α, β 1 , β 2 , . . . , β K , and w respectively.
The Gibbs sampler algorithm is carried out to evoke random samples from the fullconditional posterior distribution of the parameter, which implies that the distribution of each parameter is given conditionally by the data and the other remaining parameters (Gelman et al. 2013). Since z ik ∈ {0, 1} and ∑ K k=1 z ik = 1, the full-conditional posterior distribution of z has a multinomial distribution: where Pr(z ik |α, w, β k , y i , x i ), which stands as the probability of each element z ik determined by for k = 1, 2, . . . , K. The full-conditional posterior distribution of the shape parameter α is formed by where the prior distribution p(α) is given by (16). Referring to (15) and (18), the full conditional posterior distribution for the parameter w is given by: The full conditional posterior distributions of unknown coefficients, β k are obtained as follows: where β \k denotes all β except β k and the prior distribution p(β k ) is given by (17). Algorithm 1 provides a Gibbs sampler for estimating the unknown parameters α, β 1 , β 2 , . . . , β K , and w. Algorithm 1: the Gibbs sampler for estimating the finite mixture of the log-link gamma GLMs.
Such an estimation process of mixture models through the latent allocation variables z is applied to a finite mixture modeling with observations drawn from the whole population rather than from different sub-populations, as noted by Rufo et al. (2006). It means that the latent allocation variables z, represented as missing data, are practically unobserved although the z variables are iteratively determined during MCMC computation as in Algorithm 1. If such variables are known, i.e., all observations already recognize their membership on each mixture component, then the Bayesian inferential procedure is implemented for each mixture component (Frühwirth-Schnatter 2006). We mention a finite mixture of GLMs with gamma-distributed responses, which has its estimation schemes developed by the Bayesian framework coupled with the MCMC method as a Bayesian mixture of GLMs with gamma-distributed responses. In a shortened designation, it can be called a Bayesian mixture of the log-link gamma GLMs.

Convergence Diagnostics
In the Bayesian schemes, the convergence of MCMC simulation implies that the Markov chain attains the posterior distribution of parameters. The MCMC convergence condition can be shown graphically by the trace plot of MCMC dynamical movements. The trace plot outlines the number of iterations against the generated values of estimated parameters. If these values, which are inside a domain, do not have firm periodicities or tendencies, then it can be supposed that convergence is reached. In a practical manner, the Econometrics 2022, 10, 32 9 of 28 trace plot which virtually has a "fat hairy caterpillar" like plot indicates a convergence of the Markov chain (Tatarinova and Schumitzky 2015). Other analytical methods that can be implemented to assess the convergence of MCMC are fully explained by Cowles and Carlin (1996). One of the analytical methods, the Gelman Rubin method, is recommended to be implemented during the estimation of the Bayesian model through MCMC simulation (Gelman and Shirley 2011).
The Gelman Rubin method constructs m mutually independent Markov chains, which assess its convergence by estimating the potential scale-reduction factor (PSRF). Every m Markov chains is convergent if the PSRF is less than 1.2 (Tatarinova and Schumitzky 2015). The implementation of convergence diagnostic methods for MCMC in the context of Bayesian mixture modeling is discussed in Suryaningtyas et al. (2018), Susanto et al. (2019), and Iriawan et al. (2019). All authors suggest that it should combine diagnostics methods with a graphical approach to determine the convergence condition of MCMC.

Model Selection
The Bayesian mixture models can be selected based on their out-of-sample predictive accuracy which can be estimated through information criteria approaches. Watanabe (2010) introduced the WAIC which is recommended for singular statistical models in which finite mixture models are classified into these models. The WAIC is formed by estimating the expected log pointwise predictive density ( elppd waic ): where lppd is the estimation of the log pointwise predictive density, and p waic is the estimated effective number of parameters. lppd is defined by: where S is the number of simulations performed and ϑ (s) is the simulated values of parameter ϑ at the s-th iteration with s = 1, 2, . . . , S. p waic is calculated based on the posterior sample variance of the log pointwise predictive density which is summed over all the data points y i : Another version of p waic that can be used is constructed as a mean-based formula: The variance-based formula (24) is more appropriate for practical use than the meanbased formula (25) since the variance-based formula gives results closer to the leaveone-out cross-validation (LOO-CV) as a natural method for estimating the out-of-sample predictive accuracy. In our proposed model, the values of pointwise predictive density h log y i |x i , ϑ (s) are calculated for each observation from Equation (9) in which ϑ (s) = (α (s) , β 1 (s) , β 2 (s) . . . , β K (s) , w (s) ) t are computed by Algorithm 1. The WAIC can be defined as a deviance scale form by multiplying Equation (23) by −2: Equation (26) is suitable since it can be compared with other measures of deviance, e.g., AIC and DIC (Gelman et al. 2014). The WAIC value of a model cannot be interpreted without regarding the WAIC values of other models. From the viewpoint of predictive accuracy, the performance of the two models can be compared by measuring the difference in their WAIC values. If the difference is significantly large, then the model that has the lowest value of WAIC has better predictive accuracy than the others.

Real Data Applications
Faraway (2016) considers the GLMs with gamma-distributed responses to be appropriate in two conditions. First, if the response clearly has a gamma distribution, then the GLMs with gamma-distributed responses is certainly applicable. Second, if another condition is not sure about a statistical distribution of the response but supposes there is a relationship between the mean and the variance of the response. Thus, we use two real data applications that have a different typical distribution of response to show the specific performances of the Bayesian mixture of the log-link gamma GLMs.
In the first case, we study the modeling of the gross domestic product at purchasing power parity (GDP_PPP) per capita in 160 countries. GDP_PPP per capita as the response for the overall data as well as in each mixture component has a gamma distribution with no extreme values.
Conversely, in the second case, the household income data as the whole response are not identified as following a specific statistical distribution. Nevertheless, in each mixture component, the responses have different statistical distributions. In the first mixture component, the response has a generalized gamma distribution with four parameters, and in the other mixture components, the responses have lognormal distribution with three parameters. The data responses furthermore satisfy some essential properties such as being non-negative, continuous, positive-skewed, and the existence of linear relationships between the natural logarithm of the response and household characteristics as predictors. Such a case represented the implementation of the Bayesian mixture of the log-link gamma GLMs for modeling the responses that do not follow the gamma distribution but fulfil important characteristics that are needed to determine the model. Moreover, in this case, some extreme responses lay in each mixture component.

Modeling GDP_PPP
The GDP_PPP per capita gives gross domestic product per capita values in current international dollars transformed by purchasing power parity, which is a converter factor that can provide possibly determine economic comparisons between countries. For example the population weighted Gini ratio measure of income inequality based on GDP_PPP per capita can be used to assess intercountry income disparity (World Bank 2020). Other uses for purchasing power parity are recently considered by the World Bank (2021b).
We study the GDP_PPP per capita of 160 countries for 2019 as the response that is affected by the predictors of economic and social factors. There are four economic and social factors: number of populations, compulsory education years, gross domestics product (GDP), and corruption perception index (CPI). These countries are classified based on four economic groups: low, lower-middle, upper-middle, and high income (World Bank 2021a). We designate these groups as first, second, third, and fourth. Among the countries, there are 20, 42, 44, and 54 countries in the first, second, third, and fourth groups, respectively. The complete list of 160 countries can be seen in Appendix A.

Data Description
Data were obtained from World Development Indicators (WDI) of World Bank Open Data (2021) except for the CPI data, which were accessed from Transparency.org (2021). Before we apply our proposed model, we examine some data characteristics. Figure 2 shows the GDP_PPP per capita has a positive-skewed distribution which tends to follow a gamma distribution. Econometrics 2022, 10, x FOR PEER REVIEW 12 of 31 To achieve more accurate results, goodness-of-fit tests using the Kolmogorov-Smirnov and the Chi-Squared tests are conducted to verify a significant statistical distribution for the GDP_PPP per capita data. We set the null hypothesis, which state that the GDP_PPP per capita in whole data and in each group follow a gamma distribution. Table 1 shows that all p-values are higher than a significance level of 0.05, which means that the null hypothesis is not rejected. It can be concluded that gamma distribution can be used to fit the GDP_PPP per capita for whole data and grouped data. The implementation of the log-link gamma GLMs conceptually needs a linear relationship between the natural logarithm of the response and the predictors (Myers et al. 2012). The scatter plots displayed in Figure 3 represent the linear relationships between the natural logarithm of GDP_PPP per capita with the four predictors. The natural logarithm of GDP_PPP per capita has a linear relationship with the compulsory education years and the CPI. Other predictors, i.e., the number of populations and the GDP, do not have that linear relationship. To achieve more accurate results, goodness-of-fit tests using the Kolmogorov-Smirnov and the Chi-Squared tests are conducted to verify a significant statistical distribution for the GDP_PPP per capita data. We set the null hypothesis, which state that the GDP_PPP per capita in whole data and in each group follow a gamma distribution. Table 1 shows that all p-values are higher than a significance level of 0.05, which means that the null hypothesis is not rejected. It can be concluded that gamma distribution can be used to fit the GDP_PPP per capita for whole data and grouped data. The implementation of the log-link gamma GLMs conceptually needs a linear relationship between the natural logarithm of the response and the predictors (Myers et al. 2012). The scatter plots displayed in Figure 3 represent the linear relationships between the natural logarithm of GDP_PPP per capita with the four predictors. The natural logarithm of GDP_PPP per capita has a linear relationship with the compulsory education years and the CPI. Other predictors, i.e., the number of populations and the GDP, do not have that linear relationship. Econometrics 2022, 10, x FOR PEER REVIEW 13 of 31 Therefore, we only assign the compulsory education years and the CPI as predictor variables in our proposed model, which has four mixture components representing the four economic groups. The model is defined by with         Therefore, we only assign the compulsory education years and the CPI as predictor variables in our proposed model, which has four mixture components representing the four economic groups. The model is defined by with ϑ = (α, β 1 , β 2 , β 3 , β 4 ,w) t and µ i k k(log) = exp β 0k + β 1k eduyears i k k + β 2k cpi i k k with i k = 1, 2, . . . , n k for k = 1, 2, 3, 4, i.e., n = n 1 + n 2 + n 3 + n 4 . The predictor eduyears i k k denotes the compulsory education years in the i-th country in the k-th economics group. The predictor cpi i k k denotes the CPI on the i-th country in the k-th economics group.

Estimated Parameters
The estimation processes perform 20,000 samples from two Markov chains that are eliminated in the first 1000 sample iterations in each chain as a burn-in stage. The prior distributions are constructed for three parameters that have to be estimated: the shape parameter α, the mixing parameter w, and the coefficient β jk . Regarding the pattern of distribution in Figure 1, the goodness-of-fit test in Table 1, and the proposed prior (16), we use the prior distribution for α k by α k ∼ Gamma(6, 1) for k = 1, 2, 3, 4; α k has a gamma distribution with a shape parameter of 6 and an inverse scale parameter of 1. Referring to the conjugate prior on (18), the mixing parameter w has a Dirichlet distribution as a prior distribution with value 1 for all parameters: w ∼ Dir(1, 1, 1, 1) The equal value of parameters on that Dirichlet distribution, Dir(1, 1, 1, 1), implies a uniform distribution over the three-dimensional simplex. Concerning the prior distribution for β jk , which is formed based on (17) according to our empirical studies, a large variance of σ 2 , i.e., a noninformative prior for β jk , can lead to no regularization on coefficient estimation. Consequently, the variance σ 2 needs to be relatively close to a zero mean; hence, the prior distributions for β jk consists of for k = 1, 2, 3, 4. The detailed estimated results can be seen in Table 2. In Table 2, the estimated values of parameters have a 95% posterior credible interval and a related diagnostic measurement of MCMC convergence, i.e., the PSRF. All PSRF values less than 1.2 ensure that the two Markov chains are convergent to achieve the posterior distribution of the parameters. Figure 4 confirms the convergence of two Markov chains forβ 1k andβ 2k .  Table 2, the estimated values of parameters have a 95% posterior credible interval and a related diagnostic measurement of MCMC convergence, i.e., the PSRF. All PSRF values less than 1.2 ensure that the two Markov chains are convergent to achieve the posterior distribution of the parameters. Figure 4 confirms the convergence of two Markov chains for  1 k and  2 k .
The posterior predictive check can be conducted for model checking purposes by determining the graphical posterior predictive checks. The graphical posterior predictive check that displays the observed data alongside the replicated data regenerated from the fitted model can show systematic differences between the observed and replicated data. Other methods that are useful for checking Bayesian models are discussed by Lunn et al. (2013). Figure 5 presents the distribution of observed and replicated data to show the adequacy of log-link gamma GLMs for fitting observed data. The posterior predictive check can be conducted for model checking purposes by determining the graphical posterior predictive checks. The graphical posterior predictive check that displays the observed data alongside the replicated data regenerated from the fitted model can show systematic differences between the observed and replicated data. Other methods that are useful for checking Bayesian models are discussed by Lunn et al. (2013). Figure 5 presents the distribution of observed and replicated data to show the adequacy of log-link gamma GLMs for fitting observed data. Econometrics 2022, 10, x FOR PEER REVIEW 16 of 31 In the Bayesian framework, a 95% posterior credible interval means that the actual probability of having true values of the estimated parameter in that interval is 0.95. Therefore, it can be applied as an assessment tool to determine the significance of the estimated parameter, which can be considered significant if the 95% posterior credible interval does not include a zero value. The 95% posterior credible intervals of two estimated parameters, 21  , which is not significant, means that the predictor CPI does not have an effect on the GDP_PPP per capita in the low-income country group, because the compulsory education years has a positive effect on the GDP_PPP per capita. On the other hand, among high-income countries, it is not significantly influenced by compulsory education since the estimated 14  can have a zero value. The CPI, even only slightly, still has a positive influence on the GDP-PPP per capita. In the lower-middle and upper-middle country groups, the compulsory education years and the CPI have a positive influence on the GDP_PPP per capita.
Concerning comparative studies, we take two models: the Bayesian mixture of linear regression model as a simpler mixture of regression model and the Bayesian gamma regression model as a model without a finite mixture framework. The Bayesian mixture of linear regression model is given by as a normal distribution in the k-th mixture component and The prior distributions for  jk and the standard deviations  k consists of (0,8000)  In the Bayesian framework, a 95% posterior credible interval means that the actual probability of having true values of the estimated parameter in that interval is 0.95. Therefore, it can be applied as an assessment tool to determine the significance of the estimated parameter, which can be considered significant if the 95% posterior credible interval does not include a zero value. The 95% posterior credible intervals of two estimated parameters, β 21 andβ 14 , contain zero values indicating that theβ 21 andβ 14 are not significant for the model. The estimatedβ 21 , which is not significant, means that the predictor CPI does not have an effect on the GDP_PPP per capita in the low-income country group, because the compulsory education years has a positive effect on the GDP_PPP per capita. On the other hand, among high-income countries, it is not significantly influenced by compulsory education since the estimatedβ 14 can have a zero value. The CPI, even only slightly, still has a positive influence on the GDP-PPP per capita. In the lower-middle and upper-middle country groups, the compulsory education years and the CPI have a positive influence on the GDP_PPP per capita.
Concerning comparative studies, we take two models: the Bayesian mixture of linear regression model as a simpler mixture of regression model and the Bayesian gamma regression model as a model without a finite mixture framework. The Bayesian mixture of linear regression model is given by with N k (.) as a normal distribution in the k-th mixture component and The prior distributions for β jk and the standard deviations σ k consists of where σ k ∼ U(0, 8000) means the standard deviations σ k are distributed uniformly between 0 and 8000, for k = 1, 2, 3, 4. The estimate procedures run 20,000 samples from two Markov chains, with the first 600 sample iterations in each chain discarded as a burn-in stage. The estimated parameters for the Bayesian mixture of linear regression model are presented on Table 3.  Table 3 reveals that the predictors, eduyears and cpi, do not affect the response in the first mixture component. It can be observed that two estimated parameters,β 11 andβ 21 , have 95% posterior credible intervals that contain zero values.
The modeling of GDP_PPP per capita can be developed without regard to the four economic groups. In such a case, the Bayesian gamma regression model can be implemented without having a finite mixture framework. The gamma regression model is specified by Similar prior distributions and processes for MCMC simulation in the Bayesian mixture of the log-link gamma GLMs are implemented for inferencing the Bayesian gamma regression model. The estimated parameters of the Bayesian gamma regression model can be seen in Table 4. The signs of the coefficientsβ 0 ,β 1 , andβ 2 are positive, which signifies that the enhancement of the compulsory education years and the improvement of the corruption perception index will increase the GDP_PPP per capita. We take a comparative study between the fitted Bayesian gamma regression model, the fitted Bayesian mixture of linear regression model as a simpler mixture model, and the fitted Bayesian mixture of the log-link gamma GLMs in predictive performance by computing the WAIC value. Table 5 displays the WAIC value for the fitted Bayesian mixture of the log-link gamma GLMs, which is smaller than the WAIC values of the fitted Bayesian gamma regression model and the fitted Bayesian mixture of linear regression model. Thus, in the predictive capability aspect, the fitted Bayesian mixture of the log-link gamma GLMs is better than the fitted Bayesian gamma regression model and the fitted Bayesian mixture of linear regression model. To find out the predictive capability more comprehensively, we apply both models to predict future GDP_PPP values for 2020. The density plots of predicted results are compared graphically with the observed data of GDP_PPP in 2020 that are obtained from World Bank Open Data (2022). Figure 6 represents the density plots that can exhibit some differences between the predicted results and the observed data. It can be seen from Figure 6 that the fitted Bayesian mixture of the log-link gamma GLMs has a better performance than the fitted Bayesian mixture of linear regression model and the fitted Bayesian gamma regression model in predicting future GDP_PPP values in 2020.

Modeling Household Income
In the field of income distribution analysis, the density function of income distribution can be used as an important approach to verify economic inequality. A functional form of the density function is determined to obtain a specific model that can exhibit the reality about income distribution, so inequality analyses can be built in terms of that specific model. Since the functional form of the density function is generally unknown, it needs to be estimated. Cowell and Flachaire (2015) remarked that the finite mixture model could be more convenient for estimating the density function of income distributions. They started to describe conceptually the implementation of the finite mixtures of linear regression for analyzing income distributions. However, the finite mixture of GLMs was not carried out in their studies. It can be seen from Figure 6 that the fitted Bayesian mixture of the log-link gamma GLMs has a better performance than the fitted Bayesian mixture of linear regression model and the fitted Bayesian gamma regression model in predicting future GDP_PPP values in 2020.

Modeling Household Income
In the field of income distribution analysis, the density function of income distribution can be used as an important approach to verify economic inequality. A functional form of the density function is determined to obtain a specific model that can exhibit the reality about income distribution, so inequality analyses can be built in terms of that specific model. Since the functional form of the density function is generally unknown, it needs to be estimated. Cowell and Flachaire (2015) remarked that the finite mixture model could be more convenient for estimating the density function of income distributions. They started to describe conceptually the implementation of the finite mixtures of linear regression for analyzing income distributions. However, the finite mixture of GLMs was not carried out in their studies.
The dynamics of household income as a fundamental element of income inequality can affect economic growth. Some researchers studied interrelated topics on household income, income inequality, and economic growth. Causa et al. (2014) found that improved growth in household income reduced income inequality. Meanwhile, Stiglitz (2015) stated that immoderate inequality could cause inadequate economic performance.

Data Description
We research the annual household income per capita in six economic corridors based on the Masterplan for Accelerating and Expansion of Indonesia Economic Development 2011-2025 (Coordinating Ministry for Economic Affairs 2011). The corridors consist of the six main regions of Indonesia: Sumatra, Java, Kalimantan, Sulawesi, Bali-Nusa Tenggara, and Papua-Maluku. Alisjahbana (2011) grouped corridors into three groups comprising two economic corridors in each. The first group consisted of Sumatra and Java; the second group, Kalimantan and Sulawesi, and the third, Bali-Nusa Tenggara and Papua-Maluku. In our research, we construct three mixture components representing the three groups. We develop our model from the point of view proposed by Chotikapanich et al. (2012). They constructed the income distribution of the whole region, which could be represented as a mixture model with subset regions as a member of the mixture component.
Household income data were determined from The Fifth Wave of the Indonesia Family Life Survey (IFLS-5) 2014 (Strauss et al. 2016). We sample 5545 households from 23 provinces in the six economic corridors. In more detail, 4, 267, 586, and 692 households are members of the first, second, and third groups, respectively. Our verification of the whole household income data using the Kolmogorov-Smirnov test indicates that it does not pursue a specific statistical distribution. However, once we examine the data in each mixture component, the response in the first mixture has a generalized gamma distribution with four parameters and a lognormal distribution with three parameters for responses in the other mixture components. Therefore, we need to determine an appropriate statistical distribution suitable for modeling household income data.
Some essential patterns such as non-negative, continuous, and positive-skewed are present in the household income data which are displayed in Figure 7. It shows the possibility of proposing gamma distribution as a representative statistical distribution for household income data. Figure 8 exhibits the scatter plots in four dimensions which relate the natural logarithm of household income data to all three predictors; the number of household members, the number of completed years of formal education by the head of the household, and the natural logarithm of the household's wealth. The scatter plots indicate a linear relationship between the natural logarithm of household income data and all three predictors which exist in the overall response data in Figure 8a and in each mixture component shown in Figure 8b-d. These relationships give evidence that it can be modeled by the log-link gamma GLMs.
The performance of the gamma GLM was studied by Fu and Moncher (2004) who verified the unbiasedness and stability of the GLM for non-negative, continuous, and positive-skewed data. They found that the GLM assuming the gamma distribution gave better predictive accuracy and efficiency than the GLM assuming the normal distribution or the lognormal distribution. Thus, we consider that the gamma GLM is a reasonable model for analyzing household income data.
Previously, Wicaksono et al. (2017) showed that the natural log of annual household income per capita as a response variable had been regressive on household characteristics as predictors. Nonetheless, they worked under the assumption that the household income data followed a normal distribution, and their model did not use the finite mixture modeling framework. is the natural logarithm of the household's wealth (in Indonesian rupiah per year) with k = 1, 2, 3. We take two Markov chains with 25,000 iterations on each chain that are discarded during the first 10,000 iterations as burn-in. Thus, we use 50,000 samples to estimate the parameters.  The prior distributions are set up for the shape parameter  , the mixing parameter w , and the coefficient  jk . Referring to the pattern of distribution in Figure 3 and the proposed prior (16), we use the prior distribution for  k : where  k has a gamma distribution with a shape parameter of 2 and an inverse scale parameter of 1.
The mixing parameter w has a Dirichlet distribution as a prior distribution with value of 1 for all parameters, (1,1,1) w Dir .
The prior distributions are set up for the shape parameter α, the mixing parameter w, and the coefficient β jk . Referring to the pattern of distribution in Figure 3 and the proposed prior (16), we use the prior distribution for α k : where α k has a gamma distribution with a shape parameter of 2 and an inverse scale parameter of 1.
The mixing parameter w has a Dirichlet distribution as a prior distribution with value of 1 for all parameters, w ∼ Dir(1, 1, 1) The equal value of parameters on that Dirichlet distribution, Dir(1, 1, 1), implies a uniform distribution over the two-dimensional simplex. The prior of β jk is specified based on (17). Accordingly, we propose the small variance σ 2 = 0.1; thus, the prior distribution for β jk is β 0k , β jk ∼ N(0, 0.1) for j = 1, 2, 3 and k = 1, 2, 3. The implementation of the prior distribution N(0,0.1) suggests that the corresponding predictors β jk have fewer effects on the posterior distribution β jk . This arrangement scheme of prior distribution allows a data-driven approach, i.e., the likelihood is more dominant than the prior distribution throughout the computation of the posterior distribution. The results are provided in Table 6. It can be noted from Table 6 that all of the estimated parameters are significant at a 95% posterior credible interval. It can be shown that for all estimated parameters, the values of PSRF that are equal to 1 confirm that the Markov chains for all estimated parameters are convergent. Figure 9, which represents the trace plot of the Markov chains for estimated coefficientsβ 0 ,β 1 ,β 2 , andβ 3 , proves the convergences. with ( ) The estimated parameters of the Bayesian gamma regression model shown in Table  6 are determined through the Gibbs sampler algorithm with the same prior distributions which are used to infer the fitted Bayesian mixture of the log-link gamma GLMs:  The estimated shape parameters α (2.2740, 0.5182, and 1.6270) for the first, second and third groups, respectively, closely fit the distribution pattern of annual household income per capita in each mixture component, and the mixing parameters are 0.7693, 0.1058, and 0.1249, respectively. In this case, the mixing parameter represents the proportion of households in the groups against the sample populations. The sign of the estimated coefficientsβ 0 ,β 1 ,β 2 , andβ 3 are as expected for each of the groups. The negative sign forβ 1 suggests that the increasing number of household members reduces the annual household income per capita. The length of education by the head of the household positively correlates with the annual household income per capita, indicating positive sign forβ 2 . It implies that knowledge may improve as the length of education by the head of household rise, thereby resulting in higher annual household income per capita. The household's wealth also has a positive relationship to the annual household income per capita. This relation is generally reasonable since households with more wealth can generate more income using their own wealth. The results show that the Bayesian-MCMC approach gives a suitable result for estimating a finite mixture of the log-link gamma GLMs. The distribution of observed and replicated data depicting a graphical posterior predictive check in Figure 10 shows that the fitted model generally matches the observed data. Similar processes for MCMC simulations are also conducted with two Markov chains with 25,000 iterations discarded in the first 10,000 iterations as burn-in, so it uses 50,000 samples. Based on the PSRF values in Table 7, the estimated parameters are convergent to their posterior distributions, whereas the 95% posterior credible intervals indicate that all To learn the capability of predictive accuracy, the fitted Bayesian mixture of the log-link gamma GLMs is compared with the fitted Bayesian gamma regression model by calculating the WAIC. The gamma regression model is given by: The estimated parameters of the Bayesian gamma regression model shown in Table 6 are determined through the Gibbs sampler algorithm with the same prior distributions which are used to infer the fitted Bayesian mixture of the log-link gamma GLMs: α ∼ Gamma(2, 1) and β 0 , β j ∼ N(0, 0.1) Similar processes for MCMC simulations are also conducted with two Markov chains with 25,000 iterations discarded in the first 10,000 iterations as burn-in, so it uses 50,000 samples. Based on the PSRF values in Table 7, the estimated parameters are convergent to their posterior distributions, whereas the 95% posterior credible intervals indicate that all estimated parameters are significant. However, the WAIC value of the Bayesian gamma regression, which has an infinite value as shown in Table 8, indicates that the Bayesian mixture of the log-link gamma GLMs is significantly better in the predictive accuracy than the Bayesian gamma regression model.

Discussion
Two examples of real-data applications have a different typical distribution of response data. In the first example, the modeling of GDP_PPP per capita following the gamma distribution, the Bayesian mixture of the log-link gamma GLMs considerably outperforms the Bayesian mixture of linear regression model and the Bayesian gamma regression based on the predictive measure of WAIC. Moreover, through the finite mixture framework, we discover the factual relationship between the GDP_PPP per capita and the predictors in each group. In low-income countries, only the compulsory education years has a significant effect on the GDP_PPP per capita. Conversely, only the CPI has an important contribution to the change in GDP_PPP per capita in high-income countries. In this case, the usefulness of the Bayesian mixture of the log-link gamma GLMs can be noted.
The second case, in addition to revealing the implementation of the model on the data that does not comply with the gamma distribution, exposes the influential presence of extreme values which can be regarded as outliers that can be eliminated from data or remain preserved in the data depending on their characteristics. Nevertheless, the existence of plentiful outliers in the data distribution was natural in the heavy-tailed distribution perspective (Klebanov and Volchenkova 2019). Hence, the outliers that exist in the household income data of the IFLS-5 do not need to be removed since they present an actual condition.
Referring to Equations (24)-(26), we find out that one of the possible sources in which the WAIC can have a large value is in the extreme values in the data distribution of y i that have a heavy-tailed distribution. Klebanov and Volchenkova (2019) showed that the observations belonging to the tail in a heavy-tailed distribution, i.e., extreme values, can have an infinite variance. To verify the existence of extreme values, we examine the adjusted boxplot, which is a modification of the boxplot used to depict a robust measure of skewness (Hubert and Vandervieren 2008). The adjusted boxplots displayed in Figure 11 describe some extreme values that exist in the household income data.
outperforms the Bayesian mixture of linear regression model and the Bayesian gamma regression based on the predictive measure of WAIC. Moreover, through the finite mixture framework, we discover the factual relationship between the GDP_PPP per capita and the predictors in each group. In low-income countries, only the compulsory education years has a significant effect on the GDP_PPP per capita. Conversely, only the CPI has an important contribution to the change in GDP_PPP per capita in high-income countries. In this case, the usefulness of the Bayesian mixture of the log-link gamma GLMs can be noted.
The second case, in addition to revealing the implementation of the model on the data that does not comply with the gamma distribution, exposes the influential presence of extreme values which can be regarded as outliers that can be eliminated from data or remain preserved in the data depending on their characteristics. Nevertheless, the existence of plentiful outliers in the data distribution was natural in the heavy-tailed distribution perspective (Klebanov and Volchenkova 2019). Hence, the outliers that exist in the household income data of the IFLS-5 do not need to be removed since they present an actual condition.
Referring to Equations (24)-(26), we find out that one of the possible sources in which the WAIC can have a large value is in the extreme values in the data distribution of i y that have a heavy-tailed distribution. Klebanov and Volchenkova (2019) showed that the observations belonging to the tail in a heavy-tailed distribution, i.e., extreme values, can have an infinite variance. To verify the existence of extreme values, we examine the adjusted boxplot, which is a modification of the boxplot used to depict a robust measure of skewness (Hubert and Vandervieren 2008). The adjusted boxplots displayed in Figure  11 describe some extreme values that exist in the household income data. While more specifically, if we divide the data following the three mixture components as our proposed model, some extreme values will also be split into each mixture component. It will cause a difference in the WAIC value which is substantively wide between the fitted Bayesian gamma regression model and the fitted Bayesian mixture of the log-link gamma GLMs.
Several improvements for further studies related to the topic of this paper can be made for future research purposes. Inferential processes discussed in this paper use the While more specifically, if we divide the data following the three mixture components as our proposed model, some extreme values will also be split into each mixture component. It will cause a difference in the WAIC value which is substantively wide between the fitted Bayesian gamma regression model and the fitted Bayesian mixture of the log-link gamma GLMs.
Several improvements for further studies related to the topic of this paper can be made for future research purposes. Inferential processes discussed in this paper use the Gibbs sampler algorithm through the MultiBUGS software. Further research can be developed by using the Hamiltonian Monte Carlo algorithm (HMC) with the no-U-turn (NUTS) sampler through the Stan software. Solikhah et al. (2021) showed that HMC using NUTS sampler performed well in estimating parameters of a mixture of K-component Fisher's z autoregressive models. In addition, the proposed model can be developed for a mixture of distributions with more general gamma regression models with mean and shape (or variance) parameters following regression structures previously studied in Corrales and Cepeda-Cuervo (2019).

Conclusions
In this paper, we examine more extensively a Bayesian mixture of GLMs with gammadistributed responses that combined three main parts: the GLMs with gamma-distributed response, finite mixture modeling, and a computational procedure for inferencing through the Bayesian-MCMC approach. Two link functions which are available for GLMs with gamma-distributed responses can give different results on the predictive mean, the log-link function is more appropriate than the inverse-link function to ensure the positivity of the predictive mean. Considering the implementation of the model on two real data applications, the link function used and the chosen prior distributions give an important role for the Bayesian-MCMC approach to work appropriately throughout the simulation-based inferential processes. We note some advantages of the Bayesian mixture of the log-link gamma GLMs. The model has better predictive accuracy than the Bayesian mixture of linear regression and the Bayesian gamma regression model. It can point out real relationships between the response and the predictors. Furthermore, it handles the problems concerning extreme values, whereas the Bayesian model without finite mixture framework has difficulty overcoming extreme values. Nevertheless, our research only uses the Gibbs sampler algorithm and the mixing parameter which is not a function of the predictors. Therefore, it can be recommended for the future research.