Two-Component Unit Weibull Mixture Model to Analyze Vote Proportions

Renata Rojas Guerra; Fernando A. Peña-Ramírez; Charles P. Mafalda; Gauss Moutinho Cordeiro

doi:10.3390/IOCMA2023-14550

¹ Department of Statistics, Centro de Ciências Naturais e Exatas, Universidade Federal de Santa Maria, Santa Maria 97105-900, Brazil

² Department of Statistics, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá 111321, Colombia

³ Department of Statistics, Centro de Ciências Exatas e da Natureza, Universidade Federal de Pernambuco, Recife 50670-901, Brazil

* Author to whom correspondence should be addressed.

Comput. Sci. Math. Forum 2023, 7(1), 45; https://doi.org/10.3390/IOCMA2023-14550

This article belongs to the Proceedings of The 1st International Online Conference on Mathematics and Applications


Abstract

In this paper, we present a two-component unit Weibull mixture model. An important property of the new model is that it accommodates bimodality, which can appear in data from heterogeneous populations. We provide statistical properties, such as the quantile function and moments. The expectation-maximization (EM) algorithm is used to find maximum likelihood estimates of the model parameters, and a Monte Carlo study is carried out to evaluate the performance of the estimators in finite samples. The relevance of the new model is shown with an application to the vote proportions in the runoff of the 2018 Brazilian presidential election. The proportion of votes is an important measure in analyzing electoral data. Since it is a variable limited to the unit interval, unit distributions should be considered to analyze its probabilistic behavior. The introduced model is suitable for describing the characteristics detected in these data, namely asymmetric behavior, bimodality, and the unit interval as support. In the application, the proposed model provides a better fit than the two-component beta mixture model.

Keywords: Brazilian elections; EM algorithm; mixture distributions; unit models; unit Weibull

1. Introduction

Finite mixture models first appeared in a study of asymmetry in grouped material that was not homogeneous [1]; they are useful in the presence of multimodality, heavy tails, and asymmetry [2]. Many works on finite mixtures have since appeared in the literature. For example, Jewell [3] proposed a model for exponential mixtures. For Weibull mixture models, we can cite [4] for characterizations of the failure rate function and [5] for reliability approximations. Recently, Huang and Dong [6] analyzed wave periods in combined sea states using parametric mixture models.

For data with bounded support, beta mixture models have been studied by several authors. Ji et al. [7] applied beta mixtures to problems involving correlations of gene expression levels, Bouguila et al. [8] presented a Bayesian analysis, and Grün et al. [9] studied beta mixtures in regression models. The Kumaraswamy mixture model is an alternative to the beta mixture models; Khalid et al. [10] carried out a Bayesian study of the three-component Kumaraswamy mixture.

In this paper, a new two-component mixture model is proposed as an alternative for modeling population heterogeneity on the unit interval. We consider that each mixture component follows a unit Weibull ($UW$) distribution [11]. Two contributions of the new distribution, called the two-component unit Weibull mixture ($UWUW$) model, are: (i) all estimation routines, including simulations and applications, are performed using the expectation-maximization (EM) algorithm, and (ii) its applicability to electoral data modeling. The EM algorithm is a computational method for computing the maximum likelihood estimator (MLE) iteratively [12]. It is widely used for maximum likelihood estimation in finite mixture models [13]. Regarding electoral data, the vote proportion is defined as the number of votes received in a district divided by the total number of valid votes cast in that district. Proportions of votes are useful since electoral districts can vary considerably in population size [14]. Additionally, this measure can be used to analyze other characteristics of the electoral process, such as electoral volatility [15] and the nationalization of electoral change [14]. The data set used refers to the proportions of votes in the runoff of the 2018 Brazilian presidential election.

The rest of the paper is organized as follows. In Section 2, the new mixture model is presented. Section 3 introduces the EM algorithm to perform maximum likelihood estimation for the $UWUW$ model. In Section 4, an application to electoral data is presented. Final considerations are given in Section 5.

2. The Proposed Model

In this section, the two-component unit Weibull mixture distribution, denoted $UWUW$, is introduced. Let $X$ be a random variable with $UWUW$ distribution. Then, its cumulative distribution function (cdf) is

$$
\begin{aligned}
F_{UWUW}(x; Θ) &= p\, F_{UW}(x; θ_{1}) + (1 - p)\, F_{UW}(x; θ_{2}) \\
&= p\, τ^{(\log x / \log μ_{1})^{β_{1}}} + (1 - p)\, τ^{(\log x / \log μ_{2})^{β_{2}}}, \quad 0 < x < 1,
\end{aligned}
$$

where $Θ = (θ_{1}^{⊤}, θ_{2}^{⊤}, p)^{⊤}$, $θ_{1} = (μ_{1}, β_{1})^{⊤}$, $θ_{2} = (μ_{2}, β_{2})^{⊤}$, $μ_{1}, μ_{2} \in (0, 1)$ are location parameters given by the $τ$th quantile of each mixture component, $β_{1}, β_{2} > 0$ are shape parameters, $p \in (0, 1)$ is the mixing weight, and $τ \in (0, 1)$ is assumed to be known. Each component of the mixture is thus parameterized in terms of a quantile. The advantage of this quantile-based reparameterization is its flexibility to model data with heterogeneous conditional distributions [16,17]. The $UWUW$ probability density function (pdf) is given by

$$
\begin{aligned}
f_{UWUW}(x; Θ) &= p\, f_{UW}(x; θ_{1}) + (1 - p)\, f_{UW}(x; θ_{2}) \\
&= p\, \frac{β_{1} \log τ}{x \log μ_{1}} \left(\frac{\log x}{\log μ_{1}}\right)^{β_{1} - 1} τ^{(\log x / \log μ_{1})^{β_{1}}} \\
&\quad + (1 - p)\, \frac{β_{2} \log τ}{x \log μ_{2}} \left(\frac{\log x}{\log μ_{2}}\right)^{β_{2} - 1} τ^{(\log x / \log μ_{2})^{β_{2}}}.
\end{aligned}
\tag{1}
$$
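As an illustration of the cdf and pdf above, the following minimal Python sketch evaluates both for a single unit Weibull component and for the mixture. The function names (`uw_cdf`, `uw_pdf`, `uwuw_cdf`, `uwuw_pdf`) and the default `tau=0.5` are choices made for this sketch only and are not taken from the authors' code; the parameter values in the usage lines are the $UWUW$ estimates reported in Table 1.

```python
import math


def uw_cdf(x, mu, beta, tau=0.5):
    """Unit Weibull cdf for 0 < x < 1; by construction it equals tau at x = mu."""
    return tau ** ((math.log(x) / math.log(mu)) ** beta)


def uw_pdf(x, mu, beta, tau=0.5):
    """Unit Weibull pdf, i.e. the derivative of uw_cdf with respect to x."""
    r = math.log(x) / math.log(mu)  # positive, because both logs are negative
    return (beta * math.log(tau) / (x * math.log(mu))
            * r ** (beta - 1.0) * tau ** (r ** beta))


def uwuw_cdf(x, p, theta1, theta2, tau=0.5):
    """Two-component mixture cdf; theta1 and theta2 are (mu, beta) pairs."""
    return (p * uw_cdf(x, theta1[0], theta1[1], tau)
            + (1.0 - p) * uw_cdf(x, theta2[0], theta2[1], tau))


def uwuw_pdf(x, p, theta1, theta2, tau=0.5):
    """Two-component mixture pdf, matching Equation (1)."""
    return (p * uw_pdf(x, theta1[0], theta1[1], tau)
            + (1.0 - p) * uw_pdf(x, theta2[0], theta2[1], tau))


# Usage with the UWUW estimates from Table 1 (tau = 0.5, so mu is a median).
p_hat, th1, th2 = 0.5368, (0.2677, 2.7011), (0.6491, 2.9611)
print(uwuw_pdf(0.5, p_hat, th1, th2), uwuw_cdf(0.5, p_hat, th1, th2))
```

Because $μ$ is the $τ$th quantile, `uw_cdf(mu, mu, beta, tau)` returns `tau` for any `beta`, which gives a quick sanity check of the parameterization.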

Figure 1 shows plots of the $UWUW$ pdf for some combinations of parameters with $τ = 0.5$, which reveal the high flexibility of the new distribution. It accommodates bimodal, unimodal, decreasing, and bathtub shapes under different asymmetric characteristics. Additionally, a bimodal shape can arise for different values of $p$. Hereafter, we denote by $X$ a random variable following a $UWUW$ distribution, that is, $X \sim UWUW(Θ)$.
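Inverting the component cdf $u = τ^{(\log x / \log μ)^{β}}$ gives the component quantile function $x = \exp\{\log μ \, (\log u / \log τ)^{1/β}\}$, which makes it straightforward to simulate from the mixture. The sketch below is one possible way to do this in Python; the names `uw_quantile` and `ruwuw`, the seed handling, and the parameter values in the usage line are hypothetical and only serve the illustration.

```python
import math
import random


def uw_quantile(u, mu, beta, tau=0.5):
    """Component quantile function: solves u = tau ** ((log x / log mu) ** beta)."""
    return math.exp(math.log(mu) * (math.log(u) / math.log(tau)) ** (1.0 / beta))


def ruwuw(n, p, theta1, theta2, tau=0.5, seed=None):
    """Draw n values from the mixture: pick a component, then invert its cdf."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # Component 1 is chosen with probability p, component 2 otherwise.
        mu, beta = theta1 if rng.random() < p else theta2
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        sample.append(uw_quantile(u, mu, beta, tau))
    return sample


# Usage with hypothetical parameter values.
draws = ruwuw(1000, 0.4, (0.3, 3.0), (0.7, 3.0), seed=1)
print(min(draws), max(draws))
```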

Figure 1. Plots of the $UWUW$ density for some parameter values. (a) For $p = 0.4$. (b) For $p = 0.6$ and $β_{1} = 3.0$.

3. Parameter Estimation

The well-known expectation-maximization (EM) algorithm is an approach to the iterative computation of MLEs when the observations can be treated as incomplete data. In the context of two-component mixture models, let $x = \{x_{1}, \dots, x_{n}\}$ be a random sample of size $n$ from a random variable $X$ having pdf (1) with unknown parameter vector $Θ = (θ_{1}^{⊤}, θ_{2}^{⊤}, p)^{⊤}$, where $θ_{1} = (μ_{1}, β_{1})^{⊤}$ and $θ_{2} = (μ_{2}, β_{2})^{⊤}$. It is customary to call $x$ the “incomplete data”, since it is associated with a second component $z = \{z_{1}, \dots, z_{n}\}$ of unobserved values of a latent random variable $Z$. Each value $z_{i}$ of $Z$ indicates the mixture component to which the $i$th observation $x_{i}$ belongs, that is,

$$
z_{i} =
\begin{cases}
1, & \text{if } x_{i} \text{ has pdf } f_{UW}(x; θ_{1}), \\
0, & \text{if } x_{i} \text{ has pdf } f_{UW}(x; θ_{2}),
\end{cases}
$$

where $P(Z = 1) = p$ and $P(Z = 0) = 1 - p$. The complete-data specification is determined by the joint density of $(X, Z)$,

$$
\begin{aligned}
f_{X, Z}(x_{i}, z_{i}; Θ) ={}& \left[p\, \frac{β_{1} \log τ}{x_{i} \log μ_{1}} \left(\frac{\log x_{i}}{\log μ_{1}}\right)^{β_{1} - 1} τ^{(\log x_{i} / \log μ_{1})^{β_{1}}}\right]^{z_{i}} \\
&\times \left[(1 - p)\, \frac{β_{2} \log τ}{x_{i} \log μ_{2}} \left(\frac{\log x_{i}}{\log μ_{2}}\right)^{β_{2} - 1} τ^{(\log x_{i} / \log μ_{2})^{β_{2}}}\right]^{1 - z_{i}},
\end{aligned}
$$

and, based on it, the complete-data log-likelihood function for the sample of size $n$ is given by

$$
\begin{aligned}
l_{c}(Θ) ={}& \sum_{i = 1}^{n} \log f_{X, Z}(x_{i}, z_{i}; Θ) \\
={}& \sum_{i = 1}^{n} z_{i} \log \left[p\, \frac{β_{1} \log τ}{x_{i} \log μ_{1}} \left(\frac{\log x_{i}}{\log μ_{1}}\right)^{β_{1} - 1} τ^{(\log x_{i} / \log μ_{1})^{β_{1}}}\right] \\
&+ \sum_{i = 1}^{n} (1 - z_{i}) \log \left[(1 - p)\, \frac{β_{2} \log τ}{x_{i} \log μ_{2}} \left(\frac{\log x_{i}}{\log μ_{2}}\right)^{β_{2} - 1} τ^{(\log x_{i} / \log μ_{2})^{β_{2}}}\right].
\end{aligned}
\tag{2}
$$

The EM algorithm iterates between two steps to compute the MLEs of $Θ$. In the E-step (expectation step), since (2) depends on the unobserved $z$, it is replaced by its conditional expectation with respect to the conditional distribution of $Z$, given $x$ and the current parameter estimates. More specifically, in the $(k + 1)$th iteration, the E-step computes

$$
\begin{aligned}
Q(Θ, Θ^{(k)}) ={}& E_{Θ^{(k)}}\left[l_{c}(Θ) \mid x\right] \\
={}& \sum_{i = 1}^{n} E_{Θ^{(k)}}\left[\log f_{X, Z}(x_{i}, Z_{i}; Θ) \mid x_{i}\right] \\
={}& \sum_{i = 1}^{n} \bar{z}_{i1} \log \left[p\, \frac{β_{1} \log τ}{x_{i} \log μ_{1}} \left(\frac{\log x_{i}}{\log μ_{1}}\right)^{β_{1} - 1} τ^{(\log x_{i} / \log μ_{1})^{β_{1}}}\right] \\
&+ \sum_{i = 1}^{n} \bar{z}_{i2} \log \left[(1 - p)\, \frac{β_{2} \log τ}{x_{i} \log μ_{2}} \left(\frac{\log x_{i}}{\log μ_{2}}\right)^{β_{2} - 1} τ^{(\log x_{i} / \log μ_{2})^{β_{2}}}\right],
\end{aligned}
\tag{3}
$$

where

$$
\bar{z}_{i1} = \frac{p^{(k)} f_{UW}(x_{i}; θ_{1}^{(k)})}{p^{(k)} f_{UW}(x_{i}; θ_{1}^{(k)}) + (1 - p^{(k)}) f_{UW}(x_{i}; θ_{2}^{(k)})},
\qquad
\bar{z}_{i2} = \frac{(1 - p^{(k)}) f_{UW}(x_{i}; θ_{2}^{(k)})}{p^{(k)} f_{UW}(x_{i}; θ_{1}^{(k)}) + (1 - p^{(k)}) f_{UW}(x_{i}; θ_{2}^{(k)})},
$$

and $Θ^{(k)} = (θ_{1}^{(k)⊤}, θ_{2}^{(k)⊤}, p^{(k)})^{⊤}$ is obtained from the $k$th iteration.
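The posterior weights $\bar{z}_{i1}$ and $\bar{z}_{i2}$ can be computed directly from the component densities. The following Python sketch shows one way to write the E-step; the function names `uw_pdf` and `e_step` are hypothetical, and the parameter values in the usage lines are the $UWUW$ estimates from Table 1 with $τ = 0.5$ assumed.

```python
import math


def uw_pdf(x, mu, beta, tau=0.5):
    """Unit Weibull pdf in the quantile parameterization."""
    r = math.log(x) / math.log(mu)
    return (beta * math.log(tau) / (x * math.log(mu))
            * r ** (beta - 1.0) * tau ** (r ** beta))


def e_step(xs, p, theta1, theta2, tau=0.5):
    """Return the weights z_bar_{i1}; the weights z_bar_{i2} are 1 - z_bar_{i1}."""
    weights = []
    for x in xs:
        a = p * uw_pdf(x, theta1[0], theta1[1], tau)          # component 1 term
        b = (1.0 - p) * uw_pdf(x, theta2[0], theta2[1], tau)  # component 2 term
        weights.append(a / (a + b))
    return weights


# Usage: observations near each estimated median are assigned to "their" component.
z1 = e_step([0.25, 0.65], 0.5368, (0.2677, 2.7011), (0.6491, 2.9611))
print(z1)
```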

The M-step (maximization step) requires the maximization of (3) with respect to $Θ$, that is,

$$
Θ^{(k + 1)} = \underset{Θ}{\arg\max}\; Q(Θ, Θ^{(k)}).
\tag{4}
$$
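One part of this maximization can be worked out by hand. The derivation below is a standard finite-mixture result that the text does not spell out: collecting the terms of (3) that involve $p$ and setting the derivative to zero yields an explicit update for the mixing weight, while the updates for $θ_{1}$ and $θ_{2}$ still have to be found numerically. The symbol $Q_{p}$ is introduced here only for this illustration.

```latex
% Terms of Q(\Theta, \Theta^{(k)}) in (3) that involve p
Q_p(p) = \sum_{i=1}^{n} \bar{z}_{i1} \log p + \sum_{i=1}^{n} \bar{z}_{i2} \log (1 - p),
\qquad \bar{z}_{i1} + \bar{z}_{i2} = 1 .

% Setting the derivative with respect to p equal to zero
\frac{\partial Q_p}{\partial p}
  = \frac{1}{p} \sum_{i=1}^{n} \bar{z}_{i1}
  - \frac{1}{1 - p} \sum_{i=1}^{n} \bar{z}_{i2} = 0
\quad \Longrightarrow \quad
p^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} \bar{z}_{i1} .
```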

The vector $Θ^{(k + 1)}$ is used to initialize the next iteration. Thus, the EM algorithm is initialized with the starting values $Θ^{(0)} = (θ_{1}^{(0)⊤}, θ_{2}^{(0)⊤}, p^{(0)})^{⊤}$, and the MLE $\hat{Θ}$ of $Θ$ is taken as $\hat{Θ} = Θ^{(k + 1)}$ when the convergence criterion $|Θ^{(k + 1)} - Θ^{(k)}| < ε$ is reached [12]. We set $ε$ = 10,000. The maximization in (4) has no closed-form solution for the full parameter vector, so it must be carried out numerically with an iterative technique, for example, the Newton–Raphson method [18].
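The complete E-step/M-step cycle can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the authors' implementation: instead of Newton–Raphson, the M-step here uses the fact that $y = -\log x$ follows an ordinary Weibull distribution with shape $β$, so that $β$ solves a one-dimensional, strictly decreasing profile score equation (found by bisection) and $μ$ then follows in closed form. The names `weighted_uw_fit`, `em_uwuw`, `loglik`, and `simulate`, the tolerance `eps=1e-6`, the use of the maximum absolute parameter change as the convergence norm, and all numerical values in the usage lines are hypothetical choices for the sketch.

```python
import math
import random


def uw_pdf(x, mu, beta, tau=0.5):
    r = math.log(x) / math.log(mu)
    return (beta * math.log(tau) / (x * math.log(mu))
            * r ** (beta - 1.0) * tau ** (r ** beta))


def weighted_uw_fit(xs, ws, tau=0.5):
    """Maximize sum_i w_i * log f_UW(x_i; mu, beta) via the Weibull form of -log x."""
    ys = [-math.log(x) for x in xs]
    logs = [math.log(y) for y in ys]
    ymax, wsum = max(ys), sum(ws)
    mean_log = sum(w * l for w, l in zip(ws, logs)) / wsum

    def score(beta):  # profile score for the shape; strictly decreasing in beta
        t = [w * (y / ymax) ** beta for w, y in zip(ws, ys)]
        return 1.0 / beta + mean_log - sum(a * l for a, l in zip(t, logs)) / sum(t)

    lo, hi = 1e-3, 1.0
    while score(hi) > 0.0 and hi < 1e2:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(60):                  # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    s = sum(w * (y / ymax) ** beta for w, y in zip(ws, ys)) / wsum
    m = ymax * (-math.log(tau) * s) ** (1.0 / beta)  # m = -log(mu)
    return math.exp(-m), beta


def em_uwuw(xs, p, theta1, theta2, tau=0.5, eps=1e-6, max_iter=200):
    for _ in range(max_iter):
        # E-step: posterior probability that x_i belongs to component 1.
        z1 = []
        for x in xs:
            a = p * uw_pdf(x, theta1[0], theta1[1], tau)
            b = (1.0 - p) * uw_pdf(x, theta2[0], theta2[1], tau)
            z1.append(a / (a + b))
        z2 = [1.0 - z for z in z1]
        # M-step: explicit update for p, numerical update for each component.
        old = (p, *theta1, *theta2)
        p = sum(z1) / len(xs)
        theta1 = weighted_uw_fit(xs, z1, tau)
        theta2 = weighted_uw_fit(xs, z2, tau)
        if max(abs(u - v) for u, v in zip(old, (p, *theta1, *theta2))) < eps:
            break
    return p, theta1, theta2


def loglik(xs, p, theta1, theta2, tau=0.5):
    """Observed-data log-likelihood of the mixture."""
    return sum(math.log(p * uw_pdf(x, theta1[0], theta1[1], tau)
                        + (1.0 - p) * uw_pdf(x, theta2[0], theta2[1], tau))
               for x in xs)


def simulate(n, p, theta1, theta2, tau=0.5, seed=0):
    """Draw from the mixture by inverting the component cdf."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        mu, beta = theta1 if rng.random() < p else theta2
        u = 1.0 - rng.random()
        out.append(math.exp(math.log(mu) * (math.log(u) / math.log(tau)) ** (1.0 / beta)))
    return out


# Usage on simulated data with hypothetical true values and starting values.
xs = simulate(400, 0.5, (0.25, 3.0), (0.70, 4.0))
start = (0.5, (0.3, 2.0), (0.6, 2.0))
p_hat, th1_hat, th2_hat = em_uwuw(xs, *start)
print(p_hat, th1_hat, th2_hat)
```

Since each M-step maximizes $Q$ exactly up to the bisection tolerance, the observed-data log-likelihood should not decrease from one iteration to the next, which is a useful check when experimenting with other starting values.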

4. Application

In what follows, we present a case study that illustrates the suitability of the $UWUW$ distribution for modeling real unit data sets. The data set consists of the municipality-level vote proportions of the winning candidate in the runoff of the 2018 Brazilian presidential election. Since the data present a bimodal shape (see Figure 2a), a unimodal distribution would not be appropriate. Therefore, the $UWUW$ distribution is a suitable alternative to model these data. Its performance is compared with that of other two-component mixtures of double-bounded distributions already studied in the literature: the two-component beta mixture ($BB$) and two-component Kumaraswamy mixture ($KWKW$) models. In this paper, the parameterization proposed by [19] is considered to define the $BB$ model, which has pdf given by

$$
\begin{aligned}
f(x; Θ) ={}& p\, \frac{Γ(β_{1})}{Γ(μ_{1} β_{1})\, Γ((1 - μ_{1}) β_{1})}\, x^{μ_{1} β_{1} - 1} (1 - x)^{(1 - μ_{1}) β_{1} - 1} \\
&+ (1 - p)\, \frac{Γ(β_{2})}{Γ(μ_{2} β_{2})\, Γ((1 - μ_{2}) β_{2})}\, x^{μ_{2} β_{2} - 1} (1 - x)^{(1 - μ_{2}) β_{2} - 1}, \quad 0 < x < 1,
\end{aligned}
$$

where $Θ = (μ_{1}, μ_{2}, β_{1}, β_{2}, p)^{⊤}$, $μ_{1}, μ_{2} \in (0, 1)$ are location parameters given by the mean of each mixture component, $β_{1}, β_{2} > 0$ are precision parameters, and $p \in (0, 1)$ is the mixing weight.
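For comparison with the $UWUW$ density, the beta mixture can be evaluated with the standard library alone. The sketch below assumes the mean-precision parameterization of [19], in which a component with mean $μ$ and precision $β$ has beta shape parameters $μβ$ and $(1 - μ)β$; the function names are hypothetical, and the usage lines plug in the $BB$ estimates from Table 1.

```python
import math


def beta_pdf_mean_precision(x, mu, phi):
    """Beta pdf with mean mu and precision phi (shapes mu*phi and (1-mu)*phi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def bb_pdf(x, p, mu1, phi1, mu2, phi2):
    """Two-component beta mixture pdf for 0 < x < 1."""
    return (p * beta_pdf_mean_precision(x, mu1, phi1)
            + (1.0 - p) * beta_pdf_mean_precision(x, mu2, phi2))


# Usage with the BB estimates from Table 1.
bb_hat = (0.7268, 0.5816, 9.7510, 0.1985, 29.3260)
print(bb_pdf(0.5, *bb_hat))
```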

Figure 2. Estimated densities (a) and empirical cdf (b) for the $BB$, $KWKW$, and $UWUW$ models.

For all competing mixture models, parameter estimation is carried out using the EM algorithm following the steps described in Section 3. The corrected Anderson–Darling ($A^{*}$) [20], Cramér–von Mises ($W^{*}$) [21], and Kolmogorov–Smirnov ($KS$) [22] statistics are calculated to assess the goodness of fit of the three fitted models. The lower their values, the better the model fit. All analyses are performed in the R programming language, and the goodness-of-fit measures are computed with the AdequacyModel package [23].
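To make the comparison criterion concrete, the Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cdf and the fitted cdf. The Python sketch below is a generic illustration of that definition and is not the AdequacyModel routine used in the paper; the function name `ks_statistic` and the toy sample are hypothetical.

```python
def ks_statistic(sample, cdf):
    """Largest distance between the empirical cdf of `sample` and a fitted `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1) / n to i / n at the i-th order statistic.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Usage: three points compared with the uniform cdf on (0, 1).
d_uniform = ks_statistic([0.25, 0.5, 0.75], lambda x: x)
print(d_uniform)
```

Passing a fitted mixture cdf as the `cdf` argument would give the statistic for that model on the same data.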

Table 1 displays the parameter estimates, standard errors, and model comparison criteria of the three considered models. The results indicate that the $UWUW$ distribution provides the lowest values for all goodness-of-fit statistics. The $KWKW$ model presents the worst performance and is not an adequate alternative for these data.

Table 1. Parameter estimates and standard errors (given in parentheses) for the models fitted to Bolsonaro’s vote proportion in Brazilian presidential elections in 2018.

Figure 2a presents the histogram of the vote proportion data overlaid with the estimated densities of the fitted models. The bimodality of the data is confirmed, and the $UWUW$ model provides the closest fit to the histogram. Clearly, the $KWKW$ model is not adequate to fit these data. Further, Figure 2b gives plots of the empirical and estimated cdfs. This visual inspection agrees with the results in Figure 2a and Table 1, indicating that the proposed model is appropriate to fit these data. Thus, it can be an effective alternative to analyze vote proportions, being quite competitive with the $BB$ model and providing consistently better fits than the $KWKW$ model. Therefore, the $UWUW$ model provides a useful tool for modeling bimodal data restricted to the unit interval. Additionally, the estimates of the mixture parameters indicate that more than 50% of the observations belong to the first mixture component. The estimated median of the first component is $\hat{μ}_{1} = 0.2677$ and the estimated median of the second component is $\hat{μ}_{2} = 0.6491$.

5. Conclusions

A two-component mixture model was defined to describe population heterogeneity on a bounded domain. The two-component unit Weibull mixture ($UWUW$) model is formulated by assuming that each mixture component follows the unit Weibull distribution. Some of the main properties of the $UWUW$ model have been presented, such as ordinary moments. The EM algorithm was used to obtain maximum likelihood estimates of the model parameters, and Monte Carlo simulations were performed to evaluate its performance. An application to electoral data illustrates the importance and potential of the new model. The motivating data set contains the vote proportions obtained by the winning candidate in the runoff of the 2018 Brazilian presidential election. The results indicate that our proposal is adequate for this data set, since it accommodates asymmetric and bimodal behavior. From the mixing parameter estimate, we conclude that 53.68% of the observations come from the first mixture component, with estimated median $\hat{μ}_{1} = 0.2677$. The estimated median for the municipalities in the second mixture component is $\hat{μ}_{2} = 0.6491$. This application shows empirically that the $UWUW$ model may outperform other two-component mixture models based on widely known unit distributions, such as the beta and Kumaraswamy.

Author Contributions

Conceptualization, R.R.G.; methodology, R.R.G. and F.A.P.-R.; software, R.R.G.; validation, R.R.G. and F.A.P.-R.; formal analysis, R.R.G.; investigation, R.R.G., F.A.P.-R. and C.P.M.; resources, C.P.M.; data curation, R.R.G.; writing—original draft preparation, R.R.G., F.A.P.-R. and C.P.M.; writing—review and editing, G.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pearson, K. Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 1894, 185, 71–110.
2. Lachos, V.H.; Moreno, E.J.L.; Chen, K.; Cabral, C.R.B. Finite mixture modeling of censored data using the multivariate Student-t distribution. J. Multivar. Anal. 2017, 159, 151–167.
3. Jewell, N.P. Mixtures of exponential distributions. Ann. Stat. 1982, 10, 479–484.
4. Jiang, R.; Murthy, D. Mixture of Weibull distributions-parametric characterization of failure rate function. Appl. Stoch. Model. Data Anal. 1998, 14, 47–65.
5. Bučar, T.; Nagode, M.; Fajdiga, M. Reliability approximation using finite Weibull mixture distributions. Reliab. Eng. Syst. Saf. 2004, 84, 241–251.
6. Huang, W.; Dong, S. Probability distribution of wave periods in combined sea states with finite mixture models. Appl. Ocean Res. 2019, 92, 101938.
7. Ji, Y.; Wu, C.; Liu, P.; Wang, J.; Coombes, K.R. Applications of beta-mixture models in bioinformatics. Bioinformatics 2005, 21, 2118–2122.
8. Bouguila, N.; Ziou, D.; Monga, E. Practical Bayesian estimation of a finite beta mixture through Gibbs sampling and its applications. Stat. Comput. 2006, 16, 215–225.
9. Grün, B.; Kosmidis, I.; Zeileis, A. Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned; Technical Report, Working Papers in Economics and Statistics; University of Innsbruck, Research Platform Empirical and Experimental Economics (EEECON): Innsbruck, Austria, 2011.
10. Khalid, M.; Aslam, M.; Sindhu, T.N. Bayesian analysis of 3-components Kumaraswamy mixture model: Quadrature method vs. Importance sampling. Alex. Eng. J. 2020, 59, 2753–2763.
11. Mazucheli, J.; Menezes, A.; Ghitany, M. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
12. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
13. Redner, R.A.; Walker, H.F. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 1984, 26, 195–239.
14. Alemán, E.; Kellam, M. The nationalization of presidential elections in the Americas. Elect. Stud. 2017, 47, 125–135.
15. Powell, E.N.; Tucker, J.A. Revisiting electoral volatility in post-communist countries: New data, new results and new approaches. Br. J. Political Sci. 2013, 44, 123–147.
16. Bayes, C.L.; Bazán, J.L.; De Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Its Interface 2017, 10, 483–493.
17. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J. Appl. Stat. 2020, 47, 954–974.
18. Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical Recipes 3rd Edition: The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 2007.
19. Ferrari, S.L.P.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 7, 799–815.
20. Chen, G.; Balakrishnan, N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995, 27, 154–161.
21. Durbin, J.; Knott, M. Components of Cramér-Von Mises statistics. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 290–307.
22. Goodman, L.A. Kolmogorov-Smirnov tests for psychological research. Psychol. Bull. 1954, 51, 160–168.
23. Marinho, P.R.D.; Silva, R.B.; Bourguignon, M.; Cordeiro, G.M.; Nadarajah, S. AdequacyModel: An R package for probability distributions and general purpose optimization. PLoS ONE 2019, 14, e0221487.


Table 1. Parameter estimates and standard errors (given in parentheses) for the models fitted to Bolsonaro’s vote proportion in Brazilian presidential elections in 2018.

| Model | $\hat{μ}_{1}$ | $\hat{μ}_{2}$ | $\hat{β}_{1}$ | $\hat{β}_{2}$ | $\hat{p}$ | $W^{*}$ | $A^{*}$ | $KS$ |
|---|---|---|---|---|---|---|---|---|
| $BB$ | 0.5816 (0.0035) | 0.1985 (0.0026) | 9.7510 (0.3201) | 29.3260 (1.3521) | 0.7268 (-) | 1.2937 | 7.4584 | 0.0477 |
| $UWUW$ | 0.2677 (0.0039) | 0.6491 (0.0027) | 2.7011 (0.0545) | 2.9611 (0.0567) | 0.5368 (-) | 0.4119 | 3.6768 | 0.0153 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
