Determinants of Blank and Null Votes in the Brazilian Presidential Elections

Rojas Guerra, Renata; De Souza Moraes, Kerolene; De Jesus Moreira Junior, Fernando; Peña-Ramírez, Fernando A.; Novaes Pereira, Ryan

doi:10.3390/stats8020038


by Renata Rojas Guerra ¹,*, Kerolene De Souza Moraes ¹, Fernando De Jesus Moreira Junior ¹, Fernando A. Peña-Ramírez ¹ and Ryan Novaes Pereira ²,*

¹ Department of Statistics, Universidade Federal de Santa Maria, Av. Roraima 1000, Santa Maria 97105-340, Brazil

² Department of Statistics, Faculty of Science and Technology, São Paulo State University (UNESP), Rua Sen. Roberto Simonsen 305, Presidente Prudente 19060-080, Brazil

* Authors to whom correspondence should be addressed.

Stats 2025, 8(2), 38; https://doi.org/10.3390/stats8020038

Submission received: 18 March 2025 / Revised: 29 April 2025 / Accepted: 2 May 2025 / Published: 13 May 2025

(This article belongs to the Section Regression Models)


Abstract

This study analyzes the factors influencing the proportions of blank and null votes in Brazilian municipalities during the 2018 presidential elections. The behavior of the variable of interest is examined using unit regression models within the Generalized Additive Models for Location, Scale, and Shape (GAMLSS) framework. Specifically, five unit regression models are explored (beta, simplex, Kumaraswamy, unit Weibull, and reflected unit Burr XII regressions), each incorporating submodels for both of its distribution parameters. Model selection and diagnostic procedures identify the beta regression model as the best fit. The findings reveal that the disaggregated municipal human development index (MHDI), particularly its income, longevity, and education dimensions, along with the municipality’s geographic region, significantly affects voting behavior. Notably, higher income and longevity values are linked to greater proportions of blank and null votes, whereas the educational level exhibits a negative relationship with the variable of interest. Additionally, municipalities in the Southeast region tend to have higher average proportions of blank and null votes. In terms of variability, the ability of a municipality’s population to acquire goods and services is shown to negatively influence the dispersion of vote proportions, while municipalities in the Northeast, North, and Southeast regions exhibit distinct patterns of variation compared to other regions. These results provide insights into the socioeconomic and regional determinants of electoral participation, contributing to broader discussions on political engagement and democratic representation in Brazil.

Keywords:

beta distribution; Brazilian elections; GAMLSS models; proportion of votes; unit regression models

1. Introduction

Latin American elections are typically characterized by a high proportion of blank and null votes (BNVs) [1]. According to [2], this type of voting can occur due to socio-demographic, institutional, or political factors. The socio-demographic approach considers economic and social development variables, including literacy rates, education, and healthcare. The institutional aspect encompasses features such as whether voting is compulsory, the type of electoral system, the structure of the ballot, and the timing of elections. Finally, in the political sphere, a BNV is seen as a deliberate act of protest driven by voter dissatisfaction with political or economic conditions. However, these classifications are not mutually exclusive, as several theorized causal mechanisms for null voting (e.g., protest motives) correspond to two or more categories and empirical variables; see, for example, [1,3,4,5].

In electoral systems where voting is compulsory, electoral participation tends to be higher, even when sanctions for abstention are minimal or loosely enforced [6]. However, the related literature emphasizes the need to discuss the consequences of compulsory voting for voting choices, particularly for the quality of electoral decisions [6]. In this context, an invalid vote is one that is not attributed to any political party or candidate. Invalid votes, which consist primarily of BNVs, are excluded from the allocation of mandates.

Recently, researchers have shown a growing interest in studying this voting behavior, particularly due to its prevalence in various countries. The works of [1,7,8] aimed to identify the factors associated with BNVs in Latin American countries. Ref. [9] focused their analyses on the occurrence of BNVs in Germany. Ref. [10] conducted comparisons between Latin America and post-communist Europe, while [11,12] centered their analyses on European democracies. Ref. [13] studied this type of electoral behavior in Western European countries, Australia, New Zealand, and the Americas. Additionally, we can mention [14,15] for their contributions focused on the Peruvian and Colombian elections, respectively.

In Brazil, compulsory voting in a political context marked by bribery, nepotism, extortion, and high levels of corruption and scandals increases resentment toward candidates, which is expressed through abstention [16]. In this way, BNVs can represent protest manifestations perceived as a reaction against specific candidates or policies [17]. According to the Electoral Glossary of the Superior Electoral Court [18], a blank vote occurs when a voter refrains from selecting any candidate. Conversely, null votes occur when voters enter numbers that are not valid, i.e., they do not correspond to any candidate. According to Article 211 of the Electoral Code, Law No. 4737/1965 [19], the candidate “most voted who has obtained an absolute majority of votes, excluding, for the calculation of this, the blank and null votes” is considered elected. Thus, casting a BNV in practice equates to abstention, even if the voter physically participates in the election.
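The majority rule in Article 211 can be illustrated with a minimal sketch. The vote counts below are hypothetical and chosen only for illustration. The sketch removes blank and null votes from the denominator before checking for an absolute majority.

```python
# Hypothetical vote counts in a single runoff (illustrative only, not real data).
votes = {"candidate_a": 470, "candidate_b": 400, "blank": 50, "null": 80}

total_cast = sum(votes.values())
# Article 211: blank and null votes are excluded when computing the majority.
valid = total_cast - votes["blank"] - votes["null"]

share_of_valid = votes["candidate_a"] / valid
share_of_total = votes["candidate_a"] / total_cast
elected = share_of_valid > 0.5  # absolute majority of the valid votes

print(f"{share_of_valid:.4f} {share_of_total:.4f} {elected}")  # 0.5402 0.4700 True
```

In this made-up example, the leading candidate receives fewer than half of all votes cast but more than half of the valid votes. This illustrates why a BNV works like an abstention in the final count.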

According to [20], in the 2018 elections, the percentage of null votes was 6.14% in the first round and 7.43% in the second. Blank votes accounted for 2.65% in the first round and 2.14% in the second. Thus, analyzing the dynamics of BNVs contributes to understanding the Brazilian political landscape and society’s stance on the obligation to choose representatives. In this context, the objective of this research is to identify the factors influencing the proportions of BNVs in Brazilian municipalities during the 2018 presidential elections, specifically (i) describing the behavior of the proportion of blank and null votes, (ii) fitting unit regression models to identify the determinants of the proportion of BNVs, and (iii) estimating the impact of the factors identified as determinants of the variable of interest.
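The combined BNV share per round follows by simple addition. This assumes, as the wording suggests, that the blank and null percentages are computed on the same base of votes cast. A short Python check of the figures reported in [20]:

```python
# Nationwide percentages reported for the 2018 presidential election [20].
null_pct = {"first": 6.14, "second": 7.43}
blank_pct = {"first": 2.65, "second": 2.14}

# Assumes both percentages share the same base (all votes cast), so they can be added.
bnv_pct = {r: round(null_pct[r] + blank_pct[r], 2) for r in null_pct}
print(bnv_pct)  # {'first': 8.79, 'second': 9.57}
```

Under that assumption, BNVs made up roughly 8.8% of the first-round vote and 9.6% of the second-round vote.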

Given that the proportion of BNVs is confined to a doubly bounded interval, it is crucial to employ models that account for this constraint. According to [21], variables of this type often exhibit skewness, heteroskedasticity, and, in some cases, heavy tails, which can distort results obtained from models that assume normality. Beta regression [21] is the classical model for doubly bounded, or unit, data such as vote proportions. However, simplex [22,23], Kumaraswamy (Kw) [24,25], unit Weibull (UW) [26], and reflected unit Burr XII (RUBXII) [27] regressions have been proposed as alternatives when the beta distribution fails to accommodate the data dynamics.

The models employed in this study belong to the Generalized Additive Models for Location, Scale, and Shape (GAMLSS) family [28]. This flexible framework is particularly suited to analyzing the role of socioeconomic disparities in BNVs during the 2018 Brazilian presidential elections, as it allows us to capture complex parametric relationships that have not previously been studied in this electoral context. GAMLSS models encompass a wide range of probability distributions and allow regression structures on multiple parameters of the distribution. The availability of distributions with bounded support makes them particularly suitable for modeling data in the unit interval, such as BNV proportions. This flexibility is one of the framework’s main advantages. It can also handle large fluctuations in the data (extreme values) and asymmetric variables more adequately than traditional models based on the Gaussian distribution. Furthermore, the ease of implementation through well-established R packages has contributed to the growing use of GAMLSS in recent literature [29,30,31,32]. The results obtained within this framework provide insights for advancing the understanding of electoral behavior and democratic participation, especially in contexts of socioeconomic inequality.

The remainder of this paper is structured as follows. Section 2 presents the definition of the GAMLSS class, which encompasses the unit regression models used in this study. Section 3 provides a description of the dataset and its construction. Section 4 presents and discusses the regression models fitted for the proportion of BNVs in the second round of the 2018 Brazilian presidential elections. Finally, Section 5 offers concluding remarks.

2. Unit Regression Models in the GAMLSS Family

In this section, we present the definition of the GAMLSS class, which encompasses the unit regression models used in this work. This class includes semi-parametric regression models. These require a probability distribution to be assumed for the response variable while also allowing for non-parametric smoothing functions. GAMLSS models were introduced by [33], with their initial developments presented in [34,35]. The objective of this framework was to establish a general class of regression models that encompasses the popular generalized linear models [36] and generalized additive models [37] while overcoming some of their limitations.

Let $Y_1, \dots, Y_n$ be a sample of $n$ independent observations, where each $Y_t$, $t = 1, \dots, n$, follows a probability distribution with density function $D(y_t; \boldsymbol{\theta}_t)$, and $\boldsymbol{\theta}_t = (\mu_t, \sigma_t, \nu_t, \tau_t)^{\top}$ represents the vector of parameters associated with this distribution. In the GAMLSS class, each component of $\boldsymbol{\theta}_t$ follows a regression structure, which can also incorporate additive components. Typically, $\mu_t$ represents a location parameter and $\sigma_t$ accounts for precision (or dispersion), while $\nu_t$ and $\tau_t$ define shape parameters. The structure of the GAMLSS class can be expressed as follows:

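$$
\begin{aligned}
g_1(\boldsymbol{\mu}) &= \boldsymbol{\eta}_1 = \mathbf{X}_1 \boldsymbol{\beta}_1 + \sum_{j=1}^{J_1} s_{1j}(\mathbf{x}_{1j}),\\
g_2(\boldsymbol{\sigma}) &= \boldsymbol{\eta}_2 = \mathbf{X}_2 \boldsymbol{\beta}_2 + \sum_{j=1}^{J_2} s_{2j}(\mathbf{x}_{2j}),\\
g_3(\boldsymbol{\nu}) &= \boldsymbol{\eta}_3 = \mathbf{X}_3 \boldsymbol{\beta}_3 + \sum_{j=1}^{J_3} s_{3j}(\mathbf{x}_{3j}),\\
g_4(\boldsymbol{\tau}) &= \boldsymbol{\eta}_4 = \mathbf{X}_4 \boldsymbol{\beta}_4 + \sum_{j=1}^{J_4} s_{4j}(\mathbf{x}_{4j}),
\end{aligned}
$$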

where $\boldsymbol{\mu} = (\mu_1, \dots, \mu_n)^{\top}$, $\boldsymbol{\sigma} = (\sigma_1, \dots, \sigma_n)^{\top}$, $\boldsymbol{\nu} = (\nu_1, \dots, \nu_n)^{\top}$, and $\boldsymbol{\tau} = (\tau_1, \dots, \tau_n)^{\top}$. For $k = 1, \dots, 4$:

- $g_k(\cdot)$ is a strictly monotonic and twice-differentiable link function that maps the parameter space of the $k$-th distribution parameter into $\mathbb{R}$;
- $\boldsymbol{\eta}_k$ is the linear predictor;
- $\boldsymbol{\beta}_k$ is a parameter vector associated with the regressor matrix $\mathbf{X}_k$, which is assumed fixed and known;
- $s_{kj}(\cdot)$ are non-parametric smoothing functions applied to the corresponding explanatory variables $\mathbf{x}_{kj}$.

The matrix $\mathbf{X}_k$ has dimensions $n \times p_k$, where $n$ is the sample size and $p_k$ represents the number of regressors associated with the $k$-th submodel, including a column of ones when an intercept is required. The vectors $\mathbf{x}_{k1}, \dots, \mathbf{x}_{kJ_k}$ form an $n \times J_k$ matrix, where $J_k$ represents the number of non-linear regressors associated with the $k$-th submodel.

The GAMLSS class is highly flexible [34]:

- the response variable is not restricted to the exponential family;
- in addition to the location parameter, other distribution parameters can also be modeled;
- different additive terms can be considered as covariates.

Furthermore, the gamlss package in R enables fitting this model and its subclasses across hundreds of probability distributions. If a required distribution is not available, custom “gamlss.family” objects can be created to extend the package’s functionalities [34]. This makes it possible to estimate unit regression models that are not implemented in the package itself.

In the analysis of data restricted to doubly bounded or unit intervals, it is essential to employ suitable distributions. Notable distributions in this context include the beta [21], Kw [24,25], simplex [22,23], UW [26,38], and RUBXII [27] distributions. These distributions are biparametric. The corresponding regression models are constructed through reparameterizations involving a location parameter and an additional parameter related to dispersion, precision, or shape. The GAMLSS framework is particularly appropriate in this context, as it allows for modeling not only the response variable’s mean (or other location parameter) but also its variability, facilitating a more comprehensive data analysis. This capability is especially relevant in our study, which deals with proportions that may exhibit heteroskedasticity and skewed behavior. The regression models presented in the following subsections, which are part of the GAMLSS model formulation, rest on the following key assumptions:

- the response variable is continuous and restricted to the interval (0, 1);
- the observations are independent;
- the response variable follows a beta, simplex, Kw, UW, or RUBXII distribution.

Further, note that the general formulation of GAMLSS models allows regression structures for up to four distributional parameters, typically referred to as location, scale, and shape parameters. However, the distributions considered in the next subsections are biparametric. In these cases, the GAMLSS framework includes regression structures only for the two parameters that define each distribution. In this work, we follow the approach adopted in previous studies, such as [30,31,39], which excluded additive components from the model. Restricting the model to linear terms improves the interpretability of the estimated effects, since, depending on the chosen link function, the estimated parameters admit a straightforward interpretation. Furthermore, more parsimonious specifications are generally preferable, and excluding nonlinear components achieves this as long as the final model is well specified. Moreover, to the best of our knowledge, there is a lack of studies considering the additive component for the Kw, UW, and RUBXII regressions. Thus, the models considered take the form

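$$
g_1(\boldsymbol{\mu}) = \boldsymbol{\eta}_1 = \mathbf{X}_1 \boldsymbol{\beta}_1 \tag{1}
$$

and

$$
g_2(\boldsymbol{\sigma}) = \boldsymbol{\eta}_2 = \mathbf{X}_2 \boldsymbol{\beta}_2, \tag{2}
$$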

where $\boldsymbol{\beta}_1 \in \mathbb{R}^{r}$ and $\boldsymbol{\beta}_2 \in \mathbb{R}^{q}$ are unknown parameter vectors linked to the linear predictors $\boldsymbol{\eta}_1$ and $\boldsymbol{\eta}_2$, respectively. The matrices $\mathbf{X}_1$ and $\mathbf{X}_2$ have dimensions $n \times r$ and $n \times q$, with $r + q < n$. They contain the observed regressors associated with the predictors and are assumed fixed and known.

For these models, the link functions $g_1: (0, 1) \to \mathbb{R}$ and $g_2: (0, \infty) \to \mathbb{R}$ are strictly monotonic and twice differentiable. The exception is beta regression: there, $\sigma$ in gamlss lies in the unit interval, so $g_2: (0, 1) \to \mathbb{R}$. Specifically, we employ the logit link function for parameters whose parameter space is the unit interval and the logarithmic link function for parameters whose parameter space is the positive real line. Finally, the systematic components in Equations (1) and (2) are combined with the random component of each distribution. The following subsections describe these components and the parameterizations used for each distribution.
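The following sketch makes the link functions concrete. It evaluates Equations (1) and (2) for a single municipality and inverts the logit and logarithmic links to recover $\mu$ and $\sigma$. The design rows and coefficients are hypothetical, not estimates from this study. The logit version of the $\sigma$ link corresponds to the beta case, in which $\sigma$ lies in the unit interval.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(x: list[float], beta: list[float]) -> float:
    """eta = x^T beta for one observation (x includes a leading 1 for the intercept)."""
    return sum(xi * bi for xi, bi in zip(x, beta))


# Hypothetical design rows and coefficients (illustrative, not estimates from the paper).
x1, beta1 = [1.0, 0.70, 0.80], [-2.0, 1.5, -1.0]  # mu submodel: intercept + 2 covariates
x2, beta2 = [1.0, 0.70], [-1.0, -0.5]             # sigma submodel: intercept + 1 covariate

mu = inv_logit(linear_predictor(x1, beta1))           # logit link: mu in (0, 1)
sigma_pos = math.exp(linear_predictor(x2, beta2))     # log link: sigma in (0, inf)
sigma_unit = inv_logit(linear_predictor(x2, beta2))   # logit link, as for the beta sigma
print(mu, sigma_pos, sigma_unit)
```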

2.1. Beta Regression

Let $Y$ be a random variable with a beta distribution, parameterized by the mean $\mu \in (0, 1)$ and the dispersion parameter $\sigma \in (0, 1)$ [40]. Its probability density function (pdf) is given by

$$
f(y; \mu, \sigma) = \frac{1}{B\left(\mu (1 - \sigma^{2}) \sigma^{-2},\, (1 - \mu)(1 - \sigma^{2}) \sigma^{-2}\right)}\, y^{\mu (1 - \sigma^{2}) \sigma^{-2} - 1} (1 - y)^{(1 - \mu)(1 - \sigma^{2}) \sigma^{-2} - 1}, \tag{3}
$$

where $0 < y < 1$ and $B(\alpha, \beta) = \int_{0}^{1} x^{\alpha - 1} (1 - x)^{\beta - 1}\, dx$ is the beta function. Under this parameterization, the variance of $Y$ is given by

$$
\operatorname{Var}(Y) = \sigma^{2} \mu (1 - \mu). \tag{4}
$$

Notably, this parameterization is implemented for the beta density in the gamlss package in R [34,41,42].
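The sketch below maps $(\mu, \sigma)$ to the standard shape parameters using the convention of the gamlss beta family, $\alpha = \mu(1 - \sigma^2)/\sigma^2$ and $\beta = (1 - \mu)(1 - \sigma^2)/\sigma^2$. It then checks numerically that the mean is $\mu$ and the variance is $\sigma^2 \mu (1 - \mu)$. The parameter values are arbitrary illustrations.

```python
def beta_shapes(mu: float, sigma: float) -> tuple[float, float]:
    """Map the gamlss-style (mu, sigma) to the usual shape parameters (alpha, beta)."""
    phi = (1.0 - sigma**2) / sigma**2  # precision alpha + beta
    return mu * phi, (1.0 - mu) * phi


def beta_var_from_shapes(a: float, b: float) -> float:
    """Variance of Beta(a, b) in the standard shape parameterization."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


mu, sigma = 0.08, 0.3
a, b = beta_shapes(mu, sigma)
print(a / (a + b))                                      # equals mu up to rounding
print(beta_var_from_shapes(a, b), sigma**2 * mu * (1 - mu))  # the two agree
```

The identity follows because $\alpha + \beta + 1 = 1/\sigma^2$. Hence $\operatorname{Var}(Y) = \mu(1-\mu)/(\alpha+\beta+1) = \sigma^2\mu(1-\mu)$.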

The beta regression model was proposed by [21], albeit with a parameterization different from that available in gamlss. Since then, this model has been used to study various practical problems, and some generalizations have been introduced in the literature. For instance, Refs. [43,44] proposed beta regression models with parametric link functions. Ref. [45] developed control charts based on the residuals of beta regression. Additionally, Ref. [46] suggested specification tests and presented an application to COVID-19 data.

2.2. Simplex Regression

Let $Y$ be a random variable following a simplex distribution, parameterized by the mean $\mu \in (0, 1)$ and the dispersion parameter $\sigma \in (0, \infty)$ [22]. Its pdf is given by

$$
f(y; \mu, \sigma) = \left\{2 \pi \sigma^{2} [y (1 - y)]^{3}\right\}^{-1/2} \exp\left\{-\frac{(y - \mu)^{2}}{2 \sigma^{2} \mu^{2} y (1 - y) (1 - \mu)^{2}}\right\}. \tag{5}
$$

In the context of regression based on the simplex distribution, one of the earliest contributions was the work of [47], whose proposal allows for modeling both linear and nonlinear relationships between predictors and the response variable. Ref. [48] studied the effect of measurement errors on model estimation. Ref. [49] adopted a Bayesian approach and conducted comparisons with beta regression. Additionally, Ref. [50] compared beta and simplex regressions to explain homicide rates in the capitals of Brazilian states. The simplex regression model is available in the gamlss package.
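As a numerical sanity check of Equation (5), the sketch below integrates the simplex density over (0, 1) with a midpoint rule. It confirms that the density has unit mass and mean $\mu$. The parameter values and grid size are arbitrary choices for illustration.

```python
import math


def simplex_pdf(y: float, mu: float, sigma: float) -> float:
    """Simplex density with mean mu and dispersion sigma, as in Equation (5)."""
    v = y * (1.0 - y)
    d = (y - mu) ** 2 / (v * mu**2 * (1.0 - mu) ** 2)  # unit deviance
    return (2.0 * math.pi * sigma**2 * v**3) ** -0.5 * math.exp(-d / (2.0 * sigma**2))


def midpoint_moments(pdf, n: int = 100_000) -> tuple[float, float]:
    """Midpoint-rule approximations of the total mass and the mean on (0, 1)."""
    h = 1.0 / n
    mass = mean = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        f = pdf(y)
        mass += f * h
        mean += y * f * h
    return mass, mean


mu, sigma = 0.3, 0.5
mass, mean = midpoint_moments(lambda y: simplex_pdf(y, mu, sigma))
print(round(mass, 4), round(mean, 4))  # close to 1 and to mu, respectively
```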

2.3. Kumaraswamy Regression

Let $Y$ be a random variable following the Kw distribution, parameterized by the median $\mu \in (0, 1)$ and the dispersion parameter $\sigma \in (0, \infty)$ [24]. Its pdf is given by

$$
f(y; \mu, \sigma) = \frac{\log(0.5)}{\sigma^{2} \log\left(1 - \mu^{1/\sigma^{2}}\right)}\, y^{1/\sigma^{2} - 1} \left(1 - y^{1/\sigma^{2}}\right)^{\log(0.5)/\log\left(1 - \mu^{1/\sigma^{2}}\right) - 1}, \tag{6}
$$

where $0 < y < 1$ and $\log$ denotes the natural logarithm.

The parameterization in Equation (6) was also employed by [51] to develop a time-series model based on the Kw distribution. Additionally, Ref. [52] proposed a generalization of this regression model using the Aranda-Ordaz link function. For recent studies on this distribution, see [53,54,55]. Although Kw regression is not implemented in the gamlss package, it has been made available by [56].
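Writing $a = 1/\sigma^2$ and $b = \log(0.5)/\log(1 - \mu^{1/\sigma^2})$, the Kw model in this section corresponds to the standard Kumaraswamy density $a\,b\,y^{a-1}(1 - y^{a})^{b-1}$. The sketch below applies this mapping with illustrative parameter values. It uses the standard CDF $F(y) = 1 - (1 - y^a)^b$ to check that $\mu$ is the median, and compares the density with a finite-difference derivative of the CDF.

```python
import math


def kw_shapes(mu: float, sigma: float) -> tuple[float, float]:
    """Median-based (mu, sigma) -> standard Kumaraswamy shapes (a, b)."""
    a = 1.0 / sigma**2
    b = math.log(0.5) / math.log(1.0 - mu**a)
    return a, b


def kw_cdf(y: float, a: float, b: float) -> float:
    """Standard Kumaraswamy CDF: F(y) = 1 - (1 - y^a)^b."""
    return 1.0 - (1.0 - y**a) ** b


def kw_pdf(y: float, a: float, b: float) -> float:
    """Standard Kumaraswamy density: a b y^(a-1) (1 - y^a)^(b-1)."""
    return a * b * y ** (a - 1.0) * (1.0 - y**a) ** (b - 1.0)


mu, sigma = 0.08, 0.8
a, b = kw_shapes(mu, sigma)
print(kw_cdf(mu, a, b))  # approximately 0.5: mu is the median

# Finite-difference check that the density is the derivative of the CDF.
h, y0 = 1e-6, 0.1
numeric = (kw_cdf(y0 + h, a, b) - kw_cdf(y0 - h, a, b)) / (2 * h)
print(numeric, kw_pdf(y0, a, b))
```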

2.4. Unit Weibull Regression

Let $Y$ be a random variable following the UW distribution, indexed by its $(\tau \times 100)$-th quantile $\mu \in (0, 1)$ and the shape parameter $\sigma \in (0, \infty)$ [26]. Its pdf is given by

$$
f(y; \mu, \sigma) = \frac{\sigma}{y} \left(\frac{\log \tau}{\log \mu}\right) \left(\frac{\log y}{\log \mu}\right)^{\sigma - 1} \tau^{(\log y / \log \mu)^{\sigma}},
$$

where $0 < y < 1$.

In this work, we consider $\tau = 0.5$ to model the median of $Y$, facilitating comparisons with the Kw regression. The UW distribution is also part of the extended unit Weibull family [57]. It has only recently been introduced in the literature, so most advances pertain to the properties of this distribution (see, for example, Refs. [58,59,60]). Although unit Weibull regression is not available in the gamlss package, implementations have been provided by [56].
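Integrating the UW density gives the CDF $F(y) = \tau^{(\log y / \log \mu)^{\sigma}}$, so that $F(\mu) = \tau$. This CDF is our own derivation and is not stated in the text above. The sketch below checks it with illustrative values and compares the density with a finite-difference derivative of the CDF.

```python
import math


def uw_cdf(y: float, mu: float, sigma: float, tau: float = 0.5) -> float:
    """UW CDF in the quantile parameterization: F(y) = tau ** ((log y / log mu) ** sigma)."""
    return tau ** ((math.log(y) / math.log(mu)) ** sigma)


def uw_pdf(y: float, mu: float, sigma: float, tau: float = 0.5) -> float:
    """UW density with tau-th quantile mu and shape sigma, as written in the text."""
    r = math.log(y) / math.log(mu)
    return (sigma / y) * (math.log(tau) / math.log(mu)) * r ** (sigma - 1.0) * tau ** (r**sigma)


mu, sigma = 0.08, 2.0
print(uw_cdf(mu, mu, sigma))  # 0.5: with tau = 0.5, mu is the median

# Finite-difference check that the density is the derivative of the CDF.
h, y0 = 1e-6, 0.2
numeric = (uw_cdf(y0 + h, mu, sigma) - uw_cdf(y0 - h, mu, sigma)) / (2 * h)
print(numeric, uw_pdf(y0, mu, sigma))
```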

2.5. Reflected Unit Burr XII Regression

Let $Y$ be a random variable following the RUBXII distribution, indexed by its $(\tau \times 100)$-th quantile $\mu \in (0, 1)$ and the shape parameter $\sigma \in (0, \infty)$ [27]. Its pdf is given by

$$
f(y; \mu, \sigma) = \frac{-\sigma \log(1 - \tau)\, [-\log(1 - y)]^{\sigma - 1}}{(1 - y) \log\left(1 + [-\log(1 - \mu)]^{\sigma}\right)} \left(1 + [-\log(1 - y)]^{\sigma}\right)^{\frac{\log(1 - \tau)}{\log\left(1 + [-\log(1 - \mu)]^{\sigma}\right)} - 1},
$$

where $0 < y < 1$.

We set $\tau = 0.5$ to model the median of $Y$, enabling comparisons with the Kw and UW regressions. The RUBXII distribution is also part of the extended reflected unit Weibull family [57], and the regression model was recently introduced in the literature by [27]. Implementations following the gamlss package approach have been made available by [56].
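Integrating the RUBXII density gives the CDF

$$F(y) = 1 - \left(1 + [-\log(1-y)]^{\sigma}\right)^{\log(1-\tau)/\log\left(1 + [-\log(1-\mu)]^{\sigma}\right)},$$

which satisfies $F(\mu) = \tau$. This CDF is our own derivation, not stated in the text above. The sketch below checks it with illustrative values and verifies the density against a finite-difference derivative.

```python
import math


def _neg_log1m(z: float) -> float:
    """Return -log(1 - z), i.e., log (1 - z)^{-1}, for 0 <= z < 1."""
    return -math.log1p(-z)


def _exponent(mu: float, sigma: float, tau: float) -> float:
    """Burr-type exponent log(1 - tau) / log(1 + [-log(1 - mu)]^sigma); always negative."""
    return math.log(1.0 - tau) / math.log1p(_neg_log1m(mu) ** sigma)


def rubxii_cdf(y: float, mu: float, sigma: float, tau: float = 0.5) -> float:
    """RUBXII CDF: 1 - (1 + [-log(1 - y)]^sigma)^exponent."""
    return 1.0 - (1.0 + _neg_log1m(y) ** sigma) ** _exponent(mu, sigma, tau)


def rubxii_pdf(y: float, mu: float, sigma: float, tau: float = 0.5) -> float:
    """RUBXII density; (-e) * sigma equals -sigma log(1 - tau) / log(1 + [-log(1 - mu)]^sigma)."""
    lam = _neg_log1m(y)
    e = _exponent(mu, sigma, tau)
    return (-e) * sigma * lam ** (sigma - 1.0) / (1.0 - y) * (1.0 + lam**sigma) ** (e - 1.0)


mu, sigma = 0.08, 1.5
print(rubxii_cdf(mu, mu, sigma))  # approximately 0.5: mu is the median

h, y0 = 1e-6, 0.1
numeric = (rubxii_cdf(y0 + h, mu, sigma) - rubxii_cdf(y0 - h, mu, sigma)) / (2 * h)
print(numeric, rubxii_pdf(y0, mu, sigma))
```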

3. Data Preparation and Descriptive Analysis

In this section, we describe the database and its construction. This study is based on a cross-sectional dataset, where the variable of interest is the proportion of blank and null votes observed in each Brazilian municipality during the 2018 presidential elections. The data were obtained from the database of the Institute of Applied Economic Research [61], which compiles economic, financial, demographic, geographic, and social data for Brazilian municipalities and states. Additionally, open databases provided by the Brazilian Institute of Geography and Statistics [62] and the Atlas of Human Development in Brazil [63] were consulted.

Municipalities with missing data for any of the variables included in the analysis, as well as those created after 2010, were excluded. These exclusions correspond to 16 municipalities: 1 in the North region (Mojuí), 6 in the Northeast (Trancoso, Olivença, Cococi, Messejana, Vertentes do Lajeado, and Nazária), 2 in the Central–West (Figueirão and Paraíso das Águas), 3 in the Southeast (Ponte de Itabapoana, São João Marcos, and Santo Amaro), and 4 in the South (Porto de Cima, Pinto Bandeira, Pescaria Brava, and Balneário Rincão). After these exclusions, the final dataset comprises 5562 observations.
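The exclusion rule can be sketched as a simple filter. The records, the field names (such as `created`), and the values below are hypothetical. In the actual analysis, the list of variables would be all of those in Table 1.

```python
# Hypothetical municipality records; field names and values are illustrative only.
records = [
    {"name": "A", "created": 1950, "PBNVS": 0.07, "MHDI_I": 0.68},
    {"name": "B", "created": 2013, "PBNVS": 0.09, "MHDI_I": 0.61},  # created after 2010
    {"name": "C", "created": 1890, "PBNVS": None, "MHDI_I": 0.72},  # missing value
    {"name": "D", "created": 1962, "PBNVS": 0.11, "MHDI_I": 0.70},
]
# In the study this would cover every variable in Table 1; two are used here for brevity.
ANALYSIS_VARS = ("PBNVS", "MHDI_I")


def keep(rec: dict) -> bool:
    """Keep a municipality only if it existed by 2010 and has no missing analysis variable."""
    complete = all(rec.get(v) is not None for v in ANALYSIS_VARS)
    return rec["created"] <= 2010 and complete


analysis_set = [r for r in records if keep(r)]
print([r["name"] for r in analysis_set])  # ['A', 'D']
```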

Table 1 presents the variables considered in the analysis, along with their respective sources and descriptions. All the variables are quantitative, except for Region and Capital, which are categorical. The variable Region indicates the geographic region to which each municipality belongs, while Capital identifies whether the municipality is a state capital.

For the regression models, the Capital variable is transformed into a dummy variable that takes the value one if the municipality is a state capital and zero otherwise. Region is treated as a polytomous variable, with the Central–West region defined as the reference category. Dummy variables are created for the other regions, assuming the value one if the municipality belongs to the respective region and zero otherwise.
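The coding described above is standard treatment (reference) coding. A minimal sketch follows. The dummy names, such as `Region_North`, are hypothetical labels chosen for illustration, and an ASCII hyphen is used in "Central-West".

```python
REGIONS = ("Central-West", "North", "Northeast", "Southeast", "South")
REFERENCE = "Central-West"  # reference category, as stated in the text


def encode(region: str, is_capital: bool) -> dict[str, int]:
    """One 0/1 dummy per non-reference region, plus the Capital dummy."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    row = {f"Region_{r}": int(region == r) for r in REGIONS if r != REFERENCE}
    row["Capital"] = int(is_capital)
    return row


print(encode("Northeast", False))
# {'Region_North': 0, 'Region_Northeast': 1, 'Region_Southeast': 0, 'Region_South': 0, 'Capital': 0}
```

A municipality in the reference region has all regional dummies equal to zero. Each regional coefficient is therefore read as a contrast with the Central–West.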

The variables PBNVF and PBNVS denote the proportion of BNVs in the first and second rounds of the 2018 elections, respectively. These proportions are computed as the ratio of the total number of BNVs to the overall number of votes cast.
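The proportion is a simple ratio per municipality and round. The sketch below uses hypothetical counts.

```python
def bnv_proportion(blank: int, null: int, total_cast: int) -> float:
    """Share of blank plus null votes among all votes cast (one municipality, one round)."""
    if total_cast <= 0:
        raise ValueError("total_cast must be positive")
    if blank + null > total_cast:
        raise ValueError("blank + null cannot exceed the votes cast")
    return (blank + null) / total_cast


# Hypothetical second-round counts for a single municipality.
pbnvs = bnv_proportion(blank=120, null=410, total_cast=6_200)
print(round(pbnvs, 4))  # 0.0855
```

The observed PBNVS ranges from 0.02 to 0.23 (Table 2), so no value lies on the boundary. The open-interval assumption (0, 1) of the unit models is therefore met without any adjustment.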

One of the key explanatory variables is the municipal human development index (MHDI), a numerical indicator ranging from 0 to 1. As illustrated in Figure 1, lower values indicate worse conditions, while higher values reflect better socioeconomic conditions. In this study, we use the disaggregated MHDI, which consists of three dimensions: income, health, and education. This allows for the estimation of their individual effects. The variable MHDI_I measures the average ability of a municipality’s population to acquire goods and services. MHDI_H incorporates information on life expectancy and mortality, while MHDI_E quantifies the educational level of both the adult and young population.

Lastly, the demographic density (DD) of the municipalities is included as an explanatory variable. This indicator is defined as the ratio of the total number of inhabitants to the land area of the municipality.

Table 2 presents the descriptive statistics of the response variable and the quantitative covariates.

The variable PBNVS has a mean of 0.08 and a median of 0.08, indicating that the distribution is centered around 8%. The minimum and maximum observed values are 0.02 and 0.23, respectively, resulting in a range of 0.21. The distribution exhibits positive skewness (0.77), that is, asymmetry to the right: municipalities concentrate at lower values, and a tail extends toward higher proportions. The kurtosis coefficient (0.02), interpreted as excess kurtosis, is close to zero, suggesting tails of roughly the same weight as those of the normal distribution. Furthermore, the coefficient of variation (41.33%) denotes a high degree of relative dispersion, reinforcing the heterogeneity of PBNVS across municipalities. These empirical characteristics justify the adoption of flexible distributional assumptions at the modeling stage.
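The statistics in Table 2 can be reproduced with a short routine such as the one below. It uses simple moment estimators for skewness and kurtosis and reports kurtosis as excess kurtosis (0 for a normal distribution). The paper does not state which estimators it used, so this choice is an assumption.

```python
import statistics


def describe(x: list[float]) -> dict[str, float]:
    """Summary statistics of the kind reported in Table 2 (moment-based shape measures)."""
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(x),
        "min": min(x),
        "max": max(x),
        "range": max(x) - min(x),
        "skewness": m3 / m2**1.5,
        "excess_kurtosis": m4 / m2**2 - 3.0,
        "cv_percent": 100.0 * sd / mean,
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Under this convention, a kurtosis near zero, such as the 0.02 reported for PBNVS, indicates tails comparable to those of a normal distribution.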

Figure 2 presents the boxplot and histogram of the variable PBNVS. Table 3 reports the correlations, computed using Spearman’s method. The response variable is positively correlated with most covariates, most strongly with PBNVF, followed by MHDI_H. This suggests that these covariates play a relevant role in this study, as they are associated with variations in the response variable.

Figure 2 further substantiates the descriptive findings. The histogram, constructed using relative frequencies and overlaid with a density estimate, reveals a pronounced concentration of observations between 0.05 and 0.10, with a visible mode around the central value. The positively skewed shape of the density function confirms the asymmetry previously noted, suggesting that most municipalities present relatively low levels of blank and null votes, while a smaller subset exhibits notably higher proportions.

The smooth decline of the density curve toward the right tail reinforces the presence of rare but influential extreme values. The boxplot displays a median consistent with the reported central tendency and highlights the presence of upper outliers. These outliers, in conjunction with the interquartile range, reflect the substantial heterogeneity across observations and the long right tail of the distribution. Together, the histogram and boxplot emphasize the asymmetric, dispersed, and bounded nature of the data, reinforcing the choice of continuous distributions defined on the (0,1) interval and capable of capturing such features in the modeling stage.

Table 3 presents the Spearman correlation coefficients between the proportions of blank and null votes in both rounds (PBNVF and PBNVS) and the selected covariates. A moderate and significant correlation is observed between the two voting rounds ($\rho = 0.5076$; $p < 0.005$), indicating partially consistent behavior across the two rounds.
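Spearman’s coefficient is the Pearson correlation computed on ranks. A minimal pure-Python sketch follows. Ties are assigned average ranks, a common convention assumed here because the text does not state how ties were handled. The sketch omits the p-value computation.

```python
import math


def average_ranks(x: list[float]) -> list[float]:
    """Ranks 1..n; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotone relation: rho = 1
```

Because it depends only on ranks, the coefficient is unaffected by the skewness of PBNVS noted above.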

PBNVS exhibits significant positive correlations with all the dimensions of the municipal human development index (MHDI), especially with health (MHDI_H, $\rho = 0.4218$) and income (MHDI_I, $\rho = 0.3651$). This suggests that higher development levels are associated with increased blank and null voting in the second round. In contrast, PBNVF shows negligible and mostly non-significant associations with the MHDI indicators.

Demographic density (DD) is positively and significantly correlated with both voting variables and all the MHDI dimensions, indicating that more densely populated municipalities tend to show higher development levels and greater proportions of blank and null votes.

Table 4 presents the frequency distribution of the Region variable. Approximately 50% of the municipalities in the sample belong to the South (S) and Southeast regions, which are also the most developed in the country. Conversely, the North (N) and Northeast (NE) regions, which exhibit lower socioeconomic indicators, account for about 40% of the observations. This regional structure mirrors Brazil’s official territorial organization and the administrative concentration of municipalities, particularly in the Northeast. Given these disproportions, regional stratification may contribute to the variability in the response variable and covariates, potentially introducing spatial heterogeneity that must be accounted for in the modeling process.

Figure 3 displays boxplots of the PBNVS according to the dummy variable Capital and the Region variable. Regarding the relationship between PBNVS and Region, the S region stands out from the others. It exhibits the highest central tendency and greatest variability, with a wide interquartile range and the highest maximum value. In contrast, the NE region shows the lowest median, despite the presence of some upper outliers. The remaining regions display intermediate values with relatively compact distributions. These differences reinforce the presence of regional heterogeneity and support the inclusion of the geographic region as a relevant covariate in the modeling process. In the comparison between PBNVS and Capital, a higher incidence of outliers is observed in non-capital municipalities.

4. Fitting the Unit Regressions

This section presents and discusses the regression models fitted considering PBNVS as the dependent variable. Parameter estimation for all the models was performed using the maximum likelihood method with the Rigby and Stasinopoulos (RS) algorithm [42]. Model selection was based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
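The two criteria follow their standard definitions, $\text{AIC} = -2\ell + 2k$ and $\text{BIC} = -2\ell + k \log n$. Here $\ell$ is the maximized log-likelihood and $k$ the number of estimated parameters, and lower values are preferred. The sketch below uses hypothetical log-likelihoods and parameter counts, not the values in Table 6.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 log L + k log n (lower is better)."""
    return -2.0 * loglik + k * math.log(n)


n = 5562  # number of municipalities after the exclusions described in Section 3
# Hypothetical maximized log-likelihoods and parameter counts (not the values in Table 6).
fits = {"model_1": (12_000.0, 20), "model_2": (12_006.0, 24)}

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n))
print(best_aic, best_bic)  # model_2 model_1
```

With these made-up numbers the criteria disagree. The four extra parameters buy enough likelihood to improve the AIC but not the BIC, whose penalty grows with $\log n$.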

As suggested by [42], the complete model, including all the covariates in both the location ($\mu$) and dispersion ($\sigma$) submodels, was initially fitted for all candidate classes, i.e., with beta, simplex, Kw, UW, and RUBXII random components.

The specification presenting the most favorable information criteria was then subjected to a stepwise selection procedure for further refinement, ensuring parsimony without compromising model performance.

The explanatory variables considered included socioeconomic indicators, electoral information, and regional effects. The categorical variable Region was represented by dummy variables, with the Central–West (CW) region used as the reference category. The binary variable Capital, indicating whether the municipality is a state capital, was initially included in all the models. However, in the models employing the Kw, UW, and RUBXII distributions, including Capital in the dispersion submodel led to convergence issues during parameter estimation. To address this, the variable was excluded from the $\sigma$ submodel in these cases, which ensured stable and reliable model fitting.

Table 5 presents the estimated coefficients, corresponding standard errors, and associated p-values for the complete models. All the models incorporate submodels for both the location ($\mu$) and dispersion ($\sigma$) parameters.

We observe that, for the location submodel, most distributional classes yielded coefficients with similar magnitudes and directions. The only exception is the variable MHDI_I, which has a negative coefficient in the Kw and RUBXII models and a positive coefficient in the other classes. Indeed, the estimated coefficients of all the covariates are noticeably similar between the Kw and RUBXII models, and the beta and simplex models likewise yield similar estimates. This result is expected, given that the beta and simplex models are parameterized in terms of the mean, whereas the others are parameterized in terms of the median.

There is a strong negative association between MHDI_E and PBNVS, suggesting that improvements in local educational attainment are consistently associated with lower levels of electoral disengagement. On the other hand, the health-related component (MHDI_H) reflects life expectancy and mortality. It exhibits a generally positive effect, indicating that municipalities with better health and longevity outcomes may display higher rates of blank and null voting. This relationship may reflect underlying behavioral or institutional phenomena, such as protest voting or political disaffection in areas with relatively greater access to public services.

In contrast, the income dimension (MHDI_I) shows a more nuanced effect. The Kw and RUBXII models yield negative coefficients that are not significant at the 5% level (p ≈ 0.07), the beta and simplex models yield positive but nonsignificant coefficients, and only the UW model yields a significant positive estimate. This variability may reflect the complex interplay between economic conditions and political engagement, where income alone may not fully capture the socio-political motivations behind non-nominal voting behavior.

The regional dummy variables also play a significant role in capturing structural heterogeneity across the Brazilian territory. Even after accounting for the human development indicators, municipalities in the Southeast exhibit the highest average levels of blank and null votes relative to the Central–West reference category, which may point to regional cultural or political dynamics not fully captured by conventional socioeconomic variables. Including regional effects therefore helps mitigate omitted variable bias and improves model calibration.

From a distributional standpoint, apart from the sign of MHDI_I, all the models yield similar signs and relative magnitudes for the covariates, but differences in standard errors and p-values show that inferential conclusions are sensitive to the distributional assumption. While the beta and simplex models are often adequate for roughly symmetric or light-tailed data, they may struggle with strongly asymmetric patterns, particularly near the boundaries. In contrast, the Kumaraswamy, unit Weibull, and especially the RUBXII distributions can offer greater adaptability to skewness and tail behavior. The RUBXII distribution, derived from a reparameterized Burr XII distribution, can be particularly robust in handling extreme proportion values, though it introduces additional computational complexity and challenges in parameter interpretation.

Model performance was further assessed using information criteria and goodness-of-fit measures, with the AIC, BIC, and pseudo-$R^{2}$ (RSQ) values summarized in Table 6. In addition, predictive accuracy was evaluated through standard forecasting metrics, namely, the mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean squared error (RMSE). These metrics play complementary roles: the AIC and BIC balance goodness of fit and model parsimony, whereas the RSQ and the residual-based error metrics assess explanatory and predictive accuracy. The results indicate that the beta regression model achieved the lowest AIC and BIC values. Regarding the RSQ, the RUBXII, Kw, and beta models showed the highest values, in that order. In terms of predictive accuracy, all the candidate classes presented similar MAE and RMSE values, differing by at most 0.0005 and 0.0008, respectively, with the UW model obtaining the lowest MAPE (approximately 17.5%). Given that the MAPE, MAE, and RMSE values are relatively low, all the models can be considered to provide satisfactory predictive accuracy.
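The three accuracy measures can be written as short functions. This is a minimal Python sketch, assuming the fitted values passed in are the model's point predictions (means for beta and simplex, medians for Kw, UW, and RUBXII) and that MAPE is reported in percent, which matches the roughly 17.5% figure quoted above. The RSQ is omitted because its exact pseudo-$R^{2}$ definition is not restated here.

```python
import math
from typing import Sequence


def _check(y: Sequence[float], yhat: Sequence[float]) -> None:
    if len(y) != len(yhat) or len(y) == 0:
        raise ValueError("y and yhat must be non-empty and of equal length")


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y, yhat)
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(y, yhat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes every observed y != 0."""
    _check(y, yhat)
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, yhat)) / len(y)
```

Because every PBNVS value is strictly positive (the minimum in Table 2 is 0.02), the division in `mape` is well defined for these data.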

The distribution that performed best across the evaluation criteria was selected. Because the beta regression achieved the lowest AIC and BIC, while the error measures were similar across candidates, the beta regression model was identified as the most appropriate. With the beta distribution selected as the random component of the GAMLSS regression, we refined its specification using a stepwise model selection procedure. To this end, we used the AIC as the selection criterion in the stepGAIC function of the gamlss package and then checked the significance of the regressors, retaining only those significant at the 5% level. We refer to the final model selected through this procedure as Model A. To formally assess the effect of the regressors on the variance of the PBNVS, we considered the relationship between the σ parameter and the variance of the beta distribution, presented in Equation (4), interpreting σ as an indicator of data heterogeneity. In addition, we fitted a version of the model assuming constant dispersion, which we refer to as Model B. Table 7 presents the estimates, standard errors, and p-values of the final model and its constant-dispersion competitor.
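The role of σ can be made concrete under one assumption: that Equation (4) is the usual gamlss BE parameterization, Var(Y) = σ²μ(1 − μ) with 0 < σ < 1, which is consistent with the logit link used for σ in the final dispersion submodel. Under that form, a larger σ means more dispersion around the same mean. The sketch below maps (μ, σ) to the standard beta shape parameters and checks that the textbook beta variance reproduces σ²μ(1 − μ); the function names are illustrative only.

```python
from typing import Tuple


def beta_shapes(mu: float, sigma: float) -> Tuple[float, float]:
    """Map (mu, sigma) to beta shape parameters (a, b).

    Assumes the gamlss BE parameterization, in which a + b = (1 - sigma^2) / sigma^2.
    """
    if not (0.0 < mu < 1.0 and 0.0 < sigma < 1.0):
        raise ValueError("need 0 < mu < 1 and 0 < sigma < 1")
    total = (1.0 - sigma ** 2) / sigma ** 2  # a + b
    return mu * total, (1.0 - mu) * total


def variance_from_shapes(a: float, b: float) -> float:
    """Textbook variance of Beta(a, b): ab / ((a + b)^2 (a + b + 1))."""
    s = a + b
    return a * b / (s ** 2 * (s + 1.0))


def be_variance(mu: float, sigma: float) -> float:
    """Variance under the assumed parameterization: sigma^2 * mu * (1 - mu)."""
    return sigma ** 2 * mu * (1.0 - mu)
```

The two variance functions agree because a + b + 1 = 1/σ² and ab/(a + b)² = μ(1 − μ), so a negative coefficient in the σ submodel implies a smaller variance at a fixed mean.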

The comparison between Models A and B reveals that, while the general directional relationships between the dependent variable and the regressors are preserved across both models, the magnitude of the coefficients varies. These variations highlight the impact of assuming constant dispersion in Model B, which may lead to biased estimates and underscore the necessity of appropriately modeling dispersion in regression analyses.

Since Model B is nested within Model A, we conducted a likelihood ratio test (LRT) to evaluate whether incorporating covariates into the dispersion submodel improves the fit. The null hypothesis is that all the dispersion-submodel coefficients other than the intercept are zero, that is, that the constant-dispersion Model B is adequate. The LRT yielded a test statistic of 279.12 with a p-value < 0.001. Therefore, we reject the null hypothesis at any conventional significance level and conclude that including predictors in the dispersion submodel significantly improves the model fit relative to the simpler nested alternative.
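The reported p-value can be checked against the chi-square reference distribution. The sketch below assumes that the degrees of freedom equal the number of non-intercept σ coefficients in Model A, namely seven in Table 7 (MHDI_I, MHDI_H, MHDI_E, Capital, North, Northeast, Southeast), since the μ submodels of Models A and B share the same covariates. It uses only the standard library: for integer degrees of freedom the chi-square upper tail has closed forms, a Poisson sum for even df and a normal tail plus a finite series for odd df.

```python
import math


def lr_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood ratio statistic 2 * (l_full - l_reduced) for nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        half = x / 2.0
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: 2[1 - Phi(sqrt x)] + 2 phi(sqrt x) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))  # equals 2[1 - Phi(sqrt x)]
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(sqrt x)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    return tail + 2.0 * density * series


# Reported LRT statistic; df = 7 is an assumption (see the lead-in).
p_value = chi2_sf(279.12, df=7)
```

With these inputs the p-value is many orders of magnitude below 0.001, consistent with the conclusion above; a different df would not change that conclusion for a statistic this large.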

These findings are further visualized in Figure 4, which displays the partial effect plots for each predictor in the dispersion submodel. The shaded bands represent 95% confidence intervals and support the strength and direction of these effects. Notably, all the effects are nearly linear, suggesting that the systematic component and the link function specified for σ are appropriate and that the model fit is stable.

Based on these results, the beta regression with variable dispersion (Model A) is selected as the final model, and the next step is to perform residual analysis as a diagnostic technique to assess the adequacy of the model fit. According to [64], if the model is correctly specified, the quantile residuals should approximately follow a standard normal distribution. Figure 5 displays the histogram of the quantile residuals from the beta regression together with the corresponding quantile–quantile plot (QQ-plot). The histogram closely resembles the standard normal density, and most of the points in the QQ-plot align well with the 45-degree reference line, supporting the adequacy of the beta regression model. This is further supported by the Filliben correlation coefficient of 0.9961, which is close to one, and by the residual summary statistics: a mean near zero (−0.0049), a variance close to one (0.9976), slight negative skewness (−0.2856), and a kurtosis of 3.6941, moderately above the normal value of 3 and thus indicating slightly heavier tails. Overall, these diagnostics support the appropriateness of the fitted model.
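The Filliben statistic is the correlation between the ordered residuals and the standard normal quantiles of Filliben's approximate uniform order-statistic medians. The sketch below assumes that standard definition, population moments for skewness and (non-excess) kurtosis, and the n − 1 divisor for the variance; the authors' exact conventions may differ. The residual vector itself would come from the fitted model and is simply an input here.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Dict, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filliben_correlation(residuals: Sequence[float]) -> float:
    """Normal probability-plot correlation using Filliben's order-statistic medians."""
    r = sorted(residuals)
    n = len(r)
    if n < 3:
        raise ValueError("need at least three residuals")
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]  # interior formula
    m[-1] = 0.5 ** (1.0 / n)  # largest uniform order-statistic median
    m[0] = 1.0 - m[-1]        # smallest, by symmetry
    z = [NormalDist().inv_cdf(p) for p in m]
    return _pearson(r, z)


def residual_summary(residuals: Sequence[float]) -> Dict[str, float]:
    """Mean, sample variance, skewness, and kurtosis (3 for a normal distribution)."""
    n = len(residuals)
    mu = mean(residuals)
    m2 = sum((e - mu) ** 2 for e in residuals) / n
    m3 = sum((e - mu) ** 3 for e in residuals) / n
    m4 = sum((e - mu) ** 4 for e in residuals) / n
    return {
        "mean": mu,
        "variance": variance(residuals),  # n - 1 divisor
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```

Values of `filliben_correlation` close to one, as with the 0.9961 reported above, indicate that the ordered residuals lie close to a straight line on the normal QQ-plot.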

Therefore, the final estimated location submodel takes the following form:

\begin{aligned} \log\left(\frac{\hat{\mu}_{t}}{1 - \hat{\mu}_{t}}\right) = {} & -3.74480 + 0.81425 \times \mathrm{MHDI\_H} - 0.34533 \times \mathrm{MHDI\_E} \\ & + 0.17588 \times \mathrm{Capital} + 7.63509 \times \mathrm{PBNVF} + 0.11825 \times \mathrm{North} \\ & - 0.07418 \times \mathrm{Northeast} + 0.20922 \times \mathrm{South} + 0.58678 \times \mathrm{Southeast} \end{aligned}

(7)

The dispersion submodel accounts for heterogeneity in the variability of the response across municipalities. The final dispersion submodel takes the following form:

\begin{aligned} \log\left(\frac{\hat{\sigma}_{t}}{1 - \hat{\sigma}_{t}}\right) = {} & 0.1664 - 1.46455 \times \mathrm{MHDI\_I} - 1.59864 \times \mathrm{MHDI\_H} - 0.89196 \times \mathrm{MHDI\_E} \\ & + 0.37642 \times \mathrm{Capital} - 0.16225 \times \mathrm{North} \\ & - 0.40964 \times \mathrm{Northeast} - 0.22061 \times \mathrm{Southeast} \end{aligned}

(8)
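Equations (7) and (8) can be turned into fitted values by inverting the logit links. The sketch below evaluates both submodels for a hypothetical municipality and combines them through Var(Y) = σ²μ(1 − μ), which assumes the gamlss BE parameterization for Equation (4). Two further assumptions: the σ intercept (0.1664) is taken from Table 7, and the example covariate values (close to the Table 2 medians) are illustrative rather than an observed municipality.

```python
import math
from typing import Dict

# Equation (7): logit-link location submodel of Model A.
MU_COEF = {
    "(Intercept)": -3.74480, "MHDI_H": 0.81425, "MHDI_E": -0.34533,
    "Capital": 0.17588, "PBNVF": 7.63509, "North": 0.11825,
    "Northeast": -0.07418, "South": 0.20922, "Southeast": 0.58678,
}
# Equation (8): logit-link dispersion submodel; intercept taken from Table 7.
SIGMA_COEF = {
    "(Intercept)": 0.1664, "MHDI_I": -1.46455, "MHDI_H": -1.59864,
    "MHDI_E": -0.89196, "Capital": 0.37642, "North": -0.16225,
    "Northeast": -0.40964, "Southeast": -0.22061,
}


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(coef: Dict[str, float], x: Dict[str, float]) -> float:
    """Intercept plus sum of beta * x; covariates absent from x count as 0."""
    return coef["(Intercept)"] + sum(
        beta * x.get(name, 0.0) for name, beta in coef.items() if name != "(Intercept)"
    )


def predict(x: Dict[str, float]) -> Dict[str, float]:
    """Fitted mean, dispersion, and implied variance for one municipality."""
    mu = inv_logit(linear_predictor(MU_COEF, x))
    sigma = inv_logit(linear_predictor(SIGMA_COEF, x))
    return {"mu": mu, "sigma": sigma, "variance": sigma ** 2 * mu * (1.0 - mu)}


# Hypothetical non-capital municipality; no region dummy set means Central-West.
example = {"MHDI_I": 0.65, "MHDI_H": 0.81, "MHDI_E": 0.56, "PBNVF": 0.08, "Capital": 0}
```

Calling `predict({**example, "Southeast": 1})` shows the two regional effects at once: a higher fitted mean and a lower fitted σ than the Central–West baseline.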

From Equations (7) and (8), several insights regarding the PBNVS, assuming fixed levels of the other variables, can be drawn:

The MHDI_H shows a positive effect, indicating that municipalities with better health outcomes tend to have more blank and null votes. This result suggests that greater health development could be linked to lower political participation, possibly due to social problems or disillusionment with the electoral system in these areas. In addition, higher socioeconomic development and urbanization contribute to the wider circulation of, and more democratic access to, the political information necessary for voting in national elections [1].
The variable MHDI_E shows a negative effect on the PBNVS, suggesting that municipalities with better educational outcomes tend to have a lower PBNVS. This is consistent with better education being associated with greater political awareness and participation, which may translate into fewer protest votes and less disengagement from the electoral process. Education and literacy play a crucial role in shaping the political competence of individual voters [1].
The coefficients of the Region dummies show that geographic location significantly influences the PBNVS. Relative to the Central–West reference category, municipalities in the Southeast have the highest expected PBNVS, indicating a notable regional tendency toward protest or disaffection voting in this area. The North and South also show positive effects, albeit smaller ones, whereas the Northeast shows a small negative effect. These results suggest that regional conditions, such as the local political climate and cultural influences, significantly influence voting behavior.
The dispersion submodel shows that a higher MHDI_I is significantly associated with lower variability in the PBNVS, meaning that wealthier municipalities tend to have more stable voting patterns. Regional effects also play a role: the SE, NE, and N regions have negative coefficients, indicating lower variability in these regions than in the Central–West and South, which share the baseline because South is not included in the dispersion submodel.
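Because the location submodel uses a logit link, a coefficient β in Equation (7) multiplies the odds μ/(1 − μ) of the mean PBNVS by exp(β·Δ) when its covariate increases by Δ, with the other variables held fixed. For instance, the Southeast coefficient 0.58678 corresponds to an odds multiplier of about exp(0.58678) ≈ 1.80 relative to the Central–West. The sketch below computes such multipliers; the step sizes (0.1 for the MHDI components, which lie in [0, 1], one percentage point for PBNVF, and 1 for the binary indicators) are illustrative choices, not values from the study.

```python
import math
from typing import Dict

# Location-submodel coefficients from Equation (7) (logit link for mu).
MU_COEF = {
    "MHDI_H": 0.81425,
    "MHDI_E": -0.34533,
    "Capital": 0.17588,
    "PBNVF": 7.63509,
    "North": 0.11825,
    "Northeast": -0.07418,
    "South": 0.20922,
    "Southeast": 0.58678,
}
# Illustrative increments; covariates not listed (binary indicators) use a step of 1.
STEP = {"MHDI_H": 0.1, "MHDI_E": 0.1, "PBNVF": 0.01}


def odds_multiplier(beta: float, delta: float = 1.0) -> float:
    """Factor applied to mu / (1 - mu) when the covariate increases by delta."""
    return math.exp(beta * delta)


def multipliers(coef: Dict[str, float] = MU_COEF,
                step: Dict[str, float] = STEP) -> Dict[str, float]:
    """Odds multiplier for each covariate at its chosen increment."""
    return {name: odds_multiplier(b, step.get(name, 1.0)) for name, b in coef.items()}
```

Multipliers above 1 (MHDI_H, Capital, PBNVF, North, South, Southeast) raise the expected proportion of blank and null votes, while those below 1 (MHDI_E, Northeast) lower it. The same reading applies to σ/(1 − σ) in Equation (8).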

5. Final Considerations

This study provides a comprehensive analysis of the socioeconomic and regional determinants of blank and null votes in the second round of the 2018 Brazilian presidential elections. By employing flexible GAMLSS models tailored for proportions, we rigorously evaluated five distinct unit regression distributions—beta, simplex, Kumaraswamy, unit Weibull, and reflected unit Burr XII—each capturing different distributional characteristics of the response variable.

Among the tested models, the beta regression exhibited the lowest AIC and BIC values. In terms of accuracy measures (MAPE, MAE, and RMSE), the models yielded similar results. Given these results, the beta regression was chosen as the most appropriate model for analyzing the proportion of BNVs.

Regarding the explanatory variables, MHDI_I, MHDI_H, and MHDI_E, along with the regional classification of municipalities, were the most influential factors affecting the mean proportion of blank and null votes. Specifically, municipalities with a higher MHDI_I and MHDI_H exhibited higher proportions of BNVs, while MHDI_E had a negative effect. Regionally, municipalities in the Southeast had higher average proportions of blank and null votes. Additionally, MHDI_I was found to negatively impact the variability in BNVs, while the Northeast, North, and Southeast regions exhibited distinct effects on the dispersion of the vote proportion.

This study provides insights into the Brazilian political landscape by identifying key factors associated with voters' decisions to invalidate their votes. The model incorporates socioeconomic, demographic, and political variables, highlighting how regional differences, income levels, and education were associated with voter behavior during the 2018 elections. The findings may also aid political actors in shaping campaign strategies, as voters who invalidate their votes represent a potential audience that could be engaged in future elections.

A limitation of the model is its inability to separate the various motivations underlying blank and null votes, such as protest votes, disillusionment, or mere disinterest in the electoral process. The analysis considers BNVs in an aggregated manner, without distinguishing the specific reasons that lead voters to invalidate their votes.

Author Contributions

Conceptualization, R.R.G.; methodology, K.D.S.M. and R.R.G.; software, K.D.S.M., R.N.P. and R.R.G.; validation, F.D.J.M.J., F.A.P.-R. and R.R.G.; formal analysis, F.D.J.M.J. and R.R.G.; investigation, K.D.S.M., R.N.P., F.D.J.M.J., F.A.P.-R. and R.R.G.; resources, R.N.P., F.A.P.-R. and R.R.G.; data curation, K.D.S.M.; writing—original draft preparation, K.D.S.M., F.D.J.M.J. and R.R.G.; writing—review and editing, F.D.J.M.J., F.A.P.-R. and R.R.G.; visualization, K.D.S.M., R.N.P. and R.R.G.; supervision, F.D.J.M.J., F.A.P.-R. and R.R.G.; project administration, R.R.G.; funding acquisition, R.N.P., F.A.P.-R. and R.R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Serrapilheira Institute (grant number 2211-41692), FAPERGS (grant number 23/2551-0001595-1), CNPq (grant number 306274/2022-1), and FAPESP (grant number 2024/18409-7). The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting this research are publicly available and can be accessed through the sources listed in Table 1. Additionally, the data are provided in the following repository, https://github.com/ryanxnovaes/sec-project (accessed on 25 February 2025), along with all the computer code used in the analysis.

Acknowledgments

The authors are grateful to the reviewers for their constructive comments and suggestions, which greatly helped to improve the quality and presentation of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Power, T.J.; Garand, J.C. Determinants of Invalid Voting in Latin America. Elect. Stud. 2007, 26, 432–444.
2. McAllister, I.; Makkai, T. Institutions, Society or Protest? Explaining Invalid Votes in Australian Elections. Elect. Stud. 1993, 12, 23–40.
3. Zulfikarpasic, A. Le vote blanc: Abstention civique ou expression politique? Rev. Franç. Sci. Polit. 2001, 51, 247–268.
4. Power, T.J.; Roberts, J.T. Compulsory Voting, Invalid Ballots, and Abstention in Brazil. Political Res. Q. 1995, 48, 795–826.
5. Uggla, F. Incompetence, Alienation, or Calculation? Explaining Levels of Invalid Ballots and Extra-Parliamentary Votes. Comp. Political Stud. 2008, 41, 1141–1164.
6. Freire, A.; Turgeon, M. Random Votes Under Compulsory Voting: Evidence from Brazil. Elect. Stud. 2020, 66, 102168.
7. Cohen, M.J. Protesting via the Null Ballot: An Assessment of the Decision to Cast an Invalid Vote in Latin America. Political Behav. 2017, 40, 395–414.
8. Cohen, M.J. A Dynamic Model of the Invalid Vote: How a Changing Candidate Menu Shapes Null Voting Behavior. Elect. Stud. 2018, 53, 111–121.
9. Fatke, M.; Heinsohn, T. Invalid Voting in German Constituencies. Ger. Politics 2017, 26, 273–291.
10. Kouba, K.; Lysek, J. Institutional Determinants of Invalid Voting in Post-Communist Europe and Latin America. Elect. Stud. 2016, 41, 92–104.
11. Moral, M. The Passive-Aggressive Voter: The Calculus of Casting an Invalid Vote in European Democracies. Political Res. Q. 2016, 69, 1–14.
12. Singh, S.P. Politically Unengaged, Distrusting, and Disaffected Individuals Drive the Link Between Compulsory Voting and Invalid Balloting. Political Sci. Res. Methods 2019, 7, 107–123.
13. Solvak, M.; Vassil, K. Indifference or Indignation? Explaining Purposive Vote Spoiling in Elections. J. Elections Public Opin. Parties 2015, 25, 463–481.
14. Lemonte, A.J.; Bazán, J.L. New Class of Johnson Distributions and Its Associated Regression Model for Rates and Proportions. Biom. J. 2016, 58, 727–746.
15. Pachón, M.; Carroll, R.; Barragán, H. Ballot Design and Invalid Votes: Evidence from Colombia. Elect. Stud. 2017, 48, 98–110.
16. Stockemer, D.; LaMontagne, B.; Scruggs, L. Bribes and Ballots: The Impact of Corruption on Voter Turnout in Democracies. Int. Political Sci. Rev. 2013, 34, 74–90.
17. Kang, W.-T. Protest Voting and Abstention Under Plurality Rule Elections: An Alternative Public Choice Approach. J. Theor. Politics 2004, 16, 79–102.
18. Tribunal Superior Eleitoral (TSE). Glossário Eleitoral. Results. Available online: https://www.tse.jus.br/servicos-eleitorais/glossario/glossario-eleitoral (accessed on 20 April 2025).
19. Tribunal Superior Eleitoral (TSE). Código Eleitoral—Lei nº 4.737. Results. Available online: https://www.tse.jus.br/legislacao/codigo-eleitoral/ (accessed on 20 April 2025).
20. Tribunal Superior Eleitoral (TSE). Electoral Data Repository. Results. Available online: http://english.tse.jus.br/the-brazilian-electoral-system/statistics (accessed on 20 April 2025).
21. Ferrari, S.; Cribari-Neto, F. Beta Regression for Modelling Rates and Proportions. J. Appl. Stat. 2004, 31, 799–815.
22. Barndorff-Nielsen, O.E.; Jørgensen, B. Some Parametric Models on the Simplex. J. Multivar. Anal. 1991, 39, 106–116.
23. Espinheira, P.; Silva, A.O. Nonlinear Simplex Regression Models. arXiv 2018, arXiv:1805.10843.
24. Mitnik, P.A.; Baek, S. The Kumaraswamy Distribution: Median-Dispersion Re-Parameterizations for Regression Modeling and Simulation-Based Estimation. Stat. Pap. 2013, 54, 177–192.
25. Bayes, C.L.; Bazán, J.L.; De Castro, M. A Quantile Parametric Mixed Regression Model for Bounded Response Variables. Stat. Its Interface 2017, 10, 483–493.
26. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; De Oliveira, R.P.; Ghitany, M.E. The Unit-Weibull Distribution as an Alternative to the Kumaraswamy Distribution for the Modeling of Quantiles Conditional on Covariates. J. Appl. Stat. 2020, 47, 954–974.
27. Ribeiro, T.F.; Cordeiro, G.M.; Peña-Ramírez, F.A.; Guerra, R.R. A New Quantile Regression for the COVID-19 Mortality Rates in the United States. Comput. Appl. Math. 2021, 40, 255.
28. Stasinopoulos, M.D.; Rigby, R.A.; Bastiani, F.D. GAMLSS: A Distributional Regression Approach. Stat. Model. 2018, 18, 248–273.
29. Regis, R.O.; Ospina, R.; Bernardino, W.; Cribari-Neto, F. Asset Pricing in the Brazilian Financial Market: Five-Factor GAMLSS Modeling. Empir. Econ. 2022; in press.
30. Ribeiro, T.F.; Seidel, E.J.; Guerra, R.R.; Peña-Ramírez, F.A.; Silva, A.M. Soybean Production Value in the Rio Grande do Sul under the GAMLSS Framework. Commun. Stat. Case Stud. Data Anal. Appl. 2021, 7, 146–165.
31. Guerra, R.R.; Peña-Ramírez, F.A.; Ribeiro, T.F.; Cordeiro, G.M.; Mafalda, C.P. Unit Regression Models to Explain Vote Proportions in the Brazilian Presidential Elections in 2018. Colomb. J. Stat. 2024, 47, 283–300.
32. Araújo, F.J.M.; Guerra, R.R.; Peña-Ramírez, F.A. The Burr XII Quantile Regression for Salary-Performance Models with Applications in the Sports Economy. Comput. Appl. Math. 2022, 41, 282.
33. Rigby, R.A.; Stasinopoulos, D.M. The GAMLSS Project: A Flexible Approach to Statistical Modelling. In New Trends in Statistical Modelling, Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July 2001; University of Southern Denmark: Sønderborg, Denmark, 2001; p. 345.
34. Rigby, R.A.; Stasinopoulos, D.M. Generalized Additive Models for Location, Scale and Shape. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2005, 54, 507–554.
35. Akantziliotou, K.; Rigby, R.A.; Stasinopoulos, D.M. The R Implementation of Generalized Additive Models for Location, Scale and Shape. In Statistical Modelling in Society, Proceedings of the 17th International Workshop on Statistical Modelling, Chania, Greece, 8–12 July 2002; Statistical Modelling Society: Vienna, Austria, 2002; pp. 75–83.
36. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A (Gen.) 1972, 135, 370–384.
37. Hastie, T.; Tibshirani, R. Exploring the Nature of Covariate Effects in the Proportional Hazards Model. Biometrics 1990, 46, 1005–1016.
38. Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The Unit-Weibull Distribution and Associated Inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
39. Nakamura, L.R.; Ramires, T.G.; Righetto, A.J.; Silva, V.C.; Konrath, A.C. Using the Box-Cox Family of Distributions to Model Censored Data: A Distributional Regression Approach. Braz. J. Biom. 2022, 40, 407–414.
40. Rigby, R.A.; Stasinopoulos, M.D.; Heller, G.Z.; De Bastiani, F. Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
41. Rigby, R.; Stasinopoulos, D.; Voudouris, V. A Comparison of GAMLSS with Quantile Regression. Stat. Model. 2013, 13, 335–348.
42. Stasinopoulos, M.D.; Rigby, R.A.; Heller, G.Z.; Voudouris, V.; De Bastiani, F. Flexible Regression and Smoothing: Using GAMLSS in R; CRC Press: Boca Raton, FL, USA, 2017.
43. Canterle, D.R.; Bayer, F.M. Variable Dispersion Beta Regressions with Parametric Link Functions. Stat. Pap. 2019, 60, 1541–1567.
44. Rauber, C.; Cribari-Neto, F.; Bayer, F.M. Improved Testing Inferences for Beta Regressions with Parametric Mean Link Function. AStA Adv. Stat. Anal. 2020, 104, 687–717.
45. Rauber, C.; Lima-Filho, L.M.A.; Bayer, F.M. Residual-Based CUSUM Beta Regression Control Chart for Monitoring Double-Bounded Processes. Qual. Reliab. Eng. Int. 2022, 38, 3252–3269.
46. Santana-e-Silva, J.J.; Cribari-Neto, F.; Vasconcellos, K.L.P. Beta Distribution Misspecification Tests with Application to COVID-19 Mortality Rates in the United States. PLoS ONE 2022, 17, e0274781.
47. Espinheira, P.L.; de Oliveira Silva, A. Residual and Influence Analysis to a General Class of Simplex Regression. TEST 2019, 28, 1–30.
48. Carrasco, J.M.F.; Reid, N. Simplex Regression Models with Measurement Error. Commun. Stat.—Simul. Comput. 2021, 50, 3420–3435.
49. López, F.O. A Bayesian Approach to Parameter Estimation in Simplex Regression Model: A Comparison with Beta Regression. Rev. Colomb. Estad. 2013, 36, 1–21.
50. Cordeiro, G.M.; Rocha, E.; Figueiredo, D.; Fernandes, A.; Ortega, E.M.M.; Prataviera, F. The Beta and Simplex Regression Models to Explain Homicides in State Capitals of Brazil. Model Assist. Stat. Appl. 2020, 15, 215–224.
51. Bayer, F.M.; Bayer, D.M.; Pumi, G. Kumaraswamy Autoregressive Moving Average Models for Double Bounded Environmental Data. J. Hydrol. 2017, 555, 385–396.
52. Pumi, G.; Rauber, C.; Bayer, F.M. Kumaraswamy Regression Model with Aranda-Ordaz Link Function. TEST 2020, 29, 1051–1071.
53. Dey, S.; Mazucheli, J.; Anis, M.Z. Estimation of Reliability of Multicomponent Stress–Strength for a Kumaraswamy Distribution. Commun. Stat.—Theory Methods 2017, 46, 1560–1572.
54. Dey, S.; Mazucheli, J.; Nadarajah, S. Kumaraswamy Distribution: Different Methods of Estimation. Comput. Appl. Math. 2018, 37, 2094–2111.
55. Hamedi-Shahraki, S.; Rasekhi, A.; Yekaninejad, M.S.; Eshraghian, M.R.; Pakpour, A.H. Kumaraswamy Regression Modeling for Bounded Outcome Scores. Pak. J. Stat. Oper. Res. 2021, 17, 79–88.
56. Guerra, R.R. UnitDistForGAMLSS, Repository: Figshare. Available online: https://github.com/renata-rojasg/UnitDistForGAMLSS (accessed on 7 May 2025).
57. Guerra, R.R.; Peña-Ramírez, F.A.; Bourguignon, M. The Unit Extended Weibull Families of Distributions and Its Applications. J. Appl. Stat. 2021, 48, 3174–3192.
58. Iliev, A.; Rahnev, A.; Kyurkchiev, N.; Markov, S. A Study on the Unit-Logistic, Unit-Weibull and Topp-Leone Cumulative Sigmoids. Biomath Commun. 2019, 6, 1–15.
59. Alotaibi, R.M.; Tripathi, Y.M.; Dey, S.; Rezk, H.R. Bayesian and Non-Bayesian Reliability Estimation of Multicomponent Stress–Strength Model for Unit Weibull Distribution. J. Taibah Univ. Sci. 2020, 14, 1164–1181.
60. Almetwally, E.M.; Jawa, T.M.; Sayed-Ahmed, N.; Park, C.; Zakarya, M.; Dey, S. Analysis of Unit-Weibull Based on Progressive Type-II Censored with Optimal Scheme. Alex. Eng. J. 2023, 63, 321–338.
61. IPEADATA. Dados Macroeconômicos, Regionais e Sociais. Available online: http://www.ipeadata.gov.br/Default.aspx (accessed on 1 March 2025).
62. SIDRA. Sistema IBGE de Recuperação Automática. Available online: https://sidra.ibge.gov.br/tabela/1301 (accessed on 1 March 2025).
63. ATLASBR. Atlas do Desenvolvimento Humano no Brasil. Available online: http://www.atlasbrasil.org.br/ (accessed on 1 March 2025).
64. Dunn, P.K.; Smyth, G.K. Randomized Quantile Residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.

Figure 1. Scale of the human development index.

Figure 2. Histogram and boxplot of the PBNVS.

Figure 3. Boxplot of the PBNVS by Region and Capital.

Figure 4. Partial effect plots for dispersion predictors (σ) in Model A.


Figure 5. Histogram of the quantile residuals from the beta regression alongside the density curve of the standard normal distribution, and the QQ-plot of the quantile residuals from the beta regression.

Table 1. Summary and description of the variables used in this study.

Variable	Source	Description
`PBNVF`	IPEADATA	Proportion of blank and null votes in the first round of the 2018 presidential election.
`PBNVS`	IPEADATA	Proportion of blank and null votes in the second round of the 2018 presidential election (dependent variable).
`MHDI_I`	ATLAS	Municipal human development index related to income from the 2010 census.
`MHDI_H`	ATLAS	Municipal human development index related to health from the 2010 census.
`MHDI_E`	ATLAS	Municipal human development index related to education from the 2010 census.
`Capital`	_	Indicates whether the municipality is the capital of one of the Brazilian states (1—Yes; 0—No).
`Region`	_	Indicates which region of Brazil the municipality belongs to (North, Northeast, Central–West, Southeast, or South).
`DD`	IBGE_SIDRA	Ratio between the population and the land area of the municipality.

Table 2. Descriptive statistics of variables used in the modeling.

Variables	Mean	Median	Skewness	Excess Kurtosis	Min.	Max.	CV (%)
`PBNVF`	0.08	0.08	0.53	0.38	0.02	0.20	30.46
`PBNVS`	0.08	0.08	0.77	0.02	0.02	0.23	41.33
`MHDI_I`	0.64	0.65	−0.10	−0.87	0.40	0.89	12.55
`MHDI_H`	0.80	0.81	−0.41	−0.49	0.67	0.89	5.58
`MHDI_E`	0.56	0.56	−0.10	−0.52	0.21	0.82	16.70
`DD`	108.23	24.37	13.60	226.83	0.13	13,024.56	529.05

Table 3. Spearman correlation matrix with respective p-values in parentheses.

Variables	PBNVS	PBNVF	MHDI_I	MHDI_H	MHDI_E
`PBNVF`	0.5076 (<0.005)
`MHDI_I`	0.3651 (<0.005)	−0.0258 (0.0541)
`MHDI_H`	0.4218 (<0.005)	−0.0177 (0.1862)	0.8530 (<0.005)
`MHDI_E`	0.3185 (<0.005)	−0.0079 (0.5538)	0.8268 (<0.005)	0.7262 (<0.005)
`DD`	0.2817 (<0.005)	0.3226 (<0.005)	0.2401 (<0.005)	0.1496 (<0.005)	0.3106 (<0.005)

Table 4. Frequency of municipalities by Brazilian region.

Region (Acronym)	Absolute Frequency	Relative Frequency (%)
`Central–West (CW)`	465	8.36
`North (N)`	449	8.07
`Northeast (NE)`	1792	32.22
`South (S)`	1188	21.36
`Southeast (SE)`	1668	29.99
`Total`	5562	100.00

Table 5. Estimated parameters, standard errors, and p-values of the regression models fitted to the data.

Coefficients	$μ$ Submodel			$σ$ Submodel
Coefficients	Estimate	Std. Error	p-Value	Estimate	Std. Error	p-Value
	Beta Regression
`(Intercept)`	−3.7534	0.0938	<0.0001	0.1238	0.3319	0.7091
`MHDI_I`	0.0974	0.0904	0.2814	−1.3679	0.3303	<0.0001
`MHDI_H`	0.7702	0.1350	<0.0001	−1.5853	0.4898	0.0012
`MHDI_E`	−0.3803	0.0547	<0.0001	−0.9125	0.2102	<0.0001
`Capital`	0.1570	0.0417	0.0002	0.5183	0.1662	0.0018
`DD`	0.0001	0.0000	0.8529	0.0001	0.0000	0.0291
`PBNVF`	7.6081	0.1259	<0.0001	−0.3995	0.5059	0.4297
`North`	0.1204	0.0194	<0.0001	−0.1496	0.0488	0.0022
`Northeast`	−0.0671	0.0161	<0.0001	−0.3834	0.0473	<0.0001
`South`	0.2085	0.0140	<0.0001	0.0031	0.0188	0.8699
`Southeast`	0.5886	0.0132	<0.0001	−0.1958	0.0338	<0.0001
	Simplex Regression
`(Intercept)`	−3.7935	0.0981	<0.0001	3.8324	0.2965	<0.0001
`MHDI_I`	0.1192	0.0956	0.2126	−1.5521	0.2728	<0.0001
`MHDI_H`	0.7875	0.1494	<0.0001	−2.0821	0.4306	<0.0001
`MHDI_E`	−0.3619	0.0613	<0.0001	−0.7015	0.1846	<0.0001
`Capital`	0.1503	0.0365	<0.0001	0.3998	0.1267	0.0016
`DD`	0.0001	0.0000	0.9236	0.0001	0.0000	0.0021
`PBNVF`	7.6211	0.1295	<0.0001	−7.0924	0.3773	<0.0001
`North`	0.1190	0.0196	<0.0001	−0.2568	0.0522	<0.0001
`Northeast`	−0.0605	0.0168	0.0003	−0.3360	0.0510	<0.0001
`South`	0.2062	0.0145	<0.0001	−0.1670	0.0427	<0.0001
`Southeast`	0.5847	0.0143	<0.0001	−0.6380	0.0376	<0.0001
	Kw Regression
`(Intercept)`	−3.3465	0.0640	<0.0001	−1.0394	0.1889	<0.0001
`MHDI_I`	−0.1270	0.0701	0.0701	1.0815	0.2307	<0.0001
`MHDI_H`	0.5833	0.1010	<0.0001	2.2050	0.3062	<0.0001
`MHDI_E`	−0.5093	0.0417	<0.0001	−0.4675	0.1443	0.0012
`Capital`	0.2318	0.0324	<0.0001
`DD`	0.0001	0.0000	0.7763	0.0001	0.0000	0.4140
`PBNVF`	7.3592	0.1072	<0.0001	2.1552	0.3124	<0.0001
`North`	0.1126	0.0163	<0.0001	0.2774	0.0497	<0.0001
`Northeast`	−0.1266	0.0136	<0.0001	0.2156	0.0385	<0.0001
`South`	0.2304	0.0119	<0.0001	0.2182	0.0341	<0.0001
`Southeast`	0.6116	0.0117	<0.0001	0.7699	0.0329	<0.0001
	UW Regression
`(Intercept)`	−4.3493	0.0848	<0.0001	0.5947	0.2158	0.0059
`MHDI_I`	0.5394	0.0791	<0.0001	0.3715	0.2013	0.0651
`MHDI_H`	0.9927	0.1245	<0.0001	0.5823	0.3048	0.0561
`MHDI_E`	−0.3240	0.0517	<0.0001	1.5881	0.1316	<0.0001
`Capital`	0.0485	0.0186	0.0091
`DD`	0.0001	0.0000	0.8272	0.0001	0.0000	<0.0001
`PBNVF`	7.8964	0.1039	<0.0001	3.0382	0.2630	<0.0001
`North`	0.1342	0.0159	<0.0001	0.0279	0.0375	0.4568
`Northeast`	−0.0088	0.0146	0.5461	0.2620	0.0377	<0.0001
`South`	0.1934	0.0117	<0.0001	0.0402	0.0272	0.1401
`Southeast`	0.6137	0.0113	<0.0001	−0.1296	0.0246	<0.0001
	RUBXII Regression
`(Intercept)`	−3.3437	0.0635	<0.0001	−1.0671	0.1849	<0.0001
`MHDI_I`	−0.1263	0.0700	0.0713	1.1040	0.2281	<0.0001
`MHDI_H`	0.5832	0.1004	<0.0001	2.2096	0.3009	<0.0001
`MHDI_E`	−0.5126	0.0416	<0.0001	−0.4707	0.1424	0.0010
`Capital`	0.2323	0.0321	<0.0001
`DD`	0.0001	0.0000	0.7485	0.0001	0.0000	0.4234
`PBNVF`	7.3611	0.1066	<0.0001	1.7838	0.3065	<0.0001
`North`	0.1125	0.0163	<0.0001	0.2784	0.0493	<0.0001
`Northeast`	−0.1276	0.0136	<0.0001	0.2238	0.0380	<0.0001
`South`	0.2301	0.0119	<0.0001	0.2132	0.0337	<0.0001
`Southeast`	0.6105	0.0117	<0.0001	0.7436	0.0326	<0.0001

Table 6. Model fit quality metrics for the beta, simplex, Kw, UW, and RUBXII distributions (MAPE in %). Best value for each criterion: AIC and BIC, beta; RSQ, RUBXII; MAPE, UW; MAE and RMSE, Kw and RUBXII (tied).

Regression Model	AIC	BIC	RSQ	MAPE	MAE	RMSE
Beta	−30,425.71	−30,279.99	0.7613	18.3361	0.0126	0.0161
Simplex	−30,344.15	−30,198.43	0.7508	18.3498	0.0127	0.0161
Kw	−30,340.90	−30,201.80	0.7753	18.5982	0.0125	0.0160
UW	−29,279.42	−29,140.33	0.7123	17.4838	0.0130	0.0168
RUBXII	−30,321.86	−30,182.76	0.7766	18.6195	0.0125	0.0160
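As a consistency check on Table 6, the gap between BIC and AIC identifies the number of estimated coefficients k, because AIC = −2ℓ + 2k and BIC = −2ℓ + k ln n give BIC − AIC = k(ln n − 2). Assuming the BIC uses n = 5562 municipalities (Table 4), the beta and simplex rows imply k ≈ 22 (eleven μ and eleven σ coefficients in Table 5), while the Kw, UW, and RUBXII rows imply k ≈ 21, matching the exclusion of Capital from their σ submodels. A short Python sketch of this check:

```python
import math

N = 5562  # number of municipalities (Table 4); assumed to be the n used in the BIC

# (AIC, BIC) pairs transcribed from Table 6.
TABLE6 = {
    "Beta": (-30425.71, -30279.99),
    "Simplex": (-30344.15, -30198.43),
    "Kw": (-30340.90, -30201.80),
    "UW": (-29279.42, -29140.33),
    "RUBXII": (-30321.86, -30182.76),
}


def implied_parameters(aic: float, bic: float, n: int = N) -> float:
    """Solve BIC - AIC = k (ln n - 2) for the number of parameters k."""
    return (bic - aic) / (math.log(n) - 2.0)


def implied_loglik(aic: float, k: int) -> float:
    """Recover the maximized log-likelihood from AIC = -2 logL + 2k."""
    return -(aic - 2.0 * k) / 2.0


if __name__ == "__main__":
    for model, (aic, bic) in TABLE6.items():
        k = implied_parameters(aic, bic)
        print(f"{model}: k = {k:.2f}, logL = {implied_loglik(aic, round(k)):.2f}")
```

Because the implied k values land on whole numbers, the table appears internally consistent with the parameter counts in Table 5.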

Table 7. Estimated parameters, standard errors, and p-values of Model A and Model B.

Coefficients	$μ$ Submodel			$σ$ Submodel
Coefficients	Estimate	Std. Error	p-Value	Estimate	Std. Error	p-Value
	Model A
`(Intercept)`	−3.7448	0.0912	<0.0001	0.1664	0.3328	0.6172
`MHDI_I`				−1.4646	0.3267	<0.0001
`MHDI_H`	0.8143	0.1194	<0.0001	−1.5986	0.4813	<0.0001
`MHDI_E`	−0.3453	0.0436	<0.0001	−0.8920	0.2057	<0.0001
`Capital`	0.1759	0.0405	<0.0001	0.3764	0.1521	0.0133
`PBNVF`	7.6351	0.1225	<0.0001
`North`	0.1183	0.0187	<0.0001	−0.1623	0.0476	0.0007
`Northeast`	−0.0742	0.0153	<0.0001	−0.4096	0.0432	<0.0001
`South`	0.2092	0.0139	<0.0001
`Southeast`	0.5868	0.0131	<0.0001	−0.2206	0.0280	<0.0001
	Model B
`(Intercept)`	−3.8246	0.0920	<0.0001	−2.7416	0.0101	<0.0001
`MHDI_H`	0.9472	0.1222	<0.0001
`MHDI_E`	−0.3885	0.0453	<0.0001
`Capital`	0.1992	0.0435	<0.0001
`PBNVF`	7.4147	0.1272	<0.0001
`North`	−0.0793	0.0150	<0.0001
`Northeast`	−0.3993	0.0386	<0.0001
`South`	0.2112	0.0134	<0.0001
`Southeast`	0.6209	0.0126	<0.0001
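The Python sketch below evaluates Model A for a single municipality to show how the μ and σ submodels of Table 7 combine. It assumes that Model A is a beta regression in the GAMLSS BE parameterization with its default logit links. Under that assumption, μ is the mean proportion and Var(Y) = σ²μ(1 − μ). Neither assumption is restated in this part of the text. The covariate values are hypothetical, and the scale of `PBNVF` in particular is assumed. Region effects are read relative to the omitted reference region, presumably Central-West, the only Brazilian region without a dummy in the table.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link (assumed for both submodels): maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Model A coefficients from Table 7; a term absent from a submodel is simply not listed.
MU_COEF = {
    "(Intercept)": -3.7448, "MHDI_H": 0.8143, "MHDI_E": -0.3453, "Capital": 0.1759,
    "PBNVF": 7.6351, "North": 0.1183, "Northeast": -0.0742, "South": 0.2092,
    "Southeast": 0.5868,
}
SIGMA_COEF = {
    "(Intercept)": 0.1664, "MHDI_I": -1.4646, "MHDI_H": -1.5986, "MHDI_E": -0.8920,
    "Capital": 0.3764, "North": -0.1623, "Northeast": -0.4096, "Southeast": -0.2206,
}


def linear_predictor(coef: dict, x: dict) -> float:
    """Intercept plus the sum of coefficient * covariate; covariates missing from x count as 0."""
    return coef["(Intercept)"] + sum(
        b * x.get(term, 0.0) for term, b in coef.items() if term != "(Intercept)"
    )


# Hypothetical non-capital municipality in the Southeast; values are illustrative, not from the data.
x = {"MHDI_I": 0.70, "MHDI_H": 0.80, "MHDI_E": 0.60, "Capital": 0.0, "PBNVF": 0.10,
     "Southeast": 1.0}

mu = inv_logit(linear_predictor(MU_COEF, x))
sigma = inv_logit(linear_predictor(SIGMA_COEF, x))
# Assumed BE parameterization: Var(Y) = sigma^2 * mu * (1 - mu).
sd = math.sqrt(sigma ** 2 * mu * (1.0 - mu))
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}, sd = {sd:.4f}")
# Under the logit link, exp(beta) multiplies the odds mu / (1 - mu) (Southeast vs. reference region).
print(f"exp(Southeast coefficient) = {math.exp(MU_COEF['Southeast']):.3f}")
```

With these illustrative inputs, the predicted mean proportion is roughly 0.12 and the implied standard deviation is roughly 0.017. The value exp(0.5868) ≈ 1.80 means that, with the other covariates held fixed, the Southeast coefficient multiplies the odds μ/(1 − μ) of the mean proportion by about 1.8 relative to the reference region.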


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Rojas Guerra, R.; De Souza Moraes, K.; De Jesus Moreira Junior, F.; Peña-Ramírez, F.A.; Novaes Pereira, R. Determinants of Blank and Null Votes in the Brazilian Presidential Elections. Stats 2025, 8, 38. https://doi.org/10.3390/stats8020038
