A Study on Computational Algorithms in the Estimation of Parameters for a Class of Beta Regression Models

: Beta regressions describe the relationship between a response that assumes values in the zero-one range and covariates. These regressions are used for modeling rates, ratios, and proportions. We study computational aspects related to parameter estimation of a class of beta regressions for the mean with ﬁxed precision by maximizing the log-likelihood function with heuristics and other optimization methods. Through Monte Carlo simulations, we analyze the behavior of ten algorithms, where four of them present satisfactory results. These are the di ﬀ erential evolutionary, simulated annealing, stochastic ranking evolutionary, and controlled random search algorithms, with the latter one having the best performance. Using the four algorithms and the optim function of R , we study sets of parameters that are hard to be estimated. We detect that this function fails in most cases, but when it is successful, it is more accurate and faster than the others. The annealing algorithm obtains satisfactory estimates in viable time with few failures so that we recommend its use when the optim function fails.

Beta regressions are widely used for modeling the relationship between a response variable that takes values in the continuous range (0, 1) and independent variables or covariates. The beta regression was proposed in [15] as a helpful model for describing limited-range continuous data. This modeling is based on the beta distribution, which is widely flexible and covers diverse shapes (symmetric, asymmetric, unimodal, bimodal), for different values of its parameters. The beta distribution may be parameterized to model the mean and precision parameters in terms of covariates and regression parameters. Works focusing on maximum likelihood (ML) estimation for the parameters that index the beta regression and extensions were conducted in [17][18][19], among others.
The objective of this work is to continue the research that has been developed for beta regression models based on ML estimation. More specifically, our main objective is to evaluate the most efficient computational algorithms based on heuristic methods for maximizing the likelihood function of beta regression models. These algorithms are the controlled random search, differential evolutionary, DIRECT, DIRECT_L, evolutionary, genetic, memetic, particle swarm, self-adapted evolutionary, and simulated annealing methods. The performance of these algorithms is evaluated by using the Monte Carlo simulation method with the R software [20], considering the quality and computational time of the solutions found and analyzing the behavior in different scenarios. The codes and simulation results are available in the following repository: https://github.com/Raydonal/ MLE-BetaRegression-Optimization (accessed on 6 January 2022).
The rest of this paper is structured as follows. In Section 2, we provide background on the beta distribution and its regression. In this section, we also summarize the algorithms analyzed in the present study. In Section 3, two Monte Carlo simulations are conducted. First, we evaluate the computational performance of the mentioned algorithms and then we assess the optim function of the betareg package of R for these algorithms. We present the conclusions of our study in Section 4.

Beta Models
In the beta regression models, one assume that the response variable Y follows beta distribution with probability density function (PDF) given by where p > 0, q > 0, and Γ is the gamma function. The mean and variance of Y are stated as: As proposed in [15], a parameterization different from that used in (1) can be established in terms of location and precision parameters, µ and φ namely, . Therefore, µ is the mean of the response variable and φ is a precision parameter, since for fixed µ, as φ increases, the variance of Y decreases. Thus, the PDF of the response variable Y can be written as: where 0 < µ < 1 and φ > 0. Note that, as mentioned, the PDF formulated in (3) has several different shapes (symmetric, asymmetric -left or right skewness-, uniform, as well as "J" and inverted "J" -unimodal-, and bimodal) depending on the values assumed by the parameters µ and φ. The flexibility of the beta PDF is illustrated in Figure 1 for several different parameter combinations. In this investigation, we consider a fixed φ parameter. However, there are cases where it is convenient to take a varying φ t parameter and to assign a regression structure to log(φ t ). A study of this variant is out of the objective of our work, but this can be included in future studies as mentioned at the final section. The analyzed model is defined assuming that the random variable Y t follows a beta distribution with PDF established as in (3), with mean µ t , unknown precision φ, and that: where β = (β 1 , . . . , β k ) is a vector of unknown regression parameters to be estimated, and x t1 , . . . , x tk are the values of the covariates which are assumed to be fixed and known. Note that the function g stated in (4) is a link function. We assume that g is strictly monotonic and twice differentiable, which maps µ t to the real line [21] (p. 228). In the case of beta models, a suitable choice of g is the logit function formulated as g(µ) = log(µ/(1 − µ)) and used in generalized linear models [22], which allows the mean to be formulated as: Note that the expression given in (5) is similar to logistic regression models based on the binomial distribution to describe proportions/percentages [23]. These models are suitable only if the outcome is of the form r out of N (with y = r/N). However, this is inapplicable in situations where the raw numbers r and N are not available.
Let Y = (Y 1 , . . . , Y n ) be independent random variables, where each Y t , for t ∈ {1, . . . , n}, follows a beta distribution with mean µ t defined in (5) and PDF stated as in (3). In addition, y = (y 1 , . . . , y n ) are the observed values of Y and, as mentioned x t1 , . . . , x tk are observations of k fixed covariates, for k < n. Then, the log-likelihood function for θ = (β 1 , . . . , β k , φ) based y is expressed as: where Therefore, when maximizing the log-likelihood function defined in (6), we obtain the ML estimator of θ. Under mild standard regularity conditions (for example, the conditions described in [24] (Sections 7.1 and 7.2)), the ML estimator of θ is consistent and asymptotically normal, whereas the log-likelihood function defined in (6) is strictly concave [25]. Note that the ML estimator of θ does not have a closed form [26]. Hence, they need to be obtained by numerically maximizing the log-likelihood function using nonlinear optimization methods such as Newton or quasi-Newton algorithms [27]. Nonetheless, this ML estimator is biased in small sample size [19,28], but its bias decreases as the sample size increases, which justifies the search for alternative non-linear optimization algorithms. The beta regression model can be extended to cover different sources of heterogeneity, including non-constant dispersion and non-linearity [29][30][31], temporal dependence [32][33][34][35], inflated points [36,37], truncation [38], and error-in-variables as well as latent information [39][40][41][42], among others. To keep a clear focus, in the present study, we restrict our attention primarily to the fixed precision. Our presentation, however, is not critically dependent on the fixed precision assumption and the simulation results can be increased over other scenarios in future works.

Optimization Algorithms
As mentioned, it is possible to estimate the parameters of the beta regression model by maximizing the log-likelihood function defined in (6). To do this, several optimization methods can be used, such as heuristics that aim to find satisfactory solutions in viable com-putational time. These methods are generally effective for highly complex computational problems, such as the one of interest in this investigation.
Monte Carlo simulations considered in this study are divided into two steps. The first step is more general, allowing us to evaluate a large number of methods, selecting those that provide better computational results. The second step aims to compare the previously selected methods with the most employed one by the command optim of R from the betareg package.
The authors in [43] evaluated the 18 optimization methods available in the R language applied to 48 different optimization problems with known solutions. In our case and during the first step, we carry out a simulation study considering only the problem of maximizing the log-likelihood function defined in (6), with 10 different methods available in R packages. In the second stage, we use the methods that provided better results in the previous stage and the optim function implemented in the betareg package, in order to make a comparison. The methods utilized are defined as follows: • Genetic algorithm: This is a heuristic inspired by the basic principle of biological evolution and natural selection, simulating evolution so that the fittest individuals survive, imitating its mechanisms such as the processes of selection, crossing, and mutation. The ga function that implements this algorithm is available in an R package named GA [44]. • Differential evolutionary algorithm: This method is similar to the genetic algorithm indicated to find the global optimum of real-valued functions with real-valued parameters as well [45]. Such an algorithm does not need the function to be optimized that is continuous or differentiable and is available by the DEoptim command of an R package named DEoptim [46]. • Self-adapted evolutionary algorithm: This method proposed in [47] is a strategy of self-adaptation of the covariance matrix that is implemented in the cma_es function of the cmaes package of R. This is also an evolutionary method, which uses a covariance matrix approximation to be more efficient in the generation of next generations. • Simulated annealing algorithm: This is a metaheuristic based on the thermodynamic annealing process that performs a probabilistic local search replacing the current solution with a solution in its vicinity, obtaining good solutions regardless of the chosen starting point. The GenSA function is available in an R package named GenSA [48]. A general strategy to improve the simulated annealing (SANN) algorithm is to inject noise via the Markov chain Monte Carlo algorithms (noise-boosted) to sample high probability regions of the search space and to accept solutions that increase the search breadth [49][50][51]. Another approach is based on hybridized search gradient methods or genetic algorithms for cases of difficulties in the convergence with SANN [52,53]. • Controlled random search algorithm: This is a direct search heuristic [54] that tries to balance the fulfillment of constraints and convergence by storing possible trial points by the nloptr_crs function. Such a function has implemented this algorithm and is available in an R package named nloptr [55]. • DIRECT algorithm: This is a deterministic method based on the division of the search space into increasingly smaller hyperrectangles and was proposed in [56].
The nloptr_d function has implemented such an algorithm and is available in the nloptr package of R. • DIRECT_L algorithm: This is a variation of DIRECT containing certain randomness and was proposed in [57]. The nloptr_d_l function is available in the nloptr package. • Evolutionary algorithm: A common practice in evolutionary methods is to apply a penalty function to bias the search for viable solutions. This method is a strategy improved by stochastic ranking that proposes a way to eliminate subjectivity in the configuration of penalty parameters and was proposed in [58]. The nloptr_i function implements this algorithm and is available in the nloptr package of R.

First Stage of the Simulation
The first stage of the Monte Carlo simulation study was carried out considering the 10 methods mentioned and applied to the estimation of the mentioned beta regression parameters. The data were generated from the beta regression structure with link function g(µ t ) = β 0 + β 1 x t , such as defined in (4). The values of the covariate x t were obtained as independent realizations of the uniform distribution, denoted as Uniform(0, 1). Thus, under this structure, we can explore the behavior of the optimization procedures from a challenging vision but in a simple scenario and easy to compute due to the intensive calculations. Therefore, multiple regressions and modeling with varying precision were not considered in the present study and will be considered in future research. Logically, the simulation schemes can be structured using more covariates in mean or precision, including categorical and continuous covariates, nonlinearity, and other linking specifications to evaluate, for example, misspecification and sparsity. Nevertheless, this is beyond the scope of our original work.
Note that, in our study, the matrix of covariates remains constant throughout the experiment. Then, random samples of the response variable Y t were generated following the beta distribution presented in (3), that is, Y t ∼ Beta(µ t , φ). Four different sets of parameter values were considered as stated in Table 1, where Sets 1 and 2 result in average response values close to one, while Sets 3 and 4 report these values close to zero. For example, β 0 = −2.5 and β 1 = −1.2 result in mean response values close to zero, µ ∈ (0.024, 0.075) and, for φ = 5, a beta PDF is induced close to the inverted "J"-shape (asymptotes at zero). This situation is an extreme scenario, where traditional optimization procedures generally fail as shown in [64]. The sample sizes used were n ∈ {30, 60, 90, 120}. For each sample size, 50 instances were generated and, for each instance, 100 replicates of each method were performed. Thus, we got 80,000 observations per method. The number of iterations and other method control parameters were adjusted to allow a maximum of 10,000 function calls. Among the 10 evaluated methods, three of them returned an error in some replicates, without providing any result. These methods and the number of failures are shown in Table 2. The seven remaining methods did not return any error. The performance of methods can be further visualized with box plots summarizing the main statistical properties of the measures. The bar in each box is the median value of the measure across the runnings, while outliers deviating by one or more quartiles from the median are denoted as discrete dots at the extremes (outliers). Figure 2 displays the values obtained of the overall log-likelihood function corresponding with each method through box plots. With the exception of the nloptr_d and nloptr_d_l methods, the other algorithms obtained similar results to each other.  Figure 3 shows the performance in terms of the values of the log-likelihood function corresponding to the beta regression mean for each of the four sets described in Table 1. For the first set of parameters, most methods seem to have a similar performance, with the exception of the malschains and cma_es methods. In the second set, the nloptr_crs and malschains methods stand out from the rest. In the third set, there is a certain heterogeneity in the estimates, whereas for the fourth set, the malschains method stands out again, with the nloptr_d and nloptr_d_l methods being quite different from the others. Although the malschains method seems to perform well in some sets, it has some inconsistency with 33,900 failed executions, which are about 42%. Figure 4 presents the box plots of the intercept estimates (β 0 ) for each method, separated by parameter set and sample size, where the blue line indicates the real value of the parameter to be estimated. Such estimates seem to be more accurate as the sample size increases, as expected. The nloptr_d and nloptr_d_l methods report large variations in the results, which is evident in the graphical plot of Set 3. Note that the ga and cma_es methods present a large number of outliers in most sets. The other methods seem to behave similarly to each other, obtaining estimates close to the expected value.
The box plots of the estimates of the parameter β 1 , separating by parameter set and sample size, are shown in Figure 5, where once again the blue line indicates the real value of the parameter to be estimated. The results are very similar to those obtained in Figure 4, since again the nloptr_d and nloptr_d_l methods present large variations in the results and the ga and cma_es methods show a large number of outliers, while the rest of the methods obtained similar results, approaching the blue line. Again, an approximation of the estimates in relation to the blue line is observed as the sample size increases.   In addition to the quality of the estimates, it is of interest to assess the speed of the methods to obtain them. Figure 6 shows a bar plot of the average time per execution in seconds for each method. The ga algorithm is the slowest one, taking more than 1.5 s per run. The nloptr_crs method is much faster than all the others. The other methods have a similar speed performance to each other. Figure 7 sketches the bar plots of the average time per execution in seconds, for each method, separated by a set of parameters. Observe that the execution speed behavior does not seem to depend on the set of parameters to be estimated, with the exception of the cma_es method, which was much slower when dealing with estimates for φ = 148.
Given these results, the DEoptim, GenSA, nloptr_i, and nloptr_crs methods seem to be the most suitable, as the malschains, nloptr_d, nloptr_d_l, PSopt, ga, and cma_es methods show some inconsistencies in the parameter estimation.

Second Stage of the Simulation
In this step, the same methodology as the previous step is used, considering only the methods that have previously obtained better results (DE, SA, isres, crs) and the optim function currently utilized in the betareg package. Thus, it is possible to compare how the heuristics behave in relation to the currently most employed method, that is, the optim function of the betareg package. Furthermore, sets are investigated in which the methods are expected to have greater difficulty in estimating the parameters, diversifying the distribution of the covariate to the cases: Uniform(0,1); Normal(0,1); and Student-t(3). Note that the value of the precision parameter φ drastically decreases as the variance increases. The configuration used for the generated samples is found in Table 3. Note that, in Table 4, for Sets 5 and 6, only the DEoptim and GenSA methods can consistently estimate the parameters based on the number of failures per method.   In Figure 8, the box plots of the estimates of β 0 for Sets 5 and 6 are presented. Note that, for Set 5, only the DE and SA methods reported successes, whereas for Set 6, the isres, crs, and optim methods had few successes. This result is consistent with the number of failures shown in Table 4. Furthermore, the estimates performed with the DE and SA methods are similar. For the same sets, in Figure 9, the box plots of the estimates of β 1 are displayed, where similar behavior is detected, with the notable difference that, in this case, the optim function provides more heterogeneous results in the case of samples with small sizes. The estimates of φ are shown in Figure 10, where a similar behavior is observed in relation to the estimates of β 0 and β 1 .
For Sets 7 and 8, observe that, in Table 4, the crs and isres methods failed in all attempts, while the optim function failed in most executions. In addition, the DE and SA methods had few flaws/failures, being the most consistent algorithms. The estimates of β 0 shown in Figure 11 report that the behavior remains similar between the DE and SA methods. Note that the optim function did not have any success in estimating the parameters for samples of size n = 120. The same can be seen in the estimates of β 1 in Figure 12 and in the estimates of φ in Figure 13.
For Sets 9 and 10, observe that, in Table 4, once again the crs and isres methods are unable to provide estimates. In this case, the optim function had a lower number of failures, although it is still a considerably significant number. The DE method follows without failing, while the SA method reports some few flaws. Figures 14-16 show the estimates of β 0 , β 1 , and φ, respectively. In this case, the optim function did not obtain estimates for Set 9. Furthermore, a similar behavior between the DE and SA methods is confirmed. Note that the estimates of β 1 were not very accurate for Set 9.
For Sets 5 and 6, Figure 17 sketches the average execution time in seconds, considering only the successes. Note that, despite the large number of failures, the optim function is quite efficient when it comes to estimating the parameters. Although the DE and SA methods have obtained close estimates, the execution time of the SA method is much smaller, so that its use is indicated in the studied cases. For Sets 7 and 8, as previously shown in Figure 18, it is confirmed that the execution time of the SA method is less than for the DE method, despite the similarity in the estimates. For Sets 9 and 10, as for the execution times shown in Figure 19, the previous behavior is once again detected, with the SA method being more efficient than the DE method.

Conclusions
This work evaluated the performance of the most used optimization methods when numerically maximizing the likelihood function associated with the beta regression model for different scenarios, varying the distribution of the covariate, its parameters, and sample size.
We considered a Monte Carlo simulation study in two stages. From the results of the first stage of these simulations carried out and the analysis of the graphical plots presented, we observed that six methods reported some inconsistencies and were not suitable for the problem. These methods are the genetic, evolutionary with self-adaptation of the covariance matrix, DIRECT, DIRECT_L, memetic with local search strings, and particle swarm algorithms. Four other methods provided satisfactory results, that is, the differential evolutionary, simulated annealing, controlled random search, and evolutionary with improved stochastic ranking algorithms. All of them obtained similar and satisfactory results in relation to the estimates in the evaluated scenarios. However, among them, the controlled random search algorithm had a superior computational speed performance in relation to the others. In the second stage of the simulations, sets of parameters were used to make the estimation process more difficult. In this case, we observed that the controlled random search and evolutionary with improved stochastic ranking algorithms presented a very large number of failures, in some cases not achieving any success. The optim function used in the betareg package of R had many failures, but in cases where it was successful, it outperformed the other methods both in terms of accuracy of estimation and speed. The differential evolutionary and simulated annealing algorithms provided satisfactory estimates with few failures, with the simulated annealing algorithm being more efficient in terms of computational time.
Based on the findings detected in our computational study, we report the following. The optim function, implemented in the betareg package to maximize the log-likelihood function of the beta regression model and to estimate its parameters, is accurate and efficient for most of the cases. Nevertheless, in some scenarios reported in our study, it presents some difficulties in estimating such parameters. In these scenarios, we recommend the use of the simulated annealing algorithm, which for the cases studied in this work showed greater consistency, providing satisfactory estimates in viable computational time.
Some limitations of our investigation, which open naturally the possibility of future works, are related to investigating the case of a varying precision parameter by means of a regression structure to log(φ t ). In addition, since we used one covariate in the simulation study, an exploration of experimental results in the case of multiple regression is of interest because could affect the results. These open problems and others that are related are suitable to be further analyzed so that we hope to report their findings in a future publication.

Acknowledgments:
The authors would also like to thank the editor and reviewers for their constructive comments which helped improve the presentation of the manuscript. R.O. thanks to Andre Leite for technical comments in the old version of this manuscript. The authors thank to FACEPE, CAPES and CNPq, Brazil.

Conflicts of Interest:
The authors declare no conflict of interest.