Article

Evaluating Estimator Performance Under Multicollinearity: A Trade-Off Between MSE and Accuracy in Logistic, Lasso, Elastic Net, and Ridge Regression with Varying Penalty Parameters

Department of Mathematics and Statistics, Florida International University, Miami, FL 33199, USA
* Author to whom correspondence should be addressed.
Stats 2025, 8(2), 45; https://doi.org/10.3390/stats8020045
Submission received: 5 May 2025 / Revised: 27 May 2025 / Accepted: 29 May 2025 / Published: 31 May 2025

Abstract

Multicollinearity in logistic regression models can result in inflated variances and unreliable parameter estimates. Ridge regression, a regularized estimation technique, is frequently employed to address this issue. This study conducts a comparative evaluation of the performance of 23 established ridge regression estimators alongside Logistic Regression, Elastic-Net, Lasso, and Generalized Ridge Regression (GRR), considering various levels of multicollinearity within the context of logistic regression settings. Simulated datasets with high correlations (0.80, 0.90, 0.95, and 0.99) and real-world data (municipal and cancer remission) were analyzed. Both the simulation and real-data results show that ridge estimators such as $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$ exhibit strong performance in terms of Mean Squared Error (MSE) and accuracy, particularly in smaller samples, while GRR demonstrates superior performance in large samples. Real-world data further confirm that GRR achieves the lowest MSE in highly collinear municipal data, while ridge estimators and GRR help prevent overfitting in small-sample cancer remission data. The results underscore the efficacy of ridge estimators and GRR in handling multicollinearity, offering reliable alternatives to traditional regression techniques, especially for datasets with high correlations and varying sample sizes.

1. Introduction

Logistic regression stands as a key method in statistical modeling, particularly for binary classification, where the outcome variable is dichotomous. Its widespread use in various fields, including epidemiology, social sciences, and machine learning, is due to its ability to model the probability of an event occurring as a function of one or more predictor variables [1]. Unlike a linear model, which assumes a continuous outcome, logistic regression uses a logistic function to convert the linear combination of predictors into a probability, making it well-suited for categorical outcomes. This ability to estimate probabilities directly makes logistic regression invaluable for risk assessment, predictive modeling, and hypothesis testing in several applied contexts.
However, multicollinearity, characterized by a high correlation among predictor variables, adds complexity to the interpretation and reliability of logistic regression models [2]. Although multicollinearity does not introduce bias into the coefficient estimates in the same manner as in linear regression, it significantly inflates the standard errors of these coefficients [3]. Inflated standard errors lead to decreased t-statistics (or Wald statistics) associated with the coefficients, which reduces the likelihood of detecting statistically significant effects. This increases the risk of Type II errors, where genuine relationships between predictors and the outcome may be overlooked [4]. As a result, the practical implications of multicollinearity can lead to misleading conclusions, ultimately compromising the model’s usefulness for both predictive and explanatory purposes. Although the coefficients are not biased, their stability diminishes, resulting in substantial changes in the estimated coefficients when there are slight variations in the data. This instability makes it challenging to determine the true magnitude and direction of the predictors’ effects [5]. Multicollinearity also complicates the assessment of variable importance. The high correlation among predictors complicates the identification of their individual impacts on the outcome, risking misleading conclusions about their influence [6]. The interpretation of odds ratios, which are central to logistic regression, can also be problematic. With inflated standard errors, the confidence intervals around the odds ratios broaden, making it difficult to draw precise inferences about the magnitude of the effects. This complicates the practical application of the model findings [7].
To mitigate multicollinearity in logistic regression, researchers often use regularization techniques like Ridge Regression, Lasso, and Elastic Net [8]. These methods are intended to minimize the overlap between predictor variables, strengthen the consistency of coefficient estimates, and boost the overall reliability and clarity of the model.
Hoerl and Kennard (1970) introduced ridge regression to address multicollinearity in engineering data. Their work showed that a nonzero ridge parameter (k) can lower the MSE of ridge regression compared to the variance of the Ordinary Least Squares (OLS) estimator [9]. Since their pioneering contribution, extensive research has been conducted to refine and enhance ridge regression methodologies, and numerous scholars have proposed new estimators for the ridge parameter. Notable contributions include those of McDonald & Schwing, 1973; Hoerl et al., 1975; McDonald & Galarneau, 1975; Lawless & Wang, 1976; Dempster et al., 1977; Gibbons, 1981; Schaeffer et al., 1984; Schaeffer, 1986; Walker & Birch, 1988; Kibria, 2003; Khalaf & Shukur, 2005 [10,11,12,13,14,15,16,17,18,19,20] and very recently Muniz & Kibria, 2009; Månsson et al., 2010; Kibria et al., 2012; Hefnawy & Farag, 2014; Aslam, 2014; Dorugade, 2014; Arashi & Valizadeh, 2015; Ayinde & Lukman, 2016; Lukman & Ayinde, 2017; Melkumova & Shatskikh, 2017; Lukman et al., 2018, 2019; Herawati et al., 2018, 2024; Yüzbaşı et al., 2020; Kibria & Lukman, 2020; Kibria, 2023; Hoque & Kibria, 2023; Mermi et al., 2024; Nayem et al., 2024; Hoque & Kibria, 2024; Yasmin & Kibria, 2025, among others [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]. These contributions have played a critical role in the ongoing refinement of ridge regression techniques, ensuring their continued relevance in statistical modeling and analyses.
Despite extensive research on ridge regression, relatively little attention has been devoted to its application in logistic regression. Schaeffer et al. (1984) presented a logistic ridge regression (LRR) estimator, and later research investigated different approaches for estimating the ridge parameter (k) and evaluated their effectiveness using Monte Carlo simulations [16,17,23]. Recently, Mermi et al. (2024) carried out a thorough comparative evaluation of 366 ridge parameter estimators introduced over various periods, taking into account elements like the number of independent variables, sample size, correlation among predictors, and variance of errors [38]. In the present study, we incorporated 16 of the most effective estimators from Mermi et al. (2024). Furthermore, we utilized seven estimators that Kibria and Lukman (2020) and Kibria (2023) specifically suggested for non-normal population distributions, guaranteeing a thorough assessment of logistic ridge regression across different modeling scenarios [34,36].
In the presence of multicollinearity, selecting an optimal estimator requires a careful balance between minimizing the MSE and maximizing the classification accuracy. MSE minimization is crucial for obtaining precise parameter estimates and reducing prediction errors, particularly in regression contexts, where deviations from true values carry significant implications [5]. However, an exclusive focus on the MSE may fail to account for the importance of classification accuracy, especially in binary or categorical response settings, where correct classification is critical for decision-making [7]. The selection of an estimator must strategically weigh these competing metrics, emphasizing MSE reduction in scenarios where precise numerical predictions are paramount and prioritizing classification accuracy when robust categorical predictions are the primary objective. As evidenced by numerous simulation studies, the optimal estimator is often context-dependent, varying based on the specific application and the relative costs associated with prediction errors versus misclassifications. Unlike much of the existing literature that focuses predominantly on linear models or limits evaluation to MSE, this study emphasizes both MSE and classification accuracy to provide a more comprehensive assessment of the model performance in the presence of multicollinearity. While previous studies have explored various ridge estimators in linear regression contexts, there remains a significant gap in systematically evaluating their effectiveness within logistic regression frameworks, particularly under high multicollinearity and non-normal data conditions. By incorporating recently developed and robust ridge parameter estimators tailored for such scenarios, this study aims to address this gap and offer practical guidance for the selection of estimators in binary classification problems. Therefore, it is crucial to conduct a thorough assessment of both the MSE and classification accuracy to make a well-informed choice concerning estimator selection when multicollinearity is present.
This research intends to conduct an in-depth comparison of 23 ridge regression estimators alongside traditional LR, Lasso, EN, and GRR. This examination goes beyond the typical emphasis on MSE by also assessing classification accuracy in situations of multicollinearity. The objective of this study is to enhance the existing literature on ridge regression by systematically evaluating the performance of estimators under various levels of multicollinearity. Additionally, real-world datasets will be utilized to corroborate the findings obtained from simulations, thereby demonstrating the practical applicability of the methodologies employed.

2. Methodology

In regression analysis, let $Y$ be the response vector, $X$ the matrix of predictors, $\theta$ the parameter vector, and $\varepsilon$ the random error term. The OLS estimator of $\theta$ is the following:
$\hat{\theta}_{OLS} = (X'X)^{-1} X'Y$
When the response variable is binary, standard OLS regression is no longer suitable, and logistic regression is employed instead. In this section, we present LRR, initially proposed by Schaefer et al. (1984), and explore its enhancements through more recent advancements [16]. These refinements aim to improve the performance and stability of logistic ridge regression, making it a more robust alternative in cases of multicollinearity.
Let the $i$th observation of the response, $y_i$, follow a Bernoulli distribution, denoted as $y_i \sim \mathrm{Bernoulli}(\pi_i)$, where $\pi_i$ represents the probability of success associated with the $i$th observation. The parameter $\pi_i$ is modeled as a function of the predictors using the logit link, ensuring that predicted probabilities remain within the unit interval:
$\pi_i = \dfrac{\exp(x_i'\theta)}{1 + \exp(x_i'\theta)}$
where $x_i$ is the $i$th row of the $n \times (m+1)$ data matrix $X$ with $m$ predictors, and $\theta$ is the $(m+1) \times 1$ coefficient vector. The maximum likelihood approach to estimating $\theta$ maximizes the following log-likelihood:
$l(X;\theta) = \sum_{i=1}^{n} \left[ y_i \log(\pi_i) + (1 - y_i)\log(1 - \pi_i) \right]$
This can be achieved by setting the first derivative of the expression equal to zero. As a result, the maximum likelihood estimates are found by solving the resulting equation:
$\dfrac{\partial l(X;\theta)}{\partial \theta} = X'(y - \pi) = 0$
As this equation is nonlinear in $\theta$, the Newton-Raphson method must be employed to find a solution. The solution can then be expressed through the following iteratively reweighted least squares form:
$\hat{\theta}_{ML} = (X'WX)^{-1} X'WZ$
where $W = \mathrm{diag}\left[ \hat{\pi}_i (1 - \hat{\pi}_i) \right]$, and
$Z_i = \log(\hat{\pi}_i) + \dfrac{y_i - \hat{\pi}_i}{\hat{\pi}_i (1 - \hat{\pi}_i)}$
The asymptotic covariance matrix of the maximum likelihood estimator is the inverse of the negative expected matrix of second derivatives:
$\mathrm{Cov}(\hat{\theta}_{ML}) = \left\{ -E\left[ \dfrac{\partial^2 l(X;\theta)}{\partial \theta\, \partial \theta'} \right] \right\}^{-1} = (X'WX)^{-1}$
and the asymptotic MSE equals:
$E\|\hat{\theta}_{ML} - \theta\|^2 = E\left[ (\hat{\theta}_{ML} - \theta)'(\hat{\theta}_{ML} - \theta) \right] = \mathrm{tr}\left[ (X'WX)^{-1} \right] = \sum_{j=1}^{J} \dfrac{1}{\lambda_j}$
where $\lambda_j$ is the $j$th eigenvalue of the $X'WX$ matrix. A key drawback of the maximum likelihood estimator is that its asymptotic variance can inflate when high correlation among the independent variables produces small eigenvalues. To address this multicollinearity issue, Schaefer et al. (1984) proposed the following LRR estimator [16]:
$\hat{\theta}_{LRR} = (X'\hat{W}X + kI)^{-1} X'\hat{W}X \, \hat{\theta}_{ML} = Z \hat{\theta}_{ML}$
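As an illustration of these two steps, the following is a minimal R sketch, not the authors' code, of the iteratively reweighted least squares fit for $\hat{\theta}_{ML}$ followed by the ridge adjustment above for a user-supplied ridge parameter $k$; the function name is hypothetical, and the intercept is penalized here only for simplicity.

```r
logistic_ridge <- function(X, y, k, tol = 1e-8, max_iter = 50) {
  X <- cbind(1, X)                       # add an intercept column (penalized for simplicity)
  theta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    eta    <- drop(X %*% theta)
    pi_hat <- 1 / (1 + exp(-eta))
    W      <- diag(pi_hat * (1 - pi_hat))
    z      <- eta + (y - pi_hat) / (pi_hat * (1 - pi_hat))   # standard IRLS working response
    theta_new <- drop(solve(t(X) %*% W %*% X, t(X) %*% W %*% z))
    if (max(abs(theta_new - theta)) < tol) { theta <- theta_new; break }
    theta <- theta_new
  }
  # Ridge adjustment of the ML fit (Schaefer-type LRR) for the chosen k
  eta    <- drop(X %*% theta)
  pi_hat <- 1 / (1 + exp(-eta))
  W      <- diag(pi_hat * (1 - pi_hat))
  XWX    <- t(X) %*% W %*% X
  theta_lrr <- drop(solve(XWX + k * diag(ncol(X)), XWX %*% theta))
  list(ml = theta, lrr = theta_lrr)
}
```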
Numerous ridge estimators have been suggested in the literature for different kinds of models, as mentioned previously, to determine the ridge parameter k. For this study, we selected the 16 most effective estimators identified by Mermi et al. (2024) based on their performance in different statistical settings [38]. In addition, we incorporated seven estimators introduced by Kibria & Lukman (2020) and Kibria (2023) which are specifically designed for asymmetric data [34,36]. These 23 estimators were selected to ensure a comprehensive comparison that spanned both classical and modern approaches to ridge parameter selection. Including estimators suited for asymmetric data is particularly important because such data structures frequently arise in applied settings and can adversely affect the performance of conventional ridge estimators [9,19]. The comprehensive set of 23 ridge estimators utilized in this study, encompassing both simulation-based analyses and real-life applications, is systematically presented in Table 1.

2.1. Generalized Ridge Regression (GRR)

In 2017, Yang and Emura presented a GRR estimator that uses non-uniform shrinkage instead of the conventional uniform shrinkage, replacing the identity matrix with a diagonal matrix $\hat{N}(\Delta)$ [44].
For a binary response variable $y_i \in \{0, 1\}$, the GRR estimator can be adapted by incorporating it into the penalized logistic regression framework. The parameter estimates $\hat{\theta}_{GRR}$ are derived by minimizing the penalized negative log-likelihood, as follows:
$\hat{\theta}_{GRR} = \arg\min_{\theta} \left\{ -\sum_{i=1}^{n} \left[ y_i \log(\pi_i) + (1 - y_i)\log(1 - \pi_i) \right] + \lambda\, \theta' \hat{N}(\Delta)\, \theta \right\}$
where $\pi_i = \dfrac{1}{1 + \exp(-x_i'\theta)}$ is the predicted probability from the logistic regression, $\lambda > 0$ is the shrinkage parameter, $\Delta \geq 0$ is the threshold parameter, and $\hat{N}(\Delta) = \mathrm{diag}\left( \hat{n}_1(\Delta), \ldots, \hat{n}_p(\Delta) \right)$ is a diagonal matrix encoding non-uniform penalties for each coefficient [45].
$\hat{n}_i(\Delta) = \begin{cases} \tfrac{1}{2}, & \text{if } |z_i| \geq \Delta \\ 1, & \text{if } |z_i| < \Delta \end{cases}$
where $z_i = \hat{\theta}_i^{0} / SD(\hat{\theta}^{0})$ is a standardized initial estimate of $\theta_i$, $SD(\hat{\theta}^{0}) = \left[ \frac{1}{p-1} \sum_{i=1}^{p} \left( \hat{\theta}_i^{0} - \frac{1}{p} \sum_{j=1}^{p} \hat{\theta}_j^{0} \right)^2 \right]^{1/2}$, and $\hat{\theta}_i^{0} = \dfrac{x_i' y}{x_i' x_i}$ serves as an initial estimator based on a simple componentwise pseudo-regression.
This formulation allows for adaptive shrinkage, where coefficients associated with stronger initial signals (larger z i ) are penalized less, while weaker signals are shrunk more aggressively, thus promoting both interpretability and generalization in high-dimensional binary classification problems.
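A minimal R sketch of this weighting rule, assuming the quantities defined above and using illustrative function and variable names:

```r
grr_weights <- function(X, y, Delta) {
  # Componentwise pseudo-regression initial estimates: theta_i^0 = x_i'y / x_i'x_i
  theta0 <- apply(X, 2, function(xj) sum(xj * y) / sum(xj * xj))
  z      <- theta0 / sd(theta0)               # standardized initial signals
  ifelse(abs(z) >= Delta, 0.5, 1)             # weaker signals receive the larger penalty weight
}
# The resulting weights w enter the penalty as lambda * t(theta) %*% diag(w) %*% theta.
```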

2.2. Least Absolute Shrinkage and Selection Operator (Lasso)

Lasso (Tibshirani, 1996) is an effective regression method that conducts variable selection and regularization at the same time, enhancing both the precision and clarity of statistical models [46]. Unlike ridge regression, which uses the squared l 2 norm, Lasso employs the l 1   norm to reduce the total of squared differences while adhering to a limitation on the overall absolute size of the coefficients. This constraint promotes sparsity in the model, effectively mitigating multicollinearity and identifying a subset of truly relevant predictors [46,47].
$\hat{\theta}_{\mathrm{lasso}} = \arg\min_{\theta} \left\{ -\sum_{i=1}^{n} l_i(\theta) + \lambda \sum_{j=1}^{p} |\theta_j| \right\}$
where $l_i(\theta)$ is the log-likelihood contribution of the $i$th observation under the binary logistic model, $\lambda \geq 0$ is the tuning (regularization) parameter, $\theta$ is the coefficient vector, $\sum_{j=1}^{p} |\theta_j|$ is the $l_1$ norm that shrinks coefficients toward zero, $n$ is the sample size, and $p$ is the number of predictors.
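As an illustration, a penalized logistic Lasso of this form can be fit with the glmnet package in R, with the tuning parameter chosen by cross-validation; the data below are simulated placeholders.

```r
library(glmnet)

set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # placeholder predictors
y <- rbinom(100, 1, 0.5)                            # placeholder binary response

cv_lasso <- cv.glmnet(X, y, family = "binomial", alpha = 1, nfolds = 5)
coef(cv_lasso, s = "lambda.min")                    # coefficients at the selected lambda
```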

2.3. Elastic Net (EN)

The EN, which was introduced by Zou and Hastie in 2005, extends Lasso by combining penalties from both the $l_1$ norm (Lasso) and $l_2$ norm (ridge), addressing limitations in variable selection [48]. This dual regularization shrinks coefficients as ridge regression does while enforcing sparsity by setting some of them to zero, as in Lasso. The $l_1$ norm ensures model sparsity, while the $l_2$ norm alleviates restrictions on the number of selected predictors, enabling more flexibility [49]. EN's ability to select more predictors than Lasso makes it a powerful tool for variable identification, as demonstrated in spectral data analysis [50]. For a binary outcome variable, the EN minimizes the penalized negative log-likelihood of the logistic regression model to obtain the parameter vector.
Let:
$\pi_i = \dfrac{1}{1 + \exp(-x_i'\theta)}$
Then the EN estimator becomes:
$\hat{\theta}_{EN} = \arg\min_{\theta} \left\{ -\sum_{i=1}^{n} \left[ y_i \log(\pi_i) + (1 - y_i)\log(1 - \pi_i) \right] + \lambda \left[ (1 - \alpha) \sum_{j=1}^{p} \theta_j^2 + \alpha \sum_{j=1}^{p} |\theta_j| \right] \right\}$
where $\pi_i$ is the predicted probability that $Y_i = 1$, $\theta$ is the coefficient vector, $\lambda$ is the regularization strength, and $\alpha$ is the mixing weight between the $l_1$ and $l_2$ norms ($\alpha = 0$: ridge, $\alpha = 1$: Lasso).
In this study, for GRR, Lasso, and EN, the optimal tuning parameter $\lambda$ was obtained using 5-fold cross-validation.
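A corresponding sketch of the elastic-net fit with a 5-fold cross-validated tuning parameter, again assuming the glmnet package; note that glmnet's internal parameterization (a 1/n scaling of the log-likelihood and a 1/2 factor on the $l_2$ term) differs slightly from the display in Section 2.3.

```r
library(glmnet)

set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # placeholder predictors
y <- rbinom(100, 1, 0.5)                            # placeholder binary response

cv_en <- cv.glmnet(X, y, family = "binomial", alpha = 0.5, nfolds = 5)
cv_en$lambda.min                                    # lambda selected by 5-fold CV
coef(cv_en, s = "lambda.min")                       # elastic-net coefficients
```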

2.4. Simulation Study

The objective of this research is to evaluate the effectiveness of the leading ridge estimators against LR, Lasso, EN, and GRR, focusing on minimizing MSE and enhancing accuracy. Due to the infeasibility of a theoretical comparison, a Monte Carlo simulation study has been performed utilizing the R programming language [51]. The method used for generating data for the models adheres to a recognized procedure [19].
$X_{ij} = (1 - \rho^2)^{1/2} d_{ij} + \rho\, d_{i(P+1)}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, P$
where $\rho$ indicates the correlation between two predictors, $d_{ij}$ denotes independent pseudo-random variates, and $P$ indicates the number of independent variables. The response $Y$ was obtained from the Bernoulli($\pi_i$) distribution, where:
$\pi_i = \dfrac{\exp(x_i'\theta)}{1 + \exp(x_i'\theta)}$
In the modeling process, we considered three values for the number of independent variables ($P$ = 3, 5, and 10), four levels of correlation between the predictors ($\rho$ = 0.80, 0.90, 0.95, and 0.99), and three sample sizes ($n$ = 100, 200, and 300). The procedure for generating the predictor data was carried out according to the chosen values of $P$, $\rho$, and $n$. To confirm the reliability of the findings, the experiment was replicated 5000 times.
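A minimal sketch of this data-generating scheme (function names are illustrative; the true $\theta$ used to generate the response is described in the next paragraph):

```r
# Generate predictors with dependence governed by rho following the equation
# above, and a Bernoulli response from the logistic model.
gen_predictors <- function(n, P, rho) {
  d <- matrix(rnorm(n * (P + 1)), nrow = n, ncol = P + 1)    # independent draws
  sqrt(1 - rho^2) * d[, 1:P, drop = FALSE] + rho * d[, P + 1]
}

gen_response <- function(X, theta) {
  pi <- 1 / (1 + exp(-drop(X %*% theta)))
  rbinom(nrow(X), 1, pi)
}
```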
Following the theoretical foundation established by Newhouse and Oman (1971), which posits that the MSE is minimized when the coefficient vector $\theta$ aligns with the normalized eigenvector corresponding to the largest eigenvalue of the $X'X$ matrix, the simulation study computes the MSE by comparing the estimated coefficient vectors from the various models with this true $\theta$ [52]. For each simulation, the true $\theta$ is obtained by extracting and normalizing the eigenvector associated with the largest eigenvalue of $X'X$, ensuring $\theta'\theta = 1$, which provides a theoretically grounded benchmark for evaluating estimator performance through the average squared difference between $\hat{\theta}$ and $\theta$. The average estimated MSE was computed over all simulations using the following formula [39]:
$MSE(\hat{\theta}^{*}) = \dfrac{1}{N} \sum_{j=1}^{N} \left( \hat{\theta}_j - \theta \right)' \left( \hat{\theta}_j - \theta \right)$
where $\hat{\theta}_j$ is the estimated coefficient vector from the $j$th replication, $\theta$ is the true parameter vector, and $N$ = 5000.
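The two pieces of this calculation can be sketched as follows (a minimal sketch with illustrative names; the list of estimated vectors would come from the fitted models):

```r
# "True" theta: normalized eigenvector of X'X associated with its largest eigenvalue.
true_theta <- function(X) {
  e <- eigen(crossprod(X))
  v <- e$vectors[, which.max(e$values)]
  v / sqrt(sum(v^2))                       # ensures theta' theta = 1
}

# Average squared estimation error over the N Monte Carlo replications.
avg_mse <- function(theta_hat_list, theta_true_list) {
  mean(mapply(function(th, tt) sum((th - tt)^2), theta_hat_list, theta_true_list))
}
```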
In the evaluation of the classification performance, accuracy was used as a key metric to quantify the proportion of correctly predicted binary outcomes. For each simulation, we applied a 70% training and 30 % testing split on the simulated data. Models were trained on the 70 % training subset, and predictions were generated for the held-out 30 % test set. The predicted values, which were initially probabilities derived from the logistic regression link function, were converted into binary classifications using a threshold of 0.5. Specifically, if the predicted probability exceeded 0.5, the observation was classified as 1; otherwise, it was classified as 0.
The models’ overall accuracy was calculated using a confusion matrix based on the total number of correctly classified observations [53].
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives. Accuracy serves as a comprehensive metric of the model's ability to make correct predictions across the entire dataset. For each simulation, accuracy values were calculated, and the average accuracy was then computed across all simulations, considering different combinations of parameters ($n$, $P$, and $\rho$) and models.
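A minimal sketch of this accuracy calculation from held-out predicted probabilities (illustrative names; the 70/30 split is indicated in the comment):

```r
accuracy <- function(p_hat, y_true, cutoff = 0.5) {
  y_pred <- as.integer(p_hat > cutoff)                # classify as 1 if probability exceeds 0.5
  cm <- table(factor(y_pred, levels = 0:1), factor(y_true, levels = 0:1))
  TP <- cm["1", "1"]; TN <- cm["0", "0"]
  FP <- cm["1", "0"]; FN <- cm["0", "1"]
  (TP + TN) / (TP + TN + FP + FN)
}

# Example 70/30 split on n observations:
# idx <- sample(seq_len(n), floor(0.7 * n))           # train on idx, evaluate on -idx
```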

3. Results & Discussion

In this part, we carried out an extensive simulation study to assess the effectiveness of ridge estimators relative to conventional logistic regression, elastic net, Lasso, and generalized ridge regression. The results, summarized in Table 2, Table 3, Table 4 and Table 5, present the MSE and classification accuracy of the estimators under different conditions. Specifically, this study examines the impact of correlation coefficients (0.80, 0.90, 0.95, and 0.99). The findings, systematically detailed in these tables, illustrate the effects of different correlation structures, the number of independent variables, and sample sizes on the estimator performance.
Table 2 compares the MSE and Accuracy across various regression models under moderate multicollinearity (correlation = 0.80) for different predictor counts (P = 3, 5, 10) and sample sizes (n = 100, 200, 300). For P = 3, $k_{KL2}$ consistently achieves the lowest MSE (0.088), demonstrating its effectiveness in low-dimensional settings. As P increases to 5, $k_{KL2}$ remains optimal, with an MSE of 0.082. However, for P = 10, GRR performs best, with an MSE of 0.064 at n = 300, highlighting its capacity for high-dimensional data. LR exhibits the highest MSE, indicating limitations in multicollinear environments. EN and Lasso show competitive performance, especially with larger P and n, benefiting from regularization.
In terms of accuracy, most models show stable performance, with slight improvements as sample size increases. For P = 3, ridge estimators like $k_{KL2}$, $k_{K5}$, and $k_{K1}$ maintain an accuracy of 0.736 at n = 100, rising to 0.738 at n = 300. With P = 5, models such as GRR, EN, and Lasso achieve accuracies of 0.764 to 0.766 at n = 300, demonstrating effectiveness in moderate complexity. For P = 10, GRR leads with an accuracy of 0.812 at n = 300, showing its robustness in high-dimensional spaces.
Table 3 shows that under high multicollinearity (correlation = 0.90), MSE values increase, indicating greater prediction difficulty. Focusing on the non-penalized methods first, we see that LR exhibits the highest MSE, particularly as P increases, indicating its vulnerability to overfitting with more predictors. GRR substantially reduces MSE compared to LR, highlighting the benefits of $l_2$ regularization. As expected, increasing sample size (n) generally decreases MSE for all methods, as more data provides a better estimate of the underlying relationships. For the ridge estimators, we observe generally low and stable MSE across different P and n, suggesting that these methods are less sensitive to the increase in predictor dimension in this context. However, some estimators, such as $k_D$, $k_{LA2}$, $k_{L1}$, $k_{A2}$, and $k_{M2}$, show relatively higher MSE than the others, indicating potential difficulty in capturing the underlying data structure with these particular shrinkage choices. Notably, $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$ achieve some of the lowest MSE values, suggesting good performance in this scenario.
Regarding accuracy, the table displays the proportion of correctly classified instances for each estimator under the same conditions. Like the MSE trends, the accuracy tends to improve with increasing sample size (n) for most methods, reflecting the benefit of more data. LR, which suffered from a high MSE, also exhibits the lowest accuracy, reinforcing the detrimental effect of overfitting. In contrast, the penalized methods, Lasso and EN, demonstrate higher accuracy, with GRR achieving the highest accuracy among the estimators. The ridge estimators, except for $k_D$, $k_{LA2}$, $k_{L1}$, $k_{A2}$, and $k_{M2}$, achieve high and relatively consistent accuracy across different P and n, often surpassing the traditional methods. This suggests that the ridge estimators are effective in capturing the underlying relationships in the data, leading to better classification performance. The methods $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$, which had low MSE, also maintain high accuracy, further indicating their effectiveness. The methods $k_D$, $k_{LA2}$, $k_{L1}$, $k_{A2}$, and $k_{M2}$ that showed relatively higher MSE also show lower accuracy compared to the other ridge estimators, suggesting a link between MSE and classification performance.
Table 4 presents the MSE for various estimators across different parameter settings (P = 3, 5, 10) and sample sizes (n = 100, 200, 300) given a correlation of 0.95. Firstly, within the ridge estimators, we observe a general trend of decreasing MSE as the sample size (n) increases, as expected due to the larger amount of information available for estimation. Notably, the estimators $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$ consistently demonstrate lower MSE values, particularly as P increases, suggesting superior performance in capturing the underlying relationship under higher dimensionality. In contrast, the $k_D$ estimator exhibits significantly higher MSE, indicating poor performance and potential instability. Among the traditional methods, GRR consistently achieves the lowest MSE across all settings, highlighting its effectiveness in handling high correlations and varying dimensions. EN and Lasso also demonstrate reasonable performance, with the MSE decreasing as the sample size increases, albeit not as effectively as the GRR. LR shows the highest MSE, especially with larger P values, indicating its sensitivity to high correlation and dimensionality.
The accuracy metric presented alongside the MSE provides insights into the classification performance of the estimators. Similar to the MSE trends, we observe that the accuracy generally improves with increasing sample size across all estimators. However, the ridge estimators show remarkably consistent and high accuracy, often reaching 83.7% for P = 10 and n = 300. This suggests that these estimators are robust and effective in capturing underlying patterns, even with increased dimensionality. GRR, EN, and Lasso also exhibit good accuracy, with GRR showing slightly better performance, especially at higher P values. LR, while showing improvement with larger sample sizes, lags the other methods in terms of accuracy, consistent with its higher MSE.
Table 5 reveals the impact of extremely high multicollinearity (correlation of 0.99) on the MSE. The near-perfect correlation increases the MSE values across all models, highlighting the difficulty of accurate prediction under extreme dependencies. The ridge estimators, particularly $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$, consistently exhibit the lowest MSE across all parameter settings (P = 3, 5, 10) and sample sizes. Importantly, their MSE values decrease as the sample size increases, demonstrating the desired consistency and effectiveness in handling the strong dependencies. In contrast, the $k_D$ estimator shows significantly higher MSE, indicating poor performance and a lack of robustness to high correlation. Among the classical methods, GRR stands out with the lowest MSE and exhibits a clear trend of decreasing MSE with increasing sample size, suggesting a superior ability to mitigate the impact of high correlation. EN and Lasso also show reasonable performance, with MSE decreasing as the sample size increases, although not as effectively as GRR. LR exhibits the highest MSE, especially with larger P values, and shows less consistent improvement with increasing sample size, indicating its struggle to handle the strong linear dependency of the data.
From Table 5, we observe that the accuracy generally improves with increasing sample size across all estimators. The ridge estimators, specifically $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$, demonstrate remarkably consistent and high accuracy, often reaching 84.3% for P = 10 and n = 300. This suggests that these estimators are robust and effective in capturing the underlying patterns, even with extreme correlations. GRR also exhibits good accuracy, showing slightly better performance, especially at higher P values. EN and Lasso show comparable accuracy to GRR, while LR lags the other methods in terms of accuracy, consistent with its higher MSE.

4. Applications

We analyzed two real-life datasets to illustrate the findings derived from the simulations presented in the previous section.

4.1. Municipal Data

In this part, the models are applied to the municipal dataset from Statistics Sweden [54]. A binary logistic regression model is employed, with the dependent variable indicating whether the net population expands (coded as 1) or contracts (coded as 0). The model aims to explain the response variable using the following predictors: X1 (population), X2 (number of unemployed individuals), X3 (number of newly constructed buildings), and X4 (number of bankrupt firms).
The full dataset consists of 271 entries, which correspond to the municipalities in Sweden. The correlation plot illustrated in Figure 1 indicates that all correlation values are above 0.90, with some approaching 0.99. Furthermore, the condition number is 38.33, suggesting a notable issue with multicollinearity within the data [55].
Table 6 presents the performance of various statistical estimators, primarily focusing on shrinkage methods, evaluated based on MSE and Accuracy. Lower MSE values indicate a closer approximation of the estimated values to the true values, suggesting higher precision and a reduced prediction error. Conversely, higher accuracy scores reflect a greater proportion of correct predictions, implying a better classification performance. Examining the estimators, we observe that GRR exhibits the lowest MSE (0.1470747), indicating superior precision among the tested models, which aligns with the simulation results since this is a large sample. This suggests that the GRR, by introducing a form of regularization that shrinks coefficients towards zero, effectively mitigates overfitting and improves the model’s predictive accuracy. In contrast, LR, without any shrinkage, shows the highest MSE (0.7111922), highlighting the potential for substantial prediction errors in unregularized models, especially in datasets with multicollinearity or high dimensionality. The Lasso and EN estimators, which also incorporate regularization, fall within a moderate range of MSE values, demonstrating their ability to balance bias and variance, although not as effectively as GRR for this data (Figure 2).
In terms of accuracy, most of the estimators, including GRR and the various ridge estimators, such as $k_{AS}$, $k_{A1}$, and $k_D$, achieve an accuracy of approximately 67.1%, suggesting a consistent ability to correctly classify a significant portion of the observations. This uniform accuracy across several shrinkage estimators implies that, while the MSE varies significantly, the overall classification performance remains relatively stable. However, the LR estimator exhibits a lower accuracy of 62.2%, aligning with its higher MSE and indicating a less reliable classification capability. The Lasso and EN estimators, like their MSE results, show moderate accuracy levels (64.63%), further demonstrating their balanced performance. The consistency of high accuracy among the ridge estimators, despite variations in their specific shrinkage parameters, suggests that these methods are robust in classification tasks, potentially due to their specific adaptations to the dataset's characteristics. Overall, the findings suggest that shrinkage methods, particularly GRR, are effective in minimizing prediction errors and maintaining high classification accuracy for large data, highlighting the importance of regularization in statistical modeling.

4.2. Cancer Remission Data

In contrast to the previous analysis, which examined a larger dataset of 271 municipalities, we now focus on a smaller dataset to further evaluate the performance of the estimators. Specifically, we analyze the Cancer Remission dataset [56,57], which consists of 27 observations. This dataset provides a valuable opportunity to assess the performance of estimators in a setting with a limited sample size, offering insights into their robustness and effectiveness in small-sample scenarios. The response variable is binary, indicating whether a patient achieves complete cancer remission (Y = 1) or not (Y = 0). The dataset includes observations from 27 patients, of whom nine experienced complete remission. The explanatory variables in the dataset are standardized such that $X'X$ corresponds to a correlation matrix.
The condition number is computed as $\kappa = \lambda_{\max}/\lambda_{\min} = 201.33$. This high condition number, together with Figure 3, strongly suggests the presence of severe multicollinearity, which can adversely impact the stability and reliability of parameter estimates in regression models [55].
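A minimal sketch of this diagnostic for a predictor matrix X (illustrative names; the eigenvalues are those of the standardized cross-product matrix, so their ratio is unaffected by the scaling constant):

```r
# Condition number: ratio of the largest to the smallest eigenvalue of X'X
# computed from standardized predictors.
cond_number <- function(X) {
  ev <- eigen(crossprod(scale(X)), only.values = TRUE)$values
  max(ev) / min(ev)
}
```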
Table 7 presents the MSE and accuracy for various models, including ridge regression with different shrinkage estimators, LR, Lasso, EN, and GRR. Several models, particularly those with high MSE values like LR, Lasso, EN, and some ridge estimators, achieve perfect accuracy (100%). However, with a sample size of only 27, this perfect accuracy strongly suggests overfitting. Overfitting occurs when a model conforms too closely to the training data, picking up noise instead of true patterns, which results in weak performance on new, unseen data. The models with lower MSE values, such as $k_{L2}$, $k_{L3}$, and $k_{AS}$, while not achieving perfect accuracy, might offer a more robust representation of the underlying relationships in the data, as they are less prone to overfitting.

5. Concluding Remarks

This study aims to evaluate the effectiveness of ridge regression, GRR, Lasso, and Elastic Net within the context of a logistic regression model by balancing the MSE with prediction accuracy. Given that a theoretical assessment of the estimators cannot be performed, a comprehensive simulation study has been carried out to assess their performance across various parametric conditions. The simulation studies, focusing on varying levels of correlation (0.80, 0.90, 0.95, and 0.99), consistently demonstrate the superior performance of ridge estimators with varying penalties, particularly $k_{AL1}$, $k_{AL2}$, $k_{KL1}$, and $k_{KL2}$, for small samples and GRR for large samples in terms of both MSE and accuracy. These methods exhibit desirable characteristics, such as lower MSE, indicating a better fit and consistent reduction of MSE with increasing sample size, highlighting robustness. Furthermore, they achieve high accuracy, suggesting reliable prediction capabilities even under severe multicollinearity. The simulation findings are corroborated by the real-world applications involving municipal and cancer remission data. Specifically, in the municipal data, characterized by high correlations among predictors, the GRR achieves the lowest MSE, aligning with the simulation results and demonstrating its effectiveness in mitigating overfitting and improving prediction accuracy in large multicollinear datasets. In the cancer remission data, a small-sample scenario with high multicollinearity, the models with lower MSE, including several ridge estimators and GRR, exhibit more realistic accuracy, suggesting robustness and reduced overfitting compared to models with inflated accuracy and high MSE, which are indicative of overfitting.
The alignment of the simulation findings with actual applications highlights the significance of using suitable statistical techniques, especially when dealing with multicollinearity and different sample sizes. The ridge estimators and GRR emerge as robust and reliable choices, demonstrating their ability to balance bias and variance, minimize prediction errors, and maintain high classification accuracy. These findings have significant implications for practical applications, highlighting the potential of these methods to enhance predictive modeling in diverse fields where multicollinearity and limited sample sizes are common challenges.
While ridge estimators and GRR show strong predictive performance, it is important to note the limitations of logistic regression, particularly the assumption of linearity in the logit and the interpretability challenges posed by coefficient shrinkage, which can bias odds ratio estimates [58]. Future research could explore advanced regularization techniques such as Smoothly Clipped Absolute Deviation (SCAD), which was proposed by Fan and Li (2001) [59], and adaptive Lasso, which offers variable selection capabilities and improved theoretical properties. Bayesian ridge regression also provides a flexible framework for incorporating prior information and quantifying uncertainty. Additionally, machine learning models, such as random forests and gradient boosting, can effectively handle multicollinearity and capture complex nonlinear relationships, offering valuable alternatives for high-dimensional and correlated data [48].

Author Contributions

Conceptualization, H.M.N. and B.M.G.K.; background research, S.A.; methodology development, H.M.N. and B.M.G.K.; formal analysis and interpretation, H.M.N., S.A. and B.M.G.K.; writing-original draft preparation, H.M.N. and B.M.G.K.; writing-review and editing, H.M.N., S.A. and B.M.G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We are grateful to the Editor and reviewers for their valuable comments and suggestions, which have significantly improved the overall quality and presentation of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
  2. Menard, S.W. Logistic Regression: From Introductory to Advanced Concepts and Applications; Sage: Thousand Oaks, CA, USA, 2010. [Google Scholar]
  3. Allison, P.D. Logistic Regression Using SAS: Theory and Application; SAS Institute: Cary, NC, USA, 2012. [Google Scholar]
  4. Fox, J.; Monette, G. Generalized collinearity diagnostics. J. Am. Stat. Assoc. 1992, 87, 178–183. [Google Scholar] [CrossRef]
  5. Gujarati, D.N. Basic Econometrics, 4th ed.; McGraw-Hill Higher Education: New York, NY, USA, 2002. [Google Scholar]
  6. Field, A. Discovering Statistics Using IBM SPSS Statistics; Sage Publications Limited: London, UK, 2024. [Google Scholar]
  7. Agresti, A. Foundations of Linear and Generalized Linear Models; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  8. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; Volume 2, pp. 1–758. [Google Scholar]
  9. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  10. McDonald, G.C.; Schwing, R.C. Instabilities of regression estimates relating air pollution to mortality. Technometrics 1973, 15, 463–481. [Google Scholar] [CrossRef]
  11. Hoerl, A.E.; Kannard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat.-Theory Methods 1975, 4, 105–123. [Google Scholar] [CrossRef]
  12. McDonald, G.C.; Galarneau, D.I. A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 1975, 70, 407–416. [Google Scholar] [CrossRef]
  13. Lawless, J.F.; Wang, P. A simulation study of ridge and other regression estimators. Commun. Stat.-Theory Methods 1976, 5, 307–323. [Google Scholar]
  14. Dempster, A.P.; Schatzoff, M.; Wermuth, N. A simulation study of alternatives to ordinary least squares. J. Am. Stat. Assoc. 1977, 72, 104. [Google Scholar]
  15. Gibbons, D.G. A simulation study of some ridge estimators. J. Am. Stat. Assoc. 1981, 76, 131–139. [Google Scholar] [CrossRef]
  16. Schaeffer, R.L.; Gunst, R.F.; Mason, R.L. A ridge logistic estimator. Commun. Stat.-Theory Methods 1984, 13, 99–113. [Google Scholar] [CrossRef]
  17. Schaefer, R.L. Alternative estimators in logistic regression when the data are collinear. J. Stat. Comput. Simul. 1986, 25, 75–91. [Google Scholar] [CrossRef]
  18. Walker, E.; Birch, J.B. Influence measures in ridge regression. Technometrics 1988, 30, 221–227. [Google Scholar] [CrossRef]
  19. Kibria, B.M.G. Performance of some new ridge regression estimators. Commun. Stat.-Simul. Comput. 2003, 32, 419–435. [Google Scholar] [CrossRef]
  20. Khalaf, G.; Shukur, G. Choosing ridge parameter for regression problems. Commun. Stat.-Theory Methods 2005, 34, 1177–1182. [Google Scholar] [CrossRef]
  21. Muniz, G.; Kibria, B.M.G. On some ridge regression estimators: An empirical comparisons. Commun. Stat.-Simul. Comput. 2009, 38, 621–630. [Google Scholar] [CrossRef]
  22. Månsson, K.; Shukur, G.; Kibria, B.M.G. A simulation study of some ridge regression estimators under different distributional assumptions. Commun. Stat.-Simul. Comput. 2010, 39, 1639–1670. [Google Scholar] [CrossRef]
  23. Kibria, B.M.G.; Månsson, K.; Shukur, G. Performance of some logistic ridge regression estimators. Comput. Econ. 2012, 40, 401–414. [Google Scholar] [CrossRef]
  24. Hefnawy, A.E.; Farag, A. A combined nonlinear programming model and Kibria method for choosing ridge parameter regression. Commun. Stat.-Simul. Comput. 2014, 43, 1442–1470. [Google Scholar] [CrossRef]
  25. Aslam, M. Performance of Kibria’s method for the heteroscedastic ridge regression model: Some Monte Carlo evidence. Commun. Stat.-Simul. Comput. 2014, 43, 673–686. [Google Scholar] [CrossRef]
  26. Dorugade, A.V. On comparison of some ridge parameters in ridge regression. Sri Lankan J. Appl. Stat. 2014, 15, 31. [Google Scholar] [CrossRef]
  27. Arashi, M.; Valizadeh, T. Performance of Kibria’s methods in partial linear ridge regression model. Stat. Pap. 2015, 56, 231–246. [Google Scholar] [CrossRef]
  28. Ayinde, K.; Lukman, A.F. Review and classification of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2016, 46, 1. [Google Scholar]
  29. Lukman, A.F.; Ayinde, K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–968. [Google Scholar] [CrossRef]
  30. Melkumova, L.E.; Shatskikh, S.Y. Comparing ridge and LASSO estimators for data analysis. Procedia Eng. 2017, 201, 746–755. [Google Scholar] [CrossRef]
  31. Herawati, N.; Nisa, K.; Setiawan, E. Regularized multiple regression methods to deal with severe multicollinearity. Int. J. Stat. Appl. 2018, 8, 167–172. [Google Scholar]
  32. Lukman, A.F.; Oluyemi, O.A.; Akanbi, O.B.; Clement, O.A. Classification-based ridge estimation techniques of Alkhamisi methods. J. Probab. Stat. Sci. 2018, 16, 165–181. [Google Scholar]
  33. Lukman, A.F.; Ayinde, K.; Binuomote, S.; Clement, O.A. Modified ridge-type estimator to combat multicollinearity: Application to chemical data. J. Chemom. 2019, 33, e3125. [Google Scholar] [CrossRef]
  34. Kibria, B.M.G.; Lukman, A.F. A new ridge-type estimator for the linear regression model: Simulations and applications. Scientifica 2020, 2020, 9758378. [Google Scholar] [CrossRef]
  35. Yüzbaşı, B.; Arashi, M.; Ejaz Ahmed, S. Shrinkage estimation strategies in generalised ridge regression models: Low/high-dimension regime. Int. Stat. Rev. 2020, 88, 229–251. [Google Scholar] [CrossRef]
  36. Kibria, B.M.G. More than hundred (100) estimators for estimating the shrinkage parameter in a linear and generalized linear ridge regression models. J. Econom. Stat. 2023, 2, 233–252. [Google Scholar]
  37. Hoque, M.A.; Kibria, B.M. Some one and two parameter estimators for the multicollinear Gaussian linear regression model: Simulations and applications. Surv. Math. Its Appl. 2023, 18, 183–221. [Google Scholar]
  38. Mermi, S.; Akkuş, Ö.; Göktaş, A.; Gündüz, N. A new robust ridge parameter estimator having no outlier and ensuring normality for linear regression model. J. Radiat. Res. Appl. Sci. 2024, 17, 100788. [Google Scholar] [CrossRef]
  39. Nayem, H.M.; Aziz, S.; Kibria, B.G. Comparison among ordinary least squares, ridge, lasso, and elastic net estimators in the presence of outliers: Simulation and application. Int. J. Stat. Sci. 2024, 24, 25–48. [Google Scholar] [CrossRef]
  40. Hoque, M.A.; Kibria, B.M.G. Performance of some estimators for the multicollinear logistic regression model: Theory, simulation, and applications. Res. Stat. 2024, 2, 2364747. [Google Scholar] [CrossRef]
  41. Alkhamisi, M.A.; Shukur, G. A Monte Carlo study of recent ridge parameters. Commun. Stat.-Simul. Comput. 2007, 36, 535–547. [Google Scholar] [CrossRef]
  42. Alkhamisi, M.; Khalaf, G.; Shukur, G. Some modifications for choosing ridge parameters. Commun. Stat.-Theory Methods 2006, 35, 2005–2020. [Google Scholar] [CrossRef]
  43. Muniz, G.; Kibria, B.M.G.; Månsson, K.; Shukur, G. On developing ridge regression parameters: A graphical investigation. Sort 2012, 36, 115–138. [Google Scholar]
  44. Yang, S.-P.; Emura, T. A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing. Commun. Stat.-Simul. Comput. 2017, 46, 6083–6105. [Google Scholar] [CrossRef]
  45. Emura, T.; Matsumoto, K.; Uozumi, R.; Michimae, H. g.ridge: An R package for generalized ridge regression for sparse and high-dimensional linear models. Symmetry 2024, 16, 223. [Google Scholar] [CrossRef]
  46. Yuan, M.; Lin, Y. On the non-negative garrotte estimator. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 143–161. [Google Scholar] [CrossRef]
  47. Lounici, K. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2008, 2, 90–102. [Google Scholar] [CrossRef]
  48. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  49. Park, H.; Konishi, S. Robust logistic regression modelling via the elastic net-type regularization and tuning parameter selection. J. Stat. Comput. Simul. 2016, 86, 1450–1461. [Google Scholar] [CrossRef]
  50. Emmert-Streib, F.; Dehmer, M. High-dimensional LASSO-based computational regression models: Regularization, shrinkage, and selection. Mach. Learn. Knowl. Extr. 2019, 1, 359–383. [Google Scholar] [CrossRef]
  51. Alheety, M.I.; Nayem, H.M.; Kibria, B.G. An unbiased convex estimator depending on prior information for the classical linear regression model. Stats 2025, 8, 16. [Google Scholar] [CrossRef]
  52. Newhouse, J.P.; Oman, S.D. An Evaluation of Ridge Estimators; RAND: Santa Monica, CA, USA, 1971. [Google Scholar]
  53. Baratloo, A.; Hosseini, M.; Negida, A.; El Ashal, G. Part 1: Simple definition and calculation of accuracy, sensitivity and specificity. Emergency 2015, 3, 48. [Google Scholar]
  54. Asar, Y.; Genç, A. New shrinkage parameters for the Liu-type logistic estimators. Commun. Stat.-Simul. Comput. 2016, 45, 1094–1103. [Google Scholar] [CrossRef]
  55. Weissfeld, L.A.; Sereika, S.M. A multicollinearity diagnostic for generalized linear models. Commun. Stat.-Theory Methods 1991, 20, 1183–1198. [Google Scholar] [CrossRef]
  56. Ertan, E.; Akay, K.U. Identifying a class of ridge-type estimators in binary logistic regression models. Statistics 2024, 58, 1092–1116. [Google Scholar] [CrossRef]
  57. Lesaffre, E.; Marx, B.D. Collinearity in generalized linear regression. Commun. Stat.-Theory Methods 1993, 22, 1933–1952. [Google Scholar] [CrossRef]
  58. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  59. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
Figure 1. Correlation matrix of municipal data.
Figure 2. MSE and Accuracy comparison.
Figure 3. Correlation matrix of cancer remission data.
Table 1. Ridge penalties compared in this study.
Ridge Penalties | Reference
k A S   : m a x 1 q i [41]
k A 1   : m a x s 2 θ ^ i 2 [42]
k D   : σ ^ 2 m i n   ( σ ^ 2 α ^ i 2 + 1 λ i ) [25]
k L A 1 : p i = 1 p m a x λ i σ ^ 2 ( n p ) σ ^ 2 + m a x λ i a ^ i 2 [27]
k K 1 : m e d i a n σ ^ 2 θ ^ i 2 + 1 λ i [36]
k K 2 : m a x σ ^ 2 θ ^ i 2 + 1 λ i [36]
k K 3 : m i n σ ^ 2 θ ^ i 2 + 1 λ i [36]
k K 4 : G e o m e t r i c mean σ ^ 2 θ ^ i 2 + 1 λ i [36]
k K 5 : H a r m o n i c m e a n σ ^ 2 θ ^ i 2 + 1 λ i [36]
k L A 2 : m e d i a n m a x   λ i σ ^ 2 ( n p ) σ ^ 2 + λ i α ^ i 2 [27]
k L 1 : m a x m a x   λ i σ ^ 2 ( n p ) σ ^ 2 + m a x   λ i a ^ i 2 [32]
k L A 3 : p i = 1 p λ i σ ^ 2 ( n p ) σ ^ 2 + λ i a ^ i 2 [27]
k A 2 : 1 p i = 1 p m a x   λ i σ ^ 2 ( n p ) σ ^ 2 + λ i α ^ i 2 [42]
k M 1 : m a x 1 λ i σ ^ 2 ( n p ) σ ^ 2 + λ i a ^ i 2 [43]
k M 2 : i = 1 p m a x   λ i σ ^ 2 ( n p ) σ ^ 2 + λ i α ^ i 2 1 p [43]
k L 2 : m a x 1 / λ i σ ^ 2 ( n p ) σ ^ 2 + m a x   λ i α ^ i 2 [32]
k L 3 : m a x 1 / λ i σ ^ 2 ( n p ) σ ^ 2 + λ i m a x   α ^ i 2 [32]
k K 6 : i = 0 p 1 q i 1 p [21]
k L 4 : m a x 1 / λ i σ ^ 2 ( n p ) σ ^ 2 + λ i m a x   α ^ i 2 [32]
k A L 1 : 1 2 σ ^ 2 m a x   λ i   m a x a ^ i 2 [28]
k A L 2 : 1 2 p σ ^ 2 m a x   λ i i = 1 p a ^ i 2 [28]
k K L 1 : σ ^ p 1 + p n [34]
k K L 2 : σ ^ × m a x p 1 + p n , p 1 + 1 p [34]
Table 2. Estimated MSE values and Accuracy for correlation 0.80.
Model | MSE: P = 3 (n = 100, 200, 300), P = 5 (n = 100, 200, 300), P = 10 (n = 100, 200, 300) | Accuracy: P = 3 (n = 100, 200, 300), P = 5 (n = 100, 200, 300), P = 10 (n = 100, 200, 300)
k A S 0.0990.0990.1000.0960.0980.0990.0910.0950.0970.7360.7340.7340.7700.7700.7700.8150.8160.818
k A 1 0.1000.1000.1000.0990.0990.1000.0970.0980.0990.7360.7340.7340.7700.7700.7700.8150.8160.818
k D 0.2970.2230.1920.2470.1540.1110.6020.1870.1200.7300.7320.7330.7570.7630.7660.7870.7990.808
k L A 1 0.1000.1110.1190.0510.0490.0430.0670.0610.0570.7350.7340.7340.7700.7700.7700.8150.8160.818
k K 1 0.0920.0920.0920.0850.0870.0860.0860.0890.0890.7360.7340.7340.7700.7700.7700.8150.8160.818
k K 2 0.0880.0890.0890.0700.0750.0760.0660.0720.0750.7360.7340.7340.7700.7700.7700.8150.8160.818
k K 3 0.0960.0950.0950.0970.0970.0960.0990.0990.0990.7360.7340.7340.7700.7700.7700.8150.8160.818
k K 4 0.0930.0930.0920.0890.0900.0890.0900.0920.0930.7360.7340.7340.7700.7700.7700.8150.8160.818
k K 5 0.0910.0910.0910.0830.0850.0850.0820.0860.0870.7360.7340.7340.7700.7700.7700.8150.8160.818
k L A 2 0.3580.3410.3280.1450.1430.1250.0560.0630.0610.7300.7310.7320.7620.7640.7660.8090.8090.813
k L 1 0.3580.3420.3280.1450.1430.1250.0560.0630.0610.7300.7310.7320.7620.7640.7660.8090.8090.813
k L A 3 0.1170.1320.1410.0440.0460.0410.0490.0430.0400.7350.7340.7340.7690.7690.7700.8150.8160.818
k A 2 0.3580.3420.3280.1460.1430.1250.0560.0630.0610.7300.7310.7320.7620.7640.7660.8090.8090.813
k M 1 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7360.7340.7340.7700.7700.7700.8150.8160.818
k M 2 0.3580.3420.3280.1460.1430.1250.0560.0630.0610.7300.7310.7320.7620.7640.7660.8090.8090.813
k L 2 0.0970.0980.0980.0940.0960.0960.0930.0950.0960.7360.7340.7340.7700.7700.7700.8150.8160.818
k L 3 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7360.7340.7340.7700.7700.7700.8150.8160.818
k K 6 0.0990.0990.1000.0960.0980.0990.0900.0950.0970.7360.7340.7340.7700.7700.7700.8150.8160.818
k L 4 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7360.7340.7340.7700.7700.7700.8150.8160.818
k A L 1 0.0860.0860.0870.0560.0550.0510.0620.0570.0540.7360.7340.7340.7700.7700.7700.8150.8160.818
k A L 2 0.0890.0900.0900.0490.0490.0450.0500.0460.0440.7360.7340.7340.7700.7700.7700.8150.8160.818
k K L 1 0.0900.0900.0900.0850.0860.0850.0890.0890.0890.7360.7340.7340.7700.7700.7700.8150.8160.818
k K L 2 0.0880.0880.0880.0820.0820.0810.0890.0880.0880.7360.7340.7340.7700.7700.7700.8150.8160.818
LR0.5930.4170.3690.5090.2370.1710.4310.2600.1510.7210.7270.7290.7460.7580.7620.7730.7940.804
Lasso0.3830.3440.3280.2100.1630.1350.1610.1030.0840.7260.7290.7310.7560.7610.7640.7970.8030.809
EN0.3320.3050.2970.1640.1330.1120.1250.0780.0650.7280.7300.7320.7600.7630.7650.8030.8060.811
GRR0.2800.2680.2640.1140.1000.0870.0640.0470.0410.7320.7320.7330.7640.7660.7680.8090.8120.815
Table 3. Estimated MSE values and Accuracy for correlation 0.90.
Model | MSE: P = 3 (n = 100, 200, 300), P = 5 (n = 100, 200, 300), P = 10 (n = 100, 200, 300) | Accuracy: P = 3 (n = 100, 200, 300), P = 5 (n = 100, 200, 300), P = 10 (n = 100, 200, 300)
k A S 0.0980.0990.0990.0950.0970.0980.0880.0930.0960.7460.7460.7460.7820.7820.7820.8300.8310.832
k A 1 0.0990.1000.1000.0980.0990.0990.0960.0980.0980.7460.7460.7460.7820.7820.7820.8300.8310.832
k D 0.3570.2660.2360.3420.2070.1591.0000.3180.2050.7420.7440.7450.7730.7760.7790.8060.8170.822
k L A 1 0.0970.1030.1190.0530.0480.0470.0680.0620.0590.7460.7460.7460.7810.7820.7820.8300.8310.832
k K 1 0.0930.0930.0930.0880.0890.0890.0870.0900.0900.7460.7460.7460.7820.7820.7820.8300.8310.832
k K 2 0.0880.0870.0880.0720.0750.0770.0680.0730.0760.7460.7460.7460.7820.7820.7820.8300.8310.832
k K 3 0.0970.0960.0960.0970.0970.0970.0980.0990.0990.7460.7460.7460.7820.7820.7820.8300.8310.832
k K 4 0.0940.0930.0930.0910.0910.0910.0910.0930.0940.7460.7460.7460.7820.7820.7820.8300.8310.832
k K 5 0.0910.0900.0910.0840.0860.0870.0840.0870.0890.7460.7460.7460.7820.7820.7820.8300.8310.832
k L A 2 0.3500.3430.3470.1340.1430.1450.0440.0530.0610.7420.7430.7440.7780.7780.7800.8270.8280.829
k L 1 0.3500.3430.3470.1340.1430.1450.0440.0530.0610.7420.7430.7440.7790.7780.7800.8270.8280.829
k L A 3 0.1160.1250.1450.0460.0440.0460.0490.0430.0420.7460.7460.7460.7810.7820.7820.8300.8310.832
k A 2 0.3510.3440.3470.1350.1430.1450.0440.0540.0610.7420.7430.7440.7790.7780.7800.8280.8280.829
k M 1 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7460.7460.7460.7820.7820.7820.8300.8310.832
k M 2 0.3510.3440.3470.1350.1430.1450.0440.0540.0610.7420.7430.7440.7790.7780.7800.8280.8280.829
k L 2 0.0970.0980.0980.0950.0960.0970.0940.0960.0960.7460.7460.7460.7820.7820.7820.8300.8310.832
k L 3 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7460.7460.7460.7820.7820.7820.8300.8310.832
k K 6 0.0980.0990.0990.0940.0970.0980.0870.0930.0950.7460.7460.7460.7820.7820.7820.8300.8310.832
k L 4 0.1000.1000.1000.1000.1000.1000.1000.1000.1000.7460.7460.7460.7820.7820.7820.8300.8310.832
k A L 1 0.0840.0810.0870.0590.0550.0550.0650.0600.0580.7460.7460.7460.7810.7820.7820.8300.8310.832
k A L 2 0.0860.0840.0900.0520.0490.0490.0530.0490.0480.7460.7460.7460.7810.7820.7820.8300.8310.832
k K L 1 0.0890.0880.0900.0840.0840.0840.0880.0880.0890.7460.7460.7460.7820.7820.7820.8300.8310.832
k K L 2 0.0870.0850.0870.0810.0800.0800.0880.0870.0870.7460.7460.7460.7820.7820.7820.8300.8310.832
LR0.7860.4900.4240.8710.3620.2622.9870.5190.2940.7310.7380.7410.7590.7700.7750.7870.8110.818
Lasso0.4610.3850.3680.3020.2160.1890.2770.1530.1290.7380.7410.7420.7730.7750.7780.8160.8220.825
EN0.3760.3260.3190.2220.1600.1440.2050.1030.0900.7410.7430.7430.7760.7770.7790.8220.8250.827
GRR0.2890.2630.2680.1290.1040.0990.0690.0520.0480.7430.7440.7450.7790.7800.7810.8260.8280.830
Table 4. Estimated MSE values and Accuracy for correlation 0.95.

Model | MSE, P = 3 | MSE, P = 5 | MSE, P = 10 | Accuracy, P = 3 | Accuracy, P = 5 | Accuracy, P = 10
(within each block, the three entries correspond to n = 100, 200, and 300)
k_AS | 0.098 0.099 0.099 | 0.094 0.097 0.098 | 0.087 0.092 0.095 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_A1 | 0.099 0.100 0.100 | 0.098 0.099 0.099 | 0.095 0.097 0.098 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_D | 0.445 0.342 0.279 | 0.469 0.274 0.212 | 2.304 0.526 0.338 | 0.750 0.747 0.750 | 0.780 0.784 0.785 | 0.818 0.825 0.829
k_LA1 | 0.100 0.114 0.115 | 0.053 0.047 0.047 | 0.068 0.062 0.059 | 0.753 0.749 0.751 | 0.787 0.787 0.788 | 0.837 0.837 0.837
k_K1 | 0.095 0.096 0.095 | 0.091 0.092 0.092 | 0.091 0.092 0.092 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_K2 | 0.090 0.090 0.088 | 0.072 0.076 0.077 | 0.070 0.076 0.078 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_K3 | 0.098 0.098 0.097 | 0.097 0.098 0.098 | 0.099 0.099 0.099 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_K4 | 0.095 0.095 0.094 | 0.092 0.093 0.093 | 0.093 0.094 0.095 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_K5 | 0.092 0.093 0.092 | 0.086 0.088 0.089 | 0.087 0.089 0.091 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_LA2 | 0.343 0.366 0.354 | 0.113 0.132 0.145 | 0.037 0.044 0.051 | 0.751 0.748 0.750 | 0.786 0.786 0.787 | 0.836 0.836 0.836
k_L1 | 0.344 0.366 0.354 | 0.113 0.132 0.145 | 0.037 0.044 0.051 | 0.751 0.748 0.750 | 0.786 0.786 0.787 | 0.836 0.836 0.836
k_LA3 | 0.121 0.139 0.141 | 0.045 0.042 0.046 | 0.049 0.043 0.041 | 0.753 0.749 0.751 | 0.787 0.787 0.788 | 0.837 0.837 0.837
k_A2 | 0.344 0.367 0.355 | 0.114 0.132 0.145 | 0.037 0.044 0.051 | 0.751 0.748 0.750 | 0.786 0.786 0.787 | 0.836 0.836 0.836
k_M1 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_M2 | 0.344 0.367 0.355 | 0.114 0.132 0.145 | 0.037 0.044 0.051 | 0.751 0.748 0.750 | 0.786 0.786 0.787 | 0.836 0.836 0.836
k_L2 | 0.098 0.099 0.099 | 0.096 0.097 0.098 | 0.096 0.097 0.097 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_L3 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_K6 | 0.098 0.099 0.099 | 0.094 0.097 0.098 | 0.086 0.092 0.095 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_L4 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_AL1 | 0.086 0.088 0.085 | 0.060 0.055 0.056 | 0.065 0.061 0.059 | 0.753 0.749 0.751 | 0.787 0.787 0.788 | 0.837 0.837 0.837
k_AL2 | 0.088 0.092 0.088 | 0.052 0.048 0.050 | 0.054 0.050 0.049 | 0.753 0.749 0.751 | 0.787 0.787 0.788 | 0.837 0.837 0.837
k_KL1 | 0.089 0.090 0.089 | 0.083 0.083 0.084 | 0.088 0.088 0.088 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
k_KL2 | 0.087 0.088 0.086 | 0.080 0.079 0.080 | 0.088 0.087 0.087 | 0.752 0.749 0.751 | 0.787 0.787 0.789 | 0.837 0.837 0.837
LR | 1.204 0.670 0.524 | 7.344 0.626 0.420 | 6.595 1.045 0.572 | 0.737 0.742 0.747 | 0.766 0.776 0.781 | 0.794 0.816 0.823
Lasso | 0.600 0.481 0.427 | 0.424 0.297 0.256 | 0.416 0.232 0.188 | 0.747 0.746 0.749 | 0.780 0.782 0.784 | 0.828 0.830 0.832
EN | 0.463 0.384 0.348 | 0.296 0.199 0.177 | 0.281 0.142 0.116 | 0.749 0.747 0.750 | 0.783 0.785 0.786 | 0.831 0.833 0.834
GRR | 0.302 0.282 0.262 | 0.124 0.101 0.098 | 0.065 0.050 0.045 | 0.751 0.749 0.751 | 0.786 0.786 0.788 | 0.835 0.836 0.836
Table 5. Estimated MSE values and Accuracy for correlation 0.99.

Model | MSE, P = 3 | MSE, P = 5 | MSE, P = 10 | Accuracy, P = 3 | Accuracy, P = 5 | Accuracy, P = 10
(within each block, the three entries correspond to n = 100, 200, and 300)
k_AS | 0.098 0.099 0.099 | 0.094 0.097 0.098 | 0.086 0.092 0.094 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_A1 | 0.099 0.100 0.100 | 0.098 0.099 0.099 | 0.095 0.097 0.098 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_D | 0.824 0.528 0.382 | 1.250 0.617 0.413 | 4.806 1.625 1.010 | 0.754 0.755 0.754 | 0.790 0.790 0.790 | 0.828 0.835 0.838
k_LA1 | 0.099 0.111 0.111 | 0.058 0.052 0.049 | 0.069 0.063 0.060 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K1 | 0.099 0.099 0.099 | 0.098 0.098 0.098 | 0.097 0.097 0.097 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K2 | 0.090 0.090 0.088 | 0.078 0.080 0.080 | 0.078 0.082 0.083 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K3 | 0.099 0.099 0.099 | 0.099 0.099 0.099 | 0.099 0.099 0.099 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K4 | 0.097 0.097 0.097 | 0.097 0.097 0.097 | 0.097 0.098 0.098 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K5 | 0.093 0.094 0.093 | 0.091 0.092 0.093 | 0.093 0.095 0.095 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_LA2 | 0.293 0.325 0.328 | 0.097 0.110 0.118 | 0.030 0.032 0.035 | 0.756 0.755 0.754 | 0.795 0.792 0.791 | 0.842 0.843 0.843
k_L1 | 0.293 0.325 0.328 | 0.097 0.110 0.118 | 0.030 0.032 0.035 | 0.756 0.755 0.754 | 0.795 0.792 0.791 | 0.842 0.843 0.843
k_LA3 | 0.120 0.137 0.137 | 0.052 0.049 0.048 | 0.050 0.044 0.042 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_A2 | 0.294 0.326 0.328 | 0.097 0.110 0.118 | 0.030 0.032 0.035 | 0.756 0.755 0.754 | 0.795 0.792 0.791 | 0.842 0.843 0.843
k_M1 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_M2 | 0.294 0.326 0.328 | 0.097 0.110 0.118 | 0.030 0.032 0.035 | 0.756 0.755 0.754 | 0.795 0.792 0.791 | 0.842 0.843 0.843
k_L2 | 0.099 0.099 0.099 | 0.098 0.099 0.099 | 0.098 0.098 0.099 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_L3 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_K6 | 0.098 0.099 0.099 | 0.093 0.097 0.098 | 0.085 0.092 0.094 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_L4 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.100 0.100 0.100 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_AL1 | 0.085 0.087 0.083 | 0.062 0.058 0.056 | 0.063 0.059 0.057 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_AL2 | 0.088 0.090 0.086 | 0.056 0.053 0.051 | 0.052 0.049 0.048 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_KL1 | 0.089 0.089 0.088 | 0.084 0.084 0.084 | 0.088 0.088 0.088 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
k_KL2 | 0.087 0.087 0.085 | 0.081 0.080 0.080 | 0.088 0.087 0.086 | 0.756 0.755 0.754 | 0.795 0.792 0.792 | 0.842 0.843 0.843
LR | 4.497 2.044 1.369 | 7.543 2.699 1.709 | 9.653 5.259 2.754 | 0.740 0.748 0.749 | 0.774 0.780 0.783 | 0.796 0.822 0.829
Lasso | 1.356 0.901 0.729 | 1.217 0.744 0.586 | 1.711 0.580 0.452 | 0.753 0.753 0.752 | 0.791 0.789 0.789 | 0.836 0.840 0.840
EN | 1.051 0.671 0.522 | 0.918 0.477 0.359 | 1.024 0.332 0.252 | 0.753 0.754 0.753 | 0.792 0.790 0.790 | 0.839 0.842 0.842
GRR | 0.281 0.265 0.244 | 0.114 0.097 0.089 | 0.045 0.037 0.034 | 0.756 0.755 0.754 | 0.795 0.792 0.791 | 0.842 0.843 0.843
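As a rough illustration of how simulated comparisons of this kind can be reproduced, the following Python sketch generates equicorrelated predictors, draws a binary response from a logistic model, and reports coefficient MSE and in-sample accuracy for plain logistic regression, lasso, elastic-net, and ridge fits via scikit-learn. This is only a minimal sketch under placeholder settings (fixed penalty strengths and an illustrative true coefficient vector); it is not the simulation code used to produce the tables above.

# Minimal sketch (not the authors' simulation code): simulate collinear data and
# compare coefficient MSE and classification accuracy for LR, Lasso, EN, and Ridge.
# The penalty strengths (C values) below are arbitrary placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, rho = 100, 5, 0.95
beta_true = np.ones(p) / np.sqrt(p)            # illustrative true coefficients

# Equicorrelated design: correlation rho between every pair of predictors.
Sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

models = {
    "LR":    LogisticRegression(penalty=None, max_iter=5000),      # needs scikit-learn >= 1.2
    "Lasso": LogisticRegression(penalty="l1", C=1.0, solver="liblinear", max_iter=5000),
    "EN":    LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=1.0,
                                solver="saga", max_iter=5000),
    "Ridge": LogisticRegression(penalty="l2", C=1.0, max_iter=5000),
}

for name, model in models.items():
    model.fit(X, y)
    beta_hat = model.coef_.ravel()
    mse = np.mean((beta_hat - beta_true) ** 2)   # MSE of the coefficient estimates
    acc = model.score(X, y)                      # in-sample accuracy, for brevity
    print(f"{name:5s}  MSE = {mse:.3f}  Accuracy = {acc:.3f}")

In a full simulation study, the fit-and-evaluate step would be repeated over many replicated datasets and the MSE and accuracy averaged, which is the kind of summary the tables above report.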
Table 6. Comparison of MSE and Accuracy for municipal data.

Estimator | MSE | Accuracy
k_AS | 0.243 | 0.671
k_A1 | 0.243 | 0.671
k_D | 0.560 | 0.622
k_LA1 | 0.167 | 0.671
k_K1 | 0.250 | 0.671
k_K2 | 0.247 | 0.671
k_K3 | 0.250 | 0.671
k_K4 | 0.249 | 0.671
k_K5 | 0.249 | 0.671
k_LA2 | 0.160 | 0.659
k_L1 | 0.160 | 0.659
k_LA3 | 0.151 | 0.671
k_A2 | 0.160 | 0.659
k_M1 | 0.250 | 0.671
k_M2 | 0.160 | 0.659
k_L2 | 0.250 | 0.671
k_L3 | 0.250 | 0.671
k_K6 | 0.243 | 0.671
k_L4 | 0.250 | 0.671
k_AL1 | 0.199 | 0.671
k_AL2 | 0.179 | 0.671
k_KL1 | 0.217 | 0.671
k_KL2 | 0.209 | 0.671
LR | 0.711 | 0.622
Lasso | 0.187 | 0.646
EN | 0.188 | 0.646
GRR | 0.147 | 0.671
Table 7. Comparison of MSE and Accuracy for cancer remission data.

Estimator | MSE | Accuracy
k_AS | 0.129 | 0.667
k_A1 | 0.123 | 0.667
k_D | 68.356 | 1.000
k_LA1 | 0.434 | 0.667
k_K1 | 0.367 | 0.667
k_K2 | 2.041 | 1.000
k_K3 | 0.139 | 0.667
k_K4 | 0.156 | 0.667
k_K5 | 0.626 | 0.741
k_LA2 | 3.234 | 1.000
k_L1 | 3.138 | 1.000
k_LA3 | 0.761 | 0.778
k_A2 | 3.668 | 1.000
k_M1 | 0.127 | 0.667
k_M2 | 4.478 | 1.000
k_L2 | 0.114 | 0.667
k_L3 | 0.127 | 0.667
k_K6 | 0.244 | 0.667
k_L4 | 0.127 | 0.667
k_AL1 | 0.148 | 0.667
k_AL2 | 0.177 | 0.667
k_KL1 | 0.225 | 0.667
k_KL2 | 0.225 | 0.667
LR | 376.414 | 1.000
Lasso | 33.097 | 1.000
EN | 19.657 | 1.000
GRR | 0.717 | 0.778
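The k_* rows in Tables 2-7 all share the same ridge-type form and differ only in how the shrinkage constant k is chosen. As a hedged illustration of that form, the sketch below implements one common version of the ridge logistic estimator from this literature, beta_R(k) = (X'WX + kI)^(-1) X'WX beta_ML with W = diag(pi_hat(1 - pi_hat)) evaluated at the maximum-likelihood fit, using a simple Hoerl-Kennard-type plug-in for k purely as a placeholder; it is not any of the 23 estimators evaluated above.

# Minimal sketch of a ridge-type logistic estimator with a plug-in k.
# The plug-in used here (1 / max(beta_ml^2)) is an illustrative placeholder,
# not one of the estimators compared in the paper; the intercept is left
# unpenalized and ignored for brevity.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ridge_logistic(X, y, k):
    """Shrink the ML logistic coefficients toward zero with ridge constant k."""
    ml = LogisticRegression(penalty=None, max_iter=5000).fit(X, y)   # scikit-learn >= 1.2
    beta_ml = ml.coef_.ravel()
    pi_hat = ml.predict_proba(X)[:, 1]
    W = np.diag(pi_hat * (1 - pi_hat))
    XtWX = X.T @ W @ X
    return np.linalg.solve(XtWX + k * np.eye(X.shape[1]), XtWX @ beta_ml)

# Example on simulated collinear data (same set-up as the previous sketch).
rng = np.random.default_rng(1)
n, p, rho = 100, 5, 0.95
Sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta_true = np.ones(p) / np.sqrt(p)
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta_ml = LogisticRegression(penalty=None, max_iter=5000).fit(X, y).coef_.ravel()
k = 1.0 / np.max(beta_ml ** 2)               # placeholder Hoerl-Kennard-type choice
beta_ridge = ridge_logistic(X, y, k)
print("MSE(ML)   :", np.mean((beta_ml - beta_true) ** 2))
print("MSE(ridge):", np.mean((beta_ridge - beta_true) ** 2))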