Article

Novel Data-Driven Shrinkage Ridge Parameters for Handling Multicollinearity in Regression Models with Environmental and Chemical Data Applications

by
Muteb Faraj Alharthi
Department of Mathematics and Statistics, College of Science, Taif University, Taif 21994, Saudi Arabia
Axioms 2025, 14(11), 812; https://doi.org/10.3390/axioms14110812
Submission received: 3 September 2025 / Revised: 24 October 2025 / Accepted: 30 October 2025 / Published: 31 October 2025
(This article belongs to the Section Mathematical Analysis)

Abstract

Multicollinearity among predictor variables is a common challenge when modeling chemical and environmental datasets in the physical sciences, often producing unstable and unreliable parameter estimates when fitting regression models. Ridge regression is an effective solution to this problem: it introduces a penalty term (k) that shrinks the parameters to mitigate multicollinearity and balances bias against variance. In this study, we propose three novel shrinkage parameters for ridge regression and use them to develop three ridge-type estimators, referred to as SPS1, SPS2, and SPS3, which are designed to enhance parameter estimation based on the sample size (n), the number of predictors (p), and the standard error (σ). These shrinkage estimators aim to improve the accuracy of regression models in the presence of multicollinearity. To evaluate the performance of the SPS estimators, we conduct comprehensive Monte Carlo simulations comparing them to ordinary least squares (OLS) and other existing estimators under the mean squared error (MSE) criterion. The simulation results demonstrate that the SPS estimators outperform OLS and the other methods. Additionally, we apply the three shrinkage estimators to two real-world environmental and chemical datasets, showing their ability to address multicollinearity compared to OLS and other estimators. The proposed SPS estimators deliver more stable and accurate regression results, contributing to improved decision-making in environmental modeling, pollution analysis, and other scientific research involving correlated variables.

1. Introduction

Linear regression models are widely employed in physical sciences to analyze relationships between variables in fields such as environmental monitoring, material science, chemistry, and geophysics. A critical assumption in these models is that predictor variables are independent; however, physical systems often exhibit strong interdependencies and correlations among measured factors. This multicollinearity presents significant challenges in regression analysis, as it obscures the individual effect of each variable and leads to unstable coefficient estimates. In some cases, parameter estimates may fluctuate dramatically or change signs unexpectedly, undermining the physical interpretability of the model. Such instability reduces the reliability of conclusions drawn about complex physical phenomena and limits the model’s predictive accuracy, complicating the understanding and management of environmental and physical processes.
While many researchers have proposed different shrinkage or ridge parameters to address multicollinearity, only a few are widely examined in the literature. Initially, Hoerl and Kennard [1] developed the ridge regression method, which introduces a penalty parameter (k) to reduce the influence of highly correlated variables and thereby mitigate multicollinearity. In this shrinkage technique, a penalty term is added to stabilize the coefficient estimates, improving the statistical model’s predictive accuracy. However, the ridge parameter introduces a slight bias in the estimators. When the penalty parameter is set to zero, ridge regression reduces to OLS. Over the years, various improvements have been made to ridge regression to enhance its effectiveness in tackling multicollinearity. Rao and Toutenburg [2] proposed generalized ridge regression, offering more flexibility in selecting the penalty terms. Furthermore, the authors in [3,4] improved ridge estimators and the process of shrinking coefficients, enhancing the model’s stability and precision. More recently, the authors of [5,6,7,8,9] introduced new shrinkage parameters to handle severe multicollinearity, further improving the accuracy, reliability, and prediction performance of regression models. One recent study, Ref. [10], introduced several two-parameter ridge estimators, based on the data structure, for handling collinear datasets.
It is clear from the literature that no single ridge estimator achieves optimal performance across all scenarios, creating a significant research gap in addressing multicollinearity effectively. Existing methods often fail to incorporate critical factors, such as sample size, standard error, and the number of predictors, which are crucial for improving the accuracy and reliability of regression estimates. To bridge this gap, we propose three new estimators, denoted as SPS1, SPS2 and SPS3, that specifically address these issues by incorporating the sample size ( n ) , standard error ( σ ) , and number of predictors ( p ) in their formulations. Our newly proposed estimators are designed to mitigate multicollinearity, shrink regression coefficients, and achieve lower MSE as compared to OLS and other existing estimators. Through extensive simulation analysis and practical applications, the newly proposed SPS estimators consistently outperform both OLS and other existing estimators across various scenarios.
The structure of this paper is organized as follows. Section 2 introduces the statistical models, discusses various ridge estimators, and presents the proposed shrinkage parameters. Section 3 details the simulation study conducted to evaluate the performance of the proposed estimators. In Section 4, two real-life datasets are analyzed to demonstrate the practical advantages of the proposed estimators. Finally, the paper concludes with key findings and remarks in Section 5.

2. Materials and Methods

Consider the following multiple linear regression model:
$y = X\alpha + \epsilon$, (1)
where $y = (y_1, y_2, \ldots, y_n)'$ is an $(n \times 1)$ vector of observed responses, $X = (x_{ij})$ is the $(n \times p)$ design matrix, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)'$ is the $(p \times 1)$ vector of unknown parameters, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)'$ is the $(n \times 1)$ error vector.
The OLS estimator is given as follows:
$\hat{\alpha}_{OLS} = (X'X)^{-1}X'y$, (2)
where $\mathrm{Cov}(\hat{\alpha}) = \sigma^2 (X'X)^{-1}$. However, when the predictors are highly correlated, $X'X$ becomes nearly singular ($|X'X| \approx 0$), causing unstable estimates and large variances [11]. To address this issue, ridge regression introduces a shrinkage parameter $k$ ($k > 0$). The ridge regression estimator is given as follows:
$\hat{\alpha}_{RR} = (X'X + kI)^{-1}X'y$, (3)
where $I$ is the identity matrix. This stabilizes the estimates by trading bias for reduced variance: the shrinkage parameter $k$ biases the estimates but reduces their variance relative to the OLS estimator [1]. When $k = 0$, the ridge estimator coincides with the OLS estimator ($\hat{\alpha}_{RR} = \hat{\alpha}_{OLS}$), and as $k \to \infty$, $\hat{\alpha}_{RR} \to 0$.
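As a minimal R sketch of Equations (2) and (3), assuming X is the (centered and scaled) design matrix and y the response vector, the two estimators differ only in the penalty added to X'X:

# Ridge estimator for a given shrinkage parameter k; k = 0 reproduces OLS
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  solve(t(X) %*% X + k * diag(p), t(X) %*% y)   # (X'X + kI)^(-1) X'y
}

# alpha_ols <- ridge_coef(X, y, k = 0)    # Equation (2)
# alpha_rr  <- ridge_coef(X, y, k = 0.5)  # Equation (3) with an illustrative k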
The regression model (1) can be expressed in its canonical form as follows:
$y = Z\beta + \epsilon$, (4)
where $Z = XQ$, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)' = Q'\alpha$, and $Q$ is an orthogonal matrix such that $Q'X'XQ = \Lambda$, with $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ containing the eigenvalues of $X'X$. In this form, the OLS and ridge regression estimators are given in Equations (5) and (6), respectively.
$\hat{\beta}_{OLS} = (Z'Z)^{-1}Z'y$, (5)
$\hat{\beta}_{RR} = (Z'Z + K)^{-1}Z'y$, (6)
where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$ and $k_i > 0$ for all $i = 1, 2, \ldots, p$.
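A brief R sketch of the canonical transformation, assuming X and y are standardized and k_vec holds the p shrinkage values k_i:

# Canonical form: Z = XQ, where Q contains the eigenvectors of X'X
eg       <- eigen(t(X) %*% X)
Q        <- eg$vectors
lambda   <- eg$values                                    # eigenvalues of X'X
Z        <- X %*% Q
beta_ols <- solve(t(Z) %*% Z, t(Z) %*% y)                # Equation (5)
beta_rr  <- solve(t(Z) %*% Z + diag(k_vec), t(Z) %*% y)  # Equation (6), K = diag(k_1, ..., k_p)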
The MSEs of the OLS and ridge regression estimators are given in Equations (7) and (8), respectively.
$MSE(\hat{\beta}_{OLS}) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i}$, (7)
$MSE(\hat{\beta}_{RR}) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k_i)^2} + \sum_{i=1}^{p} \frac{k_i^2 \beta_i^2}{(\lambda_i + k_i)^2}$. (8)
The first term in Equation (8) represents variance, while the second term captures the bias introduced by ridge regression. As k increases, the variance decreases and the bias increases.
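This trade-off can be checked numerically; a small sketch in R, assuming the eigenvalues lambda, the canonical coefficients beta, the error variance sigma2, and a common shrinkage value k are given:

# Theoretical MSE of the ridge estimator (Equation (8)) for a common k
ridge_mse <- function(lambda, beta, sigma2, k) {
  variance <- sigma2 * sum(lambda / (lambda + k)^2)   # decreases as k grows
  bias2    <- sum(k^2 * beta^2 / (lambda + k)^2)      # increases as k grows
  variance + bias2
}
# ridge_mse(lambda, beta, sigma2, k = 0) reduces to sigma2 * sum(1 / lambda), the OLS MSE in Equation (7)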

2.1. Ridge Regression Estimators

In ridge regression, k is often referred to as the ridge, shrinkage, or penalty term, and it plays an important role in addressing collinearity in the data. Its value must be estimated from real data, and much of the recent research in ridge regression has focused on developing methods to determine an optimal k value. In this section, we review various statistical approaches for estimating k and their applications in addressing multicollinearity.

2.1.1. Existing Ridge Estimators

Hoerl and Kennard [1] mathematically determined that the optimal ridge parameter ( k i ) equals the estimated error variance divided by the square of the ordinary least squares coefficient estimate, as presented in Equation (9).
$\hat{k}_i = \frac{\hat{\sigma}^2}{\hat{\beta}_{\max}^2}$, (9)
where $\hat{\sigma}^2$ is the estimated error variance and $\hat{\beta}_{\max} = \max(\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)$. This estimator is known as the HK estimator.
Hoerl et al. [12] further modified the HK estimator and developed the BHK estimator, whose estimated $k$ is defined as follows:
$\hat{k}_{BHK} = \frac{\hat{\sigma}^2}{\sum_{i=1}^{p} \hat{\beta}_i^2}$. (10)
Kibria [13] introduced new ridge estimators for collinear data based on averages: the arithmetic mean, the geometric mean, and the median. These estimators are denoted KAM, KGM, and Kmed in this work. Their estimated $k$ values are defined as follows:
$\hat{k}_{AM} = \frac{1}{p} \sum_{i=1}^{p} \frac{\hat{\sigma}^2}{\hat{\beta}_i^2}$, (11)
$\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left( \prod_{i=1}^{p} \hat{\beta}_i^2 \right)^{1/p}}$, (12)
$\hat{k}_{med} = \mathrm{Med}\left( \frac{\hat{\sigma}^2}{\hat{\beta}_i^2} \right)$. (13)
Khalaf et al. [14] developed a new estimator denoted as KMS by making eigenvalue-based adjustments. The estimated k value for this estimator is expressed as follows:
$\hat{k}_{KMS} = \lambda_{\max} \sum_{i=1}^{p} \frac{\hat{\beta}_i \hat{\sigma}^2}{\hat{\beta}_{\max}^2}$. (14)
Most recently, Ref. [15] introduced several shrinkage parameters to better handle collinear data. Among these, the Balanced Log Ridge Estimator (BLRE) is chosen for comparison with our proposed estimators. The $k$-value for the BLRE is defined as follows:
$\hat{k}_{BLRE} = \frac{1}{p} \sum_{i=1}^{p} \frac{\gamma_i \hat{\sigma}^2}{\hat{\beta}_{\max}^2}$, (15)
where $\gamma_i = \ln\left( \frac{\lambda_i}{\hat{\beta}_i} \right)$.
The newly proposed SPS estimators are compared with OLS and the above ridge estimators (HK, BHK, KAM, KGM, Kmed, KMS, and BLRE) in this research.
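For reference, a sketch of how the simpler of these shrinkage rules can be computed in R from the canonical OLS coefficients beta_ols and the estimated error variance sigma2_hat (both assumed already available); KMS and BLRE additionally require the eigenvalues, as in Equations (14) and (15):

# Existing shrinkage parameters, Equations (9)-(13)
p      <- length(beta_ols)
k_HK   <- sigma2_hat / max(beta_ols^2)            # Equation (9)
k_BHK  <- sigma2_hat / sum(beta_ols^2)            # Equation (10)
k_KAM  <- mean(sigma2_hat / beta_ols^2)           # Equation (11)
k_KGM  <- sigma2_hat / prod(beta_ols^2)^(1 / p)   # Equation (12)
k_Kmed <- median(sigma2_hat / beta_ols^2)         # Equation (13)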

2.1.2. Proposed Ridge-Type Estimators

In this study, we propose three new ridge parameters and use them to establish three new ridge estimators, referred to as SPS1, SPS2, and SPS3, which are based on data components such as the sample size (n), the number of predictor variables (p), and the standard error (σ). The shrinkage parameters $\hat{k}_1$, $\hat{k}_2$, and $\hat{k}_3$ of the three proposed estimators are defined as follows:
SPS1: $\hat{k}_1 = \hat{\sigma}\, p \left(1 + \frac{p}{n}\right) \ln(1 + p)$, (16)
SPS2: $\hat{k}_2 = \hat{\sigma}\, p \left(1 + \frac{p}{n}\right) \left(1 - \frac{p}{n + p}\right)$, (17)
SPS3: $\hat{k}_3 = \hat{\sigma}\, p \left(1 + \frac{p}{n}\right) + \frac{\hat{\sigma}^2}{n + p}$. (18)
The proposed $\hat{k}_1$, $\hat{k}_2$, and $\hat{k}_3$ assist in controlling the influence of correlated independent variables. SPS1 adjusts the penalty with a logarithmic term, which is influenced by both the number of predictors and the sample size. SPS2 introduces a correction factor based on the relationship between p and n. SPS3 incorporates both the standard error and the sum of the sample size and number of predictors, providing a more balanced adjustment to the penalty term.
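A small sketch of the three rules in R, assuming sigma_hat is the estimated standard error; the algebraic grouping follows the reading of Equations (16)-(18) given above and should be checked against the original formulas:

# Proposed SPS shrinkage parameters (Equations (16)-(18), as interpreted here)
sps_k <- function(sigma_hat, n, p) {
  base <- sigma_hat * p * (1 + p / n)
  c(SPS1 = base * log(1 + p),              # logarithmic adjustment
    SPS2 = base * (1 - p / (n + p)),       # correction factor in p and n
    SPS3 = base + sigma_hat^2 / (n + p))   # additive term in sigma^2 and n + p
}
# sps_k(sigma_hat = 1, n = 50, p = 4) returns the three candidate k values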

2.2. Mean Squared Error

The estimators’ performance was evaluated using the MSE criterion, a widely used metric in previous similar studies [16,17,18]. The MSE can be defined as follows:
$MSE(\hat{\beta}) = \frac{1}{N} \sum_{j=1}^{N} (\hat{\beta}_{ij} - \beta_i)'(\hat{\beta}_{ij} - \beta_i)$, $i = 1, 2, \ldots, p$. (19)
As theoretical comparisons can be complex, we examined the performance of these ridge estimators (Equations (9)–(18)) via Monte Carlo simulations, as described in the next section.

3. Monte Carlo Simulation Approach

To assess the ridge estimators, the predictor variables are generated using Equation (20), where $x_{ij}$ is simulated through a standard approach widely used by researchers [19,20,21]:
$x_{ij} = (1 - \rho^2)^{1/2} z_{ji} + \rho z_{i,p+1}$, $j = 1, 2, \ldots, n$; $i = 1, 2, \ldots, p$. (20)
To generate the dependent variable, we assumed the following model:
$y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \cdots + \beta_p X_{pj} + \epsilon_j$, $j = 1, 2, \ldots, n$, (21)
where $y_j$ is the dependent variable, $X_{1j}, X_{2j}, \ldots, X_{pj}$ are the predictor variables, and $\epsilon_j$ is the error term of the linear regression model. The number of observations is denoted by n, and the β coefficients are selected under the constraint β'β = 1. For the model in Equation (21), the intercept term is set to zero ($\beta_0 = 0$).
The correlation (ρ) between the predictor variables takes the values 0.80, 0.85, 0.90, 0.95, and 0.99. The error terms are generated from a normal distribution with zero mean and unit variance. The study explored the impact of varying factors such as the number of independent variables (p), the error variance (σ²), and the sample size (n). The specific values used in the analysis are as follows:
  • Sample sizes: n   = 10 ,   20 ,   50 ,   100 .
  • Number of independent variables: p   =   4 ,   6 ,   8 ,   10 .
  • Error variance: σ 2 = 0.5 ,   1 ,   4 ,   5 ,   7 ,   10 ,   11 .
These variations were considered to examine the influence of these factors on the model’s behavior and results. To evaluate the MSE across different values of ρ, n, and p, N = 10,000 simulations were run in R. The results are presented in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7 and Table A8 (Appendix A).
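A sketch of the data-generating step in R, following Equations (20) and (21); beta is the coefficient vector chosen so that β'β = 1, and the intercept is zero:

# Generate one simulated dataset with p correlated predictors
gen_data <- function(n, p, rho, sigma2, beta) {
  z <- matrix(rnorm(n * (p + 1)), nrow = n)            # independent standard normals
  X <- sqrt(1 - rho^2) * z[, 1:p] + rho * z[, p + 1]   # Equation (20)
  y <- X %*% beta + rnorm(n, sd = sqrt(sigma2))        # Equation (21) with intercept 0
  list(X = X, y = y)
}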
The following steps were used to calculate the MSE for the estimators:
  • Data Preparation
Standardize the predictors using Equation (20) and compute the eigenvalues ($\lambda_1, \lambda_2, \ldots, \lambda_p$) and eigenvectors ($e_1, e_2, \ldots, e_p$) of $X'X$, collected in $P = [e_1, \ldots, e_p]$. Set the true coefficient vector β equal to the eigenvector corresponding to the largest eigenvalue $\lambda_{\max}$ (so that β'β = 1), and generate the errors from N(0, 1).
  • Model Estimation
Generate the response values from the model in Equation (21), then obtain the OLS and ridge estimates using Equations (5) and (6) together with the shrinkage formulas above.
  • MSE Calculation
Repeat the procedure for N = 10,000 Monte Carlo runs and calculate the MSE as follows:
$MSE(\hat{\beta}) = \frac{1}{10000} \sum_{j=1}^{10000} (\hat{\beta}_{ij} - \beta_i)'(\hat{\beta}_{ij} - \beta_i)$.
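The whole procedure can be summarized in a short R loop; a sketch reusing the gen_data and ridge_coef helpers from the earlier sketches, where k_fun maps (sigma_hat, n, p) to a single shrinkage value:

# Monte Carlo estimate of the MSE for one shrinkage rule
mc_mse <- function(N, n, p, rho, sigma2, beta, k_fun) {
  sse <- 0
  for (j in 1:N) {
    d          <- gen_data(n, p, rho, sigma2, beta)
    b_ols      <- ridge_coef(d$X, d$y, k = 0)
    sigma2_hat <- sum((d$y - d$X %*% b_ols)^2) / (n - p)   # residual variance
    k          <- k_fun(sqrt(sigma2_hat), n, p)
    b_ridge    <- ridge_coef(d$X, d$y, k = k)
    sse        <- sse + sum((b_ridge - beta)^2)
  }
  sse / N
}
# Example: MSE of SPS1 at n = 50, p = 4, rho = 0.9, sigma^2 = 1
# mc_mse(10000, 50, 4, 0.9, 1, beta, function(s, n, p) sps_k(s, n, p)[["SPS1"]])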

Discussion of the Simulation Results

The simulation results presented in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7 and Table A8 provided a detailed comparison of the performance of OLS, existing ridge estimators (HK, KMS, KAM, KGM, Kmed, and BLRE), and our newly proposed ridge estimators ( S P S 1 , S P S 2 , S P S 3 ). The study evaluated the effects of sample size ( n ) , number of predictors ( p ) , correlation coefficient ( ρ ) , and error variance ( σ 2 ) on the mean squared error (MSE) of these estimators. The results highlighted the robustness and efficiency of our newly proposed estimators across a wide range of scenarios. Key observations from the tables are summarized below.
  • Performance Across Sample Sizes  ( n ) : The performance of the estimators varies significantly with different sample sizes. As the sample size increased (from n = 10, 20 to n = 100), the MSE tended to decrease across most estimators, which is a common trend in statistical estimation due to the law of large numbers. For instance, in Table A1 ( p = 4, n = 20), the MSE for most estimators was higher, whereas in Table A3 ( p = 4, n = 100), MSE values were noticeably lower. This improvement is particularly evident in the new SPS estimators, especially the SPS1, which consistently exhibited lower MSE compared to OLS and other existing estimators. This suggests that the SPS estimators, particularly the SPS1, are more stable and efficient as the sample size increases, further supporting their robustness in larger datasets.
  • Impact of Number of Predictors  ( p ) : The number of predictors also influenced the MSE of the estimators. As the number of predictors increases from p = 4, 6, 8 and p = 10, the MSE generally increases, especially for estimators like OLS, which struggle more as the model complexity increases. For example, in Table A4 ( p = 10, n = 20), the MSE for OLS was considerably higher than in the p = 4 scenarios, reflecting the difficulty of OLS in dealing with more complex models. The proposed SPS estimators continued to show robust performance across different values of p , especially the SPS1, which tended to outperform other estimators even as p increased. This highlights that the SPS estimators remain effective even in higher-dimensional settings, where traditional estimators like OLS may fail to perform well.
  • Effect of Correlation Coefficient  ( ρ ) : The correlation coefficient had a notable effect on the MSE, with higher correlation leading to increased estimation difficulty. As ρ increased from 0.80 to 0.99, the MSE for most estimators increased, especially for OLS. This is particularly apparent in Table A1, Table A2 and Table A3, where at ρ = 0.99, the MSE for OLS was much higher than at ρ = 0.80. The newly proposed SPS estimators, particularly the SPS1, maintained relatively low MSE even as the correlation increases. This indicates that the SPS estimators are more robust to high correlation, which is often a challenging condition for many existing estimators and OLS. This robustness makes the SPS estimators particularly attractive when high correlations between predictors are present.
  • Influence of Error Variance  ( σ² ) : The influence of error variance was another critical factor affecting the performance of the estimators. As σ² increased from 0.5 to 11, the MSE generally increased for all estimators, indicating that higher error variance leads to greater estimation uncertainty. However, the newly proposed SPS estimators showed a distinct advantage under high error variance conditions. For instance, in Table A4 ( p = 10, n = 20) at σ² = 11, the SPS1 still outperformed OLS and several other existing estimators, suggesting that it can handle larger error variances more effectively. In contrast, traditional estimators like KGM and Kmed showed substantial increases in MSE as error variance rose, especially at ρ = 0.99. This confirms that the SPS estimators, particularly the SPS1, are more robust to high levels of error variance, providing more reliable estimates under such conditions.
  • Comparison with existing estimators: When comparing the performance of the newly proposed SPS estimators with existing ones (HK, BHK, KMS, KAM, KGM, Kmed, BLRE), the SPS estimators generally outperform OLS and most of the existing estimators across different conditions, especially under higher correlations and larger error variances. Among the traditional estimators, KAM, KMS, and BHK were the most competitive, showing lower MSE than OLS in many scenarios, particularly at higher sample sizes and moderate to high correlation. However, the SPS estimators, particularly the SPS1, showed consistently superior performance across a wide range of conditions, including high correlation ( ρ = 0.99) and large error variances ( σ² = 11). The SPS1 consistently provided the lowest MSE across most of the datasets, making it the most reliable estimator in challenging conditions. In contrast, the SPS2 and SPS3 exhibited competitive performance but with slightly higher MSE compared to the SPS1, particularly in cases of very high correlation and large error variances. Nonetheless, they still outperformed OLS and several existing estimators, making them valuable alternatives in many situations.

4. Environmental and Chemical Science Data Applications

Regression analysis involving multicollinear data is of particular interest in environmental, chemical, and physical studies, where predictor interdependence is common.
The performance of our newly developed estimators, OLS, and other existing ridge estimators was evaluated using two real datasets. We considered the environmental Air Pollution Dataset [22] and the chemical Hald Cement Dataset [11]; see also Ref. [23]. These real datasets share features similar to those considered in our earlier simulation work.

4.1. Air Pollution Dataset

This environmental dataset contains 20 real-world measurements of urban nitrogen dioxide (NO2) levels (y), along with humidity (X1), temperature (X2), and air pressure (X3). Of the 15 available predictors, we retained only X1, X2, and X3, since their mutual correlations are pronounced. NO2 ranges from 0.05 ppm (light) to 0.25 ppm (moderate pollution). The natural correlations between weather variables, such as humidity and air pressure, make this dataset useful for testing regression models on atmospheric data. The linear regression model for this dataset can be written as follows:
$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$.
In this equation, β 0 represents the intercept, while β 1 , β 2 and β 3 are the coefficients associated with each predictor variable X 1 ,   X 2   a n d   X 3 , respectively. The term ϵ reflects the error term, which accounts for the difference between the actual and predicted values.
To check whether the dataset is affected by multicollinearity, three tools were used: the Variance Inflation Factor (VIF), the Condition Number (CN), and a heatmap display. For the CN, the eigenvalues of the data are needed; they are $\lambda_1 = 1.09$, $\lambda_2 = 0.05$, and $\lambda_3 = 0.003$.
The presence of near-zero eigenvalues (0.05 and 0.003) shows that the predictors are highly interdependent, with one variable almost perfectly explainable by the others.
A CN greater than 30 indicates the presence of multicollinearity in the dataset. The CN, calculated as the ratio of the largest to the smallest eigenvalue, is
$CN = \frac{\lambda_{\max}}{\lambda_{\min}} \approx 363$.
Additionally, the VIF values for the predictors are VIF(X1) = 0.917, VIF(X2) = 20, and VIF(X3) = 333.
Since the VIF values for X2 and X3 far exceed the threshold of 10, this confirms high multicollinearity in the dataset.
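These diagnostics are straightforward to reproduce in R; a sketch for a generic predictor matrix X (here the three selected columns), not tied to the exact values reported above:

# Collinearity diagnostics: eigenvalues, condition number, and VIFs
R_mat <- cor(X)                  # correlation matrix of the predictors
ev    <- eigen(R_mat)$values     # eigenvalues
CN    <- max(ev) / min(ev)       # condition number; CN > 30 signals multicollinearity
VIF   <- diag(solve(R_mat))      # variance inflation factors; VIF > 10 signals multicollinearity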
From Figure 1, it is also evident that variables exhibit strong correlations with each other; therefore, strong multicollinearity exists in the data. Given that OLS regression would produce unstable coefficient estimates, we employed our newly developed estimators that were designed to handle highly collinear data effectively. These estimators were compared against OLS and other existing methods to demonstrate their superiority in mitigating multicollinearity.
This practical application demonstrated that the proposed estimators perform better than OLS and the other existing approaches. These findings, presented in Table 1, provide strong confirmation of our earlier simulation results.

4.2. Hald Cement Dataset

This dataset consists of 13 observations on five numerical variables: the dependent variable (y) and four independent variables X1, X2, X3, and X4. The regression model is expressed as follows:
$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$.
To check whether the dataset is affected by multicollinearity, two indicators were used: the CN and a heatmap. For the CN, the eigenvalues of the data are needed; they are $\lambda_1 = 1.57$, $\lambda_2 = 0.198$, $\lambda_3 = 0.0126$, and $\lambda_4 = 0.0016$.
Therefore, the CN is approximately 986, which is greater than the threshold of 30. This indicates that the data exhibits multicollinearity. Figure 2 clearly shows that the independent variables are correlated with each other.
The newly proposed and existing estimators were employed to compare their performance based on MSE, in order to identify the best estimator for mitigating multicollinearity.
Table 2 presents the MSE and regression coefficients for both the newly proposed estimators (SPS1, SPS2, SPS3) and the existing ones based on the real Hald Cement Dataset. The results align well with the simulation findings, validating the performance of the proposed estimators. In this analysis, the newly proposed estimators (SPS1, SPS2, SPS3) were compared with OLS and the other existing methods (HK, HKB, KAM, KGM, Kmed, KMS, and BLRE) based on MSE. The SPS1 estimator showed the best performance, with the lowest MSE, clearly better than OLS and the other methods. SPS2 and SPS3 also performed well, each with an MSE of 0.381985, outperforming all of the existing methods except Kmed (0.375257).
Overall, the SPS estimators offer a promising alternative for more accurate predictions in modeling this dataset.
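As a usage illustration (not the exact computation behind Table 2), one can refit the Hald model with any of the data-driven shrinkage values using MASS::lm.ridge, assuming the data are stored in a data frame named hald with columns y, X1, ..., X4; note that lm.ridge applies the penalty to internally scaled predictors, so the output is indicative rather than identical to the canonical-form estimates above.

library(MASS)
# k_hat: a shrinkage value produced by one of the rules above, e.g. sps_k(...)[["SPS1"]]
fit <- lm.ridge(y ~ X1 + X2 + X3 + X4, data = hald, lambda = k_hat)
coef(fit)   # shrunken coefficients for comparison with Table 2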

5. Conclusions

This study comprehensively compared the performance of the newly proposed ridge-type shrinkage estimators (SPS1, SPS2, and SPS3) against OLS and other existing estimators, such as HK, KAM, KGM, Kmed, KMS, and BLRE, under various multicollinearity scenarios that also varied the sample size and error variance. Both the simulations and the real-world data analyses confirmed the superiority of the proposed SPS estimators in mitigating multicollinearity-induced instability while maintaining estimation efficiency.
SPS estimators are recommended for regression analysis involving multicollinear data, particularly in environmental and physical science studies, where predictor interdependence is common.
Future work could explore theoretical extensions in nonlinear or heteroscedastic settings and applications in more complex, big environmental and physical science datasets.

Funding

This research was funded by Taif University.

Data Availability Statement

The datasets that support the findings of this study are available within the article.

Acknowledgments

The author would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University for funding this work.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Table A1. MSE of estimators for ( p = 4 ,   n = 20 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.800.3817220.3363110.2395910.3736870.1993990.2066860.3177930.3412710.0963690.1510970.134367
0.900.8804470.6881870.4694790.8490010.378310.3940130.6628170.9354490.0927130.1732590.146667
0.9910.329115.5095464.4531179.6852123.06542.8088298.316298846.11710.032030.0858750.064564
40.801.0667190.611520.4402640.9721680.2839930.3324240.6284550.5703060.1089930.1893450.160515
0.902.4308391.0950470.7971612.1489140.5011310.5651391.3876918.171530.0888790.1733250.141207
0.9926.219558.2050796.27541221.781192.9318022.32986318.183212512.7410.0319040.0505030.041198
70.8037.903161.0248810.8664546.0712230.7587730.7692884.8656060.7974760.6967690.5744520.652587
0.9087.106021.5016991.11410614.186430.6939710.74262215.009991.0489540.676040.5437440.630084
0.991027.09317.283496.432572163.37360.5680261.076394363.522293.587210.6440370.4946770.592247
110.80102.34471.1123321.0186716.3144810.9053860.9182778.2829850.9249050.8707510.7967830.878491
0.90241.40981.3448211.08658514.713290.8293530.86461425.683580.9064780.8487120.7597320.856623
0.993021.61617.212447.991548189.37030.8121141.380869808.7333567.87430.8515430.7638810.860182
Table A2. MSE of estimators for ( p = 4 ,   n = 50 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.800.1250580.1196280.0989330.1239360.0736360.0813120.1148580.1170470.0669110.0835040.081279
0.900.2411460.2201430.1620370.2372210.1163660.1167240.204620.2147730.0828190.1159150.111081
0.992.4728871.5470491.046382.3524330.8639050.9294621.8159296.1508030.0715140.1511940.136765
40.800.3642840.2752610.1980940.3462670.1245530.1506910.2456590.22670.1027060.1494890.142146
0.900.7229830.4408990.2974240.6667070.1736790.2026470.4013540.3565390.1063910.1738590.16255
0.996.9577252.1466441.7395.9117610.8614280.8343894.0990815.691220.0412720.0953540.084222
70.8012.729080.6389140.7132642.0479040.7488780.743141.163840.7373040.4395550.3587130.382736
0.9024.571160.6608690.6299033.4698130.6314870.6566032.2238920.6442840.3911790.2974320.327522
0.99252.86132.9349181.35340734.473710.3504470.47067654.084552.4815190.3656520.2572850.295085
110.8036.774430.8067270.841382.0152660.8897960.8965741.7207670.893280.7054620.6063510.681401
0.9070.842120.7879150.772633.4670960.8291130.8474213.8497490.815090.663810.5506270.636029
0.99697.73111.7168840.98179528.965690.5657120.63880496.362110.7698510.6359960.511050.60571
Table A3. MSE of estimators for ( p = 4 ,   n = 100 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.800.0731580.0714040.0633380.0727660.0459990.056570.0693210.069820.0502480.0575560.057055
0.900.1581280.1497320.1101350.1564560.0834730.0777610.1413430.1442380.0758820.0966660.095111
0.991.5660581.0905230.6970961.5015630.5787080.6227011.1354161.1771220.0866370.1639960.15661
40.800.2004620.1692050.1175110.193950.0799650.1000830.1492790.141030.0875370.1139450.111849
0.900.4208660.3036820.1979360.3977440.1153850.1257760.2590760.2343390.1040390.1532830.149016
0.994.2644371.6150851.0899593.6843440.5621530.5962362.3564954.7824040.0519890.1150220.108108
70.807.2140690.4873290.6593911.1443710.6873470.674470.6249570.6964490.2488110.2284490.226216
0.9015.435190.4930080.5362022.2504330.5593190.5637571.2152910.5881410.2162710.1821230.184494
0.99163.29341.5705530.66443521.75540.2261230.31775429.206140.5140660.1785410.1215090.131159
110.8021.034220.7562190.8081841.1409870.8788370.8827051.0066580.884620.5026030.4051020.446738
0.9045.246120.659030.6939892.0191120.78360.8036951.6885130.7877610.4634540.3608410.404912
0.99430.9491.1257240.59060916.337930.4710550.54327344.921530.5312090.4342530.3216890.371107
Table A4. MSE of estimators for ( p = 10 ,   n = 20 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.802.2841271.6850510.9033032.2500520.4591750.5037771.6155482.9933220.0453120.1019560.067479
0.904.8597033.2884671.9064094.779080.9133671.0692123.519616384.34840.0342110.086030.052673
0.9949.8618830.684418.4850448.931738.5594889.23774842.4569676,912.960.0207370.0247320.015411
40.806.3868343.07751.492756.1064730.6014890.7597843.7359351.489830.1231050.0974290.078083
0.9013.117626.1797573.01506212.497891.1438381.4876118.107051168.04860.1012420.0716590.057825
0.99137.871161.3614632.09949130.78639.48589610.09598107.865734,245.330.0879690.0234960.030116
70.80233.08138.7855612.365695105.68090.686440.74324853.163024.0742060.8819320.6805490.765443
0.90465.274615.700254.102241210.34720.6762240.820967121.92188.7966270.8680340.6504870.741114
0.994894.56162.807238.511222172.9592.6143386.4066772193.3792757.7170.8534830.6158770.714483
110.80617.90347.8248162.268186164.66070.8144160.85764399.679672.837450.9507560.8486280.899381
0.901383.50815.58333.963246365.97450.9638330.988453270.99786.296080.9470470.8375360.892433
0.9913,426.38128.682737.204993467.911.1531532.5546494770.136110.98930.9435680.8253550.88445
Table A5. MSE of estimators for ( p = 10 ,   n = 50 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.800.4171470.3834210.2341130.4148570.0961170.1001270.3416070.3905640.056660.1240110.110414
0.900.9089360.7621170.4059990.9004440.1746320.1855950.6548830.8284990.053820.148560.127847
0.999.542065.9200853.4694499.3750051.4402311.7627197.1758712217.9590.0167140.0955930.071293
40.801.1730280.7568620.3370561.1421750.1137370.1308680.6549150.7140770.0585510.156340.132142
0.902.3506381.2665010.5746342.2666270.1842770.2348891.1955071.3140970.0447980.1529960.123747
0.9924.545759.0542484.25218223.195581.2947951.74552315.259741463.1810.0130790.0446090.032908
70.8042.576091.1640270.66216619.145740.6440680.595964.9279420.6104930.6360570.375010.424154
0.9085.571561.9629770.67912236.534290.4700640.4412810.944290.7646030.5853380.3119920.362793
0.99881.106921.405824.526549363.26190.2178170.345077236.323512.102840.5406980.2627770.313951
110.80116.03681.0045090.76625426.700640.838240.8148916.9892010.7170930.8403750.6482960.707707
0.90256.63912.1329520.7649458.589450.7082740.68619922.203080.816930.806620.5859220.652807
0.992422.86114.178911.760762504.03910.3272920.358444413.17335.21770.7886030.5552890.625595
Table A6. MSE of estimators for ( p = 10 ,   n = 100 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
10.800.1378120.134590.1047910.1375440.0417960.0426780.1276970.1337450.0535780.0828750.080015
0.900.2663240.2527470.168580.2653120.0705460.0729230.2256970.2522040.0649090.1150320.10977
0.992.6506961.9333460.9972412.6163570.5404920.6299131.8352312.8484950.0439260.1739540.154299
40.800.3657010.312710.1831180.3615180.0641690.0700380.2592540.2846850.0737890.143330.135751
0.900.7086550.5226770.2456010.6947770.0774670.0897120.4019390.4728480.0687050.1673590.155538
0.997.5389623.3489941.5679457.2250210.5164050.7552724.0929257.9980940.0218020.1087460.093146
70.8013.310370.4654730.6026396.4300890.6882360.6089210.9185230.5030250.3975330.2247570.233673
0.9027.05390.5657060.49772612.074120.520650.4517511.8285950.3808720.3444490.1725840.183042
0.99257.25374.8352470.917106107.86830.1301970.14528740.439491.7644210.2855180.1106680.124518
110.8036.24540.6038390.7609968.8800350.8707570.8402641.2763530.7492290.6794720.4457950.483727
0.9075.558310.6429970.6342718.132110.7633420.7283273.2151770.5897750.6272310.3786550.418069
0.99744.52252.7438640.657579160.69270.284490.26130873.795670.6662850.5705770.3085350.349748
Table A7. MSE of estimators for ( p = 6 ,   n = 10 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
0.50.853.3786582.5283981.7621733.2875371.1604461.0722052.7332174.8967760.0664390.1295210.086383
0.9511.197356.8861335.34690410.700422.6413522.7173439.04849492.038670.0670730.1507050.096489
0.9952.3326930.6912523.9846850.4664213.2991112.0758546.6528114,495.490.0332330.0743240.042417
10.8511.44535.2735814.09419210.30941.8960171.9487777.99593649.787620.1947210.1734990.151345
0.9541.5737817.6538914.4739937.700114.9684644.63217832.38513508.45320.1602450.1457540.119691
0.99217.216991.8344771.08078193.797421.6594415.95904180.9525182,157.50.1451540.0556670.074836
50.85305.933215.9317513.04077130.72922.724922.987827124.332411.575890.8711560.7312760.803255
0.951126.80776.2636650.4237416.49547.2351056.579562491.730876.33930.8397740.6945660.768471
0.995405.858480.6797352.45952374.29830.8521350.333123346.464399,519.70.8270580.6444510.740844
100.851232.28758.038940.75881305.743210.510657.637171429.452746.673960.9558620.9032860.939088
0.953908.595137.531548.76107943.03562.1768563.3584291491.98971.501260.9620960.9030550.94383
0.9921,305.45591.9954593.45394726.27662.24463103.283810,093.682172.4250.9075810.8033360.865478
Table A8. MSE of estimators for ( p = 8 ,   n = 10 ) .
σ²   ρ   OLS   HK   BHK   KMS   KAM   KGM   Kmed   BLRE   SPS1   SPS2   SPS3
0.50.850.3822130.3559090.2477020.3792870.148130.1338690.3310370.3667580.0700560.1291370.114566
0.951.1023720.9422840.5841521.0867120.3952890.4285920.8690861.0889920.0681970.1626430.137268
0.997.1516244.3740963.413276.9721881.7369721.9795765.83079419372.740.0332760.1214570.093012
10.851.3711180.7574220.4376021.2768280.1804110.2199010.6820650.7307150.0738380.1608610.134973
0.955.2085222.4614261.5295024.820430.5452550.6532763.0605229.9809130.0467270.135360.105492
0.9920.816846.6520624.34867818.708761.6189021.85037113.302892616.950.0212760.0493780.037687
50.8530.324420.7733570.670119.0298860.5972490.5853163.2799110.5363330.5503130.3509030.404997
0.95108.02672.8643670.93773732.022290.412740.41790817.90291.2657460.5149290.3165230.370594
0.99556.681720.421085.618645168.0870.5282991.3103164.386314.247920.5139250.3146390.370418
100.85136.8431.6816560.84627116.412790.8432250.84704311.86550.9103180.8351990.6948070.774308
0.95583.87762.9121451.20688770.058650.708190.71804481.402511.0613430.8309350.6849440.767827
0.992434.2368.515153.702356266.21840.5060510.756916562.64432.578010.7912250.6159150.716577

References

  1. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Applications to Nonorthogonal Problems. Technometrics 1970, 12, 69–82.
  2. Rao, C.R.; Toutenburg, H. Linear Models. In Linear Models: Least Squares and Alternatives; Springer: New York, NY, USA, 1999; pp. 5–21.
  3. Golub, G.H.; Heath, M.; Wahba, G. Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter. Technometrics 1979, 21, 215–223.
  4. Friedman, J.H. A Variable Span Smoother; Technical Report No. 5; Laboratory for Computational Statistics, Stanford University: Stanford, CA, USA, 1984.
  5. Haider, B.; Asim, S.M.; Wasim, D.; Kibria, B.G. A Simulation-Based Comparative Analysis of Two-Parameter Robust Ridge M-Estimators for Linear Regression Models. Stats 2025, 8, 84.
  6. Khan, M.S.; Alharthi, A.S. Adaptive Penalized Regression for High-Efficiency Estimation in Correlated Predictor Settings: A Data-Driven Shrinkage Approach. Mathematics 2025, 13, 2884.
  7. Chand, S.; Kibria, B.M.G. A new ridge type estimator and its performance for the linear regression model: Simulation and application. Hacet. J. Math. Stat. 2024, 53, 837–850.
  8. Khan, M.S.; Ali, A.; Suhail, M.; Kibria, B.M.G. On some two parameter estimators for the linear regression models with correlated predictors: Simulation and application. Commun. Stat. Simul. Comput. 2024, 1–15.
  9. Akhtar, N.; Alharthi, M.F. Enhancing accuracy in modelling highly multicollinear data using alternative shrinkage parameters for ridge regression methods. Sci. Rep. 2025, 15, 10774.
  10. Alharthi, M.F.; Akhtar, N. Newly Improved Two-Parameter Ridge Estimators: A Better Approach for Mitigating Multicollinearity in Regression Analysis. Axioms 2025, 14, 186.
  11. Gujarati, D.N.; Porter, D.C. Basic Econometrics, 5th ed.; McGraw-Hill/Irwin: New York, NY, USA, 2009.
  12. Hoerl, A.E.; Kannard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat. 1975, 4, 105–123.
  13. Kibria, B.M.G. Performance of Some New Ridge Regression Estimators. Commun. Stat. Simul. Comput. 2003, 32, 419–435.
  14. Khalaf, G.; Månsson, K.; Shukur, G. Modified Ridge Regression Estimators. Commun. Stat. Theory Methods 2013, 42, 1476–1487.
  15. Akhtar, N.; Alharthi, M.F. A comparative study of the performance of new ridge estimators for multicollinearity: Insights from simulation and real data application. AIP Adv. 2024, 14, 14.
  16. Khan, M.S.; Ali, A.; Suhail, M.; Awwad, F.A.; Ismail, E.A.A.; Ahmad, H. On the performance of two-parameter ridge estimators for handling multicollinearity problem in linear regression: Simulation and application. AIP Adv. 2023, 13.
  17. Muniz, G.; Kibria, B.M.G. On Some Ridge Regression Estimators: An Empirical Comparisons. Commun. Stat. Simul. Comput. 2009, 38, 621–630.
  18. Lukman, A.F.; Ayinde, K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–968.
  19. Lukman, A.F.; Ayinde, K.; Siok Kun, S.; Adewuyi, E.T. A Modified New Two-Parameter Estimator in a Linear Regression Model. Model. Simul. Eng. 2019, 2019, 6342702.
  20. Alkhamisi, M.A.; Shukur, G. A Monte Carlo Study of Recent Ridge Parameters. Commun. Stat. Simul. Comput. 2007, 36, 535–547.
  21. Alharthi, M.F.; Akhtar, N. Modified Two-Parameter Ridge Estimators for Enhanced Regression Performance in the Presence of Multicollinearity: Simulations and Medical Data Applications. Axioms 2025, 14, 527.
  22. U.S. Environmental Protection Agency. Hourly NO2 Concentration and Meteorological Data (2018–2022); Air Quality System (AQS); U.S. Environmental Protection Agency: Washington, DC, USA, 2025.
  23. Hald, A. Statistical Theory with Engineering Applications; John Wiley & Sons: Hoboken, NJ, USA, 1952.
Figure 1. Heatmap display of air pollution dataset.
Figure 2. Heatmap display of Hald Cement Dataset.
Table 1. MSE and regression coefficients for air pollution dataset.
Method   MSE        β̂0         β̂1         β̂2         β̂3
OLS      2.979202   0.038863   0.621592   −0.07121   −0.16889
HK       1.321717   −2.07044   0.038852   0.158958   −0.0168
BHK      0.862875   −0.54526   −2.00902   0.038804   0.035427
KAM      2.447914   2.500708   −0.50237   −1.78161   0.036829
KGM      0.314779   0.038785   2.103305   −0.37532   −0.30316
Kmed     0.370558   −1.70836   0.037444   1.249123   −0.03155
KMS      1.488609   −0.34251   −0.41397   0.038896   0.067531
BLRE     1.256637   1.082581   −0.04478   −2.27084   0.036829
SPS1     0.275383   0.038684   0.097186   −0.72364   −0.30316
SPS2     0.293119   −1.39038   0.038005   5.501533   −0.03155
SPS3     0.293119   −0.23044   −0.61191   0.035118   0.067531
Table 2. MSE and regression coefficients for Hald Cement Dataset.
Method   MSE        β̂0         β̂1         β̂2         β̂3         β̂4
OLS      28.17521   −0.24971   −0.36505   0.715165   0.112926   0.045647
HK       22.90219   −0.36526   0.923205   0.018739   11.45462   −0.23748
HKB      11.97531   0.927308   0.105368   0.310029   −0.24971   −0.33059
KAM      26.99004   0.113338   7.570884   −0.23612   −0.36526   0.506491
KGM      0.456483   11.76045   −0.24971   −0.32693   0.927339   0.007486
Kmed     0.375257   −0.2497    −0.36525   0.480812   0.113403   0.112597
KMS      26.75516   −0.36522   0.927143   0.006745   11.81024   −0.23748
BLRE     21.40995   0.926478   0.112996   0.100843   −0.22143   −0.33059
SPS1     0.333952   0.111636   11.50577   −0.24971   −0.28984   0.506491
SPS2     0.381985   10.58041   −0.2452    −0.36525   0.302918   0.007486
SPS3     0.381985   −0.24964   −0.35208   0.927109   0.003141   0.112597
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
