In this section, we investigate the finite-sample properties of the proposed estimators using Monte Carlo simulation experiments and two real-data examples.
8.1. Simulation Experiments
The design of the simulation experiment depends on the properties of the estimator under investigation and the criteria used to evaluate the findings. Following [1], the regressor variables were generated from a common-factor scheme in which each regressor $x_{ij}$ is a linear combination of independent and identically distributed standard normal variables $z_{ij}$, with $\rho$ representing the correlation between $x_{ij}$ and $x_{ij'}$ for $j \neq j'$. The response variable $y_i$ was then obtained from the linear model $y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i$, where the regression vector is partitioned as $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^{\top}, \boldsymbol{\beta}_2^{\top})^{\top}$ and $\Delta$, which represents the degree of deviation from the null hypothesis in Equation (3), varies over a set of nonnegative values.
In this simulation, we chose the value of $k$ as follows. Following [1,38], we rewrote the model in Equation (1) in its canonical form $\mathbf{y} = \mathbf{Z}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$, where $\mathbf{Z} = \mathbf{X}\mathbf{Q}$, $\boldsymbol{\alpha} = \mathbf{Q}^{\top}\boldsymbol{\beta}$, $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$ contains the eigenvalues of $\mathbf{X}^{\top}\mathbf{X}$, and $\mathbf{Q}$ is an orthogonal $p \times p$ matrix whose columns are the corresponding eigenvectors. Thus, we have $\hat{\boldsymbol{\alpha}} = \boldsymbol{\Lambda}^{-1}\mathbf{Z}^{\top}\mathbf{y}$; the value of $k$ was then estimated from $\hat{\boldsymbol{\alpha}}$, the estimate of $\boldsymbol{\alpha}$ obtained by fitting the regression model to the generated data. The biasing parameter was determined by utilizing the KL heuristic outlined in Appendix A.2—Algorithm A1.
The remaining design constants were set as suggested by [18]. The correlation coefficient $\rho$ was chosen to vary over three values representing increasing degrees of multicollinearity, and a fixed significance level was used for testing the hypothesis in Equation (3). The performance of all the estimators followed a similar pattern when the values of the design parameters and $q$ were varied; to save space, we report a single representative configuration and ran the simulation over a large number of iterations for two sample sizes.
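A minimal sketch of this data-generating step, assuming the common McDonald–Galarneau-type factor scheme used throughout this literature (written in Python for illustration, although the study itself uses R):

```python
import numpy as np

def make_design(n, p, rho, rng):
    """Common-factor scheme: x_ij = sqrt(1 - rho^2) z_ij + rho z_i,p+1,
    so all regressors share the latent factor z_.,p+1 (assumed form)."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def make_response(X, beta, sigma, rng):
    """Draw y = X beta + eps with i.i.d. N(0, sigma^2) errors."""
    return X @ beta + sigma * rng.standard_normal(X.shape[0])

rng = np.random.default_rng(0)
X = make_design(n=100, p=5, rho=0.9, rng=rng)
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # partitioned as (beta_1, beta_2)
y = make_response(X, beta, sigma=1.0, rng=rng)
```

Under this particular scheme, any two distinct regressors have correlation $\rho^{2}$, so larger values of $\rho$ induce more severe multicollinearity.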
For each estimator, we computed the mean squared error (MSE) as follows:
$$\mathrm{MSE}(\hat{\boldsymbol{\beta}}^{*}) = \frac{1}{M}\sum_{m=1}^{M}\left(\hat{\boldsymbol{\beta}}^{*}_{(m)} - \boldsymbol{\beta}\right)^{\top}\left(\hat{\boldsymbol{\beta}}^{*}_{(m)} - \boldsymbol{\beta}\right),$$
where $\hat{\boldsymbol{\beta}}^{*}$ is any of the proposed estimators in this study, $\hat{\boldsymbol{\beta}}^{*}_{(m)}$ is its value in the $m$-th iteration, and $M$ is the number of Monte Carlo iterations. For the purpose of comparison, we used the relative efficiency of the mean squared error (RE) with respect to the benchmark estimator $\hat{\boldsymbol{\beta}}$, which is defined as follows:
$$\mathrm{RE}(\hat{\boldsymbol{\beta}}^{*}) = \frac{\mathrm{MSE}(\hat{\boldsymbol{\beta}})}{\mathrm{MSE}(\hat{\boldsymbol{\beta}}^{*})}.$$
The complete simulation framework used to calculate the MSEs and relative efficiencies of all the estimators is provided in Appendix A.2—Algorithm A2. An RE value greater than one indicates the superiority of $\hat{\boldsymbol{\beta}}^{*}$ over $\hat{\boldsymbol{\beta}}$, and vice versa.
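The Monte Carlo loop behind these quantities can be sketched as follows, with ordinary least squares and a fixed-$k$ ridge estimator standing in for the paper's estimators (an illustrative Python sketch, not the Algorithm A2 implementation):

```python
import numpy as np

def mc_relative_efficiency(n=50, p=4, rho=0.9, k=1.0, sigma=1.0, M=500, seed=0):
    """Estimate RE = MSE(OLS) / MSE(ridge) by Monte Carlo simulation
    under a common-factor multicollinear design (assumed scheme)."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p)
    sse_ols = sse_ridge = 0.0
    for _ in range(M):
        z = rng.standard_normal((n, p + 1))
        X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
        y = X @ beta + sigma * rng.standard_normal(n)
        XtX, Xty = X.T @ X, X.T @ y
        b_ols = np.linalg.solve(XtX, Xty)
        b_ridge = np.linalg.solve(XtX + k * np.eye(p), Xty)
        sse_ols += (b_ols - beta) @ (b_ols - beta)
        sse_ridge += (b_ridge - beta) @ (b_ridge - beta)
    return (sse_ols / M) / (sse_ridge / M)  # RE > 1 favors the ridge estimator

re = mc_relative_efficiency()
```

With strong collinearity ($\rho = 0.9$), the ridge estimator's variance reduction outweighs its bias, so the RE exceeds one.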
Figure 1 shows the graphs for the cases we considered. The following conclusions can be drawn:
The sub-model estimator consistently outperformed all other estimators when the null hypothesis in Equation (3) was true or approximately true; however, its relative efficiency decreased and eventually approached zero as $\Delta$ increased. Moreover, all the estimators outperformed the regular RE estimator in terms of the mean squared error across all values of $\Delta$.
For all values of $\Delta$, the RE positive shrinkage estimator dominated all other estimators except when the sub-model was true, in which case the RE sub-model and pretest estimators outperformed it.
The relative efficiencies exhibited a consistent pattern when the remaining design parameters were held constant for both sample sizes used in this simulation.
8.2. Example 1: Biomass Production in the Cape Fear Estuary
In this section, we consider a case study discussed by Rawlings et al. [39] in two different chapters of their book; the data are also freely available in the VisCollin R package by Friendly [40]. The data were originally studied by Rick [41], whose goal was to detect the key soil factors that affect the aerial biomass of the marsh grass Spartina alterniflora in the Cape Fear Estuary of North Carolina at three different locations. At each location, three types of Spartina vegetation areas were sampled, namely devegetated "dead" areas, "short" Spartina areas, and "tall" Spartina areas. Five samples of the soil substrate were collected from different sites within each location and vegetation type. These samples were then evaluated monthly for 14 different physico-chemical soil parameters over several months, resulting in a total of 45 samples. Thus, there were 45 observations in this data set covering the following 17 variables: the location loc; area type type; hydrogen sulfide H2S; salinity in percentage SAL; redox potential at pH 7 Eh7; soil acidity in water pH; buffer acidity at pH 6.6 BUF; the concentrations of the elements phosphorus P, potassium K, calcium Ca, magnesium Mg, sodium Na, manganese Mn, zinc Zn, copper Cu, and ammonium NH4; and aerial biomass BIO as the response variable.
One main objective was to identify the set of variables that can accurately predict the response variable. As a first investigation of multicollinearity, we constructed a correlation plot among these variables, which is given below in Figure 2.
The plot shows that there are many significant relationships between these variables, and some of these relations are very strong and highly significant. This indicates that multicollinearity exists among these variables, which can also be easily detected by the variance inflation factor (VIF) plot given below in Figure 3.
The VIF plot confirms a serious multicollinearity problem, as many of the values exceed 5. Hence, the RE estimator will reduce this problem and produce better estimates of the parameters. Moreover, the proposed estimators will provide a further improvement in estimating and predicting the target response variable BIO.
In the absence of prior knowledge, the limitation on the parameters is established either through the judgment of an expert or by utilizing existing methodologies for variable selection, such as the Akaike information criterion (AIC), forward selection (FW), backward elimination (BE), best subset selection (BS), or the Bayes information criterion (BIC), or by using some penalization algorithms, such as the LASSO, adaptive LASSO, and others, to produce a sub-model. In this example, we first employed the forward, backward, and best subset selection methods to produce a sub-model, and then obtained the RE, pretest, and shrinkage estimators. Secondly, we applied random forest, K-nearest neighbors, and a neural network as machine learning algorithms to compare the prediction error with the seven proposed estimators. The sub-models selected by the forward, backward, and best subset selection methods are summarized in
Table 1 below.
In our analysis, we examined two sub-models: the forward sub-model, which included the variables pH, Ca, Mg, and Cu along with the intercept, and the second one, which was the backward/best subset that included the variables SAL, K, Zn, Eh7, Mg, Cu, and NH4 along with the intercept. The two sub-models are designated as Sub.1 and Sub.2, respectively.
We fit a full model with all the available variables as well as the selected sub-models. Kibria and Lukman [1] presented several methods for estimating the biasing parameter $k$, but we chose the one with the lowest MSE, which was also provided by [38]. The estimated value of $k$ was determined by writing the model in Equation (1) in its canonical form $\mathbf{y} = \mathbf{Z}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$, where $\mathbf{Z} = \mathbf{X}\mathbf{Q}$, $\boldsymbol{\alpha} = \mathbf{Q}^{\top}\boldsymbol{\beta}$, $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$ contains the eigenvalues of $\mathbf{X}^{\top}\mathbf{X}$, and $\mathbf{Q}$ is an orthogonal $p \times p$ matrix whose columns are the eigenvectors corresponding to the eigenvalues in $\boldsymbol{\Lambda}$. The estimated value of $k$ was then computed from the canonical-form estimates.
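As a concrete illustration of this canonical-form computation, the sketch below uses the Hoerl–Kennard-type choice $\hat{k} = \hat{\sigma}^{2} / \hat{\alpha}_{\max}^{2}$, which is an assumption on our part; the paper's exact formula follows [1,38]. (Python is used for illustration; the study itself works in R.)

```python
import numpy as np

def estimate_k(X, y):
    """Estimate a ridge biasing parameter via the canonical form y = Z alpha + eps."""
    n, p = X.shape
    lam, Q = np.linalg.eigh(X.T @ X)     # eigenvalues/eigenvectors of X'X
    Z = X @ Q                            # canonical regressors, Z'Z = diag(lam)
    alpha_hat = (Z.T @ y) / lam          # alpha_hat = Lambda^{-1} Z'y
    resid = y - Z @ alpha_hat
    sigma2_hat = resid @ resid / (n - p)             # error-variance estimate
    return sigma2_hat / np.max(alpha_hat**2)         # Hoerl-Kennard-type k (assumed)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
beta = np.array([2.0, 1.0, 0.5, 0.0])
y = X @ beta + rng.standard_normal(60)
k_hat = estimate_k(X, y)
```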
Using this approach, we obtained the estimated value of $k$, and the RE-type estimators were then calculated. To ensure full reproducibility and methodological transparency, all machine learning models, namely the neural network (NN), random forest (RF), and K-nearest neighbors (KNN), were trained and tuned using the train function provided by the caret package in R; their hyperparameter tuning and training settings are described below.
For the NN model, the nnet implementation was used. A grid search was applied over two main hyperparameters: the number of hidden neurons (size) and the weight-decay parameter (decay). The grid systematically explored hidden-layer sizes from 1 to 10 and decay values between 0.1 and 0.5 in increments of 0.1. This procedure aimed to balance model complexity and generalization ability, with the weight-decay term controlling overfitting through regularization. The RF model was trained using the random forest method; the number of predictors randomly selected at each node split (mtry) was optimized through internal cross-validation, and the total number of trees (ntree) was fixed at 500 to maintain model stability and ensure comparability across repeated runs. For the KNN regression model, a grid search was conducted over the number of neighbors (k), covering odd integer values from 1 to 19 to prevent ties in distance-based voting. Prior to model fitting, all predictor variables were standardized (centered to zero mean and scaled to unit variance) to ensure fair distance computation. All models were trained under identical resampling conditions using 5-fold cross-validation, as specified through the trainControl function in caret, and model performance was assessed via the root mean squared error (RMSE), the common evaluation metric across all algorithms. This consistent and fully specified setup ensured both the reproducibility and fair comparability of the RF, KNN, and NN models in the presented study.
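A rough Python/scikit-learn analogue of the KNN part of this caret workflow, with the same odd-$k$ grid, standardized predictors, 5-fold cross-validation, and RMSE scoring (toy data stand in for the biomass predictors; the actual analysis uses caret's train):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical toy regression data.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(150)

# Standardize, then tune k over odd values 1..19, scoring by RMSE under
# 5-fold CV, mirroring the preprocessing and resampling described above.
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsRegressor())])
grid = {"knn__n_neighbors": list(range(1, 20, 2))}
search = GridSearchCV(pipe, grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
best_k = search.best_params_["knn__n_neighbors"]
```

Placing the scaler inside the pipeline ensures that standardization is re-estimated on each training fold, avoiding leakage into the validation folds.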
In order to evaluate the estimators’ performance, we implemented a bootstrap method—see Ahmed et al. [42]—and calculated the mean squared prediction error in the following manner:
1. Select, with replacement, $K$ bootstrap samples of size $n$ from the data set.
2. Partition each sample from step 1 into separate training and testing sets at a ratio of 70% to 30%, respectively. Then, fit the full and sub-models using the training data set, and obtain the values of all the RE-type estimators.
3. Evaluate the predicted response values on the testing data set as $\hat{\mathbf{y}}^{*} = \mathbf{X}_{\mathrm{test}}\hat{\boldsymbol{\beta}}^{*}$, where $\mathbf{X}_{\mathrm{test}}$ is the matrix of predictor variables in the testing set and $\hat{\boldsymbol{\beta}}^{*}$ is any of the proposed RE estimators.
4. Find the prediction error of each estimator for each sample as $\mathrm{PE}^{(m)}(\hat{\boldsymbol{\beta}}^{*}) = (\mathbf{y}_{\mathrm{test}} - \hat{\mathbf{y}}^{*})^{\top}(\mathbf{y}_{\mathrm{test}} - \hat{\mathbf{y}}^{*})$, for $m = 1, \dots, K$.
5. Calculate the average prediction error of each estimator as $\overline{\mathrm{PE}}(\hat{\boldsymbol{\beta}}^{*}) = \frac{1}{K}\sum_{m=1}^{K}\mathrm{PE}^{(m)}(\hat{\boldsymbol{\beta}}^{*})$.
6. Finally, calculate the relative efficiency of the prediction error with respect to the benchmark estimator $\hat{\boldsymbol{\beta}}$ as $\mathrm{RE}_{\mathrm{PE}}(\hat{\boldsymbol{\beta}}^{*}) = \overline{\mathrm{PE}}(\hat{\boldsymbol{\beta}}) / \overline{\mathrm{PE}}(\hat{\boldsymbol{\beta}}^{*})$.
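In code, this bootstrap procedure can be sketched as follows (illustrative Python; ordinary least squares and a fixed-$k$ ridge estimator stand in for the full set of RE-type estimators):

```python
import numpy as np

def bootstrap_pe(X, y, fit_fns, K=200, train_frac=0.7, seed=0):
    """Mean squared prediction error of each estimator, averaged over K
    bootstrap resamples, each split 70%/30% into training and testing."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pe = np.zeros(len(fit_fns))
    for _ in range(K):
        idx = rng.integers(0, n, n)            # resample with replacement
        Xb, yb = X[idx], y[idx]
        cut = int(train_frac * n)
        Xtr, ytr, Xte, yte = Xb[:cut], yb[:cut], Xb[cut:], yb[cut:]
        for j, fit in enumerate(fit_fns):
            beta_hat = fit(Xtr, ytr)
            resid = yte - Xte @ beta_hat
            pe[j] += resid @ resid / len(yte)  # mean squared prediction error
    return pe / K

ols = lambda X, y: np.linalg.solve(X.T @ X, X.T @ y)
ridge = lambda X, y, k=1.0: np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.standard_normal(100)
pe_ols, pe_ridge = bootstrap_pe(X, y, [ols, ridge])
re_pe = pe_ols / pe_ridge  # relative efficiency of the prediction error
```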
The findings shown in
Table 2 align with the outcomes of the simulations discussed in the preceding subsection.
Next, we employed various penalization methods and three machine learning algorithms to analyze the biomass data, with the objective of determining the prediction error. We first scaled all variables, including the response (BIO), and then applied the penalization or machine learning algorithms, to avoid any differences due to the variable units. The results of our investigation are summarized in the following table.
As shown in
Table 3, every machine learning algorithm outperformed the traditional RE estimator. However, the performance of the ridge, LASSO, and SCAD penalization methods was inferior to that of the RE estimator. This discrepancy may be attributed, in part, to the presence of multicollinearity among the predictor variables.
Overall, the RE-type shrinkage estimators exhibited the highest relative efficiency, outperforming both the penalized and ML models; the penalized estimators achieved moderate gains under correlated predictors, while the ML algorithms performed well in capturing nonlinear interactions, but remained less efficient in structured linear settings. The superior performance of RE-type shrinkage estimators can be attributed to their ability to incorporate valid linear restrictions and simultaneously apply shrinkage, which reduces estimator variance under multicollinearity. When the underlying data structure is approximately linear, these estimators achieve a favorable bias–variance trade-off, whereas the penalized and ML models, despite their flexibility, may lose efficiency due to over-regularization or overfitting in such structured settings.
Upon careful examination of the numerical results, it became apparent that the relative efficiencies of the prediction error differed from one method to another. To gain a deeper understanding of the range of values and potential outliers in our predictions, we examined the associated prediction errors using the following box plots, based on 1000 replications.
The box plot in Figure 4 clearly illustrates the distribution of the prediction errors, and further analysis revealed clues of possible outliers. The extended whiskers and isolated data points beyond the usual range indicate variability and occurrences that diverge from the general pattern. Furthermore, applying a shrinkage technique to the RE estimator had a pronounced effect on suppressing outliers. Shrinkage not only improves the estimation process, but also helps identify and reduce the impact of outliers, resulting in a more flexible and dependable representation of the underlying data structure.
8.3. Example 2: Air Pollution Data Set
The air pollution data initially utilized by McDonald and Schwing [43] were subsequently employed by [12,44] to demonstrate universal ridge shrinkage methods. The data can be accessed freely at Carnegie Mellon University’s StatLib (https://lib.stat.cmu.edu/datasets/, accessed on 17 November 2025) and comprise 15 covariates pertaining to air pollution and socio-economic and meteorological observations, with the mortality rate as the dependent variable for 60 US cities in 1960. These variables are the average annual precipitation in inches Precip, annual average percentage of relative humidity at 1:00 pm Humidly, average January temperature in degrees F JanTemp, average July temperature in degrees F JulyTemp, percentage of people aged 65 or older Over65, population per household House, median number of school years completed by persons over 22 years Educ, percentage of housing units that are sound and with all facilities Sound, population density per square mile in urbanized areas Density, percentage of non-white population NonWhite, percentage of people employed in white-collar occupations WhiteCol, percentage of families with an income less than USD 3000 Poor, relative hydrocarbon pollution potential HC, relative nitric oxide pollution potential NOX, relative sulphur dioxide pollution potential SO2, and the total age-adjusted mortality rate per 100,000 MORT as the response.
As in the first example, to examine multicollinearity among the explanatory variables, we created the correlation matrix displayed in Figure 5 below, which reveals that significant relationships exist among some of the variables. This result is also supported by the VIF plot given in Figure 6.
Using the ols_step_best_subset function from the olsrr package applied to the pollution data set, the best subset selection procedure identified the variables Precip, JanTemp, JulyTemp, Educ, NonWhite, and SO2 based on the Cp criterion, whereas the variables Precip, JanTemp, JulyTemp, House, Educ, NonWhite, and SO2 were selected according to the AIC.
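As a language-agnostic illustration of what best-subset selection by an information criterion does (the analysis itself uses olsrr's ols_step_best_subset in R; the variable names below are hypothetical):

```python
import numpy as np
from itertools import combinations

def best_subset_aic(X, y, names):
    """Exhaustive best-subset search for a Gaussian linear model, scored by
    AIC = n*log(RSS/n) + 2*(k+1), up to an additive constant."""
    n, p = X.shape
    best = (np.inf, None)
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, subset]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            aic = n * np.log(rss / n) + 2 * (len(subset) + 1)
            if aic < best[0]:
                best = (aic, [names[j] for j in subset])
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 5))
y = 2 * X[:, 0] - X[:, 2] + 0.5 * rng.standard_normal(80)
aic, chosen = best_subset_aic(X, y, ["v1", "v2", "v3", "v4", "v5"])
```

The exhaustive search is feasible here because the pollution data have only 15 candidate predictors; the Cp criterion used by olsrr differs from AIC only in how model size is penalized.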
Following the methods employed in the first example and detailed in Section 8.2, the relative efficiencies of the prediction error for the proposed estimators with respect to the benchmark estimator $\hat{\boldsymbol{\beta}}$ are presented below in Table 4.
Table 4 leads to conclusions analogous to those derived in Example 1 regarding the preferences among the proposed estimators. Next, we applied the same penalized approaches and machine learning algorithms used in Example 1; a summary of our results is displayed in the subsequent table (see Table 5).
The results of the penalization methods and machine learning algorithms matched those obtained in Example 1. Furthermore, we created box plots for the second data set in Figure 7, which again depict its distribution. The figure indicates the presence of probable outliers in the data set; hence, utilizing shrinkage estimation methods will mitigate the influence of these outliers, enhance the estimation process, and yield a more adaptable and reliable representation of the underlying data structure.