On Some Test Statistics for Testing the Regression Coefficients in Presence of Multicollinearity: A Simulation Study

Abstract: Ridge regression is a popular method to solve the multicollinearity problem for both linear and non-linear regression models. This paper studied forty different ridge regression t-type tests of the individual coefficients of a linear regression model. A simulation study was conducted to evaluate the performance of the proposed tests with respect to their empirical sizes and powers under different settings. Our simulation results demonstrated that many of the proposed tests have type I error rates close to the 5% nominal level and, among those, all tests except one show a considerable gain in power over the standard ordinary least squares (OLS) t-type test. It was observed from our simulation results that seven tests based on some ridge estimators performed better than the rest in terms of achieving higher power gains while maintaining a 5% nominal size.


Introduction
Multicollinearity is the occurrence of high inter-correlations among independent variables in a multiple regression model. When this condition is present, it can result in unstable and unreliable regression coefficient estimates if the method of ordinary least squares (OLS) is used. One of the proposed solutions to the problem of multicollinearity is ridge regression, as pioneered by Hoerl and Kennard [1]. They found that there is a nonzero value of k (the ridge or shrinkage parameter) for which the mean squared error (MSE) of the ridge regression estimator is smaller than the variance of the OLS estimator.
Estimating the shrinkage parameter k is a vital issue in the ridge regression model. Several researchers have worked in this area at different times and proposed different estimators for k. To mention a few: Hoerl and Kennard [1], Hoerl, Kennard and Baldwin [2], Lawless and Wang [3], Gibbons [4], Nomura [5], Kibria [6], Khalaf [7], Khalaf and Shukur [8], Alkhamisi and Shukur [9], Muniz and Kibria [10], Feras and Gore [11], Gruber [12], Muniz et al. [13], Mansson et al. [14], Hefnawy and Farag [15], Roozbeh and Arashi [16], Arashi and Valizadeh [17], Aslam [18], Asar and Karaibrahimoglu [19], Saleh et al. [20], Asar and Erişoglu [21], Goktas and Sevinc [22], Fallah et al. [23], Norouzirad and Arashi [24], and very recently Saleh et al. [25], among others.
It is well known that, to make inference about an unknown population parameter, one may consider both confidence interval and hypothesis testing methods. However, the literature on test statistics for testing the regression coefficients under the ridge regression model is very limited. First, Halawa and Bassiouni [26] proposed non-exact t-tests for the regression coefficients under ridge regression estimation and compared the empirical sizes and powers of only two tests, based on the estimators of k proposed by Hoerl and Kennard [1] and Hoerl, Kennard, and Baldwin [2]. Their results evidenced that, for models with large standard errors, the ridge-based t-tests have correct sizes with a considerable gain in power over the least squares t-test. For models with small standard errors, the tests were found to slightly exceed the nominal level in a few cases. Cule et al. [27] evaluated the performance of the tests proposed by Hoerl and Kennard [1], Hoerl, Kennard, and Baldwin [2], and Lawless and Wang [3] based on linear ridge and logistic ridge regression models.
Gokpinar and Ebegil [28] evaluated the performance of the t-tests based on 22 different estimators of the ridge parameter k collected from the published literature. Finally, Kibria and Banik [29] analyzed the performance of the t-tests based on 16 popular estimators of the ridge parameter.
Since different ridge regression estimators have been considered by several researchers at different times, under different simulation methods and conditions, tests of regression coefficients are not comparable as a whole on the basis of their size (Type I error) and power properties under the ridge regression model. Therefore, the main contribution of this paper is a more comprehensive comparison of a much larger ensemble of available t-test statistics for testing regression coefficients. We consider in our analysis most of the tests analyzed in Gokpinar and Ebegil [28] and Kibria and Banik [29], as well as test statistics based on other ridge estimators not included in those studies. In total, our paper compares forty different t-test statistics. The test statistics were compared based on their empirical type I error and power properties, following the testing procedures detailed in Halawa and Bassiouni [26]. These results are of interest to statistical practitioners using ridge regression in different fields of application, as a guide to which test statistics to use when testing the significance of variables in their ridge regression models. This paper is organized as follows. The proposed test statistics for the linear regression model are described in Section 2. To compare the performance of the test statistics, a simulation study is conducted in Section 3. An application is discussed in Section 4. Finally, some concluding remarks are given in Section 5.

Test Statistics for Regression Coefficients
Let us consider the following multiple linear regression model:
Y = Xβ + ε, (1)
where Y is an (n × 1) dimensional vector of dependent variables centered about their mean, X is an (n × q) dimensional observed matrix of the regressors, centered and scaled such that X^T X is in correlation form, β is a (q × 1) dimensional unknown coefficient vector, and ε is an (n × 1) error vector distributed as multivariate normal with mean 0 and variance-covariance matrix σ^2 I_n, where I_n is the (n × n) identity matrix.
The ordinary least squares (OLS) estimator of the parameter vector β is:
β̂ = (X^T X)^{-1} X^T Y. (2)
To test whether the i-th component of the parameter vector β is equal to zero, the following test statistic based on the OLS estimator is used:
t_i = β̂_i / S(β̂_i), (3)
where β̂_i is the i-th component of β̂, and S(β̂_i) is the square root of the i-th diagonal element of Var(β̂) = σ̂^2 (X^T X)^{-1}. The test statistic in Equation (3) is the least squares test statistic. Under the null hypothesis, it is distributed as a Student t-distribution with n − q − 1 degrees of freedom. However, when X^T X is ill-conditioned due to multicollinearity, the least squares estimator in (2) produces unstable estimates with unduly large sampling variance. Adding a constant k to the diagonal elements of X^T X improves the ill-conditioned situation; this is called ridge regression. The ridge estimator of the parameter vector β is then:
β̂(k) = (X^T X + kI_q)^{-1} X^T Y, (4)
where k > 0 is the ridge or shrinkage parameter. The bias, the variance matrix, and the MSE expression of β̂(k) are, respectively:
Bias(β̂(k)) = −k (X^T X + kI_q)^{-1} β,
Var(β̂(k)) = σ^2 (X^T X + kI_q)^{-1} X^T X (X^T X + kI_q)^{-1}, (5)
MSE(β̂(k)) = tr[Var(β̂(k))] + Bias(β̂(k))^T Bias(β̂(k)),
and σ^2 is estimated as:
σ̂^2 = (Y − Xβ̂(k))^T (Y − Xβ̂(k)) / (n − q − 1). (6)
To test whether the i-th component of the parameter vector β is equal to zero, Halawa and Bassiouni [26] proposed the following t-test statistic based on the ridge estimator of the parameter vector:
t_i(k) = β̂_i(k) / S(β̂_i(k)), (7)
where β̂_i(k) is the i-th element of β̂(k), and S(β̂_i(k)) is the square root of the i-th diagonal element of Var(β̂(k)). Under the null hypothesis, the test statistic (7) was shown to be approximately distributed as a Student t-distribution with n − q − 1 degrees of freedom. For more details on this topic, see Halawa and Bassiouni [26], among others.
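The ridge estimate, its variance matrix, and the t-statistic of Halawa and Bassiouni [26] are straightforward to compute. The following is a minimal sketch (not the lmridge implementation used later in the paper; the function name is ours), assuming a centered and scaled X and a centered Y as in model (1):

```python
import numpy as np

def ridge_t_stats(X, y, k):
    """Ridge estimate and ridge t statistics for a centered/scaled design.

    X: (n, q) design matrix, y: (n,) response, k: ridge parameter (k >= 0).
    Returns the ridge coefficient vector and the vector of t statistics.
    """
    n, q = X.shape
    A_inv = np.linalg.inv(X.T @ X + k * np.eye(q))  # (X'X + kI)^{-1}
    beta_k = A_inv @ X.T @ y                        # ridge estimator
    resid = y - X @ beta_k
    sigma2 = resid @ resid / (n - q - 1)            # estimated error variance
    # Var(beta_k) = sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}
    var_beta = sigma2 * A_inv @ X.T @ X @ A_inv
    se = np.sqrt(np.diag(var_beta))
    return beta_k, beta_k / se
```

With k = 0 this reduces to the OLS estimate and the least squares t-test; increasing k shrinks the coefficient vector toward zero.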

Simulation Study
Our simulation study has two parts. First, we analyzed the empirical Type I error of the tests. The test statistics that achieved the nominal size of 5% were kept, and those that deviated significantly from the 5% size were discarded. The second part of the simulation study then compared the test statistics that achieved the 5% nominal size in terms of statistical power.

Type I Error Rates Simulation Procedure
RStudio was used for all calculations in this paper. The R package lmridge was used to fit the ridge regression models. For the empirical Type I error simulation and the power of the test, we considered sample sizes n = 30, 50, 80, and 100, numbers of regressors q = 4, 6, 8, 10, and 25, and the standard deviation of the error term was chosen as σ = 1. To examine the effects of multicollinearity, stated through the correlation matrix of the regressors, we assumed ρ = 0.80 and 0.95. An n × q matrix X was created as HΛ^{1/2}G^T, where H is any (n × q) matrix whose columns are orthogonal, Λ is the diagonal matrix of eigenvalues of the correlation matrix, and G is the matrix of normalized eigenvectors of the correlation matrix. Following Halawa and Bassiouni [26], our study was based on the most favorable (MF) direction of β for model (1). The MF orientation of β corresponds to the largest normalized eigenvector of the matrix X^T X, which is a vector of the form (1/√q)1_q. We chose not to use the least favorable (LF) orientation of β in our simulation, since the available literature shows that both orientations give similar results in terms of Type I error and power. For a detailed explanation of the MF and LF directions of β and other details of the simulation procedure, please see the paper by Halawa and Bassiouni [26].
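As a sketch of this construction (the function name make_design and the equicorrelation form of the target matrix are our assumptions), the design can be generated so that X^T X equals the target correlation matrix exactly, since H has orthonormal columns:

```python
import numpy as np

def make_design(n, q, rho, rng):
    """Build X = H Lambda^{1/2} G^T with orthonormal H, so that
    X^T X = G Lambda G^T equals the target correlation matrix R."""
    R = np.full((q, q), rho) + (1.0 - rho) * np.eye(q)  # equicorrelation R
    lam, G = np.linalg.eigh(R)                          # R = G Lambda G^T
    H, _ = np.linalg.qr(rng.standard_normal((n, q)))    # orthonormal columns
    return H @ np.diag(np.sqrt(lam)) @ G.T
```

Because H^T H = I, the multicollinearity level of the simulated design is controlled exactly by ρ rather than only on average.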
To estimate the 5% nominal size (α = 0.05) for testing H_0: β_i = 0 versus H_1: β_i ≠ 0 under different conditions, 5000 pseudo-random vectors from N(0, σ^2) were created to compute the error term in (1). Without loss of generality, we set the intercept in (1) to zero. Under the null model, substituting the i-th element of the considered MF β by zero, model (1) was used to generate 5000 simulated vectors of Y. The estimated sizes were computed as the percentage of times the absolute values of the selected test statistics were greater than the critical value t_{0.025, n−q−1}.
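The size-estimation loop can be sketched end to end. This is a simplified illustration, not the study's procedure: k is held fixed at an arbitrary small value (the study re-estimates k in every run with each candidate estimator), the critical value for n − q − 1 = 25 degrees of freedom is hardcoded, and all names are ours:

```python
import numpy as np

T_CRIT_25 = 2.0595  # t_{0.025, 25}; hardcoded to avoid a scipy dependency

def empirical_size(n=30, q=4, rho=0.80, k=0.05, runs=2000, seed=0):
    """Monte Carlo estimate of the size of the ridge t-test for H0: beta_1 = 0."""
    rng = np.random.default_rng(seed)
    # fixed design with X'X equal to the equicorrelation matrix R
    R = np.full((q, q), rho) + (1.0 - rho) * np.eye(q)
    lam, G = np.linalg.eigh(R)
    H, _ = np.linalg.qr(rng.standard_normal((n, q)))
    X = H @ np.diag(np.sqrt(lam)) @ G.T
    beta = np.ones(q) / np.sqrt(q)   # MF direction
    beta[0] = 0.0                    # null hypothesis holds for beta_1
    A_inv = np.linalg.inv(X.T @ X + k * np.eye(q))
    rejections = 0
    for _ in range(runs):
        y = X @ beta + rng.standard_normal(n)
        b = A_inv @ X.T @ y
        resid = y - X @ b
        s2 = resid @ resid / (n - q - 1)
        var1 = s2 * (A_inv @ X.T @ X @ A_inv)[0, 0]
        if abs(b[0]) / np.sqrt(var1) > T_CRIT_25:
            rejections += 1
    return rejections / runs
```

For a small fixed k the rejection rate should land near the 5% nominal level, up to Monte Carlo error.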

Type I Error Rates: Simulation Results
In Tables 2 and 3, we recorded the empirical sizes of the tests for the MF orientation for correlation levels of 0.80 and 0.95, respectively. If the true Type I error rate is 5%, then, for a simulation based on 5000 runs, the observed Type I error will fall, 95% of the time, in the interval 0.05 ± 2√(0.05 × 0.95/5000) ≈ (4.4%, 5.6%). We did not consider for comparison purposes those tests whose observed average Type I error was not in this range.
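The interval above is the usual two-standard-error binomial Monte Carlo bound, and the arithmetic can be checked directly:

```python
import math

p, runs = 0.05, 5000
half_width = 2 * math.sqrt(p * (1 - p) / runs)  # two-SE half width
interval = (p - half_width, p + half_width)
print(interval)  # approximately (0.044, 0.056)
```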
Based on the above tables, we observed the following: (i) The tests based on the ridge estimators K_VR, K_KibAM, K_M2, K_M3, K_M4, K_M6, K_M8, K_M9, K_M10, K_M12, K_ASH, K_SG1, K_SG2, and K_D1 have Type I errors well above the 5% nominal size and therefore cannot be recommended. (ii) The tests based on the ridge estimators K_M11, K_NOM, and K_FG did not surpass the 5% nominal size but stayed below it, around 3% to 4%, and therefore cannot be recommended either. (iii) The rest of the tests (including the test based on the ordinary least squares estimator) were, on average, very close to the nominal size of 5% for the different sample sizes, numbers of variables, and levels of correlation analyzed. These were the tests compared in terms of statistical power.
We also carried out simulations for nominal sizes of 10% and 1%, and the behavior of the tests was consistent with what was observed for a nominal size of 5%. Those results are available upon request. However, we are including a table of simulated Type I errors for nominal size 1% and correlation level 0.95 in Table 4 so that one can verify that the behavior of the tests was consistent with the results for 5% nominal size shown before.

Statistical Power Simulation Procedure
After calculating the empirical Type I error rates of the tests based on our initial forty ridge estimators, we discarded seventeen whose empirical sizes were not between 4.4% and 5.6%.
The remaining twenty-three test statistics were compared in terms of power. Following the paper by Gokpinar and Ebegil [28], we replaced the i-th component of the β vector by Jw(0)σβ_i, where J is a positive integer and w^2(0) = (1 + (q − 2)ρ)/[(1 − ρ)(1 + (q − 1)ρ)]. We picked J = 6, since that value achieved a power of approximately 80% for the OLS test when q = 4, and having a sizeable power for the OLS test allowed for a better comparison with the other tests.
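For reference, w(0) from the displayed formula is easy to evaluate; a small sketch (the function name is ours):

```python
import math

def w0(q, rho):
    """w(0) with w^2(0) = (1 + (q-2)rho) / ((1-rho)(1 + (q-1)rho))."""
    return math.sqrt((1 + (q - 2) * rho) / ((1 - rho) * (1 + (q - 1) * rho)))

# For q = 4 and rho = 0.80, w0(4, 0.80) is about 1.96.
```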
Based on 5000 simulation runs, the powers of the tests were computed as the proportion of times the absolute value of the test statistic exceeded the critical value t_{0.025, n−q−1}. All combinations of sample sizes n = 30, 50, 100 and numbers of regressors q = 4, 6, 10 were considered under correlation levels of 0.80 and 0.95.

Statistical Power: Simulation Results
We recorded the empirical statistical power of the tests for the MF orientation for correlation levels of 0.80 and 0.95 in Tables 5 and 6, respectively. For a better visualization of the power of the ridge tests vs. the OLS test, we provide the power of the tests for α = 0.05, ρ = 0.80, and q = 4, 6, and 10 in Figures 1-3, respectively, and for ρ = 0.95 and q = 4, 6, and 10 in Figures 4-6, respectively. Table 7 provides the average gains in power of the tests with respect to the OLS test for both levels of correlation, namely 0.80 and 0.95. Based on these tables and figures, we observed the following: (i) All the considered tests (with the exception of the one based on K_KSAM when n = 30) achieved higher statistical power than the OLS test. (ii) Keeping the number of variables in the model fixed, the power of the tests increased with the sample size, as expected. (iii) Keeping the sample size fixed, increasing the number of variables in the model decreased the power of the tests. (iv) Among the tests considered, the ones with the highest gain in power over the OLS test across different values of q, n, and ρ were those based on the ridge estimators K_HKB, K_KibMED, K_KibGM, K_M5, K_KSMAX, K_K12, and K_D4. The observed gains over the OLS test were between 12% and 28% (see Table 7).
Therefore, we recommend these seven tests to data analysis practitioners since they achieve the highest power among the ones considered while maintaining a 5% probability of Type I error.

Application Example
The following car consumption dataset, available at http://data-mining-tutorials.blogspot.com/2010/05/solutions-for-multicollinearity-in.html (see Appendix A), was used to illustrate the findings of the paper.
The goal was to create a linear regression model to predict the consumption of cars from various characteristics: price, engine size, horsepower, and weight. There were n = 27 observations in the dataset. We made use of the mctest and lmridge R packages in our computations. For more information on the functionality of these packages, see Ullah, Aslam, and Altaf [36].
There was strong evidence of multicollinearity in the data, as evidenced by all of the VIFs (variance inflation factors) being greater than 10 (see Table 8). Also, the condition number (CN), defined as CN = (largest eigenvalue of X^T X / smallest eigenvalue of X^T X)^{1/2} = 38.3660, was greater than 30, indicating high dependency between the explanatory variables. Since multicollinearity existed, ridge regression estimation was preferable to OLS estimation for this model. We contrasted the results of the OLS method with ridge regression using two of the ridge estimators that showed higher power, namely K_KibMED and K_KibGM; the analyses are given in Table 9. From Table 9, we observed that no variable except weight was a significant predictor of car consumption under OLS estimation. When ridge regression was applied, all variables (price, engine size, horsepower, and weight of the car) became significant predictors of car consumption, and the MSE of the coefficient vector [computed using Equation (5)] also decreased compared to the OLS estimate, as is expected when a ridge regression approach is appropriate. Also, the sign of the coefficient of horsepower reversed from negative and not significant under OLS estimation to positive and significant under ridge regression estimation. A change of sign in the coefficients is one of the signals that the ridge regression approach is a good fit for this particular problem, according to the foundational paper by Hoerl and Kennard [1]. It also makes physical sense that higher horsepower leads to higher gas consumption, so a positive sign for the coefficient is the right choice.
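These diagnostics (computed in the paper with the mctest R package) can be sketched directly from the correlation matrix of the regressors; the helper below is our own illustration, assuming the predictor columns are already in a numeric array:

```python
import numpy as np

def collinearity_diagnostics(X):
    """VIFs and condition number from the correlation matrix of the regressors.

    VIF_j is the j-th diagonal element of R^{-1};
    CN = sqrt(largest eigenvalue of R / smallest eigenvalue of R).
    """
    R = np.corrcoef(X, rowvar=False)
    vifs = np.diag(np.linalg.inv(R))
    eig = np.linalg.eigvalsh(R)
    cn = float(np.sqrt(eig.max() / eig.min()))
    return vifs, cn
```

VIFs above 10 or a CN above 30, as in the car consumption data, flag a design for which ridge estimation is worth considering.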

Some Concluding Remarks
In this paper, we investigated forty different ridge regression estimators in order to find good test statistics for testing the regression coefficients of the linear regression model in the presence of multicollinearity. A simulation study under different conditions was conducted to make an empirical comparison among the ridge regression estimators. We compared the performance of the test statistics based on the empirical size and the power of the tests. It was observed from our simulations that the tests based on the ridge estimators K_HKB, K_KibMED, K_KibGM, K_M5, K_KSMAX, K_K12, and K_D4 were the best in terms of achieving higher power gains with respect to the OLS test while maintaining a 5% nominal size.
Our results are consistent with Kibria and Banik [29], although they did not conclude which tests were the best ones. While Gokpinar and Ebegil [28] concluded that the best tests in terms of power were the ones based on K_HSL and K_HKB, we found that the gains in power over the OLS test for K_HSL are somewhat smaller than the gains for the tests based on the seven estimators mentioned above, and therefore we did not include K_HSL in our final list.
All in all, based on our simulation results, we recommend the tests based on K_HKB, K_KibMED, K_KibGM, K_M5, K_KSMAX, K_K12, and K_D4 to statistical practitioners for the purpose of testing linear regression coefficients when multicollinearity is present.