Abstract
Multicollinearity negatively affects the efficiency of the maximum likelihood estimator (MLE) in both linear and generalized linear models. The Kibria and Lukman estimator (KLE) was developed as an alternative to the MLE for handling multicollinearity in the linear regression model. In this study, we propose the logistic Kibria-Lukman estimator (LKLE) to handle multicollinearity in the logistic regression model. We theoretically establish the conditions under which the new estimator is superior to the MLE, the logistic ridge estimator (LRE), the logistic Liu estimator (LLE), the logistic Liu-type estimator (LLTE) and the logistic two-parameter estimator (LTPE) under the mean squared error criterion. The theoretical conditions were validated using a real-life dataset, and the results showed that they were satisfied. Finally, both the simulation and the real-life results showed that the new estimator outperformed the other estimators considered. However, the performance of the estimators was contingent on the adopted shrinkage parameter estimators.
Keywords: Kibria-Lukman estimator; logistic regression model; Liu estimator; multicollinearity; ridge regression estimator
MSC: 62J05; 62J07; 62J12
1. Introduction
Frisch [1] coined the term “multicollinearity” to describe the problem that occurs when the explanatory variables in a model are linearly related. This problem poses a severe threat to different regression models, e.g., the linear regression model (LRM) and the logistic, Poisson and gamma regression models. The parameters of the linear and logistic regression models are popularly estimated using the ordinary least squares (OLS) estimator and the maximum likelihood estimator (MLE), respectively. However, under multicollinearity, both estimators possess inflated standard errors, and occasionally the estimated regression coefficients exhibit the wrong signs, making any conclusion doubtful [2,3]. The ridge regression estimator (RRE) and the logistic ridge estimator are notable alternatives to the OLS estimator and the MLE in the LRM and the logistic regression model, respectively [4,5]. The Liu estimator is an alternative to the ridge estimator that accounts for multicollinearity in the LRM and the logistic regression model [6,7]. The modified ridge-type estimator is a two-parameter estimator that competes favorably with the ridge and Liu estimators [8,9]. Recently, the K-L estimator emerged as another one-parameter estimator in the class of the ridge and Liu estimators [10]. The K-L estimator is a form of the Liu-type estimator with a single parameter that minimizes the residual sum of squares subject to prior information on the coefficients expressed through the L2 norm. The K-L estimator theoretically outperforms the RRE and the Liu estimator under certain conditions. In this study, we developed the K-L estimator for parameter estimation in the logistic regression model, derived its statistical properties, performed a theoretical comparison with other estimators, and validated its performance through a simulation study and a real-life application.
The organization of this paper is as follows. The proposed estimator is discussed in Section 2. A theoretical comparison of various estimators is presented in Section 3. A simulation study is conducted in Section 4. Real-life data are analyzed in Section 5. Finally, some concluding remarks are given in Section 6.
2. Proposed Estimator
Given that $y_i$ is a binary response variable, the logistic regression model is defined through a Bernoulli distribution, $y_i \sim Be(\pi_i)$, where

$\pi_i = \dfrac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \quad i = 1, 2, \dots, n,$  (1)

and $x_i'$ is the $i$th row of $X$, which is an $n \times (p+1)$ matrix of explanatory variables, $\beta$ is a $(p+1) \times 1$ vector of regression coefficients and $0 < \pi_i < 1$. The parameters in the logistic regression model are estimated by the method of maximum likelihood. The MLE of $\beta$ is

$\hat{\beta}_{\mathrm{MLE}} = (X'\hat{W}X)^{-1} X'\hat{W}\hat{z},$  (2)

where $\hat{W} = \mathrm{diag}[\hat{\pi}_i(1 - \hat{\pi}_i)]$ and $\hat{z}$ is a vector whose $i$th element is $\hat{z}_i = \mathrm{logit}(\hat{\pi}_i) + \dfrac{y_i - \hat{\pi}_i}{\hat{\pi}_i(1 - \hat{\pi}_i)}$. Because Equation (2) depends on the fitted probabilities, the MLE is obtained iteratively.
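Since Equation (2) must be solved iteratively, the MLE is typically computed via iteratively reweighted least squares (IRLS). The following is a minimal sketch in Python/NumPy, assuming a design matrix `X` and a binary response `y`; the function name and defaults are illustrative, not from the original paper.

```python
import numpy as np

def fit_logistic_mle(X, y, tol=1e-8, max_iter=100):
    """Logistic MLE via iteratively reweighted least squares (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        pi = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = pi * (1.0 - pi)                  # diagonal of W-hat
        z = eta + (y - pi) / w               # working response z-hat
        XtW = X.T * w                        # X' W-hat
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # Equation (2)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta
```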
Multicollinearity among the explanatory variables affects the MLE. The variance of the regression parameter estimates is inflated by the presence of multicollinearity [11,12]. The RRE is an alternative to the MLE in the linear and logistic regression models [4,5]. The logistic ridge estimator (LRE) is defined as:

$\hat{\beta}_{\mathrm{LRE}} = (G + kI)^{-1} G \hat{\beta}_{\mathrm{MLE}},$  (3)

where $I$ is an identity matrix, $k$ ($k > 0$) is the ridge parameter, and $G = X'\hat{W}X$ is the estimate of $X'WX$ using $\hat{W}$. The ridge parameter [13] is defined as

$\hat{k} = \dfrac{p\hat{\sigma}^2}{\hat{\beta}'\hat{\beta}},$  (4)

while the logistic version [14] is as follows:

$\hat{k} = \dfrac{1}{\hat{\beta}_{\mathrm{MLE}}'\hat{\beta}_{\mathrm{MLE}}}.$  (5)
The Liu estimator [6] is an alternative to the ridge estimator in the linear regression model, while the logistic Liu estimator (LLE) [7] is expressed as follows:

$\hat{\beta}_{\mathrm{LLE}} = (G + I)^{-1}(G + dI)\hat{\beta}_{\mathrm{MLE}},$  (6)

where $d$ ($0 < d < 1$) is the Liu parameter. Further, we adopted the following method to compute the Liu parameter $d$ [15]:

$\hat{d} = \max\left(0, \min_j\left(\dfrac{\hat{\alpha}_j^2 - 1}{\frac{1}{\lambda_j} + \hat{\alpha}_j^2}\right)\right),$  (7)

where max and min represent the maximum and minimum operators, respectively. Further, $\lambda_j$ represents the $j$th eigenvalue of $G = X'\hat{W}X$ and $\hat{\alpha} = Q'\hat{\beta}_{\mathrm{MLE}}$, where $Q$ is the matrix of eigenvectors of $G$.
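Given the MLE fit, the LRE and LLE reduce to linear algebra on $G$. Below is a minimal sketch, assuming `X`, the fitted `beta_mle` and the weight diagonal `w` from an IRLS fit such as the one above; the helper name is illustrative:

```python
import numpy as np

def ridge_and_liu(X, beta_mle, w):
    G = (X.T * w) @ X                        # G = X' W-hat X
    I = np.eye(G.shape[0])

    k = 1.0 / (beta_mle @ beta_mle)          # logistic ridge k, Equation (5)
    beta_lre = np.linalg.solve(G + k * I, G @ beta_mle)              # Eq. (3)

    lam, Q = np.linalg.eigh(G)               # eigenvalues/eigenvectors of G
    alpha = Q.T @ beta_mle                   # canonical coefficients
    d = max(0.0, np.min((alpha**2 - 1.0) / (1.0 / lam + alpha**2)))  # Eq. (7)
    beta_lle = np.linalg.solve(G + I, (G + d * I) @ beta_mle)        # Eq. (6)
    return beta_lre, beta_lle, k, d
```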
Liu [16] proposed a two-parameter estimator called the Liu-type estimator. Inan and Erdogan [17] extended this work to the logistic regression model. The logistic Liu-type estimator (LLTE) is as follows:

$\hat{\beta}_{\mathrm{LLTE}} = (G + kI)^{-1}(G - dI)\hat{\beta}_{\mathrm{MLE}},$  (8)

where $k$ ($k > 0$) and $d$ ($-\infty < d < \infty$) are the biasing parameters of the LLTE.
Ozkale and Kaciranlar [15] developed the two-parameter estimator (TPE) to mitigate multicollinearity in the LRM. Huang [18] developed the logistic TPE (LTPE), defined as follows:

$\hat{\beta}_{\mathrm{LTPE}} = (G + kI)^{-1}(G + kdI)\hat{\beta}_{\mathrm{MLE}},$  (9)

where $k$ ($k > 0$) and $d$ ($0 < d < 1$) are the biasing parameters, defined in Equations (5) and (7), respectively.
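Both two-parameter estimators are again one-line solves once $G$, $\hat{\beta}_{\mathrm{MLE}}$, $k$ and $d$ are available. A short sketch under the same assumptions as above:

```python
import numpy as np

def two_parameter_estimators(G, beta_mle, k, d):
    I = np.eye(G.shape[0])
    beta_llte = np.linalg.solve(G + k * I, (G - d * I) @ beta_mle)      # Eq. (8)
    beta_ltpe = np.linalg.solve(G + k * I, (G + k * d * I) @ beta_mle)  # Eq. (9)
    return beta_llte, beta_ltpe
```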
Recently, the K-L estimator (KLE) [10] has shown better performance than the ordinary least squares estimator, the RRE and the Liu estimator for parameter estimation in the LRM. The KLE is defined as

$\hat{\beta}_{\mathrm{KLE}} = (X'X + kI)^{-1}(X'X - kI)\hat{\beta}_{\mathrm{OLS}},$  (10)

where $k$ ($k > 0$) is the KLE biasing parameter, which, as will be discussed in Section 3.6, is obtained by minimizing the mean squared error (MSE). In this study, we propose the logistic K-L estimator (LKLE) as

$\hat{\beta}_{\mathrm{LKLE}} = (G + kI)^{-1}(G - kI)\hat{\beta}_{\mathrm{MLE}}.$  (11)
The bias and the matrix mean squared error (MMSE) of the LKLE are obtained as follows.
The bias of the LKLE is

$\mathrm{Bias}(\hat{\beta}_{\mathrm{LKLE}}) = E(\hat{\beta}_{\mathrm{LKLE}}) - \beta = -2k(G + kI)^{-1}\beta = -2kF\beta,$  (12)

where $F = (G + kI)^{-1}$.
The variance of the LKLE is defined as follows:

$\mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}}) = F(G - kI)\,G^{-1}\,(G - kI)F,$  (13)

where $G^{-1} = \mathrm{Cov}(\hat{\beta}_{\mathrm{MLE}})$ is the asymptotic covariance matrix of the MLE.
Therefore, the MMSE and the scalar mean squared error (MSE) are, respectively, given by

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) = F(G - kI)G^{-1}(G - kI)F + 4k^2 F\beta\beta' F$  (14)

and

$\mathrm{MSE}(\hat{\beta}_{\mathrm{LKLE}}) = \sum_{j=1}^{p}\dfrac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2} + 4k^2\sum_{j=1}^{p}\dfrac{\alpha_j^2}{(\lambda_j + k)^2},$  (15)

where $\alpha_j$ is the $j$th element of $\alpha = Q'\beta$.
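A minimal sketch of the proposed estimator and its scalar MSE in canonical form, assuming `G` and `beta_mle` as before; `lam` and `alpha` denote the eigenvalues of $G$ and the canonical coefficients:

```python
import numpy as np

def lkle(G, beta_mle, k):
    """Logistic K-L estimator, Equation (11)."""
    I = np.eye(G.shape[0])
    return np.linalg.solve(G + k * I, (G - k * I) @ beta_mle)

def lkle_mse(lam, alpha, k):
    """Scalar MSE of the LKLE, Equation (15): variance + squared bias."""
    var = np.sum((lam - k) ** 2 / (lam * (lam + k) ** 2))
    bias2 = 4.0 * k**2 * np.sum(alpha**2 / (lam + k) ** 2)
    return var + bias2
```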
The MMSE and MSE of the MLE, LRE, LLE, LLTE and LTPE are given, respectively, as follows:

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{MLE}}) = G^{-1}$  (16)
$\mathrm{MSE}(\hat{\beta}_{\mathrm{MLE}}) = \sum_{j=1}^{p}\dfrac{1}{\lambda_j}$  (17)

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{LRE}}) = FGF + k^2 F\beta\beta' F$  (18)
$\mathrm{MSE}(\hat{\beta}_{\mathrm{LRE}}) = \sum_{j=1}^{p}\dfrac{\lambda_j}{(\lambda_j + k)^2} + k^2\sum_{j=1}^{p}\dfrac{\alpha_j^2}{(\lambda_j + k)^2}$  (19)

where $F = (G + kI)^{-1}$ as before;

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{LLE}}) = (G + I)^{-1}(G + dI)G^{-1}(G + dI)(G + I)^{-1} + (d - 1)^2(G + I)^{-1}\beta\beta'(G + I)^{-1}$  (20)
$\mathrm{MSE}(\hat{\beta}_{\mathrm{LLE}}) = \sum_{j=1}^{p}\dfrac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} + (d - 1)^2\sum_{j=1}^{p}\dfrac{\alpha_j^2}{(\lambda_j + 1)^2}$  (21)

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{LLTE}}) = F(G - dI)G^{-1}(G - dI)F + (k + d)^2 F\beta\beta' F$  (22)
$\mathrm{MSE}(\hat{\beta}_{\mathrm{LLTE}}) = \sum_{j=1}^{p}\dfrac{(\lambda_j - d)^2}{\lambda_j(\lambda_j + k)^2} + (k + d)^2\sum_{j=1}^{p}\dfrac{\alpha_j^2}{(\lambda_j + k)^2}$  (23)

$\mathrm{MMSE}(\hat{\beta}_{\mathrm{LTPE}}) = F(G + kdI)G^{-1}(G + kdI)F + k^2(d - 1)^2 F\beta\beta' F$  (24)
$\mathrm{MSE}(\hat{\beta}_{\mathrm{LTPE}}) = \sum_{j=1}^{p}\dfrac{(\lambda_j + kd)^2}{\lambda_j(\lambda_j + k)^2} + k^2(d - 1)^2\sum_{j=1}^{p}\dfrac{\alpha_j^2}{(\lambda_j + k)^2}$  (25)
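For later comparisons, the scalar MSEs above translate directly into code. A sketch in canonical form, with `lam` and `alpha` as before:

```python
import numpy as np

def mse_mle(lam):
    return np.sum(1.0 / lam)                                  # Equation (17)

def mse_lre(lam, alpha, k):                                   # Equation (19)
    return np.sum(lam / (lam + k)**2) + k**2 * np.sum(alpha**2 / (lam + k)**2)

def mse_lle(lam, alpha, d):                                   # Equation (21)
    return (np.sum((lam + d)**2 / (lam * (lam + 1)**2))
            + (d - 1)**2 * np.sum(alpha**2 / (lam + 1)**2))

def mse_llte(lam, alpha, k, d):                               # Equation (23)
    return (np.sum((lam - d)**2 / (lam * (lam + k)**2))
            + (k + d)**2 * np.sum(alpha**2 / (lam + k)**2))

def mse_ltpe(lam, alpha, k, d):                               # Equation (25)
    return (np.sum((lam + k*d)**2 / (lam * (lam + k)**2))
            + k**2 * (d - 1)**2 * np.sum(alpha**2 / (lam + k)**2))
```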
The following lemmas are needed to prove the statistical properties of the proposed estimator.
Lemma 1.
Let M be a positive definite matrix, that is, M > 0, and let $\alpha$ be some vector. Then $M - \alpha\alpha' \ge 0$ if and only if $\alpha' M^{-1}\alpha \le 1$ [19].
Lemma 2.
Let $\hat{\beta}_j = A_j y$, $j = 1, 2$, be two linear estimators of β [20]. Suppose that $D = \mathrm{Cov}(\hat{\beta}_1) - \mathrm{Cov}(\hat{\beta}_2) > 0$, where $\mathrm{Cov}(\hat{\beta}_j)$ denotes the covariance matrix of $\hat{\beta}_j$, and let $b_j = \mathrm{Bias}(\hat{\beta}_j)$, $j = 1, 2$. Consequently,
$\Delta(\hat{\beta}_1, \hat{\beta}_2) = \mathrm{MMSE}(\hat{\beta}_1) - \mathrm{MMSE}(\hat{\beta}_2) = D + b_1 b_1' - b_2 b_2' \ge 0$
if and only if $b_2'(D + b_1 b_1')^{-1} b_2 \le 1$, where $\mathrm{MMSE}(\hat{\beta}_j) = \mathrm{Cov}(\hat{\beta}_j) + b_j b_j'$.
3. Comparison among the Estimators
In this section, we will perform a theoretical comparison of the proposed estimator with the available estimators in terms of MMSEs.
3.1. Comparison between $\hat{\beta}_{\mathrm{LKLE}}$ and $\hat{\beta}_{\mathrm{MLE}}$
Theorem 1.
If k > 0, the estimator $\hat{\beta}_{\mathrm{LKLE}}$ is preferable to the estimator $\hat{\beta}_{\mathrm{MLE}}$ in the MMSE sense if and only if $b_2' D_1^{-1} b_2 \le 1$, where $b_2 = \mathrm{Bias}(\hat{\beta}_{\mathrm{LKLE}}) = -2kF\beta$ and $D_1 = \mathrm{Cov}(\hat{\beta}_{\mathrm{MLE}}) - \mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}})$.
Proof.
$D_1 = G^{-1} - F(G - kI)G^{-1}(G - kI)F$ can be written in scalar (canonical) form as follows:

$D_1 = Q\,\mathrm{diag}\left\{\dfrac{1}{\lambda_j} - \dfrac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q' = Q\,\mathrm{diag}\left\{\dfrac{4k}{(\lambda_j + k)^2}\right\}_{j=1}^{p} Q'.$

$D_1$ is positive definite since $(\lambda_j + k)^2 - (\lambda_j - k)^2 = 4k\lambda_j > 0$ for $k > 0$. Hence, using Lemma 2 with $b_1 = 0$ (the MLE is asymptotically unbiased), $\mathrm{MMSE}(\hat{\beta}_{\mathrm{MLE}}) - \mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) > 0$ if and only if $b_2' D_1^{-1} b_2 \le 1$. This is practically illustrated in Section 5 (Proof completed). □
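The condition in Theorem 1 is easy to check numerically. Below is a small sanity check on arbitrary illustrative values (not from the paper), verifying that $D_1 > 0$ and that the Lemma 2 condition agrees with nonnegative definiteness of the MMSE difference:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
G = A @ A.T + 4 * np.eye(4)                 # an arbitrary positive definite G
beta = rng.normal(size=4)
k = 0.5

F = np.linalg.inv(G + k * np.eye(4))
cov_mle = np.linalg.inv(G)
cov_lkle = F @ (G - k*np.eye(4)) @ cov_mle @ (G - k*np.eye(4)) @ F
D1 = cov_mle - cov_lkle
print(np.all(np.linalg.eigvalsh(D1) > 0))   # D1 is positive definite

b2 = -2 * k * F @ beta                      # bias of the LKLE, Equation (12)
lhs = b2 @ np.linalg.solve(D1, b2)          # Theorem 1 condition value
mmse_diff = D1 - np.outer(b2, b2)           # MMSE(MLE) - MMSE(LKLE)
# The two booleans below agree, as Lemma 2 asserts:
print(lhs <= 1, np.all(np.linalg.eigvalsh(mmse_diff) >= -1e-12))
```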
3.2. Comparison between $\hat{\beta}_{\mathrm{LKLE}}$ and $\hat{\beta}_{\mathrm{LRE}}$
Theorem 2.
If $0 < k < 2\lambda_j$ for all $j$, the estimator $\hat{\beta}_{\mathrm{LKLE}}$ is preferable to the estimator $\hat{\beta}_{\mathrm{LRE}}$ in the MMSE sense if and only if $b_2'(D_2 + b_1 b_1')^{-1} b_2 \le 1$, where $b_1 = \mathrm{Bias}(\hat{\beta}_{\mathrm{LRE}}) = -kF\beta$, $b_2 = -2kF\beta$ and $D_2 = \mathrm{Cov}(\hat{\beta}_{\mathrm{LRE}}) - \mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}})$.
Proof.
$D_2$ can be written in scalar form as follows:

$D_2 = Q\,\mathrm{diag}\left\{\dfrac{\lambda_j}{(\lambda_j + k)^2} - \dfrac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q' = Q\,\mathrm{diag}\left\{\dfrac{k(2\lambda_j - k)}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q'.$

$D_2$ is positive definite since $\lambda_j^2 - (\lambda_j - k)^2 = k(2\lambda_j - k) > 0$ for $0 < k < 2\lambda_j$.
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{\mathrm{LRE}}) - \mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) > 0$ if and only if $b_2'(D_2 + b_1 b_1')^{-1} b_2 \le 1$. This is practically illustrated in Section 5 (Proof completed). □
3.3. Comparison between $\hat{\beta}_{\mathrm{LKLE}}$ and $\hat{\beta}_{\mathrm{LLE}}$
Theorem 3.
If k > 0 and 0 < d < 1, the estimator $\hat{\beta}_{\mathrm{LKLE}}$ is preferable to the estimator $\hat{\beta}_{\mathrm{LLE}}$ in the MMSE sense if and only if $b_2'(D_3 + b_1 b_1')^{-1} b_2 \le 1$, where $b_1 = \mathrm{Bias}(\hat{\beta}_{\mathrm{LLE}}) = (d - 1)(G + I)^{-1}\beta$, $b_2 = -2kF\beta$ and $D_3 = \mathrm{Cov}(\hat{\beta}_{\mathrm{LLE}}) - \mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}})$.
Proof.
$D_3$ can be written in scalar form as follows:

$D_3 = Q\,\mathrm{diag}\left\{\dfrac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} - \dfrac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q'.$

$D_3$ is positive definite since $(\lambda_j + d)^2(\lambda_j + k)^2 - (\lambda_j - k)^2(\lambda_j + 1)^2 > 0$ whenever $(\lambda_j + d)(\lambda_j + k) > (\lambda_j - k)(\lambda_j + 1)$, that is, whenever $\lambda_j(2k + d - 1) + k(d + 1) > 0$ for $k > 0$ and $0 < d < 1$.
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{\mathrm{LLE}}) - \mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) > 0$ if and only if $b_2'(D_3 + b_1 b_1')^{-1} b_2 \le 1$. This is practically illustrated in Section 5 (Proof completed). □
3.4. Comparison between $\hat{\beta}_{\mathrm{LKLE}}$ and $\hat{\beta}_{\mathrm{LLTE}}$
Theorem 4.
If k > 0 and $d < k$ with $k + d < 2\lambda_j$, the estimator $\hat{\beta}_{\mathrm{LKLE}}$ is preferable to the estimator $\hat{\beta}_{\mathrm{LLTE}}$ in the MMSE sense if and only if $b_2'(D_4 + b_1 b_1')^{-1} b_2 \le 1$, where $b_1 = \mathrm{Bias}(\hat{\beta}_{\mathrm{LLTE}}) = -(k + d)F\beta$, $b_2 = -2kF\beta$ and $D_4 = \mathrm{Cov}(\hat{\beta}_{\mathrm{LLTE}}) - \mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}})$.
Proof.
$D_4$ can be written in scalar form as follows:

$D_4 = Q\,\mathrm{diag}\left\{\dfrac{(\lambda_j - d)^2 - (\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q'.$

$D_4$ is non-negative (n.n.) since $(\lambda_j - d)^2 - (\lambda_j - k)^2 = (k - d)(2\lambda_j - k - d) \ge 0$ under the stated conditions.
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{\mathrm{LLTE}}) - \mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) \ge 0$ if and only if $b_2'(D_4 + b_1 b_1')^{-1} b_2 \le 1$. This is practically illustrated in Section 5 (Proof completed). □
3.5. Comparison between $\hat{\beta}_{\mathrm{LKLE}}$ and $\hat{\beta}_{\mathrm{LTPE}}$
Theorem 5.
If k > 0 and 0 < d < 1 with $k(1 - d) < 2\lambda_j$, the estimator $\hat{\beta}_{\mathrm{LKLE}}$ is preferable to the estimator $\hat{\beta}_{\mathrm{LTPE}}$ in the MMSE sense if and only if $b_2'(D_5 + b_1 b_1')^{-1} b_2 \le 1$, where $b_1 = \mathrm{Bias}(\hat{\beta}_{\mathrm{LTPE}}) = -k(1 - d)F\beta$, $b_2 = -2kF\beta$ and $D_5 = \mathrm{Cov}(\hat{\beta}_{\mathrm{LTPE}}) - \mathrm{Cov}(\hat{\beta}_{\mathrm{LKLE}})$.
Proof.
$D_5$ can be written in scalar form as follows:

$D_5 = Q\,\mathrm{diag}\left\{\dfrac{(\lambda_j + kd)^2 - (\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p} Q'.$

$D_5$ is non-negative (n.n.) since $(\lambda_j + kd)^2 - (\lambda_j - k)^2 = k(d + 1)\left[2\lambda_j - k(1 - d)\right] \ge 0$ under the stated conditions.
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{\mathrm{LTPE}}) - \mathrm{MMSE}(\hat{\beta}_{\mathrm{LKLE}}) \ge 0$ if and only if $b_2'(D_5 + b_1 b_1')^{-1} b_2 \le 1$. This is practically illustrated in Section 5 (Proof completed). □
3.6. Selection of k
Since the shrinkage parameter plays a significant role in the performance of biased estimators such as the LRE, LLE and LKLE, several researchers have introduced various shrinkage parameter estimation methods for different regression models [21,22,23,24,25,26,27,28,29]. Based on these studies, we propose some shrinkage estimators of the parameter k for the LKLE.
To estimate the parameter k, following [4], we consider the generalized version of the K-L estimator, which is given as follows:

$\hat{\beta}_{\mathrm{GLKLE}} = (G + K)^{-1}(G - K)\hat{\beta}_{\mathrm{MLE}},$

where $K = \mathrm{diag}(k_1, k_2, \dots, k_p)$, $k_j \ge 0$.
The MSE of this generalized K-L estimator is

$\mathrm{MSE}(\hat{\beta}_{\mathrm{GLKLE}}) = \sum_{j=1}^{p}\dfrac{(\lambda_j - k_j)^2}{\lambda_j(\lambda_j + k_j)^2} + 4\sum_{j=1}^{p}\dfrac{k_j^2\alpha_j^2}{(\lambda_j + k_j)^2}.$

Differentiating this expression with respect to $k_j$ (all terms with index other than $j$ vanish) and equating to zero, we have

$\dfrac{\partial\,\mathrm{MSE}}{\partial k_j} = \dfrac{-4(\lambda_j - k_j)}{(\lambda_j + k_j)^3} + \dfrac{8k_j\lambda_j\alpha_j^2}{(\lambda_j + k_j)^3} = 0.$

Multiplying through by $(\lambda_j + k_j)^3/4$ and simplifying, we obtain $\lambda_j - k_j = 2k_j\lambda_j\alpha_j^2$, so that

$k_j = \dfrac{\lambda_j}{1 + 2\lambda_j\alpha_j^2} = \dfrac{1}{2\alpha_j^2 + \frac{1}{\lambda_j}}.$

By replacing $\alpha_j$ with its estimate $\hat{\alpha}_j$, this becomes

$\hat{k}_j = \dfrac{1}{2\hat{\alpha}_j^2 + \frac{1}{\lambda_j}}.$
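A quick numeric check of this derivation, on illustrative values only: the closed-form $k_j$ should coincide with the minimizer of the $j$th MSE component found by a grid search.

```python
import numpy as np

lam, alpha = 2.5, 0.8                       # illustrative values only
k_opt = 1.0 / (2 * alpha**2 + 1.0 / lam)    # closed-form minimizer

ks = np.linspace(1e-4, 5, 100_000)
mse_j = ((lam - ks)**2 / (lam * (lam + ks)**2)
         + 4 * ks**2 * alpha**2 / (lam + ks)**2)
print(k_opt, ks[np.argmin(mse_j)])          # the two values agree closely
```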
Following Hoerl et al. [13], and based on the studies of Mansson et al. [7], Lukman and Ayinde [3] and Qasim et al. [22,30], we suggest the following biasing parameter estimators for the logistic regression model (see the sketch after this list):
- LKLE 1: $\hat{k}_1 = \min_j\left(\dfrac{1}{2\hat{\alpha}_j^2 + \frac{1}{\lambda_j}}\right)$
- LKLE 2: $\hat{k}_2 = \max_j\left(\dfrac{1}{2\hat{\alpha}_j^2 + \frac{1}{\lambda_j}}\right)$
- LRE 1: $\hat{k} = \dfrac{1}{\hat{\alpha}_{\max}^2}$
- LRE 2: $\hat{k} = \dfrac{p}{\sum_{j=1}^{p}\hat{\alpha}_j^2}$
- LLE: $\hat{d}$ as defined in Equation (7)
- LLTE: $\hat{k} = \dfrac{1}{\hat{\alpha}_{\max}^2}$, $\hat{d}$ as defined in Equation (7)
- LTPE: $\hat{k}$ as defined in Equation (5), $\hat{d}$ as defined in Equation (7)
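A minimal sketch computing these quantities from `lam` (eigenvalues of $G$) and `alpha` ($= Q'\hat{\beta}_{\mathrm{MLE}}$); the variants follow the list above and the function name is illustrative:

```python
import numpy as np

def shrinkage_parameters(lam, alpha):
    base = 1.0 / (2.0 * alpha**2 + 1.0 / lam)   # per-component optimal k_j
    return {
        "k_lkle1": np.min(base),
        "k_lkle2": np.max(base),
        "k_lre1": 1.0 / np.max(alpha**2),       # Hoerl-Kennard-type analog
        "k_lre2": len(lam) / np.sum(alpha**2),  # HKB-type analog
        "d_liu": max(0.0, np.min((alpha**2 - 1.0) / (1.0/lam + alpha**2))),
    }
```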
4. Monte Carlo Simulation
In this section, we compare the performance of the logistic regression estimators using a simulation study. A significant number of simulation studies have been conducted to compare the performance of estimators for both linear and logistic regression models [24,25,26,27,28,29,30,31,32,33,34,35]. Since the MSE is a function of β, the true parameter vector is chosen subject to the constraint β′β = 1, which is a commonly used restriction [36,37]. Schaefer [14] showed that data for the logistic regression model can be generated employing a similar approach to that of the linear regression model. The correlated explanatory variables were obtained using the simulation procedure given in [38,39]:
$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, 2, \dots, n, \; j = 1, 2, \dots, p,$

where the $z_{ij}$ are independent standard normal pseudo-random numbers and $\rho^2$ is the correlation between any two explanatory variables. The values of ρ are chosen to be 0.9, 0.95, 0.99 and 0.999. The response variable is generated from the Bernoulli distribution, i.e., $y_i \sim Be(\pi_i)$, where $\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$. The sample size, n, is varied, i.e., 50, 100, 250 or 300. The estimated MSE is calculated as

$\widehat{\mathrm{MSE}}(\hat{\beta}) = \dfrac{1}{R}\sum_{i=1}^{R}(\hat{\beta}_i - \beta)'(\hat{\beta}_i - \beta),$

where $\hat{\beta}_i$ denotes the vector of estimated regression coefficients in the $i$th replication, β is the vector of true parameter values, chosen such that $\beta'\beta = 1$, and $R = 2000$ is the number of replications. We present the estimated MSEs and the bias of each of the estimators for p = 3 in Table 1 and Table 2, respectively. For p = 7, the results are provided in Table 3 and Table 4, respectively. The following observations were obtained from the simulation results. Increasing the sample size resulted in a decrease in the MSE values in each case, while the MSE values of the estimators increased as the degree of correlation and the number of explanatory variables increased. The LKLE performed best at most levels of multicollinearity, sample sizes and numbers of explanatory variables, with few exceptions. The LTPE competed favorably in most cases, except on a few occasions. Upon comparing the performance of the shrinkage parameters in the LKLE, we found that LKLE 1 performed well except in a few cases. The MLE performed least well when there was multicollinearity in the data. Of the two-parameter estimators (LTPE and LLTE), the LTPE performed better. Additionally, the bias of the proposed estimator was the lowest in most cases. Generally, the LKLE is preferred over the two-parameter estimators.
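A condensed sketch of this design, reusing the helpers sketched earlier (`fit_logistic_mle`, `lkle`); all names and defaults are illustrative, and the β used here is just one simple unit-norm choice:

```python
import numpy as np

def simulate(n=100, p=3, rho=0.99, reps=2000, seed=42):
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)           # satisfies beta'beta = 1
    sse_mle = sse_lkle = 0.0
    for _ in range(reps):
        z = rng.standard_normal((n, p + 1))
        X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        y = rng.binomial(1, pi)

        b_mle = fit_logistic_mle(X, y)
        pi_hat = np.clip(1/(1 + np.exp(-X @ b_mle)), 1e-10, 1 - 1e-10)
        G = (X.T * (pi_hat * (1 - pi_hat))) @ X
        lam, Q = np.linalg.eigh(G)
        alpha = Q.T @ b_mle
        k = np.min(1.0 / (2*alpha**2 + 1.0/lam))   # LKLE 1
        b_kl = lkle(G, b_mle, k)

        sse_mle += np.sum((b_mle - beta)**2)
        sse_lkle += np.sum((b_kl - beta)**2)
    return sse_mle / reps, sse_lkle / reps   # estimated MSEs
```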
Table 1.
Estimated MSEs of the estimators for p = 3.
Table 2.
Estimated bias of the estimators for p = 3.
Table 3.
Estimated MSEs of the estimators for p = 7.
Table 4.
Estimated bias of the estimators for p = 7.
5. Application: Cancer Data
The performance of the LKLE and the other estimators was evaluated using a cancer remission dataset [34,40]. In the dataset, the binary response variable yi is 1 if the patient experiences complete cancer remission and 0 otherwise. There are five explanatory variables: cell index (x1), smear index (x2), infil index (x3), blast index (x4) and temperature (x5). There were 27 patients, of whom nine experienced complete remission. The eigenvalues of the matrix $X'\hat{W}X$ were found to be λ1 = 9.2979, λ2 = 3.8070, λ3 = 3.0692, λ4 = 2.2713 and λ5 = 0.0314. To test for multicollinearity among the explanatory variables, we used the condition index (CI), computed as $\mathrm{CI} = \sqrt{\lambda_{\max}/\lambda_{\min}} = 17.2$. There is moderate collinearity when the CI lies between 10 and 30 and severe multicollinearity when it exceeds 30 [41]. Thus, the results provide evidence of moderate multicollinearity among the explanatory variables. Next, we compared the performance of the estimators using this dataset. The estimated regression coefficients and the corresponding scalar MSE values are given in Table 5. The scalar MSEs of the estimators under study were obtained using Equations (15), (17), (19), (21), (23) and (25), respectively. The proposed LKLE surpassed the other estimators in this study in terms of the MSE.
Table 5.
Regression coefficients and MSEs of the logistic regression estimators for the cancer dataset *.
Moreover, we also evaluated the theoretical conditions stated in Theorems 1 to 5 on the actual dataset. The validation results for these conditions are given in Table 6. As shown, all the theorem conditions hold for the cancer data, because all the inequality values in the theorems were less than one, as expected.
Table 6.
Validation of the theoretical conditions for the cancer data.
The logistic ridge estimator competed favorably in the simulation and the real-life application, and the real-life results agreed with the simulation study. However, the performance of the estimators in both the simulation and the real-life application was a function of the biasing parameter. For instance, LKLE 1 performed best in the simulation study, while in the real-life analysis, LKLE 2 outperformed LKLE 1. Among the two-parameter estimators, the logistic two-parameter estimator (LTPE) performed best. Of the one-parameter estimators, the LKLE outperformed the ridge and Liu estimators. Generally, the LKLE dominated among both the one- and two-parameter estimators. The performance of these estimators is a function of the biasing parameters k and d. Additionally, as shown in Table 5, some of the estimated coefficients did not fit well for the following estimators: MLE, LLE, LLTE, LTPE and LKLE 1.
6. Some Concluding Remarks
Kibria and Lukman [10] developed the K-L estimator to circumvent the multicollinearity problem in the linear regression model. In this paper, we proposed the logistic Kibria-Lukman estimator (LKLE) to address the challenge of multicollinearity in the logistic regression model. We theoretically determined the conditions for the superiority of the LKLE over other existing estimators in terms of the MSE. The performance of the estimators was evaluated using a Monte Carlo simulation study in which factors such as the degree of correlation, the sample size and the number of explanatory variables were varied. The results showed that the performance of the estimators was highly dependent on these factors. Finally, to illustrate the efficiency of the proposed estimator, we applied the estimators to a cancer dataset and observed that the results agreed with those of the simulation study to some extent. The findings of this study will be helpful for practitioners and applied researchers who use a logistic regression model with correlated explanatory variables.
Author Contributions
A.F.L.: Conceptualization, Methodology, Formal analysis, Software, Writing—original draft. B.M.G.K.: Conceptualization, Supervision, Review. R.F.: Writing—original draft, Resources, Review. M.A.: Methodology, Supervision, Writing—original draft. E.T.A.: Methodology, Formal analysis, Software, Writing—original draft. C.K.N.: Writing—original draft, Review. All authors have read and agreed to the published version of the manuscript.
Funding
The authors received no funding for this work.
Data Availability Statement
The data are available within this article.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Frisch, R. Statistical Confluence Analysis by Means of Complete Regression Systems; University Institute of Economics: Oslo, Norway, 1934.
- Kibria, B.M.G.; Mansson, K.; Shukur, G. Performance of some logistic ridge regression estimators. Comput. Econ. 2012, 40, 401–414.
- Lukman, A.F.; Ayinde, K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–967.
- Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
- Schaefer, R.L.; Roi, L.D.; Wolfe, R.A. A ridge logistic estimator. Commun. Stat. Theory Methods 1984, 13, 99–113.
- Liu, K. A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 1993, 22, 393–402.
- Mansson, K.; Kibria, B.M.G.; Shukur, G. On Liu estimators for the logit regression model. Econ. Model. 2012, 29, 1483–1488.
- Lukman, A.F.; Ayinde, K.; Binuomote, S.; Onate, A.C. Modified ridge-type estimator to combat multicollinearity: Application to chemical data. J. Chemom. 2019, 33, e3125.
- Lukman, A.F.; Adewuyi, E.; Onate, A.C.; Ayinde, K. A modified ridge-type logistic estimator. Iran. J. Sci. Technol. Trans. A Sci. 2020, 44, 437–443.
- Kibria, B.M.G.; Lukman, A.F. A new ridge type estimator for the linear regression model: Simulations and applications. Scientifica 2020, 2020, 9758378.
- Lukman, A.F.; Ayinde, K.; Aladeitan, B.; Bamidele, R. An unbiased estimator with prior information. Arab J. Basic Appl. Sci. 2020, 27, 45–55.
- Dawoud, I.; Lukman, A.F.; Haadi, A. A new biased regression estimator: Theory, simulation and application. Sci. Afr. 2022, 15, e01100.
- Hoerl, A.E.; Kennard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat. Theory Methods 1975, 4, 105–123.
- Schaefer, R.L. Alternative estimators in logistic regression when the data is collinear. J. Stat. Comput. Simul. 1986, 25, 75–91.
- Özkale, M.R.; Kaciranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Stat. Theory Methods 2007, 36, 2707–2725.
- Liu, K. Using Liu-type estimator to combat collinearity. Commun. Stat. Theory Methods 2003, 32, 1009–1020.
- Inan, D.; Erdogan, B.E. Liu-type logistic estimator. Commun. Stat. Simul. Comput. 2013, 42, 1578–1586.
- Huang, J. A simulation research on a biased estimator in logistic regression model. In Computational Intelligence and Intelligent Systems. ISICA 2012; Communications in Computer and Information Science; Li, Z., Li, X., Liu, Y., Cai, Z., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 316.
- Farebrother, R.W. Further results on the mean square error of ridge regression. J. R. Stat. Soc. Ser. B 1976, 38, 248–250.
- Trenkler, G.; Toutenburg, H. Mean squared error matrix comparisons between biased estimators—An overview of recent results. Stat. Pap. 1990, 31, 165–179.
- Kibria, B.M.G. Performance of some new ridge regression estimators. Commun. Stat. Simul. Comput. 2003, 32, 419–435.
- Qasim, M.; Amin, M.; Ullah, M.A. On the performance of some new Liu parameters for the gamma regression model. J. Stat. Comput. Simul. 2018, 88, 3065–3080.
- Amin, M.; Akram, M.N.; Majid, A. On the estimation of Bell regression model using ridge estimator. Commun. Stat. Simul. Comput. 2021.
- Lukman, A.F.; Zakariya, A.; Kibria, B.M.G.; Ayinde, K. The KL estimator for the inverse Gaussian regression model. Concurr. Comput. Pract. Exper. 2021, 33, e6222.
- Lukman, A.F.; Aladeitan, B.; Ayinde, K.; Abonazel, M.R. Modified ridge-type for the Poisson regression model: Simulation and application. J. Appl. Stat. 2021, 49, 2124–2136.
- Lukman, A.F.; Adewuyi, E.; Månsson, K.; Kibria, B.M.G. A new estimator for the multicollinear Poisson regression model: Simulation and application. Sci. Rep. 2021, 11, 3732.
- Amin, M.; Qasim, M.; Amanullah, M.; Afzal, S. Performance of some ridge estimators for the gamma regression model. Stat. Pap. 2020, 61, 997–1026.
- Amin, M.; Qasim, M.; Afzal, S.; Naveed, M. New ridge estimators in the inverse Gaussian regression: Monte Carlo simulation and application to chemical data. Commun. Stat. Simul. Comput. 2020, 51, 6170–6187.
- Naveed, M.; Amin, M.; Afzal, S.; Qasim, M. New shrinkage parameters for the inverse Gaussian Liu regression. Commun. Stat. Theory Methods 2020, 51, 3216–3236.
- Qasim, M.; Amin, M.; Omer, T. Performance of some new Liu parameters for the linear regression model. Commun. Stat. Theory Methods 2020, 49, 4178–4196.
- Ayinde, K.; Lukman, A.F.; Samuel, O.O.; Ajiboye, S.A. Some new adjusted ridge estimators of linear regression model. Int. J. Civ. Eng. Technol. 2018, 9, 2838–2852.
- Asar, Y.; Genç, A. Two-parameter ridge estimator in the binary logistic regression. Commun. Stat. Simul. Comput. 2017, 46, 7088–7099.
- Kibria, B.M.G.; Banik, S. Some ridge regression estimators and their performances. J. Mod. Appl. Stat. Methods 2016, 15, 206–238.
- Özkale, M.R.; Arıcan, E. A new biased estimator in logistic regression model. Statistics 2016, 50, 233–253.
- Varathan, N.; Wijekoon, P. Optimal generalized logistic estimator. Commun. Stat. Theory Methods 2018, 47, 463–474.
- Saleh, A.K.Md.E.; Arashi, M.; Kibria, B.M.G. Theory of Ridge Regression Estimation with Applications; John Wiley: Hoboken, NJ, USA, 2019.
- Newhouse, J.P.; Oman, S.D. An Evaluation of Ridge Estimators; P-716-PR; Rand Corporation: Santa Monica, CA, USA, 1971; pp. 1–28.
- Gibbons, D.G. A simulation study of some ridge estimators. J. Am. Stat. Assoc. 1981, 76, 131–139.
- McDonald, G.C.; Galarneau, D.I. A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 1975, 70, 407–416.
- Lesaffre, E.; Marx, B.D. Collinearity in generalized linear regression. Commun. Stat. Theory Methods 1993, 22, 1933–1952.
- Gujarati, D.N. Basic Econometrics; McGraw-Hill: New York, NY, USA, 1995.