Article

Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size

Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria
Int. J. Environ. Res. Public Health 2019, 16(23), 4658; https://doi.org/10.3390/ijerph16234658
Received: 25 October 2019 / Revised: 12 November 2019 / Accepted: 20 November 2019 / Published: 22 November 2019
(This article belongs to the Special Issue Statistical Advances in Epidemiology and Public Health)
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ because the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existing ML parameter estimates, Firth’s correction (FC) has been proposed. In practice, however, a principal investigator might instead be advised to ‘bring more data’ to resolve a separation issue. We illustrate the problem with examples from colorectal cancer screening and ornithology. It is unclear whether such an increasing sample size (ISS) strategy, which keeps sampling new observations until separation is removed, improves estimation compared to applying FC to the original data set. We performed an extensive simulation study whose main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
Keywords: maximum likelihood estimation; logistic regression; Firth’s correction; separation; penalized likelihood; bias
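To make the notion of separation concrete, the following is a minimal sketch in Python (NumPy only). It is not the authors' code. The function name `fit_logistic`, the toy data, and the iteration settings are assumptions chosen for illustration. The sketch uses the commonly cited modified-score form of Firth's correction, in which each residual y_i - pi_i is augmented by h_i (0.5 - pi_i), with h_i the diagonal elements of the weighted hat matrix. In the toy data every observation with x = 1 is an event, so the ML estimate of the slope does not exist, while the FC estimate is finite.

```python
import numpy as np

def fit_logistic(X, y, firth=True, max_iter=25, tol=1e-8):
    """Fisher-scoring iterations for logistic regression.

    With firth=True the score is modified as in Firth's correction;
    with firth=False this is plain maximum likelihood.
    Returns (beta, converged). Illustrative helper, not a library API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = pi * (1.0 - pi)                       # working weights
        info = X.T @ (X * w[:, None])             # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        resid = y - pi
        if firth:
            # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
            h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
            resid = resid + h * (0.5 - pi)        # Firth-modified residual
        step = info_inv @ (X.T @ resid)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False

# Toy data (hypothetical): x = 0 has 8 non-events and 2 events,
# x = 1 has 5 events and no non-events, so the data are separated.
x = np.array([0] * 10 + [1] * 5)
y = np.array([0] * 8 + [1] * 2 + [1] * 5)
X = np.column_stack([np.ones_like(x), x])

beta_fc, ok_fc = fit_logistic(X, y, firth=True)
# Few iterations for ML: under separation the slope keeps growing and the
# information matrix becomes numerically singular if iterated much longer.
beta_ml, ok_ml = fit_logistic(X, y, firth=False, max_iter=15)

print("FC:", beta_fc, "converged:", ok_fc)
print("ML:", beta_ml, "converged:", ok_ml)
```

For a single binary covariate, the FC estimates coincide with ML estimates after adding 0.5 to each cell of the 2x2 table, so here the FC intercept and slope should be close to log(2.5/8.5) ≈ -1.22 and log(37.4) ≈ 3.62. The plain ML slope, in contrast, increases by roughly one unit per iteration and never meets the convergence criterion.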
MDPI and ACS Style

Šinkovec, H.; Geroldinger, A.; Heinze, G. Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size. Int. J. Environ. Res. Public Health 2019, 16, 4658. https://doi.org/10.3390/ijerph16234658

AMA Style

Šinkovec H, Geroldinger A, Heinze G. Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size. International Journal of Environmental Research and Public Health. 2019; 16(23):4658. https://doi.org/10.3390/ijerph16234658

Chicago/Turabian Style

Šinkovec, Hana, Angelika Geroldinger, and Georg Heinze. 2019. "Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size" International Journal of Environmental Research and Public Health 16, no. 23: 4658. https://doi.org/10.3390/ijerph16234658
