In this paper, we investigate several variable selection procedures to give practitioners an overview of the existing literature. “Let the data speak for themselves” has become the motto of many applied researchers as the amount of available data has grown significantly. Automatic model selection has long been promoted as a way to search for data-driven theories. However, while substantial advances have been made on the theoretical side, most empirical work still relies on basic procedures such as stepwise regression. Here, we review the main methods and their state-of-the-art extensions, and provide a typology of them over a wide range of model structures (linear, grouped, additive, partially linear and non-parametric), together with the software resources available for the implemented methods so that practitioners can easily access them. We explain which methods to use for different modeling purposes and highlight their key differences. We also review two methods for improving variable selection in the general sense.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.