Cross-Validation, Information Theory, or Maximum Likelihood? A Comparison of Tuning Methods for Penalized Splines
Abstract
1. Introduction
2. Theoretical Background
2.1. Smoothing Spline Definition
2.2. Representation and Computation
2.3. Bayesian Inference
3. Tuning Methods
3.1. Cross-Validation Methods
3.2. Information Theory Methods
3.3. Maximum Likelihood Methods
4. Simulation Study
4.1. Simulation Design
4.2. Simulation Analyses
4.3. Simulation Results
5. Real Data Examples
5.1. Global Warming Example
5.2. Motorcycle Accident Example
6. Discussion
6.1. Overview
6.2. Summary of Results
6.3. Limitations and Future Directions
6.4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
OCV | Ordinary Cross-Validation
GCV | Generalized Cross-Validation
AIC | An Information Criterion
BIC | Bayesian Information Criterion
ML | Maximum Likelihood
REML | Restricted Maximum Likelihood
RMSE | Root Mean Squared Error
Appendix A. Supplementary Results for Gaussian Errors
Appendix A.1. RMSE Results
Appendix A.2. Coverage Results
Appendix B. Supplementary Results for Multivariate t₅ Errors
Appendix B.1. RMSE Results
Appendix B.2. Coverage Results
Appendix C. Supplementary Results for Uniform Errors
Appendix C.1. RMSE Results
Appendix C.2. Coverage Results
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Berry, L.N.; Helwig, N.E. Cross-Validation, Information Theory, or Maximum Likelihood? A Comparison of Tuning Methods for Penalized Splines. Stats 2021, 4, 701-724. https://doi.org/10.3390/stats4030042