# Parametric and Nonparametric Frequentist Model Selection and Model Averaging

## Abstract

## 1. Introduction

## 2. Parametric Model Selection and Model Averaging

### 2.1. Model Selection

#### 2.1.1. AIC, TIC, and BIC

#### 2.1.2. FIC

#### 2.1.3. Mallows Model Selection

#### 2.1.4. Cross-Validation (CV)

#### 2.1.5. Model Selection by Other Penalty Functions

### 2.2. Model Averaging

#### 2.2.1. Bayesian and FIC Weights

#### 2.2.2. Mallows Weight Selection Method

#### 2.2.3. Jackknife Model Averaging Method (CV)

## 3. Nonparametric (NP) Model Selection and Model Averaging

### 3.1. NP Model Selection

#### 3.1.1. AIC, BIC, and GCV

#### 3.1.2. Mallows Model Selection

#### 3.1.3. Cross-Validation (CV)

### 3.2. NP Model Averaging

## 4. Conclusions

## Acknowledgements

## Conflicts of Interest


© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Ullah, A.; Wang, H.
Parametric and Nonparametric Frequentist Model Selection and Model Averaging. *Econometrics* **2013**, *1*, 157-179.
https://doi.org/10.3390/econometrics1020157
