Prediction by Empirical Similarity via Categorical Regressors
Abstract
1. Introduction
2. Preliminaries
Empirical Similarity Model
3. Prediction Based on Empirical Similarity with Categorical Regressors
4. Application
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest

| Scenario | Method | Binary Dist. | | Euclidean Dist. | |
|---|---|---|---|---|---|
| Scenario 1 | LS FR | 499.59 | 14.72 | 1049.20 | 14.79 |
| | LS EX | 5.28 | 2.74 | 17.09 | 2.71 |
| | MLE FR | 38.32 | 4.23 | 52.33 | 4.66 |
| | MLE EX | 3.18 | 1.62 | 6.37 | 1.55 |
| Scenario 2 | LS FR | 113.17 | 6.67 | 170.42 | 6.94 |
| | LS EX | 3.91 | 1.96 | 12.06 | 1.94 |
| | MLE FR | 23.84 | 3.57 | 32.75 | 4.00 |
| | MLE EX | 2.70 | 1.48 | 4.82 | 1.38 |

| Scenario | Method | | | | |
|---|---|---|---|---|---|
| Scenario 1 | LS FR | 719.35 | 0.00 | 105.88 | 15.42 |
| | LS EX | 3.79 | 0.00 | 3.56 | 2.84 |
| | MLE FR | 30.60 | 0.00 | 20.68 | 4.45 |
| | MLE EX | 2.35 | 0.00 | 2.24 | 1.68 |
| Scenario 2 | LS FR | 86.65 | 0.00 | 46.55 | 6.94 |
| | LS EX | 2.72 | 0.00 | 2.53 | 1.97 |
| | MLE FR | 25.20 | 0.00 | 7.62 | 4.04 |
| | MLE EX | 2.12 | 0.00 | 1.58 | 1.57 |

| Scenario | Term | Estimate | s.e. | p-Value |
|---|---|---|---|---|
| Scenario 1 | Intercept | 12.46 | 0.99 | |
| | Supp. VC | −3.70 | 0.99 | <0.01 |
| | Dose 1.0 mg | 9.13 | 1.21 | <0.01 |
| | Dose 2.0 mg | 15.50 | 1.21 | <0.01 |
| Scenario 2 | Intercept | 9.28 | 1.16 | <0.01 |
| | Supp. VC | 3.76 | 1.20 | <0.01 |
| | Dose 1.0 mg | 8.52 | 1.49 | <0.01 |
| | Dose 2.0 mg | 14.26 | 1.47 | <0.01 |

| Scenario | Method | LS FR | LS EX | MLE FR | MLE EX |
|---|---|---|---|---|---|
| Scenario 1 | (binary) | 14.54 | 14.54 | 15.39 | 15.36 |
| | (Euclidean) | 14.55 | 14.54 | 15.45 | 16.40 |
| | | 14.54 | 14.51 | 15.35 | 15.19 |
| | Regression | 13.67 | - | - | - |
| Scenario 2 | (binary) | 14.51 | 14.88 | 18.00 | 18.41 |
| | (Euclidean) | 14.40 | 14.04 | 17.54 | 18.55 |
| | | 14.69 | 15.27 | 18.19 | 18.15 |
| | Regression | 15.31 | - | - | - |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sanchez, J.D.; Rêgo, L.C.; Ospina, R. Prediction by Empirical Similarity via Categorical Regressors. Mach. Learn. Knowl. Extr. 2019, 1, 641-652. https://doi.org/10.3390/make1020038