Extended Abstract

Computationally Efficient Bootstrap Expressions for Bandwidth Selection in Nonparametric Curve Estimation †

Research group MODES, Department of Mathematics, CITIC, Universidade da Coruña, 15071 A Coruña, Spain
* Author to whom correspondence should be addressed.
† Presented at the XoveTIC Congress, A Coruña, Spain, 27–28 September 2018.
Proceedings 2018, 2(18), 1164; https://doi.org/10.3390/proceedings2181164
Published: 17 September 2018
(This article belongs to the Proceedings of XoveTIC Congress 2018)

Abstract

Bootstrap methods are used for bandwidth selection in: (1) nonparametric kernel density estimation with dependent data (smoothed stationary bootstrap and smoothed moving blocks bootstrap), and (2) nonparametric kernel hazard rate estimation (smoothed bootstrap). In these contexts, four new bandwidth selectors are proposed, based on closed-form bootstrap expressions of the MISE of the kernel density estimator (case 1) and of two approximations of the kernel hazard rate estimator (case 2). These expressions are very useful in practice, since they make Monte Carlo approximation unnecessary. Finally, the new smoothing parameter selectors are compared empirically with existing ones through a simulation study.

1. Introduction

This work deals with the well-known problem of the data-driven choice of smoothing parameters in nonparametric density and hazard rate estimation (see [1,2,3,4]). Our aim is twofold. First, we propose new bootstrap procedures for nonparametric density estimation with dependent data. Second, for hazard rate estimation, we propose two bootstrap bandwidth selectors based on approximations of the kernel hazard rate estimator.

2. Nonparametric Density Estimation

Let $(X_1, \ldots, X_n)$ be a random sample from a population with density $f$, and consider the kernel density estimator $\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$ (see [5,6]), where $K$ is a kernel function and $h > 0$ is the bandwidth. The estimator depends strongly on $h$, whose choice is crucial because it controls the degree of smoothing applied to the data.
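As a minimal sketch of the estimator above, assuming a Gaussian kernel $K$ (the text does not fix the kernel), the following Python code evaluates $\hat f_h$ at a single point for several bandwidths. The function names and the data values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x: float, sample: list[float], h: float) -> float:
    """Kernel density estimate f_h(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("the bandwidth h must be positive")
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

# Illustrative data: the same sample estimated with a small, a moderate and a large h.
data = [-1.2, -0.4, 0.1, 0.3, 1.5]
for h in (0.1, 0.5, 2.0):
    print(f"h = {h}: f_h(0) = {kde(0.0, data, h):.4f}")
```

A very small $h$ gives a spiky estimate driven by individual observations, while a very large $h$ oversmooths; bandwidth selectors aim to pick $h$ between these extremes.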
In this context, the smoothed stationary bootstrap (SSB) resampling plan was proposed in [7] (see Appendix A for a detailed description of the algorithm), together with a bandwidth selector, $h^*_{SSB}$, obtained by minimizing the SSB version of the MISE; a closed-form expression for this bootstrap MISE is also derived in [7]. Similarly, the smoothed moving blocks bootstrap (SMBB) was proposed (see Appendix A for a complete description of the method), together with a bandwidth selector, $h^*_{SMBB}$, defined as the minimizer in $h$ of the closed-form expression for $MISE^*_{SMBB}(h)$ (see [8] for further details). The exact expressions for $MISE^*_{SSB}(h)$ and $MISE^*_{SMBB}(h)$ are particularly useful because they make Monte Carlo approximation unnecessary.
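The closed-form expressions for $MISE^*_{SSB}(h)$ and $MISE^*_{SMBB}(h)$ are derived in [7,8] and are not reproduced here. To illustrate why a closed form removes the need for Monte Carlo resampling, the following Python sketch treats only the simplest special case: i.i.d. smoothed bootstrap resampling, which is what the SSB algorithm of Appendix A reduces to when $p = 1$, with a Gaussian kernel $K$ (an assumption). Writing $K_h(u) = K(u/h)/h$ and $\hat f_g$ for the kernel density estimator with pilot bandwidth $g$, the bootstrap MISE becomes
$$MISE^*(h) = \frac{1}{n}\left[\frac{1}{2\sqrt{\pi}\,h} - \int (K_h * \hat f_g)^2\right] + \int (K_h * \hat f_g - \hat f_g)^2,$$
and every integral is a double sum of normal densities evaluated at $X_i - X_j$. The function names, the pilot rule for $g$ and the bandwidth grid are hypothetical; the dependent-data expressions in [7,8] are more involved.

```python
import math
import random
import statistics

def phi(u, s):
    """Normal density with mean 0 and standard deviation s, evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def bootstrap_mise(h, sample, g):
    """Closed-form smoothed bootstrap MISE*(h), i.i.d. case (p = 1 in the SSB),
    Gaussian kernel K, resampling smoothed with pilot bandwidth g. No Monte Carlo."""
    n = len(sample)
    diffs = [xi - xj for xi in sample for xj in sample]
    s_a = math.sqrt(2.0 * (h * h + g * g))  # int (K_h * f_g)^2
    s_b = math.sqrt(h * h + 2.0 * g * g)    # int (K_h * f_g) f_g
    s_c = math.sqrt(2.0) * g                # int f_g^2
    a = sum(phi(d, s_a) for d in diffs) / n**2
    b = sum(phi(d, s_b) for d in diffs) / n**2
    c = sum(phi(d, s_c) for d in diffs) / n**2
    integrated_variance = (1.0 / (2.0 * math.sqrt(math.pi) * h) - a) / n
    integrated_squared_bias = a - 2.0 * b + c
    return integrated_variance + integrated_squared_bias

rng = random.Random(2018)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
# Hypothetical pilot bandwidth (normal-reference rule), for illustration only.
g = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
grid = [0.1 + 0.02 * k for k in range(40)]
h_star = min(grid, key=lambda h: bootstrap_mise(h, sample, g))
print("bootstrap bandwidth:", h_star)
```

Each evaluation of $MISE^*(h)$ costs $O(n^2)$ operations and involves no random resampling, so minimizing it over a grid is cheap and deterministic given the sample and $g$.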

3. Nonparametric Hazard Rate Estimation

Let $(X_1, X_2, \ldots, X_n)$ be a simple random sample from a population with continuous density $f$, cumulative distribution function $F$ and hazard rate $r(x) = f(x)/(1 - F(x))$. Consider, additionally, the kernel density estimator $\hat f_h$, the kernel distribution estimator $\hat F_h$ and the nonparametric hazard rate estimator $\hat r_h(x) = \hat f_h(x)/(1 - \hat F_h(x))$ (see [3,4]). In order to define a bootstrap bandwidth selector for the hazard rate estimator, two approximations of $\hat r_h$ are considered, the second being its first-order Taylor expansion around $(f, F)$. The two approximations are given by:
$$\tilde r_{h,1}(x) = \frac{\hat f_h(x)}{1 - F(x)},$$
$$\tilde r_{h,2}(x) = \frac{1}{1 - F(x)}\,\hat f_h(x) + \frac{f(x)}{(1 - F(x))^2}\,\hat F_h(x) - \frac{f(x)}{(1 - F(x))^2} + r(x).$$
Closed-form expressions for the MISE of $\tilde r_{h,1}$ and $\tilde r_{h,2}$, as well as for their bootstrap versions, can be found in [9]. Two bootstrap bandwidth selectors, $h_{BOOT1}$ and $h_{BOOT2}$, are then defined as the minimizers of $MISE^*_{\tilde r_{h,1},w}(h)$ and $MISE^*_{\tilde r_{h,2},w}(h)$, respectively, where $w$ is a weight function (see [9] for further details). As in the dependent-data case, no Monte Carlo approximation is required.
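The sketch below, assuming Gaussian kernels, evaluates $\hat r_h$, $\tilde r_{h,1}$ and $\tilde r_{h,2}$ at a few points. Because the two approximations require the true $f$ and $F$, it uses a unit-rate exponential model (true hazard $r(x) = 1$) chosen only for illustration; this is not one of the models of the simulation study, all names are hypothetical, and the MISE expressions of [9] are not implemented.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimator f_h(x), Gaussian kernel (assumed)."""
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / (len(sample) * h * SQRT_2PI)

def kdf(x, sample, h):
    """Kernel distribution estimator F_h(x): average of Phi((x - X_i) / h)."""
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0)))) for xi in sample) / len(sample)

def hazard_hat(x, sample, h):
    """Ratio kernel hazard rate estimator f_h(x) / (1 - F_h(x))."""
    return kde(x, sample, h) / (1.0 - kdf(x, sample, h))

def hazard_approx_1(x, sample, h, F):
    """r~_{h,1}(x) = f_h(x) / (1 - F(x)), with the true F plugged in."""
    return kde(x, sample, h) / (1.0 - F(x))

def hazard_approx_2(x, sample, h, f, F):
    """r~_{h,2}(x): linearisation of f_h / (1 - F_h) around the true (f, F)."""
    s = 1.0 - F(x)
    r = f(x) / s
    return kde(x, sample, h) / s + f(x) / s**2 * kdf(x, sample, h) - f(x) / s**2 + r

# Illustrative model only: unit-rate exponential, whose true hazard is r(x) = 1.
def f(x):
    return math.exp(-x)

def F(x):
    return 1.0 - math.exp(-x)

rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(200)]
for x in (0.5, 1.0, 1.5):
    print(x, hazard_hat(x, sample, 0.3),
          hazard_approx_1(x, sample, 0.3, F),
          hazard_approx_2(x, sample, 0.3, f, F))
```

Both approximations depend on the unknown $f$ and $F$, so they serve as theoretical devices for deriving tractable MISE expressions rather than as estimators in their own right.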

4. Simulation Results

A simulation study was carried out to assess the empirical behaviour of the new smoothing parameter selectors in both contexts. The following models were considered:
  • Density estimation: an AR(1) model given by $X_t = 0.6X_{t-1} + 0.8a_t$, where $a_t \overset{d}{=} N(0,1)$.
  • Hazard rate estimation: a Gumbel model with density $f(x) = e^{x}e^{-e^{x}}$, $x \geq 0$.
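A minimal simulation sketch for the AR(1) model, assuming the innovations $a_t$ are i.i.d. Since $0.6^2 + 0.8^2 = 1$, the stationary variance is $0.8^2/(1 - 0.6^2) = 0.64/0.64 = 1$, so the marginal distribution of $X_t$ is $N(0,1)$ and the series can be started from $X_0 \sim N(0,1)$ without a burn-in period. The function name and its parameters are hypothetical.

```python
import math
import random

def simulate_ar1(n, phi=0.6, sigma=0.8, seed=None):
    """Simulate n values of X_t = phi * X_{t-1} + sigma * a_t, a_t i.i.d. N(0, 1).

    The first value is drawn from the stationary law N(0, sigma^2 / (1 - phi^2)),
    so the returned series is stationary from the start (requires |phi| < 1)."""
    if not abs(phi) < 1.0:
        raise ValueError("|phi| < 1 is needed for stationarity")
    rng = random.Random(seed)
    stationary_sd = sigma / math.sqrt(1.0 - phi * phi)
    x = [rng.gauss(0.0, stationary_sd)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + sigma * rng.gauss(0.0, 1.0))
    return x

series = simulate_ar1(100, seed=2018)  # one sample of size n = 100
print([round(v, 3) for v in series[:5]])
```

With these coefficients the target density $f$ in the density estimation experiment is the standard normal, which makes $MISE(h)$ and $h_{MISE}$ easy to compute for comparison.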

5. Discussion

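Figure 1 shows that $h^*_{SSB}$ and $h^*_{SMBB}$ perform similarly and, in fact, best among the selectors compared. According to Table 1, $h_{BOOT1}$ and $h_{BOOT2}$ perform well for the Gumbel model, with mean and median ISE well below those of $h_{CV}$ and $h^*_{GCM}$; only $h_{DO}$ attains smaller values.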

Funding

The authors acknowledge partial support from MINECO grants MTM2014-52876-R and MTM2017-82724-R (EU ERDF support included). Financial support from the Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019 and Grupos de Referencia Competitiva ED431C2016-015) and the European Union (European Regional Development Fund, ERDF) is also gratefully acknowledged. The first author acknowledges financial support from the Xunta de Galicia and the European Union (European Social Fund, ESF) under reference ED481A-2017/215. In addition, part of the first author's work was carried out during a visit to the University of California, San Diego, financed by INDITEX (reference INDITEX-UDC 2017).

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
MISE: Mean integrated squared error
ISE: Integrated squared error
SSB: Smoothed stationary bootstrap
SMBB: Smoothed moving blocks bootstrap
iid: Independent and identically distributed
$h_{DO}$: DO-validation bandwidth selector for hazard rate estimation (see [10])
$h^*_{GCM}$: González-Manteiga, Cao and Marron bandwidth selector for hazard rate estimation (see [11])
$h_{PI}$: Plug-in bandwidth selector for density estimation with dependent data (see [12])
$h_{CV_l}$: Leave-$(2l+1)$-out cross-validation bandwidth selector for density estimation (see [13])
$h_{SMCV}$: Modified cross-validation bandwidth selector for density estimation with dependent data (see [8])
$h_{PCV}$: Penalized cross-validation bandwidth selector for density estimation with dependent data (see [8])
$h_{CV}$: Cross-validation bandwidth selector for hazard rate estimation (see [14])
$h_{MISE}$: Bandwidth that minimizes the theoretical $MISE(h)$

Appendix A

Smoothed stationary bootstrap
  • Draw $X_1^{*(SB)}$ from $F_n$, the empirical distribution function of the sample.
  • Define $X_1^* = X_1^{*(SB)} + gU_1^*$, where $U_1^*$ is drawn from the density $K$, independently of $X_1^{*(SB)}$.
  • Assume that $X_1^*, \ldots, X_i^*$ (and, consequently, $X_1^{*(SB)}, \ldots, X_i^{*(SB)}$) have already been drawn, and let $j$ be the index for which $X_i^{*(SB)} = X_j$. Define a binary auxiliary random variable $I_{i+1}^*$ such that $P^*(I_{i+1}^* = 1) = 1 - p$ and $P^*(I_{i+1}^* = 0) = p$. Set $X_{i+1}^{*(SB)} = X_{(j \bmod n) + 1}$ whenever $I_{i+1}^* = 1$, and draw $X_{i+1}^{*(SB)} \mid I_{i+1}^* = 0$ from the empirical distribution function, where mod denotes the modulus operator.
  • Once $X_{i+1}^{*(SB)}$ has been drawn, define $X_{i+1}^* = X_{i+1}^{*(SB)} + gU_{i+1}^*$, where, again, $U_{i+1}^*$ is drawn from the density $K$, independently of $X_{i+1}^{*(SB)}$.
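The steps above can be sketched as follows, assuming that $K$ is the standard normal density (so $U_i^* \sim N(0,1)$). The sketch tracks the index $j$ directly instead of searching for $X_j$, and uses 0-based indices, so $X_{(j \bmod n)+1}$ becomes `sample[(j + 1) % n]`. The function name and signature are hypothetical.

```python
import random

def smoothed_stationary_bootstrap(sample, p, g, seed=None):
    """One SSB resample of the same length as `sample`.

    Assumption: K is the standard normal density, so U_i^* ~ N(0, 1)."""
    rng = random.Random(seed)
    n = len(sample)
    j = rng.randrange(n)                 # X_1^{*(SB)} drawn from F_n
    unsmoothed = [sample[j]]
    for _ in range(n - 1):
        if rng.random() < 1.0 - p:       # I^* = 1 (probability 1 - p): next observation, circularly
            j = (j + 1) % n
        else:                            # I^* = 0 (probability p): fresh draw from F_n
            j = rng.randrange(n)
        unsmoothed.append(sample[j])
    # Smoothing step: X_i^* = X_i^{*(SB)} + g * U_i^*
    return [x + g * rng.gauss(0.0, 1.0) for x in unsmoothed]

resample = smoothed_stationary_bootstrap([0.3, -1.1, 0.8, 1.6, -0.2, 0.4], p=0.25, g=0.3, seed=1)
print(resample)
```

Runs of consecutive observations have a geometric length with mean $1/p$, which is how the resample retains short-range dependence; with $p = 1$ every draw comes afresh from $F_n$ and the scheme reduces to the i.i.d. smoothed bootstrap.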
Smoothed moving blocks bootstrap
  • Fix the block length, $b \in \mathbb{N}$, and define $k = \min\{\ell \in \mathbb{N} : \ell b \geq n\}$.
  • Define the blocks $B_{i,b} = (X_i, X_{i+1}, \ldots, X_{i+b-1})$, for $i = 1, \ldots, q$, with $q = n - b + 1$.
  • Draw $\xi_1, \xi_2, \ldots, \xi_k$ with uniform discrete distribution on $\{B_{1,b}, B_{2,b}, \ldots, B_{q,b}\}$.
  • Define $X_1^{*(MBB)}, \ldots, X_n^{*(MBB)}$ as the first $n$ components of $(\xi_{1,1}, \xi_{1,2}, \ldots, \xi_{1,b}, \xi_{2,1}, \xi_{2,2}, \ldots, \xi_{2,b}, \ldots, \xi_{k,1}, \xi_{k,2}, \ldots, \xi_{k,b})$.
  • Define $X_i^* = X_i^{*(MBB)} + gU_i^*$, where $U_i^*$ is drawn from the density $K$, independently of $X_i^{*(MBB)}$, for all $i = 1, 2, \ldots, n$.
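A corresponding sketch of one SMBB resample, again assuming that $K$ is the standard normal density. Since $k$ is the smallest integer with $kb \geq n$, it equals $\lceil n/b \rceil$. The function name and signature are hypothetical.

```python
import math
import random

def smoothed_moving_blocks_bootstrap(sample, b, g, seed=None):
    """One SMBB resample of the same length as `sample`.

    Assumption: K is the standard normal density, so U_i^* ~ N(0, 1)."""
    n = len(sample)
    if not 1 <= b <= n:
        raise ValueError("the block length b must satisfy 1 <= b <= n")
    rng = random.Random(seed)
    k = math.ceil(n / b)                       # smallest k with k * b >= n
    q = n - b + 1                              # number of overlapping blocks
    blocks = [sample[i:i + b] for i in range(q)]
    # Draw xi_1, ..., xi_k uniformly (with replacement) among the q blocks and concatenate them.
    chain = [x for _ in range(k) for x in rng.choice(blocks)]
    unsmoothed = chain[:n]                     # keep the first n components
    # Smoothing step: X_i^* = X_i^{*(MBB)} + g * U_i^*
    return [x + g * rng.gauss(0.0, 1.0) for x in unsmoothed]

resample = smoothed_moving_blocks_bootstrap([0.3, -1.1, 0.8, 1.6, -0.2, 0.4, 0.9], b=3, g=0.3, seed=1)
print(resample)
```

Unlike the SSB, the block length here is fixed at $b$, and the blocks do not wrap around the end of the sample.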

References

  1. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman & Hall: London, UK, 1986.
  2. Devroye, L. A Course in Density Estimation; Birkhäuser: Boston, MA, USA, 1987.
  3. Watson, G.S.; Leadbetter, M.R. Hazard analysis I. Biometrika 1964a, 51, 175–184.
  4. Watson, G.S.; Leadbetter, M.R. Hazard analysis II. Sankhyā Ser. A 1964b, 26, 101–116.
  5. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
  6. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 1956, 27, 832–837.
  7. Barbeito, I.; Cao, R. Smoothed stationary bootstrap bandwidth selection for density estimation with dependent data. Comput. Stat. Data Anal. 2016, 104, 130–147.
  8. Barbeito, I.; Cao, R. A review and some new proposals for bandwidth selection in nonparametric density estimation for dependent data. In From Statistics to Mathematical Finance: Festschrift in Honour of Winfried Stute; Ferger, D., González Manteiga, W., Schmidt, T., Wang, J.L., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 173–208. ISBN 978-3-319-50986-0.
  9. Barbeito, I.; Cao, R. Smoothed bootstrap bandwidth selection for nonparametric hazard rate estimation. Preprint 2018.
  10. Gámiz, M.L.; Mammen, E.; Martínez-Miranda, M.D.; Nielsen, J.P. Double one-sided cross-validation of local linear hazards. J. R. Stat. Soc. Ser. B Stat. 2016, 78, 775–779.
  11. González-Manteiga, W.; Cao, R.; Marron, J.S. Bootstrap Selection of the Smoothing Parameter in Nonparametric Hazard Rate Estimation. J. Am. Stat. Assoc. 1996, 91, 1130–1140.
  12. Hall, P.; Lahiri, S.N.; Truong, Y.K. On bandwidth choice for density estimation with dependent data. Ann. Stat. 1995, 23, 2241–2263.
  13. Hart, J.D.; Vieu, P. Data-driven bandwidth choice for density estimation based on dependent data. Ann. Stat. 1990, 18, 873–890.
  14. Patil, P.N. On the Least Squares Cross-Validation Bandwidth in Hazard Rate Estimation. Ann. Stat. 1993, 21, 1792–1810.
Figure 1. Boxplot of $\log\left(MISE(\hat h)/MISE(h_{MISE})\right)$, $n = 100$, where $\hat h = h_{CV_l}$ (first box), $h_{SMCV}$ (second box), $h_{PCV}$ (third box), $h^*_{SSB}$ (fourth box), $h^*_{SMBB}$ (fifth box) and $h_{PI}$ (sixth box).
Table 1. Mean and median of $ISE(\hat h)$, $n = 100$, where $\hat h = h_{CV}$ (third column), $h_{DO}$ (fourth column), $h_{BOOT1}$ (fifth column), $h_{BOOT2}$ (sixth column) and $h^*_{GCM}$ (seventh column).
| Model | Statistic | CV | DO | BOOT1 | BOOT2 | GCM |
|---|---|---|---|---|---|---|
| Gumbel model | Mean | 0.1656 | 0.01651 | 0.02914 | 0.02882 | 0.03595 |
| | Median | 0.15527 | 0.01037 | 0.012844 | 0.01282 | 0.01739 |

Share and Cite

MDPI and ACS Style

Barbeito, I.; Cao, R. Computationally Efficient Bootstrap Expressions for Bandwidth Selection in Nonparametric Curve Estimation. Proceedings 2018, 2, 1164. https://doi.org/10.3390/proceedings2181164
