# Estimating Neural Network’s Performance with Bootstrap: A Tutorial


## Abstract


## 1. Introduction

## 2. Notation

## 3. Central Limit Theorem for an Averaging Estimator $\theta $

**Corollary 1.**

**Proof.**

**Corollary 2.**

**Proof.**

- The distribution is symmetric around the average, with the same number of observations below and above it; and
- The standard deviation of the distribution can be used as a statistical error, since ca. 68% of the results lie within $\pm \sigma $ of the average.
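The CLT demonstration of Figure 1 can be reproduced numerically. The sketch below is not the paper's code (`chi2_sample` and `sample_means` are hypothetical helper names): it averages chi-squared draws with $k=10$ and shows the distribution of the average concentrating around $k$ as the sample size $n$ grows.

```python
import random
import statistics

def chi2_sample(k=10):
    """One draw from a chi-squared distribution with k degrees of freedom,
    built as the sum of k squared standard-normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

def sample_means(n, reps=500):
    """Distribution of the average of n chi-squared draws (reps repetitions)."""
    return [statistics.mean(chi2_sample() for _ in range(n)) for _ in range(reps)]

random.seed(0)
for n in (2, 10, 200):
    means = sample_means(n)
    # The spread of the average shrinks roughly like sqrt(2k/n).
    print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 3))
```

As $n$ grows, the distribution of the average becomes narrower and more symmetric around $k=10$, mirroring panels (b)–(d) of Figure 1.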

## 4. Bootstrap

Algorithm 1: Pseudo-code of the bootstrap algorithm.
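The core of the bootstrap is resampling the dataset with replacement and re-evaluating the statistic on each resample. A minimal sketch, assuming the statistic is a plain function of the data (`bootstrap` is a hypothetical helper name, not taken from the paper's repository):

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000):
    """Draw n_boot bootstrap samples (same size as data, WITH replacement)
    and evaluate the statistic on each one."""
    n = len(data)
    return [statistic([random.choice(data) for _ in range(n)])
            for _ in range(n_boot)]

random.seed(1)
data = [random.gauss(5.0, 2.0) for _ in range(100)]
reps = bootstrap(data, statistics.mean)
# The spread of the bootstrap replicates estimates the standard error
# of the statistic (about 2/sqrt(100) = 0.2 for the mean here).
print(round(statistics.mean(reps), 2), round(statistics.stdev(reps), 2))
```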

## 5. Other Resampling Techniques

#### 5.1. Hold-Out Set Approach

#### 5.2. Leave-One-Out Cross-Validation

Algorithm 2: Leave-one-out cross-validation (LOOCV) algorithm.
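A minimal LOOCV sketch in the spirit of Algorithm 2. Here `fit` and `loss` are hypothetical stand-ins (a mean predictor with squared-error loss) rather than a neural network, so the loop stays fast and self-contained:

```python
def loocv(data, fit, loss):
    """Hold out each observation in turn, train on the remaining n-1,
    and average the held-out losses."""
    errors = []
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        model = fit(train)
        errors.append(loss(model, data[i]))
    return sum(errors) / len(errors)

# Toy stand-ins: the "model" is the training-set mean of y, loss is squared error.
fit = lambda train: sum(y for _, y in train) / len(train)
loss = lambda m, point: (point[1] - m) ** 2
data = [(x, 2.0 * x) for x in range(10)]
print(round(loocv(data, fit, loss), 2))
```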

#### 5.3. k-Fold Cross-Validation

Algorithm 3: k-fold cross-validation (k-fold CV) algorithm.
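k-fold CV trains only $k$ times instead of $n$ times, at the price of smaller training sets. A sketch in the spirit of Algorithm 3, again with hypothetical `fit`/`loss` stand-ins rather than a neural network:

```python
import random

def k_fold_cv(data, fit, loss, k=5, seed=0):
    """Shuffle once, split into k disjoint folds, hold each fold out in turn,
    and average the per-fold losses."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fold_scores = []
    for j in range(k):
        held_out = folds[j]
        train = [p for i, f in enumerate(folds) if i != j for p in f]
        model = fit(train)
        fold_scores.append(sum(loss(model, p) for p in held_out) / len(held_out))
    return sum(fold_scores) / k

fit = lambda train: sum(y for _, y in train) / len(train)   # mean predictor
loss = lambda m, point: (point[1] - m) ** 2                 # squared error
data = [(x, 2.0 * x) for x in range(20)]
print(round(k_fold_cv(data, fit, loss, k=5), 2))
```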

#### 5.4. Jackknife
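The jackknife combines leave-one-out replicates with Tukey's variance formula. The helper below is a generic sketch (not from the paper's code); for the sample mean it reproduces the classical standard error $s/\sqrt{n}$ exactly:

```python
import statistics

def jackknife_se(data, statistic):
    """Jackknife standard error: evaluate the statistic on each leave-one-out
    subset, then apply the (n-1)/n variance inflation factor."""
    n = len(data)
    reps = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    return ((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps)) ** 0.5

data = [2.1, 4.4, 3.7, 5.0, 2.9, 4.1, 3.3, 4.8]
print(round(jackknife_se(data, statistics.mean), 4))
```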

#### 5.5. Subsampling
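Subsampling differs from the bootstrap in one key respect: subsamples of size $b<n$ are drawn *without* replacement. A sketch with a hypothetical helper name:

```python
import random
import statistics

def subsample_distribution(data, statistic, b, n_sub=500, seed=0):
    """Draw n_sub subsamples of size b WITHOUT replacement (unlike the
    bootstrap) and evaluate the statistic on each one."""
    rng = random.Random(seed)
    return [statistic(rng.sample(data, b)) for _ in range(n_sub)]

random.seed(3)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
reps = subsample_distribution(data, statistics.mean, b=50)
print(round(statistics.mean(reps), 3), round(statistics.stdev(reps), 3))
```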

## 6. Algorithms for Performance Estimation

#### 6.1. Split/Train Algorithm

Algorithm 4: Split/train algorithm applied to the estimation of the distribution of a statistical estimator.
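The split/train approach repeats the whole (split, train, evaluate) cycle many times, so each score comes from a freshly trained model. A sketch in the spirit of Algorithm 4; `fit` and `metric` are hypothetical stand-ins (a mean predictor with MSE) instead of an expensive neural network training run:

```python
import random

def split_train_distribution(data, fit, metric, n_splits=100,
                             val_frac=0.2, seed=0):
    """Repeat: (i) split at random into training/validation parts,
    (ii) train on the training part, (iii) evaluate on the validation part.
    Returns the resulting distribution of the metric."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_frac)
        val, train = shuffled[:n_val], shuffled[n_val:]
        scores.append(metric(fit(train), val))
    return scores

fit = lambda train: sum(y for _, y in train) / len(train)           # mean predictor
metric = lambda m, val: sum((y - m) ** 2 for _, y in val) / len(val)  # MSE
data = [(x, 2.0 * x) for x in range(50)]
scores = split_train_distribution(data, fit, metric)
```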

#### 6.2. Bootstrap

#### 6.3. Mixed Approach between Bootstrap and Split/Train

Algorithm 5: Bootstrap algorithm applied to the estimation of the distribution of a statistical estimator.
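The bootstrap variant trains the model only once and then re-evaluates it on bootstrap samples of the validation set ${S}_{V}$, which is why it is orders of magnitude faster than repeated split/train. A sketch, with a trivial `predict` stand-in for the trained NNM (not the paper's actual model):

```python
import random

def bootstrap_model_metric(predict, val, n_boot=1800, seed=0):
    """The model is trained ONCE (represented here by a fixed predict
    function); the MSE is then re-computed on n_boot bootstrap samples
    of the validation set."""
    rng = random.Random(seed)
    n = len(val)
    mses = []
    for _ in range(n_boot):
        sample = [rng.choice(val) for _ in range(n)]
        mses.append(sum((y - predict(x)) ** 2 for x, y in sample) / n)
    return mses

random.seed(5)
predict = lambda x: 2.0 * x                        # stand-in for the trained NNM
val = [(x, 2.0 * x + random.gauss(0.0, 0.5)) for x in range(100)]
mses = bootstrap_model_metric(predict, val, n_boot=200)
```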

## 7. Application to Synthetic Data

Algorithm 6: Algorithm for synthetic data generation.
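The paper's exact generation recipe lives in its companion repository and is not reproduced here; the function below is a hypothetical stand-in for Algorithm 6, producing noisy samples of a smooth nonlinear target for a regression NNM:

```python
import math
import random

def make_synthetic(n=500, noise=0.1, seed=0):
    """Hypothetical synthetic regression dataset: (x, sin(x) + Gaussian noise)
    pairs with x drawn uniformly from [-3, 3]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]

data = make_synthetic()
```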

#### 7.1. Results of Bootstrap

#### 7.2. Comparison of Split/Train and Bootstrap Algorithms

## 8. Application to Real Data

## 9. Limitations and Promising Research Developments

## 10. Conclusions

## 11. Software

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning
---|---
MSE | Mean Square Error
NNM | Neural Network Model
CLT | Central Limit Theorem
iid | independent identically distributed
PDF | Probability Density Function
LOOCV | Leave-one-out cross-validation
CV | cross-validation

## References

- Michelucci, U. Applied Deep Learning—A Case-Based Approach to Understanding Deep Neural Networks; APRESS Media, LLC: New York, NY, USA, 2018.
- Izonin, I.; Tkachenko, R.; Verhun, V.; Zub, K. An approach towards missing data management using improved GRNN-SGTM ensemble method. Eng. Sci. Technol. Int. J. 2020.
- Tkachenko, R.; Izonin, I.; Kryvinska, N.; Dronyuk, I.; Zub, K. An Approach towards Increasing Prediction Accuracy for the Recovery of Missing IoT Data based on the GRNN-SGTM Ensemble. Sensors 2020, 20, 2625.
- Izonin, I.; Tkachenko, R.; Vitynskyi, P.; Zub, K.; Tkachenko, P.; Dronyuk, I. Stacking-based GRNN-SGTM Ensemble Model for Prediction Tasks. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 326–330.
- Alonso-Atienza, F.; Rojo-Álvarez, J.L.; Rosado-Muñoz, A.; Vinagre, J.J.; García-Alberola, A.; Camps-Valls, G. Feature selection using support vector machines and bootstrap methods for ventricular fibrillation detection. Expert Syst. Appl. 2012, 39, 1956–1967.
- Dietterich, T.G. Ensemble Methods in Machine Learning. In Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
- Perrone, M.P.; Cooper, L.N. When Networks Disagree: Ensemble Methods for Hybrid Neural Networks; Technical Report; Brown University, Institute for Brain and Neural Systems: Providence, RI, USA, 1992.
- Tkachenko, R.; Tkachenko, P.; Izonin, I.; Vitynskyi, P.; Kryvinska, N.; Tsymbal, Y. Committee of the combined RBF-SGTM neural-like structures for prediction tasks. In International Conference on Mobile Web and Intelligent Information Systems; Springer: Berlin/Heidelberg, Germany, 2019; pp. 267–277.
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.
- Tiwari, M.K.; Chatterjee, C. Uncertainty assessment and ensemble flood forecasting using bootstrap based artificial neural networks (BANNs). J. Hydrol. 2010, 382, 20–33.
- Zio, E. A study of the bootstrap method for estimating the accuracy of artificial neural networks in predicting nuclear transient processes. IEEE Trans. Nucl. Sci. 2006, 53, 1460–1478.
- Zhang, J. Inferential estimation of polymer quality using bootstrap aggregated neural networks. Neural Netw. 1999, 12, 927–938.
- Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
- Good, P.I. Introduction to Statistics through Resampling Methods and R; John Wiley & Sons: Hoboken, NJ, USA, 2013.
- Chihara, L.; Hesterberg, T. Mathematical Statistics with Resampling and R; Wiley Online Library: Hoboken, NJ, USA, 2011.
- Williams, J.; MacKinnon, D.P. Resampling and distribution of the product methods for testing indirect effects in complex models. Struct. Equ. Model. A Multidiscip. J. 2008, 15, 23–51.
- Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers; Wiley: Hoboken, NJ, USA, 2014.
- Johnson, N.; Kotz, S.; Balakrishnan, N. Chi-squared distributions including chi and Rayleigh. In Continuous Univariate Distributions; John Wiley & Sons: Hoboken, NJ, USA, 1994; pp. 415–493.
- Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. 1979, 7, 1–26.
- Paass, G. Assessing and improving neural network predictions by the bootstrap algorithm. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1992; pp. 196–203.
- González-Manteiga, W.; Prada Sánchez, J.M.; Romo, J. The Bootstrap—A Review; Universidad Carlos III de Madrid: Getafe, Spain, 1992.
- Lahiri, S. Bootstrap methods: A review. In Frontiers in Statistics; World Scientific: Singapore, 2006; pp. 231–255.
- Swanepoel, J. Invited review paper: A review of bootstrap methods. S. Afr. Stat. J. 1990, 24, 1–34.
- Hinkley, D.V. Bootstrap methods. J. R. Stat. Soc. Ser. B (Methodol.) 1988, 50, 321–337.
- Efron, B. Second thoughts on the bootstrap. Stat. Sci. 2003, 18, 135–140.
- Chernick, M.R. Bootstrap Methods: A Guide for Practitioners and Researchers; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 619.
- Lahiri, S. Bootstrap Methods: A Practitioner's Guide—M.R. Chernick, Wiley, New York, 1999, pp. xiv+264, ISBN 0-471-34912-7. J. Stat. Plan. Inference 2000, 1, 171–172.
- Chernick, M.; Murthy, V.; Nealy, C. Application of bootstrap and other resampling techniques: Evaluation of classifier performance. Pattern Recognit. Lett. 1985, 3, 167–178.
- Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans; SIAM: Philadelphia, PA, USA, 1982.
- Zainuddin, N.H.; Lola, M.S.; Djauhari, M.A.; Yusof, F.; Ramlee, M.N.A.; Deraman, A.; Ibrahim, Y.; Abdullah, M.T. Improvement of time forecasting models using a novel hybridization of bootstrap and double bootstrap artificial neural networks. Appl. Soft Comput. 2019, 84, 105676.
- Li, X.; Deng, S.; Wang, S.; Lv, Z.; Wu, L. Review of small data learning methods. In Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, Japan, 23–27 July 2018; Volume 2, pp. 106–109.
- Reed, S.; Lee, H.; Anguelov, D.; Szegedy, C.; Erhan, D.; Rabinovich, A. Training deep neural networks on noisy labels with bootstrapping. arXiv 2014, arXiv:1412.6596.
- DiCiccio, T.J.; Romano, J.P. A review of bootstrap confidence intervals. J. R. Stat. Soc. Ser. B (Methodol.) 1988, 50, 338–354.
- Khosravi, A.; Nahavandi, S.; Srinivasan, D.; Khosravi, R. Constructing optimal prediction intervals by using neural networks and bootstrap method. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 1810–1815.
- Gonçalves, S.; Politis, D. Discussion: Bootstrap methods for dependent data: A review. J. Korean Stat. Soc. 2011, 40, 383–386.
- Chernick, M.R. The Essentials of Biostatistics for Physicians, Nurses, and Clinicians; Wiley Online Library: Hoboken, NJ, USA, 2011.
- Pastore, A. An introduction to bootstrap for nuclear physics. J. Phys. G Nucl. Part. Phys. 2019, 46, 052001.
- Sohn, R.; Menke, W. Application of maximum likelihood and bootstrap methods to nonlinear curve-fit problems in geochemistry. Geochem. Geophys. Geosyst. 2002, 3, 1–17.
- Anirudh, R.; Thiagarajan, J.J. Bootstrapping graph convolutional neural networks for autism spectrum disorder classification. In Proceedings of the ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3197–3201.
- Gligic, L.; Kormilitzin, A.; Goldberg, P.; Nevado-Holgado, A. Named entity recognition in electronic health records using transfer learning bootstrapped neural networks. Neural Netw. 2020, 121, 132–139.
- Ruf, J.; Wang, W. Neural networks for option pricing and hedging: A literature review. J. Comput. Financ. 2019, 24, 1–46.
- Efron, B. Better bootstrap confidence intervals. J. Am. Stat. Assoc. 1987, 82, 171–185.
- Gareth, J.; Daniela, W.; Trevor, H.; Robert, T. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin/Heidelberg, Germany, 2013.
- Quenouille, M.H. Approximate tests of correlation in time-series. J. R. Stat. Soc. Ser. B (Methodol.) 1949, 11, 68–84.
- Cameron, A.C.; Trivedi, P.K. Microeconometrics: Methods and Applications; Cambridge University Press: Cambridge, UK, 2005.
- Miller, R.G. The jackknife—A review. Biometrika 1974, 61, 1–15.
- Efron, B. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika 1981, 68, 589–599.
- Wu, C.F.J. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat. 1986, 14, 1261–1295.
- Efron, B.; Stein, C. The jackknife estimate of variance. Ann. Stat. 1981, 9, 586–596.
- Shao, J.; Wu, C.J. A general theory for jackknife variance estimation. Ann. Stat. 1989, 17, 1176–1197.
- Bickel, P.J.; Boley, N.; Brown, J.B.; Huang, H.; Zhang, N.R. Subsampling methods for genomic inference. Ann. Appl. Stat. 2010, 4, 1660–1697.
- Robinson, D.G.; Storey, J.D. subSeq: Determining appropriate sequencing depth through efficient read subsampling. Bioinformatics 2014, 30, 3424–3426.
- Quiroz, M.; Villani, M.; Kohn, R.; Tran, M.N.; Dang, K.D. Subsampling MCMC—An introduction for the survey statistician. Sankhya A 2018, 80, 33–69.
- Elliott, M.R.; Little, R.J.; Lewitzky, S. Subsampling callbacks to improve survey efficiency. J. Am. Stat. Assoc. 2000, 95, 730–738.
- Paparoditis, E.; Politis, D.N. Resampling and subsampling for financial time series. In Handbook of Financial Time Series; Springer: Berlin/Heidelberg, Germany, 2009; pp. 983–999.
- Bertail, P.; Haefke, C.; Politis, D.N.; White, H.L., Jr. A subsampling approach to estimating the distribution of diverging statistics with application to assessing financial market risks. In UPF, Economics and Business Working Paper; Universitat Pompeu Fabra: Barcelona, Spain, 2001.
- Chernozhukov, V.; Fernández-Val, I. Subsampling inference on quantile regression processes. Sankhyā Indian J. Stat. 2005, 67, 253–276.
- Politis, D.N.; Romano, J.P.; Wolf, M. Subsampling for heteroskedastic time series. J. Econom. 1997, 81, 281–317.
- Politis, D.N.; Romano, J.P.; Wolf, M. Subsampling; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999.
- Delgado, M.A.; Rodríguez-Poo, J.M.; Wolf, M. Subsampling inference in cube root asymptotics with an application to Manski's maximum score estimator. Econ. Lett. 2001, 73, 241–250.
- Gonzalo, J.; Wolf, M. Subsampling inference in threshold autoregressive models. J. Econom. 2005, 127, 201–224.
- Politis, D.N.; Romano, J.P. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Stat. 1994, 22, 2031–2050.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
- Harrison, D., Jr.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102.
- Harrison, D.; Rubinfeld, D. The Boston Housing Dataset Website. 1996. Available online: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html (accessed on 15 March 2021).
- Jones, L.K. The computational intractability of training sigmoidal neural networks. IEEE Trans. Inf. Theory 1997, 43, 167–173.
- Michelucci, U. Code for Estimating Neural Network's Performance with Bootstrap: A Tutorial. 2021. Available online: https://github.com/toelt-llc/NN-Performance-Bootstrap-Tutorial (accessed on 20 March 2021).

**Figure 1.** A numerical demonstration of the CLT. Panel (**a**) shows the asymmetric chi-squared distribution of random values for $k=10$ [18], normalized to have an average equal to zero; in panels (**b**–**d**), the distribution of the average of the random values is shown for sample sizes $n=2$, $n=10$, and $n=200$, respectively.

**Figure 2.** Distribution of the MSE values obtained by evaluating a trained NNM on 1800 bootstrap samples generated from ${S}_{V}$. The NNM is a small neural network with two layers of four neurons each with sigmoid activation functions, trained for 250 epochs with a mini-batch size of 16, using the Adam optimizer.

**Figure 3.** Distribution of the MSE values obtained by using Algorithms 4 (black line) and 5 (gray lines). The gray lines were obtained by generating 1800 bootstrap samples from two different validation datasets ${S}_{V}^{\left(1\right)}$ and ${S}_{V}^{\left(2\right)}$, as described in the text. The vertical dashed lines indicate the average of the respective distributions.

**Table 1.** Comparison of the average of the MSE and its standard deviation $\sigma$ obtained with selected algorithms applied to the synthetic dataset. The running times were obtained on a 2.3 GHz 8-core Intel i9 with 32 GB of 2667 MHz DDR4 memory.

Algorithm | <MSE> | $\mathit{\sigma}$ | Running Time
---|---|---|---
Split/Train (100 splits) | 0.098 | 0.01 | 5.8 min
Simple Bootstrap (100 bootstrap samples) | 0.097 | 0.009 | 5.7 s
k-fold cross-validation ($k=5$) | 0.106 | 0.008 | 18 s
Mixed approach (10 splits/100 bootstrap samples) | 0.105 | 0.01 | 59 s

**Table 2.** Comparison of the average of the MSE and its standard deviation $\sigma$ obtained with selected algorithms applied to the Boston Dataset. The running times were obtained on a 2.3 GHz 8-core Intel i9 with 32 GB of 2667 MHz DDR4 memory. <MSE> and $\sigma $ are expressed in the table in thousands of USD.

Algorithm | <MSE> | $\mathit{\sigma}$ | Running Time
---|---|---|---
Split/Train (100 splits) | 75.1 | 18.4 | 5.6 min
Simple Bootstrap (100 bootstrap samples) | 74.7 | 17.8 | 6.2 s
k-fold cross-validation ($k=5$) | 77.2 | 17.2 | 16.7 s
Mixed Bootstrap (10 splits/100 bootstrap samples) | 75.0 | 15.2 | 63 s


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Michelucci, U.; Venturini, F.
Estimating Neural Network’s Performance with Bootstrap: A Tutorial. *Mach. Learn. Knowl. Extr.* **2021**, *3*, 357-373.
https://doi.org/10.3390/make3020018
