# Measuring the Uncertainty of Predictions in Deep Neural Networks with Variational Inference


## Abstract


## 1. Introduction

## 2. Variational Inference and Related Work

#### 2.1. Bayesian and Variational Inference

#### 2.2. Related Work

## 3. Materials and Methods

- useful prediction uncertainty information can be computed
- the number of parameters to be optimized does not differ significantly from the non-Bayesian case
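The second point can be made concrete with a simple parameter count. If, as the per-layer $\tau_i$ and $\tau_{bi}$ values reported in Section 4 suggest, the approach learns only one uncertainty scalar per layer for the weights and one for the biases on top of the usual parameter means, the overhead is just two scalars per layer. The sketch below is an illustration under that assumption, not code from the paper:

```python
def frequentist_param_count(layer_shapes):
    """Parameters of a plain fully connected network:
    one weight matrix (n_in x n_out) plus one bias vector per layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes)

def bayesian_param_count(layer_shapes):
    """Same means as the frequentist network, plus two per-layer
    uncertainty scalars (tau_i for the weights, tau_bi for the biases),
    matching the per-layer tau values reported in Section 4."""
    return frequentist_param_count(layer_shapes) + 2 * len(layer_shapes)
```

For a two-layer network with shapes 784x300 and 300x10, the Bayesian variant adds only 4 parameters to the 238,510 of the frequentist one.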

#### 3.1. Derivation of the Approach

#### 3.2. Implementation

## 4. Results and Discussion

#### 4.1. LeNet and the MNIST Dataset

#### 4.2. Comparison to Bayes by Backprop

#### 4.3. GoogLeNet and Custom Dataset

`RandomResizedCrop` provided by PyTorch. Similarly to Section 4.1, we compare a frequentist model to our Bayesian approach. Both models are trained starting from an ImageNet pre-trained version of GoogLeNet available from PyTorch (https://download.pytorch.org/models/googlenet-1378be20.pth). In both cases, the pre-trained model is fine-tuned for 100 epochs with a learning rate of $0.01$, a momentum of $0.9$, and a batch size of 32. For the Bayesian approach, the a posteriori uncertainties are initialized identically for all layers; as before, they are set to 0.4 for $\tau_i$ and 0.1 for $\tau_{bi}$. The a priori variances are initialized to $2.0$ for the biases and $1.0$ for the weights. The Kullback–Leibler regularization term was weighted with a factor of $0.2 \times 10^{-6}$. After 100 epochs, the frequentist model reaches an accuracy of $0.9$ on the entire validation dataset. The Bayesian model reaches an accuracy of about $0.9$ in a single pass over the validation dataset (the exact number differs for each random forward pass). When the results of 100 forward passes are averaged for each image, the accuracy of the Bayesian model improves to $0.924$.
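The test-time averaging described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; `stochastic_forward` is a hypothetical stand-in for one forward pass with weights freshly sampled from the variational posterior (simulated here with Gaussian noise so the sketch is self-contained):

```python
import math
import random

NUM_CLASSES = 11   # classes in the custom fruit/vegetable dataset
NUM_PASSES = 100   # random forward passes averaged per image

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(image):
    """Placeholder for one random forward pass: each pass samples new
    weights from the variational posterior, so the logits vary."""
    return [random.gauss(0.0, 1.0) for _ in range(NUM_CLASSES)]

def predict(image, num_passes=NUM_PASSES):
    """Average the class probabilities of several random forward passes
    and return the most probable class, the averaged probabilities,
    and the per-pass samples (useful for uncertainty estimates)."""
    probs = [0.0] * NUM_CLASSES
    samples = []
    for _ in range(num_passes):
        p = softmax(stochastic_forward(image))
        samples.append(p)
        probs = [a + b for a, b in zip(probs, p)]
    probs = [p / num_passes for p in probs]
    label = max(range(NUM_CLASSES), key=lambda c: probs[c])
    return label, probs, samples
```

Averaging in probability space (after the softmax) rather than over raw logits is one design choice; the per-pass samples also feed the credible-interval analysis discussed in Section 4.1.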

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012. [Google Scholar]
2. Bengio, Y.; Schwenk, H.; Senécal, J.S.; Morin, F.; Gauvain, J.L. Neural Probabilistic Language Models. In Innovations in Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
3. Hornik, K.; Stinchcombe, M.; White, H. Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks. Neural Netw. **1990**, 3, 551–560. [Google Scholar] [CrossRef]
4. Liang, S.; Srikant, R. Why Deep Neural Networks For Function Approximation? In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar]
5. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA J. Am. Med. Assoc. **2016**, 316, 2402–2410. [Google Scholar] [CrossRef] [PubMed]
6. Greenspan, H.; van Ginneken, B.; Summers, R.M. Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Trans. Med. Imaging **2016**, 35. [Google Scholar] [CrossRef]
7. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. **2018**, 172, 1–11. [Google Scholar] [CrossRef][Green Version]
8. Banerjee, K.; Dinh, T.V.; Levkova, L. Velocity estimation from monocular video for automotive applications using convolutional neural networks. In Proceedings of the IEEE Intelligent Vehicles Symposium, Los Angeles, CA, USA, 11–14 June 2017. [Google Scholar]
9. Jozwik, K.M.; Kriegeskorte, N.; Storrs, K.R.; Mur, M. Deep Convolutional Neural Networks Outperform Feature-Based but not Categorical Models in Explaining Object Similarity Judgements. Front. Psychol. **2017**, 8, 1726. [Google Scholar] [CrossRef] [PubMed][Green Version]
10. Heaton, J.; Polson, N.; Witte, J. Deep learning for finance: Deep portfolios. Appl. Stoch. Model. Bus. Ind. **2017**, 33, 3–12. [Google Scholar] [CrossRef]
11. Kendall, A.; Gal, Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? In Advances in Neural Information Processing Systems 30; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5574–5584. [Google Scholar]
12. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd ICML, New York, NY, USA, 19–24 June 2016. [Google Scholar]
13. Rowan, A. Bayesian Deep Learning with Edward (and a Trick Using Dropout); PyData: London, UK, 2017. [Google Scholar]
14. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. **2014**, 15, 1929–1958. [Google Scholar]
15. Wan, L.; Zeiler, M.; Zhang, S.; LeCun, Y.; Fergus, R. Regularization of Neural Networks Using DropConnect. In Proceedings of the 30th ICML, Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
16. Gal, Y.; Ghahramani, Z. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In Proceedings of the ICLR, San Juan, PR, USA, 2–4 May 2016. [Google Scholar]
17. Kingma, D.P.; Salimans, T.; Welling, M. Variational Dropout and the Local Reparameterization Trick. In Advances in Neural Information Processing Systems 28; Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; pp. 2575–2583. [Google Scholar]
18. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
19. Arora, S.; Bhaskara, A.; Ge, R.; Ma, T. Provable Bounds for Learning Some Deep Representations. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014. [Google Scholar]
20. Polson, N.G.; Sokolov, V. Deep Learning: A Bayesian Perspective. Bayesian Anal. **2017**, 12, 1275–1304. [Google Scholar] [CrossRef]
21. Neal, R.M. Bayesian Learning for Neural Networks; Springer: New York, NY, USA, 1996. [Google Scholar] [CrossRef]
22. MacKay, D.J.C. Bayesian Methods for Backpropagation Networks. In Models of Neural Networks III; Springer: New York, NY, USA, 1996; pp. 211–254. [Google Scholar] [CrossRef]
23. Toussaint, U.V.; Gori, S.; Dose, V. Invariance priors for Bayesian feed-forward neural networks. Neural Netw. **2006**, 19, 1550–1557. [Google Scholar] [CrossRef] [PubMed][Green Version]
24. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 6402–6413. [Google Scholar]
25. Hinton, G.E.; van Camp, D. Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, Santa Cruz, CA, USA, 26–28 July 1993. [Google Scholar]
26. Jordan, M.I.; Ghahramani, Z.; Jaakkola, T.S.; Saul, L.K. An Introduction to Variational Methods for Graphical Models. Mach. Learn. **1999**, 37, 183–233. [Google Scholar] [CrossRef]
27. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. arXiv **2016**, arXiv:1601.00670. [Google Scholar] [CrossRef][Green Version]
28. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
29. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Appendix. arXiv **2016**, arXiv:1506.02157. [Google Scholar]
30. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Networks. In Proceedings of the ICML, Lille, France, 6–11 July 2015. [Google Scholar]
31. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE **1998**, 86, 2278–2324. [Google Scholar] [CrossRef][Green Version]
32. Louizos, C.; Welling, M. Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors. arXiv **2016**, arXiv:1603.04733. [Google Scholar]
33. Amari, S.I. Differential-Geometrical Methods in Statistics; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
34. Hernandez-Lobato, J.M.; Li, Y.; Rowland, M.; Hernandez-Lobato, D.; Bui, T.; Turner, R.E. Black-Box Alpha Divergence Minimization. arXiv **2015**, arXiv:1511.03243. [Google Scholar]
35. Li, Y.; Gal, Y. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. arXiv **2017**, arXiv:1703.02914. [Google Scholar]
36. Posch, K.; Pilz, J. Correlated Parameters to Accurately Measure Uncertainty in Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 1–15. [Google Scholar] [CrossRef] [PubMed][Green Version]
37. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv **2013**, arXiv:1312.6114. [Google Scholar]
38. Hershey, J.R.; Olsen, P.A. Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, USA, 15–20 April 2007. [Google Scholar]
39. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv **2014**, arXiv:1408.5093. [Google Scholar]
40. BVLC. Caffe. 2016. Available online: https://github.com/BVLC/caffe (accessed on 1 May 2017).
41. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
42. LeCun, Y.; Cortes, C.; Burges, C.J. The MNIST Database of Handwritten Digits. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 1 May 2017).
43. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
44. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning. JMLR **2012**, 27, 17–37. [Google Scholar]

**Figure 1.** Training visualization of the frequentist LeNet with dropout. The horizontal line marks the achieved accuracy.

**Figure 3.** Confusion matrix for the frequentist LeNet with dropout and exchanged data, and ROC curves for all three models.

**Figure 4.** Visualization of the a priori distribution. Bias terms may take on larger values since they act on sums.

**Figure 5.** Visualization of the initialization of $\tau_i$ and $\tau_{bi}$. Network weights can deviate from their expected value by at most the magnitude of that expected value.

**Figure 9.** Confusion matrix for the Bayesian LeNet with dropout and exchanged data, and ROC curves for all three models.

**Figure 10.** Boxplots of the random network outputs for two representative images of the MNIST dataset. Left: boxplot of a correct classification result; right: boxplot of an incorrect classification result.

**Figure 11.** All 14 images for which the network was certain, within 95% credible intervals, about its wrong classification result.

| Model | Test Error |
|---|---|
| without dropout | 0.93% |
| with dropout | 0.75% |
| with dropout and exchanged data | 1.94% |

| Model | Test Error | Absolute Reduction wrt. Frequentist | Relative Reduction wrt. Frequentist |
|---|---|---|---|
| without dropout | 0.85% | 0.08% | 8.6% |
| with dropout | 0.71% | 0.04% | 5.3% |
| dropout & exch. data | 1.64% | 0.3% | 15.5% |

| Layer | $\tau_i$ | $\tau_{bi}$ |
|---|---|---|
| convolutional 1 | 0.003905073 | 0.01703058 |
| convolutional 2 | 0.000045391 | 0.1021243 |
| fully connected 1 | 0.7580626 | 0.1471348 |
| fully connected 2 | 0.02509901 | 0.00004402 |

| Layer | $\tau_i$ | $\tau_{bi}$ |
|---|---|---|
| convolutional 1 | 0.003570068 | 0.01361556 |
| convolutional 2 | 0.000045395 | 0.1025237 |
| fully connected 1 | 0.5672782 | 0.1359651 |
| fully connected 2 | 0.01789308 | 0.00004324 |

| | Quite Certain | Uncertain |
|---|---|---|
| correct | 9609 | 297 |
| wrong | 14 | 80 |
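Counts like those above presumably come from thresholding on credible intervals (Figure 11 refers to 95% credible intervals), though the exact criterion is not spelled out in this excerpt. One plausible reading, sketched below with the hypothetical helpers `credible_interval` and `is_quite_certain`, flags a prediction as "quite certain" when the 95% credible interval of the predicted class's probability does not overlap those of the other classes:

```python
def credible_interval(samples, level=0.95):
    """Empirical central credible interval from Monte Carlo samples
    of a class probability (e.g. 100 random forward passes)."""
    s = sorted(samples)
    lo_idx = int((1.0 - level) / 2.0 * (len(s) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]

def is_quite_certain(prob_samples_per_class, predicted, level=0.95):
    """True if the predicted class's credible interval lies strictly
    above the intervals of all other classes (no overlap)."""
    lo_pred, _ = credible_interval(prob_samples_per_class[predicted], level)
    for c, samples in enumerate(prob_samples_per_class):
        if c == predicted:
            continue
        _, hi_other = credible_interval(samples, level)
        if hi_other >= lo_pred:
            return False
    return True
```

With such a criterion, each validation image contributes to one cell of the table depending on whether its prediction was correct and whether the intervals separate.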

| Method | Test Error |
|---|---|
| Bayes by Backprop, Gaussian prior [30] | 2.04% |
| Bayes by Backprop, Scale mixture prior [30] | 1.32% |
| SGD (vanilla) [30] | 1.88% |
| Ours | 1.54% |

| | Train | Test |
|---|---|---|
| Apple | 155 | 38 |
| Avocado | 304 | 77 |
| Banana | 165 | 39 |
| Blackberry | 127 | 32 |
| Blueberry | 103 | 26 |
| Carrot | 129 | 32 |
| Cucumber | 143 | 36 |
| Grape | 185 | 46 |
| Peach | 217 | 53 |
| Pear | 187 | 48 |
| Strawberry | 237 | 58 |

| | Quite Certain | Uncertain |
|---|---|---|
| correct | 331 | 117 |
| wrong | 1 | 36 |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Steinbrener, J.; Posch, K.; Pilz, J.
Measuring the Uncertainty of Predictions in Deep Neural Networks with Variational Inference. *Sensors* **2020**, *20*, 6011.
https://doi.org/10.3390/s20216011
