Neural Stochastic Differential Equations with Neural Processes Family Members for Uncertainty Estimation in Deep Learning
Abstract
1. Introduction
- Exploiting the translation-equivariance property of ConvCNPs, the vNPs–SDE model implemented with ConvCNPs can effectively handle ID data with missing values at various missing rates in 1D regression and 2D image classification tasks.
- Exploiting the property of permutation invariance, the vNPs–SDE model implemented with CNPs or ANPs surpasses BBP, MC-dropout, and vanilla SDE-Net on most metrics in multidimensional regression tasks with high missing rates.
2. Materials
2.1. Definition of the Neural Processes Family
- (1) Encoder: The encoder E of NPs has two paths: a deterministic path and a latent path. In the deterministic path, each context pair $(x_i, y_i)$ is passed through a multi-layer perceptron $h_\theta$ to produce a deterministic representation $r_i$. In the latent path, a latent representation $s_i$ is generated by passing each context pair through another MLP $h_\phi$. Thus, the purpose of encoder E is to convert the input space into a deterministic or latent representation space, where the input space represents the n context points $\{(x_i, y_i)\}_{i=1}^{n}$, and the representation space produces $r_i$ and $s_i$ for each of the pairs $(x_i, y_i)$.
- (2) Aggregator: Aggregator a aims to summarise the n representations $r_i$ and $s_i$. The simplest operation for a is the mean function $r = \frac{1}{n}\sum_{i=1}^{n} r_i$, which ensures order invariance and performs well in practice. For the deterministic path, a is applied to $\{r_i\}_{i=1}^{n}$ to produce the deterministic code $r$. For the latent path, however, we are interested in achieving an order-invariant global latent representation, so we apply a to $\{s_i\}_{i=1}^{n}$ to produce the latent code $s$, which parameterizes the normal distribution $\mathcal{N}(\mu(s), \sigma^2(s))$ for the latent path.
- (3) Decoder: In decoder D, the sampled global latent variable $z$ and the deterministic code $r$ are concatenated alongside the new target locations $x^{*}_{t}$ as inputs, and finally passed through D to produce the predictions $\hat{y}^{*}_{t}$ for the corresponding values of $y^{*}_{t}$. We parameterize decoder D as a neural network; a minimal code sketch of the full encoder, aggregator, and decoder pipeline follows the equations below.
- ;
- , ;
- ;
- ,
2.2. Definition of SDE-Net
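Following the SDE-Net formulation of Kong et al. (2020), the hidden state evolves as $dx_t = f(x_t, t)\,dt + g(x_0, t)\,dW_t$, where the drift net f fits the ID data and the diffusion net g is trained to be small on ID inputs and large on OOD inputs; the Euler-Maruyama discretization gives $x_{k+1} = x_k + f(x_k, t_k)\,\Delta t + g(x_0, t_k)\,\sqrt{\Delta t}\,Z_k$ with $Z_k \sim \mathcal{N}(0, I)$. Below is a minimal PyTorch sketch of that forward pass; the class name `TinySDENet`, the layer sizes, and the sigmoid diffusion head are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class TinySDENet(nn.Module):
    """Euler-Maruyama forward pass: x_{k+1} = x_k + f(x_k,t)dt + g(x_0,t)sqrt(dt)Z."""
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.depth = depth
        # Drift net f(x, t): fits the ID data.
        self.drift = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh(),
                                   nn.Linear(dim, dim))
        # Diffusion net g(x_0, t): scalar in (0, 1), small on ID, large on OOD.
        self.diffusion = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh(),
                                       nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x0):
        x, dt = x0, 1.0 / self.depth
        for k in range(self.depth):
            t = torch.full_like(x[:, :1], k * dt)   # current layer time
            z = torch.randn_like(x)                 # Brownian increment Z_k
            x = (x + self.drift(torch.cat([x, t], dim=-1)) * dt
                 + self.diffusion(torch.cat([x0, t], dim=-1)) * math.sqrt(dt) * z)
        return x
```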
3. Proposed Methods
3.1. The Architecture of vNPs–SDE-Net
3.1.1. vNPs–SDE-Net for Synthetic 1D Regression and 2D Image Classification Tasks
3.1.2. vNPs–SDE-Net for Multidimensional Regression Tasks
3.2. The Objective Function of the vNPs–SDE-Net for Uncertainty Estimates
- (1) Completed_I = vNPs(Masked_I), where Masked_I denotes the ID inputs after Bernoulli masking with rate MR;
- (2) Means, Vars = SDE-Net(Completed_I), as sketched below.
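Put together, prediction is a two-stage composition: the NP-family member completes the masked input, and SDE-Net maps the completion to predictive means and variances. The following is a minimal sketch under the assumptions of the `SimpleCNP` and `TinySDENet` sketches above; the helper name, the shared context mask, and the assumption that the SDE block's width matches the flattened completion are all illustrative:

```python
import torch

@torch.no_grad()
def vnps_sde_predict(cnp, sde_net, x_all, y_obs, ctx_mask, n_samples=10):
    """Two-stage vNPs-SDE forward pass (illustrative sketch).

    x_all:    (B, n, x_dim) all input locations
    y_obs:    (B, n, y_dim) observed values (arbitrary where missing)
    ctx_mask: (n,) boolean, True where the value was observed
    """
    # (1) Completed_I = vNPs(Masked_I): observed points form the context,
    #     and the CNP predicts values at every location.
    mean_c, _ = cnp(x_all[:, ctx_mask], y_obs[:, ctx_mask], x_all)
    completed = torch.where(ctx_mask.view(1, -1, 1), y_obs, mean_c)
    # (2) Means, Vars = SDE-Net(Completed_I): repeated stochastic passes
    #     through the SDE block yield a predictive mean and variance.
    outs = torch.stack([sde_net(completed.flatten(1)) for _ in range(n_samples)])
    return outs.mean(dim=0), outs.var(dim=0)
```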
3.3. The Implementation of vNPs–SDE-Net
3.3.1. The Implementation of vNPs–SDE-Net with ConvCNPs
Algorithm 1 Implementation of the ConvCNPs–SDE model
Inputs: ID dataset $D_{ID}$; CR and MR are the context rate and missing rate, respectively; ccnps represents the ConvCNPs model of vNPs for completing the ID dataset; $h_{down}$ is the downsampling net for 2D image classification tasks and $h_{up}$ is the upsampling net for 1D regression tasks; $h_{fc}$ is the fully connected net; f represents the drift net and g represents the diffusion net; t is the layer depth; $\ell_{CE}$ is the cross-entropy loss function, $\ell_{LL}$ is the log-likelihood loss function, and $\ell_{BCE}$ is the binary cross-entropy loss function.
Outputs: Means and Vars.
for #training iterations do
1. Sample a minibatch of m data from ID;
2. if 1D regression task:
3. Generate context points from the sampled target points according to CR, so that the number of context points equals CR times the number of target points;
4. Forward through the ConvCNPs model: Y_dist = ccnps(contexts);
5. Forward through the upsampling net of the SDE-Net block: $x_0 = h_{up}(\cdot)$;
6. else (2D image classification task):
7. Forward through the ConvCNPs model: Y_dist = ccnps(contexts);
8. Forward through the downsampling net of the SDE-Net block: $x_0 = h_{down}(\cdot)$;
9. for k = 0 to t − 1 do
10. Sample $Z_k \sim \mathcal{N}(0, I)$; $x_{k+1} = x_k + f(x_k, k\Delta t)\,\Delta t + g(x_0, k\Delta t)\,\sqrt{\Delta t}\,Z_k$;
11. end for
12. Forward through the fully connected layer of the SDE-Net block: $h_{fc}(x_t)$;
13. Update $h_{down}$ (or $h_{up}$), $h_{fc}$, and f by minimizing $\ell_{CE}$ for classification or maximizing $\ell_{LL}$ for regression;
14. Update ccnps by maximizing $\ell_{LL}$ of Y_dist at the target points;
15. Sample a minibatch of data from ID;
16. Sample a minibatch of data from OOD;
17. Forward both minibatches through the downsampling or upsampling nets of the SDE-Net block;
18. Update g by minimizing $\ell_{BCE}$, so that the diffusion is small on ID inputs and large on OOD inputs;
end for
for #testing iterations do
19. Evaluate the trained ConvCNPs–SDE model;
20. Sample a minibatch of m data from ID;
21. mask = Bernoulli(1 − MR);
22. masked_I = mask ∗ I;
23. completed_I = ccnps(masked_I);
24. Means, Vars = SDE-Net(completed_I);
end for
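Steps 21 to 24 of Algorithm 1 can be made concrete with a short test-time sketch. The `ccnps` call signature (masked image plus mask), the image-shaped tensors, and the end-to-end `sde_net` that maps a completed image to class outputs are assumptions for illustration:

```python
import torch

def mask_and_complete(ccnps, sde_net, images, mr=0.5, n_samples=10):
    """Test-time steps 21-24: Bernoulli-mask pixels, complete with ConvCNPs,
    then estimate Means/Vars with stochastic SDE-Net passes (illustrative)."""
    # Step 21: keep each pixel with probability 1 - MR (mask shared over channels).
    mask = torch.bernoulli(torch.full_like(images[:, :1], 1.0 - mr))
    # Step 22: zero out the missing pixels.
    masked = mask * images
    # Step 23: ConvCNPs completes the image from the observed pixels.
    completed = ccnps(masked, mask)
    # Step 24: repeated stochastic forward passes through SDE-Net.
    outs = torch.stack([sde_net(completed) for _ in range(n_samples)])
    return outs.mean(dim=0), outs.var(dim=0)
```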
3.3.2. The Implementation of vNPs–SDE-Net with CNPs or ANPs
Algorithm 2 Implementation of the CNPs–SDE or ANPs–SDE models
Inputs: ID dataset $D_{ID}$; MR is the missing rate of the ID dataset; the downsampling layer is the encoder of the CNPs or ANPs models; f and g are the drift net and diffusion net, respectively; $\ell_{NLL}$ is the negative log-likelihood loss function for the CNPs model or the ELBO for the ANPs model; $\ell_{BCE}$ is the binary cross-entropy loss function; the fully connected layer is the decoder of the CNPs or ANPs models and produces Means and Vars.
Outputs: Means and Vars.
for #training iterations do
1. Sample a minibatch of m data from ID;
2. Forward through the downsampling net (the NP encoder) to obtain the deterministic representation and the latent code $z$;
3. Forward through the SDE-Net block:
4. for k = 0 to t − 1 do
5. Sample $Z_k \sim \mathcal{N}(0, I)$;
6. $x_{k+1} = x_k + f(x_k, k\Delta t)\,\Delta t + g(x_0, k\Delta t)\,\sqrt{\Delta t}\,Z_k$;
7. end for
8. Means, Vars = decoder($x_t$);
9. Update the encoder, the decoder, and f by minimizing $\ell_{NLL}$ (CNPs–SDE) or maximizing the ELBO (ANPs–SDE);
10. Sample a minibatch of data from ID;
11. Sample a minibatch of data from OOD;
12. Forward both minibatches through the downsampling or upsampling nets of the SDE-Net block;
13. Update g by minimizing $\ell_{BCE}$;
end for
for #testing iterations do
14. Evaluate the trained CNPs–SDE or ANPs–SDE models;
15. Sample a minibatch of m data from ID;
16. mask = Bernoulli(1 − MR);
17. masked_I = mask ∗ I;
18. Means, Vars = CNPs_SDE(masked_I) or ANPs_SDE(masked_I).
end for
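The diffusion-net update (steps 10 to 13 here, and steps 15 to 18 in Algorithm 1) trains g to separate ID from OOD inputs with a binary cross-entropy loss. A minimal sketch, assuming g ends in a sigmoid as in the Section 2.2 sketch and that only g is updated in this step:

```python
import torch
import torch.nn.functional as F

def diffusion_step(diffusion, encoder, opt_g, x_id, x_ood):
    """One diffusion-net update: push g toward 0 on ID inputs and toward 1
    on OOD inputs via binary cross-entropy (illustrative sketch)."""
    with torch.no_grad():                      # encoder is frozen in this step
        h_id, h_ood = encoder(x_id), encoder(x_ood)
    t_id = torch.zeros(h_id.size(0), 1)        # evaluate g at the initial time
    t_ood = torch.zeros(h_ood.size(0), 1)
    g_id = diffusion(torch.cat([h_id, t_id], dim=-1))
    g_ood = diffusion(torch.cat([h_ood, t_ood], dim=-1))
    loss = (F.binary_cross_entropy(g_id, torch.zeros_like(g_id))
            + F.binary_cross_entropy(g_ood, torch.ones_like(g_ood)))
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```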
4. Results
4.1. Evaluation Metrics
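The detection metrics reported in the tables below (TNR at TPR 95%, AUROC, detection accuracy, and AUPR) can all be computed from a scalar uncertainty score. A minimal scikit-learn sketch, assuming the convention that the score is larger on OOD (or misclassified) inputs; the function name and score convention are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, average_precision_score

def ood_metrics(scores_id, scores_ood):
    """AUROC, TNR at 95% TPR, detection accuracy, and AUPR-Out for an
    uncertainty score assumed to be larger on OOD inputs (illustrative)."""
    y = np.concatenate([np.zeros_like(scores_id), np.ones_like(scores_ood)])
    s = np.concatenate([scores_id, scores_ood])
    auroc = roc_auc_score(y, s)
    fpr, tpr, _ = roc_curve(y, s)
    tnr_at_tpr95 = 1.0 - fpr[np.searchsorted(tpr, 0.95)]  # TNR = 1 - FPR
    det_acc = np.max(0.5 * (tpr + (1.0 - fpr)))           # best (TPR+TNR)/2
    aupr_out = average_precision_score(y, s)              # OOD = positive class
    return auroc, tnr_at_tpr95, det_acc, aupr_out
```

For the models above, the score would typically come from the Vars produced by repeated stochastic forward passes.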
4.2. vNPs–SDE Model for ID Dataset with MR
4.2.1. ConvCNPs–SDE-Net for Synthetic 1D Regression Tasks
4.2.2. The CNPs–SDE and ANPs–SDE Models for Multidimensional Regression Tasks
4.2.3. ConvCNPs–SDE-Net for Image Classification Dataset: MNIST
4.2.4. ConvCNPs–SDE-Net for Image Classification Dataset: CIFAR10
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
SDE-Net | Neural stochastic differential equation model |
DNNs | Deep neural networks |
ID | In-distribution |
OOD | Out-of-distribution |
NPs | Neural processes |
vNPs | Vanilla neural processes or neural process variants |
ConvCNP | Convolutional conditional neural process |
CNPs | Conditional neural processes |
ANPs | Attentive neural processes |
BNNs | Bayesian neural networks
PCA | Principal component analysis |
GPs | Gaussian processes |
MLP | Multilayer perceptron |
CNNs | Convolutional neural networks |
ODE-Net | Neural ordinary differential equation |
Conv2d | Two-dimensional convolution |
RKHS | Reproducing kernel Hilbert space |
ResNet | Residual networks |
ELBO | Evidence lower bound |
KL | Kullback–Leibler divergence |
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Singh, S.P.; Kumar, A.; Darbari, H.; Singh, L.; Jain, S. Machine translation using deep learning: An overview. In Proceedings of the 2017 International Conference on Computer, Communications and Electronics (Comptelix), Jaipur, India, 1–2 July 2017; pp. 162–167.
- Mousavi, S.S.; Schukat, M.; Howley, E. Deep Reinforcement Learning: An Overview. In Proceedings of the SAI Intelligent Systems Conference, London, UK, 3–4 September 2018; pp. 426–440.
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 1321–1330.
- MacKay, D.J.C. Information Theory, Inference and Learning Algorithms; Cambridge University Press: New York, NY, USA, 2003.
- Kingma, D.P.; Salimans, T.; Welling, M. Variational dropout and the local reparameterization trick. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montréal, Canada, 7–12 December 2015; Volume 2, pp. 2575–2583.
- Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural network. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1613–1622.
- Izmailov, P.; Maddox, W.J.; Kirichenko, P.; Garipov, T.; Vetrov, D.P.; Wilson, A.G. Subspace Inference for Bayesian Deep Learning. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence, Tel Aviv, Israel, 22–25 July 2019; pp. 1169–1179.
- Garipov, T.; Izmailov, P.; Podoprikhin, D.; Vetrov, D.P.; Wilson, A.G. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 2–8 December 2018; Volume 31, pp. 8789–8798.
- Wang, Y.; Yao, S.; Xu, T. Incremental Kernel Principal Components Subspace Inference with Nyström Approximation for Bayesian Deep Learning. IEEE Access 2021, 9, 36241–36251.
- Gal, Y.; Ghahramani, Z. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. In Proceedings of the ICLR Workshop Track, San Juan, Puerto Rico, 2–4 May 2016.
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 6402–6413.
- Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.P.; Wilson, A.G. Averaging Weights Leads to Wider Optima and Better Generalization. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, Monterey, CA, USA, 6–8 August 2018; pp. 876–885.
- Geifman, Y.; Uziel, G.; El-Yaniv, R. Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1050–1059.
- Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5580–5590.
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D. Neural ordinary differential equations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 2–8 December 2018; Volume 31, pp. 6572–6583.
- Kong, L.; Sun, J.; Zhang, C. SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, Vienna, Austria, 13–18 July 2020; Volume 1, pp. 5405–5415.
- Øksendal, B. Stochastic Differential Equations; Springer: Berlin, Germany, 2003; p. 11.
- Bass, R.F. Stochastic Processes; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: New York, NY, USA, 2011.
- Jeanblanc, M.; Yor, M.; Chesney, M. Continuous-Path Random Processes: Mathematical Prerequisites. In Mathematical Methods for Financial Markets; Avellaneda, M., Barone-Adesi, G., Eds.; Springer: Dordrecht, The Netherlands; Heidelberg, Germany; London, UK; New York, NY, USA, 2009.
- Hendrycks, D.; Gimpel, K. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Lee, K.; Lee, H.; Lee, K.; Shin, J. Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Malinin, A.; Gales, M. Predictive uncertainty estimation via prior networks. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 2–8 December 2018; Volume 31, pp. 7047–7058.
- Arpit, D.; Jastrzębski, S.; Ballas, N.; Krueger, D.; Bengio, E.; Kanwal, M.S.; Maharaj, T.; Fischer, A.; Courville, A.; Bengio, Y. A Closer Look at Memorization in Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 233–242.
- Jiang, L.; Huang, D.; Liu, M.; Yang, W. Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels. In Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; Volume 1, pp. 4804–4815.
- Garnelo, M.; Schwarz, J.; Rosenbaum, D.; Viola, F.; Rezende, D.J.; Eslami, S.M.A.; Teh, Y.W. Neural Processes. arXiv 2018, arXiv:1807.01622.
- Garnelo, M.; Rosenbaum, D.; Maddison, C.; Ramalho, T.; Saxton, D.; Shanahan, M.; Teh, Y.W.; Rezende, D.J.; Eslami, S.M.A. Conditional neural processes. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1690–1699.
- Kim, H.; Mnih, A.; Schwarz, J.; Garnelo, M.; Eslami, S.M.A.; Rosenbaum, D.; Vinyals, O.; Teh, Y.W. Attentive Neural Processes. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Gordon, J.; Bruinsma, W.P.; Foong, A.Y.K.; Requeima, J.; Dubois, Y.; Turner, R.E. Convolutional Conditional Neural Processes. In Proceedings of the 8th International Conference on Learning Representations, Virtual Conference (formerly Addis Ababa, Ethiopia), 26 April–1 May 2020.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5998–6008.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 30 November 1998).
- Cohen, T.S.; Welling, M. Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 2990–2999.
- Aronszajn, N. Theory of Reproducing Kernels. Trans. Am. Math. Soc. 1950, 68, 337–404.
- Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.R.; Smola, A.J. Deep Sets. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 3391–3401.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 630–645.
- Rezende, D.; Mohamed, S. Variational Inference with Normalizing Flows. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1530–1538.
- Raissi, M.; Karniadakis, G.E. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys. 2018, 357, 125–141.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 31 December 2017).
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 8 April 2009).
Model | Missing Rate | RMSE |
---|---|---|
BBP | MR = 0.1 | 16.6 ± 0.1 |
MC-dropout | 12.4 ± 0.4 | |
ANPs–SDE-Net | 9.3 ± 0.6 | |
SDE-Net | 9.3 ± 0.7 | |
CNPs–SDE-Net | 9.1 ± 0.6 | |
BBP | MR = 0.3 | 20.4 ± 0.5 |
MC-dropout | 15.2 ± 0.6 | |
ANPs–SDE-Net | 13.1 ± 0.9 | |
SDE-Net | 13.1 ± 1.0 | |
CNPs–SDE-Net | 12.9 ± 0.8 | |
BBP | MR = 0.5 | 23.5 ± 0.4 |
MC-dropout | 17.5 ± 1.0 | |
ANPs–SDE-Net | 16.1 ± 1.1 | |
SDE-Net | 16.1 ± 1.2 | |
CNPs–SDE-Net | 15.8 ± 1.0 | |
BBP | MR = 0.7 | 26.3 ± 1.1 |
MC-dropout | 19.6 ± 1.2 | |
ANPs–SDE-Net | 18.5 ± 1.3 | |
SDE-Net | 18.5 ± 1.4 | |
CNPs–SDE-Net | 18.2 ± 1.2 | |
BBP | MR = 0.9 | 28.8 ± 1.3 |
MC-dropout | 21.5 ± 1.3 | |
ANPs–SDE-Net | 20.7 ± 1.4 | |
SDE-Net | 20.7 ± 1.6 | |
CNPs–SDE-Net | 20.4 ± 1.3 |
Model | #Parameters | RMSE | TNR at TPR 95% | AUROC | Detection Accuracy | AUPR In | AUPR Out |
---|---|---|---|---|---|---|---|
BBP | 30.0K | 9.5 ± 0.2 | 9.0 ± 1.4 | 56.8 ± 0.9 | 52.1 ± 0.7 | 45.3 ± 1.3 | 1.3 ± 0.1 |
MC-dropout | 14.9K | 8.8 ± 0.0 | 6.1 ± 0.5 | 53.0 ± 1.2 | 53.7 ± 0.6 | 99.2 ± 0.2 | 1.1 ± 0.1 |
ANPs–SDE-Net | 288.5K | 8.8 ± 0.0 | 44.4 ± 4.2 | 84.2 ± 1.6 | 75.2 ± 1.7 | 99.8 ± 0.0 | 28.8 ± 2.0 |
CNPs–SDE-Net | 141.0K | 8.9 ± 0.1 | 7.9 ± 1.2 | 59.3 ± 1.5 | 58.6 ± 0.9 | 99.2 ± 0.1 | 1.3 ± 0.1 |
SDE-Net | 12.4K | 8.7 ± 0.1 | 64.3 ± 0.6 | 84.1 ± 1.1 | 80.6 ± 0.5 | 99.7 ± 0.0 | 24.7 ± 1.0 |
MNIST with MR | Model | Classification Accuracy | TNR at TPR 95% | AUROC | Detection Accuracy | AUPR In | AUPR Out |
---|---|---|---|---|---|---|---|
MR = 0.1 | BBP | 88.76 ± 1.73 | 33.17 ± 5.09 | 87.77 ± 2.03 | 83.75 ± 2.32 | 85.85 ± 3.21 | 94.02 ± 3.42 |
MC-dropout | 98.90 ± 0.06 | 88.66 ± 0.04 | 96.22 ± 0.04 | 92.18 ± 0.02 | 89.23 ± 0.05 | 98.44 ± 0.03 | |
ConvCNPs–SDE-Net | 99.32 ± 0.06 | 98.69 ± 0.04 | 99.68 ± 0.00 | 97.99 ± 0.02 | 99.01 ± 0.01 | 99.89 ± 0.00 | |
SDE-Net | 98.87 ± 0.06 | 99.40 ± 0.03 | 99.86 ± 0.01 | 98.82 ± 0.04 | 99.61 ± 0.02 | 99.94 ± 0.01 | |
MR = 0.3 | BBP | 78.37 ± 3.10 | 24.90 ± 3.12 | 82.06 ± 4.12 | 74.46 ± 2.27 | 73.86 ± 2.15 | 90.22 ± 2.30 |
MC-dropout | 93.47 ± 0.06 | 50.10 ± 0.06 | 92.01 ± 0.03 | 86.84 ± 0.06 | 84.68 ± 0.03 | 95.75 ± 0.05 | |
ConvCNPs–SDE-Net | 98.94 ± 0.07 | 99.04 ± 0.04 | 99.78 ± 0.01 | 98.27 ± 0.06 | 99.34 ± 0.03 | 99.92 ± 0.01 | |
SDE-Net | 94.98 ± 0.19 | 99.47 ± 0.02 | 99.70 ± 0.03 | 98.96 ± 0.03 | 99.51 ± 0.06 | 99.82 ± 0.04 | |
MR = 0.5 | BBP | 52.50 ± 4.15 | 16.70 ± 2.10 | 70.47 ± 3.12 | 78.37 ± 3.10 | 54.43 ± 3.34 | 84.60 ± 2.12 |
MC-dropout | 75.88 ± 0.10 | 21.72 ± 0.09 | 81.64 ± 0.09 | 75.87 ± 0.06 | 70.73 ± 0.04 | 89.60 ± 0.05 | |
ConvCNPs–SDE-Net | 97.45 ± 0.17 | 98.58 ± 0.03 | 99.70 ± 0.01 | 98.15 ± 0.05 | 99.17 ± 0.02 | 99.87 ± 0.01 | |
SDE-Net | 80.54 ± 0.36 | 99.17 ± 0.05 | 98.45 ± 0.08 | 97.26 ± 0.04 | 98.06 ± 0.05 | 98.96 ± 0.07 | |
MR = 0.7 | BBP | 23.71 ± 4.11 | 17.40 ± 2.23 | 66.36 ± 3.12 | 61.91 ± 3.23 | 43.01 ± 2.44 | 83.18 ± 2.23 |
MC-dropout | 46.48 ± 0.10 | 12.16 ± 0.16 | 70.32 ± 0.11 | 65.67 ± 0.16 | 53.76 ± 0.21 | 83.06 ± 0.15 | |
ConvCNPs–SDE-Net | 88.96 ± 0.33 | 97.45 ± 0.06 | 99.20 ± 0.03 | 97.47 ± 0.07 | 98.03 ± 0.04 | 99.62 ± 0.03 | |
SDE-Net | 49.25 ± 0.11 | 45.25 ± 1.36 | 93.53 ± 0.13 | 90.68 ± 0.15 | 92.36 ± 0.16 | 95.68 ± 0.12 | |
MR = 0.9 | BBP | 10.08 ± 1.14 | 54.34 ± 2.34 | 84.13 ± 4.34 | 77.78 ± 1.44 | 59.63 ± 2.33 | 93.72 ± 1.34 |
MC-dropout | 17.15 ± 0.54 | 8.41 ± 0.44 | 60.38 ± 0.35 | 57.66 ± 0.34 | 38.11 ± 0.24 | 77.82 ± 0.36 | |
ConvCNPs–SDE-Net | 36.54 ± 0.58 | 79.72 ± 0.65 | 96.20 ± 0.06 | 93.82 ± 0.14 | 92.86 ± 0.16 | 97.81 ± 0.08 | |
SDE-Net | 14.56 ± 0.34 | 10.30 ± 0.49 | 71.82 ± 0.25 | 67.83 ± 0.18 | 64.34 ± 0.30 | 82.80 ± 0.23 | |
MR = RMR | BBP | 36.90 ± 10.56 | 26.37 ± 3.63 | 73.05 ± 2.26 | 67.85 ± 1.62 | 51.52 ± 2.56 | 87.43 ± 1.22 |
MC-dropout | 42.97 ± 1.60 | 11.26 ± 2.41 | 69.34 ± 1.43 | 64.95 ± 1.20 | 53.56 ± 2.62 | 82.43 ± 1.12 | |
ConvCNPs–SDE-Net | 68.90 ± 11.66 | 95.72 ± 1.60 | 97.89 ± 0.85 | 95.91 ± 0.96 | 95.70 ± 1.67 | 98.89 ± 0.45 | |
SDE-Net | 45.34 ± 13.66 | 22.73 ± 7.30 | 86.24 ± 5.13 | 83.25 ± 5.82 | 83.74 ± 6.71 | 91.00 ± 3.11 |
MNIST with MR | Model | TNR at TPR 95% | AUROC | Detection Accuracy | AUPR Succ | AUPR Err |
---|---|---|---|---|---|---|
MR = 0.1 | BBP | 40.78 ± 2.34 | 89.55 ± 0.92 | 82.52 ± 1.25 | 98.79 ± 0.22 | 44.14 ± 3.22 |
MC-dropout | 86.52 ± 1.22 | 95.59 ± 0.88 | 91.76 ± 0.68 | 99.93 ± 0.01 | 36.68 ± 1.88 | |
ConvCNPs–SDE-Net | 92.44 ± 1.06 | 98.15 ± 0.12 | 95.01 ± 0.50 | 99.99 ± 0.00 | 31.49 ± 5.90 | |
SDE-Net | 84.43 ± 3.22 | 96.75 ± 0.60 | 92.59 ± 0.76 | 99.96 ± 0.01 | 31.32 ± 2.89 | |
MR = 0.3 | BBP | 24.90 ± 2.26 | 82.15 ± 2.82 | 75.45 ± 1.32 | 94.61 ± 2.34 | 54.26 ± 2.04 |
MC-dropout | 60.92 ± 1.24 | 91.73 ± 0.41 | 54.25 ± 1.51 | 99.21 ± 0.01 | 46.33 ± 1.31 | |
ConvCNPs–SDE-Net | 85.80 ± 3.63 | 97.22 ± 0.42 | 93.00 ± 1.02 | 99.97 ± 0.01 | 35.81 ± 4.71 | |
SDE-Net | 54.25 ± 1.51 | 91.11 ± 0.46 | 84.01 ± 0.93 | 99.44 ± 0.04 | 40.00 ± 1.75 | |
MR = 0.5 | BBP | 12.17 ± 2.16 | 72.48 ± 1.26 | 68.08 ± 1.52 | 78.42 ± 4.25 | 68.32 ± 2.55 |
MC-dropout | 27.28 ± 0.33 | 82.15 ± 0.33 | 76.10 ± 0.23 | 92.85 ± 0.13 | 56.85 ± 0.43 | |
ConvCNPs–SDE-Net | 72.53 ± 3.26 | 95.07 ± 0.51 | 89.36 ± 0.39 | 99.86 ± 0.01 | 36.57 ± 4.60 | |
SDE-Net | 33.07 ± 0.53 | 83.79 ± 0.35 | 76.12 ± 0.24 | 95.33 ± 0.14 | 56.45 ± 0.83 | |
MR = 0.7 | BBP | 10.17 ± 1.27 | 65.10 ± 1.21 | 65.09 ± 0.82 | 51.87 ± 0.57 | 82.44 ± 1.22 |
MC-dropout | 15.67 ± 0.87 | 73.82 ± 0.35 | 68.55 ± 0.32 | 72.23 ± 0.37 | 72.72 ± 0.33 | |
ConvCNPs–SDE-Net | 43.58 ± 1.67 | 89.10 ± 0.16 | 82.10 ± 0.41 | 98.44 ± 0.07 | 48.40 ± 1.01 | |
SDE-Net | 28.88 ± 0.97 | 76.12 ± 0.50 | 68.84 ± 0.47 | 75.88 ± 0.34 | 75.61 ± 0.76 | |
MR = 0.9 | BBP | 9.21 ± 1.22 | 69.39 ± 1.45 | 64.41 ± 0.62 | 26.28 ± 1.46 | 94.27 ± 0.32 |
MC-dropout | 7.45 ± 0.32 | 59.26 ± 0.43 | 57.29 ± 0.12 | 25.06 ± 0.22 | 86.16 ± 0.22 | |
ConvCNPs–SDE-Net | 15.73 ± 0.67 | 70.02 ± 0.70 | 64.73 ± 0.81 | 60.84 ± 1.19 | 77.94 ± 0.59 | |
SDE-Net | 7.14 ± 0.42 | 60.91 ± 0.41 | 59.25 ± 0.30 | 22.04 ± 0.79 | 88.79 ± 0.25 | |
MR = RMR | BBP | 10.85 ± 2.43 | 70.43 ± 5.52 | 67.83 ± 3.42 | 68.10 ± 6.22 | 80.86 ± 3.23 |
MC-dropout | 34.81 ± 13.32 | 78.00 ± 5.40 | 72.60 ± 4.40 | 77.47 ± 9.45 | 76.60 ± 4.21 | |
ConvCNPs–SDE-Net | 36.95 ± 11.08 | 88.19 ± 2.74 | 81.51 ± 2.59 | 94.38 ± 3.33 | 71.57 ± 6.18 | |
SDE-Net | 35.11 ± 15.40 | 83.44 ± 6.02 | 76.25 ± 4.80 | 80.15 ± 15.44 | 83.92 ± 3.18 |
CIFAR10 with MR | Model | Classification Accuracy | TNR at TPR 95% | AUROC | Detection Accuracy | AUPR In | AUPR Out |
---|---|---|---|---|---|---|---|
MR = 0.1 | BBP | 19.83 ± 0.45 | 64.31 ± 3.35 | 92.57 ± 2.47 | 86.49 ± 2.45 | 84.29 ± 1.55 | 96.46 ± 1.42 |
MC-dropout | 43.01 ± 0.35 | 3.67 ± 0.13 | 50.90 ± 0.15 | 52.50 ± 0.10 | 31.15 ± 0.16 | 71.50 ± 0.05 | |
ConvCNPs–SDE-Net | 78.66 ± 0.20 | 4.46 ± 0.31 | 59.97 ± 0.16 | 59.33 ± 0.11 | 42.23 ± 0.15 | 75.54 ± 0.07 | |
SDE-Net | 23.23 ± 0.25 | 1.57 ± 0.10 | 37.79 ± 0.13 | 50.36 ± 0.06 | 24.04 ± 0.14 | 63.41 ± 0.06 | |
MR = 0.3 | BBP | 19.64 ± 0.15 | 59.53 ± 3.43 | 92.62 ± 1.89 | 86.50 ± 1.55 | 86.45 ± 2.45 | 96.36 ± 1.45 |
MC-dropout | 24.91 ± 0.25 | 1.87 ± 0.15 | 40.81 ± 0.35 | 50.01 ± 0.26 | 23.62 ± 0.12 | 65.83 ± 0.25 | |
ConvCNPs–SDE-Net | 76.46 ± 0.20 | 4.25 ± 0.37 | 58.31 ± 0.30 | 57.89 ± 0.20 | 40.55 ± 0.18 | 74.71 ± 0.22 | |
SDE-Net | 13.11 ± 2.89 | 4.61 ± 0.25 | 52.28 ± 0.34 | 52.55 ± 0.15 | 31.46 ± 0.12 | 72.74 ± 0.30 | |
MR = 0.5 | BBP | 19.44 ± 0.25 | 46.25 ± 4.26 | 87.53 ± 3.72 | 81.05 ± 3.72 | 77.70 ± 3.38 | 93.95 ± 1.68 |
MC-dropout | 18.49 ± 0.19 | 2.22 ± 0.15 | 40.79 ± 0.19 | 50.00 ± 0.15 | 23.42 ± 0.17 | 66.10 ± 0.16 | |
ConvCNPs–SDE-Net | 72.21 ± 0.26 | 3.57 ± 0.19 | 55.08 ± 0.28 | 55.35 ± 0.13 | 37.36 ± 0.15 | 72.98 ± 0.22 | |
SDE-Net | 10.33 ± 0.06 | 4.97 ± 0.08 | 50.74 ± 0.26 | 50.95 ± 0.17 | 29.18 ± 0.21 | 72.33 ± 0.13 | |
MR = 0.7 | BBP | 18.39 ± 0.52 | 21.77 ± 4.09 | 77.28 ± 3.11 | 70.66 ± 2.47 | 67.15 ± 2.63 | 87.66 ± 2.29 |
MC-dropout | 14.61 ± 0.18 | 2.76 ± 0.28 | 42.66 ± 0.22 | 50.00 ± 0.13 | 24.26 ± 0.23 | 67.36 ± 0.14 | |
ConvCNPs–SDE-Net | 62.47 ± 0.49 | 2.50 ± 0.06 | 48.70 ± 0.28 | 52.08 ± 0.05 | 31.64 ± 0.14 | 69.60 ± 0.13 | |
SDE-Net | 10.10 ± 0.08 | 4.91 ± 0.28 | 49.86 ± 0.50 | 50.27 ± 0.19 | 27.85 ± 0.31 | 72.05 ± 0.34 | |
MR = 0.9 | BBP | 14.60 ± 0.67 | 1.10 ± 0.40 | 55.31 ± 2.84 | 65.55 ± 1.23 | 54.23 ± 2.43 | 68.97 ± 0.93 |
MC-dropout | 12.01 ± 0.16 | 2.81 ± 0.11 | 43.53 ± 0.06 | 50.00 ± 0.07 | 24.45 ± 0.04 | 67.91 ± 0.16 | |
ConvCNPs–SDE-Net | 36.48 ± 0.43 | 1.29 ± 0.01 | 37.86 ± 0.39 | 50.49 ± 0.05 | 23.48 ± 0.30 | 64.32 ± 0.18 | |
SDE-Net | 10.18 ± 0.06 | 4.91 ± 0.18 | 49.40 ± 0.06 | 50.19 ± 0.12 | 27.02 ± 0.05 | 71.92 ± 0.11 | |
MR = RMR | BBP | 16.37 ± 0.38 | 15.04 ± 5.27 | 70.19 ± 1.22 | 65.72 ± 0.64 | 59.93 ± 0.21 | 83.36 ± 1.87 |
MC-dropout | 14.78 ± 0.08 | 2.31 ± 0.25 | 41.87 ± 0.35 | 50.00 ± 0.15 | 23.72 ± 0.23 | 66.80 ± 0.33 | |
ConvCNPs–SDE-Net | 56.04 ± 4.08 | 1.98 ± 0.24 | 46.79 ± 2.04 | 51.80 ± 0.48 | 30.41 ± 1.90 | 68.45 ± 0.98 | |
SDE-Net | 10.18 ± 0.05 | 4.82 ± 0.26 | 50.01 ± 0.36 | 50.42 ± 0.22 | 28.00 ± 0.28 | 72.03 ± 0.27 |
CIFAR10 with MR | Model | TNR at TPR 95% | AUROC | Detection Accuracy | AUPR Succ | AUPR Err |
---|---|---|---|---|---|---|
MR = 0.1 | BBP | 10.97 ± 0.66 | 58.75 ± 0.62 | 56.19 ± 0.70 | 25.69 ± 0.32 | 85.16 ± 0.77 |
MC-dropout | 10.89 ± 0.60 | 67.19 ± 0.12 | 63.17 ± 0.23 | 61.69 ± 0.13 | 69.60 ± 0.24 | |
ConvCNPs–SDE-Net | 27.47 ± 0.80 | 83.77 ± 0.18 | 77.33 ± 0.26 | 94.88 ± 0.12 | 54.12 ± 0.64 | |
SDE-Net | 21.85 ± 1.60 | 72.51 ± 0.29 | 66.64 ± 0.35 | 44.85 ± 0.58 | 88.73 ± 0.21 | |
MR = 0.3 | BBP | 9.84 ± 0.34 | 56.17 ± 0.24 | 54.57 ± 0.34 | 24.77 ± 0.14 | 84.32 ± 0.67 |
MC-dropout | 6.47 ± 0.34 | 58.22 ± 0.26 | 56.90 ± 0.46 | 32.83 ± 0.66 | 78.92 ± 0.24 | |
ConvCNPs–SDE-Net | 26.96 ± 0.83 | 82.70 ± 0.24 | 76.04 ± 0.37 | 93.83 ± 0.17 | 55.70 ± 0.50 | |
SDE-Net | 8.61 ± 1.24 | 65.01 ± 0.68 | 63.04 ± 0.57 | 22.85 ± 0.88 | 91.82 ± 0.34 | |
MR = 0.5 | BBP | 8.19 ± 0.02 | 55.73 ± 0.68 | 55.17 ± 0.70 | 23.32 ± 0.43 | 83.74 ± 0.02 |
MC-dropout | 6.11 ± 0.14 | 57.18 ± 0.23 | 56.69 ± 0.22 | 24.32 ± 0.26 | 84.06 ± 0.34 | |
ConvCNPs–SDE-Net | 23.55 ± 0.79 | 80.31 ± 0.16 | 73.78 ± 0.33 | 91.32 ± 0.05 | 57.15 ± 0.57 | |
SDE-Net | 5.89 ± 1.22 | 53.64 ± 0.83 | 53.40 ± 0.19 | 13.31 ± 0.32 | 90.45 ± 0.33 | |
MR = 0.7 | BBP | 9.00 ± 0.09 | 53.62 ± 0.66 | 53.73 ± 0.61 | 20.29 ± 1.04 | 84.15 ± 0.25 |
MC-dropout | 5.89 ± 0.78 | 56.51 ± 0.58 | 56.08 ± 0.38 | 18.90 ± 0.22 | 87.23 ± 0.38 | |
ConvCNPs–SDE-Net | 18.24 ± 0.69 | 75.85 ± 0.39 | 69.81 ± 0.40 | 84.37 ± 0.38 | 61.18 ± 1.22 | |
SDE-Net | 5.05 ± 0.98 | 50.28 ± 0.51 | 51.17 ± 0.40 | 10.67 ± 0.44 | 89.87 ± 0.28 | |
MR = 0.9 | BBP | 5.06 ± 0.13 | 53.45 ± 2.20 | 53.78 ± 1.96 | 16.89 ± 1.98 | 86.25 ± 0.14 |
MC-dropout | 5.73 ± 0.78 | 55.93 ± 0.35 | 55.23 ± 0.28 | 14.60 ± 0.75 | 89.49 ± 0.23 | |
ConvCNPs–SDE-Net | 11.53 ± 0.48 | 66.46 ± 0.45 | 62.43 ± 0.48 | 56.21 ± 0.53 | 74.77 ± 0.72 | |
SDE-Net | 4.81 ± 0.88 | 50.93 ± 0.40 | 51.68 ± 0.36 | 11.26 ± 0.68 | 89.86 ± 0.25 | |
MR = RMR | BBP | 8.12 ± 0.17 | 55.30 ± 1.18 | 53.92 ± 0.85 | 19.66 ± 0.15 | 86.11 ± 0.74 |
MC-dropout | 6.66 ± 0.58 | 56.51 ± 0.88 | 55.59 ± 0.83 | 19.04 ± 0.55 | 87.32 ± 0.25 | |
ConvCNPs–SDE-Net | 16.33 ± 1.57 | 75.42 ± 1.58 | 69.77 ± 1.39 | 80.73 ± 3.76 | 66.10 ± 2.00 | |
SDE-Net | 5.34 ± 0.77 | 51.45 ± 0.94 | 51.74 ± 0.89 | 11.69 ± 0.56 | 90.21 ± 0.24 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).