
Minimax Bayesian Neural Networks

Junping Hong and Ercan Engin Kuruoglu
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(4), 340; https://doi.org/10.3390/e27040340
Submission received: 31 December 2024 / Revised: 14 March 2025 / Accepted: 19 March 2025 / Published: 25 March 2025
(This article belongs to the Special Issue Bayesian Networks and Causal Discovery)

Abstract

Robustness is an important issue in deep learning. Bayesian neural networks (BNNs) provide a means of robustness analysis, and the minimax method is a conservative choice in classical Bayesian statistics. Recently, researchers have applied the closed-loop idea to neural networks via the minimax method and proposed closed-loop neural networks. In this paper, we study a more conservative BNN obtained with the minimax method, which formulates a two-player game between a deterministic neural network and a sampling stochastic neural network. From this perspective, we reveal the connection between closed-loop neural networks and BNNs. We test the models on several simple data sets and study their robustness under noise perturbation.

1. Introduction

Nowadays, deep learning, as a data-driven method, has become increasingly popular and has been applied to multiple areas, such as weather forecasting [1] and image classification [2,3]. Most neural networks are trained with supervised learning in an end-to-end framework. Representation learning seeks a good representation of the training data, for example by maximizing mutual information [4] or the coding rate [5].
Although deep learning has been rather successful, robustness issues still haunt the field [6,7]. MacKay [8] discussed the shortcomings of deterministic neural networks and introduced the first framework of Bayesian neural networks (BNNs); Neal [9] proposed Markov chain Monte Carlo (MCMC) methods for implementing BNNs, which are computationally intensive. Dropout is a well-known technique proposed to prevent overfitting [10] and can be seen as an approximation of Bayesian methods [11]. BNNs aim to learn a distribution over neural networks through posterior estimation: they describe the weights with random variables and update the mean and the variance simultaneously [12]. Previous works have shown that BNNs can quantify the uncertainty of neural networks [13], are robust to the choice of prior [14], and are more robust to gradient attacks than deterministic neural networks [15]. For Prior Networks, Malinin and Gales [16] argue that the randomness of deep learning includes model uncertainty, data uncertainty, and distributional uncertainty, and that Prior Networks can be used for out-of-distribution detection. Wang et al. [17] applied MCMC methods to the study of the information bottleneck. In addition, the minimax method is often regarded as a way to make Bayesian methods more robust [18]; it improves robustness at the cost of accuracy because it optimizes for the best outcome under the worst case.
The minimax method has a long history. Previous studies that use a minimax game to analyze the robustness of neural networks rely on fault-tolerant neural networks [19,20,21] and formulate a two-player game between a normal neural network and a faulty one. Closed-loop transcription neural networks [22] design a new two-player game between the decoder and the composition of the encoder and the decoder, with a minimax loss for representation learning.
Inspired by the previous minimax work [22] and classical BNNs, we apply the minimax game to classical BNNs. The aim is to make BNNs more conservative and to reveal the connection between closed-loop neural networks and BNNs. To the best of our knowledge, this is the first time that a classical BNN has been separated into two neural networks through a minimax game. Other minimax works, such as fault-tolerant neural networks [21], study how the prediction performance of a neural network behaves under different levels of faulty nodes or edges, which is similar to dropout techniques. Closed-loop transcription networks [22] use the composition of deterministic neural networks to obtain the minimax game, whereas we use a stochastic neural network instead.
The contribution of this paper is twofold. First, we propose a naturally more conservative variant of the BNN in which the variance level of the weights can be adjusted through the minimax loss, so our framework can serve as a reference for the variance setting of classical BNNs. Second, our framework reveals the connection between closed-loop neural networks and BNNs. In addition, this formulation provides the flexibility to choose a suitable level of randomness, or variance, for the noise part.
The paper is organized as follows: Section 2 introduces the formulation of minimax BNNs, Section 3 presents the experiments and results, and Section 4 concludes with a discussion.

2. Minimax Bayesian Neural Networks

Maximal coding rate reduction (MCR) first appeared as a loss for representation learning in 2020 [5], and closed-loop neural networks subsequently formulated a minimax loss with MCR [22]. Minimax BNNs use the same minimax loss as [22]; the main difference is that our minimax game is between a deterministic neural network f and a random sampling neural network g = f + rξ, where r is the radius of the hypersphere controlling the variance and ξ is random noise. Moreover, if we view the classical BNN as a hypersphere, then f is its center and g is sampled on its equator. The minimax formulation for BNNs is given by
$$\min_{g}\max_{f}\ \tau(f,g) \triangleq \Delta R\big(f(X)\big) + \Delta R\big(g(X)\big) + \Delta R\big(f(X), g(X)\big) = \Delta R(Z) + \Delta R(\hat{Z}) + \Delta R(Z, \hat{Z}),$$
where X denotes the data, f denotes the deterministic neural network, and g denotes the sampling stochastic neural network. ΔR(f(X)) and ΔR(g(X)) denote the compression of data with different labels by f and g through the MCR loss, and ΔR(f(X), g(X)) characterizes the difference in compression by f and g on the same data set; for the rigorous definitions, see [22]. Z = f(X) denotes the final representation produced by f, and Ẑ = g(X) denotes the same for g. Note that the third term of this loss controls the gap between f and g. The current setting sometimes allows this gap to be relatively large, so we also use a log variant, log(1 + ΔR(Z, Ẑ)), to enforce a smaller gap; one example is shown in Appendix A.2.
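To make the game concrete, the following PyTorch sketch draws one realization of the sampling network g = f + rξ by perturbing a copy of the weights of f. The helper name sample_g and the noise scale are illustrative assumptions rather than the released implementation; Section 3 reports ξ sampled from N(0, 0.01), which is read here as a variance of 0.01.

```python
import copy
import torch

def sample_g(f: torch.nn.Module, r: float, noise_std: float = 0.1) -> torch.nn.Module:
    """Draw one realization of the stochastic network g = f + r * xi.

    f         : the deterministic 'center' network.
    r         : radius of the hypersphere controlling the variance level.
    noise_std : standard deviation of the raw noise xi (0.1 assumes that
                N(0, 0.01) in Section 3 denotes a variance of 0.01).
    """
    g = copy.deepcopy(f)                           # g shares f's architecture and weights
    with torch.no_grad():
        for p in g.parameters():
            xi = noise_std * torch.randn_like(p)   # xi ~ N(0, noise_std^2)
            p.add_(r * xi)                         # perturb the copy: w_g = w_f + r * xi
    return g
```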
Because the MCR loss might not be very intuitive, we also provide an equivalent formulation in terms of supervised learning. Since the main focus is to understand the formulation of minimax BNNs and to reveal their connection to closed-loop neural networks, we run our experiments only with the representation learning formulation.
$$\min_{f,\,g}\ \tau(f,g) \triangleq \mathrm{loss}\big(f(X)\big) + \mathrm{loss}\big(g(X)\big) \quad \text{s.t.} \quad \big|\mathrm{pre}\big(f(X)\big) - \mathrm{pre}\big(g(X)\big)\big| = c,$$
where X, f, and g are as in the previous case. loss(f(X)) and loss(g(X)) represent the losses for f and g, such as the cross-entropy for classification. pre(f(X)) and pre(g(X)) are the final predictions of f and g. c is a constant that denotes the gap between f and g; for example, c could be 100 for 1000 predictions, which means we allow the maximal difference between f and g to be 10 percent. Note that we can use the Lagrange method to transform this into a min-max or max-min formulation [21].
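As a minimal sketch of this supervised variant, assuming a classification setting with cross-entropy loss, the function below evaluates loss(f(X)) + loss(g(X)) and counts the predictions on which f and g disagree, which a training loop could compare against the allowed gap c. The function name and its role in the optimization loop are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def supervised_minimax_objective(f, g, x, y):
    """Return the combined supervised loss and the prediction gap.

    loss(f(X)) + loss(g(X)) is the objective to minimize; the returned gap
    (number of inputs on which f and g disagree) is what the constraint
    restricts, e.g. c = 100 disagreements out of 1000 predictions.
    """
    logits_f, logits_g = f(x), g(x)
    loss = F.cross_entropy(logits_f, y) + F.cross_entropy(logits_g, y)
    pre_f = logits_f.argmax(dim=1)
    pre_g = logits_g.argmax(dim=1)
    gap = (pre_f != pre_g).sum().item()   # number of differing predictions
    return loss, gap
```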
Compared with a classical BNN, which represents only one realization at a time, the minimax Bayesian framework trains two neural networks: one is the mean, or center point, and the other is sampled on the equator. Because minimax BNNs consider both the best case and the worst case, they are more robust than classical BNNs, at a cost in performance due to the perturbation in g. Another difference is that in classical BNNs the variance of the weights is given in advance and updated throughout the training process, whereas the variance level of minimax BNNs is set by the chosen gap and updated by the sampling. For closed-loop neural networks with the same minimax loss for the representation as in [22], the two deterministic neural networks are f and f∘g∘f. The main issue is that f∘g∘f cannot change as quickly as g = f + rξ, for which we only need to find a suitable level of r. In their formulation, multiple activation functions must be added to support image generation, and hence a batch normalization (BN) layer is needed to accelerate the process [23].

3. Experiments and Results

The data sets include MNIST [24], Fashion MNIST (FMNIST) [25], and CIFAR-10 [26]. For MNIST and FMNIST, we use the same convolutional neural network (CNN) [27] as [22]. The optimizer is Adam(0.5, 0.999) with a learning rate of 0.001. f has only one kind of activation function, LeakyReLU; f is initialized with N(0, 0.02), and ξ is sampled from N(0, 0.01); this might change if we are required to update the shape of ξ with Bayes by Backprop [13]. Normally, the zone for the golden search is from 0 to 100, and we use grid search to validate its correctness. After training, we map the data to the subspace and use the k-nearest neighbors (KNN) method [28] to predict the labels, implemented with the scikit-learn package [29]. In addition, the suitable radius r ranges from 0.2 to 0.5 for the log case on MNIST data and decreases across the training process; r is usually about 3 to 6 without the change to the third term. We build the model with PyTorch (https://pytorch.org/, accessed on 13 March 2025) [30], and the code is publicly available at https://github.com/Jacob-Hong17/MinMax-BNN, accessed on 30 December 2024.
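The golden search over r can be sketched as a standard golden-section search on the zone [0, 100], assuming the loss is unimodal in r (the concavity issue is discussed in Appendix A.1). The objective callable, which should return the representation loss of f versus f + rξ for a given r, is an assumed interface.

```python
import math

def golden_section_search(objective, lo=0.0, hi=100.0, tol=1e-2):
    """Locate the extremum of a unimodal 1-D objective over [lo, hi].

    As written it returns an (approximate) minimizer; pass the negated
    objective if the suitable r corresponds to a maximum of the curve.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0          # 1/phi ~= 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    while abs(b - a) > tol:
        if fc < fd:                                  # extremum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:                                        # extremum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    return 0.5 * (a + b)
```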

3.1. Main Results

In Table 1, the results of minimax BNNs are slightly worse than those of the closed-loop work in most cases, because our formulation is also minimax and the sampling component brings additional noise into the training process. However, our formulation is more robust than the closed-loop work because at every sampling step we can obtain a suitable r, whereas f∘g∘f cannot update as quickly as g = f + rξ. The closed-loop idea is of great importance in control theory; here, we reveal the connection between the closed-loop work and minimax BNNs. Apart from these findings, we also study the meaning of r for the MCR loss and some issues in searching for a suitable r in Appendix A; see Figure A1, Figure A2, Figure A3 and Figure A4.
In Table 2, the best sampling result is obtained for r = 0.2. This is because our formulation is minimax, which considers both the best case and the worst case; it is also the reason why minimax BNNs can obtain good, rather than the best, results. However, this formulation is easy to implement, and the sampling results after training can serve as a reference for classical BNNs: for these results, a classical BNN should use a variance hyperparameter below 0.2.
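The sampling statistics in Table 2 can be reproduced in spirit with a short loop that, for a fixed r, draws several realizations of g and summarizes their accuracies. It reuses the sample_g helper sketched in Section 2, and the evaluate callable (in our experiments, the KNN accuracy on the learned representation) is an assumed interface.

```python
import statistics

def sampling_statistics(f, evaluate, r, n_samples=20, noise_std=0.1):
    """Summarize the accuracy of n_samples realizations g = f + r * xi,
    mirroring the Max/Min/Mean/Variance columns of Table 2."""
    accs = [evaluate(sample_g(f, r, noise_std)) for _ in range(n_samples)]
    return {
        "max": max(accs),
        "min": min(accs),
        "mean": statistics.mean(accs),
        "variance": statistics.variance(accs),   # sample variance over the draws
    }
```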

3.2. Noise Perturbation

Furthermore, we test how r changes when noise is added to the data. In this part, we set the corruption ratio to 0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 with Gaussian random noise and normalization of the data, and find the corresponding radius; see Figure 1. As we can see, the radius normally decreases and then increases as the noise ratio grows from 0 to 1. The reason why r drops at the beginning is that adding small noise is equivalent to a small perturbation from the Taylor expansion perspective. After a turning point, r usually increases: as there is more and more noise in the data, a larger perturbation of g is required to reach the fixed gap requirement based on the MCR loss.
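A minimal sketch of the corruption step follows, under the assumption that the corruption ratio linearly mixes the clean data with unit Gaussian noise before the data are re-normalized; the exact mixing rule used in the experiments may differ.

```python
import torch

def corrupt(x: torch.Tensor, ratio: float) -> torch.Tensor:
    """Add Gaussian noise at a given corruption ratio and re-normalize.

    ratio = 0 returns a standardized copy of the clean data; larger ratios
    move each sample toward pure Gaussian noise.
    """
    noisy = (1.0 - ratio) * x + ratio * torch.randn_like(x)
    flat = noisy.flatten(start_dim=1)              # one row per sample
    mean = flat.mean(dim=1, keepdim=True)
    std = flat.std(dim=1, keepdim=True) + 1e-8
    return ((flat - mean) / std).view_as(x)        # per-sample standardization
```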

3.3. Dataset Similarity

Our formulation for out-of-distribution detection is very similar to the one in [31] because A(X + ξ) + b = (A + ξ)X + b. Adding noise to the neural network is usually more costly, but our formulation offers the flexibility to adjust the noise level from small to large, in contrast to classical BNNs or previous work. In Table 3 and Table 4, we generate the corresponding radius r for different data sets to evaluate their similarity and obtain results similar to those in [31,32]. These clearly show that similar data sets have a lower r, or compression volume, based on the trained neural network f, and that non-Gaussian noise such as Cauchy noise has a larger volume than the Gaussian case.
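The numbers in Table 3 and Table 4 can be thought of as the output of a loop like the following, which reuses the golden_section_search sketch from the beginning of Section 3 to search for r repeatedly on each data set. The make_objective helper, which binds the trained f and a data set to the loss as a function of r, is a hypothetical interface, and whether the 20 samples correspond to repeated searches or repeated noise draws is an assumption here.

```python
import statistics

def radius_statistics(make_objective, datasets, n_samples=20):
    """For each data set, search for r n_samples times and summarize the
    results, as in the Max/Min/Mean/Variance columns of Tables 3 and 4."""
    rows = {}
    for name, data in datasets.items():
        objective = make_objective(data)           # loss as a function of r on this data
        rs = [golden_section_search(objective) for _ in range(n_samples)]
        rows[name] = {
            "max": max(rs),
            "min": min(rs),
            "mean": statistics.mean(rs),
            "variance": statistics.variance(rs),
        }
    return rows
```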

4. Discussion

In this paper, we apply the minimax game to classical BNNs and obtain a more conservative variant of the BNN. Furthermore, we reveal the connection between closed-loop neural networks and our minimax BNNs and point out the limitation that closed-loop neural networks cannot adapt to the loss requirement as quickly as minimax BNNs. Last but not least, this framework provides a reference for the variance setting of classical BNNs and is more flexible in adjusting the variance level of well-trained models.
One limitation of this paper is that the distribution of the random sampling neural network is always Gaussian; minimax BNNs with non-Gaussian distributions might be worth exploring.

Author Contributions

Conceptualization, J.H.; Methodology, J.H.; Experiments, J.H.; Supervision, E.E.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Tsinghua University SIGS Start-up fund under Grant QD2022024C, Shenzhen Ubiquitous Data Enabling Key Lab under Grant ZDSYS20220527171406015, and Shenzhen Science and Technology Innovation Commission under Grant JCYJ20220530143002005.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

Here, we present how the MCR loss changes as the radius changes in different cases. Ideally, we believe that the relationship between the loss and the radius should be concave. However, if the representation dimension is not sufficient, the function is not concave; see Figure A1. Hence, we need to adjust the search area to obtain a suitable radius. Next, we show how models with and without sufficient dimensions behave under noise perturbation (see Figure A2); models without sufficient dimensions are not robust compared with the others. Another point is that a trained model with a BN layer might perform poorly after a large number of training epochs, in contrast to the model without a BN layer, because the activation function of the neural network we use is only a simple ReLU or LeakyReLU, which is not well suited to the design of BN.
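The radius grid behind the Figure A1 curves can be sketched as follows, reading the intervals off the caption: 100 points spaced by 0.01, then 100 by 0.1, then 100 by 1. The starting point and the objective callable are assumptions.

```python
def radius_grid(start=0.0):
    """Radius values matching the Figure A1 description: 100 points with
    step 0.01, then 100 with step 0.1, then 100 with step 1."""
    grid, r = [], start
    for step, count in [(0.01, 100), (0.1, 100), (1.0, 100)]:
        for _ in range(count):
            r += step
            grid.append(r)
    return grid

def loss_curve(objective, grid=None):
    """Evaluate the representation loss tau(f, f + r * xi) on the grid so
    that its concavity (or lack of it) can be inspected."""
    grid = radius_grid() if grid is None else grid
    return [(r, objective(r)) for r in grid]
```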
Figure A1. Representation loss for different radius values or perturbation levels, generated by grid search. The interval of the first 100 points in (b,c) is 0.01, the interval of the next 100 points is 0.1, and the interval of the last 100 points is 1. (a) Normal case, concave function. (b) Not concave, without sufficient dimensions, log. (c) Not concave, without sufficient dimensions.
Figure A2. Perturbation performance of different models on FMNIST (20 samples, 1000 epochs); the bias is set to 0 for a fair comparison. (a) CNN without bias or BN, 128 dim. (b) CNN without bias or BN, 11 dim. (c) CNN with bias, 128 dim. (d) CNN with BN, 128 dim.

Appendix A.2

To validate the effectiveness of the golden search, we compare its results with grid search; see Figure A3. For the MCR loss without the log term, r normally ranges from 2 to 5 for MNIST data and becomes smaller during training. For the MCR loss with the log case, r is about 0.1 to 0.5 for MNIST, which also depends on how well the model is trained. Last but not least, adding bias or BN affects the search for the radius if the search zone is not set properly. In Figure A4, when bias or BN is added, the radius gets stuck in the far region, and we believe this is mainly because BN cannot perfectly transform a distribution. Setting a suitable zone or resampling can deal with this.
Figure A3. Golden search validated by grid search: (a) golden search, CNN with 128 dim; (b) grid search, CNN with 128 dim; (c) golden search, CNN with 11 dim; (d) grid search, CNN with 11 dim.
Figure A4. Histogram of the search for r using the golden search on MNIST (log case, 1000 samples from 0 to 100). (a) CNN without bias, 128 dim, log. (b) CNN with bias, 128 dim, log. (c) CNN with BN, 128 dim, log.

References

  1. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Pangu-weather: A 3D high-resolution model for fast and accurate global weather forecast. arXiv 2022, arXiv:2211.02556. [Google Scholar]
  2. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar]
  3. Kuruoglu, E.E.; Taylor, A.S. Using Annotations for Summarizing a Document Image and Itemizing the Summary Based on Similar Annotations. US Patent 7,712,028, 4 May 2010. [Google Scholar]
  4. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. arXiv 2018, arXiv:1808.06670. [Google Scholar]
  5. Yu, Y.; Chan, K.H.R.; You, C.; Song, C.; Ma, Y. Learning diverse and discriminative representations via the principle of maximal coding rate reduction. Adv. Neural Inf. Process. Syst. 2020, 33, 9422–9434. [Google Scholar]
  6. Liu, M.; Liu, S.; Su, H.; Cao, K.; Zhu, J. Analyzing the noise robustness of deep neural networks. In Proceedings of the 2018 IEEE Conference on Visual Analytics Science and Technology (VAST), Berlin, Germany, 21–26 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 60–71. [Google Scholar]
  7. Fawzi, A.; Moosavi-Dezfooli, S.M.; Frossard, P. The robustness of deep networks: A geometrical perspective. IEEE Signal Process. Mag. 2017, 34, 50–62. [Google Scholar]
  8. MacKay, D.J. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992, 4, 448–472. [Google Scholar]
  9. Neal, R.M. Bayesian Learning for Neural Networks; Springer Science & Business Media: Berlin, Germany, 2012; Volume 118. [Google Scholar]
  10. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  11. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 1050–1059. [Google Scholar]
  12. Jospin, L.V.; Laga, H.; Boussaid, F.; Buntine, W.; Bennamoun, M. Hands-on Bayesian neural networks—A tutorial for deep learning users. IEEE Comput. Intell. Mag. 2022, 17, 29–48. [Google Scholar]
  13. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural network. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 1613–1622. [Google Scholar]
  14. Izmailov, P.; Vikram, S.; Hoffman, M.D.; Wilson, A.G.G. What are Bayesian neural network posteriors really like? In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 4629–4640. [Google Scholar]
  15. Carbone, G.; Wicker, M.; Laurenti, L.; Patane, A.; Bortolussi, L.; Sanguinetti, G. Robustness of Bayesian neural networks to gradient-based attacks. Adv. Neural Inf. Process. Syst. 2020, 33, 15602–15613. [Google Scholar]
  16. Malinin, A.; Gales, M. Predictive uncertainty estimation via prior networks. Adv. Neural Inf. Process. Syst. 2018, 31, 7047–7058. [Google Scholar]
  17. Wang, Z.; Huang, S.L.; Kuruoglu, E.E.; Sun, J.; Chen, X.; Zheng, Y. PAC-Bayes information bottleneck. arXiv 2021, arXiv:2109.14509. [Google Scholar]
  18. Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media: Berlin, Germany, 2013. [Google Scholar]
  19. Neti, C.; Schneider, M.H.; Young, E.D. Maximally fault tolerant neural networks. IEEE Trans. Neural Netw. 1992, 3, 14–23. [Google Scholar] [PubMed]
  20. Deodhare, D.; Vidyasagar, M.; Keerthi, S.S. Synthesis of fault-tolerant feedforward neural networks using minimax optimization. IEEE Trans. Neural Netw. 1998, 9, 891–900. [Google Scholar]
  21. Duddu, V.; Rao, D.V.; Balas, V.E. Adversarial fault tolerant training for deep neural networks. arXiv 2019, arXiv:1907.03103. [Google Scholar]
  22. Dai, X.; Tong, S.; Li, M.; Wu, Z.; Psenka, M.; Chan, K.H.R.; Zhai, P.; Yu, Y.; Yuan, X.; Shum, H.Y.; et al. Ctrl: Closed-loop transcription to an LDR via minimaxing rate reduction. Entropy 2022, 24, 456. [Google Scholar] [CrossRef] [PubMed]
  23. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
  24. LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 30 December 2024).
  25. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
  26. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  27. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 1995, 3361, 1995. [Google Scholar]
  28. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN model-based approach in classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE: OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003, Catania, Sicily, Italy, 3–7 November 2003; Proceedings; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996. [Google Scholar]
  29. Kramer, O. Scikit-learn. Mach. Learn. Evol. Strateg. 2016, 45–53. [Google Scholar] [CrossRef]
  30. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  31. Liang, S.; Li, Y.; Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv 2017, arXiv:1706.02690. [Google Scholar]
  32. Mukhoti, J.; Kirsch, A.; van Amersfoort, J.; Torr, P.H.; Gal, Y. Deep deterministic uncertainty: A new simple baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17 June–24 June 2023; pp. 24384–24394. [Google Scholar]
Figure 1. Perturbation impact on different data sets: (a) MNIST, (b) FMNIST.
Table 1. Comparison of accuracies of minimax BNNs and closed-loop NNs.

Models        | f (MNIST) | Closed-Loop | f (FMNIST) | Closed-Loop
test 1        | 96.28%    | 96.57%      | 85.82%     | 86.09%
test 2        | 96.43%    | 96.40%      | 85.79%     | 86.24%
test 3 (log)  | 96.70%    | 96.82%      | 86.21%     | 86.81%
test 4 (log)  | 96.73%    | 97.25%      | 86.09%     | 86.71%
Table 2. Accuracy of the sampling neural network g on MNIST.

r (Test 4, 20 Samples) | Max    | Min    | Mean   | Variance
0.1                    | 97.01% | 96.93% | 96.97% | 5.0 × 10^-8
0.2                    | 97.07% | 96.90% | 96.96% | 1.5 × 10^-7
0.5                    | 97.03% | 96.84% | 96.93% | 2.6 × 10^-7
1                      | 96.95% | 96.65% | 96.81% | 7.1 × 10^-7
2                      | 96.38% | 95.18% | 95.82% | 9.8 × 10^-6
3                      | 92.21% | 78.30% | 84.78% | 1.7 × 10^-3
4                      | 62.99% | 26.33% | 40.83% | 9.7 × 10^-3
6                      | 17.88% | 10.45% | 13.33% | 5.0 × 10^-4
8                      | 14.54% |  8.22% | 10.84% | 1.8 × 10^-4
10                     | 11.93% |  7.91% | 10.29% | 1.2 × 10^-4
Table 3. Different r values for other data sets, for a model trained on MNIST.

r (log, 20 Samples)   | Max   | Min   | Mean  | Variance
MNIST                 | 0.544 | 0.481 | 0.507 | 3.1 × 10^-4
FMNIST                | 1.768 | 1.186 | 1.473 | 1.8 × 10^-2
CIFAR-10 (channel 1)  | 3.134 | 1.999 | 2.639 | 1.0 × 10^-1
Gaussian              | 3.751 | 1.778 | 2.661 | 3.1 × 10^-1
Laplace               | 5.637 | 2.902 | 3.800 | 4.9 × 10^-1
Cauchy                | 5.807 | 4.188 | 4.942 | 1.9 × 10^-1
Table 4. Different r values for other data sets, for a model trained on FMNIST.

r (log, 20 Samples)   | Max   | Min   | Mean  | Variance
FMNIST                | 0.591 | 0.503 | 0.546 | 5.4 × 10^-4
MNIST                 | 1.904 | 1.479 | 1.671 | 1.6 × 10^-2
CIFAR-10 (channel 1)  | 2.599 | 2.155 | 2.458 | 1.2 × 10^-2
Gaussian              | 3.220 | 2.189 | 2.560 | 7.6 × 10^-2
Laplace               | 4.129 | 2.852 | 3.487 | 9.4 × 10^-2
Cauchy                | 5.851 | 4.121 | 4.723 | 1.6 × 10^-1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
