# A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification


## Abstract


## 1. Introduction

## 2. Mathematical Notation and Dataset Aggregation

**Definition 1.**

**Definition 2.**

## 3. ILD Algorithm Mathematical Framework

#### 3.1. True Positive, True Negative, False Positive, False Negative

#### 3.2. Accuracy

**Theorem 1.**

**Proof.**

#### 3.3. Sensitivity and Specificity

#### 3.4. ROC Curve

#### 3.5. Perfect Bucket and Perfect Dataset

**Definition 3.**

**Definition 4.**

**Definition 5.**

**Definition 6.**

## 4. The Intrinsic Limit Determination Algorithm

#### 4.1. Effect of One Single Flip

#### 4.2. ILD Theorem

**Theorem 2 (ILD Theorem).**

**Proof.**

**Algorithm 1.** Construction of the curve $\tilde{\mathcal{C}}$.
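The steps of Algorithm 1 are not reproduced in this text, so the following is only a minimal sketch of one way such a curve could be built from an aggregated dataset. It assumes that each bucket is described by its class counts $(m_0, m_1)$, that every bucket starts with prediction 0, and that predictions are flipped to 1 one bucket at a time in order of decreasing fraction of class-1 observations (the ordering suggested by the swap argument illustrated in Figure 4). The function names `ild_curve` and `auc_trapezoid` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def ild_curve(buckets: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR) obtained by flipping buckets one at a time.

    `buckets` holds one (m0, m1) pair per non-empty bucket: the number of
    class-0 and class-1 observations. Assumption: flips go from prediction 0
    to prediction 1 in order of decreasing class-1 fraction.
    """
    total_neg = sum(m0 for m0, _ in buckets)
    total_pos = sum(m1 for _, m1 in buckets)
    if total_neg == 0 or total_pos == 0:
        raise ValueError("both classes must be present in the dataset")

    # m1 / (m0 + m1) is monotone in the ratio m1 / m0 and avoids dividing by
    # zero when a bucket contains no class-0 observations.
    ordered = sorted(buckets, key=lambda b: b[1] / (b[0] + b[1]), reverse=True)

    points = [(0.0, 0.0)]  # all predictions equal to 0
    false_pos = true_pos = 0
    for m0, m1 in ordered:
        # Flipping this bucket to prediction 1 adds m0 false positives
        # and m1 true positives.
        false_pos += m0
        true_pos += m1
        points.append((false_pos / total_neg, true_pos / total_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Illustrative counts only (not taken from the Framingham dataset).
example_buckets = [(1, 1), (0, 1), (1, 0), (1, 2)]
example_points = ild_curve(example_buckets)
example_auc = auc_trapezoid(example_points)
```

Sorting by the class-1 fraction makes the slopes of consecutive segments non-increasing, so the resulting piecewise-linear curve is concave. Under the stated assumptions, no other flipping order on the same buckets yields a larger area.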

#### 4.3. Handling Missing Values

## 5. Application of the ILD Algorithm to the Framingham Heart Study Dataset

- age: $0$ if age $< 40$ years, $1$ if $40 \le$ age $< 60$ years, $2$ if age $\ge 60$ years;
- total cholesterol: $0$ if total cholesterol $< 200$ mg/dL, $1$ if $200 \le$ total cholesterol $< 240$ mg/dL, $2$ if total cholesterol $\ge 240$ mg/dL;
- SBP: $0$ if SBP $< 120$ mmHg, $1$ if SBP $\ge 120$ mmHg.
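The three discretization rules above can be sketched in code as follows. This is an illustrative sketch only: the function names and the argument order are assumptions, and the thresholds are copied from the list above.

```python
from typing import Tuple


def discretize_age(age_years: float) -> int:
    """0 if age < 40, 1 if 40 <= age < 60, 2 if age >= 60 (years)."""
    if age_years < 40:
        return 0
    if age_years < 60:
        return 1
    return 2


def discretize_total_cholesterol(chol_mg_dl: float) -> int:
    """0 if < 200, 1 if 200 <= value < 240, 2 if >= 240 (mg/dL)."""
    if chol_mg_dl < 200:
        return 0
    if chol_mg_dl < 240:
        return 1
    return 2


def discretize_sbp(sbp_mmhg: float) -> int:
    """0 if systolic blood pressure < 120 mmHg, 1 otherwise."""
    return 0 if sbp_mmhg < 120 else 1


def discretize_record(
    age_years: float, chol_mg_dl: float, sbp_mmhg: float
) -> Tuple[int, int, int]:
    """Map one observation to its tuple of categorical feature values."""
    return (
        discretize_age(age_years),
        discretize_total_cholesterol(chol_mg_dl),
        discretize_sbp(sbp_mmhg),
    )
```

Each distinct tuple returned by `discretize_record` identifies one bucket of the aggregated dataset, so these three features alone produce at most $3 \times 3 \times 2 = 18$ buckets.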

## 6. Conclusions

- to determine the prediction power (namely, the BE) of a specific set of categorical features,
- to decide when to stop searching for better models, and
- to decide if it is necessary to enrich the dataset.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A

**Definition A1.**

**Theorem A1.**

**Proof.**

#### Interpretation of the Perfection Index $I_p$


**Figure 1.** An intuitive representation of the dataset aggregation step for a dataset with two binary features $F_1$ and $F_2$. Observations with, for example, $F_1=0$ and $F_2=0$ will be in bucket 1 of the aggregated dataset $B$, observations with $F_1=0$ and $F_2=1$ will be in bucket 2, and so on.

**Figure 3.** Two examples of ROC curves obtained from random flipping using the dataset as described in the text.

**Figure 4.** Visual explanation for the ILD Theorem. Panel (**A**): two consecutive segments after flipping components $j$ and then $j+1$; Panel (**B**): two consecutive segments after flipping components $j+1$ and then $j$; Panel (**C**): parallelogram representing the difference between the area under the segments in panel (**A**) and the area under the segments in panel (**B**).

**Figure 5.**Comparison of the performance of the ILD algorithm (red) and Naïve Bayes classifier (sblue) implemented on categorical features based on one single training and validation split using the dataset as described in the text.

**Figure 6.** Comparison between the performance of the ILD algorithm (red) and the Naïve Bayes classifier (blue) implemented on categorical features, based on 100 different training and validation splits. (Top panel): AUC; (bottom panel): difference between the AUC provided by the ILD algorithm and the Naïve Bayes classifier.

| Bucket | Feature 1 | Feature 2 | Class 0 | Class 1 |
|---|---|---|---|---|
| 1 | $F_1^{[1]}=0$ | $F_2^{[1]}=0$ | $m_0^{[1]}$ | $m_1^{[1]}$ |
| 2 | $F_1^{[2]}=0$ | $F_2^{[2]}=1$ | $m_0^{[2]}$ | $m_1^{[2]}$ |
| 3 | $F_1^{[3]}=1$ | $F_2^{[3]}=0$ | $m_0^{[3]}$ | $m_1^{[3]}$ |
| 4 | $F_1^{[4]}=1$ | $F_2^{[4]}=1$ | $m_0^{[4]}$ | $m_1^{[4]}$ |
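The aggregation summarized in the table above can be sketched in a few lines of code. The sketch assumes that observations arrive as tuples of categorical feature values with binary labels in $\{0, 1\}$; the function names `aggregate_into_buckets` and `empirical_bayes_error` are hypothetical. The second function uses the standard fact that a classifier that sees only these features must assign one prediction per bucket, so on the given sample it misclassifies at least $\min(m_0^{[i]}, m_1^{[i]})$ observations in bucket $i$.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def aggregate_into_buckets(
    features: Sequence[Sequence[Hashable]], labels: Sequence[int]
) -> Dict[Tuple[Hashable, ...], List[int]]:
    """Group observations by feature combination.

    Returns a mapping {feature_tuple: [m0, m1]}, where m0 and m1 count the
    class-0 and class-1 observations that share that feature combination.
    """
    buckets: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(lambda: [0, 0])
    for x, y in zip(features, labels):
        if y not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        buckets[tuple(x)][y] += 1
    return dict(buckets)


def empirical_bayes_error(buckets: Dict[Tuple[Hashable, ...], List[int]]) -> float:
    """Smallest misclassification rate achievable on the sample.

    Each bucket contributes its minority count, since any single prediction
    for the bucket misclassifies at least min(m0, m1) observations.
    """
    total = sum(m0 + m1 for m0, m1 in buckets.values())
    return sum(min(m0, m1) for m0, m1 in buckets.values()) / total


# Illustrative data with two binary features (not from the paper).
toy_features = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)]
toy_labels = [0, 1, 1, 0, 1, 1, 0]
toy_buckets = aggregate_into_buckets(toy_features, toy_labels)
toy_error = empirical_bayes_error(toy_buckets)
```

In this toy example the four feature combinations correspond to buckets 1 to 4 of the table, and only feature combinations that actually occur in the data produce a bucket.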

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Michelucci, U.; Sperti, M.; Piga, D.; Venturini, F.; Deriu, M.A.
A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification. *Algorithms* **2021**, *14*, 301.
https://doi.org/10.3390/a14110301
