# M-ary Rank Classifier Combination: A Binary Linear Programming Problem


## Abstract


## 1. Introduction

## 2. Problem Statement and Model

## 3. Conditional Independence Properties

**Example 1.** Consider a binary classification problem (with equiprobable classes ${C}_{1}$ and ${C}_{2}$) and two classifiers ${\mathcal{K}}_{1}$ and ${\mathcal{K}}_{2}$, with outputs ${\mathbf{u}}_{1}$ and ${\mathbf{u}}_{2}$, whose performances are similar, i.e., their probabilities of correct classification ${\alpha}_{1}$ and ${\alpha}_{2}$ are equal:
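As a numeric illustration of this setting, the Bayesian fusion of two conditionally independent classifiers with equal accuracy $\alpha$ and equiprobable classes can be sketched as follows (the function names are ours, not the paper's notation):

```python
# Posterior probability of the true class when fusing two conditionally
# independent binary classifiers with equal accuracy alpha, under
# equiprobable classes C1 and C2 (illustrative sketch, not the paper's code).

def posterior_agree(alpha: float) -> float:
    """P(C1 | both classifiers vote C1), by Bayes' rule with
    conditional independence: alpha^2 / (alpha^2 + (1 - alpha)^2)."""
    return alpha**2 / (alpha**2 + (1 - alpha)**2)

def posterior_disagree(alpha: float) -> float:
    """P(C1 | classifier 1 votes C1, classifier 2 votes C2): the two
    votes cancel and the posterior falls back to the prior 1/2."""
    num = alpha * (1 - alpha)
    return num / (num + (1 - alpha) * alpha)

for a in (0.6, 0.8, 0.95):
    print(f"alpha={a}: agree -> {posterior_agree(a):.4f}, "
          f"disagree -> {posterior_disagree(a):.4f}")
```

Note that as $\alpha \to 1$ (the regime of Example 2), the agreement posterior tends to 1: whenever two near-perfect independent classifiers agree, the combined decision is almost surely correct.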

**Example 2.** Consider again the binary classification case introduced in Example 1 and assume that the classifiers are very efficient: ${\alpha}_{1}={\alpha}_{2}\approx 1$. Then

## 4. Rank Class Combination Problem

#### 4.1. Rank-Order Statistic Model

#### 4.2. Total Order Ranking with Disagreement Distance

#### 4.3. Total Order Ranking with Condorcet Distance

**Example 3.** The problem selected to illustrate our theory is that of combining four classifiers for recognizing the handwritten digits 0 to 9. Binary images from the MNIST database are used [33]. The four classifiers are tested on a sample, and the proposed rankings are collected in Table 2.
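For intuition, a simple Borda-style aggregation of the ranks in Table 2 can serve as a baseline; this is only a sanity check on our side, not the binary linear programming combination developed in Section 4.

```python
# Borda-style baseline aggregation of the four classifier rankings of
# Example 3 (ranks taken from Table 2; rank 1 = most likely digit).
# This is a sanity-check baseline, not the paper's LP combination rule.

ranks = {  # digit: [K1, K2, K3, K4]
    0: [1, 4, 3, 10], 1: [2, 2, 1, 2],  2: [3, 1, 2, 1],
    3: [4, 6, 4, 3],  4: [5, 5, 7, 5],  5: [6, 3, 6, 4],
    6: [7, 8, 5, 6],  7: [8, 7, 8, 9],  8: [9, 10, 10, 8],
    9: [10, 9, 9, 7],
}

# A lower total rank means a stronger consensus that the digit comes first.
totals = {d: sum(r) for d, r in ranks.items()}
order = sorted(totals, key=totals.get)  # digits, best consensus first
print(order)
```

The Borda totals leave digits 1 and 2 tied at the top, whereas both distance-based combinations in Table 2 rank digit 2 first: the LP formulation resolves such ties through the chosen disagreement or Condorcet distance.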

## 5. Classifier Ensemble Information Measure

#### 5.1. Disagreement Distance

**Theorem 1.** Let ${\left\{{\mathcal{K}}_{i}\right\}}_{i=1}^{M}$ be an ensemble of conditionally independent classifiers voting on $K$ classes. Then the interval of variation of the conjunction coefficient ${I}_{d}$ is $[0,1]$.

#### 5.2. Condorcet Distance

**Theorem 2.** Let ${\left\{{\mathcal{K}}_{i}\right\}}_{i=1}^{M}$ be an ensemble of conditionally independent classifiers voting on $K$ classes. Then the interval of variation of the conjunction coefficient ${I}_{C}$ defined by (30) is

## 6. Experiments

#### 6.1. The Detection of Cervical Cancer

#### 6.2. The Dataset

#### 6.3. Experimental Protocol

## 7. Conclusions and Future Research

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Conjunction Coefficient Extreme Values for Disagreement Metric

## Appendix B. Conjunction Coefficient Extreme Values for the Condorcet Metric

**The case of M being even.**

**The case of M being odd.** Let $M=2m+1$. In the case of maximum disagreement, ${\delta}_{ij}-{\delta}_{ji}=\pm 1$ and ${\delta}_{ij}{\delta}_{ji}=m(m+1)$ for every pair $i<j$. It follows that ${\sum}_{i<j}{\left({\delta}_{ij}-{\delta}_{ji}\right)}^{2}={\sum}_{i<j}1=\frac{K(K-1)}{2}$. Moreover,
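The per-pair identities for odd $M$ can be checked mechanically. The sketch below (ours, assuming every classifier ranks every pair, so that ${\delta}_{ij}+{\delta}_{ji}=M$) verifies them for small odd $M$:

```python
# Check of the odd-M identities of Appendix B: with M = 2m+1 voters on a
# pair (i, j), maximal disagreement means the closest possible split
# delta_ij = m+1, delta_ji = m (assuming delta_ij + delta_ji = M).

for m in range(1, 6):
    M = 2 * m + 1
    d_ij, d_ji = m + 1, m
    assert d_ij + d_ji == M            # every classifier votes on the pair
    assert abs(d_ij - d_ji) == 1       # delta_ij - delta_ji = +/- 1
    assert d_ij * d_ji == m * (m + 1)  # delta_ij * delta_ji = m(m+1)
print("odd-M identities hold for m = 1..5")
```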

## References

1. Schapire, R.E. Using output codes to boost multiclass learning problems. In Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; pp. 313–321.
2. Woźniak, M.; Graña, M.; Corchado, E. A survey of multiple classifier systems as hybrid systems. Special Issue on Information Fusion in Hybrid Intelligent Fusion Systems. Inf. Fusion **2014**, 16, 3–17.
3. Oza, N.; Tumer, K. Classifier ensembles: Select real-world applications. Inf. Fusion **2008**, 9, 4–20.
4. Han, M.; Zhu, X.; Yao, W. Remote sensing image classification based on neural network ensemble algorithm. Neurocomputing **2012**, 78, 133–138.
5. Raj Kumar, P.A.; Selvakumar, S. Distributed denial of service attack detection using an ensemble of neural classifiers. Comput. Commun. **2011**, 34, 1328–1341.
6. Bolton, R.J.; Hand, D.J. Statistical fraud detection: A review. Stat. Sci. **2002**, 17, 235–255.
7. Nanni, L. Ensemble of classifiers for protein fold recognition. Neurocomputing **2006**, 69, 850–853.
8. Vigneron, V.; Duarte, L.T. Rank-order principal components. A separation algorithm for ordinal data exploration. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1036–1041.
9. Altinçay, H.; Demirekler, M. An information theoretic framework for weight estimation in the combination of probabilistic classifiers for speaker identification. Speech Commun. **2000**, 30, 255–272.
10. Yang, S.; Browne, A. Neural network ensembles: Combining multiple models for enhanced performance using a multistage approach. Expert Syst. **2004**, 21, 279–288.
11. Wozniak, M. Hybrid Classifiers: Methods of Data, Knowledge, and Classifier Combination; Number 519 in Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2014.
12. Kuncheva, L.I. Classifier ensembles for changing environments. In Proceedings of the 5th International Workshop on Multiple Classifier Systems, Cagliari, Italy, 9–11 June 2004; pp. 1–15.
13. Bhatt, N.; Thakkar, A.; Ganatra, A.; Bhatt, N. Ranking of classifiers based on dataset characteristics using active meta learning. Int. J. Comput. Appl. **2013**, 69, 31–36.
14. Abaza, A.; Ross, A. Quality based rank-level fusion in multibiometric systems. In Proceedings of the 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems, Washington, DC, USA, 28–30 September 2009; pp. 459–464.
15. Li, Y.; Wang, N.; Perkins, E.; Zhang, C.; Gong, P. Identification and optimization of classifier genes from multi-class earthworm microarray dataset. PLoS ONE **2010**, 5, e13715.
16. García-Lapresta, J.L.; Martínez-Panero, M. Borda count versus approval voting: A fuzzy approach. Public Choice **2002**, 112, 167–184.
17. Zhang, H.; Su, J. Naive Bayesian classifiers for ranking. In Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; Volume 3201, pp. 501–512.
18. Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the First International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Springer: London, UK, 2000; pp. 1–15.
19. Denison, D.D.; Hansen, M.; Holmes, C.C.; Mallick, B.; Yu, B. Nonlinear Estimation and Classification; Number 171 in Lecture Notes in Statistics; Springer: New York, NY, USA, 2003.
20. Breiman, L. Bagging predictors. Mach. Learn. **1996**, 24, 123–140.
21. Lee, S.; Kouzani, A.; Hu, E. Random forest based lung nodule classification aided by clustering. Comput. Med. Imaging Graph. **2010**, 34, 535–542.
22. Panigrahi, S.; Kundu, A.; Sural, S.; Majumdar, A. Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning. Inf. Fusion **2009**, 10, 354–363.
23. Hansen, L.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. **1990**, 12, 993–1001.
24. Anthimopoulos, M.; Christodoulidis, S.; Ebner, L.; Christe, A.; Mougiakakou, S. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging **2016**, 35, 1207–1216.
25. Kawaguchi, K. Deep learning without poor local minima. In Advances in Neural Information Processing Systems 29; Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 586–594.
26. Datta, S.; Pihur, V.; Datta, S. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data. BMC Bioinform. **2010**, 11, 427.
27. Nadal, J.; Legault, R.; Suen, C. Complementary algorithms for the recognition of totally unconstrained handwritten numerals. In Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City, NJ, USA, 16–21 June 1990; pp. 443–446.
28. Brüggemann, R.; Patil, G. Ranking and Prioritization for Multi-Indicator Systems: Introduction to Partial Order Applications; Environmental and Ecological Statistics; Springer: New York, NY, USA, 2011.
29. Benson, D. Representations of Elementary Abelian p-Groups and Vector Bundles, 1st ed.; Cambridge Tracts in Mathematics; Cambridge University Press: Cambridge, UK, 2016.
30. Vigneron, V.; Duarte, L. Toward rank disaggregation: An approach based on linear programming and latent variable analysis. In Latent Variable Analysis and Signal Separation; Tichavský, P., Babaie-Zadeh, M., Michel, O.J., Thirion-Moreau, N., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 192–200.
31. Gehrlein, W.; Lepelley, D. Voting Paradoxes and Group Coherence: The Condorcet Efficiency of Voting Rules, 1st ed.; Studies in Choice and Welfare; Springer: Berlin/Heidelberg, Germany, 2011.
32. Korte, B.; Vygen, J. Combinatorial Optimization: Theory and Algorithms, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2007.
33. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
34. Li, G.; Guillaud, M.; Follen, M.; MacAulay, C. Double staining cytologic samples with quantitative Feulgen-thionin and anti-Ki-67 immunocytochemistry as a method of distinguishing cells with abnormal DNA content from normal cycling cells. Anal. Quant. Cytopathol. Histopathol. **2012**, 34, 273–284.
35. Scheurer, M.; Guillaud, M.; Tortolero-Luna, G.; MacAulay, C.; Follen, M.; Adler-Storthz, K. Human papillomavirus-related cellular changes measured by cytometric analysis of DNA ploidy and chromatin texture. Cytom. Part B Clin. Cytom. **2007**, 72, 324–331.
36. Witten, D.M.; Tibshirani, R. A framework for feature selection in clustering. J. Am. Stat. Assoc. **2010**, 105, 713–726.
37. Kondo, Y.; Salibian-Barrera, M.; Zamar, R. RSKC: An R package for a robust and sparse K-means clustering algorithm. J. Stat. Softw. **2016**, 72, 1–26.
38. Safo, S.E.; Ahn, J. General sparse multi-class linear discriminant analysis. Comput. Stat. Data Anal. **2016**, 99, 81–90.
39. Zhong, M.; Tang, H.; Chen, H.; Tang, Y. An EM algorithm for learning sparse and overcomplete representations. Neurocomputing **2004**, 57, 469–476.

**Figure 1.** General framework for classifier combination. Each classifier ${\mathcal{K}}_{i}\left(\mathbf{x}\right)$ produces an output vector ${\mathbf{u}}_{i}$; the combination function then maps the ${\mathbf{u}}_{i}$ to a final decision vector $\mathbf{z}$.

**Figure 4.** Images of cervical cells colored with Papanicolaou stain. (**a**) Clumps of abnormal cells with large nuclei. (**b**) Abnormal cells with dense nuclei.

**Figure 6.**Graphic representations of the classification results for disagreement (blue) and Condorcet (red) distances.

**Table 1.** Confusion matrix of a classifier ${\mathcal{K}}_{i}$ used to estimate $p\left({U}_{ik}|{C}_{j}\right)$ in the Bayesian approach. ${U}_{i}={R}_{j}$ denotes the classifier's decision that the class is ranked $j$th.

| True Class ∖ Predicted Rank | ${R}_{1}$ | ⋯ | ${R}_{j}$ | ⋯ | ${R}_{K}$ | Row Total |
|---|---|---|---|---|---|---|
| ${C}_{1}$ | ${n}_{11}$ | ⋯ | ${n}_{1j}$ | ⋯ | ${n}_{1K}$ | ${n}_{1\cdot}$ |
| ⋮ | ⋮ | ⋱ | ⋮ | | ⋮ | ⋮ |
| ${C}_{j}$ | ${n}_{j1}$ | ⋯ | ${n}_{jj}$ | ⋯ | ${n}_{jK}$ | ${n}_{j\cdot}$ |
| ⋮ | ⋮ | | ⋮ | ⋱ | ⋮ | ⋮ |
| ${C}_{K}$ | ${n}_{K1}$ | ⋯ | ${n}_{Kj}$ | ⋯ | ${n}_{KK}$ | ${n}_{K\cdot}$ |
| Column Total | ${n}_{\cdot 1}$ | ⋯ | ${n}_{\cdot j}$ | ⋯ | ${n}_{\cdot K}$ | |
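Given such a confusion matrix, the conditional probabilities can be estimated by normalizing each row by its marginal ${n}_{j\cdot}$ (a standard maximum-likelihood estimate). The sketch below uses a made-up $3\times 3$ count matrix, since the paper's counts are symbolic:

```python
# Estimating p(U = R_j | C_l) from a Table-1-style confusion matrix by
# row normalization. The 3x3 count matrix is illustrative toy data,
# not taken from the paper.
import numpy as np

n = np.array([[50,  8,  2],
              [ 6, 40,  4],
              [ 4,  6, 30]], dtype=float)   # n[l, j] = count of class C_l ranked R_j

p_hat = n / n.sum(axis=1, keepdims=True)    # p_hat[l, j] ~ p(U = R_j | C_l)
assert np.allclose(p_hat.sum(axis=1), 1.0)  # each conditional distribution sums to 1
print(np.round(p_hat, 3))
```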

**Table 2.** Ranks assigned by the four classifiers ${\mathcal{K}}_{1}$–${\mathcal{K}}_{4}$ to the ten MNIST digits, and the combined ranks obtained with the disagreement and Condorcet distances.

| Digit | ${\mathcal{K}}_{1}$ | ${\mathcal{K}}_{2}$ | ${\mathcal{K}}_{3}$ | ${\mathcal{K}}_{4}$ | Disag. | Condorcet |
|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 3 | 10 | 3 | 3 |
| 1 | 2 | 2 | 1 | 2 | 2 | 2 |
| 2 | 3 | 1 | 2 | 1 | 1 | 1 |
| 3 | 4 | 6 | 4 | 3 | 4 | 4 |
| 4 | 5 | 5 | 7 | 5 | 5 | 6 |
| 5 | 6 | 3 | 6 | 4 | 6 | 5 |
| 6 | 7 | 8 | 5 | 6 | 7 | 7 |
| 7 | 8 | 7 | 8 | 9 | 8 | 8 |
| 8 | 9 | 10 | 10 | 8 | 10 | 10 |
| 9 | 10 | 9 | 9 | 7 | 9 | 9 |

**Table 3.** Composition of the dataset per patient: HPV test result, total number of cells, and number (proportion) of debris and cancer cells.

| HPV Test | Total Number of Cells | Debris (Prop.) | Cancer (Prop.) |
|---|---|---|---|
| positive | 405 | 78 (0.19) | 49 (0.12) |
| positive | 114 | 19 (0.17) | 8 (0.07) |
| positive | 206 | 31 (0.15) | 13 (0.06) |
| positive | 448 | 30 (0.06) | 2 (0.004) |
| positive | 519 | 70 (0.13) | 33 (0.06) |
| negative | 137 | 13 (0.09) | – |
| negative | 76 | 5 (0.06) | – |
| negative | 211 | 84 (0.39) | – |
| negative | 251 | 31 (0.12) | – |
| negative | 251 | 52 (0.20) | – |
| negative | 257 | 40 (0.15) | – |
| negative | 223 | 24 (0.11) | – |
| negative | 691 | 155 (0.22) | – |
| negative | 67 | 23 (0.24) | – |
| Total | 3857 | 655 (0.17) | 105 (0.02) |

**Table 4.** Summary of the dataset: number of patients, number of nuclei, and number/type of annotated objects per group.

| Group | No. of Patients | No. of Nuclei | No./Type of Data |
|---|---|---|---|
| Control patients | 9 | 2165 | 427/noisy objects |
| Risky patients | 5 | 1692 | 105/atypical nuclei |
| | – | – | 228/noisy objects |
| Total | 14 | 3857 | 760 objects |

**Table 5.**Classification results with disagreement and Condorcet combination rules using a set of M classifiers (with $4\le M\le 1321$).

| Disagreement: ${I}_{d}$ | $M$ | Error Rate | FPR | FNR | Condorcet: ${I}_{C}$ | $M$ | Error Rate | FPR | FNR |
|---|---|---|---|---|---|---|---|---|---|
| 0.873 | 1321 | 0.0800 | 0.0934 | 0.0644 | 0.901 | 1321 | 0.0820 | 0.0870 | 0.0777 |
| 0.901 | 1073 | 0.0544 | 0.0491 | 0.0576 | 0.906 | 1073 | 0.0572 | 0.0566 | 0.0561 |
| 0.866 | 907 | 0.0560 | 0.0553 | 0.0562 | 0.966 | 907 | 0.0428 | 0.0529 | 0.0313 |
| 0.895 | 845 | 0.0524 | 0.0593 | 0.0464 | 0.920 | 845 | 0.0508 | 0.0532 | 0.0465 |
| 0.870 | 765 | 0.0484 | 0.0502 | 0.0500 | 0.822 | 765 | 0.0516 | 0.0555 | 0.0493 |
| 0.800 | 538 | 0.0636 | 0.0675 | 0.0600 | 0.817 | 538 | 0.0664 | 0.0718 | 0.0592 |
| 0.792 | 302 | 0.0728 | 0.0766 | 0.0710 | 0.744 | 302 | 0.1020 | 0.0923 | 0.1126 |
| 0.781 | 205 | 0.0896 | 0.0906 | 0.0864 | 0.757 | 205 | 0.1472 | 0.1015 | 0.1968 |
| 0.759 | 120 | 0.1292 | 0.1157 | 0.1439 | 0.776 | 120 | 0.1472 | 0.1118 | 0.1846 |
| 0.697 | 95 | 0.1424 | 0.1023 | 0.1809 | 0.660 | 95 | 0.1424 | 0.0985 | 0.1888 |
| 0.706 | 66 | 0.1392 | 0.1098 | 0.1667 | 0.689 | 66 | 0.1548 | 0.1055 | 0.2090 |
| 0.739 | 49 | 0.1328 | 0.0840 | 0.1854 | 0.672 | 49 | 0.1568 | 0.1117 | 0.2067 |
| 0.694 | 38 | 0.1460 | 0.1152 | 0.1770 | 0.516 | 38 | 0.1772 | 0.1404 | 0.2233 |
| 0.643 | 19 | 0.1540 | 0.1294 | 0.1801 | 0.484 | 19 | 0.1696 | 0.1323 | 0.2032 |
| 0.496 | 13 | 0.1644 | 0.1127 | 0.2224 | 0.455 | 13 | 0.1728 | 0.1261 | 0.2248 |
| 0.561 | 4 | 0.1580 | 0.1238 | 0.1962 | 0.477 | 4 | 0.1668 | 0.1234 | 0.2130 |

**Table 6.** Results obtained for the sparse k-means (SkM), general sparse multi-class linear discriminant analysis (GSM-LDA), and sparse EM (sEM) algorithms: average and standard error of the clustering error rate, false positive rate (FPR), and false negative rate (FNR) over 20 simulations.

| Algorithm | Error Rate | FPR | FNR |
|---|---|---|---|
| SkM [36] | 0.192 ± 0.016 | 0.205 ± 0.044 | 0.165 ± 0.084 |
| GSM-LDA [38] | 0.159 ± 0.022 | 0.133 ± 0.050 | 0.118 ± 0.099 |
| sEM [39] | 0.090 ± 0.047 | 0.077 ± 0.022 | 0.062 ± 0.061 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vigneron, V.; Maaref, H. *M*-ary Rank Classifier Combination: A Binary Linear Programming Problem. *Entropy* **2019**, *21*, 440.
https://doi.org/10.3390/e21050440
