# A Theoretical Approach to Ordinal Classification: Feature Space-Based Definition and Classifier-Independent Detection of Ordinal Class Structures

## Abstract

## 1. Introduction

## 2. Formalisation and Generalised Working Definition for Ordinal Class Structures

#### 2.1. Formalisation

**Definition 1.**

**Remark 1.**

**Example 1.**

#### 2.2. Feature Space-Based Working Definition for Ordinal Classification Tasks

**Definition 2.**

**Definition 3.**

## 3. Comparison to Previous Work and Additional Theoretical Outcomes

#### 3.1. Special Case for 3-Class Classification Tasks and Detection of FS-Ordinal Structures

**Theorem 1.**

#### 3.2. FS-Ordinal versus SVM-Ordinal Structures

**Corollary 1.**

## 4. Classifier-Independent Level of Separability Measures

#### 4.1. Discriminant Ratio

#### 4.2. Ordinal-Scaled and Categorical Features

#### 4.3. Interpretation

## 5. Evaluation

#### 5.1. Traditionally Ordinal Data Sets

#### 5.2. Additional Data Set Information

#### 5.3. Results for Traditionally Ordinal Data Sets

#### 5.4. Results for Traditionally Non-Ordinal Data Sets

#### 5.5. Running Time Comparison

## 6. Discussion

#### 6.1. Operational Complexity and Detection Limitations

#### 6.2. Iris Data Set—A Motivational Example for the Detection of FS-Ordinal Structures

## 7. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
ANN | Artificial Neural Network |
AOT | Averaged Operational Time |
BVDB | BioVid Heat Pain Database |
CM | Classification Model |
CMC | Contraceptive Method Choice (Data Set) |
CV | Cross Validation |
DR | Discriminant Ratio |
ECG | Electrocardiogram |
EDA | Electrodermal Activity |
EMG | Electromyogram |
ERA | Employee Rejection/Acceptance (Data Set) |
ESL | Employee Selection (Data Set) |
FS-ordinal | Feature Space-Based Ordinal |
LEV | Lecturers Evaluation (Data Set) |
LSM | Level of Separability Measure |
Mfeat | Multiple Features (Data Set) |
OC | Ordinal Classification |
OR | Ordinal Regression |
PSM | Pairwise Separability Matrix |
SMO | Sequential Minimal Optimisation |
std | Standard Deviation |
SVM | Support Vector Machine |
SVM-Acc | Support Vector Machine Resubstitution Accuracy |
SWD | Social Workers Decisions (Data Set) |

## Appendix A. Proof of Corollary 1

## Appendix B. BioVid Heat Pain Database Part A

## References

1. Bellmann, P.; Lausser, L.; Kestler, H.A.; Schwenker, F. Introducing Bidirectional Ordinal Classifier Cascades Based on a Pain Intensity Recognition Scenario. In ICPR Workshops (6); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12666, pp. 773–787.
2. Hühn, J.C.; Hüllermeier, E. Is an ordinal class structure useful in classifier learning? IJDMMM **2008**, 1, 45–67.
3. Lattke, R.; Lausser, L.; Müssel, C.; Kestler, H.A. Detecting Ordinal Class Structures. In MCS; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9132, pp. 100–111.
4. LeCun, Y.; Bengio, Y.; Hinton, G.E. Deep learning. Nature **2015**, 521, 436–444.
5. Liu, Y.; Kong, A.W.; Goh, C.K. Deep Ordinal Regression Based on Data Relationship for Small Datasets. In Proceedings of the Twenty-Sixth Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 2372–2378.
6. Lin, Z.; Gao, Z.; Ji, H.; Zhai, R.; Shen, X.; Mei, T. Classification of cervical cells leveraging simultaneous super-resolution and ordinal regression. Appl. Soft Comput. **2022**, 115, 108208.
7. Niu, Z.; Zhou, M.; Wang, L.; Gao, X.; Hua, G. Ordinal Regression with Multiple Output CNN for Age Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Washington, DC, USA, 2016; pp. 4920–4928.
8. Chen, S.; Zhang, C.; Dong, M.; Le, J.; Rao, M. Using Ranking-CNN for Age Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 742–751.
9. Gutiérrez, P.A.; Pérez-Ortiz, M.; Sánchez-Monedero, J.; Fernández-Navarro, F.; Hervás-Martínez, C. Ordinal Regression Methods: Survey and Experimental Study. IEEE Trans. Knowl. Data Eng. **2016**, 28, 127–146.
10. Cruz-Ramírez, M.; Hervás-Martínez, C.; Sánchez-Monedero, J.; Gutiérrez, P.A. Metrics to guide a multi-objective evolutionary algorithm for ordinal classification. Neurocomputing **2014**, 135, 21–31.
11. Cardoso, J.S.; Sousa, R.G. Measuring the Performance of Ordinal Classification. IJPRAI **2011**, 25, 1173–1195.
12. Frank, E.; Hall, M.A. A Simple Approach to Ordinal Classification. In ECML; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2167, pp. 145–156.
13. Cover, T.M.; Hart, P.E. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory **1967**, 13, 21–27.
14. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wiley: Wadsworth, OH, USA, 1984.
15. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
16. Abe, S. Support Vector Machines for Pattern Classification; Advances in Pattern Recognition; Springer: London, UK, 2005.
17. Chu, W.; Keerthi, S.S. New approaches to support vector ordinal regression. In ICML; ACM International Conference Proceeding Series; ACM: New York, NY, USA, 2005; Volume 119, pp. 145–152.
18. Cardoso, J.S.; da Costa, J.F.P.; Cardoso, M.J. Modelling ordinal relations with SVMs: An application to objective aesthetic evaluation of breast cancer conservative treatment. Neural Netw. **2005**, 18, 808–817.
19. Chu, W.; Keerthi, S.S. Support Vector Ordinal Regression. Neural Comput. **2007**, 19, 792–815.
20. Lausser, L.; Schäfer, L.M.; Schirra, L.R.; Szekely, R.; Schmid, F.; Kestler, H.A. Assessing phenotype order in molecular data. Sci. Rep. **2019**, 9, 1–10.
21. Bellmann, P.; Schwenker, F. Ordinal Classification: Working Definition and Detection of Ordinal Structures. IEEE Access **2020**, 8, 164380–164391.
22. McCullagh, P. Regression models for ordinal data. J. R. Stat. Soc. Ser. B Methodol. **1980**, 42, 109–127.
23. Agresti, A. Analysis of Ordinal Categorical Data; John Wiley & Sons: Hoboken, NJ, USA, 2010; Volume 656.
24. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. **1936**, 7, 179–188.
25. Kächele, M.; Palm, G.; Schwenker, F. SMO Lattices for the Parallel Training of Support Vector Machines. In Proceedings of the ESANN, Bruges, Belgium, 22–24 April 2015.
26. Fan, R.; Chen, P.; Lin, C. Working Set Selection Using Second Order Information for Training Support Vector Machines. J. Mach. Learn. Res. **2005**, 6, 1889–1918.
27. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California: Irvine, CA, USA, 2017.
28. Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. **1945**, 1, 80–83.
29. Bellmann, P.; Thiam, P.; Schwenker, F. Using Meta Labels for the Training of Weighting Models in a Sample-Specific Late Fusion Classification Architecture. In Proceedings of the ICPR, Milan, Italy, 10–15 January 2021; IEEE: Washington, DC, USA, 2020; pp. 2604–2611.
30. Breiman, L. Bagging Predictors. Mach. Learn. **1996**, 24, 123–140.
31. Snoek, C.; Worring, M.; Smeulders, A.W.M. Early versus late fusion in semantic video analysis. In Proceedings of the ACM Multimedia, Singapore, 6–11 November 2005; ACM: New York, NY, USA, 2005; pp. 399–402.
32. Schäfer, L.M. Systems Biology of Tumour Evolution: Estimating Orders from Omics Data. Ph.D. Thesis, Universität Ulm, Ulm, Germany, 2021.
33. Cover, T.M. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Trans. Electron. Comput. **1965**, EC-14, 326–334.
34. Lausser, L.; Schäfer, L.M.; Kestler, H.A. Ordinal Classifiers Can Fail on Repetitive Class Structures. Arch. Data Sci. Ser. A **2018**, 4, 25.
35. Walter, S.; Gruss, S.; Ehleiter, H.; Tan, J.; Traue, H.C.; Crawcour, S.C.; Werner, P.; Al-Hamadi, A.; Andrade, A.O. The biovid heat pain database data for the advancement and systematic validation of an automated pain recognition system. In Proceedings of the CYBCONF, Lausanne, Switzerland, 13–15 June 2013; IEEE: Washington, DC, USA, 2013; pp. 128–131.
36. Kächele, M.; Amirian, M.; Thiam, P.; Werner, P.; Walter, S.; Palm, G.; Schwenker, F. Adaptive confidence learning for the personalization of pain intensity estimation systems. Evol. Syst. **2017**, 8, 71–83.
37. Kächele, M.; Thiam, P.; Amirian, M.; Schwenker, F.; Palm, G. Methods for Person-Centered Continuous Pain Intensity Assessment From Bio-Physiological Channels. J. Sel. Top. Signal Process. **2016**, 10, 854–864.

**Figure 1.** General classification task processing steps. **Left**: Sequential processing steps. **Right**: Step-specific processing examples. The detection of ordinal class structures is included in the Data Analysis step (highlighted in green in the online version of the manuscript).

**Figure 2.** Example of an ordinal-structured 2-dimensional 5-class toy data set with class order ${\omega}_{1}\prec {\omega}_{2}\prec {\omega}_{3}\prec {\omega}_{4}\prec {\omega}_{5}$. The relationship between ${\mu}_{2,3}$ and ${\mu}_{3,4}$ could be either ≤ or ≥, because class ${\omega}_{2}$ is closer to edge class ${\omega}_{1}$, whereas class ${\omega}_{4}$ is closer to edge class ${\omega}_{5}$. For ${\mu}_{3,5}$ and ${\mu}_{3,4}$, it holds ${\mu}_{3,5}\ge {\mu}_{3,4}$.

**Figure 3.** Detection of FS-ordinal structures. If the given task ${X}_{\Omega}$ constitutes an FS-ordinal classification task, then the output includes exactly two permutations, which represent the ordinal structure of the current task. (This figure is adapted from our previous work [21].)
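The detection step can be sketched as a brute-force search over class orders. The sketch below rests on an assumption, since the formal definitions are not reproduced here: a class order is accepted if, in the reordered pairwise separability matrix, the values in every row never decrease when moving away from the diagonal (the relation ${\mu}_{3,5}\ge {\mu}_{3,4}$ in the Figure 2 caption is one instance). The paper's Definitions 2 and 3 may impose further conditions. The function names, the toy class centres, and the use of centre distance as separability are all hypothetical.

```python
from itertools import permutations


def is_ordinal_order(mu, order):
    """Assumed condition: in every row of the reordered matrix, separability
    never decreases when moving away from the diagonal."""
    c = len(order)
    for i in range(c):
        row = [mu[order[i]][order[j]] for j in range(c)]
        # Moving right from the diagonal: row[i+1] <= row[i+2] <= ...
        for j in range(i + 1, c - 1):
            if row[j] > row[j + 1]:
                return False
        # Moving left from the diagonal: row[i-1] <= row[i-2] <= ...
        for j in range(i - 1, 0, -1):
            if row[j] > row[j - 1]:
                return False
    return True


def detect_ordinal_orders(mu):
    """Brute-force search over all class orders (0-based class indices)."""
    c = len(mu)
    return [p for p in permutations(range(c)) if is_ordinal_order(mu, p)]


# Hypothetical toy task: four classes with 1-D centres; separability is
# taken to be the distance between centres (an illustrative choice only).
centres = [3.0, 0.0, 7.0, 1.0]
mu = [[abs(a - b) for b in centres] for a in centres]
orders = detect_ordinal_orders(mu)
print(orders)  # [(1, 3, 0, 2), (2, 0, 3, 1)]
```

For this toy matrix the search returns one class order and its reverse, matching the two permutations mentioned in the caption. The search visits all $c!$ permutations, so this sketch is only practical for a small number of classes.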

**Figure 4.** Example of an ordinal-structured 3-class toy data set with class order ${\omega}_{1}\prec {\omega}_{2}\prec {\omega}_{3}$. (This figure is adapted from our previous work [21].)

**Figure 6.** Iris data set. Depicted are all binary combinations of the features Sepal Length, Sepal Width, Petal Length, and Petal Width, in cm. The legend is provided in the bottom right plot.

Variable | Description |
---|---|
$X\subset {\mathbb{R}}^{d}$ | $d$-dimensional data set, $d\in \mathbb{N}$ |
$\Omega =\{{\omega}_{1},\dots ,{\omega}_{c}\}$ | set of class labels, with $c>2$, $c\in \mathbb{N}$ |
$I=\{1,\dots ,c\}$ | index set |
${\mathcal{T}}^{c}$ | set of all permutations $\tau $ of the set $I$ |
$\mu \in {\mathcal{M}}^{d}$ | mapping for measuring the level of separability |
${\mu}_{i,j}\in {\mathbb{R}}_{\ge 0}$ | level of separability between classes ${\omega}_{i}$ and ${\omega}_{j}$ |
${M}^{\left(\tau \right)}={\left({\mu}_{\tau \left(i\right),\tau \left(j\right)}\right)}_{i,j=1}^{c}$ | symmetric pairwise separability matrix (PSM) |
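As a minimal illustration of the last row of the notation table, the following sketch builds the reordered matrix ${M}^{(\tau)}$ from a matrix of pairwise separability values and a permutation $\tau$. The helper name `reorder_psm` and the numeric values are hypothetical, and indices are 0-based here, whereas the table uses $I=\{1,\dots ,c\}$.

```python
def reorder_psm(mu, tau):
    """Return M^(tau), whose entry (i, j) is mu[tau[i]][tau[j]].

    mu  : symmetric c x c list of lists of pairwise separability values
    tau : permutation of range(c), given as a sequence (0-based)
    """
    return [[mu[ti][tj] for tj in tau] for ti in tau]


# Made-up separability values for a 3-class task (illustration only).
mu = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 1.5],
    [5.0, 1.5, 0.0],
]
tau = (2, 0, 1)  # place class 2 first, then class 0, then class 1
m_tau = reorder_psm(mu, tau)
print(m_tau)  # [[0.0, 5.0, 1.5], [5.0, 0.0, 2.0], [1.5, 2.0, 0.0]]
```

Because the same permutation is applied to rows and columns, the reordered matrix stays symmetric whenever the input is symmetric.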

Author | Middle Name | Institute | ORCID | Notation |
---|---|---|---|---|
Ludwig Lausser | No | MSB | No | ${x}_{1}$ |
Hans A. Kestler | Yes | MSB | Yes | ${x}_{2}$ |
Friedhelm Schwenker | No | NIP | Yes | ${x}_{3}$ |

**Table 3.** Data Set Properties (Traditionally Ordinal Data Sets). Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples. #${\omega}_{i}$: Number of samples in class ${\omega}_{i}$.

Data Set | Cl | Fea | Sam | #${\omega}_{1}$ | #${\omega}_{2}$ | #${\omega}_{3}$ | #${\omega}_{4}$ | #${\omega}_{5}$ | #${\omega}_{6}$ | #${\omega}_{7}$ | #${\omega}_{8}$ | #${\omega}_{9}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CMC | 3 | 9 | 1473 | 629 | 511 | 333 | − | − | − | − | − | − |
LEV-4 | 4 | 4 | 1000 | 93 | 280 | 403 | 224 | − | − | − | − | − |
SWD | 4 | 10 | 1000 | 32 | 352 | 399 | 217 | − | − | − | − | − |
Cars | 4 | 6 | 1728 | 1210 | 384 | 69 | 65 | − | − | − | − | − |
Nursery | 4 | 8 | 12,958 | 4320 | 328 | 4266 | 4044 | − | − | − | − | − |
ESL-5 | 5 | 4 | 488 | 52 | 100 | 116 | 135 | 85 | − | − | − | − |
LEV | 5 | 4 | 1000 | 93 | 280 | 403 | 197 | 27 | − | − | − | − |
BVDB | 5 | 194 | 8700 | 1740 | 1740 | 1740 | 1740 | 1740 | − | − | − | − |
ERA-7 | 7 | 4 | 1000 | 92 | 142 | 181 | 172 | 158 | 118 | 137 | − | − |
ESL | 9 | 4 | 488 | 2 | 12 | 38 | 100 | 116 | 135 | 62 | 19 | 4 |
ERA | 9 | 4 | 1000 | 92 | 142 | 181 | 172 | 158 | 118 | 88 | 31 | 18 |

**Table 4.** Ordinal Structure Detection (Traditionally Ordinal Data Sets). DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. ✓: Ordinal class structure found. ×: No ordinal class structure found.

Type | CMC | LEV-4 | SWD | Cars | Nursery | ESL-5 | LEV | BVDB | ERA-7 | ESL | ERA |
---|---|---|---|---|---|---|---|---|---|---|---|
DR | ✓ | ✓ | ✓ | × | × | ✓ | ✓ | ✓ | ✓ | × | ✓ |
SVM-Acc | × | × | × | × |

**Table 5.** Data Set Properties (Traditionally Non-Ordinal Data Sets). Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples.

Data Set | Cl | Fea | Sam | Class Distribution |
---|---|---|---|---|
Iris | 3 | 4 | 150 | 50 per class |
Seeds | 3 | 7 | 210 | 70 per class |
Forests | 4 | 27 | 523 | 83 / 86 / 159 / 195 |
Vehicles | 4 | 18 | 846 | 199 / 212 / 217 / 218 |
Segment | 7 | 19 | 2310 | 330 per class |
Mfeat | 10 | 649 | 2000 | 200 per class |

**Table 6.** Ordinal Structure Detection (Traditionally Non-Ordinal Data Sets). DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. ✓: Ordinal class structure found. ×: No ordinal class structure found.

Type | Iris | Seeds | Forests | Vehicles | Segment | Mfeat |
---|---|---|---|---|---|---|
DR | ✓ | ✓ | × | ✓ | × | × |
SVM-Acc | × | ✓ | ✓ | × | × | × |

**Table 7.** Running Time Comparison. Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples. DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. Depicted are the mean and standard deviation (std) values, for the operational time in ms, averaged over ten repetitions. For the SVM-Acc approach, for the BVDB data set, we removed the digits from the std value, for the sake of readability.

Data Set | Cl | Fea | Sam | DR | SVM-Acc |
---|---|---|---|---|---|
Iris | 3 | 4 | 150 | 0.25 ± 0.14 | 19.97 ± 3.02 |
Seeds | 3 | 7 | 210 | 0.16 ± 0.01 | 17.36 ± 1.61 |
CMC | 3 | 9 | 1473 | 0.34 ± 0.11 | 1987.39 ± 2.25 |
Forests | 4 | 27 | 523 | 0.42 ± 0.15 | 2267.25 ± 4.47 |
Vehicles | 4 | 18 | 846 | 0.43 ± 0.04 | 9069.24 ± 13.29 |
LEV-4 | 4 | 4 | 1000 | 0.32 ± 0.07 | 70.16 ± 1.88 |
SWD | 4 | 10 | 1000 | 0.34 ± 0.02 | 110.61 ± 1.71 |
Cars | 4 | 6 | 1728 | 0.31 ± 0.02 | 92.20 ± 4.33 |
Nursery | 4 | 8 | 12,958 | 1.76 ± 0.14 | 2079.20 ± 16.89 |
ESL-5 | 5 | 4 | 488 | 0.29 ± 0.02 | 52.81 ± 1.91 |
LEV | 5 | 4 | 1000 | 0.40 ± 0.04 | 90.75 ± 2.12 |
BVDB | 5 | 194 | 8700 | 44.26 ± 4.91 | 469,839.44 ± 2608 |
ERA-7 | 7 | 4 | 1000 | 1.09 ± 0.09 | 539.96 ± 5.43 |
Segment | 7 | 19 | 2310 | 1.77 ± 0.12 | 13,851.62 ± 58.83 |
ESL | 9 | 4 | 488 | 34.01 ± 0.43 | 200.59 ± 1.94 |
ERA | 9 | 4 | 1000 | 77.50 ± 2.56 | 616.12 ± 1.61 |
Mfeat | 10 | 649 | 2000 | 392.36 ± 1.13 | 19,661.20 ± 27.87 |
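To read the running times as relative factors, one can divide the mean SVM-Acc time by the mean DR time for each data set. The short sketch below does this for four rows copied from Table 7; the dictionary layout and variable names are illustrative choices, and the std values are ignored.

```python
# Mean operational times in ms, copied from Table 7: (DR, SVM-Acc).
mean_times_ms = {
    "Iris": (0.25, 19.97),
    "BVDB": (44.26, 469839.44),
    "ESL": (34.01, 200.59),
    "Mfeat": (392.36, 19661.20),
}

# Ratio of mean SVM-Acc time to mean DR time for each data set.
speedup = {name: svm / dr for name, (dr, svm) in mean_times_ms.items()}

for name, factor in sorted(speedup.items(), key=lambda item: item[1]):
    print(f"{name}: SVM-Acc takes about {factor:.1f} times as long as DR")
```

Among these four rows, the factor is smallest for ESL (9 classes, 4 features) and largest for BVDB (5 classes, 194 features, 8700 samples).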

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bellmann, P.; Lausser, L.; Kestler, H.A.; Schwenker, F. A Theoretical Approach to Ordinal Classification: Feature Space-Based Definition and Classifier-Independent Detection of Ordinal Class Structures. *Appl. Sci.* **2022**, *12*, 1815. https://doi.org/10.3390/app12041815
