# A Theoretical Approach to Ordinal Classification: Feature Space-Based Definition and Classifier-Independent Detection of Ordinal Class Structures

## Abstract

## 1. Introduction

## 2. Formalisation and Generalised Working Definition for Ordinal Class Structures

#### 2.1. Formalisation

#### 2.2. Feature Space-Based Working Definition for Ordinal Classification Tasks

## 3. Comparison to Previous Work and Additional Theoretical Outcomes

#### 3.1. Special Case for 3-Class Classification Tasks and Detection of FS-Ordinal Structures

#### 3.2. FS-Ordinal versus SVM-Ordinal Structures

## 4. Classifier-Independent Level of Separability Measures

#### 4.1. Discriminant Ratio

#### 4.2. Ordinal-Scaled and Categorical Features

#### 4.3. Interpretation

## 5. Evaluation

#### 5.1. Traditionally Ordinal Data Sets

#### 5.2. Additional Data Set Information

#### 5.3. Results for Traditionally Ordinal Data Sets

#### 5.4. Results for Traditionally Non-Ordinal Data Sets

#### 5.5. Running Time Comparison

## 6. Discussion

#### 6.1. Operational Complexity and Detection Limitations

#### 6.2. Iris Data Set—A Motivational Example for the Detection of FS-Ordinal Structures

## 7. Conclusions

## Abbreviations

## Appendix A. Proof of Corollary 1

## Appendix B. BioVid Heat Pain Database Part A

## References

**Figure 1.** General classification task processing steps.

**Left**: Sequential processing steps.

**Right**: Step-specific processing examples. The detection of ordinal class structures is part of the Data Analysis step (highlighted in green in the online version of the manuscript).

**Figure 2.** Example of an ordinal-structured two-dimensional 5-class toy data set with class order ${\omega}_{1}\prec {\omega}_{2}\prec {\omega}_{3}\prec {\omega}_{4}\prec {\omega}_{5}$. The relationship between ${\mu}_{2,3}$ and ${\mu}_{3,4}$ could be either ≤ or ≥, because class ${\omega}_{2}$ is closer to edge class ${\omega}_{1}$, whereas class ${\omega}_{4}$ is closer to edge class ${\omega}_{5}$. For ${\mu}_{3,5}$ and ${\mu}_{3,4}$, however, it holds that ${\mu}_{3,5}\ge {\mu}_{3,4}$.
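
Figure 2 conveys the intuition behind an ordinal class structure: within each row of the pairwise separability matrix, the level of separability should not decrease as one moves away from the diagonal along the class order. The following Python sketch checks that condition for a matrix whose rows and columns are already arranged in a candidate order. The row-wise monotonicity rule is an assumed reading of the caption, not the formal definition from Section 2.2, and the toy matrix uses the hypothetical choice $\mu_{i,j} = |i-j|$ with 0-based indices.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_ordinal_psm(M: Matrix) -> bool:
    """Check an assumed FS-ordinal condition on a pairwise separability matrix.

    Assumption: rows/columns of M follow the candidate class order, and in every
    row i the separability M[i][j] never decreases as j moves away from i.
    """
    c = len(M)
    for i in range(c):
        # Right of the diagonal: M[i][i+1] <= M[i][i+2] <= ... <= M[i][c-1].
        for j in range(i + 1, c - 1):
            if M[i][j] > M[i][j + 1]:
                return False
        # Left of the diagonal: M[i][i-1] <= M[i][i-2] <= ... <= M[i][0].
        for j in range(i - 1, 0, -1):
            if M[i][j] > M[i][j - 1]:
                return False
    return True


# Toy 5-class example (0-based indices): separability equals the order distance.
M_ordinal = [[float(abs(i - j)) for j in range(5)] for i in range(5)]
print(is_ordinal_psm(M_ordinal))  # True

# Break the relation mu_{3,5} >= mu_{3,4} from Figure 2 (0-based: [2][4] vs [2][3]).
M_broken = [row[:] for row in M_ordinal]
M_broken[2][4] = M_broken[4][2] = 0.5
print(is_ordinal_psm(M_broken))  # False
```

The check only compares entries within the same row, which matches the caption's point that $\mu_{2,3}$ and $\mu_{3,4}$ (different rows) need not be ordered.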

**Figure 3.** Detection of FS-ordinal structures. If the given task ${X}_{\Omega}$ constitutes an FS-ordinal classification task, the output contains exactly two permutations, which represent the ordinal structure of the task (adapted from our previous work [21]).
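
The procedure in Figure 3 can be sketched as an exhaustive search over $\mathcal{T}^{c}$: each permutation $\tau$ is applied to the pairwise separability matrix, and $\tau$ is kept if the reordered matrix $M^{(\tau)}$ satisfies the ordinal condition. This Python sketch assumes a row-wise monotonicity rule (separability never decreases away from the diagonal) as a stand-in for the formal definition, and uses a hypothetical toy separability, the absolute distance between one-dimensional class means. Indices are 0-based.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def permuted_psm(M: Matrix, tau: Sequence[int]) -> List[List[float]]:
    """Return M^(tau), i.e. the matrix with entries mu_{tau(i), tau(j)}."""
    return [[M[ti][tj] for tj in tau] for ti in tau]


def is_ordinal_psm(M: Matrix) -> bool:
    """Assumed condition: each row is non-decreasing away from the diagonal."""
    c = len(M)
    for i in range(c):
        if any(M[i][j] > M[i][j + 1] for j in range(i + 1, c - 1)):
            return False
        if any(M[i][j] > M[i][j - 1] for j in range(1, i)):
            return False
    return True


def detect_ordinal_orders(M: Matrix) -> List[Tuple[int, ...]]:
    """Brute-force search over all c! permutations of the class indices."""
    c = len(M)
    return [tau for tau in permutations(range(c))
            if is_ordinal_psm(permuted_psm(M, tau))]


means = [0.0, 3.0, 1.0, 6.0]  # hypothetical 1-D class means for classes 0..3
M = [[abs(a - b) for b in means] for a in means]
print(detect_ordinal_orders(M))  # [(0, 2, 1, 3), (3, 1, 2, 0)]
```

Reversing a valid order preserves the row-wise condition, so valid permutations always come in pairs; an output of exactly two entries is one order together with its reversal.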

**Figure 4.** Example of an ordinal-structured 3-class toy data set with class order ${\omega}_{1}\prec {\omega}_{2}\prec {\omega}_{3}$ (adapted from our previous work [21]).
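
For $c=3$, and under an assumed row-wise monotonicity condition (separability never decreases away from the diagonal), only the two edge rows impose constraints: the edge pair must be at least as separable as each of the two neighbouring pairs. Checking the three possible edge pairs is therefore sufficient, as in this Python sketch. The function name and the separability values are hypothetical toy data, not results from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Separability = Dict[FrozenSet[int], float]


def three_class_orders(mu: Separability) -> List[Tuple[int, int, int]]:
    """All orders of classes {0, 1, 2} meeting the assumed condition for c = 3."""
    orders: List[Tuple[int, int, int]] = []
    for a, b in combinations(range(3), 2):
        middle = 3 - a - b  # the remaining class, since 0 + 1 + 2 = 3
        edge = mu[frozenset((a, b))]
        # Edge pair must be at least as separable as both neighbouring pairs.
        if edge >= mu[frozenset((a, middle))] and edge >= mu[frozenset((b, middle))]:
            orders += [(a, middle, b), (b, middle, a)]
    return orders


mu = {frozenset((0, 1)): 2.0, frozenset((0, 2)): 5.0, frozenset((1, 2)): 1.5}
print(three_class_orders(mu))  # [(0, 1, 2), (2, 1, 0)]

# Ties can admit more than one order (here four permutations instead of two).
mu_tied = {frozenset((0, 1)): 1.0, frozenset((0, 2)): 1.0, frozenset((1, 2)): 0.9}
print(len(three_class_orders(mu_tied)))  # 4
```

With the non-strict inequalities assumed here, a unique largest pairwise value always yields exactly one order and its reversal, whereas ties can admit further permutations.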

**Figure 6.** Iris data set. Depicted are all pairwise combinations of the features Sepal Length, Sepal Width, Petal Length, and Petal Width (in cm). The legend is provided in the bottom-right plot.

Variable | Description |
---|---|
$X\subset {\mathbb{R}}^{d}$ | d-dimensional data set, $d\in \mathbb{N}$ |
$\Omega =\{{\omega}_{1},\dots ,{\omega}_{c}\}$ | set of class labels, with $c>2$, $c\in \mathbb{N}$ |
$I=\{1,\dots ,c\}$ | index set |
${\mathcal{T}}^{c}$ | set of all permutations $\tau $ of the set I |
$\mu \in {\mathcal{M}}^{d}$ | mapping for measuring the level of separability |
${\mu}_{i,j}\in {\mathbb{R}}_{\ge 0}$ | level of separability between classes ${\omega}_{i}$ and ${\omega}_{j}$ |
${M}^{\left(\tau \right)}={\left({\mu}_{\tau \left(i\right),\tau \left(j\right)}\right)}_{i,j=1}^{c}$ | symmetric pairwise separability matrix (PSM) |
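
The table defines $M^{(\tau)}$ as the symmetric matrix of pairwise separability levels, rearranged by a permutation $\tau$. The following Python sketch builds it from a labelled data set for an arbitrary mapping $\mu$. The function names, the zero diagonal, and the placeholder mapping (Euclidean distance between class centroids) are hypothetical and only illustrate the data flow; class indices are 0-based rather than 1-based.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def centroid_distance(A: List[Vector], B: List[Vector]) -> float:
    """Hypothetical placeholder for mu: Euclidean distance between class centroids."""
    d = len(A[0])
    ca = [sum(x[k] for x in A) / len(A) for k in range(d)]
    cb = [sum(x[k] for x in B) / len(B) for k in range(d)]
    return math.dist(ca, cb)


def psm(X: Sequence[Vector], y: Sequence[int], c: int,
        mu: Callable[[List[Vector], List[Vector]], float],
        tau: Sequence[int]) -> List[List[float]]:
    """Return M^(tau) = (mu_{tau(i), tau(j)})_{i,j}, with labels y in {0, ..., c-1}."""
    classes: Dict[int, List[Vector]] = {k: [] for k in range(c)}
    for x, label in zip(X, y):
        classes[label].append(x)
    # Symmetric base matrix; the diagonal is set to 0 (an assumption).
    base = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i + 1, c):
            base[i][j] = base[j][i] = mu(classes[i], classes[j])
    # Reorder rows and columns according to tau.
    return [[base[ti][tj] for tj in tau] for ti in tau]


X = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 0, 1, 1, 2, 2]
print(psm(X, y, 3, centroid_distance, tau=(0, 2, 1)))
# [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
```

Keeping $\mu$ as a parameter mirrors the table's view of $\mu$ as an exchangeable mapping, so a different separability measure can be plugged in without changing the matrix construction.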

Author | Middle Name | Institute | ORCID | Notation |
---|---|---|---|---|
Ludwig Lausser | No | MSB | No | ${x}_{1}$ |
Hans A. Kestler | Yes | MSB | Yes | ${x}_{2}$ |
Friedhelm Schwenker | No | NIP | Yes | ${x}_{3}$ |
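
Section 4.2 deals with ordinal-scaled and categorical features, and this table lists categorical attributes (middle name, institute, ORCID) for three samples $x_1, x_2, x_3$. One common way to map such attributes to numerical vectors is one-hot encoding; the following Python sketch applies it to the table's entries. The encoding choice and the function name are assumptions for illustration and not necessarily the treatment described in Section 4.2.

```python
from typing import Dict, List, Sequence

# Categorical attributes from the table (author names omitted).
ROWS = [
    {"Middle Name": "No", "Institute": "MSB", "ORCID": "No"},    # x_1
    {"Middle Name": "Yes", "Institute": "MSB", "ORCID": "Yes"},  # x_2
    {"Middle Name": "No", "Institute": "NIP", "ORCID": "Yes"},   # x_3
]


def one_hot(rows: Sequence[Dict[str, str]], columns: Sequence[str]) -> List[List[int]]:
    """One-hot encode each categorical column; categories are sorted alphabetically."""
    categories = {col: sorted({r[col] for r in rows}) for col in columns}
    encoded = []
    for r in rows:
        vec: List[int] = []
        for col in columns:
            # One indicator entry per observed category of this column.
            vec += [int(r[col] == cat) for cat in categories[col]]
        encoded.append(vec)
    return encoded


for vec in one_hot(ROWS, ["Middle Name", "Institute", "ORCID"]):
    print(vec)
# [1, 0, 1, 0, 1, 0]
# [0, 1, 1, 0, 0, 1]
# [1, 0, 0, 1, 0, 1]
```

For two-valued attributes such as these, a single 0/1 entry per column would carry the same information with half the dimensions.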

**Table 3.** Data Set Properties (Traditionally Ordinal Data Sets). Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples. #${\omega}_{i}$: Number of samples in class ${\omega}_{i}$.

Data Set | Cl | Fea | Sam | #${\omega}_{1}$ | #${\omega}_{2}$ | #${\omega}_{3}$ | #${\omega}_{4}$ | #${\omega}_{5}$ | #${\omega}_{6}$ | #${\omega}_{7}$ | #${\omega}_{8}$ | #${\omega}_{9}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CMC | 3 | 9 | 1473 | 629 | 511 | 333 | − | − | − | − | − | − |
LEV-4 | 4 | 4 | 1000 | 93 | 280 | 403 | 224 | − | − | − | − | − |
SWD | 4 | 10 | 1000 | 32 | 352 | 399 | 217 | − | − | − | − | − |
Cars | 4 | 6 | 1728 | 1210 | 384 | 69 | 65 | − | − | − | − | − |
Nursery | 4 | 8 | 12,958 | 4320 | 328 | 4266 | 4044 | − | − | − | − | − |
ESL-5 | 5 | 4 | 488 | 52 | 100 | 116 | 135 | 85 | − | − | − | − |
LEV | 5 | 4 | 1000 | 93 | 280 | 403 | 197 | 27 | − | − | − | − |
BVDB | 5 | 194 | 8700 | 1740 | 1740 | 1740 | 1740 | 1740 | − | − | − | − |
ERA-7 | 7 | 4 | 1000 | 92 | 142 | 181 | 172 | 158 | 118 | 137 | − | − |
ESL | 9 | 4 | 488 | 2 | 12 | 38 | 100 | 116 | 135 | 62 | 19 | 4 |
ERA | 9 | 4 | 1000 | 92 | 142 | 181 | 172 | 158 | 118 | 88 | 31 | 18 |

**Table 4.** Ordinal Structure Detection (Traditionally Ordinal Data Sets). DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. ✓: Ordinal class structure found. ×: No ordinal class structure found.

Type | CMC | LEV-4 | SWD | Cars | Nursery | ESL-5 | LEV | BVDB | ERA-7 | ESL | ERA |
---|---|---|---|---|---|---|---|---|---|---|---|
DR | ✓ | ✓ | ✓ | × | × | ✓ | ✓ | ✓ | ✓ | × | ✓ |
SVM-Acc | × | × | × | × |
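
Table 4 compares detection based on the discriminant ratio (DR) with detection based on the linear SVM resubstitution accuracy. The exact DR used in the study is defined in Section 4.1; as an assumption, the following Python sketch uses the classic Fisher discriminant ratio per feature, $(m_A - m_B)^2 / (s_A^2 + s_B^2)$ with population variances, maximised over features, as the pairwise separability $\mu_{i,j}$. Function names and toy data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def fisher_ratio(A: List[Vector], B: List[Vector]) -> float:
    """Largest per-feature Fisher discriminant ratio between two classes.

    Per feature k: (mean_A - mean_B)^2 / (var_A + var_B), population variances.
    """
    best = 0.0
    for k in range(len(A[0])):
        a = [x[k] for x in A]
        b = [x[k] for x in B]
        num = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        den = statistics.pvariance(a) + statistics.pvariance(b)
        # Zero within-class spread: infinite ratio if the means differ.
        ratio = num / den if den > 0 else (math.inf if num > 0 else 0.0)
        best = max(best, ratio)
    return best


def dr_matrix(X: Sequence[Vector], y: Sequence[int], c: int) -> List[List[float]]:
    """Symmetric matrix of pairwise ratios for labels y in {0, ..., c-1}."""
    classes = [[x for x, label in zip(X, y) if label == k] for k in range(c)]
    return [[0.0 if i == j else fisher_ratio(classes[i], classes[j])
             for j in range(c)] for i in range(c)]


X = [(0.0,), (1.0,), (4.0,), (5.0,), (10.0,), (11.0,)]
y = [0, 0, 1, 1, 2, 2]
print(dr_matrix(X, y, 3))
# [[0.0, 32.0, 200.0], [32.0, 0.0, 72.0], [200.0, 72.0, 0.0]]
```

In this toy case each row grows away from the diagonal, so the order $0 \prec 1 \prec 2$ is consistent with an ordinal structure. An SVM-Acc variant would replace `fisher_ratio` with the training-set accuracy of a linear SVM fitted to the two classes, which is far more expensive to compute.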

**Table 5.** Data Set Properties (Traditionally Non-Ordinal Data Sets). Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples.

Data Set | Cl | Fea | Sam | Class Distribution |
---|---|---|---|---|
Iris | 3 | 4 | 150 | 50 per class |
Seeds | 3 | 7 | 210 | 70 per class |
Forests | 4 | 27 | 523 | 83 / 86 / 159 / 195 |
Vehicles | 4 | 18 | 846 | 199 / 212 / 217 / 218 |
Segment | 7 | 19 | 2310 | 330 per class |
Mfeat | 10 | 649 | 2000 | 200 per class |

**Table 6.** Ordinal Structure Detection (Traditionally Non-Ordinal Data Sets). DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. ✓: Ordinal class structure found. ×: No ordinal class structure found.

Type | Iris | Seeds | Forests | Vehicles | Segment | Mfeat |
---|---|---|---|---|---|---|
DR | ✓ | ✓ | × | ✓ | × | × |
SVM-Acc | × | ✓ | ✓ | × | × | × |

**Table 7.** Running Time Comparison. Cl: Number of Classes. Fea: Number of Features. Sam: Number of Samples. DR: Detection based on the discriminant ratio. SVM-Acc: Detection based on the linear SVM resubstitution accuracy. Depicted are the mean and standard deviation (std) of the running time in ms, computed over ten repetitions. For the SVM-Acc result on the BVDB data set, the decimal places of the std value are omitted for readability.

Data Set | Cl | Fea | Sam | DR | SVM-Acc |
---|---|---|---|---|---|
Iris | 3 | 4 | 150 | 0.25 ± 0.14 | 19.97 ± 3.02 |
Seeds | 3 | 7 | 210 | 0.16 ± 0.01 | 17.36 ± 1.61 |
CMC | 3 | 9 | 1473 | 0.34 ± 0.11 | 1987.39 ± 2.25 |
Forests | 4 | 27 | 523 | 0.42 ± 0.15 | 2267.25 ± 4.47 |
Vehicles | 4 | 18 | 846 | 0.43 ± 0.04 | 9069.24 ± 13.29 |
LEV-4 | 4 | 4 | 1000 | 0.32 ± 0.07 | 70.16 ± 1.88 |
SWD | 4 | 10 | 1000 | 0.34 ± 0.02 | 110.61 ± 1.71 |
Cars | 4 | 6 | 1728 | 0.31 ± 0.02 | 92.20 ± 4.33 |
Nursery | 4 | 8 | 12,958 | 1.76 ± 0.14 | 2079.20 ± 16.89 |
ESL-5 | 5 | 4 | 488 | 0.29 ± 0.02 | 52.81 ± 1.91 |
LEV | 5 | 4 | 1000 | 0.40 ± 0.04 | 90.75 ± 2.12 |
BVDB | 5 | 194 | 8700 | 44.26 ± 4.91 | 469,839.44 ± 2608 |
ERA-7 | 7 | 4 | 1000 | 1.09 ± 0.09 | 539.96 ± 5.43 |
Segment | 7 | 19 | 2310 | 1.77 ± 0.12 | 13,851.62 ± 58.83 |
ESL | 9 | 4 | 488 | 34.01 ± 0.43 | 200.59 ± 1.94 |
ERA | 9 | 4 | 1000 | 77.50 ± 2.56 | 616.12 ± 1.61 |
Mfeat | 10 | 649 | 2000 | 392.36 ± 1.13 | 19,661.20 ± 27.87 |
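
The caption of Table 7 states that running times are means and standard deviations over ten repetitions, in ms. A minimal Python harness of that kind is sketched here. Two points are assumptions: the standard deviation is the sample standard deviation, and the timed workload (enumerating all $c!$ permutations in $\mathcal{T}^{c}$) is a hypothetical stand-in that only shows how quickly the number of candidate class orders grows with $c$. It is not the DR or SVM-Acc detection itself.

```python
import statistics
import time
from itertools import permutations
from typing import Callable, Tuple


def time_ms(fn: Callable[[], object], repetitions: int = 10) -> Tuple[float, float]:
    """Return mean and sample standard deviation of fn's running time in ms."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def count_permutations(c: int) -> int:
    """Hypothetical workload: enumerate all c! permutations of c class indices."""
    return sum(1 for _ in permutations(range(c)))


for c in (3, 5, 7, 9):
    mean, std = time_ms(lambda: count_permutations(c))
    print(f"c = {c}: {mean:.2f} ± {std:.2f} ms")
```

Measured values depend on the machine; the point of the harness is the reporting format, not the absolute numbers.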

