Exploring Symmetry of Binary Classification Performance Metrics

Luque, Amalia; Carrasco, Alejandro; Martín, Alejandro; Lama, Juan Ramón

doi:10.3390/sym11010047

Open AccessArticle

Exploring Symmetry of Binary Classification Performance Metrics

by

Amalia Luque

^1,*

,

Alejandro Carrasco

²

,

Alejandro Martín

¹

and

Juan Ramón Lama

¹

Ingeniería del Diseño; Escuela Politécnica Superior. Universidad de Sevilla, 41011 Sevilla, Spain

²

Tecnología Electrónica; Escuela Ingeniería Informática. Universidad de Sevilla, 41012 Sevilla, Spain

^*

Author to whom correspondence should be addressed.

Symmetry 2019, 11(1), 47; https://doi.org/10.3390/sym11010047

Submission received: 14 November 2018 / Revised: 13 December 2018 / Accepted: 24 December 2018 / Published: 3 January 2019

(This article belongs to the Special Issue Symmetry in Computing Theory and Application)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results obtained show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (concerning good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics have been shown to be independent of the imbalance in the dataset and two cross-symmetries have been identified. The results regarding their symmetries reveal a deeper insight into the behaviour of various performance metrics and offer an indicator to properly interpret their values and a guide for their selection for certain specific applications.

Keywords:

performance metrics; classification; computational symmetry; machine learning

1. Introduction

Symmetry has played and continues playing, a highly significant role in the way of how humans perceive the world [1]. In the scientific fields, symmetry plays a key role as it can be discovered in nature [2,3], society [4] and mathematics [5]. Moreover, symmetry also provides an intuitive way to attain faster and deeper insights into scientific problems.

In recent years, an increasing interest has arisen in detecting and taking advantage of symmetry in various aspects of theoretical and applied computing [6]. Several studies involving symmetry have been published in network technology [7], human interfaces [8], image processing [9], data hiding [10] and many other applications [11].

On the other hand, pattern recognition and machine learning procedures are becoming key aspects of modern science [12] and the hottest topics in the scientific literature on computing [13]. Furthermore, in this field, symmetry is playing an interesting role either as a subject of study, in the form of machine learning algorithms to discover symmetries [14] or as a means to improve the results obtained by automatic recognition systems [15]. Let us emphasize this point: not only can knowing the symmetry of a certain computer algorithm be intrinsically rewarding since it sheds light on the behaviour of the algorithm but it can also be very useful for its interpretation, its optimization or as a criterion for the selection among various competing algorithms. As an example, in recent research, we have employed a symmetric criterion to select the best feature-extraction procedures (Discrete Cosine Transform versus Discrete Fourier Transform) [16] in an application of the classification of sounds [17,18] effectively deployed in a Wireless Sensor Network as shown in Figure 1. Another examples of industrial applications using classification of sounds can be found in Refs. [19,20].

In the broad field of machine learning, the study of how to measure the performance of various classifiers has attracted continued attention [21,22,23]. Classification performance metrics play a key role in the assessment of the overall classification process in the test phase, in the selection from among various competing classifiers in the validation phase and are even sometimes used as the loss function to be optimized in the process of model construction during the classification training phase.

However, to the best of our knowledge, no systematic study into the symmetry of these metrics has yet been undertaken. By discovering their symmetries, we would reach a better understanding of their meaning, we could obtain useful insights into when their use would be more appropriate and we would also gain additional and meaningful indicators for the selection of the best performance metric.

Although several dozen performance metrics can be found in the literature, we will focus on those which are probably the most commonly used: the metrics based on the confusion matrix [24]. Accuracy, precision and recall (sensitivity) are undoubtedly some of the most popular metrics. On the other hand, our research will be focused on the cases where there are only two classes (binary classifiers). Although this is certainly a limitation, it does provide a solid ground base for further research. Moreover, multiclass performance metrics are usually obtained by decomposing the multiclass problem into several binary classification sub-problems [25].

2. Materials and Methods

2.1. Definitions

Let us first consider an original (baseline) experiment

E^{B}

, defined by the duple

E^{B} = (C^{B}, D^{B})

composed of a set of

n^{B}

classifiers,

C^{B} = {c_{i}^{B}}

and a set of their corresponding

n^{B}

datasets,

D^{B} = {D_{i}^{B}}, i = 1, \dots, n^{B}

. The elements in every dataset belong to either of two classes,

G_{1}

and

G_{2}

, which are called Positive (

P

) and Negative (

N

) classes, respectively. The

i

-th classifier

c_{i}^{B}

operates on the corresponding

D_{i}^{B}

dataset, thereby obtaining a resulting classification which can be defined by its binary confusion matrix

c m_{i}^{B}

and hence

D_{i}^{B} \overset{c_{i}^{B}}{\to} c m_{i}^{B}

. The set of confusion matrices are denominated

C M^{B} = {c m_{i}^{B}}

. The baseline experiment can therefore be defined as the set of classifiers operating on the set of datasets to obtain a set of confusion matrices,

E^{B} : D^{B} \overset{C^{B}}{\to} C M^{B}

.

This paper will explore the behaviour of binary classification performance metrics when the original experiment is subject to

n^{E}

different types of transformations. Let us define the

k

-th transformed experiment

E^{k} = (C^{k}, D^{k})

composed of a set of

n^{k}

classifiers,

C^{k} = {c_{i}^{k}}

and a set of their corresponding

n^{k}

datasets,

D^{k} = {D_{i}^{k}}

, whose result is a set of confusion matrices

C M^{k} = {c m_{i}^{k}}

. Hence,

E^{k} : D^{k} \overset{C^{k}}{\to} C M^{k}

, where

k = {B, 1, 2, \dots, n^{E}}

, indicates the type of transformation. In the

k

-th experiment, when the

i

-th classifier

c_{i}^{k}

operates on its corresponding

D_{i}^{k}

dataset, the result is summarized in the binary confusion matrix defined as

c m_{i}^{k} = [\begin{matrix} a_{i}^{k} & f_{i}^{k} \\ g_{i}^{k} & b_{i}^{k} \end{matrix}],

(1)

where

$a_{i}^{k}$ is the number of positive elements in $D_{i}^{k}$ correctly classified as positive;
$b_{i}^{k}$ is the number of negative elements in $D_{i}^{k}$ correctly classified as negative;
$f_{i}^{k}$ is the number of positive elements in $D_{i}^{k}$ incorrectly classified as negative; and
$g_{i}^{k}$ is the number of negative elements in $D_{i}^{k}$ incorrectly classified as positive.

Let us call

P_{i}^{k}

,

N_{i}^{k} a n d M_{i}^{k}

the positive, negative and total number of elements in

D_{i}^{k}

. Therefore

M_{i}^{k} = P_{i}^{k} + N_{i}^{k}

,

a_{i}^{k} + f_{i}^{k} = P_{i}^{k}

and

g_{i}^{k} + b_{i}^{k} = N_{i}^{k}

. The confusion matrix can then be described as

c m_{i}^{k} = [\begin{matrix} a_{i}^{k} & P_{i}^{k} - a_{i}^{k} \\ N^{k} - b_{i}^{k} & b_{i}^{k} \end{matrix}] .

(2)

Let us now define

α_{i}^{k}

as the ratio of positive elements in

D_{i}^{k}

correctly classified as positive; and

β_{i}^{k}

as the ratio of negative elements in

D_{i}^{k}

correctly classified as negative. That is,

α_{i}^{k} \equiv \frac{a_{i}^{k}}{P_{i}^{k}}, β_{i}^{k} \equiv \frac{b_{i}^{k}}{N_{i}^{k}} .

(3)

The confusion matrix can therefore be rewritten as

c m_{i}^{k} = [\begin{matrix} α_{i}^{k} P_{i}^{k} & P_{i}^{k} - α_{i}^{k} P_{i}^{k} \\ N_{i}^{k} - β_{i}^{k} N_{i}^{k} & β_{i}^{k} N_{i}^{k} \end{matrix}] = [\begin{matrix} α_{i}^{k} P_{i}^{k} & (1 - α_{i}^{k}) P_{i}^{k} \\ (1 - β_{i}^{k}) N_{i}^{k} & β_{i}^{k} N_{i}^{k} \end{matrix}] .

(4)

On the other hand, a dataset

D_{i}^{k}

is called imbalanced if it has a different number of positive and negative elements, that is,

P_{i}^{k} \neq N_{i}^{k}

. Classification on the presence of imbalanced datasets is a challenging task requiring specific considerations [26]. To quantify the imbalance, several indicators have been proposed, such as the dominance [27,28], the proportion between positive and negative instances (formalized as

1 : X

) [29] and the imbalance ratio (

I R

) defined as

P_{i}^{k} / N_{i}^{k}

[30], which is also called skew [31]. This value lies within the

[0, \infty]

range and has a value

I R = 1

in the balanced case. We prefer to use an indicator showing a value

0

in the balanced case, a value

+ 1

when all the elements in the dataset are positive and

- 1

if all the elements are negative. We define the imbalance coefficient

δ_{i}^{k}

, which is an indicator that has these characteristics, as

δ_{i}^{k} \equiv 2 \frac{P_{i}^{k}}{M_{i}^{k}} - 1.

(5)

The imbalance coefficient is graphically shown in Figure 2 (solid blue cline) as a function of the proportion of positive elements in the dataset. For the sake of comparison, that figure also shows the

I R

imbalance ratio (dashed green line).

Based on the imbalance coefficient, the number of positive and negative elements in the dataset can be rewritten as

P_{i}^{k} = \frac{1 + δ_{i}^{k}}{2} M_{i}^{k} .

(6)

N_{i}^{k} = M_{i}^{k} - P_{i}^{k} = M_{i}^{k} (1 - \frac{1 + δ_{i}^{k}}{2}) = \frac{1 - δ_{i}^{k}}{2} M_{i}^{k} .

(7)

By substituting these expressions into Equation (4), the confusion matrix becomes

c m_{i}^{k} = [\begin{matrix} α_{i}^{k} \frac{1 + δ_{i}^{k}}{2} M_{i}^{k} & (1 - α_{i}^{k}) \frac{1 + δ_{i}^{k}}{2} M_{i}^{k} \\ (1 - β_{i}^{k}) \frac{1 - δ_{i}^{k}}{2} M_{i}^{k} & β_{i}^{k} \frac{1 - δ_{i}^{k}}{2} M_{i}^{k} \end{matrix}] = λ_{i}^{k} M_{i}^{k},

(8)

where

λ_{i}^{k}

is the unitary confusion matrix defined as

λ_{i}^{k} \equiv [\begin{matrix} α_{i}^{k} \frac{1 + δ_{i}^{k}}{2} & (1 - α_{i}^{k}) \frac{1 + δ_{i}^{k}}{2} \\ (1 - β_{i}^{k}) \frac{1 - δ_{i}^{k}}{2} & β_{i}^{k} \frac{1 - δ_{i}^{k}}{2} \end{matrix}] .

(9)

It can be seen that

λ_{i}^{k}

is a function of 3 variables: the ratio of positive

(α_{i}^{k})

and negative

(β_{i}^{k})

correctly classified elements and the imbalance coefficient

(δ_{i}^{k})

, that is,

λ_{i}^{k} = λ_{i}^{k} (α_{i}^{k}, β_{i}^{k}, δ_{i}^{k})

.

In order to measure the performance of the classification process,

m

metrics are used. In this paper we focus on metrics that are based on the unitary confusion matrix and, for the sake of much easier comparison, all these metrics are converted within the range

[- 1, 1]

. Let us define

{}^{j}γ_{i}^{k}

as the

j

-th of such metrics for the

c_{i}^{k}

classifier operating on the

D_{i}^{k}

dataset, where

j = 1, \dots, m

. Since it is based on the unitary confusion matrix,

{}^{j}γ_{i}^{k} = {}^{j}γ_{i}^{k} (λ_{i}^{k}) = {}^{j}γ_{i}^{k} (α_{i}^{k}, β_{i}^{k}, δ_{i}^{k})

.

Let us now define

μ_{j}^{k}

as the set of the

j

-th metric values corresponding to the

k

-th experiment

E^{k} = (C^{k}, D^{k})

, that is,

μ_{j}^{k} \equiv {{}^{j}γ_{i}^{k}}, i = 1, 2, \dots, n^{k}

. Additionally, the sets

α^{k} \equiv {α_{i}^{k}}

,

β^{k} \equiv {β_{i}^{k}} a n d δ^{k} \equiv {δ_{i}^{k}}

are also defined.

2.2. Representation of Metrics

With these definitions, it is clear that the metric

μ_{j}^{k} = μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

and hence it is a 4-dimensional function since

μ_{j}^{k}

(one dimension) depends on

α^{k}, β^{k} a n d δ^{k}

(three independent dimensions). To depict their values, a first approach could involve a 3D representation space where each

(α_{i}^{k}, β_{i}^{k}, δ_{i}^{k})

point is color-coded according to the value

{}^{j}γ_{i}^{k} (α_{i}^{k}, β_{i}^{k}, δ_{i}^{k})

.

To show the different types of representations, let us define an arbitrary metric function

μ_{j}^{k} (α^{k}, β^{k}, δ^{k}) = \frac{\sin (2 π α^{k} δ^{k}) + \sin (2 π β^{k})}{2} .

(10)

This function is only used as an example, corresponds to no specific classification metric and has been selected for its aesthetic results. Figure 3 depicts the 3D representation for said example function. The

n^{E} = 1000

pairs of classifiers and datasets used in the experiment

E^{k} = (C^{k}, D^{k})

are selected in such a way that the space

(α^{k}, β^{k}, δ^{k})

is covered with equally spaced points. The above figure may cause confusion, mainly when the number of points

(n^{E})

increases. An alternative is to slice the 3D graphic by a plane corresponding to a certain value of the imbalance coefficient. Figure 4a depicts such a slice in the 3D graphic for an arbitrary value

δ = 0.75

and Figure 4b shows the slice on a 2D plane.

In the previous figure, the slice contains 100 values of the metric. However, to obtain a clearer understanding of the metric behaviour, a much larger number of points is recommended. For this purpose, the experiment is designed by selecting a set of virtual pairs of classifiers and datasets

(C^{k}, D^{k})

in such a way that the plane

(α^{k}, β^{k})

is fully covered. The result, as shown in Figure 5, appears as a heat map for a certain value of the imbalance coefficient (

δ = 0.75

in the example).

In order to analyse the behaviour of the metric for different values of the imbalance coefficient, a panel of heat maps can be used, as depicted in Figure 6.

2.3. Transformations

The original baseline experiment

E^{B} = (C^{B}, D^{B})

is subject to various types of transformations. As a result of the

k

-th transformation, the metrics related to the baseline experiment

μ_{j}^{B} (α^{B}, β^{B}, δ^{B})

are transformed into

μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

, which can be written either as

μ_{j}^{k} (α^{k}, β^{k}, δ^{k}) = T^{k} [μ_{j}^{B} (α^{B}, β^{B}, δ^{B})]

or as

μ_{j}^{B} (α^{B}, β^{B}, δ^{B}) \overset{T^{k}}{\to} μ_{j}^{k} (α^{k}, β^{k}, δ^{k}) .

(11)

It is said that the metric

μ_{j}

is symmetric under the transformation

T^{k}

if

μ_{j}^{k} = μ_{j}^{B}

. Conversely,

μ_{j}

is called antisymmetric under

T^{k}

(or symmetric under the complementary transformation

{\bar{T}}^{k}

) if

μ_{j}^{k} = - μ_{j}^{B} .

Analogously, it is said that the metrics

μ_{u}

and

μ_{v}

are cross-symmetric under the transformation

T^{k}

if

μ_{u}^{k} = μ_{v}^{B}

. Conversely,

μ_{u}

and

μ_{v}

are called anti-cross-symmetric under

T^{k}

(or cross-symmetric under the complementary transformation

{\bar{T}}^{k}

) if

μ_{u}^{k} = - μ_{v}^{B} .

2.3.1. One-Dimensional Transformations

One-dimensional transformations is the name given to those mirror reflections with respect to a single (one and only one) dimension of the 4-dimensional performance metric. Type

α

transformation implies that the

i

-th transformed classifier

(c_{i}^{α})

shows a ratio of correctly classified positive elements

(α_{i}^{α})

which has the symmetric value of the ratio

(α_{i}^{B})

obtained by the baseline classifier

(c_{i}^{B})

. Since the values of such ratios lie within the range

[0, 1]

, the symmetry exists with respect to the hyperplane

α = 0.5

and can be stated as

α_{i}^{α} = 1 - α_{i}^{B}

. An example of this transformation is depicted in Figure 7.

Analogously, type

β

transformation implies that the

i

-th transformed classifier

(c_{i}^{β})

shows a ratio of correctly classified negative elements

(β_{i}^{β}),

which has the symmetric value of the ratio

(β_{i}^{B})

obtained by the baseline classifier

(c_{i}^{B})

. Since the value of such ratios also lie within the range

[0, 1]

, the symmetry exists with respect to the hyperplane

β = 0.5

and can be stated as

β_{i}^{β} = 1 - β_{i}^{B}

. An example of this transformation is depicted in Figure 8.

Conversely, type

δ

transformation, which, instead of operating on classifiers, operates on datasets, implies that the

i

-th transformed dataset

(D_{i}^{δ})

has an imbalance ratio

(δ_{i}^{δ}),

which has the symmetric value of the imbalanced ratio

(δ_{i}^{B})

in the baseline corresponding to dataset

(δ_{i}^{B})

. Since the value of such imbalance ratios lie within the range

[- 1, 1]

, the symmetry exists with respect to the hyperplane

δ = 0

and can be stated as

δ_{i}^{δ} = - δ_{i}^{B}

. An example of this transformation is depicted in Figure 9.

Finally, type

μ

transformation jointly operates on classifiers and datasets in such a way that the

j

-th of performance metrics

{}^{j}γ_{i}^{μ}

for the

c_{i}^{μ}

classifier operating on the

D_{i}^{μ}

dataset has the symmetric value of the performance metric in the baseline experiment

({}^{j}γ_{i}^{B})

. Since the value of such metrics lie within the range

[- 1, 1]

, the symmetry exists with respect to the hyperplane

μ = 0

and can be stated as

{}^{j}γ_{i}^{μ} = - {}^{j}γ_{i}^{B}

. An example of this transformation is depicted in Figure 10 where it should be noted that the

μ

dimension is shown by the colour code of each point. Therefore, an inversion in

μ

is shown as a colour inversion.

2.3.2. Multidimensional Transformations

Let us now consider transformations that exchange two or more dimensions of the 4-dimensional performance metric. Firstly, let us define type

σ

transformation as that which exchanges

α

and

β

dimensions. This implies that the

i

-th transformed classifier/dataset pair

(c_{i}^{σ}, D_{i}^{σ})

shows a ratio of correctly classified positive elements

(α_{i}^{σ})

which has the same value as the ratio of correctly classified negative elements

(β_{i}^{B})

obtained by the baseline classifier/dataset pair

(c_{i}^{B}, D_{i}^{B})

. This exchange can be seen as the symmetry with respect to the hyperplane

α = β

(main diagonal of the

α, β

plane) and can be stated as

α_{i}^{σ} = β_{i}^{B}; β_{i}^{σ} = α_{i}^{B}

. An example of this transformation is depicted in Figure 11.

Although the four axes in these plots remain dimensionless, not all of them have the same meaning. So,

α

and

β

are both ratios of correctly classified elements. It would be nonsensical, for instance, to rescale

α

without also rescaling

β

. However,

δ

has a completely different meaning and its scale can and in fact does, differ from

α

and

β

. The same reasons can be applied to the axes

μ

. Therefore, all the exchanges of multidimensional axes are meaningless, except the interchange of

α

and

β

. All the other remaining exchanges are dismissed in our study.

The one- and two-dimensional transformations described above are called basic transformations and are summarized in Table 1.

2.3.3. Combined Transformations.

More complex transformations can be obtained by concatenating basic transformations. For instance, applying basic transformation

α

(

T^{α}

) and then basic transformation

β

(

T^{β}

) produces a new combined transformation

T^{α β} = T^{α} \cdot T^{β}

featured by

α^{α β} = 1 - α^{B}; β^{α β} = 1 - β^{B}

. As each of the one-dimensional transformations operates on an independent axis, they have the commutative and associative properties, that is, given 3 one-dimensional transformations,

T^{U}

,

T^{V} a n d T^{W},

it is true that

T^{U} \cdot T^{V} = T^{V} \cdot T^{U}

and that that

(T^{U} \cdot T^{V}) \cdot T^{W} = T^{U} \cdot (T^{V} \cdot T^{W})

.

However, bi-dimensional type

σ

transformation

T^{σ}

operates on the same axis as

T^{α}

and

T^{β}

. In this case, the order of transformation matters, as they do not have the commutative property. For instance,

T^{α σ} [μ_{j}^{B}] = T^{σ} {T^{α} [μ_{j}^{B} (α^{B}, β^{B}, δ^{B})]} = T^{σ} {μ_{j}^{B} (1 - α^{B}, β^{B}, δ^{B})} = μ_{j}^{B} (β^{B}, 1 - α^{B}, δ^{B}) .

On the other hand,

T^{σ α} [μ_{j}^{B}] = T^{α} {T^{σ} [μ_{j}^{B} (α^{B}, β^{B}, δ^{B})]} = T^{α} {μ_{j}^{B} (β^{B}, α^{B}, δ^{B})} = μ_{j}^{B} (1 β^{B}, α^{B}, δ^{B})

. Therefore, it is clear that

T^{α σ} \neq T^{σ α}

.

Having 5 basic transformations and not initially considering their order, any combined transformation can be binary coded in terms of the presence/absence of each basic component. Therefore

2^{5} = 32

combinations are possible; only 31 if the identity transformation (coded 00000) is dismissed. In order to code a combined transformation, the order

μ, σ, δ, β, α

is used where transformation

μ

indicates the Most Significant Bit (MSB) and the transformation

α

specifies the Least Significant Bit (LSB). An example of this code is shown in Table 2. With this selection, codes greater than 15 contain a transformation type

μ

, that is, they are useful in exploring antisymmetric behaviour. In the cases where the order of transformations matters,

σ = 1

and (

α = 1

or

β = 1

), then their corresponding codes refer to various different combined transformations.

A first example of combined transformations is that of the inverse labelling of classes. As stated above, the elements in every dataset belong to either of two classes,

G_{1}

and

G_{2}

, which are called Positive (

P

) and Negative (

N

) classes, respectively. The inverse labelling transformation (

T^{L}

) explores the classification metric behaviour when the labelling of the classes is inverted, that is, when

G_{2}

is called the Positive class and

G_{1}

the Negative class. Let us consider the

i

-th classifier

c_{i}^{L}

operating on its corresponding

D_{i}^{L}

dataset. In the baseline experiment, the ratio of correctly classified positive elements (

α_{i}^{B}

) refers to class

G_{1} a n d

conversely (

β_{i}^{B}

) refers to class

G_{2}

. In the

T^{L}

transformed experiment, the ratio of correctly classified positive elements (

α_{i}^{L}

) refers to class

G_{2} a n d

conversely (

β_{i}^{L}

) refers to class

G_{1}

, which means that

α_{i}^{L} = β_{i}^{B}

and

β_{i}^{L} = α_{i}^{B}

. That is, the first step of this transformation implies interchanging the axes

α

and

β,

which is equivalent to reflection symmetry with respect to the main diagonal, formerly defined as the basic transformation of type

σ

(Figure 12b).

Additionally, in the baseline experiment, the number of positive elements (

P_{i}^{B}

) refers to class

G_{1},

while in the

T^{L}

transformed experiment, the number of positive elements (

P_{i}^{L}

) refers to class

G_{2}

, which means that

P_{i}^{L} = N_{i}^{B}

and

N_{i}^{L} = P_{i}^{B}

, while the total number of elements remains unaltered:

M_{i}^{L} = M_{i}^{B}

. Therefore, by recalling Equation (5),

δ_{i}^{L} \equiv 2 \frac{P_{i}^{L}}{M_{i}^{L}} - 1 = 2 \frac{N_{i}^{B}}{M_{i}^{B}} - 1 = 2 \frac{M_{i}^{B} - P_{i}^{B}}{M_{i}^{B}} - 1 = - (2 \frac{P_{i}^{B}}{M_{i}^{B}} - 1) = - δ_{i}^{B} .

(12)

Hence, the second step of this transformation also implies reflection symmetry with respect to the hyperplane

δ = 0

, previously defined as the basic transformation of type

δ

(Figure 12c).

Finally, the complementary transformation

{\bar{T}}^{L}

involves a third and final step of inverting the sign of the metric, which is equivalent to reflection symmetry with respect to the hyperplane

μ = 0

, formerly defined as the basic transformation of type

μ

(Figure 12d).

Therefore, the inverse labelling transformation can be defined as

T^{L} = T^{σ δ} = T^{σ} \cdot T^{δ}

and its complementary as

{\bar{T}}^{L} = T^{σ δ μ} = T^{σ} \cdot T^{δ} \cdot T^{μ}

, where

T^{L} : μ_{j}^{L} (α^{L}, β^{L}, δ^{L}) = μ_{j}^{B} (β^{B}, α^{B}, - δ^{B}) .

(13)

A second example of combined transformations is given by the inverse-scoring transformation (

T^{S}

) which explores classification metric behaviour when the scoring of the classification results are inverted. In the baseline experiment, let us consider the

i

-th classifier

c_{i}^{B}

operating on its corresponding

D_{i}^{B}

dataset, thereby obtaining a ratio

α_{i}^{B}

of correctly classified positive elements and a ratio

β_{i}^{B}

in the negative case. The

j

-th metric assigns a score of

{}^{j}γ_{i}^{B} (α_{i}^{B}, β_{i}^{B}, δ_{i}^{B}) to this result

. High values of the score

{}^{j}γ_{i}^{B}

usually correspond to high ratios

α_{i}^{B}, β_{i}^{B}

. In the inverted score transformation (

T^{S}

), the

i

-th classifier

c_{i}^{S}

operating on its corresponding

D_{i}^{S}

dataset obtains a ratio

α_{i}^{S}

of correctly classified positive elements which is equal to the ratio of positive elements incorrectly classified in the baseline experiment, that is,

α_{i}^{S} = 1 - α_{i}^{B},

which implies a type

α

transformation. Analogously, for the negative class,

β_{i}^{S} = 1 - β_{i}^{B},

which implies a type

β

transformation. If

α_{i}^{B}, β_{i}^{B}

have high values, then

α_{i}^{S}, β_{i}^{S}

will have low values and, to be consistent, the result should be marked with a low score. For that reason, the inverse scoring transform also implies a transformation type

μ

, that is, it uses the symmetric value of the metric

{}^{j}γ_{i}^{S} = - {}^{j}γ_{i}^{B}

. Therefore, the inverse labelling transformation can be defined as

T^{S} = T^{α β μ} = T^{α} \cdot T^{β} \cdot T^{μ}

where

T^{S} : μ_{j}^{S} (α^{S}, β^{S}, δ^{S}) = μ_{j}^{B} (1 - α^{B}, 1 - β^{B}, δ^{B}) .

(14)

The results are depicted in Figure 13.

A third example is that of the full inversion (

T^{F}

), which explores the classification metric behaviour when both the labelling (

T^{L}

) and the scores (

T^{S}

) are inverted. This transformation can be featured by the concatenation of their two components, which can be written as

T^{F} = T^{L} \cdot T^{S} = T^{σ δ} \cdot T^{α β μ} = T^{σ δ α β μ} = T^{α β δ σ μ} .

(15)

T^{F} : μ_{j}^{F} (α^{F}, β^{F}, δ^{F}) = - μ_{j}^{B} (1 - β^{B}, 1 - α^{B}, - δ^{B}) .

(16)

The results are depicted in Figure 14.

Finally let us consider the

T^{α σ β}

transformation

T^{α σ β} [μ_{j}^{B} (α^{B}, β^{B}, δ^{B})] = T^{σ β} [μ_{j}^{B} (1 - α^{B}, β^{B}, δ^{B})] = T^{β} [μ_{j}^{B} (β^{B}, 1 - α^{B}, δ^{B})] = μ_{j}^{B} (β^{B}, α^{B}, δ^{B}),

(17)

that is,

T^{α σ β} = T^{σ}

. Analogously, it can be shown that

T^{β σ α} = T^{σ}

.

2.4. Performance Metrics

Based on the binary confusion matrix, numerous performance metrics have been proposed [32,33,34,35,36]. For our study, the focus is placed on 10 of these metrics, which are summarized in Table 3. The terms used in that table are taken from the elements of a generic confusion matrix which can be stated as

c m = [\begin{matrix} a & f \\ g & b \end{matrix}],

(18)

The last three metrics (

M C C

,

B M

and

M K

) take values within the

[- 1, 1]

range, while the ranges for the first seven lie within the

[0, 1]

interval. For comparison purposes, these metrics are used herein in their normalized version (

[- 1, 1]

interval). By naming a metric defined within the

[0, 1]

interval as

μ

, it can be normalized within the

[- 1, 1]

range by the expression

μ_{n} \equiv 2 μ - 1.

(19)

It can easily be shown that all these metrics can be expressed as a function

μ = μ (α, β, δ)

.

Although only performance metrics based on the confusion matrix are considered, a marginal approach to Receiver Operating Characteristics (ROC) analysis [37] can also be carried out. In this analysis, the Area Under Curve (

A U C

) is commonly used as a performance metric. However, for classifiers offering only a label (and not a set of scores for each label) or when a single threshold is used on scores, the value of

A U C n

and

B M

are the same [38]. Therefore, in the forthcoming sections, whenever

B M

is mentioned it could also be understood as

A U C n

.

2.5. Exploring Symmetries

In order to determine the existence of any symmetric or cross-symmetric behaviour on the 10 classification performance metrics described in the previous section, we should explore whether, for each metric (or pair of metrics), its baseline and any of the 31 combinations of transformations obtain the same result as that of the baseline of the same metric (symmetry) or any other metric (cross-symmetry). Moreover, many of these combined transformations must take the order into account. Therefore, several thousands of different analyses have to be undertaken. Although performing this task using analytical derivations is not an impossible assignment (preferably using some kind of symbolic computation), it is certainly arduous.

An alternative approach is to identify the distance of two metrics. More formally, for the

U

-th transformation, let us consider the

i

-th combination of classifier

c_{i}^{U}

operating on the

D_{i}^{U}

dataset. The classification result is measured using the

r

-th metric,

{}^{r}γ_{i}^{U}

. Similarly, for the

V

-th transform and the

i

-th combination of classifier

c_{i}^{V}

operating on the

D_{i}^{V}

dataset, let us measure its performance using the

s

-th metric,

{}^{s}γ_{i}^{V}

. The distance between these measures is defined as

d i s t ({}^{r}γ_{i}^{U}, {}^{s}γ_{i}^{V}) \equiv | {}^{r}γ_{i}^{U} - {}^{s}γ_{i}^{V} |

. The distance between the

r

-th metric

μ_{r}^{U} = {{}^{r}γ_{i}^{U}}

and the

s

-th metric

μ_{s}^{V} = {{}^{s}γ_{i}^{V}}

can then be defined as

d i s t (μ_{r}^{U}, μ_{s}^{V}) \equiv \frac{1}{n} \sum_{i = 1}^{n} d i s t ({}^{r}γ_{i}^{U}, {}^{s}γ_{i}^{V}) = \frac{1}{n} \sum_{i = 1}^{n} | {}^{r}γ_{i}^{U}, {}^{s}γ_{i}^{V} | .

(20)

Therefore, symmetric or cross-symmetric behaviour can be identified by a distance equal to zero.

It should be noted that if the

r

-th metric is symmetric under the

U

-th transformation, that is,

μ_{r}^{U} = T^{U} (μ_{r}^{B}) = μ_{r}^{B}

and also under the

V

-th transformations,

μ_{r}^{V} = T^{V} (μ_{r}^{B}) = μ_{r}^{B}

, it will also be symmetric under the concatenation of the two transformations. In effect,

μ_{r}^{U V} = T^{V} [T^{U} (μ_{r}^{B})] = T^{V} (μ_{r}^{B}) = μ_{r}^{B} .

(21)

Conversely, this is not true for cross-symmetries. If the

r

-th and

s

-th metric are cross-symmetric under the

U

-th transformation, that is,

μ_{r}^{U} = T^{U} (μ_{r}^{B}) = μ_{s}^{B}

and also under the

V

-th transformations,

μ_{r}^{V} = T^{V} (μ_{r}^{B}) = μ_{s}^{B}

, they are not necessarily cross-symmetric under the concatenation of the two transformations. In effect,

μ_{r}^{U V} = T^{V} [T^{U} (μ_{r}^{B})] = T^{V} (μ_{s}^{B}) = μ_{r}^{B} \neq μ_{s}^{B} .

(22)

2.6. Statistical Symmetries

The symmetries of the performance metrics can also be explored from a statistical point of view. Let us recall that

D_{i}^{k}

is the

i

-th dataset in the

k

-th experiment with an imbalance described by its imbalance coefficient

δ_{i}^{k}

. The elements in

D_{i}^{k}

are processed by the

c_{i}^{k}

classifier in order to obtain a ratio of correctly classified positive

α_{i}^{k}

and negative

β_{i}^{k}

elements. The

j

-th metric

{}^{j}γ_{i}^{k}

is based on these values and hence

{}^{j}γ_{i}^{k} = {}^{j}γ_{i}^{k} (α_{i}^{k}, β_{i}^{k}, δ_{i}^{k})

. Let us also recall that the set of all these values for

i = 1, \dots, n^{k}

, are denoted

μ_{j}^{k} = {{}^{j}γ_{i}^{k}}

,

α^{k} = {α_{i}^{k}}

,

β^{k} = {β_{i}^{k}}

and

δ^{k} = {δ_{i}^{k}}

and therefore

μ_{j}^{k} = μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

.

Let us now suppose that the elements

c_{i}^{k}, D_{i}^{k}

in the experiments are randomly selected in such a way that

α^{k}

,

β^{k} a n d δ^{k}

are uniformly distributed within their respective ranges. Therefore,

μ_{j}^{k}

becomes a random variable, which can be statistically described.

First of all, the probability density function (pdf) of

μ_{j}^{k}

:

p d f (μ_{j}^{k})

is obtained and its symmetry (or lack thereof) is ascertained. A more precise assessment of the statistical symmetry can be obtained by computing the skewness, which is defined as

ξ_{j}^{k} \equiv s k e w (μ_{j}^{k}) = E [{(\frac{μ_{j}^{k} - {\bar{μ}}_{j}^{k}}{\sqrt{v a r (μ_{j}^{k})}})}^{3}],

(23)

where

{\bar{μ}}_{j}^{k}

is the mean of

μ_{j}^{k}

and

v a r (μ_{j}^{k})

is its variance.

3. Results

3.1. Identifying Symmetries

The symmetric behaviour of the 10 metrics is first determined by means of computing the distance between the baseline and each of the 31 possible transformations, in accordance with Equation (20). The results are depicted in Figure 15. Each row shows the symmetries of a metric. In the columns are the 31 different transformations. Any given metric-transformation pair (small rectangles in the graphic) is shown in yellow if it has zero-distance with the metric baseline. The right-hand-side of the plot (whose code is greater than or equal to 16) corresponds to a combined transformation where the

μ

axis has been inverted, that is, where the transformation type

μ

is present. This is therefore the area for antisymmetric behaviour.

Let us first analyse each metric in terms of the accuracy (

A C C n

), the Matthews correlation coefficient (

M C C

) and the markedness (

M K

). These three metrics present a symmetric behaviour for the combined transformations shown in Table 4. For instance, the first row indicates that the three metrics are symmetric for a combination of the transformations

δ

and

σ

taken in any order (

δ σ

or

σ δ

), which corresponds to the code 12 (01100) for a coding scheme (

μ, σ, δ, β, α

) where

μ

represents the Most Significant Bit and

α

represents the Least Significant Bit.

The first case (code 12) corresponds to the transformation

T^{σ δ},

or, in other words, to the inverse labelling transformation

T^{L} = T^{σ δ}

which can be formulated for accuracy as

μ_{A C C n} (α, β, δ) = μ_{A C C n} (β, α, - δ) .

(24)

The results are depicted in Figure 16.

The second case (code 15) corresponds to 4 transformations ordered in two different ways. In the first ordering, we have

T^{α σ β δ} = T^{α σ β} \cdot T^{δ}

. Recalling Equation (17),

T^{α σ β} = T^{σ}

. It can therefore be written that

T^{α σ β δ} = T^{σ} \cdot T^{δ} = T^{σ δ} = T^{L}

, that is, it is equivalent to the inverse labelling transformation. The same result is obtained for

T^{β σ α δ}

. Hence, code 15 is the same case as code 12.

The third case (code 19) corresponds to the transformation

T^{α β μ},

or, in other words, to the inverse scoring transformation

T^{S} = T^{α β μ},

which can be formulated for accuracy as

μ_{A C C n} (α, β, δ) = - μ_{A C C n} (1 - α, 1 - β, δ) .

(25)

The results are depicted in Figure 17.

Finally, code 31 corresponds to 5 transformations ordered in 4 different ways. In the first ordering we have

T^{α β σ δ μ}

but, by considering that the order of

T^{δ}

and

T^{μ}

are not relevant, it can also be written as

T^{α β σ δ μ} = T^{α β δ σ μ} = T^{α β δ} \cdot T^{σ μ} = T^{L} \cdot T^{S} = T^{F}

, that is, it is equivalent to the full transformation. The same result is obtained for the 3 remaining orderings which can be formulated for accuracy as

μ_{A C C n} (α, β, δ) = - μ_{A C C n} (1 - β, 1 - α, - δ) .

(26)

The results are depicted in Figure 18.

Let us now focus on precision (

P R C n

) and the negative prediction value (

N P V n

). These two metrics present a symmetric behaviour for the combined transformations shown in Table 5.

These two metrics present symmetric behaviour for only the combined transformations code 31 (

11111

) which, in any of its ordering, is equivalent to the full inversion

T^{F} = T^{L} \cdot T^{S} = T^{α β δ σ μ}

and can be formulated for precision as

μ_{P R C n} (α, β, δ) = - μ_{P R C n} (1 - β, 1 - α, - δ) .

(27)

In other words, precision is symmetric with respect to the concatenation of inverse labelling and the inverse scoring transformations. The results are depicted in Figure 19.

Let us now analyse the geometric mean (

G M n

), which presents symmetric behaviour for the combined transformations shown in Table 6.

In first place, code 4 corresponds to

T^{δ}

. In fact, this metric is not only symmetric with respect to

δ

but also independent of

δ

, as it can be seen in Table 3. Secondly, combined transformations coded as 8 and 11 are equivalent to the

T^{σ}

transformation, that is,

G M n

is symmetric with respect to the diagonal in the

α, β

plane. This can be formulated as

μ_{G M n} (α, β) = μ_{G M n} (β, α) .

(28)

Finally, codes 12 and 15 imply concatenating

T^{δ}

to

T^{σ}

but as the metric is independent of

δ

, it is again equivalent to

T^{σ}

, that is,

T^{σ δ} = T^{σ} \cdot T^{δ} = T^{σ}

. These results are depicted in Figure 20.

In the case of bookmaker informedness (

B M

), the symmetric behaviour is obtained for the combined transformations shown in Table 7.

Again code 4 corresponds to

T^{δ}

as a consequence that this metric is independent of

δ

(see Table 3). Secondly, combined transformations coded as 8 and 11 are equivalent to the

T^{σ}

transformation, that is,

B M

is symmetric with respect to the diagonal in the

α, β

plane. This can be formulated as

μ_{B M} (α, β) = μ_{B M} (β, α) .

(29)

Additionally, codes 12 and 15 imply concatenating

T^{δ}

to

T^{σ}

but since the metric is independent of

δ

, it is again equivalent to

T^{σ}

, that is,

T^{σ δ} = T^{σ} \cdot T^{δ} = T^{σ}

. These results are depicted in Figure 21.

Code 19 and also code 23 since the metric does not depend on

δ

, correspond to the transformation

T^{α β μ}

or, in other words, to the inverse scoring transformation

T^{S} = T^{α β μ}

, which can be formulated for bookmaker informedness as

μ_{B M} (α, β) = - μ_{B M} (1 - α, 1 - β) = - μ_{B M} (1 - β, 1 - α) .

(30)

The results are depicted in Figure 22.

In other words, the bookmaker informedness is symmetric with respect to the inverse labelling and to the inverse scoring transformations. This implies that it is also symmetric with respect to the concatenations of these two transforms, which occurs in codes 27 and 31 (recall that the latter is independent of

δ

) corresponding to the full inversion

T^{F} = T^{L} + T^{S} = T^{α β δ σ μ},

which can be formulated as

μ_{B M} (α, β) = - μ_{B M} (1 - β, 1 - α) .

(31)

The results are depicted in Figure 23.

In the case of sensitivity (

S N S n

), the symmetric behaviour is found for the combined transformations shown in Table 8.

Codes 2 and 4 correspond to

T^{β}

and

T^{δ}

as a consequence of this metric being independent of

β

and

δ

(see Table 3). Code 19 (and also codes 17, 21 and 23 since the metric does not depend on

β

nor

δ

) corresponds to the transformation

T^{α β μ},

or, in other words, to the inverse scoring transformation

T^{S} = T^{α β μ}

, which can be formulated as

μ_{S N S n} (α) = - μ_{S N S n} (1 - α) .

(32)

This result is depicted in Figure 24.

On considering the specificity (

S P C n

), its symmetric behaviour is shown in Table 9.

Codes 1 and 4 corresponds to

T^{α}

and

T^{δ}

as a consequence of this metric being independent of

α

and

δ

(see Table 3). Code 19 (and also codes 18, 22 and 23 as the metric depends neither on

α

nor on

δ

) corresponds to the transformation

T^{α β μ}

, that is, to the inverse scoring transformation

T^{S} = T^{α β μ}

, which can be formulated as

μ_{S P C n} (β) = - μ_{S P C n} (1 - β) .

(33)

This result is depicted in Figure 25.

Finally, it can be observed that the

F_{1} n

score metric is not symmetric under any transformation. The results for each metric are summarized in Table 10.

3.2. Identifying Cross-Symmetries

In order to explore whether any cross-symmetry can be identified among the 10 metrics, we have computed the distance (using Equation (20)) of the baseline of each metric (and its 31 possible transformations), to the remaining baseline metrics. The results are depicted in Figure 26. Each row corresponds to the baseline of a metric and each column to the baseline and its 31 transformations of the other metric. Any given metric-metric pair (small squares in the graphic) is shown in yellow if it has zero-distance for any possible transformation.

The diagonal presents a summary of the results explored in the previous section, that is, every metric, except for the

F_{1} n

score, presents some kind of symmetry under some transformation. The cases of cross-symmetries appear in the elements off diagonal. Two cross-symmetries arise: the

S N S n - S P C n

and the

P R C n - N P V n

.

In order to attain a deeper insight into these cross-symmetries, let us consider, for each of the two pairs, the distances between the baseline of the first metric in the pair and the full set of transformations (including the baseline) of the second metric. The results are depicted in Figure 27. Each row shows the cross-symmetries of a pair of metrics. In the columns are the 32 different transformations (including the baseline) of the second metric in the pair. Any given (second-metric transformation) pair (small squares in the graphic) is shown in yellow if it has zero-distance with the first metric baseline. As in Figure 15, the right-hand-side of the plot (with code greater than or equal to 16) corresponds to combined transformation where the

μ

axis has been inverted, that is, where the transformation type

μ

is present. This is therefore the area for antisymmetric behaviour.

Let us first analyse each pair of metrics in terms of the

P R C n - N P V n

or

P R C n - N P V n

pair, which present cross-symmetric behaviour for the combined transformations shown in Table 11.

Codes 12 and 15 correspond to the transformation

T^{σ δ}

or, in other words, to the inverse labelling transformation

T^{L} = T^{σ δ},

which can be formulated as

μ_{P R C n} (α, β, δ) = μ_{N P V n} (β, α, - δ) .

(34)

The results are depicted in Figure 28.

Code 19 corresponds to the transformation

T^{α β μ}

or, in other words, to the inverse scoring transformation

T^{S} = T^{α β μ}

, which can be formulated as

μ_{P R C n} (α, β, δ) = - μ_{N P V n} (1 - α, 1 - β, δ) .

(35)

The results are depicted in Figure 29.

Although the

P R C n - N P V n

pair is cross-symmetric with respect to the inverse labelling and to the inverse scoring transformations, this does not imply that it is also cross-symmetric with respect to the concatenations of these two transforms (see Equation (22)). This is the reason why code 31 (corresponding to the full inversion

T^{F} = T^{L} + T^{S} = T^{α β δ σ μ}

is not present in Table 11.

The results for the pair

N P V n - P R C n

are exactly the same. Therefore,

μ_{N P V n} (α, β, δ) = μ_{P R C n} (β, α, - δ) = - μ_{P R C n} (1 - α, 1 - β, δ) .

(36)

Let us now consider the pair of metrics

S N S n - S P C n

and its cross-symmetric behaviour, which is found for the combined transformations shown in Table 12.

Since specificity remains independent from

δ

(see Table 3), codes 25, 11, 12 and 15 correspond to

T^{σ δ}

, that is, to the inverse labelling which can be formulated as

μ_{S N S n} (α, β) = μ_{S P C n} (β, α) .

(37)

Additionally, since specificity is also independent of

α

, then codes 9 (

T^{α σ}

) and 13 (

T^{α σ δ}

) are equivalent to

T^{σ δ}

. Moreover, after a

T^{σ}

transformation, the resulting metric has no dependence on

β

(due to the axis inversion) and hence codes 10 (

T^{σ β}

) and 14 (

T^{σ β δ}

) are also equivalent to

T^{σ δ}

. These results are depicted in Figure 30.

On the other hand, code 31 corresponds to full inversion transformation

T^{F} = T^{σ δ α β μ},

which can be formulated as

μ_{S N S n} (α, β) = - μ_{S P C n} (1 - β, 1 - α) .

(38)

It can be shown that the remaining codes (25, 26, 27, 29 and 30) are also equivalent to

T^{F}

. Moreover, after a

T^{σ}

transformation, the resulting metric does not depend on

β

(due to the axis inversion) and hence codes 10 (

T^{σ β}

) and 14 (

T^{σ β δ}

) are also equivalent to

T^{σ δ}

. These results are depicted in Figure 31.

The results for the pair

S P C n - S N S n

are exactly the same, so

μ_{S P C n} (α, β) = μ_{S N S n} (β, α) = - μ_{S N S n} (1 - β, 1 - α) .

(39)

The results for every pair of cross-symmetric metrics are summarized in Table 13.

3.3. Skewness of the Statistical Descriptions of the Metrics

In order to explore the symmetric behaviour of the statistical descriptions of the metrics, let us recall that, for the baseline experiment,

μ_{j}^{B} = μ_{j}^{B} (α^{B}, β^{B}, δ^{B})

can be considered a statistical variable. First of all, let us select a subset of the

μ_{j}^{B}

corresponding to a certain value

δ_{0}

of the imbalance coefficient, that is,

μ_{j}^{B} (α^{B}, β^{B}, δ_{0})

and obtain its probability density function (pdf) which will be called local pdf (since it is obtained solely for a value of

δ^{B}

). The results

p d f (μ_{j}^{k}, δ_{0})

for every metric with

δ^{B} = 0.5

are shown in Figure 32.

This result can be generalized for various values of the imbalance coefficient

δ^{B}

by obtaining the

p d f (μ_{j}^{k}, δ^{B})

depicted in Figure 33 as a set of heatmap plots. In every plot, the horizontal axis represents the imbalance coefficient while the value of the metric is drawn in the vertical axis. The value of the

p d f

is colour-coded.

In Figure 32 and Figure 33, the symmetry of the statistical descriptions of the metrics can easily be observed. However, in order to achieve a more precise insight, the local skewness

ξ_{j}^{B}

of every

p d f

is obtained in accordance with Equation (23) and its value

ξ_{j}^{B} (δ^{B})

is shown in Figure 34 for every metric. It can be observed that 6 metrics (

S N S n

,

S P C n

,

A C C n

,

M C C

,

B M a n d M K

) have a symmetric

p d f

; one metric (

G M n

) has a

p d f

slightly asymmetric but its asymmetry does not depend on

δ^{B}

; 2 metrics (

P R C n

and

N P V n

) have a clearly asymmetric

p d f

but their skewness is symmetric with respect to the origin; and finally, the

F_{1} n

metric has a

p d f

and a skewness that are both asymmetric.

Let us now examine the

μ_{j}^{B}

for all the values of the imbalance coefficient

δ^{B}

, that is,

μ_{j}^{B} (α^{B}, β^{B}, δ^{B})

and obtain its probability density function (pdf) which will be called global pdf (as it is obtained for every

δ^{B}

). The resulting

p d f (μ_{j}^{k})

is shown in Figure 35 for every metric.

It can be observed that all the metrics show a symmetric

p d f

except for

G M n

and

F_{1} n

. The global pdf for

G M n

maintains the slight asymmetry of local pdf (global skewness of 0.18) since

G M n

does not depend on

δ

. In the cases of

P R C n

and

N P V n,

the symmetry of the local skewness compensates for their values and hence they show a symmetric global pdf. Finally, the positive values of

F_{1} n

local skewness partially compensate for its negative values (see Figure 34), which results in an almost uniform global pdf except for their extreme values (global skewness of 0.14). These results are summarized in Table 14.

4. Discussion

From the previous results, summarized in Table 10, Table 13 and Table 14, it can be seen that although several thousands of combined transformations have been tested, the performance metrics only present three types of symmetries: under labelling inversion; under scoring inversion; and under full inversion (the sequence of labelling and scoring inversion).

For a certain performance metric to be symmetric under labelling inversion means that it pays attention to or focuses on, positive and negative classes with the same intensity and therefore classes can be exchanged without affecting the value of the metric. These metrics should be used in applications where the cost of misclassification is the same for each class. This is the case for 5 out of the 10 metrics tested:

A C C n

,

M C C

,

B M

,

M K and G M n

.

Other metrics, however, are more focused on the classification results obtained for the positive class. This is the case of 3 metrics:

S N S n

, which only depends on

α

;

P R C n,

which measures the ratio of success on the elements classified as positive; and the

F_{1}

score, which is a combination of

S N S n

and

P R C n

. These metrics found their main applications when the cost of misclassifying the positive class is higher than the cost of misclassifying the negative class, for instance, in the case of disease detection in medical diagnostics. Finally, other metrics are more focused on the classification results obtained for the negative class. This is the case of 2 metrics:

S P C n,

which only depends on

β

; and

N P V n,

which measures the ratio of success on the elements classified as negative. These 2 metrics are mainly applied if the most important issue is the misclassification of negative classes, for instance, in the case of identification of non-reliable clients in granting loans.

On the other hand, if a metric shows symmetric behaviour under scoring inversion it means that the good classifiers are positively scored to the same extent as bad classifiers are negatively scored. For instance, let us consider a first classifier which correctly classifies 80% of positive elements and also 70% of negative elements. Additionally, a second classifier obtains a ratio of 20% for positive and 30% for negative elements. A scoring-inversion symmetric-performance metric would have a value of, for example, +0.5 for the first classifier and a value of −0.5 for the second classifier. Therefore, the scoring symmetry indicates the relative importance assigned by the metric to the good and bad classifiers. This is the case for 6 out of the 10 metrics tested:

A C C n

,

M C C

,

B M

,

M K

,

G M n

,

S N S n and S P C n

. Conversely,

G M n

is more demanding as regards scoring good results than scoring bad results. This feature can be useful if the objective of the classification is focused on obtaining excellent results (and not just good results). Finally, on 3 of the metrics tested (

P R C n

,

N P V n a n d F_{1} n

), awarding good results differs from scoring bad results in that it depends on the relative values of the parameters (

α

,

β

and

δ

).

Additionally, it can be seen that metrics showing both labelling and scoring symmetries also show symmetry for the full inversion (concatenation of the two symmetries). This is the case for 4 out of the 10 metrics tested:

A C C n

,

M C C

,

B M and M K

. An interesting result is that for

P R C n

and

N P V n

, although they have no labelling nor scoring symmetry, they do have full inversion symmetry. This fact means that swapping the positive and negative class labels also inverts how the good and bad classifiers are scored. An example of all these symmetries can be found in Table 15.

A particular degenerate case of symmetry arises when a metric depends on none of the variables. For example, from the results obtained in this research, several metrics have shown themselves to be independent of the imbalance coefficient

δ

. This is the case for 4 out of the 10 metrics tested:

S N S n

,

S P C n

,

G M n

and

B M

. This is a particularly interesting result, since these metrics have no kind of bias if the classes are imbalanced. Conversely, the interpretation of classification metrics which do depend on

δ

should be carefully considered since they can be misleading as to what a good classifier is.

Additionally, some other metrics appear to be independent from the classification success ratios:

S N S n,

which only depends on

α

; and

S P C n,

which only depends on

β

. This can be interpreted as a sort of one-dimensionality of these metrics, that is,

S N S n

is only focused on the positive class, while

S P C n

is only concerned about the negative class.

On the other hand, the two pairs of cross-symmetries found can be straightforwardly interpreted: when the labelling of classes are inverted,

S N S n

becomes

S P C n

and

P R C n

becomes

N P V n

. Moreover, by exchanging the scoring procedure of good and bad classifiers,

P R C n

becomes

N P V n

.

Let us now focus on the interpretation of the results of statistical symmetries. Statistical local symmetry means that, for a certain dataset, that is, for a certain value of the imbalance coefficient, the probability that a random classifier obtains a good score is the same as the probability that it obtains a bad score. This is the case for 6 out of the 10 metrics tested:

A C C n

,

M C C

,

B M

,

M K

,

G M n

,

S N S n and S P C n

. They coincide with the metrics in that they have scoring symmetry, which shows that both concepts are closely related. Conversely,

G M n

has a greater probability of having a bad result than a good result, which is consistent with the fact that it is more demanding on obtaining excellent results (and not just good results). Additionally,

P R C n

obtains good results with a higher probability (lower probability in the case of

N P V n

) if the positive class is the majority class and vice versa if it is the minority class. Awarding good results differs from scoring bad ones in a way that depends on the relative values of the parameters (

α

,

β a n d δ

). Finally, in the case of balanced classes, the probability of obtaining good

F_{1} n

scores is greater than obtaining bad scores for, which shows some sort of indulgent judgment. However, the detailed behaviour of

F_{1} n

scores for different values of

δ

is more complex.

On the other hand, statistical global symmetry means that the probability that a random classifier operating on a random dataset obtains a good score is the same as obtaining a bad score. This is the case for 8 out of the 10 metrics tested:

A C C n

,

M C C

,

B M

,

M K

,

G M n

,

S N S n

,

S P C n

,

P R C n

and

N P V n

. Conversely,

G M n

and

F_{1} n

are more likely to have a bad result than a good result, which can be interpreted as meaning that they are slightly tough judges.

On considering all these results and their meanings, the ten metrics can be organized into 5 clusters that show the features described in Table 16.

In Table 16, the identification of clusters has been carried out by means of informal reasoning. To formalize these analyses, every metric has been described with a set of features corresponding to the columns in Table 16. Most of the columns are binary valued (yes or no), while others admit several values. For instance, labelling symmetry value can be yes, no,

S N S n - S P C n

cross-symmetry or

P R C n - N P V n

cross-symmetry. In these cases, a one-hot coding mechanism (also called 1-of-K scheme) is employed [39]. The result is that each metric is defined using a set of 14 features. Although regular or advanced clustering techniques can be used [40,41,42,43], the reduced number of elements in the dataset (10 performance metrics) invites to address the problem using more intuitive methods. Using Principal Component Analysis (PCA) [44], the problem can be reduced to a bi-dimensional plane and its result is depicted in Figure 36. The 5 clusters mentioned in this section clearly appear therein.

Another way to represent how performance metrics are grouped according to their symmetries is by drawing a dendrogram [45]. To this end, the 14 features are employed to characterize each performance metric. The distances between the metrics are then computed in the space of the

ℝ^{14}

features. These distances are employed to gauge how much the metrics are separated, as shown in Figure 37. Once again, this result is consistent with the 5 previously identified clusters.

5. Conclusions

Based on the results obtained in our analysis, it can be stated that the majority of the most commonly used classification performance metrics present some type of symmetry. We have identified 3 and only 3 types of symmetric behaviour: labelling inversion, scoring inversion and the combination of the two inversions. Additionally, several metrics have been revealed as being robust under imbalanced datasets, while others do not show this important feature. Finally two metrics has been identified as one-dimensional, in that they focus exclusively on the positive (sensitivity) or on the negative class (specificity). The metrics have been grouped into 5 clusters according to their symmetries.

Selecting one performance metric or another is mainly a matter of its application, depending on issues such as whether the dataset is balanced, misclassification has the same cost in either class and whether good scores should only be reserved for very good classification ratios. None of the studied metrics can be universally applied. However, according to their symmetries, two of these metrics appear especially worthy in general-purpose applications: the Bookmaker Informedness (

B M

) and the Geometric Mean (

G M

). Both of these metrics are robust under imbalanced datasets and treat both classes in the same way (labelling symmetry). The former metric (

B M

) also has scoring symmetry while the latter (

G M

) is slightly more demanding in terms of scoring good results over bad results.

In future research, the methodology for the analysis of symmetry developed in this paper can be extended to other classification performance metrics, such as those derived from multiclass confusion matrix or some ranking metrics (i.e. Receiver Operating Characteristic curve).

Author Contributions

A.L. conceived and designed the experiments; A.L., A.C., A.M. and J.R.L. performed the experiments, analysed the data and wrote the paper.

Funding

This research was funded by the Telefónica Chair “Intelligence in Networks” of the University of Seville.

Conflicts of Interest

The authors declare there to be no conflict of interest. The founding sponsors played no role: in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; and in the decision to publish the results.

References

Speiser, A. Symmetry in science and art. Daedalus 1960, 89, 191–198. [Google Scholar]
Wigner, E. The Unreasonable Effectiveness of Mathematics. In Natural Sciences–Communications in Pure and Applied Mathematics; Interscience Publishers Inc.: New York, NY, USA, 1960; Volume 13, p. 1. [Google Scholar]
Islami, A. A match not made in heaven: On the applicability of mathematics in physics. Synthese 2017, 194, 4839–4861. [Google Scholar] [CrossRef]
Siegrist, J. Symmetry in social exchange and health. Eur. Rev. 2005, 13, 145–155. [Google Scholar] [CrossRef] [Green Version]
Varadarajan, V.S. Symmetry in mathematics. Comput. Math. Appl. 1992, 24, 37–44. [Google Scholar] [CrossRef] [Green Version]
Garrido, A. Symmetry and Asymmetry Level Measures. Symmetry 2010, 2, 707–721. [Google Scholar] [CrossRef] [Green Version]
Xiao, Y.H.; Wu, W.T.; Wang, H.; Xiong, M.; Wang, W. Symmetry-based structure entropy of complex networks. Phys. A Stat. Mech. Appl. 2008, 387, 2611–2619. [Google Scholar] [CrossRef] [Green Version]
Magee, J.J.; Betke, M.; Gips, J.; Scott, M.R.; Waber, B.N. A human–computer interface using symmetry between eyes to detect gaze direction. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2008, 38, 1248–1261. [Google Scholar] [CrossRef]
Liu, Y.; Hel-Or, H.; Kaplan, C.S.; Van Gool, L. Computational symmetry in computer vision and computer graphics. Found. Trends Comput. Gr. Vis. 2010, 5, 1–195. [Google Scholar] [CrossRef]
Tai, W.L.; Chang, Y.F. Separable Reversible Data Hiding in Encrypted Signals with Public Key Cryptography. Symmetry 2018, 10, 23. [Google Scholar] [CrossRef]
Graham, J.H.; Whitesell, M.J.; II, M.F.; Hel-Or, H.; Nevo, E.; Raz, S. Fluctuating asymmetry of plant leaves: Batch processing with LAMINA and continuous symmetry measures. Symmetry 2015, 7, 255–268. [Google Scholar] [CrossRef]
Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: New York, NY, USA, 2006. [Google Scholar]
Top 10 Technology Trends for 2018: IEEE Computer Society Predicts the Future of Tech. Available online: https://www.computer.org/web/pressroom/top-technology-trends-2018 (accessed on 18 October 2018).
Brachmann, A.; Redies, C. Using convolutional neural network filters to measure left-right mirror symmetry in images. Symmetry 2016, 8, 144. [Google Scholar] [CrossRef]
Zhang, P.; Shen, H.; Zhai, H. Machine learning topological invariants with neural networks. Phys. Rev. Lett. 2018, 120, 066401. [Google Scholar] [CrossRef] [PubMed]
Luque, A.; Gómez-Bellido, J.; Carrasco, A.; Barbancho, J. Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks. Sensors 2018, 18, 1803. [Google Scholar] [CrossRef]
Romero, J.; Luque, A.; Carrasco, A. Anuran sound classification using MPEG-7 frame descriptors. In Proceedings of the XVII Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA), Granada, Spain, 23–26 October 2016. [Google Scholar]
Luque, A.; Romero-Lemos, J.; Carrasco, A.; Barbancho, J. Non-sequential automatic classification of anuran sounds for the estimation of climate-change indicators. Exp. Syst. Appl. 2018, 95, 248–260. [Google Scholar] [CrossRef]
Glowacz, A. Fault diagnosis of single-phase induction motor based on acoustic signals. Mech. Syst. Signal Process. 2019, 117, 65–80. [Google Scholar] [CrossRef]
Glowacz, A. Acoustic-Based Fault Diagnosis of Commutator Motor. Electronics 2018, 7, 299. [Google Scholar] [CrossRef]
Caruana, R.; Niculescu-Mizil, A. Data mining in metric space: An empirical analysis of supervised learning performance criteria. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004. [Google Scholar]
Ferri, C.; Hernández-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classification. Pattern Recognit. Lett. 2009, 30, 27–38. [Google Scholar] [CrossRef] [Green Version]
Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1. [Google Scholar]
Ting, K.M. Confusion matrix. In Encyclopedia of Machine Learning and Data Mining; Springer: Boston, MA, USA, 2017; p. 260. [Google Scholar]
Aly, M. Survey on multiclass classification methods. Neural Netw. 2005, 19, 1–9. [Google Scholar]
Tsai, M.F.; Yu, S.S. Distance metric based oversampling method for bioinformatics and performance evaluation. J. Med. Syst. 2016, 40, 159. [Google Scholar] [CrossRef]
García, V.; Mollineda, R.A.; Sánchez, J.S. Index of balanced accuracy: A performance measure for skewed class distributions. In Iberian Conference on Pattern Recognition and Image Analysis; Springer: Berlin/Heidelberg, Germany, 2009; pp. 441–448. [Google Scholar]
López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 2013, 250, 113–141. [Google Scholar] [CrossRef]
Daskalaki, S.; Kopanas, I.; Avouris, N. Evaluation of classifiers for an uneven class distribution problem. Appl. Artif. Intell. 2006, 20, 381–417. [Google Scholar] [CrossRef]
Amin, A.; Anwar, S.; Adnan, A.; Nawaz, M.; Howard, N.; Qadir, J.; Hussain, A. Comparing oversampling techniques to handle the class imbalance problem: A customer churn prediction case study. IEEE Access 2016, 4, 7940–7957. [Google Scholar] [CrossRef]
Jeni, L.A.; Cohn, J.F.; De La Torre, F. Facing imbalanced data--recommendations for the use of performance metrics. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, Geneva, Switzerland, 2–5 September 2013. [Google Scholar]
Powers, D.M. Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation; Technical Report SIE-07-001; School of Informatics and Engineering, Flinders University: Adelaide, Australia, 2011. [Google Scholar]
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta Protein Struct. 1975, 405, 442–451. [Google Scholar] [CrossRef]
Jurman, G.; Riccadonna, S.; Furlanello, C. A comparison of MCC and CEN error measures in multi-class prediction. PLoS ONE 2012, 7, e41882. [Google Scholar] [CrossRef] [PubMed]
Gorodkin, J. Comparing two K-category assignments by a K-category correlation coefficient. Comput. Biol. Chem. 2004, 28, 367–374. [Google Scholar] [CrossRef] [PubMed]
Flach, P.A. The geometry of ROC space: Understanding machine learning metrics through ROC isometrics. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003. [Google Scholar]
Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021. [Google Scholar]
Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Chakraborty, S.; Das, S. k—Means clustering with a new divergence-based distance metric: Convergence and performance analysis. Pattern Recognit. Lett. 2017, 100, 67–73. [Google Scholar] [CrossRef]
Wang, Y.; Lin, X.; Wu, L.; Zhang, W.; Zhang, Q.; Huang, X. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Process. 2015, 24, 3939–3949. [Google Scholar] [CrossRef]
Wang, Y.; Zhang, W.; Wu, L.; Lin, X.; Zhao, X. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 57–70. [Google Scholar] [CrossRef]
Wu, L.; Wang, Y.; Shao, L. Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval. IEEE Trans. Image Process. 2019, 28, 1602–1612. [Google Scholar] [CrossRef] [PubMed]
Jolliffe, I. Principal component analysis. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1094–1096. [Google Scholar]
Earle, D.; Hurley, C.B. Advances in dendrogram seriation for application to visualization. J. Comput. Gr. Stat. 2015, 24, 1–25. [Google Scholar] [CrossRef]

Figure 1. Node of the Wireless Sensor Network where the symmetry of classification performance metrics has been primarily applied.

Figure 2. Imbalance coefficient (solid blue line) and imbalance ratio (dashed green line) vs. the proportion of positive elements in the dataset.

Figure 3. 3D representation of a 4-dimension metric value

μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

. The value of the metric

μ_{j}^{k}

is colour-coded for every point in the

(α^{k}, β^{k}, δ^{k})

3D space.

Figure 3. 3D representation of a 4-dimension metric value

μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

. The value of the metric

μ_{j}^{k}

is colour-coded for every point in the

(α^{k}, β^{k}, δ^{k})

3D space.

Figure 4. Representation of a metric value

μ_{j}^{k} (α^{k}, β^{k})

for

δ = 0.75

. (a) Slice of the 3D graphic by a plane corresponding to

δ = 0.75

; (b) 2D representation of the slice.

Figure 4. Representation of a metric value

μ_{j}^{k} (α^{k}, β^{k})

for

δ = 0.75

. (a) Slice of the 3D graphic by a plane corresponding to

δ = 0.75

; (b) 2D representation of the slice.

Figure 5. Heat map of a metric value

μ_{j}^{k} (α^{k}, β^{k})

for

δ = 0.75

.

Figure 5. Heat map of a metric value

μ_{j}^{k} (α^{k}, β^{k})

for

δ = 0.75

.

Figure 6. Panel of heat maps representing the metric

μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

.

Figure 6. Panel of heat maps representing the metric

μ_{j}^{k} (α^{k}, β^{k}, δ^{k})

.

Figure 7. Transformation type

α

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

α = 0.5

.

Figure 7. Transformation type

α

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

α = 0.5

.

Figure 8. Transformation type

β

of a metric. (a) Baseline metric; (b) Reflection symmetry with respect to the hyperplane

β = 0.5

.

Figure 8. Transformation type

β

of a metric. (a) Baseline metric; (b) Reflection symmetry with respect to the hyperplane

β = 0.5

.

Figure 9. Transformation type

δ

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

δ = 0

.

Figure 9. Transformation type

δ

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

δ = 0

.

Figure 10. Transformation type

μ

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

μ = 0

.

Figure 10. Transformation type

μ

of a metric. (a) Baseline metric. (b) Reflection symmetry with respect to the hyperplane

μ = 0

.

Figure 11. Transformation type

σ

of a metric. (a) Baseline metric; (b) Reflection symmetry with respect to the hyperplane

α = β

.

Figure 11. Transformation type

σ

of a metric. (a) Baseline metric; (b) Reflection symmetry with respect to the hyperplane

α = β

.

Figure 12. Transformation by inverse labelling of classes (

T^{L}

). (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (c) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 12. Transformation by inverse labelling of classes (

T^{L}

). (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (c) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 13. Transformation by inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 13. Transformation by inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 14. Transformation by full inversion scoring (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

. (c) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

. (e) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 14. Transformation by full inversion scoring (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

. (c) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

. (e) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 15. Symmetric behaviour of performance metrics for any combined transformation.

Figure 16. Symmetry of accuracy with respect to inverse labelling (

T^{L}

). (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (c) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

.

Figure 16. Symmetry of accuracy with respect to inverse labelling (

T^{L}

). (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (c) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

.

Figure 17. Symmetry of accuracy with respect to the inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 17. Symmetry of accuracy with respect to the inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 18. Symmetry of accuracy with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (e) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (f) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 18. Symmetry of accuracy with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (e) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (f) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 19. Symmetry of precision with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (e) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (f) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 19. Symmetry of precision with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (e) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (f) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 20. Symmetry of geometric mean with respect to

T^{σ}

. (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

).

Figure 20. Symmetry of geometric mean with respect to

T^{σ}

. (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

).

Figure 21. Symmetry of bookmaker informedness with respect to

T^{σ}

. (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

).

Figure 21. Symmetry of bookmaker informedness with respect to

T^{σ}

. (a) Baseline metric; (b) Reflection symmetry with respect to the main diagonal (

T^{σ}

).

Figure 22. Symmetry of bookmaker informedness with respect to the inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 22. Symmetry of bookmaker informedness with respect to the inverse scoring (

T^{S}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (d) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 23. Symmetry of bookmaker informedness with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (c) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (e) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 23. Symmetry of bookmaker informedness with respect to the full inversion (

T^{F}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

; (c) Reflection symmetry with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry with respect to the plane

δ = 0

(

T^{δ})

; (e) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 24. Symmetry of sensitivity with respect to the combined transformation (

T^{α μ}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 24. Symmetry of sensitivity with respect to the combined transformation (

T^{α μ}

). (a) Baseline metric; (b) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

); (c) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 25. Symmetry of specificity with respect to the combined transformation (

T^{β μ}

) (a) Baseline metric; (b) Reflection symmetry with respect to the plane

β = 0

(

T^{β}

); (c) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 25. Symmetry of specificity with respect to the combined transformation (

T^{β μ}

) (a) Baseline metric; (b) Reflection symmetry with respect to the plane

β = 0

(

T^{β}

); (c) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 26. Cross-symmetric behaviour of performance metrics for any combined transformation.

Figure 27. Cross-symmetric behaviour for any combined transformation.

Figure 28. Cross-symmetry of the

P R C n - N P V n

pair with respect to the inverse labelling (

T^{L}

). (a) Baseline

P R C n

metric; (b) Baseline

N P V n

metric; (c) Reflection symmetry of

N P V n

with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry of

N P V n

with respect to the plane

δ = 0

(

T^{δ})

.

Figure 28. Cross-symmetry of the

P R C n - N P V n

pair with respect to the inverse labelling (

T^{L}

). (a) Baseline

P R C n

metric; (b) Baseline

N P V n

metric; (c) Reflection symmetry of

N P V n

with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry of

N P V n

with respect to the plane

δ = 0

(

T^{δ})

.

Figure 29. Cross-symmetry of the

P R C n - N P V n

pair with respect to the inverse scoring (

T^{S}

). (a) Baseline

P R C n

metric; (b) Baseline

N P V n

metric. (c) Reflection symmetry of

N P V n

with respect to the plane

α = 0

(

T^{α}

); (d) Reflection symmetry of

N P V n

with respect to the plane

β = 0

(

T^{β})

; (e) Reflection symmetry of

N P V n

with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 29. Cross-symmetry of the

P R C n - N P V n

pair with respect to the inverse scoring (

T^{S}

). (a) Baseline

P R C n

metric; (b) Baseline

N P V n

metric. (c) Reflection symmetry of

N P V n

with respect to the plane

α = 0

(

T^{α}

); (d) Reflection symmetry of

N P V n

with respect to the plane

β = 0

(

T^{β})

; (e) Reflection symmetry of

N P V n

with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 30. Cross-symmetry of the

S N S n - S P C n

pair with respect to the inverse labelling (

T^{L}

). (a) Baseline

S N S n

metric; (b) Baseline

S P C n

metric; (c) Reflection symmetry of

S P C n

with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry of

S P C n

with respect to the plane

δ = 0

(

T^{δ})

.

Figure 30. Cross-symmetry of the

S N S n - S P C n

pair with respect to the inverse labelling (

T^{L}

). (a) Baseline

S N S n

metric; (b) Baseline

S P C n

metric; (c) Reflection symmetry of

S P C n

with respect to the main diagonal (

T^{σ}

); (d) Reflection symmetry of

S P C n

with respect to the plane

δ = 0

(

T^{δ})

.

Figure 31. Cross-symmetry of the

S N S n - S P C n

pair with respect to the full inversion (

T^{F}

). (a) Baseline

S N S n

metric. (b) Baseline

S P C n

metric. (c) Reflection symmetry of

S P C n

with respect to the main diagonal (

T^{σ}

). (d) Reflection symmetry of

S P C n

with respect to the plane

δ = 0

(

T^{δ})

. (e) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

). (f) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

. (g) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 31. Cross-symmetry of the

S N S n - S P C n

pair with respect to the full inversion (

T^{F}

). (a) Baseline

S N S n

metric. (b) Baseline

S P C n

metric. (c) Reflection symmetry of

S P C n

with respect to the main diagonal (

T^{σ}

). (d) Reflection symmetry of

S P C n

with respect to the plane

δ = 0

(

T^{δ})

. (e) Reflection symmetry with respect to the plane

α = 0

(

T^{α}

). (f) Reflection symmetry with respect to the plane

β = 0

(

T^{β})

. (g) Reflection symmetry with respect to the plane

μ = 0

(colour inversion,

T^{μ}

).

Figure 32. Local probability density function of every metric and

δ = 0

.

Figure 32. Local probability density function of every metric and

δ = 0

.

Figure 33. Local probability density function of every metric as a function of

δ

. The value of pdf is colour coded.

Figure 33. Local probability density function of every metric as a function of

δ

. The value of pdf is colour coded.

Figure 34. Skewness of the statistical description for every metric as a function of

δ

.

Figure 34. Skewness of the statistical description for every metric as a function of

δ

.

Figure 35. Global probability density function of every metric and

δ = 0

.

Figure 35. Global probability density function of every metric and

δ = 0

.

Figure 36. Bi-dimensional representation of performance metrics according to their symmetries.

Figure 37. Dendrogram of performance metrics according to their symmetries.

Table 1. Summary of basic transformations.

Transformation	$α^{k}$	$β^{k}$	$δ^{k}$	$μ^{k}$
$α$	$1 - α^{B}$	$β^{B}$	$δ^{B}$	$μ^{B}$
$β$	$α^{B}$	$1 - β^{B}$	$δ^{B}$	$μ^{B}$
$δ$	$α^{B}$	$β^{B}$	$- δ^{B}$	$μ^{B}$
$μ$	$α^{B}$	$β^{B}$	$δ^{B}$	$- μ^{B}$
$σ$	$β^{B}$	$α^{B}$	$δ^{B}$	$μ^{B}$

Table 2. Example of the coding of combined transformations.

Transformation Code	$μ$	$σ$	$δ$	$β$	$α$
28	1	1	1	0	0

Table 3. Definition of classification performance metrics.

Symbol	Metric	Scoring
$S N S$	Sensitivity	$\frac{a}{a + f}$
$S P C$	Specificity	$\frac{b}{b + g}$
$P R C$	Precision	$\frac{a}{a + g}$
$N P V$	Negative Predictive Value	$\frac{b}{b + f}$
$A C C$	Accuracy	$\frac{a + b}{a + f + b + g}$
$F_{1}$	$F_{1} score$	$2 \frac{P R C \cdot S N S}{P R C + S N S}$
$G M$	Geometric Mean	$\sqrt{S N S \cdot S P C}$
$M C C$	Matthews Correlation Coefficient	$\frac{a \cdot b - g \cdot f}{\sqrt{(a + g) (a + f) (b + g) (b + f)}}$
$B M$	Bookmaker Informedness	$S N S + S P C - 1$
$M K$	Markedness	$P P V + N P V - 1$

Table 4. Symmetric transformations of

A C C n

,

M C C

and

M K

.

Table 4. Symmetric transformations of

A C C n

,

M C C

and

M K

.

Code	$μ$	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
12	0	1	1	0	0		$δ σ$
15	0	1	1	1	1	$α σ β (= σ)$ $β σ α (= σ)$	$δ$
19	1	0	0	1	1		$α β μ$
31	1	1	1	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$δ μ$

Table 5. Symmetric transformations of

P R C n

and

N P V n

.

Table 5. Symmetric transformations of

P R C n

and

N P V n

.

Code	$μ$	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
31	1	1	1	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$δ μ$

Table 6. Symmetric transformations of

G M n

.

Table 6. Symmetric transformations of

G M n

.

Code	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
4	0	1	0	0		$δ$
8	1	0	0	0		$σ$
11	1	0	1	1	$α σ β (= σ)$ $β σ α (= σ)$
12	1	1	0	0		$δ σ$
15	1	1	1	1	$α σ β (= σ)$ $β σ α (= σ)$	$δ$

Table 7. Symmetric transformations of

B M

.

Table 7. Symmetric transformations of

B M

.

Code	$μ$	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
4	0	0	1	0	0		$δ$
8	0	1	0	0	0		$σ$
11	0	1	0	1	1	$α σ β (= σ)$ $β σ α (= σ)$
12	0	1	1	0	0		$δ σ$
15	0	1	1	1	1	$α σ β (= σ)$ $β σ α (= σ)$	$δ$
19	1	0	0	1	1		$α β μ$
23	1	0	1	1	1		$α β δ μ$
27	1	1	0	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$μ$
31	1	1	1	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$δ μ$

Table 8. Symmetric transformations of sensitivity.

Code	$μ$	$δ$	$β$	$α$	Any Order
2	0	0	1	0	$β$
4	0	1	0	0	$δ$
6	0	1	1	0	$β δ$
17	1	0	0	1	$α μ$
19	1	0	1	1	$α β μ$
21	1	1	0	1	$α δ μ$
23	1	1	1	1	$α β δ μ$

Table 9. Symmetric transformations of specificity.

Code	$μ$	$δ$	$β$	$α$	Any Order
1	0	0	0	1	$α$
4	0	1	0	0	$δ$
5	0	1	0	1	$α δ$
18	1	0	1	0	$β μ$
19	1	0	1	1	$α β μ$
22	1	1	1	0	$β δ μ$
23	1	1	1	1	$α β δ μ$

Table 10. Summary of symmetries.

Metric	Independent of			Symmetry (under Inversion of)
Metric	$α$	$β$	$δ$	Labelling	Scoring	Full
$S N S n$		✓	✓		✓
$S P C n$	✓		✓		✓
$P R C n$						✓
$N P V n$						✓
$A C C n$				✓	✓	✓
$F_{1} n$
$G M n$			✓	✓
$M C C$				✓	✓	✓
$B M$			✓	✓	✓	✓
$M K$				✓	✓	✓

Table 11. Cross-symmetric transformations of the

P R C n - N P V n

pair.

Table 11. Cross-symmetric transformations of the

P R C n - N P V n

pair.

Code	$μ$	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
12	0	1	1	0	0		$δ σ$
15	0	1	1	1	1	$α σ β (= σ)$ $β σ α (= σ)$	$δ$
19	1	0	0	1	1		$α β μ$

Table 12. Cross-symmetric transformations of the

S N S n - S P C n

pair.

Table 12. Cross-symmetric transformations of the

S N S n - S P C n

pair.

Code	$μ$	$σ$	$δ$	$β$	$α$	Specific Order	Any Order
8	0	1	0	0	0		$σ$
9	0	1	0	0	1	$α σ$
10	0	1	0	1	0	$σ β$
11	0	1	0	1	1	$α σ β (= σ)$ $σ β α (= σ)$
12	0	1	1	0	0		$σ δ$
13	0	1	1	0	1	$α σ$	$δ$
14	0	1	1	1	0	$σ β$	$δ$
15	0	1	1	1	1	$α σ β (= σ)$ $σ β α (= σ)$	$δ$
25	1	1	0	0	1	$σ α$	$μ$
26	1	1	0	1	0	$β σ$	$μ$
27	1	1	0	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$μ$
29	1	1	1	0	1	$σ α$	$δ μ$
30	1	1	1	1	0	$β σ$	$δ μ$
31	1	1	1	1	1	$α β σ$ $β α σ$ $σ α β$ $σ β α$	$δ μ$

Table 13. Summary of cross-symmetries.

Metric	Cross-Symmetry (under Inversion of)
Metric	Labelling	Scoring	Full
$S N S n$	$S P C n$	$(S N S n)$	$S P C n$
$S P C n$	$S N S n$	$(S P C n)$	$S N S n$
$P R C n$	$N P V n$	$N P V n$	$(P R C n)$
$N P V n$	$P R C n$	$P R C n$	$(N P V n)$

Table 14. Summary of statistical symmetry.

Metric	Statistical Symmetry
Metric	Local	Global (Skewness)
$S N S n$	✓	✓
$S P C n$	✓	✓
$P R C n$		✓
$N P V n$		✓
$A C C n$	✓	✓
$F_{1} n$		(0.14)
$G M n$		(0.18)
$M C C$	✓	✓
$B M$	✓	✓
$M K$	✓	✓

Table 15. Examples of symmetric behaviour of metrics under several transformations (for balanced classes). Numbers in bold represent cases of asymmetric behaviour.

Metric	Baseline $α : 0.8$ $; β : 0.7$	Labelling Inversion $α : 0.7; β : 0.8$	Scoring Inversion $α : 0.2$ $; β : 0.3$	Full Inversion $α : 0.3$ $; β : 0.2$
$A C C n$	$0.500$	$0.500$	$- 0.500$	$- 0.500$
$M C C$	$0.503$	$0.503$	$- 0.503$	$- 0.503$
$B M$	$0.500$	$0.500$	$- 0.500$	$- 0.500$
$M K$	$0.505$	$0.505$	$- 0.505$	$- 0.505$
$G M n$	$0.497$	$0.497$	$- 0.510$	$- 0.510$
$S N S n$	$0.600$	$0.400$	$- 0.600$	$- 0.400$
$S P C n$	$0.400$	$0.600$	$- 0.400$	$- 0.600$
$P R C n$	$0.455$	$0.556$	$- 0.556$	$- 0.455$
$N P V n$	$0.556$	$0.455$	$- 0.455$	$- 0.566$
$F_{1} n$	$0.524$	$0.474$	$- 0.579$	$- 0.429$

Table 16. Summary of symmetric behaviour.

Cluster		Metric	Independent of			Symmetry (under Inversion of)			Statistical Symmetry
Cluster		Metric	$α$	$β$	$δ$	Labelling	Scoring	Full	Local	Global (Skewness)
I	a	$A C C n$				✓	✓	✓	✓	✓
		$M C C$				✓	✓	✓	✓	✓
		$M K$				✓	✓	✓	✓	✓
	b	$B M$			✓	✓	✓	✓	✓	✓
II		$S N S n$		✓	✓	$S P C n$	✓	$S P C n$	✓	✓
II		$S P C n$	✓		✓	$S N S n$	✓	$S N S n$	✓	✓
III		$P R C n$				$N P V n$	$N P V n$	✓		✓
III		$N P V n$				$P R C n$	$P R C n$	✓		✓
IV		$G M n$			✓	✓				(0.18)
V		$F_{1} n$								(0.14)

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luque, A.; Carrasco, A.; Martín, A.; Lama, J.R. Exploring Symmetry of Binary Classification Performance Metrics. Symmetry 2019, 11, 47. https://doi.org/10.3390/sym11010047

AMA Style

Luque A, Carrasco A, Martín A, Lama JR. Exploring Symmetry of Binary Classification Performance Metrics. Symmetry. 2019; 11(1):47. https://doi.org/10.3390/sym11010047

Chicago/Turabian Style

Luque, Amalia, Alejandro Carrasco, Alejandro Martín, and Juan Ramón Lama. 2019. "Exploring Symmetry of Binary Classification Performance Metrics" Symmetry 11, no. 1: 47. https://doi.org/10.3390/sym11010047

APA Style

Luque, A., Carrasco, A., Martín, A., & Lama, J. R. (2019). Exploring Symmetry of Binary Classification Performance Metrics. Symmetry, 11(1), 47. https://doi.org/10.3390/sym11010047

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Exploring Symmetry of Binary Classification Performance Metrics

Abstract

1. Introduction

2. Materials and Methods

2.1. Definitions

2.2. Representation of Metrics

2.3. Transformations

2.3.1. One-Dimensional Transformations

2.3.2. Multidimensional Transformations

2.3.3. Combined Transformations.

2.4. Performance Metrics

2.5. Exploring Symmetries

2.6. Statistical Symmetries

3. Results

3.1. Identifying Symmetries

3.2. Identifying Cross-Symmetries

3.3. Skewness of the Statistical Descriptions of the Metrics

4. Discussion

5. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI