Does Score Bias Correction Improve the Fusion of Classifiers?

Vergara, Luis; Salazar, Addisson

doi:10.3390/make7040151

by

Luis Vergara

and

Addisson Salazar

^*

Institute of Telecommunications and Multimedia Applications, Universitat Politècnica de València, Camino de Vera s/n, 46022 València, Spain

^*

Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2025, 7(4), 151; https://doi.org/10.3390/make7040151

Submission received: 17 October 2025 / Revised: 15 November 2025 / Accepted: 21 November 2025 / Published: 24 November 2025

(This article belongs to the Section Learning)


Abstract

We demonstrate that the potential bias in the scores generated by individual classifiers negatively affects their fusion. Consequently, we present an algorithm to improve the effectiveness of score fusion in classification. The algorithm corrects the score class conditional bias before fusion. The interest of the procedure is demonstrated theoretically, first in general terms and then considering exponential models for the score class conditional distributions. The case of beta distributions is also addressed using Monte Carlo simulations. Finally, a real-life application of fusion of two modalities (EEG, ECG) and two classifiers (Gaussian Bayes and Logistic Regression) is included, showing significant improvement with respect to conventional fusion without bias correction.

Keywords:

score fusion; bias correction; classification; fusion statistic; classifier fusion; multimodal fusion; unimodal fusion

1. Introduction

Fusion of scores is of interest whenever different entities collaborate to arrive at a common opinion on the occurrence of a given phenomenon. This is particularly relevant in the fusion of automatic classifiers where a score is to be assigned to each possible class [1,2,3,4,5,6,7]. Fused classifiers can correspond to one or several modalities. In the latter case, the term “multimodal fusion” is often used [8,9,10,11,12,13,14,15]. In the Bayesian approach, the scores can be considered posterior probabilities of each class, so they are restricted within the range 0–1 and must add up to 1. This imposes a relevant limitation on the score fusion framework, as explained below.

Let us consider the two-class problem

c = 0,1

. Given a feature vector

x

, a score

s

within (0, 1) is assigned to class

c = 1

, and a score

1 - s

is assigned to class

c = 0

. Therefore

s

can be considered an estimate of a binary random variable from a given feature vector

x

. A good estimator should provide scores close to 1 for feature vectors belonging to class

c = 1

and close to 0 for feature vectors belonging to class

c = 0

. From that perspective, notice that

s

is inevitably a conditionally biased estimate. This is immediately evident. Let

p (s / c = i)

denote the probability density function (PDF) of

s

conditional to class

c = i

. Moreover, conditional to class

c = i

, the true value of the binary random variable will be

i

. Then, let us compute the conditional biases

b_{0}

and

b_{1}

(for simplicity in the rest of the analysis, we define the bias so that both are positive).

\begin{aligned}
b_{0} &= E (s / c = 0) - 0 = \underbrace{\int_{0}^{1} s \cdot p (s / c = 0) \, d s}_{\geq 0} - 0 \Rightarrow 0 \leq b_{0} \leq 1 \\
b_{1} &= 1 - E (s / c = 1) = 1 - \underbrace{\int_{0}^{1} s \cdot p (s / c = 1) \, d s}_{\leq 1} \geq 0 \Rightarrow 0 \leq b_{1} \leq 1,
\end{aligned}

(1)

where the lower bounds only hold for perfect (unrealistic) estimators, i.e.,

p (s / c = i) = δ (s - i)

. These (inevitable) biases impose a limitation on classifier performance. Let us demonstrate it from a rather inverse perspective: if we could compensate the bias, the detector performance would improve. Before that, let us assume that we use a maximum a posteriori (MAP) criterion, where the most probable class is selected, i.e., as follows:

s \underset{c = 0}{\overset{c = 1}{\gtrless}} 1 - s \Leftrightarrow s \underset{c = 0}{\overset{c = 1}{\gtrless}} 0.5 .

(2)

This implicitly assumes that the scores are calibrated, that is, that P(c = 1/s) = s. However, there is no conceptual problem in implementing the bias correction method described in the following section with uncalibrated scores. In fact, the goal of the correction method is to improve accuracy (a “post-thresholding indicator”), whereas calibration refers to using the scores as probabilities of class membership (a “pre-thresholding indicator”). Therefore, calibration and correction can be complementary; for example, it will always be possible to calibrate the corrected scores.

Equation (2) defines the two-class classification problem as a detection problem, where class

c = 1

is detected when the statistic

s

is greater than the threshold 0.5 (the term detector will be frequently used throughout the article). As we see below, if we could correct the conditional bias

b_{0}

, the probability of false alarm

P F A = \int_{0.5}^{1} p (s / c = 0) d s

will decrease and if we could correct the conditional bias

b_{1}

, the probability of detection

P D = \int_{0.5}^{1} p (s / c = 1) d s

will increase. Correcting

b_{0}

involves a shift to the left (towards 0) of

p (s / c = 0)

, while correcting

b_{1}

involves a shift to the right (towards 1) of

p (s / c = 1)

. Let

P F A_{b}

and

P D_{b}

denote the corresponding probabilities after bias correction; then

\begin{aligned}
P F A_{b} &= \int_{0.5}^{1 - b_{0}} p (s + b_{0} / c = 0) \, d s \underset{s^{'} = s + b_{0}}{=} \int_{0.5 + b_{0}}^{1} p (s^{'} / c = 0) \, d s^{'} < \int_{0.5}^{1} p (s^{'} / c = 0) \, d s^{'} = P F A \\
P D_{b} &= \int_{0.5}^{1 + b_{1}} p (s - b_{1} / c = 1) \, d s \underset{s^{'} = s - b_{1}}{=} \int_{0.5 - b_{1}}^{1} p (s^{'} / c = 1) \, d s^{'} > \int_{0.5}^{1} p (s^{'} / c = 1) \, d s^{'} = P D .
\end{aligned}

(3)

Therefore, correcting the bias reduces

P F A

and increases

P D

, thus improving the detector performance. Unfortunately, direct correction of the conditional bias is an ill-posed problem. Certainly, we can compute sample estimates

{\hat{b}}_{0}

and

{\hat{b}}_{1}

from labeled training data. However, compensating for the bias of the sample under test requires knowledge of the true class because we have to subtract

{\hat{b}}_{0}

from the computed score if the true class is

c = 0

, but we must add

{\hat{b}}_{1}

if the true class is

c = 1

. But determining the true class of the sample under test is exactly the problem we are trying to solve from the very beginning. Furthermore, averaging individual scores from different detectors to obtain a score with reduced conditional bias is not an option because the individual biases are all positive. For example, it is well-known that by averaging

N

independent and identically distributed random variables (i.i.d.r.v.), we obtain a random variable whose variance is reduced by a factor

N

. However, the mean (and so the possible bias in an estimation context) remains the same. A more formal analysis of the conditional bias influence on the optimum fusion of classifiers is given in [16] for some specific models.
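The sample estimates mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conditional_biases` and the toy scores are hypothetical and do not come from the experiments reported in this article.

```python
from statistics import mean

def conditional_biases(scores, labels):
    """Sample estimates of the conditional biases defined in (1).

    b0_hat: mean score over class-0 samples (distance from the ideal value 0).
    b1_hat: 1 minus the mean score over class-1 samples (distance from 1).
    """
    s0 = [s for s, c in zip(scores, labels) if c == 0]
    s1 = [s for s, c in zip(scores, labels) if c == 1]
    return mean(s0), 1.0 - mean(s1)

# Toy labeled training scores (hypothetical values, for illustration only)
train_scores = [0.10, 0.20, 0.05, 0.60, 0.90, 0.70]
train_labels = [0, 0, 0, 1, 1, 1]
b0_hat, b1_hat = conditional_biases(train_scores, train_labels)
```

Both estimates are non-negative by construction, in agreement with the sign convention adopted in (1).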

Despite the above discouraging arguments, we propose in the following section an algorithm for conditional bias correction in the framework of score fusion. We first provide an intuitive (rather heuristic) rationale for the proposal. Then, a general formal analysis is carried out. The latter indicates that the expected improvement depends on the integral of the bivariate class conditional PDFs over a given area. So, in Section 3, the analytically tractable case of exponential PDFs is analyzed. Then, in Section 4, we consider the more general case of beta PDFs. The beta distribution is the most common model for random variables in (0, 1). Unfortunately, this case is analytically intractable, so we resort to Monte Carlo simulations. Finally, in Section 5, we evaluate the usefulness of the proposed algorithm in a real-world experiment, combining two biosignal modalities, electroencephalograms (EEGs) and electrocardiograms (ECGs), and two lightweight classifiers, Gaussian Bayes (GB) and Logistic Regression (LR), to improve sleep arousal detection.

2. The Algorithm for Conditional Bias Correction

Let us consider two individual classifiers defined, respectively, by the variable

m = 1,2

. Given a sample under test

x_{m}

, each classifier provides a score

0 \leq s_{m} \leq 1

, which, according to the previous section, represents the posterior probability of class

c = 1

, i.e.,

P (c = 1 / x_{m}) = s_{m}

, and hence

P (c = 0 / x_{m}) = 1 - s_{m}

. Every classifier implements the MAP test (2) with

s_{m}

; hence, the corresponding probabilities of false alarm and detection will be given by

\begin{aligned}
P F A_{m} &= \Pr (s_{m} > 0.5 / c = 0) = \int_{0.5}^{1} p (s_{m} / c = 0) \, d s_{m} \\
P D_{m} &= \Pr (s_{m} > 0.5 / c = 1) = \int_{0.5}^{1} p (s_{m} / c = 1) \, d s_{m} .
\end{aligned}

(4)

We will assume that

s_{1}

and

s_{2}

are i.i.d.r.v. to facilitate the analysis, so that we can write

\begin{aligned}
p (s_{1}, s_{2} / c = i) &= p (s_{1} / c = i) \cdot p (s_{2} / c = i) = p_{i} (s_{1}) p_{i} (s_{2}) \\
P F A_{1} &= P F A_{2} = P F A \\
P D_{1} &= P D_{2} = P D .
\end{aligned}

(5)

Let us start with the case of no bias correction. Scores are fused by computing the mean

\frac{1}{2} (s_{1} + s_{2})

, which is to be compared with the 0.5 threshold as in (2). For convenience, we prefer to consider the score

z = s_{1} + s_{2}, \quad 0 \leq z \leq 2

, and compare

z

with a threshold 1, which is an equivalent test. It is evident that if both original scores exceed the threshold of 0.5, their sum will exceed the threshold of 1. And vice versa if both individual scores are below the threshold of 0.5. Therefore, the possible improvement through fusion will occur when both classifiers make different decisions, such that the score of the one who makes a mistake is offset by the score of the one who makes a correct decision.

Now consider that before adding the scores we apply a conditional bias correction. Let us define the biases of every classifier as in (1):

b_{m 0} = E [s_{m} / c = 0] - 0 > 0, \quad b_{m 1} = 1 - E [s_{m} / c = 1] > 0

. From assumption (5), we may simplify to

b_{10} = b_{20} = b_{0}

and

b_{11} = b_{21} = b_{1}

. These values can be estimated during training from labeled samples, but for simplicity, we keep the notation

b_{0}

and

b_{1}

for the computed estimates. The bias correction is implemented as indicated by the following equations (Figure 1 depicts the general scheme):

s_{m}^{b} = \begin{cases} s_{m} - b_{0} & \text{if } s_{m} < 0.5 \\ s_{m} + b_{1} & \text{if } s_{m} > 0.5 \end{cases} .

(6)

This can be interpreted as accepting the class selected by the individual classifier as “true” to perform the corresponding correction of the conditional bias. Certainly, the individual classifier will provide both correct and incorrect decisions, depending on its own performance (essentially its operating point given by

P D_{m}

and

P {F A}_{m}

). The following analysis deduces the expected performance of the proposed method, implicitly incorporating the performances of the individual detectors.
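A minimal Python sketch of the correction rule (6) is given next. The function name `correct_score`, the handling of the boundary value 0.5 (which (6) leaves undefined), and the numerical values are illustrative assumptions.

```python
def correct_score(s, b0, b1, threshold=0.5):
    """Bias correction (6): the classifier's own decision is taken as 'true'.

    A score below the threshold is treated as class 0 and shifted down by b0;
    a score above the threshold is treated as class 1 and shifted up by b1.
    """
    if s < threshold:
        return s - b0
    if s > threshold:
        return s + b1
    return s  # assumption: a score exactly at the threshold is left unchanged

b0, b1 = 0.1, 0.3                      # illustrative bias estimates
scores = [0.2, 0.45, 0.55, 0.9]
corrected = [correct_score(s, b0, b1) for s in scores]
```

Note that corrected scores may fall outside (0, 1); they are used only as a fusion statistic, not as probabilities.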

First notice that working with the corrected scores by no means modifies the performance of the individual classifiers. This is because the bias is always positive, hence if

s_{m} < 0.5 \Rightarrow s_{m}^{b} = s_{m} - b_{0} < 0.5

and if

s_{m} > 0.5 \Rightarrow s_{m}^{b} = s_{m} + b_{1} > 0.5

. Consider now that the fusion consists of

z^{b} = s_{1}^{b} + s_{2}^{b}

- 2 b_{0} \leq z^{b} \leq 2 + 2 b_{1}

, and compare

z^{b}

with a threshold of 1. Again, if both original scores exceed the threshold of 0.5, then

z^{b} = s_{1} + s_{2} + 2 b_{1} > 1

. But if both original scores are below the threshold of 0.5, then

z^{b} = s_{1} + s_{2} - 2 b_{0} < 1

. Therefore, as with

z

, the potential improvement using

z^{b}

will be obtained in cases where individual classifiers make different decisions, i.e., the following:

z^{b} = s_{1} + s_{2} + b_{1} - b_{0} \quad \text{where} \quad \begin{cases} s_{1} > 0.5, \; s_{2} < 0.5, \text{ or} \\ s_{1} < 0.5, \; s_{2} > 0.5 \end{cases} .

(7)

Obviously, if

b_{1} = b_{0} \Rightarrow z^{b} = z

, i.e., the correction of biases does not contribute anything with respect to conventional fusion. However, let us assume that the class to be detected, as defined in (2), is the one with the most biased scores, i.e.,

b_{1} > b_{0} \Rightarrow z^{b} > z

. Therefore, if the correct decision in (7) is

c = 1

, bias correction will contribute to increasing the probability of detection. Certainly, if the correct decision is

c = 0

, bias correction will contribute to increasing the probability of false alarm. But, considering that

b_{1} > b_{0}

(the scores corresponding to

c = 1

are “worse” than the scores corresponding to

c = 0

), disagreement events of type (7) should be more likely when the true class is

c = 1

. It is then expected that the increase in the probability of detection will be more “significant” than the increase in the probability of false alarm. This rather heuristic argument encourages us to undertake a formal analysis of the procedure, as we do below.
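The effect of the correction on a disagreement event can be illustrated with a short Python sketch. The helper `fused_statistic` and the numerical values are hypothetical; setting both biases to zero reproduces conventional fusion.

```python
def fused_statistic(s1, s2, b0=0.0, b1=0.0):
    """Sum of the two scores after applying correction (6) to each of them.

    With b0 = b1 = 0 this is the conventional statistic z = s1 + s2.
    """
    def corrected(s):
        if s < 0.5:
            return s - b0
        if s > 0.5:
            return s + b1
        return s  # assumption: boundary scores are left unchanged
    return corrected(s1) + corrected(s2)

# Disagreement event: classifier 1 votes for c = 0, classifier 2 for c = 1
s1, s2 = 0.3, 0.6
b0, b1 = 0.1, 0.3                           # illustrative biases, b1 > b0

z = fused_statistic(s1, s2)                 # conventional fusion
z_b = fused_statistic(s1, s2, b0, b1)       # fusion with bias correction

decision_plain = int(z > 1.0)               # compare with threshold 1
decision_corrected = int(z_b > 1.0)
```

In this example the corrected statistic equals the conventional one plus b1 - b0, as in (7), which moves it across the threshold of 1 and changes the fused decision from class 0 to class 1.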

Let us call, respectively,

P D_{z}

and

P F A_{z}

the probability of detection and false alarm of

z = s_{1} + s_{2}

, and

P D_{z^{b}}

and

P F A_{z^{b}}

the probability of detection and false alarm of

z^{b} = s_{1}^{b} + s_{2}^{b}

. Our aim is to deduce how

P D_{z}

and

P F A_{z}

change when bias correction is implemented, i.e., to deduce the relation between

P D_{z^{b}}

and

P D_{z}

, as well as

P F A_{z^{b}}

and

P F A_{z}

. But, as already stated above

z^{b} > z

, which allows us to ensure that

P D_{z^{b}} > P D_{z}

and

P F A_{z^{b}} > P F A_{z}

. The increase in both probabilities will occur due to those cases in which both detectors make a different decision, being

z < 1

and

z^{b} > 1

. Let us express this conclusion in terms of

s_{1}

and

s_{2}

(we define

Δ = b_{1} - b_{0} \Rightarrow 0 \leq Δ \leq 1

).

\begin{aligned}
P D_{z^{b}} &= P D_{z} + 2 \Pr (s_{1} + s_{2} + Δ > 1, \; s_{1} + s_{2} < 1, \; s_{1} < 0.5, \; s_{2} > 0.5 / c = 1) \\
P F A_{z^{b}} &= P F A_{z} + 2 \Pr (s_{1} + s_{2} + Δ > 1, \; s_{1} + s_{2} < 1, \; s_{1} < 0.5, \; s_{2} > 0.5 / c = 0)
\end{aligned}

(8)

The factor 2 appears because the two events,

s_{1} < 0.5, s_{2} > 0.5,

and

s_{1} > 0.5, s_{2} < 0.5,

are equally probable, since we have assumed in (5) that

s_{1}

and

s_{2}

have the same class conditional PDFs. Figure 2 helps us to compute the required probabilities in (8). The shadowed area corresponds to the values

s_{1}

and

s_{2}

that satisfy the conditions in (8); therefore, we have to integrate the corresponding bivariate PDF in that area. From (5) and Figure 2, we can write

\begin{aligned}
P D_{z^{b}} &= P D_{z} + 2 \left( \int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} p_{1} (s_{1}) p_{1} (s_{2}) \, d s_{1} d s_{2} + \int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} p_{1} (s_{1}) p_{1} (s_{2}) \, d s_{1} d s_{2} \right), \qquad 0 \leq Δ \leq 0.5 \\
P D_{z^{b}} &= P D_{z} + 2 \int_{0}^{0.5} \int_{0.5}^{1 - s_{1}} p_{1} (s_{1}) p_{1} (s_{2}) \, d s_{1} d s_{2}, \qquad 0.5 \leq Δ \leq 1 \\
P F A_{z^{b}} &= P F A_{z} + 2 \left( \int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} p_{0} (s_{1}) p_{0} (s_{2}) \, d s_{1} d s_{2} + \int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} p_{0} (s_{1}) p_{0} (s_{2}) \, d s_{1} d s_{2} \right), \qquad 0 \leq Δ \leq 0.5 \\
P F A_{z^{b}} &= P F A_{z} + 2 \int_{0}^{0.5} \int_{0.5}^{1 - s_{1}} p_{0} (s_{1}) p_{0} (s_{2}) \, d s_{1} d s_{2}, \qquad 0.5 \leq Δ \leq 1 .
\end{aligned}

(9)

Thus, the potential improvement of fusion with bias correction depends on how the individual PDFs integrate inside the shadowed area of Figure 2. In principle, it is expected that the bivariate PDF for class

c = 0

would be concentrated in the lower left corner of the square in Figure 2, while the corresponding for class

c = 1

should be concentrated in the upper right corner. On the other hand, the greater the bias, the greater the occupancy of the bivariate PDF along the square, so the integral of

p_{1} (s_{1}) p_{1} (s_{2})

in the shadowed area should be larger than that of

p_{0} (s_{1}) p_{0} (s_{2})

. A more specific conclusion requires knowledge of

p_{1} (\cdot)

and

p_{0} (\cdot)

. Therefore, in the next section, we will consider conditional PDFs to be truncated exponentials. This choice is reasonably realistic and makes the analysis tractable.

Furthermore, in Section 4, we will assume the beta distribution. This is the most common distribution for random variables in the finite domain (0, 1). It covers a wide variety of distributions by adjusting two parameters. Unfortunately, a theoretical analysis for the beta distribution is not feasible, so we resort to Monte Carlo simulations.

3. The Exponential Distribution Case

Let us consider that

p_{0} (s_{m})

and

p_{1} (s_{m})

are, respectively, a truncated exponential and a transformed truncated exponential PDF in the interval (0, 1), namely the following:

p_{0} (s_{m}) = \begin{cases} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{m}} & 0 \leq s_{m} \leq 1 \\ 0 & \text{otherwise} \end{cases} \qquad p_{1} (s_{m}) = \begin{cases} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{m})} & 0 \leq s_{m} \leq 1 \\ 0 & \text{otherwise} \end{cases} .

(10)

These two exponential models, respectively, concentrate more probability as we approach

s_{m} = 0

from the right side or as we approach

s_{m} = 1

from the left side, which seems an expected property of reasonable classifiers. Moreover, according to (1)

b_{0} = E (s_{m} / c = 0) = \frac{1 - e^{- λ_{0}}}{λ_{0}}, \qquad b_{1} = 1 - E (s_{m} / c = 1) = \frac{1 - e^{- λ_{1}}}{λ_{1}} .

(11)

We have represented in Figure 3a two exponential distributions, where the blue one corresponds to a truncated exponential

p_{0} (s_{m})

with

λ_{0} = 10 \to b_{0} = 0.1

and the red one corresponds to a transformed truncated exponential

p_{1} (s_{m})

with

λ_{1} = 3.2 \to b_{1} = 0.3

. Note that the greater the bias, the slower the decay of the exponentials. Then in Figure 3b, we have represented the contour plots of

p_{0} (s_{1}) p_{0} (s_{2})

(blue) and

p_{1} (s_{1}) p_{1} (s_{2})

(red) considering the conditional PDFs of Figure 3a. We have also marked in Figure 3b the shadowed area from Figure 2 where both bivariate PDFs have to be integrated to calculate the increase in

P D_{z}

and

P F A_{z}

due to bias correction. Clearly, the distribution corresponding to the class with more bias expands more along the square, leading to a greater overlap with the shaded area. Therefore, greater increase in detection probability than in false alarm probability is to be expected. In order to obtain a more precise and quantitative assessment, we have calculated the integrals in the shaded area for this exponential case. Thus, we first calculate in Appendix A the probabilities corresponding to the individual detectors and to the fusion without bias correction. The expressions obtained are

\begin{aligned}
P D &= \frac{1 - e^{- 0.5 λ_{1}}}{1 - e^{- λ_{1}}}; \qquad P F A = e^{- 0.5 λ_{0}} \frac{1 - e^{- 0.5 λ_{0}}}{1 - e^{- λ_{0}}} \\
P D_{z} &= \frac{1 - e^{- λ_{1}} (1 + λ_{1})}{{(1 - e^{- λ_{1}})}^{2}}; \qquad P F A_{z} = \frac{e^{- λ_{0}} (λ_{0} - 1 + e^{- λ_{0}})}{{(1 - e^{- λ_{0}})}^{2}} .
\end{aligned}

(12)

Then, we also calculate in Appendix A the integrals required in (9) for the computation of

P D_{z^{b}}

and

P F A_{z^{b}}

.

\begin{aligned}
P D_{z^{b}} &= P D_{z} + 2 \left( \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (1 - e^{- λ_{1} Δ}) (0.5 - Δ) + \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} \left( Δ - \frac{1}{λ_{1}} (1 - e^{- λ_{1} Δ}) \right) \right), \qquad 0 \leq Δ \leq 0.5 \\
P D_{z^{b}} &= P D_{z} + 2 \left( \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} \left( 0.5 - \frac{1}{λ_{1}} (1 - e^{- 0.5 λ_{1}}) \right) \right), \qquad 0.5 \leq Δ \leq 1 \\
P F A_{z^{b}} &= P F A_{z} + 2 \left( \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (e^{λ_{0} Δ} - 1) (0.5 - Δ) + \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} \left( \frac{1}{λ_{0}} (e^{λ_{0} Δ} - 1) - Δ \right) \right), \qquad 0 \leq Δ \leq 0.5 \\
P F A_{z^{b}} &= P F A_{z} + 2 \left( \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} \left( \frac{1}{λ_{0}} (e^{0.5 λ_{0}} - 1) - 0.5 \right) \right), \qquad 0.5 \leq Δ \leq 1 .
\end{aligned}

(13)
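As a sanity check, the closed-form increments in (13) can be compared with a direct numerical integration of (9) under the models (10). The Python sketch below is only illustrative: the function names, the midpoint rule, and the chosen parameter values are assumptions. The inner integral over s_2 is evaluated with the CDF of (10), so only the integral over s_1 is approximated numerically.

```python
import math

def pdf_cdf(lam, cls):
    """PDF and CDF of the truncated-exponential models (10) on (0, 1)."""
    norm = 1.0 - math.exp(-lam)
    if cls == 0:
        pdf = lambda s: lam * math.exp(-lam * s) / norm
        cdf = lambda s: (1.0 - math.exp(-lam * s)) / norm
    else:
        pdf = lambda s: lam * math.exp(-lam * (1.0 - s)) / norm
        cdf = lambda s: (math.exp(-lam * (1.0 - s)) - math.exp(-lam)) / norm
    return pdf, cdf

def increment_numeric(lam, delta, cls, n=20000):
    """Twice the integral of the bivariate PDF over the region used in (9)."""
    pdf, cdf = pdf_cdf(lam, cls)
    h = 0.5 / n
    total = 0.0
    for k in range(n):                      # midpoint rule over 0 < s1 < 0.5
        s1 = (k + 0.5) * h
        lower = max(0.5, 1.0 - s1 - delta)  # s2 > 0.5 and s1 + s2 + delta > 1
        total += pdf(s1) * (cdf(1.0 - s1) - cdf(lower))
    return 2.0 * total * h

def increment_closed_form(lam, delta, cls):
    """Increase in PD (cls = 1) or PFA (cls = 0) according to (13)."""
    k = lam * math.exp(-lam) / (1.0 - math.exp(-lam)) ** 2
    d = min(delta, 0.5)                     # the region saturates at delta = 0.5
    if cls == 1:
        g = 1.0 - math.exp(-lam * d)
        return 2.0 * k * (g * (0.5 - d) + (d - g / lam))
    g = math.exp(lam * d) - 1.0
    return 2.0 * k * (g * (0.5 - d) + (g / lam - d))

# Illustrative parameters: lambda_1 = 3.2 (class 1), lambda_0 = 10 (class 0)
gap_pd = abs(increment_numeric(3.2, 0.2, 1) - increment_closed_form(3.2, 0.2, 1))
gap_pfa = abs(increment_numeric(10.0, 0.2, 0) - increment_closed_form(10.0, 0.2, 0))
```

Both gaps should be at the level of the numerical integration error if (13) is consistent with (9).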

Using the previous expressions, we can calculate the balanced accuracy

(B A)

, defined as the mean of the “sensitivity” (equivalent to probability of detection) and the “specificity” (equivalent to 1 minus the probability of false alarm), namely the following:

\begin{aligned}
B A &= \frac{1}{2} (P D + 1 - P F A) = 1 - \left( \frac{1}{2} (1 - P D) + \frac{1}{2} P F A \right) = 1 - P E \\
B A_{z} &= \frac{1}{2} (P D_{z} + 1 - P F A_{z}) = 1 - \left( \frac{1}{2} (1 - P D_{z}) + \frac{1}{2} P F A_{z} \right) = 1 - P E_{z} \\
B A_{z^{b}} &= \frac{1}{2} (P D_{z^{b}} + 1 - P F A_{z^{b}}) = 1 - \left( \frac{1}{2} (1 - P D_{z^{b}}) + \frac{1}{2} P F A_{z^{b}} \right) = 1 - P E_{z^{b}},
\end{aligned}

(14)

where

P E, P E_{z}

, and

P E_{z^{b}}

are the probabilities of error assuming equal priors

P_{0} = P_{1} = \frac{1}{2}

. Then, we show in Figure 3c the balanced accuracies for

b_{0} = 0.05

and

0.05 \leq b_{1} \leq 0.35 \Leftrightarrow 0 \leq Δ \leq 0.30

. As expected, the balanced accuracy decreases with increasing

Δ

, but is always higher when bias correction is applied. In fact, the improvement achieved compared to individual classifiers, as well as conventional fusion, is greater the larger

Δ

is.

4. The Beta Distribution Case

In the previous section, we carried out a theoretical study, made possible by the analytical tractability of the exponential distribution. However, the most common distribution for random variables in the finite domain (0, 1) is the beta distribution. This distribution is defined by two parameters,

α > 0

and

β > 0

, which allow for a wide variety of PDFs to be considered. Unfortunately, its analytical expression prevents a theoretical study like the previous one, so we will resort to Monte Carlo simulations to evaluate the effectiveness of the bias correction. Let us consider that

p_{0} (s_{m})

is a beta-PDF:

p_{0} (s_{m}) = \frac{s_{m}^{α_{0} - 1} {(1 - s_{m})}^{β_{0} - 1}}{Β (α_{0}, β_{0})}

where

Β (α_{0}, β_{0})

is the beta-function, and that

p_{1} (s_{m}) = \frac{{(1 - s_{m})}^{α_{1} - 1} s_{m}^{β_{1} - 1}}{Β (α_{1}, β_{1})}

, which also corresponds to a beta-PDF but with interchanged parameters. Among the wide variety of beta-PDFs, we have selected some that seem appropriate for the case at hand. This was carried out by first setting a value

α_{0} = α_{1} = α

in

p_{0} (s_{m})

and

p_{1} (s_{m})

, and then calculating the values

β_{0}

and

β_{1}

which, respectively, give biases

b_{0}

and

b_{1}

. Notice that

β_{i} = α_{i} \frac{1 - b_{i}}{b_{i}}

. Thus, in Figure 4a(i–iii), we have represented beta-PDFs, respectively, corresponding to

α = 0.4

,

α = 0.7

, and

α = 1.0

. In all three cases, the values

β_{0}

and

β_{1}

have been calculated to give biases

b_{0} = 0.1

and

b_{1} = 0.3

. Note that each

α

parameter defines a different type of PDF, in any case appropriate to concentrate more probability to the left (

p_{0} (s_{m})

) or to the right (

p_{1} (s_{m})

) of the threshold of 0.5. As in Figure 3b, we have represented in Figure 4b(i–iii) the contour plots of

p_{0} (s_{1}) p_{0} (s_{2})

(blue) and

p_{1} (s_{1}) p_{1} (s_{2})

(red) considering, respectively, the conditional PDFs of Figure 4a(i–iii). We have marked in Figure 4b(i–iii) the shadowed area from Figure 2 where both bivariate PDFs have to be integrated to calculate the increase in

P D_{z}

and

P F A_{z}

due to bias correction. Once again, the distribution corresponding to the class

c = 1

expands more along the square, leading to a greater overlap with the shaded area. Then, we performed Monte Carlo simulations for the three

α

values, a unique value

b_{0} = 0.05

for class

c = 0

and a range of values

0.05 \leq b_{1} \leq 0.35

for class

c = 1

, so that

0 \leq Δ \leq 0.30

. We have generated 10,000 samples from every required beta-PDF. Then, we have computed the balanced accuracies using (14), considering the MAP detector (2) for the individual scores, for the sum of the original scores and for the sum of the bias-corrected scores. Correction was implemented as indicated in (6) with sample estimates of the conditional biases

b_{0}

and

b_{1}

. The results are shown in Figure 4c(i–iii). Similarly to the exponential case, an increase in bias reduces the balanced accuracy, while correcting for bias always produces the best result, with the relative improvement increasing with

Δ

.
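A simulation of this kind can be sketched in Python as follows. This is not the code used to produce Figure 4; the function names, the random seed, the use of a separate training sample for the bias estimates, and the default parameters (α = 1, b_0 = 0.05, b_1 = 0.30) are illustrative assumptions.

```python
import random

def sample_scores(alpha, bias, cls, n, rng):
    """Draw n scores with conditional bias `bias` from the beta models above."""
    beta = alpha * (1.0 - bias) / bias           # beta_i = alpha_i (1 - b_i) / b_i
    u = [rng.betavariate(alpha, beta) for _ in range(n)]   # E[u] = bias
    return u if cls == 0 else [1.0 - x for x in u]         # class 1 is mirrored

def correct(s, b0, b1):
    """Bias correction (6); a score equal to 0.5 is left unchanged (assumption)."""
    return s - b0 if s < 0.5 else s + b1 if s > 0.5 else s

def simulate(alpha=1.0, b0=0.05, b1=0.30, n=10_000, seed=0):
    rng = random.Random(seed)
    # Training sample: sample estimates of the conditional biases
    b0_hat = sum(sample_scores(alpha, b0, 0, n, rng)) / n
    b1_hat = 1.0 - sum(sample_scores(alpha, b1, 1, n, rng)) / n
    # Test sample: two independent classifiers per class
    test = {
        0: (sample_scores(alpha, b0, 0, n, rng), sample_scores(alpha, b0, 0, n, rng)),
        1: (sample_scores(alpha, b1, 1, n, rng), sample_scores(alpha, b1, 1, n, rng)),
    }

    def balanced_accuracy(rule):
        pfa = sum(rule(x, y) for x, y in zip(*test[0])) / n
        pd = sum(rule(x, y) for x, y in zip(*test[1])) / n
        return 0.5 * (pd + 1.0 - pfa)            # definition (14)

    return {
        "single": balanced_accuracy(lambda x, y: x > 0.5),
        "sum": balanced_accuracy(lambda x, y: x + y > 1.0),
        "sum_corrected": balanced_accuracy(
            lambda x, y: correct(x, b0_hat, b1_hat) + correct(y, b0_hat, b1_hat) > 1.0
        ),
    }

results = simulate()
```

Calling `simulate` over a grid of `b1` values, with `b0` fixed, gives curves of the same type as those discussed above.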

5. A Real Data Case

This section is devoted to evaluating the effectiveness of conditional bias correction in an automatic biomedical signal analysis problem to facilitate diagnosis. Two signal modalities and two classification methods will be considered. As we will see, this will open up the opportunity for 10 possible options, including unimodal classifiers and fusion of scores from two different modalities and/or two different classification methods.

The goal is to develop an automatic classifier for arousals [17,18] during sleep, as their frequency is correlated with the presence of apnea and epilepsy. An expert physician usually performs this classification manually, through a thorough analysis of so-called polysomnograms: sets of different types of biomedical signals recorded while monitoring the patient’s sleep. The record of arousal detections is called a hypnogram. It is of obvious interest to use automatic hypnograms to alleviate the tedious and subjective task of the physician.

The polysomnograms were obtained from the public database Physionet [19]. In this experiment, we consider the EEG and ECG signals recorded synchronously during patient sleep. Then, 8 EEG features and 12 ECG features were obtained, respectively, from the EEG and ECG signals in non-overlapping intervals of 30 s. A total of 10 patients were considered. The number of intervals varies for every patient, ranging from 749 (6 h and 14 min) to 925 (7 h and 42 min). The eight EEG features were the powers in the frequency bands delta (0–4 Hz), theta (5–7 Hz), alpha (8–12 Hz), sigma (13–15 Hz), and beta (16–30 Hz), and three Hjorth parameters: activity, mobility, and complexity. These features are routinely used in the analysis of EEG signals [20,21,22]. The 12 ECG features were the following: autoregressive coefficients (4), Shannon entropy of the maximal overlap discrete wavelet packet transform at level four (4), and multiscale wavelet variance estimates up to fourth order using a Daubechies wavelet (4). More details on how to compute these features may be found in [23,24,25,26].

In this problem, class 1 corresponds to arousal and class 0 corresponds to any of the other possible sleep stages. We have considered two different classifiers. The first one is the Gaussian Bayes (GB), a generative classifier which assumes general Gaussian models for both

p (x / c = 1)

and

p (x / c = 0)

, then the scores are calculated using

s = \frac{e^{- \frac{1}{2} ({(x - μ_{1})}^{T} C_{1}^{- 1} (x - μ_{1}))}}{e^{- \frac{1}{2} ({(x - μ_{1})}^{T} C_{1}^{- 1} (x - μ_{1}))} + e^{- \frac{1}{2} ({(x - μ_{0})}^{T} C_{0}^{- 1} (x - μ_{0}))}},

(15)

where

μ_{0}, μ_{1}, C_{0}, C_{1}

are the mean vectors and covariance matrices of the feature vectors, which have been estimated from the training data using maximum likelihood (ML) estimates. The second one is Logistic Regression, a discriminative method which computes the score using

s = \frac{1}{1 + e^{- w^{T} x}} .

(16)

There are different options for fitting the coefficient vector

w

; in our case, we opted for a closed form solution by solving an overdetermined system of linear equations. There exists an equation of the form

w^{T} x = \nabla

for each instance

x

of the training set, where

\nabla

is a large positive number if

x

belongs to

c = 1

or a large (magnitude) negative number if

x

belongs to

c = 0

. The classifiers were trained separately for every patient using the first half of the respective EEG or ECG recordings. Training includes the calculation of the conditional bias estimates

{\hat{b}}_{0}

and

{\hat{b}}_{1}

required by the proposed correction algorithm (5). The second half was then used for testing. The scores of all patients were grouped by class. Thus, each classifier generated a total of 6706 class 0 scores and 1708 class 1 scores.

We have considered ten options. The first four correspond to one modality–one classifier. The next two also consider one modality but the two classifiers are fused. Finally, the last four correspond to the fusion of the two modalities, either with the same classifier or a different one. Fusion is always implemented by simply computing the mean score.

Thus, Table 1 shows the results of the 10 options (numbered from 1 to 10) without applying bias correction. The conditional bias is shown as well as the balanced accuracy. Notice the bias is significantly higher for class 1 and that fusion does not reduce it (in fact, the bias after fusion is the mean bias). Also notice that fusion of GB + LR for one modality does not improve over using a single classifier for one modality. This is because of the high correlation between the scores of the same modality, as discussed later. The best balanced accuracy in Table 1 is obtained by option 9, which is the fusion of EEG GB + ECG LR, although the 83.02% obtained is only slightly greater than the 82.55% of the single modality EEG + GB.

Then, Table 2 shows the results corresponding to conditional bias correction. Notice that bias is clearly reduced in all options. The mean factor of reduction is 3 in class 0 and 2.5 in class 1. On the other hand, the balanced accuracy stays the same for the first four options, as bias correction does not affect it when operating with individual detectors. However, the balanced accuracy increases in each of the six fusion options when compared with the corresponding option without bias correction. The best balanced accuracy is provided by option 10, which is very close to option 7. An improvement of 4.63% is achieved compared to the best case of Table 1.

To achieve statistical validation, we have repeated the previous experiments 100 times. In each repetition, a partition of the data set into two distinct halves was considered. In this way, we have obtained the mean and standard deviation of all the values shown in Table 1 and Table 2. The mean values are similar to the data in Table 1 and Table 2. The standard deviations never exceed 3% of the mean values.

We also performed significance tests on the balanced accuracies obtained with bias correction in options 5 to 10, characterizing the distribution under the null hypothesis with the 100 balanced accuracy values obtained with the corresponding option without bias correction. In all cases, the p-value was less than 0.05, so the null hypothesis is rejected, and we can consider the improvements in balanced accuracy from bias correction to be statistically significant.
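One possible reading of this test is an empirical one-sided p-value computed against the null sample. The Python sketch below follows that reading; the function name, the add-one correction, and the synthetic values are assumptions and do not reproduce the reported results.

```python
def empirical_p_value(null_values, observed):
    """One-sided empirical p-value of `observed` against a null sample.

    Counts the null values at least as large as the observed one; the add-one
    correction keeps the estimate strictly positive for a finite sample.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Synthetic example: 100 balanced accuracies without correction (null sample)
null_sample = [0.80 + 0.0001 * i for i in range(100)]
observed_with_correction = 0.85          # hypothetical corrected result
p_value = empirical_p_value(null_sample, observed_with_correction)
reject_null = p_value < 0.05
```

With 100 repetitions, the smallest p-value this estimator can return is 1/101.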

For better visualization, we have represented all the balanced accuracies from Table 1 and Table 2 together in Figure 5. It is clear that fusion, in general, improves on the use of a single detector; furthermore, the fusion of modalities improves on the fusion of methods, and fusion with bias correction improves on fusion without bias correction.

The improved results of modality fusion can be explained in terms of the correlation between the fused scores. Note that in the previous theoretical developments, as well as in the simulations, we assumed uncorrelated scores. However, in real-world applications, a certain degree of correlation can be expected, which will result in some deterioration in performance. It is expected that the correlation level between scores from different modalities should be lower than that between scores from different methods applied to the same modality. This is because, if the modality is the same, the feature vector at the detector input will be the same regardless of the method used, resulting in a high correlation between the output scores. Changing the modality will reduce the output score correlation. This is shown in Table 3, where the correlation coefficients between scores for class 0 and class 1 are indicated. The minimum correlation coefficient in both classes corresponds to EEG-LR and ECG-GB, which coincides with the maximum balanced accuracy in Table 2.

In the experiments with real data in this section, as well as in the analyses and simulations in Section 3 and Section 4, we have always considered the MAP detector (threshold 0.5). This is convenient for potential extensions to the multi-class case, but it is not essential in the two-class case, where we could vary the threshold to adjust different operating points. In Table 4, we reproduce the balanced accuracy results obtained with different thresholds for the 10 options from Table 1 and Table 2. A practical range of thresholds between 0.4 and 0.7 was selected, avoiding excessive false alarms for lower thresholds and low detectability for higher thresholds. We observe that the results maintain the improvement from bias correction and are consistent with what was observed in the previous results.

Thus, bias corrections do not affect individual detectors, while fusion with bias correction always provides improvement compared to fusion without correction. It is also observed that the relative improvement between correction and no correction increases as the threshold rises. This is explainable since the class with the highest bias was chosen as the class to detect, which means that corrections tend to raise the score values, with the impact of the correction becoming greater as the threshold increases.
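The threshold sweep can be sketched as follows. Balanced accuracy is taken here as the average of the probability of detection and one minus the probability of false alarm, which matches the tabulated values (for option 1 in Table 1, (70.86 + 100 − 5.76)/2 = 82.55). The function name and the toy data are illustrative assumptions.

```python
import numpy as np

def balanced_accuracy(scores, labels, threshold=0.5):
    """0.5 * (PD + 1 - PFA) for the detector 'score > threshold'."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    decisions = scores > threshold
    pd = decisions[labels == 1].mean()   # probability of detection
    pfa = decisions[labels == 0].mean()  # probability of false alarm
    return 0.5 * (pd + (1.0 - pfa))

# Toy scores: two class-0 samples followed by two class-1 samples.
toy_scores = [0.10, 0.45, 0.55, 0.90]
toy_labels = [0, 0, 1, 1]
for thr in (0.40, 0.50, 0.60):
    print(thr, balanced_accuracy(toy_scores, toy_labels, thr))
```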

6. Discussion

Let us discuss in terms of first principles. The optimum detector given

s_{1}

and

s_{2}

requires computing the likelihood ratio

\frac{p_{1} (s_{1}, s_{2})}{p_{0} (s_{1}, s_{2})}

, which is to be compared with a given threshold. Unfortunately, most of the time this is of theoretical interest only, because the likelihood ratio is rarely known in practice. Simple rules such as computing the mean score (the one considered in this work) are much more useful, yet reasonably well grounded if the scores are regarded as posterior probabilities of class

c = 1

. However, it is tempting to incorporate into the fusion rule any knowledge about the conditional PDFs that can be learned during training. Notice that, assuming independence, we have

p_{i} (s_{1}, s_{2}) = p_{i} (s_{1}) p_{i} (s_{2})

, and biases

b_{0}

and

b_{1}

are respective statistics of

p_{0} (s_{1})

and

p_{1} (s_{1})

that are estimated during training and incorporated in the proposed method to improve the mean rule. Essentially, the method helps to resolve the situation in which both detectors make different decisions and the mean is below the threshold. This is achieved by increasing the value of the fusion statistic (the sum in our case) by the difference between the bias of the class to be detected and the bias of the other class. By defining the class to be detected as the one with the greatest bias, the difference will always be positive, which will increase both the probability of detection and the probability of false alarm. However, it is expected that the increase in detection probability will be more significant than the increase in false alarm probability, due to the fact that class

c = 1

has a greater bias, which implies uncorrected scores closer to the 0.5 threshold than those of class

c = 0

.
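The mechanism described above can be sketched in a few lines. The sketch assumes that each detector adds b1 to its score when its own decision is class 1 (score above 0.5) and subtracts b0 otherwise, with no clipping to [0, 1]. This reading is inferred from the discussion, and it reproduces the corrected biases of Table 2 for option 1: 0.291·(1 − 0.7086) + 0.067·(1 − 0.7086) ≈ 0.104 for class 1 and 0.0576·(0.067 + 0.291) ≈ 0.020 for class 0. All function names are hypothetical.

```python
import numpy as np

def estimate_biases(train_scores, train_labels):
    """Conditional biases: b0 = E[s | c=0], b1 = E[1 - s | c=1]."""
    s = np.asarray(train_scores, dtype=float)
    y = np.asarray(train_labels)
    b0 = float(np.mean(s[y == 0]))        # ideal class-0 score is 0
    b1 = float(np.mean(1.0 - s[y == 1]))  # ideal class-1 score is 1
    return b0, b1

def correct_scores(scores, b0, b1, threshold=0.5):
    """Correct each score using the detector's own decision as the class."""
    s = np.asarray(scores, dtype=float)
    return np.where(s > threshold, s + b1, s - b0)

def fuse_mean(*score_arrays):
    """Mean-rule fusion of any number of score arrays."""
    return np.mean(np.stack(score_arrays), axis=0)

# Two detectors disagree and the uncorrected mean is below 0.5.
b0, b1 = 0.1, 0.3
s1, s2 = np.array([0.7]), np.array([0.2])
plain = fuse_mean(s1, s2)
corrected = fuse_mean(correct_scores(s1, b0, b1), correct_scores(s2, b0, b1))
print(plain > 0.5, corrected > 0.5)
```

In the disagreement case the sum of the two corrected scores grows by b1 − b0, which is what moves the fused statistic across the threshold in this example.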

An interesting point to discuss is that the analyses and simulations in Section 2, Section 3 and Section 4 assumed independence between the scores to be fused. To overcome this limitation, we should incorporate a joint probability density model of the scores. Unfortunately, this could make the necessary analysis and/or simulations intractable. Notice also that the chosen dependence model would still be an approximation, not necessarily appropriate for a real-world problem. Recently, it has been formally shown that correlation is one of the factors that can reduce the effectiveness of the fusion [16], so, in this sense, we can consider that the conclusions of Section 2, Section 3 and Section 4 set an upper limit on the performance achievable with the proposed method. Notice that in the real-world application of Section 5, the scores to be fused exhibit relatively high correlation values (Table 3), yet the bias correction proved effective, especially when the correlation is lower, as expected. Clearly, it would be valuable to investigate methods that reduce the correlation between the scores to be fused. For example, if different classifiers are involved in the fusion, different training subsets could be used for each classifier. Similarly, subsets recorded at different times for different modalities could be used for training. In any case, it would be advisable to analyze whether the improvement due to decorrelation outweighs the degradation due to the smaller effective training size that results from partitioning the whole training set into subsets. Another approach to the dependence problem could be to devise more complex fusion methods that account for the possible correlation to obtain optimum solutions [16,27].

A line of research that also deserves further investigation concerns extending the method to the multi-class case. This extension is not always obvious in the field of machine learning. Frequently, the extension is carried out by decomposing the multi-class problem into a number of two-class problems. This approach can also be applied here, with everything considered in Section 2, Section 3 and Section 4 being directly applicable. Nevertheless, a global multi-class approach is always desirable. To this end, note that in the multi-class problem, each classifier generates as many scores as there are classes, which must satisfy the constraint of summing to 1. From training data, it is possible to determine the conditional biases of each of these scores, keeping in mind that the value to be estimated is 1 for the correct class and 0 for the rest of the classes. Corrections would be made as in the two-class case: the class with the highest score is considered to be the “correct” one. One point to consider is that the corrected scores should sum to 1. We could, for example, correct all scores except one, and adjust the uncorrected score so that they all sum to 1. In any case, extending the analyses in Section 3 and Section 4 to the multi-class case is not straightforward and requires significant additional effort.
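The following is a speculative sketch of the "correct all scores except one" idea for the multi-class case, not a method taken from the paper. It assumes that the uncorrected score is the one of the presumed class (the largest score), that off-class biases are estimated as mean scores conditioned on the true class, and that no clipping to [0, 1] is applied. The function names and toy data are hypothetical.

```python
import numpy as np

def estimate_multiclass_biases(train_scores, train_labels, n_classes):
    """B[k, j]: mean of score j over training samples whose true class is k.

    For j != k the ideal score is 0, so this mean is the conditional bias.
    The diagonal is zeroed because the presumed-class score is adjusted
    afterwards through the sum-to-one constraint instead.
    """
    scores = np.asarray(train_scores, dtype=float)
    labels = np.asarray(train_labels)
    biases = np.zeros((n_classes, n_classes))
    for k in range(n_classes):
        biases[k] = scores[labels == k].mean(axis=0)
        biases[k, k] = 0.0
    return biases

def correct_multiclass(scores, biases):
    """Subtract off-class biases, then reset the presumed-class score."""
    scores = np.asarray(scores, dtype=float)
    presumed = scores.argmax(axis=1)       # each row's own decision
    corrected = scores - biases[presumed]  # correct all off-class scores
    rows = np.arange(scores.shape[0])
    corrected[rows, presumed] = 0.0
    corrected[rows, presumed] = 1.0 - corrected.sum(axis=1)
    return corrected

train = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
train_labels = [0, 0, 1, 2]
B = estimate_multiclass_biases(train, train_labels, n_classes=3)
out = correct_multiclass([[0.5, 0.3, 0.2]], B)
print(out)
```

Because the training scores of each sample sum to 1, the off-class biases for a given true class add up to the bias of that class's own score, so resetting the presumed-class score through the constraint raises it by exactly that amount.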

Other possible lines of interest involve making a “soft” correction, for example, proportional to the score, thus taking into account the reliability of the decision made by the individual detector on the correct class.

Finally, other real-world data applications could be topics of future research.

7. Conclusions

We have shown that conditional bias is always greater than zero and that bias correction improves accuracy. However, we face an ill-posed problem since, strictly speaking, conditional bias correction requires prior knowledge of the true class. Nevertheless, we have shown that corrections can be made to the scores provided by each individual detector before fusion, using its own detections as true. Conditions for improvement have been derived theoretically, first in general and then for the exponential distribution case. The interest of the method has also been demonstrated for the versatile beta distribution case, resorting to Monte Carlo simulations. The beta distribution is particularly well suited for normalized random variables and allows for very different shapes of the class conditional PDF. Finally, a real data case was considered. We observed that the proposed method provides the best results, despite the different (more realistic) conditions of the real data with respect to the simplifying assumptions of the prior analysis.

Author Contributions

Conceptualization, L.V. and A.S.; methodology, L.V. and A.S.; software, L.V. and A.S.; validation, A.S. and L.V.; formal analysis, L.V.; investigation, L.V. and A.S.; resources, A.S.; data curation, A.S.; writing—original draft preparation, L.V.; writing—review and editing, A.S. and L.V.; supervision, L.V.; project administration, L.V.; funding acquisition, L.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Generalitat Valenciana, Spain under Grant CIPROM/2022/20 and Agencia Estatal de Investigación, Spain under Grant PID2024-161353OB-I00.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Let us first calculate the

P D

and

P F A

corresponding to the individual classifiers

P D = \Pr (s_{m} > 0.5 / c = 1) = \int_{0.5}^{1} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{m})} d s_{m} = \frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}} ({\frac{e^{λ_{1} s_{m}}}{λ_{1}}|}_{0.5}^{1}) = \frac{e^{- λ_{1}}}{1 - e^{- λ_{1}}} (e^{λ_{1}} - e^{λ_{1} 0.5}) = \frac{1 - e^{- λ_{1} 0.5}}{1 - e^{- λ_{1}}}

(A1)

P F A = \Pr (s_{m} > 0.5 / c = 0) = \int_{0.5}^{1} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{m}} d s_{m} = \frac{λ_{0}}{1 - e^{- λ_{0}}} ({\frac{e^{- λ_{0} s_{m}}}{- λ_{0}}|}_{0.5}^{1}) = \frac{1}{1 - e^{- λ_{0}}} (e^{- λ_{0} 0.5} - e^{- λ_{0}}) = e^{- λ_{0} 0.5} \frac{1 - e^{- λ_{0} 0.5}}{1 - e^{- λ_{0}}}

(A2)
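The closed forms in (A1) and (A2) can be checked numerically. The sketch below integrates the two truncated exponential PDFs over (0.5, 1) with a simple midpoint rule and compares the result with the final expressions. The rate values and helper names are arbitrary choices for illustration.

```python
import numpy as np

def p0(s, lam0):
    """Class-0 score PDF: exponential truncated to [0, 1]."""
    return lam0 / (1.0 - np.exp(-lam0)) * np.exp(-lam0 * s)

def p1(s, lam1):
    """Class-1 score PDF: mirror image of a truncated exponential."""
    return lam1 / (1.0 - np.exp(-lam1)) * np.exp(-lam1 * (1.0 - s))

def pd_single(lam1):
    """Closed form (A1)."""
    return (1.0 - np.exp(-0.5 * lam1)) / (1.0 - np.exp(-lam1))

def pfa_single(lam0):
    """Closed form (A2)."""
    return np.exp(-0.5 * lam0) * (1.0 - np.exp(-0.5 * lam0)) / (1.0 - np.exp(-lam0))

def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    x = a + h * (np.arange(n) + 0.5)
    return float(np.sum(f(x)) * h)

lam0, lam1 = 8.0, 3.0  # arbitrary illustrative rates
pd_num = midpoint_integral(lambda s: p1(s, lam1), 0.5, 1.0)
pfa_num = midpoint_integral(lambda s: p0(s, lam0), 0.5, 1.0)
print(pd_num, pd_single(lam1))
print(pfa_num, pfa_single(lam0))
```

Both expressions also simplify to logistic functions, PD = 1/(1 + e^{-λ_1/2}) and PFA = 1/(1 + e^{λ_0/2}), because 1 − x² = (1 − x)(1 + x) with x = e^{-λ/2}.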

Now, let us calculate

P D_{z}

and

P F A_{z}

. We have to compute the integral of the corresponding bivariate PDF in the upper triangle of Figure 2, delimited below by the straight line

s_{2} = 1 - s_{1}

, hence

P D_{z} = \Pr (s_{1} + s_{2} > 1 / c = 1) = \int_{0}^{1} \int_{1 - s_{1}}^{1} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{1})} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{2})} d s_{1} d s_{2} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0}^{1} e^{λ_{1} s_{1}} (\int_{1 - s_{1}}^{1} e^{λ_{1} s_{2}} d s_{2}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0}^{1} e^{λ_{1} s_{1}} ({\frac{e^{λ_{1} s_{2}}}{λ_{1}}|}_{1 - s_{1}}^{1}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{1}{λ_{1}} \int_{0}^{1} e^{λ_{1} s_{1}} (e^{λ_{1}} - e^{λ_{1} (1 - s_{1})}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{1}{λ_{1}} e^{λ_{1}} \int_{0}^{1} (e^{λ_{1} s_{1}} - 1) d s_{1} = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (({\frac{e^{λ_{1} s_{1}}}{λ_{1}}|}_{0}^{1}) - 1) = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (\frac{e^{λ_{1}} - 1}{λ_{1}} - 1) = \frac{(1 - e^{- λ_{1}} (1 + λ_{1}))}{{(1 - e^{- λ_{1}})}^{2}},

(A3)

and

P F A_{z} = \Pr (s_{1} + s_{2} > 1 / c = 0) = \int_{0}^{1} \int_{1 - s_{1}}^{1} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{2}} d s_{1} d s_{2} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0}^{1} e^{- λ_{0} s_{1}} (\int_{1 - s_{1}}^{1} e^{- λ_{0} s_{2}} d s_{2}) d s_{1} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0}^{1} e^{- λ_{0} s_{1}} ({\frac{e^{- λ_{0} s_{2}}}{- λ_{0}}|}_{1 - s_{1}}^{1}) d s_{1} = \frac{λ_{0}}{{(1 - e^{- λ_{0}})}^{2}} \int_{0}^{1} e^{- λ_{0} s_{1}} (e^{- λ_{0} (1 - s_{1})} - e^{- λ_{0}}) d s_{1} = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} \int_{0}^{1} (1 - e^{- λ_{0} s_{1}}) d s_{1} = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (1 + ({\frac{e^{- λ_{0} s_{1}}}{λ_{0}}|}_{0}^{1})) = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (1 - \frac{1 - e^{- λ_{0}}}{λ_{0}}) = \frac{e^{- λ_{0}} (λ_{0} - 1 + e^{- λ_{0}})}{{(1 - e^{- λ_{0}})}^{2}}

(A4)
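The fused probabilities without bias correction can be checked by Monte Carlo simulation, drawing independent scores by inverse-CDF sampling and thresholding their sum at 1. For the false alarm probability the code uses the nonnegative closed form e^{-λ_0}(λ_0 − 1 + e^{-λ_0})/(1 − e^{-λ_0})², obtained by evaluating the inner integral with its limits in order. For equal rates it satisfies PD_z + PFA_z = 1, as it must because the class-1 PDF is the mirror image of the class-0 PDF. Function names, rates, and sample size are illustrative assumptions.

```python
import numpy as np

def sample_class0(lam0, n, rng):
    """Inverse-CDF samples from the exponential PDF truncated to [0, 1]."""
    u = rng.random(n)
    return -np.log(1.0 - u * (1.0 - np.exp(-lam0))) / lam0

def sample_class1(lam1, n, rng):
    """Class-1 scores mirror class-0 scores around 0.5 (s -> 1 - s)."""
    return 1.0 - sample_class0(lam1, n, rng)

def pd_fused(lam1):
    """Closed form for Pr(s1 + s2 > 1 | c = 1), as in (A3)."""
    e = np.exp(-lam1)
    return (1.0 - e * (1.0 + lam1)) / (1.0 - e) ** 2

def pfa_fused(lam0):
    """Closed form for Pr(s1 + s2 > 1 | c = 0), nonnegative form."""
    e = np.exp(-lam0)
    return e * (lam0 - 1.0 + e) / (1.0 - e) ** 2

rng = np.random.default_rng(1)
n = 200_000
lam0, lam1 = 8.0, 3.0  # arbitrary illustrative rates

pd_mc = np.mean(sample_class1(lam1, n, rng) + sample_class1(lam1, n, rng) > 1.0)
pfa_mc = np.mean(sample_class0(lam0, n, rng) + sample_class0(lam0, n, rng) > 1.0)
print(pd_mc, pd_fused(lam1))
print(pfa_mc, pfa_fused(lam0))
```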

Finally, from (9), we can calculate

P D_{z^{b}}

and

P F A_{z^{b}}

. Assuming for the moment that

Δ < 0.5

, we can write

P D_{z^{b}} = P D_{z} + 2 (\underset{v_{P D z}}{\underset{⏟}{\int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} p_{1} (s_{1}) p_{1} (s_{2}) d s_{1} d s_{2}}} + \underset{η_{P D z}}{\underset{⏟}{\int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} p_{1} (s_{1}) p_{1} (s_{2}) d s_{1} d s_{2}}}),

(A5)

where

v_{P D z} = \int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{1})} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{2})} d s_{1} d s_{2} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0}^{0.5 - Δ} e^{λ_{1} s_{1}} (\int_{1 - s_{1} - Δ}^{1 - s_{1}} e^{λ_{1} s_{2}} d s_{2}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0}^{0.5 - Δ} e^{λ_{1} s_{1}} ({\frac{e^{λ_{1} s_{2}}}{λ_{1}}|}_{1 - s_{1} - Δ}^{1 - s_{1}}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{1}{λ_{1}} \int_{0}^{0.5 - Δ} e^{λ_{1} s_{1}} (e^{λ_{1} (1 - s_{1})} - e^{λ_{1} (1 - s_{1} - Δ)}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{1}{λ_{1}} e^{λ_{1}} \int_{0}^{0.5 - Δ} (1 - e^{- λ_{1} Δ}) d s_{1} = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (1 - e^{- λ_{1} Δ}) (0.5 - Δ),

(A6)

\begin{matrix} η_{P D z} = \int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{1})} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{2})} d s_{1} d s_{2} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0.5 - Δ}^{0.5} e^{λ_{1} s_{1}} (\int_{0.5}^{1 - s_{1}} e^{λ_{1} s_{2}} d s_{2}) d s_{1} = \\ = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \int_{0.5 - Δ}^{0.5} e^{λ_{1} s_{1}} ({\frac{e^{λ_{1} s_{2}}}{λ_{1}}|}_{0.5}^{1 - s_{1}}) d s_{1} = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{1}{λ_{1}} \int_{0.5 - Δ}^{0.5} e^{λ_{1} s_{1}} (e^{λ_{1} (1 - s_{1})} - e^{λ_{1} 0.5}) d s_{1} = \\ = {(\frac{λ_{1} e^{- λ_{1}}}{1 - e^{- λ_{1}}})}^{2} \frac{e^{λ_{1}}}{λ_{1}} \int_{0.5 - Δ}^{0.5} (1 - e^{λ_{1} (s_{1} - 0.5)}) d s_{1} = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (Δ - ({\frac{e^{λ_{1} (s_{1} - 0.5)}}{λ_{1}}|}_{0.5 - Δ}^{0.5})) = \\ = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (Δ - \frac{1}{λ_{1}} (1 - e^{- λ_{1} Δ})) \end{matrix}

(A7)

and

P F A_{z^{b}} = P F A_{z} + 2 (\underset{v_{P F A z}}{\underset{⏟}{\int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} p_{0} (s_{1}) p_{0} (s_{2}) d s_{1} d s_{2}}} + \underset{η_{P F A z}}{\underset{⏟}{\int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} p_{0} (s_{1}) p_{0} (s_{2}) d s_{1} d s_{2}}}) .

(A8)

where

\begin{matrix} v_{P F A z} = \int_{0}^{0.5 - Δ} \int_{1 - s_{1} - Δ}^{1 - s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{2}} d s_{1} d s_{2} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0}^{0.5 - Δ} e^{- λ_{0} s_{1}} (\int_{1 - s_{1} - Δ}^{1 - s_{1}} e^{- λ_{0} s_{2}} d s_{2}) d s_{1} = \\ = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0}^{0.5 - Δ} e^{- λ_{0} s_{1}} ({\frac{e^{- λ_{0} s_{2}}}{- λ_{0}}|}_{1 - s_{1} - Δ}^{1 - s_{1}}) d s_{1} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \frac{1}{λ_{0}} \int_{0}^{0.5 - Δ} e^{- λ_{0} s_{1}} (e^{- λ_{0} (1 - s_{1} - Δ)} - e^{- λ_{0} (1 - s_{1})}) d s_{1} = \\ = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \frac{e^{- λ_{0}}}{λ_{0}} \int_{0}^{0.5 - Δ} (e^{λ_{0} Δ} - 1) d s_{1} = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (e^{λ_{0} Δ} - 1) (0.5 - Δ) . \end{matrix}

(A9)

\begin{matrix} η_{P F A z} = \int_{0.5 - Δ}^{0.5} \int_{0.5}^{1 - s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{2}} d s_{1} d s_{2} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0.5 - Δ}^{0.5} e^{- λ_{0} s_{1}} (\int_{0.5}^{1 - s_{1}} e^{- λ_{0} s_{2}} d s_{2}) d s_{1} = \\ = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \int_{0.5 - Δ}^{0.5} e^{- λ_{0} s_{1}} ({\frac{e^{- λ_{0} s_{2}}}{- λ_{0}}|}_{0.5}^{1 - s_{1}}) d s_{1} = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \frac{1}{λ_{0}} \int_{0.5 - Δ}^{0.5} e^{- λ_{0} s_{1}} (e^{- λ_{0} 0.5} - e^{- λ_{0} (1 - s_{1})}) d s_{1} = \\ = {(\frac{λ_{0}}{1 - e^{- λ_{0}}})}^{2} \frac{e^{- λ_{0}}}{λ_{0}} \int_{0.5 - Δ}^{0.5} (e^{- λ_{0} (s_{1} - 0.5)} - 1) d s_{1} = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (({\frac{e^{- λ_{0} (s_{1} - 0.5)}}{- λ_{0}}|}_{0.5 - Δ}^{0.5}) - Δ) = \\ = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (\frac{1}{λ_{0}} (e^{λ_{0} Δ} - 1) - Δ) . \end{matrix}

(A10)

Finally, for

Δ \geq 0.5

,

v_{P D z}

and

v_{P F A z}

vanish and

η_{P D z}

and

η_{P F A z}

saturate to

\begin{matrix} η_{P D z} = \int_{0}^{0.5} \int_{0.5}^{1 - s_{1}} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{1})} \frac{λ_{1}}{1 - e^{- λ_{1}}} e^{- λ_{1} (1 - s_{2})} d s_{1} d s_{2} = \frac{λ_{1} e^{- λ_{1}}}{{(1 - e^{- λ_{1}})}^{2}} (0.5 - \frac{1}{λ_{1}} (1 - e^{- λ_{1} 0.5})) , \\ η_{P F A z} = \int_{0}^{0.5} \int_{0.5}^{1 - s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{1}} \frac{λ_{0}}{1 - e^{- λ_{0}}} e^{- λ_{0} s_{2}} d s_{1} d s_{2} = \frac{λ_{0} e^{- λ_{0}}}{{(1 - e^{- λ_{0}})}^{2}} (\frac{1}{λ_{0}} (e^{λ_{0} 0.5} - 1) - 0.5) . \end{matrix}

(A11)
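The expressions (A5) to (A11) can be collected into two functions and compared with a simulation of the corrected fusion. Several assumptions are made here: the correction adds b1 to scores above 0.5 and subtracts b0 otherwise, the biases are the means of the truncated exponentials, 1/λ − 1/(e^λ − 1), and PFA_z is taken in its nonnegative form e^{-λ_0}(λ_0 − 1 + e^{-λ_0})/(1 − e^{-λ_0})². Names and parameter values are illustrative.

```python
import numpy as np

def bias(lam):
    """Mean of an exponential PDF truncated to [0, 1]."""
    return 1.0 / lam - 1.0 / np.expm1(lam)

def pd_corrected(lam1, delta):
    """PD_z plus 2*(v + eta), following (A5)-(A7) and (A11)."""
    e = np.exp(-lam1)
    k = lam1 * e / (1.0 - e) ** 2
    d = min(delta, 0.5)  # v vanishes and eta saturates for delta >= 0.5
    pd_z = (1.0 - e * (1.0 + lam1)) / (1.0 - e) ** 2
    v = k * (1.0 - np.exp(-lam1 * d)) * (0.5 - d)
    eta = k * (d - (1.0 - np.exp(-lam1 * d)) / lam1)
    return pd_z + 2.0 * (v + eta)

def pfa_corrected(lam0, delta):
    """PFA_z plus 2*(v + eta), following (A8)-(A11)."""
    e = np.exp(-lam0)
    k = lam0 * e / (1.0 - e) ** 2
    d = min(delta, 0.5)
    pfa_z = e * (lam0 - 1.0 + e) / (1.0 - e) ** 2
    v = k * np.expm1(lam0 * d) * (0.5 - d)
    eta = k * (np.expm1(lam0 * d) / lam0 - d)
    return pfa_z + 2.0 * (v + eta)

def simulate(lam0, lam1, n=200_000, seed=0):
    """Monte Carlo PD and PFA of the sum of two bias-corrected scores."""
    rng = np.random.default_rng(seed)
    b0, b1 = bias(lam0), bias(lam1)

    def draw(lam, size):
        u = rng.random(size)
        return -np.log(1.0 - u * (1.0 - np.exp(-lam))) / lam

    def fused_decision(s):  # s has shape (2, n): one row per detector
        corrected = np.where(s > 0.5, s + b1, s - b0)
        return corrected.sum(axis=0) > 1.0

    s_class1 = 1.0 - draw(lam1, (2, n))
    s_class0 = draw(lam0, (2, n))
    return fused_decision(s_class1).mean(), fused_decision(s_class0).mean()

lam0, lam1 = 10.0, 3.0  # arbitrary rates with b1 > b0
delta = bias(lam1) - bias(lam0)
pd_mc, pfa_mc = simulate(lam0, lam1)
print(pd_mc, pd_corrected(lam1, delta))
print(pfa_mc, pfa_corrected(lam0, delta))
```

The balanced accuracy with correction then follows as 0.5·(PD + 1 − PFA) from the two returned values.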

References

1. Ruta, D.; Gabrys, B. An overview of classifier fusion methods. Comput. Inf. Syst. 2000, 7, 1–10.
2. Tulyakov, S.; Jaeger, S.; Govindaraju, V.; Doermann, D. Review of classifier combination methods. Stud. Comput. Intell. 2008, 90, 361–386.
3. Ross, A.; Nandakumar, K. Fusion, Score-Level, Encyclopedia of Biometrics; Springer: New York, NY, USA, 2009; pp. 611–616.
4. Mohandes, M.; Deriche, M.; Aliyu, S. Classifiers Combination Techniques: A Comprehensive Review. IEEE Access 2018, 6, 19626–19639.
5. Heydarian, H.; Adam, M.T.P.; Burrows, T.L.; Rollo, M.E. Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection. IEEE Access 2015, 13, 643–655.
6. Mi, A.; Wang, L.; Qi, J. A Multiple Classifier Fusion Algorithm Using Weighted Decision Templates. Sci. Program. 2016, 3, 3943859.
7. Pereira, L.M.; Salazar, A.; Vergara, L. A comparative analysis of early and late fusion for the multimodal two-class problem. IEEE Access 2023, 11, 84283–84300.
8. Atrey, P.K.; Hossain, M.A.; El Saddik, A.; Kankanhalli, M.S. Multimodal fusion for multimedia analysis: A survey. Multimed. Syst. 2010, 16, 345–379.
9. Lahat, D.; Adali, T.; Jutten, C. Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects. Proc. IEEE 2015, 103, 1449–1477.
10. Pawlowski, M.; Wróblewska, A.; Sysko-Romaczuk, S. Effective Techniques for Multimodal Data Fusion: A Comparative Analysis. Sensors 2023, 23, 2381.
11. Barua, A.; Ahmed, M.U.; Begum, S. A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions. IEEE Access 2023, 11, 14804–14831.
12. Hanzelik, P.P.; Kummer, A.; Abonyi, J. Data Reconciliation-Based Hierarchical Fusion of Machine Learning Models. Mach. Learn. Knowl. Extr. 2024, 6, 2601–2617.
13. Tiwari, A.; Shukla, R.; Tiwari, S. Alzheimer’s Disease Detection from Fused PET and MRI Modalities Using an Ensemble Classifier. Mach. Learn. Knowl. Extr. 2023, 5, 512–538.
14. Zhang, L.; Li, T.; Cui, H.; Zhang, Q.; Jiang, Z.; Li, J.; Welsch, R.E.; Jia, Z. A Novel Prediction Model for Multimodal Medical Data Based on Graph Neural Networks. Mach. Learn. Knowl. Extr. 2025, 7, 92.
15. Arrowsmith, J.; Susnjak, T.; Jang-Jaccard, J. Multimodal Deep Learning for Android Malware Classification. Mach. Learn. Knowl. Extr. 2025, 7, 23.
16. Vergara, L.; Salazar, A. On the Optimum Linear Soft Fusion of Classifiers. Appl. Sci. 2025, 15, 5038.
17. Jobert, M.; Shulz, H.; Jahnig, P.; Tismer, C.; Bes, F.; Escola, H. A computerized method for detecting episodes of wakefulness during sleep based on the Alpha slow-wave index (ASI). Sleep 1994, 17, 37–46.
18. Yazdi, M.; Samaee, M.; Massicotte, D. A Review on Automated Sleep Study. Ann. Biomed. Eng. 2024, 52, 1463–1491.
19. Heneghan, C. St. Vincent’s University Hospital/University College Dublin, Sleep Apnea Database. 2011. Available online: https://physionet.org/content/ucddb/1.0.0/ (accessed on 20 November 2025).
20. Hjorth, B. EEG analysis based on time domain properties. Electroencephalogr. Clin. Neurophysiol. 1970, 29, 306–310.
21. Salazar, A.; Vergara, L.; Miralles, R. On including sequential dependence in ICA mixture models. Signal Process. 2010, 90, 2314–2318.
22. Motamedi-Fakhr, S.; Moshrefi-Torbati, M.; Hill, M.; Hill, C.; White, P. Signal processing techniques applied to human sleep EEG signals—A review. Biomed. Signal Process. Control 2014, 10, 21–33.
23. Li, T.; Zhou, M. ECG Classification Using Wavelet Packet Entropy and Random Forests. Entropy 2016, 18, 285.
24. Sun, H.; Jia, J.; Goparaju, B.; Huang, G.; Sourina, O.; Matt Travis, M.; Westover, B. Large-Scale Automated Sleep Staging. Sleep 2017, 40, zsx139.
25. Franz Ehrlich, J.B. Automatic Sleep Arousal Detection Using Heart Rate from a Single-Lead. In Proceedings of the 2022 Computing in Cardiology (CinC), Tampere, Finland, 4–7 September 2022.
26. Clifford, G.; Azuaje, F.; McSharry, P. Advanced Methods and Tools for ECG Data Analysis; Artech House: Norwood, MA, USA, 2006.
27. Salazar, A.; Safont, G.; Vergara, L.; Vidal, E. Graph Regularization Methods in Soft Detector Fusion. IEEE Access 2023, 11, 144747–144759.

Figure 1. General scheme for conditional bias correction.

Figure 2. Area (shadowed) where

p_{1} (s_{1}) p_{1} (s_{2})

and

p_{0} (s_{1}) p_{0} (s_{2})

have to be integrated to calculate, respectively, the increase in probability of detection and probability of false alarm.

Figure 3. Exponential PDF cases. (a)

p_{0} (s_{m})

with

b_{0} = 0.1

, and

p_{1} (s_{m})

with

b_{1} = 0.3

. (b) Contour plots of the corresponding bivariate PDFs and area (shadowed) to be integrated. (c) Balanced accuracy for

b_{0} = 0.05

and

0.05 \leq b_{1} \leq 0.35 \Leftrightarrow 0 \leq Δ \leq 0.30

.

Figure 4. (a) Beta-PDFs for

α = 0.4 (i)

,

α = 0.7 (i i)

, and

α = 1.0 (i i i)

, in all three cases

b_{0} = 0.1

and

b_{1} = 0.3

. (b) Contour plots of the corresponding bivariate PDFs and area (shadowed) to be integrated. (c) Balanced accuracies computed by Monte Carlo simulations for

α = 0.4 (i)

,

α = 0.7 (i i)

, and

α = 1.0 (i i i)

with

b_{0} = 0.05

and

0.05 \leq b_{1} \leq 0.35 \Leftrightarrow 0 \leq Δ \leq 0.30

.

Figure 5. Balanced accuracy values for the 10 options from Table 1 (*) and Table 2 (^O).

Table 1. Bias, probability of detection, probability of false alarm, and balanced accuracy of different options without bias correction. The best result without fusion and the best result with fusion are highlighted.

	1 Modality, 1 Method				1 Modality, 2 Methods (Fusion)		2 Modalities, 1 Method (Fusion)		2 Modalities, 2 Methods (Fusion)
OPTION	1	2	3	4	5	6	7	8	9	10
Modality and Method	EEG GB	EEG LR	ECG GB	ECG LR	EEG GB + EEG LR	ECG GB + ECG LR	EEG GB + ECG GB	EEG LR + ECG LR	EEG GB + ECG LR	EEG LR + ECG GB
Bias class 0	0.067	0.044	0.070	0.028	0.055	0.049	0.068	0.036	0.047	0.057
Bias class 1	0.291	0.355	0.354	0.439	0.323	0.396	0.322	0.397	0.365	0.354
Prob. of Detection (%)	70.86	64.01	64.60	55.94	68.05	59.92	70.10	64.31	69.69	65.24
Prob. of False Alarm (%)	5.76	4.03	5.67	1.95	4.40	2.82	4.92	1.83	3.65	2.80
Balanced Accuracy (%)	82.55	79.99	79.47	76.99	81.83	78.55	82.59	81.24	83.02	81.22

Table 2. Bias, probability of detection, probability of false alarm, and balanced accuracy of different options with bias correction. The best result without fusion and the best result with fusion are highlighted.

	1 Modality, 1 Method				1 Modality, 2 Methods (Fusion)		2 Modalities, 1 Method (Fusion)		2 Modalities, 2 Methods (Fusion)
OPTION	1	2	3	4	5	6	7	8	9	10
Modality and Method	EEGb GB	EEGb LR	ECGb GB	ECGb LR	EEGb GB + EEGb LR	ECGb GB + ECGb LR	EEGb GB + ECGb GB	EEGb LR + ECGb LR	EEGb GB + ECGb LR	EEGb LR + ECGb GB
Bias class 0	0.020	0.016	0.024	0.009	0.018	0.016	0.022	0.012	0.014	0.020
Bias class 1	0.104	0.143	0.150	0.205	0.124	0.178	0.127	0.174	0.155	0.146
Prob. of Detection (%)	70.86	64.01	64.60	55.94	73.84	71.85	82.27	75.95	78.41	81.10
Prob. of False Alarm (%)	5.76	4.03	5.67	1.95	6.17	5.44	7.90	4.44	5.77	6.84
Balanced Accuracy (%)	82.55	79.99	79.47	76.99	83.84	83.21	87.13	85.75	86.32	87.18

Table 3. Correlation coefficients between the different sets of scores. The smallest correlation values are highlighted.

Class 0	EEG-GB	EEG-LR	ECG-GB	ECG-LR	Class 1	EEG-GB	EEG-LR	ECG-GB	ECG-LR
EEG-GB	1.00	0.61	0.32	0.26	EEG-GB	1.00	0.66	0.28	0.31
EEG-LR	0.61	1.00	0.20	0.30	EEG-LR	0.66	1.00	0.21	0.36
ECG-GB	0.32	0.20	1.00	0.44	ECG-GB	0.28	0.21	1.00	0.51
ECG-LR	0.26	0.30	0.44	1.00	ECG-LR	0.31	0.36	0.51	1.00

Table 4. Balanced accuracy of the different options for different thresholds, with bias correction (BC) and with no bias correction (NBC). The best result for each threshold is highlighted.

		1 Modality, 1 Method				1 Modality, 2 Methods (Fusion)		2 Modalities, 1 Method (Fusion)		2 Modalities, 2 Methods (Fusion)
OPTION		1	2	3	4	5	6	7	8	9	10
Modality and Method		EEG GB	EEG LR	ECG GB	ECG LR	EEG GB + EEG LR	ECG GB + ECG LR	EEG GB + ECG GB	EEG LR + ECG LR	EEG GB + ECG LR	EEG LR + ECG GB
BA (%) threshold 0.40	BC	83.23	81.28	79.64	78.06	85.24	84.20	87.38	86.59	86.44	87.39
	NBC	83.23	81.28	79.64	78.06	83.56	81.99	86.96	84.32	85.91	86.59
BA (%) threshold 0.45	BC	82.92	80.50	79.68	77.28	84.62	83.64	87.26	86.93	86.44	87.35
	NBC	82.92	80.50	79.68	77.28	82.83	80.78	86.37	83.50	84.94	85.74
BA (%) threshold 0.50	BC	82.55	79.99	79.47	76.69	83.84	83.21	87.13	85.75	86.32	87.18
	NBC	82.55	79.99	79.47	76.69	81.83	78.55	82.59	81.24	83.02	81.22
BA (%) threshold 0.55	BC	82.26	79.71	79.26	76.46	83.20	82.34	86.67	84.67	85.82	86.49
	NBC	82.26	79.71	79.26	76.46	80.39	75.74	77.78	77.64	78.12	77.28
BA (%) threshold 0.60	BC	82.04	79.38	78.98	76.67	82.61	81.24	85.51	83.91	85.20	85.81
	NBC	82.04	79.38	78.98	75.67	79.56	74.86	76.55	75.35	76.03	75.40

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
