Article

Enhancing Cascade Object Detection Accuracy Using Correctors Based on High-Dimensional Feature Separation

by Andrey V. Kovalchuk 1,2, Andrey A. Lebedev 1, Olga V. Shemagina 2, Irina V. Nuidel 2, Vladimir G. Yakhno 1,2 and Sergey V. Stasenko 1,3,*

1 Research Center in Artificial Intelligence, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, 603022 Nizhny Novgorod, Russia
2 Laboratory of Autowave Processes, Institute of Applied Physics RAS, 603950 Nizhny Novgorod, Russia
3 Moscow Center for Advanced Studies, 123592 Moscow, Russia
* Author to whom correspondence should be addressed.
Technologies 2025, 13(12), 593; https://doi.org/10.3390/technologies13120593
Submission received: 7 November 2025 / Revised: 3 December 2025 / Accepted: 13 December 2025 / Published: 16 December 2025
(This article belongs to the Special Issue Image Analysis and Processing)

Abstract

This study addresses the problem of correcting systematic errors in classical cascade object detectors under severe data scarcity and distribution shift. We focus on the widely used Viola–Jones framework enhanced with a modified Census transform and propose a modular “corrector” architecture that can be attached to an existing detector without retraining it. The key idea is to exploit the blessing of dimensionality: high-dimensional feature vectors constructed from multiple cascade stages are transformed by PCA and whitening into a space where simple linear Fisher discriminants can reliably separate rare error patterns from normal operation using only a few labeled examples. The approach involves image partitioning with a sliding window of fixed aspect ratio and a modified Census transform in which each pixel intensity is compared to the mean value within a rectangular neighborhood. Training samples for the false-negative and false-positive correctors are selected using dual Intersection-over-Union (IoU) thresholds and probabilistic sampling of true-positive and true-negative fragments. The corrector models are trained on the principles of high-dimensional separability within the one- and few-shot learning paradigm, using features derived from the cascade stages of the detector. Decision boundaries are optimized using Fisher’s rule, with adaptive thresholding to guarantee zero false acceptance. Experimental results indicate that the proposed correction scheme enhances object detection accuracy by effectively compensating for classifier errors, particularly under conditions of scarce training data.
On two railway image datasets with only about one thousand images each, the proposed correctors increase Precision from 0.36 to 0.65 on identifier detection while maintaining high Recall (0.98 → 0.94), and improve digit detection Recall from 0.94 to 0.98 with negligible loss in Precision (0.92 → 0.91). These results demonstrate that even under scarce training data, high-dimensional feature separation enables effective one-/few-shot error correction for cascade detectors with minimal computational overhead.

1. Introduction

Modern artificial intelligence (AI) systems are increasingly being integrated into nearly every aspect of human activity—from medicine and autonomous vehicles to finance and industrial automation [1,2,3]. Despite remarkable advances achieved through machine learning (ML) and deep learning (DL), all AI systems remain inherently prone to errors. These errors represent a critical obstacle to the safe, trustworthy, and effective deployment of AI in real-world contexts where accuracy, robustness, and reliability are paramount [4,5]. Understanding the sources of these errors and developing mechanisms to mitigate them is thus a major focus of current AI research.
In real-world machine learning practice, the classical assumption of independent and identically distributed (IID) data is frequently violated [6,7,8,9]. According to this assumption, data samples are drawn independently from a fixed probability distribution over the joint space of inputs and outputs. However, in realistic environments, this assumption is overly restrictive and rarely holds. Data evolve over time—through aging, sensor degradation, or changing external conditions—and novel data patterns often emerge that differ from those seen during training [10,11]. Such deviations result in distribution shifts, concept drift, and domain adaptation challenges [12,13].
Recent research has emphasized that the hypothesis of a stationary distribution is an extremely strong and often unrealistic simplification [9]. To address this, auxiliary concepts like covariate shift and dataset drift have been introduced, relaxing the IID assumption and enabling adaptive learning mechanisms [8,14]. Nonetheless, the IID framework remains central to statistical learning theory and continues to shape the mathematical underpinnings of modern ML systems [15].
Another fundamental source of error in AI systems is data uncertainty, arising from the fact that training datasets cannot encompass the full diversity of possible real-world situations [16,17]. Two primary forms of uncertainty are commonly recognized: aleatoric uncertainty, stemming from inherent noise or random measurement and labeling errors in the data, and epistemic uncertainty, reflecting the model’s lack of knowledge or limited generalization beyond its training set [16,18]. Furthermore, selection bias—when training data do not accurately represent the target environment—leads to systematic performance degradation in deployment [19,20].
A related and particularly insidious issue is shortcut learning (or spurious correlation learning) [21]. In this case, models exploit superficial patterns in the training data that happen to correlate with the target labels, enabling them to achieve high test accuracy without capturing the true underlying causal relationships [22]. As a result, such systems perform well on benchmarks but fail catastrophically when faced with even minor changes in input distribution, as shown in studies on dataset bias, adversarial shifts, and out-of-distribution (OOD) generalization [23].
In many cases, AI system errors are opaque and difficult to interpret. The inexplicability of errors complicates both diagnosis and correction. These failures may arise from software faults, design flaws, unexpected human–AI interactions, or adversarial attacks. In particular, research on adversarial examples has shown that small, seemingly insignificant perturbations to input data can cause dramatic and unpredictable failures in even highly accurate deep models [24,25,26]. This fragility highlights fundamental weaknesses in how neural networks represent information and generalize to unseen data [27].
Error correction has long been a core principle of learning algorithms. Classical perceptron models and their successors rely on backpropagation to iteratively minimize errors, forming the foundation of modern deep learning [2,28]. In reinforcement learning (RL), the concept of error correction extends to the continuous improvement of an agent’s policy through interaction with its environment [29]. However, in model-based RL, discrepancies inevitably arise between the learned model of the environment and the true dynamics of the real world, leading to suboptimal or unsafe behavior [30,31].
A common response to AI errors is systematic retraining of the model to incorporate newly observed cases. However, this approach suffers from several critical drawbacks:
  • Preserving existing skills requires retraining on the full dataset.
  • Retraining demands substantial computational resources and time.
  • New errors may be introduced during retraining.
  • Retaining prior knowledge (avoiding catastrophic forgetting) is not guaranteed.
These limitations make retraining unsuitable for real-time or safety-critical error correction in dynamic environments [32]. An alternative approach is the use of AI correctors—external modules or algorithms that complement existing AI systems by diagnosing errors and producing corrected outputs [33,34,35]. Correctors have key advantages: they are flexible, modular, and reversible, allowing the base AI system to remain unchanged and easily restorable if necessary. Recent work has explored correctors as a foundation for more resilient, interpretable, and self-healing AI architectures capable of maintaining reliability under non-stationary and uncertain conditions [36].
Most traditional error correction methods rely on iterative retraining on large datasets to avoid introducing new errors. However, this process is computationally intensive and unsuitable for real-time systems. In contrast, the corrector method performs non-iterative correction using one or a few labeled examples, particularly effective in high-dimensional spaces [34,36,37].
An AI corrector requires a small labeled error set, used to train a binary classifier that distinguishes between normal and erroneous states. This reduces the problem to learning with few examples, where the error class is sparsely represented [38,39]. Such learning challenges are addressed by one-shot and few-shot learning, which aim to generalize from minimal data [38]. Modern approaches employ meta-learning and transfer learning, where models pre-trained on related tasks acquire generalizable meta-skills that enable rapid adaptation to new conditions [40,41,42].
The success of learning from few examples often relies on either dimensionality reduction or the blessing of dimensionality [37,43,44,45].
  • Dimensionality reduction isolates key features for classification with minimal samples.
  • The blessing of dimensionality implies that, under regular distribution conditions [34,46], classical linear methods (e.g., SVMs, Fisher discriminants) can achieve robust separability even with few examples.
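The blessing of dimensionality can be illustrated with a small numerical experiment (our own illustration, not from the cited works): in a high-dimensional space, a single random "error" point is linearly separated from a large random cloud by the hyperplane whose normal passes through that point, which is exactly the regime one-shot correctors rely on.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 500, 2000                      # feature dimension, size of "normal" cloud
cloud = rng.standard_normal((N, n))   # samples of normal operation
err = rng.standard_normal(n)          # a single observed error sample

w = err / np.linalg.norm(err)         # one-shot discriminant direction
scores = cloud @ w                    # projections of the normal cloud onto w
margin = err @ w - scores.max()       # positive margin => perfect linear separation
```

The error projects to roughly the norm of a 500-dimensional Gaussian vector (about 22), while the cloud's projections behave like standard normals, so a wide positive margin appears with overwhelming probability despite only one "training" error.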
Based on these principles, Gorban et al. [36] proposed the concept of AI correctors, combining a binary classifier that identifies high-risk states without retraining the base system. Training uses two classes:
  • Normal—correct system operation;
  • Error—labeled faulty situations.
The correction algorithm operates as follows. The corrector first collects the input data, internal signals, and output values from the base classifier. Using these signals, it applies stochastic partitioning to determine whether the current situation corresponds to normal operation or exhibits a high probability of error. When a potentially erroneous state is detected, the corrector invokes an auxiliary classifier or regressor to generate an adjusted output, thereby compensating for the identified error. As new error types arise, the corrector can be rapidly updated through one-shot learning without full system retraining. For complex applications, a multi-corrector architecture may be employed, where several specialized correctors handle different error categories under the coordination of a central dispatcher.
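The multi-corrector workflow described above can be sketched schematically. The dispatcher class below, with its `detect`/`fix` callable pairs, is our own illustrative construction under assumed interfaces, not the authors' implementation.

```python
class MultiCorrector:
    """Schematic dispatcher: each registered corrector inspects the collected
    signals (inputs, internal features, base output) and, if it recognizes its
    error category, returns an adjusted output; otherwise the base system's
    output passes through unchanged."""

    def __init__(self):
        self._correctors = []  # list of (detect, fix) callable pairs

    def register(self, detect, fix):
        """Add one specialized corrector without touching the base system."""
        self._correctors.append((detect, fix))

    def __call__(self, signals, base_output):
        for detect, fix in self._correctors:
            if detect(signals):        # high-risk state recognized
                return fix(signals)    # corrected output
        return base_output             # normal operation
```

A new error type is handled by registering one more `(detect, fix)` pair, which mirrors the rapid one-shot updates described above: the base system and previously trained correctors remain unchanged.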
Our work presents a novel algorithm for post-processing and error correction in object detectors constructed using the Viola–Jones framework enhanced with a modified census transform. The proposed method improves robustness and accuracy under data-limited conditions through image partitioning with a fixed-aspect-ratio sliding window and a census transform that compares each pixel intensity with the mean value within a rectangular neighborhood. Training samples for false-negative and false-positive correctors are selected using dual Intersection-over-Union (IoU) thresholds and probabilistic sampling. The corrector models are trained within a one- and few-shot learning paradigm based on high-dimensional separability principles and features extracted from the detector’s cascade stages. Decision boundaries are optimized using Fisher’s criterion with adaptive thresholding to ensure zero false acceptance. Experimental results demonstrate that the proposed correction scheme significantly improves detection accuracy by effectively compensating for classifier errors, particularly in scenarios with scarce training data.
The main contributions of this work are (i) a corrector architecture for modified Census-based Viola–Jones detectors that operates in a high-dimensional feature space and can be trained in a one-/few-shot regime; (ii) a principled feature construction and clustering scheme that exploits high-dimensional separability to build cluster-wise Fisher correctors with zero false acceptance on the correction set; (iii) an experimental study on two challenging railway datasets demonstrating that the proposed correctors significantly improve Precision and/or Recall under severe data limitations, with minimal computational overhead.

Related Work

Recent years have seen rapid progress in few-shot and one-shot learning, including few-shot object detection (FSOD) methods that adapt large detectors to novel classes with only a handful of annotated examples. Modern FSOD approaches often rely on meta-learning, contrastive representation learning, and powerful deep backbones to improve sample efficiency and robustness to distribution shifts [47,48,49]. These works share with our approach the goal of exploiting high-dimensional representations for data-efficient learning, but they are primarily concerned with learning new classes rather than correcting systematic errors of an existing detector.
A parallel line of research focuses on robustness under distribution shift. Test-time adaptation (TTA) methods adapt a pre-trained model using unlabeled target data at test time, often via self-training and online optimization, and have been extensively surveyed in recent work [50,51]. Generalized out-of-distribution (OOD) detection aims to identify samples whose distribution differs from the training data and has been reviewed in several comprehensive surveys and evaluations for computer vision and related domains [52,53]. Both TTA and OOD detection typically rely on deep feature extractors and may involve iterative adaptation steps or auxiliary scoring networks, which increases computational cost at test time.
Finally, model rectification and class-incremental learning methods explicitly modify or “patch” a trained model to reduce systematic biases and correct its predictions on new data. Recent studies emphasize rectification strategies such as feature boosting and compression, exemplar compression, and dynamic expansion of feature spaces to restore performance on old classes while incorporating new ones [54,55,56]. While conceptually related to our goal of correcting an existing detector without retraining it from scratch, these approaches again operate within deep neural architectures and often incur non-trivial retraining or adaptation overhead.
In contrast, we target classical cascade object detectors and industrial inspection scenarios where strict real-time constraints and limited computational resources preclude heavy adaptation mechanisms. Our corrector architecture is designed to be extremely lightweight: at test time, it adds only linear-time processing $O(n)$ in the dimensionality n of the feature vector on top of the modified Census-based Viola–Jones detector, while training remains a quadratic-time $O(n^2)$ offline procedure on a relatively small correction set. This enables us to exploit high-dimensional separability and one-/few-shot error correction without compromising the speed of the underlying recognition system.

2. Detector Correction for Object Detection in Images

A biomorphic system for semantic analysis was developed that uses individual cascade detectors for each concept in its vocabulary, employing strong classifiers [57] based on non-local binary patterns [58]. The detectors are organized sequentially using a multi-stage detection technique, in which each detector uses a cascade of connected strong classifiers.

2.1. Object Detection Using a Cascade Detector

The cascade detector algorithm [57] has proven to be an effective method for detecting objects (e.g., faces) in images by using cascades of weak classifiers trained on a set of simple features. The algorithm implements the idea of a sliding window, where each image fragment is classified as containing an object (1) or not (0). Feature cascades are organized as a truncated binary tree, with the response (0) of any strong classifier interpreted as “not an object,” and “an object” otherwise. The main stages of the detector algorithm can be described as follows.
Detector Feature Space. In the classical Viola–Jones detector [57], Haar-like features computed from rectangular intensity differences are used as inputs to weak classifiers. In our implementation, we replace the Haar features with modified Census features inspired by [58]. For each pixel, the standard Census transform encodes a local neighborhood by comparing each neighbor to the center pixel and forming a binary pattern. In the modified version, we compare each neighbor to the mean intensity within a rectangular neighborhood rather than to the center pixel. This change makes the descriptor less sensitive to local noise at a single pixel and to small illumination fluctuations, while preserving the strong invariance of Census-like encodings to monotonic intensity changes. The resulting binary patterns are then aggregated over predefined regions and mapped to weak classifier responses {0, 1} based on their empirical distributions on the training set.
Cascade Structure of the Detector. The cascade consists of a sequence of stages, where each stage quickly discards most of the negatives, leaving for further processing only those fragments that may contain the target object. If at least one stage rejects a fragment, no further calculations are performed for it. The “strong classifier” is created using the AdaBoost algorithm [59], which combines features and weak classifiers. Its binary output function {0, 1} is determined during training by minimizing recognition errors on the training database.
Image Scanning. The detector uses a sliding-window method, in which the window moves across the image with specified steps $s_x$ and $s_y$. Let the image have dimensions $W \times H$ and the window have dimensions $w \times h$; then the coordinates of the upper-left corner of the window are given by
$$x_i = i \cdot s_x, \quad y_j = j \cdot s_y, \quad i = 0, 1, \ldots, \left\lfloor \frac{W - w}{s_x} \right\rfloor, \quad j = 0, 1, \ldots, \left\lfloor \frac{H - h}{s_y} \right\rfloor$$
For each window $R_{ij}$, features are calculated, and a binary classification $D(R_{ij}) \in \{0, 1\}$ is performed using a cascade architecture of strong classifiers, where “1” means object detection and “0” indicates no object. This procedure allows the image to be divided into multiple overlapping or non-overlapping fragments, each of which is used for classification.
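The window grid of Equation (1) can be enumerated directly; the helper below is a straightforward sketch of that formula.

```python
def window_origins(W, H, w, h, s_x, s_y):
    """Upper-left corners (x_i, y_j) of a w-by-h sliding window over a
    W-by-H image with steps s_x and s_y, following Eq. (1)."""
    xs = [i * s_x for i in range((W - w) // s_x + 1)]
    ys = [j * s_y for j in range((H - h) // s_y + 1)]
    return [(x, y) for y in ys for x in xs]
```

Each returned origin defines one fragment that is passed through the cascade for the binary decision.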
Scaling and Aspect Ratio. To account for objects of different sizes, the detector is applied at multiple scales, and the aspect ratio of the window can be arbitrary, allowing for adaptation to different object types and shooting conditions. The scheme used (Figure 1) ensures high performance, as most windows are quickly discarded early in the cascade, and detailed checking is performed only on potentially relevant fragments.
Statement of the Detector Correction Problem. The detector $D(R)$ can generate two types of errors:
  • False Negative (FN): The object is present, but the detector returned 0.
  • False Positive (FP): The object is missing, but the detector returned a 1.
To improve detection accuracy, two types of correctors are introduced:
  • FN Corrector: Trained on errors where the detector fails (FN). The features used are data obtained only from the first K detector stages, as well as the average intensity values within a fragment, calculated using the modified Census transform.
  • FP Corrector: Trained on false positives (FPs) and correctly identified positives (TP). It uses features similar to those for the FN corrector, but takes all detector stages into account.
A training scheme (Figure 2) was developed to create and apply correctors during detector operation:
The scheme in Figure 2 allows for the generation of correctors in cases of incomplete labeling on input images. Fragments used for training are accumulated in the training procedure’s internal database and are used once the required threshold number of examples is reached.
For the FN corrector, we use features extracted only from the first K = 3 stages of the cascade. This choice is motivated by an empirical analysis of where FN-related candidates are rejected within the cascade. Figure 3 shows the cumulative share of rejected FN-related candidates as a function of the cascade stage number; the curve is averaged over all digit detectors obtained in our experiments. Approximately 83 % of all false-negative events occur within the first three stages, while the contribution of later stages is comparatively small. At the same time, computing features from all cascade stages for every fragment would significantly reduce the overall speed of the system, as it would require evaluating the entire cascade for all candidates, thus negating the efficiency benefits of the early-rejection mechanism. Restricting the FN corrector to the first three stages therefore provides a good compromise: it covers the vast majority of FN-related errors while keeping the feature computation cost low.

2.2. Selecting Fragments for Building Correctors

For training the correctors, fragments are selected based on their degree of matching with the labeling G. The Intersection-over-Union (IoU) metric is used as the measure of matching:
$$\mathrm{IoU}(R, G) = \frac{|R \cap G|}{|R \cup G|}$$
where R is a rectangular fragment and G is the corresponding ground-truth labeling (Figure 4).
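For axis-aligned rectangles, Equation (2) reduces to a few coordinate comparisons; the sketch below assumes boxes given as `(x, y, w, h)` tuples.

```python
def iou(r, g):
    """Intersection over Union of two axis-aligned boxes (x, y, w, h),
    following Eq. (2)."""
    ax, ay, aw, ah = r
    bx, by, bw, bh = g
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```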
Selecting examples for the FN corrector. Fragment R is considered a detector error and is selected for training the FN corrector according to the following rule:
$$M_{fn}^{i} = \begin{cases} \varnothing, & \text{if } \exists R : \mathrm{IoU}(R, G_i) > T_{hi} \wedge D(R) = 1 \\ \left\{ R \mid \mathrm{IoU}(R, G_i) > T_{hi} \wedge D(R) = 0 \right\}, & \text{otherwise} \end{cases}$$
Error-free fragments (TN) are selected with a predetermined probability $p_{tn}$, provided that for any $G_i$ the following condition is met:
$$M_{tn}^{i} = \left\{ R \mid \mathrm{IoU}(R, G_i) < T_{lo} \wedge D(R) = 0 \right\}$$
The parameter $p_{tn}$ is necessary to reduce the number of TN fragments in cases where the number of labeled fragments is much smaller than the number of background ones, and to balance the database for training the corrector (Equations (3) and (4)).
Selecting Examples for the FP Corrector. Two types of fragments are used to construct the FP corrector. The first type is false positives, which are determined using the intersection threshold with the labeling:
$$M_{fp}^{i} = \left\{ R \mid \mathrm{IoU}(R, G_i) < T_{lo} \wedge D(R) = 1 \right\}$$
The second type is true positives, used as positive examples for the corrector:
$$M_{tp}^{i} = \left\{ R \mid \mathrm{IoU}(R, G_i) > T_{hi} \wedge D(R) = 1 \right\}$$
They are also included in the training set with probability $p_{tp}$. A visualization of the fragment selection process for constructing correctors according to rules (Equations (3)–(6)) is shown in Figure 5.
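The selection rules can be sketched as a per-fragment sorting routine. For brevity, this simplified version omits the cross-check in Equation (3) that empties the FN pool for a ground-truth box the detector already found, and the threshold and probability values are placeholders, not the paper's settings.

```python
import random

def iou(r, g):
    """IoU of axis-aligned (x, y, w, h) boxes (Eq. (2))."""
    ax, ay, aw, ah = r
    bx, by, bw, bh = g
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def select_fragments(fragments, detector, gts, t_hi=0.5, t_lo=0.1,
                     p_tn=0.05, p_tp=1.0, seed=0):
    """Sort detector outputs into corrector training pools using dual IoU
    thresholds (cf. Eqs. (3)-(6)); TN/TP fragments are subsampled with
    probabilities p_tn / p_tp to balance the pools."""
    rng = random.Random(seed)
    pools = {"fn": [], "tn": [], "fp": [], "tp": []}
    for R in fragments:
        d = detector(R)
        best = max((iou(R, G) for G in gts), default=0.0)
        if best > t_hi and d == 0:
            pools["fn"].append(R)                       # missed object
        elif best > t_hi and d == 1 and rng.random() < p_tp:
            pools["tp"].append(R)                       # correct detection
        elif best < t_lo and d == 1:
            pools["fp"].append(R)                       # false alarm
        elif best < t_lo and d == 0 and rng.random() < p_tn:
            pools["tn"].append(R)                       # sampled background
    return pools
```

Fragments with intermediate overlap ($T_{lo} \leq \mathrm{IoU} \leq T_{hi}$) fall into neither pool, which keeps ambiguous examples out of both correctors' training sets.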
Training samples for the correctors are collected until their size reaches the predetermined thresholds $N_{fn}$ and $N_{fp}$ for the FN and FP correctors, respectively.

2.3. Construction of Correctors

The correctors FN and FP receive as input the combined feature vector x for each fragment R and generate a binary response about the fragment’s membership in the sets being corrected. This is achieved by using an algorithm developed for high-dimensional spaces, which constructs a separating hyperplane based on an analysis of the distinguishability of classes (errors and correct solutions).
Using the theoretical justification for the possibility of applying the Fisher criterion to the problem of constructing a corrector for a base classifier, a general algorithm for obtaining correctors (Algorithm 1) is presented, which constructs a separating hyperplane between the set of errors X and the set of correct solutions Y.
Algorithm 1 Corrector Construction Algorithm.
Require: Sets X, Y; number of clusters k; number of principal components m.
Ensure: Centroid $x_c$; transition matrices H and W; Fisher discriminants $(w_1, \ldots, w_k)$ and thresholds $(\theta_1, \ldots, \theta_k)$ for each cluster.
1: Compute the centroid of set X: $x_c = \frac{1}{|X|} \sum_{x \in X} x$
2: Centralize the data: $X_c = X - x_c$, $Y^* = Y - x_c$
3: Extract principal components from $X_c$.
4: Select the m components with the largest eigenvalues ($\lambda_1 \geq \ldots \geq \lambda_m > 0$) of the covariance matrix of $X_c$.
5: Project $X_c$ and $Y^*$ into the principal-component space: $X_r = X_c P$, $Y_r^* = Y^* P$. Construct the transition matrix $H = P^T$.
6: Construct the whitening matrix $W = \mathrm{diag}(\lambda_1^{-1/2}, \ldots, \lambda_m^{-1/2})$ and apply the transformation: $X_w = X_r W$, $Y_w^* = Y_r^* W$
7: Partition $X_w$ into k clusters using the k-means algorithm; obtain centroids $c_1, \ldots, c_k$.
8: For each cluster $i = 1, \ldots, k$, compute the Fisher discriminant $w_i$ and the corresponding threshold $\theta_i$ using $(X_w^i, Y_w^*)$.
9: return $(w_1, \ldots, w_k)$, $(\theta_1, \ldots, \theta_k)$
Algorithm 1 transforms the data into a multicluster feature space, where the task is to minimize errors in each cluster. To evaluate the resulting feature space, distributions of measurements in the studied cluster were constructed.
Once the corrector is built on the training data, it is applied as follows (Algorithm 2):
Algorithm 2 Correction Algorithm.
Require: Input vector z; centroid $x_c$; cluster centroids $(c_1, \ldots, c_k)$; threshold vector $(\theta_1, \ldots, \theta_k)$; Fisher discriminants $(w_1, \ldots, w_k)$; transition matrices H and W.
Ensure: Corrected output classification.
1: Centralize the input by subtracting the centroid $x_c$: $z_c = z - x_c$
2: Apply the transformation matrices H and W: $z_w = (z_c H) W$
3: Determine the closest cluster t among $\{c_1, \ldots, c_k\}$ using the Euclidean distance: $t = \arg\min_i \| z_w - c_i \|$
4: Classify the sample as belonging to the corrected (error) set if $(w_t, z_w) < \theta_t$
5: return the correction decision for input z.
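The two algorithms can be sketched together in a few dozen lines. This is an illustrative NumPy implementation under explicit simplifications: the Fisher direction is approximated by the difference of cluster and Y means (a common simplification valid after whitening), the sign convention flags an error when the score exceeds the threshold, and each threshold is set to the maximum response on the correct set Y (plus a tiny slack against floating-point noise), which enforces zero false acceptance on the correction set.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means (Algorithm 1, step 7)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        lab = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for i in range(k):
            if (lab == i).any():
                C[i] = X[lab == i].mean(0)
    lab = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
    return C, lab

def build_corrector(X, Y, k=2, m=5):
    """Algorithm 1 sketch: centre on the error set X, project onto the top-m
    principal components, whiten, cluster the errors, and fit one linear
    discriminant per cluster against the correct set Y."""
    xc = X.mean(0)
    lam, P = np.linalg.eigh(np.cov(X - xc, rowvar=False))
    idx = np.argsort(lam)[::-1][:m]            # m largest eigenvalues
    H = P[:, idx]                              # transition matrix
    W = np.diag(lam[idx] ** -0.5)              # whitening matrix
    Xw = (X - xc) @ H @ W
    Yw = (Y - xc) @ H @ W
    C, lab = kmeans(Xw, k)
    ws, thetas = [], []
    for i in range(k):
        mask = lab == i
        if not mask.any():                     # guard against empty clusters
            ws.append(np.zeros(m)); thetas.append(np.inf); continue
        w = Xw[mask].mean(0) - Yw.mean(0)      # Fisher direction (whitened)
        ws.append(w)
        thetas.append((Yw @ w).max() + 1e-9)   # zero false acceptance on Y
    return dict(xc=xc, H=H, W=W, C=C, ws=ws, thetas=thetas)

def is_error(z, M):
    """Algorithm 2 sketch: transform z, pick the nearest cluster, threshold."""
    zw = (z - M["xc"]) @ M["H"] @ M["W"]
    t = int(np.argmin(((M["C"] - zw) ** 2).sum(-1)))
    return float(M["ws"][t] @ zw) > M["thetas"][t]
```

On well-separated synthetic data, every training sample from Y stays unflagged by construction, while nearly all error samples are caught, mirroring the zero-false-acceptance design of the correctors.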
To construct the correctors, features extracted from image fragments by the detector are used. The features are the average segment intensities calculated using the modified Census transform. The size of the corrector’s input vector is determined by the number of features at each cascade stage and the number of stages:
$$x(R) = \left( x_R^1, x_R^2, \ldots, x_R^N \right), \quad x_R^m = \left( \mu_R^{1,m}, \mu_R^{2,m}, \ldots, \mu_R^{Z_m,m} \right)$$
where $\mu_R^{k,m}$ is the average intensity of the nine segments of weak classifier k at the m-th cascade stage in fragment R.
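Assembling the corrector's input vector from the per-stage segment means of Equation (7) amounts to a simple flattening; the nested-list layout assumed below is our own illustration.

```python
def corrector_features(stage_means):
    """Flatten per-stage weak-classifier segment means into the corrector
    input vector x(R): stage_means[m][k] holds the segment mean intensities
    mu_R^{k,m} for weak classifier k at cascade stage m."""
    return [mu for stage in stage_means for wc in stage for mu in wc]
```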
The FN corrector uses data obtained from the first K detector stages. The FP corrector uses features obtained from all stages, where $M \geq K$ is the total number of detector stages.
Training samples for the correctors are collected until the minimum sizes $N_{fn}$ and $N_{fp}$ are reached. Thus, the correctors are trained on a sufficiently large number of examples, minimizing the risk of overfitting and ensuring high generalization ability.
To improve the accuracy of the FN and FP correctors, it is proposed to further divide the errors into groups (clusters), since errors arising during detector operation often have different natures and occupy different regions of the feature space. Clustering allows the specific characteristics of each error type to be taken into account and a separate hyperplane to be constructed for each type of error. Suppose we have a set of detector errors (FN or FP), represented as a set of feature vectors:
$$X_{neg} = \left\{ x_1, x_2, \ldots, x_M \right\}$$
where each $x_i$ is a vector of feature values obtained from the fragment corresponding to a detection error.
The goal of pre-processing is to partition the set $X_{neg}$ into a predetermined number of clusters K:
$$X_{neg} = \bigcup_{j=1}^{K} X_j, \quad X_i \cap X_j = \varnothing, \quad i \neq j$$
This partitioning (Equation (9)) is performed using the K-means algorithm [60], which minimizes the within-cluster variance.
After partitioning the errors into clusters, a separate separating hyperplane is constructed for each resulting cluster X j , which separates the errors of this cluster from the set of all positive (correct) examples X p o s .
For each cluster, a separate discrimination problem is solved using the Fisher criterion.

2.4. Fisher’s Rule for Decision Making

The correction algorithm is based on Fisher’s rule for linear discrimination [61]. Let there be two classes: an error class (e.g., FN or FP) with feature distribution $\mathcal{N}(\mu_1, \sigma_1^2)$ and a class of correct fragments with distribution $\mathcal{N}(\mu_2, \sigma_2^2)$. The optimal weight vector w of the linear discriminant is determined by maximizing the Fisher criterion:
$$J(w) = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$$
The linear decision is made according to the function
$$f(x) = w^T x + b$$
where b is the bias. The class is determined by the rule
$$C(x) = \begin{cases} 1, & f(x) \geq \theta \\ 0, & \text{otherwise} \end{cases}$$
where $\theta$ is the threshold.
When constructing correctors to work with a detector under imbalanced-class conditions, the key requirement is to achieve a zero false acceptance rate (FAR) on the selected data, since the number of fragments into which the image is divided significantly exceeds the number of fragments containing the target object. To achieve this, the threshold $\theta$ is chosen such that no example from the non-error class is mistakenly classified as an error on the given dataset. Formally, let $X_1$ be the set of erroneous fragments ($fn$ for the FN corrector and $fp$ for the FP corrector). Then, the threshold selection condition is written as
$$\theta = \min_{x \in X_1} f(x)$$
If condition (13) is not satisfied, additional weight adjustments are made or a more stringent threshold is selected to ensure the required level of $\mathrm{FAR} = 0$.

2.5. Computational Complexity

Let n denote the dimensionality of the feature vector used by a corrector after PCA projection and whitening. At test time, evaluating a corrector for a single fragment requires only a small number of matrix–vector multiplications and dot products: projection to the PCA subspace, whitening, computation of distances to a few cluster centers, and evaluation of the corresponding Fisher discriminants. All these operations scale linearly with n, so the inference complexity of the corrector is $O(n)$ per fragment. This linear-time overhead is negligible compared to the cost of evaluating thousands of weak classifiers in the underlying modified Census-based Viola–Jones cascade.
In contrast, training a corrector involves estimating covariance matrices and solving small eigenvalue problems in the PCA space. These steps scale quadratically with the feature dimension, i.e., the training complexity is $O(n^2)$. Since training is performed offline and only on a relatively small correction set, this quadratic dependence is acceptable and, in fact, reflects an explicit design trade-off: we deliberately restrict ourselves to linear-time inference and moderate $O(n^2)$ training in order to maintain the high speed of the base recognition system. Many recent few-shot and model-rectification methods rely on deep backbones and iterative gradient-based adaptation at test time, which introduces significantly higher computational overhead than our lightweight linear operations.

3. Experimental Results

To evaluate the effectiveness of the proposed approach, experiments were conducted on two datasets (databases):
The first database contains images of railway tank cars, in which areas containing digital tank car identifiers are detected. The database consists of 1153 images of railcars with marked identifiers. The second database contains images of numbers on railcars. The database consists of 1067 images of identifiers with marked digits [0…9] on each. Both databases were downloaded from the Kaggle website [62].
The railway tank car dataset contains 1153 full images of tank cars with annotated rectangular regions corresponding to identifier areas. Image heights range from 480 to 1080 pixels, with substantial variability in viewing angle, background clutter (tracks, surrounding cars, infrastructure), and illumination (day/night, weather conditions). Object sizes vary from 12% to 80% of the image height. The second dataset contains 1067 cropped identifier images with bounding boxes for individual digits 0–9. Here the background is less cluttered, but the digits exhibit strong variations in font, size, and contrast, as well as occlusions and motion blur. These characteristics make both datasets challenging for classical cascade detectors and representative of industrial inspection scenarios.
For each dataset, we employ a repeated random splitting strategy with a 5:4:1 ratio between training, correction, and test subsets. Concretely, all images are first randomly partitioned into 10 approximately equal folds. For each of the 10 experimental runs, we then randomly select one fold as the test set, four folds as the correction set, and the remaining five folds as the training set. Thus, in every run the test subset is strictly independent from both the training and correction subsets used to train the detector and the correctors, while across the 10 runs, every image appears in the test set at least once, which allows us to accumulate statistics over the entire dataset.
The correction set is used only to collect fragments on which the detector makes errors. Since the base detector does not fail on all images, only a fraction of fragments in the correction folds actually correspond to false negatives or false positives. The correctors are therefore trained on a subset of objects extracted from the correction folds—namely, on those fragments where the detector produces errors. This reflects the practical regime in which only relatively few error examples are available and motivates the one-/few-shot nature of our corrector construction. We report mean Precision and Recall (and, where appropriate, standard deviations) over the 10 runs to account for variability due to random splits.
In summary, the three subsets serve distinct roles:
  • The training set is used to build the cascade detector.
  • The correction set is used to obtain training fragments and build the FN and FP correctors.
  • The test set is used exclusively to evaluate the performance of the algorithm (without and after applying correctors).
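The repeated 5:4:1 splitting protocol can be sketched as follows. For simplicity this sketch rotates the test fold deterministically (which still guarantees that every image is tested at least once across the 10 runs), whereas the paper selects folds randomly per run; the function name and signature are illustrative.

```python
import random

def repeated_splits(n_images, n_runs=10, seed=0):
    """Yield (train, correction, test) index splits in a 5:4:1 fold ratio.

    Images are shuffled once into 10 roughly equal folds; in each run one
    fold serves as the test set, four as the correction set, and the
    remaining five as the training set.
    """
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for run in range(n_runs):
        t = run % 10
        corr = {(t + k) % 10 for k in range(1, 5)}
        test = folds[t]
        correction = [j for k in corr for j in folds[k]]
        train = [j for k in range(10) if k != t and k not in corr
                 for j in folds[k]]
        yield train, correction, test
```

Each run thus keeps the test subset strictly disjoint from both the training and correction subsets, as required by the evaluation protocol.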
The experiments use standard metrics characterizing the performance of the object detector: Precision and Recall.
Pr = tp / (tp + fp),    Re = tp / (tp + fn)

where tp is the number of correctly detected objects, fp the number of false positives, and fn the number of missed objects. To determine whether a detection is true or false, a threshold of 0.5 was used for IoU, and the results for each split were averaged.
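The IoU test and the resulting Precision/Recall computation can be sketched as below. The greedy one-to-one matching of detections to ground-truth boxes is an assumption for illustration; the paper does not specify its matching strategy.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, thr=0.5):
    """Greedily match detections to ground-truth boxes at IoU >= thr,
    then compute Pr = tp/(tp+fp) and Re = tp/(tp+fn)."""
    matched = set()
    tp = 0
    for d in detections:
        best = max(range(len(ground_truth)),
                   key=lambda i: -1 if i in matched else iou(d, ground_truth[i]),
                   default=None)
        if (best is not None and best not in matched
                and iou(d, ground_truth[best]) >= thr):
            matched.add(best)
            tp += 1
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    return pr, re
```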
Table 1 shows that, for digit detection, the FN corrector increases Recall from 0.94 to 0.98 while causing only a minor decrease in Precision (0.92 → 0.91). This indicates that the corrector successfully recovers a substantial fraction of previously missed digits with almost no additional false alarms. For identifier detection, the FP corrector exhibits a different trade-off: Precision improves dramatically from 0.36 to 0.65, whereas Recall slightly decreases from 0.98 to 0.94. In practice, this behaviour is desirable in industrial railway monitoring, where false alarms (spurious identifiers) incur high manual verification costs, while missing a small fraction of identifiers can be tolerated if downstream systems include redundancy (e.g., multiple frames per car or additional OCR checks).
The detector trained on the training subset was first applied to the test set without correctors, and the metrics were calculated. Then, based on the errors (FN and FP) collected on the correction set, the corresponding correctors were trained. Retesting was subsequently conducted on the same test set with the trained correctors to evaluate their effectiveness. For digit detection, 10 different detectors were built, each localizing the corresponding digit in the image. The final results report the average metrics over all digit detectors (Table 1).
To assess the stability of the corrector's quality characteristics as a function of the fragment intersection thresholds with the markup, Precision(T) and Recall(T) curves were constructed for the two corrector types under study (Figure 6). Note that at high threshold values the corrector lacks examples for constructing the separating surface, which is related to the selected step of the sliding fragment window: fragments become less variable, which impairs the corrector's generalization to other images.
Figure 6 further analyses the dependence of corrector quality on the IoU thresholds T_hi and T_lo used for fragment selection. For the FN corrector (Figure 6a), increasing T_hi initially improves Recall but eventually leads to performance degradation once the pool of available positive fragments becomes too small. A similar pattern is observed for the FP corrector (Figure 6b): higher T_lo values reduce the diversity of background fragments, which harms generalization and leads to unstable Recall. These observations highlight the importance of balancing the strictness of the IoU thresholds against the need for diverse training data when constructing correctors.
Table 1 should be interpreted in the context of the sequential use of the two detectors for semantic scene analysis. In the first stage, an identifier detector localizes wagon identifier regions; in the second stage, a digit detector is applied only within these regions. For the identifier stage, false negatives are critical, since missing an identifier discards the whole wagon instance; for the digit stage, both false negatives and false positives directly influence the correctness of the final identifier string. The proposed correctors modify this pipeline in a complementary manner. For identifiers, the FP corrector substantially increases Precision (from 0.36 to 0.65) with a moderate decrease in Recall (from 0.98 to 0.94). For digits, the FN corrector increases Recall (from 0.94 to 0.98) with a negligible change in Precision (from 0.92 to 0.91). As a result, the effective Recall of the two-stage chain "identifier → digits", approximated by the product of stage recalls, remains essentially unchanged (about 0.92 both before and after applying the correctors), while the fraction of incorrectly recognized digits in the test set decreases from approximately 8% to approximately 6%, i.e., by about 25% in relative terms.
From the computational viewpoint, the sequential configuration with corrected detectors leads to a significant reduction in the number of fragments processed by the second stage. Increasing identifier Precision from 0.36 to 0.65 implies that the expected number of candidate identifier regions per true identifier decreases by about 45%. Since the computational cost of the digit detector is approximately proportional to the number of regions it processes, this reduction in candidates results in a decreased workload for the second stage and an overall per-frame processing time reduction on the order of 37%, while practically preserving the same end-to-end Recall of the semantic pipeline.
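The back-of-envelope estimates from the two preceding paragraphs can be reproduced directly from the Table 1 values (the product-of-recalls approximation and the 1/Precision scaling of candidate regions are the stated modelling assumptions):

```python
# Stage metrics before and after applying the correctors (Table 1).
id_pr, id_re = 0.36, 0.98          # identifier detector, no corrector
id_pr_c, id_re_c = 0.65, 0.94      # identifier detector with FP corrector
dg_pr, dg_re = 0.92, 0.94          # digit detector, no corrector
dg_re_c = 0.98                     # digit detector with FN corrector

# End-to-end recall of the "identifier -> digits" chain, approximated
# as the product of stage recalls: ~0.92 in both configurations.
recall_before = id_re * dg_re
recall_after = id_re_c * dg_re_c

# Expected candidate regions per true identifier scale as 1/Precision,
# so the second-stage workload drops by roughly 45%.
workload_reduction = 1 - id_pr / id_pr_c
```

Note that the quoted ~37% per-frame time reduction additionally depends on how the total cost splits between the two stages, which is not derivable from the table alone.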
As the T_lo threshold increases, the Recall of the FP corrector degrades while its Precision improves (Figure 6b).
The experimental results (Table 1, Figure 6) on the selected databases demonstrate the advantage of using correctors to improve detector performance. The FN corrector significantly increases Recall, which corresponds to a reduction in the number of missed objects, while the FP corrector effectively suppresses false positives, increasing Precision.

4. Discussion

This study proposes an approach for implementing correction algorithms that leverage the blessing of dimensionality in high-dimensional feature spaces. The proposed method enhances the accuracy and performance of existing detectors while maintaining computational efficiency. The corrector architecture offers several advantages that make it suitable for a wide range of applications. First, it enables the learning of new error classes without reconstructing or retraining the underlying model—an essential capability for autonomous and adaptive systems where rapid error correction is critical. Second, the design is computationally economical: once trained, the correctors introduce minimal processing overhead, as mapping new measurements to their corresponding clusters is considerably faster than retraining the base algorithm.
Promising applications of the proposed architecture include scenarios where the core model must rapidly adapt to new or rare events. In computer vision, correctors can be trained to recognize novel object types or detect production defects immediately after the first occurrence. In cybersecurity, they can identify emerging attack patterns or anomalous behaviors from single instances, enabling the generation of targeted filters for characteristic threat signals. In natural language processing, correctors may refine model outputs incrementally, adapting to new slang, misspellings, or linguistic variations as they appear.
The architecture also supports integration into multi-corrector systems, enabling the simultaneous handling of multiple error types. In such configurations, a dispatcher first assigns each input to a corresponding error cluster and routes it to the appropriate elementary corrector. Each corrector operates independently on its designated error class, ensuring modularity and scalability. As new error types emerge, additional modules can be trained on minimal data without affecting existing components.
Further architectural improvements may focus on identifying high-dimensional feature spaces that satisfy the conditions of stochastic separation theorems, facilitating the use of classical linear discriminants and hyperplane-based separation methods. Another promising direction involves combining correctors with novelty detection mechanisms—first identifying inputs that deviate from the training distribution, and then selectively activating correction modules. This integration could enhance overall stability and reduce unnecessary correction activations.
The requirement of zero false acceptance rate (FAR = 0) is enforced on the finite correction set by choosing conservative thresholds for the Fisher discriminants. While this is desirable from a safety perspective, it inevitably introduces a form of overfitting: in truly open-world deployments, previously unseen background patterns may still pass through the corrector. In practice, the FAR–FRR trade-off can be controlled by relaxing the thresholds using a separate validation set or by combining the correctors with explicit novelty detection mechanisms.
Compared to recent paradigms such as out-of-distribution (OOD) detection, test-time adaptation, and model rectification, our approach adopts a modular view: instead of modifying the base detector parameters at test time, we attach external corrector modules that operate in a high-dimensional feature space. This design avoids catastrophic forgetting and allows new error types to be incorporated via one-/few-shot training of additional modules, at the cost of maintaining a dedicated correction set and potentially multiple correctors. We believe that integrating high-dimensional correctors with OOD detectors or lightweight test-time adaptation could further improve robustness in highly non-stationary environments.

5. Conclusions

This study has presented a novel algorithm for correcting the outputs of object detectors built upon the Viola–Jones framework enhanced with a modified Census transform. The proposed method improves detection robustness and error resilience under data-limited conditions by combining sliding-window image partitioning, mean-based census transformation, and probabilistic sampling guided by dual Intersection-over-Union (IoU) thresholds. Corrector models are trained within the one- and few-shot learning paradigm, leveraging high-dimensional separability and features extracted from cascade stages of the base detector. Decision boundaries are optimized using Fisher’s criterion with adaptive thresholding to ensure zero false acceptance. Experimental evaluation confirms that the proposed correction scheme effectively compensates for classifier errors and significantly enhances detection accuracy, particularly when available training data are scarce. Future work will focus on extending the corrector architecture to multiclass detection scenarios and integrating novelty detection mechanisms for adaptive real-time performance.
The obtained results demonstrate that the proposed method for constructing correctors, based on classical separation methods in high-dimensional spaces and a modified Census transform, is an effective tool for improving the performance of object detectors, especially in situations with limited training data. The approach shows significant potential for improving the accuracy and stability of detectors built on the method of [39] and the modified Census transform.
The present study is limited by the use of two specialized railway datasets of moderate size and by the assumption of zero FAR enforced on the correction set. Future work will therefore focus on extending the evaluation to more diverse object detection benchmarks, systematically studying the FAR–FRR trade-off under different operating points, and combining the proposed correctors with modern OOD detection and test-time adaptation techniques, while preserving the linear-time inference regime that is crucial for real-time industrial applications.

Author Contributions

Conceptualization, S.V.S., A.V.K. and V.G.Y.; methodology, S.V.S. and A.V.K.; software, A.V.K., A.A.L. and O.V.S.; validation, S.V.S., A.A.L., A.V.K. and O.V.S.; formal analysis, S.V.S., A.A.L., A.V.K. and O.V.S.; investigation, A.A.L., A.V.K. and O.V.S.; resources, A.A.L., A.V.K., O.V.S. and I.V.N.; data curation, A.A.L., A.V.K., O.V.S. and I.V.N.; writing—original draft preparation, S.V.S. and A.V.K.; writing—review and editing, S.V.S., A.A.L., V.G.Y. and A.V.K.; visualization, A.V.K., A.A.L. and O.V.S.; supervision, S.V.S. and A.V.K.; project administration, S.V.S. and A.V.K.; funding acquisition, S.V.S. and A.V.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Economic Development of the Russian Federation (grant No. 139-15-2025-004 dated 17 April 2025, agreement identifier 000000C313925P3X0002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset utilized in this study was downloaded from the Kaggle website and is available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall: Hoboken, NJ, USA, 2010. [Google Scholar]
  2. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  3. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  4. Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete problems in AI safety. arXiv 2016, arXiv:1606.06565. [Google Scholar] [CrossRef]
  5. Varshney, K.R. Engineering Safety in Machine Learning. In Proceedings of the 2016 Information Theory and Applications Workshop (ITA), La Jolla, CA, USA, 31 January–5 February 2016. [Google Scholar]
  6. Cucker, F.; Smale, S. On the Mathematical Foundations of Learning. Bull. Am. Math. Soc. 2002, 39, 1–49. [Google Scholar] [CrossRef]
  7. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  8. Quiñonero-Candela, J.; Lawrence, N.D. Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  9. Moreno-Torres, J.G.; Raeder, T.; Alaiz-Rodríguez, R.; Chawla, N.V.; Herrera, F. A unifying view on dataset shift in classification. Pattern Recognit. 2012, 45, 521–530. [Google Scholar] [CrossRef]
  10. Sugiyama, M.; Krauledat, M.; Müller, K.R. Covariate shift adaptation by importance weighted cross-validation. J. Mach. Learn. Res. 2012, 8, 985–1005. [Google Scholar]
  11. Widmer, G.; Kubat, M. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 1996, 23, 69–101. [Google Scholar] [CrossRef]
  12. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 44. [Google Scholar] [CrossRef]
  13. Sahoo, S.S.; Lampert, C.H.; Martius, G. Learning equations for extrapolation and control. Int. Conf. Mach. Learn. 2018, 80, 4442–4450. [Google Scholar]
  14. Kompa, B.; Snoek, J.; Beam, A. Second opinion needed: Communicating uncertainty in medical machine learning. Npj Digit. Med. 2021, 4, 4. [Google Scholar] [CrossRef]
  15. Vapnik, V. Statistical Learning Theory; Wiley: Hoboken, NJ, USA, 1998. [Google Scholar]
  16. Kendall, A.; Gal, Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  17. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation. Int. Conf. Mach. Learn. 2016, 48, 1050–1059. [Google Scholar]
  18. Der Kiureghian, A.; Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 2009, 31, 105–112. [Google Scholar] [CrossRef]
  19. Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inference 2000, 90, 227–244. [Google Scholar] [CrossRef]
  20. Liu, C.; Tang, K.; Qin, Y.; Lei, Q. Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies. arXiv 2025, arXiv:2505.22829. [Google Scholar] [CrossRef]
  21. Geirhos, R.; Jacobsen, J.H.; Michaelis, C.; Zemel, R.; Brendel, W.; Bethge, M.; Wichmann, F.A. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2020, 2, 665–673. [Google Scholar] [CrossRef]
  22. Ong Ly, C.; Unnikrishnan, B.; Tadic, T.; Patel, T.; Duhamel, J.; Kandel, S.; Moayedi, Y.; Brudno, M.; Hope, A.; Ross, H.; et al. Shortcut learning in medical AI hinders generalization: A method for estimating AI model generalization without external data. Npj Digit. Med. 2024, 7, 224. [Google Scholar] [CrossRef]
  23. Hendrycks, D.; Basart, S.; Mu, N.; Kadavath, S.; Wang, F.; Dorundo, E.; Desai, R.; Zhu, T.; Parajuli, S.; Guo, M.; et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021. [Google Scholar]
  24. Zhou, S.; Liu, C.; Ye, D.; Zhu, T.; Zhou, W.; Yu, P. Adversarial Attacks and Defenses in Deep Learning: From a Perspective of Cybersecurity. ACM Comput. Surv. 2022, 55, 1–39. [Google Scholar] [CrossRef]
  25. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  26. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  27. Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; Madry, A. Adversarial Examples Are Not Bugs, They Are Features. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  28. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization. Psychol. Rev. 1958, 65, 386–408. [Google Scholar] [CrossRef]
  29. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  30. Deisenroth, M.; Rasmussen, C. PILCO: A Model-Based and Data-Efficient Approach to Policy Search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
  31. Chua, K.; Calandra, R.; McAllister, R.; Levine, S. Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  32. French, R. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 1999, 3, 128–135. [Google Scholar] [CrossRef]
  33. Gorban, A.; Tyukin, I.; Romanenko, I. The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit. IFAC-PapersOnLine 2016, 49, 64–69. [Google Scholar] [CrossRef]
  34. Gorban, A.; Tyukin, I. Stochastic Separation Theorems. Neural Netw. 2017, 94, 255–259. [Google Scholar] [CrossRef]
  35. Gorban, A.; Golubkov, A.; Grechuk, B.; Mirkes, E.; Tyukin, I. Correction of AI systems by linear discriminants: Probabilistic foundations. Inf. Sci. 2018, 466, 303–322. [Google Scholar] [CrossRef]
  36. Gorban, A.; Grechuk, B.; Mirkes, E.; Stasenko, S.; Tyukin, I. High-dimensional separability for one-and few-shot learning. Entropy 2021, 23, 1090. [Google Scholar] [CrossRef]
  37. Gorban, A.; Tyukin, I. Blessing of dimensionality: Mathematical foundations of the statistical physics of data. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2018, 376, 20170237. [Google Scholar] [CrossRef]
  38. Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087. [Google Scholar]
  39. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 3630–3640. [Google Scholar]
  40. Pan, S.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  41. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Int. Conf. Mach. Learn. 2017, 70, 1126–1135. [Google Scholar]
  42. Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-Learning in Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5149–5169. [Google Scholar] [CrossRef] [PubMed]
  43. Anderson, J.; Belkin, M.; Goyal, N.; Rademacher, L.; Voss, J. The More, the Merrier: The Blessing of Dimensionality for Learning Large Gaussian Mixtures. Proc. Mach. Learn. Res. 2014, 35, 1135–1164. [Google Scholar]
  44. Kainen, P. Utilizing Geometric Anomalies of High Dimension: When Complexity Makes Computation Easier. In Computer Intensive Methods in Control and Signal Processing: The Curse of Dimensionality; Birkhäuser: Boston, MA, USA, 1997; pp. 283–294. [Google Scholar]
  45. Tyukin, I.; Gorban, A.; Alkhudaydi, M.; Zhou, Q. Demystification of Few-Shot and One-Shot Learning. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–7. [Google Scholar]
  46. Gorban, A.; Makarov, V.; Tyukin, I. The unreasonable effectiveness of small neural ensembles in the high-dimensional brain. Phys. Life Rev. 2019, 29, 55–88. [Google Scholar] [CrossRef]
  47. Xin, Z.; Chen, S.; Wu, T.; Shao, Y.; Ding, W.; You, X. Few-shot object detection: Research advances and challenges. Inf. Fusion 2024, 107, 102307. [Google Scholar] [CrossRef]
  48. Liu, T.; Zhang, L.; Wang, Y.; Guan, J.; Fu, Y.; Zhao, J.; Zhou, S. Recent Few-Shot Object Detection Algorithms: A Survey with Performance Comparison. ACM Trans. Intell. Syst. Technol. 2023, 14, 66. [Google Scholar] [CrossRef]
  49. Huang, G.; Laradji, I.; Vázquez, D.; Rodriguez, P. A Survey of Self-Supervised and Few-Shot Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4071–4089. [Google Scholar] [CrossRef]
  50. Liang, J.; He, R.; Tan, T. A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts. Int. J. Comput. Vis. 2025, 133, 31–64. [Google Scholar] [CrossRef]
  51. Zhao, H.; Liu, Y.; Alahi, A.; Lin, T. On Pitfalls of Test-Time Adaptation. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; pp. 42058–42080. [Google Scholar]
  52. Yang, J.; Zhou, K.; Li, Y.; Liu, Z. Generalized Out-of-Distribution Detection: A Survey. Int. J. Comput. Vis. 2024, 132, 5635–5662. [Google Scholar] [CrossRef]
  53. Theunissen, L.; Mortier, T.; Saeys, Y.; Waegeman, W. Evaluation of out-of-distribution detection methods for data shifts in single-cell transcriptomics. Brief. Bioinform. 2025, 26, bbaf239. [Google Scholar]
  54. Wang, Q.; Gao, Y.; Chen, J.; Sebe, N. FOSTER: Feature Boosting and Compression for Class-Incremental Learning. In Proceedings of the Computer Vision—ECCV 2022: 17th European Conference On Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 398–414. [Google Scholar]
  55. Luo, J.; Peng, J.; Zhang, J.; Zhang, R.; Han, J.; Liu, J. Class-Incremental Exemplar Compression for Class-Incremental Learning. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 11371–11380. [Google Scholar]
  56. Ferdinand, M.; Garcia Cifuentes, A.; Faltings, B. Feature Expansion and Enhanced Compression for Class-Incremental Learning. Neurocomputing 2025, 610, 128782. [Google Scholar]
  57. Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; pp. I-511–I-518. [Google Scholar]
  58. Fröba, B.; Ernst, A. Face Detection with the Modified Census Transform. In Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Republic of Korea, 19 May 2004; pp. 91–96. [Google Scholar]
  59. Freund, Y.; Schapire, R. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar]
  60. Tao, C. Unsupervised Fuzzy Clustering with Multi-Center Clusters. Fuzzy Sets Syst. 2002, 128, 305–322. [Google Scholar] [CrossRef]
  61. Grechuk, B.; Gorban, A.; Tyukin, I. General Stochastic Separation Theorems with Optimal Bounds. Neural Netw. 2021, 138, 33–56. [Google Scholar] [CrossRef] [PubMed]
  62. Li, S.; Nuidel, I.; Shemagina, O.; Stasenko, S.; Kovalchuck, A. Railway. 2025. Available online: https://www.kaggle.com/datasets/nuidelirina/railway (accessed on 10 September 2025).
Figure 1. Schematic of a cascade detector with a modified Census transform.
Figure 2. General scheme for constructing correctors during detector operation.
Figure 3. Cumulative share of rejected FN-related candidates over cascade stages, averaged over all digit detectors obtained in our experiments. Approximately 83 % of FN-related candidates are rejected within the first three stages.
Figure 4. Visualization of fragment intersection metric calculation.
Figure 5. Visualization of the fragment collection process for constructing correctors.
Figure 6. (a) Dependence of the FN corrector’s quality on the upper intersection threshold of fragments with the ’Digit’-based markup for a fixed T l o (0.15). (b) Dependence of the FP corrector quality on the upper threshold of fragment intersection with the ’Identifiers’ database labeling for a fixed T h i (0.75).
Table 1. Detector test results.
Database      Configuration         Precision   Recall
Numbers       Without Correctors    0.92        0.94
Numbers       FN Correctors (1)     0.91        0.98
Identifiers   Without Correctors    0.36        0.98
Identifiers   FP Correctors (5)     0.65        0.94
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kovalchuk, A.V.; Lebedev, A.A.; Shemagina, O.V.; Nuidel, I.V.; Yakhno, V.G.; Stasenko, S.V. Enhancing Cascade Object Detection Accuracy Using Correctors Based on High-Dimensional Feature Separation. Technologies 2025, 13, 593. https://doi.org/10.3390/technologies13120593


