This section presents an active learning algorithm that integrates adversarial training with feature fusion for boundary data samples. This method employs adversarial training to enhance model robustness, thereby strengthening performance when processing noisy and complex data. Simultaneously, the active learning strategy reduces the required number of training samples, lowering overall training costs. Specifically, in our boundary data feature fusion approach for active learning, samples selected in each round are initially augmented through adversarial training to generate adversarial counterparts. These adversarial samples are subsequently merged with the existing labeled pool, enabling the model to fully exploit the augmented data during updates.
4.1. FGSM Adversarial Training
Adversarial training serves as a method to improve model generalization by incorporating adversarial samples into the training dataset. This approach compels the model to learn from these challenging examples, thereby improving its ability to defend against adversarial perturbations. The Fast Gradient Sign Method (FGSM) is one of the most widely used techniques for generating adversarial samples, and FGSM adversarial training is a training methodology that employs FGSM to generate adversarial samples and incorporate them into the training set [33].
FGSM is an algorithm that efficiently generates adversarial perturbations by computing the gradient of the loss function with respect to the input. The fundamental principle involves applying a small perturbation along the direction of the loss function’s gradient to input samples, thereby causing the model to produce erroneous predictions. This perturbation is computed individually for each input sample, making it inherently sample-specific [34]. The FGSM generation process follows these steps:
Calculate the gradient: For each input sample and its corresponding label, we first compute the gradient of the loss function with respect to the input:

$$g = \nabla_x J(\theta, x, y),$$

where $J(\theta, x, y)$ represents the loss function with model parameters $\theta$, input sample $x$, and true label $y$. This gradient indicates the direction in which small changes to the input would most significantly increase the loss.
Generate adversarial perturbations: Using the computed gradient, FGSM takes the sign of the gradient (its elementwise direction) and adds a perturbation along that direction, with the magnitude controlled by a small constant:

$$\eta = \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big),$$

where $\mathrm{sign}(\cdot)$ is the sign function that extracts the sign of each element in the gradient vector, and $\epsilon$ is the hyperparameter that controls the perturbation magnitude. This formula applies a small perturbation to the input sample along the direction of the loss function’s gradient. The sign function ensures that the perturbation moves in the direction that would maximally increase the loss, while $\epsilon$ bounds the perturbation size so that the adversarial sample remains similar to the original input.
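To make these two steps concrete, the following minimal PyTorch sketch generates an FGSM adversarial example. The cross-entropy loss, the [0, 1] pixel range, and the function name fgsm_perturb are illustrative assumptions rather than the paper’s exact implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Generate FGSM adversarial examples for inputs x with labels y.

    A minimal sketch: the cross-entropy loss and the [0, 1] input
    range are assumptions, not the paper's exact configuration.
    """
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    loss.backward()                              # fills x.grad with grad_x J
    eta = epsilon * x.grad.sign()                # eta = epsilon * sign(grad_x J)
    x_adv = (x + eta).clamp(0.0, 1.0).detach()   # keep the sample in the valid range
    return x_adv
```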
The generated adversarial samples are incorporated alongside the original samples during the training process, enabling the model to learn correct predictions when confronted with adversarially perturbed inputs. This approach enhances the model’s robustness by exposing it to challenging examples that lie near the decision boundary.
In adversarial training, the training process comprises two complementary components. First, clean-sample training follows the traditional approach of using the original data for model training. Second, adversarial-sample training incorporates adversarial samples generated using FGSM into the training data. During each training step, a batch of data is selected from the training set, where each sample consists of an input x and its corresponding label y. FGSM is then applied to generate adversarial perturbations for each sample, producing the corresponding adversarial examples.
The training procedure calculates losses for both the original samples and their adversarial counterparts and combines them for backpropagation to update the model parameters:

$$\mathcal{L}_{total} = \lambda \, J(\theta, x, y) + (1 - \lambda) \, J(\theta, x_{adv}, y),$$

where $J(\theta, x, y)$ represents the loss computed on the original sample, $J(\theta, x_{adv}, y)$ denotes the loss computed on the adversarial sample, and $\lambda$ is the mixing coefficient described below. This combined loss function ensures that the model simultaneously learns to make correct predictions on both normal and adversarially perturbed inputs.
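A sketch of one such training step is given below, reusing the fgsm_perturb sketch above. The convex $\lambda$-weighting and the default lam=0.5 are assumptions consistent with the mixing coefficient discussed in the next paragraph, not the paper’s tuned values.

```python
def adversarial_training_step(model, optimizer, x, y, epsilon, lam=0.5):
    """One training step on both clean and FGSM-perturbed inputs.

    lam weights the clean loss against the adversarial loss; its
    default here is illustrative, not the paper's tuned setting.
    """
    x_adv = fgsm_perturb(model, x, y, epsilon)    # adversarial counterparts
    optimizer.zero_grad()                         # clear stale gradients
    loss_clean = F.cross_entropy(model(x), y)     # loss on original samples
    loss_adv = F.cross_entropy(model(x_adv), y)   # loss on adversarial samples
    loss = lam * loss_clean + (1.0 - lam) * loss_adv
    loss.backward()                               # backpropagate the combined loss
    optimizer.step()
    return loss.item()
```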
For all hyperparameters, unless otherwise specified, we adopt the following settings in all experiments. FGSM perturbation magnitude $\epsilon$: for image inputs we search over values including 2/255, 4/255, and 8/255, report results for the selected value in each experiment, and use a fixed default unless otherwise noted. Loss mixing coefficient $\lambda$: we weight the clean and adversarial losses as a $\lambda$-mixed combination (corresponding to Equation (14)) and tune $\lambda$ over a small range around its default. Number of neighbors $k$: for modules that require neighbor retrieval (e.g., k-NN consistency regularization or neighborhood-based augmentation, when applicable in our pipeline), we select $k$ from a small candidate set with a fixed default.
FGSM adversarial training constitutes an effective method for improving model robustness. By generating adversarial samples and incorporating them into the training data, this approach enhances the model’s ability to adapt to input perturbations, enabling the model to maintain performance when confronted with adversarial examples during inference.
4.2. Adversarial Training for Sample Selection
This method combines active learning and adversarial training to enhance model stability and performance through the following process:
First, the trained model performs forward propagation to extract feature representations from the final layer for each sample, as expressed in Equation (3). We then select the most representative samples by computing pairwise similarity using a combined Manhattan and Euclidean distance metric, as shown in Equation (6). This combination leverages the strengths of both metrics to achieve stable similarity calculations, particularly for features with varying scales.
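As an illustration, the combined metric can be computed over a batch of penultimate-layer features as follows. The equal weighting of the two distance terms is an assumption; Equation (6) defines the exact form used in the paper.

```python
def combined_distance(features):
    """Pairwise Manhattan + Euclidean distances over feature rows.

    features: tensor of shape (N, d). Equal weighting of the two
    metrics is an assumption; see Equation (6) for the exact form.
    """
    d_l1 = torch.cdist(features, features, p=1)  # Manhattan (L1) distances
    d_l2 = torch.cdist(features, features, p=2)  # Euclidean (L2) distances
    return d_l1 + d_l2                           # combined distance matrix
```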
After identifying the n most similar samples through inter-sample similarity calculations, we merge them using the MixUp fusion method to generate new training instances that enhance model generalization, as shown in Equation (7). The model evaluates classification confidence and returns the corresponding index list for active learning selection.
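A minimal sketch of the fusion step follows, assuming the standard MixUp formulation with the mixing coefficient drawn from Beta(α, α); the alpha=1.0 default is illustrative, and Equation (7) gives the exact fusion rule.

```python
def mixup_pair(x_i, x_j, alpha=1.0):
    """Fuse two samples with MixUp; alpha=1.0 is an illustrative default."""
    lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing coefficient
    return lam * x_i + (1.0 - lam) * x_j                   # interpolated sample
```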
Following sample selection by the boundary data feature fusion algorithm, the original images undergo forward propagation through the neural network to obtain prediction results via the fully connected layer. FGSM then calculates the gradient of the loss function with respect to the input image through backpropagation, revealing how each pixel should be modified to maximize the loss. Larger gradient magnitudes indicate greater pixel impact on the loss function.
The perturbation is computed and adversarial samples are generated according to

$$\eta = \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big),$$

where $\epsilon$ represents the perturbation step size, $\nabla_x J(\theta, x, y)$ denotes the gradient of the loss function with respect to input sample $x$, $J$ is the model’s loss function, and $\mathrm{sign}(\cdot)$ is the sign function. The parameter $\epsilon$ determines the perturbation magnitude: smaller values are used for simpler tasks such as MNIST or SVHN, while larger values are selected for complex tasks such as CIFAR-10. Larger perturbations make training for robust performance more challenging.
The calculated perturbation is added to the original sample to generate the adversarial sample:

$$H = x + \eta,$$

where $H$ represents the adversarial sample resulting from adding the perturbation $\eta$ to the original input $x$.
Algorithm overview: We work with a model $f_\theta$ that iteratively improves using an unlabeled pool $U$ and a labeled set $L$. For each $x \in U$, we extract a feature embedding $z_x$ from the model’s penultimate layer and measure pairwise similarity using the combined distance $D = D_{L1} + D_{L2}$. Each example $x$ has a neighborhood consisting of its top-$n$ most similar peers under $D$. We synthesize interpolated examples using MixUp with $\tilde{x} = \lambda x_i + (1 - \lambda) x_j$ and $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ to probe the decision boundary and calibrate uncertainty.
Based on uncertainty $u(x)$, we select the $B$ most uncertain samples to form a batch $\mathcal{B}$ for labeling, obtaining labels $y$. For each selected sample $x \in \mathcal{B}$, we create an adversarial counterpart using FGSM: we form a pseudo-label $\hat{y}$ from the model’s current prediction, compute the input gradient $\nabla_x J(\theta, x, \hat{y})$, and craft a perturbation $\eta = \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, \hat{y}))$, producing the adversarial example $x^{adv} = x + \eta$ (clipped to the valid input range). The labeled set is augmented with both clean and adversarial pairs $(x, y)$ and $(x^{adv}, y)$ for $x \in \mathcal{B}$.
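The sketch below illustrates one selection-and-augmentation round under these definitions, reusing fgsm_perturb from Section 4.1. Predictive entropy stands in for the paper’s confidence-based uncertainty score, and oracle is a hypothetical labeling function; both are assumptions.

```python
def select_and_augment(model, unlabeled, budget, epsilon, oracle):
    """One round of uncertainty-based selection plus FGSM augmentation.

    Entropy is an assumed stand-in for the paper's confidence score;
    `oracle` is a hypothetical function returning true labels.
    """
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    idx = entropy.topk(budget).indices            # B most uncertain samples
    x_sel = unlabeled[idx]
    y_pseudo = model(x_sel).argmax(dim=1)         # pseudo-labels for FGSM
    x_adv = fgsm_perturb(model, x_sel, y_pseudo, epsilon)
    y_true = oracle(idx)                          # query true labels
    return (x_sel, y_true), (x_adv, y_true)       # clean and adversarial pairs
```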
Training minimizes the combined objective $\mathcal{L} = \lambda \, J(\theta, x, y) + (1 - \lambda) \, J(\theta, x^{adv}, y)$, where $\lambda$ balances clean accuracy and robustness, and $\epsilon$ controls perturbation strength. This cycle of feature extraction, neighborhood identification, uncertainty-based selection, and adversarial augmentation repeats until convergence. By selecting samples that are uncertain and lie in dense feature regions, then training on their adversarial variants, the algorithm enhances model robustness while reducing labeling costs, as shown in Algorithm 2.
| Algorithm 2 Distance-Measured Data Mixing with Adversarial Training (DM2-AT) |

1: Input: Model $f_\theta$, unlabeled pool $U$, labeled pool $L$, batch size $B$, neighbor count $n$, MixUp parameter $\alpha$, FGSM step size $\epsilon$
2: Output: Trained model $f_\theta$
3: Begin:
4: while model has not converged do
5:   Extract features $z_x = f_\theta(x)$ for all $x \in U$.
6:   For each $x \in U$, find its top-$n$ neighbors using the L1+L2 distance.
7:   Generate a synthetic set via MixUp on pairs from $U$ and their neighbors.
8:   Score the synthetic set with model uncertainty to select the $B$ most uncertain original samples $\mathcal{B}$.
9:   for each $x \in \mathcal{B}$ do
10:    $\hat{y} \leftarrow \arg\max f_\theta(x)$ // $\hat{y}$ is the model’s predicted label
11:    $x^{adv} \leftarrow x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, \hat{y}))$
12:  end for
13:  Query true labels $y$ for the selected samples $x \in \mathcal{B}$.
14:  Update $L \leftarrow L \cup \{(x, y)\} \cup \{(x^{adv}, y)\}$.
15:  Retrain $f_\theta$ on $L$ using the combined loss for original and adversarial samples.
16: end while
17: Return $f_\theta$
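Tying the sketches together, one DM2-AT iteration might look like the following. The list-of-batches representation of the labeled pool and the function names are simplifying assumptions, not the paper’s implementation.

```python
def dm2_at_round(model, optimizer, unlabeled, labeled, budget,
                 epsilon, alpha, oracle):
    """One DM2-AT iteration: select, augment, and retrain (a sketch).

    `labeled` is assumed to be a list of (x_batch, y_batch) tuples.
    """
    clean_pairs, adv_pairs = select_and_augment(
        model, unlabeled, budget, epsilon, oracle)
    labeled += [clean_pairs, adv_pairs]           # grow the labeled pool
    for x, y in labeled:                          # retrain with the combined loss
        adversarial_training_step(model, optimizer, x, y, epsilon)
    return model
```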