Article

Supervised Feature Selection Method Using Stackable Attention Networks

1 HUANENG Power International Inc., Beijing 100031, China
2 School of Computer Science and Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(22), 3703; https://doi.org/10.3390/math13223703
Submission received: 10 October 2025 / Revised: 1 November 2025 / Accepted: 5 November 2025 / Published: 18 November 2025

Abstract

Mainstream DNN-based feature selection methods share a similar design strategy: employing one specially designed feature selection module to learn feature importance along with the model-training process. While these works achieve great success in feature selection, their shallow structures, which evaluate feature importance from a single perspective, are easily disturbed by noisy samples, especially in datasets with high-dimensional features and complex structures. To alleviate this limitation, this paper introduces a Stackable Attention architecture for Feature Selection (SAFS), which calculates stable and accurate feature weights through a set of Stackable Attention Blocks (SABlocks) rather than a single module. To avoid information loss from stacking, a feature jump concatenation structure is designed. Furthermore, an inertia-based weight update method is proposed to generate a more robust feature weight distribution. Experiments on twelve real-world datasets spanning multiple domains demonstrate that SAFS produces the best results, with significant performance margins over thirteen baselines.
MSC:
54C30; 54E35; 54H30

1. Introduction

With the rapid advancement of the Internet of Things (IoT) and industrial automation systems, both the number and dimensionality of data samples are increasing at an unprecedented rate [1]. Feature selection (FS) aims to identify a subset of relevant features that contribute most to the supervisory objective, thereby improving model interpretability and robustness [2]. Although numerous methods have been proposed to address this classical problem [3,4], many traditional approaches rely on the computation of global metrics or functions, making them increasingly challenging to apply in the era of high-dimensional and large-volume data [5].
In recent years, Deep Neural Networks (DNNs) have emerged as a prominent research direction in feature selection, owing to their ability to capture complex feature interactions and alleviate the “curse of dimensionality and volume.” Significant attention has been directed toward designing effective feature weighting mechanisms. For instance, ref. [6] introduced Deep Feature Selection (DFS), which employs a sparse one-to-one linear layer whose weights are directly used as feature weights—though this design remains susceptible to noise. To mitigate this issue, ref. [7] utilized activation potentials from individual input dimensions as a selection criterion. Ref. [8] pioneered the integration of attention mechanisms into feature selection (referred to as AFS), where feature weights are generated via a soft attention network. Subsequently, ref. [9] enhanced the attention network by incorporating multi-head self-attention for more stable performance. In a different vein, ref. [10] adopted stochastic gates—continuous relaxations of the Bernoulli distribution optimized via gradient descent—to identify relevant features. These methods generally rely on specific modules, such as neural weights [6,11], activation potentials [7], attention weights [8], or stochastic gates [10], to determine salient features. However, as our experimental results indicate, such voting-based architectures often struggle to capture the intricate feature–label relationships necessary for effective feature selection.
Recent FS research has primarily focused on two directions: (1) developing robust feature weighting mechanisms—e.g., ref. [12] applied batch-normalized attention to improve weight stability; (2) leveraging self-supervised learning to model data distribution—e.g., refs. [13,14] employed multi-task autoencoders to denoise input data and facilitate feature selection, while ref. [15] adopted similarity-based graph neural networks to extract structural information from unlabeled data. Notably, however, these approaches still fundamentally depend on conventional feature-wise weight generation layers. We argue that such a constrained architectural design significantly limits feature selection potential, particularly when dealing with high-dimensional data and diverse label spaces.
  • Motivation. The design of SAFS is grounded in the principles of ensemble learning and multi-perspective learning. Ensemble methods demonstrate that combining multiple weak learners yields a stronger, more accurate model. In feature selection, however, a single feature selection module (a “weak selector”) can be misled by noisy samples or complex feature interactions. This raises a key question: instead of relying on a single module, can multiple selectors work collaboratively to produce more robust and discriminative feature weights?
To address this, SAFS employs a stacking architecture where multiple selection modules are cascaded, forming a strong selector. Each successive layer refines the feature weights from its predecessor, effectively performing multi-round, collaborative voting on feature importance. The approach is theoretically supported by the bias-variance trade-off; stacking reduces variance in feature weight estimation, leading to more stable and reliable feature subsets. Furthermore, inspired by deep residual networks [16], which show that stacking non-linear transformations with skip connections helps model complex relationships while easing optimization, we stack SABlocks with feature jump connections. Additional mechanisms are incorporated to mitigate information loss during stacking and to counteract the impact of noisy data, thereby supporting more stable weight updates.
The main contributions of this paper are summarized as follows:
  • Beyond one-layer structures for feature selection: Rather than using a one-layer structure for feature weight generation, SAFS generates feature weights collaboratively from a set of stacked SABlocks. This structure enlarges the differentiation between essential and less important features and further reduces the weights of features that are only accidentally relevant.
  • Key Designs of SAFS: This research proposes a set of crucial designs to support stackable feature weight generation: (D1) The SABlock is a new type of feature weight generation module that can be easily stacked for collaborative feature weight generation; this design captures complex and intricate feature-label interactions while generating robust feature weights. (D2) "Feature jump concatenation" allows a stacked SABlock to access the original feature values rather than only processed data, avoiding error accumulation. (D3) "Inertia-based weight updating" is proposed: similar to update strategies in reinforcement learning, feature weights are updated by leveraging both current and historical weights to achieve stable weight generation.
  • Excellent experimental results: Our extensive experiments have been performed on twelve real-world datasets to validate our design. The results show that SAFS identifies the most relevant features that provide the best performance compared to thirteen state-of-the-art baselines.

2. Related Work

Feature selection, as a fundamental problem in machine learning, aims to identify the most relevant feature subset for a given learning task. Existing methods can be broadly categorized into three paradigms: filter methods, wrapper methods, and embedded methods [17]. Among these, embedded methods have gained increasing attention due to their ability to integrate feature selection within model training, particularly with the advancement of deep learning techniques. In this section, we systematically review the evolution of feature selection methods and highlight how our proposed SAFS framework addresses key limitations in existing approaches.

2.1. Traditional Feature Selection Methods

Traditional feature selection methods typically rely on handcrafted criteria to evaluate feature relevance. These include
  • Similarity-based methods, such as Fisher Score [18], which select features that maximize inter-class separation and minimize intra-class variance.
  • Information-theoretic approaches, like mRMR [19], which consider both feature relevance and redundancy using mutual information.
  • Regularization-based techniques, exemplified by LASSO [20], that induce sparsity through L1-norm regularization.
While effective in low-dimensional settings, these methods face significant challenges with high-dimensional and large-volume datasets due to their reliance on global metric computations [5]. Recent traditional methods have attempted to address these limitations by incorporating data structural information, such as neighborhood rough sets [21], multi-center and local structure learning [22], and feature dependency analysis [23]. However, their ability to capture complex, non-linear feature interactions remains limited, which motivates the need for more expressive deep learning-based approaches.

2.2. DNN-Based Feature Selection Methods

The emergence of deep neural networks has revolutionized feature selection by enabling the learning of complex feature interactions directly from data. DNN-based methods typically incorporate a custom-designed feature selection module within the network architecture:
  • Weight-based methods: DFS [6] introduced a sparse one-to-one linear layer where network weights directly represent feature importance. This approach, however, is susceptible to noise interference.
  • Activation-based methods: Roy et al. [7] used activation potentials contributed by individual input dimensions as selection metrics.
  • Attention mechanisms: AFS [8] pioneered the use of soft attention networks for feature weighting, while SANs [9] extended this with multi-head self-attention for improved stability.
  • Stochastic approaches: STG [10] employed continuous relaxations of Bernoulli distributions (stochastic gates) trained via gradient descent.
  • Architectural innovations: FIR [4] proposed a dual-network architecture with separate selector and operator networks, while FM [12] incorporated batch-wise attention to attenuate noise.
More recent developments include external attention-based feature rankers [24], sequential attention networks [25], and neurodynamics-based approaches [26]. FsNet [27] introduced a specialized selection layer for high-dimensional biological data.
Despite these advancements, a fundamental limitation persists across most DNN-based methods: their reliance on a single feature selection module. This monolithic architecture restricts their capacity to capture the complex, multi-faceted nature of feature relevance in high-dimensional spaces. When faced with noisy samples or accidental feature correlations, these single-perspective approaches are prone to generating suboptimal feature weights. It is precisely this limitation that motivates our stackable architecture in SAFS, which enables collaborative feature evaluation from multiple perspectives.

2.3. Self-Supervised and Semi-Supervised Methods

Recent research has recognized the importance of leveraging unlabeled data to improve feature selection performance. These methods employ various techniques to extract structural information from tabular data:
  • Autoencoder-based approaches: SEFS [13] and A-SFS [14] utilize multi-task autoencoders to learn latent relationships in unlabeled data, reducing noise in the original features.
  • Graph-based methods: Tan et al. [15] employed similarity-based graph neural networks to capture structural information in unlabeled data.
  • Other techniques: These include clustering assumptions [28], fuzzy relevance and redundancy analysis [29,30], heuristic functions [31], and adaptive structure learning [32,33].
While these methods demonstrate the value of leveraging unlabeled data and have shown promise in capturing data structures, they largely inherit the fundamental architectural limitations of their supervised counterparts. Most self-supervised approaches still rely on single-module feature weighting mechanisms and focus primarily on data preprocessing or auxiliary tasks, without addressing the core challenge of robust feature weight generation in the presence of noise and complex feature interactions.
This gap in the literature highlights the need for a more fundamental rethinking of feature selection architectures. Our SAFS framework addresses this by proposing a novel stackable architecture that can work in conjunction with both supervised and self-supervised paradigms. The feature jump concatenation and inertia-based weight updating mechanisms in SAFS provide a more robust foundation for feature weighting that can potentially enhance existing self-supervised methods when combined with our architecture.
In summary, while existing methods have made significant progress in feature selection, they largely overlook the potential of multi-perspective, collaborative feature evaluation. SAFS represents a paradigm shift from single-module architectures to a stackable framework that can generate more robust and discriminative feature weights through layered processing and historical weight integration.

3. Our Proposed Model

3.1. Notation and Problem Formulation

Generally, this paper denotes matrices by uppercase bold characters (e.g., $\mathbf{A}$), vectors by lowercase bold (e.g., $\mathbf{a}$), and scalars by lowercase (e.g., $a$). Some important notations are described in Table 1. The tabular dataset is defined as $\mathbf{X} \in \mathbb{R}^{n \times m}$ with $n$ samples and $m$ dimensions (features); the $i$-th sample is denoted $\mathbf{x}_i$, the $i$-th feature is $\mathbf{X}_i$, and the $j$-th feature of the $i$-th sample is $x_{ij}$. All samples in $\mathbf{X}$ are labeled, corresponding to $\mathbf{y} = (y_1, y_2, \ldots, y_n)$.
Given realizations of an unknown data distribution $P(\mathbf{X}, \mathbf{y})$, the goal of feature selection is to select a subset of $k$ indices $v \subset [m]$, $|v| = k$, such that $\mathbf{X}_v$ best predicts the target $\mathbf{y}$. Thus, the objective of embedded feature selection can be stated as the following optimization problem:
$$\min_{v \subset [m]} \; \mathbb{E}_{P(\mathbf{X}, \mathbf{y})} \big[ L\big(f(\mathbf{X}_v), \mathbf{y}\big) \big] \quad (1)$$
where $L(\cdot)$ is the loss function (e.g., cross-entropy for classification) and $f(\cdot)$ is a predictive model that maps the selected features $\mathbf{X}_v$ to the target $\mathbf{y}$. The expectation $\mathbb{E}_{P(\mathbf{X}, \mathbf{y})}$ is taken over the underlying data distribution, emphasizing that the goal is to find a feature subset that generalizes well. Solving this optimization directly is intractable due to the combinatorial nature of the subset selection over $v \subset [m]$. Embedded methods such as SAFS circumvent this by jointly learning the feature weights (which implicitly define $v$) and the model parameters through continuous optimization, making the process computationally feasible.
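To make the intractability concrete, the following Python sketch (all sizes hypothetical) contrasts exhaustive subset search with the continuous-weight shortcut that embedded methods rely on:

```python
import math
import random

# Hypothetical sizes for illustration only.
m, k = 20, 5

# Exact subset search must score every candidate v ⊂ [m] with |v| = k:
# already C(20, 5) = 15,504 subsets for a tiny 20-feature problem.
n_subsets = math.comb(m, k)
print(n_subsets)

# Embedded methods instead learn one continuous weight per feature and
# rank features by weight, reducing selection to a single sort.
random.seed(0)
weights = [random.random() for _ in range(m)]  # stand-in for learned weights
v = sorted(range(m), key=lambda j: weights[j], reverse=True)[:k]
assert len(v) == k and len(set(v)) == k
```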

3.2. Design Principle

To accurately calculate feature weights through the collaboration of multiple FS modules, an appropriate architecture for collaborative weight generation is needed. One natural solution sequentially stacks several feature selection modules and uses their multiplied weights as the final result; another ensembles several modules in parallel and uses their averaged weights. Both designs involve trade-offs. In the sequential structure, each feature selection module acts as a filter that assigns a weight to every feature. Thanks to its multiplicative nature, this structure filters out irrelevant features hierarchically and widens the gaps between feature weights. Because feature selection usually targets only a small fraction of the most relevant features (often just 1–3% of the total), such widened gaps clearly benefit selection. However, the sequential structure is vulnerable to even small errors: if any module accidentally generates wrong feature weights, those errors propagate to its downstream modules. In the parallel structure, by contrast, each module accesses the inputs directly and is not influenced by the others, but without inter-module refinement its feature selection capability can be impaired.

3.3. SAFS Architecture

Thus, rather than choosing one of the two, SAFS adopts a hybrid structure combining both. Figure 1 illustrates the global structure of SAFS. The basic feature selection block is the SABlock, which evaluates feature importance. From a weight-generation perspective, the SABlocks are placed sequentially; however, in addition to the output of the previous SABlock, each block can directly access the original inputs to avoid possible information loss. For collaborative weight generation, an "inertia-based collaborative weight updating" mechanism is also designed, adapted from deep reinforcement learning.
  • The Stackable Attention Block. This research introduces the basic building block of SAFS, named SABlock, which assigns an individual weight to each feature. Its structure is shown in the left part of Figure 2. The SABlock consists of two fully connected (FC) layers and one batch normalization layer. In each iteration, a SABlock receives a batch-wise input $\mathbf{X}_B = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_B]^T$ sampled from the training dataset $\mathbf{X}$, where $B$ is the batch size. FC layer 1 is defined as Equation (2):
$$\mathbf{H} = \boldsymbol{\Theta}_1 \cdot \mathbf{X}_B + \mathbf{c}_1 \quad (2)$$
where $\boldsymbol{\Theta}_1$ is a trainable weight matrix and $\mathbf{c}_1$ is a bias vector. This layer condenses the original $m$-dimensional data into embeddings $\mathbf{H}$ in a smaller space with $|h| < m$ ($|h|$ denotes the number of hidden units), reducing possible noise and duplication. The activation function is removed to avoid overly complex non-linear transformations. To mitigate the bias caused by internal covariate shift and to prevent overfitting, a batch normalization layer re-centers and re-scales the input $\mathbf{H}$:
$$\hat{\mathbf{H}} = \gamma \cdot N(\mathbf{H}) + \epsilon \quad (3)$$
where $N(\cdot)$ is the normalization function, and the bias $\epsilon$ and gain $\gamma$ are learnable parameters. FC layer 2 then maps $\hat{\mathbf{H}}$ back to a batch-wise weight matrix of the same size as the input, with a $\tanh(\cdot)$ activation to capture non-linear relationships among features. Finally, the feature weight vector $\mathbf{w}$ is calculated by column-wise averaging of the weight matrix, mitigating the impact of noise introduced by particular noisy samples and/or accidental correlations. This process is given by Equation (4):
$$\mathbf{w} = \frac{1}{B} \sum_{k=1}^{B} \tanh\big(\boldsymbol{\Theta}_2 \cdot \hat{\mathbf{H}} + \mathbf{c}_2\big)_{k} \quad (4)$$
where $\boldsymbol{\Theta}_2$ is a trainable weight matrix, $\mathbf{c}_2$ is a bias vector, and the subscript $k$ selects the $k$-th row (sample) of the batch-wise weight matrix.
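Equations (2)–(4) can be sketched end-to-end in numpy; all parameter names and sizes below are hypothetical stand-ins for the trained quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

def sablock_forward(XB, Theta1, c1, Theta2, c2, gamma=1.0, eps_bias=0.0):
    """One SABlock pass (Equations (2)-(4)); a numpy sketch with made-up names.

    XB: (B, m) input batch; returns a length-m feature weight vector w.
    """
    # FC layer 1: linear projection to |h| < m hidden units, no activation.
    H = XB @ Theta1.T + c1                               # (B, h)
    # Batch normalization: re-center/re-scale, then learnable gain and bias.
    Hn = (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-5)
    H_hat = gamma * Hn + eps_bias                        # (B, h)
    # FC layer 2 + tanh maps back to a (B, m) weight matrix.
    W = np.tanh(H_hat @ Theta2.T + c2)                   # (B, m)
    # Column-wise averaging over the batch suppresses per-sample noise.
    return W.mean(axis=0)                                # (m,)

B, m, h = 32, 10, 4
XB = rng.normal(size=(B, m))
w = sablock_forward(XB,
                    Theta1=rng.normal(size=(h, m)), c1=np.zeros(h),
                    Theta2=rng.normal(size=(m, h)), c2=np.zeros(m))
assert w.shape == (m,) and np.all(np.abs(w) <= 1.0)  # tanh bounds each weight
```

Because every entry passes through $\tanh$ before averaging, each feature weight is bounded in $(-1, 1)$, which keeps the later multiplicative re-weighting numerically tame.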
  • Design for Collaborative Weight Generation. This section elaborates on the key designs that enable multiple SABlocks to work in concert, thereby generating more robust and discriminative feature weights. To precisely delineate the data flow and weight generation across layers, superscripts denote the layer of origin: for example, $\mathbf{X}_B^{(0)}$ signifies the original input batch, whereas $\mathbf{w}^{(s)}$ represents the feature weight vector generated by the $s$-th SABlock.
  • Feature Jump Concatenation: Mitigating Information Attenuation. A critical challenge in deep sequential feature weighting is the potential loss of information through layers, where features attenuated early might be irrecoverable later even if they are relevant. To address this, we design the feature jump concatenation mechanism, allowing each SABlock to receive inputs from two parallel paths, as illustrated in the right panel of Figure 2.
1. Path A: Refined Input from Predecessor. This path carries the output of the preceding SABlock, given by the element-wise product $\mathbf{w}^{(s)} \odot \mathbf{X}_B^{(s)}$. It represents a refined view of the features, where the influence of each feature has been selectively amplified or dampened based on the consensus of the previous $s$ layers.
2. Path B: Persistent Original Input. This path provides direct, unaltered access to the original input batch $\mathbf{X}_B^{(0)}$. It serves as an information-rich anchor, ensuring that no feature value is entirely lost due to potentially premature weighting in earlier blocks.
The input to the next SABlock is formed by concatenating these two paths:
$$\mathbf{X}_B^{(s+1)} = \mathrm{Concat}\big(\mathbf{w}^{(s)} \odot \mathbf{X}_B^{(s)},\; \mathbf{X}_B^{(0)}\big) \quad (5)$$
This design ensures that every SABlock can perform its importance assessment with full context—it can either reinforce the weighting decisions of its predecessors by further amplifying features that are consistently important (via Path A), or it can re-evaluate and potentially rescue features that were previously under-weighted by accessing their original values (via Path B). This effectively combats progressive information loss and error accumulation in the stack.
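The concatenation step above reduces to a single numpy call; a minimal sketch with hypothetical shapes:

```python
import numpy as np

def jump_concat(XB_s, w_s, XB_0):
    """Feature jump concatenation: join the re-weighted predecessor output
    (Path A) with the untouched original batch (Path B) along features."""
    return np.concatenate([w_s * XB_s, XB_0], axis=1)

rng = np.random.default_rng(1)
B, m = 8, 5
XB0 = rng.normal(size=(B, m))
w = rng.uniform(size=m)           # stand-in for an SABlock's weight vector

XB1 = jump_concat(XB0, w, XB0)
assert XB1.shape == (B, 2 * m)
# Path B preserves the original values exactly, so a feature down-weighted
# in Path A can still be re-evaluated later from its raw values.
assert np.array_equal(XB1[:, m:], XB0)
```

Note that the concatenation doubles the input width seen by the next block, so in any real implementation the first FC layer of each subsequent SABlock must be sized to the widened input.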
  • Inertia-based Collaborative Weight Updating: Ensuring Stability. To foster coherent collaboration across the stack and prevent erratic weight fluctuations, we introduce an inertia-based weight update mechanism. This strategy incorporates a form of memory into the weight evolution process, ensuring that the feature weights change smoothly across layers.
The update rule, inspired by momentum optimization in gradient descent but applied directly to the feature weights, is defined as follows:
$$\mathbf{w}^{(s+1)} \leftarrow (1 - g) \cdot \mathbf{w}^{(s+1)} + g \cdot \mathbf{w}^{(s)} \quad (6)$$
Here, $\mathbf{w}^{(s+1)}$ on the right-hand side is the new weight vector freshly computed by the $(s+1)$-th SABlock from its input, and $\mathbf{w}^{(s)}$ is the weight vector from the previous layer. The inertia coefficient $g$ (with $0 \le g \le 1$) is a crucial hyperparameter that controls the blend.
1. A high value of $g$ (close to 1) implies strong inertia: the system relies heavily on the historical trajectory of weights, leading to very stable but potentially slower-to-adapt updates.
2. A low value of $g$ (close to 0) gives more weight to the current layer's instant assessment, making the system more responsive but also more susceptible to noise.
This mechanism provides two key benefits: (1) Stability: it smooths the weight updates, making training less sensitive to noisy batches or outliers that could cause large swings in a single layer's estimation. (2) Consensus-driven weighting: a feature must be consistently deemed important across successive layers to achieve a high final weight, as a single layer's strong opinion is tempered by the historical consensus. The process is initialized with $\mathbf{w}^{(0)} = \mathbf{0}$.
Finally, the output weight vector from the last SABlock (layer $S$) is normalized via the softmax function to produce the final feature importance scores $\boldsymbol{\alpha} = \mathrm{softmax}(\mathbf{w}^{(S)})$, which sum to 1 and are used for feature selection.
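Taken together, the inertia blend and the final softmax can be sketched in a few lines; the random vectors below merely stand in for per-layer SABlock outputs:

```python
import numpy as np

def inertia_update(w_new, w_prev, g):
    """Blend the freshly computed weights with the previous layer's weights."""
    return (1.0 - g) * w_new + g * w_prev

rng = np.random.default_rng(2)
m, S, g = 6, 4, 0.5

w = np.zeros(m)                    # w(0) = 0, as in the paper
for s in range(S):
    w_fresh = rng.normal(size=m)   # stand-in for the (s+1)-th SABlock output
    w = inertia_update(w_fresh, w, g)

# Final importance scores: softmax over the last layer's weights.
alpha = np.exp(w) / np.exp(w).sum()
assert np.isclose(alpha.sum(), 1.0)
assert np.all(alpha > 0)
```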
  • Learning module. The learning module models the mapping between the weighted inputs produced by the last SABlock and the outputs. It can be any form of DNN; here, a multilayer perceptron with two FC layers is adopted. All SABlocks and weights are adjusted through backpropagation until convergence. The loss function $L$ is as follows:
$$L = \mathrm{CrossEntropy}(\hat{\mathbf{y}}, \mathbf{y}) \quad (7)$$
For multi-class classification, the softmax cross-entropy is used; for binary classification, the binary cross-entropy is used.

3.4. Theoretical Analysis of Stacking for Stability

The sequential stacking of SABlocks can be viewed as an ensemble method applied to feature weighting. We can provide a simplified theoretical intuition for why stacking enhances stability.
Consider the final feature weight vector $\boldsymbol{\alpha}$ as a function of the input batch $\mathbf{X}_B$. A single-layer feature selector (e.g., a lone SABlock or attention layer) computes $\boldsymbol{\alpha} = f(\mathbf{X}_B)$. The variance of this estimator can be high if $f$ is susceptible to noise in $\mathbf{X}_B$.
In SAFS, with $S$ stacked blocks and inertia $g$, the weight update can be unfolded as follows:
$$\mathbf{w}^{(S)} = (1 - g)\, f_S(\mathbf{X}_B) + g(1 - g)\, f_{S-1}(\mathbf{X}_B) + \cdots + g^S f_0(\mathbf{X}_B)$$
where $f_s$ represents the transformation of the $s$-th SABlock (with $f_0(\mathbf{X}_B) = \mathbf{w}^{(0)}$). This is a form of exponential smoothing across layers, which has two effects:
1. Variance Reduction: The final weight $\mathbf{w}^{(S)}$ is a weighted average of the opinions of all $S$ layers. Assuming the noise in each layer's estimation is somewhat independent, the variance of the average is lower than that of a single layer, making the feature ranking less volatile across training batches.
2. Error Robustness: An erroneous weight assignment in one specific layer ($f_s$) is dampened by the contributions of the other layers. The inertia parameter $g$ controls the strength of this smoothing; a higher $g$ places more trust in the historical consensus, further increasing stability at the cost of slower adaptation.
This analysis aligns with the empirical observations in our sensitivity analysis (Section 4.6), where increasing the inertia g generally led to more stable performance. Furthermore, the ablation study (Section 4.5) confirms that removing the stacking structure (SAFS-s) or the inertia mechanism (SAFS-i) consistently leads to a performance drop or increased standard deviation, validating the importance of this design for robust feature selection.
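The variance-reduction argument can be checked numerically. The toy simulation below (noise scale and layer count hypothetical) treats each layer as an independently noisy estimate of a fixed weight vector and compares the last layer alone against the inertia-blended stack:

```python
import numpy as np

# Each "layer" observes the same true weight vector corrupted by independent
# noise; the inertia recursion w <- (1-g)*f_s + g*w is exponential smoothing.
rng = np.random.default_rng(3)
true_w = np.array([1.0, 0.5, 0.0, -0.5])
S, g, trials = 5, 0.5, 2000

single, smoothed = [], []
for _ in range(trials):
    w = np.zeros_like(true_w)            # w(0) = 0
    for s in range(S):
        f_s = true_w + rng.normal(scale=0.5, size=true_w.shape)
        w = (1 - g) * f_s + g * w
    single.append(f_s)                   # last layer's estimate alone
    smoothed.append(w)                   # inertia-blended stack

var_single = np.var(np.array(single), axis=0).mean()
var_smoothed = np.var(np.array(smoothed), axis=0).mean()
assert var_smoothed < var_single  # stacking + inertia damps estimation noise
```

The smoothing also shrinks the estimate slightly toward the zero initialization (by a factor of $1 - g^S$), which is the bias side of the bias–variance trade-off discussed above.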

3.5. Pseudocode of Proposed SAFS

The pseudocode of SAFS is shown in Algorithm 1. In each iteration, a batch of $B$ samples is randomly selected from the dataset and processed by the stacked SABlocks to obtain a weight vector $\mathbf{w}$. After the last SABlock, a softmax function transforms the weights into feature importance scores $\boldsymbol{\alpha}$; finally, $\boldsymbol{\alpha}$ is evaluated by a multi-layer perceptron (MLP).
Algorithm 1 Stackable Attention Network for Feature Selection
Input: Dataset $\mathbf{X}$, batch size $B$, bias $\epsilon$, gain $\gamma$, inertia parameter $g$, number of stacked layers $S$, number of iterations $K$
Output: feature importance scores $\boldsymbol{\alpha}$
     $k, s, \mathbf{w}^{(0)} \leftarrow 0$
    while  $k < K$  do
         $\mathbf{X}_B^{(0)} \leftarrow \mathrm{Sampling}(\mathbf{X})$ # Sample a batch as input
         $s \leftarrow 0$
        while  $s < S$  do
              $\mathbf{H} \leftarrow \mathrm{FCLayer1}(\mathbf{X}_B^{(s)})$
              $\hat{\mathbf{H}} \leftarrow \gamma \cdot N(\mathbf{H}) + \epsilon$
              $\mathbf{w}^{(s)} \leftarrow \frac{1}{B} \sum_{b=1}^{B} \mathrm{FCLayer2}(\hat{\mathbf{H}})_b$
             if  $s > 0$  then
                  $\mathbf{w}^{(s)} \leftarrow (1 - g) \cdot \mathbf{w}^{(s)} + g \cdot \mathbf{w}^{(s-1)}$ # Inertia-based weight updating (Equation (6))
             end if
              $\mathbf{X}_B^{(s+1)} \leftarrow \mathrm{Concat}\big(\mathbf{w}^{(s)} \odot \mathbf{X}_B^{(s)},\; \mathbf{X}_B^{(0)}\big)$ # Feature jump concatenation
              $s \leftarrow s + 1$
        end while
         $\boldsymbol{\alpha} \leftarrow \mathrm{softmax}(\mathbf{w}^{(S-1)})$
         $\hat{\mathbf{y}} \leftarrow \mathrm{LearningModule}(\mathbf{X}_B^{(0)} \odot \boldsymbol{\alpha})$ # MLP
         $L = \mathrm{CrossEntropy}(\hat{\mathbf{y}}, \mathbf{y})$
         $k \leftarrow k + 1$
    end while
The trainable parameters of SAFS, including the weight matrices $\boldsymbol{\Theta}_1$, $\boldsymbol{\Theta}_2$ and bias vectors $\mathbf{c}_1$, $\mathbf{c}_2$ in each SABlock, are initialized using the Xavier uniform initializer, which maintains stable gradient flow through the network at the start of training. The initial feature weight vector $\mathbf{w}^{(0)}$ is explicitly set to $\mathbf{0}$, as indicated in Algorithm 1. This zero-initialization ensures that the first SABlock initially passes the input features forward uniformly, allowing the learning process to differentiate feature importance gradually and stably from an unbiased starting point. All parameters, including the feature weights generated dynamically per batch, are updated via backpropagation to minimize the loss function $L$ (Equation (7)).
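A forward-only numpy sketch may help tie Algorithm 1 together. Parameters are random (untrained), biases are zeroed for brevity, and for shape consistency Path A here re-weights the original batch rather than the widened one; treat every name and size as a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)

def safs_forward(XB0, S=3, h=16, g=0.5):
    """Forward pass of the stacked architecture (Algorithm 1, forward part).

    XB0: (B, m) original batch. Each loop iteration is one SABlock:
    FC1 -> batch norm -> FC2/tanh -> batch average, then inertia blending
    and feature jump concatenation before the next block.
    """
    B, m = XB0.shape
    XB, w = XB0, np.zeros(m)
    for s in range(S):
        d = XB.shape[1]  # widens to 2m after the first concatenation
        Theta1 = rng.normal(size=(h, d)) / np.sqrt(d)
        Theta2 = rng.normal(size=(m, h)) / np.sqrt(h)
        H = XB @ Theta1.T
        H_hat = (H - H.mean(0)) / (H.std(0) + 1e-5)       # batch normalization
        w_fresh = np.tanh(H_hat @ Theta2.T).mean(axis=0)  # column-wise average
        w = w_fresh if s == 0 else (1 - g) * w_fresh + g * w  # inertia blend
        XB = np.concatenate([w * XB0, XB0], axis=1)  # feature jump concat
    return np.exp(w) / np.exp(w).sum()               # alpha = softmax(w)

alpha = safs_forward(rng.normal(size=(32, 12)))
assert alpha.shape == (12,) and np.isclose(alpha.sum(), 1.0)
```

In a trainable version the per-block parameters would be learned jointly with the downstream MLP via backpropagation, rather than drawn fresh each call.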

3.6. Parallel Stacking Model (SAFS-Pa)

In theory, such a stacking structure admits infinitely many combinations. This research proposes a parallel stacking model for analysis, named SAFS-Pa. It is worth pointing out that the parallel model still belongs to the one-layer feature selection architecture, albeit one that enhances the stability of the feature selection process. As shown in the left part of Figure 3, several SABlocks are used, and each SABlock generates an attention weight vector $\mathbf{w}$. The weight vectors from the multiple SABlocks are then averaged into overall weights:
$$\mathbf{w} = \mathrm{Mean}(\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n) \quad (8)$$
Finally, a softmax function regulates the weights into the (0, 1) range, with most of them close to 0:
$$\alpha_i = \frac{e^{w_i}}{\sum_{j=1}^{m} e^{w_j}} \quad (9)$$
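The averaging and softmax steps of SAFS-Pa amount to a few lines; the block outputs below are random stand-ins for the $n$ parallel SABlocks:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n_blocks = 8, 4

# Each parallel SABlock emits one weight vector over the m features.
block_weights = [rng.uniform(-1, 1, size=m) for _ in range(n_blocks)]

w = np.mean(block_weights, axis=0)        # average the parallel votes
alpha = np.exp(w) / np.exp(w).sum()       # softmax normalization

assert alpha.shape == (m,)
assert np.isclose(alpha.sum(), 1.0)
```

Because the blocks never see each other's outputs, a single block's error is diluted by the average, but there is no sequential refinement; this is the trade-off discussed in Section 3.2.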

4. Experiment

This section compares SAFS with several state-of-the-art baselines on twelve real-world datasets and six synthetic datasets. The source code is publicly available at https://github.com/Icannotnamemyselff/SAFS (accessed on 4 November 2025).

4.1. Experiment Settings

  • Real-world datasets. Twelve real-world tabular datasets from public repositories (OpenML and Datamicroarray) are selected for evaluation to ensure diversity and real-world applicability. Our selection criteria are designed to cover a wide spectrum of challenges in feature selection:
  • Dimensionality: We include low-dimensional (e.g., SEGMENT, 19 features), medium-dimensional (e.g., HAR, 561 features), and high-dimensional datasets (e.g., Chiaretti, 12,625 features) to test scalability.
  • Sample Size. The number of samples ranges from a few (e.g., Alon, 62 samples) to many (e.g., SVHN/CIFAR-10, 10,000 samples), assessing performance in both data-scarce and data-rich regimes.
  • Number of Classes: We include binary classification (e.g., Gravier), multi-class classification with a moderate number of classes (e.g., DNA, 3 classes), and many-class problems (e.g., ISOLET, 26 classes).
  • Domain Diversity: Datasets are drawn from various domains, including medicine, image processing (represented as tabular data), speech recognition, physics, and biology, to ensure generalizability.
This comprehensive selection mitigates the risk of overfitting to a specific data characteristic and provides a robust evaluation of the proposed method. Detailed characteristics of these datasets are summarized in Table 2.
  • Case Study Dataset. To further validate SAFS in a real-world industrial scenario, we conducted a case study on short-term wind power forecasting using a publicly available dataset from Kaggle. This time-series dataset originates from a real wind farm in Germany, containing sensor readings recorded every 15 min. It presents a challenging regression task with high feature dimensionality (76 sensors over time) and strong temporal dependencies, moving beyond curated academic benchmarks to a practical application.
  • Synthetic datasets. We follow the widely used data models $E_1$–$E_6$, which were also used for evaluation in [34,35,36,37,38]. The input features are generated independently from a 20-dimensional Gaussian distribution with no correlations across the features ($\mathbf{X} \sim N(\mathbf{0}, \mathbf{I})$), where $\mathbf{I}$ is the $20 \times 20$ identity matrix. The target is $y = \mathbb{1}\big[\frac{1}{1 + \mathrm{Logit}(\mathbf{X})} > 0.5\big]$, where $\mathbb{1}[\cdot]$ is an indicator function. The $\mathrm{Logit}(\mathbf{X})$ for each sample is calculated from different features, depending on the sign of the 11th feature $X_{11}$, and is defined as follows:
$E_1$: $\mathrm{Logit} = \exp(X_1 \times X_2)$;
$E_2$: $\mathrm{Logit} = \exp\big(\sum_{i=3}^{6} (X_i^2 - 4)\big)$;
$E_3$: $\mathrm{Logit} = \exp(-10 \times \sin(2X_7)) + 2|X_8| + X_9 + \exp(-X_{10})$;
$E_4$: $\mathrm{Logit}$ follows $E_1$ if $X_{11} < 0$, else $E_2$;
$E_5$: $\mathrm{Logit}$ follows $E_1$ if $X_{11} < 0$, else $E_3$;
$E_6$: $\mathrm{Logit}$ follows $E_2$ if $X_{11} < 0$, else $E_3$;
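A generator for these data models might look as follows; the minus signs in $E_3$ are assumed from the benchmark family cited above, so treat this as a sketch rather than the exact released code:

```python
import numpy as np

def make_synthetic(name, n=1000, seed=0):
    """Generate one of the E1-E6 datasets: X ~ N(0, I_20), binary target.

    A sketch of the stated data model; the E3 signs are an assumption.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 20))

    def e1(X):  # depends on features 1-2
        return np.exp(X[:, 0] * X[:, 1])
    def e2(X):  # depends on features 3-6
        return np.exp(np.sum(X[:, 2:6] ** 2 - 4, axis=1))
    def e3(X):  # depends on features 7-10
        return (np.exp(-10 * np.sin(2 * X[:, 6])) + 2 * np.abs(X[:, 7])
                + X[:, 8] + np.exp(-X[:, 9]))

    branch = X[:, 10] < 0  # the sign of X_11 switches the regime in E4-E6
    logits = {"E1": e1(X), "E2": e2(X), "E3": e3(X),
              "E4": np.where(branch, e1(X), e2(X)),
              "E5": np.where(branch, e1(X), e3(X)),
              "E6": np.where(branch, e2(X), e3(X))}[name]
    y = (1.0 / (1.0 + logits) > 0.5).astype(int)
    return X, y

X, y = make_synthetic("E4")
assert X.shape == (1000, 20)
assert set(np.unique(y)) <= {0, 1}
```

A useful property of $E_4$–$E_6$ is that the relevant feature set changes per sample, which is precisely what makes these benchmarks hard for global, single-perspective selectors.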
  • Baselines. The experiments compare SAFS with 13 classical and/or novel feature selection methods from three main streams.
  • ML-Based:
LASSO [20]: a classical regularized linear/logistic model; RF [39]: a classical tree-based model; XGB [40]: gradient-boosted decision trees; CCM [41]: a kernel-based model that uses measures of independence to find a feature subset.
  • DNN-Based:
AFS [8] and SANs [9]: attention-based models; FIR [4]: a dual-net architecture in which the selector generates a feature subset while the operator makes predictions; FM [12]: a batch-wise attenuation model with almost no hyperparameters; STG [10]: a continuous-relaxation-based model; NeuroFS [11]: gradually prunes uninformative features from the input layer of a sparse neural network.
  • Self-supervised:
A-SFS [14] and SEFS [13]: use multi-task autoencoders to learn latent relationships in unlabeled data.
Evaluation Protocols. In practice, ground truth about actual feature relevance is generally unavailable. Hence, following common practice in feature selection, the performance of each baseline is evaluated by the prediction accuracy achieved with its TopK selected features. This is an indirect way to assess the relevance of the discovered features on the target tasks.
Considering the need for both effectiveness and overfitting prevention in deep neural networks, two powerful and popular classifiers are employed: LightGBM [3] and CatBoost [42]. For the classification task, the Micro-F1 score (in %) is used for evaluation. All datasets are split into training and test sets at an 8:2 ratio.
In the evaluation, the number of selected TopK features is fixed at 3% of the total feature dimension, with a minimum value of K set to 5. All experiments are repeated 10 times using different random seeds, and the averaged Micro-F1 score on the test sets is reported.
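The two quantities fixed by this protocol are simple to state in code. The sketch below is ours: the exact rounding of "3% of the total feature dimension" is an assumption (we use the ceiling), and for single-label multi-class data the Micro-F1 score reduces to plain accuracy.

```python
import math
import numpy as np

def topk_count(n_features, ratio=0.03, k_min=5):
    """TopK used in the evaluation: 3% of the feature dimension, at least 5."""
    return max(k_min, math.ceil(ratio * n_features))

def micro_f1(y_true, y_pred):
    """Micro-F1 for single-label multi-class predictions equals accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())
```

For example, a 617-dimensional dataset such as ISOLET would yield K = 19 selected features, while a 60-dimensional dataset would fall back to the minimum of K = 5.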
Parameter Details. All baselines selected for comparison use the default/recommended settings specified in their respective papers. For SAFS, the number of stacked layers is set to 3, and the number of hidden units is set to 64. This configuration was determined through preliminary experiments on a held-out validation set, balancing model capacity against computational cost. Deeper stacks (e.g., 5 or 10) showed diminishing returns or potential overfitting on smaller datasets, while shallower stacks (1 or 2) underperformed on complex datasets. A hidden size of 64 provided a good compromise between representational power and efficiency for the datasets in our study.
The gain γ and bias ϵ of the BN-layer are set to 0.9 and 10⁻⁵, respectively. The value for γ is slightly lower than the default of 1.0 to initially allow milder re-scaling of the normalized activations, which we found to contribute to training stability. The value for ϵ is a common default used to ensure numerical stability during normalization.
The inertia parameter g is set to 0.8, favoring a strong reliance on historical weights for stable updates. The batch size constitutes 10% of the total samples, a common practice that provides a reasonable estimate of the gradient. The Adam optimization method [43] is utilized, with a learning rate of 0.002.
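The inertia update with g = 0.8 corresponds to an exponential-smoothing step over the per-feature weights. The sketch below shows only this blending step; the full update rule inside SAFS may carry additional terms.

```python
import numpy as np

def inertia_update(w_hist, w_batch, g=0.8):
    """Blend historical feature weights with the current batch estimate.
    g = 0 ignores history entirely; g -> 1 freezes the weights."""
    w_hist = np.asarray(w_hist, dtype=float)
    w_batch = np.asarray(w_batch, dtype=float)
    return g * w_hist + (1.0 - g) * w_batch
```

With g = 0.8, a single noisy batch can move each feature weight by at most 20% of the gap between its current and batch-estimated values, which is the source of the stability discussed above.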

4.2. Main Results: Real-World Data

Table 3 and Table 4 present the experimental results against 13 baselines across two robust classifiers. Notably, SAFS achieves strong results on all datasets and shows significant improvements over the compared baselines, including several state-of-the-art methods. When the CatBoost classifier is used, most baseline results improve, particularly for RF and XGB, but SAFS maintains its lead, particularly on large-scale datasets such as ISOLET, SVHN, and CIFAR-10.
For the main results in Table 3, SAFS improves on the average performance of the other baselines by 3.08% to 26.01%. The improvements on high-dimensional datasets, e.g., HAR, ISOLET, SVHN, and CIFAR-10, are especially significant, highlighting the effectiveness of SAFS on high-dimensional data.
Turning to the other baselines: LASSO, due to its linear nature, achieves the worst performance on almost all datasets. RFE, as a wrapper algorithm, demands extensive computation, often exceeding the time budget (over 24 h on an EPYC 7552 with 2 × 48 cores) on high-dimensional datasets. XGB and RF, known for their strong performance and robustness in feature selection, generally rank second or third, outperforming most DNN-based solutions. However, as the dimensionality increases, their performance generally lags behind the recent baselines. The DNN-based methods AFS, SANs, and FIR are susceptible to inter-sample noise and perform worse than the tree-based methods. Two recent baselines, STG and NeuroFS, also show unstable performance on these datasets because they are susceptible to accidental correlations. The self-supervised solutions, A-SFS and SEFS, show no significant performance improvement, since they typically depend on additional unlabeled data for self-supervision and offer limited enhancements to the feature weight generation module for the high-dimensional nature of these datasets.
These results suggest that the stacked architecture can significantly improve the ability to capture complex feature interactions compared to existing one-layer approaches. SAFS also has a relatively low standard deviation (see the detailed results in Section 4.8), which clearly shows the robustness of the SAFS weight generation across datasets.
Subsequently, the feature selection performance is compared across varying numbers of TopK features. Figure 4a–d illustrates the classification accuracy on four different datasets. Notably, SAFS demonstrates significant performance advantages over the other baselines across nearly all evaluated TopK ranges, particularly for the first few selected features. This is because, as the stacking deepens, features that are more relevant to the label are given greater weights, while similar features are given more discriminative weights. This merit is rather important in real-world FS applications, as it shows that SAFS generates high-quality feature weights for different features: both highly related features and less related but still valuable ones.
While SAFS achieves the best or second-best performance on most datasets, we observe on the SEGMENT dataset (Table 3) that STG achieves a highly competitive accuracy of 96.33%, close behind SAFS at 96.75%. This minor difference does not undermine the value of SAFS but rather highlights a key characteristic: on some well-structured datasets with less complex feature interactions, a single strong baseline like STG can perform exceptionally well. The advantage of SAFS's stacked architecture becomes more pronounced in more challenging scenarios, as evidenced by its substantial performance gains on high-dimensional (e.g., SVHN and ISOLET) and noisy datasets (as shown in Section 4.5). The stackable design aims for robust superiority across a wide range of data conditions rather than outperforming every baseline on every single dataset by a large margin.
  • Why does stacking work? This part aims to answer an important question: why does stacking help the FS process? As previously analyzed, later SABlocks can provide a more precise view of the pertinent features, as accidentally associated features are gradually filtered out by the layered structure.
To verify this experimentally, the MNIST (Tabular) dataset is used to visualize the weights generated by SABlocks at different depths. The weights collected from different layers undergo a min–max normalization for better presentation. The kernel density estimation (KDE) of the feature weights is illustrated in Figure 5. Please note that the Gaussian kernel used in the visualization also applies smoothing at the boundaries, which can cause some density estimates to spill outside the range [0, 1].
In fact, the FS task usually focuses on a few features that are most relevant to the target; therefore, enhancing the identifiability of the TopK features is logically beneficial. Compared to Layer-1 and Layer-2, the weights generated by Layer-3 exhibit superior discriminative capabilities. When the weight exceeds 0.8, Layer-3 shows a lower feature density. At the same time, the peak of weight density in Layer-3 shifts to the left, which indicates that more features are considered unimportant.
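The visualization step described above (min–max normalization followed by a Gaussian KDE) can be reproduced in a few lines. This hand-rolled KDE is a sketch (the bandwidth value is our choice, not taken from the paper) and makes the boundary-spill effect explicit: a kernel centered near 0 or 1 leaks density outside [0, 1].

```python
import numpy as np

def minmax(w):
    """Min-max normalize a weight vector to [0, 1]."""
    w = np.asarray(w, dtype=float)
    return (w - w.min()) / (w.max() - w.min())

def gaussian_kde_1d(samples, grid, bandwidth=0.05):
    """Gaussian kernel density estimate evaluated on `grid`.  The kernel
    smooths across the boundaries, so density can spill outside [0, 1]."""
    s = np.asarray(samples, dtype=float)[:, None]   # shape (n, 1)
    g = np.asarray(grid, dtype=float)[None, :]      # shape (1, m)
    k = np.exp(-0.5 * ((g - s) / bandwidth) ** 2)
    return k.sum(axis=0) / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))
```

Evaluating the estimate on a grid wider than [0, 1] and integrating confirms that the total density is 1, with a small fraction lying outside the unit interval, exactly as the note above describes.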

4.3. Main Results: Synthetic Data

This section uses synthetic datasets in which the target value depends on only a subset of features, and this subset varies across samples. All datasets in this study consist of 2000 generated samples, split into training and testing sets at an 8:2 ratio. This experimental regime presents a more challenging scenario than that explored in [34]. The experiments evaluate all baselines by measuring the TPR of the informative features and the accuracy score of the LightGBM classifier (TPR = TP/(TP + FN), where TP is the number of informative features selected by the FS method, and FN is the number of informative features the method fails to recover).
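On synthetic data the informative feature indices are known, so the TPR is plain set arithmetic over indices. A minimal sketch:

```python
def true_positive_rate(selected, informative):
    """TPR = TP / (TP + FN): the fraction of informative features that the
    FS method actually recovered."""
    selected, informative = set(selected), set(informative)
    tp = len(selected & informative)   # informative features selected
    fn = len(informative - selected)   # informative features missed
    return tp / (tp + fn)
```

For example, on E1 the informative set is {X1, X2}; a method that selects X1 but not X2 scores a TPR of 0.5.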
The results are shown in Table 5, where SAFS performs best. On the relatively easy E1–E3 datasets, all methods perform well; on the complex E4–E6 datasets, DNN-based methods are weaker than tree-based methods, while SAFS still performs well, showing that it retains good feature discrimination capabilities for capturing complex non-linear feature–label relations.

4.4. Feature Selection Under Different Regimes

This section evaluates the performance of SAFS under different regimes, including scenarios where (1) the number of features exceeds the number of samples; (2) the complexity of feature selection tasks evolves; and (3) feature selection is conducted under various types of noise.
  • Feature selection in the m > n regime. The performance of SAFS is next evaluated under a more challenging regime where the number of features exceeds the number of samples ( m > n ), using only 10% of the randomly sampled data. The results are reported across varying numbers of TopK features to examine whether the proposed method can consistently identify the optimal feature subset under different TopK levels.
Figure 6a–c show the performance of the different baselines at different TopK values. Across almost all TopK ranges, SAFS achieves superior performance with few samples per class, and its performance increases steadily with K. This experiment shows that SAFS can effectively weight high-dimensional features with limited labels.
  • Feature selection under noise disturbance. Building robust models in noisy environments (e.g., industrial data, genetic engineering) is essential. To analyze model robustness under noise conditions, this study designs two noisy scenarios for evaluating feature selection methods.
  • Feature Perturbation: Three types of noise are injected into all datasets to assess baseline robustness on contaminated data:
    (a) 
    Gaussian noise with mean 0 and variance 0.3;
    (b) 
    Salt-and-pepper (S&P) noise with a noise ratio of 0.3;
    (c) 
    Mask noise where 30% of features are randomly set to zero.
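The three perturbations can be sketched as follows. Assumptions in this sketch: features are min–max normalized so salt-and-pepper noise flips entries to the extremes 0 or 1, "variance 0.3" means additive N(0, 0.3) noise, and masking is applied per entry; the paper's wording "30% of features" could also mean zeroing whole feature columns.

```python
import numpy as np

def add_noise(X, kind, ratio=0.3, seed=0):
    """Inject one of the three noise types used in the robustness study."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    if kind == "gaussian":
        # Additive Gaussian noise with mean 0 and variance `ratio`
        X += rng.normal(0.0, np.sqrt(ratio), X.shape)
    elif kind == "salt_pepper":
        # A `ratio` fraction of entries forced to the extremes 0 or 1
        mask = rng.random(X.shape) < ratio
        X[mask] = rng.integers(0, 2, X.shape)[mask].astype(float)
    elif kind == "mask":
        # A `ratio` fraction of entries set to zero
        X[rng.random(X.shape) < ratio] = 0.0
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return X
```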
The results in Figure 7a show that SAFS provides clear advantages over all baselines. The linear embedding method LASSO performs poorly in high-dimensional noisy settings. DNN-based methods AFS and STG exhibit sensitivity to noise, while XGB, as a strong tree-ensemble baseline, demonstrates robustness comparable to SAFS. The hybrid architecture of SAFS functions similarly to a multi-round feature voting mechanism, effectively mitigating noise impact.
Further experiments evaluate noise resistance with 30% feature masking on three real-world datasets of different scales. As shown in Figure 6d–f, stacking provides significant performance gains on the large-scale SVHN dataset. For medium- and small-scale datasets (ISOLET and DNA), the stacking depth should be limited to 5–10 layers to prevent overfitting.
  • Label Perturbation: Random label noise is introduced by replacing original labels with random ones at varying ratios. To examine the impact on critical features, the Top-1% features selected by each method are used for comparison.
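The label-perturbation procedure amounts to overwriting a fixed fraction of labels. A sketch, where we read "random ones" as labels drawn uniformly over all classes (so a replaced label may occasionally coincide with the original):

```python
import numpy as np

def perturb_labels(y, ratio, n_classes, seed=0):
    """Replace a `ratio` fraction of labels with uniformly random classes."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).copy()
    idx = rng.choice(len(y), size=int(ratio * len(y)), replace=False)
    y[idx] = rng.integers(0, n_classes, size=len(idx))
    return y
```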
Figure 7b demonstrates that SAFS consistently identifies robust feature subsets under label perturbation. While all methods degrade as noise increases, SAFS maintains relatively stable performance across noise levels, confirming the effectiveness of its hybrid architecture.
An interesting observation is that most FS methods remain relatively stable under 5% label perturbation. As indicated in recent work [44], this may be because limited out-of-distribution (OOD) data prevents over-reliance on category-specific features.
  • Feature selection with class variation. Here, two special tasks are prepared to evaluate SAFS's generalization ability. The ISOLET dataset, which has 26 categories, is used; classes are selected in label order.
  • Changing feature selection complexity: In this setting, the number of classes gradually increases based on the index order from 0 to 25. As the number of classes grows, the dataset and the feature selection task become progressively more complex and challenging.
Results in Figure 8a indicate that as the number of categories increases, the task becomes more challenging, leading to a gradual decline in accuracy on the feature subsets. Compared to other baselines, SAFS achieves the best overall performance and demonstrates greater effectiveness with increasing category counts.
  • Feature Selection for Transfer Learning: This setting evaluates feature selection using only a subset of classes (e.g., only classes 0–3), while testing classification performance across all 26 categories. This setup assesses whether the selected features generalize to unseen classes, with fewer available classes making the task more challenging.
Figure 8b shows that with very few classes (e.g., 3 classes), most feature selection methods tend to learn features specific to the minority classes rather than generalizable representations. As the number of classes increases, the performance advantage of stronger methods becomes more pronounced. The proposed method achieves the best performance when the number of classes exceeds 3. It should be noted that while more classes provide more information, they also increase the difficulty of feature selection. Some methods (e.g., XGB and CCM) exhibit performance degradation as the number of classes grows. In contrast, SAFS maintains consistently improving performance across all evaluated ranges, clearly validating its effectiveness in this transfer learning scenario.
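The transfer protocol above, selecting features from a class subset while evaluating on all 26 classes, reduces to one boolean mask over the labels. A minimal sketch (the function name is ours):

```python
import numpy as np

def class_subset_split(X, y, fs_classes):
    """Return (data restricted to `fs_classes` for feature selection,
    full data for downstream evaluation over all classes)."""
    mask = np.isin(y, fs_classes)
    return (X[mask], y[mask]), (X, y)
```

Feature selection runs on the first returned pair (e.g., classes 0–3 only), while the classifier built on the selected features is trained and tested on the second.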

4.5. Ablation and Component Analysis

To better understand the impact of individual components in SAFS, ablation studies were conducted by sequentially removing each key component to observe corresponding performance changes.
  • Overall results. Table 6 clearly shows that all designs are important for improving performance. Our key design, the stacked architecture, plays a crucial role, as removing it results in a significant performance drop on high-dimensional datasets. Additionally, the BN-layer in each SABlock is essential for reducing the standard deviation. From the results, the feature skip connection design and the inertia-based update strategy also increase the stability of SAFS.
  • SABlock vs. MLP. The multi-layer perceptron (MLP), as a basic feature selection DNN baseline, can be easily stacked into multiple layers. If all the designed components are removed, SABlock degenerates into an MLP. In this comparison, the SABlock architecture is evaluated against a standard MLP baseline without additional design modifications. All parameters for SABlock and MLP are set identically: the hidden layer units are set to 64, the learning rate is 0.002, the batch size is 10% of the total sample size, and the number of training epochs is 10,000.
The results in Figure 9 show that the SABlock architecture achieves more stable performance on both datasets, particularly with 2∼10 layers. Although MLP exhibits some performance improvement with increasing layers, it shows unstable performance on the two datasets (low accuracy with high standard deviation), while SABlock shows stable changes.

4.6. Hyperparameter Analysis

This section presents a sensitivity analysis of key hyper-parameters: stacking depth, batch size, and inertia rate.
  • Varying stacking layers. Here, performance variations are evaluated across different numbers of stacking blocks in the proposed architecture. Table 7 presents results for six architectural variants using 1, 2, 3, 5, 10, and 20 blocks, respectively. It can be observed that SAFS generally achieves better and more stable performance by stacking more SABlocks, particularly in the range 2∼10. When there are too many stacks (e.g., layer = 20), the generated weights may focus only on the most important features and ignore the less important features, resulting in certain performance degradation.
  • Batch size and inertia parameter. Figure 10a demonstrates that when the batch size B is too small, accuracy suffers because the model is more likely to be influenced by noisy samples. An appropriate batch size contributes to the accuracy of the gradient estimate and the stability of the optimization process, so a comparably large batch size should be used to update the feature weights. This is especially evident for the SVHN and Gravier datasets. Generally, a batch size greater than 10% of the training set is preferable.
Figure 10b shows the impact of different inertia values. The results on these datasets indicate a similar upward trend in accuracy as g increases. When the inertia g = 0, the SABlocks do not collaborate and the performance is worst. Increasing g enhances the overall stability of the training process, but too high a value might impair the learning of an individual SABlock; g = 0.8 generally achieves good performance.

4.7. Discussion of Computational Complexity

During the feature selection process, the main computational cost lies in the stacking phase (in particular, the SABlocks). The computational complexity per batch is O(LD²), where L is the number of SABlock layers and D is the dimension of the input data. Although this architecture introduces additional complexity compared to one-layer solutions such as AFS, STG, and SANs, it can be executed efficiently on GPUs.
Table 8 illustrates the computational overheads of different DNN-based feature selection methods. It can be observed that when the dataset is large (e.g., SVHN), the computational overhead of DNN-based methods increases significantly. The results show that although SAFS is deeper, its computational time is lower than those of STG, SANs, and FIR, and only slightly higher than that of AFS.

4.8. More Detailed Experiment Results

4.8.1. Overall Performance with Standard Deviation

Table 9 and Table 10 present the experimental results against 13 baselines across two robust classifiers. Notably, SAFS achieves strong results on all datasets and shows significant improvements over the compared baselines, including several state-of-the-art methods. When the CatBoost classifier is used, most baseline results improve, particularly for RF and XGB, but SAFS maintains its lead, particularly on large-scale datasets such as ISOLET, SVHN, and CIFAR-10.
For the main results in Table 9, SAFS improves on the average performance of the other baselines by 3.08% to 26.01%. The improvements on high-dimensional datasets, e.g., HAR, ISOLET, SVHN, and CIFAR-10, are especially significant, highlighting the effectiveness of SAFS on high-dimensional data.
The self-supervised solutions, A-SFS and SEFS, show no significant performance improvement, since they typically depend on additional unlabeled data for self-supervision and offer limited enhancements to the feature weight generation module for the high-dimensional nature of these datasets.
These results suggest that the stacked architecture can significantly improve the ability to capture complex feature interactions compared to existing one-layer-based approaches.

4.8.2. Performances Under Different TopK Features

This section compares the performance of feature selection across varying numbers of features. Figure 11a–i shows the classification accuracy for nine different datasets.
It is evident that SAFS achieves consistent performance advantages over other baselines across nearly all evaluated TopK ranges. This merit is rather important in real-world FS applications, as it shows that SAFS generates high-quality feature weights for different features: both for highly related features and those less related but still valuable features.

4.8.3. Feature Selection in m > n Regime

The performance of SAFS is next evaluated under a more challenging regime where the number of features exceeds the number of samples (m > n). In some real-world scenarios, only limited samples are labeled. In this regime, only 10% of the samples are randomly selected from the raw datasets, with 70% allocated for training and the remaining 30% for testing. To better demonstrate overall performance, Figure 12 shows the performance of the different baselines at different TopK values. Across almost all ranges, SAFS achieves the best performance in this challenging regime with few samples per class, and its performance increases steadily with K. This experiment shows that SAFS can effectively weight high-dimensional features with limited labels.

4.8.4. Detailed Ablation Results

To better understand the impact of components in SAFS, ablation studies are conducted by systematically removing each key component and observing performance changes across multiple datasets. Table 11 clearly shows that all designs are important for improving performance. When the stacked architecture is removed, performance drops significantly on high-dimensional datasets, and the BN-layer reduces the standard deviation. From the results, the feature skip connection design and the inertia-based update strategy also increase the stability of SAFS.

5. Case Study: SAFS in Short-Term Wind Power Forecasting

  • Background. With increasing energy demands and environmental concerns over fossil fuels, renewable energy sources are essential [45]. Wind energy, being pollution-free and abundant, has gained significant global attention, and accurate wind power prediction is crucial for integrating wind energy into power grids. A high number of input features, however, negatively affects both the forecasting accuracy and the computational time of such systems. To address this issue, some studies apply feature selection methods in wind power systems [46,47].
  • Dataset. This dataset is publicly available on the Kaggle platform (https://www.kaggle.com/datasets/aymanlafaz/wind-energy-germany) (accessed on 4 November 2025). It was collected continuously from 1 January 2011 00:00:00 to 30 December 2021 07:45:00, with data recorded every 15 min. The unit includes 76 sensor monitoring points, such as the temperature of the heat exchanger converter, blade torque, and line-to-line voltage (phase voltage), resulting in 7392 feature value entries per day, approximately 192 MB.
To evaluate the proposed feature selection method on time-series data, experiments were conducted using data generated between 1 January 2020 and 30 June 2020. The dataset was chronologically split into training and testing sets with an 8:2 ratio, comprising 20,889 samples for training and 5223 for testing. The statistical information of the collected wind power data are presented in Table 12.
  • Data cleaning. Features with a missing-value ratio exceeding 50%, as well as outliers, were processed with previous-value imputation to maximally preserve working-condition invariance. For features with missing-value ratios below 50%, local sample relationships were exploited by applying K-NN imputation (K = 5) to estimate the missing values. Finally, min–max normalization was applied to each feature to facilitate faster convergence of SAFS.
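The first and last cleaning steps can be sketched in NumPy as below (previous-value imputation on one sensor series and column-wise min–max scaling). The K-NN imputation step would use, e.g., scikit-learn's KNNImputer with `n_neighbors=5` and is omitted here to keep the sketch dependency-free.

```python
import numpy as np

def forward_fill(col):
    """Previous-value imputation for one sensor series (NaN -> last value)."""
    col = np.asarray(col, dtype=float).copy()
    last = np.nan
    for i, v in enumerate(col):
        if np.isnan(v):
            col[i] = last
        else:
            last = v
    return col

def minmax_normalize(X):
    """Column-wise min-max scaling; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)
```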
  • Experiment. The experiment aims to predict wind power over different short-term periods; a LightGBM regressor is used to evaluate the feature subsets found by the different FS methods.
Figure 13 shows the wind power prediction results using different feature selection methods. The results on the left show that, compared with using all features (Baseline), the RMSE and MAE improve when only the Top-10 features are used for prediction, with STG and SAFS giving the best results. This indicates that feature selection reduces the impact of redundant/irrelevant features on the prediction model. From the curve in Figure 13 (right), compared with STG and the Baseline, the gap between SAFS's Top-10 prediction curve and the true curve is smaller, suggesting that SAFS is beneficial for improving the accuracy of wind power prediction.

6. Limitation and Future Work

While the SAFS method proposed in this paper demonstrates promising results in feature selection, it is imperative to acknowledge certain limitations and outline potential directions for future research.
Firstly, the application of SAFS has not yet been extensively tested in industrial settings where domain-specific knowledge plays a critical role. This highlights the need for more empirical studies to validate and refine SAFS in various real-world scenarios, ensuring its adaptability and effectiveness across different industrial domains.
Secondly, the theoretical underpinnings of SAFS are still in nascent stages. Our future work will develop a comprehensive theoretical analysis that can provide insight into the convergence properties, optimality conditions, and scalability aspects of SAFS.
Lastly, the architectural choices within SAFS present another avenue for exploration. It remains an open question whether alternative structural configurations or enhancements could further improve its performance and efficiency. Investigating different construction strategies and integrating advanced techniques could potentially lead to more robust versions of SAFS.
Addressing these limitations will not only strengthen the credibility and applicability of SAFS but also contribute to the broader field of feature selection in machine learning.

7. Conclusions

This paper proposes SAFS, a novel feature selection framework that can evaluate the importance of features. Specifically, a new FS module, SABlock, is designed to capture complex feature interactions. This module is designed to be stackable, and SAFS is built from this fundamental block. Then, the feature skip connection and the inertia-based weight update method are designed to enhance the performance of SAFS. Experiments on twelve real-world datasets from diverse domains validate that the proposed model discovers features that deliver superior prediction performance for classification tasks. Analysis of the weight distribution across layers provides explanatory insights into the effectiveness of the stacked architecture. Further experiments prove that SAFS can effectively select features in multiple regimes, such as few samples and noise disturbance, which is important for practical applications. The ablation and sensitivity analyses illustrate the effectiveness of our design. One direction of future work is to verify its applicability in real-world applications together with domain experts.

Author Contributions

Conceptualization, N.G.; methodology, Z.L. and J.T.; software, Z.L. and J.T.; validation, Z.C., W.J., Z.L. and J.T.; formal analysis, Z.C. and W.J.; investigation, Z.L. and J.T.; resources, N.G.; data curation, Z.C. and W.J.; writing—original draft preparation, Z.C., Z.L. and J.T.; writing—review and editing, Z.C., Z.L. and J.T.; visualization, Z.L. and J.T.; supervision, N.G.; project administration, N.G.; funding acquisition, N.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially funded by the Huaneng Group Headquarters Technology Projects with No. HNKJ23-HF97.

Data Availability Statement

The data presented in this study are openly available in [OpenML] at [https://www.openml.org/] (accessed on 4 November 2025).

Conflicts of Interest

Author Zhu Chen was employed by the company HUANENG Power International Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Yin, S.; Ding, S.; Xie, X.; Luo, H. A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. Ind. Electron. 2014, 61, 6418–6428. [Google Scholar] [CrossRef]
  2. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. 2017, 50, 1–45. [Google Scholar] [CrossRef]
  3. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 983–991. [Google Scholar]
  4. Wojtas, M.; Chen, K. Feature importance ranking for deep learning. Adv. Neural Inf. Process. Syst. 2020, 33, 5105–5114. [Google Scholar]
  5. Škrlj, B.; Džeroski, S.; Lavrač, N.; Petković, M. Feature Importance Estimation with Self-Attention Networks. In Proceedings of the ECAI 2020, Santiago de Compostela, Spain, 29 August–8 September 2020; pp. 1491–1498. [Google Scholar]
  6. Li, Y.; Chen, C.Y.; Wasserman, W. Deep feature selection: Theory and application to identify enhancers and promoters. J. Comput. Biol. 2016, 23, 322–336. [Google Scholar] [CrossRef] [PubMed]
7. Roy, D.; Murty, K.; Mohan, C. Feature selection using deep neural networks. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–6.
8. Gui, N.; Ge, D.; Hu, Z. AFS: An attention-based mechanism for supervised feature selection. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3705–3713.
9. Yang, B.; Wang, L.; Wong, D.; Chao, L.; Tu, Z. Convolutional self-attention networks. arXiv 2019, arXiv:1904.03107.
10. Yamada, Y.; Lindenbaum, O.; Negahban, S.; Kluger, Y. Feature selection using stochastic gates. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 10648–10659.
11. Atashgahi, Z.; Zhang, X.; Kichler, N.; Liu, S.; Yin, L.; Pechenizkiy, M.; Veldhuis, R.; Mocanu, D. Supervised feature selection with neuron evolution in sparse neural networks. arXiv 2023, arXiv:2303.07200.
12. Liao, Y.; Latty, R.; Yang, B. Feature selection using batch-wise attenuation and feature mask normalization. In Proceedings of the International Joint Conference on Neural Networks, Piscataway, NJ, USA, 18–22 July 2021; pp. 1–9.
13. Lee, C.; Imrie, F.; van der Schaar, M. Self-Supervision Enhanced Feature Selection with Correlated Gates. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021.
14. Qiu, Z.; Zeng, W.; Liao, D.; Gui, N. A-SFS: Semi-supervised feature selection based on multi-task self-supervision. Knowl.-Based Syst. 2022, 252, 109449.
15. Tan, J.; Gui, N.; Qiu, Z. GAEFS: Self-supervised Graph Auto-encoder enhanced Feature Selection. Knowl.-Based Syst. 2024, 290, 111523.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
17. Li, G.; Yu, Z.; Yang, K.; Lin, M.; Chen, C. Exploring Feature Selection With Limited Labels: A Comprehensive Survey of Semi-Supervised and Unsupervised Approaches. IEEE Trans. Knowl. Data Eng. 2024, 36, 6124–6144.
18. Duda, R.; Hart, P.; Stork, D. Pattern Classification, 2nd ed.; Wiley Interscience: New York, NY, USA, 2001.
19. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
20. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
21. Yuan, K.; Miao, D.; Pedrycz, W.; Ding, W.; Zhang, H. Ze-HFS: Zentropy-based uncertainty measure for heterogeneous feature selection and knowledge discovery. IEEE Trans. Knowl. Data Eng. 2024, 36, 7326–7339.
22. Zhang, C.; Nie, F.; Wang, R.; Li, X. Supervised Feature Selection via Multi-Center and Local Structure Learning. IEEE Trans. Knowl. Data Eng. 2024, 36, 4930–4942.
23. Qian, W.; Li, Y.; Ye, Q.; Xia, S.; Huang, J.; Ding, W. Confidence-Induced Granular Partial Label Feature Selection via Dependency and Similarity. IEEE Trans. Knowl. Data Eng. 2024, 36, 5797–5810.
24. Xue, Y.; Zhang, C.; Neri, F.; Gabbouj, M.; Zhang, Y. An external attention-based feature ranker for large-scale feature selection. Knowl.-Based Syst. 2023, 281, 111084.
25. Bateni, M.; Chen, L.; Fahrbach, M.; Fu, G.; Mirrokni, V.; Yasuda, T. Sequential Attention for Feature Selection. arXiv 2022, arXiv:2209.14881.
26. Wang, Y.; Li, X.; Wang, J. A neurodynamic optimization approach to supervised feature selection via fractional programming. Neural Netw. 2021, 136, 194–206.
27. Singh, D.; Climente-González, H.; Petrovich, M.; Kawakami, E.; Yamada, M. FsNet: Feature selection network on high-dimensional biological data. In Proceedings of the International Joint Conference on Neural Networks, Gold Coast, Australia, 18–23 June 2023; pp. 1–9.
28. Zhao, Z.; Liu, H. Semi-supervised feature selection via spectral analysis. In Proceedings of the SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; pp. 641–646.
29. Liu, K.; Li, T.; Yang, X.; Chen, H.; Wang, J.; Deng, Z. SemiFREE: Semi-supervised feature selection with fuzzy relevance and redundancy. IEEE Trans. Fuzzy Syst. 2023, 31, 3384–3396.
30. Guo, Z.; Shen, Y.; Yang, T.; Li, Y.J.; Deng, Y.; Qian, Y. Semi-supervised feature selection based on fuzzy related family. Inf. Sci. 2024, 652, 119660.
31. Karimi, F.; Dowlatshahi, M.; Hashemi, A. SemiACO: A semi-supervised feature selection based on ant colony optimization. Expert Syst. Appl. 2023, 214, 119130.
32. Roffo, G.; Melzi, S.; Castellani, U.; Vinciarelli, A.; Cristani, M. Infinite feature selection: A graph-based feature filtering approach. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4396–4410.
33. Zhang, Z.; Yao, J.; Liu, L.; Li, J.; Li, L.; Wu, X. Partial Label Feature Selection: An Adaptive Approach. IEEE Trans. Knowl. Data Eng. 2024, 36, 4178–4191.
34. Yoon, J.; Jordon, J.; Van der Schaar, M. INVASE: Instance-wise variable selection using neural networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
35. Jethani, N.; Sudarshan, M.; Aphinyanaphongs, Y.; Ranganath, R. Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 1459–1467.
36. Arik, S.; Pfister, T. TabNet: Attentive interpretable tabular learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6679–6687.
37. Yang, J.; Lindenbaum, O.; Kluger, Y. Locally sparse neural networks for tabular biomedical data. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 25123–25153.
38. Cohen, D.; Shnitzer, T.; Kluger, Y.; Talmon, R. Few-sample feature selection via feature manifold learning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 6296–6319.
39. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
40. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
41. Chen, J.; Stern, M.; Wainwright, M.; Jordan, M. Kernel feature selection via conditional covariance minimization. Adv. Neural Inf. Process. Syst. 2017, 30, 2591–2598.
42. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31.
43. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
44. Lee, S.; Park, C.; Lee, H.; Yi, J.; Lee, J.; Yoon, S. Removing undesirable feature contributions using out-of-distribution data. arXiv 2021, arXiv:2101.06639.
45. Wang, Y.; Zou, R.; Liu, F.; Zhang, L.; Liu, Q. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766.
46. Khazaei, S.; Ehsan, M.; Soleymani, S.; Mohammadnezhad-Shourkaei, H. A high-accuracy hybrid method for short-term wind power forecasting. Energy 2022, 238, 122020.
47. El-Kenawy, E.S.; Mirjalili, S.; Khodadadi, N.; Abdelhamid, A.; Eid, M.; El-Said, M.; Ibrahim, A. Feature selection in wind speed forecasting systems based on meta-heuristic optimization. PLoS ONE 2023, 18, e0278491.
Figure 1. The overall architecture of SAFS. Different SABlocks are stacked in a sequential architecture, while 'feature jump concatenation' is employed to enrich feature interactions and avoid possible information loss.
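As a rough illustration of the sequential design in Figure 1, the sketch below stacks several simplified attention blocks and feeds the raw features back in after each block. The SABlock internals here (a dense layer plus a feature-axis softmax) and the additive skip used in place of the paper's jump concatenation are illustrative assumptions, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sablock(x, params):
    """Simplified attention block: dense layer + softmax over the feature axis.
    This stands in for the SABlock; the real block design differs."""
    W, b = params
    h = np.tanh(x @ W + b)                        # hidden representation, shape (n, m)
    logits = h @ W.T                              # project back to feature space
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)    # per-sample feature weights

def safs_weights(x, blocks):
    """Sequentially stack blocks; the additive skip back to the raw input x
    approximates the paper's 'feature jump concatenation'."""
    h, votes = x, []
    for params in blocks:
        h, w = sablock(h, params)
        votes.append(w.mean(axis=0))              # batch-averaged weight vector
        h = 0.5 * (h + x)                         # jump connection to raw features
    return np.mean(votes, axis=0)                 # aggregate across stacked blocks
```

Because each block's vote is a softmax-normalized distribution, the aggregated output remains a valid feature-weight distribution regardless of how many blocks are stacked.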
Figure 2. The two key designs in SAFS. The left part shows the basic building block (SABlock), and the right part shows how two SABlocks are stacked.
Figure 3. The overall architecture of SAFS-Pa. Different SABlocks are stacked in a parallel architecture; each SABlock votes on the feature weights, and the average of the votes is taken as the final result.
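The parallel variant in Figure 3 can be sketched in the same spirit: independent blocks each score the raw input, and their votes are averaged. The scoring function below is a placeholder for the actual SABlock, assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_vote(x, W):
    """One standalone block's vote (placeholder scoring): softmax-normalized
    per-feature scores computed over the batch."""
    scores = np.tanh(x @ W).mean(axis=0)          # one score per feature
    e = np.exp(scores - scores.max())
    return e / e.sum()

def safs_pa(x, num_blocks=3):
    """Parallel architecture: every block sees the raw input independently;
    the final weights are the average of the blocks' votes."""
    m = x.shape[1]
    votes = [block_vote(x, 0.1 * rng.normal(size=(m, m))) for _ in range(num_blocks)]
    return np.mean(votes, axis=0)
```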
Figure 4. Performance at different TopK levels: (a) SVHN; (b) CIFAR10; (c) GAS; (d) ISOLET. TopK denotes the number of important features selected by the respective baselines.
Figure 5. Weight distributions on MNIST (Tabular) from different layers; the larger the feature weight, the more important the feature.
Figure 6. Performance under different scenarios: (a–c) Few-shot learning with high-dimensional features (m > n), showing performance at different TopK levels; (d–f) Robustness evaluation with 30% randomly masked data on datasets of varying scales: large (SVHN), medium (ISOLET), and small (DNA).
Figure 7. Robustness analysis under different perturbation scenarios: (a) Performance with varying feature perturbations using Top-3% features; (b) Performance with 0%, 5%, and 10% label noise using Top-1% features. Average performance across all datasets is reported.
Figure 8. Feature selection performance in two specialized tasks: (a) impact of changing task complexity on feature selection effectiveness; (b) feature selection applied to transfer learning scenarios. Both plots use the ISOLET dataset, with number of classes on the x-axis and Micro-F1 score on the y-axis.
Figure 9. Performance comparison of different stacked layer architectures on datasets of varying scales: (a) smaller-scale ISOLET dataset (617 dimensions); (b) larger-scale SVHN dataset (3072 dimensions). For clearer visualization, a slight horizontal offset was applied to the curves.
Figure 10. Parameter sensitivity analysis: (a) model performance with varying batch sizes (B); (b) model performance with varying inertia parameters (g). Results are shown for different parameter ratios to demonstrate sensitivity.
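The inertia parameter g swept in Figure 10b controls how strongly previously accumulated feature weights persist across batches. The exact update rule is defined in the paper's method section; the exponential-moving-average form below is a plausible sketch, stated here as an assumption.

```python
import numpy as np

def inertia_update(w_prev, w_batch, g=0.9):
    """Inertia-style update (assumed form): a convex blend of the running
    feature weights and the current batch estimate. Larger g damps
    batch-to-batch noise in the weight distribution."""
    return g * w_prev + (1.0 - g) * w_batch
```

Because the update is a convex combination, a weight vector that starts on the probability simplex stays on it, which keeps the running weights interpretable as a distribution over features.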
Figure 11. Performance comparison across multiple datasets at different TopK levels: (1st row) SVHN, CIFAR10, MNIST; (2nd row) ISOLET, HAR, GAS; (3rd row) DNA, SATIMAGE, SEGMENT. The plots demonstrate the feature selection effectiveness across different data scales and domains.
Figure 12. Feature selection performance in the high-dimensional regime (m > n), where the number of features exceeds the number of samples: (1st row) ISOLET, CIFAR10, SVHN; (2nd row) HAR, MNIST, DNA. Results show comparative performance at different TopK levels (D = dimensions).
Figure 13. Wind power forecasting using regression models with different FS methods. The x-axis is the time step and y-axis is the wind power (kW). For SAFS, the TOP-1 feature is Torque, while TOP-2 and -3 are Power Factor and Pitch Demand Baseline Degree, respectively.
Table 1. Notation description.

| Notation | Description |
|---|---|
| n | The number of samples |
| m | The number of features |
| g | Inertia parameter |
| s | Stack parameter |
| x_i | The i-th sample |
| y_i | The i-th label |
| c_i | The bias vector |
| x_ij | The j-th feature of the i-th sample |
| K | The number of selected features |
| P(·) | Distribution of the dataset |
| X | Original data matrix |
| y | The label set corresponding to X |
| h | Hidden units in a neural network |
| w | The weight vector of features |
| X_v | The selected features |
| X_i | The i-th feature |
| X_B | The batch-wise inputs |
| H | Low-dimension embeddings |
| Θ | Trainable weight matrix |
| L(·) | Loss function |
Table 2. Datasets description.

| Datasets | Features | TopK | Classes | Samples | Domains |
|---|---|---|---|---|---|
| Chiar. | 12,625 | 3% (390) | 4 | 127 | Medical |
| SVHN (Tabular) | 3072 | 3% (92) | 10 | 10,000 | Picture |
| CIFAR10 (Tabular) | 3072 | 3% (92) | 10 | 10,000 | Picture |
| Gravier | 2905 | 3% (90) | 2 | 168 | Medical |
| Alon | 2000 | 3% (60) | 2 | 62 | Medical |
| MNIST (Tabular) | 784 | 3% (24) | 10 | 8000 | Picture |
| ISOLET | 618 | 3% (18) | 26 | 2600 | Speech |
| HAR | 561 | 3% (16) | 6 | 3000 | Physics |
| DNA | 180 | 3% (5) | 3 | 450 | Biology |
| GAS | 128 | 3% (5) | 6 | 6000 | Chemistry |
| SATIMAGE (SAT.) | 37 | 5 | 6 | 600 | Physics |
| SEGMENT (SEG.) | 19 | 5 | 7 | 1400 | Picture |
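The TopK column above corresponds to roughly 3% of each dataset's features (e.g., 3072 features on SVHN give K = 92). A minimal selection routine might look as follows; the round-to-nearest rule is an assumption, since a few table entries (e.g., ISOLET's 18) suggest the original experiments rounded slightly differently.

```python
import numpy as np

def select_topk(weights, ratio=0.03):
    """Return the indices of roughly the top `ratio` fraction of features,
    ranked by learned weight (largest first). Rounding rule is assumed."""
    k = max(1, round(len(weights) * ratio))
    return np.argsort(weights)[::-1][:k]
```

With 3072 features this yields K = 92, matching the SVHN and CIFAR10 rows of the table.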
Table 3. Average performance (Micro-F1↑) with the LightGBM classifier over ten runs. 'OT' means overtime (more than 24 h of computation on a dual EPYC 7552 system, 192 cores); '−' means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.

| Algorithm | SEG. | SAT. | GAS | DNA | HAR | ISOLET | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiar. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LASSO | 86.95 | 74.77 | 90.21 | 68.89 | 80.57 | 66.82 | 54.31 | 72.45 | 73.72 | 22.55 | 27.23 | 82.56 |
| RFE | 93.09 | 72.88 | 94.85 | 31.25 | 83.24 | 55.86 | 40.75 | 66.84 | 68.43 | OT | OT | OT |
| RF | 96.50 | 83.44 | 89.37 | 87.48 | 93.78 | 76.73 | 61.34 | 81.05 | 82.15 | 51.96 | 36.50 | 82.82 |
| XGB | 90.52 | 83.88 | 95.37 | 82.74 | 93.35 | 68.38 | 61.78 | 84.73 | 87.25 | 57.24 | 41.43 | 85.64 |
| CCM | 95.12 | 82.56 | 93.63 | 64.59 | 82.77 | 59.12 | 42.71 | 80.05 | 75.68 | 45.52 | 41.34 | − |
| FIR | 91.73 | 73.88 | 93.61 | 43.33 | 76.72 | 63.28 | 41.18 | 80.00 | 68.23 | 47.40 | 41.24 | 73.84 |
| AFS | 95.98 | 82.09 | 96.21 | 72.67 | 91.56 | 75.20 | 62.80 | 80.00 | 80.88 | 55.89 | 38.16 | 86.92 |
| SANs | 94.04 | 79.11 | 94.37 | 61.03 | 88.77 | 68.79 | 36.86 | 84.31 | 71.37 | 45.22 | 39.84 | 71.28 |
| FM | 95.24 | 80.45 | 97.01 | 83.91 | 89.94 | 75.88 | 63.67 | 78.02 | 77.57 | 57.92 | 42.87 | 78.85 |
| STG | 96.33 | 81.77 | 95.57 | 79.55 | 94.06 | 77.94 | 62.44 | 83.15 | 80.78 | 56.30 | 40.61 | 83.68 |
| NeuroFS | 94.34 | 82.54 | 86.11 | 70.37 | 94.95 | 60.59 | 55.61 | 78.59 | 80.26 | 45.90 | 42.42 | 87.91 |
| A-SFS | 96.17 | 80.04 | 94.35 | 80.59 | 92.44 | 69.25 | 64.16 | 75.78 | 72.55 | 42.14 | 38.94 | 74.36 |
| SEFS | 95.59 | 83.42 | 96.94 | − | 76.13 | 68.07 | 61.77 | 78.64 | 75.61 | 53.76 | 39.99 | − |
| SAFS-Pa | 96.42 | 84.91 | 97.14 | 87.64 | 93.86 | 74.00 | 64.71 | 85.38 | 85.58 | 54.40 | 41.66 | 88.23 |
| SAFS | 96.75 | 84.33 | 97.99 | 89.33 | 96.17 | 80.84 | 62.64 | 86.92 | 87.55 | 60.61 | 43.86 | 88.46 |
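Tables 3 and 4 report Micro-F1, which pools true positives, false positives, and false negatives over all classes before computing F1. A small reference implementation:

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN across classes, then compute F1."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    tp = sum(np.sum((y_pred == c) & (y_true == c)) for c in classes)
    fp = sum(np.sum((y_pred == c) & (y_true != c)) for c in classes)
    fn = sum(np.sum((y_pred != c) & (y_true == c)) for c in classes)
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For single-label multiclass predictions every error is simultaneously one false positive and one false negative, so micro-F1 collapses to plain accuracy, which is a useful sanity check when reproducing these tables.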
Table 4. Average performance (Micro-F1↑) with the CatBoost classifier over ten runs. 'OT' means overtime (more than 24 h of computation on a dual EPYC 7552 system, 192 cores); '−' means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.

| Algorithm | SEG. | SAT. | GAS | DNA | HAR | ISOLET | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiar. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LASSO | 87.88 | 75.50 | 90.68 | 69.85 | 80.51 | 68.20 | 56.85 | 78.42 | 73.92 | 25.68 | 29.29 | 78.68 |
| RFE | 93.83 | 77.89 | 95.87 | 46.67 | 85.58 | 63.72 | 43.68 | OT | OT | OT | OT | OT |
| RF | 97.42 | 84.38 | 89.50 | 87.11 | 93.81 | 77.29 | 63.22 | 83.15 | 82.54 | 58.01 | 39.27 | 84.47 |
| XGB | 90.69 | 83.72 | 95.40 | 82.44 | 92.67 | 69.23 | 64.25 | 85.79 | 89.21 | 63.38 | 43.88 | 89.73 |
| CCM | 96.12 | 82.67 | 94.05 | 68.51 | 89.83 | 67.01 | 56.10 | 84.21 | 76.07 | 52.04 | 43.97 | − |
| FIR | 94.04 | 82.22 | 92.24 | 40.74 | 76.83 | 65.33 | 38.23 | 83.15 | 74.51 | 55.82 | 43.76 | 69.74 |
| AFS | 95.02 | 78.67 | 96.38 | 77.89 | 91.80 | 75.96 | 65.10 | 83.07 | 80.51 | 58.39 | 42.56 | 85.38 |
| SANs | 93.66 | 79.55 | 94.30 | 59.40 | 89.11 | 70.71 | 30.32 | 85.36 | 76.47 | 54.26 | 41.27 | 72.30 |
| FM | 95.33 | 80.76 | 97.12 | 83.47 | 90.45 | 76.33 | 64.02 | 81.14 | 77.73 | 59.26 | 44.56 | 76.30 |
| STG | 97.09 | 83.77 | 96.44 | 86.22 | 91.20 | 75.35 | 65.11 | 86.88 | 80.78 | 59.75 | 43.22 | 81.57 |
| NeuroFS | 95.28 | 83.03 | 88.49 | 82.05 | 95.22 | 58.47 | 57.64 | 82.87 | 84.26 | 46.62 | 43.58 | 86.67 |
| A-SFS | 95.57 | 83.55 | 94.35 | 80.51 | 87.73 | 76.64 | 65.99 | 76.38 | 74.07 | 56.29 | 42.23 | 76.92 |
| SEFS | 96.08 | 82.79 | 96.67 | − | 83.33 | 77.02 | 65.42 | 80.29 | 77.89 | 54.78 | 43.01 | − |
| SAFS-Pa | 96.96 | 84.50 | 97.12 | 86.22 | 94.43 | 75.44 | 66.63 | 86.15 | 84.61 | 60.74 | 45.29 | 85.76 |
| SAFS | 97.17 | 82.62 | 98.18 | 89.11 | 95.95 | 81.82 | 65.33 | 87.38 | 88.11 | 66.86 | 46.89 | 89.69 |
Table 5. Relevant feature discovery results for synthetic datasets with 20 features. The best and second-best results are highlighted in bold and underline, respectively.

| Dataset | E1 | E2 | E3 | E4 | E5 | E6 |
|---|---|---|---|---|---|---|
| Metrics (%) | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 |
| XGB | 100/66.7 | 100/66.7 | 96.7/64.4 | 85.7/63.2 | 85.7/63.2 | 88.9/64.0 |
| RF | 100/66.7 | 100/66.7 | 100/66.7 | 71.4/58.8 | 74.3/59.6 | 76.5/60.4 |
| AFS | 100/66.7 | 100/66.7 | 96.7/64.4 | 46.7/48.0 | 57.2/53.3 | 46.7/48.0 |
| STG | 100/66.7 | 100/66.7 | 100/66.7 | 71.4/58.8 | 61.9/55.2 | 85.2/62.9 |
| NeuroFS | 100/66.7 | 100/66.7 | 96.7/64.4 | 45.6/47.6 | 65.6/57.2 | 52.5/51.6 |
| SAFS | 100/66.7 | 100/66.7 | 100/66.7 | 98.6/65.6 | 86.7/63.5 | 85.2/62.9 |
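TPR and F1 in Table 5 measure how well the selected feature set recovers the known relevant features of each synthetic dataset. The helper below shows the computation; the example sets (5 relevant features, 10 selected) are hypothetical, chosen only to reproduce the 100/66.7 pattern that arises when every relevant feature is found but twice as many features are selected.

```python
def discovery_scores(selected, relevant):
    """TPR and F1 (in percent) of a selected feature set against the
    ground-truth relevant features of a synthetic dataset."""
    selected, relevant = set(selected), set(relevant)
    tp = len(selected & relevant)
    tpr = tp / len(relevant)                      # recall over the true features
    precision = tp / len(selected)
    f1 = 2 * precision * tpr / (precision + tpr) if tp else 0.0
    return 100 * tpr, 100 * f1
```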
Table 6. Ablation studies. SAFS-s: the stack architecture removed, with FS performed by a single SABlock; SAFS-bn: the BN layer of the SABlock removed; SAFS-i: the inertia-based weight update strategy removed; SAFS-c: the feature skip connection removed. The best and second-best results are highlighted in bold and underline, respectively.

| Datasets | SAFS | SAFS-s | SAFS-bn | SAFS-i | SAFS-c |
|---|---|---|---|---|---|
| Chiar. | 88.46 ± 6.70 | 86.92 ± 7.33 | 70.38 ± 14.49 | 87.27 ± 7.52 | 86.00 ± 7.13 |
| SVHN | 60.61 ± 0.74 | 57.03 ± 3.18 | 57.82 ± 3.23 | 59.40 ± 1.06 | 59.47 ± 1.43 |
| ISOLET | 80.84 ± 2.16 | 73.01 ± 3.68 | 65.73 ± 0.66 | 78.20 ± 2.68 | 79.09 ± 2.29 |
| GAS | 97.99 ± 0.39 | 96.83 ± 0.72 | 93.95 ± 2.07 | 97.08 ± 0.81 | 97.65 ± 0.59 |
Table 7. Performance comparison with different variants. The best and second-best results are highlighted in bold and underline, respectively.

| # of Layers | HAR | ISOLET | SVHN | Chiar. |
|---|---|---|---|---|
| 1 | 94.85 ± 0.81 | 73.01 ± 3.68 | 54.68 ± 1.61 | 86.92 ± 7.33 |
| 2 | 95.92 ± 0.99 | 81.94 ± 1.71 | 55.62 ± 1.78 | 88.23 ± 8.03 |
| 3 (Ours) | 96.17 ± 0.83 | 80.84 ± 2.16 | 60.61 ± 0.74 | 88.46 ± 6.70 |
| 5 | 96.08 ± 0.83 | 80.34 ± 1.13 | 59.64 ± 0.63 | 90.00 ± 7.73 |
| 10 | 95.71 ± 0.95 | 80.63 ± 2.05 | 59.54 ± 0.79 | 88.84 ± 7.38 |
| 20 | 95.63 ± 0.64 | 77.36 ± 3.03 | 59.69 ± 0.86 | 87.30 ± 8.25 |
Table 8. Computational complexity per iteration (in seconds).

| Dataset (Sam./Dim.) | FIR | AFS | SANs | STG | SAFS |
|---|---|---|---|---|---|
| SVHN (10,000, 3072) | 0.2128 | 0.0474 | 3.2687 | 0.1534 | 0.0748 |
| MNIST (8000, 784) | 0.0710 | 0.0233 | 0.8126 | 0.0474 | 0.0316 |
| GAS (6000, 128) | 0.0121 | 0.0073 | 0.1778 | 0.0111 | 0.0118 |
Table 9. Average performance (Micro-F1↑) with the LightGBM classifier over ten runs. 'OT' means overtime (more than 24 h of computation on a dual EPYC 7552 system, 192 cores); '−' means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.

| Algorithm | SEGMENT | SATIMAGE | GAS | DNA | HAR | ISOLET |
|---|---|---|---|---|---|---|
| LASSO | 86.95 ± 0.78 | 74.77 ± 1.22 | 90.21 ± 0.62 | 68.89 ± 3.11 | 80.57 ± 1.02 | 66.82 ± 1.60 |
| RFE | 93.09 ± 1.17 | 72.88 ± 1.97 | 94.85 ± 0.42 | 31.25 ± 2.99 | 83.24 ± 1.36 | 55.86 ± 0.51 |
| RF | 96.50 ± 0.84 | 83.44 ± 1.83 | 89.37 ± 3.13 | 87.48 ± 2.48 | 93.78 ± 0.67 | 76.73 ± 2.25 |
| XGB | 90.52 ± 0.88 | 83.88 ± 2.23 | 95.37 ± 0.26 | 82.74 ± 2.77 | 93.35 ± 0.53 | 68.38 ± 1.45 |
| CCM | 95.12 ± 0.58 | 82.56 ± 2.47 | 93.63 ± 0.82 | 64.59 ± 5.76 | 82.77 ± 7.83 | 59.12 ± 2.66 |
| FIR | 91.73 ± 4.45 | 73.88 ± 5.49 | 93.61 ± 2.04 | 43.33 ± 7.02 | 76.72 ± 6.23 | 63.28 ± 5.77 |
| AFS | 95.98 ± 1.05 | 82.09 ± 2.56 | 96.21 ± 0.94 | 72.67 ± 8.47 | 91.56 ± 2.96 | 75.20 ± 2.21 |
| SANs | 94.04 ± 1.84 | 79.11 ± 2.69 | 94.37 ± 2.29 | 61.03 ± 5.74 | 88.77 ± 1.96 | 68.79 ± 5.11 |
| FM | 95.24 ± 0.59 | 80.45 ± 1.87 | 97.01 ± 0.98 | 83.91 ± 3.36 | 89.94 ± 3.51 | 75.88 ± 3.44 |
| STG | 96.33 ± 0.43 | 81.77 ± 3.46 | 95.57 ± 1.79 | 79.55 ± 5.88 | 94.06 ± 0.72 | 77.94 ± 1.72 |
| NeuroFS | 94.34 ± 1.55 | 82.54 ± 1.91 | 86.11 ± 5.98 | 70.37 ± 6.08 | 94.95 ± 1.51 | 60.59 ± 4.45 |
| A-SFS | 96.17 ± 0.48 | 80.04 ± 1.63 | 94.35 ± 1.33 | 80.59 ± 2.56 | 92.44 ± 1.76 | 69.25 ± 3.33 |
| SEFS | 95.59 ± 0.47 | 83.42 ± 0.97 | 96.94 ± 0.83 | − | 76.13 ± 4.29 | 68.07 ± 4.24 |
| SAFS-Pa | 96.42 ± 0.33 | 84.91 ± 3.03 | 97.14 ± 0.42 | 87.64 ± 3.14 | 93.86 ± 0.80 | 74.00 ± 4.04 |
| SAFS | 96.75 ± 1.40 | 84.33 ± 3.06 | 97.99 ± 0.39 | 89.33 ± 3.41 | 96.17 ± 0.83 | 80.84 ± 2.16 |

| Algorithm | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiaretti |
|---|---|---|---|---|---|---|
| LASSO | 54.31 ± 0.88 | 72.45 ± 10.05 | 73.72 ± 6.45 | 22.55 ± 0.66 | 27.23 ± 0.91 | 82.56 ± 5.10 |
| RFE | 40.75 ± 0.69 | 66.84 ± 10.53 | 68.43 ± 7.86 | OT | OT | OT |
| RF | 61.34 ± 1.18 | 81.05 ± 8.91 | 82.15 ± 3.53 | 51.96 ± 1.39 | 36.50 ± 1.96 | 82.82 ± 6.98 |
| XGB | 61.78 ± 0.81 | 84.73 ± 8.28 | 87.25 ± 3.24 | 57.24 ± 0.80 | 41.43 ± 0.87 | 85.64 ± 7.21 |
| CCM | 42.71 ± 4.79 | 80.05 ± 7.14 | 75.68 ± 5.34 | 45.52 ± 2.54 | 41.34 ± 1.11 | − |
| FIR | 41.18 ± 3.85 | 80.00 ± 3.93 | 68.23 ± 4.36 | 47.40 ± 1.09 | 41.24 ± 0.76 | 73.84 ± 6.76 |
| AFS | 62.80 ± 2.01 | 80.00 ± 9.85 | 80.88 ± 6.20 | 55.89 ± 1.27 | 38.16 ± 3.28 | 86.92 ± 7.33 |
| SANs | 36.86 ± 2.39 | 84.31 ± 4.21 | 71.37 ± 6.63 | 45.22 ± 4.84 | 39.84 ± 1.16 | 71.28 ± 7.50 |
| FM | 63.67 ± 2.35 | 78.02 ± 8.41 | 77.57 ± 7.19 | 57.92 ± 0.89 | 42.87 ± 2.25 | 78.85 ± 9.42 |
| STG | 62.44 ± 2.67 | 83.15 ± 6.13 | 80.78 ± 4.80 | 56.30 ± 1.04 | 40.61 ± 0.82 | 83.68 ± 8.71 |
| NeuroFS | 55.61 ± 3.97 | 78.59 ± 9.41 | 80.26 ± 4.55 | 45.90 ± 1.46 | 42.42 ± 0.73 | 87.91 ± 6.67 |
| A-SFS | 64.16 ± 2.02 | 75.78 ± 7.13 | 72.55 ± 8.36 | 42.14 ± 1.70 | 38.94 ± 1.95 | 74.36 ± 7.72 |
| SEFS | 61.77 ± 2.87 | 78.64 ± 5.71 | 75.61 ± 6.13 | 53.76 ± 5.68 | 39.99 ± 1.50 | − |
| SAFS-Pa | 64.71 ± 2.08 | 85.38 ± 7.25 | 85.58 ± 3.83 | 54.40 ± 2.08 | 41.66 ± 1.22 | 88.23 ± 7.84 |
| SAFS | 62.64 ± 2.37 | 86.92 ± 6.92 | 87.55 ± 4.23 | 60.61 ± 0.74 | 43.86 ± 1.31 | 88.46 ± 6.70 |
Table 10. Average performance (Micro-F1↑) with the CatBoost classifier over ten runs. 'OT' means overtime (more than 24 h of computation on a dual EPYC 7552 system, 192 cores); '−' means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.

| Algorithm | SEGMENT | SATIMAGE | GAS | DNA | HAR | ISOLET |
|---|---|---|---|---|---|---|
| LASSO | 87.88 ± 0.99 | 75.50 ± 2.02 | 90.68 ± 0.57 | 69.85 ± 3.44 | 80.51 ± 1.39 | 68.20 ± 1.49 |
| RFE | 93.83 ± 0.90 | 77.89 ± 5.94 | 95.87 ± 1.52 | 46.67 ± 10.72 | 85.58 ± 1.48 | 63.72 ± 2.69 |
| RF | 97.42 ± 0.62 | 84.38 ± 1.81 | 89.50 ± 3.14 | 87.11 ± 2.63 | 93.81 ± 0.86 | 77.29 ± 1.97 |
| XGB | 90.69 ± 0.84 | 83.72 ± 1.53 | 95.40 ± 0.35 | 82.44 ± 2.86 | 92.67 ± 0.41 | 69.23 ± 0.86 |
| CCM | 96.12 ± 0.76 | 82.67 ± 2.31 | 94.05 ± 1.92 | 68.51 ± 3.07 | 89.83 ± 3.11 | 67.01 ± 6.09 |
| FIR | 94.04 ± 5.73 | 82.22 ± 3.89 | 92.24 ± 2.81 | 40.74 ± 3.12 | 76.83 ± 5.20 | 65.33 ± 3.34 |
| AFS | 95.02 ± 2.16 | 78.67 ± 3.01 | 96.38 ± 1.01 | 77.89 ± 4.11 | 91.80 ± 2.21 | 75.96 ± 2.15 |
| SANs | 93.66 ± 2.87 | 79.55 ± 3.34 | 94.30 ± 2.60 | 59.40 ± 7.53 | 89.11 ± 2.05 | 70.71 ± 5.03 |
| STG | 97.09 ± 0.45 | 83.77 ± 1.70 | 96.44 ± 1.10 | 86.22 ± 5.49 | 91.20 ± 3.92 | 75.35 ± 4.50 |
| FM | 95.33 ± 0.57 | 80.76 ± 1.79 | 97.12 ± 0.96 | 83.47 ± 3.56 | 90.45 ± 3.17 | 76.33 ± 3.03 |
| NeuroFS | 95.28 ± 1.56 | 83.03 ± 1.24 | 88.49 ± 4.82 | 82.05 ± 5.87 | 95.22 ± 2.38 | 58.47 ± 5.27 |
| A-SFS | 95.57 ± 3.55 | 83.55 ± 1.08 | 94.35 ± 1.96 | 80.51 ± 2.77 | 87.73 ± 4.20 | 76.64 ± 4.79 |
| SEFS | 96.08 ± 0.57 | 82.79 ± 2.46 | 96.67 ± 0.74 | − | 83.33 ± 2.49 | 77.02 ± 2.91 |
| SAFS-Pa | 96.96 ± 0.97 | 84.50 ± 2.53 | 97.12 ± 0.57 | 86.22 ± 6.72 | 94.43 ± 0.80 | 75.44 ± 3.40 |
| SAFS | 97.17 ± 1.15 | 82.62 ± 1.95 | 98.18 ± 0.25 | 89.11 ± 3.36 | 95.95 ± 0.72 | 81.82 ± 4.20 |

| Algorithm | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiaretti |
|---|---|---|---|---|---|---|
| LASSO | 56.85 ± 0.74 | 78.42 ± 7.96 | 73.92 ± 6.38 | 25.68 ± 0.55 | 29.29 ± 0.71 | 78.68 ± 7.20 |
| RFE | 43.68 ± 0.52 | OT | OT | OT | OT | OT |
| RF | 63.22 ± 1.28 | 83.15 ± 7.36 | 82.54 ± 5.21 | 58.01 ± 1.82 | 39.27 ± 1.39 | 84.47 ± 9.07 |
| XGB | 64.25 ± 0.61 | 85.79 ± 8.17 | 89.21 ± 5.13 | 63.38 ± 0.60 | 43.88 ± 1.02 | 89.73 ± 7.57 |
| CCM | 56.10 ± 4.25 | 84.21 ± 8.15 | 76.07 ± 7.27 | 52.04 ± 1.54 | 43.97 ± 0.62 | − |
| FIR | 38.23 ± 2.77 | 83.15 ± 5.15 | 74.51 ± 5.26 | 55.82 ± 1.21 | 43.76 ± 0.40 | 69.74 ± 6.95 |
| AFS | 65.10 ± 1.82 | 83.07 ± 5.75 | 80.51 ± 6.19 | 58.39 ± 2.10 | 42.56 ± 0.98 | 85.38 ± 11.76 |
| SANs | 30.32 ± 6.67 | 85.36 ± 5.36 | 76.47 ± 7.12 | 54.26 ± 2.47 | 41.27 ± 0.78 | 72.30 ± 7.84 |
| STG | 65.11 ± 2.65 | 86.88 ± 5.36 | 80.78 ± 5.13 | 59.75 ± 1.46 | 43.22 ± 0.99 | 81.57 ± 8.88 |
| FM | 64.02 ± 2.26 | 81.14 ± 7.56 | 77.73 ± 6.97 | 59.26 ± 0.47 | 44.56 ± 0.63 | 76.30 ± 8.24 |
| NeuroFS | 57.64 ± 2.31 | 82.87 ± 8.52 | 84.26 ± 4.74 | 46.62 ± 1.57 | 43.58 ± 0.69 | 86.67 ± 7.10 |
| A-SFS | 65.99 ± 2.40 | 76.38 ± 4.70 | 74.07 ± 8.54 | 56.29 ± 4.14 | 42.23 ± 2.67 | 76.92 ± 6.45 |
| SEFS | 65.42 ± 2.14 | 80.29 ± 6.67 | 77.89 ± 7.52 | 54.78 ± 3.62 | 43.01 ± 2.78 | − |
| SAFS-Pa | 66.63 ± 2.77 | 86.15 ± 4.61 | 84.61 ± 4.56 | 60.74 ± 2.21 | 45.29 ± 1.38 | 85.76 ± 10.88 |
| SAFS | 65.33 ± 1.79 | 87.38 ± 3.52 | 88.11 ± 5.12 | 66.86 ± 0.88 | 46.89 ± 1.27 | 89.69 ± 8.13 |
Table 11. Ablation studies. SAFS-s: the stack architecture removed, with FS performed by a single SABlock; SAFS-bn: the BN layer of the SABlock removed; SAFS-i: the inertia-based weight update strategy removed; SAFS-c: the feature skip connection removed. The best and second-best results are highlighted in bold and underline, respectively.

| Datasets | SAFS | SAFS-s | SAFS-bn | SAFS-i | SAFS-c |
|---|---|---|---|---|---|
| SVHN | 60.61 ± 0.74 | 57.03 ± 3.18 | 57.82 ± 3.23 | 59.40 ± 1.06 | 59.47 ± 1.43 |
| CIFAR10 | 43.86 ± 1.31 | 41.48 ± 1.29 | 41.39 ± 1.76 | 43.61 ± 0.90 | 43.57 ± 1.33 |
| MNIST | 62.64 ± 2.37 | 64.04 ± 2.77 | 59.98 ± 4.12 | 63.65 ± 2.05 | 62.38 ± 1.74 |
| ISOLET | 80.84 ± 2.16 | 73.01 ± 3.68 | 65.73 ± 0.66 | 78.20 ± 2.68 | 79.09 ± 2.29 |
| HAR | 96.17 ± 0.83 | 94.85 ± 0.81 | 82.25 ± 5.84 | 94.25 ± 1.80 | 95.95 ± 0.92 |
| DNA | 89.33 ± 3.41 | 85.67 ± 6.67 | 87.22 ± 3.10 | 88.88 ± 3.44 | 88.11 ± 4.95 |
| GAS | 97.99 ± 0.39 | 96.83 ± 0.72 | 93.95 ± 2.07 | 97.08 ± 0.81 | 97.65 ± 0.59 |
| SATIMAGE | 84.33 ± 3.06 | 84.67 ± 2.42 | 81.33 ± 7.09 | 83.08 ± 3.07 | 82.37 ± 2.93 |
| SEGMENT | 96.75 ± 1.37 | 96.43 ± 0.71 | 96.64 ± 1.10 | 96.04 ± 1.86 | 96.17 ± 1.31 |
Table 12. Description of the wind power dataset.

| Statistic | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| Power (kW) | 1052.86 | 1083.11 | −42.19 | 33.75 | 588.81 | 2218.21 | 2777.19 |

Share and Cite

Chen, Z.; Jiang, W.; Tan, J.; Li, Z.; Gui, N. Supervised Feature Selection Method Using Stackable Attention Networks. Mathematics 2025, 13, 3703. https://doi.org/10.3390/math13223703
