This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition
by Shilu Kang, Hua Huo *, Jiaxin Xu, Aokun Mei and Chen Zhang
College of Information Engineering and Artificial Intelligence, Henan University of Science and Technology, Luoyang 471000, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(22), 6849; https://doi.org/10.3390/s25226849 (registering DOI)
Submission received: 8 October 2025 / Revised: 5 November 2025 / Accepted: 7 November 2025 / Published: 9 November 2025
Abstract
Different action recognition tasks vary significantly in how much they rely on local versus global features. For long-video understanding in particular, dynamically balancing the contributions of the two has become a critical challenge for improving recognition accuracy. This paper proposes a Multi-Layer Bidirectional Distillation (MBD) model based on a two-stream architecture. It employs a 3D CNN and a video Transformer to capture the local and global spatio-temporal features of videos, respectively, with the aim of exploring the complementary mechanisms between these two feature types and enabling their synergistic enhancement across diverse recognition scenarios. The model quantifies the contribution of each feature type to a given recognition task to determine feature dominance, and categorizes videos into distinct feature-dominant groups. This mechanism gives knowledge transfer a clear direction, overcoming the limitations of traditional unidirectional knowledge distillation. Bidirectional knowledge distillation is then performed at the intermediate and final layers, training the model to learn the complementary relationships between the features and addressing the insufficient representational capacity of non-dominant features. During inference, an adaptive fusion strategy based on feature dominance combines the features via dynamic weighted summation. This strategy suppresses noise from non-dominant features while maximizing the discriminative advantages of dominant features. The MBD model is evaluated through systematic comparative experiments on four classic action recognition benchmarks (UCF101, HMDB51, Kinetics-400, and Something-Something V2). The results demonstrate that the MBD model not only excels at short-video recognition but also delivers superior performance when analyzing complex actions in long-video scenarios.
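The abstract names three mechanisms: assigning each video to a feature-dominant group, distilling knowledge in both directions between the two streams, and fusing the streams at inference by dynamic weighted summation. The following is a minimal, framework-free sketch of how these pieces could fit together; it is not the authors' implementation. All function names, the dominance criterion (comparing each stream's softmax probability on the ground-truth class), the temperature and `alpha` values, and the confidence-based fusion weights are hypothetical choices made for illustration. Only the final-layer logit term of the distillation is shown; intermediate-layer terms would be analogous feature-matching losses.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dominance(local_logits, global_logits, label: int) -> str:
    """Hypothetical criterion: the stream that assigns higher probability
    to the ground-truth class is treated as dominant for this video."""
    p_local = softmax(local_logits)[label]
    p_global = softmax(global_logits)[label]
    return "local" if p_local >= p_global else "global"


def group_by_dominance(samples) -> Dict[str, List[int]]:
    """Split sample indices into local-dominant and global-dominant groups.
    Each sample is a (local_logits, global_logits, label) tuple."""
    groups: Dict[str, List[int]] = {"local": [], "global": []}
    for idx, (loc, glo, label) in enumerate(samples):
        groups[dominance(loc, glo, label)].append(idx)
    return groups


def bidirectional_kd_loss(local_logits, global_logits, dominant: str,
                          temperature: float = 2.0, alpha: float = 0.7) -> float:
    """Final-layer logit distillation in both directions.
    The dominant stream teaches the other with weight alpha; the reverse
    direction keeps weight 1 - alpha. In a real training framework the
    teacher side of each KL term would be detached (stop-gradient)."""
    p_local = softmax(local_logits, temperature)
    p_global = softmax(global_logits, temperature)
    local_teaches_global = kl_divergence(p_local, p_global)
    global_teaches_local = kl_divergence(p_global, p_local)
    if dominant == "local":
        loss = alpha * local_teaches_global + (1 - alpha) * global_teaches_local
    else:
        loss = alpha * global_teaches_local + (1 - alpha) * local_teaches_global
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return temperature ** 2 * loss


def adaptive_fusion(local_logits, global_logits) -> List[float]:
    """Dynamic weighted summation at inference time. Labels are unknown here,
    so each stream's maximum softmax probability serves as a (hypothetical)
    dominance proxy; the two weights are normalized to sum to 1."""
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    c_local, c_global = max(p_local), max(p_global)
    w_local = c_local / (c_local + c_global)
    w_global = 1.0 - w_local
    return [w_local * a + w_global * b for a, b in zip(p_local, p_global)]


if __name__ == "__main__":
    local, global_ = [2.0, 0.5, 0.1], [0.2, 1.5, 0.3]
    dom = dominance(local, global_, label=0)
    print(dom, bidirectional_kd_loss(local, global_, dom))
    fused = adaptive_fusion(local, global_)
    print(max(range(len(fused)), key=fused.__getitem__))
```

With `alpha` above 0.5, the dominant-to-non-dominant direction carries more weight, which matches the abstract's goal of strengthening non-dominant features; setting `alpha = 0.5` reduces the loss to a symmetric mutual-distillation objective.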
Share and Cite
MDPI and ACS Style
Kang, S.; Huo, H.; Xu, J.; Mei, A.; Zhang, C.
Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors 2025, 25, 6849.
https://doi.org/10.3390/s25226849
AMA Style
Kang S, Huo H, Xu J, Mei A, Zhang C.
Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors. 2025; 25(22):6849.
https://doi.org/10.3390/s25226849
Chicago/Turabian Style
Kang, Shilu, Hua Huo, Jiaxin Xu, Aokun Mei, and Chen Zhang.
2025. "Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition" Sensors 25, no. 22: 6849.
https://doi.org/10.3390/s25226849
APA Style
Kang, S., Huo, H., Xu, J., Mei, A., & Zhang, C.
(2025). Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors, 25(22), 6849.
https://doi.org/10.3390/s25226849
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.