Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition

Kang, Shilu; Huo, Hua; Xu, Jiaxin; Mei, Aokun; Zhang, Chen

doi:10.3390/s25226849

This is an early access version, the complete PDF, HTML, and XML versions will be available soon.

Open AccessArticle

Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition

by

Shilu Kang

,

Hua Huo

^*

,

Jiaxin Xu

,

Aokun Mei

and

Chen Zhang

College of Information Engineering and Artificial Intelligence, Henan University of Science and Technology, Luoyang 471000, China

^*

Author to whom correspondence should be addressed.

Sensors 2025, 25(22), 6849; https://doi.org/10.3390/s25226849 (registering DOI)

Submission received: 8 October 2025 / Revised: 5 November 2025 / Accepted: 7 November 2025 / Published: 9 November 2025

(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)

Download Versions Notes

Abstract

Different action recognition tasks exhibit significant variations in their reliance on local versus global features. Particularly for long-video understanding, dynamically balancing the contributions of both has become a critical challenge for improving recognition accuracy. This paper proposes a Multi-Layer Bidirectional Distillation Model (MBD) based on the two-stream architecture. It employs 3D CNN and video Transformer to capture local and global spatio-temporal features of videos, respectively, aiming to explore the complementary mechanisms between these two feature types and facilitate their synergistic enhancement across diverse recognition task scenarios. The model quantifies feature contributions across specific recognition tasks to map feature dominance, categorizing videos into distinct feature-dominant groups. This mechanism provides a clear direction for knowledge transfer, overcoming the limitations of traditional unidirectional knowledge distillation. Bidirectional knowledge distillation is then performed at the intermediate and final layers, training the model to learn complementary relationships between features and addressing the issue of insufficient representational capacity of non-dominant features. During inference, an adaptive fusion strategy based on feature dominance is adopted, achieving feature fusion via dynamic weighted summation. This mechanism effectively suppresses noise interference from non-dominant features while maximizing the discriminative advantages of dominant features. The MBD model undergoes systematic comparative experiments across four classic action recognition benchmarks (UCF101, HMDB51, Kinectics-400, Something-Something V2). The results demonstrate that the MBD model not only excels in short-video recognition but also outperforms in analyzing complex actions under long-video scenarios.

Keywords: human action recognition; 3D CNN; video Transformer; complementary mechanisms; knowledge distillation; feature fusion

Share and Cite

MDPI and ACS Style

Kang, S.; Huo, H.; Xu, J.; Mei, A.; Zhang, C. Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors 2025, 25, 6849. https://doi.org/10.3390/s25226849

AMA Style

Kang S, Huo H, Xu J, Mei A, Zhang C. Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors. 2025; 25(22):6849. https://doi.org/10.3390/s25226849

Chicago/Turabian Style

Kang, Shilu, Hua Huo, Jiaxin Xu, Aokun Mei, and Chen Zhang. 2025. "Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition" Sensors 25, no. 22: 6849. https://doi.org/10.3390/s25226849

APA Style

Kang, S., Huo, H., Xu, J., Mei, A., & Zhang, C. (2025). Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition. Sensors, 25(22), 6849. https://doi.org/10.3390/s25226849

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.

Article Menu

Using Multi-Layer Bidirectional Distillation to Enhance Local and Global Features for Action Recognition

Abstract

Share and Cite

Article Metrics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI