Article

Multi-Scale Parallel Enhancement Module with Cross-Hierarchy Interaction for Video Emotion Recognition

1 College of Electrical Engineering and Automation, Xiamen University of Technology, Xiamen 361024, China
2 Xiamen Key Laboratory of Frontier Electric Power Equipment and Intelligent Control, Xiamen 361024, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1886; https://doi.org/10.3390/electronics14091886
Submission received: 5 April 2025 / Revised: 28 April 2025 / Accepted: 4 May 2025 / Published: 6 May 2025

Abstract

Video emotion recognition faces significant challenges due to the strong spatiotemporal coupling of dynamic expressions and the substantial variations in cross-scale motion patterns (e.g., subtle facial micro-expressions versus large-scale body gestures). Traditional methods, constrained by limited receptive fields, often fail to effectively balance multi-scale correlations between local cues (e.g., transient facial muscle movements) and global semantic patterns (e.g., full-body gestures). To address this, we propose an enhanced attention module integrating multi-dilated convolution and dynamic feature weighting, aimed at improving spatiotemporal emotion feature extraction. Building upon conventional attention mechanisms, the module introduces a multi-branch parallel architecture. Convolutional kernels with varying dilation rates (1, 3, 5) are designed to hierarchically capture the cross-scale spatiotemporal features of low-scale facial micro-motion units (e.g., brief lip tightening), mid-scale composite expression patterns (e.g., furrowed brows combined with cheek raising), and high-scale limb motion trajectories (e.g., sustained arm-crossing). A dynamic feature adapter is further incorporated to enable context-aware adaptive fusion of multi-source heterogeneous features. We conducted extensive ablation studies and experiments on popular benchmark datasets, including VideoEmotion-8 and Ekman-6. Experiments demonstrate that the proposed method enhances joint modeling of low-scale cues (e.g., fragmented facial muscle dynamics) and high-scale semantic patterns (e.g., emotion-coherent body language), achieving stronger cross-database generalization.
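The multi-branch design described in the abstract — parallel convolutional branches with dilation rates 1, 3, and 5, fused by a data-dependent weighting step — can be sketched in plain NumPy as follows. This is an illustrative sketch only, not the authors' implementation: the function names (`dilated_conv1d`, `multi_scale_fuse`) and the energy-based softmax gate standing in for the paper's dynamic feature adapter are assumptions for demonstration.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D convolution over a feature sequence.
    x: (T, C) features, kernel: (K,) weights shared across channels."""
    K = len(kernel)
    pad = dilation * (K // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for k in range(K):
            # dilation spaces the kernel taps, enlarging the receptive field
            out[t] += kernel[k] * xp[t + k * dilation]
    return out

def multi_scale_fuse(x, kernels, dilations=(1, 3, 5)):
    """Run parallel dilated branches, then fuse them with softmax weights
    derived from each branch's mean activation energy (a simple stand-in
    for a learned dynamic feature adapter)."""
    branches = [dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)]
    energy = np.array([np.mean(np.abs(b)) for b in branches])
    w = np.exp(energy - energy.max())   # numerically stable softmax
    w /= w.sum()
    fused = sum(wi * b for wi, b in zip(w, branches))
    return fused, w

# Usage: 16 frames of 4-dim features through three 3-tap branches.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
kernels = [np.array([0.25, 0.5, 0.25])] * 3
fused, weights = multi_scale_fuse(x, kernels)
```

With a 3-tap kernel, the three branches cover effective receptive fields of 3, 7, and 11 frames respectively, which is the mechanism the paper relies on to separate micro-motion, composite-expression, and limb-trajectory scales.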
Keywords: video emotion recognition; cross-scale modeling; dynamic feature weighting

Share and Cite

MDPI and ACS Style

Zhang, L.; Sun, Y.; Guan, J.; Kang, S.; Huang, J.; Zhong, X. Multi-Scale Parallel Enhancement Module with Cross-Hierarchy Interaction for Video Emotion Recognition. Electronics 2025, 14, 1886. https://doi.org/10.3390/electronics14091886

