Article

Cross-Corpus Speech Emotion Recognition Based on Attention-Driven Feature Refinement and Spatial Reconstruction

1 Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
2 Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, China
3 School of Mechanical and Electrical Engineering, Zhengzhou Business University, Zhengzhou 451200, China
4 School of Information Science and Engineering, Southeast University, Nanjing 210096, China
5 Yunnan Chinese Language and Culture College, Yunnan Normal University, Kunming 650504, China
* Author to whom correspondence should be addressed.
Information 2025, 16(11), 945; https://doi.org/10.3390/info16110945
Submission received: 8 September 2025 / Revised: 23 October 2025 / Accepted: 26 October 2025 / Published: 30 October 2025

Abstract

In cross-corpus scenarios, inappropriate feature-processing methods tend to cause the loss of key emotional information. In addition, deep neural networks contain substantial redundancy, which aggravates domain shift and impairs the generalization ability of emotion recognition systems. To address these challenges, this study proposes a cross-corpus speech emotion recognition model based on attention-driven feature refinement and spatial reconstruction. The proposed approach consists of three key components: first, an autoencoder integrated with a multi-head attention mechanism, which strengthens the model's focus on the emotional components of acoustic features during feature compression; second, a feature refinement and spatial reconstruction module that further improves the extraction of emotional features, with a gating mechanism employed to optimize the feature reconstruction process; and finally, the Charbonnier loss function, adopted as the training loss to minimize the discrepancy between source-domain and target-domain features and thereby enhance the model's cross-domain robustness. Experimental results demonstrate that the proposed method achieves an average recognition accuracy of 46.75% across six cross-corpus experiments, an improvement of 4.17% to 14.33% over traditional domain adaptation methods.
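Of the three components the abstract names, the Charbonnier loss is standard enough to illustrate concretely. The sketch below is not the authors' implementation; it is a minimal NumPy version of the generic Charbonnier (smooth-L1) penalty, here applied to hypothetical pooled source- and target-domain feature vectors, with the function name and `eps` value chosen for illustration.

```python
import numpy as np

def charbonnier_loss(x, y, eps=1e-3):
    """Charbonnier loss: mean of sqrt(diff^2 + eps^2).

    A differentiable approximation of the L1 distance; the small
    constant eps smooths the kink at zero.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))

# Hypothetical pooled feature statistics from the two domains.
src_feat = np.array([0.2, 0.5, -0.1])
tgt_feat = np.array([0.1, 0.4, 0.0])
loss = charbonnier_loss(src_feat, tgt_feat)  # each |diff| is 0.1, so loss ≈ 0.1
```

In a domain-adaptation setting such as the one described above, a term of this form would be added to the classification objective so that minimizing it pulls the source and target feature distributions together.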
Keywords: cross-corpus speech emotion recognition; attention-driven feature refinement; spatial reconstruction unit; domain adaptation optimization

Share and Cite

MDPI and ACS Style

Tao, H.; Jiang, Y.; Li, Q.; Zhao, L.; Yang, Z. Cross-Corpus Speech Emotion Recognition Based on Attention-Driven Feature Refinement and Spatial Reconstruction. Information 2025, 16, 945. https://doi.org/10.3390/info16110945


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
