This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms

1 Sinwt Technology Company Limited, Beijing 100176, China
2 Beijing Institute of Computer Technology and Application, Beijing 100854, China
3 Information Science and Technology College, Dalian Maritime University, Dalian 116026, China
4 Beijing United Wisdom Robotics Co., Ltd., Beijing 100176, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(23), 4720; https://doi.org/10.3390/electronics14234720 (registering DOI)
Submission received: 14 November 2025 / Revised: 27 November 2025 / Accepted: 27 November 2025 / Published: 29 November 2025

Abstract

Automatic piano transcription (APT) is a challenging problem in music information retrieval. In recent years, most APT approaches have been based on neural networks and have improved transcription performance. However, most previous works use a short-time Fourier transform (STFT) spectrogram as input, and because the harmonics of concurrent notes overlap, this spectrogram is noisy. To address this issue, a novel APT network that takes two spectrograms as input is proposed. First, the Mel cyclic and Mel STFT spectrograms of the piano signal are computed to represent the mixed audio. Next, separate modules for onset, offset, and frame-level note detection are constructed, each with its own objective. To capture the temporal dynamics of notes, an axial attention mechanism is incorporated into the frame-level note detection modules. Finally, a multi-feature fusion module aggregates the different features and generates the piano note sequences. In this design, the two spectrograms provide complementary information, the axial attention mechanism strengthens the modeling of temporal dependencies between notes, and the multi-feature fusion module combines frame-level note, note onset, and note offset features to infer the final piano notes. Experimental results show that the proposed approach achieves higher accuracy and lower error rates in automatic piano transcription than the reference approaches.
Keywords: automatic piano transcription; mel cyclic spectrogram; axial attention mechanism; multi-feature fusion; dual-stream structure
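The abstract describes the pipeline only at a high level. As an illustration, and not as the authors' implementation, the NumPy sketch below shows three of the ingredients under stated assumptions: a log-power Mel STFT spectrogram (HTK mel formula, Hann window; `n_fft=2048`, `hop=512`, and 229 Mel bands are assumed values), a parameter-free axial self-attention that restricts attention to one axis of a (time, frequency, channel) feature map, and a hypothetical fusion step that concatenates frame, onset, and offset features per piano key and thresholds a sigmoid. The Mel cyclic spectrogram is not reproduced because the abstract does not define it, and every function name here (`log_mel_stft`, `axial_attention`, `fuse_notes`, etc.) is hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel formula (assumed; the paper's exact mel variant is not stated)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # frequency of each rfft bin
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_hz.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rise = (bin_hz - lo) / (mid - lo)
        fall = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rise, fall), 0.0, None)
    return fb


def log_mel_stft(x, sr, n_fft=2048, hop=512, n_mels=229):
    """Log-power Mel STFT spectrogram of a mono signal, shape (frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # no padding: only full frames are used
    frames = np.stack([x[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-8)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def axial_attention(feat, axis):
    """Parameter-free self-attention restricted to one axis of a (T, F, C) feature map.

    axis=0: each frequency row attends across time; axis=1: each frame attends
    across frequency. Learned query/key/value projections are omitted.
    """
    x = np.moveaxis(feat, axis, -2)  # (..., L, C), L = length of the attended axis
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # (..., L, L)
    out = softmax(scores, axis=-1) @ x
    return np.moveaxis(out, -2, axis)  # restore the (T, F, C) layout


def fuse_notes(frame_feat, onset_feat, offset_feat, w, b=0.0, threshold=0.5):
    """Hypothetical fusion: concatenate per-key features, project, sigmoid, threshold.

    Inputs have shape (T, 88, C) and w has shape (3 * C,); returns a boolean (T, 88) piano roll.
    """
    z = np.concatenate([frame_feat, onset_feat, offset_feat], axis=-1)  # (T, 88, 3C)
    prob = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    return prob > threshold


# Usage with synthetic inputs (random features stand in for trained network outputs)
rng = np.random.default_rng(0)
sr = 16000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s of A4
spec = log_mel_stft(x, sr)  # shape (28, 229)
feat = rng.standard_normal((spec.shape[0], 88, 16))
ctx = axial_attention(axial_attention(feat, axis=0), axis=1)
roll = fuse_notes(ctx, feat, feat, w=0.1 * rng.standard_normal(48))
```

Attending along time and then along frequency gives each position a receptive field covering its whole row and column while computing on the order of TF(T + F) attention scores instead of (TF)² for full two-dimensional attention, which is the usual motivation for axial attention. The abstract states only that axial attention is used to capture temporal dynamics, so whether the frequency pass is also applied is an assumption of this sketch.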

How to Cite

MDPI and ACS Style

Dai, J.; Zheng, Q.; Wang, Y.; Shan, Q.; Wan, J.; Zhang, W. Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms. Electronics 2025, 14, 4720. https://doi.org/10.3390/electronics14234720

AMA Style

Dai J, Zheng Q, Wang Y, Shan Q, Wan J, Zhang W. Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms. Electronics. 2025; 14(23):4720. https://doi.org/10.3390/electronics14234720

Chicago/Turabian Style

Dai, Jinliang, Qiuyue Zheng, Yang Wang, Qihuan Shan, Jie Wan, and Weiwei Zhang. 2025. "Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms." Electronics 14, no. 23: 4720. https://doi.org/10.3390/electronics14234720

APA Style

Dai, J., Zheng, Q., Wang, Y., Shan, Q., Wan, J., & Zhang, W. (2025). Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms. Electronics, 14(23), 4720. https://doi.org/10.3390/electronics14234720

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
