This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
RNN-Based F0 Estimation Method with Attention Mechanism
by Ales Jandera, Martin Muzelak and Tomas Skovranek *
Faculty of BERG, Technical University of Kosice, Nemcovej 3, 04200 Kosice, Slovakia
* Author to whom correspondence should be addressed.
Information 2025, 16(12), 1089; https://doi.org/10.3390/info16121089 (registering DOI)
Submission received: 12 November 2025 / Revised: 28 November 2025 / Accepted: 5 December 2025 / Published: 7 December 2025
Abstract
Fundamental frequency (F0) estimation is a crucial task in speech processing and analysis, with applications in speech recognition, speaker identification, and emotion detection. Traditional algorithms, while effective, often encounter challenges in real-time environments due to computational limitations. Recent advances in deep learning, especially recurrent neural networks (RNNs), have opened new opportunities for improving the accuracy and efficiency of F0 estimation. This paper introduces a novel RNN-based F0 estimation method with an attention mechanism and evaluates its performance against selected state-of-the-art F0 estimation approaches, including standard baseline methods as well as neural-network-based regression and classification models. By integrating an attention mechanism, the model eliminates the need for post-processing steps and enables a more efficient sequence-to-scalar (seq2scal) estimation process. Whereas the self-attention mechanism used in Transformers captures all pairwise temporal dependencies at a quadratic computational cost, the attention mechanism in the proposed method selectively focuses on the most relevant acoustic cues for F0 prediction, enhancing robustness without increasing the model’s complexity. Experimental results on the LibriSpeech and Common Voice datasets demonstrate superior computational efficiency of the proposed method compared to current state-of-the-art RNN-based seq2seq models, while maintaining comparable estimation accuracy. Furthermore, the proposed method achieves the lowest computational complexity among all compared models while maintaining high accuracy, making it suitable for low-latency, resource-limited deployments and competitive even with standard baseline methods such as pYIN or CREPE. Finally, the method’s performance in terms of RMSE and FLOPs demonstrates the potential of attention mechanisms and sequence modelling to combine high accuracy with lightweight F0 estimation suitable for modern speech processing applications, in line with the growing trend towards deploying intelligent systems on resource-constrained devices.
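To make the sequence-to-scalar idea in the abstract more concrete, the following is a minimal sketch of attention pooling over a sequence of recurrent hidden states, followed by a linear read-out to a single F0 value. It is not the authors' architecture: the hidden states are assumed to come from some RNN that is not shown, and all names (`attention_pool`, `predict_f0`, `score_weights`, `out_weights`, `out_bias`) are hypothetical and chosen only for illustration. The scoring step costs one dot product per frame, which is linear in the sequence length, in contrast to the pairwise comparisons of self-attention.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Vector],
                   score_weights: Vector) -> Tuple[List[float], List[float]]:
    """Collapse T hidden states into one context vector.

    Each frame gets one scalar score (a dot product with score_weights),
    so the cost grows linearly with the number of frames T.
    """
    scores = [dot(score_weights, h) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights


def predict_f0(hidden_states: Sequence[Vector], score_weights: Vector,
               out_weights: Vector, out_bias: float) -> float:
    """Sequence-to-scalar read-out: attention pooling, then a linear layer."""
    context, _ = attention_pool(hidden_states, score_weights)
    return dot(out_weights, context) + out_bias


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference F0 values."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


# Toy usage with made-up hidden states (assumption: 2 frames, 2 dimensions).
states = [[1.0, 0.0], [3.0, 2.0]]
f0 = predict_f0(states, score_weights=[0.0, 0.0],
                out_weights=[10.0, 5.0], out_bias=100.0)
```

With all-zero `score_weights`, every frame receives the same attention weight, so the context vector reduces to the mean of the hidden states; learned weights would instead emphasise the frames most informative for F0. Because the output is a single value per input window, no post-processing pass over a predicted sequence is required in this sketch.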