RNN-Based F0 Estimation Method with Attention Mechanism

Jandera, Ales; Muzelak, Martin; Skovranek, Tomas

doi:10.3390/info16121089

by Ales Jandera, Martin Muzelak and Tomas Skovranek *

Faculty of BERG, Technical University of Kosice, Nemcovej 3, 04200 Kosice, Slovakia

* Author to whom correspondence should be addressed.

Information 2025, 16(12), 1089; https://doi.org/10.3390/info16121089 (registering DOI)

Submission received: 12 November 2025 / Revised: 28 November 2025 / Accepted: 5 December 2025 / Published: 7 December 2025

(This article belongs to the Special Issue Signal Processing and Machine Learning, 2nd Edition)

Abstract

Fundamental frequency (F0) estimation is a crucial task in speech processing and analysis, with applications in speech recognition, speaker identification, and emotion detection. Traditional algorithms, while effective, often struggle in real-time environments because of their computational cost. Recent advances in deep learning, especially recurrent neural networks (RNNs), have opened new opportunities for improving the accuracy and efficiency of F0 estimation. This paper introduces a novel RNN-based F0 estimation method with an attention mechanism and evaluates it against selected state-of-the-art F0 estimation approaches, including standard baseline methods as well as neural-network-based regression and classification models. By integrating an attention mechanism, the model eliminates the need for post-processing steps and enables a more efficient seq2scal estimation process. Whereas the self-attention mechanism used in Transformers captures all pairwise temporal dependencies at a quadratic computational cost, the attention mechanism in the proposed method selectively focuses on the most relevant acoustic cues for F0 prediction, enhancing robustness without increasing the model’s complexity. Experimental results on the LibriSpeech and Common Voice datasets show that the proposed method is computationally more efficient than current state-of-the-art RNN-based seq2seq models while maintaining comparable estimation accuracy. Furthermore, the proposed method achieves the lowest computational complexity among all compared models while maintaining high accuracy, making it suitable for low-latency, resource-limited deployments and competitive even with standard baseline methods such as pYIN or CREPE. Finally, its performance in terms of RMSE and FLOPs demonstrates the potential of attention mechanisms and sequence modelling to deliver accurate yet lightweight F0 estimation for modern speech processing applications, in line with the growing trend towards deploying intelligent systems on resource-constrained devices.
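To make the idea of attention-based pooling over a sequence concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: the names `attention_pool` and `seq2scal_f0`, the dot-product scoring vector, the linear readout, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes that an RNN has already produced one hidden vector per frame, assigns one attention weight per frame (so the cost grows linearly with sequence length, unlike pairwise self-attention), and maps the weighted sum to a single F0 value.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden_states, score_vec):
    """Collapse T hidden vectors into one context vector.

    One score per frame is computed, so the work is linear in T.
    """
    weights = softmax([dot(h, score_vec) for h in hidden_states])
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


def seq2scal_f0(hidden_states, score_vec, out_vec, out_bias):
    """Map a whole sequence of hidden states to one scalar (assumed readout)."""
    context, weights = attention_pool(hidden_states, score_vec)
    return dot(context, out_vec) + out_bias, weights


# Toy stand-ins for RNN outputs over three frames (hypothetical values).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.4, 0.1]]
score_vec = [2.0, 0.0]    # hypothetical learned scoring vector
out_vec = [100.0, 50.0]   # hypothetical learned readout weights
out_bias = 80.0           # hypothetical readout bias, in Hz

f0_hz, weights = seq2scal_f0(hidden_states, score_vec, out_vec, out_bias)
print(f"attention weights: {weights}")
print(f"estimated F0 (toy): {f0_hz:.1f} Hz")
```

Because the output is a single scalar produced directly from the weighted context vector, no per-frame trajectory has to be smoothed or decoded afterwards in this sketch; whether and how the published model achieves this is described in the full paper, not here.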

Keywords: attention mechanism; F0 estimation; fundamental frequency; pitch-lag; recurrent neural network; speech processing
