One-Step Discrete Fourier Transform-Based Sinusoid Frequency Estimation under Full-Bandwidth Quasi-Harmonic Interference

João Miguel Silva; Marco António Oliveira; André Ferraz Saraiva; Aníbal J. S. Ferreira

doi:10.3390/acoustics5030049

¹ Department of Electrical and Computer Engineering, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal

² INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal

³ Forensic Science Laboratory, Judiciary Police, Rua Gomes Freire, 1169-007 Lisboa, Portugal

* Author to whom correspondence should be addressed.

Acoustics 2023, 5(3), 845-869; https://doi.org/10.3390/acoustics5030049

Abstract

The estimation of the frequency of sinusoids has been the object of intense research for more than 40 years. Its importance in classical fields such as telecommunications, instrumentation, and medicine has been extended to numerous specific signal processing applications involving, for example, speech, audio, and music processing. In many cases, these applications run in real-time and, thus, require accurate, fast, and low-complexity algorithms. Taking the normalized Cramér–Rao lower bound as a reference, this paper evaluates the relative performance of nine non-iterative discrete Fourier transform-based individual sinusoid frequency estimators when the target sinusoid is affected by full-bandwidth quasi-harmonic interference, in addition to stationary noise. Three levels of the quasi-harmonic interference severity are considered: no harmonic interference, mild harmonic interference, and strong harmonic interference. Moreover, the harmonic interference is amplitude-modulated and frequency-modulated reflecting real-world conditions, e.g., in singing and musical chords. Results are presented for when the Signal-to-Noise Ratio varies between −10 dB and 70 dB, and they reveal that the relative performance of different frequency estimators depends on the SNR and on the selectivity and leakage of the window that is used, but also changes drastically as a function of the severity of the quasi-harmonic interference. In particular, when this interference is strong, the performance curves of the majority of the tested frequency estimators collapse to a few trends around and above 0.4% of the DFT bin width.

Keywords: sinusoidal analysis; frequency estimation; harmonic interference

1. Introduction

Accurate estimation of the frequency of individual sinusoids is required in many classical signal processing problems in telecommunications, medicine, and instrumentation []. Typical application areas include fundamental frequency (or pitch) estimation in speech, singing, and polyphonic music signals [,], voice quality assessment [,], parametric sinusoidal modeling [], coding [,,], and power grid synchronization []. Quite often, these scenarios involve many co-existing sinusoids, possibly hundreds of sinusoids, with most of them being quasi-harmonically related in a non-deterministic way.

In this paper, we focus on accurate frequency estimation of individual sinusoids when they are contaminated by other interfering sinusoids, in addition to wideband noise. This is illustrated in Figure 1, which represents the spectrogram of two frequency-modulated (FM) sinusoids in close proximity. The granularity in the spectrogram representation is a consequence, and a limitation, of the underlying discrete Fourier transform (DFT) frequency resolution, which obscures the detailed frequency contours of the FM sinusoids. Plot (b) in Figure 1 illustrates an accurate analysis of those contours by taking as input the same spectral information that is used in the represented spectrogram. The MATLAB command file that reproduces this figure is available on GitHub (https://github.com/Anibal-Ferreira/demo_AccSinFreqEst (URL accessed on 30 August 2023)).

Figure 1. Illustrative spectrogram of two FM sinusoids in close proximity (a) and the corresponding accurate frequency estimates, shown as brown and blue stars in plot (b). The ground-truth frequency contours are represented by the green solid line in (b).

Given that processing delay and computational complexity are important issues in real-time interactive applications, we focus on single-step (i.e., non-iterative), low-complexity, DFT-based frequency analysis and estimation that can be easily implemented on low-power platforms which have a limited processing capability.

Frequency estimation of sinusoids may be performed using either time-domain techniques or frequency-domain techniques []. The former are mainly based on correlation or covariance functions, and include Multiple Signal Classification (MUSIC) [] and Estimation of Signal Parameters Via Rotational Invariance Techniques (ESPRIT) [], which are eigenspace-based decomposition methods for the estimation of the frequencies of a known number of complex sinusoids observed in noise []. Despite their accuracy potential, we will not consider them in this paper due to their significant computational complexity.

On the other hand, frequency-domain estimation techniques are mainly based on the phase derivative of the DFT spectrum [,], on cepstral analysis [] or on DFT spectrum peak analysis and coarse–fine frequency estimation [,,]. Phase-based frequency estimators (the reader is referred to [,] for an overview) include phase-based vocoder techniques [] and the reassignment estimator []. Both require information from at least two DFTs. It has been reported that the presence of several interfering sinusoids significantly disturbs phase-based estimators [] (page 392) and [].

In this paper, we focus on DFT-based coarse–fine frequency estimation, given that its computational complexity is commensurate with that of the fast Fourier transform (FFT), i.e., $O(N \log N)$ [,,,,,,,,,,,,,,].

Many published approaches to frequency estimation presume a single complex exponential (or cisoid), which may be regarded as a highly simplistic scenario: in this case, and contrary to what happens with a real-valued sinusoid, no mirrored spectral image of the cisoid exists on the negative frequency axis, which means that there are no leakage implications (in this paper, we designate the leakage due to the spectral mirror of a cisoid as self-leakage). Thus, a given frequency estimator may be portrayed as very accurate when operating with a cisoid under noise contamination, but its performance may drop significantly when operating with real-valued sinusoids, or even degrade critically under the influence of other co-existing sinusoids, namely quasi-harmonic sinusoids. In fact, as pointed out by Hainsworth and Macleod [] (page 2), ‘sinusoidal estimation errors consist of bias intrinsic to the estimation algorithm, variance from the noise, and bias due to multiple tones’. These are the real-world conditions that this paper addresses, as many signals in nature have a quasi-harmonic structure, notably singing and musical signals in general, and especially those containing musical chords (for example, piano sounds are known to exhibit a relevant degree of inharmonicity []).

In [], Aboutanios et al. present an iterative algorithm to estimate the frequency of a cisoid. This algorithm presumes the rectangular window and uses an interpolation formula that requires two additional DFT coefficients, which, in fact, correspond to Odd-DFT (ODFT) coefficients [,] (see also Section 3.3). This algorithm, known as the Aboutanios and Mulgrew (A&M) algorithm, exhibits a performance that is very close to the Cramér–Rao lower bound (CRLB). It has been extended in [] to the estimation of the frequency of a real sinusoid and, in [], to the estimation of the frequencies of multiple superimposed cisoids. In both cases, the A&M baseline algorithm is used after the leakage due to all cisoids on both the positive and negative frequency axes is modeled, synthesized, and subtracted from the signal. The purpose of this processing is to leave the cisoid whose frequency is being estimated free from leakage interference.

Other published works have also addressed compensation or mitigation techniques for the bias created by harmonics above the fundamental frequency of a harmonic structure of sinusoids, which implies a significant computational penalty. For example, Liguori et al. [] discuss compensation strategies for a single frequency estimator when the number of interfering harmonics is 2 or 4. Belega and Petri [] discuss compensation strategies for a specific class of windows (maximum side lobe decay, or MSD, windows), especially the Hanning window.

More recent work [,] takes inspiration from the A&M algorithm and combines iterative leakage compensation techniques in order to deliver accurate frequency estimation results. For example, Liu et al. [] combine several techniques in order to perform accurate frequency estimation of a real-valued sinusoid. Estimation is performed in two steps. In the first step, the MSD window is used in the DFT analysis for coarse frequency estimation. This estimation requires the computation of three additional DFT coefficients inside the main lobe of the frequency response of the MSD window. This process is repeated twice for improved estimation results. An interpolation formula adapted to the MSD window is then used to deliver the ‘coarse’ frequency estimation. Using this estimation, a cisoid having a negative frequency is synthesized and added to the signal, such that the resulting signal reduces to a cisoid having a positive frequency only (i.e., self-leakage is almost entirely cancelled out). Then, in a second step, this new complex-valued signal undergoes similar processing, but using the rectangular window: DFT analysis is followed by the computation of three additional DFT coefficients inside the main lobe of the frequency response of the rectangular window, and this process is again repeated twice. The final accurate frequency estimation is obtained by means of an interpolation formula that is adapted to the rectangular window.

Given that, in this paper, we focus on low-complexity frequency estimation algorithms, we regard the above approaches as impractical when the signal contains tens or hundreds of real-valued sinusoids, because that would require removing not only all of the mirrored spectral images on the negative frequency axis, but also all of the cisoids on the positive frequency axis, except those under analysis. Moreover, bias compensation can be framed as a deterministic process only when the harmonic relationship is strict and precise. When the relationship between the target sinusoid and the interfering sinusoids is quasi-harmonic, and the latter are further subject to amplitude modulation (AM) and frequency modulation (FM) effects, as we assume in this paper, then the interference process is more probabilistic than deterministic. Thus, in this paper, we do not aim to optimize further the frequency estimation performance of individual frequency estimators beyond the current state of the art, but focus instead on the intrinsic robustness and performance of a representative selection of simple and efficient frequency estimators (i.e., one-step DFT-based frequency estimators), as reported in the literature, when the target real-valued sinusoid is subject to non-deterministic and AM and FM-modulated quasi-harmonic interference, in addition to noise, both above and below the target sinusoid. To the best of our knowledge, this perspective has not yet been discussed in the literature.

In [], the direct impact of quasi-harmonic interference on the performance of several non-iterative DFT-based frequency estimators was studied. This study was, however, limited in that only two interfering sinusoids were quasi-harmonically related to the target sinusoid, AM/FM modulation effects were not considered, and a representative comparison between frequency estimators using different windows was not made. In this paper, we expand our previous research on accurate frequency estimation of real sinusoids [,,,], and we evaluate the relative performance of a selection of nine representative (and non-iterative) DFT-based frequency estimators under mild and strong full-bandwidth quasi-harmonic interference, both below and above the target sinusoid frequency. This selection of estimators is based on their reported efficiency and performance, and also includes a modified version of a recent sine window-based frequency estimator [,]. Moreover, we consider that the harmonic interference is subject to AM/FM modulation effects reflecting typical perturbations in real-world signals, such as singing [] and musical chords.

The remainder of the paper is structured as follows. In Section 2, we detail the specificities of our DFT-based frequency estimation problem, namely in terms of the signal assumptions, the analysis framework and its constraints, test settings, and the degrees of harmonic interference. We also address in this section the different window functions in our research and their features, and the performance criterion we use to assess results. In Section 3, we describe briefly all nine DFT-based frequency estimators in our research, including a recently improved version of a frequency estimator that is based on the sine window. In Section 4, we present and discuss the main results in this paper when the frequency estimation is not affected by quasi-harmonic interference, when it is full-bandwidth but mild, and when it is full-bandwidth and strong. Finally, Section 5 summarizes the main results and contributions of this paper and projects future research.

2. DFT-Based Frequency Estimation

2.1. The Estimation Problem

We consider that $x[n]$ represents a discrete-time signal containing a target sinusoid whose frequency is $\omega_{\ell}$, and that is affected by additive white Gaussian noise, $r[n]$, and, possibly also, other co-existing sinusoidal components that are quasi-harmonically related. These are generally represented by $s[n]$.

The frequency of the target sinusoid is given by

$$\omega_{\ell} = \frac{2\pi}{N} (\ell + \Delta_{\ell}), \tag{1}$$

where $\ell$ and $\Delta_{\ell}$ represent, respectively, the integer part (or the DFT bin index, $0 < \ell < N/2$) and the fractional part on the DFT bin scale ($0.0 \leq \Delta_{\ell} < 1.0$ or, depending on the interpolation rule, $-0.5 \leq \Delta_{\ell} < 0.5$), and $N$ is the size of the DFT. In the case of a complex sinusoid (or cisoid),

$$x[n] = A e^{j (n \omega_{\ell} + \phi)} + r[n] + s[n] \tag{2}$$

and, in the case of a real sinusoid,

$$x[n] = A \sin(n \omega_{\ell} + \phi) + r[n] + s[n]. \tag{3}$$

In both equations, $A$ represents the magnitude of the sinusoid, and $\phi$ represents the starting phase of the sinusoid. The estimation problem involves taking an $N$-sample segment of the input signal according to (2), or (3), and finding the values of $\ell$ and $\Delta_{\ell}$ after the signal has been multiplied by an analysis window, $h[n]$, and transformed to the frequency domain using an $N$-point DFT

$$V[k] = \sum_{n = 0}^{N - 1} h[n] \, x[n] \, e^{-j \frac{2\pi}{N} k n}. \tag{4}$$

Since this analysis is discrete in both time and frequency domains, $V[k]$, $x[n]$, and $h[n]$ are all $N$-periodic. The natural frequency resolution of the $N$-point DFT (or bin width) is $2\pi / N$.

It is known that when the rectangular window is used, and the size of the DFT ($N$) and the Signal-to-Noise Ratio (SNR) are sufficiently high, then the maximum likelihood estimate of the frequency of a sinusoid corresponds to the frequency value that maximizes the magnitude spectrum. The error variance of this estimate approaches the Cramér–Rao lower bound (CRLB) that characterizes the minimum error variance of a general unbiased estimator [,]. Thus, when several sinusoids co-exist, and provided that $N$ is sufficiently high, i.e., provided that the different sinusoids are resolved by the DFT such that the magnitude spectrum consists of a multimodal function, frequency estimation can be reliably performed while avoiding computationally intensive iterative [] or analysis-by-synthesis [] procedures. In this context, a crude frequency estimator would just identify the local maximum in the magnitude spectrum that exists at $k = \ell$, i.e., $\ell = \underset{k}{\operatorname{argmax}} \, |V[k]|$, and estimate the frequency as $\hat{\omega}_{\ell} = \ell \, 2\pi / N$. This is usually referred to in the literature as coarse frequency estimation. In this worst-case scenario, due to the range of $\Delta_{\ell}$, the maximum absolute estimation error is $50\%$ of the normalized bin width, i.e., $|\hat{\omega}_{\ell} - \omega_{\ell}| / (2\pi / N) \leq 0.5$. Given that this worst-case estimation error is a good anchor for a ‘non-accurate’ frequency interpolator (in this paper, ‘frequency estimation’ and ‘frequency interpolation’ are used interchangeably), all results in this paper are normalized by the natural frequency resolution of the DFT ($2\pi / N$) in order to give a sense, in relative terms, of how accurate a given estimator is.
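
As a rough illustration of the coarse search, the sketch below evaluates the windowed DFT of Eq. (4) by direct summation and picks the largest magnitude over $0 < k < N/2$. The test signal (a noise-free real sinusoid at bin 20.3, with $N = 128$ to keep the direct sum cheap) and the function names `windowed_dft` and `coarse_estimate` are assumptions made only for this example.

```python
import cmath
import math

def windowed_dft(x, h):
    """N-point DFT of the windowed segment h[n] * x[n], as in Eq. (4)."""
    N = len(x)
    return [
        sum(h[n] * x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def coarse_estimate(V):
    """Peak-pick the magnitude spectrum over 0 < k < N/2 and return the bin index."""
    N = len(V)
    return max(range(1, N // 2), key=lambda k: abs(V[k]))

# Hypothetical noise-free test signal: real sinusoid at bin 20 + 0.3, rectangular window.
N = 128
ell_true, delta_true, phase = 20, 0.3, 0.7
omega = 2 * math.pi * (ell_true + delta_true) / N
x = [math.sin(n * omega + phase) for n in range(N)]
h_rect = [1.0] * N

V = windowed_dft(x, h_rect)
ell_hat = coarse_estimate(V)
# Estimation error expressed as a fraction of the bin width 2*pi/N.
coarse_error_bins = abs(ell_hat - (ell_true + delta_true))
print(ell_hat, coarse_error_bins)
```

The error reported here is simply the fractional part that the coarse search cannot see, which is what the fine search is meant to recover.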

The main purpose of the estimation problem is, therefore, to find a simple and accurate algorithm, or formula, that estimates the value of $\Delta_{\ell}$ through interpolation of the values of $V[k]$ around $k = \ell$, when the sinusoid is affected not only by Gaussian noise according to a given SNR, but also by different degrees of quasi-harmonic interference (no interference, mild, or strong interference). This two-step approach has been identified by Rife and Boorstyn as a “coarse search” and a “fine search” []. While the first step is quite straightforward (it involves only peak-picking), provided that the signal is not overwhelmed by noise, the second is not and represents, in fact, the challenge that is discussed in most DFT-based frequency estimation papers (Section 1), and also in this paper. In particular, when the sinusoid is contaminated by other sinusoids and/or noise, the maximum absolute estimation error should be far less than $50\%$ of the normalized bin width, and the variance of the estimation error should approach as much as possible the CRLB [,]. As explained in Section 2.3, in addition to evaluating the relative performance between different estimators under the same test conditions, we are especially interested in assessing how the relative performance changes when there is no harmonic interference, when the quasi-harmonic interference is mild, or when it is strong. This represents a stress test that unveils how consistent the accuracy and robustness of a given frequency estimator remains under real-world conditions.

In this context, and as we will address in Section 2.5, in our research we focus on the challenge that is represented by (3) because it reflects more realistic real-world conditions.

2.2. Estimation Constraints

One good example of our target applications is real-time visual feedback of the singing voice and, notably, of the associated melodic line [,]. This scenario implies that the total processing delay, from the instant the signal is captured using a microphone, till the instant its information is represented on a computer screen, is commensurate with the human perception of ‘instantaneous’. Therefore, the acceptable total processing delay should not exceed a few tens of milliseconds. This is quite in line with the acceptable delay between sound and image before the ‘lip sync’ problem is perceived by a human, or the maximum acceptable delay between the direct sound and a reflected replica before the latter is perceived as a distinct echo []. In both cases, acceptable delays may range between 10 ms and about 50 ms. On the other hand, the syllabic duration in speech is at least 10–20 ms [], and while in singing it tends to be longer, ornamental elements in singing like vibrato mean that important pitch variations should be accurately captured and represented on a screen with a reasonable refresh rate, or time resolution; for example, in the order of 20 ms.

Both aspects, namely real-time operation and the time resolution (or refresh rate) of the signal analysis, require that the total processing delay should not exceed 50 ms, and that all processing stages should be as parsimonious as possible regarding computational complexity. Therefore, taking advantage of the local stationarity of speech, or singing, a convenient solution to the frequency estimation problem involves using a single DFT and an accurate and computationally efficient frequency estimation procedure. We assume that all individual sinusoidal components that may exist in the signal are eligible for accurate frequency estimation. For example, this is important in order to facilitate segregation of multiple co-existing harmonic structures (as in musical chords). In this perspective, frequency estimation must be carried out using an algorithm that:

Excludes iterative procedures and is computationally light;
Computes a DFT whose length is the same as that of the data vector, i.e., zero-padding techniques ([,,,]) are excluded (zero-padding can be looked at as an inefficient frequency interpolation technique);
Maximizes the estimation accuracy and robustness when other interfering sinusoids co-exist in the signal, in addition to noise.

According to these general guidelines, our approach to sinusoidal analysis and frequency estimation involves three fundamental steps:

Multiplication of a signal (or data) vector $x [n]$ by a window function that is represented by $h [n]$ (an operation also known as tapering);
DFT computation, typically by means of an FFT;
Peak picking in the DFT magnitude spectrum and frequency estimation of a target sinusoid by using a simple interpolation algorithm (or formula), taking several samples from the DFT magnitude spectrum.
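
The three steps above can be sketched as follows. The interpolation rule used here, parabolic interpolation of the log-magnitudes of the peak bin and its two neighbours, is a generic textbook choice picked only for illustration; it is not claimed to be one of the nine estimators evaluated in this paper. The function name `estimate_frequency_bins` and the test signal are likewise assumptions.

```python
import cmath
import math

def estimate_frequency_bins(x):
    """Window, transform, peak-pick, and interpolate; returns frequency in DFT bins."""
    N = len(x)
    # Step 1: tapering with the shifted Hanning window.
    h = [0.5 * (1.0 - math.cos(2 * math.pi * (n + 0.5) / N)) for n in range(N)]
    # Step 2: DFT magnitude by direct summation (an FFT would be used in practice).
    mag = [
        abs(sum(h[n] * x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(N // 2 + 1)
    ]
    # Step 3a: coarse search (peak picking) over 0 < k < N/2.
    ell = max(range(1, N // 2), key=lambda k: mag[k])
    # Step 3b: fine search by parabolic interpolation of three log-magnitudes.
    a, b, c = (math.log(mag[ell + m]) for m in (-1, 0, 1))
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)
    return ell + delta

# Hypothetical noise-free test: real sinusoid at 20.3 bins, N = 128.
N = 128
true_bins = 20.3
x = [math.sin(2 * math.pi * true_bins * n / N + 0.7) for n in range(N)]
f_hat = estimate_frequency_bins(x)
print(f_hat, abs(f_hat - true_bins))
```

In this sketch the interpolated offset is measured from the peak bin, so it follows the $-0.5 \leq \Delta_{\ell} < 0.5$ convention rather than the $0.0 \leq \Delta_{\ell} < 1.0$ one.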

2.3. Degrees of Harmonic Interference and Test Settings

In our research, we will consider three levels of severity for the quasi-harmonic interference that is represented in (3) by $s[n]$: no interference, mild interference, or strong interference. While the first case is obvious (i.e., there are no other sinusoids than that whose frequency is being estimated), the second and third severity levels can be explained with the help of Figure 2. Given a target sinusoid frequency to be estimated, according to (1), we consider that, under mild harmonic interference, the target frequency is approximately the second harmonic of an existing quasi-harmonic structure, and, under strong harmonic interference, the same target frequency is approximately the fourth harmonic of an existing quasi-harmonic structure of sinusoids. The fundamental frequency of the quasi-harmonic structure is $\omega_{0} = \frac{2\pi}{N} F_{0}$, where $F_{0}$ represents a real number on a DFT bin scale, with $0 < F_{0} < N/2$. In our simulations, we set $\ell = 20$ and, using a simple frequency spacing condition simulating quasi-harmonicity, $3 F_{0} - (\ell + 1.0) = \ell - F_{0}$, we obtain that for mild harmonic interference, $F_{0} = 10.25$ DFT bins. For strong harmonic interference, we consider that $F_{0}$ is half of this value, i.e., $F_{0} = 5.125$ DFT bins. As explained in Section 2.4, these two $F_{0}$ alternatives ensure that, for the different windows that we consider in our research, harmonic resolvability is guaranteed, although tightly in some cases.
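
The arithmetic behind the two $F_{0}$ values can be checked with a few lines. This is only a sketch of the spacing condition stated above; the helper `nint` and the variable names are introduced here for illustration.

```python
import math

def nint(value):
    """Nearest integer (halves rounded up), used only for this illustration."""
    return math.floor(value + 0.5)

ell = 20
# Spacing condition 3*F0 - (ell + 1) = ell - F0  =>  4*F0 = 2*ell + 1.
f0_mild = (2 * ell + 1) / 4      # 10.25 DFT bins
f0_strong = f0_mild / 2          # 5.125 DFT bins

for f0 in (f0_mild, f0_strong):
    harmonic = nint(ell / f0)                  # partial nearest to the target bin
    below = ell - (harmonic - 1) * f0          # gap to the partial just below bin ell
    above = (harmonic + 1) * f0 - (ell + 1)    # gap from bin ell + 1 to the partial just above
    print(f0, harmonic, below, above)
```

For both values the neighbouring partials sit at the same distance below bin $\ell$ and above bin $\ell + 1$, which is what the spacing condition expresses.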

Figure 2. Illustration of three severity levels of quasi-harmonic interference affecting the estimation of a target sinusoid frequency: no interference (top panel), and mild and strong quasi-harmonic interference, when the target sinusoid frequency corresponds approximately to the second (middle panel) and fourth (bottom panel) harmonic of an existing quasi-harmonic structure, respectively. The gray/black areas denote that the quasi-harmonic interference is AM/FM-modulated and the $\alpha$ index sets the maximum deviation in magnitude and fundamental frequency.

In order to give the quasi-harmonic interference a realistic profile, we consider that it is also affected by AM and FM effects according to a modulation index, $\alpha$, as represented in Figure 2. As a reference, we use characteristics that are typical in singing. In fact, singing signals are frequently characterized by a periodic variation of the fundamental frequency, an FM effect known as vibrato, as well as a periodic variation of the signal intensity, an AM effect known as tremolo. Perceptually, tremolo is not as important as vibrato [] and, to a certain extent, the former can be looked at as a consequence of the latter. In fact, tremolo depends on several factors, namely the relation between the frequencies of partials (i.e., the harmonics) and the frequencies of voice formants []. Therefore, it is a reasonable assumption that the rate of vibrato and tremolo is the same. Sundberg [] (page 164) notes that a typical and comfortable vibrato rate is 6.5 Hz, and that the typical extension of the vibrato is 1.5 semitones on the equally tempered scale. This corresponds to a relative variation of the fundamental frequency by about $(\sqrt[12]{2})^{1.5} - 1.0 \approx 9.1\%$. Therefore, we consider in our tests that the quasi-harmonic interference occupies the full Nyquist bandwidth and is subject to a combined AM and FM effect whose depth is $4.5\%$ around the mean and whose rate is 6.5 Hz.

In this context, the quasi-harmonic interference $s[n]$ in (3) is obtained as

$$s[n] = A \, (1 + a[n]) \sum_{\substack{k = 1 \\ k \neq \operatorname{nint}\{\ell / F_{0}\}}}^{\lfloor (N/2) / F_{0} \rfloor} \sin\left[ k F_{0} \left( \frac{2\pi}{N} n + f[n] \right) \right], \tag{5}$$

where $\lfloor \cdot \rfloor$ denotes the “floor” operator returning the largest integer not exceeding its argument, $\operatorname{nint}\{\cdot\}$ denotes the nearest integer, the frequency of the target sinusoid is $\omega_{\ell} = \frac{2\pi}{N} (\ell + \Delta_{\ell})$ with $\ell = 20.0$, and $\Delta_{\ell}$ varies in the range $[0.0, 1.0)$. We set $A = 1$, which means that all partials in the harmonic interference have a magnitude that is comparable to that of the target sinusoid. This denotes a condition that typically is more demanding than what happens with real-world signals. In (5) the quasi-harmonic interference is AM and FM modulated using

$$a[n] = \alpha \sin\left( \frac{2\pi}{N} \theta n + \phi_{AM} \right), \tag{6}$$

and

$$f[n] = \beta \sin\left( \frac{2\pi}{N} \theta n + \phi_{FM} \right). \tag{7}$$

In these equations, we set $N = 512$, and we assume that the sampling frequency is 22,050 Hz. Moreover, taking into consideration the discussion above regarding the rate of vibrato, we set $\theta = \frac{6.5 \times 512}{22{,}050} \approx 0.151$. Regarding the depth of the AM/FM modulation effects, we set $\alpha = 0.045$, which determines $\beta = \frac{\alpha}{\theta} \approx 0.298$.
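
A minimal sketch of how the modulation parameters and the interference of Eqs. (5) to (7) could be computed is given below. The function name `interference` and its argument list are assumptions made for this example; the numerical settings are the ones stated in the text.

```python
import math

N = 512
fs = 22050.0          # sampling frequency in Hz
vibrato_rate = 6.5    # Hz

# Vibrato extent of 1.5 semitones as a relative frequency variation (about 9.1 %).
extent = 2.0 ** (1.5 / 12.0) - 1.0
alpha = 0.045                      # modulation depth, roughly half of the extent
theta = vibrato_rate * N / fs      # modulation rate on the DFT bin scale
beta = alpha / theta

def interference(f0, ell, phi_am, phi_fm, A=1.0):
    """Quasi-harmonic interference s[n] of Eq. (5), skipping the partial nearest to bin ell."""
    skip = math.floor(ell / f0 + 0.5)      # nint{ell / F0}
    k_max = math.floor((N / 2) / f0)       # last partial below the Nyquist frequency
    s = []
    for n in range(N):
        a_n = alpha * math.sin(2 * math.pi * theta * n / N + phi_am)   # Eq. (6)
        f_n = beta * math.sin(2 * math.pi * theta * n / N + phi_fm)    # Eq. (7)
        total = sum(
            math.sin(k * f0 * (2 * math.pi * n / N + f_n))
            for k in range(1, k_max + 1) if k != skip
        )
        s.append(A * (1.0 + a_n) * total)
    return s

s_mild = interference(10.25, 20, 0.0, 0.0)
print(round(theta, 3), round(beta, 3), len(s_mild))
```

Differentiating the argument of the sine in Eq. (5) with respect to $n$ shows that the peak relative frequency deviation of each partial is $\beta \theta$, so choosing $\beta = \alpha / \theta$ gives the FM effect the same relative depth $\alpha$ as the AM effect.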

For each value of $\Delta_{\ell}$ in the range $[0.0, 1.0)$ and using a step of $0.01$, 100 Monte Carlo realizations of the (real) noise vector $r[n]$ are generated according to a desired SNR (in the range $[-10, 70]$ dB, in steps of $2.5$ dB), in order to collect stable statistics. In each realization of $r[n]$ and, therefore, of $x[n]$, the values of $\phi$, $\phi_{AM}$, and $\phi_{FM}$ are randomized in the range $[0, 2\pi)$, which represents an important part of the Monte Carlo simulations.
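
The simulation grid described above can be sketched as follows. The SNR convention (sinusoid power $A^{2}/2$ over noise variance), the function name `noisy_target`, and the fixed seed are assumptions for this example; the text does not spell them out. The interference term is omitted here for brevity, and its phases $\phi_{AM}$ and $\phi_{FM}$ would be drawn in the same way as $\phi$.

```python
import math
import random

N = 512
deltas = [i / 100 for i in range(100)]            # 0.00, 0.01, ..., 0.99
snrs_db = [-10.0 + 2.5 * i for i in range(33)]    # -10, -7.5, ..., 70 dB
realizations = 100

def noisy_target(delta, snr_db, rng, ell=20, A=1.0):
    """One realization of A*sin(n*omega + phi) + r[n] with a random starting phase."""
    omega = 2 * math.pi * (ell + delta) / N
    phi = rng.uniform(0.0, 2 * math.pi)
    # Assumption: SNR is the ratio of the sinusoid power A^2/2 to the noise variance.
    sigma = math.sqrt((A * A / 2.0) / 10.0 ** (snr_db / 10.0))
    return [A * math.sin(n * omega + phi) + rng.gauss(0.0, sigma) for n in range(N)]

rng = random.Random(0)   # fixed seed, only to make the sketch repeatable
x = noisy_target(deltas[30], snrs_db[-1], rng)
runs_per_condition = len(deltas) * len(snrs_db) * realizations
print(len(x), runs_per_condition)
```

With these settings, each estimator is exercised on 100 offsets, 33 SNR values, and 100 noise realizations per interference level.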

2.4. Windows, Selectivity and Leakage

In general, our target applications involve accurate frequency estimation in multitone detection and estimation. In this case, as noted by Harris [], ‘maximum dynamic range in multitone detection requires the Fourier transform of the window to exhibit a highly concentrated central lobe with very low sidelobe structure’. We illustrate this using four windows of decreasing selectivity and increasing main-side lobe attenuation: the rectangular window

$$h_{R}[n] = 1, \quad n = 0, 1, \dots, N - 1, \tag{8}$$

the sine window

$$h_{S}[n] = \sqrt{h_{H}[n]} = \sin\left[ \frac{\pi}{N} (n + 0.5) \right], \quad n = 0, \dots, N - 1, \tag{9}$$

the shifted Hanning window that is defined as

$$h_{H}[n] = \frac{1}{2} \left[ 1 - \cos\left( \frac{2\pi}{N} (n + 0.5) \right) \right], \quad n = 0, \dots, N - 1, \tag{10}$$

and the Gaussian window

$$h_{G}[n] = \exp\left[ -\frac{1}{2} \left( \gamma \, \frac{n - M}{M} \right)^{2} \right], \quad n = 0, \dots, N - 1, \tag{11}$$

where $M = (N - 1) / 2$. Figure 3 illustrates the magnitude of the frequency responses of these windows. Due to its importance regarding the optimized tradeoff between time and frequency localization in connection with the uncertainty principle [], the Gaussian window we are illustrating here presumes $\gamma = 3.0$.
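
The four windows of Eqs. (8) to (11) translate directly into code. The sketch below uses function names chosen for this example and also checks numerically that the sine window is the square root of the shifted Hanning window, as Eq. (9) states.

```python
import math

def rectangular(N):
    """Rectangular window, Eq. (8)."""
    return [1.0] * N

def sine(N):
    """Sine window, Eq. (9)."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def shifted_hanning(N):
    """Shifted Hanning window, Eq. (10)."""
    return [0.5 * (1.0 - math.cos(2 * math.pi * (n + 0.5) / N)) for n in range(N)]

def gaussian(N, gamma=3.0):
    """Gaussian window, Eq. (11), with M = (N - 1) / 2."""
    M = (N - 1) / 2
    return [math.exp(-0.5 * (gamma * (n - M) / M) ** 2) for n in range(N)]

N = 512
h_s, h_h, h_g = sine(N), shifted_hanning(N), gaussian(N)
# Largest gap between the squared sine window and the shifted Hanning window.
max_gap = max(abs(s * s - h) for s, h in zip(h_s, h_h))
print(max_gap)
```

The half-sample shift $(n + 0.5)$ makes the sine and Hanning windows exactly symmetric over the $N$ samples, with no sample equal to zero.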

Figure 3. Magnitude frequency response of the rectangular window (dotted line), of the sine window (dashed line), of the Hanning window (solid line), and of the Gaussian window (dash-dotted line). The abscissae axis can also be read as a DFT-bin scale.

Figure 3 shows that the main lobe width of the frequency response of the rectangular, sine, Hanning, and Gaussian window is $4\pi / N$, $6\pi / N$, $8\pi / N$, and $14\pi / N$, respectively. This can also be seen as the safe frequency separation between two sinusoids so that they are fairly resolved in the DFT discrete frequency domain, i.e., that allows the two sinusoids to appear as individual peaks in the DFT magnitude spectrum.
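
Taking the main lobe width as the safe separation, as suggested above, the partial spacings of Section 2.3 can be compared with each window. This is only a back-of-the-envelope sketch; the dictionary names are assumptions, and the comparison ignores the small spacing fluctuations caused by the FM effect.

```python
# Main-lobe widths in DFT bins, i.e. the widths quoted above divided by 2*pi/N.
main_lobe_bins = {"rectangular": 2, "sine": 3, "hanning": 4, "gaussian": 7}
# Spacing F0 between adjacent partials (in bins) for mild and strong interference.
spacing_bins = {"mild": 10.25, "strong": 5.125}

# Positive margin: adjacent partials are farther apart than one main-lobe width.
margins = {
    (level, window): f0 - width
    for level, f0 in spacing_bins.items()
    for window, width in main_lobe_bins.items()
}
for key, margin in sorted(margins.items()):
    print(key, margin, "resolved" if margin > 0 else "not safely resolved")
```

Under this criterion the Hanning window leaves the smallest positive margin for strong interference, while the Gaussian window with $\gamma = 3.0$ would not meet it.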

In this perspective, the rectangular window has the best selectivity and the Gaussian window has the poorest selectivity. On the other hand, the larger the main lobe, the better the attenuation between the main lobe and the side lobes; a feature referred to as ‘near-end leakage’ []. It is known that the minimum main-side lobe attenuation of the rectangular, sine, Hanning, and Gaussian windows, is about 13 dB, 23 dB, 32 dB, and 57 dB, respectively []. In this perspective, the rectangular window gives rise to the largest leakage and the Gaussian window gives rise to the smallest leakage. The lower the leakage, the smaller the mutual influence between two resolved sinusoids in the discrete Fourier transform (DFT) spectrum []. In fact, Hainsworth and Macleod note that ‘windowing increases variance but reduces bias’ []. These aspects are very important in this paper as they are likely to influence the performance of the estimation process when other interfering sinusoids and noise are present in the signal. The sine window is particularly important in audio analysis/synthesis and coding [,,], since it is frequently used in analysis/synthesis filter banks satisfying perfect reconstruction requirements [,] (e.g., the Modified Discrete Cosine Transform—MDCT [,,]). A different type of sine window is considered in [] in the estimation of the frequency of a damped co-sinusoid.

Figure 3 also helps us to understand that a single stationary cisoid gives rise to a local peak in the DFT magnitude spectrum that, at most, comprises two, three, four, or seven DFT bins falling within the main lobe of the frequency response of the rectangular, sine, Hanning, or Gaussian window, respectively. Thus, in order to avoid leakage as much as possible, it appears appropriate to interpolate the value of $\Delta_{\ell}$ using at most the two, three, four, or seven largest DFT spectral lines around a spectral peak when the rectangular, sine, Hanning, or Gaussian window is used, respectively (other methods require at least 6 or even 18 DFT bins []). In our simulations, we are not considering the Gaussian window, since all results with the quadratic interpolation rule in the frequency domain (quadratic, or parabolic, interpolation is strictly accurate only for the Gaussian window [] (page 47)) consistently revealed that it has a relatively poor performance. Thus, only the rectangular, sine, and (shifted) Hanning windows are considered in our simulations and results.

2.5. Performance Criterion

The CRLB for the variance of the frequency estimation error when a complex sinusoid (or cisoid) is considered, and an unbiased estimator is used, is given by [,,]

$$\operatorname{var}\{\omega_{\ell} - \hat{\omega}_{\ell}\} \geq \frac{6 \sigma^{2}}{A^{2} N (N^{2} - 1)}, \tag{12}$$

where $A$ represents the magnitude of the cisoid, $N$ represents the period of the DFT, and $\sigma^{2}$ represents the variance of the noise which is assumed to be zero-mean, white, complex, and Gaussian. However, using a cisoid as a test signal corresponds to the best test scenario since simple frequency estimators can be found that either provide exact estimates in the absence of noise or other interferences, or that perform quite close to the CRLB when noise affects the signal, as discussed in Section 1, and as noted at the end of Section 2.1. This scenario is assumed, for example, in [,,,].
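
To give a feel for the magnitudes involved, the sketch below evaluates the bound of Eq. (12) and expresses its square root as a percentage of the bin width $2\pi / N$. Two assumptions are made here and are not taken from the text: the SNR is defined as $A^{2} / \sigma^{2}$, and the normalization divides the standard deviation (rather than the variance) by the bin width. The function name is also hypothetical.

```python
import math

def crlb_percent_of_bin(snr_db, N=512):
    """Square root of the bound in Eq. (12), as a percentage of the bin width 2*pi/N."""
    # Assumption: SNR = A^2 / sigma^2 for a cisoid in complex white Gaussian noise.
    snr = 10.0 ** (snr_db / 10.0)
    variance = 6.0 / (snr * N * (N * N - 1))
    return 100.0 * math.sqrt(variance) / (2 * math.pi / N)

for snr_db in (-10.0, 30.0, 70.0):
    print(snr_db, crlb_percent_of_bin(snr_db))
```

Under these assumptions the normalized bound drops by a factor of 10 for every 20 dB of SNR, which appears as a straight line on a logarithmic error axis.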

If

x [n]

is instead a real sinusoid according to (3), the magnitude spectrum of

V [k]

exhibits two local maxima (i.e., spectral peaks) that are governed by Dirichlet kernels [,], for example of the form

sin (θ) / sin (θ / N)

in the case of the rectangular window, with one of them being on the positive frequency axis, and the other one being in the ‘mirror’ position on the negative frequency axis. Figure 3 illustrates the magnitude of the Dirichlet kernel (on a dB scale) for different windows. Each spectral peak generates leakage that influences the ‘mirror’ peak and that may be significant if ℓ is a ‘small’ number, for example, less than 5 DFT bins in the case of the rectangular window [,], or less than 9 DFT bins in the case of the Hanning window [].

Most frequency estimators presume that leakage due to the image of a spectral peak (i.e., the self-leakage) can be ignored, which means that they suffer from a structural bias that sets a limit to the performance of the estimator [] (page 391). Despite this, and as noted by Betser et al. [] (page 513), the CRLB for a single cisoid and unbiased estimators is still a useful reference for biased estimators. One possibility to convert the input real sinusoids and noise to their complex versions is to construct an analytic signal using the Hilbert transform [,]. However, because a practical Hilbert transform modifies the signal near the zero and Nyquist frequencies, in addition to increasing the overall algorithm complexity, it is not considered here.

Furthermore, many real-world signals of great interest, such as speech, or singing, exhibit an approximate harmonic structure and, therefore, each spectral peak, in addition to being influenced by its mirror spectral image, is also influenced by leakage due to neighboring (and interfering) sinusoids. Assessing the robustness of different frequency estimators to different levels of this combined influence, in addition to noise, is, thus, the purpose of this paper.

In this scenario, we create a performance criterion that is based on the CRLB and that assumes that all sinusoids and noise, are real-valued. Given that there is only one target real-valued sinusoid whose frequency is estimated, according to the model of (3), the CRLB becomes scaled by a factor of 2. In addition, and in order to simplify the evaluation of the relative performance of different frequency interpolators, as discussed in Section 2.1, we take the square root of

var \{ω_{ℓ} - {\hat{ω}}_{ℓ}\}

, and we normalize the result by the natural frequency resolution of the DFT (

2 π / N

):

RMSE = \sqrt{2} \frac{\sqrt{var \{ω_{ℓ} - {\hat{ω}}_{ℓ}\}}}{2 π / N} \geq \sqrt{\frac{3 σ^{2}}{A^{2} π^{2} N (1 - 1 / N^{2})}} .

(13)

Thus, this performance criterion is not only based on the well established CRLB, but it also offers a clear meaning: if, for some frequency estimator, the corresponding RMSE is closer to

0.5

(i.e., 50% of the DFT bin width), that means that its performance is poor and inaccurate. On the other hand, if the corresponding RMSE is closer to the normalized CRLB, that means that its performance is closer to the ideal performance. The RMSE criterion is used in Section 4, where the main results of this paper are presented and discussed.
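As a rough numerical illustration of the bound in (13), the following Python sketch evaluates its right-hand side directly and also rebuilds it from (12), to confirm that the two forms agree. The function names and the example values (A = 1, σ² = 0.01, N = 512) are assumptions chosen only for illustration. The bound is parametrized by A and σ² rather than by the SNR, so that no particular SNR definition is implied.

```python
import math

def normalized_crlb(amplitude, noise_variance, n):
    """Right-hand side of (13): lower bound expressed as a fraction of the DFT bin width."""
    return math.sqrt(
        3.0 * noise_variance
        / (amplitude ** 2 * math.pi ** 2 * n * (1.0 - 1.0 / n ** 2))
    )

def normalized_crlb_from_eq12(amplitude, noise_variance, n):
    """Same bound, rebuilt from (12): sqrt(2) * sqrt(CRLB) / (2 * pi / N)."""
    crlb = 6.0 * noise_variance / (amplitude ** 2 * n * (n ** 2 - 1))
    return math.sqrt(2.0) * math.sqrt(crlb) / (2.0 * math.pi / n)

bound = normalized_crlb(1.0, 0.01, 512)
bound_check = normalized_crlb_from_eq12(1.0, 0.01, 512)
print(f"normalized bound: {bound:.6f} of a DFT bin")
```

For these example values the bound is roughly 0.0024, i.e., about 0.24% of the DFT bin width, which is far below the 0.5 level that indicates a poor estimate.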

3. Tested Frequency Estimators

In this section, we describe the frequency estimators that are included in our research. We organize this description by groups of frequency estimators as a function of the windows they use. Their relative performance combining accuracy and sensitivity to noise and harmonic interference will be evaluated in Section 4 using the RMSE criterion. All frequency estimators use information from a few bins surrounding a spectral peak in the DFT magnitude spectrum, the DFT length is the same as that of the data vector, and all of them comply with the preferred structural characteristics that are identified in Section 2.2. As indicated in Section 2.1, since the integer part of the frequency is easily found by locating a local maximum in the magnitude spectrum, all frequency estimators are mainly concerned with estimating the fractional frequency

{\hat{Δ}}_{ℓ}

using the

V [k]

spectral information.

3.1. Rectangular Window-Based Estimators

Two non-iterative, DFT-based frequency estimators that presume the rectangular window have been selected in our research based on their reported simplicity and performance [,,]. These frequency estimators are identified as Macleod98(R) and Jacobsen07(R). Both estimators use three samples from the DFT spectrum [,], including magnitude and phase information. This means that, in addition to the two samples that fall within the main lobe of the frequency response of the rectangular window, one more sample that belongs to one of the side lobes is also used. This specific perspective has not been discussed by the authors.

If

| V [ℓ] |

corresponds to a local maximum in the DFT magnitude spectrum, and if

ℜ {\cdot}

denotes the real part of the complex argument, then the Macleod98(R) frequency estimator first computes

α = ℜ \{V [ℓ] V^{*} [ℓ]\}

,

α_{L} = ℜ \{V [ℓ - 1] V^{*} [ℓ]\}

, and

α_{R} = ℜ \{V [ℓ + 1] V^{*} [ℓ]\}

, which allows us to obtain

γ = \frac{α_{L} - α_{R}}{2 α + α_{L} + α_{R}},

and then, finally

{\hat{Δ}}_{ℓ} = \frac{\sqrt{1 + 8 γ^{2}} - 1}{4 γ}, - \frac{1}{2} \leq {\hat{Δ}}_{ℓ} < \frac{1}{2} .

(14)

In the case of the Jacobsen07(R) frequency estimator, frequency estimation is simply obtained by computing

\begin{matrix} {\hat{Δ}}_{ℓ} = \frac{tan \frac{π}{N}}{\frac{π}{N}} ℜ \{\frac{V [ℓ - 1] - V [ℓ + 1]}{2 V [ℓ] - V [ℓ + 1] - V [ℓ - 1]}\}, \\ - \frac{1}{2} \leq {\hat{Δ}}_{ℓ} < \frac{1}{2} . \end{matrix}

(15)

In this equation, a correction factor, which was proposed by Candan in order to reduce bias and improve performance [], has been included.
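The two rectangular window-based rules (14) and (15) can be sketched in Python as follows. This is a minimal illustration, not the code used in our simulations: it assumes a plain DFT with kernel exp(−j2πkn/N) over n = 0, …, N − 1, a noise-free complex test tone, and a peak bin ℓ that is already known. The function names are hypothetical.

```python
import cmath
import math

def dft_bin(x, k):
    """Single DFT coefficient V[k] with the exp(-j 2 pi k n / N) kernel (rectangular window)."""
    n_len = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * k * n / n_len) for n, xn in enumerate(x))

def macleod98_rect(v_left, v_peak, v_right):
    """Fractional frequency according to (14), from V[l-1], V[l], V[l+1]."""
    a = (v_peak * v_peak.conjugate()).real
    a_l = (v_left * v_peak.conjugate()).real
    a_r = (v_right * v_peak.conjugate()).real
    gamma = (a_l - a_r) / (2.0 * a + a_l + a_r)
    if abs(gamma) < 1e-12:  # (14) tends to 0 as gamma tends to 0
        return 0.0
    return (math.sqrt(1.0 + 8.0 * gamma ** 2) - 1.0) / (4.0 * gamma)

def jacobsen07_rect(v_left, v_peak, v_right, n_len):
    """Fractional frequency according to (15), including the tan(pi/N)/(pi/N) factor."""
    correction = math.tan(math.pi / n_len) / (math.pi / n_len)
    ratio = (v_left - v_right) / (2.0 * v_peak - v_right - v_left)
    return correction * ratio.real

# Noise-free complex test tone located at bin ELL + DELTA (illustrative values).
N, ELL, DELTA = 256, 64, 0.2
x = [cmath.exp(2j * math.pi * (ELL + DELTA) * n / N) for n in range(N)]
v_l, v_p, v_r = (dft_bin(x, ELL + m) for m in (-1, 0, 1))
est_macleod = macleod98_rect(v_l, v_p, v_r)
est_jacobsen = jacobsen07_rect(v_l, v_p, v_r, N)
print(est_macleod, est_jacobsen)
```

Under these idealized conditions both estimates should fall very close to the true fractional frequency. With a real sinusoid, self-leakage from the mirror peak adds the structural bias discussed in Section 2.5.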

3.2. Hanning Window-Based Estimators

Based on their reported simplicity and performance, we selected four non-iterative DFT-based frequency estimators that presume the Hanning window [,,,]. They are identified in this paper as Grandke83(H), Macleod98(H), Quinn06(H), and Jacobsen07(H).

The Grandke83(H) frequency estimator uses spectral magnitude information only and is implemented by computing

{\hat{Δ}}_{ℓ} = \frac{2 | V [ℓ + 1] | - | V [ℓ] |}{| V [ℓ + 1] | + | V [ℓ] |}, 0 \leq {\hat{Δ}}_{ℓ} < 1 .

(16)

The author admits that good performance requires that real sinusoids are sufficiently separated in frequency, e.g., by more than 20 DFT bins. Therefore, it is important to assess how the performance compares to that of other estimators when this constraint is relaxed under quasi-harmonic interference.

The Macleod98(H) frequency estimator uses the same

α

,

α_{L}

, and

α_{R}

coefficients that were defined in the previous section and delivers the fractional frequency as

{\hat{Δ}}_{ℓ} = 2 \frac{α_{L} - α_{R}}{2 α - α_{L} - α_{R}}, - \frac{1}{2} \leq {\hat{Δ}}_{ℓ} < \frac{1}{2} .

(17)

The author emphasizes that this estimator benefits from the ‘intrinsic leakage rejection’ of the Hanning window [].

The Quinn06(H) frequency estimator uses the three DFT samples surrounding the local maximum in the magnitude spectrum (

| V [ℓ] |

) []. Defining

α_{L} = \frac{2 ρ + 1}{1 - ρ}

where

ρ = ℜ \{\frac{V [ℓ - 1]}{V [ℓ]}\}

, and

α_{R} = \frac{2 ϱ + 1}{ϱ - 1}

where

ϱ = ℜ \{\frac{V [ℓ + 1]}{V [ℓ]}\}

, frequency is estimated as

{\hat{Δ}}_{ℓ} = \frac{α_{L} + α_{R}}{2} + ξ (α_{L}^{2}) - ξ (α_{R}^{2}), - \frac{1}{2} \leq {\hat{Δ}}_{ℓ} < \frac{1}{2},

(18)

where

ξ (x) = - \frac{5}{14} log (35 x^{2} + 120 x + 32) + \frac{\sqrt{155}}{140} log (\frac{12 + 7 x - 4 \sqrt{\frac{31}{5}}}{12 + 7 x + 4 \sqrt{\frac{31}{5}}}) .

The Jacobsen07(H) frequency estimator uses spectral magnitude information only and is implemented by computing

\begin{matrix} {\hat{Δ}}_{ℓ} = 1.36 \frac{| V [ℓ + 1] | - | V [ℓ - 1] |}{| V [ℓ] | + | V [ℓ + 1] | + | V [ℓ - 1] |}, \\ - \frac{1}{2} \leq {\hat{Δ}}_{ℓ} < \frac{1}{2} . \end{matrix}

(19)

It is one of the two Hanning window-based fractional frequency estimators that are proposed in [].
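The rules (16), (17), and (19) can be sketched in Python as follows. This is only an illustration: it assumes a periodic, unshifted Hanning window 0.5 − 0.5cos(2πn/N), which is not necessarily the shifted Hanning window used in our simulations, a noise-free complex test tone, and a known bin ℓ. The function names are hypothetical, and the Quinn06(H) rule (18) is not covered.

```python
import cmath
import math

def hann_dft_bin(x, k):
    """DFT coefficient V[k] of x after applying a periodic Hanning window."""
    n_len = len(x)
    return sum(
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_len))
        * xn * cmath.exp(-2j * math.pi * k * n / n_len)
        for n, xn in enumerate(x)
    )

def grandke83_hann(v_peak, v_right):
    """Rule (16): magnitude only, with 0 <= delta < 1 measured from bin l."""
    m_0, m_r = abs(v_peak), abs(v_right)
    return (2.0 * m_r - m_0) / (m_r + m_0)

def macleod98_hann(v_left, v_peak, v_right):
    """Rule (17): uses the real parts of V[l-1]V*[l], V[l]V*[l], and V[l+1]V*[l]."""
    a = (v_peak * v_peak.conjugate()).real
    a_l = (v_left * v_peak.conjugate()).real
    a_r = (v_right * v_peak.conjugate()).real
    return 2.0 * (a_l - a_r) / (2.0 * a - a_l - a_r)

def jacobsen07_hann(v_left, v_peak, v_right):
    """Rule (19): magnitude only, with the 1.36 scaling constant."""
    m_l, m_0, m_r = abs(v_left), abs(v_peak), abs(v_right)
    return 1.36 * (m_r - m_l) / (m_0 + m_r + m_l)

# Noise-free complex test tone at bin ELL + DELTA (illustrative values).
N, ELL, DELTA = 256, 64, 0.3
x = [cmath.exp(2j * math.pi * (ELL + DELTA) * n / N) for n in range(N)]
v_l, v_p, v_r = (hann_dft_bin(x, ELL + m) for m in (-1, 0, 1))
est_grandke = grandke83_hann(v_p, v_r)
est_macleod_h = macleod98_hann(v_l, v_p, v_r)
est_jacobsen_h = jacobsen07_hann(v_l, v_p, v_r)
print(est_grandke, est_macleod_h, est_jacobsen_h)
```

In this idealized setting, (16) and (17) should recover the fractional frequency almost exactly, whereas (19), which relies on a fixed scaling constant, is expected to show a small deterministic error.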

3.3. Sine Window-Based Estimators

We include in our simulations three sine window-based frequency estimators. The first version of these frequency estimators was presented in [] and was based on a single ArcTan-based approximation rule (this is not related to the ArcTan-based estimator that is presented in []), which provided a maximum absolute estimation error of 1%, relative to the natural frequency resolution of the DFT filter bank (i.e., of the DFT bin width) and in the absence of noise. A subsequent improvement of this frequency estimator was based on three ArcTan-based approximation rules, which brought the maximum absolute estimation error down to 0.1% of the bin width []. This estimator is identified in our tests as ArcTan(S). Subsequently, Dun and Liu [] found more practical approximations to the three rules and, using essentially the same design approach of [], obtained a frequency estimator which exhibits a maximum absolute estimation error slightly below 0.025% of the DFT bin width. The frequency estimator modified by Dun–Liu is identified in this paper as Dun15(S).

In this paper, we also present an improvement of the estimator modified by Dun–Liu and which reduces further the maximum absolute estimation error to less than 0.006% of the DFT bin width. The context for that improvement is provided next.

Except for a constant gain factor, the magnitude frequency response of the sine window (Equation (9)) can be expressed as [,]

|H_{S} (ω)| = |\frac{cos (N ω / 2) cos (ω / 2)}{cos (ω) - cos (π / N)}| .

(20)

The sine window-based frequency estimators take advantage of the relationship between this frequency response and the magnitude spectrum as available from

| V [k] |

, since the latter is governed by the former. Figure 4 helps to illustrate that relationship according to a filter bank perspective.

Figure 4. Illustration of the projection of a target sinusoid whose frequency is

ω_{ℓ} = \frac{2 π}{N} (ℓ + Δ_{ℓ})

on the frequency response of three adjacent ODFT sub-bands (or channels): channel

k = ℓ - 1

(dash-dotted line), channel

k = ℓ

(solid line), and channel

k = ℓ + 1

(dashed line). As

0.0 \leq Δ_{ℓ} < 1.0

, the magnitude of the

ℓ th

ODFT channel is a local maximum.

This figure presumes that the frequency of the target sinusoid is

ω_{ℓ} = \frac{2 π}{N} (ℓ + Δ_{ℓ})

(with

0.0 \leq Δ_{ℓ} < 1.0

), and that instead of the DFT, the discrete time-frequency transformation is achieved by means of the Odd-frequency DFT (ODFT) []. The ODFT consists of a slight modification of (4), in the sense that the exponent in the complex exponential,

- j \frac{2 π}{N} k n

, is replaced by

- j \frac{2 π}{N} (k + \frac{1}{2}) n

; please see [,,] for details. We denote this time-frequency transformation as

V_{O} [k]

. The net effect of this convenient modification is that the sampled frequencies of the ODFT channels (or sub-band filters) are right-shifted by

π / N

relative to those of the plain DFT, which means that when

Δ_{ℓ} \to 0.0

, or when

Δ_{ℓ} \to 1.0

, the target sinusoid is projected into two non-zero ODFT spectral lines, as Figure 4 helps to clarify (the figure represents the positive side of the frequency axis). In addition, when

Δ_{ℓ}

varies in the range

[0.0, 1.0 [

, the magnitude of the ODFT channel when

k = ℓ

(i.e.,

V_{O} [ℓ]

) is always larger than that of the two neighboring channels (i.e., those corresponding to the ODFT channels

k = ℓ - 1

and

k = ℓ + 1

). Since the magnitude of

V_{O} [k]

is determined by the frequency response of the

k th

ODFT channel (except for a constant scaling factor), then the ratios involving

V_{O} [k]

spectral coefficients when

k = ℓ - 1, ℓ, ℓ + 1

, assuming that leakage effects are negligible, are the same as the ratios involving the frequency responses of the corresponding ODFT sub-bands or channels, as Figure 4 illustrates.
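A minimal Python sketch of the ODFT is given next, to illustrate the two properties described above. It assumes a sine window of the form sin(π(n + 1/2)/N) and a noise-free complex test tone, and the function names are hypothetical. The only change relative to a plain DFT is the k + 1/2 term in the exponent.

```python
import cmath
import math

def sine_window(n_len):
    """Sine window, assumed here to be sin(pi * (n + 1/2) / N)."""
    return [math.sin(math.pi * (n + 0.5) / n_len) for n in range(n_len)]

def odft_bin(x, window, k):
    """ODFT coefficient V_O[k]: channel centers are shifted by half a bin (pi / N)."""
    n_len = len(x)
    return sum(
        h * xn * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_len)
        for n, (h, xn) in enumerate(zip(window, x))
    )

N, ELL = 128, 20  # illustrative values

def channel_magnitudes(delta):
    """|V_O[l-1]|, |V_O[l]|, |V_O[l+1]| for a complex tone at bin ELL + delta."""
    x = [cmath.exp(2j * math.pi * (ELL + delta) * n / N) for n in range(N)]
    w = sine_window(N)
    return [abs(odft_bin(x, w, ELL + m)) for m in (-1, 0, 1)]

mags_mid = channel_magnitudes(0.3)   # channel l should dominate
mags_edge = channel_magnitudes(0.0)  # channels l-1 and l should share the tone equally
print(mags_mid, mags_edge)
```

With Δ = 0.3, channel ℓ carries the largest magnitude, and with Δ = 0 the tone is split equally between channels ℓ − 1 and ℓ, in line with Figure 4.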

Given that the main lobe of the frequency response of

h_{S} [n]

encompasses three discrete-frequency bins (Section 2.4), this creates an opportunity for three ratios involving different pairs of ODFT spectral coefficients to be computed, as specified by (21), (22), and (23). As it is illustrated in Figure 4, these ratios are the same as the frequency response ratios that are also indicated in (21), (22), and (23), and which take advantage of (20) and of the approximation

sin (θ) \approx θ

for small

θ

. Please see [,] for details.

R = \frac{| V_{O} [ℓ - 1] |}{| V_{O} [ℓ + 1] |} = \frac{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} + \frac{1}{2})) |}{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} - \frac{3}{2})) |} \approx \frac{(2 - Δ_{ℓ}) (1 - Δ_{ℓ})}{Δ_{ℓ} (1 + Δ_{ℓ})}

(21)

Q = \frac{| V_{O} [ℓ - 1] |}{| V_{O} [ℓ] |} = \frac{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} + \frac{1}{2})) |}{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} - \frac{1}{2})) |} \approx \frac{1 - Δ_{ℓ}}{1 + Δ_{ℓ}}

(22)

S = \frac{| V_{O} [ℓ + 1] |}{| V_{O} [ℓ] |} = \frac{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} - \frac{3}{2})) |}{| H_{S} (\frac{2 π}{N} (Δ_{ℓ} - \frac{1}{2})) |} \approx \frac{Δ_{ℓ}}{2 - Δ_{ℓ}}

(23)

These relations can be easily solved for

Δ_{ℓ}

leading to (24), (25), and (26), respectively.

{\hat{Δ}}_{ℓ} \approx \frac{3 + R - \sqrt{1 + R (14 + R)}}{2 (1 - R)}

(24)

{\hat{Δ}}_{ℓ} \approx \frac{1 - Q}{1 + Q}

(25)

{\hat{Δ}}_{ℓ} \approx \frac{2 S}{1 + S}

(26)
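The accuracy of the inversions (24), (25), and (26) can be checked with a short round-trip experiment: the three ratios are computed from the exact frequency response (20) and then mapped back to a fractional frequency. The Python sketch below does this for N = 512 and a grid of Δ values. The function names, the grid, and the guard used when R is numerically equal to 1 (where (24) becomes 0/0 and its limit is 0.5) are assumptions made for this illustration.

```python
import math

def sine_window_response(offset_bins, n_len):
    """|H_S| from (20), evaluated offset_bins DFT bins away from the channel center."""
    omega = 2.0 * math.pi * offset_bins / n_len
    return abs(
        math.cos(n_len * omega / 2.0) * math.cos(omega / 2.0)
        / (math.cos(omega) - math.cos(math.pi / n_len))
    )

def delta_from_r(r):
    """Rule (24); guarded at R = 1, where the limit of the expression is 0.5."""
    if abs(1.0 - r) < 1e-9:
        return 0.5
    return (3.0 + r - math.sqrt(1.0 + r * (14.0 + r))) / (2.0 * (1.0 - r))

def delta_from_q(q):
    """Rule (25)."""
    return (1.0 - q) / (1.0 + q)

def delta_from_s(s):
    """Rule (26)."""
    return 2.0 * s / (1.0 + s)

N = 512
max_error = 0.0
for i in range(1, 10):
    delta = 0.1 * i
    h_left = sine_window_response(delta + 0.5, N)    # channel l - 1
    h_mid = sine_window_response(delta - 0.5, N)     # channel l
    h_right = sine_window_response(delta - 1.5, N)   # channel l + 1
    estimates = (
        delta_from_r(h_left / h_right),
        delta_from_q(h_left / h_mid),
        delta_from_s(h_right / h_mid),
    )
    max_error = max(max_error, max(abs(e - delta) for e in estimates))
print(f"largest round-trip error: {max_error:.2e} bins")
```

The residual error reflects only the small-angle approximation used to obtain the closed forms, so it shrinks as N grows.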

Dun and Liu follow an approach similar to [] and use (25) when

{\hat{Δ}}_{ℓ} < 0.4

, (26) when

{\hat{Δ}}_{ℓ} > 0.6

, and (24) when

0.4 \leq {\hat{Δ}}_{ℓ} \leq 0.6

; for details please see []. Many experiments led us to the conclusion that this choice of a mutually exclusive arrangement of rules not only tends to give rise to an asymmetric error distribution in the full

Δ_{ℓ}

range, i.e.,

0.0 \leq Δ_{ℓ} < 1.0

, but also tends to aggravate bias effects due to the influence of quasi-harmonic sinusoids. We concluded that both aspects are better addressed when mutual exclusion of rules is relaxed, and at least two rules are combined in specific ranges. While avoiding overfitting for specific scenarios, we concluded that a good compromise for many different SNR and harmonic interference conditions can be obtained by combining the estimates provided by (24) and (25) when

{\hat{Δ}}_{ℓ} \leq 0.35

, (24) and (26) when

{\hat{Δ}}_{ℓ} \geq 0.65

, and combining all three rules when

0.35 < {\hat{Δ}}_{ℓ} < 0.65

. Equation (27) makes these combinations explicit. They rely on a weighted average that depends on the ODFT spectral coefficients, which provides a natural smoothing between the three rules.

{\hat{Δ}}_{ℓ} = \begin{cases} \frac{Q | V_{O} [ℓ - 1] | + R | V_{O} [ℓ] |}{| V_{O} [ℓ - 1] | + | V_{O} [ℓ] |}, & 0.0 \leq {\hat{Δ}}_{ℓ} \leq 0.35 \\ \frac{Q | V_{O} [ℓ - 1] | + R | V_{O} [ℓ] | + S | V_{O} [ℓ + 1] |}{| V_{O} [ℓ - 1] | + | V_{O} [ℓ] | + | V_{O} [ℓ + 1] |}, & 0.35 < {\hat{Δ}}_{ℓ} < 0.65 \\ \frac{R | V_{O} [ℓ] | + S | V_{O} [ℓ + 1] |}{| V_{O} [ℓ] | + | V_{O} [ℓ + 1] |}, & 0.65 \leq {\hat{Δ}}_{ℓ} < 1.0 \end{cases}

(27)

This equation expresses our improved sine window-based estimator, which we identify in this paper as Proposed(S). Figure 5 illustrates a practical (and reproducible) example of using (27) in order to perform accurate frequency estimation of the first 11 sinusoids of a short singing signal affected by vibrato. The Matlab command file allowing us to replicate this Figure is available on GitHub (https://github.com/Anibal-Ferreira/demo_AccSinFreqEst (URL accessed on 30 August 2023)).
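For readers who prefer code to equations, a possible reading of (27) is sketched below in Python. It is not the Matlab implementation referred to above and rests on several assumptions that should be checked against it: Q, R, and S in (27) are read as shorthand for the estimates delivered by (25), (24), and (26), respectively; the branch is selected with a preliminary estimate taken from (24); the sine window is sin(π(n + 1/2)/N); and the test tone is complex and noise-free. All function names are hypothetical.

```python
import cmath
import math

def odft_bin(x, window, k):
    """ODFT coefficient V_O[k] of the windowed signal."""
    n_len = len(x)
    return sum(
        h * xn * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_len)
        for n, (h, xn) in enumerate(zip(window, x))
    )

def delta_from_r(r):
    """Rule (24); guarded at R = 1, where the limit of the expression is 0.5."""
    if abs(1.0 - r) < 1e-9:
        return 0.5
    return (3.0 + r - math.sqrt(1.0 + r * (14.0 + r))) / (2.0 * (1.0 - r))

def proposed_sine(v_left, v_mid, v_right):
    """One reading of (27): magnitude-weighted average of the estimates (24)-(26)."""
    m_l, m_0, m_r = abs(v_left), abs(v_mid), abs(v_right)
    q, s = m_l / m_0, m_r / m_0
    d_q = (1.0 - q) / (1.0 + q)                           # estimate from (25)
    d_s = 2.0 * s / (1.0 + s)                             # estimate from (26)
    d_r = delta_from_r(m_l / m_r) if m_r > 0.0 else 0.0   # estimate from (24)
    preliminary = d_r  # assumption: branch selection uses the (24) estimate
    if preliminary <= 0.35:
        return (d_q * m_l + d_r * m_0) / (m_l + m_0)
    if preliminary >= 0.65:
        return (d_r * m_0 + d_s * m_r) / (m_0 + m_r)
    return (d_q * m_l + d_r * m_0 + d_s * m_r) / (m_l + m_0 + m_r)

N, ELL = 256, 40  # illustrative values
window = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
results = {}
for delta in (0.2, 0.5, 0.8):  # one test value per branch of (27)
    x = [cmath.exp(2j * math.pi * (ELL + delta) * n / N) for n in range(N)]
    bins = [odft_bin(x, window, ELL + m) for m in (-1, 0, 1)]
    results[delta] = proposed_sine(*bins)
print(results)
```

Because each branch averages estimates that are individually close to Δ in the noise-free case, the weighting mainly matters under noise and interference, where the largest spectral lines are the most reliable.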

Figure 5. (a) Illustrative spectrogram of a short excerpt of singing containing vibrato; (b) corresponding accurate estimation of the frequencies of the first 11 harmonics. The different colors in (b) represent the frequency trajectories of individual harmonics.

The sine window we consider in our research (Equation (9)) satisfies the requirements of perfect-reconstruction analysis–synthesis filter banks [,,] that many voice and music analysis–synthesis processing systems rely upon and, in this sense, is different from the sine window that is considered in other studies; namely, in [].

4. Test Results and Main Conclusions

In this section, we present the main test results when frequency estimation is not affected by harmonic interference (Section 4.1), when the quasi-harmonic interference is full-bandwidth and mild (Section 4.2), and when it is full-bandwidth and strong (Section 4.3). Section 4.4 presents the main conclusions and insights emerging from the results. Nine non-iterative frequency estimators complying with the constraints discussed in Section 2.2 are evaluated when subjected to the same test conditions and settings that are discussed in Section 2.3, and on the assumption that coarse frequency estimation is without error (the practical implication of this assumption is that a local maximum in the magnitude spectrum is always correctly identified). The estimators include two rectangular window-based estimators that are identified in Section 3.1 (Macleod98(R) and Jacobsen07(R)), four Hanning window-based frequency estimators that are identified in Section 3.2 (Grandke83(H), Macleod98(H), Quinn06(H), and Jacobsen07(H)), and three sine window-based frequency estimators that are discussed in Section 3.3 (ArcTan(S), Dun15(S), and Proposed(S)). Figure 6 represents the deterministic estimation error due to structural bias, as a percent of the DFT bin width, which is associated with eight of the tested frequency estimators in the absence of harmonic and noise interferences, when

N = 512

, and when

ℓ = N / 4

. It should be noted that this particular value of ℓ corresponds to the ‘sweet spot’ in the sense that, for a real sinusoid, it minimizes self-leakage due to the Dirichlet kernel located on the negative frequency axis. Even though the setting

ℓ = N / 4

is frequently used in the literature to report results, it is not representative of realistic operational conditions with real-world signals.

Figure 6. Estimation error (as % of the DFT bin width) for representative frequency estimators in the absence of noise and harmonic interference. Results are presented for a single real-valued sinusoid, when

N = 512

, and when

ℓ = N / 4

. Fractional Frequency denotes

Δ_{ℓ}

.

In Figure 6, we use

ℓ = N / 4

just to give a first flavour of a comparative perspective for different frequency estimators, and also to emphasize that a lower estimation error in Figure 6 is not indicative of a ‘better’ estimator, because the test conditions are not representative of the capability of each estimator to handle noise and leakage due to quasi-harmonic interference. This capability will be more correctly assessed in the following sections using as a reference the performance criterion that is described in Section 2.5, and taking the overall error that reflects structural bias, error variance from the noise, and ‘bias due to multiple tones’ [].

4.1. Results with No Harmonic Interference

Figure 7 represents the test results regarding systematic bias, i.e., the

mean \{ω_{ℓ} - {\hat{ω}}_{ℓ}\}

results, when there is no harmonic interference. The results are normalized by the DFT bin width. Figure 8 represents the test results regarding RMSE according to (13) and in the absence of harmonic interference. Regarding systematic bias, it starts in the range 0.1–0.3% for very low SNR values, and vanishes rapidly for all frequency estimators when the SNR varies between −10 dB and 20 dB. Above 25 dB SNR, the bias can be considered negligible.

Figure 7. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and in absence of harmonic interference.

Figure 8. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and in absence of harmonic interference. The CRLB is also represented as a reference.

Regarding RMSE results, four groups of estimators can be identified, each with a similar performance trend. First, as the SNR increases, the Jacobsen07(H) and the Macleod98(R) estimators are the first two to suffer from strong bias due to noise given that, for SNR > 25 dB, their performance saturates above 0.002 (or 0.2% of the DFT bin width). It is remarkable, however, that, below 20 dB SNR, the Macleod98(R) estimator approaches the CRLB more closely than any other estimator, which makes it the best-performing estimator in this SNR range. Three main reasons help to explain this outcome: (i) the rectangular window allows the estimator to benefit from the best frequency selectivity, as the main lobe of its frequency response is the narrowest; (ii) this advantage is more effective when the SNR becomes less than the minimum main-to-side lobe attenuation of the rectangular window, which is around 13 dB; and (iii) the estimator uses spectral phase information in addition to spectral magnitude information.

The ArcTan(S) estimator is the next to show an asymptotic behavior since its RMSE performance saturates to around 0.07% of the DFT bin width for SNR > 35 dB. The next two estimators to show a close asymptotic behavior are the Jacobsen07(R) and the Dun15(S) estimators, when SNR > 50 dB. This causes some surprise, because the Jacobsen07(R) uses the rectangular window, which is known to have the poorest main-to-side lobe attenuation. However, the fact that this estimator also uses phase, in addition to spectral magnitude information, and benefits from the improvement introduced by Candan [], means that its performance exceeds what could be expected at first sight. This result highlights the non-obvious conclusion that a given estimator may possess an intrinsic ability to ‘cancel’ bias, such that it may outperform other estimators that use windows with an improved main-to-side lobe attenuation. Finally, the Quinn06(H), the Macleod98(H), the Grandke83(H), and the Proposed(S) estimators follow a similar trend at an almost constant distance from the CRLB.

The relative behavior between the estimators tested here is consistent with the ‘ranking’ that emerged from our previous research [].

4.2. Results with Mild Quasi-Harmonic Interference

Figure 9 and Figure 10 depict, respectively, the systematic bias and RMSE performance results of all tested frequency estimators when the quasi-harmonic interference, as defined in Section 2.3, is mild. Regarding systematic bias, the trend for all estimators is quite similar to what was observed in the case of absence of quasi-harmonic interference (previous subsection), except for the Macleod98(R) estimator, whose systematic bias level fluctuates even when the SNR is high, although it does not exceed 0.05% of the DFT bin width.

Figure 9. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and under mild quasi-harmonic interference conditions.

Figure 10. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and under mild quasi-harmonic interference conditions. The CRLB is also represented as a reference.

Regarding RMSE, four performance trends can be identified as in the previous case (Section 4.1). The first aspect to note is that these trends reflect a degradation relative to what is observed in Figure 8, in the sense that the asymptotic behavior sets in at lower SNR values. This is expected, given that leakage effects are stronger. The second aspect to highlight, and one that comes as a surprise, is that the frequency estimators that share a similar trend are different in Figure 8 and in Figure 10. For example, the RMSE performance of the Macleod98(R) estimator now saturates above 0.01 (or 1% of the DFT bin width) for SNR > 10 dB, which shows that bias effects due to leakage dominate. A second trend collapses the RMSE performances of four estimators that were separated in Figure 8: the Jacobsen07(H) estimator, the ArcTan(S) estimator, and the Jacobsen07(R) and Dun15(S) estimators. A third trend groups the Quinn06(H), the Grandke83(H), and the Proposed(S) estimators, whose RMSE performances saturate around 0.04% of the DFT bin width for SNR > 40 dB. Finally, the Macleod98(H) estimator is able to follow the CRLB more closely for SNR > 40 dB. It is interesting to note that this estimator is one of the two estimators that are more distant from the CRLB for SNR < 20 dB, which suggests that a lower performance at low SNR is compensated by a higher performance at high SNR. The same trade-off appears, in reverse, for the Macleod98(R) estimator, which approaches the CRLB more closely than any other estimator when the SNR is less than about 5 dB.

4.3. Results with Strong Quasi-Harmonic Interference

Figure 11 and Figure 12 represent, respectively, the systematic bias and RMSE performance results of all tested frequency estimators under strong full-bandwidth quasi-harmonic interference, as specified in Section 2.3. As expected, relative to the previous two test cases, performance curves exhibit a stronger degradation, both in terms of systematic bias and RMSE. Regarding systematic bias, in addition to the remarks already made concerning the previous two test cases, the current test case reveals that the systematic bias is non-negligible for most estimators, even when the SNR exceeds 10 dB, and especially in the case of the two rectangular window-based estimators (i.e., the Jacobsen07(R) and the Macleod98(R) estimators), whose systematic bias may be as high as 0.1% of the DFT bin width.

Figure 11. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and under strong quasi-harmonic interference conditions.

Figure 12. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and under strong quasi-harmonic interference conditions. The CRLB is also represented as a reference.

Regarding the RMSE performance curves, as expected, the asymptotic behavior initiates for even lower SNR values, relative to the previous test case. On the other hand, it can be seen that all performance curves collapse to three major trends. The worst-performing trend involves only the Macleod98(R) frequency estimator, whose RMSE performance saturates to around 3.5% of the DFT bin width for SNR > 5 dB. The next best-performing trend groups the Jacobsen07(R), the Dun15(S), and the ArcTan(S) estimators. The best-performing group of frequency estimators includes the Jacobsen07(H), the Proposed(S), the Grandke83(H), the Quinn06(H), and the Macleod98(H) estimators. The RMSE performance of these estimators saturates to around 0.4% of the bin width for SNR > 20 dB.

4.4. Main Conclusions

The results in Section 4.1, Section 4.2 and Section 4.3 suggest several relevant conclusions that are summarized next.

First, systematic bias affects all frequency estimators in a similar way, varying between 0.1% and 0.3% of the bin width when the SNR is in the range −10 dB to +10 dB, and vanishing rapidly for higher SNR, such that it can be considered negligible, except in the case of the Macleod98(R) and Jacobsen07(R) estimators, whose systematic bias can be as high as 0.1% of the DFT bin width under strong quasi-harmonic interference.
Second, in terms of RMSE, it is clear that more severe quasi-harmonic interference conditions degrade the performance of all frequency estimators, but this degradation is not the same for all estimators. This fact is explained by the intrinsic robustness of each estimator, which depends not only on the window that is associated with the estimator, but also on their estimation approach dealing with spectral magnitude information only, or a combination of spectral magnitude and phase. For example, the Jacobsen07(H) estimator uses spectral magnitude only, and the Macleod98(H) estimator uses both spectral magnitude and phase, which is what gives it an ‘intrinsic leakage rejection’ capability [].
Third, the relative performance of the same estimator depends not only on the SNR (as expected), but is also highly influenced by the severity of the quasi-harmonic interference. For example, it is quite interesting to observe that the Jacobsen07(H) estimator is the worst-performing estimator when the test conditions do not involve harmonic interference, it belongs to the second group of worst-performing estimators under mild quasi-harmonic interference, and it belongs to the group of best-performing estimators under strong quasi-harmonic interference. This reflects the fact that all estimators suffer a stronger performance degradation when the test conditions become more severe, but that degradation affects different estimators differently. The Jacobsen07(H) estimator appears to be an exception, as its performance is quite consistent across test cases. Thus, it may be concluded that operational conditions dictate whether a given estimator has a better or worse relative performance. For example, the Macleod98(R) estimator exhibits the best relative performance across test cases for very low SNR levels (because it approaches the CRLB more closely), but exhibits the worst relative performance across test cases for moderate and high SNR levels (because its performance curve saturates to higher RMSE values). Results also suggest that if a given estimator shows a good performance under no harmonic interference, it may perform poorly when subject to strong quasi-harmonic interference. This is the case of the Jacobsen07(R) estimator.
Fourth, when the quasi-harmonic interference is strong, its impact on the frequency estimation performance for the majority of the estimators considered in this paper is quite significant, which confirms that it has a dominant effect in limiting the performance.
Fifth, our results suggest that when a frequency estimator shows a relatively better performance across test cases at high SNR, that is obtained at the cost of a relatively worse performance across test cases at low SNR. That is clearly the case for the Macleod98(H) estimator when harmonic interference is mild or strong.

Finally, it is instructive to relate the absolute estimation error of the different frequency estimators when they operate on the ‘sweet spot’ and in the absence of harmonic interference and noise, as illustrated in Figure 6, to the variance of the estimation error that is associated with the different estimators under the ‘stress test’ that the results in Figure 12 reflect. It is clear that a much smaller absolute estimation error under ideal (i.e., no stress) conditions is not necessarily indicative of a much better performance under a realistic and stressful scenario. That is the case of the Grandke83(H) and the Proposed(S) estimators.

5. Conclusions

In this paper, we compared the relative performance of nine non-iterative, discrete Fourier transform (DFT)-based frequency estimators, taking as a reference the Cramér–Rao Lower Bound (CRLB) for the error variance of a general unbiased estimator, and considering the combined impact of such aspects as spectral selectivity and main-side lobe attenuation of the analysis window, the Signal-to-Noise Ratio (SNR), the algorithmic approach of the estimator dealing with just spectral magnitude information or a combination of spectral magnitude and phase, and, most importantly, the severity of quasi-harmonic interference that includes amplitude modulation and frequency modulation.

The results indicate that quasi-harmonic interference plays a major role in constraining the performance of all frequency estimators, especially when it is strong, in which case the performances of the majority of the tested frequency estimators collapse to just a few trends relative to the CRLB.

The results also indicate that the performance of a given estimator, which includes systematic bias and variance aspects, is not uniquely determined by the characteristics of the window being used by that estimator, nor is it predicted by the maximum absolute estimation error when the frequency of one single sinusoid is estimated in the absence of noise and harmonic interference. Rather, it depends on how well the frequency estimator takes advantage of the frequency response of the window being used by that estimator, and depends on the intrinsic ability of the estimator to ‘cancel’ bias due to multiple tones, which is a feature that seems to benefit from spectral phase information, in addition to magnitude.

Other relevant conclusions emerging from our research include: (i) a rectangular window-based frequency estimator (Macleod98(R)) approaches the CRLB better than any other estimator for low SNR values (e.g., <20 dB under no harmonic interference), (ii) a frequency estimator that shows a higher relative performance at high SNR tends to show a lower relative performance at low SNR, (iii) quasi-harmonic interference does not degrade the performance of different estimators in a similar way, and (iv) if the severity of quasi-harmonic interference is high, estimators that are based on the Hanning and sine windows show better and similar performances, with errors on the order of 0.4% of the bin width of the DFT filter bank, which means that they can be considered sufficiently accurate for practical purposes when tens or hundreds of concurrent sinusoids need to be analyzed in real time.
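To put an error of 0.4% of the bin width into perspective, the following sketch converts it to hertz and to musical cents. The sampling rate of 22,050 Hz, the transform length of 512, and the 440 Hz target are hypothetical values chosen only for illustration; they are not taken from the test settings, and the helper names are likewise illustrative.

```python
import math


def bin_fraction_to_hz(fraction: float, fs_hz: float, n: int) -> float:
    """Convert an error given as a fraction of the DFT bin width to hertz."""
    return fraction * fs_hz / n


def error_in_cents(f_hz: float, err_hz: float) -> float:
    """Pitch deviation, in cents, of f_hz + err_hz relative to f_hz."""
    return 1200.0 * math.log2((f_hz + err_hz) / f_hz)


if __name__ == "__main__":
    fs_hz, n = 22050.0, 512          # hypothetical analysis settings
    err_hz = bin_fraction_to_hz(0.004, fs_hz, n)
    print(f"bin width: {fs_hz / n:.2f} Hz, 0.4% of it: {err_hz:.4f} Hz")
    print(f"at 440 Hz this is {error_in_cents(440.0, err_hz):.3f} cents")
```

With these assumed settings the error stays below one cent at 440 Hz; the same absolute error corresponds to a larger deviation in cents at lower target frequencies.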

Future work will build on the most important findings reported here to tackle multi-pitch estimation in concurrent speech and singing, as well as music chord identification and separation.

Author Contributions

Conceptualization, J.M.S. and A.J.S.F.; methodology, J.M.S., M.A.O. and A.F.S.; software, J.M.S., M.A.O., A.F.S. and A.J.S.F.; validation, J.M.S., M.A.O., A.F.S. and A.J.S.F.; formal analysis, J.M.S. and A.J.S.F.; investigation, J.M.S., M.A.O. and A.F.S.; writing—original draft preparation, J.M.S. and A.J.S.F.; writing—review and editing, J.M.S.; visualization, A.F.S.; supervision, A.J.S.F.; project administration, A.J.S.F.; funding acquisition, A.J.S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financed by FEDER funds (Fundo Europeu de Desenvolvimento Regional) through the COMPETE 2020-Operacional Programme for Competitiveness and Internationalization (POCI), and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia) in the framework of the project with reference POCI-01-0145-FEDER-029308.

Data Availability Statement

Matlab code replicating Figure 1 and Figure 5 is available on GitHub: https://github.com/Anibal-Ferreira/demo_AccSinFreqEst (URL accessed on 30 August 2023).

Acknowledgments

The authors would like to thank the anonymous reviewers for the comments and suggestions, which helped to improve the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

DFT	Discrete Fourier transform
CRLB	Cramér–Rao lower bound
ESPRIT	Estimation of Signal Parameters Via Rotational Invariance Techniques
FFT	Fast Fourier transform
MUSIC	Multiple Signal Classification
ODFT	Odd discrete Fourier transform
RMSE	Root Mean Squared Error
SNR	Signal-to-Noise Ratio

References

Jacobsen, E.; Kootsookos, P. Fast, Accurate Frequency Estimators. IEEE Signal Process. Mag. 2007, 24, 123–125.
Klapuri, A.; Davy, M. (Eds.) Signal Processing Methods for Music Transcription; Part III: Multiple Fundamental Frequency Analysis; Springer: New York, NY, USA, 2006.
Hoppe, D.; Sadakata, M.; Desain, P. Development of real-time visual feedback assistance in singing training: A review. J. Comput. Assist. Learn. 2006, 22, 308–316.
Maciel, C.D.; Pereira, J.C.; Stewart, D. Identifying Healthy and Pathological Affected Voice Signals. IEEE Signal Process. Mag. 2010, 27, 120–123.
Kreiman, J.; Vanlancker-Sidtis, D.; Gerratt, B. Defining and Measuring Voice Quality. In Proceedings of the From Sound to Sense, Geneva, Switzerland, 11–13 June 2004; pp. C163–C168.
Ferreira, A.J.S. Accurate Estimation in the ODFT Domain of the Frequency, Phase and Magnitude of Stationary Sinusoids. In Proceedings of the 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 21–24 October 2001; pp. 47–50.
Ferreira, A.; Sinha, D. Advances to a Frequency-Domain Parametric Coder of Wideband Speech. In Proceedings of the 140th Convention of the Audio Engineering Society, Paris, France, 4–7 June 2016; p. 9509.
Kondoz, A.M. Digital Speech (Coding for Low Bit Rate Communication Systems); John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1994.
Painter, T.; Spanias, A. Perceptual Coding of Digital Audio. Proc. IEEE 2000, 88, 451–513.
Borkowski, J.; Kania, D.; Mroczka, J. Interpolated DFT-based fast and accurate estimation for the control of power. IEEE Trans. Ind. Electron. 2014, 61, 7026–7034.
Hess, W. Pitch Determination of Speech Signals-Algorithms and Devices; Springer: Berlin/Heidelberg, Germany, 1983.
Schmidt, R.O. Multiple Emitter Location and Signal Parameter Estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
Roy, R.; Paulraj, A.; Kailath, T. ESPRIT-A Subspace Rotation Approach to Estimation of Parameters of Cisoids in Noise. IEEE Trans. Acoust. Speech Signal Process. 1986, 34, 1340–1344.
Hayes, M.H. Statistical Digital Signal Processing and Modeling; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1996.
Lagrange, M.; Marchand, S. Estimating the Instantaneous Frequency of Sinusoidal Components Using Phase-Based Methods. J. Audio Eng. Soc. 2007, 55, 385–399.
Puckette, M.S.; Brown, J.C. Accuracy of Frequency Estimates Using the Phase Vocoder. IEEE Trans. Speech Audio Process. 1998, 6, 166–176.
Rabiner, L.R.; Gold, B. Theory and Application of Digital Signal Processing; Prentice-Hall Inc.: Upper Saddle River, NJ, USA, 1975.
Schoukens, J.; Pintelon, R.; Hamme, H.V. The Interpolated Fast Fourier Transform: A comparative study. IEEE Trans. Instrum. Meas. 1992, 41, 226–232.
Keiler, F.; Marchand, S. Survey on extraction of sinusoids in stationary sounds. In Proceedings of the 5th International Conference on Digital Audio Effects (DAFx-02), Hamburg, Germany, 26–28 September 2002; pp. 51–58.
Betser, M.; Collen, P.; Richard, G.; David, B. Review and Discussion on Classical STFT-based Frequency Estimators. In Proceedings of the 120th Audio Engineering Society Convention, Paris, France, 20–23 May 2006; p. 6765.
Hainsworth, S.; Macleod, M. On Sinusoidal Parameter Estimation. In Proceedings of the 6th International Conference on Digital Audio Effects (DAFx-03), London, UK, 8–11 September 2003; pp. 1–6.
Betser, M.; Collen, P.; Richard, G.; David, B. Estimation of Frequency for AM/FM Models Using the Phase Vocoder Framework. IEEE Trans. Signal Process. 2008, 56, 505–517.
Auger, F.; Flandrin, P. Improving the readability of time-frequency and time-scale representation by the reassignment method. IEEE Trans. Signal Process. 1995, 43, 1068–1088.
Rife, D.C.; Vincent, G.A. Use of the Discrete Fourier Transform in the Measurement of Frequencies and Levels of Tones. Bell Syst. Tech. J. 1970, 49, 197–228.
Rife, D.C.; Boorstyn, R.R. Single-Tone Parameter Estimation from Discrete-Time Observations. IEEE Trans. Inf. Theory 1974, 20, 591–598.
Jain, V.K.; Collins, W.L.; Davis, D.C. High-Accuracy Analog Measurements via Interpolated FFT. IEEE Trans. Instrum. Meas. 1979, 28, 113–122.
Grandke, T. Interpolation Algorithms for Discrete Fourier Transforms of Weighted Signals. IEEE Trans. Instrum. Meas. 1983, 32, 350–355.
Renders, H.; Schoukens, J.; Vilain, G. High-Accuracy Spectrum Analysis of Sampled Discrete Frequency Signals by Analytical Leakage Compensation. IEEE Trans. Instrum. Meas. 1984, 33, 287–292.
Quinn, B.G. Estimation of Frequency, Amplitude, and Phase from the DFT of a Time Series. IEEE Trans. Signal Process. 1997, 45, 814–817.
Macleod, M.D. Fast Nearly ML Estimation of the Parameters of Real or Complex Single Tones or Resolved Multiple Tones. IEEE Trans. Signal Process. 1998, 46, 141–148.
Klein, J.D. Fast Algorithms for Single Frequency Estimation. IEEE Trans. Signal Process. 2006, 54, 1762–1770.
Quinn, B.G. Frequency Estimation using Tapered Data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France, 14–19 May 2006; pp. III-73–III-76.
Quinn, B.G. Recent Advances in Rapid Frequency Estimation. In Digital Signal Processing; Elsevier: Amsterdam, The Netherlands, 2009; pp. 942–948.
Roederer, J.G. The Physics and Psychophysics of Music: An Introduction; Springer: Berlin/Heidelberg, Germany, 1995.
Aboutanios, E.; Mulgrew, B. Iterative frequency estimation by interpolation on Fourier coefficients. IEEE Trans. Signal Process. 2005, 53, 1237–1242.
Bellanger, M. Digital Processing of Signals; John Wiley & Sons: Hoboken, NJ, USA, 1989.
Ferreira, A.; Sinha, D. Accurate and Robust Frequency Estimation in the ODFT Domain. In Proceedings of the 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 16–19 October 2005; pp. 203–206.
Ye, S.; Kocherry, D.L.; Aboutanios, E. A novel algorithm for the estimation of the parameters of a real sinusoid in noise. In Proceedings of the 2015 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 2271–2275.
Ye, S.; Aboutanios, E. An algorithm for the parameter estimation of multiple superimposed exponentials in noise. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; pp. 3457–3461.
Liguori, C.; Paolillo, A.; Pignotti, A. Estimation of Signal Parameters in the Frequency Domain in the Presence of Harmonic Interference: A Comparative Analysis. IEEE Trans. Instrum. Meas. 2006, 55, 562–569.
Belega, D.; Petri, D. Effect of noise and harmonics on sine-wave frequency estimation by interpolated DFT algorithms based on few observed cycles. Signal Process. 2017, 140, 207–218.
Matusiak, A.; Borkowski, J.; Mroczka, J. Noniterative method for frequency estimation based on interpolated DFT with low-order harmonics elimination. Measurement 2016, 196, 915–918.
Liu, Z.; Fan, L.; Liu, J.; Liu, N.; Jin, J.; Xing, J. Accurate Frequency Estimator for Real Sinusoid Based on DFT. Electronics 2022, 11, 3042.
Ferreira, A.J.S.; Sousa, R. DFT-based frequency estimation under harmonic interference. In Proceedings of the 4th International Symposium on Communications, Control and Signal Processing, Limassol, Cyprus, 3–5 March 2010.
Sousa, R.; Ferreira, A.J.S. Non-iterative frequency estimation in the DFT magnitude domain. In Proceedings of the 4th International Symposium on Communications, Control and Signal Processing, Limassol, Cyprus, 3–5 March 2010.
Dun, Y.; Liu, G. A Fine-Resolution Frequency Estimator in the Odd-DFT Domain. IEEE Signal Process. Lett. 2015, 22, 2489–2493.
Sundberg, J. The Science of the Singing Voice; Northern Illinois University Press: Sycamore, IL, USA, 1987.
James, B.; Anderson, B.D.O.; Williamson, R.C. Characterization of Threshold for Single Tone Maximum Likelihood Frequency Estimation. IEEE Trans. Signal Process. 1995, 43, 817–821.
Kay, S.M. Fundamentals of Statistical Signal Processing Estimation Theory; Prentice Hall, Inc.: Upper Saddle River, NJ, USA, 1993.
Ventura, J.; Sousa, R.; Ferreira, A. Accurate analysis and visual feedback of vibrato in singing. In Proceedings of the 2012 5th International Symposium on Communications, Control and Signal Processing, Rome, Italy, 2–4 May 2012; pp. 1–6.
Moore, B.C.J. An Introduction to the Psychology of Hearing; Academic Press: Cambridge, MA, USA, 1989.
Rabiner, L.; Juang, B.H. Fundamentals of Speech Recognition; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1993.
Serra, X. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1989.
Abe, M.; Smith, J. Design Criteria for simple sinusoidal parameter estimation based on quadratic interpolation of the FFT magnitude peaks. In Proceedings of the 117th Convention of the Audio Engineering Society, San Francisco, CA, USA, 28–31 October 2004; p. 6256.
Harris, F.J. On the use of windows for harmonic analysis with the Discrete Fourier Transform. Proc. IEEE 1978, 66, 51–83.
Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing; Pearson Higher Education, Inc.: New York City, NY, USA, 2010.
Brandenburg, K.; Stoll, G. The ISO-MPEG Audio Codec: A Generic-Standard for Coding of High Quality Digital Audio. In Proceedings of the 92nd AES Convention, San Francisco, CA, USA, 1–4 October 1992; p. 3336.
Bosi, M.; Brandenburg, K.; Quackenbush, S.; Fielder, L.; Akagiri, K.; Fuchs, H.; Dietz, M. ISO/IEC MPEG-2 Advanced Audio Coding. J. Audio Eng. Soc. 1996, 45, 789–814.
Vaidyanathan, P.P. Multirate Systems and Filter Banks; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1993.
Malvar, H. Signal Processing with Lapped Transforms; Artech House, Inc.: London, UK, 1992.
Princen, J.P.; Johnson, A.W.; Bradley, A.B. Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Alias Cancellation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 6–9 April 1987; pp. 2161–2164.
Duda, K.; Barczentewicz, S. Interpolated DFT for sin^α(x) Windows. IEEE Trans. Instrum. Meas. 2014, 63, 754–760.
Oppenheim, A.V.; Schafer, R.W.; Buck, J.R. Discrete-Time Signal Processing, 2nd ed.; Prentice-Hall Inc.: Upper Saddle River, NJ, USA, 1998.
Candan, C. A method for fine resolution frequency estimation from three DFT samples. IEEE Signal Process. Lett. 2011, 18, 351–354.
Ferreira, A.; Silva, J.; Brito, F.; Sinha, D. Impact of a shift-invariant harmonic phase model in fully parametric harmonic voice representation and time/frequency synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020.

Figure 1. Illustrative spectrogram of two FM sinusoids in close proximity (a) and corresponding accurate frequency estimation (brown and blue stars in (b)). The ground truth frequency contours are represented by the green solid line in (b).

Figure 2. Illustration of three severity levels of quasi-harmonic interference affecting the estimation of a target sinusoid frequency: no interference (top panel), and mild and strong quasi-harmonic interference, when the target sinusoid frequency corresponds approximately to the second (middle panel) and fourth (bottom panel) harmonic of an existing quasi-harmonic structure, respectively. The gray/black areas denote that the quasi-harmonic interference is AM/FM-modulated and the $\alpha$ index sets the maximum deviation in magnitude and fundamental frequency.

Figure 3. Magnitude frequency response of the rectangular window (dotted line), of the sine window (dashed line), of the Hanning window (solid line), and of the Gaussian window (dash-dotted line). The abscissae axis can also be read as a DFT-bin scale.
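The selectivity and leakage trade-off that Figure 3 depicts can be checked numerically. The sketch below is a minimal illustration that evaluates the magnitude response of three of the four windows on a dense frequency grid and reports the highest side-lobe level relative to the main-lobe peak. The window length of 64, the half-sample-shifted sine window definition, the Hanning window taken as the squared sine window, and all function names are assumptions made for this example; the Gaussian window is left out because its shape parameter is not stated in this section.

```python
import cmath
import math


def window(name: str, n: int) -> list[float]:
    """Return an n-point rectangular, sine, or Hanning (sine-squared) window."""
    if name == "rectangular":
        return [1.0] * n
    sine = [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]
    if name == "sine":
        return sine
    if name == "hanning":
        return [s * s for s in sine]
    raise ValueError(f"unknown window: {name}")


def peak_sidelobe_db(w: list[float], grid: int = 2048) -> float:
    """Highest side-lobe level relative to the main-lobe peak, in dB."""
    mags = []
    for g in range(grid):
        omega = math.pi * g / grid     # 0 <= omega < pi rad/sample
        acc = sum(wi * cmath.exp(-1j * omega * i) for i, wi in enumerate(w))
        mags.append(abs(acc))
    i = 0
    while i + 1 < grid and mags[i + 1] < mags[i]:   # walk down the main lobe
        i += 1
    # Everything from the first null onwards belongs to the side lobes.
    return 20.0 * math.log10(max(mags[i:]) / mags[0])


if __name__ == "__main__":
    for name in ("rectangular", "sine", "hanning"):
        level = peak_sidelobe_db(window(name, 64))
        print(f"{name:<12} peak side lobe: {level:6.1f} dB")
```

The ordering of the three side-lobe levels mirrors the usual trade-off: the rectangular window has the narrowest main lobe and the highest leakage, the Hanning window the widest main lobe and the lowest leakage, and the sine window sits in between.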

Figure 4. Illustration of the projection of a target sinusoid whose frequency is $\omega_{\ell} = \frac{2\pi}{N}(\ell + \Delta_{\ell})$ on the frequency response of three adjacent ODFT sub-bands (or channels): channel $k = \ell - 1$ (dash-dotted line), channel $k = \ell$ (solid line), and channel $k = \ell + 1$ (dashed line). As $0.0 \leq \Delta_{\ell} < 1.0$, the magnitude of the $\ell$th ODFT channel is a local maximum.
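The local-maximum property stated in the caption of Figure 4 can be verified with a few lines of code. The sketch below assumes the usual ODFT definition, in which channel $k$ is centred on $\frac{2\pi}{N}(k + \frac{1}{2})$, together with a sine analysis window; the frame length of 64, the initial phase, and the function names are hypothetical choices for illustration only.

```python
import cmath
import math


def odft_magnitudes(x: list[float]) -> list[float]:
    """Magnitudes of the first N/2 ODFT channels of a sine-windowed frame.

    ODFT channel k is centred on (k + 1/2) * 2*pi/N rad/sample.
    """
    n = len(x)
    windowed = [x[i] * math.sin(math.pi * (i + 0.5) / n) for i in range(n)]
    mags = []
    for k in range(n // 2):
        omega_k = 2.0 * math.pi * (k + 0.5) / n
        acc = sum(v * cmath.exp(-1j * omega_k * i) for i, v in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def peak_channel(n: int, ell: int, delta: float, phase: float = 0.3) -> int:
    """Index of the strongest ODFT channel for a real sinusoid at (ell + delta) bins."""
    omega = 2.0 * math.pi * (ell + delta) / n
    x = [math.cos(omega * i + phase) for i in range(n)]
    mags = odft_magnitudes(x)
    return max(range(len(mags)), key=mags.__getitem__)


if __name__ == "__main__":
    for delta in (0.1, 0.5, 0.9):
        print(f"delta = {delta:.1f} -> peak channel = {peak_channel(64, 16, delta)}")
```

Values of $\Delta_{\ell}$ very close to 0 or 1 place the sinusoid midway between two channel centres, so the two neighbouring magnitudes become nearly equal there; the example therefore samples the interior of the interval.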

Figure 5. Illustrative spectrogram of a short excerpt of singing containing vibrato (a) and corresponding accurate estimation of the frequencies of the first 11 harmonics (b). The different colors in (b) represent the frequency trajectories of individual harmonics.

Figure 6. Estimation error (as % of the DFT bin width) for representative frequency estimators in the absence of noise and harmonic interference. Results are presented for a single real-valued sinusoid, when $N = 512$, and when $\ell = N / 4$. Fractional Frequency denotes $\Delta_{\ell}$.
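A noise-free error sweep of the kind summarized in Figure 6 can be sketched as follows. The example uses the three-bin rectangular-window interpolator of Jacobsen and Kootsookos (first entry in the reference list) purely as a well-known stand-in; it is not implied that this matches the configuration of any estimator tested here. The fractional offset is measured from the nearest DFT bin, so it spans roughly $-0.5$ to $0.5$ rather than the $[0, 1)$ range used for ODFT channels, and the initial phase and function names are hypothetical.

```python
import cmath
import math


def dft_bin(x: list[float], k: int) -> complex:
    """Single DFT coefficient X[k] of a rectangular-windowed frame."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def three_bin_estimate(x: list[float]) -> float:
    """Frequency estimate, in bins, from the peak DFT bin and its two neighbours."""
    n = len(x)
    spectrum = [dft_bin(x, k) for k in range(n // 2)]
    k = max(range(1, n // 2 - 1), key=lambda b: abs(spectrum[b]))
    lo, mid, hi = spectrum[k - 1], spectrum[k], spectrum[k + 1]
    # Jacobsen-Kootsookos interpolation on complex DFT coefficients.
    delta = -((hi - lo) / (2.0 * mid - lo - hi)).real
    return k + delta


def error_percent_of_bin(n: int, ell: int, frac: float, phase: float = 0.3) -> float:
    """Noise-free estimation error for a real sinusoid at (ell + frac) bins."""
    omega = 2.0 * math.pi * (ell + frac) / n
    x = [math.cos(omega * i + phase) for i in range(n)]
    return 100.0 * (three_bin_estimate(x) - (ell + frac))


if __name__ == "__main__":
    n = 512
    for frac in (-0.4, -0.2, 0.0, 0.2, 0.4):
        err = error_percent_of_bin(n, n // 4, frac)
        print(f"offset = {frac:+.1f} bin -> error = {err:+.5f} % of the bin width")
```

Because the target sits at $N/4$, the negative-frequency image of the real sinusoid is as far away as possible, which is what makes this operating point a ‘sweet spot’ for single-tone interpolators.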

Figure 7. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and in the absence of harmonic interference.

Figure 8. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and in the absence of harmonic interference. The CRLB is also represented as a reference.

Figure 9. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and under mild quasi-harmonic interference conditions.

Figure 10. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and under mild quasi-harmonic interference conditions. The CRLB is also represented as a reference.

Figure 11. Bin width-normalized systematic bias for all tested frequency estimators as a function of the SNR and under strong quasi-harmonic interference conditions.

Figure 12. Bin width-normalized RMSE for all tested frequency estimators as a function of the SNR and under strong quasi-harmonic interference conditions. The CRLB is also represented as a reference.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
