Deep Learning versus Spectral Techniques for Frequency Estimation of Single-Tones: Reduced Complexity for SDR and IoT Sensor Communications

Despite the increasing role of machine learning in various fields, very few works have considered artificial intelligence for frequency estimation (FE). This work presents a comprehensive analysis of a deep-learning (DL) approach for frequency estimation of single-tones. It is shown that a DL network with two layers having a few nodes can estimate frequency more accurately than well-known classical techniques. The study fills gaps in existing works by analyzing the error under different signal-to-noise ratios (SNRs), numbers of nodes, and numbers of input samples, as well as under missing SNR information. It is found that DL-based FE is not significantly affected by SNR bias or the number of nodes. The DL-based approach can work properly using a minimal number of input nodes N at which classical methods fail. DL can use as few as two layers with two or three nodes each, with complexity of O{N} versus O{N log2(N)} for DFT-based FE, noting that a smaller N is required for DL. Hence, DL can significantly reduce FE complexity, memory, cost, and power consumption, making DL-based FE attractive for resource-limited systems like some IoT sensor applications. Also, reduced complexity opens the door for hardware-efficient implementation using short word-length (SWL) or time-efficient software-defined radio (SDR) communications.


Introduction
Estimating the frequency of a single-tone sinusoidal wave under noise has been a fundamental problem in signal processing for communications, and its effect extends to biomedical and power engineering [1,2]. Several classical methods have been proposed to estimate the frequency of a single-tone sinusoid, mostly based on Fourier analysis [3]. Correlation methods can be very attractive for hardware implementation as well as for IoT, sensors, and software-defined radio (SDR), as they are much more computationally efficient than spectral techniques; however, those techniques are far less accurate than Fourier-based techniques, especially under low values of signal-to-noise ratio (SNR) [4]. Phase-locked loops (PLLs) are widely used in communication systems to handle this problem; however, PLLs can be slower than spectral or correlative techniques as they need time for locking [5]. More samples are needed in digital PLLs (DPLLs) if more realizations are considered for better results, especially under low SNR, where noise disturbs the locking process significantly.
Recently, machine learning has been emerging as a powerful tool for various tasks in many fields, including communications and signal processing [6][7][8]. Despite the increased attention to machine learning, very few works have considered its application to the problem of frequency estimation. In [9], an approach based on deep learning for single-tone frequency estimation under noise has been proposed. Unlike Fourier-based, PLL, or correlative techniques, the DL-based approach requires prior knowledge of the SNR; however, SNR estimation can be in error under varying noise conditions, and therefore many studies, including [9], assume that the SNR is constant during the period of frequency estimation.
In this work we further investigate the performance of DL for frequency estimation under insufficient SNR information. The effect of the number of nodes has also been studied, and it is found that no significant change in performance is obtained by increasing the number of nodes in the hidden layers. The contribution of this work involves the following directions:
• Comparing the performance of the DL-based approach with classical techniques.
• Investigating the system performance in case of unavailable SNR information.
• Investigating the system performance under various SNRs.
• Investigating the effect of changing the number of nodes in the hidden layers of the network.
• Investigating the effect of the input signal length (duration) on the performance of both classical and DL-based methods. This factor has a significant impact on the overall performance and complexity.
• Investigating the effect of different realizations during the application phase (not only during the training phase).
• Investigating the impact of DL-based FE on IoT, sensors, sensor networks, and software-defined radio (SDR).
This paper is organized as follows: Section 2 presents related works; Section 3 presents the signal model and the problem definition; Section 4 reviews the most effective classical techniques; while Section 5 addresses the yet-unhandled issues of the DL approach and presents the results in a comparative study that reveals the power of DL versus classical frequency estimation techniques.

Motivation and Related Work
The problem of frequency estimation (FE) has often been handled using the classical Fourier and correlative techniques. However, some recent works have handled this problem using neural networks and deep learning. Good accuracy has been obtained; hence, the use of DL for frequency estimation is promising. Reference [9] designed a three-layer neural network to estimate the frequency of a sinusoid contaminated by a white noise process at a signal-to-noise ratio (SNR) of 25 dB, where the trained model can estimate the frequency of any previously-unseen noisy sinusoid in a short time. However, that work did not address some important issues, including comparisons with classical techniques, the system performance under wrong SNR estimation, the system performance under various values of SNR, and the effect of the number of nodes.
Reference [10] presents an approach based on deep neural networks to estimate the fundamental frequency (F0), which is an essential acoustical feature for finding the audible pitch level. This problem has classically been dealt with using time-domain or correlative methods; however, the work in [10] took a different direction via deep learning, where the error was reduced as compared with classical techniques.
Reference [11] presented an approach for estimating the frequency of linear frequency modulation (LFM) signals, a topic that has applications in radar and communication engineering. The approach utilized a convolutional neural network (CNN). This problem has traditionally been handled using time-frequency analysis [12]; however, time-frequency analysis requires significant computational cost as it involves two-dimensional transforms.

Problem Statement
The generic single-tone sinusoidal model is given by:

$$s(t) = a(t)\,\sin(2\pi f_o t + \phi), \tag{1}$$

where $a(t)$ is the amplitude, which normally has less frequency content than the sinusoidal frequency $f_o$ (i.e., it is slowly-varying), and $\phi$ is the initial phase. As a carrier for information signals in communication systems via modulation, the sinusoidal carrier normally has a constant amplitude, while the receiving device can apply amplitude estimation and a constant-gain multiplier to restore the original amplitude. Therefore, we consider here the sinusoidal model:

$$s(t) = A\,\sin(2\pi f_o t + \phi), \tag{2}$$

with constant amplitude $A$.
The signal in Equation (2) is transmitted for a short time at the start of any communication between two terminals to allow the receiver to estimate the carrier frequency. However, such a single-tone signal can undergo a frequency change due to the Doppler effect. Even a minor change in the carrier can cause demodulation problems. The performance of OFDM systems can deteriorate under improper carrier estimation [1]. Hence, accurate frequency estimation of the incoming carrier is very important for correct demodulation at the receiver. However, communication channels add noise to transmitted signals, making frequency estimation a more challenging task. Hence, the actual carrier model used in frequency estimation will be as follows:

$$x(t) = A\,\sin(2\pi f_o t + \phi) + n(t), \tag{3}$$

where $n(t)$ represents the noise process. In most communication systems, noise is modelled as a Gaussian (normal) process with zero mean and variance that equals the noise power, $\sigma^2$, and the process is normally referred to as $\mathcal{N}(0, \sigma^2)$.
The signal-to-noise ratio (SNR) is defined as the ratio of the signal power $A^2/2$ to the noise power $\sigma^2$:

$$\rho = \frac{A^2}{2\sigma^2}, \tag{4}$$

and it is usually expressed in decibels:

$$\mathrm{SNR}_{\mathrm{dB}} = \rho_{\mathrm{dB}} = 10 \log_{10}(\rho), \tag{5}$$

which is more convenient for handling large and small values logarithmically.
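The noisy single-tone model and the decibel SNR definition can be illustrated with a minimal Python sketch. It assumes the SNR is the ratio of the sinusoid power A^2/2 to the noise variance; the function names `noise_std` and `noisy_tone`, the sampling frequency argument `fs`, and the default phase of zero are illustrative choices, not taken from this work.

```python
import math
import random


def noise_std(amplitude, snr_db):
    """Noise standard deviation that yields the requested SNR (in dB)
    for a constant-amplitude sinusoid, assuming SNR = (A^2 / 2) / sigma^2."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    signal_power = amplitude ** 2 / 2.0  # average power of A*sin(.)
    return math.sqrt(signal_power / snr_linear)


def noisy_tone(f0, fs, n_samples, snr_db, amplitude=1.0, phase=0.0, rng=None):
    """Samples of A*sin(2*pi*f0*n/fs + phase) plus zero-mean Gaussian noise."""
    rng = rng or random.Random()
    sigma = noise_std(amplitude, snr_db)
    return [
        amplitude * math.sin(2.0 * math.pi * f0 * n / fs + phase)
        + rng.gauss(0.0, sigma)
        for n in range(n_samples)
    ]
```

For example, `noisy_tone(24.41, 200.0, 64, 25.0, rng=random.Random(0))` gives 64 samples of a 24.41 Hz tone at 25 dB SNR, where the 200 Hz sampling rate is an arbitrary value chosen only for illustration.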
This work handles the problem of estimating the frequency of the single-tone model in Equation (3) under Gaussian noise $n(t)$. Our approach is to use a deep-learning (DL) neural network with multiple layers in this process, following recent works that confirmed the high accuracy of this approach [9][10][11]. Despite the pioneering role of these works, they have not addressed many fundamental issues in DL frequency estimation. We will focus on clarifying these issues as follows:
1. Performance of the DL approach is compared with the performance of the still-active classical techniques that are based on Fourier analysis.
2. In many situations SNR estimation can be inaccurate or unavailable. Hence, the system performance is investigated in case of unavailable SNR information.
3. The DL approach is SNR-dependent; hence, an investigation of the system performance under various SNRs is presented.
4. It is expected that the more nodes in the DL approach, the better the accuracy of estimation. Hence, this work investigates the effect of changing the number of nodes in the hidden layers of the network.
5. The number of input samples (signal length or duration) has a significant impact on the complexity and the performance of classical and DL-based methods. This point has been fully investigated in this work.
6. The effect of different realizations while training has been handled in the literature of DL-based approaches, as it is a necessary step in the training process. However, the possibility of different realizations in the working environment (application phase) has not been handled previously. In this work we discuss the effect of different realizations during the application phase.
7. The reduced complexity introduced by DL-based FE, in addition to avoiding complex-valued arithmetic, will make FE easier and cheaper for IoT communications, sensors, sensor networks, and software-defined radio (SDR). This work presents discussions on such possibilities.
The next section presents an overview of the main well-known approaches in classical FE for single-tone signals.

A Brief Overview of Classical FE for Single-Tones
Literature surveys of classical frequency estimation are presented in [2], [13], and [14]. In [1], an iterative approach for single-tone frequency estimation is presented; however, despite their benefits in accuracy, such approaches can be computationally expensive or complex. In [15,16,17] and the references therein, good DFT interpolators have been presented; however, despite some accuracy gains, they are computationally more expensive than earlier methods in the literature. Reduced complexity is a property that can be very useful for some resource-limited systems like wireless sensor networks [18]. The complexity-accuracy-speed tradeoff is the selection criterion in this work. Here, a brief overview of the most accurate, computationally-efficient classical approaches is presented. All accurate classical FE methods are based on using the Discrete Fourier Transform (DFT) of the signal:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1, \tag{6}$$

where $x(n)$ are the samples of the signal and $N$ is the number of input samples. Various methods have been designed for FE based on interpolating the DFT of the sinusoid. The most accurate and computationally-efficient methods under additive Gaussian noise are as follows.

The Maximum of DFT Estimator
This method is based on the location (index) of the maximum of the absolute value of the DFT, given by:

$$k_m = \arg\max_{k} |X(k)|, \tag{7}$$

since the sinusoid gives a delta spectrum. The estimated frequency is given by:

$$\hat{f}_o = \frac{k_m}{N}\, f_s, \tag{8}$$

where $f_s$ is the sampling frequency. The error of estimation at SNR = r (dB) is the criterion used in this paper to evaluate the performance of the DL approach versus existing methods. Another performance criterion is the computational complexity of the method.
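The maximum-of-DFT estimator can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the DFT is evaluated directly with O(N^2) operations rather than with an FFT, the peak search is restricted to the positive-frequency half of the spectrum because the signal is real-valued, and the names `dft_magnitudes` and `max_dft_frequency` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes |X(k)| of the N-point DFT, computed directly (O(N^2))."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples):
        acc = sum(
            x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        mags.append(abs(acc))
    return mags


def max_dft_frequency(x, fs):
    """Frequency estimate k_m * fs / N from the index of the largest |X(k)|."""
    n_samples = len(x)
    if n_samples < 4:
        raise ValueError("need at least 4 samples")
    mags = dft_magnitudes(x)
    # Skip DC and search only below N/2: a real signal has a mirrored spectrum.
    k_m = max(range(1, n_samples // 2), key=lambda k: mags[k])
    return k_m * fs / n_samples
```

The estimate is quantized to multiples of fs/N, which is why the interpolators described next refine it using the neighbouring DFT points.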

The Quadratic Interpolator
This approach designs a quadratic interpolation for the DFT using three points near the index of the maximum of |DFT|, i.e., the following points in the DFT domain: $X(k_m - 1)$, $X(k_m)$, and $X(k_m + 1)$, where $k_m$ is given in Equation (7). The estimated frequency is then taken at the peak of the fitted quadratic.
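As a hedged illustration of the idea, the sketch below uses the textbook parabolic fit through three DFT magnitudes around the peak. It is an assumption that this matches the exact interpolator used in this work (variants that fit the log-magnitude or the squared magnitude also exist), and the function names are hypothetical.

```python
def quadratic_peak_offset(m_prev, m_peak, m_next):
    """Fractional bin offset of the vertex of the parabola passing through
    the magnitudes at bins k_m - 1, k_m, and k_m + 1."""
    denom = m_prev - 2.0 * m_peak + m_next
    if denom == 0.0:
        return 0.0  # no curvature, so nothing to interpolate
    return 0.5 * (m_prev - m_next) / denom


def quadratic_frequency(k_m, m_prev, m_peak, m_next, fs, n_samples):
    """Frequency in Hz at the interpolated peak position."""
    offset = quadratic_peak_offset(m_prev, m_peak, m_next)
    return (k_m + offset) * fs / n_samples
```

When the two neighbours are equal, the offset is zero and the result reduces to the maximum-of-DFT estimate.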

The Barycentric Estimator
This estimator is similar to the Quadratic Estimator in that the estimated frequency is obtained by interpolating the DFT around the index of its maximum. During this work, the Barycentric Estimator proved to give the best results in terms of error among the selected classical techniques.
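A generic barycentric (centroid) interpolation over the same three DFT points can be sketched as below. The exact expression used in this work is not reproduced here, so the magnitude-weighted mean of the three bin indices is an assumption, and `barycentric_frequency` is a hypothetical name.

```python
def barycentric_frequency(k_m, m_prev, m_peak, m_next, fs, n_samples):
    """Magnitude-weighted mean of bins k_m - 1, k_m, k_m + 1, converted to Hz."""
    total = m_prev + m_peak + m_next
    if total == 0.0:
        return k_m * fs / n_samples  # degenerate case: fall back to the peak bin
    k_bar = ((k_m - 1) * m_prev + k_m * m_peak + (k_m + 1) * m_next) / total
    return k_bar * fs / n_samples
```

Unlike the parabolic fit, this form needs only additions, three multiplications by known indices, and one division.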
The next section presents the structure and training details of the deep learning network that will be adopted for FE of single-tone signals.

Deep Learning for Single-Tone FE: Network Structure and Training
In this section the structure of the DL approach for frequency estimation of single-tones is analyzed and justified. Please note that in the DL-based approach, complexity exists only during the training phase, not in the actual network structure. The significant reduction in complexity can make the DL approach for FE suitable for IoT communications involving wireless sensor networks (WSNs).

DL Network Structure
A deep neural network with two hidden layers is shown in Figure 1, where the arrows represent multiplication of the relevant items by specific weights and the addition of biases. The MATLAB tansig (or tanh) function is used as the pointwise activation function in the hidden layers as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1.$$

The reason for using the tanh activation in the hidden layers is to prevent results from expanding in magnitude by normalizing them to the range [-1, 1]; also, both positive and negative values are needed for proper adjustment of the network. This is unlike the output layer, where a single output (a decision) should be obtained using the positive linear activation function (ReLU) to give a value proportional to the input frequency as follows:

$$\mathrm{relu}(x) = \max(0, x),$$

where ReLU is used because the actual value of the frequency is needed, knowing that the frequency is always positive. The operation of the 2-layer network in Figure 1 can be expressed as follows:

$$y = \mathrm{relu}\big(\mathbf{W}_3\, \mathbf{tanh}(\mathbf{W}_2\, \mathbf{tanh}(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2) + \mathbf{b}_3\big),$$

where $\mathbf{x}$ is the input vector; $\mathbf{W}_1$, $\mathbf{W}_2$, and $\mathbf{W}_3$ are the weight matrices of the first hidden, second hidden, and output layers; $N$, $N_1$, $N_2$, and $N_o$ are the numbers of nodes of the input, first hidden, second hidden, and output layers; while $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ are the respective biases of the first hidden, second hidden, and output layers. The number of nodes in the output layer is $N_o = 1$, as a scalar is needed to estimate the input frequency. Bold type is used for matrices and vector functions, while plain-text symbols are used for scalar functions and variables. The activation functions handle the relevant vectors point-wise as shown in Figure 1, while the output variable $y$ is a scalar that should represent the input frequency. The minimum number of layers $L$ for successful extraction of the input features is $L = 2$. A larger $L$ may improve the performance slightly, but it increases the complexity.
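The forward operation of such a network (two tanh hidden layers followed by a single ReLU output node) can be sketched in plain Python. The weights here are passed in as lists of rows and would come from training; the function names `affine` and `forward` are illustrative and not part of this work.

```python
import math


def affine(weights, biases, inputs):
    """Compute W @ inputs + b, with W stored as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x, w1, b1, w2, b2, w3, b3):
    """Two tanh hidden layers followed by one ReLU output node."""
    h1 = [math.tanh(z) for z in affine(w1, b1, x)]   # first hidden layer
    h2 = [math.tanh(z) for z in affine(w2, b2, h1)]  # second hidden layer
    z_out = affine(w3, b3, h2)[0]                    # single output node
    return max(0.0, z_out)                           # ReLU keeps the estimate non-negative
```

With N inputs, N1 and N2 hidden nodes, this pass uses N*N1 + N1*N2 + N2 real multiplications, all of them real-valued.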

Training Data
The training data are samples that are simulated using the model in Equation (3) on MATLAB (under Academic License 40635944). The number and magnitude of these samples depend on the signal duration ($T$), the sampling frequency $f_s$, and the signal-to-noise ratio (SNR). Data samples for each $N$ and SNR have been divided into three groups: 70% for training, 15% for testing, and 15% for validation. The training function of the DL network is chosen as the MATLAB trainscg function, which updates weight and bias values according to the scaled conjugate gradient method, with a learning rate of 0.1 and 100 epochs.
As changes in carrier frequency are expected to be small (mostly due to the Doppler effect), a small range of low frequencies, [23-25] Hz, is considered in this experiment, which is suitable for many applications in bio-sensor communications and IoT. Higher ranges are possible, but they require more input samples $N$ and hence a more complex network; for example, handling a maximum frequency of 100 kHz may require $N = 2000$ input nodes [9]. The selected frequency range has been divided with a frequency step of $\nu = \delta / K$, with $\delta = 25 - 23 = 2$ and $K = 40$ in the simulations. Of course, more accuracy can be obtained with a larger $K$; however, this can be at the expense of more nodes in the network. Note that this frequency step is time-independent; hence, it is a design choice, unlike the frequency step $\Delta f = 1/T$ in DFT-based techniques, which is time-dependent and follows the Heisenberg Uncertainty Principle in the time-frequency domain [19].
For each incremental frequency $\{f_k = 23 + k\nu \mid k = 0:K\}$ in the selected frequency range, a number of realizations $R = 100$ is generated so that the training input is:

$$\mathbf{X} = [\mathbf{x}_{0,1}, \ldots, \mathbf{x}_{0,R}, \mathbf{x}_{1,1}, \ldots, \mathbf{x}_{K,R}],$$

with labels:

$$\mathbf{f} = [f_0, \ldots, f_0, f_1, \ldots, f_K],$$

where $\mathbf{x}_{k,i}$ is the $N \times 1$ column that represents the $i$th realization of Equation (3) with frequency $f_k$ under the same SNR. Hence, there is a total of $K + 1$ classes. A larger number of realizations during the training phase implies better training.
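The construction of the training set can be sketched as follows, assuming unit amplitude, zero initial phase, and noise variance set from the SNR in dB. The values of the number of samples and the sampling frequency are left as arguments because they are design choices; the names `frequency_grid` and `build_training_set` are hypothetical.

```python
import math
import random


def frequency_grid(f_low=23.0, f_high=25.0, steps=40):
    """steps + 1 equally spaced class frequencies from f_low to f_high."""
    step = (f_high - f_low) / steps
    return [f_low + k * step for k in range(steps + 1)]


def build_training_set(n_samples, fs, snr_db, realizations=100, seed=0):
    """Return (columns, labels): one noisy tone per (frequency, realization)."""
    rng = random.Random(seed)
    # Unit-amplitude sinusoid has power 1/2, so sigma^2 = 0.5 / SNR (assumption).
    sigma = math.sqrt(0.5 / 10.0 ** (snr_db / 10.0))
    columns, labels = [], []
    for f in frequency_grid():
        for _ in range(realizations):
            columns.append([
                math.sin(2.0 * math.pi * f * n / fs) + rng.gauss(0.0, sigma)
                for n in range(n_samples)
            ])
            labels.append(f)  # the label is the noise-free tone frequency
    return columns, labels
```

With the default grid and 100 realizations this yields 41 x 100 = 4100 labelled columns for each SNR, before the 70/15/15 split described above.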

Results: Deep Learning vs. Classical Single-Tone FE
In this section we analyze the performance of the DL-approach for frequency estimation of single-tones, in comparison with computationally-efficient classical techniques.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 February 2021 doi:10.20944/preprints202102.0622.v1
For each SNR, a different network is trained to recognize the frequencies in the specified range. As the SNR may change in communication systems, this may imply that a complicated set of DL-based estimators is necessary for proper functioning. It will shortly be shown that this complexity is not necessary. The number of signal samples used in the design of the estimation system has a significant impact on the performance of FE, in both DFT-based and DL-based techniques. Hence, a larger signal duration $T$ (allowed by the estimation system design) will give a larger $N = T/T_s$, where $T_s$ is the sampling interval, and hence better estimation. Still better results can be obtained via more realizations. During the training phase, a larger number of realizations $R$ implies better training. In the testing phase (actual working environment), a number of different realizations can be obtained by dividing the input samples into sets whose length equals the signal length selected during training (i.e., the number of input nodes of the trained DL network). Hence, for better frequency estimation, the transmitter should initially send the single-tone carrier for a sending duration $T_a$ that is a multiple of the training duration $T$ (whose sample length is $N = T/T_s$), i.e., $T_a = R_a T$ or $R_a = T_a / T$. The final frequency estimate will be the average of the estimates over the $R_a$ realizations.
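The averaging over application-phase realizations can be sketched as below. The estimator is passed in as a function so that the same routine works for a trained network or a classical method; the use of non-overlapping consecutive segments is an assumption, and `average_over_realizations` is a hypothetical name.

```python
def average_over_realizations(samples, segment_length, estimator):
    """Split samples into consecutive non-overlapping segments of
    segment_length, apply estimator to each, and return the mean estimate.
    Trailing samples that do not fill a whole segment are ignored."""
    count = len(samples) // segment_length
    if count == 0:
        raise ValueError("need at least one full segment")
    estimates = [
        estimator(samples[i * segment_length:(i + 1) * segment_length])
        for i in range(count)
    ]
    return sum(estimates) / count
```

Here `segment_length` plays the role of the number of input nodes of the trained network, and the number of segments corresponds to the number of realizations in the application phase.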

Performance under Different Input Lengths
First, we compare the performance of the DL-based FE with the selected classical DFT-based techniques for different values of the training input size ($N$). Figure 2 shows the performance of the DL method with $N_1 = N_2 = 5$ versus the selected classical techniques. The sinusoidal frequency is selected as $f_o = 24.41$ Hz. It is shown that for small $N$, the classical DFT-based approach fails to correctly estimate the input frequency. The number of realizations during the training phase is selected as $R = 100$, while during the application (working) phase the number of realizations is selected as $R_a = 30$. Figure 3 shows the performance of the DL-based approach versus the best of the selected DFT-based approaches, the Barycentric approach, for different input lengths, while Figure 4 shows the performance of the DL-based approach versus the Barycentric approach with fewer nodes in the network such that $N_1 = N_2 = 3$. No significant difference is noticed as compared to Figure 3.

Performance under Insufficient SNR Information
SNR estimation is necessary for many purposes in communication systems, and it is needed for DL frequency estimation, as training is performed under a specific SNR. However, in some systems (e.g., some sensor networks), limited computational capabilities may not allow SNR estimation, except probably for a rough estimate. Not knowing the exact value of the SNR does not cause the performance of DL estimation to deteriorate significantly; only minor changes in the performance are noticed, as shown in Figure 5. Hence, the same network can be used for FE under different SNRs. The relative deviation of DL performance for FE is defined as an average over $M$ values of SNR, computed from $e(r_i)$ and $e_b(r_i)$, the average errors at the $i$th SNR value $r_i$ (dB) under normal operation (correct SNR information) and biased SNR, respectively. Figure 6 shows the relative deviation when $N_1 = N_2 = 5$. Non-symmetric deviation is expected for positive and negative bias due to the non-symmetry of the FE error performance curves (see Figures 2-4).

Figure 6. Average performance deviation of DL-based FE versus SNR bias.

Effect of Increasing Nodes at Hidden Layers
Generally, no significant performance difference is found when increasing the number of nodes in the hidden layers, as shown in Figure 7, where a performance comparison versus the Barycentric method is presented using DL networks with $N_1 = N_2 = 5$ and $N_1 = N_2 = 3$.

Impact on IoT, Sensors, and SDR
Reduced complexity can lead to efficient implementation of FE since DL does not need general-purpose digital multipliers if implemented as a multi-bit system, where trained weights can be implemented using look-up tables.
On the other hand, cheap and hardware-efficient implementation using short word-length (SWL) is possible due to reduced complexity, where complicated digital multipliers are replaced by simple multiplexers.
Another hardware gain is due to the fact that DFT techniques require the use of complex-valued arithmetic, while DL systems use only real-valued arithmetic. It is well-known that the implementation of a large-size DFT is a very complicated task for both hardware and software designs, despite the significant efficiency provided via the latest versions of the Fast Fourier Transform (FFT) algorithm, which reduces the number of (complex) multiplications from $O\{N^2\}$ to $O\{N \log_2(N)\}$ [20].
From this work and the works in References [9][10][11], it can be inferred that $N$ is proportional to the input frequency $f_o$; hence, for high frequencies the size $N$ can be large if DL is used, and much larger for classical FE if the same accuracy as DL is required, calling for a very complicated DFT implementation in the complex-valued domain.
Implementation of FFT via parallel computing algorithms can alleviate the time cost [20], however, this approach is very expensive, power-consuming, and hardware-costly, which may not be suitable for many systems, especially in the emerging IoT sensor applications.
Large-scale sensor applications require tiny, low-power, low-cost implementation, like sensors used for healthcare monitoring, early detection and monitoring of forest fires, natural disaster detection, and large-scale surveillance for security applications.
Complicated, complex-valued operations not only increase the system size and complexity, but also increase the power consumption, which is a crucial factor in sensors deployed in WSNs, as the life-time of these sensors is limited by the irreplaceable battery life [18].
For the DL-based approach, the (real) multiplications are mostly at the input stage, which requires $N N_1$ of them, while the multiplications at the other nodes can be as few as 6 real-valued multiplications (which are trivial operations, as they use look-up tables instead of digital multipliers); hence, the overall number of real-valued multiplications is $O\{N\}$.
Whether implemented using SWL or multi-bit technologies, this reduced complexity (versus the complicated, complex-valued classical FE techniques) makes the DL-based approach the most suitable FE technology in sensors and sensor networks, which are inherently linked to modern IoT applications. Similar gains are obtained if implementation is performed via software, where reduced complexity leads to significant time gain, a factor that is attractive for software-defined radio (SDR) communications.
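The multiplication counts discussed in this section can be compared with a small sketch. The DL count follows the layer sizes directly; the FFT count assumes the textbook radix-2 figure of (N/2) log2(N) complex multiplications with four real multiplications per complex one, which is an illustrative assumption rather than a figure from this work. Function names are hypothetical.

```python
def dl_real_multiplications(n_inputs, n_hidden1, n_hidden2):
    """Weight multiplications in one forward pass:
    input stage + second hidden layer + single output node."""
    return n_inputs * n_hidden1 + n_hidden1 * n_hidden2 + n_hidden2


def fft_real_multiplications(n_inputs):
    """Radix-2 FFT estimate: (N/2)*log2(N) complex multiplications,
    each counted as 4 real multiplications (assumption)."""
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two")
    log2_n = n_inputs.bit_length() - 1
    return 4 * (n_inputs // 2) * log2_n
```

With two nodes in each hidden layer, the multiplications beyond the input stage amount to 2*2 + 2 = 6, which is consistent with the figure quoted above.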

A Call for Future Direction: DL in the SWL Domain
The Authors are currently investigating the possible implementation of the DL-based FE using short word-length (SWL), where single-bit and ternary technologies are to be considered. As the training phase can be performed offline, it is expected that training could be accomplished using multi-bit computer simulation; however, the training data should be converted into the SWL format. In this case, a new factor will emerge, which is the quantization noise, and this can be handled by better noise-shaping via a proper selection of the oversampling ratio (OSR), the quantization process, and the filter design of the sigma-delta modulator (SDM). As adaptivity is now performed offline, this DL design is expected to be even less complicated than the design of adaptivity in SWL systems, first introduced in [21] as a result of SWL research supported by the Australian Research Council via Discovery Grant DP0557429. It is expected that a successful SWL implementation of DL-based FE would transform many other resource-limited IoT and sensor applications once DL techniques are brought into the SWL domain.

Conclusions
This paper presented a comprehensive analysis of the performance of a deep-learning-based approach for frequency estimation of a single-tone sinusoid. The work fills the gaps of unhandled issues in the recently-proposed methods based on deep learning (DL). The paper presented an analysis of the frequency estimation error under a range of signal-to-noise ratios (SNRs). It is shown that the number of hidden layers can be as small as two, each with two or three nodes. The number of input samples (nodes) can be very small as compared to DFT-based classical methods, with much better error performance. In addition, it is shown that performance is not degraded significantly in case of wrong information about the actual value of the SNR. These results can reduce the complexity, power consumption, and cost of the communication system, properties that can be very useful for systems with limited memory and computational capabilities like wireless sensor networks (WSNs), which are inherently linked to current and future applications on the Internet of Things (IoT). Reduced complexity without using complex-valued arithmetic also makes the DL technology for frequency estimation suitable for software-defined radio (SDR). A call for exploiting this reduced complexity to introduce DL in the short word-length (SWL) domain has also been presented as a future direction.
Author Contributions: The Authors contributed equally to this work. The first Author, Ms. Hind Almayyali contributed to initial tasks of this work during her postgraduate preparatory year of coursework in 2019, where she got the top-rank of High Distinction before she started her research project on deep learning for sensor applications.
Funding: This project is partially funded by Edith Cowan University via the ASPIRE Program.
Data Availability Statement: All types of data have been generated using mathematical equations.